Remote Data Ecosystems: Transforming Data
This article is the fourth in a series about what it takes to deploy a fully functioning remote data ecosystem for your organization. Anywhere you have assets in the field, whether it’s machinery, people, tools, light vehicles, or specialized equipment, those assets are generating information that could be valuable or critical to running a successful and efficient operation. Over the past few decades, connectivity and communications systems have improved to the point that there is no longer an excuse for not having that kind of information at your fingertips when you need it.
Unfortunately, a lot of people in decision-making roles don’t have the expertise to set up a data ecosystem, and they rely on technology partners, who often take advantage of the knowledge gap. GSE believes in transparent business relationships, because well-informed customers and partners make better long-term decisions. For that reason, we have created this series to make sure everyone understands the functional components of a data ecosystem, and how a technology partner like GSE can help you build the one that best suits the budget and needs of your organization.
Part Four: Transforming Data
Data isn't clean. Sensors don't tell you what you want to know in a format that makes sense to you, and machine CAN networks don't generate outputs and command logs that you can simply look at, read, and understand. Instead, they generally produce messages that must be translated by other machines programmed to understand, parse, and transform those messages into usable data. While data transformation is rarely treated as a decision point in most data solutions, satellite-based solutions, and satellite IoT solutions in particular, call for a deliberate data transformation strategy, because that strategy can significantly influence the total cost of the solution.
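To make that translation step concrete, here is a minimal Python sketch of what a decoder does with one raw CAN frame. It assumes the machine speaks SAE J1939, a common CAN protocol on heavy equipment, and decodes only engine speed (SPN 190) from the EEC1 message. The byte position and 0.125 rpm/bit scaling follow the J1939 definition of EEC1 as it is commonly published, and the function names are hypothetical; a production decoder would be driven by a full parameter database for the specific machine.

```python
import struct
from typing import Optional

# Assumption: the machine uses SAE J1939. Engine speed (SPN 190) is carried in
# the EEC1 message (PGN 61444 / 0xF004), data bytes 4-5, little-endian,
# scaled at 0.125 rpm per bit with no offset.
EEC1_PGN = 0xF004
RPM_PER_BIT = 0.125
MAX_VALID_RAW = 0xFAFF  # higher 16-bit values are reserved, error, or "not available"


def pgn_from_can_id(can_id: int) -> int:
    """Extract the Parameter Group Number from a 29-bit J1939 CAN identifier."""
    data_page = (can_id >> 24) & 0x03  # extended data page + data page bits
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    if pdu_format < 240:
        # PDU1 (addressed) messages: PDU specific is a destination address, not part of the PGN
        return (data_page << 16) | (pdu_format << 8)
    return (data_page << 16) | (pdu_format << 8) | pdu_specific


def decode_engine_speed(data: bytes) -> Optional[float]:
    """Turn the raw EEC1 payload into engine speed in rpm, or None if unavailable."""
    raw = struct.unpack_from("<H", data, 3)[0]  # bytes 4-5 are zero-based offsets 3-4
    if raw > MAX_VALID_RAW:
        return None
    return raw * RPM_PER_BIT


def decode_frame(can_id: int, data: bytes) -> dict:
    """Hypothetical decoder entry point: raw frame in, human-readable record out."""
    pgn = pgn_from_can_id(can_id)
    if pgn == EEC1_PGN and len(data) >= 5:
        return {"pgn": pgn, "engine_speed_rpm": decode_engine_speed(data)}
    # Frames we have no definition for are passed through as hex
    return {"pgn": pgn, "raw": data.hex()}


frame = bytes([0xFF, 0xFF, 0xFF, 0x40, 0x38, 0xFF, 0xFF, 0xFF])
print(decode_frame(0x0CF00400, frame))  # {'pgn': 61444, 'engine_speed_rpm': 1800.0}
```

Eight opaque bytes become one readable value; multiply that by every parameter on every machine and it is clear why the question of where this work happens matters.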
Decision Point: Where to Transform Data
Transformation of data can occur in two places: in the cloud and/or on the edge. Before you come at me and say something about your on-prem data storage, please understand that your cloud is just in your building instead of someone else’s. If you prefer, we can say "on the edge" and "on the server," but let's not get too hung up on semantics. What matters in this decision is where the computer(s) that transform your data are located, and whether that transformation happens before or after you send data over the air. Where and how you transform data can have tremendous effects on the cost, complexity, and performance of your data solution.
Understanding Options: Transforming Data on the Edge
Most hardware terminals can parse sensor readings and process other raw messages into packets, but what happens to those packets is what determines your data transformation strategy. The major differentiator here is what that strategy demands of the hardware in your solution. Going back to what we learned in Remote Data Ecosystems Part 2 - Gathering Data, many decisions influence your hardware requirements, and this is one of them. Knowing whether you want to process data in some capacity before it's sent over the air is important, because it narrows your hardware options to those capable of edge computing, aggregation, and/or data storage.
Likewise, if you choose to transform data on the edge, you're committing to a higher up-front cost in an effort to minimize your variable costs by controlling how much satellite airtime you consume. Edge computing can be as basic as aggregating message data into averages and counts, or as complex as applying logic that filters out "noise" reports or transmits only when very specific sequences of data or event triggers occur.
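As a rough illustration of the simpler end of that spectrum, the Python sketch below shows what an edge terminal's logic might look like for one reporting window: any reading above a threshold goes out immediately as an event, and the whole window is collapsed into a single summary packet. The threshold, the window, and the packet fields are hypothetical choices for illustration, not any particular terminal's firmware.

```python
from statistics import mean
from typing import Dict, List


def process_window(readings: List[float], alert_threshold: float) -> List[Dict]:
    """Return the packets a hypothetical edge terminal would transmit for one reporting window."""
    packets: List[Dict] = []
    # Event trigger: any reading above the threshold goes out as its own packet
    for index, value in enumerate(readings):
        if value > alert_threshold:
            packets.append({"type": "alert", "index": index, "value": value})
    # Aggregation: the whole window is reduced to a single summary packet
    if readings:
        packets.append({
            "type": "summary",
            "count": len(readings),
            "avg": round(mean(readings), 2),
            "min": min(readings),
            "max": max(readings),
        })
    return packets


# Six raw coolant-temperature readings (illustrative values) become two packets
window = [71.0, 72.5, 70.5, 73.0, 95.0, 72.0]
for packet in process_window(window, alert_threshold=90.0):
    print(packet)
```

Here six readings become one alert and one summary. The saving grows with the window length, since a window of hundreds of readings still produces a single summary packet plus any alerts.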
Understanding Options: Transforming Data in the Cloud
Transforming data after it has been sent over the air is the standard approach for most non-satellite solutions, because their data costs are significantly lower. This is also a large part of why non-satellite IoT devices are so inexpensive: there's hardly any intelligence required or built into them, as all of the transformation and computation takes place after the data leaves the field. For satellite, however, this approach is generally only practical when the solution is intended to provide basic data or to send data infrequently.
The benefits of this approach are generally lower up-front hardware costs and, when used properly, more accurate data logs. Because the terminal doesn't parse, combine, or filter any of the messages produced by the sensors or CAN networks, the message logs on the server side are much more complete and regular, and they tell a more complete story. Additionally, the computing power available after the data has left the field is nearly limitless, which allows for much more advanced analysis.
Understanding Options: Hybrid Approach
In nearly every satellite-based data ecosystem, and in IoT applications especially, it makes sense to run some edge computing logic, if only to cut down on the airtime costs of sending "useless" data. Ultimately, the value of data lies in its ability to deliver relevant insights that can be used to assess a situation and/or inform a decision. Filtering out 50% or more of the data at the source helps prioritize the information that matters most.
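Report-by-exception is one common way such a filter might be written. The Python sketch below uses a deadband: a reading is transmitted only when it has moved more than a set amount away from the last value that was actually sent. The fuel-level readings and the 1.0 deadband are made-up numbers chosen to show the mechanics, not measured results.

```python
from typing import List, Optional


def deadband_filter(readings: List[float], deadband: float) -> List[float]:
    """Report by exception: keep a reading only when it has moved more than
    `deadband` away from the last value that was actually transmitted."""
    sent: List[float] = []
    last_sent: Optional[float] = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) > deadband:
            sent.append(value)
            last_sent = value
    return sent


# Illustrative fuel-level readings (percent), one per reporting interval
levels = [80.0, 80.1, 79.9, 80.2, 78.5, 78.4, 78.6, 76.9, 77.0, 76.8]
sent = deadband_filter(levels, deadband=1.0)
filtered_share = (len(levels) - len(sent)) / len(levels)
print(sent)            # [80.0, 78.5, 76.9]
print(filtered_share)  # 0.7
```

Comparing against the last transmitted value, rather than the previous raw reading, means a slow drift still gets reported once it accumulates past the deadband instead of slipping through one small step at a time.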
The fact remains, unfortunately, that edge computing requires a strong understanding of what is and is not useful, and in many cases, how to program that logic into a terminal. The level of expertise necessary for these types of hybrid solutions is not something that most companies can expect to have in-house, so a technology partner is the best way to ensure a good match between technology and data strategy.
Applying Context: Data Science
The lens through which we view data transformation has a lot to do with the fundamentals of data analytics and data science. The type of data analytics we want to perform (descriptive, diagnostic, predictive, or prescriptive) tells us what kind of data we need and how frequently we need it. Our appetite for data science and modeling tells us the volume of data we need to capture and clean.
Applying logic on the edge, where we collect the data, can prevent us from identifying unforeseen patterns through deep learning, because data discarded at the source is never available for later analysis. Narrowing the scope of data collection is great when we are certain of what we want to know, how we plan to use the data, and how tightly we want to control the budget. Broadening the scope of collection aligns more with a data discovery approach, and with organizations looking for deep learning opportunities.
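The trade-off can be shown with a toy example. The Python sketch below takes a hypothetical raw log containing one brief spike, reduces it the way a simple edge aggregator might (fixed-window averages), and then runs the same spike check on both versions. The readings, window size, and limit are illustrative assumptions; real pattern discovery would use far richer methods than a threshold, but the information loss works the same way.

```python
from statistics import mean
from typing import List


def window_averages(readings: List[float], window: int) -> List[float]:
    """Edge-style reduction: replace each block of `window` readings with its mean."""
    return [mean(readings[i:i + window]) for i in range(0, len(readings), window)]


def find_spikes(series: List[float], limit: float) -> List[int]:
    """Simple check run server-side: positions where a value exceeds `limit`."""
    return [i for i, value in enumerate(series) if value > limit]


# Hypothetical vibration log: 24 readings with one brief spike at index 11
raw_log = [2.0] * 11 + [9.0] + [2.0] * 12
averaged = window_averages(raw_log, window=12)

print(find_spikes(raw_log, limit=5.0))   # [11]
print(find_spikes(averaged, limit=5.0))  # []
```

On the raw log the spike is found; in the averaged data it disappears, because one 9.0 among eleven 2.0 readings only lifts the window mean to about 2.58.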
GSE Expertise: Finding the Right Strategy for Your Budget
Your organization's data strategy is only as strong and efficient as your ability to understand your data collection needs and then marry those needs with the technology that achieves your goals. Going in with a limited understanding of what you intend to do with data can leave you transmitting more than necessary from the field and suffering inflated data movement costs. Likewise, a limited understanding of collection and processing technology can leave you overpaying for hardware to meet an incredibly basic data gathering requirement.
GSE has spent decades helping clients understand the balance between the cost of terminal intelligence and the cost of airtime, and how to find the best mix for any solution. If, for example, edge capabilities add $300 to the up-front cost of a terminal but filter out the 70% of data that doesn't need to be sent, the result could be thousands of dollars in airtime savings over the life of the project. For a network with dozens or hundreds of nodes capable of communicating with each other, however, edge computing may still save money, but possibly not when deployed to every unit in the ecosystem. GSE knows how to configure a single parsing terminal that receives data from the other nodes, and then sends back messages for those units to send over the air when appropriate.
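The arithmetic behind that trade-off is simple enough to sketch. The Python below computes, per terminal, the net saving and the break-even point for edge hardware. Only the $300 premium and the 70% filtering rate come from the example above; the monthly airtime figures ($40 for a busy unit, $5 for a quiet one) and the deployment lengths are hypothetical placeholders, not GSE or carrier pricing.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeScenario:
    extra_hardware_cost: float   # added up-front cost per terminal for edge capability (USD)
    monthly_airtime_cost: float  # hypothetical airtime spend per terminal per month with no filtering (USD)
    fraction_filtered: float     # share of traffic the edge logic keeps off the air (0 to 1)
    months: int                  # expected life of the deployment


def monthly_saving(s: EdgeScenario) -> float:
    """Airtime dollars no longer spent each month because data is filtered at the edge."""
    return s.monthly_airtime_cost * s.fraction_filtered


def net_savings(s: EdgeScenario) -> float:
    """Airtime saved over the deployment minus the extra hardware cost, per terminal."""
    return monthly_saving(s) * s.months - s.extra_hardware_cost


def breakeven_months(s: EdgeScenario) -> float:
    """Months of operation before the edge hardware pays for itself."""
    saving = monthly_saving(s)
    return s.extra_hardware_cost / saving if saving > 0 else math.inf


busy = EdgeScenario(300, 40, 0.70, 60)   # hypothetical frequently reporting unit
quiet = EdgeScenario(300, 5, 0.70, 36)   # hypothetical unit that rarely reports
print(round(net_savings(busy)), round(breakeven_months(busy), 1))    # 1380 10.7
print(round(net_savings(quiet)), round(breakeven_months(quiet), 1))  # -174 85.7
```

With these placeholder numbers, the busy unit pays back the $300 in under a year, while the quiet unit never breaks even within a three-year deployment. That second case is where concentrating edge logic in one parsing terminal, rather than on every unit, can make more sense.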
To put it simply, GSE knows exactly how to help you with your data strategy in a way that makes the most of your budget and keeps your data costs under control. If that's something you'd like help with, please get in touch with us.
Next up: Delivering Data
The last article in this series will cover the process of putting insights from the field into your hands. Yes, that means it's the GSatTrack one! We love discussing the industry's best portal, but there are also other ways to deliver data, and we want to make sure you understand everything that's available, from data routing to API connections. These articles build on each other, so we’re excited to have you along for this educational journey.