IoT Data Analytics: Processing Real-World Data in Real Time
Understand the complexities of working with IoT data. Learn how to process IoT data in real time to get the most out of your IoT deployment.
IoT data analytics is a powerful application of real-time data analytics. Understanding the basic challenges and opportunities in working with IoT data will enable you to extract value from your IoT deployment. By performing IoT data analytics at scale and in real time, you can leverage the full potential of IoT technologies.
This article will cover the following topics related to IoT data analytics:
- What is IoT Data Analytics?
- Why IoT Data Analytics is not just regular data analytics
- The importance of aggregating your IoT data
- Dealing with out-of-order data points
- Combining real time and batch processing
The article will use examples from the logistics and supply chain industry to illustrate key concepts and challenges. The IoT data analytics concepts discussed are equally valid for other industries. If you’d like to discuss the application of IoT data analytics to your specific use case, join the #IoT-data channel on Discord.
What is IoT Data Analytics?
IoT data analytics is the analysis of data generated by Internet-of-Things (IoT) devices, such as smart watches, sensors, AI voice assistants and other devices that are connected to the Internet and can record and transmit data. IoT data analytics takes the raw data transmitted by these devices and processes it to turn it into valuable and actionable insights. Through IoT data analytics, patterns in the data generated by IoT devices can be identified which in turn can guide key business decisions.
The market for IoT solutions has grown exponentially over the past decade and is projected to increase from 201B USD in 2022 to 483B USD in 2027. Forecasts predict that by 2025 there will be over 75 billion IoT devices in active use globally. That’s a lot of data…and a lot of potential value to gain if you can crunch it right.
What’s special about IoT data analytics?
So, what’s the big deal? Data analytics has been around for decades and we know how to do it well. Isn’t IoT data analytics just regular data analytics with a different data source? Can’t we just take the solutions and approaches we know from regular data analytics and apply them to IoT data?
Well, not quite.
It turns out that IoT data analytics requires a particular set of skills and tools because IoT data poses unique challenges when it comes to data quality and processing requirements. Specifically, IoT data generally:
- needs to be aggregated from heterogeneous data sources, across multiple types of hardware, locations, and timezones,
- contains out-of-order data points that need to be handled appropriately, and
- often requires a mix of real-time and batch processing that few data processing frameworks adequately support.
Let’s take a closer look at why IoT data analytics is so complex and then dive into each of the three points in more detail.
IoT Data Analytics: Real-World Complexities
Because IoT data is generated by devices that exist in the real, physical world, it tends to be messier and more error- and delay-prone than data generated in purely digital contexts (such as clickstreams or web traffic). At the same time, the companies performing IoT data analytics generally want to process the data from their IoT devices and extract value as soon as possible, preferably in real time. In practice, real-time computations often need to be enriched with insights from batch processes, which requires complex reasoning and an infrastructure that can operate comfortably across both.
Noisy GPS data makes it difficult to monitor processes and maintain quality.
Take a smart logistics container, for example. Between its departure from the warehouse and its arrival at the destination thousands of miles away, it is subject to a whole range of unpredictable external influences. These include weather patterns, human behavior, and sensor malfunction, just to name a few. If a container veers off-course or if a secured door is opened at an unplanned moment, this signal should be identified and acted upon as quickly as possible to reduce potential value loss.
IoT data analytics poses such a challenging and unique set of problems that Pathway was built precisely to solve them. The Pathway data processing framework grew out of a close collaboration with major organizations performing IoT data analytics at scale.
IoT Data Analytics: The Need to Aggregate
Now imagine you are a data analyst working for a global shipping company. You are part of the team that is deploying a massive rollout of IoT devices on the company’s fleet of shipping containers. This means not one, not ten, but millions of smart containers equipped with IoT devices that are in constant motion around the planet. How can you reliably build an IoT data analytics pipeline that will help the company ensure that the fleet (and its valuable cargo!) is in good condition and that your clients’ shipments are safe and on-track for scheduled delivery?
The key here is the ability to aggregate the data to perform advanced analytics. The data points arriving from the individual IoT devices on the containers are quite simple in nature: a timestamp, latitude and longitude values, and one or a few values measured by the device, such as shock and temperature. Crunching the data from a single device on its own is not that powerful.
The real value is unlocked when you are able to take the data from these millions of IoT devices and aggregate them into an advanced IoT data analytics dashboard to model the process as a whole. This means going from thinking about single IoT devices that make relatively straightforward measurements to understanding entire supply chains and complex processes at a system level. This is a game-changer as it allows insights into operations at scale.
Aggregating IoT data provides valuable insights into complex processes.
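To make the idea of aggregation concrete, here is a minimal sketch in plain Python (not Pathway's API). It rolls individual device readings up into per-route summaries. The `Reading` schema, the `route` grouping key, and the sample values are assumptions made up for illustration; the article only states that a data point carries a timestamp, a position, and a few measured values such as shock and temperature.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """One data point from a container's IoT device (hypothetical schema)."""
    container_id: str
    route: str            # assumed grouping key, e.g. a shipping lane
    timestamp: int        # seconds since the epoch, UTC
    latitude: float
    longitude: float
    temperature_c: float
    shock_g: float


def aggregate_by_route(readings):
    """Roll individual readings up into per-route summary statistics."""
    grouped = defaultdict(list)
    for r in readings:
        grouped[r.route].append(r)
    return {
        route: {
            # Count distinct containers, not readings.
            "containers": len({r.container_id for r in rs}),
            "mean_temperature_c": mean(r.temperature_c for r in rs),
            "max_shock_g": max(r.shock_g for r in rs),
        }
        for route, rs in grouped.items()
    }


# Illustrative sample data, not real measurements.
readings = [
    Reading("C1", "Rotterdam-Singapore", 1_700_000_000, 51.9, 4.5, 4.0, 0.2),
    Reading("C2", "Rotterdam-Singapore", 1_700_000_060, 36.1, -5.4, 6.0, 1.5),
    Reading("C3", "Shanghai-Los Angeles", 1_700_000_120, 31.2, 121.5, 5.0, 0.1),
]
summary = aggregate_by_route(readings)
```

Each reading says little on its own, but the per-route summary already answers system-level questions such as "which lane is seeing the roughest handling?". A production pipeline would compute the same kind of roll-up continuously rather than over an in-memory list.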
IoT Data Analytics: Dealing with Out-of-Order Data
But that’s not the whole story. Let’s say you have successfully built your IoT data analytics dashboard that can aggregate the data points from all your IoT devices and crunch the data to return valuable insights. That’s a great first step.
But now remember that you are dealing with containers: containers that move across the planet through areas with limited internet coverage or restricted freedom of information movement. In fact, for significant parts of their journey, many shipping containers move through so-called DDIL (Disrupted, Disconnected, Interrupted or Low-Bandwidth) environments that interfere with the ability of the IoT devices on these containers to properly sync their measurements to the cloud.
Whenever an IoT device regains connectivity, it sends a backlog of recent historical data alongside its ongoing real-time measurements. In other words, the data points no longer arrive in the order in which they were recorded. This requires a data processing framework that is able to handle out-of-order data points smoothly.
IoT streaming data often arrives out-of-order.
Pathway was built to work with IoT data. Its incremental engine enables you to build reactive data products that will automatically update themselves whenever the out-of-order data points arrive. As a developer, you only have to describe the logic once and Pathway will handle the data updates under the hood.
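The principle behind handling out-of-order data can be sketched in a few lines of plain Python. This is an illustration of the general technique, not Pathway's engine or API: results are keyed by the time a measurement was recorded (event time) rather than the time it arrived, so a late data point simply updates the window it belongs to. The `WindowedMax` class, the one-hour window size, and the sample events are all hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # assumed one-hour tumbling windows


class WindowedMax:
    """Tracks the maximum temperature per (device, event-time window)."""

    def __init__(self):
        self._max = defaultdict(lambda: float("-inf"))

    def update(self, device_id, event_time, temperature_c):
        """Apply one data point incrementally, whenever it arrives."""
        # The window is derived from when the value was measured,
        # not from when it reached us.
        window_start = event_time - event_time % WINDOW_SECONDS
        key = (device_id, window_start)
        self._max[key] = max(self._max[key], temperature_c)
        return key, self._max[key]

    def snapshot(self):
        return dict(self._max)


# (device_id, event_time, temperature_c); illustrative values only.
in_order = [("C1", 0, 4.0), ("C1", 1800, 9.0), ("C1", 3600, 4.5), ("C1", 7200, 5.0)]
# Same events, but the device lost connectivity and sent a backlog later.
as_arrived = [("C1", 0, 4.0), ("C1", 7200, 5.0), ("C1", 1800, 9.0), ("C1", 3600, 4.5)]

ordered, disordered = WindowedMax(), WindowedMax()
for event in in_order:
    ordered.update(*event)
for event in as_arrived:
    disordered.update(*event)
```

Both instances end up with identical results, and the late 9.0 reading correctly revises the first hour's maximum after the fact. The logic is described once in `update`; the arrival order does not change the outcome.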
IoT Data Analytics: Combining Batch and Real-Time
Now let’s go one last step further. Your IoT data analytics system is in place and you’ve built a system that is able to confidently handle out-of-order data points. The next challenge is one of interpretation: how do you know whether an alert coming from one of your IoT devices requires action?
Let’s say one of your shipping containers has a refrigerated section which is monitored by an IoT device. This IoT device measures the temperature and sends out an alert if the temperature rises above a certain threshold value. Now, it is very possible that in the course of the container’s journey the temperature might rise in the refrigerated section for completely legitimate reasons that do not require any kind of costly intervention. The door might be opened at a customs inspection, for example, or the contents may have been off-loaded and refrigeration is no longer needed.
This means you can’t depend on just the IoT data coming from the temperature sensor. In order to make the right decisions, you need to perform IoT data analytics on this data point within the context of other data points about the journey, the cargo, and the environment. This is called contextual anomaly detection.
Contextual anomaly detection requires a mix of real-time processing of the IoT data together with batch processing of available historical data in order to appropriately interpret the signal coming from your IoT device. By combining your real-time IoT data stream with historical data, you can then enrich your data pipeline to guide decision-making.
Pathway makes it easy to switch from batch code to streaming.
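As a rough illustration of contextual anomaly detection, the plain-Python sketch below combines a baseline computed in batch from historical readings with journey context before deciding whether a live temperature reading needs action. It is one simple way to do this, not Pathway's API or a recommended production method. The `Context` fields, the z-score rule, the threshold of 3.0, and all sample values are assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Context:
    """Journey context for one container (hypothetical fields)."""
    refrigeration_required: bool
    inspection_windows: tuple  # (start, end) times of expected door openings


def build_baseline(historical_temps):
    """Batch step: summarize historical readings as (mean, std deviation)."""
    return mean(historical_temps), stdev(historical_temps)


def needs_action(temp_c, event_time, baseline, context, z_threshold=3.0):
    """Streaming step: interpret one live reading in its context."""
    mu, sigma = baseline
    if not context.refrigeration_required:
        return False  # cargo off-loaded, cooling no longer needed
    if any(start <= event_time <= end for start, end in context.inspection_windows):
        return False  # e.g. door opened at a planned customs inspection
    # Flag only readings far above what this container normally shows.
    return (temp_c - mu) > z_threshold * sigma


# Illustrative values only.
baseline = build_baseline([4.0, 4.2, 3.8, 4.1, 3.9])
in_transit = Context(refrigeration_required=True, inspection_windows=((1000, 2000),))
unloaded = Context(refrigeration_required=False, inspection_windows=())

alert_at_sea = needs_action(9.0, 5000, baseline, in_transit)
alert_at_customs = needs_action(9.0, 1500, baseline, in_transit)
alert_after_unload = needs_action(9.0, 5000, baseline, unloaded)
normal_reading = needs_action(4.3, 5000, baseline, in_transit)
```

The same 9.0 °C reading triggers action at sea but not during a planned inspection or after the cargo has been off-loaded, which is exactly the distinction a fixed threshold on the sensor alone cannot make.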
IoT Data Analytics: Conclusion
This article has introduced you to the concept of IoT data analytics and presented the main challenges and opportunities of working with IoT data. Understanding these concepts will enable you to extract value from your IoT deployment. Performing IoT data analytics is most powerful when done at scale and in real time. Pathway helps you do this by making it easy for developers to switch from batch to real-time with a single Python syntax and by supporting incremental updates to effectively handle out-of-order data points. To get started with Pathway, take a look at the User Guide.
If you have more questions or would like to discuss the application of IoT data analytics to your specific use case, join the #IoT-data channel on Discord.