
From Raw Telemetry to Actionable Insights: Building an ML Pipeline for Industrial Machinery Usage Detection

A blogpost by
Jeroen Boeye
24 September 2025

Discover our proven 6-step ML pipeline that transforms raw IoT telemetry data into actionable insights for industrial machinery. From feature engineering and clustering to automated pattern detection and dashboards, learn how Faktion gave a global manufacturer unprecedented visibility into how its machines are actually used in the field.


For many equipment manufacturers, one fundamental question remains unanswered:

How are their machines actually being used in the field?

Industrial machinery often operates in diverse, unpredictable environments. And the same machine can be used for different applications. Yet, despite decades of equipment expertise, many manufacturers still lack structured insight into how their machines are actually used in the field.

One global machinery manufacturer approached us with a clear challenge:

They had an IoT platform streaming telemetry data from thousands of machines worldwide, but no insight into which applications those machines were being used for. Such insight would enable:

  1. Product optimisation:  Engineers can adapt machine design to reflect how customers actually use them.
  2. Sales enablement: If a customer uses a machine unsuited for their task, sales teams can recommend better-fit equipment.
  3. Energy efficiency: Customers who leave a machine idling for hours instead of switching it to Eco mode can be made aware and save fuel.
  4. Marketing campaigns: Campaigns tailored to actual usage have a bigger impact.
  5. Service and maintenance: Linking applications to wear-and-tear improves predictive maintenance accuracy.
  6. Market insights: Understanding which applications dominate across regions or industries informs future R&D and strategy.

This blog post outlines the step-by-step process we designed to turn raw telemetry into meaningful machine-usage insights, giving the manufacturer a unique level of visibility into machine behaviour across their global fleet.

From Raw Data to Actionable Insights

Step 1 – Data Aggregation and Feature Engineering

Every run of a machine can last hours and generates thousands of datapoints across multiple sensor signals and operational states (such as idle, eco, or active).

Our first step was to summarise each run into a set of features — numerical descriptors that capture the shape and behaviour of the signals. We engineered more than a hundred such features, including:

  • Statistical summaries: averages, percentiles, variation, skewness, kurtosis.
  • Dynamic behaviour: how signals rise or fall together, how often peaks occur, and how sharply values change.
  • Operating modes: the time spent idle, in eco mode, or actively running.
  • Shape-based features: automatically extracted patterns from the time series that repeat across different runs.

For more complex shapes, we applied automated feature extraction methods (e.g. MultiRocket) to capture “shapelets”: repeating local motifs in time series.

This way, instead of thousands of datapoints per run, each run was summarised in a compact set of high-quality features: rich enough to capture what happened, yet simple enough to compare across machines.
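As a minimal sketch of this summarisation step (the signal values, mode names, and feature names here are illustrative, not the client's actual schema), collapsing one run into a feature vector might look like:

```python
import statistics

def summarise_run(values, modes):
    """Collapse one run's raw samples into a small feature vector.

    values: sensor readings sampled over the run
    modes:  per-sample operating mode ("idle", "eco", "active")
    """
    n = len(values)
    ordered = sorted(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Statistical summaries: averages, percentiles, variation, skewness
    features = {
        "mean": mean,
        "std": std,
        "p10": ordered[int(0.10 * (n - 1))],
        "p90": ordered[int(0.90 * (n - 1))],
        # Third standardised moment as a simple skewness estimate
        # (guard against division by zero for constant signals)
        "skew": (sum((v - mean) ** 3 for v in values) / n) / (std ** 3 or 1.0),
    }
    # Dynamic behaviour: how sharply values change between samples
    diffs = [b - a for a, b in zip(values, values[1:])]
    features["max_jump"] = max(abs(d) for d in diffs) if diffs else 0.0
    # Operating modes: fraction of the run spent in each mode
    for mode in ("idle", "eco", "active"):
        features[f"frac_{mode}"] = modes.count(mode) / n
    return features
```

In production this would run over every signal of a run rather than a single one, yielding the hundred-plus features described above.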

Step 2 – Clustering and Dimensionality Reduction

With features in place, we wanted to see whether runs naturally grouped into distinct categories. To do this, we applied a dimensionality reduction technique (t-SNE), which compresses the complex, multi-feature data into two dimensions.

This allowed us to identify recurring usage patterns: groups of runs that consistently behaved alike. These patterns became the foundation for labelling and training models.
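The reduction step can be sketched with scikit-learn's t-SNE implementation; here the feature matrix is synthetic stand-in data (two loose groups of runs), not real telemetry:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for the run-level feature matrix:
# 60 runs with 20 features each, forming two loose groups.
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 20))
group_b = rng.normal(loc=5.0, scale=1.0, size=(30, 20))
X = np.vstack([group_a, group_b])

# Compress the multi-feature space to 2-D for visual cluster inspection.
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X)
print(embedding.shape)  # one 2-D point per run, ready to plot
```

Plotting the resulting 2-D points is what reveals the dense clusters that the labelling step below exploits.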

Step 3 – Semi-Automated Labelling

A machine learning model is only as good as its training data. Our client had no labelled dataset to start with, so we built one using an iterative labelling process that blended automation with expert validation:

  • Clustering-assisted filtering: Analysts zoomed into dense clusters of runs and labelled them in bulk when similarities were obvious.
  • Rule-based labelling: Simple, repetitive behaviours like constant idle or steady eco mode were auto-labelled using predefined rules.
  • Model-assisted suggestions: Early classifiers proposed labels that experts validated or corrected. These corrections became new training data.

This approach steadily expanded the labelled dataset without overwhelming domain experts, making the labelling process efficient.
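The rule-based part of this hybrid labelling can be sketched as follows; the thresholds and label names are illustrative assumptions, not the client's actual rules:

```python
def rule_label(features):
    """Auto-label a run from its mode fractions, or defer to an expert.

    features: dict with frac_idle / frac_eco / frac_active in [0, 1].
    Returns a label string, or None when no rule fires confidently.
    """
    if features["frac_idle"] > 0.95:
        return "constant_idle"   # switched on but essentially never used
    if features["frac_eco"] > 0.90:
        return "steady_eco"      # long, steady eco-mode operation
    return None                  # ambiguous: route to expert review

runs = [
    {"frac_idle": 0.99, "frac_eco": 0.00, "frac_active": 0.01},
    {"frac_idle": 0.02, "frac_eco": 0.95, "frac_active": 0.03},
    {"frac_idle": 0.30, "frac_eco": 0.20, "frac_active": 0.50},
]
labels = [rule_label(r) for r in runs]
```

Runs that fall through the rules (the `None` cases) are exactly the ones handed to clustering-assisted filtering or model-assisted suggestion.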

Step 4 – Model Training and Evaluation

Once enough patterns were labelled, we trained supervised classification models to detect those patterns automatically on new runs.

All experiments were tracked using Databricks’ MLflow integration, which enabled us to:

  • Compare models trained on different datasets or features.
  • Visualise performance with metrics and confusion matrices.
  • Version models so that the best-performing ones could be deployed with confidence.

This systematic approach ensured models improved over time and were production-ready.
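The confusion matrices mentioned above need no particular framework; a minimal sketch of the idea, with illustrative pattern names, is:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (true, predicted) label pairs into a nested dict."""
    counts = Counter(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}

# Expert-validated labels vs. model predictions for six runs
y_true = ["idle", "idle", "eco", "active", "active", "active"]
y_pred = ["idle", "eco",  "eco", "active", "active", "idle"]
cm = confusion_matrix(y_true, y_pred)
```

Logging such a matrix alongside each training run (e.g. as an MLflow artifact) is what makes model comparisons across datasets and feature sets systematic.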

Step 5 – Recurrent Jobs and Data Engineering

To move beyond one-off analysis, we built a recurring data pipeline in Databricks. Every new run streamed from the client’s IoT platform was automatically processed:

  1. Features were generated.
  2. The trained model classified the run into patterns.
  3. Results were written back to structured tables for downstream use.

This automation means the system evolves continuously, incorporating fresh field data without manual intervention.
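A stripped-down version of that per-run flow, with a stub classifier standing in for the trained model (all names and thresholds here are illustrative), looks like:

```python
def process_run(raw_values, raw_modes, classify, table):
    """Process one incoming run: featurise, classify, persist."""
    # 1. Features are generated.
    n = len(raw_values)
    features = {
        "mean": sum(raw_values) / n,
        "frac_idle": raw_modes.count("idle") / n,
    }
    # 2. The trained model classifies the run into a pattern.
    pattern = classify(features)
    # 3. Results are written back to a structured table for downstream use.
    table.append({"pattern": pattern, **features})
    return pattern

def stub_model(features):
    # Stand-in for the trained classifier: flags idle-dominated runs.
    return "idle_run" if features["frac_idle"] > 0.8 else "productive"

results = []  # stand-in for the structured output table
process_run([0.1] * 9 + [5.0], ["idle"] * 9 + ["active"], stub_model, results)
```

In the real pipeline the same three steps run as a scheduled Databricks job, with Delta tables in place of the in-memory list.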

Step 6 – Insights and Dashboards

Finally, insights were surfaced through interactive dashboards. These dashboards showed:

  • Per machine: timelines of runs, with each run colour-coded by detected pattern, making usage behaviour easy to follow.
  • Per machine model: aggregated statistics that revealed how models were being used across different regions and industries.
  • Opportunities for action: machines that spent most of their time idling or running inefficiently could be flagged, opening discussions with customers on energy savings or better equipment fit.

This gave engineers and business teams unprecedented visibility into issues such as:

  • Consistently misused machines (e.g., switched on but left unused in 80% of runs).
  • Regional anomalies (e.g., faster wear of machines in certain countries, linked to local conditions).

One striking insight was the prevalence of “idle” runs: machines idling for hours, burning diesel with no productive output. These cases alone opened immediate cost-saving opportunities for customers.
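The "opportunities for action" view boils down to a simple aggregation over the classified runs; a sketch with an illustrative threshold and made-up fleet data:

```python
def flag_idle_machines(run_patterns, threshold=0.5):
    """Flag machines whose share of 'idle_run' patterns exceeds threshold.

    run_patterns: mapping of machine id -> list of detected run patterns.
    Returns {machine id: idle share} for every flagged machine.
    """
    flagged = {}
    for machine, patterns in run_patterns.items():
        idle_share = patterns.count("idle_run") / len(patterns)
        if idle_share > threshold:
            flagged[machine] = idle_share
    return flagged

# Made-up fleet data for illustration
fleet = {
    "M-001": ["idle_run", "idle_run", "idle_run", "productive"],
    "M-002": ["productive", "productive", "idle_run"],
}
flagged = flag_idle_machines(fleet)
```

The flagged list is what feeds the customer conversations about energy savings or better equipment fit.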

Conclusion

Transforming raw IoT telemetry into actionable insights is not straightforward. Our project showed that:

  • The key bottleneck is not the model, but curating a high-quality labelled dataset.
  • Applications break down into repeating patterns, which must be detected before higher-level application classification.
  • Hybrid labelling is essential. Combining expert rules, clustering, and model-assisted review enabled rapid dataset growth.
  • Feature engineering is still king; despite advances in autoML and time-series feature extraction, domain-guided features proved critical.
  • Visual dashboards drive adoption by connecting technical insights to business value.

With this foundation in place, our client can now expand pattern detection across their global fleet, connect usage insights to commercial opportunities, and ultimately close the loop between machine telemetry, engineering design, and customer engagement.

Jeroen Boeye
Head of AI