Human activity capturer and classifier wins first prize at VINCI Energies hackathon

We like hackathons. And this weekend, a delegation of Faktion’s machine learning team participated in a very special one: the VINCI Energies Human Beyond Digital Hackathon in Frankfurt. We were challenged to detect human activities in videos. Out of 12 international expert teams, Faktion won first place by creating the most accurate model of all submissions.

Current situation – Human video operators

Security cameras send live video feeds to a control center where operators try to detect emergency situations that require a response. A human can focus on only one video feed at a time, yet the job requires operators to keep track of many. They are drowning in data and have no means of prioritizing what to watch. When an event of interest happens – e.g. when a person faints – it is often noticed only after valuable minutes have passed. Sometimes it is even lost in the mountain of video and never seen at all. This delay in response time costs lives: in the case of cardiac arrest, the probability of survival drops by roughly 10% (source) for every minute that passes until the person receives proper help.

Solution – detecting human behaviour and suggesting a response

Using the power of A.I., we can improve this system by giving each video feed the attention it deserves at all times. Out of all contestants, our team built both the most accurate and the fastest model for detecting human behaviour in real time. We used a state-of-the-art two-stage deep neural network to detect the persons, objects, motions and behaviour of the people in each frame.

While VINCI Energies challenged the teams to classify individual video files, they realised the actual use case would involve streaming video data. We therefore tailored our solution accordingly by showing live probabilities for the behaviours of interest. When the live probability of a behaviour crosses a critical threshold, the model suggests the appropriate response to the operator.

In case of a medical emergency, the operator can send an ambulance to the right location with a single click, saving valuable minutes. Lower response times increase the value of our solution. Additionally, by focusing attention on the videos where things are actually happening, each operator can handle more video streams, so costs go down. A double win!
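The exact alerting logic built during the hackathon is not shown in this post, but the stream-side thresholding it describes can be sketched roughly as follows. The behaviour names, smoothing window and threshold below are illustrative assumptions, not the actual hackathon settings:

```python
from collections import deque
from typing import Dict, Iterable, Iterator, Tuple


def smooth_probabilities(prob_stream: Iterable[Dict[str, float]],
                         window: int = 10) -> Iterator[Dict[str, float]]:
    """Smooth per-frame class probabilities with a moving average to suppress flicker."""
    buf: deque = deque(maxlen=window)
    for probs in prob_stream:
        buf.append(probs)
        yield {c: sum(p[c] for p in buf) / len(buf) for c in probs}


def detect_alerts(smoothed: Iterable[Dict[str, float]],
                  threshold: float = 0.8) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, behaviour) once each time a probability crosses the threshold."""
    active = set()
    for i, probs in enumerate(smoothed):
        for behaviour, p in probs.items():
            if p >= threshold and behaviour not in active:
                active.add(behaviour)
                yield i, behaviour  # here the UI would suggest a response to the operator
            elif p < threshold:
                active.discard(behaviour)


# Example: the smoothed "fainting" probability crosses 0.8 at frame 7.
stream = [{"fainting": 0.1}] * 5 + [{"fainting": 0.95}] * 10
alerts = list(detect_alerts(smooth_probabilities(stream, window=3)))
```

Smoothing before thresholding avoids firing an alert on a single noisy frame, and tracking which behaviours are already "active" ensures the operator is prompted once per event rather than once per frame.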

A very crude interface built during the hackathon to visualise the top 3 behaviours detected in the video, both overall (bar charts, left) and in real time (below the video).

Implementation

Video data makes for quite an interesting classification problem because each video contains important information in both the spatial and the temporal domain. Spatial information is available in each individual frame (e.g. the shape and location of the objects in-frame), while temporal information lies in the context of a frame in relation to earlier or later frames. To capture all this information and create an efficient classifier, we built a deep neural network architecture that combines the effectiveness of Convolutional Neural Networks (CNNs) at detecting spatial features with the ability of Long Short-Term Memory networks (LSTMs) to capture temporal features. Using transfer learning, we avoided training a CNN from scratch and instead employed the pre-trained Inception-v3 model from Google, which gives state-of-the-art results on the ImageNet classification challenge (3.46% top-5 error rate, 1.2 million images, 1000 classes). For each frame, we extracted features from its final pool layer, producing the input sequences fed to the LSTM. The model was trained on an Nvidia-provided DGX-1 station (8x Tesla V100 GPUs) and managed to accurately classify the behaviours in the given videos.
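The two-stage approach above can be sketched in Keras. This is a minimal illustration, not the hackathon code: the sequence length, LSTM width, dropout rate and class count are placeholder assumptions.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

# Stage 1: pre-trained Inception-v3 as a frozen feature extractor.
# pooling="avg" exposes the final pooling layer, giving a 2048-d vector per frame.
extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")


def video_to_sequence(frames: np.ndarray) -> np.ndarray:
    """Map an array of RGB frames (num_frames, 299, 299, 3) to (num_frames, 2048) features."""
    return extractor.predict(preprocess_input(frames.astype("float32")), verbose=0)


# Stage 2: an LSTM classifier over the per-frame feature sequences.
SEQ_LEN, NUM_CLASSES = 40, 12  # illustrative values
classifier = Sequential([
    Input(shape=(SEQ_LEN, 2048)),  # one Inception feature vector per frame
    LSTM(256),                     # captures temporal context across frames
    Dropout(0.5),
    Dense(NUM_CLASSES, activation="softmax"),  # one probability per behaviour class
])
classifier.compile(optimizer="adam",
                   loss="categorical_crossentropy",
                   metrics=["accuracy"])
```

Freezing the CNN and pre-computing the feature sequences keeps training cheap: only the comparatively small LSTM head has to be fitted on the video dataset.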

Deep learning architecture implemented

Future – activating passive cameras

When our model makes a poor suggestion, the operator simply declines it and the model learns from its mistake. The operator and the model thus improve each other's performance. In the future, our model will become even more accurate and will be deployed on cameras that currently have no operator and solely collect evidence. Since the large majority of security camera systems fall in this category, the potential is huge!

Jeroen Boeye, PhD
Head of Sensor Data
About the author

I enjoy unlocking the hidden value in data. The techniques I use to do so include data cleaning, wrangling and machine learning. To transfer the lessons learned I create clear and attractive visuals.

