The Teachable Camera

August 28, 2020

Luke Berndt • Senior Director

John Speed Meyers • Data Scientist

This post originally appeared on the IQT Blog.

(Or Has Peter Rabbit Met His Match?)

Beatrix Potter’s The Tale of Peter Rabbit might have turned out differently if Mr. McGregor, the gardener and antagonist of the classic children’s book, had employed the technology embodied in the Teachable Camera, a new open-source computer vision hardware and software kit created by IQT Labs. Unfortunately for the protagonist Peter, a rabbit intent on eating vegetables in Mr. McGregor’s garden, the Teachable Camera is a low-power, rugged, inexpensive video camera that performs computer vision on-device and, moreover, lets the user retrain its machine learning models for custom object detection. The importance of this technological combination, which we explain further below, reaches beyond imaginary tales, though: the cheap computing enabled by computers like the Raspberry Pi, the increasing ease of deploying machine learning models on small computers, and the growth of user-friendly, open-source machine learning software mean that citizens, companies, and governments will enter an age in which low-cost smart sensors proliferate. This short article, the first in our Teachable Camera series, explains the underlying technology and uses the story of Peter to explain how to use computer vision and the Teachable Camera to monitor remote locations.

Teachable Camera Briefly Explained

This combined hardware and software project allows a user to deploy customized computer vision models to low-cost, embedded machine learning accelerators, such as Google’s Coral Development Board. The Teachable Camera performs object detection (that is, identifying which types of objects appear in a camera image and where) on-device and without an internet connection. When specific objects are detected, the camera sends an alert back to the operator. The machine learning models included as part of the Teachable Camera use a neural network architecture called MobileNet, a deep learning model designed especially for mobile phones and other small computers. This model is initially trained on a dataset called MS COCO (Microsoft’s Common Objects in Context), which enables it to recognize a general set of objects in realistic environments. The user can then use the software contained in the Teachable Camera, which builds upon TensorFlow Lite (a machine learning framework designed for embedded and mobile devices), to train their own object detection models. Additional technical details are available on the project’s open-source GitHub page.
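To make the "detect, then alert" step concrete, here is a minimal sketch of how detections from one frame might be turned into alerts. It is not the Teachable Camera's actual code: the `Detection` class, the `detections_to_alert` function, the 0.5 threshold, and the sample values are all hypothetical, and the frame below is a hand-written stand-in for real model output.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # class name predicted by the model
    score: float  # model confidence, from 0.0 to 1.0
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def detections_to_alert(detections, watch_labels, min_score=0.5):
    """Keep detections of watched classes above the threshold, best score first."""
    hits = [d for d in detections
            if d.label in watch_labels and d.score >= min_score]
    return sorted(hits, key=lambda d: d.score, reverse=True)


# Hypothetical output for a single frame; a real system would get this from the model.
frame = [
    Detection("bird", 0.91, (10, 20, 60, 80)),
    Detection("person", 0.42, (100, 40, 180, 220)),
    Detection("cat", 0.77, (200, 150, 260, 210)),
]

alerts = detections_to_alert(frame, watch_labels={"cat", "person"})
for d in alerts:
    print(f"ALERT: {d.label} ({d.score:.2f}) at {d.box}")
```

The bird is ignored because nobody asked to watch for it, and the low-confidence person is dropped by the threshold; only the cat produces an alert. Raising or lowering `min_score` trades missed detections against false alerts.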

How On-Device Computer Vision Helps Mr. McGregor

Beatrix Potter makes it clear that Mr. McGregor suffers from a problem faced by most gardeners: just as his produce reaches the peak of perfection and becomes ready for harvest, someone (or something) mysteriously pilfers it. Mr. McGregor suspects a local rabbit, squirrel, or even badger as the culprit, but he cannot rule out his neighbor’s kid.

You can imagine Mr. McGregor spending a sleepless night in the garden to little avail. It’s also not far-fetched to imagine him devising an alarm that is triggered by motion or setting up a security camera. Unfortunately, the motion detector is likely to generate many false alerts, and the security camera requires Mr. McGregor to constantly monitor the screen to catch the culprit in the act. If he were a thoroughly modern man, Mr. McGregor might even decide that machine learning could solve his problems. However, he would likely be disappointed to find that his Nest camera, which needs to stream its footage to the cloud for processing, cannot be moved to the garden because his Wi-Fi doesn’t reach that far.

Luckily for Mr. McGregor (though not Peter), the Teachable Camera solves or at least ameliorates these problems. The software in the Teachable Camera uses computer vision, not just motion detection, to detect when an object of interest is present in the image, reducing false alerts. The use of computer vision models also means that Mr. McGregor does not need to keep his attention on a security camera feed and can instead receive custom alerts. Finally, the Teachable Camera overcomes the internet connection issue: the machine learning is done on the device (not in the cloud), and the camera uses a wireless protocol called LoRa that can operate in a remote area without internet connectivity.
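Because LoRa is a low-bandwidth link, an alert has to be small: a few bytes describing what was seen, not the image itself. The sketch below shows one way such a message could be packed using Python's standard `struct` module. The message layout, the `CLASS_IDS` table, and the function names are assumptions made for illustration; this post does not describe the format the Teachable Camera actually sends.

```python
import struct
import time

# Hypothetical class IDs; a real deployment would take these from its label file.
CLASS_IDS = {"person": 0, "rabbit": 1, "squirrel": 2}
ID_TO_CLASS = {class_id: name for name, class_id in CLASS_IDS.items()}

# Assumed layout, big-endian: 1-byte class ID, 1-byte confidence (0-100),
# 4-byte Unix timestamp. Six bytes in total.
ALERT_FORMAT = ">BBI"


def encode_alert(label, score, timestamp=None):
    """Pack one detection into a compact byte string."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return struct.pack(ALERT_FORMAT, CLASS_IDS[label], round(score * 100), ts)


def decode_alert(payload):
    """Unpack a byte string produced by encode_alert."""
    class_id, percent, ts = struct.unpack(ALERT_FORMAT, payload)
    return ID_TO_CLASS[class_id], percent / 100, ts


payload = encode_alert("rabbit", 0.87, timestamp=1598572800)
print(len(payload), "bytes:", decode_alert(payload))
```

The receiving side, such as a base station in Mr. McGregor's house, would call `decode_alert` on each message it receives. Storing the confidence as a whole percentage loses some precision but keeps it to a single byte.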

Training Customized Models that Help Mr. McGregor

The Teachable Camera also solves a second problem for Mr. McGregor. Even if he had been able to extend the range of his Wi-Fi and connect a Nest or Nest-like device to the internet, Mr. McGregor would still have lacked the ability to detect and identify small animals. His smart device might have had a person detection capability or maybe even an animal detection capability, but it’s unlikely that it would have had a rabbit detection capability. If Mr. McGregor wants to “teach” (to borrow the metaphor of machine learning) the camera to recognize the animals that interest him, then he will need to retrain the on-board machine learning models. Unfortunately, most computer vision-enabled security cameras do not possess a feature that lets users easily retrain the models on new data and with new object classes.

The Teachable Camera, however, makes retraining the models that run on the hardware straightforward. For instance, Mr. McGregor could annotate all the small animal images he has already collected, using more specific labels such as “bunny” and “squirrel.” He can then retrain the models on a standard computer using a graphics processing unit (GPU) appropriate for machine learning. After retraining the models with these more specific animal categories, he can deploy them back to the camera. Watch out, Peter Rabbit!
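Before retraining, it helps to check what the annotations actually contain, since a class with very few examples is unlikely to be learned well. The following sketch assumes a simple, made-up annotation format (one dictionary per labeled box) and hypothetical helper names; the project's real annotation format and training scripts are documented on its GitHub page, not here.

```python
from collections import Counter

# Hypothetical annotation records: one dict per labeled bounding box.
annotations = [
    {"image": "garden_001.jpg", "label": "bunny", "box": (34, 50, 120, 140)},
    {"image": "garden_001.jpg", "label": "squirrel", "box": (200, 80, 250, 130)},
    {"image": "garden_002.jpg", "label": "bunny", "box": (60, 90, 150, 170)},
]


def build_label_map(records):
    """Assign an integer ID to each class name, in alphabetical order."""
    names = sorted({record["label"] for record in records})
    return {name: class_id for class_id, name in enumerate(names)}


def class_counts(records):
    """Count how many labeled boxes exist for each class."""
    return Counter(record["label"] for record in records)


label_map = build_label_map(annotations)
counts = class_counts(annotations)

for name, class_id in label_map.items():
    print(f"{class_id}: {name} ({counts[name]} boxes)")
```

Sorting the class names keeps the IDs stable from one run to the next, which matters because the camera needs the same mapping to turn a predicted ID back into a name.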

Limitations or Why Peter Rabbit Might Still Evade Detection

Before you worry about Peter Rabbit’s safety, we should point out that the Teachable Camera still might not be enough for Mr. McGregor to actually catch his vegetable thief. The computer vision models embedded in the Teachable Camera can be sensitive to the camera angle and time of day of the images on which they were trained, so they may perform worse when conditions in the garden differ. Additionally, even setting aside Peter’s ability to engage in deception, deep learning models still make mistakes. That said, this technological suite of low-cost computers and cameras combined with capable open-source machine learning software seems to herald a transformation of cameras from simple capture devices into sensors that provide higher-level information. Harnessing these technologies will be the challenge for engineers, technologists, and policy-makers alike. We invite you to download the Teachable Camera and see if it can work for you! (And a final note: Don’t go catching Peter! Just put a fence up.)