Working of Smart Home Devices

Now that the general idea of a smart home device has been covered, we need more clarity about how these devices physically work.
What do they save? What do they process, and where do they process it? But maybe the best question to begin with is:
What sensors do smart home devices have?

1.0 Sensors

Smart home devices are designed for different intended use cases.

Picture of a Smart Lightbulb zoofy.nl
Smart Lightbulb

Just on/off.

Picture of a Google Home youversion.com
Google Home

Sound.

Picture of a Smart Vacuum irobot.com
Smart Vacuum

Environment scanning sensors.

Picture of a Ring Doorbell ring.com
Ring Doorbell

Sound and Visuals.

Each of these devices is designed for specific use cases. Depending on the intended function, these use cases may require multiple sensors to ensure the device is functioning properly.

1.1. Google Nest Mini

Let’s take the Google Home Mini, now called the Nest Mini. As the manufacturer puts it, this device is:

“The little speaker that’s a huge help around the house.”

A little speaker for sure, but what does it do on the inside to be the "huge help"?
This speaker is equipped with capacitive touch sensors (used as buttons), three far-field microphones and, of course, a speaker.

These sensors are needed to create the “huge help”. That raises the question: what else does it pack on the inside?
Apart from these, this particular device contains only some processors and processing units. Not too interesting.

1.2. Ring Doorbell

The other device we are going to check is the Ring Doorbell. As Ring puts it:

“Ring Video Doorbells allow you to see, hear and speak to visitors, from anywhere.”

Well, isn’t that great, always being able to check who is in front of your door.

As far as sensors go, Ring is less specific than Google about what is inside the device.
Ring tells us that it has video capabilities, motion detection and two-way audio. What does it really have inside? That’s not exactly clear from looking at the product page.

After some more digging we found that the Ring Doorbell has an HD camera, a button, a speaker, a microphone, “motion detection sensors” and infrared LEDs for the camera's night vision.
With all these sensors, the Ring Doorbell can collect a lot more data than the Google Nest Mini.

2.0 Data Processing

Alright, now that we know the insides of a small smart speaker and a camera doorbell, the following question comes to mind: "What is controlling the sensors, and what does the device do with the collected data?"

2.1. Google Nest Mini

The Google Nest Mini has multiple microphones that listen for the magic word:

“Ok Google.” Right?

According to Google:

“Google Assistant is designed to wait in standby mode until it detects an activation, like when it hears "Hey Google."”

According to Google, in standby mode the device records small parts of audio every few seconds to check for the magic word. Once the magic word is detected, the device is activated (shown by an indicator light) and records the user's request. This request is then sent to Google's servers. More about the servers later.
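The standby-then-activate behaviour described above can be pictured as a small state machine. The sketch below is only an illustration: the class `SpeakerSketch`, its attributes and the use of plain text instead of real audio are assumptions made for this example, not Google's actual implementation.

```python
from enum import Enum


class State(Enum):
    STANDBY = "standby"
    ACTIVE = "active"


class SpeakerSketch:
    """Toy model of a smart speaker; snippets are plain strings, not audio."""

    WAKE_PHRASE = "hey google"  # the activation phrase quoted in the text

    def __init__(self):
        self.state = State.STANDBY
        self.indicator_light = False
        self.sent_to_server = []  # hypothetical stand-in for the upload

    def hear(self, snippet):
        text = snippet.lower().strip()
        if self.state is State.STANDBY:
            # In standby, a snippet is only checked for the wake phrase.
            if self.WAKE_PHRASE in text:
                self.state = State.ACTIVE
                self.indicator_light = True
            return
        # Active: the request is recorded and sent, then back to standby.
        self.sent_to_server.append(text)
        self.state = State.STANDBY
        self.indicator_light = False


speaker = SpeakerSketch()
for snippet in ["what a nice day", "Hey Google", "what is the weather", "see you later"]:
    speaker.hear(snippet)
print(speaker.sent_to_server)
```

In this toy model, only the snippet that follows the wake phrase ends up in `sent_to_server`; everything heard in standby is checked and then dropped.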

2.2. Ring Doorbell

The Ring Doorbell boasts a motion detection feature that's always on the lookout for movement.

But how does it do this?

Let’s start with the main component, the camera. The camera, according to Ring, only starts recording when one of the motion sensors sees, well, motion. The camera also starts recording when the doorbell is pressed physically.
The Ring Doorbell also has a microphone, so audio is recorded together with the camera footage. Whether these videos are stored online depends on whether you have a subscription.
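The recording rules described above can be written down as a few lines of logic. This is a minimal sketch under stated assumptions: `DoorbellEvent`, `handle_event` and the returned field names are hypothetical and only mirror the description in this section, not Ring's real firmware.

```python
from dataclasses import dataclass


@dataclass
class DoorbellEvent:
    motion_detected: bool
    button_pressed: bool


def handle_event(event, has_subscription):
    """Decide what the doorbell does with one event (illustrative only)."""
    # Recording starts on motion or on a physical button press.
    recording = event.motion_detected or event.button_pressed
    return {
        "record_video": recording,
        "record_audio": recording,  # the microphone records with the camera
        # Assumption: a clip is only kept online when a subscription exists.
        "store_in_cloud": recording and has_subscription,
    }


print(handle_event(DoorbellEvent(True, False), has_subscription=False))
print(handle_event(DoorbellEvent(False, True), has_subscription=True))
```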

Ring says the footage is stored safely, out of reach of anyone else. But is it really? Is it a security camera just for you? More on this later in this course.

3.0 Where is the data being processed?

Having explored the sensors within two smart devices and the data they process, let's turn our attention to the processing itself and where it takes place.

First, it's crucial for us to distinguish between on-device and cloud processing. This is a subtle difference that can easily be overlooked, especially in our fast-paced, tech-savvy era.

3.1. On-Device

On-device processing means that the data DOES NOT leave the device. It is processed on the device itself. Nowadays this is made possible by so-called ’Machine Learning Engines’.

Machine Learning Engines can be thought of as processors or computers programmed with specific “tasks” to search for. Take the Google Nest Mini as an example. It's programmed to respond to the command "Hey Google". Google has trained a program to listen specifically for these two words. Just these two.

So, the Nest Mini is always listening for voice snippets, and the internal computer swiftly checks if these snippets contain the activation phrase. Detecting this isn't a challenging task, and it's something that can be performed locally on the device itself. That's why it's called on-device.

Some other things that can be done On-Device are:

GIF of a smart speaker detecting a wake word Giphy.com
Wake Word Detection

Smart speakers like Google Home or Amazon Echo use on-device machine learning to detect their wake words (Hey Google or Alexa). The device is constantly listening for these specific words on the device itself.

GIF of a smartphone using facial recognition Tenor.com
Facial Recognition

Many smartphones use on-device machine learning for facial recognition features. When you unlock your phone with your face, the device uses a machine learning algorithm to compare the current image with the stored image of your face.

GIF of a smartphone keyboard predicting text Stackoverflow.com
Text Prediction

Text prediction on your smartphone keyboard is a form of on-device machine learning. As you type, the device predicts what word you'll type next based on your typing history.

GIF of a fitness tracker recognizing physical activity Pinterest.com
Activity Recognition

Many fitness trackers and smartwatches use on-device machine learning to recognize different types of physical activity (like walking, running, or cycling). These devices use machine learning algorithms to analyze sensor data and categorize the user's current activity.
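To make the text prediction example above more concrete, here is a minimal sketch that predicts the next word from a typing history. It uses simple word-pair counting rather than a real machine learning model, and the function names `build_model` and `predict_next` as well as the sample history are made up for this illustration. The point is that everything runs locally: the history never has to leave the device.

```python
from collections import Counter, defaultdict


def build_model(history):
    """Count which word follows which in the typing history."""
    following = defaultdict(Counter)
    for sentence in history:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


history = ["see you soon", "see you tomorrow", "see you soon then"]
model = build_model(history)
print(predict_next(model, "you"))    # soon
print(predict_next(model, "hello"))  # None
```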

3.2. Cloud

Cloud processing means that the data DOES leave the device. It is processed in the cloud, in massive data centers scattered across the globe.

Like on-device processing, these data centers also utilize "Machine Learning Engines" to analyze data. However, it's essential to understand that these data-center computers are way, WAY bigger and more powerful than the modest computer in a device like the Google Nest Mini.

Because of their superior processing power, these computers in the data centers can run much more advanced "Machine Learning" programs.

Consider a voice-activated speaker like the Google Nest Mini: when you pose a question to the device, your voice is recorded and transmitted over the internet to Google's cloud servers. These servers employ machine learning algorithms to interpret your speech, determine the best response, and then send that response back to your device.

This process demands the computational power and storage capacity of the cloud, as it involves complex tasks like natural language processing and accessing large databases of information. This is just not feasible with on-device processing.
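The round trip described above can be sketched as two functions, one standing in for the device and one for the cloud. Everything here is an assumption made for illustration: a real speaker sends recorded audio rather than text, and the field names, the `KNOWLEDGE_BASE` dictionary and both function names are hypothetical.

```python
import json

# Hypothetical stand-in for the large databases a data center can query.
KNOWLEDGE_BASE = {
    "capital of france": "Paris",
    "boiling point of water": "100 degrees Celsius at sea level",
}


def device_build_request(recorded_request):
    """On the device: package the recorded request for sending."""
    # Assumption: text is used here in place of real audio data.
    return json.dumps({"device": "speaker-sketch", "query": recorded_request})


def cloud_handle_request(payload):
    """In the 'cloud': interpret the request and look up a response."""
    query = json.loads(payload)["query"].lower()
    for topic, answer in KNOWLEDGE_BASE.items():
        if topic in query:
            return json.dumps({"answer": answer})
    return json.dumps({"answer": "Sorry, I don't know that."})


request = device_build_request("What is the capital of France?")
response = cloud_handle_request(request)
print(json.loads(response)["answer"])
```

Note that the request leaves the device as soon as `device_build_request` hands its payload to the cloud function; from that point on, the device has no control over what happens to it.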

Other examples of Cloud processing are:

GIF of a social media platform using image recognition Gifer.com
Image Recognition and Analysis

Social media platforms like Facebook use cloud-based machine learning for their image recognition features. When you upload a photo, it's sent to Facebook's cloud servers, which use machine learning algorithms to identify and tag faces, recognize objects, and even interpret the content of the image. This requires significant computational power and large datasets for training, which are available in the cloud.

GIF of a chatbot using conversational AI Giphy.com
Chatbots and Conversational AI

Cloud-based machine learning powers advanced chatbots and conversational AI systems like OpenAI's GPT-4. When you interact with a chatbot, your input is sent to the cloud. Here, machine learning algorithms process the text, comprehend the context, and generate a relevant response.

3.3. Usage Google Nest

Now that we've established the difference between on-device and cloud processing, let's examine what happens with the data from our Google Nest Mini.

First, the device listens on-device for the wake word, “Hey Google”. It does this every few seconds. If it detects the wake word, it records the sentences that follow and sends all of this to the cloud. There, the information is processed on one of the computers in a data center. Once the server understands your question, it tries to find the best response and sends the answer back to your device. This process repeats for every request.

But does this mean that Google Nest Mini doesn't listen to other words or things we say? We can't say for sure. This is where we have to place our trust in large companies like Google, Amazon, Meta, Microsoft and Apple and hope that they don't misuse the data.

What we do know is that your voice snippets will be sent to one of the big companies to check whether the wake word has been said. Human reviewers also transcribe and annotate voice clips to improve speech recognition systems.

This means that your conversations could potentially be reviewed by employees.

3.4. Usage Ring Doorbell

Now, let's delve into a product that uses more data - the Ring Doorbell.

The information about what is processed On-Device and what is processed in the Cloud for this device is rather limited. As such, we must make some educated guesses about where certain processes occur.

It's likely that the motion sensors process everything on-device. This is because motion detection isn't a complex task and needs to be done quickly, which is best achieved on-device.
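To show why motion detection is light enough to run on-device, here is a minimal sketch of one generic technique, frame differencing: two camera frames are compared and motion is reported when enough pixels have changed. This is not Ring's documented method (the product may well rely on dedicated motion sensors instead), and the function name, thresholds and tiny sample frames are assumptions made for this example.

```python
def motion_detected(previous, current, pixel_delta=30, changed_fraction=0.1):
    """Compare two greyscale frames (lists of rows of 0-255 values)."""
    changed = 0
    total = 0
    for prev_row, cur_row in zip(previous, current):
        for prev_px, cur_px in zip(prev_row, cur_row):
            total += 1
            # A pixel counts as changed when its brightness shifts enough.
            if abs(cur_px - prev_px) > pixel_delta:
                changed += 1
    # Report motion when a large enough share of the pixels changed.
    return total > 0 and changed / total >= changed_fraction


empty_porch = [[10, 10, 10, 10], [10, 10, 10, 10]]
visitor = [[10, 200, 200, 10], [10, 200, 200, 10]]
print(motion_detected(empty_porch, empty_porch))  # False
print(motion_detected(empty_porch, visitor))      # True
```

The whole check is a handful of subtractions and comparisons per frame, which even a modest processor can do without sending anything to the cloud.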

Once motion is detected, the camera and microphone kick into action and start recording video. Here's where things get murky. We're not certain whether features like "package detection" or "person detection" are processed on-device. This is exactly why Ring needs to be more transparent about it.

If you have bought a subscription from Ring, the video is stored in your online cloud storage every time motion is detected. Without a subscription, you can still check the feed on your mobile device.

So, what's happening in the cloud? Perhaps the package or person detection is processed there. Or maybe even some sophisticated facial recognition?
According to Ring, that's not the case:

"Ring does not have facial recognition technology in any of its devices or services."

Yet, it's been reported that Amazon has used public videos to train a facial recognition system.[1][2]

As for access to the footage by others, that's another question that needs answering. According to Ring:

“A small number of video recordings are viewed by our research and development team to improve Ring’s products, services and technology. These video recordings are either from customers who have made them publicly available (by posting them on the Neighbors App* or otherwise on the Internet), or from customers, team members and their friends and family who have given us explicit permission to use them for this purpose (which they may revoke at any time).”

From this, we can conclude that they can access your videos only after obtaining explicit permission. Yet, this does suggest that they have the potential to view your recordings. But how can you be sure they aren't doing this without your consent?

The reality is you can't be certain. There have been numerous cases where data was provided to law enforcement without the user's consent[1], raising serious concerns about privacy and data security.

4.0 Dangers

Because these devices are so widely used, privacy and security experts have researched them extensively.

For the smart speakers discussed earlier, there's no concrete evidence to suggest that the devices are constantly recording audio. This implies that the On-Device wake word detection is likely in operation, and only after the wake word is detected is the data sent to the cloud.

We can only hope that the audio that is reviewed is used solely for system improvement and for processing responses to requests, and not for any other purpose.

Test your knowledge on this chapter!

What is the main concern about the use of smart devices?

They are too expensive

They may record and share data without consent

They are not user-friendly

They are not durable

What is suggested about the operation of smart speakers?

They record audio constantly

They only record audio after a wake word is said

They do not record any audio

They record audio at random intervals

What incident is mentioned regarding Amazon's reviewers and audio clips?

Reviewers used audio clips for amusement

Reviewers intervened in potential criminal situations

Reviewers had direct access to personal customer information

None of the above

What information is associated with Amazon's voice recordings?

Account number, first name, and device serial number

Personally identifiable information

Randomized identifier

None of the above