Recently we heard news that the police had requested Alexa recordings to assist with a murder enquiry. The victim had an Amazon Echo and the police feel there’s useful data to be obtained.
This leads to speculation about what sort of information is recorded by these devices, and how secure they are.
What type of devices are we talking about?
There are a number of devices out there these days which you can talk to, to request things. These include Apple’s “Siri”, Google’s nameless “Assistant”, Amazon’s “Alexa” and Microsoft’s “Cortana”.
Some of these require you to press a button to wake them up, but others are always listening for a “wake word”. Here you might say “Hey Siri, what’s the weather?”. The device hears the magic word “Hey Siri” and then works out what you said and acts accordingly.
These devices come in varied form factors. Any modern smartphone likely has an assistant which may be voice activated (and may be enabled by default, though there are usually settings to turn it off or restrict when it activates); your smart watch may have one (my Motorola 360 will listen for “OK Google”, but only when the watch is active, so raise your arm to wake it up and talk to it); your laptop may have one (macOS Sierra has Siri, but requires a button press to activate); tablets may have them; and, of course, there are the new “hot” items, the Amazon Echo and Google Home, which are designed to sit in your house and listen to you.
How do they work?
These devices don’t typically process your voice commands locally; they just don’t have the processing power or storage to understand natural language commands. Instead they are always listening for the wake word. The wake word is generally fixed or (as with Samsung’s attempt) trainable, and this detection is done locally.
Once the magic word is said, the device starts streaming data to the service provider, along with a small buffer of audio from just before the command (Amazon claim “a fraction of a second” in their FAQ).
The cloud service now has a copy of your voice recording, can attempt to parse it, and respond accordingly. Because the request is parsed remotely it can be made smarter, and new commands can be added, just by the provider updating their servers; no update to the device itself is needed. New “skills” can be added at any time.
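To make this flow concrete, here is a toy sketch of that loop in Python (my own illustration under stated assumptions, not Amazon’s actual code): audio “frames” are simulated as strings, wake-word detection happens locally, and nothing is “sent” until the wake word appears, at which point a small pre-roll buffer goes first.

    # Toy sketch of the wake-word flow described above (not Amazon's code).
    # Audio frames are simulated as strings; send_to_cloud() stands in for
    # the TLS stream to the provider's servers.
    from collections import deque

    WAKE_WORD = "alexa"
    PRE_ROLL = deque(maxlen=3)   # the small buffer kept from just before the wake word

    def send_to_cloud(frames):
        print("streaming:", " ".join(frames))    # placeholder for the real upload

    def on_audio_frame(frame, streaming):
        PRE_ROLL.append(frame)                   # always listening locally
        if not streaming and WAKE_WORD in frame.lower():
            send_to_cloud(list(PRE_ROLL))        # pre-roll buffer goes first
            return True                          # now stream the live request
        if streaming:
            send_to_cloud([frame])
        return streaming

    # Simulated microphone input: nothing leaves the "device" until "alexa" is heard.
    streaming = False
    for frame in ["background", "chatter", "alexa", "what's", "the", "weather"]:
        streaming = on_audio_frame(frame, streaming)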
Amazon Echo in practice
I received an Echo as a present so, of course, I started to network sniff to see what it was doing. Excluding DHCP/ARP and other normal local traffic, a few things stood out.
Every 30 seconds I saw a UDP packet of 159 bytes being sent out to an EC2 instance on port 33434. This might be Amazon Device Messaging (ADM), which implies a remote command might be able to wake your Echo up.
Every 30 seconds, 41 bytes of HTTPS traffic to an Amazon IP address (205.251.242.52 in my sample). This would seem to be another polling mechanism.
NTP traffic to various places.
Infrequent ping traffic to the default gateway, the DNS server I listed in DHCP, and the Google and OpenDNS DNS servers, followed by DNS lookups for www.example.org and an HTTP attempt to the result. This is possibly a connectivity check to verify the WiFi connection is still working properly.
A small communication with device-metrics-us.amazon.com. Hmm!
And that’s all I saw when the Echo was idle.
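If you want to reproduce this kind of observation, all you need is a packet capture from a machine that can see the Echo’s traffic (the router, a mirrored switch port, or similar). Below is a minimal sketch using Python and scapy; the Echo’s LAN address is an assumption you would replace with your own.

    # Minimal idle-traffic sniffer sketch (Python + scapy). Assumes this host can
    # see the Echo's packets (e.g. run it on the router) and has root privileges.
    from scapy.all import sniff, IP, TCP, UDP

    ECHO_IP = "192.168.1.50"   # hypothetical LAN address of the Echo

    def show(pkt):
        if IP not in pkt:                  # ignore ARP and other non-IP noise
            return
        ip = pkt[IP]
        proto = "TCP" if TCP in pkt else "UDP" if UDP in pkt else str(ip.proto)
        print(f"{pkt.time:.0f} {proto} {ip.src} -> {ip.dst} {len(pkt)} bytes")

    # The BPF filter limits capture to traffic to/from the Echo; store=False
    # avoids holding every packet in memory during a long capture.
    sniff(filter=f"host {ECHO_IP}", prn=show, store=False)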
When I spoke the wake word, I started to see a lot of HTTPS traffic to the same Amazon IP address mentioned previously. So the 30-second polling mentioned earlier may just be an “is the remote end still there?” test, to allow data to be sent more quickly. After the conversation had completed, traffic dropped back to idle levels.
I also tested streaming of music (“Alexa, play 80s pop”). What was interesting here was that the song appears to be buffered locally; I could see a LOT of HTTPS traffic (from a CloudFront server) which then stopped. So it’s clear the Echo has enough memory to store a 4-minute song locally.
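That burst-then-silence pattern is easy to see from a saved capture. Here is a hedged sketch that bins the Echo’s traffic into bytes per second; the pcap file name and the Echo’s address are assumptions.

    # Hypothetical way to visualise the buffering: read a capture taken while a
    # song was requested and print the Echo's bytes transferred per second.
    from collections import Counter
    from scapy.all import rdpcap, IP

    ECHO_IP = "192.168.1.50"               # hypothetical LAN address of the Echo
    bytes_per_second = Counter()

    for pkt in rdpcap("echo_music.pcap"):  # capture file name is an assumption
        if IP in pkt and ECHO_IP in (pkt[IP].src, pkt[IP].dst):
            bytes_per_second[int(pkt.time)] += len(pkt)

    for second in sorted(bytes_per_second):
        print(second, bytes_per_second[second])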
What are the risks?
Based on the above data, the risk would appear to be small. Let’s look at it:
Amazon etc store your voice recordings
They do this to help train your device to your voice (or so they claim). In theory this now creates a record of all your requests and so could become subject to a subpoena. Even if you trust Amazon/Google/whoever, will they fight for you to keep this data out of the hands of the police?
How much do they send before the magic word?
We only have their word that a fraction of a second is sent prior to the magic word. They might lie.
Activity may change
What I recorded is how the Echo works today. I have no doubt that Amazon can force a firmware update to devices as necessary, so some of the protections we see (e.g. the LED ring lighting up to show streaming) might disappear. Could the Feds demand a version of the firmware that would record and stream permanently without a visual indicator? How many of the protections are software and how many are hardwired?
Attacks against the device
A simple nmap of the device shows no listening ports, so an attacker on your local network (or remotely, via a router exploit!) does not appear to have an attack vector this way. This leaves custom skills. These are processed on a remote server; the amount of data sent back is primarily limited to a “speech” text channel or to an audio channel. If there were a bug in the codec or text-to-speech modules then an attacker might gain access that way, but it requires you to have enabled their skill.
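For reference, that check is easy to reproduce. The sketch below does a rough TCP SYN scan of the first 1024 ports in Python/scapy rather than the exact nmap invocation I ran, and the target address is an assumption.

    # Rough TCP SYN scan sketch (Python + scapy); needs root. A real nmap run
    # covers more ports and scan types, but this shows the idea.
    from scapy.all import IP, TCP, sr

    TARGET = "192.168.1.50"   # hypothetical LAN address of the Echo

    # Send a SYN to each of the first 1024 ports and collect the replies.
    answered, _ = sr(IP(dst=TARGET)/TCP(dport=(1, 1024), flags="S"),
                     timeout=3, verbose=0)

    open_ports = [sent[TCP].dport for sent, reply in answered
                  if TCP in reply and reply[TCP].flags == "SA"]   # SYN/ACK = open
    print("open ports:", open_ports or "none")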
Other risks
Attacks against the account
This, in my mind, is more likely… your voice recordings and enabled skills are all controlled via the Alexa site, and this is protected by your normal Amazon password. Is that secure? The data you’re now protecting has extended beyond your purchase history and ability to buy stuff to include voice recordings and command histories.
Inappropriate use
There’s a second class of risk relating to inappropriate use of these types of device (why would you link it to your bank account when there are no voice level controls? Anyone in your house could ask for the same information). This includes the built-in “buy stuff from Amazon” skill, so turn that off! But that’s another blog post :-)
Conclusion
Everyone has to make their own risk assessment of these devices. Based on what I’ve observed and tested, I don’t think the Amazon Echo creates a large exposure. Although the device may be “always listening”, it doesn’t send data until the magic word is spoken. I’ve seen a Twitter joke about Orwell’s 1984 along the lines of “we didn’t need to have enforced surveillance; we brought it into our own homes”, but I don’t think this is accurate. These devices aren’t streaming 24x7, only on demand.
So why did the police demand recordings in the murder case? I don’t know, but what if the victim had said “Alexa, Fred is killing me!” while he was being murdered? That would be recorded and could be relevant information :-)
Personally, until I find out otherwise, I feel the Echo device is not a large risk… as long as I don’t connect it to personal data sources (no, Alexa, you won’t get access to my bank account!).
But what about laptops?
A common recommendation for laptop owners is to cover the camera (and block the microphone) when you’re not using them. You can buy camera shutters that stick over the camera and can be slid open or closed as needed. Why is this different to an “always listening” device?
It’s a matter of attack vectors. Earlier I showed that there are no open ports on the Echo; an attacker can’t break into it so easily. However a laptop is possibly running Windows or macOS, is used to read emails or surf the web, and is at a much greater risk of malware being installed which could then access the camera or microphone. The Echo, being an appliance device, has fewer malware insertion vectors (I won’t say none, because everything has bugs) and so doesn’t present the same risk surface.