In the previous entry we looked at how a Docker container image is built.
In this entry we’re going to look a little at how a container runs.
Let’s take another look at the container we built last time, running apache:
% cat Dockerfile
FROM centos
RUN yum -y update
RUN yum -y install httpd
CMD ["/usr/sbin/httpd","-DFOREGROUND"]
% docker build -t web-server .
% docker run --rm -d -p 80:80 -v $PWD/web_base:/var/www/html \
-v /tmp/weblogs:/var/log/httpd web-server
63250d9d48bb784ac59b39d5c0254337384ee67026f27b144e2717ae0fe3b57b
% docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
63250d9d48bb web-server "/usr/sbin/httpd -..." 2 minutes ago Up 2 minutes 0.0.0.0:80->80/tcp modest_shirley
So how does network traffic get into this container? And what does that -p flag mean?
Basic Docker networking
By default, Docker creates a bridge called docker0. This bridge is not connected to the primary network, so other hosts can’t talk directly to containers on it. The bridge is associated with a private network.
When a container starts up, it is given a virtual Ethernet (veth) pair, which allows IP communication between the host and the container. One end appears inside the container, where it looks just like a normal network device; the other end stays on the host and is added to the bridge. The container’s end is assigned an IP address.
With our test Apache container we can see how this looks:
% brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.024234e17ca9 no veth336564a
% ip -4 addr show dev docker0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
% docker inspect --format='{{ .NetworkSettings.IPAddress }}' modest_shirley
172.17.0.2
So we can see our container’s veth device is on the bridge. The bridge itself has an IP address (172.17.0.1) on a /16 network (65,536 addresses). Our container has the address 172.17.0.2 on this network.
We’ve effectively created a private network, 172.17.0.0/16; the host acts as the default gateway for the containers.
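To make the /16 concrete, here is a small POSIX shell sketch that counts the addresses in a prefix and checks whether an address falls inside 172.17.0.0/16. The function names ip_to_int, in_subnet and address_count are made up for illustration; they only do arithmetic and don’t query Docker or the kernel.

```shell
#!/bin/sh
# Hypothetical helpers for reasoning about the docker0 network (pure arithmetic).

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1                      # split e.g. "172.17.0.2" into 4 octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if ADDR is inside NETWORK/PREFIX.
# usage: in_subnet ADDR NETWORK PREFIX, e.g. in_subnet 172.17.0.2 172.17.0.0 16
in_subnet() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Total number of addresses in a prefix: 2^(32 - PREFIX).
address_count() {
  echo $(( 1 << (32 - $1) ))
}
```

address_count 16 prints 65536; taking away the network and broadcast addresses leaves 65,534 host addresses, one of which (172.17.0.1) belongs to the bridge itself.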
Now, of course, the rest of your network (other hosts, etc.) doesn’t know how to reach this private network, so a set of iptables rules is created so that outgoing traffic from the container is NAT’d to the host’s IP address. In this way containers can reach out to the main network.
Incoming traffic needs to be port forwarded, and this is set up with the -p flag: you specify a port on the host and the port in the container it should be forwarded to. So -p 80:80 means forward port 80 on the host to port 80 inside the container.
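As a rough illustration of how a -p value breaks down, here is a hypothetical shell helper, parse_publish (not part of Docker). It handles the HOST:CONTAINER and IP:HOST:CONTAINER forms and prints them the way docker ps does. It deliberately ignores port ranges and the container-port-only form (-p 80), where Docker picks a random host port.

```shell
#!/bin/sh
# Hypothetical helper: split a -p value into host IP, host port and container port.
# Handles HOST:CONTAINER and IP:HOST:CONTAINER, with an optional /tcp or /udp suffix.
parse_publish() {  # usage: parse_publish SPEC
  spec=${1%/*}                 # drop an optional /tcp or /udp suffix
  container_port=${spec##*:}   # last field is the container port
  rest=${spec%:*}              # everything before it
  host_port=${rest##*:}
  case $rest in
    *:*) host_ip=${rest%:*} ;; # an explicit bind address was given
    *)   host_ip=0.0.0.0 ;;    # default: listen on all host addresses
  esac
  echo "$host_ip:$host_port->$container_port"
}
```

For example, parse_publish 80:80 prints 0.0.0.0:80->80, which matches the PORTS column in the docker ps output above.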
Handling traffic from the outside network to the container, traffic between containers, and traffic from a container to itself gets a little messy. Docker uses a mix of a userland proxy (docker-proxy) and iptables NAT rules:
% ps -ef | grep docker-proxy
root 10054 760 0 10:18 ? 00:00:00 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 80 -container-ip 172.17.0.2 -container-port 80
% sudo iptables -v -t nat -L POSTROUTING
Chain POSTROUTING (policy ACCEPT 11 packets, 754 bytes)
pkts bytes target prot opt in out source destination
78 4961 MASQUERADE all -- any !docker0 172.17.0.0/16 anywhere
0 0 MASQUERADE tcp -- any any 172.17.0.2 172.17.0.2 tcp dpt:http
% sudo iptables -v -t nat -L DOCKER
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 any anywhere anywhere
0 0 DNAT tcp -- !docker0 any anywhere anywhere tcp dpt:http to:172.17.0.2:80
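Here is a toy model of the two nat-table chains above, written as shell functions. docker_chain and postrouting are made-up names. The model hard-codes the values from this example: the docker0 bridge, the 172.17.0.0/16 network, and one container at 172.17.0.2 publishing TCP port 80. It only prints which rule would match and does not read or modify iptables.

```shell
#!/bin/sh
# Toy model of the nat rules listed above; hard-coded to this example and
# never touches the real iptables configuration.

# DOCKER chain: packets arriving from outside the bridge for port 80 are
# DNAT'd to the container; packets arriving on docker0 RETURN untouched.
docker_chain() {  # usage: docker_chain IN_IFACE DST_PORT
  if [ "$1" = docker0 ]; then
    echo RETURN
  elif [ "$2" = 80 ]; then
    echo "DNAT to:172.17.0.2:80"
  else
    echo "no match"
  fi
}

# POSTROUTING chain: decide whether the source address gets rewritten.
postrouting() {  # usage: postrouting SRC_IP DST_IP DST_PORT OUT_IFACE
  # Rule 1: container-network traffic leaving via any other interface is
  # masqueraded behind the host's own address.
  case $1 in
    172.17.*)
      if [ "$4" != docker0 ]; then echo MASQUERADE; return; fi ;;
  esac
  # Rule 2 (hairpin): the container reaching its own published port.
  if [ "$1" = 172.17.0.2 ] && [ "$2" = 172.17.0.2 ] && [ "$3" = 80 ]; then
    echo MASQUERADE; return
  fi
  # Otherwise the chain policy (ACCEPT) applies: no address translation.
  echo ACCEPT
}
```

Connections the host makes to its own published port via 127.0.0.1 aren’t covered by this model. Those are handled by the docker-proxy process in the ps output.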
Exercise for those following along at home: see what other rules are in the complete iptables output, including the main FORWARD chain.
This is just the default; it can be changed!
Container processes
With the CMD entry we told the Docker daemon to start this container by running the httpd process. We know Apache creates a number of child processes, and we can see them pretty easily:
% docker top modest_shirley
UID PID PPID C STIME TTY TIME CMD
root 6458 6442 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 6471 6458 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 6472 6458 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 6473 6458 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 6474 6458 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 6475 6458 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
Note the PIDs are those seen from the host. When we go inside the container later, we’ll see that the PID numbers look different.
Container files
The normal running of a program like Apache causes temporary files to be generated (e.g. the PID file, at the very least). Your app may make use of /tmp, /run or other areas.
Our container was started with --rm, so it is transient: when it stops it is removed, and any changes are lost. But while it’s running we can see which files have changed:
% docker diff modest_shirley
C /run
A /run/mount
A /run/mount/utab
C /run/httpd
A /run/httpd/httpd.pid
A /run/httpd/authdigest_shm.1
Note that the log files don’t show up, because they’re not part of the container image; they were written to a mounted volume (the -v flag).
Going inside the container
We’ve seen some ways of looking at a container from the outside, using the docker top and docker diff commands. But what does the container look like from the inside? We can use docker exec to run a command. (The details of how it works involve joining the same namespaces as the container’s existing processes, but you can think of it as running a new process inside the container.)
The filesystem from inside
% docker exec -it modest_shirley /bin/sh
sh-4.2# ls
anaconda-post.log dev lib media proc sbin tmp
bin etc lib64 mnt root srv usr
boot home lost+found opt run sys var
The filesystem looks like a normal CentOS one.
sh-4.2# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-252:1-131082-0940ddeec345786e6a77a45645d662721d239266ede70f2620b21d4abe11ad0d
10G 355M 9.7G 4% /
tmpfs 245M 0 245M 0% /dev
tmpfs 245M 0 245M 0% /sys/fs/cgroup
/dev/mapper/dockerce-fs
8.8G 23M 8.3G 1% /etc/hosts
shm 64M 0 64M 0% /dev/shm
/dev/vda3 3.0G 1.6G 1.3G 56% /var/log/httpd
tmpfs 245M 0 245M 0% /sys/firmware
If you look carefully, you can see some “data leakage”. For example, the /var/log/httpd mount exposes the host filesystem /dev/vda3 (which is where /tmp lives on my test machine). The root disk shows how much space I allocated to the Docker data volume.
Other data may be exposed, e.g. via the dmesg command:
sh-4.2# dmesg | grep Hypervisor
[ 0.000000] Hypervisor detected: KVM
We can see that Docker, in its default setup, doesn’t hide as much of the host machine as we might like! That’s the consequence of a virtualised OS, as opposed to virtualised hardware.
Processes from inside
sh-4.2# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 5 1 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 6 1 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 7 1 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 8 1 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 9 1 0 14:08 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
root 27 0 1 14:22 ? 00:00:00 /bin/sh
root 31 27 0 14:22 ? 00:00:00 ps -ef
Note the PIDs: the container has its own PID namespace, so our first Apache process now shows as PID 1. Recall from earlier that it showed as 6458 in the docker top output.
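On reasonably recent Linux kernels (4.1 and later) you can see both numbers for the same process from the host. The NSpid line in /proc/PID/status lists the process’s PID in each nested PID namespace. Here is a small sketch; ns_pids is a made-up name, and it prints nothing if the file or field isn’t available.

```shell
#!/bin/sh
# Print a process's PID in every nested PID namespace, outermost first,
# using the NSpid field of /proc/PID/status (Linux 4.1+).
ns_pids() {  # usage: ns_pids PID
  [ -r "/proc/$1/status" ] || return 0
  # "NSpid:<TAB>6458<TAB>1"  ->  "6458 1"
  awk '$1 == "NSpid:" { $1 = ""; sub(/^ /, ""); print }' "/proc/$1/status"
}
```

Run on the host as ns_pids 6458, you would expect something like 6458 1: the host PID followed by the PID inside the container.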
Networking from inside
This image doesn’t have an ip or ifconfig command inside, but if it did (or if we copied one in) then the output would look something like:
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link-netnsid 0
inet 172.17.0.2/16 scope global eth0
valid_lft forever preferred_lft forever
Similarly, the routing table would look like:
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.17.0.1 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
So we can see it shows as a normal network interface, with a default route to the bridge IP address.
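The Genmask column is just the prefix length written as a dotted mask. 255.255.0.0 is the /16 we saw on docker0, and 0.0.0.0 is the /0 of the default route. Here is a hypothetical helper, mask_to_prefix, that converts one into the other by counting set bits.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted netmask to a prefix length.
mask_to_prefix() {  # usage: mask_to_prefix 255.255.0.0
  old_ifs=$IFS; IFS=.
  set -- $1                    # split the mask into its four octets
  IFS=$old_ifs
  bits=0
  for octet in "$@"; do
    # Count the set bits in this octet.
    while [ "$octet" -gt 0 ]; do
      bits=$(( bits + (octet & 1) ))
      octet=$(( octet >> 1 ))
    done
  done
  echo "$bits"
}
```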
Output from a Docker container
A program may send output to stdout or stderr. In a normal VM this might be considered the equivalent of the console. Docker allows us to inspect this as well. Let’s create a simple container (an image called timeloop, running the script below as /hello) that just writes out a line once a second:
#!/bin/sh
# Print a timestamped line once a second, forever
while true
do
  echo "Hello, the time is $(date)"
  sleep 1
done
Let’s run this:
% docker run --rm -d timeloop
c03f1a63e4c7e55ab37c973a2fe231621340c48aae633865049f2588168b1c1e
% docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c03f1a63e4c7 timeloop "/hello" 7 seconds ago Up 2 seconds eloquent_torvalds
% docker logs eloquent_torvalds
Hello, the time is Sat Jun 24 14:55:19 UTC 2017
Hello, the time is Sat Jun 24 14:55:20 UTC 2017
Hello, the time is Sat Jun 24 14:55:21 UTC 2017
Hello, the time is Sat Jun 24 14:55:22 UTC 2017
Docker supports different logging drivers; we can see which one the container is using:
% docker inspect -f '{{.HostConfig.LogConfig.Type}}' eloquent_torvalds
json-file
json-file is the default driver, and by default it puts no limit on the log size, so a chatty container can fill up the disk!
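To stop a chatty container from filling the disk, the json-file driver accepts max-size and max-file log options. Here is a sketch that wraps docker run. run_capped is a hypothetical name, and 10m and 3 are arbitrary example values, not recommendations.

```shell
#!/bin/sh
# Hypothetical wrapper: start a container with a size-capped json-file log.
# max-size rotates the log at 10 MB; max-file keeps at most 3 rotated files.
run_capped() {  # usage: run_capped IMAGE [ARGS...]
  docker run --rm -d \
    --log-driver json-file \
    --log-opt max-size=10m \
    --log-opt max-file=3 \
    "$@"
}

# Example: run_capped timeloop
```

The same options can be set as defaults for every container with "log-opts" in the daemon’s /etc/docker/daemon.json. They only apply to containers created after the change.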
Being nasty
We’ve seen enough in this blog entry to see how we can be nasty. If you look closely, you’ll notice that when we did docker exec we were root inside the container. We can abuse this!
sh-4.2# rpm -e passwd
sh-4.2# cat > /bin/passwd
echo Hahahahaha
sh-4.2# chmod 755 /bin/passwd
sh-4.2#
OK, that’s not much of an abuse, but it shows we can make changes.
Fortunately we can detect this type of abuse:
% docker diff modest_shirley
C /root
A /root/.bash_history
[ ... ]
C /etc
C /etc/pam.d
D /etc/pam.d/passwd
C /var
C /var/lib
C /var/lib/rpm
[ ... ]
C /usr/bin
C /usr/bin/passwd
[ ... ]
If we know which files should change (the /tmp and /run files?) then we may be able to use this for intrusion detection (but only if filesystem artifacts are left behind) and File Integrity Monitoring (FIM).
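Here is a minimal sketch of that idea: a filter that reads docker diff output and prints only the entries outside an allow-list. unexpected_changes is a made-up name. The allow-list (/run and /tmp) is an assumption that suits this Apache example; every image will need its own.

```shell
#!/bin/sh
# Hypothetical filter: read `docker diff` lines ("C /path") on stdin and print
# only the ones outside the allow-list (/run and /tmp, and anything below them).
unexpected_changes() {
  awk '$2 != "/run" && $2 != "/tmp" && $2 !~ /^\/(run|tmp)\//'
}

# Example: docker diff modest_shirley | unexpected_changes
```

Run against the diff above, it would report the /root, /etc/pam.d/passwd, /var/lib/rpm and /usr/bin/passwd entries and stay quiet about /run/httpd/httpd.pid.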
Changes are transient
If we destroy and recreate this container then those changes are lost, and we restart from a “virgin” image.
% docker kill modest_shirley
modest_shirley
% docker run --rm -d -p 80:80 -v $PWD/web_base:/var/www/html -v /tmp/weblogs:/var/log/httpd web-server
2118033d42f2fe6bfe10861e838adb7a5df0c408431ba070d77ff6fa213ff45d
% docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2118033d42f2 web-server "/usr/sbin/httpd -..." 3 seconds ago Up 3 seconds 0.0.0.0:80->80/tcp compassionate_liskov
% docker diff compassionate_liskov
C /run
C /run/httpd
A /run/httpd/authdigest_shm.1
A /run/httpd/httpd.pid
This is useful for recovering from a broken container, but it loses a potentially useful audit trail (which could hamper incident response).
Keeping changes after termination
We can keep terminated containers by not using the --rm flag, but this will start using up disk space.
To demonstrate this I created a simple container that just creates three files and terminates (by now you should be able to do this yourself, so I won’t show the Dockerfile or script).
We’ll run it without the --rm flag:
% docker image ls change
REPOSITORY TAG IMAGE ID CREATED SIZE
change latest 7627c3a09d4e 30 minutes ago 124 MB
% docker run change
% docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
The container doesn’t show in the ps listing. We need to use an additional flag to show terminated containers:
% docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b216fbcca2c0 change "/hello" 7 seconds ago Exited (0) 6 seconds ago naughty_sinoussi
There it is!
Because the container has been kept around, we can inspect it and even pull out its contents:
% docker diff naughty_sinoussi
C /tmp
A /tmp/testfile2
C /run
A /run/testfile3
A /testfile1
% docker cp naughty_sinoussi:/tmp/testfile2 - | tar tvf -
-rw-r--r-- 0/0 29 2017-06-09 14:40 testfile2
% docker cp naughty_sinoussi:/tmp/testfile2 - | tar xOf - testfile2
I am modifying a file in tmp
The docker cp command is useful; it can be used to extract (or push!) files and directories from or to a container. With - as the destination, the output is in tar format.
You can do this on running containers as well.
Finally we can clear this up:
% docker rm naughty_sinoussi
naughty_sinoussi
Disk space used
Obviously keeping these changes (and “console” log output) around takes up disk space. But how much?
Let’s start with a clean system:
% docker info | grep Space.Used
Data Space Used: 840.2 MB
Metadata Space Used: 1.217 MB
% docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 10 0 660.8 MB 660.8 MB (100%)
Containers 0 0 0 B 0 B
Local Volumes 0 0 0 B 0 B
Now let’s run the docker run change command 100 times (obviously by cheating and running it in a loop).
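The loop itself is nothing special. Here is a sketch with a hypothetical run_many helper that leaves COUNT exited containers behind, because it doesn’t pass --rm.

```shell
#!/bin/sh
# Hypothetical helper: run IMAGE COUNT times without --rm, so every run
# leaves an exited container (and its changes) behind.
run_many() {  # usage: run_many IMAGE COUNT
  i=0
  while [ "$i" -lt "$2" ]; do
    docker run "$1" >/dev/null
    i=$((i + 1))
  done
}

# Example: run_many change 100
```

When you’re done, docker container prune removes all stopped containers in one go.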
Now how much space is used?
% docker info | grep Space.Used
Data Space Used: 1.45 GB
Metadata Space Used: 6.554 MB
% docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 10 1 660.8 MB 660.8 MB (99%)
Containers 100 0 8.8 kB 8.8 kB (100%)
Local Volumes 0 0 0 B 0 B
Now, I know my script changes around 88 bytes of data inside the container. The docker system df command only shows an 8.8 kB increase in size (which matches 88 bytes changed in each of 100 containers), but docker info shows usage has grown by over 600 MB.
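Dividing the growth by the number of containers makes the gap clearer. Here is a sketch with a made-up helper, growth_per_container. It assumes that docker info’s “GB” means 1000 MB.

```shell
#!/bin/sh
# Hypothetical helper: average Data Space growth per container, in MB.
# Assumes the before/after figures are both in MB (1.45 GB -> 1450 MB).
growth_per_container() {  # usage: growth_per_container BEFORE_MB AFTER_MB COUNT
  awk -v b="$1" -v a="$2" -v n="$3" 'BEGIN { printf "%.1f\n", (a - b) / n }'
}

# Example: growth_per_container 840.2 1450 100   # prints 6.1
```

That works out to roughly 6.1 MB of Data Space per container, against 88 bytes of actual changes per container as reported by docker system df.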
Important note: docker system df is slow with so many terminated containers.
This gives us the ability to create a management process around running containers. For example, we could start them without the --rm flag. Periodically while they run, we can check the docker diff results, and if something looks bad we can alert the SOC and potentially terminate the container. Similarly, on container termination we can check the diff results: if they look clean we can rm the container to recover disk space, or else retain it for forensic analysis.
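Here is a sketch of the termination-time half of that process, under loudly stated assumptions. reap_exited is a made-up name, the allow-list (/run and /tmp) is a guess suited to this Apache example, and “alert the SOC” is reduced to a line on stderr. It uses only docker ps -a --filter status=exited --format, docker diff and docker rm.

```shell
#!/bin/sh
# Hypothetical reaper: for each exited container, remove it if every change
# is inside the allow-list; otherwise keep it and raise an alert on stderr.
reap_exited() {
  docker ps -a --filter status=exited --format '{{.Names}}' |
  while read -r name; do
    # Anything outside /run and /tmp counts as suspicious.
    suspicious=$(docker diff "$name" |
      awk '$2 != "/run" && $2 != "/tmp" && $2 !~ /^\/(run|tmp)\//')
    if [ -z "$suspicious" ]; then
      docker rm "$name"                      # clean: reclaim the disk space
    else
      echo "ALERT: $name changed unexpected files:" >&2
      echo "$suspicious" >&2                 # dirty: keep it for forensics
    fi
  done
}
```

In practice you would run something like this from cron or a systemd timer and send the alert to whatever your SOC tooling consumes.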
Read-only containers
There’s another way of running Docker that can help protect against modification: the --read-only flag. With this, the container’s whole root filesystem is made immutable. Most apps still need some writable temporary space, which we can provide with --tmpfs. Annoyingly, a tmpfs mounted on /run starts out empty, so the /run/httpd directory Apache expects is missing; we can fix that with a simple startup wrapper.
Going back to our Apache example, we build it the same way, but with a startup wrapper instead of running httpd directly:
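% cat startup
#!/bin/sh
# /run is an empty tmpfs at this point, so recreate the directory Apache uses
mkdir -p -m 0777 /run/httpd
# exec so httpd replaces the shell and runs as PID 1
exec /usr/sbin/httpd -DFOREGROUND
% cat Dockerfile
FROM centos
RUN yum -y update
RUN yum -y install httpd
COPY startup /startup
RUN chmod 755 /startup
CMD ["/startup"]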
% docker build -t readonly-web .
% docker run -d --rm --read-only -p80:80 -v $PWD/web_base:/var/www/html:ro -v /tmp/weblogs:/var/log/httpd --tmpfs /run --tmpfs /tmp readonly-web
Note the :ro on the /var/www/html volume, which makes the HTML tree immutable as well, and that /run and /tmp are set up as tmpfs directories.
If we try to make changes inside the container it fails, but the web server can still write out its logs:
% docker exec -it f17484a0529f /bin/sh
sh-4.2# touch /foo
touch: cannot touch '/foo': Read-only file system
sh-4.2# rpm -e passwd
error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Read-only file system)
sh-4.2# touch /tmp/foo
sh-4.2# touch /var/www/html/bar
touch: cannot touch '/var/www/html/bar': Read-only file system
sh-4.2# tail -1 /var/log/httpd/error_log
[Mon Jun 12 17:01:11.024721 2017] [core:notice] [pid 1] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
A bit of a gotcha here is that docker diff shows no files have changed, which may hide intrusion indicators! We’ve been inside the container and looked around, but no history file was generated. This may be a small price to pay for preventing abuse in the first place.
Being nasty outside the container
We can take what we’ve learned and use it to break into the host. For example, we could mount the host’s root directory into a container!
% docker run --rm -it -v /:/mnt centos /bin/sh
sh-4.2# cp /bin/id /mnt/tmp/badperson
sh-4.2# chmod 4711 /mnt/tmp/badperson
sh-4.2# exit
exit
% id
uid=500(sweh) gid=500(sweh) groups=500(sweh),499(docker)
% /tmp/badperson
uid=500(sweh) gid=500(sweh) euid=0(root) groups=500(sweh),499(docker)
Note that the euid is now 0 (root).
If you give a normal user permission to run the docker command (which, basically, means being in the docker group) then they have effective root on the whole machine.
SELinux can mitigate this, to some extent, by preventing the container from modifying host files. Indeed, I had to disable SELinux to do this test. These security features are there for a reason, but sometimes they’re disabled.
The best defense is to not allow people to be in the docker group in the first place!
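A quick way to see who is in that position is to list the docker group’s members. Here is a sketch with a hypothetical helper, docker_group_members, that parses the group entry returned by getent group docker. It only sees supplementary members listed in the group database, so a user whose primary group is docker won’t appear.

```shell
#!/bin/sh
# Hypothetical audit helper: print docker group members, one per line.
# Group entries look like name:password:gid:member1,member2,...
docker_group_members() {  # usage: docker_group_members [GROUP_LINE]
  line=${1:-$(getent group docker)}
  printf '%s\n' "${line##*:}" | tr ',' '\n' | sed '/^$/d'
}

# Example: docker_group_members
```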
Summary
In this blog entry we’ve looked at the running container:
- Networking, Processes, file changes
- Container stdout/stderr logs
- Abusing docker privileges (root exploit!)
- And how we can detect this
- Some protections we can put in place against this
Docker also has a lot of advanced security functions (SELinux, AppArmor, seccomp, capabilities) which can protect the system and the application. These are beyond the scope of this “basics” blog entry, but are definitely something an enterprise user of Docker needs to be aware of.