As we’ve previously seen, Docker Swarm mode is a pretty powerful tool for deploying containers across a cluster. It has self-healing capabilities, built-in network load balancing, scaling, private VXLAN networks and more.
Docker Swarm will automatically try to place your containers to provide maximum resiliency within the service. So, for example, if you request 3 running copies of a container then it will try to place these on three different machines. Only if resources are unavailable will two containers be placed on the same host.
We saw this with our simple pinger application; it ran on 3 nodes:
% docker service ps pinger
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
xxn5tnej6bao pinger.1 centos:latest test1.spuddy.org Running Running 5 minutes ago
dcmno194ymjf pinger.2 centos:latest test2.spuddy.org Running Running 2 seconds ago
g89lpi6xeuji pinger.3 centos:latest test3.spuddy.org Running Running 2 seconds ago
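For reference, a replicated service like that can be created with a single command; this is a sketch rather than the exact command from the earlier article (the ping target here is just a stand-in):
$ docker service create --name pinger --replicas 3 centos:latest ping 8.8.8.8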
Sometimes, however, you need to control where a container is run. This may be for functionality reasons; for example, a container that monitors and reports on the Swarm state needs to run on a manager node in order to get the data it needs. Or there may be OS requirements (a container designed to run on a Windows machine shouldn’t be deployed to a Linux machine!).
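Swarm has built-in node attributes for exactly these cases. As a sketch (not part of my stack), the deploy section of such a service could pin it to manager nodes, or to a particular OS, with constraints along these lines:
  deploy:
    placement:
      constraints:
        - node.role == manager        # only schedule on manager nodes
        - node.platform.os == linux   # only schedule on Linux engines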
Frequently, however, this is because only some nodes have the necessary resources, and the most common of these are “volume” dependencies. Remember that while the containers may be run on multiple nodes, a Swarm is really just a collection of standalone Docker engines pulling images from a registry. That means that backend resources, such as filesystem volumes, are served locally from each node. The contents of /myapp on server test1 may be different from /myapp on test2. We also saw this, in passing, with the MySQL container; it was constrained to only run on a specific node so that the backing datafiles were consistent.
db:
  image: mysql:5.5
  networks:
    - appdb
  environment:
    - MYSQL_ROOT_PASSWORD=foobar
    - MYSQL_DATABASE=mydb1
  volumes:
    - db-data:/var/lib/mysql
  deploy:
    placement:
      constraints: [node.hostname == test1.spuddy.org]
In this case we use a named volume rather than a filesystem directory, but the constraint is still required; each node would have its own unique “db-data” volume.
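(The full compose file isn’t reproduced here, but a named volume like this also needs declaring at the top level of the file, along the lines of:
volumes:
  db-data:
With no further options each node simply creates a local volume of that name, which is exactly why the constraint is still needed.)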
Now constraining by hostname works, but you’re limited to just a single host. What if you wanted to run three copies of Tomcat? We need to constrain it to run only on the nodes where the config files exist, and we can’t just list all three hostnames as separate constraints because constraints are ANDed together (no single node could match all three).
So, instead, we can define a label and constrain the service to that:
tomcat:
  image: sweh/test:fake_tomcat
  deploy:
    replicas: 2
    placement:
      constraints: [node.labels.Tomcat == true]
  volumes:
    - "/myapp/apache/certs:/etc/pki/tls/certs/myapp"
    - "/myapp/apache/logs:/etc/httpd/logs"
    - "/myapp/tomcat/webapps:/usr/local/apache-tomcat-8.5.3/webapps"
    - "/myapp/tomcat/logs:/usr/local/apache-tomcat-8.5.3/logs"
  ports:
    - "8443:443"
(“fake_tomcat” is just a dummy program I wrote that listens on the requested port; it doesn’t do any real work).
I want to run this on test1 and test2, so on those two machines I make the necessary directories.
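The directories match the bind mounts in the stack file above; roughly, on each of those two nodes, something like:
$ mkdir -p /myapp/apache/certs /myapp/apache/logs /myapp/tomcat/webapps /myapp/tomcat/logs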
We now need to add labels to tell the Swarm that these two nodes are able to run Tomcat:
$ docker node update --label-add Tomcat=true test1.spuddy.org
test1.spuddy.org
$ docker node update --label-add Tomcat=true test2.spuddy.org
test2.spuddy.org
$ docker node inspect --format '{{ .Spec.Labels }}' test1.spuddy.org
map[Tomcat:true]
$ docker node inspect --format '{{ .Spec.Labels }}' test2.spuddy.org
map[Tomcat:true]
I then create the stack, and we can see it running:
$ docker stack deploy -c stack.tomcat myapp
Creating network myapp_default
Creating service myapp_tomcat
$ docker stack ls
NAME SERVICES
myapp 1
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
bnhvu1cc3orx myapp_tomcat replicated 2/2 sweh/test:fake_tomcat *:8443->443/tcp
$ docker service ps myapp_tomcat
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
to0yvevap2om myapp_tomcat.1 sweh/test:fake_tomcat test2.spuddy.org Running Running about a minute ago
dk9pnthgvdcp myapp_tomcat.2 sweh/test:fake_tomcat test1.spuddy.org Running Running about a minute ago
We can see this is working by testing port 8443:
$ echo hello | nc localhost 8443
You are caller #1 to have reached fake_tomcat on 38be98384cb9:443
fake_tomcat has received message `hello'
$ echo hello | nc localhost 8443
You are caller #1 to have reached fake_tomcat on d095e9f2490f:443
fake_tomcat has received message `hello'
The two calls hit different containers, as shown by the different hostnames in the output (each container keeps its own call counter, which is why both report caller #1).
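If you want to watch the routing mesh spread requests across both replicas, a quick loop does the job (just a convenience sketch, not part of the original test):
$ for i in 1 2 3 4; do echo hello | nc localhost 8443; done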
Multiple services
Let’s create a more complicated stack which has Tomcat, Memcached and Zookeeper.
The additional lines in the stack file are:
zookeeper:
  image: sweh/test:fake_zookeeper
  deploy:
    replicas: 2
    placement:
      constraints: [node.labels.Zookeeper == true]
  volumes:
    - "/myapp/zookeeper/data:/usr/zookeeper/data"
    - "/myapp/zookeeper/logs:/usr/zookeeper/logs"
    - "/myapp/zookeeper/conf:/usr/zookeeper/conf"
  environment:
    CFG_FILE: /usr/zookeeper/conf/zoo.cfg
memcached:
  image: sweh/test:fake_memcached
  deploy:
    replicas: 1
    placement:
      constraints: [node.labels.Memcached == true]
And let’s create the relevant labels to distribute the services over the three servers. The resulting labels look like:
docker-ce.spuddy.org
map[]
test1.spuddy.org
map[Memcached:true Tomcat:true]
test2.spuddy.org
map[Tomcat:true Zookeeper:true]
test3.spuddy.org
map[Zookeeper:true]
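That listing pairs each hostname with its label map; a small loop over the node list produces it (a sketch of one way to do it):
$ for n in $(docker node ls -q); do
    docker node inspect --format '{{ .Description.Hostname }}' $n
    docker node inspect --format '{{ .Spec.Labels }}' $n
  done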
Note there’s nothing labeled on the manager node; my app won’t run there.
Again, ensure the directories exist with the necessary configuration and deploy:
$ docker stack rm myapp
Removing service myapp_tomcat
Removing network myapp_default
$ docker stack deploy -c stack.full myapp
Creating network myapp_default
Creating service myapp_memcached
Creating service myapp_tomcat
Creating service myapp_zookeeper
$ docker stack ps myapp
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
iidao6f3vt6z myapp_zookeeper.1 sweh/test:fake_zookeeper test2.spuddy.org Running Running 34 seconds ago
haq4vvben495 myapp_tomcat.1 sweh/test:fake_tomcat test1.spuddy.org Running Running 35 seconds ago
6u4s6w4fs0cm myapp_memcached.1 sweh/test:fake_memcached test1.spuddy.org Running Running 37 seconds ago
zixk0eu2aahu myapp_zookeeper.2 sweh/test:fake_zookeeper test3.spuddy.org Running Running 35 seconds ago
4u8ytws4kl5v myapp_tomcat.2 sweh/test:fake_tomcat test2.spuddy.org Running Running 35 seconds ago
Rescale
In my fake environment I decided that memcached was running slowly (and, besides, only having one copy isn’t very resilient!). Now we can see the power of labels: I can add the Memcached label to another node and then rescale:
$ docker node update --label-add Memcached=true test3.spuddy.org
test3.spuddy.org
$ docker service scale myapp_memcached=2
myapp_memcached scaled to 2
$ docker service ps myapp_memcached
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
6u4s6w4fs0cm myapp_memcached.1 sweh/test:fake_memcached test1.spuddy.org Running Running 3 minutes ago
pf56ix45e887 myapp_memcached.2 sweh/test:fake_memcached test3.spuddy.org Running Running 7 seconds ago
Now that was easy because memcached didn’t have any external volumes to depend on, but we could do the same for zookeeper or tomcat; just create the necessary volumes and configuration on the new node, then add the label.
What if you forget the volumes?
Let’s add Zookeeper to test1 but “forget” to make the volumes:
$ docker node update --label-add Zookeeper=true test1.spuddy.org
test1.spuddy.org
$ docker service scale myapp_zookeeper=3
myapp_zookeeper scaled to 3
$ docker service ps myapp_zookeeper
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
iidao6f3vt6z myapp_zookeeper.1 sweh/test:fake_zookeeper test2.spuddy.org Running Running 6 minutes ago
zixk0eu2aahu myapp_zookeeper.2 sweh/test:fake_zookeeper test3.spuddy.org Running Running 6 minutes ago
ytyp3impjzyf myapp_zookeeper.3 sweh/test:fake_zookeeper test1.spuddy.org Ready Rejected less than a second ago "invalid mount config for type…"
sudgw3w694ja \_ myapp_zookeeper.3 sweh/test:fake_zookeeper test1.spuddy.org Shutdown Rejected 5 seconds ago "invalid mount config for type…"
3gv64qgli504 \_ myapp_zookeeper.3 sweh/test:fake_zookeeper test1.spuddy.org Shutdown Rejected 6 seconds ago "invalid mount config for type…"
That’s messy!
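The error column is truncated by default; if you hit this, the full message is visible with the --no-trunc flag (output omitted here):
$ docker service ps --no-trunc myapp_zookeeper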
Eventually the system settles down and runs two copies on an existing node:
$ docker service ps myapp_zookeeper
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
iidao6f3vt6z myapp_zookeeper.1 sweh/test:fake_zookeeper test2.spuddy.org Running Running 8 minutes ago
zixk0eu2aahu myapp_zookeeper.2 sweh/test:fake_zookeeper test3.spuddy.org Running Running 8 minutes ago
ib7eiekpes3c myapp_zookeeper.3 sweh/test:fake_zookeeper test2.spuddy.org Running Running 49 seconds ago
Monitoring of service state becomes very important in this environment!
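Even just keeping an eye on the REPLICAS column will catch a service stuck below its desired count; a minimal sketch, not a substitute for proper monitoring:
$ watch -n 30 'docker service ls'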
Migration of services between nodes
For performance reasons I want to move the memcached instance currently running on test3 and migrate it to test2, so it’s on the same machine as the tomcat instance. We can force a migration of the container by adding the label to the new node and removing it from the old one.
$ docker service ps myapp_memcached
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
6u4s6w4fs0cm myapp_memcached.1 sweh/test:fake_memcached test1.spuddy.org Running Running 13 minutes ago
pf56ix45e887 myapp_memcached.2 sweh/test:fake_memcached test3.spuddy.org Running Running 9 minutes ago
$ docker node update --label-add Memcached=true test2.spuddy.org
test2.spuddy.org
$ docker node update --label-rm Memcached test3.spuddy.org
test3.spuddy.org
$ docker service ps myapp_memcached
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
6u4s6w4fs0cm myapp_memcached.1 sweh/test:fake_memcached test1.spuddy.org Running Running 14 minutes ago
n7bv9331opoq myapp_memcached.2 sweh/test:fake_memcached test2.spuddy.org Running Running 8 seconds ago
pf56ix45e887 \_ myapp_memcached.2 sweh/test:fake_memcached test3.spuddy.org Shutdown Rejected 14 seconds ago
The instance that had been running on test3 has been rejected and a new instance started on test2. Remember that the Docker scheduler will attempt to pick a node that isn’t already running a copy of the container. In this case the only valid servers meeting the placement constraint were test1 and test2, and test1 already had a copy running.
Summary
Using labels allows for a very dynamic way of determining where your workloads run; you can rescale, migrate and even extend the cluster (add new nodes, add labels to the node, modify the scaling) without needing to redeploy the stack.
This becomes more important as multiple workloads in multiple stacks are deployed to the same swarm; the stack owners don’t need to know about the underlying node names, so they can just use labels, and the same stack can be deployed to different targets (how easy would this be to spin up on AWS EC2 instances? No stack changes needed at all!).
Of course this doesn’t come for free; it takes time for containers to spin up (about 6 seconds in my fake_memcached migration), so make sure your services can handle short outages. The closer to 12-factor your apps are, the better they’ll handle dynamic migration of containers.
And, of course, there’s still the persistent data volume question to handle; this can complicate your deployments.
But despite these complications, I would recommend taking a look at labels if you need to constrain where your containers run.