Imagine that you're building the next big thing and would like to use cloud services to bring that beautiful unicorn to life. To help you get things done and sleep soundly at night, big established players (e.g. Heroku, AWS, Azure, GAE) provide you with a big black box that handles one of your non-core activities: complex system administration. But should you accept this black box as a given?
At Futurice there is a long history of internal development: creating tools and procedures that let our employees focus on their work. This means making hour markings less painful, tracking inventory (e.g. phones, laptops, gadgets, books), integrating with third-party services to avoid manual duplication of HR information, and in general building tools for problems worth solving.
Since 2015, we’ve been containerizing our existing internal applications to be cloud-ready. The plan has been to replace our own hosted hardware in a local data center, provide a better developer experience, and ease the maintenance burden on our IT staff. This means less hassle with project kickoffs, maintenance, firewall rules, backups, hardware capacity for VPSs, SSL certificate renewals, web server configurations and so forth: the kind of hassle that is a sign of dated working habits.
We could not find a product on the market that met our minimum requirements:
- Private networking
- IPSec VPN
- Seamless deployments into a secure wildcard domain
- Employee authentication using our LDAP
- SSO that works with Google Suite
Making your own Platform as a Service (PaaS) is widely seen as a foolish endeavour: the complexity involved, combined with the rapid development and turbulence of the field, renders home-rolled solutions obsolete within months. In the past our automation was done with tools such as Fabric, Puppet and Ansible to bring cloud or hosted servers up to a predefined state. It worked like eating soup with chopsticks: slightly messy. Docker made me begrudgingly throw all of this away, package software into containers, and lean on self-healing orchestration platforms like Docker Swarm and Kubernetes.
We started with a third-party solution called Deis in late 2015. Soon our containers were up and running and things looked good, except that none of us had any idea how to maintain this black box. When Deis eventually broke down a few months later, we were back at square one.
The journey to making our own platform began with a proof of concept (PoC) in early 2016 to gauge the size of this possibly massive undertaking. Much to my delight, the PoC required only two days of tinkering and could potentially meet our minimum requirements. This result encouraged me to continue building a PaaS from readily available open source components running on a single cloud provider.

The main software components are Docker Swarm, Docker Flow Proxy and SSSD. Amazon Web Services provides load balancing (ELB), instances (EC2), a private network (VPC), firewalls (SG), encryption keys (KMS), object storage (S3), relational databases (RDS PostgreSQL), SSL certificates (ACM), caching (ElastiCache Redis, Memcached), persistent storage (EBS) and logging (CloudWatch), among other things. Docker Hub serves as a private container registry for image backups. Apache handles authentication with Google Suite to provide a Single Sign-On experience for our employees' intranet. Red Hat's System Security Services Daemon (SSSD) performs the SSH key lookups against our employee database (LDAP) needed for authentication. Docker Flow Proxy listens to the Docker API to keep a tally of running services and handles service discovery by routing service domains to their respective swarm service node ports. Docker Swarm is the heart of the system, providing the orchestration for deploying and running containers at scale. A touch of scripting and four weeks later, the first iteration was running our internal services.
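To make the service discovery flow concrete, here is a hedged sketch of how a service could be registered so Docker Flow Proxy picks it up; the overlay network name, domain and port are illustrative assumptions, not our actual configuration:

```shell
# Sketch: deploy a swarm service so Docker Flow Proxy can route traffic to it.
# Network name, service name, domain and port are assumptions for illustration.
docker network create --driver overlay proxy   # overlay network shared with the proxy

docker service create \
  --name retirement-plan \
  --network proxy \
  --label com.df.notify=true \                                   # announce this service to the proxy
  --label com.df.serviceDomain=retirement-plan.lb.example.com \  # domain routed to this service
  --label com.df.port=8080 \                                     # container port traffic is forwarded to
  bitcoin-miner:1.0
```

The `com.df.*` labels are what lets the proxy map an incoming domain to the right swarm service without any manual web server configuration.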
The main goals for the PaaS were:
- Provide installation, maintenance and recovery that can be done and understood by anyone in IT
- Hosting and ecosystem from a trusted cloud provider
- Immutable backups of software deployments
- Developer happiness with an empowering client interface
- Personal learning of Docker and AWS ecosystems
These goals were met by:
- Creating an installer in Bash that ties all the pieces together and also performs any maintenance. Disaster recovery is done by creating a new installation that runs a copy of the previous deployment. All with a single click.
- Tapping into the AWS ecosystem. Our staff likes their nightly sleep and we intend to keep it that way.
- Docker images are automatically backed up to a private registry, configuration variables are stored in S3, data persistence is handled by RDS and asset persistence by EBS; no permanent state is kept in Docker Swarm itself. Failure of the stateless PaaS is not a problem.
- A client/server interface for swiftly pushing software to our intranet or to the public internet. The CLI is available to all company personnel without any account registration, thanks to public keys stored in our LDAP. No need to contact IT to start using the PaaS.
- Currently we host over 50 services built with different tools, operating systems and programming languages. While individual service deployments lead to higher costs, as cloud pricing is based on RAM, CPU and additional resources like load balancers, a reserved pool of Docker Swarm nodes hosting these containers behind a single elastic load balancer brings cost savings as the number of services increases. This quickly adds up to over 10x monthly savings compared to generic platform solutions.
- Our own PaaS — the futuswarm — is born. Making stuff is the best way to learn.
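The registration-free SSH access works by pointing sshd at SSSD for public key lookups against LDAP. A minimal sketch of the two config excerpts involved; the domain name, LDAP URI and attribute name are illustrative, not our actual settings:

```shell
# /etc/ssh/sshd_config (excerpt): let sshd fetch a user's public keys from SSSD
AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
AuthorizedKeysCommandUser nobody

# /etc/sssd/sssd.conf (excerpt): illustrative values only
[sssd]
services = nss, pam, ssh
domains = example

[domain/example]
id_provider = ldap
ldap_uri = ldap://ldap.example.com
ldap_user_ssh_public_key = sshPublicKey   # LDAP attribute holding the user's SSH key
```

With this in place, adding an employee's key to LDAP is all it takes to grant them access.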
Now, after creating an image recipe (Dockerfile), deployment to https://retirement-plan.lb.example.com is as easy as:
$ futuswarm app:deploy --image bitcoin-miner --tag 1.0 --name retirement-plan
The next person to update the software only needs to increment the tag (e.g. --tag 1.1) to get the new changes live. This is a reproducible application deployment process from local development to production, utilizing the best of open source configured to our internal requirements. Our platform will be open sourced soon. This is not to say there aren’t better solutions out there, but we’ve laid the groundwork for better internal practices and tailored the hosting environment to our needs.
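As a sketch, the full update cycle might look like this; the build and registry push steps are assumptions about the surrounding workflow, not part of the futuswarm CLI itself:

```shell
# Hypothetical update cycle for the example service above.
docker build -t bitcoin-miner:1.1 .   # build a new image from the updated Dockerfile
docker push bitcoin-miner:1.1         # assumes this tag maps to a registry you can push to
futuswarm app:deploy --image bitcoin-miner --tag 1.0 --name retirement-plan   # previous release
futuswarm app:deploy --image bitcoin-miner --tag 1.1 --name retirement-plan   # roll out the update
```

Because each tag is an immutable image backed up to the registry, rolling back is just redeploying the previous tag.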
Having dared a peek into the darkness of the rabbit hole, knowledge of the boring parts and a bit of programming magic helped create a system that ticks all the boxes for our IT's maintenance happiness while giving our developers a tool to craft and showcase their creations with less pain.
EDIT 28.12: We're live: https://github.com/futurice/futuswarm