Published June 2, 2016

To production in 3 minutes: devops in the SMP team

Are you looking for a development process for a small team focused on shipping code quickly and reliably? Is Continuous Deployment something you are considering? Here is how we, the Schibsted Media Platform team, implemented it.


The SMP Create/Discover team in Kraków develops the backend of the Schibsted Media Platform (SMP) project. SMP is used by multiple newsrooms within Schibsted Media Group to discover and publish news content.

Most teams about to start a new project need to figure out which technologies to use and which processes to put in place to achieve their goals. In many cases organizations try to establish a uniform workflow for all teams in order to maintain control. If that happens, you may inherit a set of procedures that don’t fit your vision of how you would like to work. Your choice of software may be limited by the licenses your organization already owns. Even the hardware that will run your programs may already have been purchased, without being tailored to your real needs.

Building it from the very beginning

But what if you could start from scratch? For example: imagine you have a team of six developers sharing a goal of shipping high-quality software, using the agile way – and you may choose whatever process and tools you like. What would you choose?

That’s exactly the situation we found ourselves in about 3 years ago. In terms of technologies, processes and culture we were building from the ground up having simplicity, transparency and trust in mind. In this article we will describe the development process and deployment pipeline we currently use in our daily work. To make it more digestible we’ll break the article into three parts.

Local development

Our team is distributed between Stockholm, Kraków and Oslo. That’s why we picked Git as our version control system and GitHub as our repository host. GitHub has great code review and collaboration features that scale well across many locations, and we use them heavily. Because of its excellent integration with GitHub, we use Travis for Continuous Integration.

Our Git workflow is pretty simple. The first rule is to never commit directly to master, which should always contain tested and deployable code. All feature development and bug fixes happen in short-lived feature branches. After the code has been written and unit tested, we also run integration tests on our development machines. We use Foreman for that, because we host our services on Heroku. Once our changes are tested, we push the local branch to GitHub. Next, we do a quick check and, if everything looks good, we open a Pull Request (PR). Travis immediately notices that a PR has been created and starts a build of the related branch. Once it has finished, the build result is shown directly in the originating pull request:

Build results inside Pull Requests
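
The local part of this workflow can be sketched with a few shell helpers. This is only an illustration under stated assumptions: the function names and the branch name are hypothetical, the remote is assumed to be called origin, and the repository is assumed to contain a Procfile for Foreman to read.

```shell
# Hypothetical helpers sketching the local workflow; not an actual team script.

# Create a short-lived feature branch from an up-to-date master.
# $1: name of the new branch
start_feature() {
  git checkout master &&
  git pull origin master &&
  git checkout -b "$1"
}

# Start the processes declared in the Procfile, roughly as Heroku would,
# so that integration tests can run against the service locally.
run_locally() {
  foreman start
}

# Publish the branch to GitHub; the Pull Request is then opened from it.
# $1: name of the branch to push
publish_feature() {
  git push -u origin "$1"
}
```

A typical session would call start_feature fix-login-timeout, make and test the change with the service started by run_locally, and finish with publish_feature fix-login-timeout.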

Although we all like to ship stuff fast, we believe that correctness trumps speed. Every piece of code that is expected to reach production needs to go through the code review first. It is mandatory for at least one other developer to take a look at the PR and, if necessary, provide relevant remarks. When the developer and the reviewer agree that the code needs no more improvements, the owner of the PR triggers a merge to the master branch.

Development workflow

Deployment pipeline

When a new commit lands in master, Travis automatically starts a build. Every code repository contains a .travis.yml file, which holds the Travis configuration and describes how to build the service. In that file we instruct the CI tool to automatically start a build for the latest commit to master and, provided the build is successful, to deploy it to Heroku via git push. Heroku compiles the code and produces an executable package, which is then automatically transferred to our stage environment and executed. The stage environment adds a layer of safety: it lets us verify that the updated service cooperates correctly with related services.
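
For illustration, a .travis.yml along these lines could express such a setup. It is a sketch, not a copy of a real configuration: the language, the application name and the encrypted key are placeholders, and the strategy setting is an assumption chosen to match the git push deployment described above.

```yaml
# Sketch only: language, app name and key are placeholders.
language: node_js              # assumption; use whatever the service is written in

deploy:
  provider: heroku
  strategy: git                # deploy by pushing the commit to Heroku over git
  app: example-service-stage   # hypothetical name of the stage application
  api_key:
    secure: "ENCRYPTED_HEROKU_API_KEY"   # placeholder for a key encrypted with the Travis CLI
  on:
    branch: master             # deploy only builds of master
```

Travis runs the deploy step only when the build itself has passed, so a failing test suite never reaches the stage application.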

The next step is to move the package to production. For this we use a nice Heroku feature called pipelines. They allow connecting Heroku applications in a chain representing the steps of the deployment. This is how it looks in the Heroku dashboard:

Pipeline containing one stage application corresponding to one production application

All of our services use pipelines and are deployed to Heroku as at least two applications: one for stage and at least one for production. There are cases where we need more than one production application per service so that we can scale it individually on a per-customer basis.

Pipeline containing one stage application corresponding to many production applications

Promoting from stage to production is as simple as pressing a button in the GUI or, as we prefer to do, typing a command in the terminal:

heroku pipelines:promote --app [name_of_stage_application]
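
When a pipeline has several production applications, the same command accepts a --to flag that names the targets explicitly. A minimal sketch follows; the function name and the application names in the usage note are hypothetical.

```shell
# Promote the stage application to selected production applications
# in the same pipeline.
# $1: stage application
# $2: comma-separated list of production applications
promote_to() {
  heroku pipelines:promote --app "$1" --to "$2"
}
```

For example, promote_to example-stage example-prod-a,example-prod-b would release the current stage build to two production applications and leave any others untouched.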

Another useful Heroku feature that we use is called preboot. It helps to achieve zero-downtime deployments by ensuring that the new version of the app is started (and receiving traffic) before terminating the previous one.
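
Preboot is switched on per application from the Heroku CLI. The helper below is a small sketch; its name is hypothetical, and it simply enables the feature and then lists the application's features so the new state can be checked.

```shell
# Turn preboot on for one application, then list its features to confirm.
# $1: application name
enable_preboot() {
  heroku features:enable preboot --app "$1" &&
  heroku features --app "$1"
}
```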

That’s it. For most of our services, the trip from merging a Pull Request to running in the production environment takes less than 3 minutes. Of course, in most cases we take our time and perform additional tests in the stage environment.

To provide visibility and make the whole deployment process fully transparent we’ve configured notifications delivered to our chat application (currently Slack). These notifications are fired in the following cases:

  • by Travis, when a build of the master branch fails.
  • by Heroku (via deploy hooks), when a service was deployed to stage or production.
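
As a rough sketch, an HTTP deploy hook can be attached to an application from the Heroku CLI. The function name is hypothetical, and the URL is assumed to be an endpoint supplied by the chat tool's Heroku integration; neither is taken from an actual setup.

```shell
# Register an HTTP deploy hook so that Heroku sends a POST request
# to the given URL after each deploy of the application.
# $1: application name
# $2: URL that receives the notification
add_deploy_hook() {
  heroku addons:create deployhooks:http --url="$2" --app "$1"
}
```

Running it once for the stage application and once for each production application would cover both cases in the second bullet.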

Deployment workflow

Life After The Deployment

Our team doesn’t have dedicated people for operations; every team member is involved in it. We also take turns on 24/7 on-call duty, and we all have admin access to all the services. Everyone should feel responsible for the quality and health of the whole system.

Currently, we have about 30 different services in production. Wherever it makes sense, we embrace eventual consistency and use asynchronous communication between the services. A deployment with a bug might affect other services, and when that happens the problem can be hard to detect quickly and to correlate with the deploy that caused it. We believe that proper monitoring helps to mitigate this problem. All of our services are instrumented and push various kinds of metrics to a monitoring service, Librato, in real time. Librato converts the metrics to graphs that we display on large monitors in our office. This way we can spot any deviations from standard behaviour after a deploy.

 

Application metrics from Librato on monitors in our office

It’s also worth mentioning that Heroku has a useful Metrics dashboard, which provides detailed graphs of each application’s performance in terms of response times, throughput and memory usage.

Application metrics provided by Heroku

It’s the developer’s responsibility to keep an eye on those metrics after the deploy and act in case of issues. Heroku helps us a lot by keeping a log of all the previous deploys. It allows running an instant and automatic rollback to one of the previous versions with just one click or command.
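
From the terminal, a rollback could look roughly like this. The helper names and the release number in the usage note are hypothetical; the underlying commands are the Heroku CLI's release listing and rollback.

```shell
# List recent releases of an application to find the version to return to.
# $1: application name
list_releases() {
  heroku releases --app "$1"
}

# Roll the application back to a given release.
# $1: application name
# $2: release to roll back to, for example v41
rollback_to() {
  heroku releases:rollback "$2" --app "$1"
}
```

For example, list_releases example-prod followed by rollback_to example-prod v41 would restore the release labelled v41.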

Of course, we don’t rely only on watching the charts manually: we also have a large set of Librato alerts. When something strange is going on, an alert triggers a notification that is forwarded to the person on call.

One thing that we are still missing is the correlation of events between multiple services. We don’t have it yet because our logs are stored on a per-service basis and not aggregated in a single place. That’s one of the things we would like to change in the future.

Conclusion

The process described above, although pretty simple, works very well for us. Even though our team has doubled to about twelve people, the setup seems to scale well. We value simplicity: it allows us to deploy our changes multiple times per day and get fast feedback from our users. By using the right tools, together with a culture of mutual trust, we have built an environment where we don’t have to waste time on pointless procedures or tedious release planning. Instead, we can focus on what we really want to do: writing code and building amazing new features. And we have managed to achieve this without compromising the quality of our products.

We recommend our approach to building software to every team that is either starting a new project or is tired of its current process and wants to become more agile, in the true sense of that word.
