In the pre-cloud days of monolithic environments and waterfall software development, code deployments were isolated, manageable events. Code was deployed to a specific piece of infrastructure, and the operations team would monitor its impact once in production. This was limiting in many ways, but it meant that understanding the impact of new code was straightforward — operations teams knew where to look, and what to do, if a new code deployment went bad.
In today’s world of dynamic cloud environments and DevOps practices, teams can deploy to a single, well-identified microservice instead of the whole monolith. That microservice potentially serves tens of upstream services and runs on an auto-scaled fleet of tens or hundreds of containers or hosts. While this provides more control over functionality, performance, and the cost of operation, the level of interconnectedness and complexity increases the overall risk of making a mistake.
This dynamism has led to a new world of modern deployment strategies, and given the pace of innovation, these strategies have gone from “nice-to-have” to “need-to-have” in most modern software-backed businesses. So how can engineering teams enter this new world with confidence and reap the benefits while keeping the risks under control?
New deployment strategies allow for speed and scale
While there are key differences between each of these new deployment strategies, they all reflect the distributed, dynamic, and collaborative nature of software development in the cloud, which allows for continuous improvement to applications over time.
The most common modern deployment strategies are:
Gradual Deployment: in which code is deployed gradually, to small increments of infrastructure — a few containers at first, for example — to make sure that nothing goes wrong before deploying more widely.
Canary Deployment: in which code is deployed to specific instances which act as the “canary in the coal mine” — letting teams know what might happen if the code is deployed to a larger number of similar instances.
Blue/Green Deployment: which involves running old code in the background while deploying new code at the same time, with the old code as a backup in case things go wrong.
A/B Deployment: which utilizes two new deployments running at the same time so they can be compared for their respective performance.
Each of these strategies enables faster development of new applications and features because they offer the opportunity to analyze new deployments and roll them back if necessary. When teams can deploy quickly to targeted infrastructure, they can minimize the impact of unseen issues, identify and fix those issues, then rapidly re-deploy to further refine their code.
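As a rough illustration of the gradual approach, the sketch below deploys to growing fractions of a fleet and rolls everything back as soon as any updated host looks unhealthy. The function name, the callbacks, and the batch fractions are all hypothetical stand-ins for whatever your deployment tooling provides; this is a minimal sketch, not a production rollout controller.

```python
from typing import Callable, List, Sequence


def gradual_rollout(
    hosts: List[str],
    deploy: Callable[[str], None],      # hypothetical: pushes new code to one host
    healthy: Callable[[str], bool],     # hypothetical: health check for one host
    rollback: Callable[[str], None],    # hypothetical: restores old code on one host
    batch_fractions: Sequence[float] = (0.1, 0.5, 1.0),
) -> bool:
    """Deploy in growing batches; roll back every updated host on failure."""
    done: List[str] = []
    for fraction in batch_fractions:
        # Cumulative number of hosts that should be updated after this batch.
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(done):target]:
            deploy(host)
            done.append(host)
        # Check every host updated so far before widening the rollout.
        if not all(healthy(h) for h in done):
            for h in done:
                rollback(h)
            return False
    return True
```

A canary deployment can be seen as the special case where the first batch is a small, deliberately chosen set of instances that is observed for longer before the rollout continues.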
It sounds simple, but reality is always more challenging. In fact, multiple teams, or even individual engineers, can decide to deploy various services in parallel, which can make it difficult to keep track of everything.
So let’s look at a specific example to see how this works in practice.
Blue/green deployment in action
Imagine a Blue/Green deployment of a critical update to an application. The new code is deployed to segregated containers while the existing code runs uninterrupted, then traffic is switched over to the new code. The DevOps or SRE (site reliability engineering) teams responsible will watch what happens, testing the performance of the code under varying conditions and reverting to the existing code as a backup if necessary. When the new code is fully validated (checked for performance, errors, and functionality), the existing code can be deprecated, or left alone for a future deployment.
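The switch-and-revert flow described above can be sketched in a few lines. The `Router` class is a hypothetical stand-in for a load balancer or service mesh that sends all traffic to one environment, and `validate` stands in for whatever performance, error, and functionality checks your team runs; neither reflects a specific product's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Hypothetical traffic router: all requests go to the `live` environment."""
    live: str = "blue"


def blue_green_switch(router: Router, validate: Callable[[str], bool]) -> str:
    """Cut traffic over to the idle environment; revert if validation fails."""
    previous = router.live
    candidate = "green" if previous == "blue" else "blue"
    router.live = candidate        # switch traffic to the new code
    if not validate(candidate):    # e.g. check errors, latency, functionality
        router.live = previous     # old code is still running, so reverting is cheap
    return router.live
```

Because the old environment keeps running during validation, the revert is a routing change rather than a redeployment, which is the main appeal of the strategy.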
With a more traditional deployment strategy, teams only need to consult data from an individual deployment over a limited period of time. But in the Blue/Green strategy described above, monitoring is more complex, because two versions of the code run side by side and each must be observed. To handle this complexity, teams need robust monitoring, so that all the relevant data, across all the deployments and for as long as they run, is collected and organized for both real-time and post-mortem investigation.
Start with the data, collected across deployment versions
Most teams will already have their monitoring tools and processes in place to collect, organize, and visualize their data. To improve visibility for modern deployment strategies, tags need to be applied to each deployment, enabling comparisons between them. No deployment can be left unwatched: when troubleshooting across multiple deployments and varying versions of your code, you need full visibility and the ability to toggle easily between them.
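To make the tagging idea concrete, here is a minimal sketch in which every metric point carries a dictionary of tags, including a `version` tag, so that values can be split by deployment. The `MetricPoint` type, the tag keys, and the `group_by_tag` helper are illustrative assumptions, not the data model of any particular monitoring tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class MetricPoint(NamedTuple):
    """Hypothetical metric sample with free-form tags."""
    name: str
    value: float
    tags: Dict[str, str]


def group_by_tag(points: Iterable[MetricPoint], tag: str) -> Dict[str, List[float]]:
    """Split metric values by one tag so deployments can be compared."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for point in points:
        # Untagged data is surfaced explicitly rather than silently dropped.
        grouped[point.tags.get(tag, "untagged")].append(point.value)
    return dict(grouped)


points = [
    MetricPoint("latency_ms", 120.0, {"service": "checkout", "version": "v1"}),
    MetricPoint("latency_ms", 95.0, {"service": "checkout", "version": "v2"}),
    MetricPoint("latency_ms", 130.0, {"service": "checkout", "version": "v1"}),
    MetricPoint("latency_ms", 200.0, {"service": "checkout"}),
]
by_version = group_by_tag(points, "version")
```

The "untagged" bucket is a deliberate choice in this sketch: it makes any deployment that slipped through without a version tag visible instead of hiding it.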
The data collected also needs to be granular — to identify root causes, depth needs to complement breadth. Metrics, logs, and traces should be collected from individual endpoints and requests, so you can zoom in to specific issues and view them in context with the broader environment and the multiple deployments you’re running.
Once you’re collecting data at a granular level and implementing monitoring across every deployment, you’ll be able to conduct fine-grained troubleshooting and remediation, no matter how many deployments you run.
Monitoring modern deployment strategies, in practice
Having rich data from the breadth of all your deployments is just the beginning. The way you make use of that data for troubleshooting is also important to managing modern deployment strategies.
First, you’ll need to identify a set of “golden signals” — the data you’re collecting that provides the strongest indication of a deployment’s performance. Requests, latency, and error rates are the most common, while infrastructure metrics and code-level performance changes from application traces are also important.
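As a sketch of what computing these signals per version might look like, the example below derives request count, error rate, and 95th-percentile latency from a list of request records. The `Request` record, its field names, and the choice of a nearest-rank p95 are assumptions made for illustration; real monitoring tools typically compute these aggregates for you.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    version: str
    latency_ms: float
    is_error: bool


def golden_signals(requests: Iterable[Request], version: str) -> Dict[str, float]:
    """Request count, error rate, and nearest-rank p95 latency for one version."""
    sample = [r for r in requests if r.version == version]
    if not sample:
        return {"requests": 0, "error_rate": 0.0, "p95_latency_ms": 0.0}
    latencies = sorted(r.latency_ms for r in sample)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integers.
    rank = (95 * len(latencies) + 99) // 100
    return {
        "requests": len(sample),
        "error_rate": sum(r.is_error for r in sample) / len(sample),
        "p95_latency_ms": latencies[rank - 1],
    }
```

Calling `golden_signals` once per running version gives directly comparable numbers, which is the point of tagging each deployment in the first place.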
Next, you’ll need to orient your monitoring practices around maintaining and observing multiple deployment “versions” that run concurrently. This approach to monitoring shouldn’t be centered around specific events in time, but rather around ongoing observation of the various deployments.
Then, when troubleshooting issues that arise from new or parallel deployments, you’ll want easy access to other information from across your observability stack, so you can identify differences between all microservice versions. This way, logs, traces, and code profiles can be compared and contrasted on the fly to remediate an issue.
And finally, you’ll want a robust program of predetermined alerts and pre-built dashboards, so your team can be notified of the issues that matter most and visualize what’s happening in real time.
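A predetermined alert for concurrent versions can be as simple as comparing a new version's golden signals against the baseline version. The sketch below assumes each version's signals arrive as a dictionary with `error_rate` and `p95_latency_ms` keys; those keys and the default thresholds are hypothetical and would need tuning for a real service.

```python
from typing import Dict, List


def regression_alerts(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_error_increase: float = 0.01,   # assumed threshold: +1 percentage point
    max_latency_ratio: float = 1.2,     # assumed threshold: 20% slower
) -> List[str]:
    """Return the names of signals where the candidate regresses past a threshold."""
    alerts: List[str] = []
    if candidate["error_rate"] - baseline["error_rate"] > max_error_increase:
        alerts.append("error_rate")
    if candidate["p95_latency_ms"] > baseline["p95_latency_ms"] * max_latency_ratio:
        alerts.append("p95_latency_ms")
    return alerts
```

An empty list means the candidate is within tolerance on both signals; a non-empty list names what to look at first, which keeps notifications focused on the issues that matter most.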
Deploy with confidence
Code deployments are no longer discrete events with bracketed impacts; the process of deploying code and monitoring its effects is truly continuous. To use these new strategies with confidence, you’ll need a well-structured approach to monitoring, allowing for quick comparisons of golden signals across every single service version, and leveraging rich investigative data when something seems off. With these practices in place, the benefits of modern code deployment will begin to accrue: features shipped faster, errors avoided, and customers kept satisfied.
Renaud Boutet, VP of Product, Datadog