Answer-driven DevOps automation can help enterprises accelerate insight and reduce the risk of unexpected downtime. Here’s how.
As organizations mature on their digital transformation journey, they begin to realize that automation – specifically, DevOps automation – is critical for rapid software delivery and reliable applications.
But as multicloud environments grow, they become increasingly complex and generate massive amounts of data. In turn, manual approaches to identifying code issues and troubleshooting are not scalable. Organizations can’t manage their cloud environments effectively with these traditional approaches.
“There are too many manual processes,” said Michael Winkler, senior principal in product management at Dynatrace, during a Perform 2023 breakout session. “In fact, this is one of the major things that [hold] people back from really adopting DevOps principles.”
Indeed, according to the 2023 Global CIO Report, 55% of respondents reported that they are often sacrificing code quality, reliability, or security to meet the demand for rapid software delivery.
To quell the volume of manual tasks and free up time for innovation, more site reliability engineering (SRE) and DevOps teams are using automation. But DevOps automation isn’t just about reducing manual effort and increasing process speed; it’s also about gaining insight with answer-driven operations.
So, how do organizations move from simple automation to more mature models that deliver on decision-making potential? In the Perform 2023 session “Answer-driven automation with Dynatrace for DevOps and SRE,” Winkler shared the stage with Saif Gunja, director of product marketing at Dynatrace, to explore DevOps automation use cases and how teams can progress from initial application to accelerated insight.
Maturity: Addressing the DevOps automation obstacle
Though numerous organizations have invested in automation, DevOps teams still face challenges.
According to the Dynatrace 2023 Global CIO Report, 31% of DevOps teams’ time is spent on manual tasks. That figure persists despite an average investment of $9.1 million in automation across development, security, and operations, and an expected 35% average increase in automation investment by 2024. Meanwhile, the Gartner 2022 State of Infrastructure and Operations (I&O) Automation report indicates that just 21% of I&O leaders report high success in their automation endeavors.
The disconnect between dollars spent and confidence in automation capabilities is attributable to DevOps maturity levels. Organizations that have just gotten started with automation often rely on scheduled frameworks that use low-code and no-code solutions to automate processes, such as reporting and infrastructure routines. Event-driven automation is typically the next stage in DevOps automation maturity, adding functions as a service to handle problem remediation and threat protection. Answer-driven DevOps automation rounds out the maturity model with change impact analysis and progressive delivery solutions that enable DevOps teams and site reliability engineers (SREs) to take targeted actions that improve results.
The challenge: while many organizations have implemented scheduled automation, event-driven frameworks are far less common and answer-driven developments remain rare. As a result, many organizations don’t use automation to its full potential. They still see gains over non-automated processes, but limited maturity leaves much of automation’s value unrealized.
Key players: SREs and DevOps professionals
SREs and DevOps professionals both benefit from implementing automation. DevOps automation can help them complete priorities efficiently while mitigating pain points.
For example, DevOps teams aim to release secure and high-performing applications and services faster. As part of achieving this goal, they’re interested in shorter feedback loops and self-service automation. But pain points — including too many manual processes, buggy software that escapes into production, and pipeline complexity — often frustrate these efforts.
For SREs, meanwhile, the top priorities are resiliency, reliability, and automation. But app downtime, security vulnerabilities, and incident management pose obstacles for SRE experts. Brand reputation is also a factor, as downtime and reliability issues can hinder customer experience. “From a pain point perspective,” Gunja said, “it’s about downtime and mean time to recovery [MTTR]: things that nobody wants to be working on, but we spend most of the time doing.”
Answer-driven DevOps automation use cases
Effective DevOps automation reflects a four-stage, feedback-based process: sense, think, act, and optimize.
The following automation use cases can help teams understand what answer-driven SRE and DevOps automation look like in practice:
Automation use case 1: Targeted notification and collaboration
The goal of targeted notification and collaboration is to automatically notify the correct teams and provide them with the context necessary for faster triaging. Ideally, this notification and collaboration should extend to any event type and status. It should also include context-specific actions, such as notifying applicable teams, searching for existing tickets, and creating new tickets as necessary.
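As a rough illustration of the routing logic described above, the following Python sketch maps an event to an owning team, looks for an existing ticket, and creates one only if none exists. Everything named here (the Event fields, the OWNERS map, the in-memory TicketStore) is a hypothetical stand-in for illustration, not a Dynatrace or ticketing-system API.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    service: str   # affected service, e.g. "checkout"
    kind: str      # event type, e.g. "availability" or "security"
    status: str    # event status, e.g. "open" or "resolved"
    summary: str   # context passed along to the notified team


@dataclass
class TicketStore:
    """In-memory stand-in for a ticketing system (hypothetical)."""
    tickets: dict = field(default_factory=dict)

    def find_open(self, service, kind):
        # Returns the ticket ID for this service/kind pair, or None.
        return self.tickets.get((service, kind))

    def create(self, event):
        ticket_id = f"TICKET-{len(self.tickets) + 1}"
        self.tickets[(event.service, event.kind)] = ticket_id
        return ticket_id


# Hypothetical ownership map; unknown services fall back to a default team.
OWNERS = {"checkout": "team-payments", "login": "team-identity"}
DEFAULT_OWNER = "team-sre"


def route_event(event, store):
    """Pick the team to notify and reuse or create a ticket for the event."""
    team = OWNERS.get(event.service, DEFAULT_OWNER)
    ticket = store.find_open(event.service, event.kind)
    created = False
    if ticket is None and event.status == "open":
        ticket = store.create(event)
        created = True
    return {"team": team, "ticket": ticket,
            "created": created, "context": event.summary}
```

Calling route_event twice with the same open event reuses the first ticket rather than opening a duplicate, which is the "search for existing tickets, create new tickets as necessary" behavior in miniature.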
Automation use case 2: Closed-loop remediation
The goal of closed-loop remediation is to reduce MTTR and return to a steady state. “You have a workflow that’s defined,” said Gunja, “and you have a trigger, such as a Davis-detected problem, which triggers a particular workflow.” This workflow is designed to collect problem information, analyze problem details, and remediate the problem where possible without human intervention.
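The collect, analyze, and remediate steps of such a workflow might look something like the Python sketch below. The problem payload shape, the REMEDIATIONS table, and the execute callback are all assumptions made for illustration; a real workflow would be triggered by the monitoring platform and would call actual runbook tooling.

```python
# Hypothetical mapping from a diagnosed symptom to a remediation action.
REMEDIATIONS = {
    "high_memory": "restart_service",
    "bad_deploy": "rollback_release",
}


def collect(problem):
    """Gather the problem details the later steps need."""
    return {"service": problem["service"], "symptom": problem["symptom"]}


def analyze(details):
    """Return a known remediation for the symptom, or None if there is none."""
    return REMEDIATIONS.get(details["symptom"])


def run_workflow(problem, execute):
    """Run collect -> analyze -> remediate; escalate when automation can't fix it.

    `execute` is a caller-supplied function (action, service) -> bool that
    performs the remediation and reports whether it succeeded.
    """
    details = collect(problem)
    action = analyze(details)
    if action is None:
        # No known fix: hand off to a human instead of guessing.
        return {"status": "escalated", "action": None}
    succeeded = execute(action, details["service"])
    status = "remediated" if succeeded else "escalated"
    return {"status": status, "action": action}
```

The escalation branch matters as much as the remediation branch: the loop closes without human intervention only "where possible", so unknown symptoms and failed fixes still reach a person.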
Automation use case 3: Change/release impact analysis
The more that organizations understand the impact of software changes and releases, the better equipped they are to improve the reliability and quality of the software they produce. With answer-driven automation, organizations can move beyond simple analysis to include trigger-based actions. For example, if release analysis delivers a “fail” result, it can trigger a rollback action. If it delivers a “pass” result, it can trigger a promote action. A “warning” result may trigger an approval action, followed up by further analysis.
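The pass, warning, and fail mapping described above can be sketched as a small release gate in Python. The use of error rate as the evaluated signal and the two threshold values are assumptions chosen for illustration; the action names simply mirror the promote, approval, and rollback actions in the text.

```python
from enum import Enum


class Result(Enum):
    PASS = "pass"
    WARNING = "warning"
    FAIL = "fail"


# Result -> follow-up action, mirroring the mapping described in the text.
ACTIONS = {
    Result.PASS: "promote",
    Result.WARNING: "request_approval",
    Result.FAIL: "rollback",
}


def evaluate_release(error_rate, warn_at=0.01, fail_at=0.05):
    """Classify a release by its error rate (thresholds are illustrative)."""
    if error_rate >= fail_at:
        return Result.FAIL
    if error_rate >= warn_at:
        return Result.WARNING
    return Result.PASS


def next_action(result):
    """Look up the action that a given analysis result should trigger."""
    return ACTIONS[result]
```

Keeping the classification separate from the action lookup means the thresholds can change without touching what each result triggers, and vice versa.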
Automation use case 4: Progressive delivery orchestration
Software releases happen in progressive steps. First, the software is released to internal teams. It then becomes available to limited test groups and finally to the public at large. At each step, DevOps teams must assess functions and features and ensure they’re delivered safely and securely. Here, the Dynatrace Site Reliability Guardian application can help. “The Site Reliability Guardian makes sure that every time you roll out and release features to new customers, everything is where you want it to be,” said Winkler.
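A staged rollout with a validation gate at each step could be sketched in Python as follows. The stage names follow the three audiences named above; the check callback is a hypothetical placeholder for whatever validation a team runs at each step and does not represent the Site Reliability Guardian API.

```python
# Rollout audiences in order, following the progression described in the text.
STAGES = ["internal", "limited_test_group", "public"]


def progressive_rollout(check):
    """Advance through STAGES, halting at the first stage whose check fails.

    `check` is a caller-supplied function stage -> bool that reports whether
    the release is healthy for that audience. A stage counts as completed
    only once its check passes.
    """
    completed = []
    for stage in STAGES:
        if not check(stage):
            return {"completed": completed, "halted_at": stage}
        completed.append(stage)
    return {"completed": completed, "halted_at": None}
```

For example, a check that fails for "limited_test_group" leaves the release completed for "internal" only, and the public audience never receives it.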
Three steps for successful DevOps automation
Moving from time-based or event-driven automation to answer-driven efforts starts by recognizing the importance of observability and security data in context. By understanding what’s happening and why, teams can better prepare to address potential issues and streamline key operations.
The following three steps can drive more successful DevOps automation:
1. Start simple, scale it out
Start small and simple. Doing too much too quickly creates complexity, making it difficult for teams to separate signals from noise and ultimately undermining automation’s benefits. By starting with a simple, use-case-based implementation, organizations can see what works and what doesn’t, then widen the implementation from there.
2. Automate what matters most
While automation benefits most processes, not every process has the same priority. As a result, organizations may find the most success by prioritizing what matters most. For instance, automating the processes surrounding issue identification and the return to steady operations plays a key role in reducing mean time to recovery (MTTR) and satisfying users.
3. Implement answer-driven frameworks
With processes identified and prioritized, the final step is to go beyond time- and event-driven options to implement answer-driven frameworks. Context is the key advantage of this step. By going beyond the event or the issue itself and placing it in the context of other operations, organizations can discover new ways to optimize processes and inform future decision making.