AWS Well-Architected Framework: Operational Excellence

Tal Shladovsky
June 2, 2023

TL;DR

The Operational Excellence pillar of the AWS Well-Architected Framework is focused on designing and operating systems to deliver business value by enabling continuous improvement of people, processes, and technology. It involves having well-defined processes, procedures, and automation to manage changes and respond to events, while regularly reviewing and refining these processes to optimize operations.

Learn about the design principles, the best practices, and how Lightlytics can help you achieve operational excellence on AWS.

Overview

The Operational Excellence pillar is focused on designing and operating systems to deliver business value by enabling continuous improvement of people, processes, and technology. It involves having well-defined processes, procedures, and automation to manage changes and respond to events, while regularly reviewing and refining these processes to optimize operations.

Design Principles

The Operational Excellence pillar includes several design principles that guide organizations toward operational excellence:

  • Perform operations as code: Use code to automate operations procedures and manage infrastructure as code to increase the reliability and repeatability of operations.
  • Annotate documentation: Automate the creation of documentation after every build, or automatically annotate hand-crafted documentation, so that the information about your infrastructure and applications stays current and operations stay efficient.
  • Make frequent, small, reversible changes: Use agile development practices to make small, reversible changes to infrastructure and applications to reduce risk and enable rapid recovery from errors.
  • Refine operations procedures frequently: Continuously refine and improve operations procedures to reduce manual intervention and increase efficiency.
  • Anticipate failure: Use automation and testing to detect and anticipate potential failures and develop plans to mitigate them.
  • Learn from all operational failures: Develop processes to capture and learn from operational failures to improve systems and processes over time.
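A rolling change brings three of these principles together: the procedure is code, each batch is small and reversible, and a health check anticipates failure. The sketch below is a minimal illustration under stated assumptions, not an AWS API: the `Host` record, the `apply_in_batches` function, and the health check are hypothetical, and "deploying" only edits fields in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    """Hypothetical stand-in for a server or instance that runs one software version."""
    name: str
    version: str
    history: List[str] = field(default_factory=list)  # earlier versions, used to roll back


def apply_in_batches(hosts: List[Host], new_version: str, batch_size: int,
                     healthy: Callable[[Host], bool]) -> bool:
    """Roll out new_version batch by batch.

    Returns True if every batch passed its health check, or False after
    reverting every host changed so far when any batch fails.
    """
    changed: List[Host] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            host.history.append(host.version)  # remember the old version first
            host.version = new_version
            changed.append(host)
        if not all(healthy(host) for host in batch):
            # Undo the change on every touched host, most recent first.
            for host in reversed(changed):
                host.version = host.history.pop()
            return False
    return True


# Usage: web-3 fails its (simulated) health check, so the rollout is reverted.
fleet = [Host(f"web-{i}", "1.0") for i in range(6)]
ok = apply_in_batches(fleet, "1.1", batch_size=2, healthy=lambda h: h.name != "web-3")
print(ok, [h.version for h in fleet])  # False ['1.0', '1.0', '1.0', '1.0', '1.0', '1.0']
```

Keeping batches small limits how many hosts a bad change can reach before the check stops it. In a real environment, the health check and the version switch would call your deployment tooling instead of editing in-memory fields.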

Best Practices

The Operational Excellence pillar includes four best practice areas:

  • Organization: This best practice area is focused on organizing and managing resources, roles, and responsibilities to optimize operations. It involves defining and communicating clear objectives and goals, establishing efficient communication channels, and ensuring that all team members have the necessary skills and resources to perform their roles effectively.
  • Prepare: This best practice area involves preparing for production changes before they occur. It includes defining and documenting operational procedures, establishing communication channels, and implementing change management processes to ensure that changes are made in a secure, agile, and efficient manner.
  • Operate: This best practice area focuses on running and monitoring systems to deliver business value. It includes implementing automation, monitoring tools, and effective communication channels to ensure that systems are reliable, efficient, and secure.
  • Evolve: This best practice area involves regularly reviewing and refining operational processes to optimize operations. It includes collecting and analyzing data and metrics to identify opportunities for improvement, as well as implementing continuous improvement processes to ensure that systems and processes are constantly evolving to meet changing business needs.
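The Evolve area depends on turning operational data into improvement priorities. As a minimal sketch, assuming incident records that carry a service name plus detection and resolution timestamps (the record shape, the sample incidents, and the 45-minute target are all hypothetical), the following computes mean time to recovery (MTTR) per service and flags the services that exceed the target, worst first.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical incident record: (service, detected_at, resolved_at)
Incident = Tuple[str, datetime, datetime]


def mttr_by_service(incidents: List[Incident]) -> Dict[str, timedelta]:
    """Mean time to recovery per service."""
    totals: Dict[str, timedelta] = defaultdict(timedelta)  # timedelta() is zero
    counts: Dict[str, int] = defaultdict(int)
    for service, detected, resolved in incidents:
        totals[service] += resolved - detected
        counts[service] += 1
    return {service: totals[service] / counts[service] for service in totals}


def improvement_candidates(incidents: List[Incident], target: timedelta) -> List[str]:
    """Services whose MTTR exceeds the target, slowest recovery first."""
    mttr = mttr_by_service(incidents)
    over_target = [service for service, t in mttr.items() if t > target]
    return sorted(over_target, key=lambda service: mttr[service], reverse=True)


t0 = datetime(2023, 6, 1, 9, 0)
incidents = [
    ("checkout", t0, t0 + timedelta(minutes=90)),
    ("checkout", t0, t0 + timedelta(minutes=30)),
    ("search", t0, t0 + timedelta(minutes=20)),
    ("auth", t0, t0 + timedelta(minutes=75)),
]
print(improvement_candidates(incidents, timedelta(minutes=45)))  # ['auth', 'checkout']
```

MTTR is only one possible signal; the same pattern applies to deployment failure rates, alarm counts, or any other metric your teams agree to review.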

How can Lightlytics help achieve the Operational Excellence pillar?

  • Lightlytics Discovery is your cloud asset inventory: the place where you can review all the resources in your environment and view their configurations, states, and dependencies, which helps you better understand how your environment is shaped and how it behaves. This makes it easier to organize and manage your resources and to plan ahead.
  • Lightlytics Simulation allows you to test the full impact of proposed infrastructure as code (IaC) changes against your running environment before you deploy, so you can validate that a change meets resilience and cost best practices as well as organizational standards.
  • Lightlytics Events is your AWS resource change-tracking monitor: it tracks cloud configuration changes against an exact, real-time model of your cloud environment, notifies you of changes in real time through your preferred communication tool (Slack, Microsoft Teams, or third-party tools such as PagerDuty, Opsgenie, and Splunk), and lets you review each change with its complete context and impact analysis.
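The general idea behind configuration change tracking can be illustrated by comparing two snapshots of resource configurations. The sketch below is generic and is not Lightlytics' implementation or API: the snapshot format, the `diff_snapshots` function, and the resource IDs are hypothetical, and notification is reduced to printing each event.

```python
from typing import Any, Dict, List

# Hypothetical snapshot shape: resource ID -> flat dict of configuration attributes.
Snapshot = Dict[str, Dict[str, Any]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> List[str]:
    """Describe resources added, removed, or modified between two snapshots."""
    events: List[str] = []
    for rid in sorted(after.keys() - before.keys()):
        events.append(f"ADDED {rid}")
    for rid in sorted(before.keys() - after.keys()):
        events.append(f"REMOVED {rid}")
    for rid in sorted(before.keys() & after.keys()):
        old, new = before[rid], after[rid]
        for attr in sorted(old.keys() | new.keys()):
            # A missing attribute compares as None on that side.
            if old.get(attr) != new.get(attr):
                events.append(f"MODIFIED {rid}.{attr}: {old.get(attr)!r} -> {new.get(attr)!r}")
    return events


before = {"sg-web": {"ingress_port": 443, "cidr": "10.0.0.0/16"},
          "bucket-logs": {"versioning": True}}
after = {"sg-web": {"ingress_port": 443, "cidr": "0.0.0.0/0"},
         "db-main": {"multi_az": False}}

for event in diff_snapshots(before, after):
    print(event)  # a real tracker would forward each event to a chat or paging tool
# ADDED db-main
# REMOVED bucket-logs
# MODIFIED sg-web.cidr: '10.0.0.0/16' -> '0.0.0.0/0'
```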

Conclusion

Operational Excellence is a continuous process. To build a successful organization, establish shared goals and ensure that everyone understands their role in achieving business outcomes and how they affect others' ability to succeed. Support your team members so that they, in turn, can support those business outcomes.

Treat every operational event and failure as an opportunity to improve your architecture's operations. To better prepare your operations and respond more effectively when incidents occur, understand your workloads' needs, predefine runbooks for routine activities, use playbooks to guide issue resolution, leverage AWS's operations as code features, and maintain situational awareness.
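Predefined runbooks are one concrete form of operations as code. A minimal sketch, assuming a runbook is an ordered list of named steps that each return success or failure (the `run_runbook` function and the disk-cleanup steps and thresholds are hypothetical, and the steps only simulate their effect), runs each step in turn and stops at the first failure, which is the point where a playbook or an operator would take over.

```python
from typing import Callable, Dict, List, Tuple

# A step is a name plus an action that returns True on success (hypothetical shape).
Step = Tuple[str, Callable[[Dict[str, int]], bool]]


def run_runbook(steps: List[Step], context: Dict[str, int]) -> List[Tuple[str, str]]:
    """Run steps in order and stop at the first failure so an operator can take over."""
    log: List[Tuple[str, str]] = []
    for name, action in steps:
        succeeded = action(context)
        log.append((name, "ok" if succeeded else "failed"))
        if not succeeded:
            break
    return log


def purge_old_logs(context: Dict[str, int]) -> bool:
    # Simulated effect only: pretend that deleting old logs frees 20% of the disk.
    context["disk_used_pct"] -= 20
    return True


# Hypothetical runbook for a routine "disk almost full" alarm.
disk_runbook: List[Step] = [
    ("verify_alarm", lambda ctx: ctx["disk_used_pct"] >= 90),
    ("purge_old_logs", purge_old_logs),
    ("confirm_recovery", lambda ctx: ctx["disk_used_pct"] < 80),
]

print(run_runbook(disk_runbook, {"disk_used_pct": 95}))
# [('verify_alarm', 'ok'), ('purge_old_logs', 'ok'), ('confirm_recovery', 'ok')]
print(run_runbook(disk_runbook, {"disk_used_pct": 100}))
# [('verify_alarm', 'ok'), ('purge_old_logs', 'ok'), ('confirm_recovery', 'failed')]
```

Returning a step-by-step log, rather than only a pass/fail flag, gives the responder and any later retrospective a record of exactly where the routine procedure stopped working.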

Focus on incremental improvement, driven by changing priorities and by lessons learned from event response and retrospective analysis, to increase the efficiency and effectiveness of your activities and help your business succeed.
