Data-driven Post-mortems
Using data to learn from failure
Jason Yee
Jason is a technical writer and evangelist at Datadog, where he works to inspire developers and ops engineers with the power of metrics and monitoring. He’s also a co-organizer of DevOpsDays Portland. When he’s not speaking at conferences or helping organize them, he likes to spend time on planes “travel hacking” and hunting for interesting, regional whiskey.
The DevOps movement has influenced not only the tools we use in modern development and operations engineering, but also how we work, including how we respond when systems inevitably stop working or don't work as expected. Although "blameless" post-mortems are commonly espoused in the DevOps community, they often tell us only what not to do (blame) rather than what we should do. This session will provide methods and techniques for gathering information and using it effectively to avoid and mitigate failure in the future.
I'll cover best practices for gathering systems-related data, including monitoring and logging. This session will also cover practices for gathering and recording people-related data, including methods we can adopt from police, accident investigators, and other safety management professions to learn the most from incidents.
After covering how to gather data, I'll discuss how we can use it to formulate actionable response plans and how to adjust existing organizational practices to avoid repeating failure.
I plan to keep the technical portions of this talk at a novice level so that it's accessible to both developers/engineers and those in non-technical roles who will be involved in incident response.
- Date: 2016 November 11 - 15:30
- Duration: 1 h
- Room: Room 3180
- Conference: Seattle GNU/Linux Conference 2016
- Language:
- Track:
- Difficulty: Easy
- The Set of Programmers: How Math Restricts Us
  - Start Time: 2016 November 11 15:30
  - Room: Room 3178
- Your Resume Is Code
  - Start Time: 2016 November 11 15:30
  - Room: Room 3184
- Rust flies a rocket!
  - Start Time: 2016 November 11 15:30
  - Room: Room 3183
- What are Observables and Why Should I Care?
  - Start Time: 2016 November 11 15:30
  - Room: Room 3179