BY OUR POWERS COMBINED: OBSERVABILITY FOR DEVELOPERS @CrayZeigh — #CodeMash2020

HI, CODEMASH!

theelasticast.com : @CrayZeigh : aaron.aldrich@elastic.co : noti.st/crayzeigh

OBSERVABILITY

DEVOPS
> Wave 1: Ops learns code & automation
> Wave 2: Dev owns code through production
https://vimeo.com/341142053

*Simon Wardley: https://twitter.com/swardley/status/1014883354481741825?lang=en

https://dev.to/molly_struve/making-on-call-not-suck-490

SHARED LANGUAGE

SHARED TOOLS

SHARED SOURCE OF TRUTH

“Isn’t it just Monitoring with better SEO?” (You)

YOU’RE NOT WRONG...

MONITORING
Tracking system health and watching for known failure conditions. Good for tracking known unknowns and generating failure alerts for operational issues. Debugging often requires further investigation of systems and recreating the failure state. Finds horses.
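
As a rough illustration of "watching for known failure conditions", monitoring can be sketched as a fixed list of checks evaluated against each metrics sample. The metric names and limits below are hypothetical and only stand in for whatever a real system tracks.

```javascript
// Hypothetical thresholds for known failure conditions (names and limits are illustrative).
const checks = [
  { name: "disk_used_pct", limit: 90 },
  { name: "http_5xx_per_min", limit: 50 },
];

// Return an alert message for every metric that exceeds its predefined limit.
function evaluate(sample) {
  return checks
    .filter((c) => sample[c.name] > c.limit)
    .map((c) => `ALERT ${c.name}=${sample[c.name]} exceeds ${c.limit}`);
}

const alerts = evaluate({ disk_used_pct: 93, http_5xx_per_min: 4 });
console.log(alerts);
```

Every check here had to be thought of in advance, which is why this style finds horses: it only answers questions someone already knew to ask.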

THERE IS NO ROOT CAUSE

OBSERVABILITY
A system is observable when you can ask arbitrary questions about it and receive meaningful answers without having to resort to writing new code or command-line tools. It lets you discover unknown unknowns and debug in production. Helps debug zebras.
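
One way to picture "asking arbitrary questions" is a query helper that groups already-collected events by whichever field turns out to matter. This is a minimal in-memory sketch: the event fields and the `countBy` helper are assumptions for illustration, not a real query API.

```javascript
// Illustrative events; field names are assumptions, not a fixed schema.
const events = [
  { user: "a", os: "osx", build: "efdd0b5", response: 200, ms: 40 },
  { user: "b", os: "linux", build: "efdd0b5", response: 500, ms: 900 },
  { user: "c", os: "osx", build: "9c1f2aa", response: 200, ms: 35 },
  { user: "b", os: "linux", build: "efdd0b5", response: 500, ms: 870 },
];

// Count events matching a predicate, grouped by any field chosen at query time.
function countBy(rows, field, predicate = () => true) {
  const counts = {};
  for (const row of rows.filter(predicate)) {
    counts[row[field]] = (counts[row[field]] || 0) + 1;
  }
  return counts;
}

// A question nobody planned for: which users are seeing 5xx responses?
const failingUsers = countBy(events, "user", (e) => e.response >= 500);
// The same data answers a different question with no new instrumentation.
const failingBuilds = countBy(events, "build", (e) => e.response >= 500);
console.log(failingUsers, failingBuilds);
```

The point of the sketch is that neither question required new code in the application, only a different query over the same events.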

Software is inherently opaque; we have to instrument it to output meaningful information.

THE THREE PILLARS OF OBSERVABILITY
1. Logs → Events
2. Metrics
3. APM & Distributed Tracing

LOGS/METRICS/APM ARE THE MEDIA WE WORK IN

METRICS

EVENTS

CARDINALITY & YOU

EXISTING LOGS
66.249.65.159 - - [06/Nov/2014:19:10:38 +0600] "GET /news/53f8d72920ba2744fe873ebc.html HTTP/1.1" 404 177 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.3 - - [06/Nov/2014:19:11:24 +0600] "GET /?q=%E0%A6%AB%E0%A6%BE%E0%A7%9F%E0%A6%BE%E0%A6%B0 HTTP/1.1" 200 4223 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.62 - - [06/Nov/2014:19:12:14 +0600] "GET /?q=%E0%A6%A6%E0%A7%8B%E0%A7%9F%E0%A6%BE HTTP/1.1" 200 4356 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
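
Lines like these already carry structure; it just has to be recovered. The sketch below turns one access-log line in this layout into an object with named fields. The regular expression and the output field names are assumptions made for illustration; a log shipper's built-in parser would normally do this job.

```javascript
// Matches the access-log layout shown above: client, two unused fields, timestamp,
// request line, status, bytes, referrer, and user agent.
const COMBINED = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

// Parse one log line into a structured event, or return null if it does not match.
function parseAccessLog(line) {
  const m = COMBINED.exec(line);
  if (!m) return null;
  const [, clientIp, timestamp, method, path, protocol, status, bytes, referrer, userAgent] = m;
  return {
    clientIp,
    timestamp,
    method,
    path,
    protocol,
    status: Number(status),
    bytes: bytes === "-" ? 0 : Number(bytes),
    referrer,
    userAgent,
  };
}

const line =
  '66.249.65.62 - - [06/Nov/2014:19:12:14 +0600] "GET /?q=%E0%A6%A6%E0%A7%8B%E0%A7%9F%E0%A6%BE HTTP/1.1" 200 4356 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"';
const event = parseAccessLog(line);
console.log(event.clientIp, event.method, event.status, event.bytes);
```

Once the line is an object, fields such as `status` or `clientIp` can be filtered and grouped directly instead of being matched with ad hoc text searches.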

STRUCTURED LOGGING
{
  "@timestamp": "2019-08-04T12:30:04.000Z",
  …
  "container": {
    "image.id": "48f5af6667f3457be0a2c7814caefe21ed3c94fb94bd6243096b3a61ea502b1d",
    "version": "version"
  },
  "build.id": "efdd0b5e69b0742fa5e5bad0771df4d1df2459d1",
  …
  "transaction": "transaction_ID",
  "user": "importantPerson",
  "account": "0129",
  "os": "osx",
  …
  "api_endpoint": "endpoint",
  …
  "response": 400,
  …
  "message": "Some informative thing, probably more human readable and friendly, but difficult to parse"
}
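
An event shaped like the one above could be produced by a small logger that merges fixed context (build, OS) with per-request fields and writes one JSON object per line. This is a minimal sketch: `createLogger` and its `write` parameter are hypothetical names, and a real service would more likely use an established logging library.

```javascript
// Build a logger that merges fixed context into every event and emits one JSON object per line.
// The write function is injectable so output can go to stdout, a file, or a test buffer.
function createLogger(context, write = (line) => console.log(line)) {
  return function log(message, fields = {}) {
    const event = {
      "@timestamp": new Date().toISOString(),
      ...context,
      ...fields,
      message,
    };
    write(JSON.stringify(event));
    return event;
  };
}

// Capture output in an array here so the result can be inspected.
const lines = [];
const log = createLogger(
  { "build.id": "efdd0b5e69b0742fa5e5bad0771df4d1df2459d1", os: "osx" },
  (l) => lines.push(l)
);
log("Request rejected", { user: "importantPerson", account: "0129", response: 400 });

const parsed = JSON.parse(lines[0]);
console.log(parsed.user, parsed.response);
```

High-cardinality fields such as `user` and `account` travel with every event, while the human-readable `message` stays available without being the only thing to search.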

APM & TRACING

SHARED LANGUAGE
> Learning to speak Prod
> Teaching Prod to speak Dev (Structured Logs; Traces)

SLI/SLO/SLA

SYSTEMS RELIABILITY ENGINEERING
> SLI: Service Level Indicator
> SLO: Service Level Objective
> SLA: Service Level Agreement
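
To make the relationship between the first two terms concrete, the sketch below computes an availability SLI from request counts and compares it with an SLO target to get the remaining error budget. The function names, the request counts, and the 99.9% target are illustrative assumptions, not figures from the talk.

```javascript
// SLI: the measured fraction of good requests.
function availabilitySli(good, total) {
  return total === 0 ? 1 : good / total;
}

// Fraction of the error budget still unspent for a given SLO target (for example 0.999).
// 1 means nothing spent, 0 means exhausted, negative means the SLO was missed.
function errorBudgetRemaining(good, total, sloTarget) {
  const allowedBad = (1 - sloTarget) * total;
  const actualBad = total - good;
  if (allowedBad === 0) return actualBad === 0 ? 1 : 0;
  return 1 - actualBad / allowedBad;
}

// Hypothetical window: 100,000 requests, 50 of them failed, SLO of 99.9%.
const sli = availabilitySli(99950, 100000);
const remaining = errorBudgetRemaining(99950, 100000, 0.999);
console.log(sli, remaining);
```

In this example the service allows roughly 100 failed requests in the window and has used 50, so about half the budget is left. The SLA is the contractual layer on top and is not modeled here.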

SHARED LANGUAGE
> Learning to speak Prod
> Teaching Prod to speak Dev (Structured Logs; Traces)
> Speaking directly to business value & Customer Experience

SHARED TOOLS
> Debugging in Prod
> Ops skills transferable and replicable
> New knowledge and methods shareable

CONVERGED TOOLSETS
> Single platform for all data means easy, improved debugging
> Any arbitrary business data can be added and correlated
> SIEM & InfoSec

CONVERGED TOOLSETS ⚠ Vendor Warning
> Single platform for all data means easy, improved debugging
> Any arbitrary business data can be added and correlated (ECS)
> SIEM & InfoSec

SHARED SOURCE OF TRUTH
> Real production data
> Draw better lines from code to prod
> Write better, production-ready code

WHERE DO WE GO FROM HERE?

Testing & Experimentation

TEST IN PRODUCTION

FEATURE FLAGS
https://martinfowler.com/articles/feature-toggles.html

config.json
{
  "featureFlags": {
    "newThing": true
  }
}
app.js
// Importing JSON directly assumes a bundler or runtime with JSON module support.
import config from "./config.json";
const { featureFlags } = config;
if (featureFlags.newThing) {
  // do the new thing!
} else {
  // do the old thing!
}
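
A boolean flag is all-or-nothing. For testing in production, a common extension is a percentage rollout, where each user lands in a stable bucket and only some buckets see the new path. The sketch below is one possible shape for that: the `rolloutPercent` field, the `bucketFor` hash, and `isEnabled` are all hypothetical and not taken from any particular feature-flag product.

```javascript
// Hypothetical flag definition: enable "newThing" for 25% of users.
const flags = { newThing: { enabled: true, rolloutPercent: 25 } };

// Map a user ID to a stable bucket in [0, 100) with a simple string hash
// (illustrative only, not cryptographic).
function bucketFor(userId) {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash % 100;
}

// A flag is on for a user when it exists, is enabled, and the user's bucket is under the rollout.
function isEnabled(flagName, userId) {
  const flag = flags[flagName];
  if (!flag || !flag.enabled) return false;
  return bucketFor(userId) < flag.rolloutPercent;
}

if (isEnabled("newThing", "user-42")) {
  console.log("do the new thing!");
} else {
  console.log("do the old thing!");
}
```

Because the bucket depends only on the user ID, the same user gets the same answer on every request, which keeps the experiment's two populations separate and comparable.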

DON’T DEBATE, EXPERIMENT

Speaking of Testing: QA
https://theelasticast.com/episodes/0017-qa/

Chaos Engineering is about refining our mental models

DEVOPS
> Wave 1: Ops learns code & automation
> Wave 2: Dev owns code through production
https://vimeo.com/341142053

DEV OWNS CODE THROUGH PRODUCTION
> Better, more production-ready code
> Real-world experimentation
> Improved operational resiliency

THE POWER IS YOURS

THANKS!
> Slides & References: noti.st/crayzeigh
> Trial: ela.st/aaron-aldrich-trial
> Come say, “Hi!” at the Elastic booth (N6)
> Check out Elastic APM @ 14:00 in Salon A!