Sticking Together while Staying Apart: Resilience in the time of global pandemic

A presentation at Seattle DevOps Meetup in March 2021 by fen aldrich

Slide 1

Sticking Together while Staying Apart: Resilience in the time of global pandemic. Aaron Aldrich @CrayZeigh

Slide 2

Million-to-one chances… crop up nine times out of ten. - Sir Terry Pratchett GNU @CrayZeigh

Slide 3

Everything’s a little bit broken all of the time… but it keeps working anyway @CrayZeigh

Slide 4

Resilience @CrayZeigh

Slide 5

Resilience • Rebound • Graceful Extensibility • Robustness • Sustained Adaptability @CrayZeigh

Slide 6

Rebound: Return to “normal” after a surprise or traumatic incident. Work done ahead of time. @CrayZeigh

Slide 7

Robustness: The ability to withstand and absorb well-modeled disturbances. Known-knowns. @CrayZeigh

Slide 8

Graceful Extensibility: The ability to stretch with challenges to operational boundaries. Opposed to brittleness. @CrayZeigh

Slide 9

Sustained Adaptability: Recognizing and managing adaptive capabilities over long timescales. Requires people. @CrayZeigh

Slide 10

@CrayZeigh

Slide 11

Bone • Continuously created and destroyed • Reconstruction directed by mechanical strain • Process directed by signals through layered networks at cell-level @CrayZeigh

Slide 12

https://youtu.be/8LbePBiOvZ4 @CrayZeigh

Slide 13

Rebound • Graceful Extensibility • Robustness @CrayZeigh

Slide 14

Socio-Technical Systems @CrayZeigh

Slide 15

Conway’s Law: Designed systems represent an organization’s communication structure @CrayZeigh

Slide 16

@CrayZeigh

Slide 17

@CrayZeigh

Slide 18

@CrayZeigh

Slide 19

Blunt End: Removed from experience, upstream decision makers. Sharp End: Closest to the work, practitioners. @CrayZeigh

Slide 20

Sharp End: Closest to the work, practitioners • Constantly building and destroying systems • Strong signaling • Improve systems based on strain • Will do so naturally given ownership @CrayZeigh

Slide 21

Teams that do well dealing with impact [surprises/incidents] are those that have a strong common ground - J. Paul Reed (@jpaulreed), Sr Applied Resilience Engineer, Netflix, Failover Conf @CrayZeigh

Slide 22

If we want to improve a team’s resilience, we must build a strong common ground —Me, Just now. @CrayZeigh

Slide 23

Common Ground • Basic Compact • Goal Alignment/Commitment • Inter-predictability • Sustain & Repair @CrayZeigh

Slide 24

Building Common Ground • Blameless Postmortems • Chaos Engineering • Game Days • Modeling Vulnerability @CrayZeigh

Slide 25

@CrayZeigh

Slide 26

Our analysis found that this culture of psychological safety is predictive of software delivery performance, organizational performance and productivity. — Accelerate State of DevOps 2019 @CrayZeigh

Slide 27

https://youtu.be/SgCGD7rutSw @CrayZeigh

Slide 28

Resilience is about creating the conditions that maximize everyone’s potential —Rein Henrichs, >Code Podcast, 174: Resilience @CrayZeigh

Slide 29

@CrayZeigh

Slide 30

@CrayZeigh

Slide 31

What happens when governments fail? @CrayZeigh

Slide 32

It’s left to us @CrayZeigh

Slide 33

Community Building is Resilience Engineering —Me again, just now again. @CrayZeigh

Slide 34

Strong Communities • Diverse • High Trust & Safety • Sustain & Repair • Inter-predictability • Loosely Coupled, layered networks @CrayZeigh

Slide 35

@CrayZeigh

Slide 36

@CrayZeigh

Slide 37

@CrayZeigh

Slide 38

@CrayZeigh

Slide 39

@CrayZeigh

Slide 40

@CrayZeigh

Slide 41

@CrayZeigh

Slide 42

https://bit.ly/2Ym7Tp9 @CrayZeigh

Slide 43

@CrayZeigh

Slide 44

@CrayZeigh

Slide 45

@CrayZeigh

Slide 46

https://desertedislanddevops.com @CrayZeigh

Slide 47

https://youtu.be/L9A6ZauhOhg @CrayZeigh

Slide 48

Enable potential and get out of the way @CrayZeigh

Slide 49

Slides & Resources • speaking.crayzeigh.com • OSMIhelp.org • EmotionalAPI.com • devopsdays.org • Aaron Aldrich, Managed OpenShift Black Belt @CrayZeigh

Slide 50

twitch.tv/desertedislandtv discord.gg/CPM5Jcg

Slide 51

I love you. Do Good out there. We’re all in this together. @CrayZeigh

Slide 52

Watching/Listening • Four concepts for resilience and the implications for the future of resilience engineering - David Woods https://bit.ly/3bITTdc • The Marvelous Resilience of Bone - Dr. Richard Cook, REdeploy 2019 https://www.youtube.com/watch?v=8LbePBiOvZ4 • Greater Than Code, 174: Resilience https://www.greaterthancode.com/resilience • The Worst Year Ever, How to Save your Community When The Government Fails https://ihr.fm/3eVNFbI @CrayZeigh

Slide 53

Watching/Listening • Behind Human Error (2nd Edition) - Woods, Dekker, Cook, Johannesen, Sarter • The Woolworths Experiment https://safetydifferently.com/the-woolworths-experiment/ • The Field Guide to Understanding Human Error - Sidney Dekker • Literally every video from REdeploy https://www.youtube.com/channel/UCHbJcI6KfyxflRqdv26b3Qw • On Borrowing From Yourself - Aaron Aldrich https://dev.to/crayzeigh/a-reflection-on-borrowing-from-yourself-3jhf @CrayZeigh

Slide 54

Watching/Listening • Kick ‘Em or Keep ‘Em - Collaborating on our own Deserted Islands - Matt Stratton https://youtu.be/SgCGD7rutSw • ACCELERATE State of DevOps 2019 https://services.google.com/fh/files/misc/state-of-devops-2019.pdf @CrayZeigh