Sticking Together while Staying Apart: Resilience in the time of global pandemic

A presentation at Deserted Island DevOps in April 2020 by Fen Aldrich

Slide 1

Sticking Together while Staying Apart
Resilience in the time of global pandemic
Aaron Aldrich, Developer Advocate
@CrayZeigh

Slide 2

Everything’s a little bit broken all of the time… but it keeps working anyway

Slide 3

Million-to-one chances… crop up nine times out of ten. - (GNU) Sir Terry Pratchett

Slide 4

Resilience

Slide 5

Resilience
• Rebound
• Robustness
• Graceful Extensibility
• Sustained Adaptability

Slide 6

Rebound
Return to “normal” after a surprise or traumatic incident. Work done ahead of time.

Slide 7

Robustness
The ability to withstand and absorb well-modeled disturbances. Known knowns.

Slide 8

Graceful Extensibility
The ability to stretch with challenges to operational boundaries. The opposite of brittleness.

Slide 9

Sustained Adaptability
Recognizing and managing adaptive capabilities over long timescales.

Slide 10

Slide 11

Bone
• Continuously created and destroyed
• Reconstruction directed by mechanical strain
• Process directed by signals through layered networks at the cell level

Slide 12

Slide 13

Slide 14

• Rebound
• Graceful Extensibility
• Robustness

Slide 15

Socio-Technical Systems

Slide 16

Conway’s Law
Designed systems represent an organization’s communication structure.

Slide 17

Slide 18

Blunt end: removed from experience, upstream decision makers
Sharp end: closest to the work, practitioners

Slide 19

Sharp end: closest to the work, practitioners
• Constantly building and destroying systems
• Strong signaling
• Improve systems based on strain
• Will do so naturally given ownership

Slide 20

Teams that do well dealing with impact [surprises/incidents] are those that have a strong common ground - J. Paul Reed (@jpaulreed), Failover Conf

Slide 21

If we want to improve a team’s resilience, we must build a strong common ground - Me, just now.

Slide 22

Common Ground
• Basic Compact
• Goal Alignment/Commitment
• Inter-predictability
• Sustain & Repair

Slide 23

Building Common Ground
• Blameless Postmortems
• Chaos Engineering
• Game Days
• Modeling Vulnerability

Slide 24

Slide 25

Slide 26

Slide 27

Resilience is about creating the conditions that maximize everyone’s potential - Rein Henrichs, >Code Podcast, 174: Resilience

Slide 28

Slide 29

Slide 30

What happens when governments fail?

Slide 31

It’s left to us

Slide 32

Community Building is Resilience Engineering - Me again, just now again.

Slide 33

Strong Communities
• Diverse
• High Trust & Safety
• Sustain & Repair
• Inter-predictability
• Loosely Coupled, layered networks

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

https://bit.ly/2Ym7Tp9

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Enable potential and get out of the way

Slide 44

Slides & Resources
• speaking.crayzeigh.com
• OSMIhelp.org
• EmotionalAPI.com
• devopsdays.org
Aaron Aldrich, Developer Advocate
@CrayZeigh

Slide 45

I love you. Be safe out there. We’re all in this together.

Slide 46

Further Reading/Watching/Listening
• Four concepts for resilience and the implications for the future of resilience engineering - David Woods https://bit.ly/3bITTdc
• The Marvelous Resilience of Bone - Dr. Richard Cook, REdeploy 2019 https://www.youtube.com/watch?v=8LbePBiOvZ4
• Greater Than Code, 174: Resilience https://www.greaterthancode.com/resilience
• The Worst Year Ever, How to Save your Community When The Government Fails https://ihr.fm/3eVNFbI

Slide 47

Further Reading/Watching/Listening
• Behind Human Error (2nd Edition) - Woods, Dekker, Cook, Johannesen, Sarter
• The Woolworths Experiment https://safetydifferently.com/the-woolworths-experiment/
• The Field Guide to Understanding Human Error - Sidney Dekker
• Literally every video from REdeploy: https://www.youtube.com/channel/UCHbJcI6KfyxflRqdv26b3Qw
• On Borrowing From Yourself - Aaron Aldrich https://dev.to/crayzeigh/a-reflection-on-borrowing-from-yourself-3jhf