[38;5;12m                           [39m[38;2;255;187;0m[1m[4mAwesome Site Reliability Engineering[0m[38;5;12m (https://dastergon.gr/awesome-sre)[39m


[38;5;12mA curated list of awesome [39m[38;5;14m[1mSite Reliability[0m[38;5;12m (https://www.usenix.org/conference/srecon14/technical-sessions/presentation/keys-sre) and [39m[38;5;14m[1mProduction[0m[38;5;12m (https://www.usenix.org/conference/srecon15/program/presentation/canahuati) Engineering resources.[39m

[38;2;255;187;0m[4mWhat is Site Reliability Engineering?[0m
[38;5;11m[1m▐[0m[38;5;12m [39m[38;5;12m"Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP Google Engineering, founder of Google SRE[39m

[38;2;255;187;0m[4mContributing[0m

[38;5;12mPlease take a look at the [39m[38;5;14m[1mcontribution guidelines[0m[38;5;12m (CONTRIBUTING.md) first.[39m
[38;5;12mContributions are always welcome![39m

[38;2;255;187;0m[4mContents[0m
[38;5;12m- [39m[38;5;14m[1mCulture[0m[38;5;12m (#culture)[39m
[38;5;12m- [39m[38;5;14m[1mEducation[0m[38;5;12m (#education)[39m
[38;5;12m- [39m[38;5;14m[1mBooks[0m[38;5;12m (#books)[39m
[38;5;12m- [39m[38;5;14m[1mHiring[0m[38;5;12m (#hiring)[39m
[38;5;12m- [39m[38;5;14m[1mReliability[0m[38;5;12m (#reliability)[39m
[38;5;12m- [39m[38;5;14m[1mMonitoring & Observability & Alerting[0m[38;5;12m (#monitoring--observability--alerting)[39m
[38;5;12m- [39m[38;5;14m[1mOn-Call[0m[38;5;12m (#on-call)[39m
[38;5;12m- [39m[38;5;14m[1mPost-Mortem[0m[38;5;12m (#post-mortem)[39m
[38;5;12m- [39m[38;5;14m[1mCapacity Planning[0m[38;5;12m (#capacity-planning)[39m
[38;5;12m- [39m[38;5;14m[1mService Level Agreement[0m[38;5;12m (#service-level-agreement)[39m
[38;5;12m- [39m[38;5;14m[1mPerformance[0m[38;5;12m (#performance)[39m
[38;5;12m- [39m[38;5;14m[1mProgramming[0m[38;5;12m (#programming)[39m
[38;5;12m- [39m[38;5;14m[1mMisc Articles[0m[38;5;12m (#misc-articles)[39m
[38;5;12m- [39m[38;5;14m[1mReal-time Messaging[0m[38;5;12m (#real-time-messaging)[39m
[38;5;12m- [39m[38;5;14m[1mBlogs[0m[38;5;12m (#blogs)[39m
[38;5;12m- [39m[38;5;14m[1mNewsletters[0m[38;5;12m (#newsletters)[39m
[38;5;12m- [39m[38;5;14m[1mConferences & Meetups[0m[38;5;12m (#conferences-meetups)[39m
[38;5;12m- [39m[38;5;14m[1mTwitter[0m[38;5;12m (#twitter)[39m
[38;5;12m- [39m[38;5;14m[1mSRE Tools[0m[38;5;12m (#sre-tools)[39m
[38;5;12m- [39m[38;5;14m[1mSRE Podcasts[0m[38;5;12m (#podcasts)[39m

[38;2;255;187;0m[4mCulture[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhat is Site Reliability Engineering?[0m[38;5;12m (https://landing.google.com/sre/interview/ben-treynor.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mKeys To SRE by Ben Treynor[0m[38;5;12m (https://www.usenix.org/conference/srecon14/technical-sessions/presentation/keys-sre)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGoogle SRE Resources[0m[38;5;12m (https://landing.google.com/sre/resources.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mNotes from Production Engineering by Pedro Canahuati[0m[38;5;12m (https://www.usenix.org/conference/srecon15/program/presentation/canahuati)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPostOps: Recovery from Operations[0m[38;5;12m (https://www.usenix.org/conference/srecon15europe/program/presentation/underwood)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLove DevOps? Wait 'till you meet SRE[0m[38;5;12m (https://www.atlassian.com/it-service/site-reliability-engineering-sre) [39m[38;5;12mvideo[39m[38;5;14m[1m [0m[38;5;12m (https://youtu.be/fsTpRx8Pt-k)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow Google Does Planet-Scale Engineering for Planet-Scale Infra[0m[38;5;12m (https://www.youtube.com/watch?v=H4vMcD7zKM0)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering at Facebook[0m[38;5;12m (https://www.facebook.com/notes/facebook-engineering/site-reliability-engineering-at-facebook/291616313919/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA History of Site Reliability Engineering at Uber[0m[38;5;12m (https://www.youtube.com/watch?v=qJnS-EfIIIE&nohtml5=False)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCase Study: Adopting SRE Principles at StackOverflow[0m[38;5;12m (https://www.usenix.org/conference/srecon15/program/presentation/limoncelli)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering at Dropbox[0m[38;5;12m (https://www.youtube.com/watch?v=ggizCjUCCqE)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineers — Keeping Google up and running 24/7[0m[38;5;12m (https://www.youtube.com/watch?v=yXI7r0_J29M)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering at Salesforce[0m[38;5;12m (https://www.salesforce.com/video/193050/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;12mFrom Sys Admin to Netflix SRE - [39m[38;5;14m[1mvideo[0m[38;5;12m (https://www.youtube.com/watch?v=lZI51YzIgVE) and [39m[38;5;14m[1mslides[0m[38;5;12m (https://www.socallinuxexpo.org/sites/default/files/presentations/Scale%20x14%20Slides.pdf)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE@Google: Thousands of DevOps Since 2004[0m[38;5;12m (https://www.youtube.com/watch?v=iIuTnhdTzK0)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTransactional System Administration Is Killing Us and Must be Stopped[0m[38;5;12m (https://www.usenix.org/conference/lisa15/conference-program/presentation/limoncelli)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA hierarchy of SRE needs[0m[38;5;12m (https://web.archive.org/web/20190401220948/https://plus.google.com/+lizthegrey/posts/MLAJFVyEb2f)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPostOps: A Non-Surgical Tale of Software, Fragility, and Reliability[0m[38;5;12m (https://www.usenix.org/conference/lisa13/technical-sessions/plenary/underwood)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE: An incomplete guide to cultural Narnia[0m[38;5;12m (https://web.archive.org/web/20180820235243/http://anthonycaiafa.com/2016/04/10/sre-cultural-narnia/) - [39m[38;5;12mVideo[39m[38;5;14m[1m [0m[38;5;12m (https://www.youtube.com/watch?v=__wypEhdcrQ&t=0s)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPutting Together Great SRE Teams[0m[38;5;12m (https://www.usenix.org/conference/srecon16/program/presentation/krishnan)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWork at Google: Meet our Production Engineers for Site Reliability Hangout on Air[0m[38;5;12m (https://www.youtube.com/watch?v=bwt6TZjefGM)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mToil: A Word Every Engineer Should Know[0m[38;5;12m (https://sharpend.io/toil-a-word-every-engineer-should-know/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEngineering Reliability into Web Sites: Google SRE[0m[38;5;12m (https://research.google.com/pubs/pub32583.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDEVOPS & SRE AMA - Building High Performance Organizations[0m[38;5;12m (https://vimeo.com/179914447)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mJohn Allspaw's AMA on Incident Analysis and Postmortems[0m[38;5;12m (https://community.atlassian.com/t5/Jira-Ops-questions/I-m-John-Allspaw-Ask-Me-Anything-about-incident-analysis-and/qaq-p/957084)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;12mSite Reliability Engineering with Paul Newson - [39m[38;5;14m[1mPart 1[0m[38;5;12m (https://www.gcppodcast.com/post/episode-38-site-reliability-engineering-with-paul-newson/) & [39m[38;5;14m[1mPart 2[0m[38;5;12m (https://gcppodcast.com/post/episode-59-sre-ii-with-paul-newson/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow SysAdmins Devalue Themselves[0m[38;5;12m (https://queue.acm.org/detail.cfm?id=2891413)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Softer Side of DevOps[0m[38;5;12m (https://www.youtube.com/watch?v=ry51Llzil1I)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE, noun. See also: confidence, trust.[0m[38;5;12m (https://medium.com/@kobolog/sre-noun-see-also-confidence-trust-e7e33e19efc1)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering with Stephen Weinberg[0m[38;5;12m (https://youtu.be/24xb7oZgu-I?t=29m24s)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWe are the Google Site Reliability team. We make Google’s websites work. Ask us Anything![0m[38;5;12m (https://www.reddit.com/r/IAmA/comments/177267/we_are_the_google_site_reliability_team_we_make)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWe are the Google Site Reliability Engineering team. Ask us Anything![0m[38;5;12m (https://www.reddit.com/r/IAmA/comments/1w1y5m/we_are_the_google_site_reliability_engineering/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Ops Identity Crisis[0m[38;5;12m (http://www.susanjfowler.com/blog/2016/10/13/the-ops-identity-crisis)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Irreproducibility Of Bugs In Large-Scale Production Systems[0m[38;5;12m (http://www.susanjfowler.com/blog/2016/11/2/the-irreproducibility-of-bugs-in-large-scale-production-systems)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSE-Radio Episode 276: Björn Rabenstein on Site Reliability Engineering[0m[38;5;12m (http://www.se-radio.net/2016/12/se-radio-episode-276-bjorn-rabenstein-on-site-reliability-engineering/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMicroservices, DevOps and Production Complexity[0m[38;5;12m (https://blog.netsil.com/microservices-devops-and-operational-complexity-be98cb01b660)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIntroducing Google Customer Reliability Engineering[0m[38;5;12m (https://cloudplatform.googleblog.com/2016/10/introducing-a-new-era-of-customer-support-Google-Customer-Reliability-Engineering.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEvolution or Rebellion? The rise of Site Reliability Engineers (SRE)[0m[38;5;12m (https://robhirschfeld.com/2016/12/29/evolution-or-rebellion-the-rise-of-site-reliability-engineers-sre/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe difference between Site Reliability Engineering, System Administration, and DevOps[0m[38;5;12m (https://standalone-sysadmin.com/the-difference-between-site-reliability-engineering-system-administration-and-devops-d05031495499)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE in the Small and in the Large[0m[38;5;12m (https://www.usenix.org/conference/lisa16/conference-program/presentation/closing-plenary)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSBSRE Meetup: Different SRE roles and challenges(Netflix)[0m[38;5;12m (https://www.youtube.com/watch?v=zLXf0cKDOv0)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPanel: Who/What Is SRE?[0m[38;5;12m (https://www.usenix.org/conference/srecon16/program/presentation/definition-of-sre-panel)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHope Is Not a Strategy[0m[38;5;12m (https://medium.com/@jerub/hope-is-not-a-strategy-6a7d0a3b1c08)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTenets of SRE[0m[38;5;12m (https://medium.com/@jerub/tenets-of-sre-8af6238ae8a8)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering Demystified[0m[38;5;12m (https://medium.com/@venkatachalamrangasamy/site-reliability-engineering-demystified-ed676e0a7d56)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIs Site Reliability Engineering the True ‘Ops’ in DevOps?[0m[38;5;12m (https://devops.com/site-reliability-engineering-sre-true-ops-devops/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE vs. DevOps vs. Cloud Native: The Server Cage Match[0m[38;5;12m (https://devops.com/sre-devops-cloud-native-server-cage-match/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE: What’s The Big Idea?[0m[38;5;12m (https://youtu.be/8dfYLRAWn_c)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding the SRE Culture at LinkedIn[0m[38;5;12m (https://engineering.linkedin.com/blog/2017/05/building-the-sre-culture-at-linkedin)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPodcast #111 – SRE: Occasionally Maintaining Infrastructure That You Hate[0m[38;5;12m (https://stackoverflow.blog/2017/06/12/podcast-111-sre-occasionally-maintaining-infrastructure-hate/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSplicing SRE DNA Sequences in the Biggest Software Company on the Planet[0m[38;5;12m (https://www.usenix.org/conference/srecon16europe/program/presentation/splicing-sre-dna-sequences-biggest-software-company)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhy should your app get SRE support? - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/06/why-should-your-app-get-SRE-support-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow SREs find the landmines in a service - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/06/how-SREs-find-the-landmines-in-a-service-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMaking the most of an SRE service takeover - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/07/making-the-most-of-an-SRE-service-takeover-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Cloudcast #301: SRE and Infrastructure Operations (Podcast)[0m[38;5;12m (https://dzone.com/articles/the-cloudcast-301-sre-and-infrastructure-operation)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe SRE model[0m[38;5;12m (https://medium.com/@rakyll/the-sre-model-6e19376ef986)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOnboarding New Site Reliability Engineers[0m[38;5;12m (https://circleci.com/blog/onboarding-new-site-reliability-engineers/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding Blocks for Site Reliability At Google[0m[38;5;12m (https://www.youtube.com/watch?v=nQv9ySa8MTU)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBeyond Google SRE: What is Site Reliability Engineering like at Medium?[0m[38;5;12m (https://blog.netsil.com/beyond-google-sre-what-is-site-reliability-engineering-like-at-medium-71c65bd35f4e)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIntelligent Site Reliability Engineering – A Machine Learning Perspective[0m[38;5;12m (http://blog.adnanmasood.com/2016/05/19/intelligent-site-reliability-engineering-a-machine-learning-perspective/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA crash course in LinkedIn's global site operations[0m[38;5;12m (https://engineering.linkedin.com/day-life/crash-course-linkedins-global-site-operations)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGoogle’s Site Reliability Engineering with Todd Underwood[0m[38;5;12m (https://softwareengineeringdaily.com/2016/06/14/googles-site-reliability-engineering-todd-underwood/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhat is Site Reliability Engineering? (VMware)[0m[38;5;12m (https://blogs.vmware.com/services-education-insights/2018/02/site-reliability-engineering.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA Gentle Introduction to SRE[0m[38;5;12m (http://geekologist.co/introduction-to-sre/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mUnderstanding Site Reliability Engineering through Movies and Books[0m[38;5;12m (http://engineering.medallia.com/blog/posts/understanding-site-reliability-engineering-through-movies-and-books/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGOTO 2017 • Site Reliability Engineering at Google • Christof Leng[0m[38;5;12m (https://www.youtube.com/watch?v=Cxb7a8lTv8A)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;12mThe[39m[38;5;12m [39m[38;5;12mMakeup[39m[38;5;12m [39m[38;5;12mof[39m[38;5;12m [39m[38;5;12mSuccessful[39m[38;5;12m [39m[38;5;12mGeographically-Distributed[39m[38;5;12m [39m[38;5;12mSRE[39m[38;5;12m [39m[38;5;12mTeams[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mPart1[0m[38;5;12m [39m[38;5;12m(https://engineering.linkedin.com/blog/2018/03/the-makeup-of-successful-geographically-distributed-sre-teams--p)[39m[38;5;12m [39m[38;5;12m&[39m[38;5;12m [39m[38;5;14m[1mPart2[0m[38;5;12m [39m
[38;5;12m(https://engineering.linkedin.com/blog/2018/03/the-makeup-of-successful-geographically-distributed-sre-teams--p0)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTech Leadership in SRE[0m[38;5;12m (https://www.youtube.com/watch?v=6G2V1xPIM64)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Azure Podcast: Episode 227 - Azure SRE[0m[38;5;12m (http://azpodcast.azurewebsites.net/post/Episode-227-Azure-SRE1)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe human scalability of "DevOps"[0m[38;5;12m (https://medium.com/@mattklein123/the-human-scalability-of-devops-e36c37d3db6a)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPodcast: Site Reliability Management with Mike Hiraga[0m[38;5;12m (https://softwareengineeringdaily.com/2018/04/09/site-reliability-management-with-mike-hiraga/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow a cat inspired system reliability at Knowlarity[0m[38;5;12m (https://medium.com/@Knowlarity_Engineering/how-a-cat-inspired-system-reliability-at-knowlarity-ad73c24f29a7)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGetting Started with Site Reliability Engineering[0m[38;5;12m (https://github.com/devopsenterprise/2018-London/blob/master/Tuesday/Breakout%20Sessions/Throne%2C%20Stephen%2C%20Getting%20Started%20with%20Site%20Reliability%20Engineering.pdf)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m"Practical Applications of the Dickerson Pyramid" by Nat Welch[0m[38;5;12m (https://www.youtube.com/watch?v=xWAfTAu0Mww)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLinkedIn’s Kurt Andersen Uncovers Blindspots in SRE Implementations[0m[38;5;12m (https://blameless.com/blog/sre-implementations-blindspots/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mInterview with Betsy Beyer, Stephen Thorne of Google[0m[38;5;12m (https://driftboatdave.com/2018/10/09/interview-with-betsy-beyer-stephen-thorne-of-google/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLess Risk Through Greater Humanity - Dave Rensin[0m[38;5;12m (https://www.youtube.com/watch?v=0zqBlRW_6jA)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGetting Started with SRE - Stephen Thorne, Google[0m[38;5;12m (https://www.youtube.com/watch?v=c-w_GYvi0eA)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding Successful SRE in Large Enterprises[0m[38;5;12m (https://drive.google.com/file/d/1FXwHm6mpmRA9NaIJEu4cB1s6ffbyGBfl/view)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSolving Reliability Fears with Site Reliability Engineering[0m[38;5;12m (https://www.youtube.com/watch?v=ZcZtU_TiFEM)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE vs. DevOps: competing standards or close friends?[0m[38;5;12m (https://cloud.google.com/blog/products/gcp/sre-vs-devops-competing-standards-or-close-friends)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to Avoid the 5 SRE Implementation Traps that Catch Even the Best Teams[0m[38;5;12m (https://thenewstack.io/how-to-avoid-the-5-sre-implementation-traps-that-catch-even-the-best-teams/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mReliability Engineering – The Essential Discipline for Complex Systems[0m[38;5;12m (https://vimeo.com/344515149)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Modern Site Reliability Workbench on Top of OCI[0m[38;5;12m (https://www.youtube.com/watch?v=bC5dIPzNH24)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE in the Third Age[0m[38;5;12m (https://www.usenix.org/conference/srecon19emea/presentation/rabenstein)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAbout SRE and how (not) to apply it[0m[38;5;12m (https://www.youtube.com/watch?v=vF6ajM3P_wM)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTransitioning a typical engineering ops team into an SRE powerhouse[0m[38;5;12m (https://cloud.google.com/blog/products/management-tools/transitioning-a-typical-engineering-ops-team-into-an-sre-powerhouse)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMaking a Lion Bulletproof: SRE in Banking[0m[38;5;12m (https://www.infoq.com/presentations/ing-sre-teams-practices/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIdentifying and tracking toil using SRE principles[0m[38;5;12m (https://cloud.google.com/blog/products/management-tools/identifying-and-tracking-toil-using-sre-principles)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mFrom Ops to SRE: Evolution of the OpenShift Dedicated Team[0m[38;5;12m (https://www.openshift.com/blog/from-ops-to-sre-evolution-of-the-openshift-dedicated-team)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMeeting reliability challenges with SRE principles[0m[38;5;12m (https://cloud.google.com/blog/products/management-tools/meeting-reliability-challenges-with-sre-principles)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA quick introduction to SRE principles[0m[38;5;12m (https://github.com/fhivemind/sre-playground)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe SRE I Aspire to Be[0m[38;5;12m (https://www.youtube.com/watch?v=KnC2eRUZMKY)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTaming Operational Load with VMware CRE[0m[38;5;12m (https://tanzu.vmware.com/content/blog/taming-operational-load-vmware-cre)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Cultural Values[0m[38;5;12m (https://dubrie.medium.com/sre-cultural-values-a0073b475183)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAre we there yet? Thoughts on assessing an SRE team’s maturity[0m[38;5;12m (https://cloud.google.com/blog/products/devops-sre/evaluating-where-your-team-lies-on-the-sre-spectrum)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhat SREs have to do with project-based services?[0m[38;5;12m (https://www.linkedin.com/pulse/what-sres-have-do-project-based-services-rod-anami/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMaking operational work more visible[0m[38;5;12m (https://github.com/readme/guides/ops-work-visible)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE vs. DevOps: What’s the Difference Between Them?[0m[38;5;12m (https://spacelift.io/blog/sre-vs-devops)[39m

[38;2;255;187;0m[4mEducation[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPanel: Educating SRE[0m[38;5;12m (https://www.usenix.org/conference/srecon15/program/presentation/sebenik)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mFrom Zero to Hero: Recommended Practices for Training your Ever-Evolving SRE Teams[0m[38;5;12m (https://www.usenix.org/conference/srecon15/program/presentation/widdowson)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mNew to an SRE team?[0m[38;5;12m (https://www.linkedin.com/pulse/new-sre-team-anthony-caiafa/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Systems Engineering Side of Site Reliability Engineering[0m[38;5;12m (https://www.usenix.org/publications/login/june15/hixson)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGraduating from Bootcamp and interested in becoming a Site Reliability Engineer?[0m[38;5;12m (https://medium.com/@tammybutow/graduating-from-bootcamp-and-interested-in-becoming-a-site-reliability-engineer-b69a38ce858b)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSo you want to be a Site Reliability Engineer?[0m[38;5;12m (https://www.loomsystems.com/single-post/2016/03/23/So-you-want-to-be-a-Site-Reliability-Engineer)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSpiraling Ops Debt & the SRE Coding Imperative[0m[38;5;12m (https://www.loomsystems.com/blog/2017/02/06/spiraling-ops-debt-the-sre-coding-imperative)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSo you want to be an SRE?[0m[38;5;12m (https://hackernoon.com/so-you-want-to-be-an-sre-34e832357a8c)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCareer Profiles/Site Reliability Engineer[0m[38;5;12m (https://www.khanacademy.org/college-careers-more/career-content/career-profile-videos/site-reliability-engineer/v/ruth-grace-site-reliability-engineer-what-i-do-and-how-much-i-make)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhat is the role of a Site Reliability Engineer?[0m[38;5;12m (https://cloudacademy.com/blog/what-is-the-role-of-a-site-reliability-engineer/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLynda.com: DevOps Foundations: Site Reliability Engineering[0m[38;5;12m (https://www.lynda.com/Software-Development-tutorials/DevOps-Foundations-Site-Reliability-Engineering/669542-2.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncident Management Training: Wheel of Misfortune[0m[38;5;12m (https://dastergon.gr/wheel-of-misfortune/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Un-Reliability Engineering [0m[38;5;12mVideo Series[39m[38;5;14m[1m [0m[38;5;12m (https://www.youtube.com/watch?v=rmY8_PHanuI)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Ultimate Guide to Structuring a 90-Day Onboarding Plan[0m[38;5;12m (https://medium.com/swlh/the-ultimate-guide-to-structuring-a-90-day-onboarding-plan-c91af947376)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE fundamentals: SLIs, SLAs and SLOs[0m[38;5;12m (https://cloud.google.com/blog/products/gcp/sre-fundamentals-slis-slas-and-slos)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to Get Into SRE[0m[38;5;12m (https://blog.alicegoldfuss.com/how-to-get-into-sre/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDo you have an SRE team yet? How to start and assess your journey[0m[38;5;12m (https://cloud.google.com/blog/products/devops-sre/how-to-start-and-assess-your-sre-journey)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow SRE teams are organized, and how to get started[0m[38;5;12m (https://cloud.google.com/blog/products/devops-sre/how-sre-teams-are-organized-and-how-to-get-started)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhy SRE Documents Matter[0m[38;5;12m (https://queue.acm.org/detail.cfm?id=3283589)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to get started with site reliability engineering (SRE)[0m[38;5;12m (https://www.oreilly.com/ideas/how-to-get-started-with-site-reliability-engineering-sre)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDuties of a Site Reliability Engineering Manager[0m[38;5;12m (https://victorops.com/blog/duties-of-a-site-reliability-engineering-manager)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDesigning distributed systems using NALSD flashcards[0m[38;5;12m (https://cloud.google.com/blog/products/management-tools/sre-principles-and-flashcards-to-design-nalsd)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTraining Site Reliability Engineers: What Your Organization Needs to Create a Learning Program[0m[38;5;12m (https://landing.google.com/sre/resources/practicesandprocesses/training-site-reliability-engineers)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Classroom: Distributed PubSub workshop[0m[38;5;12m (https://landing.google.com/sre/resources/practicesandprocesses/sre-classroom/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSchool of SRE: Curriculum for onboarding non-traditional hires and new grads[0m[38;5;12m (https://linkedin.github.io/school-of-sre/)[39m

[38;2;255;187;0m[4mBooks[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPractical Linux Infrastructure[0m[38;5;12m (https://link.springer.com/book/10.1007/978-1-4842-0511-2)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering: How Google Runs Production Systems[0m[38;5;12m (https://landing.google.com/sre/book.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Site Reliability Workbook: Practical Ways to Implement SRE[0m[38;5;12m (https://landing.google.com/sre/book.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mObservability Engineering: Achieving Production Excellence[0m[38;5;12m (https://info.honeycomb.io/observability-engineering-oreilly-book-2022)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Practice Of Cloud System Administration: Designing and Operating Large Distributed Systems[0m[38;5;12m (http://the-cloud-book.com/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWeb Operations - Keeping the Data On Time[0m[38;5;12m (http://shop.oreilly.com/product/0636920000136.do)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Checklist Manifesto: How to Get Things Right[0m[38;5;12m (http://atulgawande.com/book/the-checklist-manifesto/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMicroservices in Production - Standard Principles and Requirements[0m[38;5;12m (http://www.oreilly.com/programming/free/microservices-in-production.csp)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mProduction-Ready Microservices - Building Standardized Systems Across an Engineering Organization[0m[38;5;12m (http://shop.oreilly.com/product/0636920053675.do)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSystems Performance: Enterprise and the Cloud[0m[38;5;12m (https://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098/) [39m[38;5;12m*[39m[48;2;30;30;40m[38;5;13m[3mSample chapter titled [0m[48;2;30;30;40m[38;5;14m[1m[3mCPUs[0m[48;2;30;30;40m[38;5;13m[3m (http://ptgmedia.pearsoncmg.com/images/9780133390094/samplepages/0133390098.pdf)[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMonitoring Distributed Systems: Case Studies from Google's SRE Teams[0m[38;5;12m (http://www.oreilly.com/webops-perf/free/monitoring-distributed-systems.csp)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Human Side of Postmortems: Managing Stress and Cognitive Biases[0m[38;5;12m (http://www.oreilly.com/webops-perf/free/the-human-side-of-postmortems.csp)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mChaos Engineering: Building Confidence in System Behavior through Experiment[0m[38;5;12m (http://www.oreilly.com/webops-perf/free/chaos-engineering.csp)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPost-Incident Reviews: Learning from Failure for Improved Incident Responses[0m[38;5;12m (https://victorops.com/oreilly-post-incident-review/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAntifragile Systems and Teams[0m[38;5;12m (http://www.oreilly.com/webops-perf/free/antifragile-systems-and-teams.csp)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to Monitoring the SRE Golden Signals (E-Book)[0m[38;5;12m (https://www.slideshare.net/OpsStack/how-to-monitoring-the-sre-golden-signals-ebook/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncident Management for Operations[0m[38;5;12m (http://shop.oreilly.com/product/0636920036159.do)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mReal-World SRE[0m[38;5;12m (https://www.packtpub.com/web-development/real-world-sre)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSeeking SRE[0m[38;5;12m (http://shop.oreilly.com/product/0636920063964.do)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhat is SRE?[0m[38;5;12m (https://www.verizondigitalmedia.com/e-book/oreilly-what-is-sre/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEngineering Reliable Mobile Applications: Strategies for Developing Resilient Native Mobile Applications[0m[38;5;12m (https://landing.google.com/sre/resources/practicesandprocesses/engineering-reliable-mobile-applications/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding Secure and Reliable Systems[0m[38;5;12m (https://landing.google.com/sre/book.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mChaos Engineering: Crash test your applications[0m[38;5;12m (https://www.manning.com/books/chaos-engineering/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m97 Things Every SRE Should Know[0m[38;5;12m (https://www.oreilly.com/library/view/97-things-every/9781492081487/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mFour Steps to Creating Effective Game Day Tests[0m[38;5;12m (https://shopify.engineering/four-steps-creating-effective-game-day-tests)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Linux Programming Interface[0m[38;5;12m (https://nostarch.com/tlpi)[39m

[38;2;255;187;0m[4mHiring[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Hiring[0m[38;5;12m (https://www.usenix.org/conference/srecon15/program/presentation/fong)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHiring SREs at LinkedIn[0m[38;5;12m (https://engineering.linkedin.com/engineering-culture/hiring-sres-linkedin)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHiring Site Reliability Engineers[0m[38;5;12m (https://www.usenix.org/publications/login/june15/hiring-site-reliability-engineers)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHiring your first SRE[0m[38;5;12m (https://sreally.com/hiring-your-first-sre-bdda38ee175d#.2m3sqyuw9)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGrowing the Site Reliability Team at LinkedIn: Hiring is Hard[0m[38;5;12m (https://www.youtube.com/watch?v=ZemNg9GYvOA)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEngineering Manager - Site Reliability Engineering Interview Preparation[0m[38;5;12m (https://danrl.com/blog/srm)[39m

[38;2;255;187;0m[4mReliability[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Realities of the Job of Delivering Reliability[0m[38;5;12m (https://www.usenix.org/conference/srecon16/program/presentation/kroll)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mFail at Scale by Ben Maurer[0m[38;5;12m (http://queue.acm.org/detail.cfm?id=2839461)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEmbracing Failure: Fault-Injection and Service Reliability[0m[38;5;12m (https://www.youtube.com/watch?v=wrY7XoOnysg)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m10 Years of Crashing Google[0m[38;5;12m (https://www.usenix.org/conference/lisa15/conference-program/presentation/krishnan)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow we break things at Twitter: failure testing[0m[38;5;12m (https://blog.twitter.com/2015/how-we-break-things-at-twitter-failure-testing)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mReliable Cron across the Planet[0m[38;5;12m (http://queue.acm.org/detail.cfm?id=2745840)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPush our limits - reliability testing at Twitter[0m[38;5;12m (https://blog.twitter.com/2014/push-our-limits-reliability-testing-at-twitter)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Verification of a Distributed System by Caitie McCaffrey[0m[38;5;12m (http://queue.acm.org/detail.cfm?ref=rss&id=2889274)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWeathering the Unexpected[0m[38;5;12m (http://queue.acm.org/detail.cfm?id=2371516)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Hour: Tech Talks by Box & Yelp[0m[38;5;12m (https://www.youtube.com/watch?v=YFDwdRVTg4g)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSimplicity: A Prerequisite for Reliability[0m[38;5;12m (https://sharpend.io/simplicity-a-prerequisite-for-reliability/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Two Sides to Google Infrastructure for Everyone Else[0m[38;5;12m (https://speakerdeck.com/garethr/the-two-sides-to-google-infrastructure-for-everyone-else)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow Embracing Continuous Release Reduced Change Complexity[0m[38;5;12m (https://www.usenix.org/conference/ures14west/summit-program/presentation/dickson)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMaking "Push On Green" a Reality[0m[38;5;12m (https://www.usenix.org/publications/login/october-2014-vol-39-no-5/making-push-green-reality)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBeyondCorp: A New Approach to Enterprise Security[0m[38;5;12m (https://www.usenix.org/publications/login/dec14/ward)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBrainstorming Failure by Jeff Smith[0m[38;5;12m (https://www.youtube.com/watch?v=dKe9S8u44Yk)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Ripple Effect Of Outages And Downtime Cannot Be Underestimated[0m[38;5;12m (http://cloudtweaks.com/2016/04/outages-and-downtime/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe infrastructure behind Twitter: efficiency and optimization[0m[38;5;12m (https://blog.twitter.com/2016/the-infrastructure-behind-twitter-efficiency-and-optimization)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDickerson's Hierarchy of Reliability[0m[38;5;12m (https://docs.google.com/drawings/d/1kshrK2RLkW-XV8enmWZxeRFRgADj6d4Ru_w5txz_k9I/edit)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Morning Paper on Operability[0m[38;5;12m (https://blog.acolyer.org/2016/09/21/the-morning-paper-on-operability/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mProduction is all that matters[0m[38;5;12m (http://naildrivin5.com/blog/2013/06/16/production-is-all-that-matters.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mUsing load shedding to survive a success disaster - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2016/12/using-load-shedding-to-survive-a-success-disaster-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to avoid a self-inflicted DDoS Attack - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2016/11/how-to-avoid-a-self-inflicted-DDoS-Attack-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDon't gamble when it comes to reliability[0m[38;5;12m (https://www.oreilly.com/ideas/dont-gamble-when-it-comes-to-reliability)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mResilience Engineering: Learning to Embrace Failure[0m[38;5;12m (https://queue.acm.org/detail.cfm?id=2371297)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Infrastructure Behind Twitter: Scale[0m[38;5;12m (https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mScaling Reliability at Twitter: So You Want to Add a 9[0m[38;5;12m (https://www.youtube.com/watch?v=hYu13kBenjE)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPrinciples Of Chaos Engineering[0m[38;5;12m (http://principlesofchaos.org/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mChaos Engineering[0m[38;5;12m (https://www.infoq.com/articles/chaos-engineering)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAvailable...or not? That is the question - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/01/available-or-not-that-is-the-question-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow Google Backs Up The Internet Along With Exabytes Of Other Data[0m[38;5;12m (http://highscalability.com/blog/2014/2/3/how-google-backs-up-the-internet-along-with-exabytes-of-othe.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPerformance, Scalability, And High Availability: 3 Key Infrastructure Adaptability Requirements[0m[38;5;12m (http://highscalability.com/blog/2017/2/2/performance-scalability-and-high-availability-3-key-infrastr.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;12mThe Production Environment at Google - [39m[38;5;14m[1mPart 1[0m[38;5;12m (https://medium.com/@jerub/the-production-environment-at-google-8a1aaece3767) & [39m[38;5;14m[1mPart 2[0m[38;5;12m (https://medium.com/@jerub/the-production-environment-at-google-part-2-610884268aaa)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mReliable releases and rollbacks - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/03/reliable-releases-and-rollbacks-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow release canaries can save your bacon - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/03/how-release-canaries-can-save-your-bacon-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThings I Learned Managing Site Reliability for Some of the World’s Busiest Gambling Sites[0m[38;5;12m (https://zwischenzugs.wordpress.com/2017/04/04/things-i-learned-managing-site-reliability-for-some-of-the-worlds-busiest-gambling-sites/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEvery Day Is Monday in Operations[0m[38;5;12m (https://www.linkedin.com/pulse/introduction-every-day-monday-operations-benjamin-purgason)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mUnder the Hood: Ensuring Site Reliability[0m[38;5;12m (https://engineering.squarespace.com/blog/2017/under-the-hood-ensuring-site-reliability)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDesigning reliable systems with cloud infrastructure (Google Cloud Next '17)[0m[38;5;12m (https://www.youtube.com/watch?v=7Hy_6SMn8pY)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA Google SRE explores GitHub reliability with BigQuery[0m[38;5;12m (https://cloud.google.com/blog/big-data/2016/10/a-google-sre-explores-github-reliability-with-bigquery)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mKnow thy enemy: how to prioritize and communicate risks - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/05/know-thy-enemy-how-to-prioritize-and-communicate-risks-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mChaos Engineering resources[0m[38;5;12m (https://github.com/dastergon/awesome-chaos-engineering)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCRE life lessons: What is a dark launch, and what does it do for me?[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/08/CRE-life-lessons-what-is-a-dark-launch-and-what-does-it-do-for-me.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhy you should pick strong consistency, whenever possible[0m[38;5;12m (https://cloudplatform.googleblog.com/2018/01/why-you-should-pick-strong-consistency-whenever-possible.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Network is Reliable[0m[38;5;12m (https://queue.acm.org/detail.cfm?id=2655736)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAre You Load Balancing Wrong?[0m[38;5;12m (https://queue.acm.org/detail.cfm?id=3028689)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow production engineers support global events on Facebook[0m[38;5;12m (https://code.facebook.com/posts/166966743929963/how-production-engineers-support-global-events-on-facebook/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGoogle: A Collection Of Best Practices For Production Services[0m[38;5;12m (http://highscalability.com/blog/2018/4/16/google-a-collection-of-best-practices-for-production-service.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCanary Analysis Service[0m[38;5;12m (https://queue.acm.org/detail.cfm?id=3194655)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTips for High Availability[0m[38;5;12m (https://medium.com/@NetflixTechBlog/tips-for-high-availability-be0472f2599c)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mProgressive Service Architecture At Auth0[0m[38;5;12m (https://auth0.com/blog/progressive-service-architecture-at-auth0/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGoogle Cloud Production Guideline[0m[38;5;12m (https://medium.com/google-cloud/production-guideline-9d5d10c8f1e)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mproduction readiness[0m[38;5;12m (https://jbd.dev/prod-readiness/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTrust By Design: The Fusion of Operational Maturity and Risk Modeling[0m[38;5;12m (https://www.youtube.com/watch?v=Vvd3uvNvMns)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTop Seven Myths of Robust Systems[0m[38;5;12m (https://www.verica.io/top-seven-myths-of-robust-systems/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTaming chaos: Preparing for your next incident[0m[38;5;12m (https://www.oreilly.com/ideas/taming-chaos-preparing-for-your-next-incident)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPID Loops and the Art of Keeping Systems Stable[0m[38;5;12m (https://www.youtube.com/watch?v=3AxSwCC7I4s)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAre you ready for production?[0m[38;5;12m (https://www.youtube.com/watch?v=YptJ2rrGAYY) - [39m[38;5;14m[1mSlides[0m[38;5;12m (https://speakerdeck.com/rakyll/are-you-ready-for-production)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mProduction Checklist for Web Apps on Kubernetes[0m[38;5;12m (https://srcco.de/posts/web-service-on-kubernetes-production-checklist-2019.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mFinding a problem at the bottom of the Google stack[0m[38;5;12m (https://cloud.google.com/blog/products/management-tools/sre-keeps-digging-to-prevent-problems)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mRethinking Task Size in SRE[0m[38;5;12m (https://www.oreilly.com/content/rethinking-task-size-in-sre/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow maintenance windows affect your error budget[0m[38;5;12m (https://cloud.google.com/blog/products/management-tools/sre-error-budgets-and-maintenance-windows)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Production Readiness Spectrum[0m[38;5;12m (https://dastergon.gr/posts/2020/09/the-production-readiness-spectrum/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGeneric mitigations[0m[38;5;12m (https://www.oreilly.com/content/generic-mitigations/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow we’re building a production readiness review process at Grafana Labs[0m[38;5;12m (https://grafana.com/blog/2021/10/13/how-were-building-a-production-readiness-review-process-at-grafana-labs/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mResiliency Planning for High-Traffic Events[0m[38;5;12m (https://shopify.engineering/resiliency-planning-for-high-traffic-events)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mUsing Fault Injection Testing to Improve DoorDash Reliability[0m[38;5;12m (https://doordash.engineering/2022/04/25/using-fault-injection-testing-to-improve-doordash-reliability/)[39m

[38;2;255;187;0m[4mMonitoring & Observability & Alerting[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA Working Theory-of-Monitoring[0m[38;5;12m (https://www.usenix.org/conference/lisa13/working-theory-monitoring)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Evolution of Monitoring Systems at Google - Tony Rippy[0m[38;5;12m (https://vimeo.com/131484321)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMonitoring without Infrastructure @ Airbnb[0m[38;5;12m (https://www.usenix.org/conference/srecon15/program/presentation/serebryany)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMonitoring distributed systems[0m[38;5;12m (https://www.oreilly.com/ideas/monitoring-distributed-systems)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mObservability at Uber Engineering: Past, Present, Future[0m[38;5;12m (https://www.youtube.com/watch?v=2JAnmzVwgP8)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe 4 Golden Signals of API Health and Performance in Cloud-Native Applications[0m[38;5;12m (https://blog.netsil.com/the-4-golden-signals-of-api-health-and-performance-in-cloud-native-applications-a6e87526e74)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMy Philosophy on Alerting by Rob Ewaschuk[0m[38;5;12m (https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/preview#)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTime To Detect - Netflix[0m[38;5;12m (https://www.youtube.com/watch?v=wsgpV67MLFo)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhy Percentiles Don’t Work the Way you Think[0m[38;5;12m (https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding Twitter’s Next-Gen Alerting System[0m[38;5;12m (https://www.youtube.com/watch?v=jQggG0qIjTM)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mInstrumentation: Worst case performance matters[0m[38;5;12m (https://honeycomb.io/blog/2017/01/instrumentation-worst-case-performance-matters/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mInstrumentation: What does 'uptime' mean?[0m[38;5;12m (https://honeycomb.io/blog/2017/01/instrumentation-what-does-uptime-mean/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncidents + Outages at CircleCI: Our Playbook and What We’ve Learned[0m[38;5;12m (https://circleci.com/blog/incidents-outages-at-circleci-our-playbook-and-what-we-ve-learned/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAn introduction to monitoring and alerting with timeseries at scale, with Prometheus[0m[38;5;12m (https://www.youtube.com/watch?v=gNmWzkGViAY)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDetecting outliers and anomalies in realtime at Datadog[0m[38;5;12m (https://www.youtube.com/watch?v=mG4ZpEhRKHA)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to Monitor the SRE Golden Signals[0m[38;5;12m (https://medium.com/devopslinks/how-to-monitor-the-sre-golden-signals-1391cadc7524)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMonitoring in a DevOps World[0m[38;5;12m (https://queue.acm.org/detail.cfm?id=3178371)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMonitoring Your Monitoring’s Monitoring[0m[38;5;12m (https://medium.com/@jerub/monitoring-your-monitorings-monitoring-51d479100f4c)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mObservability: the new wave or buzzword?[0m[38;5;12m (https://medium.com/@dlite/observability-the-new-wave-or-buzzword-fc23a68abf72)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMonitoring Isn't Observability[0m[38;5;12m (https://www.vividcortex.com/blog/monitoring-isnt-observability)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMonitoring in the time of Cloud Native[0m[38;5;12m (https://medium.com/@copyconstruct/monitoring-in-the-time-of-cloud-native-c87c7a5bfa3e)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPrinciples of Monitoring Microservices[0m[38;5;12m (https://www.youtube.com/watch?v=2LNHv0JyBUk)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Many Ways Your Monitoring Is Lying to You[0m[38;5;12m (https://www.usenix.org/node/197446)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGitOps Part 3 - Observability[0m[38;5;12m (https://www.weave.works/blog/gitops-part-3-observability)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWant to Debug Latency?[0m[38;5;12m (https://medium.com/observability/want-to-debug-latency-7aa48ecbe8f7)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDebugging Latency in Go 1.11[0m[38;5;12m (https://medium.com/observability/debugging-latency-in-go-1-11-9f97a7910d68)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAlerting on SLOs like Pros[0m[38;5;12m (https://developers.soundcloud.com/blog/alerting-on-slos)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mApplied Alerting Philosophy[0m[38;5;12m (https://www.youtube.com/watch?v=JhxfZ0VIPP0)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mObservations on Observability[0m[38;5;12m (https://blog.colinbreck.com/observations-on-observability/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDeploys: It's Not Actually About Fridays[0m[38;5;12m (https://charity.wtf/2019/10/28/deploys-its-not-actually-about-fridays/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering Best Practices for Data Pipelines[0m[38;5;12m (https://medium.com/better-programming/site-reliability-engineering-best-practices-for-data-pipelines-44a78e91f6f0)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mElastic Observability in SRE and Incident Response[0m[38;5;12m (https://www.elastic.co/blog/elastic-observability-sre-incident-response)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mError Budget Policy - Part 1 - Adoption at Expedia Group[0m[38;5;12m (https://medium.com/expedia-group-tech/error-budget-policy-adoption-at-expedia-group-7d80d41c4a8b)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mError Budget Policy - Part 2 - Practices at Expedia Group[0m[38;5;12m (https://medium.com/expedia-group-tech/error-budget-policies-in-practice-4c98f56a28c1)[39m

[38;2;255;187;0m[4mOn-Call[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBeing an On-Call Engineer: A Google SRE Perspective[0m[38;5;12m (http://research.google.com/pubs/pub44813.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mInside Atlassian: how our site reliability engineers do incident management[0m[38;5;12m (https://www.atlassian.com/blog/it-teams/inside-atlassian-site-reliability-engineers-incident-management)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mInside Atlassian: how IT & SRE use ChatOps to run incident management[0m[38;5;12m (https://www.atlassian.com/blog/2016/02/inside-atlassian-sre-use-chatops-run-incident-management)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncident Response at Heroku[0m[38;5;12m (https://blog.heroku.com/archives/2014/5/9/incident-response-at-heroku)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWho's On Call?[0m[38;5;12m (http://www.susanjfowler.com/blog/2016/9/6/whos-on-call)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSysAdvent - Day 6 - No More On-Call Martyrs[0m[38;5;12m (https://sysadvent.blogspot.com/2016/12/day-6-no-more-on-call-martyrs.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOn Being On Call[0m[38;5;12m (http://naildrivin5.com/blog/2016/12/07/on-call.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe On-Call Handbook[0m[38;5;12m (https://github.com/alicegoldfuss/oncall-handbook)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncident management at Google — adventures in SRE-land[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/02/Incident-management-at-Google-adventures-in-SRE-land.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mRun Book / Operations Manual template[0m[38;5;12m (https://github.com/SkeltonThatcher/run-book-template)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAutomating Your Oncall: Open Sourcing Fossor and Ascii Etch[0m[38;5;12m (https://engineering.linkedin.com/blog/2017/12/open-sourcing-fossor-and-ascii-etch)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mProject STAR[0m[48;2;30;30;40m[38;5;14m[1m[3m: Streamlining Our On-Call Process[0m[48;2;30;30;40m[38;5;13m[3m (https://engineering.linkedin.com/blog/2018/01/project-star-streamlining-our-on-call-process)[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE@Xero: Managing Incidents Part I[0m[38;5;12m (https://devblog.xero.com/sre-xero-managing-incidents-part-i-7d02d650a71c)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE@Xero: Managing Incidents Part II[0m[38;5;12m (https://devblog.xero.com/sre-xero-managing-incidents-part-ii-224a6e06f426)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow To Establish a High Severity Incident Management Program[0m[38;5;12m (https://www.gremlin.com/how-to-establish-a-high-severity-incident-management-program/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow Your Systems Keep Running Day After Day - John Allspaw[0m[38;5;12m (https://www.youtube.com/watch?v=xA5U85LSk0M)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOn-call doesn’t have to suck[0m[38;5;12m (https://medium.com/@copyconstruct/on-call-b0bd8c5ea4e0)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhy, as a Netflix infrastructure manager, am I on call?[0m[38;5;12m (https://medium.com/@awspyker/why-as-a-netflix-infrastructure-manager-am-i-on-call-bdc551ac01fe)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOncall and Sustainable Software Development[0m[38;5;12m (https://honeycomb.io/blog/2018/02/oncall-and-sustainable-software-development/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOn Call Rotations: How Best to Wake Devs Up in the Middle of the Night[0m[38;5;12m (https://thenewstack.io/call-rotations-best-wake-devs-middle-night/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mUnderstanding The Role Of The Incident Manager On-Call (IMOC)[0m[38;5;12m (https://www.gremlin.com/community/tutorials/understanding-the-role-of-the-incident-manager-on-call-imoc/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m3 Ways to Minimize the Impact of High Severity Incidents[0m[38;5;12m (https://devops.com/three-ways-to-minimize-the-impact-of-high-severity-incidents/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAdvice to Management Teams While Enrolling Changes to On-Call Systems[0m[38;5;12m (https://thenewstack.io/advice-management-teams-enrolling-changes-on-call-systems/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMoving Past Shallow Incident Data[0m[38;5;12m (http://www.adaptivecapacitylabs.com/blog/2018/03/23/moving-past-shallow-incident-data/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSustainable On-Call[0m[38;5;12m (https://codywilbourn.com/2018/03/22/sustainable-on-call/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mdotScale 2017 - Aish Raj Dahal - Chaos management during a major incident[0m[38;5;12m (https://youtu.be/8pPrtf1J1Z8)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncident Management at Netflix Velocity[0m[38;5;12m (https://www.infoq.com/presentations/netflix-incident-management)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncidents, fixes, and the day after[0m[38;5;12m (https://medium.com/booking-com-infrastructure/incidents-fixes-and-the-day-after-c5d9aeae28c3)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m10 Steps to Develop an Incident Response Plan You’ll ACTUALLY Use[0m[38;5;12m (https://engineering.salesforce.com/10-steps-to-develop-an-incident-response-plan-youll-actually-use-6cc49d9bf94c)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mChecklists: a stupidly simple but valuable operational gift[0m[38;5;12m (https://tech.buzzfeed.com/checklists-an-operational-gift-aaf42cf0be12)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to write a status page update[0m[38;5;12m (https://blog.hostedgraphite.com/2018/09/13/how-to-write-a-status-page-update/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAtlassian Incident Handbook[0m[38;5;12m (https://www.atlassian.com/software/jira/ops/handbook)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPagerDuty Incident Response Handbook[0m[38;5;12m (https://response.pagerduty.com/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAvoiding Burnout for SREs[0m[38;5;12m (https://blog.zenduty.com/blog/2019/05/02/Avoiding-SRE-Burnout)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBetter On-Call the SRE way[0m[38;5;12m (https://vimeo.com/344516642)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mManaging Incidents at Monzo[0m[38;5;12m (https://www.youtube.com/watch?v=ZqwVlsIonIw)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMaking On-Call Not Suck[0m[38;5;12m (https://dev.to/molly_struve/making-on-call-not-suck-490)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow we (Monzo) respond to incidents[0m[38;5;12m (https://monzo.com/blog/2019/07/08/how-we-respond-to-incidents)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow we’ve evolved on-call at Monzo[0m[38;5;12m (https://monzo.com/blog/how-weve-evolved-on-call-at-monzo)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCode Yellow: When Operations Isn’t Perfect[0m[38;5;12m (https://devops.com/code-yellow-when-operations-isnt-perfect/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMTTR is dead, long live CIRT[0m[38;5;12m (https://opensource.com/article/19/7/measure-operational-performance)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mExtended Dreyfus Model for Incident Lifecycles[0m[38;5;12m (https://github.com/preed/incident-lifecycle-model)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mInhumanity of Root Cause Analysis[0m[38;5;12m (https://www.verica.io/inhumanity-of-root-cause-analysis/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncident insights from NASA, NTSB, and the CDC[0m[38;5;12m (https://www.youtube.com/watch?v=ODYO2MPymJ4)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to avoid On-Call Burnout the SRE Way[0m[38;5;12m (https://www.squadcast.com/blog/how-to-avoid-on-call-burnout)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMy week shadowing a GitLab Site Reliability Engineer[0m[38;5;12m (https://about.gitlab.com/blog/2019/12/16/sre-shadow/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow our production team runs the weekly on-call handover[0m[38;5;12m (https://about.gitlab.com/blog/2018/03/14/the-on-call-handover-at-gitlab/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWriting Runbook Documentation When You’re An SRE[0m[38;5;12m (https://www.transposit.com/blog/2020.01.30-writing-runbook-documentation-when-youre-an-sre/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncident response, programs and you(r startup)[0m[38;5;12m (https://lethain.com/incident-response-programs-and-your-startup/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAn Incident Command Training Handbook[0m[38;5;12m (https://blog.danslimmon.com/2019/06/24/an-incident-command-training-handbook/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mShrinking the time to mitigate production incidents[0m[38;5;12m (https://cloud.google.com/blog/products/management-tools/shrinking-the-time-to-mitigate-production-incidents)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncident writeup as sociological storytelling[0m[38;5;12m (https://surfingcomplexity.blog/2021/06/11/incident-writeup-as-sociological-storytelling/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mElephant in the Blameless War Room: Accountability[0m[38;5;12m (https://www.blameless.com/incident-response/elephant-in-the-blameless-war-room-accountability)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mNaming names in incident writeups[0m[38;5;12m (https://surfingcomplexity.blog/2021/05/22/naming-names-in-incident-writeups/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding On-Call Culture at GitHub[0m[38;5;12m (https://github.blog/2021-01-06-building-on-call-culture-at-github/)[39m

[38;2;255;187;0m[4mPost-Mortem[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA collection of post-mortems[0m[38;5;12m (https://github.com/danluu/post-mortems)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCollection of Kubernetes Failure Stories[0m[38;5;12m (https://github.com/hjacobs/kubernetes-failure-stories)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBlameless PostMortems and a Just Culture[0m[38;5;12m (https://codeascraft.com/2012/05/22/blameless-postmortems/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA Tale of Postmortems[0m[38;5;12m (https://blog.box.com/blog/a-tale-of-postmortems/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding a Blameless Post-Mortem Culture with Jason Hand[0m[38;5;12m (http://runasradio.com/Shows/Show/486)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe infinite hows[0m[38;5;12m (https://www.oreilly.com/ideas/the-infinite-hows)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mFailure is Always An Option: How a Blameless Culture Leads to Better Results[0m[38;5;12m (https://victorops.com/blog/blameless-culture/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSysAdvent - Day 1 - Why You Need a Postmortem Process[0m[38;5;12m (https://sysadvent.blogspot.com/2016/12/day-1-why-you-need-postmortem-process.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEtsy’s Debriefing Facilitation Guide for Blameless Postmortems[0m[38;5;12m (https://codeascraft.com/2016/11/17/debriefing-facilitation-guide/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWriting Your First Postmortem[0m[38;5;12m (https://sharpend.io/writing-your-first-postmortem/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to Write Great Outage Post-Mortems[0m[38;5;12m (https://artsy.github.io/blog/2014/11/19/how-to-write-great-outage-post-mortems/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA collection of postmortem templates[0m[38;5;12m (https://github.com/dastergon/postmortem-templates)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEmbracing Feedback[0m[38;5;12m (https://blog.heptio.com/embracing-feedback-2fd703da714f)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPostmortem Action Items: Plan the Work and Work the Plan[0m[38;5;12m (https://www.usenix.org/conference/srecon17americas/program/presentation/lueder)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSocial Issues In Postmortems[0m[38;5;12m (https://medium.com/@allspaw/social-issues-in-postmortems-d48dde624d18)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGoogle Has an Official Process in Place for Learning From Failure--and It's Absolutely Brilliant[0m[38;5;12m (https://www.inc.com/justin-bariso/meet-postmortem-googles-brilliant-process-tool-for-learning-from-failure.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPostmortem culture: how you can learn from failure[0m[38;5;12m (https://rework.withgoogle.com/blog/postmortem-culture-how-you-can-learn-from-failure/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mre:Work - Postmortem discussion template[0m[38;5;12m (https://docs.google.com/document/d/1ob0dfG_gefr_gQ8kbKr0kS4XpaKbc0oVAk4Te9tbDqM/edit)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPost-mortems to the rescue[0m[38;5;12m (https://increment.com/documentation/post-mortems-to-the-rescue/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPostmortem Action Items: Plan the Work and Work the Plan[0m[38;5;12m (https://ai.google/research/pubs/pub45906)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhy Every Company Can Benefit from a Blameless Culture[0m[38;5;12m (https://www.blameless.com/why-companies-can-benefit-from-blameless-culture/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m"It's dead, Jim": How we write an incident postmortem[0m[38;5;12m (https://www.hostedgraphite.com/blog/its-dead-jim-how-we-write-an-incident-postmortem)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOur incident postmortem template[0m[38;5;12m (https://www.hostedgraphite.com/blog/incident-postmortem-template)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLearn out of mistakes. Postmortems to the rescue.[0m[38;5;12m (https://fernandocejas.com/2020/03/21/learn-out-of-mistakes-postmortems/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mImproving Postmortem Practices with Veteran Google SRE, Steve McGhee[0m[38;5;12m (https://www.blameless.com/improve-postmortem-with-sre-steve-mcghee/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mInhumanity of Root Cause Analysis[0m[38;5;12m (https://www.verica.io/blog/inhumanity-of-root-cause-analysis/)[39m

[38;2;255;187;0m[4mCapacity Planning[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCapacity Planning[0m[38;5;12m (https://www.usenix.org/system/files/login/articles/login_feb15_07_hixson.pdf)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSouthBay SRE: Cloud Capacity Planning[0m[38;5;12m (https://www.youtube.com/watch?v=MDQ0uEUmLOo)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIntent-based Capacity Planning and Autoscaling with Kubernetes[0m[38;5;12m (https://www.squadcast.com/blog/intent-based-capacity-planning-and-autoscaling-with-kubernetes)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow do you do Capacity Planning[0m[38;5;12m (https://jvns.ca/blog/2016/03/20/how-do-you-do-capacity-planning/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow Back Market SREs prepared for Black Friday[0m[38;5;12m (https://medium.com/back-market-engineering/how-back-market-sres-prepared-for-black-friday-5f017f343408)[39m

[38;2;255;187;0m[4mService Level Agreement[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIf It's in the Cloud, Get It on Paper: Cloud Computing Contract Issues[0m[38;5;12m (http://er.educause.edu/articles/2010/6/if-its-in-the-cloud-get-it-on-paper-cloud-computing-contract-issues)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mService Level Agreements in the Cloud: Who cares?[0m[38;5;12m (http://www.wired.com/insights/2011/12/service-level-agreements-in-the-cloud-who-cares/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSysAdvent - Day 20 - How to set and monitor SLAs[0m[38;5;12m (https://sysadvent.blogspot.com/2016/12/day-20-how-to-set-and-monitor-slas.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSLOs, SLIs, SLAs, oh my - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/01/availability-part-deux--CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mService Levels and Error Budgets[0m[38;5;12m (https://www.usenix.org/conference/srecon16/program/presentation/jones)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m(Un)Reliability Budgets - Finding Balance between Innovation and Reliability[0m[38;5;12m (https://www.usenix.org/system/files/login/articles/login_aug15_06_roth.pdf)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Calculus of Service Availability[0m[38;5;12m (https://queue.acm.org/detail.cfm?id=3096459&__s=dnkxuaws9pogqdnxmx8i)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAvailability Calculator: Calculate how much downtime should be permitted in your SLA[0m[38;5;12m (https://dastergon.github.io/availability-calculator/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mStandardize cloud SLA availability with numerical performance data[0m[38;5;12m (https://www.ibm.com/developerworks/cloud/library/cl-SLAloadbalance-numanalysis/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBest practices to develop SLAs for cloud computing[0m[38;5;12m (https://www.ibm.com/developerworks/cloud/library/cl-slastandards/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA Practical Guide to SLAs[0m[38;5;12m (https://www.catchpoint.com/blog/sla-management-guide/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding good SLOs - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2017/10/building-good-SLOs-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mNo Grumpy Humans and Other Site Reliability Engineering Lessons from Google[0m[38;5;12m (https://thenewstack.io/sre-lessons-google-no-grumpy-humans/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mConsequences of SLO violations — CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2018/01/consequences-of-SLO-violations-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mService Level Objectives in Practice[0m[38;5;12m (https://medium.com/@jerub/service-level-objectives-in-practice-ed1200502d5)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Consensus Building[0m[38;5;12m (https://medium.com/@jerub/sre-consensus-building-36ad5d2e470b)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAn example escalation policy — CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2018/01/an-example-escalation-policy-CRE-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mError Budget Calculator[0m[38;5;12m (https://dastergon.gr/error-budget-calculator/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mUnderstanding error budget overspend - part one - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2018/06/understanding-error-budget-overspend-cre-life-lessons.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGood housekeeping for error budgets - part two - CRE life lessons[0m[38;5;12m (https://cloudplatform.googleblog.com/2018/06/cre-life-lessons-good-housekeeping-for-error-budgets.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE fundamentals: SLIs, SLAs and SLOs[0m[38;5;12m (https://cloudplatform.googleblog.com/2018/07/sre-fundamentals-slis-slas-and-slos.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSLOs & You: A Guide To Service Level Objectives[0m[38;5;12m (https://www.circonus.com/2018/07/a-guide-to-service-level-objectives/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEarning Our Wings: Stories and Findings From Operating a Large-scale Concourse Deployment[0m[38;5;12m (https://medium.com/concourse-ci/earning-our-wings-a0c307fa73e6)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mNines are Not Enough: Meaningful Metrics for Clouds[0m[38;5;12m (https://ai.google/research/pubs/pub48033)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow many nines is my storage system?[0m[38;5;12m (https://medium.com/@jamesacowling/how-many-nines-is-my-storage-system-7d16e852d56d)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDon't follow the sun.[0m[38;5;12m (https://lethain.com/dont-follow-the-sun/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Tyranny of the SLA[0m[38;5;12m (https://www.youtube.com/watch?v=4cPqLuIXBnw)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBackblaze Durability is 99.999999999% — And Why It Doesn’t Matter[0m[38;5;12m (https://www.backblaze.com/blog/cloud-storage-durability/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDevOpsDays Chicago 2019 - The Art of SLOs[0m[38;5;12m (https://youtu.be/Dfnbw5dJQ5I)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Art of SLOs Workshop Materials[0m[38;5;12m (https://cre.page.link/art-of-slos)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow to Include Latency in SLO-Based Alerting[0m[38;5;12m (https://grafana.com/blog/2019/11/27/kubecon-recap-how-to-include-latency-in-slo-based-alerting/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSucceeding With Service Level Objectives[0m[38;5;12m (https://www.squadcast.com/blog/succeeding-with-service-level-objectives)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPutting customers first with SLIs and SLOs[0m[38;5;12m (https://medium.com/the-telegraph-engineering/putting-customers-first-with-slis-and-slos-15352f9b6cbc)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Leadership: Have Tiered SLAs[0m[38;5;12m (https://medium.com/site-reliability-engineering-leadership/sre-tip-have-tiered-slas-2c432ffe46a)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow SLOs Enable Fast, Reliable Application Delivery[0m[38;5;12m (https://www.blameless.com/blog/how-slos-enable-fast-reliable-application-delivery)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Tail at Scale[0m[38;5;12m (https://billduncan.org/the-tail-at-scale/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Tail at Scale Revisited[0m[38;5;12m (https://billduncan.org/the-tail-at-scale-revisited/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDefining SLOs for services with dependencies[0m[38;5;12m (https://cloud.google.com/blog/products/gcp/defining-slos-for-services-with-dependencies-cre-life-lessons)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mService Level Disagreements[0m[38;5;12m (https://blog.b3k.us/2009/07/15/service-level-disagreements.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHow We Use Sloth to do SLO Monitoring and Alerting with Prometheus[0m[38;5;12m (https://mattermost.com/blog/sloth-for-slo-monitoring-and-alerting-with-prometheus/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSLI Deep Dive[0m[38;5;12m (https://medium.com/site-reliability-engineering-leadership/sli-deep-dive-cae92bd90a79)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMeasuring Reliability in GCP: Step By Step SLO creation guide using Cloud Operation Sandbox[0m[38;5;12m (https://medium.com/google-cloud/measuring-reliability-in-gcp-step-by-step-slo-creation-guide-using-cloud-operation-sandbox-99043bd0e70f)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSLO tracker[0m[38;5;12m (https://slotracker.com/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSLO Alerting for Mortals[0m[38;5;12m (https://ervinbarta.com/2021/10/19/slo-alerting-for-mortals/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE methods and climate change[0m[38;5;12m (https://bpetit.nce.re/2021/03/sre-methods-and-climate-change/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhat made SLOs so messy (and what we can do about it)[0m[38;5;12m (https://medium.com/lightstephq/what-made-slos-so-messy-and-what-we-can-do-about-it-89be415a80b3)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSLICK: Adopting SLOs for improved reliability[0m[38;5;12m (https://engineering.fb.com/2021/12/13/production-engineering/slick/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCalculating composite SLA[0m[38;5;12m (https://alexewerlof.medium.com/calculating-composite-sla-d855eaf2c655)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBest practices for setting SLOs and SLIs for modern, complex systems[0m[38;5;12m (https://newrelic.com/blog/best-practices/best-practices-for-setting-slos-and-slis-for-modern-complex-systems)[39m

[38;2;255;187;0m[4mPerformance[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mPerformance Checklists for SREs[0m[38;5;12m (https://www.brendangregg.com/blog/2016-05-04/srecon2016-perf-checklists-for-sres.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSouth Bay SRE Meetup - Netflix Cloud Performance Team[0m[38;5;12m (https://youtu.be/uQ0flQOtQEA)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSoftware Performance Analysis Guided By SLOs[0m[38;5;12m (https://medium.com/dm03514-tech-blog/sre-performance-analysis-tuning-methodology-using-a-simple-http-webserver-in-go-d475460f27ca)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mA framework for pragmatic performance engineering[0m[38;5;12m (https://mterwill.com/posts/framework-for-performance-engineering/)[39m

[38;2;255;187;0m[4mProgramming[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGo Language for Ops and Site Reliability Engineering[0m[38;5;12m (http://www.oreilly.com/pub/e/2712)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGo for SREs using Python[0m[38;5;12m (https://www.usenix.org/sites/default/files/conference/protected-files/srecon16_slides_hamilton.pdf)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOperability in Go[0m[38;5;12m (https://speakerdeck.com/ianschenck/operability-in-go)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGo Reliability and Durability at Dropbox[0m[38;5;12m (https://www.youtube.com/watch?v=5doOcaMXx08)[39m

[38;2;255;187;0m[4mMisc Articles[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhat is SRE (Site Reliability Engineering)?[0m[38;5;12m (https://www.oreilly.com/ideas/what-is-sre-site-reliability-engineering)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHere’s How Google Makes Sure It (Almost) Never Goes Down[0m[38;5;12m (http://www.wired.com/2016/04/google-ensures-services-almost-never-go/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAre site reliability engineers the next data scientists?[0m[38;5;12m (http://techcrunch.com/2016/03/02/are-site-reliability-engineers-the-next-data-scientists/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineers: "solving the most interesting problems"[0m[38;5;12m (http://googleresearch.blogspot.gr/2012/07/site-reliability-engineers-solving-most.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineers: the "world’s most intense pit crew"[0m[38;5;12m (http://googleforstudents.blogspot.gr/2012/06/site-reliability-engineers-worlds-most.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite reliability engineering kicks rote tasks out of IT ops[0m[38;5;12m (http://searchitoperations.techtarget.com/feature/Site-reliability-engineering-kicks-rote-tasks-out-of-IT-ops)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mNotes on Site Reliability Engineering[0m[38;5;12m (http://danluu.com/google-sre-book/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAdventures in SRE-land: Welcome to Google Mission Control[0m[38;5;12m (https://cloudplatform.googleblog.com/2016/07/adventures-in-SRE-land-welcome-to-Google-Mission-Control.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBook Review: Site Reliability Engineering - How Google Runs Production Systems[0m[38;5;12m (https://www.infoq.com/articles/site-reliability-engineering)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineers: “We solve cooler problems”[0m[38;5;12m (https://www.google.com/about/careers/stories/site-reliability-engineering-profile-google/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSREcon17: Brave new world of site reliability engineering[0m[38;5;12m (http://www.networkworld.com/article/3182827/cloud-computing/srecon17-brave-new-world-of-site-reliability-engineering.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOpen AWS guide[0m[38;5;12m (https://github.com/open-guides/og-aws)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCommentary on Site Reliability Engineering[0m[38;5;12m (https://medium.com/@jerub/commentary-on-site-reliability-engineering-9ba9e1be2a8c)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering: 4 Things to Know[0m[38;5;12m (https://www.networkcomputing.com/data-centers/site-reliability-engineering-4-things-know/888724300)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLooking for SRE Success? Then Find the Intrapreneurs![0m[38;5;12m (https://www.linkedin.com/pulse/looking-sre-success-find-intrapreneurs-josh-gilliland/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mWhat Team Structure is Right for DevOps to Flourish?[0m[38;5;12m (http://web.devopstopologies.com/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mInjured on Vacation? Applying Principles from Site Reliability Engineering to a Travel Emergency[0m[38;5;12m (https://www.sidewalksafari.com/2018/12/sre-in-a-travel-emergency.html)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBuilding blameless working environment[0m[38;5;12m (https://sobolevn.me/2018/12/blameless-environment)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Adoption Report[0m[38;5;12m (https://techbeacon.com/devops/how-accenture-retrofitted-site-reliability-engineering)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSREs: The Happiest – and Highest Paid – in the Industry[0m[38;5;12m (https://devops.com/sres-the-happiest-and-highest-paid-in-the-industry/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe Role of Site Reliability Engineering, Today and Tomorrow[0m[38;5;12m (https://thenewstack.io/the-role-of-site-reliability-engineering-today-and-tomorrow/)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE as a Lifestyle Choice[0m[38;5;12m (https://medium.com/@bellmar/sre-as-a-lifestyle-choice-de9f5a82d73d)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRECon EMEA 2019 Recap[0m[38;5;12m (https://speakerdeck.com/dastergon/srecon-emea-2019-recap-sre-muc-meetup)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLife of an SRE at Google - JC van Winkel[0m[38;5;12m (https://www.youtube.com/watch?v=7Oe8mYPBZmw)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering for Native Mobile Apps - Abhijith Krishnappa[0m[38;5;12m (https://www.infoq.com/articles/site-reliability-engineering-mobile-apps/) - Case study: Halodoc adaptation of SRE principles for Native Mobile Apps[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Best Practices by InfraCloud[0m[38;5;12m (https://www.infracloud.io/blogs/sre-best-practices/)[39m

[38;2;255;187;0m[4mReal-time Messaging[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m#sre channel at Hangops Slack[0m[38;5;12m (https://hangops.slack.com/) - Discussion of Site Reliability Engineering generally.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1m#incident_response channel at Hangops Slack[0m[38;5;12m (https://hangops.slack.com/) - Discussion about Incident Response.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mUSENIX SREcon Slack[0m[38;5;12m (https://usenix-srecon.slack.com)[39m

[38;2;255;187;0m[4mBlogs[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBrendan Gregg's Blog[0m[38;5;12m (http://www.brendangregg.com/blog/index.html) - Highly Technical Blog Posts About Systems Internals, Performance and SRE.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mEverything Sysadmin[0m[38;5;12m (http://everythingsysadmin.com/) - Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mHigh Scalability[0m[38;5;12m (http://highscalability.com/) - Technical Blog Posts About Systems Architecture.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mrachelbythebay[0m[38;5;12m (https://rachelbythebay.com/w/) - Technical Blog Posts.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSusan J. Fowler[0m[38;5;12m (http://www.susanjfowler.com/blog/) - Various blog posts about SRE, Software Engineering and Microservices.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSysAdvent[0m[38;5;12m (https://sysadvent.blogspot.com) - One article for each day of December, ending on the 25th.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mStephen Thorne's Blog[0m[38;5;12m (https://medium.com/@jerub) - Blog Posts About SRE.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mIncrement[0m[38;5;12m (https://increment.com/) - A digital magazine about how teams build and operate software systems at scale.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGopherSRE[0m[38;5;12m (http://www.gophersre.com/) - Blog Posts about Go and SRE.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mCindy Sridharan[0m[38;5;12m (https://medium.com/@copyconstruct) - Blog posts about distributed systems and their management.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBlameless Blog[0m[38;5;12m (https://www.blameless.com/blog/) - Blog posts about SRE culture and practices.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mResilience Roundup[0m[38;5;12m (https://ResilienceRoundup.com) - Weekly analysis of Resilience Engineering and Human Factors research designed for software systems[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSquadcast Blog[0m[38;5;12m (https://www.squadcast.com/blog) - Blog posts about SRE best practices, reliability, on-call and incident management.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mFireHydrant Blog[0m[38;5;12m (https://www.firehydrant.io/blog) - Posts about complex systems, incident response, and SRE best practices.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mRootly Blog[0m[38;5;12m (https://www.rootly.io/blog) - Incident management best practices and guides.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mincident.io Blog[0m[38;5;12m (https://www.incident.io/blog) - Guides, advice and resources on incident management and response.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLogit.io Blog[0m[38;5;12m (https://logit.io/blog) - Resources on log management, SRE and DevOps.[39m

[38;2;255;187;0m[4mNewsletters[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mDevOpsLinks[0m[38;5;12m (https://faun.dev) - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mKubeWeekly[0m[38;5;12m (https://kubeweekly.io/) - The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Weekly[0m[38;5;12m (https://sreweekly.com/) - Weekly Site Reliability Newsletter.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mO’Reilly Systems Engineering and Operations Newsletter[0m[38;5;12m (http://www.oreilly.com/webops-perf/newsletter.html) - Weekly systems engineering and operations news and insights from industry insiders.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mChaosEngineering.news[0m[38;5;12m (https://chaosengineering.news/) - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox![39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMonitoring Weekly[0m[38;5;12m (https://monitoring.love/) - What's new in monitoring? Curated monitoring articles to your inbox each week.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mObservability news[0m[38;5;12m (https://o11y.news/) - Updates around observability (o11y) with a special focus on open source.[39m

[38;2;255;187;0m[4mConferences & Meetups[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSREcon Conferences[0m[38;5;12m (https://www.usenix.org/conferences/byname/925) - The Official SRE Conference.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mLISA Conferences[0m[38;5;12m (https://www.usenix.org/conferences/byname/5) - Prominent Conference About SysAdmin/DevOps/SRE.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE Tech Talks[0m[38;5;12m (https://developers.google.com/events/sre/) - SRE Talks Hosted by Google.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSouth Bay Site Reliability Engineering (Sunnyvale, CA) Meetup[0m[38;5;12m (https://www.meetup.com/South-Bay-Site-Reliability-Engineering/) - A Group For Individuals Who Tackle Reliability Challenges For Web-Scale Systems.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSan Francisco Reliability Engineering[0m[38;5;12m (https://www.meetup.com/San-Francisco-Reliability-Engineering/) - A Group Of People Who Are Passionate About Reliable, Performant Software Systems.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering Munich, Germany[0m[38;5;12m (https://www.meetup.com/Site-Reliability-Engineering-Munich/) - SRE Meetup in the greater area of Oktoberfest city.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mADDO - All Day DevOps[0m[38;5;12m (https://www.alldaydevops.com/) - A 24 hour conference that is completely online and free.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering Paris, France[0m[38;5;12m (https://www.meetup.com/Site-Reliability-Engineering-Paris/) - SRE Meetup in the city of light.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSite Reliability Engineering India[0m[38;5;12m (https://www.meetup.com/site-reliability-enggineering/) - SRE Meetup India[39m

[38;2;255;187;0m[4mTwitter[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGoogle SRE Twitter Account[0m[38;5;12m (https://twitter.com/googlesre) - Google's SRE Twitter Account.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSREBook[0m[38;5;12m (https://twitter.com/SREBook) - The Official Twitter Account of Site Reliability Engineering Book.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSREcon[0m[38;5;12m (https://twitter.com/SREcon) - SRECon's Official Twitter Account.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSREWorkbook[0m[38;5;12m (https://twitter.com/SREWorkbook) - The Official Twitter Account of Site Reliability Workbook.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mThe SRE Dev[0m[38;5;12m (https://twitter.com/The_SRE_Dev) - SRE-related Posts from [39m[38;5;14m[1mdev.to[0m[38;5;12m (https://dev.to).[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTwitter SRE[0m[38;5;12m (https://twitter.com/TwitterSRE) - The Official Twitter Account of Twitter's SRE team.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mTwitter SRE Weekly[0m[38;5;12m (https://twitter.com/SREWeekly) - The Official Twitter Account of SRE Weekly Newsletter.[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mUSENIX Association[0m[38;5;12m (https://twitter.com/usenix) - The Official USENIX Twitter Account.[39m

[38;2;255;187;0m[4mSRE Tools[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mAwesome SRE Tools[0m[38;5;12m (https://github.com/SquadcastHub/awesome-sre-tools) - A curated list of Site Reliability and Production Engineering tools[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mList of Continuous Integration services[0m[38;5;12m (https://github.com/ligurio/awesome-ci)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mSRE cheat sheet[0m[38;5;12m (https://github.com/shibumi/SRE-cheat-sheet) - A cheat sheet for Site Reliability Engineering principles and numbers[39m

[38;2;255;187;0m[4mPodcasts[0m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mBlameless / Resilience in Action[0m[38;5;12m (https://podcasts.apple.com/us/podcast/resilience-in-action/id1506828506)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mGoogle SRE Prodcast[0m[38;5;12m (https://sre.google/prodcast)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mo11y Observability Podcast[0m[38;5;12m (https://www.honeycomb.io/usecase/o11ycast/ )[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mOn Call Nightmares (retired)[0m[38;5;12m (https://podcasts.apple.com/us/podcast/on-call-nightmares-podcast/id1447430839)[39m
[48;5;12m[38;5;11m⟡[49m[39m[38;5;12m [39m[38;5;14m[1mMaking of the SRE Omelette[0m[38;5;12m (https://open.spotify.com/show/1KxLVUduNdDRAiOw8BB32J)[39m

[38;5;12msre Github: https://github.com/dastergon/awesome-sre[39m