<p><a href="http://awesome-scalability.com/"><img src="/logo.png"
alt="Logo" /></a></p>
<p>An updated and organized reading list illustrating the patterns
of scalable, reliable, and performant large-scale systems. Concepts are
explained in articles by prominent engineers and in credible
references. Case studies are taken from battle-tested systems that serve
millions to billions of users.</p>
<h4 id="if-your-system-goes-slow">If your system goes slow</h4>
<blockquote>
<p>Understand your problem first: is it a scalability problem (fast for a
single user but slow under heavy load) or a performance problem (slow even
for a single user)? Review some <a href="#principle">design principles</a>
and check how <a href="#scalability">scalability</a> and <a
href="#performance">performance</a> problems are solved at tech
companies. The <a href="#intelligence">intelligence</a> section is for
those who work with data and machine learning at big (data) and deep
(learning) scale.</p>
</blockquote>
<h4 id="if-your-system-goes-down">If your system goes down</h4>
<blockquote>
<p>“Even if you lose all one day, you can build all over again if you
retain your calm!” - Thuan Pham, former CTO of Uber. So, keep calm and
mind the <a href="#availability">availability</a> and <a
href="#stability">stability</a> matters!</p>
</blockquote>
<h4 id="if-you-are-having-a-system-design-interview">If you are having a
system design interview</h4>
<blockquote>
<p>Look at some <a href="#interview">interview notes</a> and <a
href="#architecture">real-world architectures with completed
diagrams</a> to get a comprehensive view before designing your system on
a whiteboard. You can also watch some <a href="#talk">talks</a> by
engineers from tech giants to learn how they build, scale, and optimize
their systems. Good luck!</p>
</blockquote>
<h4 id="if-you-are-building-your-dream-team">If you are building your
dream team</h4>
<blockquote>
<p>The goal of scaling a team is not to grow its size but to increase its
output and value. The <a href="#organization">organization</a> section
shows how tech companies reach that goal across hiring, management,
organization, culture, and communication.</p>
</blockquote>
<h4 id="community-power">Community power</h4>
<blockquote>
<p>Contributions are greatly welcome! You may want to take a look at the
<a href="CONTRIBUTING.md">contribution guidelines</a>. If you see a link
here that is no longer maintained or is not a good fit, please submit a
pull request!</p>
</blockquote>
<blockquote>
<p>Many long hours of hard work have gone into this project. If you find
it helpful, please share on Facebook, <a href="https://ctt.ec/V8B2p">on
Twitter</a>, <a href="http://t.cn/RnjFLCB">on Weibo</a>, or in your chat
groups! Knowledge is power, knowledge shared is power multiplied. Thank
you!</p>
</blockquote>
<h2 id="content">Content</h2>
<ul>
<li><a href="#principle">Principle</a></li>
<li><a href="#scalability">Scalability</a></li>
<li><a href="#availability">Availability</a></li>
<li><a href="#stability">Stability</a></li>
<li><a href="#performance">Performance</a></li>
<li><a href="#intelligence">Intelligence</a></li>
<li><a href="#architecture">Architecture</a></li>
<li><a href="#interview">Interview</a></li>
<li><a href="#organization">Organization</a></li>
<li><a href="#talk">Talk</a></li>
<li><a href="#book">Book</a></li>
</ul>
<h2 id="principle">Principle</h2>
<ul>
<li><a
href="https://people.eecs.berkeley.edu/~brewer/papers/GiantScale-IEEE.pdf">Lessons
from Giant-Scale Services - Eric Brewer, UC Berkeley &amp;
Google</a></li>
<li><a
href="https://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf">Designs,
Lessons and Advice from Building Large Distributed Systems - Jeff Dean,
Google</a></li>
<li><a
href="https://www.infoq.com/presentations/effective-api-design">How to
Design a Good API &amp; Why it Matters - Joshua Bloch, CMU &amp;
Google</a></li>
<li><a href="http://mvdirona.com/jrh/work/">On Efficiency, Reliability,
Scaling - James Hamilton, VP at AWS</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17americas/program/presentation/rosenthal">Principles
of Chaos Engineering</a></li>
<li><a
href="https://www.usenix.org/conference/srecon16/program/presentation/lueder">Finding
the Order in Chaos</a></li>
<li><a href="https://12factor.net/">The Twelve-Factor App</a></li>
<li><a
href="https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html">Clean
Architecture</a></li>
<li><a
href="http://www.math-cs.gordon.edu/courses/cs211/lectures-2009/Cohesion,Coupling,MVC.pdf">High
Cohesion and Low Coupling</a></li>
<li><a
href="https://medium.com/@SkyscannerEng/monoliths-and-microservices-8c65708c3dbf">Monoliths
and Microservices</a></li>
<li><a
href="http://robertgreiner.com/2014/08/cap-theorem-revisited/">CAP
Theorem and Trade-offs</a></li>
<li><a href="https://blog.andyet.com/2014/10/01/right-database">CP
Databases and AP Databases</a></li>
<li><a href="http://ithare.com/scaling-stateful-objects/">Stateless vs
Stateful Scalability</a></li>
<li><a
href="https://blog.codinghorror.com/scaling-up-vs-scaling-out-hidden-costs/">Scale
Up vs Scale Out: Hidden Costs</a></li>
<li><a
href="https://neo4j.com/blog/acid-vs-base-consistency-models-explained/">ACID
and BASE</a></li>
<li><a
href="https://blogs.msdn.microsoft.com/csliu/2009/08/27/io-concept-blockingnon-blocking-vs-syncasync/">Blocking/Non-Blocking
and Sync/Async</a></li>
<li><a
href="https://use-the-index-luke.com/sql/testing-scalability">Performance
and Scalability of Databases</a></li>
<li><a
href="http://highscalability.com/blog/2011/2/10/database-isolation-levels-and-their-effects-on-performance-a.html">Database
Isolation Levels and Effects on Performance and Scalability</a></li>
<li><a
href="https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html">The
Probability of Data Loss in Large Clusters</a></li>
<li><a
href="https://docs.microsoft.com/en-us/previous-versions/msp-n-p/dn271399(v=pandp.10)">Data
Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot
Persistence</a></li>
<li><a
href="https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/">SQL
vs NoSQL</a></li>
<li><a
href="https://engineering.salesforce.com/sql-or-nosql-9eaf1d92545b">SQL
vs NoSQL - Lesson Learned at Salesforce</a></li>
<li><a
href="https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d">NoSQL
Databases: Survey and Decision Guidance</a></li>
<li><a
href="https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6">How
Sharding Works</a></li>
<li><a
href="http://www.tom-e-white.com/2007/11/consistent-hashing.html">Consistent
Hashing</a></li>
<li><a
href="https://medium.com/@dgryski/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8">Consistent
Hashing: Algorithmic Tradeoffs</a></li>
<li><a
href="https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087">Don't
be tricked by the Hashing Trick</a></li>
<li><a
href="https://medium.com/netflix-techblog/distributing-content-to-open-connect-3e3e391d4dc9">Uniform
Consistent Hashing at Netflix</a></li>
<li><a
href="https://www.allthingsdistributed.com/2008/12/eventually_consistent.html">Eventually
Consistent - Werner Vogels, CTO at Amazon</a></li>
<li><a
href="https://www.stevesouders.com/blog/2012/10/11/cache-is-king/">Cache
is King</a></li>
<li><a
href="https://www.the-paper-trail.org/post/2014-06-06-paper-notes-anti-caching/">Anti-Caching</a></li>
<li><a
href="http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it">Understand
Latency</a></li>
<li><a href="http://norvig.com/21-days.html#answers">Latency Numbers
Every Programmer Should Know</a></li>
<li><a
href="https://queue.acm.org/detail.cfm?id=3096459&amp;__s=dnkxuaws9pogqdnxmx8i">The
Calculus of Service Availability</a></li>
<li><a
href="http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html">Architecture
Issues When Scaling Web Applications: Bottlenecks, Database, CPU,
IO</a></li>
<li><a
href="http://highscalability.com/blog/2012/5/16/big-list-of-20-common-bottlenecks.html">Common
Bottlenecks</a></li>
<li><a href="https://queue.acm.org/detail.cfm?id=3025012">Life Beyond
Distributed Transactions</a></li>
<li><a
href="https://www.usenix.org/conference/srecon15/program/presentation/taveira">Relying
on Software to Redirect Traffic Reliably at Various Layers</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17americas/program/presentation/andrus">Breaking
Things on Purpose</a></li>
<li><a
href="https://medium.com/@rdsubhas/10-modern-software-engineering-mistakes-bc67fbef4fc8">Avoid
Over Engineering</a></li>
<li><a
href="https://www.infoq.com/articles/scalability-worst-practices">Scalability
Worst Practices</a></li>
<li><a
href="https://medium.com/@DataStax/instagram-engineerings-3-rules-to-a-scalable-cloud-application-architecture-c44afed31406">Use
Solid Technologies - Don't Re-invent the Wheel - Keep It
Simple!</a></li>
<li><a
href="https://jobs.zalando.com/tech/blog/simplicity-by-distributing-complexity/">Simplicity
by Distributing Complexity</a></li>
<li><a href="http://tech.transferwise.com/why-over-reusing-is-bad/">Why
Over-Reusing is Bad</a></li>
<li><a
href="https://blog.codinghorror.com/performance-is-a-feature/">Performance
is a Feature</a></li>
<li><a
href="https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/">Make
Performance Part of Your Workflow</a></li>
<li><a
href="https://medium.com/walmartlabs/the-benefits-of-server-side-rendering-over-client-side-rendering-5d07ff2cefe8">The
Benefits of Server Side Rendering over Client Side Rendering</a></li>
<li><a
href="https://architecht.io/lessons-from-facebook-on-engineering-for-scale-f5716f0afc7a">Automate
and Abstract: Lessons at Facebook</a></li>
<li><a
href="https://8thlight.com/blog/sarah-sunday/2017/09/15/aws-dos-and-donts.html">AWS
Dos and Don'ts</a></li>
<li><a
href="https://medium.com/@hellostanley/design-doesnt-scale-4d81e12cbc3e">(UI)
Design Doesn't Scale - Stanley Wood, Design Director at Spotify</a></li>
<li><a href="http://www.brendangregg.com/linuxperf.html">Linux
Performance</a></li>
<li><a
href="https://www.igvita.com/2016/05/20/building-fast-and-resilient-web-applications/">Building
Fast and Resilient Web Applications - Ilya Grigorik</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17asia/program/presentation/wang_daxin">Accept
Partial Failures, Minimize Service Loss</a></li>
<li><a
href="http://highscalability.com/blog/2012/12/31/designing-for-resiliency-will-be-so-2013.html">Design
for Resiliency</a></li>
<li><a
href="https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/self-healing">Design
for Self-healing</a></li>
<li><a
href="https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/scale-out">Design
for Scaling Out</a></li>
<li><a
href="https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/design-for-evolution">Design
for Evolution</a></li>
<li><a
href="http://highscalability.com/blog/2013/8/26/reddit-lessons-learned-from-mistakes-made-scaling-to-1-billi.html">Learn
from Mistakes</a></li>
</ul>
<h2 id="scalability">Scalability</h2>
<ul>
<li><a href="https://martinfowler.com/microservices/">Microservices and
Orchestration</a>
<ul>
<li><a
href="https://eng.uber.com/microservice-architecture/">Domain-Oriented
Microservice Architecture at Uber</a></li>
<li><a
href="https://developers.soundcloud.com/blog/service-architecture-3">Service
Architecture (3 parts: Domain Gateways, Value-Added Services, BFF) at
SoundCloud</a></li>
<li><a
href="https://engineering.riotgames.com/news/thinking-inside-container">Container
(8 parts) at Riot Games</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/containerization-at-pinterest-92295347f2f3">Containerization
at Pinterest</a></li>
<li><a
href="https://medium.com/netflix-techblog/the-evolution-of-container-usage-at-netflix-3abfc096781b">Evolution
of Container Usage at Netflix</a></li>
<li><a href="https://eng.uber.com/dockerizing-mysql/">Dockerizing MySQL
at Uber</a></li>
<li><a
href="https://labs.spotify.com/2018/01/11/testing-of-microservices/">Testing
of Microservices at Spotify</a></li>
<li><a
href="https://medium.com/treehouse-engineering/lessons-learned-running-docker-in-production-5dce99ece770">Docker
in Production at Treehouse</a></li>
<li><a
href="https://developers.soundcloud.com/blog/inside-a-soundcloud-microservice">Microservice
at SoundCloud</a></li>
<li><a href="https://stripe.com/blog/operating-kubernetes">Operate
Kubernetes Reliably at Stripe</a></li>
<li><a
href="https://tech.trivago.com/2020/06/10/cross-cluster-traffic-mirroring-with-istio/">Cross-Cluster
Traffic Mirroring with Istio at Trivago</a></li>
<li><a
href="https://open.nytimes.com/agrarian-scale-kubernetes-part-3-ee459887ed7e">Agrarian-Scale
Kubernetes (3 parts) at New York Times</a></li>
<li><a
href="https://medium.com/bbc-design-engineering/powering-bbc-online-with-nanoservices-727840ba015b">Nanoservices
at BBC</a></li>
<li><a
href="https://www.techatbloomberg.com/blog/powerfulseal-testing-tool-kubernetes-clusters/">PowerfulSeal:
Testing Tool for Kubernetes Clusters at Bloomberg</a></li>
<li><a
href="https://medium.com/netflix-techblog/netflix-conductor-a-microservices-orchestrator-2e8d4771bf40">Conductor:
Microservices Orchestrator at Netflix</a></li>
<li><a
href="https://shopifyengineering.myshopify.com/blogs/engineering/docker-at-shopify-how-we-built-containers-that-power-over-100-000-online-shops">Docker
Containers that Power Over 100,000 Online Shops at Shopify</a></li>
<li><a
href="https://medium.engineering/microservice-architecture-at-medium-9c33805eb74f">Microservice
Architecture at Medium</a></li>
<li><a href="https://boxunix.com/post/bare_metal_to_kube/">From
bare-metal to Kubernetes at Betabrand</a></li>
<li><a
href="https://medium.com/tinder-engineering/tinders-move-to-kubernetes-cda2a6372f44">Kubernetes
at Tinder</a></li>
<li><a
href="https://www.quora.com/q/quoraengineering/Adopting-Kubernetes-at-Quora">Kubernetes
at Quora</a></li>
<li><a
href="https://medium.com/pinterest-engineering/building-a-kubernetes-platform-at-pinterest-fb3d9571c948">Kubernetes
Platform at Pinterest</a></li>
<li><a
href="https://medium.com/building-nubank/microservices-at-nubank-an-overview-2ebcb336c64d">Microservices
at Nubank</a></li>
<li><a
href="https://engineering.mercari.com/en/blog/entry/20210831-2019-06-07-155849/">Payment
Transaction Management in Microservices at Mercari</a></li>
<li><a
href="https://eng.snap.com/monolith-to-multicloud-microservices-snap-service-mesh">Service
Mesh at Snap</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/grit-a-protocol-for-distributed-transactions-across-microservices/">GRIT:
Protocol for Distributed Transactions across Microservices at
eBay</a></li>
<li><a
href="https://medium.com/palantir/introducing-rubix-kubernetes-at-palantir-ab0ce16ea42e">Rubix:
Kubernetes at Palantir</a></li>
<li><a
href="https://eng.uber.com/crisp-critical-path-analysis-for-microservice-architectures/">CRISP:
Critical Path Analysis for Microservice Architectures at Uber</a></li>
</ul></li>
<li><a
href="https://www.wix.engineering/post/scaling-to-100m-to-cache-or-not-to-cache">Distributed
Caching</a>
<ul>
<li><a
href="https://medium.com/netflix-techblog/caching-for-a-global-netflix-7bcc457012f1">EVCache:
Distributed In-memory Caching at Netflix</a></li>
<li><a
href="https://medium.com/netflix-techblog/cache-warming-agility-for-a-stateful-service-2d3b1da82642">EVCache
Cache Warmer Infrastructure at Netflix</a></li>
<li><a
href="https://blog.box.com/blog/introducing-memsniff-robust-memcache-traffic-analyzer/">Memsniff:
Robust Memcache Traffic Analyzer at Box</a></li>
<li><a
href="https://codeascraft.com/2017/11/30/how-etsy-caches/">Caching with
Consistent Hashing and Cache Smearing at Etsy</a></li>
<li><a
href="https://code.facebook.com/posts/220956754772273/an-analysis-of-facebook-photo-caching/">Analysis
of Photo Caching at Facebook</a></li>
<li><a
href="https://code.facebook.com/posts/964122680272229/web-performance-cache-efficiency-exercise/">Cache
Efficiency Exercise at Facebook</a></li>
<li><a href="http://tech.trivago.com/2015/10/15/tcache/">tCache:
Scalable Data-aware Java Caching at Trivago</a></li>
<li><a
href="https://engineering.quora.com/Pycache-lightning-fast-in-process-caching">Pycache:
In-process Caching at Quora</a></li>
<li><a
href="http://tech.trivago.com/2017/12/19/how-trivago-reduced-memcached-memory-usage-by-50/">Reduce
Memcached Memory Usage by 50% at Trivago</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/03/caching-internal-service-calls-at-yelp.html">Caching
Internal Service Calls at Yelp</a></li>
<li><a
href="https://allegro.tech/2017/01/estimating-the-cache-efficiency-using-big-data.html">Estimating
the Cache Efficiency using Big Data at Allegro</a></li>
<li><a
href="https://jobs.zalando.com/tech/blog/distributed-cache-akka-kubernetes/">Distributed
Cache at Zalando</a></li>
<li><a
href="https://medium.com/netflix-techblog/evolution-of-application-data-caching-from-ram-to-ssd-a33d6fa7a690">Application
Data Caching from RAM to SSD at Netflix</a></li>
<li><a
href="https://medium.com/@SkyscannerEng/the-tradeoffs-of-a-replicated-cache-b6680c722f58">Tradeoffs
of Replicated Cache at Skyscanner</a></li>
<li><a href="http://engblog.yext.com/post/geolocation-caching">Location
Caching with Quadtrees at Yext</a></li>
<li><a
href="https://medium.com/vimeo-engineering-blog/video-metadata-caching-at-vimeo-a54b25f0b304">Video
Metadata Caching at Vimeo</a></li>
<li><a
href="http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html">Scaling
Redis at Twitter</a></li>
<li><a
href="https://slack.engineering/scaling-slacks-job-queue-687222e9d100">Scaling
Job Queue with Redis at Slack</a></li>
<li><a
href="https://githubengineering.com/moving-persistent-data-out-of-redis/">Moving
persistent data out of Redis at GitHub</a></li>
<li><a
href="https://engineering.instagram.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c">Storing
Hundreds of Millions of Simple Key-Value Pairs in Redis at
Instagram</a></li>
<li><a
href="http://tech.trivago.com/2017/01/25/learn-redis-the-hard-way-in-production/">Redis
at Trivago</a></li>
<li><a
href="https://deliveroo.engineering/2017/01/19/optimising-membership-queries.html">Optimizing
Redis Storage at Deliveroo</a></li>
<li><a
href="http://engineering.wattpad.com/post/23244724794/store-more-stuff-memory-optimization-in-redis">Memory
Optimization in Redis at Wattpad</a></li>
<li><a href="https://blog.heroku.com/rolling-redis-fleet">Redis Fleet at
Heroku</a></li>
<li><a
href="https://developers.soundcloud.com/blog/gradle-remote-build-cache-misses-part-2">Solving
Remote Build Cache Misses (2 parts) at SoundCloud</a></li>
<li><a
href="https://blog.flipkart.tech/ratings-reviews-flipkart-part-2-574ab08e75cf">Ratings
&amp; Reviews (2 parts) at Flipkart</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/prefetch-caching-of-ebay-items/">Prefetch
Caching of Items at eBay</a></li>
<li><a
href="https://www.wix.engineering/post/how-we-built-a-cross-region-caching-library">Cross-Region
Caching Library at Wix</a></li>
<li><a
href="https://medium.com/pinterest-engineering/improving-distributed-caching-performance-and-efficiency-at-pinterest-92484b5fe39b">Improving
Distributed Caching Performance and Efficiency at Pinterest</a></li>
<li><a
href="https://doordash.engineering/2023/10/19/how-doordash-standardized-and-improved-microservices-caching/">Standardize
and Improve Microservices Caching at DoorDash</a></li>
<li><a
href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching">HTTP
Caching and CDN</a>
<ul>
<li><a
href="https://www.zynga.com/blogs/engineering/zynga-geo-proxy-reducing-mobile-game-latency">Zynga
Geo Proxy: Reducing Mobile Game Latency at Zynga</a></li>
<li><a
href="https://technology.condenast.com/story/the-why-and-how-of-google-amp-at-conde-nast">Google
AMP at Condé Nast</a></li>
<li><a
href="https://deliveroo.engineering/2016/09/19/ab-testing-cdns.html">A/B
Tests on Hosting Infrastructure (CDNs) at Deliveroo</a></li>
<li><a
href="https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic">HAProxy
with Kubernetes for User-facing Traffic at SoundCloud</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2018/03/meet-bandaid-the-dropbox-service-proxy/">Bandaid:
Service Proxy at Dropbox</a></li>
<li><a
href="https://slack.engineering/service-workers-at-slack-our-quest-for-faster-boot-times-and-offline-support-3492cf79c88">Service
Workers at Slack</a></li>
<li><a
href="https://labs.spotify.com/2020/02/24/how-spotify-aligned-cdn-services-for-a-lightning-fast-streaming-experience/">CDN
Services at Spotify</a></li>
</ul></li>
</ul></li>
<li><a
href="https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html">Distributed
Locking</a>
<ul>
<li><a
href="https://blog.acolyer.org/2015/02/13/the-chubby-lock-service-for-loosely-coupled-distributed-systems/">Chubby:
Lock Service for Loosely Coupled Distributed Systems at Google</a></li>
<li><a href="https://www.youtube.com/watch?v=MDuagr729aU">Distributed
Locking at Uber</a></li>
<li><a
href="https://engineering.gosquared.com/distributed-locks-using-redis">Distributed
Locks using Redis at GoSquared</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/zookeeper-at-twitter.html">ZooKeeper
at Twitter</a></li>
<li><a
href="https://chartio.com/blog/eliminating-duplicate-queries-using-distributed-locking/">Eliminating
Duplicate Queries using Distributed Locking at Chartio</a></li>
</ul></li>
<li><a
href="https://www.oreilly.com/ideas/understanding-the-value-of-distributed-tracing">Distributed
Tracking, Tracing, and Measuring</a>
<ul>
<li><a
href="https://blog.twitter.com/engineering/en_us/a/2012/distributed-systems-tracing-with-zipkin.html">Zipkin:
Distributed Systems Tracing at Twitter</a></li>
<li><a
href="https://developers.soundcloud.com/blog/using-kubernetes-pod-metadata-to-improve-zipkin-traces">Improve
Zipkin Traces using Kubernetes Pod Metadata at SoundCloud</a></li>
<li><a
href="https://www.infoq.com/presentations/canopy-scalable-tracing-analytics-facebook">Canopy:
Scalable Distributed Tracing &amp; Analysis at Facebook</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b">Pintrace:
Distributed Tracing at Pinterest</a></li>
<li><a
href="https://engineering.atspotify.com/2021/01/20/introducing-xcmetrics-our-all-in-one-tool-for-tracking-xcode-build-metrics/">XCMetrics:
All-in-One Tool for Tracking Xcode Build Metrics at Spotify</a></li>
<li><a
href="https://engineering.linkedin.com/distributed-service-call-graph/real-time-distributed-tracing-website-performance-and-efficiency">Real-time
Distributed Tracing at LinkedIn</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17americas/program/presentation/arthorne">Tracking
Service Infrastructure at Scale at Shopify</a></li>
<li><a
href="https://engineering.hellofresh.com/scaling-hellofresh-distributed-tracing-7b182928247d">Distributed
Tracing at HelloFresh</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/analyzing-distributed-trace-data-6aae58919949">Analyzing
Distributed Trace Data at Pinterest</a></li>
<li><a href="https://eng.uber.com/distributed-tracing/">Distributed
Tracing at Uber</a></li>
<li><a href="https://eng.uber.com/jvm-profiler/">JVM Profiler: Tracing
Distributed JVM Applications at Uber</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17asia/program/presentation/mah">Data
Checking at Dropbox</a></li>
<li><a
href="https://tech.showmax.com/2016/10/tracing-distributed-systems-at-showmax/">Tracing
Distributed Systems at Showmax</a></li>
<li><a
href="https://medium.com/@palantir/osquery-across-the-enterprise-3c3c9d13ec55">osquery
Across the Enterprise at Palantir</a></li>
<li><a
href="https://codeascraft.com/2011/02/15/measure-anything-measure-everything/">StatsD
at Etsy</a></li>
</ul></li>
<li><a
href="https://www.csee.umbc.edu/courses/graduate/CMSC621/fall02/lectures/ch11.pdf">Distributed
Scheduling</a>
<ul>
<li><a
href="https://www.pagerduty.com/eng/distributed-task-scheduling-3/">Distributed
Task Scheduling (3 parts) at PagerDuty</a></li>
<li><a
href="https://landing.google.com/sre/sre-book/chapters/distributed-periodic-scheduling/">Building
Cron at Google</a></li>
<li><a
href="https://engineering.quora.com/Quoras-Distributed-Cron-Architecture">Distributed
Cron Architecture at Quora</a></li>
<li><a
href="https://medium.com/airbnb-engineering/chronos-a-replacement-for-cron-f05d7d986a9d">Chronos:
A Replacement for Cron at Airbnb</a></li>
<li><a
href="https://engblog.nextdoor.com/we-don-t-run-cron-jobs-at-nextdoor-6f7f9cc62040">Scheduler
at Nextdoor</a></li>
<li><a href="https://eng.uber.com/peloton/">Peloton: Unified Resource
Scheduler for Diverse Cluster Workloads at Uber</a></li>
<li><a
href="https://medium.com/netflix-techblog/fenzo-oss-scheduler-for-apache-mesos-frameworks-5c340e77e543">Fenzo:
OSS Scheduler for Apache Mesos Frameworks at Netflix</a></li>
<li><a href="https://airflow.apache.org/">Airflow - Workflow
Orchestration</a>
<ul>
<li><a
href="https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8">Airflow
at Airbnb</a></li>
<li><a
href="https://www.adyen.com/knowledge-hub/apache-airflow-at-adyen">Airflow
at Adyen</a></li>
<li><a
href="https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee">Airflow
at Pandora</a></li>
<li><a
href="https://medium.com/robinhood-engineering/why-robinhood-uses-airflow-aed13a9a90c8">Airflow
at Robinhood</a></li>
<li><a
href="https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff">Airflow
at Lyft</a></li>
<li><a href="https://drivy.engineering/airflow-architecture/">Airflow at
Drivy</a></li>
<li><a
href="https://engineering.grab.com/experimentation-platform-data-pipeline">Airflow
at Grab</a></li>
<li><a
href="https://medium.com/adobetech/adobe-experience-platform-orchestration-service-with-apache-airflow-952203723c0b">Airflow
at Adobe</a></li>
<li><a
href="https://medium.com/walmartlabs/auditing-airflow-batch-jobs-73b45100045">Auditing
Airflow Job Runs at Walmart</a></li>
<li><a
href="https://hackernoon.com/meet-maat-alibabas-dag-based-distributed-task-scheduler-7c9cf0c83438">MaaT:
DAG-based Distributed Task Scheduler at Alibaba</a></li>
<li><a
href="https://www.etsy.com/codeascraft/boundary-layer-declarative-airflow-workflows">boundary-layer:
Declarative Airflow Workflows at Etsy</a></li>
</ul></li>
</ul></li>
<li><a
href="https://www.oreilly.com/ideas/monitoring-distributed-systems">Distributed
Monitoring and Alerting</a>
<ul>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/unicorn-rheos-remediation-center/">Unicorn:
Remediation System at eBay</a></li>
<li><a href="https://eng.uber.com/optimizing-m3/">M3: Metrics and
Monitoring Platform at Uber</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2019/05/athena-our-automated-build-health-management-system/">Athena:
Automated Build Health Management System at Dropbox</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2019/11/monitoring-server-applications-with-vortex/">Vortex:
Monitoring Server Applications at Dropbox</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2019/solving-manageability-challenges-with-nuage">Nuage:
Cloud Management Service at LinkedIn</a></li>
<li><a
href="https://netflixtechblog.com/telltale-netflix-application-monitoring-simplified-5c08bfa780ba">Telltale:
Application Monitoring at Netflix</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2019/06/smart-alerts-in-thirdeye--linkedins-real-time-monitoring-platfor">ThirdEye:
Monitoring Platform at LinkedIn</a></li>
<li><a
href="https://developers.soundcloud.com/blog/periskop-exception-monitoring-service">Periskop:
Exception Monitoring Service at SoundCloud</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2017/02/meet-securitybot-open-sourcing-automated-security-at-scale/">Securitybot:
Distributed Alerting Bot at Dropbox</a></li>
<li><a
href="https://www.usenix.org/conference/srecon18asia/presentation/xinchi">Monitoring
System at Alibaba</a></li>
<li><a
href="https://medium.com/dailymotion/real-user-monitoring-1948375f8be5">Real
User Monitoring at Dailymotion</a></li>
<li><a href="https://eng.uber.com/observability-at-scale/">Alerting
Ecosystem at Uber</a></li>
<li><a
href="https://medium.com/airbnb-engineering/alerting-framework-at-airbnb-35ba48df894f">Alerting
Framework at Airbnb</a></li>
<li><a
href="https://developers.soundcloud.com/blog/alerting-on-slos">Alerting
on Service-Level Objectives (SLOs) at SoundCloud</a></li>
<li><a
href="https://eng.uber.com/observability-anomaly-detection/">Job-based
Forecasting Workflow for Observability Anomaly Detection at
Uber</a></li>
<li><a
href="http://engineering.hackerearth.com/2017/03/21/monitoring-and-alert-system-using-graphite-and-cabot/">Monitoring
and Alert System using Graphite and Cabot at HackerEarth</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/a/2016/observability-at-twitter-technical-overview-part-ii.html">Observability
(2 parts) at Twitter</a></li>
<li><a
href="https://slack.engineering/distributed-security-alerting-c89414c992d6">Distributed
Security Alerting at Slack</a></li>
<li><a
href="https://www.infoq.com/presentations/news-alerting-bloomberg">Real-Time
News Alerting at Bloomberg</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2019/an-inside-look-at-linkedins-data-pipeline-monitoring-system-">Data
Pipeline Monitoring System at LinkedIn</a></li>
<li><a
href="https://blog.picnic.nl/monitoring-and-observability-at-picnic-684cefd845c4">Monitoring
and Observability at Picnic</a></li>
</ul></li>
<li><a
href="https://msdn.microsoft.com/en-us/library/cc767123.aspx">Distributed
Security</a>
<ul>
<li><a
href="https://blogs.dropbox.com/tech/2018/02/security-at-scale-the-dropbox-approach/">Approach
to Security at Scale at Dropbox</a></li>
<li><a
href="https://medium.com/netflix-techblog/introducing-aardvark-and-repokid-53b081bf3a7e">Aardvark
and Repokid: AWS Least Privilege for Distributed, High-Velocity
Development at Netflix</a></li>
<li><a
href="https://www.slideshare.net/MikeSvoboda/2017-lisa-linkedins-distributed-firewall-dfw">LISA:
Distributed Firewall at LinkedIn</a></li>
<li><a
href="https://engineering.coinbase.com/how-coinbase-builds-secure-infrastructure-to-store-bitcoin-in-the-cloud-30a6504e40ba">Secure
Infrastructure To Store Bitcoin In The Cloud at Coinbase</a></li>
<li><a
href="https://medium.com/airbnb-engineering/binaryalert-real-time-serverless-malware-detection-ca44370c1b90">BinaryAlert:
Real-time Serverless Malware Detection at Airbnb</a></li>
<li><a
href="https://segment.com/blog/secure-access-to-100-aws-accounts/">Scalable
IAM Architecture to Secure Access to 100 AWS Accounts at
Segment</a></li>
<li><a
href="http://engineering.indeedblog.com/blog/2018/04/oaudit-toolbox/">OAuth
Audit Toolbox at Indeed</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/04/ad-password-blacklisting.html">Active
Directory Password Blacklisting at Yelp</a></li>
<li><a
href="https://slack.engineering/syscall-auditing-at-scale-e6a3ca8ac1b8">Syscall
Auditing at Scale at Slack</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/160481899076/open-sourcing-athenz-fine-grained-role-based">Athenz:
Fine-Grained, Role-Based Access Control at Yahoo</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2018/05/introducing-webauthn-support-for-secure-dropbox-sign-in/">WebAuthn
Support for Secure Sign In at Dropbox</a></li>
<li><a
href="https://slack.engineering/moving-fast-and-securing-things-540e6c5ae58a">Security
Development Lifecycle at Slack</a></li>
<li><a
href="https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/">Unprivileged
Container Builds at Kinvolk</a></li>
<li><a
href="https://medium.com/netflix-techblog/netflix-sirt-releases-diffy-a-differencing-engine-for-digital-forensics-in-the-cloud-37b71abd2698">Diffy:
Differencing Engine for Digital Forensics in the Cloud at
Netflix</a></li>
<li><a
href="https://medium.com/netflix-techblog/netflix-cloud-security-detecting-credential-compromise-in-aws-9493d6fd373a">Detecting
Credential Compromise in AWS at Netflix</a></li>
<li><a
href="https://labs.spotify.com/2018/09/18/scalable-user-privacy/">Scalable
User Privacy at Spotify</a></li>
<li><a
href="https://engineering.indeedblog.com/blog/2018/09/application-scanning/">AVA:
Audit Web Applications at Indeed</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/11/ttl-as-a-service.html">TTL
as a Service: Automatic Revocation of Stale Privileges at Yelp</a></li>
<li><a
href="https://slack.engineering/engineering-dive-into-slack-enterprise-key-management-1fce471b178c">Enterprise
Key Management at Slack</a></li>
<li><a
href="https://blog.twitch.tv/en/2019/03/15/how-twitch-addresses-scalability-and-authentication/">Scalability
and Authentication at Twitch</a></li>
<li><a
href="https://netflixtechblog.com/edge-authentication-and-token-agnostic-identity-propagation-514e47e0b602">Edge
Authentication and Token-Agnostic Identity Propagation at
Netflix</a></li>
<li><a
href="https://blog.palantir.com/hardening-palantirs-kubernetes-infrastructure-with-cilium-1c40d4c7ef0">Hardening
Kubernetes Infrastructure with Cilium at Palantir</a></li>
<li><a
href="https://eng.lyft.com/improving-web-vulnerability-management-through-automation-2631570d8415">Improving
Web Vulnerability Management through Automation at Lyft</a></li>
<li><a
href="https://dropbox.tech/application/dropbox-passwords-clock-skew-payload-sync-merge">Clock
Skew when Syncing Password Payloads at Dropbox</a></li>
</ul></li>
<li><a href="https://arxiv.org/pdf/1704.00411.pdf">Distributed
Messaging, Queuing, and Event Streaming</a>
<ul>
<li><a
href="https://blogs.dropbox.com/tech/2017/05/introducing-cape/">Cape:
Event Stream Processing Framework at Dropbox</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2019/brooklin-open-source">Brooklin:
Distributed Service for Near Real-Time Data Streaming at
LinkedIn</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/04/samza-aeon--latency-insights-for-asynchronous-one-way-flows">Samza:
Stream Processing System for Latency Insights at LinkedIn</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/161855616651/open-sourcing-bullet-yahoos-forward-looking">Bullet:
Forward-Looking Query Engine for Streaming Data at Yahoo</a></li>
<li><a
href="https://codeascraft.com/2018/05/29/the-eventhorizon-saga/">EventHorizon:
Tool for Watching Events Streaming at Etsy</a></li>
<li><a
href="https://engineering.quora.com/Qmessage-Handling-Billions-of-Tasks-Per-Day">Qmessage:
Distributed, Asynchronous Task Queue at Quora</a></li>
<li><a href="https://eng.uber.com/cherami/">Cherami: Message Queue
System for Transporting Async Tasks at Uber</a></li>
<li><a
href="https://medium.com/airbnb-engineering/dynein-building-a-distributed-delayed-job-queueing-system-93ab10f05f99">Dynein:
Distributed Delayed Job Queueing System at Airbnb</a></li>
<li><a
href="https://netflixtechblog.com/timestone-netflixs-high-throughput-low-latency-priority-queueing-system-with-built-in-support-1abf249ba95f">Timestone:
Queueing System for Non-Parallelizable Workloads at Netflix</a></li>
<li><a
href="https://engineering.riotgames.com/news/riot-messaging-service">Messaging
Service at Riot Games</a></li>
<li><a
href="https://dropbox.tech/infrastructure/infrastructure-messaging-system-model-async-platform-evolution">Messaging
System Model at Dropbox</a></li>
<li><a
href="https://www.zillow.com/engineering/debugging-production-event-logging/">Debugging
Production with Event Logging at Zillow</a></li>
<li><a
href="https://medium.com/netflix-techblog/building-a-cross-platform-in-app-messaging-orchestration-service-86ba614f92d8">Cross-platform
In-app Messaging Orchestration Service at Netflix</a></li>
<li><a
href="https://medium.com/netflix-techblog/re-architecting-the-video-gatekeeper-f7b0ac2f6b00">Video
Gatekeeper at Netflix</a></li>
<li><a
href="https://www.infoq.com/presentations/neflix-push-messaging-scale">Scaling
Push Messaging for Millions of Devices at Netflix</a></li>
<li><a
href="http://engineering.indeedblog.com/blog/2017/06/delaying-messages/">Delaying
Asynchronous Message Processing with RabbitMQ at Indeed</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at">Benchmarking
Streaming Computation Engines at Yahoo</a></li>
<li><a
href="https://deliveroo.engineering/2019/02/05/improving-stream-data-quality-with-protobuf-schema-validation.html">Improving
Stream Data Quality With Protobuf Schema Validation at
Deliveroo</a></li>
<li><a
href="https://medium.engineering/scaling-email-infrastructure-for-medium-digest-254223c883b8">Scaling
Email Infrastructure at Medium</a></li>
<li><a href="https://slack.engineering/real-time-messaging/">Real-time
Messaging at Slack</a></li>
<li><a
href="https://medium.com/nikeengineering/moving-faster-with-aws-by-creating-an-event-stream-database-dedec8ca3eeb">Event
Stream Database at Nike</a></li>
<li><a
href="https://medium.com/udemy-engineering/designing-the-new-event-tracking-system-at-udemy-a45e502216fd">Event
Tracking System at Udemy</a></li>
<li><a
href="https://martinfowler.com/articles/201701-event-driven.html">Event-Driven
Messaging</a>
<ul>
<li><a
href="https://medium.com/swlh/creating-coding-excellence-with-domain-driven-design-88f73d2232c3">Domain-Driven
Design at Alibaba</a></li>
<li><a
href="https://medium.com/weebly-engineering/how-to-organize-your-monolith-before-breaking-it-into-services-69cbdb9248b0">Domain-Driven
Design at Weebly</a></li>
<li><a
href="https://engineering.moonpig.com/development/modelling-for-domain-driven-design">Domain-Driven
Design at Moonpig</a></li>
<li><a
href="https://www.infoq.com/presentations/netflix-scale-event-sourcing">Scaling
Event Sourcing for Netflix Downloads</a></li>
<li><a
href="https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c873cac33b8">Scaling
Event-Sourcing at Jet.com</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/event-sourcing-in-action-with-ebays-continuous-delivery-team/">Event
Sourcing (2 parts) at eBay</a></li>
<li><a
href="https://medium.com/inside-freenow/event-sourcing-an-evolutionary-perspective-31e7387aa6f1">Event
Sourcing at FREE NOW</a></li>
<li><a
href="https://medium.com/engineering-brainly/scalable-content-feed-using-event-sourcing-and-cqrs-patterns-e09df98bf977">Scalable
Content Feed Using Event Sourcing and CQRS Patterns at Brainly</a></li>
</ul></li>
<li><a href="https://aws.amazon.com/pub-sub-messaging/">Pub-Sub
Messaging</a>
<ul>
<li><a
href="https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale">Pulsar:
Pub-Sub Messaging at Scale at Yahoo</a></li>
<li><a
href="https://code.facebook.com/posts/188966771280871/wormhole-pub-sub-system-moving-data-through-space-and-time/">Wormhole:
Pub-Sub System at Facebook</a></li>
<li><a
href="https://medium.com/pinterest-engineering/memq-an-efficient-scalable-cloud-native-pubsub-system-4402695dd4e7">MemQ:
Cloud Native Pub-Sub System at Pinterest</a></li>
<li><a
href="https://medium.com/netflix-techblog/how-netflix-microservices-tackle-dataset-pub-sub-4a068adcc9a">Pub-Sub
in Microservices at Netflix</a></li>
</ul></li>
<li><a
href="https://martin.kleppmann.com/papers/kafka-debull15.pdf">Kafka -
Message Broker</a>
<ul>
<li><a
href="https://engineering.linkedin.com/kafka/running-kafka-scale">Kafka
at LinkedIn</a></li>
<li><a
href="https://medium.com/pinterest-engineering/how-pinterest-runs-kafka-at-scale-ff9c6f735be">Kafka
at Pinterest</a></li>
<li><a href="https://tech.trello.com/why-we-chose-kafka/">Kafka at
Trello</a></li>
<li><a
href="https://engineering.salesforce.com/how-apache-kafka-inspired-our-platform-events-architecture-2f351fe4cf63">Kafka
at Salesforce</a></li>
<li><a
href="https://open.nytimes.com/publishing-with-apache-kafka-at-the-new-york-times-7f0e3b7d2077">Kafka
at The New York Times</a></li>
<li><a
href="https://engineeringblog.yelp.com/2016/07/billions-of-messages-a-day-yelps-real-time-data-pipeline.html">Kafka
at Yelp</a></li>
<li><a
href="https://medium.com/criteo-labs/upgrading-kafka-on-a-large-infra-3ee99f56e970">Kafka
at Criteo</a></li>
<li><a
href="https://shopifyengineering.myshopify.com/blogs/engineering/running-apache-kafka-on-kubernetes-at-shopify">Kafka
on Kubernetes at Shopify</a></li>
<li><a
href="https://engineeringblog.yelp.com/2022/03/kafka-on-paasta-part-two.html">Kafka
on PaaSTA: Running Kafka on Kubernetes at Yelp (2 parts)</a></li>
<li><a
href="https://engineeringblog.yelp.com/2019/01/migrating-kafkas-zookeeper-with-no-downtime.html">Migrating
Kafka’s ZooKeeper with No Downtime at Yelp</a></li>
<li><a href="https://eng.uber.com/reliable-reprocessing/">Reprocessing
and Dead Letter Queues with Kafka at Uber</a></li>
<li><a href="https://eng.uber.com/chaperone/">Chaperone: Audit Kafka
End-to-End at Uber</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2019/01/finding-kafkas-throughput-limit-in-dropbox-infrastructure/">Finding
Kafka Throughput Limit in Infrastructure at Dropbox</a></li>
<li><a
href="https://medium.com/walmartlabs/cost-orchestration-at-walmart-f34918af67c4">Cost
Orchestration at Walmart</a></li>
<li><a
href="https://medium.com/hulu-tech-blog/how-hulu-uses-influxdb-and-kafka-to-scale-to-over-1-million-metrics-a-second-1721476aaff5">InfluxDB
and Kafka to Scale to Over 1 Million Metrics a Second at Hulu</a></li>
<li><a
href="https://medium.com/paypal-tech/scaling-kafka-to-support-paypals-data-growth-a0b4da420fab">Scaling
Kafka to Support Data Growth at PayPal</a></li>
</ul></li>
<li><a href="https://en.wikipedia.org/wiki/Data_deduplication">Stream
Data Deduplication</a>
<ul>
<li><a
href="https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/">Exactly-once
Semantics with Kafka</a></li>
<li><a
href="http://eng.tapjoy.com/blog-list/real-time-deduping-at-scale">Real-time
Deduping at Tapjoy</a></li>
<li><a
href="https://segment.com/blog/exactly-once-delivery/">Deduplication at
Segment</a></li>
<li><a
href="https://medium.com/@andrewsumin/efficient-storage-how-we-went-down-from-50-pb-to-32-pb-99f9c61bf6b4">Deduplication
at Mail.Ru</a></li>
<li><a
href="https://medium.com/mixpaneleng/petabyte-scale-data-deduplication-mixpanel-engineering-e808c70c99f8">Petabyte
Scale Data Deduplication at Mixpanel</a></li>
</ul></li>
</ul></li>
<li><a
href="https://blog.codinghorror.com/the-problem-with-logging/">Distributed
Logging</a>
<ul>
<li><a
href="https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">Logging
at LinkedIn</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754">Scalable
and Reliable Log Ingestion at Pinterest</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2015/building-distributedlog-twitter-s-high-performance-replicated-log-servic.html">High-performance
Replicated Log Service at Twitter</a></li>
<li><a
href="https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html">Logging
Service with Spark at CERN Accelerator</a></li>
<li><a
href="https://engineering.quora.com/Logging-and-Aggregation-at-Quora">Logging
and Aggregation at Quora</a></li>
<li><a
href="https://badoo.com/techblog/blog/2016/06/06/collection-and-analysis-of-daemon-logs-at-badoo/">Collection
and Analysis of Daemon Logs at Badoo</a></li>
<li><a
href="https://medium.com/palantir/using-static-code-analysis-to-improve-log-parsing-18f0d1843965">Log
Parsing with Static Code Analysis at Palantir</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/low-latency-and-high-throughput-cal-ingress/">Centralized
Application Logging at eBay</a></li>
<li><a
href="https://netflixtechblog.com/hyper-scale-vpc-flow-logs-enrichment-to-provide-network-insight-e5f1db02910d">Enrich
VPC Flow Logs at Hyper Scale to provide Network Insight at
Netflix</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/109908973316/bookkeeper-yahoos-distributed-log-storage-is">BookKeeper:
Distributed Log Storage at Yahoo</a></li>
<li><a
href="https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/">LogDevice:
Distributed Data Store for Logs at Facebook</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/03/introducing-logfeeder.html">LogFeeder:
Log Collection System at Yelp</a></li>
<li><a
href="https://medium.com/netflix-techblog/dblog-a-generic-change-data-capture-framework-69351fb9099b">DBLog:
Generic Change-Data-Capture Framework at Netflix</a></li>
</ul></li>
<li><a
href="http://nwds.cs.washington.edu/files/nwds/pdf/Distributed-WR.pdf">Distributed
Searching</a>
<ul>
<li><a
href="https://instagram-engineering.com/search-architecture-eeb34a936d3a">Search
Architecture at Instagram</a></li>
<li><a
href="http://www.cs.otago.ac.nz/homepages/andrew/papers/2017-8.pdf">Search
Architecture at eBay</a></li>
<li><a
href="https://medium.com/box-tech-blog/scaling-box-search-using-lumos-22d9e0cb4175">Search
Architecture at Box</a></li>
<li><a
href="https://medium.com/coupang-tech/the-evolution-of-search-discovery-indexing-platform-fa43e41305f9">Search
Discovery Indexing Platform at Coupang</a></li>
<li><a
href="https://medium.com/pinterest-engineering/building-a-universal-search-system-for-pinterest-e4cb03a898d4">Universal
Search System at Pinterest</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/making-e-commerce-search-faster/">Improving
Search Engine Efficiency by over 25% at eBay</a></li>
<li><a
href="https://medium.com/palantir/indexing-and-querying-telemetry-logs-with-lucene-234c5ce3e5f3">Indexing
and Querying Telemetry Logs with Lucene at Palantir</a></li>
<li><a
href="https://www.tripadvisor.com/engineering/query-understanding-at-tripadvisor/">Query
Understanding at TripAdvisor</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/03/search-federation-architecture-at-linkedin">Search
Federation Architecture at LinkedIn (2018)</a></li>
<li><a
href="https://slack.engineering/search-at-slack-431f8c80619e">Search at
Slack</a></li>
<li><a
href="https://careersatdoordash.com/blog/introducing-doordashs-in-house-search-engine/">Search
Engine at DoorDash</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2022/stability-and-scalability-for-search">Stability
and Scalability for Search at Twitter</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/a/2014/building-a-complete-tweet-index.html">Search
Service at Twitter (2014)</a></li>
<li><a
href="https://medium.com/traveloka-engineering/high-quality-autocomplete-search-part-2-d5b15bb0dadf">Autocomplete
Search (2 parts) at Traveloka</a></li>
<li><a
href="https://product.canva.com/building-a-data-driven-autocorrection-system/">Data-Driven
Autocorrection System at Canva</a></li>
<li><a
href="https://blog.flipkart.tech/adapting-search-to-indian-phonetics-cdbe65259686">Adapting
Search to Indian Phonetics at Flipkart</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2018/09/architecture-of-nautilus-the-new-dropbox-search-engine/">Nautilus:
Search Engine at Dropbox</a></li>
<li><a
href="https://engineering.linkedin.com/search/did-you-mean-galene">Galene:
Search Architecture of LinkedIn</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/manas-a-high-performing-customized-search-system-cf189f6ca40f">Manas:
High Performing Customized Search System at Pinterest</a></li>
<li><a
href="https://blog.flipkart.tech/sherlock-near-real-time-search-indexing-95519783859d">Sherlock:
Near Real Time Search Indexing at Flipkart</a></li>
<li><a
href="https://medium.com/airbnb-engineering/nebula-as-a-storage-platform-to-build-airbnbs-search-backends-ecc577b05f06">Nebula:
Storage Platform to Build Search Backends at Airbnb</a></li>
<li><a
href="https://logz.io/blog/15-tech-companies-chose-elk-stack/">ELK
(Elasticsearch, Logstash, Kibana) Stack</a>
<ul>
<li><a href="https://eng.uber.com/elk/">Predictions in Real Time with
ELK at Uber</a></li>
<li><a
href="https://webuild.envato.com/blog/building-a-scalable-elk-stack/">Building
a Scalable ELK Stack at Envato</a></li>
<li><a href="https://robinhood.engineering/taming-elk-4e1349f077c3">ELK
at Robinhood</a></li>
<li><a
href="https://www.infoq.com/presentations/uber-elasticsearch-clusters?utm_source=presentations_about_Case_Study&amp;utm_medium=link&amp;utm_campaign=Case_Study">Scaling
Elasticsearch Clusters at Uber</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/elasticsearch-performance-tuning-practice-at-ebay/">Elasticsearch
Performance Tuning Practice at eBay</a></li>
<li><a
href="https://medium.com/tinder-engineering/how-we-improved-our-performance-using-elasticsearch-plugins-part-2-b051da2ee85b">Improve
Performance using Elasticsearch Plugins (2 parts) at Tinder</a></li>
<li><a
href="https://kickstarter.engineering/elasticsearch-at-kickstarter-db3c487887fc">Elasticsearch
at Kickstarter</a></li>
<li><a
href="https://tech.trivago.com/2016/01/19/logstash_protobuf_codec/">Log
Parsing with Logstash and Google Protocol Buffers at Trivago</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/06/fast-order-search.html">Fast
Order Search using Data Pipeline and Elasticsearch at Yelp</a></li>
<li><a
href="https://engineeringblog.yelp.com/2017/06/moving-yelps-core-business-search-to-elasticsearch.html">Moving
Core Business Search to Elasticsearch at Yelp</a></li>
<li><a
href="http://engineering.vinted.com/2017/06/05/sharding-out-elasticsearch/">Sharding
out Elasticsearch at Vinted</a></li>
<li><a
href="http://engineering.wattpad.com/post/146216619727/self-ranking-search-with-elasticsearch-at-wattpad">Self-Ranking
Search with Elasticsearch at Wattpad</a></li>
<li><a
href="https://github.blog/2019-03-05-vulcanizer-a-library-for-operating-elasticsearch/">Vulcanizer:
Library for Operating Elasticsearch at GitHub</a></li>
</ul></li>
</ul></li>
<li><a
href="http://highscalability.com/blog/2011/11/1/finding-the-right-data-solution-for-your-application-in-the.html">Distributed
Storage</a>
<ul>
<li><a
href="https://medium.com/@denisanikin/what-an-in-memory-database-is-and-how-it-persists-data-efficiently-f43868cff4c1">In-memory
Storage</a>
<ul>
<li><a
href="http://highscalability.com/blog/2012/8/14/memsql-architecture-the-fast-mvcc-inmem-lockfree-codegen-and.html">MemSQL
Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) And Familiar
(SQL)</a></li>
<li><a
href="https://engineering.quora.com/Optimizing-Memcached-Efficiency">Optimizing
Memcached Efficiency at Quora</a></li>
<li><a href="https://blogs.cisco.com/datacenter/memsql">Real-Time Data
Warehouse with MemSQL on Cisco UCS</a></li>
<li><a href="http://eng.tapjoy.com/blog-list/moving-to-memsql">Moving to
MemSQL at Tapjoy</a></li>
<li><a
href="https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/68131">MemSQL
and Kinesis for Real-time Insights at Disney</a></li>
<li><a
href="https://engineering.pandora.com/using-memsql-at-pandora-79a86cb09b57">MemSQL
to Query Hundreds of Billions of Rows in a Dashboard at Pandora</a></li>
</ul></li>
<li><a
href="http://www.datacenterknowledge.com/archives/2013/10/04/object-storage-the-future-of-scale-out">Object
Storage</a>
<ul>
<li><a href="https://eng.uber.com/scaling-hdfs/">Scaling HDFS at
Uber</a></li>
<li><a
href="https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html">Reasons
for Choosing S3 over HDFS at Databricks</a></li>
<li><a
href="https://www.quantcast.com/blog/quantcast-file-system-on-amazon-s3/">File
System on Amazon S3 at Quantcast</a></li>
<li><a
href="https://tech.trivago.com/2018/09/03/efficient-image-recovery-at-scale-using-amazon-s3-versioning/">Image
Recovery at Scale Using S3 Versioning at Trivago</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at">Cloud
Object Store at Yahoo</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17americas/program/presentation/shenoy">Ambry:
Distributed Immutable Object Store at LinkedIn</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum">Dynamometer:
Scale Testing HDFS on Minimal Hardware with Maximum Fidelity at
LinkedIn</a></li>
<li><a
href="https://medium.com/airbnb-engineering/hammerspace-persistent-concurrent-off-heap-storage-3db39bb04472">Hammerspace:
Persistent, Concurrent, Off-heap Storage at Airbnb</a></li>
<li><a
href="https://medium.com/netflix-techblog/mezzfs-mounting-object-storage-in-netflixs-media-processing-platform-cda01c446ba">MezzFS:
Mounting Object Storage in Media Processing Platform at
Netflix</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pocket/">Magic
Pocket: In-house Multi-exabyte Storage System at Dropbox</a></li>
</ul></li>
</ul></li>
<li><a
href="https://www.mysql.com/products/cluster/scalability.html">Relational
Databases</a>
<ul>
<li><a href="https://www.uber.com/en-SG/blog/mysql-at-uber/">MySQL at
Uber</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/learn-to-stop-using-shiny-new-things-and-love-mysql-3e1613c2ce14">MySQL
at Pinterest</a></li>
<li><a
href="https://blog.twitch.tv/en/2016/10/11/how-twitch-uses-postgresql-c34aa9e56f58">PostgreSQL
at Twitch</a></li>
<li><a
href="https://medium.com/airbnb-engineering/tracking-the-money-scaling-financial-reporting-at-airbnb-6d742b80f040">Scaling
MySQL-based Financial Reporting System at Airbnb</a></li>
<li><a
href="https://www.wix.engineering/post/scaling-to-100m-mysql-is-a-better-nosql">Scaling
MySQL at Wix</a></li>
<li><a
href="https://engineering.fb.com/2023/05/16/data-infrastructure/mysql-raft-meta/">Building
and Deploying MySQL Raft at Meta</a></li>
<li><a
href="https://medium.com/airbnb-engineering/unlocking-horizontal-scalability-in-our-web-serving-tier-d907449cdbcf">MaxScale
(MySQL) Database Proxy at Airbnb</a></li>
<li><a
href="https://www.uber.com/en-NL/blog/postgres-to-mysql-migration/">Switching
from Postgres to MySQL at Uber</a></li>
<li><a
href="https://engineering.instagram.com/handling-growth-with-postgres-5-tips-from-instagram-d5d7e7ffdfcb">Handling
Growth with Postgres at Instagram</a></li>
<li><a
href="http://tech.transferwise.com/scaling-our-analytics-database/">Scaling
the Analytics Database (Postgres) at TransferWise</a></li>
<li><a
href="https://medium.com/adyen/updating-a-50-terabyte-postgresql-database-f64384b799e7">Updating
a 50 Terabyte PostgreSQL Database at Adyen</a></li>
<li><a
href="https://medium.com/paypal-engineering/scaling-database-access-for-100s-of-billions-of-queries-per-day-paypal-introducing-hera-e192adacda54">Scaling
Database Access for 100s of Billions of Queries per Day at
PayPal</a></li>
<li><a
href="https://engineeringblog.yelp.com/2020/11/minimizing-read-write-mysql-downtime.html">Minimizing
Read-Write MySQL Downtime at Yelp</a></li>
<li><a
href="https://engineering.fb.com/2021/07/22/data-infrastructure/mysql/">Migrating
MySQL from 5.6 to 8.0 at Facebook</a></li>
<li><a
href="https://quoraengineering.quora.com/Migration-from-HBase-to-MyRocks-at-Quora">Migration
from HBase to MyRocks at Quora</a></li>
<li><a
href="https://docs.microsoft.com/en-us/sql/relational-databases/replication/types-of-replication">Replication</a>
<ul>
<li><a
href="https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-4-annex-under-the-hood-eb456cf8b2fb">MySQL
Parallel Replication (4 parts) at Booking.com</a></li>
<li><a
href="https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/">Mitigating
MySQL Replication Lag and Reducing Read Load at GitHub</a></li>
<li><a
href="https://shopify.engineering/read-consistency-database-replicas">Read
Consistency with Database Replicas at Shopify</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/04/black-box-auditing.html">Black-Box
Auditing: Verifying End-to-End Replication Integrity between MySQL and
Redshift at Yelp</a></li>
<li><a
href="https://medium.com/airbnb-engineering/how-we-partitioned-airbnb-s-main-database-in-two-weeks-55f7e006ff21">Partitioning
Main MySQL Database at Airbnb</a></li>
<li><a href="https://eng.uber.com/herb-datacenter-replication/">Herb:
Multi-DC Replication Engine for Schemaless Datastore at Uber</a></li>
</ul></li>
<li><a
href="https://quabase.sei.cmu.edu/mediawiki/index.php/Shard_data_set_across_multiple_servers_(Range-based)">Sharding</a>
<ul>
<li><a
href="https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f">Sharding
MySQL at Pinterest</a></li>
<li><a
href="https://www.twilio.com/engineering/2014/06/26/how-we-replaced-our-data-pipeline-with-zero-downtime">Sharding
MySQL at Twilio</a></li>
<li><a
href="https://medium.com/square-corner-blog/sharding-cash-10280fa3ef3b">Sharding
MySQL at Square</a></li>
<li><a
href="https://www.quora.com/q/quoraengineering/MySQL-sharding-at-Quora">Sharding
MySQL at Quora</a></li>
<li><a href="https://eng.uber.com/schemaless-rewrite/">Sharding Layer of
Schemaless Datastore at Uber</a></li>
<li><a
href="https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c">Sharding
&amp; IDs at Instagram</a></li>
<li><a
href="https://www.notion.so/blog/sharding-postgres-at-notion">Sharding
Postgres at Notion</a></li>
<li><a
href="https://blog.box.com/blog/solr-improving-performance-batch-indexing/">Solr:
Improving Performance for Batch Indexing at Box</a></li>
<li><a
href="https://medium.com/tinder-engineering/geosharded-recommendations-part-3-consistency-2d2cb2f0594b">Geosharded
Recommendations (3 parts) at Tinder</a></li>
<li><a
href="https://engineering.fb.com/production-engineering/scaling-services-with-shard-manager/">Scaling
Services with Shard Manager at Facebook</a></li>
</ul></li>
<li><a
href="https://research.fb.com/wp-content/uploads/2019/03/Presto-SQL-on-Everything.pdf?">Presto
- Distributed SQL Query Engine</a>
<ul>
<li><a
href="https://medium.com/@Pinterest_Engineering/presto-at-pinterest-a8bda7515e52">Presto
at Pinterest</a></li>
<li><a
href="https://eng.lyft.com/presto-infrastructure-at-lyft-b10adb9db01">Presto
Infrastructure at Lyft</a></li>
<li><a
href="https://engineering.grab.com/scaling-like-a-boss-with-presto">Presto
at Grab</a></li>
<li><a href="https://eng.uber.com/presto/">Engineering Data Analytics
with Presto and Apache Parquet at Uber</a></li>
<li><a
href="https://slack.engineering/data-wrangling-at-slack-f2e0ff633b69">Data
Wrangling at Slack</a></li>
<li><a
href="https://medium.com/netflix-techblog/using-presto-in-our-big-data-platform-on-aws-938035909fd4">Presto
in Big Data Platform on AWS at Netflix</a></li>
<li><a
href="https://www.eventbrite.com/engineering/big-data-workloads-presto-auto-scaling/">Presto
Auto Scaling at Eventbrite</a></li>
<li><a
href="https://www.uber.com/en-MY/blog/speed-up-presto-with-alluxio-local-cache/">Speed
Up Presto with Alluxio Local Cache at Uber</a></li>
</ul></li>
</ul></li>
<li><a
href="https://www.thoughtworks.com/insights/blog/nosql-databases-overview">NoSQL
Databases</a>
<ul>
<li><a
href="http://www.cs.ucsb.edu/~agrawal/fall2009/dynamo.pdf">Key-Value
Databases</a>
<ul>
<li><a
href="https://medium.com/nikeengineering/becoming-a-nimble-giant-how-dynamo-db-serves-nike-at-scale-4cc375dbb18e">DynamoDB
at Nike</a></li>
<li><a
href="https://segment.com/blog/the-million-dollar-eng-problem/">DynamoDB
at Segment</a></li>
<li><a
href="https://blog.mapbox.com/scaling-mapbox-infrastructure-with-dynamodb-streams-d53eabc5e972">DynamoDB
at Mapbox</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/a/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale.html">Manhattan:
Distributed Key-Value Database at Twitter</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/120730204806/sherpa-scales-new-heights">Sherpa:
Distributed NoSQL Key-Value Store at Yahoo</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/178262468576/introducing-halodb-a-fast-embedded-key-value">HaloDB:
Embedded Key-Value Storage Engine at Yahoo</a></li>
<li><a
href="http://engineering.indeedblog.com/blog/2018/02/indeed-mph/">MPH:
Fast and Compact Immutable Key-Value Stores at Indeed</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2017/02/building-venice-with-apache-helix">Venice:
Distributed Key-Value Database at LinkedIn</a></li>
</ul></li>
<li><a href="https://aws.amazon.com/nosql/columnar/">Columnar
Databases</a>
<ul>
<li><a
href="http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf">Cassandra</a>
<ul>
<li><a
href="https://www.slideshare.net/DataStax/cassandra-at-instagram-2016">Cassandra
at Instagram</a></li>
<li><a
href="https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593">Storing
Images in Cassandra at Walmart</a></li>
<li><a
href="https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7">Storing
Messages with Cassandra at Discord</a></li>
<li><a
href="https://medium.com/walmartlabs/avoid-pitfalls-in-scaling-your-cassandra-cluster-lessons-and-remedies-a71ca01f8c04">Scaling
Cassandra Cluster at Walmart</a></li>
<li><a
href="https://engineeringblog.yelp.com/2016/08/how-we-scaled-our-ad-analytics-with-cassandra.html">Scaling
Ad Analytics with Cassandra at Yelp</a></li>
<li><a
href="https://medium.com/dream11-tech-blog/leaderboard-dream11-4efc6f93c23e">Scaling
to 100+ Million Reads/Writes using Spark and Cassandra at
Dream11</a></li>
<li><a
href="https://www.zomato.com/blog/how-we-moved-our-food-feed-from-redis-to-cassandra">Moving
Food Feed from Redis to Cassandra at Zomato</a></li>
<li><a
href="https://medium.com/netflix-techblog/benchmarking-cassandra-scalability-on-aws-over-a-million-writes-per-second-39f45f066c9e">Benchmarking
Cassandra Scalability on AWS at Netflix</a></li>
<li><a
href="https://quickbooks-engineering.intuit.com/service-decomposition-at-scale-70405ac2f637">Service
Decomposition at Scale with Cassandra at Intuit QuickBooks</a></li>
<li><a
href="https://developers.soundcloud.com/blog/keeping-counts-in-sync">Cassandra
for Keeping Counts In Sync at SoundCloud</a></li>
<li><a
href="https://medium.com/glassdoor-engineering/cassandra-driver-configuration-for-improved-performance-and-load-balancing-1b0106ce12bb">Cassandra
Driver Configuration for Improved Performance and Load Balancing at
Glassdoor</a></li>
<li><a
href="https://labs.spotify.com/2018/09/04/introducing-cstar-the-spotify-cassandra-orchestration-tool-now-open-source/">cstar:
Cassandra Orchestration Tool at Spotify</a></li>
</ul></li>
<li><a href="https://hbase.apache.org/">HBase</a>
<ul>
<li><a
href="https://engineering.salesforce.com/investing-in-big-data-apache-hbase-b9d98661a66b">HBase
at Salesforce</a></li>
<li><a
href="https://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919/">HBase
in Facebook Messages</a></li>
<li><a
href="https://blog.imgur.com/2015/09/15/tech-tuesday-imgur-notifications-from-mysql-to-hbase/">HBase
in Imgur Notification</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954">Improving
HBase Backup Efficiency at Pinterest</a></li>
<li><a
href="https://www.slideshare.net/HBaseCon/hbase-practice-at-xiaomi">HBase
at Xiaomi</a></li>
</ul></li>
<li><a
href="https://www.allthingsdistributed.com/2018/11/amazon-redshift-performance-optimization.html">Redshift</a>
<ul>
<li><a
href="https://engineering.giphy.com/scaling-redshift-without-scaling-costs/">Redshift
at GIPHY</a></li>
<li><a
href="https://www.hudl.com/bits/the-low-hanging-fruit-of-redshift-performance">Redshift
at Hudl</a></li>
<li><a
href="https://drivy.engineering/redshift_tips_ticks_part_1/">Redshift at
Drivy</a></li>
</ul></li>
</ul></li>
<li><a
href="https://msdn.microsoft.com/en-us/magazine/hh547103.aspx">Document
Databases</a>
<ul>
<li><a
href="https://www.mongodb.com/blog/post/ebay-building-mission-critical-multi-data-center-applications-with-mongodb">eBay:
Building Mission-Critical Multi-Data Center Applications with
MongoDB</a></li>
<li><a
href="https://www.mongodb.com/blog/post/mongodb-at-baidu-powering-100-apps-across-600-nodes-at-pb-scale">MongoDB
at Baidu: Multi-Tenant Cluster Storing 200+ Billion Documents across 160
Shards</a></li>
<li><a
href="https://medium.baqend.com/parse-is-gone-a-few-secrets-about-their-infrastructure-91b3ab2fcf71">The
AWS and MongoDB Infrastructure of Parse (acquired by Facebook)</a></li>
<li><a
href="https://medium.com/build-addepar/migrating-mountains-of-mongo-data-63e530539952">Migrating
Mountains of Mongo Data at Addepar</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2017/12/couchbase-ecosystem-at-linkedin">Couchbase
Ecosystem at LinkedIn</a></li>
<li><a
href="https://medium.com/zendesk-engineering/resurrecting-amazon-simpledb-9404034ec506">SimpleDB
at Zendesk</a></li>
<li><a
href="https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store">Espresso:
Distributed Document Store at LinkedIn</a></li>
</ul></li>
<li><a
href="https://www.eecs.harvard.edu/margo/papers/systor13-bench/">Graph
Databases</a>
<ul>
<li><a
href="https://blog.twitter.com/engineering/en_us/a/2010/introducing-flockdb.html">FlockDB:
Distributed Graph Database at Twitter</a></li>
<li><a
href="https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/11730-atc13-bronson.pdf">TAO:
Distributed Data Store for the Social Graph at Facebook</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/akutan-a-distributed-knowledge-graph-store/">Akutan:
Distributed Knowledge Graph Store at eBay</a></li>
</ul></li>
</ul></li>
<li><a href="https://www.influxdata.com/time-series-database/">Time
Series Databases</a>
<ul>
<li><a
href="https://code.facebook.com/posts/952820474848503/beringei-a-high-performance-time-series-storage-engine/">Beringei:
High-performance Time Series Storage Engine at Facebook</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/metricsdb.html">MetricsDB:
Time Series Database for Storing Metrics at Twitter</a></li>
<li><a
href="https://medium.com/netflix-techblog/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a">Atlas:
In-memory Dimensional Time Series Database at Netflix</a></li>
<li><a
href="https://labs.spotify.com/2015/11/17/monitoring-at-spotify-introducing-heroic/">Heroic:
Time Series Database at Spotify</a></li>
<li><a
href="https://developers.soundcloud.com/blog/roshi-a-crdt-system-for-timestamped-events">Roshi:
Distributed Storage System for Time-Series Events at SoundCloud</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181">Goku:
Time Series Database at Pinterest</a></li>
<li><a
href="https://medium.com/netflix-techblog/scaling-time-series-data-storage-part-ii-d67939655586">Scaling
Time Series Data Storage (2 parts) at Netflix</a></li>
<li><a
href="https://netflixtechblog.com/introducing-netflix-timeseries-data-abstraction-layer-31552f6326f8">Time
Series Data Abstraction Layer at Netflix</a></li>
<li><a href="https://druid.apache.org/">Druid: Real-time Analytics
Database</a>
<ul>
<li><a
href="https://medium.com/airbnb-engineering/druid-airbnb-data-platform-601c312f2a4c">Druid
at Airbnb</a></li>
<li><a
href="https://medium.com/walmartlabs/event-stream-analytics-at-walmart-with-druid-dcf1a37ceda7">Druid
at Walmart</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/monitoring-at-ebay-with-druid/">Druid
at eBay</a></li>
<li><a
href="https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06">Druid
at Netflix</a></li>
</ul></li>
</ul></li>
<li><a
href="https://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/">Distributed
Repositories, Dependencies, and Configurations Management</a>
<ul>
<li><a href="https://githubengineering.com/introducing-dgit/">DGit:
Distributed Git at GitHub</a></li>
<li><a
href="https://medium.com/@palantir/stemma-distributed-git-server-70afbca0fc29">Stemma:
Distributed Git Server at Palantir</a></li>
<li><a
href="https://code.flickr.net/2016/03/24/configuration-management-for-distributed-systems-using-github-and-cfg4j/">Configuration
Management for Distributed Systems at Flickr</a></li>
<li><a
href="https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-largest-git-repo-on-the-planet/">Git
Repository at Microsoft</a></li>
<li><a href="https://www.infoq.com/news/2017/02/GVFS">Solving Git Problems
with Large Repositories at Microsoft</a></li>
<li><a
href="https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext">Single
Repository at Google</a></li>
<li><a
href="https://medium.com/adyen/from-0-100-billion-scaling-infrastructure-and-workflow-at-adyen-7b63b690dfb6">Scaling
Infrastructure and (Git) Workflow at Adyen</a></li>
<li><a
href="https://medium.com/booking-com-infrastructure/dotfiles-distribution-dedb69c66a75">Dotfiles
Distribution at Booking.com</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/06/yelps-secret-detector.html">Secret
Detector: Preventing Secrets in Source Code at Yelp</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/09/managing-software-dependency-at-scale">Managing
Software Dependency at Scale at LinkedIn</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2020/continuous-integration">Merging
Code in High-velocity Repositories at LinkedIn</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/dynamic-configuration-at-twitter.html">Dynamic
Configuration at Twitter</a></li>
<li><a
href="https://medium.com/mixpaneleng/dynamic-configuration-at-mixpanel-94bfcf97d6b8">Dynamic
Configuration at Mixpanel</a></li>
<li><a
href="https://sg.godaddy.com/engineering/2019/03/06/dynamic-configuration-for-nodejs/">Dynamic
Configuration at GoDaddy</a></li>
<li><a
href="https://engineering.atspotify.com/2023/5/fleet-management-at-spotify-part-3-fleet-wide-refactoring">Fleet
Management (3 parts) at Spotify</a></li>
</ul></li>
<li><a
href="https://www.synopsys.com/blogs/software-security/agile-cicd-devops-glossary/">Scaling
Continuous Integration and Continuous Delivery</a>
<ul>
<li><a
href="https://code.fb.com/web/rapid-release-at-massive-scale/">Continuous
Integration Stack at Facebook</a></li>
<li><a
href="https://medium.com/netflix-techblog/towards-true-continuous-integration-distributed-repositories-and-dependencies-2a2e3108c051">Continuous
Integration with Distributed Repositories and Dependencies at
Netflix</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2019/12/continuous-integration-and-deployment-with-bazel/">Continuous
Integration and Deployment with Bazel at Dropbox</a></li>
<li><a
href="https://medium.com/airbnb-engineering/adopting-bazel-for-web-at-scale-a784b2dbe325">Adopting
Bazel for Web at Airbnb</a></li>
<li><a
href="https://tech.buzzfeed.com/continuous-deployments-at-buzzfeed-d171f76c1ac4">Continuous
Deployments at BuzzFeed</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/155765242061/open-sourcing-screwdriver-yahoos-continuous">Screwdriver:
Continuous Delivery Build System for Dynamic Infrastructure at
Yahoo</a></li>
<li><a
href="https://www.betterment.com/resources/ci-cd-shortening-the-feedback-loop/">CI/CD
at Betterment</a></li>
<li><a
href="https://medium.com/engineering-brainly/ci-cd-at-scale-fdfb0f49e031">CI/CD
at Brainly</a></li>
<li><a
href="https://engineering.shopify.com/blogs/engineering/scaling-ios-ci-with-anka">Scaling
iOS CI with Anka at Shopify</a></li>
<li><a
href="https://engineeringblog.yelp.com/2019/04/Scaling-Jira-Server-Administration-For-The-Enterprise.html">Scaling
Jira Server at Yelp</a></li>
<li><a
href="https://flexport.engineering/how-flexport-halved-testing-costs-with-an-auto-scaling-ci-cd-cluster-8304297222f">Auto-scaling
CI/CD Cluster at Flexport</a></li>
</ul></li>
</ul>
<h2 id="availability">Availability</h2>
<ul>
<li><a href="https://queue.acm.org/detail.cfm?id=2371297">Resilience
Engineering: Learning to Embrace Failure</a>
<ul>
<li><a
href="https://engineering.linkedin.com/blog/2017/11/resilience-engineering-at-linkedin-with-project-waterbear">Resilience
Engineering with Project Waterbear at LinkedIn</a></li>
<li><a
href="https://tech.iheart.com/resiliency-against-traffic-oversaturation-77c5ed92a5fb">Resiliency
against Traffic Oversaturation at iHeartRadio</a></li>
<li><a
href="https://blog.gojekengineering.com/resiliency-in-distributed-systems-efd30f74baf4">Resiliency
in Distributed Systems at GO-JEK</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/practical-nosql-resilience-design-pattern-for-the-enterprise/">Practical
NoSQL Resilience Design Pattern for the Enterprise at eBay</a></li>
<li><a
href="https://engineering.quora.com/Ensuring-Quoras-Resilience-to-Disaster">Ensuring
Resilience to Disaster at Quora</a></li>
<li><a
href="https://www.infoq.com/presentations/expedia-website-resiliency?utm_source=presentations_about_Case_Study&amp;utm_medium=link&amp;utm_campaign=Case_Study">Site
Resiliency at Expedia</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/resiliency-and-disaster-recovery-with-kafka/">Resiliency
and Disaster Recovery with Kafka at eBay</a></li>
<li><a href="https://eng.uber.com/kafka/">Disaster Recovery for
Multi-Region Kafka at Uber</a></li>
</ul></li>
<li><a
href="http://cloudpatterns.org/mechanisms/failover_system">Failover</a>
<ul>
<li><a
href="https://www.usenix.org/conference/srecon16/program/presentation/heady">The
Evolution of Global Traffic Routing and Failover</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17asia/program/presentation/liu_zehua">Testing
for Disaster Recovery Failover Testing</a></li>
<li><a
href="https://blog.risingstack.com/designing-microservices-architecture-for-failure/">Designing
a Microservices Architecture for Failure</a></li>
<li><a
href="https://engineering.gosquared.com/use-elb-automatic-failover">ELB
for Automatic Failover at GoSquared</a></li>
<li><a
href="http://americanexpress.io/eliminate-the-database-for-higher-availability/">Eliminate
the Database for Higher Availability at American Express</a></li>
<li><a
href="http://engineering.vinted.com/2015/09/03/failover-with-redis-sentinel/">Failover
with Redis Sentinel at Vinted</a></li>
<li><a
href="http://engineering.freeagent.com/2017/02/06/ha-infrastructure-without-breaking-the-bank/">High-availability
SaaS Infrastructure at FreeAgent</a></li>
<li><a
href="https://github.blog/2018-06-20-mysql-high-availability-at-github/">MySQL
High Availability at GitHub</a></li>
<li><a
href="https://www.eventbrite.com/engineering/mysql-high-availability-at-eventbrite/">MySQL
High Availability at Eventbrite</a></li>
<li><a
href="https://medium.com/walmartlabs/business-continuity-disaster-recovery-in-the-microservices-world-ef2adca363df">Business
Continuity &amp; Disaster Recovery at Walmart</a></li>
</ul></li>
<li><a
href="https://blog.vivekpanyam.com/scaling-a-web-service-load-balancing/">Load
Balancing</a>
<ul>
<li><a
href="https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236">Introduction
to Modern Network Load Balancing and Proxying</a></li>
<li><a
href="https://www.f5.com/company/blog/top-five-scalability-patterns">Top
Five (Load Balancing) Scalability Patterns</a></li>
<li><a
href="https://www.usenix.org/conference/srecon15europe/program/presentation/shuff">Load
Balancing Infrastructure to Support More than 1.3 Billion Users at
Facebook</a></li>
<li><a
href="https://code.facebook.com/posts/1734309626831603/dhcplb-an-open-source-load-balancer/">DHCPLB:
DHCP Load Balancer at Facebook</a></li>
<li><a
href="https://code.facebook.com/posts/1906146702752923/open-sourcing-katran-a-scalable-network-load-balancer/">Katran:
Scalable Network Load Balancer at Facebook</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html">Deterministic
Aperture: A Distributed Load Balancing Algorithm at Twitter</a></li>
<li><a
href="https://medium.com/netflix-techblog/netflix-shares-cloud-load-balancing-and-failover-tool-eureka-c10647ef95e5">Load
Balancing with Eureka at Netflix</a></li>
<li><a
href="https://medium.com/netflix-techblog/netflix-edge-load-balancing-695308b5548c">Edge
Load Balancing at Netflix</a></li>
<li><a
href="https://medium.com/netflix-techblog/open-sourcing-zuul-2-82ea476cb2b3">Zuul
2: Cloud Gateway at Netflix</a></li>
<li><a
href="https://engineeringblog.yelp.com/2017/05/taking-zero-downtime-load-balancing-even-further.html">Load
Balancing at Yelp</a></li>
<li><a href="https://githubengineering.com/introducing-glb/">Load
Balancing at GitHub</a></li>
<li><a
href="https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed">Consistent
Hashing to Improve Load Balancing at Vimeo</a></li>
<li><a
href="https://developers.500px.com/udp-load-balancing-with-keepalived-167382d7ad08">UDP
Load Balancing at 500px</a></li>
<li><a href="https://eng.uber.com/qalm/">QALM: QoS Load Management
Framework at Uber</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17europe/program/presentation/rastogi">Traffic
Steering Using RUM DNS at LinkedIn</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2018/10/dropbox-traffic-infrastructure-edge-network/">Traffic
Infrastructure (Edge Network) at Dropbox</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2020/01/intelligent-dns-based-load-balancing-at-dropbox/">Intelligent
DNS-based Load Balancing at Dropbox</a></li>
<li><a href="https://stripe.com/en-sg/blog/secret-life-of-dns">Monitoring
DNS Systems at Stripe</a></li>
<li><a
href="https://medium.com/monday-engineering/how-and-why-we-migrated-our-dns-from-cloudflare-to-a-multi-dns-architecture-part-3-584a470f4062">Multi-DNS
Architecture (3 parts) at Monday</a></li>
<li><a
href="https://medium.com/hulu-tech-blog/building-hulus-dynamic-anycast-dns-infrastructure-985a7a11fd30">Dynamic
Anycast DNS Infrastructure at Hulu</a></li>
</ul></li>
<li><a href="https://www.keycdn.com/support/rate-limiting/">Rate
Limiting</a>
<ul>
<li><a
href="https://blog.cloudflare.com/counting-things-a-lot-of-different-things/">Rate
Limiting for Scaling to Millions of Domains at Cloudflare</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/111288877956/cloud-bouncer-distributed-rate-limiting-at-yahoo">Cloud
Bouncer: Distributed Rate Limiting at Yahoo</a></li>
<li><a href="https://stripe.com/blog/rate-limiters">Scaling API with
Rate Limiters at Stripe</a></li>
<li><a
href="https://allegro.tech/2017/04/hermes-max-rate.html">Distributed
Rate Limiting at Allegro</a></li>
<li><a
href="https://www.twilio.com/blog/2017/11/chaos-engineering-ratequeue-ha.html">Ratequeue:
Core Queueing and Rate-Limiting System at Twilio</a></li>
<li><a href="https://engineering.grab.com/quotas-service">Quotas Service
at Grab</a></li>
<li><a
href="https://medium.com/figma-design/an-alternative-approach-to-rate-limiting-f8a06cf7c94c">Rate
Limiting at Figma</a></li>
</ul></li>
<li><a
href="https://medium.com/@BotmetricHQ/top-11-hard-won-lessons-learned-about-aws-auto-scaling-5bfe56da755f">Autoscaling</a>
<ul>
<li><a
href="https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64">Autoscaling
Pinterest</a></li>
<li><a
href="https://medium.com/square-corner-blog/autoscaling-based-on-request-queuing-c4c0f57f860f">Autoscaling
Based on Request Queuing at Square</a></li>
<li><a
href="http://tech.trivago.com/2017/02/17/your-definite-guide-for-autoscaling-jenkins/">Autoscaling
Jenkins at Trivago</a></li>
<li><a
href="https://labs.spotify.com/2017/11/20/autoscaling-pub-sub-consumers/">Autoscaling
Pub-Sub Consumers at Spotify</a></li>
<li><a
href="https://labs.spotify.com/2018/12/18/bigtable-autoscaler-saving-money-and-time-using-managed-storage/">Autoscaling
Bigtable Clusters based on CPU Load at Spotify</a></li>
<li><a
href="https://engineeringblog.yelp.com/2019/06/autoscaling-aws-step-functions-activities.html">Autoscaling
AWS Step Functions Activities at Yelp</a></li>
<li><a
href="https://medium.com/netflix-techblog/scryer-netflixs-predictive-auto-scaling-engine-a3f8fc922270">Scryer:
Predictive Auto Scaling Engine at Netflix</a></li>
<li><a
href="https://medium.com/palantir/bouncer-simple-aws-auto-scaling-rollovers-c5af601d65d4">Bouncer:
Simple AWS Auto Scaling Rollovers at Palantir</a></li>
<li><a
href="https://engineeringblog.yelp.com/2019/02/autoscaling-mesos-clusters-with-clusterman.html">Clusterman:
Autoscaling Mesos Clusters at Yelp</a></li>
</ul></li>
<li><a
href="http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36737.pdf">Availability
in Globally Distributed Storage Systems at Google</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/68823943185/nodejs-high-availability">NodeJS
High Availability at Yahoo</a></li>
<li><a
href="https://www.linkedin.com/pulse/introduction-every-day-monday-operations-benjamin-purgason">Operations
(11 parts) at LinkedIn</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17americas/program/presentation/barot">Monitoring
Powers High Availability for LinkedIn Feed</a></li>
<li><a
href="https://code.facebook.com/posts/166966743929963/how-production-engineers-support-global-events-on-facebook/">Supporting
Global Events at Facebook</a></li>
<li><a
href="https://medium.com/blablacar-tech/the-expendables-backends-high-availability-at-blablacar-8cea3b95b26b">High
Availability at BlaBlaCar</a></li>
<li><a
href="https://medium.com/@NetflixTechBlog/tips-for-high-availability-be0472f2599c">High
Availability at Netflix</a></li>
<li><a
href="https://www.twilio.com/engineering/2011/12/12/scaling-high-availablity-infrastructure-in-cloud">High
Availability Cloud Infrastructure at Twilio</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2019/01/automating-datacenter-operations-at-dropbox/">Automating
Datacenter Operations at Dropbox</a></li>
<li><a
href="https://technology.riotgames.com/news/globalizing-player-accounts">Globalizing
Player Accounts at Riot Games</a></li>
</ul>
<h2 id="stability">Stability</h2>
<ul>
<li><a href="https://martinfowler.com/bliki/CircuitBreaker.html">Circuit
Breaker</a>
<ul>
<li><a
href="https://www.infoq.com/presentations/circuit-breaking-distributed-systems">Circuit
Breaking in Distributed Systems</a></li>
<li><a
href="https://f5.com/about-us/blog/articles/the-art-of-scaling-containers-circuit-breakers-28919">Circuit
Breaker for Scaling Containers</a></li>
<li><a
href="https://developers.soundcloud.com/blog/lessons-in-resilience-at-SoundCloud">Lessons
in Resilience at SoundCloud</a></li>
<li><a href="http://tech.trivago.com/2016/02/23/protector/">Protector:
Circuit Breaker for Time Series Databases at Trivago</a></li>
<li><a
href="https://blog.heroku.com/improved-production-stability-with-circuit-breakers">Improved
Production Stability with Circuit Breakers at Heroku</a></li>
<li><a
href="https://medium.com/zendesk-engineering/the-joys-of-circuit-breaking-ee6584acd687">Circuit
Breaker at Zendesk</a></li>
<li><a
href="https://medium.com/traveloka-engineering/circuit-breakers-dont-let-your-dependencies-bring-you-down-5ba1c5cf1eec">Circuit
Breaker at Traveloka</a></li>
<li><a
href="https://shopify.engineering/circuit-breaker-misconfigured">Circuit
Breaker at Shopify</a></li>
</ul></li>
<li><a
href="https://www.javaworld.com/article/2824163/application-performance/stability-patterns-applied-in-a-restful-architecture.html">Timeouts</a>
<ul>
<li><a
href="https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a">Fault
Tolerance (Timeouts and Retries, Thread Separation, Semaphores, Circuit
Breakers) at Netflix</a></li>
<li><a
href="https://doordash.engineering/2018/12/21/enforce-timeout-a-doordash-reliability-methodology/">Enforce
Timeout: A Reliability Methodology at DoorDash</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/a-vip-connection-timeout-issue-caused-by-snat-and-tcp-tw-recycle/">Troubleshooting
a Connection Timeout Issue with tcp_tw_recycle Enabled at eBay</a></li>
</ul></li>
<li><a
href="https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336a69b317f">Crash-safe
Replication for MySQL at Booking.com</a></li>
<li><a
href="https://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html">Bulkheads:
Partition and Tolerate Failure in One Part</a></li>
<li><a
href="https://docs.microsoft.com/en-us/sql/relational-databases/policy-based-management/place-data-and-log-files-on-separate-drives">Steady
State: Always Put Logs on Separate Disk</a></li>
<li><a href="http://www.sosp.org/2001/papers/welsh.pdf">Throttling:
Maintain a Steady Pace</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2017/11/improving-resiliency-and-stability-of-a-large-scale-api">Multi-Clustering:
Improving Resiliency and Stability of a Large-scale Monolithic API
Service at LinkedIn</a></li>
<li><a
href="https://engineering.riotgames.com/news/determinism-league-legends-fixing-divergences">Determinism
(4 parts) in League of Legends Server</a></li>
</ul>
<h2 id="performance">Performance</h2>
<ul>
<li><a
href="https://stackify.com/application-performance-metrics/">Performance
Optimization on OS, Storage, Database, Network</a>
<ul>
<li><a
href="https://engineering.instagram.com/improving-performance-with-background-data-prefetching-b191acb39898">Improving
Performance with Background Data Prefetching at Instagram</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2020/fixing-linux-filesystem-performance-regressions">Fixing
Linux Filesystem Performance Regressions at LinkedIn</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/how-ebays-shopping-cart-used-compression-techniques-to-solve-network-io-bottlenecks/">Compression
Techniques to Solve Network I/O Bottlenecks at eBay</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/">Optimizing
Web Servers for High Throughput and Low Latency at Dropbox</a></li>
<li><a
href="https://medium.com/netflix-techblog/linux-performance-analysis-in-60-000-milliseconds-accc10403c55">Linux
Performance Analysis in 60,000 Milliseconds at Netflix</a></li>
<li><a
href="https://engineering.mixpanel.com/2018/07/31/live-downsizing-google-cloud-pds-for-fun-and-profit/">Live
Downsizing Google Cloud Persistent Disks (PD-SSD) at Mixpanel</a></li>
<li><a
href="https://zapier.com/engineering/celery-python-jemalloc/">Decreasing
RAM Usage by 40% Using jemalloc with Python &amp; Celery at
Zapier</a></li>
<li><a
href="https://slack.engineering/reducing-slacks-memory-footprint-4480fec7e8eb">Reducing
Memory Footprint at Slack</a></li>
<li><a
href="https://slack.engineering/continuous-load-testing/">Continuous
Load Testing at Slack</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/driving-user-growth-with-performance-improvements-cfc50dafadd7">Performance
Improvements at Pinterest</a></li>
<li><a href="https://www.youtube.com/watch?v=f9xI2jR71Ms">Server-Side
Rendering at Wix</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/02/making-30x-performance-improvements-on-yelps-mysqlstreamer.html">30x
Performance Improvements on MySQLStreamer at Yelp</a></li>
<li><a
href="https://medium.com/netflix-techblog/optimizing-the-netflix-api-5c9ac715cf19">Optimizing
APIs at Netflix</a></li>
<li><a
href="https://medium.com/walmartlabs/performance-monitoring-with-riemann-and-clojure-eafc07fcd375">Performance
Monitoring with Riemann and Clojure at Walmart</a></li>
<li><a
href="https://www.zynga.com/blogs/engineering/live-games-have-evolving-performance">Performance
Tracking Dashboard for Live Games at Zynga</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/optimization-of-cal-report-hadoop-mapreduce-job/">Optimizing
CAL Report Hadoop MapReduce Jobs at eBay</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/performance-tuning-on-quartz-scheduler/">Performance
Tuning on Quartz Scheduler at eBay</a></li>
<li><a
href="https://engineering.riotgames.com/news/profiling-optimisation">Profiling
C++ (Part 1: Optimization, Part 2: Measurement and Analysis) at Riot
Games</a></li>
<li><a
href="https://medium.com/homeaway-tech-blog/profiling-react-server-side-rendering-to-free-the-node-js-event-loop-7f0fe455a901">Profiling
React Server-Side Rendering at HomeAway</a></li>
<li><a
href="https://medium.com/dailymotion-engineering/hardware-assisted-video-transcoding-at-dailymotion-66cd2db448ae">Hardware-Assisted
Video Transcoding at Dailymotion</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2018/11/cross-shard-transactions-at-10-million-requests-per-second/">Cross-Shard
Transactions at 10 Million RPS at Dropbox</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/api-profiling-at-pinterest-6fa9333b4961">API
Profiling at Pinterest</a></li>
<li><a
href="https://engineeringblog.yelp.com/2017/07/generating-web-pages-in-parallel-with-pagelets.html">Pagelets
Parallelize Server-side Processing at Yelp</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/improving-key-expiration-in-redis.html">Improving
Key Expiration in Redis at Twitter</a></li>
<li><a
href="https://medium.com/mindgeek-engineering-blog/ad-delivery-network-performance-optimization-with-flame-graphs-bc550cf59cf7">Ad
Delivery Network Performance Optimization with Flame Graphs at
MindGeek</a></li>
<li><a
href="https://medium.com/netflix-techblog/predictive-cpu-isolation-of-containers-at-netflix-91f014d856c7">Predictive
CPU Isolation of Containers at Netflix</a></li>
<li><a
href="https://eng.uber.com/improving-hdfs-i-o-utilization-for-efficiency/">Improving
HDFS I/O Utilization for Efficiency at Uber</a></li>
<li><a
href="https://codeascraft.com/2020/04/23/cloud-jewels-estimating-kwh-in-the-cloud/">Cloud
Jewels: Estimating kWh in the Cloud at Etsy</a></li>
<li><a
href="https://engineering.indeedblog.com/blog/2019/12/unthrottled-fixing-cpu-limits-in-the-cloud/">Unthrottled:
Fixing CPU Limits in the Cloud (2 parts) at Indeed</a></li>
</ul></li>
<li><a
href="https://confluence.atlassian.com/enterprise/garbage-collection-gc-tuning-guide-461504616.html">Performance
Optimization by Tuning Garbage Collection</a>
<ul>
<li><a
href="https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications">Garbage
Collection in Java Applications at LinkedIn</a></li>
<li><a
href="https://medium.com/adobetech/engineering-high-throughput-low-latency-machine-learning-services-7d45edac0271">Garbage
Collection in High-Throughput, Low-Latency Machine Learning Services at
Adobe</a></li>
<li><a
href="https://developers.soundcloud.com/blog/garbage-collection-in-redux-applications">Garbage
Collection in Redux Applications at SoundCloud</a></li>
<li><a
href="https://blog.twitch.tv/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2">Garbage
Collection in a Go Application at Twitch</a></li>
<li><a
href="https://www.linux.com/blog/can-nodejs-scale-ask-team-alibaba">Analyzing
V8 Garbage Collection Logs at Alibaba</a></li>
<li><a
href="https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf">Python
Garbage Collection for Dropping 50% Memory Growth Per Request at
Instagram</a></li>
<li><a href="https://githubengineering.com/removing-oobgc/">Performance
Impact of Removing Out of Band Garbage Collector (OOBGC) at
GitHub</a></li>
<li><a
href="https://allegro.tech/2018/05/a-comedy-of-errors-debugging-java-memory-leaks.html">Debugging
Java Memory Leaks at Allegro</a></li>
<li><a href="https://www.youtube.com/watch?v=X4tmr3nhZRg">Optimizing JVM
at Alibaba</a></li>
<li><a href="https://eng.uber.com/jvm-tuning-garbage-collection/">Tuning
JVM Memory for Large-scale Services at Uber</a></li>
<li><a
href="https://medium.com/walmartglobaltech/solr-performance-tuning-beb7d0d0f8d9">Solr
Performance Tuning at Walmart</a></li>
<li><a
href="https://blog.flipkart.tech/memory-tuning-a-high-throughput-microservice-ed57b3e60997">Memory
Tuning a High Throughput Microservice at Flipkart</a></li>
</ul></li>
<li><a
href="https://developers.google.com/web/fundamentals/performance/why-performance-matters/">Performance
Optimization on Image, Video, Page Load</a>
<ul>
<li><a
href="https://code.facebook.com/posts/129055711052260/optimizing-360-photos-at-scale/">Optimizing
360 Photos at Scale at Facebook</a></li>
<li><a
href="https://codeascraft.com/2017/05/30/reducing-image-file-size-at-etsy/">Reducing
Image File Size in the Photos Infrastructure at Etsy</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/improving-gif-performance-on-pinterest-8dad74bf92f1">Improving
GIF Performance at Pinterest</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/optimizing-video-playback-performance-caf55ce310d1">Optimizing
Video Playback Performance at Pinterest</a></li>
<li><a
href="https://medium.com/netflix-techblog/optimized-shot-based-encodes-now-streaming-4b9464204830">Optimizing
Video Streams for Low Bandwidth with Dynamic Optimizer at
Netflix</a></li>
<li><a
href="https://youtube-eng.googleblog.com/2018/04/making-high-quality-video-efficient.html">Adaptive
Video Streaming at YouTube</a></li>
<li><a
href="https://medium.com/dailymotion/reducing-video-loading-time-fa9c997a2294">Reducing
Video Loading Time at Dailymotion</a></li>
<li><a
href="https://www.zillow.com/engineering/improving-homepage-performance/">Improving
Homepage Performance at Zillow</a></li>
<li><a
href="https://medium.com/expedia-engineering/go-fast-or-go-home-the-process-of-optimizing-for-client-performance-57bb497402e">The
Process of Optimizing for Client Performance at Expedia</a></li>
<li><a
href="https://medium.com/bbc-design-engineering/bbc-world-service-web-performance-26b08f7abfcc">Web
Performance at BBC</a></li>
</ul></li>
<li><a
href="https://blogs.akamai.com/2016/02/understanding-brotlis-potential.html">Performance
Optimization by Brotli Compression</a>
<ul>
<li><a
href="https://engineering.linkedin.com/blog/2017/05/boosting-site-speed-using-brotli-compression">Boosting
Site Speed Using Brotli Compression at LinkedIn</a></li>
<li><a
href="https://medium.com/booking-com-development/bookings-journey-with-brotli-978b249d34f3">Brotli
at Booking.com</a></li>
<li><a
href="https://tech.treebo.com/a-tale-of-brotli-compression-bcb071d9780a">Brotli
at Treebo</a></li>
<li><a
href="https://dropbox.tech/infrastructure/deploying-brotli-for-static-content">Deploying
Brotli for Static Content at Dropbox</a></li>
<li><a
href="https://engineeringblog.yelp.com/2017/07/progressive-enhancement-with-brotli.html">Progressive
Enhancement with Brotli at Yelp</a></li>
<li><a
href="https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/">Speeding
Up Redis with Compression at DoorDash</a></li>
</ul></li>
<li><a href="https://www.techempower.com/benchmarks/">Performance
Optimization on Languages and Frameworks</a>
<ul>
<li><a
href="https://netflixtechblog.com/python-at-netflix-bba45dae649e">Python
at Netflix</a></li>
<li><a
href="https://instagram-engineering.com/python-at-scale-strict-modules-c0bb9245c834">Python
at Scale (3 parts) at Instagram</a></li>
<li><a
href="https://engineering.issuu.com/2018/12/10/our-current-ocaml-best-practices-part-2">OCaml
Best Practices (2 parts) at Issuu</a></li>
<li><a
href="https://slack.engineering/taking-php-seriously-cf7a60065329">PHP
at Slack</a></li>
<li><a href="https://tech.trivago.com/2020/03/02/why-we-chose-go/">Go at
Trivago</a></li>
<li><a
href="https://codeascraft.com/2021/11/08/etsys-journey-to-typescript/">TypeScript
at Etsy</a></li>
<li><a
href="https://www.etsy.com/sg-en/codeascraft/sealed-classes-opened-my-mind">Kotlin
for Taming State at Etsy</a></li>
<li><a
href="https://doordash.engineering/2022/03/22/how-to-leverage-functional-programming-in-kotlin-to-write-better-cleaner-code/">Kotlin
at DoorDash</a></li>
<li><a
href="https://medium.com/bumble-tech/bpf-and-go-modern-forms-of-introspection-in-linux-6b9802682223">BPF
and Go at Bumble</a></li>
<li><a
href="https://medium.com/gitlab-magazine/why-we-use-ruby-on-rails-to-build-gitlab-601dce4a7a38">Ruby
on Rails at GitLab</a></li>
<li><a
href="https://medium.com/figma-design/rust-in-production-at-figma-e10a0ec31929">Rust
in Production at Figma</a></li>
<li><a
href="https://engineering.wework.com/choosing-a-language-stack-cac3726928f6">Choosing
a Language Stack at WeWork</a></li>
<li><a
href="https://blog.discord.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f">Switching
from Go to Rust at Discord</a></li>
<li><a
href="https://medium.com/agoda-engineering/happy-asp-net-core-performance-optimization-4e21a383d299">ASP.NET
Core Performance Optimization at Agoda</a></li>
<li><a href="https://eng.uber.com/data-race-patterns-in-go/">Data Race
Patterns in Go at Uber</a></li>
<li><a
href="https://netflixtechblog.com/java-21-virtual-threads-dude-wheres-my-lock-3052540e231d">Java
21 Virtual Threads at Netflix</a></li>
</ul></li>
</ul>
<h2 id="intelligence">Intelligence</h2>
<ul>
<li><a
href="https://insights.sei.cmu.edu/sei_blog/2017/05/reference-architectures-for-big-data-systems.html">Big
Data</a>
<ul>
<li><a href="https://eng.uber.com/uber-big-data-platform/">Data Platform
at Uber</a></li>
<li><a
href="https://www.unibw.de/code/events-u/jt-2018-workshops/ws3_bigdata_vortrag_widmann.pdf">Data
Platform at BMW</a></li>
<li><a href="https://www.youtube.com/watch?v=CSDIThSwA7s">Data Platform
at Netflix</a></li>
<li><a
href="https://blog.flipkart.tech/overview-of-flipkart-data-platform-20c6d3e9a196">Data
Platform at Flipkart</a></li>
<li><a
href="https://medium.com/coupang-tech/evolving-the-coupang-data-platform-308e305a9c45">Data
Platform at Coupang</a></li>
<li><a
href="https://doordash.engineering/2020/09/25/how-doordash-is-scaling-its-data-platform/">Data
Platform at DoorDash</a></li>
<li><a
href="http://engineering.khanacademy.org/posts/khanalytics.htm">Data
Platform at Khan Academy</a></li>
<li><a
href="https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c">Data
Infrastructure at Airbnb</a></li>
<li><a
href="https://www.infoq.com/presentations/big-data-infrastructure-linkedin">Data
Infrastructure at LinkedIn</a></li>
<li><a
href="https://blog.gojekengineering.com/data-infrastructure-at-go-jek-cd4dc8cbd929">Data
Infrastructure at GO-JEK</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754">Data
Ingestion Infrastructure at Pinterest</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/behind-the-pins-building-analytics-f7b508cdacab">Data
Analytics Architecture at Pinterest</a></li>
<li><a
href="https://engineering.atspotify.com/2022/03/why-we-switched-our-data-orchestration-service/">Data
Orchestration Service at Spotify</a></li>
<li><a
href="https://labs.spotify.com/2017/10/23/big-data-processing-at-spotify-the-road-to-scio-part-2/">Big
Data Processing (2 parts) at Spotify</a></li>
<li><a
href="https://cdn.oreillystatic.com/en/assets/1/event/160/Big%20data%20processing%20with%20Hadoop%20and%20Spark%2C%20the%20Uber%20way%20Presentation.pdf">Big
Data Processing at Uber</a></li>
<li><a
href="https://cdn.oreillystatic.com/en/assets/1/event/269/Lyft_s%20analytics%20pipeline_%20From%20Redshift%20to%20Apache%20Hive%20and%20Presto%20Presentation.pdf">Analytics
Pipeline at Lyft</a></li>
<li><a
href="https://tech.grammarly.com/blog/building-a-versatile-analytics-pipeline-on-top-of-apache-spark">Analytics
Pipeline at Grammarly</a></li>
<li><a
href="https://medium.com/teads-engineering/give-meaning-to-100-billion-analytics-events-a-day-d6ba09aa8f44">Analytics
Pipeline at Teads</a></li>
<li><a
href="https://www.infoq.com/presentations/paypal-ml-fraud-prevention-2018">ML
Data Pipelines for Real-Time Fraud Prevention at PayPal</a></li>
<li><a
href="https://cdn.oreillystatic.com/en/assets/1/event/269/Big%20data%20analytics%20and%20machine%20learning%20techniques%20to%20drive%20and%20grow%20business%20Presentation%201.pdf">Big
Data Analytics and ML Techniques at LinkedIn</a></li>
<li><a
href="https://cdn.oreillystatic.com/en/assets/1/event/137/Building%20a%20self-serve%20real-time%20reporting%20platform%20at%20LinkedIn%20Presentation%201.pdf">Self-Serve
Reporting Platform on Hadoop at LinkedIn</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2019/04/privacy-preserving-analytics-and-reporting-at-linkedin">Privacy-Preserving
Analytics and Reporting at LinkedIn</a></li>
<li><a
href="https://medium.com/walmartlabs/how-we-build-a-robust-analytics-platform-using-spark-kafka-and-cassandra-lambda-architecture-70c2d1bc8981">Analytics
Platform for Tracking Item Availability at Walmart</a></li>
<li><a
href="https://www.uber.com/en-SG/blog/real-time-analytics-for-mobile-app-crashes/">Real-Time
Analytics for Mobile App Crashes Using Apache Pinot at Uber</a></li>
<li><a
href="https://code.fb.com/data-center-engineering/hardware-analytics-and-lifecycle-optimization-halo-at-facebook/">HALO:
Hardware Analytics and Lifecycle Optimization at Facebook</a></li>
<li><a
href="https://techblog.king.com/rbea-scalable-real-time-analytics-king/">RBEA:
Real-time Analytics Platform at King</a></li>
<li><a href="https://eng.uber.com/aresdb/">AresDB: GPU-Powered Real-time
Analytics Engine at Uber</a></li>
<li><a href="https://eng.uber.com/athenax/">AthenaX: Streaming Analytics
Platform at Uber</a></li>
<li><a
href="https://www.uber.com/en-SG/blog/jupiter-batch-ingestion-platform/">Jupiter:
Config Driven Adtech Batch Ingestion Platform at Uber</a></li>
<li><a
href="https://medium.com/netflix-techblog/delta-a-data-synchronization-and-enrichment-platform-e82c36a79aee">Delta:
Data Synchronization and Enrichment Platform at Netflix</a></li>
<li><a
href="https://medium.com/netflix-techblog/keystone-real-time-stream-processing-platform-a3ee651812a">Keystone:
Real-time Stream Processing Platform at Netflix</a></li>
<li><a href="https://eng.uber.com/databook/">Databook: Turning Big Data
into Knowledge with Metadata at Uber</a></li>
<li><a
href="https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9">Amundsen:
Data Discovery &amp; Metadata Engine at Lyft</a></li>
<li><a href="https://eng.uber.com/maze/">Maze: Funnel Visualization
Platform at Uber</a></li>
<li><a
href="https://medium.com/netflix-techblog/metacat-making-big-data-discoverable-and-meaningful-at-netflix-56fb36a53520">Metacat:
Making Big Data Discoverable and Meaningful at Netflix</a></li>
<li><a
href="https://medium.com/airbnb-engineering/capturing-data-evolution-in-a-service-oriented-architecture-72f7c643ee6f">SpinalTap:
Change Data Capture System at Airbnb</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/announcing-the-accelerator-processing-1-000-000-000-lines-per-second-on-a-single-computer/">Accelerator:
Fast Data Processing Framework at eBay</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/180867271141/a-new-chapter-for-omid">Omid:
Transaction Processing Platform at Yahoo</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/157196488076/open-sourcing-tensorflowonspark-distributed-deep">TensorFlowOnSpark:
Distributed Deep Learning on Big Data Clusters at Yahoo</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/139916828451/caffeonspark-open-sourced-for-distributed-deep">CaffeOnSpark:
Distributed Deep Learning on Big Data Clusters at Yahoo</a></li>
<li><a
href="https://medium.com/adobetech/spark-on-scala-adobe-analytics-reference-architecture-7457f5614b4c">Spark
on Scala: Analytics Reference Architecture at Adobe</a></li>
<li><a
href="https://engineering.atspotify.com/2020/11/02/spotifys-new-experimentation-platform-part-2/">Experimentation
Platform (2 parts) at Spotify</a></li>
<li><a
href="https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166">Experimentation
Platform at Airbnb</a></li>
<li><a
href="https://engineering.zalando.com/posts/2017/10/zalando-smart-product-platform.html">Smart
Product Platform at Zalando</a></li>
<li><a href="https://www.slideshare.net/wyukawa/strata2017-sg">Log
Analysis Platform at LINE</a></li>
<li><a
href="https://medium.com/myntra-engineering/universal-dashboarding-platform-udp-data-visualisation-platform-at-myntra-5f2522fcf72d">Data
Visualisation Platform at Myntra</a></li>
<li><a
href="https://medium.com/netflix-techblog/building-and-scaling-data-lineage-at-netflix-to-improve-data-infrastructure-reliability-and-1a52526a7977">Building
and Scaling Data Lineage at Netflix</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/building-a-scalable-data-management-system-for-computer-vision-tasks-a6dee8f1c580">Building
a Scalable Data Management System for Computer Vision Tasks at
Pinterest</a></li>
<li><a
href="https://codeascraft.com/2019/07/31/an-introduction-to-structured-data-at-etsy/">Structured
Data at Etsy</a></li>
<li><a
href="https://medium.com/airbnb-engineering/scaling-a-mature-data-pipeline-managing-overhead-f34835cbc866">Scaling
a Mature Data Pipeline - Managing Overhead at Airbnb</a></li>
<li><a
href="https://medium.com/airbnb-engineering/on-spark-hive-and-small-files-an-in-depth-look-at-spark-partitioning-strategies-a9a364f908">Spark
Partitioning Strategies at Airbnb</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr">Scaling
the Hadoop Distributed File System at LinkedIn</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2021/scaling-linkedin-s-hadoop-yarn-cluster-beyond-10-000-nodes">Scaling
Hadoop YARN Cluster Beyond 10,000 Nodes at LinkedIn</a></li>
<li><a
href="https://medium.com/pinterest-engineering/securely-scaling-big-data-access-controls-at-pinterest-bbc3406a1695">Scaling
Big Data Access Controls at Pinterest</a></li>
</ul></li>
<li><a
href="https://www.csie.ntu.edu.tw/~cjlin/talks/bigdata-bilbao.pdf">Distributed
Machine Learning</a>
<ul>
<li><a
href="https://engineeringblog.yelp.com/2020/07/ML-platform-overview.html">Machine
Learning Platform at Yelp</a></li>
<li><a
href="https://codeascraft.com/2021/12/21/redesigning-etsys-machine-learning-platform/">Machine
Learning Platform at Etsy</a></li>
<li><a
href="https://engineering.zalando.com/posts/2022/04/zalando-machine-learning-platform.html">Machine
Learning Platform at Zalando</a></li>
<li><a
href="https://www.uber.com/en-SG/blog/scaling-ai-ml-infrastructure-at-uber/">Scaling
AI/ML Infrastructure at Uber</a></li>
<li><a
href="https://eng.lyft.com/the-recommendation-system-at-lyft-67bc9dcc1793">Recommendation
System at Lyft</a></li>
<li><a
href="https://eng.lyft.com/lyfts-reinforcement-learning-platform-670f77ff46ec">Reinforcement
Learning Platform at Lyft</a></li>
<li><a
href="https://www.etsy.com/sg-en/codeascraft/building-a-platform-for-serving-recommendations-at-etsy">Platform
for Serving Recommendations at Etsy</a></li>
<li><a
href="https://engineering.atspotify.com/2022/06/how-we-built-infrastructure-to-run-user-forecasts-at-spotify/">Infrastructure
to Run User Forecasts at Spotify</a></li>
<li><a href="https://code.fb.com/developer-tools/aroma/">Aroma: Using ML
for Code Recommendation at Facebook</a></li>
<li><a
href="https://eng.lyft.com/introducing-flyte-cloud-native-machine-learning-and-data-processing-platform-fb2bb3046a59">Flyte:
Cloud Native Machine Learning and Data Processing Platform at
Lyft</a></li>
<li><a
href="https://eng.lyft.com/lyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb">LyftLearn:
ML Model Training Infrastructure Built on Kubernetes at Lyft</a></li>
<li><a href="https://eng.uber.com/horovod/">Horovod: Open Source
Distributed Deep Learning Framework for TensorFlow at Uber</a></li>
<li><a
href="https://www.uber.com/blog/genie-ubers-gen-ai-on-call-copilot/">Genie:
Gen AI On-Call Copilot at Uber</a></li>
<li><a href="https://eng.uber.com/cota/">COTA: Improving Customer Care
with NLP &amp; Machine Learning at Uber</a></li>
<li><a href="https://eng.uber.com/manifold/">Manifold: Model-Agnostic
Visual Debugging Tool for Machine Learning at Uber</a></li>
<li><a href="https://githubengineering.com/topics/">Repo-Topix: Topic
Extraction Framework at GitHub</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/05/concourse--generating-personalized-content-notifications-in-near">Concourse:
Generating Personalized Content Notifications in Near-Real-Time at
LinkedIn</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/altus-care-apply-chatbot-to-ebay-platform-engineering/">Altus
Care: Applying a Chatbot to Platform Engineering at eBay</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/pykrylov-accelerating-machine-learning-research-at-ebay/">PyKrylov:
Accelerating Machine Learning Research at eBay</a></li>
<li><a
href="https://blog.box.com/blog/box-graph-how-we-built-spontaneous-social-network/">Box
Graph: Spontaneous Social Network at Box</a></li>
<li><a
href="https://hackernoon.com/pricingnet-modelling-the-global-airline-industry-with-neural-networks-833844d20ea6">PricingNet:
Pricing Modelling with Neural Networks at Skyscanner</a></li>
<li><a
href="https://medium.com/pinterest-engineering/pintext-a-multitask-text-embedding-system-in-pinterest-b80ece364555">PinText:
Multitask Text Embedding System at Pinterest</a></li>
<li><a
href="https://medium.com/pinterest-engineering/searchsage-learning-search-query-representations-at-pinterest-654f2bb887fc">SearchSage:
Learning Search Query Representations at Pinterest</a></li>
<li><a
href="https://dropbox.tech/machine-learning/cannes--how-ml-saves-us--1-7m-a-year-on-document-previews">Cannes:
ML Saves $1.7M a Year on Document Previews at Dropbox</a></li>
<li><a
href="https://engineeringblog.yelp.com/2018/01/building-a-distributed-ml-pipeline-part1.html">Scaling
Gradient Boosted Trees for Click-Through-Rate Prediction at
Yelp</a></li>
<li><a
href="https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html">Learning
with Privacy at Scale at Apple</a></li>
<li><a
href="https://medium.com/mercari-engineering/mercaris-image-classification-experiment-using-deep-learning-9b4e994a18ec">Deep
Learning for Image Classification Experiment at Mercari</a></li>
<li><a
href="https://allegro.tech/2016/12/deep-learning-for-frame-detection.html">Deep
Learning for Frame Detection in Product Images at Allegro</a></li>
<li><a
href="https://medium.com/hulu-tech-blog/content-based-video-relevance-prediction-b2c448e14752">Content-based
Video Relevance Prediction at Hulu</a></li>
<li><a
href="https://engineeringblog.yelp.com/2024/03/moderating-inappropriate-video-content-at-yelp.html">Moderating
Inappropriate Video Content at Yelp</a></li>
<li><a
href="http://engineering.tripadvisor.com/improving-tripadvisor-photo-selection-deep-learning/">Improving
Photo Selection With Deep Learning at TripAdvisor</a></li>
<li><a
href="https://www.tripadvisor.com/engineering/personalized-recommendations-for-experiences-using-deep-learning/">Personalized
Recommendations for Experiences Using Deep Learning at
TripAdvisor</a></li>
<li><a
href="https://medium.com/bbc-design-engineering/developing-personalised-recommender-systems-at-the-bbc-e26c5e0c4216">Personalised
Recommender Systems at BBC</a></li>
<li><a
href="https://technology.condenast.com/story/handbag-brand-and-color-detection">Machine
Learning (2 parts) at Condé Nast</a></li>
<li><a
href="https://technology.condenast.com/story/natural-language-processing-and-content-analysis-at-conde-nast-part-2-system-architecture">Natural
Language Processing and Content Analysis (2 parts) at Condé
Nast</a></li>
<li><a
href="https://tech.iheart.com/mapping-the-world-of-music-using-machine-learning-part-2-aa50b6a0304c">Mapping
the World of Music Using Machine Learning (2 parts) at
iHeartRadio</a></li>
<li><a
href="https://medium.com/netflix-techblog/using-machine-learning-to-improve-streaming-quality-at-netflix-9651263ef09f">Machine
Learning to Improve Streaming Quality at Netflix</a></li>
<li><a
href="https://blog.gojekengineering.com/how-we-use-machine-learning-to-match-drivers-riders-b06d617b9e5">Machine
Learning to Match Drivers &amp; Riders at GO-JEK</a></li>
<li><a
href="https://youtube-eng.googleblog.com/2015/10/improving-youtube-video-thumbnails-with_8.html">Improving
Video Thumbnails with Deep Neural Nets at YouTube</a></li>
<li><a
href="https://tech.instacart.com/how-instacart-delivers-on-time-using-quantile-regression-2383e2e03edb">Quantile
Regression for Delivering On Time at Instacart</a></li>
<li><a
href="https://jobs.zalando.com/tech/blog/search-deep-neural-network/">Cross-Lingual
End-to-End Product Search with Deep Learning at Zalando</a></li>
<li><a
href="https://blog.janestreet.com/real-world-machine-learning-part-1/">Machine
Learning at Jane Street</a></li>
<li><a
href="https://engineering.quora.com/A-Machine-Learning-Approach-to-Ranking-Answers-on-Quora">Machine
Learning for Ranking Answers End-to-End at Quora</a></li>
<li><a
href="http://engineering.flipboard.com/2017/02/storyclustering">Clustering
Similar Stories Using LDA at Flipboard</a></li>
<li><a
href="https://code.flickr.net/2017/03/07/introducing-similarity-search-at-flickr/">Similarity
Search at Flickr</a></li>
<li><a
href="http://engineering.indeedblog.com/blog/2016/04/building-a-large-scale-machine-learning-pipeline-for-job-recommendations/">Large-Scale
Machine Learning Pipeline for Job Recommendations at Indeed</a></li>
<li><a
href="http://engineering.taboola.com/deep-learning-from-prototype-to-production/">Deep
Learning from Prototype to Production at Taboola</a></li>
<li><a
href="https://cdn.oreillystatic.com/en/assets/1/event/144/Atom%20smashing%20using%20machine%20learning%20at%20CERN%20Presentation.pdf">Atom
Smashing using Machine Learning at CERN</a></li>
<li><a
href="https://medium.engineering/mapping-mediums-tags-1b9a78d77cf0">Mapping
Tags at Medium</a></li>
<li><a
href="http://engineering.monsanto.com/2015/11/23/chinese-restaurant-process/">Clustering
with the Dirichlet Process Mixture Model in Scala at Monsanto</a></li>
<li><a
href="https://engineering.foursquare.com/you-are-probably-here-better-map-pins-with-dbscan-random-forests-9d51e8c1964d">Map
Pins with DBSCAN &amp; Random Forests at Foursquare</a></li>
<li><a href="https://eng.uber.com/forecasting-introduction/">Forecasting
at Uber</a></li>
<li><a
href="https://eng.uber.com/transforming-financial-forecasting-machine-learning/">Financial
Forecasting at Uber</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows.html">Productionizing
ML with Workflows at Twitter</a></li>
<li><a
href="https://www.ebayinc.com/stories/blogs/tech/gui-testing-powered-by-deep-learning/">GUI
Testing Powered by Deep Learning at eBay</a></li>
<li><a
href="http://engineering.pivotal.io/post/scaling-machine-learning-to-recommend-driving-routes/">Scaling
Machine Learning to Recommend Driving Routes at Pivotal</a></li>
<li><a
href="https://www.infoq.com/presentations/doordash-real-time-predictions">Real-Time
Predictions at DoorDash</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2018/09/machine-intelligence-at-dropbox-an-update-from-our-dbxi-team/">Machine
Intelligence at Dropbox</a></li>
<li><a
href="https://blogs.dropbox.com/tech/2018/10/using-machine-learning-to-index-text-from-billions-of-images/">Machine
Learning for Indexing Text from Billions of Images at Dropbox</a></li>
<li><a
href="https://codeascraft.com/2018/07/12/modeling-user-journey-via-semantic-embeddings/">Modeling
User Journeys via Semantic Embeddings at Etsy</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/09/automated-fake-account-detection-at-linkedin">Automated
Fake Account Detection at LinkedIn</a></li>
<li><a
href="https://medium.com/airbnb-engineering/contextualizing-airbnb-by-building-knowledge-graph-b7077e268d5a">Building
Knowledge Graph at Airbnb</a></li>
<li><a
href="https://instagram-engineering.com/core-modeling-at-instagram-a51e0158aa48">Core
Modeling at Instagram</a></li>
<li><a href="https://tech.mercari.com/entry/2019/04/26/163000">Neural
Architecture Search (NAS) for Prohibited Item Detection at
Mercari</a></li>
<li><a
href="https://medium.com/airbnb-engineering/amenity-detection-and-beyond-new-frontiers-of-computer-vision-at-airbnb-144a4441b72e">Computer
Vision at Airbnb</a></li>
<li><a
href="https://www.zillow.com/engineering/behind-zillow-3d-home-backend-algorithms/">3D
Home Backend Algorithms at Zillow</a></li>
<li><a
href="https://eng.lyft.com/making-long-term-forecasts-at-lyft-fac475b3ba52">Long-term
Forecasts at Lyft</a></li>
<li><a
href="https://engineeringblog.yelp.com/2019/10/discovering-popular-dishes-with-deep-learning.html">Discovering
Popular Dishes with Deep Learning at Yelp</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/splitnet-architecture-for-ad-candidate-ranking.html">SplitNet
Architecture for Ad Candidate Ranking at Twitter</a></li>
<li><a
href="https://engineering.indeedblog.com/blog/2019/09/jobs-filter/">Jobs
Filter at Indeed</a></li>
<li><a
href="https://engineeringblog.yelp.com/2019/12/architecting-wait-time-estimations.html">Architecting
Restaurant Wait Time Predictions at Yelp</a></li>
<li><a
href="https://labs.spotify.com/2016/08/07/commodity-music-ml-services/">Music
Personalization at Spotify</a></li>
<li><a
href="https://sg.godaddy.com/engineering/2019/07/26/domain-name-valuation/">Deep
Learning for Domain Name Valuation at GoDaddy</a></li>
<li><a href="https://stripe.com/blog/similarity-clustering">Similarity
Clustering to Catch Fraud Rings at Stripe</a></li>
<li><a
href="https://codeascraft.com/2020/10/29/bringing-personalized-search-to-etsy/">Personalized
Search at Etsy</a></li>
<li><a
href="https://eng.lyft.com/ml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a">ML
Feature Serving Infrastructure at Lyft</a></li>
<li><a
href="https://codeascraft.com/2021/03/23/how-we-built-a-context-specific-bidding-system-for-etsy-ads/">Context-Specific
Bidding System at Etsy</a></li>
<li><a
href="https://engineeringblog.yelp.com/2021/05/moderating-promotional-spam-and-inappropriate-content-in-photos-at-scale-at-yelp.html">Moderating
Promotional Spam and Inappropriate Content in Photos at Scale at
Yelp</a></li>
<li><a
href="https://dropbox.tech/machine-learning/optimizing-payments-with-machine-learning">Optimizing
Payments with Machine Learning at Dropbox</a></li>
<li><a
href="https://netflixtechblog.com/scaling-media-machine-learning-at-netflix-f19b400243">Scaling
Media Machine Learning at Netflix</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/ebays-blazingly-fast-billion-scale-vector-similarity-engine/">Similarity
Engine at eBay</a></li>
<li><a
href="https://www.etsy.com/codeascraft/machine-learning-in-content-moderation-at-etsy">Machine
Learning in Content Moderation at Etsy</a></li>
</ul></li>
</ul>
<h2 id="architecture">Architecture</h2>
<ul>
<li><a
href="https://medium.engineering/the-stack-that-helped-medium-drive-2-6-millennia-of-reading-time-e56801f7c492">Tech
Stack at Medium</a></li>
<li><a
href="https://engineering.shopify.com/blogs/engineering/e-commerce-at-scale-inside-shopifys-tech-stack">Tech
Stack at Shopify</a></li>
<li><a
href="https://medium.com/airbnb-engineering/building-services-at-airbnb-part-4-23c95e428064">Building
Services (4 parts) at Airbnb</a></li>
<li><a
href="https://evernote.com/blog/a-digest-of-evernotes-architecture/">Architecture
of Evernote</a></li>
<li><a
href="https://engineering.riotgames.com/news/chat-service-architecture-persistence">Architecture
of Chat Service (3 parts) at Riot Games</a></li>
<li><a
href="https://technology.riotgames.com/news/architecture-league-client-update">Architecture
of League of Legends Client Update</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/building-twitters-ad-platform-architecture-for-the-future.html">Architecture
of Ad Platform at Twitter</a></li>
<li><a
href="https://eng.uber.com/architecture-api-gateway/">Architecture of
API Gateway at Uber</a></li>
<li><a
href="https://medium.com/tinder/how-we-built-the-tinder-api-gateway-831c6ca5ceca">Architecture
of API Gateway at Tinder</a></li>
<li><a
href="https://slack.engineering/how-slack-built-shared-channels-8d42c895b19f">Basic
Architecture of Slack</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/a-lightweight-distributed-architecture-to-handle-thousands-of-library-releases-at-ebay/">Lightweight
Distributed Architecture to Handle Thousands of Library Releases at
eBay</a></li>
<li><a
href="https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin">Back-end
at LinkedIn</a></li>
<li><a
href="https://yahooeng.tumblr.com/post/157200523046/introducing-tripod-flickrs-backend-refactored">Back-end
at Flickr</a></li>
<li><a
href="https://medium.com/zendesk-engineering/the-history-of-infrastructure-at-zendesk-part-3-foundation-team-forming-and-evolving-9859e40f5390">Infrastructure
(3 parts) at Zendesk</a></li>
<li><a
href="https://bytes.grubhub.com/cloud-infrastructure-at-grubhub-94db998a898a">Cloud
Infrastructure at Grubhub</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/01/now-you-see-me--now-you-dont--linkedins-real-time-presence-platf">Real-time
Presence Platform at LinkedIn</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2019/05/building-member-trust-through-a-centralized-and-scalable-setting">Settings
Platform at LinkedIn</a></li>
<li><a
href="https://medium.com/glassdoor-engineering/building-a-nearline-system-for-scale-and-performance-part-ii-9e01bf51b23d">Nearline
System for Scale and Performance (2 parts) at Glassdoor</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a">Real-time
User Action Counting System for Ads at Pinterest</a></li>
<li><a
href="https://engineering.riotgames.com/news/riot-games-api-deep-dive">API
Platform at Riot Games</a></li>
<li><a
href="https://open.nytimes.com/play-by-play-moving-the-nyt-games-platform-to-gcp-with-zero-downtime-cf425898d569">Games
Platform at The New York Times</a></li>
<li><a
href="https://bytes.swiggy.com/kabootar-swiggys-communication-platform-e5a43cc25629">Kabootar:
Communication Platform at Swiggy</a></li>
<li><a
href="https://medium.com/netflix-techblog/https-medium-com-netflix-techblog-simone-a-distributed-simulation-service-b2c85131ca1b">Simone:
Distributed Simulation Service at Netflix</a></li>
<li><a
href="https://engineeringblog.yelp.com/2017/04/how-yelp-runs-millions-of-tests-every-day.html">Seagull:
Distributed System That Helps Run More Than 20 Million Tests per Day at
Yelp</a></li>
<li><a
href="https://medium.com/agoda-engineering/priceaggregator-an-intelligent-system-for-hotel-price-fetching-part-3-52acfc705081">PriceAggregator:
Intelligent System for Hotel Price Fetching (3 parts) at Agoda</a></li>
<li><a
href="https://medium.com/tinder-engineering/phoenix-tinders-testing-platform-part-iii-520728b9537">Phoenix:
Testing Platform (3 parts) at Tinder</a></li>
<li><a
href="https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749">Hexagonal
Architecture at Netflix</a></li>
<li><a
href="https://www.slideshare.net/linecorp/architecture-sustaining-line-sticker-services">Architecture
of Sticker Services at LINE</a></li>
<li><a
href="https://medium.com/@palantir/terraforming-stack-overflow-enterprise-in-aws-47ee431e6be7">Stack
Overflow Enterprise at Palantir</a></li>
<li><a
href="https://medium.com/@Pinterest_Engineering/building-a-dynamic-and-responsive-pinterest-7d410e99f0a9">Architecture
of Following Feed, Interest Feed, and Picked For You at
Pinterest</a></li>
<li><a
href="https://engineering.wework.com/our-api-specification-workflow-9337448d6ee6">API
Specification Workflow at WeWork</a></li>
<li><a
href="https://medium.com/netflix-techblog/implementing-the-netflix-media-database-53b5a840b42a">Media
Database at Netflix</a></li>
<li><a
href="https://medium.com/walmartlabs/member-transaction-history-architecture-8b6e34b87c21">Member
Transaction History Architecture at Walmart</a></li>
<li><a
href="https://dropbox.tech/infrastructure/-testing-our-new-sync-engine">Sync
Engine (2 parts) at Dropbox</a></li>
<li><a
href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/how-we-built-twitter-s-highly-reliable-ads-pacing-service">Ads
Pacing Service at Twitter</a></li>
<li><a
href="https://netflixtechblog.com/rapid-event-notification-system-at-netflix-6deb1d2b57d1">Rapid
Event Notification System at Netflix</a></li>
<li><a
href="https://www.redhat.com/architect/portfolio/detail/12-integrating-a-modern-payments-architecture">Architectures
of Finance, Banking, and Payment Systems</a>
<ul>
<li><a
href="https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/">Bank
Backend at Monzo</a></li>
<li><a
href="https://medium.com/@Wealthsimple/engineering-at-wealthsimple-reinventing-our-trading-platform-for-scale-17e332241b6c">Trading
Platform for Scale at Wealthsimple</a></li>
<li><a
href="https://medium.com/margobank/choosing-an-architecture-85750e1e5a03">Core
Banking System at Margo Bank</a></li>
<li><a
href="https://www.infoq.com/presentations/nubank-architecture">Architecture
of Nubank</a></li>
<li><a
href="http://tech.transferwise.com/the-transferwise-stack-heartbeat-of-our-little-revolution/">Tech
Stack at TransferWise</a></li>
<li><a
href="https://medium.com/build-addepar/our-tech-stack-a4f55dab4b0d">Tech
Stack at Addepar</a></li>
<li><a
href="https://medium.com/airbnb-engineering/avoiding-double-payments-in-a-distributed-payments-system-2981f6b070bb">Avoiding
Double Payments in a Distributed Payments System at Airbnb</a></li>
<li><a
href="https://www.etsy.com/sg-en/codeascraft/scaling-etsy-payments-with-vitess-part-3--reducing-cutover-risk">Scaling
Payments (3 parts) at Etsy</a></li>
<li><a
href="https://paytm.com/blog/engineering/how-paytm-handles-millions-of-digital-transactions-safely-everyday/">Handling
Millions of Digital Transactions Safely Every Day at Paytm</a></li>
<li><a
href="https://www.grammarly.com/blog/engineering/billing-and-payments-platform/">Billing
and Payment Platform at Grammarly</a></li>
</ul></li>
</ul>
<h2 id="interview">Interview</h2>
<ul>
<li><a
href="https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/">Designing
Large-Scale Systems</a>
<ul>
<li><a href="https://blog.codinghorror.com/my-scaling-hero/">My Scaling
Hero - Jeff Atwood (a dose of endorphins before your interview,
JK)</a></li>
<li><a
href="https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf">Software
Engineering Advice from Building Large-Scale Distributed Systems - Jeff
Dean</a></li>
<li><a
href="https://lethain.com/introduction-to-architecting-systems-for-scale/">Introduction
to Architecting Systems for Scale</a></li>
<li><a
href="https://hackernoon.com/anatomy-of-a-system-design-interview-4cb57d75a53f">Anatomy
of a System Design Interview</a></li>
<li><a
href="http://blog.gainlo.co/index.php/2015/10/22/8-things-you-need-to-know-before-system-design-interviews/">8
Things You Need to Know Before a System Design Interview</a></li>
<li><a
href="https://hackernoon.com/top-10-system-design-interview-questions-for-software-engineers-8561290f0444">Top
10 System Design Interview Questions</a></li>
<li><a
href="https://towardsdatascience.com/10-common-software-architectural-patterns-in-a-nutshell-a0b47a1e9013">Top
10 Common Large-Scale Software Architectural Patterns in a
Nutshell</a></li>
<li><a href="https://lynnlangit.com/2017/03/14/beyond-relational/">Cloud
Big Data Design Patterns - Lynn Langit</a></li>
<li><a
href="https://hackernoon.com/how-not-to-design-netflix-in-your-45-minute-system-design-interview-64953391a054">How
NOT to design Netflix in your 45-minute System Design
Interview?</a></li>
<li><a href="https://zapier.com/engineering/api-best-practices/">API
Best Practices: Webhooks, Deprecation, and Design</a></li>
</ul></li>
<li><a
href="https://www.cse.wustl.edu/~jain/cse567-06/ftp/os_monitors/index.html">Explaining
Low-Level Systems (OS, Network/Protocol, Database, Storage)</a>
<ul>
<li><a href="http://veithen.github.io/2013/11/18/iowait-linux.html">The
Precise Meaning of I/O Wait Time in Linux</a></li>
<li><a
href="https://research.google.com/archive/paxos_made_live.html">Paxos
Made Live - An Engineering Perspective</a></li>
<li><a
href="https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html">How
to do Distributed Locking</a></li>
<li><a
href="http://elliot.land/post/sql-transaction-isolation-levels-explained">SQL
Transaction Isolation Levels Explained</a></li>
</ul></li>
<li><a
href="https://www.glassdoor.com/Interview/What-happens-when-you-type-www-google-com-in-your-browser-QTN_56396.htm">“What
Happens When… and How” Questions</a>
<ul>
<li><a
href="http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html">Netflix:
What Happens When You Press Play?</a></li>
<li><a
href="https://monzo.com/blog/2018/04/05/how-monzo-to-monzo-payments-work/">Monzo:
How Peer-To-Peer Payments Work</a></li>
<li><a
href="https://githubengineering.com/transit-and-peering-how-your-requests-reach-github/">Transit
and Peering: How Your Requests Reach GitHub</a></li>
<li><a
href="https://labs.spotify.com/2018/08/31/smoother-streaming-with-bbr/">How
Spotify Streams Music</a></li>
</ul></li>
</ul>
<h2 id="organization">Organization</h2>
<ul>
<li><a
href="https://developers.soundcloud.com/blog/engineering-levels">Engineering
Levels at SoundCloud</a></li>
<li><a
href="https://medium.com/palantir/dev-versus-delta-demystifying-engineering-roles-at-palantir-ad44c2a6e87">Engineering
Roles at Palantir</a></li>
<li><a
href="https://dropbox.tech/culture/our-updated-engineering-career-framework">Engineering
Career Framework at Dropbox</a></li>
<li><a href="https://www.youtube.com/watch?v=-PXi_7Ld5kU">Scaling
Engineering Teams at Twitter</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/03/scaling-decision-making-across-teams-within-linkedin-engineering">Scaling
Decision-Making Across Teams at LinkedIn</a></li>
<li><a
href="https://blog.gojekengineering.com/the-dynamics-of-scaling-an-organisation-cb96dbe8aecd">Scaling
Data Science Team at GOJEK</a></li>
<li><a
href="https://jobs.zalando.com/tech/blog/scaling-agile-zalando/?gh_src=4n3gxh1">Scaling
Agile at Zalando</a></li>
<li><a
href="https://hackernoon.com/how-we-run-bol-com-with-60-autonomous-teams-fe7a98c0759">Scaling
Agile at bol.com</a></li>
<li><a href="https://blog.intercom.com/how-we-build-software/">Lessons
Learned from Scaling a Product Team at Intercom</a></li>
<li><a
href="https://medium.com/@eleonorazucconi/toby-oliver-cto-typeform-on-hiring-managing-and-scaling-engineering-teams-86bef9e5a708">Hiring,
Managing, and Scaling Engineering Teams at Typeform</a></li>
<li><a
href="https://instagram-engineering.com/scaling-the-datagram-team-fc67bcf9b721">Scaling
the Datagram Team at Instagram</a></li>
<li><a
href="https://medium.com/flexport-design/designing-a-design-team-a9a066bc48a5">Scaling
the Design Team at Flexport</a></li>
<li><a
href="https://medium.com/salesforce-ux/the-salesforce-team-model-for-scaling-a-design-system-d89c2a2d404b">Team
Model for Scaling a Design System at Salesforce</a></li>
<li><a
href="https://medium.com/wish-engineering/scaling-the-analytics-team-at-wish-part-4-recruiting-2a9823b9f5a">Building
Analytics Team (4 parts) at Wish</a></li>
<li><a
href="https://medium.com/transferwise-ideas/from-2-founders-to-1000-employees-how-a-small-scale-startup-grew-into-a-global-community-9f26371a551b">From
2 Founders to 1000 Employees at TransferWise</a></li>
<li><a
href="https://medium.com/thinking-design/lessons-learned-growing-a-ux-team-from-10-to-170-f7b47be02262">Lessons
Learned Growing a UX Team from 10 to 170 at Adobe</a></li>
<li><a
href="https://medium.com/@sarahtavel/five-lessons-from-scaling-pinterest-6a699a889b08">Five
Lessons from Scaling at Pinterest</a></li>
<li><a
href="http://engineering.vinted.com/2018/09/04/how-we-approach-engineering-at-vinted/">Approach
to Engineering at Vinted</a></li>
<li><a
href="https://engineering.indeedblog.com/blog/2018/10/using-metrics-to-improve-the-development-process-and-coach-people/">Using
Metrics to Improve the Development Process (and Coach People) at
Indeed</a></li>
<li><a
href="https://medium.com/@SkyscannerEng/9-mistakes-to-avoid-while-creating-an-internal-product-63d579b00b1a">Mistakes
to Avoid while Creating an Internal Product at Skyscanner</a></li>
<li><a
href="https://codeascraft.com/2018/01/04/selecting-a-cloud-provider/">RACI
(Responsible, Accountable, Consulted, Informed) at Etsy</a></li>
<li><a
href="https://jobs.zalando.com/tech/blog/four-pillars-leadership/">Four
Pillars of Leading People (Empathy, Inspiration, Trust, Honesty) at
Zalando</a></li>
<li><a
href="https://engineering.shopify.com/blogs/engineering/pair-programming-explained">Pair
Programming at Shopify</a></li>
<li><a
href="https://blog.asana.com/2017/12/distributed-responsibility-engineering-manager/">Distributed
Responsibility at Asana</a></li>
<li><a
href="https://jobs.zalando.com/tech/blog/rotating-engineers-at-zalando/">Rotating
Engineers at Zalando</a></li>
<li><a
href="https://medium.com/pinterest-engineering/how-pinterest-supercharged-its-growth-team-with-experiment-idea-review-fd6571a02fb8">Experiment
Idea Review at Pinterest</a></li>
<li><a
href="https://engineering.atspotify.com/2020/06/25/tech-migrations-the-spotify-way/">Tech
Migrations at Spotify</a></li>
<li><a
href="https://engineeringblog.yelp.com/2021/01/whose-code-is-it-anyway.html">Improving
Code Ownership at Yelp</a></li>
<li><a
href="https://tech.ebayinc.com/engineering/how-creating-an-agile-code-base-helped-ebay-pivot-for-apple-silicon/">Agile
Code Base at eBay</a></li>
<li><a
href="https://medium.com/miro-engineering/agile-data-engineering-at-miro-ec2dcc8a3fcb">Agile
Data Engineering at Miro</a></li>
<li><a
href="https://medium.com/airbnb-engineering/incident-management-ae863dc5d47f">Automated
Incident Management through Slack at Airbnb</a></li>
<li><a
href="https://medium.com/bbc-product-technology/refactor-organisation-80e4e171d922">Refactor
Organization at BBC</a></li>
<li><a href="https://ai.google/research/pubs/pub47025">Code Review</a>
<ul>
<li><a
href="https://medium.com/@palantir/code-review-best-practices-19e02780015f">Code
Review at Palantir</a></li>
<li><a
href="https://engineering.linecorp.com/en/blog/effective-code-review/">Code
Review at LINE</a></li>
<li><a
href="https://medium.engineering/code-reviews-at-medium-bed2c0dce13a">Code
Reviews at Medium</a></li>
<li><a
href="https://engineering.linkedin.com/blog/2018/06/scaling-collective-code-ownership-with-code-reviews">Code
Review at LinkedIn</a></li>
<li><a
href="https://medium.com/disney-streaming/the-secret-to-better-code-reviews-c14c7884b9ac">Code
Review at Disney</a></li>
<li><a
href="https://www.netlify.com/blog/2020/03/05/feedback-ladders-how-we-encode-code-reviews-at-netlify/">Code
Review at Netlify</a></li>
</ul></li>
</ul>
<h2 id="talk">Talk</h2>
<ul>
<li><a href="https://www.youtube.com/watch?v=Y6Ev8GIlbxc">Distributed
Systems in One Lesson - Tim Berglund, Senior Director of Developer
Experience at Confluent</a></li>
<li><a
href="https://www.usenix.org/conference/srecon17americas/program/presentation/erlich">Building
Real-Time Infrastructure at Facebook - Jeff Barber and Shie Erlich,
Software Engineers at Facebook</a></li>
<li><a
href="https://www.usenix.org/conference/srecon16/program/presentation/alvidrez">Building
Reliable Social Infrastructure for Google - Marc Alvidrez, Senior
Manager at Google</a></li>
<li><a href="https://www.youtube.com/watch?v=K8YuavUy6Qc">Building a
Distributed Build System at Google Scale - Aysylu Greenberg, SDE at
Google</a></li>
<li><a href="https://www.youtube.com/watch?v=ggizCjUCCqE">Site
Reliability Engineering at Dropbox - Tammy Butow, Site Reliability
Engineering Manager at Dropbox</a></li>
<li><a href="https://www.youtube.com/watch?v=H4vMcD7zKM0">How Google
Does Planet-Scale Engineering for Planet-Scale Infra - Melissa Binde, SRE Director
for Google Cloud Platform</a></li>
<li><a
href="https://www.youtube.com/watch?v=CZ3wIuvmHeM&amp;t=2837s">Netflix
Guide to Microservices - Josh Evans, Director of Operations Engineering
at Netflix</a></li>
<li><a href="https://www.youtube.com/watch?v=1-3Ahy7Fxsc">Achieving
Rapid Response Times in Large Online Services - Jeff Dean, Google Senior
Fellow</a></li>
<li><a href="https://www.youtube.com/watch?v=N8NWDHgWA28">Architecture
to Handle 80K RPS Celebrity Sales at Shopify - Simon Eskildsen,
Engineering Lead at Shopify</a></li>
<li><a href="https://www.youtube.com/watch?v=QCHiNEw73AU">Lessons of
Scale at Facebook - Bobby Johnson, Director of Engineering at
Facebook</a></li>
<li><a href="https://www.salesforce.com/video/1757880/">Performance
Optimization for the Greater China Region at Salesforce - Jeff Cheng,
Enterprise Architect at Salesforce</a></li>
<li><a href="https://vimeo.com/252367076">How GIPHY Delivers a GIF to
300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at
GIPHY</a></li>
<li><a
href="https://www.youtube.com/watch?v=wzsxJqeVIhY&amp;list=PLMu8-hpCxIVENuAue7bd0eCAglLGY_8AW&amp;index=7">High
Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior
Director at Alibaba</a></li>
<li><a
href="https://atscaleconference.com/videos/solving-large-scale-data-center-and-cloud-interconnection-problems/">Solving
Large-Scale Data Center and Cloud Interconnection Problems - Ihab
Tarazi, CTO at Equinix</a></li>
<li><a href="https://www.youtube.com/watch?v=PE4gwstWhmc">Scaling
Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox</a></li>
<li><a href="https://www.youtube.com/watch?v=IhGWOaD5BYQ">Scaling
Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox</a></li>
<li><a
href="https://atscaleconference.com/videos/performance-scale-2018-opening-remarks/">Scaling
with Performance at Facebook - Bill Jia, VP of Infrastructure at
Facebook</a></li>
<li><a href="https://www.youtube.com/watch?v=IO4teCbHvZw">Scaling Live
Videos to a Billion Users at Facebook - Sachin Kulkarni, Director of
Engineering at Facebook</a></li>
<li><a href="https://www.youtube.com/watch?v=hnpzNAPiC0E">Scaling
Infrastructure at Instagram - Lisa Guo, Instagram Engineering</a></li>
<li><a href="https://www.youtube.com/watch?v=6OvrFkLSoZ0">Scaling
Infrastructure at Twitter - Yao Yue, Staff Software Engineer at
Twitter</a></li>
<li><a href="https://www.youtube.com/watch?v=LfqyhM1LeIU">Scaling
Infrastructure at Etsy - Bethany Macri, Engineering Manager at
Etsy</a></li>
<li><a
href="https://atscaleconference.com/videos/scaling-alibabas-real-time-infrastructure-for-global-shopping-holiday/">Scaling
Real-Time Infrastructure at Alibaba for Global Shopping Holiday -
Xiaowei Jiang, Senior Director at Alibaba</a></li>
<li><a href="https://www.youtube.com/watch?v=cdsfRXr9pJU">Scaling Data
Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify</a></li>
<li><a
href="https://www.youtube.com/watch?v=jQNCuD_hxdQ&amp;list=RDhnpzNAPiC0E&amp;index=11">Scaling
Pinterest - Marty Weiner, Pinterest's Founding Engineer</a></li>
<li><a
href="https://www.infoq.com/presentations/slack-scalability">Scaling
Slack - Bing Wei, Software Engineer (Infrastructure) at Slack</a></li>
<li><a
href="https://www.youtube.com/watch?v=5yDO-tmIoXY&amp;feature=youtu.be">Scaling
Backend at YouTube - Sugu Sougoumarane, SDE at YouTube</a></li>
<li><a href="https://www.youtube.com/watch?v=nuiLcWE8sPA">Scaling
Backend at Uber - Matt Ranney, Chief Systems Architect at Uber</a></li>
<li><a href="https://www.youtube.com/watch?v=tbqcsHg-Q_o">Scaling Global
CDN at Netflix - Dave Temkin, Director of Global Networks at
Netflix</a></li>
<li><a href="https://www.youtube.com/watch?v=bxhYNfFeVF4">Scaling Load
Balancing Infra to Support 1.3 Billion Users at Facebook - Patrick
Shuff, Production Engineer at Facebook</a></li>
<li><a href="https://www.youtube.com/watch?v=RlkCdM_f3p4">Scaling (an
NSFW site) to 200 Million Views a Day and Beyond - Eric Pickup, Lead
Platform Developer at MindGeek</a></li>
<li><a
href="https://www.infoq.com/presentations/quora-analytics">Scaling
Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Garg, Software Engineers at
Quora</a></li>
<li><a href="https://www.youtube.com/watch?v=g_MPGU_m01s">Scaling Git at
Microsoft - Saeed Noursalehi, Principal Program Manager at
Microsoft</a></li>
<li><a href="https://www.youtube.com/watch?v=F-f0-k46WVk">Scaling
Multitenant Architecture Across Multiple Data Centres at Shopify -
Weingarten, Engineering Lead at Shopify</a></li>
</ul>
<h2 id="a-piece-of-cake">A Piece of Cake</h2>
<p>Roses are red. Violets are blue. <a
href="https://nguyenquocbinh.org/">Binh</a> likes sweets. <a
href="https://paypal.me/binhnguyennus">Treat Binh to a tiramisu?</a>
🍰</p>
<p><a
href="https://github.com/binhnguyennus/awesome-scalability">scalability.md
GitHub</a></p>