!Logo (/logo.png) (http://awesome-scalability.com/)
An up-to-date, organized reading list that illustrates the patterns of scalable, reliable, and performant large-scale systems. Concepts are explained in articles by prominent engineers
and in credible references. Case studies are taken from battle-tested systems that serve millions to billions of users.
If your system goes slow
▐ Understand your problem first: is it a scalability problem (fast for a single user but slow under heavy load) or a performance problem (slow for a single user)? Review some design principles
▐ (#principle) and check how scalability (#scalability) and performance (#performance) problems are solved at tech companies. The intelligence (#intelligence) section is for
▐ those who work with data and machine learning at big (data) and deep (learning) scale.
If your system goes down
▐ "Even if you lose all one day, you can build all over again if you retain your calm!" - Thuan Pham, former CTO of Uber. So, keep calm and mind the availability (#availability) and stability
▐ (#stability) matters!
If you are having a system design interview
▐ Look at some interview notes (#interview) and real-world architectures with completed diagrams (#architecture) to get a comprehensive view before designing your system on a whiteboard. You
▐ can check some talks (#talk) of engineers from tech giants to know how they build, scale, and optimize their systems. Good luck!
If you are building your dream team
▐ The goal of scaling a team is not to grow team size but to increase team output and value. You can find out how tech companies reach that goal in various aspects: hiring, management,
▐ organization, culture, and communication in the organization (#organization) section.
Community power
▐ Contributions are greatly welcome! You may want to take a look at the contribution guidelines (CONTRIBUTING.md). If you see a link here that is no longer maintained or is not a good fit,
▐ please submit a pull request!
▐ Many long hours of hard work have gone into this project. If you find it helpful, please share on Facebook, on Twitter (https://ctt.ec/V8B2p), on Weibo (http://t.cn/RnjFLCB), or on your
▐ chat groups! Knowledge is power, knowledge shared is power multiplied. Thank you!
Content
- Principle (#principle)
- Scalability (#scalability)
- Availability (#availability)
- Stability (#stability)
- Performance (#performance)
- Intelligence (#intelligence)
- Architecture (#architecture)
- Interview (#interview)
- Organization (#organization)
- Talk (#talk)
- Book (#book)
Principle
⟡ Lessons from Giant-Scale Services - Eric Brewer, UC Berkeley & Google (https://people.eecs.berkeley.edu/~brewer/papers/GiantScale-IEEE.pdf)
⟡ Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean, Google (https://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
⟡ How to Design a Good API & Why it Matters - Joshua Bloch, CMU & Google (https://www.infoq.com/presentations/effective-api-design)
⟡ On Efficiency, Reliability, Scaling - James Hamilton, VP at AWS (http://mvdirona.com/jrh/work/)
⟡ Principles of Chaos Engineering (https://www.usenix.org/conference/srecon17americas/program/presentation/rosenthal)
⟡ Finding the Order in Chaos (https://www.usenix.org/conference/srecon16/program/presentation/lueder)
⟡ The Twelve-Factor App (https://12factor.net/)
⟡ Clean Architecture (https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html)
⟡ High Cohesion and Low Coupling (http://www.math-cs.gordon.edu/courses/cs211/lectures-2009/Cohesion,Coupling,MVC.pdf)
⟡ Monoliths and Microservices (https://medium.com/@SkyscannerEng/monoliths-and-microservices-8c65708c3dbf)
⟡ CAP Theorem and Trade-offs (http://robertgreiner.com/2014/08/cap-theorem-revisited/)
⟡ CP Databases and AP Databases (https://blog.andyet.com/2014/10/01/right-database)
⟡ Stateless vs Stateful Scalability (http://ithare.com/scaling-stateful-objects/)
⟡ Scale Up vs Scale Out: Hidden Costs (https://blog.codinghorror.com/scaling-up-vs-scaling-out-hidden-costs/)
⟡ ACID and BASE (https://neo4j.com/blog/acid-vs-base-consistency-models-explained/)
⟡ Blocking/Non-Blocking and Sync/Async (https://blogs.msdn.microsoft.com/csliu/2009/08/27/io-concept-blockingnon-blocking-vs-syncasync/)
⟡ Performance and Scalability of Databases (https://use-the-index-luke.com/sql/testing-scalability)
⟡ Database Isolation Levels and Effects on Performance and Scalability (http://highscalability.com/blog/2011/2/10/database-isolation-levels-and-their-effects-on-performance-a.html)
⟡ The Probability of Data Loss in Large Clusters (https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html)
⟡ Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence (https://docs.microsoft.com/en-us/previous-versions/msp-n-p/dn271399(v=pandp.10))
⟡ SQL vs NoSQL (https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/)
⟡ SQL vs NoSQL - Lesson Learned at Salesforce (https://engineering.salesforce.com/sql-or-nosql-9eaf1d92545b)
⟡ NoSQL Databases: Survey and Decision Guidance (https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d)
⟡ How Sharding Works (https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6)
⟡ Consistent Hashing (http://www.tom-e-white.com/2007/11/consistent-hashing.html)
⟡ Consistent Hashing: Algorithmic Tradeoffs (https://medium.com/@dgryski/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8)
⟡ Don’t be tricked by the Hashing Trick (https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087)
⟡ Uniform Consistent Hashing at Netflix (https://medium.com/netflix-techblog/distributing-content-to-open-connect-3e3e391d4dc9)
⟡ Eventually Consistent - Werner Vogels, CTO at Amazon (https://www.allthingsdistributed.com/2008/12/eventually_consistent.html)
⟡ Cache is King (https://www.stevesouders.com/blog/2012/10/11/cache-is-king/)
⟡ Anti-Caching (https://www.the-paper-trail.org/post/2014-06-06-paper-notes-anti-caching/)
⟡ Understand Latency (http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it)
⟡ Latency Numbers Every Programmer Should Know (http://norvig.com/21-days.html#answers)
⟡ The Calculus of Service Availability (https://queue.acm.org/detail.cfm?id=3096459&__s=dnkxuaws9pogqdnxmx8i)
⟡ Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO (http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html)
⟡ Common Bottlenecks (http://highscalability.com/blog/2012/5/16/big-list-of-20-common-bottlenecks.html)
⟡ Life Beyond Distributed Transactions (https://queue.acm.org/detail.cfm?id=3025012)
⟡ Relying on Software to Redirect Traffic Reliably at Various Layers (https://www.usenix.org/conference/srecon15/program/presentation/taveira)
⟡ Breaking Things on Purpose (https://www.usenix.org/conference/srecon17americas/program/presentation/andrus)
⟡ Avoid Over Engineering (https://medium.com/@rdsubhas/10-modern-software-engineering-mistakes-bc67fbef4fc8)
⟡ Scalability Worst Practices (https://www.infoq.com/articles/scalability-worst-practices)
⟡ Use Solid Technologies - Don’t Re-invent the Wheel - Keep It Simple! (https://medium.com/@DataStax/instagram-engineerings-3-rules-to-a-scalable-cloud-application-architecture-c44afed31406)
⟡ Simplicity by Distributing Complexity (https://jobs.zalando.com/tech/blog/simplicity-by-distributing-complexity/)
⟡ Why Over-Reusing is Bad (http://tech.transferwise.com/why-over-reusing-is-bad/)
⟡ Performance is a Feature (https://blog.codinghorror.com/performance-is-a-feature/)
⟡ Make Performance Part of Your Workflow (https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/)
⟡ The Benefits of Server Side Rendering over Client Side Rendering (https://medium.com/walmartlabs/the-benefits-of-server-side-rendering-over-client-side-rendering-5d07ff2cefe8)
⟡ Automate and Abstract: Lessons at Facebook (https://architecht.io/lessons-from-facebook-on-engineering-for-scale-f5716f0afc7a)
⟡ AWS Do's and Don'ts (https://8thlight.com/blog/sarah-sunday/2017/09/15/aws-dos-and-donts.html)
⟡ (UI) Design Doesn’t Scale - Stanley Wood, Design Director at Spotify (https://medium.com/@hellostanley/design-doesnt-scale-4d81e12cbc3e)
⟡ Linux Performance (http://www.brendangregg.com/linuxperf.html)
⟡ Building Fast and Resilient Web Applications - Ilya Grigorik (https://www.igvita.com/2016/05/20/building-fast-and-resilient-web-applications/)
⟡ Accept Partial Failures, Minimize Service Loss (https://www.usenix.org/conference/srecon17asia/program/presentation/wang_daxin)
⟡ Design for Resiliency (http://highscalability.com/blog/2012/12/31/designing-for-resiliency-will-be-so-2013.html)
⟡ Design for Self-healing (https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/self-healing)
⟡ Design for Scaling Out (https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/scale-out)
⟡ Design for Evolution (https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/design-for-evolution)
⟡ Learn from Mistakes (http://highscalability.com/blog/2013/8/26/reddit-lessons-learned-from-mistakes-made-scaling-to-1-billi.html)
Scalability
⟡ Microservices and Orchestration (https://martinfowler.com/microservices/)
* **Domain-Oriented Microservice Architecture at Uber** (https://eng.uber.com/microservice-architecture/)
* **Service Architecture (3 parts: Domain Gateways, Value-Added Services, BFF) at SoundCloud** (https://developers.soundcloud.com/blog/service-architecture-3)
* **Container (8 parts) at Riot Games** (https://engineering.riotgames.com/news/thinking-inside-container)
* **Containerization at Pinterest** (https://medium.com/@Pinterest_Engineering/containerization-at-pinterest-92295347f2f3)
* **Evolution of Container Usage at Netflix** (https://medium.com/netflix-techblog/the-evolution-of-container-usage-at-netflix-3abfc096781b)
* **Dockerizing MySQL at Uber** (https://eng.uber.com/dockerizing-mysql/)
* **Testing of Microservices at Spotify** (https://labs.spotify.com/2018/01/11/testing-of-microservices/)
* **Docker in Production at Treehouse** (https://medium.com/treehouse-engineering/lessons-learned-running-docker-in-production-5dce99ece770)
* **Microservice at SoundCloud** (https://developers.soundcloud.com/blog/inside-a-soundcloud-microservice)
* **Operate Kubernetes Reliably at Stripe** (https://stripe.com/blog/operating-kubernetes)
* **Cross-Cluster Traffic Mirroring with Istio at Trivago** (https://tech.trivago.com/2020/06/10/cross-cluster-traffic-mirroring-with-istio/)
* **Agrarian-Scale Kubernetes (3 parts) at New York Times** (https://open.nytimes.com/agrarian-scale-kubernetes-part-3-ee459887ed7e)
* **Nanoservices at BBC** (https://medium.com/bbc-design-engineering/powering-bbc-online-with-nanoservices-727840ba015b)
* **PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg** (https://www.techatbloomberg.com/blog/powerfulseal-testing-tool-kubernetes-clusters/)
* **Conductor: Microservices Orchestrator at Netflix** (https://medium.com/netflix-techblog/netflix-conductor-a-microservices-orchestrator-2e8d4771bf40)
* **Docker Containers that Power Over 100,000 Online Shops at Shopify** (https://shopifyengineering.myshopify.com/blogs/engineering/docker-at-shopify-how-we-built-containers-that-power-over-100-000-online-shops)
* **Microservice Architecture at Medium** (https://medium.engineering/microservice-architecture-at-medium-9c33805eb74f)
* **From bare-metal to Kubernetes at Betabrand** (https://boxunix.com/post/bare_metal_to_kube/)
* **Kubernetes at Tinder** (https://medium.com/tinder-engineering/tinders-move-to-kubernetes-cda2a6372f44)
* **Kubernetes at Quora** (https://www.quora.com/q/quoraengineering/Adopting-Kubernetes-at-Quora)
* **Kubernetes Platform at Pinterest** (https://medium.com/pinterest-engineering/building-a-kubernetes-platform-at-pinterest-fb3d9571c948)
* **Microservices at Nubank** (https://medium.com/building-nubank/microservices-at-nubank-an-overview-2ebcb336c64d)
* **Payment Transaction Management in Microservices at Mercari** (https://engineering.mercari.com/en/blog/entry/20210831-2019-06-07-155849/)
* **Service Mesh at Snap** (https://eng.snap.com/monolith-to-multicloud-microservices-snap-service-mesh)
* **GRIT: Protocol for Distributed Transactions across Microservices at eBay** (https://tech.ebayinc.com/engineering/grit-a-protocol-for-distributed-transactions-across-microservices/)
* **Rubix: Kubernetes at Palantir** (https://medium.com/palantir/introducing-rubix-kubernetes-at-palantir-ab0ce16ea42e)
* **CRISP: Critical Path Analysis for Microservice Architectures at Uber** (https://eng.uber.com/crisp-critical-path-analysis-for-microservice-architectures/)
⟡ Distributed Caching (https://www.wix.engineering/post/scaling-to-100m-to-cache-or-not-to-cache)
* **EVCache: Distributed In-memory Caching at Netflix** (https://medium.com/netflix-techblog/caching-for-a-global-netflix-7bcc457012f1)
* **EVCache Cache Warmer Infrastructure at Netflix** (https://medium.com/netflix-techblog/cache-warming-agility-for-a-stateful-service-2d3b1da82642)
* **Memsniff: Robust Memcache Traffic Analyzer at Box** (https://blog.box.com/blog/introducing-memsniff-robust-memcache-traffic-analyzer/)
* **Caching with Consistent Hashing and Cache Smearing at Etsy** (https://codeascraft.com/2017/11/30/how-etsy-caches/)
* **Analysis of Photo Caching at Facebook** (https://code.facebook.com/posts/220956754772273/an-analysis-of-facebook-photo-caching/)
* **Cache Efficiency Exercise at Facebook** (https://code.facebook.com/posts/964122680272229/web-performance-cache-efficiency-exercise/)
* **tCache: Scalable Data-aware Java Caching at Trivago** (http://tech.trivago.com/2015/10/15/tcache/)
* **Pycache: In-process Caching at Quora** (https://engineering.quora.com/Pycache-lightning-fast-in-process-caching)
* **Reduce Memcached Memory Usage by 50% at Trivago** (http://tech.trivago.com/2017/12/19/how-trivago-reduced-memcached-memory-usage-by-50/)
* **Caching Internal Service Calls at Yelp** (https://engineeringblog.yelp.com/2018/03/caching-internal-service-calls-at-yelp.html)
* **Estimating the Cache Efficiency using Big Data at Allegro** (https://allegro.tech/2017/01/estimating-the-cache-efficiency-using-big-data.html)
* **Distributed Cache at Zalando** (https://jobs.zalando.com/tech/blog/distributed-cache-akka-kubernetes/)
* **Application Data Caching from RAM to SSD at Netflix** (https://medium.com/netflix-techblog/evolution-of-application-data-caching-from-ram-to-ssd-a33d6fa7a690)
* **Tradeoffs of Replicated Cache at Skyscanner** (https://medium.com/@SkyscannerEng/the-tradeoffs-of-a-replicated-cache-b6680c722f58)
* **Avoiding Cache Stampede at DoorDash** (https://blog.doordash.com/avoiding-cache-stampede-at-doordash-55bbf596d94b)
* **Location Caching with Quadtrees at Yext** (http://engblog.yext.com/post/geolocation-caching)
* **Video Metadata Caching at Vimeo** (https://medium.com/vimeo-engineering-blog/video-metadata-caching-at-vimeo-a54b25f0b304)
* **Scaling Redis at Twitter** (http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html)
* **Scaling Job Queue with Redis at Slack** (https://slack.engineering/scaling-slacks-job-queue-687222e9d100)
* **Moving Persistent Data out of Redis at GitHub** (https://githubengineering.com/moving-persistent-data-out-of-redis/)
* **Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram** (https://engineering.instagram.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c)
* **Redis at Trivago** (http://tech.trivago.com/2017/01/25/learn-redis-the-hard-way-in-production/)
* **Optimizing Redis Storage at Deliveroo** (https://deliveroo.engineering/2017/01/19/optimising-membership-queries.html)
* **Memory Optimization in Redis at Wattpad** (http://engineering.wattpad.com/post/23244724794/store-more-stuff-memory-optimization-in-redis)
* **Redis Fleet at Heroku** (https://blog.heroku.com/rolling-redis-fleet)
* **Solving Remote Build Cache Misses (2 parts) at SoundCloud** (https://developers.soundcloud.com/blog/gradle-remote-build-cache-misses-part-2)
* **Ratings & Reviews (2 parts) at Flipkart** (https://blog.flipkart.tech/ratings-reviews-flipkart-part-2-574ab08e75cf)
* **Prefetch Caching of Items at eBay** (https://tech.ebayinc.com/engineering/prefetch-caching-of-ebay-items/)
* **Cross-Region Caching Library at Wix** (https://www.wix.engineering/post/how-we-built-a-cross-region-caching-library)
* **Improving Distributed Caching Performance and Efficiency at Pinterest** (https://medium.com/pinterest-engineering/improving-distributed-caching-performance-and-efficiency-at-pinterest-92484b5fe39b)
* **Standardize and Improve Microservices Caching at DoorDash** (https://doordash.engineering/2023/10/19/how-doordash-standardized-and-improved-microservices-caching/)
* **HTTP Caching and CDN** (https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching)
* **Zynga Geo Proxy: Reducing Mobile Game Latency at Zynga** (https://www.zynga.com/blogs/engineering/zynga-geo-proxy-reducing-mobile-game-latency)
* **Google AMP at Condé Nast** (https://technology.condenast.com/story/the-why-and-how-of-google-amp-at-conde-nast)
* **A/B Tests on Hosting Infrastructure (CDNs) at Deliveroo** (https://deliveroo.engineering/2016/09/19/ab-testing-cdns.html)
* **HAProxy with Kubernetes for User-facing Traffic at SoundCloud** (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic)
* **Bandaid: Service Proxy at Dropbox** (https://blogs.dropbox.com/tech/2018/03/meet-bandaid-the-dropbox-service-proxy/)
* **Service Workers at Slack** (https://slack.engineering/service-workers-at-slack-our-quest-for-faster-boot-times-and-offline-support-3492cf79c88)
* **CDN Services at Spotify** (https://labs.spotify.com/2020/02/24/how-spotify-aligned-cdn-services-for-a-lightning-fast-streaming-experience/)
⟡ Distributed Locking (https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)
* **Chubby: Lock Service for Loosely Coupled Distributed Systems at Google** (https://blog.acolyer.org/2015/02/13/the-chubby-lock-service-for-loosely-coupled-distributed-systems/)
* **Distributed Locking at Uber** (https://www.youtube.com/watch?v=MDuagr729aU)
* **Distributed Locks using Redis at GoSquared** (https://engineering.gosquared.com/distributed-locks-using-redis)
* **ZooKeeper at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/zookeeper-at-twitter.html)
* **Eliminating Duplicate Queries using Distributed Locking at Chartio** (https://blog.chartio.com/posts/eliminating-duplicate-queries-using-distributed-locking)
⟡ Distributed Tracking, Tracing, and Measuring (https://www.oreilly.com/ideas/understanding-the-value-of-distributed-tracing)
* **Zipkin: Distributed Systems Tracing at Twitter** (https://blog.twitter.com/engineering/en_us/a/2012/distributed-systems-tracing-with-zipkin.html)
* **Improve Zipkin Traces using Kubernetes Pod Metadata at SoundCloud** (https://developers.soundcloud.com/blog/using-kubernetes-pod-metadata-to-improve-zipkin-traces)
* **Canopy: Scalable Distributed Tracing & Analysis at Facebook** (https://www.infoq.com/presentations/canopy-scalable-tracing-analytics-facebook)
* **Pintrace: Distributed Tracing at Pinterest** (https://medium.com/@Pinterest_Engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b)
* **XCMetrics: All-in-One Tool for Tracking Xcode Build Metrics at Spotify** (https://engineering.atspotify.com/2021/01/20/introducing-xcmetrics-our-all-in-one-tool-for-tracking-xcode-build-metrics/)
* **Real-time Distributed Tracing at LinkedIn** (https://engineering.linkedin.com/distributed-service-call-graph/real-time-distributed-tracing-website-performance-and-efficiency)
* **Tracking Service Infrastructure at Scale at Shopify** (https://www.usenix.org/conference/srecon17americas/program/presentation/arthorne)
* **Distributed Tracing at HelloFresh** (https://engineering.hellofresh.com/scaling-hellofresh-distributed-tracing-7b182928247d)
* **Analyzing Distributed Trace Data at Pinterest** (https://medium.com/@Pinterest_Engineering/analyzing-distributed-trace-data-6aae58919949)
* **Distributed Tracing at Uber** (https://eng.uber.com/distributed-tracing/)
* **JVM Profiler: Tracing Distributed JVM Applications at Uber** (https://eng.uber.com/jvm-profiler/)
* **Data Checking at Dropbox** (https://www.usenix.org/conference/srecon17asia/program/presentation/mah)
* **Tracing Distributed Systems at Showmax** (https://tech.showmax.com/2016/10/tracing-distributed-systems-at-showmax/)
* **osquery Across the Enterprise at Palantir** (https://medium.com/@palantir/osquery-across-the-enterprise-3c3c9d13ec55)
* **StatsD at Etsy** (https://codeascraft.com/2011/02/15/measure-anything-measure-everything/)
* **StatsD at DoorDash** (https://blog.doordash.com/scaling-statsd-84d456a7cc2a)
⟡ Distributed Scheduling (https://www.csee.umbc.edu/courses/graduate/CMSC621/fall02/lectures/ch11.pdf)
* **Distributed Task Scheduling (3 parts) at PagerDuty** (https://www.pagerduty.com/eng/distributed-task-scheduling-3/)
* **Building Cron at Google** (https://landing.google.com/sre/sre-book/chapters/distributed-periodic-scheduling/)
* **Distributed Cron Architecture at Quora** (https://engineering.quora.com/Quoras-Distributed-Cron-Architecture)
* **Chronos: A Replacement for Cron at Airbnb** (https://medium.com/airbnb-engineering/chronos-a-replacement-for-cron-f05d7d986a9d)
* **Scheduler at Nextdoor** (https://engblog.nextdoor.com/we-don-t-run-cron-jobs-at-nextdoor-6f7f9cc62040)
* **Peloton: Unified Resource Scheduler for Diverse Cluster Workloads at Uber** (https://eng.uber.com/peloton/)
* **Fenzo: OSS Scheduler for Apache Mesos Frameworks at Netflix** (https://medium.com/netflix-techblog/fenzo-oss-scheduler-for-apache-mesos-frameworks-5c340e77e543)
* **Airflow - Workflow Orchestration** (https://airflow.apache.org/)
* **Airflow at Airbnb** (https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8)
* **Airflow at Pandora** (https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee)
* **Airflow at Robinhood** (https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8)
* **Airflow at Lyft** (https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff)
* **Airflow at Drivy** (https://drivy.engineering/airflow-architecture/)
* **Airflow at Grab** (https://engineering.grab.com/experimentation-platform-data-pipeline)
* **Airflow at Adobe** (https://medium.com/adobetech/adobe-experience-platform-orchestration-service-with-apache-airflow-952203723c0b)
* **Auditing Airflow Job Runs at Walmart** (https://medium.com/walmartlabs/auditing-airflow-batch-jobs-73b45100045)
* **MaaT: DAG-based Distributed Task Scheduler at Alibaba** (https://hackernoon.com/meet-maat-alibabas-dag-based-distributed-task-scheduler-7c9cf0c83438)
* **boundary-layer: Declarative Airflow Workflows at Etsy** (https://codeascraft.com/2018/11/14/boundary-layer%e2%80%89-declarative-airflow-workflows/)
⟡ Distributed Monitoring and Alerting (https://www.oreilly.com/ideas/monitoring-distributed-systems)
* **Unicorn: Remediation System at eBay** (https://www.ebayinc.com/stories/blogs/tech/unicorn-rheos-remediation-center/)
* **M3: Metrics and Monitoring Platform at Uber** (https://eng.uber.com/optimizing-m3/)
* **Athena: Automated Build Health Management System at Dropbox** (https://blogs.dropbox.com/tech/2019/05/athena-our-automated-build-health-management-system/)
* **Vortex: Monitoring Server Applications at Dropbox** (https://blogs.dropbox.com/tech/2019/11/monitoring-server-applications-with-vortex/)
* **Nuage: Cloud Management Service at LinkedIn** (https://engineering.linkedin.com/blog/2019/solving-manageability-challenges-with-nuage)
* **Telltale: Application Monitoring at Netflix** (https://netflixtechblog.com/telltale-netflix-application-monitoring-simplified-5c08bfa780ba)
* **ThirdEye: Monitoring Platform at LinkedIn** (https://engineering.linkedin.com/blog/2019/06/smart-alerts-in-thirdeye--linkedins-real-time-monitoring-platfor)
* **Periskop: Exception Monitoring Service at SoundCloud** (https://developers.soundcloud.com/blog/periskop-exception-monitoring-service)
* **Securitybot: Distributed Alerting Bot at Dropbox** (https://blogs.dropbox.com/tech/2017/02/meet-securitybot-open-sourcing-automated-security-at-scale/)
* **Monitoring System at Alibaba** (https://www.usenix.org/conference/srecon18asia/presentation/xinchi)
* **Real User Monitoring at Dailymotion** (https://medium.com/dailymotion/real-user-monitoring-1948375f8be5)
* **Alerting Ecosystem at Uber** (https://eng.uber.com/observability-at-scale/)
* **Alerting Framework at Airbnb** (https://medium.com/airbnb-engineering/alerting-framework-at-airbnb-35ba48df894f)
* **Alerting on Service-Level Objectives (SLOs) at SoundCloud** (https://developers.soundcloud.com/blog/alerting-on-slos)
* **Job-based Forecasting Workflow for Observability Anomaly Detection at Uber** (https://eng.uber.com/observability-anomaly-detection/)
* **Monitoring and Alert System using Graphite and Cabot at HackerEarth** (http://engineering.hackerearth.com/2017/03/21/monitoring-and-alert-system-using-graphite-and-cabot/)
* **Observability (2 parts) at Twitter** (https://blog.twitter.com/engineering/en_us/a/2016/observability-at-twitter-technical-overview-part-ii.html)
* **Distributed Security Alerting at Slack** (https://slack.engineering/distributed-security-alerting-c89414c992d6)
* **Real-Time News Alerting at Bloomberg** (https://www.infoq.com/presentations/news-alerting-bloomberg)
* **Data Pipeline Monitoring System at LinkedIn** (https://engineering.linkedin.com/blog/2019/an-inside-look-at-linkedins-data-pipeline-monitoring-system-)
* **Monitoring and Observability at Picnic** (https://blog.picnic.nl/monitoring-and-observability-at-picnic-684cefd845c4)
⟡ Distributed Security (https://msdn.microsoft.com/en-us/library/cc767123.aspx)
* **Approach to Security at Scale at Dropbox** (https://blogs.dropbox.com/tech/2018/02/security-at-scale-the-dropbox-approach/)
* **Aardvark and Repokid: AWS Least Privilege for Distributed, High-Velocity Development at Netflix** (https://medium.com/netflix-techblog/introducing-aardvark-and-repokid-53b081bf3a7e)
* **LISA: Distributed Firewall at LinkedIn** (https://www.slideshare.net/MikeSvoboda/2017-lisa-linkedins-distributed-firewall-dfw)
* **Secure Infrastructure To Store Bitcoin In The Cloud at Coinbase** (https://engineering.coinbase.com/how-coinbase-builds-secure-infrastructure-to-store-bitcoin-in-the-cloud-30a6504e40ba)
* **BinaryAlert: Real-time Serverless Malware Detection at Airbnb** (https://medium.com/airbnb-engineering/binaryalert-real-time-serverless-malware-detection-ca44370c1b90)
* **Scalable IAM Architecture to Secure Access to 100 AWS Accounts at Segment** (https://segment.com/blog/secure-access-to-100-aws-accounts/)
* **OAuth Audit Toolbox at Indeed** (http://engineering.indeedblog.com/blog/2018/04/oaudit-toolbox/)
* **Active Directory Password Blacklisting at Yelp** (https://engineeringblog.yelp.com/2018/04/ad-password-blacklisting.html)
* **Syscall Auditing at Scale at Slack** (https://slack.engineering/syscall-auditing-at-scale-e6a3ca8ac1b8)
* **Athenz: Fine-Grained, Role-Based Access Control at Yahoo** (https://yahooeng.tumblr.com/post/160481899076/open-sourcing-athenz-fine-grained-role-based)
* **WebAuthn Support for Secure Sign In at Dropbox** (https://blogs.dropbox.com/tech/2018/05/introducing-webauthn-support-for-secure-dropbox-sign-in/)
* **Security Development Lifecycle at Slack** (https://slack.engineering/moving-fast-and-securing-things-540e6c5ae58a)
* **Unprivileged Container Builds at Kinvolk** (https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/)
* **Diffy: Differencing Engine for Digital Forensics in the Cloud at Netflix** (https://medium.com/netflix-techblog/netflix-sirt-releases-diffy-a-differencing-engine-for-digital-forensics-in-the-cloud-37b71abd2698)
* **Detecting Credential Compromise in AWS at Netflix** (https://medium.com/netflix-techblog/netflix-cloud-security-detecting-credential-compromise-in-aws-9493d6fd373a)
* **Scalable User Privacy at Spotify** (https://labs.spotify.com/2018/09/18/scalable-user-privacy/)
* **AVA: Audit Web Applications at Indeed** (https://engineering.indeedblog.com/blog/2018/09/application-scanning/)
* **TTL as a Service: Automatic Revocation of Stale Privileges at Yelp** (https://engineeringblog.yelp.com/2018/11/ttl-as-a-service.html)
* **Enterprise Key Management at Slack** (https://slack.engineering/engineering-dive-into-slack-enterprise-key-management-1fce471b178c)
* **Scalability and Authentication at Twitch** (https://blog.twitch.tv/en/2019/03/15/how-twitch-addresses-scalability-and-authentication/)
* **Edge Authentication and Token-Agnostic Identity Propagation at Netflix** (https://netflixtechblog.com/edge-authentication-and-token-agnostic-identity-propagation-514e47e0b602)
* **Hardening Kubernetes Infrastructure with Cilium at Palantir** (https://blog.palantir.com/hardening-palantirs-kubernetes-infrastructure-with-cilium-1c40d4c7ef0)
* **Improving Web Vulnerability Management through Automation at Lyft** (https://eng.lyft.com/improving-web-vulnerability-management-through-automation-2631570d8415)
* **Clock Skew when Syncing Password Payloads at Dropbox** (https://dropbox.tech/application/dropbox-passwords-clock-skew-payload-sync-merge)
⟡ Distributed Messaging, Queuing, and Event Streaming (https://arxiv.org/pdf/1704.00411.pdf)
* **Cape: Event Stream Processing Framework at Dropbox** (https://blogs.dropbox.com/tech/2017/05/introducing-cape/)
* **Brooklin: Distributed Service for Near Real-Time Data Streaming at LinkedIn** (https://engineering.linkedin.com/blog/2019/brooklin-open-source)
* **Samza: Stream Processing System for Latency Insights at LinkedIn** (https://engineering.linkedin.com/blog/2018/04/samza-aeon--latency-insights-for-asynchronous-one-way-flows)
* **Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo** (https://yahooeng.tumblr.com/post/161855616651/open-sourcing-bullet-yahoos-forward-looking)
* **EventHorizon: Tool for Watching Events Streaming at Etsy** (https://codeascraft.com/2018/05/29/the-eventhorizon-saga/)
* **Qmessage: Distributed, Asynchronous Task Queue at Quora** (https://engineering.quora.com/Qmessage-Handling-Billions-of-Tasks-Per-Day)
* **Cherami: Message Queue System for Transporting Async Tasks at Uber** (https://eng.uber.com/cherami/)
* **Dynein: Distributed Delayed Job Queueing System at Airbnb** (https://medium.com/airbnb-engineering/dynein-building-a-distributed-delayed-job-queueing-system-93ab10f05f99)
* **Timestone: Queueing System for Non-Parallelizable Workloads at Netflix** (https://netflixtechblog.com/timestone-netflixs-high-throughput-low-latency-priority-queueing-system-with-built-in-support-1abf249ba95f)
* **Messaging Service at Riot Games** (https://engineering.riotgames.com/news/riot-messaging-service)
* **Debugging Production with Event Logging at Zillow** (https://www.zillow.com/engineering/debugging-production-event-logging/)
* **Cross-platform In-app Messaging Orchestration Service at Netflix** (https://medium.com/netflix-techblog/building-a-cross-platform-in-app-messaging-orchestration-service-86ba614f92d8)
* **Video Gatekeeper at Netflix** (https://medium.com/netflix-techblog/re-architecting-the-video-gatekeeper-f7b0ac2f6b00)
* **Scaling Push Messaging for Millions of Devices at Netflix** (https://www.infoq.com/presentations/neflix-push-messaging-scale)
* **Delaying Asynchronous Message Processing with RabbitMQ at Indeed** (http://engineering.indeedblog.com/blog/2017/06/delaying-messages/)
* **Benchmarking Streaming Computation Engines at Yahoo** (https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at)
* **Improving Stream Data Quality With Protobuf Schema Validation at Deliveroo** (https://deliveroo.engineering/2019/02/05/improving-stream-data-quality-with-protobuf-schema-validation.html)
* **Scaling Email Infrastructure at Medium** (https://medium.engineering/scaling-email-infrastructure-for-medium-digest-254223c883b8)
* **Real-time Messaging at Slack** (https://slack.engineering/real-time-messaging/)
* **Event Stream Database at Nike** (https://medium.com/nikeengineering/moving-faster-with-aws-by-creating-an-event-stream-database-dedec8ca3eeb)
* **Event Tracking System at Udemy** (https://medium.com/udemy-engineering/designing-the-new-event-tracking-system-at-udemy-a45e502216fd)
* **Event-Driven Messaging** (https://martinfowler.com/articles/201701-event-driven.html)
* **Domain-Driven Design at Alibaba** (https://medium.com/swlh/creating-coding-excellence-with-domain-driven-design-88f73d2232c3)
* **Domain-Driven Design at Weebly** (https://medium.com/weebly-engineering/how-to-organize-your-monolith-before-breaking-it-into-services-69cbdb9248b0)
* **Domain-Driven Design at Moonpig** (https://engineering.moonpig.com/development/modelling-for-domain-driven-design)
* **Scaling Event Sourcing for Netflix Downloads** (https://www.infoq.com/presentations/netflix-scale-event-sourcing)
* **Scaling Event-Sourcing at Jet.com** (https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c873cac33b8)
* **Event Sourcing (2 parts) at eBay** (https://www.ebayinc.com/stories/blogs/tech/event-sourcing-in-action-with-ebays-continuous-delivery-team/)
* **Event Sourcing at FREE NOW** (https://medium.com/inside-freenow/event-sourcing-an-evolutionary-perspective-31e7387aa6f1)
* **Scalable Content Feed using Event Sourcing and CQRS Patterns at Brainly** (https://medium.com/engineering-brainly/scalable-content-feed-using-event-sourcing-and-cqrs-patterns-e09df98bf977)
* **Pub-Sub Messaging** (https://aws.amazon.com/pub-sub-messaging/)
* **Pulsar: Pub-Sub Messaging at Scale at Yahoo** (https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale)
* **Wormhole: Pub-Sub System at Facebook** (https://code.facebook.com/posts/188966771280871/wormhole-pub-sub-system-moving-data-through-space-and-time/)
* **MemQ: Cloud Native Pub-Sub System at Pinterest** (https://medium.com/pinterest-engineering/memq-an-efficient-scalable-cloud-native-pubsub-system-4402695dd4e7)
* **Pub-Sub in Microservices at Netflix** (https://medium.com/netflix-techblog/how-netflix-microservices-tackle-dataset-pub-sub-4a068adcc9a)
* **Kafka - Message Broker** (https://martin.kleppmann.com/papers/kafka-debull15.pdf)
* **Kafka at LinkedIn** (https://engineering.linkedin.com/kafka/running-kafka-scale)
* **Kafka at Pinterest** (https://medium.com/pinterest-engineering/how-pinterest-runs-kafka-at-scale-ff9c6f735be)
* **Kafka at Trello** (https://tech.trello.com/why-we-chose-kafka/)
* **Kafka at Salesforce** (https://engineering.salesforce.com/how-apache-kafka-inspired-our-platform-events-architecture-2f351fe4cf63)
* **Kafka at The New York Times** (https://open.nytimes.com/publishing-with-apache-kafka-at-the-new-york-times-7f0e3b7d2077)
* **Kafka at Yelp** (https://engineeringblog.yelp.com/2016/07/billions-of-messages-a-day-yelps-real-time-data-pipeline.html)
* **Kafka at Criteo** (https://medium.com/criteo-labs/upgrading-kafka-on-a-large-infra-3ee99f56e970)
* **Kafka on Kubernetes at Shopify** (https://shopifyengineering.myshopify.com/blogs/engineering/running-apache-kafka-on-kubernetes-at-shopify)
* **Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (2 parts)** (https://engineeringblog.yelp.com/2022/03/kafka-on-paasta-part-two.html)
* **Migrating Kafka's Zookeeper with No Downtime at Yelp** (https://engineeringblog.yelp.com/2019/01/migrating-kafkas-zookeeper-with-no-downtime.html)
* **Reprocessing and Dead Letter Queues with Kafka at Uber** (https://eng.uber.com/reliable-reprocessing/)
* **Chaperone: Audit Kafka End-to-End at Uber** (https://eng.uber.com/chaperone/)
* **Finding Kafka's Throughput Limit in Infrastructure at Dropbox** (https://blogs.dropbox.com/tech/2019/01/finding-kafkas-throughput-limit-in-dropbox-infrastructure/)
* **Cost Orchestration at Walmart** (https://medium.com/walmartlabs/cost-orchestration-at-walmart-f34918af67c4)
* **InfluxDB and Kafka to Scale to Over 1 Million Metrics a Second at Hulu** (https://medium.com/hulu-tech-blog/how-hulu-uses-influxdb-and-kafka-to-scale-to-over-1-million-metrics-a-second-1721476aaff5)
* **Scaling Kafka to Support Data Growth at PayPal** (https://medium.com/paypal-tech/scaling-kafka-to-support-paypals-data-growth-a0b4da420fab)
* **Stream Data Deduplication** (https://en.wikipedia.org/wiki/Data_deduplication)
* **Exactly-once Semantics with Kafka** (https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/)
* **Real-time Deduping at Tapjoy** (http://eng.tapjoy.com/blog-list/real-time-deduping-at-scale)
* **Deduplication at Segment** (https://segment.com/blog/exactly-once-delivery/)
* **Deduplication at Mail.Ru** (https://medium.com/@andrewsumin/efficient-storage-how-we-went-down-from-50-pb-to-32-pb-99f9c61bf6b4)
* **Petabyte Scale Data Deduplication at Mixpanel** (https://medium.com/mixpaneleng/petabyte-scale-data-deduplication-mixpanel-engineering-e808c70c99f8)
⟡ Distributed Logging (https://blog.codinghorror.com/the-problem-with-logging/)
* **Logging at LinkedIn** (https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)
* **Scalable and Reliable Log Ingestion at Pinterest** (https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754)
* **High-performance Replicated Log Service at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2015/building-distributedlog-twitter-s-high-performance-replicated-log-servic.html)
* **Logging Service with Spark at CERN Accelerator** (https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html)
* **Logging and Aggregation at Quora** (https://engineering.quora.com/Logging-and-Aggregation-at-Quora)
* **Collection and Analysis of Daemon Logs at Badoo** (https://badoo.com/techblog/blog/2016/06/06/collection-and-analysis-of-daemon-logs-at-badoo/)
* **Log Parsing with Static Code Analysis at Palantir** (https://medium.com/palantir/using-static-code-analysis-to-improve-log-parsing-18f0d1843965)
* **Centralized Application Logging at eBay** (https://tech.ebayinc.com/engineering/low-latency-and-high-throughput-cal-ingress/)
* **Enrich VPC Flow Logs at Hyper Scale to provide Network Insight at Netflix** (https://netflixtechblog.com/hyper-scale-vpc-flow-logs-enrichment-to-provide-network-insight-e5f1db02910d)
* **BookKeeper: Distributed Log Storage at Yahoo** (https://yahooeng.tumblr.com/post/109908973316/bookkeeper-yahoos-distributed-log-storage-is)
* **LogDevice: Distributed Data Store for Logs at Facebook** (https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/)
* **LogFeeder: Log Collection System at Yelp** (https://engineeringblog.yelp.com/2018/03/introducing-logfeeder.html)
* **DBLog: Generic Change-Data-Capture Framework at Netflix** (https://medium.com/netflix-techblog/dblog-a-generic-change-data-capture-framework-69351fb9099b)
⟡ Distributed Searching (http://nwds.cs.washington.edu/files/nwds/pdf/Distributed-WR.pdf)
* **Search Architecture at Instagram** (https://instagram-engineering.com/search-architecture-eeb34a936d3a)
* **Search Architecture at eBay** (http://www.cs.otago.ac.nz/homepages/andrew/papers/2017-8.pdf)
* **Search Architecture at Box** (https://medium.com/box-tech-blog/scaling-box-search-using-lumos-22d9e0cb4175)
* **Search Discovery Indexing Platform at Coupang** (https://medium.com/coupang-tech/the-evolution-of-search-discovery-indexing-platform-fa43e41305f9)
* **Universal Search System at Pinterest** (https://medium.com/pinterest-engineering/building-a-universal-search-system-for-pinterest-e4cb03a898d4)
* **Improving Search Engine Efficiency by over 25% at eBay** (https://www.ebayinc.com/stories/blogs/tech/making-e-commerce-search-faster/)
* **Indexing and Querying Telemetry Logs with Lucene at Palantir** (https://medium.com/palantir/indexing-and-querying-telemetry-logs-with-lucene-234c5ce3e5f3)
* **Query Understanding at TripAdvisor** (https://www.tripadvisor.com/engineering/query-understanding-at-tripadvisor/)
* **Search Federation Architecture at LinkedIn (2018)** (https://engineering.linkedin.com/blog/2018/03/search-federation-architecture-at-linkedin)
* **Search at Slack** (https://slack.engineering/search-at-slack-431f8c80619e)
* **Search and Recommendations at DoorDash** (https://blog.doordash.com/powering-search-recommendations-at-doordash-8310c5cfd88c)
* **Stability and Scalability for Search at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2022/stability-and-scalability-for-search)
* **Search Service at Twitter (2014)** (https://blog.twitter.com/engineering/en_us/a/2014/building-a-complete-tweet-index.html)
* **Autocomplete Search (2 parts) at Traveloka** (https://medium.com/traveloka-engineering/high-quality-autocomplete-search-part-2-d5b15bb0dadf)
* **Data-Driven Autocorrection System at Canva** (https://product.canva.com/building-a-data-driven-autocorrection-system/)
* **Adapting Search to Indian Phonetics at Flipkart** (https://blog.flipkart.tech/adapting-search-to-indian-phonetics-cdbe65259686)
* **Nautilus: Search Engine at Dropbox** (https://blogs.dropbox.com/tech/2018/09/architecture-of-nautilus-the-new-dropbox-search-engine/)
* **Galene: Search Architecture of LinkedIn** (https://engineering.linkedin.com/search/did-you-mean-galene)
* **Manas: High Performing Customized Search System at Pinterest** (https://medium.com/@Pinterest_Engineering/manas-a-high-performing-customized-search-system-cf189f6ca40f)
* **Sherlock: Near Real Time Search Indexing at Flipkart** (https://blog.flipkart.tech/sherlock-near-real-time-search-indexing-95519783859d)
* **Nebula: Storage Platform to Build Search Backends at Airbnb** (https://medium.com/airbnb-engineering/nebula-as-a-storage-platform-to-build-airbnbs-search-backends-ecc577b05f06)
* **ELK (Elasticsearch, Logstash, Kibana) Stack** (https://logz.io/blog/15-tech-companies-chose-elk-stack/)
* **Predictions in Real Time with ELK at Uber** (https://eng.uber.com/elk/)
* **Building a Scalable ELK Stack at Envato** (https://webuild.envato.com/blog/building-a-scalable-elk-stack/)
* **ELK at Robinhood** (https://robinhood.engineering/taming-elk-4e1349f077c3)
* **Scaling Elasticsearch Clusters at Uber** (https://www.infoq.com/presentations/uber-elasticsearch-clusters?utm_source=presentations_about_Case_Study&utm_medium=link&utm_campaign=Case_Study)
* **Elasticsearch Performance Tuning Practice at eBay** (https://www.ebayinc.com/stories/blogs/tech/elasticsearch-performance-tuning-practice-at-ebay/)
* **Improve Performance using Elasticsearch Plugins (2 parts) at Tinder** (https://medium.com/tinder-engineering/how-we-improved-our-performance-using-elasticsearch-plugins-part-2-b051da2ee85b)
* **Elasticsearch at Kickstarter** (https://kickstarter.engineering/elasticsearch-at-kickstarter-db3c487887fc)
* **Log Parsing with Logstash and Google Protocol Buffers at Trivago** (https://tech.trivago.com/2016/01/19/logstash_protobuf_codec/)
* **Fast Order Search using Data Pipeline and Elasticsearch at Yelp** (https://engineeringblog.yelp.com/2018/06/fast-order-search.html)
* **Moving Core Business Search to Elasticsearch at Yelp** (https://engineeringblog.yelp.com/2017/06/moving-yelps-core-business-search-to-elasticsearch.html)
* **Sharding out Elasticsearch at Vinted** (http://engineering.vinted.com/2017/06/05/sharding-out-elasticsearch/)
* **Self-Ranking Search with Elasticsearch at Wattpad** (http://engineering.wattpad.com/post/146216619727/self-ranking-search-with-elasticsearch-at-wattpad)
* **Vulcanizer: Library for Operating Elasticsearch at GitHub** (https://github.blog/2019-03-05-vulcanizer-a-library-for-operating-elasticsearch/)
⟡ Distributed Storage (http://highscalability.com/blog/2011/11/1/finding-the-right-data-solution-for-your-application-in-the.html)
* **In-memory Storage** (https://medium.com/@denisanikin/what-an-in-memory-database-is-and-how-it-persists-data-efficiently-f43868cff4c1)
* **MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) And Familiar (SQL)** (http://highscalability.com/blog/2012/8/14/memsql-architecture-the-fast-mvcc-inmem-lockfree-codegen-and.html)
* **Optimizing Memcached Efficiency at Quora** (https://engineering.quora.com/Optimizing-Memcached-Efficiency)
* **Real-Time Data Warehouse with MemSQL on Cisco UCS** (https://blogs.cisco.com/datacenter/memsql)
* **Moving to MemSQL at Tapjoy** (http://eng.tapjoy.com/blog-list/moving-to-memsql)
* **MemSQL and Kinesis for Real-time Insights at Disney** (https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/68131)
* **MemSQL to Query Hundreds of Billions of Rows in a Dashboard at Pandora** (https://engineering.pandora.com/using-memsql-at-pandora-79a86cb09b57)
* **Object Storage** (http://www.datacenterknowledge.com/archives/2013/10/04/object-storage-the-future-of-scale-out)
* **Scaling HDFS at Uber** (https://eng.uber.com/scaling-hdfs/)
* **Reasons for Choosing S3 over HDFS at Databricks** (https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html)
* **File System on Amazon S3 at Quantcast** (https://www.quantcast.com/blog/quantcast-file-system-on-amazon-s3/)
* **Image Recovery at Scale Using S3 Versioning at Trivago** (https://tech.trivago.com/2018/09/03/efficient-image-recovery-at-scale-using-amazon-s3-versioning/)
* **Cloud Object Store at Yahoo** (https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at)
* **Ambry: Distributed Immutable Object Store at LinkedIn** (https://www.usenix.org/conference/srecon17americas/program/presentation/shenoy)
* **Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity at LinkedIn** (https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum)
* **Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb** (https://medium.com/airbnb-engineering/hammerspace-persistent-concurrent-off-heap-storage-3db39bb04472)
* **MezzFS: Mounting Object Storage in Media Processing Platform at Netflix** (https://medium.com/netflix-techblog/mezzfs-mounting-object-storage-in-netflixs-media-processing-platform-cda01c446ba)
* **Magic Pocket: In-house Multi-exabyte Storage System at Dropbox** (https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pocket/)
⟡ Relational Databases (https://www.mysql.com/products/cluster/scalability.html)
* **Building and Deploying MySQL Raft at Meta** (https://engineering.fb.com/2023/05/16/data-infrastructure/mysql-raft-meta/)
* **MySQL for Schema-less Data at FriendFeed** (https://backchannel.org/blog/friendfeed-schemaless-mysql)
* **MySQL at Pinterest** (https://medium.com/@Pinterest_Engineering/learn-to-stop-using-shiny-new-things-and-love-mysql-3e1613c2ce14)
* **PostgreSQL at Twitch** (https://blog.twitch.tv/how-twitch-uses-postgresql-c34aa9e56f58)
* **Scaling MySQL-based Financial Reporting System at Airbnb** (https://medium.com/airbnb-engineering/tracking-the-money-scaling-financial-reporting-at-airbnb-6d742b80f040)
* **Scaling MySQL at Wix** (https://www.wix.engineering/post/scaling-to-100m-mysql-is-a-better-nosql)
* **MaxScale (MySQL) Database Proxy at Airbnb** (https://medium.com/airbnb-engineering/unlocking-horizontal-scalability-in-our-web-serving-tier-d907449cdbcf)
* **Switching from Postgres to MySQL at Uber** (https://eng.uber.com/mysql-migration/)
* **Handling Growth with Postgres at Instagram** (https://engineering.instagram.com/handling-growth-with-postgres-5-tips-from-instagram-d5d7e7ffdfcb)
* **Scaling the Analytics Database (Postgres) at TransferWise** (http://tech.transferwise.com/scaling-our-analytics-database/)
* **Updating a 50 Terabyte PostgreSQL Database at Adyen** (https://medium.com/adyen/updating-a-50-terabyte-postgresql-database-f64384b799e7)
* **Scaling Database Access for 100s of Billions of Queries per Day at PayPal** (https://medium.com/paypal-engineering/scaling-database-access-for-100s-of-billions-of-queries-per-day-paypal-introducing-hera-e192adacda54)
* **Minimizing Read-Write MySQL Downtime at Yelp** (https://engineeringblog.yelp.com/2020/11/minimizing-read-write-mysql-downtime.html)
* **Migrating MySQL from 5.6 to 8.0 at Facebook** (https://engineering.fb.com/2021/07/22/data-infrastructure/mysql/)
* **Migration from HBase to MyRocks at Quora** (https://quoraengineering.quora.com/Migration-from-HBase-to-MyRocks-at-Quora)
* **Replication** (https://docs.microsoft.com/en-us/sql/relational-databases/replication/types-of-replication)
* **MySQL Parallel Replication (4 parts) at Booking.com** (https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-4-annex-under-the-hood-eb456cf8b2fb)
* **Mitigating MySQL Replication Lag and Reducing Read Load at GitHub** (https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/)
* **Read Consistency with Database Replicas at Shopify** (https://shopify.engineering/read-consistency-database-replicas)
* **Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift at Yelp** (https://engineeringblog.yelp.com/2018/04/black-box-auditing.html)
* **Partitioning Main MySQL Database at Airbnb** (https://medium.com/airbnb-engineering/how-we-partitioned-airbnb-s-main-database-in-two-weeks-55f7e006ff21)
* **Herb: Multi-DC Replication Engine for Schemaless Datastore at Uber** (https://eng.uber.com/herb-datacenter-replication/)
* **Sharding** (https://quabase.sei.cmu.edu/mediawiki/index.php/Shard_data_set_across_multiple_servers_(Range-based))
* **Sharding MySQL at Pinterest** (https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f)
* **Sharding MySQL at Twilio** (https://www.twilio.com/engineering/2014/06/26/how-we-replaced-our-data-pipeline-with-zero-downtime)
* **Sharding MySQL at Square** (https://medium.com/square-corner-blog/sharding-cash-10280fa3ef3b)
* **Sharding MySQL at Quora** (https://www.quora.com/q/quoraengineering/MySQL-sharding-at-Quora)
* **Sharding Layer of Schemaless Datastore at Uber** (https://eng.uber.com/schemaless-rewrite/)
* **Sharding & IDs at Instagram** (https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c)
* **Sharding Postgres at Notion** (https://www.notion.so/blog/sharding-postgres-at-notion)
* **Solr: Improving Performance for Batch Indexing at Box** (https://blog.box.com/blog/solr-improving-performance-batch-indexing/)
* **Geosharded Recommendations (3 parts) at Tinder** (https://medium.com/tinder-engineering/geosharded-recommendations-part-3-consistency-2d2cb2f0594b)
* **Scaling Services with Shard Manager at Facebook** (https://engineering.fb.com/production-engineering/scaling-services-with-shard-manager/)
* **Presto the Distributed SQL Query Engine** (https://research.fb.com/wp-content/uploads/2019/03/Presto-SQL-on-Everything.pdf?)
* **Presto at Pinterest** (https://medium.com/@Pinterest_Engineering/presto-at-pinterest-a8bda7515e52)
* **Presto Infrastructure at Lyft** (https://eng.lyft.com/presto-infrastructure-at-lyft-b10adb9db01)
* **Presto at Grab** (https://engineering.grab.com/scaling-like-a-boss-with-presto)
* **Engineering Data Analytics with Presto and Apache Parquet at Uber** (https://eng.uber.com/presto/)
* **Data Wrangling at Slack** (https://slack.engineering/data-wrangling-at-slack-f2e0ff633b69)
* **Presto in Big Data Platform on AWS at Netflix** (https://medium.com/netflix-techblog/using-presto-in-our-big-data-platform-on-aws-938035909fd4)
* **Presto Auto Scaling at Eventbrite** (https://www.eventbrite.com/engineering/big-data-workloads-presto-auto-scaling/)
* **Speed Up Presto with Alluxio Local Cache at Uber** (https://www.uber.com/en-MY/blog/speed-up-presto-with-alluxio-local-cache/)
⟡ NoSQL Databases (https://www.thoughtworks.com/insights/blog/nosql-databases-overview)
* **Key-Value Databases** (http://www.cs.ucsb.edu/~agrawal/fall2009/dynamo.pdf)
* **DynamoDB at Nike** (https://medium.com/nikeengineering/becoming-a-nimble-giant-how-dynamo-db-serves-nike-at-scale-4cc375dbb18e)
* **DynamoDB at Segment** (https://segment.com/blog/the-million-dollar-eng-problem/)
* **DynamoDB at Mapbox** (https://blog.mapbox.com/scaling-mapbox-infrastructure-with-dynamodb-streams-d53eabc5e972)
* **Manhattan: Distributed Key-Value Database at Twitter** (https://blog.twitter.com/engineering/en_us/a/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale.html)
* **Sherpa: Distributed NoSQL Key-Value Store at Yahoo** (https://yahooeng.tumblr.com/post/120730204806/sherpa-scales-new-heights)
* **HaloDB: Embedded Key-Value Storage Engine at Yahoo** (https://yahooeng.tumblr.com/post/178262468576/introducing-halodb-a-fast-embedded-key-value)
* **MPH: Fast and Compact Immutable Key-Value Stores at Indeed** (http://engineering.indeedblog.com/blog/2018/02/indeed-mph/)
* **Venice: Distributed Key-Value Database at LinkedIn** (https://engineering.linkedin.com/blog/2017/02/building-venice-with-apache-helix)
* **Columnar Databases** (https://aws.amazon.com/nosql/columnar/)
* **Cassandra** (http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf)
* **Cassandra at Instagram** (https://www.slideshare.net/DataStax/cassandra-at-instagram-2016)
* **Storing Images in Cassandra at Walmart** (https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593)
* **Storing Messages with Cassandra at Discord** (https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7)
* **Scaling Cassandra Cluster at Walmart** (https://medium.com/walmartlabs/avoid-pitfalls-in-scaling-your-cassandra-cluster-lessons-and-remedies-a71ca01f8c04)
* **Scaling Ad Analytics with Cassandra at Yelp** (https://engineeringblog.yelp.com/2016/08/how-we-scaled-our-ad-analytics-with-cassandra.html)
* **Scaling to 100+ Million Reads/Writes using Spark and Cassandra at Dream11** (https://medium.com/dream11-tech-blog/leaderboard-dream11-4efc6f93c23e)
* **Moving Food Feed from Redis to Cassandra at Zomato** (https://www.zomato.com/blog/how-we-moved-our-food-feed-from-redis-to-cassandra)
* **Benchmarking Cassandra Scalability on AWS at Netflix** (https://medium.com/netflix-techblog/benchmarking-cassandra-scalability-on-aws-over-a-million-writes-per-second-39f45f066c9e)
* **Service Decomposition at Scale with Cassandra at Intuit QuickBooks** (https://quickbooks-engineering.intuit.com/service-decomposition-at-scale-70405ac2f637)
* **Cassandra for Keeping Counts In Sync at SoundCloud** (https://developers.soundcloud.com/blog/keeping-counts-in-sync)
* **Cassandra Driver Configuration for Improved Performance and Load Balancing at Glassdoor** (https://medium.com/glassdoor-engineering/cassandra-driver-configuration-for-improved-performance-and-load-balancing-1b0106ce12bb)
* **cstar: Cassandra Orchestration Tool at Spotify** (https://labs.spotify.com/2018/09/04/introducing-cstar-the-spotify-cassandra-orchestration-tool-now-open-source/)
* **HBase** (https://hbase.apache.org/)
* **HBase at Salesforce** (https://engineering.salesforce.com/investing-in-big-data-apache-hbase-b9d98661a66b)
* **HBase in Facebook Messages** (https://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919/)
* **HBase in Imgur Notification** (https://blog.imgur.com/2015/09/15/tech-tuesday-imgur-notifications-from-mysql-to-hbase/)
* **Improving HBase Backup Efficiency at Pinterest** (https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954)
* **HBase at Xiaomi** (https://www.slideshare.net/HBaseCon/hbase-practice-at-xiaomi)
* **Redshift** (https://www.allthingsdistributed.com/2018/11/amazon-redshift-performance-optimization.html)
* **Redshift at GIPHY** (https://engineering.giphy.com/scaling-redshift-without-scaling-costs/)
* **Redshift at Hudl** (https://www.hudl.com/bits/the-low-hanging-fruit-of-redshift-performance)
* **Redshift at Drivy** (https://drivy.engineering/redshift_tips_ticks_part_1/)
* **Document Databases** (https://msdn.microsoft.com/en-us/magazine/hh547103.aspx)
* **eBay: Building Mission-Critical Multi-Data Center Applications with MongoDB** (https://www.mongodb.com/blog/post/ebay-building-mission-critical-multi-data-center-applications-with-mongodb)
* **MongoDB at Baidu: Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards** (https://www.mongodb.com/blog/post/mongodb-at-baidu-powering-100-apps-across-600-nodes-at-pb-scale)
* **The AWS and MongoDB Infrastructure of Parse (acquired by Facebook)** (https://medium.baqend.com/parse-is-gone-a-few-secrets-about-their-infrastructure-91b3ab2fcf71)
* **Migrating Mountains of Mongo Data at Addepar** (https://medium.com/build-addepar/migrating-mountains-of-mongo-data-63e530539952)
* **Couchbase Ecosystem at LinkedIn** (https://engineering.linkedin.com/blog/2017/12/couchbase-ecosystem-at-linkedin)
* **SimpleDB at Zendesk** (https://medium.com/zendesk-engineering/resurrecting-amazon-simpledb-9404034ec506)
* **Espresso: Distributed Document Store at LinkedIn** (https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store)
* **Graph Databases** (https://www.eecs.harvard.edu/margo/papers/systor13-bench/)
* **FlockDB: Distributed Graph Database at Twitter** (https://blog.twitter.com/engineering/en_us/a/2010/introducing-flockdb.html)
* **TAO: Distributed Data Store for the Social Graph at Facebook** (https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/11730-atc13-bronson.pdf)
* **Akutan: Distributed Knowledge Graph Store at eBay** (https://tech.ebayinc.com/engineering/akutan-a-distributed-knowledge-graph-store/)
⟡ Time Series Databases (https://www.influxdata.com/time-series-database/)
* **Beringei: High-performance Time Series Storage Engine at Facebook** (https://code.facebook.com/posts/952820474848503/beringei-a-high-performance-time-series-storage-engine/)
* **MetricsDB: TimeSeries Database for storing metrics at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/metricsdb.html)
* **Atlas: In-memory Dimensional Time Series Database at Netflix** (https://medium.com/netflix-techblog/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a)
* **Heroic: Time Series Database at Spotify** (https://labs.spotify.com/2015/11/17/monitoring-at-spotify-introducing-heroic/)
* **Roshi: Distributed Storage System for Time-Series Event at SoundCloud** (https://developers.soundcloud.com/blog/roshi-a-crdt-system-for-timestamped-events)
* **Goku: Time Series Database at Pinterest** (https://medium.com/@Pinterest_Engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181)
* **Scaling Time Series Data Storage (2 parts) at Netflix** (https://medium.com/netflix-techblog/scaling-time-series-data-storage-part-ii-d67939655586)
* **Druid - Real-time Analytics Database** (https://druid.apache.org/)
* **Druid at Airbnb** (https://medium.com/airbnb-engineering/druid-airbnb-data-platform-601c312f2a4c)
* **Druid at Walmart** (https://medium.com/walmartlabs/event-stream-analytics-at-walmart-with-druid-dcf1a37ceda7)
* **Druid at eBay** (https://tech.ebayinc.com/engineering/monitoring-at-ebay-with-druid/)
* **Druid at Netflix** (https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06)
⟡ Distributed Repositories, Dependencies, and Configurations Management (https://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/)
* **DGit: Distributed Git at GitHub** (https://githubengineering.com/introducing-dgit/)
* **Stemma: Distributed Git Server at Palantir** (https://medium.com/@palantir/stemma-distributed-git-server-70afbca0fc29)
* **Configuration Management for Distributed Systems at Flickr** (https://code.flickr.net/2016/03/24/configuration-management-for-distributed-systems-using-github-and-cfg4j/)
* **Git Repository at Microsoft** (https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-largest-git-repo-on-the-planet/)
* **Solve Git Problem with Large Repositories at Microsoft** (https://www.infoq.com/news/2017/02/GVFS)
* **Single Repository at Google** (https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext)
* **Scaling Infrastructure and (Git) Workflow at Adyen** (https://medium.com/adyen/from-0-100-billion-scaling-infrastructure-and-workflow-at-adyen-7b63b690dfb6)
* **Dotfiles Distribution at Booking.com** (https://medium.com/booking-com-infrastructure/dotfiles-distribution-dedb69c66a75)
* **Secret Detector: Preventing Secrets in Source Code at Yelp** (https://engineeringblog.yelp.com/2018/06/yelps-secret-detector.html)
* **Managing Software Dependency at Scale at LinkedIn** (https://engineering.linkedin.com/blog/2018/09/managing-software-dependency-at-scale)
* **Merging Code in High-velocity Repositories at LinkedIn** (https://engineering.linkedin.com/blog/2020/continuous-integration)
* **Dynamic Configuration at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/dynamic-configuration-at-twitter.html)
* **Dynamic Configuration at Mixpanel** (https://medium.com/mixpaneleng/dynamic-configuration-at-mixpanel-94bfcf97d6b8)
* **Dynamic Configuration at GoDaddy** (https://sg.godaddy.com/engineering/2019/03/06/dynamic-configuration-for-nodejs/)
⟡ Scaling Continuous Integration and Continuous Delivery (https://www.synopsys.com/blogs/software-security/agile-cicd-devops-glossary/)
* **Continuous Integration Stack at Facebook** (https://code.fb.com/web/rapid-release-at-massive-scale/)
* **Continuous Integration with Distributed Repositories and Dependencies at Netflix** (https://medium.com/netflix-techblog/towards-true-continuous-integration-distributed-repositories-and-dependencies-2a2e3108c051)
* **Continuous Integration and Deployment with Bazel at Dropbox** (https://blogs.dropbox.com/tech/2019/12/continuous-integration-and-deployment-with-bazel/)
* **Continuous Deployments at BuzzFeed** (https://tech.buzzfeed.com/continuous-deployments-at-buzzfeed-d171f76c1ac4)
* **Screwdriver: Continuous Delivery Build System for Dynamic Infrastructure at Yahoo** (https://yahooeng.tumblr.com/post/155765242061/open-sourcing-screwdriver-yahoos-continuous)
* **CI/CD at Betterment** (https://www.betterment.com/resources/ci-cd-shortening-the-feedback-loop/)
* **CI/CD at Brainly** (https://medium.com/engineering-brainly/ci-cd-at-scale-fdfb0f49e031)
* **Scaling iOS CI with Anka at Shopify** (https://engineering.shopify.com/blogs/engineering/scaling-ios-ci-with-anka)
* **Scaling Jira Server at Yelp** (https://engineeringblog.yelp.com/2019/04/Scaling-Jira-Server-Administration-For-The-Enterprise.html)
* **Auto-scaling CI/CD Cluster at Flexport** (https://flexport.engineering/how-flexport-halved-testing-costs-with-an-auto-scaling-ci-cd-cluster-8304297222f)
Availability
⟡ Resilience Engineering: Learning to Embrace Failure (https://queue.acm.org/detail.cfm?id=2371297)
* **Resilience Engineering with Project Waterbear at LinkedIn** (https://engineering.linkedin.com/blog/2017/11/resilience-engineering-at-linkedin-with-project-waterbear)
* **Resiliency against Traffic Oversaturation at iHeartRadio** (https://tech.iheart.com/resiliency-against-traffic-oversaturation-77c5ed92a5fb)
* **Resiliency in Distributed Systems at GO-JEK** (https://blog.gojekengineering.com/resiliency-in-distributed-systems-efd30f74baf4)
* **Practical NoSQL Resilience Design Pattern for the Enterprise at eBay** (https://www.ebayinc.com/stories/blogs/tech/practical-nosql-resilience-design-pattern-for-the-enterprise/)
* **Ensuring Resilience to Disaster at Quora** (https://engineering.quora.com/Ensuring-Quoras-Resilience-to-Disaster)
* **Site Resiliency at Expedia** (https://www.infoq.com/presentations/expedia-website-resiliency?utm_source=presentations_about_Case_Study&utm_medium=link&utm_campaign=Case_Study)
* **Resiliency and Disaster Recovery with Kafka at eBay** (https://tech.ebayinc.com/engineering/resiliency-and-disaster-recovery-with-kafka/)
* **Disaster Recovery for Multi-Region Kafka at Uber** (https://eng.uber.com/kafka/)
⟡ Failover (http://cloudpatterns.org/mechanisms/failover_system)
* **The Evolution of Global Traffic Routing and Failover** (https://www.usenix.org/conference/srecon16/program/presentation/heady)
* **Testing for Disaster Recovery Failover Testing** (https://www.usenix.org/conference/srecon17asia/program/presentation/liu_zehua)
* **Designing a Microservices Architecture for Failure** (https://blog.risingstack.com/designing-microservices-architecture-for-failure/)
* **ELB for Automatic Failover at GoSquared** (https://engineering.gosquared.com/use-elb-automatic-failover)
* **Eliminate the Database for Higher Availability at American Express** (http://americanexpress.io/eliminate-the-database-for-higher-availability/)
* **Failover with Redis Sentinel at Vinted** (http://engineering.vinted.com/2015/09/03/failover-with-redis-sentinel/)
* **High-availability SaaS Infrastructure at FreeAgent** (http://engineering.freeagent.com/2017/02/06/ha-infrastructure-without-breaking-the-bank/)
* **MySQL High Availability at GitHub** (https://github.blog/2018-06-20-mysql-high-availability-at-github/)
* **MySQL High Availability at Eventbrite** (https://www.eventbrite.com/engineering/mysql-high-availability-at-eventbrite/)
* **Business Continuity & Disaster Recovery at Walmart** (https://medium.com/walmartlabs/business-continuity-disaster-recovery-in-the-microservices-world-ef2adca363df)
⟡ Load Balancing (https://blog.vivekpanyam.com/scaling-a-web-service-load-balancing/)
* **Introduction to Modern Network Load Balancing and Proxying** (https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236)
* **Top Five (Load Balancing) Scalability Patterns** (https://www.f5.com/company/blog/top-five-scalability-patterns)
* **Load Balancing infrastructure to support more than 1.3 billion users at Facebook** (https://www.usenix.org/conference/srecon15europe/program/presentation/shuff)
* **DHCPLB: DHCP Load Balancer at Facebook** (https://code.facebook.com/posts/1734309626831603/dhcplb-an-open-source-load-balancer/)
* **Katran: Scalable Network Load Balancer at Facebook** (https://code.facebook.com/posts/1906146702752923/open-sourcing-katran-a-scalable-network-load-balancer/)
* **Deterministic Aperture: A Distributed, Load Balancing Algorithm at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html)
* **Load Balancing with Eureka at Netflix** (https://medium.com/netflix-techblog/netflix-shares-cloud-load-balancing-and-failover-tool-eureka-c10647ef95e5)
* **Edge Load Balancing at Netflix** (https://medium.com/netflix-techblog/netflix-edge-load-balancing-695308b5548c)
* **Zuul 2: Cloud Gateway at Netflix** (https://medium.com/netflix-techblog/open-sourcing-zuul-2-82ea476cb2b3)
* **Load Balancing at Yelp** (https://engineeringblog.yelp.com/2017/05/taking-zero-downtime-load-balancing-even-further.html)
* **Load Balancing at GitHub** (https://githubengineering.com/introducing-glb/)
* **Consistent Hashing to Improve Load Balancing at Vimeo** (https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed)
* **UDP Load Balancing at 500px** (https://developers.500px.com/udp-load-balancing-with-keepalived-167382d7ad08)
* **QALM: QoS Load Management Framework at Uber** (https://eng.uber.com/qalm/)
* **Traffic Steering using RUM DNS at LinkedIn** (https://www.usenix.org/conference/srecon17europe/program/presentation/rastogi)
* **Traffic Infrastructure (Edge Network) at Dropbox** (https://blogs.dropbox.com/tech/2018/10/dropbox-traffic-infrastructure-edge-network/)
* **Intelligent DNS based load balancing at Dropbox** (https://blogs.dropbox.com/tech/2020/01/intelligent-dns-based-load-balancing-at-dropbox/)
* **Monitor DNS systems at Stripe** (https://stripe.com/en-sg/blog/secret-life-of-dns)
* **Multi-DNS Architecture (3 parts) at Monday** (https://medium.com/monday-engineering/how-and-why-we-migrated-our-dns-from-cloudflare-to-a-multi-dns-architecture-part-3-584a470f4062)
* **Dynamic Anycast DNS Infrastructure at Hulu** (https://medium.com/hulu-tech-blog/building-hulus-dynamic-anycast-dns-infrastructure-985a7a11fd30)
⟡ Rate Limiting (https://www.keycdn.com/support/rate-limiting/)
* **Rate Limiting for Scaling to Millions of Domains at Cloudflare** (https://blog.cloudflare.com/counting-things-a-lot-of-different-things/)
* **Cloud Bouncer: Distributed Rate Limiting at Yahoo** (https://yahooeng.tumblr.com/post/111288877956/cloud-bouncer-distributed-rate-limiting-at-yahoo)
* **Scaling API with Rate Limiters at Stripe** (https://stripe.com/blog/rate-limiters)
* **Distributed Rate Limiting at Allegro** (https://allegro.tech/2017/04/hermes-max-rate.html)
* **Ratequeue: Core Queueing-And-Rate-Limiting System at Twilio** (https://www.twilio.com/blog/2017/11/chaos-engineering-ratequeue-ha.html)
* **Quotas Service at Grab** (https://engineering.grab.com/quotas-service)
⟡ Autoscaling (https://medium.com/@BotmetricHQ/top-11-hard-won-lessons-learned-about-aws-auto-scaling-5bfe56da755f)
* **Autoscaling Pinterest** (https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64)
* **Autoscaling Based on Request Queuing at Square** (https://medium.com/square-corner-blog/autoscaling-based-on-request-queuing-c4c0f57f860f)
* **Autoscaling Jenkins at Trivago** (http://tech.trivago.com/2017/02/17/your-definite-guide-for-autoscaling-jenkins/)
* **Autoscaling Pub-Sub Consumers at Spotify** (https://labs.spotify.com/2017/11/20/autoscaling-pub-sub-consumers/)
* **Autoscaling Bigtable Clusters based on CPU Load at Spotify** (https://labs.spotify.com/2018/12/18/bigtable-autoscaler-saving-money-and-time-using-managed-storage/)
* **Autoscaling AWS Step Functions Activities at Yelp** (https://engineeringblog.yelp.com/2019/06/autoscaling-aws-step-functions-activities.html)
* **Scryer: Predictive Auto Scaling Engine at Netflix** (https://medium.com/netflix-techblog/scryer-netflixs-predictive-auto-scaling-engine-a3f8fc922270)
* **Bouncer: Simple AWS Auto Scaling Rollovers at Palantir** (https://medium.com/palantir/bouncer-simple-aws-auto-scaling-rollovers-c5af601d65d4)
* **Clusterman: Autoscaling Mesos Clusters at Yelp** (https://engineeringblog.yelp.com/2019/02/autoscaling-mesos-clusters-with-clusterman.html)
⟡ Availability in Globally Distributed Storage Systems at Google (http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36737.pdf)
⟡ NodeJS High Availability at Yahoo (https://yahooeng.tumblr.com/post/68823943185/nodejs-high-availability)
⟡ Operations (11 parts) at LinkedIn (https://www.linkedin.com/pulse/introduction-every-day-monday-operations-benjamin-purgason)
⟡ Monitoring Powers High Availability for LinkedIn Feed (https://www.usenix.org/conference/srecon17americas/program/presentation/barot)
⟡ Supporting Global Events at Facebook (https://code.facebook.com/posts/166966743929963/how-production-engineers-support-global-events-on-facebook/)
⟡ High Availability at BlaBlaCar (https://medium.com/blablacar-tech/the-expendables-backends-high-availability-at-blablacar-8cea3b95b26b)
⟡ High Availability at Netflix (https://medium.com/@NetflixTechBlog/tips-for-high-availability-be0472f2599c)
⟡ High Availability Cloud Infrastructure at Twilio (https://www.twilio.com/engineering/2011/12/12/scaling-high-availablity-infrastructure-in-cloud)
⟡ Automating Datacenter Operations at Dropbox (https://blogs.dropbox.com/tech/2019/01/automating-datacenter-operations-at-dropbox/)
⟡ Globalizing Player Accounts at Riot Games (https://technology.riotgames.com/news/globalizing-player-accounts)
Stability
⟡ Circuit Breaker (https://martinfowler.com/bliki/CircuitBreaker.html)
* **Circuit Breaking in Distributed Systems** (https://www.infoq.com/presentations/circuit-breaking-distributed-systems)
* **Circuit Breaker for Scaling Containers** (https://f5.com/about-us/blog/articles/the-art-of-scaling-containers-circuit-breakers-28919)
* **Lessons in Resilience at SoundCloud** (https://developers.soundcloud.com/blog/lessons-in-resilience-at-SoundCloud)
* **Protector: Circuit Breaker for Time Series Databases at Trivago** (http://tech.trivago.com/2016/02/23/protector/)
* **Improved Production Stability with Circuit Breakers at Heroku** (https://blog.heroku.com/improved-production-stability-with-circuit-breakers)
* **Circuit Breaker at Zendesk** (https://medium.com/zendesk-engineering/the-joys-of-circuit-breaking-ee6584acd687)
* **Circuit Breaker at Traveloka** (https://medium.com/traveloka-engineering/circuit-breakers-dont-let-your-dependencies-bring-you-down-5ba1c5cf1eec)
* **Circuit Breaker at Shopify** (https://shopify.engineering/circuit-breaker-misconfigured)
⟡ Timeouts (https://www.javaworld.com/article/2824163/application-performance/stability-patterns-applied-in-a-restful-architecture.html)
* **Fault Tolerance (Timeouts and Retries, Thread Separation, Semaphores, Circuit Breakers) at Netflix** (https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
* **Enforce Timeout: A Reliability Methodology at DoorDash** (https://doordash.engineering/2018/12/21/enforce-timeout-a-doordash-reliability-methodology/)
* **Troubleshooting a Connection Timeout Issue with tcp_tw_recycle Enabled at eBay** (https://www.ebayinc.com/stories/blogs/tech/a-vip-connection-timeout-issue-caused-by-snat-and-tcp-tw-recycle/)
⟡ Crash-safe Replication for MySQL at Booking.com (https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336a69b317f)
⟡ Bulkheads: Partition and Tolerate Failure in One Part (https://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html)
⟡ Steady State: Always Put Logs on Separate Disk (https://docs.microsoft.com/en-us/sql/relational-databases/policy-based-management/place-data-and-log-files-on-separate-drives)
⟡ Throttling: Maintain a Steady Pace (http://www.sosp.org/2001/papers/welsh.pdf)
⟡ Multi-Clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn (https://engineering.linkedin.com/blog/2017/11/improving-resiliency-and-stability-of-a-large-scale-api)
⟡ Determinism (4 parts) in League of Legends Server (https://engineering.riotgames.com/news/determinism-league-legends-fixing-divergences)
Performance
⟡ Performance Optimization on OS, Storage, Database, Network (https://stackify.com/application-performance-metrics/)
* **Improving Performance with Background Data Prefetching at Instagram** (https://engineering.instagram.com/improving-performance-with-background-data-prefetching-b191acb39898)
* **Fixing Linux filesystem performance regressions at LinkedIn** (https://engineering.linkedin.com/blog/2020/fixing-linux-filesystem-performance-regressions)
* **Compression Techniques to Solve Network I/O Bottlenecks at eBay** (https://www.ebayinc.com/stories/blogs/tech/how-ebays-shopping-cart-used-compression-techniques-to-solve-network-io-bottlenecks/)
* **Optimizing Web Servers for High Throughput and Low Latency at Dropbox** (https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/)
* **Linux Performance Analysis in 60,000 Milliseconds at Netflix** (https://medium.com/netflix-techblog/linux-performance-analysis-in-60-000-milliseconds-accc10403c55)
* **Live Downsizing Google Cloud Persistent Disks (PD-SSD) at Mixpanel** (https://engineering.mixpanel.com/2018/07/31/live-downsizing-google-cloud-pds-for-fun-and-profit/)
* **Decreasing RAM Usage by 40% Using jemalloc with Python & Celery at Zapier** (https://zapier.com/engineering/celery-python-jemalloc/)
* **Reducing Memory Footprint at Slack** (https://slack.engineering/reducing-slacks-memory-footprint-4480fec7e8eb)
* **Continuous Load Testing at Slack** (https://slack.engineering/continuous-load-testing/)
* **Performance Improvements at Pinterest** (https://medium.com/@Pinterest_Engineering/driving-user-growth-with-performance-improvements-cfc50dafadd7)
* **Server Side Rendering at Wix** (https://www.youtube.com/watch?v=f9xI2jR71Ms)
* **30x Performance Improvements on MySQLStreamer at Yelp** (https://engineeringblog.yelp.com/2018/02/making-30x-performance-improvements-on-yelps-mysqlstreamer.html)
* **Optimizing APIs at Netflix** (https://medium.com/netflix-techblog/optimizing-the-netflix-api-5c9ac715cf19)
* **Performance Monitoring with Riemann and Clojure at Walmart** (https://medium.com/walmartlabs/performance-monitoring-with-riemann-and-clojure-eafc07fcd375)
* **Performance Tracking Dashboard for Live Games at Zynga** (https://www.zynga.com/blogs/engineering/live-games-have-evolving-performance)
* **Optimizing CAL Report Hadoop MapReduce Jobs at eBay** (https://www.ebayinc.com/stories/blogs/tech/optimization-of-cal-report-hadoop-mapreduce-job/)
* **Performance Tuning on Quartz Scheduler at eBay** (https://www.ebayinc.com/stories/blogs/tech/performance-tuning-on-quartz-scheduler/)
* **Profiling C++ (Part 1: Optimization, Part 2: Measurement and Analysis) at Riot Games** (https://engineering.riotgames.com/news/profiling-optimisation)
* **Profiling React Server-Side Rendering at HomeAway** (https://medium.com/homeaway-tech-blog/profiling-react-server-side-rendering-to-free-the-node-js-event-loop-7f0fe455a901)
* **Hardware-Assisted Video Transcoding at Dailymotion** (https://medium.com/dailymotion-engineering/hardware-assisted-video-transcoding-at-dailymotion-66cd2db448ae)
* **Cross Shard Transactions at 10 Million RPS at Dropbox** (https://blogs.dropbox.com/tech/2018/11/cross-shard-transactions-at-10-million-requests-per-second/)
* **API Profiling at Pinterest** (https://medium.com/@Pinterest_Engineering/api-profiling-at-pinterest-6fa9333b4961)
* **Pagelets Parallelize Server-side Processing at Yelp** (https://engineeringblog.yelp.com/2017/07/generating-web-pages-in-parallel-with-pagelets.html)
* **Improving key expiration in Redis at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/improving-key-expiration-in-redis.html)
* **Ad Delivery Network Performance Optimization with Flame Graphs at MindGeek** (https://medium.com/mindgeek-engineering-blog/ad-delivery-network-performance-optimization-with-flame-graphs-bc550cf59cf7)
* **Predictive CPU isolation of containers at Netflix** (https://medium.com/netflix-techblog/predictive-cpu-isolation-of-containers-at-netflix-91f014d856c7)
* **Improving HDFS I/O Utilization for Efficiency at Uber** (https://eng.uber.com/improving-hdfs-i-o-utilization-for-efficiency/)
* **Cloud Jewels: Estimating kWh in the Cloud at Etsy** (https://codeascraft.com/2020/04/23/cloud-jewels-estimating-kwh-in-the-cloud/)
* **Unthrottled: Fixing CPU Limits in the Cloud (2 parts) at Indeed** (https://engineering.indeedblog.com/blog/2019/12/unthrottled-fixing-cpu-limits-in-the-cloud/)
⟡ Performance Optimization by Tuning Garbage Collection (https://confluence.atlassian.com/enterprise/garbage-collection-gc-tuning-guide-461504616.html)
* **Garbage Collection in Java Applications at LinkedIn** (https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications)
* **Garbage Collection in High-Throughput, Low-Latency Machine Learning Services at Adobe** (https://medium.com/adobetech/engineering-high-throughput-low-latency-machine-learning-services-7d45edac0271)
* **Garbage Collection in Redux Applications at SoundCloud** (https://developers.soundcloud.com/blog/garbage-collection-in-redux-applications)
* **Garbage Collection in Go Applications at Twitch** (https://blog.twitch.tv/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2)
* **Analyzing V8 Garbage Collection Logs at Alibaba** (https://www.linux.com/blog/can-nodejs-scale-ask-team-alibaba)
* **Python Garbage Collection for Dropping 50% Memory Growth Per Request at Instagram** (https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf)
* **Performance Impact of Removing Out of Band Garbage Collector (OOBGC) at GitHub** (https://githubengineering.com/removing-oobgc/)
* **Debugging Java Memory Leaks at Allegro** (https://allegro.tech/2018/05/a-comedy-of-errors-debugging-java-memory-leaks.html)
* **Optimizing JVM at Alibaba** (https://www.youtube.com/watch?v=X4tmr3nhZRg)
* **Tuning JVM Memory for Large-scale Services at Uber** (https://eng.uber.com/jvm-tuning-garbage-collection/)
* **Solr Performance Tuning at Walmart** (https://medium.com/walmartglobaltech/solr-performance-tuning-beb7d0d0f8d9)
* **Memory Tuning a High Throughput Microservice at Flipkart** (https://blog.flipkart.tech/memory-tuning-a-high-throughput-microservice-ed57b3e60997)
⟡ Performance Optimization on Image, Video, Page Load (https://developers.google.com/web/fundamentals/performance/why-performance-matters/)
* **Optimizing 360 Photos at Scale at Facebook** (https://code.facebook.com/posts/129055711052260/optimizing-360-photos-at-scale/)
* **Reducing Image File Size in the Photos Infrastructure at Etsy** (https://codeascraft.com/2017/05/30/reducing-image-file-size-at-etsy/)
* **Improving GIF Performance at Pinterest** (https://medium.com/@Pinterest_Engineering/improving-gif-performance-on-pinterest-8dad74bf92f1)
* **Optimizing Video Playback Performance at Pinterest** (https://medium.com/@Pinterest_Engineering/optimizing-video-playback-performance-caf55ce310d1)
* **Optimizing Video Stream for Low Bandwidth with Dynamic Optimizer at Netflix** (https://medium.com/netflix-techblog/optimized-shot-based-encodes-now-streaming-4b9464204830)
* **Adaptive Video Streaming at YouTube** (https://youtube-eng.googleblog.com/2018/04/making-high-quality-video-efficient.html)
* **Reducing Video Loading Time at Dailymotion** (https://medium.com/dailymotion/reducing-video-loading-time-fa9c997a2294)
* **Improving Homepage Performance at Zillow** (https://www.zillow.com/engineering/improving-homepage-performance/)
* **The Process of Optimizing for Client Performance at Expedia** (https://medium.com/expedia-engineering/go-fast-or-go-home-the-process-of-optimizing-for-client-performance-57bb497402e)
* **Web Performance at BBC** (https://medium.com/bbc-design-engineering/bbc-world-service-web-performance-26b08f7abfcc)
⟡ Performance Optimization by Brotli Compression (https://blogs.akamai.com/2016/02/understanding-brotlis-potential.html)
* **Boosting Site Speed Using Brotli Compression at LinkedIn** (https://engineering.linkedin.com/blog/2017/05/boosting-site-speed-using-brotli-compression)
* **Brotli at Booking.com** (https://medium.com/booking-com-development/bookings-journey-with-brotli-978b249d34f3)
* **Brotli at Treebo** (https://tech.treebo.com/a-tale-of-brotli-compression-bcb071d9780a)
* **Deploying Brotli for Static Content at Dropbox** (https://dropbox.tech/infrastructure/deploying-brotli-for-static-content)
* **Progressive Enhancement with Brotli at Yelp** (https://engineeringblog.yelp.com/2017/07/progressive-enhancement-with-brotli.html)
* **Speeding Up Redis with Compression at DoorDash** (https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/)
⟡ Performance Optimization on Languages and Frameworks (https://www.techempower.com/benchmarks/)
* **Python at Netflix** (https://netflixtechblog.com/python-at-netflix-bba45dae649e)
* **Python at scale (3 parts) at Instagram** (https://instagram-engineering.com/python-at-scale-strict-modules-c0bb9245c834)
* **OCaml best practices (2 parts) at Issuu** (https://engineering.issuu.com/2018/12/10/our-current-ocaml-best-practices-part-2)
* **PHP at Slack** (https://slack.engineering/taking-php-seriously-cf7a60065329)
* **Go at Trivago** (https://tech.trivago.com/2020/03/02/why-we-chose-go/)
* **TypeScript at Etsy** (https://codeascraft.com/2021/11/08/etsys-journey-to-typescript/)
* **Kotlin for taming state at Etsy** (https://www.etsy.com/sg-en/codeascraft/sealed-classes-opened-my-mind)
* **BPF and Go at Bumble** (https://medium.com/bumble-tech/bpf-and-go-modern-forms-of-introspection-in-linux-6b9802682223)
* **Ruby on Rails at GitLab** (https://medium.com/gitlab-magazine/why-we-use-ruby-on-rails-to-build-gitlab-601dce4a7a38)
* **Rust in production at Figma** (https://medium.com/figma-design/rust-in-production-at-figma-e10a0ec31929)
* **Choosing a Language Stack at WeWork** (https://engineering.wework.com/choosing-a-language-stack-cac3726928f6)
* **Switching from Go to Rust at Discord** (https://blog.discord.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f)
* **ASP.NET Core Performance Optimization at Agoda** (https://medium.com/agoda-engineering/happy-asp-net-core-performance-optimization-4e21a383d299)
* **Data Race Patterns in Go at Uber** (https://eng.uber.com/data-race-patterns-in-go/)
Intelligence
⟡ Big Data (https://insights.sei.cmu.edu/sei_blog/2017/05/reference-architectures-for-big-data-systems.html)
* **Data Platform at Uber** (https://eng.uber.com/uber-big-data-platform/)
* **Data Platform at BMW** (https://www.unibw.de/code/events-u/jt-2018-workshops/ws3_bigdata_vortrag_widmann.pdf)
* **Data Platform at Netflix** (https://www.youtube.com/watch?v=CSDIThSwA7s)
* **Data Platform at Flipkart** (https://blog.flipkart.tech/overview-of-flipkart-data-platform-20c6d3e9a196)
* **Data Platform at Coupang** (https://medium.com/coupang-tech/evolving-the-coupang-data-platform-308e305a9c45)
* **Data Platform at DoorDash** (https://doordash.engineering/2020/09/25/how-doordash-is-scaling-its-data-platform/)
* **Data Platform at Khan Academy** (http://engineering.khanacademy.org/posts/khanalytics.htm)
* **Data Infrastructure at Airbnb** (https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c)
* **Data Infrastructure at LinkedIn** (https://www.infoq.com/presentations/big-data-infrastructure-linkedin)
* **Data Infrastructure at GO-JEK** (https://blog.gojekengineering.com/data-infrastructure-at-go-jek-cd4dc8cbd929)
* **Data Ingestion Infrastructure at Pinterest** (https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754)
* **Data Analytics Architecture at Pinterest** (https://medium.com/@Pinterest_Engineering/behind-the-pins-building-analytics-f7b508cdacab)
* **Data Orchestration Service at Spotify** (https://engineering.atspotify.com/2022/03/why-we-switched-our-data-orchestration-service/)
* **Big Data Processing (2 parts) at Spotify** (https://labs.spotify.com/2017/10/23/big-data-processing-at-spotify-the-road-to-scio-part-2/)
* **Big Data Processing at Uber** (https://cdn.oreillystatic.com/en/assets/1/event/160/Big%20data%20processing%20with%20Hadoop%20and%20Spark%2C%20the%20Uber%20way%20Presentation.pdf)
* **Analytics Pipeline at Lyft** (https://cdn.oreillystatic.com/en/assets/1/event/269/Lyft_s%20analytics%20pipeline_%20From%20Redshift%20to%20Apache%20Hive%20and%20Presto%20Presentation.pdf)
* **Analytics Pipeline at Grammarly** (https://tech.grammarly.com/blog/building-a-versatile-analytics-pipeline-on-top-of-apache-spark)
* **Analytics Pipeline at Teads** (https://medium.com/teads-engineering/give-meaning-to-100-billion-analytics-events-a-day-d6ba09aa8f44)
* **ML Data Pipelines for Real-Time Fraud Prevention at PayPal** (https://www.infoq.com/presentations/paypal-ml-fraud-prevention-2018)
* **Big Data Analytics and ML Techniques at LinkedIn** (https://cdn.oreillystatic.com/en/assets/1/event/269/Big%20data%20analytics%20and%20machine%20learning%20techniques%20to%20drive%20and%20grow%20business%20Presentation%201.pdf)
* **Self-Serve Reporting Platform on Hadoop at LinkedIn** (https://cdn.oreillystatic.com/en/assets/1/event/137/Building%20a%20self-serve%20real-time%20reporting%20platform%20at%20LinkedIn%20Presentation%201.pdf)
* **Privacy-Preserving Analytics and Reporting at LinkedIn** (https://engineering.linkedin.com/blog/2019/04/privacy-preserving-analytics-and-reporting-at-linkedin)
* **Analytics Platform for Tracking Item Availability at Walmart** (https://medium.com/walmartlabs/how-we-build-a-robust-analytics-platform-using-spark-kafka-and-cassandra-lambda-architecture-70c2d1bc8981)
* **Real-Time Analytics for Mobile App Crashes using Apache Pinot at Uber** (https://www.uber.com/en-SG/blog/real-time-analytics-for-mobile-app-crashes/)
* **HALO: Hardware Analytics and Lifecycle Optimization at Facebook** (https://code.fb.com/data-center-engineering/hardware-analytics-and-lifecycle-optimization-halo-at-facebook/)
* **RBEA: Real-time Analytics Platform at King** (https://techblog.king.com/rbea-scalable-real-time-analytics-king/)
* **AresDB: GPU-Powered Real-time Analytics Engine at Uber** (https://eng.uber.com/aresdb/)
* **AthenaX: Streaming Analytics Platform at Uber** (https://eng.uber.com/athenax/)
* **Jupiter: Config Driven Adtech Batch Ingestion Platform at Uber** (https://www.uber.com/en-SG/blog/jupiter-batch-ingestion-platform/)
* **Delta: Data Synchronization and Enrichment Platform at Netflix** (https://medium.com/netflix-techblog/delta-a-data-synchronization-and-enrichment-platform-e82c36a79aee)
* **Keystone: Real-time Stream Processing Platform at Netflix** (https://medium.com/netflix-techblog/keystone-real-time-stream-processing-platform-a3ee651812a)
* **Databook: Turning Big Data into Knowledge with Metadata at Uber** (https://eng.uber.com/databook/)
* **Amundsen: Data Discovery & Metadata Engine at Lyft** (https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9)
* **Maze: Funnel Visualization Platform at Uber** (https://eng.uber.com/maze/)
* **Metacat: Making Big Data Discoverable and Meaningful at Netflix** (https://medium.com/netflix-techblog/metacat-making-big-data-discoverable-and-meaningful-at-netflix-56fb36a53520)
* **SpinalTap: Change Data Capture System at Airbnb** (https://medium.com/airbnb-engineering/capturing-data-evolution-in-a-service-oriented-architecture-72f7c643ee6f)
* **Accelerator: Fast Data Processing Framework at eBay** (https://www.ebayinc.com/stories/blogs/tech/announcing-the-accelerator-processing-1-000-000-000-lines-per-second-on-a-single-computer/)
* **Omid: Transaction Processing Platform at Yahoo** (https://yahooeng.tumblr.com/post/180867271141/a-new-chapter-for-omid)
* **TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo** (https://yahooeng.tumblr.com/post/157196488076/open-sourcing-tensorflowonspark-distributed-deep)
* **CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo** (https://yahooeng.tumblr.com/post/139916828451/caffeonspark-open-sourced-for-distributed-deep)
* **Spark on Scala: Analytics Reference Architecture at Adobe** (https://medium.com/adobetech/spark-on-scala-adobe-analytics-reference-architecture-7457f5614b4c)
* **Experimentation Platform (2 parts) at Spotify** (https://engineering.atspotify.com/2020/11/02/spotifys-new-experimentation-platform-part-2/)
* **Experimentation Platform at Airbnb** (https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166)
* **Smart Product Platform at Zalando** (https://jobs.zalando.com/tech/blog/zalando-smart-product-platform/?gh_src=4n3gxh1)
* **Log Analysis Platform at LINE** (https://www.slideshare.net/wyukawa/strata2017-sg)
* **Data Visualisation Platform at Myntra** (https://medium.com/myntra-engineering/universal-dashboarding-platform-udp-data-visualisation-platform-at-myntra-5f2522fcf72d)
* **Building and Scaling Data Lineage at Netflix** (https://medium.com/netflix-techblog/building-and-scaling-data-lineage-at-netflix-to-improve-data-infrastructure-reliability-and-1a52526a7977)
* **Building a scalable data management system for computer vision tasks at Pinterest** (https://medium.com/@Pinterest_Engineering/building-a-scalable-data-management-system-for-computer-vision-tasks-a6dee8f1c580)
* **Structured Data at Etsy** (https://codeascraft.com/2019/07/31/an-introduction-to-structured-data-at-etsy/)
* **Scaling a Mature Data Pipeline - Managing Overhead at Airbnb** (https://medium.com/airbnb-engineering/scaling-a-mature-data-pipeline-managing-overhead-f34835cbc866)
* **Spark Partitioning Strategies at Airbnb** (https://medium.com/airbnb-engineering/on-spark-hive-and-small-files-an-in-depth-look-at-spark-partitioning-strategies-a9a364f908)
* **Scaling the Hadoop Distributed File System at LinkedIn** (https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr)
* **Scaling Hadoop YARN cluster beyond 10,000 nodes at LinkedIn** (https://engineering.linkedin.com/blog/2021/scaling-linkedin-s-hadoop-yarn-cluster-beyond-10-000-nodes)
* **Scaling Big Data Access Controls at Pinterest** (https://medium.com/pinterest-engineering/securely-scaling-big-data-access-controls-at-pinterest-bbc3406a1695)
⟡ Distributed Machine Learning (https://www.csie.ntu.edu.tw/~cjlin/talks/bigdata-bilbao.pdf)
* **Machine Learning Platform at Uber** (https://eng.uber.com/michelangelo/)
* **Machine Learning Platform at Yelp** (https://engineeringblog.yelp.com/2020/07/ML-platform-overview.html)
* **Machine Learning Platform at Etsy** (https://codeascraft.com/2021/12/21/redesigning-etsys-machine-learning-platform/)
* **Machine Learning Platform at Zalando** (https://engineering.zalando.com/posts/2022/04/zalando-machine-learning-platform.html)
* **Recommendation System at Lyft** (https://eng.lyft.com/the-recommendation-system-at-lyft-67bc9dcc1793)
* **Platform for Serving Recommendations at Etsy** (https://www.etsy.com/sg-en/codeascraft/building-a-platform-for-serving-recommendations-at-etsy)
* **Infrastructure to Run User Forecasts at Spotify** (https://engineering.atspotify.com/2022/06/how-we-built-infrastructure-to-run-user-forecasts-at-spotify/)
* **Aroma: Using ML for Code Recommendation at Facebook** (https://code.fb.com/developer-tools/aroma/)
* **Flyte: Cloud Native Machine Learning and Data Processing Platform at Lyft** (https://eng.lyft.com/introducing-flyte-cloud-native-machine-learning-and-data-processing-platform-fb2bb3046a59)
* **LyftLearn: ML Model Training Infrastructure built on Kubernetes at Lyft** (https://eng.lyft.com/lyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb)
* **Horovod: Open Source Distributed Deep Learning Framework for TensorFlow at Uber** (https://eng.uber.com/horovod/)
* **COTA: Improving Customer Care with NLP & Machine Learning at Uber** (https://eng.uber.com/cota/)
* **Manifold: Model-Agnostic Visual Debugging Tool for Machine Learning at Uber** (https://eng.uber.com/manifold/)
* **Repo-Topix: Topic Extraction Framework at GitHub** (https://githubengineering.com/topics/)
* **Concourse: Generating Personalized Content Notifications in Near-Real-Time at LinkedIn** (https://engineering.linkedin.com/blog/2018/05/concourse--generating-personalized-content-notifications-in-near)
* **Altus Care: Applying a Chatbot to Platform Engineering at eBay** (https://www.ebayinc.com/stories/blogs/tech/altus-care-apply-chatbot-to-ebay-platform-engineering/)
* **PyKrylov: Accelerating Machine Learning Research at eBay** (https://tech.ebayinc.com/engineering/pykrylov-accelerating-machine-learning-research-at-ebay/)
* **Box Graph: Spontaneous Social Network at Box** (https://blog.box.com/blog/box-graph-how-we-built-spontaneous-social-network/)
* **PricingNet: Pricing Modelling with Neural Networks at Skyscanner** (https://hackernoon.com/pricingnet-modelling-the-global-airline-industry-with-neural-networks-833844d20ea6)
* **PinText: Multitask Text Embedding System at Pinterest** (https://medium.com/pinterest-engineering/pintext-a-multitask-text-embedding-system-in-pinterest-b80ece364555)
* **SearchSage: Learning Search Query Representations at Pinterest** (https://medium.com/pinterest-engineering/searchsage-learning-search-query-representations-at-pinterest-654f2bb887fc)
* **Cannes: ML saves $1.7M a year on document previews at Dropbox** (https://dropbox.tech/machine-learning/cannes--how-ml-saves-us--1-7m-a-year-on-document-previews)
* **Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp** (https://engineeringblog.yelp.com/2018/01/building-a-distributed-ml-pipeline-part1.html)
* **Learning with Privacy at Scale at Apple** (https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html)
* **Deep Learning for Image Classification Experiment at Mercari** (https://medium.com/mercari-engineering/mercaris-image-classification-experiment-using-deep-learning-9b4e994a18ec)
* **Deep Learning for Frame Detection in Product Images at Allegro** (https://allegro.tech/2016/12/deep-learning-for-frame-detection.html)
* **Content-based Video Relevance Prediction at Hulu** (https://medium.com/hulu-tech-blog/content-based-video-relevance-prediction-b2c448e14752)
* **Moderating Inappropriate Video Content at Yelp** (https://engineeringblog.yelp.com/2024/03/moderating-inappropriate-video-content-at-yelp.html)
* **Improving Photo Selection With Deep Learning at TripAdvisor** (http://engineering.tripadvisor.com/improving-tripadvisor-photo-selection-deep-learning/)
* **Personalized Recommendations for Experiences Using Deep Learning at TripAdvisor** (https://www.tripadvisor.com/engineering/personalized-recommendations-for-experiences-using-deep-learning/)
* **Personalised Recommender Systems at BBC** (https://medium.com/bbc-design-engineering/developing-personalised-recommender-systems-at-the-bbc-e26c5e0c4216)
* **Machine Learning (2 parts) at Condé Nast** (https://technology.condenast.com/story/handbag-brand-and-color-detection)
* **Natural Language Processing and Content Analysis (2 parts) at Condé Nast** (https://technology.condenast.com/story/natural-language-processing-and-content-analysis-at-conde-nast-part-2-system-architecture)
* **Mapping the World of Music Using Machine Learning (2 parts) at iHeartRadio** (https://tech.iheart.com/mapping-the-world-of-music-using-machine-learning-part-2-aa50b6a0304c)
* **Machine Learning to Improve Streaming Quality at Netflix** (https://medium.com/netflix-techblog/using-machine-learning-to-improve-streaming-quality-at-netflix-9651263ef09f)
* **Machine Learning to Match Drivers & Riders at GO-JEK** (https://blog.gojekengineering.com/how-we-use-machine-learning-to-match-drivers-riders-b06d617b9e5)
* **Improving Video Thumbnails with Deep Neural Nets at YouTube** (https://youtube-eng.googleblog.com/2015/10/improving-youtube-video-thumbnails-with_8.html)
* **Quantile Regression for Delivering On Time at Instacart** (https://tech.instacart.com/how-instacart-delivers-on-time-using-quantile-regression-2383e2e03edb)
* **Cross-Lingual End-to-End Product Search with Deep Learning at Zalando** (https://jobs.zalando.com/tech/blog/search-deep-neural-network/)
* **Machine Learning at Jane Street** (https://blog.janestreet.com/real-world-machine-learning-part-1/)
* **Machine Learning for Ranking Answers End-to-End at Quora** (https://engineering.quora.com/A-Machine-Learning-Approach-to-Ranking-Answers-on-Quora)
* **Clustering Similar Stories Using LDA at Flipboard** (http://engineering.flipboard.com/2017/02/storyclustering)
* **Similarity Search at Flickr** (https://code.flickr.net/2017/03/07/introducing-similarity-search-at-flickr/)
* **Large-Scale Machine Learning Pipeline for Job Recommendations at Indeed** (http://engineering.indeedblog.com/blog/2016/04/building-a-large-scale-machine-learning-pipeline-for-job-recommendations/)
* **Deep Learning from Prototype to Production at Taboola** (http://engineering.taboola.com/deep-learning-from-prototype-to-production/)
* **Atom Smashing using Machine Learning at CERN** (https://cdn.oreillystatic.com/en/assets/1/event/144/Atom%20smashing%20using%20machine%20learning%20at%20CERN%20Presentation.pdf)
* **Mapping Tags at Medium** (https://medium.engineering/mapping-mediums-tags-1b9a78d77cf0)
* **Clustering with the Dirichlet Process Mixture Model in Scala at Monsanto** (http://engineering.monsanto.com/2015/11/23/chinese-restaurant-process/)
* **Map Pins with DBSCAN & Random Forests at Foursquare** (https://engineering.foursquare.com/you-are-probably-here-better-map-pins-with-dbscan-random-forests-9d51e8c1964d)
* **Forecasting at Uber** (https://eng.uber.com/forecasting-introduction/)
* **Financial Forecasting at Uber** (https://eng.uber.com/transforming-financial-forecasting-machine-learning/)
* **Productionizing ML with Workflows at Twitter** (https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows.html)
* **GUI Testing Powered by Deep Learning at eBay** (https://www.ebayinc.com/stories/blogs/tech/gui-testing-powered-by-deep-learning/)
* **Scaling Machine Learning to Recommend Driving Routes at Pivotal** (http://engineering.pivotal.io/post/scaling-machine-learning-to-recommend-driving-routes/)
* **Real-Time Predictions at DoorDash** (https://www.infoq.com/presentations/doordash-real-time-predictions)
* **Machine Intelligence at Dropbox** (https://blogs.dropbox.com/tech/2018/09/machine-intelligence-at-dropbox-an-update-from-our-dbxi-team/)
* **Machine Learning for Indexing Text from Billions of Images at Dropbox** (https://blogs.dropbox.com/tech/2018/10/using-machine-learning-to-index-text-from-billions-of-images/)
* **Modeling User Journeys via Semantic Embeddings at Etsy** (https://codeascraft.com/2018/07/12/modeling-user-journey-via-semantic-embeddings/)
* **Automated Fake Account Detection at LinkedIn** (https://engineering.linkedin.com/blog/2018/09/automated-fake-account-detection-at-linkedin)
* **Building Knowledge Graph at Airbnb** (https://medium.com/airbnb-engineering/contextualizing-airbnb-by-building-knowledge-graph-b7077e268d5a)
* **Core Modeling at Instagram** (https://instagram-engineering.com/core-modeling-at-instagram-a51e0158aa48)
* **Neural Architecture Search (NAS) for Prohibited Item Detection at Mercari** (https://tech.mercari.com/entry/2019/04/26/163000)
* **Computer Vision at Airbnb** (https://medium.com/airbnb-engineering/amenity-detection-and-beyond-new-frontiers-of-computer-vision-at-airbnb-144a4441b72e)
* **3D Home Backend Algorithms at Zillow** (https://www.zillow.com/engineering/behind-zillow-3d-home-backend-algorithms/)
* **Long-term Forecasts at Lyft** (https://eng.lyft.com/making-long-term-forecasts-at-lyft-fac475b3ba52)
* **Discovering Popular Dishes with Deep Learning at Yelp** (https://engineeringblog.yelp.com/2019/10/discovering-popular-dishes-with-deep-learning.html)
* **SplitNet Architecture for Ad Candidate Ranking at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/splitnet-architecture-for-ad-candidate-ranking.html)
* **Jobs Filter at Indeed** (https://engineering.indeedblog.com/blog/2019/09/jobs-filter/)
* **Architecting Restaurant Wait Time Predictions at Yelp** (https://engineeringblog.yelp.com/2019/12/architecting-wait-time-estimations.html)
* **Music Personalization at Spotify** (https://labs.spotify.com/2016/08/07/commodity-music-ml-services/)
* **Deep Learning for Domain Name Valuation at GoDaddy** (https://sg.godaddy.com/engineering/2019/07/26/domain-name-valuation/)
* **Similarity Clustering to Catch Fraud Rings at Stripe** (https://stripe.com/blog/similarity-clustering)
* **Personalized Search at Etsy** (https://codeascraft.com/2020/10/29/bringing-personalized-search-to-etsy/)
* **ML Feature Serving Infrastructure at Lyft** (https://eng.lyft.com/ml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a)
* **Context-Specific Bidding System at Etsy** (https://codeascraft.com/2021/03/23/how-we-built-a-context-specific-bidding-system-for-etsy-ads/)
* **Moderating Promotional Spam and Inappropriate Content in Photos at Scale at Yelp** (https://engineeringblog.yelp.com/2021/05/moderating-promotional-spam-and-inappropriate-content-in-photos-at-scale-at-yelp.html)
* **Optimizing Payments with Machine Learning at Dropbox** (https://dropbox.tech/machine-learning/optimizing-payments-with-machine-learning)
* **Scaling Media Machine Learning at Netflix** (https://netflixtechblog.com/scaling-media-machine-learning-at-netflix-f19b400243)
* **Similarity Engine at eBay** (https://tech.ebayinc.com/engineering/ebays-blazingly-fast-billion-scale-vector-similarity-engine/)
Architecture
⟡ Tech Stack at Medium (https://medium.engineering/the-stack-that-helped-medium-drive-2-6-millennia-of-reading-time-e56801f7c492)
⟡ Tech Stack at Shopify (https://engineering.shopify.com/blogs/engineering/e-commerce-at-scale-inside-shopifys-tech-stack)
⟡ Building Services (4 parts) at Airbnb (https://medium.com/airbnb-engineering/building-services-at-airbnb-part-4-23c95e428064)
⟡ Architecture of Evernote (https://evernote.com/blog/a-digest-of-evernotes-architecture/)
⟡ Architecture of Chat Service (3 parts) at Riot Games (https://engineering.riotgames.com/news/chat-service-architecture-persistence)
⟡ Architecture of League of Legends Client Update (https://technology.riotgames.com/news/architecture-league-client-update)
⟡ Architecture of Ad Platform at Twitter (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/building-twitters-ad-platform-architecture-for-the-future.html)
⟡ Architecture of API Gateway at Uber (https://eng.uber.com/architecture-api-gateway/)
⟡ Architecture of API Gateway at Tinder (https://medium.com/tinder/how-we-built-the-tinder-api-gateway-831c6ca5ceca)
⟡ Basic Architecture of Slack (https://slack.engineering/how-slack-built-shared-channels-8d42c895b19f)
⟡ Lightweight Distributed Architecture to Handle Thousands of Library Releases at eBay
(https://tech.ebayinc.com/engineering/a-lightweight-distributed-architecture-to-handle-thousands-of-library-releases-at-ebay/)
⟡ Back-end at LinkedIn (https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin)
⟡ Back-end at Flickr (https://yahooeng.tumblr.com/post/157200523046/introducing-tripod-flickrs-backend-refactored)
⟡ Infrastructure (3 parts) at Zendesk (https://medium.com/zendesk-engineering/the-history-of-infrastructure-at-zendesk-part-3-foundation-team-forming-and-evolving-9859e40f5390)
⟡ Cloud Infrastructure at Grubhub (https://bytes.grubhub.com/cloud-infrastructure-at-grubhub-94db998a898a)
⟡ Real-time Presence Platform at LinkedIn (https://engineering.linkedin.com/blog/2018/01/now-you-see-me--now-you-dont--linkedins-real-time-presence-platf)
⟡ Settings Platform at LinkedIn (https://engineering.linkedin.com/blog/2019/05/building-member-trust-through-a-centralized-and-scalable-setting)
⟡ Nearline System for Scale and Performance (2 parts) at Glassdoor (https://medium.com/glassdoor-engineering/building-a-nearline-system-for-scale-and-performance-part-ii-9e01bf51b23d)
⟡ Real-time User Action Counting System for Ads at Pinterest (https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a)
⟡ API Platform at Riot Games (https://engineering.riotgames.com/news/riot-games-api-deep-dive)
⟡ Games Platform at The New York Times (https://open.nytimes.com/play-by-play-moving-the-nyt-games-platform-to-gcp-with-zero-downtime-cf425898d569)
⟡ Kabootar: Communication Platform at Swiggy (https://bytes.swiggy.com/kabootar-swiggys-communication-platform-e5a43cc25629)
⟡ Simone: Distributed Simulation Service at Netflix (https://medium.com/netflix-techblog/https-medium-com-netflix-techblog-simone-a-distributed-simulation-service-b2c85131ca1b)
⟡ Seagull: Distributed System that Helps Run More Than 20 Million Tests Per Day at Yelp (https://engineeringblog.yelp.com/2017/04/how-yelp-runs-millions-of-tests-every-day.html)
⟡ PriceAggregator: Intelligent System for Hotel Price Fetching (3 parts) at Agoda
(https://medium.com/agoda-engineering/priceaggregator-an-intelligent-system-for-hotel-price-fetching-part-3-52acfc705081)
⟡ Phoenix: Testing Platform (3 parts) at Tinder (https://medium.com/tinder-engineering/phoenix-tinders-testing-platform-part-iii-520728b9537)
⟡ Hexagonal Architecture at Netflix (https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749)
⟡ Architecture of Sticker Services at LINE (https://www.slideshare.net/linecorp/architecture-sustaining-line-sticker-services)
⟡ Stack Overflow Enterprise at Palantir (https://medium.com/@palantir/terraforming-stack-overflow-enterprise-in-aws-47ee431e6be7)
⟡ Architecture of Following Feed, Interest Feed, and Picked For You at Pinterest (https://medium.com/@Pinterest_Engineering/building-a-dynamic-and-responsive-pinterest-7d410e99f0a9)
⟡ API Specification Workflow at WeWork (https://engineering.wework.com/our-api-specification-workflow-9337448d6ee6)
⟡ Media Database at Netflix (https://medium.com/netflix-techblog/implementing-the-netflix-media-database-53b5a840b42a)
⟡ Member Transaction History Architecture at Walmart (https://medium.com/walmartlabs/member-transaction-history-architecture-8b6e34b87c21)
⟡ Sync Engine (2 parts) at Dropbox (https://dropbox.tech/infrastructure/-testing-our-new-sync-engine)
⟡ Ads Pacing Service at Twitter (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/how-we-built-twitter-s-highly-reliable-ads-pacing-service)
⟡ Rapid Event Notification System at Netflix (https://netflixtechblog.com/rapid-event-notification-system-at-netflix-6deb1d2b57d1)
⟡ Architectures of Finance, Banking, and Payment Systems (https://www.redhat.com/architect/portfolio/detail/12-integrating-a-modern-payments-architecture)
* **Bank Backend at Monzo** (https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/)
* **Trading Platform for Scale at Wealthsimple** (https://medium.com/@Wealthsimple/engineering-at-wealthsimple-reinventing-our-trading-platform-for-scale-17e332241b6c)
* **Core Banking System at Margo Bank** (https://medium.com/margobank/choosing-an-architecture-85750e1e5a03)
* **Architecture of Nubank** (https://www.infoq.com/presentations/nubank-architecture)
* **Tech Stack at TransferWise** (http://tech.transferwise.com/the-transferwise-stack-heartbeat-of-our-little-revolution/)
* **Tech Stack at Addepar** (https://medium.com/build-addepar/our-tech-stack-a4f55dab4b0d)
* **Avoiding Double Payments in a Distributed Payments System at Airbnb** (https://medium.com/airbnb-engineering/avoiding-double-payments-in-a-distributed-payments-system-2981f6b070bb)
* **Scaling Payments (3 parts) at Etsy** (https://www.etsy.com/sg-en/codeascraft/scaling-etsy-payments-with-vitess-part-3--reducing-cutover-risk)
* **Handling Millions of Digital Transactions Safely Every Day at Paytm** (https://paytm.com/blog/engineering/how-paytm-handles-millions-of-digital-transactions-safely-everyday/)
* **Billing and Payment Platform at Grammarly** (https://www.grammarly.com/blog/engineering/billing-and-payments-platform/)
Interview
⟡ Designing Large-Scale Systems (https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/)
* **My Scaling Hero - Jeff Atwood (a dose of Endorphins before your interview, JK)** (https://blog.codinghorror.com/my-scaling-hero/)
* **Software Engineering Advice from Building Large-Scale Distributed Systems - Jeff Dean** (https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf)
* **Introduction to Architecting Systems for Scale** (https://lethain.com/introduction-to-architecting-systems-for-scale/)
* **Anatomy of a System Design Interview** (https://hackernoon.com/anatomy-of-a-system-design-interview-4cb57d75a53f)
* **8 Things You Need to Know Before a System Design Interview** (http://blog.gainlo.co/index.php/2015/10/22/8-things-you-need-to-know-before-system-design-interviews/)
* **Top 10 System Design Interview Questions** (https://hackernoon.com/top-10-system-design-interview-questions-for-software-engineers-8561290f0444)
* **Top 10 Common Large-Scale Software Architectural Patterns in a Nutshell** (https://towardsdatascience.com/10-common-software-architectural-patterns-in-a-nutshell-a0b47a1e9013)
* **Cloud Big Data Design Patterns - Lynn Langit** (https://lynnlangit.com/2017/03/14/beyond-relational/)
* **How NOT to design Netflix in your 45-minute System Design Interview?** (https://hackernoon.com/how-not-to-design-netflix-in-your-45-minute-system-design-interview-64953391a054)
* **API Best Practices: Webhooks, Deprecation, and Design** (https://zapier.com/engineering/api-best-practices/)
⟡ Explaining Low-Level Systems (OS, Network/Protocol, Database, Storage) (https://www.cse.wustl.edu/~jain/cse567-06/ftp/os_monitors/index.html)
* **The Precise Meaning of I/O Wait Time in Linux** (http://veithen.github.io/2013/11/18/iowait-linux.html)
* **Paxos Made Live – An Engineering Perspective** (https://research.google.com/archive/paxos_made_live.html)
* **How to do Distributed Locking** (https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)
* **SQL Transaction Isolation Levels Explained** (http://elliot.land/post/sql-transaction-isolation-levels-explained)
⟡ "What Happens When... and How" Questions (https://www.glassdoor.com/Interview/What-happens-when-you-type-www-google-com-in-your-browser-QTN_56396.htm)
* **Netflix: What Happens When You Press Play?** (http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html)
* **Monzo: How Peer-To-Peer Payments Work** (https://monzo.com/blog/2018/04/05/how-monzo-to-monzo-payments-work/)
* **Transit and Peering: How Your Requests Reach GitHub** (https://githubengineering.com/transit-and-peering-how-your-requests-reach-github/)
* **How Spotify Streams Music** (https://labs.spotify.com/2018/08/31/smoother-streaming-with-bbr/)
Organization
⟡ Engineering Levels at SoundCloud (https://developers.soundcloud.com/blog/engineering-levels)
⟡ Engineering Roles at Palantir (https://medium.com/palantir/dev-versus-delta-demystifying-engineering-roles-at-palantir-ad44c2a6e87)
⟡ Engineering Career Framework at Dropbox (https://dropbox.tech/culture/our-updated-engineering-career-framework)
⟡ Scaling Engineering Teams at Twitter (https://www.youtube.com/watch?v=-PXi_7Ld5kU)
⟡ Scaling Decision-Making Across Teams at LinkedIn (https://engineering.linkedin.com/blog/2018/03/scaling-decision-making-across-teams-within-linkedin-engineering)
⟡ Scaling Data Science Team at GOJEK (https://blog.gojekengineering.com/the-dynamics-of-scaling-an-organisation-cb96dbe8aecd)
⟡ Scaling Agile at Zalando (https://jobs.zalando.com/tech/blog/scaling-agile-zalando/?gh_src=4n3gxh1)
⟡ Scaling Agile at bol.com (https://hackernoon.com/how-we-run-bol-com-with-60-autonomous-teams-fe7a98c0759)
⟡ Lessons Learned from Scaling a Product Team at Intercom (https://blog.intercom.com/how-we-build-software/)
⟡ Hiring, Managing, and Scaling Engineering Teams at Typeform (https://medium.com/@eleonorazucconi/toby-oliver-cto-typeform-on-hiring-managing-and-scaling-engineering-teams-86bef9e5a708)
⟡ Scaling the Datagram Team at Instagram (https://instagram-engineering.com/scaling-the-datagram-team-fc67bcf9b721)
⟡ Scaling the Design Team at Flexport (https://medium.com/flexport-design/designing-a-design-team-a9a066bc48a5)
⟡ Team Model for Scaling a Design System at Salesforce (https://medium.com/salesforce-ux/the-salesforce-team-model-for-scaling-a-design-system-d89c2a2d404b)
⟡ Building Analytics Team (4 parts) at Wish (https://medium.com/wish-engineering/scaling-the-analytics-team-at-wish-part-4-recruiting-2a9823b9f5a)
⟡ From 2 Founders to 1000 Employees at TransferWise
(https://medium.com/transferwise-ideas/from-2-founders-to-1000-employees-how-a-small-scale-startup-grew-into-a-global-community-9f26371a551b)
⟡ Lessons Learned Growing a UX Team from 10 to 170 at Adobe (https://medium.com/thinking-design/lessons-learned-growing-a-ux-team-from-10-to-170-f7b47be02262)
⟡ Five Lessons from Scaling at Pinterest (https://medium.com/@sarahtavel/five-lessons-from-scaling-pinterest-6a699a889b08)
⟡ Approach to Engineering at Vinted (http://engineering.vinted.com/2018/09/04/how-we-approach-engineering-at-vinted/)
⟡ Using Metrics to Improve the Development Process (and Coach People) at Indeed
(https://engineering.indeedblog.com/blog/2018/10/using-metrics-to-improve-the-development-process-and-coach-people/)
⟡ Mistakes to Avoid while Creating an Internal Product at Skyscanner (https://medium.com/@SkyscannerEng/9-mistakes-to-avoid-while-creating-an-internal-product-63d579b00b1a)
⟡ RACI (Responsible, Accountable, Consulted, Informed) at Etsy (https://codeascraft.com/2018/01/04/selecting-a-cloud-provider/)
⟡ Four Pillars of Leading People (Empathy, Inspiration, Trust, Honesty) at Zalando (https://jobs.zalando.com/tech/blog/four-pillars-leadership/)
⟡ Pair Programming at Shopify (https://engineering.shopify.com/blogs/engineering/pair-programming-explained)
⟡ Distributed Responsibility at Asana (https://blog.asana.com/2017/12/distributed-responsibility-engineering-manager/)
⟡ Rotating Engineers at Zalando (https://jobs.zalando.com/tech/blog/rotating-engineers-at-zalando/)
⟡ Experiment Idea Review at Pinterest (https://medium.com/pinterest-engineering/how-pinterest-supercharged-its-growth-team-with-experiment-idea-review-fd6571a02fb8)
⟡ Tech Migrations at Spotify (https://engineering.atspotify.com/2020/06/25/tech-migrations-the-spotify-way/)
⟡ Improving Code Ownership at Yelp (https://engineeringblog.yelp.com/2021/01/whose-code-is-it-anyway.html)
⟡ Agile Code Base at eBay (https://tech.ebayinc.com/engineering/how-creating-an-agile-code-base-helped-ebay-pivot-for-apple-silicon/)
⟡ Agile Data Engineering at Miro (https://medium.com/miro-engineering/agile-data-engineering-at-miro-ec2dcc8a3fcb)
⟡ Automated Incident Management through Slack at Airbnb (https://medium.com/airbnb-engineering/incident-management-ae863dc5d47f)
⟡ Refactor Organization at BBC (https://medium.com/bbc-product-technology/refactor-organisation-80e4e171d922)
⟡ Code Review (https://ai.google/research/pubs/pub47025)
* **Code Review at Palantir** (https://medium.com/@palantir/code-review-best-practices-19e02780015f)
* **Code Review at LINE** (https://engineering.linecorp.com/en/blog/effective-code-review/)
* **Code Reviews at Medium** (https://medium.engineering/code-reviews-at-medium-bed2c0dce13a)
* **Code Review at LinkedIn** (https://engineering.linkedin.com/blog/2018/06/scaling-collective-code-ownership-with-code-reviews)
* **Code Review at Disney** (https://medium.com/disney-streaming/the-secret-to-better-code-reviews-c14c7884b9ac)
* **Code Review at Netlify** (https://www.netlify.com/blog/2020/03/05/feedback-ladders-how-we-encode-code-reviews-at-netlify/)
Talk
⟡ Distributed Systems in One Lesson - Tim Berglund, Senior Director of Developer Experience at Confluent (https://www.youtube.com/watch?v=Y6Ev8GIlbxc)
⟡ Building Real Time Infrastructure at Facebook - Jeff Barber and Shie Erlich, Software Engineers at Facebook (https://www.usenix.org/conference/srecon17americas/program/presentation/erlich)
⟡ Building Reliable Social Infrastructure for Google - Marc Alvidrez, Senior Manager at Google (https://www.usenix.org/conference/srecon16/program/presentation/alvidrez)
⟡ Building a Distributed Build System at Google Scale - Aysylu Greenberg, SDE at Google (https://www.youtube.com/watch?v=K8YuavUy6Qc)
⟡ Site Reliability Engineering at Dropbox - Tammy Butow, Site Reliability Engineering Manager at Dropbox (https://www.youtube.com/watch?v=ggizCjUCCqE)
⟡ How Google Does Planet-Scale Engineering for Planet-Scale Infra - Melissa Binde, SRE Director for Google Cloud Platform (https://www.youtube.com/watch?v=H4vMcD7zKM0)
⟡ Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix (https://www.youtube.com/watch?v=CZ3wIuvmHeM&t=2837s)
⟡ Achieving Rapid Response Times in Large Online Services - Jeff Dean, Google Senior Fellow (https://www.youtube.com/watch?v=1-3Ahy7Fxsc)
⟡ Architecture to Handle 80K RPS Celebrity Sales at Shopify - Simon Eskildsen, Engineering Lead at Shopify (https://www.youtube.com/watch?v=N8NWDHgWA28)
⟡ Lessons of Scale at Facebook - Bobby Johnson, Director of Engineering at Facebook (https://www.youtube.com/watch?v=QCHiNEw73AU)
⟡ Performance Optimization for the Greater China Region at Salesforce - Jeff Cheng, Enterprise Architect at Salesforce (https://www.salesforce.com/video/1757880/)
⟡ How GIPHY Delivers a GIF to 300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at GIPHY (https://vimeo.com/252367076)
⟡ High Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior Director at Alibaba
(https://www.youtube.com/watch?v=wzsxJqeVIhY&list=PLMu8-hpCxIVENuAue7bd0eCAglLGY_8AW&index=7)
⟡ Solving Large-scale Data Center and Cloud Interconnection Problems - Ihab Tarazi, CTO at Equinix
(https://atscaleconference.com/videos/solving-large-scale-data-center-and-cloud-interconnection-problems/)
⟡ Scaling Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox (https://www.youtube.com/watch?v=PE4gwstWhmc)
⟡ Scaling Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox (https://www.youtube.com/watch?v=IhGWOaD5BYQ)
⟡ Scaling with Performance at Facebook - Bill Jia, VP of Infrastructure at Facebook (https://atscaleconference.com/videos/performance-scale-2018-opening-remarks/)
⟡ Scaling Live Videos to a Billion Users at Facebook - Sachin Kulkarni, Director of Engineering at Facebook (https://www.youtube.com/watch?v=IO4teCbHvZw)
⟡ Scaling Infrastructure at Instagram - Lisa Guo, Instagram Engineering (https://www.youtube.com/watch?v=hnpzNAPiC0E)
⟡ Scaling Infrastructure at Twitter - Yao Yue, Staff Software Engineer at Twitter (https://www.youtube.com/watch?v=6OvrFkLSoZ0)
⟡ Scaling Infrastructure at Etsy - Bethany Macri, Engineering Manager at Etsy (https://www.youtube.com/watch?v=LfqyhM1LeIU)
⟡ Scaling Real-time Infrastructure at Alibaba for Global Shopping Holiday - Xiaowei Jiang, Senior Director at Alibaba
(https://atscaleconference.com/videos/scaling-alibabas-real-time-infrastructure-for-global-shopping-holiday/)
⟡ Scaling Data Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify (https://www.youtube.com/watch?v=cdsfRXr9pJU)
⟡ Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer (https://www.youtube.com/watch?v=jQNCuD_hxdQ&list=RDhnpzNAPiC0E&index=11)
⟡ Scaling Slack - Bing Wei, Software Engineer (Infrastructure) at Slack (https://www.infoq.com/presentations/slack-scalability)
⟡ Scaling Backend at YouTube - Sugu Sougoumarane, SDE at YouTube (https://www.youtube.com/watch?v=5yDO-tmIoXY&feature=youtu.be)
⟡ Scaling Backend at Uber - Matt Ranney, Chief Systems Architect at Uber (https://www.youtube.com/watch?v=nuiLcWE8sPA)
⟡ Scaling Global CDN at Netflix - Dave Temkin, Director of Global Networks at Netflix (https://www.youtube.com/watch?v=tbqcsHg-Q_o)
⟡ Scaling Load Balancing Infra to Support 1.3 Billion Users at Facebook - Patrick Shuff, Production Engineer at Facebook (https://www.youtube.com/watch?v=bxhYNfFeVF4)
⟡ Scaling (a NSFW site) to 200 Million Views A Day And Beyond - Eric Pickup, Lead Platform Developer at MindGeek (https://www.youtube.com/watch?v=RlkCdM_f3p4)
⟡ Scaling Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Garg, SEs at Quora (https://www.infoq.com/presentations/quora-analytics)
⟡ Scaling Git at Microsoft - Saeed Noursalehi, Principal Program Manager at Microsoft (https://www.youtube.com/watch?v=g_MPGU_m01s)
⟡ Scaling Multitenant Architecture Across Multiple Data Centres at Shopify - Weingarten, Engineering Lead at Shopify (https://www.youtube.com/watch?v=F-f0-k46WVk)
Donation
Roses are red. Violets are blue. Binh (https://nguyenquocbinh.org/) likes sweets. Treat Binh to a tiramisu? (https://paypal.me/binhnguyennus) :cake:
Community power
▐ Contributions are greatly welcome! You may want to take a look at the contribution guidelines (CONTRIBUTING.md). If you see a link here that is no longer maintained or is not a good fit,
▐ please submit a pull request!
▐ Many long hours of hard work have gone into this project. If you find it helpful, please share it on Facebook, on Twitter (https://ctt.ec/V8B2p), on Weibo (http://t.cn/RnjFLCB), or in your
▐ chat groups! Knowledge is power, knowledge shared is power multiplied. Thank you!
Content
- Principle (#principle)
- Scalability (#scalability)
- Availability (#availability)
- Stability (#stability)
- Performance (#performance)
- Intelligence (#intelligence)
- Architecture (#architecture)
- Interview (#interview)
- Organization (#organization)
- Talk (#talk)
- Book (#book)
Principle
⟡ Lessons from Giant-Scale Services - Eric Brewer, UC Berkeley & Google (https://people.eecs.berkeley.edu/~brewer/papers/GiantScale-IEEE.pdf)
⟡ Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean, Google (https://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
⟡ How to Design a Good API & Why it Matters - Joshua Bloch, CMU & Google (https://www.infoq.com/presentations/effective-api-design)
⟡ On Efficiency, Reliability, Scaling - James Hamilton, VP at AWS (http://mvdirona.com/jrh/work/)
⟡ Principles of Chaos Engineering (https://www.usenix.org/conference/srecon17americas/program/presentation/rosenthal)
⟡ Finding the Order in Chaos (https://www.usenix.org/conference/srecon16/program/presentation/lueder)
⟡ The Twelve-Factor App (https://12factor.net/)
⟡ Clean Architecture (https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html)
⟡ High Cohesion and Low Coupling (http://www.math-cs.gordon.edu/courses/cs211/lectures-2009/Cohesion,Coupling,MVC.pdf)
⟡ Monoliths and Microservices (https://medium.com/@SkyscannerEng/monoliths-and-microservices-8c65708c3dbf)
⟡ CAP Theorem and Trade-offs (http://robertgreiner.com/2014/08/cap-theorem-revisited/)
⟡ CP Databases and AP Databases (https://blog.andyet.com/2014/10/01/right-database)
⟡ Stateless vs Stateful Scalability (http://ithare.com/scaling-stateful-objects/)
⟡ Scale Up vs Scale Out: Hidden Costs (https://blog.codinghorror.com/scaling-up-vs-scaling-out-hidden-costs/)
⟡ ACID and BASE (https://neo4j.com/blog/acid-vs-base-consistency-models-explained/)
⟡ Blocking/Non-Blocking and Sync/Async (https://blogs.msdn.microsoft.com/csliu/2009/08/27/io-concept-blockingnon-blocking-vs-syncasync/)
⟡ Performance and Scalability of Databases (https://use-the-index-luke.com/sql/testing-scalability)
⟡ Database Isolation Levels and Effects on Performance and Scalability (http://highscalability.com/blog/2011/2/10/database-isolation-levels-and-their-effects-on-performance-a.html)
⟡ The Probability of Data Loss in Large Clusters (https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html)
⟡ Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence (https://docs.microsoft.com/en-us/previous-versions/msp-n-p/dn271399(v=pandp.10))
⟡ SQL vs NoSQL (https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/)
⟡ SQL vs NoSQL - Lesson Learned at Salesforce (https://engineering.salesforce.com/sql-or-nosql-9eaf1d92545b)
⟡ NoSQL Databases: Survey and Decision Guidance (https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d)
⟡ How Sharding Works (https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6)
⟡ Consistent Hashing (http://www.tom-e-white.com/2007/11/consistent-hashing.html)
⟡ Consistent Hashing: Algorithmic Tradeoffs (https://medium.com/@dgryski/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8)
⟡ Don’t be tricked by the Hashing Trick (https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087)
⟡ Uniform Consistent Hashing at Netflix (https://medium.com/netflix-techblog/distributing-content-to-open-connect-3e3e391d4dc9)
⟡ Eventually Consistent - Werner Vogels, CTO at Amazon (https://www.allthingsdistributed.com/2008/12/eventually_consistent.html)
⟡ Cache is King (https://www.stevesouders.com/blog/2012/10/11/cache-is-king/)
⟡ Anti-Caching (https://www.the-paper-trail.org/post/2014-06-06-paper-notes-anti-caching/)
⟡ Understand Latency (http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it)
⟡ Latency Numbers Every Programmer Should Know (http://norvig.com/21-days.html#answers)
⟡ The Calculus of Service Availability (https://queue.acm.org/detail.cfm?id=3096459&__s=dnkxuaws9pogqdnxmx8i)
⟡ Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO (http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html)
⟡ Common Bottlenecks (http://highscalability.com/blog/2012/5/16/big-list-of-20-common-bottlenecks.html)
⟡ Life Beyond Distributed Transactions (https://queue.acm.org/detail.cfm?id=3025012)
⟡ Relying on Software to Redirect Traffic Reliably at Various Layers (https://www.usenix.org/conference/srecon15/program/presentation/taveira)
⟡ Breaking Things on Purpose (https://www.usenix.org/conference/srecon17americas/program/presentation/andrus)
⟡ Avoid Over Engineering (https://medium.com/@rdsubhas/10-modern-software-engineering-mistakes-bc67fbef4fc8)
⟡ Scalability Worst Practices (https://www.infoq.com/articles/scalability-worst-practices)
⟡ Use Solid Technologies - Don’t Re-invent the Wheel - Keep It Simple! (https://medium.com/@DataStax/instagram-engineerings-3-rules-to-a-scalable-cloud-application-architecture-c44afed31406)
⟡ Simplicity by Distributing Complexity (https://jobs.zalando.com/tech/blog/simplicity-by-distributing-complexity/)
⟡ Why Over-Reusing is Bad (http://tech.transferwise.com/why-over-reusing-is-bad/)
⟡ Performance is a Feature (https://blog.codinghorror.com/performance-is-a-feature/)
⟡ Make Performance Part of Your Workflow (https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/)
⟡ The Benefits of Server Side Rendering over Client Side Rendering (https://medium.com/walmartlabs/the-benefits-of-server-side-rendering-over-client-side-rendering-5d07ff2cefe8)
⟡ Automate and Abstract: Lessons at Facebook (https://architecht.io/lessons-from-facebook-on-engineering-for-scale-f5716f0afc7a)
⟡ AWS Do's and Don'ts (https://8thlight.com/blog/sarah-sunday/2017/09/15/aws-dos-and-donts.html)
⟡ (UI) Design Doesn’t Scale - Stanley Wood, Design Director at Spotify (https://medium.com/@hellostanley/design-doesnt-scale-4d81e12cbc3e)
⟡ Linux Performance (http://www.brendangregg.com/linuxperf.html)
⟡ Building Fast and Resilient Web Applications - Ilya Grigorik (https://www.igvita.com/2016/05/20/building-fast-and-resilient-web-applications/)
⟡ Accept Partial Failures, Minimize Service Loss (https://www.usenix.org/conference/srecon17asia/program/presentation/wang_daxin)
⟡ Design for Resiliency (http://highscalability.com/blog/2012/12/31/designing-for-resiliency-will-be-so-2013.html)
⟡ Design for Self-healing (https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/self-healing)
⟡ Design for Scaling Out (https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/scale-out)
⟡ Design for Evolution (https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/design-for-evolution)
⟡ Learn from Mistakes (http://highscalability.com/blog/2013/8/26/reddit-lessons-learned-from-mistakes-made-scaling-to-1-billi.html)
Scalability
⟡ Microservices and Orchestration (https://martinfowler.com/microservices/)
* **Domain-Oriented Microservice Architecture at Uber** (https://eng.uber.com/microservice-architecture/)
* **Service Architecture (3 parts: Domain Gateways, Value-Added Services, BFF) at SoundCloud** (https://developers.soundcloud.com/blog/service-architecture-3)
* **Container (8 parts) at Riot Games** (https://engineering.riotgames.com/news/thinking-inside-container)
* **Containerization at Pinterest** (https://medium.com/@Pinterest_Engineering/containerization-at-pinterest-92295347f2f3)
* **Evolution of Container Usage at Netflix** (https://medium.com/netflix-techblog/the-evolution-of-container-usage-at-netflix-3abfc096781b)
* **Dockerizing MySQL at Uber** (https://eng.uber.com/dockerizing-mysql/)
* **Testing of Microservices at Spotify** (https://labs.spotify.com/2018/01/11/testing-of-microservices/)
* **Docker in Production at Treehouse** (https://medium.com/treehouse-engineering/lessons-learned-running-docker-in-production-5dce99ece770)
* **Microservice at SoundCloud** (https://developers.soundcloud.com/blog/inside-a-soundcloud-microservice)
* **Operate Kubernetes Reliably at Stripe** (https://stripe.com/blog/operating-kubernetes)
* **Cross-Cluster Traffic Mirroring with Istio at Trivago** (https://tech.trivago.com/2020/06/10/cross-cluster-traffic-mirroring-with-istio/)
* **Agrarian-Scale Kubernetes (3 parts) at New York Times** (https://open.nytimes.com/agrarian-scale-kubernetes-part-3-ee459887ed7e)
* **Nanoservices at BBC** (https://medium.com/bbc-design-engineering/powering-bbc-online-with-nanoservices-727840ba015b)
* **PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg** (https://www.techatbloomberg.com/blog/powerfulseal-testing-tool-kubernetes-clusters/)
* **Conductor: Microservices Orchestrator at Netflix** (https://medium.com/netflix-techblog/netflix-conductor-a-microservices-orchestrator-2e8d4771bf40)
* **Docker Containers that Power Over 100.000 Online Shops at Shopify** (https://shopifyengineering.myshopify.com/blogs/engineering/docker-at-shopify-how-we-built-containers-that-power-over-100-000-online-shops)
* **Microservice Architecture at Medium** (https://medium.engineering/microservice-architecture-at-medium-9c33805eb74f)
* **From bare-metal to Kubernetes at Betabrand** (https://boxunix.com/post/bare_metal_to_kube/)
* **Kubernetes at Tinder** (https://medium.com/tinder-engineering/tinders-move-to-kubernetes-cda2a6372f44)
* **Kubernetes at Quora** (https://www.quora.com/q/quoraengineering/Adopting-Kubernetes-at-Quora)
* **Kubernetes Platform at Pinterest** (https://medium.com/pinterest-engineering/building-a-kubernetes-platform-at-pinterest-fb3d9571c948)
* **Microservices at Nubank** (https://medium.com/building-nubank/microservices-at-nubank-an-overview-2ebcb336c64d)
* **Payment Transaction Management in Microservices at Mercari** (https://engineering.mercari.com/en/blog/entry/20210831-2019-06-07-155849/)
* **Service Mesh at Snap** (https://eng.snap.com/monolith-to-multicloud-microservices-snap-service-mesh)
* **GRIT: Protocol for Distributed Transactions across Microservices at eBay** (https://tech.ebayinc.com/engineering/grit-a-protocol-for-distributed-transactions-across-microservices/)
* **Rubix: Kubernetes at Palantir** (https://medium.com/palantir/introducing-rubix-kubernetes-at-palantir-ab0ce16ea42e)
* **CRISP: Critical Path Analysis for Microservice Architectures at Uber** (https://eng.uber.com/crisp-critical-path-analysis-for-microservice-architectures/)
⟡ Distributed Caching (https://www.wix.engineering/post/scaling-to-100m-to-cache-or-not-to-cache)
* **EVCache: Distributed In-memory Caching at Netflix** (https://medium.com/netflix-techblog/caching-for-a-global-netflix-7bcc457012f1)
* **EVCache Cache Warmer Infrastructure at Netflix** (https://medium.com/netflix-techblog/cache-warming-agility-for-a-stateful-service-2d3b1da82642)
* **Memsniff: Robust Memcache Traffic Analyzer at Box** (https://blog.box.com/blog/introducing-memsniff-robust-memcache-traffic-analyzer/)
* **Caching with Consistent Hashing and Cache Smearing at Etsy** (https://codeascraft.com/2017/11/30/how-etsy-caches/)
* **Analysis of Photo Caching at Facebook** (https://code.facebook.com/posts/220956754772273/an-analysis-of-facebook-photo-caching/)
* **Cache Efficiency Exercise at Facebook** (https://code.facebook.com/posts/964122680272229/web-performance-cache-efficiency-exercise/)
* **tCache: Scalable Data-aware Java Caching at Trivago** (http://tech.trivago.com/2015/10/15/tcache/)
* **Pycache: In-process Caching at Quora** (https://engineering.quora.com/Pycache-lightning-fast-in-process-caching)
* **Reduce Memcached Memory Usage by 50% at Trivago** (http://tech.trivago.com/2017/12/19/how-trivago-reduced-memcached-memory-usage-by-50/)
* **Caching Internal Service Calls at Yelp** (https://engineeringblog.yelp.com/2018/03/caching-internal-service-calls-at-yelp.html)
* **Estimating the Cache Efficiency using Big Data at Allegro** (https://allegro.tech/2017/01/estimating-the-cache-efficiency-using-big-data.html)
* **Distributed Cache at Zalando** (https://jobs.zalando.com/tech/blog/distributed-cache-akka-kubernetes/)
* **Application Data Caching from RAM to SSD at Netflix** (https://medium.com/netflix-techblog/evolution-of-application-data-caching-from-ram-to-ssd-a33d6fa7a690)
* **Tradeoffs of Replicated Cache at Skyscanner** (https://medium.com/@SkyscannerEng/the-tradeoffs-of-a-replicated-cache-b6680c722f58)
* **Avoiding Cache Stampede at DoorDash** (https://blog.doordash.com/avoiding-cache-stampede-at-doordash-55bbf596d94b)
* **Location Caching with Quadtrees at Yext** (http://engblog.yext.com/post/geolocation-caching)
* **Video Metadata Caching at Vimeo** (https://medium.com/vimeo-engineering-blog/video-metadata-caching-at-vimeo-a54b25f0b304)
* **Scaling Redis at Twitter** (http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html)
* **Scaling Job Queue with Redis at Slack** (https://slack.engineering/scaling-slacks-job-queue-687222e9d100)
* **Moving persistent data out of Redis at GitHub** (https://githubengineering.com/moving-persistent-data-out-of-redis/)
* **Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram** (https://engineering.instagram.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c)
* **Redis at Trivago** (http://tech.trivago.com/2017/01/25/learn-redis-the-hard-way-in-production/)
* **Optimizing Redis Storage at Deliveroo** (https://deliveroo.engineering/2017/01/19/optimising-membership-queries.html)
* **Memory Optimization in Redis at Wattpad** (http://engineering.wattpad.com/post/23244724794/store-more-stuff-memory-optimization-in-redis)
* **Redis Fleet at Heroku** (https://blog.heroku.com/rolling-redis-fleet)
* **Solving Remote Build Cache Misses (2 parts) at SoundCloud** (https://developers.soundcloud.com/blog/gradle-remote-build-cache-misses-part-2)
* **Ratings & Reviews (2 parts) at Flipkart** (https://blog.flipkart.tech/ratings-reviews-flipkart-part-2-574ab08e75cf)
* **Prefetch Caching of Items at eBay** (https://tech.ebayinc.com/engineering/prefetch-caching-of-ebay-items/)
* **Cross-Region Caching Library at Wix** (https://www.wix.engineering/post/how-we-built-a-cross-region-caching-library)
* **Improving Distributed Caching Performance and Efficiency at Pinterest** (https://medium.com/pinterest-engineering/improving-distributed-caching-performance-and-efficiency-at-pinterest-92484b5fe39b)
* **Standardize and Improve Microservices Caching at DoorDash** (https://doordash.engineering/2023/10/19/how-doordash-standardized-and-improved-microservices-caching/)
* **HTTP Caching and CDN** (https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching)
* **Zynga Geo Proxy: Reducing Mobile Game Latency at Zynga** (https://www.zynga.com/blogs/engineering/zynga-geo-proxy-reducing-mobile-game-latency)
* **Google AMP at Condé Nast** (https://technology.condenast.com/story/the-why-and-how-of-google-amp-at-conde-nast)
* **A/B Tests on Hosting Infrastructure (CDNs) at Deliveroo** (https://deliveroo.engineering/2016/09/19/ab-testing-cdns.html)
* **HAProxy with Kubernetes for User-facing Traffic at SoundCloud** (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic)
* **Bandaid: Service Proxy at Dropbox** (https://blogs.dropbox.com/tech/2018/03/meet-bandaid-the-dropbox-service-proxy/)
* **Service Workers at Slack** (https://slack.engineering/service-workers-at-slack-our-quest-for-faster-boot-times-and-offline-support-3492cf79c88)
* **CDN Services at Spotify** (https://labs.spotify.com/2020/02/24/how-spotify-aligned-cdn-services-for-a-lightning-fast-streaming-experience/)
⟡ Distributed Locking (https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)
* **Chubby: Lock Service for Loosely Coupled Distributed Systems at Google** (https://blog.acolyer.org/2015/02/13/the-chubby-lock-service-for-loosely-coupled-distributed-systems/)
* **Distributed Locking at Uber** (https://www.youtube.com/watch?v=MDuagr729aU)
* **Distributed Locks using Redis at GoSquared** (https://engineering.gosquared.com/distributed-locks-using-redis)
* **ZooKeeper at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/zookeeper-at-twitter.html)
* **Eliminating Duplicate Queries using Distributed Locking at Chartio** (https://blog.chartio.com/posts/eliminating-duplicate-queries-using-distributed-locking)
⟡ Distributed Tracking, Tracing, and Measuring (https://www.oreilly.com/ideas/understanding-the-value-of-distributed-tracing)
* **Zipkin: Distributed Systems Tracing at Twitter** (https://blog.twitter.com/engineering/en_us/a/2012/distributed-systems-tracing-with-zipkin.html)
* **Improve Zipkin Traces using Kubernetes Pod Metadata at SoundCloud** (https://developers.soundcloud.com/blog/using-kubernetes-pod-metadata-to-improve-zipkin-traces)
* **Canopy: Scalable Distributed Tracing & Analysis at Facebook** (https://www.infoq.com/presentations/canopy-scalable-tracing-analytics-facebook)
* **Pintrace: Distributed Tracing at Pinterest** (https://medium.com/@Pinterest_Engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b)
* **XCMetrics: All-in-One Tool for Tracking Xcode Build Metrics at Spotify** (https://engineering.atspotify.com/2021/01/20/introducing-xcmetrics-our-all-in-one-tool-for-tracking-xcode-build-metrics/)
* **Real-time Distributed Tracing at LinkedIn** (https://engineering.linkedin.com/distributed-service-call-graph/real-time-distributed-tracing-website-performance-and-efficiency)
* **Tracking Service Infrastructure at Scale at Shopify** (https://www.usenix.org/conference/srecon17americas/program/presentation/arthorne)
* **Distributed Tracing at HelloFresh** (https://engineering.hellofresh.com/scaling-hellofresh-distributed-tracing-7b182928247d)
* **Analyzing Distributed Trace Data at Pinterest** (https://medium.com/@Pinterest_Engineering/analyzing-distributed-trace-data-6aae58919949)
* **Distributed Tracing at Uber** (https://eng.uber.com/distributed-tracing/)
* **JVM Profiler: Tracing Distributed JVM Applications at Uber** (https://eng.uber.com/jvm-profiler/)
* **Data Checking at Dropbox** (https://www.usenix.org/conference/srecon17asia/program/presentation/mah)
* **Tracing Distributed Systems at Showmax** (https://tech.showmax.com/2016/10/tracing-distributed-systems-at-showmax/)
* **osquery Across the Enterprise at Palantir** (https://medium.com/@palantir/osquery-across-the-enterprise-3c3c9d13ec55)
* **StatsD at Etsy** (https://codeascraft.com/2011/02/15/measure-anything-measure-everything/)
* **StatsD at DoorDash** (https://blog.doordash.com/scaling-statsd-84d456a7cc2a)
⟡ Distributed Scheduling (https://www.csee.umbc.edu/courses/graduate/CMSC621/fall02/lectures/ch11.pdf)
* **Distributed Task Scheduling (3 parts) at PagerDuty** (https://www.pagerduty.com/eng/distributed-task-scheduling-3/)
* **Building Cron at Google** (https://landing.google.com/sre/sre-book/chapters/distributed-periodic-scheduling/)
* **Distributed Cron Architecture at Quora** (https://engineering.quora.com/Quoras-Distributed-Cron-Architecture)
* **Chronos: A Replacement for Cron at Airbnb** (https://medium.com/airbnb-engineering/chronos-a-replacement-for-cron-f05d7d986a9d)
* **Scheduler at Nextdoor** (https://engblog.nextdoor.com/we-don-t-run-cron-jobs-at-nextdoor-6f7f9cc62040)
* **Peloton: Unified Resource Scheduler for Diverse Cluster Workloads at Uber** (https://eng.uber.com/peloton/)
* **Fenzo: OSS Scheduler for Apache Mesos Frameworks at Netflix** (https://medium.com/netflix-techblog/fenzo-oss-scheduler-for-apache-mesos-frameworks-5c340e77e543)
* **Airflow - Workflow Orchestration** (https://airflow.apache.org/)
* **Airflow at Airbnb** (https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8)
* **Airflow at Pandora** (https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee)
* **Airflow at Robinhood** (https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8)
* **Airflow at Lyft** (https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff)
* **Airflow at Drivy** (https://drivy.engineering/airflow-architecture/)
* **Airflow at Grab** (https://engineering.grab.com/experimentation-platform-data-pipeline)
* **Airflow at Adobe** (https://medium.com/adobetech/adobe-experience-platform-orchestration-service-with-apache-airflow-952203723c0b)
* **Auditing Airflow Job Runs at Walmart** (https://medium.com/walmartlabs/auditing-airflow-batch-jobs-73b45100045)
* **MaaT: DAG-based Distributed Task Scheduler at Alibaba** (https://hackernoon.com/meet-maat-alibabas-dag-based-distributed-task-scheduler-7c9cf0c83438)
* **boundary-layer: Declarative Airflow Workflows at Etsy** (https://codeascraft.com/2018/11/14/boundary-layer%e2%80%89-declarative-airflow-workflows/)
⟡ Distributed Monitoring and Alerting (https://www.oreilly.com/ideas/monitoring-distributed-systems)
* **Unicorn: Remediation System at eBay** (https://www.ebayinc.com/stories/blogs/tech/unicorn-rheos-remediation-center/)
* **M3: Metrics and Monitoring Platform at Uber** (https://eng.uber.com/optimizing-m3/)
* **Athena: Automated Build Health Management System at Dropbox** (https://blogs.dropbox.com/tech/2019/05/athena-our-automated-build-health-management-system/)
* **Vortex: Monitoring Server Applications at Dropbox** (https://blogs.dropbox.com/tech/2019/11/monitoring-server-applications-with-vortex/)
* **Nuage: Cloud Management Service at LinkedIn** (https://engineering.linkedin.com/blog/2019/solving-manageability-challenges-with-nuage)
* **Telltale: Application Monitoring at Netflix** (https://netflixtechblog.com/telltale-netflix-application-monitoring-simplified-5c08bfa780ba)
* **ThirdEye: Monitoring Platform at LinkedIn** (https://engineering.linkedin.com/blog/2019/06/smart-alerts-in-thirdeye--linkedins-real-time-monitoring-platfor)
* **Periskop: Exception Monitoring Service at SoundCloud** (https://developers.soundcloud.com/blog/periskop-exception-monitoring-service)
* **Securitybot: Distributed Alerting Bot at Dropbox** (https://blogs.dropbox.com/tech/2017/02/meet-securitybot-open-sourcing-automated-security-at-scale/)
* **Monitoring System at Alibaba** (https://www.usenix.org/conference/srecon18asia/presentation/xinchi)
* **Real User Monitoring at Dailymotion** (https://medium.com/dailymotion/real-user-monitoring-1948375f8be5)
* **Alerting Ecosystem at Uber** (https://eng.uber.com/observability-at-scale/)
* **Alerting Framework at Airbnb** (https://medium.com/airbnb-engineering/alerting-framework-at-airbnb-35ba48df894f)
* **Alerting on Service-Level Objectives (SLOs) at SoundCloud** (https://developers.soundcloud.com/blog/alerting-on-slos)
* **Job-based Forecasting Workflow for Observability Anomaly Detection at Uber** (https://eng.uber.com/observability-anomaly-detection/)
* **Monitoring and Alert System using Graphite and Cabot at HackerEarth** (http://engineering.hackerearth.com/2017/03/21/monitoring-and-alert-system-using-graphite-and-cabot/)
* **Observability (2 parts) at Twitter** (https://blog.twitter.com/engineering/en_us/a/2016/observability-at-twitter-technical-overview-part-ii.html)
* **Distributed Security Alerting at Slack** (https://slack.engineering/distributed-security-alerting-c89414c992d6)
* **Real-Time News Alerting at Bloomberg** (https://www.infoq.com/presentations/news-alerting-bloomberg)
* **Data Pipeline Monitoring System at LinkedIn** (https://engineering.linkedin.com/blog/2019/an-inside-look-at-linkedins-data-pipeline-monitoring-system-)
* **Monitoring and Observability at Picnic** (https://blog.picnic.nl/monitoring-and-observability-at-picnic-684cefd845c4)
⟡ Distributed Security (https://msdn.microsoft.com/en-us/library/cc767123.aspx)
* **Approach to Security at Scale at Dropbox** (https://blogs.dropbox.com/tech/2018/02/security-at-scale-the-dropbox-approach/)
* **Aardvark and Repokid: AWS Least Privilege for Distributed, High-Velocity Development at Netflix** (https://medium.com/netflix-techblog/introducing-aardvark-and-repokid-53b081bf3a7e)
* **LISA: Distributed Firewall at LinkedIn** (https://www.slideshare.net/MikeSvoboda/2017-lisa-linkedins-distributed-firewall-dfw)
* **Secure Infrastructure To Store Bitcoin In The Cloud at Coinbase** (https://engineering.coinbase.com/how-coinbase-builds-secure-infrastructure-to-store-bitcoin-in-the-cloud-30a6504e40ba)
* **BinaryAlert: Real-time Serverless Malware Detection at Airbnb** (https://medium.com/airbnb-engineering/binaryalert-real-time-serverless-malware-detection-ca44370c1b90)
* **Scalable IAM Architecture to Secure Access to 100 AWS Accounts at Segment** (https://segment.com/blog/secure-access-to-100-aws-accounts/)
* **OAuth Audit Toolbox at Indeed** (http://engineering.indeedblog.com/blog/2018/04/oaudit-toolbox/)
* **Active Directory Password Blacklisting at Yelp** (https://engineeringblog.yelp.com/2018/04/ad-password-blacklisting.html)
* **Syscall Auditing at Scale at Slack** (https://slack.engineering/syscall-auditing-at-scale-e6a3ca8ac1b8)
* **Athenz: Fine-Grained, Role-Based Access Control at Yahoo** (https://yahooeng.tumblr.com/post/160481899076/open-sourcing-athenz-fine-grained-role-based)
* **WebAuthn Support for Secure Sign In at Dropbox** (https://blogs.dropbox.com/tech/2018/05/introducing-webauthn-support-for-secure-dropbox-sign-in/)
* **Security Development Lifecycle at Slack** (https://slack.engineering/moving-fast-and-securing-things-540e6c5ae58a)
* **Unprivileged Container Builds at Kinvolk** (https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/)
* **Diffy: Differencing Engine for Digital Forensics in the Cloud at Netflix** (https://medium.com/netflix-techblog/netflix-sirt-releases-diffy-a-differencing-engine-for-digital-forensics-in-the-cloud-37b71abd2698)
* **Detecting Credential Compromise in AWS at Netflix** (https://medium.com/netflix-techblog/netflix-cloud-security-detecting-credential-compromise-in-aws-9493d6fd373a)
* **Scalable User Privacy at Spotify** (https://labs.spotify.com/2018/09/18/scalable-user-privacy/)
* **AVA: Audit Web Applications at Indeed** (https://engineering.indeedblog.com/blog/2018/09/application-scanning/)
* **TTL as a Service: Automatic Revocation of Stale Privileges at Yelp** (https://engineeringblog.yelp.com/2018/11/ttl-as-a-service.html)
* **Enterprise Key Management at Slack** (https://slack.engineering/engineering-dive-into-slack-enterprise-key-management-1fce471b178c)
* **Scalability and Authentication at Twitch** (https://blog.twitch.tv/en/2019/03/15/how-twitch-addresses-scalability-and-authentication/)
* **Edge Authentication and Token-Agnostic Identity Propagation at Netflix** (https://netflixtechblog.com/edge-authentication-and-token-agnostic-identity-propagation-514e47e0b602)
* **Hardening Kubernetes Infrastructure with Cilium at Palantir** (https://blog.palantir.com/hardening-palantirs-kubernetes-infrastructure-with-cilium-1c40d4c7ef0)
* **Improving Web Vulnerability Management through Automation at Lyft** (https://eng.lyft.com/improving-web-vulnerability-management-through-automation-2631570d8415)
* **Clock Skew when Syncing Password Payloads at Dropbox** (https://dropbox.tech/application/dropbox-passwords-clock-skew-payload-sync-merge)
⟡ Distributed Messaging, Queuing, and Event Streaming (https://arxiv.org/pdf/1704.00411.pdf)
* **Cape: Event Stream Processing Framework at Dropbox** (https://blogs.dropbox.com/tech/2017/05/introducing-cape/)
* **Brooklin: Distributed Service for Near Real-Time Data Streaming at LinkedIn** (https://engineering.linkedin.com/blog/2019/brooklin-open-source)
* **Samza: Stream Processing System for Latency Insights at LinkedIn** (https://engineering.linkedin.com/blog/2018/04/samza-aeon--latency-insights-for-asynchronous-one-way-flows)
* **Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo** (https://yahooeng.tumblr.com/post/161855616651/open-sourcing-bullet-yahoos-forward-looking)
* **EventHorizon: Tool for Watching Events Streaming at Etsy** (https://codeascraft.com/2018/05/29/the-eventhorizon-saga/)
* **Qmessage: Distributed, Asynchronous Task Queue at Quora** (https://engineering.quora.com/Qmessage-Handling-Billions-of-Tasks-Per-Day)
* **Cherami: Message Queue System for Transporting Async Tasks at Uber** (https://eng.uber.com/cherami/)
* **Dynein: Distributed Delayed Job Queueing System at Airbnb** (https://medium.com/airbnb-engineering/dynein-building-a-distributed-delayed-job-queueing-system-93ab10f05f99)
* **Timestone: Queueing System for Non-Parallelizable Workloads at Netflix** (https://netflixtechblog.com/timestone-netflixs-high-throughput-low-latency-priority-queueing-system-with-built-in-support-1abf249ba95f)
* **Messaging Service at Riot Games** (https://engineering.riotgames.com/news/riot-messaging-service)
* **Debugging Production with Event Logging at Zillow** (https://www.zillow.com/engineering/debugging-production-event-logging/)
* **Cross-platform In-app Messaging Orchestration Service at Netflix** (https://medium.com/netflix-techblog/building-a-cross-platform-in-app-messaging-orchestration-service-86ba614f92d8)
* **Video Gatekeeper at Netflix** (https://medium.com/netflix-techblog/re-architecting-the-video-gatekeeper-f7b0ac2f6b00)
* **Scaling Push Messaging for Millions of Devices at Netflix** (https://www.infoq.com/presentations/neflix-push-messaging-scale)
* **Delaying Asynchronous Message Processing with RabbitMQ at Indeed** (http://engineering.indeedblog.com/blog/2017/06/delaying-messages/)
* **Benchmarking Streaming Computation Engines at Yahoo** (https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at)
* **Improving Stream Data Quality With Protobuf Schema Validation at Deliveroo** (https://deliveroo.engineering/2019/02/05/improving-stream-data-quality-with-protobuf-schema-validation.html)
* **Scaling Email Infrastructure at Medium** (https://medium.engineering/scaling-email-infrastructure-for-medium-digest-254223c883b8)
* **Real-time Messaging at Slack** (https://slack.engineering/real-time-messaging/)
* **Event Stream Database at Nike** (https://medium.com/nikeengineering/moving-faster-with-aws-by-creating-an-event-stream-database-dedec8ca3eeb)
* **Event Tracking System at Udemy** (https://medium.com/udemy-engineering/designing-the-new-event-tracking-system-at-udemy-a45e502216fd)
* **Event-Driven Messaging** (https://martinfowler.com/articles/201701-event-driven.html)
* **Domain-Driven Design at Alibaba** (https://medium.com/swlh/creating-coding-excellence-with-domain-driven-design-88f73d2232c3)
* **Domain-Driven Design at Weebly** (https://medium.com/weebly-engineering/how-to-organize-your-monolith-before-breaking-it-into-services-69cbdb9248b0)
* **Domain-Driven Design at Moonpig** (https://engineering.moonpig.com/development/modelling-for-domain-driven-design)
* **Scaling Event Sourcing for Netflix Downloads** (https://www.infoq.com/presentations/netflix-scale-event-sourcing)
* **Scaling Event-Sourcing at Jet.com** (https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c873cac33b8)
* **Event Sourcing (2 parts) at eBay** (https://www.ebayinc.com/stories/blogs/tech/event-sourcing-in-action-with-ebays-continuous-delivery-team/)
* **Event Sourcing at FREE NOW** (https://medium.com/inside-freenow/event-sourcing-an-evolutionary-perspective-31e7387aa6f1)
* **Scalable content feed using Event Sourcing and CQRS patterns at Brainly** (https://medium.com/engineering-brainly/scalable-content-feed-using-event-sourcing-and-cqrs-patterns-e09df98bf977)
* **Pub-Sub Messaging** (https://aws.amazon.com/pub-sub-messaging/)
* **Pulsar: Pub-Sub Messaging at Scale at Yahoo** (https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale)
* **Wormhole: Pub-Sub System at Facebook** (https://code.facebook.com/posts/188966771280871/wormhole-pub-sub-system-moving-data-through-space-and-time/)
* **MemQ: Cloud Native Pub-Sub System at Pinterest** (https://medium.com/pinterest-engineering/memq-an-efficient-scalable-cloud-native-pubsub-system-4402695dd4e7)
* **Pub-Sub in Microservices at Netflix** (https://medium.com/netflix-techblog/how-netflix-microservices-tackle-dataset-pub-sub-4a068adcc9a)
* **Kafka - Message Broker** (https://martin.kleppmann.com/papers/kafka-debull15.pdf)
* **Kafka at LinkedIn** (https://engineering.linkedin.com/kafka/running-kafka-scale)
* **Kafka at Pinterest** (https://medium.com/pinterest-engineering/how-pinterest-runs-kafka-at-scale-ff9c6f735be)
* **Kafka at Trello** (https://tech.trello.com/why-we-chose-kafka/)
* **Kafka at Salesforce** (https://engineering.salesforce.com/how-apache-kafka-inspired-our-platform-events-architecture-2f351fe4cf63)
* **Kafka at The New York Times** (https://open.nytimes.com/publishing-with-apache-kafka-at-the-new-york-times-7f0e3b7d2077)
* **Kafka at Yelp** (https://engineeringblog.yelp.com/2016/07/billions-of-messages-a-day-yelps-real-time-data-pipeline.html)
* **Kafka at Criteo** (https://medium.com/criteo-labs/upgrading-kafka-on-a-large-infra-3ee99f56e970)
* **Kafka on Kubernetes at Shopify** (https://shopifyengineering.myshopify.com/blogs/engineering/running-apache-kafka-on-kubernetes-at-shopify)
* **Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (2 parts)** (https://engineeringblog.yelp.com/2022/03/kafka-on-paasta-part-two.html)
* **Migrating Kafka's Zookeeper with No Downtime at Yelp** (https://engineeringblog.yelp.com/2019/01/migrating-kafkas-zookeeper-with-no-downtime.html)
* **Reprocessing and Dead Letter Queues with Kafka at Uber** (https://eng.uber.com/reliable-reprocessing/)
* **Chaperone: Audit Kafka End-to-End at Uber** (https://eng.uber.com/chaperone/)
* **Finding Kafka throughput limit in infrastructure at Dropbox** (https://blogs.dropbox.com/tech/2019/01/finding-kafkas-throughput-limit-in-dropbox-infrastructure/)
* **Cost Orchestration at Walmart** (https://medium.com/walmartlabs/cost-orchestration-at-walmart-f34918af67c4)
* **InfluxDB and Kafka to Scale to Over 1 Million Metrics a Second at Hulu** (https://medium.com/hulu-tech-blog/how-hulu-uses-influxdb-and-kafka-to-scale-to-over-1-million-metrics-a-second-1721476aaff5)
* **Scaling Kafka to Support Data Growth at PayPal** (https://medium.com/paypal-tech/scaling-kafka-to-support-paypals-data-growth-a0b4da420fab)
* **Stream Data Deduplication** (https://en.wikipedia.org/wiki/Data_deduplication)
* **Exactly-once Semantics with Kafka** (https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/)
* **Real-time Deduping at Tapjoy** (http://eng.tapjoy.com/blog-list/real-time-deduping-at-scale)
* **Deduplication at Segment** (https://segment.com/blog/exactly-once-delivery/)
* **Deduplication at Mail.Ru** (https://medium.com/@andrewsumin/efficient-storage-how-we-went-down-from-50-pb-to-32-pb-99f9c61bf6b4)
* **Petabyte Scale Data Deduplication at Mixpanel** (https://medium.com/mixpaneleng/petabyte-scale-data-deduplication-mixpanel-engineering-e808c70c99f8)
⟡ Distributed Logging (https://blog.codinghorror.com/the-problem-with-logging/)
* **Logging at LinkedIn** (https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)
* **Scalable and Reliable Log Ingestion at Pinterest** (https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754)
* **High-performance Replicated Log Service at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2015/building-distributedlog-twitter-s-high-performance-replicated-log-servic.html)
* **Logging Service with Spark at CERN Accelerator** (https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html)
* **Logging and Aggregation at Quora** (https://engineering.quora.com/Logging-and-Aggregation-at-Quora)
* **Collection and Analysis of Daemon Logs at Badoo** (https://badoo.com/techblog/blog/2016/06/06/collection-and-analysis-of-daemon-logs-at-badoo/)
* **Log Parsing with Static Code Analysis at Palantir** (https://medium.com/palantir/using-static-code-analysis-to-improve-log-parsing-18f0d1843965)
* **Centralized Application Logging at eBay** (https://tech.ebayinc.com/engineering/low-latency-and-high-throughput-cal-ingress/)
* **Enrich VPC Flow Logs at Hyper Scale to provide Network Insight at Netflix** (https://netflixtechblog.com/hyper-scale-vpc-flow-logs-enrichment-to-provide-network-insight-e5f1db02910d)
* **BookKeeper: Distributed Log Storage at Yahoo** (https://yahooeng.tumblr.com/post/109908973316/bookkeeper-yahoos-distributed-log-storage-is)
* **LogDevice: Distributed Data Store for Logs at Facebook** (https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/)
* **LogFeeder: Log Collection System at Yelp** (https://engineeringblog.yelp.com/2018/03/introducing-logfeeder.html)
* **DBLog: Generic Change-Data-Capture Framework at Netflix** (https://medium.com/netflix-techblog/dblog-a-generic-change-data-capture-framework-69351fb9099b)
⟡ Distributed Searching (http://nwds.cs.washington.edu/files/nwds/pdf/Distributed-WR.pdf)
* **Search Architecture at Instagram** (https://instagram-engineering.com/search-architecture-eeb34a936d3a)
* **Search Architecture at eBay** (http://www.cs.otago.ac.nz/homepages/andrew/papers/2017-8.pdf)
* **Search Architecture at Box** (https://medium.com/box-tech-blog/scaling-box-search-using-lumos-22d9e0cb4175)
* **Search Discovery Indexing Platform at Coupang** (https://medium.com/coupang-tech/the-evolution-of-search-discovery-indexing-platform-fa43e41305f9)
* **Universal Search System at Pinterest** (https://medium.com/pinterest-engineering/building-a-universal-search-system-for-pinterest-e4cb03a898d4)
* **Improving Search Engine Efficiency by over 25% at eBay** (https://www.ebayinc.com/stories/blogs/tech/making-e-commerce-search-faster/)
* **Indexing and Querying Telemetry Logs with Lucene at Palantir** (https://medium.com/palantir/indexing-and-querying-telemetry-logs-with-lucene-234c5ce3e5f3)
* **Query Understanding at TripAdvisor** (https://www.tripadvisor.com/engineering/query-understanding-at-tripadvisor/)
* **Search Federation Architecture at LinkedIn (2018)** (https://engineering.linkedin.com/blog/2018/03/search-federation-architecture-at-linkedin)
* **Search at Slack** (https://slack.engineering/search-at-slack-431f8c80619e)
* **Search and Recommendations at DoorDash** (https://blog.doordash.com/powering-search-recommendations-at-doordash-8310c5cfd88c)
* **Stability and Scalability for Search at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2022/stability-and-scalability-for-search)
* **Search Service at Twitter (2014)** (https://blog.twitter.com/engineering/en_us/a/2014/building-a-complete-tweet-index.html)
* **Autocomplete Search (2 parts) at Traveloka** (https://medium.com/traveloka-engineering/high-quality-autocomplete-search-part-2-d5b15bb0dadf)
* **Data-Driven Autocorrection System at Canva** (https://product.canva.com/building-a-data-driven-autocorrection-system/)
* **Adapting Search to Indian Phonetics at Flipkart** (https://blog.flipkart.tech/adapting-search-to-indian-phonetics-cdbe65259686)
* **Nautilus: Search Engine at Dropbox** (https://blogs.dropbox.com/tech/2018/09/architecture-of-nautilus-the-new-dropbox-search-engine/)
* **Galene: Search Architecture of LinkedIn** (https://engineering.linkedin.com/search/did-you-mean-galene)
* **Manas: High Performing Customized Search System at Pinterest** (https://medium.com/@Pinterest_Engineering/manas-a-high-performing-customized-search-system-cf189f6ca40f)
* **Sherlock: Near Real Time Search Indexing at Flipkart** (https://blog.flipkart.tech/sherlock-near-real-time-search-indexing-95519783859d)
* **Nebula: Storage Platform to Build Search Backends at Airbnb** (https://medium.com/airbnb-engineering/nebula-as-a-storage-platform-to-build-airbnbs-search-backends-ecc577b05f06)
* **ELK (Elasticsearch, Logstash, Kibana) Stack** (https://logz.io/blog/15-tech-companies-chose-elk-stack/)
* **Predictions in Real Time with ELK at Uber** (https://eng.uber.com/elk/)
* **Building a Scalable ELK Stack at Envato** (https://webuild.envato.com/blog/building-a-scalable-elk-stack/)
* **ELK at Robinhood** (https://robinhood.engineering/taming-elk-4e1349f077c3)
* **Scaling Elasticsearch Clusters at Uber** (https://www.infoq.com/presentations/uber-elasticsearch-clusters?utm_source=presentations_about_Case_Study&utm_medium=link&utm_campaign=Case_Study)
* **Elasticsearch Performance Tuning Practice at eBay** (https://www.ebayinc.com/stories/blogs/tech/elasticsearch-performance-tuning-practice-at-ebay/)
* **Improve Performance using Elasticsearch Plugins (2 parts) at Tinder** (https://medium.com/tinder-engineering/how-we-improved-our-performance-using-elasticsearch-plugins-part-2-b051da2ee85b)
* **Elasticsearch at Kickstarter** (https://kickstarter.engineering/elasticsearch-at-kickstarter-db3c487887fc)
* **Log Parsing with Logstash and Google Protocol Buffers at Trivago** (https://tech.trivago.com/2016/01/19/logstash_protobuf_codec/)
* **Fast Order Search using Data Pipeline and Elasticsearch at Yelp** (https://engineeringblog.yelp.com/2018/06/fast-order-search.html)
* **Moving Core Business Search to Elasticsearch at Yelp** (https://engineeringblog.yelp.com/2017/06/moving-yelps-core-business-search-to-elasticsearch.html)
* **Sharding out Elasticsearch at Vinted** (http://engineering.vinted.com/2017/06/05/sharding-out-elasticsearch/)
* **Self-Ranking Search with Elasticsearch at Wattpad** (http://engineering.wattpad.com/post/146216619727/self-ranking-search-with-elasticsearch-at-wattpad)
* **Vulcanizer: a library for operating Elasticsearch at GitHub** (https://github.blog/2019-03-05-vulcanizer-a-library-for-operating-elasticsearch/)
⟡ Distributed Storage (http://highscalability.com/blog/2011/11/1/finding-the-right-data-solution-for-your-application-in-the.html)
* **In-memory Storage** (https://medium.com/@denisanikin/what-an-in-memory-database-is-and-how-it-persists-data-efficiently-f43868cff4c1)
* **MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) And Familiar (SQL)** (http://highscalability.com/blog/2012/8/14/memsql-architecture-the-fast-mvcc-inmem-lockfree-codegen-and.html)
* **Optimizing Memcached Efficiency at Quora** (https://engineering.quora.com/Optimizing-Memcached-Efficiency)
* **Real-Time Data Warehouse with MemSQL on Cisco UCS** (https://blogs.cisco.com/datacenter/memsql)
* **Moving to MemSQL at Tapjoy** (http://eng.tapjoy.com/blog-list/moving-to-memsql)
* **MemSQL and Kinesis for Real-time Insights at Disney** (https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/68131)
* **MemSQL to Query Hundreds of Billions of Rows in a Dashboard at Pandora** (https://engineering.pandora.com/using-memsql-at-pandora-79a86cb09b57)
* **Object Storage** (http://www.datacenterknowledge.com/archives/2013/10/04/object-storage-the-future-of-scale-out)
* **Scaling HDFS at Uber** (https://eng.uber.com/scaling-hdfs/)
* **Reasons for Choosing S3 over HDFS at Databricks** (https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html)
* **File System on Amazon S3 at Quantcast** (https://www.quantcast.com/blog/quantcast-file-system-on-amazon-s3/)
* **Image Recovery at Scale Using S3 Versioning at Trivago** (https://tech.trivago.com/2018/09/03/efficient-image-recovery-at-scale-using-amazon-s3-versioning/)
* **Cloud Object Store at Yahoo** (https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at)
* **Ambry: Distributed Immutable Object Store at LinkedIn** (https://www.usenix.org/conference/srecon17americas/program/presentation/shenoy)
* **Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity at LinkedIn** (https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum)
* **Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb** (https://medium.com/airbnb-engineering/hammerspace-persistent-concurrent-off-heap-storage-3db39bb04472)
* **MezzFS: Mounting Object Storage in Media Processing Platform at Netflix** (https://medium.com/netflix-techblog/mezzfs-mounting-object-storage-in-netflixs-media-processing-platform-cda01c446ba)
* **Magic Pocket: In-house Multi-exabyte Storage System at Dropbox** (https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pocket/)
⟡ Relational Databases (https://www.mysql.com/products/cluster/scalability.html)
* **Building and Deploying MySQL Raft at Meta** (https://engineering.fb.com/2023/05/16/data-infrastructure/mysql-raft-meta/)
* **MySQL for Schema-less Data at FriendFeed** (https://backchannel.org/blog/friendfeed-schemaless-mysql)
* **MySQL at Pinterest** (https://medium.com/@Pinterest_Engineering/learn-to-stop-using-shiny-new-things-and-love-mysql-3e1613c2ce14)
* **PostgreSQL at Twitch** (https://blog.twitch.tv/how-twitch-uses-postgresql-c34aa9e56f58)
* **Scaling MySQL-based Financial Reporting System at Airbnb** (https://medium.com/airbnb-engineering/tracking-the-money-scaling-financial-reporting-at-airbnb-6d742b80f040)
* **Scaling MySQL at Wix** (https://www.wix.engineering/post/scaling-to-100m-mysql-is-a-better-nosql)
* **MaxScale (MySQL) Database Proxy at Airbnb** (https://medium.com/airbnb-engineering/unlocking-horizontal-scalability-in-our-web-serving-tier-d907449cdbcf)
* **Switching from Postgres to MySQL at Uber** (https://eng.uber.com/mysql-migration/)
* **Handling Growth with Postgres at Instagram** (https://engineering.instagram.com/handling-growth-with-postgres-5-tips-from-instagram-d5d7e7ffdfcb)
* **Scaling the Analytics Database (Postgres) at TransferWise** (http://tech.transferwise.com/scaling-our-analytics-database/)
* **Updating a 50 Terabyte PostgreSQL Database at Adyen** (https://medium.com/adyen/updating-a-50-terabyte-postgresql-database-f64384b799e7)
* **Scaling Database Access for 100s of Billions of Queries per Day at PayPal** (https://medium.com/paypal-engineering/scaling-database-access-for-100s-of-billions-of-queries-per-day-paypal-introducing-hera-e192adacda54)
* **Minimizing Read-Write MySQL Downtime at Yelp** (https://engineeringblog.yelp.com/2020/11/minimizing-read-write-mysql-downtime.html)
* **Migrating MySQL from 5.6 to 8.0 at Facebook** (https://engineering.fb.com/2021/07/22/data-infrastructure/mysql/)
* **Migration from HBase to MyRocks at Quora** (https://quoraengineering.quora.com/Migration-from-HBase-to-MyRocks-at-Quora)
* **Replication** (https://docs.microsoft.com/en-us/sql/relational-databases/replication/types-of-replication)
* **MySQL Parallel Replication (4 parts) at Booking.com** (https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-4-annex-under-the-hood-eb456cf8b2fb)
* **Mitigating MySQL Replication Lag and Reducing Read Load at GitHub** (https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/)
* **Read Consistency with Database Replicas at Shopify** (https://shopify.engineering/read-consistency-database-replicas)
* **Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift at Yelp** (https://engineeringblog.yelp.com/2018/04/black-box-auditing.html)
* **Partitioning Main MySQL Database at Airbnb** (https://medium.com/airbnb-engineering/how-we-partitioned-airbnb-s-main-database-in-two-weeks-55f7e006ff21)
* **Herb: Multi-DC Replication Engine for Schemaless Datastore at Uber** (https://eng.uber.com/herb-datacenter-replication/)
* **Sharding** (https://quabase.sei.cmu.edu/mediawiki/index.php/Shard_data_set_across_multiple_servers_(Range-based))
* **Sharding MySQL at Pinterest** (https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f)
* **Sharding MySQL at Twilio** (https://www.twilio.com/engineering/2014/06/26/how-we-replaced-our-data-pipeline-with-zero-downtime)
* **Sharding MySQL at Square** (https://medium.com/square-corner-blog/sharding-cash-10280fa3ef3b)
* **Sharding MySQL at Quora** (https://www.quora.com/q/quoraengineering/MySQL-sharding-at-Quora)
* **Sharding Layer of Schemaless Datastore at Uber** (https://eng.uber.com/schemaless-rewrite/)
* **Sharding & IDs at Instagram** (https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c)
* **Sharding Postgres at Notion** (https://www.notion.so/blog/sharding-postgres-at-notion)
* **Solr: Improving Performance for Batch Indexing at Box** (https://blog.box.com/blog/solr-improving-performance-batch-indexing/)
* **Geosharded Recommendations (3 parts) at Tinder** (https://medium.com/tinder-engineering/geosharded-recommendations-part-3-consistency-2d2cb2f0594b)
* **Scaling Services with Shard Manager at Facebook** (https://engineering.fb.com/production-engineering/scaling-services-with-shard-manager/)
* **Presto the Distributed SQL Query Engine** (https://research.fb.com/wp-content/uploads/2019/03/Presto-SQL-on-Everything.pdf?)
* **Presto at Pinterest** (https://medium.com/@Pinterest_Engineering/presto-at-pinterest-a8bda7515e52)
* **Presto Infrastructure at Lyft** (https://eng.lyft.com/presto-infrastructure-at-lyft-b10adb9db01)
* **Presto at Grab** (https://engineering.grab.com/scaling-like-a-boss-with-presto)
* **Engineering Data Analytics with Presto and Apache Parquet at Uber** (https://eng.uber.com/presto/)
* **Data Wrangling at Slack** (https://slack.engineering/data-wrangling-at-slack-f2e0ff633b69)
* **Presto in Big Data Platform on AWS at Netflix** (https://medium.com/netflix-techblog/using-presto-in-our-big-data-platform-on-aws-938035909fd4)
* **Presto Auto Scaling at Eventbrite** (https://www.eventbrite.com/engineering/big-data-workloads-presto-auto-scaling/)
* **Speed Up Presto with Alluxio Local Cache at Uber** (https://www.uber.com/en-MY/blog/speed-up-presto-with-alluxio-local-cache/)
⟡ NoSQL Databases (https://www.thoughtworks.com/insights/blog/nosql-databases-overview)
* **Key-Value Databases** (http://www.cs.ucsb.edu/~agrawal/fall2009/dynamo.pdf)
* **DynamoDB at Nike** (https://medium.com/nikeengineering/becoming-a-nimble-giant-how-dynamo-db-serves-nike-at-scale-4cc375dbb18e)
* **DynamoDB at Segment** (https://segment.com/blog/the-million-dollar-eng-problem/)
* **DynamoDB at Mapbox** (https://blog.mapbox.com/scaling-mapbox-infrastructure-with-dynamodb-streams-d53eabc5e972)
* **Manhattan: Distributed Key-Value Database at Twitter** (https://blog.twitter.com/engineering/en_us/a/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale.html)
* **Sherpa: Distributed NoSQL Key-Value Store at Yahoo** (https://yahooeng.tumblr.com/post/120730204806/sherpa-scales-new-heights)
* **HaloDB: Embedded Key-Value Storage Engine at Yahoo** (https://yahooeng.tumblr.com/post/178262468576/introducing-halodb-a-fast-embedded-key-value)
* **MPH: Fast and Compact Immutable Key-Value Stores at Indeed** (http://engineering.indeedblog.com/blog/2018/02/indeed-mph/)
* **Venice: Distributed Key-Value Database at LinkedIn** (https://engineering.linkedin.com/blog/2017/02/building-venice-with-apache-helix)
* **Columnar Databases** (https://aws.amazon.com/nosql/columnar/)
* **Cassandra** (http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf)
* **Cassandra at Instagram** (https://www.slideshare.net/DataStax/cassandra-at-instagram-2016)
* **Storing Images in Cassandra at Walmart** (https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593)
* **Storing Messages with Cassandra at Discord** (https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7)
* **Scaling Cassandra Cluster at Walmart** (https://medium.com/walmartlabs/avoid-pitfalls-in-scaling-your-cassandra-cluster-lessons-and-remedies-a71ca01f8c04)
* **Scaling Ad Analytics with Cassandra at Yelp** (https://engineeringblog.yelp.com/2016/08/how-we-scaled-our-ad-analytics-with-cassandra.html)
* **Scaling to 100+ Million Reads/Writes using Spark and Cassandra at Dream11** (https://medium.com/dream11-tech-blog/leaderboard-dream11-4efc6f93c23e)
* **Moving Food Feed from Redis to Cassandra at Zomato** (https://www.zomato.com/blog/how-we-moved-our-food-feed-from-redis-to-cassandra)
* **Benchmarking Cassandra Scalability on AWS at Netflix** (https://medium.com/netflix-techblog/benchmarking-cassandra-scalability-on-aws-over-a-million-writes-per-second-39f45f066c9e)
* **Service Decomposition at Scale with Cassandra at Intuit QuickBooks** (https://quickbooks-engineering.intuit.com/service-decomposition-at-scale-70405ac2f637)
* **Cassandra for Keeping Counts In Sync at SoundCloud** (https://developers.soundcloud.com/blog/keeping-counts-in-sync)
* **Cassandra Driver Configuration for Improved Performance and Load Balancing at Glassdoor** (https://medium.com/glassdoor-engineering/cassandra-driver-configuration-for-improved-performance-and-load-balancing-1b0106ce12bb)
* **cstar: Cassandra Orchestration Tool at Spotify** (https://labs.spotify.com/2018/09/04/introducing-cstar-the-spotify-cassandra-orchestration-tool-now-open-source/)
* **HBase** (https://hbase.apache.org/)
* **HBase at Salesforce** (https://engineering.salesforce.com/investing-in-big-data-apache-hbase-b9d98661a66b)
* **HBase in Facebook Messages** (https://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919/)
* **HBase in Imgur Notification** (https://blog.imgur.com/2015/09/15/tech-tuesday-imgur-notifications-from-mysql-to-hbase/)
* **Improving HBase Backup Efficiency at Pinterest** (https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954)
* **HBase at Xiaomi** (https://www.slideshare.net/HBaseCon/hbase-practice-at-xiaomi)
* **Redshift** (https://www.allthingsdistributed.com/2018/11/amazon-redshift-performance-optimization.html)
* **Redshift at GIPHY** (https://engineering.giphy.com/scaling-redshift-without-scaling-costs/)
* **Redshift at Hudl** (https://www.hudl.com/bits/the-low-hanging-fruit-of-redshift-performance)
* **Redshift at Drivy** (https://drivy.engineering/redshift_tips_ticks_part_1/)
* **Document Databases** (https://msdn.microsoft.com/en-us/magazine/hh547103.aspx)
* **eBay: Building Mission-Critical Multi-Data Center Applications with MongoDB** (https://www.mongodb.com/blog/post/ebay-building-mission-critical-multi-data-center-applications-with-mongodb)
* **MongoDB at Baidu: Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards** (https://www.mongodb.com/blog/post/mongodb-at-baidu-powering-100-apps-across-600-nodes-at-pb-scale)
* **Migrating Mongo Data at Addepar** (https://medium.com/build-addepar/migrating-mountains-of-mongo-data-63e530539952)
* **The AWS and MongoDB Infrastructure of Parse (acquired by Facebook)** (https://medium.baqend.com/parse-is-gone-a-few-secrets-about-their-infrastructure-91b3ab2fcf71)
* **Couchbase Ecosystem at LinkedIn** (https://engineering.linkedin.com/blog/2017/12/couchbase-ecosystem-at-linkedin)
* **SimpleDB at Zendesk** (https://medium.com/zendesk-engineering/resurrecting-amazon-simpledb-9404034ec506)
* **Espresso: Distributed Document Store at LinkedIn** (https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store)
* **Graph Databases** (https://www.eecs.harvard.edu/margo/papers/systor13-bench/)
* **FlockDB: Distributed Graph Database at Twitter** (https://blog.twitter.com/engineering/en_us/a/2010/introducing-flockdb.html)
* **TAO: Distributed Data Store for the Social Graph at Facebook** (https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/11730-atc13-bronson.pdf)
* **Akutan: Distributed Knowledge Graph Store at eBay** (https://tech.ebayinc.com/engineering/akutan-a-distributed-knowledge-graph-store/)
⟡ Time Series Databases (https://www.influxdata.com/time-series-database/)
* **Beringei: High-performance Time Series Storage Engine at Facebook** (https://code.facebook.com/posts/952820474848503/beringei-a-high-performance-time-series-storage-engine/)
* **MetricsDB: Time Series Database for Storing Metrics at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/metricsdb.html)
* **Atlas: In-memory Dimensional Time Series Database at Netflix** (https://medium.com/netflix-techblog/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a)
* **Heroic: Time Series Database at Spotify** (https://labs.spotify.com/2015/11/17/monitoring-at-spotify-introducing-heroic/)
* **Roshi: Distributed Storage System for Time-Series Event at SoundCloud** (https://developers.soundcloud.com/blog/roshi-a-crdt-system-for-timestamped-events)
* **Goku: Time Series Database at Pinterest** (https://medium.com/@Pinterest_Engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181)
* **Scaling Time Series Data Storage (2 parts) at Netflix** (https://medium.com/netflix-techblog/scaling-time-series-data-storage-part-ii-d67939655586)
* **Druid - Real-time Analytics Database** (https://druid.apache.org/)
* **Druid at Airbnb** (https://medium.com/airbnb-engineering/druid-airbnb-data-platform-601c312f2a4c)
* **Druid at Walmart** (https://medium.com/walmartlabs/event-stream-analytics-at-walmart-with-druid-dcf1a37ceda7)
* **Druid at eBay** (https://tech.ebayinc.com/engineering/monitoring-at-ebay-with-druid/)
* **Druid at Netflix** (https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06)
⟡ Distributed Repositories, Dependencies, and Configurations Management (https://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/)
* **DGit: Distributed Git at GitHub** (https://githubengineering.com/introducing-dgit/)
* **Stemma: Distributed Git Server at Palantir** (https://medium.com/@palantir/stemma-distributed-git-server-70afbca0fc29)
* **Configuration Management for Distributed Systems at Flickr** (https://code.flickr.net/2016/03/24/configuration-management-for-distributed-systems-using-github-and-cfg4j/)
* **Git Repository at Microsoft** (https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-largest-git-repo-on-the-planet/)
* **Solve Git Problem with Large Repositories at Microsoft** (https://www.infoq.com/news/2017/02/GVFS)
* **Single Repository at Google** (https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext)
* **Scaling Infrastructure and (Git) Workflow at Adyen** (https://medium.com/adyen/from-0-100-billion-scaling-infrastructure-and-workflow-at-adyen-7b63b690dfb6)
* **Dotfiles Distribution at Booking.com** (https://medium.com/booking-com-infrastructure/dotfiles-distribution-dedb69c66a75)
* **Secret Detector: Preventing Secrets in Source Code at Yelp** (https://engineeringblog.yelp.com/2018/06/yelps-secret-detector.html)
* **Managing Software Dependency at Scale at LinkedIn** (https://engineering.linkedin.com/blog/2018/09/managing-software-dependency-at-scale)
* **Merging Code in High-velocity Repositories at LinkedIn** (https://engineering.linkedin.com/blog/2020/continuous-integration)
* **Dynamic Configuration at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/dynamic-configuration-at-twitter.html)
* **Dynamic Configuration at Mixpanel** (https://medium.com/mixpaneleng/dynamic-configuration-at-mixpanel-94bfcf97d6b8)
* **Dynamic Configuration at GoDaddy** (https://sg.godaddy.com/engineering/2019/03/06/dynamic-configuration-for-nodejs/)
⟡ Scaling Continuous Integration and Continuous Delivery (https://www.synopsys.com/blogs/software-security/agile-cicd-devops-glossary/)
* **Continuous Integration Stack at Facebook** (https://code.fb.com/web/rapid-release-at-massive-scale/)
* **Continuous Integration with Distributed Repositories and Dependencies at Netflix** (https://medium.com/netflix-techblog/towards-true-continuous-integration-distributed-repositories-and-dependencies-2a2e3108c051)
* **Continuous Integration and Deployment with Bazel at Dropbox** (https://blogs.dropbox.com/tech/2019/12/continuous-integration-and-deployment-with-bazel/)
* **Continuous Deployments at BuzzFeed** (https://tech.buzzfeed.com/continuous-deployments-at-buzzfeed-d171f76c1ac4)
* **Screwdriver: Continuous Delivery Build System for Dynamic Infrastructure at Yahoo** (https://yahooeng.tumblr.com/post/155765242061/open-sourcing-screwdriver-yahoos-continuous)
* **CI/CD at Betterment** (https://www.betterment.com/resources/ci-cd-shortening-the-feedback-loop/)
* **CI/CD at Brainly** (https://medium.com/engineering-brainly/ci-cd-at-scale-fdfb0f49e031)
* **Scaling iOS CI with Anka at Shopify** (https://engineering.shopify.com/blogs/engineering/scaling-ios-ci-with-anka)
* **Scaling Jira Server at Yelp** (https://engineeringblog.yelp.com/2019/04/Scaling-Jira-Server-Administration-For-The-Enterprise.html)
* **Auto-scaling CI/CD Cluster at Flexport** (https://flexport.engineering/how-flexport-halved-testing-costs-with-an-auto-scaling-ci-cd-cluster-8304297222f)
Availability
⟡ Resilience Engineering: Learning to Embrace Failure (https://queue.acm.org/detail.cfm?id=2371297)
* **Resilience Engineering with Project Waterbear at LinkedIn** (https://engineering.linkedin.com/blog/2017/11/resilience-engineering-at-linkedin-with-project-waterbear)
* **Resiliency against Traffic Oversaturation at iHeartRadio** (https://tech.iheart.com/resiliency-against-traffic-oversaturation-77c5ed92a5fb)
* **Resiliency in Distributed Systems at GO-JEK** (https://blog.gojekengineering.com/resiliency-in-distributed-systems-efd30f74baf4)
* **Practical NoSQL Resilience Design Pattern for the Enterprise at eBay** (https://www.ebayinc.com/stories/blogs/tech/practical-nosql-resilience-design-pattern-for-the-enterprise/)
* **Ensuring Resilience to Disaster at Quora** (https://engineering.quora.com/Ensuring-Quoras-Resilience-to-Disaster)
* **Site Resiliency at Expedia** (https://www.infoq.com/presentations/expedia-website-resiliency?utm_source=presentations_about_Case_Study&utm_medium=link&utm_campaign=Case_Study)
* **Resiliency and Disaster Recovery with Kafka at eBay** (https://tech.ebayinc.com/engineering/resiliency-and-disaster-recovery-with-kafka/)
* **Disaster Recovery for Multi-Region Kafka at Uber** (https://eng.uber.com/kafka/)
⟡ Failover (http://cloudpatterns.org/mechanisms/failover_system)
* **The Evolution of Global Traffic Routing and Failover** (https://www.usenix.org/conference/srecon16/program/presentation/heady)
* **Testing for Disaster Recovery Failover Testing** (https://www.usenix.org/conference/srecon17asia/program/presentation/liu_zehua)
* **Designing a Microservices Architecture for Failure** (https://blog.risingstack.com/designing-microservices-architecture-for-failure/)
* **ELB for Automatic Failover at GoSquared** (https://engineering.gosquared.com/use-elb-automatic-failover)
* **Eliminate the Database for Higher Availability at American Express** (http://americanexpress.io/eliminate-the-database-for-higher-availability/)
* **Failover with Redis Sentinel at Vinted** (http://engineering.vinted.com/2015/09/03/failover-with-redis-sentinel/)
* **High-availability SaaS Infrastructure at FreeAgent** (http://engineering.freeagent.com/2017/02/06/ha-infrastructure-without-breaking-the-bank/)
* **MySQL High Availability at GitHub** (https://github.blog/2018-06-20-mysql-high-availability-at-github/)
* **MySQL High Availability at Eventbrite** (https://www.eventbrite.com/engineering/mysql-high-availability-at-eventbrite/)
* **Business Continuity & Disaster Recovery at Walmart** (https://medium.com/walmartlabs/business-continuity-disaster-recovery-in-the-microservices-world-ef2adca363df)
⟡ Load Balancing (https://blog.vivekpanyam.com/scaling-a-web-service-load-balancing/)
* **Introduction to Modern Network Load Balancing and Proxying** (https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236)
* **Top Five (Load Balancing) Scalability Patterns** (https://www.f5.com/company/blog/top-five-scalability-patterns)
* **Load Balancing Infrastructure to Support More than 1.3 Billion Users at Facebook** (https://www.usenix.org/conference/srecon15europe/program/presentation/shuff)
* **DHCPLB: DHCP Load Balancer at Facebook** (https://code.facebook.com/posts/1734309626831603/dhcplb-an-open-source-load-balancer/)
* **Katran: Scalable Network Load Balancer at Facebook** (https://code.facebook.com/posts/1906146702752923/open-sourcing-katran-a-scalable-network-load-balancer/)
* **Deterministic Aperture: A Distributed, Load Balancing Algorithm at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html)
* **Load Balancing with Eureka at Netflix** (https://medium.com/netflix-techblog/netflix-shares-cloud-load-balancing-and-failover-tool-eureka-c10647ef95e5)
* **Edge Load Balancing at Netflix** (https://medium.com/netflix-techblog/netflix-edge-load-balancing-695308b5548c)
* **Zuul 2: Cloud Gateway at Netflix** (https://medium.com/netflix-techblog/open-sourcing-zuul-2-82ea476cb2b3)
* **Load Balancing at Yelp** (https://engineeringblog.yelp.com/2017/05/taking-zero-downtime-load-balancing-even-further.html)
* **Load Balancing at GitHub** (https://githubengineering.com/introducing-glb/)
* **Consistent Hashing to Improve Load Balancing at Vimeo** (https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed)
* **UDP Load Balancing at 500px** (https://developers.500px.com/udp-load-balancing-with-keepalived-167382d7ad08)
* **QALM: QoS Load Management Framework at Uber** (https://eng.uber.com/qalm/)
* **Traffic Steering using RUM DNS at LinkedIn** (https://www.usenix.org/conference/srecon17europe/program/presentation/rastogi)
* **Traffic Infrastructure (Edge Network) at Dropbox** (https://blogs.dropbox.com/tech/2018/10/dropbox-traffic-infrastructure-edge-network/)
* **Intelligent DNS-Based Load Balancing at Dropbox** (https://blogs.dropbox.com/tech/2020/01/intelligent-dns-based-load-balancing-at-dropbox/)
* **Monitor DNS systems at Stripe** (https://stripe.com/en-sg/blog/secret-life-of-dns)
* **Multi-DNS Architecture (3 parts) at Monday** (https://medium.com/monday-engineering/how-and-why-we-migrated-our-dns-from-cloudflare-to-a-multi-dns-architecture-part-3-584a470f4062)
* **Dynamic Anycast DNS Infrastructure at Hulu** (https://medium.com/hulu-tech-blog/building-hulus-dynamic-anycast-dns-infrastructure-985a7a11fd30)
⟡ Rate Limiting (https://www.keycdn.com/support/rate-limiting/)
* **Rate Limiting for Scaling to Millions of Domains at Cloudflare** (https://blog.cloudflare.com/counting-things-a-lot-of-different-things/)
* **Cloud Bouncer: Distributed Rate Limiting at Yahoo** (https://yahooeng.tumblr.com/post/111288877956/cloud-bouncer-distributed-rate-limiting-at-yahoo)
* **Scaling API with Rate Limiters at Stripe** (https://stripe.com/blog/rate-limiters)
* **Distributed Rate Limiting at Allegro** (https://allegro.tech/2017/04/hermes-max-rate.html)
* **Ratequeue: Core Queueing-And-Rate-Limiting System at Twilio** (https://www.twilio.com/blog/2017/11/chaos-engineering-ratequeue-ha.html)
* **Quotas Service at Grab** (https://engineering.grab.com/quotas-service)
⟡ Autoscaling (https://medium.com/@BotmetricHQ/top-11-hard-won-lessons-learned-about-aws-auto-scaling-5bfe56da755f)
* **Autoscaling Pinterest** (https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64)
* **Autoscaling Based on Request Queuing at Square** (https://medium.com/square-corner-blog/autoscaling-based-on-request-queuing-c4c0f57f860f)
* **Autoscaling Jenkins at Trivago** (http://tech.trivago.com/2017/02/17/your-definite-guide-for-autoscaling-jenkins/)
* **Autoscaling Pub-Sub Consumers at Spotify** (https://labs.spotify.com/2017/11/20/autoscaling-pub-sub-consumers/)
* **Autoscaling Bigtable Clusters based on CPU Load at Spotify** (https://labs.spotify.com/2018/12/18/bigtable-autoscaler-saving-money-and-time-using-managed-storage/)
* **Autoscaling AWS Step Functions Activities at Yelp** (https://engineeringblog.yelp.com/2019/06/autoscaling-aws-step-functions-activities.html)
* **Scryer: Predictive Auto Scaling Engine at Netflix** (https://medium.com/netflix-techblog/scryer-netflixs-predictive-auto-scaling-engine-a3f8fc922270)
* **Bouncer: Simple AWS Auto Scaling Rollovers at Palantir** (https://medium.com/palantir/bouncer-simple-aws-auto-scaling-rollovers-c5af601d65d4)
* **Clusterman: Autoscaling Mesos Clusters at Yelp** (https://engineeringblog.yelp.com/2019/02/autoscaling-mesos-clusters-with-clusterman.html)
⟡ Availability in Globally Distributed Storage Systems at Google (http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36737.pdf)
⟡ NodeJS High Availability at Yahoo (https://yahooeng.tumblr.com/post/68823943185/nodejs-high-availability)
⟡ Operations (11 parts) at LinkedIn (https://www.linkedin.com/pulse/introduction-every-day-monday-operations-benjamin-purgason)
⟡ Monitoring Powers High Availability for LinkedIn Feed (https://www.usenix.org/conference/srecon17americas/program/presentation/barot)
⟡ Supporting Global Events at Facebook (https://code.facebook.com/posts/166966743929963/how-production-engineers-support-global-events-on-facebook/)
⟡ High Availability at BlaBlaCar (https://medium.com/blablacar-tech/the-expendables-backends-high-availability-at-blablacar-8cea3b95b26b)
⟡ High Availability at Netflix (https://medium.com/@NetflixTechBlog/tips-for-high-availability-be0472f2599c)
⟡ High Availability Cloud Infrastructure at Twilio (https://www.twilio.com/engineering/2011/12/12/scaling-high-availablity-infrastructure-in-cloud)
⟡ Automating Datacenter Operations at Dropbox (https://blogs.dropbox.com/tech/2019/01/automating-datacenter-operations-at-dropbox/)
⟡ Globalizing Player Accounts at Riot Games (https://technology.riotgames.com/news/globalizing-player-accounts)
Stability
⟡ Circuit Breaker (https://martinfowler.com/bliki/CircuitBreaker.html)
* **Circuit Breaking in Distributed Systems** (https://www.infoq.com/presentations/circuit-breaking-distributed-systems)
* **Circuit Breaker for Scaling Containers** (https://f5.com/about-us/blog/articles/the-art-of-scaling-containers-circuit-breakers-28919)
* **Lessons in Resilience at SoundCloud** (https://developers.soundcloud.com/blog/lessons-in-resilience-at-SoundCloud)
* **Protector: Circuit Breaker for Time Series Databases at Trivago** (http://tech.trivago.com/2016/02/23/protector/)
* **Improved Production Stability with Circuit Breakers at Heroku** (https://blog.heroku.com/improved-production-stability-with-circuit-breakers)
* **Circuit Breaker at Zendesk** (https://medium.com/zendesk-engineering/the-joys-of-circuit-breaking-ee6584acd687)
* **Circuit Breaker at Traveloka** (https://medium.com/traveloka-engineering/circuit-breakers-dont-let-your-dependencies-bring-you-down-5ba1c5cf1eec)
* **Circuit Breaker at Shopify** (https://shopify.engineering/circuit-breaker-misconfigured)
⟡ Timeouts (https://www.javaworld.com/article/2824163/application-performance/stability-patterns-applied-in-a-restful-architecture.html)
* **Fault Tolerance (Timeouts and Retries, Thread Separation, Semaphores, Circuit Breakers) at Netflix** (https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
* **Enforce Timeout: A Reliability Methodology at DoorDash** (https://doordash.engineering/2018/12/21/enforce-timeout-a-doordash-reliability-methodology/)
* **Troubleshooting a Connection Timeout Issue with tcp_tw_recycle Enabled at eBay** (https://www.ebayinc.com/stories/blogs/tech/a-vip-connection-timeout-issue-caused-by-snat-and-tcp-tw-recycle/)
⟡ Crash-safe Replication for MySQL at Booking.com (https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336a69b317f)
⟡ Bulkheads: Partition and Tolerate Failure in One Part (https://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html)
⟡ Steady State: Always Put Logs on Separate Disk (https://docs.microsoft.com/en-us/sql/relational-databases/policy-based-management/place-data-and-log-files-on-separate-drives)
⟡ Throttling: Maintain a Steady Pace (http://www.sosp.org/2001/papers/welsh.pdf)
⟡ Multi-Clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn (https://engineering.linkedin.com/blog/2017/11/improving-resiliency-and-stability-of-a-large-scale-api)
⟡ Determinism (4 parts) in League of Legends Server (https://engineering.riotgames.com/news/determinism-league-legends-fixing-divergences)
Performance
⟡ Performance Optimization on OS, Storage, Database, Network (https://stackify.com/application-performance-metrics/)
* **Improving Performance with Background Data Prefetching at Instagram** (https://engineering.instagram.com/improving-performance-with-background-data-prefetching-b191acb39898)
* **Fixing Linux filesystem performance regressions at LinkedIn** (https://engineering.linkedin.com/blog/2020/fixing-linux-filesystem-performance-regressions)
* **Compression Techniques to Solve Network I/O Bottlenecks at eBay** (https://www.ebayinc.com/stories/blogs/tech/how-ebays-shopping-cart-used-compression-techniques-to-solve-network-io-bottlenecks/)
* **Optimizing Web Servers for High Throughput and Low Latency at Dropbox** (https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/)
* **Linux Performance Analysis in 60,000 Milliseconds at Netflix** (https://medium.com/netflix-techblog/linux-performance-analysis-in-60-000-milliseconds-accc10403c55)
* **Live Downsizing Google Cloud Persistent Disks (PD-SSD) at Mixpanel** (https://engineering.mixpanel.com/2018/07/31/live-downsizing-google-cloud-pds-for-fun-and-profit/)
* **Decreasing RAM Usage by 40% Using jemalloc with Python & Celery at Zapier** (https://zapier.com/engineering/celery-python-jemalloc/)
* **Reducing Memory Footprint at Slack** (https://slack.engineering/reducing-slacks-memory-footprint-4480fec7e8eb)
* **Continuous Load Testing at Slack** (https://slack.engineering/continuous-load-testing/)
* **Performance Improvements at Pinterest** (https://medium.com/@Pinterest_Engineering/driving-user-growth-with-performance-improvements-cfc50dafadd7)
* **Server Side Rendering at Wix** (https://www.youtube.com/watch?v=f9xI2jR71Ms)
* **30x Performance Improvements on MySQLStreamer at Yelp** (https://engineeringblog.yelp.com/2018/02/making-30x-performance-improvements-on-yelps-mysqlstreamer.html)
* **Optimizing APIs at Netflix** (https://medium.com/netflix-techblog/optimizing-the-netflix-api-5c9ac715cf19)
* **Performance Monitoring with Riemann and Clojure at Walmart** (https://medium.com/walmartlabs/performance-monitoring-with-riemann-and-clojure-eafc07fcd375)
* **Performance Tracking Dashboard for Live Games at Zynga** (https://www.zynga.com/blogs/engineering/live-games-have-evolving-performance)
* **Optimizing CAL Report Hadoop MapReduce Jobs at eBay** (https://www.ebayinc.com/stories/blogs/tech/optimization-of-cal-report-hadoop-mapreduce-job/)
* **Performance Tuning on Quartz Scheduler at eBay** (https://www.ebayinc.com/stories/blogs/tech/performance-tuning-on-quartz-scheduler/)
* **Profiling C++ (Part 1: Optimization, Part 2: Measurement and Analysis) at Riot Games** (https://engineering.riotgames.com/news/profiling-optimisation)
* **Profiling React Server-Side Rendering at HomeAway** (https://medium.com/homeaway-tech-blog/profiling-react-server-side-rendering-to-free-the-node-js-event-loop-7f0fe455a901)
* **Hardware-Assisted Video Transcoding at Dailymotion** (https://medium.com/dailymotion-engineering/hardware-assisted-video-transcoding-at-dailymotion-66cd2db448ae)
* **Cross Shard Transactions at 10 Million RPS at Dropbox** (https://blogs.dropbox.com/tech/2018/11/cross-shard-transactions-at-10-million-requests-per-second/)
* **API Profiling at Pinterest** (https://medium.com/@Pinterest_Engineering/api-profiling-at-pinterest-6fa9333b4961)
* **Pagelets Parallelize Server-side Processing at Yelp** (https://engineeringblog.yelp.com/2017/07/generating-web-pages-in-parallel-with-pagelets.html)
* **Improving Key Expiration in Redis at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/improving-key-expiration-in-redis.html)
* **Ad Delivery Network Performance Optimization with Flame Graphs at MindGeek** (https://medium.com/mindgeek-engineering-blog/ad-delivery-network-performance-optimization-with-flame-graphs-bc550cf59cf7)
* **Predictive CPU isolation of containers at Netflix** (https://medium.com/netflix-techblog/predictive-cpu-isolation-of-containers-at-netflix-91f014d856c7)
* **Improving HDFS I/O Utilization for Efficiency at Uber** (https://eng.uber.com/improving-hdfs-i-o-utilization-for-efficiency/)
* **Cloud Jewels: Estimating kWh in the Cloud at Etsy** (https://codeascraft.com/2020/04/23/cloud-jewels-estimating-kwh-in-the-cloud/)
* **Unthrottled: Fixing CPU Limits in the Cloud (2 parts) at Indeed** (https://engineering.indeedblog.com/blog/2019/12/unthrottled-fixing-cpu-limits-in-the-cloud/)
⟡ Performance Optimization by Tuning Garbage Collection (https://confluence.atlassian.com/enterprise/garbage-collection-gc-tuning-guide-461504616.html)
* **Garbage Collection in Java Applications at LinkedIn** (https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications)
* **Garbage Collection in High-Throughput, Low-Latency Machine Learning Services at Adobe** (https://medium.com/adobetech/engineering-high-throughput-low-latency-machine-learning-services-7d45edac0271)
* **Garbage Collection in Redux Applications at SoundCloud** (https://developers.soundcloud.com/blog/garbage-collection-in-redux-applications)
* **Garbage Collection in Go Application at Twitch** (https://blog.twitch.tv/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2)
* **Analyzing V8 Garbage Collection Logs at Alibaba** (https://www.linux.com/blog/can-nodejs-scale-ask-team-alibaba)
* **Python Garbage Collection for Dropping 50% Memory Growth Per Request at Instagram** (https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf)
* **Performance Impact of Removing Out of Band Garbage Collector (OOBGC) at GitHub** (https://githubengineering.com/removing-oobgc/)
* **Debugging Java Memory Leaks at Allegro** (https://allegro.tech/2018/05/a-comedy-of-errors-debugging-java-memory-leaks.html)
* **Optimizing JVM at Alibaba** (https://www.youtube.com/watch?v=X4tmr3nhZRg)
* **Tuning JVM Memory for Large-scale Services at Uber** (https://eng.uber.com/jvm-tuning-garbage-collection/)
* **Solr Performance Tuning at Walmart** (https://medium.com/walmartglobaltech/solr-performance-tuning-beb7d0d0f8d9)
* **Memory Tuning a High Throughput Microservice at Flipkart** (https://blog.flipkart.tech/memory-tuning-a-high-throughput-microservice-ed57b3e60997)
⟡ Performance Optimization on Image, Video, Page Load (https://developers.google.com/web/fundamentals/performance/why-performance-matters/)
* **Optimizing 360 Photos at Scale at Facebook** (https://code.facebook.com/posts/129055711052260/optimizing-360-photos-at-scale/)
* **Reducing Image File Size in the Photos Infrastructure at Etsy** (https://codeascraft.com/2017/05/30/reducing-image-file-size-at-etsy/)
* **Improving GIF Performance at Pinterest** (https://medium.com/@Pinterest_Engineering/improving-gif-performance-on-pinterest-8dad74bf92f1)
* **Optimizing Video Playback Performance at Pinterest** (https://medium.com/@Pinterest_Engineering/optimizing-video-playback-performance-caf55ce310d1)
* **Optimizing Video Stream for Low Bandwidth with Dynamic Optimizer at Netflix** (https://medium.com/netflix-techblog/optimized-shot-based-encodes-now-streaming-4b9464204830)
* **Adaptive Video Streaming at YouTube** (https://youtube-eng.googleblog.com/2018/04/making-high-quality-video-efficient.html)
* **Reducing Video Loading Time at Dailymotion** (https://medium.com/dailymotion/reducing-video-loading-time-fa9c997a2294)
* **Improving Homepage Performance at Zillow** (https://www.zillow.com/engineering/improving-homepage-performance/)
* **The Process of Optimizing for Client Performance at Expedia** (https://medium.com/expedia-engineering/go-fast-or-go-home-the-process-of-optimizing-for-client-performance-57bb497402e)
* **Web Performance at BBC** (https://medium.com/bbc-design-engineering/bbc-world-service-web-performance-26b08f7abfcc)
⟡ Performance Optimization by Brotli Compression (https://blogs.akamai.com/2016/02/understanding-brotlis-potential.html)
* **Boosting Site Speed Using Brotli Compression at LinkedIn** (https://engineering.linkedin.com/blog/2017/05/boosting-site-speed-using-brotli-compression)
* **Brotli at Booking.com** (https://medium.com/booking-com-development/bookings-journey-with-brotli-978b249d34f3)
* **Brotli at Treebo** (https://tech.treebo.com/a-tale-of-brotli-compression-bcb071d9780a)
* **Deploying Brotli for Static Content at Dropbox** (https://dropbox.tech/infrastructure/deploying-brotli-for-static-content)
* **Progressive Enhancement with Brotli at Yelp** (https://engineeringblog.yelp.com/2017/07/progressive-enhancement-with-brotli.html)
* **Speeding Up Redis with Compression at DoorDash** (https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/)
⟡ Performance Optimization on Languages and Frameworks (https://www.techempower.com/benchmarks/)
* **Python at Netflix** (https://netflixtechblog.com/python-at-netflix-bba45dae649e)
* **Python at scale (3 parts) at Instagram** (https://instagram-engineering.com/python-at-scale-strict-modules-c0bb9245c834)
* **OCaml best practices (2 parts) at Issuu** (https://engineering.issuu.com/2018/12/10/our-current-ocaml-best-practices-part-2)
* **PHP at Slack** (https://slack.engineering/taking-php-seriously-cf7a60065329)
* **Go at Trivago** (https://tech.trivago.com/2020/03/02/why-we-chose-go/)
* **TypeScript at Etsy** (https://codeascraft.com/2021/11/08/etsys-journey-to-typescript/)
* **Kotlin for taming state at Etsy** (https://www.etsy.com/sg-en/codeascraft/sealed-classes-opened-my-mind)
* **BPF and Go at Bumble** (https://medium.com/bumble-tech/bpf-and-go-modern-forms-of-introspection-in-linux-6b9802682223)
* **Ruby on Rails at GitLab** (https://medium.com/gitlab-magazine/why-we-use-ruby-on-rails-to-build-gitlab-601dce4a7a38)
* **Rust in production at Figma** (https://medium.com/figma-design/rust-in-production-at-figma-e10a0ec31929)
* **Choosing a Language Stack at WeWork** (https://engineering.wework.com/choosing-a-language-stack-cac3726928f6)
* **Switching from Go to Rust at Discord** (https://blog.discord.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f)
* **ASP.NET Core Performance Optimization at Agoda** (https://medium.com/agoda-engineering/happy-asp-net-core-performance-optimization-4e21a383d299)
* **Data Race Patterns in Go at Uber** (https://eng.uber.com/data-race-patterns-in-go/)
Intelligence
⟡ Big Data (https://insights.sei.cmu.edu/sei_blog/2017/05/reference-architectures-for-big-data-systems.html)
* **Data Platform at Uber** (https://eng.uber.com/uber-big-data-platform/)
* **Data Platform at BMW** (https://www.unibw.de/code/events-u/jt-2018-workshops/ws3_bigdata_vortrag_widmann.pdf)
* **Data Platform at Netflix** (https://www.youtube.com/watch?v=CSDIThSwA7s)
* **Data Platform at Flipkart** (https://blog.flipkart.tech/overview-of-flipkart-data-platform-20c6d3e9a196)
* **Data Platform at Coupang** (https://medium.com/coupang-tech/evolving-the-coupang-data-platform-308e305a9c45)
* **Data Platform at DoorDash** (https://doordash.engineering/2020/09/25/how-doordash-is-scaling-its-data-platform/)
* **Data Platform at Khan Academy** (http://engineering.khanacademy.org/posts/khanalytics.htm)
* **Data Infrastructure at Airbnb** (https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c)
* **Data Infrastructure at LinkedIn** (https://www.infoq.com/presentations/big-data-infrastructure-linkedin)
* **Data Infrastructure at GO-JEK** (https://blog.gojekengineering.com/data-infrastructure-at-go-jek-cd4dc8cbd929)
* **Data Ingestion Infrastructure at Pinterest** (https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754)
* **Data Analytics Architecture at Pinterest** (https://medium.com/@Pinterest_Engineering/behind-the-pins-building-analytics-f7b508cdacab)
* **Data Orchestration Service at Spotify** (https://engineering.atspotify.com/2022/03/why-we-switched-our-data-orchestration-service/)
* **Big Data Processing (2 parts) at Spotify** (https://labs.spotify.com/2017/10/23/big-data-processing-at-spotify-the-road-to-scio-part-2/)
* **Big Data Processing at Uber** (https://cdn.oreillystatic.com/en/assets/1/event/160/Big%20data%20processing%20with%20Hadoop%20and%20Spark%2C%20the%20Uber%20way%20Presentation.pdf)
* **Analytics Pipeline at Lyft** (https://cdn.oreillystatic.com/en/assets/1/event/269/Lyft_s%20analytics%20pipeline_%20From%20Redshift%20to%20Apache%20Hive%20and%20Presto%20Presentation.pdf)
* **Analytics Pipeline at Grammarly** (https://tech.grammarly.com/blog/building-a-versatile-analytics-pipeline-on-top-of-apache-spark)
* **Analytics Pipeline at Teads** (https://medium.com/teads-engineering/give-meaning-to-100-billion-analytics-events-a-day-d6ba09aa8f44)
* **ML Data Pipelines for Real-Time Fraud Prevention at PayPal** (https://www.infoq.com/presentations/paypal-ml-fraud-prevention-2018)
* **Big Data Analytics and ML Techniques at LinkedIn** (https://cdn.oreillystatic.com/en/assets/1/event/269/Big%20data%20analytics%20and%20machine%20learning%20techniques%20to%20drive%20and%20grow%20business%20Presentation%201.pdf)
* **Self-Serve Reporting Platform on Hadoop at LinkedIn** (https://cdn.oreillystatic.com/en/assets/1/event/137/Building%20a%20self-serve%20real-time%20reporting%20platform%20at%20LinkedIn%20Presentation%201.pdf)
* **Privacy-Preserving Analytics and Reporting at LinkedIn** (https://engineering.linkedin.com/blog/2019/04/privacy-preserving-analytics-and-reporting-at-linkedin)
* **Analytics Platform for Tracking Item Availability at Walmart** (https://medium.com/walmartlabs/how-we-build-a-robust-analytics-platform-using-spark-kafka-and-cassandra-lambda-architecture-70c2d1bc8981)
* **Real-Time Analytics for Mobile App Crashes using Apache Pinot at Uber** (https://www.uber.com/en-SG/blog/real-time-analytics-for-mobile-app-crashes/)
* **HALO: Hardware Analytics and Lifecycle Optimization at Facebook** (https://code.fb.com/data-center-engineering/hardware-analytics-and-lifecycle-optimization-halo-at-facebook/)
* **RBEA: Real-time Analytics Platform at King** (https://techblog.king.com/rbea-scalable-real-time-analytics-king/)
* **AresDB: GPU-Powered Real-time Analytics Engine at Uber** (https://eng.uber.com/aresdb/)
* **AthenaX: Streaming Analytics Platform at Uber** (https://eng.uber.com/athenax/)
* **Jupiter: Config Driven Adtech Batch Ingestion Platform at Uber** (https://www.uber.com/en-SG/blog/jupiter-batch-ingestion-platform/)
* **Delta: Data Synchronization and Enrichment Platform at Netflix** (https://medium.com/netflix-techblog/delta-a-data-synchronization-and-enrichment-platform-e82c36a79aee)
* **Keystone: Real-time Stream Processing Platform at Netflix** (https://medium.com/netflix-techblog/keystone-real-time-stream-processing-platform-a3ee651812a)
* **Databook: Turning Big Data into Knowledge with Metadata at Uber** (https://eng.uber.com/databook/)
* **Amundsen: Data Discovery & Metadata Engine at Lyft** (https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9)
* **Maze: Funnel Visualization Platform at Uber** (https://eng.uber.com/maze/)
* **Metacat: Making Big Data Discoverable and Meaningful at Netflix** (https://medium.com/netflix-techblog/metacat-making-big-data-discoverable-and-meaningful-at-netflix-56fb36a53520)
* **SpinalTap: Change Data Capture System at Airbnb** (https://medium.com/airbnb-engineering/capturing-data-evolution-in-a-service-oriented-architecture-72f7c643ee6f)
* **Accelerator: Fast Data Processing Framework at eBay** (https://www.ebayinc.com/stories/blogs/tech/announcing-the-accelerator-processing-1-000-000-000-lines-per-second-on-a-single-computer/)
* **Omid: Transaction Processing Platform at Yahoo** (https://yahooeng.tumblr.com/post/180867271141/a-new-chapter-for-omid)
* **TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo** (https://yahooeng.tumblr.com/post/157196488076/open-sourcing-tensorflowonspark-distributed-deep)
* **CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo** (https://yahooeng.tumblr.com/post/139916828451/caffeonspark-open-sourced-for-distributed-deep)
* **Spark on Scala: Analytics Reference Architecture at Adobe** (https://medium.com/adobetech/spark-on-scala-adobe-analytics-reference-architecture-7457f5614b4c)
* **Experimentation Platform (2 parts) at Spotify** (https://engineering.atspotify.com/2020/11/02/spotifys-new-experimentation-platform-part-2/)
* **Experimentation Platform at Airbnb** (https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166)
* **Smart Product Platform at Zalando** (https://jobs.zalando.com/tech/blog/zalando-smart-product-platform/?gh_src=4n3gxh1)
* **Log Analysis Platform at LINE** (https://www.slideshare.net/wyukawa/strata2017-sg)
* **Data Visualisation Platform at Myntra** (https://medium.com/myntra-engineering/universal-dashboarding-platform-udp-data-visualisation-platform-at-myntra-5f2522fcf72d)
* **Building and Scaling Data Lineage at Netflix** (https://medium.com/netflix-techblog/building-and-scaling-data-lineage-at-netflix-to-improve-data-infrastructure-reliability-and-1a52526a7977)
* **Building a scalable data management system for computer vision tasks at Pinterest** (https://medium.com/@Pinterest_Engineering/building-a-scalable-data-management-system-for-computer-vision-tasks-a6dee8f1c580)
* **Structured Data at Etsy** (https://codeascraft.com/2019/07/31/an-introduction-to-structured-data-at-etsy/)
* **Scaling a Mature Data Pipeline - Managing Overhead at Airbnb** (https://medium.com/airbnb-engineering/scaling-a-mature-data-pipeline-managing-overhead-f34835cbc866)
* **Spark Partitioning Strategies at Airbnb** (https://medium.com/airbnb-engineering/on-spark-hive-and-small-files-an-in-depth-look-at-spark-partitioning-strategies-a9a364f908)
* **Scaling the Hadoop Distributed File System at LinkedIn** (https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr)
* **Scaling Hadoop YARN cluster beyond 10,000 nodes at LinkedIn** (https://engineering.linkedin.com/blog/2021/scaling-linkedin-s-hadoop-yarn-cluster-beyond-10-000-nodes)
* **Scaling Big Data Access Controls at Pinterest** (https://medium.com/pinterest-engineering/securely-scaling-big-data-access-controls-at-pinterest-bbc3406a1695)
⟡ Distributed Machine Learning (https://www.csie.ntu.edu.tw/~cjlin/talks/bigdata-bilbao.pdf)
* **Machine Learning Platform at Uber** (https://eng.uber.com/michelangelo/)
* **Machine Learning Platform at Yelp** (https://engineeringblog.yelp.com/2020/07/ML-platform-overview.html)
* **Machine Learning Platform at Etsy** (https://codeascraft.com/2021/12/21/redesigning-etsys-machine-learning-platform/)
* **Machine Learning Platform at Zalando** (https://engineering.zalando.com/posts/2022/04/zalando-machine-learning-platform.html)
* **Recommendation System at Lyft** (https://eng.lyft.com/the-recommendation-system-at-lyft-67bc9dcc1793)
* **Platform for Serving Recommendations at Etsy** (https://www.etsy.com/sg-en/codeascraft/building-a-platform-for-serving-recommendations-at-etsy)
* **Infrastructure to Run User Forecasts at Spotify** (https://engineering.atspotify.com/2022/06/how-we-built-infrastructure-to-run-user-forecasts-at-spotify/)
* **Aroma: Using ML for Code Recommendation at Facebook** (https://code.fb.com/developer-tools/aroma/)
* **Flyte: Cloud Native Machine Learning and Data Processing Platform at Lyft** (https://eng.lyft.com/introducing-flyte-cloud-native-machine-learning-and-data-processing-platform-fb2bb3046a59)
* **LyftLearn: ML Model Training Infrastructure built on Kubernetes at Lyft** (https://eng.lyft.com/lyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb)
* **Horovod: Open Source Distributed Deep Learning Framework for TensorFlow at Uber** (https://eng.uber.com/horovod/)
* **COTA: Improving Customer Care with NLP & Machine Learning at Uber** (https://eng.uber.com/cota/)
* **Manifold: Model-Agnostic Visual Debugging Tool for Machine Learning at Uber** (https://eng.uber.com/manifold/)
* **Repo-Topix: Topic Extraction Framework at GitHub** (https://githubengineering.com/topics/)
* **Concourse: Generating Personalized Content Notifications in Near-Real-Time at LinkedIn** (https://engineering.linkedin.com/blog/2018/05/concourse--generating-personalized-content-notifications-in-near)
* **Altus Care: Applying a Chatbot to Platform Engineering at eBay** (https://www.ebayinc.com/stories/blogs/tech/altus-care-apply-chatbot-to-ebay-platform-engineering/)
* **PyKrylov: Accelerating Machine Learning Research at eBay** (https://tech.ebayinc.com/engineering/pykrylov-accelerating-machine-learning-research-at-ebay/)
* **Box Graph: Spontaneous Social Network at Box** (https://blog.box.com/blog/box-graph-how-we-built-spontaneous-social-network/)
* **PricingNet: Pricing Modelling with Neural Networks at Skyscanner** (https://hackernoon.com/pricingnet-modelling-the-global-airline-industry-with-neural-networks-833844d20ea6)
* **PinText: Multitask Text Embedding System at Pinterest** (https://medium.com/pinterest-engineering/pintext-a-multitask-text-embedding-system-in-pinterest-b80ece364555)
* **SearchSage: Learning Search Query Representations at Pinterest** (https://medium.com/pinterest-engineering/searchsage-learning-search-query-representations-at-pinterest-654f2bb887fc)
* **Cannes: ML saves $1.7M a year on document previews at Dropbox** (https://dropbox.tech/machine-learning/cannes--how-ml-saves-us--1-7m-a-year-on-document-previews)
* **Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp** (https://engineeringblog.yelp.com/2018/01/building-a-distributed-ml-pipeline-part1.html)
* **Learning with Privacy at Scale at Apple** (https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html)
* **Deep Learning for Image Classification Experiment at Mercari** (https://medium.com/mercari-engineering/mercaris-image-classification-experiment-using-deep-learning-9b4e994a18ec)
* **Deep Learning for Frame Detection in Product Images at Allegro** (https://allegro.tech/2016/12/deep-learning-for-frame-detection.html)
* **Content-based Video Relevance Prediction at Hulu** (https://medium.com/hulu-tech-blog/content-based-video-relevance-prediction-b2c448e14752)
* **Moderating Inappropriate Video Content at Yelp** (https://engineeringblog.yelp.com/2024/03/moderating-inappropriate-video-content-at-yelp.html)
* **Improving Photo Selection With Deep Learning at TripAdvisor** (http://engineering.tripadvisor.com/improving-tripadvisor-photo-selection-deep-learning/)
* **Personalized Recommendations for Experiences Using Deep Learning at TripAdvisor** (https://www.tripadvisor.com/engineering/personalized-recommendations-for-experiences-using-deep-learning/)
* **Personalised Recommender Systems at BBC** (https://medium.com/bbc-design-engineering/developing-personalised-recommender-systems-at-the-bbc-e26c5e0c4216)
* **Machine Learning (2 parts) at Condé Nast** (https://technology.condenast.com/story/handbag-brand-and-color-detection)
* **Natural Language Processing and Content Analysis (2 parts) at Condé Nast** (https://technology.condenast.com/story/natural-language-processing-and-content-analysis-at-conde-nast-part-2-system-architecture)
* **Mapping the World of Music Using Machine Learning (2 parts) at iHeartRadio** (https://tech.iheart.com/mapping-the-world-of-music-using-machine-learning-part-2-aa50b6a0304c)
* **Machine Learning to Improve Streaming Quality at Netflix** (https://medium.com/netflix-techblog/using-machine-learning-to-improve-streaming-quality-at-netflix-9651263ef09f)
* **Machine Learning to Match Drivers & Riders at GO-JEK** (https://blog.gojekengineering.com/how-we-use-machine-learning-to-match-drivers-riders-b06d617b9e5)
* **Improving Video Thumbnails with Deep Neural Nets at YouTube** (https://youtube-eng.googleblog.com/2015/10/improving-youtube-video-thumbnails-with_8.html)
* **Quantile Regression for Delivering On Time at Instacart** (https://tech.instacart.com/how-instacart-delivers-on-time-using-quantile-regression-2383e2e03edb)
* **Cross-Lingual End-to-End Product Search with Deep Learning at Zalando** (https://jobs.zalando.com/tech/blog/search-deep-neural-network/)
* **Machine Learning at Jane Street** (https://blog.janestreet.com/real-world-machine-learning-part-1/)
* **Machine Learning for Ranking Answers End-to-End at Quora** (https://engineering.quora.com/A-Machine-Learning-Approach-to-Ranking-Answers-on-Quora)
* **Clustering Similar Stories Using LDA at Flipboard** (http://engineering.flipboard.com/2017/02/storyclustering)
* **Similarity Search at Flickr** (https://code.flickr.net/2017/03/07/introducing-similarity-search-at-flickr/)
* **Large-Scale Machine Learning Pipeline for Job Recommendations at Indeed** (http://engineering.indeedblog.com/blog/2016/04/building-a-large-scale-machine-learning-pipeline-for-job-recommendations/)
* **Deep Learning from Prototype to Production at Taboola** (http://engineering.taboola.com/deep-learning-from-prototype-to-production/)
* **Atom Smashing using Machine Learning at CERN** (https://cdn.oreillystatic.com/en/assets/1/event/144/Atom%20smashing%20using%20machine%20learning%20at%20CERN%20Presentation.pdf)
* **Mapping Tags at Medium** (https://medium.engineering/mapping-mediums-tags-1b9a78d77cf0)
* **Clustering with the Dirichlet Process Mixture Model in Scala at Monsanto** (http://engineering.monsanto.com/2015/11/23/chinese-restaurant-process/)
* **Map Pins with DBSCAN & Random Forests at Foursquare** (https://engineering.foursquare.com/you-are-probably-here-better-map-pins-with-dbscan-random-forests-9d51e8c1964d)
* **Forecasting at Uber** (https://eng.uber.com/forecasting-introduction/)
* **Financial Forecasting at Uber** (https://eng.uber.com/transforming-financial-forecasting-machine-learning/)
* **Productionizing ML with Workflows at Twitter** (https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows.html)
* **GUI Testing Powered by Deep Learning at eBay** (https://www.ebayinc.com/stories/blogs/tech/gui-testing-powered-by-deep-learning/)
* **Scaling Machine Learning to Recommend Driving Routes at Pivotal** (http://engineering.pivotal.io/post/scaling-machine-learning-to-recommend-driving-routes/)
* **Real-Time Predictions at DoorDash** (https://www.infoq.com/presentations/doordash-real-time-predictions)
* **Machine Intelligence at Dropbox** (https://blogs.dropbox.com/tech/2018/09/machine-intelligence-at-dropbox-an-update-from-our-dbxi-team/)
* **Machine Learning for Indexing Text from Billions of Images at Dropbox** (https://blogs.dropbox.com/tech/2018/10/using-machine-learning-to-index-text-from-billions-of-images/)
* **Modeling User Journeys via Semantic Embeddings at Etsy** (https://codeascraft.com/2018/07/12/modeling-user-journey-via-semantic-embeddings/)
* **Automated Fake Account Detection at LinkedIn** (https://engineering.linkedin.com/blog/2018/09/automated-fake-account-detection-at-linkedin)
* **Building Knowledge Graph at Airbnb** (https://medium.com/airbnb-engineering/contextualizing-airbnb-by-building-knowledge-graph-b7077e268d5a)
* **Core Modeling at Instagram** (https://instagram-engineering.com/core-modeling-at-instagram-a51e0158aa48)
* **Neural Architecture Search (NAS) for Prohibited Item Detection at Mercari** (https://tech.mercari.com/entry/2019/04/26/163000)
* **Computer Vision at Airbnb** (https://medium.com/airbnb-engineering/amenity-detection-and-beyond-new-frontiers-of-computer-vision-at-airbnb-144a4441b72e)
* **3D Home Backend Algorithms at Zillow** (https://www.zillow.com/engineering/behind-zillow-3d-home-backend-algorithms/)
* **Long-term Forecasts at Lyft** (https://eng.lyft.com/making-long-term-forecasts-at-lyft-fac475b3ba52)
* **Discovering Popular Dishes with Deep Learning at Yelp** (https://engineeringblog.yelp.com/2019/10/discovering-popular-dishes-with-deep-learning.html)
* **SplitNet Architecture for Ad Candidate Ranking at Twitter** (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/splitnet-architecture-for-ad-candidate-ranking.html)
* **Jobs Filter at Indeed** (https://engineering.indeedblog.com/blog/2019/09/jobs-filter/)
* **Architecting Restaurant Wait Time Predictions at Yelp** (https://engineeringblog.yelp.com/2019/12/architecting-wait-time-estimations.html)
* **Music Personalization at Spotify** (https://labs.spotify.com/2016/08/07/commodity-music-ml-services/)
* **Deep Learning for Domain Name Valuation at GoDaddy** (https://sg.godaddy.com/engineering/2019/07/26/domain-name-valuation/)
* **Similarity Clustering to Catch Fraud Rings at Stripe** (https://stripe.com/blog/similarity-clustering)
* **Personalized Search at Etsy** (https://codeascraft.com/2020/10/29/bringing-personalized-search-to-etsy/)
* **ML Feature Serving Infrastructure at Lyft** (https://eng.lyft.com/ml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a)
* **Context-Specific Bidding System at Etsy** (https://codeascraft.com/2021/03/23/how-we-built-a-context-specific-bidding-system-for-etsy-ads/)
* **Moderating Promotional Spam and Inappropriate Content in Photos at Scale at Yelp** (https://engineeringblog.yelp.com/2021/05/moderating-promotional-spam-and-inappropriate-content-in-photos-at-scale-at-yelp.html)
* **Optimizing Payments with Machine Learning at Dropbox** (https://dropbox.tech/machine-learning/optimizing-payments-with-machine-learning)
* **Scaling Media Machine Learning at Netflix** (https://netflixtechblog.com/scaling-media-machine-learning-at-netflix-f19b400243)
* **Similarity Engine at eBay** (https://tech.ebayinc.com/engineering/ebays-blazingly-fast-billion-scale-vector-similarity-engine/)
Architecture
⟡ Tech Stack at Medium (https://medium.engineering/the-stack-that-helped-medium-drive-2-6-millennia-of-reading-time-e56801f7c492)
⟡ Tech Stack at Shopify (https://engineering.shopify.com/blogs/engineering/e-commerce-at-scale-inside-shopifys-tech-stack)
⟡ Building Services (4 parts) at Airbnb (https://medium.com/airbnb-engineering/building-services-at-airbnb-part-4-23c95e428064)
⟡ Architecture of Evernote (https://evernote.com/blog/a-digest-of-evernotes-architecture/)
⟡ Architecture of Chat Service (3 parts) at Riot Games (https://engineering.riotgames.com/news/chat-service-architecture-persistence)
⟡ Architecture of League of Legends Client Update (https://technology.riotgames.com/news/architecture-league-client-update)
⟡ Architecture of Ad Platform at Twitter (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/building-twitters-ad-platform-architecture-for-the-future.html)
⟡ Architecture of API Gateway at Uber (https://eng.uber.com/architecture-api-gateway/)
⟡ Architecture of API Gateway at Tinder (https://medium.com/tinder/how-we-built-the-tinder-api-gateway-831c6ca5ceca)
⟡ Basic Architecture of Slack (https://slack.engineering/how-slack-built-shared-channels-8d42c895b19f)
⟡ Lightweight Distributed Architecture to Handle Thousands of Library Releases at eBay
(https://tech.ebayinc.com/engineering/a-lightweight-distributed-architecture-to-handle-thousands-of-library-releases-at-ebay/)
⟡ Back-end at LinkedIn (https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin)
⟡ Back-end at Flickr (https://yahooeng.tumblr.com/post/157200523046/introducing-tripod-flickrs-backend-refactored)
⟡ Infrastructure (3 parts) at Zendesk (https://medium.com/zendesk-engineering/the-history-of-infrastructure-at-zendesk-part-3-foundation-team-forming-and-evolving-9859e40f5390)
⟡ Cloud Infrastructure at Grubhub (https://bytes.grubhub.com/cloud-infrastructure-at-grubhub-94db998a898a)
⟡ Real-time Presence Platform at LinkedIn (https://engineering.linkedin.com/blog/2018/01/now-you-see-me--now-you-dont--linkedins-real-time-presence-platf)
⟡ Settings Platform at LinkedIn (https://engineering.linkedin.com/blog/2019/05/building-member-trust-through-a-centralized-and-scalable-setting)
⟡ Nearline System for Scale and Performance (2 parts) at Glassdoor (https://medium.com/glassdoor-engineering/building-a-nearline-system-for-scale-and-performance-part-ii-9e01bf51b23d)
⟡ Real-time User Action Counting System for Ads at Pinterest (https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a)
⟡ API Platform at Riot Games (https://engineering.riotgames.com/news/riot-games-api-deep-dive)
⟡ Games Platform at The New York Times (https://open.nytimes.com/play-by-play-moving-the-nyt-games-platform-to-gcp-with-zero-downtime-cf425898d569)
⟡ Kabootar: Communication Platform at Swiggy (https://bytes.swiggy.com/kabootar-swiggys-communication-platform-e5a43cc25629)
⟡ Simone: Distributed Simulation Service at Netflix (https://medium.com/netflix-techblog/https-medium-com-netflix-techblog-simone-a-distributed-simulation-service-b2c85131ca1b)
⟡ Seagull: Distributed System that Helps Run > 20 Million Tests Per Day at Yelp (https://engineeringblog.yelp.com/2017/04/how-yelp-runs-millions-of-tests-every-day.html)
⟡ PriceAggregator: Intelligent System for Hotel Price Fetching (3 parts) at Agoda
(https://medium.com/agoda-engineering/priceaggregator-an-intelligent-system-for-hotel-price-fetching-part-3-52acfc705081)
⟡ Phoenix: Testing Platform (3 parts) at Tinder (https://medium.com/tinder-engineering/phoenix-tinders-testing-platform-part-iii-520728b9537)
⟡ Hexagonal Architecture at Netflix (https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749)
⟡ Architecture of Sticker Services at LINE (https://www.slideshare.net/linecorp/architecture-sustaining-line-sticker-services)
⟡ Stack Overflow Enterprise at Palantir (https://medium.com/@palantir/terraforming-stack-overflow-enterprise-in-aws-47ee431e6be7)
⟡ Architecture of Following Feed, Interest Feed, and Picked For You at Pinterest (https://medium.com/@Pinterest_Engineering/building-a-dynamic-and-responsive-pinterest-7d410e99f0a9)
⟡ API Specification Workflow at WeWork (https://engineering.wework.com/our-api-specification-workflow-9337448d6ee6)
⟡ Media Database at Netflix (https://medium.com/netflix-techblog/implementing-the-netflix-media-database-53b5a840b42a)
⟡ Member Transaction History Architecture at Walmart (https://medium.com/walmartlabs/member-transaction-history-architecture-8b6e34b87c21)
⟡ Sync Engine (2 parts) at Dropbox (https://dropbox.tech/infrastructure/-testing-our-new-sync-engine)
⟡ Ads Pacing Service at Twitter (https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/how-we-built-twitter-s-highly-reliable-ads-pacing-service)
⟡ Rapid Event Notification System at Netflix (https://netflixtechblog.com/rapid-event-notification-system-at-netflix-6deb1d2b57d1)
⟡ Architectures of Finance, Banking, and Payment Systems (https://www.redhat.com/architect/portfolio/detail/12-integrating-a-modern-payments-architecture)
* **Bank Backend at Monzo** (https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/)
* **Trading Platform for Scale at Wealthsimple** (https://medium.com/@Wealthsimple/engineering-at-wealthsimple-reinventing-our-trading-platform-for-scale-17e332241b6c)
* **Core Banking System at Margo Bank** (https://medium.com/margobank/choosing-an-architecture-85750e1e5a03)
* **Architecture of Nubank** (https://www.infoq.com/presentations/nubank-architecture)
* **Tech Stack at TransferWise** (http://tech.transferwise.com/the-transferwise-stack-heartbeat-of-our-little-revolution/)
* **Tech Stack at Addepar** (https://medium.com/build-addepar/our-tech-stack-a4f55dab4b0d)
* **Avoiding Double Payments in a Distributed Payments System at Airbnb** (https://medium.com/airbnb-engineering/avoiding-double-payments-in-a-distributed-payments-system-2981f6b070bb)
* **Scaling Payments (3 parts) at Etsy** (https://www.etsy.com/sg-en/codeascraft/scaling-etsy-payments-with-vitess-part-3--reducing-cutover-risk)
* **Handling Millions of Digital Transactions Safely Every Day at Paytm** (https://paytm.com/blog/engineering/how-paytm-handles-millions-of-digital-transactions-safely-everyday/)
* **Billing and Payment Platform at Grammarly** (https://www.grammarly.com/blog/engineering/billing-and-payments-platform/)
Interview
⟡ Designing Large-Scale Systems (https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/)
* **My Scaling Hero - Jeff Atwood (a dose of Endorphins before your interview, JK)** (https://blog.codinghorror.com/my-scaling-hero/)
* **Software Engineering Advice from Building Large-Scale Distributed Systems - Jeff Dean** (https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf)
* **Introduction to Architecting Systems for Scale** (https://lethain.com/introduction-to-architecting-systems-for-scale/)
* **Anatomy of a System Design Interview** (https://hackernoon.com/anatomy-of-a-system-design-interview-4cb57d75a53f)
* **8 Things You Need to Know Before a System Design Interview** (http://blog.gainlo.co/index.php/2015/10/22/8-things-you-need-to-know-before-system-design-interviews/)
* **Top 10 System Design Interview Questions** (https://hackernoon.com/top-10-system-design-interview-questions-for-software-engineers-8561290f0444)
* **Top 10 Common Large-Scale Software Architectural Patterns in a Nutshell** (https://towardsdatascience.com/10-common-software-architectural-patterns-in-a-nutshell-a0b47a1e9013)
* **Cloud Big Data Design Patterns - Lynn Langit** (https://lynnlangit.com/2017/03/14/beyond-relational/)
* **How NOT to design Netflix in your 45-minute System Design Interview?** (https://hackernoon.com/how-not-to-design-netflix-in-your-45-minute-system-design-interview-64953391a054)
* **API Best Practices: Webhooks, Deprecation, and Design** (https://zapier.com/engineering/api-best-practices/)
⟡ Explaining Low-Level Systems (OS, Network/Protocol, Database, Storage) (https://www.cse.wustl.edu/~jain/cse567-06/ftp/os_monitors/index.html)
* **The Precise Meaning of I/O Wait Time in Linux** (http://veithen.github.io/2013/11/18/iowait-linux.html)
* **Paxos Made Live – An Engineering Perspective** (https://research.google.com/archive/paxos_made_live.html)
* **How to do Distributed Locking** (https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)
* **SQL Transaction Isolation Levels Explained** (http://elliot.land/post/sql-transaction-isolation-levels-explained)
⟡ "What Happens When... and How" Questions (https://www.glassdoor.com/Interview/What-happens-when-you-type-www-google-com-in-your-browser-QTN_56396.htm)
* **Netflix: What Happens When You Press Play?** (http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html)
* **Monzo: How Peer-To-Peer Payments Work** (https://monzo.com/blog/2018/04/05/how-monzo-to-monzo-payments-work/)
* **Transit and Peering: How Your Requests Reach GitHub** (https://githubengineering.com/transit-and-peering-how-your-requests-reach-github/)
* **How Spotify Streams Music** (https://labs.spotify.com/2018/08/31/smoother-streaming-with-bbr/)
Organization
⟡ Engineering Levels at SoundCloud (https://developers.soundcloud.com/blog/engineering-levels)
⟡ Engineering Roles at Palantir (https://medium.com/palantir/dev-versus-delta-demystifying-engineering-roles-at-palantir-ad44c2a6e87)
⟡ Engineering Career Framework at Dropbox (https://dropbox.tech/culture/our-updated-engineering-career-framework)
⟡ Scaling Engineering Teams at Twitter (https://www.youtube.com/watch?v=-PXi_7Ld5kU)
⟡ Scaling Decision-Making Across Teams at LinkedIn (https://engineering.linkedin.com/blog/2018/03/scaling-decision-making-across-teams-within-linkedin-engineering)
⟡ Scaling Data Science Team at GOJEK (https://blog.gojekengineering.com/the-dynamics-of-scaling-an-organisation-cb96dbe8aecd)
⟡ Scaling Agile at Zalando (https://jobs.zalando.com/tech/blog/scaling-agile-zalando/?gh_src=4n3gxh1)
⟡ Scaling Agile at bol.com (https://hackernoon.com/how-we-run-bol-com-with-60-autonomous-teams-fe7a98c0759)
⟡ Lessons Learned from Scaling a Product Team at Intercom (https://blog.intercom.com/how-we-build-software/)
⟡ Hiring, Managing, and Scaling Engineering Teams at Typeform (https://medium.com/@eleonorazucconi/toby-oliver-cto-typeform-on-hiring-managing-and-scaling-engineering-teams-86bef9e5a708)
⟡ Scaling the Datagram Team at Instagram (https://instagram-engineering.com/scaling-the-datagram-team-fc67bcf9b721)
⟡ Scaling the Design Team at Flexport (https://medium.com/flexport-design/designing-a-design-team-a9a066bc48a5)
⟡ Team Model for Scaling a Design System at Salesforce (https://medium.com/salesforce-ux/the-salesforce-team-model-for-scaling-a-design-system-d89c2a2d404b)
⟡ Building Analytics Team (4 parts) at Wish (https://medium.com/wish-engineering/scaling-the-analytics-team-at-wish-part-4-recruiting-2a9823b9f5a)
⟡ From 2 Founders to 1000 Employees at TransferWise
(https://medium.com/transferwise-ideas/from-2-founders-to-1000-employees-how-a-small-scale-startup-grew-into-a-global-community-9f26371a551b)
⟡ Lessons Learned Growing a UX Team from 10 to 170 at Adobe (https://medium.com/thinking-design/lessons-learned-growing-a-ux-team-from-10-to-170-f7b47be02262)
⟡ Five Lessons from Scaling at Pinterest (https://medium.com/@sarahtavel/five-lessons-from-scaling-pinterest-6a699a889b08)
⟡ Approach to Engineering at Vinted (http://engineering.vinted.com/2018/09/04/how-we-approach-engineering-at-vinted/)
⟡ Using Metrics to Improve the Development Process (and Coach People) at Indeed
(https://engineering.indeedblog.com/blog/2018/10/using-metrics-to-improve-the-development-process-and-coach-people/)
⟡ Mistakes to Avoid while Creating an Internal Product at Skyscanner (https://medium.com/@SkyscannerEng/9-mistakes-to-avoid-while-creating-an-internal-product-63d579b00b1a)
⟡ RACI (Responsible, Accountable, Consulted, Informed) at Etsy (https://codeascraft.com/2018/01/04/selecting-a-cloud-provider/)
⟡ Four Pillars of Leading People (Empathy, Inspiration, Trust, Honesty) at Zalando (https://jobs.zalando.com/tech/blog/four-pillars-leadership/)
⟡ Pair Programming at Shopify (https://engineering.shopify.com/blogs/engineering/pair-programming-explained)
⟡ Distributed Responsibility at Asana (https://blog.asana.com/2017/12/distributed-responsibility-engineering-manager/)
⟡ Rotating Engineers at Zalando (https://jobs.zalando.com/tech/blog/rotating-engineers-at-zalando/)
⟡ Experiment Idea Review at Pinterest (https://medium.com/pinterest-engineering/how-pinterest-supercharged-its-growth-team-with-experiment-idea-review-fd6571a02fb8)
⟡ Tech Migrations at Spotify (https://engineering.atspotify.com/2020/06/25/tech-migrations-the-spotify-way/)
⟡ Improving Code Ownership at Yelp (https://engineeringblog.yelp.com/2021/01/whose-code-is-it-anyway.html)
⟡ Agile Code Base at eBay (https://tech.ebayinc.com/engineering/how-creating-an-agile-code-base-helped-ebay-pivot-for-apple-silicon/)
⟡ Agile Data Engineering at Miro (https://medium.com/miro-engineering/agile-data-engineering-at-miro-ec2dcc8a3fcb)
⟡ Automated Incident Management through Slack at Airbnb (https://medium.com/airbnb-engineering/incident-management-ae863dc5d47f)
⟡ Refactor Organization at BBC (https://medium.com/bbc-product-technology/refactor-organisation-80e4e171d922)
⟡ Code Review (https://ai.google/research/pubs/pub47025)
* **Code Review at Palantir** (https://medium.com/@palantir/code-review-best-practices-19e02780015f)
* **Code Review at LINE** (https://engineering.linecorp.com/en/blog/effective-code-review/)
* **Code Reviews at Medium** (https://medium.engineering/code-reviews-at-medium-bed2c0dce13a)
* **Code Review at LinkedIn** (https://engineering.linkedin.com/blog/2018/06/scaling-collective-code-ownership-with-code-reviews)
* **Code Review at Disney** (https://medium.com/disney-streaming/the-secret-to-better-code-reviews-c14c7884b9ac)
* **Code Review at Netlify** (https://www.netlify.com/blog/2020/03/05/feedback-ladders-how-we-encode-code-reviews-at-netlify/)
Talk
⟡ Distributed Systems in One Lesson - Tim Berglund, Senior Director of Developer Experience at Confluent (https://www.youtube.com/watch?v=Y6Ev8GIlbxc)
⟡ Building Real Time Infrastructure at Facebook - Jeff Barber and Shie Erlich, Software Engineers at Facebook (https://www.usenix.org/conference/srecon17americas/program/presentation/erlich)
⟡ Building Reliable Social Infrastructure for Google - Marc Alvidrez, Senior Manager at Google (https://www.usenix.org/conference/srecon16/program/presentation/alvidrez)
⟡ Building a Distributed Build System at Google Scale - Aysylu Greenberg, SDE at Google (https://www.youtube.com/watch?v=K8YuavUy6Qc)
⟡ Site Reliability Engineering at Dropbox - Tammy Butow, Site Reliability Engineering Manager at Dropbox (https://www.youtube.com/watch?v=ggizCjUCCqE)
⟡ How Google Does Planet-Scale Engineering for Planet-Scale Infra - Melissa Binde, SRE Director for Google Cloud Platform (https://www.youtube.com/watch?v=H4vMcD7zKM0)
⟡ Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix (https://www.youtube.com/watch?v=CZ3wIuvmHeM&t=2837s)
⟡ Achieving Rapid Response Times in Large Online Services - Jeff Dean, Google Senior Fellow (https://www.youtube.com/watch?v=1-3Ahy7Fxsc)
⟡ Architecture to Handle 80K RPS Celebrity Sales at Shopify - Simon Eskildsen, Engineering Lead at Shopify (https://www.youtube.com/watch?v=N8NWDHgWA28)
⟡ Lessons of Scale at Facebook - Bobby Johnson, Director of Engineering at Facebook (https://www.youtube.com/watch?v=QCHiNEw73AU)
⟡ Performance Optimization for the Greater China Region at Salesforce - Jeff Cheng, Enterprise Architect at Salesforce (https://www.salesforce.com/video/1757880/)
⟡ How GIPHY Delivers a GIF to 300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at GIPHY (https://vimeo.com/252367076)
⟡ High Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior Director at Alibaba
(https://www.youtube.com/watch?v=wzsxJqeVIhY&list=PLMu8-hpCxIVENuAue7bd0eCAglLGY_8AW&index=7)
⟡ Solving Large-scale Data Center and Cloud Interconnection Problems - Ihab Tarazi, CTO at Equinix
(https://atscaleconference.com/videos/solving-large-scale-data-center-and-cloud-interconnection-problems/)
⟡ Scaling Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox (https://www.youtube.com/watch?v=PE4gwstWhmc)
⟡ Scaling Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox (https://www.youtube.com/watch?v=IhGWOaD5BYQ)
⟡ Scaling with Performance at Facebook - Bill Jia, VP of Infrastructure at Facebook (https://atscaleconference.com/videos/performance-scale-2018-opening-remarks/)
⟡ Scaling Live Videos to a Billion Users at Facebook - Sachin Kulkarni, Director of Engineering at Facebook (https://www.youtube.com/watch?v=IO4teCbHvZw)
⟡ Scaling Infrastructure at Instagram - Lisa Guo, Instagram Engineering (https://www.youtube.com/watch?v=hnpzNAPiC0E)
⟡ Scaling Infrastructure at Twitter - Yao Yue, Staff Software Engineer at Twitter (https://www.youtube.com/watch?v=6OvrFkLSoZ0)
⟡ Scaling Infrastructure at Etsy - Bethany Macri, Engineering Manager at Etsy (https://www.youtube.com/watch?v=LfqyhM1LeIU)
⟡ Scaling Real-time Infrastructure at Alibaba for Global Shopping Holiday - Xiaowei Jiang, Senior Director at Alibaba
(https://atscaleconference.com/videos/scaling-alibabas-real-time-infrastructure-for-global-shopping-holiday/)
⟡ Scaling Data Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify (https://www.youtube.com/watch?v=cdsfRXr9pJU)
⟡ Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer (https://www.youtube.com/watch?v=jQNCuD_hxdQ&list=RDhnpzNAPiC0E&index=11)
⟡ Scaling Slack - Bing Wei, Software Engineer (Infrastructure) at Slack (https://www.infoq.com/presentations/slack-scalability)
⟡ Scaling Backend at YouTube - Sugu Sougoumarane, SDE at YouTube (https://www.youtube.com/watch?v=5yDO-tmIoXY&feature=youtu.be)
⟡ Scaling Backend at Uber - Matt Ranney, Chief Systems Architect at Uber (https://www.youtube.com/watch?v=nuiLcWE8sPA)
⟡ Scaling Global CDN at Netflix - Dave Temkin, Director of Global Networks at Netflix (https://www.youtube.com/watch?v=tbqcsHg-Q_o)
⟡ Scaling Load Balancing Infra to Support 1.3 Billion Users at Facebook - Patrick Shuff, Production Engineer at Facebook (https://www.youtube.com/watch?v=bxhYNfFeVF4)
⟡ Scaling (an NSFW site) to 200 Million Views A Day And Beyond - Eric Pickup, Lead Platform Developer at MindGeek (https://www.youtube.com/watch?v=RlkCdM_f3p4)
⟡ Scaling Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Garg, SEs at Quora (https://www.infoq.com/presentations/quora-analytics)
⟡ Scaling Git at Microsoft - Saeed Noursalehi, Principal Program Manager at Microsoft (https://www.youtube.com/watch?v=g_MPGU_m01s)
⟡ Scaling Multitenant Architecture Across Multiple Data Centres at Shopify - Weingarten, Engineering Lead at Shopify (https://www.youtube.com/watch?v=F-f0-k46WVk)
Donation
Roses are red. Violets are blue. Binh (https://nguyenquocbinh.org/) likes sweets. Treat Binh to a tiramisu? (https://paypal.me/binhnguyennus) :cake: