📖
System Design
  • Home
  • Learn System Design Daily
  • System Design Steps
  • List of design questions:
  • My System Design Interview Checklist
  • CheatSheet
  • Programming Language Jargons
  • Scaleable Design
  • Agile Scrum
  • Uber
  • Gmail System Design
  • Distributed Rate Limiting
  • Audio Search Engine
  • Code Style and Review
  • Calling App Design
  • Low Level Design: Payment Tracking App
  • Machine Coding :Cache
  • Interview Advice
  • URL Shortner
  • Unique ID in Distributed Systems
  • Load Balancing Algorithm
  • API Architecture
  • Desgin Practise for Rest API
  • Performance Practise for API
  • API Gateway
  • API Security Hacks
  • Distributed Design Patterns
  • Fault Tolerance in Distributed Systems
  • Microservice Communication Design Patterns
  • Zipping /Compression
  • Database
  • Mongo DB
  • SQL
  • PostgreSQL
  • Database Designs
    • Designing a location based database
  • Building Database
  • Design Patterns
    • Microservice Architecture 10 Design Patterns
    • Interaction Patterns
  • Locale.ai
  • Version Control
  • Caches / Caching
  • High Level Design
  • Low Level Design
  • Containers, Docker
  • Docker
  • Linux Directories
  • Design Pattern for Software Architect
  • S.O.L.I.D Principles
  • Monitoring and Telemetry for Production System
  • C4 model
  • LRU Cache
  • VSCode
  • Chatbot Architecture
  • Streaming API Repsonse
  • Latency in System Design
  • Cloud
    • Azure
    • AWS
  • Builds
    • Jenkins
Powered by GitBook
On this page
  • System Design Cheat Sheet
  • Load Balancers (LB)
  • Caches
  • Queues
  • Configuration Service
  • API Gateway vs Service Mesh
  • CDN
  • How to Scale database
  • Cassandra
  • Snowflake at Twitter
  • Some numbers
  • Some More Numbers
  • ……………….Architecture Designs…………………..
  • Instagram
  • Uber
  • Web Crawler
  • URL Shortner
  • Yelp
  • Dropbox
  • Distributed Message Queue
  • Distributed Cache
  • Distributed Cache at Netflix
  • Twitter
  • Netflix
  • Distributed Rate Limiter
  • Notification Service
  • Message sharing within cluster

Was this helpful?

CheatSheet

PreviousMy System Design Interview ChecklistNextProgramming Language Jargons

Last updated 3 years ago

Was this helpful?

System Design Cheat Sheet

Reference: , , ,

For you to go through just before the interview :).

Disclaimer: This article is still under work. Still have to add notes and more architecture designs. Feel free to add comments on what else to add.

Load Balancers (LB)

Selects Servers/Databases/Caches following

  1. Round Robin: Select servers one after another

  2. Weighted Round Robin: Admin assigns weight, i.e. probability…

  3. Least Connection/Response Time/ Resource Based: Dynamic Load balancing. Server with Least Connection/Response Time/ Resource Based is allotted next request. The values are calculated using client installed at servers.

  4. Similarly we have Weighted flavor of Least Connection/Response Time/ Resource Based.

: (Ex )

  • L4 : Makes balancing decision only on IP address , tcp port. Cannot see request header, client, type etc.

  • L7 : Has info about url, message, request type, header, client everything. Can route request based on type of request

Use L4 when you need to make simple reliable and fast balancing decision on server load, which has reliable TCP connection. Use L7 when you need to route request to appropriate resource server, such as image request will go to image server etc.

Types of Load Balancers in AWS : ELB, ALB, NLB

Caches

Use Memcached when : Simple Key Value storage needed. Need to only store string. No need to perform any operational query on cache. Scaling vertically by using more cores and threads is easier. When keys are maximum 250 B and Values maximum 1MB, when you are ok with only LRU eviction policy. Its Volatile.

Use Redis when: You need to store objects (don't want to serialize deserialize : access or change parts of a data object without having to load the entire object). Scaling horizontally is easier. You need to store Set, Hash, List, Sorted Set (A non-repeating list of string values ordered by a score value). When you want to chose from multiple eviction policies. When you would want to save data (its non volatile)

  • No eviction returning an error the memory limit is reached.

  • All keys LRU removing keys by the least recently used first

  • Volatile LRU removing keys, that have an expiration time set, by the least recently used first.

  • All keys random removing keys randomly

  • Volatile random removing keys, that have an expiration time set, randomly

  • Volatile TTL removing keys, that have an expiration time set, by the shortest time to live first.

Queues

Message Queue: Used to decouple producer from consumer. Name a queue as X, publish to X, consume from X. Can be used for events based intra service communication. Consumer listen for events in queue.

Publish/subscribe: Used in Notification system, backend job that takes lot of time and one action that triggers multiple services. Provides : lose coupling between message production and consumption, fault tolerance, retry messages failed to be consumed and independently scalable.

Can be done in following ways

Streaming Platform ( as in Kafka, Kinesis): Producers just write to their specific logs(Commit log/Write ahead logs). Consumers read data from logs using offset. Failures, retries, deletion, filtering all handled by consumers. Order is guaranteed per consumer per partition per topic. Replaying past message is possible(move the offset). Uses Zookeeper ( store information about brokers, topics, partitions, partition leader/followers, consumer offsets, etc.).

Topics in Kafka is like table in database. Thus, each message is associated with topic. Consumers subscribe to topic ( like update event on table). Multiple subscribers read from a topic( eg, one can send mail, other send phone message on order ship message in shipping topic). Messages not deleted on reading ( can have TTL), and written on disk. Since its sequential write its faster.

Central system managing queue is called Broker.

  1. Pub Sub: Sending messages to many consumers at once : Notification service, intra service communiation

  2. Routing: Receiving messages selectively : Queuing, retries

  3. Topics: Receiving messages based on a pattern (topics) : Selective Notification

In SNS SQS, SQS is simply a queue, SNS on the other hand behaves like a broker. SQS behaves as subscribers EC2 behaves as consumers of SQS messages So SNS+SQS = Kafka. SQS alone is not. SQS is more like like partition in Kafka.

Configuration Service

Used to manage cluster of servers, databases or caches. External services can interact with clusters via zookeeper. Zookeeper names services, knows their ips, elects leader, provides failure recovery.

API Gateway vs Service Mesh

API Gateway provides a single entry point for a client for a number of different underlying APIs (system interfaces/web services/Rest APIs, Lambdas, etc.). Performs traffic management, authorization and access control, monitoring, and API version management.

CDN

Content Delivery Networks (Akamai, Cloudfare etc) : Mostly store static data which can be pushed by servers or CDN can pull from servers on a miss or at speculated times. Globally distributed so stays near clients and is fast.

Use when you have some static data to serve near client, like images of restaurants/food on yelp, telephone directory

How to Scale database

Brief: Query optimization -> vertical scaling -> Master slave -> multi master -> partitioning -> sharding -> multi-datacenter replication

Cassandra

  1. Wide Column NoSQL

  2. Tunable consistency -> read by qourum. type of quorum defines consistency level

  3. Fast wright -> Writes in log sequentially. Tombstoning and Compation in backend.

Snowflake at Twitter

Generates tens of thousands of ids per second in a highly available manner and is 64 bits (UUID is 128)

These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how most Twitter clients sort tweets.

To generate the roughly-sorted 64 bit ids in an uncoordinated manner, ids are generated as composition of: timestamp, worker number and sequence number.

Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper.

Some numbers

Availability : 99.99% ~ 50 min downtime/year | 99.999 % ~ 5min downtime /year | 99.9999% ~= 30seconds downtime/year

Bandwidth : Average EC2 instance is 5Gb/s

Requests/second : On average 1 server can process 1000 requests/second

Low end dedicated MySQL server (2 cores, 4 GB RAM) can serve 100 requests/sec on an average ~10 million/day with CPU idle rate of 90%.

Some More Numbers

Approximation : 1 day ~ 10⁴ seconds

Daily active users : Twitter: ~200M/day (200/sec), Facebook: ~2Bn (2000/sec), Whatsapp: ~200M, Netflix: 200M/day

Photos uploaded: 200M/day (200/sec)(Instagram)

Videos uploaded: 500 hours/minute (Youtube)

Uber: 20 million trips/day (20*10⁶ / day => 2000/sec)

Bandwidth : Say 200 reads/second, each read needs 10kb data (10⁴ characters ) 2000kb/sec => 2MB/sec

……………….Architecture Designs…………………..

Instagram

  • Precompute : User A posts “x”. User B,C,D follows. Then “x” can be appended to User B,C,D news feed inside cache. This is called fanning out. In hybrid approach, if A is a celebrity, then its post is not fanned out ( 1 post need to be written to millions cache).

  • Notification server needs to maintain a persistent websocket connection to push data to User B

  • Place Metadata service to : provide separation of concern, access to DB via API, also can act as a caching layer, metadata can be accessed via service cache rather than db.

Uber

Web Crawler

Note:

Get document, remove duplicate documents using checkum, extract URLs in page, remove duplicate urls, feed to the queue to parse.

Fetcher will map host name to robots.txt values.

Domain name resolution: Before contacting a Web server, a Web crawler must use the Domain Name Service (DNS) to map the Web server’s hostname into an IP address. DNS name resolution will be a big bottleneck of our crawlers given the amount of URLs we will be working with. To avoid repeated requests, we can start caching DNS results by building our local DNS server.

URL Shortner

Key point here is a separate key generation service and cleanup service

Yelp

Dropbox

Distributed Message Queue

Distributed Cache

Distributed Cache at Netflix

Twitter

Netflix

Distributed Rate Limiter

  • Client Identifier Builder assigns unique id/client to get originator.

  • Rate limiter coordinates with throttler and processor to pass or reject.

  • Throttling Service implements token bucket, sliding window using requests stored in rules db.

Notification Service

  • Metadata service for caching and separation of concern.

  • Temporary storage to store message for a while

  • Sender reads data from Metadata service to know which message to send to whom.

Temporary storage can be NoSQL, Key Value (Redis) or Column Based (Cassandra) or streaming service (kafka)

Message sharing within cluster

420

See Table :

Two popular Implementation Memcached (used by Netflix & Facebook) & Redis (used at )

Two popular async message queuing services:. Supports message queuing and publish/subscribe.

Message Queuing

Message Exchange ( as in Rabbit MQ, SQS): Producers submit messages to exchange. Consumers create a queue, subscribes to exchange. Exchange sends message to appropriate queues. It also takes care of failure, retry, expiry, routing. Ordering of consumption not guaranteed.

See Kafka Use cases :

See RabbitMQ usecases :

Work Queues : Distributing tasks among workers (the ) : Cron job.

RPC: example.

: Provides configuration information, naming, synchronization and group services over large clusters in distributed systems.

Popular apps: Zuul, . Read :

A service mesh’s primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication.

API Gateway operates at Layer 7. Just like layer 7 load balancers it knows about content. Thus induces an overhead in extra hop at API gateway.

(Popular example Istio) operates at layer 4. Provides inter service communication. Works like a sidecar to services. Handles service discovery, circuit breaking, timeouts, retries, encryption, tracebility, authentication and authorization among services.

Reference:

[Like HashMap(partition) of HashMap(Rows or clusters)]: First K is the partition key and is used to determine which node the data lives on and where it is found on disk. The partition contains multiple rows within it . Second K (clustering key) finds the row within a partition. The clustering key (Second K) acts as both a primary key within the partition and how the rows are sorted. You can think of a partition as an ordered dictionary.

See how used it to scale their storage. They made KK in KKV as ((channel_id, bucket), message_id). Message id was a .

Max Websocket Connection: 65k (65,536) socket connections i.e max number of TCP ports available. Ea. So the quantity can be easily extended by adding additional network interfaces to a server. (HaProxy, ELB does TCP balancing as well)

Reference:

Reference:

: A must read

Reference:

Reference: Detailed components

Reference:

Resource:

Resource :

UnshardedSharded

Reference: Netflix ImplementationLevel 1Level 2

Reference:

Resources:

Token Bucket algorithm

Resources: . Kind of like building RabbitMQ

Frontend Service Host caches the metadata

Resources:

https://iamondemand.com/blog/elb-vs-alb-vs-nlb-choosing-the-best-aws-load-balancer-for-your-needs/
High Speed Key Value storage.
Pinterest
Eviction policies
RabbitMQ and Kafka
https://kafka.apache.org/uses
https://www.rabbitmq.com/getstarted.html
competing consumers pattern
Request/reply pattern
SNS SQS
Zookeeper
Amazon API Gateway
My experiences with API gateway
Service Mesh
Understanding database scaling patterns | by Kousik Nath | Medium
A KKV value storage
Discord
snowflake id
ch server can handle 65,536 sockets per single IP address
UBER System design
System design Interview
Vivek Singh
May 22·10 min read
Tech Dummies
System Design
Netflix
GeeksForGeeks
some algorithm
Types
AWS
Read more.
API gateway pattern (microservices.io)
API Gateway vs. Service Mesh — DZone Microservices
System Design Analysis of Instagram
Best Article so far
RoadtoArchitect
https://www.educative.io/collection/page/5668639101419520/5649050225344512/5718998062727168
Educative
Tech Dummies
Dropbox system design
System Design Interview
Netflix
Tech Dummies
System design Interview
System design Interview
Vivek Singh