
System Design Cheat Sheet

Vivek Singh · May 22 · 10 min read

References: Tech Dummies, System Design, Netflix, GeeksForGeeks

For you to go through just before the interview :).

Disclaimer: This article is still a work in progress. Notes and more architecture designs are yet to be added. Feel free to add comments on what else to add.

Load Balancers (LB)

A load balancer selects servers/databases/caches using one of the following algorithms:

  1. Round Robin: Selects servers one after another.

  2. Weighted Round Robin: Admin assigns a weight to each server, i.e. the probability of selecting it…

  3. Least Connection/Response Time/Resource Based: Dynamic load balancing. The server with the least connections/response time/resource usage is allotted the next request. The values are calculated by an agent (client) installed on the servers.

  4. Similarly, there are weighted flavors of Least Connection/Response Time/Resource Based. (A short sketch of the round-robin variants follows this list.)
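A minimal Python sketch of the two round-robin variants (server names and weights are made up for illustration):

```python
import itertools
import random

servers = ["app-1", "app-2", "app-3"]            # hypothetical backend pool

# Round robin: cycle through the servers one after another
rr = itertools.cycle(servers)
print(next(rr), next(rr), next(rr), next(rr))    # app-1 app-2 app-3 app-1

# Weighted round robin: admin-assigned weights act as selection probabilities
weights = {"app-1": 5, "app-2": 3, "app-3": 1}   # e.g. app-1 is the beefiest box
def pick_weighted(weights: dict) -> str:
    return random.choices(list(weights), weights=list(weights.values()), k=1)[0]

print(pick_weighted(weights))
```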

Types (e.g. in AWS):

  • L4: Makes the balancing decision using only the IP address and TCP port. Cannot see the request headers, client, content type, etc.

  • L7: Has info about the URL, message, request type, headers, client, everything. Can route requests based on the type of request.

Use L4 when you need a simple, reliable, and fast balancing decision based on server load over a plain TCP connection. Use L7 when you need to route requests to the appropriate resource server, e.g. image requests go to the image server.

Types of load balancers in AWS: ELB, ALB, NLB

See Table : https://iamondemand.com/blog/elb-vs-alb-vs-nlb-choosing-the-best-aws-load-balancer-for-your-needs/

Caches

High-speed key-value storage. Two popular implementations: Memcached (used by Netflix & Facebook) and Redis (used at Pinterest).

Use Memcached when: you need simple key-value storage and only need to store strings; you don't need to perform any operational queries on the cache; vertical scaling (more cores and threads) is easier for you; keys are at most 250 bytes and values at most 1 MB; and you are OK with only the LRU eviction policy. It is volatile.

Use Redis when: you need to store objects without serializing/deserializing them, i.e. access or change parts of a data object without having to load the entire object; horizontal scaling is easier for you; you need to store Sets, Hashes, Lists, or Sorted Sets (a non-repeating list of string values ordered by a score); you want to choose from multiple eviction policies; or you want to persist data (it is non-volatile).
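As a quick illustration of the richer data types, a hedged redis-py sketch (host, keys, and values are placeholders; assumes a local Redis instance):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Hash: change one field of an object without rewriting the whole value
r.hset("user:42", mapping={"name": "Ada", "plan": "pro"})
r.hset("user:42", "plan", "free")                 # update a single field in place

# Sorted set: non-repeating members ordered by a score (e.g. a leaderboard)
r.zadd("leaderboard", {"ada": 3100, "bob": 2800})
top10 = r.zrevrange("leaderboard", 0, 9, withscores=True)
```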

Eviction policies

  • No eviction: returns an error when the memory limit is reached.

  • All keys LRU: removes the least recently used keys first.

  • Volatile LRU: removes keys that have an expiration time set, least recently used first.

  • All keys random: removes keys randomly.

  • Volatile random: removes keys that have an expiration time set, randomly.

  • Volatile TTL: removes keys that have an expiration time set, shortest time to live first.
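In Redis these map to the maxmemory-policy setting; a small redis-py sketch of configuring it (policy names are Redis's own, the memory limit is arbitrary):

```python
import redis

r = redis.Redis()

r.config_set("maxmemory", "256mb")
r.config_set("maxmemory-policy", "allkeys-lru")   # others: noeviction, volatile-lru,
                                                  # allkeys-random, volatile-random, volatile-ttl
print(r.config_get("maxmemory-policy"))
```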

Queues

Two popular asynchronous message queuing services: RabbitMQ and Kafka. Both support message queuing and publish/subscribe.

Message Queue: Used to decouple the producer from the consumer. Name a queue X, publish to X, consume from X. Can be used for event-based intra-service communication; consumers listen for events on the queue.
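A minimal pika sketch of that pattern, keeping the queue name X from the text (assumes a local RabbitMQ broker; producer and consumer would normally be separate processes):

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="X")

# Producer: publish to X via the default exchange
channel.basic_publish(exchange="", routing_key="X", body=b"order-created:123")

# Consumer: listen for events on X
def handle(ch, method, properties, body):
    print("got", body)

channel.basic_consume(queue="X", on_message_callback=handle, auto_ack=True)
# channel.start_consuming()   # blocks; run this in the consumer process
```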

Message Queuing

Publish/subscribe: Used for notification systems, backend jobs that take a lot of time, and cases where one action triggers multiple services. Provides: loose coupling between message production and consumption, fault tolerance, retries for messages that failed to be consumed, and independent scalability.

Can be done in the following ways:

Message Exchange (as in RabbitMQ, SQS): Producers submit messages to an exchange. Consumers create a queue and subscribe to the exchange. The exchange sends messages to the appropriate queues. It also takes care of failure, retry, expiry, and routing. Ordering of consumption is not guaranteed.
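A hedged pika sketch of that fan-out through an exchange (exchange and message names are made up):

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# Producer side: publish to an exchange rather than directly to a queue
ch.exchange_declare(exchange="order-events", exchange_type="fanout")
ch.basic_publish(exchange="order-events", routing_key="", body=b"order-shipped:42")

# Each consumer creates its own queue and binds it to the exchange
q = ch.queue_declare(queue="", exclusive=True).method.queue
ch.queue_bind(exchange="order-events", queue=q)
```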

Streaming Platform (as in Kafka, Kinesis): Producers just write to their specific logs (commit logs/write-ahead logs). Consumers read data from the logs using an offset. Failures, retries, deletion, and filtering are all handled by the consumers. Order is guaranteed per consumer per partition per topic. Replaying past messages is possible (move the offset). Uses ZooKeeper (stores information about brokers, topics, partitions, partition leaders/followers, consumer offsets, etc.).

A topic in Kafka is like a table in a database; each message is associated with a topic. Consumers subscribe to a topic (like an update event on a table). Multiple subscribers can read from the same topic (e.g. on an order-shipped message in the shipping topic, one sends a mail, another sends a phone message). Messages are not deleted on reading (they can have a TTL) and are written to disk. Since writes are sequential, they are fast.
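A small kafka-python sketch of producing to and consuming from a topic (topic, group, and broker address are illustrative):

```python
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("shipping", b"order-shipped:42")    # appended to the topic's log
producer.flush()

# A consumer group reads the topic at its own offset; another group (e.g. an
# SMS sender) can independently read the same messages
consumer = KafkaConsumer(
    "shipping",
    bootstrap_servers="localhost:9092",
    group_id="mail-sender",
    auto_offset_reset="earliest",                 # replay from the start of the retained log
)
for msg in consumer:
    print(msg.offset, msg.value)
```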

The central system managing the queues is called the broker.

See Kafka Use cases : https://kafka.apache.org/uses

See RabbitMQ usecases : https://www.rabbitmq.com/getstarted.html

  1. Work Queues: Distributing tasks among workers (the competing consumers pattern), e.g. cron jobs.

  2. Pub/Sub: Sending messages to many consumers at once, e.g. a notification service, intra-service communication.

  3. Routing: Receiving messages selectively, e.g. queuing, retries.

  4. Topics: Receiving messages based on a pattern (topics), e.g. selective notifications.

  5. RPC: The request/reply pattern.

SNS SQS

In SNS/SQS, SQS is simply a queue, while SNS behaves like a broker. SQS queues act as the subscribers, and EC2 instances act as the consumers of the SQS messages. So SNS + SQS is roughly comparable to Kafka; SQS alone is not. SQS is more like a partition in Kafka.
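A hedged boto3 sketch of that SNS → SQS fan-out (names are placeholders; in practice the queue also needs an access policy allowing SNS to deliver to it):

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="order-events")["TopicArn"]
queue_url = sqs.create_queue(QueueName="email-service")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# The queue subscribes to the topic; EC2 workers then poll the queue
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
sns.publish(TopicArn=topic_arn, Message="order-shipped:42")
messages = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=5)
```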

Configuration Service

ZooKeeper: Provides configuration information, naming, synchronization, and group services over large clusters in distributed systems.

Used to manage clusters of servers, databases, or caches. External services can interact with the cluster via ZooKeeper. ZooKeeper names services, knows their IPs, elects leaders, and provides failure recovery.
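A small kazoo sketch of service registration and discovery through ZooKeeper (paths, hosts, and the election convention are illustrative):

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Register this server as an ephemeral, sequential node; it vanishes if the server dies
zk.create("/services/api/node-", b"10.0.0.12:8080",
          ephemeral=True, sequence=True, makepath=True)

# Discovery: list the currently alive servers
alive = zk.get_children("/services/api")

# A common leader-election recipe: the node with the smallest sequence number leads
leader = sorted(alive)[0]
```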

API Gateway vs Service Mesh

Popular apps: Zuul, Amazon API Gateway. Read: My experiences with API gateway

A service mesh’s primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication. Read more: API gateway pattern (microservices.io).

An API Gateway provides a single entry point for a client to a number of different underlying APIs (system interfaces/web services/REST APIs, Lambdas, etc.). It performs traffic management, authorization and access control, monitoring, and API version management.

An API Gateway operates at layer 7. Just like layer 7 load balancers, it knows about content, and thus induces the overhead of an extra hop at the gateway. See: API Gateway vs. Service Mesh — DZone Microservices.

A service mesh (popular example: Istio) operates at layer 4. It provides inter-service communication and works like a sidecar to services. Handles service discovery, circuit breaking, timeouts, retries, encryption, traceability, authentication, and authorization among services.

CDN

Content Delivery Networks (Akamai, Cloudflare, etc.): Mostly store static data, which can either be pushed by the origin servers or pulled by the CDN from the servers on a cache miss or at specified times. Globally distributed, so content stays near clients and is served fast.

Use when you have static data to serve near the client, like images of restaurants/food on Yelp, or a telephone directory.

How to Scale a Database

Reference: Understanding database scaling patterns | by Kousik Nath | Medium

Brief: query optimization -> vertical scaling -> master-slave -> multi-master -> partitioning -> sharding -> multi-datacenter replication

Cassandra

  1. Wide-column NoSQL store.

  2. Tunable consistency -> reads by quorum; the type of quorum defines the consistency level.

  3. Fast writes -> writes go to a log sequentially. Tombstoning and compaction happen in the background.

  4. A KKV store [like a HashMap (partitions) of HashMaps (rows or clusters)]: The first K is the partition key and is used to determine which node the data lives on and where it is found on disk. The partition contains multiple rows within it. The second K (clustering key) finds the row within a partition; it acts as both a primary key within the partition and determines how the rows are sorted. You can think of a partition as an ordered dictionary.

See how Discord used it to scale their message storage. They made the KK in KKV ((channel_id, bucket), message_id), where message_id is a snowflake id.
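A hedged sketch of that layout using the Python cassandra-driver (keyspace and column names are illustrative, not Discord's actual schema):

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("chat")   # assumes a keyspace named 'chat'

# ((channel_id, bucket)) is the partition key (first K); message_id is the
# clustering key (second K), so rows within a partition sort by snowflake id.
session.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        channel_id bigint,
        bucket     int,
        message_id bigint,
        author_id  bigint,
        content    text,
        PRIMARY KEY ((channel_id, bucket), message_id)
    ) WITH CLUSTERING ORDER BY (message_id DESC)
""")
```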

Snowflake at Twitter

Generates tens of thousands of ids per second in a highly available manner; each id is 64 bits (a UUID is 128).

These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how most Twitter clients sort tweets.

To generate the roughly-sorted 64-bit ids in an uncoordinated manner, each id is composed of a timestamp, a worker number, and a sequence number.

Sequence numbers are per-thread, and worker numbers are chosen at startup via ZooKeeper.
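A rough sketch of composing such an id (the 41/10/12 bit split and Twitter epoch below are the commonly cited values, used here for illustration):

```python
import time

EPOCH_MS = 1288834974657          # Twitter's custom epoch (Nov 2010), in milliseconds

def snowflake(worker_id: int, sequence: int) -> int:
    """64-bit id = 41-bit timestamp | 10-bit worker number | 12-bit per-ms sequence."""
    timestamp = int(time.time() * 1000) - EPOCH_MS
    return (timestamp << 22) | ((worker_id & 0x3FF) << 12) | (sequence & 0xFFF)

# Ids generated later get larger values, so they are roughly time-sortable
print(snowflake(worker_id=7, sequence=0))
```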

Some numbers

Availability: 99.99% ~ 50 min downtime/year | 99.999% ~ 5 min downtime/year | 99.9999% ~ 30 seconds downtime/year

Bandwidth: an average EC2 instance has about 5 Gb/s of network bandwidth.

Requests/second: on average, one server can process ~1000 requests/second.

Max WebSocket connections: 65k (65,536) socket connections, i.e. the max number of TCP ports available. Each server can handle 65,536 sockets per single IP address, so the quantity can easily be extended by adding additional network interfaces to a server. (HAProxy and ELB do TCP balancing as well.)

A low-end dedicated MySQL server (2 cores, 4 GB RAM) can serve ~100 requests/sec on average, i.e. ~10 million/day, with a CPU idle rate of 90%.

Some More Numbers

Approximation: 1 day ~ 10⁵ seconds (86,400 s)

Daily active users: Twitter: ~200M/day (~2000/sec), Facebook: ~2Bn (~20,000/sec), WhatsApp: ~200M, Netflix: ~200M/day

Photos uploaded (Instagram): ~200M/day (~2000/sec)

Videos uploaded (YouTube): ~500 hours/minute

Uber: ~20 million trips/day (2*10⁷/day => ~200/sec)

Bandwidth: say 200 reads/second and each read needs 10 KB of data (10⁴ characters): 200 * 10 KB = 2000 KB/sec => 2 MB/sec
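The same back-of-the-envelope arithmetic as a tiny script (the inputs are the rough estimates above, not measurements):

```python
SECONDS_PER_DAY = 10**5              # ~86,400, rounded for estimation

reads_per_sec = 200
bytes_per_read = 10 * 1024           # ~10 KB, i.e. ~10^4 characters

bandwidth_mb_per_sec = reads_per_sec * bytes_per_read / 1_000_000   # ≈ 2 MB/s
reads_per_day = reads_per_sec * SECONDS_PER_DAY                     # ≈ 20 million/day
print(bandwidth_mb_per_sec, reads_per_day)
```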

……………….Architecture Designs…………………..

Instagram

Reference: System Design Analysis of Instagram

  • Precompute: User A posts “x”. Users B, C, and D follow A. Then “x” can be appended to B, C, and D's news feeds inside the cache. This is called fanning out. In the hybrid approach, if A is a celebrity, their post is not fanned out (one post would need to be written to millions of caches). See the sketch after this list.

  • The notification server needs to maintain a persistent WebSocket connection to push data to user B.

  • Place a metadata service to: provide separation of concerns and access to the DB via an API; it can also act as a caching layer, so metadata is accessed via the service cache rather than the DB.
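A minimal fan-out-on-write sketch over Redis lists (function, key names, and the celebrity cutoff are made up for illustration):

```python
import redis

r = redis.Redis()
CELEB_FOLLOWER_CUTOFF = 1_000_000     # hypothetical threshold for the hybrid approach

def on_new_post(author_id: int, post_id: int, follower_ids: list[int]) -> None:
    if len(follower_ids) >= CELEB_FOLLOWER_CUTOFF:
        return                        # celebrity posts are pulled at read time instead
    for fid in follower_ids:          # fan out: push the post into each follower's feed cache
        r.lpush(f"feed:{fid}", post_id)
        r.ltrim(f"feed:{fid}", 0, 499)   # keep each cached feed bounded
```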

Uber

Reference: UBER System design

Best Article so far: A must read

Web Crawler

Reference: RoadToArchitect, https://www.educative.io/collection/page/5668639101419520/5649050225344512/5718998062727168

Note:

Get the document, remove duplicate documents using a checksum, extract the URLs in the page, remove duplicate URLs, and feed them to the queue to parse.

The fetcher maps host names to their robots.txt values.

Domain name resolution: Before contacting a web server, a web crawler must use the Domain Name Service (DNS) to map the web server’s hostname to an IP address. DNS name resolution will be a big bottleneck for our crawlers given the number of URLs we will be working with. To avoid repeated requests, we can cache DNS results by building our own local DNS server.
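A hedged sketch of the checksum de-duplication and local DNS caching steps (helper names are made up; a real crawler would persist both):

```python
import hashlib
import socket
from functools import lru_cache

seen_checksums: set[str] = set()

def is_duplicate_document(body: bytes) -> bool:
    checksum = hashlib.sha256(body).hexdigest()
    if checksum in seen_checksums:
        return True
    seen_checksums.add(checksum)
    return False

@lru_cache(maxsize=100_000)           # stand-in for a local caching DNS server
def resolve(hostname: str) -> str:
    return socket.gethostbyname(hostname)
```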

URL Shortener

Reference: Educative, Detailed components

The key point here is having a separate key generation service and a cleanup service.
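A small sketch of a key generation service that pre-generates base62 keys from a counter (alphabet and key length are arbitrary choices):

```python
import string

ALPHABET = string.digits + string.ascii_letters    # base62

def to_base62(n: int) -> str:
    out = ""
    while True:
        n, r = divmod(n, 62)
        out = ALPHABET[r] + out
        if n == 0:
            return out

# Key Generation Service: converts unique counters into short keys ahead of time;
# assigning a key to a long URL is then just popping a pre-generated key.
unused_keys = [to_base62(i).rjust(7, "0") for i in range(1_000_000, 1_000_100)]
short_key = unused_keys.pop()
```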

Yelp

Reference: Tech Dummies

Dropbox

Resource: Dropbox system design

Distributed Message Queue

Resource : System Design Interview

Distributed Cache

Figures: unsharded vs. sharded distributed cache.

Distributed Cache at Netflix

Reference: Netflix, Netflix Implementation, Level 1, Level 2

Twitter

Reference: Tech Dummies

Netflix

Distributed Rate Limiter

Resources: System design Interview

  • The Client Identifier Builder assigns a unique id per client to identify the request originator.

  • The rate limiter coordinates with the throttler and the processor to pass or reject a request.

  • The throttling service implements token bucket or sliding window algorithms using the rules stored in the rules DB (a small sketch follows).

Token Bucket algorithm
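A minimal in-memory token-bucket sketch (a real throttling service would keep buckets in shared storage such as Redis so all hosts see the same counts):

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # out of tokens: throttle this request

bucket = TokenBucket(capacity=10, refill_per_sec=5)   # ~5 requests/second sustained
```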

Notification Service

Resources: System Design Interview. Kind of like building RabbitMQ.

  • A metadata service for caching and separation of concerns.

  • Temporary storage to hold messages for a while.

  • The sender reads from the metadata service to know which message to send to whom.

The frontend service host caches the metadata.

Temporary storage can be NoSQL: key-value (Redis), column-based (Cassandra), or a streaming platform (Kafka).

Message sharing within cluster

Resources: System design Interview
