• Original address: github.com/donnemartin…
  • The Nuggets translation Project
  • Translators: XatMassacrE, L9m, Airmacho, Xiaoyusilen, Jifaxu
  • Please continue to follow the Chinese maintenance link for the latest content.

Introduction to System Design








Translations

Interested in translating? The following translations are in progress:

  • Brazilian Portuguese
  • Simplified Chinese (completed)
  • Turkish

purpose

Learn how to design large systems.

Prepare for the system design interview.

Learn how to design large systems

Learning how to design scalable systems will help you become a better engineer.

System design is a broad topic. Resources on system design principles are plentiful on the Internet.

This repository is an organized collection of resources that can help you learn how to build systems at scale.

Learn from the open source community

This is an early draft of a continually updated open source project.

Contributions are welcome!

Prepare for the system design interview

In addition to coding interviews, system design is a required component of the technical interview process at many tech companies.

Practice common system design interview questions and compare your results with sample solutions: discussions, code, and diagrams.

Other interview preparation topics:

  • Learning guidance
  • How to handle a system design interview question
  • System design interview questions, with solutions
  • Object-oriented design interview questions, with solutions
  • Additional system design interview questions

flashcards








The flashcard decks provided here use spaced repetition to help you retain key system design concepts.

  • System design deck
  • System design exercises deck
  • Object-oriented design exercises deck

Great for use anytime, anywhere.

Coding resource: interactive coding challenges

Are you looking for resources to prepare for a programming interview?








Check out the sister repository Interactive Coding Challenges, which includes an additional flashcard deck:

  • Coding deck

contribution

Learn from the community.

Feel free to submit pull requests to help:

  • Fix errors
  • Improve sections
  • Add new sections

Content that needs some polish is placed under development.

See the contribution guide.

Index of system design topics

Summaries of various system design topics, including pros and cons. Everything is a trade-off.

Each chapter contains links to more resources.








  • System design topics: start here
    • Step 1: Review the scalability video lecture
    • Step 2: Review the scalability article
    • Next steps
  • Performance and scalability
  • Latency and throughput
  • Availability and consistency
    • CAP theorem
      • CP – consistency and partition tolerance
      • AP – availability and partition tolerance
  • Consistency patterns
    • Weak consistency
    • Eventual consistency
    • Strong consistency
  • Availability patterns
    • Failover
    • Replication
  • Domain name system
  • Content delivery network (CDN)
    • CDN push
    • CDN pull
  • Load balancer
    • Active-passive failover
    • Active-active failover
    • Layer 4 load balancing
    • Layer 7 load balancing
    • Horizontal scaling
  • Reverse proxy (web server)
    • Load balancer vs reverse proxy
  • Application layer
    • Microservices
    • Service discovery
  • Database
    • Relational database management system (RDBMS)
      • Master-slave replication
      • Master-master replication
      • Federation
      • Sharding
      • Denormalization
      • SQL tuning
    • NoSQL
      • Key-value store
      • Document store
      • Wide column store
      • Graph database
    • SQL or NoSQL
  • Cache
    • Client cache
    • CDN cache
    • Web server cache
    • Database cache
    • Application cache
    • Database query-level caching
    • Object-level caching
    • When to update the cache
      • Cache-aside
      • Write-through
      • Write-behind (write-back)
      • Refresh-ahead
  • Asynchronism
    • Message queues
    • Task queues
    • Back pressure
  • Communication
    • Hypertext Transfer Protocol (HTTP)
    • Transmission Control Protocol (TCP)
    • User Datagram Protocol (UDP)
    • Remote procedure call (RPC)
    • Representational state transfer (REST)
  • Security
  • Appendix
    • Powers of two table
    • Latency numbers every programmer should know
    • Additional system design interview questions
    • Real-world architectures
    • Company architectures
    • Company engineering blogs
  • Under development
  • Credits
  • Contact info
  • License

Learning guidance

Review the recommended topics based on your interview timeline (short, medium, or long).


Q: Do I need to know everything here for an interview?

A: No, you don’t need to know everything just to prepare for the interview.

What you will be asked in an interview depends on the following factors:

  • Your experience
  • Your technical background
  • The position you are interviewing for
  • The company you are interviewing with
  • Luck

Experienced candidates are usually expected to know more about system design. The architect or team lead is expected to know more than just their individual contributions. Top tech companies also usually have one or more system design interviews.

Interviews will be broad in some areas and dive deep in others. Knowing a little about a variety of key system design topics will help. Adjust the following guide based on your timeline, experience, the positions you are interviewing for, and the companies you are interviewing with.

  • Short timeline – Aim for breadth with system design topics. Practice by solving some interview questions.
  • Medium timeline – Aim for breadth and some depth with system design topics. Practice by solving many interview questions.
  • Long timeline – Aim for breadth and more depth with system design topics. Practice by solving most interview questions.
                                                                     Short    Medium   Long
Read through the System design topics index for a broad
understanding of how systems work                                    :+1:     :+1:     :+1:
Read through a few articles in the Company engineering blogs
for the companies you are interviewing with                          :+1:     :+1:     :+1:
Read through a few Real-world architectures                          :+1:     :+1:     :+1:
Review How to handle a system design interview question              :+1:     :+1:     :+1:
Work through System design interview questions with solutions        Some     Many     Most
Work through Object-oriented design interview questions
with solutions                                                        Some     Many     Most
Review Additional system design interview questions                  Some     Many     Most

How to handle a system design interview question

The system design interview is an open-ended conversation. You are expected to lead it.

You can use the following steps to guide the discussion. To help solidify this process, work through the System design interview questions with solutions section using the following steps.

Step 1: Describe usage scenarios, constraints, and assumptions

Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.

  • Who will use it?
  • How will they use it?
  • How many users?
  • What is the role of the system?
  • What are the inputs and outputs of the system?
  • How much data do we want to process?
  • How many requests do we want to process per second?
  • What is the expected read-to-write ratio?

Step 2: Create a high-level design

Outline a high-level design with all important components.

  • Sketch the main components and connections
  • Justify your ideas

Step 3: Design the core components

Dive into the details of each core component. For example, if asked to design a URL shortening service, discuss the following (a minimal hashing sketch follows this list):

  • Generating and storing a hash of the full URL
    • MD5 and Base62
    • Hash collisions
    • SQL or NoSQL
    • Database schema
  • Translating a hashed URL to the full URL
    • Database lookup
  • API and object-oriented design
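To make the MD5 and Base62 points above concrete, here is a minimal sketch in Python. The Base62 alphabet, the 7-character key length, and the url_to_key helper are illustrative assumptions, not part of the original discussion; a real design would also handle collisions against the data store.

import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num):
    """Encode a non-negative integer using the Base62 alphabet."""
    if num == 0:
        return BASE62[0]
    digits = []
    while num > 0:
        num, remainder = divmod(num, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))

def url_to_key(url, length=7):
    """Hash the full URL with MD5, then Base62-encode a prefix of the digest.

    Collisions are still possible, so a real system would check the data store
    and re-hash (for example, with a salt) on conflict.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return base62_encode(int(digest, 16))[:length]

print(url_to_key("https://example.com/some/very/long/path"))  # deterministic 7-character key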

Step 4: Scale the design

Identify and address bottlenecks, given the constraints. For example, do you need the following to address scalability issues?

  • Load balancer
  • Horizontal scaling
  • Caching
  • Database sharding

Discuss potential solutions and trade-offs. Everything is a trade-off. Address bottlenecks using principles of scalable system design.

Back-of-the-envelope calculations

You might be asked to do some estimates by hand (a worked sketch follows this list). See the Appendix for the following resources:

  • Use back-of-the-envelope calculations
  • Powers of two table
  • Latency numbers every programmer should know
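A sketch of the kind of arithmetic involved; all input numbers below are made-up assumptions purely for illustration.

# Back-of-the-envelope sketch: rough QPS and storage estimates for a
# hypothetical write-heavy service.  All input numbers are assumptions.
writes_per_month = 40_000_000          # assumed write volume
read_write_ratio = 100                  # assumed 100:1 reads to writes
bytes_per_record = 1_000                # assumed average record size
seconds_per_month = 30 * 24 * 3600      # roughly 2.5 million seconds

write_qps = writes_per_month / seconds_per_month
read_qps = write_qps * read_write_ratio
storage_per_year_gb = writes_per_month * 12 * bytes_per_record / 1e9

print(f"~{write_qps:.0f} write QPS, ~{read_qps:.0f} read QPS")
print(f"~{storage_per_year_gb:.0f} GB of new data per year")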

Related resources and further reading

Check out the following links to get a better idea of what to expect:

  • How to ace a systems design interview
  • The system design interview
  • Intro to architecture and systems design interviews

System design interview questions with solutions

Common system design interview questions with sample discussions, code, and diagrams.

Solutions are linked to content in the solutions/ folder.

  • Design Pastebin.com (or bit.ly) – Solution
  • Design the Twitter timeline and search (or Facebook feed and search) – Solution
  • Design a web crawler – Solution
  • Design Mint.com – Solution
  • Design the data structures for a social network – Solution
  • Design a key-value store for a search engine – Solution
  • Design Amazon's sales rankings by category feature – Solution
  • Design a million-user system on AWS – Solution
  • Add a system design question – Contribute

Design Pastebin.com (or bit.ly)

See practices and solutions


Designing Twitter timelines and searches (or Facebook feeds and searches)

See practices and solutions


Design a web crawler

See practices and solutions


Design: Mint.com

See practices and solutions


Design data structures for a social network

See practices and solutions


Design a key-value store for a search engine

See practices and solutions


Design Amazon sales rankings by category

See practices and solutions


Design a million-user system on AWS

See practices and solutions


Object-oriented design interview questions with solutions

Common object-oriented design interview questions with sample discussions, code, and diagrams.

Solutions are linked to content in the solutions/ folder.

Note: this section is under development

  • Design a hash map – Solution
  • Design an LRU cache – Solution
  • Design a call center – Solution
  • Design a deck of cards – Solution
  • Design a parking lot – Solution
  • Design a chat service – Solution
  • Design a circular array – Contribute
  • Add an object-oriented design question – Contribute

System design topics: start here

New to system design?

First, you'll need a basic understanding of common principles: what they are, how they are used, and their pros and cons.

Step 1: Review the scalability video lecture

Harvard University Lecture on Scalability

  • Topics include
    • Vertical Scaling
    • Horizontal scaling
    • The cache
    • Load balancing
    • Database replication
    • Database partition

Step 2: Review the scalability article

scalability

  • Topics covered:
    • Clones
    • The database
    • The cache
    • asynchronous

Next steps

Next, we’ll look at higher-order trade-offs:

  • Performance and scalability
  • Latency and throughput
  • Availability and consistency

Keep in mind that everything is a trade-off.

Then, we’ll dive into more specific topics such as DNS, CDN, and load balancers.

Performance and scalability

A service is scalable if it results in increased performance in a manner proportional to the resources added. Generally, increasing performance means serving more units of work, but it can also mean handling larger units of work, such as when data sets grow. [1]

Another way to look at performance and scalability:

  • If your system has performance issues, it’s slow for a single user.
  • If your system has scalability issues, individual users are fast but slow under high loads.

Sources and further reading

  • A few words about scalability
  • Scalability, availability, stability and patterns

Latency and throughput

Latency is the time required to perform an action or to produce a result.

Throughput is the number of such actions or results per unit of time.

In general, you should aim to maximize throughput with acceptable levels of latency.

Sources and further reading

  • Understand latency and throughput

Availability and consistency

CAP theorem








Source: CAP theory

In a distributed computer system, you can only support two of the following guarantees:

  • Consistency – Every read receives the most recent write or an error
  • Availability – Every request receives a response, without guarantee that it contains the most recent version of the information
  • Partition tolerance – The system continues to operate despite arbitrary partitioning due to network failures

Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software trade-off between consistency and availability.

CP – consistency and partition tolerance

Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.

AP – availability and partition tolerance

Responses return the most readily available version of the data on the node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.

AP is a good choice if the business needs to allow for eventual consistency, or when the system needs to continue working despite external errors.

Sources and further reading

  • CAP theorem revisited
  • A plain English introduction to CAP theorem
  • CAP FAQ

Consistency patterns

With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Recall the definition of consistency from the CAP theorem – every read receives the most recent write or an error.

Weak consistency

After a write, reads may or may not see it. A best-effort approach is taken.

This can be seen in systems such as Memcached. Weak consistency works well in real-world use cases such as VoIP, video chat, and real-time multiplayer games. For example, if you lose your signal for a few seconds during a call, you can’t hear those seconds when you reconnect.

Eventual consistency

After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.

This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.

Strong consistency

After a write, reads will see it. Data is replicated synchronously.

This approach is seen in file systems and relational databases (RDBMS). Strong consistency works well in systems that need transactions.

Sources and further reading

  • Transactions across data centers

Availability patterns

There are two complementary patterns to support high availability: failover and replication.

Failover

Active-passive failover

With active-passive failover, the active server sends periodic heartbeats to the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active server's IP address and resumes service.

The length of downtime is determined by whether the passive server is already running in "hot" standby or whether it needs to start up from "cold" standby. Only the active server handles traffic.

Active-passive failover can also be referred to as master-slave failover.

Active-active failover

In active-active failover, both servers manage traffic, spreading the load between them.

If the servers are public-facing, DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.

Active-active failover can also be referred to as master-master failover.

Disadvantages: failover

  • Failover adds more hardware and additional complexity.
  • There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive system.

Replication

Master-slave replication and master-master replication

This topic is further discussed in the Database section:

  • Master-slave replication
  • Master-master replication

Domain name system








Source: DNS security introduction

A domain name system (DNS) translates a domain name such as www.example.com to an IP address.

DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower-level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the time to live (TTL).
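For instance, you can observe the result of DNS resolution from application code with Python's standard library; this is a minimal sketch, and the domain below is just an example:

import socket

# Resolve a hostname to its IP addresses using the system's DNS resolver.
# The OS and resolver may answer from cache, subject to the record's TTL.
infos = socket.getaddrinfo("www.example.com", 443, proto=socket.IPPROTO_TCP)
addresses = sorted({info[4][0] for info in infos})
print(addresses)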

  • NS record (name server) – Specifies the DNS servers for your domain/subdomain.
  • MX record (mail exchange) – Specifies the mail servers for accepting messages.
  • A record (address) – Points a name to an IP address.
  • CNAME (canonical) – Points a name to another name or CNAME (example.com to www.example.com) or to an A record.

Services such as CloudFlare and Route 53 provide managed DNS. Some DNS services can route traffic through various methods:

  • Weighted round robin
    • Prevent traffic from going to servers under maintenance
    • Balance between varying cluster sizes
    • A/B testing
  • Latency-based routing
  • Geolocation-based routing

Disadvantages: DNS

  • Accessing a DNS server introduces a slight delay, although this is mitigated by caching.
  • DNS server management can be complex; servers are generally managed by governments, ISPs, and large companies.
  • DNS services have recently come under DDoS attack, preventing users who did not know Twitter's IP address from accessing Twitter.

Sources and further reading

  • DNS architecture
  • Wikipedia
  • An article on DNS

Content Delivery Network (CDN)








Source: Why use CDN

A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from the CDN, although some CDNs such as Amazon CloudFront also support dynamic content. The site's DNS resolution will tell clients which server to contact.

Serving content from CDNs can significantly improve performance in two ways:

  • Users receive content from data centers close to them
  • Your servers do not have to serve requests that the CDN fulfills

CDN Push

Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic but maximizing storage.

CDN Pull

Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.

A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.

Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly, with only recently-requested content remaining on the CDN.

Disadvantages: CDN

  • CDN costs could be significant depending on traffic, although this should be weighed against the costs you would incur not using a CDN.
  • Content might be stale if it is updated before the TTL expires.
  • CDNs require changing URLs for static content to point to the CDN.

Sources and further reading

  • Global content distribution network
  • CDN pull and CDN push difference
  • Wikipedia

Load balancer








Source: Extensible system design patterns

Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:

  • Preventing requests from going to unhealthy servers
  • Preventing overloading resources
  • Helping to eliminate single points of failure

Load balancers can be implemented with hardware (expensive) or with software such as HAProxy. Additional benefits include:

  • SSL termination – Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations.
    • Removes the need to install X.509 certificates on each server.
  • Session persistence – Issue cookies and route a specific client's requests to the same instance if the web apps do not keep track of sessions.

To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode.

Load balancers can route traffic in a number of ways (a tiny round robin sketch follows this list):

  • Random
  • Least loaded
  • Session/cookies
  • Round robin or weighted round robin
  • Layer 4 load balancing
  • Layer 7 load balancing
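As a rough illustration of the round robin and weighted round robin strategies above, here is a minimal, self-contained sketch; the server addresses and weights are placeholders, and weighted round robin is approximated here by weighted random choice.

import itertools
import random

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # placeholder backend addresses
weights = [5, 3, 2]                               # assumed relative capacities

round_robin = itertools.cycle(servers)

def pick_round_robin():
    """Plain round robin: rotate through the backends in order."""
    return next(round_robin)

def pick_weighted():
    """Weighted round robin approximated by weighted random choice."""
    return random.choices(servers, weights=weights, k=1)[0]

for _ in range(4):
    print(pick_round_robin(), pick_weighted())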

Layer 4 load balancing

Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source and destination IP addresses and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing network address translation (NAT).

Layer 7 load balancing

Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve the contents of the header, message, and cookies. A layer 7 load balancer terminates network traffic, reads the message, makes a load-balancing decision, then opens a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.

At the cost of flexibility, layer 4 load balancing requires less time and computing resources than layer 7, although the performance impact can be minimal on modern commodity hardware.

Horizontal scaling

Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost-efficient and results in higher availability than scaling up a single server on more expensive hardware, called vertical scaling. It is also easier to hire for talent working on commodity hardware than it is for specialized enterprise systems.

Disadvantages: horizontal scaling

  • Scaling horizontally introduces complexity and involves cloning servers
    • Servers should be stateless: they should not contain any user-related data like sessions or profile pictures.
    • Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached) – see the sketch after this list.
  • Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out.
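A minimal sketch of externalizing session state, assuming the third-party redis client package and a locally running Redis instance; the key naming and TTL are illustrative choices, not a prescribed scheme.

import json
import uuid

import redis  # third-party package: pip install redis

r = redis.Redis(host="localhost", port=6379, db=0)

def create_session(user_id, ttl_seconds=3600):
    """Store session data centrally so any app server can handle the user."""
    session_id = uuid.uuid4().hex
    r.setex(f"session:{session_id}", ttl_seconds, json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw is not None else None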

Disadvantages: load balancer

  • The load balancer can become a performance bottleneck if it does not have enough resources or is not configured properly.
  • Introducing a load balancer to help eliminate single points of failure results in increased complexity.
  • A single load balancer is itself a single point of failure; configuring multiple load balancers further increases complexity.

Sources and further reading

  • NGINX architecture
  • HAProxy Architecture Guide
  • scalability
  • Wikipedia
  • Four layers of load balancing
  • Seven-layer load balancing
  • ELB listener configuration

Reverse proxy (Web server)








Source: Wikipedia





A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill them before the reverse proxy returns the server's response to the client.

The benefits include:

  • Increased security – Hide information about backend servers, blacklist IPs, and limit the number of connections per client.
  • Increased scalability and flexibility – Clients only see the reverse proxy's IP, allowing you to scale servers or change their configuration.
  • SSL termination – Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations.
    • Removes the need to install X.509 certificates on each server.
  • Compression – Compress server responses.
  • Caching – Return the response for cached requests directly.
  • Static content – Serve static content directly:
    • HTML/CSS/JS
    • Photos
    • Videos
    • Etc.

Load balancer vs reverse proxy

  • Deploying a load balancer is useful when you have multiple servers. Typically, a load balancer routes traffic to a set of servers that perform the same functions.
  • Reverse proxies are useful even when there is only one Web server or application server, as described in the previous section.
  • Solutions such as NGINX and HAProxy can support both Layer 7 reverse proxy and load balancing.

Disadvantages: reverse proxy

  • Introducing a reverse proxy results in increased complexity.
  • A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e. a failover) further increases complexity.

Sources and further reading

  • Reverse proxy and load balancing
  • NGINX architecture
  • HAProxy Architecture Guide
  • Wikipedia

Application layer








Source: Introduction to scalable system architecture

Separating out the web layer from the application layer (also known as the platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers.

The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.

Workers in the application layer also help enable asynchronism.

Microservices

Related to this discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. [1]

For example, Pinterest might have these micro-services: user profiles, followers, Feed streams, search, photo uploads, and so on.

Service discovery

Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built-in key-value store that can be useful for storing config values and other shared data.

Disadvantages: application layer

  • Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system).
  • Microservices can add complexity in terms of deployments and operations.

Sources and further reading

  • Introduction to scalable system architecture
  • Crack the system design interview
  • Service-oriented Architecture
  • Introduction to Zookeeper
  • Everything you need to know to build microservices

Database








Source: Expand your user base to your first 10 million

Relational database Management System (RDBMS)

A relational database like SQL is a collection of data items organized in tables.

Note: the author may be using "SQL" to refer to MySQL.

ACID is a set of properties of relational database transactions.

  • Atomicity – Each transaction is all or nothing.
  • Consistency – Any transaction will bring the database from one valid state to another.
  • Isolation – Executing transactions concurrently has the same results as if the transactions were executed serially.
  • Durability – Once a transaction has been committed, it will remain so.

There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.








Sources: Scalability, availability, stability, patterns

Master-slave replication

The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.

Disadvantages: master-slave replication
  • Additional logic is needed to promote a slave to a master.
  • See Disadvantages: replication for points related to both master-slave and master-master.








Sources: Scalability, availability, stability, patterns

Master-master replication

Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.

Disadvantages: master-master replication
  • You'll need a load balancer or you'll need to make changes to your application logic to determine where to write.
  • Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due to synchronization.
  • Conflict resolution comes more into play as more write nodes are added and as latency increases.
  • See Disadvantages: replication for points related to both master-slave and master-master.
Disadvantages: replication
  • There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.
  • Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with replaying writes and can't do as many reads.
  • The more read slaves, the more you have to replicate, which leads to greater replication lag.
  • On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
  • Replication adds more hardware and additional complexity.
Sources and further reading: replication
  • Scalability, availability, stability, and patterns
  • Multi-master replication

Federation








Source: Expand your user base to your first 10 million

Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes, you can write in parallel, increasing throughput.

Disadvantages: federation
  • Federation is not effective if your schema requires huge functions or tables.
  • You'll need to update your application logic to determine which database to read and write to.
  • Joining data from two databases is more complex with a server link.
  • Federation adds more hardware and additional complexity.
Sources and further reading: federation
  • Expand your user base to your first 10 million

Sharding








Sources: Scalability, availability, stability, patterns

Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.

Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.

Common ways to shard a table of users are either by the user's last name initial or by the user's geographic location.
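A minimal sketch of one such sharding function, here hashing the user id rather than the last name; the shard count and the printed shard naming are assumptions for illustration only.

import hashlib

NUM_SHARDS = 4  # assumed number of user database shards

def shard_for_user(user_id):
    """Map a user id to a shard with a stable hash.

    Note: with plain modulo hashing, changing NUM_SHARDS remaps most keys;
    consistent hashing (mentioned below) reduces that movement.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Route the query to the right connection pool (placeholder naming)
shard_id = shard_for_user(12345)
print(f"user 12345 -> users_shard_{shard_id}")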

Disadvantages: sharding
  • You'll need to update your application logic to work with shards, which could result in complex SQL queries.
  • Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others.
    • Rebalancing adds additional complexity. A sharding function based on consistent hashing can reduce the amount of transferred data.
  • Joining data from multiple shards is more complex.
  • Sharding adds more hardware and additional complexity.

Sources and further reading: sharding

  • The coming of the shard
  • Shard database architecture
  • Consistent hashing

Denormalization

Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMS such as PostgreSQL and Oracle support materialized views, which handle the work of storing redundant information and keeping the redundant copies consistent.

Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.

In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A read resulting in a complex database join can be very expensive, spending a significant amount of time on disk operations.

Disadvantages: denormalization
  • Data is duplicated.
  • Constraints can help redundant copies of information stay in sync, which increases complexity of the database design.
  • A denormalized database under heavy write load might perform worse than its normalized counterpart.
Sources and further reading: denormalization
  • Denormalization

SQL tuning

SQL tuning is a very broad topic, and there are many books on it.

It is important to use benchmarking and performance analysis to simulate and discover system bottlenecks.

  • Benchmark – Simulate high-load situations with tools such as ab.
  • Profile – Enable tools such as the slow query log to help track performance issues.

Benchmarking and performance analysis may lead you to the following optimizations.

Tighten up the schema
  • MySQL dumps to disk in contiguous blocks for fast access.
  • Use CHAR instead of VARCHAR for fixed-length fields.
    • CHAR effectively allows for fast, random access, whereas with VARCHAR you must find the end of a string before moving on to the next one.
  • Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
  • Use INT for larger numbers up to 2^32 or 4 billion.
  • Use DECIMAL for currency to avoid floating point representation errors.
  • Avoid storing large BLOBS; store the location of where to get the object instead.
  • VARCHAR(255) is the largest number of characters that can be counted in an 8-bit number, often maximizing the use of a byte in some RDBMS.
  • Set the NOT NULL constraint where applicable to improve search performance.
Use good indices
  • Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) are faster with indices.
  • Indices are usually represented as self-balancing B-trees that keep data sorted and allow searches, sequential access, insertions, and deletions in logarithmic time.
  • Placing an index can keep the data in memory, requiring more space.
  • Writes can also be slower since the index needs to be updated.
  • When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.
Avoid expensive joins
  • Denormalize where performance demands it.
Partition tables
  • Break up a table by putting hot spots in a separate table to help keep it in memory.
Tune the query cache
  • In some cases, the query cache could lead to performance issues.
Sources and further reading
  • MySQL query optimization Tips
  • Why is VARCHAR(255) common?
  • How do Null values affect database performance?
  • Slow Query logs

NoSQL

NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.

BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP theorem, BASE chooses availability over consistency.

  • Basically available – The system guarantees availability.
  • Soft state – The state of the system may change over time, even without input.
  • Eventual consistency – The system will become consistent over a period of time, given that the system doesn't receive input during that period.

In addition to choosing between SQL or NoSQL, it is helpful to understand which type of NoSQL database best fits your use case(s). We'll review key-value stores, document stores, wide column stores, and graph databases in the next section.

Key-value store

Abstraction: hash table

A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.

Key-value stores provide high performance and are often used for simple data models or for rapidly changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.

A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
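To make the hash-table abstraction concrete, here is a toy in-memory key-value store with optional expiration (a sketch only; real stores add persistence, eviction, and replication):

import time

class KeyValueStore:
    """Toy key-value store backed by a dict: O(1) average reads and writes."""

    def __init__(self):
        self._data = {}     # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = time.time() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        value, expires_at = self._data.get(key, (default, None))
        if expires_at is not None and time.time() > expires_at:
            del self._data[key]     # entry expired: drop it
            return default
        return value

store = KeyValueStore()
store.set("user:123:name", "alice", ttl_seconds=60)
print(store.get("user:123:name"))  # 'alice'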

Sources and further reading

  • Key-value database
  • Disadvantages of key-value storage
  • Redis architecture
  • Memcached architecture

Document store

Abstraction: key-value store with documents stored as values

A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note that many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.

Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.

Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.

Document stores provide high flexibility and are often used for working with occasionally changing data.

Sources and further reading: document store

  • Document-oriented databases
  • Mongo architecture
  • CouchDB architecture
  • Elasticsearch architecture

Wide column store








Source: SQL and NoSQL, a short history

Abstraction: nested map ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>

A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.

Google introduced Bigtable as the first wide column store, which influenced the open-source HBase often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as BigTable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.

Wide column stores offer high availability and high scalability. They are often used for very large data sets.

Sources and further reading: wide column store
  • SQL & NoSQL, a brief history
  • BigTable architecture
  • Hbase architecture
  • Cassandra architecture

Graph database








Source: Graph database

Abstraction: graph

In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.

Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and not yet widely used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.

Sources and further reading: graph
  • Graph database
  • Neo4j
  • FlockDB

Sources and further reading: NoSQL

  • Database Terminology
  • NoSQL Database – Survey and Decision Guide
  • scalability
  • Introduction to NoSQL
  • NoSQL patterns

SQL or NoSQL








Source: Transition from RDBMS to NoSQL

Reasons for choosing SQL:

  • Structured data
  • Strict model
  • Relational data
  • Complex join operations are required
  • Transactions
  • Clear patterns for scaling
  • More resources available: developers, communities, code bases, tools, etc.
  • Lookups by index are very fast

Reasons for choosing NoSQL:

  • Semi-structured data
  • Dynamic or flexible patterns
  • Non-relational data
  • No complex join operations are required
  • Store many TB (or even PB) of data
  • Very data-intensive workloads
  • Very high throughput for IOPS

Sample data suitable for NoSQL:

  • Rapid ingest of clickstream and log data
  • Leaderboard or scoring data
  • Temporary data, such as a shopping cart
  • Frequently accessed ("hot") tables
  • Metadata/lookup tables
Source and further reading: SQL or NoSQL
  • Expand your user base to the first ten million
  • Difference between SQL and NoSQL

Cache








Source: Extensible system design patterns

Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher first looks to see if the request has been made before and tries to find the previous result to return, in order to save the actual execution.

Databases often benefit from a uniform distribution of reads and writes across their partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.

Client cache

The cache can be on the client side (operating system or browser), on the server side, or at a different cache layer.

CDN cache

CDN is also seen as a cache.

Web server cache

Reverse proxies and caches, such as Varnish, can serve both static and dynamic content directly. The Web server can also cache requests and return results without having to connect to the application server.

Database cache

The default configuration of a database usually includes a cache level, optimized for general use cases. Tuning the configuration to use different modes in different situations can further improve performance.

Application cache

In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache invalidation algorithms such as least recently used (LRU) can help invalidate "cold" entries and keep "hot" data in RAM.
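A minimal LRU eviction sketch using the standard library's OrderedDict, showing how "cold" entries get pushed out as "hot" entries are touched; the capacity is an arbitrary assumption.

from collections import OrderedDict

class LRUCache:
    """Tiny in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)         # mark as most recently used
        return self._entries[key]

    def set(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the coldest entry

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touch 'a' so 'b' becomes the coldest entry
cache.set("c", 3)      # evicts 'b'
print(cache.get("b"))  # None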

Redis has the following additional features:

  • Persistence option
  • Built-in data structures such as ordered collections and lists

There are multiple levels you can cache, which fall into two general categories: database queries and objects:

  • Row level
  • Query level
  • Fully formed serializable objects
  • Fully rendered HTML

Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.

Database query-level caching

Whenever you query the database, hash the query as a key and store the result in the cache (a minimal sketch follows this list). This approach suffers from expiration issues:

  • It is hard to delete a cached result for a complex query.
  • If one piece of data changes, such as a table cell, you need to delete all cached queries that might include the changed cell.
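A self-contained sketch of the hashed-query approach described above; the in-process dict and the run_query placeholder stand in for a real cache client and database client.

import hashlib
import json
import time

_cache = {}  # stand-in for Memcached/Redis: key -> (json payload, expires_at)

def run_query(sql):
    """Placeholder for the real database call."""
    return [{"sql": sql, "rows": []}]

def cached_query(sql, ttl_seconds=60):
    """Cache a query result under a hash of the SQL text."""
    key = "query:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()
    entry = _cache.get(key)
    if entry is not None and entry[1] > time.time():
        return json.loads(entry[0])
    result = run_query(sql)
    _cache[key] = (json.dumps(result), time.time() + ttl_seconds)
    return result

print(cached_query("SELECT * FROM users WHERE city = 'NYC'"))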

Object-level caching

See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure:

  • Remove the object from cache if its underlying data has changed.
  • Allows for asynchronous processing: workers assemble objects by consuming the latest cached object.

Recommended cache contents:

  • The user’s session
  • Fully rendered Web pages
  • Activity streams
  • User graph data

When to update the cache

Since you can only store a limited amount of data in the cache, you need to choose a cache update strategy that is appropriate for your use case.

Cache-aside








Sources: From cache to in-memory data grid

The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:

  • Look for the entry in the cache, resulting in a cache miss
  • Load the entry from the database
  • Add the entry to the cache
  • Return the entry
def get_user(self, user_id):
    # Look for the entry in the cache first
    key = "user.{0}".format(user_id)
    user = cache.get(key)
    if user is None:
        # Cache miss: load from the database, then populate the cache
        user = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
        if user is not None:
            cache.set(key, json.dumps(user))
    return user

Memcached is generally used in this manner.

Subsequent reads of data added to the cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.

Disadvantages: cache-aside
  • Each cache miss results in three trips, which can cause a noticeable delay.
  • Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL), which forces an update of the cache entry, or by using write-through.
  • When a node fails, it is replaced by a new, empty node, increasing latency.

Write-through








Sources: Scalability, availability, stability, patterns

The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:

  • The application adds/updates the entry in the cache
  • The cache synchronously writes the entry to the data store
  • Return

Application code:

set_user(12345, {"foo":"bar"})

Cache code:

def set_user(user_id, values):
    # Write through the cache: the cache updates the database synchronously
    user = db.query("UPDATE Users WHERE id = {0}", user_id, values)
    cache.set(user_id, user)

Write-through is a slow overall operation due to the write, but subsequent reads of just-written data are fast. Users are generally more tolerant of latency when updating data than when reading it. Data in the cache is not stale.

Disadvantages: write-through
  • When a new node is created due to failure or scaling, the new node will not cache entries until the entries are updated in the database. Cache-aside in conjunction with write-through can mitigate this issue.
  • Most data written might never be read; this can be minimized with a TTL.

Write-behind (write-back)








Sources: Scalability, availability, stability, patterns

In write-behind, the application does the following (a minimal sketch follows this list):

  • Add/update the entry in the cache
  • Asynchronously write the entry to the data store, improving write performance
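A minimal sketch of that flow using an in-process queue and a background worker as stand-ins for a real event queue and data store; the join() call exists only so the demo prints a result.

import queue
import threading

_cache = {}                 # stand-in for the cache
_database = {}              # stand-in for the durable data store
_write_queue = queue.Queue()

def writer_worker():
    """Background worker that asynchronously flushes writes to the store."""
    while True:
        key, value = _write_queue.get()
        _database[key] = value          # the slow, durable write
        _write_queue.task_done()

threading.Thread(target=writer_worker, daemon=True).start()

def set_value(key, value):
    """Write-behind: update the cache now, persist asynchronously."""
    _cache[key] = value
    _write_queue.put((key, value))      # data is lost if the process dies here

set_value("user:1", {"name": "alice"})
_write_queue.join()                     # wait for the flush (demo only)
print(_database["user:1"])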
Disadvantages: write-behind
  • There could be data loss if the cache goes down before its contents hit the data store.
  • It is more complex to implement write-behind than it is to implement cache-aside or write-through.

Refresh-ahead








Sources: From cache to in-memory data grid

You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.

Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.
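A minimal sketch of refresh-ahead: entries that are read close to their expiration are refreshed before they expire. The TTL, refresh window, and loader function are illustrative assumptions, and a real system would refresh asynchronously.

import time

_cache = {}            # key -> (value, expires_at)
TTL = 60
REFRESH_WINDOW = 10    # refresh if the entry expires within 10 seconds

def load_from_source(key):
    """Placeholder for the expensive read against the real data source."""
    return f"value-for-{key}"

def get(key):
    value, expires_at = _cache.get(key, (None, 0))
    now = time.time()
    if value is None or now >= expires_at:
        value = load_from_source(key)               # cold or expired: load now
        _cache[key] = (value, now + TTL)
    elif expires_at - now < REFRESH_WINDOW:
        # Entry is about to expire and was just requested again:
        # refresh ahead of expiration (a real system would do this async).
        _cache[key] = (load_from_source(key), now + TTL)
    return value

print(get("user:1"))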

Disadvantages: refresh-ahead
  • Not accurately predicting which items are likely to be needed in the future can result in reduced performance compared to not using refresh-ahead.

Disadvantages: cache

  • Need to maintain consistency between caches and the source of truth, such as the database, through cache invalidation.
  • Need to make application changes, such as adding Redis or Memcached.
  • Cache invalidation is a difficult problem; there is additional complexity associated with when to update the cache.

Related resources and further reading

  • From cache to in-memory data
  • Extensible system design patterns
  • Introduction to scalable system architecture
  • Scalability, availability, stability and patterns
  • scalability
  • AWS ElastiCache strategy
  • Wikipedia

Asynchronism








Source: Introduction to scalable system architecture

Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.

Message queues

Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:

  • An application publishes a job to the queue, then notifies the user of the job status
  • A worker picks up the job from the queue, processes it, then signals that the job is complete

The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before the tweet is actually delivered to all of your followers.
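A minimal sketch of that producer/worker flow using Redis lists as a simple broker, assuming the third-party redis client and a local Redis server; the queue name, job shape, and fan-out placeholder are illustrative.

import json

import redis  # third-party package: pip install redis

r = redis.Redis(host="localhost", port=6379, db=0)
QUEUE = "jobs:tweets"

def publish_job(user_id, text):
    """Application side: enqueue the job and return to the user immediately."""
    r.rpush(QUEUE, json.dumps({"user_id": user_id, "text": text}))

def deliver_to_followers(job):
    """Placeholder for the slow fan-out work."""
    print(f"delivering tweet from user {job['user_id']}")

def worker_loop():
    """Worker side: block until a job arrives, process it, repeat."""
    while True:
        _, payload = r.blpop(QUEUE)      # blocking pop from the queue
        deliver_to_followers(json.loads(payload))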

Redis is useful as a simple message broker, but messages can be lost.

RabbitMQ is popular but requires you to adapt to the AMQP protocol and manage your own nodes.

Amazon SQS is hosted but can have high latency and has the possibility of messages being delivered twice.

Task queues

Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally intensive jobs in the background.

Celery supports scheduling and is primarily used with Python.

Back pressure

If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with exponential backoff.
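A minimal sketch of both halves of this idea: the server rejects work once a bounded queue is full, and the client retries with exponential backoff. The queue size, status codes, and backoff constants are assumptions for illustration.

import queue
import random
import time

job_queue = queue.Queue(maxsize=100)   # bounded queue enforces back pressure

def submit_job(job):
    """Server side: reject immediately instead of queuing without limit."""
    try:
        job_queue.put_nowait(job)
        return 202                     # accepted for background processing
    except queue.Full:
        return 503                     # server busy: ask the client to retry

def submit_with_backoff(job, max_attempts=5):
    """Client side: retry 503s with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        if submit_job(job) != 503:
            return True
        time.sleep((2 ** attempt) * 0.1 + random.random() * 0.1)
    return False

print(submit_with_backoff({"task": "resize-image"}))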

Disadvantages: asynchronism

  • Use cases such as simple calculations and real-time workflows may be better suited for synchronous operations because introducing queues can add latency and complexity.

Related resources and further reading

  • It’s a numbers game
  • Back pressure is applied when overloading
  • Little’s law
  • What is the difference between a message queue and a task queue?

Communication








Source: OSI 7 layer model

Hypertext Transfer Protocol (HTTP)

HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.

A basic HTTP request consists of a verb (method) and a resource (endpoint). Here are some common HTTP verbs:

Verb     Description                                            Idempotent*   Safe   Cacheable
GET      Reads a resource                                       Yes           Yes    Yes
POST     Creates a resource or triggers a process that
         handles data                                           No            No     Yes, if the response contains freshness info
PUT      Creates or replaces a resource                         Yes           No     No
PATCH    Partially updates a resource                           No            No     Yes, if the response contains freshness info
DELETE   Deletes a resource                                     Yes           No     No

* Idempotent: can be called many times without different outcomes.

HTTP is an application-layer protocol that relies on lower-level protocols such as TCP and UDP.
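For example, a couple of the verbs above issued from Python's standard library; this is a sketch only, and the host and paths are placeholders.

import http.client
import json

conn = http.client.HTTPSConnection("example.com")

# GET: read a resource (safe, idempotent, cacheable)
conn.request("GET", "/users/123")
resp = conn.getresponse()
print(resp.status)
resp.read()  # drain the body so the connection can be reused

# POST: create a resource or trigger a process that handles data
body = json.dumps({"name": "alice"})
conn.request("POST", "/users", body=body,
             headers={"Content-Type": "application/json"})
print(conn.getresponse().status)
conn.close()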

Source and further reading: HTTP

  • What is HTTP?
  • Differences between HTTP and TCP
  • Difference between PUT and PATCH

Transmission Control Protocol (TCP)








Source: How to make a multiplayer game

TCP is a connection-oriented protocol over an IP network. Connections are established and terminated using a handshake. All packets sent are guaranteed to reach the destination in the original order and without corruption through:

  • Sequence numbers and checksum fields for each packet
  • Acknowledgement packets and automatic retransmission

If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.

To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and, say, a memcached server. Connection pooling can help, in addition to switching to UDP where applicable.

TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers, database info, SMTP, FTP, and SSH.

Use TCP over UDP when:

  • You need all of the data to arrive intact
  • You want to automatically make a best estimate use of the network throughput

User Datagram Protocol (UDP)








Source: How to make a multiplayer game

UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP provides, UDP is generally more efficient.

UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP because devices within the subnet have not yet been allocated an IP address, which TCP requires.

UDP is less reliable but works well in real-time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.

Use UDP over TCP when:

  • You need the lowest latency
  • Late data is worse than lost data
  • You want to implement your own error correction

Sources and further reading: TCP and UDP

  • Networking for game programming
  • Key differences between TCP and UDP
  • The difference between TCP and UDP
  • Transmission control protocol
  • User datagram protocol
  • Memcache expansion on Facebook

Remote procedure call (RPC)








Source: Crack the system design interview

In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls, so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include Protobuf, Thrift, and Avro.

RPC is a request-response protocol:

  • Client program – Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call.
  • Client stub procedure – Marshals (packs) the procedure id and arguments into a request message.
  • Client communication module – Sends the message from the client to the server.
  • Server communication module – Passes the incoming packets to the server stub procedure.
  • Server stub procedure – Unmarshals the message, then calls the server procedure matching the procedure id, passing the given arguments.

Examples of RPC calls:

GET /someoperation?data=anId

POST /anotheroperation
{
  "data": "anId",
  "anotherdata": "another value"
}
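As a rough sketch of the request-response flow above, Python's standard library xmlrpc module can play the role of both the stubs and the communication modules (the address and the add_item_to_list procedure name are made-up examples, not part of the original):

import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: expose a procedure by id; the library acts as the server stub,
# unmarshalling requests and marshalling responses.
server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_function(
    lambda person_id, item_id: {"personid": person_id, "itemid": item_id},
    "add_item_to_list",
)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy is the client stub; the call looks local, but the arguments
# are marshalled into a request message and sent over HTTP to the server.
client = ServerProxy("http://127.0.0.1:8000")
print(client.add_item_to_list("1234", "456"))   # {'personid': '1234', 'itemid': '456'}

server.shutdown()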

RPC focuses on exposing behaviors. RPCs are often used for performance reasons in internal communications, as you can hand-craft native calls to better fit your use cases.

Choose native libraries (i.e. an SDK) when:

  • You know your target platform.
  • You want to control how your “logic” is accessed.
  • You want to control how errors from your library are handled.
  • Performance and the end-user experience are your primary concerns.

HTTP APIs that follow REST tend to be better suited for public APIs.

Disadvantages: RPC

  • RPC clients become tightly coupled to the service implementation.
  • A new API must be defined for every new operation or use case.
  • RPC can be difficult to debug.
  • You might not be able to leverage existing technologies out of the box. For example, it might require additional effort to ensure RPC calls are properly cached on caching servers such as Squid.

Representational State Transfer (REST)

REST is an architectural style enforcing a client/server model in which the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or retrieve a new representation of those resources. All communication must be stateless and cacheable.

There are four qualities of a RESTful interface:

  • Identify resources (URI in HTTP) – use the same URI regardless of the operation.
  • Change with representations (verbs in HTTP) – use verbs, headers, and body.
  • Self-descriptive error messages (status codes in HTTP) – use status codes, don’t reinvent the wheel.
  • HATEOAS (HTML interface for HTTP) – your web service should be fully accessible in a browser.

Examples of REST requests:

GET /someresources/anId

PUT /someresources/anId
{"anotherdata": "another value"}Copy the code

REST focuses on exposing data. It minimizes the coupling between client and server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is well suited to horizontal scaling and partitioning.
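A minimal sketch of this resource-oriented style, assuming the Flask library is installed (the /someresources URI mirrors the example requests above; the in-memory store is just for illustration):

from flask import Flask, jsonify, request

app = Flask(__name__)
resources = {"anId": {"anotherdata": "another value"}}   # toy in-memory store

# One URI identifies the resource; GET and PUT express the operation (read vs. create/replace).
@app.route("/someresources/<resource_id>", methods=["GET", "PUT"])
def someresource(resource_id):
    if request.method == "PUT":
        resources[resource_id] = request.get_json()       # full replacement: idempotent
        return jsonify(resources[resource_id]), 200
    if resource_id not in resources:
        return jsonify({"error": "not found"}), 404       # self-descriptive status code
    return jsonify(resources[resource_id]), 200

if __name__ == "__main__":
    app.run(port=5000)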

Disadvantages: REST

  • Because REST focuses on exposing data, it may not be a good fit when resources are not naturally organized or accessed in a simple hierarchy. For example, returning all updated records from the past hour that match a particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a combination of URI path, query parameters, and possibly the request body.
  • REST generally relies on a few actions (GET, POST, PUT, DELETE, and PATCH), but sometimes these alone don’t work for you. For example, moving an outdated document into an archive folder may not be easily expressed by these verbs.
  • To render a single page, retrieving complex resources nested within a hierarchy requires multiple round-trip communications between clients and servers. For example, get blog content and its associated comments. For mobile applications using an uncertain network environment, these multiple round trips can be cumbersome.
  • As time goes on, more fields may be added to the API response, and older clients will receive all new data fields, even those they don’t need, resulting in increased load and greater latency.

Compare RPC with REST

Operation                         RPC                                     REST
Sign up                           POST /signup                            POST /persons
Resign                            POST /resign                            DELETE /persons/1234
                                  {"personid": "1234"}
Read a person                     GET /readPerson?personid=1234           GET /persons/1234
Read a person's items list        GET /readUsersItemsList?personid=1234   GET /persons/1234/items
Add an item to a person's items   POST /addItemToUsersItemsList           POST /persons/1234/items
                                  {"personid": "1234", "itemid": "456"}   {"itemid": "456"}
Update an item                    POST /modifyItem                        PUT /items/456
                                  {"itemid": "456", "key": "value"}       {"key": "value"}
Delete an item                    POST /removeItem                        DELETE /items/456
                                  {"itemid": "456"}

Source: Do you really know why you prefer REST to RPC

Sources and further reading: REST and RPC

  • Do you really know why you prefer REST to RPC
  • When is RPC more appropriate than REST?
  • REST vs JSON-RPC
  • Demystify RPC and REST
  • What are the disadvantages of using REST
  • Crack the system design interview
  • Thrift
  • Why use REST internally instead of RPC

Security

This section could use more content. Consider contributing!

Security is a broad topic. Unless you have considerable experience, a background in security, or are applying for a position that requires security knowledge, you don’t need to know more than the basics of security:

  • Encrypt in transit and at rest.
  • Sanitize all user inputs and any parameters exposed to users to prevent XSS and SQL injection.
  • Use parameterized queries to prevent SQL injection (a minimal sketch follows this list).
  • Use the principle of least privilege.
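A minimal sketch of a parameterized query using Python's built-in sqlite3 module (the users table and the sample input are made up for illustration); the driver binds the user-supplied value separately, so it is treated as data rather than SQL:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

user_input = "alice' OR '1'='1"   # a typical injection attempt

# Unsafe pattern (do not do this): splicing untrusted input into the SQL string.
# query = f"SELECT * FROM users WHERE name = '{user_input}'"

# Safe: the ? placeholder sends the value separately from the statement.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)   # [] -- the injection attempt matches nothing
conn.close()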

Sources and further reading

  • Security guide for developers
  • OWASP top ten

Appendix

You’ll sometimes be asked to do back-of-the-envelope estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure requires. The powers of two table and the latency numbers that every programmer should know are handy references.

Powers of two table

Power           Exact Value         Approx Value        Bytes
---------------------------------------------------------------
7                             128
8                             256
10                           1024   1 thousand           1 KB
16                         65,536                       64 KB
20                      1,048,576   1 million            1 MB
30                  1,073,741,824   1 billion            1 GB
32                  4,294,967,296                        4 GB
40              1,099,511,627,776   1 trillion           1 TB

Sources and further reading

  • The power of 2

Latency numbers every programmer should know

Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                          100   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy            10,000   ns       10 us
Send 1 KB bytes over 1 Gbps network     10,000   ns       10 us
Read 4 KB randomly from SSD*           150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps  10,000,000   ns   10,000 us   10 ms  40x memory, 10X SSD
Read 1 MB sequentially from disk    30,000,000   ns   30,000 us   30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Handy metrics based on the numbers above (a worked estimate follows this list):

  • Read sequentially from disk at 30 MB/s
  • Read sequentially from 1 Gbps Ethernet at 100 MB/s
  • Read sequentially from SSD at 1 GB/s
  • Read sequentially from main memory at 4 GB/s
  • 6-7 worldwide round trips per second
  • 2,000 round trips per second within a data center
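As a worked back-of-the-envelope example using the numbers above (the 2 MB average image size is an assumption for illustration), here is a rough estimate for the thumbnail-generation question mentioned earlier:

# Rough estimate: read 100 source images (~2 MB each, an assumed size) from spinning disk,
# using ~30 MB/s sequential disk throughput and ~10 ms per seek from the table above.
num_images = 100
image_size_mb = 2.0           # assumed average source image size
seek_time_s = 0.010           # 10 ms disk seek
sequential_mb_per_s = 30.0    # sequential disk read throughput

read_time_s = (num_images * image_size_mb) / sequential_mb_per_s   # ~6.7 s of transfer
seek_total_s = num_images * seek_time_s                            # ~1.0 s of seeks
print(f"~{read_time_s + seek_total_s:.1f} s to read the images")   # ~7.7 s, before resizing

# The same data served from SSD (~1 GB/s) would take well under a second,
# and from memory (~4 GB/s) only tens of milliseconds.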

Latency numbers visualized

Sources and further reading

  • Latency numbers every programmer should know – 1
  • Latency numbers every programmer should know – 2
  • Design proposals, courses, and recommendations for building large distributed systems
  • Software engineering consulting on building large scalable distributed systems

Other system design interview questions

Common system design interview questions, with links to resources on how to solve them.

Question                                                      Reference(s)
Design a file synchronization service similar to Dropbox youtube.com
Design a search engine similar to Google queue.acm.org

stackexchange.com

ardendertat.com

stanford.edu
Design a scalable web crawler like Google quora.com
Designing Google Docs code.google.com

neil.fraser.name
Design a key-value store like Redis slideshare.net
Design a memcached-like caching system slideshare.net
Design a recommendation system similar to Amazon’s hulu.com

ijcai13.org
Design a Bitly like short link system n00tc0d3r.blogspot.com
Design a WhatsApp chat app highscalability.com
Design an Instagram-like photo-sharing system highscalability.com

highscalability.com
Design the Facebook news feed function quora.com

quora.com

slideshare.net
Designing Facebook’s timeline system facebook.com

highscalability.com
Design Facebook chat erlang-factory.com

facebook.com
Design a Facebook-like graph search system facebook.com

facebook.com

facebook.com
Design content delivery networks like CloudFlare cmu.edu
Design a Twitter-like trending topic system michael-noll.com

snikolov .wordpress.com
Design a random ID generation system blog.twitter.com

github.com
Return the top k requests during a time interval ucsb.edu

wpi.edu
Design a system that serves data from multiple data centers highscalability.com
Design a multiplayer online card game indieflashblog.com

buildnewgames.com
Design a garbage collection system stuffwithstuff.com

washington.edu
Add a system design question Contribute

Real world architectures

Articles on how real-world systems are designed.








Source: Twitter timelines at scale

Don’t focus on the nitty-gritty details of the following articles. Instead:

  • Identify shared principles, common technologies, and patterns within these articles.
  • Study what problems are solved by each component, where it works, and where it doesn’t.
  • Review the lessons learned.
Type            System                                                    Reference
Data processing MapReduce– Google distributed data processing research.google.com
Data processing Spark– Databricks distributed data processing slideshare.net
Data processing Storm– Distributed data processing for Twitter slideshare.net
Data store Bigtable– Google’s column database harvard.edu
Data store HBase– Open source implementation of Bigtable slideshare.net
Data store Cassandra– Facebook’s column database slideshare.net
Data store DynamoDB– Amazon’s document database harvard.edu
Data store MongoDB– Document database slideshare.net
Data store Spanner– Google’s global distribution database research.google.com
Data store Memcached– Distributed memory caching system slideshare.net
Data store Redis– A distributed memory caching system that can persist and have value types slideshare.net
File system Google File System (GFS)– Distributed file system research.google.com
File system Hadoop File System (HDFS)– Open source implementation of GFS apache.org
Misc Chubby– Google’s low coupling lock service for distributed systems research.google.com
Misc Dapper– Distributed system tracking infrastructure research.google.com
Misc Kafka– LinkedIn’s publish-subscribe messaging system slideshare.net
Misc Zookeeper– Centralized infrastructure and coordination services slideshare.net
Add an architecture Contribute

Company architectures

Company Reference(s)
Amazon Amazon’s architecture
Cinchcast Produces 1,500 hours of audio per day
DataSift Mine 120,000 tweets per second in real time
DropBox How we’ve scaled Dropbox
ESPN 100,000 operations per second
Google Google’s architecture
Instagram 14 million users, terabytes of photos

What drives Instagram
Justin.tv Justin.Tv’s live broadcast architecture
Facebook Extensible Memcached for Facebook

TAO: Distributed data store for Facebook social graph

Facebook’s photo store
Flickr Flickr architecture
Mailbox Went from 0 to 1 million users in 6 weeks
Pinterest From zero to billions of page views per month

18 million visitors, a tenfold increase, 12 employees
Playfish It has 50 million monthly users and growing
PlentyOfFish The structure of the PlentyOfFish
Salesforce How do they process 1.3 billion transactions a day
Stack Overflow Stack Overflow architecture
TripAdvisor 40M visitors, 200M page views, 30TB data
Tumblr 15 billion page views a month
Twitter Making Twitter 10000 percent faster

MySQL is used to store 250 million tweets per day

150M active users, 300K QPS, a 22 MB/S firehose

Timelines at scale

Big and small data at Twitter

Operations at Twitter: scaling beyond 100 million users
Uber How Uber scales their real-time market platform
WhatsApp The WhatsApp architecture Facebook bought for $19 billion
YouTube YouTube scalability

YouTube architecture

Company Engineering Blog

Architectures for companies you are interviewing with.

Questions you encounter might be from the same domain.

  • Airbnb Engineering
  • Atlassian Developers
  • Autodesk Engineering
  • AWS Blog
  • Bitly Engineering Blog
  • Box Blogs
  • Cloudera Developer Blog
  • Dropbox Tech Blog
  • Engineering at Quora
  • Ebay Tech Blog
  • Evernote Tech Blog
  • Etsy Code as Craft
  • Facebook Engineering
  • Flickr Code
  • Foursquare Engineering Blog
  • GitHub Engineering Blog
  • Google Research Blog
  • Groupon Engineering Blog
  • Heroku Engineering Blog
  • Hubspot Engineering Blog
  • High Scalability
  • Instagram Engineering
  • Intel Software Blog
  • Jane Street Tech Blog
  • LinkedIn Engineering
  • Microsoft Engineering
  • Microsoft Python Engineering
  • Netflix Tech Blog
  • Paypal Developer Blog
  • Pinterest Engineering Blog
  • Quora Engineering
  • Reddit Blog
  • Salesforce Engineering Blog
  • Slack Engineering Blog
  • Spotify Labs
  • Twilio Engineering Blog
  • Twitter Engineering
  • Uber Engineering Blog
  • Yahoo Engineering Blog
  • Yelp Engineering Blog
  • Zynga Engineering Blog

Sources and further reading

  • kilimchoi/engineering-blogs

In progress

Interested in adding some parts or helping to improve some parts? Join in!

  • Use MapReduce for distributed computing
  • Consistent hashing
  • Direct memory access (DMA) controller
  • Contribute

Thank you

Credits and sources are provided throughout this repository.

Special thanks:

  • Hired in tech
  • Cracking the coding interview
  • High scalability
  • checkcheckzz/system-design-interview
  • shashank88/system_design
  • mmcgrana/services-engineering
  • System design cheat sheet
  • A distributed systems reading list
  • Cracking the system design interview

Contact

Feel free to contact me to discuss any problems, questions or comments.

You can find my contact information on my GitHub page

License

Creative Commons Attribution 4.0 International License (CC BY 4.0)

http://creativecommons.org/licenses/by/4.0/

The Nuggets Translation Project is a community that translates high-quality technical articles from English for sharing on Juejin (Nuggets), covering Android, iOS, React, front end, back end, product, design, and more. Follow the Nuggets Translation Project for more quality translations.