
Lately the Pooper Scooper (that's me) has been studying like crazy. Starting today, I am going to summarize some system design problems for you. On the one hand this leaves me a record of what I have learned; on the other it can help everyone get past system design questions.

After studying it for quite a while, I have found that system design is genuinely practical. It is useful both in real delivery work and in interviews. Studying system design broadens your thinking and makes it easier to handle the odd, hard-to-diagnose problems that come up in production. Without it, you may fail to plan the overall project structure well at the start, which makes the project hard to maintain and manage later on and can even get it cancelled. As for interviews: IT interviews usually cover algorithms and project experience, and some companies also test the candidate's system design directly. If you have studied or prepared for system design in advance, you can answer these questions confidently and fluently. From the company's side, if two candidates have similar algorithm skills, the one who answers the system design question well is likely to stand out, and a good system design answer earns considerable extra credit. In short, system design is a concrete expression of abstract programming thinking, and its importance is self-evident.

If you are still at the stage where you do not understand algorithms and data structures, you can read this article just to see what it is like, because system design lets you work through algorithms and data structures with concrete examples; still, I recommend practicing algorithms and data structures first. If your skills are already solid, read on and discuss with the Pooper Scooper so that we can make progress together.

Without further ado, let’s begin the first article in this series:

If an interviewer says "Please design Twitter", what should you say?

Before answering any system design question, understand the most critical step:

The first step in system design is to ask questions and clarify the requirements.

Here is the wrong move and the biggest taboo: acting like a master of technical keywords. If you rattle off a string of keywords such as Load Balancer, Memcache, NodeJS, MongoDB, MySQL, Sharding, Consistent Hashing, Master Slave, HDFS, Hadoop…, the interviewer will assume you have heard of these technologies but do not really understand them.

People who really understand start with a small number of users and then scale the problem up.

To be clear:

There is no standard answer in system design. System design is design at the macro level.

Generally speaking, a system design answer is scored on the following aspects:

  • Workable solution: can you come up with a solution that works?

  • Special cases, 20%: can you handle specific problems? In system design these usually come up as follow-up questions.

  • Analysis: analytical skills show throughout the whole conversation.

  • Tradeoff, 15%: method A works and method B works, and neither is simply good or bad. For example, which is more convenient for users? Which is easier to store? Which scales better?

  • Knowledge base, 15%

A method for analyzing system design problems:

  • Scenario: ask what is being designed and clarify the goal. Ask / Features / DAU / QPS / Interfaces.
  • Service: once the goal is known, determine whether it is a small feature or a large system. A large system must be broken down into smaller services; anyone with engineering skill needs to be able to take big things apart into small ones.
    Decoupling.
  • Storage: the important part.
    How to store the data and how to access it.

    Understand the database, the tables, and the schema (table structure): how to store the data, how many tables to use, and how the pieces of data relate to each other. This has some OO Design flavor, and you may end up writing OO Design classes. Schema / Data / SQL / NoSQL / File System.

  • Scale: what happens when a machine goes down or when users and requests keep growing, and how to reduce the loss after downtime. Sharding / Optimize / Special Case.

These steps must be done in order; do not jump straight to Scale.

Please Design Twitter

Start by asking the interviewer:

  1. Which features need to be designed?
  2. How much traffic does the system need to handle?

Ask about Daily Active Users (DAU) and Monthly Active Users (MAU). MAU is not DAU multiplied by 30; it counts the users who logged in at least once during the month. DAU matters because it can be used to calculate QPS:

  • Average concurrent users (average QPS) = daily active users * average requests per user per day / seconds per day = 150M * 60 (estimated) / 86400 ~ 100K
  • Peak = average concurrent users * 3 (estimated; 2-9 is reasonable) ~ 300K
  • Fast-growing product: max peak in 3 months = peak * 2
  • Read QPS ~ 300K
  • Write QPS ~ 5K

Throughout this estimate, what matters is not the result but the calculation process.
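The estimate above can be written down as a small calculation. This is only a sketch: the function name and parameters are illustrative, and the inputs (150M DAU, 60 requests per user per day, a peak factor of 3, a growth factor of 2) are the estimated figures from the text, not measured values.

```python
SECONDS_PER_DAY = 86_400

def estimate_qps(dau, requests_per_user_per_day, peak_factor=3, growth_factor=2):
    """Back-of-the-envelope QPS estimate from daily active users."""
    average = dau * requests_per_user_per_day / SECONDS_PER_DAY
    peak = average * peak_factor              # the text treats 2-9 as a reasonable factor
    fast_growth_peak = peak * growth_factor   # projected peak for a fast-growing product
    return {"average": average, "peak": peak, "fast_growth_peak": fast_growth_peak}

estimate = estimate_qps(dau=150_000_000, requests_per_user_per_day=60)
for name, value in estimate.items():
    print(f"{name}: ~{value / 1000:.0f}K QPS")
```

The exact average comes out slightly above 100K; the text rounds it to 100K and the peak to 300K, which is fine because only the order of magnitude matters here.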

Step 1: Enumerate. List the features Twitter has:

  • Register / Login
  • User Profile Display / Edit
  • Upload Image / Video
  • Search
  • Post / Share a tweet
  • Timeline / News Feed
  • Follow / Unfollow a user

Step 2: Sort. Pick out the core features, because you cannot design everything in such a short time:

  • Post a tweet
  • Timeline
  • News Feed
  • Follow / Unfollow a user
  • Register / Login (every app has it, so there is no need to cover it)

What QPS analysis is good for:

  • QPS = 100: your laptop is enough as the web server.
  • QPS = 1K: one better web server is about enough, but you need to consider the single point of failure: a process may deadlock, hang, and so on.
  • QPS = 1M: a cluster of about 1,000 web servers is required, and maintenance has to be considered.

Relationship between QPS and web servers / databases:

  • A web server handles about 1K QPS (taking logic processing time and database query bottlenecks into account).
  • A SQL database handles about 1K QPS (less if there are many JOIN and INDEX queries).
  • A NoSQL database (Cassandra) handles about 10K QPS.
  • A NoSQL database (Memcached) handles about 1M QPS (it can be treated as a cache: it lives in memory rather than on disk, so it is fast).

Fetching data from a database is constrained by the performance of the database.
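These rules of thumb turn a QPS target into a rough machine count. A minimal sketch, assuming the per-machine capacities quoted above; the dictionary keys and the function name are made up for illustration.

```python
import math

# Rough per-machine capacities (requests per second) quoted in the text.
CAPACITY_QPS = {
    "web_server": 1_000,
    "sql_database": 1_000,
    "cassandra": 10_000,
    "memcached": 1_000_000,
}

def machines_needed(target_qps, component):
    """Smallest number of machines whose combined capacity covers target_qps."""
    return math.ceil(target_qps / CAPACITY_QPS[component])

print(machines_needed(300_000, "web_server"))   # read peak served by web servers
print(machines_needed(300_000, "cassandra"))    # same load served by Cassandra nodes
print(machines_needed(5_000, "sql_database"))   # write load on SQL databases
```

This ignores replication and headroom, so a real deployment would need more machines than the numbers it prints.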

Break down large systems into smaller services

Step 1: Replay. Go over each requirement again and add a service for it. Step 2: Merge. Merge the services that are the same.

Service: a service groups logic together. The logic for one type of problem goes into one service, and the whole system is divided into several small services.
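Replay and Merge can be pictured as a grouping step. In the sketch below the service names (UserService, TweetService, MediaService, FriendshipService) are assumptions chosen for illustration; the text does not name the services.

```python
from collections import defaultdict

# Replay: assign each requirement to the service that would handle it.
# The service names are hypothetical.
REQUIREMENT_TO_SERVICE = [
    ("Register/Login", "UserService"),
    ("Post a tweet", "TweetService"),
    ("Timeline", "TweetService"),
    ("News Feed", "TweetService"),
    ("Upload image/video", "MediaService"),
    ("Follow/Unfollow a user", "FriendshipService"),
]

def merge_services(pairs):
    """Merge: collect the requirements that landed on the same service."""
    services = defaultdict(list)
    for requirement, service in pairs:
        services[service].append(requirement)
    return dict(services)

for service, requirements in merge_services(REQUIREMENT_TO_SERVICE).items():
    print(service, "->", requirements)
```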

Storage

Once storage is worked out, the basic answer to a system design question is in place.

First, analyze which kind of storage to use and what data goes where:

  • Memory: data that does not have to be persisted can be kept in memory, e.g. Memcache.
  • Database: structured information that must be persisted, such as user information.
  • File system: unstructured information is stored directly in a file system, e.g. Amazon S3.

For this design problem, the storage choices are:

  • SQL database: user information (User Table)
  • NoSQL database: Tweets, Social Graph (follows)
  • File system: videos, images, media files

Unstructured data goes in the file system; structured data goes in the database; data that can afford to be lost goes in the cache.
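In the OO Design spirit mentioned earlier, the storage choices can be sketched as record types. Every class and field name here is an illustrative assumption; the text only fixes which kind of store holds which kind of data.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class User:            # SQL database: User Table
    id: int
    username: str
    email: str

@dataclass
class Tweet:           # NoSQL database
    id: int
    user_id: int
    content: str
    created_at: datetime

@dataclass
class Friendship:      # NoSQL database: one row per "from_user follows to_user" edge
    from_user_id: int
    to_user_id: int

@dataclass
class MediaFile:       # file system: the bytes live in the file store, only the path is kept here
    tweet_id: int
    path: str

alice = User(id=1, username="alice", email="alice@example.com")
tweet = Tweet(id=10, user_id=alice.id, content="hello", created_at=datetime(2024, 1, 1))
print(alice, tweet, Friendship(from_user_id=2, to_user_id=alice.id), sep="\n")
```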

Design News Feed (Timeline)

The core of a News Feed is following and being followed: many small streams of information are merged into one large stream, so every user sees a different feed.

The Pull Model:

K-way merge: when a user views the News Feed, fetch the latest 100 tweets of each friend and merge them into the top 100 News Feed entries. This is the Merge K Sorted Arrays problem.

Complexity analysis: if the user follows N friends, building the News Feed takes N DB reads. The time for the k-way merge can be ignored because merging happens in memory, while DB I/O is roughly three orders of magnitude slower. Posting a tweet takes one DB write.
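The k-way merge itself can be sketched with the standard library's `heapq.merge`. The tweet shape, a `(created_at, tweet_id)` tuple, and the function name are assumptions for illustration; each timeline is expected to arrive already sorted newest first, as it would from a per-user query.

```python
import heapq
from itertools import islice

def merge_news_feed(timelines, limit=100):
    """K-way merge of per-friend timelines, each sorted newest first."""
    merged = heapq.merge(*timelines, key=lambda tweet: tweet[0], reverse=True)
    # heapq.merge is lazy, so only the first `limit` items are ever produced.
    return list(islice(merged, limit))

timelines = [
    [(9, "a3"), (4, "a2"), (1, "a1")],   # friend A, newest first
    [(8, "b2"), (2, "b1")],              # friend B
    [(7, "c1")],                         # friend C
]
print(merge_news_feed(timelines, limit=4))
```

The merge only touches data that is already in memory, which is why the N database reads that produce `timelines` dominate the cost.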

Shortcomings of the Pull model:

N DB reads are very slow.

The pseudocode is as follows:

    getNewsFeed(request):
        # one DB read for the follow list, then one DB read per followed user
        following = DB.getFollowings(user = request.user)
        news_feed = empty
        for follow in following:
            tweets = DB.getTweets(follow.to_user, 100)
            news_feed.merge(tweets)
        sort(news_feed)
        return news_feed

    postTweet(request, tweet):
        # a single DB write
        DB.insertTweet(request.user, tweet)
        return success

The Push Model:

Create a list for each user to store that user's News Feed. When a user posts a tweet, the tweet is written into the News Feed of each follower, one by one (fan-out).

Reading the News Feed => 1 DB read. Posting a tweet => N DB writes, where N is the number of followers. A News Feed table stores id / owner_id / tweet_id / created_at.

The N writes are executed as async tasks processed through a queue.
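A minimal in-memory sketch of this fan-out follows. The list `news_feed_table` stands in for the News Feed table and `queue.Queue` for a real message queue; all function and variable names are illustrative, and a production worker would run in a separate process rather than being called by hand.

```python
import itertools
import queue
import time
from collections import defaultdict

followers_of = defaultdict(set)   # user -> users who follow them
news_feed_table = []              # rows: id / owner_id / tweet_id / created_at
fanout_queue = queue.Queue()      # stand-in for a real message queue
_row_ids = itertools.count(1)

def post_tweet(user, tweet_id):
    """Return quickly: the N feed writes are deferred to the async worker."""
    fanout_queue.put((user, tweet_id, time.time()))

def run_fanout_worker():
    """Drain the queue, writing one News Feed row per follower."""
    while not fanout_queue.empty():
        user, tweet_id, created_at = fanout_queue.get()
        for follower in followers_of[user]:
            news_feed_table.append({
                "id": next(_row_ids),
                "owner_id": follower,
                "tweet_id": tweet_id,
                "created_at": created_at,
            })

def get_news_feed(user, limit=100):
    """Single lookup by owner_id (one DB read in the real system)."""
    rows = [row for row in news_feed_table if row["owner_id"] == user]
    rows.sort(key=lambda row: (row["created_at"], row["id"]), reverse=True)
    return [row["tweet_id"] for row in rows[:limit]]
```

Until the worker has processed a tweet, followers do not see it, which is exactly the timeliness problem of this model.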

Drawbacks of the Push model:

Not timely: if the poster has many followers, the fan-out takes a long time and timeliness cannot be guaranteed.

The pseudocode is as follows:

    getNewsFeed(request):
        return DB.getNewsFeed(request.user)

    postTweet(request, tweet_info):
        tweet = DB.insertTweet(request.user, tweet_info)
        AsyncService.fanoutTweet(request.user, tweet)
        return success

    AsyncService::fanoutTweet(user, tweet):
        # the number of followers may be very large
        followers = DB.getFollowers(user)
        for follower in followers:
            DB.insertNewsFeed(tweet, follower)

Which is better, Push or Pull?

Facebook – Pull

Instagram – Push & Pull

Twitter – Pull

This is where the ability to weigh tradeoffs shows. Where is each method strong, where is it weak, and can it be optimized?

Tradeoff points: user experience matters; the zombie-follower problem; and so on.

Scale

Step 1: Optimize

Resolve the shortcomings of the Pull model:

  • Problem: the slowest part, reading the database, happens while the user is waiting for the request to return.
  • Add a cache in front of the DB.
  • Cache each user's Timeline: N DB requests -> N cache requests (N is the number of friends you follow).

Tradeoff: rather than caching everything, cache only the latest 1,000 News Feed entries for each user:

  • Users without a cached News Feed: merge the latest 100 tweets of each of the N followed users and take the top 100.
  • Users with a cached News Feed: merge only the tweets that the N followed users posted after a given timestamp (the newest entry already in the cache).
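The two cases above can be sketched as one refresh function. This is an assumption-laden illustration: tweets are `(created_at, tweet_id)` tuples, timelines are sorted newest first, and the cut-off timestamp is taken to be the newest entry already in the cache. It does not handle a newly followed user whose older tweets are missing from the cache.

```python
import heapq
from itertools import islice, takewhile

FEED_CACHE_SIZE = 1000   # keep only the latest 1,000 entries per user

def refresh_feed(cached_feed, timelines):
    """Return the updated feed, newest first."""
    by_time = lambda tweet: tweet[0]
    if not cached_feed:
        # Cold cache: merge the latest 100 tweets of every followed user, keep the top 100.
        recent = [timeline[:100] for timeline in timelines]
        return list(islice(heapq.merge(*recent, key=by_time, reverse=True), 100))
    # Warm cache: only tweets newer than the newest cached entry need merging.
    last_seen = cached_feed[0][0]
    fresh = [list(takewhile(lambda t: t[0] > last_seen, timeline)) for timeline in timelines]
    merged = heapq.merge(cached_feed, *fresh, key=by_time, reverse=True)
    return list(islice(merged, FEED_CACHE_SIZE))

timelines = [[(5, "a2"), (1, "a1")], [(4, "b2"), (2, "b1")]]
feed = refresh_feed([], timelines)
timelines = [[(7, "a3")] + timelines[0], [(6, "b3")] + timelines[1]]
print(refresh_feed(feed, timelines))
```

On a warm cache each timeline contributes only its new tweets, so the work is proportional to recent activity instead of N times 100.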

Address the shortcomings of the Push model:

  • Wastes a lot of storage space (disk): acceptable, because disk is cheap.
  • Inactive users: rank followers by weight (for example, last login time).
  • Number of followers >> number of users followed: a single tweet from such a user triggers a very large number of writes.
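Ranking followers by weight can be as simple as sorting the fan-out list. A minimal sketch, assuming last login time is the weight and that it is available as a plain mapping; the function and parameter names are hypothetical.

```python
def fanout_order(followers, last_login):
    """Order followers so that recently active users receive the tweet first.

    last_login maps user -> timestamp; users with no recorded login go last.
    """
    return sorted(followers, key=lambda user: last_login.get(user, 0), reverse=True)

print(fanout_order(["a", "b", "c"], {"a": 10, "b": 30}))
```

Active users get their feed updated early, and writes for inactive users can be delayed or skipped.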

General optimization principle: try to optimize with minimal changes to the current model, for example by adding more machines for Push tasks; then estimate long-term growth and judge whether it is worth switching models.

Social apps generally start with Push because it is easy to implement, and gradually move toward Pull.

Optimization scheme combining Push and Pull:

  • Ordinary users still use Push.
  • Tag star users.
  • Tweets from star users are not pushed into followers' News Feeds.
  • When a user requests the News Feed, tweets are pulled from the Timelines of the stars they follow and merged into it.

Problems: how to define a star, and what happens when a star loses followers. Solution: re-evaluate star status on a regular schedule rather than on the fly, and display the star's Timeline merged with the user's own News Feed.
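The combined scheme can be sketched in a few lines. The follower threshold is a made-up number, the tweet shape `(created_at, tweet_id)` is an assumption, and the function names are hypothetical.

```python
import heapq
from itertools import islice

STAR_FOLLOWER_THRESHOLD = 1_000_000   # hypothetical cut-off for tagging a star

def is_star(follower_count):
    return follower_count >= STAR_FOLLOWER_THRESHOLD

def fanout_targets(author_follower_count, followers):
    """Write path: ordinary users push to every follower, stars push to nobody."""
    return [] if is_star(author_follower_count) else list(followers)

def hybrid_news_feed(pushed_feed, star_timelines, limit=100):
    """Read path: merge the pre-built feed with the Timelines of followed stars.

    Every input list is sorted newest first.
    """
    merged = heapq.merge(pushed_feed, *star_timelines, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))

print(fanout_targets(5_000_000, ["bob", "carol"]))
print(hybrid_news_feed([(5, "x"), (1, "y")], [[(6, "s2"), (3, "s1")]]))
```

Readers pay a few extra reads for the stars they follow, while the write path avoids fanning out one tweet to millions of feeds.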

When to use Push: resources are limited; you want less code and less work; real-time requirements are low; users post rarely; relationships are two-way, so there is no star problem (e.g. a circle of friends).

When to use Pull: resources are sufficient; real-time requirements are high; users post a lot; relationships are one-way, so the star problem exists.

The rest of Twitter's features are left for you to design. That wraps up this installment's system design summary. It is a bit scattered but very useful. I suspect many readers never got this far and closed the page when they saw a wall of plain text. For those who did make it here, a small bonus: the address of DailyProject! It gathers the posts that are updated every day in one place, which is super convenient.

To get access, follow the public account "Pick's pooper" and reply "Daily" to obtain the latest address. Reply "code" for the ultimate coding experience.

With a public account this hardcore, why not follow it?