System Design Special Topic: Amazon Sorting System Design (Mock Interview)

Background:

Interviewee: 4 years of experience, preparing to interview at Stripe

Interviewer: from Amazon

Scientiacoder: HTTPS Scientiacoder IO

Background:

Amazon has 10 million products and 1,000 categories. Each product can belong to 1 to N categories (one or more).

For example, when you browse items they are grouped by category: an iPhone is listed under iPhones, electronics, and phones. Design a system that ranks the products within each category by sales (number of units sold in the last n days).

start

As the interviewee understood the problem, the API receives a category and returns the top n products in it.

The rankings must be updated hourly.

The interviewee started describing the architecture from the load balancer (this is a pitfall; more on it later), as shown below:

Here are the components:

  1. Load balancer: the entry point for traffic to the back end
  2. Server: the back-end server that handles requests and returns responses
  3. Redis cache: stores the top-K results for each category
  4. Batch job: runs periodically, pulls data from the DB, computes the rankings, and writes them to the cache
  5. Timeseries_db: time-series database used to store transaction records
  6. Relational_db: relational database used to store product_info

Then he explained the logic: since products within a category are ranked by purchases made in the last n days, it is a good idea to store transaction records in a time-series database (he chose Cassandra; I also think a time-series database is a good fit). The batch job then pulls data from that database periodically, computes the rankings, and writes them to Redis.
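
The batch job described above can be sketched as follows. This is a minimal illustration rather than the interviewee's code: the dictionaries `PRODUCT_CATEGORIES` and `CACHE` are hypothetical in-memory stand-ins for relational_db and Redis, and the function names, sample products, and dates are assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical in-memory stand-ins for the stores named in the design.
PRODUCT_CATEGORIES = {            # relational_db: product -> its categories
    "iphone": ["iphones", "electronics", "phones"],
    "pixel": ["electronics", "phones"],
    "kindle": ["electronics"],
}
CACHE = {}                        # Redis stand-in: category -> ranked list


def run_batch_job(transactions, now, n_days, k):
    """Recompute the top-k products per category from the last n_days of sales."""
    cutoff = now - timedelta(days=n_days)
    sales = defaultdict(Counter)  # category -> Counter(product -> units sold)
    for product, quantity, sold_at in transactions:
        if sold_at < cutoff:
            continue              # outside the n-day window
        for category in PRODUCT_CATEGORIES.get(product, []):
            sales[category][product] += quantity
    for category, counts in sales.items():
        CACHE[category] = counts.most_common(k)


def top_products(category, n):
    """Read path: serve the top n products from the cache only."""
    return CACHE.get(category, [])[:n]


now = datetime(2024, 1, 10)
transactions = [                  # (product, units sold, timestamp)
    ("iphone", 5, datetime(2024, 1, 9)),
    ("pixel", 3, datetime(2024, 1, 8)),
    ("kindle", 7, datetime(2024, 1, 9)),
    ("pixel", 4, datetime(2023, 12, 1)),   # too old, ignored
]
run_batch_job(transactions, now, n_days=7, k=2)
print(top_products("electronics", 2))  # [('kindle', 7), ('iphone', 5)]
```

Because a product counts toward every category it belongs to, one sale of an iPhone moves it up in three rankings at once. In a real deployment the hourly schedule would come from cron or a workflow scheduler, and the cache write would go to Redis instead of a dictionary.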

Interviewer: Suppose someone wants to add an item that already has sales history. Does your system support that? For example, you sell some items on a third-party marketplace and now want to sell them on Amazon, and you provide the transaction records from that third party. Can those records be used to determine the item's rank?

Scientiacoder comment: This is a good question. The interviewee's solution was to add the third-party transaction records to timeseries_db, so that the next batch job run reads them along with the rest of the n-day window and updates the ranking. An alternative that does not wait for the next batch job (cron) is to keep the top-K items in a heap, compare the new item's sales directly against the top of the heap, and also write the records to timeseries_db.
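
The heap idea in the comment above can be sketched with Python's `heapq`. This is a sketch under stated assumptions: the function name `offer`, the `(sales, product)` tuple layout, and the sample data are hypothetical, and the heap is a min-heap so that its top is the weakest of the current top K.

```python
import heapq


def offer(heap, k, product, sales):
    """Try to place a newly listed product into a size-k min-heap.

    Entries are (sales, product) tuples, so heap[0] is always the weakest
    of the current top k and one comparison decides whether the newcomer
    belongs in the ranking. Assumes the product is not already in the heap.
    """
    entry = (sales, product)
    if len(heap) < k:
        heapq.heappush(heap, entry)
        return True
    if entry > heap[0]:
        heapq.heapreplace(heap, entry)   # pop the weakest, push the newcomer
        return True
    return False


top3 = [(120, "case"), (300, "charger"), (950, "iphone")]
heapq.heapify(top3)

accepted = offer(top3, 3, "imported-earbuds", 400)   # beats "case"
rejected = offer(top3, 3, "cable", 50)               # below the weakest entry
ranking = sorted(top3, reverse=True)
print(accepted, rejected, ranking)
```

Each offer costs O(log K), so an imported item can enter the ranking immediately. The transaction records still need to be written to timeseries_db so that the next batch run produces the same result.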

Interviewer: What happens if Redis goes down? (This architecture does in fact have a single point of failure.)

If Redis is down, it is not practical to rerun the batch job on every request and serve the results straight from the database.

Then came the coding part: just write out the class and its functions. The interviewee then wrote his code.

Interviewer: I had a good time in this interview. Here is my feedback:

  1. Be sure to ask more clarifying questions. Don't write code at first; clarify the requirements, constraints, scale, and so on. With 3 years of work experience, always rushing to write code does not look good. If you are unsure what to ask, or feel you have no time to ask, it is easy to skip questions. The interviewer may then bring up information later that forces you to change your design or, even worse, deliberately hold back something that breaks your design because you never asked.
  2. When doing the high-level design, start at the specific point where the request begins and ends: the client or caller that initiates the HTTP request, not the load balancer. You started with the load balancer without saying anything about where the requests come from. This is usually bad form; always start with the source of the request. It can be risky, since the interviewer can always sidetrack you into questions about DNS, but I think it looks better and gives a clearer view of the overall life cycle of the request and the flow of information you are modeling.
  3. Do the capacity estimation! You skipped this part completely. I know it can be boring, but spending 2-4 minutes estimating how much usage the service will see is worth it. Had you done this up front, you would have realized that a single Redis is not enough to serve 100 billion read requests per month (about 40,000 requests per second, with uneven traffic).
  4. State the assumptions you are making (and the ones the interviewer states). I think you thought through some things during the interview without saying them out loud (such as how the archive, database, and cache are stored in separate tables). I understood your design and inferred them, but it is better to say what you assume. It is also good to state assumptions about traffic patterns and load (for example, a service like the one we are discussing would not have even traffic throughout the day; it would have heavy spikes and fluctuations).
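
The capacity estimate in the feedback can be checked with a few lines of arithmetic. The 30-day month and the peak factor below are assumptions for illustration; only the figure of 100 billion reads per month comes from the interviewer.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60      # 2,592,000, assuming a 30-day month

reads_per_month = 100_000_000_000          # figure quoted by the interviewer
average_rps = reads_per_month / SECONDS_PER_MONTH
print(f"average: {average_rps:,.0f} requests/s")   # average: 38,580 requests/s

# Traffic is uneven, so size for peaks rather than the average.
PEAK_FACTOR = 3                            # hypothetical peak-to-average ratio
peak_rps = average_rps * PEAK_FACTOR
print(f"peak (assumed {PEAK_FACTOR}x): {peak_rps:,.0f} requests/s")
```

The average comes out at roughly 38,600 requests per second, which matches the interviewer's "about 40,000". The peak figure depends entirely on the assumed ratio and should be replaced with a measured or stated one.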

PS

Judging from readers' feedback, this interviewee should not have been.. Scientiacoder will post his own design in an appendix later.

This article was originally published on Scientiacoder IO. For more in-depth material on back-end topics (operating systems, databases, middleware, etc.) and Go internals, please visit HTTPS Scientiacoder IO