1. How do you design a database at the billion-record scale, with the data volume growing by 100 million records per year?

mp.weixin.qq.com/s/EY1L-7GpZ…

2. What is your understanding of CAP and BASE theory?

A:

All three properties of CAP cannot be satisfied at the same time in a distributed system.
  • Consistency: whether the data across multiple replicas in a distributed system is consistent; every read either returns the latest data or fails. The emphasis is on data correctness.

  • Availability: the services provided by the distributed system must always be available, and every user request receives a correct response within a bounded time (but the response is not guaranteed to contain the latest data). Characteristics: data is always returned and no error is returned, but the data is not guaranteed to be up to date; the emphasis is on never failing, with reads and writes always succeeding.

  • Partition tolerance: most distributed systems are deployed across multiple subnetworks, and each subnetwork is a partition. Partition tolerance means that when some nodes lose messages or a partition fails, the system can continue to provide services that satisfy consistency and availability. Characteristics: the system keeps running regardless of any internal data synchronization problems; the emphasis is on not going down.

The significance of BASE theory is that we do not have to make an all-or-nothing choice between A and C; we can achieve partial A and partial C.
  • Eventually consistent: since the data cannot stay in a soft state forever, all replicas must become consistent within a certain period of time (this period is the inconsistency window); after that window, eventual consistency of the data must be guaranteed.

  • Basically Available: do not insist on full CAP. The emphasis is that when an unexpected failure occurs in a distributed system, a partial loss of availability is allowed compared with a normal system; this may show up as longer response times, only core functions remaining available, or service degradation.

  • Soft state: soft state corresponds to atomicity in ACID, which is a "hard state"; soft state allows data in the system to exist in an intermediate state that is not considered to affect the overall availability of the system, i.e., it allows replication lag between multiple data replicas.

3. Rate limiting algorithms?

  • Counter algorithm

    In general, we limit the number of requests that can pass within one second. For example, if the QPS limit is 100, the algorithm starts a timer at the first request; within the following 1 s, each request increments a counter by 1, and once the count reaches 100, subsequent requests are rejected outright. When the 1 s window ends, the counter is reset to 0 and counting starts again.

The implementation might look like this: for each service invocation, AtomicLong#incrementAndGet() increments the counter by one and returns the latest value, which is then compared with the threshold (a sketch follows below).

As we all know, this implementation has a drawback: if 100 requests pass within the first 10 ms of the 1 s window, the only option for the remaining 990 ms is to reject requests. We call this the "spike phenomenon".
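To make this concrete, here is a minimal sketch of the fixed-window counter limiter described above, built around AtomicLong#incrementAndGet(); the class and parameter names (CounterRateLimiter, limit, windowMillis) are illustrative, not from the original text.

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal fixed-window counter rate limiter (sketch).
public class CounterRateLimiter {
    private final long limit;           // max requests allowed per window, e.g. 100
    private final long windowMillis;    // window length, e.g. 1000 ms
    private final AtomicLong counter = new AtomicLong(0);
    private long windowStart = System.currentTimeMillis();

    public CounterRateLimiter(long limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            // window has ended: reset the counter and start a new window
            windowStart = now;
            counter.set(0);
        }
        // increment and compare with the threshold, as described above
        return counter.incrementAndGet() <= limit;
    }
}
```

With `new CounterRateLimiter(100, 1000)`, the 101st call inside one window returns false, which also illustrates the spike phenomenon: all 100 permits can be consumed at the very start of the window.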

  • Leaky bucket algorithm

    To eliminate the "spike phenomenon", the leaky bucket algorithm can be used for rate limiting. The leaky bucket algorithm is vividly named: inside the algorithm there is a container, similar to a funnel used in daily life; no matter how large the flow pouring in from above is, the rate at which it flows out below stays the same.

No matter how unstable the caller is, with the leaky bucket algorithm requests are processed at a fixed interval, for example once every 10 ms. Because the processing rate is fixed while the rate of incoming requests is unknown, many requests may arrive suddenly; requests that cannot be processed yet are put into the bucket first, and since it is a bucket, it must have a capacity limit.

On the implementation side, a queue can be used to hold the requests, and a thread pool can periodically fetch requests from the queue and execute them; several can be fetched at a time for concurrent execution.

This algorithm also has a drawback: it cannot cope with short bursts of traffic.
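A minimal sketch of the queue-plus-scheduler idea described above; the names (LeakyBucketLimiter, capacity, leakIntervalMillis) and the choice of a fixed worker pool are assumptions for illustration.

```java
import java.util.concurrent.*;

// A minimal leaky bucket sketch: pending requests wait in a bounded queue
// and are drained ("leaked") at a fixed rate.
public class LeakyBucketLimiter {
    private final BlockingQueue<Runnable> bucket;   // the "bucket" holding pending requests
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final ScheduledExecutorService leaker = Executors.newSingleThreadScheduledExecutor();

    public LeakyBucketLimiter(int capacity, long leakIntervalMillis) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        // leak at a fixed rate: take one request from the bucket every interval
        leaker.scheduleAtFixedRate(() -> {
            Runnable task = bucket.poll();
            if (task != null) {
                workers.submit(task);
            }
        }, leakIntervalMillis, leakIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Returns false (request rejected) when the bucket is already full.
    public boolean submit(Runnable task) {
        return bucket.offer(task);
    }
}
```

Whatever the arrival rate, tasks leave the bucket at most once per interval, which is exactly why this scheme smooths spikes but cannot absorb a legitimate short burst any faster.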

  • Token bucket algorithm

In a sense, the token bucket algorithm is an improvement on the leaky bucket algorithm. The leaky bucket algorithm can limit the rate of request invocation, while the token bucket algorithm can limit the average invocation rate while still allowing a certain degree of burst.

In the token bucket algorithm there is a bucket that holds up to a fixed number of tokens, and a mechanism adds tokens to the bucket at a fixed rate. Each invocation must obtain a token first and may only proceed after obtaining one; otherwise it either waits for an available token or is rejected outright.

Adding tokens is an ongoing action; if the number of tokens in the bucket has reached the limit, the new token is discarded. As a result, the bucket can accumulate a large number of available tokens, and incoming requests can obtain a token directly. For example, if the QPS is set to 100, then one second after the rate limiter is initialized the bucket already holds 100 tokens; if the service has not fully started by then, the limiter can withstand an instantaneous burst of 100 requests once the service is ready. A request only waits when there is no token in the bucket, so execution eventually settles at a fixed rate.

For implementation, you can prepare a queue to store tokens and use a thread pool to periodically generate tokens and put them into the queue; each request takes a token from the queue before continuing (a sketch follows below).
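A minimal sketch of the token bucket described above, with tokens refilled into a bounded queue at a fixed rate; the names (TokenBucketLimiter, capacity, refillIntervalMillis) are illustrative.

```java
import java.util.concurrent.*;

// A minimal token bucket sketch: tokens are produced at a fixed rate into a
// bounded queue, and each request must take one token before executing.
public class TokenBucketLimiter {
    private static final Object TOKEN = new Object();
    private final BlockingQueue<Object> tokens;

    public TokenBucketLimiter(int capacity, long refillIntervalMillis) {
        this.tokens = new ArrayBlockingQueue<>(capacity);
        ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor();
        // keep adding tokens; offer() silently discards a token once the bucket is full
        refiller.scheduleAtFixedRate(() -> tokens.offer(TOKEN),
                0, refillIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Non-blocking: true if a token was available, false means reject the call.
    public boolean tryAcquire() {
        return tokens.poll() != null;
    }

    // Blocking variant: wait until a token becomes available.
    public void acquire() throws InterruptedException {
        tokens.take();
    }
}
```

In practice, Guava's RateLimiter provides a production-grade limiter built on this kind of token-replenishment idea, so you rarely need to roll your own.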

Cluster rate limiting

The algorithms discussed above all belong to single-machine rate limiting, but business requirements vary widely, and simple single-machine rate limiting alone cannot satisfy them.

For example, to limit how often each user or each merchant may access a resource, say at most twice within 5 s or 1,000 invocations per day, rate limiting cannot be done on a single machine; in this case cluster-level rate limiting is needed.

How can that be done? To control the number of accesses, you need a counter, and it can only be stored in a third-party service such as Redis.

General idea: for example, to limit how many times a user may access the /index interface, concatenate the user ID and the interface name into a Redis key; each time the user accesses the interface, run the INCR command on that key and set an expiration time on the key. This limits the access frequency within the specified time window.
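A minimal sketch of the INCR + EXPIRE approach, assuming the Jedis client; the key format, class name, and parameters are illustrative.

```java
import redis.clients.jedis.Jedis;

// A minimal cluster-wide rate limit sketch backed by a Redis counter.
public class RedisRateLimiter {
    private final Jedis jedis;

    public RedisRateLimiter(Jedis jedis) {
        this.jedis = jedis;
    }

    // Allows at most `limit` calls per `windowSeconds` for a user/interface pair,
    // e.g. isAllowed("10086", "/index", 2, 5) = at most 2 calls every 5 s.
    public boolean isAllowed(String userId, String api, int limit, int windowSeconds) {
        String key = "rate:" + userId + ":" + api;  // user ID + interface name as the key
        long count = jedis.incr(key);               // atomic counter shared by the cluster
        if (count == 1) {
            // first access in this window: set the expiry so the counter resets itself
            jedis.expire(key, windowSeconds);
        }
        return count <= limit;
    }
}
```

Note that INCR and EXPIRE here are two separate commands; in production they are usually wrapped in a Lua script or pipeline so that a crash in between cannot leave a counter without an expiry.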

4. Cache (Redis) and database consistency solutions?

juejin.cn/post/685041…

5. How does the final keyword work?

juejin.cn/post/684490…

6. Java object allocation?

juejin.cn/post/689833…

7. Slow SQL optimization ideas?

1. High database CPU load: usually the query statement contains a lot of computation logic, which drives up the database CPU.

2. The server stalls due to high I/O load: this is usually caused by a full-table scan on a query without an index.

3. The query statement looks normal and the index exists, but the query is still slow: in this case you need to check whether the index has actually failed to take effect.

juejin.cn/post/684490…

8. What attributes does the Monitor object have?