Distributed microservices and middleware
4.1 Message Queues (RabbitMQ, Kafka, RocketMQ)
1. Why use message queues? What are the advantages and disadvantages? What business scenarios?
Advantages: decoupling, asynchrony, peak shaving.
Disadvantages:
- Increased system complexity
- Idempotency must be handled
- Message loss
- Message delivery order
- Consistency: systems A, B, and C process a message; A and B succeed but C fails
- Availability: if MQ goes down, all downstream services become unavailable
Business scenarios:
- Batch file-generation tasks, with multiple executors accepting Kafka messages to perform the work
- Non-core business, e.g. updating membership points, sending SMS, generating PDFs
- Scheduled events such as coupon downloads or free-membership giveaways, using MQ for peak shaving
2. How can message consumption be idempotent?
- Use a database table primary key (unique constraint)
- Use a Redis distributed lock / de-duplication key (a sketch follows)
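A minimal sketch of the Redis de-duplication approach, assuming Jedis as the client and a business-level message ID; key names and TTL are illustrative:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class IdempotentConsumer {
    private final Jedis jedis = new Jedis("localhost", 6379);

    /** Returns true only for the first delivery of this message ID. */
    public boolean tryConsume(String messageId) {
        // SET key value NX PX ttl -- succeeds only if the key does not exist yet
        String ok = jedis.set("mq:consumed:" + messageId, "1",
                SetParams.setParams().nx().px(24 * 3600 * 1000L));
        if (!"OK".equals(ok)) {
            return false; // duplicate delivery, skip the business logic
        }
        // ... run the actual business logic here ...
        return true;
    }
}
```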
3. How to solve message loss?
- Producer
  - RabbitMQ: can use the provided transaction mechanism (synchronous), but throughput suffers; better to enable confirm mode (asynchronous): each message is assigned a unique ID, and RabbitMQ sends back an ack on a successful write. If RabbitMQ fails to process the message, it calls back the nack interface to report failure, and the producer can retry.
- Message queue (broker)
  - RabbitMQ: enable persistence to disk, combined with confirm mode so the ack is returned only after the message has been persisted to disk.
  - Kafka: set replication.factor > 1 so every partition has at least two replicas; set min.insync.replicas > 1 so the leader stays in contact with at least one follower; set acks=all and retries=MAX on the producer (see the sketch after this list).
- Consumer
  - RabbitMQ: turn off automatic ack and call the manual ack API after processing.
  - Kafka: turn off automatic offset commits and commit manually after processing; otherwise, when a consumer fails and its partition is reassigned to another consumer, the auto-committed but unprocessed message is lost.
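A hedged sketch of the producer-side Kafka settings described above (min.insync.replicas and replication.factor live in the broker/topic config, not here; the topic name is illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                // wait for leader + in-sync replicas
        props.put("retries", Integer.MAX_VALUE); // retry on transient failures
        props.put("enable.idempotence", "true"); // avoid duplicates introduced by retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"),
                    (metadata, e) -> { if (e != null) e.printStackTrace(); });
        }
    }
}
```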
4. Sequential consumption of message queues?
Scenario: the database inserts, modifies, and deletes data for the same id, in that order, but the consumer receives the messages as modify, delete, insert, leaving the data incorrect.
RocketMQ: a topic has multiple queues. To guarantee ordered sending, RocketMQ provides the MessageQueueSelector queue-selection mechanism:
- Hash mode: send messages for the same order to the same queue, and only send the payment message after the creation message for that order has been sent successfully. This guarantees ordered delivery. (See the sketch after this list.)
- The queue mechanism inside a RocketMQ topic guarantees that storage satisfies FIFO (First In, First Out); the rest only requires consumers to consume in order.
- RocketMQ only guarantees sequential delivery; sequential consumption must be guaranteed by the consumer's business logic!
Kafka: to guarantee that a consumer processing with multiple threads does not scramble the order, distribute the messages to threads through in-memory queues with a hash: write N in-memory queues, hash each message key so that all data belonging together lands in the same queue, then run N threads where each thread consumes exactly one queue. This preserves ordering.
(https://blog.csdn.net/weixin_43934607/article/details/115631270)
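A minimal sketch of RocketMQ's MessageQueueSelector hash mode (group, topic, and the order ID are illustrative):

```java
import java.util.List;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.MessageQueueSelector;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageQueue;

public class OrderedSend {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("order_group");
        producer.setNamesrvAddr("localhost:9876");
        producer.start();

        long orderId = 10001L; // illustrative business key
        Message msg = new Message("order-topic", "CREATE", ("order " + orderId).getBytes());

        // Hash the order id onto a fixed queue so all messages of one order stay ordered.
        producer.send(msg, new MessageQueueSelector() {
            @Override
            public MessageQueue select(List<MessageQueue> mqs, Message m, Object arg) {
                long id = (Long) arg;
                return mqs.get((int) (id % mqs.size()));
            }
        }, orderId);

        producer.shutdown();
    }
}
```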
5. What are the differences and application scenarios between Kafka, RabbitMQ, ActiveMQ and RocketMQ?
(1) Differences
- ActiveMQ: fully compatible with the JMS specification; occasionally loses messages with low probability.
- RabbitMQ: developed in Erlang, so hard to read and customize; very active community.
- RocketMQ: developed in Java by Alibaba, so it can be controlled and extended in-house; active technical community; supports transactions.
- Kafka: high throughput, distributed, scales arbitrarily.
(2) Applicable scenarios
- ActiveMQ: not recommended.
- RabbitMQ: a reasonable default thanks to its active community.
- RocketMQ: fine if you can accept the risk of lower community activity.
- Kafka: well suited to big-data log pipelines.
6. How to ensure that message queues are highly available?
RabbitMQ high availability:
- Single-machine mode
- Ordinary cluster mode (queue data lives on one node; others hold only metadata)
- Mirrored cluster mode (queue data is replicated to every node)
Kafka high availability:
- Storage structure: one folder per topic-partition; each message in the partition is assigned an ordered ID called the offset; a partition contains multiple segments.
- Replication mechanism:
  1. Each partition's replicas elect a leader, and producers and consumers only talk to the leader.
  2. Writes go to the leader and followers pull from the leader; the write is acknowledged only after the required followers return an ack. Reads also go through the leader.
7. How to deal with the backlog of millions of messages in the message queue?
- Temporary emergency capacity expansion:
  1) First fix the consumer's problem to ensure its consumption speed is restored, then stop all existing consumers.
  2) Create a new topic whose partition count is 10x the original.
  3) Write a temporary consumer program that only distributes data: it consumes the backlog and, without any time-consuming processing, writes the messages round-robin into the 10x partitions of the new topic.
  4) Then temporarily requisition 10x the machines to deploy consumers, each consuming data from one temporary partition.
  5) This temporarily expands queue resources and consumer resources by 10x, draining the backlog at 10x the normal speed.
  6) Once the backlog is quickly consumed, restore the original deployment architecture and consume messages with the original consumer machines.
- Data loss caused by RabbitMQ TTL/timeout settings — bulk re-drive: write a script that manually re-sends the lost data to MQ at night, when fewer people are using the system, to recover it.
Kafka features
1. Page cache + mmap. How it works: memory mapping maps files directly into memory using the operating system's pages. After the mapping, your writes to that memory are synced to disk by the operating system (when it sees fit), giving a big I/O boost by avoiding the copy from user space to kernel space.
2. Zero copy, based on sendfile:
- Traditional mode (four copies and four context switches), taking sending a disk file over the network as an example: the file data is first read into a kernel-space buffer via a system call (DMA copy); the application then reads that buffer into a user-space buffer (CPU copy); when sending through the socket, the user program copies the user-space buffer back into a kernel-space socket buffer (CPU copy); finally the data is copied to the NIC buffer via DMA copy. This is accompanied by four context switches.
- Zero copy (based on sendfile / transferTo): after data is copied into the kernel buffer, it is copied directly to the NIC buffer by DMA with no CPU copy — hence the term "zero copy". Besides reducing data copies, the whole read-file-and-send-to-network path is a single sendfile call, so there are only two context switches.
- Note: transferTo and transferFrom do not guarantee zero copy. Whether zero copy is actually used depends on the operating system: if it provides a zero-copy system call like sendfile, the two methods take full advantage of it; otherwise they cannot implement zero copy by themselves. (See the sketch after this section.)
Producer write flow:
(1) The producer first finds the leader of the partition from the nodes in the broker list;
(2) the producer sends the message to the partition's leader;
(3) the leader writes the message to its local log;
(4) followers pull messages from the leader to replicate them and write them to their local logs;
(5) each replica sends an ack to the leader after writing its local log;
(6) after receiving acks from all required replicas, the leader sends an ack to the producer;
(7) the producer receives the leader's ack, proving the data was successfully written to Kafka.
3. Adding nodes and the Kafka data-redistribution question: https://www.cnblogs.com/xionggeclub/p/9390037.html
4. Kafka partitions and principles
- Reasons for partitioning:
(1) From the producer's perspective, partitions make Kafka easier to scale in a cluster. A topic can be composed of multiple partitions residing on different physical broker nodes, so a growing Kafka cluster can accommodate data of any size.
(2) From the consumer's perspective, without partitions one consumer could only consume a topic's data serially; with partitions, multiple consumers form a consumer group and different consumers consume different partitions of the topic, enabling concurrent consumption and greatly improving efficiency.
- Partitioning rules:
(1) If a partition is specified, use it directly.
(2) If no partition is specified but a key is, hash the key's value to pick a partition.
(3) If neither partition nor key is specified, pick a partition round-robin.
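A minimal sketch of zero-copy transfer in Java via FileChannel.transferTo (the file path and port are illustrative); whether CPU copies are actually avoided depends on the OS providing a sendfile-like call:

```java
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopySend {
    public static void main(String[] args) throws Exception {
        try (FileChannel file = new FileInputStream("/tmp/data.log").getChannel();
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0, remaining = file.size();
            // transferTo may send fewer bytes than requested, so loop until done.
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```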
4.2 Distributed Cache Redis
1. Why cache in a project? What are the problems with using caching?
1. Why cache in a project?
- High performance (common scenarios): 1) shared session storing user information; 2) caching frequently used data, such as page views and daily active users.
- High concurrency: instantaneous traffic spikes, e.g. grabbing coupons, grabbing free memberships, etc.
2. What are the problems with using caching?
1) cache/database double-write inconsistency 2) cache avalanche 3) cache penetration 4) cache concurrency contention
2. Redis thread model? Why is the Redis single-threaded model efficient?
1) File event handler (this runs single-threaded, which is why Redis is called single-threaded)
- Structure: server socket (multiple client socket connections) -> IO multiplexer -> queue -> file event dispatcher -> file event handlers (connection-reply handler, command-request handler, command-reply handler).
  - Connecting to Redis: the socket is associated with the connection-reply handler.
  - Writing data to Redis: the socket is associated with the command-request handler.
  - Reading data from Redis: the socket is associated with the command-reply handler.
- Flow: multiple sockets may concurrently produce different operations, each corresponding to a different file event. The IO multiplexer listens on multiple sockets but places them into a queue; each time, one socket is taken from the queue and handed to the file event dispatcher, which selects the matching event handler based on the event that socket produced. After the event is processed, the IO multiplexer hands the next socket in the queue to the dispatcher. (An illustrative NIO sketch follows.)
2) File events: the IO multiplexer can listen for both AE_READABLE and AE_WRITABLE events; the file event dispatcher processes AE_READABLE events first.
2. Why is the single-threaded model efficient?
- Pure in-memory operations
- The core is a non-blocking IO multiplexing mechanism
- A single thread avoids the frequent context switches of multithreading
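For intuition, a minimal Java NIO sketch of the same IO-multiplexing pattern: one thread, one selector watching many sockets. This is an analogy to Redis's file event handler, not its actual C implementation; the port is illustrative:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class MiniEventLoop {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(6380));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {                      // single-threaded event loop
            selector.select();              // multiplexer: block until some socket is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {          // ready events dispatched one by one, like the queue
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {   // analog of the connection-reply handler
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) { // analog of the command-request handler
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(256);
                    if (client.read(buf) == -1) { client.close(); continue; }
                    buf.flip();
                    client.write(buf);      // reply: echo back
                }
            }
        }
    }
}
```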
3. Redis memory eviction policies
1) noeviction: when memory is insufficient to accommodate new writes, new write operations report an error.
2) allkeys-lru: when memory is insufficient, remove the least recently used key (this is the most commonly used policy).
3) allkeys-random: when memory is insufficient, randomly remove some key.
4) volatile-lru: when memory is insufficient, remove the least recently used key among keys with an expiration set.
5) volatile-random: when memory is insufficient, randomly remove a key among keys with an expiration set.
6) volatile-ttl: when memory is insufficient, remove the key with the earliest expiration time among keys with an expiration set.
4. Redis cache avalanche and cache penetration problems
- Cache avalanche:
  - Before: make Redis highly available — master/slave + sentinel, or Redis Cluster — to avoid a total crash.
  - During: local EhCache cache + Hystrix rate limiting & degradation, to avoid MySQL being killed.
  - After: Redis persistence, for fast recovery of cached data.
- Cache penetration — handle it in the business code: for example, if a request with parameter id=-99999 finds nothing in the database, insert redis key=-99999, value=null before returning, so the next identical request hits Redis first. (A sketch follows.)
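A minimal sketch of the null-value caching defense against penetration, assuming Jedis; key names, TTLs, and the database stub are illustrative:

```java
import redis.clients.jedis.Jedis;

public class PenetrationGuard {
    private final Jedis jedis = new Jedis("localhost", 6379);

    public String queryById(long id) {
        String key = "user:" + id;
        String cached = jedis.get(key);
        if (cached != null) {
            return "NULL".equals(cached) ? null : cached; // the null marker also counts as a hit
        }
        String fromDb = queryDatabase(id); // may be null for malicious/unknown ids
        // Cache the miss with a short TTL so a bad id cannot hammer the database.
        jedis.setex(key, fromDb == null ? 60 : 3600, fromDb == null ? "NULL" : fromDb);
        return fromDb;
    }

    private String queryDatabase(long id) { /* illustrative stub */ return null; }
}
```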
5. How to solve the problem of redis concurrency competition? How to ensure data consistency between the cache and the database in dual write?
1. Redis concurrency contention: a CAS scheme using Redis transactions, or a distributed lock ensuring only one instance operates at a time. Before each write, check whether the value's timestamp is newer than the one in the cache, and only then write.
2. Data consistency between cache and database in dual-write mode: use the Cache Aside pattern — on read, try the cache first, fall back to the database, then populate the cache; on write, update the database first and then delete (rather than update) the cache. For stricter consistency, serialize read and write operations on the same key through a queue.
6. Redis Cluster
- Data distribution algorithms
  (1) Traditional hash algorithm (hash(key) % N) disadvantage: once a master goes down, the modulus changes and keys map to the wrong machines, invalidating almost the whole cache and triggering massive cache rebuilding (unacceptable under high concurrency).
  (2) Consistent hash algorithm disadvantage: with few nodes, data skews onto a minority of nodes and creates hotspots (mitigated with virtual nodes).
  (3) Hash slot algorithm: Redis Cluster has a fixed 16384 hash slots and computes the hash slot of each key. Each master in the cluster holds some of the slots — with three masters, each holds roughly a third. Hash slots make it easy to add and remove nodes: to add a master, move some slots over from the other masters; to remove one, move its slots to the others. The client API can also pin specified data into the same hash slot via hash tags.
- Core principles
  (1) Nodes communicate with each other via the gossip protocol.
    a. Centralized vs. gossip:
      - Centralized: advantages — metadata updates and reads have very good timeliness; once metadata changes, it is immediately written to the centralized store and other nodes perceive it on read. Disadvantages — all metadata update pressure concentrates in one place, which may stress the metadata store.
      - Gossip: advantages — metadata updates are scattered rather than concentrated; update requests trickle out to all nodes with some delay, reducing pressure. Disadvantages — metadata updates are delayed, which may make some cluster operations lag (e.g. a reshard followed by another operation sees stale configuration before convergence).
    b. Messages:
      - meet: redis-trib.rb add-node sends a meet to the new node to join the cluster, after which the new node starts communicating with the others.
      - ping: each node frequently sends pings to others, containing its own state and other cluster metadata, used for mutual metadata exchange.
      - pong: the reply to ping and meet, containing the node's own state and other information; also used for broadcasting and updating information.
      - fail: after one node judges another to have failed, it sends a fail message to the other nodes, notifying them that the specified node is down.
    c. Port offset 10000: inter-node communication uses the service port + 10000; e.g. node 7001 uses 17001 to talk to other nodes.
    d. Information exchanged: fault information, hash slot information, node additions and removals.
  (2) Smart Jedis
    a. What is it? A purely redirection-based client wastes network IO, so Jedis is "smart": it maintains a local hashslot -> node mapping table and in most cases finds the node directly in the local cache, needing no MOVED redirect. (A usage sketch follows this answer.)
    b. How JedisCluster works: on initialization it randomly picks a node, builds the hashslot -> node mapping table, and creates a JedisPool connection pool per node. For each operation, JedisCluster first computes the key's hashslot locally, then looks up the node in the local mapping table. If that node still happens to hold the hashslot, fine; if something like a reshard moved it, the node returns MOVED, and the JedisCluster API uses that node's metadata to refresh the local hashslot -> node mapping cache, repeating until the right node is found — erroring out with JedisClusterMaxRedirectionsException after more than 5 retries. Old Jedis versions, before a failed node had finished switching over, would keep refreshing the hash slots and pinging for active nodes, causing heavy network IO overhead; newer versions have optimized this.
    c. Hash slot migration and ASK redirection: if a hash slot is migrating, an ASK redirect is returned to Jedis, which then goes to the target node to execute. Because ASK happens during hash slot migration, it does not refresh the local hashslot -> node mapping cache; MOVED does.
  (3) High availability and master/slave switchover
    a. Failure detection: if a node does not return pong within cluster-node-timeout, it is considered pfail; the detecting node pings other nodes, and if more than half consider it pfail, it becomes fail — essentially odown (almost the same as sentinel).
    b. Slave timeout filtering: for the downed master, one of its slaves must become the new master; check each slave's disconnection time from the master — if it exceeds cluster-node-timeout * cluster-slave-validity-factor, that slave is not eligible to switch to master (like sentinel).
    c. Slave election (sentinel comparison: all slaves are sorted by slave priority, offset, and run id): each slave sets an election time based on its replication offset from the master — a larger offset (more replicated data) yields an earlier election time and priority in the election. All master nodes vote in the slave election; if the majority of masters (N/2 + 1) vote for a slave, the election passes and that slave switches over to become the master.
    d. Compared with sentinel: the whole process is very similar, so Redis Cluster is effectively a powerful integration of replication and sentinel.
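A minimal JedisCluster usage sketch (the node address is illustrative); the client computes slots and handles MOVED/ASK redirects internally, as described above:

```java
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class ClusterClientDemo {
    public static void main(String[] args) {
        Set<HostAndPort> nodes = new HashSet<>();
        nodes.add(new HostAndPort("127.0.0.1", 7001)); // any node; the client discovers the rest
        try (JedisCluster cluster = new JedisCluster(nodes)) {
            // The client hashes "user:1" to a slot locally and routes to the owning master.
            cluster.set("user:1", "alice");
            System.out.println(cluster.get("user:1"));
        }
    }
}
```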
7. What is the difference between Redis cluster mode and master-slave mode and Sentinel mode
Master/slave is for data backup, sentinel focuses on high availability, and cluster improves concurrency and capacity.
1. Master/slave: read/write separation, data backup, load balancing; one master can have multiple slaves.
2. Sentinel: monitoring and automatic failover; when the sentinels find the master is down, they elect a new master from the slaves.
3. Cluster: to solve the limited capacity of a single Redis instance, data is distributed across multiple machines by fixed rules, so memory/QPS is no longer bound to a single machine and the system gains the high scalability of a distributed cluster.
- In sentinel mode, monitoring authority belongs to the sentinel system; in cluster mode, the working nodes monitor each other themselves.
- In sentinel mode, a leader sentinel node is elected to handle failover; in cluster mode, a new master is elected from the slave nodes to handle failover.
4.3 Distributed Search Engine Elasticsearch
1. How is ES distributed?
1. Structure: index -> type -> mapping -> document -> field.
2. An index can be split into multiple shards, each storing part of the data, and each shard's data actually has multiple backups: a primary shard that accepts writes, plus replica shards. After data is written to the primary shard, it is synchronized to the replica shards.
- Under the replica scheme there are multiple copies of every shard's data, so if one machine goes down it doesn't matter — other copies exist on other machines.
- If an ES cluster has multiple nodes, one node is automatically elected master. The master does management work such as maintaining index metadata and switching primary shard / replica shard identities.
  1. If the master node goes down, a new master is elected.
  2. If a non-master node goes down, the master transfers the primary-shard identity from the failed node to a replica shard on another machine.
  3. If the downed machine is repaired and restarted, the master reassigns the missing replica shards to it and synchronizes the data modified in the meantime, restoring the cluster to normal.
2. How is it deployed in production environment?
(1) We deployed 5 machines in the ES production cluster, each six-core with 64G of memory, so the cluster totals 320G of memory.
(2) Our ES cluster takes about 20 million incremental records per day — about 500MB per day, roughly 600 million records / 15G per month. The system has been running for several months, and the total data volume in the ES cluster is about 100G.
(3) There are currently 5 indexes online (taxpayers' tax payment records, service records, etc. can be placed in ES), each holding about 20G of data. For this data volume we allocate 8 shards per index, 3 more than the default 5.
3. ES write and query workflow?
(1) Write flow
1) The client sends the request to a node of its choice — the coordinating node.
2) The coordinating node routes the document and forwards the request to the corresponding node (the one holding the primary shard).
3) The primary shard on that node processes the request, then synchronizes the data to the replica shards.
4) Once the coordinating node finds that the primary shard and all replica shards are done, it returns the response to the client.
(2) Query — GET one piece of data: you can query by doc id; ES hashes the doc id to figure out which shard it was assigned to and queries from that shard.
1) The client sends the request to any node, which becomes the coordinating node.
2) The coordinating node routes the document and forwards the request to the corresponding node, using round-robin random polling across the primary shard and all its replicas to load-balance read requests.
3) The node receiving the request returns the document to the coordinating node.
4) The coordinating node returns the document to the client. (A client sketch follows this answer.)
(3) Search flow
1) The client sends the request to a coordinating node.
2) The coordinating node forwards the search request to the primary or replica shard of every shard of the index.
3) Query phase: each shard returns its own search results (really just doc ids) to the coordinating node, which merges, sorts, and paginates them to produce the final result.
4) Fetch phase: the coordinating node then pulls the actual documents from the nodes by doc id and returns them to the client.
(4) Underlying write principles: refresh, flush, translog, merge
1) Data is first written to the in-memory buffer; while in the buffer it cannot be searched.
2) Refresh: by default every second the buffer is refreshed into a new segment file in the OS cache, where it becomes searchable (hence "near real-time": data is visible about 1 second after being written).
3) Translog: every write is also appended to the translog; when the translog reaches a certain size, a flush is triggered.
4) Flush: refresh the existing buffer into the OS cache, clear the buffer, write a commit point to disk identifying all segment files covered by that commit point, fsync the OS cache to disk files, and restart a fresh translog. By default a commit runs automatically every 30 minutes, but an oversized translog also triggers one.
5) Merge: for a delete, a .del file is generated at commit time marking the doc as deleted, so searches know to exclude it via the .del file; for an update, the original doc is marked deleted and new data is written. Since a segment file is produced every refresh (default 1s), segment files pile up; merge periodically combines them — physically dropping docs marked as deleted — writes the merged segment file to disk, writes a commit point identifying the new segment file, opens it for search, and deletes the old segment files. (Somewhat like the JVM's mark-copy garbage collection algorithm.)
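A minimal sketch of a write and a GET-by-id using the Java RestHighLevelClient (deprecated in newer ES versions; index name and fields are illustrative):

```java
import java.util.Map;
import org.apache.http.HttpHost;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class EsDemo {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Write: the coordinating node hashes the doc id to pick the primary shard.
            IndexRequest write = new IndexRequest("tax_records")
                    .id("1")
                    .source(Map.of("taxpayer", "acme", "amount", 1200));
            client.index(write, RequestOptions.DEFAULT);

            // GET by doc id: routed round-robin to the primary shard or one replica.
            GetRequest read = new GetRequest("tax_records", "1");
            System.out.println(client.get(read, RequestOptions.DEFAULT).getSourceAsString());
        }
    }
}
```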
4. How to optimize query performance in multi-billion data level scenario?
(1) Performance optimization killer: the filesystem cache
1) ES search relies heavily on the underlying filesystem cache; if you give the filesystem cache more memory (at least half the machine's), ideally enough to hold all index segment files, then most searches run from memory and performance is very high.
2) ES + HBase architecture: put only the search fields (e.g. name, tag) into ES and the full rows into HBase; find the ids by keyword in ES, then fetch the rows directly from HBase by id. Minimize what is stored in ES — don't put non-search fields there and waste memory.
(2) Data warm-up: for data you expect to be accessed frequently, it's best to build a dedicated cache warm-up subsystem that accesses the hot data in advance every so often so it enters the filesystem cache, giving better performance the next time someone accesses it.
(3) Hot/cold separation: similar to MySQL horizontal splitting — write cold data to one index and hot data to another, ensuring that hot data stays in the filesystem OS cache after warm-up and is not flushed out by cold data. Suppose you have 6 machines and 2 indexes, one cold and one hot, each with 3 shards: put the hot index on 3 machines and the cold index on the other 3.
(4) Paging performance optimization
1) Don't allow deep paging (the default deep-paging performance is miserable).
2) For feeds like app product recommendations that keep pulling down page after page, use the scroll API. Scroll works by keeping a snapshot of the data; the scroll API only lets you scroll backwards page by page — you cannot jump to an arbitrary page.
5. TF-IDF (Term Frequency–Inverse Document Frequency)
Term frequency–inverse document frequency (TF-IDF) is a commonly used weighting technique for information retrieval and text mining. TF-IDF is a statistical method for assessing how important a word is to one document in a document set or corpus. A word's importance increases with the number of times it appears in the document, but decreases inversely with how often it appears across the corpus. Search engines often apply some form of TF-IDF weighting as a measure or rating of the relevance between documents and a user query; besides TF-IDF, web search engines also use link-analysis-based ranking methods to determine the order of documents in search results.
How it works:
- Term frequency (TF) is the number of occurrences of a given word in a given document. This count is usually normalized (the numerator is generally smaller than the denominator, distinguishing it from IDF) to prevent bias towards long documents: the same word may have a higher raw count in a long document than in a short one, regardless of its importance.
- Inverse document frequency (IDF) measures the general importance of a word. The IDF of a particular term is obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of the quotient.
- A high term frequency within a particular document, combined with a low document frequency across the whole collection, produces a high TF-IDF weight; TF-IDF therefore tends to filter out common words and retain important ones.
- The main idea: if a word or phrase has a high frequency (TF) in one article and rarely appears in other articles, it is considered to have good discriminating power and to be suitable for classification. TF-IDF is simply TF * IDF.
- The main idea of IDF: the fewer documents contain term t (the smaller n is), the larger IDF is, indicating that t discriminates well between categories. But IDF has a shortcoming in classification: suppose the documents of class C containing term t number m, and the documents of other classes containing t number k, so the total is n = m + k. When m is large, n is also large, and the IDF formula yields a small value — suggesting t discriminates poorly. Yet if a term appears frequently in the documents of one class, it is actually a good representative of that class's text and should be given a high weight and selected as a feature distinguishing that class from other documents. This is where IDF falls short.
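The standard formulas in the usual notation, where n_{i,j} is the raw count of term t_i in document d_j and |D| is the total number of documents:

\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}, \qquad
\mathrm{idf}_{i} = \log \frac{|D|}{\left|\{\, j : t_i \in d_j \,\}\right|}, \qquad
\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_{i}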
4.4 Distributed Architecture
4.4.1 Dubbo + Zookeeper
1. What communication and serialization protocols does Dubbo support?
(1) dubbo protocol (default): single long connection, NIO asynchronous communication, based on the hessian serialization protocol
(2) rmi protocol: Java binary serialization
(3) hessian protocol: also hessian serialization
(4) http protocol: JSON serialization
(5) webservice protocol: SOAP text serialization
2. Dubbo load balancing strategy, cluster fault tolerance strategy and dynamic proxy strategy
(1) Dubbo load balancing policies
1) random loadbalance (default): random selection by weight
2) roundrobin loadbalance: round-robin by weight
3) leastactive loadbalance: automatically senses that the worse a machine performs, the fewer requests it receives and the less active it is
4) consistenthash loadbalance: a consistent-hash algorithm that sends requests with the same parameters to the same provider; when that provider fails, the remaining traffic is evenly redistributed based on virtual nodes, so the jitter is not too great
(2) Dubbo cluster fault-tolerance policies
1) failover cluster (default): on failure, automatically switch and retry on other machines
2) failfast cluster: fail immediately on a failed call — common for write operations
3) failsafe cluster: ignore exceptions — often used for unimportant interfaces, such as logging
4) failback cluster: failed requests are automatically recorded in the background and resent periodically
5) forking cluster: call multiple providers in parallel, returning as soon as one succeeds
6) broadcast cluster: call all providers one by one
(3) Dubbo dynamic proxy policy: by default uses Javassist dynamic bytecode generation to create proxy classes, but you can plug in your own dynamic proxy policy through the SPI extension mechanism
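A hedged sketch of selecting these policies per reference, assuming Dubbo 2.7's annotation API (the service interface and values are illustrative):

```java
import org.apache.dubbo.config.annotation.Reference;

interface OrderService {
    String findOrder(long id);
}

public class OrderClient {
    // Per-reference policy selection: leastactive load balancing, failover
    // fault tolerance with 2 retries after the initial attempt.
    @Reference(loadbalance = "leastactive", cluster = "failover", retries = 2)
    private OrderService orderService;
}
```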
3. The Zookeeper registry
1. If the Zookeeper cluster goes down, can publishers and subscribers still communicate?
Yes. When Dubbo starts, consumers pull the registered producers' addresses, interfaces, and other data from Zookeeper and cache them locally; each invocation then uses the locally stored address.
2. Zookeeper guarantees CP, so the cluster locks up during re-election and the watch mechanism cannot be used; ephemeral session nodes die with their sessions.
- What is the difference between the startup election and the crash-recovery election?
- The ZAB protocol
- Zookeeper's three roles
4.4.2 Distributed Lock
1. Distributed lock implementations and their differences? Application scenarios?
1. Zk distributed lock
- Unfair lock
  - Principle: a node tries to create an ephemeral znode; whoever creates it successfully gets the lock. Other clients' create attempts then fail, and they can only register a listener watching the lock. Releasing the lock means deleting the znode; once it is released, all clients are notified and a waiting client re-attempts the lock.
  - Problem: under high concurrency, when the ephemeral node disappears, many threads simultaneously try to create it, which affects ZK's stability. This is called the herd (thundering herd) effect.
- Fair lock
  - Principle: each thread creates an ephemeral sequential node under the lock node. If your node is the smallest sequential node, you hold the lock; if not, watch the node just smaller than yours — each thread listens to exactly one node.
  - Advantage: each release affects only one waiter.
- Shared lock: read/write lock.
2. Redis distributed lock
(1) SET key value NX PX milliseconds, e.g. SET my:lock <random_value> NX PX 10000 (NX: set only if the key does not exist; PX: expire after 10000 ms, i.e. 10s — without the expiry, a crashed Redis client would leave a deadlock).
  - The value must be unique among all competitors for the same key; otherwise, after a timeout, someone else acquires the lock and your delayed release would delete their lock, breaking business consistency.
  - Release the lock with a Lua script that tells Redis: delete the key only if it exists and stores the same value I set. (See the sketch after this section.)
(2) Official recommendation — the RedLock algorithm:
  1. Get the current time in milliseconds.
  2. Try to acquire the lock in all N instances sequentially, using the same key name and random value in every instance. When setting the lock on each instance, the client uses a timeout much smaller than the total lock auto-release time — e.g. if the auto-release time is 10 seconds, the per-instance timeout might be ~5-50 ms. This prevents the client from blocking for a long time talking to a dead Redis node: if an instance is unavailable, move on to the next as soon as possible.
  3. The client computes how long acquiring the lock took by subtracting the timestamp from step 1 from the current time. The lock is acquired if and only if the client locked a majority of instances (at least 3 of 5) and the total elapsed time is less than the lock validity time.
  4. If the lock was acquired, its validity is the initial validity minus the elapsed time computed in step 3.
  5. If the client failed to acquire the lock for some reason (it could not lock N/2 + 1 instances, or the validity time went negative), it tries to unlock all instances (even those it believes it did not lock).
3. Redis distributed lock vs. ZK distributed lock
- Redis distributed lock: the client must keep polling to try to acquire the lock, which costs performance; if the client holding the lock is buggy or dies, the lock is only released after the timeout.
- ZK distributed lock: if a client fails to get the lock, it registers a listener and does not need to actively retry, so the performance cost is low; since the znode is ephemeral, if the client dies the znode disappears and the lock is released automatically.
4. Distributed interface idempotency and ordering
- Idempotency: for writes, use a field like orderId as the Redis key and verify each request against that key.
- Ordering: best to combine the two operations into one, or avoid such logic altogether.
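A minimal sketch of the SET NX PX lock with a Lua-scripted release, assuming Jedis; key names and TTL are illustrative:

```java
import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLock {
    private static final String RELEASE_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";

    private final Jedis jedis = new Jedis("localhost", 6379);

    /** Returns a token on success (needed to release), or null if the lock is held. */
    public String tryLock(String key, long ttlMillis) {
        String token = UUID.randomUUID().toString(); // unique per competitor
        String ok = jedis.set(key, token, SetParams.setParams().nx().px(ttlMillis));
        return "OK".equals(ok) ? token : null;
    }

    /** Deletes the key only if it still holds our token, so we never free someone else's lock. */
    public boolean unlock(String key, String token) {
        Object result = jedis.eval(RELEASE_SCRIPT,
                Collections.singletonList(key), Collections.singletonList(token));
        return Long.valueOf(1L).equals(result);
    }
}
```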
4.4.3 Distributed Transactions
1. What are the schemes for distributed transactions? Their advantages and disadvantages?
(1) Two-phase commit / XA scheme (not recommended: slow and easily blocked)
- Single point of failure: because the coordinator is so important, once it fails the participants stay blocked. Especially in phase two, if the coordinator fails, all participants are still holding transaction-resource locks and cannot complete the transaction. A new coordinator can be re-elected, but that does not unblock the participants stuck because the old coordinator died.
- Data inconsistency: in phase two, after the coordinator sends the commit request, a local network anomaly occurs or the coordinator crashes mid-send, so only some participants receive the commit request. Those participants perform the commit, but the machines that never received the request cannot commit, leaving the whole distributed system with inconsistent data.
- Uncertain transaction state: the coordinator crashes after issuing the commit message, and the only participant that received it also crashes; even if a new coordinator is elected in time via an election protocol, the transaction's state is undeterminable — nobody knows whether it was committed.
(2) TCC compensation scheme (usually used where money is involved)
1) Try phase: check each service's resources and lock or reserve them.
2) Confirm phase: perform the actual operation in each service.
3) Cancel phase: if any service's business method fails, compensate — i.e. roll back the business logic that already executed successfully.
(3) Local message table (proposed by eBay; rarely used now due to high-concurrency and scaling problems).
(4) Reliable-message eventual consistency scheme (using RocketMQ; see the sketch after this answer):
1) System A first sends a prepared message to MQ; if that fails, cancel the operation.
2) If it succeeds, execute the local transaction; on success, tell MQ to send the confirm message; on failure, tell MQ to roll back.
3) When the confirm message is sent, system B receives it and executes its own local transaction.
4) MQ automatically polls all prepared messages periodically and calls back your interface, asking: did the local transaction fail — is that why no confirm was sent? Retry or roll back? Generally you check the database here to see whether the earlier local transaction executed, and if it rolled back, roll back here too. This guards against the case where the local transaction succeeded but sending the confirm message failed.
5) What if system B's transaction fails? Retry automatically until it succeeds; if that truly doesn't work, then for important fund-related services roll back — e.g. after system B rolls back locally, try to notify system A to roll back, or send an alarm for manual rollback and compensation.
(5) Best-effort notification scheme (MQ):
1) System A sends a message to MQ after its local transaction executes.
2) A dedicated best-effort notification service consumes MQ and records the message (in a database, or even an in-memory queue), then calls system B.
3) If system B succeeds, fine; if it fails, the notification service retries calling system B periodically, N times, finally giving up if it never succeeds.
(6) Alibaba's open-source framework Seata (http://www.dreamwu.com/post-1741.html)
- Mode comparison:
  - AT mode: automated transactions, simple to use, non-intrusive to business code. Advantages: phase-one and phase-two commit/rollback are generated automatically by the Seata framework; users access distributed transactions simply by writing "business SQL". A non-intrusive distributed transaction solution.
  - TCC mode: more intrusive to business code than AT, but without AT's global row lock, so TCC performance is much higher than AT's.
  - Saga mode: phase one commits the local database transaction with no locks, giving high performance; compensation services are the "reverse" of forward services; participants can execute asynchronously — high throughput.
  - XA mode: non-intrusive; snapshot data and row locks are delegated to the database via XA instructions.
- Core components of AT mode:
  - Transaction Coordinator (TC): maintains the state of global and branch transactions and drives global commit or rollback.
  - Transaction Manager (TM): begins, commits, or rolls back a global transaction.
  - Resource Manager (RM): manages the resources executing branch transactions — registers branch transactions with the TC, reports branch transaction status, and drives branch commit or rollback.
- Workflow:
  1. TM asks the TC to start a new global transaction (identified by an XID).
  2. The XID is propagated to the other microservices along the call chain.
  3. Each RM registers its local transaction with the TC as a branch transaction of that XID.
  4. TM asks the TC to commit or roll back the XID.
  5. The TC directs all branch transactions under the XID to commit or roll back.
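A hedged sketch of the reliable-message scheme using RocketMQ's transactional messages; the group/topic names and local-transaction stubs are illustrative:

```java
import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.client.producer.TransactionMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

public class ReliableMessageDemo {
    public static void main(String[] args) throws Exception {
        TransactionMQProducer producer = new TransactionMQProducer("txn_group");
        producer.setNamesrvAddr("localhost:9876");
        producer.setTransactionListener(new TransactionListener() {
            @Override // step 2: runs after MQ accepts the prepared (half) message
            public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
                boolean ok = doLocalTransaction(); // illustrative stub
                return ok ? LocalTransactionState.COMMIT_MESSAGE
                          : LocalTransactionState.ROLLBACK_MESSAGE;
            }
            @Override // step 4: MQ's periodic check-back when no confirm/rollback arrived
            public LocalTransactionState checkLocalTransaction(MessageExt msg) {
                return localTransactionCommitted(msg) // e.g. look it up in the database
                        ? LocalTransactionState.COMMIT_MESSAGE
                        : LocalTransactionState.ROLLBACK_MESSAGE;
            }
        });
        producer.start();
        // step 1: send the prepared message; system B only sees it after COMMIT_MESSAGE
        producer.sendMessageInTransaction(
                new Message("txn-topic", "order created".getBytes()), null);
    }

    private static boolean doLocalTransaction() { return true; }                    // stub
    private static boolean localTransactionCommitted(MessageExt m) { return true; } // stub
}
```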
4.4.5 Database and table sharding
1. How to migrate to sharded databases and tables without downtime?
With the system online, modify all write paths (insert, update, delete) so that every write hits not only the old database but also the new database — this is called double-writing. When writing, judge by a field like gmt_modified (last modified time): only write to the new database if the data is at least as new as what's there, unless the row doesn't exist in the new database yet. The data may still be inconsistent afterwards, so a program automatically runs rounds of verification, comparing every table between the new and old databases; where there are differences, it reads the data from the old database and writes it again. This loop repeats until the two databases have exactly the same data in every table.
2. How to design a database and table sharding scheme that can be dynamically scaled?
Scheme summary:
1. Decide up front how many database servers, how many databases per server, and how many tables per database; 32 databases * 32 tables is recommended.
2. Routing rules: orderId % 32 = database; (orderId / 32) % 32 = table. (See the sketch after this list.)
3. For capacity expansion, requisition more database servers and install MySQL, expanding by multiples: from 4 servers to 8, then to 16.
4. The DBA migrates the original databases onto the new database servers; there are convenient tools for database migration.
5. Modify the configuration to point the migrated databases at their new servers' IP addresses.
6. Re-publish the system and go online; the original routing rules do not change, and the system continues serving online on n times the database server resources.
Table-splitting strategies: 1. modulo; 2. range — usually by time.
Middleware:
1. MyCat (heavyweight solution)
   - Advantages: 1) transparent to developers; 2) adding/removing nodes does not require restarting the application; 3) cross-language (Java, PHP).
   - Disadvantages: a separate middleware layer that must be deployed and operated.
2. Sharding-JDBC (lightweight solution)
   - Advantages: 1) very good performance; 2) supports cross-database JDBC.
   - Disadvantages: 1) increases development difficulty; 2) Java only — no cross-language support.
3. Comparison: 1) MyCat is third-party middleware, Sharding-JDBC is a JAR package; 2) MyCat requires no code changes, Sharding-JDBC does.
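A tiny sketch of the routing rule in step 2 (class and naming are illustrative):

```java
public class ShardRouter {
    private static final int DB_COUNT = 32;
    private static final int TABLE_COUNT = 32;

    /** Returns {databaseIndex, tableIndex} for an order id under the 32*32 scheme. */
    public static int[] route(long orderId) {
        int db = (int) (orderId % DB_COUNT);                    // which database
        int table = (int) ((orderId / DB_COUNT) % TABLE_COUNT); // which table within it
        return new int[]{db, table};
    }

    public static void main(String[] args) {
        int[] loc = route(123456789L);
        System.out.printf("db_%02d.order_%02d%n", loc[0], loc[1]);
    }
}
```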
3. How to generate a global ID?
Snowflake algorithm (64 bits; a sketch follows):
- 1 bit: always 0, because all ids are positive
- 41 bits: timestamp
- 10 bits: worker machine id — the high 5 bits are the machine-room (datacenter) id, the low 5 bits the machine id
- 12 bits: sequence number, distinguishing ids generated within the same millisecond
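A minimal single-node sketch of the algorithm with the bit layout above (the epoch is illustrative; clock-rollback handling is omitted for brevity):

```java
public class Snowflake {
    private static final long EPOCH = 1288834974657L; // illustrative custom epoch
    private final long datacenterId; // 5 bits, 0-31
    private final long workerId;     // 5 bits, 0-31
    private long lastTimestamp = -1L;
    private long sequence = 0L;      // 12 bits, 0-4095

    public Snowflake(long datacenterId, long workerId) {
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;   // 12-bit sequence within one millisecond
            if (sequence == 0) {                 // sequence exhausted: spin until the next ms
                while ((ts = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22)              // 41-bit timestamp
                | (datacenterId << 17)           // 5-bit datacenter id
                | (workerId << 12)               // 5-bit worker id
                | sequence;                      // 12-bit sequence
    }
}
```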
4. How to solve paging queries over horizontally split tables?
(1) Global view in the service layer. Advantages: by modifying the service layer to expand the amount of data queried, you get a global view; the business is lossless and results are accurate. Disadvantages (obvious): each shard must return more data, increasing network transfer; besides the database sorting by time, the service layer must sort a second time, costing performance; as the page number grows, performance degrades drastically — data volume and sorting work grow quadratically.
(2) Secondary query. Advantages: accurate business data, very small data returned each time, and the volume does not grow with the page number. Disadvantages: requires two round trips. Example with 4 tables, pageSize=10, page 1001, so the global offset = (1001-1)*10 = 10000 and the per-table offset is 10000/4 = 2500:
1. Run on each of the 4 tables: select * from table order by id limit 2500,10.
2. Across the 4 returned record sets, match by id (if id is not an integer, match by hashCode): compare the first record's id of each table to find the global minimum minId, and take the largest id among the 4 tables as maxId; then run on each table: select * from db_x where id between minId and maxId.
3. Work out minId's true global position from the between results: e.g. it sits at offset 2500 in T1, 2500-2=2498 in T2, 2500-3=2497 in T3, and 2500-3=2497 in T4. The accumulated position is 2500+2498+2497+2497 = 10000-2-3-3 = 9992, so starting from position 9992, skip 8 records and the next 10 are the wanted page.
(3) Merge the result sets of the tables with UNION ALL, then run the paging query over the merged set.
(4) Fixed page mapping: suppose 3 sub-tables with 90, 120, and 80 records respectively (290 total) and 40 records per page — page 1: table 1 rows 1-40; page 2: table 1 rows 41-80; page 3: table 1 rows 81-90 + table 2 rows 1-30; page 4: table 2 rows 31-70; ...
(5) Use Sphinx to build an index first, then run paged queries.
5. How to solve the cross-database join problem? And count, order by, group by, and aggregate functions across nodes?
Cross-database join: usually split into two queries — first query the primary table's data, then run a second query based on it.
Cross-node function problem: compute the results separately on each node and merge them on the application side. Unlike a join, the per-node queries can execute in parallel, so this is often much faster than a single large table; but if the result set is large, application memory consumption becomes an issue.
4.5 Microservices Architecture (SpringCloud)
4.5.1 Registry
1. Differences between Eureka, Consul and NACOS
Eureka
- In-app/out-of-app: integrated directly into the application, relying on the application itself for service registration and discovery.
- CAP: follows AP (availability + partition tolerance); strong availability and fast service registration, at the cost of some consistency.
- Version iteration: no longer actively developed.
- Integration support: SpringCloud integration only.
- Access protocol: HTTP.
- Avalanche protection: supported.
Consul
- In-app/out-of-app: an external application, minimally intrusive.
- CAP: follows CP (consistency + partition tolerance); service registration is slow, and because of its consistency requirement, Consul is unavailable during re-election if the leader fails.
- Version iteration: still actively iterated.
- Integration support: SpringCloud and K8s.
- Access protocols: HTTP/DNS.
- Avalanche protection: not supported.
Nacos
- In-app/out-of-app: an external application, minimally intrusive.
- CAP: supports both CP (consistency + partition tolerance) and AP (availability + partition tolerance), switchable.
- Version iteration: still actively iterated.
- Integration support: Dubbo, SpringCloud, and K8s.
- Access protocols: HTTP, dynamic DNS, and UDP.
- Avalanche protection: supported.
4.5.2 Circuit Breaking, Rate Limiting, and Degradation (Hystrix)
1. Hystrix Isolation policy
- Thread-pool isolation (default): each command runs in a thread from a pool, and rate limiting is controlled by the thread pool size.
- HystrixCommand.run() resource isolation strategies (a configuration sketch follows this list):
  - THREAD, based on a thread pool: HystrixCommandProperties.Setter().withExecutionIsolationStrategy(ExecutionIsolationStrategy.THREAD) — good for network requests: on a timeout, it prevents the calling thread from blocking.
  - SEMAPHORE, based on semaphores: HystrixCommandProperties.Setter().withExecutionIsolationStrategy(ExecutionIsolationStrategy.SEMAPHORE) — for very high-concurrency scenarios where each service instance handles hundreds of QPS: a normally sized thread pool may not support such concurrency, and sizing it up consumes many thread resources, so use semaphores for limiting protection instead; being thread-free, performance is much better.
    - Maximum allowed concurrency (default 10 — kept small deliberately, because semaphore-isolated commands execute on the calling thread with no timeout protection; set it too large and a latency spike can instantly fill Tomcat's own thread resources); beyond the maximum concurrency, requests are rejected.
- command key / command group: each command can set its own name (command key) and its own group (command group). By default it is the command group that defines a thread pool: the threadpool key identifies a HystrixThreadPool, requests from the same command group go to the same thread pool, and the default threadpool key is the command group name. coreSize sets the thread pool size; the default is 10.
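A minimal HystrixCommand sketch wiring up these isolation settings (the group name, pool size, and wrapped call are illustrative):

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolProperties;

public class GetUserCommand extends HystrixCommand<String> {
    private final long userId;

    public GetUserCommand(long userId) {
        super(Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("UserServiceGroup"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionIsolationStrategy(
                                HystrixCommandProperties.ExecutionIsolationStrategy.THREAD))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
                        .withCoreSize(10)));      // thread-pool size acts as the concurrency limit
        this.userId = userId;
    }

    @Override
    protected String run() {
        // the protected remote call would go here
        return "user-" + userId;
    }

    @Override
    protected String getFallback() {
        return "default-user";                    // degradation on failure/timeout/rejection
    }
}
```

Calling new GetUserCommand(1L).execute() runs the call synchronously through the isolation layer.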
2. Hystrix execution process and working principle (watch video)
- Execution process: 1. construct a HystrixCommand or HystrixObservableCommand; 2. execute the command; 3. check the request cache; 4. check whether the circuit breaker is open; 5. check whether the thread pool/queue/semaphore is full; 6. execute run()/construct(); 7. report metrics for the circuit-breaker health check; 8. fallback degradation mechanism; 9. return the successful response. Also: configure alarms to send emails or push to an enterprise WeChat robot.
Hystrix source code
https://blog.csdn.net/loushuiyifan/article/details/82702522
1. Hystrix's overall workflow is as follows:
1. Construct a HystrixCommand or HystrixObservableCommand object, encapsulating the request and configuring the parameters needed to execute it in the constructor;
2. Execute the command — Hystrix provides four ways to execute commands, described in detail later;
3. Determine whether to answer the request from the cache: if caching is enabled and available, respond directly from the cache (Hystrix supports request caching, but it must be enabled explicitly);
4. Check whether the circuit breaker is open; if so, go to step 8;
5. Check whether the thread pool/queue/semaphore is full; if so, go to step 8;
6. Execute HystrixObservableCommand.construct() or HystrixCommand.run(); on failure or timeout, go to step 8, otherwise go to step 9;
7. Report metrics for circuit-breaker monitoring;
8. Run the circuit breaker / fallback degradation logic;
9. Return the successful response.
2. Circuit Breaker: flow process
Relevant configuration:
1. circuitBreaker.enabled — whether the circuit breaker is enabled; default true.
2. circuitBreaker.forceOpen — force the breaker open, keeping it open regardless of its actual state; default false.
3. circuitBreaker.forceClosed — force the breaker closed, keeping it closed regardless of its actual state; default false.
4. circuitBreaker.errorThresholdPercentage — error rate, default 50%. For example, if within a period (10s) there are 100 requests and 54 time out or throw, the 54% error rate exceeds the default 50% and trips the breaker open.
5. circuitBreaker.requestVolumeThreshold — default 20. errorThresholdPercentage is only evaluated once there are at least 20 requests in the period. For example, with 19 requests in a period, all failing — a 100% error rate — the breaker still does not open, because the total does not reach 20.
6. circuitBreaker.sleepWindowInMilliseconds — the half-open probe sleep time, default 5000 ms. For example, 5000 ms after the breaker opens, it lets part of the traffic through to probe whether the dependent service has recovered.
Working process:
1. Call allowRequest() to decide whether the request may be submitted to the thread pool:
- if the breaker is forced open (circuitBreaker.forceOpen is true), reject and return;
- if the breaker is forced closed (circuitBreaker.forceClosed is true), allow it through. Note that the breaker still maintains its statistics and switch state — they just don't take effect.
2. Call isOpen() to check whether the breaker switch is open:
- if the breaker switch is open, go to step 3; otherwise continue;
- if the total number of requests in one window is below circuitBreaker.requestVolumeThreshold, allow the request; otherwise continue;
- if the error rate in one window is below circuitBreaker.errorThresholdPercentage, allow the request; otherwise open the breaker switch and go to step 3.
3. Call allowSingleTest() to decide whether to let a single probe request through to check whether the dependent service has recovered: if the breaker is open, and the time since the breaker opened or since the last probe request exceeds circuitBreaker.sleepWindowInMilliseconds, the breaker enters the half-open state and releases one probe request; otherwise nothing is released.
In addition, to provide a basis for its decisions, each circuit breaker maintains 10 buckets by default, one per second; when a new bucket is created, the oldest is discarded. Each bucket maintains counters for request success, failure, timeout, and rejection, which Hystrix collects and aggregates.
4.5.3 Link Tracing
To be added
4.5.4 Cloud Container Docker/K8s
1. Common components of K8S
1. Control plane components
- kube-apiserver (api): exposes the K8s API externally; the front end of the Kubernetes control plane.
- etcd: a consistent and highly available key-value database that serves as the backing store for all Kubernetes cluster data. A Kubernetes cluster's etcd database usually needs a backup plan.
- kube-scheduler (sched): the control plane component that watches for newly created Pods with no assigned node and selects a node for them to run on.
- kube-controller-manager (c-m): runs the controller components on the master node. These controllers include:
  - Node Controller: responsible for noticing and responding when a node fails.
  - Replication Controller: responsible for maintaining the correct number of Pods for every replication controller object in the system.
  - Endpoints Controller: populates Endpoints objects (joining Services and Pods).
  - Service Account & Token Controllers: create default accounts and API access tokens for new namespaces.
- cloud-controller-manager (c-c-m): a control plane component that embeds control logic specific to a particular cloud. It lets you link your cluster to a cloud provider's API while isolating the components that interact with that cloud platform from those that only interact with your cluster.
  - The following controllers all depend on cloud platform drivers:
  - Node Controller: checks with the cloud provider after a node stops responding, to determine whether the node has been deleted.
  - Route Controller: sets up routes in the underlying cloud infrastructure.
  - Service Controller: creates, updates, and deletes cloud provider load balancers.
2. Node components
- kubelet: an agent that runs on every node in the cluster; it ensures that containers are running in a Pod.
- kube-proxy: a network proxy running on each node in the cluster, part of the Kubernetes Service concept. kube-proxy maintains network rules on the node; these rules allow network communication with Pods from network sessions inside or outside the cluster.
- Container runtime: the software that runs containers. Kubernetes supports several container runtimes: Docker, containerd, CRI-O, and any implementation of the Kubernetes CRI.