1. Community introduction
The Niangao Mom community is the platform that brings Niangao Mom's WeChat users into the app, and it provides science-based parenting services to millions of users. The community consists of the Following, Recommended, Discover, Search, and Parenting Knowledge sections. Users can publish posts, join topics, check in, comment, like, add favorites, and follow other users.
2. Difficulties in community implementation
- Feed implementation: how to guarantee high concurrency and low latency
- Many kinds of statistics under high concurrency. A single page shows a post's read count, comment count, and like count, as well as a user's follower count, following count, and so on
- Paginated queries for the article list under each tab, with various sort orders
3. Community implementation plan
The overall architecture diagram is as follows:
3.1 Introduction to feeds
- Feed: each message in a feed stream is a feed. A post in WeChat Moments is a feed, and a post on Weibo is a feed.
- Feed stream: a continuously updated stream of information presented to the user. Each person's Moments page, Weibo home page, and so on is a feed stream.
- Timeline: a timeline is one type of feed stream. Weibo and Moments are both timeline feed streams.
- The Following feed of the Niangao Mom community is made up of posts published by the users that a person follows.
3.2 Feed stream implementation modes
3.2.1 Pull mode
A. Publishing a post is simple: when user A publishes a post, it only needs to be stored in the post table.
B. Following and unfollowing are simple. When user A unfollows user B, you only need to delete B from A's following list and A from B's follower list.
C. Fetching user A's feed stream is more complicated. First fetch all the users that A follows, then fetch and sort the posts published by those users, and finally take the requested page of posts.
- Advantages:
1. The storage structure is simple and the storage footprint is small, because each feed only needs to be stored once.
2. The follow and publish flows are simple and easy to understand, which suits rapid development.
- Disadvantages:
1. Fetching a user's feed stream is complex and requires multiple queries.
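The pull flow described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the community's actual code: the names `posts`, `following`, `followers`, `publish_post`, and `pull_feed` are assumptions, and plain Python containers stand in for the post table and the follow tables.

```python
from collections import defaultdict

# In-memory stand-ins (assumed names) for the post table and follow tables.
posts = []                     # rows of (post_id, author_id, created_at)
following = defaultdict(set)   # user_id -> ids of users they follow
followers = defaultdict(set)   # user_id -> ids of users following them


def publish_post(post_id, author_id, created_at):
    """Publishing only stores one row; nothing is fanned out."""
    posts.append((post_id, author_id, created_at))


def follow(a, b):
    """User a follows user b: update both relation lists."""
    following[a].add(b)
    followers[b].add(a)


def unfollow(a, b):
    """User a unfollows user b: remove b from a's following list and a from b's follower list."""
    following[a].discard(b)
    followers[b].discard(a)


def pull_feed(user_id, page, page_size):
    """Build the feed at read time: collect followed users' posts, sort, paginate."""
    followed = following[user_id]
    candidates = [p for p in posts if p[1] in followed]
    candidates.sort(key=lambda p: p[2], reverse=True)  # newest first
    start = (page - 1) * page_size
    return [p[0] for p in candidates[start:start + page_size]]
```

All of the work happens in `pull_feed`, which is why reads become the bottleneck in this mode: every request scans and sorts the posts of every followed user.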
3.2.2 Push mode
When a user triggers an action (such as publishing a post), the action is recorded in the behavior table. The user's follower table is then looked up, and a feed entry is inserted for each follower. To read the feed stream, a follower only needs to read their own feed entries and sort them.
- Advantages:
Reading the feed stream is a simple flow, and query performance is high.
- Disadvantages:
1. Multiple copies of the feed data are stored, which consumes a lot of storage, especially for big V users with many followers.
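For comparison with pull mode, here is a minimal sketch of the push (fan-out on write) flow. It is an in-memory illustration under assumed names (`followers`, `inbox`, `push_publish`, `read_feed`); a real system would write the fan-out rows to a database or cache.

```python
from collections import defaultdict

followers = defaultdict(set)   # author_id -> ids of users following the author
inbox = defaultdict(list)      # user_id -> list of (created_at, post_id) feed entries


def follow(a, b):
    """User a follows user b."""
    followers[b].add(a)


def push_publish(post_id, author_id, created_at):
    """Fan out on write: insert one feed entry per follower."""
    for follower_id in followers[author_id]:
        inbox[follower_id].append((created_at, post_id))
    return len(followers[author_id])  # number of copies written


def read_feed(user_id, page, page_size):
    """Reading touches only the user's own inbox, then sorts and paginates."""
    entries = sorted(inbox[user_id], reverse=True)  # newest first
    start = (page - 1) * page_size
    return [post_id for _, post_id in entries[start:start + page_size]]
```

The return value of `push_publish` makes the storage cost visible: one post by an author with N followers produces N feed entries.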
3.2.3 Combination of push and pull modes
Users with a large number of followers are called big Vs. Feeds published by ordinary users are delivered in push mode. Big V feeds are not fanned out at publish time; instead, once an offline user comes online, the big V feeds are pulled periodically and a background job synchronizes them into that user's feed stream, so pull and push are combined dynamically. This mode avoids the rapid growth of feed stream storage when big V users publish feeds, which would otherwise slow down users' feed queries.
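A rough sketch of the combined mode follows. The threshold `BIG_V_THRESHOLD`, the `outbox`/`inbox` structures, and the function names are all assumptions made for illustration; the article does not say how a big V is defined or how the background synchronization is scheduled.

```python
from collections import defaultdict

BIG_V_THRESHOLD = 3            # hypothetical follower count that makes a user a "big V"

followers = defaultdict(set)   # author_id -> follower ids
following = defaultdict(set)   # user_id -> followed ids
outbox = defaultdict(list)     # author_id -> [(created_at, post_id)], one copy per post
inbox = defaultdict(set)       # user_id -> {(created_at, post_id)}, the user's feed stream


def follow(a, b):
    following[a].add(b)
    followers[b].add(a)


def is_big_v(user_id):
    return len(followers[user_id]) >= BIG_V_THRESHOLD


def publish(post_id, author_id, created_at):
    """Ordinary users push to every follower; big Vs only write their own outbox."""
    entry = (created_at, post_id)
    outbox[author_id].append(entry)
    if not is_big_v(author_id):
        for follower_id in followers[author_id]:
            inbox[follower_id].add(entry)


def sync_big_v_feeds(user_id):
    """Background step for an online user: pull followed big Vs' posts into the inbox."""
    for author_id in following[user_id]:
        if is_big_v(author_id):
            inbox[user_id].update(outbox[author_id])  # set update keeps this idempotent


def read_feed(user_id, limit):
    return [post_id for _, post_id in sorted(inbox[user_id], reverse=True)[:limit]]
```

Because the inbox is a set, running `sync_big_v_feeds` repeatedly for the same user does not duplicate entries, which is convenient when the pull runs on a timer.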
3.3 Counting center implementation
The community displays many kinds of counts, such as the number of posts, reads, comments, and likes shown on the details page, the number of followers, the number of followed users, the number of messages, and so on. With so many count types and high concurrency, how should counting be implemented? The following schemes are common.
3.3.1 Traditional count(*) method
select count(*) from post where user_id = XXX
- Advantages:
The count(*) method is simple to implement and gives exact results. It suits services with small data volumes and low concurrency.
- Disadvantages:
Each count is computed by its own query, so a single business operation often requires multiple queries.
3.3.2 Redundant (externalized) count method
Analyzing the community's counting business yields two dimensions of counts:
- User dimension: number of followed users, number of followers, number of posts, number of likes and favorites
- Post dimension: number of views, likes, comments, favorites
The counts for these two dimensions can be stored by adding columns to the user table and the post table, or by creating two new tables: a user count table and a post count table.
- To update the post count: update user set post_num = post_num + 1 where user_id = XXX
- To query the post counts: select pv, like_num, comment_num from post where post_id = XXX
This method externalizes the counts, and it is also a form of data redundancy.
- Advantages:
1. Multiple counts can be fetched in one query; the counts of one dimension do not need multiple queries.
2. Queries go through the primary key index, so they are efficient.
- Disadvantages:
1. The data is redundant, so inconsistencies may occur.
2. Under high concurrency, pressure on the DB increases.
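The redundant count method can be tried out with Python's built-in `sqlite3` module. The sketch below uses the separate count tables variant; the table names `user_count` and `post_count`, the helper names, and the use of SQLite instead of the production database are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_count (
        user_id  INTEGER PRIMARY KEY,
        post_num INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE post_count (
        post_id     INTEGER PRIMARY KEY,
        pv          INTEGER NOT NULL DEFAULT 0,
        like_num    INTEGER NOT NULL DEFAULT 0,
        comment_num INTEGER NOT NULL DEFAULT 0
    );
""")


def record_new_post(post_id, user_id):
    """Create the count rows and bump the author's post_num in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO user_count (user_id) VALUES (?)", (user_id,))
        conn.execute("INSERT INTO post_count (post_id) VALUES (?)", (post_id,))
        conn.execute(
            "UPDATE user_count SET post_num = post_num + 1 WHERE user_id = ?",
            (user_id,),
        )


def like_post(post_id):
    with conn:
        conn.execute(
            "UPDATE post_count SET like_num = like_num + 1 WHERE post_id = ?",
            (post_id,),
        )


def post_counts(post_id):
    """One primary-key lookup returns every count of the post dimension."""
    return conn.execute(
        "SELECT pv, like_num, comment_num FROM post_count WHERE post_id = ?",
        (post_id,),
    ).fetchone()


def user_post_num(user_id):
    row = conn.execute(
        "SELECT post_num FROM user_count WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0
```

Every like or view becomes an `UPDATE` on the database, which is the source of the DB pressure listed under the disadvantages.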
3.3.3 Improved externalized count scheme
For the Niangao Mom community business, we abstracted a counting service center (the bubble service), used a Redis cache for real-time counting, and periodically synchronized the Redis count data to the DB. This keeps high-concurrency pressure off the DB and improves counting and read throughput.
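The idea of counting in a cache and flushing to the DB periodically can be sketched as follows. This is not the bubble service itself: `InMemoryCounterStore` is a stand-in that only mimics the semantics of the Redis `HINCRBY` and `HGETALL` commands so the example runs without a server, and `CountingService`, its key layout, and the dict used as the DB are all hypothetical.

```python
from collections import defaultdict


class InMemoryCounterStore:
    """Stand-in for Redis hashes; mimics HINCRBY / HGETALL semantics only."""

    def __init__(self):
        self._hashes = defaultdict(lambda: defaultdict(int))

    def hincrby(self, name, key, amount=1):
        self._hashes[name][key] += amount
        return self._hashes[name][key]

    def hgetall(self, name):
        return dict(self._hashes[name])


class CountingService:
    """Counts in the cache in real time and flushes changed keys to the DB in batches."""

    def __init__(self, cache, db):
        self.cache = cache
        self.db = db        # dict standing in for the count tables (assumption)
        self.dirty = set()  # keys changed since the last flush

    def incr(self, dimension, entity_id, field, amount=1):
        name = f"{dimension}:{entity_id}"  # e.g. "post:42" (assumed key layout)
        self.dirty.add(name)
        return self.cache.hincrby(name, field, amount)

    def get(self, dimension, entity_id):
        """Reads are served from the cache, never from the DB."""
        return self.cache.hgetall(f"{dimension}:{entity_id}")

    def flush_to_db(self):
        """Meant to be called by a periodic job; writes each dirty key once."""
        names, self.dirty = self.dirty, set()
        for name in names:
            self.db[name] = self.cache.hgetall(name)
        return len(names)
```

However many increments arrive between two flushes, each changed key costs a single DB write, which is where the relief for the database comes from. The trade-off is that counts in the DB lag behind the cache by up to one flush interval.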
3.4 Community Search
Business background: community search lets users enter keywords and select tags to query posts, parenting knowledge, and so on.
3.4.1 Search engine selection
InnoDB in MySQL 5.6.4 and above also supports full-text search, which can be queried directly with match. The scheme is simple, but it relies on the built-in word segmentation, so the search quality may not meet requirements. Elasticsearch and OpenSearch are more specialized search engines.
3.4.2 Index synchronization implementation
- Full synchronization
Historical data synchronization: scan the entire table and synchronize the data to the index.
- Incremental synchronization
Synchronize newly added or modified data.
Solution 1: synchronize data to the index in the service code.
Advantages: simple to implement; synchronization is distributed across the services, and each service maintains its own.
Disadvantages: high coupling, cluttered service code, and no reuse.
Solution 2: synchronize data to the index by listening to the database binlog.
Advantages: decoupled and reusable.
Disadvantages: an additional framework is introduced, which increases system complexity.
In the end we adopted Solution 2: we introduced Canal to subscribe to the binlog and synchronize the data.
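The two synchronization paths can be illustrated with a small sketch. The event dictionaries below are a made-up shape for row changes, not Canal's actual message format, and a plain dict stands in for the search index; `full_sync` and `apply_change` are hypothetical names.

```python
def full_sync(index, rows):
    """Full synchronization: scan every row of the table and write it to the index."""
    for row in rows:
        index[row["post_id"]] = dict(row)
    return len(index)


def apply_change(index, event):
    """Incremental synchronization: apply one row-change event to the index.

    `event` is an assumed shape: {"table": ..., "type": "INSERT" | "UPDATE" | "DELETE", "row": {...}}.
    Returns True when the event changed the index.
    """
    if event["table"] != "post":
        return False  # this consumer only indexes the post table
    post_id = event["row"]["post_id"]
    if event["type"] in ("INSERT", "UPDATE"):
        index[post_id] = dict(event["row"])  # upsert the latest row image
        return True
    if event["type"] == "DELETE":
        return index.pop(post_id, None) is not None
    return False
```

The service code that writes to the database never calls these functions; a separate consumer feeds them from the change stream, which is the decoupling that Solution 2 is chosen for.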
3.4.3 Index query
- Use the search engine API
There is a learning cost: every developer has to learn an additional API. Is there a tool that people can use without extra learning?
- Self-developed esqlParse
Combined with MyBatis, it translates SQL statements into search engine API calls to perform index queries.
Advantages: it lowers the learning cost for developers, who only need to write a SQL mapper to run index queries.
Disadvantages: the functionality is incomplete, and some complex queries are not supported.
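To give a feel for what translating SQL into a search request involves, here is a deliberately tiny sketch. It is not esqlParse and does not reflect its design: the function `sql_to_search_request` is hypothetical, it only understands `SELECT ... FROM ... [WHERE a = x AND b = y] [LIMIT n]`, and it emits a request body in the style of the Elasticsearch query DSL (`bool` / `filter` / `term`).

```python
import re

# Very small SQL subset: SELECT fields FROM index [WHERE conds] [LIMIT n]
_SQL = re.compile(
    r"^\s*select\s+(?P<fields>.+?)\s+from\s+(?P<index>\w+)"
    r"(?:\s+where\s+(?P<where>.+?))?"
    r"(?:\s+limit\s+(?P<limit>\d+))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
# One equality condition: field = 'text' or field = 123
_COND = re.compile(r"^\s*(\w+)\s*=\s*(?:'([^']*)'|(-?\d+))\s*$")


def sql_to_search_request(sql):
    """Translate a simple SELECT into (index_name, request_body)."""
    m = _SQL.match(sql)
    if not m:
        raise ValueError(f"unsupported statement: {sql!r}")

    filters = []
    if m.group("where"):
        # Only AND-joined equality conditions are handled in this sketch.
        for part in re.split(r"\s+and\s+", m.group("where"), flags=re.IGNORECASE):
            cond = _COND.match(part)
            if not cond:
                raise ValueError(f"unsupported condition: {part!r}")
            field, text, number = cond.groups()
            value = text if text is not None else int(number)
            filters.append({"term": {field: value}})

    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    body = {"query": query}

    fields = [f.strip() for f in m.group("fields").split(",")]
    if fields != ["*"]:
        body["_source"] = fields  # return only the selected columns
    if m.group("limit"):
        body["size"] = int(m.group("limit"))
    return m.group("index"), body
```

Anything outside that subset (range operators, `OR`, sorting, string values containing the word "and") raises `ValueError`, which mirrors the disadvantage noted above: a translator like this only covers the queries it was written for.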
3.5 Reflections
Architecture is built on top of the business, so it is hard to get it right in one step. This article still has many imperfections, and there are other problems it does not cover. Feedback and discussion are welcome!
Nicomama: the wind smile