We all know MySQL is called a relational database, while many other storage engines are called non-relational databases. The graph database (GDB) discussed here is one of them. Ironically, although MySQL is known as a relational database, it is not that friendly toward relationships. A careless JOIN easily turns into a slow query, and DBAs often stare at our JOIN statements and advise us to avoid them where possible. A graph database, whose structure itself represents associations, handles these problems well.

In community business, relationships are especially important: relationships between users, and relationships between users and content, such as likes. This information reflects users' preferences, and we can use it to connect users with like-minded people so that they see more of what they like. In this article, we discuss GDB through several questions drawn from hands-on experience with a relationship chain at the hundred-million scale in our community.

What is GDB

A graph database is a non-relational database that uses graph structures for semantic queries. It uses nodes, directed edges, and properties to represent and store data. A picture gives an intuitive feel for this. In the figure below, each dot is a User, and each line between dots represents an “Attention” (follow) relationship:
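To make the three building blocks concrete, here is a minimal in-memory sketch of a property graph. It is not how any particular graph database stores data; the class names (`Node`, `Edge`, `PropertyGraph`) and the sample users are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str                       # e.g. "User"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int                      # id of the node the edge leaves
    target: int                      # id of the node the edge points to
    rel_type: str                    # e.g. "Attention"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}              # node_id -> Node
        self.out_edges = {}          # node_id -> list of outgoing Edge objects

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, source, target, rel_type, **properties):
        self.out_edges[source].append(Edge(source, target, rel_type, properties))

    def neighbors(self, node_id, rel_type):
        """Ids of nodes reachable over one outgoing edge of the given type."""
        return [e.target for e in self.out_edges[node_id] if e.rel_type == rel_type]


# Hypothetical sample data: three users and who follows whom.
graph = PropertyGraph()
graph.add_node(1, "User", name="alice")
graph.add_node(2, "User", name="bob")
graph.add_node(3, "User", name="carol")
graph.add_edge(1, 2, "Attention", createTime=1700000000)
graph.add_edge(1, 3, "Attention", createTime=1700000100)
graph.add_edge(2, 1, "Attention", createTime=1700000200)
```

Because edges are directed, user 1 following user 3 does not imply the reverse: `graph.neighbors(3, "Attention")` is empty here.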

Comparing GDB with MySQL helps clarify some of the terms used for graph data:

| Database | Entity name | Object name | Object data | Query syntax |
| --- | --- | --- | --- | --- |
| GDB | Label | Node, Edge | Properties | Cypher |
| MySQL | Table | Record | Field | SQL |

The table above contains a relatively unfamiliar word, Cypher. It began as the query language of Neo4j, a relatively mature graph database on the market. It has since been standardized and is supported by graph databases from various vendors. Again, let's compare the syntax to get a feel for it:

| Business semantics | SQL | Cypher |
| --- | --- | --- |
| Query 10 users | select * from users limit 10; | match (a:users) return a limit 10; |
| Query users tagged “star” | select t.* from users as t inner join user_tag_relation as utr on utr.userId = t.userId inner join user_tag as u on u.tagId = utr.tagId where u.tag_name = 'star' | match (t:users)-[utr:user_tag_relation]->(u:user_tag) where u.tag_name = 'star' return t |
| Legend | ER diagram: | Graph structure: |

The syntax (t:users)-[utr:user_tag_relation]->(u:user_tag) closely mirrors the graph structure. If parentheses () are read as nodes and square brackets [] as edges, then ()-[]->() expresses the storage structure.
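The ()-[]->() reading can be sketched in a few lines of Python: walk every edge and keep those whose source label, relationship type, and target label match the pattern. This is only an illustration of the idea, not how a graph database evaluates Cypher; the function `match_pattern` and the sample nodes are hypothetical.

```python
# Nodes: id -> (label, properties). Edges: (source id, relationship type, target id).
# All sample data below is made up for illustration.
nodes = {
    "u1": ("users", {"userId": 1}),
    "u2": ("users", {"userId": 2}),
    "t1": ("user_tag", {"tag_name": "star"}),
    "t2": ("user_tag", {"tag_name": "athlete"}),
}
edges = [
    ("u1", "user_tag_relation", "t1"),
    ("u2", "user_tag_relation", "t2"),
]


def match_pattern(src_label, rel_type, dst_label, where):
    """Yield (source, target) property dicts for every (src)-[rel]->(dst) match."""
    for source, rel, target in edges:
        s_label, s_props = nodes[source]
        d_label, d_props = nodes[target]
        if (s_label, rel, d_label) == (src_label, rel_type, dst_label) and where(d_props):
            yield s_props, d_props


# Same intent as: match (t:users)-[:user_tag_relation]->(u:user_tag)
#                 where u.tag_name = 'star' return t
star_users = [
    t["userId"]
    for t, _ in match_pattern(
        "users", "user_tag_relation", "user_tag",
        where=lambda u: u["tag_name"] == "star",
    )
]
```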

Why GDB

As mentioned in the introduction, some of the community's business scenarios are well suited to GDB. Two common ones are listed below:

  1. Whether the people I follow have liked a given piece of content.

  2. Whom the people I follow have themselves followed within a given period of time.

We handle each scenario with a MySQL scheme and a GDB scheme and then compare their advantages and disadvantages, which answers the question of this section. The sample data is depicted as a graph below.

Whether the people I follow have liked the content

MySQL plan:

  • Query the users I follow: select followUserId from user_follow where userId = {$userId}
  • Query whether they liked the content: select * from content_light where userId in {followUserId} and contentId in {contentId}
    • If an IN list requires too many index lookups, the query may be no faster than simply scanning the table. This is also why our DBA suggested that an IN query should not exceed 200 values.
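The two steps, plus the 200-value limit on IN lists, can be sketched as follows. This is a rough illustration under stated assumptions: SQLite from the Python standard library stands in for MySQL, the column name `followUserId` is inferred from the placeholder in the second query, and the function names are hypothetical.

```python
import sqlite3

MAX_IN_SIZE = 200  # the DBA's suggested upper bound for an IN list


def chunks(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def liked_by_followed_sql(conn, user_id, content_ids):
    # Step 1: fetch the ids of the users I follow.
    rows = conn.execute(
        "SELECT followUserId FROM user_follow WHERE userId = ?", (user_id,)
    ).fetchall()
    follow_ids = [row[0] for row in rows]

    # Step 2: query content_light in batches so no user IN list exceeds the bound.
    content_marks = ",".join("?" * len(content_ids))
    result = []
    for batch in chunks(follow_ids, MAX_IN_SIZE):
        user_marks = ",".join("?" * len(batch))
        sql = (
            "SELECT userId, contentId FROM content_light "
            f"WHERE userId IN ({user_marks}) AND contentId IN ({content_marks})"
        )
        result.extend(conn.execute(sql, (*batch, *content_ids)).fetchall())
    return sorted(result)


# Made-up sample data: user 1 follows users 2..501, so step 2 needs three batches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_follow (userId INTEGER, followUserId INTEGER)")
conn.execute("CREATE TABLE content_light (userId INTEGER, contentId INTEGER)")
conn.executemany("INSERT INTO user_follow VALUES (1, ?)", [(i,) for i in range(2, 502)])
conn.executemany(
    "INSERT INTO content_light VALUES (?, ?)",
    [(2, 10), (300, 10), (501, 11), (999, 10)],  # user 999 is not followed
)
```

Even in this small sketch, one business question has become one query plus one query per batch, and the application has to merge the results itself.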

GDB plan:

  • Write the pattern of nodes and edges following the legend: match (u1:User)-[:Attention]->(u2:User)-[:Light]->(c:Content) where u1.userId = {userId} and c.contentId in {contentIds} return u2.userId, c.contentId;
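What the pattern asks for is a two-hop walk: from me over Attention edges to the users I follow, then over Light edges to the content they liked. A minimal sketch of that walk over in-memory adjacency sets is shown below; the storage layout, function names, and sample users are assumptions for illustration, not the database's actual implementation.

```python
from collections import defaultdict

# Adjacency sets keyed by (relationship type, source id).
out_edges = defaultdict(set)


def add_edge(source, rel_type, target):
    out_edges[(rel_type, source)].add(target)


def liked_by_followed_graph(user_id, content_ids):
    """Walk Attention edges, then Light edges, keeping only the wanted content."""
    wanted = set(content_ids)
    pairs = []
    for u2 in out_edges.get(("Attention", user_id), set()):
        for content_id in out_edges.get(("Light", u2), set()) & wanted:
            pairs.append((u2, content_id))
    return sorted(pairs)


# Made-up sample data.
add_edge("me", "Attention", "bob")
add_edge("me", "Attention", "carol")
add_edge("bob", "Light", "c1")
add_edge("carol", "Light", "c2")
add_edge("dave", "Light", "c1")   # dave is not followed by "me"
```

The work is proportional to the edges actually touched around the starting user, with no IN list to assemble in between.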

Scheme comparison:

In the MySQL scheme, there is a risk of slow queries when a user follows too many people (in our community, many users follow more than 1,000 people). In the GDB scheme, the query statement is concise and clear, and query efficiency is also high. Seeing is believing: the following figure shows the production data (without Redis cache):

Whom the people I follow have followed within 24 hours

MySQL plan:

  • Query the users I follow: select followUserId from user_follow where userId = {$userId}
  • Query each user_follow_XX shard table for the users whom those followed users have themselves followed
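To show why sharding complicates the second step, here is a sketch of how the followed user IDs would have to be grouped by shard before querying. The sharding rule (user ID modulo a fixed table count), the table count, and the function names are assumptions; the article does not say how the user_follow_XX tables are actually partitioned.

```python
from collections import defaultdict

SHARD_COUNT = 16  # hypothetical number of user_follow_XX tables


def shard_table(user_id):
    """Name of the shard table assumed to hold this user's follow rows."""
    return f"user_follow_{user_id % SHARD_COUNT:02d}"


def plan_shard_queries(follow_ids):
    """Group followed user ids by shard, giving one query per touched table."""
    plan = defaultdict(list)
    for uid in follow_ids:
        plan[shard_table(uid)].append(uid)
    return dict(plan)
```

A user who follows 1,000 people can easily touch every shard, so a single business question fans out into many separate queries whose results must be merged in application code.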

GDB plan:

  • match (u1:User)-[:Attention]->(u2:User)-[:`24HAttention`]->(u3:User) where u1.userId = {userId} return u3;

Scheme comparison:

In the MySQL scheme, the tables must be sharded because the data is expected to reach the hundreds of millions. But it is precisely this sharding that makes queries for this scenario more cumbersome: the work has to be split up and each shard queried separately, so the complexity is easy to imagine. In the GDB scheme, the query statement is simple and clear, and the query efficiency is shown below:

How do I use GDB

Learning the syntax

  • Cypher: neo4j.com/docs/cypher… Closer to SQL, more visual, easy to understand, and cheap to learn

  • Gremlin: tinkerpop-gremlin.cn/#traversal More like an ORM: it wraps a great many chained methods and is more costly to learn

Problems encountered & solutions

    1. Unique index problem: To ensure that data is unique, we usually set a uniqueness constraint:

create constraint on (a:User) assert a.userId is unique

When writing, use merge to get an effect similar to MySQL's on duplicate key update:

merge (n:User {userId: $userId}) on create set n.isAllowLike = $isAllowLike on match set n.isAllowLike = $isAllowLike return n.userId as userId
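The create-or-update behavior that merge provides can be sketched with a plain dictionary, where the key plays the role of the uniqueness constraint. This only illustrates the semantics; `merge_user` and the sample values are hypothetical.

```python
users = {}  # userId -> properties; the dict key acts as the unique constraint


def merge_user(user_id, is_allow_like):
    """Create the user if absent, otherwise update it; never store a duplicate."""
    created = user_id not in users
    node = users.setdefault(user_id, {"userId": user_id})
    node["isAllowLike"] = is_allow_like   # same assignment on create and on match
    return created


first = merge_user(42, True)    # no such user yet, so the "on create" branch runs
second = merge_user(42, False)  # the user exists, so the "on match" branch runs
```

After both calls there is still exactly one entry for user 42, carrying the latest value.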

    2. Second-degree (two-hop) query efficiency:

Try not to sort by properties of the second-degree relationship. Instead, reduce the order of magnitude of the second-degree result set according to the business, and sort the results in application code.

Bad case: match (u:User)-[:Attention]->(c:User)-[a:Attention]->(t:User) where u.userId = $userId and a.createTime > {TTFtime} return c.userId as cUserId, t.userId as tUserId order by a.createTime desc

Good case: match (u:User)-[:Attention]->(c:User)-[a:TTFAttention]->(t:User) where u.userId = $userId return c.userId as cUserId, t.userId as tUserId
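Sorting in application code might look like the sketch below. It assumes the query also returns the edge's creation time as a third column, which the good case above does not do; the function name, row layout, and limit are all hypothetical.

```python
from datetime import datetime, timedelta


def recent_second_degree(rows, now, limit=10):
    """rows: (cUserId, tUserId, createTime) tuples returned unsorted by the query."""
    cutoff = now - timedelta(hours=24)
    recent = [row for row in rows if row[2] > cutoff]       # keep the last 24 hours
    recent.sort(key=lambda row: row[2], reverse=True)       # newest first
    return [(c, t) for c, t, _ in recent[:limit]]


# Made-up rows for illustration.
now = datetime(2024, 1, 2, 12, 0)
rows = [
    (1, 2, datetime(2024, 1, 2, 11, 0)),
    (1, 3, datetime(2024, 1, 1, 11, 0)),   # older than 24 hours, dropped
    (4, 5, datetime(2024, 1, 2, 9, 0)),
]
latest = recent_second_degree(rows, now)
```

Because the result set has already been narrowed by the dedicated edge type, the sort runs over a small list rather than over every second-degree edge.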

Future applications

Here are some examples to extend the application scenarios.

Deployment dependency graph

When we release a version, we usually put together a release list. One of the most important parts is clarifying the dependencies. When there are many projects, the dependencies can become complicated, and GDB lets us express them very clearly.

The following figure shows the deployment diagram of a certain version with the following characteristics:

  • The deployment diagram forms a “forest”
  • Different “trees” in the forest can be published independently
  • The dependencies within a tree are given by its directed edges
  • Publishing must start from the root node
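The publishing rule above amounts to a topological order over the dependency edges. A minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (3.9+) follows; the project names and the direction of the mapping (each project lists what must be published before it) are assumptions for illustration.

```python
from graphlib import TopologicalSorter  # standard library since Python 3.9

# Each project maps to the projects that must be published before it.
# All project names are hypothetical.
depends_on = {
    "user-service": set(),
    "order-service": {"user-service"},
    "order-web": {"order-service"},
    "search-service": set(),
    "search-web": {"search-service"},
}

# Roots have no prerequisites; each root starts an independent "tree".
roots = sorted(p for p, deps in depends_on.items() if not deps)

# static_order() yields every project after all of its prerequisites.
publish_order = list(TopologicalSorter(depends_on).static_order())
```

Here the two trees (rooted at `user-service` and `search-service`) share no edges, so they could be released by separate teams in parallel.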

User portrait

Build a portrait (profile) for each user. By comparing how similar the portraits of different users are, we can find like-minded friends.
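One simple way to measure how alike two portraits are is Jaccard similarity over their tag sets: the number of shared tags divided by the number of distinct tags. The article does not say which measure is used, so this is just one possible choice; the tags, users, and function names below are made up.

```python
def jaccard(a, b):
    """Share of tags two users have in common, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical portraits: each user is described by a set of interest tags.
profiles = {
    "alice": {"sneakers", "streetwear", "basketball"},
    "bob": {"sneakers", "basketball", "watches"},
    "carol": {"makeup", "handbags"},
}


def most_similar(user):
    """Name of the other user whose portrait overlaps most with `user`."""
    others = (name for name in profiles if name != user)
    return max(others, key=lambda name: jaccard(profiles[user], profiles[name]))
```

In a graph database, the same tags would be nodes, and shared tags would show up as two-hop paths between users.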

Knowledge graph

Want to know the relationships between the great houses in Game of Thrones? A knowledge graph like this makes it very handy to look up someone's family relationships.
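Looking up how two people are related is a shortest-path question, which breadth-first search answers on a small graph. The sketch below uses placeholder names rather than real characters, and `relationship_path` is a hypothetical helper, not a graph database API.

```python
from collections import deque

# Undirected family links; all names are placeholders.
links = {
    "Anna": {"Ben", "Cara"},
    "Ben": {"Anna", "Dan"},
    "Cara": {"Anna"},
    "Dan": {"Ben", "Eve"},
    "Eve": {"Dan"},
}


def relationship_path(start, goal):
    """Breadth-first search: returns a shortest chain of people, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for person in sorted(links.get(path[-1], ())):
            if person not in seen:
                seen.add(person)
                queue.append(path + [person])
    return None
```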

Conclusion

At present, the community GDB service holds hundreds of millions of nodes and edges of relationship data and serves hundreds of QPS with an average RT of about 28 ms, which supports this part of the business well. Of course, GDB is not omnipotent. I believe there is no best technology in the world, only the technology best suited to the current application scenario.

Space is limited, so this article does not go into depth. Offline discussion is welcome. Finally, I hope this article brings you some inspiration and ideas. Thank you for reading.

If this article helped, feel free to leave a comment. You can also follow the “object technology wechat” official account!