Introduction
A graph is made up of vertices and edges. In ArangoDB, in addition to collections that hold vertices, there are also edge collections, in which edges are stored as documents.
ArangoDB has two types of collections: document collections and edge collections.
A vertex can be a document in a document collection or a document in an edge collection, so edges can themselves be used as vertices.
What is a graph?
In the SQL world, an n:m (many-to-many) relationship is usually modeled with a separate table that links the two tables involved. Edge collections in ArangoDB play a similar role, while vertex collections (document collections) are similar to the SQL tables that store the data you want to relate.
In SQL, finding the friends of Wang's friends requires multiple joins. In a graph, it simply means starting from the vertex for Wang and following edges for two hops. This is called graph traversal.
In a directed graph, edges are oriented: each edge points from one vertex document to another, with _from and _to indicating the direction. When writing a query, you can specify the direction in which edges are followed:
- OUTBOUND: _from → _to
- INBOUND: _from ← _to
- ANY: _from ↔ _to
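The three directions can be illustrated with a small in-memory sketch. This is plain Python, not AQL and not ArangoDB's API; the edge list, the vertex names other than Wang, and the helper functions `neighbors` and `two_hops` are assumptions made up for the example.

```python
# Edge documents modelled as plain dicts; the data is invented for illustration.
EDGES = [
    {"_from": "persons/wang", "_to": "persons/li"},
    {"_from": "persons/li", "_to": "persons/zhao"},
    {"_from": "persons/chen", "_to": "persons/wang"},
]


def neighbors(vertex, direction, edges=EDGES):
    """Return the vertices one hop away from `vertex` in the given direction."""
    found = set()
    for edge in edges:
        # OUTBOUND follows an edge from _from to _to.
        if direction in ("OUTBOUND", "ANY") and edge["_from"] == vertex:
            found.add(edge["_to"])
        # INBOUND follows an edge backwards, from _to to _from.
        if direction in ("INBOUND", "ANY") and edge["_to"] == vertex:
            found.add(edge["_from"])
    return found


def two_hops(start, direction, edges=EDGES):
    """Vertices reached by a two-hop walk from `start` (friends of friends)."""
    result = set()
    for friend in neighbors(start, direction, edges):
        result |= neighbors(friend, direction, edges)
    result.discard(start)  # do not report the start vertex itself
    return result


print(neighbors("persons/wang", "OUTBOUND"))
print(neighbors("persons/wang", "INBOUND"))
print(two_hops("persons/wang", "OUTBOUND"))
```

The two-hop lookup is the graph counterpart of the multi-join SQL query: each hop is one more call to `neighbors` rather than one more join.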
Graph algorithms supported by ArangoDB
Graph traversal
- Traverse edges in the outbound, inbound, or any direction
- Traverse within a specified depth range
- Depth-first, breadth-first, or weighted traversal
- Optional pruning conditions that stop the traversal along a path
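As a rough illustration of depth limits and pruning, here is a minimal breadth-first traversal in plain Python. The adjacency list `GRAPH`, the function `traverse`, and its parameters are hypothetical. The sketch also assumes that each vertex is visited at most once and that a pruned vertex is still returned but not expanded further; check ArangoDB's documentation for the exact defaults of the real traversal.

```python
from collections import deque

# Hypothetical adjacency list (outbound edges only); vertex names are illustrative.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def traverse(graph, start, min_depth, max_depth, prune=None):
    """Breadth-first traversal returning (vertex, depth) pairs
    with min_depth <= depth <= max_depth.

    `prune` is an optional predicate: when it returns True for a vertex,
    that vertex may still be returned, but the traversal stops there.
    """
    visited = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        vertex, depth = queue.popleft()
        if depth >= min_depth:
            result.append((vertex, depth))
        if depth == max_depth or (prune is not None and prune(vertex)):
            continue  # do not expand this vertex any further
        for nxt in graph.get(vertex, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, depth + 1))
    return result


print(traverse(GRAPH, "A", 1, 2))
print(traverse(GRAPH, "A", 1, 3, prune=lambda v: v == "D"))
```

Replacing the queue with a stack would turn the same loop into a depth-first traversal.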
Shortest path
For example, navigation apps such as Autonavi and Baidu Maps compute the shortest path between two points.
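To make the idea concrete, the sketch below runs Dijkstra's algorithm on a tiny weighted road network. It is a generic textbook implementation in plain Python, not a description of how ArangoDB computes shortest paths internally; the `ROADS` data and the `shortest_path` function are assumptions for illustration.

```python
import heapq

# Hypothetical road network: vertex -> list of (neighbor, distance in km).
ROADS = {
    "home": [("mall", 4), ("park", 1)],
    "park": [("mall", 2), ("office", 7)],
    "mall": [("office", 3)],
    "office": [],
}


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_distance, path), or (None, []) if unreachable."""
    best = {source: 0}          # best known distance to each vertex
    previous = {}               # predecessor on the best known path
    heap = [(0, source)]
    while heap:
        dist, vertex = heapq.heappop(heap)
        if vertex == target:
            break
        if dist > best.get(vertex, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for nxt, weight in graph.get(vertex, []):
            candidate = dist + weight
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                previous[nxt] = vertex
                heapq.heappush(heap, (candidate, nxt))
    if target not in best:
        return None, []
    # Walk the predecessor chain backwards to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


print(shortest_path(ROADS, "home", "office"))
```

In this data the direct road from home to mall (4 km) loses to the detour through the park (1 km + 2 km), which is why the result goes home, park, mall, office.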
K shortest paths
K paths
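One way to picture the difference between these two: a path search enumerates the paths between two vertices up to some depth, and a k-shortest-paths search returns only the k shortest of them, shortest first. The sketch below is a deliberately naive plain-Python version that enumerates every simple path and then sorts. It is an assumption made for illustration, not ArangoDB's algorithm, and the names `GRAPH`, `all_paths`, and `k_shortest_paths` are hypothetical.

```python
# Hypothetical directed graph as an adjacency list.
GRAPH = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}


def all_paths(graph, source, target, max_depth):
    """Yield every simple path (no repeated vertex) from source to target
    that uses at most max_depth edges."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        last = path[-1]
        if last == target:
            yield path
            continue
        if len(path) - 1 >= max_depth:
            continue  # depth limit reached without hitting the target
        for nxt in graph.get(last, []):
            if nxt not in path:
                stack.append(path + [nxt])


def k_shortest_paths(graph, source, target, k, max_depth=10):
    """Return the k paths with the fewest edges, shortest first."""
    paths = sorted(all_paths(graph, source, target, max_depth),
                   key=lambda p: (len(p), p))
    return paths[:k]


print(sorted(all_paths(GRAPH, "A", "D", 3)))
print(k_shortest_paths(GRAPH, "A", "D", 2))
```

Enumerating all paths grows quickly with graph size, so this approach is only suitable for small examples.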
Distributed Iterative Graph Processing (Pregel)
- Page Rank
- Seeded Page Rank
- Single-Source Shortest Path (SSSP)
- Connected Components
- Weakly Connected Components (WCC)
- Strongly Connected Components (SCC)
- Hyperlink-Induced Topic Search (HITS)
- Effective Closeness Vertex Centrality
- LineRank Vertex Centrality
- Label Propagation Community Detection
- Speaker-Listener Label Propagation (SLPA) Community Detection
- Programmable Pregel Algorithms (experimental)
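To give a feel for what an iterative graph algorithm such as Page Rank computes, here is a single-machine sketch in plain Python. It does not use Pregel or any ArangoDB API, and it is not distributed. The `LINKS` data, the function name, and the parameter values (damping 0.85, 50 iterations) are assumptions chosen for the example.

```python
# Hypothetical link graph: vertex -> list of vertices it points to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def pagerank(graph, damping=0.85, iterations=50):
    """Iterative PageRank over an adjacency list.

    Every edge target must also appear as a key of `graph`.
    """
    vertices = list(graph)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without outbound edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in vertices}
        # Each vertex passes its rank on in equal shares along its outbound edges.
        for v in vertices:
            for target in graph[v]:
                new_rank[target] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


ranks = pagerank(LINKS)
for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, round(ranks[vertex], 4))
```

In a Pregel-style system the inner loop is expressed per vertex and the rank shares travel as messages between workers, but the arithmetic per iteration is the same idea.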
Named graphs and anonymous graphs
Named graphs are managed entirely by ArangoDB. They are visible in the web interface and can use all of ArangoDB's graph features. A layer called the graph module provides the following guarantees:
- All changes are transactional
- If a vertex is deleted, all edges connected to it are also deleted
- If an edge is inserted, it is checked against the edge definitions
If you access the underlying collections without going through the graph module, all of these guarantees are lost, and your operations may leave the data inconsistent, for example with dangling edges.
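The guarantees above, and what happens when they are bypassed, can be mimicked with a toy in-memory class. `TinyGraph` and all of its methods are hypothetical; this is not ArangoDB's graph module or its API, only a sketch of the behaviour described here. Transactions are not modelled.

```python
class TinyGraph:
    """Toy in-memory stand-in for a named graph (illustration only)."""

    def __init__(self, edge_definitions):
        # edge collection name -> (allowed _from collection, allowed _to collection)
        self.edge_definitions = edge_definitions
        self.vertices = set()   # ids such as "Users/John"
        self.edges = []         # tuples of (edge_collection, from_id, to_id)

    def add_vertex(self, vertex_id):
        self.vertices.add(vertex_id)

    def add_edge(self, edge_collection, from_id, to_id):
        from_coll, to_coll = self.edge_definitions[edge_collection]
        # Check the edge against its definition before storing it.
        if from_id.split("/")[0] != from_coll or to_id.split("/")[0] != to_coll:
            raise ValueError("edge does not match the edge definition")
        self.edges.append((edge_collection, from_id, to_id))

    def remove_vertex(self, vertex_id):
        # Removing a vertex also removes every edge attached to it.
        self.vertices.discard(vertex_id)
        self.edges = [e for e in self.edges if vertex_id not in (e[1], e[2])]

    def dangling_edges(self):
        """Edges whose _from or _to vertex no longer exists."""
        return [e for e in self.edges
                if e[1] not in self.vertices or e[2] not in self.vertices]


g = TinyGraph({"UsersInGroups": ("Users", "Groups")})
g.add_vertex("Users/John")
g.add_vertex("Groups/BowlingGroupHappyPin")
g.add_edge("UsersInGroups", "Users/John", "Groups/BowlingGroupHappyPin")

# Bypassing the graph layer: delete the vertex directly from the "collection".
g.vertices.discard("Users/John")
print(g.dangling_edges())   # the membership edge now points at a missing vertex

# Going through the graph layer instead cleans up the attached edges.
g.add_vertex("Users/John")
g.remove_vertex("Users/John")
print(g.dangling_edges())
```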
Anonymous graphs have no edge definitions describing which vertex collections are connected by which edge collection.
Which graph should I choose?
As mentioned above, named graphs ensure graph integrity when edges or vertices are inserted or removed. Therefore, even if you use the same vertex collection in multiple named graphs, you will not end up with dangling edges. However, this requires more work inside the database, and that work comes at a cost.
Anonymous graphs can therefore be faster. The choice is a trade-off between performance and integrity.
How to determine vertices and edges
When building a graph, the first question is which of your data should be vertices and which should be edges. One way to think about it is to describe the business scenario in short sentences, then treat the nouns as vertices and the verbs as edges.
For example, if we have data about users and their organizations, a business scenario would be: which organization (noun) does a user (noun) belong to (verb)?
In ArangoDB, users and organizations can be stored in two different vertex collections. The Groups collection can store attributes of organizations, such as the name, the date of establishment, and the URL of the organization. Similarly, the Users collection can store a user's name, date of birth, gender, and other attributes.
The membership relationship between users and organizations can be modeled with edges. Logically, this is a many-to-many (m:n) relationship. In SQL, we would create a separate relationship table and store the relationships with foreign keys. In ArangoDB, we would use an edge collection, for example one called UsersInGroups, to store these relationships. For example, one edge could have _from pointing to Users/John and _to pointing to Groups/BowlingGroupHappyPin, forming the relationship shown in the figure below.
You can also add attributes to the relationship, such as when the user joined the organization (e.g., since: 2022-4-12) and the user's role in it (role: member).
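Written out as data, the example might look like the following sketch. The documents are plain Python dicts; the `name` and `gender` values and the helper `groups_of` are made up for illustration, while the collection names, the _from and _to values, and the since and role attributes come from the text above.

```python
import json

# Vertex documents; attribute values other than the ids are invented.
user = {"_id": "Users/John", "name": "John", "gender": "male"}
group = {"_id": "Groups/BowlingGroupHappyPin", "name": "Bowling Group Happy Pin"}

# Edge document as it could be stored in the UsersInGroups edge collection.
membership = {
    "_from": user["_id"],
    "_to": group["_id"],
    "since": "2022-4-12",
    "role": "member",
}


def groups_of(user_id, edges):
    """Return (group id, role) pairs for every membership edge of a user."""
    return [(e["_to"], e["role"]) for e in edges if e["_from"] == user_id]


print(json.dumps(membership, indent=2))
print(groups_of("Users/John", [membership]))
```

Because the attributes live on the edge, the same user can hold different roles in different organizations without any change to the Users or Groups documents.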
Compared with storage in an RDBMS, a graph makes the relationships between entities easier to see and lends itself better to in-depth analysis and modeling of the data. This is especially true for multi-hop scenarios.
Backup and restore
You can create a backup with arangodump and restore it to a new ArangoDB instance with arangorestore.
Two points to note:
- To back up named graphs, also dump the system collection _graphs
- Back up all of the edge and vertex collections that the graph consists of; a partial backup or restore may not work
Example graphs
ArangoDB comes with a few example graphs to help you understand graphs and their APIs. This article lists just a few of them.
Knows_Graph
In this example, there is only one kind of vertex (Persons) and one kind of relationship (KNOWS). The figure shows the relationships among five people, and the direction of each arrow is the direction of the KNOWS relationship:
- Alice knows Bob
- Bob knows Charlie
- Bob knows Dave
- Eve knows Alice
- Eve knows Bob
Social Graph
This graph has two types of vertices, female and male, and the relationships between people are stored in the relation collection.
City Graph