Trie
树
Trie tree, namely dictionary tree, also known as prefix tree. This data structure is typically used to store strings in the form of paths. Strings with a common prefix share the same parent node path.
The core idea of Trie is space for time. The common prefix of string is used to reduce the cost of query time to improve efficiency.
There are three basic properties of prefix trees:
- The root node contains no characters, and each node except the root node contains only one character.
- From the root node to a node, the characters passing through the path are concatenated to be the string corresponding to the node.
- All children of each node contain different characters.
graph TD
ROOT --> t
t --> e
e --> a
e --> n
t --> o
a --> tea
n --> ten
o --> to
Apply to route matching
The routing path structure is also a string and begins with/to connect different levels. There may be different subpaths at the same level. Similarly, such a batch of paths has a common prefix. Route matching is a process of searching in the existing routing table, and it is stored first and then searched. Considering the nature of Trie tree, this data structure can be used to store the routing table for easy searching.
Starting with and extending the basic Trie properties, paths are divided into components based on /, which are the smallest units, equivalent to the characters in a string, used to construct nodes in a tree.
Such as:
/api/user/nameList
/api/user/addressList
As described above, split the two paths with/to form a group of components that have two levels of common paths. The tree structure looks like this:
graph TD
ROOT --> B
B['api'] --> C
C['user'] --> D['nameList']
C --> E['addressList']
D --> F[handler1]
E --> G[handler2]
As you can see, it takes only four nodes to store two routing table records, and the more common paths you have, the more space you save.
Add the routing declaration to the Trie tree
Here’s the code to add the route declaration Component to the Trie tree:
addToTree(urlPattern: string, handler: any) {
let p = this.root;
// Padding an element to the rear of the array to make the leaf node.
const urlPatternComponents = [...urlPattern.split('/').filter(Boolean), LEAF_SIGN];
urlPatternComponents.forEach(component= > {
const { quickMap } = p;
// If quickMap has this component, it means the route has the same namespace
// with existed route, so get to the next level directly. If the node is a leaf
// node, just return cause it means redundant route is adding to the tree, we dont need it.
if (p.quickMap.has(component as string)) {
const node = p.quickMap.get(component as string)! ;if (isLeafNode(node)) {
return;
}
p = node;
return;
}
if (component === LEAF_SIGN) {
const newNode = new RouterTreeLeafNode(handler);
quickMap.set(LEAF_SIGN, newNode);
return;
}
const newNode = new NTreeNode(component as string);
p.quickMap.set(component as string, newNode);
// When the expression like ':id' shows in the route, it should
// treat it as a parameter node.One tree node can only have one parameter node.
if ((component as string).indexOf(':') > -1) {
p.paramNode = newNode;
}
p = newNode;
});
}
Copy the code
Here, a quickMap is used to store child nodes, with key as Component and value as node, which is used to quickly find nodes matching Component values during the matching process. Note that the end of the urlComponents array is filled with a Symbol called LEAF_SIGN. If you look at the tree structure above, you can see that after the actual route declaration Component is traversed, the value of the leaf node stores the last Component. So we need to add a child node to it to store the result of the actual match, which is the Handler for the route. ParamNode can be parsed in the dynamic route matching section.
Static Route Matching
When matching, the actual request path is also split into a set of Components as described above. Starting at the root of the Trie, which must have only one child, move the pointer to its only child and move the pointer to traverse the Component back. Matches the child nodes of the next level based on the new Component traversed. Until the Component is traversed, if a matching child node can be found at the end, that node is a leaf node and its value is extracted and returned as the result. If the static Route does not match, the Route is not found in the case of static Route matching. However, the actual scenario is not so simple.
The code is as follows:
getHandlerFromTree(url: string) :any{
const [urlWithParams, _] = url.split('? ');
const urlComponents = urlWithParams.split('/').filter(Boolean);
let p = this.root;
let i = 0;
let res;
let path = ' ';
while (p) {
const component = urlComponents[i ++];
// If the quickMap has the component, return it if it's also a leaf node.
// Or just move to the next level and store the path.
if (p.quickMap.has(component)) {
constnode = p.quickMap.get(component)! ;if (isLeafNode(node)) {
res = node.value;
break;
}
path += '/' + node.value;
p = node;
continue;
}
const leafNode = p.quickMap.get(LEAF_SIGN);
if (leafNode == null) {
// If quickMap has other node, it means static route cannot be matched.
if (p.quickMap.size > 0) {
const err = { message: 'Route not defined'.statusCode: 404.statusMessage: 'Not found' };
throw err;
}
// Else it means no handler was defined.
const err = { message: 'Handler not defined'.statusCode: 500.statusMessage: 'Not found' };
throw err;
}
res = leafNode.value;
break;
}
return {
handler: res,
path
};
}
Copy the code
Dynamic Route Matching
We use dynamic routing declarations such as/API /user/:id when using Express or koa.js. This is a useful feature to see how to implement dynamic routing in a Trie tree based routing system.
Now we add a new route declaration:
/api/user/nameList
/api/user/addressList
/api/user/:id
The resulting structure looks like this:
graph TD
ROOT --> B
B['api'] --> C
C['user'] --> D['nameList']
C --> E['addressList']
C --> |paramNode| F[':id']
D --> G[handler1]
E --> H[handler2]
F --> I[handler3]
/ API /user/:id/API /user/:name/API /user/:name/API /user/:name That way, you don’t know what the corresponding Component in the path means when you actually match.
Component = paramNode = paramNode = paramNode = paramNode = paramNode = paramNode = paramNode = paramNode = paramNode = paramNode = paramNode = paramNode = paramNode We categorize it directly as a dynamic parameter matching scenario.
Take a look at the code that adds dynamic parameter matching, omits the common part:
getHandlerFromTree(url: string) :any {
// ...
if (p.quickMap.has(component)) {
constnode = p.quickMap.get(component)! ;if (isLeafNode(node)) {
res = node.value;
break;
}
path += '/' + node.value;
p = node;
continue;
}
if (component) {
path += '/' + p.paramNode.value;
p = p.paramNode;
continue;
}
const leafNode = p.quickMap.get(LEAF_SIGN);
// ...
}
Copy the code
What if paramNode does not exist? Here’s the next scenario: regular expression matching.
Regular expression matching
Since regular expressions can be used to match strings directly, this route declaration would exist outside the data structure of the Trie tree, so we chose to store all such nodes below the root to avoid unnecessary lookups. If a node’s paramNode pointer does not exist, then the last option is to match the entire path with a regular expression. If there is still no match, we throw an exception that the route did not find, and the routing-dependent code handles the exception.
getHandlerFromTree(url: string) {
// ...
if (component) {
// If no parameter node found, try regular expression matching.
if(! p.paramNode) {const { handler, matched } = this.getHandlerFromRegExpNode(url);
res = handler
path = matched;
break;
}
path += '/' + p.paramNode.value;
p = p.paramNode;
continue;
}
const leafNode = p.quickMap.get(LEAF_SIGN);
// ...
}
Copy the code
Add a regular expression node to the tree:
addRegExpToTree(urlPattern: RegExp, handler: Function) {
const root = this.root;
root.children.push(new RouterRegExpLeafNode(urlPattern, handler));
}
Copy the code
Perform regular expression matching:
getHandlerFromRegExpNode(url: string) {
const regExpNode = this.root.children.filter(isRegExpNode).filter(node= > node.exp.test(url));
if (regExpNode.length === 0) {
const err = { message: 'Route not defined'.statusCode: 404.statusMessage: 'Not found' };
throw err;
}
// Take the first mathed regular expression as the result,
// No considering the case for multi regexp matched.
const node = regExpNode[0];
return {
handler: node.value,
matched: node.exp
};
}
Copy the code
The resulting structure looks like this:
graph TD
ROOT --> B
ROOT --> RegExp1
ROOT --> RegExp2...
RegExp1 --> handler1
RegExp2... --> handler2
B['api'] --> C
C['user'] --> D['nameList']
C --> E['addressList']
C --> |paramNode| F[':id']
D --> G[handler3]
E --> H[handler4]
F --> I[handler5]
conclusion
I had a question in mind at first, why not just store static routes in hash tables, and then use traversal lookup for dynamic routes? After that, I have my own understanding of this problem:
Using a hash table to store static routes, the lookup speed is constant, very fast, but requires linear space to store, and each static route needs to be stored in its entirety, so it wastes the common prefix feature. Secondly, the traversal matching method is too violent for dynamic routes. Dynamic routes cannot cache well, and route matching is a high frequency action, so the performance cost of this method is relatively large, and it is not appropriate.
Using Trie tree routing match to do is to compare the compromise solution, usually routing, the statement will be classified according to the module to do in the same primary module under the multiple secondary routing module, then each of the secondary module will have multiple triple module routing, can produce public prefix, gave a Trie tree the opportunity to save space, And the higher the repetition rate, the more space is saved (how does that sound like gzip compression?), the worst lookup efficiency of a Trie tree depends on the maximum length of the stored sequence, the maximum depth of the tree, which is a linear level of time complexity. There is another problem here. If all route declarations are scattered and there is no common prefix, assume that there are m records with the length of N. If there is no common prefix for each other, the spatial complexity of the Trie tree will reach O(MN).
See personal Github for the full code: Divasatanica – auf/ Core
Personal code design experience is less, there are mistakes big guys please point out.