Use LTREE to process hierarchical data in PostgreSQL

By Alexey Vasiliev

Shi Yanliang, working in Hangao Basic Software Co., LTD., PostgreSQL database technology enthusiast, PostgreSQL ACE, PGCM, 10G & 11G OCM, OGG certification expert.

The original address: leopard. In. Ua / 2013/09/02 /…

In this article, you’ll learn how to use PostgreSQL’s LTREE module, which allows you to store data in a hierarchical tree structure.

What is LTREE?

Ltree is a PostgreSQL module. It implements a data type, LTREE, used to represent labels for data stored in a hierarchical tree structure. Provides extensive tools for searching tag trees.

Why ltREE?

Ltree implements a materialized path that is fast for INSERT/UPDATE/DELETE and fast for SELECT operations
In general, it is faster than using recursive Ctes or recursive functions that often require recalculation of branches
Such as built-in query syntax and operators dedicated to query and navigation trees
Index!!

The initial data

First, you should enable extensions in the database. You can perform this operation by running the following command:

CREATE EXTENSION ltree;
Copy the code

Let’s create a table and add some data to it:

CREATE TABLE comments (user_id integer, description text, path ltree); INSERT INTO comments (user_id, description, path) VALUES ( 1, md5(random()::text), '0001'); INSERT INTO comments (user_id, description, path) VALUES (2, MD5 (random()::text), '0001.0001.0001'); INSERT INTO comments (user_id, description, path) VALUES (2, MD5 (random()::text), '0001.0001.0001.0001'); INSERT INTO comments (user_id, description, path) VALUES (2, MD5 (random()::text), '0001.0001.0001.0001'); INSERT INTO comments (user_id, description, path) VALUES (1, MD5 (random()::text), '0001.0001.0001.0002'); INSERT INTO comments (user_id, description, path) VALUES (1, MD5 (random()::text), '0001.0001.0001.0002'); INSERT INTO comments (user_id, description, path) VALUES (5, MD5 (random()::text), '0001.0001.0001.0001.0003 '); INSERT INTO comments (user_id, description, path) VALUES (5, MD5 (random()::text),' 0001.0001.0001.0001.0003 '); INSERT INTO comments (user_id, description, path) VALUES (6, MD5 (random()::text), '0001.0002'); INSERT INTO comments (user_id, description, path) VALUES (6, MD5 (random()::text), '0001.0002'); INSERT INTO comments (user_id, description, path) VALUES (6, MD5 (random()::text), '0001.0002.0001'); INSERT INTO comments (user_id, description, path) VALUES (6, MD5 (random()::text), '0001.0002.0001'); INSERT INTO comments (user_id, description, path) VALUES (6, MD5 (random()::text), '0001.0003'); INSERT INTO comments (user_id, description, path) VALUES (6, MD5 (random()::text), '0001.0003'); INSERT INTO comments (user_id, description, path) VALUES (8, MD5 (random()::text), '0001.0003.0001'); INSERT INTO comments (user_id, description, path) VALUES (8, MD5 (random()::text), '0001.0003.0001'); INSERT INTO comments (user_id, description, path) VALUES (9, MD5 (random()::text), '0001.0003.0002'); INSERT INTO comments (user_id, description, path) VALUES (9, MD5 (random()::text), '0001.0003.0002'); INSERT INTO comments (user_id, description, path) VALUES (11, MD5 (random()::text), '0001.0003.0002.0001'); INSERT INTO comments (user_id, description, path) VALUES (11, MD5 (random()::text), '0001.0003.0002.0001'); INSERT INTO comments (user_id, description, path) VALUES (2, MD5 (random()::text), '0001.0003.0002.0002'); INSERT INTO comments (user_id, description, path) VALUES (5, MD5 (random()::text), '0001.0003.0002.0003'); INSERT INTO comments (user_id, description, path) VALUES (7, MD5 (random()::text), '0001.0003.0002.0002.0001'); INSERT INTO comments (user_id, description, path) VALUES (7, MD5 (random()::text), '0001.0003.0002.0002.0001'); INSERT INTO comments (user_id, description, path) VALUES (20, MD5 (random()::text), '0001.0003.0002.0002.0002'); INSERT INTO comments (user_id, description, path) VALUES (31, MD5 (random()::text), '0001.0003.0002.0002.0003'); INSERT INTO comments (user_id, description, path) VALUES (31, MD5 (random()::text), '0001.0003.0002.0002.0003'); INSERT INTO comments (user_id, description, path) VALUES (22, MD5 (random()::text), '0001.0003.0002.0002.0004'); INSERT INTO comments (user_id, description, path) VALUES (22, MD5 (random()::text), '0001.0003.0002.0002.0004'); INSERT INTO comments (user_id, description, path) VALUES (34, MD5 (random()::text), '0001.0003.0002.0002.0005'); INSERT INTO comments (user_id, description, path) VALUES (34, MD5 (random()::text), '0001.0003.0002.0002.0005'); INSERT INTO comments (user_id, description, path) VALUES (22, MD5 (random()::text), '0001.0003.0002.0002.0006'); INSERT INTO comments (user_id, description, path) VALUES (22, MD5 (random()::text), '0001.0003.0002.0002.0006');Copy the code

Also, we should add some indexes:

CREATE INDEX path_gist_comments_idx ON comments USING GIST(path);
CREATE INDEX path_comments_idx ON comments USING btree(path);
Copy the code

As you can see, I set up the Comments table with a PATH field that contains all the tree paths for the table. As you can see, FOR the tree delimiter, I use four numbers and dots.

Let’s find the commenets table with path as’ 0001.0003 ‘:

$SELECT user_id, path FROM comments WHERE path <@ '0001.0003'; User_id | path -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 6 0001.0003 8 | | 0001.0003.0001 9 | | 0001.0003.0002 11 7 | 0001.0003.0002.0001 2 | | 0001.0003.0002.0002 5 0001.0003.0002.0003 0001.0003.0002.0002.0001 20 | 0001.0003.0002.0002.0002 31 | | 0001.0003.0002.0002.0003 22 0001.0003.0002.0002.0004 34 | | 0001.0003.0002.0002.0005 22 0001.0003.0002.0002.0006 (12 rows)Copy the code

Let’s examine this SQL with the EXPLAIN command:

$EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003'; QUERY PLAN ---------------------------------------------------------------------------------------------------- Seq Scan On comments (cost = 0.00.. Rows =2 width= 50) (actual time=0.013.. Loops =1) Filter: (path <@ '0001.0003'::ltree) Rows Removed by Filter: 7 Total Runtime: 0.037ms (4 rows)Copy the code

Let’s disable seQ Scan for testing:

$ SET enable_seqscan=false; SET $EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ ----------- Index Scan using path_gist_comments_idx on comments (cost=0.00.. Rows =2 width=38) (actual time=0.023.. Loops =1) Index Cond: (path <@ '0001.0003'::ltree) Total Runtime: 0.0004ms (3 rows)Copy the code

Now SQL is slow, but you can see how SQL uses index. The first SQL statement uses sequence Scan because there is not much data in the table.

Select path <@ ‘0001.0003’;

$SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*'; User_id | path -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 6 0001.0003 8 | | 0001.0003.0001 9 | | 0001.0003.0002 11 7 | 0001.0003.0002.0001 2 | | 0001.0003.0002.0002 5 0001.0003.0002.0003 0001.0003.0002.0002.0001 20 | 0001.0003.0002.0002.0002 31 | | 0001.0003.0002.0002.0003 22 0001.0003.0002.0002.0004 34 | | 0001.0003.0002.0002.0005 22 0001.0003.0002.0002.0006 (12 rows)Copy the code

You should never forget the order of the data, as in the following example:

$INSERT INTO comments (user_id, description, path) VALUES (9, MD5 (random()::text), '0001.0003.0001.0001'); $INSERT INTO comments (user_id, description, path) VALUES (9, MD5 (random()::text), '0001.0003.0001.0001'); $INSERT INTO comments (user_id, description, path) VALUES (9, MD5 (random()::text), '0001.0003.0001.0002'); $INSERT INTO comments (user_id, description, path) VALUES (9, MD5 (random()::text), '0001.0003.0001.0002'); $INSERT INTO comments (user_id, description, path) VALUES (9, MD5 (random()::text), '0001.0003.0001.0003'); $INSERT INTO comments (user_id, description, path) VALUES (9, MD5 (random()::text), '0001.0003.0001.0003'); $SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*'; User_id | path -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 6 0001.0003 8 | | 0001.0003.0001 9 | | 0001.0003.0002 11 7 | 0001.0003.0002.0001 2 | | 0001.0003.0002.0002 5 0001.0003.0002.0003 0001.0003.0002.0002.0001 20 | 0001.0003.0002.0002.0002 31 | | 0001.0003.0002.0002.0003 22 0001.0003.0002.0002.0004 34 | | 0001.0003.0002.0002.0005 22 0001.0003.0002.0002.0006 9 | | 0001.0003.0001.0001 9 0001.0003.0001.0002 9 | 0001.0003.0001.0003 (15 rows)Copy the code

Now sort:

$SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path; User_id | path -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 6 0001.0003 8 | | 0001.0003.0001 9 | | 0001.0003.0001.0001 9 0001.0003.0001.0002 9 | | 0001.0003.0001.0003 9 0001.0003.0002 11 | 2 | 0001.0003.0002.0001 0001.0003.0002.0002 7 | 0001.0003.0002.0002.0001 20 | 0001.0003.0002.0002.0002 31 | | 0001.0003.0002.0002.0003 22 0001.0003.0002.0002.0004 34 | 0001.0003.0002.0002.0005 22 | | 0001.0003.0002.0002.0006 5 0001.0003.0002.0003 (15 rows)Copy the code

Several modifiers can be added to the end of an lQuery’s non-asterisk tag to make it a better match than a full match:

* “@” – case insensitive matching, for example a @ matches a

“*” – matches any label with that prefix, such as foo * matches foobar

“%” – matches words that begin with an underscore

$SELECT user_id, path FROM comments WHERE the path ~ '0001. * {1, 2} in 0001 | 0002. *' ORDER by path; User_id | path -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 2 | 2 | 0001.0001.0001 0001.0001.0001.0001 1 | 0001.0001.0001.0002 5 8 | | | 0001.0001.0001.0003 6 0001.0002.0001 0001.0003.0001 9 | | 0001.0003.0001.0001 9 0001.0003.0001.0002 | 9 0001.0003.0001.0003 9 | 0001.0003.0002 11 | 2 | 0001.0003.0002.0001 0001.0003.0002.0002 7 | 0001.0003.0002.0002.0001 20 34 | | 0001.0003.0002.0002.0002 31 | | 0001.0003.0002.0002.0003 22 0001.0003.0002.0002.0004 0001.0003.0002.0002.0005 22 5 | | 0001.0003.0002.0002.0006 0001.0003.0002.0003 rows (19)Copy the code

Find all direct childrens for parent ‘0001.0003’, see below:

$SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*{1}' ORDER by path; User_id | path -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 8 | 0001.0003.0001 9 | 0001.0003.0002 (2 rows)Copy the code

Find all childrens for parent ‘0001.0003’ as follows:

$SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path; User_id | path -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 6 0001.0003 8 | | 0001.0003.0001 9 | | 0001.0003.0001.0001 9 0001.0003.0001.0002 9 | | 0001.0003.0001.0003 9 0001.0003.0002 11 | 2 | 0001.0003.0002.0001 0001.0003.0002.0002 7 | 0001.0003.0002.0002.0001 20 | 0001.0003.0002.0002.0002 31 | | 0001.0003.0002.0002.0003 22 0001.0003.0002.0002.0004 34 | 0001.0003.0002.0002.0005 22 | | 0001.0003.0002.0002.0006 5 0001.0003.0002.0003 (15 rows)Copy the code

Find parent for children ‘0001.0003.0002.0002.0005’ :

$SELECT user_id, path FROM comments WHERE path = subpath('0001.0003.0002.0002.0005', 0, -1) ORDER by path; User_id | path -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 2 | 0001.0003.0002.0002 row (1)Copy the code

If your path is not unique, you will get multiple records.

An overview of the

As you can see, the materialization path using LTREE is very simple. In this article, I have not listed all possible uses of LTREE. It is not considered a full-text search problem ltXtQuery. But you can be in PostgreSQL official documentation (www.postgresql.org/docs/curren)… Find it in.

For more PostgreSQL news and events, please visit the Official PostgreSQL China website:

www.postgresqlchina.com

For more information about PostgreSQL, please visit the Official Chinese PostgreSQL Q&A community:

www.pgfans.cn

For more information about PostgreSQL, please visit the Official PostgreSQL download website in China:

Use LTREE to process hierarchical data in PostgreSQL

What is LTREE?

Why ltREE?

The initial data

An overview of the

Related Posts

PHP connects to the MySQL database application

Don’t worry about the interview, micro service distributed transaction XA model consistency selection not worry | 🏆 technology project stage v essay…

Huawei cloud native media network, upgrade the tradition, empower the future