Abstract: GaussDB(DWS) supports XML data types and rich XML parsing functions, which can manage the mapping between relational data and XML data.
Summary of XML
XML, short for eXtensible Markup Language, can describe very complex data structures and is widely used to transfer and store data. XML is a markup language similar to HTML, but XML does not use predefined markup. Markup can be defined according to application requirements. The basic format of XML is standardized, enabling data sharing between heterogeneous systems across platforms, operating systems, and applications.
XML data type
GaussDB(DWS) supports storing XML documents in the XML data type column of the database. The advantage of storing data by XML data type over text is that it has the function of checking data structure to ensure correct structure, and supports XML data parsing and processing functions.
The criteria for determining the correctness of an XML document are:
- The document must be a well-formed document.
- The document follows all the syntax rules of XML and is valid.
- Documents follow semantically specific rules, often laid down in XML or DTD specifications
XML can store well-formed documents defined by the XML standard, as well as content fragments defined by the XML standard, which can have multiple top-level elements or character nodes.
Here is an example of a well-formed XML document:
<? The XML version = "1.0" encoding = "utf-8"? > <message> Hello GaussDB(DWS) </message>Copy the code
A well-formed document can be inserted into an XML column using the INSERT SQL statement, and if the document is parsed successfully, it is well formed. The XML document is validated against configuration parameters before an insert or update operation is performed.
XML data in an application typically takes its serialized string format, and when data is inserted into an XML column, it must be converted to an XML hierarchical format. Therefore, the XMLPARSE function can be explicitly called at insert time to convert data from its serialized string format to XML hierarchical format.
test=# SELECT XMLPARSE(document '<root>GAUSSDB(DWS)</root>');
xmlparse
---------------------------
<root>GAUSSDB(DWS)</root>
(1 row)
Copy the code
To access the XML value
When accessing and processing XML data, XML data types do not provide comparison operators and cannot be compared directly with strings because XML data is not represented as a string within the database. The result is that rows cannot be retrieved by comparing XML values to search values, so XML data should be accompanied by an ID value to retrieve the data.
Using the XMLSERIALIZE function, you can transform AN XML value into a serialized string value representing an XML document.
test=# SELECT XMLSERIALIZE(document '<root>GAUSSDB(DWS)</root>' AS TEXT);
xmlserialize
---------------------------
<root>GAUSSDB(DWS)</root>
(1 row)
Copy the code
XML parsing function
GaussDB(DWS) supports more than 30 XML parsing functions, including parsing XML data, generating XML content, XML predicates, XML parameter Settings, and mapping data to XML.
A function that processes XML data
- Xpath evaluates the result of an xpath expression against an XML value
- Xmltable parses XML data as XPath expressions
A function that generates XML content
- Xmlparse character data is converted to a value of type XML
- Xmlserialize Converts the XML type to a string
- Xmlcomment creates a value that contains the specific text content of the XML comment
- Xmlconcat concates a separate list of XML values to create a single value containing a fragment of XML content
- Xmlelement generates an XML element with a given name, attributes, and content.
- Xmlforest generates an XML forest (sequence) element with the specified name and content
- Xmlpi creates an XML processing instruction
- Xmlroot changes the root node attribute of an XML value
- Xmlagg aggregate function, which joins the input value of the aggregate function call
XML predicate function
- IS DOCUMENT determines whether the XML value IS a DOCUMENT
- IS NOT DOCUMENT Determines whether the XML value IS a DOCUMENT
- Xmlexists determines whether the XPath expression returns any nodes
- Xpath_exists determines whether an XPath expression returns any nodes
- Xml_is_well_formed Checks whether the string is well-formed XML
- Xml_is_well_formed_document Checks that the string is a well-formed XML document
- Xml_is_well_formed_content Checks that the string is well-formed XML content
XML parameter Settings
- SET XML OPTION Sets the XML format
- SET XMLBINARY TO Sets the encoding format of binary values in XML
Functions that map tables, queries, cursors, and databases to XML
- Table_to_xml, query_to_XML, cursor_to_XML, database_to_XML, etc.
For details about how to use each function, see the GaussDB(DWS) user manual. The following describes the XMLTABLE function that parses XML data in applications.
Overview of the XMLTABLE function
The XMLTABLE function parses XML data as aN XPath expression and returns the data by generating a table with defined columns that can contain any SQL data type, including XML.
The XMLTABLE function supports passing XML data from a table or a SELECT query as a variable to a specified XPath expression. The result of parsing the XML data through the XPath expression is used to generate column values in the table. The structure of the generated table is defined by the COLUMNS clause of THE XMLTABLE, which can be characterized by specifying the column name, data type, and how the column values are generated.
To demonstrate the use of the XMLTABLE function, first create the CUSTOMER table and insert the XML data containing the CUSTOMER information.
CREATE TABLE CUSTOMER AS SELECT 1 AS ID,
XML $$
<ROWS>
<ROW ID="1">
<CUSTOMER_NAME>Tony</CUSTOMER_NAME>
<PHONENUM>123-456-666</PHONENUM>
</ROW>
<ROW ID="2">
<CUSTOMER_NAME>Serena</CUSTOMER_NAME>
<PHONENUM>123-456-888</PHONENUM>
</ROW>
<ROW ID="3">
<CUSTOMER_NAME>Tina</CUSTOMER_NAME>
<PHONENUM>123-456-999</PHONENUM>
</ROW>
</ROWS>
$$ AS INFO;
Copy the code
The INFO column of the CUSTOMER table is parsed in the XMLTABLE function with the following SELECT statement.
SELECT XMLTABLE.* FROM CUSTOMER,
XMLTABLE('//ROWS/ROW'
PASSING INFO
COLUMNS ID INT PATH '@ID',
NAME VARCHAR(64) PATH 'CUSTOMER_NAME',
PHONENUM TEXT PATH 'PHONENUM');
id | name | phonenum
----+--------+-------------
1 | Tony | 123-456-666
2 | Serena | 123-456-888
3 | Tina | 123-456-999
(3 rows)
Copy the code
On GaussDB(DWS), the XMLTABLE function supports a STREAM query plan that is pushed down to the data node DN. XML data is parsed on the data node to generate the XMLTABLE result table, and the results are summarized on the coordination node CN by the GATHER STREAM. STREAM plans that can be pushed down to DN have better query performance.
XMLTABLE application case
In business scenarios, XML documents often need to be parsed. The XMLTABLE function can be used to quickly and conveniently parse XML data and return the required data in the form of a table for further query and analysis.
Here is an example of the process of parsing message data:
1. Create a table to store message data and insert data.
CREATE TABLE MSGS AS SELECT 1 AS ID, 'A,BB,CCC,DDDD,EEEEE' AS MSG;
Copy the code
2. Convert message text data to XML data using the XMLPARSE function.
test=# SELECT XMLPARSE(content '<r><c>' || REPLACE(MSG, ',', '</c><c>') ||'</c></r>') AS XML_MSG FROM MSGS;
xml_msg
-----------------------------------------------------------
<r><c>A</c><c>BB</c><c>CCC</c><c>DDDD</c><c>EEEEE</c></r>
(1 row)
Copy the code
3. Use the XMLTABLE function to parse XML data and return message contents one by one.
test=# SELECT MSG_CONTENT FROM
test-# (SELECT XMLPARSE(content '<r><c>' || REPLACE(MSG, ',', '</c><c>') ||'</c></r>') AS XML_MSG FROM MSGS),
test-# XMLTABLE('/r/c/text()' PASSING XML_MSG COLUMNS MSG_CONTENT VARCHAR(4000) PATH '.');
msg_content
-------------
A
BB
CCC
DDDD
EEEEE
(5 rows)
Copy the code
As you can see in the figure above, the parsed data is returned in the form of a table on which you can sort and filter data. GaussDB(DWS) also supports XML data processing in functions and stored procedures, facilitating application development.
conclusion
GaussDB(DWS) supports XML data types and rich XML parsing functions. In addition, the distributed architecture based on Shared Nothing has good parallel processing and expansion capabilities. XML data parsing tasks can be pushed down to data nodes for parallel processing. It can fully meet the requirements of XML data parsing in applications.
This article is published in GaussDB(DWS) XML Data Processing Practice by Huawei cloud community.
Click to follow, the first time to learn about Huawei cloud fresh technology ~