Preface
- When learning Java Web you often have to configure XML files, and XML comes up again later when learning frameworks, so this article starts with a simple introduction to XML. New XML use cases will be added as I run into them, so this article is updated over time.
- Reference article: "Learning XML from scratch" by Java3y
- XML reference video: Dark Horse Programmer course www.bilibili.com/video/BV1P7…
What is XML
- XML stands for Extensible Markup Language. "Extensible" means that its tags are user-defined.
- Its main use is writing configuration files.
The history of XML
- GML (1969) -> SGML (1986) -> HTML (1993) -> XML (1998)
- 1969: GML (Generalized Markup Language), a data specification whose main purpose was communication between different machines
- 1986: SGML (Standard Generalized Markup Language)
- 1993: HTML (HyperText Markup Language), used for the World Wide Web
- 1998: XML (Extensible Markup Language)
XML was originally intended as a replacement for HTML, because HTML is written very loosely (you can omit a lot of markup and the browser engine fills it in automatically, but browsers did not follow one unified specification, which led to messy code). For market and other reasons this did not succeed, and XML instead came to replace properties files for configuration. (A properties file is a configuration file whose main purpose is to let you change a program's parameters by editing the file, without changing the code.)
XML can express nested structure, which properties files cannot, so it is widely used for configuration files.
Why use XML
- As mentioned earlier, XML is used in project configuration files.
- It has other uses as well, for example transferring data between programs and acting as a small database.
- Data transfer between programs: for example, QQ clients exchange data in XML format, which is readable and easy to maintain.
- Acting as a small database: our programs may use data that is usually configured by hand. If storing it in a database is not worthwhile (because of the extra effort of maintaining the database), consider keeping it directly in an XML file. Reading a file this way is usually faster than querying a database. MSN, for example, used XML files to save users' chat records.
Differences between XML and HTML
XML is a markup language much like HTML and is used in many similar ways, but there are also important differences:
- XML tags are user-defined, while HTML tags are predefined.
- XML syntax is strict, while HTML syntax is loose (markup can be omitted or left incomplete).
- XML stores data, while HTML presents it.
- Unlike HTML, XML "does nothing" by itself: it is only used to structure, store, and transfer information (as in the configuration files and small databases mentioned above).
XML syntax
1. Document declaration
- Format: <?xml attribute list ?>
- The XML declaration must be placed on the first line of the XML document
- Attribute list
- version: Specifies the XML version number. The value is usually 1.0.
- encoding: Tells the parser which character encoding the document uses. If it is omitted, the parser assumes UTF-8 (or UTF-16 when a byte-order mark is present). UTF-8 or GBK is usually declared.
- standalone: Whether the document is standalone. The value is yes or no and indicates whether the XML document depends on other files. It is rarely used in XML files today.
- A correct document declaration
<?xml version="1.0" encoding="UTF-8"?>
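To see how a parser reads the declaration, here is a minimal sketch using the JAXP DOM parser that ships with the JDK. The class name XmlDeclarationDemo, the method readDeclaration, and the students root element are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlDeclarationDemo {
    // Parses the XML text and returns {version, declared encoding, standalone flag}.
    static String[] readDeclaration(String xml) {
        try {
            // Parse from bytes so the parser itself has to honor the declared encoding.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return new String[] {
                doc.getXmlVersion(),
                doc.getXmlEncoding(),
                String.valueOf(doc.getXmlStandalone())
            };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical one-element document, used only to carry the declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><students/>";
        String[] info = readDeclaration(xml);
        System.out.println("version=" + info[0]
                + ", encoding=" + info[1]
                + ", standalone=" + info[2]);
    }
}
```

Note that the declaration is not a node in the DOM tree; its values are exposed as properties of the Document object.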
2. Elements
- Each XML document must have one and only one root element.
The root element contains all the other elements in the document: its start tag comes before every other start tag, and its end tag comes after every other end tag.
- Whitespace and line breaks inside an XML element are treated as part of the element content
<stu>xiaoming</stu>
The element above and the element below have different content:
<stu>
xiaoming
</stu>
- Every element must be closed
- Element names are case sensitive
- Element names cannot start with a digit
- Elements cannot be cross-nested (tags must not overlap)
There are many points to note here, but you don't need to memorize all of them. Just remember that XML syntax is strict and cannot be written arbitrarily!
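The element rules above can be checked by handing small snippets to a parser. This is only a sketch with the JDK's built-in DOM parser; the class ElementRulesDemo and its helper methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ElementRulesDemo {
    // Returns the parsed document, or null when the text is not well-formed XML.
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors, so broken input still ends up in the catch block.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return null;
        }
    }

    static boolean isWellFormed(String xml) {
        return parseOrNull(xml) != null;
    }

    // Text content of the root element, whitespace included.
    static String rootText(String xml) {
        return parseOrNull(xml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<stu>xiaoming</stu>"));   // properly closed
        System.out.println(isWellFormed("<stu>xiaoming"));         // element not closed
        System.out.println(isWellFormed("<Stu>xiaoming</stu>"));   // case mismatch
        System.out.println(isWellFormed("<1stu>xiaoming</1stu>")); // name starts with a digit
        System.out.println(isWellFormed("<a><b></a></b>"));        // cross-nested
        System.out.println(isWellFormed("<a/><b/>"));              // two root elements

        // Line breaks inside the element are part of its content.
        String compact = rootText("<stu>xiaoming</stu>");
        String spaced = rootText("<stu>\nxiaoming\n</stu>");
        System.out.println(compact.equals(spaced));
    }
}
```

Only the first snippet is well-formed. The last line compares the two stu elements from the example above and shows that their content differs.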
3. Attributes
- Attributes are part of an XML element. (Below, id is the attribute name and 100 is the attribute value.)
<student id="100">
<name>Tom</name>
</student>
- Note:
Attribute values are enclosed in double quotation marks (") or single quotation marks ('). If the value contains single quotation marks, enclose it in double quotation marks. If it contains double quotation marks, enclose it in single quotation marks. What if the value contains both? Then use entities (escape sequences, similar to &nbsp; for a space in HTML). XML has five predefined entities.
- The five predefined XML entities
- &lt; for < (less than)
- &gt; for > (greater than)
- &amp; for & (ampersand)
- &quot; for " (double quotation mark)
- &apos; for ' (apostrophe)
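As a small sketch of how the entities behave, the code below parses a student element whose attribute value contains both kinds of quotation marks and reads the value back. The motto attribute and the class AttributeEntityDemo are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeEntityDemo {
    // Parses the XML text and returns the value of the named attribute on the root element.
    static String rootAttribute(String xml, String attributeName) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getAttribute(attributeName);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The motto value mixes single and double quotes, so both are written as entities.
        String xml = "<student id=\"100\" "
                + "motto=\"Tom&apos;s &quot;A &amp; B&quot; rule\">"
                + "<name>Tom</name>"
                + "</student>";
        System.out.println(rootAttribute(xml, "id"));
        System.out.println(rootAttribute(xml, "motto"));
    }
}
```

The parser replaces each entity with the character it stands for, so the program sees the plain text rather than the escaped form.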
4. Comments
- Comments use the same syntax as HTML
<!-- Comment content -->
5. Processing instructions
- A processing instruction tells the application that handles the document how to process it. For example, the xml-stylesheet instruction attaches a CSS file to an XML document, so that opening the XML file in a browser shows the styled result. This is rarely used in practice.
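The following sketch shows what an xml-stylesheet processing instruction looks like and how a DOM parser exposes it. The file name style.css and the class ProcessingInstructionDemo are assumptions made for the example, not something from the original notes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;

class ProcessingInstructionDemo {
    // Returns {target, data} of the first processing instruction before the root, or null.
    static String[] firstInstruction(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                    ProcessingInstruction pi = (ProcessingInstruction) node;
                    return new String[] { pi.getTarget(), pi.getData() };
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // style.css is a hypothetical stylesheet name.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?>\n"
                + "<students/>";
        String[] pi = firstInstruction(xml);
        System.out.println("target: " + pi[0]);
        System.out.println("data:   " + pi[1]);
    }
}
```

The XML declaration on the first line looks similar but is not a processing instruction, so the loop only finds the xml-stylesheet line.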
6. CDATA
- When you write an XML file, you may want some content to be treated as raw text rather than parsed as markup. Put that content in a CDATA section: the XML parser does not interpret markup inside a CDATA section and passes the content through as is.
- Syntax
<![CDATA[... contents ...]]>
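A quick sketch of CDATA in action: the element below contains characters such as < and & that would normally have to be escaped. The code element and the class CdataDemo are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CdataDemo {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Text of the root element, exactly as written inside the CDATA section.
    static String rootText(String xml) {
        return parseRoot(xml).getTextContent();
    }

    // True when the first child of the root is a CDATA section node.
    static boolean firstChildIsCdata(String xml) {
        return parseRoot(xml).getFirstChild().getNodeType() == Node.CDATA_SECTION_NODE;
    }

    public static void main(String[] args) {
        String xml = "<code><![CDATA[if (a < b && b < c) { return \"<ok>\"; }]]></code>";
        System.out.println(rootText(xml));
        System.out.println(firstChildIsCdata(xml));
    }
}
```

Without the CDATA wrapper the same text would not be well-formed, because the parser would try to read < as the start of a tag. The one sequence a CDATA section cannot contain is ]]>, since that ends the section.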
Constraints and use of XML
- Because XML tags are user-defined, documents can become inconsistent when several people write them, so we need constraints that dictate how a document may be written.
- There are two types of constraints.
- DTD: A simple constraint technology (limited: it cannot constrain XML documents precisely, for example the data type of element content)
- Schema: A more complex constraint technology (the one mainly used)
DTD:
1. Introduce the DTD into the XML document
- Internal DTD: the constraint rules are defined inside the XML document itself.
- External DTD: the constraint rules are defined in an external DTD file.
- Local file:
<!DOCTYPE root-element-name SYSTEM "location of the DTD file">
- Network file:
<!DOCTYPE root-element-name PUBLIC "name of the DTD file" "URL of the DTD file">
2. Some rules for DTD constraints
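To make DTD constraints concrete, here is a sketch that validates documents against an internal DTD with the JDK parser. The DTD rules (students, student, name, age, id) and the class DtdValidationDemo are assumptions chosen for the example, not rules taken from the original course material.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // Hypothetical internal DTD: a list of students, each with a name, an age, and a required id.
    static final String DTD =
            "<!DOCTYPE students [\n"
          + "  <!ELEMENT students (student*)>\n"
          + "  <!ELEMENT student (name, age)>\n"
          + "  <!ATTLIST student id CDATA #REQUIRED>\n"
          + "  <!ELEMENT name (#PCDATA)>\n"
          + "  <!ELEMENT age (#PCDATA)>\n"
          + "]>\n";

    // Returns true when the body is well-formed and satisfies the DTD above.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity problems are reported through error(), so rethrow them to fail the parse.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ok = "<students><student id=\"100\"><name>Tom</name><age>18</age></student></students>";
        String missingAge = "<students><student id=\"100\"><name>Tom</name></student></students>";
        String missingId = "<students><student><name>Tom</name><age>18</age></student></students>";
        System.out.println(isValid(ok));
        System.out.println(isValid(missingAge));
        System.out.println(isValid(missingId));
    }
}
```

Only the first document satisfies the rules. Notice that the DTD can require an age element but cannot require its content to be a number, which is the kind of limitation that Schema addresses.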
Schema: importing the constraint document (the editor usually does this automatically during development)
1. Write the root element of the XML document (the following steps all add attributes to the root element).
2. Introduce the xsi prefix (the value is the standard XMLSchema-instance namespace):
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3. Introduce the XSD file and its namespace (student.xsd is the Schema constraint document for this XML):
xsi:schemaLocation="http://www.itcast.cn/xml student.xsd"
4. Declare a prefix for each XSD constraint to identify it (below, no prefix is written after xmlns, so this namespace becomes the default one):
xmlns="http://www.itcast.cn/xml"
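Schema validation can also be run from Java with the javax.xml.validation API. In the sketch below the namespace http://www.itcast.cn/xml comes from the steps above, while the schema content itself (students, student, name, age) and the class XsdValidationDemo are made up. The schema is passed in as a string, so the xsi:schemaLocation attribute is left out of the sample documents.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationDemo {
    // Hypothetical schema for the namespace used in the steps above.
    static final String XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"\n"
          + "            targetNamespace=\"http://www.itcast.cn/xml\"\n"
          + "            elementFormDefault=\"qualified\">\n"
          + "  <xsd:element name=\"students\">\n"
          + "    <xsd:complexType><xsd:sequence>\n"
          + "      <xsd:element name=\"student\" maxOccurs=\"unbounded\">\n"
          + "        <xsd:complexType>\n"
          + "          <xsd:sequence>\n"
          + "            <xsd:element name=\"name\" type=\"xsd:string\"/>\n"
          + "            <xsd:element name=\"age\" type=\"xsd:int\"/>\n"
          + "          </xsd:sequence>\n"
          + "          <xsd:attribute name=\"id\" type=\"xsd:string\" use=\"required\"/>\n"
          + "        </xsd:complexType>\n"
          + "      </xsd:element>\n"
          + "    </xsd:sequence></xsd:complexType>\n"
          + "  </xsd:element>\n"
          + "</xsd:schema>";

    // Returns true when the document satisfies the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalid) {
                return false; // the document breaks a schema rule
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Could not load schema or read input", e);
        }
    }

    public static void main(String[] args) {
        String ok = "<students xmlns=\"http://www.itcast.cn/xml\">"
                + "<student id=\"100\"><name>Tom</name><age>18</age></student></students>";
        String badAge = "<students xmlns=\"http://www.itcast.cn/xml\">"
                + "<student id=\"100\"><name>Tom</name><age>abc</age></student></students>";
        System.out.println(isValid(ok));
        System.out.println(isValid(badAge));
    }
}
```

The second document is rejected because age is declared as xsd:int, a data type check that a DTD cannot express.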
Parsing XML
- XML itself "does nothing": it only organizes and stores data, and generating, reading, or transmitting that data is outside XML itself. So dedicated techniques are needed to parse and use it.
- XML can be parsed in two ways.
- DOM: The whole document is loaded into memory at once, forming a DOM tree in memory.
Advantages: easy to work with, and supports all CRUD operations on the document.
Disadvantages: uses a lot of memory.
- SAX: The document is read sequentially, piece by piece, in an event-driven way.
Advantages: uses very little memory.
Disadvantages: read-only, so it cannot add, delete, or modify content.
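The two approaches can be compared side by side. The sketch below reads the same student names once with DOM and once with SAX, using the parsers bundled with the JDK. The sample document and the class DomVsSaxDemo are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVsSaxDemo {
    static final String XML = "<students>"
            + "<student id=\"1\"><name>Tom</name></student>"
            + "<student id=\"2\"><name>Lily</name></student>"
            + "</students>";

    // DOM: build the whole tree in memory, then query it.
    static List<String> namesWithDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = doc.getElementsByTagName("name");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    // SAX: react to start/end/text events while the parser streams through the input.
    static List<String> namesWithSax() {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(XML)),
                    new DefaultHandler() {
                        private StringBuilder text; // non-null only while inside <name>

                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            if (qName.equals("name")) {
                                text = new StringBuilder();
                            }
                        }

                        @Override
                        public void characters(char[] ch, int start, int length) {
                            if (text != null) {
                                text.append(ch, start, length);
                            }
                        }

                        @Override
                        public void endElement(String uri, String localName, String qName) {
                            if (qName.equals("name")) {
                                names.add(text.toString());
                                text = null;
                            }
                        }
                    });
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesWithDom());
        System.out.println(namesWithSax());
    }
}
```

Both methods return the same names. The difference is that the DOM version keeps the whole tree available for later queries and edits, while the SAX version only holds the text of the element it is currently reading.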
- Besides the parsing method, XML parsing also involves a parser: the parser applies a particular parsing method to obtain the data in an XML document.
- Common XML parsers are JAXP, DOM4J, Jsoup, and PULL.
- JAXP: The parser provided by Sun. It supports both DOM and SAX and ships with the JDK, but it is not very efficient.
- DOM4J: a very good parser
- Jsoup: a Java HTML parser that can parse a URL or HTML text directly. It provides a convenient API for extracting and manipulating data using DOM, CSS, and jQuery-like methods.
- PULL: the built-in parser of the Android operating system, which works in a SAX-like way.
- Parsing of XML documents: An application that uses an XML document does not operate on the document directly. Instead, an XML parser performs SAX or DOM parsing, and the application retrieves the XML content through the parser.
- See this article for details:
mp.weixin.qq.com/s?__biz=MzI…
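As a last sketch, here is the DOM advantage mentioned earlier (changing the document, not just reading it) using only JAXP classes from the JDK. The class DomEditDemo, the method addStudent, and the sample data are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomEditDemo {
    // Parses the XML, appends a new <student> to the root, and returns the serialized result.
    static String addStudent(String xml, String id, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Build <student id="..."><name>...</name></student> in memory.
            Element student = doc.createElement("student");
            student.setAttribute("id", id);
            Element nameElement = doc.createElement("name");
            nameElement.setTextContent(name);
            student.appendChild(nameElement);
            doc.getDocumentElement().appendChild(student);

            // Serialize the modified tree back to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not edit XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<students><student id=\"1\"><name>Tom</name></student></students>";
        System.out.println(addStudent(xml, "2", "Lily"));
    }
}
```

The application never edits the XML text by hand here: it asks the parser for a tree, changes the tree, and lets the transformer write the document out again.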