Introduction to Dom4j usage

  • 1. Introduction
    • 1.1. Reading and parsing XML documents
    • 1.2. Obtaining the root node
    • 1.3. Traversing the XML tree
    • 1.4. XPath support
    • 1.5. Converting between strings and XML
    • 1.6. Transforming XML with XSLT
    • 1.7. Creating XML
    • 1.8. File output
  • 2. Parsing XML and handling Chinese text with Dom4j
    • 2.1. Download and installation
    • 2.2. Sample XML document (holen.xml)
    • 2.3. Creating an XML document
    • 2.4. Modifying XML documents
    • 2.5. Formatting the output and specifying the encoding
    • 2.6. Required imports

1. Introduction

DOM4J is an open source XML parsing package from dom4j.org. Its website describes it as follows: "Dom4j is an easy to use, open source library for working with XML, XPath and XSLT on the Java platform using the Java Collections Framework and with full support for DOM, SAX and JAXP."


DOM4J is simple to use: if you understand the basic XML DOM model, you can use it. Its own guide is only one HTML page long, but it is complete. There is little Chinese-language material about it, so I wrote this short tutorial for your convenience. It covers only basic usage. For more advanced use, please do your own research or consult other sources.

An earlier article on the IBM developer community (see appendix) compared the performance of several XML parsing packages. DOM4J performed very well and came out on top in a number of tests. (DOM4J's official documentation also cites this comparison.) So I chose DOM4J as the XML parsing tool for this project.

In China, the most popular parser is JDOM. Each has its strengths, but DOM4J's most distinctive feature is its heavy use of interfaces. This is the main reason it is considered more flexible than JDOM; as the masters say, "program to an interface." More and more people are using DOM4J. If you are already good with JDOM, read on for comparison. If you are about to pick a parser, DOM4J is the best option.

Its main interfaces are defined in the package org.dom4j:

interface java.lang.Cloneable
    interface org.dom4j.Node
        interface org.dom4j.Attribute
        interface org.dom4j.Branch
            interface org.dom4j.Document
            interface org.dom4j.Element
        interface org.dom4j.CharacterData
            interface org.dom4j.CDATA
            interface org.dom4j.Comment
            interface org.dom4j.Text
        interface org.dom4j.DocumentType
        interface org.dom4j.Entity
        interface org.dom4j.ProcessingInstruction

The hierarchy makes a lot clear at a glance: most of these interfaces extend Node. Knowing these relationships helps you avoid ClassCastExceptions in the programs you write.
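To make the hierarchy concrete, here is a minimal sketch, assuming dom4j 2.x is on the classpath. It walks the children of a root element and checks each node's interface with instanceof before casting. The class NodeKinds and its method childKinds are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Comment;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.Text;

public class NodeKinds {
    // Classify each direct child of the root element. Checking the interface
    // type with instanceof before casting avoids ClassCastException.
    public static List<String> childKinds(String xml) throws DocumentException {
        Element root = DocumentHelper.parseText(xml).getRootElement();
        List<String> kinds = new ArrayList<>();
        for (int i = 0, size = root.nodeCount(); i < size; i++) {
            Node node = root.node(i);
            if (node instanceof Element) {
                kinds.add("Element:" + ((Element) node).getName());
            } else if (node instanceof Comment) {
                kinds.add("Comment");
            } else if (node instanceof Text) {
                kinds.add("Text");
            } else {
                kinds.add("Other");
            }
        }
        return kinds;
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(childKinds("<root>hello<!--note--><child/></root>"));
    }
}
```

Comment and Text both extend CharacterData rather than Element, so casting either one to Element would throw a ClassCastException. The instanceof checks route each node to the right branch instead.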

Below are some examples (some taken from the documentation that comes with DOM4J) that briefly show how to use these interfaces.

1.1. Reading and parsing XML documents

Reading and writing XML documents relies mainly on the org.dom4j.io package, which provides the DOMReader and SAXReader classes. DOMReader converts an existing W3C DOM tree, and SAXReader parses files and streams. Both produce a dom4j Document and are used in the same way, which is the benefit of relying on interfaces.

import java.io.File;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

// Read XML from a file: takes a file name and returns the parsed document
public Document read(String fileName) throws DocumentException {
    SAXReader reader = new SAXReader();
    Document document = reader.read(new File(fileName));
    return document;
}

SAXReader's read method is overloaded and can read from an InputStream, a File, a URL, and so on. The resulting Document object holds the entire XML document. In my experience, characters are decoded according to the encoding declared in the XML header. If you run into garbled characters, make sure the encoding names are consistent everywhere.
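The sketch below assumes dom4j 2.x. It reads from an InputStream instead of a File and encodes the bytes with the same charset that the XML header declares, which is the consistency the paragraph above recommends. ReadFromStream and parse are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public class ReadFromStream {
    // Parse XML from raw bytes. The parser decodes the bytes according to the
    // encoding declared in the XML header, so the two must agree.
    public static Document parse(byte[] bytes) throws DocumentException {
        SAXReader reader = new SAXReader();
        return reader.read(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws DocumentException {
        // The title text is the Chinese word for "Chinese", written with Unicode escapes
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<books><title>\u4e2d\u6587</title></books>";
        // Encode with the same charset that the header declares (UTF-8)
        Document document = parse(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(document.getRootElement().elementText("title"));
    }
}
```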

1.2. Obtaining the root node

The second step after reading is to get the root node. Anyone familiar with XML knows that all XML processing starts from the root element.

public Element getRootElement(Document doc) {
    return doc.getRootElement();
}

1.3. Traversing the XML tree

DOM4J provides at least three ways to traverse nodes:

1) Enumeration (Iterator)

// root is an Element, for example the one returned by getRootElement()

// Enumerate all child elements
for (Iterator i = root.elementIterator(); i.hasNext(); ) {
    Element element = (Element) i.next();
    // do something
}
// Enumerate the child elements named "foo"
for (Iterator i = root.elementIterator("foo"); i.hasNext(); ) {
    Element foo = (Element) i.next();
    // do something
}
// Enumerate the attributes
for (Iterator i = root.attributeIterator(); i.hasNext(); ) {
    Attribute attribute = (Attribute) i.next();
    // do something
}
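This sketch assumes dom4j 2.x; the 1.x versions described here return raw types. In 2.x, elements(), elements(String) and attributes() return typed lists, so the same three enumerations can be written with for-each loops and no casts. IterateChildren is a hypothetical helper class.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class IterateChildren {
    // Names of all child elements, in document order
    public static List<String> childNames(Element root) {
        List<String> names = new ArrayList<>();
        for (Element child : root.elements()) {
            names.add(child.getName());
        }
        return names;
    }

    // Names of all attributes of the element
    public static List<String> attributeNames(Element element) {
        List<String> names = new ArrayList<>();
        for (Attribute attribute : element.attributes()) {
            names.add(attribute.getName());
        }
        return names;
    }

    public static void main(String[] args) throws DocumentException {
        Element root = DocumentHelper.parseText(
                "<root id=\"1\" lang=\"en\"><foo/><bar/><foo/></root>").getRootElement();
        System.out.println(childNames(root));
        System.out.println(attributeNames(root));
        // elements("foo") returns only the children named foo
        System.out.println(root.elements("foo").size());
    }
}
```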

2) Recursion

Recursion could also use an Iterator for the enumeration, but the documentation provides an alternative that walks the nodes by index:

public void treeWalk(Document document) {
    treeWalk(document.getRootElement());
}

public void treeWalk(Element element) {
    for (int i = 0, size = element.nodeCount(); i < size; i++) {
        Node node = element.node(i);
        if (node instanceof Element) {
            treeWalk((Element) node);
        } else {
            // do something with non-element nodes (text, comments, ...)
        }
    }
}
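As a sketch of the index-based recursion above, the hypothetical TreeOutline class below builds an indented outline of element names. It descends only into child nodes that are Elements and assumes dom4j on the classpath.

```java
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;

public class TreeOutline {
    // Recursively append element names, indenting two spaces per level
    public static void outline(Element element, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(element.getName()).append('\n');
        for (int i = 0, size = element.nodeCount(); i < size; i++) {
            Node node = element.node(i);
            if (node instanceof Element) {
                outline((Element) node, depth + 1, out); // descend into child elements
            }
        }
    }

    public static String outline(String xml) throws DocumentException {
        StringBuilder out = new StringBuilder();
        outline(DocumentHelper.parseText(xml).getRootElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws DocumentException {
        System.out.print(outline("<a><b><c/></b><d/></a>"));
    }
}
```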

3) The Visitor pattern

Most exciting of all is DOM4J's support for the Visitor pattern, which can greatly reduce the amount of code and make it easier to understand. Anyone who knows design patterns knows that Visitor is one of the GoF design patterns. The idea is that two types hold references to each other, and one acts as a Visitor that visits many Visitable objects. To use the Visitor pattern in DOM4J (not covered in the quick-start documentation), you only need to write a class that implements the Visitor interface.

public class MyVisitor extends VisitorSupport {
    public void visit(Element element) {
        System.out.println(element.getName());
    }

    public void visit(Attribute attr) {
        System.out.println(attr.getName());
    }
}

Usage: root.accept(new MyVisitor()). The Visitor interface provides several overloads of visit(), and the overload that is called depends on the type of object encountered in the XML. The class above implements only the Element and Attribute overloads, the two most commonly used. VisitorSupport is the Default Adapter that DOM4J provides for the Visitor interface (the Default Adapter pattern). It gives empty implementations of every visit(...) method, which keeps your code short. Note that the Visitor traverses all child nodes automatically: calling root.accept(new MyVisitor()) visits root and every node beneath it. The first time I used it, I thought I had to traverse the tree myself, so I called the Visitor recursively, with predictable results.
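The following sketch assumes dom4j 2.x. It shows that a single accept() call on the root reaches every descendant. The hypothetical NameCollector records each element name and each attribute name (prefixed with @) without doing any recursion of its own.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.VisitorSupport;

public class NameCollector extends VisitorSupport {
    private final List<String> names = new ArrayList<>();

    @Override
    public void visit(Element element) {
        names.add(element.getName());
    }

    @Override
    public void visit(Attribute attr) {
        names.add("@" + attr.getName());
    }

    public List<String> getNames() {
        return names;
    }

    // One accept() call on the root visits every descendant; no manual recursion
    public static List<String> collect(String xml) throws DocumentException {
        NameCollector collector = new NameCollector();
        DocumentHelper.parseText(xml).getRootElement().accept(collector);
        return collector.getNames();
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(collect("<a x=\"1\"><b/><c><d/></c></a>"));
    }
}
```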

1.4. XPath support

DOM4J has good XPath support. To access a node, for example, you can select it directly with an XPath expression.

public void bar(Document document) {
    List list = document.selectNodes("//foo/bar");
    Node node = document.selectSingleNode("//foo/bar/author");
    String name = node.valueOf("@name");
}
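This is a self-contained version of the XPath calls above. It assumes dom4j 2.x with jaxen on the classpath, because dom4j delegates XPath evaluation to jaxen. XPathDemo and its methods are hypothetical, and the document is a small inline sample.

```java
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Node;

public class XPathDemo {
    // Number of nodes matched by //foo/bar
    public static int countBars(Document document) {
        List<Node> list = document.selectNodes("//foo/bar");
        return list.size();
    }

    // Value of the name attribute of the first //foo/bar/author, or null if none matches
    public static String authorName(Document document) {
        Node node = document.selectSingleNode("//foo/bar/author");
        return node == null ? null : node.valueOf("@name");
    }

    public static void main(String[] args) throws DocumentException {
        Document document = DocumentHelper.parseText(
                "<foo><bar><author name=\"James\"/></bar><bar/></foo>");
        System.out.println(countBars(document));
        System.out.println(authorName(document));
    }
}
```

Checking selectSingleNode's result for null, as authorName does, avoids a NullPointerException when the expression matches nothing.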

For example, if you wanted to find all the hyperlinks in an XHTML document, the following code would do it:

public void findLinks(Document document) throws DocumentException {
    List list = document.selectNodes("//a/@href");
    for (Iterator iter = list.iterator(); iter.hasNext(); ) {
        Attribute attribute = (Attribute) iter.next();
        String url = attribute.getValue();
    }
}

1.5. Converting between strings and XML

It is very common to need to convert a string to XML, or vice versa:

// XML to string
Document document = ...;
String text = document.asXML();

// String to XML
String xml = "<person> <name>James</name> </person>";
Document parsed = DocumentHelper.parseText(xml);
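Here is a minimal round trip between the two directions, assuming dom4j 2.x; StringRoundTrip is a hypothetical name. parseText turns the string into a Document, and asXML() turns the Document back into text.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;

public class StringRoundTrip {
    // Parse a string into a Document, then serialize it back with asXML()
    public static String roundTrip(String text) throws DocumentException {
        Document document = DocumentHelper.parseText(text);
        return document.asXML();
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(roundTrip("<person> <name>James</name> </person>"));
    }
}
```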

1.6. Transforming XML with XSLT

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

import org.dom4j.Document;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;

// stylesheet is the system ID (file path or URL) of the XSLT file
public Document styleDocument(Document document, String stylesheet) throws Exception {
    // Load the transformer using JAXP
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(
        new StreamSource(stylesheet)
    );
    // Now let's style the given document
    DocumentSource source = new DocumentSource(document);
    DocumentResult result = new DocumentResult();
    transformer.transform(source, result);
    // Return the transformed document
    Document transformedDoc = result.getDocument();
    return transformedDoc;
}
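To try the same steps without a stylesheet file, the sketch below reads the stylesheet from a string instead. It assumes dom4j 2.x and the JDK's built-in JAXP XSLT processor. The stylesheet TITLES_XSL and the class XsltDemo are hypothetical examples that turn each book element into a t element under a new titles root.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;

public class XsltDemo {
    // Hypothetical stylesheet: <books><book>A</book></books> becomes <titles><t>A</t></titles>
    public static final String TITLES_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"/books\"><titles>"
            + "<xsl:for-each select=\"book\"><t><xsl:value-of select=\".\"/></t></xsl:for-each>"
            + "</titles></xsl:template>"
            + "</xsl:stylesheet>";

    // Same steps as styleDocument, but the stylesheet comes from a string
    public static Document transform(Document document, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        DocumentResult result = new DocumentResult();
        transformer.transform(new DocumentSource(document), result);
        return result.getDocument();
    }

    public static void main(String[] args) throws DocumentException, TransformerException {
        Document books = DocumentHelper.parseText("<books><book>A</book><book>B</book></books>");
        System.out.println(transform(books, TITLES_XSL).asXML());
    }
}
```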

1.7. Creating XML

Creating XML is usually the step that comes before writing it out, and it is as easy as using a StringBuffer.

public Document createDocument() {
    Document document = DocumentHelper.createDocument();
    Element root = document.addElement("root");

    Element author1 = root.addElement("author")
        .addAttribute("name", "James")
        .addAttribute("location", "UK")
        .addText("James Strachan");

    Element author2 = root.addElement("author")
        .addAttribute("name", "Bob")
        .addAttribute("location", "US")
        .addText("Bob McWhirter");

    return document;
}

1.8. File output

A simple way to produce output is to write a Document, or any Node, through its write method:

FileWriter out = new FileWriter("foo.xml");
document.write(out);
out.close(); // flush and release the file
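To see what write produces without touching the file system, the hypothetical helper below writes a Document into a StringWriter instead of a FileWriter. It assumes dom4j 2.x.

```java
import java.io.IOException;
import java.io.StringWriter;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;

public class SimpleWrite {
    // Node.write(Writer) serializes the node; a StringWriter keeps the result in memory
    public static String writeToString(Document document) throws IOException {
        StringWriter out = new StringWriter();
        document.write(out);
        out.close();
        return out.toString();
    }

    public static void main(String[] args) throws DocumentException, IOException {
        Document document = DocumentHelper.parseText("<root><a>1</a></root>");
        System.out.println(writeToString(document));
    }
}
```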

To change the output format, for example to pretty-print it or make it compact, use the XMLWriter class:

public void write(Document document) throws IOException {
    // Write to a file
    XMLWriter writer = new XMLWriter(
        new FileWriter("output.xml")
    );
    writer.write(document);
    writer.close();

    // Pretty-print to the console
    OutputFormat format = OutputFormat.createPrettyPrint();
    writer = new XMLWriter(System.out, format);
    writer.write(document);
    writer.flush(); // flush, but do not close System.out

    // Compact output to the console
    format = OutputFormat.createCompactFormat();
    writer = new XMLWriter(System.out, format);
    writer.write(document);
    writer.flush();
}
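This small sketch compares the two formats and assumes dom4j 2.x. The hypothetical FormatDemo.write sends the document through an XMLWriter into a StringWriter, so the pretty-printed and compact results can be compared as strings.

```java
import java.io.IOException;
import java.io.StringWriter;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public class FormatDemo {
    // Serialize the document with the given format into a string
    public static String write(Document document, OutputFormat format) throws IOException {
        StringWriter out = new StringWriter();
        XMLWriter writer = new XMLWriter(out, format);
        writer.write(document);
        writer.close(); // flushes and closes the underlying StringWriter
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document document = DocumentHelper.parseText("<root><a>1</a><b>2</b></root>");
        System.out.println(write(document, OutputFormat.createPrettyPrint()));
        System.out.println(write(document, OutputFormat.createCompactFormat()));
    }
}
```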

DOM4J really is that simple, although it also has more advanced features, such as ElementHandler, that are not covered here. If it has caught your interest, give DOM4J a try. DOM4J's official website is www.dom4j.org/ (I can't even open it). DOM4J download (SourceForge), latest version 1.4: sourceforge.net/projects/do…
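ElementHandler lets SAXReader hand you each element as soon as it has been parsed, which helps with documents too large to keep in memory. The sketch below assumes dom4j 2.x. It registers a handler for the path /books/book, records each title, and detaches the finished element. StreamingTitles is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class StreamingTitles {
    public static List<String> titles(String xml) throws DocumentException {
        final List<String> titles = new ArrayList<>();
        SAXReader reader = new SAXReader();
        // Called for every /books/book element while the document is being parsed
        reader.addHandler("/books/book", new ElementHandler() {
            @Override
            public void onStart(ElementPath path) {
                // nothing to do when the start tag is read
            }

            @Override
            public void onEnd(ElementPath path) {
                Element book = path.getCurrent(); // the complete <book> element
                titles.add(book.elementText("title"));
                book.detach(); // prune it from the tree so memory use stays small
            }
        });
        reader.read(new StringReader(xml));
        return titles;
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(titles(
                "<books><book><title>A</title></book><book><title>B</title></book></books>"));
    }
}
```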

2. Parsing XML and handling Chinese text with Dom4j

This article covers the basics of XML parsing with DOM4J, including creating XML documents; adding, modifying, and removing nodes; formatting (pretty-printing) output; and handling Chinese text. It serves as an introduction to DOM4J. Reprinted from: jalorsoft.com/holen/ Author: Chen Guang. Date: 2004-09-11


2.1. Download and installation

Dom4j is an open source project on Sourceforge.net for parsing XML. Since its first release in July 2001, several versions have been released, and the latest at the time of writing is 1.5. Dom4j was developed specifically for Java, is simple and intuitive to use, and is rapidly gaining popularity in the Java world.

You can download the latest version from http://sourceforge.net/projects/dom4j.

The full dom4j 1.5 distribution is a compressed package of about 13 MB named dom4j-1.5.zip. After unpacking it, you will find dom4j-1.5.jar, the library your application must include. There is also jaxen-1.1-beta-4.jar, the XPath engine that dom4j uses, which you generally need to include as well. Otherwise you may get java.lang.NoClassDefFoundError: org/jaxen/JaxenException at run time. The other JARs are optional.
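If you are unsure whether jaxen is on the classpath, a quick check like the hypothetical JaxenCheck below tries to load org.jaxen.JaxenException, the class named in the error above. It uses plain JDK calls, so it needs no dom4j.

```java
public class JaxenCheck {
    // Returns true if the named class can be loaded from the current classpath
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("jaxen available: " + isOnClasspath("org.jaxen.JaxenException"));
    }
}
```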

2.2. Sample XML document (holen.xml)

For illustration, we start from one XML document and base all the following operations on it.

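holen.xml

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <!--This is a test for dom4j, holen, 2004.9.11-->
    <book show="yes">
        <title>Dom4j Tutorials</title>
    </book>
    <book show="yes">
        <title>Lucene Studing</title>
    </book>
    <book show="no">
        <title>Lucene in Action</title>
    </book>
    <owner>O'Reilly</owner>
</books>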

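This is a very simple XML document. The scenario is an online bookstore with several books. Each book has a title and a show attribute that says whether it is displayed, and the last entry is the owner of the books.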
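As a sketch of working with this document, the hypothetical ShownBooks class embeds the same XML as a string. It lists the titles of the books whose show attribute is yes and assumes dom4j 2.x.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class ShownBooks {
    // The sample holen.xml content, inlined as a string
    public static final String SAMPLE =
            "<books><!--This is a test for dom4j, holen, 2004.9.11-->"
            + "<book show=\"yes\"><title>Dom4j Tutorials</title></book>"
            + "<book show=\"yes\"><title>Lucene Studing</title></book>"
            + "<book show=\"no\"><title>Lucene in Action</title></book>"
            + "<owner>O'Reilly</owner></books>";

    // Titles of the books whose show attribute is "yes"
    public static List<String> shownTitles(String xml) throws DocumentException {
        Element root = DocumentHelper.parseText(xml).getRootElement();
        List<String> titles = new ArrayList<>();
        for (Element book : root.elements("book")) {
            if ("yes".equals(book.attributeValue("show"))) {
                titles.add(book.elementText("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(shownTitles(SAMPLE));
    }
}
```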

2.3. Creating an XML document

/**
 * Creates an XML document and saves it to the file named by the parameter.
 * @param filename the name of the file to create
 * @return 0 on failure, 1 on success
 */
public int createXMLFile(String filename) {
    /** Operation result: 0 means failure, 1 means success */
    int returnValue = 0;
    /** Create the document object */
    Document document = DocumentHelper.createDocument();
    /** Create the root element "books" */
    Element booksElement = document.addElement("books");
    /** Add a comment */
    booksElement.addComment("This is a test for dom4j, holen, 2004.9.11");
    /** Add the first book node */
    Element bookElement = booksElement.addElement("book");
    /** Add the show attribute */
    bookElement.addAttribute("show", "yes");
    /** Add the title node */
    Element titleElement = bookElement.addElement("title");
    /** Set the text of title */
    titleElement.setText("Dom4j Tutorials");

    /** Add the other two books in the same way */
    bookElement = booksElement.addElement("book");
    bookElement.addAttribute("show", "yes");
    titleElement = bookElement.addElement("title");
    titleElement.setText("Lucene Studing");
    bookElement = booksElement.addElement("book");
    bookElement.addAttribute("show", "no");
    titleElement = bookElement.addElement("title");
    titleElement.setText("Lucene in Action");

    /** Add the owner node */
    Element ownerElement = booksElement.addElement("owner");
    ownerElement.setText("O'Reilly");

    try {
        /** Write the document to the file */
        XMLWriter writer = new XMLWriter(new FileWriter(new File(filename)));
        writer.write(document);
        writer.close();
        /** Success: return 1 */
        returnValue = 1;
    } catch (Exception ex) {
        ex.printStackTrace();
    }

    return returnValue;
}

Description:

Document document = DocumentHelper.createDocument();

This line creates an XML document object.

Element booksElement = document.addElement("books");

This line creates an XML element, here the root node. Element has several important methods:

  • addComment: adds a comment
  • addAttribute: adds an attribute
  • addElement: adds a child element
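The three methods can be seen together in the short sketch below, which assumes dom4j 2.x; BuildDemo is a hypothetical name. Because addElement and addAttribute return the element, calls can be chained, and asXML() shows the result.

```java
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class BuildDemo {
    // Build a tiny tree with addComment, addElement and addAttribute
    public static String build() {
        Document document = DocumentHelper.createDocument();
        Element books = document.addElement("books");   // root element
        books.addComment("a comment");                   // addComment
        books.addElement("book")                         // addElement
             .addAttribute("show", "yes")                // addAttribute
             .addText("Dom4j");
        return books.asXML();
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```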

By default, the generated XML file is hard to read. You can control the output with the createCompactFormat() or createPrettyPrint() methods of the OutputFormat class. Compact output is the default, as discussed in more detail later.

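The generated holen.xml file contains the following:

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <!--This is a test for dom4j, holen, 2004.9.11-->
    <book show="yes">
        <title>Dom4j Tutorials</title>
    </book>
    <book show="yes">
        <title>Lucene Studing</title>
    </book>
    <book show="no">
        <title>Lucene in Action</title>
    </book>
    <owner>O'Reilly</owner>
</books>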

2.4. Modifying XML documents

There are three modification tasks, in order:

  1. If the show property of the book node is yes, change it to no
  2. Change the owner entry to Tshinghua and add the date node
  3. If the title content is Dom4j Tutorials, delete the node

/**
 * Shows how to add, modify and delete nodes with dom4j.
 * @param filename the file to modify
 * @param newfilename the file to save the result as
 * @return 0 on failure, 1 on success
 */
public int ModiXMLFile(String filename, String newfilename) {
    int returnValue = 0;
    try {
        SAXReader saxReader = new SAXReader();
        Document document = saxReader.read(new File(filename));
        /** If the show attribute of a book is yes, change it to no */
        /** Use XPath to find the attributes */
        List list = document.selectNodes("/books/book/@show");
        Iterator iter = list.iterator();
        while (iter.hasNext()) {
            Attribute attribute = (Attribute) iter.next();
            if (attribute.getValue().equals("yes")) {
                attribute.setValue("no");
            }
        }

        /**
         * Change the owner to Tshinghua, add a date child to owner
         * with the content 2004-09-11, and give date a type attribute
         */
        list = document.selectNodes("/books/owner");
        iter = list.iterator();
        if (iter.hasNext()) {
            Element ownerElement = (Element) iter.next();
            ownerElement.setText("Tshinghua");
            Element dateElement = ownerElement.addElement("date");
            dateElement.setText("2004-09-11");
            dateElement.addAttribute("type", "Gregorian calendar");
        }

        /** If a title is Dom4j Tutorials, delete that title node */
        list = document.selectNodes("/books/book");
        iter = list.iterator();
        while (iter.hasNext()) {
            Element bookElement = (Element) iter.next();
            Iterator iterator = bookElement.elementIterator("title");
            while (iterator.hasNext()) {
                Element titleElement = (Element) iterator.next();
                if (titleElement.getText().equals("Dom4j Tutorials")) {
                    bookElement.remove(titleElement);
                }
            }
        }

        try {
            /** Write the document to the new file */
            XMLWriter writer = new XMLWriter(new FileWriter(new File(newfilename)));
            writer.write(document);
            writer.close();
            /** Success: return 1 */
            returnValue = 1;
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return returnValue;
}

Description:

List list = document.selectNodes("/books/book/@show" );
list = document.selectNodes("/books/book");

The code above looks up nodes with XPath.

setValue() and setText() modify the content of attributes and elements.

remove() deletes nodes or attributes.
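Here is a minimal sketch of the two remove() overloads, assuming dom4j 2.x; RemoveDemo is hypothetical. It removes a book element's show attribute and its title child, leaving other children in place.

```java
import org.dom4j.Attribute;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class RemoveDemo {
    // Remove the show attribute and the title child from the root <book> element
    public static Element strip(String xml) throws DocumentException {
        Element book = DocumentHelper.parseText(xml).getRootElement();
        Attribute show = book.attribute("show");
        if (show != null) {
            book.remove(show);   // remove(Attribute)
        }
        Element title = book.element("title");
        if (title != null) {
            book.remove(title);  // remove(Element)
        }
        return book;
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(strip("<book show=\"yes\"><title>T</title><note>N</note></book>").asXML());
    }
}
```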

2.5. Formatting the output and specifying the encoding

The default output is compact, and the default encoding is UTF-8. In our applications, however, we usually need Chinese text and want automatically indented output, which requires the OutputFormat class.

/**
 * Formats an XML document and solves the Chinese encoding problem.
 * @param filename the file to format (it is overwritten with the result)
 * @return 0 on failure, 1 on success
 */
public int formatXMLFile(String filename) {
    int returnValue = 0;
    try {
        SAXReader saxReader = new SAXReader();
        Document document = saxReader.read(new File(filename));
        XMLWriter writer = null;
        /** Indented output, similar to how IE displays XML */
        OutputFormat format = OutputFormat.createPrettyPrint();
        /** Specify the XML encoding */
        format.setEncoding("GBK");
        /** Pass an OutputStream so that XMLWriter itself encodes the bytes as GBK */
        writer = new XMLWriter(new java.io.FileOutputStream(new File(filename)), format);
        writer.write(document);
        writer.close();
        /** Success: return 1 */
        returnValue = 1;
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return returnValue;
}

Description:

OutputFormat format = OutputFormat.createPrettyPrint();

This selects indented output rather than compact output.

format.setEncoding("GBK");

This specifies GBK as the encoding.

XMLWriter writer = new XMLWriter(new java.io.FileOutputStream(new File(filename)), format);

Compared with the previous two methods, this adds an OutputFormat object, which specifies the layout and the encoding. Passing an OutputStream rather than a FileWriter matters here. A FileWriter encodes characters with the platform default charset, which may not be GBK. When XMLWriter writes to an OutputStream, it encodes the text with the format's encoding, so the bytes match the encoding="GBK" declaration.
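The sketch below assumes dom4j 2.x. It writes a document containing Chinese text to an in-memory OutputStream with the GBK encoding, so you can check that the declaration and the bytes agree. GbkOutput is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public class GbkOutput {
    // Pretty-print the document as GBK-encoded bytes
    public static byte[] toGbk(Document document) throws IOException {
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setEncoding("GBK");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Given an OutputStream, XMLWriter encodes the text with the format's
        // encoding, matching the encoding="GBK" it writes into the declaration
        XMLWriter writer = new XMLWriter(bytes, format);
        writer.write(document);
        writer.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Document document = DocumentHelper.createDocument();
        // The title text is the Chinese word for "Chinese", written with Unicode escapes
        document.addElement("books").addElement("title").setText("\u4e2d\u6587");
        System.out.println(new String(toGbk(document), "GBK"));
    }
}
```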

2.6. Required imports

The methods above are presented as separate fragments. The classes they need to import are as follows:

//package com.holen.dom4j;
 
import java.io.File;
import java.io.FileWriter;
import java.util.Iterator;
import java.util.List;
 
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;