Chapter 27 Customizing the SAX Parser: Performing Custom Entity Resolution

Perform custom entity resolution

XML documents may contain references to external DTDs or other entities. By default, InterSystems IRIS tries to find the source documents for these entities and resolve them. To control how InterSystems IRIS resolves external entities, use the following steps:

  1. Define the entity resolver class.

This class must extend %XML.SAX.EntityResolver and must implement the resolveEntity() method, which has the following signature:

method resolveEntity(publicID As %Library.String, systemID As %Library.String) as %Library.Integer

This method is called whenever the XML processor finds a reference to an external entity, such as a DTD; here publicID and systemID are the public and system identifier strings for that entity.

The method should fetch the entity or document, return it as a stream, and wrap the stream in an instance of %XML.SAX.StreamAdapter. This class provides the methods the parser uses to determine the characteristics of the stream.

If the entity cannot be resolved, the method should return $$$NULLOREF to indicate this to the SAX parser.

Although the method signature declares the return value as %Library.Integer, the method should return an instance of %XML.SAX.StreamAdapter or of a subclass of that class.

In addition, identifiers that refer to external entities are always passed to the resolveEntity() method exactly as they are specified in the document. Specifically, if such an identifier uses a relative URL, it is passed as a relative URL. This means that the actual location of the referencing document is not passed to the resolveEntity() method, and the entity cannot be resolved. In this case, use the default entity resolver instead of a custom entity resolver.

  2. When reading an XML document, do the following:

     a. Create an instance of the entity resolver class.
     b. Use this instance when reading the XML document, as described in "Specifying parser options" earlier in this chapter.
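The two sub-steps above can be sketched as follows for the case where the document is read with %XML.Reader, which exposes an EntityResolver property. This is a minimal sketch, not a definitive implementation: the class name CustomResolver.ReaderDemo, the method name OpenWithResolver, and the default file path are hypothetical, and CustomResolver.Resolver is the resolver class defined later in this section.

```objectscript
/// Hypothetical demo class; the class name, method name, and file path are illustrative only.
Class CustomResolver.ReaderDemo Extends %RegisteredObject
{

ClassMethod OpenWithResolver(file As %String = "c:/temp/html.xml") As %Status
{
    // a. Create an instance of the entity resolver class
    Set resolver=##class(CustomResolver.Resolver).%New()

    // b. Attach the resolver to the reader before opening the document
    Set reader=##class(%XML.Reader).%New()
    Set reader.EntityResolver=resolver

    // OpenFile() parses the document; references to external entities
    // are passed to the resolveEntity() method of the resolver
    Set status=reader.OpenFile(file)
    If $$$ISERR(status) {Do $system.OBJ.DisplayError(status)}
    Quit status
}

}
```

With %XML.TextReader or %XML.SAX.Parser, the resolver instance is instead passed as an argument to the parse method, as the demo class later in this section does.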

Example 1

For example, consider the following XML document:

<!DOCTYPE html SYSTEM "c://temp/html.dtd">
<html>
<head><title></title></head>
<body>
<p>Some &lt;xhtml-content&gt; with custom entities &entity1; and &entity2;.</p>
<p>Here is another paragraph with &entity1; again.</p>
</body></html>

This document uses the following DTD:

<!ENTITY entity1
         PUBLIC "-//WRC//TEXT entity1//EN"
         "http://www.intersystems.com/xml/entities/entity1">
<!ENTITY entity2
         PUBLIC "-//WRC//TEXT entity2//EN"
         "http://www.intersystems.com/xml/entities/entity2">
<!ELEMENT html (head, body)>
<!ELEMENT head (title)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (p)>
<!ELEMENT p (#PCDATA)>

To read this document, you need a custom entity resolver like this:

Class CustomResolver.Resolver Extends %XML.SAX.EntityResolver
{

Method resolveEntity(publicID As %Library.String, systemID As %Library.String) As %Library.Integer
{
    Try {
        Set res=##class(%Stream.TmpBinary).%New()

        //check if we are here to resolve a custom entity
        If systemID="http://www.intersystems.com/xml/entities/entity1"
        {
            Do res.Write("Value for entity1")
            Set return=##class(%XML.SAX.StreamAdapter).%New(res)
        }
        Elseif systemID="http://www.intersystems.com/xml/entities/entity2"
        {
            Do res.Write("Value for entity2")
            Set return=##class(%XML.SAX.StreamAdapter).%New(res)
        }
        Else
        {
            //otherwise call the default resolver
            Set res=##class(%XML.SAX.EntityResolver).%New()
            Set return=res.resolveEntity(publicID,systemID)
        }
    }
    Catch
    {
        //any error means the entity cannot be resolved
        Set return=$$$NULLOREF
    }
    Quit return
}

}

The following class contains a demo method that parses the file shown earlier and uses this custom resolver:

Include (%occInclude, %occSAX)

Class CustomResolver.ParseFileDemo
{

ClassMethod ParseFile()
{
    Set res=##class(CustomResolver.Resolver).%New()
    Set file="c:/temp/html.xml"
    Set parsemask=$$$SAXALLEVENTS+$$$SAXERROR

    //pass the resolver as the third argument and the event mask as the fifth
    Set status=##class(%XML.TextReader).ParseFile(file,.textreader,res,,parsemask,,0)
    If $$$ISERR(status) {Do $system.OBJ.DisplayError(status) Quit}

    Write !,"Parsing the file ",file,!
    Write "Custom entities in this file:"
    While textreader.Read()
    {
        If textreader.NodeType="entity" {
            Write !,"Node:", textreader.seq
            Write !,"    name: ", textreader.Name
            Write !,"    value: ", textreader.Value
        }
    }

}

}

The output of this method in a terminal session is shown below:

GXML>d ##class(CustomResolver.ParseFileDemo).ParseFile()

Parsing the file c:/temp/html.xml
Custom entities in this file:
Node:13
    name: entity1
    value: Value for entity1
Node:15
    name: entity2
    value: Value for entity2
Node:21
    name: entity1
    value: Value for entity1

Example 2

For example, suppose that you read an XML document that contains the following:

<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
 "c:\test\doctypes\docbook\docbookx.dtd">

In this case, the resolveEntity method is called with publicID set to -//OASIS//DTD DocBook XML V4.1.2//EN and systemID set to c:\test\doctypes\docbook\docbookx.dtd.

The resolveEntity method determines the correct source for the external entity, returns it as a stream, and wraps it in an instance of %XML.SAX.StreamAdapter. The XML parser reads the entity definition from this specialized stream.

For an example, refer to the %XML.Catalog and %XML.CatalogResolver classes included in the InterSystems IRIS library. The %XML.Catalog class defines a simple database that associates public and system identifiers with URLs. The %XML.CatalogResolver class is an entity resolver class that uses this database to look up the URL for a given identifier. The %XML.Catalog class can load its database from an SGML-style catalog file; this file maps identifiers to URLs in a standard format.
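As a minimal sketch of such a method, the following hypothetical resolver recognizes the DocBook public identifier and serves the DTD from a local file. The class name CustomResolver.DocBookResolver and the decision to key on publicID rather than systemID are assumptions made for illustration; the file path is the one from the DOCTYPE declaration above. Anything the resolver does not recognize, or cannot open, is reported as unresolvable by returning $$$NULLOREF.

```objectscript
/// Hypothetical resolver; the class name and lookup logic are assumptions for illustration.
Class CustomResolver.DocBookResolver Extends %XML.SAX.EntityResolver
{

Method resolveEntity(publicID As %Library.String, systemID As %Library.String) As %Library.Integer
{
    // Default: tell the SAX parser that the entity cannot be resolved
    Set return=$$$NULLOREF
    Try {
        If publicID="-//OASIS//DTD DocBook XML V4.1.2//EN" {
            // Local copy of the DTD (path taken from the DOCTYPE declaration)
            Set path="c:\test\doctypes\docbook\docbookx.dtd"
            If ##class(%File).Exists(path) {
                // Return the file as a stream wrapped in a stream adapter
                Set stream=##class(%Stream.FileBinary).%New()
                Set sc=stream.LinkToFile(path)
                If $$$ISOK(sc) {
                    Set return=##class(%XML.SAX.StreamAdapter).%New(stream)
                }
            }
        }
    }
    Catch {
        // Any error is treated as "cannot be resolved"
        Set return=$$$NULLOREF
    }
    Quit return
}

}
```

Keying on the public identifier lets the same document be parsed on machines where the path in the system identifier differs; a lookup table of identifiers would generalize this.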