Chapter 26 Customizing How the SAX Parser Is Used

Whenever InterSystems IRIS reads an XML document, it uses the InterSystems IRIS SAX (Simple API for XML) parser. This chapter describes the options for controlling the behavior of this parser.

About the IRIS SAX parser

The InterSystems IRIS SAX parser is used whenever InterSystems IRIS reads an XML document.

It is an event-driven XML parser that reads XML files and issues callbacks when it finds items of interest, such as the start of an XML element, the start of a DTD, and so on.

(More precisely, the parser works in conjunction with the content handler, which issues callbacks. This distinction is important only if you customize the SAX interface, as described in “Creating a Custom Content Handler” later in this chapter.)

The parser uses the standard Xerces-C++ library, which complies with the XML 1.0 recommendation and many related standards.

Available parser options

You can control the behavior of a SAX parser in the following ways:

  • You can set flags that specify the type of validation and processing to perform.

Note that the parser always checks whether the document is well-formed XML.

  • You can specify the events of interest (that is, the items that you want the parser to find). To do this, you specify a mask that indicates the events of interest.
  • You can provide a schema specification against which to validate the document.
  • You can disable entity resolution by using a special-purpose entity resolver.
  • You can specify a timeout period for entity resolution.
  • If you need to control how the parser finds the definitions of the entities in the document, you can specify a more general custom entity resolver.
  • If the source document is accessed through a URL, you can specify the request sent to the web server as an instance of %Net.HttpRequest.
  • You can specify a custom content handler.
  • You can use HTTPS.

The available options depend on how you use the InterSystems IRIS SAX parser, as shown in the following table:

SAX parser options in the %XML classes

Option                                            %XML.Reader    %XML.TextReader  %XML.XPATH.Document  %XML.SAX.Parser
Specify parser flags                              supported      supported        supported            supported
Specify parsing events of interest (for example,  not supported  supported        not supported        supported
  start of element, end of element, comment)
Specify a schema specification                    supported      supported        supported            supported
Disable or otherwise customize entity resolution  supported      supported        supported            supported
Specify a custom HTTP request (if parsing a URL)  not supported  supported        not supported        supported
Specify a content handler                         not supported  not supported    not supported        supported
Parse a document at an HTTPS location             supported      not supported    not supported        supported
Resolve entities at HTTPS locations               not supported  not supported    not supported        supported

Specifying parser options

How you specify the parser behavior depends on how you use the InterSystems IRIS SAX parser:

  • If you are using %XML.Reader, you can set the Timeout, SAXFlags, SAXSchemaSpec, and EntityResolver properties of the reader instance.

For example:

   #include %occInclude
   #include %occSAX
   // set the parser options we want
   Set flags = $$$SAXVALIDATION
               + $$$SAXNAMESPACES
               + $$$SAXNAMESPACEPREFIXES
               + $$$SAXVALIDATIONSCHEMA

   // create the reader and apply the flags to it
   Set reader=##class(%XML.Reader).%New()
   Set reader.SAXFlags=flags

These macros are defined in the %occSAX.inc include file.

  • In other cases, specify the options as arguments of the method you are using. For example:

   #include %occInclude
   #include %occSAX

   // set the parser options we want
   Set flags = $$$SAXVALIDATION
               + $$$SAXNAMESPACES
               + $$$SAXNAMESPACEPREFIXES
               + $$$SAXVALIDATIONSCHEMA

   // pass the flags as the fourth argument; doc is returned by reference
   Set status=##class(%XML.TextReader).ParseFile(myfile,.doc,,flags)
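For the %XML.Reader case, the following is a fuller sketch that also sets the entity-resolution timeout and checks the status returned by OpenFile(). The variable myfile is a hypothetical name for the path of the document, and the Timeout value of 10 is arbitrary; this chapter does not state its unit, which is assumed here to be seconds.

   #include %occInclude
   #include %occSAX
   // myfile is a hypothetical variable that holds the path of the XML document
   Set reader=##class(%XML.Reader).%New()
   // request schema validation with namespace processing
   Set reader.SAXFlags=$$$SAXVALIDATION+$$$SAXNAMESPACES+$$$SAXVALIDATIONSCHEMA
   // timeout for entity resolution (unit assumed to be seconds)
   Set reader.Timeout=10
   // open the document; the options set above apply to this parse
   Set status=reader.OpenFile(myfile)
   If $$$ISERR(status) {Do $System.Status.DisplayError(status) Quit}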

Setting the parser flags

The %occSAX.inc include file lists the flags that you can use to control the validation performed by the Xerces parser. The basic flags are as follows:

  • $$$SAXVALIDATION – Specifies whether to perform schema validation. If this flag is on (the default), all validation errors are reported.
  • $$$SAXNAMESPACES – Specifies whether to recognize namespaces. If this flag is on (the default), the parser processes namespaces. If this flag is off, InterSystems IRIS sets the localname argument of the startElement() callback of %XML.SAX.ContentHandler to an empty string.
  • $$$SAXNAMESPACEPREFIXES – Specifies whether to process namespace prefixes. If this flag is on, the parser reports the original prefixed names and the attributes used for namespace declarations. By default, this flag is off.
  • $$$SAXVALIDATIONDYNAMIC – Specifies whether validation is performed dynamically. If this flag is on (the default), validation is performed only if a grammar is specified.
  • $$$SAXVALIDATIONSCHEMA – Specifies whether to validate against a schema. If this flag is on (the default), validation is performed against the given schema, if any.
  • $$$SAXVALIDATIONSCHEMAFULLCHECKING – Specifies whether to perform full schema constraint checking, including checks that are time-consuming or memory-intensive. If this flag is on, all constraint checks are performed. By default, this flag is off.
  • $$$SAXVALIDATIONREUSEGRAMMAR – Specifies whether to cache the grammar for reuse in later parses within the same InterSystems IRIS process. By default, this flag is off.
  • $$$SAXVALIDATIONPROHIBITDTDS – A special flag that causes the parser to throw an error if it encounters a DTD. Use this flag if you need to prevent the processing of DTDs. To use this flag, you must explicitly add $$$SAXVALIDATIONPROHIBITDTDS to the flags that you pass to the parse methods of %XML.SAX.Parser.
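As a sketch of adding $$$SAXVALIDATIONPROHIBITDTDS explicitly, the following passes it to %XML.SAX.Parser.ParseFile(). It assumes that ParseFile() takes the file name, a content handler, an entity resolver, and the flags, in that order, and it uses a plain %XML.SAX.ContentHandler instance as a do-nothing handler; check the class reference for the exact signature. The variable myfile is hypothetical.

   #include %occInclude
   #include %occSAX
   // start from the SAX defaults and explicitly add the DTD prohibition flag
   Set flags=$$$SAXDEFAULTS+$$$SAXVALIDATIONPROHIBITDTDS
   // a base content handler, used here only so that the parse can run
   Set handler=##class(%XML.SAX.ContentHandler).%New()
   // assumed argument order: file, handler, resolver (omitted), flags
   Set status=##class(%XML.SAX.Parser).ParseFile(myfile,handler,,flags)
   // a document that contains a DTD is expected to produce an error status
   If $$$ISERR(status) {Do $System.Status.DisplayError(status)}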

The following additional flags provide a useful combination of the basic flags:

  • $$$SAXDEFAULTS – Equivalent to the SAX defaults.
  • $$$SAXFULLDEFAULT – Equivalent to the SAX defaults plus the option to process namespace prefixes.
  • $$$SAXNOVALIDATION – Does not perform schema validation but does recognize namespaces and namespace prefixes. Note that the SAX parser always checks whether the document is well-formed XML.

The following snippet shows how to combine parser options:

   #include %occInclude
   #include %occSAX
   ...
   ;; set the parser options we want
   set opt = $$$SAXVALIDATION
             + $$$SAXNAMESPACES
             + $$$SAXNAMESPACEPREFIXES
             + $$$SAXVALIDATIONSCHEMA
   ...
   set status=##class(%XML.TextReader).ParseFile(myfile,.doc,,opt)
   //check status
   if $$$ISERR(status) {do $System.Status.DisplayError(status) quit}

Specifying the event mask

The basic event flags are as follows:

  • $$$SAXSTARTDOCUMENT– Instructs the parser to issue a callback when the document is started.
  • $$$SAXENDDOCUMENT – Instructs the parser to issue a callback when it reaches the end of the document.
  • $$$SAXSTARTELEMENT – Instructs the parser to issue a callback when it finds the start of an element.
  • $$$SAXENDELEMENT – Instructs the parser to issue a callback when it finds the end of an element.
  • $$$SAXCHARACTERS – Instructs the parser to issue a callback when it finds character data.
  • $$$SAXPROCESSINGINSTRUCTION – Instructs the parser to issue a callback when it finds a processing instruction.
  • $$$SAXSTARTPREFIXMAPPING – Instructs the parser to issue a callback when it finds the start of a prefix mapping.
  • $$$SAXENDPREFIXMAPPING – Instructs the parser to issue a callback when it finds the end of a prefix mapping.
  • $$$SAXIGNORABLEWHITESPACE – Instructs the parser to issue a callback when it finds ignorable whitespace. This applies only if the document has a DTD and validation is enabled.
  • $$$SAXSKIPPEDENTITY – Instructs the parser to issue a callback when it finds a skipped entity.
  • $$$SAXCOMMENT – Instructs the parser to issue a callback when it finds a comment.
  • $$$SAXSTARTCDATA – Instructs the parser to issue a callback when it finds the start of a CDATA section.
  • $$$SAXENDCDATA – Instructs the parser to issue a callback when it finds the end of a CDATA section.
  • $$$SAXSTARTDTD – Instructs the parser to issue a callback when it finds the start of a DTD.
  • $$$SAXENDDTD – Instructs the parser to issue a callback when it finds the end of a DTD.
  • $$$SAXSTARTENTITY– Instructs the parser to issue a callback when it finds the beginning of an entity.
  • $$$SAXENDENTITY– Instructs the parser to issue a callback when it finds the end of an entity.

Convenient combinations of flags

The following additional flags provide a useful combination of the basic flags:

  • $$$SAXCONTENTEVENTS – Instructs the parser to issue a callback for any event that contains "content."
  • $$$SAXLEXICALEVENT– Instructs the parser to issue a callback to any lexical event.
  • $$$SAXALLEVENTS– Instructs the parser to issue callbacks for all events.

Combining flags into a single mask

The following snippet shows how to combine multiple flags into a mask:

   #include %occInclude
   #include %occSAX
   ...
   // set the mask options we want
   set mask = $$$SAXSTARTDOCUMENT
              + $$$SAXENDDOCUMENT
              + $$$SAXSTARTELEMENT
              + $$$SAXENDELEMENT
              + $$$SAXCHARACTERS
   ...
   // create a TextReader object (doc) by reference; the mask is the fifth argument
   set status = ##class(%XML.TextReader).ParseFile(myfile,.doc,,,mask)
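After ParseFile() returns, the nodes selected by the mask can be read back from the %XML.TextReader object. The following minimal sketch continues the preceding snippet (so doc and status come from it) and walks the nodes with Read(), printing each node's type and name.

   // stop if the parse failed
   If $$$ISERR(status) {Do $System.Status.DisplayError(status) Quit}
   // Read() advances to the next node and returns false at the end of the document
   While doc.Read() {
       // with the mask above, only document, element, and character nodes should appear
       Write doc.NodeType," ",doc.Name,!
   }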

Specifying a schema document

You can specify a schema specification against which to validate the document source. Specify a string that contains a comma-separated list of namespace/URL pairs:

"namespace URL,namespace URL,namespace URL,..."

Here, namespace is the XML namespace (not the namespace prefix), and URL is the URL that gives the location of the schema document for that namespace. A single space character separates the namespace and URL values. For example, the following is a schema specification with a single namespace:

"http://www.myapp.org http://localhost/myschemas/myapp.xsd"

Here is a schema specification with two namespaces:

"http://www.myapp.org http://localhost/myschemas/myapp.xsd,http://www.other.org http://localhost/myschemas/other.xsd"
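With %XML.Reader, the schema specification goes in the SAXSchemaSpec property. The following minimal sketch reuses the single-namespace specification shown above; the namespace and URL are this section's example values, and myfile is a hypothetical variable.

   #include %occInclude
   #include %occSAX
   Set reader=##class(%XML.Reader).%New()
   // enable validation, including validation against a schema
   Set reader.SAXFlags=$$$SAXVALIDATION+$$$SAXNAMESPACES+$$$SAXVALIDATIONSCHEMA
   // namespace and schema URL separated by a single space
   Set reader.SAXSchemaSpec="http://www.myapp.org http://localhost/myschemas/myapp.xsd"
   Set status=reader.OpenFile(myfile)
   If $$$ISERR(status) {Do $System.Status.DisplayError(status) Quit}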

Disabling entity resolution

Even when the SAX flags are set to disable validation, the SAX parser still attempts to resolve external entities, which can be time-consuming, depending on their location.

The class %XML.SAX.NullEntityResolver implements an entity resolver that always returns an empty stream. Use this class if you want to disable entity resolution. Specifically, when you read the XML document, use an instance of %XML.SAX.NullEntityResolver as the entity resolver. For example:

   // create a resolver that returns an empty stream for every entity
   Set resolver=##class(%XML.SAX.NullEntityResolver).%New()
   Set reader=##class(%XML.Reader).%New()
   Set reader.EntityResolver=resolver

   Set status=reader.OpenFile(myfile)
   ...

Important: Because this change disables all external entity resolution, this technique also disables all external DTD and schema references in the XML document.
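The table earlier in this chapter shows that %XML.TextReader also supports customizing entity resolution. The following sketch passes the same kind of resolver as the third argument of ParseFile(). The position of the resolver argument is an assumption, inferred from the ParseFile(myfile,.doc,,opt) calls in this chapter where that slot is left empty; myfile is a hypothetical variable.

   #include %occInclude
   #include %occSAX
   // a resolver that returns an empty stream for every external entity
   Set resolver=##class(%XML.SAX.NullEntityResolver).%New()
   // assumed: third argument is the entity resolver; doc is returned by reference
   Set status=##class(%XML.TextReader).ParseFile(myfile,.doc,resolver)
   If $$$ISERR(status) {Do $System.Status.DisplayError(status) Quit}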