Getting around mixed content errors when using Snaplogic XML Parser

We're doing a Snaplogic proof of concept, and the UI is quite pretty, so I thought I would write some blog posts about it. This will make a break from the normal boring, badly spelled content I publish! Snaplogic works very well with JSON data in its pipelines, but the integrations that we need to build make heavy use of XML. One solution is to use the XML Parser snap, which converts the XML into JSON. Unfortunately, the XML Parser failed for the first pipeline we tried in our POC!

The error we were seeing was:

  {error:Failed to convert xml to json, stacktrace:com.snaplogic.api.ExecutionException: Encountered XML Mixed content
  \n  at com.snaplogic.snap.api.xml.XmlUtilsImpl.throwMixedContentEx...}

  "error": "Failed to convert xml to json"
  "stacktrace": "com.snaplogic.api.ExecutionException: Encountered XML Mixed content
  \n  at com.snaplogic.snap.api.xml.XmlUtilsImpl.throwMixedContentException(
  \n  at com.snaplogic.snap.api.xml.XmlUtilsImpl.copyEventsFrom(
  \n  at com.snaplogic.snap.api.xml.XmlUtilsImpl.convertToJson(
  \n  at com.snaplogic.snaps.transform.XmlParser.doWork(
  \n  at com.snaplogic.snap.api.SimpleBinarySnap.execute(
  \n  at
  \n  at
  \n  at
  \n  at$000(
  \n  at$

We were able to isolate the issue to the presence of mixed content in the input XML. The API we are using is designed for presenting results on web pages, so its responses have HTML elements mixed in with plain text, which is exactly what mixed content means. The Saxon parser used by Snaplogic doesn't know how to deal with this.

We created an example XML that causes this error:

  <root>
  <a>Text A</a>
  <b>Text B</b>
  <c>Text C</c>
  <mixedContent><tag>  </tag>Text In mixed content</mixedContent>
  </root>

The problem is with the element 'mixedContent', which has two child nodes: one is an element node and the other is a text node. (In XML, text sequences are considered nodes even though they don't have start and end tags.)
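To check an input document for this condition before it reaches the pipeline, something like the following Python sketch could be used. It is not part of Snaplogic; it uses only the standard library's ElementTree, and the function names `has_mixed_content` and `find_mixed_content` are made up for this illustration. It treats an element as mixed when it has child elements and also non-whitespace text of its own.

```python
import xml.etree.ElementTree as ET

# The example document from above.
EXAMPLE = """<root>
<a>Text A</a>
<b>Text B</b>
<c>Text C</c>
<mixedContent><tag>  </tag>Text In mixed content</mixedContent>
</root>"""


def has_mixed_content(elem):
    """True if elem has child elements and also non-whitespace text of its own."""
    children = list(elem)
    if not children:
        return False
    # ElementTree keeps an element's text nodes in two places:
    # elem.text (before the first child) and child.tail (after each child).
    texts = [elem.text] + [child.tail for child in children]
    return any(t and t.strip() for t in texts)


def find_mixed_content(xml_text):
    """Return the tags of all elements that contain mixed content."""
    root = ET.fromstring(xml_text)
    return [e.tag for e in root.iter() if has_mixed_content(e)]


print(find_mixed_content(EXAMPLE))  # prints ['mixedContent']
```

Whitespace-only text between elements (the line breaks inside 'root') is deliberately ignored, so 'root' itself is not reported.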

Luckily for us, years of fighting with XML and XSLT helped us come up with a solution. We can use the XSLT snap to remove the mixed content from the input XML before passing it on to the XML Parser snap. The following XSLT provides a generic solution: any non-whitespace text that sits next to element siblings gets wrapped in its own element, so no element ends up with both text and element children.

  <?xml version="1.0"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml"/>

    <!-- Non-whitespace text: copy it as-is when the parent has no element
         children, otherwise wrap it in mixed_content_wrapper. -->
    <xsl:template match="text()[string-length(normalize-space(.))>0]">
      <xsl:if test="count(../*)=0"><xsl:copy-of select="."/></xsl:if>
      <xsl:if test="count(../*)>0"><mixed_content_wrapper><xsl:copy-of select="."/></mixed_content_wrapper></xsl:if>
    </xsl:template>

    <!-- Identity copy for everything else. -->
    <xsl:template match="@*|*|processing-instruction()|comment()">
      <xsl:copy>
        <xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()"/>
      </xsl:copy>
    </xsl:template>

  </xsl:stylesheet>
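To sanity-check what the stylesheet is meant to do without running a pipeline, the same wrapping rule can be sketched in plain Python. This is an approximation of the XSLT written with the standard library's ElementTree, not the XSLT snap itself; `wrap_mixed_text` is a hypothetical helper name, and the sample document is a shortened version of the example above.

```python
import xml.etree.ElementTree as ET

WRAPPER = "mixed_content_wrapper"  # same element name the stylesheet uses


def _is_text(s):
    """True for text that is not empty and not whitespace-only."""
    return bool(s and s.strip())


def _wrapped(text):
    """Build a wrapper element holding the given text."""
    w = ET.Element(WRAPPER)
    w.text = text
    return w


def wrap_mixed_text(elem):
    """Wrap non-whitespace text that sits next to child elements, in place."""
    children = list(elem)
    for child in children:
        wrap_mixed_text(child)
    if not children:
        return  # text-only or empty element: leave untouched
    rebuilt = []
    if _is_text(elem.text):  # text before the first child
        rebuilt.append(_wrapped(elem.text))
        elem.text = None
    for child in children:
        rebuilt.append(child)
        if _is_text(child.tail):  # text following this child
            rebuilt.append(_wrapped(child.tail))
            child.tail = None
    elem[:] = rebuilt  # replace the child list with the rebuilt one


doc = ET.fromstring(
    "<root><a>Text A</a>"
    "<mixedContent><tag>  </tag>Text In mixed content</mixedContent></root>"
)
wrap_mixed_text(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

After the call, 'mixedContent' has two element children ('tag' and 'mixed_content_wrapper') and no loose text, while the text-only element 'a' is unchanged.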

With the mixed content wrapped, the XML Parser snap is happy and outputs JSON.

This approach can be further generalized by creating it as a Snaplogic Pattern - that will be a future learning exercise.

It does rely on converting the entire XML document into a JSON document. In an XML -> XML integration, I think it would be better to transform the XML document directly into the target XML document. I am not sure how that would fit together with the other snaps, which seem to expect JSON; for example, the Mapper snap seems to only go from JSON to JSON.
