Java XML Keeping tabs on Nodes

I am struggling to parse an XML Document with variable fields and nodes. I either get all the data from the child nodes in one clump or nothing at all.

Imagine this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<GreatGrandParent id="123456" version="5">
    <GrandParent1>
        <Parent1>
            <Child1>Text1</Child1>                 
            <Child2>Text2</Child2>
            <Child3>Text3</Child3>
        </Parent1>
        <Parent2>
            <Child1 name="something" from="0" to="999" /> 
            <Child2 name="something" from="0" to="999" />
        </Parent2>
    </GrandParent1>
<GrandParent2>
    <Parent1 id="something else" minv="0" maxv="999" name="name1"/> 
</GrandParent2>
</GreatGrandParent>

I am only interested in the elements that carry an attribute or text between the tags, but I want to identify their parents, grandparents, and so on. I know this is confusing but bear with me. Using the above example, I want Java to produce the following output:

GreatGrandParent.id="123456"
GreatGrandParent.version="5"
GreatGrandParent.GrandParent1.Parent1.Child1 = "Text1"
GreatGrandParent.GrandParent1.Parent1.Child2 = "Text2"
GreatGrandParent.GrandParent1.Parent1.Child3 = "Text3"
GreatGrandParent.GrandParent1.Parent2.Child1.name="something"
GreatGrandParent.GrandParent1.Parent2.Child1.from="0"
GreatGrandParent.GrandParent1.Parent2.Child1.to="999"
etc....

The node names can vary and don't necessarily follow this pattern; the example is only meant to show what I am after.

I can do this by reading the file as text and scanning it line by line, but that is neither elegant nor preferable. The fields are highly variable, and before processing I have no idea which fields I am going to get or how each field relates to the ones before and after it. That leads to convoluted code which is difficult to debug.

I am not looking for a solution, just a pointer in the right direction. Many of the examples on the internet assume a static structure and advance knowledge of the available fields, which is of no use to me.

I am looking for something along the lines of Mkyong's ideas on https://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/ - specifically the Looping the Node section, which goes part of the way. However, that code dumps a lot of superfluous output: for example, the parent lists all the values of its children in one block, which is not what I want.

Does anyone have a suggestion for this?

1 answer

  • answered 2017-11-13 08:06 Ralf Renz

    You could use a SAX parser. Extend org.xml.sax.helpers.DefaultHandler and override startElement() and endElement(). You need both methods to keep track of the path to the current element, and in startElement() you can check for attributes and print them. To pick up the text between the tags you would also override characters().
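
A minimal sketch of the idea above, assuming the default (non-namespace-aware) JAXP SAX parser that ships with the JDK. The handler keeps a stack of element names, emits one line per attribute in startElement(), and emits the trimmed text of an element in endElement(). The names PathPrinter, PathHandler and flatten() are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathPrinter {

    /** Hypothetical handler: tracks the element path, records attributes and leaf text. */
    static class PathHandler extends DefaultHandler {
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        final List<String> lines = new ArrayList<>();

        private String currentPath() {
            // push() adds at the head, so the descending iterator runs root -> current element.
            StringBuilder sb = new StringBuilder();
            for (Iterator<String> it = path.descendingIterator(); it.hasNext();) {
                if (sb.length() > 0) sb.append('.');
                sb.append(it.next());
            }
            return sb.toString();
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            path.push(qName);
            text.setLength(0); // drop any text (usually indentation) seen before this element
            String p = currentPath();
            for (int i = 0; i < atts.getLength(); i++) {
                lines.add(p + "." + atts.getQName(i) + "=\"" + atts.getValue(i) + "\"");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // the parser may deliver one text node in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                lines.add(currentPath() + " = \"" + value + "\"");
            }
            text.setLength(0);
            path.pop();
        }
    }

    /** Parses the given XML string and returns one "path = value" line per attribute or text value. */
    public static List<String> flatten(String xml) {
        try {
            PathHandler handler = new PathHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<GreatGrandParent id=\"123456\"><GrandParent1><Parent1>"
                + "<Child1>Text1</Child1></Parent1></GrandParent1></GreatGrandParent>";
        flatten(xml).forEach(System.out::println);
    }
}
```

Because the path is rebuilt from the stack at each event, nothing about the element names needs to be known in advance. Mixed content (text interleaved with child elements) is handled only roughly in this sketch: text that appears before a child element is discarded, and text after the last child is attributed to the parent.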