Python - using element tree to get data from specific nodes in xml

I have been looking around and there are a lot of similar questions, but none that solved my issue sadly.

My XML file looks like this

<?xml version="1.0" encoding="utf-8"?>
  <Nodes>
    <Node ComponentID="1">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="On"/>
      </Settings>
    </Node>
    <Node ComponentID="2">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="Off"/>
      </Settings>
    </Node>
    <Node ComponentID="3">
      <Settings>
        <Value name="Text Box (1)"> SettingG </Value>
        <Value name="Text Box (2)"> SettingH </Value>
        <Value name="Text Box (3)"> SettingI </Value>
        <Value name="Text Box (4)"> SettingJ </Value>
      <AdvSettings State="Yes"/>
      </Settings>
    </Node>
  </Nodes>

With Python I'm trying to get the values of Text Box (1) and Text Box (2) for each Node whose "AdvSettings" State is switched on (here "On" or "Yes").

So in this case I would like a result like

ComponentID  State  Textbox1  Textbox2
1            On     SettingA  SettingB
3            On     SettingG  SettingH

I have made some attempts but didn't get far. With this I managed to get the AdvSettings tag and its attributes, but that's as far as I got:

import xml.etree.ElementTree as ET
tree = ET.parse('XMLSearch.xml')
root = tree.getroot()

for AdvSettings in root.iter('AdvSettings'):
    print(AdvSettings.tag, AdvSettings.attrib)

2 answers

  • answered 2022-05-04 12:08 Kris

    You can use an XPath expression to find all the relevant nodes and then extract the needed data from them. An example is below, with comments as explanation.

    from lxml import etree
    import pandas  # optional, used only to format the output
    
    xml = etree.fromstring('''
      <Nodes>...
      </Nodes>
    ''')
    
    # Use XPath to select the nodes whose AdvSettings State is 'On' or 'Yes'
    on_nodes = xml.xpath("//Node[Settings[AdvSettings[@State='Yes' or @State='On']]]")
    
    # Build one dict per node: the ComponentID plus every child of Settings
    # that has text (AdvSettings has no text, so it is skipped)
    data_collected = [
        dict(
            [("ComponentID", node.attrib['ComponentID'])] +
            [(c.get("name"), c.text.strip()) for c in node.find("Settings") if c.text]
        )
        for node in on_nodes
    ]
    
    # data_collected is now a list of dicts with all relevant information.
    # Print it out; to_markdown needs the tabulate package installed.
    print(pandas.DataFrame.from_records(data_collected).to_markdown(index=False))
    

    This would give you output like:

    |   ComponentID | Text Box (1)   | Text Box (2)   | Text Box (3)   | Text Box (4)   |
    |--------------:|:---------------|:---------------|:---------------|:---------------|
    |             1 | SettingA       | SettingB       | SettingC       | SettingD       |
    |             3 | SettingG       | SettingH       | SettingI       | SettingJ       |
    

  • answered 2022-05-04 12:08 balderman

    Below is a solution that parses with Python's built-in xml library (pandas is used only to print the result):

    import xml.etree.ElementTree as ET
    import pandas as pd
    
    xml = '''<?xml version="1.0" encoding="utf-8"?>
      <Nodes>
        <Node ComponentID="1">
          <Settings>
            <Value name="Text Box (1)"> SettingA </Value>
            <Value name="Text Box (2)"> SettingB </Value>
            <Value name="Text Box (3)"> SettingC </Value>
            <Value name="Text Box (4)"> SettingD </Value>
          <AdvSettings State="On"/>
          </Settings>
        </Node>
        <Node ComponentID="2">
          <Settings>
            <Value name="Text Box (1)"> SettingA </Value>
            <Value name="Text Box (2)"> SettingB </Value>
            <Value name="Text Box (3)"> SettingC </Value>
            <Value name="Text Box (4)"> SettingD </Value>
          <AdvSettings State="Off"/>
          </Settings>
        </Node>
        <Node ComponentID="3">
          <Settings>
            <Value name="Text Box (1)"> SettingG </Value>
            <Value name="Text Box (2)"> SettingH </Value>
            <Value name="Text Box (3)"> SettingI </Value>
            <Value name="Text Box (4)"> SettingJ </Value>
          <AdvSettings State="Yes"/>
          </Settings>
        </Node>
      </Nodes>''' 
    
    data = []
    root = ET.fromstring(xml)
    nodes = root.findall('.//Node')
    for node in nodes:
      adv = node.find('.//AdvSettings')
      if adv is None:
        continue
      flag = adv.attrib.get('State','Off')
      if flag in ('On', 'Yes'):
        data.append({
          'id': node.attrib.get('ComponentID'),
          'txt_box_1': node.find('.//Value[@name="Text Box (1)"]').text.strip(),
          'txt_box_2': node.find('.//Value[@name="Text Box (2)"]').text.strip(),
        })
    
    df = pd.DataFrame(data)
    print(df)
    

    output

      id txt_box_1 txt_box_2
    0  1  SettingA  SettingB
    1  3  SettingG  SettingH
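
    The table asked for in the question also has a State column, and pandas is not strictly needed to print it. The following is a minimal sketch using only the standard library. The shortened XML, the helper name collect_rows, and the ENABLED_STATES set are assumptions made for illustration; in particular it assumes that both "On" and "Yes" count as switched on, as the answers above do.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML (two Value entries per Node).
XML = """<Nodes>
  <Node ComponentID="1">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <AdvSettings State="On"/>
    </Settings>
  </Node>
  <Node ComponentID="2">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <AdvSettings State="Off"/>
    </Settings>
  </Node>
  <Node ComponentID="3">
    <Settings>
      <Value name="Text Box (1)"> SettingG </Value>
      <Value name="Text Box (2)"> SettingH </Value>
      <AdvSettings State="Yes"/>
    </Settings>
  </Node>
</Nodes>"""

# Assumption: both of these State values mean "switched on".
ENABLED_STATES = {"On", "Yes"}


def collect_rows(xml_text):
    """Return (ComponentID, State, Text Box 1, Text Box 2) for each enabled Node."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("Node"):
        adv = node.find("./Settings/AdvSettings")
        if adv is None or adv.get("State") not in ENABLED_STATES:
            continue
        # Map each Value's name attribute to its whitespace-stripped text.
        values = {v.get("name"): (v.text or "").strip() for v in node.iter("Value")}
        rows.append((
            node.get("ComponentID"),
            adv.get("State"),
            values.get("Text Box (1)", ""),
            values.get("Text Box (2)", ""),
        ))
    return rows


rows = collect_rows(XML)
header = ("ComponentID", "State", "Textbox1", "Textbox2")
for row in [header] + rows:
    # Left-align every cell in a fixed-width column.
    print("".join(f"{cell:<13}" for cell in row).rstrip())
```

    The State column shows the value as stored in the file, so component 3 is reported as "Yes" rather than "On". To read from the file in the question instead of a string, ET.parse('XMLSearch.xml').getroot() could replace the ET.fromstring call.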
    
