Python regex: combining groups into a single string?

I am trying to do a regex search on some text where I am only interested in the text between some patterns.

Sample text:

<h2><font color='#fff'>some text </font></h2><HR noshade size="5" width="50%" align="center"><table><tr><th id='t1'>Host  </th><th>10.0.1.1</th></tr><th id='t1'>Port  </th><th>8080</th></tr><th id='t1'>User  </th><th>chris</th></tr><th id='t1'>Password  </th><th>chris</th></tr></table><h4><font color='#fff'>
...
<h2><font color='#fff'>some more text </font></h2><HR noshade size="5" width="50%" align="center"><table><tr><th id='t1'>Host  </th><th>10.0.1.2</th></tr><th id='t1'>Port  </th><th>9090</th></tr><th id='t1'>User  </th><th>bob</th></tr><th id='t1'>Password  </th><th>bob</th></tr></table><h4><font color='#fff'>

This is my regex:

Host.*?<th>(.*?)<.*Port.*?<th>(.*?)<.*User.*?<th>(.*?)<.*Password.*?<th>(.*?)<
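Here is a sketch of the same pattern rewritten with `re.VERBOSE` so that each part can be commented. `ORIGINAL`, `PATTERN_VERBOSE`, and `record` are illustrative names. `record` is a cut-down piece of the first sample line. The sketch also shows an assumption the pattern relies on: without `re.DOTALL`, `.` does not match a newline, so the greedy `.*` runs stay on one line. Each record therefore has to sit on its own line.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"Host.*?<th>(.*?)<.*Port.*?<th>(.*?)<.*User.*?<th>(.*?)<.*Password.*?<th>(.*?)<")

# Same pattern with re.VERBOSE: unescaped whitespace is ignored and '#' starts a
# comment. The original contains neither, so the spacing below changes nothing.
PATTERN_VERBOSE = re.compile(r"""
    Host        .*? <th> (.*?) <   # group 1: cell after Host
    .* Port     .*? <th> (.*?) <   # group 2: cell after Port
    .* User     .*? <th> (.*?) <   # group 3: cell after User
    .* Password .*? <th> (.*?) <   # group 4: cell after Password
""", re.VERBOSE)

# A cut-down record taken from the first sample line (illustrative test input).
record = "<tr><th id='t1'>Host  </th><th>10.0.1.1</th></tr><th id='t1'>Port  </th><th>8080</th></tr><th id='t1'>User  </th><th>chris</th></tr><th id='t1'>Password  </th><th>chris</th></tr>"

print(PATTERN_VERBOSE.findall(record))  # [('10.0.1.1', '8080', 'chris', 'chris')]

# Without re.DOTALL, '.' does not match '\n', so a record split across lines
# is not matched at all: the greedy '.*' cannot reach the next field name.
print(PATTERN_VERBOSE.findall(record.replace("</tr>", "</tr>\n")))  # []
```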

re.findall returns a list of tuples, one tuple per match, and that is not what I want. I would like each match's groups combined into a single string.

This is the output I want:

10.0.1.1 8080 chris chris
10.0.1.2 9090 bob bob

Here is what I am doing:

import re

# s holds the sample text shown above
lines = []
lines.extend(re.findall(r"Host.*?<th>(.*?)<.*Port.*?<th>(.*?)<.*User.*?<th>(.*?)<.*Password.*?<th>(.*?)<", s))
print(lines)

Which gives me:

[('10.0.1.1', '8080', 'chris', 'chris'), ('10.0.1.2', '9090', 'bob', 'bob')]

Can anyone explain why this happens and how I can get what I want?

Thanks, Chris


  • answered 2021-06-17 20:31 Boom100100

    When the pattern contains two or more capturing groups (pairs of parentheses), re.findall returns a list with one tuple per match. Each tuple holds that match's groups in order. You can post-process that list to get what you want:

    strings = [' '.join(tup) for tup in lines]
    print('\n'.join(strings))
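Below is a minimal, self-contained sketch of the answer's approach. `s` holds the two sample records from the question, with the `...` line kept as-is. `demo`, `PATTERN`, and `strings_iter` are illustrative names. The sketch first uses a made-up string to show how the return shape of `re.findall` depends on the number of capturing groups. It then joins each tuple. The `re.finditer` variant at the end is an alternative the original answer does not mention.

```python
import re

# Sample text from the question: two records, each on its own line.
s = """<h2><font color='#fff'>some text </font></h2><HR noshade size="5" width="50%" align="center"><table><tr><th id='t1'>Host  </th><th>10.0.1.1</th></tr><th id='t1'>Port  </th><th>8080</th></tr><th id='t1'>User  </th><th>chris</th></tr><th id='t1'>Password  </th><th>chris</th></tr></table><h4><font color='#fff'>
...
<h2><font color='#fff'>some more text </font></h2><HR noshade size="5" width="50%" align="center"><table><tr><th id='t1'>Host  </th><th>10.0.1.2</th></tr><th id='t1'>Port  </th><th>9090</th></tr><th id='t1'>User  </th><th>bob</th></tr><th id='t1'>Password  </th><th>bob</th></tr></table><h4><font color='#fff'>"""

PATTERN = r"Host.*?<th>(.*?)<.*Port.*?<th>(.*?)<.*User.*?<th>(.*?)<.*Password.*?<th>(.*?)<"

# Why tuples: findall's return shape depends on the number of capturing groups.
demo = "a=1 b=2"
print(re.findall(r"\w=\d", demo))      # ['a=1', 'b=2']  (no groups: whole matches)
print(re.findall(r"(\w)=\d", demo))    # ['a', 'b']  (one group: that group's text)
print(re.findall(r"(\w)=(\d)", demo))  # [('a', '1'), ('b', '2')]  (2+ groups: tuples)

# PATTERN has four groups, so each match is a 4-tuple; join each with a space.
lines = re.findall(PATTERN, s)
strings = [" ".join(groups) for groups in lines]
print("\n".join(strings))

# Alternative: re.finditer yields Match objects, and Match.groups() returns
# the same tuple findall would (every group participates in this pattern).
strings_iter = [" ".join(m.groups()) for m in re.finditer(PATTERN, s)]
```

Running it prints the three demo lists, followed by the two lines the question asks for: `10.0.1.1 8080 chris chris` and `10.0.1.2 9090 bob bob`.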