Ruby: find all `<img>` tags that have only a `<br/>` immediately following them

Using Ruby, I am trying to find all <img> tags that have a <br /> immediately afterward.

For example, this is what I am looking for:

<img src="http://img-example.jpg" alt="some description"><br />

But this would be an example of what I am not looking for:

<img src="http://img2-example.jpg" alt=""><span>Some extended text</span><img src="http://img3-example.jpg" alt="some more descriptions"><br />

In the second example there is a <br />, but it is not immediately preceded by a single <img> tag and nothing else.

I have tried both regex and Nokogiri, although my Ruby skills are pretty terrible.

Thoughts? Is Nokogiri the better choice, or is regex? Either way, what approach would you recommend?

I have used the following, but it finds a match in both examples above:

img_with_break = string[/<img(.*?)alt=\"(.*?)\"><br \/>/]

1 answer

  • answered 2018-01-12 00:33 pguardiario

    With Nokogiri, assuming doc is your parsed document, you can do:

    doc.search('img').select{|img| img.at('+ br')}
    

    I would have thought just:

    doc.search('img:has(+ br)')
    

    but that doesn't work (a bug in Nokogiri).
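
On the regex side of the question: the pattern in the question matches the second example because `.*?` is free to run across tag boundaries, so the match can start at the first `<img` and stretch all the way to the final `<br />`. Below is a rough sketch that keeps each match inside a single tag by using `[^>]*` instead. The names `IMG_THEN_BR` and `imgs_followed_by_br` are made up for illustration, and it assumes attribute values never contain `>` and that nothing (not even whitespace) sits between the two tags.

```ruby
# Hypothetical names; a sketch for simple markup, not a general HTML parser.
# [^>]* cannot cross a '>', so the match stays inside one <img ...> tag.
IMG_THEN_BR = /<img\b[^>]*><br\s*\/?>/i

# Returns every "<img ...><br />" substring found in html.
def imgs_followed_by_br(html)
  html.scan(IMG_THEN_BR)
end

first  = '<img src="http://img-example.jpg" alt="some description"><br />'
second = '<img src="http://img2-example.jpg" alt=""><span>Some extended text</span>' \
         '<img src="http://img3-example.jpg" alt="some more descriptions"><br />'

puts imgs_followed_by_br(first)
puts imgs_followed_by_br(second)
```

For the second example this returns only the last `<img>` together with its `<br />`, rather than one long match starting at the first `<img>`. Whether that last image should count depends on how strictly "only" is meant. For anything beyond simple, predictable markup, a parser such as Nokogiri is likely the more robust choice.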