How can I click on a specific link with Nokogiri or Mechanize?

I know how to find an element using Nokogiri. I know how to click a link using Mechanize. But I can't figure out how to find a specific link and click it. This seems like it should be really easy, but for some reason I can't find a solution.

Let's say I'm just trying to click on the first result on a Google search. I can't just click the first link with Mechanize, because the Google page has a bunch of other links, like Settings. The search result links themselves don't seem to have class names, but they're enveloped in <h3 class="r"></h3>.

I could just use Nokogiri to follow the href value of the link like so:

document = open("https://www.google.com/search?q=stackoverflow")
parsed_content = Nokogiri::HTML(document.read)
href = parsed_content.css('.r').children.first['href']
new_document = open(href)
# href is equal to "/url?sa=t&rct=j&q=&esrc=s&source=web&url=https%3A%2F%2Fstackoverflow.com%2F"

but it's not a direct URL, and going to that URL gives an error. The data-href value is a direct URL, but I can't figure out how to get that value: doing the same thing with ...first['data-href'] returns nil.
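
One possible way around the missing data-href, sketched here under the assumption that the href always has the shape shown in this question (a relative /url path with the target in a url query parameter), is to read the target out of the query string with Ruby's standard uri library. The helper name direct_url_from is hypothetical and not part of Nokogiri or Mechanize.

```ruby
require 'uri'

# Hypothetical helper: extracts the direct target from a redirect-style href.
# Assumes the target is stored percent-encoded in the given query parameter.
def direct_url_from(href, param = 'url')
  query = URI.parse(href).query
  return nil if query.nil?

  # decode_www_form splits the query into [key, value] pairs and decodes them
  URI.decode_www_form(query).to_h[param]
end

href = "/url?sa=t&rct=j&q=&esrc=s&source=web&url=https%3A%2F%2Fstackoverflow.com%2F"
puts direct_url_from(href) # prints https://stackoverflow.com/
```

This only parses the string, so it does not depend on how the page was fetched. If the href has no query string, the helper returns nil.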

Anyone know how I can just find the first .r element on the page and click the link inside it?

Here's the start to my action:

require 'open-uri'
require 'nokogiri'
require 'mechanize'
document = open("https://www.google.com/search?q=stackoverflow")
parsed_content = Nokogiri::HTML(document.read)

Here's the .r element on the Google search results page:

<h3 class="r">
  <a href="/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;url=https%3A%2F%2Fstackoverflow.com%2F" data-href="https://stackoverflow.com/">Stack Overflow</a>
</h3>

1 answer

  • answered 2017-11-12 20:23 max pleaner

    You should make sure the code in your question is correct. It looks like it is not, because you don't surround the URL in quotes, and the CSS selector should be .r a, not r. You use .r a because you want to access the link inside elements with the r class.

    Anyway, you can use the approach detailed here like so:

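    require 'open-uri'
    require 'nokogiri'
    require 'uri'

    base_url = "https://www.google.com/search?q=stackoverflow"
    # URI.open is the open-uri entry point; a bare open(url) no longer works on Ruby 3.0+
    document = URI.open(base_url)
    parsed_content = Nokogiri::HTML(document.read)
    # at_css returns the first <a> inside an element with class "r"
    href = parsed_content.at_css('.r a')['href']
    # href is relative ("/url?..."), so resolve it against the search URL
    new_url = URI.join(base_url, href)
    new_document = URI.open(new_url)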
    

    I tested this and following new_url does redirect to StackOverflow as expected.
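
To make the URI.join step concrete, here is a small standalone sketch that uses only the standard uri library and the href value from the question. It shows that joining the relative href with the search URL gives an absolute URL on the same host, which can then be requested. The variable name absolute is just for illustration.

```ruby
require 'uri'

base_url = "https://www.google.com/search?q=stackoverflow"
href = "/url?sa=t&rct=j&q=&esrc=s&source=web&url=https%3A%2F%2Fstackoverflow.com%2F"

# An href starting with "/" replaces the path and query of the base URL,
# while the scheme and host are kept.
absolute = URI.join(base_url, href)

puts absolute.host # prints www.google.com
puts absolute.path # prints /url
puts absolute.to_s
```

The result is a URI object, so it can be passed on directly or converted with to_s when a plain string is needed.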