Targeting the third list item with Beautiful Soup

I'm scraping a website with Beautiful Soup and am having trouble targeting a span tag nested within an li tag. The website uses the same classes for every list item, which makes it harder. The HTML looks something like this:

<div class="bigger-container">
<div class="smaller-container">
<ul class="ulclass">
<li>
<span class="description"></span>
<span class="item"></span>
</li>
<li>
<span class="description"></span>
<span class="item"></span>
</li>
<li>
<span class="description"></span>
<span class="item">**This is the only tag I want to scrape**</span>
</li>
<li>
<span class="description"></span>
<span class="item"></span>
</li>
</ul>
</div>
</div>

My first thought was to target it using nth-of-type(). I found a similar question here, but it hasn't helped. I've been playing with it for a while now, and my code basically looks like this:

import requests
from bs4 import BeautifulSoup

url = "url of website I'm scraping"  # placeholder
headers = {"User-Agent": "User-Agent header"}  # placeholder

for page in range(1):
    r = requests.get(url, headers = headers)
    soup = BeautifulSoup(r.content, features="lxml")

    scrape = soup.find_all('div', class_ = 'even_bigger_container_not_included_in_html_above') 

    for item in scrape:
        condition = soup.find('li:nth-of-type(2)', 'span:nth-of-type(1)').text
        print(condition)

Any help is greatly appreciated!

1 answer

  • answered 2020-11-25 01:11 MendelG

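    find() expects a tag name and attribute filters, not a CSS selector. To use a CSS selector, call select() (all matches) or select_one() (first match) instead.

    nth-of-type() counts from 1, so to get the third <li>, use li:nth-of-type(3) as the CSS selector: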

    from bs4 import BeautifulSoup
    
    
    html = """<div class="bigger-container">
    <div class="smaller-container">
    <ul class="ulclass">
    <li>
    <span class="description"></span>
    <span class="item"></span>
    </li>
    <li>
    <span class="description"></span>
    <span class="item"></span>
    </li>
    <li>
    <span class="description"></span>
    <span class="item">**This is the only tag I want to scrape**</span>
    </li>
    <li>
    <span class="description"></span>
    <span class="item"></span>
    </li>
    </ul>
    </div>
    </div>"""
    
    
    soup = BeautifulSoup(html, "html.parser")
    
    
    print(soup.select_one("li:nth-of-type(3)").get_text(strip=True))
    

    Output:

    **This is the only tag I want to scrape**
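
The selector above returns the whole third <li>, so get_text() also picks up the text of the description span whenever that span is not empty. Here is a minimal sketch of two ways to narrow the match to the span with class "item". The sample HTML, including its "first", "target" and similar text values, is made up for illustration and is not taken from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample data: descriptions are filled in here to show the difference.
html = """
<ul class="ulclass">
  <li><span class="description">first</span><span class="item">A</span></li>
  <li><span class="description">second</span><span class="item">B</span></li>
  <li><span class="description">third</span><span class="item">target</span></li>
  <li><span class="description">fourth</span><span class="item">D</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Selecting only the <li> returns the text of both spans joined together.
whole_li_text = soup.select_one("li:nth-of-type(3)").get_text(strip=True)

# Option 1: extend the CSS selector down to the span inside the third <li>.
span = soup.select_one("ul.ulclass > li:nth-of-type(3) > span.item")
css_text = span.get_text(strip=True) if span is not None else None

# Option 2: no CSS selector; index into the direct <li> children (0-based).
list_items = soup.find("ul", class_="ulclass").find_all("li", recursive=False)
indexed_text = list_items[2].find("span", class_="item").get_text(strip=True)

print(whole_li_text)
print(css_text)
print(indexed_text)
```

Note that the CSS form counts from 1 while the list index counts from 0. On a real page, select_one() and find() return None when nothing matches, so it is worth checking the result before calling get_text().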