Targeting the third list item with Beautiful Soup
I'm scraping a website with Beautiful Soup and am having trouble targeting an item in a span tag nested within an li tag. The website uses the same classes for every list item, which makes it harder. The HTML looks something like this:
<div class="bigger-container">
  <div class="smaller-container">
    <ul class="ulclass">
      <li>
        <span class="description"></span>
        <span class="item"></span>
      </li>
      <li>
        <span class="description"></span>
        <span class="item"></span>
      </li>
      <li>
        <span class="description"></span>
        <span class="item">**This is the only tag I want to scrape**</span>
      </li>
      <li>
        <span class="description"></span>
        <span class="item"></span>
      </li>
    </ul>
My first thought was to target it using nth-of-type(). I found a similar question here, but it hasn't helped. I've been playing with it for a while now, and my code basically looks like this:
import requests
from bs4 import BeautifulSoup

url = "url of website I'm scraping"
headers = {"User-Agent": "..."}  # placeholder for the User-Agent header

for page in range(1):
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, features="lxml")
    scrape = soup.find_all('div', class_='even_bigger_container_not_included_in_html_above')
    for item in scrape:
        condition = soup.find('li:nth-of-type(2)', 'span:nth-of-type(1)').text
        print(condition)
Any help is greatly appreciated!
1 answer
answered 2020-11-25 01:11
MendelG
To use a CSS selector, use the select() method, not find(). So to get the third <li>, use li:nth-of-type(3) as a CSS selector:
from bs4 import BeautifulSoup

html = """<div class="bigger-container">
  <div class="smaller-container">
    <ul class="ulclass">
      <li>
        <span class="description"></span>
        <span class="item"></span>
      </li>
      <li>
        <span class="description"></span>
        <span class="item"></span>
      </li>
      <li>
        <span class="description"></span>
        <span class="item">**This is the only tag I want to scrape**</span>
      </li>
      <li>
        <span class="description"></span>
        <span class="item"></span>
      </li>
    </ul>"""

soup = BeautifulSoup(html, "html.parser")
print(soup.select_one("li:nth-of-type(3)").get_text(strip=True))
Output:
**This is the only tag I want to scrape**
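If the description spans on the real page are not empty, selecting the whole <li> would return their text too. Here is a minimal sketch, not a definitive solution, that narrows the selector to the span with class "item" inside the third <li> and runs it once per outer container, as the loop in the question tries to do. The sample HTML, its text values, and the helper name third_item_text are made up for illustration, and the outer class name is the placeholder from the question. Note that nth-of-type counts from 1, so the third item is li:nth-of-type(3).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the outer class name is the placeholder used in the question.
html = """
<div class="even_bigger_container_not_included_in_html_above">
  <ul class="ulclass">
    <li><span class="description">a</span><span class="item">first</span></li>
    <li><span class="description">b</span><span class="item">second</span></li>
    <li><span class="description">c</span><span class="item">third</span></li>
    <li><span class="description">d</span><span class="item">fourth</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def third_item_text(container):
    """Return the text of span.item inside the third <li>, or None if it is missing."""
    # select_one() searches only the descendants of the tag it is called on.
    tag = container.select_one("ul.ulclass > li:nth-of-type(3) > span.item")
    return tag.get_text(strip=True) if tag else None


# Call select() on each container rather than on the whole soup,
# so every container yields its own third item.
results = [
    third_item_text(div)
    for div in soup.select("div.even_bigger_container_not_included_in_html_above")
]
print(results)
```

Returning None when the selector finds nothing avoids the AttributeError you would otherwise get from calling .get_text() on a missing tag, which can happen if a page has fewer than three list items.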