Having trouble extracting dynamic div list while scrolling down using Webdriver (selenium & python)
I'm having a hard time figuring out how to get a refreshed dynamic list while scrolling down the page with WebDriver in Selenium and Python 3. The site I'm trying to scrape is https://www.ubereats.com/stores/ ; if it redirects you to the homepage, type any city and click it, which will show you the list of restaurants in divs.
The interesting thing is that if you inspect the element, the list of <div class="base_ ue-ff ...">..</div> elements changes as I scroll down the page, yet even after scrolling the page down with WebDriver, my script still retrieves the old data that was extracted at the start. Below is my sample code. I also added a sleep to let the data load, but it made no difference to the data extraction.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from urllib.request import urlopen
from importlib import reload
import re
import sys
import time
import numpy as np
driver = webdriver.Chrome(path_chrome_driver)
driver.get('https://www.ubereats.com')
wait_time_for_search_complete = float(np.random.uniform(1,2,1))
time.sleep(wait_time_for_search_complete)
input_city_name = driver.find_element_by_xpath("//input[@placeholder='Enter your delivery address']")
time_to_wait_to_enter_city_name = float(np.random.uniform(1, 2, 1))
time.sleep(time_to_wait_to_enter_city_name)
input_city_name.send_keys('Sydney')
time_to_wait_to_write_city = float(np.random.uniform(2, 3, 1))
time.sleep(time_to_wait_to_write_city)
select_first_in_dropdown = driver.find_element_by_xpath('//*[@id="app-content"]/div/div[1]/div/div[1]/div[1]/div[2]/div/div/div[3]/div[1]/div/div/div[2]/div/div/button[1]')
select_first_in_dropdown.click()
time_to_wait_to_load_restaurants = float(np.random.uniform(2, 3, 1))
time.sleep(time_to_wait_to_load_restaurants)
current_page = driver.page_source
soup = BeautifulSoup(current_page,'html.parser')
height = 0
restaurant_site = []
while True:
    restaurant_information = ''
    restaurant_information = soup.find_all('a', ['base_', 'ue-kl', 'ue-km', 'ue-kn', 'ue-ko'])
    time.sleep(5)
    for restaurant in restaurant_information:
        print(restaurant['href'])
    height += 1000
    driver.execute_script("window.scrollTo(0," + str(height) + ")")
    driver.implicitly_wait(3)
I'm really having a hard time figuring out how to retrieve the restaurant list as I scroll down the page, since the div is dynamic. I believe it has something to do with an AJAX call, but if you have any alternative solution, please let me know. I'd really like to solve this as soon as possible.
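One thing I have been wondering about is whether I simply need to re-parse driver.page_source after every scroll instead of reusing the soup object built once before the loop. A rough sketch of what I mean (the number of scroll steps, the sleep, and the href filter are all guesses on my part, since the obfuscated class names keep changing):

import time
from bs4 import BeautifulSoup

seen = set()
height = 0
for _ in range(20):                        # bounded number of scroll steps instead of while True
    driver.execute_script("window.scrollTo(0, arguments[0]);", height)
    time.sleep(2)                          # give the lazily loaded tiles time to render

    # re-parse the CURRENT page source on every pass; a soup built before the loop never changes
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for anchor in soup.find_all('a', href=True):
        href = anchor['href']
        if '/food-delivery/' in href and href not in seen:   # guess at how store links look
            seen.add(href)
            print(href)

    height += 1000

I'm not sure whether this alone is enough to pick up the lazily loaded restaurants, or whether the list only exists in an AJAX response I would have to call directly.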
Thank you!!
See also questions close to this topic
-
How to randomly call a function within a given time interval? - Python
I am creating a simulator for an office door sensor. The office is usually open from 8.00am to 6.00pm, and the door takes 10 seconds to close.
I want to create a function that I would call randomly between 8.00am and 6.00pm; that function would give me the timestamps of when the door was opened and when it was closed.
The simulator runs a loop once per second, and within that loop I want to call this function at random times between 8.00am and 6.00pm.
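For reference, here is a minimal sketch of the kind of thing I have in mind: at each tick of the per-second loop, fire the door event with a small probability so the openings end up scattered randomly across the 8.00am-6.00pm window (the probability value and the fixed 10-second close delay are assumptions):

import random
from datetime import datetime, timedelta

def door_event(now):
    # return (opened_at, closed_at); the door takes 10 seconds to close
    return now, now + timedelta(seconds=10)

def simulate_day(start, end, open_probability=0.001):
    # walk through the day one second at a time and randomly trigger door events
    events = []
    now = start
    while now < end:
        if random.random() < open_probability:   # roughly a few openings per hour
            events.append(door_event(now))
        now += timedelta(seconds=1)
    return events

events = simulate_day(datetime(2019, 3, 1, 8, 0), datetime(2019, 3, 1, 18, 0))
for opened_at, closed_at in events[:5]:
    print(opened_at, "->", closed_at)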
I'd appreciate it if you could suggest a different approach.
Thanks.
-
How to fix inconsistent return statement in python?
I am new to Python and am working on a small project with two functions: the first returns the index of the first difference between two strings, and the second does the same for a list of strings. Being an amateur, I have used an excessive number of if and else statements, which resulted in too many return statements, especially in the second function, and I get the warning [R1710: inconsistent-return-statements]. How do I fix it, and can anybody give me clear examples of better code? Sorry for the long question.
IDENTICAL = -1

def singleline_diff(line1, line2):
    """
    Inputs:
      line1 - first single line string
      line2 - second single line string
    Output:
      Returns the index where the first difference between line1 and line2
      occurs. Returns IDENTICAL if the two lines are the same.
    """
    len1 = len(line1)
    len2 = len(line2)
    minimum_length = min(len1, len2)
    if len1 != len2:
        if minimum_length == 0:
            return 0
        for idx in range(minimum_length):
            if line1[idx] == line2[idx]:
                pass
            else:
                return idx
        return idx + 1
    for idx in range(len1):
        if line1[idx] == line2[idx]:
            pass
        else:
            return idx
    return IDENTICAL

def multiline_diff(lines1, lines2):
    """
    Inputs:
      lines1 - list of single line strings
      lines2 - list of single line strings
    Output:
      Returns a tuple containing the line number (starting from 0) and the
      index in that line where the first difference between lines1 and lines2
      occurs. Returns (IDENTICAL, IDENTICAL) if the two lists are the same.
    """
    line_no = singleline_diff(lines1, lines2)
    len_lines1, len_lines2 = len(lines1), len(lines2)
    if len_lines1 == len_lines2:
        if (len_lines1 or len_lines2) == 0:
            if len_lines1 == len_lines2:
                return (IDENTICAL, IDENTICAL)
            else:
                idx = singleline_diff(lines1[line_no], lines2[line_no])
                return (line_no, idx)
        else:
            idx = singleline_diff(lines1[line_no], lines2[line_no])
            if line_no == IDENTICAL:
                return (IDENTICAL, IDENTICAL)
            elif line_no != IDENTICAL:
                return (line_no, idx)
    else:
        return (line_no, 0)
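For context, R1710 fires when some paths of a function return a value and others fall off the end (implicitly returning None). A minimal illustration of the pattern pylint wants, deliberately simpler than the functions above:

IDENTICAL = -1

# pylint flags this: the loop can finish and the function "falls off the end",
# implicitly returning None while other branches return an int
def first_difference_bad(a, b):
    for idx in range(min(len(a), len(b))):
        if a[idx] != b[idx]:
            return idx
    if len(a) != len(b):
        return min(len(a), len(b))
    # no return here -> inconsistent-return-statements

# consistent version: exactly one value is returned on every path
def first_difference(a, b):
    for idx in range(min(len(a), len(b))):
        if a[idx] != b[idx]:
            return idx
    if len(a) != len(b):
        return min(len(a), len(b))
    return IDENTICAL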
-
how to set sqlite3 select delimiter?
How can I change the sqlite3 SELECT output delimiter in Python? I have lots of databases on lots of PCs, so changing the delimiter for each PC from the sqlite CLI isn't possible. The expected format is text1|text2|text3.
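If the goal is simply pipe-separated output from Python (rather than changing the CLI's .separator on every machine), one option is to format the rows yourself. A minimal sketch, with the database file, table, and column names as placeholders:

import sqlite3

conn = sqlite3.connect("example.db")                                   # placeholder database
cursor = conn.execute("SELECT col1, col2, col3 FROM some_table")        # placeholder query

for row in cursor:
    # join every column with '|' so the output looks like text1|text2|text3
    print("|".join(str(value) for value in row))

conn.close()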
Thx.
-
Allure report - Index.html is not generated
When we run this code in Eclipse, the Allure report (index.html) is not generated; only JSON files are generated in the Allure report folder. Kindly help to resolve this issue.
-> Ran "clean test site" as the goals in the Eclipse run configuration.
POM.XML
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>example.property.manager</groupId>
  <artifactId>agents</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>agents</name>
  <url>http://maven.apache.org</url>

  <properties>
    <aspectj.version>1.8.10</aspectj.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency> <groupId>javax.xml.bind</groupId> <artifactId>jaxb-api</artifactId> <version>2.3.0</version> </dependency>
    <dependency> <groupId>io.github.bonigarcia</groupId> <artifactId>webdrivermanager</artifactId> <version>3.3.0</version> </dependency>
    <dependency> <groupId>org.seleniumhq.selenium</groupId> <artifactId>selenium-java</artifactId> <version>3.141.59</version> </dependency>
    <dependency> <groupId>org.seleniumhq.selenium</groupId> <artifactId>selenium-server</artifactId> <version>3.141.59</version> </dependency>
    <dependency> <groupId>org.seleniumhq.selenium</groupId> <artifactId>selenium-chrome-driver</artifactId> <version>3.141.59</version> </dependency>
    <dependency> <groupId>io.qameta.allure</groupId> <artifactId>allure-testng</artifactId> <version>2.0-BETA19</version> <scope>test</scope> </dependency>
    <dependency> <groupId>org.hamcrest</groupId> <artifactId>hamcrest-all</artifactId> <version>1.3</version> <scope>test</scope> </dependency>
    <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-simple</artifactId> <version>1.7.21</version> <scope>test</scope> </dependency>
    <dependency> <groupId>log4j</groupId> <artifactId>log4j</artifactId> <version>1.2.17</version> </dependency>
    <dependency> <groupId>io.qameta.allure</groupId> <artifactId>allure-java-commons</artifactId> <version>2.6.0</version> </dependency>
    <dependency> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-surefire-plugin</artifactId> <version>3.0.0-M3</version> <type>maven-plugin</type> </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>3.0.0-M3</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
          <systemProperties>
            <property>
              <name>allure-results</name>
              <value>${allure.results.directory}</value>
            </property>
          </systemProperties>
          <argLine>
            -javaagent:"${settings.localRepository}/org/aspectj/aspectjweaver/${aspectj.version}/aspectjweaver-${aspectj.version}.jar"
          </argLine>
        </configuration>
        <dependencies>
          <dependency> <groupId>org.aspectj</groupId> <artifactId>aspectjweaver</artifactId> <version>${aspectj.version}</version> </dependency>
        </dependencies>
      </plugin>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.0</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <reporting>
    <excludeDefaults>true</excludeDefaults>
    <plugins>
      <plugin>
        <groupId>io.qameta.allure</groupId>
        <artifactId>allure-maven</artifactId>
        <version>2.9</version>
        <configuration>
          <reportVersion>2.3.1</reportVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>
main class
import java.io.IOException;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;

import io.qameta.allure.Description;

public class App {

    @Test
    @Description("Rently Manager Portal")
    private static void Testcase1() {
        // TODO Auto-generated method stub
        FunctionalComponents page = new FunctionalComponents();
        page.launchapplication();
        page.authentication();
        page.closeAllDriver();
    }
}
IDE: Eclipse 2018-12. Has anyone come across this type of issue?
-
automatic crawling using selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

OUTPUT_FILE_NAME = 'output0.txt'

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

def get_text():
    driver.get("http://law.go.kr/precSc.do?tabMenuId=tab67")
    elem = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,
                      "#viewHeightDiv > table > tbody > "
                      "tr:nth-child(1) > td.s_tit > a")))
    title = elem.text.strip().split(" ")[0]
    elem.click()
    wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, "#viewwrapCenter h2"), title))
    content = driver.find_element_by_css_selector("#viewwrapCenter").text
    return content

def main():
    open_output_file = open(OUTPUT_FILE_NAME, 'w')
    result_text = get_text()
    open_output_file.write(result_text)
    open_output_file.close()

main()
Based on this code, I want to crawl the website: from the original URL, Selenium goes into the 1st link, saves its text to a txt file, goes back to the original URL, goes into the 2nd link, and keeps going. The problem is that the css_selector value for the 1st link is #viewHeightDiv > table > tbody > tr:nth-child(1) > td.s_tit > a and for the 2nd link it is #viewHeightDiv > table > tbody > tr:nth-child(3) > td.s_tit > a. The only difference between them is the number after nth-child, and it seems to follow no rule; it goes 1, 3, 5, 9, ... so I'm stuck here.
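One way around the irregular nth-child numbers might be to drop the child index entirely, collect every title link in the result table once, and then iterate over them by position, re-locating the list after each navigation. A rough sketch under that assumption (it reuses driver, wait, By and EC from the snippet above; the selectors are copied from the question, and the back-navigation step is a guess about how the site behaves):

def get_all_case_titles():
    # every title cell in the result table, regardless of its tr position
    elems = wait.until(EC.visibility_of_all_elements_located(
        (By.CSS_SELECTOR, "#viewHeightDiv > table > tbody > tr > td.s_tit > a")))
    return [e.text.strip().split(" ")[0] for e in elems]

def crawl_all():
    driver.get("http://law.go.kr/precSc.do?tabMenuId=tab67")
    titles = get_all_case_titles()
    texts = []
    for i, title in enumerate(titles):
        # re-find the links on every pass because the DOM is re-rendered after going back
        links = driver.find_elements_by_css_selector(
            "#viewHeightDiv > table > tbody > tr > td.s_tit > a")
        links[i].click()
        wait.until(EC.text_to_be_present_in_element(
            (By.CSS_SELECTOR, "#viewwrapCenter h2"), title))
        texts.append(driver.find_element_by_css_selector("#viewwrapCenter").text)
        driver.back()
        wait.until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, "#viewHeightDiv > table")))
    return texts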
-
c# Assertion to check if button is not available anymore - test
I am new to assertions. I want to check with an assertion whether the cookie button is no longer on the page. I am using C# with Selenium and NUnit, and I am also using the page object model.
Hope someone can help me.
This is my page object class.
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace MakroTest
{
    class LandingPage
    {
        IWebDriver driver = new ChromeDriver();

        public IWebElement CookieButton => driver.FindElement(By.Id("cookie-bar-btn"));
        public IWebElement AlgemeneVoowaarden => driver.FindElement(By.LinkText("Algemene voorwaarden"));
        public IWebElement Contact => driver.FindElement(By.LinkText("Contact"));
        public IWebElement InlogCode => driver.FindElement(By.Id("FormModel_PromotionName"));
        public IWebElement Wachtwoord => driver.FindElement(By.Id("FormModel_Secret"));
        public IWebElement InlogButton => driver.FindElement(By.ClassName("button-secondary"));

        public void OpenWebsite()
        {
            driver.Url = DELETED THIS BECAUSE OF PRIVACY REASONS
            driver.Manage().Window.Maximize();
        }

        public void ClickCookieButton()
        {
            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10);
            CookieButton.Click();
        }

        // Assert ClickCookieButton - geen button meer zichtbaar WERKT NOG NIET
        // (assert that the button is no longer visible - NOT WORKING YET)
        public bool AssertCookieButtonDisplayed()
        {
            bool isDisplayed = CookieButton.Displayed;
            return isDisplayed;
        }
    }
}
And this is my test page
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using NUnit.Framework;
using OpenQA.Selenium.Support.UI;

namespace MakroTest
{
    class LoginTest
    {
        IWebDriver driver = new ChromeDriver();

        // [SetUp]

        [Test]
        public void ShouldBeAbleToClickCookies()
        {
            LandingPage home = new LandingPage(); // Initialize the page by calling its reference
            home.OpenWebsite();
            home.ClickCookieButton();

            // assert toevoegen (add the assertion)
            Assert.Null(home.AssertCookieButtonDisplayed());

            home.CloseBrowser();
        }
I know that something is wrong but I cannot see what. I also checked Google etc. Hope someone can help me. Thank you for your help.
-
ruby selenium element.click has different output with different environment
require 'selenium-webdriver'

caps = Selenium::WebDriver::Remote::Capabilities.firefox
caps['acceptInsecureCerts'] = true

@driver = Selenium::WebDriver.for(:firefox, desired_capabilities: caps)
@driver.navigate.to "https://s1.demo.opensourcecms.com/s/44"

el = @driver.find_element(:xpath, "//span[contains(text(),'Remove Frame')]").click
p el
Output with the following setup
2.6.0 :006 > @driver.find_element(:xpath,"//span[contains(text(),'Remove Frame')]").click => nil
Environment
- Mozilla Firefox 60.5.0
- ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]
- selenium-webdriver-3.141.0
Output with the following Setup
2.1.2 :006 > @driver.find_element(:xpath,"//span[contains(text(),'Remove Frame')]").click => "ok"
Environment
- Mozilla Firefox 52.2.0
- ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
- selenium-webdriver-2.53.4
-
I am not able to follow link in x-ray scraper and fetch the data
x(url, some_scope, {
  'title': 'title',
  'desc': x('.coloriginaljobtitle a@href', {
    desc: '.desc',
    apply_link: '.applyBtnTopDiv a@href'
  })
})
// callback for this x-ray call
((err, result) => {
  console.log(result)
});
This is my actual x-ray scraping code; I can't follow the link to the next page. Here's the catch: I can manually scrape that link and scrape its data, but I am not able to follow those particular links. I'm just getting the title, not the sub-parts, i.e. desc and apply_link.
-
list index out of range, trying to append element to list inside a function
So I got this code here:
def tickervals(ticker):
    tdvals = []
    for tr in soup.findAll('tr'):
        for td in tr.find_all("td", {'name': 'ju.cp.' + ticker + '.OSE'}):
            if not td.attrs.get('style'):
                tdvals.append(td.text)
        for td in tr.find_all("td", {'name': 'ju.volume.' + ticker + '.OSE'}):
            if not td.attrs.get('style'):
                tdvals.append(td.text)
    amountlink = requests.get("url" + ticker + ".remainer")
    amountsoup = BeautifulSoup(amountlink.content, 'html.parser')
    rows = amountsoup.find_all('tr')
    for row in rows:
        td_cells = amountsoup.find_all('td')
        tdvals.extend(td_cells[10].text)
    time.sleep(0.5)
    return(tdvals)

recdict = {ticker: tickervals(ticker) for ticker in tickers}
which raises "list index out of range" at tdvals.extend(td_cells[10].text) near the end of the function. Running this piece of code outside the function:
testlist = []
amountlink = requests.get("url for scraping")
amountsoup = BeautifulSoup(amountlink.content, 'html.parser')
rows = amountsoup.find_all('tr')
for row in rows:
    td_cells = amountsoup.find_all('td')
    testlist.append(td_cells[10].text)
testlist
works: testlist prints out the list with the appended elements scraped by the soup. So how can I get it to work inside the function mentioned first?
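Two differences worth noting between the two snippets: the function builds its URL from "url" + ticker + ".remainer", so for some tickers the fetched page may simply have fewer than 11 <td> cells (hence the IndexError), and the function uses extend, which, given a string, adds it character by character rather than as one element. A hedged sketch of a guard, using the names from the question:

rows = amountsoup.find_all('tr')
for row in rows:
    # look at the cells of this row rather than the whole document,
    # and skip rows that do not have an 11th cell
    td_cells = row.find_all('td')
    if len(td_cells) > 10:
        tdvals.append(td_cells[10].text)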
Thanks
-
Extract usernames of people that liked a picture on Instagram using Python?
I'm pretty new to web scraping and I would like to create a little Python script that extracts, into a list, all the usernames of people who have liked a particular picture on a public Instagram account.
I started looking at Selenium, which looks like a great tool, but I've also tried BeautifulSoup. All of this in Python.
For instance, what would be the steps to follow to get the list of likers' usernames of this picture: https://www.instagram.com/p/BuE9ZDPn8y_/ ?
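At a very high level, one Selenium-based approach is: load the post, click the element that opens the "liked by" dialog, scroll that dialog, and collect the profile links that appear. A rough sketch, with the caveat that every selector and link text below is an assumption, Instagram's markup changes frequently, and it may require being logged in:

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.instagram.com/p/BuE9ZDPn8y_/")
time.sleep(3)

# open the "liked by" dialog -- the link text is a guess and may differ per post/language
driver.find_element_by_partial_link_text("others").click()
time.sleep(2)

usernames = set()
dialog = driver.find_element_by_css_selector("div[role='dialog']")
for _ in range(10):  # scroll the dialog a few times so more likers load
    for link in dialog.find_elements_by_tag_name("a"):
        href = link.get_attribute("href") or ""
        if href.startswith("https://www.instagram.com/"):
            usernames.add(href.rstrip("/").split("/")[-1])
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight;", dialog)
    time.sleep(1)

print(usernames)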
Thanks for your help!
-
Scraping JSON data from e-commerce Ajax site with Python
Previously, I posted a question about how to get the data from an AJAX website, linked here: Scraping AJAX e-commerce site using python
I understand a bit about how to find the response using the Chrome F12 Network tab and do some coding with Python to display the data, but I can't find the specific API URL for it. The JSON data is not coming from a URL like on the previous website; it is embedded in the page itself (visible in Inspect Element in Chrome F12).
My real question is: how do I get ONLY the JSON data, using BeautifulSoup or anything related to it? Once I can extract just the JSON data from the application/ld+json script tags, I will parse it into something Python can recognise so that I can display the products in table form.
One more problem: after I run the code several times, the JSON data goes missing. I think the website is blocking my IP address. How do I solve this problem?
Here is the website link:
https://www.lazada.com.my/catalog/?_keyori=ss&from=input&page=1&q=h370m&sort=priceasc
Here is my code
from bs4 import BeautifulSoup
import requests
page_link = 'https://www.lazada.com.my/catalog/?_keyori=ss&from=input&page=1&q=h370m&sort=priceasc'
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")
print(page_content)
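Since the product data is embedded in the page inside script tags of type application/ld+json, one approach is to pull those tags out with BeautifulSoup and parse them with the json module; sending a browser-like User-Agent and pausing between requests may also reduce the chance of being blocked, though there is no guarantee. A sketch:

import json
import time

import requests
from bs4 import BeautifulSoup

page_link = 'https://www.lazada.com.my/catalog/?_keyori=ss&from=input&page=1&q=h370m&sort=priceasc'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}  # browser-like header

page_response = requests.get(page_link, headers=headers, timeout=5)
soup = BeautifulSoup(page_response.content, 'html.parser')

# every <script type="application/ld+json"> block, parsed into Python objects
for script in soup.find_all('script', type='application/ld+json'):
    data = json.loads(script.string)
    print(json.dumps(data, indent=2)[:500])  # preview the structure

time.sleep(2)  # pause before any follow-up request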
-
Jupyter notebook and BeautifulSoup4 installation
I have installed BeautifulSoup both using
pip install beautifulsoup4
and using
conda install -c anaconda beautifulsoup4
and I have also tried to install it directly from the Jupyter notebook using:
import pip

if int(pip.__version__.split('.')[0]) > 9:
    from pip._internal import main
else:
    from pip import main

def install(package):
    main(['install', package])

install('BeautifulSoup4')
When I try to import the module I get
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-8-9e5201d5ada7> in <module>
----> 1 import BeautifulSoup4

ModuleNotFoundError: No module named 'BeautifulSoup4'
I should say up front that I'm a noob at this: I always have trouble understanding where I should install new Python modules, and for some reason they always get installed everywhere except where I need them. I searched here and on Google but I could not find an answer that worked or that could set me on the right track.
Could some pro explain step by step how to install the modules correctly, so that I and other people who might read this can not only fix the problem but also better understand how it originated and how to fix similar problems in the future? Thanks
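Two details usually matter here, offered as a sketch rather than a full diagnosis: the package is installed under the name beautifulsoup4, but the module you import is bs4 (so "import BeautifulSoup4" will always fail), and the notebook must be running the same Python environment that pip/conda installed into. A quick check from inside the notebook:

import sys
print(sys.executable)   # which Python interpreter the notebook is actually using

# the package name on PyPI is beautifulsoup4, but the module you import is bs4
from bs4 import BeautifulSoup
print(BeautifulSoup("<p>hello</p>", "html.parser").p.text)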