I have an issue with my crawler and I want to ask
It's really stupid, I know, but I want to find out what library I need to download so that this import won't give me an error: from basic_crawler.items import BasicCrawlerItem. Or point out anything else you imagine I am doing wrong. Thanks for your time.
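If this is a Scrapy project (the import path suggests it is), there is most likely nothing to download: basic_crawler.items is a module of your own project, and BasicCrawlerItem is the item class that `scrapy startproject basic_crawler` generates in items.py. A sketch of the expected layout, assuming the standard Scrapy project structure (adjust names if yours differ):

```
basic_crawler/            <- project root, run scrapy from here
    scrapy.cfg
    basic_crawler/
        __init__.py
        items.py          <- defines class BasicCrawlerItem(scrapy.Item)
        settings.py
        spiders/
            __init__.py
            your_spider.py
```

If items.py is missing, or the class inside it has a different name, you get exactly this kind of import error; running the spider from outside the directory containing scrapy.cfg can cause it too, because the basic_crawler package is then not importable.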
See also questions close to this topic
Selenium Python: unable to scroll down while fetching Google reviews
I am trying to fetch Google reviews with the help of Selenium in Python. I have imported webdriver from the selenium Python module and initialized self.driver as follows:
self.driver = webdriver.Chrome(executable_path="./chromedriver.exe",chrome_options=webdriver.ChromeOptions())
After this I use the following code to type, on the Google homepage, the name of the company whose reviews I need; for now I am trying to fetch reviews for "STANLEY BRIDGE CYCLES AND SPORTS LIMITED ":
company_name = self.driver.find_element_by_name("q")
company_name.send_keys("STANLEY BRIDGE CYCLES AND SPORTS LIMITED ")
time.sleep(2)
After this I click the Google search button, and finally I am on the page where I can see the results. Now I want to click the "View all Google reviews" link, for which I use the following code:
self.driver.find_element_by_link_text("View all Google reviews").click()
time.sleep(2)
Now I am able to get reviews, but only 10. I need at least 20 reviews for a company. For that I am trying to scroll the page down using the following code:
self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
Even with the code above to scroll down the page, I still get only 10 reviews, and no error is raised.
I need help scrolling the page down so that at least 20 reviews load; as of now I get only 10. Based on my online search for this issue, people have mostly used driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") to scroll the page down whenever required, but for me this is not working: I checked the height of the page before and after the call, and it is the same.
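The unchanged page height is the clue: on the Google reviews overlay the reviews usually sit in their own scrollable pane, so window.scrollTo scrolls the outer document, which never grows. A sketch of scrolling the inner pane instead, using the same old find_element_by_* Selenium API as the question; the CSS selector is a hypothetical placeholder that you must look up in DevTools on the live page:

```python
import time

# Hypothetical selector for the scrollable reviews pane; inspect the live
# page in DevTools to find the real one.
REVIEWS_PANE_SELECTOR = "div.review-dialog-list"

def load_more_reviews(driver, rounds=5, pause=2):
    """Repeatedly scroll the inner reviews pane to its bottom so that
    additional batches of reviews are lazily loaded."""
    pane = driver.find_element_by_css_selector(REVIEWS_PANE_SELECTOR)
    for _ in range(rounds):
        # Scroll the pane element itself, not the window.
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight;", pane)
        time.sleep(pause)
```

Each round scrolls the pane to its current bottom and waits for the lazy-load request to finish; increase rounds until enough reviews are present.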
Create a new column from iterated rows of datetime data
I am attempting to create a downward-velocity model for offshore drilling which uses the variables Depth (which increases every 1 foot) and DateTime data, which is recorded at every foot of depth but at irregular time intervals:
Dept    DateTime
1141    5/24/2017 04:31
1142    5/24/2017 04:32
1143    5/24/2017 04:40
1144    5/24/2017 04:42
1145    5/25/2017 04:58
I am trying to get something like this, where the Velocity for each row is the depth increment divided by the DateTime gap.
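Assuming the data is in a pandas DataFrame with the DateTime column parsed as real timestamps, Series.diff gives both the depth increment and the time gap row by row, and dividing them yields the velocity (here in feet per minute). Column names are taken from the question; the first row is NaN because it has no previous row to diff against:

```python
import pandas as pd

df = pd.DataFrame({
    "Dept": [1141, 1142, 1143, 1144, 1145],
    "DateTime": pd.to_datetime([
        "5/24/2017 04:31", "5/24/2017 04:32", "5/24/2017 04:40",
        "5/24/2017 04:42", "5/25/2017 04:58",
    ]),
})

# Depth increment (ft) divided by the time gap (minutes) between rows.
minutes = df["DateTime"].diff().dt.total_seconds() / 60.0
df["Velocity"] = df["Dept"].diff() / minutes
```

No explicit iteration is needed: diff() is vectorized over the whole column, which also keeps the code fast on long drilling logs.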
One-to-one mapping in a shell script
I am in the process of a migration: moving from an old set of servers to a new set, where there is no logical relationship between the server names of the two sets. I have a script that runs on an old server and takes all necessary backups, and then I run another script to copy the backups to the new server and execute it.
I could combine both scripts (taking the backup and copying it to the new server) if I could include logic that maps each old server to its new server. Is there a way I can do this?
Old server    New server
King          Queen
Bat           Ball
water         fire
sand          rock
What I am expecting is: if the script is run on server 'King', it should identify that the corresponding new server is 'Queen' and copy the backups to Queen.
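One simple way, sketched below, is a lookup function that hard-codes the table in a case statement and keys it off the current hostname. The server names are the ones from the table above; it is an assumption that `hostname` on each old server prints exactly those names:

```shell
#!/bin/sh

# Map an old server name to its new server; prints "unknown" if the name
# is not in the table.
map_server() {
    case "$1" in
        King)  echo "Queen" ;;
        Bat)   echo "Ball"  ;;
        water) echo "fire"  ;;
        sand)  echo "rock"  ;;
        *)     echo "unknown" ;;
    esac
}

new_server=$(map_server "$(hostname)")
echo "Backups from $(hostname) go to: $new_server"
```

The same one script can then be deployed to every old server unchanged. If the table grows large, an alternative is a plain `old:new` lookup file read with grep or awk, so the mapping can be edited without touching the script.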
How to use Heritrix in my Java code to crawl
First of all, I'm sorry for my bad English. I have a Java program that reads a file from my system containing several URLs, and I want to send them to Heritrix to crawl and return the results to me. My problem is that I don't know how to use it: I have downloaded the Heritrix jar file and added it to my project. Thanks guys, it's very important.
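Heritrix is normally run as its own service rather than called as an in-process library: you create a crawl job, put the start URLs into that job's seeds.txt file, and then build and launch the job from the Heritrix console (Heritrix 3 also exposes a REST interface for this). So the part your Java code can do directly is feed your URL file into the job's seeds file. A minimal sketch, assuming a hypothetical job directory path that you must replace with your own:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SeedWriter {

    // Copy the URLs from an input file into a Heritrix job's seeds.txt.
    // Heritrix reads its start URLs ("seeds") from that file when the job
    // is built and launched.
    public static void writeSeeds(Path urlFile, Path seedsFile) throws IOException {
        Files.write(seedsFile, Files.readAllLines(urlFile));
    }

    public static void main(String[] args) throws IOException {
        // Both paths are assumptions; point them at your own URL list and
        // at the job directory you created in the Heritrix console.
        Path urls = Paths.get("urls.txt");
        Path seeds = Paths.get("heritrix/jobs/myjob/seeds.txt");
        if (Files.exists(urls) && Files.exists(seeds.getParent())) {
            writeSeeds(urls, seeds);
        }
    }
}
```

After the seeds file is in place, launch the job from the Heritrix console and collect the crawl output from the job directory; the results do not come back as a Java return value.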
Submitted URL blocked by robots.txt
In the last few weeks Google has been reporting an error in the Search Console: more and more of my pages are not allowed to be crawled, and the Coverage report says "Submitted URL blocked by robots.txt".
As you can see, my robots.txt is ultra simple, so I am at a loss as to why this error occurs for about 20% of my pages.
User-agent: *
Disallow: /cgi-bin/
Allow: /
Sitemap: https://www.theartstory.org/sitemapindex.xml
Host: https://www.theartstory.org
Example pages which show the error:
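You can reproduce how a standards-following crawler reads these rules with Python's standard-library urllib.robotparser. The URLs below are hypothetical examples, not taken from the question; with these rules, only paths under /cgi-bin/ should be blocked, which suggests the affected pages are not actually disallowed by the rules themselves (a stale cached copy of robots.txt or a temporary fetch failure on Google's side is then worth investigating):

```python
from urllib.robotparser import RobotFileParser

# The rules from the question; the Sitemap and Host lines do not affect
# can_fetch() decisions.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Allow: /
Sitemap: https://www.theartstory.org/sitemapindex.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Hypothetical URLs, for illustration only.
print(rp.can_fetch("*", "https://www.theartstory.org/artist/some-page.htm"))
print(rp.can_fetch("*", "https://www.theartstory.org/cgi-bin/test"))
```

Running the same check against the live file (RobotFileParser.set_url plus read) for each URL that Search Console flags will show whether the currently served robots.txt really blocks them.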
How can I build a spider like Screaming Frog?
I want to build an SEO spider/crawler for my final-year project. I am working in Python with Scrapy, but I still don't know how to build an SEO spider like Screaming Frog.
Please help/guide me and tell me how I can build such a thing.
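At its core, an SEO spider does two things per page: extract the on-page SEO signals (title, meta description, headings) and collect the links to crawl next. A minimal sketch of that extraction step using only the standard library; in a Scrapy project this logic would live inside your spider's parse() callback (the class and field names here are my own, not from any existing tool):

```python
from html.parser import HTMLParser

class SeoParser(HTMLParser):
    """Collect the page-level data an SEO spider typically reports:
    <title>, the meta description, h1 headings, and outgoing links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.h1s = []
        self.links = []
        self._in_title = False
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self._in_h1 = True
        elif tag == "meta" and attrs.get("name", "").lower() == "description":
            self.description = attrs.get("content", "")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_h1:
            self.h1s.append(data)
```

Feed each fetched page's HTML to an instance of this parser, report the extracted fields (flagging missing titles or descriptions, as Screaming Frog does), and queue the collected links as new requests; Scrapy then handles the fetching, deduplication, and politeness for you.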