Headless Chrome driver unable to fetch Instagram page in python Selenium driver
I tried to log in and scrape a few details from my Instagram page in Python. I want to do it in headless mode because I'm going to deploy it on Heroku. When I try to log in using this code with the headless Chrome driver, the Instagram login page is not fetched. I have also provided the screenshot.
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def login_insta(driver, username, password):
    driver.get("https://www.instagram.com/accounts/login")
    time.sleep(5)
    # Save what the headless browser actually rendered
    driver.save_screenshot('scrnsh.png')
    driver.find_element_by_xpath(
        "//input[@name='username']").send_keys(username)
    driver.find_element_by_xpath(
        "//input[@name='password']").send_keys(password)
    driver.find_element_by_xpath("//button/div[text()='Log In']").click()
    print("Logged in")


options = Options()
PATH = r"C:\Users\pcname\Downloads\chromedriver"
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(executable_path=PATH, chrome_options=options)
login_insta(driver, "name", "pass")
The screenshot said "Error Please wait a few minutes before you try again". This error doesn't occur with the headless Firefox driver, but I don't know how to add Firefox buildpacks on Heroku. I have a recent Chrome driver version. Please help me solve this issue.
Or if you can suggest buildpacks for Firefox for Heroku, and the steps to add them, it would be very helpful. Thank you!
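One thing that may be worth trying: headless Chrome reports "HeadlessChrome" in its default user agent, and some sites answer such requests differently. The sketch below assumes that this is what triggers the error page here, which the question does not confirm. The user agent string and the helper names headless_chrome_arguments and make_driver are made up for illustration.

```python
# Hypothetical desktop user agent; substitute the string your own browser sends.
DESKTOP_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
)


def headless_chrome_arguments(user_agent=DESKTOP_USER_AGENT):
    """Return the Chrome flags from the question plus an explicit user agent."""
    return [
        "--headless",
        "--disable-dev-shm-usage",
        "--no-sandbox",
        "--disable-gpu",
        f"--user-agent={user_agent}",
    ]


def make_driver(chromedriver_path):
    """Build a headless Chrome driver (Selenium 3 style, as in the question)."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(executable_path=chromedriver_path, options=options)
```

With this in place, login_insta(make_driver(PATH), "name", "pass") would run the same login flow. If the screenshot still shows the error, the cause is probably something other than the user agent.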
See also questions close to this topic
-
How to search for a selector whose class contains spaces with BeautifulSoup?
from bs4 import BeautifulSoup
import requests

def getPage(url):
    try:
        req = requests.get(url)
    except requests.exceptions.RequestException:
        return None
    return BeautifulSoup(req.text, 'html.parser')

bs = getPage('https://ssearch.oreilly.com/?q=python')
searchResults = bs.select('article.result')
searchResults[0].select('p.note')
[<p class="note">By Gabriele Lanaro</p>, <p class="note date2">Publish Date: August 04, 2016 </p>]
I want to obtain the paragraph with the class "note date2", but when I pass that to the select method it returns an empty list. I've also tried variations such as "note_date2" and "note-date2", but unfortunately I get the same result.
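A space inside a class attribute separates two class names, so class="note date2" means the element has both the class note and the class date2. In CSS selector syntax that is written by chaining the names with dots. A minimal sketch of the conversion follows; the helper name class_selector is made up for illustration.

```python
def class_selector(tag, class_attribute):
    """Turn a tag name and a class attribute value into a CSS selector.

    A space inside class="..." separates two class names, so
    class="note date2" becomes the compound selector "p.note.date2".
    """
    class_names = class_attribute.split()
    return tag + "".join("." + name for name in class_names)


selector = class_selector("p", "note date2")
print(selector)  # p.note.date2

# With the objects from the question (requires bs4 and network access):
# searchResults[0].select(selector)
```

Passing 'p.note.date2' (or just 'p.date2') to select should then match only the publish-date paragraph.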
-
How do I print dicts through multiple files?
I need to print a dict. Here's what my file2.py file looks like:
import json

dict = {}
file1 = open("dictfile.txt","w")
file1.write(json.dumps(dict))
file1.close()
Then my main file, where I use this:
from api import dict

dict["smthng"] = "hi"
print(dict["smthng"])
It prints hi, but my dictfile.txt looks like
{}
That is normal, but not what I was expecting. I want it to save the value assigned in main.py, but do it from file2.py.
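The file is written once, at the moment file2.py is imported, when the dict is still empty. Later assignments in main.py only change the object in memory. One way around this is to put the writing code in a function and call it after the dict has been changed. A minimal sketch, where save_dict, load_dict and settings are hypothetical names and a temporary directory stands in for the project folder:

```python
import json
import os
import tempfile


def save_dict(data, path):
    """Write the current contents of `data` to `path` as JSON."""
    with open(path, "w") as handle:
        json.dump(data, handle)


def load_dict(path):
    """Read a JSON file back into a dict."""
    with open(path) as handle:
        return json.load(handle)


# Demo in a temporary directory; "dictfile.txt" is the name from the question.
path = os.path.join(tempfile.mkdtemp(), "dictfile.txt")
settings = {}                 # avoid naming it `dict`, which shadows the built-in
settings["smthng"] = "hi"     # change the dict first ...
save_dict(settings, path)     # ... then write, so the file sees the change
print(load_dict(path))        # {'smthng': 'hi'}
```

In the layout from the question, save_dict would live in file2.py and main.py would call it after each change it wants persisted.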
-
How to sign with secp256k1 ECDSA in python with a personal private and public key
I have a private and public key that I would like to use for signing a str. I'm using Stark Bank's ECDSA. It works, but I need to be able to use my own keys for it to be accepted in streamr. I don't know any alternatives. If you have an easy-to-understand way, please share it with me. My code:
privateKey = PrivateKey()
publicKey = privateKey.publicKey()
signature = Ecdsa.sign(c, privateKey)
print(Ecdsa.verify(c, signature, publicKey))
c is the variable that gets signed. It gets hashed beforehand. I need to know how to use my own keys; starkbank does not give any more info on their GitHub, and I'm not at the level where I can read the source files directly. Thanks in advance!
-
NameError: name 'score' is not defined
if score < driver.find_element_by_xpath("/html/body/div[@id='body']/div[@id='inner']/blockquote[@class='success']/strong"):
NameError: name 'score' is not defined
How to avoid this error?
while True:
    driver.find_element_by_xpath("/html/body/div[@id='body']/div[@id='inner']/form[1]/blockquote[@class='success']/p[@class='center'][2]/a").click()
    Score = 8,363
    if score < driver.find_element_by_xpath("/html/body/div[@id='body']/div[@id='inner']/blockquote[@class='success']/strong"):
        break
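Python names are case-sensitive, so Score and score are two different variables, which explains the NameError. Two further points would bite next: 8,363 is the tuple (8, 363) rather than a number, and find_element_by_xpath returns an element, not its text. A minimal sketch of the comparison, assuming the page shows the score as text such as "9,120" (that value and the helper name parse_score are made up):

```python
def parse_score(text):
    """Convert text such as '8,363' into the integer 8363."""
    return int(text.replace(",", "").strip())


score = 8363             # not `Score = 8,363`, which creates the tuple (8, 363)
element_text = "9,120"   # stand-in for the .text of the element read from the page
if score < parse_score(element_text):
    print("target exceeded")
```

In the loop from the question, element_text would come from the .text attribute of the element returned by find_element_by_xpath.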
-
Screenshot is not displayed in Extend report in jenkins due to the prefix "http://localhost:8080/"
Condition: Screenshot files and report HTML file in the same folder, report file display screenshot properly from local.
Jenkins plug-in: HTML Publisher plugin
When I look at the report served from Jenkins and hover over the image src, it has the "http://localhost:8080" prefix, but the HTML code doesn't have this prefix.
ex: http://localhost:8080/project_PATH/target/surefire-reports/html/Screenshot.jpg If I remove this prefix, the screenshot is accessible.
Could you please suggest how to remove the "http://localhost:8080" prefix?
I've searched for answers and tried a lot, but nothing helps.
I'm using a Mac, running Java Selenium and Jenkins (http://localhost:8080) locally.
-
Python Selenium Firefox Driver Crash after closing the popup window
I have opened multiple popup windows using a Python Selenium script. When I close one of the popups, the driver object crashes, and after that I can no longer access the driver object. Can you please help me fix this issue?
Code Snippet:
driver.switch_to.window(driver.window_handles[2])
driver.close()
time.sleep(5)
driver.switch_to.window(driver.window_handles[1])
Error Output:
2021-04-21 09:55:47 ERROR The Exception in Virtual Media Testcases: Message: Failed to decode response from marionette <class 'selenium.common.exceptions.WebDriverException'> test_selenium.py 560 2
geckodriver.log:
1619015645365 webdriver::server DEBUG -> DELETE /session/c1318c5d-796b-4564-bdc9-68a95bb7cac4/window
1619015645366 Marionette TRACE 0 -> [0,638,"WebDriver:CloseWindow",{}]
1619015645391 Marionette DEBUG Received observer notification message-manager-disconnect
###!!! [Child][DispatchAsyncMessage] Error: PFilePicker::Msg___delete__ Route error: message sent to unknown actor ID
1619015645414 Marionette TRACE 0 <- [1,638,null,["2147483649","2147483658"]]
1619015645416 webdriver::server DEBUG <- 200 OK {"value":["2147483649","2147483658"]}
1619015645419 webdriver::server DEBUG -> GET /session/c1318c5d-796b-4564-bdc9-68a95bb7cac4/window/handles
1619015645419 Marionette TRACE 0 -> [0,639,"WebDriver:GetWindowHandles",{}]
1619015645420 Marionette TRACE 0 <- [1,639,null,["2147483649","2147483658"]]
1619015645420 webdriver::server DEBUG <- 200 OK {"value":["2147483649","2147483658"]}
1619015645421 webdriver::server DEBUG -> POST /session/c1318c5d-796b-4564-bdc9-68a95bb7cac4/window {"handle": "2147483649"}
1619015645422 Marionette TRACE 0 -> [0,640,"WebDriver:SwitchToWindow",{"handle":"2147483649","name":"2147483649"}]
1619015645422 Marionette TRACE 0 <- [1,640,null,{}]
1619015645422 webdriver::server DEBUG <- 200 OK {"value":null}
1619015645424 webdriver::server DEBUG -> GET /session/c1318c5d-796b-4564-bdc9-68a95bb7cac4/screenshot
1619015645424 Marionette TRACE 0 -> [0,641,"WebDriver:TakeScreenshot",{"full":false,"highlights":[],"id":null}]
[Parent 77210, Gecko_IOThread] WARNING: pipe error (55): Connection reset by peer: file /builddir/build/BUILD/firefox-60.1.0/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
###!!! [Parent][MessageChannel] Error: (msgtype=0x15007F,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x15007F,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
1619015645445 Marionette DEBUG Register listener.js for window 32
1619015645477 Marionette DEBUG Register listener.js for window 34
A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
1619015645539 Marionette DEBUG Received DOM event unload for [object XULDocument]
1619015645546 Marionette DEBUG Received observer notification message-manager-disconnect
1619015645548 Marionette TRACE 0 <- [1,641,null,{}]
1619015645565 webdriver::server DEBUG <- 500 Internal Server Error {"value":{"error":"unknown error","message":"Failed to find value field","stacktrace":""}}
1619015645567 webdriver::server DEBUG -> DELETE /session/c1318c5d-796b-4564-bdc9-68a95bb7cac4
1619015645569 webdriver::server DEBUG Deleting session
1619015645574 Marionette DEBUG Closed connection 0
1619015645657 Marionette DEBUG Received observer notification xpcom-will-shutdown
1619015645657 Marionette DEBUG New connections will no longer be accepted
Driver Information:
Selenium Version: 3.11.0
{
  "rotatable": false,
  "browserVersion": "60.1.0",
  "timeouts": {
    "pageLoad": 300000,
    "implicit": 0,
    "script": 30000
  },
  "acceptInsecureCerts": true,
  "moz:headless": false,
  "moz:geckodriverVersion": "0.26.0",
  "moz:webdriverClick": true,
  "moz:profile": "/tmp/rust_mozprofileEdeK7L",
  "moz:accessibilityChecks": false,
  "browserName": "firefox",
  "moz:useNonSpecCompliantPointerOrigin": false,
  "platformVersion": "3.10.0-862.6.3.el7.x86_64",
  "moz:processID": 359676,
  "pageLoadStrategy": "normal",
  "platformName": "linux"
}
-
Django Rest Framework - Attribute Error: 'function' has no attribute 'get_extra_actions'
Hi everybody.
I'm learning about Django REST framework and how to deploy on Heroku. I'm having this issue in my app and I have no idea how to solve it.
views.py:
from rest_framework import viewsets, status
from rest_framework.decorators import api_view
from rest_framework.views import Response

from api import models, serializers
from api.integrations.github import GithubApi


@api_view(['GET'])
class LibrarynViewSet(viewsets.ViewSet):
    queryset = models.Library.objects.all()
    serializer_class = serializers.Library(queryset, many=True)
    lookup_field = "name"

    def retrieve(self, request, login=None):
        return Response(serializers.data)
routes.py:
from django.urls import include, path
from rest_framework.routers import DefaultRouter

from api import views

routers = DefaultRouter()
routers.register("organization", views.LibraryViewSet, basename="Library")

urlpatterns = [
    path("", include(routers.urls)),
]
Error:
extra_actions = viewset.get_extra_actions()
AttributeError: 'function' object has no attribute 'get_extra_actions'
As I said, I'm learning, so I have no idea how to solve it.
I would appreciate your help. Thanks a lot.
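The error message says the router received a function where it expected a viewset class. api_view is meant for function-based views, so applying it to a class plausibly replaces the class with a function. The plain-Python sketch below illustrates that mechanism only: function_view_decorator is a hypothetical stand-in, not Django REST framework's actual code, and the two classes are made up.

```python
def function_view_decorator(func):
    """Hypothetical stand-in for a decorator meant for function-based views.

    Like such decorators, it returns a plain function, not a class.
    """
    def view(request, *args, **kwargs):
        return func(request, *args, **kwargs)
    return view


class PlainViewSet:
    """Undecorated class: still has the method a router would call."""

    @classmethod
    def get_extra_actions(cls):
        return []


@function_view_decorator
class DecoratedViewSet:
    """After decoration this name refers to a function, so the method is gone."""

    @classmethod
    def get_extra_actions(cls):
        return []


print(hasattr(PlainViewSet, "get_extra_actions"))      # True
print(hasattr(DecoratedViewSet, "get_extra_actions"))  # False
print(type(DecoratedViewSet).__name__)                 # function
```

If this is what happens here, removing @api_view(['GET']) from the viewset class should let the router see a class again. Note also that views.py defines LibrarynViewSet while routes.py registers views.LibraryViewSet, so the names would need to match as well.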
-
How to force HTTPS with create-react-app on heroku
I'm running a very basic react app on Heroku and would like to force https on the production server, while still running localhost on https. I cannot for the life of me figure out how to do this on Heroku.
SSL and domain has been setup and works fine when I manually enter https:// before the domain. But if I manually type http:// this also works.
How do I redirect all http traffic to https for my React app on Heroku? I would like this to happen:
- localhost:3000 still works in development
- https is forced on production domain
- create-react-app still runs without being "ejected"
Typically I'd set this up on the DNS, but Heroku only allows for a CNAME setup, so I cannot do this on the DNS level. I assume a custom server such as Express is required, but I cannot find any documentation on how to do this for production while still working normally in development/localhost.
-
Push issue with heroku
I didn't have any issue with git itself. I have committed all of my changes and code to my master branch, and pushed my commits to the Heroku app. Git reported that the commits were pushed successfully, so when I try to push again it always shows the message "Everything up-to-date". But when I go to the Heroku website, it shows no app and none of the commits I have made. What can be the problem?
-
How to get Instagram post's media url and caption in Python (Instaloader)
I want to get the media URL of each post a public user has posted on Instagram, along with its caption. I found out about Instaloader (https://instaloader.github.io/) some time back, but I have not been able to get it to work for my needs. Can anyone point me in the right direction on what to do?
Thank you!
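A minimal sketch of one way to do this with Instaloader's Python module, assuming the profile is public and no login is needed. The helper names collect_media and fetch_public_posts and the example.com sample data are made up. Posts that contain several images are only represented by their first media URL here.

```python
from types import SimpleNamespace


def collect_media(posts):
    """Return (media_url, caption) pairs for an iterable of post objects.

    Each post is expected to expose `url`, `caption`, `is_video` and
    `video_url`, as Instaloader's Post objects do.
    """
    results = []
    for post in posts:
        media_url = post.video_url if post.is_video else post.url
        results.append((media_url, post.caption))
    return results


def fetch_public_posts(username):
    """Return an iterator over a public profile's posts (needs instaloader)."""
    import instaloader  # imported lazily so collect_media works without it

    loader = instaloader.Instaloader()
    profile = instaloader.Profile.from_username(loader.context, username)
    return profile.get_posts()


# Offline demo with stand-in objects instead of real posts.
sample_posts = [
    SimpleNamespace(url="https://example.com/a.jpg", caption="first",
                    is_video=False, video_url=None),
    SimpleNamespace(url="https://example.com/b.jpg", caption=None,
                    is_video=True, video_url="https://example.com/b.mp4"),
]
print(collect_media(sample_posts))
```

For a real account, collect_media(fetch_public_posts("some_public_user")) would walk the profile's posts. Instagram may rate-limit anonymous requests, so expect this to be slow or to stop partway on large profiles.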
-
Bypass cookie agreement page while web scraping using Python
I am facing an issue with Google's cookie agreement page when scraping a Google redirect URL.
I am trying to scrape different pages under the Google News URI, but when I run this code:
req = requests.get(url, headers=headers)
with
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/534.1 (KHTML, like Gecko) Chrome/6.0.422.0 Safari/534.1',
    'Upgrade-Insecure-Requests': '1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'DNT': '1',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'it-IT'
}
and, for example,
URL = https://news.google.com/./articles/CAIiEMb3PYSjFFVbudiidQPL79QqGQgEKhAIACoHCAow-ImTCzDRqagDMKiIvgY?hl=it&gl=IT&ceid=IT%3Ait
the "request.content" is the HTML code of Google's cookie agreement page.
I have tried also to convert the redirect link into a normal link but the response gives me the redirect link to this
I have the same problem related to this question (How can I bypass a cookie agreement page while web scraping using Python?).
Anyway, the solution proposed in that works only for the specific site.
Note: the entire code worked until a few weeks ago.
-
how to access dynamic button on html page for web scraping using Python-Django
I am making a web scraping tool using Python, where I am trying to access the leads from the Indiamart website. My code returns only 10 to 14 leads after extraction, but when I check the source code manually I can see that more than 14 leads are present.
There is a 'show more results' tab on the web page which gives more results. How can I make my Django project click on the 'show more results' tab to fetch more leads?
-
How do I write my Dockerfile to include chromedriver?
I am a newbie to Dockerfiles as well as Selenium. I am working on web scraping using Selenium and taking a screenshot, and I am trying to dockerize it. This question of mine seems to have been answered in a few other questions, but those did not solve my error. FYI, I am using a Windows laptop.
The screenshot code works on my local machine but dockerfile seems to be giving me errors.
I am trying to use this version of chromedriver=89.0.4389.82
This is my UPDATED Dockerfile,
FROM python:3.6

RUN pip install --upgrade pip && pip install pytest && pip install pytest-mock && pip install pytest-smtp && pip install mock \
    pip install schedule && pip install selenium && pip install Selenium-Screenshot && pip install python-dateutil

# For running code
COPY src/screenshotcode.py /

RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list
RUN apt-get update -y
RUN apt-get install -y google-chrome-stable
RUN apt-get install libxi6 libgconf-2-4 -y

ENV CHROMEDRIVER_VERSION 2.19
ENV CHROMEDRIVER_DIR /chromedriver
RUN mkdir -p $CHROMEDRIVER_DIR

# Download and install Chromedriver
RUN wget -q --continue -P $CHROMEDRIVER_DIR "http://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip"
RUN unzip $CHROMEDRIVER_DIR/chromedriver* -d $CHROMEDRIVER_DIR

# Put Chromedriver into the PATH
ENV PATH $CHROMEDRIVER_DIR:$PATH

CMD [ "python", "screenshotcode.py" ]
My screenshot code,
import time
from Screenshot import Screenshot_Clipping
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from email_it import email_it
from environmental_variables import environmental_variables
from error_alert_email import error_alert_email
from selenium import webdriver


def screenshot():
    ob=Screenshot_Clipping.Screenshot()
    chrome_options = Options()
    chrome_options.add_argument('--start-maximized')
    chrome_options.add_argument('--start-fullscreen')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")
    print('taking screenshot...')
    img_url=ob.full_Screenshot(driver, path = path, image_name = label)
    print('closing driver...')
    driver.close()

screenshot()
EDIT: I get the following error
PS C:\Users\me\Documents\Projects\> docker run screenshot
  File "scheduler.py", line 16, in <module>
    from screenshot import screenshot
  File "/screenshotcode.py", line 72, in <module>
    screenshot()
  File "/screenshotcode.py", line 32, in screenshot
    driver = webdriver.Chrome()
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.19.346067 (6abd8652f8bc7a1d825962003ac88ec6a37a82f1),platform=Linux 5.4.72-microsoft-standard-WSL2 x86_64)
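Two details in the material above stand out. The Dockerfile pins CHROMEDRIVER_VERSION to 2.19, and the traceback confirms chromedriver=2.19.346067, while the question names 89.0.4389.82 as the intended version. Also, chrome_options is built but never passed to webdriver.Chrome, so --headless and --no-sandbox never take effect. Recent ChromeDriver releases are numbered to match the Chrome major version they support, so a quick sanity check along the following lines may help. It is a sketch only: the helper names are made up and the second driver version string is purely illustrative.

```python
def major_version(version):
    """Return the leading integer of a dotted version string."""
    return int(version.split(".")[0])


def same_major(chrome_version, chromedriver_version):
    """True when browser and driver share a major version number."""
    return major_version(chrome_version) == major_version(chromedriver_version)


print(same_major("89.0.4389.82", "2.19"))           # False
print(same_major("89.0.4389.82", "89.0.4389.23"))   # True
```

Whether the version mismatch or the unused options object is the actual cause of "Chrome failed to start" is an assumption; both are cheap to rule out.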
-
ChromeDriver v90 does not give correct URL when connecting to remote-debugging port through the ChromeDriverService
We use ChromeDriver in C# to connect to existing instances of Chrome that have the remote debugging port 9222 set. Here is how we connect:
var svc = ChromeDriverService.CreateDefaultService(path);
ChromeOptions options = new ChromeOptions();
options.DebuggerAddress = "127.0.0.1:9222";
var driver = new ChromeDriver(svc, options);
var url = driver.Url;
The problem is that the value of driver.Url is not what it used to be when using ChromeDriver version 88.
In that version and all earlier ones, driver.Url was the URL of the currently active tab in Chrome. So if Chrome had five tabs open and tab 4 was active, the Url was that of tab 4. And that made sense. Once we upgraded to version 90, that is no longer the case. The value of Url is now unclear: sometimes it is the last active tab, sometimes some other tab, sometimes the first. I do not see a pattern.
Is this an error in ChromeDriver? In the past, whatever was the active tab was the one that driver.Url yielded. Now it's indeterminate which wreaks havoc with our code.
Update: If I have two tabs open, then driver.Url and driver.Title are for the tab that was active just before the current one. So always the other tab. With 3 tabs it may be the second-to-last active tab. This feels like an off-by-one error within an internal array of tabs.
-
Google Chrome cannot read and write to its data directory : selenium
Here's the issue I am facing now.
I could launch ChromeDriver before. However, my Selenium code suddenly stopped working, and Chrome now pops up the error quoted in the title.
I hope someone can shed some light on this, as I couldn't find a solution online.