Selenium driver.get(url) doesn't seem to be getting html

Sorry for the vague title, but I couldn't find a better way to describe this. Here's the problem. EDIT: browser.get(url) doesn't seem to do anything. Here is the environment I'm on right now (uname -a output): Linux goorm 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

>>> from selenium import webdriver
>>> browser = webdriver.PhantomJS()
>>> browser.get('https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa')
>>> browser.page_source
u'<html><head></head><body></body></html>'
>>> browser.current_url
u'about:blank'

I suspect the problem is in the webdriver and want to debug it. How can I find out whether the driver is functioning? If it isn't the driver, what else could the problem be?

2 answers

  • answered 2018-03-11 13:49 M Dillon

    Unfortunately I'm away from the machine I have selenium installed on, so I can't test it myself, but I have a couple suggestions to try.

    First, you can try adding a sleep statement between the browser.get() call and the rest of the script. Some pages that load content dynamically need extra time after the get call. I usually prefer Selenium's implicit or explicit waits, but I don't know how well they would work with browser.page_source, since that attribute always exists, just with the wrong content at first.
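    For what it's worth, an explicit wait can be pointed at a custom condition rather than at an element, which sidesteps the page_source concern. Below is a minimal sketch, not something I have run against your setup. The names page_has_loaded and get_and_wait are my own, and the 10 second timeout is an arbitrary assumption. It treats the exact output from your session (about:blank plus an empty DOM) as "not loaded yet".

```python
# The two symptoms shown in the question's interactive session.
BLANK_URL = "about:blank"
EMPTY_DOM = "<html><head></head><body></body></html>"


def page_has_loaded(driver):
    """Return True once the driver has left the blank start page.

    `driver` only needs `current_url` and `page_source` attributes,
    so any Selenium WebDriver instance works here.
    """
    return (
        driver.current_url != BLANK_URL
        and driver.page_source.strip() != EMPTY_DOM
    )


def get_and_wait(driver, url, timeout=10):
    """Navigate to `url`, then block until page_has_loaded(driver) is true.

    Raises selenium.common.exceptions.TimeoutException if the page is
    still blank after `timeout` seconds (hypothetical default).
    """
    # Imported lazily so the predicate above is usable without Selenium.
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # WebDriverWait polls the callable, passing it the driver, until it
    # returns a truthy value or the timeout expires.
    WebDriverWait(driver, timeout).until(page_has_loaded)
    return driver.page_source
```

    If get_and_wait(browser, url) times out, the page never left about:blank, which points at the driver or the connection rather than at slow JavaScript rendering.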

    Second, you can try a different browser driver. Both Firefox and Chrome have headless options, so you won't have to deal with visible browser windows popping up (I'm assuming you don't want them, given your choice of PhantomJS).
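    A rough sketch of what switching to a headless browser could look like. It assumes a Selenium 3.8+ client (where the options keyword is accepted) and that chromedriver or geckodriver is already on your PATH. HEADLESS_FLAGS and make_headless_browser are names I made up for illustration.

```python
# Command-line flags that put each browser into headless mode.
HEADLESS_FLAGS = {"chrome": "--headless", "firefox": "-headless"}


def make_headless_browser(name="chrome"):
    """Return a headless Chrome or Firefox WebDriver instance.

    Raises KeyError for any browser name other than "chrome"/"firefox".
    Assumes the matching driver binary is discoverable on PATH.
    """
    flag = HEADLESS_FLAGS[name]  # validate the name before touching Selenium

    # Imported lazily so the lookup above works without Selenium installed.
    from selenium import webdriver

    if name == "chrome":
        options = webdriver.ChromeOptions()
        options.add_argument(flag)
        return webdriver.Chrome(options=options)

    options = webdriver.FirefoxOptions()
    options.add_argument(flag)
    return webdriver.Firefox(options=options)
```

    Usage would be the same as in your session: browser = make_headless_browser("firefox"), then browser.get(url) and browser.page_source. If that returns real HTML where PhantomJS returned an empty document, the problem is specific to PhantomJS.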

  • answered 2018-03-12 11:33 DebanjanB

    You haven't mentioned which Selenium Python client version and which PhantomJS binary version you are using.
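    One possible way to collect both version numbers is sketched below. It assumes Python 3.8+ (for importlib.metadata) and that the binary is called phantomjs and sits on your PATH. The helper names are hypothetical.

```python
import shutil
import subprocess
from importlib import metadata


def selenium_version():
    """Return the installed Selenium client version, or None if absent."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


def phantomjs_version(executable="phantomjs"):
    """Return the output of `phantomjs --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        # Binary is not on PATH; Selenium would fail to launch it too.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    print("selenium:", selenium_version())
    print("phantomjs:", phantomjs_version())
```

    A None for phantomjs would mean the binary is not where Selenium expects it, in which case passing executable_path explicitly is worth trying.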

    On my local Windows 8 machine, using Python v3.6.1, Selenium Python Client v3.10.0 and the phantomjs v2.1.1 binaries, I am able to retrieve the following, which looks correct:

    • Code :

      from selenium import webdriver
      
      browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
      browser.get('https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa')
      print(browser.page_source)
      print(browser.current_url)
      
    • Console Output :

       <!DOCTYPE html><html xmlns:cc="http://creativecommons.org/ns#" class="u-overflowHidden"><head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# medium-com: http://ogp.me/ns/fb/medium-com#"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=contain"><title>How to Scrape Javascript Rendered Websites with Python &amp; Selenium</title><link rel="canonical" href="https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa"><meta name="title" content="How to Scrape Javascript Rendered Websites with Python &amp; Selenium"><meta name="referrer" content="always"><meta name="description" content="On my quest to learn, I wanted to eventually be able to write beginner- friendly guides that really help make one feel like they can improve. Normally, we’ll get hit with very long documentations and…"><meta name="theme-color" content="#000000"><meta property="og:title" content="How to Scrape Javascript Rendered Websites with Python &amp; Selenium"><meta property="og:url" content="https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa"><meta property="og:image" content="https://cdn-images-1.medium.com/max/1200/1*-lqb_gM0ai9M4YniJRzWyQ.png"><meta property="fb:app_id" content="542599432471018"><meta property="og:description" content="In this guide:"><meta name="twitter:description" content="In this guide:"><meta name="twitter:image:src" content="https://cdn-images-1.medium.com/max/1200/1*-lqb_gM0ai9M4YniJRzWyQ.png"><link rel="publisher" href="https://plus.google.com/103654360130207659246"><link rel="author" href="https://medium.com/@hoppy"><meta property="author" content="Alex Hop"><meta property="og:type" content="article"><meta name="twitter:card" content="summary_large_image"><meta property="article:publisher" 
content="https://www.facebook.com/medium"><meta property="article:author" content="Alex Hop"><meta name="robots" content="index, follow"><meta property="article:published_time" content="2016-11-11T01:17:38.979Z"><meta name="twitter:site" content="@Medium"><meta property="og:site_name" content="Medium"><meta name="twitter:label1" value="Reading time"><meta name="twitter:data1" value="7 min read"><meta name="twitter:app:name:iphone" content="Medium"><meta name="twitter:app:id:iphone" content="828256236"><meta name="twitter:app:url:iphone" content="medium://p/c137892216aa"><meta property="al:ios:app_name" content="Medium"><meta property="al:ios:app_store_id" content="828256236"><meta property="al:android:package" content="com.medium.reader"><meta property="al:android:app_name" content="Medium"><meta property="al:ios:url" content="medium://p/c137892216aa"><meta property="al:android:url" content="medium://p/c137892216aa"><meta property="al:web:url" content="https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa"><link rel="search" type="application/opensearchdescription+xml" title="Medium" href="/osd.xml"><link rel="alternate" href="android-app://com.medium.reader/https/medium.com/p/c137892216aa"><script type="application/ld+json">{"@context":"http://schema.org","@type":"NewsArticle","image":{"@type":"ImageObject","width":1920,"height":481,"url":"https://cdn-images-1.medium.com/max/1920/1*-lqb_gM0ai9M4YniJRzWyQ.png"},"datePublished":"2016-11-11T01:17:38.979Z","dateModified":"2018-03-05T13:55:30.448Z","headline":"How to Scrape Javascript Rendered Websites with Python & Selenium","name":"How to Scrape Javascript Rendered Websites with Python & Selenium","keywords":["Python","Ubuntu","Selenium","Automated Testing","Web Scraping"],"author":{"@type":"Person","name":"Alex Hop","url":"https://medium.com/@hoppy"},"creator":["Alex 
Hop"],"publisher":{"@type":"Organization","name":"Medium","url":"https://medium.com/","logo":{"@type":"ImageObject","width":308,"height":60,"url":"https://cdn-images-1.medium.com/max/308/1*OMF3fSqH8t4xBJ9-6oZDZw.png"}},"mainEntityOfPage":"https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa"}</script><link rel="stylesheet" href="https://cdn-static-1.medium.com/_/fp/css/main-branding-base.hYiEpYs3x8GgQzREEhW49Q.css"><script>if (window.top !== window.self) window.top.location = window.self.location.href;var OB_startTime = new Date().getTime(); var OB_loadErrors = []; function _onerror(e) { OB_loadErrors.push(e) }; if (document.addEventListener) document.addEventListener("error", _onerror, true); else if (document.attachEvent) document.attachEvent("onerror", _onerror); function _asyncScript(u) {var d = document, f = d.getElementsByTagName("script")[0], s = d.createElement("script"); s.type = "text/javascript"; s.async = true; s.src = u; f.parentNode.insertBefore(s, f);}function _asyncStyles(u) {var d = document, f = d.getElementsByTagName("script")[0], s = d.createElement("link"); s.rel = "stylesheet"; s.href = u; f.parentNode.insertBefore(s, f); return s}(new Image()).src = "/_/stat?event=pixel.load&origin=" + encodeURIComponent(location.origin);</script><script>window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date; ga("create", "UA-24232453-2", "auto", {"allowLinker": true, "legacyCookieDomain": window.location.hostname}); ga("send", "pageview");</script><script async="" src="https://www.google-analytics.com/analytics.js"></script><script>(function () {var height = window.innerHeight || document.documentElement.clientHeight || document.body.clientHeight; var width = window.innerWidth || document.documentElement.clientWidth || document.body.clientWidth; document.write("<style>section.section-image--fullBleed.is-backgrounded {padding-top: " + Math.round(1.1 * height) 
+ "px;}section.section-image--fullScreen.is-backgrounded, section.section-image--coverFade.is-backgrounded {min-height: " + height + "px; padding-top: " + Math.round(0.5 * height) + "px;}.u-sizeViewHeight100 {height: " + height + "px !important;}.u-sizeViewHeight110 {height: " + Math.round(1.1 * height) + "px !important;}.u-sizeViewHeightMin100 {min-height: " + height + "px !important;}.u-sizeViewHeightMax100 {max-height: " + height + "px !important;}section.section-image--coverFade {height: " + height + "px;}.section-aspectRatioViewportPlaceholder, .section-aspectRatioViewportCropPlaceholder {max-height: " + height + "px;}.section-aspectRatioViewportBottomSpacer, .section-aspectRatioViewportBottomPlaceholder {max-height: " + Math.round(0.5 * height) + "px;}.zoomable:before {top: " + (-1 * height) + "px; left: " + (-1 * width) + "px; padding: " + height + "px " + width + "px;}</style>");})()</script><style>section.section-image--fullBleed.is-backgrounded {padding-top: 330px;}section.section-image--fullScreen.is-backgrounded, section.section-image--coverFade.is-backgrounded {min-height: 300px; padding-top: 150px;}.u-sizeViewHeight100 {height: 300px !important;}.u-sizeViewHeight110 {height: 330px !important;}.u-sizeViewHeightMin100 {min-height: 300px !important;}.u-sizeViewHeightMax100 {max-height: 300px !important;}section.section-image--coverFade {height: 300px;}.section-aspectRatioViewportPlaceholder, .section-aspectRatioViewportCropPlaceholder {max-height: 300px;}.section-aspectRatioViewportBottomSpacer, .section-aspectRatioViewportBottomPlaceholder {max-height: 150px;}.zoomable:before {top: -300px; left: -400px; padding: 300px 400px;}</style>
       .
       .
       .
       https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa