How to scrape HTML rendered by JavaScript

I need to write an automated scraper that can handle websites rendered by JavaScript (like YouTube), as well as sites that merely use some JavaScript in their HTML to generate content (such as the copyright year). For such sites, downloading the HTML source makes no sense, because it isn't the final markup that users actually see.

I use Python with Selenium and WebDriver, so that I can execute JavaScript on a given website. My code for that purpose is:

import os
from selenium import webdriver

def execute_javascript_on_website(self, js_command):
    # geckodriver is expected in ./executables/ next to this file
    driver = webdriver.Firefox(firefox_options=self.webdriver_options, executable_path=os.path.dirname(os.path.abspath(__file__)) + '/executables/geckodriver')
    driver.get(self.url)

    try:
        return driver.execute_script(js_command)

    except Exception as exception_message:
        # Any error is silently ignored, so the method returns None on failure
        pass

    finally:
        driver.close()

Where js_command = "return document.documentElement.outerHTML;".

With this code I'm able to get the source code, but not the rendered one. I can use js_command = "return document;" (as I would in the console), but then I get a <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="5a784804-f623-3041-9840-03f13ce83f53", element="585b43a1-f3b2-1e4a-b348-4ddaf2944550")> object that contains the HTML, with no apparent way to get it out.

Does anyone know how to get the HTML rendered by JavaScript (ideally as a string) using Selenium? Or some other technique that would do it?

1 answer

  • answered 2018-11-11 13:51 Rumpelstiltskin Koriat

    driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
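    If the returned string still looks like the unrendered source, one possible cause (an assumption, not something confirmed in the question) is that the script runs before the page's JavaScript has finished changing the DOM. A minimal sketch of a workaround is to poll the serialized DOM until it stops changing. The helper name get_rendered_html and its parameters settle_seconds and timeout are made up for illustration; the only Selenium call it relies on is driver.execute_script:

```python
import time

# Same script as in the question: serialize the live DOM, not the downloaded source
RENDERED_HTML_JS = "return document.documentElement.outerHTML;"


def get_rendered_html(driver, settle_seconds=0.5, timeout=10.0):
    """Return the serialized DOM once two consecutive snapshots are identical.

    `driver` is any object with an execute_script(script) method, such as a
    Selenium WebDriver. If the DOM never settles before `timeout` seconds
    have passed, the most recent snapshot is returned.
    """
    deadline = time.monotonic() + timeout
    previous = driver.execute_script(RENDERED_HTML_JS)

    while time.monotonic() < deadline:
        # Give the page's scripts some time to modify the DOM
        time.sleep(settle_seconds)
        current = driver.execute_script(RENDERED_HTML_JS)
        if current == previous:
            # No change since the last snapshot: treat the page as rendered
            return current
        previous = current

    return previous
```

    Call it after driver.get(url), for example html = get_rendered_html(driver). This is only a heuristic: a page that keeps updating itself (timers, animations, lazy loading) may never settle, in which case the function simply returns the last snapshot when the timeout expires. Selenium's driver.page_source property is another way to read the current page as a string.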