IndexError: List index out of range when returning a variable

I'm just starting to learn the Google Cloud Platform (Cloud Functions in particular). I wrote a simple Python scraper using BeautifulSoup, but it returns an error and I can't figure out why.

from bs4 import BeautifulSoup
import requests

def hello_world(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """

    url = 'https://example.com/'
    req = requests.get(url, headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'})
    html = req.text
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.title
    print(title)
    return title

When I print the title of the scraped page, it shows up in the logs fine. When I return the variable, though, the logs report "IndexError: list index out of range". Returning soup.prettify() also works fine.

This is the traceback I get in the GCP logs:

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/flask/app.py", line 1953, in full_dispatch_request
    return self.finalize_request(rv)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/flask/app.py", line 1968, in finalize_request
    response = self.make_response(rv)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/flask/app.py", line 2117, in make_response
    rv = self.response_class.force_type(rv, request.environ)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/werkzeug/wrappers/base_response.py", line 269, in force_type
    response = BaseResponse(*_run_wsgi_app(response, environ))
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/werkzeug/wrappers/base_response.py", line 26, in _run_wsgi_app
    return _run_wsgi_app(*args)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/werkzeug/test.py", line 1123, in run_wsgi_app
    return app_iter, response[0], Headers(response[1])
IndexError: list index out of range

1 answer

  • answered 2021-06-19 16:54 Fabix

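    The problem is probably the return value rather than the scraping: soup.title is a BeautifulSoup Tag object, not a string, and Flask cannot turn it into a response. Return the tag's text instead.

    By the way, try this code; it may be easier to understand: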

    from bs4 import BeautifulSoup
    import requests
    
    url = 'https://stackoverflow.com'
    
    def title_scraper(url):
        # Fetch the page with the same User-Agent header as in the question.
        req = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"})
        soup = BeautifulSoup(req.content, 'html.parser')
    
        # soup.title is a Tag object; get_text() returns its text as a plain str.
        return soup.title.get_text()
    
    title = title_scraper(url)
    print(title)
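
The traceback ends in werkzeug's run_wsgi_app, which suggests Flask tried to treat the returned object as a WSGI application. A likely explanation is that a Tag is callable (calling a tag is shorthand for find_all), so Flask calls it, gets an empty result back, and response[0] fails. Below is a minimal sketch of the same fix applied to the Cloud Function from the question. The names extract_title, URL and HEADERS are introduced here for illustration only, and the timeout value is an arbitrary assumption.

```python
from bs4 import BeautifulSoup
import requests

# Assumption: same placeholder URL and User-Agent header as in the question.
URL = 'https://example.com/'
HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}


def extract_title(html):
    """Return the page title as a plain str ('' if the page has no <title>)."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.title  # a bs4 Tag, or None when there is no <title> element
    return tag.get_text() if tag is not None else ''


def hello_world(request):
    """HTTP Cloud Function entry point, named as in the question."""
    req = requests.get(URL, headers=HEADERS, timeout=10)
    # Return a str, which Flask can convert into a response;
    # returning the Tag object itself is what appears to trigger the IndexError.
    return extract_title(req.text)
```

If the markup itself is wanted rather than just the text, str(soup.title) should also work, since it is a plain string like the output of soup.prettify().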