Parsing brackets in HTML with Python

I am trying to parse some information that's in a var meta window, and I am a little confused about how to grab just the value for the "id".

My code is below:

import requests
from bs4 import BeautifulSoup as bs
from colorama import Fore

url = input("\n\nEnter URL: ")
print(Fore.MAGENTA + "\nSetting link . .  .")


def printID():
    print("")
session = requests.session()
response = session.get(url)
soup = bs(response.text, 'html.parser')
form = soup.find('script', {'id' : 'ProductJson-product-template'})
scripts = soup.findAll('id')

#get the id
'''
for scripts in form:
    data = soup.find_all()
    print data
    '''

print(form)

printID()

This prints:

<script id="ProductJson-product-template" type="application/json">
    {"id":463448473639,"title":"n/a","handle":"n/a","description":"n/a"}
  </script>

Again, I want to print just the value of the id (463448473639).

2 answers

  • answered 2018-01-11 19:57 DaLord

    It looks like you are going to want to do something like:

    import json
    # Use `form` (the <script> tag found in the question); soup.findAll('id') matches no tags
    product_id = json.loads(form.get_text())['id']
    

    I haven't tested that, but if you want what is between the script tags, I think that is the way to do it. See the get_text documentation.
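
The idea above can be sketched as a self-contained script. This is a minimal sketch, not tested against the asker's real page: it assumes the markup looks like the output shown in the question, and the names `html_doc`, `script_tag`, `product`, and `product_id` are chosen here for illustration. It reads the tag's contents with `.string` rather than `get_text()`, because some Beautiful Soup releases return an empty string from `get_text()` on a script tag.

```python
import json

from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question.
html_doc = """<script id="ProductJson-product-template" type="application/json">
    {"id":463448473639,"title":"n/a","handle":"n/a","description":"n/a"}
  </script>
"""

soup = BeautifulSoup(html_doc, "html.parser")
script_tag = soup.find("script", {"id": "ProductJson-product-template"})

# .string returns the tag's single text child, here the raw JSON payload.
# json.loads ignores the surrounding whitespace.
product = json.loads(script_tag.string)

product_id = product["id"]
print(product_id)
```

With a live page, `html_doc` would be replaced by `response.text` from the question's code.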

  • answered 2018-01-11 20:02 Gaurang Shah

    You can retrieve all of the tag's attributes using the following syntax.

    form.attrs 
    

    If you are looking for a specific attribute, it's a dictionary.

    form['id']
    

    The full code is below.

    from bs4 import BeautifulSoup
    
    
    html_doc="""<script id="ProductJson-product-template" type="application/json">
        {"id":463448473639,"title":"n/a","handle":"n/a","description":"n/a"}
      </script>
    """
    
    soup = BeautifulSoup(html_doc, 'html.parser')
    print(soup.find("script").attrs)
    print(soup.find("script")['id'])
    

    However, if you want the value of id from the tag's inner text, {"id":463448473639,"title":"n/a","handle":"n/a","description":"n/a"}, one way to do it is as below.

    import ast
    innerText = soup.find("script").getText()
    print(innerText)
    print(ast.literal_eval(innerText.strip()).get("id"))
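
One caveat with `ast.literal_eval` is that it parses Python literals, not JSON, so it only works here because the sample contains nothing but numbers and strings. The sketch below compares it with `json.loads` on a payload that includes JSON's `true` and `null`. The `available` and `vendor` fields are hypothetical, added only for illustration; they do not appear in the question's data.

```python
import ast
import json

# Hypothetical payload: the question's id and title plus two invented fields.
payload = '{"id":463448473639,"title":"n/a","available":true,"vendor":null}'

# json.loads understands JSON literals and maps them to True / None.
product = json.loads(payload)
product_id = product["id"]

# ast.literal_eval treats `true` and `null` as unknown names and rejects them.
try:
    ast.literal_eval(payload)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False

print(product_id, literal_eval_ok)
```

For text that is known to be JSON, as the `type="application/json"` attribute suggests here, `json.loads` is likely the safer choice.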