Find comments in JavaScript and CSS with the Python requests module

I am trying to find all the comments in JavaScript and CSS files. The code below finds HTML comments in HTML pages:

import requests
from bs4 import BeautifulSoup as BS
from bs4 import Comment

with requests.Session() as session:
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0'}
    # Send the request through the session; verify=False skips TLS certificate checks
    resp = session.get('https://example.com/page.js', verify=False, headers=headers)
    response = resp.text
    # Parse the body as HTML and collect every Comment node
    soup = BS(response, 'html.parser')
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))

    for c in comments:
        print(c)

But in JavaScript and CSS, comments are written between /* and */. Is there any way I can modify this code to retrieve JavaScript or CSS comments?

1 answer

  • answered 2021-05-03 18:12 motyzk

    I am not familiar enough with BeautifulSoup, but you can locate the comments by calling response.find('/*') and response.find('*/') in a loop. Pass find's second parameter (the start index) so that each search for the next comment begins only after the end of the previous one.
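
A minimal sketch of that loop, assuming the downloaded text is already in a string. The function name find_block_comments and the sample string are made up for illustration; only str.find and slicing are used.

```python
def find_block_comments(source):
    """Return every /* ... */ span in source, delimiters included."""
    comments = []
    pos = 0
    while True:
        # Look for the next opening delimiter at or after pos.
        start = source.find('/*', pos)
        if start == -1:
            break
        # Look for the closing delimiter after the opening one.
        end = source.find('*/', start + 2)
        if end == -1:
            break  # unterminated comment: stop scanning
        comments.append(source[start:end + 2])
        # Resume searching only after the end of this comment.
        pos = end + 2
    return comments


# Hypothetical sample standing in for response (the text of the fetched file).
sample = "a { color: red; } /* first */ b { } /* second\nline */"
for comment in find_block_comments(sample):
    print(comment)
```

In the question's code, this would be called as find_block_comments(response) in place of the BeautifulSoup lines. It only covers /* ... */ comments; JavaScript's // line comments would need a separate search.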

    Disclaimer: /* or */ can still appear as ordinary text (for example, inside a string literal) rather than as a comment delimiter. That case is trickier to handle.
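
One possible way to handle the string-literal case is to walk the text character by character and skip over quoted strings before looking for /*. This is only a sketch under simplifying assumptions: the name find_comments_outside_strings is hypothetical, only single- and double-quoted strings with backslash escapes are recognised, and JavaScript regex literals, template literals, and // comments are not handled.

```python
def find_comments_outside_strings(source):
    """Collect /* ... */ comments, ignoring delimiters inside quoted strings."""
    comments = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch in ('"', "'"):
            # Skip a quoted string, honouring backslash escapes.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                i += 2 if source[i] == '\\' else 1
            i += 1  # step past the closing quote
        elif source.startswith('/*', i):
            end = source.find('*/', i + 2)
            if end == -1:
                break  # unterminated comment: stop scanning
            comments.append(source[i:end + 2])
            i = end + 2
        else:
            i += 1
    return comments


# Hypothetical sample: the first /* ... */ sits inside a string literal.
sample_js = 'var s = "/* not a comment */"; /* real comment */'
print(find_comments_outside_strings(sample_js))
```

For anything beyond simple files, a real JavaScript or CSS tokenizer is likely more reliable than a hand-written scanner like this one.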