How to parse an SEC EDGAR filing

I'm trying to parse an SEC filing that is stored as plain text but has XML and HTML markup embedded in it. This is what I have tried:

import requests
from bs4 import BeautifulSoup

page_link = ''
page_response = requests.get(page_link, proxies=proxyDict)
page_content = BeautifulSoup(page_response.content, "html.parser")

When I print page_content, it looks almost identical to the original file. What would be the best way to clean up page_content? Thanks.
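
For reference, printing a parsed document shows the markup again, so the output looking unchanged is expected; the text has to be extracted explicitly. With BeautifulSoup that would be something like page_content.get_text(). Below is a minimal sketch of the same idea using only the standard library's html.parser, so it runs without a network request. The names TextExtractor and strip_markup and the sample string are made up for illustration, and the sample is not a real filing.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping tags and script/style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside script/style.
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def strip_markup(raw):
    """Return the text content of raw, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical sample, loosely shaped like a filing with embedded HTML.
sample = (
    "<DOCUMENT>\n<TYPE>10-K\n<TEXT>\n"
    "<html><body><style>p {color: red}</style>"
    "<p>Annual report</p><p>Revenue &amp; income</p>"
    "</body></html>\n</TEXT>\n</DOCUMENT>"
)

print(strip_markup(sample))
```

In the code above, the equivalent step would be to call get_text() on page_content rather than printing the object itself. Whether that is clean enough for a full filing probably depends on how many embedded documents it contains.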