Introduction to word extraction from PDF [Python]

I am about to begin my master's thesis (in finance and accounting) and want to get started with programming in Python. I've only "coded" in LaTeX previously, so I have no real experience with Python (I have learned some basics, however). For my project, I need to extract words from a lot of PDFs (50,000+), from one specific "section" that is the same across all the PDFs.
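
To make the extraction step concrete, here is a rough sketch of what I imagine it looks like. It assumes the section can be found by the heading that starts it and the heading that follows it; the headings "Risk Factors" and "Outlook" and the function names are made up for illustration. Reading the PDF itself relies on the third-party pypdf package, which I have not tested on my files, so the example below runs on a plain string instead.

```python
import re


def read_pdf_text(path):
    """Return the text of all pages in one PDF (needs the third-party pypdf package)."""
    from pypdf import PdfReader  # imported here so the rest runs without pypdf

    reader = PdfReader(path)
    # extract_text() may return nothing for image-only pages, hence the "or ''"
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_section(text, start_heading, end_heading):
    """Return the text between two headings, or None if the start heading is missing."""
    pattern = re.compile(
        re.escape(start_heading) + r"(.*?)(?:" + re.escape(end_heading) + r"|\Z)",
        re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def tokenize(section):
    """Lowercase the section and split it into words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", section.lower())


# Stand-in for the text of one PDF; the headings are hypothetical.
sample = "Intro\nsome text\nRisk Factors\nCurrency risk is high.\nOutlook\nmore text"
section = extract_section(sample, "Risk Factors", "Outlook")
words = tokenize(section)
print(words)
```

For the real files, the idea would be to loop over the PDF paths, call `read_pdf_text` on each one, and pass the result to `extract_section`. Scanned PDFs without a text layer would probably need OCR first.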

Can someone point me in the right direction for this? The whole point is to create word vectors that can be used to compare the similarity between these sections, and then to group/cluster them so that the most similar ones end up in the same group.
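
As far as I understand it, the comparison and grouping step could look roughly like the sketch below. It assumes each section has already been turned into a list of words, represents each one as a simple word-count vector, and compares them with cosine similarity. The greedy grouping with a 0.5 threshold is only a placeholder I made up to show the idea, not a real clustering algorithm; I assume something like TF-IDF weighting and k-means (for example from scikit-learn) would be the proper tools.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(vectors, threshold=0.5):
    """Greedy grouping: a document joins the first group whose first member
    is at least `threshold` similar to it, otherwise it starts a new group."""
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Made-up word lists standing in for three extracted sections.
docs = [
    ["currency", "risk", "is", "high"],
    ["currency", "risk", "is", "low"],
    ["audit", "fees", "increased"],
]
vectors = [Counter(words) for words in docs]
groups = group_by_similarity(vectors)
print(groups)  # lists of document indices, one list per group
```

Here the first two documents share three of their four words, so they end up in the same group, while the third shares none and forms its own.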

I would really like to learn about this area, and would greatly appreciate it if someone could point me to some YouTube videos, links, etc. that can help me understand how to do it. :)