Python PDF Files Text Extraction

I am trying to extract specific pieces of text from multiple lines of a PDF file in Python. A typical line looks like this:

  • VNDLY, a Mason, Ohio-based cloud-based work management system, raised $35 million in Series B funding. ABC Ventures, XYZ Capital, and ABC Fund participated in the round.

For a line like this, I would like to get the information in a table format, as below:

NAME   FUNDING       SERIES       BOLD NAME1     BOLD NAME2      BOLD NAME3
VNDLY  $35 million   Series B     ABC Ventures   XYZ Capital     ABC Fund

I have been able to extract the NAME, but I can't get the remaining fields to come together, and having multiple such lines makes it harder. I would appreciate any help with this.
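A minimal sketch of one way the remaining fields could be pulled out of a single entry with a regular expression. It assumes every entry follows the same sentence pattern as the example above ("NAME, description, raised $X in SERIES funding. INVESTORS participated ..."). The names `ENTRY_RE`, `INVESTOR_SPLIT_RE`, and `parse_entry` are hypothetical, and the pattern has only been checked against that one example, so real entries will likely need adjustments.

```python
import re

# Assumed entry shape (taken from the single example in the question):
#   "<name>, <description>, raised <amount> in <series> funding. <investors> participated ..."
ENTRY_RE = re.compile(
    r"^(?P<name>[^,]+),"                                   # name: text before the first comma
    r".*?\braised\s+"
    r"(?P<funding>\$[\d.,]+(?:\s+(?:thousand|million|billion))?)"
    r"\s+in\s+(?P<series>.+?)\s+funding\."                 # e.g. "Series B"
    r"\s*(?P<investors>.+?)\s+participated\b",             # investor list before "participated"
    re.DOTALL,
)

# Split "A, B, and C" or "A and B" into individual names.
# Caveat: an investor whose own name contains " and " would be split as well.
INVESTOR_SPLIT_RE = re.compile(r",\s*(?:and\s+)?|\s+and\s+")


def parse_entry(entry):
    """Return a flat dict of table columns for one entry, or None if it does not match."""
    # Drop surrounding whitespace and any leading bullet marker.
    text = entry.strip().lstrip("•- ").strip()
    match = ENTRY_RE.search(text)
    if match is None:
        return None

    row = {
        "NAME": match.group("name").strip(),
        "FUNDING": match.group("funding"),
        "SERIES": match.group("series").strip(),
    }
    investors = INVESTOR_SPLIT_RE.split(match.group("investors"))
    for i, investor in enumerate((p.strip() for p in investors if p.strip()), start=1):
        row[f"BOLD NAME{i}"] = investor
    return row


sample = (
    "VNDLY, a Mason, Ohio-based cloud-based work management system, "
    "raised $35 million in Series B funding. ABC Ventures, XYZ Capital, "
    "and ABC Fund participated in the round."
)
print(parse_entry(sample))
```

Because each entry becomes a flat dict, a list of these dicts can be passed straight to `DataFrame(...)`; entries with fewer investors simply leave the extra `BOLD NAME` columns empty.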

pip install PyPDF2

import PyPDF2
from pandas import DataFrame

# Note: PdfFileReader, numPages, getPage and extractText are the legacy PyPDF2 names.
# PyPDF2 3.x replaces them with PdfReader, len(reader.pages), reader.pages[i] and extract_text().
pdfFileObj = open('TSC.pdf', 'rb')

pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
numberPages = pdfReader.numPages
print(numberPages)

# Define a function to extract the names of the startups and write them to an Excel spreadsheet

def startupName_TS():
    startup = []

    for i in range(numberPages):
        page = pdfReader.getPage(i)
        content = page.extractText()
        # Flatten the page into one string so entries wrapped across lines are rejoined
        my_string = content.strip().replace('\n', '').replace('\r', '')
        # Split the page into individual entries on '- '
        lines = my_string.split('- ')
        for line in lines:
            # The startup name is everything before the first comma
            index1 = line.find(',')
            if index1 >= 0:
                startup.append(line[0:index1])
                print(line[0:index1])

    df = DataFrame({'Startup Name': startup})
    df.to_excel('Venture Analysis.xlsx', sheet_name='sheet1', index=False)

startupName_TS()
pdfFileObj.close()