Data preparation with Python

I have a text file that I want to split into four parts, as indicated in the code. It always raises errors.

# First import pandas and the regex module
import pandas as pd
import numpy as np
import re

data = open("Discussion.txt", encoding="utf8")
contenu = data.read()
data.close()
print(contenu)

# Read the .txt file into a string
data = open("Discussion.txt", encoding="utf8")
string = data.read()
data.close()

# Split separate lines into a list of strings
splitstring = string.splitlines()

# For each list item find the data needed (with regex or indexing) 
# and assign to a dictionary
df = {}
for i in range(len(splitstring)):
    match = re.search(r'(.* .*) - (.*): (.*)',splitstring[1])

    line = {
    'Date' : splitstring[i][:10],
    'Time' : match.group(1),
    'Number' : match.group(2),
    'Text' : match.group(3)}
    df[i] = line

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-43a1f0fdf7c6> in <module>()
      8     line = {
      9     'Date' : splitstring[i][:10],
---> 10     'Time' : match.group(1),
     11     'Number' : match.group(2),
     12     'Text' : match.group(3)}

AttributeError: 'NoneType' object has no attribute 'group'

# Convert dictionary to pandas dataframe
dataframe = pd.DataFrame(df).T
#Finally send to csv
dataframe.to_csv(filepath)

File "<ipython-input-6-2b1b4e00c433>", line 3
    Finally send to csv
    ^
IndentationError: unexpected indent

Here is a preview of the output of print(contenu), as an image:

1 answer

  • answered 2018-09-21 19:39 RPG

    Read the file with open(), then call split() on the string that read() returns (it is a str by default). Which delimiter to use is up to you.
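
A minimal sketch of that approach, assuming each line looks like "<date> <time> - <sender>: <text>". That layout is implied by the regex in the question but is not confirmed by the file itself. re.search() and re.match() return None when a line does not fit the pattern, which is what produces the AttributeError in the question. The posted loop also searches splitstring[1] on every iteration rather than splitstring[i]. The names LINE_RE and parse_lines and the sample lines are hypothetical stand-ins for the contents of Discussion.txt.

```python
import re

# Assumed line shape (not confirmed by the question):
# "<date> <time> - <sender>: <text>"
LINE_RE = re.compile(r'^(\S+) (\S+) - ([^:]+): (.*)$')

def parse_lines(lines):
    """Return one dict per line that matches LINE_RE; skip the rest."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Blank lines or continuation lines of a multi-line message
            # do not match, so skip them instead of calling .group().
            continue
        date, time, number, text = match.groups()
        rows.append({'Date': date, 'Time': time,
                     'Number': number, 'Text': text})
    return rows

# Hypothetical sample standing in for open("Discussion.txt").read().splitlines()
sample = [
    "21/09/2018 19:39 - Alice: Hello",
    "a continuation line with no header",
    "21/09/2018 19:40 - Bob: Hi: how are you?",
]
rows = parse_lines(sample)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to pd.DataFrame(rows), and the frame written out with to_csv() using a real path in place of the undefined filepath variable. Whether non-matching lines should be skipped or appended to the previous message depends on the file, so treat the skip as a placeholder choice.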