Data preparation with Python

I have a text file that I want to split into four parts, as indicated in the code below, but it always raises errors.

# First import pandas and the regex module
import pandas as pd
import numpy as np
import re

# Read the .txt file into a string
data = open("Discussion.txt", encoding="utf8")
contenu = data.read()
data.close()

# Split separate lines into a list of strings
splitstring = contenu.splitlines()

# For each list item, find the data needed (with regex or indexing)
# and assign it to a dictionary
df = {}
for i in range(len(splitstring)):
    match = re.search('(.* .*) - (.*): (.*)', splitstring[i])

    line = {
    'Date' : splitstring[i][:10],
    'Time' : match.group(1),
    'Number' : match.group(2),
    'Text' : match.group(3)}
    df[i] = line

AttributeError                            Traceback (most recent call last)
<ipython-input-5-43a1f0fdf7c6> in <module>()
      8     line = {
      9     'Date' : splitstring[i][:10],
---> 10     'Time' : match.group(1),
     11     'Number' : match.group(2),
     12     'Text' : match.group(3)}

AttributeError: 'NoneType' object has no attribute 'group'

# Convert the dictionary to a pandas DataFrame
dataframe = pd.DataFrame(df).T
# Finally, send to CSV

File "<ipython-input-6-2b1b4e00c433>", line 3
    Finally send to csv
IndentationError: unexpected indent

Here is a preview of the output of print(contenu), as an image:

1 answer

  • answered 2018-09-21 19:39 RPG

    Read the file with open(), then call split() on the resulting string. The content is read as a string (str) by default. Which delimiter to choose is up to you.
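
    To illustrate, here is a minimal sketch of the read-then-split approach that also avoids the AttributeError from the question. re.search returns None for any line that does not fit the pattern, so calling .group() on that result fails; the sketch skips such lines. The regex is the one from the question. The helper name parse_lines, the sample text, and the assumption that the date and time are separated by a single space are all hypothetical, since the real layout of Discussion.txt is not shown.

```python
import re

# Pattern taken from the question: "<date> <time> - <number>: <text>"
LINE_RE = re.compile(r'(.* .*) - (.*): (.*)')


def parse_lines(text):
    """Return one dict per line that matches LINE_RE; skip the others."""
    rows = []
    for raw in text.splitlines():
        match = LINE_RE.search(raw)
        if match is None:
            # re.search returns None when a line does not fit the pattern;
            # calling .group() on None raises the AttributeError in the question.
            continue
        stamp, number, message = match.groups()
        # Assumption: the date and the time are separated by the last space
        date, _, time = stamp.rpartition(' ')
        rows.append({'Date': date, 'Time': time,
                     'Number': number, 'Text': message})
    return rows


# Hypothetical sample; the real Discussion.txt may look different
sample = (
    "21/09/2018 19:39 - 0601020304: Hello there\n"
    "a continuation line without a header\n"
    "21/09/2018 19:40 - 0605060708: Fine, thanks\n"
)
rows = parse_lines(sample)
```

    With a real file, the text would come from open("Discussion.txt", encoding="utf8").read() instead of the sample string. pd.DataFrame(rows) accepts a list of dictionaries, and the resulting frame's to_csv method can write it out. Note that the pattern is greedy, so a message that itself contains ": " or " - " may be split in the wrong place.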