Turning a dataset into a nested tuple

I have a .txt dataset that I need to convert into a nested tuple. Here's part of the dataset:

Morocco, Rabat, 0.75 million|Algeria: 1000 KM,  Tunisia: 400 KM, Western Sahara: 50 KM|North Atlantic Ocean
Western Sahara, , 1.51 million|Morocco: 50 KM, Mauritania: 350 KM|North Atlantic Ocean
Senegal, Dakar, 0.5 million|Mauritania: 50 KM, Guinea Bissau: 30 KM, Guinea: 30 KM, Mali: 30 KM|North Atlantic Ocean, South Atlantic Ocean
...

The tuple needs to have the following layout:

(countryname, capital, population, ((neighbor1, distance1), (neighbor2, distance2), ...), (waterbody1, ...))

I'm most of the way there, except that the nested neighbor and water body tuples only have the first of multiple values, like this:

('Senegal', 'Dakar', 0.5, ['Mauritania', 50 KM], ['North Atlantic Ocean'])

But it should look like this:

('Senegal', 'Dakar', 0.5, (('Mauritania', 50), ('Guinea Bissau', 30), ('Guinea', 30), ('Mali', 30)), ('North Atlantic Ocean', 'South Atlantic Ocean'))
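
To make the target layout concrete, here is a small sketch that writes the expected Senegal record as a Python literal and unpacks it. The variable names are illustrative only.

```python
# Expected Senegal record from the question, as a Python literal.
record = ('Senegal', 'Dakar', 0.5,
          (('Mauritania', 50), ('Guinea Bissau', 30), ('Guinea', 30), ('Mali', 30)),
          ('North Atlantic Ocean', 'South Atlantic Ocean'))

# Five top-level fields: three scalars, then two nested tuples.
name, capital, population, neighbors, water_bodies = record

# Each neighbor entry is itself a (name, distance) pair.
for neighbor, distance in neighbors:
    print(f"{name} -> {neighbor}: {distance} KM")

print(len(neighbors), "neighbors;", ", ".join(water_bodies))
```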

Here's my code:

def readData(fileHandle):
    
    ds = ()

    for line in fileHandle:
        row = line.split('|')

        countryname, capital, population = row[0].split(", ")
        population = float(population.strip(" million"))

        ngbr_list = row[1].split(', ')
        for neighbor in ngbr_list:
            ## change neighbor from 'nbrX:distX KM' to 'nbrx', float(distx)
            n_name, distance = neighbor.split(': ')
            distance = float(distance.strip(' KM')) 
           
        water_body_list = row[2].split(', ')

        ds = ds + ((countryname, capital, population, (n_name, distance), (water_body_list)),)
    return ds

What am I doing wrong here?

2 answers

  • answered 2020-10-16 05:05 naam

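    You are not storing each neighbour's name and distance. Every pass through the inner loop overwrites n_name and distance, so after the loop only the last pair is left (for Senegal, Mali and 30). Store every (n_name, distance) pair in a list instead, then add that list to the dataset as a tuple.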
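
    A minimal sketch of the difference, using a few of the Senegal neighbours as sample input (the names pairs, last_only and collected are illustrative, not from the question):

```python
pairs = ["Mauritania: 50 KM", "Guinea Bissau: 30 KM", "Mali: 30 KM"]

# Question's pattern: each pass rebinds n_name and distance,
# so after the loop only the last pair is still available.
for neighbor in pairs:
    n_name, distance = neighbor.split(': ')
    distance = float(distance.strip(' KM'))
last_only = (n_name, distance)

# Answer's pattern: append every pair to a list while looping.
collected = []
for neighbor in pairs:
    n_name, distance = neighbor.split(': ')
    collected.append((n_name, float(distance.strip(' KM'))))

print(last_only)         # ('Mali', 30.0)
print(tuple(collected))  # (('Mauritania', 50.0), ('Guinea Bissau', 30.0), ('Mali', 30.0))
```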

    Here is the revised code:

    def readData(fileHandle):
        
        ds = ()
    
        for line in fileHandle:
            # rstrip('\n') keeps the newline out of the last water body name
            row = line.rstrip('\n').split('|')
            countryname, capital, population = row[0].split(", ")
            population = float(population.strip(" million"))
    
            neighbor_distance_list = []  # collects every (name, distance) pair
            ngbr_list = row[1].split(', ')
            for neighbor in ngbr_list:
                ## change neighbor from 'nbrX: distX KM' to ('nbrX', float(distX))
                n_name, distance = neighbor.split(': ')
                distance = float(distance.strip(' KM'))
                neighbor_distance_list.append((n_name, distance))
    
            water_body_list = row[2].split(', ')
    
            # convert both lists to tuples to match the required layout
            ds = ds + ((countryname, capital, population, tuple(neighbor_distance_list), tuple(water_body_list)),)
        return ds
    

    Output is:

    (('Morocco', 'Rabat', 0.75, (('Algeria', 1000.0), ('Tunisia', 400.0), ('Western Sahara', 50.0)), ('North Atlantic Ocean Western Sahara', '', '1.51 million')), ('Senegal', 'Dakar', 0.5, (('Mauritania', 50.0), ('Guinea Bissau', 30.0), ('Guinea', 30.0), ('Mali', 30.0)), ('North Atlantic Ocean', 'South Atlantic Ocean')))
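
    To try this without creating a file, any iterable of lines works, for example io.StringIO. The sketch below is a variant of the answer's approach, not the answerer's code; read_data is a hypothetical name. It strips the trailing newline (which would otherwise stay attached to the last water body when reading a real file), splits on ',' and strips each piece so the double space before "Tunisia" in the question's Morocco line is tolerated, and takes the number with split()[0] instead of str.strip().

```python
import io

def read_data(lines):
    """Sketch: parse 'country, capital, N million|nbr: D KM, ...|water, ...'
    lines into a tuple of nested tuples."""
    records = []
    for line in lines:
        line = line.strip()              # drop the trailing '\n'
        if not line:                     # skip blank lines
            continue
        head, neighbors, waters = line.split('|')
        # Split on ',' and strip each piece, so an empty capital becomes ''.
        country, capital, population = (part.strip() for part in head.split(','))
        population = float(population.split()[0])   # '0.75 million' -> 0.75
        neighbor_pairs = []
        for item in neighbors.split(','):
            name, dist = item.split(':')
            # '  Tunisia' -> 'Tunisia', ' 400 KM' -> 400.0
            neighbor_pairs.append((name.strip(), float(dist.split()[0])))
        water_bodies = tuple(w.strip() for w in waters.split(','))
        records.append((country, capital, population, tuple(neighbor_pairs), water_bodies))
    return tuple(records)

sample = io.StringIO(
    "Morocco, Rabat, 0.75 million|Algeria: 1000 KM,  Tunisia: 400 KM, Western Sahara: 50 KM|North Atlantic Ocean\n"
    "Western Sahara, , 1.51 million|Morocco: 50 KM, Mauritania: 350 KM|North Atlantic Ocean\n"
    "Senegal, Dakar, 0.5 million|Mauritania: 50 KM, Guinea Bissau: 30 KM, Guinea: 30 KM, Mali: 30 KM|North Atlantic Ocean, South Atlantic Ocean\n"
)
data = read_data(sample)
for record in data:
    print(record)
```

    With a real file the call is the same: pass the open file handle, since iterating over it yields one line at a time.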

  • answered 2020-10-16 05:15 RootTwo

    str.strip(chars) does not remove a suffix from a string. It treats its argument as a set of characters and removes any combination of them from both ends of the string. That is, population.strip('ilmno ') is the same as population.strip(' million'). Both would remove ' million' from the end of population, but they would also turn 'ten million' into 'te'.
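
    A short sketch of the difference. It assumes Python 3.9 or later, where str.removesuffix was added; on older versions, population.split()[0] is one alternative for this data.

```python
text = 'ten million'

# str.strip(chars) treats chars as a SET of characters and removes any run of
# them from both ends, so ' million' and 'ilmno ' behave identically.
stripped = text.strip(' million')
same_set = text.strip('ilmno ')

# str.removesuffix (Python 3.9+) removes the exact string once, only at the end.
removed = text.removesuffix(' million')
population = float('0.75 million'.removesuffix(' million'))

print(repr(stripped), repr(same_set), repr(removed), population)
# 'te' 'te' 'ten' 0.75
```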