Python: find an element from an array in a pandas DataFrame

I have a CSV file (the prefix can be any string):

prefix,path
pref1,path1
pref2,path2

and these files:

pref1_file.txt
pref2_file.txt
pref3_file.txt 

I want to get the path of each file based on its prefix.

Expected result for this example:

pref1_file.txt : path1
pref2_file.txt : path2
pref3_file.txt : path_not_found

Here is my code:

import os

import pandas as pd

dirName = 'C:\\Users\\TEST\\Desktop\\Test'

# get all files in all folders
listOfFiles = list()

for (dirpath, dirnames, filenames) in os.walk(dirName):
    listOfFiles += [os.path.join(dirpath, file) for file in filenames]

# dir_path is assumed to be defined elsewhere (the folder containing file.csv)
df = pd.read_csv(dir_path + 'file.csv')

for elem in listOfFiles:
    file_name = os.path.basename(elem)
    for index, row in df.iterrows():
        if file_name.startswith(row['prefix']):
            print(file_name + ":" + row['path'])
        else:
            print(file_name + ":" + "path_not_found")

It works without the else branch, but I need to display "path_not_found" when the prefix is not found in the CSV file.

Thanks

3 answers

  • answered 2022-01-19 16:46 Vivek Kalyanarangan

    Use:

    dict(zip(files, pd.Series(files).str.split('_').str[0].map(df1.set_index('prefix')['path']).fillna('path_not_found')))
    

    Output

    {'pref1_file.txt': 'path1',
     'pref2_file.txt': 'path2',
     'pref3_file.txt': 'path_not_found'}
    

    Here, files is the list of file names (the basenames of listOfFiles in your code) and df1 is the DataFrame read from the CSV file.

    Explanation

    • Convert files to pd.Series
    • Split by _ and take the first part
    • Use pandas map to get the path
    • Convert to dict
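
    The steps above can be written out one at a time. This is only a sketch: the DataFrame is built in memory to mirror the sample CSV rather than read from disk, and the intermediate names (names, prefixes, lookup, paths) are illustrative. It also assumes the prefix itself contains no underscore, since the split keeps only the text before the first "_".

```python
import pandas as pd

# Illustrative data mirroring the question; in practice df1 comes from pd.read_csv.
df1 = pd.DataFrame({"prefix": ["pref1", "pref2"], "path": ["path1", "path2"]})
files = ["pref1_file.txt", "pref2_file.txt", "pref3_file.txt"]

# 1. Convert the file names to a Series.
names = pd.Series(files)

# 2. Split on "_" and keep the first part as the candidate prefix.
prefixes = names.str.split("_").str[0]

# 3. Look each prefix up in a prefix -> path Series; unknown prefixes become NaN,
#    which fillna then replaces with the fallback text.
lookup = df1.set_index("prefix")["path"]
paths = prefixes.map(lookup).fillna("path_not_found")

# 4. Pair each file name with its path.
result = dict(zip(files, paths))
print(result)
```

    If a prefix can contain "_", the split no longer isolates it, and a startswith comparison is the safer choice.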

  • answered 2022-01-19 17:06 Henry

    Try this:

    import os

    import pandas as pd

    dirName = 'C:\\Users\\TEST\\Desktop\\Test'

    # get all files in all folders
    listOfFiles = list()

    for (dirpath, dirnames, filenames) in os.walk(dirName):
        listOfFiles += [os.path.join(dirpath, file) for file in filenames]

    df = pd.read_csv(dir_path + 'file.csv')

    for elem in listOfFiles:
        file_name = os.path.basename(elem)
        # keep only the rows whose prefix starts the file name
        df_prefix = df[df['prefix'].apply(lambda x: file_name.startswith(x))]
        if df_prefix.size > 0:
            # .iloc[0] takes the first match by position; .loc[0] raises a
            # KeyError when the matching row does not have index label 0
            print(df_prefix['prefix'].iloc[0] + ":" + file_name)
        else:
            print(file_name + ": Not found")
    

    https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#selection-by-callable

  • answered 2022-01-20 08:30 Med_siraj

    To complete @Henry's solution:

    df_prefix = df[df['prefix'].apply(lambda x: file_name.startswith(x))]
    if df_prefix.size > 0:
         print(file_name + " : " + df_prefix['path'].iloc[0])
    else:
         print(file_name + ": path_not_found")
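
    The same lookup can be wrapped in a small function so it is easy to test without touching the file system. This is a sketch, not part of the original answers: find_path is a hypothetical helper name, the DataFrame is built in memory, and the extra "pref1_x" row is made up to show one possible tie-break (prefer the longest matching prefix) when several prefixes match.

```python
import pandas as pd


def find_path(file_name, df, default="path_not_found"):
    """Return the path of the longest prefix that starts file_name, else default."""
    # One boolean per row: True where the row's prefix starts the file name.
    mask = [file_name.startswith(p) for p in df["prefix"]]
    matches = df.loc[mask]
    if matches.empty:
        return default
    # Several prefixes may match; pick the longest (most specific) one.
    best = matches["prefix"].str.len().idxmax()
    return matches.loc[best, "path"]


# Illustrative data: the two rows from the question plus one made-up row.
df = pd.DataFrame(
    {"prefix": ["pref1", "pref2", "pref1_x"], "path": ["path1", "path2", "path1x"]}
)

for name in ["pref1_file.txt", "pref2_file.txt", "pref3_file.txt", "pref1_x_file.txt"]:
    print(name + " : " + find_path(name, df))
```

    Because it compares with startswith instead of splitting on "_", this variant also works when the prefix contains underscores.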
    
