Python: find an element from an array in a pandas DataFrame

Time:01-20

I have a CSV file:

prefix,path
pref1,path1
pref2,path2

and these files:

pref1_file.txt
pref2_file.txt
pref3_file.txt 

I want to get the path of each file based on its prefix.

Expected result for this example:

pref1_file.txt : path1
pref2_file.txt : path2
pref3_file.txt : path_not_found

Here is my code:

import os
import pandas as pd

dirName = 'C:\\Users\\TEST\\Desktop\\Test'

# get all files in all folders
listOfFiles = list()

for (dirpath, dirnames, filenames) in os.walk(dirName):
    listOfFiles += [os.path.join(dirpath, file) for file in filenames]

df = pd.read_csv(dir_path + 'file.csv')

for elem in listOfFiles:
    file_name = os.path.basename(elem)
    for index, row in df.iterrows():
        if file_name.startswith(row['prefix']):
            print(file_name + ":" + row['path'])
        else:
            print(file_name + ":" + "path_not_found")

It works without the else condition, but I need to display "path_not_found" when the prefix is not found in the CSV file.

Thanks

CodePudding user response:

Use:

dict(zip(files, pd.Series(files).str.split('_').str[0].map(df1.set_index('prefix')['path']).fillna('path_not_found')))

Output

{'pref1_file.txt': 'path1',
 'pref2_file.txt': 'path2',
 'pref3_file.txt': 'path_not_found'}

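Here, files is the list of file names (the basenames of the entries in listOfFiles in your code), and df1 is the DataFrame read from the CSV file.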

Explanation

  • Convert files to pd.Series
  • Split by _ and take the first part
  • Use pandas map to get the path
  • Convert to dict
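
The steps above can be sketched one at a time as follows. This is only a minimal sketch: lookup_paths is a hypothetical helper name, and the DataFrame is built inline from the sample data in the question instead of being read from file.csv.

```python
import pandas as pd


def lookup_paths(files, df1, default="path_not_found"):
    """Map each file name to the path registered for its prefix."""
    # Step 1: convert the list of file names to a Series
    names = pd.Series(files, dtype="object")
    # Step 2: split on "_" and keep the first part as the prefix
    prefixes = names.str.split("_").str[0]
    # Step 3: look each prefix up in a prefix -> path Series;
    # prefixes missing from the CSV become NaN and are filled with the default
    paths = prefixes.map(df1.set_index("prefix")["path"]).fillna(default)
    # Step 4: pair the file names with the paths found for them
    return dict(zip(files, paths))


# Sample data from the question (assumed to match the contents of file.csv)
df1 = pd.DataFrame({"prefix": ["pref1", "pref2"], "path": ["path1", "path2"]})
files = ["pref1_file.txt", "pref2_file.txt", "pref3_file.txt"]

result = lookup_paths(files, df1)
print(result)
```

Note that this approach assumes every prefix ends at the first underscore of the file name. A prefix that itself contains "_" would not be matched this way.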

CodePudding user response:

Try this:

import os
import pandas as pd

dirName = 'C:\\Users\\TEST\\Desktop\\Test'

# get all files in all folders
listOfFiles = list()

for (dirpath, dirnames, filenames) in os.walk(dirName):
    listOfFiles += [os.path.join(dirpath, file) for file in filenames]

df = pd.read_csv(dir_path + 'file.csv')

for elem in listOfFiles:
    file_name = os.path.basename(elem)
    # keep the rows whose prefix starts file_name (selection by callable)
    df_prefix = df[lambda d: d['prefix'].map(file_name.startswith)]
    if not df_prefix.empty:
        # .iloc[0] takes the first match whatever its index label is
        print(file_name + " : " + df_prefix['path'].iloc[0])
    else:
        print(file_name + ": Not found")

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#selection-by-callable
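
If only the lookup is needed, the same startswith test can also be run against a plain dictionary built from the two columns. The following is a sketch under the assumption that the CSV has the prefix and path columns shown in the question. find_path is a hypothetical helper name, and the DataFrame is built inline here instead of being read from file.csv.

```python
import pandas as pd


def find_path(file_name, prefix_to_path, default="path_not_found"):
    """Return the path of the first prefix that file_name starts with."""
    return next(
        (path for prefix, path in prefix_to_path.items()
         if file_name.startswith(prefix)),
        default,  # returned when no prefix matches
    )


# Sample data from the question (assumed to match the contents of file.csv)
df = pd.DataFrame({"prefix": ["pref1", "pref2"], "path": ["path1", "path2"]})
prefix_to_path = dict(zip(df["prefix"], df["path"]))

for file_name in ["pref1_file.txt", "pref2_file.txt", "pref3_file.txt"]:
    print(file_name, ":", find_path(file_name, prefix_to_path))
```

When one prefix is itself the start of another (for example pref1 and pref10), the first matching row in CSV order wins, so the order of the rows matters.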
