I have a CSV file:
prefix,path
pref1,path1
pref2,path2
and these files:
pref1_file.txt
pref2_file.txt
pref3_file.txt
I want to get the path of each file based on its prefix.
Expected result for this example:
pref1_file.txt : path1
pref2_file.txt : path2
pref3_file.txt : path_not_found
Here is my code:
import os
import pandas as pd

dirName = 'C:\\Users\\TEST\\Desktop\\Test'
# get all files in all folders
listOfFiles = list()
for (dirpath, dirnames, filenames) in os.walk(dirName):
    listOfFiles += [os.path.join(dirpath, file) for file in filenames]
df = pd.read_csv(dir_path + 'file.csv')
for elem in listOfFiles:
    file_name = os.path.basename(elem)
    for index, row in df.iterrows():
        if file_name.startswith(row['prefix']):
            print(file_name + ":" + row['path'])
        else:
            print(file_name + ":" + "path_not_found")
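It works without the else condition, but I need to display "path_not_found" when the prefix is not found in the CSV file. With the else inside the inner loop, "path_not_found" is printed for every CSV row that does not match, not once per file.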
Thanks
CodePudding user response:
Use -
dict(zip(files, pd.Series(files).str.split('_').str[0].map(df1.set_index('prefix')['path']).fillna('path_not_found')))
Output
{'pref1_file.txt': 'path1',
'pref2_file.txt': 'path2',
'pref3_file.txt': 'path_not_found'}
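Here, files is the list of file names (the basenames of the entries in listOfFiles in your data) and df1 is the DataFrame read from your CSV file.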
Explanation
- Convert files to pd.Series
- Split by _ and take the first part
- Use pandas map to get the path
- Convert to dict
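
The steps above can be sketched one at a time as follows. This is only an illustration: the DataFrame is built inline as a stand-in for the CSV from the question, and the names prefixes, lookup and paths are made up for readability.

```python
import pandas as pd

# Stand-in for the CSV in the question (assumed columns: "prefix" and "path").
df1 = pd.DataFrame({"prefix": ["pref1", "pref2"], "path": ["path1", "path2"]})
files = ["pref1_file.txt", "pref2_file.txt", "pref3_file.txt"]

# 1. Convert files to a Series, split on "_" and keep the first part.
prefixes = pd.Series(files).str.split("_").str[0]

# 2. Build a prefix -> path lookup and map each prefix through it.
lookup = df1.set_index("prefix")["path"]
paths = prefixes.map(lookup).fillna("path_not_found")

# 3. Pair every file name with its path.
result = dict(zip(files, paths))
print(result)
```

Note that this relies on the prefix never containing an underscore itself, since only the text before the first _ is looked up.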
CodePudding user response:
Try this:
import os
import pandas as pd

dirName = 'C:\\Users\\TEST\\Desktop\\Test'
# get all files in all folders
listOfFiles = list()
for (dirpath, dirnames, filenames) in os.walk(dirName):
    listOfFiles += [os.path.join(dirpath, file) for file in filenames]
df = pd.read_csv(dir_path + 'file.csv')
for elem in listOfFiles:
    file_name = os.path.basename(elem)
    # keep only the rows whose prefix starts the file name
    df_prefix = df[lambda d: d['prefix'].map(file_name.startswith)]
    if df_prefix.size > 0:
        print(file_name + ":" + df_prefix['path'].iloc[0])
    else:
        print(file_name + ": Not found")
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#selection-by-callable
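
For comparison, the same per-file lookup could also be written as a small helper that returns as soon as a prefix matches and falls back to a default only after every row has been checked. This is a minimal sketch, not part of the answer above: find_path and its default argument are hypothetical names, and the DataFrame is built inline in place of the CSV.

```python
import pandas as pd

def find_path(file_name, df, default="path_not_found"):
    """Return the path of the first row whose prefix starts file_name."""
    for prefix, path in zip(df["prefix"], df["path"]):
        if file_name.startswith(prefix):
            return path
    # Reached only when no prefix matched any row.
    return default

# Stand-in for the CSV in the question (assumed columns: "prefix" and "path").
df = pd.DataFrame({"prefix": ["pref1", "pref2"], "path": ["path1", "path2"]})

for file_name in ["pref1_file.txt", "pref2_file.txt", "pref3_file.txt"]:
    print(file_name + " : " + find_path(file_name, df))
```

Putting the fallback after the loop, instead of in an else on the if, is what makes "path_not_found" appear once per file.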
