I am trying to figure out how to process only the files whose names contain certain strings in my for loop, and I am having trouble thinking through how this should be structured.
I have the following code:
folder = 'Path/to/My/Folder'
for f in folder:
    df = pd.read_csv(f)
    df_new = df['Value'] * 2
    df_new.to_csv('Path/to/My/Folder/Name.csv')
This goes through my folder and, for each file (all of them .csv files), opens it as a dataframe, multiplies the 'Value' column by 2, and writes the result to an output .csv file. However, I only want to iterate over the files whose names contain certain strings; in this example, only files whose names contain 'Blue', 'Red', or 'Green'. I would then like to append the color name to the output .csv file name so I know which is which. This is what I am thinking:
l = ['Blue', 'Red', 'Green']
folder = 'Path/to/My/Folder'
for f in folder IF contains l:
    df = pd.read_csv(f)
    df_new = df['Value'] * 2
    df_new.to_csv(f'Path/to/My/Folder/Name_{i}.csv')
I made a list of the strings of interest. The idea is to loop through the files in the folder but ONLY process those whose file names contain 'Blue', 'Red', or 'Green', and then append the color name to the output .csv file name so I know which output file is which. Is this the correct approach? I am confused about how to structure this logically with the correct syntax.
CodePudding user response:
You can use the glob module for this. It lets you specify patterns that your file names should match.
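As a rough sketch of the glob idea, something like the following could collect the matching files per color. The function name find_color_files is made up for illustration, and it assumes the color appears somewhere in the file name and that the files end in .csv. Note that whether glob matching is case-sensitive depends on the platform.

```python
from pathlib import Path


def find_color_files(folder, colors):
    """Map each color to the sorted list of .csv files whose name contains it.

    Hypothetical helper for illustration; not part of pandas or glob.
    """
    folder = Path(folder)
    matches = {}
    for color in colors:
        # A pattern such as "*Blue*.csv" matches any .csv name containing "Blue".
        matches[color] = sorted(folder.glob(f"*{color}*.csv"))
    return matches
```

Each returned path can then be passed to pd.read_csv, and the dictionary key gives you the color to put in the output file name.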
CodePudding user response:
If performance is not an issue, then this code should work for you:
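import os
import pandas as pd

colors = ['Blue', 'Red', 'Green']
folder = 'Path/to/My/Folder'
# os.listdir returns the file names in the folder; looping over the
# string itself would only yield its individual characters
for f in os.listdir(folder):
    for color in colors:
        # only process .csv files whose name contains the color
        if color in f and f.endswith('.csv'):
            df = pd.read_csv(os.path.join(folder, f))
            df['Value'] = df['Value'] * 2
            df.to_csv(f'Path/to/My/Folder/Name_{color}.csv')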
If you see a SettingWithCopyWarning, you can silence it with pd.options.mode.chained_assignment = None
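One thing to watch with nested loops like the ones above: a file whose name contains two of the colors is processed twice, and output written into the same folder will itself match on the next run. A possible variation, only a sketch, picks the first matching color per file and writes to a separate folder. The names first_color, double_values, and out_dir are hypothetical, and the 'Value' column is assumed from the question.

```python
from pathlib import Path

COLORS = ["Blue", "Red", "Green"]


def first_color(filename, colors=COLORS):
    """Return the first color that occurs in filename, or None if none does."""
    return next((c for c in colors if c in filename), None)


def double_values(folder, out_dir, colors=COLORS):
    """Double the 'Value' column of every matching .csv and save it in out_dir."""
    import pandas as pd  # imported here so first_color can be used without pandas

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(folder).glob("*.csv")):
        color = first_color(path.name, colors)
        if color is None:
            continue  # skip files that mention none of the colors
        df = pd.read_csv(path)
        df["Value"] = df["Value"] * 2
        # Assumption: one input file per color; otherwise later files
        # overwrite earlier ones that share the same color.
        df.to_csv(out_dir / f"Name_{color}.csv", index=False)
```

You would call it as double_values('Path/to/My/Folder', 'Path/to/Output'), where the second path is a placeholder. If several files can share a color, including path.stem in the output name would keep them apart.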
