How to access files in a folder with the same string in the file name in Python?

I am trying to use Python to look through a folder and match up files that share the same string in their file names. Each file of interest in this folder is a ".csv" file containing a single values column: Value_Blue for the Blue files and Value_Red for the Red files. The files in this folder go: Blue_111.csv, Blue_124.csv, Blue_145.csv, Blue_165.csv, Blue_176.csv... and then: Red_111.csv, Red_124.csv, Red_145.csv, Red_165.csv, Red_176.csv... and so on. As shown, the numbers in the file names are not evenly spaced, but that is not relevant here. For most Blue files, there is a matching Red file with the same number at the end of the file name. However, some Blue files do not have a corresponding Red file.

What I am trying to do is loop through all Blue files in the folder and open each one as a dataframe. For each Blue file, I then want to find the matching Red file, open it as a dataframe, multiply the Value columns from the two dataframes together, and write the result to a new .csv whose file name contains the same number.

For example, if the loop starts with Blue_111.csv, I want it to find Red_111.csv. I want both of these .csv files to be opened as dataframes, and the Value columns multiplied. I then want to write this newly calculated dataframe to a new .csv called Green_111.csv, and then continue the loop with Blue_124.csv, etc.

Here is pseudocode exemplifying my goal:

folder = Path/to/Directory/Folder

for f in folder that is a .csv with "Blue" in filename:
     blue_df = pd.read_csv(f)  
     red = matching Red file
     red_df = pd.read_csv(red)
     green_df = blue_df.join(red_df) 
     green_df = green_df['Value_Blue'] * green_df['Value_Red']
     green_df.to_csv(Path/to/Directory/Folder/Green_*matching_number*.csv)

How can I match the files and then create the calculated output file with the same matched extension number in the file name?

CodePudding user response:

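Use glob.glob() to find all filenames that match a wildcard pattern. Then use str.replace() to swap Blue for Red, and Blue for Green, to build the other two filenames. Some Blue files have no Red counterpart, so it is worth checking that the Red file exists (for example with os.path.exists()) before reading it.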

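import glob, os
import pandas as pd

folder = 'Path/to/Directory/Folder'

for blue in glob.glob(os.path.join(folder, "Blue_*.csv")):
    # Swap the prefix on the file name only, in case the folder path also contains "Blue_"
    name = os.path.basename(blue)
    red = os.path.join(folder, name.replace("Blue_", "Red_", 1))
    green = os.path.join(folder, name.replace("Blue_", "Green_", 1))
    if not os.path.exists(red):
        continue  # some Blue files have no matching Red file
    blue_df = pd.read_csv(blue)
    red_df = pd.read_csv(red)
    # join() aligns the two frames row by row on their default integer index
    green_df = blue_df.join(red_df)
    # The product is a Series; to_csv() writes it together with its index
    green_df = green_df['Value_Blue'] * green_df['Value_Red']
    green_df.to_csv(green)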
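
If you would rather separate the file matching from the pandas work, here is a minimal sketch of the matching step using pathlib instead of glob. The helper name find_pairs is hypothetical (it is not from the question), and it assumes the file names follow the Blue_<number>.csv / Red_<number>.csv pattern described above. It only returns paths for Blue files that actually have a Red partner, so the files without a match are skipped.

```python
from pathlib import Path


def find_pairs(folder):
    """Return (blue, red, green) path triples for Blue files with a matching Red file.

    find_pairs is a hypothetical helper name used for illustration only.
    """
    folder = Path(folder)
    pairs = []
    # sorted() gives a deterministic order; glob order is otherwise arbitrary
    for blue in sorted(folder.glob("Blue_*.csv")):
        # "Blue_111.csv" -> stem "Blue_111" -> number "111"
        number = blue.stem[len("Blue_"):]
        red = folder / f"Red_{number}.csv"
        if not red.is_file():
            continue  # no matching Red file, so skip this Blue file
        green = folder / f"Green_{number}.csv"
        pairs.append((blue, red, green))
    return pairs
```

Each triple can then be used in the loop in place of the replace() calls: read the first two paths with pd.read_csv() and write the result to the third.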