I have two csv files which have different row numbers.
test1.csv
num,sam
1,1.2
2,1.13
3,0.99
test2.csv
num,sam
1,1.2
2,1.1
3,0.99
4,1.02
I would like to read the sam columns and append them to an empty dataframe. Thing is that, when I read test1.csv, I extract the base file name, test1 and want to append the sam column based on the `column header in the empty dataframe.
big_df = pd.DataFrame(columns =['test1','test2'])
pwd = os.getcwd()
for file in os.listdir(pwd):
filename = os.fsdecode(file)
if filename.endswith(".csv"):
prog = filename.split('.')[0] # test1 test2
df = pd.read_csv(filename, usecols=['sam'])
# The read dataframe has one column
# Move/append that column to the big_df where column == prog
big_df[prog] = df
print(big_df)
But big_df misses the fourth row of test2.csv.
test1 test2
0 1.20 1.20
1 1.13 1.1
2 0.99 0.99
I expect to see
test1 test2
0 1.20 1.20
1 1.13 1.1
2 0.99 0.99
3 NaN 1.02
How can I fix that?
CodePudding user response:
Using pandas.concat and a simple dictionary comprehension:
files = ['test1.csv', 'test2.csv']
df = pd.concat({f.rsplit('.', 1)[0]: pd.read_csv(f).set_index('num')['sam']
for f in files}, axis=1)
output:
test1 test2
num
1 1.20 1.20
2 1.13 1.10
3 0.99 0.99
4 NaN 1.02
CodePudding user response:
You could approach it differently and use concat instead of creating an empty data frame in the first place. Might be also a bit more efficient. In code that reads like
def get_columns():
for file in os.listdir(pwd):
filename = os.fsdecode(file)
if filename.endswith(".csv"):
prog = filename.split('.')[0] # test1 test2
yield pd.read_csv(filename, usecols=['sam'])['sam'].rename(prog)
big_df = pd.concat(get_columns(), axis=1)
Otherwise, you could use merge with outer as mentioned in the comment.
CodePudding user response:
read bothe csv's first using pd.read_csv("filename",sep=',')
df1
num sam
0 1 1.20
1 2 1.13
2 3 0.99
df2
num sam
0 1 1.20
1 2 1.10
2 3 0.99
3 4 1.02
the do the following
df2.drop('num',axis=1,inplace=True)
df2.columns=['test2']
df2['test1']=df1['sam']
output:
test2 test1
0 1.20 1.20
1 1.10 1.13
2 0.99 0.99
3 1.02 NaN
