I have a CSV file like this:
Time Latitude Longitude
2021-09-12 23:13 44.63 -63.56
2021-09-14 23:13 43.78 -62
2021-09-16 23:14 44.83 -54.6
2021-09-12 23:13 is under Time column.
I would like to open it using pandas. But there is a problem with the first column. It contains a space. If I open it using:
import pandas as pd
points = pd.read_csv("test.csv", delim_whitespace=True)
I get
| Time | Latitude | Longitude | |
|---|---|---|---|
| 2021-09-12 | 23:13 | 44.630 | -63.560 |
| 2021-09-14 | 23:13 | 43.780 | -62.000 |
| 2021-09-16 | 23:14 | 44.830 | -54.600 |
But I would like to skip the space in the first column in CSV (2021-09-12 23:13 should be under Time column) like:
| Time | Latitude | Longitude | |
|---|---|---|---|
| 0 | 2021-09-12 23:13 | 44.630 | -63.560 |
| 1 | 2021-09-14 23:13 | 43.780 | -62.000 |
| 2 | 2021-09-16 23:14 | 44.830 | -54.600 |
How can I ignore the first space when using pd.read_csv?
Please do not stick to this csv file. This is a general question to skip (not to consider as a delimiter) the first space(s) in the first column. Because everyone knows that the first space is part of the time value, not a delimiter.
CodePudding user response:
Your data is in 2 different kind of formats:
- your headerrow has a single space between
'Latitude'and'Longitude'. - the "data" rows look to be separated by multiple spaces.
You can either edit your data and add a second space between lat & long or trick it by supplying the column headers separately:
Create file:
with open("test.csv","w") as f:
f.write("""Time Latitude Longitude
2021-09-12 23:13 44.63 -63.56
2021-09-14 23:13 43.78 -62
2021-09-16 23:14 44.83 -54.6""")
Parse file:
import pandas as pd
# ignore files headers, supply own, use multiple spaces as seperator
df = pd.read_csv("test.csv", delimiter = " ",
header=0, names = ["Time","Latitude","Longitude"])
print (df)
Output:
Time Latitude Longitude
0 2021-09-12 23:13 44.63 -63.56
1 2021-09-14 23:13 43.78 -62.00
2 2021-09-16 23:14 44.83 -54.60
CodePudding user response:
What you have shown is not a csv file. Full stop. Pandas read_csv is indeed versatile enough to possibly find workaround allowing to process it. But it is actually a fixed width fields file and should be read with pd.read_fwf:
pd.read_fwf(file_name, [(0,16), (16,26), (26, 40)])
directly gives:
Time Latitude Longitude
0 2021-09-12 23:13 44.63 -63.56
1 2021-09-14 23:13 43.78 -62.00
2 2021-09-16 23:14 44.83 -54.60
From your edit, you only want to tell read_csv to consider the first white space as a non delimiter character. I know no simple way to do that. The hard way is to read the file, replace the first space in each and every line with a different character. Then you submit that changed file to read_csv with a custom converter for the first column to change the special character back to a space:
with open('test.csv') as fdin, open('test2.csv', 'w') as fdout):
fdout.write(next(fdin) # do not process the header line
for line in fdin:
fdout.write(line.replace(' ', '_', 1)
df = pd.read_csv('test2.csv', delim_whitespace=True,
converters = {'Time': lambda x: x.replace('_', ' ')})
It gives too:
Time Latitude Longitude
0 2021-09-12 23:13 44.63 -63.56
1 2021-09-14 23:13 43.78 -62.00
2 2021-09-16 23:14 44.83 -54.60
CodePudding user response:
Try to fix the column and the index after load the file:
import pandas as pd
points = pd.read_csv('test.csv', delim_whitespace=True)
points = points.assign(Time=pd.to_datetime(df.index ' ' df['Time'])) \
.reset_index(drop=True)
Output:
>>> points
Time Latitude Longitude
0 2021-09-12 23:13:00 44.63 -63.56
1 2021-09-14 23:13:00 43.78 -62.00
2 2021-09-16 23:14:00 44.83 -54.60
