I have a table like this: (image not included)
I want only the date column and the units column (columns 1 and 5), but with the date in another format. I used code like this:
`import pandas as pd
customer_calls = pd.read_excel("sales.xlsx", usecols=[0, 4])
customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" "00")
customer_calls.to_excel("sales_YYYYMMDD.xlsx")
print(customer_calls)`
It gives me what I wanted: (image not included)
I need the output without the header and index. But when I use header=0 or header=None, this line no longer works:
`customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" "00")`
because there is no column named "OrderDate" anymore. I tried using 0 instead of the name and all kinds of other things, but it always raises an error. How can I remove the header and index but still select the date column afterwards?
I've read dozens of examples here, and nothing solved this. Or maybe I just can't see it.
CodePudding user response:
If you want to remove the headers and index, then essentially you are seeking only the values. If so, you extract the values and use the tolist() method.
Here is an example of this:
import pandas as pd
# example dataframe
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['A', 'B', 'C'])
# extract values only
data = df.values.tolist()
print(data)
Here is the result of the above:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The values are now just a list of lists.
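Applied to the question's case, the same idea might look like the sketch below. The sample rows are made up for illustration and stand in for the two columns read from the Excel file; the point is that the date is reformatted while the column still has its name, and the header and index are dropped only afterwards.

```python
import pandas as pd

# Hypothetical sample rows standing in for the columns read from sales.xlsx
df = pd.DataFrame({
    "OrderDate": ["2020-01-06", "2020-02-09"],
    "Units": [95, 50],
})

# Reformat the date while the column can still be selected by name
df["OrderDate"] = pd.to_datetime(df["OrderDate"]).dt.strftime("%Y%m%d00")

# Extract the bare values: no header, no index
rows = df.values.tolist()
print(rows)
```

Each inner list is one row, so the result can be passed to anything that expects plain Python data.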
CodePudding user response:
I've done it! Posting it for future similar questions. It can be done really easily in pandas, with just two more lines.
import pandas as pd
# Read the file, keeping only the first two columns
customer_calls = pd.read_excel("sales.xlsx", usecols=[0, 1])
# Convert the dates to the YYYYMMDD00 format
customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" "00")
customer_calls.to_excel("sales_YYYYMMDD.xlsx")
#set the location of the first row with columns
customer_calls.columns = customer_calls.iloc[0]
#remove first row from the dataframe rows
customer_calls = customer_calls[1:]
#display
print(customer_calls)
It gives output like this:
0 2020010600 East
1 2020020900 Central
2 2020031500 West
3 2020040100 East
4 2020050500 Central
5 2020060800 East
6 2020071200 East
7 2020081500 East
8 2020090100 Central
9 2020100500 Central
10 2020110800 East
11 2020121200 Central
The date format is changed and the header is gone.
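One caveat: assigning `customer_calls.iloc[0]` to `columns` turns the first data row into the header, so that row is no longer part of the data, and the file written by `to_excel` earlier in the script still contains the original header and index. A possible alternative, sketched here under assumptions, is to keep the header for the conversion and suppress header and index only at output time. The sample rows and the column name `Region` are hypothetical; `header=False` and `index=False` are regular parameters of both `to_string` and `to_excel`.

```python
import pandas as pd

# Hypothetical sample standing in for pd.read_excel("sales.xlsx", usecols=[0, 1])
customer_calls = pd.DataFrame({
    "OrderDate": ["2020-01-06", "2020-02-09", "2020-03-15"],
    "Region": ["East", "Central", "West"],  # assumed column name
})

# Convert while the header is still present, so the column is selectable by name
customer_calls["OrderDate"] = (
    pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d00")
)

# Drop header and index only when producing the output
text = customer_calls.to_string(header=False, index=False)
print(text)

# The same flags work when writing the workbook
# (writing .xlsx requires an engine such as openpyxl to be installed):
# customer_calls.to_excel("sales_YYYYMMDD.xlsx", header=False, index=False)
```

This way no data row is lost, and the saved file matches what is printed.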
