I have a pandas dataframe A that has 5 columns and a few 100 thousand rows. What I need is to create a dataframe B that has 50 columns with 45 of them empty and the other 5 filled with the data I have in dataframe A.
The reason I need it in this format is because I want to eventually covert to a csv file with a (,) delimiter and most of the columns empty.
My Dataframe A looks like this:
| id | order | first | last | type |
|---|---|---|---|---|
| 1 | 111 | Johnny | Depp | type1 |
| 2 | 222 | Amber | Heard | type2 |
my Dataframe B should look something like this with more empty columns at the end:
| x | order | first | last | x | x | x | x | x | x | x | type | x | x | x | x |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| empty | 111 | Johnny | Depp | empty | empty | empty | empty | empty | empty | empty | type1 | empty | empty | empty | empty |
| empty | 222 | Amber | Heard | empty | empty | empty | empty | empty | empty | empty | type2 | empty | empty | empty | empty |
As you can see I need to specify the position of the column for the type column. This is because I eventually want to convert to CSV with the function
to_csv(delimiter=',') which will eventually looks like this:
,111,Johnny,Depp,,,,,,,,,type1,,,,,
,222,Amber,Heard,,,,,,,,,type2,,,,,
CodePudding user response:
import pandas as pd
a = pd.DataFrame({"id": [1, 2], "order": [111, 222], "first": ["Johnny", "Amber"], "last": ["Depp", "Heard"], "type": ["type1", "type2"]})
push = ["x", "order", "first", "last"] list("x" * 7) ["type"] list("x" * 4)
cols = [f"x{num}" if value == "x" else value for num, value in enumerate(push)]
b = pd.DataFrame({col: a[col] if col in a.columns.to_list() else None for col in cols})
print(b)
Seems like a fairly arbitrary problem, but I think this solves your specific request. Feel free to change the "x" * 7 value to reflect your wishes. Also you can replace None with np.nan if you import numpy as np. Or you could replace None with "" to insert empty strings. Your questions is a bit vague by stating "empty".
Output:
x0 order first last x4 x5 x6 x7 x8 x9 x10 type x12 x13 x14 x15
0 None 111 Johnny Depp None None None None None None None type1 None None None None
1 None 222 Amber Heard None None None None None None None type2 None None None None
CodePudding user response:
Ok, so I am assuming dataframe B already has the the first 5 columns filled with the data you need.
You can then just make a loop to add however many blank columns you want:
i=4 # However many columns the df started with
while i < 50: # or however many blank columns you want to add
df[f'column_{i}'] = ''
i =1
