Home > Software engineering >  How to parse a dataframe efficiently, while storing data (specific row, or multiple rows) in others
How to parse a dataframe efficiently, while storing data (specific row, or multiple rows) in others

Time:01-15

How to parse data on all rows, and use this row to populate other dataframes with data from multiple rows ?

I am trying to parse a csv file containing several data entry for training purpose as I am quite new to this technology.

My data consist in 10 columns, and hunderds of rows. The first column is filled with a code that is either 10, 50, or 90. Example :

Dataframe 1 :

Code Power/Character
10 Power-220
90 End
10 Power-290
90 End
10 Power-445
90 End
10 Power-390
50 Clotho
50 Kronus
90 End
10 Power-550
50 Ares
50 Athena
50 Artemis
50 Demeter
90 End

And the list goes on..

On one hand I want to be able to read the first cell, and to populate another dataframe directly if this is a code 10.

On the other hand, I'd like to populate another dataframe with all the codes 50s, but I want to be able to get the data from the previous code 10, as it hold the type of Power that is used, and populate a new column on this dataframe.

The new data frames are supposed to look like this:

Dataframe 2 :

Code Power
10 Power-220
10 Power-290
10 Power-445
10 Power-390
10 Power-550

Dataframe 3 :

Code Character Power
50 Clotho Power-390
50 Kronus Power-390
50 Ares Power-550
50 Athena Power-550
50 Artemis Power-550
50 Demeter Power-550

So far, I was using iterrows, and I've read everywhere that it was a bad idea.. but i'm struggling implementing another method..

In my code I just create two other dataframes, but I don't know yet a way to retrieve data from the previous cell. I would usually use a classic method, but I think it's rather archaic.

for index, row in df.iterrows():
    if (df.iat[index,0] == '10'):
      df2 = df2.append(df.loc[index], ignore_index = True)
    if (df.iat[index,0] == '50'):
      df3 = df3.append(df.loc[index], ignore_index = True)

Any ideas ?

CodePudding user response:

For df2, it's pretty simple:

df2 = df.rename(columns={'Power/Character': 'Power'}) \
        .loc[df['Code'] == 10, :]

For df3, it's a bit more complex:

# Extract power and fill forward values
power = df.loc[df['Code'] == 10, 'Power/Character'].reindex(df.index).ffill()

df3 = df.rename(columns={'Power/Character': 'Character'}) \
        .assign(Power=power).loc[lambda x: x['Code'] == 50]

Output:

>>> df2
    Code      Power
0     10  Power-220
2     10  Power-290
4     10  Power-445
6     10  Power-390
10    10  Power-550

>>> df3
    Code Character      Power
7     50    Clotho  Power-390
8     50    Kronus  Power-390
11    50      Ares  Power-550
12    50    Athena  Power-550
13    50   Artemis  Power-550
14    50   Demeter  Power-550

CodePudding user response:

You could simply copy the required rows to another dataframe,

df2 = df[df.col_1 == '10'].copy()

This will make a new dataframe df2 that contains only the rows from column col_1 that fits some criteria. The copy() function guarantees that the two dataframes are not identical, and changes in one do not affect the other.

If df2 already exists, you can concatenate them

df2 = pd.concat([df2, df[df.col_1 == '10'].copy()])
  •  Tags:  
  • Related