How to parse a dataframe efficiently, while storing data (specific row, or multiple rows) in others-CodePudding

How to parse data on all rows, and use this row to populate other dataframes with data from multiple rows ?

I am trying to parse a csv file containing several data entry for training purpose as I am quite new to this technology.

My data consist in 10 columns, and hunderds of rows. The first column is filled with a code that is either 10, 50, or 90. Example :

Dataframe 1 :

Code	Power/Character
10	Power-220
90	End
10	Power-290
90	End
10	Power-445
90	End
10	Power-390
50	Clotho
50	Kronus
90	End
10	Power-550
50	Ares
50	Athena
50	Artemis
50	Demeter
90	End

And the list goes on..

On one hand I want to be able to read the first cell, and to populate another dataframe directly if this is a code 10.

On the other hand, I'd like to populate another dataframe with all the codes 50s, but I want to be able to get the data from the previous code 10, as it hold the type of Power that is used, and populate a new column on this dataframe.

The new data frames are supposed to look like this:

Dataframe 2 :

Code	Power
10	Power-220
10	Power-290
10	Power-445
10	Power-390
10	Power-550

Dataframe 3 :

Code	Character	Power
50	Clotho	Power-390
50	Kronus	Power-390
50	Ares	Power-550
50	Athena	Power-550
50	Artemis	Power-550
50	Demeter	Power-550

So far, I was using iterrows, and I've read everywhere that it was a bad idea.. but i'm struggling implementing another method..

In my code I just create two other dataframes, but I don't know yet a way to retrieve data from the previous cell. I would usually use a classic method, but I think it's rather archaic.

for index, row in df.iterrows():
    if (df.iat[index,0] == '10'):
      df2 = df2.append(df.loc[index], ignore_index = True)
    if (df.iat[index,0] == '50'):
      df3 = df3.append(df.loc[index], ignore_index = True)

Any ideas ?

CodePudding user response：

For df2, it's pretty simple:

df2 = df.rename(columns={'Power/Character': 'Power'}) \
        .loc[df['Code'] == 10, :]

For df3, it's a bit more complex:

# Extract power and fill forward values
power = df.loc[df['Code'] == 10, 'Power/Character'].reindex(df.index).ffill()

df3 = df.rename(columns={'Power/Character': 'Character'}) \
        .assign(Power=power).loc[lambda x: x['Code'] == 50]

Output:

>>> df2
    Code      Power
0     10  Power-220
2     10  Power-290
4     10  Power-445
6     10  Power-390
10    10  Power-550

>>> df3
    Code Character      Power
7     50    Clotho  Power-390
8     50    Kronus  Power-390
11    50      Ares  Power-550
12    50    Athena  Power-550
13    50   Artemis  Power-550
14    50   Demeter  Power-550

CodePudding user response：

You could simply copy the required rows to another dataframe,

df2 = df[df.col_1 == '10'].copy()

This will make a new dataframe df2 that contains only the rows from column col_1 that fits some criteria. The copy() function guarantees that the two dataframes are not identical, and changes in one do not affect the other.

If df2 already exists, you can concatenate them

df2 = pd.concat([df2, df[df.col_1 == '10'].copy()])