Using Pandas, I'm trying to keep on my DataFrame only 100 rows of each value of my column "

Time：11-22

I have a very large dataset that I'm trying to shrink. My idea is to keep 100 rows per neighborhood.

Here's an overview of my data:

index	name	neighborhood
0	name 1	neighborhood A
1	name 2	neighborhood A
2	name 3	neighborhood B
3	name 4	neighborhood B
4	name 5	neighborhood C
5	name 6	neighborhood C
6	name 7	neighborhood D
7	name 8	neighborhood D
8	name 9	neighborhood E
9	name 10	neighborhood E

What is the most efficient way to do so?

Thanks in advance

I'm expecting to end up with something like this (shown with one row per neighborhood to keep the example short):

index	name	neighborhood
0	name 1	neighborhood A
1	name 3	neighborhood B
2	name 5	neighborhood C
3	name 7	neighborhood D
4	name 9	neighborhood E

CodePudding user response：

It depends on how you want to select the rows.

First n rows with `groupby.head`:

n = 100
out = df.groupby('neighborhood').head(n)
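
As a minimal, self-contained sketch of the `groupby.head` approach: the toy DataFrame below mirrors the layout in the question (its values are made up for illustration), and `n` is set to 1 instead of 100 so the result matches the expected output shown there. The `reset_index(drop=True)` step is an assumption about wanting the rows renumbered from 0, as in that expected output.

```python
import pandas as pd

# Toy data mirroring the question's layout (illustrative values only).
df = pd.DataFrame({
    "name": [f"name {i}" for i in range(1, 11)],
    "neighborhood": [f"neighborhood {c}" for c in "AABBCCDDEE"],
})

n = 1  # the question asks for 100; 1 keeps this example small

# head(n) keeps the first n rows of every group, in the original row order.
# Groups with fewer than n rows are kept whole.
out = df.groupby("neighborhood").head(n)

# Renumber the rows from 0, as in the question's expected output.
out = out.reset_index(drop=True)
print(out)
```

With `n = 100` the same two lines apply unchanged to the real dataset.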

Random n rows with `groupby.sample`:

n = 100
out = df.groupby('neighborhood').sample(n=n)
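
One thing to watch for: `groupby.sample(n=n)` raises a `ValueError` when a group has fewer than `n` rows (unless `replace=True` is passed). If some neighborhoods may be smaller than `n`, one possible workaround, sketched here on made-up data, is to shuffle the whole frame first and then take the first `n` rows of each group. The `seed` value is a hypothetical choice added only to make the run reproducible.

```python
import pandas as pd

# Illustrative data: group "C" has only one row, fewer than n.
df = pd.DataFrame({
    "name": [f"name {i}" for i in range(1, 8)],
    "neighborhood": ["A", "A", "A", "A", "B", "B", "C"],
})

n = 2
seed = 0  # hypothetical seed, only for reproducibility

# Shuffle all rows, then keep the first n of each group: this yields up to
# n random rows per neighborhood and never fails on small groups.
shuffled = df.sample(frac=1, random_state=seed)
out = shuffled.groupby("neighborhood").head(n)

# Optional: restore the original row order.
out = out.sort_index()
print(out)
```

If every group is known to have at least `n` rows, the shorter `groupby.sample` call above is enough, and it also accepts `random_state` for reproducibility.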

CodePudding user response：

I think you can also use groupby and `nth` (the slice index notation needs pandas 1.4 or later):

dfx = df.groupby('neighborhood').nth[:100]
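
A small sketch of the `nth` slice notation on made-up data follows. As far as I know, on pandas 2.x this selects the same rows as `head(n)`, while older releases may return the group key as the index instead of keeping it as a column, so the example only checks how many rows come back.

```python
import pandas as pd

# Illustrative data with groups of size 3, 2 and 1.
df = pd.DataFrame({
    "name": [f"name {i}" for i in range(1, 7)],
    "neighborhood": ["A", "A", "A", "B", "B", "C"],
})

n = 2

# nth[:n] selects positions 0..n-1 within each group (pandas 1.4+).
out = df.groupby("neighborhood").nth[:n]

# 2 rows from A, 2 from B, 1 from C.
print(len(out))
```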
