Aggregate function does not respect alphabetical order of a vector

Time:01-27

I have a data.frame that (simplified) looks like this:

Binomial<-c(rep("Capra aegagrus",2),rep("Capreolus capreolus",3),"Capra ibex")
area<-c(500,200,10,300,15,5)
mydata<-data.frame(Binomial,area)

I want to obtain a new data.frame with the names of the species (mydata$Binomial) and the sum of all the areas of each species. This is my process so far:

#sum all the areas of each species 
a<-aggregate(mydata$area,list(mydata$Binomial),FUN=sum)
#create a vector with a length equal to mydata number of rows
n<-max(lengths(mydata)) 
#insert the total of the areas of each species in the first row, and fill the rest with NA 
b<-lapply(a, `length<-`, n) 
summary(b)  
#Group.1 is the species, x is the area
#create a column with the species 
mydata$Binomial_2<-b$Group.1 
#create a column with the areas
mydata$area_tot<-b$x 
#create a final data frame with the species and the total of the areas 
mydata_2<-mydata[c(1:3),c(3:4)]

So far it worked on different datasets. The problem is that if I check a, I see that species are not in the same order as they were in mydata: now Capra ibex is before Capreolus capreolus. This messes up my following analyses. Do you have a suggestion on how to preserve the alphabetical order of mydata in this script? This means it should be Capra aegagrus, then Capreolus capreolus, and lastly Capra ibex in mydata_2 as well. Thanks.

CodePudding user response:

As a commenter already pointed out, the output of aggregate is in alphabetical order ("Capra ibex" sorts before "Capreolus capreolus"), so alphabetical order is not the problem. We assume instead that the goal is to preserve the order in which the species first appear in the input.
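To see where the reordering comes from: as far as I understand it, aggregate treats a character grouping variable like a factor, and a factor built without explicit levels gets its levels in sorted order. A minimal sketch comparing the two level orders (the variable names default_levels and input_levels are made up for this illustration):

```r
Binomial <- c(rep("Capra aegagrus", 2), rep("Capreolus capreolus", 3), "Capra ibex")

# Default: levels are the sorted unique values (the exact sort is locale-dependent)
default_levels <- levels(factor(Binomial))

# Explicit levels: follow the order of first appearance in the data
input_levels <- levels(factor(Binomial, levels = unique(Binomial)))

print(default_levels)
print(input_levels)
```

The second vector keeps "Capra ibex" last, which is the order the question asks for.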

1) factor Define the Binomial_2 column as a factor whose levels are in the desired order. That is, after defining mydata as in the first chunk of the question, replace the rest of the code with:

tmp <- transform(mydata, Binomial_2 = factor(Binomial, levels = unique(Binomial)))
aggregate(area ~ Binomial_2, tmp, FUN = sum)
##            Binomial_2 area
## 1      Capra aegagrus  700
## 2 Capreolus capreolus  325
## 3          Capra ibex    5

or using pipes:

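# aggregate() dispatches on its first argument, so the formula method is reached
# through an anonymous function that passes the piped data frame as `data`
# (this avoids calling the unexported stats:::aggregate.formula directly)
mydata |>
  transform(Binomial_2 = factor(Binomial, levels = unique(Binomial))) |>
  (\(d) aggregate(area ~ Binomial_2, data = d, FUN = sum))()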
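Since aggregate already returns a data frame with one row per species, the NA-padding and subsetting steps in the question should not be needed. A sketch that produces mydata_2 directly from approach 1; the column names Binomial_2 and area_tot are taken from the question, and the setNames renaming is an assumption about the desired result:

```r
Binomial <- c(rep("Capra aegagrus", 2), rep("Capreolus capreolus", 3), "Capra ibex")
area <- c(500, 200, 10, 300, 15, 5)
mydata <- data.frame(Binomial, area)

# Factor levels in order of first appearance, so aggregate keeps the input order
tmp <- transform(mydata, Binomial_2 = factor(Binomial, levels = unique(Binomial)))

# One row per species; rename the summed column to area_tot as in the question
mydata_2 <- setNames(aggregate(area ~ Binomial_2, tmp, FUN = sum),
                     c("Binomial_2", "area_tot"))
print(mydata_2)
```

Note that Binomial_2 is a factor here; as.character(mydata_2$Binomial_2) converts it back if later steps expect strings.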

2) ave Another approach is:

tmp <- with(mydata, 
  data.frame(Binomial_2 = Binomial, area = ave(area, Binomial, FUN = sum)))
unique(tmp)
##            Binomial_2 area
## 1      Capra aegagrus  700
## 3 Capreolus capreolus  325
## 6          Capra ibex    5

with pipes this could be expressed as

mydata |>
  with(data.frame(Binomial_2 = Binomial, area = ave(area, Binomial, FUN = sum))) |>
  unique()
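To make it clearer why approach 2 keeps the input order, it may help to look at the intermediate values. ave returns a vector as long as the input with each group's sum repeated on every row of that group, and unique then keeps the first row of each species. A small sketch (group_sum and res are names made up for this illustration):

```r
Binomial <- c(rep("Capra aegagrus", 2), rep("Capreolus capreolus", 3), "Capra ibex")
area <- c(500, 200, 10, 300, 15, 5)
mydata <- data.frame(Binomial, area)

# Group sum repeated on every row of its group, same length as the input
group_sum <- ave(mydata$area, mydata$Binomial, FUN = sum)
print(group_sum)

tmp <- data.frame(Binomial_2 = mydata$Binomial, area = group_sum)

# unique() keeps the first occurrence of each duplicated row, so rows stay in input order
res <- unique(tmp)
print(rownames(res))
```

The surviving rows keep their original row names (1, 3 and 6 for this data); rownames(res) <- NULL renumbers them if that matters.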