Is there a reason rowSums(df[grep(...)]) wouldn't work accurately?

I used

df$Total.P.n <- rowSums(df[grep('p.n', names(df), ignore.case = FALSE)])

to sum the count values from every column whose name contains p.n, but the values it produces are way off. The columns are counts of certain combinations of language types in a language corpus. I want a total of all the times p.n was used within other combinations, but am struggling. It seems like it may be counting other occurrences in my variable names, such as e.sp.NR, but shouldn't ignore.case = FALSE take care of that? I've also tried tidyverse and dplyr solutions, to no avail.

Here's an example of the df structure:

ID.	do.pn.NP	do.pn.SE.	p.d.e.sp.SR
1510	4	6	0
1515	2	0	1

And here is what I need:

ID.	do.pn.NP	do.pn.SE.	p.d.e.sp.SR	Total.P.n
1510	4	6	0	10
1515	2	0	1	2
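To make the example reproducible, the input table above can be rebuilt in R roughly as follows. This is a sketch that includes only the four columns shown in the first table, with the values copied from it.

```r
# Sample data from the question (only the columns shown in the input table)
df <- data.frame(
  ID.         = c(1510, 1515),
  do.pn.NP    = c(4, 2),
  do.pn.SE.   = c(6, 0),
  p.d.e.sp.SR = c(0, 1)
)

# Inspect the structure: 2 rows, 4 numeric columns
str(df)
```

On the full data, dput(head(df)) prints an equivalent definition that others can paste directly.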

Answer 1:

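The pattern you are searching for, p.n, does not occur in the column names of df. In a regular expression, . matches any single character, so 'p.n' means "p, then any one character, then n". I think you mean pn, and with that pattern your code works as expected: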

df$Total.P.n <- rowSums(df[grep('pn', names(df), ignore.case = FALSE)])

   ID. do.pn.NP do.pn.SE. p.d.e.sp.SR Total.P.n
1 1510        4         6           0        10
2 1515        2         0           1         2
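To see why p.n selects nothing in this example, here is a minimal sketch using the column names from the question. The strings "pan", "p.n" and "pn" are made-up inputs for illustration. Note that ignore.case only switches case sensitivity; it does not change what . means in the pattern.

```r
# Column names from the example data frame in the question
cols <- c("ID.", "do.pn.NP", "do.pn.SE.", "p.d.e.sp.SR")

# As a regular expression, "p.n" means "p, then any one character, then n"
wildcard <- grepl("p.n", c("pan", "p.n", "pn"))               # TRUE TRUE FALSE

# fixed = TRUE, or escaping the dot as "p\\.n", matches a literal "p.n"
literal  <- grepl("p.n", c("pan", "p.n", "pn"), fixed = TRUE) # FALSE TRUE FALSE
escaped  <- grepl("p\\.n", c("pan", "p.n", "pn"))             # FALSE TRUE FALSE

# Check which columns a pattern selects before summing them
print(cols[grep("p.n", cols)])  # character(0): no column name matches
print(cols[grep("pn", cols)])   # "do.pn.NP" "do.pn.SE."
```

Printing the selected names first is a cheap way to catch a pattern that picks up too many or too few columns.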

Answer 2:

If you can use dplyr, I would suggest a tidy-select helper such as matches(). Note that your regex is likely wrong: it should be "pn" for the desired output, not "p.n".

library(dplyr)

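df %>%
  # matches() takes a regex and is case-insensitive by default (ignore.case = TRUE)
  # in dplyr >= 1.1.0, pick(matches("pn")) is the preferred spelling inside rowSums()
  mutate(Total_pn = rowSums(across(matches("pn"))))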