Home > Blockchain >  Want to find matching strings within a column, by group, and condition
Want to find matching strings within a column, by group, and condition

Time:01-28

I have a list of a few companies with each line as a deal. My table indicates whether they had a Initial Public Offering (IPO) or not, and what bank lead the deal:

df <- data.frame(Issuer=c("ABC Inc.", "ABC Inc.","ABC Inc.", "XYZ Co.","XYZ Co","123 Inc.","123 Inc."),
             IPO=c("Yes","No","No","Yes","No","Yes","No"),
             Bank =c("RBC","RBC","CIBC","RBC","Goldman","Citi","Citi"))

I want to create a list of all Issuers where they have the same Bank when IPO=="Yes" AND IPO=="No" like so:

Issuer Bank
ABC Inc. RBC
123 Inc. Citi

I've been circling the problem like this:

temp = df %>% group_by(Issuer) %>% filter(Bank[IPO=="Yes"] == Bank[IPO=="No"])

Thank you!

CodePudding user response:

The following groups the data by "Bank" and "Issuer", and checks to see if that combination has exactly two unique values for "IPO" (i.e., Yes and No). In that case, the Issuer/Bank combination meets your criteria. The rest of the dplyr chain filters down to the requested output information.

library(tidyverse)
df <- data.frame(Issuer=c("ABC Inc.", "ABC Inc.","ABC Inc.", "XYZ Co.","XYZ Co","123 Inc.","123 Inc."),
                 IPO=c("Yes","No","No","Yes","No","Yes","No"),
                 Bank =c("RBC","RBC","CIBC","RBC","Goldman","Citi","Citi"))

df.new <- df %>% 
  group_by(Issuer, Bank) %>% 
  mutate(
    both_ipo = length(unique(IPO)) == 2
  ) %>% 
  filter(both_ipo) %>% 
  select(Issuer, Bank) %>% 
  distinct

  Issuer   Bank 
  <chr>    <chr>
1 ABC Inc. RBC  
2 123 Inc. Citi 

You don't necessarily need to create the both_ipo flag, and could use it directly in filter like below, but I usually prefer to create any new attributes in a separate step so that I can check them.

df.new <- df %>% 
  group_by(Issuer, Bank) %>% 
  filter(length(unique(IPO)) == 2) %>% 
  select(Issuer, Bank) %>% 
  distinct

CodePudding user response:

Here's another way -

library(dplyr)

df %>% 
  group_by(Issuer, Bank) %>%
  filter(all(c('Yes', 'No') %in% IPO)) %>%
  ungroup %>%
  distinct(Issuer, Bank)

#  Issuer   Bank 
#  <chr>    <chr>
#1 ABC Inc. RBC  
#2 123 Inc. Citi 
  •  Tags:  
  • Related