Categorizing variable and make it a dummy in R-CodePudding

Guten Tag community :)

I'm currently working with the ESGscore of some companies. The ESGscore varies between 0 and 100.

I want to categorize the ESGscore into 4 segments:

0 - 25 --> poor --> 4
>25 - 50 --> medium --> 3
>50 - 75 --> good --> 2
75 - 100 --> excellent --> 1

The problem with dummy.code is that it is rearranging the ESGscore. So for example the ESGscore of the AIR PRODUCTS & CHEMICALS INC is always 'excellent' but the output shows me, that it is just medium.

This is what the CODE looks like:

Datensatz_final_so$ESG.Kategorien <- ifelse(Datensatz_final_so$ESGscore <= 25, "4",
                                            ifelse(Datensatz_final_so$ESGscore > 25 & Datensatz_final_so$ESGscore <= 50, "3",
                                                   ifelse(Datensatz_final_so$ESGscore > 50 & Datensatz_final_so$ESGscore <= 75, "2",
                                                          ifelse(Datensatz_final_so > 75, "1", 0))))``
    # Create ESGscore dummy #
    Dummy.ESG <- dummy.code(Datensatz_final_so$ESG.Kategorien)

colnames(Dummy.ESG) <- c("poor", "medium", "good", "excellent")

# Connect data and dummy #
Datensatz_final <- cbind(Datensatz_final, Dummy.ESG)

Do you know how to fix that?

One way is to rearrange colnames to

colnames(Dummy.ESG) <- c("good", "excellent", "poor", "medium")

but it is creating the problem, that R picks medium for reference in the analysis.

Thank you in advance! :)

DATA EXAMPLE:

    structure(list(Company = c("AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC", 
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC", 
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC", 
"AIR PRODUCTS & CHEMICALS INC", "HESS CORP", "HESS CORP", "HESS CORP", 
"HESS CORP", "HESS CORP", "HESS CORP", "HESS CORP", "APACHE CORP", 
"APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP", 
"APACHE CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP", 
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP", 
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "BALL CORP", "BALL CORP", 
"BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP", 
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", 
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "ECOLAB INC", 
"ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC", 
"ECOLAB INC", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", 
"EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", 
"FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", 
"FMC CORP", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", 
"HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", 
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE", 
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE", 
"HELMERICH & PAYNE"), Year = c(2011, 2012, 2013, 2014, 2015, 
2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 
2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 
2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 
2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 
2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 
2012, 2013, 2014, 2015, 2016, 2017), gvkey = c(1209, 1209, 1209, 
1209, 1209, 1209, 1209, 1380, 1380, 1380, 1380, 1380, 1380, 1380, 
1678, 1678, 1678, 1678, 1678, 1678, 1678, 1913, 1913, 1913, 1913, 
1913, 1913, 1913, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 2991, 
2991, 2991, 2991, 2991, 2991, 2991, 4213, 4213, 4213, 4213, 4213, 
4213, 4213, 4503, 4503, 4503, 4503, 4503, 4503, 4503, 4510, 4510, 
4510, 4510, 4510, 4510, 4510, 5439, 5439, 5439, 5439, 5439, 5439, 
5439, 5581, 5581, 5581, 5581, 5581, 5581, 5581), ESGscore = c(84.2750015258789, 
81.9225006103516, 77.4024963378906, 80.1125030517578, 78.6449966430664, 
76.3775024414062, 79.2699966430664, 69.4899978637695, 65.8300018310547, 
64.4300003051758, 74.3000030517578, 75.7600021362305, 71.4599990844727, 
74.6900024414062, 55.8300018310547, 56.0900001525879, 57.5, 60.75, 
60.8800010681152, 67.379997253418, 71.9899978637695, 82.9000015258789, 
77.3899993896484, 76.9300003051758, 78.7399978637695, 76.2283325195312, 
74.2125015258789, 68.3600006103516, 64.4100036621094, 65.6600036621094, 
63.75, 67.7300033569336, 67.5699996948242, 74.4300003051758, 
68.5699996948242, 86.5100021362305, 84.3099975585938, 82.6600036621094, 
82.3399963378906, 88.4100036621094, 90.0800018310547, 92.25, 
74.6999969482422, 72.3600006103516, 68.3899993896484, 67.9300003051758, 
65.629997253418, 74.9000015258789, 74.8600006103516, 81.6999969482422, 
79.370002746582, 79.0899963378906, 75.25, 81.9499969482422, 81.0199966430664, 
88.3399963378906, 59.8199996948242, 55.6500015258789, 52.2999992370605, 
51.8499984741211, 56.9199981689453, 66.620002746582, 65.3300018310547, 
85.9800033569336, 83.9499969482422, 85.1100006103516, 67.4300003051758, 
76.4400024414062, 69.9199981689453, 78.4599990844727, 19.0599994659424, 
17.5200004577637, 18.1200008392334, 23.5025005340576, 35.5349998474121, 
36.7350006103516, 41.1725006103516)), row.names = c(NA, -77L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response：

Let's take your data as df:

df<- structure(
  list(
    Company = c(
      "AIR PRODUCTS & CHEMICALS INC",
      ...
      ...
  ),
  row.names = c(NA,-77L),
  class = c("tbl_df",
            "tbl", "data.frame")
)

lets take categories and build a small dataframe, then a bit of dplyr

ESG <- c("poor", "medium", "good", "excellent")
da <- data.frame(ESGColumn = 1:4,FlatESG = ESG)

df <- df |> dplyr::mutate(ESGColumn = floor(ESGscore/25) 1) |>
  dplyr::left_join(da, by="ESGColumn") |>
  dplyr::select(-"ESGColumn")

head(df)

# A tibble: 6 × 5
  Company                       Year gvkey ESGscore FlatESG  
  <chr>                        <dbl> <dbl>    <dbl> <chr>    
1 AIR PRODUCTS & CHEMICALS INC  2011  1209     84.3 excellent
2 AIR PRODUCTS & CHEMICALS INC  2012  1209     81.9 excellent
3 AIR PRODUCTS & CHEMICALS INC  2013  1209     77.4 excellent
4 AIR PRODUCTS & CHEMICALS INC  2014  1209     80.1 excellent
5 AIR PRODUCTS & CHEMICALS INC  2015  1209     78.6 excellent
6 AIR PRODUCTS & CHEMICALS INC  2016  1209     76.4 excellent

Grzegorz

CodePudding user response：

Your sample data does not include a variable called ESG.Kategorien but it does include ESGscore. The following should give you what you want:

Datensatz_final_so$Dummy <- cut(Datensatz_final_so$ESGscore, breaks=c(0, 25, 50, 75, 100), labels=c("poor", "medium", "good", "excellent"))
table(Datensatz_final_so$Dummy)
# 
#      poor    medium      good excellent 
#         4         3        38        32 
levels(Datensatz_final_so$Dummy)
# [1] "poor"      "medium"    "good"      "excellent"

Note your original categorization places 75 in good AND excellent.