Guten Tag community :)
I'm currently working with the ESGscore of some companies. The ESGscore varies between 0 and 100.
I want to categorize the ESGscore into 4 segments:
0 - 25 --> poor --> 4
>25 - 50 --> medium --> 3
>50 - 75 --> good --> 2
75 - 100 --> excellent --> 1
The problem with dummy.code is that it is rearranging the ESGscore. So for example the ESGscore of the AIR PRODUCTS & CHEMICALS INC is always 'excellent' but the output shows me, that it is just medium.
This is what the CODE looks like:
Datensatz_final_so$ESG.Kategorien <- ifelse(Datensatz_final_so$ESGscore <= 25, "4",
ifelse(Datensatz_final_so$ESGscore > 25 & Datensatz_final_so$ESGscore <= 50, "3",
ifelse(Datensatz_final_so$ESGscore > 50 & Datensatz_final_so$ESGscore <= 75, "2",
ifelse(Datensatz_final_so > 75, "1", 0))))``
# Create ESGscore dummy #
Dummy.ESG <- dummy.code(Datensatz_final_so$ESG.Kategorien)
colnames(Dummy.ESG) <- c("poor", "medium", "good", "excellent")
# Connect data and dummy #
Datensatz_final <- cbind(Datensatz_final, Dummy.ESG)
Do you know how to fix that?
One way is to rearrange colnames to
colnames(Dummy.ESG) <- c("good", "excellent", "poor", "medium")
but it is creating the problem, that R picks medium for reference in the analysis.
Thank you in advance! :)
DATA EXAMPLE:
structure(list(Company = c("AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "HESS CORP", "HESS CORP", "HESS CORP",
"HESS CORP", "HESS CORP", "HESS CORP", "HESS CORP", "APACHE CORP",
"APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP",
"APACHE CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP",
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP",
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "BALL CORP", "BALL CORP",
"BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP",
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP",
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "ECOLAB INC",
"ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC",
"ECOLAB INC", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP",
"EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP",
"FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP",
"FMC CORP", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO",
"HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO",
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE",
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE",
"HELMERICH & PAYNE"), Year = c(2011, 2012, 2013, 2014, 2015,
2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016,
2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013,
2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011,
2012, 2013, 2014, 2015, 2016, 2017), gvkey = c(1209, 1209, 1209,
1209, 1209, 1209, 1209, 1380, 1380, 1380, 1380, 1380, 1380, 1380,
1678, 1678, 1678, 1678, 1678, 1678, 1678, 1913, 1913, 1913, 1913,
1913, 1913, 1913, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 2991,
2991, 2991, 2991, 2991, 2991, 2991, 4213, 4213, 4213, 4213, 4213,
4213, 4213, 4503, 4503, 4503, 4503, 4503, 4503, 4503, 4510, 4510,
4510, 4510, 4510, 4510, 4510, 5439, 5439, 5439, 5439, 5439, 5439,
5439, 5581, 5581, 5581, 5581, 5581, 5581, 5581), ESGscore = c(84.2750015258789,
81.9225006103516, 77.4024963378906, 80.1125030517578, 78.6449966430664,
76.3775024414062, 79.2699966430664, 69.4899978637695, 65.8300018310547,
64.4300003051758, 74.3000030517578, 75.7600021362305, 71.4599990844727,
74.6900024414062, 55.8300018310547, 56.0900001525879, 57.5, 60.75,
60.8800010681152, 67.379997253418, 71.9899978637695, 82.9000015258789,
77.3899993896484, 76.9300003051758, 78.7399978637695, 76.2283325195312,
74.2125015258789, 68.3600006103516, 64.4100036621094, 65.6600036621094,
63.75, 67.7300033569336, 67.5699996948242, 74.4300003051758,
68.5699996948242, 86.5100021362305, 84.3099975585938, 82.6600036621094,
82.3399963378906, 88.4100036621094, 90.0800018310547, 92.25,
74.6999969482422, 72.3600006103516, 68.3899993896484, 67.9300003051758,
65.629997253418, 74.9000015258789, 74.8600006103516, 81.6999969482422,
79.370002746582, 79.0899963378906, 75.25, 81.9499969482422, 81.0199966430664,
88.3399963378906, 59.8199996948242, 55.6500015258789, 52.2999992370605,
51.8499984741211, 56.9199981689453, 66.620002746582, 65.3300018310547,
85.9800033569336, 83.9499969482422, 85.1100006103516, 67.4300003051758,
76.4400024414062, 69.9199981689453, 78.4599990844727, 19.0599994659424,
17.5200004577637, 18.1200008392334, 23.5025005340576, 35.5349998474121,
36.7350006103516, 41.1725006103516)), row.names = c(NA, -77L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
Let's take your data as df:
df<- structure(
list(
Company = c(
"AIR PRODUCTS & CHEMICALS INC",
...
...
),
row.names = c(NA,-77L),
class = c("tbl_df",
"tbl", "data.frame")
)
lets take categories and build a small dataframe, then a bit of dplyr
ESG <- c("poor", "medium", "good", "excellent")
da <- data.frame(ESGColumn = 1:4,FlatESG = ESG)
df <- df |> dplyr::mutate(ESGColumn = floor(ESGscore/25) 1) |>
dplyr::left_join(da, by="ESGColumn") |>
dplyr::select(-"ESGColumn")
head(df)
# A tibble: 6 × 5
Company Year gvkey ESGscore FlatESG
<chr> <dbl> <dbl> <dbl> <chr>
1 AIR PRODUCTS & CHEMICALS INC 2011 1209 84.3 excellent
2 AIR PRODUCTS & CHEMICALS INC 2012 1209 81.9 excellent
3 AIR PRODUCTS & CHEMICALS INC 2013 1209 77.4 excellent
4 AIR PRODUCTS & CHEMICALS INC 2014 1209 80.1 excellent
5 AIR PRODUCTS & CHEMICALS INC 2015 1209 78.6 excellent
6 AIR PRODUCTS & CHEMICALS INC 2016 1209 76.4 excellent
Grzegorz
CodePudding user response:
Your sample data does not include a variable called ESG.Kategorien but it does include ESGscore. The following should give you what you want:
Datensatz_final_so$Dummy <- cut(Datensatz_final_so$ESGscore, breaks=c(0, 25, 50, 75, 100), labels=c("poor", "medium", "good", "excellent"))
table(Datensatz_final_so$Dummy)
#
# poor medium good excellent
# 4 3 38 32
levels(Datensatz_final_so$Dummy)
# [1] "poor" "medium" "good" "excellent"
Note your original categorization places 75 in good AND excellent.
