I have started working on a sentiment analysis, but I am having trouble transforming the lexicon into the required format.
My data looks something like this:

| word | alternativeform1 | alternativeform2 | value |
|---|---|---|---|
| abmachen | abgemacht | abmachst | 0.4 |
| Aktualisierung | Aktualisierungen | NA | 0.2 |
I need it to look like this:

| word | value |
|---|---|
| abmachen | 0.4 |
| abgemacht | 0.4 |
| abmachst | 0.4 |
| Aktualisierung | 0.2 |
| Aktualisierungen | 0.2 |
Can you help me find an easy way to do this? Thank you very much :)
Answer:
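You could use `pivot_longer()` from tidyr to stack all the word columns into a single `word` column, then drop the missing forms: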
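```r
library(dplyr)
library(tidyr)

df %>%
  # stack every column except `value` into one column called `word`
  pivot_longer(-value, values_to = "word") %>%
  # drop rows that came from missing alternative forms (NA)
  drop_na(word) %>%
  select(word, value)
```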
This returns:
```
# A tibble: 5 x 2
  word             value
  <chr>            <dbl>
1 abmachen           0.4
2 abgemacht          0.4
3 abmachst           0.4
4 Aktualisierung     0.2
5 Aktualisierungen   0.2
```
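If you would rather avoid the dplyr/tidyr dependency, here is a minimal base R sketch of the same reshaping. It rebuilds the example lexicon from the question by hand (assuming missing forms are real `NA` values, not the string `"NA"`); the names `form_cols` and `long` are just illustrative. The idea is to transpose the word-form columns so that flattening reads each entry's forms in order, repeat each `value` once per form column, then drop the `NA` rows.

```r
# Example lexicon rebuilt from the question; missing forms are NA
df <- data.frame(
  word             = c("abmachen", "Aktualisierung"),
  alternativeform1 = c("abgemacht", "Aktualisierungen"),
  alternativeform2 = c("abmachst", NA),
  value            = c(0.4, 0.2)
)

# every column except `value` holds a word form
form_cols <- setdiff(names(df), "value")

# t() puts each lexicon entry in its own column, so c() reads the forms
# entry by entry: all forms of row 1, then all forms of row 2, ...
long <- data.frame(
  word  = c(t(as.matrix(df[form_cols]))),
  value = rep(df$value, each = length(form_cols))
)

# drop the rows produced by missing alternative forms
long <- long[!is.na(long$word), ]
rownames(long) <- NULL
long
```

Unlike the `pivot_longer()` version, this relies on every word-form column being character (or convertible to it) and on `value` being the only non-form column.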
