I have an Excel sheet including 2000 tweets. Five annotators labeled each tweet as Hate, Neutral, and Counterhate. I want to create a new column indicating the majority voting on these labels for each tweet.
For example, if for a tweet, three annotators voted on hate, one voted Counterhate, and one Neutral, then the majority voting of this tweet should be Hate.
- Question1: Please let me know what the formula is to do this.
- Question2: what is the majority of voting for a tweet that the number of two classes are same? For example, two annotators voted as
Hate, two asCounterhate, and oneNeutral?
Following is the screenshot of my excel sheet and the formula I wrote, but it returns false.
| dex | Text | Label1 | Label2 | Label3 | #of Hate | #of Neutral | #of Counterhate | majority | Agreement |
|---|---|---|---|---|---|---|---|---|---|
| 1 | tweet1 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.6 |
| 2 | tweet2 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.6 |
| 3 | tweet3 | Neutral | Neutral | Neutral | 0 | 3 | 0 | FALSE | 1 |
| 4 | tweet4 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.6 |
| 5 | tweet5 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.4 |
| 6 | tweet6 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.4 |
| 7 | tweet7 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 1 |
| 8 | tweet8 | Neutral | Hate | Neutral | 1 | 2 | 0 | FALSE | 0.6 |
| 9 | tweet9 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.6 |
| 10 | tweet10 | Counterhate | Neutral | Counterhate | 0 | 1 | 2 | FALSE | 0.4 |
I rewrite my formula:
=OR(IF(MAX(F2:H2)=F2,"Hate"),IF(MAX(F2:H2)=G2,"Neutral"),
IF(MAX(F2:H2)=H2,"Counterhate"))
CodePudding user response:
You can try the following for Question 1 in K2 cell:
=LET(values, {"Hate","Neutral","Counterhate"},
counts, COUNTIF($C2:$E2,values),
INDEX(values, XMATCH(MAX(counts),counts)))
Notes:
- Question 2 is not excel related, it is business specific rule you need to define and then to apply that rule in Excel.
- There is no need to use the helper additional columns you have for that. I obtain the result just based on Label columns:
LET is used for easier maintain the formula, creating variables representing portion of the formula will be repeated more than once.
counts, COUNTIF($C2:$E2,values)
Counts how many times the input range $C2:$E2 contains the values.
INDEX(values, XMATCH(MAX(counts),counts))
Calculates the index position in values from the maximum number of repetitions.
The rest is just to expand down the formula in K2 cell.
Tip: If the business rule for Question2 is to consider for example Neutral, if two or more values have the same count. Then the previous formula can be modified as follow:
=LET(values, {"Hate","Neutral","Counterhate"},counts,
COUNTIF($C2:$E2,values), maxNum, SUM(--
ISNUMBER(XMATCH(counts,MAX(counts)))),
IF(maxNum > 1,"Neutral",INDEX(values, XMATCH(MAX(counts),counts))))

