I'm hoping to use google sheets to determine the profitability of each word in around 100 10-25 word messages.
Right now I'm starting with something along the lines of this:
| Message | Success | Word Count |
|---|---|---|
| Cats are gross | 150% | 3 |
| I love Dogs | 250% | 3 |
| I love Dogs and Cats are gross | 450% | 6 |
To query the word usage I'm using the formula:
=ArrayFormula(query(Flatten(split(trim(REGEXREPLACE(LOWER(Message1:Message), "[!?,.:;-]"," "))," ")), "SELECT Col1, COUNT(Col1) Group By Col1",0))
This gets me to here:
| Word | Count |
|---|---|
| cats | 2 |
| are | 2 |
| gross | 2 |
| i | 2 |
| love | 2 |
| dogs | 2 |
| and | 1 |
And I'm using this formula to assign the Success% to each word using the word count as its basis for splitting the words and copying down:
=IFERROR(IF(COUNTIF(Value$1:Value2,Value2)<=VLOOKUP(Value2,$Success$2:$Word_Count$100,6,FALSE),Value2,IF(OFFSET($Success$2,MATCH(Value2,$Success$2:$Success$100,0),0)="","",OFFSET($Success$2,MATCH(Value2,$Success$2:$Success$100,0),0))),"")
This gets me here:
| Word | Value |
|---|---|
| cats | 150% |
| are | 150% |
| gross | 150% |
| I | 250% |
| love | 250% |
| dogs | 250% |
| I | 450% |
| love | 450% |
| dogs | 450% |
| and | 450% |
| cats | 450% |
| are | 450% |
| gross | 450% |
What formula could I use to combine these data sets to get me to here:
| Word | Count | Success AVG |
|---|---|---|
| cats | 2 | 300% |
| are | 2 | 300% |
| gross | 2 | 300% |
| I | 2 | 350% |
| love | 2 | 350% |
| dogs | 2 | 350% |
| and | 1 | 450% |
CodePudding user response:
use:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(SPLIT(REGEXREPLACE(LOWER(A2:A),
"[!?,.:;-]", " "), " ")&"×"&B2:B), "×"),
"select Col1,count(Col1),avg(Col2)
where Col2 is not null
group by Col1
label Col1'Word',count(Col1)'Count',avg(Col2)'Success AVG'"))


