I'm new to SQL and I am trying to filter out responses with no variation for survery collection (invalid responses) to do multi-linear regression. Do take note that there is actually more than 100 records for this table and I have simplified it for the illustration.
Database: MySQL 8.0.30 : TLSv1.2 (TablePlus)
- ID is the respondent number.
- Variables - x1, x2, x3 is the independent variables.
- Values - Survery response.
For example this is the current table I have:
| ID | Variables | Values |
|---|---|---|
| 1 | x1 | 1 |
| 1 | x2 | 1 |
| 1 | x3 | 1 |
| 2 | x1 | 2 |
| 2 | x2 | 3 |
| 2 | x3 | 4 |
| 3 | x1 | 5 |
| 3 | x2 | 5 |
| 3 | x3 | 5 |
Scripts used: SELECT ID, Variables, Values FROM TableA GROUP BY ID
I am trying to achieve the following table, where I only want to keep the records which have a variation in the responses:
| ID | Variables | Values |
|---|---|---|
| 2 | x1 | 2 |
| 2 | x2 | 3 |
| 2 | x3 | 4 |
I have tried to use the functions WHERE, DISTINCT, WHERE NOT, HAVING, but I can't seem to get the results that I require, or showing blank most times (like the table below). If anyone is able to help, that would be most helpful.
| ID | Variables | Values |
|---|
Thank you very much!
CodePudding user response:
Your problem has two parts so you are going to need to use a subquery for this.
- you want to know which responses have variations. For this you'll want to group by the responses by the
id, and then check that the responses that have the sameidall have the samevalue, or not. For this you can select only those having more than one distinct value:
select `id`
from results
group by `id`
having count(distinct `values`) > 1
- based on that you can just wrap it with a
selectto get all the fields that you want, ungrouped:
select *
from results
where `id` in (
select `id`
from results
group by `id`
having count(distinct `values`) > 1
)
This is MySQL syntax, but shouldn't have that many differences for main dbs
SQL Fiddle: http://sqlfiddle.com/#!9/a266f806/4/0
Hope that helps
CodePudding user response:
Try the following:
WITH ids_with_variations as
(
SELECT ID
,COUNT(DISTINCT [Values]) as unique_value_count
FROM TableA
GROUP BY ID
HAVING COUNT(DISTINCT [Values]) = 3 -- this assumes that you expect each ID to have exactly three responses
)
SELECT *
FROM TableA
WHERE ID IN (SELECT ID FROM ids_with_variations)
This is TSQL dialect. This also assumes that you expect exactly three variations in the value column.
