How to return all names that appear multiple times in table-CodePudding

Suppose I have the following schema:

student(name, siblings)

The related table has names and siblings. Note the number of rows of the same name will appear the same number of times as the number of siblings an individual has. For instance, a table could be as follows:

Jack, Lucy
Jack, Tim

Meaning that Jack has Lucy and Tim as his siblings. I want to identify an SQL query that reports the names of all students who have 2 or more siblings. My attempt is the following:

select name
from student
where count(name) >= 1;

I'm not sure I'm using count correctly in this SQL query. Can someone please help with identifying the correct SQL query for this?

CodePudding user response：

You're almost there:

select name
from student
group by name
having count(*) > 1;

HAVING is a where clause that runs after grouping is done. In it you can use things that a grouping would make available (like counts and aggregations). By grouping on the name and counting (filtering for >1, if you want two or more, not >=1 because that would include 1) you get the names you want..

This will just deliver "Jack" as a single result (in the example data from the question). If you then want all the detail, like who Jack's siblings are, you can join your grouped, filtered list of names back to the table:

select *
from 
student
INNER JOIN
(
  select name
  from student
  group by name
  having count(*) > 1
) morethanone ON morethanone.name = student.name

You can't avoid doing this "joining back" because the grouping has thrown the detail away in order to create the group. The only way to get the detail back is to take the name list the group gave you and use it to filter the original detail data again

Full disclosure; it's a bit of a lie to say "can't avoid doing this": SQL Server supports something called a window function, which will effectively perform a grouping in the background and join it back to the detail. Such a query would look like:

select student.*, count(*) over(partition by name) n
from student

And for a table like this:

jack, lucy
jack, tim
jane, bill
jane, fred
jane, tom
john, dave

It would produce:

jack, lucy, 2
jack, tim, 2
jane, bill, 3
jane, fred, 3
jane, tom, 3
john, dave, 1

The rows with jack would have 2 on because there are two jack rows. There are 3 janes, there is 1 john. You could then wrap all that in a subquery and filter for n > 1 which would remove john

select * 
from
(
  select student.*, count(*) over(partition by name) n
  from student
) x
where x.n > 1

If SQL Server didn't have window functions, it would look more like:

select *
from 
student
INNER JOIN
(
  select name, count(*) as n
  from student
  group by name
) x ON x.name = student.name

The COUNT(*) OVER(PARTITION BY name) is like a mini "group by name and return the count, then auto join back to the main detail using the name as key" i.e. a short form of the latter query

CodePudding user response：

You can do:

select name
from student as s1
where exists (
  select s2
  from student as s2
  where s1.name = s2.name and s1.siblings != s2.siblings
)

CodePudding user response：

I think the best approach is what 'Caius Jard' mentioned. However, additional way if you want to get how many siblings each name has .

SELECT name, COUNT(*) AS Occurrences
FROM  student
GROUP BY name
HAVING (COUNT(*) > 1)