Apologies if the title is unclear, the question is difficult to phrase.
I have a dataset that describes the positions of football players over time on a field during various plays. The table I'm trying to query has a frame_id column that is an incrementing counter, with there being 10 frames per second.
The table also has an event column, which describes an event that may have happened that frame, like ball_snap or kick_received. In the course of a play, some of these events can be guaranteed to happen in a specific order.
I want to find the frames that occur between two events.
If I want to get an ordered set of frame_ids, between two events defined in a different column, how would I go about that?
My first attempt was:
select game_id, play_id, min(frame_id) as snap_frame, max(frame_id) as tackle_frame
from tracking
where event = 'ball_snap'
or event = 'tackle'
group by game_id, play_id;
but this did not work. I seem to be getting some undefined behavior where this works for some records but not others
Edit to include some data
CSV: https://gist.github.com/bholzer/bae669a6e70f68807717f79a66af7a54
| game_id | play_id | frame_id | event |
|---|---|---|---|
| 10 | 20 | 10 | |
| 10 | 20 | 11 | ball_snap |
| ... | ... | ... | ... |
| 10 | 20 | 77 | |
| 10 | 20 | 78 | punt_received |
I want a query that can get everything between frame 11 and 78 in this data
CodePudding user response:
There are many ways to solve it, one of them is using the EXISTS operator:
SELECT *
FROM tracking t
WHERE EXISTS (
SELECT 1 FROM tracking te
WHERE t.game_id = te.game_id
AND t.play_id = te.play_id
AND te.event IN ('ball_snap', 'punt_received')
HAVING t.frame_id > MIN(te.frame_id) AND t.frame_id < MAX(te.frame_id)
)
CodePudding user response:
You have two different questions in your post :
If I want to get an ordered set of frame_ids, between two events defined in a different column, how would I go about that?
As far as I understand your tracking table structure, the same event name will appear in many rows of the table because it can occur several times during the game, so you can't use the column event in order to identify one specific event which occurs at one time, and as a consequence, you can't select all the events between two events characterized by their name only.
I want a query that can get everything between frame 11 and 78 in this data
If we assume that the frame_id values are ordered according to the timeline, then the query that achieves this is :
SELECT * FROM tracking WHERE frame_id BETWEEN 11 AND 78
