I am using hive .14 for a school project (so cant use a better version), and currently I analysing data taken from a uber in NY. I am using 2 tables: yellowtaxi22, for the information regarding the trip the uber did (in this case I will only use the pickupid and the dropoff id); and zoneinfo, that simply have the names of the pickup and dropoff id. I created a query to find what where the most common trips the uber took and then I created a view out of it(this being the commontrip), the problem now is that when I want to change the ids to their corresponding names, it only selects the trips where the pickupid=dropid. Example
| pulocationid | dolocationid | count |
|---|---|---|
| 12 | 34 | 20000 |
| 43 | 12 | 30000 |
| 12 | 12 | 100 |
| 5 | 91 | 40000 |
| 34 | 34 | 70000 |
the result should be
| pulocationid | dolocationid | count |
|---|---|---|
| JFK airport | NV | 20000 |
| Midtown | JFK airport | 30000 |
| JFK airport | JFK airport | 100 |
| Mountain Hill | LINCON SQR | 40000 |
| NV | NV | 70000 |
But the result is
| pulocationid | dolocationid | count |
|---|---|---|
| JFK airport | JFK airport | 100 |
| NV | NV | 70000 |
this is the query that I am using, zoneinfo is the table that contains the location´s name (refered as zone) and its id (whcih can be used in either the pickup or the dropoff column (or both), common trips is the name of the view that I used to count the number of times a trip was repeated, this count is named as "trips" in the query:
select /* Mapjoin(commontrip)*/ zoneinfo.zone as pickup, zoneinfo.zone as dropoff, commontrip.trips
from zoneinfo join commontrip on
(zoneinfo.locationid=commontrip.pulocationid and zoneinfo.locationid=commontrip.dolocationid);
CodePudding user response:
You're comparing zoneinfo.locationid to both commontrip.pulocationidand commontrip.dolocationid, meaning those must be equal with each other as well.
Try joining only one ID column at a time with your "label table"
