The following dialog is presented here to ilustrate the use of a SELF JOIN:
From: Paul Sorenson [SMTP:pauls@classware.com.au]
Sent: Saturday, August 01, 1998 7:17 AM
To: raghavan@NKU.EDU
Subject: SQL "query"
Dear Dr Raghavan,
You seem to know a lot about SQL and I wondered if you could suggest a single SQL statement that from the table below would select the rows marked with '*' including all columns in the result set. I want to:
"Retrieve rows with the latest date for each Id"
Id | Date | Percentt |
1746 | 3/04/98 | 85 |
1746 | 10/04/98 | 100* |
1747 | 14/11/97 | 0 |
1747 | 21/11/97 | 10 |
1747 | 28/11/97 | 5* |
The problem I have is that since I have to include Id and Percent in the select as well as max(Date), the group by (Id, Percent) forces me to retrieve all rows. If I leave Percent out the select then I get the rows that I want but not the Percent column so I need to create a temporary table then do a second join.
I would be most grateful to hear your suggestions.
Thank you.
Paul Sorenson Classware Australia Pty Ltd
Ph +613 9886-1913 73 Stanley Rd Vermont Sth
Fx +613 9886-1913 Australia
REPLY FROM DR.RAGAHVAN
At 07:24 2/08/98 -0400, you wrote:
It appears to be easy. Can you check whether the following query gives the result set you want? I changed the name of the date column to "ldate" since date is a reserved word in all implementations of SQL that I know.
SELECT * FROM delme WHERE ldate IN (SELECT MAX(ldate) FROM delme GROUP BY id);
The logic of this query is:
First, the subquery "SELECT MAX(ldate) FROM delme GROUP BY id" fetches the max(date) for each ID. Remember, it is okay to have a column in group by that is not part of the SELECT list, alothough SQL does not allow you to have a column in SELECT that is NOT in group by - except for columns in aggregate functions. (If you want a more detailed explanation of the group by clause, please visit my webpage: http://www.nku.edu/~raghavan/gby.html)
SELECT * FROM delme fetches all columns and the WHERE clause looks for
only those date which is in the result set of this subquery.
Hope I have understood your problem correctly. If you have something else in mind, please let me know.
Dr. Vijay V. Raghavan
REPLY FROM PAUL
Thank you for your suggestion.
It is close but not quite - to be fair your query yields the correct result on the data I provided you. However if I add one more record to the set with a duplicate date (quite common in this application), three records are returned rather than the two I want. The sub-select works fine but the join to the top level select somehow needs to include both Id and Date fields to work properly.
Id Date Percent | wanted | retrieved
1746 03/04/98 85 | | * (same date as last record)
1746 10/04/98 100 | * | *
1747 14/11/97 0 | |
1747 21/11/97 10 | |
1747 28/11/97 5 | |
1747 03/04/98 9 | * | *
The only solutions I see are:
- write two seperate SQL statements with an intermediate table (similar to above but now I can join on the two fields that make each row unique).
FINAL SOLUTION FROM DR. RAGHAVAN
At 17:56 14/08/98 -0400, you wrote:
Dear Paul:
Here is a belated answer to your problem.
Try the following:
SELECT * FROM delme a WHERE ldate = (SELECT MAX(ldate) FROM delme b WHERE b.id = a.id);
This technique uses a self join and I owe this solution to my
nephew Mr. Pradheep Kumar at
pkumar1@kmart.com
Vijay Raghavan
FAREWELL FROM PAUL
Thanks,
It works. I didn't know you could join on fields between inner and outer select. Please thank your nephew for me.
The only problem now is that it is much slower than my separate queries (I haven't shown you the whole problem) so I need to play around a bit to find the most efficient form.
Thanks again.
Paul
Mail your comments to: Raghavan