The following dialog is presented here to ilustrate the use of a SELF JOIN:

From: Paul Sorenson [SMTP:pauls@classware.com.au]

Sent: Saturday, August 01, 1998 7:17 AM

To: raghavan@NKU.EDU

Subject: SQL "query"

Dear Dr Raghavan,

You seem to know a lot about SQL and I wondered if you could suggest a single SQL statement that from the table below would select the rows marked with '*' including all columns in the result set. I want to:

"Retrieve rows with the latest date for each Id"

Id Date Percentt
1746 3/04/98 85
1746 10/04/98 100*
1747 14/11/97 0
1747 21/11/97 10
1747 28/11/97 5*

 

The problem I have is that since I have to include Id and Percent in the select as well as max(Date), the group by (Id, Percent) forces me to retrieve all rows. If I leave Percent out the select then I get the rows that I want but not the Percent column so I need to create a temporary table then do a second join.

I would be most grateful to hear your suggestions.

Thank you.

Paul Sorenson Classware Australia Pty Ltd

Ph +613 9886-1913 73 Stanley Rd Vermont Sth

Fx +613 9886-1913 Australia

REPLY FROM DR.RAGAHVAN

At 07:24 2/08/98 -0400, you wrote:

It appears to be easy. Can you check whether the following query gives the result set you want? I changed the name of the date column to "ldate" since date is a reserved word in all implementations of SQL that I know.

SELECT * FROM delme WHERE ldate IN (SELECT MAX(ldate) FROM delme GROUP BY id);

The logic of this query is:

First, the subquery "SELECT MAX(ldate) FROM delme GROUP BY id" fetches the max(date) for each ID. Remember, it is okay to have a column in group by that is not part of the SELECT list, alothough SQL does not allow you to have a column in SELECT that is NOT in group by - except for columns in aggregate functions. (If you want a more detailed explanation of the group by clause, please visit my webpage: http://www.nku.edu/~raghavan/gby.html)

SELECT * FROM delme fetches all columns and the WHERE clause looks for

only those date which is in the result set of this subquery.

Hope I have understood your problem correctly. If you have something else in mind, please let me know.

Dr. Vijay V. Raghavan

REPLY FROM PAUL

Thank you for your suggestion.

It is close but not quite - to be fair your query yields the correct result on the data I provided you. However if I add one more record to the set with a duplicate date (quite common in this application), three records are returned rather than the two I want. The sub-select works fine but the join to the top level select somehow needs to include both Id and Date fields to work properly.

Id Date Percent | wanted | retrieved

1746 03/04/98 85 | | * (same date as last record)

1746 10/04/98 100 | * | *

1747 14/11/97 0 | |

1747 21/11/97 10 | |

1747 28/11/97 5 | |

1747 03/04/98 9 | * | *

The only solutions I see are:

- write two seperate SQL statements with an intermediate table (similar to above but now I can join on the two fields that make each row unique).

FINAL SOLUTION FROM DR. RAGHAVAN

At 17:56 14/08/98 -0400, you wrote:

Dear Paul:

Here is a belated answer to your problem.

Try the following:

SELECT * FROM delme a WHERE ldate = (SELECT MAX(ldate) FROM delme b WHERE b.id = a.id);

This technique uses a self join and I owe this solution to my

nephew Mr. Pradheep Kumar at

pkumar1@kmart.com

Vijay Raghavan

FAREWELL FROM PAUL

Thanks,

It works. I didn't know you could join on fields between inner and outer select. Please thank your nephew for me.

The only problem now is that it is much slower than my separate queries (I haven't shown you the whole problem) so I need to play around a bit to find the most efficient form.

Thanks again.

Paul

Mail your comments to: Raghavan