GROUP BY Clause
Author: Dr. Vijay V. Raghavan
The GROUP BY Clause is how SQL summarizes data. It is used to divide data from a table based on rows that share common characteristics. Consider the following table:
PatList Table
Pat_id | Pat_type | Insurance | Tot_Amount |
001 | Outpatient | Choice Care | 960.36 |
006 | Inpatient | Choice Care | 859.12 |
009 | Outpatient | BC/BS | 50.97 |
010 | Inpatient | BC/BS | 387.26 |
012 | Outpatient | Choice Care | 123.96 |
014 | Outpatient | BC/BS | 450.00 |
Let us start with a simple GROUP BY query:
SELECT Insurance FROM PatList GROUP BY Insurance; |
You can think of a GROUP BY clause as working in two steps:
This means that the query mentioned above Will give the following output:
Insurance |
BC/BS |
Choice Care |
Now, Let us change the query to:
SELECT Insurance, pat_id FROM PatList GROUP BY Insurance; |
This should result in an error similar to "the column pat_id cannot be used unless it is in a group by expression." It is easy to see the reasoning behind this error message. The purpose of the GROUP BY clause is to produce groups that share common characteristics. The result is always a grouped tabled based on these characteristics. The SELECT statement should hence contain only those columns that further describe the groups. In other words, the select statements can contain only group attributes. These groups attributes are only those columns that are in the group expression or they could be aggregate functions such as COUNT, SUM, and AVG.
Following this logic, for the query:
SELECT Insurance, Pat_type FROM PatList GROUP BY Insurance, Pat_type; |
the result would be:
Insurance | Pat_type |
BC/BS | Inpatient |
Choice Care | Inpatient |
BC/BS | Outpatient |
Choice Care | Outpatient |
Consider the following query designed to extract the groups and the total of the tot_amount for each group:
SELECT Insurance, sum( tot_amount) FROM PatList GROUP BY Insurance ; |
This query is valid since aggregate functions are allowed in the GROUP BY clause. Through this query, we are first seeking to group by insurance, and we also want to select the sum of the tot_amount for each group. Note that the sum of the tot_amount is simply an attribute of the grouping we are seeking. The result of this query is:
Insurance | sum(tot_amount) |
BC/BS | 888.23 |
Choice Care | 1943.44 |
If the SELECT statement contains a GROUP BY clause, the select list can contain only the following:
TRAPS of GROUP BY
Although we have noted that we could use in the SELECT list only those columns that are specified in the GROUP BY, certain implementations such as SYBASE won't prevent you from not following this rule, but the result of such query is generally meaningless.
HAVING Clause
HAVING clause is a restriction placed on the groups in very much the way as a WHERE clause which is a restriction on rows. The HAVING clause checks for the conditions specified and retains only those groups that meet the specified conditions.
SELECT
Insurance FROM patlist GROUP BY Insurance HAVING sum(tot_amount) > 1000; |
This query will result in only Choice Care. The query SELECT Insurance FROM patlist GROUP BY Insurance is done first to produce all the groups of insurance and among those groups only those that have sum(tot_amount) > 1000 are retained in the final result set.
Please send your comments to: Raghavan
© Dr. Vijay V. Raghavan
BACK TO Table of contents
BACK TO DATABASE Main Page