Friday, July 01, 2005

Extra Columns in a GROUP BY

Happy Canada Day. I have a really good one today.

My problem:

I have a table 'ASSIGNMENTS' that keeps track of each assignment, who holds which assignment, and as of which date they have held it.

It would be very handy to have a view that shows all assignments, and who is currently on that assignment.

CREATE TABLE ASSIGNMENTS(ASSIGNMENT VARCHAR2(32), EMPLOYEE VARCHAR2(32), EFFECTIVE DATE);

INSERT INTO ASSIGNMENTS(ASSIGNMENT, EMPLOYEE, EFFECTIVE) VALUES ('Prime Minister', 'Jean Chretien', '04-Nov-93');
INSERT INTO ASSIGNMENTS(ASSIGNMENT, EMPLOYEE, EFFECTIVE) VALUES ('Governor General', 'Adrienne Clarkson', '07-Oct-99');
INSERT INTO ASSIGNMENTS(ASSIGNMENT, EMPLOYEE, EFFECTIVE) VALUES ('Prime Minister', 'Paul Martin', '12-Dec-03');

You must be familiar with GROUP BY.

SELECT ASSIGNMENT, MAX(EFFECTIVE) AS_OF FROM ASSIGNMENTS GROUP BY ASSIGNMENT;

ASSIGNMENT AS_OF
-------------------------------- ---------
Governor General 07-OCT-99
Prime Minister 12-DEC-03

2 rows selected.

Ok, now we would like to add in the name of the person. But whoops, doesn't work as you'd expect.

SELECT ASSIGNMENT, EMPLOYEE, MAX(EFFECTIVE) AS_OF FROM ASSIGNMENTS GROUP BY ASSIGNMENT;

ORA-00979: not a GROUP BY expression

You see, we can't include any columns that aren't part of the GROUP BY. And, we can't include it in the group by, because then we'd get a result no different from assignments.

There is an answer. Connor McDonald has it:

http://www.oracledba.co.uk/tips/9i_first_last.htm

Alright, what is Connor talking about? Basically he is grouping the "extra" column (empno), and making it a secondary grouping to the main grouping (sal).

But, really, MIN(EMPNO)? Why are we including an aggregate function on empno? Well, Connor is doing it because he wants to break ties.

Even though we don't need to break ties, we still need to use an aggregate function, otherwise it becomes part of the group by. Any aggregate function will do since we shouldn't have duplicates.

DENSE_RANK , FIRST, ORDER BY

- ORDER BY will sort a set of rows by a particular column. In Connor's case it is salary, in our case, it will be 'effective'.
- DENSE_RANK provides the positioning of a particular row within an ordered list of rows.
- FIRST goes hand-in-hand with DENSE_RANK and will naturally provide us with the first, ranked row in a ordered list of rows.

Note that the ORDER BY defaults to ascending order, and so in our case we either want to choose LAST or put the ORDER BY in descending order instead.

Essentially he is creating an ordered list of all salaries, and choosing the empno that shows up first on that list. Sounds like what we want!

Here is Dan Morgan's reference on numeric functions like these:
http://www.psoug.org/reference/number_func.html

KEEP is just a clever Oracle trick to keep the work we're doing in the shared pool because we're using it twice in the same query.

So applying Connor's teachings to our situation, we get:

CREATE OR REPLACE VIEW CURRENT_ASSIGNMENTS AS
SELECT ASSIGNMENT,
MAX(EMPLOYEE) KEEP (DENSE_RANK LAST ORDER BY EFFECTIVE) EMPLOYEE,
MAX(EFFECTIVE) AS_OF
FROM ASSIGNMENTS
GROUP BY ASSIGNMENT;

SELECT * FROM CURRENT_ASSIGNMENTS;

ASSIGNMENT EMPLOYEE AS_OF
-------------------------------- -------------------------------- ---------
Governor General Adrienne Clarkson 07-OCT-99
Prime Minister Paul Martin 12-DEC-03

2 rows selected.

Coming up in some future blog, Tom Kyte mentioned the awesome "MEMBER OF" feature that is in Oracle 10g. It will tell you easily whether a value is contained in a particular set/table. I'm looking for a way to do this in Oracle 9.

http://thinkoracle.blogspot.com/2005/05/enum-in-oracle.html

Comments:
In simple situations, here is the "normal" way of doing it:

SQL> SELECT a.assignment, a.employee, a.effective
2 FROM assignments a,
3 (SELECT assignment, MAX(effective) as_of FROM assignments GROUP BY assignments) a1
4 WHERE a.assignment = a1.assignment
5 AND a.effective = a1.as_of;
 
This comment has been removed by a blog administrator.
 
This comment has been removed by a blog administrator.
 
Is there a solution for when there are duplicates (and you want them) and that doesn't involve involve the "normal" way using a sub-query?
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?