How to Get a Non-Aggregate value in an Aggregate Query

The KEEP clause in Oracle

Suppose you want to find the maximum salary for each job in the emp table. This is very easy using using the max aggregate:

select job, max(sal) max_sal
from emp
group by job
order by job
;

JOB       MAX_SAL 
--------- ------- 
ANALYST      3000 
CLERK        1300 
MANAGER      2975 
PRESIDENT    5000 
SALESMAN     1600

The next logical question is: Who earns the max salary? A very common solution to this problem is to use an analytical function:

select
    job, ename, sal
from (
    select
        row_number() over (partition by job order by sal desc) rn,
        job, ename, sal
    from emp
)
where rn = 1
order by job
;

JOB       ENAME   SAL 
--------- ------ ---- 
ANALYST   SCOTT  3000 
CLERK     MILLER 1300 
MANAGER   JONES  2975 
PRESIDENT KING   5000 
SALESMAN  ALLEN  1600 

-- Explain plan:                                                                                     
------------------------------------------------------------------------------------ 
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
------------------------------------------------------------------------------------ 
|   0 | SELECT STATEMENT            |      |    14 |   546 |     4  (25)| 00:00:01 | 
|*  1 |  VIEW                       |      |    14 |   546 |     4  (25)| 00:00:01 | 
|*  2 |   WINDOW SORT PUSHED RANK   |      |    14 |   252 |     4  (25)| 00:00:01 | 
|   3 |    TABLE ACCESS STORAGE FULL| EMP  |    14 |   252 |     3   (0)| 00:00:01 | 
------------------------------------------------------------------------------------ 

Predicate Information (identified by operation id):                                  
---------------------------------------------------                                  

   1 - filter("RN"=1)                                                                
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "JOB" ORDER BY "SAL")<=1)

There is an alternate way to do this using the keep clause and dense_rank along with the first/last commands:

select
    job,
    -- The initial "min" aggregate could also be "max" 
    -- The key logic is in the dense_rank first/last and ordering
    min(ename) keep (dense_rank last order by sal) ename,
    -- Could also write as:
    -- min(ename) keep (dense_rank first order by sal desc) ename,
    max(sal)
from emp
group by job
order by job
;

JOB       ENAME  MAX(SAL) 
--------- ------ -------- 
ANALYST   FORD       3000 
CLERK     MILLER     1300 
MANAGER   JONES      2975 
PRESIDENT KING       5000 
SALESMAN  ALLEN      1600 

-- Explain Plan
----------------------------------------------------------------------------------- 
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     | 
----------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT           |      |     5 |    90 |     4  (25)| 00:00:01 | 
|   1 |  SORT GROUP BY             |      |     5 |    90 |     4  (25)| 00:00:01 | 
|   2 |   TABLE ACCESS STORAGE FULL| EMP  |    14 |   252 |     3   (0)| 00:00:01 | 
-----------------------------------------------------------------------------------

Using the dense_rank command is easier to write and it may result in more performant queries. The different explain plans shows that less bytes were required.

Note: the keen observer may note that for the ANALYST position there's two different results. That's because both they're two analysts (FORD and SCOTT) and they both earn the same amount. If "ties are allowed" the following query can be used:

select job, ename, sal
from (
    select 
        job, ename, sal,
        dense_rank() over (partition by job order by sal desc) rn
    from emp
)
where rn = 1
order by job
;

-- Note that ANALYST has two rows given that SCOTT and FORD earn the same amount
JOB       ENAME   SAL 
--------- ------ ---- 
ANALYST   SCOTT  3000 
ANALYST   FORD   3000 
CLERK     MILLER 1300 
MANAGER   JONES  2975 
PRESIDENT KING   5000 
SALESMAN  ALLEN  1600

References

  • Oracle's official documentation on this feature is actually in the first command

  • Rob van Wijk has a good article about the keep clause

    • Covers some of the performance gains in detail

Thanks to Jeff Kemp for showing me this feature!