August 22, 2005: On max and group by

create table customers (
  id       number primary key,
  name     varchar2(20)
);

insert into customers values (1, 'Micky Mouse');
insert into customers values (2, 'Donald Duck');
insert into customers values (3, 'Dagobert');

And you have a tables with the customer's orders:

create table orders (
  cust_id    references customers,
  article_nr varchar2(20),
  price      number(5,2)
);

insert into orders values (1, '204.290', 199.00);
insert into orders values (2, '923.150',   7.98);
insert into orders values (3, '523.672', 299.95);
insert into orders values (3, '817.543',  29.99);
insert into orders values (1, '195.866',  39.90);
insert into orders values (2, '674.687',  52.39);
insert into orders values (1, '375.052',  17.20);
insert into orders values (3, '440.291', 179.95);

Your task

Your task is now to find each customer's order with the highest price along with its respective article number and the customer's name.

The traditional approach

A traditional approach to your task is to have a subquery that groups the orders by the customers and join this subquery to the customers table:

select
  c.name,
  o.article_nr,
  o.price
from
  orders    o, 
  customers c,
   (
    select 
       max(price) max_price,
       cust_id
      from orders
     group by cust_id
    ) subquery
where
  o.price = subquery.max_price and 
  c.id    = subquery.cust_id;

The output of this query:

NAME                 ARTICLE_NR                PRICE
-------------------- -------------------- ----------
Micky Mouse          204.290                     199
Dagobert             523.672                  299.95
Donald Duck          674.687                   52.39

The row_number analytic function approach

With the row_number analyic functions, the need to join the subquery (used to find the record with the maximum price) is eliminated. However, a subquery is still needed and the row_number column must be compared with 1 (one):

select
  c.name,
  s.article_nr,
  s.price
from 
  customers c,
 (select
    cust_id,
    price,
    article_nr,
    row_number() over (partition by cust_id order by price desc) r
  from
    orders
  ) s
where 
  s.r = 1 and
  s.cust_id = c.id;

This query's output is the same as the traditional query's output, although the order of the records might change.

The min() keep (dense_rank...) approach

Although the row_number approach is (imho) more readable than the traditional approach, the need for the subquery is still disturbing.

Fortunately, in the quest for the most undecipherable syntax, some ill mind came up with a construct that does away with the subquery altogether:

select
  c.name,
  max(article_nr) keep (dense_rank last order by price) article_nr,
  max(price)                                            price
from
  orders o, customers c
where
  o.cust_id = c.id
group by
  c.name,
  cust_id;

The dense_rank last order by price might be understandable, however. It sort of tells to order each group (stemming from the group by) by the price. But it's beyond my imagination what the max(article_nr) keep are used for. In fact, the result set is the same if I choose any other aggregate function instead of max, such as for example sum:

select
  c.name,
  sum(article_nr) keep (dense_rank last order by price) article_nr,
  max(price)                                            price
from
  orders o, customers c
where
  o.cust_id = c.id
group by
  c.name,
  cust_id;

August 22, 2005: On max and group by

Your initial position

Your task

The traditional approach

The row_number analytic function approach

The min() keep (dense_rank...) approach

More on Oracle


	René Nyffenegger's collection of things on the web
	René Nyffenegger on Oracle - Most wanted - Feedback - Follow @renenyffenegger