René Nyffenegger's collection of things on the web
René Nyffenegger on Oracle - Most wanted - Feedback -
 

Using SQL to select the value for the last row that was not null

I find it a common problem with SQL when I deal with tables that contain NULL values and if I do a select on that table, I don't want to have NULLs returned but rather the value of the last row that was not NULL.
For example, if I maintain a table that keeps track of how much certain items cost, I'd have something like:
create table item_price (
  dt date,
  item varchar2(30),
  price number(6,2)
);
Now, whenever the price for an item changes, a record is inserted that contains the date (dt) when item changed the price.
alter session set nls_date_format='dd.mm.yyyy';
insert into item_price values('14.05.2001','beans',15.4);
insert into item_price values('17.05.2001','pears', 2.8);
insert into item_price values('28.05.2001','beans',16.2);
insert into item_price values('26.05.2001','pears', 3.1);
insert into item_price values('03.06.2001','beans',14.9);
insert into item_price values('01.06.2001','pears', 2.9);
Now, I am interested in the price of beans between 14.05 and 03.06. First, I create a table that contains a record for each date in this range:
create table date_range as 
  select 
    to_date(rownum-1 + to_date('14.05.2001')) dt 
  from 
    all_objects
  where 
    rownum-1 <= to_date('03.06.2001') - to_date('14.05.2001')
;
The desired result can now be obtained with the following select statement:
select 
  dr_dt,
  sum(ip_price) over (partition by new_price) price   
from (
  select 
    dr.dt                               dr_dt, 
    ip.price                            ip_price, 
    sum(ip.price) over (order by dr.dt) new_price
  from
    date_range dr left join
    item_price ip on dr.dt=ip.dt
  where 
    nvl(ip.item,'beans') = 'beans'
)
order by dr_dt;
DR_DT           PRICE
---------- ----------
14.05.2001       15.4
15.05.2001       15.4
16.05.2001       15.4
17.05.2001       15.4
18.05.2001       15.4
19.05.2001       15.4
20.05.2001       15.4
21.05.2001       15.4
22.05.2001       15.4
23.05.2001       15.4
24.05.2001       15.4
25.05.2001       15.4
26.05.2001       15.4
27.05.2001       15.4
28.05.2001       16.2
29.05.2001       16.2
30.05.2001       16.2
31.05.2001       16.2
01.06.2001       16.2
02.06.2001       16.2
03.06.2001       14.9
How and why does this work? The select statement's inner select statement returns:
DR_DT        IP_PRICE  NEW_PRICE
---------- ---------- ----------
14.05.2001       15.4       15.4
15.05.2001                  15.4
16.05.2001                  15.4
18.05.2001                  15.4
19.05.2001                  15.4
20.05.2001                  15.4
21.05.2001                  15.4
22.05.2001                  15.4
23.05.2001                  15.4
24.05.2001                  15.4
25.05.2001                  15.4
27.05.2001                  15.4
28.05.2001       16.2       31.6
29.05.2001                  31.6
30.05.2001                  31.6
31.05.2001                  31.6
02.06.2001                  31.6
03.06.2001       14.9       46.5
Since I have left outer joined, the result set returns a record for each date stored in date_range. The column ip_price returns the price for the dates where one is stored in item_price. The column new_price does an «analytical» sum, that is it changes its value each time ip_price changes. This new_price can now be used to find out when the price has changed (or is new, hence the name) with the partition by new price clause.
Now, Gemes Tibor has sent me a mail concerning this and points out that this is problematic with negative or zero numbers and doesn't work with varchars at all. He writes:
[...] So I added a new varchar2 type column to the example:


alter table item_price add price2 varchar2(100);
update item_price set price2 = price;
insert into item_price values('22.05.2001','beans',15.9, 'a');


Here is the rewritten select which works for varchar2 type columns as well:

select
dr_dt,
max(ip_price) over (partition by pricechangeindicator) price
from (
select
dr.dt dr_dt,
ip.price2 ip_price, count(ip.price2) over (order by dr.dt) pricechangeindicator
from
(select price2,
dt
from item_price where item='beans'
) ip,
date_range dr
where ip.dt(+) = dr.dt
)
order by dr_dt;
This select results:
DR_DT      PRICE
---------- -----
15.05.2001 15.4
14.05.2001 15.4
16.05.2001 15.4
17.05.2001 15.4
18.05.2001 15.4
19.05.2001 15.4
20.05.2001 15.4
21.05.2001 15.4
23.05.2001 a
22.05.2001 a
24.05.2001 a
27.05.2001 a
26.05.2001 a
25.05.2001 a
28.05.2001 16.2
29.05.2001 16.2
30.05.2001 16.2
02.06.2001 16.2
01.06.2001 16.2
31.05.2001 16.2
03.06.2001 14.9
Finally, he also sends a solution for 10
select 
  dr.dt                                                    dr_dt, 
--ip.price2                                                ip_price, 
  last_value(ip.price2 ignore nulls) over (order by dr.dt) last_price
  from  (
    select price2, 
           dt 
      from item_price 
     where item='beans'
  ) ip, 
  date_range dr
where ip.dt(+) = dr.dt;
which returns the same result set.
I'd like to give thanks to Gemes Tibor for giving me this input.