René Nyffenegger's collection of things on the web
René Nyffenegger on Oracle - Most wanted - Feedback
 

November 28, 2005: On identifiying «periods of activity»

Here's a table that logs what items (sold_item) were bought at what time (dt) from a vending machine:
create table vending_machine_log (
  sold_item varchar2(20),
  dt        date 
);
The content of the table is:
SOLD_ITEM            DT
-------------------- --------------------
Coke                 nov 27 2005 13:05:27
Sprite               nov 27 2005 13:07:14
Dr. Pepper           nov 27 2005 13:10:58
Diet Coke            nov 27 2005 14:28:05
Fanta                nov 27 2005 14:31:54
Pepsi Max            nov 27 2005 14:32:29
Mountain Dew         nov 27 2005 14:36:32
Root Beer            nov 27 2005 14:51:18
Fanta                nov 27 2005 14:55:57
Pepsi                nov 27 2005 16:22:39
Sprite               nov 27 2005 16:28:06
Vanilla Coke         nov 27 2005 16:28:42
Coke                 nov 27 2005 16:32:17
Now, I want to know if there were «periods of activity» on this machine. I define: A sold item belongs to the same period of activity if it was sold no later than five minutes after the previous sale.
First, I need to identify each row that starts a new activity. This is done by the following query which returns a 1 in the column activity_start for each row that starts a new period of activity, and a null otherwise. I can do that by using the lag analytical function. Lag compares each sale's dt with its previous sale's dt and when it is less than five minutes apart (5*1/24/60), the enclosing case .. when returns 1:
select
  dt,
  case when dt - lag(dt, 1) over (order by dt) > 5 * 1/24/60 then 1 else null end activity_start
from 
  vending_machine_log;
DT                   ACTIVITY_START
-------------------- --------------
nov 27 2005 13:05:27
nov 27 2005 13:07:14
nov 27 2005 13:10:58
nov 27 2005 14:28:05              1
nov 27 2005 14:31:54
nov 27 2005 14:32:29
nov 27 2005 14:36:32
nov 27 2005 14:51:18              1
nov 27 2005 14:55:57
nov 27 2005 16:22:39              1
nov 27 2005 16:28:06              1
nov 27 2005 16:28:42
nov 27 2005 16:32:17
Based on this result set, I can now give each period a unique number (or null for the first period) which is returned in activity_counter. This is achieved by the sum analytical function. The previous query is here used as a subquery:
select 
  dt,
  sum(activity_start) over (order by dt) activity_counter
from (
  select 
    dt,
    case when dt - lag(dt, 1) over (order by dt) > 5 * 1/24/60 then 1 else null end activity_start
  from 
    vending_machine_log
);
DT                   ACTIVITY_COUNTER
-------------------- ----------------
nov 27 2005 13:05:27
nov 27 2005 13:07:14
nov 27 2005 13:10:58
nov 27 2005 14:28:05                1
nov 27 2005 14:31:54                1
nov 27 2005 14:32:29                1
nov 27 2005 14:36:32                1
nov 27 2005 14:51:18                2
nov 27 2005 14:55:57                2
nov 27 2005 16:22:39                3
nov 27 2005 16:28:06                4
nov 27 2005 16:28:42                4
nov 27 2005 16:32:17                4
This result set can now be grouped by the activity counter. The min and max aggregate functions return the activity_begin and activity_end for each period. Also, using count(*), I see how many items were sold within a period. Again, I use the previous select statement as a subquery for the following query:
select min  (dt) activity_begin,
       max  (dt) activity_end,
       count( *) sold_items
from (
  select 
    dt,
    sum(activity_start) over (order by dt) activity_counter
  from (
    select 
      dt,
      case when dt - lag(dt, 1) over (order by dt) > 5 * 1/24/60 then 1 else null end activity_start
    from 
      vending_machine_log
  )
)
group by activity_counter
order by activity_counter nulls first;

ACTIVITY_BEGIN       ACTIVITY_END         SOLD_ITEMS
-------------------- -------------------- ----------
nov 27 2005 13:05:27 nov 27 2005 13:10:58          3
nov 27 2005 14:28:05 nov 27 2005 14:36:32          4
nov 27 2005 14:51:18 nov 27 2005 14:55:57          2
nov 27 2005 16:22:39 nov 27 2005 16:22:39          1
nov 27 2005 16:28:06 nov 27 2005 16:32:17          3
It can of course be argued if the second last period really is a period because only one item was sold then.

More on Oracle

This is an on Oracle article. The most current articles of this series can be found here.