René Nyffenegger's collection of things on the web | |
René Nyffenegger on Oracle - Most wanted - Feedback
- Follow @renenyffenegger
|
November 28, 2005: On identifiying «periods of activity» | ||
Here's a table that logs what items (
sold_item ) were bought at what time (dt ) from a vending machine:
create table vending_machine_log ( sold_item varchar2(20), dt date );
The content of the table is:
SOLD_ITEM DT -------------------- -------------------- Coke nov 27 2005 13:05:27 Sprite nov 27 2005 13:07:14 Dr. Pepper nov 27 2005 13:10:58 Diet Coke nov 27 2005 14:28:05 Fanta nov 27 2005 14:31:54 Pepsi Max nov 27 2005 14:32:29 Mountain Dew nov 27 2005 14:36:32 Root Beer nov 27 2005 14:51:18 Fanta nov 27 2005 14:55:57 Pepsi nov 27 2005 16:22:39 Sprite nov 27 2005 16:28:06 Vanilla Coke nov 27 2005 16:28:42 Coke nov 27 2005 16:32:17
Now, I want to know if there were «periods of activity» on this machine. I define: A sold item belongs to the same
period of activity if it was sold no later than five minutes after the previous sale.
First, I need to identify each row that starts a new activity. This is done by the following query which returns a 1 in the column activity_start
for each row that starts a new period of activity, and a
null otherwise. I can do that by using the
lag analytical function. Lag compares each sale's
dt with its previous sale's
dt and when it is less than five minutes apart (5*1/24/60), the enclosing case .. when
returns 1:
select dt, case when dt - lag(dt, 1) over (order by dt) > 5 * 1/24/60 then 1 else null end activity_start from vending_machine_log; DT ACTIVITY_START -------------------- -------------- nov 27 2005 13:05:27 nov 27 2005 13:07:14 nov 27 2005 13:10:58 nov 27 2005 14:28:05 1 nov 27 2005 14:31:54 nov 27 2005 14:32:29 nov 27 2005 14:36:32 nov 27 2005 14:51:18 1 nov 27 2005 14:55:57 nov 27 2005 16:22:39 1 nov 27 2005 16:28:06 1 nov 27 2005 16:28:42 nov 27 2005 16:32:17
Based on this result set, I can now give each period a unique number (or null for the first period) which is returned in
activity_counter . This is achieved by the
sum analytical function. The previous query is here used as a subquery:
select dt, sum(activity_start) over (order by dt) activity_counter from ( select dt, case when dt - lag(dt, 1) over (order by dt) > 5 * 1/24/60 then 1 else null end activity_start from vending_machine_log ); DT ACTIVITY_COUNTER -------------------- ---------------- nov 27 2005 13:05:27 nov 27 2005 13:07:14 nov 27 2005 13:10:58 nov 27 2005 14:28:05 1 nov 27 2005 14:31:54 1 nov 27 2005 14:32:29 1 nov 27 2005 14:36:32 1 nov 27 2005 14:51:18 2 nov 27 2005 14:55:57 2 nov 27 2005 16:22:39 3 nov 27 2005 16:28:06 4 nov 27 2005 16:28:42 4 nov 27 2005 16:32:17 4
This result set can now be grouped by the activity counter. The
min and
max aggregate functions return the activity_begin and
activity_end for each period. Also, using count(*), I see how many
items were sold within a period. Again, I use the previous select statement as a subquery for the following query:
select min (dt) activity_begin, max (dt) activity_end, count( *) sold_items from ( select dt, sum(activity_start) over (order by dt) activity_counter from ( select dt, case when dt - lag(dt, 1) over (order by dt) > 5 * 1/24/60 then 1 else null end activity_start from vending_machine_log ) ) group by activity_counter order by activity_counter nulls first; ACTIVITY_BEGIN ACTIVITY_END SOLD_ITEMS -------------------- -------------------- ---------- nov 27 2005 13:05:27 nov 27 2005 13:10:58 3 nov 27 2005 14:28:05 nov 27 2005 14:36:32 4 nov 27 2005 14:51:18 nov 27 2005 14:55:57 2 nov 27 2005 16:22:39 nov 27 2005 16:22:39 1 nov 27 2005 16:28:06 nov 27 2005 16:32:17 3
It can of course be argued if the second last period really is a period because only one item was sold then.
More on OracleThis is an on Oracle article. The most current articles of this series can be found here.
|