21 dicembre 2008

Analytics - n.4: ranking

Stavolta ci occupiamo delle funzioni RANK e DENSE_RANK.

Consideriamo ancora l'esempio dei maratoneti.
All'arrivo possiamo fare la classifica generale, ordinando per tempo di arrivo e assegnando a ogni corridore un numero progressivo che parte da 1, ad esempio con ROWNUM. Prendo un sottoinsieme della mia tabella di esempio per semplicità:
SQL> select ri, at, rownum rn from
2 (select runner_id ri, arrival_time at
3 from marathon
4 where arrival_time between to_date('2008-06-19 13:01:00', 'YYYY-MM-DD HH24:MI:SS')
5 and to_date('2008-06-19 13:03:00', 'YYYY-MM-DD HH24:MI:SS') order by arrival_time)
6 ;

RI AT RN
---------- ------------------- ----------
2015 2008-06-19 13:01:04 1
2293 2008-06-19 13:01:05 2
3895 2008-06-19 13:01:06 3
4931 2008-06-19 13:01:07 4
4810 2008-06-19 13:01:10 5
2747 2008-06-19 13:01:17 6
125 2008-06-19 13:01:19 7
1664 2008-06-19 13:01:20 8
632 2008-06-19 13:01:21 9
638 2008-06-19 13:01:23 10
2207 2008-06-19 13:01:31 11
4816 2008-06-19 13:01:32 12
3550 2008-06-19 13:01:33 13
3632 2008-06-19 13:01:33 14
309 2008-06-19 13:01:34 15
2254 2008-06-19 13:01:36 16
1671 2008-06-19 13:01:43 17
2068 2008-06-19 13:01:46 18
3167 2008-06-19 13:02:02 19
4269 2008-06-19 13:02:02 20
4962 2008-06-19 13:02:02 21
3921 2008-06-19 13:02:02 22
2298 2008-06-19 13:02:29 23
4933 2008-06-19 13:02:30 24
2093 2008-06-19 13:02:34 25
1528 2008-06-19 13:02:36 26
1779 2008-06-19 13:02:40 27
4313 2008-06-19 13:02:40 28
2547 2008-06-19 13:02:40 29
4482 2008-06-19 13:02:43 30
3667 2008-06-19 13:02:44 31
1117 2008-06-19 13:02:51 32
4393 2008-06-19 13:02:53 33
591 2008-06-19 13:02:55 34

34 rows selected.

Ma che cosa succede se due corridori hanno lo stesso tempo di arrivo (es. 13:02:02)? Bisogna assegnargli la posizione a parimerito. La funzione DENSE_RANK ci viene in aiuto (colonna DR):
SQL> select ri, at, rownum rn, dense_rank() over (order by at) dr from
2 (select runner_id ri, arrival_time at
3 from marathon
4 where arrival_time between to_date('2008-06-19 13:01:00', 'YYYY-MM-DD HH24:MI:SS')
5 and to_date('2008-06-19 13:03:00', 'YYYY-MM-DD HH24:MI:SS') order by arrival_time)
6 ;

RI AT RN DR
---------- ------------------- ---------- ----------
2015 2008-06-19 13:01:04 1 1
2293 2008-06-19 13:01:05 2 2
3895 2008-06-19 13:01:06 3 3
4931 2008-06-19 13:01:07 4 4
4810 2008-06-19 13:01:10 5 5
2747 2008-06-19 13:01:17 6 6
125 2008-06-19 13:01:19 7 7
1664 2008-06-19 13:01:20 8 8
632 2008-06-19 13:01:21 9 9
638 2008-06-19 13:01:23 10 10
2207 2008-06-19 13:01:31 11 11
4816 2008-06-19 13:01:32 12 12
3550 2008-06-19 13:01:33 13 13
3632 2008-06-19 13:01:33 14 13
309 2008-06-19 13:01:34 15 14
2254 2008-06-19 13:01:36 16 15
1671 2008-06-19 13:01:43 17 16
2068 2008-06-19 13:01:46 18 17
3167 2008-06-19 13:02:02 19 18
4269 2008-06-19 13:02:02 20 18
4962 2008-06-19 13:02:02 21 18
3921 2008-06-19 13:02:02 22 18
2298 2008-06-19 13:02:29 23 19
4933 2008-06-19 13:02:30 24 20
2093 2008-06-19 13:02:34 25 21
1528 2008-06-19 13:02:36 26 22
1779 2008-06-19 13:02:40 27 23
4313 2008-06-19 13:02:40 28 23
2547 2008-06-19 13:02:40 29 23
4482 2008-06-19 13:02:43 30 24
3667 2008-06-19 13:02:44 31 25
1117 2008-06-19 13:02:51 32 26
4393 2008-06-19 13:02:53 33 27
591 2008-06-19 13:02:55 34 28

34 rows selected.

Diverso sarebbe il caso in cui volessimo considerare l'ordine di arrivo, fatti salvi i corridori a parimerito: infatti quelli a parimerito sarebbero tutti nella stessa posizione, mentre il successivo avrebbe comunque la posizione assoluta (tipo ROWNUM) conservata (colonna R):
SQL> select ri, at, rownum rn, dense_rank() over (order by at) dr,
2 rank() over (order by at) r from
3 (select runner_id ri, arrival_time at
4 from marathon
5 where arrival_time between to_date('2008-06-19 13:01:00', 'YYYY-MM-DD HH24:MI:SS')
6 and to_date('2008-06-19 13:03:00', 'YYYY-MM-DD HH24:MI:SS') order by arrival_time)
7 ;

RI AT RN DR R
---------- ------------------- ---------- ---------- ----------
2015 2008-06-19 13:01:04 1 1 1
2293 2008-06-19 13:01:05 2 2 2
3895 2008-06-19 13:01:06 3 3 3
4931 2008-06-19 13:01:07 4 4 4
4810 2008-06-19 13:01:10 5 5 5
2747 2008-06-19 13:01:17 6 6 6
125 2008-06-19 13:01:19 7 7 7
1664 2008-06-19 13:01:20 8 8 8
632 2008-06-19 13:01:21 9 9 9
638 2008-06-19 13:01:23 10 10 10
2207 2008-06-19 13:01:31 11 11 11
4816 2008-06-19 13:01:32 12 12 12
3550 2008-06-19 13:01:33 13 13 13
3632 2008-06-19 13:01:33 14 13 13
309 2008-06-19 13:01:34 15 14 15
2254 2008-06-19 13:01:36 16 15 16
1671 2008-06-19 13:01:43 17 16 17
2068 2008-06-19 13:01:46 18 17 18
3167 2008-06-19 13:02:02 19 18 19
4269 2008-06-19 13:02:02 20 18 19
4962 2008-06-19 13:02:02 21 18 19
3921 2008-06-19 13:02:02 22 18 19
2298 2008-06-19 13:02:29 23 19 23
4933 2008-06-19 13:02:30 24 20 24
2093 2008-06-19 13:02:34 25 21 25
1528 2008-06-19 13:02:36 26 22 26
1779 2008-06-19 13:02:40 27 23 27
4313 2008-06-19 13:02:40 28 23 27
2547 2008-06-19 13:02:40 29 23 27
4482 2008-06-19 13:02:43 30 24 30
3667 2008-06-19 13:02:44 31 25 31
1117 2008-06-19 13:02:51 32 26 32
4393 2008-06-19 13:02:53 33 27 33
591 2008-06-19 13:02:55 34 28 34

34 rows selected.

Un'altra funzione è ROW_NUMBER, che semplicemente assegna un valore crescente a partire da 1 alle righe della partizione, nulla più.
In pratica potremmo sostituire la subquery più interna che fa l'ordinamento direttamente con ROW_NUMBER, senza utilizzare ROWNUM:
SQL> select runner_id ri, arrival_time at, row_number() over (order by arrival_time) rn
2 from marathon
3 where arrival_time between to_date('2008-06-19 13:01:00', 'YYYY-MM-DD HH24:MI:SS')
4 and to_date('2008-06-19 13:03:00', 'YYYY-MM-DD HH24:MI:SS') order by arrival_time;

RI AT RN
---------- ------------------- ----------
2015 2008-06-19 13:01:04 1
2293 2008-06-19 13:01:05 2
3895 2008-06-19 13:01:06 3
4931 2008-06-19 13:01:07 4
4810 2008-06-19 13:01:10 5
2747 2008-06-19 13:01:17 6
125 2008-06-19 13:01:19 7
1664 2008-06-19 13:01:20 8
632 2008-06-19 13:01:21 9
638 2008-06-19 13:01:23 10
2207 2008-06-19 13:01:31 11
4816 2008-06-19 13:01:32 12
3550 2008-06-19 13:01:33 13
3632 2008-06-19 13:01:33 14
309 2008-06-19 13:01:34 15
2254 2008-06-19 13:01:36 16
1671 2008-06-19 13:01:43 17
2068 2008-06-19 13:01:46 18
3167 2008-06-19 13:02:02 19
4269 2008-06-19 13:02:02 20
4962 2008-06-19 13:02:02 21
3921 2008-06-19 13:02:02 22
2298 2008-06-19 13:02:29 23
4933 2008-06-19 13:02:30 24
2093 2008-06-19 13:02:34 25
1528 2008-06-19 13:02:36 26
1779 2008-06-19 13:02:40 27
4313 2008-06-19 13:02:40 28
2547 2008-06-19 13:02:40 29
4482 2008-06-19 13:02:43 30
3667 2008-06-19 13:02:44 31
1117 2008-06-19 13:02:51 32
4393 2008-06-19 13:02:53 33
591 2008-06-19 13:02:55 34

34 rows selected.

Nessun commento: