« Home | Time Series Forecasting 2 - Single-step Forecasting » | The Democratization of Business Intelligence » | Running Oracle Data Miner on the Mac » | Real-Time Scoring & Model Management 1 - Life Cycle » | Real-Time Scoring & Model Management 2 - Implement... » | Real-Time Scoring & Model Management 3 - Performance » | Oracle Life Sciences Meeting » | Improved Stock Research Through Analytics? » | Multi-touch Screens - The Next Input Device? » | Poll - How do you work with analytics? » 

Friday, March 24, 2006 

Automating Pivot Queries

Suppose you want to create a result set where the columns of a table become the rows or the rows become the columns. In other words, you want to pivot your table, transforming columns into rows or rows into columns. How do you do that? It turns out it is quite easy using SQL. In this post I give a PL/SQL procedure that automates the creation of pivot queries to pivot columns into rows.

Tom Kyte has a great post at AskTom describing pivot queries (link). He shows how to pivot columns into rows and vice-versa using SQL queries. He also provides a PL/SQL procedure for rows to columns pivoting that returns a refcursor. Because I need to pivot columns into rows quite often when working with Bioinformatics data, I thought that it would be a good idea to automate the whole approach. The utility presented here (pivotc2r) complements that provided in AskTom. It has the following features:

  • Automatic pivoting of columns into rows instead of rows into columns
  • Automatic determination of number of columns to pivot
  • Transform only columns of a given data type
  • Support, through nested DECODE statements, as many columns as allowed by the RDBMS
  • Generation of a table or view instead of a refcursor for output
Let's say that we have the following data set:
SELECT deptno, job, ename, sal FROM emp;

DEPTNO JOB ENAME SAL
---------- --------- ---------- ----------
20 CLERK SMITH 800
30 SALESMAN ALLEN 1600
30 SALESMAN WARD 1250
20 MANAGER JONES 2975
30 SALESMAN MARTIN 1250
30 MANAGER BLAKE 2850
10 MANAGER CLARK 2450
20 ANALYST SCOTT 3000
10 PRESIDENT KING 5000
30 SALESMAN TURNER 1500
20 CLERK ADAMS 1100
30 CLERK JAMES 950
20 ANALYST FORD 3000
10 CLERK MILLER 1300

14 rows selected.
We would like to transform the data so that we keep ENAME and JOB as is and pivot SAL and DEPTNO into rows. The procedure pivotc2r does that and has the following specification:
CREATE OR REPLACE PACKAGE pivot_pkg AS
TYPE array IS TABLE OF VARCHAR2(30);
TYPE refcursor IS REF CURSOR;

PROCEDURE pivotc2r( p_input_name IN VARCHAR2,
p_output_name IN VARCHAR2,
p_out_type IN VARCHAR2 DEFAULT 'VIEW',
p_data_type VARCHAR2 DEFAULT 'NUMBER',
p_anchor IN array DEFAULT NULL,
p_anchor_label IN array DEFAULT NULL,
p_pivot_label IN VARCHAR2 DEFAULT 'COL_NAME',
p_value_label IN VARCHAR2 DEFAULT 'VALUE');
END;
/
The first two arguments are the name for the input and output tables or views. The argument P_OUT_TYPE indicates if the output should produce a table ('TABLE') or a view ('VIEW'). The next argument, P_DATA_TYPE, states the data type (default NUMBER) of the columns to be pivot. All columns of this type are pivoted with the exception of those in the P_ANCHOR argument. The P_ANCHOR argument is an array of column names. It indicates the columns that should stay CROSS RECORD (down the page). For our example P_ANCHOR = ('ENAME','JOB').

The next three arguments allow assigning new names to the output columns. P_ANCHOR_LABEL is an array with the same number of elements as the P_ANCHOR array. If NULL, the original anchor column names are kept. P_PIVOT_LABEL is the name for the column in the output indicating the name of the original columns. P_VALUE_LABEL is the name of the column holding the values of the original columns.

Before describing the code, let's take a look at an example. First we create a view of the table EMP with the columns we want in the output:
CREATE VIEW emp1 AS 
SELECT ename, job, deptno, sal
FROM emp;
Next we execute the procedure:
DROP VIEW outemp1;
BEGIN
pivot_pkg.pivotc2r
( p_input_name => 'EMP1',
p_output_name => 'OUTEMP1',
p_anchor => pivot_pkg.array('ENAME','JOB'));
END;

/SQL> SELECT * FROM outemp1;

ENAME JOB COL_NA VALUE
---------- --------- ------ ----------
SMITH CLERK DEPTNO 20
ALLEN SALESMAN DEPTNO 30
WARD SALESMAN DEPTNO 30
JONES MANAGER DEPTNO 20
MARTIN SALESMAN DEPTNO 30
BLAKE MANAGER DEPTNO 30
CLARK MANAGER DEPTNO 10
SCOTT ANALYST DEPTNO 20
KING PRESIDENT DEPTNO 10
TURNER SALESMAN DEPTNO 30
ADAMS CLERK DEPTNO 20
JAMES CLERK DEPTNO 30
FORD ANALYST DEPTNO 20
MILLER CLERK DEPTNO 10
SMITH CLERK SAL 800
ALLEN SALESMAN SAL 1600
WARD SALESMAN SAL 1250
JONES MANAGER SAL 2975
MARTIN SALESMAN SAL 1250
BLAKE MANAGER SAL 2850
CLARK MANAGER SAL 2450
SCOTT ANALYST SAL 3000
KING PRESIDENT SAL 5000
TURNER SALESMAN SAL 1500
ADAMS CLERK SAL 1100
JAMES CLERK SAL 950
FORD ANALYST SAL 3000
MILLER CLERK SAL 1300

28 rows selected.
As illustrated in the above results, the procedure automatically pivoted SAL and DEPTNO into rows by combining their values into a single column (VALUE). Another column (COLUMN_NAME) indicates the name of the original column for each value. The implementation of the package body is quite simple:
CREATE OR REPLACE PACKAGE BODY pivot_pkg AS
PROCEDURE pivotc2r( p_input_name IN VARCHAR2,
p_output_name IN VARCHAR2,
p_out_type IN VARCHAR2 DEFAULT 'VIEW',
p_data_type VARCHAR2 DEFAULT 'NUMBER',
p_anchor IN array DEFAULT NULL,
p_anchor_label IN array DEFAULT NULL,
p_pivot_label IN VARCHAR2 DEFAULT 'COL_NAME',
p_value_label IN VARCHAR2 DEFAULT 'VALUE')
AS
v_cursor_col refcursor;
v_cols array;
v_sql long;
v_sql1 long;
v_sql2 long;
v_max_cols NUMBER;
v_parens NUMBER := 1;
v_max_decode NUMBER := 127;
BEGIN
-- Get column name for the input table...
--
v_sql := 'SELECT column_name FROM user_tab_columns ' ||
'WHERE table_name=:1 AND data_type=:2 ';

-- Filter the anchor columns
IF (p_anchor IS NOT NULL) THEN
v_sql := v_sql || ' AND column_name NOT IN(''' ||
p_anchor(1) || '''';
FOR i IN 2..p_anchor.count LOOP
v_sql := v_sql || ',''' || p_anchor(i)||'''';
END LOOP;
v_sql := v_sql || ')';
END IF;
EXECUTE IMMEDIATE v_sql BULK COLLECT INTO v_cols
USING p_input_name, p_data_type;
v_max_cols := v_cols.count;

-- Now, construct the query to pivot the columns into
-- rows.
v_sql :='CREATE ' || p_out_type || ' ' || p_output_name ||
' AS SELECT ';

-- Add anchor columns and get maximum number of columns
IF (p_anchor IS NOT NULL) THEN
FOR i IN 1..p_anchor.count LOOP
v_sql := v_sql || p_anchor(i);
IF (p_anchor_label IS NOT NULL) THEN
v_sql := v_sql || ' ' || p_anchor_label(i) ||',';
ELSE
v_sql := v_sql || ',';
END IF;
END LOOP;
v_sql1 := rtrim(v_sql1,',');
END IF;

-- Pivot the columns
v_sql1 := ' DECODE(a.r, ';
v_sql2 := ' DECODE(a.r, ';

FOR i IN 1..v_cols.count LOOP
IF (MOD(i,v_max_decode) = 0) THEN
v_sql1 := v_sql1 ||' DECODE(a.r, ';
v_sql2 := v_sql2 ||' DECODE(a.r, ';
v_parens := v_parens + 1;
END IF;
v_sql1 := v_sql1 || i ||',''' || v_cols(i) || ''',';
v_sql2 := v_sql2 || i ||',' || v_cols(i) || ',';
END LOOP;
v_sql1 := rtrim(v_sql1,',');
v_sql1 := rpad(v_sql1,v_parens+length(v_sql1),')') ||' '
||p_pivot_label;
v_sql2 := rtrim(v_sql2,',');
v_sql2 := rpad(v_sql2,v_parens+length(v_sql2),')') ||' '
||p_value_label;
v_sql := v_sql || v_sql1 || ',' || v_sql2;

-- Add the original data
v_sql := v_sql ||
' FROM (SELECT * FROM ' || p_input_name;

-- Add the counter to drive the decode statements
v_sql := v_sql || '),' ||
'(SELECT rownum r FROM all_objects ' ||
' WHERE rownum <= ' || v_max_cols || ') a';
EXECUTE IMMEDIATE v_sql;
END pivotc2r;
END;
/
SHOW ERRORS;
The procedure does mainly string manipulation for creating the dynamic SQL for the transformation. First, it obtains the names of the columns in the input that match the specified data type. Anchor columns are filtered out. The remaining columns are the ones that are pivoted. The rest of the code implements the pivot approach described in AskTom. If the number of columns exceeds 127, the code creates a new DECODE statement nested within the previous one. This is done to avoid exceeding the maximum (255) number of arguments allowed in a single DECODE statement. Each column contributes with two arguments to the DECODE statement

The above procedure can be easily modified to take a refcursor or a query string as an input. In this case, we can use the DBMS_SQL package to handle the unknown number of columns in the input data. It can also be made into a pipelined table function.

EDIT: The maximum number of arguments allowed in a single decode, counting all arguments individually, is 255. I have changed the text to reflect this and the code as well. The only code change required was to initialize the v_max_decode variable to 127.

Hi on db 10.2 the package/procedure pivotc2r is compiled without errors.
When using the procedure to create outemp1 out of the emp1 view, we got :
BEGIN
*
ERROR at line 1:
ORA-00905: missing keyword
ORA-06512: at "SYSTEM.PIVOT_PKG", line 78
ORA-06512: at line 2.
Any Idea?

I have just tried it out on a newly installed 10.2.0.1 database and it worked okay.

Make sure that you are able to create the header, the body, and the input view without a problem. Then run the PL/SQL snippet in the blog that invokes the procedure.

Post a Comment

About me

  • Marcos M. Campos: Development Manager for Oracle Data Mining Technologies. Previously Senior Scientist with Thinking Machines. Over the years I have been working on transforming databases into easy to use analytical servers.
  • My profile

Disclaimer

  • Opinions expressed are entirely my own and do not reflect the position of Oracle or any other corporation. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.
  • This work is licensed under a Creative Commons license.
  • Creative Commons License

Email-Digest



Feeds

Search


Posts

All Posts

Category Cloud

Links

Locations of visitors to this page
Powered by Blogger
Get Firefox