Oracle Data Mining and Analytics: Automating Pivot Queries

Suppose you want to create a result set where the columns of a table become the rows or the rows become the columns. In other words, you want to pivot your table, transforming columns into rows or rows into columns. How do you do that? It turns out it is quite easy using SQL. In this post I give a PL/SQL procedure that automates the creation of pivot queries to pivot columns into rows.

Tom Kyte has a great post at AskTom describing pivot queries (link). He shows how to pivot columns into rows and vice-versa using SQL queries. He also provides a PL/SQL procedure for rows to columns pivoting that returns a refcursor. Because I need to pivot columns into rows quite often when working with Bioinformatics data, I thought that it would be a good idea to automate the whole approach. The utility presented here (pivotc2r) complements that provided in AskTom. It has the following features:

Automatic pivoting of columns into rows instead of rows into columns
Automatic determination of number of columns to pivot
Transform only columns of a given data type
Support, through nested DECODE statements, as many columns as allowed by the RDBMS
Generation of a table or view instead of a refcursor for output

Let's say that we have the following data set:

SELECT deptno, job, ename, sal FROM emp;

    DEPTNO JOB       ENAME             SAL
---------- --------- ---------- ----------
        20 CLERK     SMITH             800
        30 SALESMAN  ALLEN            1600
        30 SALESMAN  WARD             1250
        20 MANAGER   JONES            2975
        30 SALESMAN  MARTIN           1250
        30 MANAGER   BLAKE            2850
        10 MANAGER   CLARK            2450
        20 ANALYST   SCOTT            3000
        10 PRESIDENT KING             5000
        30 SALESMAN  TURNER           1500
        20 CLERK     ADAMS            1100
        30 CLERK     JAMES             950
        20 ANALYST   FORD             3000
        10 CLERK     MILLER           1300

14 rows selected.

We would like to transform the data so that we keep ENAME and JOB as is and pivot SAL and DEPTNO into rows. The procedure pivotc2r does that and has the following specification:

CREATE OR REPLACE PACKAGE pivot_pkg AS
  TYPE array IS TABLE OF VARCHAR2(30);
  TYPE refcursor IS REF CURSOR;
  
  PROCEDURE pivotc2r( p_input_name     IN VARCHAR2, 
                      p_output_name    IN VARCHAR2,
                      p_out_type       IN VARCHAR2 DEFAULT 'VIEW',
                      p_data_type      VARCHAR2 DEFAULT 'NUMBER',  
                      p_anchor         IN array DEFAULT NULL,
                      p_anchor_label   IN array DEFAULT NULL,
                      p_pivot_label    IN VARCHAR2 DEFAULT 'COL_NAME',
                      p_value_label    IN VARCHAR2 DEFAULT 'VALUE');
END;
/

The first two arguments are the name for the input and output tables or views. The argument P_OUT_TYPE indicates if the output should produce a table ('TABLE') or a view ('VIEW'). The next argument, P_DATA_TYPE, states the data type (default NUMBER) of the columns to be pivot. All columns of this type are pivoted with the exception of those in the P_ANCHOR argument. The P_ANCHOR argument is an array of column names. It indicates the columns that should stay CROSS RECORD (down the page). For our example P_ANCHOR = ('ENAME','JOB').

The next three arguments allow assigning new names to the output columns. P_ANCHOR_LABEL is an array with the same number of elements as the P_ANCHOR array. If NULL, the original anchor column names are kept. P_PIVOT_LABEL is the name for the column in the output indicating the name of the original columns. P_VALUE_LABEL is the name of the column holding the values of the original columns.

Before describing the code, let's take a look at an example. First we create a view of the table EMP with the columns we want in the output:

CREATE VIEW emp1 AS 
SELECT ename, job, deptno, sal 
FROM emp;

Next we execute the procedure:

DROP VIEW outemp1;
BEGIN
   pivot_pkg.pivotc2r
        ( p_input_name => 'EMP1',
          p_output_name => 'OUTEMP1',
          p_anchor => pivot_pkg.array('ENAME','JOB'));
END;

/SQL> SELECT * FROM outemp1;

ENAME      JOB       COL_NA      VALUE
---------- --------- ------ ----------
SMITH      CLERK     DEPTNO         20
ALLEN      SALESMAN  DEPTNO         30
WARD       SALESMAN  DEPTNO         30
JONES      MANAGER   DEPTNO         20
MARTIN     SALESMAN  DEPTNO         30
BLAKE      MANAGER   DEPTNO         30
CLARK      MANAGER   DEPTNO         10
SCOTT      ANALYST   DEPTNO         20
KING       PRESIDENT DEPTNO         10
TURNER     SALESMAN  DEPTNO         30
ADAMS      CLERK     DEPTNO         20
JAMES      CLERK     DEPTNO         30
FORD       ANALYST   DEPTNO         20
MILLER     CLERK     DEPTNO         10
SMITH      CLERK     SAL           800
ALLEN      SALESMAN  SAL          1600
WARD       SALESMAN  SAL          1250
JONES      MANAGER   SAL          2975
MARTIN     SALESMAN  SAL          1250
BLAKE      MANAGER   SAL          2850
CLARK      MANAGER   SAL          2450
SCOTT      ANALYST   SAL          3000
KING       PRESIDENT SAL          5000
TURNER     SALESMAN  SAL          1500
ADAMS      CLERK     SAL          1100
JAMES      CLERK     SAL           950
FORD       ANALYST   SAL          3000
MILLER     CLERK     SAL          1300

28 rows selected.

As illustrated in the above results, the procedure automatically pivoted SAL and DEPTNO into rows by combining their values into a single column (VALUE). Another column (COLUMN_NAME) indicates the name of the original column for each value. The implementation of the package body is quite simple:

CREATE OR REPLACE PACKAGE BODY pivot_pkg AS
  PROCEDURE pivotc2r( p_input_name     IN VARCHAR2, 
                      p_output_name    IN VARCHAR2,
                      p_out_type       IN VARCHAR2 DEFAULT 'VIEW',
                      p_data_type      VARCHAR2 DEFAULT 'NUMBER',  
                      p_anchor         IN array DEFAULT NULL,
                      p_anchor_label   IN array DEFAULT NULL,
                      p_pivot_label    IN VARCHAR2 DEFAULT 'COL_NAME',
                      p_value_label    IN VARCHAR2 DEFAULT 'VALUE')
  AS  
    v_cursor_col        refcursor;
    v_cols              array;
    v_sql               long;
    v_sql1              long;
    v_sql2              long;
    v_max_cols          NUMBER;
    v_parens            NUMBER := 1;
    v_max_decode        NUMBER := 127;
  BEGIN                 
    -- Get column name for the input table...
    --  
    v_sql := 'SELECT column_name FROM user_tab_columns ' ||
             'WHERE table_name=:1  AND data_type=:2 ';
  
    -- Filter the anchor columns
    IF (p_anchor IS NOT NULL) THEN
      v_sql := v_sql || ' AND column_name NOT IN(''' ||
               p_anchor(1) || '''';
      FOR i IN 2..p_anchor.count LOOP
        v_sql :=  v_sql || ',''' || p_anchor(i)||'''';
      END LOOP;   
      v_sql := v_sql || ')';
    END IF;  
    EXECUTE IMMEDIATE v_sql BULK COLLECT INTO v_cols
            USING  p_input_name, p_data_type; 
    v_max_cols := v_cols.count;
  
    -- Now, construct the query to pivot the columns into
    -- rows. 
    v_sql :='CREATE ' || p_out_type || ' ' || p_output_name ||
            ' AS SELECT ';
  
    -- Add anchor columns and get maximum number of columns
    IF (p_anchor IS NOT NULL) THEN
      FOR i IN 1..p_anchor.count LOOP
        v_sql  :=  v_sql || p_anchor(i);
        IF (p_anchor_label IS NOT NULL) THEN
          v_sql := v_sql || ' ' || p_anchor_label(i) ||',';
        ELSE   
          v_sql := v_sql || ',';
        END IF;
      END LOOP;
      v_sql1 := rtrim(v_sql1,',');
    END IF;

    -- Pivot the columns
    v_sql1 := ' DECODE(a.r, ';
    v_sql2 := ' DECODE(a.r, ';
  
    FOR i IN 1..v_cols.count LOOP
      IF (MOD(i,v_max_decode) = 0) THEN
        v_sql1 := v_sql1 ||' DECODE(a.r, ';
        v_sql2 := v_sql2 ||' DECODE(a.r, ';
        v_parens := v_parens + 1;
      END IF;
      v_sql1 := v_sql1 || i ||',''' || v_cols(i) || ''',';
      v_sql2 := v_sql2 || i ||',' || v_cols(i) || ',';
    END LOOP;
    v_sql1 := rtrim(v_sql1,',');
    v_sql1 := rpad(v_sql1,v_parens+length(v_sql1),')') ||' '
              ||p_pivot_label;
    v_sql2 := rtrim(v_sql2,',');
    v_sql2 := rpad(v_sql2,v_parens+length(v_sql2),')') ||' '
              ||p_value_label;
    v_sql  := v_sql || v_sql1 || ',' || v_sql2;
    
    -- Add the original data
    v_sql := v_sql || 
             ' FROM (SELECT * FROM ' || p_input_name;
  
    -- Add the counter to drive the decode statements
    v_sql := v_sql || '),' ||
             '(SELECT rownum r FROM all_objects ' ||
             ' WHERE rownum <= ' || v_max_cols || ') a';
    EXECUTE IMMEDIATE v_sql;
  END pivotc2r;
END;
/
SHOW ERRORS;

The procedure does mainly string manipulation for creating the dynamic SQL for the transformation. First, it obtains the names of the columns in the input that match the specified data type. Anchor columns are filtered out. The remaining columns are the ones that are pivoted. The rest of the code implements the pivot approach described in AskTom. If the number of columns exceeds 127, the code creates a new DECODE statement nested within the previous one. This is done to avoid exceeding the maximum (255) number of arguments allowed in a single DECODE statement. Each column contributes with two arguments to the DECODE statement

The above procedure can be easily modified to take a refcursor or a query string as an input. In this case, we can use the DBMS_SQL package to handle the unknown number of columns in the input data. It can also be made into a pipelined table function.

EDIT: The maximum number of arguments allowed in a single decode, counting all arguments individually, is 255. I have changed the text to reflect this and the code as well. The only code change required was to initialize the v_max_decode variable to 127.

Hi on db 10.2 the package/procedure pivotc2r is compiled without errors.
When using the procedure to create outemp1 out of the emp1 view, we got :
BEGIN
*
ERROR at line 1:
ORA-00905: missing keyword
ORA-06512: at "SYSTEM.PIVOT_PKG", line 78
ORA-06512: at line 2.
Any Idea?

Posted by Anonymous | 11/09/2006 05:46:00 AM

I have just tried it out on a newly installed 10.2.0.1 database and it worked okay.

Make sure that you are able to create the header, the body, and the input view without a problem. Then run the PL/SQL snippet in the blog that invokes the procedure.

Posted by Marcos | 11/10/2006 06:28:00 PM

Oracle Data Mining and Analytics

Blog on Data Mining and Analytics, with a special focus on Oracle. Analytics paves the way to transform databases into Knowledgebases.

Friday, March 24, 2006

Automating Pivot Queries

About me

Disclaimer

Email-Digest

Feeds

Search

Series

Posts

Category Cloud

Links

Reading lists