Oracle Data Mining and Analytics: Explaining Predictions with Easy to Read Tree Rules

When scoring decision tree data mining models using SQL PREDICTION operators, one can obtain a node ID of the tree using the PREDICTION_DETAILS operator. This node ID indicates which node in the decision tree was used to predict the outcome for a specific record (row). Sometimes, we may also want to produce a textual rule for the specific node, and provide this rule on a per-row basis when scoring new data, for example:

PREDICTION RULE
---------- ----------------------
YES        RELATIONSHIP isIn ("Husband" "Wife") and
           EDUCATION_NUM greaterThan 12.5 and
           EDUCATION isIn ("Bach." "Masters" "PhD" "Profsc") and
           CAPITAL_GAIN greaterThan 5095.5

Peter Stengard sent me a tip on how to create this type of textual tree rules. The idea is to create a view (or a table) from the XML tree model representation and join it with the PREDICTION operator statement in the query. Before describing how this can be implemented, let's look at an example.

Consider the following data set:

SQL> SELECT empno, job, sal, deptno FROM emp;

     EMPNO JOB            SAL     DEPTNO
---------- --------- ---------- ----------
      7369 CLERK            800         20
      7499 SALESMAN        1600         30
      7521 SALESMAN        1250         30
      7566 MANAGER         2975         20
      7654 SALESMAN        1250         30
      7698 MANAGER         2850         30
      7782 MANAGER         2450         10
      7788 ANALYST         3000         20
      7839 PRESIDENT       5000         10
      7844 SALESMAN        1500         30
      7876 CLERK           1100         20
      7900 CLERK            950         30
      7902 ANALYST         3000         20
      7934 CLERK           1300         10

14 rows selected.

Also consider a decision tree model tree_model for predicting JOB based on SAL and DEPTNO. We can create this model as follows:

CREATE VIEW emp_view AS
SELECT empno, job, sal, deptno FROM emp;

CREATE TABLE my_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(30));

BEGIN
  -- Populate settings table
  INSERT INTO my_settings VALUES
    (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
  INSERT INTO my_settings VALUES
    (dbms_data_mining.tree_term_minrec_split, 2);
  INSERT INTO my_settings VALUES
    (dbms_data_mining.tree_term_minrec_node, 1);
       
  COMMIT;
       
  dbms_data_mining.create_model(
    model_name          => 'TREE_MODEL',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'EMP_VIEW',
    case_id_column_name => 'EMPNO',
    target_column_name  => 'JOB',
    settings_table_name => 'my_settings');
END;

The following query scores the data using the model tree_model and also returns the tree node ID a record was assigned to:

SQL> SELECT PREDICTION(tree_model USING a.*) prediction,
            PREDICTION_DETAILS(tree_model USING a.*) node_id
     FROM emp a
     ORDER BY empno;

PREDICTION           NODE_ID
-------------------- --------------------
CLERK                <Node id="2"/>
CLERK                <Node id="2"/>
CLERK                <Node id="2"/>
MANAGER              <Node id="3"/>
CLERK                <Node id="2"/>
MANAGER              <Node id="3"/>
MANAGER              <Node id="3"/>
ANALYST              <Node id="4"/>
ANALYST              <Node id="4"/>
CLERK                <Node id="2"/>
CLERK                <Node id="2"/>
CLERK                <Node id="2"/>
ANALYST              <Node id="4"/>
CLERK                <Node id="2"/>

14 rows selected.

Using Peter's approach we could write the following query:

SELECT
  PREDICTION(tree_model using a.*) prediction,
  rule_details.description rule
FROM emp a,
     tree_model_rules rule_details
WHERE
  extractvalue(PREDICTION_DETAILS(tree_model USING a.*), 
               'Node/@id') = rule_details.node_id
ORDER BY empno;

PREDICTION RULE
---------- --------------------------------------------------
CLERK      SAL lessOrEqual 2025
CLERK      SAL lessOrEqual 2025
CLERK      SAL lessOrEqual 2025
MANAGER    SAL greaterThan 2025 and SAL lessOrEqual 2987.5
CLERK      SAL lessOrEqual 2025
MANAGER    SAL greaterThan 2025 and SAL lessOrEqual 2987.5
MANAGER    SAL greaterThan 2025 and SAL lessOrEqual 2987.5
ANALYST    SAL greaterThan 2025 and SAL greaterThan 2987.5
ANALYST    SAL greaterThan 2025 and SAL greaterThan 2987.5
CLERK      SAL lessOrEqual 2025
CLERK      SAL lessOrEqual 2025
CLERK      SAL lessOrEqual 2025
ANALYST    SAL greaterThan 2025 and SAL greaterThan 2987.5
CLERK      SAL lessOrEqual 2025

14 rows selected.

tree_model_rules_details is a view created from the XML model representation. The query uses the PREDICTION operator to get the predictions and the join with the tree_model_rules_details to get the textual rule for the node_id.

The following code illustrates how to create the tree_model_rules_details view. This example uses tree_model as the model name. For other models, we can use the same code and replace tree_model with the name of the new model.

CREATE VIEW tree_model_rules AS
SELECT * FROM
XMLTABLE('
declare function local:parent_node($e as element()*) as element()*
{
  for $n in $e
    return
  <rule>
    {
      local:parent_node($n/../../Node[@id=$n/../@id])
    }
    {
      if (count($n/SimplePredicate) > 0) then
        concat(
          string($n/SimplePredicate/@field),string('' ''),
          string($n/SimplePredicate/@operator),string('' ''),
          string($n/SimplePredicate/@value),string('' and '')
        )
      else
        string('''')
    }
    {
      if (count($n/SimpleSetPredicate) > 0) then
        concat(
          string($n/SimpleSetPredicate/@field),string('' ''),
          string($n/SimpleSetPredicate/@booleanOperator),string('' ''),
          string($n/SimpleSetPredicate/Array/text()),string('' and '')
        )
      else
        string('''')
    }
    {
      if ($n/CompoundPredicate/*[1] instance of 
          element(SimplePredicate)) then
        concat(
         string($n/CompoundPredicate/SimplePredicate[1]/@field),
                string('' ''),
         string($n/CompoundPredicate/SimplePredicate[1]/@operator),
                string('' ''),
         string($n/CompoundPredicate/SimplePredicate[1]/@value),
                string('' and '')
         )
      else if ($n/CompoundPredicate/*[1] 
               instance of element(SimpleSetPredicate)) then
        concat(
         string($n/CompoundPredicate/SimpleSetPredicate[1]/@field),
                string('' ''),
         string(
          $n/CompoundPredicate/SimpleSetPredicate[1]/@booleanOperator),
          string('' (''),
         string(
          $n/CompoundPredicate/SimpleSetPredicate[1]/Array/text()),
          string('') and '')
        )
      else
        string('''')
    }
  </rule>
}; for $a in /PMML/TreeModel//Node
   let $p := local:parent_node($a)
   where count($a/../../Node) > 0
   return
  <SimpleRule id="{$a/@id}">
    {normalize-space(substring($p,0,string-length($p)-4))}
  </SimpleRule>
  ' PASSING dbms_data_mining.get_model_details_xml('tree_model')
  COLUMNS
    "NODE_ID" NUMBER PATH '/SimpleRule/@id',
    "DESCRIPTION" VARCHAR2(4000) PATH '/SimpleRule'
);

If you have data mining or analytics tips that you would like to share send me an email at oracledmt@gmail.com.

James,

Good question. I probably made things look more complicated than they really are. The departing point for the post is that someone had built a decision tree to solve a classification problem. At this point he/she the model is already operational. It can be used to score new records using SQL statements, for example. The functionality also returns a node_ID per scored record and an XML rule description can be retrieved for each node_ID. So far the user only needs to use some very simple SQL. In general applications prefer to consume the XML description directly, so no extra work would be necessary. Languages such as Java have great support for XML processing and parsing.

Peter's suggestion was for those among us that like to play with SQL directly and rather see a textual representation of the rule. It does the parsing that an application would do.

Regarding decision trees (DT) versus business rules systems (BRS), these are complementary approaches. A DT infers discriminating rules directly form the data and does not require the input of an expert. A BRS reflects the know-how of a user and allows the enforcement of sensible and ad-hoc strategies familiar to the user.

Hope this helps to clarify that the operationalization of data mining is already at hand as soon as a model has been built. The approach in the post is not a requirement for operationalizing data mining.

Posted by Marcos | 4/12/2006 01:23:00 PM

hello,

I have a question , after creating the xml for rule details how do i write

the following query for k-means ?

SELECT
PREDICTION(tree_model using a.*) prediction,
rule_details.description rule
FROM emp a,
tree_model_rules rule_details
WHERE
extractvalue(PREDICTION_DETAILS(tree_model USING a.*),
'Node/@id') = rule_details.node_id
ORDER BY empno;

Posted by Anonymous | 10/06/2006 07:06:00 AM

Hi,

The above query is to be used with decision tree models. From your question it sounds that you want a way to obtain a textual rule for a K-Means model. Did I get it right? That requires a different approach. The details for KM models are object-relational. In fact decision tree models are the only data mining models that return details in XML. I can send you the details on how to do it for KM if this is what you are looking for.

Posted by Marcos | 10/06/2006 10:46:00 AM

Yes , thats what i am looking for . Thanks a lot for your reply .....please
let me know how to do that

Posted by Anonymous | 10/06/2006 12:36:00 PM

Dear Marcos ,

could you please tell me how to obtain a textual rule for a K-Means model s you did with decision tree ?

Posted by Anonymous | 10/10/2006 04:33:00 AM

The query below will return content of the rules for a k-means model for all clusters. I also placed some extra columns here in case you are interested: record count, parent, tree_level. The model name 'MY_MODEL' should be replaced by your model name.

SELECT a.id, record_count, parent, tree_level,
ru.attribute_name aname, conditional_operator op,
attribute_num_value anval, attribute_str_value asval
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('MY_CL',null,null,0,0)) a,
TABLE(a.rule.antecedent)(+) ru;

If you only want the same results for a single cluster (e.g., 2) use the following:

SELECT a.id, record_count, parent, tree_level,
ru.attribute_name aname, conditional_operator op,
attribute_num_value anval, attribute_str_value asval
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('MY_CL',2,null,0,0)) a,
TABLE(a.rule.antecedent)(+) ru;

If you want the results for leaf clusters only, use this variation:

SELECT a.id, record_count, parent, tree_level,
ru.attribute_name aname, conditional_operator op, attribute_num_value anval,
attribute_str_value asval
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('MY_CL',null,null,0,0)) a,
TABLE(a.rule.antecedent)(+) ru,
TABLE(a.child) ch where ch.id IS NULL

You can find more details on how to extract different pieces of information from a clustering model using the GET_MODEL_DETAILS_KM fuction in Chapter 25 on DBMS_DATA_MINING package in the Oracle® Database PL/SQL Packages and Types Reference 10g Release 2 (10.2) manual (available online at otn.oracle.com).

Posted by Marcos | 10/13/2006 12:57:00 PM

Oracle Data Mining and Analytics

Blog on Data Mining and Analytics, with a special focus on Oracle. Analytics paves the way to transform databases into Knowledgebases.

Tuesday, April 11, 2006

Explaining Predictions with Easy to Read Tree Rules

About me

Disclaimer

Email-Digest

Feeds

Search

Series

Posts

Category Cloud

Links

Reading lists