« Home | Bioinformatics the SQL Way » | Now Available! - Oracle Data Miner Release 10g Rel... » | Automating Pivot Queries » | Time Series Forecasting 2 - Single-step Forecasting » | The Democratization of Business Intelligence » | Running Oracle Data Miner on the Mac » | Real-Time Scoring & Model Management 1 - Life Cycle » | Real-Time Scoring & Model Management 2 - Implement... » | Real-Time Scoring & Model Management 3 - Performance » | Oracle Life Sciences Meeting » 

Tuesday, April 11, 2006 

Explaining Predictions with Easy to Read Tree Rules

When scoring decision tree data mining models using SQL PREDICTION operators, one can obtain a node ID of the tree using the PREDICTION_DETAILS operator. This node ID indicates which node in the decision tree was used to predict the outcome for a specific record (row). Sometimes, we may also want to produce a textual rule for the specific node, and provide this rule on a per-row basis when scoring new data, for example:

PREDICTION RULE
---------- ----------------------
YES RELATIONSHIP isIn ("Husband" "Wife") and
EDUCATION_NUM greaterThan 12.5 and
EDUCATION isIn ("Bach." "Masters" "PhD" "Profsc") and
CAPITAL_GAIN greaterThan 5095.5
Peter Stengard sent me a tip on how to create this type of textual tree rules. The idea is to create a view (or a table) from the XML tree model representation and join it with the PREDICTION operator statement in the query. Before describing how this can be implemented, let's look at an example.

Consider the following data set:
SQL> SELECT empno, job, sal, deptno FROM emp;

EMPNO JOB SAL DEPTNO
---------- --------- ---------- ----------
7369 CLERK 800 20
7499 SALESMAN 1600 30
7521 SALESMAN 1250 30
7566 MANAGER 2975 20
7654 SALESMAN 1250 30
7698 MANAGER 2850 30
7782 MANAGER 2450 10
7788 ANALYST 3000 20
7839 PRESIDENT 5000 10
7844 SALESMAN 1500 30
7876 CLERK 1100 20
7900 CLERK 950 30
7902 ANALYST 3000 20
7934 CLERK 1300 10

14 rows selected.
Also consider a decision tree model tree_model for predicting JOB based on SAL and DEPTNO. We can create this model as follows:
CREATE VIEW emp_view AS
SELECT empno, job, sal, deptno FROM emp;

CREATE TABLE my_settings (
setting_name VARCHAR2(30),
setting_value VARCHAR2(30));

BEGIN
-- Populate settings table
INSERT INTO my_settings VALUES
(dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
INSERT INTO my_settings VALUES
(dbms_data_mining.tree_term_minrec_split, 2);
INSERT INTO my_settings VALUES
(dbms_data_mining.tree_term_minrec_node, 1);

COMMIT;

dbms_data_mining.create_model(
model_name => 'TREE_MODEL',
mining_function => dbms_data_mining.classification,
data_table_name => 'EMP_VIEW',
case_id_column_name => 'EMPNO',
target_column_name => 'JOB',
settings_table_name => 'my_settings');
END;
The following query scores the data using the model tree_model and also returns the tree node ID a record was assigned to:
SQL> SELECT PREDICTION(tree_model USING a.*) prediction,
PREDICTION_DETAILS(tree_model USING a.*) node_id
FROM emp a
ORDER BY empno;

PREDICTION NODE_ID
-------------------- --------------------
CLERK <Node id="2"/>
CLERK <Node id="2"/>
CLERK <Node id="2"/>
MANAGER <Node id="3"/>
CLERK <Node id="2"/>
MANAGER <Node id="3"/>
MANAGER <Node id="3"/>
ANALYST <Node id="4"/>
ANALYST <Node id="4"/>
CLERK <Node id="2"/>
CLERK <Node id="2"/>
CLERK <Node id="2"/>
ANALYST <Node id="4"/>
CLERK <Node id="2"/>

14 rows selected.
Using Peter's approach we could write the following query:
SELECT
PREDICTION(tree_model using a.*) prediction,
rule_details.description rule
FROM emp a,
tree_model_rules rule_details
WHERE
extractvalue(PREDICTION_DETAILS(tree_model USING a.*),
'Node/@id') = rule_details.node_id
ORDER BY empno;

PREDICTION RULE
---------- --------------------------------------------------
CLERK SAL lessOrEqual 2025
CLERK SAL lessOrEqual 2025
CLERK SAL lessOrEqual 2025
MANAGER SAL greaterThan 2025 and SAL lessOrEqual 2987.5
CLERK SAL lessOrEqual 2025
MANAGER SAL greaterThan 2025 and SAL lessOrEqual 2987.5
MANAGER SAL greaterThan 2025 and SAL lessOrEqual 2987.5
ANALYST SAL greaterThan 2025 and SAL greaterThan 2987.5
ANALYST SAL greaterThan 2025 and SAL greaterThan 2987.5
CLERK SAL lessOrEqual 2025
CLERK SAL lessOrEqual 2025
CLERK SAL lessOrEqual 2025
ANALYST SAL greaterThan 2025 and SAL greaterThan 2987.5
CLERK SAL lessOrEqual 2025

14 rows selected.
tree_model_rules_details is a view created from the XML model representation. The query uses the PREDICTION operator to get the predictions and the join with the tree_model_rules_details to get the textual rule for the node_id.

The following code illustrates how to create the tree_model_rules_details view. This example uses tree_model as the model name. For other models, we can use the same code and replace tree_model with the name of the new model.
CREATE VIEW tree_model_rules AS
SELECT * FROM
XMLTABLE('
declare function local:parent_node($e as element()*) as element()*
{
for $n in $e
return
<rule>
{
local:parent_node($n/../../Node[@id=$n/../@id])
}
{
if (count($n/SimplePredicate) > 0) then
concat(
string($n/SimplePredicate/@field),string('' ''),
string($n/SimplePredicate/@operator),string('' ''),
string($n/SimplePredicate/@value),string('' and '')
)
else
string('''')
}
{
if (count($n/SimpleSetPredicate) > 0) then
concat(
string($n/SimpleSetPredicate/@field),string('' ''),
string($n/SimpleSetPredicate/@booleanOperator),string('' ''),
string($n/SimpleSetPredicate/Array/text()),string('' and '')
)
else
string('''')
}
{
if ($n/CompoundPredicate/*[1] instance of
element(SimplePredicate)) then
concat(
string($n/CompoundPredicate/SimplePredicate[1]/@field),
string('' ''),
string($n/CompoundPredicate/SimplePredicate[1]/@operator),
string('' ''),
string($n/CompoundPredicate/SimplePredicate[1]/@value),
string('' and '')
)
else if ($n/CompoundPredicate/*[1]
instance of element(SimpleSetPredicate)) then
concat(
string($n/CompoundPredicate/SimpleSetPredicate[1]/@field),
string('' ''),
string(
$n/CompoundPredicate/SimpleSetPredicate[1]/@booleanOperator),
string('' (''),
string(
$n/CompoundPredicate/SimpleSetPredicate[1]/Array/text()),
string('') and '')
)
else
string('''')
}
</rule>
}; for $a in /PMML/TreeModel//Node
let $p := local:parent_node($a)
where count($a/../../Node) > 0
return
<SimpleRule id="{$a/@id}">
{normalize-space(substring($p,0,string-length($p)-4))}
</SimpleRule>
' PASSING dbms_data_mining.get_model_details_xml('tree_model')
COLUMNS
"NODE_ID" NUMBER PATH '/SimpleRule/@id',
"DESCRIPTION" VARCHAR2(4000) PATH '/SimpleRule'
);
If you have data mining or analytics tips that you would like to share send me an email at oracledmt@gmail.com.

James,

Good question. I probably made things look more complicated than they really are. The departing point for the post is that someone had built a decision tree to solve a classification problem. At this point he/she the model is already operational. It can be used to score new records using SQL statements, for example. The functionality also returns a node_ID per scored record and an XML rule description can be retrieved for each node_ID. So far the user only needs to use some very simple SQL. In general applications prefer to consume the XML description directly, so no extra work would be necessary. Languages such as Java have great support for XML processing and parsing.

Peter's suggestion was for those among us that like to play with SQL directly and rather see a textual representation of the rule. It does the parsing that an application would do.

Regarding decision trees (DT) versus business rules systems (BRS), these are complementary approaches. A DT infers discriminating rules directly form the data and does not require the input of an expert. A BRS reflects the know-how of a user and allows the enforcement of sensible and ad-hoc strategies familiar to the user.

Hope this helps to clarify that the operationalization of data mining is already at hand as soon as a model has been built. The approach in the post is not a requirement for operationalizing data mining.

hello,

I have a question , after creating the xml for rule details how do i write

the following query for k-means ?

SELECT
PREDICTION(tree_model using a.*) prediction,
rule_details.description rule
FROM emp a,
tree_model_rules rule_details
WHERE
extractvalue(PREDICTION_DETAILS(tree_model USING a.*),
'Node/@id') = rule_details.node_id
ORDER BY empno;

Hi,

The above query is to be used with decision tree models. From your question it sounds that you want a way to obtain a textual rule for a K-Means model. Did I get it right? That requires a different approach. The details for KM models are object-relational. In fact decision tree models are the only data mining models that return details in XML. I can send you the details on how to do it for KM if this is what you are looking for.

Yes , thats what i am looking for . Thanks a lot for your reply .....please
let me know how to do that

Dear Marcos ,

could you please tell me how to obtain a textual rule for a K-Means model s you did with decision tree ?

The query below will return content of the rules for a k-means model for all clusters. I also placed some extra columns here in case you are interested: record count, parent, tree_level. The model name 'MY_MODEL' should be replaced by your model name.

SELECT a.id, record_count, parent, tree_level,
ru.attribute_name aname, conditional_operator op,
attribute_num_value anval, attribute_str_value asval
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('MY_CL',null,null,0,0)) a,
TABLE(a.rule.antecedent)(+) ru;

If you only want the same results for a single cluster (e.g., 2) use the following:

SELECT a.id, record_count, parent, tree_level,
ru.attribute_name aname, conditional_operator op,
attribute_num_value anval, attribute_str_value asval
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('MY_CL',2,null,0,0)) a,
TABLE(a.rule.antecedent)(+) ru;

If you want the results for leaf clusters only, use this variation:

SELECT a.id, record_count, parent, tree_level,
ru.attribute_name aname, conditional_operator op, attribute_num_value anval,
attribute_str_value asval
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('MY_CL',null,null,0,0)) a,
TABLE(a.rule.antecedent)(+) ru,
TABLE(a.child) ch where ch.id IS NULL

You can find more details on how to extract different pieces of information from a clustering model using the GET_MODEL_DETAILS_KM fuction in Chapter 25 on DBMS_DATA_MINING package in the Oracle® Database PL/SQL Packages and Types Reference 10g Release 2 (10.2) manual (available online at otn.oracle.com).

Post a Comment

About me

  • Marcos M. Campos: Development Manager for Oracle Data Mining Technologies. Previously Senior Scientist with Thinking Machines. Over the years I have been working on transforming databases into easy to use analytical servers.
  • My profile

Disclaimer

  • Opinions expressed are entirely my own and do not reflect the position of Oracle or any other corporation. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.
  • This work is licensed under a Creative Commons license.
  • Creative Commons License

Email-Digest



Feeds

Search


Posts

All Posts

Category Cloud

Links

Locations of visitors to this page
Powered by Blogger
Get Firefox