Thursday, February 23, 2006 

Running Oracle Data Miner on the Mac

I have a Mac, and, like all Mac users, I want to do everything from my Mac (it does not hurt that it has a great-looking and easy-to-use OS). When I found out that I could run Oracle Data Miner from my Mac, I had to try it. One small caveat: there are no instructions for installing Oracle Data Miner on the Mac. There are, however, instructions for Linux, and Mac OS X is Unix-based. So I followed those instructions and they worked like a charm. But that approach required using the shell to run Oracle Data Miner, and, like most Mac users, I wanted to launch the program from the graphical user interface. Below I show how to install Oracle Data Miner on the Mac, run it from the shell, and create a script to launch it from the graphical interface of the OS.

Installing Oracle Data Miner on the Mac
The program requires Java JDK 1.4.2. If your Mac OS X is up to date (I am running OS X 10.4.5) you are all set. To check the version of Java, open a terminal session (using the Terminal application) and use the command:

java -version
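If your system is up to date, this should report a 1.4.2 release, for example (the exact release and build numbers will vary):
java version "1.4.2_09"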
Follow the instructions for installation on Unix or Linux:
  1. Download odminer.zip from here.
  2. Unzip odminer.zip by double-clicking it. This inflates the archive into a new folder in the current working folder. For the Oracle Data Miner beta version, the folder created is named odminer_102_beta2 Folder.
  3. Move the created folder to the desired location. In my case, I moved this folder to the Applications folder (see Figure 1).




Figure 1: Installing Oracle Data Miner in the Applications folder.

Running from the Shell
To start Oracle Data Miner, open a Terminal shell and run the script odminer in the directory MINER_HOME/bin (see Figure 2), where MINER_HOME is the directory where Oracle Data Miner is installed. In this example, MINER_HOME is /Applications/odminer_102_beta2 Folder. I also appended "&" to the command so that odminer runs in the background. If the script is not executable (as in my case), make it executable:
chmod +x odminer
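
Putting it all together, a typical launch from a Terminal shell looks like this (the quotes are needed because the folder name contains a space):
cd "/Applications/odminer_102_beta2 Folder/bin"
chmod +x odminer    # only needed the first time
./odminer &         # "&" runs odminer in the background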



Figure 2: Invoking odminer from a Terminal shell.

The first time that you start Oracle Data Miner, a dialog (Figure 3) appears asking for the following information (you may need to contact your Oracle Data Mining DBA for this information):
  • Connection Name, the name of the connection
  • User, the name of the ODM user schema where data mining will take place
  • Password, the ODM user password
  • Host, the system where ODM is installed
  • Port, the port number for the connection
  • SID, the SID of the database where ODM is installed


Figure 3: New connection dialog.

Click OK when you finish the definition. You are returned to the Choose Connection dialog (Figure 4). You can now select the connection that you just defined from the dropdown box.



Figure 4: Choose connection dialog.

Click OK to bring up the Oracle Data Miner main screen (Figure 5).



Figure 5: Oracle Data Miner on Mac OS X.

Running from the Graphical Interface
Running from the Terminal shell works fine but, like most Mac users, I want to run my applications from the Mac OS graphical interface. This is easily done with a shell script. In Mac OS X 10.4 we can use Automator (Figure 6) to create one.



Figure 6: Invoking Automator.

Start Automator, select Run Shell Script from the Action list, and drag it to the workflow pane (Figure 7).



Figure 7: Creating a Run Shell Script action.

Next, enter the following lines in the Run Shell Script text box (Figure 8):
cd MINER_HOME/bin
./odminer
where, in my case, MINER_HOME is /Applications/odminer_102_beta2 Folder. Note how the folder name is spelled in Figure 8. To get the exact name, open a Terminal shell, change directory to the MINER_HOME folder, and type pwd to print the folder's full path.
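
For example, with the install location used above, the script body would be (the quotes are needed because the folder name contains a space):
cd "/Applications/odminer_102_beta2 Folder/bin"
./odminer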



Figure 8: Entering the shell script actions.

Save the script using the File - Save As... menu option (Figure 9).



Figure 9: Saving the script.

Select a name for the application, choose Application as the File Format (Figure 10), and you are done. Just double-click the icon for the new application to launch Oracle Data Miner!



Figure 10: Saving the script as an application.



UPDATE: The RTM version does not work on the Mac. There will be a patch fixing it; I'll post a note when it is out. For now, keep using the beta release.

Tuesday, February 21, 2006 

Real-Time Scoring & Model Management 1 - Life Cycle

Lately, I have come across an increasing number of discussions on the need for large-scale real-time scoring and model management. As a result, I thought it would be a good idea to write about this. I wanted to answer the question: Can we implement a large-scale real-time scoring engine, coupled with model management, using the technologies available in the 10gR2 Oracle Database? To find out the answers, keep on reading. In this series (Part 1, Part 2, and Part 3) I describe how this can be done and give some performance numbers.

Typical examples of large-scale real-time scoring applications include call center service dispatch and cross-sell of financial products. These applications have in common the need for:

  • Scoring many models in real-time
  • Filtering models to a relevant set suitable for scoring a particular case
  • Managing models
Model management comes into the equation because of the large number of models required to support these applications. This makes automating the steps of model development and deployment in the model life cycle a necessity. Before describing how to implement a basic system for large-scale real-time scoring and my findings on performance, I would like to add a couple of words on model life cycle and the role of model management.

Model Life Cycle
In production data mining, where data mining models are deployed into production systems, models commonly follow a life cycle with five major phases (see Figure 1):
  1. Business understanding
  2. Data management
  3. Model development
  4. Model deployment
  5. Model management




Figure 1: Production data mining model life cycle.

Business Understanding
This initial phase focuses on identifying the application objectives and requirements from a business perspective. Based on that, the types of models needed are determined. For example, an application may have as an objective to cross-sell products to customers calling the company's call center. In order to support this objective, the system designer may create a model for each product that predicts the likelihood of a customer buying the product.

Data Management
The data management phase includes identifying the data necessary for building and scoring models. Data can come from a single table or view or from multiple tables or views. It is also common to create computed columns (e.g., aggregated values) to be used as input to models.
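
As a sketch, assuming hypothetical customers and sales tables, a mining view with an aggregated input column could be defined as:
CREATE OR REPLACE VIEW cust_mining_data AS
SELECT c.cust_id, c.cust_gender, c.cust_year_of_birth,
       SUM(s.amount_sold) AS total_purchases  -- computed input column
FROM customers c, sales s
WHERE c.cust_id = s.cust_id
GROUP BY c.cust_id, c.cust_gender, c.cust_year_of_birth;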

Model Development
Model development includes selecting the appropriate algorithm, preparing the data to meet the needs of the selected algorithm, and building and testing models for their predictive power. Sometimes, it may also involve sampling the data. Typical transformations include: outlier removal, missing value replacement, binning, normalization, and applying power transformations to handle heavily skewed data.

In the majority of cases, the transformations used during model build need to be deployed alongside the model in the deployment phase. Because, in the database, transformations can easily be implemented as SQL statements, this implies that, at scoring time, we need to craft scoring SQL queries that include the required transformations. This task is greatly simplified with the help of the Oracle Data Miner GUI.
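
As a hedged illustration (the model name and replacement value below are hypothetical), a scoring query that re-applies a missing value replacement used during model build could look like this:
SELECT A.cust_id,
       PREDICTION(buy_model USING
                  NVL(A.cust_credit_limit, 9000) AS cust_credit_limit)
FROM customers A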

Model Deployment
Model deployment requires moving the model and the necessary transformations to the production system. For many data mining platforms, model deployment can be a labor-intensive task. Because data mining models are database objects and the data transformations are implemented using SQL, this is not the case for Oracle Data Mining. If the model was developed in the production instance of the database then, unlike with external data mining tools, it is already part of the production system. Otherwise, in the more common case, model deployment is performed using import/export or the Data Pump utilities. The process can be done manually or can be automated. The deployment can also be scheduled to take place on a periodic basis. The actual model scoring is implemented using SQL queries or Java calls depending on the application.
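
For example, assuming a directory object named DM_DUMP and a model named TREE_MODEL (both hypothetical), a single model can be exported with the DBMS_DATA_MINING.EXPORT_MODEL procedure:
BEGIN
  DBMS_DATA_MINING.EXPORT_MODEL(
    filename     => 'tree_model_exp',
    directory    => 'DM_DUMP',
    model_filter => 'name = ''TREE_MODEL''');
END;
/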

IT policy and internal audit requirements may require testing of models prior to deployment to the production environment. This type of testing is independent of the testing for model predictive power performed during model development.

Model Management
Usually, model management implies the periodic creation of models and replacement of current ones in an automatic or semi-automatic fashion. In this sense, model management is tightly coupled with the model development and deployment steps. It can be seen as an outer loop controlling these steps. When dealing with large numbers of models, a model management component, responsible for automating the creation and deployment of models, becomes a necessity. A useful approach for implementing model management is the champion/challenger testing strategy. In a nutshell, champion/challenger testing is a systematic, empirical method of comparing the performance of a production model (the champion) against that of new models built on more recent data (the challengers). If a challenger model outperforms the champion model, it becomes the new champion and is deployed in the production system. Challenger models are built periodically as new data are made available.
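
As a minimal sketch of the promotion step, assuming a model metadata table like the modelmd_tab table used in Part 2 and a challenger whose accuracy has already been measured on held-out data (all names and values here are hypothetical):
DECLARE
  v_champ_acc NUMBER;
  v_chall_acc NUMBER := 0.85;  -- hypothetical challenger accuracy
BEGIN
  -- Look up the current champion's accuracy for this product/region
  SELECT model_accuracy INTO v_champ_acc
  FROM modelmd_tab
  WHERE model_type = 'cross-sell' AND prod_id = 120 AND region = 'NE';

  -- Promote the challenger if it outperforms the champion
  IF v_chall_acc > v_champ_acc THEN
    UPDATE modelmd_tab
    SET model_name = 'XSELL_120_NE_V2',  -- hypothetical challenger
        model_accuracy = v_chall_acc
    WHERE model_type = 'cross-sell' AND prod_id = 120 AND region = 'NE';
    COMMIT;
  END IF;
END;
/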

Another dimension of model management is the creation and management of metadata about models. Model metadata can be used to dynamically select, for scoring, models appropriate to answer a given question. This is a key element in scaling up large-scale real-time scoring systems.

Consider that we have built a set of models to predict house value for each major housing market. This is a sensible strategy as a national model would be too generic. This set of models could then be used in an application for predicting the value of a property based on the house attributes. At scoring time, the application would select the model for the correct housing market. In order to accomplish this, we need metadata that associates different models with a task (e.g., predict house value) as well as other attributes that can be used to segment the models (e.g., region, product type, etc.).
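
A hedged sketch of such a metadata lookup (the table and columns are hypothetical):
SELECT model_name
FROM house_modelmd_tab
WHERE task = 'predict house value' AND
      market = 'Boston'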

The next posts in this series will describe how to implement a large-scale real-time scoring engine with model management and give performance numbers for the system. The full series is Part 1, Part 2, and Part 3.

Readings: Business intelligence, Data mining, Oracle analytics


Real-Time Scoring & Model Management 2 - Implementation

This is Part 2 in a series on large-scale real-time scoring and model management - The full series is Part 1, Part 2, and Part 3.

System Architecture
Consider a call center application that cross-sells products to a customer based on the customer's historical data and information from the current call. The type of offer is conditional on the product category the customer is interested in during the call and on the region where the customer lives. For example, a customer in Florida would be more likely to buy beach products than one in New England. As the customer service representative interacts with the customer, the application returns new product recommendations as new information becomes available during the call.



Figure 2: Architecture for a large-scale real-time scoring application.

A simplified architecture for such an application is illustrated in Figure 2. The system has four main components:
  1. Client application
  2. Filtering module
  3. Scoring module
  4. Learning module
The call center system initiates the interaction with the scoring system by sending the relevant information about the current call, which includes the customer ID. This information is used to filter down the available models on the server side to those that are meaningful to the current call. The customer ID is used to retrieve relevant historical data that is then combined with call information at the Scoring Module for scoring the models selected by the Filtering Module. The Scoring Module returns a ranked list of products to the call center application. This cycle can be repeated many times during a single call. At the end of the call, the call center application sends the information collected during the call to the server, including what offers were made and whether they were accepted. This information is combined with historical information and used for training new models in the Learning Module. The Learning Module implements the following model management features discussed above:
  • Automated model build
  • Champion-challenger testing
  • Automated model deployment
  • Storage of model metadata
Implementation
Let's take a look at how the Filtering and Scoring modules of the simplified architecture described above can be implemented using the technologies in the 10gR2 Oracle Database. To help illustrate the approach, I am providing implementation examples for each of the steps.

Filtering Module

The Filtering Module selects the set of models appropriate for a given call center call. Filtering is performed on a set of attributes associated with a model; business rules can also be used to define the filtering criteria. Attributes can be associated with a model in two ways: 1) directly, in the model metadata table, and 2) indirectly, by joining the model metadata table with other tables. Model filtering is accomplished by querying the model metadata table and applying the desired filtering criteria using a WHERE clause. The filtering criteria can refer to columns from other tables besides the model metadata table.

Consider that we have a model metadata table (modelmd_tab) with the following schema:
model_name      VARCHAR2(30),
model_type      VARCHAR2(30),
prod_id         NUMBER,
region          VARCHAR2(2),
model_accuracy  NUMBER
The Learning Module populates this table as models are created and deleted. The information in modelmd_tab identifies the model (name and type), the product and region the model was built for, and the model accuracy. Consider also that we have a product table (products) containing product information with the following schema:
prod_id           NUMBER,
prod_type         VARCHAR2(30),
prod_description  VARCHAR2(4000)
The following query selects cross-sell models for the northeast (NE) region:
SELECT A.model_name, A.prod_id
FROM modelmd_tab A
WHERE A.model_type = 'cross-sell' AND A.region = 'NE'
This approach to model filtering is very flexible. Other types of information can be used for filtering by joining the modelmd_tab table with other tables. For example, we could restrict the models returned by the previous query to those that belong to a given product category (e.g., credit-card) using the following query:
SELECT A.model_name, A.prod_id
FROM modelmd_tab A,
     (SELECT prod_id
      FROM products
      WHERE prod_type = 'credit-card') B
WHERE A.model_type = 'cross-sell' AND
      A.region = 'NE' AND
      A.prod_id = B.prod_id
By taking advantage of Oracle Spatial and Oracle Text capabilities, this approach can easily accommodate filtering using spatial and textual information.
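
For example, assuming an Oracle Text CONTEXT index on the prod_description column, a text predicate could restrict the models to products whose descriptions mention travel (a hypothetical criterion):
SELECT A.model_name, A.prod_id
FROM modelmd_tab A, products B
WHERE A.model_type = 'cross-sell' AND
      A.prod_id = B.prod_id AND
      CONTAINS(B.prod_description, 'travel') > 0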

Scoring Module
This module scores the set of models selected by the Filtering Module using historical customer data and/or call information. In our example, it returns a ranked list of products. The implementation proposed here takes advantage of temporary session tables and model caching for performance.

A temporary session table is used for holding the scoring results for a given request from the client. Before scoring each new request the table is truncated.

For reasonably sized models, caching is achieved through shared cursors. To avoid reloading the model every time, the same SQL string must be used for all queries that score a given model. For example, running the following scoring query multiple times would not reload the model tree_model after the first time:
SELECT PREDICTION(tree_model USING A.*)
FROM customers A
WHERE A.cust_id = '1001'
However, if we modify the above SQL string in any way, the model will be reloaded. By using bind variables in the scoring query, it is possible to get the benefit of caching without sacrificing flexibility:
SELECT PREDICTION(tree_model USING A.*)
FROM customers A
WHERE A.cust_id = :1
Another benefit of model caching is a reduction in shared pool memory usage, enabling greater concurrency and larger multi-user loads.

In order to implement the Scoring Module, I first created a temporary session table to hold the results. This is done at the time the system is set up. For this example, the table was created as follows:
CREATE GLOBAL TEMPORARY TABLE scoretab (
  prod_id NUMBER,
  prob    NUMBER)
ON COMMIT PRESERVE ROWS;
For each score request, scoretab stores one row per eligible model selected by the Filtering Module. Each row has the ID of the product associated with the model and the probability of the customer buying that product.

Next, I created a PL/SQL stored procedure that filters and then scores all the eligible models. For simplicity, I bundled the filtering and scoring modules into a single PL/SQL procedure:
CREATE OR REPLACE PROCEDURE score_multimodel(
  p_cid               NUMBER,
  p_model_table_name  VARCHAR2,
  p_score_table_name  VARCHAR2,
  p_prod_where_clause VARCHAR2,
  p_mod_where_clause  VARCHAR2) AS
  TYPE Char_Tab IS TABLE OF VARCHAR2(30);
  TYPE Num_Tab  IS TABLE OF NUMBER;
  TYPE ModelCursorType IS REF CURSOR;
  v_model_tab    Char_Tab;
  v_prod_tab     Num_Tab;
  v_cursor_model ModelCursorType;
  v_mod_where    VARCHAR2(4000) := p_mod_where_clause;
  v_prod_where   VARCHAR2(4000) := p_prod_where_clause;
  v_gender       CHAR(1);
  v_year         NUMBER;
  v_marital      VARCHAR2(20);
  v_credit       NUMBER;
  v_sql_stmt0    VARCHAR2(4000);
  v_sql_stmt     VARCHAR2(4000);
  v_num_models   NUMBER;
BEGIN
  -- Clear the result table
  v_sql_stmt := 'TRUNCATE TABLE ' || p_score_table_name;
  EXECUTE IMMEDIATE v_sql_stmt;

  -- Get customer information (NOTE: this assumes a fixed schema)
  SELECT CUST_GENDER, CUST_YEAR_OF_BIRTH, CUST_MARITAL_STATUS,
         CUST_CREDIT_LIMIT
    INTO v_gender, v_year, v_marital, v_credit
    FROM customers
   WHERE cust_id = p_cid;

  -- Build the product filtering subquery
  v_sql_stmt0 := 'SELECT prod_id FROM products ';
  IF length(v_prod_where) > 0 THEN
    v_sql_stmt0 := v_sql_stmt0 || ' WHERE ' || v_prod_where;
  END IF;

  -- Build the model filtering query
  v_sql_stmt := 'SELECT A.model_name, A.prod_id ' ||
                'FROM ' || p_model_table_name || ' A,' ||
                '(' || v_sql_stmt0 || ') B ' ||
                'WHERE A.prod_id = B.prod_id ';
  IF length(v_mod_where) > 0 THEN
    v_sql_stmt := v_sql_stmt || ' AND ' || v_mod_where;
  END IF;

  -- Bulk read the eligible models (a single fetch retrieves all rows)
  OPEN v_cursor_model FOR v_sql_stmt;
  FETCH v_cursor_model BULK COLLECT INTO v_model_tab, v_prod_tab;
  CLOSE v_cursor_model;

  -- Score each model and persist the results to the temporary table;
  -- bind variables keep the cursor (and the model) shared across calls
  v_num_models := v_model_tab.count;
  FOR i IN 1..v_num_models LOOP
    v_sql_stmt :=
      'INSERT INTO ' || p_score_table_name ||
      ' SELECT :1, prediction_probability(' || v_model_tab(i) ||
      ' USING :2 as CUST_GENDER, :3 as CUST_YEAR_OF_BIRTH, ' ||
      ' :4 as CUST_MARITAL_STATUS, :5 as CUST_CREDIT_LIMIT) ' ||
      ' FROM dual ' ||
      ' WHERE prediction(' || v_model_tab(i) ||
      ' USING :6 as CUST_GENDER, :7 as CUST_YEAR_OF_BIRTH, ' ||
      ' :8 as CUST_MARITAL_STATUS, :9 as CUST_CREDIT_LIMIT) = 1';
    EXECUTE IMMEDIATE v_sql_stmt USING v_prod_tab(i),
      v_gender, v_year, v_marital, v_credit,
      v_gender, v_year, v_marital, v_credit;
  END LOOP;
END score_multimodel;
/
show errors;
The score_multimodel procedure takes as inputs the customer ID (for retrieval of historical data), the model metadata table name, the temporary session table name (for persisting results), a WHERE clause for product filtering, and a WHERE clause for model filtering. The procedure scores all models associated with the products filtered by the WHERE clauses. Models are scored one at a time and the results persisted to the temporary session table. Scoring each model separately and using bind variables allows for model caching through shared cursors.

Consider a product with prod_id = 120; an example of the scoring query generated in the inner loop of the procedure, with the bind values filled in, is as follows:
SELECT 120, 
PREDICTION_PROBABILITY(model1 USING 'M' AS cust_gender,
1960 AS cust_year_of_birth,
'Married' AS cust_marital_status,
50000 AS cust_credit_limit)
FROM dual
WHERE PREDICTION(model1 USING 'M' AS cust_gender,
1960 AS cust_year_of_birth,
'Married' AS cust_marital_status,
50000 AS cust_credit_limit) = 1
This scoring query only returns rows for products that the customer is likely to buy. The outcomes a model can predict are a function of the data provided to the model during training. For this example, a model predicts one of two outcomes: 0, if the customer would not buy the product, and 1, if the customer would buy the product. The use of the PREDICTION operator in the WHERE clause removes, from the result set, products the customer is unlikely to buy.

In the above example procedure, the schema for the customer historical information is hard-coded. The customer information is stored in the customers table and the following columns are used for scoring a model: cust_gender, cust_year_of_birth, cust_marital_status, and cust_credit_limit. Usually, this is not a problem, as many applications work against a fixed schema. Also, in this example, no call data is passed to the procedure. This type of information can easily be added, either by passing it directly as arguments to the procedure or through a separate session table accessed inside the procedure.

A typical call to score_multimodel would be something like this:
BEGIN
  score_multimodel(1001, 'modelmd_tab', 'scoretab',
                   'prod_type = ''credit-card''',
                   'model_type = ''cross-sell'' AND region = ''NE''');
END;
/
where 1001 is the customer ID, modelmd_tab is the name of the model metadata table, scoretab is the temporary session table, the product filtering condition prod_type = 'credit-card' restricts the recommendations to credit-card products, and the model filtering only allows scoring of cross-sell models for the NE region. A typical output for the procedure would be:
prod_id     prob
------- --------
    100      0.6
    130      0.4
     90      0.8
The client application would, in general, rank the results by probability and take the top N probabilities. For example, for N=10 we would have:
SELECT A.*
FROM (SELECT prod_id, prob FROM scoretab ORDER BY prob DESC) A
WHERE rownum < 11
The next post in this series gives performance numbers for the system. The full series is Part 1, Part 2, and Part 3.

Readings: Business intelligence, Data mining, Oracle analytics


Real-Time Scoring & Model Management 3 - Performance

This is Part 3 in a series on large-scale real-time scoring and model management - The full series is Part 1, Part 2, and Part 3.

Performance

How does the framework proposed in Part 2 of this series perform? Can it scale to a large number of models and multiple simultaneous requests? To evaluate the performance of the implementation described in Part 2, I built 100 models using the same set of inputs. For each model, I inserted the relevant metadata into modelmd_tab. I then invoked the score_multimodel stored procedure (Part 2) with different WHERE clause arguments so that a different set of models would be selected each time. The number of selected models ranged from 20 to 100. Figure 3 shows the time required for scoring a single row as a function of the number of models. The numbers are for a single 3 GHz CPU Linux box with 2 GB of RAM. As indicated in the graph, the proposed architecture achieves real-time (below 1 second) performance. In fact, extrapolating the trend in the graph, it would take about 0.54 seconds to score one thousand models. Actual performance for different systems is impacted by the type of model and the number of attributes used for scoring. Nevertheless, the numbers are representative for an untuned database running on a single-CPU box.

Besides performing well as the number of models grows, the system also scales well with the number of concurrent users. The architecture can leverage multiple processors and RAC. The cursor sharing described in Part 2 keeps memory requirements to a minimum, while the database caching mechanisms make good use of available memory. Because each model is scored independently, the application can also assign groups of models to different servers to increase cache re-use.

It is important to note that the numbers in Figure 3 should not be used as a baseline for estimating the performance of scoring multiple records with a single model in Oracle Data Mining. For that type of task, Oracle Data Mining can score millions of records in a couple of seconds (link).
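
Scoring many records with one model is a single set-based query; a minimal sketch, reusing the tree_model example from Part 2:
SELECT A.cust_id, PREDICTION(tree_model USING A.*) pred
FROM customers A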


Figure 3: Time for sequentially scoring a single row with multiple models.

Conclusions
I started this series trying to answer the question: Can we implement a large-scale real-time scoring engine, coupled with model management, using the technologies available in the 10gR2 Oracle Database? The answer is yes: the 10gR2 Oracle Database provides a flexible framework for implementing large-scale real-time scoring applications. As the example described above shows, it is possible to support:
  • Large number of models
  • Large number of concurrent calls
The approach relies on off-the-shelf components (e.g., RAC and Oracle Data Mining). It also supports a flexible filtering scheme (Part 2) and can be extended to leverage textual and spatial information.

Readings: Business intelligence, Data mining, Oracle analytics


Friday, February 17, 2006 

Oracle Life Sciences Meeting

The OLSUG Conference Announcement & Agenda for the Oracle Life Sciences User Group Meeting in Boston on April 3 are now posted on the Oracle Life Sciences User Group and the OTN Life Sciences web sites.

The OLSUG Conference Program includes three tracks featuring industry leaders, technical experts, and Oracle experts, plus a Hands-on Technical Workshop with 30 PCs loaded with Oracle 10g Release 2 and a number of technical demos, including: Data Mining, Statistical functions, RDF and the Semantic Web, interMedia & Images, HTML DB (now called Application Express), Text Mining of Medline, BLAST, JDeveloper, and other useful demos for life sciences & healthcare applications.

This is a great opportunity for customers, prospective customers and Oracle personnel to share, exchange, and experience "best practices" in the Life Sciences & Healthcare industry.


Thursday, February 16, 2006 

Improved Stock Research Through Analytics?

Reuters has this interesting article on Stock tracker Majestic Research. Some snippets from the article:

"Majestic, which was founded in 2002, uses "quantitative" analysis that it claims can do the job better than traditional stock research methods, at least for consumer-sensitive companies that utilize the Internet in some way."

"From modestly-furnished Majestic offices overlooking Manhattan's Central Park, several dozen math Ph.D.s, statisticians and other quantitative analysts evaluate data spewed from computers using "Web crawling" programs track sales and other information from tens of millions of Web pages or other on-line resources."
Their approach does not always get it right, but customers claim it to be an improvement over traditional research methods.

Readings: Business intelligence, Data mining

Tuesday, February 14, 2006 

Multi-touch Screens - The Next Input Device?

This video is a very cool demo of work on multi-touch interaction at NYU. This research allows bi-manual, multi-point, and multi-user interactions on a graphical interaction surface. The potential for data exploration and visualization is tremendous. The beginning is more on the artistic side. Keep watching and you will see data manipulation, interaction with maps and spatial data, and exploration of network structures.

Thursday, February 09, 2006 

Poll - How do you work with analytics?

Different types of users prefer to work with analytics in different ways. Analysts like sophisticated tools that give them a great deal of control. Business users prefer analytics packaged in vertical applications or business intelligence tools. Developers usually would rather have sample code showcasing interesting applications leveraging analytics that they can modify to their needs. Let me know how you work with analytics the most.



Wednesday, February 08, 2006 

Competing on Analytics

The Harvard Business Review has recently published an article by Babson College's Tom H. Davenport on how analytics is becoming a key competitive factor for companies. Some key points made in the article:

  • Companies such as Amazon, Harrah's, Capital One, and the Boston Red Sox have all dominated their fields by deploying industrial-strength analytics across a wide variety of activities.
  • Business processes are among the few remaining points of differentiation for companies -- and analytics competitors extract every last drop of value from those processes.
  • In companies that compete on analytics, senior executives make it clear--from the top down--that analytics is central to strategy.
  • Statistical analysis and quantitative activity in these companies are managed at the enterprise (not departmental) level.
To read the full text of the article online one needs a subscription to the Harvard Business Review. However, the article is based on a study by the author and two colleagues that is freely available. You can get a copy of the study from the author's website if you subscribe to receive their free newsletter.

Readings: Business intelligence

Friday, February 03, 2006 

SPSS and Inforsense Support for Oracle Data Mining

SPSS Supports Oracle Data Mining: "Analytics software firm SPSS Inc has released a new version of its Clementine data mining workbench geared specially for Oracle Corp's 10g relational database system. Chicago, Illinois-based SPSS unveiled the product as a complement to Oracle 10g's predictive modeling capabilities, which are provided by the Oracle Data Mining component. Clementine 9.0 provides interfaces that tap directly into Oracle predictive models, which are constructed and scored within the core database. Oracle Data Mining supports a variety of data mining algorithms including Bayes and Vector Machines. These algorithms are surfaced as "nodes" in Clementine's interface and can be manipulated natively."

Inforsense KDE 3.0 Released: "New features enable informaticians to build tailored analytic solutions and portals. Accessed via a redesigned client graphical user interface, enhancements include control workflow constructs to link and orchestrate multiple analytical workflows according to business rules; support for Oracle Database 10g Release 2 in-database processing and data mining functions; easy-to-use service creation and deployment wizards; definition of mid-workflow user intervention points; deployment of guided end user applications within the full client interface and portal; workflow annotation to explain workflow steps within collaborative environments; and the use of standardized JSR 168 portlet specifications for easy integration of InforSense KDE workflows into standard portal applications."

Many consider Clementine the best graphical user interface for data mining. Its new support for Oracle Data Mining is an important statement about the power and quality of the Oracle data mining server. In this release, Clementine only supports a subset of the algorithms in Oracle Data Mining. I have seen a demo of their product and I liked how these algorithms were tightly integrated with the rest of the product. This release is a good step towards strengthening Clementine's in-database data mining story.

InforSense is a great platform for developing analytical applications. I have tried the previous release of their IOE (In-Oracle Environment) product and the breadth of its integration with Oracle features impressed me. Besides integrating all the algorithms in Oracle Data Mining, they also offer support for Oracle Text, Oracle Spatial, database statistical functions, and in-database data transformations. This array of features, combined with InforSense's intuitive analytical workflow interface, places InforSense in a unique position amongst analytic tool vendors.

About me

  • Marcos M. Campos: Development Manager for Oracle Data Mining Technologies. Previously Senior Scientist with Thinking Machines. Over the years I have been working on transforming databases into easy-to-use analytical servers.
  • My profile

Disclaimer

  • Opinions expressed are entirely my own and do not reflect the position of Oracle or any other corporation. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.
  • This work is licensed under a Creative Commons license.
