Analytics in the Oracle Database
The addition of analytics to databases is a natural direction. As the volume of data increases, data movement dominates the cost of computation. It starts to make more sense to move the computation and algorithms to the database than to move, to an external server, the data to be analyzed. Furthermore, in most cases, the results of the analysis need to be persisted back to the database and combined with other data in order to make them actionable.
Picking up on this, over the past releases Oracle has continuously added analytic features to its database. Taken as a whole these features make the Oracle Database a powerful platform for developing applications leveraging analytics. However, most users are not aware of the complete set of analytic features available in the database. The following list covers the features present in the Oracle Database 10g Release 2:
- Complex data transformations
- Data mining
- Image feature extraction
- Linear algebra
- Predictive analytics
- Spatial analytics
- Statistical functions
- Text mining
SELECT A.cust_name, A.contact_infoThe above query selects all customers who have a high propensity to attrite (> 80% chance), are valuable customers (customer value rating > 90), and have had a recent conversation with customer services regarding a Checking Plus account. The propensity to attrite information is computed using a data mining model (tree_model). The query uses Oracle Text's CONTAINS operator to search call center notes for references to Checking Plus accounts.
FROM customers A
‘attrite’ USING A.*) > 0.8
AND A.cust_value > 90
AND A.cust_id IN
FROM call_center B
WHERE B.call_date BETWEEN ’01-Jan-2005’
AND CONTAINS(B.notes, ‘Checking Plus’, 1) > 0);
Finally, it is also easy to integrate the results from queries like the one above with Business Intelligence tools such as Oracle Discover, Oracle Portal, and Crystal Reports (more on that in future posts).
In future posts I will cover:
- How to get the most out of many of these features
- How to solve real problems using analytics
- The role of analytics in Business Intelligence and databases
Complex Data Transformations
Data transformation is a key aspect of analytical applications and ETL (extract, transform, and load). Besides support for transformations through SQL expressions, the Oracle Database, since the Oracle Database 10g Release 1, ships with a flexible data transformation package that includes a variety of missing value and outlier treatments, as well as binning and normalization capabilities.
Oracle Data Mining (ODM), an option to the Enterprise Edition of the Oracle Database, provides a rich set of data mining functionality. ODM 10g Release 2 has eleven algorithms that can be used for classification, regression, clustering, anomaly detection, feature extraction, association analysis, and attribute ranking.
The database also includes in both Standard Edition and Enterprise Edition the frequent itemset package (DBMS_FREQUENT_ITEMSET). This package enables frequent itemset counting and it is used as a building block for ODM's Association algorithm. Frequent itemsets provide a mechanism for counting how often multiple events occur together. This blog post has a nice discussion of this feature.
Image Feature Extraction
Oracle Intermedia is a feature of the Oracle Database that is included in both Standard Edition and Enterprise Edition. interMedia supports the extraction of image features (e.g., color histogram, texture, and positional color) that can then be used to characterize and analyze images.
Oracle Database 10g Release 2 ships with a new package UTL_NLA. The UTL_NLA package exposes a subset of the popular BLAS and LAPACK (Version 3.0) libraries for operations on vectors and matrices represented as VARRAYs. This package includes procedures to solve systems of linear equations, invert matrices, and compute eigenvalues and eigenvectors.
Data mining can uncover useful information buried in vast amounts of data. However, it is often the case that many users that could benefit from these results do not have any data mining expertise. The DBMS_PREDICTIVE_ANALYTICS package addresses this by automating the entire data mining process from data preprocessing through model building to scoring new data. This package provides an important tool that makes data mining possible for a wider audience of users, in particular, business analysts. The capabilities of this package are also exposed through the Oracle Spreadsheet Add-In for Predictive Analytics. The Oracle Spreadsheet Add-In for Predictive Analytics enables Microsoft Excel users to mine their Oracle Database or Excel data using simple, "one click" Predict and Explain predictive analytics features.
The Oracle Database provides a long list of SQL statistical functions with support for: hypothesis testing (e.g., t-test, F-test), correlation computation (e.g., pearson correlation), cross-tab statistics, and descriptive statistics (e.g., median and mode). The package DBMS_STAT_FUNCS adds distribution fitting procedures and a summary procedure that returns descriptive statistics for a column.
Oracle OLAP, an option to Oracle Database 10g Enterprise Edition, has features previously found only in specialized OLAP databases. Moving beyond drill-downs and roll-ups, Oracle OLAP also supports time-series modeling and forecast.
Oracle Text uses standard SQL to index, search, and analyze text and documents stored in the Oracle database, in files, and on the web. It also supports automatic classification and clustering of document collections. Many of these analytical features are layered on top of ODM functionality.
Oracle Spatial is an option for Oracle Enterprise Edition that provides advanced spatial features to support high-end GIS and LBS solutions. Oracle Spatial's analysis and mining capabilities include functions for binning, detection of regional patterns, spatial correlation, colocation mining, and spatial clustering. Oracle Spatial also includes support for topology and network data models and analytics. The topology data model of Oracle Spatial allows one to work with data about nodes, edges, and faces in a topology. It includes network analysis functions for computing shortest path, minimum cost spanning tree, nearest-neighbors analysis, traveling salesman problem, among others.
Readings: Business intelligence, Data mining, Oracle analytics