US7024417B1 - Data mining framework using a signature associated with an algorithm

Publication number
US7024417B1
Authority
US
United States
Prior art keywords
task
algorithm
signature
template
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime, expires
Application number
US10/295,593
Inventor
Alexander Russakovsky
Uri Rodny
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Hyperion Solutions Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hyperion Solutions Corp
Priority to US10/295,593
Assigned to HYPERION SOLUTIONS CORPORATION. Assignment of assignors interest (see document for details). Assignors: RODNY, URI; RUSSAKOVSKY, ALEXANDER
Application granted
Publication of US7024417B1
Assigned to ORACLE INTERNATIONAL CORPORATION. Assignment of assignors interest (see document for details). Assignors: BEA SYSTEMS, INC.
Assigned to ORACLE INTERNATIONAL CORPORATION. Assignment of assignors interest (see document for details). Assignors: HYPERION SOLUTIONS CORPORATION
Adjusted expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99936 Pattern matching access
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99944 Object-oriented database structure

Definitions

  • the present invention relates to the field of data mining. More specifically, the present invention relates to a universal framework for data mining.
  • Data Mining is a common term for the process of finding useful hidden dependencies or patterns in large amounts of data.
  • the process by which such dependencies or patterns are found is typically called an algorithm.
  • Data Mining activity typically follows a certain workflow having several important stages: data preparation, training (also called building), testing, and application.
  • Data preparation involves preparing the data in a format that can be utilized by an algorithm. Training involves the construction of a concise representation of the algorithm's findings about the data, referred to as the mining model. Testing involves validation of that model. Then, application involves utilizing the model to efficiently produce new previously unknown information, such as projecting the data to predict future events.
  • FIG. 1 is a diagram illustrating the typical organizational flow of data mining.
  • Data 100 is first prepared 102. This may include cleaning up the formatting of the data so that it is in a form usable by the system. Then a user chooses which data to mine 104. This data is fed to a build model method 106, which builds a model based on the data. A test model method 108 then tests the model and determines whether it can be applied to other data. An application method 110 then may apply the model to other data, after which results may be obtained 112.
  • Data that needs to be mined may originate from a variety of sources.
  • Each data mining algorithm (which describes how to build, test, and apply the model, among other things) may have different requirements for the data format it takes on input, and produces on output.
  • Mining algorithm vendors have struggled to map various data sources to their input/output requirements.
  • Each mining algorithm vendor may create algorithms that build, test, and apply a certain model. Thus far, it has been all but impossible to use the software implementation of an algorithm with a new data source.
  • a framework is provided that enables data mining algorithms to be plugged into it without any change to algorithm software implementations, while still providing all the standard data mining tasks. It may be implemented by the data source provider; however, one of ordinary skill in the art will recognize that the invention should not be limited to implementations where it is implemented by the data source provider. It also allows for the complete separation of data storage and algorithms.
  • the framework may become responsible for preparing a set of “prompts” to the user asking him to provide some expression which is specific to the particular kind of data the user is working with.
  • FIG. 2 is a diagram illustrating some of the objects which may be maintained by a framework in accordance with an embodiment of the present invention.
  • FIG. 3 is a flow diagram illustrating a method for data mining using an algorithm, the algorithm having one or more tasks, each task having a number of parameters, each parameter having a type, in accordance with an embodiment of the present invention.
  • FIG. 4 is a block diagram of an apparatus for data mining using an algorithm, the algorithm having one or more tasks, each task having a number of parameters, each parameter having a type, in accordance with an embodiment of the present invention.
  • the components, process steps, and/or data structures may be implemented using various types of operating systems, computing platforms, computer programs, and/or general purpose machines.
  • devices of a less general purpose nature such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used without departing from the scope and spirit of the inventive concepts disclosed herein.
  • the present invention may be implemented using Extensible Markup Language (XML).
  • one of ordinary skill in the art will recognize that there may be extensible markup languages other than XML, and the term XML in the specification should not be read to be limited to one implementation or version of XML. Additionally, the present invention may be implemented using non-extensible programming languages as well, and an extensible markup language implementation is simply one possibility.
  • An object is a self-contained module of data and its associated processing.
  • An object is an instance of a class, which defines a collection of objects that share the same characteristics.
  • An interface describes the methods of an algorithm, so indicating that the algorithm implementation must follow certain rules is the same as indicating that the algorithm class must implement a certain interface.
  • Each mining attribute plays a certain role; for example, it can represent an independent variable (predictor), a derived variable (target), or a model element. It can also represent a collection of variables and/or elements.
  • Each mining attribute's actual data values can be accessed through its accessor.
  • An accessor is an interface featuring methods (such as getValue) used to get values for different data types. These methods may return the data value at the current position.
  • One attribute may refer to many positions in the data, so it is often necessary to have a cursor that points to the current position and a way to iterate through positions. There may be several directions in which iteration is possible, called axes.
  • a typical example is a table where one can iterate over columns as well as over rows.
  • Several axes, each within its own accessor, may form a domain. This means that iteration over the axes is synchronized: changing a position in one of the axes leads to a simultaneous change in other axes within the domain. Thus, iteration actually takes place on the domain.
  • a domain is a type of interface with typical iterator methods, such as reset, advance, getSize, getPosition, setPosition, etc.
  • the present invention provides a framework that enables data mining algorithms to be plugged into it without any change to algorithm software implementations, while still providing all the standard data mining tasks. It may be implemented by the data source provider; however, one of ordinary skill in the art will recognize that the invention should not be limited to implementations where it is implemented by the data source provider. It also allows for the complete separation of data storage and algorithms.
  • the framework itself is a generic tool for performing data mining. Data mining operations are performed by mining algorithms that the framework initially does not know about but is able to supply with data according to their signatures.
  • the signature determines which data is required and what its logical structure is but does not make any assumptions as to how this data is physically organized.
  • When the user initiates a mining session and picks an algorithm for a build task or a model for an apply or test task, the framework becomes responsible for preparing a set of “prompts” to the user asking him to provide some expression which is specific to the particular kind of data source that the user is working with. This means that while an algorithm would not change depending on the kind of data source, the framework's data access layer typically needs to be implemented for each particular kind of data source.
  • a specific embodiment of the present invention is a particular variation of the data mining framework described herein that implements a data access layer for multidimensional data.
  • the interfaces described in the invention, such as DataProvider, DataAccessor, DomainIterator, etc., are implemented in Java and expect the algorithms to implement the Algorithm interface in Java as well, although the implementation of the latter interface in Java could be just a wrapper on top of C or C++ code.
  • This implementation of the framework allows the algorithms to access data directly within the server process thereby avoiding movement of large amounts of data across the network.
  • the user may also be provided with the ability to perform data mining related activities via a graphical user interface built according to the principles described in the invention.
  • Algorithm implementations may cover the main kinds of data mining algorithms such as regression, clustering, neural networks, association rules, decision trees, naive Bayes, etc. Despite the very different nature of these algorithms, they all work well within the framework.
  • the algorithm software developer may implement an interface with methods build, test, apply, etc., one for each mining task, as well as methods setParameterValue, for each supported parameter type, and method getSignature.
  • the mining task methods may each take one parameter of the type DataProvider, which is another interface whose only purpose is to let the algorithm obtain the objects implementing the accessor and domain interfaces by name for each accessor and domain involved in the task.
  • the task methods may return success statuses (true or false).
  • the setParameterValue methods may each take two arguments, one being the name of the parameter and the other being of one of several supported types (e.g., double, integer, boolean, text string, etc.). The purpose of these methods is to let the framework communicate to the algorithm the values of parameters which may be required for the particular task invocation.
  • a getSignature method of the algorithm may take no arguments and return the signature object.
  • a signature is used to describe parameters required by tasks in an algorithm. This signature describes not only the number and type of parameters, but also may include an information field, which is utilized to describe some or all of the functionality of each parameter. The functionality will typically include the meaning of the parameter and/or the recommended usage of the parameter.
  • the system may utilize the signature for a particular algorithm to create a template for each task.
  • the template may indicate one or more fields that need to be initialized by the user to invoke the task, as well as information retrieved from the information field.
  • a graphical user interface may then be generated using the template, where the user can initialize the fields by indicating a mapping between the terms of the task and the actual data source. This allows each algorithm the luxury of ignoring the complexity of the data, and simply dealing with the mapping it is passed.
  • FIG. 2 is a diagram illustrating some of the objects which may be maintained by a framework in accordance with an embodiment of the present invention.
  • Algorithms 200 and filters 202 may be framework objects but may actually be maintained externally. Algorithms have been described earlier in this document, and filters are functions that perform transformations on data.
  • a model object 204 may be maintained by the framework.
  • a template object 206 may also then be maintained by the framework.
  • a session (runtime object) 208 may be maintained.
  • results 210 may be maintained. Functions available for objects in the framework may include list, add, and remove.
  • the signature file may be implemented as a text string in a file in accordance with a certain XML format.
  • the XML format may be defined in an XML document type definition (DTD), such as:
  • FIG. 3 is a flow diagram illustrating a method for data mining using an algorithm, the algorithm having one or more tasks, each task having a number of parameters, each parameter having a type, in accordance with an embodiment of the present invention.
  • the algorithm has been assigned some unique name under which it becomes known to the users.
  • a signature has also been created for the algorithm by the algorithm developer.
  • To invoke a particular mining task, the user first chooses a (previously built) model to apply. Then, from the model, the framework determines which algorithm should be used to apply it, specifying the algorithm name. Thus, at 300, the framework receives the algorithm name.
  • the signature for the algorithm with that algorithm name is retrieved.
  • this signature may be compared with an XML DTD to determine if it is of the proper format. If not, then the algorithm is not supported by the framework and the data mining task cannot proceed. If it is supported, however, at 306 the framework extracts the information from the signature (what parameters and accessors are required for the task, what structure (domains) those accessors have, what role they play, etc.). At 308, the framework uses this information to create a mining task template for the particular mining task and the particular algorithm.
  • the template is a specification of all the fields that need to be initialized by the user to invoke the task, together with the information about their recommended usage.
  • the framework may generate a graphical user interface (GUI) having a graphical dialog in which the user can initialize the required fields. This may include prompting the user to provide a mapping from terms of the algorithm to an actual data source.
  • the framework may dynamically create accessor and domain objects in response to the user-initialized required fields.
  • the framework may assemble the accessor and domain objects into a data provider object.
  • the framework may call an appropriate setParameterValue or similar method of the algorithm if any parameters need to be initialized.
  • the framework may call the appropriate task method with the data provider object as the argument. This completes the mining task.
  • FIG. 4 is a block diagram of an apparatus for data mining using an algorithm, the algorithm having one or more tasks, each task having a number of parameters, each parameter having a type, in accordance with an embodiment of the present invention.
  • the algorithm has been assigned some unique name under which it becomes known to the users.
  • a signature has also been created for the algorithm by the algorithm developer. To invoke a particular mining task, the user first chooses a (previously built) model to apply. Then, from the model, the framework determines which algorithm should be used to apply it, specifying the algorithm name.
  • a signature information field receiver 400 may retrieve the signature for the algorithm with that algorithm name.
  • a signature verifier 402 coupled to the signature information field receiver 400 may compare the signature with an XML DTD to determine if it is of the proper format.
  • a task template creator 404 coupled to the signature information field receiver 400 may use this information to create a mining task template for the particular mining task and the particular algorithm.
  • the template is a specification of all the fields that need to be initialized by the user to invoke the task, together with the information about their recommended usage.
  • a graphical dialog generator 406 coupled to the task template creator 404 may generate a graphical user interface (GUI) having a graphical dialog in which the user can initialize the required fields.
  • An accessor and domain objects dynamic creator 408 coupled to the graphical dialog generator 406 may dynamically create accessor and domain objects in response to the user-initialized required fields. Then, an accessor and domain object data provider object assembler 410 coupled to the accessor and domain objects dynamic creator 408 may assemble the accessor and domain objects into a data provider object.
  • the framework may call an appropriate setParameterValue or similar method of the algorithm if any parameters need to be initialized. Then an algorithm caller 412 coupled to the accessor and domain object data provider object assembler 410 may call the appropriate task method of the algorithm with the data provider object as the argument. This completes the mining task.
  • the framework prompts the user for a mapping from the terms of the algorithm to the actual data store.
  • This mapping is transparent for the algorithm, but the framework uses it to construct the accessor and domain objects.
  • the way that the mapping is created depends on the data access mechanism for the data store. For example, when dealing with relational sources, a Structured Query Language (SQL) mapping may be used, but for multidimensional databases, some multidimensional query language mapping may be used. In either case, the user provides some source specific linguistic expression or source specific “query object” for each axis in each accessor.
  • Mining objects such as models and result sets are usually stored at the site of the original data source. Therefore, it is the algorithm's responsibility to make sure that they are persisted. Since their structure is described in the algorithm signature, the framework has all the necessary information to build the corresponding accessors with “write access” mode, so the algorithm can use those accessors to save the objects.
  • the framework may capture the expressions entered by the user (or constructed internally) so that the mining objects can be located and retrieved at any time.
  • the major mining tasks (build, test, and apply) have been described above. Each algorithm typically must support at least these three tasks. However, there may be other tasks that make sense within the framework, for instance exporting mining models to and importing mining models from Predictive Model Markup Language (PMML).
  • the former takes a model object built by a particular algorithm and represents it in PMML format whereas the latter takes a model in PMML format and creates a model object that can be used for application purposes within the framework.
  • These “exchange” tasks should also be described in the algorithm signature if the algorithm supports them. More tasks can be easily added to the framework workflow as it evolves.
  • Because the framework captures all the information regarding the location of a mining object in the data store (called object metadata), and each such object is uniquely named, it is possible for the user to query the objects through the framework.
  • the framework uses the object metadata and the regular means available in the particular data store (such as SQL, or multidimensional query language) to retrieve the object data. This way, although the object's signature may be specific to the algorithm used, it can be queried and retrieved in standard format without the algorithm. This provides great flexibility to the client tools because they do not have to worry about how to access mining objects.
  • the algorithm is a linear regression algorithm.
  • Linear regression attempts to determine the equation of a line that best represents a series of data points.
  • a value for x_i may be plugged in, resulting in a predicted value for y_i derived using a formula involving the coefficients.
  • Its signature file may then contain information regarding various accessors, including predictor, target, slope, and intercept.
  • accessor predictor may have two parameters, indicated in the signature file as domain 1 and domain 2. This, therefore, indicates that predictor has two parameters, and that they are different from each other.
  • Accessor target may also have two parameters, indicated in the signature file as domain 3 and domain 2. This indicates that the second parameter for target is navigated simultaneously with the second parameter for predictor.
  • accessors for slope and for intercept, each with write access, may be provided, each having a single parameter.
  • the names of the accessors (e.g., predictor, target) may also indicate the role of the accessor. The described information is sufficient to perform the build task.
  • the signature for the apply task may contain information regarding similar accessors, except that this time, slope and intercept as well as predictor may indicate read access and target may indicate write access. Or, the apply task may contain some other accessors describing the output of the algorithm, for instance, expected precision of the line fit, various statistics about the algorithm execution, model characteristics, etc.
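
The linear regression example above can be sketched in Java as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the interface shapes are simplified (the test task, setParameterValue, and getSignature are omitted), a single predictor is used instead of the two-domain predictor described above, and the domain name "rows" and the InMemoryProvider class are hypothetical stand-ins for a real data source.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical versions of the interfaces named in the text.
interface Domain { void reset(); boolean advance(); }
interface Accessor { double getValue(); void setValue(double value); }
interface DataProvider {
    Accessor getAccessor(String name);
    Domain getDomain(String name);
}
interface Algorithm {
    boolean build(DataProvider data);
    boolean apply(DataProvider data);
}

/** Least-squares fit of one predictor; sees data only through accessors. */
final class LinearRegression implements Algorithm {
    public boolean build(DataProvider data) {
        Domain rows = data.getDomain("rows"); // "rows" is an assumed name
        Accessor x = data.getAccessor("predictor");
        Accessor y = data.getAccessor("target");
        double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
        rows.reset();
        while (rows.advance()) {
            double xi = x.getValue(), yi = y.getValue();
            n++; sx += xi; sy += yi; sxx += xi * xi; sxy += xi * yi;
        }
        double denom = n * sxx - sx * sx;
        if (n < 2 || denom == 0) return false; // no unique line fits
        double slope = (n * sxy - sx * sy) / denom;
        data.getAccessor("slope").setValue(slope);              // write access
        data.getAccessor("intercept").setValue((sy - slope * sx) / n);
        return true;
    }

    public boolean apply(DataProvider data) {
        double slope = data.getAccessor("slope").getValue();    // read access
        double intercept = data.getAccessor("intercept").getValue();
        Domain rows = data.getDomain("rows");
        Accessor x = data.getAccessor("predictor");
        Accessor y = data.getAccessor("target");                // write access
        rows.reset();
        while (rows.advance()) y.setValue(intercept + slope * x.getValue());
        return true;
    }
}

/** Hypothetical in-memory data source: one shared row cursor for all columns. */
final class InMemoryProvider implements DataProvider {
    private final int size;
    private int position = -1; // before the first row
    private final Map<String, double[]> columns = new HashMap<>();

    InMemoryProvider(int size) { this.size = size; }
    InMemoryProvider with(String name, double... values) { columns.put(name, values); return this; }
    double[] column(String name) { return columns.get(name); }

    public Domain getDomain(String name) {
        return new Domain() {
            public void reset() { position = -1; }
            public boolean advance() {
                if (position + 1 >= size) return false;
                position++;
                return true;
            }
        };
    }

    public Accessor getAccessor(String name) {
        double[] values = columns.get(name);
        if (values == null) throw new IllegalArgumentException("unknown accessor: " + name);
        return new Accessor() {
            // Length-1 columns are treated as scalars (model elements) that ignore the cursor.
            private int index() { return values.length == 1 ? 0 : position; }
            public double getValue() { return values[index()]; }
            public void setValue(double value) { values[index()] = value; }
        };
    }
}
```

Because the algorithm only asks the provider for accessors and domains by name, the same LinearRegression class could in principle run against a relational or multidimensional source by swapping the provider, which is the separation of data storage and algorithms that the text describes.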

Abstract

A framework is provided that enables data mining algorithms to be plugged into it without any change to algorithm software implementations, while still providing all the standard data mining tasks. It may be implemented by the data source provider. It also allows for the complete separation of data storage and algorithms. When the user initiates a mining session and picks an algorithm for a build task or a model for an apply or test task, the framework may become responsible for preparing a set of “prompts” to the user asking him to provide some expression which is specific to the particular kind of data the user is working with.

Description

FIELD OF THE INVENTION
The present invention relates to the field of data mining. More specifically, the present invention relates to a universal framework for data mining.
BACKGROUND OF THE INVENTION
Data Mining is a common term for the process of finding useful hidden dependencies or patterns in large amounts of data. The process by which such dependencies or patterns are found is typically called an algorithm. Data Mining activity typically follows a certain workflow having several important stages: data preparation, training (also called building), testing, and application. Data preparation involves preparing the data in a format that can be utilized by an algorithm. Training involves the construction of a concise representation of the algorithm's findings about the data, referred to as the mining model. Testing involves validation of that model. Then, application involves utilizing the model to efficiently produce new previously unknown information, such as projecting the data to predict future events.
FIG. 1 is a diagram illustrating the typical organizational flow of data mining. Data 100 is first prepared 102. This may include cleaning up the formatting of the data so that it is in a form usable by the system. Then a user chooses which data to mine 104. This data is fed to a build model method 106, which builds a model based on the data. A test model method 108 then tests the model and determines whether it can be applied to other data. An application method 110 then may apply the model to other data, after which results may be obtained 112.
Data that needs to be mined may originate from a variety of sources. Each data mining algorithm (which describes how to build, test, and apply the model, among other things) may have different requirements for the data format it takes on input, and produces on output. Mining algorithm vendors have struggled to map various data sources to their input/output requirements. Each mining algorithm vendor may create algorithms that build, test, and apply a certain model. Thus far, it has been all but impossible to use the software implementation of an algorithm with a new data source.
What is needed is a solution that allows data mining algorithms from different vendors to be plugged in without any change to the algorithm software implementation, and also could be used to perform all the standard mining tasks.
BRIEF DESCRIPTION
A framework is provided that enables data mining algorithms to be plugged into it without any change to algorithm software implementations, while still providing all the standard data mining tasks. It may be implemented by the data source provider; however, one of ordinary skill in the art will recognize that the invention should not be limited to implementations where it is implemented by the data source provider. It also allows for the complete separation of data storage and algorithms. When the user initiates a mining session and picks an algorithm for a build task or a model for an apply or test task, the framework may become responsible for preparing a set of “prompts” to the user asking him to provide some expression which is specific to the particular kind of data the user is working with.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more embodiments of the present invention and, together with the detailed description, serve to explain the principles and implementations of the invention.
In the drawings:
FIG. 1 is a diagram illustrating the typical organizational flow of data mining.
FIG. 2 is a diagram illustrating some of the objects which may be maintained by a framework in accordance with an embodiment of the present invention.
FIG. 3 is a flow diagram illustrating a method for data mining using an algorithm, the algorithm having one or more tasks, each task having a number of parameters, each parameter having a type, in accordance with an embodiment of the present invention.
FIG. 4 is a block diagram of an apparatus for data mining using an algorithm, the algorithm having one or more tasks, each task having a number of parameters, each parameter having a type, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
Embodiments of the present invention are described herein in the context of a system of computers, servers, and software. Those of ordinary skill in the art will realize that the following detailed description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the present invention will readily suggest themselves to such skilled persons having the benefit of this disclosure. Reference will now be made in detail to implementations of the present invention as illustrated in the accompanying drawings. The same reference indicators will be used throughout the drawings and the following detailed description to refer to the same or like parts.
In the interest of clarity, not all of the routine features of the implementations described herein are shown and described. It will, of course, be appreciated that in the development of any such actual implementation, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with application- and business-related constraints, and that these specific goals will vary from one implementation to another and from one developer to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the art having the benefit of this disclosure.
In accordance with the present invention, the components, process steps, and/or data structures may be implemented using various types of operating systems, computing platforms, computer programs, and/or general purpose machines. In addition, those of ordinary skill in the art will recognize that devices of a less general purpose nature, such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used without departing from the scope and spirit of the inventive concepts disclosed herein.
The present invention may be implemented using Extensible Markup Language (XML). However, one of ordinary skill in the art will recognize that there may be extensible markup languages other than XML, and the term XML in the specification should not be read to be limited to one implementation or version of XML. Additionally, the present invention may be implemented using non-extensible programming languages as well, and an extensible markup language implementation is simply one possibility.
The terms “object”, “class”, and “interface” will be used throughout this document. An object is a self-contained module of data and its associated processing. An object is an instance of a class, which defines a collection of objects that share the same characteristics. An interface describes the methods of an algorithm, so indicating that the algorithm implementation must follow certain rules is the same as indicating that the algorithm class must implement a certain interface.
Data mining algorithms work with data, which also may be referred to as variables called mining attributes. Each mining attribute plays a certain role, for example, it can represent an independent variable (predictor), a derived variable (target), or a model element. It can also represent a collection of variables and/or elements. Each mining attribute's actual data values can be accessed through its accessor. An accessor is an interface featuring methods (such as getValue) used to get values for different data types. These methods may return the data value at the current position. One attribute may refer to many positions in the data, so it is often necessary to have a cursor that points to the current position and a way to iterate through positions. There may be several directions in which iteration is possible, called axes. A typical example is a table where one can iterate over columns as well as over rows. Several axes, each within its own accessor, may form a domain. This means that iteration over the axes is synchronized: changing a position in one of the axes leads to a simultaneous change in other axes within the domain. Thus, iteration actually takes place on the domain. Hence, a domain is a type of interface with typical iterator methods, such as reset, advance, getSize, getPosition, setPosition, etc.
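The relationship between accessors and a domain described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the patented implementation: the names DataAccessor and DomainIterator are borrowed from the embodiment described later in this document, while ArrayDomain, ArrayAccessor, AccessorDemo, and the single double-valued getValue method are hypothetical simplifications. The point shown is that the cursor lives in the domain, so every accessor bound to that domain moves in step.

```java
// Hypothetical sketch: the domain owns the cursor; accessors read at that cursor.
interface DomainIterator {
    void reset();                    // move the cursor to the first position
    boolean advance();               // step forward; false if already at the last position
    int getSize();
    int getPosition();
    void setPosition(int position);
}

interface DataAccessor {
    double getValue();               // value at the domain's current position
}

// In-memory domain over positions 0..size-1 (assumption: dense integer positions).
class ArrayDomain implements DomainIterator {
    private final int size;
    private int position = 0;

    ArrayDomain(int size) { this.size = size; }

    public void reset() { position = 0; }
    public boolean advance() {
        if (position + 1 >= size) return false;
        position++;
        return true;
    }
    public int getSize() { return size; }
    public int getPosition() { return position; }
    public void setPosition(int position) {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        this.position = position;
    }
}

// Accessor bound to a domain; it keeps no cursor of its own.
class ArrayAccessor implements DataAccessor {
    private final double[] values;
    private final DomainIterator domain;

    ArrayAccessor(double[] values, DomainIterator domain) {
        if (values.length != domain.getSize()) throw new IllegalArgumentException("size mismatch");
        this.values = values;
        this.domain = domain;
    }
    public double getValue() { return values[domain.getPosition()]; }
}

class AccessorDemo {
    // Iterates the shared domain once and reads both accessors at each position.
    static double dotProduct(DataAccessor a, DataAccessor b, DomainIterator domain) {
        if (domain.getSize() == 0) return 0.0;
        double sum = 0.0;
        domain.reset();
        do {
            sum += a.getValue() * b.getValue();
        } while (domain.advance());
        return sum;
    }

    public static void main(String[] args) {
        ArrayDomain time = new ArrayDomain(3);
        DataAccessor predictor = new ArrayAccessor(new double[] {1, 2, 3}, time);
        DataAccessor target = new ArrayAccessor(new double[] {2, 4, 6}, time);
        System.out.println(dotProduct(predictor, target, time));
    }
}
```

Because neither accessor holds a position, a single call to setPosition or advance on the domain changes what both accessors return, which is the synchronization the paragraph describes.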
The present invention provides a framework that enables data mining algorithms to be plugged into it without any change to algorithm software implementations, while still providing all the standard data mining tasks. It may be implemented by the data source provider; however, one of ordinary skill in the art will recognize that the invention should not be limited to such implementations. It also allows for the complete separation of data storage and algorithms.
The framework itself is a generic tool for performing data mining. Data mining operations are performed by mining algorithms that the framework initially does not know about but is able to supply with data according to their signatures. The signature determines which data is required and what its logical structure is but does not make any assumptions as to how this data is physically organized.
When the user initiates a mining session and picks an algorithm for a build task or a model for an apply or test task, the framework becomes responsible for preparing a set of “prompts” to the user asking him to provide some expression which is specific to the particular kind of data source that the user is working with. This means that while an algorithm would not change depending on the kind of data source, the framework's data access layer typically needs to be implemented for each particular kind of data source.
A specific embodiment of the present invention is a particular variation of the data mining framework described herein that implements a data access layer to multidimensional data. As a language for implementation it uses a mixture of the C and Java programming languages. The interfaces described in the invention, such as DataProvider, DataAccessor, DomainIterator, etc., are implemented in Java and expect the algorithms to implement the Algorithm interface in Java as well, although the implementation of the latter interface in Java could be just a wrapper on top of C or C++ code. This implementation of the framework allows the algorithms to access data directly within the server process, thereby avoiding movement of large amounts of data across the network. The user may also be provided with the ability to perform data mining related activities via a graphical user interface built according to the principles described in the invention.
This implementation in particular proves the validity of the framework design and the fact that the framework actually works and satisfies the requirements stated in the invention description. Algorithm implementations may cover the main kinds of data mining algorithms such as regression, clustering, neural networks, association rules, decision trees, naive Bayes, etc. Despite the very different nature of these algorithms, they all work well within the framework.
The algorithm software developer may implement an interface with methods build, test, apply, etc., one for each mining task, as well as methods setParameterValue, one for each supported parameter type, and method getSignature. The mining task methods may each take one parameter of the type DataProvider, which is another interface whose only purpose is to let the algorithm obtain the objects implementing the accessor and domain interfaces by name for each accessor and domain involved in the task. The task methods may return success statuses (e.g., true or false). The setParameterValue methods may each take two arguments, one being the name of the parameter and the other being of one of several supported types (e.g., double, integer, boolean, text string, etc.). The purpose of these methods is to let the framework communicate to the algorithm the values of parameters which may be required for the particular task invocation. A getSignature method of the algorithm may take no arguments and return the signature object.
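The contract described above might look roughly as follows in Java. This is a hedged sketch: the interface names Algorithm, DataProvider, DataAccessor, and DomainIterator appear in this document, but every method signature here is an assumption, the accessor and domain interfaces are left empty for brevity, the signature is returned as a string rather than an object, and MapDataProvider and RecordingAlgorithm are hypothetical helpers used only to exercise the interfaces.

```java
import java.util.HashMap;
import java.util.Map;

// Left empty here; a real accessor or domain would expose value and iteration methods.
interface DataAccessor { }
interface DomainIterator { }

// Lets the algorithm look up accessors and domains by name.
interface DataProvider {
    DataAccessor getAccessor(String name);
    DomainIterator getDomain(String name);
}

interface Algorithm {
    boolean build(DataProvider data);   // each task method returns a success status
    boolean test(DataProvider data);
    boolean apply(DataProvider data);
    void setParameterValue(String name, double value);
    void setParameterValue(String name, int value);
    void setParameterValue(String name, boolean value);
    void setParameterValue(String name, String value);
    String getSignature();              // assumption: the signature is returned as XML text
}

// Hypothetical provider backed by two maps.
class MapDataProvider implements DataProvider {
    private final Map<String, DataAccessor> accessors = new HashMap<>();
    private final Map<String, DomainIterator> domains = new HashMap<>();

    void addAccessor(String name, DataAccessor accessor) { accessors.put(name, accessor); }
    void addDomain(String name, DomainIterator domain) { domains.put(name, domain); }
    public DataAccessor getAccessor(String name) { return accessors.get(name); }
    public DomainIterator getDomain(String name) { return domains.get(name); }
}

// Hypothetical algorithm that only records what the framework passes to it.
class RecordingAlgorithm implements Algorithm {
    final Map<String, Object> parameters = new HashMap<>();

    public boolean build(DataProvider data) {
        // Succeeds only if the framework supplied the accessor this task needs.
        return data.getAccessor("predictor") != null;
    }
    public boolean test(DataProvider data) { return false; }   // not implemented in this sketch
    public boolean apply(DataProvider data) { return false; }  // not implemented in this sketch

    public void setParameterValue(String name, double value) { parameters.put(name, value); }
    public void setParameterValue(String name, int value) { parameters.put(name, value); }
    public void setParameterValue(String name, boolean value) { parameters.put(name, value); }
    public void setParameterValue(String name, String value) { parameters.put(name, value); }

    // Placeholder text only; it is not a valid signature.
    public String getSignature() { return "<algorithm function='Regression'/>"; }
}
```

The algorithm never sees how the provider obtained its accessors, which is what allows the same algorithm class to run against different kinds of data sources.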
A signature is used to describe parameters required by tasks in an algorithm. This signature describes not only the number and type of parameters, but also may include an information field, which is utilized to describe some or all of the functionality of each parameter. The functionality will typically include the meaning of the parameter and/or the recommended usage of the parameter. The system may utilize the signature for a particular algorithm to create a template for each task. The template may indicate one or more fields that need to be initialized by the user to invoke the task, as well as information retrieved from the information field. A graphical user interface may then be generated using the template, where the user can initialize the fields by indicating a mapping between the terms of the task and the actual data source. This allows each algorithm the luxury of ignoring the complexity of the data, and simply dealing with the mapping it is passed.
FIG. 2 is a diagram illustrating some of the objects which may be maintained by a framework in accordance with an embodiment of the present invention. Algorithms 200 and filters 202 may be framework objects but may actually be maintained externally. Algorithms have been described earlier in this document, and filters are functions that perform transformations on data. After the build method has been executed, a model object 204 may be maintained by the framework. Once a template is created, a template object 206 may also then be maintained by the framework. At runtime, a session (runtime object) 208 may be maintained. After the apply method is performed, results 210 may be maintained. Functions available for objects in the framework may include list, add, and remove.
In an embodiment of the present invention, the signature may be implemented as a text string in a file in accordance with a certain XML format. The XML format may be defined in an XML document type definition (DTD), such as:
<!ELEMENT algorithm (information, domain*, attribute*, task+)>
<!ATTLIST algorithm
function (Regression | Clustering | DecisionTree | NeuralNet |
AssociationRules) #REQUIRED
vector (false | true) 'false'
>
<!ELEMENT information (#PCDATA)>
<!ELEMENT task (information, parameter*, accessor*)>
<!ATTLIST task
mode (build | test | apply | import | export | score) #REQUIRED
>
<!ELEMENT parameter (information, itemlist*)>
<!ELEMENT itemlist (item+)>
<!ELEMENT item (information*)>
<!ATTLIST item
value NMTOKEN #REQUIRED
description CDATA #IMPLIED
>
<!ATTLIST parameter
name NMTOKEN #REQUIRED
type (double | integer | string | boolean | enum) 'double'
value NMTOKEN #REQUIRED
>
<!ELEMENT attribute (information, axis*)>
<!ATTLIST attribute
name NMTOKEN #REQUIRED
type (numerical | categorical | ordinal | boolean) 'numerical'
parameter NMTOKEN #IMPLIED
role (predictor | target | model) #IMPLIED
>
<!ELEMENT domain (information*)>
<!ATTLIST domain
name NMTOKEN #REQUIRED
type (regular | attribute | sequence | modelData | internal) 'regular'
size NMTOKEN #IMPLIED
>
<!ELEMENT accessor (information*)>
<!ATTLIST accessor
attribute NMTOKEN #REQUIRED
mode (read | write) #IMPLIED
>
<!ELEMENT axis (information*)>
<!ATTLIST axis
domain NMTOKEN #REQUIRED
>
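A signature could be checked against such a DTD with a validating XML parser. The Java sketch below is one possible way to do this and is not taken from the patent: SignatureValidator and isValid are hypothetical names, the DTD string is a compacted copy of the declarations above written with plain ASCII quotes, and the signature text is assumed to carry no XML declaration or DOCTYPE of its own, so the DTD can be attached as an internal subset.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class SignatureValidator {
    // Compacted copy of the signature DTD, attached below as an internal DTD subset.
    static final String DTD = String.join("\n",
        "<!ELEMENT algorithm (information, domain*, attribute*, task+)>",
        "<!ATTLIST algorithm",
        "  function (Regression|Clustering|DecisionTree|NeuralNet|AssociationRules) #REQUIRED",
        "  vector (false|true) 'false'>",
        "<!ELEMENT information (#PCDATA)>",
        "<!ELEMENT task (information, parameter*, accessor*)>",
        "<!ATTLIST task mode (build|test|apply|import|export|score) #REQUIRED>",
        "<!ELEMENT parameter (information, itemlist*)>",
        "<!ELEMENT itemlist (item+)>",
        "<!ELEMENT item (information*)>",
        "<!ATTLIST item value NMTOKEN #REQUIRED description CDATA #IMPLIED>",
        "<!ATTLIST parameter name NMTOKEN #REQUIRED",
        "  type (double|integer|string|boolean|enum) 'double' value NMTOKEN #REQUIRED>",
        "<!ELEMENT attribute (information, axis*)>",
        "<!ATTLIST attribute name NMTOKEN #REQUIRED",
        "  type (numerical|categorical|ordinal|boolean) 'numerical'",
        "  parameter NMTOKEN #IMPLIED role (predictor|target|model) #IMPLIED>",
        "<!ELEMENT domain (information*)>",
        "<!ATTLIST domain name NMTOKEN #REQUIRED",
        "  type (regular|attribute|sequence|modelData|internal) 'regular' size NMTOKEN #IMPLIED>",
        "<!ELEMENT accessor (information*)>",
        "<!ATTLIST accessor attribute NMTOKEN #REQUIRED mode (read|write) #IMPLIED>",
        "<!ELEMENT axis (information*)>",
        "<!ATTLIST axis domain NMTOKEN #REQUIRED>");

    // Returns true if the signature is well-formed and valid against the DTD.
    static boolean isValid(String signature) {
        String document = "<!DOCTYPE algorithm [\n" + DTD + "\n]>\n" + signature;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are recoverable by default, so turn them into failures.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(document)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String signature = String.join("\n",
            "<algorithm function='Regression'>",
            "  <information>Linear regression</information>",
            "  <domain name='time'><information>Row positions</information></domain>",
            "  <attribute name='predictor' role='predictor'>",
            "    <information>Independent variable</information><axis domain='time'/>",
            "  </attribute>",
            "  <task mode='build'>",
            "    <information>Fit the line</information>",
            "    <accessor attribute='predictor' mode='read'/>",
            "  </task>",
            "</algorithm>");
        System.out.println(isValid(signature));
    }
}
```

The error handler matters: a validating parser reports a DTD violation as a recoverable error, so without a handler that rethrows, an invalid signature would still parse.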
FIG. 3 is a flow diagram illustrating a method for data mining using an algorithm, the algorithm having one or more tasks, each task having a number of parameters, each parameter having a type, in accordance with an embodiment of the present invention. The algorithm has been assigned some unique name under which it becomes known to the users. A signature has also been created for the algorithm by the algorithm developer. To invoke a particular mining task, the user first chooses a (previously built) model to apply. Then, from the model, the framework determines which algorithm should be used to apply it, specifying the algorithm name. Thus, at 300, the framework receives the algorithm name. At 302, the signature for the algorithm with that algorithm name is retrieved. At 304, this signature may be compared with an XML DTD to determine if it is of the proper format. If not, then the algorithm is not supported by the framework and the data mining task cannot proceed. If it is supported, however, at 306 the framework extracts the information from the signature (what parameters and accessors are required for the task, what structure (domains) those accessors have, what role they play, etc.). At 308, the framework uses this information to create a mining task template for the particular mining task and the particular algorithm. The template is a specification of all the fields that need to be initialized by the user to invoke the task, together with the information about their recommended usage. At 310, the framework may generate a graphical user interface (GUI) having a graphical dialog in which the user can initialize the required fields. This may include prompting the user to provide a mapping from terms of the algorithm to an actual data source. At 312, the framework may dynamically create accessor and domain objects in response to the user-initialized required fields. 
Then, at 314, the framework may assemble the accessor and domain objects into a data provider object. At 316, the framework may call an appropriate setParameterValue or similar method of the algorithm if any parameters need to be initialized. Then at 318, the framework may call the appropriate task method with the data provider object as the argument. This completes the mining task.
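The extraction and template-creation steps (306 and 308) can be sketched as follows. This is only an illustration under stated assumptions: TemplateField, TemplateBuilder, and createTemplate are hypothetical names, the template is reduced to a flat list of fields, and the signature is assumed to have already passed validation, so the sketch parses it without a DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical template entry: one field the user must initialize, plus its usage hint.
class TemplateField {
    final String kind;   // "parameter" or "accessor"
    final String name;
    final String hint;   // text of the <information> child, or "" if absent

    TemplateField(String kind, String name, String hint) {
        this.kind = kind;
        this.name = name;
        this.hint = hint;
    }
}

class TemplateBuilder {
    // Collects the parameters and accessors declared for the task with the given mode.
    static List<TemplateField> createTemplate(String signatureXml, String mode) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(signatureXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("signature is not well-formed XML", e);
        }
        List<TemplateField> fields = new ArrayList<>();
        NodeList tasks = doc.getElementsByTagName("task");
        for (int i = 0; i < tasks.getLength(); i++) {
            Element task = (Element) tasks.item(i);
            if (!mode.equals(task.getAttribute("mode"))) continue;
            for (Node n = task.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element)) continue;
                Element e = (Element) n;
                if (e.getTagName().equals("parameter")) {
                    fields.add(new TemplateField("parameter", e.getAttribute("name"), hintOf(e)));
                } else if (e.getTagName().equals("accessor")) {
                    fields.add(new TemplateField("accessor", e.getAttribute("attribute"), hintOf(e)));
                }
            }
        }
        return fields;
    }

    // Returns the trimmed text of the first <information> child, or "" if there is none.
    private static String hintOf(Element element) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals("information")) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

A dialog generator could then render one input per TemplateField, showing the hint text next to it, which corresponds to step 310.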
FIG. 4 is a block diagram of an apparatus for data mining using an algorithm, the algorithm having one or more tasks, each task having a number of parameters, each parameter having a type, in accordance with an embodiment of the present invention. The algorithm has been assigned some unique name under which it becomes known to the users. A signature has also been created for the algorithm by the algorithm developer. To invoke a particular mining task, the user first chooses a (previously built) model to apply. Then, from the model, the framework determines which algorithm should be used to apply it, specifying the algorithm name. A signature information field receiver 400 may retrieve the signature for the algorithm with that algorithm name. A signature verifier 402 coupled to the signature information field receiver 400 may compare the signature with an XML DTD to determine if it is of the proper format. If not, then the algorithm is not supported by the framework and the data mining task cannot proceed. If it is supported, however, the framework extracts the information from the signature (what parameters and accessors are required for the task, what structure (domains) those accessors have, what role they play, etc.). A task template creator 404 coupled to the signature information field receiver 400 may use this information to create a mining task template for the particular mining task and the particular algorithm. The template is a specification of all the fields that need to be initialized by the user to invoke the task, together with the information about their recommended usage. A graphical dialog generator 406 coupled to the task template creator 404 may generate a graphical user interface (GUI) having a graphical dialog in which the user can initialize the required fields. This may include prompting the user to provide a mapping from terms of the algorithm to an actual data source. 
An accessor and domain objects dynamic creator 408 coupled to the graphical dialog generator 406 may dynamically create accessor and domain objects in response to the user-initialized required fields. Then, an accessor and domain object data provider object assembler 410 coupled to the accessor and domain objects dynamic creator 408 may assemble the accessor and domain objects into a data provider object. The framework may call an appropriate setParameterValue or similar method of the algorithm if any parameters need to be initialized. Then an algorithm caller 412 coupled to the accessor and domain object data provider object assembler 410 may call the appropriate task method of the algorithm with the data provider object as the argument. This completes the mining task.
As described above, the framework prompts the user for a mapping from the terms of the algorithm to the actual data store. This mapping is transparent for the algorithm, but the framework uses it to construct the accessor and domain objects. The way that the mapping is created depends on the data access mechanism for the data store. For example, when dealing with relational sources, a Structured Query Language (SQL) mapping may be used, but for multidimensional databases, some multidimensional query language mapping may be used. In either case, the user provides some source specific linguistic expression or source specific “query object” for each axis in each accessor.
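One way to picture the mapping is as a table from each (accessor, axis) pair to a source-specific expression. The Java sketch below is a hypothetical illustration only: SourceMapping and its methods are invented names, expressions are kept as opaque strings, and the SQL text and the table name sales_history in the usage example are made up for a relational source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the user's mapping from algorithm terms to a data source.
class SourceMapping {
    // Key is "accessor/axis"; value is a source-specific expression kept as opaque text.
    private final Map<String, String> expressions = new LinkedHashMap<>();

    static String key(String accessor, String axis) { return accessor + "/" + axis; }

    void map(String accessor, String axis, String expression) {
        expressions.put(key(accessor, axis), expression);
    }

    // Returns the expression for the pair, or null if the user has not mapped it.
    String expressionFor(String accessor, String axis) {
        return expressions.get(key(accessor, axis));
    }

    // Returns the required "accessor/axis" keys that still have no expression.
    List<String> missing(List<String> requiredKeys) {
        List<String> result = new ArrayList<>();
        for (String required : requiredKeys) {
            if (!expressions.containsKey(required)) result.add(required);
        }
        return result;
    }
}

class SourceMappingDemo {
    public static void main(String[] args) {
        SourceMapping mapping = new SourceMapping();
        // Made-up SQL for a relational source; a multidimensional source would use its own query language.
        mapping.map("predictor", "domain1", "SELECT month_index FROM sales_history ORDER BY month_index");
        List<String> required = new ArrayList<>();
        required.add(SourceMapping.key("predictor", "domain1"));
        required.add(SourceMapping.key("target", "domain3"));
        System.out.println(mapping.missing(required));
    }
}
```

Keeping the expressions opaque reflects the statement above that the mapping is transparent to the algorithm: only the framework's data access layer interprets them.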
Mining objects such as models and result sets are usually stored at the site of the original data source. Therefore, it is the algorithm's responsibility to make sure that they are persisted. Since their structure is described in the algorithm signature, the framework has all the necessary information to build the corresponding accessors with “write access” mode, so the algorithm can use those accessors to save the objects. The framework may capture the expressions entered by the user (or constructed internally) so that the mining objects can be located and retrieved at any time.
The major mining tasks (build, test, and apply) have been described above. Each algorithm typically must support at least these three tasks. However, there may be other tasks that make sense within the framework, for instance exporting mining models to and importing mining models from Predictive Model Markup Language (PMML). The former takes a model object built by a particular algorithm and represents it in PMML format whereas the latter takes a model in PMML format and creates a model object that can be used for application purposes within the framework. These “exchange” tasks should also be described in the algorithm signature if the algorithm supports them. More tasks can be easily added to the framework workflow as it evolves.
Since the framework captures all the information regarding the location of a mining object in the data store (called object metadata), and each such object is uniquely named, it is possible for the user to query the objects through the framework. The framework uses the object metadata and the regular means available in the particular data store (such as SQL or a multidimensional query language) to retrieve the object data. This way, although the object's signature may be specific to the algorithm used, it can be queried and retrieved in standard format without the algorithm. This provides great flexibility to the client tools because they do not have to worry about how to access mining objects.
An example is hereby provided to help illustrate some of the terms utilized in this document. One of ordinary skill in the art will recognize that this is merely an example, and is not intended to be limiting in any way. Suppose the algorithm is a linear regression algorithm. Linear regression attempts to determine the equation of a line that best represents a series of data points. The equation of the line may be described using two coefficients, slope and intercept. This line may then be used to predict future data points. If the initial training data points are represented as $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is a predictor and $y_i$ is a target, the algorithm uses their values to produce appropriate coefficients. These coefficients comprise a model of the training dataset.
Thus, to predict a value in the future, a value for $x_i$ may be plugged in, resulting in a predicted value for $y_i$ derived using a formula involving the coefficients.
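As a concrete illustration of the build and apply steps in this example, the Java sketch below fits a line by ordinary least squares and then uses it for prediction. The patent does not say which fitting method is used, so least squares is an assumption here, and the names LinearModel, fit, and predict are hypothetical.

```java
// Hypothetical model object holding the two coefficients of the fitted line.
class LinearModel {
    final double slope;
    final double intercept;

    LinearModel(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    // "Apply": predicted target for a given predictor value.
    double predict(double x) {
        return slope * x + intercept;
    }

    // "Build": ordinary least squares fit (an assumed method) over paired arrays x and y.
    static LinearModel fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired points");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0;   // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0;   // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all predictor values are equal");
        }
        double slope = sxy / sxx;
        return new LinearModel(slope, meanY - slope * meanX);
    }

    public static void main(String[] args) {
        LinearModel model = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println(model.slope + " " + model.intercept + " " + model.predict(5));
    }
}
```

In framework terms, fit reads the predictor and target accessors and writes slope and intercept, while predict reads slope, intercept, and predictor and writes target.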
The knowledge of which data must be provided as input and which coefficients will be produced for the model lies exclusively with the algorithm (and not with the framework). Its signature file may then contain information regarding various accessors, including predictor, target, slope, and intercept. For the build task, which aims at deriving the model coefficients, accessor predictor may have two parameters, indicated in the signature file as domain 1 and domain 2. This, therefore, indicates that predictor has two parameters, and that they are different from each other. The graphic dialog may prompt for the mapping of these two parameters, to which the user may respond with “time” (for domain 1), which indicates where to take the data from, and “I=1 . . . N” or “from January 2001 to September 2002” (for domain 2), which indicates how to navigate through time data. Accessor target may also have two parameters, indicated in the signature file as domain 3 and domain 2. This indicates that the second parameter for target is navigated simultaneously with the second parameter for predictor. The mapping of domain 3 may be to “sales”, whereas the mapping for domain 2 may remain “I=1 . . . N”. Additionally, accessors with write access for slope and for intercept may be provided, each having a single parameter. The names of the accessors (e.g., predictor, target) may also indicate the role of the accessor. The described information is sufficient to perform the build task. The signature for the apply task may contain information regarding similar accessors, except that this time, slope and intercept as well as predictor may indicate read access and target may indicate write access. Or, the apply task may contain some other accessors describing the output of the algorithm, for instance, expected precision of the line fit, various statistics about the algorithm execution, model characteristics, etc.
While embodiments and applications of this invention have been shown and described, it would be apparent to those skilled in the art having the benefit of this disclosure that many more modifications than mentioned above are possible without departing from the inventive concepts herein. The invention, therefore, is not to be restricted except in the spirit of the appended claims.

Claims (31)

1. A method for data mining using an algorithm, the algorithm having a build task, a test task, and an apply task, each task having a number of parameters, each parameter having a type, the method comprising:
retrieving a signature associated with the algorithm, said signature including, for the build task, the number of parameters and the type of each parameter associated with said task, as well as an information field for each parameter associated with said task, said information field indicating the meaning and/or recommended usage of said parameter, said signature also including, for the build task, one or more coefficients for the algorithm; and
creating a template for the build task based on said signature, said template indicating one or more of said parameters that need to be initialized by a user to invoke said task and one or more model values that are to be derived from a data set; and
executing said template to create a mapping between said one or more coefficients and said one or more model values.
2. The method of claim 1, further comprising:
verifying said signature.
3. The method of claim 2, wherein said verifying comprises:
comparing said signature to a predefined standard format for said signature; and
rejecting said signature if said signature does not match said predefined standard format.
4. The method of claim 3, wherein said predefined standard format is a document type definition.
5. The method of claim 1, further comprising:
generating a graphical dialog based on said template, said graphical dialog allowing a user to initialize required parameters.
6. The method of claim 5, further comprising:
dynamically creating accessor and domain objects in response to user-initialized required parameters; and
assembling said accessor and domain objects into a data provider object.
7. The method of claim 6, further comprising calling the algorithm using said data provider object as an argument.
8. The method of claim 6, wherein said graphical dialog prompts a user to provide a mapping from terms of the algorithm to an actual data source.
9. The method of claim 8, wherein said terms of the algorithm include accessors and domains.
10. The method of claim 9, wherein said graphical dialog further prompts a user to provide a source specific linguistic expression for each axis in each accessor.
11. The method of claim 10, wherein said source specific linguistic expression is a source specific query object.
12. A method for data mining using an algorithm, the algorithm having a build task, a test task, and an apply task, each task having a number of parameters, each parameter having a type, the method comprising:
retrieving a signature associated with the algorithm, said signature including, for the build task, the number of parameters and the type of each parameter associated with said task, as well as an information field for each parameter associated with said task, said information field indicating the meaning and/or recommended usage of said parameter, said signature also including, for the build task, one or more coefficients for the algorithm;
creating a template for the build task based on said signature, said template indicating one or more of said parameters that need to be initialized by a user to invoke said task and one or more model values that are to be derived from a data set; and
executing said template to create a mapping between said one or more coefficients and said one or more model values, said execution generating a set of prompts asking said user to provide some expression specific to a data source said user is working with.
13. An apparatus for data mining using an algorithm, the algorithm having a build task, a test task, and an apply task, each task having a number of parameters, each parameter having a type, the apparatus comprising:
a signature information field receiver configured to retrieve a signature associated with the algorithm, said signature including, for the build task, the number of parameters and the type of each parameter associated with said task, as well as an information field for each parameter associated with said task, said information field indicating the meaning and/or recommended usage of said parameter, said signature also including, for the build task, one or more coefficients for the algorithm;
a task template creator coupled to said signature information field receiver, wherein the task template creator is configured to create a template for the build task based on said signature, said template indicating one or more of said parameters that need to be initialized by a user to invoke said task and one or more model values that are to be derived from a data set; and
an execution module coupled to the task template creator, wherein the execution module is configured to execute said template to create a mapping between said one or more coefficients and said one or more model values.
14. The apparatus of claim 13, further comprising a signature verifier coupled to said signature information field receiver.
15. The apparatus of claim 13, further comprising a graphical dialog generator coupled to said task template creator.
16. The apparatus of claim 15, further comprising:
an accessor and domain objects dynamic creator coupled to said graphic dialog generator; and
an accessor and domain object data provider object assembler coupled to said accessor and domain objects dynamic creator.
17. The apparatus of claim 16, further comprising an algorithm caller coupled to said accessor and domain object data provider object assembler.
18. An apparatus for data mining using an algorithm, the algorithm having a build task, a test task, and an apply task, each task having a number of parameters, each parameter having a type, the apparatus comprising:
means for retrieving a signature associated with the algorithm, said signature including, for the build task, the number of parameters and the type of each parameter associated with said task, as well as an information field for each parameter associated with said task, said information field indicating the meaning and/or recommended usage of said parameter, said signature also including, for the build task, one or more coefficients for the algorithm;
means for creating a template for the build task based on said signature, said template indicating one or more of said parameters that need to be initialized by a user to invoke said task and one or more model values that are to be derived from a data set; and
means for executing said template to create a mapping between said one or more coefficients and said one or more model values.
19. The apparatus of claim 18, further comprising:
means for verifying said signature.
20. The apparatus of claim 19, wherein said verifying comprises:
means for comparing said signature to a predefined standard format for said signature; and
means for rejecting said signature if said signature does not match said predefined standard format.
21. The apparatus of claim 20, wherein said predefined standard format is a document type definition.
22. The apparatus of claim 18, further comprising:
means for generating a graphical dialog based on said template, said graphical dialog allowing a user to initialize required parameters.
23. The apparatus of claim 22, further comprising:
means for dynamically creating accessor and domain objects in response to user-initialized required parameters; and
means for assembling said accessor and domain objects into a data provider object.
24. The apparatus of claim 23, further comprising means for calling the algorithm using said data provider object as an argument.
25. The apparatus of claim 24, wherein said graphical dialog prompts a user to provide a mapping from terms of the algorithm to an actual data source.
26. The apparatus of claim 25, wherein said terms of the algorithm include accessors and domains.
27. The apparatus of claim 26, wherein said graphical dialog further prompts a user to provide a source specific linguistic expression for each axis in each accessor.
28. The apparatus of claim 27, wherein said source specific linguistic expression is a source specific query object.
29. An apparatus for data mining using an algorithm, the algorithm having a build task, a test task, and an apply task, each task having a number of parameters, each parameter having a type, the apparatus comprising:
means for retrieving a signature associated with the algorithm, said signature including, for the build task, the number of parameters and the type of each parameter associated with said task, as well as an information field for each parameter associated with said task, said information field indicating the meaning and/or recommended usage of said parameter, said signature also including, for the build task, one or more coefficients for the algorithm;
means for creating a template for the build task based on said signature, said template indicating one or more of said parameters that need to be initialized by a user to invoke said task and one or more model values that are to be derived from a data set; and
means for executing said template to create a mapping between said one or more coefficients and said one or more model values, said execution generating a set of prompts asking said user to provide some expression specific to a data source said user is working with.
30. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a method for data mining using an algorithm, the algorithm having a build task, a test task, and an apply task, each task having a number of parameters, each parameter having a type, the method comprising:
retrieving a signature associated with the algorithm, said signature including, for the build task, the number of parameters and the type of each parameter associated with said task, as well as an information field for each parameter associated with said task, said information field indicating the meaning and/or recommended usage of said parameter, said signature also including, for the build task, one or more coefficients for the algorithm; and
creating a template for the build task based on said signature, said template indicating one or more of said parameters that need to be initialized by a user to invoke said task and one or more model values that are to be derived from a data set; and
executing said template to create a mapping between said one or more coefficients and said one or more model values.
31. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a method for data mining using an algorithm, the algorithm having a build task, a test task, and an apply task, each task having a number of parameters, each parameter having a type, the method comprising:
retrieving a signature associated with the algorithm, said signature including, for the build task, the number of parameters and the type of each parameter associated with said task, as well as an information field for each parameter associated with said task, said information field indicating the meaning and/or recommended usage of said parameter, said signature also including, for the build task, one or more coefficients for the algorithm;
creating a template for the build task based on said signature, said template indicating one or more of said parameters that need to be initialized by a user to invoke said task and one or more model values that are to be derived from a data set; and
executing said template to create a mapping between said one or more coefficients and said one or more model values, said execution generating a set of prompts asking said user to provide some expression specific to a data source said user is working with.
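As an illustration only, and not as part of the claims, the flow recited in claims 30 and 31 (retrieve a signature, create a build-task template from it, execute the template to produce prompts and a coefficient-to-model-value mapping) might be sketched in Python as follows. Every class, function, and field name below is hypothetical, and pairing coefficients with model values by position is an assumption; the claims do not state how the mapping is formed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ParameterSpec:
    """One build-task parameter as described by the signature (hypothetical)."""
    name: str
    kind: type   # the parameter's type
    info: str    # information field: meaning and/or recommended usage


@dataclass
class Signature:
    """Hypothetical signature for the build task of one algorithm."""
    algorithm: str
    build_parameters: list
    coefficients: list


@dataclass
class Template:
    """Parameters the user must initialize, plus model values derived from data."""
    parameters: list
    model_values: list


def create_template(signature: Signature, model_values: list) -> Template:
    # The template is derived from the signature: it carries over the
    # build-task parameters and records the model values to be derived.
    return Template(list(signature.build_parameters), list(model_values))


def execute_template(signature: Signature, template: Template,
                     ask: Callable[[str], str]):
    """Prompt for each parameter, then map coefficients to model values.

    `ask` stands in for whatever user interface collects the answers.
    """
    prompts, settings = [], {}
    for spec in template.parameters:
        prompt = f"{spec.name} ({spec.kind.__name__}): {spec.info}"
        prompts.append(prompt)
        # Convert the user's answer to the type declared in the signature.
        settings[spec.name] = spec.kind(ask(prompt))
    if len(signature.coefficients) != len(template.model_values):
        raise ValueError("coefficients and model values must pair up")
    # Assumption: coefficients and model values are paired by position.
    mapping = dict(zip(signature.coefficients, template.model_values))
    return mapping, settings, prompts


# Usage with canned answers in place of an interactive user.
sig = Signature(
    algorithm="linear_regression",
    build_parameters=[ParameterSpec(
        "predictor", str, "expression selecting the predictor in the data source")],
    coefficients=["slope", "intercept"],
)
tmpl = create_template(sig, ["model.slope", "model.intercept"])
mapping, settings, prompts = execute_template(sig, tmpl, lambda prompt: "Sales")
print(mapping, settings, prompts)
```

Passing the prompt handler in as a callable keeps the sketch testable without a real user interface; an interactive version could pass `input` instead.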
US10/295,593 2002-11-14 2002-11-14 Data mining framework using a signature associated with an algorithm Expired - Lifetime US7024417B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/295,593 US7024417B1 (en) 2002-11-14 2002-11-14 Data mining framework using a signature associated with an algorithm

Publications (1)

Publication Number Publication Date
US7024417B1 (en) 2006-04-04

Family

ID=36102121

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/295,593 Expired - Lifetime US7024417B1 (en) 2002-11-14 2002-11-14 Data mining framework using a signature associated with an algorithm

Country Status (1)

Country Link
US (1) US7024417B1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787425A (en) * 1996-10-01 1998-07-28 International Business Machines Corporation Object-oriented data mining framework mechanism
US5878432A (en) * 1996-10-29 1999-03-02 International Business Machines Corporation Object oriented framework mechanism for a source code repository
US6108004A (en) * 1997-10-21 2000-08-22 International Business Machines Corporation GUI guide for data mining
US20020184610A1 (en) * 2001-01-22 2002-12-05 Kelvin Chong System and method for building multi-modal and multi-channel applications
US6618852B1 (en) * 1998-09-14 2003-09-09 Intellichem, Inc. Object-oriented framework for chemical-process-development decision-support applications

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fortin et al., "An object-oriented approach to multi-level association rule mining", Proceedings of the Fifth International Conference on Information and Knowledge Management, Rockville, Maryland, United States, pp. 65-72, 1996. *
Mitchell et al., "A framework for user-interfaces to databases", Proceedings of the Workshop on Advanced Visual Interfaces, Gubbio, Italy, pp. 81-90. *

Cited By (84)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7779027B2 (en) 2000-06-21 2010-08-17 Microsoft Corporation Methods, systems, architectures and data structures for delivering software via a network
US7818677B2 (en) 2000-06-21 2010-10-19 Microsoft Corporation Single window navigation methods and systems
US7712048B2 (en) 2000-06-21 2010-05-04 Microsoft Corporation Task-sensitive methods and systems for displaying command sets
US7673227B2 (en) 2000-06-21 2010-03-02 Microsoft Corporation User interface for integrated spreadsheets and word processing tables
US7900134B2 (en) 2000-06-21 2011-03-01 Microsoft Corporation Authoring arbitrary XML documents using DHTML and XSLT
US20050044486A1 (en) * 2000-06-21 2005-02-24 Microsoft Corporation User interface for integrated spreadsheets and word processing tables
US7689929B2 (en) 2000-06-21 2010-03-30 Microsoft Corporation Methods and systems of providing information to computer users
US9507610B2 (en) 2000-06-21 2016-11-29 Microsoft Technology Licensing, Llc Task-sensitive methods and systems for displaying command sets
US7743063B2 (en) 2000-06-21 2010-06-22 Microsoft Corporation Methods and systems for delivering software via a network
US7979856B2 (en) 2000-06-21 2011-07-12 Microsoft Corporation Network-based software extensions
US8074217B2 (en) 2000-06-21 2011-12-06 Microsoft Corporation Methods and systems for delivering software
US8918729B2 (en) 2003-03-24 2014-12-23 Microsoft Corporation Designing electronic forms
US7925621B2 (en) 2003-03-24 2011-04-12 Microsoft Corporation Installing a solution
US7865477B2 (en) 2003-03-28 2011-01-04 Microsoft Corporation System and method for real-time validation of structured data files
US9229917B2 (en) 2003-03-28 2016-01-05 Microsoft Technology Licensing, Llc Electronic form user interfaces
US7913159B2 (en) 2003-03-28 2011-03-22 Microsoft Corporation System and method for real-time validation of structured data files
US7360215B2 (en) * 2003-05-15 2008-04-15 Sap Ag Application interface for analytical tasks
US7694307B2 (en) * 2003-05-15 2010-04-06 Sap Ag Analytical task invocation
US20040230978A1 (en) * 2003-05-15 2004-11-18 Achim Kraiss Analytical task invocation
US20040230977A1 (en) * 2003-05-15 2004-11-18 Achim Kraiss Application interface for analytical tasks
US7558726B2 (en) * 2003-05-16 2009-07-07 Sap Ag Multi-language support for data mining models
US20040230417A1 (en) * 2003-05-16 2004-11-18 Achim Kraiss Multi-language support for data mining models
US20070101364A1 (en) * 2003-05-27 2007-05-03 Toru Morita Multimedia reproducing apparatus and reproducing method
US7373633B2 (en) * 2003-06-03 2008-05-13 Sap Ag Analytical application framework
US7370316B2 (en) * 2003-06-03 2008-05-06 Sap Ag Mining model versioning
US20040250255A1 (en) * 2003-06-03 2004-12-09 Achim Kraiss Analytical application framework
US20040249867A1 (en) * 2003-06-03 2004-12-09 Achim Kraiss Mining model versioning
US8078960B2 (en) 2003-06-30 2011-12-13 Microsoft Corporation Rendering an HTML electronic form by applying XSLT to XML using a solution
US8892993B2 (en) 2003-08-01 2014-11-18 Microsoft Corporation Translation file
US9239821B2 (en) 2003-08-01 2016-01-19 Microsoft Technology Licensing, Llc Translation file
US7334187B1 (en) * 2003-08-06 2008-02-19 Microsoft Corporation Electronic form aggregation
US7971139B2 (en) 2003-08-06 2011-06-28 Microsoft Corporation Correlation, association, or correspondence of electronic forms
US8429522B2 (en) 2003-08-06 2013-04-23 Microsoft Corporation Correlation, association, or correspondence of electronic forms
US9268760B2 (en) 2003-08-06 2016-02-23 Microsoft Technology Licensing, Llc Correlation, association, or correspondence of electronic forms
US8819072B1 (en) 2004-02-02 2014-08-26 Microsoft Corporation Promoting data from structured data files
US20050183006A1 (en) * 2004-02-17 2005-08-18 Microsoft Corporation Systems and methods for editing XML documents
US20050187973A1 (en) * 2004-02-19 2005-08-25 Microsoft Corporation Managing XML documents containing hierarchical database information
US20080195644A1 (en) * 2004-04-13 2008-08-14 Ramsey Mark S Method, system and program product for developing a data model in a data mining system
US8122429B2 (en) * 2004-04-13 2012-02-21 International Business Machines Corporation Method, system and program product for developing a data model in a data mining system
US20080040310A1 (en) * 2004-04-13 2008-02-14 Ramsey Mark S Method, system and program product for developing a data model in a data mining system
US7281018B1 (en) 2004-05-26 2007-10-09 Microsoft Corporation Form template data source change
US7774620B1 (en) 2004-05-27 2010-08-10 Microsoft Corporation Executing applications at appropriate trust levels
US7676843B1 (en) 2004-05-27 2010-03-09 Microsoft Corporation Executing applications at appropriate trust levels
US7692636B2 (en) 2004-09-30 2010-04-06 Microsoft Corporation Systems and methods for handwriting to a screen
US7712022B2 (en) 2004-11-15 2010-05-04 Microsoft Corporation Mutually exclusive options in electronic forms
US7721190B2 (en) 2004-11-16 2010-05-18 Microsoft Corporation Methods and systems for server side form processing
US7904801B2 (en) 2004-12-15 2011-03-08 Microsoft Corporation Recursive sections in electronic forms
US7937651B2 (en) 2005-01-14 2011-05-03 Microsoft Corporation Structural editing operations for network forms
US7725834B2 (en) 2005-03-04 2010-05-25 Microsoft Corporation Designer-created aspect for an electronic form template
US8010515B2 (en) 2005-04-15 2011-08-30 Microsoft Corporation Query to an electronic form
US8200975B2 (en) 2005-06-29 2012-06-12 Microsoft Corporation Digital signatures for network forms
US20080172735A1 (en) * 2005-10-18 2008-07-17 Jie Jenie Gao Alternative Key Pad Layout for Enhanced Security
US8001459B2 (en) 2005-12-05 2011-08-16 Microsoft Corporation Enabling electronic documents for limited-capability computing devices
US9210234B2 (en) 2005-12-05 2015-12-08 Microsoft Technology Licensing, Llc Enabling electronic documents for limited-capability computing devices
US20070255730A1 (en) * 2006-04-26 2007-11-01 Robert Mack Data requirements methodology
WO2007127096A2 (en) * 2006-04-26 2007-11-08 Robert Mack Data requirements methodology
US20100174720A1 (en) * 2006-04-26 2010-07-08 Robert Mack Coherent data identification method and apparatus for database table development
US7979475B2 (en) 2006-04-26 2011-07-12 Robert Mack Coherent data identification method and apparatus for database table development
WO2007127096A3 (en) * 2006-04-26 2008-12-18 Robert Mack Data requirements methodology
US20100005051A1 (en) * 2006-07-28 2010-01-07 Persistent Systems Limited System and method for inferring a network of associations
US8332347B2 (en) 2006-07-28 2012-12-11 Persistent Systems Limited System and method for inferring a network of associations
US8200589B2 (en) 2006-07-28 2012-06-12 Persistent Systems Limited System and method for network association inference, validation and pruning based on integrated constraints from diverse data
US20090187525A1 (en) * 2006-07-28 2009-07-23 Persistent Systems Private Limited System and method for network association inference, validation and pruning based on integrated constraints from diverse data
US7627432B2 (en) 2006-09-01 2009-12-01 Spss Inc. System and method for computing analytics on structured data
US20090112717A1 (en) * 2007-10-31 2009-04-30 Ryan Steelberg Apparatus, system and method for a brand affinity engine with delivery tracking and statistics
US10545937B2 (en) 2009-07-10 2020-01-28 Robert Mack Method and apparatus for converting heterogeneous databases into standardized homogeneous databases
US8554801B2 (en) 2009-07-10 2013-10-08 Robert Mack Method and apparatus for converting heterogeneous databases into standardized homogeneous databases
US20110231454A1 (en) * 2009-07-10 2011-09-22 Robert Mack Method and apparatus for converting heterogeneous databases into standardized homogeneous databases
US9552380B2 (en) 2009-07-10 2017-01-24 Robert Mack Method and apparatus for converting heterogeneous databases into standardized homogeneous databases
US8874619B2 (en) 2011-06-03 2014-10-28 Robert Mack Method and apparatus for defining common entity relationships
US11893046B2 (en) 2011-06-03 2024-02-06 Robert Mack Method and apparatus for implementing a set of integrated data systems
US11341171B2 (en) 2011-06-03 2022-05-24 Robert Mack Method and apparatus for implementing a set of integrated data systems
US10417263B2 (en) 2011-06-03 2019-09-17 Robert Mack Method and apparatus for implementing a set of integrated data systems
USRE48312E1 (en) 2013-01-21 2020-11-17 Robert Mack Method and apparatus for defining common entity relationships
US20140229491A1 (en) * 2013-02-08 2014-08-14 Arindam Bhattacharjee Converting data models into in-database analysis models
US9552403B2 (en) * 2013-02-08 2017-01-24 Sap Se Converting data models into in-database analysis models
US20150363472A1 (en) * 2014-06-11 2015-12-17 Siemens Aktiengesellschaft Computer system and method for analyzing data
US10599871B2 (en) * 2014-12-23 2020-03-24 Oath Inc. System and method for privacy aware information extraction and validation
US10078761B2 (en) * 2014-12-23 2018-09-18 Oath Inc. System and method for privacy-aware information extraction and validation
US20170098101A1 (en) * 2014-12-23 2017-04-06 Yahoo! Inc. System and method for privacy-aware information extraction and validation
US10579238B2 (en) 2016-05-13 2020-03-03 Sap Se Flexible screen layout across multiple platforms
US10649611B2 (en) 2016-05-13 2020-05-12 Sap Se Object pages in multi application user interface
US20190272340A1 (en) * 2018-03-05 2019-09-05 Honeywell International Inc. System and method to configure a flow algorithm automatically by using a primary element data sheet in multivariable smart line transmitters
US11176183B2 (en) * 2018-03-05 2021-11-16 Honeywell International Inc. System and method to configure a flow algorithm automatically by using a primary element data sheet in multivariable smart line transmitters

Similar Documents

Publication Publication Date Title
US7024417B1 (en) Data mining framework using a signature associated with an algorithm
JP5065056B2 (en) Method, computer program, and system for processing a workflow (integrating data management operations into a workflow system)
US7734607B2 (en) Universal visualization platform
US6865573B1 (en) Data mining application programming interface
US7203929B1 (en) Design data validation tool for use in enterprise architecture modeling
US7149752B2 (en) Method for simplifying databinding in application programs
US8341172B2 (en) Method and system for providing aggregate data access
US7047249B1 (en) Method and apparatus for executing stored code objects in a database
US7536406B2 (en) Impact analysis in an object model
RU2340937C2 (en) Declarative sequential report parametrisation
US8027997B2 (en) System and article of manufacture for defining and generating a viewtype for a base model
US8122044B2 (en) Generation of business intelligence entities from a dimensional model
US20050108684A1 (en) Method and system for generating an application object repository from application framework metadata
US8650152B2 (en) Method and system for managing execution of data driven workflows
US20100030757A1 (en) Query builder for testing query languages
US20040230584A1 (en) Object oriented query root leaf inheritance to relational join translator method, system, article of manufacture, and computer program product
US7376937B1 (en) Method and mechanism for using a meta-language to define and analyze traces
WO2000075849A2 (en) Method and apparatus for data access to heterogeneous data sources
EP3486798A1 (en) Reporting and data governance management
US6938050B2 (en) Content management system and methodology employing a tree-based table hierarchy which accomodates opening a dynamically variable number of cursors therefor
US20060136805A1 (en) Using viewtypes for accessing instance data structured by a base model
US20080184109A1 (en) Generating a relational view for a base model schema
Jaakkola et al. Visual SQL–high-quality ER-based query treatment
US10318524B2 (en) Reporting and data governance management
Jiang et al. Building business intelligence applications having prescriptive and predictive capabilities

Legal Events

Date Code Title Description
AS Assignment

Owner name: HYPERION SOLUTIONS CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RUSSAKOVSKY, ALEXANDER;RODNY, URI;REEL/FRAME:013508/0117

Effective date: 20021029

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: ORACLE INTERNATIONAL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BEA SYSTEMS, INC.;REEL/FRAME:025747/0775

Effective date: 20110202

AS Assignment

Owner name: ORACLE INTERNATIONAL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HYPERION SOLUTIONS CORPORATION;REEL/FRAME:025986/0490

Effective date: 20110202

FPAY Fee payment

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553)

Year of fee payment: 12