US20060179026A1 - Knowledge discovery tool extraction and integration - Google Patents

Knowledge discovery tool extraction and integration

Info

Publication number
US20060179026A1
Authority
US
United States
Prior art keywords
value
information
data
tool
data item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/127,778
Inventor
Michael Bechtel
Sanjay Mathur
Jordi Arago
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Accenture Global Services GmbH
Original Assignee
Accenture Global Services GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/051,733 (US20060179024A1)
Application filed by Accenture Global Services GmbH
Priority to US11/127,778 (US20060179026A1)
Assigned to ACCENTURE GLOBAL SERVICES GMBH. Assignment of assignors interest (see document for details). Assignors: ARAGO, JORDI; MATHUR, SANJAY; BECHTEL, MICHAEL E.
Priority to AU2006210140A (AU2006210140B2)
Priority to PCT/EP2006/001021 (WO2006082094A2)
Priority to EP06706676A (EP1844407A2)
Publication of US20060179026A1
Priority to US12/070,457 (US8356036B2)
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references

Definitions

  • the present invention relates generally to an improved method for obtaining, managing, and providing complex, detailed information stored in electronic form in a plurality of sources.
  • the invention may find particular use in organizations that have a need to discover relationships among various pieces of information in a given field.
  • An “information space” is the set of all sources of information that is available to a user at a given time or setting.
  • a user is forced to spend too much overhead on discovering and remembering where different information is located (e.g., web pages, online databases, etc.).
  • the user also spends a large amount of time remembering how to find information in each delivery mechanism.
  • each of these data sources typically includes a large volume of files.
  • collecting and integrating information from a particular data source consumes both time and resources.
  • these tools must collect data from many data sources.
  • Each data source added to the process becomes an additional strain on both resources and time.
  • this information must be processed repeatedly to ensure that the data model includes the most current information.
  • Present systems process a data source in its entirety each and every time an extraction and integration cycle takes place. Accordingly, there is a need for a system that does not waste time and resources re-integrating information that has already been integrated into the data model.
  • Information in the data model may be overwritten by less reliable data. For example, a particular person's name may be found in both a structured database maintained by the IRS and the text of an email. In present systems, the name sourced from the email may be used to overwrite the name obtained from the IRS if the email is integrated later. Because the information maintained by the IRS is inherently more reliable than the text of an email (because of both source credibility and structured data), there is a need for a system that takes into account the reliability of the information maintained by the data sources before integrating that information into the data model.
  • the present invention provides a robust technique for integrating, from a plurality of data sources, only the necessary, most reliable data into a data model, and automatically discovering inter-relationships among the various elements of the data model.
  • a method for integrating a data item into a knowledge model may include retrieving the data item from a data source, determining if the data item has been previously integrated into the knowledge model, and integrating the data item into the knowledge model if the data item has not been previously integrated.
  • a method of integrating a data item into a knowledge model including data collected from a plurality of data sources may include retrieving a data item from one of the plurality of data sources, the data item including a first type of information, determining a reliability value for the one of the plurality of data sources for the first type of information by either leveraging an existing reliability score indicative of a source's reliability or generating an independent reliability score indicative of a source's reliability, and integrating the data item and the reliability value into the knowledge model.
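  • As a minimal sketch of the reliability-aware integration summarized above, the following Python fragment keeps an incoming value only when no more reliable value is already stored. The `StoredValue` class, the `integrate` function, the 0.0 to 1.0 score scale, the sample scores, and the tie-breaking rule (an equally reliable newcomer overwrites) are illustrative assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class StoredValue:
    value: str
    reliability: float  # assumption: higher means more trusted


def integrate(model: dict, key: str, value: str, reliability: float) -> bool:
    """Store value under key unless a more reliable value is already present.

    Returns True when the model was changed.
    """
    current = model.get(key)
    if current is not None and current.reliability > reliability:
        return False  # keep the existing, more reliable value
    model[key] = StoredValue(value, reliability)
    return True


model = {}
# Hypothetical scores: a structured government database versus e-mail text.
integrate(model, "person:123:name", "J. Smith (structured database)", 0.9)
changed = integrate(model, "person:123:name", "Jon Smith (e-mail)", 0.3)
print(model["person:123:name"].value, changed)
```

  • In this sketch the later, less reliable e-mail value does not overwrite the earlier value, which mirrors the IRS versus e-mail example given above.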
  • FIG. 1 is a diagram representative of an embodiment of a knowledge discovery tool in accordance with an embodiment of the present invention.
  • FIG. 2A is a diagram representative of tables of an exemplary knowledge model in accordance with an embodiment of the present invention.
  • FIG. 2B is a diagram representative of a field-to-field relationship in accordance with an embodiment of the present invention.
  • FIG. 2C is a diagram representative of a field-to-text relationship in accordance with an embodiment of the present invention.
  • FIG. 3 is a diagram representative of an exemplary workflow for an extraction tool in accordance with an embodiment of the present invention.
  • FIG. 4 is a diagram representative of an exemplary workflow for a compare tool in accordance with an embodiment of the present invention.
  • FIG. 5 is a diagram representative of an exemplary workflow for an integration tool in accordance with an embodiment of the present invention.
  • FIG. 6 is a diagram representative of an exemplary workflow for an integrate tool in accordance with an embodiment of the present invention.
  • FIG. 7 is a diagram representative of an exemplary workflow for loading the information of a received message in accordance with an embodiment of the present invention.
  • FIG. 8 is a diagram representative of an exemplary workflow for a Thesaurus component in accordance with an embodiment of the present invention.
  • FIG. 9 is a diagram representative of an exemplary workflow for a Merge component in accordance with an embodiment of the present invention.
  • FIG. 10 is a diagram representative of an exemplary workflow for a LookUp component in accordance with an embodiment of the present invention.
  • FIG. 11 is a diagram representative of an exemplary workflow for a Compare component in accordance with an embodiment of the present invention.
  • FIG. 12 is a diagram representative of an exemplary workflow for an Insert component in accordance with an embodiment of the present invention.
  • FIG. 13 is a diagram representative of an exemplary workflow for an Update component in accordance with an embodiment of the present invention.
  • FIG. 14 is a diagram representative of an exemplary relationship generation tool in accordance with an embodiment of the present invention.
  • FIG. 15 is an exemplary screen shot of a navigator tool in accordance with an embodiment of the present invention.
  • FIG. 16 is a diagram of exemplary components of a navigator tool in accordance with an embodiment of the present invention.
  • FIG. 17 is an exemplary layout for a navigation tool in accordance with an embodiment of the present invention.
  • FIGS. 18 A-E are exemplary screen shots of a navigator tool in accordance with an embodiment of the present invention.
  • FIG. 19 is an exemplary screen shot of a navigation toolbar in accordance with an embodiment of the present invention.
  • FIG. 20 is an exemplary screen shot of a history dialogue window in accordance with an embodiment of the present invention.
  • FIG. 21 is an exemplary screen shot of a master options dialog in accordance with an embodiment of the present invention.
  • FIG. 22 is an exemplary screen shot of a search tool in accordance with an embodiment of the present invention.
  • FIGS. 23 A-B are exemplary screen shots of a navigator with a bookmark list in accordance with an embodiment of the present invention.
  • FIGS. 24 A-L are exemplary screen shots of a wizard service in accordance with an embodiment of the present invention.
  • FIG. 25 is an exemplary screen shot of a monitored items dialog in accordance with an embodiment of the present invention.
  • FIGS. 26 A-E are exemplary screen shots of a filters dialog in accordance with an embodiment of the present invention.
  • Referring to FIG. 1, there is shown an embodiment of a knowledge discovery system 100 in accordance with the present invention. While the preferred embodiments disclosed herein contemplate a knowledge model based on an information space for pharmaceutical research and the information and data sources related thereto, the present invention is equally applicable for knowledge discovery for any information space defined in any type of data source. Examples of information spaces include software development, drug development, financial research, governmental data administration, clinical trials, and product development and testing.
  • the knowledge discovery system in the embodiment of FIG. 1 includes an extraction tool 120 , an integration tool 130 , a knowledge model 140 , a user information database 145 , a middle tier 150 , and a web server 160 .
  • the extraction tool 120 extracts relevant information from a plurality of data sources 110 a , 110 b , and 110 x .
  • the extraction tool 120 may convert the information into a common format 125 , such as XML.
  • the extraction tool 120 is implemented using BIZTALK SERVER, provided by Microsoft Corporation of Redmond, Wash.
  • the integration tool 130 incorporates the information into the knowledge model 140 .
  • the integration tool is implemented as a COM+ application, using the COMPONENT OBJECT MODEL software architecture provided by Microsoft Corporation of Redmond, Wash.
  • the middle tier 150 and optional web server 160 are provided to present the information contained in the knowledge model 140 via a navigator tool 170 .
  • the middle tier is implemented using the .NET framework for Web services and component software provided by Microsoft Corporation of Redmond, Wash.
  • access to the knowledge model 140 via the navigator 170 may be restricted to registered users.
  • User information may be stored in the user information database 145 .
  • the knowledge model 140 defines an information space for pharmaceutical research, and is represented by a relational database consisting of four distinct types of tables.
  • Entity tables define the content of the information space.
  • each entity table may include a name field (which may or may not be the primary key for that table) and attribute fields.
  • Exemplary entity tables are shown in FIG. 2A .
  • Field-to-field relation tables define the relationships between the fields in the entity tables.
  • three types of field-to-field relationships exist.
  • a name-to-name relationship relates two name fields from two entity tables.
  • a name-to-attribute relationship relates the name of one entity to an attribute of another entity.
  • An exemplary field-to-field relationship is shown in FIG. 2B .
  • an attribute-to-attribute relationship relates the attribute of one entity to an attribute of another.
  • Field-to-text relationships define the relationships between fielded entity terms and the text of unstructured data.
  • the data model 140 may include a person table that defines people in the information space and a literature table that includes fields for various information about an article in the information space, but not necessarily the text of the article.
  • a text search of the article may be performed to determine if the person is mentioned in the article.
  • An exemplary field-to-text relationship is shown in FIG. 2C .
  • each of the field-to-field relationship tables and the field-to-text relationship tables includes a field for the primary key of each entity referenced as well as managerial data, such as a date created field.
  • the relationship tables are described in more detail below in reference to FIG. 5 .
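  • The table types described above can be sketched with Python's built-in `sqlite3` module: two entity tables, each with a name field and attribute fields, and one field-to-field relationship table that holds the primary key of each referenced entity plus a date created field. All table names, column names, and sample rows are hypothetical, and the sketch does not reproduce the tables of FIGS. 2A-2C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Entity tables: a name field plus attribute fields.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        email     TEXT
    );
    CREATE TABLE organization (
        organization_id INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        city            TEXT
    );
    -- Field-to-field relationship table: the primary key of each
    -- referenced entity plus managerial data (date created).
    CREATE TABLE person_organization (
        person_id       INTEGER NOT NULL REFERENCES person(person_id),
        organization_id INTEGER NOT NULL REFERENCES organization(organization_id),
        date_created    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (person_id, organization_id)
    );
    """
)
conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'A. Researcher')")
conn.execute(
    "INSERT INTO organization (organization_id, name) VALUES (10, 'Example Institute')"
)
conn.execute(
    "INSERT INTO person_organization (person_id, organization_id) VALUES (1, 10)"
)

# Follow the relationship from a person to the related organization.
rows = conn.execute(
    """
    SELECT p.name, o.name
    FROM person_organization r
    JOIN person p ON p.person_id = r.person_id
    JOIN organization o ON o.organization_id = r.organization_id
    """
).fetchall()
print(rows)
```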
  • each data source 110 may contain thousands of data items stored in various types of files (XML, flat files, HTML, text, spreadsheets, presentations, diagrams, programming code, databases, etc.) that include information belonging to the given domain.
  • each data source 110 may contain documents of any type, created at any point in time.
  • one data source may be provided containing every piece of information to be analyzed.
  • a plurality of data sources may be provided where each data source may contain only documents of certain types, created at discrete segments of time, or created at certain geographical locations.
  • the extraction tool 120 extracts relevant information from the various data sources 110 .
  • the extraction tool 120 is an asynchronous process that begins processing a file as soon as that file is retrieved from a data source 110 .
  • the extraction tool 120 may be implemented as a batch process.
  • each data source has an associated data source type.
  • each data source may be either an internal data source or an external data source.
  • An internal data source is a data source that is internal to the organization utilizing the knowledge discovery system 100 .
  • an external data source is a data source maintained by any other organization.
  • the data source type may define the structure of the data source, such as the underlying directory structure of the data source or the files contained therein.
  • the data source may be a simple data source consisting of a single directory, or a complex data source that may store metadata associated with each file kept in the data source.
  • the extraction tool 120 connects to each of the data sources 110 through data source adapters.
  • An adapter acts as an Application Programming Interface, or API, to the repository.
  • the data source adapter may allow for the extraction of metadata associated with the information.
  • Exemplary data sources include PUBMED, a service of the National Library of Medicine that includes over 15 million citations for biomedical articles back to the 1950's, SWISS_PROT PROTEIN KNOWLEDGEBASE, which is an annotated protein sequence database established in 1986, the REFERENCE SEQUENCE (RefSeq) collection, which aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms, KEGG, or the Kyoto Encyclopedia of Genes and Genomes, an ongoing project from Kyoto University, LOCUSLINK, a service of the National Library of Medicine that provides a single query interface to curated sequence and descriptive information about genetic loci, MESH, or Medical Subject Headings, the National Library of Medicine's controlled vocabulary thesaurus, OMIM, or Online Mendelian Inheritance in Man, a database catalog of human genes and genetic disorders, and NLM TAXONOMY, a searchable hierarchical index of names of all the organisms for which nu
  • the files stored in any particular data source 110 may include information relating the information therein.
  • the PUBMED data source 110 may include information 260 relating a particular person to an organization. This information can be used to determine a relationship definition 266 for a particular person 262 and organization 264 in the knowledge model 140 .
  • a field-to-field relationship that has been determined from information obtained from a data source 110 is called a direct relationship.
  • all the field-to-field relationships are determined automatically using information from the data sources 110 .
  • a file may include information relating information in itself to information in other data sources 110 , or relating information in two separate data sources 110 .
  • the extraction tool 120 may include various parameters used to determine whether a document is relevant. These parameters may be predefined or configurable by a user. For example, a user may configure the extraction tool to only extract files from specified directories. It should be apparent to one of ordinary skill in the art that many other relevance parameters—for example, only certain file types or only files that have changed after a certain date—are contemplated by the present invention.
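  • The relevance parameters mentioned above (specified directories, certain file types, files changed after a certain date) could be modeled as a simple filter. The following Python sketch is one possible arrangement; `RelevanceParams`, `is_relevant`, and the sample paths are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional, Tuple


@dataclass
class RelevanceParams:
    directories: Tuple[str, ...] = ()   # empty tuple means "any directory"
    extensions: Tuple[str, ...] = ()    # empty tuple means "any file type"
    changed_after: Optional[datetime] = None


def is_relevant(path: str, modified: datetime, params: RelevanceParams) -> bool:
    """Return True when a file passes every configured relevance check."""
    p = PurePosixPath(path)
    if params.directories and not any(
        str(p).startswith(d.rstrip("/") + "/") for d in params.directories
    ):
        return False
    if params.extensions and p.suffix.lower() not in params.extensions:
        return False
    if params.changed_after is not None and modified <= params.changed_after:
        return False
    return True


params = RelevanceParams(
    directories=("/data/source_a",),
    extensions=(".xml",),
    changed_after=datetime(2005, 1, 1),
)
print(is_relevant("/data/source_a/item.xml", datetime(2005, 2, 1), params))
print(is_relevant("/data/source_b/item.xml", datetime(2005, 2, 1), params))
```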
  • the extraction process 120 retrieves files from the data sources 110 .
  • the original files may include large files that are of varying formats.
  • the extraction tool 120 includes a cut tool 310 that will split the original files into smaller records or documents 315 a , 315 b , etc.
  • the cut tool 310 will process the original files such that each record or document 315 a , 315 b includes one and only one data item.
  • the cut tool 310 may generate records or documents 315 a , 315 b that include more than one data item.
  • the original files may also include the information about all items in a single file, separating the information using delimiters. Exemplary delimiters include “///” or a blank line.
  • a configuration file may be provided that details the delimiters used at a particular source.
  • the configuration file may be used by the cut tool 310 to process the original files.
  • the cut tool 310 may include particularized processor applications for processing a particular type of original file, such as an XML processor for cutting XML files or a text processor for manipulating text files.
  • these particularized processor applications are implemented as C# objects using the C# object-oriented programming language from Microsoft Corporation of Redmond, Wash.
  • the extraction tool 120 preferably stores the records or documents 315 a , 315 b in a file system.
  • each record may include an identifier, such as an identifier used by the data source to identify the original file.
  • exemplary identifiers include a SWISS_PROT ID or a file name.
  • the extraction tool 120 also generates a global unique identifier for each record or document 315 a , 315 b . The global unique identifier is used for tracking purposes, as described below.
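  • The behavior of the cut tool 310 described above can be illustrated with a short Python sketch that splits a delimited file into one record per data item and attaches a generated identifier to each record. The `cut` function, the use of `uuid.uuid4()` for the global unique identifier, the convention that an empty delimiter means "blank line", and the sample text are assumptions made for illustration.

```python
import re
import uuid


def cut(text: str, delimiter: str = "///") -> list:
    """Split one source file into records, one data item per record.

    An empty delimiter is treated as "split on blank lines".
    """
    if delimiter == "":
        parts = re.split(r"\n\s*\n", text)
    else:
        parts = text.split(delimiter)
    records = []
    for part in parts:
        body = part.strip()
        if body:  # skip empty trailing segments
            records.append({"guid": str(uuid.uuid4()), "body": body})
    return records


# Made-up sample file holding two data items separated by "///".
sample = "ID REC1\nNAME alpha\n///\nID REC2\nNAME beta\n///\n"
records = cut(sample)
for record in records:
    print(record["guid"], repr(record["body"]))
```

  • In practice the delimiter would come from the per-source configuration file described above rather than from a default argument.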
  • the extraction tool 120 may also be provided with a map tool 320 .
  • the map tool 320 functions to standardize the format of each record or document 315 a , 315 b .
  • the map tool 320 serves two functions.
  • the map tool 320 may create a normalized specification for the records or documents 315 a , 315 b , such as a standardized XML specification.
  • records or documents 315 a , 315 b created from flat files may be transformed into XML files, while records or documents 315 a , 315 b created from XML files may be mapped to the standard XML specification.
  • the map tool 320 may remove information from the record or document 315 a , 315 b that is unnecessary to maintaining the knowledge model 140 .
  • the map tool 320 outputs a single text string of XML.
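  • A minimal sketch of the two functions of the map tool 320 , assuming a flat-file record with one "LABEL value" pair per line: the record is normalized into a single XML string, and fields not needed by the knowledge model are dropped. The element names (`record`, `field`), the `keep` set, and the sample record are hypothetical and do not reflect the actual standardized XML specification.

```python
import xml.etree.ElementTree as ET


def map_record(body: str, source_id: str, keep: set) -> str:
    """Convert a flat 'LABEL value' record into one XML string."""
    root = ET.Element("record", {"source": source_id})
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        label, _, value = line.partition(" ")
        if label in keep:  # drop fields the knowledge model does not need
            ET.SubElement(root, "field", {"name": label}).text = value.strip()
    return ET.tostring(root, encoding="unicode")


xml_text = map_record(
    "ID REC1\nNAME alpha\nCOMMENT internal note", "demo", {"ID", "NAME"}
)
print(xml_text)
```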
  • the compare tool 330 of the extraction tool 120 compares the records or documents 315 a , 315 b with those records or documents 315 a , 315 b that have already been integrated into the knowledge model so that only records or documents 315 a , 315 b that are new are further processed.
  • a new record or document 315 a , 315 b also includes any record or document 315 a , 315 b that has been integrated into the knowledge model 140 but has since been modified.
  • previously entered records or documents 315 a and 315 b may include only those records or documents that have been integrated into the knowledge model 140 and have not changed since their integration.
  • the compare tool 330 will compute a value based on the record or document 315 a , 315 b .
  • the compare tool 330 uses a hash function to generate a hash value for each record or document 315 a , 315 b .
  • the value may be based on any part of the record or document 315 a , 315 b , such as the identifier or the information contained therein.
  • each record or document 315 a , 315 b has an associated identifier, DocumentID, as well as a data source identifier, DataSourceID, that identifies the data source from where the record or document 315 a , 315 b was retrieved.
  • the compare tool generates a hash value, HashCode, for the current record or document 315 a , 315 b .
  • the compare tool 330 compares the DataSourceID and DocumentID for the current record or document 315 a , 315 b to a table of data for previously entered records or documents 315 a , 315 b at block 402 .
  • the table includes four items for each previously entered record or document 315 a , 315 b : a DataSourceID that identifies the data source; a DocumentID that identifies the record or document 315 a , 315 b ; a first hash code value, HashCodeActual, that represents the hash code value for that record or document 315 a , 315 b before it is integrated into the knowledge model 140 ; and a second hash code value, HashCodeCompare, that represents the hash code value for that record or document 315 a , 315 b after it has been integrated into the knowledge model 140 . If no match is found in the table, this record or document 315 a , 315 b has never been previously integrated into the knowledge model.
  • the compare tool 330 stores the current DataSourceID and DocumentID in the table at block 404 . Additionally, the HashCode will be stored as the HashCodeActual value for that record or document 315 a , 315 b . The extraction process 120 will continue to process the record or document 315 a , 315 b at block 406 . Once the record or document 315 a , 315 b is integrated into the knowledge model 140 , the HashCodeCompare value will be updated with the HashCodeActual value at block 408 .
  • the compare tool 330 next compares HashCodeActual to HashCodeCompare for the match. If the two values are identical, the record or document 315 a , 315 b has not been modified since its last integration. Accordingly, the record or document 315 a , 315 b is not further processed, as shown at block 412 . If the values are different, the record or document 315 a , 315 b has been modified since its last integration. In this case, the compare tool 330 updates the HashCodeActual value with the current HashCode value at block 414 .
  • the extraction process 120 will continue to process the record or document 315 a , 315 b at block 416 . Once the record or document 315 a , 315 b is integrated into the knowledge model 140 , the HashCodeCompare value will be updated with the HashCodeActual value at block 418 .
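  • The compare workflow above can be sketched in Python as a small in-memory table keyed by DataSourceID and DocumentID. The sketch makes several assumptions: SHA-256 from `hashlib` stands in for the unspecified hash function, the hash is computed over the whole record text, and the comparison at the "match" step is read as the freshly computed HashCode against the stored HashCodeCompare. The class and method names are hypothetical.

```python
import hashlib


class CompareTable:
    """In-memory stand-in for the table of previously entered records."""

    def __init__(self):
        # (DataSourceID, DocumentID) -> {"HashCodeActual", "HashCodeCompare"}
        self.rows = {}

    def needs_processing(self, data_source_id: str, document_id: str, content: str) -> bool:
        """Return True when the record is new or modified since integration."""
        hash_code = hashlib.sha256(content.encode("utf-8")).hexdigest()
        key = (data_source_id, document_id)
        row = self.rows.get(key)
        if row is None:
            # No match: never integrated before (blocks 404 and 406).
            self.rows[key] = {"HashCodeActual": hash_code, "HashCodeCompare": None}
            return True
        if hash_code == row["HashCodeCompare"]:
            return False  # unchanged since last integration (block 412)
        row["HashCodeActual"] = hash_code  # modified (blocks 414 and 416)
        return True

    def mark_integrated(self, data_source_id: str, document_id: str) -> None:
        """Copy HashCodeActual into HashCodeCompare (blocks 408 and 418)."""
        row = self.rows[(data_source_id, document_id)]
        row["HashCodeCompare"] = row["HashCodeActual"]


table = CompareTable()
results = []
results.append(table.needs_processing("src-1", "doc-1", "version 1"))  # new
table.mark_integrated("src-1", "doc-1")
results.append(table.needs_processing("src-1", "doc-1", "version 1"))  # unchanged
results.append(table.needs_processing("src-1", "doc-1", "version 2"))  # modified
table.mark_integrated("src-1", "doc-1")
results.append(table.needs_processing("src-1", "doc-1", "version 2"))  # unchanged
print(results)
```

  • Because HashCodeCompare is updated only after a successful integration, a record whose integration fails is still reported as needing processing on the next cycle.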
  • the only records or documents 315 a , 315 b to be processed are new records or documents 315 a , 315 b that have been properly formatted.
  • the information contained therein may contain unnecessary information as a consequence of different data sources using different nomenclatures. For example, an attribute name may be preceded by an asterisk or dash.
  • the record or document 315 a , 315 b may contain HTML tag information.
  • the extraction process 120 is provided with a clean tool 340 that removes this unnecessary information from the records or documents 315 a , 315 b.
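  • As an illustration of the kind of cleanup attributed to the clean tool 340 , the following Python sketch strips HTML tags and a leading asterisk or dash from a line. The regular expressions and the `clean_line` function are assumptions; a simple tag-stripping pattern like this one is not a full HTML parser, and the prefix rule would also strip the sign from a line that begins with a negative number.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")            # naive HTML tag matcher
PREFIX_RE = re.compile(r"^\s*[*\-]+\s*")   # leading asterisks or dashes


def clean_line(line: str) -> str:
    """Remove HTML tags, decode entities, and drop leading '*' or '-'."""
    without_tags = TAG_RE.sub("", line)
    return PREFIX_RE.sub("", unescape(without_tags)).strip()


print(clean_line("* <b>Organism</b>: Homo sapiens"))
print(clean_line("- Name &amp; Title"))
```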
  • the parse tool 350 of the extraction tool 120 restructures the information of the record or document 315 a , 315 b .
  • the parse tool 350 may place each value into separate tags.
  • the parse tool 350 may unify the different nomenclatures of the records or documents 315 a , 315 b so that the information from the different sources is coherent. For example, an Organism name may be listed under a first label in one data source 110 and a second label in another data source 110 . The parse tool 350 may standardize this information.
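  • The label unification performed by the parse tool 350 might look like the following Python sketch, which maps source-specific labels to one standard label and separates multi-valued fields into individual values. The `LABEL_MAP` entries, the semicolon separator, and the `parse_record` function are hypothetical.

```python
# Hypothetical mapping from source-specific labels to one standard label.
LABEL_MAP = {
    "OS": "Organism",
    "organism_name": "Organism",
}


def parse_record(fields: dict, separator: str = ";") -> dict:
    """Rename labels to the standard nomenclature and split out each value."""
    parsed = {}
    for label, raw in fields.items():
        name = LABEL_MAP.get(label, label)
        values = [v.strip() for v in raw.split(separator) if v.strip()]
        parsed.setdefault(name, []).extend(values)
    return parsed


from_source_a = parse_record({"OS": "Homo sapiens; Mus musculus"})
from_source_b = parse_record({"organism_name": "Homo sapiens"})
print(from_source_a)
print(from_source_b)
```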
  • the extraction process 120 may store the record or document 315 a , 315 b to be integrated into the knowledge model.
  • the record or document 315 a , 315 b is stored in a database 360 .
  • the record or document 315 a , 315 b may be stored in any manner that is apparent to one of ordinary skill in the art.
  • the record or document 315 a , 315 b is transmitted as part of a message to the integration process 130 .
  • the extraction tool 120 stores the record or document 315 a , 315 b in a database 360 and sends a message that alerts the integration tool 130 that a new record or document 315 a , 315 b has been inserted.
  • the message may be a field in the database 360 which is polled by the integration tool 130 .
  • the integration process is an automatic, asynchronous process that doesn't need the entire extraction process 120 to finish.
  • the integration process 130 may begin integrating a record or document 315 a , 315 b as soon as it is inserted into the database 360 .
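  • One way to picture the hand-off described above, where the message is a database field polled by the integration tool 130 , is a staging table with a status column. The following Python sketch uses `sqlite3`; the `staging` table, its columns, and the function names are hypothetical, and a real deployment would poll on a schedule instead of calling `poll_once` by hand.

```python
import sqlite3


def open_staging() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staging ("
        " guid TEXT PRIMARY KEY,"
        " xml TEXT NOT NULL,"
        " integrated INTEGER NOT NULL DEFAULT 0)"  # the polled "message" field
    )
    return conn


def stage_record(conn: sqlite3.Connection, guid: str, xml: str) -> None:
    """Extraction side: insert a record that is ready for integration."""
    conn.execute("INSERT INTO staging (guid, xml) VALUES (?, ?)", (guid, xml))
    conn.commit()


def poll_once(conn: sqlite3.Connection, handler) -> int:
    """Integration side: process every record not yet integrated."""
    rows = conn.execute(
        "SELECT guid, xml FROM staging WHERE integrated = 0 ORDER BY rowid"
    ).fetchall()
    for guid, xml in rows:
        handler(guid, xml)
        conn.execute("UPDATE staging SET integrated = 1 WHERE guid = ?", (guid,))
    conn.commit()
    return len(rows)


conn = open_staging()
stage_record(conn, "g-1", "<record/>")
stage_record(conn, "g-2", "<record/>")
seen = []
first = poll_once(conn, lambda guid, xml: seen.append(guid))
second = poll_once(conn, lambda guid, xml: seen.append(guid))
print(first, second, seen)
```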
  • This entry may be treated and integrated in an individual way and is passed through several components whose purpose is to integrate this source register into the knowledge model 140 .
  • the integration tool 130 provides the users with more complete and higher quality information than the data sources 110 alone.
  • the integration tool 130 only processes new records or documents 315 a , 315 b because the extraction tool 120 has removed those records or documents 315 a , 315 b that have not been updated since the prior integration. This greatly improves the performance of the integration tool 130 , reducing the time necessary to complete the integration process.
  • the integration tool 130 is equally capable of integrating any types of records or documents 315 a , 315 b , regardless of whether they have been integrated previously.
  • the integration tool 130 may receive information to integrate in three ways.
  • the integration tool 130 may receive information from the extraction tool 120 .
  • the extraction tool 120 may process a record or document 315 a , 315 b from a data source, insert the record or document 315 a , 315 b into a database 360 , and alert the integration tool 130 of the presence of the new information.
  • the integration tool 130 may retrieve the information from the database 360 .
  • the integration tool 130 may receive information from a re-integration batch process.
  • the re-integration batch process may build a message (of a similar format to those generated by the extraction process 120 ) that alerts the integration process 130 to the presence of a record or document 315 a , 315 b that could not be integrated into the knowledge model 140 during a previous attempt.
  • custom applications may be developed to alert the integration tool 130 of information from particular data sources 110 that do not require the full functionality of the extraction tool 120 .
  • an internal data source 110 may be provided that includes files that adhere to a particular structure designed to ease the integration process. It should be apparent to one of ordinary skill in the art that any method may be used to introduce a record or document 315 a , 315 b to the integration tool 130 .
  • the integration tool 130 may be provided with an integrate tool 500 .
  • the integrate tool 500 performs four primary processes. First, the integrate tool 500 may retrieve a record or document 315 a , 315 b from the database 360 . Next, the integrate tool 500 may perform a spell check function 510 on the data included in the record or document 315 a , 315 b to ensure that misspellings in the original data source 110 files do not affect the integrity of the knowledge model 140 . Similarly, the integrate tool 500 may perform a synonym function 520 to determine if the current term (as used in the record or document 315 a , 315 b ) is a synonym for a preferred name.
  • the integrate tool 500 may perform a merge function 530 that integrates the record or document 315 a , 315 b into a database 540 .
  • the database 540 represents an un-optimized version of the knowledge model 140 .
  • a particular embodiment of the integrate tool 500 is discussed in more detail below in reference to FIGS. 9-13 .
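  • The spell check function 510 and synonym function 520 can be approximated, for illustration only, by fuzzy matching against a list of known terms followed by a synonym lookup. The sketch below uses `difflib.get_close_matches` from the Python standard library as a stand-in technique; the disclosure does not say how spelling or synonyms are resolved, and the term list, synonym map, and cutoff value are assumptions.

```python
import difflib

# Hypothetical vocabulary and synonym map for a pharmaceutical information space.
KNOWN_TERMS = ["aspirin", "ibuprofen", "acetylsalicylic acid"]
PREFERRED = {"acetylsalicylic acid": "aspirin", "asa": "aspirin"}


def normalize_term(term: str) -> str:
    """Correct likely misspellings, then map synonyms to a preferred name."""
    candidate = term.strip().lower()
    # Spell check stand-in: snap to the closest known term if it is close enough.
    matches = difflib.get_close_matches(candidate, KNOWN_TERMS, n=1, cutoff=0.8)
    if matches:
        candidate = matches[0]
    # Synonym step: replace a synonym with its preferred name.
    return PREFERRED.get(candidate, candidate)


for raw in ["Asprin", "ASA", "Acetylsalicylic acid", "Ibuprofen"]:
    print(raw, "->", normalize_term(raw))
```

  • The normalized term would then be passed to the merge function 530 , so that misspelled or synonymous names resolve to a single entry.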
  • the integration tool 130 may also be provided with various batch-process tools to perform various functions on the information in the database 540 .
  • the integration tool 130 includes a relationship generation tool 550 that may be used to analyze the information in the database 540 .
  • the relationship generation tool 550 is discussed in more detail below in reference to FIG. 14 .
  • a synonym synchronization tool 560 may run periodically to update the information in the database 540 in accordance with the most recent list of synonyms.
  • a transition tool 570 may be provided to optimize the information in the database 540 to create the knowledge model 140 .
  • the transition tool 570 may denormalize the information in the database 540 , generate cross-over tables, build clustered indices on the primary key columns of various tables of the database 540 , and optimize the database 540 for queries and data retrieval tasks.
  • the transition tool 570 generates a database 580 that is replicated in a production environment as the knowledge model 140 .
  • the extraction tool 120 may send a message to the integration tool 130 to inform it that new entries in the database 360 need to be integrated into the knowledge model 140 .
  • the message may also indicate that the entries are from a particular data source 110 .
  • the integrate tool 500 creates an XMLDocument object.
  • the XMLDocument object is a working version of a standard configuration file.
  • each data source has a standard configuration file in XML that acts as a template for the integration tool 130 .
  • An exemplary configuration file is shown in Table 1. It should be apparent to one of ordinary skill in the art that various types of configuration files in other formats are contemplated by the present invention.
  • the configuration file includes various attributes that are used in later stages of the integration process.
  • the exemplary configuration file includes five attributes, a Thesaurus attribute, a LookUp attribute, a Compare attribute, an Insert attribute, and an Update attribute.
  • the Thesaurus attribute identifies the information in the record that needs to be checked for spelling and/or synonyms.
  • the Thesaurus attributes define a field name to be checked and the values for that field name. This value will appear in the ThesaurusSP and SpellingSP attributes if the value needs to be checked for synonyms or spelling, respectively. If the value needs to be checked for both spelling and synonyms, it will appear in both attributes.
  • the LookUp attribute defines each field in the database 360 and the name of a procedure that can be used to lookup the associated row in the knowledge model 140 .
  • the Compare attribute defines the field in the database 360 and its corresponding field in the knowledge model 140 .
  • the Insert attribute defines each field in the database 360 and its corresponding confidence value, as described below.
  • the Update attribute defines each field in the database 360 , its corresponding confidence level, the field type, and the corresponding field in the knowledge model 140 and its corresponding confidence value.
  • An update type implies that the value of the field should be replaced in its entirety if a new record or document 315 a , 315 b is to replace an existing entry in the knowledge model 140 .
  • An append type implies that the information in the new record or document 315 a , 315 b should be appended to the current information.
  • each field includes an associated confidence value.
  • the confidence value is used to score the reliability of the data sources 110 for each field of the knowledge model 140 .
  • multiple data sources 110 may include information for one field of the knowledge model 140 .
  • the confidence value is used to determine which data source is more reliable for a given field.
  • the confidence value may reflect an internal view of the reliability of the data sources 110 (i.e. the view of the system developers or the organization utilizing the knowledge discovery system 100 ) or may reflect an external view of reliability (i.e. the use of a third party reliability standard).
  • the confidence value is a numerical value from 1-20 where the confidence value increases with the reliability of the data source 110 .
  • each of the plurality of data sources 110 is ranked from 1 to N for each field of the knowledge model, where N is the number of data sources 110 .
  • multiple data sources 110 may be equally reliable and therefore have the same confidence value.
  • the integration tool 130 may choose the most recent record or document 315 a , 315 b as controlling.
  • the integration tool 130 may only replace a field if the confidence value of the new record or document 315 a , 315 b is greater than the current entry.
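The replacement rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed here: the `FieldEntry` type, its attribute names, and the `should_replace` helper are hypothetical, and the sketch combines two of the described embodiments (replace only on a higher confidence value, and let the most recent record control when confidence values are equal).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FieldEntry:
    value: str
    confidence: int      # higher means a more reliable data source
    received: datetime   # when the record or document arrived (assumed metadata)


def should_replace(current: FieldEntry, new: FieldEntry) -> bool:
    """Return True if the new entry should overwrite the current one."""
    if new.confidence != current.confidence:
        return new.confidence > current.confidence
    # Equal confidence: assume the most recent record controls.
    return new.received > current.received


current = FieldEntry("Acme Corp", 12, datetime(2005, 1, 1))
better = FieldEntry("Acme Corporation", 15, datetime(2004, 6, 1))
tie_newer = FieldEntry("ACME", 12, datetime(2005, 3, 1))
worse = FieldEntry("Acme", 3, datetime(2005, 5, 1))
```

Here `better` and `tie_newer` would replace `current`, while `worse` would not, even though it is the newest of the three.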
  • a confidence value configuration file is provided.
  • the confidence value configuration file may define a confidence value for each field of the knowledge model 140 and for all data sources 110 .
  • a separate confidence value configuration file may be provided for each data source 110 .
  • An exemplary XML confidence value configuration file is shown in Table 2. In the exemplary confidence value configuration file, each field of each table from each data source 110 is ranked.

    TABLE 2: Sample XML Confidence Value Configuration File

    <Table>
      <DataSource1>
        <field1>ConfidenceValue</field1>
        ...
        <fieldn>ConfidenceValue</fieldn>
      </DataSource1>
      ...
    </Table>
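A file with this shape could be read into a lookup table along the following lines. This is a sketch only: the numeric values, the second data source, and the `load_confidence` helper are invented for illustration, and Python's standard `xml.etree.ElementTree` module stands in for whatever XML library an actual embodiment would use.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents following the layout of Table 2.
SAMPLE = """
<Table>
  <DataSource1>
    <field1>12</field1>
    <field2>7</field2>
  </DataSource1>
  <DataSource2>
    <field1>9</field1>
    <field2>15</field2>
  </DataSource2>
</Table>
"""


def load_confidence(xml_text):
    """Map (data source, field) to its integer confidence value."""
    root = ET.fromstring(xml_text.strip())
    values = {}
    for source in root:
        for field in source:
            values[(source.tag, field.tag)] = int(field.text.strip())
    return values


confidence = load_confidence(SAMPLE)
# Pick the more reliable source for one field.
best_for_field2 = max(
    ("DataSource1", "DataSource2"),
    key=lambda source: confidence[(source, "field2")],
)
```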
  • the integrate tool 500 reads the configuration file for the data source identified in the message at block 702 .
  • a check is performed to determine if an XMLDocument object for this data source is cached at block 704 . If so, the XMLDocument object is retrieved from the cache at block 706 , and the information from the message is used to populate the ConfigFileContent property of the XMLDocument at block 708 .
  • the integrate tool 500 will create a new XMLDocument object and load it with the configuration file information at block 710 , put the new XMLDocument in the cache at block 712 , and populate the ConfigFileContent property of the XMLDocument with the information from the message at block 708 .
  • after loading the received message into an XMLDocument object at block 602 , the integrate tool 500 next checks to see if the message contains a record or document 315 a , 315 b that needs to be integrated into the knowledge model at block 604 . If the message does not contain any additional records or documents 315 a , 315 b that need to be integrated, the process ends at block 606 . If the message does contain a record or document 315 a , 315 b that needs to be integrated, the integrate method retrieves that record or document 315 a , 315 b from the database 360 at block 608 . Next, the integrate tool 500 calls the thesaurus component to perform the spelling function 510 and synonym function 520 at block 610 .
  • the thesaurus component includes an internal source, such as a database, containing information on commonly misspelled words and synonyms or preferred words. In either case, the thesaurus component will replace the misspelled or non-preferred word with the proper word.
  • an external source may be used by the thesaurus component.
  • the Thesaurus component retrieves the field names from the XMLDocument Thesaurus attribute at block 802 .
  • the Thesaurus component will check to determine if any more fields need to be checked at block 804 . If no more fields need to be checked, the Thesaurus component will exit at block 806 . If a field needs processing, the Thesaurus component will retrieve the corresponding ThesaurusSP and SpellingSP values at block 808 .
  • the Thesaurus component will retrieve the word to check at block 810 , and call the SpellingCheck procedure at block 812 .
  • the SpellingCheck procedure first determines if the SpellingSP value is non-blank at block 814 .
  • the SpellingSP procedure is executed at block 816 .
  • the SpellingSP procedure checks the SpellingSP value against a spellings table that includes the correct word and various misspellings. When the correct word is found, it is substituted for the old value at block 818 .
  • the Thesaurus component moves on to the ThesaurusCheck procedure at block 820 . Similar to the SpellingCheck procedure, the ThesaurusCheck procedure first determines if the ThesaurusSP value is non-blank at block 822 .
  • the ThesaurusSP procedure is executed at block 824 .
  • the ThesaurusSP procedure checks the ThesaurusSP value against a synonym table that includes a preferred word and various synonyms. When the correct word is found, it is substituted for the old value at block 824 .
  • the Thesaurus component then returns to block 804 to determine if any additional fields need to be checked, and continues to loop until all the fields have been processed.
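The per-field loop of the Thesaurus component can be illustrated with a short sketch. The two dictionaries stand in for the spellings table and synonym table, and the `normalize_record` helper and the per-field flags are hypothetical; a real embodiment would call stored procedures rather than in-memory lookups.

```python
# Hypothetical lookup tables: misspelling -> correct word, synonym -> preferred word.
SPELLINGS = {"asprin": "aspirin", "recieve": "receive"}
SYNONYMS = {"acetylsalicylic acid": "aspirin", "tylenol": "acetaminophen"}


def normalize_record(record, fields_to_check):
    """fields_to_check maps a field name to (check_spelling, check_synonyms)."""
    cleaned = dict(record)
    for field, (check_spelling, check_synonyms) in fields_to_check.items():
        word = cleaned.get(field)
        if word is None:
            continue
        key = word.strip().lower()
        if check_spelling and key in SPELLINGS:     # spelling check first
            word = SPELLINGS[key]
            key = word
        if check_synonyms and key in SYNONYMS:      # then the synonym check
            word = SYNONYMS[key]
        cleaned[field] = word
    return cleaned


record = {"drug": "Asprin", "maker": "tylenol", "note": "recieve"}
config = {"drug": (True, True), "maker": (False, True)}
result = normalize_record(record, config)
```

Fields that are not listed in the configuration, such as `note` here, are left untouched.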
  • the record or document 315 a , 315 b is passed to the Merge component at block 612 .
  • the knowledge model 140 typically includes more information on a given entity than any single data source 110 .
  • the Merge component is used to update the knowledge model 140 with the new records or documents 315 a , 315 b stored in the database 360 and assimilate the various pieces of information from the various data sources 110 .
  • the Merge component takes a single record or document 315 a , 315 b and uses it to fill a single row in the database 540 .
  • the Merge component has to determine if the information provided by the record or document 315 a , 315 b complements the existing information or it represents new information. Depending on the comparison, the record or document 315 a , 315 b is either inserted into the database 540 as a new row or used to update the contents of an existing row. In one embodiment, four tools are used to accomplish these tasks.
  • the Merge component may include a LookUp component that is used to determine if the record or document 315 a , 315 b can be integrated into the knowledge model and if the record or document 315 a , 315 b is entirely new, for example, if there is no row in the database 540 that corresponds to this record or document 315 a , 315 b . If a row exists that corresponds to this record or document 315 a , 315 b , the Merge component may utilize a Compare component to determine if the existing row in the database 540 includes null values in the fields to be modified by the record or document 315 a , 315 b to be processed. If not, a new row may be added to the database 540 .
  • an Insert component may be used to add a new row or an Update component may be used to update a row.
  • the Merge component calls the LookUp component at block 902 , which determines if the record or document 315 a , 315 b can be integrated at block 904 . If the record or document 315 a , 315 b cannot be integrated, the Merge component returns this information to the integrate tool 500 at block 906 and exits at block 908 . If the record or document 315 a , 315 b can be integrated, the LookUp component then determines if the record exists at block 910 . If not, the record or document 315 a , 315 b is then passed to the Insert component at block 912 , and the Merge component ends at block 908 .
  • the Compare component is called to determine if the record exists with null information at block 916 . If the record does not include null information, the record or document 315 a , 315 b is passed to the Insert component at block 912 and the Merge component exits at block 908 . If the record does include null information, the record or document 315 a , 315 b is passed to the Update component at block 918 and the Merge component exits at block 908 .
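The branching of the Merge component can be summarized in a small sketch. It models the database 540 as a list of dictionaries, and the `merge` function, its parameters, and the returned labels are hypothetical names chosen for this illustration; the sketch also assumes that a record with null information is handed to an update step, consistent with the Insert and Update components described above.

```python
def merge(record, rows, key_field, can_integrate):
    """Return the action taken: 'deferred', 'inserted', or 'updated'."""
    if not can_integrate(record):            # LookUp: cannot be integrated yet
        return "deferred"
    matches = [row for row in rows if row[key_field] == record[key_field]]
    if not matches:                          # no corresponding row exists
        rows.append(dict(record))            # Insert
        return "inserted"
    row = matches[0]
    # Compare: which fields are null in the existing row but present in the record?
    gaps = [f for f, v in record.items() if v is not None and row.get(f) is None]
    if not gaps:
        rows.append(dict(record))            # nothing to fill in: add a new row
        return "inserted"
    for field in gaps:                       # Update: fill in the null fields
        row[field] = record[field]
    return "updated"


rows = [{"name": "Ada", "email": None}]
always = lambda record: True
r1 = merge({"name": "Ada", "email": "ada@example.org"}, rows, "name", always)
r2 = merge({"name": "Bob", "email": None}, rows, "name", always)
r3 = merge({"name": "Cy", "email": None}, rows, "name", lambda record: False)
```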
  • the LookUp component retrieves the StoredProcedure attribute from the XMLDocument object, as described above, at block 1002 .
  • the LookUp component retrieves from the database 360 the first field that needs to be checked at block 1004 .
  • the LookUp component determines if any additional fields need to be processed. If so, the LookUp component compiles a dataset of all the values that need to be looked up. To do this, the LookUp component retrieves the additional field from the value at blocks 1008 and 1010 , and determines the corresponding table in the database 540 for this field at block 1012 .
  • the LookUp component performs a lookup function on the value for the fields at block 1016 and determines if the ID for that value is found at block 1018 . If the ID is not found, the LookUp component marks the record to be re-integrated later at block 1020 , informs the integrate tool 500 that the record could not be integrated at block 1020 , and exits at block 1024 . If the ID is found, the LookUp component will return to block 1006 and continue compiling the list of fields to look up. Once there are no additional fields to look up, the LookUp component determines if the records exist at block 1022 and exits at block 1024 .
  • the Compare component retrieves the XMLDocument Compare attribute at block 1102 .
  • the Compare component compiles a dataset of all the values in the record that need to be compared at blocks 1104 , 1106 and 1108 . Once this dataset is compiled, the Compare component determines if any values in this dataset are included in the dataset determined by the LookUp component at block 1110 . If so, those records are returned to the Update component, as described above, at block 1114 , and the Compare component exits at block 1116 . If the values are not the same, the Compare component then determines if the values are null. If so, those records are returned to the Update component, as described above, at block 1114 , and the Compare component exits at block 1116 . If the values are not null, the Compare component exits at block 1116 .
  • an exemplary workflow for an Insert component is shown.
  • the Insert component retrieves the stored procedure name that performs the actual inserts at block 1202 .
  • the Insert component retrieves the field values and confidence levels from the XMLDocument object, as well as the values from the database 360 for the record to be inserted at block 1204 .
  • the Insert component builds a call to the stored procedure to insert the new information at block 1206 .
  • the call is executed at block 1208 .
  • the Update component retrieves the name of the stored procedure that performs the actual update at block 1302 .
  • it reads the Update attribute from the XMLDocument object at block 1304 .
  • a check is performed to determine if there are any more fields in the Update attribute that need to be processed at block 1306 . If so, the Update component retrieves the field value and corresponding confidence level from the record or document 315 a , 315 b at blocks 1314 and 1316 , respectively. It then retrieves the confidence level of the current entry in the knowledge model 140 , and compares the two confidence values at block 1320 .
  • the Update component continues in this manner until all of the update fields have been processed. When there are no additional fields to process, the Update component builds the procedure call at block 1308 , executes the call at block 1310 , and exits at block 1312 .
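A field-by-field update of this kind might look like the sketch below. The data layout, the `apply_update` helper, and the `"; "` separator are hypothetical. The sketch also assumes that an append-type field is appended regardless of confidence, which the text does not specify.

```python
def apply_update(row, record, update_spec):
    """
    row:         field -> (value, confidence) currently in the knowledge model
    record:      field -> value from the incoming record or document
    update_spec: field -> (confidence of the incoming source, "update" or "append")
    Returns the changes that were applied.
    """
    changes = {}
    for field, (new_conf, kind) in update_spec.items():
        if field not in record:
            continue
        old_value, old_conf = row.get(field, (None, 0))
        if kind == "append" and old_value:
            # Append type: add to the current information (assumed unconditional).
            changes[field] = (old_value + "; " + record[field], max(old_conf, new_conf))
        elif new_conf > old_conf:
            # Update type: replace only when the new source is more reliable.
            changes[field] = (record[field], new_conf)
    row.update(changes)   # stands in for building and executing the procedure call
    return changes


row = {"title": ("Old title", 10), "notes": ("first note", 5)}
record = {"title": "New title", "notes": "second note"}
spec = {"title": (8, "update"), "notes": (8, "append")}
changes = apply_update(row, record, spec)
```

In this example the title is kept because the incoming confidence (8) is lower than the stored one (10), while the note is appended.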
  • the merge component can be used to merge entities or relationships.
  • a potential problem could arise if the system attempts to merge a relationship before one of the entities of the relationship exists in the knowledge model 140 , such as a relationship that defines a relation between entities a and b before entity b exists in the knowledge model 140 .
  • the re-integration batch process described above may be used to reintroduce these records or documents 315 a , 315 b at a later time.
  • the records or documents 315 a , 315 b may be deleted if their ‘age’ reaches a particular level, for example, 10.
  • either the integration or re-integration process may determine if a record or document 315 a , 315 b covering the same field and from the same data source 110 has been integrated subsequently. If so, the integration of the ‘old’ record or document 315 a , 315 b is no longer necessary, and it may be deleted.
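One pass of such a re-integration batch process could be sketched as follows. The queue layout, the `reintegration_pass` helper, and the rule that the age increases by one per failed pass are assumptions; the threshold of 10 is the example value given above.

```python
MAX_AGE = 10  # example threshold from the text


def reintegration_pass(pending, try_integrate):
    """Retry deferred records and return those that should stay queued."""
    still_pending = []
    for item in pending:
        if try_integrate(item["record"]):
            continue                       # integrated, so drop it from the queue
        item["age"] += 1                   # assumed: one unit of age per failed pass
        if item["age"] < MAX_AGE:
            still_pending.append(item)     # keep for a later pass
        # otherwise the record has aged out and is deleted
    return still_pending


known_entities = {"a"}
pending = [
    {"record": ("a", "b"), "age": 0},      # relationship a-b, entity b not yet known
    {"record": ("a", "c"), "age": 9},      # about to age out
]
can_merge = lambda relation: all(entity in known_entities for entity in relation)

after_first = reintegration_pass(pending, can_merge)
known_entities.add("b")                    # entity b arrives from another source
after_second = reintegration_pass(after_first, can_merge)
```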
  • the relationship generation tool 550 includes three components.
  • the field-to-text relationship tool 1410 generates the field-to-text relationships, as described above.
  • the field-to-text relationship tool 1410 reads each name field from every entity table. For each name field, the field-to-text relationship tool 1410 executes a stored procedure that searches for the given name in various other fields of the entity tables. For example and with reference to FIGS.
  • the field-to-text relationship tool 1410 may select the name field from person entity table and search for that entry in the title and abstract fields of the literature entity table. If a match is found, a field-to-text relationship may be added to the field-to-text relationship table. Alternatively, or in addition to, the field-to-text relationship tool 1410 may retrieve the full text of the article referenced by the literature table (even though the article is not necessarily stored in the knowledge model 140 ) and perform a similar search. It should be apparent to one of ordinary skill in the art that the field-to-text relationship tool 1410 may be configured to select any set of fields from the entity tables and search any other fields in the entity tables. Additionally, the field-to-text relationship tool 1410 may be configured to search the text of unstructured data that is not referenced in any entity in the knowledge model.
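The search for person names in literature fields can be illustrated with in-memory tables. The table contents, the `field_to_text_links` helper, and the use of a case-insensitive substring match are assumptions of this sketch; the description above refers to a stored procedure for the actual search.

```python
def field_to_text_links(people, literature, text_fields=("title", "abstract")):
    """Return (person id, article id, matching field) for every name hit."""
    links = []
    for person in people:
        name = person["name"].lower()
        for article in literature:
            for field in text_fields:
                if name in article.get(field, "").lower():
                    links.append((person["id"], article["id"], field))
    return links


people = [{"id": 1, "name": "Jane Roe"}, {"id": 2, "name": "John Doe"}]
literature = [
    {"id": 10, "title": "Enzyme kinetics", "abstract": "Work by Jane Roe and colleagues."},
    {"id": 11, "title": "A reply to John Doe", "abstract": ""},
]
links = field_to_text_links(people, literature)
```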
  • the relationship generation tool 550 may also be configured to derive relationships by analyzing the data of the knowledge model 140 . These types of relationships are referred to herein as derived relationships.
  • the relationship generation tool may include a transitive relationship tool 1420 .
  • the transitive relationship tool 1420 determines transitive relationships.
  • a transitive relationship is defined as any relationship between two entities that is based on at least two separate relationships.
  • a direct relationship is a relationship that has been determined from information in a data source 110 . These direct relationships may be stored in a direct relationship table. In one embodiment, the transitive relationship tool 1420 selects each row in the direct relationship table.
  • the transitive relationship tool 1420 may search every other row in the direct relationship table for a match. If a match is found, a new relationship is created to reflect the commonality. For example, if a direct relationship is defined between field A and field B, the transitive relationship tool 1420 may search the other rows of the direct relationship table for a match on field A. If a match is found, for example, relating field A to field C, the transitive relationship tool 1420 may create a transitive relationship relating field B to field C. This is an example of a single hop transitive relationship. Preferably, the transitive relationship tool 1420 uses a search depth algorithm to calculate the transitive relationships across n hops. In one embodiment, the transitive relationship may be stored in a transitive relationship table. Alternatively, the transitive relationship may be stored in the same table as the direct relationships. In one embodiment, the transitive relationship definition includes information detailing each hop from the two related entities.
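The derivation of transitive relationships across n hops can be sketched with a breadth-first search over the direct relationships. Breadth-first search is a choice made for this illustration, not necessarily the search depth algorithm referred to above, and the `transitive_relationships` helper is hypothetical. A "hop" is counted here as one intermediate entity, so relating B to C through A is a single hop, and the stored path records each hop.

```python
from collections import deque


def transitive_relationships(direct, max_hops=1):
    """Return {(start, end): path} for pairs linked only through intermediates."""
    neighbors = {}
    for a, b in direct:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found = {}
    for start in sorted(neighbors):
        seen = {start}
        queue = deque([[start]])
        while queue:
            path = queue.popleft()
            if len(path) - 1 > max_hops:     # extending would exceed the hop limit
                continue
            for nxt in sorted(neighbors[path[-1]]):
                if nxt in seen:
                    continue
                seen.add(nxt)
                new_path = path + [nxt]
                if len(new_path) > 2:        # at least one intermediate entity
                    found[(start, nxt)] = new_path
                queue.append(new_path)
    return found


direct = [("A", "B"), ("A", "C"), ("C", "D")]
one_hop = transitive_relationships(direct, max_hops=1)
two_hop = transitive_relationships(direct, max_hops=2)
```

Because directly related entities are reached first, pairs that already have a direct relationship are not reported again as transitive.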
  • the relationship generation tool 550 may also include a proximity relationship tool 1430 . Similar to the field-to-text relationship tool 1410 , the proximity relationship tool 1430 searches the text of either fields in the knowledge model 140 or unstructured files, such as articles. The proximity relationship tool 1430 creates a proximity relationship if two entities appear in the same text. In one embodiment, indexes are created for all the text to be searched (i.e. specific field values or unstructured data items). The indexes are then used to determine if two entities appear in the same text. Alternatively, or in addition to, the proximity relationship tool 1430 may be configured to generate a proximity relationship if the entities appear within a given proximity of each other in the text, for example, within n words of each other.
  • a proximity relationship may be dependent on the type of file being examined. For example, if a text file is used, a proximity relationship may be generated if the two fields appear within the same paragraph. If, however, the file being searched is a spreadsheet, the proximity relationship tool 1430 may generate a proximity relationship if the two fields appear in the same cell, row, or column. In one embodiment, the proximity relationship tool 1430 stores the proximity relationship definition as well as information detailing the rationale behind the generation of the relationship. For example, to define a proximity relationship between two fields, the proximity relationship tool 1430 may store each field, the criteria used to determine the relationship, and the article or reference in which the use of the fields met the given criteria.
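The "within n words" criterion can be sketched as a simple word-position comparison. The `within_n_words` helper is hypothetical and, for brevity, handles single-word terms only; a fuller version would match multi-word entity names and would likely use the indexes mentioned above.

```python
import re


def within_n_words(text, first, second, n):
    """True if the single-word terms occur within n words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


text = "Aspirin was compared with placebo in patients who had reported chronic migraine."
near = within_n_words(text, "aspirin", "placebo", 5)
far = within_n_words(text, "aspirin", "migraine", 5)
```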
  • the navigator tool 170 is a graphical user interface that allows the user to select a record or item from one of the tables of the knowledge model 140 and, in response to the selection, display a set of related items or records. Preferably, only registered users may access the knowledge model 140 . It should be apparent to one of ordinary skill in the art that other implementations of the navigator tool 170 are contemplated herein.
  • the user may be initially directed to log in to the navigator tool 170 in order to access the data stored in the knowledge model 140 . To do so, the user may enter a valid username and password combination. The user may then submit this information to be validated against a database of user information, for example, the user information database 145 .
  • the user may be allowed to select an option to store the username and password information for future log in attempts.
  • the navigator tool 170 includes a toolbar 1510 and a navigation area 1520 .
  • the toolbar 1510 may provide access to a variety of functions of the navigator tool 170 , such as navigation functions, via corresponding interface objects.
  • the toolbar and various capabilities accessible via the toolbar are described in more detail below in reference to FIGS. 19-26 .
  • the navigation area 1520 includes nine visually separated panels 1530 .
  • Each panel 1530 contains information corresponding to an entity of the knowledge model 140 .
  • the information contained in each panel may be referred to as an Item.
  • the center, or active, panel 1530 may display a single Item.
  • Each of the remaining panels 1530 may display zero, one or more Items for a particular entity table of the knowledge model 140 that relate to the Item in active panel 1530 .
  • each Navigator component 1602 , 1702 is the main component that will contain the rest of the components and manage the interface among all the other components of the navigator tool 170 .
  • each Navigator component 1602 , 1702 comprises a ToolTipPanel component 1604 , 1704 , one to nine EntityPanel components 1606 , 1706 , one or more RelationLine components 1620 , 1720 , and an Information Panel component 1622 , 1722 .
  • the ToolTipPanel component 1604 , 1704 may include summary and supporting attribute information about an Item.
  • ToolTipPanel components 1604 , 1704 are implemented as pop-up boxes that appear when a user mouses-over an Item.
  • a ToolTipPanel component 1604 , 1704 for an Item describing a person might contain their age, level within their company, hire date, email address, and the like.
  • the ToolTipPanel component 1604 , 1704 associated with the active Item may be permanently displayed below the Item name.
  • the EntityPanel component 1606 , 1706 includes information corresponding to an entity of the knowledge model 140 .
  • each EntityPanel component 1606 , 1706 consists of a TitleBar component 1608 , 1708 and a body component 1610 , 1710 .
  • the TitleBar component 1608 , 1708 may include information about the entity, such as an entity name and an icon for the entity.
  • the Body component 1610 , 1710 may include information about the Items in an entity table.
  • the Body component 1610 , 1710 includes one or more EntityItem components 1614 and a DataList component 1616 .
  • Each EntityItem component 1614 , 1714 includes information for an item being displayed in the EntityPanel component 1606 , 1706 .
  • the TitleBar component 1608 , 1708 may include node counter information that shows how many Items from the particular entity table are related to the Item in the active panel 1606 , 1706 as well as which items are currently visible.
  • both the EntityItem components 1614 , 1714 and TitleBar components 1608 , 1708 may be associated with PopUpMenu components 1612 , 1712 which provide access to various functions associated with the EntityItem components 1614 , 1714 and TitleBar components 1608 , 1708 , respectively.
  • the navigator tool 170 may include a toolbar 1810 and a navigator component 1820 .
  • the navigator component 1820 includes the elements described above in regard to FIGS. 16 and 17 .
  • the navigator component 1820 includes nine entity components 1830 , each including a title component 1834 and a body component 1836 .
  • the title component 1834 includes the name of an entity table and, where applicable, a node counter that displays the total number of items 1840 included in the corresponding entity components 1832 .
  • the navigator tool 170 may be implemented as a graphical user interface that allows the user to select a record or item from one of the tables of the knowledge model 140 and, in response to the selection, display a set of related items or records.
  • the center entity component 1832 represents the active or selected node 1838 and includes the name of the active node 1838 .
  • the name of active node 1838 may be truncated.
  • the navigator tool 170 may be configured to display a pop-up window displaying various information about the active item 1838 upon a predetermined event, such as an activation of the item 1838 via a single-click, double-click, mouse-over, and the like.
  • the same functionality may be provided for the related nodes 1840 .
  • the remaining entity components 1832 may be used to display those related items 1840 in the knowledge model 140 related to the active node 1838 , for example, by displaying the name of the related item 1840 .
  • indicia of the link type associating each related item 1840 to the active node 1838 may be included.
  • a Roman numeral is used to indicate the link type.
  • direct, or field-to-field, links may be designated by the Roman numeral “I,” field-to-text links by the Roman numeral “II,” transitive links by the Roman numeral “III,” and proximity links by the Roman numeral “IV.”
  • Other exemplary indicia may include using associated font colors, font sizes, or any other visual indicator.
  • the navigator tool 170 may query the knowledge model 140 to determine the related items 1840 in response to the selection of the active node 1838 .
  • queries are performed via a batch process that determines all related items 1840 for each item 1830 of the knowledge model.
  • the queries may be saved, for example in a database table, to vastly improve the performance of the navigator tool 170 .
  • Each entity component 1832 is associated with a particular table of the knowledge model 140 .
  • each entity component 1832 displays all the related items 1840 for the associated table of the knowledge model 140 .
  • the user will be allowed to select the type of entity being displayed in any particular entity component 1832 by associating that entity component 1832 to any table in the knowledge model 140 .
  • the user may configure the entity components 1832 to display the tables of interest to that particular user.
  • the associations of entity components to knowledge model 140 tables may be stored.
  • each entity component 1832 may be configured to display a set number of items 1840 at a given time.
  • navigation tools such as a scroll bar or navigation arrows, may be provided to allow the user to access the entire list of related items 1840 .
  • the entity component 1832 may include node 1840 count information to inform the user of the additional though not visible items 1840 .
  • the entity component 1832 also includes information describing which related items 1840 of the set are currently being displayed. For example, the entity component 1832 may show that items 1840 three through nine of eighty-six total items 1840 are currently being displayed.
  • a scrollbar or other user-interface control may be included to provide access to the items 1840 not being displayed.
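The "items three through nine of eighty-six" indicator reduces to a small calculation. The `visible_range_label` helper and its label format are hypothetical and only illustrate how the visible range could be derived from a scroll position and a panel size.

```python
def visible_range_label(first_index, page_size, total):
    """first_index is zero-based; returns a label such as '3-9 of 86'."""
    if total == 0:
        return "0 of 0"
    start = first_index + 1
    end = min(first_index + page_size, total)   # the last page may be short
    return f"{start}-{end} of {total}"


label = visible_range_label(first_index=2, page_size=7, total=86)
last_page = visible_range_label(first_index=84, page_size=7, total=86)
```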
  • each entity component 1832 may include tools to manipulate the related items 1840 contained therein.
  • each entity component includes a sort button 1842 .
  • the user may activate the sort button 1842 to sort the list of related items 1840 alphabetically or by confidence level. Other criteria such as date restrictions and the like may also be used to sort the related items 1840 .
  • the entity component may also include a filters button 1844 which opens the master filters dialog for the corresponding entity, described in more detail below in reference to FIGS. 26 A-E.
  • each entity component 1832 may be associated with an entity type of the knowledge model 140 .
  • the user may change the entity table associated with any entity component 1832 that displays related items 1840 .
  • the user may activate a menu that includes a list of all possible entity tables of the knowledge model 140 that may be associated with the particular entity component 1832 . This menu may be activated, for example, by selecting the appropriate triangle icon 1848 on the title component 1834 .
  • Other methods of changing the associations between entity components 1832 and entity tables of the knowledge model 140 are contemplated herein.
  • the activation of a particular related item 1840 may cause additional information about that item 1840 and its relationship to the active item 1838 to be displayed.
  • the selection of a related item 1840 may cause a ToolTipPanel component 1850 to be displayed that shows summary information for the related item 1840 .
  • a relationship line 1852 between the related item 1840 and the active item 1838 may also be displayed upon activation of the related item 1840 .
  • the color and style of the relationship line 1852 indicates the type of relationship between the two items. For example, a continuous green line may indicate a field-to-field link, a dashed blue line may indicate a field-to-text link, a dashed and dotted yellow line may indicate a transitive relationship, and a dotted red line may indicate a proximity relationship. It should be readily apparent to one of ordinary skill in the art that the relationship type may be indicated using color, style, size, and the like, or any combination therein.
  • the user may select any of the related items 1840 to make that item the active node 1838 .
  • the navigator tool 170 may update the display accordingly.
  • the navigator tool 170 may submit a new query or retrieve saved queries from the knowledge model 140 and display the related items 1840 to the new active item 1838 .
  • the user may drag-and-drop a related item into the center entity panel to make that item the active item 1838 .
  • the user may access a variety of item-related options via a pop-up menu 1854 , for example, by right clicking on an item.
  • the pop-up menu 1854 provides access to functions to create a bookmark to an item, make an item the home item, email a link to an item, monitor an item, and show link evidence for a related item 1840 .
  • a bookmark is a link to a particular item. Bookmarks are stored in a list of bookmarks accessible via the bookmark button of the navigator toolbar 1810 , described in more detail below.
  • the home item is a special bookmark that can be loaded into the navigator tool by pressing the home button of the navigator toolbar 1810 . Items may be emailed to an individual by selecting the email link option.
  • selecting the email link option launches the default mail program, creates a new e-mail with a system generated introduction, and places the link to the item into the new e-mail message. Additionally, the user may select an item to monitor via the pop-up menu. As described in more detail below, the system 100 may monitor items and notify the user of updates and/or changes to the items. When a user denotes an item to monitor, a date stamp may be created and saved with item information to be used by the system 100 for monitoring.
  • link information for field-to-field links may include the data source from which the link was extracted.
  • Link information for field-to-text links may include a short part or clip of the literature text that surrounds the keyword. In one embodiment, the clip length should be user configurable.
  • the clip length may be initially set to be N words total, such that (N−1)/2 words preceding the item keyword and (N−1)/2 words following the item keyword are included.
  • the clip may include the 15 words preceding and following the item keyword.
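By way of non-limiting illustration, the clip rule described above may be sketched as follows. The function name `extract_clip`, whitespace tokenization, and the punctuation-stripping comparison are assumptions made for this sketch and are not taken from the disclosure; a default of N = 31 yields the 15 preceding and 15 following words mentioned above.

```python
def extract_clip(text: str, keyword: str, total_words: int = 31) -> str:
    """Return the keyword plus up to (N-1)/2 words on each side of it.

    Hypothetical helper: only the first occurrence of a single-word
    keyword is handled, and words are split on whitespace.
    """
    words = text.split()
    side = (total_words - 1) // 2  # words kept before and after the keyword
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if word.strip(".,;:()\"'").lower() == keyword.lower():
            start = max(0, i - side)  # clamp at the start of the text
            return " ".join(words[start:i + side + 1])
    return ""  # keyword not present in the text
```

A clip near the start or end of a text is simply shorter, since fewer than (N−1)/2 words are available on that side.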
  • the link information may include the field-to-field link information for each hop included in the link.
  • link information for proximity links may include the title of the article which mentions both items, as well as a clip for showing each item in context.
  • the navigator tool 170 may include a navigation toolbar 1810 .
  • One embodiment of the navigation toolbar 1810 is shown in FIG. 19 .
  • the navigation toolbar 1810 may contain icons and controls which enable the user to access and configure the various services of the navigator tool 170.
  • the navigation toolbar 1810 may include a back button 1910, a forward button 1912, a stop button 1914, a refresh button 1916, a home button 1918, a history button 1920, a signoff button 1922, a help button 1924, an about button 1926, a search button 1928, a wizards button 1930, a bookmarks button 1932, a monitored items button 1934, a filters button 1936, a source filters drop-down list 1938, a confidence level tool 1940, a context drop down list 1942, and an options button 1944.
  • a back button 1910 may be used to provide access to the functions described below.
  • the navigation tool 170 provides basic navigational functions via the navigation buttons.
  • the back button 1910 and forward button 1912 may be provided to allow the user to step backward and forward, respectively, through their recent navigation history.
  • Activating the stop button 1914 may cancel the submission of a query to the knowledge model 140 .
  • a command is issued to the knowledge model 140 to abort query processing.
  • Preferably, all current client and server processing activity is stopped.
  • Activating the refresh button 1916 may allow the user to manually refresh their current view (for example, by resending a query to the knowledge model 140) and update the display of related items 1840 based on the new results.
  • a home button 1918 may be provided that takes the user to their home view (i.e. home item).
  • the home view is a set node.
  • the home view may be user customizable.
  • a history dialog button 1920 may also be provided to launch a history dialog window.
  • a history dialogue window is shown in FIG. 20 .
  • the dialog window 2000 may show the user's recent navigation history, such as a list of navigation events 2010 .
  • both the node name and entity name are displayed.
  • the user may be able to highlight a navigation event and click a “show” button 2020 to refocus the navigator 170 on that item by making that item the active node 1838 .
  • the user may be able to double-click on a history item and refocus the navigator on that item.
  • the user may close the history dialogue window 2000 by selecting the close button 2030 .
  • the navigator tool 170 may save a set number of history events. This number may be user-configurable.
  • the history events may be stored in the user information database 145 to make the history events session independent and persistent.
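By way of non-limiting illustration, a bounded navigation history with back and forward stepping may be sketched as follows. The class name `NavigationHistory` and its methods are hypothetical, the in-memory list stands in for the user information database 145, and discarding "forward" entries on a new visit is an assumption borrowed from common browser behavior. The default limit of 10 follows the navigation history preference described elsewhere in this document.

```python
class NavigationHistory:
    """Hypothetical bounded history of navigation events."""

    def __init__(self, limit: int = 10):
        self.limit = limit    # user-configurable number of saved events
        self.events = []      # oldest event first
        self.position = -1    # index of the currently active item

    def visit(self, item):
        # A new navigation event discards any "forward" entries (assumption).
        del self.events[self.position + 1:]
        self.events.append(item)
        if len(self.events) > self.limit:
            self.events.pop(0)  # drop the oldest event once the limit is hit
        self.position = len(self.events) - 1

    def back(self):
        if self.position > 0:
            self.position -= 1
        return self.events[self.position] if self.events else None

    def forward(self):
        if self.position < len(self.events) - 1:
            self.position += 1
        return self.events[self.position] if self.events else None
```

Persisting `events` per user, rather than per session, would give the session-independent behavior described above.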
  • the user may be logged out of the navigator tool 170 .
  • via the help button 1924, the user may be provided access to a help system, as known in the art.
  • selection of the help button 1924 may cause an html based help system to be launched in a separate window.
  • a window containing information about the knowledge discovery tool 100 or navigator tool 170 may be opened upon selection of the about button 1926 .
  • This information may include version information, such as a revision number, intellectual property information, such as copyright, patent and/or licensing information, and the like.
  • the options button 1944 may launch the master options dialog.
  • One embodiment of the master options dialog 2100 is shown in FIG. 21 .
  • the master preferences dialog 2100 includes a startup view preference 2110 , a navigation history preference 2120 , a related items limit preference 2130 , an animations preference 2140 , a reset button 2150 , an ok button 2160 , and a cancel button 2170 .
  • the startup view preference 2110 allows the user to select what they want to see upon starting the navigator tool 170 .
  • three options are provided: search, last item visited and home item. If the search option is selected, the navigator tool 170 opens with a search dialog, discussed below in more detail. If the last item visited option is selected, the navigator tool 170 opens with the active node 1838 from when the navigator was last closed. In one embodiment, all filter, confidence, and entity component 1832 association settings may also be preserved. Filter and confidence settings are described in more detail below. Finally, if the home item option is selected, the navigator tool 170 will open with the home item as the active node 1838. Preferably, the home item startup option is the default option and the home view is set to a standard node.
  • the navigation history preference 2120 defines the number of navigation events stored for the navigation session. In one embodiment, the default value is set to 10. Alternatively, or in addition, the navigation history preference 2120 may have a maximum value, for example, 30 events. Preferably, the navigation history preference 2120 is implemented as a drop down box.
  • the related items limit preference 2130 controls the number of records which can be returned to each entity component 1832 in the navigator tool 170 from a query. In one embodiment, a default value is selected to optimally balance performance and quality of the results returned.
  • the animations preference 2140 may allow the user to enable or disable animation rendering effects in the user interface.
  • the animations preference 2140 is implemented as a checkbox and is selected by default.
  • An ok button 2160 may be provided to accept the currently selected preferences, and a cancel button 2170 may be provided to close the dialog 2100 without changing preferences.
  • the search button 1928 may launch a search tool that allows the user to perform a keyword search of the knowledge model 140 .
  • the search dialog may include the appropriate user interface tools to allow the user to specify a search term(s) for querying the knowledge model 140 .
  • a search tool 2200 is shown in FIG. 22 .
  • To perform a search a user may enter one or more keywords of interest in the search term field 2210 .
  • the search will perform a literal search for the entered search terms.
  • a ‘*’ character acts as a wildcard identifier and denotes multiple characters.
  • a search for the keyword “ind*” may cause the knowledge model 140 to search for all terms starting with the text “ind.”
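By way of non-limiting illustration, a literal search in which only the '*' character acts as a multi-character wildcard may be sketched as follows. The function names and the case-insensitive comparison are assumptions of this sketch; the disclosure does not state how the knowledge model 140 evaluates the wildcard.

```python
import re


def compile_term(term: str) -> "re.Pattern":
    """Translate a search term into a regex where only '*' is special."""
    # Escape every literal fragment so that characters such as '.' or '?'
    # are matched literally, then join the fragments with '.*'.
    parts = (re.escape(part) for part in term.split("*"))
    return re.compile(".*".join(parts), re.IGNORECASE)


def search(terms, query):
    """Return the stored terms that match the query in full."""
    pattern = compile_term(query)
    return [term for term in terms if pattern.fullmatch(term)]
```

With this sketch, a query of "ind*" matches terms that start with "ind" but not terms that merely contain it.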
  • the user may also be able to select the type of information they are looking for by checking an entity type from those listed in the menu 2220 of checkboxes below the search field 2210 .
  • one may restrict the results of a search to diseases, genes or literature by selecting the appropriate items in the menu.
  • the user may further refine a search target by selecting “Internal, External, or Both” under the literature entity.
  • the navigator tool 170 searches against all entities by default.
  • the user may click the find button 2212 .
  • the system 100 performs a free-text search against the information stored in the knowledge model 140 .
  • the results are shown in the Search Results field 2230 .
  • the search results include a description 2232 of the item and the entity table 2234 to which it belongs.
  • the user may also be able to view more detailed information in the description field 2240 by selecting the item from the list.
  • the selection of an item is made via a single click on any of the search results.
  • the results may be sorted by name or by type by clicking on the header of the appropriate fields 2232 and 2234 .
  • the user may be able to view the source of a particular search result by clicking the View Web Page button 2250 .
  • the Show button 2252 shows the selected item in the navigation window, making it the active node 1838 .
  • the user may double-click a particular search result to make that item the active item 1838 .
  • the Close button 2254 will close the search dialog box.
  • a bookmarks button 1932 may also be provided on the navigator toolbar 1810.
  • bookmarking an item allows the user to save links to previously viewed items to enable their quick retrieval later.
  • Clicking the bookmarks button 1932 may cause a list of saved bookmarks to be displayed.
  • An exemplary screen shot of the navigator tool 170 with a bookmark list 2310 is shown in FIG. 23A .
  • the bookmark list 2310 includes a list of bookmarks 2312 . Selection of a bookmark 2312 may cause the item that is bookmarked to become the active item 1838 of the navigator tool 170 .
  • bookmarks 2312 include a name.
  • the bookmark 2312 may have the same name as the item that is being bookmarked.
  • the user may rename the bookmark 2312 , for example, by clicking the right mouse button over the bookmark 2312 and selecting “Rename” from a popup menu and typing the new name.
  • Bookmarks 2312 may also be deleted from the list, for example, by clicking the right mouse button over the bookmark and selecting “Delete” from a popup menu.
  • bookmarks 2312 may be organized into folders much like computer files or internet bookmarks are managed.
  • the user may create a folder by clicking the right mouse button over the folder under which the new folder is to be created and selecting a “Create folder” option from a popup menu. Folders may also be renamed using a procedure similar to that for renaming bookmarks 2312 described above. A folder may also be deleted in a similar manner.
  • the user may organize bookmarks 2312 by dragging the bookmark 2312 (i.e., holding the left mouse button over the bookmark and moving the mouse) to the folder. Folders may also be hierarchically arranged in a similar manner. In one embodiment, clicking a folder will alternately show or hide the contents of that folder.
  • bookmarks 2312 may be shared among users.
  • the system 100 may notify users of a common interest in a particular item if one or more colleagues have the same bookmark 2312 by creating a special bookmark that is added to each user's list 2310. Selection of this special bookmark may open a shared bookmarks tool.
  • a shared bookmarks tool 2320 is shown in FIG. 23B .
  • the shared bookmark tool includes information about the subject item 2322 , such as an item name, as well as information about each user sharing the interest.
  • each user's first name 2324, last name 2326, and email address 2326 are displayed. It should be apparent to one of ordinary skill in the art that other information may be displayed.
  • the user may elect not to share a bookmark with colleagues.
  • users may be notified of common bookmarks by other methods, such as via email, instant messages, pop-up windows, and the like.
  • a wizards button 1930 may be provided to allow the user to launch a wizard service.
  • the wizard service may guide the user through a series of screens to formulate a search.
  • the wizard service may assist with the process of identifying existing assets that have indication in a specified area.
  • An exemplary area may be a particular disease.
  • Exemplary assets may be compounds into which research efforts have been invested.
  • the wizard may take user selected diseases and targets as inputs, allow the user to also specify genes, proteins, or pathways, and then return a list of possibly relevant projects, literature and compounds, as related by the knowledge model 140.
  • Exemplary screen shots of a wizard service are shown in FIGS. 24A-L.
  • the user may initially choose to create a new search 2402 or load a previously saved search 2404 . Saved searches may be retrieved via a drop-down list 2406 .
  • the user may define the scope of the analysis. For example, disease experts and target class representatives identify their initial area of interest such as a disease 2408 or a target 2410, or both 2412, through the use of the wizard, as shown in FIG. 24B.
  • the wizard service will guide the user through a series of screens to further define the scope of the search.
  • An exemplary process for determining additional keywords for diseases is shown in FIGS. 24C-D.
  • the wizard service may assist the user to enhance the list of terms 2416 by providing them with a list of diseases including the keyword 2414 , as shown in FIG. 24C .
  • the user may choose 2418 to include known related diseases, such as parent and/or child diseases, as shown in FIG. 24D. If the user so chooses 2418, a list of known related diseases 2420 may be displayed. The user may choose to include any or all of the related diseases in the search.
  • the user may select targets by entering a target keyword 2422 and selecting targets that include the keyword 2424 , as shown in FIG. 24E .
  • the user may be provided with a list of current diseases 2426 and/or targets 2428 and prompted to validate the selections, as shown in FIG. 24F.
  • the user may edit the search parameters associated with each of the diseases 2426 and/or targets 2428 .
  • the user may choose to augment the search to include additional keywords from topics such as genes 2430 , proteins 2432 , and pathways 2434 , as shown in FIG. 24G .
  • the user may be presented with a list of additional keywords and have the ability to select any keywords from the list to include them in the search.
  • the user may be presented with a list 2436 of genes related to the selected diseases and/or targets. The user may then select any of the genes to add them in the search.
  • the user may also provide keywords 2440 to search for additional genes including the keyword 2440 . Genes including the keyword 2440 may be displayed in the corresponding field 2438 , and the user may select any gene from the list to include it in the search.
  • the user may also be able to directly add a known gene to the scope of a search by manually entering the gene into the appropriate field 2442. Similar processes may be included for adding protein and pathway related keywords to the search, as shown in FIGS. 24I and 24J.
  • the result of this first stage is a collection of keywords that are related by the knowledge model 140 .
  • the user may be prompted to validate the scope of the search, as shown in FIG. 24K .
  • a list of all keywords 2444 may be displayed.
  • the user may then choose to go back to any of the previous steps and further refine the scope of the search.
  • the user may also have the option to save 2446 the query at this point.
  • the user may save the query by entering a query name.
  • these keywords may be searched against project and literature databases, for example, by submitting search strings to the database search indices to find, for example, projects and literature that match the list of relevant terms.
  • the wizard service may return a set of projects/literature that match the set of query terms.
  • the results may be ranked and organized by the number of relevant search terms that were found in each search result.
  • a results list of pointers to projects and literature that mention the keyword combinations within the analysis scope may be created.
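By way of non-limiting illustration, ordering results by the number of relevant search terms found in each result may be sketched as follows. The function name `rank_results`, the in-memory dictionary standing in for the project and literature search indices, and the case-insensitive substring test are assumptions of this sketch.

```python
from typing import Dict, List, Tuple


def rank_results(documents: Dict[str, str], keywords: List[str]) -> List[Tuple[str, int]]:
    """Rank documents by how many distinct keywords each one mentions."""
    ranked = []
    for name, text in documents.items():
        lowered = text.lower()
        # Count each keyword once, however often it appears in the text.
        hits = sum(1 for keyword in keywords if keyword.lower() in lowered)
        if hits:
            ranked.append((name, hits))
    # Most keyword matches first; ties are broken alphabetically by name.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

Each returned pair acts as a pointer (the document name) together with its match count, so results that mention no keyword are omitted.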
  • the user may review the results to identify potentially applicable projects, literature, and compounds, as shown in FIG. 24L.
  • selecting an item on the results lists 2448 and 2450 causes that item to become the active node 1838 .
  • that item takes central focus in the navigator tool 170, allowing the user to rapidly build an understanding of the item selected and to explore the knowledge model 140 around the project/asset to add context and explore related literature and topics.
  • a monitored items button 1934 may be provided to launch a monitored items dialog that allows the user to select to be notified when new relationships or literature are discovered for a particular item.
  • An exemplary monitored items dialog 2500 is shown in FIG. 25 .
  • the monitored items dialog 2500 includes a last publication date 2510 which represents the most recent date on which new information was integrated into the knowledge model 140 .
  • the dialog also includes a list 2512 of all monitored items that have changed between the item's associated monitoring date and the last publication date 2510.
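By way of non-limiting illustration, selecting the monitored items to list may be sketched as follows. The function name and the two dictionaries (one holding the date stamp saved when monitoring began, one holding the date of the latest integrated change per item) are hypothetical; the disclosure does not specify how changes are recorded.

```python
from datetime import date


def changed_monitored_items(monitored, last_modified, last_publication):
    """List monitored items changed after their monitoring date stamp.

    monitored:        item name -> date stamp saved when monitoring began
    last_modified:    item name -> date of the latest integrated change
    last_publication: most recent date new information was integrated
    """
    return sorted(
        item
        for item, stamp in monitored.items()
        if item in last_modified
        and stamp < last_modified[item] <= last_publication
    )
```

An item whose latest change predates its monitoring date stamp is not reported, since the user has already seen that state.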
  • a filters button 1936 may be provided to launch a filters dialog that allows the user to establish filter settings that filter the related items 1840 being displayed in an entity component 1832.
  • filters are a mechanism for focusing the results displayed in the navigator tool 170 .
  • the filters are implemented as client-side applications. It should be apparent to one of ordinary skill in the art that the number of filters available for an entity component may vary based on the data stored in the associated knowledge model 140 table. Preferably, several types of filters are accessible directly from the Navigator panels.
  • the entity component 1832 should display a filter icon 1844 if one or more filters exist for that pane. Clicking on the filter icon may also launch the filters dialog.
  • the filters dialog 2600 may include several tabbed filter options pages in which the user may specify various filtering options, such as general filter options, entity filtering options, journal filtering options, publication filtering options, and the like.
  • general filtering options include filter persistence 2602 and internal/external filtering 2604. If the user selects persistent filtering 2602, the navigator tool 170 will filter the results of each navigation event. Otherwise, the navigator tool will only filter the current navigation event. Toggling the internal/external filtering option 2604 allows the user to limit results to data sources that are internal or external to their enterprise.
  • FIG. 26B shows an exemplary screen shot of an entity filter options page.
  • Entity filtering allows the user to specify parameters to filter the display to show only those related items 1840 that relate to specific entities.
  • Exemplary entity filter entities for a pharmaceutical research navigation tool include organisms and phenotypes.
  • the user may specify a list of phenotypes 2610 and/or organisms 2612 to display.
  • the user may edit the list of displayable organisms by selecting the edit list button 2614 , which may launch a dialog 2620 as shown in FIG. 26C .
  • the user may then view a list of available organisms 2622 by entering a keyword or selecting the appropriate first letter of the organism name from the alpha-bar 2626 .
  • the user may then select organisms to add or remove from the list of displayable organisms 2628 .
  • a similar dialog may be used to edit the phenotype list.
  • the user may also be able to filter displayed literature items to those items found in particular journals.
  • An exemplary screen shot of a journal filter options page is shown in FIG. 26D.
  • the user may specify a list of displayable journals 2630 in a similar manner to the organism and phenotype lists described above.
  • the user may specify a threshold journal impact level via the corresponding controls 2632 .
  • the journal impact level corresponds to an ISI journal impact ranking.
  • the user may also be able to filter items based on their publication date, as shown in FIG. 26E .
  • the user may limit the results to items published within a set amount of time 2640 , or to those items published before a certain date 2642 .
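By way of non-limiting illustration, a client-side filter combining the journal list, journal impact threshold, and publication date options may be sketched as follows. The `LiteratureItem` record, its field names, and the `filter_literature` signature are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LiteratureItem:
    """Hypothetical record for a literature item shown in an entity panel."""
    title: str
    journal: str
    impact: float      # journal impact level
    published: date


def filter_literature(items, journals=None, min_impact=0.0,
                      within_days=None, before=None, today=None):
    """Keep only the items that satisfy every filter that is set."""
    today = today or date.today()
    kept = []
    for item in items:
        if journals is not None and item.journal not in journals:
            continue  # journal is not in the displayable list
        if item.impact < min_impact:
            continue  # below the threshold journal impact level
        if within_days is not None and item.published < today - timedelta(days=within_days):
            continue  # published too long ago
        if before is not None and item.published >= before:
            continue  # not published before the cutoff date
        kept.append(item)
    return kept
```

Leaving a parameter at its default disables that filter, which corresponds to a filter option the user has not set.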
  • an internal/external filter button 1938 may be provided to allow the user to select related items 1840 based on the source from which they were obtained, as described above.
  • a confidence box 1940 may also be provided to allow the user to filter the items 1840 displayed in all entity components 1832 based on confidence values. These filters are referred to as confidence filters.
  • the confidence box 1940 may be implemented as a set of buttons, one associated with each confidence value, that allow the user to display or hide links of the corresponding confidence value.
  • the confidence box 1940 may alternatively be implemented as a list of confidence values wherein the navigator tool only displays those items 1840 meeting the selected threshold confidence value.
  • the confidence box 1940 may alternatively be implemented as a text box that establishes a threshold confidence value, such that only those related items 1840 meeting the threshold value are displayed.
  • the threshold confidence value may be indicative of the relationship type, as described above. For example, a threshold value of one may correspond to a direct relationship.
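By way of non-limiting illustration, the per-value display/hide form of the confidence filter may be sketched as follows. The dictionary representation of a link and the function name `visible_links` are hypothetical. Only the toggle form is sketched, because the threshold form depends on how confidence values are ordered, which is not assumed here.

```python
def visible_links(links, enabled_values):
    """Keep the links whose confidence value is currently toggled on."""
    return [link for link in links if link["confidence"] in enabled_values]


# Example: show only links with confidence value 1, which the text
# above associates with a direct relationship.
links = [
    {"item": "gene A", "confidence": 1},
    {"item": "gene B", "confidence": 2},
    {"item": "paper C", "confidence": 3},
]
direct_only = visible_links(links, enabled_values={1})
```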
  • a context drop down list 1942 may be included to provide the user with a list of previously saved, or system provided, stored sets of context.
  • a context represents a set of navigator tool settings.
  • a context includes filter settings, confidence filter settings, and panel layouts.
  • the context drop down list 1942 may also provide access to personal and group default preferences sets associated with login information.
  • Upon selection of a context set, the navigator tool 170 will update the current display to reflect the newly selected context.
  • Alternate context sets containing various sets of information should be readily apparent to one of ordinary skill in the art.
  • master context information may also be stored in a context set.
  • the context drop down list 1942 may display a list of stored preference sets by name.
  • a user may save a new context by selecting a “save new” option from the context drop-down list 1942 .

Abstract

A method for integrating a data item into a knowledge model is provided. The method may include retrieving the data item from a data source, determining if the data item has been previously integrated into the knowledge model, and integrating the data element into the knowledge model if the data item has not been previously integrated.

Description

    RELATED APPLICATIONS
  • The present patent document is a continuation-in-part of application Ser. No. 11/051,733 filed Feb. 4, 2005, which is hereby incorporated by reference.
  • COPYRIGHT NOTICE
  • A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to any software and data as described below and in the drawings hereto: Copyright © 2004, Accenture, All Rights Reserved.
  • BACKGROUND
  • 1. Technical Field
  • The present invention relates generally to an improved method for obtaining, managing, and providing complex, detailed information stored in electronic form in a plurality of sources. The invention may find particular use in organizations that have a need to discover relationships among various pieces of information in a given field.
  • 2. Background Information
  • With the advent of the Internet, the Information Age is upon us. Today, one can find vast amounts of information about any given field or topic at the touch of a button. This information may be available from myriad sources in a variety of commonly recognized formats, such as XML, flat-files, HTML, text, spreadsheets, presentations, diagrams, programming code, databases, etc. This information may also be kept in third-party proprietary formats.
  • Amid this apparent wealth of online information, people still have problems finding the information they need. Online information retrieval may have problems including those related to inappropriate user interface designs and to poor or inappropriate organization and structure of the information. Additionally, the storage of information online in the variety of formats described above also leads to retrieval problems.
  • The existence of a variety of information sources leads to many problems. First, there is a lack of a unified information space. An “information space” is the set of all sources of information that is available to a user at a given time or setting. When information is stored in many formats and at many sources, a user is forced to spend too much overhead on discovering and remembering where different information is located (e.g., web pages, online databases, etc). The user also spends a large amount of time remembering how to find information in each delivery mechanism. Thus, it is difficult for the user to remember where potentially relevant information might be, and the user is forced to jump between multiple different tools to find it.
  • The existence of a variety of information sources also leads to information discovery strategies that lack cohesion. Users must learn to use and remember a variety of metaphors, user interfaces, and searching techniques for each delivery mechanism and class of information. Other problems associated with large numbers of information sources include a lack of links between information sources, and poor delivery mechanisms that don't provide a global view of the information space.
  • To overcome these problems, knowledge discovery tools have been developed. These tools extract information from a plurality of data sources, integrate the information into a common data model, and provide a graphical user interface for viewing the information. While these types of systems have been useful for unifying the information space for a given domain, they still suffer from several limitations.
  • First, each of these data sources typically includes a large volume of files. Thus, collecting and integrating information from a particular data source consumes both time and resources. However, in order to truly represent the information space for a given domain, these tools must collect data from many data sources. Each data source added to the process becomes an additional strain on both resources and time. Moreover, this information must be processed repeatedly to ensure that the data model includes the most current information. Present systems will process a data source in its entirety each and every time an extraction and integration cycle takes place. Accordingly, there is a need for a system that does not waste time and resources re-integrating information that has already been integrated into the data model.
  • Second, integrating information from a plurality of data sources also leads to problems in the consistency of the information contained in the data model. Information in the data model may be overwritten by less reliable data. For example, a particular person's name may be found in both a structured database maintained by the IRS and the text of an email. In present systems, the name sourced from the email may be used to overwrite the name obtained from the IRS if the email is integrated later. Because the information maintained by the IRS is inherently more reliable than the text of an email (because of both source credibility and structured data), there is a need for a system that takes into account the reliability of the information maintained by the data sources before integrating that information into the data model.
  • Third, the information integrated into the data model is inherently related as that information defines the information space for a given domain. Unfortunately, present systems do not fully realize these interrelationships. Typically, relationships between the data in the knowledge model must be defined manually. Manually defining these relationships, however, is a time consuming and expensive process. While systems automatically incorporate those relationships maintained by a particular data source (for example, relationships defined by a database data source), these relationships only represent a fraction of the relationships present among the information contained in the data model. Accordingly, there is a need for a system that automatically discovers and generates various types of relationships.
  • The present invention provides a robust technique for integrating, from a plurality of data sources, only the necessary, most reliable data into a data model, and automatically discovering inter-relationships among the various elements of the data model.
  • BRIEF SUMMARY
  • In one embodiment, a method for integrating a data item into a knowledge model is provided. The method may include retrieving the data item from a data source, determining if the data item has been previously integrated into the knowledge model, and integrating the data element into the knowledge model if the data item has not been previously integrated.
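By way of non-limiting illustration, the retrieve, check, and integrate sequence of this embodiment may be sketched as follows. The `KnowledgeModel` class is a hypothetical in-memory stand-in, and using a SHA-256 fingerprint of the source and content to decide whether an item was previously integrated is an assumption of this sketch, not a technique stated in this summary.

```python
import hashlib


class KnowledgeModel:
    """Hypothetical in-memory stand-in for the knowledge model."""

    def __init__(self):
        self.items = {}      # fingerprint -> (source, content)
        self._seen = set()   # fingerprints of items already integrated

    def integrate(self, source: str, content: str) -> bool:
        """Integrate a data item unless it was integrated before.

        Returns True if the item was newly integrated, False if skipped.
        """
        # Assumption: identical source and content mean "already integrated".
        payload = f"{source}\n{content}".encode("utf-8")
        fingerprint = hashlib.sha256(payload).hexdigest()
        if fingerprint in self._seen:
            return False  # skip re-integration of known data
        self._seen.add(fingerprint)
        self.items[fingerprint] = (source, content)
        return True
```

Repeated extraction cycles over an unchanged data source would then add nothing to the model, which is the saving the Background section calls for.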
  • In another embodiment, a method of integrating a data item into a knowledge model including data collected from a plurality of data sources is provided. The method may include retrieving a data item from one of the plurality of data sources, the data item including a first type of information, determining a reliability value for the one of the plurality of data sources for the first type of information by either leveraging an existing reliability score indicative of a source's reliability or generating an independent reliability score indicative of a source's reliability, and integrating the data item and the reliability value into the knowledge model.
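By way of non-limiting illustration, storing a reliability value alongside each integrated data item may be sketched as follows. The `ReliableStore` class, the numeric scores, and the rule that a value is replaced only by data from an equally or more reliable source are assumptions of this sketch; the rule is motivated by the overwriting problem described in the Background section and is not recited in this embodiment.

```python
class ReliableStore:
    """Hypothetical store that keeps a reliability value with each item."""

    def __init__(self, reliability):
        # (source, information type) -> reliability score (assumed numeric)
        self.reliability = reliability
        # (entity, information type) -> (value, reliability score)
        self.values = {}

    def integrate(self, entity, info_type, value, source):
        """Integrate the value and its reliability; return True if stored."""
        score = self.reliability.get((source, info_type), 0.0)
        current = self.values.get((entity, info_type))
        # Assumed rule: never overwrite with data from a less reliable source.
        if current is None or score >= current[1]:
            self.values[(entity, info_type)] = (value, score)
            return True
        return False
```

Under these assumed scores, a name obtained from a structured database would not be overwritten by a name later parsed from an email.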
  • These and other embodiments and aspects of the invention are described with reference to the noted Figures and the below detailed description of the preferred embodiments.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a diagram representative of an embodiment of a knowledge discovery tool in accordance with an embodiment of the present invention;
  • FIG. 2A is a diagram representative of tables of an exemplary knowledge model in accordance with an embodiment of the present invention;
  • FIG. 2B is a diagram representative of a field-to-field relationship in accordance with an embodiment of the present invention;
  • FIG. 2C is a diagram representative of a field-to-text relationship in accordance with an embodiment of the present invention;
  • FIG. 3 is a diagram representative of an exemplary workflow for an extraction tool in accordance with an embodiment of the present invention;
  • FIG. 4 is a diagram representative of an exemplary workflow for a compare tool in accordance with an embodiment of the present invention;
  • FIG. 5 is a diagram representative of an exemplary workflow for an integration tool in accordance with an embodiment of the present invention;
  • FIG. 6 is a diagram representative of an exemplary workflow for an integrate tool in accordance with an embodiment of the present invention;
  • FIG. 7 is a diagram representative of an exemplary workflow for loading the information of a received message in accordance with an embodiment of the present invention;
  • FIG. 8 is a diagram representative of an exemplary workflow for a Thesaurus component in accordance with an embodiment of the present invention;
  • FIG. 9 is a diagram representative of an exemplary workflow for a Merge component in accordance with an embodiment of the present invention;
  • FIG. 10 is a diagram representative of an exemplary workflow for a LookUp component in accordance with an embodiment of the present invention;
  • FIG. 11 is a diagram representative of an exemplary workflow for a Compare component in accordance with an embodiment of the present invention;
  • FIG. 12 is a diagram representative of an exemplary workflow for an Insert component in accordance with an embodiment of the present invention;
  • FIG. 13 is a diagram representative of an exemplary workflow for an Update component in accordance with an embodiment of the present invention;
  • FIG. 14 is a diagram representative of an exemplary relationship generation tool in accordance with an embodiment of the present invention;
  • FIG. 15 is an exemplary screen shot of a navigator tool in accordance with an embodiment of the present invention;
  • FIG. 16 is a diagram of exemplary components of a navigator tool in accordance with an embodiment of the present invention;
  • FIG. 17 is an exemplary layout for a navigation tool in accordance with an embodiment of the present invention;
  • FIGS. 18A-E are exemplary screen shots of a navigator tool in accordance with an embodiment of the present invention;
  • FIG. 19 is an exemplary screen shot of a navigation toolbar in accordance with an embodiment of the present invention;
  • FIG. 20 is an exemplary screen shot of a history dialogue window in accordance with an embodiment of the present invention;
  • FIG. 21 is an exemplary screen shot of a master options dialog in accordance with an embodiment of the present invention;
  • FIG. 22 is an exemplary screen shot of a search tool in accordance with an embodiment of the present invention;
  • FIGS. 23A-B are exemplary screen shots of a navigator with a bookmark list in accordance with an embodiment of the present invention;
  • FIGS. 24A-L are exemplary screen shots of a wizard service in accordance with an embodiment of the present invention;
  • FIG. 25 is an exemplary screen shot of a monitored items dialog in accordance with an embodiment of the present invention; and
  • FIGS. 26A-E are exemplary screen shots of a filters dialog in accordance with an embodiment of the present invention.
  • DETAILED DESCRIPTION OF THE DRAWINGS AND THE PRESENTLY PREFERRED EMBODIMENTS
  • Referring now to the drawings, and particularly to FIG. 1, there is shown an embodiment of a knowledge discovery system 100 in accordance with the present invention. While the preferred embodiments disclosed herein contemplate a knowledge model based on an information space for pharmaceutical research and the information and data sources related thereto, the present invention is equally applicable for knowledge discovery for any information space defined in any type of data source. Examples of information spaces include software development, drug development, financial research, governmental data administration, clinical trials, and product development and testing.
  • The knowledge discovery system in the embodiment of FIG. 1 includes an extraction tool 120, an integration tool 130, a knowledge model 140, a user information database 145, a middle tier 150, and a web server 160. The extraction tool 120 extracts relevant information from a plurality of data sources 110 a, 110 b, and 110 x. Optionally, the extraction tool 120 may convert the information into a common format 125, such as XML. Preferably, the extraction tool 120 is implemented using BIZTALK SERVER, provided by Microsoft Corporation of Redmond, Wash. Once relevant information is extracted, the integration tool 130 incorporates the information into the knowledge model 140. Preferably, the integration tool is implemented as a COM+ application, using the COMPONENT OBJECT MODEL software architecture provided by Microsoft Corporation of Redmond, Wash. Finally, the middle tier 150 and optional web server 160 are provided to present the information contained in the knowledge model 140 via a navigator tool 170. Preferably, the middle tier is implemented using the .NET framework for Web services and component software provided by Microsoft Corporation of Redmond, Wash. Optionally, access to the knowledge model 140 via the navigator 170 may be restricted to registered users. User information may be stored in the user information database 145.
  • Referring now to FIGS. 2A-C, an exemplary knowledge model 140 for use in one embodiment of the knowledge discovery system 100 is shown. In the embodiment of FIGS. 2A-C, the knowledge model 140 defines an information space for pharmaceutical research, and is represented by a relational database consisting of four distinct types of tables. Entity tables define the content of the information space. In one embodiment, each entity table may include a name field (which may or may not be the primary key for that table) and attribute fields. Exemplary entity tables are shown in FIG. 2A.
  • Field-to-field relation tables define the relationships between the fields in the entity tables. In one embodiment, three types of field-to-field relationships exist. A name-to-name relationship relates two name fields from two entity tables. A name-to-attribute relationship relates the name of one entity to an attribute of another entity. Finally, an attribute-to-attribute relationship relates an attribute of one entity to an attribute of another. An exemplary field-to-field relationship is shown in FIG. 2B. Field-to-text relationships define the relationships between fielded entity terms and the text of unstructured data. For example, the knowledge model 140 may include a person table that defines people in the information space and a literature table that includes fields for various information about an article in the information space, but not necessarily the text of the article. A text search of the article may be performed to determine if the person is mentioned in the article. An exemplary field-to-text relationship is shown in FIG. 2C. In one embodiment, each of the field-to-field relationship tables and the field-to-text relationship tables includes a field for the primary key of each entity referenced as well as managerial data, such as a date created field. The relationship tables are described in more detail below in reference to FIG. 5.
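  • By way of a non-limiting illustration, the entity tables and relationship tables described above might be sketched as follows. The sketch uses an in-memory SQLite database; the table names, column names, sample rows, and helper functions (`build_model`, `relate`, `organizations_for`, `mentioned_in`) are all assumptions made for the example and are not taken from FIGS. 2A-C.

```python
import sqlite3
from datetime import date

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, title TEXT);
CREATE TABLE organization (organization_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
-- Field-to-field (name-to-name) relation table: one primary key per entity
-- referenced, plus managerial data such as the creation date.
CREATE TABLE person_organization (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    organization_id INTEGER NOT NULL REFERENCES organization(organization_id),
    date_created TEXT NOT NULL
);
"""

def build_model() -> sqlite3.Connection:
    """Create an empty, in-memory copy of the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def relate(conn, person_id, organization_id):
    """Record a relationship between a person and an organization."""
    conn.execute(
        "INSERT INTO person_organization VALUES (?, ?, ?)",
        (person_id, organization_id, date.today().isoformat()),
    )

def organizations_for(conn, person_name):
    """Follow the relation table from a person name to organization names."""
    rows = conn.execute(
        "SELECT o.name FROM person p "
        "JOIN person_organization po ON po.person_id = p.person_id "
        "JOIN organization o ON o.organization_id = po.organization_id "
        "WHERE p.name = ? ORDER BY o.name",
        (person_name,),
    )
    return [row[0] for row in rows]

def mentioned_in(person_name, article_text):
    """Field-to-text check: a plain, case-insensitive text search."""
    return person_name.lower() in article_text.lower()

conn = build_model()
conn.execute("INSERT INTO person VALUES (1, 'A. Example', 'Researcher')")
conn.execute("INSERT INTO organization VALUES (1, 'Example Labs', 'Springfield')")
relate(conn, 1, 1)
```

  • In this sketch a field-to-text relationship is computed on demand by `mentioned_in`; an implementation could equally store the result in its own relationship table alongside the primary keys and the creation date.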
  • Referring now to FIG. 3, an exemplary workflow for an extraction tool 120 in accordance with one embodiment is shown. Although the embodiment of FIG. 3 shows certain processes being performed by certain exemplary tools and components, it should be apparent to one of ordinary skill in the art that functions discussed below could be performed by any of the tools or components. In one embodiment, a plurality of data sources 110 is provided. As stated above, each data source may contain thousands of data items stored in various types of files (XML, flat-files, HTML, text, spreadsheets, presentations, diagrams, programming code, databases, etc.) that include information belonging to the given domain. In the embodiment of FIG. 3, each data source 110 may contain documents of any type, created at any point in time. It should be apparent to one of ordinary skill in the art that other repository structures are contemplated by the present invention. For example, one data source may be provided containing every piece of information to be analyzed. In other embodiments, a plurality of data sources may be provided where each data source may contain only documents of certain types, created at discrete segments of time, or created at certain geographical locations.
  • The extraction tool 120 extracts relevant information from the various data sources 110. Preferably, the extraction tool 120 is an asynchronous process that begins processing a file as soon as that file is retrieved from a data source 110. Alternatively, the extraction tool 120 may be implemented as a batch process. In one embodiment, each data source has an associated data source type. In one embodiment, each data source may be either an internal data source or an external data source. An internal data source is a data source that is internal to the organization utilizing the knowledge discovery system 100, whereas an external data source is a data source maintained by any other organization. Alternatively, or in addition, the data source type may define the structure of the data source, such as the underlying directory structure of the data source or the files contained therein. Additionally, the data source may be a simple data source consisting of a single directory, or a complex data source that may store metadata associated with each file kept in the data source. In one embodiment, the extraction tool 120 connects to each of the data sources 110 through data source adapters. An adapter acts as an Application Programming Interface, or API, to the repository. For complex data sources, the data source adapter may allow for the extraction of metadata associated with the information.
  • Exemplary data sources include PUBMED, a service of the National Library of Medicine that includes over 15 million citations for biomedical articles back to the 1950's, SWISS_PROT PROTEIN KNOWLEDGEBASE, which is an annotated protein sequence database established in 1986, the REFERENCE SEQUENCE (RefSeq) collection, which aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms, KEGG, or the Kyoto Encyclopedia of Genes and Genomes, an ongoing project from Kyoto University, LOCUSLINK, a service of the National Library of Medicine that provides a single query interface to curated sequence and descriptive information about genetic loci, MESH, or Medical Subject Headings, the National Library of Medicine's controlled vocabulary thesaurus, OMIM, or Online Mendelian Inheritance in Man, a database catalog of human genes and genetic disorders, and NLM TAXONOMY, a searchable hierarchical index of names of all the organisms for which nucleotide or peptide sequences are to be found in certain data sources. Although each of these data sources constitutes a separate data source, the information in each data source has strong inter-relationships to information in others. Accordingly, the files stored in any particular data source 110 may include information relating the information therein. Referring to FIG. 2B, for example, the PUBMED data source 110 may include information 260 relating a particular person to an organization. This information can be used to determine a relationship definition 266 for a particular person 262 and organization 264 in the knowledge model 140. In one embodiment, a field-to-field relationship that has been determined from information obtained from a data source 110 is called a direct relationship. In one embodiment, all the field-to-field relationships are determined automatically using information from the data sources 110. 
  • In further embodiments, a file may include information relating information in itself to information in other data sources 110, or relating information in two separate data sources 110.
  • Optionally, the extraction tool 120 may include various parameters used to determine whether a document is relevant. These parameters may be predefined or configurable by a user. For example, a user may configure the extraction tool to only extract files from specified directories. It should be apparent to one of ordinary skill in the art that many other relevance parameters—for example, only certain file types or only files that have changed after a certain date—are contemplated by the present invention.
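  • As one hedged illustration of such relevance parameters, the sketch below filters candidate files by directory, file type, and modification date. The `RelevanceParams` class, its field names, and the `is_relevant` function are hypothetical and are offered only to make the three example parameters concrete.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional

@dataclass
class RelevanceParams:
    """Hypothetical, user-configurable relevance parameters."""
    directories: tuple = ()                    # empty means any directory
    extensions: tuple = ()                     # empty means any file type
    changed_after: Optional[datetime] = None   # None means any date

def is_relevant(path: str, modified: datetime, params: RelevanceParams) -> bool:
    """Return True when a file satisfies every configured parameter."""
    p = PurePosixPath(path)
    if params.directories and not any(
        PurePosixPath(d) in p.parents for d in params.directories
    ):
        return False
    if params.extensions and p.suffix.lower() not in params.extensions:
        return False
    if params.changed_after is not None and modified <= params.changed_after:
        return False
    return True

params = RelevanceParams(
    directories=("/data/pubmed",),
    extensions=(".xml",),
    changed_after=datetime(2005, 1, 1),
)
```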
  • As stated above, the extraction process 120 retrieves files from the data sources 110. The original files may include large files that are of varying formats. In one embodiment, the extraction tool 120 includes a cut tool 310 that will split the original files into smaller records or documents 315 a, 315 b, etc. Preferably, the cut tool 310 will process the original files such that each record or document 315 a, 315 b includes one and only one data item. Alternatively, the cut tool 310 may generate records or documents 315 a, 315 b that include more than one data item. The original files may also include the information about all items in a single file, separating the information using delimiters. Exemplary delimiters include “///” or a blank line. A configuration file may be provided that details the delimiters used at a particular source. The configuration file may be used by the cut tool 310 to process the original files. In one embodiment, the cut tool 310 may include a particularized processor application for processing a particular type of original file, such as an XML processor for cutting XML files or a text processor for manipulating text files. In one embodiment, these particularized processor applications are implemented as C# objects using the C# object-oriented programming language from Microsoft Corporation of Redmond, Wash.
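  • A minimal sketch of a text processor for the cut tool 310 follows, assuming a delimiter that occupies a line of its own. It is written in Python rather than C# purely for illustration, and the function name `cut` and its parameters are assumptions. Passing an empty string as the delimiter treats blank lines as record boundaries.

```python
def cut(original: str, delimiter: str = "///") -> list[str]:
    """Split one original file into records, one data item per record.

    A line whose stripped content equals `delimiter` ends the current
    record. With delimiter="" a blank line acts as the separator.
    """
    records, current = [], []
    for line in original.splitlines():
        if line.strip() == delimiter:
            if current:                      # ignore repeated delimiters
                records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:                              # last record may lack a delimiter
        records.append("\n".join(current))
    return records
```

  • In practice the delimiter would be read from the per-source configuration file mentioned above rather than passed as a literal.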
  • Once the files are split into records or documents 315 a, 315 b, the extraction tool 120 preferably stores the records or documents 315 a, 315 b in a file system. Optionally, each record may include an identifier, such as an identifier used by the data source to identify the original file. Exemplary identifiers include a SWISS_PROT ID or a file name. Preferably, the extraction tool 120 also generates a global unique identifier for each record or document 315 a, 315 b. The global unique identifier is used for tracking purposes, as described below.
  • The extraction tool 120 may also be provided with a map tool 320. The map tool 320 functions to standardize the format of each record or document 315 a, 315 b. In one embodiment, the map tool 320 serves two functions. First, the map tool 320 may create a normalized specification for the records or documents 315 a, 315 b, such as a standardized XML specification. For example, records or documents 315 a, 315 b created from flat files may be transformed into XML files, while records or documents 315 a, 315 b created from XML files may be mapped to the standard XML specification. Second, the map tool 320 may remove information from the record or document 315 a, 315 b that is unnecessary to maintaining the knowledge model 140. In one embodiment, the map tool 320 outputs a single text string of XML.
  • Next, the compare tool 330 of the extraction tool 120 compares the records or documents 315 a, 315 b with those records or documents 315 a, 315 b that have already been integrated into the knowledge model so that only records or documents 315 a, 315 b that are new are further processed. As used herein, a new record or document 315 a, 315 b includes records or documents 315 a, 315 b that have been integrated into the knowledge model 140, but have since been modified. In other words, previously entered records or documents 315 a and 315 b may include only those records or documents that have been integrated into the knowledge model 140 and have not changed since their integration. In one embodiment, the compare tool 330 will compute a value based on the record or document 315 a, 315 b. Preferably, the compare tool 330 uses a hash function to generate a hash value for each record or document 315 a, 315 b. The value may be based on any part of the record or document 315 a, 315 b, such as the identifier or the information contained therein.
  • Referring now to FIG. 4, an exemplary workflow for a compare tool 330 is described in more detail. In the embodiment of FIG. 4, each record or document 315 a, 315 b has an associated identifier, DocumentID, as well as a data source identifier, DataSourceID, that identifies the data source from where the record or document 315 a, 315 b was retrieved. First, the compare tool generates a hash value, HashCode, for the current record or document 315 a, 315 b. Next, the compare tool 330 compares the DataSourceID and DocumentID for the current record or document 315 a, 315 b to a table of data for previously entered records or documents 315 a, 315 b at block 402. In the embodiment of FIG. 4, the table includes four items for each previously entered record or document 315 a, 315 b: a DataSourceID that identifies the data source; a DocumentID that identifies the record or document 315 a, 315 b; a first hash code value, HashCodeActual, that represents the hash code value for that record or document 315 a, 315 b before it is integrated into the knowledge model 140; and a second hash code value, HashCodeCompare, that represents the hash code value for that record or document 315 a, 315 b after it has been integrated into the knowledge model 140. If no match is found in the table, this record or document 315 a, 315 b has never been previously integrated into the knowledge model. Accordingly, the compare tool 330 stores the current DataSourceID and DocumentID in the table at block 404. Additionally, the HashCode will be stored as the HashCodeActual value for that record or document 315 a, 315 b. The extraction process 120 will continue to process the record or document 315 a, 315 b at block 406. Once the record or document 315 a, 315 b is integrated into the knowledge model 140, the HashCodeCompare value will be updated with the HashCodeActual value at block 408.
  • If a match is found in the table at block 402, the record or document 315 a, 315 b has been previously integrated into the knowledge model 140. The compare tool 330 next compares HashCodeActual to HashCodeCompare for the match. If the two values are identical, the record or document 315 a, 315 b has not been modified since its last integration. Accordingly, the record or document 315 a, 315 b is not further processed as shown at block 412. If the values are different, the record or document 315 a, 315 b has been modified since its last integration. In this case, the compare tool 330 updates the HashCodeActual value with the current HashCode value at block 414. The extraction process 120 will continue to process the record or document 315 a, 315 b at block 416. Once the record or document 315 a, 315 b is integrated into the knowledge model 140, the HashCodeCompare value will be updated with the HashCodeActual value at block 418.
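  • The compare workflow of FIG. 4 might be sketched as below. Several points are assumptions of the sketch rather than statements of the embodiment: SHA-256 is chosen as the hash function (the description does not name one), the table is held in a Python dictionary, and modification is detected by comparing the freshly computed HashCode against the stored HashCodeCompare, which is one possible reading of blocks 402-418. The class and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    hash_code_actual: str                     # hash of the latest version seen
    hash_code_compare: Optional[str] = None   # hash of the last integrated version

class CompareTool:
    def __init__(self):
        # Keyed by (DataSourceID, DocumentID), as in the table of FIG. 4.
        self.table = {}

    @staticmethod
    def hash_code(content: str) -> str:
        # Assumption: SHA-256 over the record text.
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def needs_processing(self, data_source_id, document_id, content) -> bool:
        """Return True for new or modified records, False for unchanged ones."""
        key = (data_source_id, document_id)
        code = self.hash_code(content)
        entry = self.table.get(key)
        if entry is None:                     # never seen: store and process
            self.table[key] = Entry(hash_code_actual=code)
            return True
        if entry.hash_code_compare == code:   # unchanged since last integration
            return False
        entry.hash_code_actual = code         # modified: record the new hash
        return True

    def mark_integrated(self, data_source_id, document_id) -> None:
        """Called once the record has been integrated into the model."""
        entry = self.table[(data_source_id, document_id)]
        entry.hash_code_compare = entry.hash_code_actual
```

  • Keeping the two hash values separate means a record whose integration fails is retried on the next pass, because HashCodeCompare is only updated after integration succeeds.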
  • At this point, the only records or documents 315 a, 315 b to be processed are new records or documents 315 a, 315 b that have been properly formatted. However, the information contained therein may contain unnecessary information as a consequence of different data sources using different nomenclatures. For example, an attribute name may be preceded by an asterisk or dash. Alternatively, the record or document 315 a, 315 b may contain HTML tag information. In one embodiment, the extraction process 120 is provided with a clean tool 340 that removes this unnecessary information from the records or documents 315 a, 315 b.
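  • A minimal sketch of the kind of cleanup the clean tool 340 might perform on a single name or value is shown below. The regular expressions and the function name `clean_value` are assumptions, and a production implementation would likely be configured per data source.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")        # any HTML-style tag
PREFIX_RE = re.compile(r"^[\s*\-]+")   # leading asterisks, dashes, whitespace

def clean_value(raw: str) -> str:
    """Strip HTML tags, decode HTML entities, and drop leading '*' or '-'."""
    text = TAG_RE.sub("", raw)
    text = unescape(text)
    text = PREFIX_RE.sub("", text)
    return text.strip()
```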
  • Once the record or document 315 a, 315 b is cleaned, the parse tool 350 of the extraction tool 120 restructures the information of the record or document 315 a, 315 b. For example, if a record or document 315 a, 315 b includes an XML attribute tag containing multiple values separated by a delimiter, the parse tool 350 may split each value into separate tags. Additionally, the parse tool 350 may unify the different nomenclatures of the records or documents 315 a, 315 b so that the information from the different sources is coherent. For example, an Organism name may be listed under a first label in one data source 110 and a second label in another data source 110. The parse tool 350 may standardize this information.
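  • The two parse functions described above (splitting multi-valued tags and unifying labels) might be sketched as follows for a flat XML record. The label map, the semicolon delimiter, and the function name `parse_record` are assumptions made for the example; nested elements are not handled by this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source-specific labels to one standard label.
LABEL_MAP = {"OS": "Organism", "organism_name": "Organism"}

def parse_record(xml_text: str, delimiter: str = ";") -> str:
    """Return the record with one tag per value and standardized labels."""
    root = ET.fromstring(xml_text)
    restructured = ET.Element(root.tag, root.attrib)
    for child in root:
        tag = LABEL_MAP.get(child.tag, child.tag)
        for value in (child.text or "").split(delimiter):
            value = value.strip()
            if value:                         # skip empty fragments
                ET.SubElement(restructured, tag).text = value
    return ET.tostring(restructured, encoding="unicode")
```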
  • Finally, the extraction process 120 may store the record or document 315 a, 315 b to be integrated into the knowledge model. In the embodiment of FIG. 3, the record or document 315 a, 315 b is stored in a database 360. Alternatively, the record or document 315 a, 315 b may be stored in any manner that is apparent to one of ordinary skill in the art. In yet another embodiment, the record or document 315 a, 315 b is transmitted as part of a message to the integration process 130. Preferably, the extraction tool 120 stores the record or document 315 a, 315 b in the database 360 and sends a message that alerts the integration tool 130 that a new record or document 315 a, 315 b has been inserted. In one embodiment, the message may be a field in the database 360 which is polled by the integration tool 130.
  • Referring now to FIG. 5, an exemplary workflow for the integration process 130 is shown. Preferably, the integration process is an automatic, asynchronous process that doesn't need the entire extraction process 120 to finish. For example, in the embodiment of FIG. 5, the integration process 130 may begin integrating a record or document 315 a, 315 b as soon as it is inserted into the database 360. This entry may be treated and integrated in an individual way and is passed through several components whose purpose is to integrate this source register into the knowledge model 140. The integration tool 130 provides the users with more complete and higher quality information than the data sources 110 alone.
  • In the embodiment of FIG. 5, the integration tool 130 only processes new records or documents 315 a, 315 b because the extraction tool 120 has removed those records or documents 315 a, 315 b that have not been updated since the prior integration. This greatly improves the performance of the integration tool 130, reducing the time necessary to complete the integration process. However, the integration tool 130 is equally capable of integrating any types of records or documents 315 a, 315 b, regardless of whether they have been integrated previously.
  • In one embodiment, the integration tool 130 may receive information to integrate in three ways. First, the integration tool 130 may receive information from the extraction tool 120. For example, the extraction tool 120 may process a record or document 315 a, 315 b from a data source, insert the record or document 315 a, 315 b into a database 360, and alert the integration tool 130 of the presence of the new information. In response, the integration tool 130 may retrieve the information from the database 360. Second, the integration tool 130 may receive information from a re-integration batch process. The re-integration batch process may build a message (of a similar format to those generated by the extraction process 120) that alerts the integration process 130 to the presence of a record or document 315 a, 315 b that could not be integrated into the knowledge model 140 during a previous attempt. Finally, custom applications may be developed to alert the integration tool 130 of information from particular data sources 110 that do not require the full functionality of the extraction tool 120. For example, an internal data source 110 may be provided that includes files that adhere to a particular structure designed to ease the integration process. It should be apparent to one of ordinary skill in the art that any method may be used to introduce a record or document 315 a, 315 b to the integration tool 130.
  • The integration tool 130 may be provided with an integrate tool 500. The integrate tool 500 performs four primary processes. First, the integrate tool may retrieve a record or document 315 a, 315 b from the database 360. Next, the integrate tool 500 may perform a spell check function 510 on the data included in the record or document 315 a, 315 b to ensure that misspellings in the original data source 110 files do not affect the integrity of the knowledge model 140. Similarly, the integrate tool 500 may perform a synonym function 520 to determine if the current term (as used in the record or document 315 a, 315 b) is a synonym for a preferred name. Finally, the integrate tool 500 may perform a merge function 530 that integrates the record or document 315 a, 315 b into a database 540. In one embodiment, the database 540 represents an un-optimized version of the knowledge model 140. A particular embodiment of the integrate tool 500 is discussed in more detail below in reference to FIGS. 9-13.
  • The integration tool 130 may also be provided with various batch-process tools to perform various functions on the information in the database 540. In the embodiment of FIG. 5, the integration tool 130 includes a relationship generation tool 550 that may be used to analyze the information in the database 540. The relationship generation tool 550 is discussed in more detail below in reference to FIG. 14. Similarly, a synonym synchronization tool 560 may run periodically to update the information in the database 540 in accordance with the most recent list of synonyms. Finally, a transition tool 570 may be provided to optimize the information in the database 540 to create the knowledge model 140. For example, the transition tool 570 may denormalize the information in the database 540, generate cross-over tables, build clustered indices on the primary key columns of various tables of the database 540, and optimize the database 540 for queries and data retrieval tasks. In one embodiment, the transition tool 570 generates a database 580 that is replicated in a production environment as the knowledge model 140.
  • Referring now to FIG. 6, the workflow for one embodiment of the integrate tool 500 is shown. As described above, the extraction tool 120 may send a message to the integration tool 130 to inform the integration tool 130 that new entries in the database 360 need to be integrated into the knowledge model 140. The message may also indicate that the entries are from a particular data source 110. Initially, the integrate tool 500 creates an XMLDocument object. The XMLDocument object is a working version of a standard configuration file. In one embodiment, each data source has a standard configuration file in XML that acts as a template for the integration tool 130. An exemplary configuration file is shown in Table 1. It should be apparent to one of ordinary skill in the art that various types of configuration files in other formats are contemplated by the present invention.
    TABLE 1
    Sample XML Data Source Configuration File
    <DataSource Name="DataSourceName">
      <SDB1Table Name="SDB1TableName">
        <Thesaurus>
          <SDB1FieldThesaurus Name="FieldName"
          ThesaurusSP="ThesaurusSPName"
          SpellingSP="SpellingSPName" />
          ...
        </Thesaurus>
        <LookUp SPName="SPName">
          <SDB1FieldLookUp Name="SDB1FieldName"
          GetIDSP="SPGetID"/>
          ...
        </LookUp>
        <Compare>
          <SDB1FieldCompare Name="SDB1FieldName"
          MDB1Field="MDB1FieldName"/>
          ...
        </Compare>
        <Insert SPName="StoredProcToInsert">
          <SDB1FieldInsert Name="SDB1FieldName"
          ConfidenceValue="ConfidenceValue"/>
          ...
        </Insert>
        <Update SPName="StoredProcToInsert">
          <SDB1FieldUpdate Name="SDB1FieldName"
          ConfidenceValue="ConfidenceValue"
          Type="U/A" DB1FieldName="MDBFieldName"
          MDB1ConfidenceValue="MDB1ConfidenceFieldName"/>
          ...
        </Update>
      </SDB1Table>
      ...
    </DataSource>
  • As shown, the configuration file includes various attributes that are used in later stages of the integration process. The exemplary configuration file includes five attributes: a Thesaurus attribute, a LookUp attribute, a Compare attribute, an Insert attribute, and an Update attribute. The Thesaurus attribute identifies information in the record that needs to be checked for spelling and/or synonyms. In particular, the Thesaurus attributes define a field name to be checked and the values for that field name. This value will appear in ThesaurusSP and SpellingSP attributes if the value needs to be checked for synonyms or spelling, respectively. If the value needs to be checked for both spelling and synonyms, it will appear in both attributes. The LookUp attribute defines each field in the database 360 and the name of a procedure that can be used to look up the associated row in the knowledge model 140. The Compare attribute defines the field in the database 360 and its corresponding field in the knowledge model 140. The Insert attribute defines each field in the database 360 and its corresponding confidence value, as described below. Finally, the Update attribute defines each field in the database 360, its corresponding confidence value, the field type, and the corresponding field in the knowledge model 140 and its corresponding confidence value. In one embodiment, two field types are defined. An update type implies that the value of the field should be replaced in its entirety if a new record or document 315 a, 315 b is to replace an existing entry in the knowledge model 140. An append type implies that the information in the new record or document 315 a, 315 b should be appended to the current information.
  • As stated above, each field includes an associated confidence value. The confidence value is used to score the reliability of the data sources 110 for each field of the knowledge model 140. For example, multiple data sources 110 may include information for one field of the knowledge model 140. To resolve this conflict, the confidence value is used to determine which data source is more reliable for a given field. The confidence value may reflect an internal view of the reliability of the data sources 110 (i.e. the view of the system developers or the organization utilizing the knowledge discovery system 100) or may reflect an external view of reliability (i.e. the use of a third party reliability standard). In one embodiment, the confidence value is a numerical value from 1-20 where the confidence value increases with the reliability of the data source 110. In one embodiment, each of the plurality of data sources 110 is ranked from 1 to N for each field of the knowledge model, where N is the number of data sources 110. Alternatively, multiple data sources 110 may be equally reliable and therefore have the same confidence value. In such an embodiment, the integration tool 130 may choose the most recent record or document 315 a, 315 b as controlling. Alternatively, the integration tool 130 may only replace a field if the confidence value of the new record or document 315 a, 315 b is greater than that of the current entry.
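  • The conflict resolution described above might be sketched as follows. The `FieldValue` class, the `resolve` function, and the `prefer_recent_on_tie` flag are hypothetical names; the flag simply selects between the two tie-handling alternatives given in the text (the most recent record controls, or replacement occurs only on a strictly greater confidence value).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldValue:
    value: str
    confidence: int    # higher means a more reliable source for this field

def resolve(current: Optional[FieldValue], incoming: FieldValue,
            prefer_recent_on_tie: bool = True) -> FieldValue:
    """Pick the controlling value for one field of the knowledge model."""
    if current is None:                        # nothing stored yet
        return incoming
    if incoming.confidence > current.confidence:
        return incoming
    if incoming.confidence == current.confidence and prefer_recent_on_tie:
        return incoming                        # most recent record controls
    return current
```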
  • In one embodiment, a confidence value configuration file is provided. The confidence value configuration file may define a confidence value for each field of the knowledge model 140 and for all data sources 110. Alternatively, a separate confidence value configuration file may be provided for each data source 110. It should be apparent to one of ordinary skill in the art that various ways of tracking the reliability of a data source 110, as well as various types of configuration files, are contemplated herein. An exemplary XML confidence value configuration file is shown in Table 2. In the exemplary confidence value configuration file, each field of each table from each data source 110 is ranked.
    TABLE 2
    Sample XML Confidence Value Configuration File
    <Table>
      <DataSource1>
        <field1> ConfidenceValue </field1>
          ...
        <fieldn> ConfidenceValue </fieldn>
      </DataSource1>
      ...
    </Table>
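  • As a hedged illustration, a file shaped like Table 2 could be loaded into a lookup keyed by data source and field. The concrete element names and numeric values in `SAMPLE`, and the functions `load_confidence_values` and `most_reliable_source`, are invented for the example and are not part of the exemplary file.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the Table 2 layout with made-up values.
SAMPLE = """
<Table>
  <DataSource1>
    <field1>12</field1>
    <field2>7</field2>
  </DataSource1>
  <DataSource2>
    <field1>15</field1>
  </DataSource2>
</Table>
"""

def load_confidence_values(xml_text: str) -> dict:
    """Map (data source, field) to its integer confidence value."""
    root = ET.fromstring(xml_text.strip())
    return {
        (source.tag, field.tag): int(field.text.strip())
        for source in root
        for field in source
    }

def most_reliable_source(values: dict, field_name: str) -> str:
    """Return the data source with the highest confidence for a field."""
    candidates = {
        source: conf
        for (source, field), conf in values.items()
        if field == field_name
    }
    return max(candidates, key=candidates.get)

values = load_confidence_values(SAMPLE)
```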
  • Referring now to FIG. 7, an exemplary workflow for loading the information from a received message into an XMLDocument object is shown. First, the integrate tool 500 reads the configuration file for the data source identified in the message at block 702. Next, a check is performed to determine if an XMLDocument object for this data source is cached at block 704. If so, the XMLDocument object is retrieved from the cache at block 706, and the information from the message is used to populate the ConfigFileContent property of the XMLDocument at block 708. If no XMLDocument object for the particular data source is in the cache, the integrate tool 500 will create a new XMLDocument object and load it with the configuration file information at block 710, put the new XMLDocument in the cache at block 712, and populate the ConfigFileContent property of the XMLDocument with the information from the message at block 708.
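  • The caching behavior of FIG. 7 might be sketched as follows. The `XmlDocument` class is a hypothetical stand-in for the XMLDocument object, and `read_config` is an assumed callable that returns the configuration file text for a data source. As a small departure from the order of blocks 702-704, the sketch defers reading the configuration file until a cache miss occurs.

```python
class XmlDocument:
    """Hypothetical stand-in for the XMLDocument object of FIG. 7."""
    def __init__(self, template: str):
        self.template = template           # configuration file information
        self.config_file_content = None    # populated from each message

_cache = {}   # one XmlDocument per data source

def load_message(data_source: str, message: str, read_config) -> XmlDocument:
    """Return the cached document for the source, populated from the message."""
    doc = _cache.get(data_source)              # block 704: cached?
    if doc is None:
        doc = XmlDocument(read_config(data_source))   # block 710
        _cache[data_source] = doc              # block 712
    doc.config_file_content = message          # block 708
    return doc
```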
  • Returning to FIG. 6, after loading the received message into an XMLDocument object at 602, the integrate tool 500 next checks to see if the message contains a record or document 315 a, 315 b that needs to be integrated into the knowledge model at block 604. If the message does not contain any additional records or documents 315 a, 315 b that need to be integrated, the process ends at block 606. If the message does contain a record or document 315 a, 315 b that needs to be integrated, the integrate method retrieves that record or document 315 a, 315 b from the database 360 at block 608. Next, the integrate tool 500 calls the thesaurus component to perform the spelling function 510 and synonym function 520 at block 610. In the embodiment of FIG. 6, the thesaurus component includes an internal source, such as a database, containing information on commonly misspelled words and synonyms or preferred words. In either case, the thesaurus component will replace the misspelled or non-preferred word with the proper word. Alternatively, an external source may be used by the thesaurus component.
  • Referring to FIG. 8, an exemplary workflow for the Thesaurus component is shown. First, the Thesaurus component retrieves the field names from the XMLDocument Thesaurus attribute at block 802. Next, the Thesaurus component will check to determine if any more fields need to be checked at block 804. If no more fields need to be checked, the Thesaurus component will exit at block 806. If a field needs processing, the Thesaurus component will retrieve the corresponding ThesaurusSP and SpellingSp values at block 808. Next, the Thesaurus component will retrieve the word to check at block 810, and call the SpellingCheck procedure at block 812. The SpellingCheck procedure first determines if the SpellingSp value is non-blank at block 814. If the SpellingSp value is non-blank, the SpellingSp procedure is executed at block 816. In one embodiment, the SpellingSp procedure checks the SpellingSp value against a spellings table that includes the correct word and various misspellings. When the correct word is found, it is substituted for the old value at block 818. At this point, or if the SpellingSp value is determined to be blank at block 814, the Thesaurus component moves on to the ThesaurusCheck procedure at block 820. Similar to the SpellingCheck procedure, the ThesaurusCheck procedure first determines if the ThesaurusSP value is non-blank at block 822. If the ThesaurusSP value is non-blank, the ThesaurusSP procedure is executed at block 824. In one embodiment, the ThesaurusSP procedure checks the ThesaurusSP value against a synonym table that includes a preferred word and various synonyms. When the correct word is found, it is substituted for the old value at block 824. The Thesaurus component then returns to block 804 to determine if any additional fields need to be checked, and continues to loop until all the fields have been processed.
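  • The loop of FIG. 8 may be illustrated with a short sketch. It assumes, for illustration only, that the spellings table and synonym table are in-memory dictionaries rather than database tables, and that each field is configured with a pair of stored procedure names where a blank name means the corresponding check is skipped. The table contents, procedure names, and function name are hypothetical.

```python
# Hypothetical tables: each maps a known variant to its replacement word.
SPELLINGS = {"physican": "physician"}
SYNONYMS = {"physician": "doctor", "medic": "doctor"}


def run_thesaurus(record, field_config):
    """Apply the spelling check, then the synonym check, to each field.

    field_config maps a field name to (spelling_sp, thesaurus_sp); a blank
    string means that the corresponding check is skipped for the field.
    """
    result = dict(record)
    for name, (spelling_sp, thesaurus_sp) in field_config.items():
        if name not in result:
            continue
        word = result[name]                   # block 810
        if spelling_sp:                       # block 814
            word = SPELLINGS.get(word, word)  # blocks 816 and 818
        if thesaurus_sp:                      # block 822
            word = SYNONYMS.get(word, word)   # block 824
        result[name] = word
    return result


record = {"role": "physican", "note": "medic"}
config = {"role": ("sp_spelling", "sp_thesaurus"), "note": ("", "")}
cleaned = run_thesaurus(record, config)
```

In this example the role field is first corrected and then mapped to the preferred word, while the note field is left untouched because both of its procedure names are blank.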
  • Returning to FIG. 6, once the Thesaurus component has finished, the record or document 315 a, 315 b is passed to the Merge component at block 612. In order to make the knowledge model 140 a richer source of information than any one underlying data source 110, the knowledge model 140 typically includes more information on a given entity than any single data source 110. The Merge component is used to update the knowledge model 140 with the new records or documents 315 a, 315 b stored in the database 360 and assimilate the various pieces of information from the various data sources 110. In one embodiment, the Merge component takes a single record or document 315 a, 315 b and uses it to fill a single row in the database 540. First, the Merge component has to determine if the information provided by the record or document 315 a, 315 b complements the existing information or represents new information. Depending on the comparison, the record or document 315 a, 315 b is either inserted into the database 540 as a new row or used to update the contents of an existing row. In one embodiment, four tools are used to accomplish these tasks. First, the Merge component may include a LookUp component that is used to determine if the record or document 315 a, 315 b can be integrated into the knowledge model and if the record or document 315 a, 315 b is entirely new, for example, if there is no row in the database 540 that corresponds to this record or document 315 a, 315 b. If a row exists that corresponds to this record or document 315 a, 315 b, the Merge component may utilize a Compare component to determine if the existing row in the database 540 includes null values in the fields to be modified by the record or document 315 a, 315 b to be processed. If not, a new row may be added to the database 540. If the row does include null values, that information must be updated with the information in the record or document 315 a, 315 b. Depending on the results of these tests, an Insert component may be used to add a new row or an Update component may be used to update a row.
  • Referring now to FIG. 9, an exemplary workflow for an embodiment of the Merge component is shown. First, the Merge component calls the LookUp component at block 902, which determines if the record or document 315 a, 315 b can be integrated at block 904. If the record or document 315 a, 315 b cannot be integrated, the Merge component returns this information to the integrate tool 500 at block 906 and exits at block 908. If the record or document 315 a, 315 b can be integrated, the LookUp component then determines if the record exists at block 910. If not, the record or document 315 a, 315 b is then passed to the Insert component at block 912, and the Merge component ends at block 908. If the record does exist, the Compare component is called to determine if the record exists with null information at block 916. If the record does not include null information, the record or document 315 a, 315 b is passed to the Insert component at block 912 and the Merge component exits at block 908. If the record does include null information, the record or document 315 a, 315 b is passed to the Compare component at block 918 and the Merge component exits at block 908.
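  • The branching of the Merge component may be summarized in a short sketch. It assumes, consistent with the description of the Insert and Update components, that an existing row with null values in the affected fields leads to an update and that every other integrable case leads to an insert. The function name, the dictionary representation of a row, and the returned labels are hypothetical.

```python
def merge_decision(record, existing_row, can_integrate=True):
    """Return the action the Merge component would take for one record.

    record and existing_row map field names to values; existing_row is None
    when the LookUp component finds no corresponding row.
    """
    if not can_integrate:            # blocks 904 and 906
        return "not integrated"
    if existing_row is None:         # block 910
        return "insert"              # block 912
    # Block 916: does the existing row hold null values in the fields that
    # this record would modify?
    touches_null = any(existing_row.get(field) is None for field in record)
    return "update" if touches_null else "insert"


row = {"name": "A. Smith", "email": None}
action = merge_decision({"email": "a.smith@example.com"}, row)
```

Here the existing row has a null email field, so the sketch selects an update rather than a new row.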
  • Referring now to FIG. 10, an exemplary workflow for an embodiment of the LookUp component is shown. First, the LookUp component retrieves the StoredProcedure attribute from the XMLDocument object, as described above, at block 1002. Next, the LookUp component retrieves the first field information from the database 360 that needs to be checked at block 1004. At block 1006, the LookUp component determines if any additional fields need to be processed. If so, the LookUp component compiles a dataset of all the values that need to be looked up. To do this, the LookUp component retrieves the additional field from the value at blocks 1008 and 1010, and determines the corresponding table in the database 540 for this field at block 1012. If the value is not found in the database 540, the LookUp component performs a lookup function on the value for the fields at block 1016 and determines if the ID for that value is found at block 1018. If the ID is not found, the LookUp component checks the record to be re-integrated later at block 1020, informs the integrate tool 500 that the record could not be integrated at block 1020, and exits at block 1024. If the ID is found, the LookUp component will return to block 1006 and continue compiling the list of fields to look up. Once there are no additional fields to look up, the LookUp component determines if the records exist at block 1022 and exits at block 1024.
  • Referring now to FIG. 11, an exemplary workflow for the Compare component is shown. First, the Compare component retrieves the XMLDocument Compare attribute at block 1102. Next, the Compare component compiles a dataset of all the values in the record that need to be compared at blocks 1104, 1106 and 1108. Once this dataset is compiled, the Compare component determines if any values in this dataset are included in the dataset determined by the LookUp component at block 1110. If so, those records are returned to the Update component, as described above, at block 1114, and the Compare component exits at block 1116. If the values are not the same, the Compare component then determines if the values are null. If so, those records are returned to the Update component, as described above, at block 1114, and the Compare component exits at block 1116. If the values are not null, the Compare component exits at block 1116.
  • Referring to FIG. 12, an exemplary workflow for an Insert component is shown. First, the Insert component retrieves the stored procedure name that performs the actual inserts at block 1202. Next, the Insert component retrieves the field values and confidence levels from the XMLDocument object, as well as the values from the database 360 for the record to be inserted at block 1204. Using this information, the Insert component builds a call to the stored procedure to insert the new information at block 1206. Finally, the call is executed at block 1208.
  • Referring now to FIG. 13, an exemplary workflow for an Update component is shown. First, the Update component retrieves the name of the stored procedure that performs the actual update at block 1302. Next, it reads the Update attribute from the XMLDocument object at block 1304. A check is performed to determine if there are any more fields in the Update attribute that need to be processed at 1306. If so, the Update component retrieves the field value and corresponding confidence level from record or document 315 a, 315 b at blocks 1314 and 1316, respectively. It then retrieves the confidence level of the current entry in the knowledge model 140, and compares the two confidence values at block 1320. If the confidence value for the new field is greater than the current confidence value, the new field is marked to ‘Update’, meaning that this new value should replace the existing value, at block 1322. If the current confidence value is greater than the new confidence value, however, the current value will not be overwritten. The Update component continues in this manner until all of the update fields have been processed. When there are no additional fields to process, the Update component builds the procedure call at block 1308, executes the call at block 1310, and exits at block 1312.
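  • The per-field confidence comparison of FIG. 13 may be sketched as follows. The sketch is illustrative only: entries are assumed to be (value, confidence) pairs held in dictionaries, a field with no current entry is assumed to have a confidence of zero, and the function name mark_updates is hypothetical.

```python
def mark_updates(current, incoming):
    """Return the fields of incoming that should replace current entries.

    Both arguments map a field name to a (value, confidence) pair. A field is
    marked for update only when the new confidence is strictly greater.
    """
    updates = {}
    for field, (value, new_conf) in incoming.items():
        # Assumption: a field with no current entry has confidence 0.
        _, current_conf = current.get(field, (None, 0))
        if new_conf > current_conf:          # blocks 1320 and 1322
            updates[field] = (value, new_conf)
    return updates


current = {"email": ("old@example.com", 8), "phone": ("555-0100", 12)}
incoming = {"email": ("new@example.com", 15), "phone": ("555-0199", 12)}
marked = mark_updates(current, incoming)
```

In this example only the email field is marked, because the incoming phone value carries the same confidence as the current entry and therefore does not overwrite it.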
  • Returning to FIG. 6, once the Merge component has finished processing the records or documents 315 a, 315 b from the message, a check is made to determine the result at block 614. If the process was successful, the record or document is removed from the database 360 at block 616, and the integrate tool 500 returns to block 604 to process the next record in the message. Alternatively, if the Merge component was unsuccessful, the age field for the record is incremented at block 618, and the integrate tool 500 returns to block 604 to process the next record in the message. The concept of “age” appears as a result of the automatic, asynchronous nature of the integration process. For example, as described above, the merge component can be used to merge entities or relationships. A potential problem could arise if the system attempts to merge a relationship before one of the entities of the relationship exists in the knowledge model 140, such as a relationship that defines a relation between entities a and b before entity b exists in the knowledge model 140. The re-integration batch process described above may be used to reintroduce these records or documents 315 a, 315 b at a later time. In one embodiment, the records or documents 315 a, 315 b may be deleted if their ‘age’ reaches a particular level, for example, 10. Alternatively, or in addition, either the integration or re-integration process may determine if a record or document 315 a, 315 b covering the same field and from the same data source 110 has been integrated subsequently. If so, the integration of the ‘old’ record or document 315 a, 315 b is no longer necessary, and it may be deleted.
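  • One pass over the pending records, including the aging behavior, may be sketched as follows. The sketch assumes that pending records are dictionaries carrying an age counter and that try_merge is a caller-supplied function returning True on success; both are hypothetical. The threshold of 10 is the example value given above.

```python
MAX_AGE = 10  # example threshold from the description


def process_pending(pending, try_merge):
    """Run one integration pass and return the records still pending."""
    remaining = []
    for record in pending:
        if try_merge(record):
            continue                  # success: record is removed (block 616)
        record["age"] += 1            # failure: age is incremented (block 618)
        if record["age"] < MAX_AGE:
            remaining.append(record)  # kept for a later re-integration pass
        # Otherwise the record has reached MAX_AGE and is dropped.
    return remaining


pending = [
    {"id": "rel-1", "age": 0, "ready": False},
    {"id": "rel-2", "age": 9, "ready": False},
    {"id": "ent-3", "age": 0, "ready": True},
]
still_pending = process_pending(pending, lambda r: r["ready"])
```

After this pass only rel-1 remains: ent-3 was merged, and rel-2 reached the age limit and was discarded.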
  • Referring now to FIG. 14, an exemplary relationship generation tool 550 is shown. As discussed above, the relationship generation tool may be used to analyze the information in the knowledge model 140 and populate various relationship tables. In the embodiment of FIG. 14, the relationship generation tool 550 includes three components. The field-to-text relationship tool 1410 generates the field-to-text relationships, as described above. In one embodiment, the field-to-text relationship tool 1410 reads each name field from every entity table. For each name field, the field-to-text relationship tool 1410 executes a stored procedure that searches for the given name in various other fields of the entity tables. For example and with reference to FIGS. 2A and 2C, the field-to-text relationship tool 1410 may select the name field from the person entity table and search for that entry in the title and abstract fields of the literature entity table. If a match is found, a field-to-text relationship may be added to the field-to-text relationship table. Alternatively, or in addition, the field-to-text relationship tool 1410 may retrieve the full text of the article referenced by the literature table (even though the article is not necessarily stored in the knowledge model 140) and perform a similar search. It should be apparent to one of ordinary skill in the art that the field-to-text relationship tool 1410 may be configured to select any set of fields from the entity tables and search any other fields in the entity tables. Additionally, the field-to-text relationship tool 1410 may be configured to search the text of unstructured data that is not referenced in any entity in the knowledge model.
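  • The person and literature example may be illustrated with a short sketch. It substitutes in-memory lists of dictionaries for the entity tables and a case-insensitive substring test for the stored procedure; the row layout, the sample data, and the function name are assumptions made for illustration.

```python
def field_to_text_links(people, literature, text_fields=("title", "abstract")):
    """Return (person_id, article_id, field) for each name found in a text field."""
    links = []
    for person in people:
        name = person["name"].lower()
        for article in literature:
            for field in text_fields:
                if name in article.get(field, "").lower():
                    links.append((person["id"], article["id"], field))
    return links


people = [{"id": 1, "name": "Jane Doe"}, {"id": 2, "name": "John Roe"}]
literature = [
    {"id": 10, "title": "An interview with Jane Doe", "abstract": "Career notes."},
    {"id": 11, "title": "Market trends", "abstract": "Comments by John Roe."},
]
links = field_to_text_links(people, literature)
```

Each returned tuple corresponds to one candidate row for the field-to-text relationship table, with the matching field recorded as supporting evidence.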
  • The relationship generation tool 550 may also be configured to derive relationships by analyzing the data of the knowledge model 140. These types of relationships are referred to herein as derived relationships. In one embodiment, the relationship generation tool may include a transitive relationship tool 1420. The transitive relationship tool 1420 determines transitive relationships. As used herein, a transitive relationship is defined as any relationship between two entities that is based on at least two separate relationships. As discussed above, a direct relationship is a relationship that has been determined from information in a data source 110. These direct relationships may be stored in a direct relationship table. In one embodiment, the transitive relationship tool 1420 selects each row in the direct relationship table. For each field referred to in the relationship definition, the transitive relationship tool 1420 may search every other row in the direct relationship table for a match. If a match is found, a new relationship is created to reflect the commonality. For example, if a direct relationship is defined between field A and field B, the transitive relationship tool 1420 may search the other rows of the direct relationship table for a match on field A. If a match is found, for example, relating field A to field C, the transitive relationship tool 1420 may create a transitive relationship relating field B to field C. This is an example of a single hop transitive relationship. Preferably, the transitive relationship tool 1420 uses a search depth algorithm to calculate the transitive relationships across n hops. In one embodiment, the transitive relationship may be stored in a transitive relationship table. Alternatively, the transitive relationship may be stored in the same table as the direct relationships. In one embodiment, the transitive relationship definition includes information detailing each hop from the two related entities.
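  • A depth-limited search over the direct relationships may be sketched as follows. This is one possible reading and not the algorithm used by the transitive relationship tool 1420: direct relationships are assumed to be undirected pairs of entity identifiers, a hop is counted as one intermediate entity, and a breadth-first search is substituted for the unspecified search depth algorithm. The function name is hypothetical.

```python
from collections import defaultdict, deque


def transitive_relationships(direct, max_hops=1):
    """Return {(x, y): path} for entities linked through intermediates.

    direct is an iterable of (a, b) pairs. A path such as [B, A, C] has one
    intermediate entity and is treated as a single hop relationship.
    """
    graph = defaultdict(set)
    for a, b in direct:
        graph[a].add(b)
        graph[b].add(a)

    found = {}
    for start in list(graph):
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if len(path) - 1 > max_hops:
                continue  # extending would exceed the allowed number of hops
            for nxt in graph[path[-1]]:
                if nxt in seen:
                    continue
                seen.add(nxt)
                new_path = path + [nxt]
                queue.append(new_path)
                if len(new_path) > 2:  # skip direct (zero hop) relationships
                    pair = tuple(sorted((start, nxt)))
                    found.setdefault(pair, new_path)
    return found


single = transitive_relationships([("A", "B"), ("A", "C")])
chain = [("A", "B"), ("B", "C"), ("C", "D")]
```

The stored path records each hop between the two related entities, which corresponds to the embodiment in which the transitive relationship definition details every hop.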
  • The relationship generation tool 550 may also include a proximity relationship tool 1430. Similar to the field-to-text relationship tool 1410, the proximity relationship tool 1430 searches the text of either fields in the knowledge model 140 or unstructured files, such as articles. The proximity relationship tool 1430 creates a proximity relationship if two entities appear in the same text. In one embodiment, indexes are created for all the text to be searched (i.e. specific field values or unstructured data items). The indexes are then used to determine if two entities appear in the same text. Alternatively, or in addition, the proximity relationship tool 1430 may be configured to generate a proximity relationship if the entities appear within a given proximity of each other in the text, for example, within n words of each other. Other criteria, such as each field appearing at multiple instances within each document, each field appearing in the same sentence, and the like, may also be used to define a proximity relationship. It should be apparent to one of ordinary skill in the art that the determination of a proximity relationship may be dependent on the type of file being examined. For example, if a text file is used, a proximity relationship may be generated if the fields appear within the same paragraph. If, however, the file being searched is a spreadsheet, the proximity relationship tool 1430 may generate a proximity relationship if the two fields appear in the same cell, row, or column. In one embodiment, the proximity relationship tool 1430 stores the proximity relationship definition as well as information detailing the rationale behind the generation of the relationship. For example, to define a proximity relationship between two fields, the proximity relationship tool 1430 may store each field, the criteria used to determine the relationship, and the article or reference in which the use of the fields met the given criteria.
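  • The "within n words" criterion may be sketched as follows. The sketch assumes single-word terms and a simple regular-expression tokenizer, and it scans the text directly rather than using the indexes mentioned above; the function name is hypothetical.

```python
import re


def within_n_words(text, term_a, term_b, n):
    """Return True if the two terms occur within n words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)


sentence = "Aspirin reduces the risk of stroke in adults."
near = within_n_words(sentence, "aspirin", "stroke", 5)
far = within_n_words(sentence, "aspirin", "stroke", 4)
```

In the sample sentence the two terms are five word positions apart, so the relationship is generated for n of 5 but not for n of 4.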
  • Referring to FIGS. 15-26, an exemplary navigator tool 170 is shown. In the embodiment of FIGS. 15-26, the navigator tool 170 is a graphical user interface that allows the user to select a record or item from one of the tables of the knowledge model 140 and, in response to the selection, display a set of related items or records. Preferably, only registered users may access the knowledge model 140. It should be apparent to one of ordinary skill in the art that other implementations of the navigator tool 170 are contemplated herein. In one embodiment, the user may be initially directed to log in to the navigator tool 170 in order to access the data stored in the knowledge model 140. To do so, the user may enter a valid username and password combination. The user may then submit this information to be validated against a database of user information, for example, the user information database 145. Optionally, the user may be allowed to select an option to store the username and password information for future log in attempts.
  • In the embodiment of FIGS. 15-26, the navigator tool 170 includes a toolbar 1510 and a navigation area 1520. The toolbar 1510 may provide access to a variety of functions of the navigator tool 170, such as navigation functions, via corresponding interface objects. The toolbar and various capabilities accessible via the toolbar are described in more detail below in reference to FIGS. 19-26. In one embodiment, the navigation area 1520 includes nine visually separated panels 1530. Each panel 1530 contains information corresponding to an entity of the knowledge model 140. The information contained in each panel may be referred to as an Item. The center, or active, panel 1530 may display a single Item. Each of the remaining panels 1530 may display zero, one or more Items for a particular entity table of the knowledge model 140 that relate to the Item in the active panel 1530.
  • Referring now to FIGS. 16 and 17, a diagram of exemplary components and an exemplary layout for one embodiment of a navigation tool 170 are shown, respectively. The Navigator component 1602, 1702 is the main component that will contain the rest of the components and manage the interface among all the other components of the navigator tool 170. In one embodiment, each Navigator component 1602, 1702 comprises a ToolTipPanel component 1604, 1704, one to nine EntityPanel components 1606, 1706, one or more RelationLine components 1620, 1720, and an Information Panel component 1622, 1722.
  • The ToolTipPanel component 1604, 1704 may include summary and supporting attribute information about an Item. In one embodiment, ToolTipPanel components 1604, 1704 are implemented as pop-up boxes that appear when a user mouses-over an Item. For example, a ToolTipPanel component 1604, 1704 for an Item describing a person might contain their age, level within their company, hire date, email address, and the like. In one embodiment, the ToolTipPanel component 1604, 1704 associated with the active Item may be permanently displayed below the Item name.
  • The EntityPanel component 1606, 1706 includes information corresponding to an entity of the knowledge model 140. In the embodiment of FIGS. 16 and 17, each EntityPanel component 1606, 1706 consists of a TitleBar component 1608, 1708 and a Body component 1610, 1710. The TitleBar component 1608, 1708 may include information about the entity, such as an entity name and an icon for the entity. The Body component 1610, 1710 may include information about the Items in an entity table. In one embodiment, the Body component 1610, 1710 includes one or more EntityItem components 1614 and a DataList component 1616. Each EntityItem component 1614, 1714 includes information for an item being displayed in the EntityPanel component 1606, 1706. Optionally, the TitleBar component 1608, 1708 may include node counter information that shows how many Items from the particular entity table are related to the Item in the active panel 1606, 1706 as well as which items are currently visible. In one embodiment, both the EntityItem components 1614, 1714 and TitleBar components 1608, 1708 may be associated with PopUpMenu components 1612, 1712, which provide access to various functions associated with the EntityItem components 1614, 1714 and TitleBar components 1608, 1708, respectively.
  • Referring now to FIGS. 18A-D, exemplary screen shots of a navigator tool 170 are shown. The navigator tool 170 may include a toolbar 1810 and a navigator component 1820. In the embodiment of FIG. 18, the navigator component 1820 includes the elements described above in regard to FIGS. 16 and 17. As shown, the navigator component 1820 includes nine entity components 1830, each including a title component 1834 and a body component 1836. The title component 1834 includes the name of an entity table and, where applicable, a node counter that displays the total number of items 1840 included in the corresponding entity components 1832.
  • As described above, the navigator tool 170 may be implemented as a graphical user interface that allows the user to select a record or item from one of the tables of the knowledge model 140 and, in response to the selection, display a set of related items or records. In the embodiment of FIG. 18, the center entity component 1832 represents the active or selected node 1838 and includes the name of the active node 1838. In one embodiment, the name of the active node 1838 may be truncated. Optionally, the navigator tool 170 may be configured to display a pop-up window displaying various information about the active item 1838 upon a predetermined event, such as an activation of the item 1838 via a single-click, double-click, mouse-over, and the like. Optionally, the same functionality may be provided for the related nodes 1840.
  • The remaining entity components 1832 may be used to display those related items 1840 in the knowledge model 140 related to the active node 1838, for example, by displaying the name of the related item 1840. Optionally, indicia of the link type associating each related item 1840 to the active node 1838 may be included. In the embodiment of FIG. 18, a roman numeral indicating the type of link is used to indicate the link type. For example, direct, or field-to-field, links may be designated by the roman numeral “I”, field-to-text links by the roman numeral “II”, transitive links by the roman numeral “III,” and proximity links by the roman numeral “IV.” Other exemplary indicia may include using associated font colors, font sizes, or any other visual indicator. In one embodiment, the navigator tool 170 may query the knowledge model 140 to determine the related items 1840 in response to the selection of the active node 1838. Preferably, queries are performed via a batch process that determines all related items 1840 for each item 1830 of the knowledge model. The queries may be saved, for example in a database table, to vastly improve the performance of the navigator tool 170.
  • Each entity component 1832 is associated with a particular table of the knowledge model 140. In one embodiment, each entity component 1832 displays all the related items 1840 for the associated table of the knowledge model 140. Preferably, the user will be allowed to select the type of entity being displayed in any particular entity component 1832 by associating that entity component 1832 to any table in the knowledge model 140. In such an embodiment, the user may configure the entity components 1832 to display the tables of interest to that particular user. Preferably, the associations of entity components to knowledge model 140 tables may be stored.
  • In one embodiment, each entity component 1832 may be configured to display a set number of items 1840 at a given time. In such an embodiment, navigation tools, such as a scroll bar or navigation arrows, may be provided to allow the user to access the entire list of related items 1840. Additionally, the entity component 1832 may include node 1840 count information to inform the user of the additional, though not visible, items 1840. Preferably, the entity component 1832 also includes information describing which related items 1840 of the set are currently being displayed. For example, the entity component 1832 may show that items 1840 three through nine of eighty-six total items 1840 are currently being displayed. In such an embodiment, a scrollbar or other user-interface control may be included to provide access to the items 1840 not being displayed.
  • Optionally, the entity component 1832 may include tools to manipulate the related items 1840 contained therein. In the embodiment of FIG. 18A, each entity component includes a sort button 1842. The user may activate the sort button 1842 to sort the list of related items 1840 alphabetically or by confidence level. Other criteria such as date restrictions and the like may also be used to sort the related items 1840. The entity component may also include a filters button 1844 which opens the master filters dialog for the corresponding entity, described in more detail below in reference to FIGS. 26A-E.
  • As described above, each entity component 1832 may be associated with an entity type of the knowledge model 140. In one embodiment, the user may change the entity table associated with any entity component 1832 that displays related items 1840. As shown in FIG. 18B, the user may activate a menu that includes a list of all possible entity tables of the knowledge model 140 that may be associated with the particular entity component 1832. This menu may be activated, for example, by selecting the appropriate triangle icon 1848 on the title component 1834. Other methods of changing the associations between entity components 1832 and entity tables of the knowledge model 140 are contemplated herein.
  • In one embodiment, the activation of a particular related item 1840 may cause additional information about that item 1840 and its relationship to the active item 1838 to be displayed. As shown in FIG. 18C, the selection of a related item 1840 may cause a ToolTipPanel component 1850 to be displayed that shows summary information for the related item 1840.
  • Additionally, or alternatively, a relationship line 1852 between the related item 1840 and the active item 1838 may also be displayed upon activation of the related item 1840. In the embodiment of FIG. 18C, the color and style of the relationship line 1852 indicate the type of relationship between the two items. For example, a continuous green line may indicate a field-to-field link, a dashed blue line may indicate a field-to-text link, a dashed and dotted yellow line may indicate a transitive relationship, and a dotted red line may indicate a proximity relationship. It should be readily apparent to one of ordinary skill in the art that the relationship type may be indicated using color, style, size, and the like, or any combination thereof.
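  • The link type indicia described for the embodiments of FIGS. 18 and 18C may be collected into a single lookup table, sketched below. The dictionary layout and the helper name describe_link are hypothetical; the numerals, colors, and line styles are the exemplary ones given in the description.

```python
# Exemplary indicia per link type, taken from the embodiments described above.
LINK_INDICIA = {
    "field-to-field": {"numeral": "I", "color": "green", "style": "continuous"},
    "field-to-text": {"numeral": "II", "color": "blue", "style": "dashed"},
    "transitive": {"numeral": "III", "color": "yellow", "style": "dashed and dotted"},
    "proximity": {"numeral": "IV", "color": "red", "style": "dotted"},
}


def describe_link(link_type):
    """Return a short text description of how a link type is displayed."""
    indicia = LINK_INDICIA[link_type]
    return f'{indicia["numeral"]}: {indicia["style"]} {indicia["color"]} line'


label = describe_link("proximity")
```

Keeping the indicia in one table would allow other visual indicators, such as font colors or sizes, to be substituted without changing the display logic.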
  • As shown in FIG. 18D, the user may select any of the related items 1840 to make that item the active node 1838. In response, the navigator tool 170 may update the display accordingly. In one embodiment, the navigator tool 170 may submit a new query or retrieve saved queries from the knowledge model 140 and display the related items 1840 to the new active item 1838. Alternatively, or in addition to, the user may drag-and-drop a related item into the center entity panel to make that item the active item 1838.
  • As shown in FIG. 18E, the user may access a variety of item-related options via a pop-up menu 1854, for example, by right clicking on an item. In one embodiment, the pop-up menu 1854 provides access to functions to create a bookmark to an item, make an item the home item, email a link to an item, monitor an item, and show link evidence for a related item 1840. A bookmark is a link to a particular item. Bookmarks are stored in a list of bookmarks accessible via the bookmark button of the navigator toolbar 1810, described in more detail below. The home item is a special bookmark that can be loaded into the navigator tool by pressing the home button of the navigator toolbar 1810. Items may be emailed to an individual by selecting the email link option. In one embodiment, selecting the email link option launches the default mail program, creates a new e-mail with a system generated introduction, and places the link to the item into the new e-mail message. Additionally, the user may select an item to monitor via the pop-up menu. As described in more detail below, the system 100 may monitor items and notify the user of updates and/or changes to the items. When a user denotes an item to monitor, a date stamp may be created and saved with item information to be used by the system 100 for monitoring.
  • Finally, the user may wish to see information on why a particular related item 1840 is considered related to the active node 1838. To do so, the user may select the show link evidence option from the pop-up menu 1854. Depending on the type of link establishing a connection between the active node 1838 and the related node 1840, different link information may be shown. For example, link information for field-to-field links may include the data source from which the link was extracted. Link information for field-to-text links may include a short part or clip of the literature text that surrounds the keyword. In one embodiment, the clip length should be user configurable. Preferably, the clip length may be initially set to be N words total, such that (N−1)/2 words preceding the item keyword and (N−1)/2 words following the item keyword are included. For example, if the clip is set to 31 words, the clip may include the 15 words preceding and the 15 words following the item keyword. For transitive links, the link information may include the field-to-field link information for each hop included in the link. Finally, link information for proximity links may include the title of the article which mentions both items, as well as a clip showing each item in context.
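  • The clip calculation may be sketched as follows. The sketch assumes a single-word keyword, whitespace tokenization, and that only the first occurrence of the keyword is clipped; the function name and the punctuation handling are hypothetical.

```python
def clip(text, keyword, n=31):
    """Return up to n words centered on the first occurrence of keyword.

    (n - 1) // 2 words are taken on each side of the keyword; fewer words are
    returned when the keyword is near the start or end of the text.
    """
    words = text.split()
    half = (n - 1) // 2
    for i, word in enumerate(words):
        if word.strip(".,;:!?\"'()").lower() == keyword.lower():
            return " ".join(words[max(0, i - half): i + half + 1])
    return None  # keyword not present in the text


text = " ".join(f"w{i}" for i in range(50))
evidence = clip(text, "w20", n=31).split()
```

With a clip length of 31, the sketch returns the keyword together with the 15 words on either side of it, matching the example above.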
  • As described above, the navigator tool 170 may include a navigation toolbar 1810. One embodiment of the navigation toolbar 1810 is shown in FIG. 19. The navigation toolbar 1510 may contain icons and controls which enable the user to access and configure the various services of the navigator tool 170. In one embodiment, the navigation toolbar 1510 may include a back button 1910, a forward button 1912, a stop button 1914, a refresh button 1916, a home button 1918, a history button 1920, a signoff button 1922, a help button 1924, an about button 1926, a search button 1928, a wizards button 1930, a bookmarks button 1932, a monitored items button 1934, a filters button 1936, a source filters drop-down list 1936, a confidence level tool 1940, a context drop down list 1942, and an options button 1944. It should be apparent to one of ordinary skill in the art that various user interface components may be used to provide access to the functions described below.
  • The navigation tool 170 provides basic navigational functions via the navigation buttons. For example, the back button 1910 and forward button 1912 may be provided to allow the user to step through their recent navigation history backwards and forwardly, respectively. Activating the stop button 1914 may cancel the submission of a query to the knowledge model 140. In one embodiment, a command is issued to the knowledge model 140 to abort query processing. Preferably, all current client and server processing activity is stopped. Activating the refresh button 1916 may allow the user to manually refresh their current view (for example, by resending a query to the knowledge model 140) and update the display of related item 1840 based on the new results. A home button 1918 may be provided that takes the user to their home view (i.e. home item). The home view is a set node. The home view may be user customizable.
  • A history dialog button 1920 may also be provided to launch a history dialog window. One embodiment of a history dialogue window is shown in FIG. 20. The dialog window 2000 may show the user's recent navigation history, such as a list of navigation events 2010. In one embodiment, both the node name and entity name are displayed. The user may be able to highlight a navigation event and click a “show” button 2020 to refocus the navigator 170 on that item by making that item the active node 1838. Alternatively, or in addition to, the user may be able to double-click on a history item and refocus the navigator on that item. The user may close the history dialogue window 2000 by selecting the close button 2030. In one embodiment, the navigator tool 170 may save a set number of history events. This number may be user-configurable. Preferably, the history events may be stored in the user information database 145 to make the history events session independent and persistent.
  • Upon selection of the signoff button 1922, the user may be logged out of the navigator tool 170. Upon selection of the help button 1924, the user may be provided access to a help system, as known in the art. In one embodiment, selection of the help button 1924 may cause an html based help system to be launched in a separate window. A window containing information about the knowledge discovery tool 100 or navigator tool 170 may be opened upon selection of the about button 1926. This information may include version information, such as a revision number, intellectual property information, such as copyright, patent and/or licensing information, and the like.
  • The options button 1944 may launch the master options dialog. One embodiment of the master options dialog 2100 is shown in FIG. 21. In the embodiment of FIG. 21, the master preferences dialog 2100 includes a startup view preference 2110, a navigation history preference 2120, a related items limit preference 2130, an animations preference 2140, a reset button 2150, an ok button 2160, and a cancel button 2170.
  • The startup view preference 2110 allows the user to select what they want to see upon starting the navigator tool 170. In one embodiment, three options are provided: search, last item visited and home item. If the search option is selected, the navigator tool 170 opens with a search dialog, discussed below in more detail. If the last item visited option is selected, the navigator tool 170 opens with the active node 1838 from when the navigator was last closed. In one embodiment, all filter, confidence, and entity component 1832 association settings may also be preserved. Filter and confidence settings are described in more detail below. Finally, if the home item option is selected, the navigator tool 170 will open with the home item as the active node 1838. Preferably, the home item startup option is the default option and the home view is set to a standard node.
  • The navigation history preference 2120 defines the number of navigation events stored for the navigation session. In one embodiment, the default value is set to 10. Alternatively, or in addition to, the navigation history preference 2120 may have a maximum value, for example, 30 events. Preferably, the navigation history preference 2120 is implemented as a drop down box.
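A bounded navigation history with back and forward stepping might be sketched as follows. The class name `NavigationHistory`, its methods, and the choice to discard "forward" entries on a new visit are assumptions made for illustration; the default of 10 and the cap of 30 events follow the example values given above.

```python
class NavigationHistory:
    """Illustrative bounded history; not the navigator tool's actual code."""

    MAX_ALLOWED = 30  # example maximum from the text

    def __init__(self, limit=10):
        # Clamp the user preference to the range 1..MAX_ALLOWED.
        self.limit = max(1, min(limit, self.MAX_ALLOWED))
        self.events = []
        self.position = -1

    def visit(self, node):
        # Assumption: a new visit discards any entries ahead of the cursor.
        self.events = self.events[: self.position + 1]
        self.events.append(node)
        # Drop the oldest events once the limit is exceeded.
        overflow = len(self.events) - self.limit
        if overflow > 0:
            self.events = self.events[overflow:]
        self.position = len(self.events) - 1

    def current(self):
        return self.events[self.position] if self.events else None

    def back(self):
        if self.position > 0:
            self.position -= 1
        return self.current()

    def forward(self):
        if self.position < len(self.events) - 1:
            self.position += 1
        return self.current()
```

Persisting the `events` list per user, for example in the user information database 145, would make the history session independent as described earlier.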
  • The related items limit preference 2130 controls the number of records which can be returned to each entity panel 1932 in the navigator tool 170 from a query. In one embodiment, a default value is selected to optimally balance performance and quality of the results returned.
  • The animations preference 2140 may allow the user to enable or disable animation rendering effects in the user interface. Preferably, the animations preference 2140 is implemented as a checkbox and is selected by default. An ok button 2150 may be provided to accept the currently selected preferences, and a cancel button 2160 may be provided to close the dialog 2100 without changing preferences.
  • Referring again to FIG. 19, the search button 1928 may launch a search tool that allows the user to perform a keyword search of the knowledge model 140. The search dialog may include the appropriate user interface tools to allow the user to specify a search term(s) for querying the knowledge model 140. One embodiment of a search tool 2200 is shown in FIG. 22. To perform a search, a user may enter one or more keywords of interest in the search term field 2210. The search will perform a literal search for the entered search terms. In one embodiment, a ‘*’ character acts as a wildcard identifier and denotes multiple characters. For example, a search for the keyword “ind*” may cause the knowledge model 140 to search for all terms starting with the text “ind.” The user may also be able to select the type of information they are looking for by checking an entity type from those listed in the menu 2220 of checkboxes below the search field 2210. For example, one may restrict the results of a search to diseases, genes or literature by selecting the appropriate items in the menu. In one embodiment, the user may further refine a search target by selecting “Internal, External, or Both” under the literature entity. Preferably, the navigator tool 170 searches against all entities by default.
  • To begin a search, the user may click the find button 2212. In response, the system 100 performs a free-text search against the information stored in the knowledge model 140. When the search is complete, the results are shown in the Search Results field 2230. In one embodiment, the search results include a description 2232 of the item and the entity table 2234 to which it belongs. The user may also be able to view more detailed information in the description field 2240 by selecting the item from the list. In one embodiment, the selection of an item is made via a single click on any of the search results. The results may be sorted by name or by type by clicking on the header of the appropriate fields 2232 and 2234. The user may be able to view the source of a particular search result by clicking the View Web Page button 2250. The Show button 2252 shows the selected item in the navigation window, making it the active node 1838. Alternatively, or in addition to, the user may double-click a particular search result to make that item the active item 1838. The Close button 2254 will close the search dialog box.
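The wildcard and entity-type behavior of the search tool could be approximated as shown below. This is a minimal sketch, assuming that items are held in memory as (name, entity type) pairs, that matching is case-insensitive, and that a term must match the whole item name; the function names are hypothetical.

```python
import re


def wildcard_to_regex(term):
    """Translate a literal search term in which '*' stands for any run of characters."""
    pattern = re.escape(term).replace(r"\*", ".*")
    return re.compile(pattern, re.IGNORECASE)


def search(items, term, entity_types=None):
    """Return the (name, entity_type) pairs whose name matches the term.

    entity_types=None mirrors the default of searching against all entities.
    """
    regex = wildcard_to_regex(term)
    return [
        (name, etype)
        for name, etype in items
        if (entity_types is None or etype in entity_types)
        and regex.fullmatch(name)
    ]


# Hypothetical sample data.
sample_items = [
    ("Indomethacin", "compound"),
    ("Influenza", "disease"),
    ("Indolent lymphoma", "disease"),
    ("Asthma", "disease"),
]
disease_hits = search(sample_items, "ind*", entity_types={"disease"})
```

Here `disease_hits` contains only the "Indolent lymphoma" entry, since "Indomethacin" is excluded by the entity-type restriction.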
  • Referring again to FIG. 19, a bookmarks button 1930 may also be provided on the navigator toolbar 1510. As described above, bookmarking an item allows the user to save links to previously viewed items to enable their quick retrieval later. Clicking the Bookmark button 1930 may cause a list of saved bookmarks to be displayed. An exemplary screen shot of the navigator tool 170 with a bookmark list 2310 is shown in FIG. 23A. As shown, the bookmark list 2310 includes a list of bookmarks 2312. Selection of a bookmark 2312 may cause the item that is bookmarked to become the active item 1838 of the navigator tool 170. In one embodiment, bookmarks 2312 include a name. When a bookmark 2312 is created, the bookmark 2312 may have the same name as the item that is being bookmarked. Optionally, the user may rename the bookmark 2312, for example, by clicking the right mouse button over the bookmark 2312 and selecting “Rename” from a popup menu and typing the new name. Bookmarks 2312 may also be deleted from the list, for example, by clicking the right mouse button over the bookmark and selecting “Delete” from a popup menu.
  • Optionally, bookmarks 2312 may be organized into folders much like computer files or internet bookmarks are managed. In one embodiment, the user may create a folder by clicking the right mouse button over the folder under which you want to create your new folder and selecting a “Create folder” option from a popup menu. Folders may also be renamed using a similar procedure as renaming bookmarks 2312 described above. A folder may also be deleted in a similar manner. Once a folder has been created, the user may organize bookmarks 2312 by dragging the bookmark 2312 (i.e., hold the left mouse button over the bookmark and move your mouse) to the folder. Folders may also be hierarchically arranged in a similar manner. In one embodiment, clicking a folder will alternatively show or hide the contents of that folder.
  • Optionally, bookmarks 2312 may be shared among users. In one embodiment, the system 100 may notify users of a common interest in a particular item if one or more colleagues have the same bookmark 2312 by creating a special bookmark that is added to each user's list 2310. Selection of this special bookmark may open a shared bookmarks tool. One embodiment of a shared bookmarks tool 2320 is shown in FIG. 23B. The shared bookmark tool includes information about the subject item 2322, such as an item name, as well as information about each user sharing the interest. In one embodiment, each user's first name 2324, last name 2326, and email address 2326 are displayed. It should be apparent to one of ordinary skill in the art that other information may be displayed. Optionally, the user may elect not to share a bookmark with colleagues. Alternatively, or in addition to, users may be notified of common bookmarks by other methods, such as via email, instant messages, pop-up windows, and the like.
  • Referring again to FIG. 19, a wizards button 1930 may be provided to allow the user to launch a wizard service. In one embodiment, the wizard service may guide the user through a series of screens to formulate a search. For example, the wizard service may assist with the process of identifying existing assets that have indication in a specified area. An exemplary area may be a particular disease. Exemplary assets may be compounds into which research efforts have been invested. For a knowledge model 140 for pharmaceutical research, the wizard may take user selected diseases and targets as inputs, allow the user to also specify genes, proteins, or pathways, and then return a list of possibly relevant projects, literature and compounds, as related by the knowledge model 140.
  • Exemplary screen shots of a wizard service are shown in FIGS. 24A-L. In one embodiment, there are three stages to the workflow of the wizard service. As shown in FIG. 24A, the user may initially choose to create a new search 2402 or load a previously saved search 2404. Saved searches may be retrieved via a drop-down list 2406. Next, the user may define the scope of the analysis. For example, disease experts and target class representatives identify their initial area of interest such as a disease 2408 or a target 2410, or both 2412, through the use of the wizard, as shown in FIG. 24B. Depending on their selection, the wizard service will guide the user through a series of screens to further define the scope of the search.
  • Next, matching terms are searched, and the user is allowed to select one or more matching terms to augment or refine the search parameters. An exemplary process for determining additional keywords for diseases is shown in FIGS. 24C-D. Based on the input keyword 2414, the wizard service may assist the user to enhance the list of terms 2416 by providing them with a list of diseases including the keyword 2414, as shown in FIG. 24C. Additionally, the user may choose 2418 to include known related diseases, such as parent and/or child diseases, as shown in FIG. 24D. If the user so chooses 2418, a list of known related diseases 2420 may be displayed. The user may choose to include any or all of the related diseases in the search. Similarly, the user may select targets by entering a target keyword 2422 and selecting targets that include the keyword 2424, as shown in FIG. 24E. Once the user has defined the diseases and/or targets to include in the search, the user may be provided with a list of current diseases 2426 and/or targets 2428 and prompted to validate the selections, as shown in FIG. 24F. At this point, the user may edit the search parameters associated with each of the diseases 2426 and/or targets 2428.
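One way to detect the common interests that trigger a shared bookmark is to invert the per-user bookmark lists, as in the sketch below. The data layout (a mapping from user name to a set of bookmarked item names) and the `opted_out` set of (user, item) pairs are assumptions introduced for illustration only.

```python
from collections import defaultdict


def find_shared_interests(bookmarks_by_user, opted_out=frozenset()):
    """Map each item to the sorted list of users who share a bookmark on it.

    Only items bookmarked by at least two users are returned. A (user, item)
    pair listed in opted_out models a user who elected not to share.
    """
    users_by_item = defaultdict(set)
    for user, items in bookmarks_by_user.items():
        for item in items:
            if (user, item) not in opted_out:
                users_by_item[item].add(user)
    return {
        item: sorted(users)
        for item, users in users_by_item.items()
        if len(users) > 1
    }


# Hypothetical sample data.
sample_bookmarks = {
    "alice": {"TNF", "asthma"},
    "bob": {"asthma"},
    "carol": {"asthma", "IL-5"},
}
shared = find_shared_interests(sample_bookmarks)
```

Each entry of the result could then back one special bookmark, with the listed users shown in the shared bookmarks tool.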
  • Next, the user may choose to augment the search to include additional keywords from topics such as genes 2430, proteins 2432, and pathways 2434, as shown in FIG. 24G. In each case, the user may be presented with a list of additional keywords and have the ability to select any keywords from the list to include them in the search. As shown in FIG. 24H, the user may be presented with a list 2436 of genes related to the selected diseases and/or targets. The user may then select any of the genes to add them to the search. Optionally, the user may also provide keywords 2440 to search for additional genes including the keyword 2440. Genes including the keyword 2440 may be displayed in the corresponding field 2438, and the user may select any gene from the list to include it in the search. Additionally, or alternatively, the user may also be able to directly add a known gene to the scope of a search by manually entering the gene into the appropriate field 2442. Similar processes may be included for adding protein and pathway related keywords to the search, as shown in FIGS. 24I and 24J.
  • The result of this first stage is a collection of keywords that are related by the knowledge model 140. At this point, the user may be prompted to validate the scope of the search, as shown in FIG. 24K. A list of all keywords 2444 may be displayed. In one embodiment, the user may then choose to go back to any of the previous steps and further refine the scope of the search. The user may also have the option to save 2446 the query at this point. In one embodiment, the user may save the query by entering a query name.
  • Once all the terms have been finalized, the wizard submits the query and collates the results. In one embodiment, these keywords may be searched against project and literature databases, for example, by submitting search strings to the database search indices to find, for example, projects and literature that match the list of relevant terms. The wizard service may return a set of projects/literature that match the set of query terms. Preferably, the query terms may be ranked and organized by the number of relevant search terms that were found in each search result. Thus, a results list of pointers to projects and literature that mention the keyword combinations within the analysis scope may be created.
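Ranking results by the number of query terms they mention might look like the following sketch. It assumes an in-memory mapping from a result identifier to its text and uses simple case-insensitive substring matching in place of the database search indices; the function name and sample identifiers are hypothetical.

```python
def rank_results(documents, terms):
    """Return (doc_id, hit_count) pairs, highest hit count first.

    hit_count is the number of distinct query terms found in the text.
    Documents that match no term are omitted; ties are ordered by doc_id.
    """
    lowered_terms = [t.lower() for t in terms]
    scored = []
    for doc_id, text in documents.items():
        body = text.lower()
        hits = sum(1 for t in lowered_terms if t in body)
        if hits:
            scored.append((doc_id, hits))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical sample data.
sample_documents = {
    "proj-1": "TNF inhibitor for asthma",
    "lit-7": "Asthma and the IL-5 pathway with TNF",
    "lit-9": "Unrelated oncology review",
}
ranking = rank_results(sample_documents, ["asthma", "TNF", "IL-5"])
```

In this example "lit-7" ranks first with three matching terms, "proj-1" follows with two, and "lit-9" is dropped.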
  • Finally, the user reviews the identified results to find potentially applicable projects, literature and compounds, as shown in FIG. 24L. In one embodiment, selecting an item on the results lists 2448 and 2450 causes that item to become the active node 1838. When an item of the results list is selected, that item takes central focus in the navigator tool 170, allowing the user to rapidly build an understanding of the item selected and to explore the knowledge model 140 around the project/asset to add context and explore related literature and topics.
  • Referring again to FIG. 19, a monitored items button 1934 may be provided to launch a monitored items dialog that allows the user to select to be notified when new relationships or literature are discovered for a particular item. An exemplary monitored items dialog 2500 is shown in FIG. 25. The monitored items dialog 2500 includes a last publication date 2510 which represents the most recent date on which new information was integrated into the knowledge model 140. The dialog also includes a list 2512 of all monitored items that have changed since the item's associated monitoring date and the last publication date 2510.
  • Referring again to FIG. 19, a filters button 1936 may be provided to launch a filters dialog that allows the user to establish filter settings that filter the related items 1940 being displayed in an entity component 1932. In general, filters are a mechanism for focusing the results displayed in the navigator tool 170. Preferably, the filters are implemented as client-side applications. It should be apparent to one of ordinary skill in the art that the number of filters available for an entity component may vary based on the data stored in the associated knowledge model 140 table. Preferably, several types of filters are accessible directly from the Navigator panels. The entity component 1832 should display a filter icon 1844 if one or more filters exist for that pane. Clicking on the filter icon may also launch the filters dialog.
  • An exemplary filters dialog 2600 is shown in FIGS. 26A-E. The filters dialog 2600 may include several tabbed filter options pages in which the user may specify various filtering options, such as general filter options, entity filtering options, journal filtering options, publication filtering options, and the like. In one embodiment, general filtering options include filter persistence 2602 and internal/external filtering 2604. If the user selects persistent filtering 2602, the navigator tool 170 will filter the results of each navigation event. Otherwise, the navigator tool will only filter the current navigation event. Toggling the internal/external filtering option 2604 allows the user to limit results to data sources that are internal or external to their enterprise.
  • FIG. 26B shows an exemplary screen shot of an entity filter options page. Entity filtering allows the user to specify parameters to filter the display to show only those related items 1840 that relate to specific entities. Exemplary entity filter entities for a pharmaceutical research navigation tool include organisms and phenotypes. In one embodiment, the user may specify a list of phenotypes 2610 and/or organisms 2612 to display. The user may edit the list of displayable organisms by selecting the edit list button 2614, which may launch a dialog 2620 as shown in FIG. 26C. The user may then view a list of available organisms 2622 by entering a keyword or selecting the appropriate first letter of the organism name from the alpha-bar 2626. The user may then select organisms to add or remove from the list of displayable organisms 2628. A similar dialog may be used to edit the phenotype list.
  • The user may also be able to filter displayed literature items to those items found in particular journals. An exemplary screen shot of a journal filter options page is shown in FIG. 26D. The user may specify a list of displayable journals 2630 in a similar manner to the organism and phenotype lists described above. Additionally, the user may specify a threshold journal impact level via the corresponding controls 2632. In one embodiment, the journal impact level corresponds to an ISI journal impact ranking. Finally, the user may also be able to filter items based on their publication date, as shown in FIG. 26E. In one embodiment, the user may limit the results to items published within a set amount of time 2640, or to those items published before a certain date 2642.
  • Referring again to FIG. 19, an internal/external filter button 1938 may be provided to allow the user to select related items 1940 based on the source from which they were obtained, as described above. A confidence box 1940 may also be provided to allow the user to filter the items 1940 displayed in all entity components 1930 based on confidence values. These filters are referred to as confidence filters. In one embodiment, the confidence box 1940 is implemented as a set of buttons, one associated with each confidence value, that allow the user to display/hide links of the corresponding confidence value. Alternatively, the confidence button 1940 may be implemented as a list of confidence values wherein the navigator tool only displays those items 1940 meeting the selected threshold confidence value. In yet another embodiment, the confidence button 1940 may be implemented as a text box that establishes a threshold confidence value and only those related items 1940 meeting the threshold value may be displayed. The threshold confidence value may be indicative of the relationship type, as described above. For example, a threshold value of one may correspond to a direct relationship.
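The journal and publication-date filters could be combined in a single client-side predicate, as in the sketch below. The record fields, the class names, and the treatment of a higher impact number as "better" are assumptions for illustration and are not drawn from the filters dialog 2600 itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Set


@dataclass
class LiteratureItem:
    title: str
    journal: str
    impact: float      # assumed: higher means more influential
    published: date


@dataclass
class LiteratureFilter:
    journals: Optional[Set[str]] = None   # displayable journals, None = all
    min_impact: Optional[float] = None    # threshold journal impact level
    within_days: Optional[int] = None     # published within a set amount of time
    before: Optional[date] = None         # published before a certain date

    def accepts(self, item: LiteratureItem, today: date) -> bool:
        if self.journals is not None and item.journal not in self.journals:
            return False
        if self.min_impact is not None and item.impact < self.min_impact:
            return False
        if self.within_days is not None:
            if item.published < today - timedelta(days=self.within_days):
                return False
        if self.before is not None and item.published >= self.before:
            return False
        return True


def apply_filter(items, flt, today):
    """Keep only the items accepted by the filter."""
    return [item for item in items if flt.accepts(item, today)]
```

Leaving a setting as `None` disables that criterion, which mirrors a filter page the user has not filled in.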
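The button-per-value variant of the confidence filter can be modeled as a set of visible confidence values that each button toggles. The sketch below assumes related items are dictionaries carrying an integer `confidence` key; the class name and that key are hypothetical, and the threshold variants are deliberately left out because the text does not state which direction of comparison counts as meeting the threshold.

```python
class ConfidenceFilter:
    """Illustrative show/hide filter keyed on confidence values."""

    def __init__(self, all_values):
        # Assumption: every confidence value starts out visible.
        self.visible = set(all_values)

    def toggle(self, value):
        """Model a click on the button for one confidence value."""
        if value in self.visible:
            self.visible.discard(value)
        else:
            self.visible.add(value)

    def apply(self, related_items):
        """Keep the related items whose confidence value is currently shown."""
        return [item for item in related_items if item["confidence"] in self.visible]


# Hypothetical sample data: value 1 is taken to mean a direct relationship.
sample_related = [
    {"name": "gene-A", "confidence": 1},
    {"name": "gene-B", "confidence": 2},
    {"name": "paper-C", "confidence": 3},
]
confidence_filter = ConfidenceFilter(all_values=[1, 2, 3])
confidence_filter.toggle(3)  # hide links with confidence value 3
shown = confidence_filter.apply(sample_related)
```

After the toggle, `shown` holds only the two items with confidence values 1 and 2.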
  • A context drop down list 1942 may be included to provide the user with a list of previously saved, or system provided, stored sets of context. A context represents a set of navigator tool settings. In one embodiment, a context includes filter settings, confidence filter settings, and panel layouts. Alternatively, or in addition to, the context drop down list 1942 may also provide access to personal and group default preferences sets associated with login information. Upon selection of a context set, the navigator tool 170 will update the current display to reflect the newly selected context. Alternate context sets containing various sets of information should be readily apparent to one of ordinary skill in the art. For example, master context information may also be stored in a context set. The context drop down list 2090 may display a list of stored preference sets by name. In one embodiment, a user may save a new context by selecting a “save new” option from the context drop-down list 1942.
  • It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this invention.

Claims (20)

1. A method for integrating a data item into a knowledge model, the method comprising:
retrieving the data item from a data source;
determining if the data item has been previously integrated into the knowledge model; and
integrating the data element into the knowledge model if the data item has not been previously integrated.
2. The method of claim 1, wherein determining if the data item has been previously integrated further comprising:
generating a value based in part on the data item; and
comparing the value to a table of values generated for previously integrated data items.
3. The method of claim 2, further comprising storing the generated value in the table if the value is not in the table.
4. The method of claim 2, wherein the value is generated by a hash function.
5. The method of claim 2, wherein the data item includes a title and content.
6. The method of claim 5, wherein the value includes an identifier and a sub-value, the identifier based on at least one designator selected from the group consisting of the title and the data source, the sub-value based in part on the content, the identifier and sub-value forming an identifier and sub-value pair,
where the table of values includes identifier and sub-value pairs,
where the comparing further comprises comparing the identifier and sub-value pair to the table of identifier and sub-value pairs, and
where the integrating further comprises integrating the data item into the knowledge model if the identifier and sub-value pair is not in the table.
7. A method of integrating a data item into a knowledge model, the knowledge model including data collected from a plurality of data sources, the method comprising:
retrieving a data item from one of the plurality of data sources, the data item including a first type of information;
determining a reliability value for the one of the plurality of data sources for the first type of information by either leveraging an existing reliability score indicative of a source's reliability or generating an independent reliability score indicative of a source's reliability; and
integrating the data item and the reliability value into the knowledge model.
8. The method of claim 7, wherein the integrating includes inserting the data item into a field of the knowledge model.
9. The method of claim 8 further comprising:
determining if the field includes previously integrated information, the previously integrated information having an associated previous reliability value;
comparing the reliability value to the previous reliability value; and
integrating the data item if the reliability value is greater than the previous reliability value.
10. The method of claim 7, wherein the reliability value is based in part on an external ranking of data source reliability.
11. A system for integrating a data item into a knowledge model, the system comprising:
a retrieval tool adapted for retrieving the data item from a data source; and
an integration tool adapted for determining if the data item has been previously integrated into the knowledge model and integrating the data element into the knowledge model if the data item has not been previously integrated.
12. The system of claim 11, wherein the integrations tool is further adapted for generating a value based in part on the data item and comparing the value to a table of values generated for previously integrated data items.
13. The system of claim 12, wherein the integrations tool is further adapted for storing the generated value in the table if the value is not in the table.
14. The system of claim 12, wherein the value is generated by a hash function.
15. The system of claim 12, wherein the data item includes a title and content.
16. The system of claim 15, wherein the value includes an identifier and a sub-value, the identifier based on at least one designator selected from the group consisting of the title and the data source, the sub-value based in part on the content, the identifier and sub-value forming an identifier and sub-value pair,
where the table of values includes identifier and sub-value pairs,
where the integration tool is further adapted for comparing the identifier and sub-value pair to the table of identifier and sub-value pairs and integrating the data item into the knowledge model if the identifier and sub-value pair is not in the table.
17. A system for integrating a data item into a knowledge model, the knowledge model including data collected from a plurality of data sources, the system comprising:
a retrieval tool adapted for retrieving a data item from one of the plurality of data sources, the data item including a first type of information; and
an integration tool adapted for determining a reliability value for the one of the plurality of data sources for the first type of information by either leveraging an existing reliability score indicative of a source's reliability or generating an independent reliability score indicative of a source's reliability and integrating the data item and the reliability value into the knowledge model.
18. The system of claim 17, wherein the integration tool is further adapted for inserting the data item into a field of the knowledge model.
19. The system of claim 18, wherein the integration tool is further adapted for determining if the field includes previously integrated information, the previously integrated information having an associated previous reliability value, comparing the reliability value to the previous reliability value, and integrating the data item if the reliability value is greater than the previous reliability value.
20. The system of claim 17, wherein the reliability value is based in part on an external ranking of data source reliability.
US11/127,778 2005-02-04 2005-05-11 Knowledge discovery tool extraction and integration Abandoned US20060179026A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US11/127,778 US20060179026A1 (en) 2005-02-04 2005-05-11 Knowledge discovery tool extraction and integration
AU2006210140A AU2006210140B2 (en) 2005-02-04 2006-02-06 Knowledge discovery tool extraction and integration
PCT/EP2006/001021 WO2006082094A2 (en) 2005-02-04 2006-02-06 Knowledge discovery tool extraction and integration
EP06706676A EP1844407A2 (en) 2005-02-04 2006-02-06 Knowledge discovery tool extraction and integration
US12/070,457 US8356036B2 (en) 2005-02-04 2008-02-19 Knowledge discovery tool extraction and integration

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/051,733 US20060179024A1 (en) 2005-02-04 2005-02-04 Knowledge discovery tool extraction and integration
US11/127,778 US20060179026A1 (en) 2005-02-04 2005-05-11 Knowledge discovery tool extraction and integration

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/051,733 Continuation-In-Part US20060179024A1 (en) 2005-02-04 2005-02-04 Knowledge discovery tool extraction and integration

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US12/070,457 Continuation US8356036B2 (en) 2005-02-04 2008-02-19 Knowledge discovery tool extraction and integration
US12/070,457 Division US8356036B2 (en) 2005-02-04 2008-02-19 Knowledge discovery tool extraction and integration

Publications (1)

Publication Number Publication Date
US20060179026A1 true US20060179026A1 (en) 2006-08-10

Family

ID=36204338

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/127,778 Abandoned US20060179026A1 (en) 2005-02-04 2005-05-11 Knowledge discovery tool extraction and integration
US12/070,457 Active 2028-02-26 US8356036B2 (en) 2005-02-04 2008-02-19 Knowledge discovery tool extraction and integration

Family Applications After (1)

Application Number Title Priority Date Filing Date
US12/070,457 Active 2028-02-26 US8356036B2 (en) 2005-02-04 2008-02-19 Knowledge discovery tool extraction and integration

Country Status (4)

Country Link
US (2) US20060179026A1 (en)
EP (1) EP1844407A2 (en)
AU (1) AU2006210140B2 (en)
WO (1) WO2006082094A2 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060179027A1 (en) * 2005-02-04 2006-08-10 Bechtel Michael E Knowledge discovery tool relationship generation
US20080086343A1 (en) * 2006-10-10 2008-04-10 Accenture Forming a business relationship network
US20080147590A1 (en) * 2005-02-04 2008-06-19 Accenture Global Services Gmbh Knowledge discovery tool extraction and integration
US20080281841A1 (en) * 2003-09-12 2008-11-13 Kishore Swaminathan Navigating a software project respository
US7765176B2 (en) 2006-11-13 2010-07-27 Accenture Global Services Gmbh Knowledge discovery system with user interactive analysis view for analyzing and generating relationships
US20100325101A1 (en) * 2009-06-19 2010-12-23 Beal Alexander M Marketing asset exchange
US20110055127A1 (en) * 2009-08-31 2011-03-03 Accenture Global Services Gmbh Model optimization system using variable scoring
US20110131209A1 (en) * 2005-02-04 2011-06-02 Bechtel Michael E Knowledge discovery tool relationship generation
US8010581B2 (en) 2005-02-04 2011-08-30 Accenture Global Services Limited Knowledge discovery tool navigation
US20130268538A1 (en) * 2005-05-06 2013-10-10 Nelson Information Systems Database and Index Organization for Enhanced Document Retrieval
US20140257861A1 (en) * 2007-01-05 2014-09-11 Idexx Laboratories, Inc. Method and System for Representation of Current and Historical Medical Data
US20150234859A1 (en) * 2012-10-30 2015-08-20 Landmark Graphics Corporation Managing Inferred Data
US20210133349A1 (en) * 2019-11-04 2021-05-06 Aetna Inc. Unified data fabric for managing data lifecycles and data flows

Families Citing this family (47)

Publication number Priority date Publication date Assignee Title
US7941399B2 (en) 2007-11-09 2011-05-10 Microsoft Corporation Collaborative authoring
US8028229B2 (en) * 2007-12-06 2011-09-27 Microsoft Corporation Document merge
US8825758B2 (en) * 2007-12-14 2014-09-02 Microsoft Corporation Collaborative authoring modes
US8301588B2 (en) * 2008-03-07 2012-10-30 Microsoft Corporation Data storage for file updates
US8352870B2 (en) * 2008-04-28 2013-01-08 Microsoft Corporation Conflict resolution
US8429753B2 (en) 2008-05-08 2013-04-23 Microsoft Corporation Controlling access to documents using file locks
US8825594B2 (en) 2008-05-08 2014-09-02 Microsoft Corporation Caching infrastructure
US8417666B2 (en) 2008-06-25 2013-04-09 Microsoft Corporation Structured coauthoring
US10679749B2 (en) * 2008-08-22 2020-06-09 International Business Machines Corporation System and method for virtual world biometric analytics through the use of a multimodal biometric analytic wallet
US8346768B2 (en) 2009-04-30 2013-01-01 Microsoft Corporation Fast merge support for legacy documents
US20110173236A1 (en) * 2010-01-13 2011-07-14 E-Profile Method and system for generating a virtual profile of an entity
WO2013020084A1 (en) * 2011-08-04 2013-02-07 Google Inc. Providing knowledge panels with search results
CA2844065C (en) 2011-08-04 2018-04-03 Google Inc. Providing knowledge panels with search results
US9286414B2 (en) * 2011-12-02 2016-03-15 Microsoft Technology Licensing, Llc Data discovery and description service
US8825763B2 (en) * 2011-12-09 2014-09-02 Facebook, Inc. Bookmarking social networking system content
US9292094B2 (en) 2011-12-16 2016-03-22 Microsoft Technology Licensing, Llc Gesture inferred vocabulary bindings
CN103970758A (en) * 2013-01-29 2014-08-06 鸿富锦精密工业(深圳)有限公司 Database accessing system and method
US9189539B2 (en) 2013-03-15 2015-11-17 International Business Machines Corporation Electronic content curating mechanisms
US9348573B2 (en) 2013-12-02 2016-05-24 Qbase, LLC Installation and fault handling in a distributed system utilizing supervisor and dependency manager nodes
CN106462575A (en) 2013-12-02 2017-02-22 丘贝斯有限责任公司 Design and implementation of clustered in-memory database
US9659108B2 (en) 2013-12-02 2017-05-23 Qbase, LLC Pluggable architecture for embedding analytics in clustered in-memory databases
US9547701B2 (en) 2013-12-02 2017-01-17 Qbase, LLC Method of discovering and exploring feature knowledge
US9922032B2 (en) 2013-12-02 2018-03-20 Qbase, LLC Featured co-occurrence knowledge base from a corpus of documents
US9336280B2 (en) 2013-12-02 2016-05-10 Qbase, LLC Method for entity-driven alerts based on disambiguated features
WO2015084726A1 (en) * 2013-12-02 2015-06-11 Qbase, LLC Event detection through text analysis template models
US9317565B2 (en) 2013-12-02 2016-04-19 Qbase, LLC Alerting system based on newly disambiguated features
US9424294B2 (en) 2013-12-02 2016-08-23 Qbase, LLC Method for facet searching and search suggestions
US9355152B2 (en) 2013-12-02 2016-05-31 Qbase, LLC Non-exclusionary search within in-memory databases
US9424524B2 (en) 2013-12-02 2016-08-23 Qbase, LLC Extracting facts from unstructured text
US9208204B2 (en) 2013-12-02 2015-12-08 Qbase, LLC Search suggestions using fuzzy-score matching and entity co-occurrence
US9230041B2 (en) 2013-12-02 2016-01-05 Qbase, LLC Search suggestions of related entities based on co-occurrence and/or fuzzy-score matching
US9223875B2 (en) 2013-12-02 2015-12-29 Qbase, LLC Real-time distributed in memory search architecture
KR20160124742A (en) 2013-12-02 2016-10-28 큐베이스 엘엘씨 Method for disambiguating features in unstructured text
US9542477B2 (en) 2013-12-02 2017-01-10 Qbase, LLC Method of automated discovery of topics relatedness
US9984427B2 (en) 2013-12-02 2018-05-29 Qbase, LLC Data ingestion module for event detection and increased situational awareness
US9025892B1 (en) 2013-12-02 2015-05-05 Qbase, LLC Data record compression with progressive and/or selective decomposition
US9223833B2 (en) 2013-12-02 2015-12-29 Qbase, LLC Method for in-loop human validation of disambiguated features
US9544361B2 (en) 2013-12-02 2017-01-10 Qbase, LLC Event detection through text analysis using dynamic self evolving/learning module
US9619571B2 (en) 2013-12-02 2017-04-11 Qbase, LLC Method for searching related entities through entity co-occurrence
US9177262B2 (en) 2013-12-02 2015-11-03 Qbase, LLC Method of automated discovery of new topics
US9201744B2 (en) 2013-12-02 2015-12-01 Qbase, LLC Fault tolerant architecture for distributed computing systems
US9361317B2 (en) 2014-03-04 2016-06-07 Qbase, LLC Method for entity enrichment of digital content to enable advanced search functionality in content management systems
US20170213044A1 (en) * 2016-01-25 2017-07-27 Lighthouse Document Technologies, Inc. (d/b/a Lighthouse eDiscovery) Privilege Log Generation Method and Apparatus
US11036692B2 (en) * 2016-09-17 2021-06-15 Oracle International Corporation Governance pools in hierarchical systems
CN108268581A (en) * 2017-07-14 2018-07-10 广东神马搜索科技有限公司 The construction method and device of knowledge mapping
US10997635B2 (en) * 2018-11-29 2021-05-04 Walmart Apollo, Llc Method and apparatus for advertisement information error detection and correction
US11256709B2 (en) 2019-08-15 2022-02-22 Clinicomp International, Inc. Method and system for adapting programs for interoperability and adapters therefor

Citations (18)

Publication number Priority date Publication date Assignee Title
US5265065A (en) * 1991-10-08 1993-11-23 West Publishing Company Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US5535325A (en) * 1994-12-19 1996-07-09 International Business Machines Corporation Method and apparatus for automatically generating database definitions of indirect facts from entity-relationship diagrams
US5644740A (en) * 1992-12-02 1997-07-01 Hitachi, Ltd. Method and apparatus for displaying items of information organized in a hierarchical structure
US5953723A (en) * 1993-04-02 1999-09-14 T.M. Patents, L.P. System and method for compressing inverted index files in document search/retrieval system
US6233571B1 (en) * 1993-06-14 2001-05-15 Daniel Egger Method and apparatus for indexing, searching and displaying data
US6256032B1 (en) * 1996-11-07 2001-07-03 Thebrain Technologies Corp. Method and apparatus for organizing and processing information using a digital computer
US20020046296A1 (en) * 1999-09-10 2002-04-18 Kloba David D. System, method, and computer program product for syncing to mobile devices
US6397231B1 (en) * 1998-08-31 2002-05-28 Xerox Corporation Virtual documents generated via combined documents or portions of documents retrieved from data repositories
US20020065856A1 (en) * 1998-05-27 2002-05-30 Wisdombuilder, Llc System method and computer program product to automate the management and analysis of heterogeneous data
US6434558B1 (en) * 1998-12-16 2002-08-13 Microsoft Corporation Data lineage data type
US6460034B1 (en) * 1997-05-21 2002-10-01 Oracle Corporation Document knowledge base research and retrieval system
US20040015486A1 (en) * 2002-07-19 2004-01-22 Jiasen Liang System and method for storing and retrieving data
US20040090472A1 (en) * 2002-10-21 2004-05-13 Risch John S. Multidimensional structured data visualization method and apparatus, text visualization method and apparatus, method and apparatus for visualizing and graphically navigating the world wide web, method and apparatus for visualizing hierarchies
US20040186842A1 (en) * 2003-03-18 2004-09-23 Darren Wesemann Systems and methods for providing access to data stored in different types of data repositories
US20040186824A1 (en) * 2003-03-17 2004-09-23 Kemal Delic Storing and/or retrieving a document within a knowledge base or document repository
US20050043940A1 (en) * 2003-08-20 2005-02-24 Marvin Elder Preparing a data source for a natural language query
US20050060643A1 (en) * 2003-08-25 2005-03-17 Miavia, Inc. Document similarity detection and classification system
US7047236B2 (en) * 2002-12-31 2006-05-16 International Business Machines Corporation Method for automatic deduction of rules for matching content to categories

Family Cites Families (98)

Publication number Priority date Publication date Assignee Title
JP2690782B2 (en) 1989-05-30 1997-12-17 富士写真フイルム株式会社 Image filing system
US5659724A (en) * 1992-11-06 1997-08-19 Ncr Interactive data analysis apparatus employing a knowledge base
US5539865A (en) * 1992-11-10 1996-07-23 Adobe Systems, Inc. Method and apparatus for processing data for a visual-output device with reduced buffer memory requirements
US5499334A (en) 1993-03-01 1996-03-12 Microsoft Corporation Method and system for displaying window configuration of inactive programs
US5506984A (en) 1993-06-30 1996-04-09 Digital Equipment Corporation Method and system for data retrieval in a distributed system using linked location references on a plurality of nodes
US6339767B1 (en) * 1997-06-02 2002-01-15 Aurigin Systems, Inc. Using hyperbolic trees to visualize data generated by patent-centric and group-oriented data processing
US5600831A (en) * 1994-02-28 1997-02-04 Lucent Technologies Inc. Apparatus and methods for retrieving information by modifying query plan based on description of information sources
US5745895A (en) * 1994-06-21 1998-04-28 International Business Machines Corporation Method for association of heterogeneous information
US5608900A (en) * 1994-06-21 1997-03-04 Internationl Business Machines Corp. Generation and storage of connections between objects in a computer network
US5590250A (en) * 1994-09-14 1996-12-31 Xerox Corporation Layout of node-link structures in space with negative curvature
US5619632A (en) * 1994-09-14 1997-04-08 Xerox Corporation Displaying node-link structure with region of greater spacings and peripheral branches
US5801702A (en) * 1995-03-09 1998-09-01 Terrabyte Technology System and method for adding network links in a displayed hierarchy
US6061675A (en) * 1995-05-31 2000-05-09 Oracle Corporation Methods and apparatus for classifying terminology utilizing a knowledge catalog
US5794257A (en) * 1995-07-14 1998-08-11 Siemens Corporate Research, Inc. Automatic hyperlinking on multimedia by compiling link specifications
US5995961A (en) * 1995-11-07 1999-11-30 Lucent Technologies Inc. Information manifold for query processing
US6035300A (en) * 1995-12-15 2000-03-07 International Business Machines Corporation Method and apparatus for generating a user interface from the entity/attribute/relationship model of a database
JP2000507008A (en) 1996-04-04 2000-06-06 フレア・テクノロジーズ・リミテッド Systems, software and methods for locating information in a collection of text-based information sources
US6012055A (en) * 1996-04-09 2000-01-04 Silicon Graphics, Inc. Mechanism for integrated information search and retrieval from diverse sources using multiple navigation methods
US6052693A (en) * 1996-07-02 2000-04-18 Harlequin Group Plc System for assembling large databases through information extracted from text sources
US5819291A (en) * 1996-08-23 1998-10-06 General Electric Company Matching new customer records to existing customer records in a large business database using hash key
US6166739A (en) * 1996-11-07 2000-12-26 Natrificial, Llc Method and apparatus for organizing and processing information using a digital computer
US6037944A (en) * 1996-11-07 2000-03-14 Natrificial Llc Method and apparatus for displaying a thought network from a thought's perspective
EP1010100A1 (en) * 1997-01-24 2000-06-21 The Board Of Regents Of The University Of Washington Method and system for network information access
JP4054398B2 (en) * 1997-03-24 2008-02-27 キヤノン株式会社 Information processing apparatus and method
JPH10307881A (en) * 1997-05-08 1998-11-17 Fujitsu Ltd Electronic transaction device and computer-readable storage medium recording control program for executing format conversion by electronic transaction
US5999940A (en) * 1997-05-28 1999-12-07 Home Information Services, Inc. Interactive information discovery tool and methodology
US6374275B2 (en) 1997-06-11 2002-04-16 Scientific-Atlanta, Inc. System, method, and media for intelligent selection of searching terms in a keyboardless entry environment
US5983218A (en) * 1997-06-30 1999-11-09 Xerox Corporation Multimedia database for use over networks
US6166736A (en) * 1997-08-22 2000-12-26 Natrificial Llc Method and apparatus for simultaneously resizing and relocating windows within a graphical display
US6018735A (en) * 1997-08-22 2000-01-25 Canon Kabushiki Kaisha Non-literal textual search using fuzzy finite-state linear non-deterministic automata
US6038668A (en) * 1997-09-08 2000-03-14 Science Applications International Corporation System, method, and medium for retrieving, organizing, and utilizing networked data
US6658623B1 (en) 1997-09-15 2003-12-02 Fuji Xerox Co., Ltd. Displaying in a first document a selectable link to a second document based on a passive query
US5974412A (en) * 1997-09-24 1999-10-26 Sapient Health Network Intelligent query system for automatically indexing information in a database and automatically categorizing users
US6236994B1 (en) * 1997-10-21 2001-05-22 Xerox Corporation Method and apparatus for the integration of information and knowledge
US5965688A (en) * 1997-12-12 1999-10-12 General Electric Company Interfacial polycarbonate polymerization process and product
US6370551B1 (en) 1998-04-14 2002-04-09 Fuji Xerox Co., Ltd. Method and apparatus for displaying references to a user's document browsing history within the context of a new document
US6134559A (en) * 1998-04-27 2000-10-17 Oracle Corporation Uniform object model having methods and additional features for integrating objects defined by different foreign object type systems into a single type system
US6581058B1 (en) * 1998-05-22 2003-06-17 Microsoft Corporation Scalable system for clustering of large databases having mixed data attributes
EP1086435A1 (en) * 1998-06-11 2001-03-28 Boardwalk AG System, method, and computer program product for providing relational patterns between entities
NL1009376C1 (en) * 1998-06-11 1998-07-06 Boardwalk Ag Data system for providing relationship patterns between people.
US6112209A (en) * 1998-06-17 2000-08-29 Gusack; Mark David Associative database model for electronic-based informational assemblies
US6446061B1 (en) * 1998-07-31 2002-09-03 International Business Machines Corporation Taxonomy generation for document collections
US6567814B1 (en) * 1998-08-26 2003-05-20 Thinkanalytics Ltd Method and apparatus for knowledge discovery in databases
US6266682B1 (en) * 1998-08-31 2001-07-24 Xerox Corporation Tagging related files in a document management system
US6446076B1 (en) * 1998-11-12 2002-09-03 Accenture Llp. Voice interactive web-based agent system responsive to a user location for prioritizing and formatting information
US6330007B1 (en) * 1998-12-18 2001-12-11 Ncr Corporation Graphical user interface (GUI) prototyping and specification tool
US6425525B1 (en) 1999-03-19 2002-07-30 Accenture Llp System and method for inputting, retrieving, organizing and analyzing data
EP1039265A1 (en) 1999-03-23 2000-09-27 Sony International (Europe) GmbH System and method for automatically managing geolocation information
US6826559B1 (en) * 1999-03-31 2004-11-30 Verizon Laboratories Inc. Hybrid category mapping for on-line query tool
US6434556B1 (en) * 1999-04-16 2002-08-13 Board Of Trustees Of The University Of Illinois Visualization of Internet search information
US6493702B1 (en) * 1999-05-05 2002-12-10 Xerox Corporation System and method for searching and recommending documents in a collection using share bookmarks
US6442545B1 (en) * 1999-06-01 2002-08-27 Clearforest Ltd. Term-level text with mining with taxonomies
US7181438B1 (en) * 1999-07-21 2007-02-20 Alberti Anemometer, Llc Database access system
CA2360589A1 (en) 1999-11-15 2001-05-25 Mohammed Shahbaz Anwar Programs and methods for the display, analysis and manipulation of multi-dimension data implemented on a computer
US20020007284A1 (en) * 1999-12-01 2002-01-17 Schurenberg Kurt B. System and method for implementing a global master patient index
US6418448B1 (en) * 1999-12-06 2002-07-09 Shyam Sundar Sarkar Method and apparatus for processing markup language specifications for data and metadata used inside multiple related internet documents to navigate, query and manipulate information from a plurality of object relational databases over the web
US6963867B2 (en) * 1999-12-08 2005-11-08 A9.Com, Inc. Search query processing to provide category-ranked presentation of search results
US6829615B2 (en) * 2000-02-25 2004-12-07 International Business Machines Corporation Object type relationship graphical user interface
US6957205B1 (en) 2000-03-08 2005-10-18 Accenture Llp Knowledge model-based indexing of information
US6721726B1 (en) * 2000-03-08 2004-04-13 Accenture Llp Knowledge management tool
US6564209B1 (en) * 2000-03-08 2003-05-13 Accenture Llp Knowledge management tool for providing abstracts of information
US6900807B1 (en) * 2000-03-08 2005-05-31 Accenture Llp System for generating charts in a knowledge management tool
US7350138B1 (en) * 2000-03-08 2008-03-25 Accenture Llp System, method and article of manufacture for a knowledge management tool proposal wizard
US6727927B1 (en) * 2000-03-08 2004-04-27 Accenture Llp System, method and article of manufacture for a user interface for a knowledge management tool
US6636848B1 (en) * 2000-05-31 2003-10-21 International Business Machines Corporation Information search using knowledge agents
US6684388B1 (en) * 2000-08-22 2004-01-27 International Business Machines Corporation Method for generating platform independent, language specific computer code
AU2001290646A1 (en) 2000-09-08 2002-03-22 The Regents Of The University Of California Data source integration system and method
US6684205B1 (en) * 2000-10-18 2004-01-27 International Business Machines Corporation Clustering hypertext with applications to web searching
US6460558B2 (en) * 2000-12-04 2002-10-08 Sauer-Danfoss, Inc. Pilot stage or pressure control pilot valve having a single armature/flapper
US6938053B2 (en) 2001-03-02 2005-08-30 Vality Technology Incorporated Categorization based on record linkage theory
WO2002084590A1 (en) * 2001-04-11 2002-10-24 Applied Minds, Inc. Knowledge web
US20020194201A1 (en) * 2001-06-05 2002-12-19 Wilbanks John Thompson Systems, methods and computer program products for integrating biological/chemical databases to create an ontology network
US20050108200A1 (en) * 2001-07-04 2005-05-19 Frank Meik Category based, extensible and interactive system for document retrieval
US20030115191A1 (en) * 2001-12-17 2003-06-19 Max Copperman Efficient and cost-effective content provider for customer relationship management (CRM) or other applications
US7072883B2 (en) * 2001-12-21 2006-07-04 Ut-Battelle Llc System for gathering and summarizing internet information
WO2003067497A1 (en) * 2002-02-04 2003-08-14 Cataphora, Inc A method and apparatus to visually present discussions for data mining purposes
US6996774B2 (en) * 2002-02-12 2006-02-07 Accenture Global Services Gmbh Display of data element indicia based on data types
US7567953B2 (en) * 2002-03-01 2009-07-28 Business Objects Americas System and method for retrieving and organizing information from disparate computer network information sources
US20040122689A1 (en) * 2002-12-20 2004-06-24 Dailey Roger S. Method and apparatus for tracking a part
JP3944102B2 (en) * 2003-03-13 2007-07-11 株式会社日立製作所 Document retrieval system using semantic network
US7499046B1 (en) * 2003-03-15 2009-03-03 Oculus Info. Inc. System and method for visualizing connected temporal and spatial information as an integrated visual representation on a user interface
US20050060287A1 (en) * 2003-05-16 2005-03-17 Hellman Ziv Z. System and method for automatic clustering, sub-clustering and cluster hierarchization of search results in cross-referenced databases using articulation nodes
US7321886B2 (en) 2003-07-29 2008-01-22 Accenture Global Services Gmbh Rapid knowledge transfer among workers
US7383269B2 (en) * 2003-09-12 2008-06-03 Accenture Global Services Gmbh Navigating a software project repository
US20050149538A1 (en) * 2003-11-20 2005-07-07 Sadanand Singh Systems and methods for creating and publishing relational data bases
US7689585B2 (en) 2004-04-15 2010-03-30 Microsoft Corporation Reinforced clustering of multi-type data objects for search term suggestion
EP1759280A4 (en) 2004-05-04 2009-08-26 Boston Consulting Group Inc Method and apparatus for selecting, analyzing and visualizing related database records as a network
US7343666B2 (en) * 2004-06-30 2008-03-18 Hitachi Global Storage Technologies Netherlands B.V. Methods of making magnetic write heads with use of linewidth shrinkage techniques
US7496593B2 (en) * 2004-09-03 2009-02-24 Biowisdom Limited Creating a multi-relational ontology having a predetermined structure
US20060074833A1 (en) * 2004-09-03 2006-04-06 Biowisdom Limited System and method for notifying users of changes in multi-relational ontologies
US20060074836A1 (en) * 2004-09-03 2006-04-06 Biowisdom Limited System and method for graphically displaying ontology data
EP1667041A3 (en) 2004-11-30 2008-03-05 Oculus Info Inc. System and method for interactive visual representation of information content and relationships using layout and gestures
US20060179025A1 (en) * 2005-02-04 2006-08-10 Bechtel Michael E Knowledge discovery tool relationship generation
US20060179024A1 (en) * 2005-02-04 2006-08-10 Bechtel Michael E Knowledge discovery tool extraction and integration
US20060179069A1 (en) * 2005-02-04 2006-08-10 Bechtel Michael E Knowledge discovery tool navigation
US7904411B2 (en) * 2005-02-04 2011-03-08 Accenture Global Services Limited Knowledge discovery tool relationship generation
US20060179026A1 (en) * 2005-02-04 2006-08-10 Bechtel Michael E Knowledge discovery tool extraction and integration
US20060179067A1 (en) * 2005-02-04 2006-08-10 Bechtel Michael E Knowledge discovery tool navigation

Patent Citations (18)

Publication number Priority date Publication date Assignee Title
US5265065A (en) * 1991-10-08 1993-11-23 West Publishing Company Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US5644740A (en) * 1992-12-02 1997-07-01 Hitachi, Ltd. Method and apparatus for displaying items of information organized in a hierarchical structure
US5953723A (en) * 1993-04-02 1999-09-14 T.M. Patents, L.P. System and method for compressing inverted index files in document search/retrieval system
US6233571B1 (en) * 1993-06-14 2001-05-15 Daniel Egger Method and apparatus for indexing, searching and displaying data
US5535325A (en) * 1994-12-19 1996-07-09 International Business Machines Corporation Method and apparatus for automatically generating database definitions of indirect facts from entity-relationship diagrams
US6256032B1 (en) * 1996-11-07 2001-07-03 Thebrain Technologies Corp. Method and apparatus for organizing and processing information using a digital computer
US6460034B1 (en) * 1997-05-21 2002-10-01 Oracle Corporation Document knowledge base research and retrieval system
US20020065856A1 (en) * 1998-05-27 2002-05-30 Wisdombuilder, Llc System method and computer program product to automate the management and analysis of heterogeneous data
US6397231B1 (en) * 1998-08-31 2002-05-28 Xerox Corporation Virtual documents generated via combined documents or portions of documents retrieved from data repositories
US6434558B1 (en) * 1998-12-16 2002-08-13 Microsoft Corporation Data lineage data type
US20020046296A1 (en) * 1999-09-10 2002-04-18 Kloba David D. System, method, and computer program product for syncing to mobile devices
US20040015486A1 (en) * 2002-07-19 2004-01-22 Jiasen Liang System and method for storing and retrieving data
US20040090472A1 (en) * 2002-10-21 2004-05-13 Risch John S. Multidimensional structured data visualization method and apparatus, text visualization method and apparatus, method and apparatus for visualizing and graphically navigating the world wide web, method and apparatus for visualizing hierarchies
US7047236B2 (en) * 2002-12-31 2006-05-16 International Business Machines Corporation Method for automatic deduction of rules for matching content to categories
US20040186824A1 (en) * 2003-03-17 2004-09-23 Kemal Delic Storing and/or retrieving a document within a knowledge base or document repository
US20040186842A1 (en) * 2003-03-18 2004-09-23 Darren Wesemann Systems and methods for providing access to data stored in different types of data repositories
US20050043940A1 (en) * 2003-08-20 2005-02-24 Marvin Elder Preparing a data source for a natural language query
US20050060643A1 (en) * 2003-08-25 2005-03-17 Miavia, Inc. Document similarity detection and classification system

Cited By (24)

Publication number Priority date Publication date Assignee Title
US20080281841A1 (en) * 2003-09-12 2008-11-13 Kishore Swaminathan Navigating a software project respository
US7853556B2 (en) 2003-09-12 2010-12-14 Accenture Global Services Limited Navigating a software project respository
US8010581B2 (en) 2005-02-04 2011-08-30 Accenture Global Services Limited Knowledge discovery tool navigation
US20080147590A1 (en) * 2005-02-04 2008-06-19 Accenture Global Services Gmbh Knowledge discovery tool extraction and integration
US20060179027A1 (en) * 2005-02-04 2006-08-10 Bechtel Michael E Knowledge discovery tool relationship generation
US8660977B2 (en) 2005-02-04 2014-02-25 Accenture Global Services Limited Knowledge discovery tool relationship generation
US7904411B2 (en) 2005-02-04 2011-03-08 Accenture Global Services Limited Knowledge discovery tool relationship generation
US8356036B2 (en) 2005-02-04 2013-01-15 Accenture Global Services Knowledge discovery tool extraction and integration
US20110131209A1 (en) * 2005-02-04 2011-06-02 Bechtel Michael E Knowledge discovery tool relationship generation
US8938458B2 (en) * 2005-05-06 2015-01-20 Nelson Information Systems Database and index organization for enhanced document retrieval
US20130268538A1 (en) * 2005-05-06 2013-10-10 Nelson Information Systems Database and Index Organization for Enhanced Document Retrieval
US8249903B2 (en) 2006-10-10 2012-08-21 Accenture Global Services Limited Method and system of determining and evaluating a business relationship network for forming business relationships
US20080086343A1 (en) * 2006-10-10 2008-04-10 Accenture Forming a business relationship network
US7765176B2 (en) 2006-11-13 2010-07-27 Accenture Global Services Gmbh Knowledge discovery system with user interactive analysis view for analyzing and generating relationships
US7953687B2 (en) 2006-11-13 2011-05-31 Accenture Global Services Limited Knowledge discovery system with user interactive analysis view for analyzing and generating relationships
US20100293125A1 (en) * 2006-11-13 2010-11-18 Simmons Hillery D Knowledge discovery system with user interactive analysis view for analyzing and generating relationships
US20140257861A1 (en) * 2007-01-05 2014-09-11 Idexx Laboratories, Inc. Method and System for Representation of Current and Historical Medical Data
US11551789B2 (en) * 2007-01-05 2023-01-10 Idexx Laboratories, Inc. Method and system for representation of current and historical medical data
US20100325101A1 (en) * 2009-06-19 2010-12-23 Beal Alexander M Marketing asset exchange
US20110055127A1 (en) * 2009-08-31 2011-03-03 Accenture Global Services Gmbh Model optimization system using variable scoring
US9147206B2 (en) * 2009-08-31 2015-09-29 Accenture Global Services Limited Model optimization system using variable scoring
US20150234859A1 (en) * 2012-10-30 2015-08-20 Landmark Graphics Corporation Managing Inferred Data
US11861037B2 (en) * 2019-11-04 2024-01-02 Aetna Inc. Unified data fabric for managing data lifecycles and data flows
US20210133349A1 (en) * 2019-11-04 2021-05-06 Aetna Inc. Unified data fabric for managing data lifecycles and data flows

Also Published As

Publication number Publication date
EP1844407A2 (en) 2007-10-17
WO2006082094A2 (en) 2006-08-10
US20080147590A1 (en) 2008-06-19
AU2006210140A2 (en) 2006-08-10
US8356036B2 (en) 2013-01-15
AU2006210140B2 (en) 2011-09-08
WO2006082094A3 (en) 2006-11-02
AU2006210140A1 (en) 2006-08-10

Similar Documents

Publication Publication Date Title
US8356036B2 (en) Knowledge discovery tool extraction and integration
US7904411B2 (en) Knowledge discovery tool relationship generation
US8010581B2 (en) Knowledge discovery tool navigation
US20060179025A1 (en) Knowledge discovery tool relationship generation
US8660977B2 (en) Knowledge discovery tool relationship generation
US20060179067A1 (en) Knowledge discovery tool navigation
Lu PubMed and beyond: a survey of web tools for searching biomedical literature
US20060179024A1 (en) Knowledge discovery tool extraction and integration
Plake et al. AliBaba: PubMed as a graph
US8370352B2 (en) Contextual searching of electronic records and visual rule construction
Jagadish et al. Making database systems usable
WO2001024038A2 (en) Internet brokering service based upon individual health profiles
Zhang et al. VISAGE: a query interface for clinical research
de la Calle et al. BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature
Kraus et al. Olelo: a web application for intuitive exploration of biomedical literature
Rastegar-Mojarad et al. BELTracker: evidence sentence retrieval for BEL statements
Schulman Managing your patients' data in the neonatal and pediatric ICU: an introduction to databases and statistical analysis
WO2010141480A2 (en) Advanced features, service and displays of legal and regulatory information
Teixeira et al. Data mart construction based on semantic annotation of scientific articles: A case study for the prioritization of drug targets
Gaizauskas et al. Integrating biomedical text mining services into a distributed workflow environment
Abdullah Efficient searching strategies in Pubmed
CN116070002A (en) Information searching method, system, search engine and computer system
Muthukuri Ranking Literature from the Network of Drug-Disease Association through Multi-Layered Semantic Model
JP2005222263A (en) Term browsing type information access support system
Kanagasabai et al. Literature-driven, ontology-centric knowledge navigation for Lipidomics

Legal Events

Date Code Title Description
AS Assignment

Owner name: ACCENTURE GLOBAL SERVICES GMBH, SWITZERLAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BECHTEL, MICHAEL E.;MATHUR, SANJAY;ARAGO, JORDI;REEL/FRAME:017211/0516;SIGNING DATES FROM 20060120 TO 20060124

STCB Information on status: application discontinuation

Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION