WO1997015018A1 - Method and system for providing uniform access to heterogeneous information - Google Patents

Info

Publication number
WO1997015018A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
server
metadata
said
objects
Prior art date
Application number
PCT/US1996/015620
Other languages
French (fr)
Inventor
Howard Marcus
Kshitij Jawahar Shah
Amit Pravinkumar Sheth
Leon A. Shklar
Jerome Raymond Surak
Satish Mukund Thatte
Original Assignee
Bell Communications Research, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bell Communications Research, Inc.
Publication of WO1997015018A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Definitions

  • This invention relates to data processing systems and networks. More specifically, this invention relates to methods and systems for accessing distributed heterogeneous information sources and databases.
  • HTML hypertext markup language
  • These HTML files not only provide users with textual information; embedded within the text are pointers to other sources of information, which may be graphic, audio, video or textual.
  • Most commercially available browsers (e.g., Mosaic, Netscape)
  • these tools are capable of displaying graphic or textual information.
  • in order for this information to be displayed, it must at some point have been converted into HTML files.
  • there is a tremendous amount of legacy information in networks that could be made available to users if there were a means to access it without the owners or providers of the information having to convert it to HTML files.
  • Our invention is a system and methodology for integrating heterogeneous information in a distributed environment by encapsulating data about existing and new information in objects without converting, restructuring, or reformatting the information. The process of encapsulating the information requires extracting metadata from the information.
  • This database of objects and collections is instantiated into the runtime memory of a server, organized into repositories of objects and collections.
  • a user seeking the information would then use an HTTP-compliant browser to connect to the server and reach the information through the objects created and stored in the server.
  • Our invention provides an integrated view of and access to diverse heterogeneous information. Our invention also provides tools for accessing, retrieving, browsing and administering the information.
  • FIG. 1 illustrates a system in accordance with one embodiment of our invention.
  • Figure 2 depicts a method for pre-processing the information units in accordance with one embodiment of our invention.
  • Figure 3 depicts a method for accessing heterogeneous information in accordance with one embodiment of our invention.
  • Figure 4(a) depicts the format of the metadata as used in the present embodiment of our invention.
  • Figure 4(b) depicts a table defining the metadata fields as used in the metadata format of the present embodiment of our invention.
  • Figure 5 depicts the format of one embodiment of an object identifier as used in our invention.
  • Figure 6 depicts a table that defines the attributes of an ihMeta object.
  • Figure 7 depicts a class inheritance diagram for the ihArtifact family of classes as defined for the present embodiment of our invention.
  • Figure 8 is a table of ihArtifact classes as used in the present embodiment of our invention.
  • Figure 9 is a table of ihArtifact sub-class definitions as used in the present embodiment of our invention.
  • Figure 10 is a table that defines the ihGraph class as used in the present embodiment of our invention.
  • Figure 11 illustrates the relationship between the run-time modules operating in a server in accordance with our invention.
  • Figure 12 depicts an example interaction diagram of the operating in accordance with the present embodiment of our invention.
  • Figure 13 illustrates the ih_prep process and extractors and indexers in accordance with the present invention.
  • Figure 14 illustrates the process for conducting metadata context queries in accordance with the present embodiment of our invention.
  • FIG. 15 illustrates the process for conducting information content queries in accordance with the present embodiment of our invention.
  • Figure 16 illustrates the process for invoking a server side browser in accordance with the present embodiment of our invention.
  • Figure 17 illustrates the process for invoking a client side browser in accordance with the present embodiment of our invention.
  • An information unit is a piece of information that may be of interest to an end user.
  • the most common kind of IU is a document stored in a single file.
  • An IU can also represent a portion of a file (such as a single program function in the C language within a larger source code file, or a single email message in an email file, etc.), a grouping of many files, or other kinds of information.
  • Metabase is a file or database of metadata extracted from the information units and organized into InfoHarness Objects and collections.
  • Metadata is "data about data": it is data that describes various salient characteristics of some other data. For instance, metadata about this patent specification could include its filing date, the inventors' names, a keyword summary, etc.
  • An InfoHarness Object, or IHO, is an encapsulation of an information unit that is to be accessed using our inventive system. An IHO encapsulates metadata describing the salient characteristics of an IU.
  • a collection represents a set of IHOs. Collections are logical entities; that is, the information units encapsulated by the member IHOs do not have to be physically co-located in the same directory. Encapsulated files can be distributed on many systems of a network. Further, IHOs can be members of more than one collection. Collections can be nested (i.e., contain other collections). They can also be indexed or non-indexed.
  • Collections thus provide a logical view of physically distributed, heterogeneous information.
  • a repository is a set of collections. Its contents are accessed through an InfoHarness server operating in accordance with our invention.
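  • As an informal illustration of these definitions (a sketch under our own assumptions, not the disclosed implementation), the relationships among IHOs, collections, and a repository might be modeled as below. The class and attribute names (`IHO`, `Collection`, `Repository`, `all_ihos`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IHO:
    """Holds metadata about one information unit (IU), not the IU itself."""
    object_id: str
    location: str       # where the IU lives (file path, URL, ...)
    subtype: str        # e.g. "text", "postscript"
    subject: str = ""   # short summary shown while browsing


@dataclass
class Collection:
    """A logical set of IHOs and nested collections."""
    name: str
    indexed: bool = False
    members: list = field(default_factory=list)   # IHO or Collection instances

    def all_ihos(self):
        """Yield every IHO reachable from this collection, once each."""
        seen_ids = set()
        visited = set()
        stack = [self]
        while stack:
            node = stack.pop()
            if id(node) in visited:      # guard against cyclic nesting
                continue
            visited.add(id(node))
            for member in node.members:
                if isinstance(member, Collection):
                    stack.append(member)
                elif member.object_id not in seen_ids:
                    seen_ids.add(member.object_id)
                    yield member


@dataclass
class Repository:
    """The set of collections held in the memory of one IH server."""
    collections: dict = field(default_factory=dict)   # name -> Collection

    def add(self, collection):
        self.collections[collection.name] = collection
```

    Because membership is by reference, the same IHO can appear in several collections without the underlying IU being copied or moved.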
  • a gateway is a component of the present embodiment of our invention that provides a means for connecting a Hypertext Transfer Protocol (hereinafter HTTP) server to an InfoHarness server according to the Common Gateway Interface.
  • An InfoHarness Server (IH server) is a server operating in accordance with our invention.
  • One embodiment of our invention is illustrated in Figure 1.
  • Our inventive system 10 is interposed between a plurality of end users 12 and the heterogeneous information 14 they want to access, which is composed of a plurality of IUs 16.
  • the end users 12 use an HTTP-compliant browser 18 to connect to an HTTP server 20, which in turn connects to an IH server 22.
  • Within the IH server 22, instantiated into memory, is a repository 24 of IHOs 26 and collections 28. This repository 24 was created from a database 30 of IHOs and collections, which was in turn created from metadata extracted from the IUs 16.
  • Users 12 connected to the IH Server 22 can then obtain IHOs and metadata, or search collections using any user-specified criteria, to retrieve the target information from the IUs 16.
  • Phase 1 is the registration phase, under which IUs are pre-processed to create IHOs, collections and repositories.
  • Phase 2 is the information access phase, wherein end-users access the IH server through the HTTP server and use the IHOs, created in the registration phase and loaded in the memory of the IH server, to locate and access the IUs.
  • Figure 2 illustrates the methodology embodied within the registration phase.
  • An information provider or InfoHarness administrator having information which the provider desires to make accessible to users would invoke an InfoHarness registration procedure (software) to register the information units 30.
  • Upon invoking the InfoHarness registration procedure, the administrator would first invoke a pre-processor 32 to prepare the information for the extraction process.
  • the next step involves the administrator invoking one of a plurality of extraction processes 33 to extract metadata from the information units that are to be registered (the appropriate extractor process depends on the type of information the administrator is registering).
  • the output of the extraction process is the creation of a metabase (which is a file or database) of IHOs 34.
  • This metabase contains metadata of the information units logically collected into a collection, and also information about IHOs and collections and the relationships among them.
  • Figure 3 illustrates the methodology for accessing the information in accordance with our invention.
  • the IH server must first be initialized; the IHOs are then loaded from the metabase into the server's memory and organized into repositories 36. After the server is initialized and running, the IH server enters a main event loop and waits for requests from clients 38. End-users then access the IH server through an HTTP server 40. Once the end-users access the IH server, they perform one of three actions to select an object 42: (1) a metadata based query, (2) a content based query, or (3) explicit navigation around the IHOs. Once an object is selected, it can be accessed and browsed by activating either a client side browser 44 or server side browser 46. The user may also operate on the object, choosing from a set of procedures such as print, store, fax, etc.
  • C. REGISTRATION PROCESS
  • the registration process involves an owner, creator, or provider of information working with a system administrator to pre-process the information for the purpose of extracting metadata in a format usable by our IH server.
  • Registration is accomplished in four steps: pre-processing the physical data, extracting the metadata, storing the metadata in a metabase, and transferring the extracted metadata from the metabase to the IH server.
  • the main function of the pre-processing is to process the physical data and build logical structures (IHOs and collections) which the IH server can later use for presentation to end users.
  • Physical data in this sense, includes formatted, unformatted, structured, and unstructured data. It could also be dynamic; e.g., SQL queries or newsfeeds.
  • Metadata which is extracted in accordance with the methods described herein can be content-dependent, content-descriptive, or content-independent.
  • Content-dependent metadata is based strictly on the contents of the physical data. Examples of content-dependent metadata are keyword indices for textual data, grids for image data, speaker change lists for audio data, etc. Content-descriptive metadata, as the name suggests, describes the physical contents of the data.
  • Examples include spatial information for video data, the subject of a talk for audio data, document composition for multimedia data, etc.
  • Content-independent metadata does not rely on the specific content of the underlying data for its values.
  • Media type, document history and location, temporal information for video and audio, etc. are examples of content-independent metadata.
  • all data to be registered with the system must be accessible as a file on a mounted file system. This typically means that the data must all be on the same LAN, although it may be stored on multiple file servers if those servers are directly accessible, such as via a network file server (NFS) mount.
  • a directory structure is one example of a logical structure which usually exists for most file systems.
  • the system administrator working with the InfoHarness application builder may determine that users might be interested in relationships among collections and IHOs. As an example, a parent-child relationship between two collections could be imposed.
  • Other relationships that could be modeled are: 'contains,' 'is contained in,' and 'part-of.'
  • the end result of pre-processing and metadata extraction is the creation of a metabase (which may be a file or database) containing metadata, IHOs and collections.
  • As described above, when this information is loaded from the metabase into an IH server, it is materialized as IHOs and collections in the server's memory, organized into repositories.
  • extractor processes for various document data types. These extractor processes are easily created using skills well known in the art. As an example, we use extractors for Text, PostScript, HTML, man pages, and e-mail message files.
  • An IHO encapsulates a single IU.
  • a collection does not encapsulate any IU; rather, it is a set of other IHOs or collections.
  • An IHO, encapsulating an IU, would thus have a unique identifier by which to distinguish itself from other
  • a collection is a set of IHOs, related together at the discretion of the system administrator or InfoHarness application builder. Physically, in the embodiment described herein, a collection is represented by a number of Unix files in a common subdirectory, whose name is the name of the collection.
  • This collection directory contains several important files:
  • one or more index files may be present.
  • Index extracted text if requested consists of extracting metadata from physical information sources, creating representations for IHOs, collections and relationships, and optionally creating an index on textual contents of the sources.
  • the physical information sources should exist on the same file system as the pre-processor and indexer.
  • the metadata is also stored on this file system at the location specified by the administrator.
  • the pre-processor uses extractor methods for the extraction of metadata from the physical information sources. These are type-specific methods which process the information sources and return metadata in a specific format. In our embodiment, the pre-processor does not analyze the source type to invoke an extractor; instead the system administrator of our IH server is expected to indicate a particular extractor which will then be used for metadata extraction.
  • the pre-processor treats all the IHOs generated as constituents of a collection. A user-specified location is used to store the metadata files created. The user has the option to append newly generated metadata to an existing collection. The user can also indicate whether this generated collection should have a text index built for it, and if so, which indexing technology to use for this purpose. The indexing technology itself is not part of the present invention.
  • Examples of indexing technologies are WAIS and GLIMPSE.
  • If an index is generated, it is installed in the same directory as the metadata files.
  • a cross-reference file is also generated which maps the index database objects to the IHOs. If indexing is not performed, the generated collection is treated as a set.
  • a typical extractor takes as input the location of the information source which is to be encapsulated. It returns a formatted string which the pre-processor interprets to generate metadata entries that are stored in a metadata file.
  • the metadata file itself has a well-defined format, described in more detail below.
  • the extractor also extracts the text associated with the generated IHOs.
  • the C file has to be parsed to recognize comments and function signatures, because indexing the language constructs and variable names does not usually make sense.
  • an IU would be associated with either a function or the file as a whole. Representative information is also extracted and associated with the IU. This will be displayed to the user at browse time; e.g., for mail messages the subject line is used as a representative, for HTML documents the contents of the TITLE construct are used, etc.
  • Metadata is passed from the extractor in a format called the metadata transfer format.
  • This format (a Perl data structure) has constructs which allow arbitrary graph structures to be imposed on top of the IHOs (e.g., parent-child relationships between collections).
  • the object's type and subtype are associated with the IUs and are both determined by the extractor process.
  • the location attribute i.e., a value used to locate the IU in the file system
  • URL Uniform Resource Locator
  • URLs are used for HTML documents.
  • the location of a C function could be specified as 'filename%function_name'.
  • Metadata is transferred between the extractors and the pre-processor as a structured Perl string, whose format is shown as 48 in Figure 4(a).
  • Each IU has six fields of metadata associated with it (e.g., f11 through f16), each separated by a colon, and each IU's metadata is separated from the next IU's by a vertical bar 52.
  • Figure 4(b) depicts a table 54 that summarizes the purpose of each field.
  • the location field 55 is created by the extractor process to identify where the IU is stored.
  • the Unique ObjId Indicator field 56 instructs the pre-processor whether to use the Location to construct a unique object identifier. For some cases the extractor-supplied locator is guaranteed to be unique, so the pre-processor need not manipulate it.
  • One such case is IUs associated with HTML files, for which URLs are generated by the extractor as IU locations. These URLs are unique. If this flag is set, the pre-processor constructs a unique identifier for the object.
  • the ordinal value of the Depth field 58 indicates the depth of that IU in an in-order traversal of the desired repository structure.
  • the collection object which is the root of this tree, is pre-assigned a depth of 0.
  • An extractor returning a simple list of file IUs that are to be a part of this collection would assign a depth of 1 to each of these file IUs.
  • the pre-processor then makes all of these file IUs children of the collection object.
  • An example of the structure in the metadata transfer format is shown in Fig. 5.
  • the Subtype field 60 is determined by the extractor and is used later by the IH server to determine how to access the actual IU.
  • the Subject field 62 contains summary information related to an IU and is what the user will see as the "name" of the object at the time of browsing.
  • the last field 64 is the text body of the IU, to be used if the collection is being indexed.
  • An IU may be represented multiple times in this stream, possibly to assert a relationship with other IUs, but a metadata entry is made for only the first occurrence.
  • An empty text field indicates that the IU need not be cross-referenced for indexing.
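  • The transfer format and the Depth field described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the field order (location, indicator, depth, subtype, subject, text) follows the order in which the fields are described, the indicator is assumed to be encoded as "1" or "0", and no escaping of colons or bars is handled. The names `Entry`, `parse_transfer_string`, and `build_tree` are hypothetical.

```python
from dataclasses import dataclass, field

FIELD_SEP = ":"    # separates the six fields of one IU
RECORD_SEP = "|"   # separates one IU's metadata from the next


@dataclass
class Entry:
    location: str
    make_unique_id: bool   # Unique ObjId Indicator (assumed "1"/"0" encoding)
    depth: int
    subtype: str
    subject: str
    text: str
    children: list = field(default_factory=list)


def parse_transfer_string(stream):
    """Split the stream into Entry records (assumed field order, no escaping)."""
    entries = []
    for record in stream.split(RECORD_SEP):
        if not record:
            continue
        # maxsplit=5 lets the final text field keep any colons it contains
        location, flag, depth, subtype, subject, text = record.split(FIELD_SEP, 5)
        entries.append(Entry(location, flag == "1", int(depth), subtype, subject, text))
    return entries


def build_tree(collection_name, entries):
    """Attach each entry to the nearest preceding entry of smaller depth."""
    root = Entry(collection_name, False, 0, "collection", collection_name, "")
    stack = [root]                  # stack[d] is the current node at depth d
    for entry in entries:
        if not 1 <= entry.depth <= len(stack):
            raise ValueError(f"depth {entry.depth} skips a level")
        del stack[entry.depth:]     # keep only the ancestors of this entry
        stack[-1].children.append(entry)
        stack.append(entry)
    return root
```

    A flat list of file IUs, all at depth 1, therefore becomes a set of direct children of the collection object at depth 0, while a depth-2 entry becomes a child of the depth-1 entry that precedes it.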
  • Object entries are flat representations of the IHOs, whereas relationship entries represent parent-child relationships between IHOs.
  • Object entries have an object identifier. This object identifier could be constructed by the pre-processor or the extractor, as specified by the indicator in the metadata transfer format. If the pre-processor constructs the object identifier, it does so in a specific format.
  • the format is: machineid:location:subtype
  • the machineid is a unique physical machine identifier of the machine on which the pre-processor is run. This field is automatically generated by the pre-processor.
  • the location and subtype field values are assigned based on the values returned in the metadata transfer stream.
  • the location field, for a simple or composite IHO, would be the location of the associated IU.
  • For a collection IHO this would be the location of the collection; e.g., for an indexed collection it would be the location of the index.
  • the subtype field value is the same as the subtype value returned in the metadata transfer stream. For a collection IHO this is the index type; i.e., wais or glimpse.
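  • A minimal sketch of the machineid:location:subtype identifier follows. The function names and the `construct` parameter are hypothetical, and the splitting helper assumes that neither the machine identifier nor the subtype contains a colon (the location may).

```python
def make_object_id(machine_id, location, subtype, construct=True):
    """Return machineid:location:subtype, or the extractor's location as-is."""
    if not construct:
        # e.g. a URL supplied by an HTML extractor is already unique
        return location
    return f"{machine_id}:{location}:{subtype}"


def split_object_id(object_id):
    """Recover (machineid, location, subtype) from a constructed identifier."""
    machine_id, rest = object_id.split(":", 1)     # first colon ends machineid
    location, subtype = rest.rsplit(":", 1)        # last colon starts subtype
    return machine_id, location, subtype
```

    For an indexed collection the same function applies, with the location of the index as `location` and the index type (wais or glimpse) as `subtype`.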
  • An object entry is of the form as shown in Figure 5.
  • the first field 70 serves as the object identifier. This object identifier is used for uniquely identifying the object and serves as a key.
  • the type 71 and subtype 72 values correspond to non-terminal and terminal classes in the server abstract class hierarchy.
  • the location value 73 is used by the browser methods to retrieve the data associated with the IU encapsulated by this object. Following this there could be an arbitrary number of attribute-value pairs 74.
  • a relationship entry is of the form: [objidl
  • the HTTP server is connected to the IH server through a gateway.
  • This gateway interacts with two types of programs: an HTTP server (which in turn interacts with an HTTP browser, e.g., Mosaic or Netscape) and an IH server.
  • Any HTTP-compliant browser can interact with the HTTP server and the IH servers.
  • the HTTP protocol is stateless (see http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html for information).
  • the user must specify the machine where the IH server they wish to interact with is running, the port number the IH server is using to accept connections, their X display value, the query text that should be used to select objects from the collection, the maximum number of hits to return on a successful query, and the collection against which the query will be executed.
  • One approach to gathering this information would be to force the user to specify all necessary parameters by hand on each interaction with the gateway. However, that would clearly not be a very user-friendly approach. Instead, our design is such that the user only needs to enter certain information once, on a "setup" screen.
  • This arrangement causes the gateway to spend time performing two tasks: retrieving information from incoming URLs, and reformatting the output of the IH server into URLs (and HTML).
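  • Because HTTP is stateless, every link the gateway returns has to carry the values the user entered on the setup screen. One way this might look is sketched below using Python's standard `urllib.parse`; the gateway address and the parameter names (`action`, `host`, `port`, `display`, and so on) are assumptions made for illustration, not the names used by the actual gateway.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

GATEWAY = "http://gateway.example/cgi-bin/nph-ih.cgi"   # hypothetical address


def make_link(action, host, port, display, **extra):
    """Build a gateway URL that carries all state needed for the next request."""
    params = {"action": action, "host": host, "port": port, "display": display}
    params.update(extra)
    return f"{GATEWAY}?{urlencode(params)}"


def read_link(url):
    """Recover the state dictionary from an incoming gateway URL."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}
```

    A link built with `make_link("query", "ih.example", 7000, "pc1:0.0", collection="manpages", text="file system")` can be bookmarked or embedded in another page, since nothing about the request is held on the server between interactions.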
  • the HTTP browser opens a URL pointing to the gateway (e.g., http://http.ctt.bellcore.com/cgi-bin/nph-ih.cgi).
  • the HTTP server responds by returning the setup screen to the HTTP browser.
  • the user determines the IH server to connect to and enters the correct information on the fill-out form on the setup screen.
  • the gateway connects to the specified IH server and requests a list of collections managed by the IH server.
  • For each item in the list returned by the IH server, the gateway generates a URL containing all the necessary information required to access this collection on the next interaction, and returns the list to the HTTP server, which in turn passes it to the requesting HTTP browser. The user can then select one of the collections returned by the gateway for further interrogation. If the collection is indexed, the gateway presents a form to the user for entering the search text. If the collection is not indexed, the gateway connects to the appropriate IH server (as specified in the URL) and requests the contents of the list. The list contents are then formatted appropriately in HTML by the gateway, and URLs are generated for each item in the list.
  • a search can be initiated. If the user submits a query, the gateway sends that request to the IH server. The IH server response is similar to the results returned when the members of a list are requested, and again, the gateway formats the results into a list with their corresponding URLs. In either the search results list or the simple list, the HTTP browser can select any of the items in the list. If the user selects an item (i.e., clicks on the link), this translates to saying "show me this item."
  • the gateway contacts the appropriate IH server (again determined by the state information embedded within the URL) and requests the particular item. If the item has been designated as displayable by the IH server, the IH server retrieves the item and uses X to display the item back to the user. If the item has been designated as displayable by the HTTP browser, the IH server retrieves the item and sends it back to the gateway. The gateway determines (based upon the type of data returned) what Multipurpose Internet Mail Extensions (MIME) type the item corresponds to and returns the appropriate header information as well as the actual data to the HTTP browser.
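  • The last step, choosing a MIME type and prefixing header information to the data, might be sketched as follows. The subtype names and the mapping table are assumptions for illustration; the text does not say how the gateway actually classifies returned data.

```python
# Hypothetical mapping from IHO subtype to MIME type.
MIME_BY_SUBTYPE = {
    "text": "text/plain",
    "html": "text/html",
    "postscript": "application/postscript",
}


def http_response(subtype, body):
    """Prefix the raw item bytes with a header naming its MIME type."""
    mime = MIME_BY_SUBTYPE.get(subtype, "application/octet-stream")
    header = f"Content-Type: {mime}\r\nContent-Length: {len(body)}\r\n\r\n"
    return header.encode("ascii") + body
```

    Unknown subtypes fall back to `application/octet-stream`, which leaves the decision about how to present the data to the browser.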
  • Although IH users will find the steps outlined in the previous paragraph familiar, it is important to remember that these steps can occur in any sequence as long as the appropriate information is passed to the gateway. Again, the reason for this is the stateless nature of HTTP. Some users may wish to exploit this feature.
  • a user may wish to construct several "canned" queries against a particular IH server.
  • the URLs representing these queries can be embedded in other HTML documents providing more descriptive text regarding the queries, or their intended results.
  • Another user may want to provide access to individual objects held by the IH server. They may construct URLs that point directly to the objects (even objects that are members of an indexed collection) and circumvent the need for search queries to retrieve the objects.
  • When an IH server generated link is activated by the user (e.g., the user clicks on an object on the query results screen), the gateway examines the URL that was activated. All such URLs are unescaped and validated. Unescaping a URL consists of replacing all sequences of the form %XX (where each X is a hexadecimal digit) with their corresponding ASCII characters.
  • Validating a URL consists of extracting the information contained in the URL (i.e., IH server address, port, query text, etc.) and checking that the values are within certain constraints (e.g., the address is a valid TCP/IP address, the port number is non-negative, etc.).
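  • Unescaping and validation as described above could be sketched like this. The specific constraints checked (a numeric IP address, a port no larger than 65535, a positive hit limit) are our assumptions about what "within certain constraints" might mean; `unescape` and `validate` are hypothetical names.

```python
import ipaddress
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
_DIGITS = re.compile(r"[0-9]+")


def unescape(text):
    """Replace every %XX sequence with the character whose code is hex XX."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def validate(address, port, max_hits):
    """Return a list of problems; an empty list means the values are acceptable."""
    problems = []
    try:
        ipaddress.ip_address(address)        # assumes a numeric address
    except ValueError:
        problems.append(f"bad address: {address!r}")
    if not (_DIGITS.fullmatch(port) and int(port) <= 65535):
        problems.append(f"bad port: {port!r}")
    if not (_DIGITS.fullmatch(max_hits) and int(max_hits) > 0):
        problems.append(f"bad max_hits: {max_hits!r}")
    return problems
```

    The substitution runs in a single pass, so a sequence such as `%2541` becomes `%41` rather than being decoded twice.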
  • the gateway identifies the action being requested by the user and performs the specified action. For some actions (e.g., query, expand, show) the IH server is contacted for the desired information. For others, the gateway can handle the request itself. In cases where interaction with the IH server is necessary, the gateway determines the response type for the IH server and performs the necessary reformatting of any returned data. The gateway converts the response into an
  • the gateway supports a number of different "actions" that an HTTP browser can request. Each of these actions is described below.
  • a "setup" request presents the user with the initial IH server setup screen. This screen is used to set default values used in other interactions with the gateway. This action is normally the first action in a set of interactions between the user and the gateway.
  • the "init" request determines the host name of the IH server, the port where the server is accepting requests, and the DISPLAY value of the user's machine. Default values for these variables are maintained in the gateway and are presented to the user. The end user may alter any of these values from the setup screen.
  • the values submitted by the user are then maintained across invocations of the gateway by adding them to all URLs created by the gateway and returned to the user. Once the user has specified these values and has submitted the request to the gateway, they are presented with the list of collections that the IH server they specified can access.
  • the "expand" request expands collections. Expanding a collection has a different meaning for different types of collections. For indexed (i.e., searchable) collections, expand provides a form-based interface for specifying search arguments for the collection. For all other collections, expand causes a request to be sent to the IH server asking for a particular IH collection (specified by an object ID). The results of this request are formatted in HTML for display back to the HTTP browser. The HTML will not include a URL to the parent collection when the object's type is LIST; otherwise, a URL to the parent will be included in the HTML.
  • a "query" request performs a query on an indexed collection.
  • the query text is passed to the IH server and if the collection contains any information units that satisfy the search criteria, the IH server returns a list of the IHO IDs corresponding to the information units. If no matching information units were found, the IH server returns a message stating that no matches were found.
  • the "show" request provides the user with a capability to view a particular object.
  • the object ID of the desired object and the HTTP browser's DISPLAY value are passed to the IH server.
  • the IH server will either return the desired object to the gateway (which then passes the object back to the HTTP browser), or it will start a process to display the object back to the HTTP browser.
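  • The five actions described above (setup, init, expand, query, show) lend themselves to a simple dispatch routine. The sketch below is illustrative only: `StubIHServer` is a stand-in with a made-up interface, the request is assumed to arrive as a dictionary already extracted from the URL, and the return values are placeholders for the HTML the gateway would generate.

```python
class StubIHServer:
    """Stand-in for a connection to a real IH server (hypothetical interface)."""

    def list_collections(self):
        return ["manpages", "mail"]

    def members(self, object_id):
        return [f"{object_id}/item1", f"{object_id}/item2"]

    def query(self, object_id, text, max_hits):
        hits = [m for m in self.members(object_id) if text in m]
        return hits[:max_hits]

    def fetch(self, object_id, display):
        return f"<contents of {object_id}>"


def handle(server, request):
    """Dispatch one stateless gateway request to the matching action."""
    action = request.get("action", "setup")
    if action == "setup":
        return "setup form: host, port, DISPLAY"
    if action == "init":
        return server.list_collections()
    if action == "expand":
        if request.get("indexed"):
            return "search form"          # indexed collections get a query form
        return server.members(request["object_id"])
    if action == "query":
        hits = server.query(request["object_id"], request["text"],
                            request.get("max_hits", 10))
        return hits or "no matches were found"
    if action == "show":
        return server.fetch(request["object_id"], request.get("display"))
    raise ValueError(f"unknown action: {action}")
```

    Since each request carries everything it needs, the actions can be invoked in any order, which matches the stateless behavior noted earlier.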
  • E. DESCRIPTION OF THE IH SERVER
  • The IH Server is key to our inventive system and provides the end-users with access to a set of IH Objects (IHOs) that make up that server's repository. Upon start-up, the server is told what collections will make up that server's repository.
  • For each collection specified, the server locates, reads, and parses the collection's metadata file, constructing an internal (in-memory) representation of the IHOs and their relationships.
  • Each IHO in memory is an instance of an "artifact" C++ subclass; the particular subclass depends upon the type of the IHO and determines how the object will handle incoming HTTP browser requests.
  • Once it has read the metadata, the server goes into an event loop where it waits for incoming requests from the Gateway, processes those requests, and sends back appropriate responses.
  • the IH server is initialized either manually by an administrator or automatically during a machine's boot cycle. The server is told which collections will make up its repository through various command-line arguments. For each collection, an ihMeta object is constructed to read and parse the metadata for that collection (see table 75 in Figure 6). Each collection is stored in its own subdirectory and contains a file called IH_SUMMARY that contains meta-information about the collection.
  • the server uses that meta-information to determine specifically which IHO metadata files to read.
  • Each metadata file contains entities describing encapsulated IHOs and their inter-relationships.
  • The ihMeta object parses each entity one at a time. An entity can be either an IHO or a relationship. For each IHO entity, a new ihArtifact C++ object is constructed. The object is actually an instance of one of the concrete classes derived from ihArtifact. The particular concrete class generated depends on the IHO's type attribute; each artifact subclass defines specific behavior for various requests against that type of object. The type thus determines how the artifact will respond to end-user actions on the object. Once the object has been created, it is added to a global object table for future reference, using the ObjectId as the key.
  • Relationship entities designate parent-child associations between two objects.
  • The server looks up both "ends" of the relationship in a global object table and establishes a bidirectional reference between the parent and child artifacts (i.e., the child is added to the parent's set of children and the parent is added to the child's set of parents).
  • While parsing metadata, if the ihMeta object detects malformed entities it reports appropriate error messages to the administrator. If too many errors are found, the server aborts before reaching the event loop. Once the server has successfully read in all of its collections, it goes into the main event loop and waits for requests from clients.
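The loading step described above (one object-table entry per IHO entity, a two-way link per relationship entity, and an abort when too many entities are malformed) can be sketched as follows. This is a minimal illustration in Python rather than the C++ of the embodiment; the dictionary layout of an entity, the `Artifact` class, the type name `COLL`, and the `max_errors` threshold are all assumptions made for the sketch and are not taken from the metadata format of Figure 4.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical stand-in for an ihArtifact instance."""
    object_id: str
    type: str
    children: set = field(default_factory=set)  # object IDs of children
    parents: set = field(default_factory=set)   # object IDs of parents


class TooManyErrors(Exception):
    """Raised when malformed entities exceed the allowed threshold."""


def load_entities(entities, max_errors=3):
    """Build the object table and link parent-child relationships.

    `entities` is a list of dicts; that layout is an assumption of this
    sketch, not the patent's metadata format.
    """
    table = {}   # object ID -> Artifact
    errors = []  # messages that would be reported to the administrator
    for entity in entities:
        kind = entity.get("kind")
        if kind == "iho" and "id" in entity:
            table[entity["id"]] = Artifact(entity["id"], entity.get("type", ""))
        elif kind == "relationship":
            parent = table.get(entity.get("parent"))
            child = table.get(entity.get("child"))
            if parent is None or child is None:
                errors.append(f"unknown endpoint in {entity!r}")
            else:
                # Bidirectional reference between parent and child.
                parent.children.add(child.object_id)
                child.parents.add(parent.object_id)
        else:
            errors.append(f"malformed entity {entity!r}")
        if len(errors) > max_errors:
            raise TooManyErrors(errors)  # abort before any event loop starts
    return table, errors
```

Storing object IDs rather than direct references in the parent and child sets keeps the object table the single place where artifacts are resolved, which is one possible reading of the "global object table" described above.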
  • The IH server runtime object model is based upon a class hierarchy of abstract and concrete C++ classes. Every IH Object has both a type and a subtype.
  • The type defines which concrete class will represent the IHO in the server's internal representation of the object and how, in general, the object will respond to user actions.
  • The subtype determines how those general actions on the object will actually be implemented (for instance, server-side PostScript objects (type MM, subtype postscript) get displayed by running ghostview, while server-side FrameMaker objects (type MM, subtype frame) get displayed by running FrameMaker software).
  • The types and subtypes of the objects are determined by the extractors during collection preparation.
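The two-level dispatch described above, where the type selects the class and the subtype selects how an action is carried out, might look like the following Python sketch. The class names, the `make_artifact` helper, and the FrameMaker command name are hypothetical; only the type `MM`, the subtypes `postscript` and `frame`, and the use of ghostview come from the text.

```python
class Artifact:
    """Hypothetical base class standing in for the abstract ihArtifact."""

    def __init__(self, object_id, subtype, location):
        self.object_id = object_id
        self.subtype = subtype
        self.location = location

    def activate(self):
        raise NotImplementedError


class ServerSideMedia(Artifact):
    """Type MM: displayed by a viewer program chosen by subtype."""

    # subtype -> viewer command; "framemaker" is an assumed command name.
    VIEWERS = {"postscript": "ghostview", "frame": "framemaker"}

    def activate(self):
        viewer = self.VIEWERS[self.subtype]
        # Return the command line instead of running it, so the sketch
        # has no side effects.
        return [viewer, self.location]


# type -> concrete class; a fuller sketch would register more types here.
CLASS_FOR_TYPE = {"MM": ServerSideMedia}


def make_artifact(object_id, type_, subtype, location):
    """Pick the concrete class from the type, as the text describes."""
    return CLASS_FOR_TYPE[type_](object_id, subtype, location)
```

For example, `make_artifact("o1", "MM", "postscript", "/docs/a.ps").activate()` yields the command line for the PostScript viewer without the caller ever inspecting the subtype.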
  • Figure 7 shows a class inheritance diagram for the ihArtifact family of classes.
  • ihArtifact is an abstract class that defines the interface to all IH Objects in the system.
  • The ihArtifact abstract class 80 inherits the attributes from the ihArtFile objects 82 and the ihArtSet objects 84.
  • Figure 8 depicts a table that defines the abstract interface to artifact objects.
  • Figure 9 depicts a table containing descriptions of how each of the subclasses implements those methods described in Figure 8.
  • Each metadata entity in a repository is represented at runtime by an instance of a class in the ihArtifact hierarchy.
  • These artifacts are maintained via two mechanisms: (1) an object table that maps object IDs to artifacts, and (2) a graph, linking objects by two-way parent-child relationships.
  • This table is stored in an instance of the ihGraph class (see Figure 10) called "graph".
  • Figure 11 shows an example of the primary object relationships in the server at runtime.
  • The server enters the main event loop.
  • The main loop is responsible for reading and processing requests.
  • The server processes each incoming request as it is received from the HTTP browser.
  • The server contains a global instance of the class ihIpc called "server" that handles the inter-process communications.
  • The main event loop asks the "server" object to read the next request; once read, the request is passed on to the metadata graph object for processing.
  • The graph parses the request to determine the object ID of the object being acted on as well as the action to take on it.
  • The graph looks up the artifact in its object mapping table, invokes the appropriate method on that artifact, and captures the results.
  • The results are then returned to the HTTP browser.
  • Figure 12 shows an example of this behavior in an object interaction diagram.
  • The main event loop 100 tells the server object 101 to read a request and tells the graph to process 102 the request.
  • The graph invokes the appropriate method on the artifact (in this case, activate 103), which may in turn run a browser script 104 to actually retrieve the desired data.
  • The results are returned to the gateway by the server object.
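The request path just described (read a request, let the graph parse out the action and object ID, look up the artifact, invoke the matching method, return the result) can be sketched in Python as below. The wire format `"<action> <object ID>"`, the `Document` class, and the `describe` action are assumptions of the sketch; the text names `activate` but does not specify the internal protocol.

```python
class Document:
    """Hypothetical artifact with two actions a request may name."""

    def __init__(self, object_id, location):
        self.object_id = object_id
        self.location = location

    def activate(self):
        return f"contents of {self.location}"

    def describe(self):
        return f"{self.object_id} at {self.location}"


class Graph:
    """Maps object IDs to artifacts and dispatches requests to them."""

    ACTIONS = ("activate", "describe")  # only these methods may be invoked

    def __init__(self):
        self.table = {}

    def add(self, artifact):
        self.table[artifact.object_id] = artifact

    def process(self, request):
        # Assumed request format: "<action> <object ID>".
        action, _, object_id = request.strip().partition(" ")
        artifact = self.table.get(object_id)
        if artifact is None:
            return f"error: unknown object {object_id!r}"
        if action not in self.ACTIONS:
            return f"error: unsupported action {action!r}"
        return getattr(artifact, action)()


def event_loop(requests, graph):
    """Process requests in arrival order and collect the responses."""
    return [graph.process(request) for request in requests]
```

Restricting dispatch to an explicit tuple of action names is a design choice of the sketch: it keeps a request from invoking arbitrary attributes of an artifact.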
  • Each object type in the IH server responds to user interactions in its own way. Sometimes this functionality is coded directly in C++ in the IH server; other times the functionality is dependent upon "helper" programs called "browser-scripts." A browser-script defines type/subtype-specific mechanisms for accessing an object.
  • The input to a browser-script is a location parameter that identifies the object to be viewed.
  • The responsibility of the browser-script is to display this object to the user; how this is achieved depends upon the kind of data contained in the object and how that data is to be shown to the user.
  • The browser-script for PostScript documents is invoked when the user wants to display a document whose type is MM ("server-side" multimedia) and whose subtype is ps.
  • The PostScript browser-script takes the name of a PostScript document and executes a viewer program (i.e., ghostview) to display that document.
  • The C browser-script is passed the name of a C file and the name of a function within that file; the script extracts the specified function and sends that text back to the invoking program (the server).
  • The IH server needs to execute a UNIX program (such as a Perl script) and capture its output.
  • The server runs Perl programs called "browser-scripts"; these scripts display the contents of an object to the user in a type- and subtype-specific manner.
  • For queries, the server needs to run an indexer-specific Perl program, which in turn executes a search program and formats the responses.
  • The stand-alone function "encapsulate" is used for both of these tasks. Encapsulate forks a new child process and establishes the equivalent of a pipe between the parent and child processes: the child's standard error and output are redirected back to the parent, which then reads that output. The output from the child is collected in a dynamically sized buffer (see the Block Manager, below); the buffer can then be sent back to the HTTP browser if necessary.
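The behavior of "encapsulate" described above can be approximated with Python's standard `subprocess` module, which creates the child process and the pipe on the caller's behalf. This is a sketch of the idea, not the embodiment's C++ function: it swaps the explicit fork and pipe for `subprocess.run`, and the return convention (exit code plus combined output) is an assumption.

```python
import subprocess
import sys


def encapsulate(argv):
    """Run `argv` as a child process and capture its output.

    Standard error is redirected into standard output, so the parent
    reads both streams through a single pipe. Returns a tuple of
    (exit code, combined output as bytes).
    """
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,     # pipe from child back to parent
        stderr=subprocess.STDOUT,   # merge stderr into the same pipe
        check=False,                # report failures via the exit code
    )
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # Usage: run a tiny child that writes to both streams.
    code, output = encapsulate(
        [sys.executable, "-c", "import sys; print('out'); sys.stderr.write('err')"]
    )
    print(code, output)
```

Because both streams share one pipe, the relative order of the child's standard output and standard error in the captured bytes depends on the child's buffering and should not be relied upon.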
  • The ihBlockMgr class serves this purpose.
  • This class maintains a sequence of zero or more "blocks," or buffers, of data. Each block can hold up to a fixed number of bytes.
  • As data is being captured by the encapsulate function or read in from a file, it is written into the last block in the block manager's sequence.
  • ihBlockMgr includes methods for iterating through the blocks one at a time and for clearing out the manager's contents.
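A block manager of the kind described above can be sketched in a few lines of Python. The class name `BlockManager`, the default block size, and the rule that a full last block triggers allocation of a new one are assumptions of the sketch; the text only states that data is written into the last block and that each block has a fixed capacity.

```python
class BlockManager:
    """Sequence of fixed-capacity byte blocks (sketch of the ihBlockMgr idea)."""

    def __init__(self, block_size=4096):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.blocks = []  # list of bytearray, each at most block_size long

    def write(self, data):
        """Append bytes, filling the last block before starting a new one."""
        offset = 0
        while offset < len(data):
            if not self.blocks or len(self.blocks[-1]) == self.block_size:
                self.blocks.append(bytearray())
            last = self.blocks[-1]
            room = self.block_size - len(last)
            chunk = data[offset:offset + room]
            last.extend(chunk)
            offset += len(chunk)

    def __iter__(self):
        """Iterate through the blocks one at a time, as immutable bytes."""
        return (bytes(block) for block in self.blocks)

    def clear(self):
        """Discard all buffered data."""
        self.blocks.clear()
```

Growing by whole blocks avoids copying previously captured output when the total size is not known in advance, which is presumably why a child process's output is collected this way.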
  • In_prep is a Perl script used to extract metadata.
  • In_prep cooperates with two other types of programs: extractors and indexers.
  • Extractors are type-specific Perl subroutines required by in_prep to traverse physical data and extract the information required for metadata and indexes.
  • A separate extractor is needed for each type of data placed under control of an IH server.
  • Indexers can be implemented in any language. The only limitation imposed is that the in_prep process must be able to access the indexer via the Perl "system()" function. Indexers are not type specific, since they can be applied to any text data.
  • Indexers are used to provide content-oriented queries over physical data.
  • Figure 13 illustrates the interactions that take place between in_prep, extractors, and indexers.
  • For each invocation of in_prep 111, an extractor is called to process each member of the desired information units.
  • The in_prep process passes the location of the physical data (usually a file name) to the extractor 112.
  • The extractor in turn processes the physical data (referred to as an information unit, IU) and extracts metadata as well as text to be indexed from the IU; if there is more than one IHO in the IU, the extractor also establishes relationships between the objects.
  • The objects and relationships created by the extractor 112 are returned to in_prep 111, which writes them to the metabase for later use by the IH server.
  • In_prep 111 invokes the appropriate indexer to index 113 the text data extracted from the IU.
  • The output of the indexer is saved in the metabase for later use by the IH server.
  • The metadata entries produced by in_prep and stored in the metabase are loaded into memory by the IH server at run time.
  • The IH server then enters a loop where it responds to incoming requests from HTTP browsers. Referring back to Fig. 3, after the server is initialized and running, the IH server enters a main event loop and waits for requests from clients 38. End-users then access the IH server through an HTTP server 40.
  • Once the end-users access the IH server, they perform one of three actions to select an object 42: (1) a metadata-based query, (2) a content-based query, or (3) explicit navigation around the IHOs.
  • Once an object is selected, it can be accessed and browsed by activating either a client side browser 44 or server side browser 46. The user may also operate on the object, choosing from a set of procedures such as print, store, fax, etc.
  • Figure 14 illustrates the processing of a request by an end-user for conducting a metadata query.
  • A client requests 121, via HTTP, the initial collection held by an IH server.
  • The request is passed, via the CGI 122, to the gateway.
  • The gateway connects to the IH server and requests 123 the initial collection via an internal protocol.
  • The IH server determines the initial collection based upon its in-memory metadata and returns the results to the gateway 124.
  • The gateway reformats the response into HTML and sends 125 its response to the HTTP server.
  • The HTTP server passes 126 the results back to the HTTP browser client without interruption since our gateway is a "no parse header" gateway. This means that the HTTP server will do no parsing of our response, and the gateway must be able to form correct HTTP responses.
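Because a "no parse header" gateway's output is relayed unmodified, the gateway itself has to emit the status line and headers as well as the HTML body. The following Python sketch shows one way to assemble such a response; the function name, the choice of HTTP/1.0, and the particular headers are assumptions for illustration, not details given in the text.

```python
def nph_response(html, status="200 OK"):
    """Build a complete HTTP response, status line included.

    An ordinary CGI program leaves the status line to the HTTP server;
    a no-parse-header program must supply it, since the server forwards
    the bytes as they are.
    """
    body = html.encode("utf-8")
    head = (
        f"HTTP/1.0 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separates headers from the body
    )
    return head.encode("ascii") + body
```

A gateway written this way would write the returned bytes directly to its standard output for the HTTP server to pass along.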
  • Figure 15 illustrates the process for conducting a context-oriented query.
  • The end-user requests, via HTTP, a context-oriented query 151 for an InfoHarness collection held by an ih_server.
  • The request is passed via the CGI to the gateway 152.
  • The gateway connects to the ih_server and requests a context-oriented query 153, passing the query text.
  • The proper indexer is invoked to perform the search 154.
  • The indexer returns a list of IHOs that satisfy the query 155.
  • The IH server returns the list of IHOs to the gateway 156.
  • The gateway reformats the list of InfoHarness objects in HTML and returns the list to the HTTP server 157.
  • The HTTP server transmits the list of objects to the HTTP browser 158.
  • Figure 16 illustrates the processing of a request for invoking a server side browser.
  • A client requests, via HTTP, an IH object held by an IH server 161.
  • The request is passed via the CGI to the gateway 162.
  • The gateway connects to the IH server and requests the IH object via any internal protocol 163.
  • The IH server determines that the requested object requires the invocation of a server side browser 164.
  • The correct browser is invoked with the location of the object.
  • The browser starts a process that displays the object back to the client's machine 164.
  • Any error text generated by the browser is returned to the IH server 166.
  • The IH server returns a message to the gateway indicating either successful invocation of the browser, or error text generated by the browser 167.
  • The gateway indicates success via the HTTP OK message 169.
  • The response from the gateway is transmitted to the user via HTTP 170.
  • If the user does not close the application started by the browser, they can invoke any actions supported by the application and the results will be sent back to the machine where the browser was started. (Note the security risks associated with server side browsers. The user has access to an application that runs with the inherited permissions of the IH server.)
  • Figure 17 illustrates the process for a request for invoking a client side browser.
  • A client requests, via HTTP, to see an IH object held by an IH server 171.
  • The request is passed via the CGI to the gateway 172.
  • The gateway connects to the IH server and requests the IH object 173.
  • The IH server examines the type of the object requested and determines that the object can be displayed using a client side browser (or in HTTP browser terms, an external viewer).
  • The location of the object is determined and the IH server returns the contents of the file to the gateway 174.
  • The gateway performs a mapping between the IH subtype of the object and the MIME type corresponding to the object.
  • This MIME type is returned with the object contents to the HTTP server 175.
  • The HTTP browser receives the contents of the object and determines which external viewer to invoke for the specified MIME type 176.
  • The contents of the object are stored in a temporary file.
  • The external viewer is started 177 with the name of a temporary file that contains the contents of the requested object.
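The gateway's mapping from IH subtype to MIME type, used in the client side browser flow above, amounts to a table lookup with a fallback. A minimal Python sketch follows; the subtype names other than `postscript` and the choice of fallback are assumptions, while the MIME type strings themselves are standard registered types.

```python
# IH subtype -> MIME type. Only "postscript" is a subtype named in the
# text; the other keys are hypothetical examples.
MIME_FOR_SUBTYPE = {
    "postscript": "application/postscript",
    "html": "text/html",
    "text": "text/plain",
}


def mime_type_for(subtype):
    """Return the MIME type for an IH subtype.

    Unknown subtypes fall back to a generic binary type so that the
    HTTP browser can still save the object or prompt for a viewer.
    """
    return MIME_FOR_SUBTYPE.get(subtype, "application/octet-stream")
```

The HTTP browser then uses the returned MIME type to choose which external viewer to start on the temporary file.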

Abstract

Our invention is a system and methodology for integrating heterogeneous information in a distributed environment by encapsulating data about existing and new information into objects (16). The process of encapsulating the information requires extracting metadata from the information and creating, from the metadata, a database (30) in which the metadata is grouped into objects (26) and groups of objects (28), which are logically associated into collections (28). This database of objects and collections is instantiated into the runtime memory of a server (22), organized into repositories (24) of objects (20) and collections (28). A user (12) seeking access to the information would then, using an HTTP-compliant browser (20), access the server (22) to reach the information through the objects (26) created and stored in the server.

Description

METHOD AND SYSTEM FOR PROVIDING UNIFORM ACCESS TO HETEROGENEOUS INFORMATION
TECHNICAL FIELD OF THE INVENTION
This invention relates to data processing systems and networks. More specifically, this invention relates to methods and systems for accessing distributed heterogeneous information sources and databases.
BACKGROUND OF THE INVENTION
Given the advances in modern computer technology and the proliferation of relatively inexpensive off-the-shelf authoring and office automation software, the ability to create information has increased dramatically. Naturally, therefore, the size, diversity, and quantity of information repositories have also increased. As a result, enormous amounts of information have been accumulated within corporations, government organizations, and universities. With the advances in data communication technology and computer networking, much of this information is in electronic repositories on networks accessible to anyone with a computer connected to these networks. However, the information is heterogeneous, i.e., stored in many forms of differing types and representations.
In such an environment, users who want to access these heterogeneous types of information have to know not only about the existence and location of the information but also its format, the different database query languages, and the differing procedures for accessing and retrieving it. Accordingly, knowledge workers spend too much time trying to locate, access and retrieve the information they need. Often, because of these barriers to access, knowledge workers give up trying to access the information and recreate the same information in another repository in yet another inconsistent manner. These problems in accessing heterogeneous information reduce individual and organizational productivity, thereby increasing the cost of doing business. To address these problems, those practicing in the art have attempted to build uniform information repositories by relocating and reformatting the original information in some standard format and at centralized locations. This approach requires the design and maintenance of an ever-increasing number of ever-changing format translators. In addition, the initial conversion of the information often requires substantial human and computing resources. Furthermore, maintaining the repositories requires either creating new information and updating existing information in the uniform format, or continuously managing changing data in different formats. These approaches are not only resource intensive; because they are based on a centralized model of system management, they also suffer from the performance, administrative and reliability bottlenecks inherent in centralized systems.
Another problem presented by the prior art is that sophisticated indexing and search techniques are available only for certain types of information, or such techniques come embedded within an application and cannot be applied to other kinds of information, i.e., such techniques are part of a closed system. Therefore, on a network with heterogeneous information, users are burdened with having to cope with multiple indexing and search techniques that are developed and applied in idiosyncratic ways to handle different kinds of information. One recent advance in the art of providing users easy access to information from a variety of sources is the development of the World Wide Web on the Internet. Users who connect to HTTP servers with hypertext transfer protocol (HTTP) browsers have access to numerous sources of information. The information they are able to retrieve consists of textual files formatted using a hypertext markup language (HTML). These HTML files not only provide users with textual information; embedded within the text are pointers to other sources of information, which may be graphic, audio, video or textual. Most commercially available browsers (e.g., Mosaic, Netscape) contain tools capable of displaying graphic or textual information. However, in order for this information to be displayed, it must at some point have been converted into HTML files. Yet there is a tremendous amount of legacy information in networks that could be made available to users if there were a means to access it without the owners or providers of the information having to convert it to HTML files.
Accordingly, what is needed in the art is a system and method for providing users with integrated access to large amounts of heterogeneous information without the end-user needing to know the type, format or location of the information and without burdening the owners or providers of the information with having to translate, relocate or re-format the information.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide users with integrated access to large amounts of heterogeneous information without the end-user needing to know the type, format or location of the information. It is a further object of the present invention to accomplish these goals without burdening the information owners with having to translate, relocate or reformat the information. These objectives are achieved and an advance in the art is made by our invention. Our invention is a system and methodology for integrating heterogeneous information in a distributed environment by encapsulating data about existing and new information in objects without converting, restructuring, or reformatting the information. The process of encapsulating the information requires extracting metadata from the information and creating, from the metadata, a database in which the metadata is grouped into objects and groups of objects are logically associated into collections. This database of objects and collections is instantiated into the runtime memory of a server, organized into repositories of objects and collections. A user seeking access to the information would then, using an HTTP-compliant browser, access the server to reach the information through the objects created and stored in the server. Our invention provides an integrated view of and access to diverse heterogeneous information. Our invention also provides tools for accessing, retrieving, browsing and administering the information.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates a system in accordance with one embodiment of our invention.
Figure 2 depicts a method for pre-processing the information units in accordance with one embodiment of our invention.
Figure 3 depicts a method for accessing heterogeneous information in accordance with one embodiment of our invention.
Figure 4(a) depicts the format of the metadata as used in the present embodiment of our invention.
Figure 4(b) depicts a table defining the metadata fields as used in the metadata format of the present embodiment of our invention.
Figure 5 depicts the format of one embodiment of an object identifier as used in our invention.
Figure 6 depicts a table that defines the attributes of an ihMeta object.
Figure 7 depicts a class inheritance diagram for the ihArtifact family of classes as defined for the present embodiment of our invention.
Figure 8 is a table of ihArtifact classes as used in the present embodiment of our invention.
Figure 9 is a table of ihArtifact sub-class definitions as used in the present embodiment of our invention.
Figure 10 is a table that defines the ihGraph class as used in the present embodiment of our invention.
Figure 11 illustrates the relationship between the run-time modules operating in a server in accordance with our invention.
Figure 12 depicts an example interaction diagram of the server operating in accordance with the present embodiment of our invention.
Figure 13 illustrates the ih_prep process and extractors and indexers in accordance with the present invention.
Figure 14 illustrates the process for conducting metadata context queries in accordance with the present embodiment of our invention.
Figure 15 illustrates the process for conducting information content queries in accordance with the present embodiment of our invention.
Figure 16 illustrates the process for invoking a server side browser in accordance with the present embodiment of our invention.
Figure 17 illustrates the process for invoking a client side browser in accordance with the present embodiment of our invention.
DETAILED DESCRIPTION
Described below is one preferred implementation of the present invention, which is illustrated in the accompanying drawings. This embodiment of our invention is described as it has been implemented in a product known as the InfoHarness™ software and system. This description of our invention is organized into six sections. First, we define terms that will be used throughout the specification. Second, we provide a high-level overview of our system and method. Third, we describe in detail the process for metabase preparation. In the fourth section, we describe the operation of a gateway, necessary in our embodiment, to connect the HTTP server to the InfoHarness server (it is not material to our invention that this gateway be a stand-alone process; in other embodiments it could be embedded within the server). In the fifth section, we describe the operation of the InfoHarness server, which operates in accordance with our invention. Finally, in the sixth section, we describe the interactions between the components of our inventive system. These descriptions are only exemplary of the invention. The present invention is not limited to the implementations described, but may be realized by other implementations.
InfoHarness is a trademark of Bell Communications Research Inc., the assignee of this patent.
A. DEFINITIONS
An information unit, or IU, is a piece of information that may be of interest to an end user. The most common kind of IU is a document stored in a single file. An IU can also represent a portion of a file (such as a single program function in the C language within a larger source code file, or a single email message in an email file, etc.), a grouping of many files, or other kinds of information.
Metabase is a file or database of metadata extracted from the information units and organized into InfoHarness Objects and collections.
Metadata is "data about data" -- it is data that describes various salient characteristics of some other data. For instance, metadata about this patent specification could include its filing date, the inventors' names, a keyword summary, etc.
An InfoHarness Object, or IHO, is an encapsulation of an information unit that is to be accessed using our inventive system. An IHO encapsulates metadata describing the salient characteristics of an IU.
A collection represents a set of IHOs. Collections are logical entities; that is, the information units encapsulated by the member IHOs do not have to be physically co-located in the same directory. Encapsulated files can be distributed on many systems of a network. Further, IHOs can be members of more than one collection. Collections can be nested (i.e., contain other collections). They can also be indexed or non-indexed (e.g., processed to permit content searches). Collections, thus, provide a logical view of physically distributed, heterogeneous information.
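The properties of a collection listed above (a logical set of IHOs, possibly nested, possibly indexed, with an IHO allowed in several collections) can be made concrete with a small Python sketch. The class and attribute names are hypothetical, and the sketch assumes that nesting is acyclic, which the text does not state.

```python
class Collection:
    """Logical grouping of IHOs, identified here by their object IDs."""

    def __init__(self, name, indexed=False):
        self.name = name
        self.indexed = indexed     # True if content searches are possible
        self.ihos = set()          # object IDs of directly contained IHOs
        self.subcollections = []   # nested Collection objects

    def all_ihos(self):
        """Return the IDs of every IHO reachable through nesting.

        Assumes the nesting is acyclic; a cycle would recurse forever.
        """
        found = set(self.ihos)
        for sub in self.subcollections:
            found |= sub.all_ihos()
        return found
```

Since membership is held as object IDs rather than file paths, the same IHO can appear in several collections without the underlying information unit being copied or moved.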
A repository is a set of collections. Its contents are accessed through an InfoHarness server operating in accordance with our invention.
A gateway is a component of the present embodiment of our invention that provides a means for connecting a Hypertext Transfer Protocol (hereinafter HTTP) server to an InfoHarness server according to the Common Gateway Interface (CGI) specification, which is well known by those who practice in the art.
An InfoHarness Server (IH server) is a server operating in accordance with our invention.
B. OVERVIEW
One embodiment of our invention is illustrated in Figure 1. Our inventive system 10 is interposed between a plurality of end users 12 and the heterogeneous information 14, composed of a plurality of IUs 16, to which they want access. In the embodiment described herein, the end users 12 use an HTTP-compliant browser 18 to connect to an HTTP server 20, which in turn connects to an IH server 22. Instantiated into memory within the IH server 22 is a repository 24 of IHOs 26 and collections 28. This repository 24 was created from a database 30 of IHOs and collections, itself created from metadata extracted from the IUs 16. Users 12 connected to the IH server 22 can then obtain IHO metadata or search collections, using any user-specified criteria, to retrieve the target information from the IUs 16.
Our inventive method is composed of two phases. Phase 1 is the registration phase, under which IUs are pre-processed to create IHOs, collections and repositories. Phase 2 is the information access phase, wherein end-users access the IH server through the HTTP server and use the IHOs, created in the registration phase and loaded in the memory of the IH server, to locate and access the IUs. Figure 2 illustrates the methodology embodied within the registration phase. An information provider or InfoHarness administrator having information which the provider desires to make accessible to users would invoke an InfoHarness registration procedure (software) to register the information units 30. Upon invoking the InfoHarness registration procedure, the administrator would first invoke a pre-processor 32 to prepare the information for the extraction process. The next step involves the administrator invoking one of a plurality of extraction processes 33 to extract metadata from the information units that are to be registered (the appropriate extractor process depends on the type of information the administrator is registering). The output of the extraction process is a metabase (which is a file or database) of IHOs 34. This metabase contains metadata of the information units logically grouped into a collection, and also information about IHOs and collections and the relationships among them. Upon the creation of the metabase, the registration phase is complete.
Figure 3 illustrates the methodology for accessing the information in accordance with our invention. First the IH server must be initialized; then the IHOs are loaded from the metabase into the server's memory and organized into repositories 36. After the server is initialized and running, the IH server enters a main event loop and waits for requests from clients 38. End-users then access the IH server through an HTTP server 40. Once the end-users access the IH server, they perform one of three actions to select an object 42: (1) a metadata-based query, (2) a content-based query, or (3) explicit navigation around the IHOs. Once an object is selected, it can be accessed and browsed by activating either a client side browser 44 or server side browser 46. The user may also operate on the object, choosing from a set of procedures such as print, store, fax, etc.
C. REGISTRATION PROCESS
As described above, the registration process involves an owner, creator, or provider of information working with a system administrator to pre-process the information for the purpose of extracting metadata in a format usable by our IH server. Registration is accomplished in four steps: pre-processing the physical data, extracting the metadata, storing the metadata in a metabase, and transferring the extracted metadata from the metabase to the IH server.
The main function of the pre-processing is to process the physical data and build logical structures (IHOs and collections) which the IH server can later use for presentation to end users. Physical data, in this sense, includes formatted, unformatted, structured, and unstructured data. It could also be dynamic, e.g., SQL queries or newsfeeds. Metadata, which is extracted in accordance with the methods described herein, can be content-dependent, content-descriptive, or content-independent. Content-dependent metadata is based strictly on the contents of the physical data. Examples of content-dependent metadata are keyword indices for textual data, grids for image data, speaker change lists for audio data, etc. Content-descriptive metadata, as the name suggests, describes the physical contents of the data. Examples include spatial information for video data, the subject of a talk for audio data, document composition for multimedia data, etc. Content-independent metadata, on the other hand, does not rely on the specific content of the underlying data for its values. Media type, document history and location, temporal information for video and audio, etc., are examples of content-independent metadata.
In accordance with this embodiment of our invention, all data to be registered with the system must be accessible as a file on a mounted file system. This typically means that the data must all be on the same LAN, although it may be εtored on multiple file servers if those servers are directly accessible, such as via an network file εerver (NFS) mount.
Related data, although physically scattered, is usually grouped together in some logical structure superimposed on the underlying physical data. A directory structure is one example of a logical structure which usually exists for most file systems. Also, the system administrator working with the InfoHarness application builder may determine that users might be interested in relationships among collections and IHOs. As and example, a parent-child relationship between the two collections could be imposed. Other relationships that could be modeled are: 'contains, ' 'is contained in, ' and 'part-of. '
The end result of pre-processing and metadata extraction is the creation of a metabase (which may be a file or database) containing metadata, IHOs and collections. As described above, when this information is loaded from the metabase into an IH server, it is materialized as IHOs and collections in the server's memory organized into repositories. In our invention, we use different extractor processes for various document data types. These extractor processes are easily created using skills well known in the art. As an example, we use extractors for Text, PostScript, HTML, man pages, and e-mail message files.
An IHO encapsulates a single IU. A collection does not encapsulate any IU; rather, it is a set of other IHOs or collections. An IHO, encapsulating an IU, would thus have a unique identifier by which to distinguish itself from other IHOs.
A collection is a set of IHOs, related together at the discretion of the system administrator of the InfoHarness application builder. Physically, in the embodiment described herein, a collection is represented by a number of Unix files in a common subdirectory, whose name is the name of the collection.
This collection directory contains several important files:
IH_SUMMARY file -- This file contains some meta-information about the collection itself, such as where the index is located (if any), the name of the collection metadata file, etc.
Metadata file -- A file that contains the metadata extracted during the registration process.
Index -- Depending upon the indexing scheme used, one or more index files may be present.
Our metadata extraction processes are summarized by the following pseudocode:
1. Validate user-supplied options for the extraction process.
2. For each directory to be scanned:
      For each file eligible for extraction:
         Invoke extractor
         Process returned metadata
         Collect extracted text
3. Index extracted text if requested.
Pre-processing consists of extracting metadata from physical information sources, creating representations for IHOs, collections and relationships, and optionally creating an index on textual contents of the sources. In the current embodiment, the physical information sources should exist on the same file system as the pre-processor and indexer. The metadata is also stored on this file system at the location specified by the administrator.
The pre-processor uses extractor methods for the extraction of metadata from the physical information sources. These are type-specific methods which process the information sources and return metadata in a specific format. In our embodiment, the pre-processor does not analyze the source type to invoke an extractor; instead, the system administrator of our IH server is expected to indicate a particular extractor which will then be used for metadata extraction. The pre-processor treats all the IHOs generated as constituents of a collection. A user-specified location is used to store the metadata files created. The user has the option to append newly generated metadata to an existing collection. The user can also indicate whether this generated collection should have a text index built for it, and if so, which indexing technology to use for this purpose. The indexing technology itself is not part of the present invention. However, the architecture of the present invention allows a variety of indexing technologies to be used in a plug-and-play fashion. Example indexing technologies are WAIS and GLIMPSE. If an index is generated, it is installed in the same directory as the metadata files. A cross-reference file is also generated which maps the index database objects to the IHOs. If indexing is not performed, the generated collection is treated as a set. A typical extractor takes as input the location of the information source which is to be encapsulated. It returns a formatted string which the pre-processor interprets to generate metadata entries that are stored in a metadata file. The metadata file itself has a well-defined format, described in more detail below. The extractor also extracts the text associated with the generated IHOs. To extract text from a 'C' file, for instance, the C file has to be parsed to recognize comments and function signatures, because indexing the language constructs and variable names does not usually make sense.
In this case an IU would be associated with either a function or the file as a whole. Representative information is also extracted and associated with the IU. This will be displayed to the user at browse time; e.g., for mail messages the subject line is used as a representative, for HTML documents the contents of the TITLE construct are used, etc.
Metadata is passed from the extractor in a format called the metadata transfer format. This format (a Perl data structure) has constructs which allow arbitrary graph structures to be imposed on top of the IHOs (e.g., parent-child relationships between collections). The object's type and subtype are associated with the IUs and are both determined by the extractor process. Finally, the location attribute (i.e., a value used to locate the IU in the file system) is also determined by the extractor. This could be the full path for a UNIX file for cases where the IU is associated with the whole file. It could also be a Uniform Resource Locator (URL) (as it is understood on the World Wide Web) or some other locator. URLs are used for HTML documents. The location of a 'C' function, on the other hand, could be specified as 'filename%function_name'. There is no requirement on the precise format of this locator, as long as the browsing methods can decipher it to retrieve the original data associated with that IU. Besides these, any number of attribute-value pairs can be associated with the IU; e.g., the 'name' attribute associated with an IU will contain the representative information extracted by the extractor. For IHOs which do not contain an IU, any arbitrary text could be assigned to this attribute; e.g., for a collection IHO the name of the collection can be assigned, and this will be displayed to the user.
Metadata is transferred between the extractors and the pre-processor as a structured Perl string, whose format is shown as 48 in Figure 4(a). Each IU has six fields of metadata associated with it (e.g., f11 through f16), each separated by a colon, and each IU's metadata is separated from the next IU's by a vertical bar 52. Figure 4(b) depicts a table 54 that summarizes the purpose of each field.
The location field 55 is created by the extractor process to identify where the IU is stored. The Unique ObjId Indicator field 56 instructs the pre-processor whether to use the Location to construct a unique object identifier. For some cases the extractor-supplied locator is guaranteed to be unique, so that the pre-processor need not manipulate it. One such case is IUs associated with HTML files, for which URLs are generated by the extractor as IU locations. These URLs are unique. If this flag is set, the pre-processor constructs a unique identifier for the object.
The ordinal value of the Depth field 58 indicates the depth of that IU in an in-order traversal of the desired repository structure. The collection object, which is the root of this tree, is pre-assigned a depth of 0. An extractor returning a simple list of file IUs that are to be a part of this collection would assign a depth of 1 to each of these file IUs. The pre-processor then makes all of these file IUs children of the collection object. An example of the structure in the metadata transfer format is shown in Fig. 5.
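By way of illustration only, the way a sequence of depth values can be turned into a tree may be sketched in Python as follows. The sketch assumes that each entry becomes a child of the most recent entry whose depth is one less; the function name build_tree and the sample IU names are hypothetical and are not taken from the embodiment.

```python
def build_tree(root, entries):
    """Turn (depth, name) pairs, given in stream order, into a parent -> children map.

    The root collection is pre-assigned depth 0, so every entry must have a
    depth of at least 1 and may be at most one level deeper than the entry
    that precedes it.
    """
    children = {root: []}
    stack = [root]  # stack[d] holds the most recent node seen at depth d
    for depth, name in entries:
        if depth < 1 or depth > len(stack):
            raise ValueError("depth %d is not reachable for %r" % (depth, name))
        del stack[depth:]              # drop nodes that are at this depth or deeper
        parent = stack[depth - 1]
        children[parent].append(name)
        children.setdefault(name, [])  # a repeated IU keeps its existing children
        stack.append(name)
    return children


# Hypothetical stream: two C files, the first containing two functions.
tree = build_tree("collection", [(1, "a.c"), (2, "main"), (2, "helper"), (1, "b.c")])
```

A simple list of file IUs, each at depth 1, yields a tree in which every file is a direct child of the collection.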
The Subtype field 60 is determined by the extractor and is used later by the IH server to determine how to access the actual IU. The Subject field 62 contains summary information related to an IU and is what the user will see as the "name" of the object at the time of browsing. The last field 64 is the text body of the IU, to be used if the collection is being indexed.
The sequence of the entries in this metadata transfer format stream 48, along with the value of each entry's depth field 58, determines the entry's position in the collection structure built by the pre-processor. An IU may be represented multiple times in this stream, possibly to assert a relationship with other IUs, but a metadata entry is made for only the first occurrence. An empty text field indicates that the IU need not be cross-referenced for indexing.
Since the colon (':') and vertical bar ('|') characters are used as delimiters for the fields and IU entries, respectively, they need to be "escaped" with a backslash ('\') if they occur anywhere within the content of any of the fields. After the pre-processor has parsed the stream returned by the extractors, it stores the object representations into a single metadata file. These metadata entities will be read in by the IH server when it is brought up and instantiated as in-memory IHO representations. There is a fixed structure to the entries appearing in the metadata files.
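By way of illustration only, a parser for such a delimited stream may be sketched in Python as follows. The field order used here (location, unique-identifier indicator, depth, subtype, subject, text) is an assumption inferred from the reference numerals, and the function names and sample values are hypothetical.

```python
FIELD_NAMES = ("location", "unique_flag", "depth", "subtype", "subject", "text")  # assumed order


def split_unescaped(text, delimiter):
    """Split on a delimiter, ignoring occurrences preceded by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair for a later pass
            i += 2
        elif text[i] == delimiter:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(field):
    """Remove the backslash from every escaped character."""
    out, i = [], 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            out.append(field[i + 1])
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)


def parse_stream(stream):
    """Return one dict per IU entry; each entry must carry exactly six fields."""
    entries = []
    for record in split_unescaped(stream, "|"):
        if not record:
            continue
        fields = [unescape(f) for f in split_unescaped(record, ":")]
        if len(fields) != len(FIELD_NAMES):
            raise ValueError("expected 6 fields, got %d" % len(fields))
        entries.append(dict(zip(FIELD_NAMES, fields)))
    return entries
```

Records are split before fields, and unescaping is deferred until both splits are done, so that an escaped vertical bar inside a field is never mistaken for an entry boundary.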
There are two kinds of entries in the metadata file: object entries and relationship entries. Object entries are flat representations of the IHOs, whereas relationship entries represent parent-child relationships between IHOs. Object entries have an object identifier. This object identifier could be constructed by the pre-processor or the extractor, as specified by the indicator in the metadata transfer format. If the pre-processor constructs the object identifier, it does so in a specific format. The format is: machineid:location:subtype
The machineid is a unique physical machine identifier of the machine on which the pre-processor is run. This field is automatically generated by the pre-processor. The location and subtype field values are assigned based on the values returned in the metadata transfer stream. The location field, for a simple or composite IHO, would be the location of the associated IU. For a collection IHO this would be the location of the collection; e.g., for an indexed collection it would be the location of the index. The subtype field value is the same as the subtype value returned in the metadata transfer stream. For a collection IHO this is the index type; i.e., wais or glimpse.
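By way of illustration only, the construction of an object identifier may be sketched in Python as follows. The function name, the flag parameter, and the sample machine identifier are hypothetical; the sketch assumes that when the indicator is not set, the extractor-supplied location is used unchanged as the identifier.

```python
def make_object_id(machine_id, location, subtype, construct_flag):
    """Build an object identifier of the form machineid:location:subtype.

    construct_flag mirrors the indicator in the metadata transfer format: when
    it is set, the pre-processor builds the identifier; otherwise the location
    supplied by the extractor (e.g., a URL) is assumed to be unique already.
    """
    if construct_flag:
        return "%s:%s:%s" % (machine_id, location, subtype)
    return location


# Hypothetical values for a text file and for an HTML document.
file_id = make_object_id("m1", "/data/docs/a.txt", "text", True)
html_id = make_object_id("m1", "http://host/a.html", "html", False)
```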
An object entry is of the form shown in Figure 5. The first field 70 serves as the object identifier. This object identifier is used for uniquely identifying the object and serves as a key. The type 71 and subtype 72 values correspond to non-terminal and terminal classes in the server abstract class hierarchy. The location value 73 is used by the browser methods to retrieve the data associated with the IU encapsulated by this object. Following this there could be an arbitrary number of attribute-value pairs 74. The 'name=string' pair is used when the user is browsing the repository. The string is displayed to the user.
A relationship entry is of the form: [objid1 | objid2]
This establishes a parent-child relationship between the objects represented by objid1 and objid2, with the former being treated as a parent of the latter. There are no constraints on the order in which entries appear in the metadata file, except that the object entry has to appear before its object identifier can take part in a relationship.
D. GATEWAY PROCESS
In our illustrative embodiment, the HTTP server is connected to the IH server through a gateway. This gateway interacts with two types of programs: an HTTP server, which in turn interacts with an HTTP browser (e.g., Mosaic or Netscape), and the IH servers. Any HTTP-compliant browser can interact with the HTTP server. There are five actions exported by the gateway to the HTTP browser. They are: Setup, Init, Expand, Query, and Show. There are four actions exported by the IH server that the gateway uses. They are: Init, Expand, Query, and Show.
By design, the HTTP protocol is stateless (see http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html for information). This implies that interactions between an HTTP browser and any HTTP server are stateless. No information about clients is kept by the HTTP server between connections. This is contrary to the needs of many applications, including our gateway. To understand why this is so, consider the information necessary for a user to issue a content-based query against a collection. The user must specify: the machine where the IH server they wish to interact with is running, the port number the IH server is using to accept connections, their X display value, the query text that should be used to select objects from the collection, the maximum number of hits to return on a successful query, and the collection against which the query will be executed. One approach to gathering this information would be to force the user to specify all necessary parameters by hand on each interaction with the gateway. However, that would clearly not be a very user-friendly approach. Instead, our design is such that the user only needs to enter certain information once, on a "setup" screen. All screens that are presented to the user after the setup screen have "state" information embedded into the URLs, so that if the user activates the URL link, the embedded state information can be extracted from it. One side effect of this is that, since some of the HTML pages created by the gateway have many URLs, and each of these URLs contains all of the information necessary to maintain the state of the user's interactions, there is a large amount of duplicated information in the URLs on a single page.
This arrangement causes the gateway to spend time performing two tasks: retrieving information from incoming URLs, and reformatting the output of the IH server into URLs (and HTML) .
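By way of illustration only, the two tasks of embedding state into outgoing URLs and recovering it from incoming URLs may be sketched in Python as follows. The parameter names (host, port, display, action, objid), the gateway path, and the sample values are hypothetical, and the use of a query string is an assumption; the embodiment does not specify how the state is laid out within the URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

GATEWAY_PATH = "/cgi-bin/nph-ih.cgi"  # assumed location of the gateway program


def make_link(state, action, **extra):
    """Embed the user's setup state, the action, and any extra values in a URL."""
    params = dict(state, action=action, **extra)
    return GATEWAY_PATH + "?" + urlencode(sorted(params.items()))


def read_link(url):
    """Recover the embedded state from a URL produced by make_link."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


# Values entered once on the setup screen travel with every generated link.
state = {"host": "ih.example.com", "port": "8001", "display": "pc1:0.0"}
link = make_link(state, "expand", objid="m1:/data/docs:wais")
recovered = read_link(link)
```

Because every link carries the full state, a page listing many objects repeats the same host, port and display values in each URL, which is the duplication noted above.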
Given the fact that interaction between the HTTP browser and the HTTP server is stateless, it does not necessarily make sense to talk about a correct sequence of calls to the HTTP server. As long as the HTTP browser passes valid requests to the gateway, the requests will be processed without regard to order. However, in order to develop a basic understanding of how the HTTP browser and the HTTP server interact, consider the following sequence of events, which many users will find typical.
The HTTP browser opens a URL pointing to the gateway (e.g., http://http.ctt.bellcore.com/cgi-bin/nph-ih.cgi). The HTTP server responds by returning the setup screen to the HTTP browser. The user determines the IH server to connect to and enters the correct information on the fill-out form on the setup screen. Once the form is submitted, the gateway connects to the specified IH server and requests a list of collections managed by the IH server. For each item in the list returned by the IH server, the gateway generates a URL containing all the necessary information required to access this collection on the next interaction, and returns the list to the HTTP server, which in turn passes it to the requesting HTTP browser. The user can then select one of the collections returned by the gateway for further interrogation. If the collection is indexed, the gateway presents a form to the user for entering the search text. If the collection is not indexed, the gateway connects to the appropriate IH server (as specified in the URL) and requests the contents of the list. The list contents are then formatted appropriately in HTML by the gateway, and URLs are generated for each item in the list.
If the HTTP browser receives a fill-out form, a search can be initiated. If the user submits a query, the gateway sends that request to the IH server. The IH server response is similar to the results returned when the members of a list are requested, and again, the gateway formats the results into a list with their corresponding URLs. In either the search results list or the simple list, the user can select any of the items in the list. If the user selects an item (i.e., clicks on the link), this translates to saying "show me this item." The gateway contacts the appropriate IH server (again determined by the state information embedded within the URL) and requests the particular item. If the item has been designated as displayable by the IH server, the IH server retrieves the item and uses X to display the item back to the user. If the item has been designated as displayable by the HTTP browser, the IH server retrieves the item and sends it back to the gateway. The gateway determines (based upon the type of data returned) what Multipurpose Internet Mail Extensions (MIME) type the item corresponds to and returns the appropriate header information as well as the actual data to the HTTP browser.
Although IH users will find the steps outlined in the previous paragraph familiar, it is important to remember that these steps can occur in any sequence as long as the appropriate information is passed to the gateway. Again, the reason for this is the stateless nature of HTTP. Some users may wish to exploit this feature. A user may wish to construct several "canned" queries against a particular IH server. The URLs representing these queries can be embedded in other HTML documents providing more descriptive text regarding the queries or their intended results. Another user may want to provide access to individual objects held by the IH server. They may construct URLs that point directly to the objects (even objects that are members of an indexed collection) and circumvent the need for search queries to retrieve the objects.
The processing that occurs at the gateway is relatively straightforward. When an IH server generated link is activated by the user (e.g., the user clicks on an object on the query results screen), the gateway examines the URL that was activated. All such URLs are unescaped and validated. Unescaping a URL consists of replacing all sequences of the form %XX (where each X is a valid hexadecimal digit) with their corresponding ASCII character. Validating a URL consists of extracting the information contained in the URL (i.e., IH server address, port, query text, etc.) and checking that the values are within certain constraints (e.g., the address is a valid TCP/IP address, the port number is non-negative, etc.). After validation, the gateway identifies the action being requested by the user and performs the specified action. For some actions (e.g., query, expand, show) the IH server is contacted for the desired information. For others, the gateway can handle the request itself. In cases where interaction with the IH server is necessary, the gateway determines the response type for the IH server and performs the necessary reformatting of any returned data. The gateway converts the response into an HTTP-compliant message and ships it back to the HTTP browser.
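By way of illustration only, the unescaping and validation steps may be sketched in Python as follows. The parameter names and the error messages are hypothetical, and the sketch assumes that the server address is given in numeric form and that a port may not exceed 65535; the embodiment states only that the address must be valid and the port non-negative.

```python
import ipaddress
import re

ACTIONS = {"setup", "init", "expand", "query", "show"}


def unescape_url(text):
    """Replace every %XX sequence with the character whose code is hex XX."""
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


def validate(params):
    """Return a list of problems found in the extracted values (empty if none)."""
    errors = []
    try:
        ipaddress.ip_address(params.get("host", ""))
    except ValueError:
        errors.append("host is not a valid IP address")
    port = params.get("port", "")
    if not re.fullmatch(r"[0-9]{1,5}", port) or int(port) > 65535:
        errors.append("port is not in the range 0-65535")
    if params.get("action") not in ACTIONS:
        errors.append("unknown action")
    return errors


# Hypothetical request values after extraction from an activated URL.
problems = validate({"host": "127.0.0.1", "port": "8001", "action": "show"})
```

Only after the list of problems comes back empty would the gateway go on to identify and perform the requested action.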
The gateway supports a number of different "actions" that an HTTP browser can request. Each of these actions is described below.
A "setup" request presents the user with the initial IH server setup screen. This screen is used to set default values used in other interactions with the gateway. This action is normally the first action in a set of interactions between the user and the gateway.
The "init" request determines the host name of the IH server, the port where the server is accepting requests, and the DISPLAY value of the user's machine. Default values for these variables are maintained in the gateway and are presented to the user. The end user may alter any of these values from the setup screen. The values submitted by the user are then maintained across invocations of the gateway by adding them to all URLs created by the gateway and returned to the user. Once the user has specified these values and has submitted the request to the gateway, they are presented with the list of collections that the IH server they specified can access.
The "expand" request expands collections. Expanding a collection has a different meaning for different types of collections. For indexed (i.e., searchable) collections, expand provides a form-based interface for specifying search arguments for the collection. For all other collections, expand causes a request to be sent to the IH server asking for a particular IH collection (specified by an object ID). The results of this request are formatted in HTML for display back to the HTTP browser. The HTML will not include a URL to the parent collection when the object's type is LIST; otherwise, a URL to the parent will be included in the HTML.
A "query" request performs a query on an indexed collection. The query text is passed to the IH server, and if the collection contains any information units that satisfy the search criteria, the IH server returns a list of the IHO IDs corresponding to the information units. If no matching information units were found, the IH server returns a message stating that no matches were found.
The "show" request provides the user with a capability to view a particular object. The object ID of the desired object and the HTTP browser's DISPLAY value are passed to the IH server. The IH server will either return the desired object to the gateway (which then passes the object back to the HTTP browser), or it will start a process to display the object back to the HTTP browser.
E. DESCRIPTION OF THE IH SERVER
The IH Server is key to our inventive system and provides the end-users with access to a set of IH Objects (IHOs) that make up that server's repository. Upon start-up, the server is told what collections will make up that server's repository. For each collection specified, the server locates, reads, and parses the collection's metadata file, constructing an internal (in-memory) representation of the IHOs and their relationships. Each IHO in memory is an instance of an "artifact" C++ subclass; the particular subclass depends upon the type of the IHO and determines how the object will handle incoming HTTP browser requests. Once it has read the metadata, the server goes into an event loop where it waits for incoming requests from the Gateway, processes those requests, and sends back appropriate responses.
The following sections describe the processing performed by the server in more detail.
The IH server is initialized either manually by an administrator or automatically during a machine's boot cycle. The server is told which collections will make up its repository through various command-line arguments. For each collection, an ihMeta object is constructed to read and parse the metadata for that collection (see table 75 in Figure 6). Each collection is stored in its own subdirectory and contains a file called IH_SUMMARY that contains meta-information about the collection. The server uses that meta-information to determine specifically which IHO metadata files to read.
Each metadata file contains entities describing encapsulated IHOs and their inter-relationships. The ihMeta object parses each entity one at a time. An entity can be either an IHO or a relationship. For each IHO entity, a new ihArtifact C++ object is constructed. The object is actually an instance of one of the concrete classes derived from ihArtifact. The particular concrete class generated depends on the IHO's type attribute; each artifact subclass defines specific behavior for various requests against that type of object. The type thus determines how the artifact will respond to end-user actions on the object. Once the object has been created, it is added to a global object table for future reference, using the ObjectId as the key.
Relationship entities designate parent-child associations between two objects. When a relationship is read from the metadata file, the server looks up both "ends" of the relationship in a global object table and establishes a bidirectional reference between the parent and child artifacts (i.e., the child is added to the parent's set of children and the parent is added to the child's set of parents).
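By way of illustration only, the object table and the bidirectional parent-child references may be sketched in Python as follows. The class names Artifact and Graph are hypothetical stand-ins and are not the C++ classes of the embodiment; the sketch also assumes that object identifiers contain no vertical bar.

```python
class Artifact:
    """Minimal stand-in for an in-memory IHO."""

    def __init__(self, objid):
        self.objid = objid
        self.parents = []
        self.children = []


class Graph:
    """Object table keyed by object ID, plus two-way parent-child links."""

    def __init__(self):
        self.table = {}

    def add_object(self, objid):
        self.table[objid] = Artifact(objid)

    def add_relationship(self, entry):
        """Apply a relationship entry of the form '[parent_id | child_id]'."""
        text = entry.strip()
        if not (text.startswith("[") and text.endswith("]")):
            raise ValueError("malformed relationship entry: %r" % entry)
        parent_id, separator, child_id = text[1:-1].partition("|")
        if not separator:
            raise ValueError("malformed relationship entry: %r" % entry)
        # A KeyError here means the object entry has not appeared yet.
        parent = self.table[parent_id.strip()]
        child = self.table[child_id.strip()]
        parent.children.append(child)
        child.parents.append(parent)


graph = Graph()
graph.add_object("m1:/docs:wais")
graph.add_object("m1:/docs/a.txt:text")
graph.add_relationship("[m1:/docs:wais | m1:/docs/a.txt:text]")
```

Looking up both ends in the table before linking them is what enforces the rule that an object entry must precede any relationship that names it.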
While parsing metadata, if the ihMeta object detects malformed entities, it reports appropriate error messages to the administrator. If too many errors are found, the server aborts before reaching the event loop. Once the server has successfully read in all of its collections, it goes into the main event loop and waits for requests from clients.
The IH server runtime object model is based upon a class hierarchy of abstract and concrete C++ classes. Every IH Object has both a type and a subtype. The type defines which concrete class will represent the IHO in the server's internal representation of the object and how, in general, the object will respond to user actions. The subtype determines how those general actions on the object will actually be implemented (for instance, server-side PostScript objects (type MM, subtype postscript) get displayed by running Ghostview, while server-side FrameMaker objects (type MM, subtype frame) get displayed by running FrameMaker software). The types and subtypes of the objects are determined by the extractors during collection preparation.
Figure 7 shows a class inheritance diagram for the ihArtifact family of classes. ihArtifact is an abstract class that defines the interface to all IH Objects in the system. As an example, the ihArtFile objects 82 and the ihArtSet objects 84 inherit their attributes from the ihArtifact abstract class 80.
Figure 8 depicts a table that defines the abstract interface to artifact objects. Figure 9 depicts a table containing descriptions of how each of the subclasses implements those methods described in Figure 8.
Each metadata entity in a repository is represented at runtime by an instance of a class in the ihArtifact hierarchy. These artifacts are maintained via two mechanisms: (1) an object table that maps object IDs to artifacts, and (2) a graph, linking objects by two-way parent-child relationships. As the metadata entries are read from files and instantiated as artifacts, they are added to the object table. This table is stored in an instance of the ihGraph class (see Figure 10) called "graph". Figure 11 shows an example of the primary object relationships in the server at runtime.
Once the server has finished loading all of the metadata from the repository's collections, the server enters the main event loop. The main loop is responsible for reading and processing requests. In pseudocode:
Do forever:
   Wait for an incoming connection from a client
   Spawn a new process to handle the request(s)
   For each incoming request (normally only one):
      Read the request
      Process the request
      Return the response to the client
   Close connection and exit child process
The server processes each incoming request as it is received from the HTTP browser. The server contains a global instance of the class ihIpc called "server" that handles the inter-process communications. The main event loop asks the "server" object to read the next request; once read, the request is passed on to the metadata graph object for processing. The graph parses the request to determine the object ID of the object being acted on as well as the action to take on it. The graph looks up the artifact in its object mapping table, invokes the appropriate method on that artifact, and captures the results. The results are then returned back to the HTTP browser. Figure 12 shows an example of this behavior in an object interaction diagram. The main event loop 100 tells the server object 101 to read a request and tells the graph to process 102 the request. The graph invokes the appropriate method on the artifact (in this case, activate 103), which may in turn run a browser script 104 to actually retrieve the desired data. The results are returned to the gateway by the server object.
Each object type in the IH server responds to user interactions in its own way. Sometimes this functionality is coded directly in C++ in the IH server; other times the functionality is dependent upon "helper" programs called "browser-scripts." A browser-script defines type/subtype-specific mechanisms for accessing an object.
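By way of illustration only, the lookup-and-dispatch step performed by the graph may be sketched in Python as follows. The request layout ("ACTION objid"), the class names, the error strings and the returned text are all hypothetical; the embodiment uses C++ classes and an internal protocol whose wire format is not given here.

```python
class Artifact:
    """Abstract interface: each concrete type answers actions in its own way."""

    def __init__(self, objid, name):
        self.objid = objid
        self.name = name

    def expand(self):
        raise NotImplementedError

    def show(self):
        raise NotImplementedError


class FileArtifact(Artifact):
    def expand(self):
        return self.name

    def show(self):
        # A real server might run a type-specific browser-script here.
        return "contents of " + self.name


class SetArtifact(Artifact):
    def __init__(self, objid, name, children=()):
        super().__init__(objid, name)
        self.children = list(children)

    def expand(self):
        return "\n".join(child.name for child in self.children)

    def show(self):
        return self.expand()


class Graph:
    ACTIONS = ("expand", "show")

    def __init__(self):
        self.table = {}

    def add(self, artifact):
        self.table[artifact.objid] = artifact

    def process(self, request):
        """Parse 'ACTION objid', look the artifact up, and invoke the method."""
        action, _, objid = request.partition(" ")
        if action not in self.ACTIONS:
            return "ERROR unknown action"
        artifact = self.table.get(objid)
        if artifact is None:
            return "ERROR unknown object"
        return getattr(artifact, action)()
```

The graph itself knows nothing about how an object is displayed; that decision rests entirely with the concrete class selected by the object's type.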
The input to a browser-script is a location parameter that identifies the object to be viewed. The responsibility of the browser-script is to display this object to the user; how this is achieved depends upon the kind of data contained in the object and how that data is to be shown to the user. For example, the browser-script for PostScript documents is invoked when the user wants to display a document whose type is MM ("server-side" multimedia) and whose subtype is ps. The PostScript browser-script takes the name of a PostScript document and executes a viewer program (i.e., ghostview) to display that document. The C browser-script is passed the name of a C file and the name of a function within that file; the script extracts the specified function and sends that text back to the invoking program (the server).
There are two implementation details that are not central to our invention but which are important to highlight in this embodiment: (1) the encapsulate method for executing system commands; and (2) the ihBlockMgr class for capturing large output.
There are several instances where the IH server needs to execute a UNIX program (such as a Perl script) and capture its output. For example, the server runs Perl programs called "browser-scripts;" these scripts display the contents of an object to the user in a type- and subtype-specific manner. Additionally, when the server queries an index, it needs to run an indexer-specific Perl program, which in turn executes a search program and formats the responses. The stand-alone function "encapsulate" is used for both of these tasks. Encapsulate forks a new child process and establishes the equivalent of a pipe between the parent and child processes: the child's standard error and output are redirected back to the parent, which then reads that output. The output from the child is collected in a dynamically sized buffer (see the Block Manager, below); the buffer can then be sent back to the HTTP browser if necessary.
The GNU String class is not sufficient by itself as a data structure for storing arbitrarily long byte streams because it is restricted to containing a maximum of about 32,768 bytes. Therefore, a more sophisticated mechanism is required for capturing the output of browser scripts or for reading in arbitrarily large files. The ihBlockMgr class serves this purpose. This class maintains a sequence of zero or more "blocks," or buffers, of data. Each block can hold up to a fixed number of bytes. As data is being captured by the encapsulate function or read in from a file, it is written into the last block in the block manager's sequence. When the current block fills up, a new block is added to the sequence. Thus, the block manager is an efficient way to hold a dynamically growing stream of bytes. In addition to providing mechanisms to add data to the block manager (which is instantiated once globally), ihBlockMgr includes methods for iterating through the blocks one at a time and for clearing out the manager's contents.
E. SUBSYSTEM INTERACTION
Within our pre-processing methodology we define a process "in_prep", which is a Perl script used to extract metadata. In_prep cooperates with two other types of programs: extractors and indexers. Extractors are type-specific Perl subroutines required by in_prep to traverse physical data and extract the necessary information required for metadata and indexes. A separate extractor is needed for each type of data placed under control of an IH server. Indexers can be implemented using any language desirable. The only limitation imposed is that the in_prep process must be able to access the indexer via the Perl "system()" function. Indexers are not type specific, since they can be applied to any text data. Indexers are used to provide content-oriented queries over physical data. Figure 13 illustrates the interactions that take place between in_prep, extractors, and indexers. For each invocation of in_prep 111, an extractor is called to process each member of the desired information units. The in_prep process passes the location of the physical data (usually a file name) to the extractor 112. The extractor in turn processes the physical data (referred to as an information unit, IU) and extracts metadata as well as text to be indexed from the IU, and if there is more than one IHO in the IU, the extractor also establishes relationships between the objects.
The objects and relationships created by the extractor 112 are returned to in_prep 111, which writes them to the metabase for later use by the IH server.
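The extraction half of this flow can be sketched as follows. This is a minimal illustration in Python rather than the Perl the text describes, and every name in it (`IHObject`, `Metabase`, `EXTRACTORS`, the two extractor functions, the `text` and `mailbox` type labels) is an assumption made for the example. The extractors are stubs that do not read any file, and the indexing step is left out.

```python
from dataclasses import dataclass, field


@dataclass
class IHObject:
    """Metadata record for one object (fields are illustrative only)."""
    name: str
    iu_type: str
    location: str


@dataclass
class Metabase:
    """In-memory stand-in for the metabase that in_prep writes to."""
    objects: list = field(default_factory=list)
    relationships: list = field(default_factory=list)  # (parent, child) names


def extract_text_file(location):
    """Hypothetical extractor: one file is one IU holding a single object."""
    return [IHObject(location, "text", location)], []


def extract_mail_folder(location):
    """Hypothetical extractor: one IU holding a folder and two messages."""
    folder = IHObject(location, "mailbox", location)
    messages = [IHObject(f"{location}#{i}", "message", location) for i in range(2)]
    relations = [(folder.name, m.name) for m in messages]
    return [folder] + messages, relations


# One extractor per data type, as the text requires.
EXTRACTORS = {"text": extract_text_file, "mailbox": extract_mail_folder}


def in_prep(units, metabase):
    """For each (type, location) unit, run its extractor and store the results."""
    for iu_type, location in units:
        objects, relations = EXTRACTORS[iu_type](location)
        metabase.objects.extend(objects)
        metabase.relationships.extend(relations)
    return metabase
```

Calling `in_prep([("text", "a.txt"), ("mailbox", "inbox")], Metabase())` yields four objects and two folder-to-message relationships. Keeping the extractors in a lookup table keyed by type mirrors the requirement that each new data type needs only a new extractor.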
In_prep 111 invokes the appropriate indexer to index 113 the text data extracted from the IU. The output of the indexer is saved in the metabase for later use by the IH server. The metadata entries produced by in_prep and stored in the metabase are loaded into memory by the IH server at run time. The IH server then enters a loop where it responds to incoming requests from HTTP browsers. Referring back to Fig. 3, after the server is initialized and running, the IH server enters a main event loop and waits for requests from clients 38. End-users then access the IH server through an HTTP server 40. Once the end-users access the IH server, they perform one of three actions to select an object 42: (1) a metadata-based query, (2) a content-based query, or (3) explicit navigation among the IHOs. Once an object is selected, it can be accessed and browsed by activating either a client side browser 44 or server side browser 46. The user may also operate on the object, choosing from a set of procedures such as print, store, fax, etc.
Figure 14 illustrates the processing of a request by an end-user for conducting a metadata query. A client requests 121, via HTTP, the initial collection held by an IH server. The request is passed, via the CGI 122, to the gateway. The gateway connects to the IH server and requests 123 the initial collection via an internal protocol. The IH server determines the initial collection based upon its in-memory metadata and returns the results to the gateway 124. The gateway reformats the response into HTML and sends 125 its response to the HTTP server. The HTTP server passes 126 the results back to the HTTP browser client without interruption since our gateway is a "no parse header" gateway. This means that the HTTP server will do no parsing of our response, and the gateway must be able to form correct HTTP responses.
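Because a "no parse header" gateway has to emit the complete HTTP response itself, a sketch of that step may help. The Python below is illustrative only: the function names, the HTTP/1.0 status line, and the particular headers are assumptions, not details taken from the described system.

```python
import html


def collection_to_html(title, names):
    """Render a collection's member names as a simple HTML list."""
    items = "".join(f"<li>{html.escape(n)}</li>" for n in names)
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body><ul>{items}</ul></body></html>"
    )


def build_nph_response(html_body, status="200 OK"):
    """Build status line, headers, blank line, and body as raw bytes.

    The HTTP server forwards these bytes unchanged, so the gateway is
    responsible for every part of the response.
    """
    body = html_body.encode("utf-8")
    head = (
        f"HTTP/1.0 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

A call such as `build_nph_response(collection_to_html("Initial collection", ["Manuals", "Memos"]))` returns bytes that can be written directly to the client connection.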
Figure 15 illustrates the process for conducting a content-oriented query. The end-user requests, via HTTP, a content-oriented query over an InfoHarness collection held by an ih_server 151. The request is passed via the CGI to the gateway 152. The gateway connects to the ih_server and requests a content-oriented query 153, passing the query text. Based upon the type of the InfoHarness collection, the proper indexer is invoked to perform the search 154. The indexer returns a list of IHOs that satisfy the query 155. The IH server returns the list of IHOs to the gateway 156. The gateway reformats the list of InfoHarness objects into HTML and returns the list to the HTTP server 157. The HTTP server transmits the list of objects to the HTTP browser 158.
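The server-side part of this query, choosing an indexer by collection type and returning the matching objects, can be sketched as below. The Python is a toy under stated assumptions: `keyword_indexer`, the `INDEXERS` table, and the dictionary layout of a collection are all hypothetical, and a simple substring match stands in for whatever search program a real indexer would run. Formatting the result as HTML is omitted.

```python
def keyword_indexer(index, query):
    """Hypothetical indexer: names whose indexed text contains every query word."""
    words = query.lower().split()
    return [
        name
        for name, text in index.items()
        if all(word in text.lower() for word in words)
    ]


# Indexer chosen by collection type (only one type in this sketch).
INDEXERS = {"text": keyword_indexer}


def content_query(collection, query):
    """Dispatch the query text to the indexer registered for the collection's type."""
    indexer = INDEXERS[collection["type"]]
    return indexer(collection["index"], query)


# Example collection; names and texts are made up for illustration.
MEMOS = {
    "type": "text",
    "index": {
        "memo1": "Quarterly budget report",
        "memo2": "Budget meeting notes",
        "memo3": "Lunch menu",
    },
}
```

Here `content_query(MEMOS, "budget")` returns `["memo1", "memo2"]`, and `content_query(MEMOS, "budget report")` returns `["memo1"]`.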
Figure 16 illustrates the processing of a request for invoking a server side browser. A client requests, via HTTP, an IH object held by an IH server 161. The request is passed via the CGI to the gateway 162. The gateway connects to the IH server and requests the IH object via an internal protocol 163. IH server determines that the requested object requires the invocation of a server side browser 164. The correct browser is invoked with the location of the object. The browser starts a process that displays the object back to the client's machine 165. Any error text generated by the browser is returned to IH server 166. IH server returns a message to the gateway indicating either successful invocation of the browser, or error text generated by the browser 167. If an error message was received from IH server, it is reformatted into HTML and passed back to the HTTP server 168; otherwise, the gateway indicates success via the HTTP OK message 169. The response from the gateway is transmitted to the user via HTTP 170. As long as the user does not close the application started by the browser, they can invoke any actions supported by the application and the results will be sent back to the machine where the browser was started. (Note the security risks associated with server side browsers. The user has access to an application that runs with the inherited permissions of IH server. This implies that the user may be able to open other files, change other files, and may even be able to escape to the shell on the machine where the browser was started, again inheriting the identity of the user that started IH server.)
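The invocation and error-reporting steps can be sketched with a child process whose standard error is captured. The Python below is a simplified illustration, not the described implementation: the function names, the `grace` period, and the stand-in "browser" command are assumptions. A browser still running after the grace period is treated as successfully started.

```python
import html
import subprocess
import sys


def invoke_server_side_browser(command, location, grace=2.0):
    """Start `command` on `location`; return (ok, error_text).

    The child inherits the permissions of the calling process, which is
    the security concern the text raises for server side browsers.
    """
    try:
        proc = subprocess.Popen(
            [*command, location],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
        )
    except OSError as exc:  # for example, the browser program is missing
        return False, str(exc)
    try:
        _, err = proc.communicate(timeout=grace)
    except subprocess.TimeoutExpired:
        return True, ""  # still running after the grace period
    if proc.returncode != 0:
        return False, err.strip()
    return True, ""


def gateway_reply(ok, error_text):
    """Report success as a plain OK, or wrap the error text in HTML."""
    if ok:
        return "OK"
    return f"<html><body><pre>{html.escape(error_text)}</pre></body></html>"


# Stand-in "browser" for demonstration: a Python one-liner that fails.
FAILING_BROWSER = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('cannot open display'); sys.exit(1)",
]
```

With `FAILING_BROWSER`, the call returns `False` together with the captured error text, which `gateway_reply` then wraps for display.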
Figure 17 illustrates the process for a request for invoking a client side browser. A client requests, via HTTP, to see an IH object held by an IH server 171. The request is passed via the CGI to the gateway 172. The gateway connects to the IH server and requests the IH object 173. IH server examines the type of the object requested and determines that the object can be displayed using a client side browser (or, in HTTP browser terms, an external viewer). The location of the object is determined and the IH server returns the contents of the file to the gateway 174. The gateway performs a mapping between the IH subtype of the object and the MIME type corresponding to the object. This MIME type is returned with the object contents to the HTTP server 175. The HTTP browser receives the contents of the object and determines which external viewer to invoke for the specified MIME type 176. The contents of the object are stored in a temporary file. The external viewer is started 177 with the name of a temporary file that contains the contents of the requested object.
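Both ends of this exchange can be sketched briefly: the gateway's subtype-to-MIME mapping, and the browser's choice of external viewer. In the Python below the subtype names, the viewer table, and the function names are assumptions for illustration; only the MIME type strings are standard. The viewer command is built but not launched.

```python
import os
import tempfile

# Gateway side: IH subtype to MIME type (subtype names are hypothetical).
MIME_BY_SUBTYPE = {
    "postscript": "application/postscript",
    "gif": "image/gif",
    "text": "text/plain",
}

# Browser side: MIME type to external viewer (viewer names are examples).
VIEWER_BY_MIME = {
    "application/postscript": "ghostview",
    "image/gif": "xv",
}


def gateway_response(subtype, contents):
    """Return the object's bytes behind a header carrying its MIME type."""
    mime = MIME_BY_SUBTYPE.get(subtype, "application/octet-stream")
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {mime}\r\n"
        f"Content-Length: {len(contents)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + contents


def viewer_command(response):
    """Save the body to a temporary file and build the viewer's argument list."""
    head, _, body = response.partition(b"\r\n\r\n")
    mime = None
    for line in head.decode("ascii").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.lower() == "content-type":
            mime = value.strip()
    viewer = VIEWER_BY_MIME.get(mime)
    if viewer is None:
        return None  # no external viewer configured for this type
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(body)
    return [viewer, path]
```

Returning the argument list instead of starting the viewer keeps the sketch free of side effects beyond the temporary file, which the caller is expected to delete.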
It is to be understood that the method and system for providing uniform access to heterogeneous information as illustrated herein are not limited to the specific forms disclosed and illustrated, but may assume other embodiments limited only by the scope of the appended claims.

Claims

We Claim:
1. A system for providing uniform access heterogeneous data from a plurality of end-users, said system comprising: a database of metadata extracted from a plurality of information sources; and a server having loaded in memory, instantiations of said metadata from said database.
2. The system as claimed in claim 1 wherein further comprising information servers containing said information sources connected to said server.
3. The system as claimed in claim 2 further comprising a plurality of end-users operating HTTP compatible browsers all connected to said server.
4. The system as claimed in claim 3 wherein said instantiations of said metadata loaded in said server memory are organized into objects, collections, and repositories.
5. A method for providing a plurality of end-users access to individual information units of heterogeneous information, said method comprising: pre-processing said individual information units of heterogeneous information to extract metadata for each of said information units; creating a database of said metadata; loading said metadata from said database into a server's resident memory; placing said server into a main line loop awaiting requests from said end-users; receiving requests for information at said server from said end-users; and responding to said requests using said metadata stored in said resident memory.
6. The method as recited in claim 5 wherein said database created from said metadata organizes said metadata into objects and collections.
7. The method as recited in claim 6 wherein the step of loading said metadata from said database includes the steps of loading said objects and collections, and further includes the step of organizing said objects and collections into repositories.
8. The method as recited in claim 6 wherein said request received from said end-users is either a metadata query or an information content query and wherein said server responds to said query returning one of said objects satisfying said query.
9. The method as recited in claim 8 wherein said responding step further includes the step of invoking a client side browser to view said information units identified by said one of said objects.
10. The method as recited in claim 8 wherein said responding step further includes the step of invoking a server side browser to view said information units identified by said one of said objects.
PCT/US1996/015620 1995-10-16 1996-09-26 Method and system for providing uniform access to heterogeneous information WO1997015018A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US54364495A 1995-10-16 1995-10-16
US08/543,644 1995-10-16

Publications (1)

Publication Number Publication Date
WO1997015018A1 true WO1997015018A1 (en) 1997-04-24

Family

ID=24168924

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1996/015620 WO1997015018A1 (en) 1995-10-16 1996-09-26 Method and system for providing uniform access to heterogeneous information

Country Status (2)

Country Link
TW (1) TW307840B (en)
WO (1) WO1997015018A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5493677A (en) * 1994-06-08 1996-02-20 Systems Research & Applications Corporation Generation, archiving, and retrieval of digital images with evoked suggestion-set captions and natural language interface

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
AMY ROGERS, "Oracle Intros Add-Ons for Data Warehouses", Communications Week, No. 567, 24 July 1995, pages 15-16. *
COMPUTER, Vol. 28, No. 8, August 1995, C. JAMES, "What Goes Into an Information Warehouse?", pages 84-85. *
COMPUTERWORLD, Vol. 29, No. 11, 13 March 1995, ELLIS BOOKER, "US West Champions Internal Internet", page 2. *
DATABASE PROGRAMMING AND DESIGN, Vol. 8, No. 1, January 1995, DAVID STODDER, "Reinventing the Database: Data Warehouse and Economics Will Shift the Database Landscape in 1995", pages 7-9. *
DATABASE PROGRAMMING AND DESIGN, Vol. 8, No. 2, February 1995, COLIN WHITE, "The Key to a Data Warehouse: Unlocking the Secrets of Data Warehousing With The Information Directory", pages 23-265. *
GEORGE A. THOMPSON, "Warehouse? There House", HP Professional, Vol. 9, No. 5, May 1995, pages 11-13. *
IEEE WESCANEX 95: Communications, Power and Computing Conference Proceedings, Winnipeg, Manitoba, Canada, 15-16 May 1995, R. OSWALD, "Manitoba Land Related Information System: The Information Utility", pages 252-257. *
MACWEEK, Vol. 8, No. 44, 07 November 1994, ROBERT HESS, "Cyberdog to Fetch Internet Resources for OpenDoc Apps.", pages 1-2. *
SOFTWARE ENGINEERING JOURNAL, Vol. 5, No. 4, July 1990, SLONIM et al., "The Information Utility: a Project Retrospective", pages 223-236. *
SPRING COMPCON 94, San Francisco, California, 28 February - 04 March 1994, FREEMAN et al., "Hosting Services - Linking the Information Warehouse to the Information Consumer", Digest of Papers, pages 165-171. *

Cited By (104)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2506142A3 (en) * 1996-03-14 2012-11-07 Nortel Networks Limited Systems and methods for executing application programs from a memory device linked to a server
US7089489B1 (en) 1997-06-02 2006-08-08 Telefonaktiebolaget Lm Ericsson (Publ) Method and arrangement for browsing documents in a database
WO1998055940A3 (en) * 1997-06-02 1999-03-04 Ericsson Telefon Ab L M A method and arrangement for browsing documents in a database
WO1998055940A2 (en) * 1997-06-02 1998-12-10 Telefonaktiebolaget Lm Ericsson A method and arrangement for browsing documents in a database
EP0924628A2 (en) * 1997-12-22 1999-06-23 Hewlett-Packard Company Methods and system for using web browser to search large collections of documents
EP0924628A3 (en) * 1997-12-22 2005-08-24 Hewlett-Packard Company, A Delaware Corporation Methods and system for using web browser to search large collections of documents
WO1999039286A1 (en) * 1998-01-30 1999-08-05 Eoexchange, Inc. Information platform
US6078924A (en) * 1998-01-30 2000-06-20 Aeneid Corporation Method and apparatus for performing data collection, interpretation and analysis, in an information platform
US8612322B2 (en) 1998-02-12 2013-12-17 Broadridge Content Solutions, Inc. Method and system for electronic delivery of sensitive information
US7885875B2 (en) 1998-02-12 2011-02-08 Burakoff Stephen V Obtaining consent for electronic delivery of compliance information
US7890401B2 (en) 1998-02-12 2011-02-15 Burakoff Stephen V Method and system for electronic delivery of sensitive information
US7885876B2 (en) 1998-02-12 2011-02-08 Burakoff Stephen V Obtaining consent for electronic delivery of compliance information
US7885877B2 (en) 1998-02-12 2011-02-08 Burakoff Stephen V Obtaining consent for electronic delivery of compliance information
US7890400B2 (en) 1998-02-12 2011-02-15 Burakoff Stephen V Obtaining consent for electronic delivery of compliance information
US7885873B2 (en) 1998-02-12 2011-02-08 Burakoff Stephen V Obtaining consent for electronic delivery of compliance information
US7028190B2 (en) 1998-02-12 2006-04-11 Newriver, Inc. Method and system for electronic delivery of sensitive information
US7890399B2 (en) 1998-02-12 2011-02-15 Burakoff Stephen V Obtaining consent for electronic delivery of compliance information
US7885874B2 (en) 1998-02-12 2011-02-08 Burakoff Stephen V Obtaining consent for electronic delivery of compliance information
US7885872B2 (en) 1998-02-12 2011-02-08 Burakoff Stephen V Obtaining consent for electronic delivery of compliance information
US8566195B2 (en) 1998-02-12 2013-10-22 Broadridge Content Solutions, Inc. Obtaining consent for electronic delivery of compliance information
WO1999041688A1 (en) * 1998-02-13 1999-08-19 Newriver Investor Communications, Inc. Mapping compliance information into useable format
US6122635A (en) * 1998-02-13 2000-09-19 Newriver Investor Communications, Inc. Mapping compliance information into useable format
AU756986B2 (en) * 1998-02-13 2003-01-30 Newriver, Inc. Mapping compliance information into useable format
US6959288B1 (en) * 1998-08-13 2005-10-25 International Business Machines Corporation Digital content preparation system
US7398266B2 (en) 1998-08-17 2008-07-08 Overture Services, Inc. Dynamically categorizing entity information
US6735585B1 (en) 1998-08-17 2004-05-11 Altavista Company Method for search engine generating supplemented search not included in conventional search result identifying entity data related to portion of located web page
US6654813B1 (en) 1998-08-17 2003-11-25 Alta Vista Company Dynamically categorizing entity information
WO2000010105A1 (en) * 1998-08-17 2000-02-24 Iatlas Corporation Enhancing computer-based searching
WO2000010106A1 (en) * 1998-08-17 2000-02-24 Atlas Corporation Mapping information sources
WO2000010107A1 (en) * 1998-08-17 2000-02-24 Iatlas Corporation Analyzing internet-based information
WO2000010108A1 (en) * 1998-08-17 2000-02-24 Iatlas Corporation Dynamically categorizing entity information
EP1003110A3 (en) * 1998-08-31 2001-01-17 Xerox Corporation Property-based user level document management
EP1003110A2 (en) * 1998-08-31 2000-05-24 Xerox Corporation Property-based user level document management
US6562076B2 (en) 1998-08-31 2003-05-13 Xerox Corporation Extending application behavior through active properties attached to a document in a document management system
US6582474B2 (en) 1998-08-31 2003-06-24 Xerox Corporation Tagging related files in a document management system
WO2000043917A2 (en) * 1999-01-20 2000-07-27 Computer Associates Think, Inc. System and method of presenting channelized data
WO2000043917A3 (en) * 1999-01-20 2000-11-30 Information Advantage System and method of presenting channelized data
EP1315114A2 (en) * 1999-01-20 2003-05-28 Computer Associates Think, Inc. System and method of presenting channelized data
US6453339B1 (en) 1999-01-20 2002-09-17 Computer Associates Think, Inc. System and method of presenting channelized data
EP1315114A3 (en) * 1999-01-20 2005-10-05 Computer Associates Think, Inc. System and method of presenting channelized data
WO2000048073A1 (en) * 1999-02-09 2000-08-17 Hearme Method and apparatus for managing assets of a client side application
WO2001013287A1 (en) * 1999-06-11 2001-02-22 Cci Europe A/S A content management computer system for managing publishing content objects
WO2001016734A2 (en) * 1999-08-31 2001-03-08 Accenture Llp A system, method and article of manufacture for a self-describing stream in a communication services patterns environment
WO2001016734A3 (en) * 1999-08-31 2002-02-21 Accenture Llp A system, method and article of manufacture for a self-describing stream in a communication services patterns environment
WO2001033387A3 (en) * 1999-10-29 2002-04-04 Liberty Integration Software I Apparatus, systems and methods for electronic data development, management, control and integration in a global communications network environment
WO2001033387A2 (en) * 1999-10-29 2001-05-10 Liberty Integration Software, Inc. Apparatus, systems and methods for electronic data development, management, control and integration in a global communications network environment
AU782134B2 (en) * 1999-10-29 2005-07-07 Liberty Integration Software, Inc. Apparatus, systems and methods for electronic data development, management, control and integration in a global communications network environment
EP1250669A1 (en) * 1999-11-03 2002-10-23 Accenture LLP Data warehouse computing system
EP1250669A4 (en) * 1999-11-03 2006-10-25 Accenture Llp Data warehouse computing system
GB2364403A (en) * 1999-11-11 2002-01-23 Canon Kk An information search system and a text translation system
US6912586B1 (en) * 1999-11-12 2005-06-28 International Business Machines Corporation Apparatus for journaling during software deployment and method therefor
US6704782B1 (en) 1999-12-09 2004-03-09 International Business Machines Corporation System and methods for real time progress monitoring in a computer network
US6615274B1 (en) 1999-12-09 2003-09-02 International Business Machines Corporation Computer network control systems and methods
US6588011B1 (en) 1999-12-14 2003-07-01 International Business Machines Corporation Apparatus for automatically generating restore process during software depolyment and method therefor
US6772158B1 (en) 1999-12-14 2004-08-03 International Business Machines Corporation Apparatus for data depoting and method therefor
US6604237B1 (en) 1999-12-14 2003-08-05 International Business Machines Corporation Apparatus for journaling during software deployment and method therefor
US7191208B1 (en) 1999-12-14 2007-03-13 International Business Machines Corporation Methods of selectively distributing data in a computer network and systems using the same
US7992079B2 (en) 2000-01-27 2011-08-02 American Express Travel Related Services Company, Inc. Information architecture for the interactive environment
US8683310B2 (en) 2000-01-27 2014-03-25 American Express Travel Related Services Company, Inc. Information architecture for the interactive environment
WO2001055910A3 (en) * 2000-01-27 2003-01-23 American Express Travel Relate Information architecture for an interactive environment
US7293230B2 (en) 2000-01-27 2007-11-06 American Express Travel Related Services Company, Inc. Information architecture for the interactive environment
WO2001055910A2 (en) * 2000-01-27 2001-08-02 American Express Travel Related Services Company, Inc. Information architecture for an interactive environment
WO2001067309A2 (en) * 2000-03-03 2001-09-13 Radiant Logic, Inc. System and method for providing access to databases via directories and other hierarchical structures and interfaces
WO2001067309A3 (en) * 2000-03-03 2004-02-12 Radiant Logic Inc System and method for providing access to databases via directories and other hierarchical structures and interfaces
US6985905B2 (en) 2000-03-03 2006-01-10 Radiant Logic Inc. System and method for providing access to databases via directories and other hierarchical structures and interfaces
WO2001075673A3 (en) * 2000-03-31 2003-12-04 Mypoints Com Inc Method for developing and maintaining content for websites
WO2001075673A2 (en) * 2000-03-31 2001-10-11 Mypoints.Com, Inc. Method for developing and maintaining content for websites
WO2001084362A1 (en) * 2000-04-28 2001-11-08 Noheto S.A.S Method for dynamic preparation of digital files corresponding to information means such as html pages
FR2808348A1 (en) * 2000-04-28 2001-11-02 Noheto S A S PROCESS FOR THE DYNAMIC PREPARATION OF DIGITAL FILES CORRESPONDING TO INFORMATION MEANS SUCH AS PAGES IN HTML FORMAT
US6718335B1 (en) 2000-05-31 2004-04-06 International Business Machines Corporation Datawarehouse including a meta data catalog
WO2002001384A3 (en) * 2000-06-28 2003-01-30 Twi Interactive Inc Database system, particularly for multimedia objects
US7096226B2 (en) 2000-06-28 2006-08-22 Mci, Llc Database system, particularly for multimedia objects
US8126313B2 (en) 2000-06-28 2012-02-28 Verizon Business Network Services Inc. Method and system for providing a personal video recorder utilizing network-based digital media content
US9038108B2 (en) 2000-06-28 2015-05-19 Verizon Patent And Licensing Inc. Method and system for providing end user community functionality for publication and delivery of digital media content
EP1179789A3 (en) * 2000-08-11 2005-11-23 International Business Machines Corporation Method and system for accessing information on a network
EP1179789A2 (en) * 2000-08-11 2002-02-13 International Business Machines Corporation Method and system for accessing information on a network
US7529750B2 (en) 2000-08-11 2009-05-05 International Business Machines Corporation Accessing information on a network
EP1366449A4 (en) * 2001-02-07 2007-06-20 Biz360 Inc System of analysing networked searches within business markets
EP1366449A2 (en) * 2001-02-07 2003-12-03 Biz360, Inc. System of analysing networked searches within business markets
US7254570B2 (en) 2001-03-21 2007-08-07 Nokia Corporation Query resolution system and service
US7200627B2 (en) 2001-03-21 2007-04-03 Nokia Corporation Method and apparatus for generating a directory structure
EP1244032A1 (en) * 2001-03-21 2002-09-25 Nokia Corporation Management and distribution of electronic media
EP1244030A1 (en) * 2001-03-21 2002-09-25 Nokia Corporation Management and distribution of electronic media
EP1244031A1 (en) * 2001-03-21 2002-09-25 Nokia Corporation Management and distribution of electronic media
US7353236B2 (en) 2001-03-21 2008-04-01 Nokia Corporation Archive system and data maintenance method
US6904454B2 (en) 2001-03-21 2005-06-07 Nokia Corporation Method and apparatus for content repository with versioning and data modeling
US8977108B2 (en) 2001-06-27 2015-03-10 Verizon Patent And Licensing Inc. Digital media asset management system and method for supporting multiple users
US7970260B2 (en) 2001-06-27 2011-06-28 Verizon Business Global Llc Digital media asset management system and method for supporting multiple users
US8990214B2 (en) 2001-06-27 2015-03-24 Verizon Patent And Licensing Inc. Method and system for providing distributed editing and storage of digital media over a network
US8972862B2 (en) 2001-06-27 2015-03-03 Verizon Patent And Licensing Inc. Method and system for providing remote digital media ingest with centralized editorial control
GB2378004A (en) * 2001-07-27 2003-01-29 Financial Services Dot Com Plc Macro data and software management
EP1559034A4 (en) * 2002-11-07 2008-04-02 Thomson Global Resources Ag Electronic document repository management and access system
EP1559034A2 (en) * 2002-11-07 2005-08-03 Thomson Legal and Regulatory Global AG Electronic document repository management and access system
US7941431B2 (en) 2002-11-07 2011-05-10 Thomson Reuters Global Resources Electronic document repository management and access system
NL1025547C2 (en) * 2003-02-21 2005-05-23 Hewlett Packard Development Co Content management portal and method for managing digital values.
WO2004114148A1 (en) * 2003-06-20 2004-12-29 International Business Machines Corporation Heterogeneous indexing for annotation systems
US9026901B2 (en) 2003-06-20 2015-05-05 International Business Machines Corporation Viewing annotations across multiple applications
US8793231B2 (en) 2003-06-20 2014-07-29 International Business Machines Corporation Heterogeneous multi-level extendable indexing for general purpose annotation systems
WO2005041062A3 (en) * 2003-10-24 2007-07-19 Thales Sa Method for universal addressing of software granules, for simplifying access to heterogeneous data managed by software tools
FR2861476A1 (en) * 2003-10-24 2005-04-29 Thales Sa Software granules addressing process for accessing heterogeneous software data, involves defining generic rule for identification of granules, and applying rule to requests exchanged between software tools
WO2005041062A2 (en) * 2003-10-24 2005-05-06 Thales Method for universal addressing of software granules, for simplifying access to heterogeneous data managed by software tools
US8321465B2 (en) 2004-11-14 2012-11-27 Bloomberg Finance L.P. Systems and methods for data coding, transmission, storage and decoding
US9076311B2 (en) 2005-09-07 2015-07-07 Verizon Patent And Licensing Inc. Method and apparatus for providing remote workflow management
US9401080B2 (en) 2005-09-07 2016-07-26 Verizon Patent And Licensing Inc. Method and apparatus for synchronizing video frames

Also Published As

Publication number Publication date
TW307840B (en) 1997-06-11

Similar Documents

Publication Publication Date Title
WO1997015018A1 (en) Method and system for providing uniform access to heterogeneous information
US7266826B2 (en) Publish-subscribe architecture using information objects in a computer network
US6456308B1 (en) Embedded web server
US6769032B1 (en) Augmented processing of information objects in a distributed messaging framework in a computer network
US5973696A (en) Embedded web server
US6567846B1 (en) Extensible user interface for a distributed messaging framework in a computer network
US7290061B2 (en) System and method for internet content collaboration
US9165077B2 (en) Technology for web site crawling
US6105043A (en) Creating macro language files for executing structured query language (SQL) queries in a relational database via a network
KR100188484B1 (en) A sub-agent service for fulfilling requests of a web browser
US6128622A (en) IMS web studio taskguide
US7873649B2 (en) Method and mechanism for identifying transaction on a row of data
US6253254B1 (en) Hyper media object management
JP5320438B2 (en) Method and apparatus for XML data storage, query rewriting, visualization, mapping, and referencing
US20020035564A1 (en) Automated on-line information service and directory, particularly for the world wide Web
US20030088639A1 (en) Method and an apparatus for transforming content from one markup to another markup language non-intrusively using a server load balancer and a reverse proxy transcoding engine
US20120131045A1 (en) Group universal resource identifiers
US20020087622A1 (en) Meta-application architecture for integrating photo-service websites for browser-enabled devices
US20050177595A1 (en) Link generation system
JPH0981445A (en) Information controller
CA2635265A1 (en) Transferring and displaying hierarchical data between databases and electronic documents
JP5048956B2 (en) Information retrieval by database crawling
Papiani et al. A distributed scientific data archive using the Web, XML and SQL/MED
US20040117349A1 (en) Intermediary server for facilitating retrieval of mid-point, state-associated web pages
US6745203B1 (en) User interface for a distributed messaging framework

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
121 EP: The EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 97515832

Format of ref document f/p: F

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: CA