US6907422B1 - Method and system for access and display of data from large data sets - Google Patents

Method and system for access and display of data from large data sets Download PDF

Info

Publication number
US6907422B1
US6907422B1 US10/025,061 US2506101A US6907422B1 US 6907422 B1 US6907422 B1 US 6907422B1 US 2506101 A US2506101 A US 2506101A US 6907422 B1 US6907422 B1 US 6907422B1
Authority
US
United States
Prior art keywords
data
records
bucket
database
new
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime, expires
Application number
US10/025,061
Inventor
Daniel Thomas Predovic
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle America Inc
Original Assignee
Siebel Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siebel Systems Inc filed Critical Siebel Systems Inc
Priority to US10/025,061 priority Critical patent/US6907422B1/en
Assigned to SIEBEL SYSTEMS, INC. reassignment SIEBEL SYSTEMS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: PREDOVIC, DANIEL THOMAS
Priority to US11/101,752 priority patent/US7899844B2/en
Application granted granted Critical
Publication of US6907422B1 publication Critical patent/US6907422B1/en
Assigned to Oracle America, Inc. reassignment Oracle America, Inc. MERGER (SEE DOCUMENT FOR DETAILS). Assignors: SIEBEL SYSTEMS, INC.
Adjusted expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99932Access augmentation or optimizing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching
    • Y10S707/99934Query formulation, input preparation, or translation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99937Sorting

Definitions

  • the present invention concerns databases in general, and, in particular, a method for access and display of data from large data sets in which the large data sets are portioned into a plurality of buckets, each of which contain a small subset of records in the large data set.
  • RDBMS relational database management system
  • SQL databases SQL-(structured query language) based databases.
  • SQL databases run on database software provided by various vendors, including Oracle (Oracle 8i and 9i), Microsoft (SQL Server 7), IBM (DB2), Sybase, and Informix.
  • RDBMS databases are usually run on one or more networked database servers and are accessed by client machines connected to the database server(s) via a computer network.
  • middle tiers e.g., application servers
  • a database 10 hosted by a database server 12 may be accessed by a client machine 14 via a network 16 .
  • Modern SQL RDBMS databases enable a multitude of client users to concurrently access (i.e., Insert, Update and Delete) data using appropriate client-side software (or through middleware running on an application server), such as client applications that provide a graphical user interface (GUI) that allows users to interactively access database data.
  • GUI graphical user interface
  • RDBMS databases data are stored in tables in accordance with a database “schema,” which defines storage parameters (metadata) that define structures for each of the tables stored in the database, various data constraints, relationships between tables, and indexes, etc.
  • the table data are stored in one or more shared storage spaces, each comprising one or more datafiles, although there are some RDBMS databases that store data for each table in a separate file, and others that store all of the data for a given database in a single file.
  • an Oracle database will include a plurality tablespaces (e.g., system, user, rollback, etc.), wherein user data are stored in a plurality of tables in one or more selected tablespaces that may be specifically configured using various configuration parameters during creation or alteration of the tablespaces.
  • tablespaces e.g., system, user, rollback, etc.
  • Data are stored in a tablespace in the following manner.
  • a plurality of segments are allocated for the tablespace and a datafile is assigned to the tablespace, wherein each segment comprises a plurality of fixed-size (e.g., 2K, 4K or 8K) storage blocks.
  • Each of these storage blocks comprise-multiple operating storage blocks, which are the base unit the operating system uses to define where data is physically stored.
  • These storage blocks are (typically) filled with data in a substantially sequential manner until close to all of the storage space provided by the allocated segments is consumed.
  • the tablespace must be “extended” with one or more extents (similar to segments) so that additional data may be added.
  • the data is logically stored in a plurality of rows, wherein each row of data includes a set of data (i.e., record) pertaining to various columns defined for the table the data are stored in.
  • each row of data includes a set of data (i.e., record) pertaining to various columns defined for the table the data are stored in.
  • additional “row ID” information that uniquely identifies every row in the database is stored for each row.
  • the row ID comprises a string having an encoded format that can be parsed by Oracle to quickly access the row of data corresponding to the row ID.
  • the row IDs are used by primary key and other types of indexes to speed up queries and sorting operations.
  • the data are stored in a substantially sequential manner based on the approximate order the data are entered into a database.
  • a database For example, Oracle uses a background operation to write blocks of data in response to predetermined conditions (memory full, time interval, etc.), wherein the block writes are performed continuously, although they are slightly asynchronous to when data are actually entered.
  • the data for a given table are stored in a somewhat randomized order. For instance, suppose that contact information corresponding to various customers are stored in a table that includes various rows for storing the contact information, such as last name, first name, address, city, state, phone number, etc.
  • databases provide various solutions for providing requested data in a sorted configuration based on specified sort criteria, such as last name sorted alphabetically.
  • the data corresponding to a requested data set i.e., query
  • a temporary sort space comprising physical storage and/or memory space
  • the data are sorted corresponding to the predefined ordering scheme, and then provided to an application, applet, or module that is used to present the data to the user (or provide the data to an internal software component user).
  • non-indexed queries i.e., a query that doesn't use a primary key or other index for a table or set of related tables
  • a full table scan must be performed to retrieve appropriate rows specified by the query (the result set or recordset), and then a memory or disk sort must be performed on the result set prior to providing the data to the user as an “open” data set.
  • one or more indexes are used to identify the appropriate rows of data in the result set, whereupon mapping information provided by the index(es) (e.g., row IDs) that identifies where those rows of data are located is used to retrieve appropriate rows of data meeting the query search criteria.
  • mapping information provided by the index(es) e.g., row IDs
  • the rows of data are retrieved in a sorted order. In other cases, this is not possible or impractical, and the rows of data are first retrieved and then sorted in the sort space.
  • a request for all records in a table will cause a table scan to be performed, regardless of whether or not indexes are used.
  • the rows in the contact table (stored as randomized set of contact records 18 ) are retrieved and sent to a sort area 26 , which depending on the size of the sort may comprise a memory space and/or a temporary disk storage space.
  • the rows are then sorted, using a sort process 28 and then provided back to client-side application 20 as a sorted full recordset 30 .
  • the retrieval and sorting of data can be very time-consuming, particularly if the sorting has to be performed using a disk sort. This is often the case for large result sets, which may involve millions, or even 10's of millions of rows of data. In such instances, even an optimized query using indexes might take 10's of minutes or even hours. Furthermore, such queries are resource (CPU and memory) intensive, often slowing access by other users to a substantial halt.
  • RDBMS database vendors have developed various schemes to provide data to GUI-based applications that function as client-side front ends to enable users to access and view the database data.
  • these schemes are centered around “virtual” lists of data that are either continually loaded using background operations or implemented through the use of built-in SQL commands, such as the TOP command.
  • These conventional schemes are limited in their ability to retrieve and provide data to end users when large data sets are queried.
  • sorted full recordset 30 is logically stored as a virtual list that is managed by a virtual list manager 32 .
  • virtual list manager 32 may actually manage smaller sets of all of those rows that are continually being retrieved from database 10 using a background process—for convenience, a complete data set is illustrated in FIG. 1.
  • Virtual list manager 32 interacts with a GUI manager 34 that is used to provide display information to a video subsystem that includes a display device (e.g., monitor) 36 to generate a GUI display screen 38 that enables the user to selectively view contact data stored in database 10 .
  • GUI manager 34 that is used to provide display information to a video subsystem that includes a display device (e.g., monitor) 36 to generate a GUI display screen 38 that enables the user to selectively view contact data stored in database 10 .
  • GUI manager 34 e.g., a display device
  • the present invention provides a method and system for accessing and displaying data records that are retrieved from large data sets in a manner that dramatically reduces the latency problems common to conventional access schemes.
  • a large data set is logically partitioned into a plurality of “buckets” by defining “soft” boundaries corresponding to a sort order in which data records are presented to users or provided to internal users for purposes like batch processing.
  • the bucket of data the record or at least a portion of the group of related records is stored in is identified, and those records are retrieved from the database sorted in accordance with the sort order, and provided to the requester.
  • Data defining a relative position of each boundary point in the sort order are built and stored as a set of boundary markers, preferably during an administative process.
  • Each of these boundary markers comprises a unique set of data corresponding to two or more columns in a database table on which the sort order is based, and in which at least a base portion of the data records are stored.
  • the chosen columns will already be indexed or have indexes applied to them subsequent to building the boundaries.
  • a pair of columns are used, such as columns that store first and last name data. When very large data sets are used, it may be necessary to use a tertiary set of column data, or even a higher number of columns.
  • the boundary markers are stored in sets, preferably in accordance with the sort order they correspond to.
  • the boundary markers are searched to determine which bucket the desired data record or records are contained in.
  • a query is then formulated based on the boundary markers to retrieve the records corresponding to the bucket.
  • the software components for implementing the invention are run on an application server that receives various data requests and navigation events from various clients.
  • all or a viewable subset (viewset) of the records are provided to the clients to enable users to access the data record or group of related data records.
  • new viewsets of data records are provided to the client in response to navigation events, such as scrolling.
  • a navigation event that seeks data records that are not contained in a current bucket
  • a new bucket in which the requested data records are contained is identified and retrieved, and a new viewset from the new bucket is presented to the user.
  • users can search or browse data records retrieved from data sets that may comprise 10's of millions or even 100's of millions of records, wherein new viewsets are typically provided to the user with subsecond responses.
  • FIG. 1 is a schematic diagram illustrating how data records contained in a full data set are retrieved using conventional data access schemes found in the prior art
  • FIG. 2 is a schematic diagram conceptually illustrating how data records from large data sets are retrieved in “buckets” in accordance with the present invention
  • FIG. 3 is an exemplary architecture and data flow diagram corresponding to a two-tier implementation of the invention.
  • FIG. 4 is an exemplary architecture and data flow diagram corresponding to a three-tier implementation of the invention.
  • FIG. 5 is a flowchart illustrating the logic used by the invention when generating bucket boundaries
  • FIG. 6 is a flowchart illustrating the logic used by the invention in response to user navigation events.
  • FIG. 7 is a schematic diagram of a computer system that may be used for the client machines and servers in the architectures of FIGS. 3 and 4 .
  • the present invention provides a mechanism for accessing very large data sets, wherein the data sets are logically broken into manageable “buckets” of data, greatly reducing the data access latencies inherent in the prior art.
  • the mechanism and its associated operations are referred to herein a large data volume (LDV) handling or simply LDV.
  • LDV works by dividing a query that would normally be used to retrieve a large number of rows of data (e.g., contact data from a contact table or set of related tables) into a number of smaller queries, wherein each query specifies a set of data contained in a data bucket.
  • a bucket is defined as all of the data returned from a corresponding SQL query, set with a pre-defined sort order, in which upper and lower boundaries are used to specify the range of data contained in the bucket.
  • a target object e.g., a table or set of tables containing related data
  • a target object e.g., a table or set of tables containing related data
  • FIG. 2 This configuration is illustrated in FIG. 2 , in which a sorted contact table 50 is divided into a plurality of buckets 52 defined by boundaries 54 .
  • the various data in a “base” table are logically shown as being divided alphabetically into buckets and corresponding bucket boundaries. This is merely illustrative to show the buckets and boundaries are generated corresponding to a sort order, as explained below, LDV provides various schemes for dividing data sets into buckets.
  • LDV determines an appropriate bucket that record or group is stored in and retrieves the bucket.
  • An appropriate query 58 is then formulated and submitted to database 10 for execution, returning a data set comprising a bucket 60 that includes data contained within a lower boundary 62 and upper boundary 64 specified by the query. The difference between the lower and upper boundaries define a size of the bucket 66 .
  • the bucket is then returned to LDV user 56 , whereupon it may be used in a batch process or filtered by a client application that enables users to request and view data stored in database 10 .
  • an appropriate query is formulated to retrieve a new bucket containing the desired records.
  • LDV operates like a sliding window 68 over the full set of data records contained in sorted contacts table 50 .
  • Architecture 70 includes several components that share the same root reference numerals (e.g., 10 and 10 A, etc.) as components depicted in FIG. 1 and discussed above; it will be understood that these components perform substantially similar operations in both architectures.
  • Architecture 70 corresponds to a two-tier environment in which a client application 72 running on a client machine 14 enables a user to selectively view contact data stored in a datafile corresponding to a CONTACT table in database 10 as a randomized set of contact records 18 A. It will be understood that in addition to the use of a single table for storing contact information, such information may be stored in a set of related tables, as will be recognized by those skilled in the database arts.
  • LDV works by retrieving buckets of data rather than an entire data set.
  • the data contained in the buckets are logically bounded by boundaries, the generation of which are described below.
  • various boundary markers are stored in database 10 A, such as depicted by a set of contact boundary markers 74 .
  • all or a portion of the boundary markers may be stored in one or more files on database server 12 A, or in one or more files on client machine 14 A.
  • LDV operations are implemented by an LDV class 76 and an LDV helper class 78 provided by client application 72 .
  • client application 72 may comprise a single software module, or a set of modules that interact using appropriate application program interfaces (API's) 80 .
  • API's application program interfaces
  • a business logic component 84 determines what data records a user of client machine 14 desires to view.
  • Business logic component 84 then provides this information to LDV class 76 , which generates a first query 86 to retrieve an appropriate set of boundary marker data 88 corresponding to the object containing the data the user requested (in this instance the CONTACT table).
  • LDV class 76 parses the data to determine an appropriate pair of lower and upper boundaries corresponding to the bucket of data the requested data is contained in. For example, suppose that a user desires to search for the contacts with the last name of “Smith.” Further suppose that a set of boundary data has been generated for the contact table that uses the LASTNAME column, and includes respective boundaries corresponding to the last record for selected letters C, H, K Q, U, and Z, as illustrated in FIG. 3 . Accordingly, the boundary data are parsed to determine the nearest boundaries encompassing the row(s) of data containing a last name of “Smith,” which in this case yield the boundaries at Q (lower boundary) and U (upper boundary) corresponding to a bucket 90 . As a result, the following SQL query is formulated as a query 92 and submitted to database 10 to retrieve the bucket:
  • database 10 uses a boundary index-based retrieval process 92 that retrieves all of the records for the CONTACTS table contained within the upper and lower boundaries for bucket 90 , as depicted by all records having a LASTNAME beginning with the letters R, S, T, and U.
  • the retrieved data are forwarded to sort area 26 , whereupon they are sorted by sort process 28 and returned to client machine 14 as an open set comprising sorted bucket data 94 .
  • LDV class 76 (or another client-side module) then implements a display filter 96 that is used to determine a current subset of the sorted data in the bucket to be displayed (the “viewset) and passes the viewset to a GUI manager 96 , which drives the video subsystem to display data pertaining to the records in the viewset within a GUI display screen 38 A on display device 36 .
  • architecture 100 depicts a well-known 3-tier architecture in which an application server 102 “sits between” database server 12 and a plurality of client machines 14 .
  • application server 102 “sits between” database server 12 and a plurality of client machines 14 .
  • one or more additionally tiers e.g., a web server tier
  • Application server 102 hosts various software modules that are used to perform various operations, including delivering data to client machines 14 in response to corresponding requests for data from users of the machines. These components include an LDV class 76 , and LDV helper class 78 , which interact with a business logic component 104 via an API 106 .
  • client machines 14 host a client application 108 that includes a business logic component 110 and a GUI manager 112 .
  • a typical request process begins with user input 114 that is received by business logic 110 .
  • the business logic processes the user input to determine what data the user desires and submits a corresponding data request 116 to application server 102 .
  • business logic components 110 and 104 comprises a client-server set, that are configured to interact with one another using a common protocol. Accordingly, data requests comprising “native” data pertaining to the common protocol are sent from business logic component 110 to business logic component 104 .
  • client application 108 comprises a web browser
  • data request 116 comprises HTML (hyper-text markup language) data.
  • data request 116 will be received by business logic 104 .
  • data request 116 will be received by LDV class 76 .
  • the LDV class and LDV helper class than interact with database 10 in a similar manner to that described above, returning a sorted data bucket 94 to the application server.
  • the LDV class and/or business logic 104 then determine what data to filter, ultimately sending the data from business logic component 104 to client machine 14 as filtered display data 118 .
  • the filtered display data may be in a native format, or comprise HTML data. Filtered display data is then passed by business logic 110 to GUI manager 112 , which drives the client machine's video subsystem to generate a GUI display screen 36 B on display device 36 .
  • LDV retrieves selected buckets of data based on data requests from users rather than entire data sets.
  • the data contained in a given bucket is defined by the boundaries for the bucket.
  • the boundaries are pre-determined through an administrative process that is somewhat similar to building indexes on a database. Alternately, the boundaries may be generated at runtime.
  • LDV boundaries are generated in the following manner. Initially, parameters for the process must be selected. With reference to FIG. 5 , this process begins in a block 150 in which a table is selected to apply LDV boundaries to. Typically, these tables will store data that is usually presented to end users in a sorted list format, such as contacts, products, employees, clients, etc. In many instances, the tables will comprise “base” or “parent” tables, while the displayed data will include not only data contained in the base or parent table, but also data stored in one or more child tables that are linked to the base or parent table via a primary key-foreign key relationship. For the purposes of LDV, only the base or parent table needs to be considered when generating LDV boundaries.
  • allowable sort orders for the table are determined.
  • the sort orders will be based on the intended use of the data, and in one embodiment will be enforced though the LDV data access classes and subsequently by the graphical user interface.
  • Each sort order to be provided requires its own unique set of boundary markers. For example, suppose the targeted table is a contact table that includes contact data stored in various columns, including LASTNAME, FIRSTNAME, COMPANY, ADDRESS, CITY, STATE, etc. Typically, it will be desired to sort the contact data alphabetically by LASTNAME.
  • other sort orders may be included, such as alphabetical sorted on COMPANY, CITY, STATE etc.
  • the sort order may be complex, comprising a combination of columns.
  • the data might be sorted on LASTNAME, then FIRSTNAME, then COMPANY.
  • the data may be sorted by COMPANY, then LASTNAME, then FIRSTNAME.
  • the actual combination used will depend on the type of data, the column configurations that data are stored in, and how the data are to be presented to the user (or to be used internally by, e.g., a batch processing component).
  • the selected column or columns should be columns that are either currently indexed, or will have a database index generated for them prior to using the LDV scheme.
  • a block 158 the approximate number of records to be included within each bucket is determined. To determine this value, many considerations are required. First, how will the data generally be requested and used? In a two-tier environment for heavy batch processing, a larger bucket size is acceptable, since the entire data set will often need to be processed and the speed of a two-tier connection is provided. In an n-tier environment for GUI restricted usage, like that used in customer relationship management (CRM) data systems, a small bucket size would be better since only a small amount of the data (e.g., 20-30 rows) is viewed at once. Furthermore, since there are many users who are concurrently access the database in typical data systems of this type, larger bucket sizes should be avoided because of their higher server overhead.
  • CRM customer relationship management
  • the size of the bucket should be scaled inversely with the percentage of the database that the user sees. For example, if users always use selection criteria that restrict their view to half of the records in the database, then the bucket size should be doubled. This is because, if a user has a selection criteria set that eliminates half of the database from their view, then each bucket will contain only one-half of its capacity of records that match the baseline selection criteria (when compared with a baseline condition in which the boundaries are defined as if all records will be access). As a result, the actual number of records returned in the bucket remain substantially constant
  • the parameters are used by a process that opens the data set sorted in the appropriate order, as provided by a block 160 .
  • data corresponding to only those columns considered in the sort order need be retrieved, rather than the entire records. For example, if the sort order is based on LASTNAME, then FIRSTNAME, the corresponding SQL query might look like:
  • the process then loops through the open data set in a block 162 , start to finish, defining boundary markers based on data found every N rows, wherein N corresponds to the number of records for the bucket determined in block 158 .
  • N corresponds to the number of records for the bucket determined in block 158 .
  • the scheme enhances the boundary definition through a second and/or tertiary column, incorporating that data into the boundary marker. For example, suppose the LASTNAME, FIRSTNAME, and COMPANY columns are considered for use as boundary markers.
  • each contact's last name may be unique (or have limited duplications), so only data corresponding to the LASTNAME column need to be used for the boundary markers.
  • the boundary markers should include data from a second column, such as LAST NAME.
  • these two columns may also not be sufficient, so data in a tertiary column (e.g., COMPANY) will be necessary.
  • this can be taken to a fourth or higher order if necessary to ensure appropriate uniqueness for each boundary marker.
  • a unique data column could be used, such as a client ID column or the like.
  • the boundary marker data are stored as a set so that it can be easily accessed during production data retrieval situations.
  • these data may be stored in a database table on database 10 , on application server 102 , or locally on the client machines.
  • the boundary marker data is stored as a concatenated string within an optimized database table.
  • boundary markers are stored in a concatenated string having the following format:
  • each adjacent set of boundary marker data corresponds to an adjacent boundary point.
  • each set of boundary marker data may be stored as a separate record in a table.
  • the concatenated string format or separate records will likely be dependent on the relative workload of the database and application servers. If the database server workload is light to moderate, built-in database operations (e.g., SQL and cursor loops) can be used to search for appropriate boundary markers.
  • the advantage of the concatenated string technique is that the entire string can be passed to the application server (which requires very minimal overhead for the database server), and then parsed at the application server rather than having an equivalent operation performed by the database server.
  • building bucket boundaries for a given data set requires the data set to be opened.
  • opening the data set can be performed using the LDV scheme itself so that the entire data set does not need to be opened (at one time) to complete the administrative process.
  • building bucket boundaries can be carried on during 7 ⁇ 24 operation of the database, without adversely affecting enterprise applications that access the database.
  • appropriate seed data i.e., estimated boundaries that should satisfactorily break up the data
  • Each bucket of data that is retrieved from the database includes all of the records contained within the lower and upper boundaries corresponding to the data bucket. Also, any given record in the entire data set will be contained in only one bucket. Accordingly, when using the boundary marker string discussed above, determination of the boundary markers corresponding to the bucket in which a particular data record is stored is performed by parsing the string until passing the first boundary marker that would be logically ordered prior to the particular data record. For example, if the search was for a “Gilbert Smith,” concatenated string (1) would be parsed until the “
  • the SQL will include data in the where clause corresponding to that column. For example, suppose a user is searching on the last name of “Billingsby” in a CONTACT table and the upper and lower boundaries respectively comprise “Benton” and “Bingham.”The corresponding SQL statement would then be:
  • a simple boundary definition works well for thousands of rows, but can breakdown when there are many entries with the same value in an extremely large database.
  • An example is a list of names that contains 200,000 Smith's. A more complex algorithm is required for that case.
  • the boundary markers will be based on multiple database columns to further break the data into appropriate buckets.
  • the boundary data by using a secondary boundary based on the FIRSTNAME column and a tertiary boundary on the COMPANY column, hundreds of millions of records can be handled since there is a very limited chance of data grouping (i.e., relatively large groups of records having identical values for LASTNAME, FIRSTNAME, and COMPANY).
  • the bucket boundary data the resulting query can be simple to very complex depending on boundaries.
  • a simple bucket would be created when there are many Fred Smiths in the list:
  • the LDV scheme When implemented, the LDV scheme enables users to navigate through various data screens or “views” during which appropriate buckets are retrieved and data contained in those buckets are displayed in a manner that is transparent to the users.
  • a flowchart illustrating the logic used by the invention to implement this functionality is shown in FIG. 6 .
  • a query will be performed to retrieve a first bucket of data based on a requested search, as provided by a block 170 in FIG. 6 .
  • the data returned includes a sorted set of all of the data matching the SQL query criteria, with record positioned at the beginning of the data set being marked as the active record.
  • the returned record set includes all of the data falling within the first bucket, positioned on the first record, and has knowledge that the record set is an “incomplete” set.
  • appropriate viewsets i.e., subsets of the bucket that are displayed at any one time to the user
  • each user navigation input (e.g., page-up, page down, etc.) will generate a new navigation “context” for the user, and an appropriate new viewset will be provided to the user.
  • the viewset acts as a sliding window over the sorted records in the bucket.
  • the LDV classes monitor regular record set calls corresponding to navigation events such as MoveNext, MovePrevious and EOF, to determine if the user desires to view data that is not contained in the current bucket. If this condition is sensed, as indicated by a YES (TRUE) result to a decision block 176 , appropriate boundaries for the new bucket are determined in a block 178 .
  • the current bucket is discarded in a block 180 .
  • the data corresponding to the current bucket may be maintained in a cache, such as a LRU (least recently used) cache.
  • LRU least recently used cache
  • the new bucket of data is retrieved from the database, whereupon an appropriate viewset is sent to the client and the active record is positioned to meet the user's request. For example, if the user request data for “Frank Smith” the active record will correspond to the first “Frank Smith” encountered in the sorted list.
  • the entire set is open when the GUI allows for an unfiltered list of data or when a batch process is going to operate on all of the data within a table.
  • filters are often applied to the sorted columns of the query in order to return a targeted set of data.
  • Analysis of the additional SQL and monitoring of the retrieved data sets allows the LDV helper class to alter the way it selects bucket boundaries to more optimally return data. For example, if a ‘user’ only requires people with a last name starting with “S”, the helper processing immediately recognizes the additional restriction on the sort order columns and begins its bucket boundaries at the first boundary pair that includes that data.
  • the additional SQL (indicated in bold) for the first bucket might be:
  • the LDV helper class can alter the number of buckets returned by one query. In the simplest form, this is accomplished by skipping a boundary definition and moving to the next one. For example, suppose a user selects everyone in New York City from a database that containes data on every person in the United States. This would result, assuming evenly distributed last names throughout the US, in about ⁇ fraction (1/25) ⁇ th of the expected rows per bucket, since approximaty one in every twenty-five Americans live in New York City.
  • the helper class In an attempt to retrieve the preset optimal count per bucket, the helper class would monitor the number of records returned in response to an initial or prior query, and internally increase the number of skipped boundaries in subsequent queries to retrieve new buckets of data as scrolling continues until the general returned amount is close to a desired bucket size.
  • the helper might observer that the number of records retrived via a first query resulted in only ⁇ fraction (1/25) ⁇ th of the number of records in the desired bucket size, whereupon the helper may adjust subsequent queries to skip 25 boundaries per re-query.
  • This added technology is known as Dynamic Bucketing.
  • the LDV class finds the appropriate bucket that the search for piece of data will lie in, returns that bucket of data (typically a sub-second call), and positions the active record on the appropriate record using its regular seek process. All the overhead of opening, sorting and seeking through hundreds of millions of rows is avoided.
  • a unique key is usually known—such as the client ID.
  • the application stores the client ID (or other unique identifyier) for the last records that was active and simply needs to query a single row corresponding to the stored client ID to return to the record(very quick).
  • the appropriate sort column data can then be extracted from the record, enabling an appropriate bucket containing the record to be identified, whereupon the record set corresponding to the bucket is retrieved and opened (typically a sub-second call), and the active record is positioned on the record corresponding to the stored client ID.
  • the present invention provides several advantages over conventional large data set access techniques. Most notably, data access latencies are greatly reduced, often enabling a user to literally scroll through millions of records without requiring the user to wait for data to be retrieved or screens to be updated.
  • the LDV scheme easily scales from small data sets to very large data sets, and enables a large number of concurrent users to leverage its features, while causing minimal additional overhead for both the database server and application servers.
  • a generally conventional computer 200 is illustrated, which is suitable for use as client machines, application servers, and database servers in connection with practicing the present invention, and may be used for running client and server-side software comprising one or more software modules that implement the various operations of the invention discussed above.
  • client machines Examples of computers that may be suitable for client machines as discussed above include PC-class systems operating the Windows NT or Windows 2000 operating systems, Sun workstations operating the UNIX-based Solaris operating system, and various computer architectures that implement LINUX operating systems.
  • Computer 200 is also intended to encompass various server architectures, as well as computers having multiple processors.
  • Computer 200 includes a processor chassis 202 in which are mounted a floppy disk drive 204 , a hard drive 206 , a motherboard 208 populated with appropriate integrated circuits including memory 210 and one or more processors (CPUs) 212 , and a power supply (not shown), as are generally well known to those of ordinary skill in the art.
  • hard drive 206 may comprise a single unit, or multiple hard drives, and may optionally reside outside of computer 200 .
  • a monitor 214 is included for displaying graphics and text generated by software programs and program modules that are run by the computer.
  • a mouse 216 may be connected to a serial port (or to a bus port or USB port) on the rear of processor chassis 202 , and signals from mouse 216 are conveyed to the motherboard to control a cursor on the display and to select text, menu options, and graphic components displayed on monitor 214 by software programs and modules executing on the computer.
  • a keyboard 218 is coupled to the motherboard for user entry of text and commands that affect the running of software programs executing on the computer.
  • Computer 200 also includes a network interface card 220 or built-in network adapter for connecting the computer to a computer network, such as a local area network, wide area network, or the Internet.
  • Computer 200 may also optionally include a compact disk-read only memory (CD-ROM) drive 222 into which a CD-ROM disk may be inserted so that executable files and data on the disk can be read for transfer into the memory and/or into storage on hard drive 206 of computer 200 .
  • CD-ROM compact disk-read only memory
  • Other mass memory storage devices such as an optical recorded medium or DVD drive may be included.
  • the machine instructions comprising the software that causes the CPU to implement the functions of the present invention that have been discussed above will likely be distributed on floppy disks or CD-ROMs (or other memory media) and stored in the hard drive until loaded into random access memory (RAM) for execution by the CPU.
  • RAM random access memory
  • all or a portion of the machine instructions may be loaded via a computer network.

Abstract

A method and system for accessing and display of data records from large data sets. The method includes defining a plurality of boundaries to logically partition the large data set into a plurality of buckets in accordance with a predefined sort order. The sort order is based on data stored in at least two columns of a database table in which at least a base portion of each data record is stored. In response to a request to retrieve a data record or group of related records, such as all contacts having a given first and last name, a determination is made to which bucket the data record or at least a portion of group of related records are stored in. A query is then formulated to retrieve a subset of the large set of records from the database corresponding to data records contained in the bucket.

Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention concerns databases in general, and, in particular, a method for access and display of data from large data sets in which the large data sets are portioned into a plurality of buckets, each of which contain a small subset of records in the large data set.
2. Background Information
During the past decade, the use of databases that store very large amounts of data has become increasingly prevalent. This is due, in part, to the availability of both computer hardware and database software resources to support these large databases. Prior to the present widespread availability of these resources, data was typically stored using “flat-file” databases or data systems running on mainframes or standalone computers using storage schemes that either supported limited sizes, or provided limited real-time access to the data (e.g., data systems that stored data on tapes). Oftentimes, an enterprise organization had various types of data stored on separate machines that did not provide for easy remote access. In contrast, today's IT environments often involve the use of huge centralized repositories in which data for an entire enterprise are stored, wherein the repositories may be accessed from remote clients connected to them via LANs, WANs, or even over the Internet.
The majority of the large databases in operation today are RDBMS (relational database management system) databases. Furthermore, most of these RDBMS databases are SQL-(structured query language) based databases. These SQL databases run on database software provided by various vendors, including Oracle (Oracle 8i and 9i), Microsoft (SQL Server 7), IBM (DB2), Sybase, and Informix. RDBMS databases are usually run on one or more networked database servers and are accessed by client machines connected to the database server(s) via a computer network. On some n-tier architectures, there are one or more middle tiers (e.g., application servers) that sit between a “backend” database server and the clients. A typical 2-tier architecture is illustrated in FIG. 1, wherein a database 10 hosted by a database server 12 may be accessed by a client machine 14 via a network 16. Modern SQL RDBMS databases enable a multitude of client users to concurrently access (i.e., Insert, Update and Delete) data using appropriate client-side software (or through middleware running on an application server), such as client applications that provide a graphical user interface (GUI) that allows users to interactively access database data.
In RDBMS databases, data are stored in tables in accordance with a database “schema,” which defines storage parameters (metadata) that define structures for each of the tables stored in the database, various data constraints, relationships between tables, and indexes, etc. Generally, the table data are stored in one or more shared storage spaces, each comprising one or more datafiles, although there are some RDBMS databases that store data for each table in a separate file, and others that store all of the data for a given database in a single file. Under Oracle's architecture, these shared storage spaces are called “tablespaces.” Typically, an Oracle database will include a plurality tablespaces (e.g., system, user, rollback, etc.), wherein user data are stored in a plurality of tables in one or more selected tablespaces that may be specifically configured using various configuration parameters during creation or alteration of the tablespaces.
Data are stored in a tablespace in the following manner. First, a plurality of segments are allocated for the tablespace and a datafile is assigned to the tablespace, wherein each segment comprises a plurality of fixed-size (e.g., 2K, 4K or 8K) storage blocks. Each of these storage blocks comprise-multiple operating storage blocks, which are the base unit the operating system uses to define where data is physically stored. These storage blocks are (typically) filled with data in a substantially sequential manner until close to all of the storage space provided by the allocated segments is consumed. At this point, the tablespace must be “extended” with one or more extents (similar to segments) so that additional data may be added. Typically, the data is logically stored in a plurality of rows, wherein each row of data includes a set of data (i.e., record) pertaining to various columns defined for the table the data are stored in. In Oracle, additional “row ID” information that uniquely identifies every row in the database is stored for each row. The row ID comprises a string having an encoded format that can be parsed by Oracle to quickly access the row of data corresponding to the row ID. The row IDs are used by primary key and other types of indexes to speed up queries and sorting operations.
As discussed above, the data are stored in a substantially sequential manner based on the approximate order the data are entered into a database. (For example, Oracle uses a background operation to write blocks of data in response to predetermined conditions (memory full, time interval, etc.), wherein the block writes are performed continuously, although they are slightly asynchronous to when data are actually entered.) As a result, the data for a given table are stored in a somewhat randomized order. For instance, suppose that contact information corresponding to various customers are stored in a table that includes various rows for storing the contact information, such as last name, first name, address, city, state, phone number, etc. As new contact information is entered into the database, data pertaining to a new row will be written to the initial datafile (or currently active datafile) corresponding to the tablespace the table is assigned to. Since the contact information will usually not be entered in a sequential manner (e.g., alphabetically be last name), a sequential row-by-row examination of the data in the datafile will appear to not follow any predetermined ordering scheme, as illustrated by a randomized set of all contact records 18 in database 10.
In contrast to the foregoing data storage scheme, database users (both people operating client machines and internal software components that perform batch operations) typically desire to retrieve and/or view data in an ordered (i.e., sorted) manner. To meet this criteria, databases provide various solutions for providing requested data in a sorted configuration based on specified sort criteria, such as last name sorted alphabetically. In general, the data corresponding to a requested data set (i.e., query) must first be retrieved to a temporary sort space (comprising physical storage and/or memory space), whereupon the data are sorted corresponding to the predefined ordering scheme, and then provided to an application, applet, or module that is used to present the data to the user (or provide the data to an internal software component user).
In non-indexed queries (i.e., a query that doesn't use a primary key or other index for a table or set of related tables), a full table scan must be performed to retrieve appropriate rows specified by the query (the result set or recordset), and then a memory or disk sort must be performed on the result set prior to providing the data to the user as an “open” data set. For index-based queries, one or more indexes are used to identify the appropriate rows of data in the result set, whereupon mapping information provided by the index(es) (e.g., row IDs) that identifies where those rows of data are located is used to retrieve appropriate rows of data meeting the query search criteria. In some instances, the rows of data are retrieved in a sorted order. In other cases, this is not possible or impractical, and the rows of data are first retrieved and then sorted in the sort space.
For example, suppose a user wanted to view all of the contacts stored in database 10 sorted by their last names. In the two-tier architecture illustrated in FIG. 1, the user is running a client-side application 20 on client machine 14, which includes business logic 22 that generates a query 24 in response to a user request to view all of the contract records and submits query 24 to database 10 via network 16. Assuming the data are stored in a single CONTACT table that includes a LASTNAME column, the corresponding SQL query would look like:
SELECT*FROM CONTACTS
ORDER BY LASTNAME;
Generally, a request for all records in a table will cause a table scan to be performed, regardless of whether or not indexes are used. The rows in the contact table (stored as randomized set of contact records 18) are retrieved and sent to a sort area 26, which depending on the size of the sort may comprise a memory space and/or a temporary disk storage space. The rows are then sorted, using a sort process 28 and then provided back to client-side application 20 as a sorted full recordset 30.
The retrieval and sorting of data can be very time-consuming, particularly if the sorting has to be performed using a disk sort. This is often the case for large result sets, which may involve millions, or even 10's of millions of rows of data. In such instances, even an optimized query using indexes might take 10's of minutes or even hours. Furthermore, such queries are resource (CPU and memory) intensive, often slowing access by other users to a substantial halt.
In response to this problem, RDBMS database vendors have developed various schemes to provide data to GUI-based applications that function as client-side front ends to enable users to access and view the database data. Typically, these schemes are centered around “virtual” lists of data that are either continually loaded using background operations or implemented through the use of built-in SQL commands, such as the TOP command. These conventional schemes are limited in their ability to retrieve and provide data to end users when large data sets are queried.
In continuance of the foregoing example, in one conventional embodiment, sorted full recordset 30 is logically stored as a virtual list that is managed by a virtual list manager 32. (In instances in which a very large number of rows are to be accessed, virtual list manager 32 may actually manage smaller sets of all of those rows that are continually being retrieved from database 10 using a background process—for convenience, a complete data set is illustrated in FIG. 1.) Virtual list manager 32 interacts with a GUI manager 34 that is used to provide display information to a video subsystem that includes a display device (e.g., monitor) 36 to generate a GUI display screen 38 that enables the user to selectively view contact data stored in database 10. Typically, virtual list manager 32 will implement a filter 40 to pass a small number of rows of data to GUI manager 34, which then displays the rows in GUI display screen 38.
The conventional schemes for retrieving and display data from very large data sets have several problems. Most notably, they are very slow, consume a great deal of resources, and in the case of huge data sets, may not even be possible to perform in a useable manner.
SUMMARY OF THE INVENTION
The present invention provides a method and system for accessing and displaying data records that are retrieved from large data sets in a manner that dramatically reduces the latency problems common to conventional access schemes. In accordance with the method, a large data set is logically partitioned into a plurality of “buckets” by defining “soft” boundaries corresponding to a sort order in which data records are presented to users or provided to internal users for purposes like batch processing. When a request to retrieve a data record or group of related records is received, e.g., in response to a user navigation event, the bucket of data the record or at least a portion of the group of related records is stored in is identified, and those records are retrieved from the database sorted in accordance with the sort order, and provided to the requester.
Data defining a relative position of each boundary point in the sort order are built and stored as a set of boundary markers, preferably during an administative process. Each of these boundary markers comprises a unique set of data corresponding to two or more columns in a database table on which the sort order is based, and in which at least a base portion of the data records are stored. Preferably, the chosen columns will already be indexed or have indexes applied to them subsequent to building the boundaries. In some instances, a pair of columns are used, such as columns that store first and last name data. When very large data sets are used, it may be necessary to use a tertiary set of column data, or even a higher number of columns. The boundary markers are stored in sets, preferably in accordance with the sort order they correspond to.
In response to a request to retrieve a data record or group of related data records, such as a list of contacts from a contact table or object having the same first and last name, the boundary markers are searched to determine which bucket the desired data record or records are contained in. A query is then formulated based on the boundary markers to retrieve the records corresponding to the bucket.
In a typical n-tier implementation of the invention, the software components for implementing the invention are run on an application server that receives various data requests and navigation events from various clients. Upon retrieving an appropriate bucket of data records in response to such requests, all or a viewable subset (viewset) of the records are provided to the clients to enable users to access the data record or group of related data records. In instances in which viewsets are used, new viewsets of data records are provided to the client in response to navigation events, such as scrolling. In response to a navigation event that seeks data records that are not contained in a current bucket, a new bucket in which the requested data records are contained is identified and retrieved, and a new viewset from the new bucket is presented to the user. In this manners, users can search or browse data records retrieved from data sets that may comprise 10's of millions or even 100's of millions of records, wherein new viewsets are typically provided to the user with subsecond responses.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
FIG. 1 is a schematic diagram illustrating how data records contained in a full data set are retrieved using conventional data access schemes found in the prior art;
FIG. 2 is a schematic diagram conceptually illustrating how data records from large data sets are retrieved in “buckets” in accordance with the present invention;
FIG. 3 is an exemplary architecture and data flow diagram corresponding to a two-tier implementation of the invention;
FIG. 4 is an exemplary architecture and data flow diagram corresponding to a three-tier implementation of the invention;
FIG. 5 is a flowchart illustrating the logic used by the invention when generating bucket boundaries;
FIG. 6 is a flowchart illustrating the logic used by the invention in response to user navigation events; and
FIG. 7 is a schematic diagram of a computer system that may be used for the client machines and servers in the architectures of FIGS. 3 and 4.
DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS
A system and method for retrieval and displaying data from large datasets is described in detail herein. In the following description, numerous specific details are discussed to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of various embodiments of the invention.
Reference throughout this specification to “One embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The present invention provides a mechanism for accessing very large data sets, wherein the data sets are logically broken into manageable “buckets” of data, greatly reducing the data access latencies inherent in the prior art. The mechanism and its associated operations are referred to herein a large data volume (LDV) handling or simply LDV. LDV works by dividing a query that would normally be used to retrieve a large number of rows of data (e.g., contact data from a contact table or set of related tables) into a number of smaller queries, wherein each query specifies a set of data contained in a data bucket. A bucket is defined as all of the data returned from a corresponding SQL query, set with a pre-defined sort order, in which upper and lower boundaries are used to specify the range of data contained in the bucket.
Logically, all of the rows of data for a target object (e.g., a table or set of tables containing related data) are sorted corresponding to a specified sort order and then divided into a plurality of contiguous buckets. This configuration is illustrated in FIG. 2, in which a sorted contact table 50 is divided into a plurality of buckets 52 defined by boundaries 54. As used herein, the various data in a “base” table are logically shown as being divided alphabetically into buckets and corresponding bucket boundaries. This is merely illustrative to show the buckets and boundaries are generated corresponding to a sort order, as explained below, LDV provides various schemes for dividing data sets into buckets. Upon a request from an LDV user 56 to retrieve (for batch processing) or view (for end-users) a particular record or group of related records, LDV determines an appropriate bucket that record or group is stored in and retrieves the bucket. An appropriate query 58 is then formulated and submitted to database 10 for execution, returning a data set comprising a bucket 60 that includes data contained within a lower boundary 62 and upper boundary 64 specified by the query. The difference between the lower and upper boundaries define a size of the bucket 66. The bucket is then returned to LDV user 56, whereupon it may be used in a batch process or filtered by a client application that enables users to request and view data stored in database 10. In further response to various events, such as a user requesting to scroll past the end of the records contained in a current bucket or request to seek a record not contained in the current bucket, an appropriate query is formulated to retrieve a new bucket containing the desired records. In this manner, LDV operates like a sliding window 68 over the full set of data records contained in sorted contacts table 50.
An exemplary architecture 70 illustrating one implementation of the invention is shown in FIG. 2. Architecture 70 includes several components that share the same root reference numerals (e.g., 10 and 10A, etc.) as components depicted in FIG. 1 and discussed above; it will be understood that these components perform substantially similar operations in both architectures. Architecture 70 corresponds to a two-tier environment in which a client application 72 running on a client machine 14 enables a user to selectively view contact data stored in a datafile corresponding to a CONTACT table in database 10 as a randomized set of contact records 18A. It will be understood that in addition to the use of a single table for storing contact information, such information may be stored in a set of related tables, as will be recognized by those skilled in the database arts.
As discussed above, LDV works by retrieving buckets of data rather than an entire data set. The data contained in the buckets are logically bounded by boundaries, the generation of which are described below. In one embodiment, various boundary markers are stored in database 10A, such as depicted by a set of contact boundary markers 74. Optionally, all or a portion of the boundary markers may be stored in one or more files on database server 12A, or in one or more files on client machine 14A.
In one embodiment, various LDV operations are implemented by an LDV class 76 and an LDV helper class 78 provided by client application 72. As will be understood by those skilled in the art, client application 72 may comprise a single software module, or a set of modules that interact using appropriate application program interfaces (API's) 80. In response to user input 82, a business logic component 84 determines what data records a user of client machine 14 desires to view. Business logic component 84 then provides this information to LDV class 76, which generates a first query 86 to retrieve an appropriate set of boundary marker data 88 corresponding to the object containing the data the user requested (in this instance the CONTACT table).
Upon receiving the boundary marker data, LDV class 76 parses the data to determine an appropriate pair of lower and upper boundaries corresponding to the bucket of data the requested data is contained in. For example, suppose that a user desires to search for the contacts with the last name of “Smith.” Further suppose that a set of boundary data has been generated for the contact table that uses the LASTNAME column, and includes respective boundaries corresponding to the last record for selected letters C, H, K Q, U, and Z, as illustrated in FIG. 3. Accordingly, the boundary data are parsed to determine the nearest boundaries encompassing the row(s) of data containing a last name of “Smith,” which in this case yield the boundaries at Q (lower boundary) and U (upper boundary) corresponding to a bucket 90. As a result, the following SQL query is formulated as a query 92 and submitted to database 10 to retrieve the bucket:
SELECT*FROM CONTACTS
WHERE LASTNAME>Q AND LASTNAME<=U
ORDER BY LASTNAME;
Generally, the columns used for generating boundaries will have a corresponding database index. As a result, in response to the query, database 10 uses a boundary index-based retrieval process 92 that retrieves all of the records for the CONTACTS table contained within the upper and lower boundaries for bucket 90, as depicted by all records having a LASTNAME beginning with the letters R, S, T, and U. The retrieved data are forwarded to sort area 26, whereupon they are sorted by sort process 28 and returned to client machine 14 as an open set comprising sorted bucket data 94. LDV class 76 (or another client-side module) then implements a display filter 96 that is used to determine a current subset of the sorted data in the bucket to be displayed (the “viewset) and passes the viewset to a GUI manager 96, which drives the video subsystem to display data pertaining to the records in the viewset within a GUI display screen 38A on display device 36.
Another exemplary architecture 100 for implementing the invention in an n-tier environment is shown in FIG. 4. As depicted, architecture 100 depicts a well-known 3-tier architecture in which an application server 102 “sits between” database server 12 and a plurality of client machines 14. As well be recognized by those skilled in the art, one or more additionally tiers (e.g., a web server tier) can be added to the illustrated 3-tier architecture.
Application server 102 hosts various software modules that are used to perform various operations, including delivering data to client machines 14 in response to corresponding requests for data from users of the machines. These components include an LDV class 76, and LDV helper class 78, which interact with a business logic component 104 via an API 106. In one embodiment, client machines 14 host a client application 108 that includes a business logic component 110 and a GUI manager 112.
In general, the various aspects of the LDV technology are performed in a substantially similar manner in both of architectures 70 and 100. However, the interaction between the various other software components differ, as follows. A typical request process begins with user input 114 that is received by business logic 110. The business logic processes the user input to determine what data the user desires and submits a corresponding data request 116 to application server 102. In one embodiment, business logic components 110 and 104 comprises a client-server set, that are configured to interact with one another using a common protocol. Accordingly, data requests comprising “native” data pertaining to the common protocol are sent from business logic component 110 to business logic component 104. In another embodiment, client application 108 comprises a web browser, and data request 116 comprises HTML (hyper-text markup language) data.
Typically, data request 116 will be received by business logic 104. Optionally, data request 116 will be received by LDV class 76. The LDV class and LDV helper class than interact with database 10 in a similar manner to that described above, returning a sorted data bucket 94 to the application server. The LDV class and/or business logic 104 then determine what data to filter, ultimately sending the data from business logic component 104 to client machine 14 as filtered display data 118. As before, the filtered display data may be in a native format, or comprise HTML data. Filtered display data is then passed by business logic 110 to GUI manager 112, which drives the client machine's video subsystem to generate a GUI display screen 36B on display device 36.
Building Buckets
As discussed above, LDV retrieves selected buckets of data based on data requests from users rather than entire data sets. The data contained in a given bucket is defined by the boundaries for the bucket. Generally, the boundaries are pre-determined through an administrative process that is somewhat similar to building indexes on a database. Alternately, the boundaries may be generated at runtime.
LDV boundaries are generated in the following manner. Initially, parameters for the process must be selected. With reference to FIG. 5, this process begins in a block 150 in which a table is selected to apply LDV boundaries to. Typically, these tables will store data that is usually presented to end users in a sorted list format, such as contacts, products, employees, clients, etc. In many instances, the tables will comprise “base” or “parent” tables, while the displayed data will include not only data contained in the base or parent table, but also data stored in one or more child tables that are linked to the base or parent table via a primary key-foreign key relationship. For the purposes of LDV, only the base or parent table needs to be considered when generating LDV boundaries.
Next, in a block 152, allowable sort orders for the table are determined. Generally, the sort orders will be based on the intended use of the data, and in one embodiment will be enforced though the LDV data access classes and subsequently by the graphical user interface. Each sort order to be provided requires its own unique set of boundary markers. For example, suppose the targeted table is a contact table that includes contact data stored in various columns, including LASTNAME, FIRSTNAME, COMPANY, ADDRESS, CITY, STATE, etc. Typically, it will be desired to sort the contact data alphabetically by LASTNAME. In addition, other sort orders may be included, such as alphabetical sorted on COMPANY, CITY, STATE etc. Furthermore, the sort order may be complex, comprising a combination of columns. For example, in one instance the data might be sorted on LASTNAME, then FIRSTNAME, then COMPANY. In another instance, the data may be sorted by COMPANY, then LASTNAME, then FIRSTNAME. The actual combination used will depend on the type of data, the column configurations that data are stored in, and how the data are to be presented to the user (or to be used internally by, e.g., a batch processing component). Finally, the selected column or columns should be columns that are either currently indexed, or will have a database index generated for them prior to using the LDV scheme.
The following operations are then performed for each sort order, as indicated by start and end loop blocks 154 and 156. First, in a block 158 the approximate number of records to be included within each bucket is determined. To determine this value, many considerations are required. First, how will the data generally be requested and used? In a two-tier environment for heavy batch processing, a larger bucket size is acceptable, since the entire data set will often need to be processed and the speed of a two-tier connection is provided. In an n-tier environment for GUI restricted usage, like that used in customer relationship management (CRM) data systems, a small bucket size would be better since only a small amount of the data (e.g., 20-30 rows) is viewed at once. Furthermore, since there are many users who are concurrently access the database in typical data systems of this type, larger bucket sizes should be avoided because of their higher server overhead.
Another consideration is “viewability” of data. As a general rule, the size of the bucket should be scaled inversely with the percentage of the database that the user sees. For example, if users always use selection criteria that restrict their view to half of the records in the database, then the bucket size should be doubled. This is because, if a user has a selection criteria set that eliminates half of the database from their view, then each bucket will contain only one-half of its capacity of records that match the baseline selection criteria (when compared with a baseline condition in which the boundaries are defined as if all records will be access). As a result, the actual number of records returned in the bucket remain substantially constant
Initialized through an administrative operation, the parameters are used by a process that opens the data set sorted in the appropriate order, as provided by a block 160. In practice, data corresponding to only those columns considered in the sort order need be retrieved, rather than the entire records. For example, if the sort order is based on LASTNAME, then FIRSTNAME, the corresponding SQL query might look like:
SELECT LASTNAME, FIRSTNAME FROM CONTACTS
ORDER BY LASTNAME, FIRSTNAME;
The process then loops through the open data set in a block 162, start to finish, defining boundary markers based on data found every N rows, wherein N corresponds to the number of records for the bucket determined in block 158. Depending on the data size and uniqueness of the data in the chose column, a single column may not be unique enough to mark a boundary. In these cases, the scheme enhances the boundary definition through a second and/or tertiary column, incorporating that data into the boundary marker. For example, suppose the LASTNAME, FIRSTNAME, and COMPANY columns are considered for use as boundary markers. In a small data set of approximately 5,000 rows, each contact's last name may be unique (or have limited duplications), so only data corresponding to the LASTNAME column need to be used for the boundary markers. With large data volumes, it may be common for various data values in the LASTNAME column to be repeated more than N times, such that that two or more sequential boundary markers would be identical if only the LASTNAME column was considered. In this instance, the boundary markers should include data from a second column, such as LAST NAME. In extremely large databases, these two columns may also not be sufficient, so data in a tertiary column (e.g., COMPANY) will be necessary. Similarly, this can be taken to a fourth or higher order if necessary to ensure appropriate uniqueness for each boundary marker. Optionally, a unique data column could be used, such as a client ID column or the like.
When complete, the boundary marker data are stored as a set so that it can be easily accessed during production data retrieval situations. For example, these data may be stored in a database table on database 10, on application server 102, or locally on the client machines. In one embodiment, the boundary marker data is stored as a concatenated string within an optimized database table. In accordance with a tertiary marker comprising LASTNAME, FIRSTNAME, and COMPANY data, boundary markers are stored in a concatenated string having the following format:
|Smith˜˜Frank˜˜Boeing|Smith˜˜Gerald˜˜Tyco|Smith˜˜Harold˜˜IBM|  (1)
In this format, each adjacent set of boundary marker data (e.g., “|Smith˜˜Frank ˜˜Boeing|”) corresponds to an adjacent boundary point.
In another embodiment, each set of boundary marker data may be stored as a separate record in a table. Whether an implementation uses the concatenated string format or separate records (or still other configurations that will be recognized by those skilled in the data system arts) will likely be dependent on the relative workload of the database and application servers. If the database server workload is light to moderate, built-in database operations (e.g., SQL and cursor loops) can be used to search for appropriate boundary markers. The advantage of the concatenated string technique is that the entire string can be passed to the application server (which requires very minimal overhead for the database server), and then parsed at the application server rather than having an equivalent operation performed by the database server.
As discussed above, building bucket boundaries for a given data set requires the data set to be opened. Conveniently, opening the data set can be performed using the LDV scheme itself so that the entire data set does not need to be opened (at one time) to complete the administrative process. As a result, building bucket boundaries can be carried on during 7×24 operation of the database, without adversely affecting enterprise applications that access the database. In the event that the very first run of the boundary building process is performed, appropriate seed data (i.e., estimated boundaries that should satisfactorily break up the data) can easily be implemented to define a temporary set of boundaries that may be used to retrieve records from the data set using temporary buckets, whereby the actual boundaries and corresponding markers are determined by looping through each temporary bucket of records.
Using Boundaries to Generate SQL for Data Buckets
Each bucket of data that is retrieved from the database includes all of the records contained within the lower and upper boundaries corresponding to the data bucket. Also, any given record in the entire data set will be contained in only one bucket. Accordingly, when using the boundary marker string discussed above, determination of the boundary markers corresponding to the bucket in which a particular data record is stored is performed by parsing the string until passing the first boundary marker that would be logically ordered prior to the particular data record. For example, if the search was for a “Gilbert Smith,” concatenated string (1) would be parsed until the “|Smith˜˜Gerald˜˜Tyco|” portion of the string was encountered. This represents the lower boundary marker. The next set of boundary marker data corresponds to upper boundary marker. These are used to formulate the SQL statement for the bucket.
In a simple boundary definition that only uses a single column, the SQL will include data in the where clause corresponding to that column. For example, suppose a user is searching on the last name of “Billingsby” in a CONTACT table and the upper and lower boundaries respectively comprise “Benton” and “Bingham.”The corresponding SQL statement would then be:
SELECT*FROM CONTACTS
WHERE LASTNAME>‘Benton’ AND LASTNAME<=‘Bingham’
ORDER BY LASTNAME;
A simple boundary definition, as described above, works well for thousands of rows, but can breakdown when there are many entries with the same value in an extremely large database. An example is a list of names that contains 200,000 Smith's. A more complex algorithm is required for that case. In this instance, the boundary markers will be based on multiple database columns to further break the data into appropriate buckets. In the name example, by using a secondary boundary based on the FIRSTNAME column and a tertiary boundary on the COMPANY column, hundreds of millions of records can be handled since there is a very limited chance of data grouping (i.e., relatively large groups of records having identical values for LASTNAME, FIRSTNAME, and COMPANY). Using the bucket boundary data, the resulting query can be simple to very complex depending on boundaries. A simple bucket would be created when there are many Fred Smiths in the list:
SELECT*FROM CONTACTS
WHERE LASTNAME=‘Smith’ AND FIRSTNAME=‘Fred’ AND
COMPANY>=‘Avis’ AND COMPANY<‘Tyco’
ORDER BY COMPANY;
A more complex query results when a bucket spans the first name:
SELECT*FROM CONTACTS
WHERE LASTNAME=‘Smith’ AND FIRSTNAME=‘Fred’ AND
COMPANY>=‘Xerox’ OR
LASTNAME=‘SMITH’ AND FIRSTNAME>‘Fred’ AND
FIRSTNAME<‘Funkie’
ORDER BY FIRSTNAME, COMPANY;
An even more complex query can result when the bucket spans a last name as well.
Although the SQL can appear to be quite complex, resulting query plans show very quick isolation of data due to the user of indexed columns according to the specified sort order defined by the corresponding boundary marker set.
Scrolling Through Data Buckets
When implemented, the LDV scheme enables users to navigate through various data screens or “views” during which appropriate buckets are retrieved and data contained in those buckets are displayed in a manner that is transparent to the users. A flowchart illustrating the logic used by the invention to implement this functionality is shown in FIG. 6.
Initially, a query will be performed to retrieve a first bucket of data based on a requested search, as provided by a block 170 in FIG. 6. When a typical data set is opened, the data returned includes a sorted set of all of the data matching the SQL query criteria, with record positioned at the beginning of the data set being marked as the active record. With LDV applied, the returned record set includes all of the data falling within the first bucket, positioned on the first record, and has knowledge that the record set is an “incomplete” set. As the bucket data is navigated (e.g., browsed or scrolled through using various user inputs), appropriate viewsets (i.e., subsets of the bucket that are displayed at any one time to the user) are presented to the user. As used herein, each user navigation input (e.g., page-up, page down, etc.) will generate a new navigation “context” for the user, and an appropriate new viewset will be provided to the user. These operations are depicted by blocks 172 and 174 in the flowchart. In effect, the viewset acts as a sliding window over the sorted records in the bucket. At the same time, the LDV classes monitor regular record set calls corresponding to navigation events such as MoveNext, MovePrevious and EOF, to determine if the user desires to view data that is not contained in the current bucket. If this condition is sensed, as indicated by a YES (TRUE) result to a decision block 176, appropriate boundaries for the new bucket are determined in a block 178. In one embodiment, the current bucket is discarded in a block 180. Optionally, the data corresponding to the current bucket may be maintained in a cache, such as a LRU (least recently used) cache. The choice of whether to discard buckets or use a cache will depend generally depend on the latency characteristics of the bucket retrievals.
Next, in a block 182, the new bucket of data is retrieved from the database, whereupon an appropriate viewset is sent to the client and the active record is positioned to meet the user's request. For example, if the user request data for “Frank Smith” the active record will correspond to the first “Frank Smith” encountered in the sorted list.
Sparsely Populated or Restricted Record Sets
Generally, the entire set is open when the GUI allows for an unfiltered list of data or when a batch process is going to operate on all of the data within a table. However, filters are often applied to the sorted columns of the query in order to return a targeted set of data. Analysis of the additional SQL and monitoring of the retrieved data sets allows the LDV helper class to alter the way it selects bucket boundaries to more optimally return data. For example, if a ‘user’ only requires people with a last name starting with “S”, the helper processing immediately recognizes the additional restriction on the sort order columns and begins its bucket boundaries at the first boundary pair that includes that data. For example, the additional SQL (indicated in bold) for the first bucket might be:
SELECT*FROM CONTACTS
WHERE LASTNAME LIKE ‘S*’
AND LASTNAME=>‘S’ AND
LASTNAME<‘Sampson’
ORDER BY LASTNAME;
If the query is filtered on a column not covered by the LDV scheme or is part of an external table join, the pre-query processing to determine a bucket boundary cannot be made. However, through monitoring of the returned data sets, the LDV helper class can alter the number of buckets returned by one query. In the simplest form, this is accomplished by skipping a boundary definition and moving to the next one. For example, suppose a user selects everyone in New York City from a database that containes data on every person in the United States. This would result, assuming evenly distributed last names throughout the US, in about {fraction (1/25)}th of the expected rows per bucket, since approximaty one in every twenty-five Americans live in New York City. In an attempt to retrieve the preset optimal count per bucket, the helper class would monitor the number of records returned in response to an initial or prior query, and internally increase the number of skipped boundaries in subsequent queries to retrieve new buckets of data as scrolling continues until the general returned amount is close to a desired bucket size. In this example, the helper might observer that the number of records retrived via a first query resulted in only {fraction (1/25)}th of the number of records in the desired bucket size, whereupon the helper may adjust subsequent queries to skip 25 boundaries per re-query. The end result of this extra processing reduces unnecessary trips to the database server during a scrolling operation. This added technology is known as Dynamic Bucketing.
Rapid Seek Mechanism
When displaying a list of records, there are two main functional needs that must be met: The ability to soft seek on a piece of data, and the ability to re-open the application properly positioned within the list. Although trivial with small amounts of data, these become very complex tasks when dealing with hundreds of millions of rows. LDV bucketing information allows these tasks to be accomplished in subsecond time frames.
Using the boundary data, a soft seek forward in the sorted list becomes a simple task. The LDV class finds the appropriate bucket that the search for piece of data will lie in, returns that bucket of data (typically a sub-second call), and positions the active record on the appropriate record using its regular seek process. All the overhead of opening, sorting and seeking through hundreds of millions of rows is avoided.
To restart an application on a certain record, a unique key is usually known—such as the client ID. To take advantage of the boundary data, the application stores the client ID (or other unique identifyier) for the last records that was active and simply needs to query a single row corresponding to the stored client ID to return to the record(very quick). The appropriate sort column data can then be extracted from the record, enabling an appropriate bucket containing the record to be identified, whereupon the record set corresponding to the bucket is retrieved and opened (typically a sub-second call), and the active record is positioned on the record corresponding to the stored client ID.
The present invention provides several advantages over conventional large data set access techniques. Most notably, data access latencies are greatly reduced, often enabling a user to literally scroll through millions of records without requiring the user to wait for data to be retrieved or screens to be updated. The LDV scheme easily scales from small data sets to very large data sets, and enables a large number of concurrent users to leverage its features, while causing minimal additional overhead for both the database server and application servers.
Exemplary Computer System for Practicing the Invention
With reference to FIG. 7, a generally conventional computer 200 is illustrated, which is suitable for use as client machines, application servers, and database servers in connection with practicing the present invention, and may be used for running client and server-side software comprising one or more software modules that implement the various operations of the invention discussed above. Examples of computers that may be suitable for client machines as discussed above include PC-class systems operating the Windows NT or Windows 2000 operating systems, Sun workstations operating the UNIX-based Solaris operating system, and various computer architectures that implement LINUX operating systems. Computer 200 is also intended to encompass various server architectures, as well as computers having multiple processors.
Computer 200 includes a processor chassis 202 in which are mounted a floppy disk drive 204, a hard drive 206, a motherboard 208 populated with appropriate integrated circuits including memory 210 and one or more processors (CPUs) 212, and a power supply (not shown), as are generally well known to those of ordinary skill in the art. It will be understood that hard drive 206 may comprise a single unit, or multiple hard drives, and may optionally reside outside of computer 200. A monitor 214 is included for displaying graphics and text generated by software programs and program modules that are run by the computer. A mouse 216 (or other pointing device) may be connected to a serial port (or to a bus port or USB port) on the rear of processor chassis 202, and signals from mouse 216 are conveyed to the motherboard to control a cursor on the display and to select text, menu options, and graphic components displayed on monitor 214 by software programs and modules executing on the computer. In addition, a keyboard 218 is coupled to the motherboard for user entry of text and commands that affect the running of software programs executing on the computer. Computer 200 also includes a network interface card 220 or built-in network adapter for connecting the computer to a computer network, such as a local area network, wide area network, or the Internet.
Computer 200 may also optionally include a compact disk-read only memory (CD-ROM) drive 222 into which a CD-ROM disk may be inserted so that executable files and data on the disk can be read for transfer into the memory and/or into storage on hard drive 206 of computer 200. Other mass memory storage devices such as an optical recorded medium or DVD drive may be included. The machine instructions comprising the software that causes the CPU to implement the functions of the present invention that have been discussed above will likely be distributed on floppy disks or CD-ROMs (or other memory media) and stored in the hard drive until loaded into random access memory (RAM) for execution by the CPU. Optionally, all or a portion of the machine instructions may be loaded via a computer network.
Although the present invention has been described in connection with a preferred form of practicing it and modifications thereto, those of ordinary skill in the art will understand that many other modifications can be made to the invention within the scope of the claims that follow. Accordingly, it is not intended that the scope of the invention in any way be limited by the above description, but instead be determined entirely by reference to the claims that follow.

Claims (23)

1. A computer implemented method for accessing data records from a large set of records stored in a database, comprising:
Defining a plurality of boundaries to logically partition the large set of records into a plurality of buckets in accordance with a predefined sort order, said sort order based on data stored in at least two columns of a database table in which at least a base portion of each data record is stored;
in response to a request to retrieve a data record;
determining a bucket in which the data record is contained;
retrieving a subset of the large set of records from the database corresponding to data records contained in the bucket; and
searching the plurality of boundaries for a sequentially-adjacent pair of boundaries as defined by the predetermined sort order that bound the data record in the request;
generating a query including a search clause defining a data set that is bounded by the sequentially-adjacent pair of boundaries; and
executing the query against the database.
2. The method of claim 1, further comprising storing a boundary marker for each boundary comprising data pertaining to said at least two columns of the database table.
3. The method of claim 2, wherein each boundary marker comprises first, second and tertiary data values pertaining to data values stored in respective columns in the database table, said data values pertaining to an actual record or pseudo record to which that boundary marker corresponds.
4. The method of claim 2, wherein the boundary markers are stored as a concatenated list.
5. The method of claim 2, wherein the boundary markers are stored as a set of respective records in the database.
6. The method of claim 1, wherein the subset of data records corresponding to the bucket is provided to a batch processing component, further comprising retrieving other buckets of data and providing them to the batch processing component until data records corresponding to the entire large set of data records have been provided to the batch processing component.
7. The method of claim 1, further comprising:
monitoring user navigation events;
sending data corresponding to a current viewset to a client, said viewset comprising a subset of data records in the data bucket and corresponding to a current user navigation context; and
in response to a user navigation event corresponding to a request to navigate to a new navigation context, updating the current viewset with a new viewset by sending a new subset of data records in the data bucket, wherein the new viewset corresponds to the new navigation context.
8. The method of claim 7, further comprising:
determining if a user navigation event corresponds to a request to view one or more data records that are not in a current data bucket;
and, in response to such a determination,
formulating a query request corresponding to a new data bucket that contains the data records that are requested to be viewed;
executing the query to retrieve data records corresponding to the new data bucket from the database; and
providing data corresponding to a new viewset comprising a subset of the data records in the new data bucket to the client.
9. The method of claim 7, wherein the client is a web browser and the data corresponding to the viewsets are sent as HTML data over a network to a client machine on which the web browser is running.
10. A machine-readable media on which a plurality of instructions are stored that when executed by a computing machine perform the operations of:
receiving a request to retrieve data records from a database on which a large set of records are stored;
searching a predefined set of boundaries that logically partition the large set of records into a plurality of buckets in accordance with a predefined sort order that is based on data stored in at least two columns of a database table in which at least a base portion of each data record is stored;
determining a bucket in which the data record is contained;
formulating a query to retrieve a subset of the large set of records, said subset corresponding to the data records contained in the bucket;
submitting the query to the database for execution;
receiving the subset of records contained in the bucket from the database;
searching the plurality of boundaries for a sequentially-adjacent pair of boundaries as defined by the predetermined sort order that bound the data record in the request; and
formulating the query to include a search clause defining a data set that is bounded by the sequentially-adjacent pair of boundaries.
11. The machine-readable media of claim 10, wherein execution of the instructions further performs the operations of:
providing the subset of data records corresponding to the bucket to a batch processing component running on the computing machine or on another computing machine linked in communication with the machine;
iteratively formulating and submitting queries to the database to retrieve, other buckets of data and providing them to the batch processing component until data records corresponding to the entire large set of data records have been provided to the batch processing component.
12. The machine-readable media of claim 10, wherein execution of the instructions further performs the operations of:
monitoring user navigation events received from a client;
sending data corresponding to a current viewset to the client, said viewset comprising a subset of data records in the data bucket and corresponding to a current user navigation context; and
in response to a user navigation event corresponding to a request to navigate to a new navigation context, updating the current viewset with a new viewset by sending a new subset of data records in the data bucket, wherein the new viewset corresponds to the new navigation context.
13. The machine-readable media of claim 12, wherein execution of the instructions further performs the operations of:
determining if a user navigation event corresponds to a request to view one or more data records that are not in a current data bucket;
and, in response to such a determination,
formulating a query request corresponding to a new data bucket that contains the data records that are requested to be viewed;
executing the query to retrieve data records corresponding to the new data bucket from the database; and
providing data corresponding to a new viewset comprising a subset of the data records in the new data bucket to the client.
14. The machine-readable media of claim 12, wherein the client is a web browser and the data corresponding to the viewsets are sent as HTML data over a network to a client machine on which the web browser is running.
15. The machine-readable media of claim 10, wherein execution of the instructions further performs the operations of:
monitoring a number of records returned by a query that is submitted to the database to retrieve the bucket of data records; and
altering the query to dynamically adjust the size of subsequent buckets based on the number of records returned by a prior query relative to a number of records contained in a bucket having a desired size.
16. The machine-readable media of claim 15, wherein the size of the subsequent buckets are adjusted by skipping a determined number of boundaries.
17. A computer system comprising:
a memory in which a plurality of machine instructions are stored;
a network interface to link the computer in communication with a database server; and
a processor, coupled to the memory and the network interface, to execute the plurality of machine instructions to cause the computer system to perform the operations of:
receiving a request to retrieve data records from a database hosted by the database server on which a large set of records are stored;
searching a predefined set of boundaries that logically partition the large set of records into a plurality of buckets in accordance with a predefined sort order that is based on data stored in at least two columns of a database table in which at least a base portion of each data record is stored;
determining a bucket in which the data record is contained;
formulating a query to retrieve a subset of the large set of records, said subset corresponding to the data records contained in the bucket;
submitting the query over the network interface to the database server for execution;
receiving the subset of records contained in the bucket from the database server;
searching the plurality of boundaries for a sequentially-adjacent pair of boundaries as defined by the predetermined sort order that bound the data record in the request; and
formulating the query to include a search clause defining a data set that is bounded by the sequentially-adjacent pair of boundaries.
18. The computer system of claim 17, wherein execution of the machine instructions further performs the operations of:
providing the subset of data records corresponding to the bucket to a batch processing component running on the computing machine or on another computing machine linked in communication with the machine;
iteratively formulating and submitting queries to the database to retrieve, other buckets of data and providing them to the batch processing component until data records corresponding to the entire large set of data records have been provided to the batch processing component.
19. The computer system of claim 17, wherein the computer system is linked in communication with a client machine via the network interface and wherein execution of the machine instructions further performs the operations of:
monitoring user navigation events received from the client machine;
sending data corresponding to a current viewset to the client machine, said viewset comprising a subset of data records in the data bucket and corresponding to a current user navigation context; and
in response to a user navigation event corresponding to a request to navigate to a new navigation context, updating the current viewset with a new viewset by sending a new subset of data records in the data bucket to the client machine, wherein the new viewset corresponds to the new navigation context.
20. The computer system of claim 19, wherein execution of the machine instructions further performs the operations of:
determining if a user navigation event corresponds to a request to view one or more data records that are not in a current data bucket;
and, in response to such a determination,
formulating a query request corresponding to a new data bucket that contains the data records that are requested to be viewed;
submitting the query request to the database server;
receiving corresponding to the new data bucket from the database server in response to the query request; and
providing data corresponding to a new viewset comprising a subset of the data records in the new data bucket to the client machine.
21. The computer system of claim 19, wherein a web browser is running on the client machine and the data corresponding to the viewsets are sent as HTML data via the network interface to the client machine.
22. The computer system of claim 17, wherein execution of the machine instructions further performs the operations of:
monitoring a number of records returned by a query that is submitted to the database server to retrieve the bucket of data records; and
altering the query to dynamically adjust the size of subsequent buckets based on the number of records returned by a prior query relative to a number of records contained in a bucket having a desired size.
23. The computer system of claim 22, wherein the size of the subsequent buckets are adjusted by skipping a determined number of boundaries.
US10/025,061 2001-12-18 2001-12-18 Method and system for access and display of data from large data sets Expired - Lifetime US6907422B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/025,061 US6907422B1 (en) 2001-12-18 2001-12-18 Method and system for access and display of data from large data sets
US11/101,752 US7899844B2 (en) 2001-12-18 2005-04-08 Method and system for access and display of data from large data sets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/025,061 US6907422B1 (en) 2001-12-18 2001-12-18 Method and system for access and display of data from large data sets

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/101,752 Continuation US7899844B2 (en) 2001-12-18 2005-04-08 Method and system for access and display of data from large data sets

Publications (1)

Publication Number Publication Date
US6907422B1 true US6907422B1 (en) 2005-06-14

Family

ID=34632396

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/025,061 Expired - Lifetime US6907422B1 (en) 2001-12-18 2001-12-18 Method and system for access and display of data from large data sets
US11/101,752 Expired - Lifetime US7899844B2 (en) 2001-12-18 2005-04-08 Method and system for access and display of data from large data sets

Family Applications After (1)

Application Number Title Priority Date Filing Date
US11/101,752 Expired - Lifetime US7899844B2 (en) 2001-12-18 2005-04-08 Method and system for access and display of data from large data sets

Country Status (1)

Country Link
US (2) US6907422B1 (en)

Cited By (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050004936A1 (en) * 2003-07-03 2005-01-06 Oracle International Corporation Fact table storage in a decision support system environment
US20050240559A1 (en) * 2004-04-22 2005-10-27 International Business Machines Corporation Framework for retrieval and display of large result sets
US20050257166A1 (en) * 2004-05-11 2005-11-17 Tu Edgar A Fast scrolling in a graphical user interface
US20060020579A1 (en) * 2004-07-22 2006-01-26 Microsoft Corporation System and method for graceful degradation of a database query
US20060041835A1 (en) * 2004-08-19 2006-02-23 International Business Machines Corporation User-controlled web browser table reduction
US7051041B1 (en) * 2002-10-21 2006-05-23 Hewlett-Packard Development Company, L.P. Simplified relational database extension to DBM hash tables and method for using same
US20060136246A1 (en) * 2004-12-22 2006-06-22 Tu Edgar A Hierarchical program guide
US20060143563A1 (en) * 2004-12-23 2006-06-29 Sap Aktiengesellschaft System and method for grouping data
US20060224604A1 (en) * 2005-04-05 2006-10-05 Landsman Richard A Content-descriptive user interface correlation for page navigation systems
US20060242119A1 (en) * 2005-04-20 2006-10-26 Bea Systems, Inc. SQL-style filtered rowset
US20060242120A1 (en) * 2005-04-20 2006-10-26 Bea Systems, Inc. Sorted Rowset
US20060242100A1 (en) * 2005-04-20 2006-10-26 Bea Systems, Inc. Shared rowset
US20070073634A1 (en) * 2005-09-23 2007-03-29 Chicago Mercantile Exchange Non-indexed in-memory data storage and retrieval
US20070139443A1 (en) * 2005-12-12 2007-06-21 Sonny Computer Entertainment Inc. Voice and video control of interactive electronically simulated environment
US20080077656A1 (en) * 2002-09-06 2008-03-27 Oracle International Corporation Method and apparatus for a report cache in a near real-time business intelligence system
US20080114733A1 (en) * 2006-11-14 2008-05-15 Microsoft Corporation User-structured data table indexing
US7412481B2 (en) 2002-09-16 2008-08-12 Oracle International Corporation Method and apparatus for distributed rule evaluation in a near real-time business intelligence system
US7454423B2 (en) 2002-09-06 2008-11-18 Oracle International Corporation Enterprise link for a software database
US20080288527A1 (en) * 2007-05-16 2008-11-20 Yahoo! Inc. User interface for graphically representing groups of data
US20080294595A1 (en) * 2007-05-22 2008-11-27 Yahoo! Inc. Visual interface to indicate custom binning of items
US20080307189A1 (en) * 2007-06-11 2008-12-11 Microsoft Corporation, Data partitioning via bucketing bloom filters
US20080313161A1 (en) * 2007-06-12 2008-12-18 Microsoft Corporation Discosql: distributed processing of structured queries
WO2009042941A1 (en) * 2007-09-28 2009-04-02 Initiate Systems, Inc. Method and system for analysis of a system for matching data records
US20090089071A1 (en) * 2007-10-02 2009-04-02 Chicago Mercantile Exchange, Inc. Compressed non-indexed data storage
US20090187511A1 (en) * 2005-09-23 2009-07-23 Chicago Mercantile Exchange Inc. Live alerts
US20090299914A1 (en) * 2005-09-23 2009-12-03 Chicago Mercantile Exchange Inc. Publish and Subscribe System Including Buffer
US7640264B1 (en) * 2005-09-23 2009-12-29 Emc Corporation System and methods for querying a report database
US20090327319A1 (en) * 2008-06-25 2009-12-31 International Business Machines Corporation Hybrid data model and user interaction for data sets in a user interface
US7668917B2 (en) 2002-09-16 2010-02-23 Oracle International Corporation Method and apparatus for ensuring accountability in the examination of a set of data elements by a user
US7739229B2 (en) 2007-05-22 2010-06-15 Yahoo! Inc. Exporting aggregated and un-aggregated data
US20100192087A1 (en) * 2009-01-26 2010-07-29 Thomas Stanton Brugler System, method, and computer program product for generating a representative view for a multipage compilation of information
US20100198856A1 (en) * 2009-02-03 2010-08-05 Honeywell International Inc. Method to assist user in creation of highly inter-related models in complex databases
US20110010346A1 (en) * 2007-03-22 2011-01-13 Glenn Goldenberg Processing related data from information sources
US7904823B2 (en) 2003-03-17 2011-03-08 Oracle International Corporation Transparent windows methods and apparatus therefor
US7912899B2 (en) 2002-09-06 2011-03-22 Oracle International Corporation Method for selectively sending a notification to an instant messaging device
US7941542B2 (en) 2002-09-06 2011-05-10 Oracle International Corporation Methods and apparatus for maintaining application execution over an intermittent network connection
US7945846B2 (en) 2002-09-06 2011-05-17 Oracle International Corporation Application-specific personalization for data display
US8122056B2 (en) 2007-05-17 2012-02-21 Yahoo! Inc. Interactive aggregation of data on a scatter plot
US8165993B2 (en) 2002-09-06 2012-04-24 Oracle International Corporation Business intelligence system with interface that provides for immediate user action
US20120117079A1 (en) * 2006-10-05 2012-05-10 Splunk Inc. Time series search engine
US8255454B2 (en) 2002-09-06 2012-08-28 Oracle International Corporation Method and apparatus for a multiplexed active data window in a near real-time business intelligence system
US8321393B2 (en) 2007-03-29 2012-11-27 International Business Machines Corporation Parsing information in data records and in different languages
US8321383B2 (en) 2006-06-02 2012-11-27 International Business Machines Corporation System and method for automatic weight generation for probabilistic matching
US20130006921A1 (en) * 2003-02-07 2013-01-03 Bruce Wayne Britton Method For Transferring Data into Database Systems
US8356009B2 (en) 2006-09-15 2013-01-15 International Business Machines Corporation Implementation defined segments for relational database systems
US8359339B2 (en) 2007-02-05 2013-01-22 International Business Machines Corporation Graphical user interface for configuration of an algorithm for the matching of data records
US8370355B2 (en) 2007-03-29 2013-02-05 International Business Machines Corporation Managing entities within a database
US8370366B2 (en) 2006-09-15 2013-02-05 International Business Machines Corporation Method and system for comparing attributes such as business names
US8402095B2 (en) 2002-09-16 2013-03-19 Oracle International Corporation Apparatus and method for instant messaging collaboration
US8417702B2 (en) 2007-09-28 2013-04-09 International Business Machines Corporation Associating data records in multiple languages
US8423514B2 (en) 2007-03-29 2013-04-16 International Business Machines Corporation Service provisioning
US8429220B2 (en) 2007-03-29 2013-04-23 International Business Machines Corporation Data exchange among data sources
US8510338B2 (en) 2006-05-22 2013-08-13 International Business Machines Corporation Indexing information about entities with respect to hierarchies
US8589415B2 (en) 2006-09-15 2013-11-19 International Business Machines Corporation Method and system for filtering false positives
US8713434B2 (en) 2007-09-28 2014-04-29 International Business Machines Corporation Indexing, relating and managing information about entities
US8793292B1 (en) * 2009-10-23 2014-07-29 Lockheed Martin Corporation System and method for computer simulations
US10019496B2 (en) 2013-04-30 2018-07-10 Splunk Inc. Processing of performance data and log data from an information technology environment by using diverse data stores
US20180218037A1 (en) * 2017-01-31 2018-08-02 Splunk Inc. Using an inverted index in a pipelined search query to determine a set of event data that is further limited by filtering and/or processing of subsequent query pipestages
US10225136B2 (en) 2013-04-30 2019-03-05 Splunk Inc. Processing of log data and performance data obtained via an application programming interface (API)
US10229150B2 (en) 2015-04-23 2019-03-12 Splunk Inc. Systems and methods for concurrent summarization of indexed data
US20190087156A1 (en) * 2017-09-21 2019-03-21 International Business Machines Corporation Sorting a table in analytical databases
US10318541B2 (en) 2013-04-30 2019-06-11 Splunk Inc. Correlating log data with performance measurements having a specified relationship to a threshold value
US10346357B2 (en) 2013-04-30 2019-07-09 Splunk Inc. Processing of performance data and structure data from an information technology environment
US10353957B2 (en) 2013-04-30 2019-07-16 Splunk Inc. Processing of performance data and raw log data from an information technology environment
US10387396B2 (en) 2013-01-31 2019-08-20 Splunk Inc. Collection query driven generation of summarization information for raw machine data
US10402384B2 (en) 2012-05-18 2019-09-03 Splunk Inc. Query handling for field searchable raw machine data
US10614132B2 (en) 2013-04-30 2020-04-07 Splunk Inc. GUI-triggered processing of performance data and log data from an information technology environment
US10997191B2 (en) 2013-04-30 2021-05-04 Splunk Inc. Query-triggered processing of performance data and log data from an information technology environment
US11003673B2 (en) 2018-09-06 2021-05-11 Microsoft Technology Licensing, Llc Dynamic level group creation
US11238044B2 (en) 2019-05-06 2022-02-01 International Business Machines Corporation Candidate data record prioritization for match processing
US11379530B2 (en) * 2017-01-31 2022-07-05 Splunk Inc. Leveraging references values in inverted indexes to retrieve associated event records comprising raw machine data
US11397715B2 (en) 2019-07-31 2022-07-26 International Business Machines Corporation Defining indexing fields for matching data entities
US20220391059A1 (en) * 2020-08-25 2022-12-08 Beijing Bytedance Network Technology Co., Ltd. Method and apparatus for displaying active friend information, electronic device, and storage medium
US11789950B1 (en) * 2020-10-19 2023-10-17 Splunk Inc. Dynamic storage and deferred analysis of data stream events
US11809384B2 (en) * 2017-03-06 2023-11-07 Microsoft Technology Licensing, Llc Optimized data storage for fast retrieval

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8095531B2 (en) * 2006-10-03 2012-01-10 Salesforce.Com, Inc. Methods and systems for controlling access to custom objects in a database
US20100031167A1 (en) * 2008-08-04 2010-02-04 Alexander Roytman Browser-based development tools and methods for developing the same
JP5230184B2 (en) * 2007-12-12 2013-07-10 三菱電機株式会社 SEARCH DEVICE, SEARCH METHOD, AND SEARCH PROGRAM
US20120166402A1 (en) * 2010-12-28 2012-06-28 Teradata Us, Inc. Techniques for extending horizontal partitioning to column partitioning
CN102857483B (en) 2011-06-30 2016-06-29 国际商业机器公司 Prefetch the method for data, equipment and device
US10664474B1 (en) 2013-03-15 2020-05-26 Progress Software Corporation Query system
US10108649B2 (en) * 2014-02-25 2018-10-23 Internatonal Business Machines Corporation Early exit from table scans of loosely ordered and/or grouped relations using nearly ordered maps
CN104021161B (en) * 2014-05-27 2018-06-15 华为技术有限公司 A kind of clustering storage method and device
US10489366B2 (en) * 2017-01-27 2019-11-26 Salesforce.Com, Inc. Change data capture using nested buckets
US10565202B2 (en) * 2017-02-08 2020-02-18 International Business Machines Corporation Data write/import performance in a database through distributed memory
US11061918B2 (en) 2017-04-05 2021-07-13 Splunk Inc. Locating and categorizing data using inverted indexes
US10853399B2 (en) 2017-04-05 2020-12-01 Splunk Inc. User interface search tool for locating and summarizing data
US11106713B2 (en) * 2017-04-05 2021-08-31 Splunk Inc. Sampling data using inverted indexes in response to grouping selection
US11386264B2 (en) * 2018-09-14 2022-07-12 Sap Se Configuring complex tables in a client experience framework

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6014656A (en) * 1996-06-21 2000-01-11 Oracle Corporation Using overlapping partitions of data for query optimization
US6289334B1 (en) * 1994-01-31 2001-09-11 Sun Microsystems, Inc. Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system
US6571282B1 (en) * 1999-08-31 2003-05-27 Accenture Llp Block-based communication in a communication services patterns environment
US6691109B2 (en) * 2001-03-22 2004-02-10 Turbo Worx, Inc. Method and apparatus for high-performance sequence comparison

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5729730A (en) * 1995-03-28 1998-03-17 Dex Information Systems, Inc. Method and apparatus for improved information storage and retrieval system
US5924089A (en) * 1996-09-03 1999-07-13 International Business Machines Corporation Natural language translation of an SQL query
US5926807A (en) * 1997-05-08 1999-07-20 Microsoft Corporation Method and system for effectively representing query results in a limited amount of memory
US6295539B1 (en) * 1998-09-14 2001-09-25 Computer Associates Think, Inc. Dynamic determination of optimal process for enforcing constraints
CA2279359C (en) * 1999-07-30 2012-10-23 Basantkumar John Oommen A method of generating attribute cardinality maps
US6438562B1 (en) * 1999-08-24 2002-08-20 Oracle Corporation Parallel index maintenance
US6546382B1 (en) * 1999-11-03 2003-04-08 Oracle Corporation Finding the TOP N values through the execution of a query
US6421666B1 (en) * 1999-11-08 2002-07-16 Oracle Corporation Mechanism for sharing ancillary data between a family of related functions
US6694324B1 (en) * 1999-12-16 2004-02-17 Ncr Corporation Determination of records with a specified number of largest or smallest values in a parallel database system
US6505189B1 (en) * 2000-06-15 2003-01-07 Ncr Corporation Aggregate join index for relational databases
AU2002222963B2 (en) * 2000-07-13 2007-05-10 Oracle International Corporation Performing spreadsheet-like calculations in a database system
US6879984B2 (en) * 2000-10-05 2005-04-12 Clareos, Inc. Analytical database system that models data to speed up and simplify data analysis
US6795817B2 (en) * 2001-05-31 2004-09-21 Oracle International Corporation Method and system for improving response time of a query for a partitioned database object
US6691099B1 (en) * 2001-05-31 2004-02-10 Oracle International Corporation Method and system for histogram determination in a database
US6732085B1 (en) * 2001-05-31 2004-05-04 Oracle International Corporation Method and system for sample size determination for database optimizers
US7328212B2 (en) * 2001-05-31 2008-02-05 Oracle International Corporation Generalized method for modeling complex ordered check constraints in a relational database system
US7499907B2 (en) * 2001-10-12 2009-03-03 Teradata Us, Inc. Index selection in a database system
US7080072B1 (en) * 2001-11-14 2006-07-18 Ncr Corp. Row hash match scan in a partitioned database system
US7035851B1 (en) * 2001-12-07 2006-04-25 Ncr Corp. Reorganization of database records using partition merging

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6289334B1 (en) * 1994-01-31 2001-09-11 Sun Microsystems, Inc. Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system
US6014656A (en) * 1996-06-21 2000-01-11 Oracle Corporation Using overlapping partitions of data for query optimization
US6571282B1 (en) * 1999-08-31 2003-05-27 Accenture Llp Block-based communication in a communication services patterns environment
US6691109B2 (en) * 2001-03-22 2004-02-10 Turbo Worx, Inc. Method and apparatus for high-performance sequence comparison

Cited By (154)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080077656A1 (en) * 2002-09-06 2008-03-27 Oracle International Corporation Method and apparatus for a report cache in a near real-time business intelligence system
US7945846B2 (en) 2002-09-06 2011-05-17 Oracle International Corporation Application-specific personalization for data display
US8577989B2 (en) * 2002-09-06 2013-11-05 Oracle International Corporation Method and apparatus for a report cache in a near real-time business intelligence system
US8255454B2 (en) 2002-09-06 2012-08-28 Oracle International Corporation Method and apparatus for a multiplexed active data window in a near real-time business intelligence system
US8566693B2 (en) 2002-09-06 2013-10-22 Oracle International Corporation Application-specific personalization for data display
US7454423B2 (en) 2002-09-06 2008-11-18 Oracle International Corporation Enterprise link for a software database
US8001185B2 (en) 2002-09-06 2011-08-16 Oracle International Corporation Method and apparatus for distributed rule evaluation in a near real-time business intelligence system
US8165993B2 (en) 2002-09-06 2012-04-24 Oracle International Corporation Business intelligence system with interface that provides for immediate user action
US7899879B2 (en) * 2002-09-06 2011-03-01 Oracle International Corporation Method and apparatus for a report cache in a near real-time business intelligence system
US9094258B2 (en) 2002-09-06 2015-07-28 Oracle International Corporation Method and apparatus for a multiplexed active data window in a near real-time business intelligence system
US7941542B2 (en) 2002-09-06 2011-05-10 Oracle International Corporation Methods and apparatus for maintaining application execution over an intermittent network connection
US7912899B2 (en) 2002-09-06 2011-03-22 Oracle International Corporation Method for selectively sending a notification to an instant messaging device
US8402095B2 (en) 2002-09-16 2013-03-19 Oracle International Corporation Apparatus and method for instant messaging collaboration
US7668917B2 (en) 2002-09-16 2010-02-23 Oracle International Corporation Method and apparatus for ensuring accountability in the examination of a set of data elements by a user
US7412481B2 (en) 2002-09-16 2008-08-12 Oracle International Corporation Method and apparatus for distributed rule evaluation in a near real-time business intelligence system
US7051041B1 (en) * 2002-10-21 2006-05-23 Hewlett-Packard Development Company, L.P. Simplified relational database extension to DBM hash tables and method for using same
US20130006921A1 (en) * 2003-02-07 2013-01-03 Bruce Wayne Britton Method For Transferring Data into Database Systems
US8554722B2 (en) * 2003-02-07 2013-10-08 Teradata Us, Inc. Method for transferring data into database systems
US7904823B2 (en) 2003-03-17 2011-03-08 Oracle International Corporation Transparent windows methods and apparatus therefor
US7480662B2 (en) * 2003-07-03 2009-01-20 Oracle International Corporation Fact table storage in a decision support system environment
US20050004936A1 (en) * 2003-07-03 2005-01-06 Oracle International Corporation Fact table storage in a decision support system environment
US7447682B2 (en) * 2004-04-22 2008-11-04 International Business Machines Corporation Framework for retrieval and display of large result sets
US20050240559A1 (en) * 2004-04-22 2005-10-27 International Business Machines Corporation Framework for retrieval and display of large result sets
US7681141B2 (en) * 2004-05-11 2010-03-16 Sony Computer Entertainment America Inc. Fast scrolling in a graphical user interface
US20050257166A1 (en) * 2004-05-11 2005-11-17 Tu Edgar A Fast scrolling in a graphical user interface
US7580920B2 (en) * 2004-07-22 2009-08-25 Microsoft Corporation System and method for graceful degradation of a database query
US20060020579A1 (en) * 2004-07-22 2006-01-26 Microsoft Corporation System and method for graceful degradation of a database query
US8271866B2 (en) * 2004-08-19 2012-09-18 International Business Machines Corporation User-controlled web browser table reduction
US20060041835A1 (en) * 2004-08-19 2006-02-23 International Business Machines Corporation User-controlled web browser table reduction
US20060136246A1 (en) * 2004-12-22 2006-06-22 Tu Edgar A Hierarchical program guide
US20060143563A1 (en) * 2004-12-23 2006-06-29 Sap Aktiengesellschaft System and method for grouping data
US7685510B2 (en) * 2004-12-23 2010-03-23 Sap Ag System and method for grouping data
US20060224604A1 (en) * 2005-04-05 2006-10-05 Landsman Richard A Content-descriptive user interface correlation for page navigation systems
US7502894B2 (en) 2005-04-20 2009-03-10 Bea Systems, Inc. Shared rowset
US20060242100A1 (en) * 2005-04-20 2006-10-26 Bea Systems, Inc. Shared rowset
US20060242120A1 (en) * 2005-04-20 2006-10-26 Bea Systems, Inc. Sorted Rowset
US20060242119A1 (en) * 2005-04-20 2006-10-26 Bea Systems, Inc. SQL-style filtered rowset
US7558779B2 (en) * 2005-04-20 2009-07-07 Bea Systems, Inc. Sorted rowset
WO2007038229A3 (en) * 2005-09-23 2008-01-10 Chicago Mercantile Exchange Non-indexed in-memory data storage and retrieval
US20090299914A1 (en) * 2005-09-23 2009-12-03 Chicago Mercantile Exchange Inc. Publish and Subscribe System Including Buffer
US8984033B2 (en) * 2005-09-23 2015-03-17 Chicago Mercantile Exchange, Inc. Non-indexed in-memory data storage and retrieval
US7640264B1 (en) * 2005-09-23 2009-12-29 Emc Corporation System and methods for querying a report database
US8095452B2 (en) 2005-09-23 2012-01-10 Chicago Mercantile Exchange Inc. Live alerts
US20070073634A1 (en) * 2005-09-23 2007-03-29 Chicago Mercantile Exchange Non-indexed in-memory data storage and retrieval
US20090187511A1 (en) * 2005-09-23 2009-07-23 Chicago Mercantile Exchange Inc. Live alerts
US8200563B2 (en) 2005-09-23 2012-06-12 Chicago Mercantile Exchange Inc. Publish and subscribe system including buffer
US8407133B2 (en) 2005-09-23 2013-03-26 Chicago Mercantile Exchange Inc. Live alerts
US8244626B2 (en) 2005-09-23 2012-08-14 Chicago Mercantile Exchange Inc. Live alerts
US8549442B2 (en) * 2005-12-12 2013-10-01 Sony Computer Entertainment Inc. Voice and video control of interactive electronically simulated environment
US20070139443A1 (en) * 2005-12-12 2007-06-21 Sonny Computer Entertainment Inc. Voice and video control of interactive electronically simulated environment
US8510338B2 (en) 2006-05-22 2013-08-13 International Business Machines Corporation Indexing information about entities with respect to hierarchies
US8332366B2 (en) 2006-06-02 2012-12-11 International Business Machines Corporation System and method for automatic weight generation for probabilistic matching
US8321383B2 (en) 2006-06-02 2012-11-27 International Business Machines Corporation System and method for automatic weight generation for probabilistic matching
US8589415B2 (en) 2006-09-15 2013-11-19 International Business Machines Corporation Method and system for filtering false positives
US8370366B2 (en) 2006-09-15 2013-02-05 International Business Machines Corporation Method and system for comparing attributes such as business names
US8356009B2 (en) 2006-09-15 2013-01-15 International Business Machines Corporation Implementation defined segments for relational database systems
US10891281B2 (en) 2006-10-05 2021-01-12 Splunk Inc. Storing events derived from log data and performing a search on the events and data that is not log data
US9002854B2 (en) * 2006-10-05 2015-04-07 Splunk Inc. Time series search with interpolated time stamp
US11550772B2 (en) 2006-10-05 2023-01-10 Splunk Inc. Time series search phrase processing
US9747316B2 (en) 2006-10-05 2017-08-29 Splunk Inc. Search based on a relationship between log data and data from a real-time monitoring environment
US9922065B2 (en) * 2006-10-05 2018-03-20 Splunk Inc. Determining timestamps to be associated with events in machine data
US10678767B2 (en) 2006-10-05 2020-06-09 Splunk Inc. Search query processing using operational parameters
US20120117079A1 (en) * 2006-10-05 2012-05-10 Splunk Inc. Time series search engine
US9594789B2 (en) * 2006-10-05 2017-03-14 Splunk Inc. Time series search in primary and secondary memory
US9922066B2 (en) 2006-10-05 2018-03-20 Splunk Inc. Aggregation and display of search results from multi-criteria search queries on event data
US11947513B2 (en) 2006-10-05 2024-04-02 Splunk Inc. Search phrase processing
US9922067B2 (en) 2006-10-05 2018-03-20 Splunk Inc. Storing log data as events and performing a search on the log data and data obtained from a real-time monitoring environment
US9514175B2 (en) 2006-10-05 2016-12-06 Splunk Inc. Normalization of time stamps for event data
US9928262B2 (en) 2006-10-05 2018-03-27 Splunk Inc. Log data time stamp extraction and search on log data real-time monitoring environment
US20130060783A1 (en) * 2006-10-05 2013-03-07 Splunk Inc. Time series search engine
US11537585B2 (en) 2006-10-05 2022-12-27 Splunk Inc. Determining time stamps in machine data derived events
US11526482B2 (en) 2006-10-05 2022-12-13 Splunk Inc. Determining timestamps to be associated with events in machine data
US10242039B2 (en) 2006-10-05 2019-03-26 Splunk Inc. Source differentiation of machine data
US20160070736A1 (en) * 2006-10-05 2016-03-10 Splunk Inc. Determining Timestamps To Be Associated With Events In Machine Data
US10977233B2 (en) * 2006-10-05 2021-04-13 Splunk Inc. Aggregating search results from a plurality of searches executed across time series data
US11249971B2 (en) 2006-10-05 2022-02-15 Splunk Inc. Segmenting machine data using token-based signatures
US20150149480A1 (en) * 2006-10-05 2015-05-28 Splunk Inc. Time series search in primary and secondary memory
US11144526B2 (en) 2006-10-05 2021-10-12 Splunk Inc. Applying time-based search phrases across event data
US10262018B2 (en) 2006-10-05 2019-04-16 Splunk Inc. Application of search policies to searches on event data stored in persistent data structures
US10740313B2 (en) 2006-10-05 2020-08-11 Splunk Inc. Storing events associated with a time stamp extracted from log data and performing a search on the events and data that is not log data
US10747742B2 (en) 2006-10-05 2020-08-18 Splunk Inc. Storing log data and performing a search on the log data and data that is not log data
US10255312B2 (en) 2006-10-05 2019-04-09 Splunk Inc. Time stamp creation for event data
US10216779B2 (en) 2006-10-05 2019-02-26 Splunk Inc. Expiration of persistent data structures that satisfy search queries
US9996571B2 (en) 2006-10-05 2018-06-12 Splunk Inc. Storing and executing a search on log data and data obtained from a real-time monitoring environment
US20180157693A1 (en) * 2006-10-05 2018-06-07 Splunk Inc. Aggregating Search Results from a Plurality of Searches Executed Across Time Series Data
US11561952B2 (en) 2006-10-05 2023-01-24 Splunk Inc. Storing events derived from log data and performing a search on the events and data that is not log data
US8990184B2 (en) * 2006-10-05 2015-03-24 Splunk Inc. Time series search engine
US20080114733A1 (en) * 2006-11-14 2008-05-15 Microsoft Corporation User-structured data table indexing
US8359339B2 (en) 2007-02-05 2013-01-22 International Business Machines Corporation Graphical user interface for configuration of an algorithm for the matching of data records
US20110010346A1 (en) * 2007-03-22 2011-01-13 Glenn Goldenberg Processing related data from information sources
US8515926B2 (en) 2007-03-22 2013-08-20 International Business Machines Corporation Processing related data from information sources
US8321393B2 (en) 2007-03-29 2012-11-27 International Business Machines Corporation Parsing information in data records and in different languages
US8429220B2 (en) 2007-03-29 2013-04-23 International Business Machines Corporation Data exchange among data sources
US8423514B2 (en) 2007-03-29 2013-04-16 International Business Machines Corporation Service provisioning
US8370355B2 (en) 2007-03-29 2013-02-05 International Business Machines Corporation Managing entities within a database
US20080288527A1 (en) * 2007-05-16 2008-11-20 Yahoo! Inc. User interface for graphically representing groups of data
US8122056B2 (en) 2007-05-17 2012-02-21 Yahoo! Inc. Interactive aggregation of data on a scatter plot
US7739229B2 (en) 2007-05-22 2010-06-15 Yahoo! Inc. Exporting aggregated and un-aggregated data
US7756900B2 (en) * 2007-05-22 2010-07-13 Yahoo!, Inc. Visual interface to indicate custom binning of items
US20080294595A1 (en) * 2007-05-22 2008-11-27 Yahoo! Inc. Visual interface to indicate custom binning of items
US20080307189A1 (en) * 2007-06-11 2008-12-11 Microsoft Corporation, Data partitioning via bucketing bloom filters
US7743013B2 (en) 2007-06-11 2010-06-22 Microsoft Corporation Data partitioning via bucketing bloom filters
US7840585B2 (en) 2007-06-12 2010-11-23 Microsoft Corporation DISCOSQL: distributed processing of structured queries
US20080313161A1 (en) * 2007-06-12 2008-12-18 Microsoft Corporation Discosql: distributed processing of structured queries
US9286374B2 (en) 2007-09-28 2016-03-15 International Business Machines Corporation Method and system for indexing, relating and managing information about entities
US8799282B2 (en) 2007-09-28 2014-08-05 International Business Machines Corporation Analysis of a system for matching data records
US8713434B2 (en) 2007-09-28 2014-04-29 International Business Machines Corporation Indexing, relating and managing information about entities
WO2009042941A1 (en) * 2007-09-28 2009-04-02 Initiate Systems, Inc. Method and system for analysis of a system for matching data records
US10698755B2 (en) 2007-09-28 2020-06-30 International Business Machines Corporation Analysis of a system for matching data records
US20090089630A1 (en) * 2007-09-28 2009-04-02 Initiate Systems, Inc. Method and system for analysis of a system for matching data records
US8417702B2 (en) 2007-09-28 2013-04-09 International Business Machines Corporation Associating data records in multiple languages
US9600563B2 (en) 2007-09-28 2017-03-21 International Business Machines Corporation Method and system for indexing, relating and managing information about entities
US20090089071A1 (en) * 2007-10-02 2009-04-02 Chicago Mercantile Exchange, Inc. Compressed non-indexed data storage
US20090327319A1 (en) * 2008-06-25 2009-12-31 International Business Machines Corporation Hybrid data model and user interaction for data sets in a user interface
US7987192B2 (en) * 2008-06-25 2011-07-26 International Business Machines Corporation Hybrid data model and user interaction for data sets in a user interface
US9824158B2 (en) 2009-01-26 2017-11-21 International Business Machines Corporation Generating a representative view for a multipage compilation of information
US10769232B2 (en) 2009-01-26 2020-09-08 International Business Machines Corporation Generating a representative view for a multipage compilation of information
US20100192087A1 (en) * 2009-01-26 2010-07-29 Thomas Stanton Brugler System, method, and computer program product for generating a representative view for a multipage compilation of information
US9589066B2 (en) * 2009-01-26 2017-03-07 International Business Machines Corporation System, method, and computer program product for generating a representative view for a multipage compilation of information
US7958137B2 (en) 2009-02-03 2011-06-07 Honeywell International Inc. Method to assist user in creation of highly inter-related models in complex databases
US20100198856A1 (en) * 2009-02-03 2010-08-05 Honeywell International Inc. Method to assist user in creation of highly inter-related models in complex databases
US8793292B1 (en) * 2009-10-23 2014-07-29 Lockheed Martin Corporation System and method for computer simulations
US10423595B2 (en) 2012-05-18 2019-09-24 Splunk Inc. Query handling for field searchable raw machine data and associated inverted indexes
US10409794B2 (en) 2012-05-18 2019-09-10 Splunk Inc. Directly field searchable and indirectly searchable by inverted indexes raw machine datastore
US10402384B2 (en) 2012-05-18 2019-09-03 Splunk Inc. Query handling for field searchable raw machine data
US10387396B2 (en) 2013-01-31 2019-08-20 Splunk Inc. Collection query driven generation of summarization information for raw machine data
US11163738B2 (en) 2013-01-31 2021-11-02 Splunk Inc. Parallelization of collection queries
US10685001B2 (en) 2013-01-31 2020-06-16 Splunk Inc. Query handling using summarization tables
US10318541B2 (en) 2013-04-30 2019-06-11 Splunk Inc. Correlating log data with performance measurements having a specified relationship to a threshold value
US11119982B2 (en) 2013-04-30 2021-09-14 Splunk Inc. Correlation of performance data and structure data from an information technology environment
US10877987B2 (en) 2013-04-30 2020-12-29 Splunk Inc. Correlating log data with performance measurements using a threshold value
US10353957B2 (en) 2013-04-30 2019-07-16 Splunk Inc. Processing of performance data and raw log data from an information technology environment
US10346357B2 (en) 2013-04-30 2019-07-09 Splunk Inc. Processing of performance data and structure data from an information technology environment
US10997191B2 (en) 2013-04-30 2021-05-04 Splunk Inc. Query-triggered processing of performance data and log data from an information technology environment
US10592522B2 (en) 2013-04-30 2020-03-17 Splunk Inc. Correlating performance data and log data using diverse data stores
US10225136B2 (en) 2013-04-30 2019-03-05 Splunk Inc. Processing of log data and performance data obtained via an application programming interface (API)
US11782989B1 (en) 2013-04-30 2023-10-10 Splunk Inc. Correlating data based on user-specified search criteria
US10614132B2 (en) 2013-04-30 2020-04-07 Splunk Inc. GUI-triggered processing of performance data and log data from an information technology environment
US10877986B2 (en) 2013-04-30 2020-12-29 Splunk Inc. Obtaining performance data via an application programming interface (API) for correlation with log data
US10019496B2 (en) 2013-04-30 2018-07-10 Splunk Inc. Processing of performance data and log data from an information technology environment by using diverse data stores
US11250068B2 (en) 2013-04-30 2022-02-15 Splunk Inc. Processing of performance data and raw log data from an information technology environment using search criterion input via a graphical user interface
US11604782B2 (en) 2015-04-23 2023-03-14 Splunk, Inc. Systems and methods for scheduling concurrent summarization of indexed data
US10229150B2 (en) 2015-04-23 2019-03-12 Splunk Inc. Systems and methods for concurrent summarization of indexed data
US11379530B2 (en) * 2017-01-31 2022-07-05 Splunk Inc. Leveraging references values in inverted indexes to retrieve associated event records comprising raw machine data
US20220365932A1 (en) * 2017-01-31 2022-11-17 Splunk Inc. Pipelined search query, leveraging reference values of an inverted index to access a set of event data and performing further queries on associated raw data
US20180218037A1 (en) * 2017-01-31 2018-08-02 Splunk Inc. Using an inverted index in a pipelined search query to determine a set of event data that is further limited by filtering and/or processing of subsequent query pipestages
US10474674B2 (en) * 2017-01-31 2019-11-12 Splunk Inc. Using an inverted index in a pipelined search query to determine a set of event data that is further limited by filtering and/or processing of subsequent query pipestages
US11809384B2 (en) * 2017-03-06 2023-11-07 Microsoft Technology Licensing, Llc Optimized data storage for fast retrieval
US20190087156A1 (en) * 2017-09-21 2019-03-21 International Business Machines Corporation Sorting a table in analytical databases
US11003673B2 (en) 2018-09-06 2021-05-11 Microsoft Technology Licensing, Llc Dynamic level group creation
US11238044B2 (en) 2019-05-06 2022-02-01 International Business Machines Corporation Candidate data record prioritization for match processing
US11397715B2 (en) 2019-07-31 2022-07-26 International Business Machines Corporation Defining indexing fields for matching data entities
US20220391059A1 (en) * 2020-08-25 2022-12-08 Beijing Bytedance Network Technology Co., Ltd. Method and apparatus for displaying active friend information, electronic device, and storage medium
US11789950B1 (en) * 2020-10-19 2023-10-17 Splunk Inc. Dynamic storage and deferred analysis of data stream events

Also Published As

Publication number Publication date
US7899844B2 (en) 2011-03-01
US20070244849A1 (en) 2007-10-18

Similar Documents

Publication Publication Date Title
US6907422B1 (en) Method and system for access and display of data from large data sets
US6879976B1 (en) Data indexing using bit vectors
US7574423B2 (en) Partial data model exposure through client side caching
US6353830B1 (en) Graphical interface for object-relational query builder
US6618732B1 (en) Database query handler supporting querying of textual annotations of relations between data objects
Raman et al. Online dynamic reordering for interactive data processing
US5991765A (en) System and method for storing and manipulating data in an information handling system
US7747640B2 (en) Method for regenerating selected rows for an otherwise static result set
US6480848B1 (en) Extension of data definition language (DDL) capabilities for relational databases for applications issuing DML and DDL statements
US9639542B2 (en) Dynamic mapping of extensible datasets to relational database schemas
US20040148273A1 (en) Method, system, and program for optimizing database query execution
US7774318B2 (en) Method and system for fast deletion of database information
US11176105B2 (en) System and methods for providing a schema-less columnar data store
US20040015486A1 (en) System and method for storing and retrieving data
IL147694A (en) Database system for viewing effects of changes to an index for a query optimization
US9128969B2 (en) Columnwise storage of point data
US6775676B1 (en) Defer dataset creation to improve system manageability for a database system
US20180060362A1 (en) Method and system for implementing distributed lobs
US10482110B2 (en) Columnwise range k-nearest neighbors search queries
US7979412B2 (en) Object query over previous query results
Raman et al. Online dynamic reordering
US20080215539A1 (en) Data ordering for derived columns in a database system
US20050125387A1 (en) Method of joining data and its metadata using dynamic metadata in relational database
US7765207B2 (en) Fast rich application view initiation
US20100205197A1 (en) Two-valued logic database management system with support for missing information

Legal Events

Date Code Title Description
AS Assignment

Owner name: SIEBEL SYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PREDOVIC, DANIEL THOMAS;REEL/FRAME:012413/0780

Effective date: 20011214

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: ORACLE AMERICA, INC., CALIFORNIA

Free format text: MERGER;ASSIGNOR:SIEBEL SYSTEMS, INC.;REEL/FRAME:037702/0516

Effective date: 20150522

FPAY Fee payment

Year of fee payment: 12