USRE41440E1 - Gathering enriched web server activity data of cached web content - Google Patents

Gathering enriched web server activity data of cached web content

Info

Publication number
USRE41440E1
Authority
US
United States
Prior art keywords
activity data
network element
obtaining enriched
server
enriched activity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime, expires
Application number
US12/437,581
Inventor
Paul Roger Briscoe
Cameron Donald Ferstat
Matthew Robert Ganis
Stephen Carl Hammer
Gary Bob Kip Hansen
Sean Alan Harp
Michael Shannon Nichols
Herbert Daniel Pearthree
Paul Reed
Brian James Snitzer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=24572627&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=USRE41440(E1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by International Business Machines Corp
Priority to US12/437,581
Application granted
Publication of USRE41440E1
Adjusted expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/87 Monitoring of transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/875 Monitoring of systems including the internet
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/885 Monitoring specific for caches

Definitions

  • The present invention relates generally to client-server computer systems and, more specifically, to information access requests to a web site server over a global communications network.
  • HyperText Markup Language HTML
  • Hypertext and universality are two essential features of HTML.
  • Hypertext means that a programmer can create a link on a web page that leads the visitor to any other web page or to practically anything else on the Internet.
  • Hypertext enables information on the web to be accessed from many different directions.
  • Universality means that because HTML documents are saved as ASCII or text only files, virtually any computer can read a web page.
  • HTML lets the web designer format text, add graphics, sound, and video, and save it all in a text or an American Standard Code for Information Interchange (ASCII) file that any computer can read.
  • ASCII American Standard Code for Information Interchange
  • The key to HTML is in the tags, which are key words enclosed between less than (<) and greater than (>) signs, that indicate the type of content coming up next. While practically any computer can display web pages, how those pages actually look depends on the type of computer, the monitor, the speed of the Internet connection, and the browser software used to view the page.
  • HTML tags are commands written between angle brackets (<>) that indicate how the browser should display the text. Examples of HTML tags are BASE, FORM, FRAME, IMG and SCRIPT. There are opening and closing versions for many tags and the affected text is contained within the two tags. The opening and closing tags use the same command word; the closing tag carries an initial forward slash (/) symbol. Many tags have special attributes that offer a variety of options for the contained text. The attribute is entered between the command word and the final angle bracket.
  • a series of attributes can be used in a single tag just by writing one after the other, in any order, with a space separating each one.
  • the attributes in turn, often have values. In some cases, a selection of value is made from a small group of choices.
  • Other attributes are more strict about the type of values they accept. Examples of attributes are HREF, SRC, ACCESS-KEY and VALUE.
  • a web page is nothing more than a text document written with HTML tags. Like any other text document, web pages have a file name that identifies the documents to the web site designer, the web site visitors, and a visitor's web browser.
  • Uniform Resource Locators (URLs) contain information about where a file is located and what a browser should do with it. Each file on the Internet has a unique URL. The first part of the URL is called the scheme. It tells the browser how to deal with the file that it is about to open. One of the most common schemes to access web pages is HyperText Transfer Protocol (HTTP). The second part of the URL is the name of a server where the file is located followed by the path that leads to the file and the file name.
  • HTTP HyperText Transfer Protocol
  • A URL may end in a trailing forward slash with no file name given.
  • In that case, the URL refers to the default file in the last directory in the path (e.g., index.html), which generally corresponds to the home page.
  • In the example URL, the domain name is “census.rolandgarros.org”. This is the specific host computer on which the corresponding web pages reside.
  • The next segment of the URL is the directory (“rc”) and subdirectory (“images”) on the host computer that contains a specific web site.
  • The last segment of the URL, represented by the ellipsis mark, is the file name of the specific web page being requested.
  • URLs can be either absolute or relative.
  • An absolute URL shows the entire path to the file, including the scheme, server name, the complete path, and the file name itself.
  • a relative URL describes the location of the desired file with reference to the location of the file that contains the URL itself. The relative URL for a file that is in the same directory as the current file is simply the file name and extension.
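  • The URL anatomy described above can be illustrated with a minimal Python sketch. The host name and the "rc/images" path come from the text; the file names "logo.gif" and "banner.gif" are hypothetical, since the original example elides the file name.

```python
from urllib.parse import urlsplit, urljoin

# Hypothetical absolute URL: only the host and the "rc/images" path appear in
# the text above; the file name "logo.gif" is made up for illustration.
absolute = "http://census.rolandgarros.org/rc/images/logo.gif"

parts = urlsplit(absolute)
scheme = parts.scheme   # tells the browser how to deal with the file
server = parts.netloc   # host computer on which the file resides
path = parts.path       # directory, subdirectory, and file name

# A relative URL is resolved against the URL of the file that contains it.
# For a file in the same directory, the relative URL is just the file name.
same_dir = urljoin(absolute, "banner.gif")

# A URL ending in a trailing slash names no file; the server supplies the
# default file for that directory.
directory_only = urlsplit("http://census.rolandgarros.org/rc/").path
```

Here `urljoin` replaces only the last path segment, which matches the description of a relative URL for a file in the same directory as the current file.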
  • the browser running on a client computer may request and download numerous files from a web site server.
  • the number of object access requests (“hits”) stored in the web site server's access log will typically exceed the number of distinct client sessions in which clients are accessing information on the web site, reducing the accuracy of the access log.
  • Caching is the technique of keeping frequently accessed information in a location close to the requester.
  • a web cache stores web pages and content on a storage device that is physically or logically closer to the user. This access to stored web content is closer and faster than a web lookup.
  • ISPs Internet Service Providers
  • the two key benefits of web caching are cost savings due to the reduction of WAN bandwidth and improved productivity for end users resulting from quicker access.
  • ISPs can place cache engines at strategic points on their networks to improve response times and lower the bandwidth demand on their backbones.
  • ISPs can station cache engines at strategic WAN access points to serve web requests from local storage, rather than from a distant or overburdened web server.
  • the dramatic reduction in bandwidth usage due to web caching allows a lower bandwidth WAN link to service the user base.
  • the organization can add users or add more services that make use of the free bandwidth on the existing WAN link.
  • the response of the local web cache is almost three times faster than the download time for the same content over the wide area network. Therefore, users see dramatic improvements in response times, and the implementation of web caching is completely transparent to them.
  • Web caching offers other benefits including access control, monitoring and operational logging.
  • the cache engine provides network administrators with a simple, secure method to enforce a sitewide access policy through Uniform Resource Locator (URL) filtering. Network administrators can learn which URLs receive hits, the number of hits per second the cache is serving, the percentage of URLs that are served from the cache, along with other related operational statistics.
  • URL Uniform Resource Locator
  • Web caching starts by an end user accessing a web page over the Internet. While the page is being transmitted to the end user, the caching system saves the page and all of its associated graphics on local storage. The page content is now cached. Another user, or the original user can then access the web page at a later time, but instead of sending the request over the Internet to the web server, the web cache system delivers the web page from local storage. This process speeds download times for the user, and reduces the bandwidth demand on the WAN link. Updating of the cache data can occur in a number of ways depending upon the design of the web cache system.
  • Web caching can be a major problem for publishers of web content. For example, a publisher can gather an inaccurate number of hits if some of the visitors access web content already in a caching server. Furthermore, if a caching server doesn't update content promptly, it can return expired or stale content to users.
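  • The undercounting problem can be sketched with a toy model in Python. The class names `OriginServer` and `WebCache` are hypothetical and stand in for the publisher's web server and an intermediate cache; real caches also handle expiry, which is omitted here.

```python
class OriginServer:
    """Stands in for the publisher's web server and its hit counter."""

    def __init__(self, pages):
        self.pages = pages
        self.hits = 0  # requests actually seen by the origin server

    def get(self, url):
        self.hits += 1
        return self.pages[url]


class WebCache:
    """Serves from local storage when possible, else asks the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, url):
        if url not in self.store:
            # Cache miss: fetch from the origin and keep a local copy.
            self.store[url] = self.origin.get(url)
        return self.store[url]


origin = OriginServer({"/index.html": "<html>home</html>"})
cache = WebCache(origin)

# Five visitors request the same page through the cache.
for _ in range(5):
    cache.get("/index.html")

# Only the first request reached the origin, so the publisher's count is low.
publisher_count = origin.hits
```

In this sketch the publisher sees one request although five page views occurred, which is the inaccuracy the text attributes to caching.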
  • the single-pixel transparent GIF (Graphic Interchange Format) is the most flexible tool in a web designer's toolbox.
  • The use of a transparent GIF is a way to discreetly control the layout of text and graphics on the web page. No matter where the transparent GIF is placed on the page, it will remain unseen with all background graphics and fills remaining untouched.
  • the single pixel clear GIF has been used before, but the data has not been enriched such that it can be used as a surrogate for the complete set of log records.
  • the present invention enriches the information recorded in the web logs for the uncacheable single pixel clear GIF by appending additional information to it as Common Gateway Interface (CGI) query string parameters.
  • CGI Common Gateway Interface
  • FIG. 1 illustrates an implementation of web cache engines over a global communications network.
  • FIG. 2 illustrates an exemplary implementation of the uncacheable single pixel GIF with CGI query string parameters added to enrich information recorded in web logs.
  • FIG. 3 illustrates the processing logic for handling client requests for web pages utilizing the single pixel transparent GIF in accordance with a preferred embodiment of the present invention.
  • FIG. 4 illustrates a site level analysis display that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
  • FIG. 5 illustrates an exemplary display of referral categories that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
  • FIG. 6 illustrates an exemplary display of referral category for search engines and directories that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
  • FIG. 7 illustrates an exemplary display of the referral results for a specific search engine that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
  • FIG. 8 illustrates exemplary content categories for various web pages that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
  • FIG. 9 illustrates an exemplary content category for a home page that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
  • FIG. 10 illustrates an exemplary display of the available saved reports that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
  • FIGS. 11A-11M illustrate various available saved reports that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
  • Web server software typically collects and saves information pertaining to each HTTP request, including date and time, the originating Internet Protocol (IP) address, the object requested, and the completion status of the request.
  • IP Internet Protocol
  • the logs are analyzed on a periodic basis to determine the traffic through the server in terms of hits, the number of pages served, and the level of demand for pages of interest during each period.
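  • As a rough illustration of periodic log analysis, the following Python sketch parses access-log lines and counts hits and page requests. The text does not name a log format; the Common Log Format is assumed here, and the sample lines, addresses, and the ".html means page" rule are all hypothetical.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format): IP, identity, user, [date], "request",
# status, size. The patent text does not specify the format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<obj>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Made-up sample records using documentation-only IP addresses.
sample_log = [
    '192.0.2.1 - - [10/Jun/2000:13:55:36 -0400] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Jun/2000:13:55:37 -0400] "GET /images/a.gif HTTP/1.0" 200 512',
    '192.0.2.7 - - [10/Jun/2000:14:02:10 -0400] "GET /index.html HTTP/1.0" 200 2326',
]


def summarize(lines):
    """Return (total hits, per-page request counts) for parseable lines."""
    hits = 0
    pages = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        hits += 1
        # Illustrative rule: treat requests for .html objects as page views.
        if match.group("obj").endswith(".html"):
            pages[match.group("obj")] += 1
    return hits, pages


hits, pages = summarize(sample_log)
```

Every requested object counts as a hit, while only HTML objects count as pages, which is why hit totals exceed page totals.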
  • Internet browser applications allow an individual user to cache web pages on his local hard disk.
  • A user can configure the amount of disk space devoted to caching. The first time a user views a website, that content is saved as files in a subdirectory on that computer's hard disk. The next time the user points to this website, the browser gets the content from the cache without accessing the network. Certain elements of the page, including buttons, icons and images, appear much more quickly than they did the first time the page was opened.
  • Proxy servers are software applications that run on general-purpose hardware and operating systems.
  • a proxy server is placed on hardware that is physically between a web browser client application and a web server.
  • the proxy server acts as a gatekeeper that receives all the packets destined for the web server and examines each packet to determine whether it can fulfill the request itself. If the proxy cannot fulfill the request itself, it forwards the request to the web server.
  • Proxy servers can be used to filter requests, e.g., to prevent employees from accessing specific websites.
  • Proxy servers are not optimized for caching and can fail under a heavy network load. Traffic is slowed to allow the proxy servers to examine each packet, and the failure of the proxy software or hardware causes all users to lose network access. Furthermore, proxy servers require configuration of each end-user's browser, which is an unacceptable option for ISPs and large enterprises. Because of these shortcomings of proxy servers, applications that create network caches have become popular. These caching-focused software applications are designed to improve performance by enhancing the caching software and eliminating the other slow aspects of proxy server implementations. Because a proxy server runs under a general-purpose operating system that involves very high per-process context overhead, it is not easily scalable to large numbers of simultaneous processes.
  • Networking product vendors offer cache engines as single-purpose network appliances that store and retrieve content using caching and retrieval algorithms. Such cache engines are dedicated solely to content management and delivery. Since only web requests are routed to the cache engine, no other user traffic is affected by the caching process. For non-web traffic, the router functions entirely in its traditional role. Communication between a cache engine and a router is defined by a cache control protocol. Under this protocol, the router directs only web requests to the cache engine rather than to the intended server. With a cache engine, a client requests web content in the usual manner. A router running a cache control protocol intercepts Transmission Control Protocol (TCP) port 80 web traffic and routes it to the cache engine. The client is not involved in the transaction, and no changes to the client or browser are required.
  • TCP Transmission Control Protocol
  • If the cache engine does not have the requested content, it sends the request to the Internet or Intranet in the usual fashion.
  • the content is returned to and stored at the cache engine.
  • the cache engine returns the content to the client.
  • the cache engine fulfills the requests from local storage.
  • FIG. 1 illustrates an implementation of web cache engines over a global communications network such as the Internet.
  • a client computer 12 , 14 , 16 can request web content via a router 18 .
  • the router 18 intercepts TCP Port 80 web traffic and routes it to the local cache engine 20 .
  • the client 12 , 14 , 16 is not involved in this transaction and no changes to the client computer or browser are required. If the cache engine 20 does not have the requested content, it sends the request via router 18 to the Internet to access an Internet content server 40 , 42 , 44 . The content is returned to, and stored at, the cache engine 20 . The cache engine 20 then returns the requested content to the client computer 12 , 14 , 16 via the router 18 .
  • Several cache engines 32 , 34 , 36 can be placed in a cache farm in a hierarchical fashion at an Internet Service Provider (ISP) site 30 .
  • ISP Internet Service Provider
  • Requests from clients 12 , 14 , 16 directed through router 18 and ISP server 30 are diverted to the cache farm 32 , 34 , 36 to fulfill the client request from its storage. If the cache engines 32 , 34 , 36 are unable to fulfill the request from local storage, a normal web request is made via ISP server 30 over the Internet 50 to an appropriate server 40 , 42 , 44 for the requested Internet content.
  • routers 26 , 46 are also shown connected to ISP server 30 . Routers 18 , 26 , 46 are frequently referred to as Points-of-Presence (POPs).
  • POPs Points-of-Presence
  • a POP usually includes routers, digital/analog call aggregators, servers and frequently frame relay or Asynchronous Transfer Mode (ATM) switches. Shown connected to router 46 is cache engine 48 . Connected to router 26 is cache engine 28 and router 24 . Router 24 is connected to a corporate intranet 22 .
  • ATM Asynchronous Transfer Mode
  • the cache engine operates transparently to clients. Clients do not need to configure their browsers to be in proxy server mode. In addition, the operation of the cache engine is transparent to the network. The router operates entirely in its normal role for non-web traffic.
  • A web object can contain a Hypertext Transfer Protocol (HTTP) header to instruct a browser or a caching server how to cache the web object.
  • the expiration header can be set to “no expiration” so that caching servers can keep the image in the cache forever.
  • a small image object can be added to the page with the object set to expire immediately, so the caching server won't cache the object. Then, every time a user requests that page, the browser or caching server will retrieve the object from the original web server, and the web server can then count the exact number of requests.
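  • A minimal Python sketch of the two header choices follows. The function name `cache_headers` and the one-day default lifetime are hypothetical. The text mentions only an expiration header; the Cache-Control and Pragma headers are standard HTTP headers added here as an assumption about how an object would be marked uncacheable in practice.

```python
from email.utils import formatdate


def cache_headers(cacheable, now, max_age_seconds=86400):
    """Build response headers for a cacheable or an uncacheable object.

    `now` is a Unix timestamp. The one-day default lifetime is illustrative.
    """
    if cacheable:
        # Ordinary images: let browsers and caching servers keep a copy.
        return {
            "Expires": formatdate(now + max_age_seconds, usegmt=True),
            "Cache-Control": f"max-age={max_age_seconds}",
        }
    # Tracking image: an expiry date in the past means "expire immediately",
    # so every page view triggers a request to the original web server.
    return {
        "Expires": formatdate(0, usegmt=True),
        "Cache-Control": "no-cache, no-store, must-revalidate",
        "Pragma": "no-cache",
    }


tracking_headers = cache_headers(cacheable=False, now=0)
image_headers = cache_headers(cacheable=True, now=0)
```

The design choice is to keep the bulk of the page cacheable and make only one tiny object uncacheable, so counting stays accurate at little bandwidth cost.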
  • CGI is simply a standardized way for sending information between the server and the script.
  • the CGI script is a program that communicates with the server in a standard way.
  • the supported information servers are HTTP servers.
  • Each CGI server implementation must define a mechanism to pass data about the request from the server to the script.
  • Each element on a web page form will have a name and value associated with it.
  • the name identifies the data being sent.
  • the value is the data and can either come from the web page designer or from the visitor who types it in a field.
  • When a visitor clicks the submit button, the name-value pair of each form element is sent to the server.
  • CGI scripts generally have two functions. The first is to take all the name-value pairs and separate them out into individual intelligible pieces. The second is to actually do something with that data, such as printing it out, multiplying fields together, sending an email confirmation, or storing it on a server.
  • the form has three important parts: the form tag, which includes the URL of the CGI script that will process the form; the form elements, such as fields and menus; and the submit button which sends the data to the CGI script on the server.
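  • The two functions of a CGI script described above can be sketched in Python. The field names ("name", "quantity", "unit_price") and the function `handle_form` are hypothetical; a real script would read the query string from the server environment rather than take it as an argument.

```python
from urllib.parse import parse_qs


def handle_form(query_string):
    """Split name-value pairs, then act on them (here: multiply two fields)."""
    # First function: separate the name-value pairs into individual pieces.
    # parse_qs returns a list of values per name; keep the first of each.
    fields = {name: values[0] for name, values in parse_qs(query_string).items()}

    # Second function: do something with the data.
    quantity = int(fields.get("quantity", "0"))
    unit_price = float(fields.get("unit_price", "0"))
    return {"name": fields.get("name", ""), "total": quantity * unit_price}


result = handle_form("name=Ada+Lovelace&quantity=3&unit_price=2.50")
```

`parse_qs` also decodes the URL encoding applied by the browser, so "Ada+Lovelace" arrives as "Ada Lovelace".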
  • Scripts are little programs that add interactivity to a web page. Simple scripts can be written to add an alert box or some text to the web page; more complicated scripts can be written that load particular pages according to the visitor's browser or that change a frame's background color depending on the visitor's mouse clicks. Most scripts are written in a scripting language called JavaScript that is supported by most browsers, including Netscape Communicator and Microsoft Internet Explorer.
  • JavaScript is an object-oriented language, which means that it works by manipulating objects on a web page, such as windows, images and documents.
  • JavaScript commands are put directly into the HTML file that creates a web page. Depending on the script being run, the commands can be placed into several parts of the file. The commands are frequently placed near the top of the file.
  • JavaScript is an interpreted language, which means its commands are executed by the browser in the order in which the browser reads them.
  • JavaScript works by taking actions on objects. These actions are called methods. In the basic syntax of JavaScript, the object is first named, and then a period appears followed by the action taken on the object, i.e., the method. So the command to open a new window in JavaScript is window.open. In this instance, window is the object and open is the method. This command opens a new browser window.
  • Other parameters can be added after the command. All the parameters are placed inside one set of parentheses, with each individual parameter inside quotation marks, with the parameters separated by commas.
  • An automatic script is executed by the client browser when the web page is loaded. There is no limit to the number of automatic scripts that can be on a web page. The location of the script on the HTML page determines when the script will load. Scripts are loaded in the order in which they appear in an HTML document. An automatic JavaScript is added to an HTML document by the following HTML code:
  • a CGI string of data is appended to the SRC attribute for the single pixel GIF at the time the page is published, as follows:
  • the persistent cookie identification of the user's cookie can be appended to the CGI string of data as follows:
  • FIG. 2 illustrates an example of an implementation of the single pixel GIF with the addition of query string parameters to act as a surrogate for the complete set of log records that would have been created had the page content not been cached.
  • The JavaScript statements are embedded directly on the HTML page. They include a document object with a write method (“document.write”).
  • the document object contains information on the current document and provides methods for displaying the HTML expressions to the user in a specified window.
  • the IMG and BR tags are the HTML expressions that are displayed in the window.
  • the BR CLEAR tag and attribute simply create a line break and stop text wrap.
  • the SRC attribute following the IMG tag provides the absolute URL of the page containing the single pixel clear GIF (“uc.GIF”); i.e.,
  • the NOSCRIPT tag demarcates the HTML statements to be interpreted by the browser. This includes the IMG tag wherein the SRC attribute has a query string after “uc.GIF” that is modified to include the default URL of the HTML page (i.e., “index.html”).
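  • As a rough illustration of the kind of IMG tag described above, and not a reproduction of the listing in FIG. 2, the following Python sketch builds a SRC attribute for uc.GIF with a query string appended. The parameter names ("page", "referrer", "cookie"), the function `uc_gif_tag`, and the host www.example.com are all hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def uc_gif_tag(base_url, page, referrer, cookie_id=None):
    """Return an IMG tag whose SRC carries enriched data as a query string."""
    # Hypothetical parameter names; the source does not list the real ones here.
    params = {"page": page, "referrer": referrer}
    if cookie_id is not None:
        # Optionally append the visitor's persistent cookie identification.
        params["cookie"] = cookie_id
    src = f"{base_url}/uc.GIF?{urlencode(params)}"
    # Escape the URL for use inside an HTML attribute value.
    return f'<IMG SRC="{escape(src, quote=True)}" WIDTH="1" HEIGHT="1">'


tag = uc_gif_tag(
    "http://www.example.com",
    page="index.html",
    referrer="http://search.example/",
    cookie_id="abc123",
)
```

Because the image itself is uncacheable, each page view produces one request whose query string lands in the web server log and can stand in for the records that caching suppressed.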
  • the index.html file is the default file for the top level directory on the web site.
  • web sites need a host computer and server software that runs on the host.
  • the host manages the communications, protocols, and houses the pages and related software required to create a website on the Internet.
  • the server software resides on the host and serves up the pages, and otherwise acts on the requests sent by the client's browser software.
  • the server handles the HTTP requests and communications with the host operating system, which in turn handles the TCP/IP communications.
  • There are different types of server software that perform different types of services for different types of clients.
  • a web server is an HTTP server and its function is to send information to the client software (browser) using the HyperText Transfer Protocol.
  • the client browser requests that the server return an HTML document.
  • the server receives this request and sends back a response.
  • the top portion of the response includes transmission information and the rest of the response is the HTML file.
  • a web server also passes requests to run CGI scripts to CGI applications. These scripts run external mini-programs, such as a database lookup or interactive forms processing.
  • the server sends the script to the application via CGI and communicates the script back to the browser.
  • the server software also includes configuration files and utilities to secure and manage the website in a variety of ways.
  • FIG. 3 illustrates the processing logic of the present invention.
  • the process starts in start block 300 .
  • the client browser software requests an HTML web page.
  • the client browser determines if the requested HTML page has been cached at the client in decision block 302 . If the page has been cached at the client, then the HTML file is delivered to the browser as indicated in logic block 310 .
  • the browser interprets the HTML file and builds the web page with source (i.e., from the origin web server) or cached images.
  • the cached images can be available locally or at an ISP, or at a router or other network device along the path.
  • If, in decision block 304, it is determined that the page is not cached at the client, then another test is performed in decision block 306 to determine if the page has been cached at an ISP.
  • the ISP cache test is intended to be illustrative of an embodiment of the invention.
  • the next hop from the client can be to a server on an intranet which has a TCP/IP address and provides direct Internet access. If the page has been cached along the path, then, as indicated in logic block 312 , the HTML file is delivered to the client browser to interpret the HTML code and build the web page with images that have been cached or retrieved from the origin web server.
  • The request for the page is transmitted to the host where the web server software processes the request as indicated in logic block 308 . If the browser has requested an HTML file, the web server retrieves the original source HTML file, attaches a header to the file, and sends the file to the browser as indicated in logic block 314 .
  • A test is made in decision block 318 to determine if the HTML file contains an uncacheable single pixel GIF (represented by uc.GIF in this invention). If it does not, the retrieved cached images are displayed to complete the build of the requested web page in logic block 316 . Processing of the request is then completed as indicated by termination block 326 . If, in decision block 318 , a uc.GIF request is found in the HTML file, then the uc.GIF and CGI query string are transmitted to the origin web server in logic block 320 where they are analyzed to gather the enriched web server activity data made possible by this invention.
  • uc.GIF uncacheable single pixel GIF
  • the browser again interprets the HTML code and builds the page with source or cached images.
  • 14 hits are recorded for the web page, including one for the transmitted uc.GIF request and 13 for the other source images that are retrieved based on the HTML IMG SRC tags/attributes in the HTML file. This represents the surrogate nature of using the uncacheable single pixel GIF requests.
  • the gathering and storing of this enriched web server activity data is indicated by logic block 322 .
  • the request processing then ends as indicated in termination block 324 .
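  • The decision flow described for FIG. 3 can be summarized in a short Python sketch. The function name `handle_page_request` and the step strings are hypothetical, and the sketch deliberately omits block numbers; it only mirrors the order of the tests in the text.

```python
def handle_page_request(cached_at_client, cached_on_path, has_uc_gif):
    """Return the ordered steps taken for one page request."""
    steps = []

    # Where does the HTML file come from?
    if cached_at_client:
        steps.append("deliver HTML from client cache")
    elif cached_on_path:
        steps.append("deliver HTML from ISP or intermediate cache")
    else:
        steps.append("origin server retrieves and sends HTML")

    # In every case the browser interprets the HTML and builds the page.
    steps.append("browser builds page from source or cached images")

    # The uncacheable image always goes back to the origin server.
    if has_uc_gif:
        steps.append("send uc.GIF request with query string to origin server")
        steps.append("origin server records enriched activity data")
    return steps


cached_view = handle_page_request(True, False, True)
```

Even when the HTML is served from a cache, the final two steps still reach the origin server, which is what lets the uc.GIF request act as a surrogate log record.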
  • When a user visits a website, the browser examines the URL and looks into a cookie file stored on the client computer's hard drive. If the browser finds a cookie associated with that URL, it sends that cookie information to the server. If no cookie is associated with the URL, the server places a cookie inside the cookie file. Some sites may first ask a series of questions, such as name and password, and then will place a cookie on the hard disk with that information in it. This is typical of sites that require registration. Commonly, a CGI script on the server takes the information that the user has entered and then writes a cookie onto the client computer's hard disk. When the user leaves a web site, the cookie information remains on the hard disk so that the site can recognize the user the next time the user visits the web site, unless the cookie has specifically been written to expire when the user leaves the site.
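  • The server side of this cookie exchange can be sketched in Python with the standard `http.cookies` module. The cookie name "visitor_id", the one-year lifetime, and the function `visitor_cookie` are hypothetical choices made for illustration.

```python
import uuid
from http.cookies import SimpleCookie


def visitor_cookie(request_cookie_header):
    """Return (visitor id, Set-Cookie value or None) for one request."""
    jar = SimpleCookie()
    jar.load(request_cookie_header or "")

    # The browser sent a cookie for this site: recognize the returning user.
    if "visitor_id" in jar:
        return jar["visitor_id"].value, None

    # No cookie yet: issue a persistent one so later visits can be recognized.
    new_id = uuid.uuid4().hex
    out = SimpleCookie()
    out["visitor_id"] = new_id
    out["visitor_id"]["path"] = "/"
    out["visitor_id"]["max-age"] = 365 * 24 * 3600  # illustrative lifetime
    return new_id, out["visitor_id"].OutputString()


returning_id, no_header = visitor_cookie("visitor_id=abc123")
first_id, set_cookie_value = visitor_cookie("")
```

Omitting the lifetime attribute would instead produce a session cookie, which corresponds to the case of a cookie written to expire when the user leaves the site.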
  • On-Line Analytical Processing describes a class of technologies that are designed for live ad hoc data access and analysis. While transaction processing generally relies on relational databases, OLAP has become synonymous with multidimensional views of business data. These multidimensional views are supported by multidimensional database technology. OLAP applications are used by analysts who frequently want a higher level, aggregated view of the data, such as total sales by product line, by region, etc. The OLAP database is usually updated in batch mode, often from multiple sources, and provides an analytical backend to multiple user applications.
  • FIG. 4 illustrates an exemplary site level analysis display that can be derived from the collecting of accurate hit information using the single pixel GIF as a surrogate for the complete set of log records which would have been generated if the web page content had not been cached.
  • The figure depicts the various measurements that can be made for selected intervals of time and includes hits, pages visited, seconds per page view, visits, hits per visit, page views per visit, and seconds per visit.
  • FIG. 5 illustrates an exemplary referral categories display that can be generated from the use of the single pixel GIF to log information pertaining to the web page referral source.
  • The different referral categories include commercial, education, government, internal referrals, ISP referrals, and search engines and directories, among others. Again, the data is presented for selected intervals of time (e.g., calendar weeks).
  • The various referral categories are underlined, which means that they can be “drilled down” to sub-referral categories as illustrated in FIG. 6.
  • FIG. 6 illustrates the breakdown of the search engines and directories referral category for the selected intervals of time based on the referrals made from common search engines or browsers. For example, during the week ending June 10 in which the peak number of page referrals occurred, over 71% were referred by the Yahoo search engine. Further drill down is possible into the search engine referral category as indicated by the underlined subcategories.
  • FIG. 7 illustrates a further drill down of the AltaVista referral subcategory.
  • The display shows that 84% of the referrals from AltaVista during the week ending June 3 originated from a CGI query string on the AltaVista home page. No further drill downs are possible in this referral subcategory.
  • FIG. 8 illustrates an exemplary display of web page by content categories that can be derived from the collecting of accurate hit information using the single pixel GIF as a surrogate for the complete set of log records which would have been generated if the web page content had not been cached.
  • The content categories include draws, home page, news and photos, players, scoreboard, and shop (gift shop), among other content categories. The data is presented for selected intervals of time.
  • The various content categories are underlined, which means they can be drilled down to a lower level of detail.
  • FIG. 9 illustrates a drill down of the home page content category.
  • The resources include the English version home page (/en) accessible via a Java Script-enabled browser; the French version home page (/fr) accessible via a Java Script-enabled browser; the English version home page (/en/index.html) accessible from a browser that is not Java Script-enabled, etc.
  • FIG. 10 illustrates a display of exemplary saved reports that can be generated using OLAP processing of the surrogate log records created through the use of the single pixel GIF of this invention.
  • The saved reports include site level reports, visit distribution reports, traffic reports, content reports, domain/sub-domain reports, etc. Each of the listed reports is underlined, indicating that a detailed report is available simply by clicking on the report name.
  • FIGS. 11A-11M illustrate the format of the corresponding exemplary saved report.
  • FIG. 11A shows the site level report that is available.
  • The available site level report is a site traffic report.
  • The report name is underlined, indicating that a further drill down to a detailed report results from clicking on the report name. Such action would generate a display like that of FIG. 4.
  • The available visit distribution reports are listed in the display of FIG. 11B.
  • FIGS. 11C-11K and 11M illustrate various saved reports that are basically “top 10” lists.
  • FIG. 11C depicts traffic reports and enables display of the top 10 requested resources.
  • FIG. 11D depicts content reports and enables display of the top 10 most requested pages.
  • FIG. 11E depicts sub-domain reports and enables display of the top 10 sub-domains by either pages viewed or by number of visits.
  • FIG. 11F depicts domain reports and enables display of the top 10 domains by either pages viewed or by number of visits.
  • FIG. 11G depicts referral reports and enables display of the top 10 referrals by pages viewed or by number of visits.
  • FIG. 11H depicts entry page reports and enables display of the top 10 site entry pages.
  • FIG. 11I depicts exit page reports and enables display of the top 10 exit pages.
  • FIG. 11J depicts browser reports and enables display of the top 10 browsers by either pages viewed or by number of visits.
  • FIG. 11K depicts platform reports and enables display of the top 10 platforms by pages viewed or by the number of visits.
  • FIG. 11L depicts usage cluster reports and enables display of usage cluster visits.
  • FIG. 11M depicts ad reports and enables display of the top 10 ads by impression created. All of the available saved reports are presented for

Abstract

A method and system for gathering enriched web server activity data in a global communications network in which requested information files are cached at a plurality of network devices. With the prevalence of web caching on the Internet, the origin web servers do not serve the majority of requests for web site content. A single pixel clear Graphics Interchange Format (GIF) request is added to the HyperText Markup Language (HTML) source file for a web page. Appended to the GIF request is a Common Gateway Interface (CGI) string of data that contains enhanced web activity data information, including the number of images (“hits”) that have to be retrieved by a client browser to build the web page, and the referring identifier that resulted in access to the web page. The single pixel clear GIF request is not cacheable and results in the request being transmitted to the origin web server when the client browser interprets the HTML file. The enriched data is stored in log files at the origin web server to accumulate an accurate number of hits on the web page.

Description

This application is a reissue application of U.S. Pat. No. 7,216,149, issued May 8, 2007 on U.S. Ser. No. 09/641,495 filed Aug. 18, 2000.
BACKGROUND OF THE INVENTION
The present invention relates generally to client-server computer systems and, more specifically, to information access requests to a web site server over a global communications network.
All web pages are written with HyperText Markup Language (HTML). Hypertext and universality are two essential features of HTML. Hypertext means that a programmer can create a link on a web page that leads the visitor to any other web page or to practically anything else on the Internet. Hypertext enables information on the web to be accessed from many different directions. Universality means that because HTML documents are saved as ASCII or text only files, virtually any computer can read a web page. HTML lets the web designer format text, add graphics, sound, and video, and save it all in a text or an American Standard Code for Information Interchange (ASCII) file that any computer can read. The key to HTML is in the tags, which are key words enclosed between less than (<) and greater than (>) signs, that indicate the type of content coming up next. While practically any computer can display web pages, how those pages actually look depends on the type of computer, the monitor, the speed of the Internet connection, and the browser software used to view the page.
Advanced web designers often use a scripting language called JavaScript and a system of naming parts of the web page called the document object model (DOM), together with HTML to create dynamic content on a page. These effects are sometimes called dynamic HTML, or DHTML. HTML tags are commands written between angle brackets (< >) that indicate how the browser should display the text. Examples of HTML tags are BASE, FORM, FRAME, IMG and SCRIPT. There are opening and closing versions for many tags and the affected text is contained within the two tags. The opening and closing tags use the same command word; the closing tag carries an initial forward slash (/) symbol. Many tags have special attributes that offer a variety of options for the contained text. The attribute is entered between the command word and the final angle bracket. A series of attributes can be used in a single tag just by writing one after the other, in any order, with a space separating each one. The attributes in turn, often have values. In some cases, a selection of value is made from a small group of choices. Other attributes are more strict about the type of values they accept. Examples of attributes are HREF, SRC, ACCESS-KEY and VALUE.
A web page is nothing more than a text document written with HTML tags. Like any other text document, a web page has a file name that identifies the document to the web site designer, the web site visitors, and a visitor's web browser. Uniform Resource Locators (URLs) contain information about where a file is located and what a browser should do with it. Each file on the Internet has a unique URL. The first part of the URL is called the scheme. It tells the browser how to deal with the file that it is about to open. One of the most common schemes to access web pages is HyperText Transfer Protocol (HTTP). The second part of the URL is the name of a server where the file is located followed by the path that leads to the file and the file name. Sometimes, a URL ends in a trailing forward slash with no file name given. In this case, the URL refers to the default file in the last directory in the path (i.e., index.html), which generally corresponds to the home page. For example, consider the web address “census.rolandgarros.org/rc/images/ . . .”. The domain name is “census.rolandgarros.org”. This is the specific host computer on which corresponding web pages reside. The next segment of the URL is the directory (“rc”) and subdirectory (“images”) on the host computer that contains a specific web site. The last segment of the URL, represented by the ellipsis mark, is the filename of the specific web page being requested.
URLs can be either absolute or relative. An absolute URL shows the entire path to the file, including the scheme, server name, the complete path, and the file name itself. A relative URL describes the location of the desired file with reference to the location of the file that contains the URL itself. The relative URL for a file that is in the same directory as the current file is simply the file name and extension.
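By way of illustration only, the following sketch separates an absolute URL into the parts described above and resolves a relative URL against it. It uses the standard URL class available in current browsers and in Node.js. The absolute URL is the uc.GIF address that appears in FIG. 2; the relative file name “logo.gif” is a hypothetical example and does not appear in the original disclosure.

```javascript
// Absolute URL taken from FIG. 2; "logo.gif" below is a hypothetical relative reference.
const absolute = new URL("http://census.rolandgarros.org/rc/images/uc.GIF");

const segments = absolute.pathname.split("/"); // ["", "rc", "images", "uc.GIF"]
const parts = {
  scheme: absolute.protocol.replace(":", ""), // tells the browser how to handle the file
  server: absolute.hostname,                  // host computer on which the file resides
  directories: segments.slice(1, -1),         // path leading to the file
  fileName: segments[segments.length - 1],    // the file being requested
};

// A relative URL for a file in the same directory is just its file name and extension.
const resolved = new URL("logo.gif", absolute).href;

console.log(parts);
console.log(resolved);
```

The relative reference resolves against the directory of the containing file, so only the final path segment is replaced.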
To view a single page, the browser running on a client computer may request and download numerous files from a web site server. The number of object access requests (“hits”) stored in the web site server's access log will typically exceed the number of distinct client sessions in which clients are accessing information on the web site, reducing the accuracy of the access log.
Data networking is growing at a phenomenal rate. The number of web users is expected to increase by a factor of five over the next few years. The resulting uncontrolled growth of web access requirements is straining all attempts to meet the bandwidth demand. Additionally, although the volume of web traffic on the Internet is staggering, a large percentage of that traffic is redundant, i.e., multiple users at any given site request much of the same content. This means that a significant percentage of the wide area network (WAN) infrastructure carries the identical content and identical requests for accessing it daily. Web caching performs a local storage of web content to serve these redundant user requests more quickly, without sending the requests and the resulting content over the wide area network.
Caching is the technique of keeping frequently accessed information in a location close to the requester. A web cache stores web pages and content on a storage device that is physically or logically closer to the user. This access to stored web content is closer and faster than a web lookup. By reducing the amount of traffic on wide area network links and on already overburdened web servers, caching provides significant benefits to Internet Service Providers (ISPs), enterprise networks, and end users. The two key benefits of web caching are cost savings due to the reduction of WAN bandwidth and improved productivity for end users resulting from quicker access. ISPs can place cache engines at strategic points on their networks to improve response times and lower the bandwidth demand on their backbones. ISPs can station cache engines at strategic WAN access points to serve web requests from local storage, rather than from a distant or overburdened web server. In enterprise networks, the dramatic reduction in bandwidth usage due to web caching allows a lower bandwidth WAN link to service the user base. Alternatively, the organization can add users or add more services that make use of the free bandwidth on the existing WAN link. For the end user, the response of the local web cache is almost three times faster than the download time for the same content over the wide area network. Therefore, users see dramatic improvements in response times, and the implementation of web caching is completely transparent to them.
Web caching offers other benefits including access control, monitoring and operational logging. The cache engine provides network administrators with a simple, secure method to enforce a sitewide access policy through Uniform Resource Locator (URL) filtering. Network administrators can learn which URLs receive hits, the number of hits per second the cache is serving, the percentage of URLs that are served from the cache, along with other related operational statistics.
Web caching starts by an end user accessing a web page over the Internet. While the page is being transmitted to the end user, the caching system saves the page and all of its associated graphics on local storage. The page content is now cached. Another user, or the original user can then access the web page at a later time, but instead of sending the request over the Internet to the web server, the web cache system delivers the web page from local storage. This process speeds download times for the user, and reduces the bandwidth demand on the WAN link. Updating of the cache data can occur in a number of ways depending upon the design of the web cache system.
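The caching sequence just described can be sketched as follows. This is a minimal in-memory model offered only as an illustration, not an implementation of any particular cache engine. The function name createCache, the callback fetchFromOrigin, and the counter originRequests are hypothetical names introduced for this example.

```javascript
// Minimal model of a web cache: the first request for a URL goes to the origin
// server, and later requests for the same URL are served from local storage.
// All names here are hypothetical and chosen for illustration.
function createCache(fetchFromOrigin) {
  const store = new Map();
  return function get(url) {
    if (!store.has(url)) {
      store.set(url, fetchFromOrigin(url)); // cache miss: fetch and keep a copy
    }
    return store.get(url); // served from local storage
  };
}

let originRequests = 0; // what the origin server's log would record
const cachedGet = createCache((url) => {
  originRequests += 1;
  return `content of ${url}`;
});

cachedGet("/en/index.html"); // first user: forwarded to the origin server
cachedGet("/en/index.html"); // another user: served from the cache
cachedGet("/en/index.html"); // original user again: served from the cache

console.log(originRequests); // the origin saw one request for three page views
```

In this model the origin server records one request although the page was viewed three times, which is the undercounting that a publisher of web content faces.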
Web caching can be a major problem for publishers of web content. For example, a publisher can gather an inaccurate number of hits if some of the visitors access web content already in a caching server. Furthermore, if a caching server doesn't update content promptly, it can return expired or stale content to users.
SUMMARY OF THE INVENTION
Cache engines are becoming pervasive on the World Wide Web. As a result, the origin web servers do not serve or see the majority of the user requests for web site content. Packet sniffers will not see the requests either, as they are satisfied by cache engines elsewhere on the Internet. The technique of using a single pixel clear GIF (which is not cacheable) has been used for some years to ensure that the origin server logs some record for advertisements. However, this solution only logs information about the request for the single pixel GIF file itself.
The single-pixel transparent GIF (Graphics Interchange Format) is the most flexible tool in a web designer's toolbox. The use of a transparent GIF is a way to discreetly control the layout of text and graphics on the web page. No matter where the transparent GIF is placed on the page, it will remain unseen with all background graphics and fills remaining untouched. The single pixel clear GIF has been used before, but the data has not been enriched such that it can be used as a surrogate for the complete set of log records.
The present invention enriches the information recorded in the web logs for the uncacheable single pixel clear GIF by appending additional information to it as Common Gateway Interface (CGI) query string parameters. This enables the log record created by the request for the single pixel clear GIF to function as a “surrogate” for the complete set of log records which would have been created if the page content had not been cached.
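As a hedged illustration of the surrogate idea, the sketch below reads the query string of one logged request for the single pixel GIF and derives the page, the number of hits to credit, and the referrer. The parameter names pag, num and ref are those given in the detailed description. The function name expandSurrogate, the sample request, the example.com referrer, and the fallback of one hit when num is absent are assumptions made for this example only.

```javascript
// Derive page-level activity from a single logged request for the uncacheable GIF.
// expandSurrogate and the sample values are hypothetical; pag, num and ref are the
// query string parameters named in the detailed description.
function expandSurrogate(requestTarget) {
  // The URL constructor needs a base for a path-only target; the base is not used further.
  const url = new URL(requestTarget, "http://census.rolandgarros.org");
  const num = Number.parseInt(url.searchParams.get("num"), 10);
  return {
    page: url.searchParams.get("pag"),
    // Assumption: if num is missing, credit only the GIF request itself.
    hits: Number.isNaN(num) ? 1 : num,
    referrer: url.searchParams.get("ref"),
  };
}

const record = expandSurrogate(
  "/rc/images/uc.GIF?pag=/en/index.html&num=14&ref=http://www.example.com/"
);
console.log(record);
```

One logged GIF request thus stands in for the 14 requests that the origin server would have logged had the page and its images not been served from a cache.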
DESCRIPTION OF THE DRAWINGS
The invention is better understood by reading the following detailed description of the invention in conjunction with the accompanying drawings, wherein:
FIG. 1 illustrates an implementation of web cache engines over a global communications network.
FIG. 2 illustrates an exemplary implementation of the uncacheable single pixel GIF with CGI query string parameters added to enrich information recorded in web logs.
FIG. 3 illustrates the processing logic for handling client requests for web pages utilizing the single pixel transparent GIF in accordance with a preferred embodiment of the present invention.
FIG. 4 illustrates a site level analysis display that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
FIG. 5 illustrates an exemplary display of referral categories that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
FIG. 6 illustrates an exemplary display of referral category for search engines and directories that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
FIG. 7 illustrates an exemplary display of the referral results for a specific search engine that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
FIG. 8 illustrates exemplary content categories for various web pages that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
FIG. 9 illustrates an exemplary content category for a home page that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
FIG. 10 illustrates an exemplary display of the available saved reports that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
FIGS. 11A-11M illustrate various available saved reports that can be generated based on the implementation of the single pixel transparent GIF of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Web server software typically collects and saves information pertaining to each HTTP request, including date and time, the originating Internet Protocol (IP) address, the object requested, and the completion status of the request. The logs are analyzed on a periodic basis to determine the traffic through the server in terms of hits, the number of pages served, and the level of demand for pages of interest during each period.
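The periodic log analysis described above might be sketched as follows. The record layout, the sample entries, and the rule that a “page” is a successfully served object whose name ends in “.html” are all assumptions made for illustration; actual web server log formats differ.

```javascript
// Hypothetical access log records with the fields listed above:
// date and time, originating IP address, object requested, completion status.
const accessLog = [
  { time: "2000-06-03T10:00:00Z", ip: "192.0.2.1", object: "/en/index.html", status: 200 },
  { time: "2000-06-03T10:00:01Z", ip: "192.0.2.1", object: "/images/logo.gif", status: 200 },
  { time: "2000-06-03T10:05:00Z", ip: "192.0.2.7", object: "/en/index.html", status: 200 },
  { time: "2000-06-03T10:05:01Z", ip: "192.0.2.7", object: "/images/missing.gif", status: 404 },
];

function summarize(records) {
  const demand = {}; // page views per page of interest
  for (const r of records) {
    // Assumption: a page is a successfully served object ending in ".html".
    if (r.status === 200 && r.object.endsWith(".html")) {
      demand[r.object] = (demand[r.object] || 0) + 1;
    }
  }
  const pages = Object.values(demand).reduce((sum, n) => sum + n, 0);
  return { hits: records.length, pages, demand }; // every logged request is a hit
}

const summary = summarize(accessLog);
console.log(summary);
```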
Internet browser applications allow an individual user to cache web pages on his local hard disk. A user can configure the amount of disk space devoted to caching. The first time a user views a website, that content is saved as files in a subdirectory on that computer's hard disk. The next time the user points to this website, the browser gets the content from the cache without accessing the network. Certain elements of the page, including buttons, icons and images, appear much more quickly than they did the first time the page was opened.
To limit bandwidth demand caused by the uncontrolled growth of Internet use, software developers have developed applications that extend local caching to the network level. The two current types of network level caching products are proxy servers and network caches. Proxy servers are software applications that run on general-purpose hardware and operating systems. A proxy server is placed on hardware that is physically between a web browser client application and a web server. The proxy server acts as a gatekeeper that receives all the packets destined for the web server and examines each packet to determine whether it can fulfill the request itself. If the proxy cannot fulfill the request itself, it forwards the request to the web server. Proxy servers can be used to filter requests, e.g., to prevent employees from accessing specific websites. The problem with using proxy servers is that they are not optimized for caching and can fail under a heavy network load. Traffic is slowed to allow the proxy servers to examine each packet, and the failure of the proxy software or hardware causes all users to lose network access. Furthermore, proxy servers require configuration of each end-user's browser, which is an unacceptable option for ISPs and large enterprises. Because of these shortcomings of proxy servers, applications that create network caches have become popular. These caching-focused software applications are designed to improve performance by enhancing the caching software and eliminating the other slow aspects of proxy server implementations. Because a proxy server runs under a general purpose operating system that involves very high per-process context overhead, it is not easily scalable to large numbers of simultaneous processes.
Networking product vendors offer cache engines as a single purpose network appliance that stores and retrieves content using caching and retrieval algorithms. Such cache engines are dedicated solely to content management and delivery. Since only web requests are routed to the cache engine, no other user traffic is affected by the caching process. For non-web traffic, the router functions entirely in its traditional role. Communication between a cache engine and a router is defined by a cache control protocol. Under this protocol, the router directs only web requests to the cache engine rather than to the intended server. With a cache engine, a client requests web content in the usual manner. A router running a cache control protocol intercepts Transmission Control Protocol (TCP) port 80 web traffic and routes it to the cache engine. The client is not involved in the transaction, and no changes to the client or browser are required. If the cache engine does not have the requested content, it sends the request to the Internet or Intranet in the usual fashion. The content is returned to and stored at the cache engine. The cache engine returns the content to the client. Upon subsequent requests for the same content, the cache engine fulfills the requests from local storage.
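The router's redirection decision can be illustrated with the short sketch below. It models only the rule stated above, namely that TCP port 80 traffic is diverted to the cache engine while all other traffic is routed normally. The function name nextHop, the packet fields, and the returned labels are hypothetical, and the sketch does not represent any actual cache control protocol.

```javascript
// Decide where a router forwards a packet: web traffic (TCP port 80) goes to the
// cache engine, everything else follows the router's traditional role.
// nextHop, the packet shape and the labels are hypothetical.
function nextHop(packet) {
  const isWebTraffic = packet.protocol === "TCP" && packet.dstPort === 80;
  return isWebTraffic ? "cache-engine" : "default-route";
}

console.log(nextHop({ protocol: "TCP", dstPort: 80 })); // web request
console.log(nextHop({ protocol: "TCP", dstPort: 25 })); // non-web traffic
console.log(nextHop({ protocol: "UDP", dstPort: 80 })); // not TCP, so not diverted
```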
FIG. 1 illustrates an implementation of web cache engines over a global communications network such as the Internet. A client computer 12, 14, 16 can request web content via a router 18.
The router 18 intercepts TCP Port 80 web traffic and routes it to the local cache engine 20. The client 12, 14, 16 is not involved in this transaction and no changes to the client computer or browser are required. If the cache engine 20 does not have the requested content, it sends the request via router 18 to the Internet to access an Internet content server 40, 42, 44. The content is returned to, and stored at, the cache engine 20. The cache engine 20 then returns the requested content to the client computer 12, 14, 16 via the router 18. Several cache engines 32, 34, 36 can be placed in a cache farm in a hierarchical fashion at an Internet Service Provider (ISP) site 30. Requests from clients 12, 14, 16 directed through router 18 and ISP server 30 are diverted to the cache farm 32, 34, 36 to fulfill the client request from its storage. If the cache engines 32, 34, 36 are unable to fulfill the request from local storage, a normal web request is made via ISP server 30 over the Internet 50 to an appropriate server 40, 42, 44 for the requested Internet content. In addition to router 18, routers 26, 46 are also shown connected to ISP server 30. Routers 18, 26, 46 are frequently referred to as Points-of-Presence (POPs). A POP is the location of an access point to the Internet and has a unique Internet IP address. A POP usually includes routers, digital/analog call aggregators, servers and frequently frame relay or Asynchronous Transfer Mode (ATM) switches. Shown connected to router 46 is cache engine 48. Connected to router 26 is cache engine 28 and router 24. Router 24 is connected to a corporate intranet 22.
Because the router redirects packets destined for web servers to the cache engine, the cache engine operates transparently to clients. Clients do not need to configure their browsers to be in proxy server mode. In addition, the operation of the cache engine is transparent to the network. The router operates entirely in its normal role for non-web traffic.
A web object can contain a Hypertext Transfer Protocol (HTTP) header to instruct a browser or a caching server how to cache the web object. For a static image, such as a company logo, the expiration header can be set to “no expiration” so that caching servers can keep the image in the cache forever. In order to gather the exact number of hits on a specific page, e.g., an advertisement, a small image object can be added to the page with the object set to expire immediately, so the caching server won't cache the object. Then, every time a user requests that page, the browser or caching server will retrieve the object from the original web server, and the web server can then count the exact number of requests.
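For illustration, the two caching policies described above could be expressed with standard HTTP response headers as sketched below. The header names Cache-Control, Pragma and Expires are standard HTTP headers, but the specific values, the function name cacheHeadersFor, and the two policy labels are assumptions of this example and are not taken from the original disclosure.

```javascript
// Choose response headers for a long-lived static image versus an object that
// must never be served from a cache. Names and values are illustrative assumptions.
function cacheHeadersFor(kind) {
  if (kind === "static") {
    // e.g., a company logo: let caches keep it for a long time (one year here)
    return { "Cache-Control": "public, max-age=31536000" };
  }
  // e.g., the single pixel GIF: expire immediately so every request reaches the origin
  return {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
  };
}

console.log(cacheHeadersFor("static"));
console.log(cacheHeadersFor("uncacheable"));
```

Pragma and Expires are included alongside Cache-Control because older caches may not honor Cache-Control alone.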
The Common Gateway Interface (CGI) is a simple interface (protocol) for running external programs, software or gateways under an information server in a platform-independent manner. CGI is simply a standardized way for sending information between the server and the script. The CGI script is a program that communicates with the server in a standard way. Currently, the supported information servers are HTTP servers. Each CGI server implementation must define a mechanism to pass data about the request from the server to the script.
Each element on a web page form will have a name and value associated with it. The name identifies the data being sent. The value is the data and can either come from the web page designer or from the visitor who types it in a field. When a visitor clicks the submit button, the name-value pair of each form element is sent to the server. CGI scripts generally have two functions. The first is to take all the name-value pairs and separate them out into individual intelligible pieces. The second is to actually do something with that data, such as printing it out, multiplying fields together, sending an email confirmation, or storing it on a server. The form has three important parts: the form tag, which includes the URL of the CGI script that will process the form; the form elements, such as fields and menus; and the submit button which sends the data to the CGI script on the server. Scripts are little programs that add interactivity to a web page. Simple scripts can be written to add an alert box or some text to the web page; more complicated scripts can be written that load particular pages according to the visitor's browser or that change a frame's background color depending on the visitor's mouse clicks. Most scripts are written in a scripting language called JavaScript that is supported by most browsers, including Netscape Communicator and Microsoft Internet Explorer.
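The two functions of a CGI script noted above, separating the name-value pairs and then acting on the data, can be sketched as follows. The form fields (name, quantity, price) and the sample submission are hypothetical. The sketch uses the standard URLSearchParams class rather than any particular CGI library.

```javascript
// Hypothetical form submission as it would arrive at the server:
// name-value pairs joined by "&", with "+" standing for a space.
const submitted = "name=Ada+Lovelace&quantity=3&price=2.50";

// First function: separate the pairs into individual intelligible pieces.
const fields = Object.fromEntries(new URLSearchParams(submitted));

// Second function: do something with the data, here multiplying two fields together.
const total = Number(fields.quantity) * Number(fields.price);

console.log(fields);
console.log(total);
```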
JavaScript is an object-oriented language, which means that it works by manipulating objects on a web page, such as windows, images and documents. JavaScript commands are put directly into the HTML file that creates a web page. Depending on the script being run, the commands can be placed into several parts of the file. The commands are frequently placed near the top of the file.
Special codes set off the commands, alerting the browser that they are JavaScript commands. If the commands are put before the HTML <BODY> tag at the top of the file, then the script will be able to start executing while the HTML page is still loading. JavaScript is an interpreted language, which means its commands are executed by the browser in the order in which the browser reads them. JavaScript works by taking actions on objects. These actions are called methods. In the basic syntax of JavaScript, the object is first named, and then a period appears, followed by the action taken on the object, i.e., the method. So the command to open a new window in JavaScript is window.open. In this instance, window is the object and open is the method. This command opens a new browser window. Other parameters can be added after the command. All the parameters are placed inside one set of parentheses, with each individual parameter inside quotation marks, with the parameters separated by commas.
An automatic script is executed by the client browser when the web page is loaded. There is no limit to the number of automatic scripts that can be on a web page. The location of the script on the HTML page determines when the script will load. Scripts are loaded in the order in which they appear in an HTML document. An automatic Java Script is added to an HTML document by the following HTML code:
    • <SCRIPT LANGUAGE="JavaScript">
    • type content of the script
    • </SCRIPT>
Some of the older browsers cannot run scripts and will not understand the SCRIPT tag. In order to provide information to a visitor accessing an HTML page, an alternate way to provide information is through the use of the NOSCRIPT tag, followed by the information that is treated as regular text. The older browser won't understand the NOSCRIPT tag and will ignore it, but process the following text. The following is added to the HTML document:
    • <NOSCRIPT>
    • type the information
    • </NOSCRIPT>
In the implementation of the single pixel GIF to create surrogate log files, the following tags and attributes are used as illustrated in FIG. 2 discussed below:
    • IMG is the HTML tag for inserting images on a page;
    • ALT is an attribute for offering alternate text that is displayed if the image is not;
    • SRC is an attribute for specifying the URL of the image;
Also illustrated in FIG. 2 are the following attributes for the IMG tag:
    • WIDTH, HEIGHT are attributes for specifying the size of the image so that the HTML page can be loaded more quickly;
    • BORDER is an attribute for specifying the thickness of the border, if any. BORDER=0 omits the border that a browser would otherwise place automatically around an image.
In a preferred embodiment of the present invention, a CGI string of data is appended to the SRC attribute for the single pixel GIF at the time the page is published, as follows:
    • &pag=xxxxxxx is the absolute URL of the page on which the GIF appears;
    • &num=xx is the number of elements (SRCs) on the page at the time of publishing;
    • &ref=xxxxxxxxx is the URL of the page which requested the current page (this is done via Java Script).
In addition, the persistent cookie identification of the user's cookie can be appended to the CGI string of data as follows:
    • &usr=xxxxxxxx is the persistent cookie ID of the user cookie (via Java Script).
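A minimal sketch of assembling such a CGI string is given below, assuming the parameter names pag, num, ref and usr listed above. The function name buildSurrogateSrc, the sample page path, and the example.com referrer are hypothetical. Percent-encoding the ref and usr values with encodeURIComponent is an addition made in this sketch so that characters such as “&” in a referrer do not break the query string; it is not described in the original disclosure.

```javascript
// Build the SRC value for the single pixel GIF with the enriched query string.
// buildSurrogateSrc and the sample values are hypothetical. Encoding ref and usr
// is an assumption of this sketch, not part of the original description.
function buildSurrogateSrc(gifUrl, { pag, num, ref, usr }) {
  let src = `${gifUrl}?pag=${pag}&num=${num}`;
  if (ref) src += `&ref=${encodeURIComponent(ref)}`; // only when a referrer exists
  if (usr) src += `&usr=${encodeURIComponent(usr)}`; // optional persistent cookie ID
  return src;
}

const src = buildSurrogateSrc("http://census.rolandgarros.org/rc/images/uc.GIF", {
  pag: "/en/index.html",
  num: 14,
  ref: "http://www.example.com/search?q=tennis",
});
console.log(src);
```

Because the referrer and cookie ID are known only in the client browser, these two values would be appended by script at page load, whereas pag and num can be fixed when the page is published.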
FIG. 2 illustrates an example of an implementation of the single pixel GIF with the addition of query string parameters to act as a surrogate for the complete set of log records that would have been created had the page content not been cached. In FIG. 2, the Java Script statements are embedded directly on the HTML page. It includes a document object with a write method (“document.write”). The document object contains information on the current document and provides methods for displaying the HTML expressions to the user in a specified window. The IMG and BR tags are the HTML expressions that are displayed in the window. The BR CLEAR tag and attribute simply create a line break and stop text wrap. The SRC attribute following the IMG tag provides the absolute URL of the page containing the single pixel clear GIF (“uc.GIF”); i.e.,
    • SRC=“http://census.rolandgarros.org/rc/images/uc.GIF?pag=‘+location.pathname+‘&num=14’+r+’”.
The CGI string following uc.GIF indicates that there are 14 SRC elements on the HTML page. The URL of the referrer page is indicated by a variable “r”, which is defined as ‘&ref=’+top.document.referrer based on a true condition to the “if” statement (i.e., the document referrer object is not empty). The Java Script top.document.referrer reflects the URL of the calling document (i.e., referrer page) that the user was viewing before the current page.
In the event the client browser cannot interpret a scripting language, the NOSCRIPT tag demarcates the HTML statements to be interpreted by the browser. This includes the IMG tag wherein the SRC attribute has a query string after “uc.GIF” that is modified to include the default URL of the HTML page (i.e., “index.html”). The index.html file is the default file for the top level directory on the web site.
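The scripted and NOSCRIPT variants described above might be generated at publish time along the following lines. This is a sketch under stated assumptions: the function name, the WIDTH, HEIGHT, BORDER and ALT values, and the example.com URL are hypothetical, and the exact markup of FIG. 2 is not reproduced here.

```javascript
// Hypothetical publish-time generator for the two variants of the request.
// Attribute values below are assumed; the text names the attributes only.
function beaconMarkup(gifUrl, num, defaultPage) {
  const attrs = 'WIDTH=1 HEIGHT=1 BORDER=0 ALT=""';
  // Scripted variant: the browser fills in the page path and the referrer.
  const script = [
    '<SCRIPT LANGUAGE="JavaScript">',
    "var r = '';",
    "if (top.document.referrer != '') { r = '&ref=' + top.document.referrer; }",
    `document.write('<IMG SRC="${gifUrl}?pag=' + location.pathname + '&num=${num}' + r + '" ${attrs}>');`,
    "</SCRIPT>",
  ].join("\n");
  // Fallback variant: no scripting, so the default page name is fixed in advance.
  const noscript = [
    "<NOSCRIPT>",
    `<IMG SRC="${gifUrl}?pag=${defaultPage}&num=${num}" ${attrs}>`,
    "</NOSCRIPT>",
  ].join("\n");
  return `${script}\n${noscript}`;
}

console.log(beaconMarkup("http://www.example.com/images/uc.GIF", 14, "index.html"));
```

The fallback cannot carry a referrer, because that value is only available to a browser that runs the script.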
In order to serve up web pages, web sites need a host computer and server software that runs on the host. The host manages the communications and protocols, and houses the pages and related software required to create a website on the Internet. The server software resides on the host and serves up the pages, and otherwise acts on the requests sent by the client's browser software. The server handles the HTTP requests and communications with the host operating system, which in turn handles the TCP/IP communications. There are different types of server software that perform different types of services for different types of clients. Specifically, a web server is an HTTP server and its function is to send information to the client software (browser) using the HyperText Transfer Protocol. The client browser requests that the server return an HTML document. The server receives this request and sends back a response. The top portion of the response includes transmission information and the rest of the response is the HTML file. In addition to sending pages to the browser, a web server also passes requests to run CGI scripts to CGI applications. These scripts run external mini-programs, such as a database lookup or interactive forms processing. The server sends the script to the application via CGI and communicates the script's output back to the browser. The server software also includes configuration files and utilities to secure and manage the website in a variety of ways.
FIG. 3 illustrates the processing logic of the present invention. The process starts in start block 300. In logic block 302, the client browser software requests an HTML web page. The client browser determines if the requested HTML page has been cached at the client in decision block 304. If the page has been cached at the client, then the HTML file is delivered to the browser as indicated in logic block 310. The browser interprets the HTML file and builds the web page with source (i.e., from the origin web server) or cached images. The cached images can be available locally or at an ISP, or at a router or other network device along the path. If in decision block 304, it is determined that the page is not cached at the client, then another test is performed in decision block 306 to determine if the page has been cached at an ISP. The ISP cache test is intended to be illustrative of an embodiment of the invention. The next hop from the client can be to a server on an intranet which has a TCP/IP address and provides direct Internet access. If the page has been cached along the path, then, as indicated in logic block 312, the HTML file is delivered to the client browser to interpret the HTML code and build the web page with images that have been cached or retrieved from the origin web server. If the page has not been cached along the path to the web server, the request for the page is transmitted to the host where the web server software processes the request as indicated in logic block 308. If the browser has requested an HTML file, the web server retrieves the original source HTML file, attaches a header to the file, and sends the file to the browser as indicated in logic block 314.
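The lookup order of FIG. 3 (client cache, then a cache along the path, then the origin web server) can be modeled roughly as follows. The function and object names are hypothetical, and the caches are plain in-memory maps rather than real network elements; the point of the sketch is that only a request reaching the origin leaves a log record there.

```javascript
// Hypothetical origin server: every request that reaches it is logged.
function makeOrigin(pages) {
  return {
    log: [],
    fetch(url) {
      this.log.push(url); // one log record per request served by the origin
      return pages.get(url);
    },
  };
}

// Resolves a page in the order described for decision blocks 304 and 306.
function resolvePage(url, clientCache, pathCache, origin) {
  if (clientCache.has(url)) {
    return { html: clientCache.get(url), servedBy: "client cache" }; // block 310
  }
  if (pathCache.has(url)) {
    return { html: pathCache.get(url), servedBy: "path cache" }; // block 312
  }
  return { html: origin.fetch(url), servedBy: "origin" }; // blocks 308 and 314
}

const origin = makeOrigin(new Map([["/en/", "<HTML>home</HTML>"]]));
const ispCache = new Map([["/en/", "<HTML>home</HTML>"]]);
const viaCache = resolvePage("/en/", new Map(), ispCache, origin);
const viaOrigin = resolvePage("/en/", new Map(), new Map(), origin);
console.log(viaCache.servedBy, viaOrigin.servedBy, origin.log.length);
// prints "path cache origin 1"
```

Two page views occurred in this example, but the origin logged only one of them, which is the gap the uncacheable request is meant to close.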
Once the browser has received the HTML file from the processing in logic blocks 310, 312 or 314, a test is made in decision block 318 to determine if the HTML file contains an uncacheable single pixel GIF (represented by uc.GIF in this invention). If it does not, the retrieved cached images are displayed to complete the build of the requested web page in logic block 316. Processing of the request is then completed as indicated by termination block 326. If, in decision block 318, a uc.GIF request is found in the HTML file, then the uc.GIF and CGI query string are transmitted to the origin web server in logic block 320 where they are analyzed to gather the enriched web server activity data made possible by this invention. The browser again interprets the HTML code and builds the page with source or cached images. Using the example of FIG. 2, 14 hits are recorded for the web page, including one for the transmitted uc.GIF request and 13 for the other source images that are retrieved based on the HTML IMG SRC tags/attributes in the HTML file. This represents the surrogate nature of using the uncacheable single pixel GIF requests. The referrer page for the 14 hits is also contained as part of the CGI query string. In FIG. 2, this is represented by “r=‘&ref=’+top.document.referrer”. The gathering and storing of this enriched web server activity data is indicated by logic block 322. The request processing then ends as indicated in termination block 324.
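On the server side, reading the enriched data out of a received uc.GIF request could look like the sketch below. The record field names, the function name and the placeholder base host are assumptions; the parameter names and the count of 14 hits follow the example in the text. The standard URL and URLSearchParams classes are used for parsing.

```javascript
// Hypothetical handler: turns one uc.GIF request into one activity record.
function recordBeaconHit(requestUrl, activityLog) {
  // The base host is a placeholder so that a relative request path can be parsed.
  const query = new URL(requestUrl, "http://origin.invalid").searchParams;
  const record = {
    page: query.get("pag"),
    // num stands in for the hits that cached delivery kept from the log.
    surrogateHits: Number(query.get("num")),
    referrer: query.get("ref"), // null when the parameter is absent
    userId: query.get("usr"),   // null when the parameter is absent
  };
  activityLog.push(record);
  return record;
}

const activityLog = [];
const record = recordBeaconHit(
  "/rc/images/uc.GIF?pag=/en/index.html&num=14&ref=http://www.example.com/",
  activityLog
);
console.log(record);
```

A single request therefore yields a record equivalent to the 14 hits of the example, together with the referrer that the individual image requests would not have carried in this form.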
When a user visits a website, the browser examines the URL and looks into a cookie file stored on the client computer's hard drive. If the browser finds a cookie associated with that URL, it sends that cookie information to the server. If no cookie is associated with the URL, the server places a cookie inside the cookie file. Some sites may first ask a series of questions, such as name and password, and then will place a cookie on the hard disk with that information in it. This is typical of sites that require registration. Commonly, a CGI script on the server takes the information that the user has entered and then writes a cookie onto the client computer's hard disk. When the user leaves a web site, the cookie information remains on the hard disk so that the site can recognize the user the next time the user visits the web site, unless the cookie has specifically been written to expire when the user leaves the site.
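The cookie exchange described above can be sketched as follows: the server reads the cookie sent with the request, and issues a persistent one only when none is present. The cookie name "usr", the expiry date and the function name are assumptions made for illustration.

```javascript
// Hypothetical cookie handling: reuse an existing ID or issue a new one.
function handleCookie(cookieHeader, newId) {
  const cookies = new Map();
  for (const pair of (cookieHeader || "").split(";")) {
    const idx = pair.indexOf("=");
    if (idx > 0) {
      cookies.set(pair.slice(0, idx).trim(), pair.slice(idx + 1).trim());
    }
  }
  if (cookies.has("usr")) {
    // Returning visitor: the browser already sent the persistent ID.
    return { userId: cookies.get("usr"), setCookie: null };
  }
  // First visit: an expiry date in the future makes the cookie persistent.
  const expires = new Date(Date.UTC(2030, 0, 1)).toUTCString();
  return { userId: newId, setCookie: `usr=${newId}; Expires=${expires}; Path=/` };
}

console.log(handleCookie("lang=en; usr=abc123", "unused"));
console.log(handleCookie("", "u42"));
```

Omitting the Expires attribute would instead produce a cookie that the browser discards at the end of the session, which corresponds to the expiring case mentioned in the text.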
With the capability to gather enriched information through the use of the single pixel GIF described above, much more detailed and accurate information regarding web site activity can be collected and stored in multidimensional databases, including multidimensional implementations of a relational database. Furthermore, this collected data also can be analyzed using relatively new techniques such as On-line Analytical Processing (OLAP), described briefly below.
On-Line Analytical Processing (OLAP) describes a class of technologies that are designed for live ad hoc data access and analysis. While transaction processing generally relies on relational databases, OLAP has become synonymous with multidimensional views of business data. These multidimensional views are supported by multidimensional database technology. OLAP applications are used by analysts who frequently want a higher level, aggregated view of the data, such as total sales by product line, by region, etc. The OLAP database is usually updated in batch mode, often from multiple sources, and provides an analytical backend to multiple user applications.
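The aggregated views mentioned above, such as total sales by product line and by region, amount to grouping records by one or more dimensions and summing a measure. A minimal in-memory sketch follows; the function name, the field names and the sample figures are invented for illustration, and a real OLAP system would rely on a multidimensional database rather than on a loop like this.

```javascript
// Hypothetical roll-up: sums one measure over the chosen dimensions.
function rollUp(records, dimensions, measure) {
  const totals = new Map();
  for (const rec of records) {
    const key = dimensions.map((d) => rec[d]).join(" | ");
    totals.set(key, (totals.get(key) || 0) + rec[measure]);
  }
  return totals;
}

// Illustrative data only.
const sales = [
  { productLine: "rackets", region: "EU", amount: 100 },
  { productLine: "rackets", region: "US", amount: 50 },
  { productLine: "rackets", region: "EU", amount: 25 },
  { productLine: "shoes", region: "EU", amount: 40 },
];
const byLineAndRegion = rollUp(sales, ["productLine", "region"], "amount");
const byLine = rollUp(sales, ["productLine"], "amount");
console.log(byLineAndRegion, byLine);
```

Dropping a dimension from the list gives the higher level view, and adding one gives the more detailed view.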
FIG. 4 illustrates an exemplary site level analysis display that can be derived from the collecting of accurate hit information using the single pixel GIF as a surrogate for the complete set of log records which would have been generated if the web page content had not been cached. The figure depicts the various measurements that can be made for selected intervals of time and includes hits, pages visited, seconds per page view, visits, hits per visit, page views per visit, and seconds per visit.
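The derived measurements listed for FIG. 4 follow from three totals per interval. The sketch below assumes that each visit has already been reduced to a count of hits, a count of page views and a duration in seconds; that record shape, the function name and the sample numbers are hypothetical.

```javascript
// Hypothetical site level summary for one interval of time.
function siteLevelSummary(visits) {
  const total = (field) => visits.reduce((sum, v) => sum + v[field], 0);
  const hits = total("hits");
  const pageViews = total("pageViews");
  const seconds = total("seconds");
  return {
    hits,
    pageViews,
    visits: visits.length,
    secondsPerPageView: seconds / pageViews,
    hitsPerVisit: hits / visits.length,
    pageViewsPerVisit: pageViews / visits.length,
    secondsPerVisit: seconds / visits.length,
  };
}

// Illustrative data only: two visits within the interval.
const summary = siteLevelSummary([
  { hits: 28, pageViews: 2, seconds: 60 },
  { hits: 14, pageViews: 1, seconds: 30 },
]);
console.log(summary);
```

The sketch does not guard against an interval with no visits or no page views, in which case the ratios would not be meaningful.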
FIG. 5 illustrates an exemplary referral categories display that can be generated from the use of the single pixel GIF to log information pertaining to the web page referral source. The different referral categories include commercial, education, government, internal referrals, ISP referrals, and search engines and directories among them. Again, the data is presented for selected intervals of time (e.g., calendar weeks). The various referral categories are underlined, which means that they can be “drilled down” to sub-referral categories as illustrated in FIG. 6.
FIG. 6 illustrates the breakdown of the search engines and directories referral category for the selected intervals of time based on the referrals made from common search engines or browsers. For example, during the week ending June 10 in which the peak number of page referrals occurred, over 71% were referred by the Yahoo search engine. Further drill down is possible into the search engine referral category as indicated by the underlined subcategories.
FIG. 7 illustrates a further drill down of the AltaVista referral subcategory. For example, the display shows that 84% of the referrals from AltaVista during the week ending June 3 originated from a CGI query string on the AltaVista home page. No further drill downs are possible in this referral subcategory.
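The drill down from referral category to individual referrer, as in FIGS. 5 through 7, can be approximated by counting referrers in a two level structure. The classification rules below are assumptions made for this sketch (the description does not state how categories are assigned), as are the host names and function names.

```javascript
// Hypothetical list of hosts treated as search engines and directories.
const SEARCH_HOSTS = new Set(["search.example.com"]);

// Assumed classification rules; anything unmatched falls into "commercial".
function categorize(referrer, siteHost) {
  const host = new URL(referrer).hostname;
  if (host === siteHost) return "internal referrals";
  if (SEARCH_HOSTS.has(host)) return "search engines and directories";
  if (host.endsWith(".edu")) return "education";
  if (host.endsWith(".gov")) return "government";
  return "commercial";
}

// Builds category -> host -> count, so each category can be drilled down.
function drillDown(referrers, siteHost) {
  const tree = {};
  for (const ref of referrers) {
    const category = categorize(ref, siteHost);
    const host = new URL(ref).hostname;
    tree[category] = tree[category] || {};
    tree[category][host] = (tree[category][host] || 0) + 1;
  }
  return tree;
}

const tree = drillDown(
  [
    "http://search.example.com/?q=draws",
    "http://search.example.com/?q=players",
    "http://www.example.edu/sports.html",
    "http://www.example.org/en/",
  ],
  "www.example.org"
);
console.log(JSON.stringify(tree, null, 2));
```

A further level, such as the path or query string of the referring page, could be added in the same way to mirror the deeper drill down of FIG. 7.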
FIG. 8 illustrates an exemplary display of web page by content categories that can be derived from the collecting of accurate hit information using the single pixel GIF as a surrogate for the complete set of log records which would have been generated if the web page content had not been cached. The content categories include draws, home page, news and photos, players, scoreboard, and shop (gift shop) among other content categories. The data is presented for selected intervals of time. The various content categories are underlined which means they can be drilled down to a lower level of detail.
FIG. 9 illustrates a drill down of the home page content category. The resources include the English version home page (/en) accessible via a Java Script-enabled browser; the French version home page (/fr) accessible via a Java Script-enabled browser; the English version home page (/en/index.html) accessible from a browser that is not Java Script-enabled, etc. For the peak traffic week ending June 10, 58% of the home page traffic was directed to the English-version page and initiated from a Java Script-enabled browser. Slightly less than 42% of the traffic was directed to the French-version page initiated from a Java Script-enabled browser.
FIG. 10 illustrates a display of exemplary saved reports that can be generated using OLAP processing of the surrogate log records created through the use of the single pixel GIF of this invention. The saved reports include site level reports, visit distribution reports, traffic reports, content reports, domain/sub-domain reports etc. Each of the listed reports is underlined indicating that a detailed report is available simply by clicking on the report name.
FIGS. 11A-11M illustrate the format of the corresponding exemplary saved report. FIG. 11A shows the site level report that is available. In this instance, the available site level report is a site traffic report. The report name is underlined, indicating that a further drill down to a detailed report results from clicking on the report name. Such action would generate a display like that of FIG. 4. The available visit distribution reports are listed in the display of FIG. 11B. FIGS. 11C-11K and 11M illustrate various saved reports that are basically “top 10” lists. FIG. 11C depicts traffic reports and enables display of the top 10 requested resources. FIG. 11D depicts content reports and enables display of the top 10 most requested pages. FIG. 11E depicts sub-domain reports and enables display of the top 10 sub-domains by either pages viewed or by number of visits. FIG. 11F depicts domain reports and enables display of the top 10 domains by either pages viewed or by number of visits. FIG. 11G depicts referral reports and enables display of the top 10 referrals by pages viewed or by number of visits. FIG. 11H depicts entry page reports and enables display of the top 10 site entry pages. FIG. 11I depicts exit page reports and enables display of the top 10 exit pages. FIG. 11J depicts browser reports and enables display of the top 10 browsers by either pages viewed or by number of visits. FIG. 11K depicts platform reports and enables display of the top 10 platforms by pages viewed or by the number of visits. FIG. 11L depicts usage cluster reports and enables display of usage cluster visits. FIG. 11M depicts ad reports and enables display of the top 10 ads by impression created. All of the available saved reports are presented for selected intervals of time such as the most recent five weeks.
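The “top 10” reports share one pattern: count the records per value of some field, sort by count, and keep the first ten. A sketch of that pattern is given below; the function name, the record fields and the sample data are hypothetical.

```javascript
// Hypothetical "top N" report: most frequent values of one field.
function topN(records, key, n = 10) {
  const counts = new Map();
  for (const rec of records) {
    counts.set(rec[key], (counts.get(rec[key]) || 0) + 1);
  }
  // Sort by descending count and keep the first n [value, count] pairs.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

// Illustrative data only.
const pageViews = [
  { page: "/en/", browser: "A" },
  { page: "/en/scores", browser: "B" },
  { page: "/en/", browser: "A" },
  { page: "/fr/", browser: "A" },
  { page: "/en/", browser: "B" },
];
console.log(topN(pageViews, "page"));
console.log(topN(pageViews, "browser", 1));
```

Ranking by number of visits instead of pages viewed would use the same function over visit records rather than page view records.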
The corresponding structures, materials, acts, and equivalents of any means plus function elements in any claims below are intended to include any structure, material, or acts for performing the functions in combination with other claimed elements as specifically claimed.
While the invention has been particularly shown and described with reference to embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the present invention.

Claims (59)

1. A system for obtaining enriched activity data in a client-server communications network wherein information requested by a network element is cached at one or more other network elements, comprising:
a server network element including server software and a database for generating and storing a plurality of information files that are accessible to a requesting network element, the information files including text files and key words that are interpreted by the requesting network element to display the information requested, the information file further including an uncacheable single pixel Graphics Image Format (GIF) request;
wherein upon interpreting the information file, the single pixel GIF request is transmitted from the requesting element over the communications network to the server network element which reads and stores enriched data contained therein.
2. The system for obtaining enriched activity data of claim 1 further comprising one or more cache engines that are connected to at least one of the other network elements for temporarily storing requested information files that are served upon demand to the requesting network element.
3. The system for obtaining enriched activity data of claim 1 wherein the single pixel GIF request includes a Common Gateway Interface (CGI) query string appended to it that contains the enriched data.
4. The system for obtaining enriched activity data of claim 3 wherein the CGI query string includes an identification of the location of the requested information file.
5. The system for obtaining enriched activity data of claim 3 wherein the CGI query string includes a number of image objects contained in the information file.
6. The system for obtaining enriched activity data of claim 3 wherein the CGI query string includes an identification of a network element that referred the requesting network element to the server network element.
7. The system for obtaining enriched activity data of claim 3 wherein the CGI query string includes a persistent cookie identification of the requesting network element.
8. The system for obtaining enriched activity data of claim 1 wherein the client-server communications network is a global network such as the Internet.
9. The system for obtaining enriched activity data of claim 1 wherein the plurality of information files are hypertext documents written with HyperText Markup Language (HTML) tags.
10. The system for obtaining enriched activity data of claim 9 wherein the hypertext documents contain source HTML code interpreted by the requesting element to generate the display of corresponding web pages stored at the server network element.
11. The system for obtaining enriched activity data of claim 1 wherein the server network element is a HyperText Transfer Protocol (HTTP) server.
12. The system for obtaining enriched activity data of claim 1 wherein the requesting network element is a client browser application.
13. The system for obtaining enriched activity data of claim 9 wherein the single pixel GIF request with an appended Common Gateway Interface (CGI) query string is included as part of a JavaScript command that is put directly into the HTML file.
14. The system for obtaining enriched activity data of claim 13 wherein the JavaScript command is a “document.write” command which places an expression that follows the command into a document window.
15. The system for obtaining enriched activity data of claim 14 wherein the expression contains a HyperText Markup Language (HTML) image (IMG) tag with a source (SRC) attribute that specifies the Uniform Resource Locator (URL) location for the hypertext document.
16. The system for obtaining enriched activity data of claim 1 wherein the other network elements include any one or more of switch devices, router devices, gateways, and client computer devices.
17. A method for obtaining enriched activity data in a client-server communications network wherein information requested by a network element is cached at one or more other network elements, comprising the acts of:
generating and storing a plurality of information files at a server network element that are accessible to a requesting network element, the information files including text files and key words and a single pixel Graphics Image Format (GIF) request;
interpreting the information files including the text files, key words and single pixel GIF request by the requesting network element to display the information requested;
transmitting the single pixel GIF request from the requesting element over the communications network to the server network element, and
reading and storing the enriched activity data contained in the transmitted single pixel GIF request at the server network element.
18. The method for obtaining enriched activity data of claim 17 further comprising the act of temporarily storing the requested information files that are served on demand to the requested network element by one or more cache engines that are connected to at least one of the other network elements.
19. The method for obtaining enriched activity data of claim 17 further comprising the act of appending a common gateway interface (CGI) query string to the single pixel GIF request.
20. The method for obtaining enriched activity data of claim 19 wherein the CGI query string includes an identification of the location of the requested information file.
21. The method for obtaining enriched activity data of claim 19 wherein the CGI query string includes a number of image objects contained in the information file.
22. The method for obtaining enriched activity data of claim 19 wherein the CGI query string includes an identification of a network element that referred the requesting network element to the server network element.
23. The method for obtaining enriched activity data of claim 19 wherein the CGI query string includes a persistent cookie identification of the requesting network element.
24. The method for obtaining enriched activity data of claim 17 wherein the client-server communications network is a global network such as the Internet.
25. The method for obtaining enriched activity data of claim 17 wherein the plurality of information files are hypertext documents written with HyperText Markup Language (HTML) tags.
26. The method for obtaining enriched activity data of claim 25 further comprising interpreting the source HTML code in the hypertext documents by the requesting element to generate a display of corresponding web pages stored at the server network element.
27. The method for obtaining enriched activity data of claim 17 wherein the hypertext documents are stored at a HyperText Transfer Protocol (HTTP) server.
28. The method for obtaining enriched activity data of claim 17 wherein the requesting network element is a client browser application.
29. The method for obtaining enriched activity data of claim 25 further including the single pixel GIF request with an appended Common Gateway Interface (CGI) query string is included as part of a JavaScript command that is put directly into the HTML file.
30. The method for obtaining enriched activity data of claim 29 wherein the JavaScript command is a “document.write” command which places an expression that follows the command into a document window.
31. The method for obtaining enriched activity data of claim 30 wherein the expression contains a HyperText Markup Language (HTML) image (IMG) tag with a source (SRC) attribute that specifies the Uniform Resource Locator (URL) location of the hypertext document.
32. A computer readable medium containing a computer program for obtaining enriched activity data in a client-server communications network wherein information requested by a network element is cached at one or more other network elements, the computer program product comprising:
program instructions that generate and store a plurality of accessible information files at a server network element, the information files including text files and key words and a single pixel Graphics Image Format (GIF);
program instructions that receive the single pixel GIF request from the requesting element when the requesting element interprets the contents of the information file; and
program instructions that read and store the enriched activity data contained in the transmitted single pixel GIF request at the server network element.
33. The computer program product for obtaining enriched activity data of claim 32 further comprising program instructions that append a common gateway interface (CGI) query string to the single pixel GIF request.
34. The computer program product for obtaining enriched activity data of claim 33 wherein the CGI query string includes an identification of the location of the requested information file.
35. The computer program product for obtaining enriched activity data of claim 33 wherein the CGI query string includes a number of image objects contained in the information file.
36. The computer program product for obtaining enriched activity data of claim 33 wherein the CGI query string includes an identification of a network element that referred the requesting network element to the server network element.
37. The computer program product for obtaining enriched activity data of claim 33 wherein the CGI query string includes a persistent cookie identification of the requesting network element.
38. The computer program product for obtaining enriched activity data of claim 32 wherein the plurality of information files are hypertext documents written with HyperText Markup Language (HTML) tags.
39. The computer program product for obtaining enriched activity data of claim 32 further comprising program instructions that store the hypertext documents at a HyperText Transfer Protocol (HTTP) server.
40. The computer program product for obtaining enriched activity data of claim 38 further comprising program instructions that place a JavaScript command, including the single pixel GIF request with an appended Common Gateway Interface (CGI) query string directly into the HTML file.
41. The computer program product for obtaining enriched activity data of claim 40 wherein the JavaScript command is a “document.write” command which places an expression that follows the command into a document window at a requesting network element.
42. The computer program product for obtaining enriched activity data of claim 41 wherein the expression contains a HyperText Markup Language (HTML) image (IMG) tag with a source (SRC) attribute that specifies the Uniform Resource Locator (URL) location of the hypertext document.
43. A system for obtaining enriched activity data in a client-server communications network wherein information requested by a network element is cached at one or more other network elements, comprising:
a server network element including server software and a database for generating and storing a plurality of information files that are accessible to a requesting network element, at least one of said plurality of information files including a text file and key words, said one of said plurality of information files being capable of being interpreted by the requesting network element to display the information requested, said one of said plurality of information files further including an uncacheable small image object request;
wherein said server network element is configured to receive the uncacheable small image object request from the requesting element over the communications network and to read and store enriched data contained therein.
44. The system for obtaining enriched activity data of claim 43 wherein the uncacheable small image object request includes a Common Gateway Interface (CGI) query string appended to it that contains the enriched data.
45. The system for obtaining enriched activity data of claim 44 wherein the CGI query string includes an identification of the location of the requested information file.
46. The system for obtaining enriched activity data of claim 44 wherein the CGI query string includes a number of image objects contained in the information file.
47. The system for obtaining enriched activity data of claim 44 wherein the CGI query string includes an identification of a network element that referred the requesting network element to the server network element.
48. The system for obtaining enriched activity data of claim 44 wherein the CGI query string includes a persistent cookie identification of the requesting network element.
49. The system for obtaining enriched activity data of claim 43 wherein the plurality of information files are hypertext documents written with HyperText Markup Language (HTML) tags.
50. The system for obtaining enriched activity data of claim 49 wherein the hypertext documents contain source HTML code interpreted by the requesting element to generate the display of corresponding web pages stored at the server network element.
51. A method for obtaining enriched activity data in a client-server communications network wherein information requested by a network element is cached at one or more other network elements, comprising:
generating and storing at a server network element a plurality of information files that are accessible to a requesting network element, at least one of said plurality of information files including a text file and key words and a small image object request, said one of said plurality of information files being capable of being interpreted by the requesting network element to display the information requested;
receiving the small image object request from the requesting element over the communications network, and
reading and storing enriched activity data contained in the received small image object request at the server network element.
52. The method for obtaining enriched activity data of claim 51 further comprising appending a common gateway interface (CGI) query string to the small image object request.
53. The method for obtaining enriched activity data of claim 52 wherein the CGI query string includes an identification of the location of the requested information file.
54. The method for obtaining enriched activity data of claim 52 wherein the CGI query string includes a number of image objects contained in the information file.
55. The method for obtaining enriched activity data of claim 52 wherein the CGI query string includes an identification of a network element that referred the requesting network element to the server network element.
56. The method for obtaining enriched activity data of claim 52 wherein the CGI query string includes a persistent cookie identification of the requesting network element.
57. The method for obtaining enriched activity data of claim 51 wherein the plurality of information files are hypertext documents written with HyperText Markup Language (HTML) tags.
58. A computer readable medium containing a computer program for obtaining enriched activity data in a client-server communications network wherein information requested by a network element is cached at one or more other network elements, the computer program product comprising:
program instructions that generate and store a plurality of accessible information files at a server network element, at least one of said plurality of information files including a text file and key words and a small image object request, the contents of said one of said plurality of information files being capable of being interpreted by the requesting element;
program instructions that receive the small image object request from the requesting element; and
program instructions that read and store enriched activity data contained in the received small image object request at the server network element.
59. The computer program product for obtaining enriched activity data of claim 58 further comprising program instructions that append a common gateway interface (CGI) query string to the small image object request.
US12/437,581 2000-08-18 2009-05-08 Gathering enriched web server activity data of cached web content Expired - Lifetime USRE41440E1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/437,581 USRE41440E1 (en) 2000-08-18 2009-05-08 Gathering enriched web server activity data of cached web content

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/641,495 US7216149B1 (en) 2000-08-18 2000-08-18 Gathering enriched web server activity data of cached web content
US12/437,581 USRE41440E1 (en) 2000-08-18 2009-05-08 Gathering enriched web server activity data of cached web content

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/641,495 Reissue US7216149B1 (en) 2000-08-18 2000-08-18 Gathering enriched web server activity data of cached web content

Publications (1)

Publication Number Publication Date
USRE41440E1 true USRE41440E1 (en) 2010-07-13

Family

ID=24572627

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/641,495 Ceased US7216149B1 (en) 2000-08-18 2000-08-18 Gathering enriched web server activity data of cached web content
US12/437,581 Expired - Lifetime USRE41440E1 (en) 2000-08-18 2009-05-08 Gathering enriched web server activity data of cached web content

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/641,495 Ceased US7216149B1 (en) 2000-08-18 2000-08-18 Gathering enriched web server activity data of cached web content

Country Status (8)

Country Link
US (2) US7216149B1 (en)
EP (1) EP1374060A2 (en)
JP (1) JP4046328B2 (en)
KR (1) KR100612711B1 (en)
CN (1) CN100399290C (en)
AU (1) AU2001293760A1 (en)
TW (1) TW518498B (en)
WO (1) WO2002017079A2 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249905A1 (en) * 2006-06-26 2008-10-09 Omniture, Inc. Multi-party web-beacon-based analytics
US7992135B1 (en) * 2006-06-26 2011-08-02 Adobe Systems Incorporated Certification of server-side partner plug-ins for analytics and privacy protection
US20120023156A1 (en) * 2010-07-21 2012-01-26 Empire Technology Development Llc Information processing apparatus, server-client system, and computer program product
US8346889B1 (en) 2010-03-24 2013-01-01 Google Inc. Event-driven module loading
US8370495B2 (en) 2005-03-16 2013-02-05 Adaptive Computing Enterprises, Inc. On-demand compute environment
US8453049B1 (en) * 2010-05-19 2013-05-28 Google Inc. Delayed code parsing for reduced startup latency
US8782120B2 (en) 2005-04-07 2014-07-15 Adaptive Computing Enterprises, Inc. Elastic management of compute resources between a web server and an on-demand compute environment
US9003423B1 (en) * 2011-07-29 2015-04-07 Amazon Technologies, Inc. Dynamic browser compatibility checker
US9015324B2 (en) 2005-03-16 2015-04-21 Adaptive Computing Enterprises, Inc. System and method of brokering cloud computing resources
US9075657B2 (en) 2005-04-07 2015-07-07 Adaptive Computing Enterprises, Inc. On-demand access to compute resources
US9231886B2 (en) 2005-03-16 2016-01-05 Adaptive Computing Enterprises, Inc. Simple integration of an on-demand compute environment
US9262396B1 (en) 2010-03-26 2016-02-16 Amazon Technologies, Inc. Browser compatibility checker tool
US10691738B1 (en) 2017-08-07 2020-06-23 Amdocs Development Limited System, method, and computer program for tagging application data with enrichment information for interpretation and analysis by an analytics system
US11467883B2 (en) 2004-03-13 2022-10-11 Iii Holdings 12, Llc Co-allocating a reservation spanning different compute resources types
US11494235B2 (en) 2004-11-08 2022-11-08 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11522952B2 (en) 2007-09-24 2022-12-06 The Research Foundation For The State University Of New York Automatic clustering for self-organizing grids
US11526304B2 (en) 2009-10-30 2022-12-13 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US11630704B2 (en) 2004-08-20 2023-04-18 Iii Holdings 12, Llc System and method for a workload management and scheduling module to manage access to a compute environment according to local and non-local user identity information
US11650857B2 (en) 2006-03-16 2023-05-16 Iii Holdings 12, Llc System and method for managing a hybrid computer environment
US11652706B2 (en) 2004-06-18 2023-05-16 Iii Holdings 12, Llc System and method for providing dynamic provisioning within a compute environment
US11720290B2 (en) 2009-10-30 2023-08-08 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes

Families Citing this family (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPQ206399A0 (en) * 1999-08-06 1999-08-26 Imr Worldwide Pty Ltd. Network user measurement system and method
KR100424481B1 (en) * 2000-06-24 2004-03-22 엘지전자 주식회사 Apparatus and method for recording and reproducing a digital broadcasting service information on optical medium
KR100910972B1 (en) * 2002-12-07 2009-08-05 엘지전자 주식회사 Method for controling a playback in interactive optical disc player
EP1348168A1 (en) * 2000-10-24 2003-10-01 Singingfish.Com, Inc. Method of collecting data using an embedded media player page
US7761497B1 (en) * 2001-07-13 2010-07-20 Vignette Software, LLC Storage medium having a manageable file directory structure
US8156216B1 (en) 2002-01-30 2012-04-10 Adobe Systems Incorporated Distributed data collection and aggregation
US20040098229A1 (en) * 2002-06-28 2004-05-20 Brett Error Efficient click-stream data collection
CN1293738C (en) * 2002-06-28 2007-01-03 华为技术有限公司 Method for improving data processing capability of remote user dialing authentication protocol
US8028077B1 (en) * 2002-07-12 2011-09-27 Apple Inc. Managing distributed computers
US7797418B2 (en) 2002-08-06 2010-09-14 Tvworks, Llc Method of maintaining broadcast data stream
US7853684B2 (en) * 2002-10-15 2010-12-14 Sas Institute Inc. System and method for processing web activity data
KR100920654B1 (en) * 2002-12-09 2009-10-09 엘지전자 주식회사 Method for controling a playback in interactive optical disc player
US20050216844A1 (en) * 2004-03-03 2005-09-29 Error Brett M Delayed transmission of website usage data
US20040243704A1 (en) * 2003-04-14 2004-12-02 Alfredo Botelho System and method for determining the unique web users and calculating the reach, frequency and effective reach of user web access
WO2004107130A2 (en) * 2003-05-28 2004-12-09 Caymas Systems, Inc. Multilayer access control security system
US7774499B1 (en) * 2003-10-30 2010-08-10 United Online, Inc. Accelerating network communications
US7596554B2 (en) 2003-12-09 2009-09-29 International Business Machines Corporation System and method for generating a unique, file system independent key from a URI (universal resource indentifier) for use in an index-less voicexml browser caching mechanism
US8984640B1 (en) * 2003-12-11 2015-03-17 Radix Holdings, Llc Anti-phishing
US20050256923A1 (en) * 2004-05-14 2005-11-17 Citrix Systems, Inc. Methods and apparatus for displaying application output on devices having constrained system resources
JP4565495B2 (en) * 2004-11-10 2010-10-20 富士通株式会社 Terminal device, mail processing method of terminal device, and mail processing program
US20060168549A1 (en) * 2005-01-25 2006-07-27 Eric Chan User interfaces and methods for displaying attributes of objects and accessing content
CN100361454C (en) * 2005-04-27 2008-01-09 华为技术有限公司 Method for obtaining of daily information from network element equipment by network management server
US8341259B2 (en) * 2005-06-06 2012-12-25 Adobe Systems Incorporated ASP for web analytics including a real-time segmentation workbench
US7401123B2 (en) * 2005-10-04 2008-07-15 International Business Machines Corporation Method for identifying and tracking grouped content in e-mail campaigns
US7366762B2 (en) * 2005-10-04 2008-04-29 International Business Machines Corporation Method for monitoring and reporting usage of non-hypertext markup language e-mail campaigns
US7461127B2 (en) * 2005-10-04 2008-12-02 International Business Machines Corporation Method for determining user uniqueness in e-mail campaigns
US7558830B2 (en) * 2005-10-04 2009-07-07 International Business Machines Corporation Method for tagging and tracking non-hypertext markup language based e-mail
JP2007109137A (en) * 2005-10-17 2007-04-26 Kan:Kk Portable telephone access identification system, web server, and control method for web server
US8126766B2 (en) * 2006-11-29 2012-02-28 Yahoo! Inc. Interactive user interface for collecting and processing nomenclature and placement metrics for website design
US20080281903A1 (en) * 2007-05-10 2008-11-13 Marek Kwiatkowski System and method for providing interactive multimedia content
US8560669B2 (en) * 2007-09-26 2013-10-15 Quantcast Corporation Tracking identifier synchronization
US7925694B2 (en) 2007-10-19 2011-04-12 Citrix Systems, Inc. Systems and methods for managing cookies via HTTP content layer
US8090877B2 (en) 2008-01-26 2012-01-03 Citrix Systems, Inc. Systems and methods for fine grain policy driven cookie proxying
US8028201B2 (en) 2008-05-09 2011-09-27 International Business Machines Corporation Leveled logging data automation for virtual tape server applications
US7958191B1 (en) 2008-06-27 2011-06-07 Quantcast Corporation System and method for client management
US7752261B1 (en) 2008-06-27 2010-07-06 Quantcast Corporation System and method for multibeaconing
US20100057506A1 (en) * 2008-08-28 2010-03-04 Yahoo! Inc. Conversion value reporting using conversion value pixel
JP5135135B2 (en) 2008-09-11 2013-01-30 株式会社日立製作所 Application execution management method, server computer for executing application, and relay device
US20100107091A1 (en) * 2008-10-29 2010-04-29 International Business Machines Corporation Publishing requests for dynamically loaded missing images
US9582596B2 (en) 2008-10-29 2017-02-28 International Business Machines Corporation Preventing not found error (404) responses on dynamically loaded missing images
US8832206B2 (en) 2009-02-13 2014-09-09 Hostopia.Com Inc. Email recipient behavior tracking
FR2943159B1 (en) * 2009-03-16 2016-10-21 Alcatel Lucent METHOD FOR ASSISTING AN OPERATOR OF A CALL CENTER
KR101012351B1 (en) * 2009-04-24 2011-02-09 전대우 Charcoal frame with lighting means
CN101650727A (en) * 2009-08-11 2010-02-17 腾讯数码(天津)有限公司 Method and device for browsing pictures
US8878855B2 (en) * 2009-08-13 2014-11-04 Liveclicker, Inc. Video in e-mail
US8910259B2 (en) 2010-08-14 2014-12-09 The Nielsen Company (Us), Llc Systems, methods, and apparatus to monitor mobile internet activity
US8886773B2 (en) 2010-08-14 2014-11-11 The Nielsen Company (Us), Llc Systems, methods, and apparatus to monitor mobile internet activity
US8499065B2 (en) 2010-09-30 2013-07-30 The Nielsen Company (Us), Llc Methods and apparatus to distinguish between parent and child webpage accesses and/or browser tabs in focus
TWI512620B (en) * 2010-11-18 2015-12-11 Alibaba Group Holding Ltd Method and device for fragmented nested caching of web pages
US9124920B2 (en) 2011-06-29 2015-09-01 The Nielsen Company (Us), Llc Methods, apparatus, and articles of manufacture to identify media presentation devices
US8594617B2 (en) 2011-06-30 2013-11-26 The Nielsen Company (Us), Llc Systems, methods, and apparatus to monitor mobile internet activity
CN103188323B (en) * 2011-12-31 2016-09-07 中国移动通信集团公司 The system and method for Web service is provided based on subscriber's main station buffer memory device
US9665630B1 (en) * 2012-06-18 2017-05-30 EMC IP Holding Company LLC Techniques for providing storage hints for use in connection with data movement optimizations
US20140068411A1 (en) * 2012-08-31 2014-03-06 Scott Ross Methods and apparatus to monitor usage of internet advertising networks
US9301173B2 (en) 2013-03-15 2016-03-29 The Nielsen Company (Us), Llc Methods and apparatus to credit internet usage
US10356579B2 (en) 2013-03-15 2019-07-16 The Nielsen Company (Us), Llc Methods and apparatus to credit usage of mobile devices
CN103309806B (en) * 2013-05-03 2016-06-01 上海证券交易所 The device and method of a kind of quick development and testing
US9219928B2 (en) 2013-06-25 2015-12-22 The Nielsen Company (Us), Llc Methods and apparatus to characterize households with media meter data
US9277265B2 (en) 2014-02-11 2016-03-01 The Nielsen Company (Us), Llc Methods and apparatus to calculate video-on-demand and dynamically inserted advertisement viewing probability
US10965763B2 (en) 2014-07-31 2021-03-30 Wells Fargo Bank, N.A. Web analytics tags
US9762688B2 (en) 2014-10-31 2017-09-12 The Nielsen Company (Us), Llc Methods and apparatus to improve usage crediting in mobile devices
CN104461937B (en) * 2014-12-08 2017-10-03 福建新大陆通信科技股份有限公司 A kind of method and system of set box browser internal memory optimization
US11423420B2 (en) 2015-02-06 2022-08-23 The Nielsen Company (Us), Llc Methods and apparatus to credit media presentations for online media distributions
US10219039B2 (en) 2015-03-09 2019-02-26 The Nielsen Company (Us), Llc Methods and apparatus to assign viewers to media meter data
US9826359B2 (en) 2015-05-01 2017-11-21 The Nielsen Company (Us), Llc Methods and apparatus to associate geographic locations with user devices
US11188941B2 (en) 2016-06-21 2021-11-30 The Nielsen Company (Us), Llc Methods and apparatus to collect and process browsing history
US10728329B2 (en) * 2016-11-22 2020-07-28 Vivint, Inc. System and methods for secure data storage
US10791355B2 (en) 2016-12-20 2020-09-29 The Nielsen Company (Us), Llc Methods and apparatus to determine probabilistic media viewing metrics
EP3588347B1 (en) * 2018-06-29 2021-01-13 AO Kaspersky Lab Systems and methods for identifying unknown attributes of web data fragments when launching a web page in a browser
WO2022071615A1 (en) * 2020-09-29 2022-04-07 제이엠사이트 주식회사 Failure prediction method and apparatus implementing same

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5796952A (en) * 1997-03-21 1998-08-18 Dot Com Development, Inc. Method and apparatus for tracking client interaction with a network resource and creating client profiles and resource database
US5848396A (en) * 1996-04-26 1998-12-08 Freedom Of Information, Inc. Method and apparatus for determining behavioral profile of a computer user
US5892917A (en) * 1995-09-27 1999-04-06 Microsoft Corporation System for log record and log expansion with inserted log records representing object request for specified object corresponding to cached object copies
US5913041A (en) * 1996-12-09 1999-06-15 Hewlett-Packard Company System for determining data transfer rates in accordance with log information relates to history of data transfer activities that independently stored in content servers
US5935207A (en) * 1996-06-03 1999-08-10 Webtv Networks, Inc. Method and apparatus for providing remote site administrators with user hits on mirrored web sites
US6018763A (en) * 1997-05-28 2000-01-25 3Com Corporation High performance shared memory for a bridge router supporting cache coherency
US6018619A (en) * 1996-05-24 2000-01-25 Microsoft Corporation Method, system and apparatus for client-side usage tracking of information server systems
US6023726A (en) * 1998-01-20 2000-02-08 Netscape Communications Corporation User configurable prefetch control system for enabling client to prefetch documents from a network server
US6041355A (en) * 1996-12-27 2000-03-21 Intel Corporation Method for transferring data between a network of computers dynamically based on tag information
US6085229A (en) * 1998-05-14 2000-07-04 Belarc, Inc. System and method for providing client side personalization of content of web pages and the like
US6094662A (en) * 1998-04-30 2000-07-25 Xerox Corporation Apparatus and method for loading and reloading HTML pages having cacheable and non-cacheable portions
US20020004733A1 (en) * 2000-05-05 2002-01-10 Frank Addante Method and apparatus for transaction tracking over a computer network
US6363418B1 (en) * 1998-10-16 2002-03-26 Softbook Press, Inc. On-line image caching control for efficient image display
US6385642B1 (en) * 1998-11-03 2002-05-07 Youdecide.Com, Inc. Internet web server cache storage and session management system
US6393479B1 (en) * 1999-06-04 2002-05-21 Webside Story, Inc. Internet website traffic flow analysis
US6606581B1 (en) * 2000-06-14 2003-08-12 Opinionlab, Inc. System and method for measuring and reporting user reactions to particular web pages of a website
US7003565B2 (en) * 2001-04-03 2006-02-21 International Business Machines Corporation Clickstream data collection technique
US20080052392A1 (en) * 2006-05-18 2008-02-28 Jeff Webster System and Method for Monitoring a User's Online Activity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182122B1 (en) * 1997-03-26 2001-01-30 International Business Machines Corporation Precaching data at an intermediate server based on historical data requests by users of the intermediate server

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5892917A (en) * 1995-09-27 1999-04-06 Microsoft Corporation System for log record and log expansion with inserted log records representing object request for specified object corresponding to cached object copies
US5848396A (en) * 1996-04-26 1998-12-08 Freedom Of Information, Inc. Method and apparatus for determining behavioral profile of a computer user
US5991735A (en) * 1996-04-26 1999-11-23 Be Free, Inc. Computer program apparatus for determining behavioral profile of a computer user
US6018619A (en) * 1996-05-24 2000-01-25 Microsoft Corporation Method, system and apparatus for client-side usage tracking of information server systems
US5935207A (en) * 1996-06-03 1999-08-10 Webtv Networks, Inc. Method and apparatus for providing remote site administrators with user hits on mirrored web sites
US5913041A (en) * 1996-12-09 1999-06-15 Hewlett-Packard Company System for determining data transfer rates in accordance with log information relates to history of data transfer activities that independently stored in content servers
US6742040B1 (en) * 1996-12-27 2004-05-25 Intel Corporation Firewall for controlling data transfers between networks based on embedded tags in content description language
US6041355A (en) * 1996-12-27 2000-03-21 Intel Corporation Method for transferring data between a network of computers dynamically based on tag information
US5796952A (en) * 1997-03-21 1998-08-18 Dot Com Development, Inc. Method and apparatus for tracking client interaction with a network resource and creating client profiles and resource database
US6018763A (en) * 1997-05-28 2000-01-25 3Com Corporation High performance shared memory for a bridge router supporting cache coherency
US6023726A (en) * 1998-01-20 2000-02-08 Netscape Communications Corporation User configurable prefetch control system for enabling client to prefetch documents from a network server
US6094662A (en) * 1998-04-30 2000-07-25 Xerox Corporation Apparatus and method for loading and reloading HTML pages having cacheable and non-cacheable portions
US6085229A (en) * 1998-05-14 2000-07-04 Belarc, Inc. System and method for providing client side personalization of content of web pages and the like
US6363418B1 (en) * 1998-10-16 2002-03-26 Softbook Press, Inc. On-line image caching control for efficient image display
US6385642B1 (en) * 1998-11-03 2002-05-07 Youdecide.Com, Inc. Internet web server cache storage and session management system
US6393479B1 (en) * 1999-06-04 2002-05-21 Webside Story, Inc. Internet website traffic flow analysis
US20020147772A1 (en) * 1999-06-04 2002-10-10 Charles Glommen Internet website traffic flow analysis
US20020004733A1 (en) * 2000-05-05 2002-01-10 Frank Addante Method and apparatus for transaction tracking over a computer network
US6606581B1 (en) * 2000-06-14 2003-08-12 Opinionlab, Inc. System and method for measuring and reporting user reactions to particular web pages of a website
US7003565B2 (en) * 2001-04-03 2006-02-21 International Business Machines Corporation Clickstream data collection technique
US20080052392A1 (en) * 2006-05-18 2008-02-28 Jeff Webster System and Method for Monitoring a User's Online Activity

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Computer Knowledge Newsletter-Nov. 1999 Issue. *
PCT/EP01/09308, PCT Preliminary Examination Report, Jul. 3, 2003, European Patent Office. *

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11467883B2 (en) 2004-03-13 2022-10-11 Iii Holdings 12, Llc Co-allocating a reservation spanning different compute resources types
US11652706B2 (en) 2004-06-18 2023-05-16 Iii Holdings 12, Llc System and method for providing dynamic provisioning within a compute environment
US11630704B2 (en) 2004-08-20 2023-04-18 Iii Holdings 12, Llc System and method for a workload management and scheduling module to manage access to a compute environment according to local and non-local user identity information
US11537434B2 (en) 2004-11-08 2022-12-27 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11494235B2 (en) 2004-11-08 2022-11-08 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11656907B2 (en) 2004-11-08 2023-05-23 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11886915B2 (en) 2004-11-08 2024-01-30 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11709709B2 (en) 2004-11-08 2023-07-25 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11762694B2 (en) 2004-11-08 2023-09-19 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11861404B2 (en) 2004-11-08 2024-01-02 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11537435B2 (en) 2004-11-08 2022-12-27 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US10333862B2 (en) 2005-03-16 2019-06-25 Iii Holdings 12, Llc Reserving resources in an on-demand compute environment
US11658916B2 (en) 2005-03-16 2023-05-23 Iii Holdings 12, Llc Simple integration of an on-demand compute environment
US9015324B2 (en) 2005-03-16 2015-04-21 Adaptive Computing Enterprises, Inc. System and method of brokering cloud computing resources
US8370495B2 (en) 2005-03-16 2013-02-05 Adaptive Computing Enterprises, Inc. On-demand compute environment
US9112813B2 (en) 2005-03-16 2015-08-18 Adaptive Computing Enterprises, Inc. On-demand compute environment
US9231886B2 (en) 2005-03-16 2016-01-05 Adaptive Computing Enterprises, Inc. Simple integration of an on-demand compute environment
US11356385B2 (en) 2005-03-16 2022-06-07 Iii Holdings 12, Llc On-demand compute environment
US11134022B2 (en) 2005-03-16 2021-09-28 Iii Holdings 12, Llc Simple integration of an on-demand compute environment
US10608949B2 (en) 2005-03-16 2020-03-31 Iii Holdings 12, Llc Simple integration of an on-demand compute environment
US10986037B2 (en) 2005-04-07 2021-04-20 Iii Holdings 12, Llc On-demand access to compute resources
US11533274B2 (en) 2005-04-07 2022-12-20 Iii Holdings 12, Llc On-demand access to compute resources
US11831564B2 (en) 2005-04-07 2023-11-28 Iii Holdings 12, Llc On-demand access to compute resources
US11765101B2 (en) 2005-04-07 2023-09-19 Iii Holdings 12, Llc On-demand access to compute resources
US8782120B2 (en) 2005-04-07 2014-07-15 Adaptive Computing Enterprises, Inc. Elastic management of compute resources between a web server and an on-demand compute environment
US10277531B2 (en) 2005-04-07 2019-04-30 Iii Holdings 2, Llc On-demand access to compute resources
US11522811B2 (en) 2005-04-07 2022-12-06 Iii Holdings 12, Llc On-demand access to compute resources
US9075657B2 (en) 2005-04-07 2015-07-07 Adaptive Computing Enterprises, Inc. On-demand access to compute resources
US11496415B2 (en) 2005-04-07 2022-11-08 Iii Holdings 12, Llc On-demand access to compute resources
US11650857B2 (en) 2006-03-16 2023-05-16 Iii Holdings 12, Llc System and method for managing a hybrid computer environment
US9396478B2 (en) 2006-06-26 2016-07-19 Adobe System Incorporated Web-beacon plug-ins and their certification
US7992135B1 (en) * 2006-06-26 2011-08-02 Adobe Systems Incorporated Certification of server-side partner plug-ins for analytics and privacy protection
US8352917B2 (en) 2006-06-26 2013-01-08 Adobe Systems Incorporated Web-beacon plug-ins and their certification
US8365150B2 (en) 2006-06-26 2013-01-29 Adobe Systems Incorporated Multi-party web-beacon-based analytics
US20080249905A1 (en) * 2006-06-26 2008-10-09 Omniture, Inc. Multi-party web-beacon-based analytics
US11522952B2 (en) 2007-09-24 2022-12-06 The Research Foundation For The State University Of New York Automatic clustering for self-organizing grids
US11526304B2 (en) 2009-10-30 2022-12-13 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US11720290B2 (en) 2009-10-30 2023-08-08 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US8407319B1 (en) 2010-03-24 2013-03-26 Google Inc. Event-driven module loading
US8346889B1 (en) 2010-03-24 2013-01-01 Google Inc. Event-driven module loading
US9262396B1 (en) 2010-03-26 2016-02-16 Amazon Technologies, Inc. Browser compatibility checker tool
US8458585B1 (en) 2010-05-19 2013-06-04 Google Inc. Delayed code parsing for reduced startup latency
US9703761B2 (en) 2010-05-19 2017-07-11 Google Inc. Delayed code parsing for reduced startup latency
US8453049B1 (en) * 2010-05-19 2013-05-28 Google Inc. Delayed code parsing for reduced startup latency
US8990291B2 (en) * 2010-07-21 2015-03-24 Empire Technology Development Llc Information processing apparatus, server-client system, and computer program product
US20120023156A1 (en) * 2010-07-21 2012-01-26 Empire Technology Development Llc Information processing apparatus, server-client system, and computer program product
US9003423B1 (en) * 2011-07-29 2015-04-07 Amazon Technologies, Inc. Dynamic browser compatibility checker
US10691738B1 (en) 2017-08-07 2020-06-23 Amdocs Development Limited System, method, and computer program for tagging application data with enrichment information for interpretation and analysis by an analytics system

Also Published As

Publication number Publication date
CN1494680A (en) 2004-05-05
US7216149B1 (en) 2007-05-08
WO2002017079A2 (en) 2002-02-28
JP4046328B2 (en) 2008-02-13
KR100612711B1 (en) 2006-08-18
TW518498B (en) 2003-01-21
AU2001293760A1 (en) 2002-03-04
JP2004507816A (en) 2004-03-11
CN100399290C (en) 2008-07-02
KR20040005816A (en) 2004-01-16
EP1374060A2 (en) 2004-01-02
WO2002017079A3 (en) 2003-10-09

Similar Documents

Publication Publication Date Title
USRE41440E1 (en) Gathering enriched web server activity data of cached web content
US7676574B2 (en) Internet website traffic flow analysis
US6453342B1 (en) Method and apparatus for selective caching and cleaning of history pages for web browsers
US6449604B1 (en) Method for characterizing and visualizing patterns of usage of a web site by network users
AU2005263962B2 (en) Improved user interface
US7096418B1 (en) Dynamic web page cache
US6052730A (en) Method for monitoring and/or modifying web browsing sessions
US7346703B2 (en) Request tracking for analysis of website navigation
US6954783B1 (en) System and method of mediating a web page
US20020078191A1 (en) User tracking in a Web session spanning multiple Web resources without need to modify user-side hardware or software or to store cookies at user-side hardware
US20110191664A1 (en) Systems for and methods for detecting url web tracking and consumer opt-out cookies
US6308210B1 (en) Method and apparatus for traffic control and balancing for an internet site
WO2003010685A1 (en) Traffic flow analysis method
US7594001B1 (en) Partial page output caching
KR20020075369A (en) A system and method of mediating a web page
US7222170B2 (en) Tracking hits for network files using transmitted counter instructions
JP2000076168A (en) Distribution method of cache updating notice and system therefor
Mahanti et al. Workload characterization of a large systems conference web server
Rugaber et al. Problems modeling web sites and user behavior
WO2002003291A1 (en) System and method for delivering advertisement information on a data network with enhanced user privacy

Legal Events

Date Code Title Description
FEPP Fee payment procedure Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
FPAY Fee payment Year of fee payment: 4
FPAY Fee payment Year of fee payment: 8
FEPP Fee payment procedure Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
FEPP Fee payment procedure Free format text: 11.5 YR SURCHARGE- LATE PMT W/IN 6 MO, LARGE ENTITY (ORIGINAL EVENT CODE: M1556); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
MAFP Maintenance fee payment Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 12