WO2002048899A1 - A method of tracking browsing activity of a user in a computer network - Google Patents

A method of tracking browsing activity of a user in a computer network

Publication number
WO2002048899A1
Authority
WO
WIPO (PCT)
Prior art keywords
session
user
content
identifier
server
Prior art date
Application number
PCT/AU2001/001612
Other languages
French (fr)
Inventor
Paul Gerard Cross
Andrew Prendergast
Original Assignee
Traffion Technologies Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Traffion Technologies Pty Ltd
Priority to AU2002221336A
Publication of WO2002048899A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/875 Monitoring of systems including the internet

Definitions

  • the invention relates generally to a method of tracking browsing activity of a user in a computer network.
  • the invention is suitable for use in tracking the browsing behaviour of a user of the Internet and it will be convenient to describe the invention in relation to that exemplary application. It is to be appreciated, however, that the invention is not limited to use on the Internet only.
  • browsing activity of a user on a network has been tracked by attaching a token to the Uniform Resource Locator (URL) of a page on a network.
  • the token is typically a unique identifier that is encoded within a URL.
  • When a first document is requested from the web server, it is returned to the client with a session identifier (ID) attached (for example, A0D3B944).
  • the browsing software located on the user's computer, or client, is expected to send a copy of the session ID back to the network server every time the user requests an additional document from that network server.
  • the network server expects the session ID to be embedded in each HTTP request that the user sends to the network server.
  • each of these document requests can be grouped together under a single session ID - thereby creating 'Session State'.
  • a problem with the use of URL tokens is that hyperlinks and code are often cut and pasted into other network pages, or are placed in personal bookmarks, and repeatedly used to access pages on the network site.
  • the result of this is that when people link to a page on a network where a session manager is installed, they include a session ID in their hyperlink. Whenever a user clicks such a hyperlink, the network server incorrectly re-uses the old session ID. In a worst case scenario, multiple users will simultaneously click on the same hyperlink and be tracked as one session rather than as multiple sessions.
  • User browsing activity may also be tracked by the placement of a cookie on the user's computer (client) when an HTTP object, such as an Internet page, is sent to a client.
  • Set inside that cookie, or state object, is a piece of state information that allows the client to determine the range of URLs for which that state is valid. Any future HTTP requests made by the client that fall within that range will include a transmittal of the current value of the state object from the client back to the server.
  • shopping applications can store information about items that are currently selected for purchase by a customer. Cookies also free the client from being required to retype a user-ID on the next connection.
  • Some sites also use cookies to store user preferences on the client computer, and have the client supply those preferences every time the user connects to that site.
  • Cookies however cannot be relied upon to maintain session state in all instances. This is particularly so where users have disabled cookies, where cookies are not supported by the web browser or where the domain name of the web site changes mid-session (for example, moving from an insecure to a secure site for the purpose of completing an online transaction).
  • linear click-streams or sequential log, which is stored on some form of media, typically a sequential file on a hard disk.
  • linear click-streams, or traditional log files, are time stamped and as such are entered into the log chronologically. Therefore, linear click-streams fail to accurately record a user's activity, notably by not recording where and when the user used their browser's back button, not tracking (or accurately representing) sessions across meta-refresh and HTTP redirects, and not accurately reconstructing a user's activity into and out of frames, popup consoles and new browser windows.
  • One aspect of the present invention provides a method of facilitating tracking browsing activity of a user in a computer network, the user having established a connection session with the computer network, the method including the steps of: receiving a user request for content located on a network server or server cluster; dynamically including a unique session identifier in content served to the user during the connection session; and examining multiple possible state holders within user content requests subsequently received by the network server or server cluster during the connection session to derive navigational data including the session identifier.
  • the session identifier may include a server identifier and a session-specific alphanumeric identifier.
  • Possible state holders may include any one or more of the following: a content identifier, a Cookie value, a referrer request header field, a persistent connection indicator, and a browser signature.
  • the content identifier may be a uniform resource identifier.
  • the method may further include the step of: dynamically including in the content served to the user any one or more of a content locater, a content sequence identifier, a temporal stamp and a checksum.
  • the content locater, content sequence identifier, temporal stamp and/or checksum may be appended to the session identifier.
  • the content locater may be a uniform resource locater.
  • the content sequence identifier is a click, request or page sequence number.
  • the method may further include the step of: using the temporal stamp to temporally locate a session event during which content was served to a user.
  • the method may also include the step of: using the checksum to validate the session identifier.
  • connection session may be a persistent session linking session events from multiple single sessions established by a same user.
  • Navigational data derived from content served to the user during each session event may be stored in a session event message.
  • the session event message may be transmitted to a traffic manager to link session events within a same connection session.
  • the traffic manager may be located on the same machine as the network server or server cluster, or may be located remotely from the network server or server cluster.
  • the method may further include the step of: transmitting the user request for content to an information server engine for retrieval of the requested content.
  • the method may further include the step of: transmitting the requested content to a filter module to dynamically include at least part of the navigational data including the session identifier in the requested content prior to serving the requested content to the user.
  • At least part of the navigational data may be obtained from counters or configuration files accessible from the filter module.
  • Another aspect of the invention provides a method of facilitating tracking browsing activity of a user in a computer network, wherein the user establishes a connection session with the computer network, and a unique content identifier is assigned to each separate instance of navigable content within the network, the method including the steps of: receiving navigational data, including the previous and current content identifier, corresponding to content served to a user for each session event during the connection session; and deriving a tree index structure which logically interconnects nodes representing instances of content being served to the user for each session event during the connection session with nodes representative of content from which the user has navigated.
  • the tree structure may be stored in a memory device, with a separate entry being made for each node and associated fields for identifying each node to which that node is interconnected.
  • the tree index structure may be stored in a memory device, a separate entry being created for each node and associated parent, child and sibling links being created to identify nodes interconnected to that node.
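
The tree index structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Node` and `ClickTree`, the `record` and `children` methods, and the content identifiers `c1` to `c4` are hypothetical and do not come from the specification. Each node keeps parent, first-child and next-sibling links, and is filed under the content identifier of the served instance.

```python
from typing import Dict, List, Optional


class Node:
    """One instance of served content, with parent/child/sibling links."""

    def __init__(self, content_id: str) -> None:
        self.content_id = content_id
        self.parent: Optional["Node"] = None
        self.first_child: Optional["Node"] = None
        self.next_sibling: Optional["Node"] = None


class ClickTree:
    """Hypothetical tree index built from (previous, current) identifiers."""

    def __init__(self) -> None:
        self.roots: List[Node] = []        # entry points with no known parent
        self.index: Dict[str, Node] = {}   # content identifier -> node

    def record(self, previous_id: Optional[str], current_id: str) -> Node:
        node = Node(current_id)
        parent = self.index.get(previous_id) if previous_id is not None else None
        node.parent = parent
        if parent is None:
            self.roots.append(node)
        elif parent.first_child is None:
            parent.first_child = node
        else:
            # Append to the end of the parent's sibling chain.
            last = parent.first_child
            while last.next_sibling is not None:
                last = last.next_sibling
            last.next_sibling = node
        self.index[current_id] = node
        return node

    def children(self, content_id: str) -> List[str]:
        out: List[str] = []
        child = self.index[content_id].first_child
        while child is not None:
            out.append(child.content_id)
            child = child.next_sibling
        return out


tree = ClickTree()
# c1 -> c2, then the back button to c1, then c1 -> c3 -> c4.
for prev, cur in [(None, "c1"), ("c1", "c2"), ("c1", "c3"), ("c3", "c4")]:
    tree.record(prev, cur)
print(tree.children("c1"))
```

Because the second and third events both name `c1` as the previous content, the tree records a branch at `c1`, which a purely chronological log would flatten into a single sequence.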
  • the method may further include the step of: merging connection sessions in which one or more session event messages have common session identifiers.
  • a further aspect of the invention may provide a network server or server cluster including a processing unit and associated memory encoded with network server software for causing the network server to perform a method of facilitating tracking browsing activity of a user in a computer network as previously described.
  • Yet another aspect of the invention may provide a software plug-in forming part of the network server software of a network server or server cluster described above, wherein the software plug-in causes the network server to perform a method of facilitating tracking browsing activity of a user in a computer network as previously described.
  • the software plug-in may comprise: a filter module for dynamically including a unique session identifier in content served to the user during the connection session; and a session ID module for examining multiple possible state holders within user content requests subsequently received by the network server or server cluster during the connection session to derive navigational data including the session identifier.
  • the filter module may assign a unique content identifier to each separate instance of navigable content within the computer network.
  • the plug-in may further comprise: a traffic manager module for receiving navigational data, including the previous and current content identifier, corresponding to content served to a user for each session event during the connection session; and deriving a tree index structure which logically connects content served to the user for each session event during the connection session with content from which the user has navigated.
  • the traffic manager module may be located remotely from the network server or server cluster.
  • Figure 1 is a general schematic diagram illustrating one embodiment of a content serving system performing a method of tracking browsing activity in accordance with the present invention;
  • Figure 2 is a detailed schematic diagram of the system of Figure 1;
  • Figure 3 is a flow chart illustrating steps performed by the session ID module forming part of the network server included in the system of Figures 1 and 2;
  • Figure 4 is a diagram illustrating navigational data included in content served to a user of the system of Figures 1 and 2;
  • Figure 5 is a flow chart showing a prior art linear click stream of session events occurring during the performance of the present invention;
  • Figure 6 is a flow chart showing a tree index structure used to record session events occurring during the present invention;
  • Figure 7 is a variant of the tree index structure of Figure 6; and
  • Figures 8 to 10 are examples of tree index structures representing exemplary connection sessions performed in accordance with the present invention.
  • a content serving system 100 including a network user or client 1 and a network server 51.
  • the client 1 and network server 51 are interconnected in this example by the Internet 2.
  • the network server 51 acts to serve content stored in a storage media 16 to the client 1.
  • the network server 51 includes network server software 57, the functionality of which is determined by a network server plugin 58 acting in conjunction with an information server engine 54 to serve content stored on the storage media 16 to the client 1.
  • the network server plugin 58 includes a session ID detection module 55, a synchronisation module 53 and a filter module 52.
  • the network server plugin 58 also includes a traffic manager module 56, although in other embodiments this module may be provided remotely from the network server 51.
  • the client 1 makes a request to receive content located on the network server 51.
  • This request is intercepted by the session ID module 55 and analysed at step 3.
  • the session ID module 55 determines whether a token has been encoded within the HTTP request sent by the client 1.
  • the session ID module 55 determines at step 5 whether a cookie stored on the client 1 has set a cookie value, acting as a session identifier, within the HTTP request.
  • the session ID module 55 determines whether a session identifier can be derived from the referrer field in the HTTP request header.
  • If the session ID module 55 determines that a session identifier can be determined at any one or more of steps 4 to 6, the session identifier will be temporarily stored in a memory storage device 7.
  • the session ID module 55 will trigger a flagged impression event at step 9.
  • the session ID module 55 is able to identify the user as one that is currently being tracked at step 8
  • a non-flagged impression event will be triggered at step 10.
  • flagged and non-flagged impression events are placed into a queue in a memory device 12. Subsequent event messages are added to the memory device 12, and the contents of that memory device may be subsequently retrieved by the traffic manager module 56 to link session events within a same connection session.
  • the content request is passed at step 13 to the information server engine 54.
  • the information server engine 54 retrieves the requested content stored in the storage media 16 at step 15, and streams or otherwise transmits this retrieved content at step 17 to the filter module 52.
  • the content forwarded from the information server engine 54 is received at step 18, and subsequently examined at step 19 to determine if the content is part of a sale, subscription, query or other result event. If the content does contain something to signify that the content is part of a result event at step 18, the filter module 52 raises a result event message at step 21, and transmits this event message for storage in a queue maintained in the memory device 12.
  • the filter module 52 then dynamically alters the content at step 22 to be served to the user, in order that the browsing activity of the user may be tracked during the current connection session.
  • Information that is dynamically included into the content at step 22 is obtained either from counters that exist in a memory device 23, or from configuration files 24, which may typically be maintained locally on the network server 51.
  • the content is transmitted at step 25 across the Internet 2 to the client 1 for viewing by the user.
  • the synchronisation module 53 generates a set of synchronisation event messages to allow for temporal stamping of information flowing from the plugin 58.
  • Synchronisation event messages are initiated at step 26 by the generation at regular intervals of a sync signal, which is sent to the memory device 12 at step 27 either as a timesync event message 28 or a server ping message 29.
  • the purpose of the timesync event message 28 is to mark the place in the queue maintained in the memory device 12 when a particular time occurred. For example, if the queue maintained in the memory device 12 was backlogged by 30 minutes worth of processing, then the timesync signal will still allow interfacing software to record the viewing transactions with some form of time date or other temporal data.
  • the server ping messages 29 may be placed in the queue maintained in the memory device 12 for each website or source of content active on the webserver 57. The purpose of the server ping message 29 is to signal to interfacing software that the network server is still active.
  • the traffic manager module 56 interfaces with the network server plugin 58, and is responsible for processing the event messages generated by the network server plugin 58. Initially, the traffic manager module 56 obtains the event messages maintained in the queue in the memory device 12 at step 30. The traffic manager module 56 then determines at step 31 if a retrieved event message is a server ping event message, which is irrelevant to the storage of information regarding the browsing activity of a user. If a server ping event message is detected, the message is ignored at step 32. If the event message is not a server ping message, the traffic manager module 56 considers at step 33 whether the particular user identified by the event message is already known to the traffic manager module 56 (and hence is a tracked user).
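
The queue handling described above (timesync markers, server pings and impression events read in first-in-first-out order) might look roughly like the following Python sketch. The event dictionary layout, the field names `type`, `time`, `session_id` and `content`, and the function name `drain_queue` are all assumptions made for illustration; the specification does not define a message format. The sketch also simplifies step 34 by merely registering an unknown session identifier rather than issuing a new one.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

Impression = Tuple[Optional[str], str, str]


def drain_queue(
    queue: Deque[Dict[str, str]], known_sessions: Set[str]
) -> Tuple[List[str], List[Impression]]:
    """Process queued events in FIFO order.

    Returns the session identifiers seen for the first time and the
    impressions, each stamped with the most recent timesync marker.
    """
    last_sync: Optional[str] = None
    new_sessions: List[str] = []
    impressions: List[Impression] = []
    while queue:
        event = queue.popleft()
        kind = event["type"]
        if kind == "server_ping":
            continue  # liveness signal only; carries no browsing activity
        if kind == "timesync":
            last_sync = event["time"]  # marks when this point in the queue occurred
            continue
        sid = event["session_id"]
        if sid not in known_sessions:
            known_sessions.add(sid)    # stand-in for the searchable index
            new_sessions.append(sid)
        impressions.append((last_sync, sid, event["content"]))
    return new_sessions, impressions
```

Because each impression inherits the time of the preceding timesync marker, a backlogged queue can still be processed later without losing the approximate time at which each request was made.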
  • the traffic manager module 56 acts, at step 34, to create a new session and assigns the new user a unique session identifier.
  • the user's unique session identifier is then stored in a searchable index 35 for subsequent reference.
  • the traffic manager module 56 acts at step 36 to record the information relating to the request into a clicktree, or in other words a click-sequence-based tracking structure, maintained in a storage device 37.
  • the traffic manager module 56 transmits tracking data to a desired location, for example, to an administration computer 39 connected to the Internet 2.
  • CDs: audio compact discs.
  • a user initially decides to visit a particular site on a network.
  • the user makes the decision to browse for a CD to purchase from a selected online CD store, which is located on the Internet at http://www.foo.com.
  • the user types the URL of the chosen online CD Store (here, www.foo.com) into the browsing software that is installed on the client 1.
  • the online CD store maintains a network server as shown in Figure 2 to serve content to online visitors.
  • the web server software 57 of the network server 51 communicates with the software applet or plugin 58 that performs the tracking functions that the Store wishes to perform. Examples of web server software that may be adapted for use in the present invention include Microsoft Internet Information Server and Apache HTTP Server.
  • the plugin 58 monitors incoming HTTP requests from users connected to the network. The plugin 58 attempts to match session identifiers that are embedded in incoming HTTP requests with the session identifiers of users that the plugin is already tracking.
  • the plugin 58 examines multiple possible state holders within the request to determine whether a session identifier can be derived from one or more of these state holders. In this example, a three-tier process is adopted. Firstly, the plugin determines whether there has been a token encoded within the request that will properly identify the user to the plugin. Secondly, the plugin determines if there is a cookie stored on the user's computer that helps the plugin identify the user. Finally, the plugin determines whether any information can be gleaned from the HTTP header, including the referrer value, which accompanied the request for content to be served to a particular user (step 6).
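
A minimal Python sketch of this three-tier lookup follows. The parameter name `tce_sid` is taken from the description of the referrer string elsewhere in this document; reusing it as the cookie name, and the helper names `sid_from_url`, `sid_from_cookie` and `detect_session_ids`, are assumptions for illustration only.

```python
from http.cookies import SimpleCookie
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit

SID_PARAM = "tce_sid"  # cookie name reuse is an assumption


def sid_from_url(url: str) -> Optional[str]:
    """Return the session identifier encoded as a URL token, if any."""
    values = parse_qs(urlsplit(url).query).get(SID_PARAM)
    return values[0] if values else None


def sid_from_cookie(header: str) -> Optional[str]:
    """Return the session identifier carried in a Cookie header, if any."""
    cookie = SimpleCookie()
    cookie.load(header)
    morsel = cookie.get(SID_PARAM)
    return morsel.value if morsel is not None else None


def detect_session_ids(request_url: str, headers: Dict[str, str]) -> List[str]:
    """Check the token, then the cookie, then the referrer, in that order."""
    candidates = (
        sid_from_url(request_url),
        sid_from_cookie(headers.get("Cookie", "")),
        sid_from_url(headers.get("Referer", "")),
    )
    found: List[str] = []
    for candidate in candidates:
        if candidate and candidate not in found:
            found.append(candidate)
    return found
```

An empty result corresponds to an untracked user. A result with more than one entry corresponds to the case in which differing session identifiers arrive in the one HTTP request.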
  • the plugin 58 allocates the user a unique session identifier and then sends that content to the user, at steps 22 and 25, with the session identifier attached (for example, A0D3B944).
  • each session identifier issued by the plugin 58 is unique over a predetermined period, which is typically 24 hours.
  • a convenient way to assign the session identifier is in sequence. For example, session identifiers may be numbered from one, assigned in sequence up to a predefined limit, starting again at one when the range has been exhausted. The larger the range, the less likely it is that a session identifier will be re-used in a defined period.
  • In situations where there are multiple physical network computers servicing a single network site (known as a server cluster), it is desirable to ensure that session identifiers remain unique over at least a 24 hour period.
  • One way of facilitating this is to surrender part of the session identifier range for use as a server number. This ensures that every server in the cluster, each of which has a unique server identifier, will generate session identifiers that are unique across the whole cluster.
  • the server number uniquely identifies a server in a cluster of servers. Clusters of servers are typically used on network sites with high volumes of traffic. The advantage of using this technique is that mutually exclusive session identifiers can be generated by multiple network server plugins operating on multiple network servers, without the need for synchronisation between network server plugins.
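
One way to picture this partitioning is the Python sketch below. The bit widths (8 bits for the server number within a 32-bit identifier), the hexadecimal rendering and the class name `SessionIdGenerator` are assumptions chosen for illustration; the specification does not state how the range is divided.

```python
class SessionIdGenerator:
    """Sequential session identifiers with a reserved server-number prefix."""

    def __init__(self, server_number: int, server_bits: int = 8, total_bits: int = 32) -> None:
        if not 0 <= server_number < (1 << server_bits):
            raise ValueError("server number does not fit in the reserved bits")
        self.server_number = server_number
        self.sequence_bits = total_bits - server_bits
        self.sequence_limit = (1 << self.sequence_bits) - 1  # predefined limit
        self.hex_digits = total_bits // 4
        self.counter = 0

    def next_id(self) -> str:
        # Number from one up to the limit, then start again at one.
        self.counter = self.counter % self.sequence_limit + 1
        value = (self.server_number << self.sequence_bits) | self.counter
        return f"{value:0{self.hex_digits}X}"


server_a = SessionIdGenerator(0xA0)
server_b = SessionIdGenerator(0xA1)
print(server_a.next_id(), server_b.next_id())
```

Neither generator consults the other, yet their outputs cannot collide, because the leading digits differ for every server in the cluster.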
  • the communications channel between the network server plugin 58 and traffic manager module 56 is not used to generate session identifiers, thereby facilitating the queuing of events for deferred processing when the traffic manager module 56 is temporarily unavailable.
  • network servers operating in either a stand alone or clustered configuration can rapidly generate new session IDs regardless of the availability of other network servers.
  • the client 1 would be expected to send a copy of the session identifier back to the web server every time an additional document is requested.
  • reliability may be achieved by giving the plugin 58 a number of options from which it may obtain the session identifier. Those options are listed below in Table 1.
  • the web server software 57 will occasionally receive two differing session identifiers in the one HTTP request.
  • the browsing activities of a user may also result in multiple session identifiers being detected by the session ID module 55.
  • a user may visit a site, back completely out of the site using their back button, then re-enter the site via the same path;
  • a user may arrive at a site during the same session via multiple sources (for example, clicking on two different banner advertisements, following two different links into the site, typing the URL of the website into another browser window after the user has already entered the site); and
  • the client 1 drops the session identifier, the network server plugin 58 generates a new session identifier on a later request, then the client 1 sends both session identifiers to the server.
  • the cookie may contain the session identifier 20339843, while the referrer string might contain the session identifier 20339840.
  • This situation can be handled by merging both sessions together so that two interchangeable session identifiers reference a single session. This merge operation can result in multiple clicktrees/streams under a single session, or alternatively will result in one clicktree/stream becoming part of the other clicktree/stream. This process is represented in Figure 3.
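
The merge operation can be sketched with a disjoint-set (union-find) structure, so that two interchangeable session identifiers resolve to a single session. This is an illustrative technique chosen here, not one named in the specification, and the class name `SessionMerger` is hypothetical.

```python
from typing import Dict


class SessionMerger:
    """Maps interchangeable session identifiers onto one canonical session."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, sid: str) -> str:
        """Return the canonical identifier for sid, registering it if new."""
        self._parent.setdefault(sid, sid)
        while self._parent[sid] != sid:
            # Path halving keeps later lookups short.
            self._parent[sid] = self._parent[self._parent[sid]]
            sid = self._parent[sid]
        return sid

    def merge(self, first: str, second: str) -> str:
        """Merge the sessions of two identifiers; return the canonical one."""
        root_a, root_b = self.find(first), self.find(second)
        if root_a != root_b:
            self._parent[root_b] = root_a
        return root_a


merger = SessionMerger()
merger.merge("20339843", "20339840")  # cookie value and referrer value
print(merger.find("20339840"))
```

After the merge, event messages carrying either identifier can be filed under the same session, whether the clicktrees are kept side by side or joined.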
  • navigational data in the CD Store example may include:
  • A0D3B944: the session identifier (session ID) being tracked, which is linked to a tracked user;
  • click sequence numbers may be assigned in sequence. For example, click sequence numbers are numbered from one, assigned in sequence up to a predefined limit then start again at one when the range has been exhausted. To ensure the click sequence numbers do not collide, the size of the range used for the sequence numbers should be equal to the range used for session identifiers.
  • the session identifier may be generated as a function of the current time. As each of the modules 52 to 55 only physically processes one session identifier at a time, it is possible to use the current time as the session identifier.
  • Another method that may be used to generate unique session identifiers is by reference to values that are stored in a database local to the information server engine 54. This database may be used to produce unique values to be used as session identifiers based on any number of attributes.
  • the system 100 shown in Figures 1 and 2 may track not only sessions that are taking place contemporaneously, but also sessions that appear disconnected by varying amounts of time. For example, custom tracking data may be stored on the client 1, allowing future sessions to be matched where incoming HTTP requests also reference that custom tracking data. In other words, a persistent session state between multiple sessions is simulated with a reference to the custom tracking data.
  • Custom tracking data is defined when a network user first arrives at a network site, as part of an incoming session, and normally consists of one or more identifiers that describe a link to the network site in detail.
  • a placement identifier can be incorporated into the custom tracking data to describe where on a page an advertising message has appeared.
  • a further placement identifier may be provided to identify where in a network site an advertising message appeared.
  • An advertisement identifier might be used to identify the exact advertising message presented to a network user.
  • the checksum may be appended to the session identifier to detect the corruption of session identifiers, such as the problem of users occasionally tampering with URL tokens or URL concatenation. It also ensures that data from other systems that may also be embedded in the same URL is not mistaken for a valid session identifier.
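
As a rough illustration of the checksum idea, the Python sketch below appends a CRC-32 value to the session identifier and rejects tokens whose checksum does not match. The choice of CRC-32, the `-` separator and the function names are assumptions; the specification does not say which checksum is used. A plain CRC detects accidental corruption and stray data, but it would not stop a user who deliberately recomputes it.

```python
import zlib
from typing import Optional


def attach_checksum(session_id: str) -> str:
    """Return the session identifier with a CRC-32 checksum appended."""
    checksum = zlib.crc32(session_id.encode("utf-8"))
    return f"{session_id}-{checksum:08X}"


def validate_token(token: str) -> Optional[str]:
    """Return the session identifier if the checksum matches, else None."""
    session_id, separator, checksum = token.rpartition("-")
    if not separator or not session_id:
        return None  # not in the expected identifier-checksum form
    expected = f"{zlib.crc32(session_id.encode('utf-8')):08X}"
    return session_id if checksum.upper() == expected else None


token = attach_checksum("A0D3B944")
print(validate_token(token))
```

A value that merely resembles a session identifier, such as a parameter belonging to another system in the same URL, fails validation and is ignored rather than tracked.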
  • the session ID module 55 attempts to retrieve the session identifier from the cookie value, which is temporarily stored on the client 1. Cookies are placed on the client 1, when an HTTP object, such as an Internet page, is sent to the client 1 at step 25. Set inside that cookie, or state object, is a piece of state information that allows the client 1 to determine the range of URLs for which that state is valid. Any future HTTP requests made by the client that fall within that range will include a transmittal of the current value of the state object from the client back to the server.
  • Cookies may be advantageously used in the present invention to identify people that have visited a site for the first time.
  • a persistent cookie may be used, which can exist on the client 1 for up to two years, but which is typically set to expire after 12 months. This cookie is used to identify users that revisit the site within that period. If this cookie does not exist, or if after analysing the value it is determined that the cookie was sent more than 12 months ago, a new cookie is sent to the client and an event is triggered, which is stored in memory to be accessed by other processes running on the server. Cookies may also be used to identify people that have returned to a site.
  • a monthly cookie may be used, which is checked on each impression. The purpose of this cookie is to identify users who have returned to the site within the month. If a monthly cookie does not exist, or if after analysing the value it is determined that the cookie does not refer to the current month, a new monthly cookie is sent to the client and an event is triggered, with a message stored in memory for further use by other processes running on the server.
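
The monthly cookie check can be sketched as follows. The cookie value format (`YYYY-MM`) and the function name `check_monthly_cookie` are assumptions for illustration; the specification does not describe what the cookie stores.

```python
from datetime import date
from typing import Optional, Tuple


def check_monthly_cookie(cookie_value: Optional[str], today: date) -> Tuple[Optional[str], bool]:
    """Return (new cookie value to send or None, whether an event is triggered)."""
    current_month = today.strftime("%Y-%m")
    if cookie_value == current_month:
        return None, False       # user already seen this month
    return current_month, True   # missing or stale: reissue and raise an event


print(check_monthly_cookie("2001-11", date(2001, 12, 14)))
```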
  • Cookies may be used to maintain session state. For example, a temporary cookie may be used, which expires when the browser is closed. The purpose of this temporary cookie is to help the plugin 58 track the current session identifier.
  • cookies may be used to maintain persistent custom tracking data.
  • a suitable cookie for this purpose is a persistent cookie, which can exist on the client 1 for up to two years, but which is typically set to expire after 12 months.
  • When custom tracking data is first supplied, it is stored on the client 1, and the connection sessions are tracked in regards to the custom tracking data. If at a later date a revised set of custom tracking data is supplied to the plugin 58 for a specific user, the cookie data is replaced with new custom tracking data.
  • the Referrer request-header field allows the client 1 to specify, for the server's benefit, the URI of the resource from which the Request-URI was obtained.
  • the plugin 58, at step 6, reads the referrer string that is received from the client 1 in the incoming HTTP headers. The string is analysed to determine if it contains a valid session identifier, or tce_sid CGI parameter. Additionally, the referrer is included, where possible, in the stored details regarding the current content request.
  • this user has requested a page on the network server by typing the URL into the client browser, and so does not have a current session identifier, which can be tracked by the plugin 58.
  • the fact that this request has come from an unknown user triggers a new user event.
  • the plugin 58 in such circumstances, then creates a session identifier that is unique, and commences to track this user by linking the session identifier to subsequent requests.
  • the session identifier scheme discussed above provides server-side session management, allowing seamless tracking and recording of the movement of a user on a network.
  • the first of these is persistent TCP connections.
  • a separate TCP connection was established to fetch each URL, increasing the load on HTTP servers and causing congestion on the Internet.
  • a single TCP connection is established over which a series of HTTP requests are sent.
  • An ideal use of persistent connections is as an error detection mechanism.
  • a list is maintained for the persistent connection.
  • a session map is generated informing the session manager to perform an error correction - if the error correction has already been performed for the given identifiers, the session map is discarded by the session manager.
  • the second state holder is a browser signature.
  • the plugin 58 will raise an event that stores, amongst other things, the session identifier being tracked, the page or other content the user is currently being served and the content the user has requested. If the plugin 58 cannot determine that the user who made the request is currently being tracked, the plugin 58 will raise a new user event by setting the first impression flag to an on state.
  • All impression event messages contain certain information relating to the navigation of a user, and may include the following:
  • a session identifier for example, A0D3B944
  • a previous click number the identification number processed from the token or cookie, for example, 232991
  • CGI Common Gateway Interface
  • a queue is a linear list software data structure, where all insertions are made at one end of the list and all removals and access at the other. Therefore, a queue is a First-In-First-Out (FIFO) or First-Come-First-Served (FCFS) structure used for storing certain information; here it is used for sequentially and chronologically storing event messages.
  • FIFO First-In-First-Out
  • FCFS First-Come-First-Served
  • the plugin 58 then passes the original request (in the case of the CD Store, http://www.foo.com) to the network information server engine 54.
  • the network server engine 54 retrieves the content required to display the page with the URL http://www.foo.com from the storage media 16, usually a hard disk on the network server, and then streams that data back to the plugin 58.
  • As the plugin receives the stream of data from the information server engine 54, the plugin examines the data to identify markers, or tags, that are embedded in the content of the page being sent that would indicate some transaction (result event), such as a sale, has taken place. If no transaction has taken place, the plugin simply passes the streamed data to a filter that modifies navigational links that are embedded in the streamed data.
  • One form of data files that will be sent is machine readable, or binary form, such as images (for example, *.gif; *.jpg), audio (for example, *.wav; *.au; *.mid) and executable programs (for example, *.exe; *.class).
  • a second form of data files that will be sent is text form, such as scripts (for example, *.pl; *.vbs) and markup (for example, *.html; *.sgml).
  • Each of these files can be defined according to their major and minor MIME type. For example, *.html files have the MIME type of text/html. Those files that fall into the text/* MIME category will always generate an impression event.
  • modifiable HTML tags may contain navigation links, which are used to access other data on the network. Tokens, in the form outlined above, are encoded within these navigation links in the modifiable HTML tags.
  • the session identifier (A0D3B944) has remained the same because the plugin is still tracking a particular session. Typically the date (0101) will also remain the same. The click identifier has been incremented, and the checksum will now return a different value because the composition of the token has changed.
  • the filter module 52 then examines tags that are embedded in the content being streamed to determine if there are any tags that are concerned with navigation around the network. Examples of tags that are concerned with navigation include:
  • Tags that are concerned with navigation are then dynamically altered by the plugin 58 to allow the plugin to link the next HTTP request that is received from the user, with the page that is currently being served to the user. This is done, for example, by appending a URL token to the end of each navigation tag.
  • the contents of these tokens uniquely identify each user, and also identify from which page the user has made a particular request.
  • the values that make up the contents of the tokens are retrieved from configuration files 24 and counters maintained in memory 23.
  • the token A0D3B944232991010191 is encoded within the end of each URL or URI contained in the streamed content.
  • the filter finishes processing the data being streamed through it, the data is then passed back across the Internet from the network server 51 to the client 1, where the client browser displays the requested page to the user.
  • the user may then decide to purchase an item displayed on the page just served, by clicking the words "Buy It".
  • the server passes that request through the plugin 58.
  • the plugin 58 can determine the page, or URI, being requested (http://www.foo.com/buy_it.html) and can also determine the token that was applied during the filtering process, which took place as the document was streamed to the user.
  • tce_sid signify that tce_sid is a name or common gateway interface (CGI) variable that is used by the plugin to refer to the value of the token that is received by the network server, in this case tce_sid represents the token value of A0D3B944232991010191.
  • CGI common gateway interface
  • the second decision at step 5 involves the querying of a value stored in a cookie that is typically located on the user's computer.
  • the third decision at step 6 involves the querying of the referrer field that is contained in the hypertext transfer protocol (HTTP) headers.
  • HTTP hypertext transfer protocol
  • the referrer field would, in this instance, also contain the URL http://www.foo.com, which was the previous page sent back to the user. In this example, it will be assumed that this information does not contain a valid token, and is so disregarded.
  • the plugin 58 can determine information that is relevant for tracking users around a network, or for determining a user browsing profile. Information recorded includes the tracked user from which the request originated, the navigable content from which they are navigating, and the page to which they are navigating. As the plugin 58 is able to determine that this is a tracked user, the plugin then triggers an impression event message at step 10, and at step 11 stores the retrieved data in memory device 12 so that data can be accessed by other processes running on the network server 51.
  • the plugin 58 then passes the request to the network information server engine 54 at steps 13 and 14 to retrieve the content requested from storage media 16 in a manner outlined above.
  • the information server then streams the data back through the plugin at steps 17 and 18.
  • the user in this example may have merely entered into the page where the user may make a purchase, and the transaction has not yet been completed. Accordingly, no tag will be embedded in the content being served to the user that would indicate that a transaction, or result event, has taken place and should be recorded.
  • navigational tags are dynamically altered at step 22 to append the next token, which typically includes the current session identifier and a new number that corresponds to the page that has been requested, or the click ID.
  • the token A0D3B944232999010136 will be encoded within each navigation link embedded in the served document. Note that the Server Number (A0) has remained the same, the session ID (A0D3B944) has remained the same, the date has remained the same (0101) but the click ID (232999) and the checksum (36) have changed.
  • the content is then streamed at step 25 from the network server 51, across the network 2, to the client 1. The user may then decide that the displayed CD is the one they want to purchase and so choose to now complete the purchase.
  • the plugin 58 then stores the various details relating to the user's request in an impression event message, which is stored in memory device 12, as described above.
  • the content is requested from the network information server, which streams the requested data back into the plugin.
  • tags are embedded in the content being served at step 8, thus triggering the plugin to create a result event message, and then at step 21 place that event message in memory device 12 so that message may be used by other processes running on the server.
  • This event message signifies that a transaction has taken place, and the relevant details are matched to the appropriate session identifier, linking a navigation profile with a transaction or result event that has occurred on the site.
  • the result might also be a non-financial transaction.
  • the network user could have subscribed to the CD Store's clearance sale mail list, which is known as a lead.
  • the plugin 58 appends the new token to the navigation links that are embedded in the content at step 22 and then at step 25 sends the content across the network 2 to the client 1, as outlined above.
  • Information obtained during the above described process is stored in memory device 12 in the form of impression event messages. These impression event messages are stored in a FIFO queue so that a separate process, which may be operating on the same or a different server as the network server plugin 58, is capable of storing that information in an intelligible manner that will later allow easier tracking of a user's browsing behaviour.
  • An accurate representation of a user's true experience when visiting a website may be obtained by using clicktrees path-analysis.
  • the most distinguishing feature of clicktrees stems from the way that information is stored.
  • clicktrees facilitate easier development of data mining and visualisation tools that can be used to identify the most common paths taken to achieve results. Clicktrees will also assist these analytical tools to identify areas of a network site which are either popular, or which are leading users away from results, or perhaps to rival network sites.
  • Clicktrees can also be used to describe user interactions across network sites.
  • Linear click-streams or traditional log files, fail to accurately record a user's activity, notably by being unable to:
  • clicktrees tree indexing facilitates an accurate representation of sessions in which: (a) a user clicked on a banner, left the site, then clicked on the banner again; and
  • Clicktrees may be constructed from a sequence of events by using various pieces of information, and may include: (a) a timestamp that identifies when each document was served by the web server, typically accurate to the second, and/or
  • Table 3 is a linear representation of the requests made by the user being tracked in session ID A0D3B944. This table is similar to the current form of linear log files that are currently being used on network sites to store usage information, however it will be noted that the data captured includes both click and session identifiers.
  • the session events recorded in this table can be represented by the time-based sequence shown in Figure 5, where the events have been sorted in a linear sequence, based on the time stamp of each event.
  • Existing browsing activity tracking systems record user activity in linear sequences of this type.
  • each event recorded in Table 3 has been arranged according to its previous click ID number.
  • This clicktrees tree index structure quickly displays, and most accurately describes, the network user's browsing experience.
  • a click ID number is essentially a number that is allocated to each page or other separately navigable content that is served from the network server to a network information server. As outlined above, the click ID number is also part of the URL token that is encoded within each modifiable HTML tag, which is processed by the plugin filter. The steps the filter takes in assigning a click ID are essentially as follows:
  • Dynamically tracking a user's movements around a network site involves the logical construction and storage of clicktrees in the computer's random access memory (RAM).
  • a separate clicktree is stored in RAM for each active connection session that is being tracked. The tracking of session identifiers has been described above.
  • the construction and storage of clicktrees is conducted by the traffic manager module 56, which may or may not be operating on the same computer as the network server 51 and plugin 58, obtaining an impression event message from the memory device 12.
  • the traffic manager module 56 then examines the impression event message to determine the session identifier that is stored in the impression event message. Take, for example session identifier A0D3B944, which was used in the CD example above. The session identifier A0D3B944 would be contained in the impression event message.
  • the traffic manager module 56 looks for session identifier A0D3B944 in the list of active sessions, and if it is not located as an active session, then the traffic manager module 56 starts a new session for that session identifier.
  • the traffic manager module 56 determines the previous and current click ID numbers so that it can insert the next entry in the appropriate place in the clicktree.
  • the logical relationship between each impression event (or node) in the tree index structure is clearly defined. It is also clear that the result event did not come from going through the page identified by click ID 22152. Instead, the user has pressed the 'back' navigation icon in the user's browser, and has then proceeded down a path through the page represented by click ID 22180. Notwithstanding the limits of modern technology, using this technique does not limit the number of links a node may have.
  • the tree index structure may be stored in RAM by creating a separate entry for each node, and associated fields for identification of each node to which that node is interconnected.
  • an alternative storage and representation method may be used, which reduces memory requirements.
  • This method also has the effect that traversal time is reduced due to the lack of a variable length link list at each node
  • the clicktrees tree index structure resulting from this method is shown in Figure 7.
  • a parent, child and sibling field only are required to be created for each node in the tree index structure recorded in memory, rather than allowing for the possibility of an infinite number of fields corresponding to an infinite number of nodes interconnected to each node.
  • the method used to store and access clicktrees in RAM may not be appropriate for some systems. Before storing a clicktree on such systems, the clicktrees may be converted back into a sequential list format, similar to that shown in Table 3.
  • Clicktrees can then be formed at a later stage by reconstructing the tree structure, based on previous and current click ID numbers.
  • connection sessions are practical examples of what one should expect in the CD Store example made previously.
  • a sequential list of events is shown for each session in a table. From the table, each session is represented visually in two forms i.e., as a linear clickstream based on the application of time-based sequencing, and as an optimised clicktree tree index structure based on the application of click sequence numbers.
  • An additional example is offered for some sessions in the form of a logical clicktree.
  • the following scenario is based on a web server cluster consisting of two physically distinct machines, both of which are communicating with a single traffic manager module 56 located on one of these machines or remote from both machines.
  • a first exemplary connection session is shown in Table 4.
  • a user enters the site through the homepage, visits the products page, then retrieves information about delivery payment zones. After reviewing the delivery payment zones, the user presses "back" and proceeds to review a single product. Happy with their selection, the user proceeds to buy the product and receives a receipt. Six impression events and one result event are generated.
  • a user enters the site through the homepage, visits the products page, then retrieves information about delivery payment zones. After reviewing the delivery payment zones, the user presses "back" and proceeds to review a single product. Happy with their selection, the user proceeds to buy the product and receives a receipt. Curious about where their sensitive information will end up, the user returns to the homepage by pressing the back button four times, then reads the site's privacy policy. Seven impression events and one result event are generated.
  • a third exemplary connection session is illustrated in table 6 and Figure 10.
  • After clicking on an advertisement, the user enters the site through a jump page.
  • the jump page presents the user with a form offering for them to subscribe to a special offers mailing list.
  • the user fills in their details and is presented with a thank you page after submitting their details.
  • the user After half an hour of surfing, the user is presented with another advertisement. Again clicking on the advertisement, the user this time enters the site through the frames based interface discussed previously. Five impression events (two marked as TRASH), and one non-financial result event are generated. Two impressions had a last click ID of zero, indicating that two click-trees were started during the same session (in this case the user entered the site twice).
  • Session hopping occurs when a client "hops" between two or more session IDs, making what is essentially a single session appear to be two or more simultaneous sessions.

Abstract

A method of facilitating tracking browsing activity of a user (1) in a computer network (100), the user having established a connection session with the computer network, the method including the steps of: receiving (3) a user request for content located on a network server (51) or server cluster; dynamically including (22) a unique session identifier in content served to the user during the connection session; and examining (4, 5, 6) multiple possible state holders within user content requests subsequently received by the network server or server cluster during the connection session to derive navigational data including the session identifier.

Description

A METHOD OF TRACKING BROWSING ACTIVITY OF A USER IN A COMPUTER NETWORK
The invention relates generally to a method of tracking browsing activity of a user in a computer network. The invention is suitable for use in tracking the browsing behaviour of a user of the Internet and it will be convenient to describe the invention in relation to that exemplary application. It is to be appreciated, however, that the invention is not limited to use on the Internet only.
Traditionally, browsing activity of a user on a network has been tracked by attaching a token to the Uniform Resource Locator (URL) of a page on a network. The token is typically a unique identifier that is encoded within a URL. In the exemplary URI request http://www.acme.com/doc.html?tce_sid=(A0D3B944)232991010191, the token A0D3B944232991010191 is encoded within the URI http://www.acme.com/doc.html.
When a first document is requested from the web server, it is returned to the client with a session identifier (ID) attached (for example, A0D3B944). From then on, the browsing software located on the user's computer, or client, is expected to send a copy of the session ID back to the network server every time the user requests an additional document from that network server. In other words, the network server expects the session ID to be embedded in each HTTP request that the user sends to the network server. Using the session ID data, each of these document requests can be grouped together under a single session ID, thereby creating 'Session State'.
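The token layout can be illustrated with a minimal Python sketch. The field widths used here (an 8-character session ID whose first two characters are the server number, a 6-digit click ID, a 4-digit date and a 2-digit checksum) are inferred from the example tokens in this document and are an assumption, as are the names `Token`, `parse_token` and `token_from_url`. The checksum algorithm is not stated, so the sketch extracts the checksum field without verifying it.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlparse


class Token(NamedTuple):
    server_number: str  # first two characters of the session ID (assumed layout)
    session_id: str     # e.g. "A0D3B944"
    click_id: str       # e.g. "232991"
    date: str           # e.g. "0101"
    checksum: str       # e.g. "91" (extracted only, never verified here)


def parse_token(raw: str) -> Optional[Token]:
    """Split a 20-character token into its assumed fixed-width fields."""
    # The example URI shows the session ID in parentheses; tolerate both forms.
    cleaned = raw.replace("(", "").replace(")", "")
    if len(cleaned) != 20:
        return None
    session_id = cleaned[:8]
    click_id, date, checksum = cleaned[8:14], cleaned[14:18], cleaned[18:20]
    if not (session_id.isalnum() and click_id.isdigit()
            and date.isdigit() and checksum.isdigit()):
        return None
    return Token(session_id[:2], session_id, click_id, date, checksum)


def token_from_url(url: str, param: str = "tce_sid") -> Optional[Token]:
    """Read the token from the tce_sid query parameter, if present."""
    values = parse_qs(urlparse(url).query).get(param)
    return parse_token(values[0]) if values else None
```

Grouping requests by `session_id` would then give the 'Session State' described above, while `click_id` distinguishes individual pages within that session.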
A problem with the use of URL tokens is that hyperlinks and code are often cut and pasted into other network pages, or are placed in personal bookmarks, and repeatedly used to access pages on the network site. The result of this is that when people link to a page on a network where a session manager is installed, they include a session ID in their hyperlink. Whenever a user clicks such a hyperlink, the network server incorrectly re-uses the old session ID. In a worst case scenario, multiple users will simultaneously click on the same hyperlink and be tracked as one session rather than as multiple sessions.
User browsing activity may also be tracked by the placement of a cookie on the user's computer (client) when an HTTP object, such as an Internet page, is sent to a client. Set inside that cookie, or state object, is a piece of state information that allows the client to determine the range of URLs for which that state is valid. Any future HTTP requests made by the client that fall within that range will include a transmittal of the current value of the state object from the client back to the server.
This provides a simple mechanism that enables a range of new types of applications to be written for network-based environments. For example, shopping applications, as described in the example below, can store information about items that are currently selected for purchase by a customer. Cookies also free the client from being required to retype a user-ID on the next connection.
Some sites also use cookies to store user preferences on the client computer, and have the client supply those preferences every time the user connects to that site.
Cookies however cannot be relied upon to maintain session state in all instances. This is particularly so where users have disabled cookies, where cookies are not supported by the web browser or where the domain name of the web site changes mid-session (for example, moving from an insecure to a secure site for the purpose of completing an online transaction).
Further, the traditional method of tracking a network user's behaviour has been by using a linear click-stream, or sequential log, which is stored on some form of media, typically a sequential file on a hard disk. However, linear click-streams, or traditional log files, are time stamped and as such are entered into the log chronologically. Therefore, linear click-streams fail to accurately record a user's activity, notably by being unable to record where and when the user used their browser's back button, to track (and accurately represent) sessions across meta-refresh and HTTP redirects, and to accurately reconstruct a user's activity into and out of frames, popup consoles and new browser windows.
It would therefore be desirable to provide a method of facilitating tracking browsing activity of a user in a computer network that ameliorates or overcomes one or more problems of known browsing activity tracking methods.
One aspect of the present invention provides a method of facilitating tracking browsing activity of a user in a computer network, the user having established a connection session with the computer network, the method including the steps of: receiving a user request for content located on a network server or server cluster; dynamically including a unique session identifier in content served to the user during the connection session; and examining multiple possible state holders within user content requests subsequently received by the network server or server cluster during the connection session to derive navigational data including the session identifier.
The session identifier may include a server identifier and a session-specific alphanumeric identifier. Possible state holders may include any one or more of the following: a content identifier, a Cookie value, a referrer request header field, a persistent connection indicator, and a browser signature.
The content identifier may be a uniform resource identifier.
In one or more embodiments, the method may further include the step of: dynamically including in the content served to the user any one or more of a content locater, a content sequence identifier, a temporal stamp and a checksum.
The content locater, content sequence identifier, temporal stamp and/or checksum may be appended to the session identifier.
The content locater may be a uniform resource locater. The content sequence identifier is a click, request or page sequence number. In one or more embodiments, the method may further include the step of: using the temporal stamp to temporally locate a session event during which content was served to a user. The method may also include the step of: using the checksum to validate the session identifier.
The connection session may be a persistent session linking session events from multiple single sessions established by a same user.
Navigational data derived from content served to the user during each session event may be stored in a session event message. The session event message may be transmitted to a traffic manager to link session events within a same connection session.
The traffic manager may be located on the same machine as the network server or server cluster, or may be located remotely from the network server or server cluster. The method may further include the step of: transmitting the user request for content to an information server engine for retrieval of the requested content.
The method may further include the step of: transmitting the requested content to a filter module to dynamically include at least part of the navigational data including the session identifier in the requested content prior to serving the requested content to the user.
At least part of the navigational data may be obtained from counters or configuration files accessible from the filter module.
Another aspect of the invention provides a method of facilitating tracking browsing activity of a user in a computer network, wherein the user establishes a connection session with the computer network, and a unique content identifier is assigned to each separate instance of navigable content within the network, the method including the steps of: receiving navigational data, including the previous and current content identifier, corresponding to content served to a user for each session event during the connection session; and deriving a tree index structure which logically interconnects nodes representing instances of content being served to the user for each session event during the connection session with nodes representative of content from which the user has navigated.
The tree structure may be stored in a memory device, with a separate entry being made for each node and associated fields for identifying each node to which that node is interconnected. Alternatively, the tree index structure may be stored in a memory device, a separate entry being created for each node and associated parent, child and sibling links being created to identify nodes interconnected to that node.
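The parent, child and sibling storage described above can be sketched in Python as follows. This is an illustrative sketch only: the names `Node`, `build_clicktree` and `children` are hypothetical, click IDs are assumed to be unique within a session, and a previous click ID of zero is treated as the start of a new tree, as the session examples in this document indicate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    click_id: int
    parent: Optional[int] = None   # click ID of the page navigated from
    child: Optional[int] = None    # click ID of the first child
    sibling: Optional[int] = None  # click ID of the next sibling


def build_clicktree(events: List[Tuple[int, int]]) -> Dict[int, Node]:
    """Build a tree from (previous_click_id, current_click_id) pairs."""
    nodes: Dict[int, Node] = {}
    for previous, current in events:
        if current in nodes:
            continue  # assumed never to happen; guards against cycles
        node = Node(current)
        nodes[current] = node
        if previous == 0 or previous not in nodes:
            continue  # root of a new tree, or parent not seen yet
        node.parent = previous
        parent = nodes[previous]
        if parent.child is None:
            parent.child = current
        else:
            # Walk the sibling chain and append at the end.
            last = nodes[parent.child]
            while last.sibling is not None:
                last = nodes[last.sibling]
            last.sibling = current
    return nodes


def children(nodes: Dict[int, Node], click_id: int) -> List[int]:
    """List the children of a node by following child then sibling links."""
    result: List[int] = []
    current = nodes[click_id].child
    while current is not None:
        result.append(current)
        current = nodes[current].sibling
    return result
```

Each node carries exactly three link fields regardless of how many pages were reached from it, which is the fixed-size property the second storage option relies on. A page with two children, such as one revisited via the back button, appears as a child link plus one sibling link.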
The method may further include the step of: merging connection sessions in which one or more session event messages have common session identifiers.
A further aspect of the invention may provide a network server or server cluster including a processing unit and associated memory encoded with network server software for causing the network server to perform a method of facilitating tracking browsing activity of a user in a computer network as previously described.
Yet another aspect of the invention may provide a software plug-in forming part of the network server software of a network server or server cluster described above, wherein the software plug-in causes the network server to perform a method of facilitating tracking browsing activity of a user in a computer network as previously described.
The software plug-in may comprise: a filter module for dynamically including a unique session identifier in content served to the user during the connection session; and a session ID module for examining multiple possible state holders within user content requests subsequently received by the network server or server cluster during the connection session to derive navigational data including the session identifier. The filter module may assign a unique content identifier to each separate instance of navigable content within the computer network. The plug-in may further comprise: a traffic manager module for receiving navigational data, including the previous and current content identifier, corresponding to content served to a user for each session event during the connection session; and deriving a tree index structure which logically connects content served to the user for each session event during the connection session with content from which the user has navigated.
The traffic manager module may be located remotely from the network server or server cluster.
The following description refers in more detail to the various features of the present invention. To facilitate an understanding of the invention, reference is made in the description to the accompanying drawings where the method of tracking browsing activity of a user in a computer network is illustrated in a preferred embodiment. It is to be understood that the present invention is however not limited to that exemplary embodiment.
In the drawings:
Figure 1 is a general schematic diagram illustrating one embodiment of a content serving system performing a method of tracing browsing activity in accordance with the present invention;
Figure 2 is a detailed schematic diagram of the system of Figure 1;
Figure 3 is a flow chart illustrating steps performed by the session ID module forming part of the network server included in the system of Figures 1 and 2;
Figure 4 is a diagram illustrating navigational data included in content served to a user of the system of Figures 1 and 2;
Figure 5 is a flow chart showing a prior art linear click stream of session events occurring during the performance of the present invention;
Figure 6 is a flow chart showing a tree index structure used to record session events occurring during the present invention;
Figure 7 is a variant of the tree index structure of Figure 6, and
Figures 8 to 10 are examples of tree index structures representing exemplary connection sessions performed in accordance with the present invention.
Referring now to Figure 1, there is shown generally a content serving system 100 including a network user or client 1 and a network server 51. The client 1 and network server 51 are interconnected in this example by the Internet 2. The network server 51 acts to serve content stored in a storage media 16 to the client 1. The network server 51 includes network server software 57, the functionality of which is determined by a network server plugin 58 acting in conjunction with an information server engine 54 to serve content stored on the storage media 16 to the client 1. The network server plugin 58 includes a session ID detection module 55, a synchronisation module 53 and a filter module 52. In this example, the network server plugin 58 also includes a traffic manager module 56, although in other embodiments this module may be provided remotely from the network server 51.
The operation of the system 100 will now be briefly described with relation to Figure 2. Initially, the client 1 makes a request to receive content located on the network server 51. This request is intercepted by the session ID module 55 and analysed at step 3. At step 4, the session ID module 55 determines whether a token has been encoded within the HTTP request sent by the client 1. Next, the session ID module 55 determines at step 5 whether a cookie stored on the client 1 has set a cookie value, acting as a session identifier, within the HTTP request. In the next step 6, the session ID module 55 determines whether a session identifier can be derived from the referrer field in the HTTP request header.
If the session ID module 55 determines that a session identifier can be determined at any one or more of steps 4 to 6, the session identifier will be temporarily stored in a memory storage device 7.
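The examination of state holders at steps 4 to 6 might be sketched in Python as below. Several details are assumptions made for illustration: the function name `find_session_id`, the use of `tce_sid` as the cookie name, the cookie holding the bare session identifier, and the 8-character session ID prefix of the token. The sketch returns the first identifier found, whereas the description allows an identifier to be determined at any one or more of the steps.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse

PARAM = "tce_sid"       # CGI parameter name used in this document
SESSION_ID_LENGTH = 8   # assumed width of the session ID within a token


def _session_id_from_url(url: str) -> Optional[str]:
    """Return the session ID prefix of a tce_sid token in a URL, if any."""
    values = parse_qs(urlparse(url).query).get(PARAM)
    if not values or len(values[0]) < SESSION_ID_LENGTH:
        return None
    return values[0][:SESSION_ID_LENGTH]


def find_session_id(url: str, headers: Dict[str, str]) -> Optional[str]:
    """Check the URL token, then the cookie, then the referrer header."""
    # Step 4: token encoded within the requested URL.
    session_id = _session_id_from_url(url)
    if session_id:
        return session_id
    # Step 5: cookie value acting as a session identifier (name assumed).
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if PARAM in cookie and cookie[PARAM].value:
        return cookie[PARAM].value
    # Step 6: token carried in the referrer field (HTTP spells it "Referer").
    return _session_id_from_url(headers.get("Referer", ""))
```

A result of `None` corresponds to the case where no state holder yields an identifier, for example when the user typed the URL directly into the browser.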
If the module 55 is unable to positively identify the user as a user whose browsing activity is currently being tracked by matching the detected session identifier with previously generated and stored session identifiers, the session ID module 55 will trigger a flagged impression event at step 9. Alternatively, if the session ID module 55 is able to identify the user as one that is currently being tracked at step 8, a non-flagged impression event will be triggered at step 10. At step 11, flagged and non-flagged impression events are placed into a queue in a memory device 12. Subsequent event messages are added to the memory device 12, and the contents of that memory device may be subsequently retrieved by the traffic manager module 56 to link session events within a same connection session.
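The flagged and non-flagged impression events and their FIFO queue could look roughly like the following Python sketch. The class and field names are hypothetical, and the sketch assumes that a fresh 8-character identifier is issued whenever the presented identifier is missing or unknown; the document does not specify how identifiers are generated.

```python
import secrets
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Set


@dataclass
class ImpressionEvent:
    session_id: str
    requested: str
    first_impression: bool  # True for a flagged (new user) event


class EventQueue:
    """FIFO queue of impression events, standing in for memory device 12."""

    def __init__(self) -> None:
        self._queue: Deque[ImpressionEvent] = deque()
        self._tracked: Set[str] = set()

    def record(self, session_id: Optional[str], requested: str) -> ImpressionEvent:
        # Step 8: is this a session that is already being tracked?
        flagged = session_id is None or session_id not in self._tracked
        if flagged:
            # Step 9: new user event; issue an identifier (assumed format).
            session_id = secrets.token_hex(4).upper()
            self._tracked.add(session_id)
        event = ImpressionEvent(session_id, requested, flagged)
        self._queue.append(event)  # step 11: insert at the tail
        return event

    def pop(self) -> Optional[ImpressionEvent]:
        """Remove the oldest event, as a traffic manager consumer would."""
        return self._queue.popleft() if self._queue else None
```

Because insertions happen at one end and removals at the other, a consumer reading the queue sees events in the order they were raised.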
Once the session ID module 55 has stored an event message in the memory device 12, the content request is passed at step 13 to the information server engine 54. Once the server request is received by the information server engine 54 at step 14, the information server engine 54 retrieves the requested content stored in the storage media 16 at step 15, and streams or otherwise transmits this retrieved content at step 17 to the filter module 52.
At step 18, the content forwarded from the information server engine 54 is received by the filter module 52, and is subsequently examined at step 19 to determine if the content is part of a sale, subscription, query or other result event. If the content does contain something to signify that it is part of a result event at step 20, the filter module 52 raises a result event message at step 21, and transmits this event message for storage in a queue maintained in the memory device 12.
The filter module 52 then dynamically alters the content at step 22 to be served to the user, in order that the browsing activity of the user may be tracked during the current connection session. Information that is dynamically included into the content at step 22 is obtained either from counters that exist in a memory device 23, or from configuration files 24, which may typically be maintained locally on the network server 51. Once the content has been dynamically altered at step 22, the content is transmitted at step 25 across the Internet 2 to the client 1 for viewing by the user.
The synchronisation module 53 generates a set of synchronisation event messages to allow for temporal stamping of information flowing from the plugin 58. Synchronisation event messages are initiated at step 26 by the generation at regular intervals of a sync signal, which is sent to the memory device 12 at step 27 either as a timesync event message 28 or a server ping message 29. The purpose of the timesync event message 28 is to mark the place in the queue maintained in the memory device 12 when a particular time occurred. For example, if the queue maintained in the memory device 12 was backlogged by 30 minutes worth of processing, then the timesync signal will still allow interfacing software to record the viewing transactions with some form of time, date or other temporal data. At predetermined intervals, server ping messages 29 may be placed in the queue maintained in the memory device 12 for each website or source of content active on the webserver 57. The purpose of the server ping message 29 is to signal to interfacing software that the network server is still active.
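The role of the timesync marker in a backlogged queue may be illustrated with the following minimal Python sketch. The message layout (dictionaries with `type`, `timestamp`, `site` and `url` keys) and the function names are assumptions made for the example; the specification does not define a message format.

```python
def make_sync_events(now, sites):
    """Build one timesync message plus one server ping per active site."""
    events = [{"type": "timesync", "timestamp": now}]
    events += [{"type": "server_ping", "site": s, "timestamp": now} for s in sites]
    return events


def timestamp_backlog(queue):
    """Give each impression the time of the most recent timesync marker
    that precedes it in the queue, however late the queue is processed."""
    last_sync = None
    stamped = []
    for msg in queue:
        if msg["type"] == "timesync":
            last_sync = msg["timestamp"]
        elif msg["type"] == "impression":
            stamped.append((msg["url"], last_sync))
    return stamped
```

Because the time is carried by the marker's position in the queue, a consumer that runs 30 minutes behind still attributes each impression to the interval in which it was queued.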
The traffic manager module 56 interfaces with the network server plugin 58, and is responsible for processing the event messages generated by the network server plugin 58. Initially, the traffic manager module 56 obtains the event messages maintained in the queue in the memory device 12 at step 30. The traffic manager module 56 then determines at step 31 if a retrieved event message is a server ping event message, which is irrelevant to the storage of information regarding the browsing activity of a user. If a server ping event message is detected, the message is ignored at step 32. If the event message is not a server ping message, the traffic manager module 56 considers at step 33 whether the particular user identified by the event message is already known to the traffic manager module 56 (and hence is a tracked user). If the user is determined not to be a tracked user, then no existing session details will be recorded for that user and the traffic manager module 56 acts, at step 34, to create a new session and assign the new user a unique session identifier. The user's unique session identifier is then stored in a searchable index 35 for subsequent reference.
Alternatively, if it is determined at step 33 that a user is a tracked user and thus has an existing session, or where that session has just been created at step 34, the traffic manager module 56 acts at step 36 to record the information relating to the request into a clicktree or in other words a click sequence based tracking structure maintained in a storage device 37. At step 38, the traffic manager module 56 transmits tracking data to a desired location, for example, to an administration computer 39 connected to the Internet 2.
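The processing performed by the traffic manager module at steps 30 to 36 may be sketched, under stated assumptions, as the following Python loop. The event layout, the `process_events` name and the `next_id` callback are hypothetical; a plain dictionary stands in for the searchable index 35 and a list for the tracking structure in the storage device 37.

```python
from collections import deque


def process_events(queue, sessions, next_id):
    """Drain the event queue in the manner of steps 30 to 36.

    queue    -- deque of event dicts (assumed layout)
    sessions -- dict mapping session id -> list of recorded requests
    next_id  -- callable returning a fresh unique session id
    """
    while queue:
        event = queue.popleft()                # step 30: obtain the next message
        if event["type"] != "impression":
            # Server pings are ignored (steps 31 and 32); other message
            # types are skipped here only to keep the sketch short.
            continue
        sid = event.get("session_id")
        if sid not in sessions:                # step 33: is the user tracked?
            sid = sid or next_id()             # step 34: create a new session
            sessions[sid] = []
        sessions[sid].append(event["url"])     # step 36: record the request
    return sessions
```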
There will now be described an exemplary application of the present invention, based on an online store that sells audio compact discs (CDs) over a computer network, such as the Internet. In this example, a user initially decides to visit a particular site on a network. In this case, the user makes the decision to browse for a CD to purchase from a selected online CD store, which is located on the Internet at http://www.foo.com.
The user types the URL of the chosen online CD Store (here, www.foo.com) into the browsing software that is installed on the client 1. To track the browsing behaviour of the user, the online CD store maintains a network server as shown in Figure 2 to serve content to online visitors. The web server software 57 of the network server 51 communicates with the software applet or plugin 58 that performs the tracking functions that the Store wishes to perform. Examples of web server software that may be adapted for use in the present invention include Microsoft Internet Information Server and Apache HTTP Server.
The plugin 58 monitors incoming HTTP requests from users connected to the network. The plugin 58 attempts to match session identifiers that are embedded in incoming HTTP requests with the session identifiers of users that the plugin is already tracking. To determine if a particular HTTP request is from a user that is currently being tracked, the plugin 58 examines multiple possible state holders within the request to determine whether a session identifier can be derived from one or more of these state holders. In this example, a three-tier process is adopted. Firstly, the plugin determines whether there has been a token encoded within the request that will properly identify the user to the plugin. Secondly, the plugin determines if there is a cookie stored on the user's computer that helps the plugin identify the user. Finally, the plugin determines, at step 6, whether any information can be gleaned from the HTTP header, including the referrer value, which accompanied the request for content to be served to a particular user.
When a first document or other content is initially requested from the network server 51, the plugin 58 allocates the user a unique session identifier and then sends that content to the user, at steps 22 and 25, with the session identifier attached (for example, A0D3B944).
To ensure that data from two unrelated connection sessions do not overlap, each session identifier issued by the plugin 58, is unique over a predetermined period, which is typically 24 hours. A convenient way to assign the session identifier is in sequence. For example, session identifiers may be numbered from one, assigned in sequence up to a predefined limit, starting again at one when the range has been exhausted. The larger the range, the less likely it is that a session identifier will be re-used in a defined period.
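The sequential assignment described above may be illustrated by the following minimal Python sketch. The class name and the `limit` parameter are hypothetical; the specification does not prescribe any particular range.

```python
class SequentialIdAllocator:
    """Hand out identifiers 1..limit in order, wrapping back to 1."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self._next = 1

    def allocate(self):
        value = self._next
        # Start again at one when the range has been exhausted.
        self._next = 1 if value == self.limit else value + 1
        return value
```

With a limit of 3, successive calls yield 1, 2, 3, 1, 2. A larger limit lengthens the interval before any value is re-used.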
In situations where there are multiple physical network computers servicing a single network site (known as a server cluster), it is desirable to ensure that session identifiers remain unique over at least a 24 hour period. One way of facilitating this is to surrender part of the session identifier's range for use as a server number. This ensures that every server in the cluster, each of which has a unique server identifier, will generate session identifiers that are unique across the whole cluster. The server number uniquely identifies a server in a cluster of servers. Clusters of servers are typically used on network sites with high volumes of traffic. The advantage of using this technique is that mutually exclusive session identifiers can be generated by multiple network server plugins operating on multiple network servers, without the need for synchronisation between network server plugins. The communications channel between the network server plugin 58 and traffic manager module 56 is not used to generate session identifiers, thereby facilitating the queuing of events for deferred processing when the traffic manager module 56 is temporarily unavailable. Given the autonomy of this session identifier generation procedure, network servers operating in either a stand alone or clustered configuration can rapidly generate new session IDs regardless of the availability of other network servers.
In an optimal situation, the client 1 would be expected to send a copy of the session identifier back to the web server every time an additional document is requested. However, given the unpredictable nature of the Internet, the variations in network technology (such as proxies and firewalls), and the diverse range of web browsers in use, reliability may be achieved by giving the plugin 58 a number of options from which it may obtain the session identifier. Those options are listed below in Table 1.
Table 1
[Table 1 is reproduced as an image (imgf000015_0001) in the original publication. Recoverable column headings: From Server-to-Client; From Client-to-Server.]
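The technique of surrendering part of the session identifier range for use as a server number, described above in relation to server clusters, may be sketched as follows. The split into a two-digit hexadecimal server number followed by a six-digit hexadecimal counter is an assumption inferred from the example identifier A0D3B944; the function name is hypothetical.

```python
def make_session_id(server_number, counter):
    """Combine a server number with a per-server counter.

    Assumed layout: 2 hex digits of server number followed by 6 hex
    digits of locally generated counter, giving 8 hex digits in total.
    """
    if not 0 <= server_number <= 0xFF:
        raise ValueError("server number out of range")
    if not 0 <= counter <= 0xFFFFFF:
        raise ValueError("counter out of range")
    return f"{server_number:02X}{counter:06X}"
```

Two servers holding the same counter value still produce different identifiers, so no synchronisation between the servers is needed.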
Due to the existence of various technologies in network communications, such as proxy servers and other caching technologies, the web server software 57 will occasionally receive two differing session identifiers in the one HTTP request.
The browsing activities of a user may also result in multiple session identifiers being detected by the session ID module 55. In particular:
(a) a user may visit a site, back completely out of the site using their back button, then re-enter the site via the same path;
(b) a user may arrive at a site during the same session via multiple sources (for example, clicking on two different banner advertisements, following two different links into the site, typing the URL of the website into another browser window after the user has already entered the site); and
(c) the client 1 may drop the session identifier, the network server plugin 58 then generates a new session identifier on a later request, and the client 1 subsequently sends both session identifiers to the server.
For example, the cookie may contain the session identifier 20339843, while the referrer string might contain the session identifier 20339840. This situation can be handled by merging both sessions together so that two interchangeable session identifiers reference a single session. This merge operation can result in multiple clicktrees/streams under a single session, or alternatively will result in one clicktree/stream becoming part of the other clicktree/stream. This process is represented in Figure 3.
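One possible way of making two session identifiers interchangeable is an alias map in the style of a disjoint-set (union-find) structure, sketched below in Python. This is an illustrative choice only; the specification does not state which data structure performs the merge, and the class and method names are hypothetical.

```python
class SessionAliases:
    """Map interchangeable session identifiers to one canonical session."""

    def __init__(self):
        self._parent = {}

    def canonical(self, sid):
        """Return the identifier that currently represents sid's session."""
        self._parent.setdefault(sid, sid)
        while self._parent[sid] != sid:
            # Path halving keeps later lookups short.
            self._parent[sid] = self._parent[self._parent[sid]]
            sid = self._parent[sid]
        return sid

    def merge(self, a, b):
        """Make a and b refer to the same session; return its canonical id."""
        ra, rb = self.canonical(a), self.canonical(b)
        if ra != rb:
            self._parent[rb] = ra
        return ra
```

After merging the cookie value 20339843 with the referrer value 20339840, a lookup of either identifier resolves to the same session.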
In a variant of the above-described technique, other navigational data may be appended to enhance the reliability and the functionality of the information collected about each session. As seen in Figure 4, the navigational data in the CD Store example may include:
(a) A0 = the server number;
(b) A0D3B944 = the session identifier (session ID) being tracked, which is linked to a tracked user;
(c) 232991 = the current click/request/page sequence number;
(d) 0101 = the day and month; and
(e) 91 = a checksum.
Accordingly, a URL token with this navigational data embedded in it may look something like: http://www.foo.com/doc.html?tce_sid=A0D3B944232991010191
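The layout of the example token may be illustrated by the following Python sketch, which splits a 20-character token into the fields listed above. The fixed field widths (8, 6, 4 and 2 characters) are inferred from the single example A0D3B944232991010191 and are an assumption, as are the names `Token` and `parse_token`.

```python
from typing import NamedTuple


class Token(NamedTuple):
    server_number: str   # first two characters of the session identifier
    session_id: str      # eight characters, including the server number
    click: str           # click/request/page sequence number
    day_month: str       # day and month
    checksum: str        # two-character checksum


def parse_token(token):
    """Split a token into its fields using the assumed fixed widths."""
    if len(token) != 20:
        raise ValueError("expected a 20-character token")
    return Token(
        server_number=token[0:2],
        session_id=token[0:8],
        click=token[8:14],
        day_month=token[14:18],
        checksum=token[18:20],
    )
```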
Every document or other content that is served is numbered in the range 0 through to 16,777,125, regardless of the session identifier. This click sequence number is inserted into the session identifier, and later used to reconstruct the sequence of events occurring in a single session. As with session identifiers, click sequence numbers may be assigned in sequence. For example, click sequence numbers may be numbered from one and assigned in sequence up to a predefined limit, starting again at one when the range has been exhausted. To ensure the click sequence numbers do not collide, the size of the range used for the sequence numbers should be equal to the range used for session identifiers.
It will be appreciated that alternate session identifier generation methods may be used with the present invention. For example, the session identifier may be generated as a function of the current time. As each of the modules 52 to 55 only physically processes one session identifier at a time, it is possible to use the current time as the session identifier.
Another method that may be used to generate unique session identifiers is by reference to values that are stored in a database local to the information server engine 54. This database may be used to produce unique values to be used as session identifiers based on any number of attributes.
The system 100 shown in Figures 1 and 2 may track not only sessions that are taking place contemporaneously, but also sessions that appear disconnected by varying amounts of time. For example, custom tracking data may be stored on the client 1, so that future sessions can be matched where incoming HTTP requests also reference that custom tracking data. In other words, a persistent session state between multiple sessions is simulated with a reference to the custom tracking data. Custom tracking data is defined when a network user first arrives at a network site, as part of an incoming session, and normally consists of one or more identifiers that describe a link to the network site in detail.
For example, a placement identifier can be incorporated into the custom tracking data to describe where on a page an advertising message has appeared. A further placement identifier may be provided to identify where in a network site an advertising message appeared. An advertisement identifier might be used to identify the exact advertising message presented to a network user. When user activity is further analysed from the perspective of the custom tracking data, the appearance of persistent state can then be simulated, as though the connection session never ended.
A major problem with URL tokens is that hyperlinks and code are often cut and pasted into other network pages, or are placed in personal bookmarks and used repeatedly. The result of this is that when people link to a page on a network where a session manager is installed, they include the session identifier in their hyperlink. Whenever a user clicks such a hyperlink, the network server incorrectly re-uses the old session identifier. In a worst case scenario, multiple users will simultaneously click on the same hyperlink and be tracked as one session (as opposed to multiple sessions). One solution to this problem is to include a mechanism whereby session identifiers expire after a determined period. The date information embedded in the navigational data shown in Figure 4 can be used to achieve this.
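An expiry check based on the embedded date may be sketched as follows. The sketch assumes that the four date characters are in day-then-month order (the example value 0101 does not distinguish the two orders), that no year is carried in the token, and that a one-day lifetime is wanted; the function name and the `max_age_days` parameter are hypothetical.

```python
from datetime import date


def token_expired(day_month, today, max_age_days=1):
    """Return True if a token stamped with day_month (assumed DDMM) is too old.

    Because the token carries no year, a date later than today is taken
    to belong to the previous year.
    """
    try:
        day, month = int(day_month[:2]), int(day_month[2:])
        issued = date(today.year, month, day)
        if issued > today:
            issued = date(today.year - 1, month, day)
    except ValueError:
        # Malformed or impossible dates are treated as expired.
        return True
    return (today - issued).days > max_age_days
```

A bookmarked link carrying a token several days old would then be rejected, and a fresh session identifier issued in its place.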
The checksum may be appended to the session identifier to detect the corruption of session identifiers, such as the problem of users occasionally tampering with URL tokens or URL concatenation. It also ensures that data from other systems that may also be embedded in the same URL is not mistaken for a valid session identifier.
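The specification does not disclose how the checksum is calculated. Purely to illustrate the validation step, the Python sketch below uses a hypothetical checksum (the sum of the character codes modulo 100, written as two decimal digits); it is not expected to reproduce the checksum values shown in the examples of this document.

```python
def checksum(payload):
    """Hypothetical two-digit checksum: sum of ASCII codes modulo 100."""
    return f"{sum(payload.encode('ascii')) % 100:02d}"


def append_checksum(payload):
    """Return the payload with its checksum appended."""
    return payload + checksum(payload)


def is_valid(token):
    """Check that the last two characters match the checksum of the rest."""
    if len(token) < 3 or not token.isascii():
        return False
    return checksum(token[:-2]) == token[-2:]
```

A token that has been edited by hand, or that has had unrelated URL data concatenated to it, will usually fail this comparison and can be disregarded.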
If the HTTP request is received by the plugin 58, and either the token encoded within the URL is invalid (for example, it has expired or does not match the checksum) or is missing, then the session ID module 55 attempts to retrieve the session identifier from the cookie value, which is temporarily stored on the client 1. Cookies are placed on the client 1, when an HTTP object, such as an Internet page, is sent to the client 1 at step 25. Set inside that cookie, or state object, is a piece of state information that allows the client 1 to determine the range of URLs for which that state is valid. Any future HTTP requests made by the client that fall within that range will include a transmittal of the current value of the state object from the client back to the server.
This provides a simple mechanism that enables a host of applications to be written for web-based environments. Shopping applications, as described in this example, can store information about the currently selected items. Cookies also free a user from retyping a user identifier on next connection. Sites may also store per-user preferences on the client 1 and have the client 1 supply those preferences every time that site is connected to.
Cookies may be advantageously used in the present invention to identify people that have visited a site for the first time. A persistent cookie may be used, which can exist on the client 1 for up to two years, but which is typically set to expire after 12 months. This cookie is used to identify users that revisit the site within that period. If this cookie does not exist, or if after analysing the value it is determined that the cookie was sent more than 12 months ago, a new cookie is sent to the client and an event is triggered, which is stored in memory to be accessed by other processes running on the server.
Cookies may also be used to identify people that have returned to a site (for calculating re-visitation rates).
For example, a monthly cookie may be used, which is checked on each impression. The purpose of this cookie is to identify users who have returned to the site within the month. If a monthly cookie does not exist, or if after analysing the value it is determined that the cookie does not refer to the current month, a new monthly cookie is sent to the client and an event is triggered, with a message stored in memory for further use by other processes running on the server.
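The monthly cookie check may be sketched in Python as follows. The cookie value format (year and month, for example `2001-01`), the event layout and the function name are assumptions made for the example.

```python
from datetime import date


def check_monthly_cookie(cookie_value, today):
    """Check the monthly cookie on an impression.

    Returns (new_cookie_value, event). Both are None when the cookie
    already refers to the current month, that is, when the user has
    already been seen this month.
    """
    current = today.strftime("%Y-%m")
    if cookie_value == current:
        return None, None
    # Cookie missing or from an earlier month: send a new cookie and
    # raise an event for other processes running on the server.
    return current, {"type": "monthly_cookie_set", "month": current}
```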
Cookies may be used to maintain session state. For example, a temporary cookie may be used, which expires when the browser is closed. The purpose of this temporary cookie is to help the plugin 58 track the current session identifier.
Finally, cookies may be used to maintain persistent custom tracking data. A suitable cookie for this purpose is a persistent cookie, which can exist on the client 1 for up to two years, but which is typically set to expire after 12 months. When custom tracking data is first supplied, it is stored on the client 1, and the connection sessions are tracked in regards to the custom tracking data. If at a later date a revised set of custom tracking data is supplied to the plugin 58 for a specific user, the cookie data is replaced with the new custom tracking data.
The Referrer request-header field allows the client 1 to specify, for the server's benefit, the URI of the resource from which the Request-URI was obtained. The plugin 58, at step 6, reads the referrer string that is received from the client 1 in the incoming HTTP headers. The string is analysed to determine if it contains a valid session identifier, or tce_sid CGI parameter. Additionally, the referrer is included, where possible, in the information that is stored with the details regarding the current content request.
For the CD Store example, it is sufficient to say that this user has requested a page on the network server by typing the URL into the client browser, and so does not have a current session identifier, which can be tracked by the plugin 58. The fact that this request has come from an unknown user triggers a new user event. This causes the plugin 58 to create an impression event message, which is stored in the memory device 12 for use by another process at a later stage. The plugin 58, in such circumstances, then creates a session identifier that is unique, and commences to track this user by linking the session identifier to subsequent requests.
The session identifier scheme discussed above provides server-side session management, allowing seamless tracking and recording of the movement of a user on a network. However, there are at least two other state holders that may be used to maintain session state, and allow the plugin 58 to track a particular user's browsing behaviour.
The first of these is persistent TCP connections. Prior to persistent connections, a separate TCP connection was established to fetch each URL, increasing the load on HTTP servers and causing congestion on the Internet. With persistent connections, a single TCP connection is established over which a series of HTTP requests are sent.
An ideal use of persistent connections is as an error detection mechanism. As session identifiers pass over the persistent connection, a list is maintained for the persistent connection. Whenever second and subsequent session identifiers are identified, a session map is generated informing the session manager to perform an error correction - if the error correction has already been performed for the given identifiers, the session map is discarded by the session manager. It is also possible to assign a session identifier to each persistent connection and session map that identifier in terms of the other session identifiers passed over the persistent connection.
The second state holder is a browser signature. Through the application of a hashing function to a combination of HTTP request headers and the remote network address, a short-term network user identifier can be generated. This network user identifier can then be used as another session identifier, similar to the way a cookie is used for the purpose of maintaining session state in the above-described example.
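A browser signature of the kind described may be sketched as follows. The choice of SHA-256, the particular header fields combined, and the truncation to 16 hexadecimal characters are all assumptions made for the example; the specification names neither a hashing function nor a set of headers.

```python
import hashlib


def browser_signature(headers, remote_addr,
                      fields=("User-Agent", "Accept", "Accept-Language")):
    """Derive a short-term user identifier from headers and remote address."""
    parts = [remote_addr] + [headers.get(name, "") for name in fields]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]
```

Two requests carrying identical headers from the same address yield the same signature, which is why such an identifier is suitable only for short-term use and as a supplement to the other state holders.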
If the plugin can, by using the URL token, cookie or HTTP referrer determine that the HTTP request has come from a user that is currently being tracked, the plugin 58 will raise an event that stores, amongst other things, the session identifier being tracked, the page or other content the user is currently being served and the content the user has requested. If the plugin 58 cannot determine that the user who made the request is currently being tracked, the plugin 58 will raise a new user event by setting the first impression flag to an on state.
Examples of the session events tracked by the plugin 58 are indicated in Table 2 below.
Table 2
[Table 2 is reproduced as images (imgf000021_0001 and imgf000022_0001) in the original publication. Recoverable column headings: Example Event; Description.]
All impression event messages contain certain information relating to the navigation of a user, and may include the following:
(a) a session identifier (for example, A0D3B944);
(b) a previous click number (the identification number processed from the token or cookie, for example, 232991);
(c) flags;
(d) timestamp (the local server time); and/or
(e) requested URL (the requested page, including Common Gateway Interface (CGI) parameters for example, http://www.acme.com/doc.html?tce_sid=A0D3B944232991010191).
An impression event message is placed onto the end of a queue of event messages, maintained in the memory device 12. A queue is a linear list software data structure, where all insertions are made at one end of the list and all removals and access at the other. Therefore, a queue is a First-In-First-Out (FIFO) or First-Come-First-Served (FCFS) structure used for storing certain information; here it is used for sequentially and chronologically storing event messages.
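An impression event message and its FIFO queue may be illustrated with the following Python sketch. The field names mirror items (a) to (e) above; the `ImpressionEvent` class, the `FIRST_IMPRESSION` flag value and the helper functions are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

FIRST_IMPRESSION = 0x01  # hypothetical bit marking a flagged (new user) impression


@dataclass
class ImpressionEvent:
    session_id: str       # (a) session identifier
    previous_click: str   # (b) previous click number
    flags: int            # (c) flags
    timestamp: float      # (d) local server time
    requested_url: str    # (e) requested URL, including CGI parameters


event_queue = deque()


def enqueue(event):
    """Insert at the tail of the queue."""
    event_queue.append(event)


def dequeue():
    """Remove from the head of the queue, or return None if it is empty."""
    return event_queue.popleft() if event_queue else None
```

Insertions at one end and removals at the other preserve the chronological order in which the events were raised.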
The plugin 58 then passes the original request (in the case of the CD Store, http://www.foo.com) to the network information server engine 54. The network server engine 54 retrieves the content required to display the page with the URL http://www.foo.com from the storage media 16, usually a hard disk on the network server, and then streams that data back to the plugin 58.
As the plugin receives the stream of data from the information server engine 54, the plugin examines the data to identify markers, or tags, that are embedded in the content of the page being sent that would indicate some transaction (result event), such as a sale, has taken place. If no transaction has taken place, the plugin simply passes the streamed data to a filter that modifies navigational links that are embedded in the streamed data.
Various forms of content are retrieved by the network server 51 from the storage media 16 and forwarded back to the client 1. One form of data files that will be sent is machine readable, or binary form, such as images (for example, *.gif; *.jpg), audio (for example, *.wav; *.au; *.mid) and executable programs (for example, *.exe; *.class). A second form of data files that will be sent is text form, such as scripts (for example, *.pl; *.vbs) and markup (for example, *.html; *.sgml). Each of these files can be defined according to their major and minor MIME type. For example, *.html files have the MIME type of text/html. Those files that fall into the text/* MIME category will always generate an impression event.
Those files that specifically match the MIME type text/html will be filtered to determine if there are fields of information, for example modifiable HTML tags, that are embedded in those text files and which will impact on the plugin's ability to track the browsing characteristics of a user. The modifiable HTML tags may contain navigation links, which are used to access other data on the network. Tokens, in the form outlined above, are encoded within these navigation links in the modifiable HTML tags. An example is the HREF value in an anchor tag. The tag <a href=http://www.foo.com/help.html>Help Page</a> will be modified by the plugin's filter to be <a href=http://www.foo.com/help.html?A0D3B944232992010129>Help Page</a>.
It will be noted that the session identifier (A0D3B944) has remained the same because the plugin is still tracking a particular session. Typically the date (0101) will also remain the same. The click identifier has been incremented, and the checksum will now return a different value because the composition of the token has changed.
The filter module 52 then examines tags that are embedded in the content being streamed to determine if there are any tags that are concerned with navigation around the network. Examples of tags that are concerned with navigation include:
• HTML comments (<!-- ... -->) and <TITLE>...</TITLE> are ignored
• <FORM>...</FORM> (both GET and POST forms, as required)
• <SCRIPT SRC="..."> (apart from the SRC parameter, the contents of <SCRIPT>...</SCRIPT> are ignored)
• <META HTTP-EQUIV="REFRESH" CONTENT="...">
• <FRAME SRC="...">
• <LINK HREF="...">
• <A HREF="...">
• <AREA HREF="...">
Tags that are concerned with navigation are then dynamically altered by the plugin 58 to allow the plugin to link the next HTTP request that is received from the user, with the page that is currently being served to the user. This is done, for example, by appending a URL token to the end of each navigation tag. As stated above, the contents of these tokens uniquely identify each user, and also identify from which page the user has made a particular request. The values that make up the contents of the tokens are retrieved from configuration files 24 and counters maintained in memory 23.
In this example, the token A0D3B944232991010191 is encoded within the end of each URL or URI contained in the streamed content. For example, the URI http://www.foo.com/buy_it.html will be dynamically replaced with the URI http://www.foo.com/buy_it.html?tce_sid=A0D3B944232991010191.
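The replacement of a URI with its tokenised form may be sketched in Python using the standard `urllib.parse` module. The function names are hypothetical, and the regular expression handles only double-quoted HREF values in anchor tags; it is a simplification for illustration and not a general HTML filter.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_token(url, token, param="tce_sid"):
    """Append the token as a query parameter, replacing any older token."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Matches <a ... href="..."> with a double-quoted value (simplified).
HREF = re.compile(r'(<a\s[^>]*?href=")([^"]*)(")', re.IGNORECASE)


def rewrite_links(html, token):
    """Encode the token within every matched anchor link in the content."""
    return HREF.sub(
        lambda m: m.group(1) + add_token(m.group(2), token) + m.group(3), html)
```

Removing any existing `tce_sid` value before appending the new one prevents stale tokens from accumulating as a user moves from page to page.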
As the filter finishes processing the data being streamed through it, the data is then passed back across the Internet from the network server 51 to the client 1, where the client browser displays the requested page to the user.
In this example, the user may then decide to purchase an item displayed on the page just served, by clicking the words "Buy It". Those words may be linked, by HTML, or some other method, to the URI http://www.foo.com/buy_it.html?tce_sid=A0D3B944232991010191. The URI http://www.foo.com/buy_it.html?tce_sid=A0D3B944232991010191 is then transmitted as part of an HTTP request to the network server 51 so that the server will supply the requested content to the user, who is now a tracked user.
When the request arrives at the network server 51, in the same manner outlined above, the server passes that request through the plugin 58. From this request, the plugin 58 can determine the page, or URI, being requested (http://www.foo.com/buy_it.html) and can also determine the token that was applied during the filtering process, which took place as the document was streamed to the user.
The characters "?tce_sid=" signify that tce_sid is a name or common gateway interface (CGI) variable that is used by the plugin to refer to the value of the token that is received by the network server, in this case tce_sid represents the token value of A0D3B944232991010191.
In this example, the first decision made by the plugin 58 at step 4, upon receipt of the request http://www.foo.com/buy_it.html?tce_sid=A0D3B944232991010191, is that the value of the URL token is A0D3B944232991010191.
The second decision at step 5 involves the querying of a value stored in a cookie that is typically located on the user's computer. The query of that cookie should also return the result that tce_sid=A0D3B944232991010191.
The third decision at step 6 involves the querying of the referrer field that is contained in the hypertext transfer protocol (HTTP) headers. The referrer field would, in this instance, also contain the URL http://www.foo.com, which was the previous page sent back to the user. In this example, it will be assumed that this information does not contain a valid token, and is so disregarded.
From this process, the plugin 58 can determine information that is relevant for tracking users around a network, or for determining a user browsing profile. Information recorded includes the tracked user from which the request originated, the navigable content from which they are navigating, and the page to which they are navigating. As the plugin 58 is able to determine that this is a tracked user, the plugin then triggers an impression event message at step 10, and at step 11 stores the retrieved data in memory device 12 so that data can be accessed by other processes running on the network server 51.
The plugin 58 then passes the request to the network information server engine 54 at steps 13 and 14 to retrieve the content requested from storage media 16 in a manner outlined above. The information server then streams the data back through the plugin at steps 17 and 18.
However, the user in this example may have merely entered into the page where the user may make a purchase, and the transaction has not yet been completed. Accordingly, no tag will be embedded in the content being served to the user that would indicate that a transaction, or result event, has taken place and should be recorded.
As outlined above, navigational tags are dynamically altered at step 22 to append the next token, which typically includes the current session identifier and a new number that corresponds to the page that has been requested, or the click ID. In this example, the token A0D3B944232999010136 will be encoded within each navigation link embedded in the served document. Note that the Server Number (A0) has remained the same, the session ID (A0D3B944) has remained the same, the date has remained the same (0101) but the click ID (232999) and the checksum (36) have changed. The content is then streamed at step 25 from the network server 51, across the network 2, to the client 1.
The user may then decide that the displayed CD is the one they want to purchase and so choose to now complete the purchase. Practically, and in this example, this is achieved by clicking on a link that says "Purchase Now", or something similar, which is linked to the URI http://www.foo.com/purch.html?tce_sid=A0D3B944232999010136. When the request http://www.foo.com/purch.html?tce_sid=A0D3B944232999010136 is received by the plugin 58 at step 3 in the manner described above, it is able to determine at steps 4, 5 and 6 that this is a tracked user and is again able to store relevant information that is retrieved from the user's request at steps 10, 11 and 12.
This time, the plugin 58 is also able, at step 6, to obtain information from the referrer field in the HTTP header of the request. Because the referring page (the page containing the link the user just clicked) had a token in its navigation link, the referrer will contain the URI http://www.foo.com/buy_it.html?tce_sid=A0D3B944232999010136, which contains the valid token A0D3B944232999010136.
The plugin 58 then stores the various details relating to the user's request in an impression event message, which is stored in memory device 12, as described above. The content is requested from the network information server, which streams the requested data back into the plugin. As a sale has now taken place, tags are embedded in the content being served at step 8, thus triggering the plugin to create a result event message, and then at step 21 to place that event message in memory device 12 so that the message may be used by other processes running on the server. This event message signifies that a transaction has taken place, and the relevant details are matched to the appropriate session identifier, linking a navigation profile with a transaction or result event that has occurred on the site. However, the result might also be a non-financial transaction. For example, the network user could have subscribed to the CD Store's clearance sale mailing list, which is known as a lead.
Once the result event message has been placed in memory device 12 at step 21, the plugin 58 appends the new token to the navigation links that are embedded in the content at step 22 and then at step 25 sends the content across the network 2 to the client 1, as outlined above.
Information obtained during the above-described process is stored in memory device 12 in the form of impression event messages. These impression event messages are stored in a FIFO queue so that a separate process, which may be operating on the same server as the network server plugin 58 or on a different server, is capable of storing that information in an intelligible manner that will later allow easier tracking of a user's browsing behaviour.
An accurate representation of a user's true experience when visiting a website may be obtained by using clicktrees path analysis. The most distinguishing feature of clicktrees stems from the way that information is stored. As a user's browsing behaviour is stored in a tree index that best represents the way a user navigates around an internet site, clicktrees facilitate easier development of data mining and visualisation tools that can be used to identify the most common paths taken to achieve results. Clicktrees will also assist these analytical tools to identify areas of a network site which are either popular, or which are leading users away from results, or perhaps to rival network sites. Clicktrees can also be used to describe user interactions across network sites.
Linear click-streams, or traditional log files, fail to accurately record a user's activity, notably by being unable to:
(a) determine where and when the user used their browser's back button;
(b) track sessions across (and accurately represent) meta-refresh and HTTP redirects; and
(c) accurately reconstruct a user's activity into and out of frames.
However, clicktrees tree indexing facilitates an accurate representation of sessions in which:
(a) a user clicked on a banner, left the site, then clicked on the banner again; and
(b) a user opened more than one window to surf a site in the same sitting.
Clicktrees may be constructed from a sequence of events by using various pieces of information, which may include:
(a) a timestamp that identifies when each document was served by the web server, typically accurate to the second, and/or
(b) previous and current click IDs.
Table 3 below is a linear representation of the requests made by the user being tracked in session ID A0D3B944. This table is similar in form to the linear log files currently used on network sites to store usage information; however, it will be noted that the data captured includes both click and session identifiers.
Table 3
[Table 3 is reproduced as an image in the original publication. Its legend marks the first document request and the event that contains a result.]
The session events recorded in this table can be represented by the time-based sequence shown in Figure 5, where the events have been sorted in a linear sequence, based on the time stamp of each event. Existing browsing activity tracking systems record user activity in linear sequences of this type.
However, in the example shown in Figure 6, each event recorded in Table 3 has been arranged according to its previous click ID number. This clicktrees tree index structure quickly displays, and most accurately describes, the network user's browsing experience.
A click ID number is essentially a number that is allocated to each page or other separately navigable content that is served from the network server to a network information server. As outlined above, the click ID number is also part of the URL token that is encoded within each modifiable HTML tag, which is processed by the plugin filter. The steps the filter takes in assigning a click ID are essentially as follows:
(a) determine the last click ID allocated, in other words, the click ID of the last page served by the network server;
(b) increment that value by one; and
(c) apply the new value as this page's click ID.
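Steps (a) to (c) amount to a simple counter. The following is a minimal sketch only: the class name `ClickIdAllocator` is hypothetical, and the lock is an added assumption for the case where several requests are served concurrently, which the steps above do not address.

```python
import threading


class ClickIdAllocator:
    """Allocates click IDs by incrementing the last value handed out."""

    def __init__(self, last_click_id: int = 0) -> None:
        # (a) the click ID of the last page served
        self._last = last_click_id
        # Assumption: concurrent requests must not receive the same ID.
        self._lock = threading.Lock()

    def next_click_id(self) -> int:
        with self._lock:
            self._last += 1    # (b) increment that value by one
            return self._last  # (c) apply the new value as this page's click ID
```

Because each served page receives a strictly larger number than the one before it, sorting events by click ID also yields the order in which pages were served by that counter.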
Dynamically tracking a user's movements around a network site involves the logical construction and storage of clicktrees in the computer's random access memory (RAM). A separate clicktree is stored in RAM for each active connection session that is being tracked. The tracking of session identifiers has been described above.
The construction and storage of clicktrees is conducted by the traffic manager module 56, which may or may not be operating on the same computer as the network server 51 and plugin 58, obtaining an impression event message from the memory device 12. The traffic manager module 56 then examines the impression event message to determine the session identifier that is stored in the impression event message. Take, for example, session identifier A0D3B944, which was used in the CD example above. The session identifier A0D3B944 would be contained in the impression event message. The traffic manager module 56 then looks for session identifier A0D3B944 in the list of active sessions, and if it is not located as an active session, then the traffic manager module 56 starts a new session for that session identifier. The traffic manager module 56 then determines the previous and current click ID numbers so that it can insert the next entry in the appropriate place in the clicktree.
Again, using the following example, the logical relationship between each impression event (or node) in the tree index structure is clearly defined. It is also clear that the result event did not come from going through the page identified by click ID 22152. Instead, the user has pressed the 'back' navigation icon in the user's browser, and has then proceeded down a path through the page represented by click ID 22180. Notwithstanding the limits of modern technology, using this technique does not limit the number of links a node may have. The tree index structure may be stored in RAM by creating a separate entry for each node, and associated fields for identification of each node to which that node is interconnected.
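The insertion procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the traffic manager module's actual code: the names `Node` and `TrafficManager` are hypothetical, each node simply keeps a variable-length list of children, and an event whose previous click ID is unknown in the session is treated as a new root.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    click_id: int
    uri: str
    is_result: bool = False
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


class TrafficManager:
    def __init__(self) -> None:
        # Active sessions: session ID -> {click ID -> node}
        self.sessions: Dict[str, Dict[int, Node]] = {}

    def record(self, session_id: str, prev_click_id: int, click_id: int,
               uri: str, is_result: bool = False) -> Node:
        # Start a new session if this identifier is not yet active.
        nodes = self.sessions.setdefault(session_id, {})
        node = Node(click_id, uri, is_result)
        parent = nodes.get(prev_click_id)
        if parent is not None:  # attach under the page the user came from
            node.parent = parent
            parent.children.append(node)
        nodes[click_id] = node
        return node

    def path_to(self, session_id: str, click_id: int) -> List[int]:
        """Return the click IDs from the root down to the given node."""
        node: Optional[Node] = self.sessions[session_id][click_id]
        path: List[int] = []
        while node is not None:
            path.append(node.click_id)
            node = node.parent
        return path[::-1]
```

As a usage example with invented values (only click IDs 22152 and 22180 appear in the text; the other IDs and the URIs are made up): if pages 22152 and 22180 are both recorded with the same previous click ID and a result event is recorded with previous click ID 22180, then `path_to` for the result passes through 22180 and not 22152, which reflects the use of the back button.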
In another embodiment of the invention, an alternative storage and representation method may be used, which reduces memory requirements. This method also has the effect that traversal time is reduced due to the lack of a variable length link list at each node. The clicktrees tree index structure resulting from this method is shown in Figure 7. In the optimised structure shown in Figure 7, only a parent, a child and a sibling field are required to be created for each node in the tree index structure recorded in memory, rather than allowing for the possibility of an infinite number of fields corresponding to an infinite number of nodes interconnected to each node. The method used to store and access clicktrees in RAM may not be appropriate for some systems. Before storing a clicktree on such systems, the clicktrees may be converted back into a sequential list format, similar to that shown in Table 3.
Clicktrees can then be formed at a later stage by reconstructing the tree structure, based on previous and current click ID numbers.
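A fixed three-field node of the kind described above, together with the conversion to and from a sequential list, can be sketched as follows. The names `LcrsNode`, `build`, `children` and `flatten` are hypothetical, and using a previous click ID of 0 for a root in the flattened list is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(eq=False)
class LcrsNode:
    click_id: int
    parent: Optional["LcrsNode"] = None
    child: Optional["LcrsNode"] = None    # first child only
    sibling: Optional["LcrsNode"] = None  # next node with the same parent


def build(events: List[Tuple[int, int]]) -> Dict[int, LcrsNode]:
    """Build the tree from (previous click ID, current click ID) pairs."""
    nodes: Dict[int, LcrsNode] = {}
    for prev_id, click_id in events:
        node = LcrsNode(click_id)
        nodes[click_id] = node
        parent = nodes.get(prev_id)
        if parent is None:
            continue  # no known predecessor: treated as a root
        node.parent = parent
        if parent.child is None:
            parent.child = node
        else:  # append to the end of the parent's sibling chain
            last = parent.child
            while last.sibling is not None:
                last = last.sibling
            last.sibling = node
    return nodes


def children(node: LcrsNode) -> Iterator[LcrsNode]:
    """Walk a node's children through the child and sibling fields."""
    cur = node.child
    while cur is not None:
        yield cur
        cur = cur.sibling


def flatten(nodes: Dict[int, LcrsNode]) -> List[Tuple[int, int]]:
    """Convert the tree back to a sequential list ordered by click ID."""
    return [
        (n.parent.click_id if n.parent else 0, n.click_id)
        for n in sorted(nodes.values(), key=lambda n: n.click_id)
    ]
```

In this sketch, calling `build(flatten(nodes))` reproduces the same parent relationships, which is the round trip from sequential list to clicktree that the text describes.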
The following exemplary connection sessions are practical examples of what one should expect in the CD Store example made previously. For the sake of clarity, a sequential list of events is shown for each session in a table. From the table, each session is represented visually in two forms, namely as a linear clickstream based on the application of time-based sequencing, and as an optimised clicktree tree index structure based on the application of click sequence numbers. An additional example is offered for some sessions in the form of a logical clicktree.
The following scenario is based on a web server cluster consisting of two physically distinct machines, both of which are communicating with a single traffic manager module 56 located on one of these machines or remote from both machines.
A first exemplary connection session is shown in Table 4.
Table 4
[Table 4 is reproduced as an image in the original publication.]
In this example, a user enters the site through the homepage, visits the products page, then retrieves information about delivery payment zones. After reviewing the delivery payment zones, the user presses "back" and proceeds to review a single product. Happy with their selection, the user proceeds to buy the product and receives a receipt. Six impression events and one result event are generated.
A second exemplary connection session is shown in Table 5 and Figure 9.
Table 5
[Table 5 is reproduced as an image in the original publication. Recoverable column headings include two click ID columns and URI (www.foo.com).]
In this example, a user enters the site through the homepage, visits the products page, then retrieves information about delivery payment zones. After reviewing the delivery payment zones, the user presses "back" and proceeds to review a single product. Happy with their selection, the user proceeds to buy the product and receives a receipt. Curious about where their sensitive information will end up, the user returns to the homepage by pressing the back button four times, then reads the site's privacy policy. Seven impression events and one result event are generated.
A third exemplary connection session is illustrated in Table 6 and Figure 10.
Table 6
[Table 6 is reproduced as an image in the original publication.]
After clicking on an advertisement, the user enters the site through a jump page. The jump page presents the user with a form inviting them to subscribe to a special offers mailing list. The user fills in their details and, after submitting them, is presented with a thank you page. After half an hour of surfing, the user is presented with another advertisement. Again clicking on the advertisement, the user this time enters the site through the frames-based interface discussed previously. Five impression events (two marked as TRASH) and one non-financial result event are generated. Two impressions had a last click ID of zero, indicating that two clicktrees were started during the same session (in this case the user entered the site twice).
A common problem with the application of techniques designed to create session-state in an otherwise stateless world is session hopping. Session hopping occurs when a client "hops" between two or more session IDs, making what is essentially a single session appear to be two or more simultaneous sessions.
A session map is the process undertaken when session hopping has been detected (using the above-described connection session management techniques) and the traffic manager module 56 should make an adjustment. Based loosely on the session illustrated in Table 5 and Figure 9, this exemplary connection session assumes the server receives the events shown in Table 7 below.
Table 7
[Table 7 is reproduced as two images in the original publication.]
At the fifth impression we will assume that session hopping occurred because the link to http://secure.foo.com/buy.asp did not include a URL token (perhaps due to design error), the client failed to send an HTTP referrer header, and cookies were not transmitted due to the change in domain names.
After the move back to the www.foo.com domain at the seventh impression event, either of the following occurred:
(a) the URL requested in the seventh impression event contained the session identifier 01004ADC; or
(b) the HTTP request associated with the seventh impression event included an HTTP referrer header with the session identifier 01004ADC encoded inside.
Assuming that cookies are available and that the cookie contains the session identifier 01004AD8, a session map event is detected by the traffic manager module 56 and the data for both the 01004AD8 and 01004ADC sessions are merged. After the session map, the traffic manager module 56 treats both IDs as a single session and stores all future events arriving under either of the IDs in the same area.
Finally, it is to be understood that various modifications and/or additions may be made to the method of tracking browsing activity of the user in a computer network, and network server and software plugin for performance of that method, without departing from the ambit of the present invention as defined in the claims appended hereto.
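The session map step described above, in which the data for sessions 01004AD8 and 01004ADC are merged and later events under either ID are stored together, can be sketched as follows. The class `SessionStore` and its methods are hypothetical names, events are represented as plain strings for brevity, and the merged events are simply concatenated; a fuller implementation would presumably order them by timestamp or click ID.

```python
from typing import Dict, List


class SessionStore:
    def __init__(self) -> None:
        self._alias: Dict[str, str] = {}         # session ID -> session it was merged into
        self._events: Dict[str, List[str]] = {}  # canonical session ID -> events

    def _canonical(self, session_id: str) -> str:
        # Follow alias links until the surviving session ID is reached.
        while session_id in self._alias:
            session_id = self._alias[session_id]
        return session_id

    def add_event(self, session_id: str, event: str) -> None:
        self._events.setdefault(self._canonical(session_id), []).append(event)

    def session_map(self, keep_id: str, hop_id: str) -> None:
        """Merge the hopped session into the one being kept."""
        keep, hop = self._canonical(keep_id), self._canonical(hop_id)
        if keep == hop:
            return  # already a single session
        merged = self._events.setdefault(keep, [])
        merged.extend(self._events.pop(hop, []))
        self._alias[hop] = keep

    def events(self, session_id: str) -> List[str]:
        return list(self._events.get(self._canonical(session_id), []))
```

After `session_map("01004AD8", "01004ADC")`, an event added under either identifier lands in the same list, so a lookup by either ID returns the same merged session.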

Claims

1. A method of facilitating tracking browsing activity of a user in a computer network, the user having established a connection session with the computer network, the method including the steps of: receiving a user request for content located on a network server or server cluster; dynamically including a unique session identifier in content served to the user during the connection session; and examining multiple possible state holders within user content requests subsequently received by the network server or server cluster during the connection session to derive navigational data including the session identifier.
2. A method according to claim 1, wherein the session identifier includes a server identifier and a session-specific alphanumeric identifier.
3. A method according to either one of claims 1 or 2, wherein the possible state holders include any one or more of the following: a content identifier, a Cookie value, a referrer request header field, a persistent connection indicator, and a browser signature.
4. A method according to claim 3, wherein the content identifier is a uniform resource identifier.
5. A method according to any one of the preceding claims, and further including the step of: dynamically including in the content served to the user any one or more of a content locater, a content sequence identifier, a temporal stamp and a checksum.
6. A method according to claim 5, wherein the content locater, content sequence identifier, temporal stamp and/or checksum are appended to the session identifier.
7. A method according to claim 6, wherein the content locater is a uniform resource locater.
8. A method according to claim 6, wherein the content sequence identifier is a click, request or page sequence number.
9. A method according to claim 6, and further including the step of: using the temporal stamp to temporally locate a session event during which content was served to a user.
10. A method according to claim 6, and further including the step of: using the checksum to validate the session identifier.
11. A method according to any one of the preceding claims, wherein the connection session is a persistent session linking session events from multiple single sessions established by a same user.
12. A method according to any one of the preceding claims, and further including the step of: storing navigational data derived from content served to the user during each session event in a session event message.
13. A method according to claim 12, and further including the step of: transmitting the session event message to a traffic manager to link session events within a same connection session.
14. A method according to claim 13, wherein the traffic manager is located remotely from the network server or server cluster.
15. A method according to any one of claims 12 to 14, and further including the step of: transmitting the user request for content to an information server engine for retrieval of the requested content.
16. A method according to claim 15, and further including the step of: transmitting the requested content to a filter module to dynamically include at least part of the navigational data including the session identifier in the requested content prior to serving the requested content to the user.
17. A method according to claim 16, wherein at least part of the navigational data is obtained from counters or configuration files accessible from the filter module.
18. A method of facilitating tracking browsing activity of a user in a computer network, wherein the user establishes a connection session with the computer network, and a unique content identifier is assigned to each separate instance of navigable content within the network, the method including the steps of: receiving navigational data, including the previous and current content identifier, corresponding to content served to a user for each session event during the connection session; and deriving a tree index structure which logically interconnects nodes representing instances of content being served to the user for each session event during the connection session with nodes representative of content from which the user has navigated.
19. A method according to claim 18, wherein the tree index structure is stored in a memory device by creating a separate entry for each node and associated fields for each node interconnecting with that node.
20. A method according to claim 18, wherein the tree index structure is stored in a memory device, a separate entry being made for each node, and a parent, child and sibling link field being created for nodes that interconnect to that node.
21. A method according to claim 18, and further including the step of: merging connection sessions in which one or more session event messages have common session identifiers.
22. A method according to any one of claims 18 or 21, wherein the navigational data is derived by a method according to any one of claims 1 to 17.
23. A network server or server cluster including a processing unit and associated memory encoded with network server software for causing the network server to perform a method according to any one of the preceding claims.
24. A software plug-in forming part of the network server software of a network server or server cluster according to claim 23, wherein the software plug-in causes the network to perform a method according to any one of claims 1 to 22.
25. A software plug-in according to claim 24, comprising a filter module for dynamically including a unique session identifier in content served to the user during the connection session; and a session ID module for examining multiple possible state holders within user content requests subsequently received by the network server or server cluster during the connection session to derive navigational data including the session identifier.
26. A software plug-in according to claim 25, wherein the filter module assigns a unique content identifier to each separate instance of navigable content within the computer network, the plug-in further comprising: a traffic manager module for receiving navigational data, including the previous and current content identifier, corresponding to content served to a user for each session event during the connection session; and deriving a tree index structure which logically connects content served to the user for each session event during the connection session with content from which the user has navigated.
27. A software plug-in according to claim 26, wherein the traffic manager module is located remotely from the network server or server cluster.
PCT/AU2001/001612 2000-12-14 2001-12-14 A method of tracking browsing activity of a user in a computer network WO2002048899A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002221336A AU2002221336A1 (en) 2000-12-14 2001-12-14 A method of tracking browsing activity of a user in a computer network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPR2089 2000-12-14
AUPR2089A AUPR208900A0 (en) 2000-12-14 2000-12-14 A method of deriving information relating to the browsing of a user on a network

Publications (1)

Publication Number Publication Date
WO2002048899A1 true WO2002048899A1 (en) 2002-06-20

Family

ID=3826123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2001/001612 WO2002048899A1 (en) 2000-12-14 2001-12-14 A method of tracking browsing activity of a user in a computer network

Country Status (2)

Country Link
AU (2) AUPR208900A0 (en)
WO (1) WO2002048899A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5717860A (en) * 1995-09-20 1998-02-10 Infonautics Corporation Method and apparatus for tracking the navigation path of a user on the world wide web
US6112240A (en) * 1997-09-03 2000-08-29 International Business Machines Corporation Web site client information tracker
WO2000075827A1 (en) * 1999-06-04 2000-12-14 Websidestory, Inc. Internet website traffic flow analysis

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10332717A1 (en) * 2003-07-18 2005-02-03 Abb Research Ltd. User guidance method e.g. for web portal, involves web portal exhibiting, in hierarchical structure and being linked with one another and user information about popularity of all sides of web portal and subordinate branches indicated
WO2005050950A1 (en) * 2003-11-13 2005-06-02 Cryptek, Inc. System and method for traversing access control metadata across multiple network domains
US7561578B2 (en) 2003-11-13 2009-07-14 Cryptek, Inc. System and method for traversing metadata across multiple network domains at various layers of the protocol stack
US7990884B2 (en) 2003-11-13 2011-08-02 Api Cryptek Inc. System and method for traversing metadata across multiple network domains at various layers of the protocol stack
US8880710B2 (en) 2006-01-31 2014-11-04 Speed-Trap.Com Ltd. Website monitoring and cookie setting
US8898309B2 (en) 2006-01-31 2014-11-25 Speed-Trap.Com Ltd. Website monitoring and cookie setting
CN103415841B (en) * 2011-02-28 2016-06-15 甲骨文国际公司 Method and system mutual between monitor client system place user and back-end system
US8650284B2 (en) 2011-02-28 2014-02-11 Oracle International Corporation User activity monitoring
CN103415841A (en) * 2011-02-28 2013-11-27 甲骨文国际公司 Method and computer program to monitor and correlate user - initiated actions with backend operations
WO2012125270A1 (en) * 2011-02-28 2012-09-20 Oracle International Corporation Method and computer program to monitor and correlate user - initiated actions with backend operations
US20140082480A1 (en) * 2012-09-14 2014-03-20 International Business Machines Corporation Identification of sequential browsing operations
US10353984B2 (en) * 2012-09-14 2019-07-16 International Business Machines Corporation Identification of sequential browsing operations
US20190286687A1 (en) * 2012-09-14 2019-09-19 International Business Machines Corporation Identification of sequential browsing operations
US11030384B2 (en) 2012-09-14 2021-06-08 International Business Machines Corporation Identification of sequential browsing operations
CN102904903A (en) * 2012-11-02 2013-01-30 北京奇虎科技有限公司 Communication system and communication method
CN102904903B (en) * 2012-11-02 2015-09-02 北京奇虎科技有限公司 Communication system and communication means
US10432581B2 (en) 2015-10-14 2019-10-01 Smartpipe Technologies Ltd Network identification as a service

Also Published As

Publication number Publication date
AUPR208900A0 (en) 2001-01-11
AU2002221336A1 (en) 2002-06-24

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION UNDER RULE 69 EPC ( EPO FORM 1205A DATED 23/09/03 )

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP