WO2001009771A1

WO2001009771A1 - Targeted advertising system

Info

Publication number: WO2001009771A1
Application number: PCT/US2000/020999
Authority: WO
Inventors: Robert M. Giuli; Stanley George Fisher
Original assignee: General Dynamics Government Systems Corporation
Priority date: 1999-08-03
Filing date: 2000-08-02
Publication date: 2001-02-08
Also published as: AU6617700A; WO2001009771A9

Abstract

A client (106) accesses a content serving site (102) and information from this access is passively gathered by the content serving site (102). The information obtained from the client may be used to intelligently select content, such as advertisements, to include with the content accessed by the client. When a client (106) initially arrives, the client is assigned a unique anonymous identifier and an entry is created in a database (107). As the client (106) moves around the content serving site (102), the client's profile is updated in the database (107) based on the client's actions. The stored information can be analyzed and provided to Internet service providers (ISPs) to provide better service to their clients. Additionally, the information regarding client preferences can be used to alter the content of the information provided to the client (106) in real time using a database that stores context information for each page of the content serving site (102) and context information for each potential page from which the client (106) can be referred.

Description

TARGETED ADVERTISING SYSTEM

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of United States Provisional Patent Application No. 60/146,955 filed on August 3, 1999.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a system for providing users with targeted information, e.g., an advertisement, based on a profile of the user's preferences.

Description of the Related Art

The Internet excels at cheaply delivering information to a wide audience. Internet sites that compile and make the information available to users, often called "content" sites, are rapidly becoming an important element in national commerce. One way these sites generate income is by charging advertisers to display advertisements in, e.g., a banner, when their site is accessed. Users reading content at the site are exposed to the advertisements.

To maximize the efficiency of the individual ads, it is desirable to tailor the advertisements to the particular individual user. One conventional way to tailor advertisements to a particular user is to have the user initially fill out a questionnaire relating to the user's hobbies, demographic information, employment information, etc. This method of obtaining information about the user, however, is burdensome to the user and is limited to the questions in the questionnaire.

Thus, there is a need in the art to be able to obtain profile information specific to a user without having to explicitly question the user.

SUMMARY OF THE INVENTION

Systems and methods consistent with the present invention overcome the shortcomings of the prior art by passively obtaining profile information regarding individual users of a network.

Additional objects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.

In accordance with the purpose of the invention, as embodied and broadly described herein, systems and methods consistent with the invention target information to a user of a network having content information by passively gathering parameters from a user's request of content information, determining, from the parameters, a user profile, and providing targeted information to the user based on the user profile.

In accordance with another aspect of the invention, systems and methods consistent with the invention receive a user's request of content information from a first server, determine parameters from the user's request, and send the parameters to a second server for storing information regarding the user. Such systems and methods also provide a database, used in connection with content information on a network, including information regarding an address of the content information and information regarding a plurality of addresses containing references to the address of content information.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and together with the description, serve to explain the principles of the invention. In the drawings,

Fig. 1 is a high level diagram of an exemplary computing system network;

Fig. 2 is a more detailed diagram of an exemplary computer system associated with the present invention;

Fig. 3 is a high level diagram of processes run by a server in the exemplary computing system network;

Fig. 4 is a schematic diagram of an exemplary page context generation engine operating in accordance with the present invention; and

Fig. 5 is a schematic diagram of an exemplary intelligence vortex component operating in accordance with the present invention.

DETAILED DESCRIPTION

Reference will now be made in detail to the present disclosure of the invention, an example of which is illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.

A user interacts with a local device called a client. For simplicity, this user/client combination will be referred to as a user or a client herein. The client accesses a content serving site operating consistent with the present invention and information from this access is passively gathered by the content serving site. The information obtained from the client may be used to intelligently select content, such as advertisements, to include with the content accessed by the client. More particularly, when a client initially arrives at the content serving site, the server queries the client to determine if an anonymous identifier, such as a cookie or certificate, has been previously stored with the client. If so, the information in the anonymous identifier is used to identify a corresponding entry in a database. If not, the client is assigned a unique anonymous identifier and an entry is created in a database. Anonymity is a key concern of the present invention out of reasonable concern for privacy of users of the system. Privacy is ensured by only passively looking to information provided by the client with the request for information from the content serving site to determine the client's behavior. Also, if placement of the anonymous identifier is not permitted on the client side, that information is not provided to the system. Nevertheless, a less responsible party may choose to forego the safeguards provided by the system.
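The identifier handling described above can be illustrated with a minimal Java sketch. The class name `AnonymousIdRegistry`, the in-memory map standing in for the database, and the use of a random UUID as the identifier are all assumptions made for illustration; the text does not prescribe them. The sketch also assumes that an identifier the database does not recognize is treated like a missing one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical sketch: look up or assign an anonymous identifier. */
class AnonymousIdRegistry {
    // Stand-in for the profile database: identifier -> number of requests seen.
    private final Map<String, Integer> profiles = new HashMap<>();

    /**
     * @param storedId identifier previously stored on the client (e.g. a cookie
     *                 value), or empty if the client presented none
     * @return the identifier associated with this client from now on
     */
    String resolve(Optional<String> storedId) {
        // Reuse a known identifier; otherwise assign a fresh random one (assumption).
        String id = storedId.filter(profiles::containsKey)
                            .orElseGet(() -> UUID.randomUUID().toString());
        profiles.merge(id, 1, Integer::sum); // create the entry or update it
        return id;
    }

    int requestCount(String id) {
        return profiles.getOrDefault(id, 0);
    }

    public static void main(String[] args) {
        AnonymousIdRegistry registry = new AnonymousIdRegistry();
        String first = registry.resolve(Optional.empty());   // first visit
        String again = registry.resolve(Optional.of(first)); // returning visit
        System.out.println(first.equals(again));
        System.out.println(registry.requestCount(first));
    }
}
```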

As the client moves around the content serving site (e.g., as the client requests various web pages from the site or disconnects from the content serving site), the client's profile is updated in the database based on the client's actions.

In the subsequent explanation of the invention, a preferred embodiment related to a user browsing on the Internet will be described merely for convenience and by way of example. As will be readily apparent to a person having skill in the art, the present invention is not limited to browsing on the Internet. For example, the invention can be applied to any information presentation in wide spread area demographic situations.

Fig. 1 is a high level diagram of an exemplary computing system network on which the present invention may be implemented. The system includes a server 102 storing content information, such as web pages or downloadable files, and client computers 106 capable of accessing the content information on server 102 through network 104. Network 104 may be, for example, the Internet or a corporate Intranet. Server 102 may be any of a number of known computers, or network of computers, capable of delivering information to clients 106 over network 104. Similarly, clients 106 may be any of a number of known computers, or network of computers, capable of requesting information from server 102.

To retrieve content information, such as a web page, stored on server 102, the user of client 106 specifies a URL (uniform resource locator). The specified URL allows software running on client 106, e.g., browsing software such as Netscape Corporation's Navigator (TM) or Microsoft Corporation's Internet Explorer (TM), to initiate communication with server 102 and access the desired content (web page), which the client software interprets and provides to client 106. For example, the desired content can be displayed on a CRT display.

Fig. 2 is a more detailed diagram of a computer system 200, which may be client 106 or server 102. Computer system 200 includes a processor 202 and a memory 204 coupled to processor 202 through a bus 206. Processor 202 fetches computer instructions from memory 204 and executes those instructions. Processor 202 also (1) reads data from and writes data to memory 204, (2) sends data and control signals through bus 206 to one or more computer output devices 220, (3) receives data and control signals through bus 206 from one or more computer input devices 230 in accordance with the computer instructions, and (4) transmits and receives data through bus 206 and router 225 to network 104. Memory 204 can include any type of computer memory including, without limitation, random access memory (RAM), read-only memory (ROM), and storage devices that include storage media such as magnetic and/or optical disks. Memory 204 includes a computer process 210, such as a web browser or web server software. A computer process includes a collection of computer instructions and data that collectively define a task performed by computer system 200.

Computer output devices 220 can include any type of computer output device, such as a printer 224, a cathode ray tube (CRT) 222 (alternatively called a monitor or display), a light-emitting diode (LED) display, or a liquid crystal display (LCD). CRT display 222 preferably displays the graphical and textual information of the web browser. Each of computer output devices 220 receives from processor 202 control signals and data and, in response to such control signals, displays the received data.

User input devices 230 can include any type of user input device such as a keyboard 232, a keypad (not shown), a pointing device, such as an electronic mouse 134, a trackball (not shown), a lightpen (not shown), a touch-sensitive pad (not shown), a digitizing tablet (not shown), thumb wheels (not shown), or a joystick (not shown). Each of user input devices 230 generates signals in response to physical manipulation by a user and transmits those signals through bus 206 to processor 202.

The process begins with the user's first contact with a content serving site that employs a user profiler 110 consistent with the present invention. After this first contact, user profiler 110 receives information related to the content serving site accessed by client 106 and the client's request of information from the content serving site. With this information, user profiler 110 generates a profile of client 106, to which it intelligently matches targeted content information, such as banner ads, to also provide to client 106. The previously mentioned anonymous identifier is used to identify the client for future profiling.

User profiler 110 begins by gathering initial basic data from the client request, including client 106 and server 102 information, and information for the Internet Hyper Text Transfer Protocol (HTTP) searches and links. Such information can include date, gateway interface, HTTP image formats accepted, HTTP character sets accepted, HTTP encoding accepted, HTTP language accepted, the URL of the site accessed, the page which referred the user to the current page, the HTTP user agent, path information for executables on the server, a query string, the user's IP address, the user's host, the method used to access the desired page, the script used in accessing the page, and the site server's name and port.

To accomplish this, a process in server 102 hosting the content serving site traps the basic data associated with the client's actions in a manner that is transparent to the client. If an anonymous identifier has not been created, server 102 creates a unique, anonymous identifier for the client and attempts to store the anonymous identifier on the client side. To relieve the system from having to generate a unique number, the initial basic data can be used to generate the unique identifier. For example, if information concerning a user's Ethernet ID is provided, this is a guaranteed unique number until the year 2002. Alternatively, the system can combine data from a portion of the user's IP address with the current date and time to generate the unique number.
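The second alternative, combining part of the IP address with the current date and time, might be sketched as follows. Which portion of the address is used, the timestamp layout, and the class name `IdFromRequestData` are assumptions; the text leaves them open. Note that this simple combination is not guaranteed unique (two clients sharing the same trailing octets could arrive in the same second), so a real system would likely need a further tie-breaker.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch: derive an identifier from request data. */
class IdFromRequestData {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Combines the last two octets of a dotted IPv4 address with a timestamp. */
    static String generate(String ipv4, LocalDateTime when) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("expected a dotted IPv4 address: " + ipv4);
        }
        // Zero-pad each octet so the concatenation is unambiguous.
        String addressPart = String.format("%03d%03d",
                Integer.parseInt(octets[2]), Integer.parseInt(octets[3]));
        return addressPart + "-" + when.format(STAMP);
    }

    public static void main(String[] args) {
        // 192.0.2.17 is an address reserved for documentation.
        System.out.println(generate("192.0.2.17", LocalDateTime.now()));
    }
}
```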

The process operating on server 102 then creates a packet containing the unique identifier and the basic data and sends the packet to a server 107 operating as part of user profiler 110.

Server 107 preferably receives the packet as part of an HTTP URL on port 777 in a GET-Form format. Other formats can be used, based on appropriate engineering tradeoffs. For example, the GET-format is limited in length and does not require handshaking, and the POST method lengthens the time required to transfer information.

A sample GET-format of the URL is as follows:

http://www.gnosis.com:777?I=12345+D=2000:1231:2359+S=www.gte.net+K=travel+L=San Francisco, CA, USA+U=www.yahoo.com.

With this format, which can be expanded up to the maximum character limit for a GET-format URL (presently 2048 characters), I stands for the anonymous identifier of the client, e.g. I=12345, D stands for the date and time of transfer, e.g. D=2000:1231:2359, S stands for the source server 102, e.g. S=www.gte.net or an IP address, K stands for a keyword, e.g. K=travel, L stands for the location of the client gateway, e.g. L=San Francisco, CA, USA, and U stands for the linked URL, e.g. U=www.yahoo.com or an IP address. The U parameter is last in each aspect of the preferred embodiment.
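As a rough illustration of the packet format, the following Java sketch assembles such a URL from its fields, keeps the U parameter last, and enforces the 2048-character limit quoted above. The class and method names are hypothetical, and the identifier key is written as I on the assumption that it abbreviates "identifier". The sketch mirrors the sample literally and therefore does not apply standard query-string encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: build the GET-format URL sent to the profiling server. */
class ProfilePacketUrl {
    static final int MAX_LENGTH = 2048; // GET-format limit quoted in the text

    /** Joins the fields with '+' in insertion order and appends U last. */
    static String build(String hostAndPort, Map<String, String> fields, String linkedUrl) {
        StringJoiner query = new StringJoiner("+");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.add(field.getKey() + "=" + field.getValue());
        }
        query.add("U=" + linkedUrl); // the linked URL always comes last
        String url = "http://" + hostAndPort + "?" + query;
        if (url.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("packet exceeds " + MAX_LENGTH + " characters");
        }
        return url;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("I", "12345");
        fields.put("D", "2000:1231:2359");
        fields.put("S", "www.gte.net");
        fields.put("K", "travel");
        fields.put("L", "San Francisco, CA, USA");
        System.out.println(build("www.gnosis.com:777", fields, "www.yahoo.com"));
    }
}
```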

Once the packet is sent, server 102 operates without regard to success of the arrival of the packet at server 107. A precise transfer is not required in the preferred embodiment, because the additional data might not improve the profiling of the user parameters appreciably.

New packets can be generated and sent to update the user profile in real time as the user performs various browsing operations, e.g., a new search, a disconnect, and following a link embedded in the current page.

Server 107 includes a daemon 301, parse and store threads 302, a targeting intelligence engine (see Figs. 4-5), and a database 303, of which elements 301-303 are schematically illustrated in Fig. 3.

Daemon 301 is a program that continually monitors port 777. A sample Java program that demonstrates the concept follows.

    import java.io.*;
    import java.net.*;

    /**
     * This program camps on port 777 waiting for inbound udp xfers
     * Usage as above
     * 1. URL specs must always be last
     * 2. quotes not included in the udp call
     * 3. Total call size limited to 256 Characters
     **/
    public class echo777 {
        OutputStream out = null;

        public static void main(String args[]) {
            try {
                // port number, 777 or first parameter
                int port = 777;
                if (args.length == 1) port = Integer.parseInt(args[0]);
                ServerSocket ss = new ServerSocket(port); // open a (TCP) listening socket

                // wait in infinite loop for inbound xfers
                int transaction = 0;
                while (true) {
                    transaction++;
                    Socket client = ss.accept();
                    BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()));
                    PrintWriter echo_out = new PrintWriter(
                        new OutputStreamWriter(client.getOutputStream()));
                    echo_out.println("HTTP/1.0 200");
                    echo_out.println("Content-Type: text/plain");
                    echo_out.println();
                    echo_out.println();
                    System.out.print("transaction " + transaction + " ");
                    System.out.println("CONNECTION TO PORT 777 SUCCESSFUL");
                    echo_out.flush();
                    // echo the request back until the first empty line
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (line.length() == 0) break;
                        echo_out.println(line);
                    } // line

                    // close streams
                    echo_out.close();
                    in.close();
                    client.close();
                    System.out.println("end of echo message");
                    System.out.flush();
                    System.gc();
                    if (transaction > 2) break;
                } // while true
            } // try
            catch (Exception e) {
                System.err.println(e);
                System.err.println("usage violated - daemon777 <optional port>");
            } // catch
        } // main
    } // echo777

The primary function of server 107 is to respond as quickly as possible to the datagram in the URL and not to do anything with that datagram. A function of daemon 301 is to initiate timed parse and store threads 302. Each instance of a datagram passed to the daemon will spawn one of these threads.

A function of each parse and store thread 302 is to collect the information received by the daemon, parse the information to make sense of it, link up the parsed information for storage in the database 303, and terminate after completing its mission. If a parse and store thread cannot complete its mission in a predetermined time, the thread 302 is terminated automatically. Each thread 302 then passes the parsed information to database 303.
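A timed parse and store task of this kind could be sketched in Java as shown below. The use of `ExecutorService` and `Future.get` with a timeout, the in-memory map standing in for database 303, and all names are assumptions; the text says only that a thread which overruns a predetermined time is terminated. The parser relies on the U parameter being last, so that any `+` or `=` inside the linked URL is left intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: parse one packet and store it within a time limit. */
class ParseAndStore {
    // Stand-in for database 303: identifier -> parsed fields.
    static final Map<String, Map<String, String>> DATABASE = new ConcurrentHashMap<>();

    /** Splits "I=..+D=..+U=.." into fields; everything after "+U=" is the URL. */
    static Map<String, String> parse(String query) {
        Map<String, String> fields = new LinkedHashMap<>();
        int u = query.indexOf("+U=");
        String head = u >= 0 ? query.substring(0, u) : query;
        for (String pair : head.split("\\+")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        if (u >= 0) fields.put("U", query.substring(u + 3).trim());
        return fields;
    }

    /** Returns true if the task finished in time, false if it was cancelled. */
    static boolean parseAndStore(ExecutorService pool, String query, long timeoutMillis)
            throws Exception {
        Future<?> task = pool.submit(() -> {
            Map<String, String> fields = parse(query);
            String id = fields.get("I");
            if (id != null) DATABASE.put(id, fields);
        });
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            task.cancel(true); // interrupt the overrunning task
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            boolean done = parseAndStore(pool, "I=12345+K=travel+U=www.yahoo.com", 1000);
            System.out.println(done + " " + DATABASE.get("12345"));
        } finally {
            pool.shutdownNow();
        }
    }
}
```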

Database 303 stores the parsed information. Preferably, database 303 is an Oracle database, because it is a real-time, highly scalable, multi-access database with known functionality. Other databases could be used, e.g., ODI Persistent Object Store.

Once the information is stored in the database 303, it can be analyzed. The focus of the analysis can be to determine what content a client 106 visiting the content serving site on the server 102 is interested in. The stored information can be provided to Internet service providers (ISPs) to provide better service to their clients. A site owner could use the information to improve the content of the site. Additionally, the information regarding client preferences can be used to alter the content of the information provided to the client 106 in real time.

If the information is provided to the site owner or ISP, the information does not require a real-time response, so the process can be performed off-line or in a background mode reniced to a low priority. The process does, however, require the intelligence of a methodology to determine categories of client interest from the information stored in database 303. For example, based on the URL references, or reverse URL references, keywords can be determined that can be correlated to categories of client interest.

To examine a URL, a sample Java program follows:

    import java.io.*;
    import java.net.*;

    /**
     * This simple program uses the URL class and its openStream() method to
     * download the contents of a URL and copy them to a file or to the console.
     **/
    public class geturl {
        public static void main(String[] args) {
            InputStream in = null;
            OutputStream out = null;
            try {
                // check the arguments
                if ((args.length != 1) && (args.length != 2))
                    throw new IllegalArgumentException("Wrong number of Arguments");

                // Setup the Streams, in & out
                URL url = new URL(args[0]);
                in = url.openStream();
                if (args.length == 2) out = new FileOutputStream(args[1]);
                else out = System.out;

                // Now copy bytes from the URL to the output stream
                byte[] buffer = new byte[4096];
                int bytes_read;
                while ((bytes_read = in.read(buffer)) != -1)
                    out.write(buffer, 0, bytes_read);
            } // try
            // On exceptions, print error message and usage indicator
            catch (Exception e) {
                System.err.println(e);
                System.err.println("Usage: Java GetURL <URL> [<filename>]");
            } // catch
            finally { // always close the streams, no matter what!
                try { in.close(); out.close(); } catch (Exception e) {}
            } // finally
        } // main
    } // GetURL

This program obtains the URL information of the stored URL address, which can then be scanned to find metatags placed for search engines as well as the text of the body of the URL. This can then be indexed by both the anonymous identifier and the server 102, and resolved in the database with any keyword parameters already stored for that anonymous identifier and server.
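The metatag scan mentioned above might look like the following Java sketch, which pulls the comma-separated terms out of a `keywords` meta tag. The regular expression is a simplification that assumes the `name` attribute precedes `content`; a production scanner would more likely use a real HTML parser. The class name `MetaKeywords` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extract search-engine keywords from downloaded HTML. */
class MetaKeywords {
    // Matches <meta name="keywords" content="..."> with attributes in that order.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*[\"']keywords[\"']\\s+content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the keywords, trimmed and lower-cased, in document order. */
    static List<String> extract(String html) {
        List<String> keywords = new ArrayList<>();
        Matcher matcher = META.matcher(html);
        while (matcher.find()) {
            for (String keyword : matcher.group(1).split(",")) {
                String trimmed = keyword.trim().toLowerCase();
                if (!trimmed.isEmpty()) keywords.add(trimmed);
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta name=\"keywords\" content=\"travel, hotels\">"
                + "</head><body>Book a trip</body></html>";
        System.out.println(extract(html));
    }
}
```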

To provide the targeting information to the site owner, the database is searched with respect to the content of server 102 and the keywords found in the URLs. The results are used to determine appropriate categories of interest, and those categories are relayed to the site owner. The reverse site URLs, which make reference to the site, can also be provided to give the site owner a list of potential advertisers. The information can be vector matched in a similar manner to be described subsequently with respect to the provision of real-time target information. Therefore, a server 102 employing the process can improve its cost per thousand impressions (CPM), and improve its percent usage of real estate.

Vectored analysis of the client information is performed by the targeting intelligence engine. Figs. 4 and 5 illustrate the main components of the targeting intelligence engine, which determines psycho-graphic personality preferences using the data gathered and stored in database 303. The targeting intelligence engine is designed to give confidence intervals about a user's sex, age, and geographical location, for example.

In preparation for the transmission of targeted content from, e.g., Intelligent Information Systems' Netgravity Advertisement server to client 106, context information must be generated for each page of the content serving site. The generated context information is stored in a database 404, for example, a relational database management system, such as Oracle. Additionally, context information for each potential page from which the visitor can be referred is generated and stored in database 404.

As shown in Fig. 4, a page context generation engine 400 runs a context application, e.g. an Oracle ConText application, against all of the pages of the content serving sites that subscribe to the system to generate the context for each of those pages.

Oracle ConText is preferred because of its ability to determine the gist or theme of a document on the web using standard SQL commands. Nevertheless, any software that can read a document and determine its context would be acceptable. Context generation is also performed against all pages on the Internet that have links to the pages of the subscribing content serving sites. In Fig. 4, a generic server having a content serving site is indicated at 402. Generated context information for all pages is stored in database 404 and is indexed by page URL.

Page context generation engine 400 is fully automated, and only requires the URLs of the pages on which the targeted content will be served. The URLs can be entered by an operator using a Java applet from a web browser interface. When a URL is entered, the web page is extracted and run through the HTML filter, which removes HTML tags and non-essential data. The resulting page content is passed to context application 401 which generates and stores the page context vectors, indexed by URL. In addition to context vectors for each page with which targeted content will be provided, all potential referring pages are extracted by a URL robot (a.k.a. a spider) to locate URLs which contain links to the host page. This can be performed by, e.g., database maintenance robot 405. In a similar manner as the host page, each referring page is filtered and passed to context application 401, which generates and stores the page context vectors as a database record, indexed by URL.
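The filtering and context generation steps can be approximated with a short Java sketch. Here a simple term-frequency map stands in for the context vector; this is an assumption made for illustration and is not how Oracle ConText derives themes. The regular expressions and the class name `PageContextSketch` are likewise hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: HTML filter followed by a term-count "context vector". */
class PageContextSketch {
    /** Removes script/style blocks and all tags, then collapses whitespace. */
    static String stripHtml(String html) {
        return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** Counts words longer than two characters; a stand-in for a context vector. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split("[^a-z0-9]+")) {
            if (word.length() > 2) counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String html = "<html><style>p{color:red}</style>"
                + "<p>Cheap <b>flights</b> and cheap hotels</p></html>";
        String text = stripHtml(html);
        System.out.println(text);
        System.out.println(termCounts(text));
    }
}
```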

After the context vectors for each page are determined, a page gender calculation engine 403 develops a gender/value pair from the resulting context for each page. The gender/value is stored in database 404, indexed by URL.

Figure 5 illustrates a preferred embodiment of an intelligence vortex component 500. When targeted content is requested of a content server 501, intelligence vortex component 500, in real time, retrieves the stored context information from database 404 for both the page from the content serving site and the referring page. Intelligence vortex component 500 includes a group of tasks used to determine the most appropriate targeted content for a particular client 106 using information developed from database 303 in user profiler 110.

A visitor gender analysis engine 504 extracts gender/value pairs stored in database 404 associated with client 106. An algorithm is applied using the weighted value of each page gender, resulting in a more accurate determination of the visitor gender than could be arrived at using a single gender datapoint. For example, the past, present, and to be accessed pages could be used to calculate the gender. If client 106 is one who has been previously assigned an anonymous identifier, all URLs associated with that client 106 can be used to increase the accuracy of visitor gender analysis engine 504. If the client's gender is already known and stored in database 303, then this process can be avoided.
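The text does not disclose the weighting algorithm. One simple possibility, shown purely as an assumption, is to treat each gender/value pair as a label with a non-negative weight and report the weighted share of each label across all pages associated with the client. The class and field names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: combine per-page gender/value pairs into one estimate. */
class VisitorGenderSketch {
    /** One page's gender/value pair: a label and a non-negative weight. */
    static final class GenderValue {
        final String gender;
        final double value;

        GenderValue(String gender, double value) {
            this.gender = gender;
            this.value = value;
        }
    }

    /** Returns the weighted share (0..1) of pairs carrying the given label. */
    static double weightedShare(List<GenderValue> pairs, String gender) {
        double total = 0;
        double matching = 0;
        for (GenderValue pair : pairs) {
            total += pair.value;
            if (pair.gender.equals(gender)) matching += pair.value;
        }
        return total == 0 ? 0.0 : matching / total; // 0.0 when there is no evidence
    }

    public static void main(String[] args) {
        List<GenderValue> pages = new ArrayList<>();
        pages.add(new GenderValue("female", 0.8)); // past page
        pages.add(new GenderValue("male", 0.2));   // present page
        pages.add(new GenderValue("female", 0.5)); // page about to be accessed
        System.out.println(weightedShare(pages, "female"));
    }
}
```

Because several pages contribute, a single atypical page shifts the estimate only in proportion to its weight, which is the benefit the paragraph above attributes to using more than one datapoint.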

A visitor location component 505 uses client location information from database 303 or estimates a likely location from the client's IP address provided from server 107 and information from a local WhoIs(TM) command server (GTE, the assignee of the present invention, provides such a server).

A page context analysis engine 503 extracts the context data from database 404 of the pages associated with client 106. An algorithm is applied that determines content that the client 106 is likely to find preferable. For example, the context of past, present, and to be accessed pages could be used to calculate the client preference. If client 106 is one that has been previously assigned an anonymous identifier, all URLs associated with that client 106 can be used to increase the accuracy of page context analysis engine 503.
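The matching algorithm itself is not specified. As one plausible reading of "vector matched", the following Java sketch scores candidate content against a client's context vector using cosine similarity and picks the highest-scoring candidate. The choice of cosine similarity, the sparse map representation, and all names are assumptions.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: pick the content whose context best matches a client. */
class ContextMatchSketch {
    /** Cosine similarity of two sparse term-weight vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the key of the best-scoring candidate, or null if there are none. */
    static String bestMatch(Map<String, Double> profile,
                            Map<String, Map<String, Double>> candidates) {
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Double>> c : candidates.entrySet()) {
            double score = cosine(profile, c.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = c.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> profile = new HashMap<>();
        profile.put("travel", 1.0);
        profile.put("hotels", 1.0);

        Map<String, Double> airlineAd = new HashMap<>();
        airlineAd.put("travel", 1.0);
        airlineAd.put("flights", 1.0);
        Map<String, Double> carAd = new HashMap<>();
        carAd.put("cars", 1.0);

        Map<String, Map<String, Double>> ads = new LinkedHashMap<>();
        ads.put("car-ad", carAd);
        ads.put("airline-ad", airlineAd);
        System.out.println(bestMatch(profile, ads));
    }
}
```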

A target content dimensions determination engine 502 uses the output of visitor gender analysis engine 504, page context analysis engine 503, and visitor location component 505 to develop target preferences in the context of the known targeted content available at target content server 501. Similarly, an age analysis engine could also be provided to determine target preferences. Then, the targeted content is provided to client 106 in real time.

A database maintenance robot 405 (see Fig. 4) runs continually in the background at a low priority to ensure that database 404 does not contain cluttering, invalid records; contains all valid pages; and does not contain improper context data. Database maintenance robot 405 ensures that all records in database 404 are valid by successively accessing the database by, e.g., walking the database, extracting a URL, and requesting the page from the Internet using the URL. In the event that the page no longer exists (Error 404), the record containing the URL is deleted from the database.
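The record-validation pass can be sketched in Java as below. To keep the example self-contained and free of network access, the page request is abstracted as a function from URL to HTTP status code; in practice it could be backed by `java.net.HttpURLConnection.getResponseCode()`. The map standing in for database 404 and the names used are assumptions.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

/** Hypothetical sketch: delete records whose pages no longer exist. */
class MaintenanceSketch {
    /**
     * Walks the records (URL -> context data), asks statusOf for each URL's
     * HTTP status, and removes entries that return 404.
     *
     * @return the number of records removed
     */
    static int purgeMissing(Map<String, String> records, ToIntFunction<String> statusOf) {
        int removed = 0;
        Iterator<String> urls = records.keySet().iterator();
        while (urls.hasNext()) {
            if (statusOf.applyAsInt(urls.next()) == 404) {
                urls.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, String> records = new TreeMap<>();
        records.put("http://example.com/live", "context-a");
        records.put("http://example.com/gone", "context-b");
        // Stubbed status lookup: pretend that URLs ending in /gone return 404.
        int removed = purgeMissing(records, url -> url.endsWith("/gone") ? 404 : 200);
        System.out.println(removed + " removed, remaining: " + records.keySet());
    }
}
```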

To ensure that all valid pages are in database 404, database maintenance robot 405 continually spiders the web, seeking new URLs which contain links to the banner host pages. If such a URL is found, it submits the web page for inclusion in the database 404, after performing the above filtering and context generation as done for the other referring pages.

To ensure that the context of each valid page is appropriate and current, database maintenance robot 405 periodically updates the context data for each linking web page and banner host page in database 404.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims

WHAT IS CLAIMED:

1. A method of providing targeted information to a user of a network having content information, comprising: passively gathering parameters from a user's request of content information; determining, from the parameters, a user profile; and providing targeted information to the user based on the user profile.

2. A method according to claim 1, wherein the gathered parameters include time parameters, user-side parameters, and network-side parameters.

3. A method according to claim 1, further comprising: storing the gathered parameters in a database, wherein the determining step includes using a history of gathered parameters to determine a user profile.

4. A method according to claim 1, wherein the determining step includes determining a user's gender profile.

5. A method according to claim 1, wherein the determining step includes determining a user's location profile.

6. A method according to claim 1, wherein the determining step includes determining a user's preferences based on the content information.

7. A method of gathering user data, comprising: receiving a user's request of content information from a first server; determining parameters from the user's request; and sending the parameters to a second server for storing information regarding the user.

8. A method according to claim 7, further comprising: assigning an identifier for the user and causing the identifier to be stored on the user's system.

9. A method according to claim 8, wherein the assigning step includes sending the identifier from the first server to the second server.

10. A method according to claim 8, further comprising: storing the parameters in a database indexed by the identifier.

11. A method according to claim 7, wherein the method operates in association with the Internet, and wherein the sending step includes transmitting the parameters in GET-format.

12. A database used in connection with content information on a network, comprising: information regarding an address of the content information; and information regarding a plurality of addresses containing references to the address of content information.

13. A database according to claim 12, wherein the information regarding an address of the content information includes a context of the content information.

14. A database according to claim 12, wherein the information regarding the plurality of addresses includes a context of information associated with the plurality of addresses.

15. A database according to claim 12, wherein the information regarding an address of the content information includes the gender value pairs.

16. A database according to claim 15, wherein information regarding the plurality of addresses includes gender value pairs associated with the plurality of addresses.

17. A method of providing targeted information to a user of a network having content information using a database comprising: passively gathering parameters from a user's request of content information; determining, from the parameters and information stored in the database, a user profile; and providing targeted information to the user based on the user profile.

18. An apparatus which provides targeted information to a user of a network having content information, comprising: means for passively gathering parameters from a user's request of content information; means for determining, from the parameters, a user profile; and means for providing targeted information to the user based on the user profile.

19. An apparatus according to claim 18, wherein the gathered parameters include time parameters, user-side parameters, and network-side parameters.

20. An apparatus according to claim 18, further comprising: a database which stores the gathered parameters, wherein the user profile is determined using a history of gathered parameters.

21. An apparatus according to claim 18, wherein the means for determining determines a user's gender profile.

22. An apparatus according to claim 18, wherein the means for determining determines a user's location profile.

23. An apparatus according to claim 18, wherein the means for determining determines a user's preferences based on the content information.

24. An apparatus according to claim 18, wherein the network is the Internet.

25. An apparatus according to claim 18, wherein the network is an Intranet.

26. An apparatus which gathers user data, comprising: a first server for receiving a user's request of content information; means for determining parameters from the user's request; and means for sending the parameters to a second server for storing information regarding the user.

27. An apparatus according to claim 26, further comprising: means for assigning an identifier for the user and causing the identifier to be stored on the user's system.

28. An apparatus according to claim 27, wherein the means for assigning includes sending the identifier from the first server to the second server.

29. An apparatus according to claim 27, further comprising: storing the parameters in a database indexed by the identifier.

30. An apparatus according to claim 26, wherein the method operates in association with the Internet, sending includes transmission of the parameters in GET-format.