WO2001046896A1 - Automatic marketing process

Info

Publication number: WO2001046896A1
Application number: PCT/US1999/030793
Authority: WO
Inventors: Lounette M. Dyer; Jonathan G. Dickinson; Jean-Marc Langlois; Gordon P. Rios
Original assignee: Cogit Corporation
Priority date: 1999-12-20
Filing date: 1999-12-20
Publication date: 2001-06-28
Also published as: AU2384400A

Abstract

A method and system which automatically generate a list of target consumers, each assigned the best marketing treatment for use in a marketing campaign. Model-building unit (102) receives inputs from consumer database (120) and schema (114) and uses these inputs to generate model (103). Scoring unit (104) receives inputs from model (103), prospect database (122) and schema (115) and uses these inputs to generate scored prospect list (105). Target list generator (106) receives inputs from scored prospect list (105) and schema (116) and uses these inputs to generate target list (124). Target list (124) is input into contact initiator (108), which is responsible for contacting the selected prospective customers. Response collector (110) gathers responses (126) to contacts made by contact initiator (108). Responses (126), along with schema (118), are input into model-refining unit (112), which refines model (103) based on consumers' propensity to respond or their propensity to respond to particular treatments.

Description

AUTOMATIC MARKETING PROCESS

BACKGROUND

Field of the Invention

The present invention relates to computer-based systems for analyzing marketing data. More specifically, the present invention relates to a method and an apparatus for automatically targeting particular groups of consumers for a marketing campaign, based upon response data from consumers and/or historical consumer behavioral data.

Related Art

Marketing campaigns can potentially reach millions of consumers offering goods and services as diverse as credit cards and magazine subscriptions through communication channels such as mass mailings and phone bank solicitations. Marketing campaigns are typically directed to particular segments of the population defined by demographic, geographic or behavioral characteristics that have exhibited a propensity to respond to particular marketing messages. In this way, marketing resources can be directed toward consumers who are likely to respond favorably to solicitations.

Current approaches to the problem of marketing automation only automate parts of the modeling process. The entire marketing process has many additional manual tasks: analyze and prepare the data for modeling, create a balanced sample of the data, study the modeling results, execute an automatic scoring module, and then manually develop a system for selecting the appropriate prospects for the marketing campaign. Additionally, for marketing campaigns with multiple treatments, a method must be created manually for selecting which treatment each target prospect will receive. Not only is this process labor intensive, but it is very error-prone and time consuming. In addition, the results generally must be analyzed manually. Because of the time required for the entire process, the marketer generally has to go to market with a new campaign without the benefit of the results of the previous campaign. The above-described method for developing a marketing campaign has a number of drawbacks. First, it is time consuming. A statistician typically requires several weeks to develop a model and use the model to create a marketing campaign. Second, since the statistician manually selects particular attributes to examine, important predictive relationships in the test marketing data might be inadvertently overlooked. This can result in a marketing campaign that performs poorly. Additionally, even if the model is constructed properly, the process of selecting prospective clients to contact using the model can be a time consuming manual process.

What is needed is a system that automatically finds predictive relationships between consumer attributes and consumer responses, and that uses these relationships to automatically produce a marketing campaign directed to consumers who are likely to respond.

SUMMARY

One embodiment of the present invention provides a method and apparatus that automatically finds predictive relationships in marketing data between consumer attributes and consumer responses, and that uses these relationships to produce a marketing campaign directed to consumers who are likely to respond. The system operates by constructing a model for consumer responses by dividing a database containing consumer records into segments containing one or more consumer records based upon attributes in the records. This segmentation is performed with a goal of optimizing the predictive power of the model by identifying the "best" splits with which to build an induction tree. Splitting can be based on more than one attribute, including a treatment (type of solicitation) attribute. Next, the system accesses a second database containing records for prospective consumers. The system uses the consumer response model to select a group of target consumers from the second database based upon their propensity to respond, or their propensity to respond to particular treatments. The system next assigns particular treatments to particular target consumers so as to substantially maximize the response rate among the group of target consumers. A subsequent marketing campaign applies the specified treatments to target consumers. The results of this marketing campaign can be used to adjust the response model for subsequent marketing campaigns. Thus, the above-described embodiment of the present invention provides a number of advantages. First, it allows a marketing manager to create a marketing campaign automatically without the services of a trained statistician, or it can greatly enhance the productivity of trained analysts supporting marketing campaigns. Second, since the present invention can exhaustively search for predictive relationships between consumer attributes and response rates, it will not miss any important relationships. 
Furthermore, since the present invention can automate the process of generating marketing campaigns, it can create a marketing campaign within hours, instead of weeks as is presently required using existing systems.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates a system for creating a marketing campaign in accordance with an embodiment of the present invention.

FIG. 2 is a flow chart illustrating the process of generating a segmentation of consumer records in accordance with an embodiment of the present invention.

FIG. 3 is a flow chart illustrating the process of generating a scored prospect list of prospective consumers for the marketing campaign in accordance with an embodiment of the present invention.

FIG. 4 is a flow chart illustrating the process of generating a target list in accordance with an embodiment of the present invention.

FIG. 5 illustrates a tree structure that is used to generate a segmentation in accordance with an embodiment of the present invention.

FIG. 6 illustrates a cell matrix relating treatments to segments in accordance with an embodiment of the present invention.

FIG. 7 is a flow chart illustrating the process of using response data from a marketing campaign to update a response model in accordance with an embodiment of the present invention.

FIG. 8 illustrates how segmentation is performed with multiple treatments in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein. The data structures and code described in this detailed description are typically stored on a computer readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs) and DVDs (digital video discs), and computer instruction signals embodied in a carrier wave. For example, the carrier wave may originate from a communications network, such as the internet.

One embodiment of the present invention uses a process known as "predictive modeling" in the creation of a marketing campaign. In this embodiment, a model is created from a database containing consumer demographic and behavior data, as well as response information. This model is used to predict which of a group of prospective consumers are likely to respond to a marketing campaign, and therefore which consumers to contact.

FIG. 1 illustrates an embodiment of the present invention as described above in a system for creating a marketing campaign. The functional modules shown in FIG. 1 are model-building unit 102, scoring unit 104, target list generator 106, contact initiator 108, response collector 110 and model-refining unit 112. Model-building unit 102 receives inputs from consumer database 120 and schema 114, and uses these inputs to generate model 103. Scoring unit 104 receives inputs from model 103, prospect database 122 and schema 115, and uses these inputs to generate scored prospect list 105. Target list generator 106 receives inputs from scored prospect list 105 and schema 116, and uses these inputs to generate target list 124. Target list 124 is input into contact initiator 108, which is responsible for contacting the selected prospective customers. Response collector 110 gathers responses 126 to contacts made by contact initiator 108. Responses 126, along with schema 118, are input into model-refining unit 112, which refines model 103. The system of FIG. 1 will now be described in greater detail, starting with consumer database 120.

Consumer database 120 includes records containing information pertaining to consumers. For example, record 130 includes information for a particular consumer, including predictive variable 132. In this embodiment, predictive variable 132 is a boolean variable indicating whether or not there was a response. In the absence of a known value for predictive variable 132, predictive variable 132 can be a dependent predictive variable that the system attempts to predict using model 103. When the value of predictive variable 132 for all records is known, consumer database 120 is referred to as a "training database". Record 130 can also include a treatment field 134 indicating which marketing treatment the particular consumer received, where "marketing treatment" refers to any of a variety of types of solicitation. A marketing treatment can be related to a number of factors, such as message, product, price and type of channel. For example, in a mailing campaign, a company might test several different types of letters (treatments) on different consumers. Record 130 additionally includes descriptive attributes 136, which contain information about the consumer. This information may include demographic data, such as the age and income of the consumer; locational data; and behavioral data, indicating, for example, the types of magazines the consumer subscribes to or categories of products they buy from catalogs. Note that locational data might refer to geographic locations as well as addresses on computer networks such as the internet (for example, domain names).

Schema 114 contains metadata that describes the structure of consumer database 120. Schema 114 may additionally define transformations of attributes 136. For example, attributes corresponding to a "checking account balance" and a "savings account balance" can be added to form a new attribute "total account balance," if the "total account balance" has an important business meaning. Schema 114 also includes specific information on how to operate model-building unit 102 so as to produce model 103.
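By way of a non-limiting illustration, an attribute transformation of the kind defined by schema 114 can be sketched in Python as follows. The function name apply_transformations, the dictionary layout, and the attribute keys are assumptions made for this sketch and do not appear in the specification.

```python
def apply_transformations(record, transformations):
    """Return a copy of the record with each derived attribute added."""
    out = dict(record)
    for name, fn in transformations.items():
        out[name] = fn(record)
    return out


# Hypothetical schema fragment: one derived attribute built from two raw ones.
transformations = {
    "total_account_balance": lambda r: (
        r["checking_account_balance"] + r["savings_account_balance"]
    ),
}

record = {"checking_account_balance": 1200.0, "savings_account_balance": 3400.0}
print(apply_transformations(record, transformations))
```

Keeping the transformation in the schema rather than in the database leaves the raw consumer records untouched while still exposing the derived attribute to the model-building step.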

Model-building unit 102 receives inputs from consumer database 120 and schema 114, and uses these inputs to generate model 103. Model 103 may assume a number of different forms, including baseline, response, and refined. A "baseline model" is an initial model that is constructed without using response data from prior campaigns. In a baseline model, records 130 are proxy records. For example, people who already have bought the product may be used as a proxy for people who would respond to a marketing campaign. A "response model" is a model constructed using responses gathered from consumers during a first marketing campaign. A response model can incorporate response data for a number of different treatments. A "refined model" is a model created after at least one marketing campaign. In a refined model, parameters for the previous model (which is either a response model or a refined model) are adjusted according to the responses to the prior campaign.

Model-building unit 102 generates model 103 by classifying records in database 120 based on the values of attributes contained in the records. Any of a variety of classification schemes can be used, such as, but not limited to, schemes based on trees, tables, or other data structures.

In one embodiment, model-building unit 102 uses a segmentation process to generate model 103. In this embodiment, records are classified according to a binary tree structure so as to separate database 120 into a number of non-overlapping segments, as illustrated in FIG. 5.

In FIG. 5, each node in the tree, for example, node 508, corresponds to a non-overlapping segment. In a binary induction tree each non-leaf node, for example, root node 502, has two child nodes. A condition is associated with each non-leaf node, such as "age > 40." If the condition is true for a given record, then the given record is assigned to the respective child node. If it is false, the record is assigned to the other child node. In FIG. 5, records for consumers with "age > 40" are in the right subtree (node 506), and other records are in the left subtree (node 504). Within node 504, a further split is made on the basis of the attribute "account balance > $16K." Records for consumers with "account balance > $16K" are in the right subtree of node 504 (node 510), and other records are in the left subtree of node 504 (node 508). Within node 506, the next split is made on the basis of the attribute "income > $60K." Records for consumers with "income > $60K" are in the right subtree of node 506 (node 514), and other records are in the left subtree of node 506 (node 512).

Hence, each segment can be defined by a series of boolean operations based on one or more of the descriptive attributes. For instance, the tree in FIG. 5 illustrates nodes 512 and 514, corresponding to segments defined as "age > 40" combined with either "income <= $60K" (node 512) or "income > $60K" (node 514); and nodes 508 and 510, corresponding to segments defined as "age <= 40" combined with either "account balance <= $16K" (node 508) or "account balance > $16K" (node 510).

Each of the leaf nodes 508, 510, 512 and 514 is associated with a terminal segment. Leaf node 508 is associated with a segment of consumers having "age <= 40" and "account balance <= $16K"; this segment has a response rate of 4%. Leaf node 510 is associated with a segment of consumers having "age <= 40" and "account balance > $16K"; this segment has a response rate of 8%. Leaf node 512 is associated with a segment of consumers having "age > 40" and "income <= $60K"; this segment has a response rate of 2%. Leaf node 514 is associated with a segment of consumers having "age > 40" and "income > $60K"; this segment has a response rate of 10%.
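As a non-limiting illustration, the tree of FIG. 5 and the routing of a record to its terminal segment can be sketched in Python as follows. The node numbers, conditions, and response rates are taken from the description above; the Node class, the assign_segment function, and the record keys are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    node_id: int
    condition: Optional[Callable[[dict], bool]] = None  # None marks a leaf
    left: Optional["Node"] = None    # child used when the condition is false
    right: Optional["Node"] = None   # child used when the condition is true
    response_rate: Optional[float] = None  # set on leaf nodes only


def assign_segment(node, record):
    """Walk from the root to the leaf (terminal segment) for one record."""
    while node.condition is not None:
        node = node.right if node.condition(record) else node.left
    return node


fig5 = Node(
    502, lambda r: r["age"] > 40,
    left=Node(
        504, lambda r: r["account_balance"] > 16_000,
        left=Node(508, response_rate=0.04),
        right=Node(510, response_rate=0.08),
    ),
    right=Node(
        506, lambda r: r["income"] > 60_000,
        left=Node(512, response_rate=0.02),
        right=Node(514, response_rate=0.10),
    ),
)

leaf = assign_segment(fig5, {"age": 35, "account_balance": 20_000, "income": 50_000})
print(leaf.node_id, leaf.response_rate)
```

Because every record follows exactly one path from the root, the leaf segments are non-overlapping and together cover the whole database.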

Note that if the system uses multiple treatments, the system can keep track of response rates for each of the treatments. For example, the response rate of 4% for node 508 might be the result of a response rate of 2% for a first treatment, applied to half of the members of the segment, and a response rate of 6% for a second treatment, applied to the other half of the members of the segment.
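The arithmetic of this example can be checked with a short Python sketch. The counts of 500 consumers per treatment are hypothetical figures chosen to match the stated 2% and 6% rates, and the function name segment_response_rate is an assumption made for illustration.

```python
def segment_response_rate(cells):
    """cells: list of (number_contacted, number_responded), one per treatment."""
    contacted = sum(n for n, _ in cells)
    responded = sum(r for _, r in cells)
    return responded / contacted


# Hypothetical counts for node 508: two treatments, each applied to half
# of the segment, with 2% and 6% response respectively.
node_508 = [(500, 10), (500, 30)]
print(segment_response_rate(node_508))
```

The segment-level rate is the average of the per-treatment rates weighted by the number of consumers who received each treatment, which is why equal halves at 2% and 6% yield 4%.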

Returning to the segmentation process, FIG. 2 is a flow chart illustrating a segmentation process as referred to above. In step 202, model-building unit 102 reads data from schema 114 and uses it to read data from consumer database 120 into model-building unit 102. In step 204, model-building unit 102 determines the type of induction tree (e.g., classification or regression) to use based on the data type (e.g., boolean, categorical, or continuous) of dependent variable 132. Alternatively, the type of induction tree to use can be set manually.

In step 206, model-building unit 102 recursively segments the records in consumer database 120 to produce model 103. As an example, an embodiment of the present invention wherein segmentation is performed according to a binary induction tree, as illustrated in FIG. 5, will be described.

In step 206, starting with the entire database 120, the best split is found across all attributes 136. For a boolean attribute (i.e., an attribute having two possible states), only one split is possible. For a continuous-valued attribute with "n" distinct values, there are n-1 possible splits of the form "greater than." If the number of distinct values is too large, as determined using information from schema 114, then the values can be distributed into a smaller number of bins according to information from schema 114. For a categorical attribute (i.e., an attribute having an arbitrary number of states) with "n" distinct values, there are 2^(n-1) - 1 possible splits, one for each possible non-NULL subset of the distinct values. If the number of distinct values is too large, as determined using information from schema 114, then segmentation can be performed without considering splitting based on this attribute. Alternatively, an order can be imposed based upon the values of this categorical attribute in order to limit the number of possible splits to "n-1."
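The number of candidate splits per attribute type can be illustrated with the following Python sketch. It assumes that a categorical attribute with n distinct values admits 2^(n-1) - 1 distinct two-way partitions; the function names are hypothetical and are not part of the specification.

```python
from itertools import combinations


def count_candidate_splits(kind, n):
    """Number of binary splits for an attribute with n distinct values."""
    if kind == "boolean":
        return 1
    if kind == "continuous":
        return n - 1            # thresholds of the form "greater than"
    if kind == "categorical":
        return 2 ** (n - 1) - 1  # distinct two-way partitions of the values
    raise ValueError(f"unknown attribute kind: {kind}")


def categorical_splits(values):
    """Enumerate each two-way partition once.

    The first value is pinned to the left side so that a partition and its
    mirror image are not both produced.
    """
    values = sorted(values)
    first, rest = values[0], values[1:]
    splits = []
    for k in range(len(rest)):  # k < len(rest) keeps the right side non-empty
        for combo in combinations(rest, k):
            left = {first, *combo}
            right = set(values) - left
            splits.append((left, right))
    return splits


for left, right in categorical_splits(["mail", "phone", "web"]):
    print(sorted(left), "|", sorted(right))
```

The exponential growth of the categorical count is the reason the description allows such attributes to be skipped or ordered when the number of distinct values is large.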

The metric for determining the best split can be defined from information from schema 114. Any of a variety of metrics can be chosen. Metrics that can be used include Entropy, Gini Index, and Gini-Hat Index. The metric chosen for segmentation has a subtle effect on the types of splits that are chosen by model-building unit 102. The Entropy and Gini Index metrics produce similar trees. The Gini Index is mathematically equivalent to the sum of squared errors metric. The Gini-Hat Index (GHI) is preferred for databases that have many strong relationships in the data. Strong relationships tend to produce trees with many "pure" nodes (100% Yes or 100% No), which are undesirable because they leave very little room for marketing opportunities. A variance metric can be used for regression trees.

For a given metric, the best split can be defined as the split with the largest gain, where the gain is the difference between the value of the metric for the attribute at the parent node and the sum of the values of the metric for the attribute at the two child nodes. Database 120 is segmented based on the best split. This segmentation process continues recursively, by repeating step 206 on child nodes, until terminated according to information from schema 114. Any of a variety of termination criteria can be used separately or in combination.
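A minimal Python sketch of the gain computation for a boolean predictive variable follows. It uses the sum of squared errors as the metric, which the description above states is mathematically equivalent to the Gini Index; the function names, the record layout, and the sample data are assumptions made for illustration only.

```python
def sse(responses):
    """Sum of squared errors of 0/1 responses around their mean (n*p*(1-p))."""
    n = len(responses)
    if n == 0:
        return 0.0
    p = sum(responses) / n
    return n * p * (1 - p)


def split_gain(records, target, condition):
    """Parent metric minus the sum of the metric at the two child nodes."""
    left = [r[target] for r in records if not condition(r)]
    right = [r[target] for r in records if condition(r)]
    return sse(left + right) - (sse(left) + sse(right))


def best_threshold_split(records, target, attribute):
    """Try the n-1 splits "attribute > threshold" and keep the largest gain."""
    values = sorted({r[attribute] for r in records})
    best = None
    for threshold in values[:-1]:
        gain = split_gain(
            records, target, lambda r, t=threshold: r[attribute] > t
        )
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical training records.
records = [
    {"age": 30, "responded": 0},
    {"age": 35, "responded": 0},
    {"age": 45, "responded": 1},
    {"age": 50, "responded": 1},
]
print(best_threshold_split(records, "responded", "age"))
```

In a full segmentation, the same search would be run over every attribute at a node, the database would be split on the overall best candidate, and the procedure would then repeat on each child node.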

For example, a parameter ("StopGrow") can specify the smallest size node that can be split. When the segmentation algorithm encounters a node that is smaller than this size, it ceases splitting on that branch of the tree. Another parameter ("MinSize") can specify the smallest allowable size for a node, to prevent creation of a node smaller than this size. The values of the parameters can be set during segmentation in order to prevent overfitting of the tree to the data. Alternatively, these parameters can be set manually.
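The two termination parameters can be sketched as a single check applied to each candidate split. This Python sketch assumes that node size is measured as a record count and that MinSize applies to both child nodes; the function name split_allowed is hypothetical.

```python
def split_allowed(parent_size, left_size, right_size, stop_grow, min_size):
    """Return True if a candidate split satisfies StopGrow and MinSize."""
    if parent_size < stop_grow:
        return False  # node is too small to be split at all
    # Reject any split that would create a child smaller than MinSize.
    return left_size >= min_size and right_size >= min_size


print(split_allowed(1000, 600, 400, stop_grow=200, min_size=50))
print(split_allowed(150, 100, 50, stop_grow=200, min_size=50))
print(split_allowed(1000, 980, 20, stop_grow=200, min_size=50))
```

The first test limits the depth of the tree, while the second discards individual splits that would isolate a handful of records, so the two parameters can be used separately or together.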

In step 208, model-building unit 102 can also be used to generate distinguishing and/or descriptive characteristics containing information regarding attributes, including attributes that are not used to define the segment. For instance, the tree structure might not include a split based on "gender". However, 80 percent of the consumers in a particular segment might be "male". This statistic is computed as part of the distinguishing characteristics. Thus, distinguishing characteristics show important differences between a segment and the total population. Model-building unit 102 can also generate a set of descriptive statistics of attributes of interest by segment or across more than one segment irrespective of the distinct attribute values of a particular segment.

In step 210, model-building unit 102 can be used to determine the relative importance of attributes used in generating the segmentation.

In step 212, the segmentation tree can be pruned to prevent overfitting. This pruning process involves examining the statistical significance of segmentation operations, accounting for the bias of looking at multiple attributes if applicable, and then pruning back the tree until all splits in the tree are statistically significant. In one embodiment, step 212 uses a technique known in the art as Bias Adjusted Significance Pruning (BASP), which provides a quantitative method for determining when a particular segment should not be segmented further. BASP creates conservative and robust models without the computational expense of exhaustive cross validation testing. Thus, BASP is superior to methods in the art such as cross validation that are not easily automated and require ad hoc heuristics. BASP also has the advantage of allowing the model-building process to correctly identify the model in one pass. It therefore does not rely on a test set. This means that the training set does not have to be split into train and test. This is important when the number of records in the training set is small.
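As an illustration of the distinguishing characteristics computed in step 208, the following Python sketch compares the share of an attribute value within a segment to its share in the total population. The function name attribute_share and the sample records are hypothetical; the description does not specify how the statistic is computed.

```python
def attribute_share(records, attribute, value):
    """Fraction of records whose attribute equals the given value."""
    return sum(1 for r in records if r[attribute] == value) / len(records)


# Hypothetical data: "gender" is not used by any split in the tree.
population = (
    [{"gender": "male"}] * 50 + [{"gender": "female"}] * 50
)
segment = [{"gender": "male"}] * 8 + [{"gender": "female"}] * 2

segment_share = attribute_share(segment, "gender", "male")
population_share = attribute_share(population, "gender", "male")
print(segment_share, population_share)
```

A large gap between the two shares flags the attribute as distinguishing for that segment even though it played no part in defining the segment.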

After step 212, the segmentation process is complete, and model 103 is available for use in creating a marketing campaign.

Returning to FIG. 1, scoring unit 104 takes inputs from model 103, prospect database 122, and schema 115, and uses these inputs to generate scored prospect list 105. Prospect database 122 includes information on prospective consumers. The prospective consumers in prospect database 122 can be the same as the consumers in consumer database 120, as in the case of a cross-sell application, or they can be different, as in the case of a new customer acquisition application. At least some of the attributes used in constructing model 103 must be available in prospect database 122, so that a segment and/or a score can be assigned to each prospect according to the process described below. Schema 115 contains metadata that describes the structure of prospect database 122, and can also include information on treatments to be applied to consumers in prospect database 122.

FIG. 3 is a flow chart illustrating the process of generating scored prospect list 105, a scored list of prospective consumers for the marketing campaign, in accordance with an embodiment of the present invention. This process is performed by scoring unit 104 from FIG. 1. In step 302, scoring unit 104 reads records from prospect database 122. In step 304, a score and/or segment is computed for each record by applying the attribute values to the segmentation from model 103. For instance, if predictive variable 132 is a boolean variable indicating a "propensity to buy", then high scoring segments contain records for consumers who are more likely to buy. In step 306, the result is output to scored prospect list 105. Scored prospect list 105 contains records from prospect database 122 with a segment and/or score assigned to each record. In one embodiment, scored prospect list 105 is divided into segments based on attribute values in the records.
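Steps 302 through 306 can be sketched in Python as follows. For brevity, the segmentation is represented here as a flat list of segment definitions using the terminal segments and response rates described for FIG. 5; this representation, the function name score_prospects, and the prospect records are assumptions made for illustration.

```python
# Each entry: (segment id, membership test, score). Values follow FIG. 5.
SEGMENTS = [
    (508, lambda r: r["age"] <= 40 and r["account_balance"] <= 16_000, 0.04),
    (510, lambda r: r["age"] <= 40 and r["account_balance"] > 16_000, 0.08),
    (512, lambda r: r["age"] > 40 and r["income"] <= 60_000, 0.02),
    (514, lambda r: r["age"] > 40 and r["income"] > 60_000, 0.10),
]


def score_prospects(prospects, segments):
    """Attach a segment and a score to each prospect record."""
    scored = []
    for prospect in prospects:
        for segment_id, belongs, score in segments:
            if belongs(prospect):
                scored.append({**prospect, "segment": segment_id, "score": score})
                break  # segments do not overlap, so the first match is the only one
    return scored


# Hypothetical prospect records.
prospects = [
    {"id": "p1", "age": 52, "account_balance": 9_000, "income": 85_000},
    {"id": "p2", "age": 29, "account_balance": 3_000, "income": 40_000},
]
for row in score_prospects(prospects, SEGMENTS):
    print(row["id"], row["segment"], row["score"])
```

Only the attributes used by the segmentation need to be present in the prospect records, which is consistent with the requirement that at least some model attributes be available in prospect database 122.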

Note that it is possible to screen the set of prospects using a filter, for instance to delete high-risk consumers. The screening process can be performed by scoring unit 104 or by target list generator 106. The screening process does not necessarily have to exclude all members of a particular group; it may, for example, exclude only a portion of a group and allow through the rest of the group.

Returning to FIG. 1, scored prospect list 105, along with schema 116, is used by target list generator 106 to generate target list 124. Schema 116 contains campaign parameters, such as the desired size of the campaign and the percentage of the campaign that should be used for test and control, and can also include information on treatments to be applied to prospects.

FIG. 4 is a flow chart illustrating the process of generating target list 124 in accordance with an embodiment of the present invention. This process is performed by target list generator 106 from FIG. 1. Target list generator 106 takes in model 103, scored prospect list 105, and schema 116. From these inputs, it generates target list 124. More specifically, in step 402, target list generator 106 reads in scored prospect list 105. In step 404, target list generator 106 reads in campaign parameters from schema 116, including information on how many consumers to include in the target list, and what treatments to use for the contacts.

In step 406, target list generator 106 selects prospective consumers to include in target list 124. In one embodiment, target list generator 106 selects prospective consumers by first selecting records corresponding to the highest scoring segment in model 103. Target list generator 106 then selects records corresponding to the next highest scoring segment, and so on, until the number of selected consumers reaches the size of the campaign.

In step 408, target list generator 106 outputs target list 124. Target list 124 contains a subset of the records in scored prospect list 105, including at least a unique identifier with which to identify each consumer. Thus, target list 124 specifies which consumers to contact.
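The selection of steps 406 and 408 can be sketched in Python as follows, assuming each scored prospect carries the score of its segment. The function name select_targets and the sample records are hypothetical.

```python
def select_targets(scored_prospects, campaign_size):
    """Take prospects from the highest-scoring segments until the list is full."""
    ranked = sorted(scored_prospects, key=lambda p: p["score"], reverse=True)
    return [
        {"id": p["id"], "segment": p["segment"]} for p in ranked[:campaign_size]
    ]


# Hypothetical scored prospect list.
scored_prospect_list = [
    {"id": "p1", "segment": 508, "score": 0.04},
    {"id": "p2", "segment": 514, "score": 0.10},
    {"id": "p3", "segment": 512, "score": 0.02},
    {"id": "p4", "segment": 510, "score": 0.08},
    {"id": "p5", "segment": 514, "score": 0.10},
]
print(select_targets(scored_prospect_list, campaign_size=3))
```

Sorting by segment score and truncating at the campaign size is equivalent to exhausting the best segment first, then the next best, and so on; the output retains a unique identifier for each selected consumer.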

Target list 124 can also contain an identifier for the treatment to use for each record, thereby specifying which treatment to use for each consumer. If information on multiple treatments is available, in one embodiment target list generator 106 assigns the highest scoring treatment for each segment to consumers in that segment. In another embodiment, treatments are assigned across all segments in proportion to their score, such that higher scoring treatments are assigned with greater frequency than lower scoring treatments, but higher scoring treatments are distributed across segments. In another embodiment, or if no treatment information is available, treatments can be assigned to consumers so as to evenly distribute the multiple treatments across segments.
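Two of the treatment assignment embodiments can be sketched in Python as follows. The mapping from segments to per-treatment scores is a hypothetical stand-in for a cell matrix of the kind shown in FIG. 6, and the function names and scores are assumptions made for illustration. The proportional variant uses weighted random sampling, which is only one possible way to realize assignment in proportion to score.

```python
import random


def best_treatment_per_segment(cell_scores):
    """cell_scores maps segment -> {treatment: score}; pick the top treatment."""
    return {
        segment: max(scores, key=scores.get)
        for segment, scores in cell_scores.items()
    }


def proportional_treatments(treatment_scores, n, rng):
    """Draw n treatments with probability proportional to each score."""
    treatments = list(treatment_scores)
    weights = [treatment_scores[t] for t in treatments]
    return rng.choices(treatments, weights=weights, k=n)


# Hypothetical per-segment, per-treatment scores.
cell_scores = {
    508: {"letter_A": 0.02, "letter_B": 0.06},
    514: {"letter_A": 0.12, "letter_B": 0.08},
}
print(best_treatment_per_segment(cell_scores))

rng = random.Random(0)  # fixed seed so the draw is repeatable
print(proportional_treatments({"letter_A": 0.07, "letter_B": 0.07}, 5, rng))
```

The first function exploits what is already known about each segment, while the second continues to expose every treatment to some consumers, which supports the test and control monitoring described below.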

For test and control purposes, target list generator 106 may select some consumers from segments with low scores for inclusion in target list 124. Target list generator 106 may additionally assign sub-optimal treatments to selected consumers. This allows the system to monitor for possible improving behavior of low scoring segments and treatments, for example, if there are multiple phases in a campaign, by comparing the new score for a segment or treatment to the old score for that segment or treatment or to the score for other segments or treatments.

Returning to FIG. 1, target list 124 is used by contact initiator 108 for initiating contact with the selected prospective customers using the assigned treatments. In one embodiment, contact initiator 108 can generate a mailing list for the targeted consumers. In another embodiment, contact initiator 108 can generate a phone list for calling the targeted consumers. In general, the present invention can be applied to any type of consumer contact, including mail, telephone, email, and even door-to-door solicitations.

In another embodiment, the system is used for targeting on websites. In this embodiment, a model is created for each of a set of promotions based on a predictive variable that acts as a surrogate for the offer. The training database consists of a set of consumers who previously visited a website for whom demographic and other data exists, as well as the surrogate predictive variable. Each consumer for whom there exists a database record is then scored for each model. When a consumer visits the site, the promotion that has the highest score for that consumer is presented on the web page. In another embodiment, other promotion selection strategies are used when the consumer visits the site multiple times, such as presenting multiple copies of the promotions in random order, with the number of copies of each promotion being proportional to the score whereby the highest scoring promotions have the most copies and the lowest scoring promotions have the fewest copies.

Response collector 110 gathers responses 126 to contacts made by contact initiator 108. This response information typically includes information on whether or not a targeted consumer responded to a treatment, and can also include treatment information. Responses 126, along with schema 118, are used by model-refining unit 112 to refine or update model 103 using the information contained in responses 126. Schema 118 describes the response data 126, including any transformations to be applied to this data.

Model-refining unit 112 re-computes the score (response rate) for each of the segments in order to track changing behavior over time. If the prospects in a previously high-scoring segment have a lower response rate, the segment's score is reduced. If prospects in a previously low-scoring segment have a higher response rate, the segment's score is increased. Similarly, responses to different treatments can be incorporated into refined model 103. In one embodiment, model-refining unit 112 does not change the preexisting segmentation in model 103. Hence, a prospective customer who is in a certain segment in model 103 will be in the same segment in refined model 103. Then, if the same prospect database 122 is used to generate a subsequent target list 124, scored prospect list 105 does not have to be regenerated because the segmentation of model 103 does not change during the refining process.
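The re-computation of segment scores without changing the segmentation can be sketched in Python as follows. The response layout as (segment, responded) pairs and the function name recompute_segment_scores are assumptions made for illustration.

```python
from collections import defaultdict


def recompute_segment_scores(responses):
    """responses: iterable of (segment_id, responded) pairs from a campaign."""
    contacted = defaultdict(int)
    responded = defaultdict(int)
    for segment_id, did_respond in responses:
        contacted[segment_id] += 1
        responded[segment_id] += int(did_respond)
    # The score of a segment is its observed response rate.
    return {seg: responded[seg] / contacted[seg] for seg in contacted}


# Hypothetical campaign responses.
responses = [
    (514, True), (514, False), (514, False), (514, False),
    (508, True), (508, True), (508, False), (508, False),
]
print(recompute_segment_scores(responses))
```

Only the scores attached to the segments change; each prospect stays in the segment it was originally assigned to, which is why the scored prospect list need not be regenerated.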

FIG. 7 is a flow chart illustrating the process of using responses 126 to refine or update response model 103 in accordance with an embodiment of the present invention. This process is performed by model-refining unit 112 from FIG. 1. In step 702, model-refining unit 112 reads responses 126. Model-refining unit 112 also reads model 103 and schema 118.

In step 704, model-refining unit 112 uses response data 126 to update scores for segments and treatments in model 103. Note that "refining" or "updating" refer to any modification, adaptation, or other change in model 103 to accommodate response data 126. Any of a variety of techniques can be used to update the scores for segments and treatments. In one embodiment, scores are updated by minimizing the variance scaled distance between the squared deviation for all of the observations. In step 706, model-refining unit 112 produces a report specifying how model 103 has been modified.

The above process of refining model 103 and generating a new target list 124 can take place multiple times during a marketing campaign. This allows refined model 103 to incorporate changes in consumer response rates over time. In one embodiment, the score updating process uses a time decay function so that more recent observations are given more weight than older observations. For instance, each sample can be multiplied by the factor e^(-Bt), where t is time and B is a factor that can be adjusted to speed up or slow down the effect of time decay of the function. Other types of functions can also be used, such as cyclical functions that weight more heavily observations from the previous year to capture seasonal response patterns, such as for holidays. In general, any function that reduces the impact of an observation based upon the age of the observation can be used.
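An exponential time decay of this kind can be sketched in Python as follows. The sketch assumes that t is the age of an observation (the current time minus the observation time) and that the weighted score is normalized by the sum of the weights; both choices, and the function name decayed_response_rate, are assumptions made for illustration.

```python
import math


def decayed_response_rate(observations, now, B):
    """observations: (time, responded) pairs; each weighted by exp(-B * age)."""
    weighted_responses = 0.0
    total_weight = 0.0
    for observed_at, responded in observations:
        weight = math.exp(-B * (now - observed_at))
        weighted_responses += weight * responded
        total_weight += weight
    return weighted_responses / total_weight if total_weight else 0.0


# Hypothetical observations: an old non-response and a recent response.
observations = [(0, 0), (10, 1)]
print(decayed_response_rate(observations, now=10, B=0.0))
print(decayed_response_rate(observations, now=10, B=0.1))
```

With B set to zero every observation counts equally; increasing B shifts the score toward the most recent behavior, which is the adjustable speed of decay described above.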

The embodiment of the present invention as illustrated in FIG. 1 has now been described. Many modifications to and variations of this embodiment, as well as other embodiments, are possible within the scope of the present invention. For example, several variations of the segmentation process will now be described.

As stated earlier, the segmentation process can vary depending on what type of induction tree (e.g., classification or regression) is used for classification of records. Two-way classification can be used for predictive variables that are boolean data types, and K-way classification can be used for predictive variables that are categorical data types with K states. Boolean and categorical classification are related but result in different internal representations. Although two-state categorical classification produces the same tree as boolean classification, significant efficiency gains can result from specifying two-state data types as boolean. Regression trees can be used for continuous-valued predictive variables. These continuous variables can be represented as floating point numbers. Ordered discrete variables having integer values, such as a quantity of items, are also continuous-valued and therefore use a regression tree.

The segmentation process can also be varied to account for missing values in database 120. Note that the value of predictive variable 132 is typically defined for all records in database 120. This is because a consumer record 130 cannot be used to predict a behavior if the corresponding predictive variable 132 is not defined. For all attributes 136, missing values are allowed; however, fewer missing values improve the predictive power of model 103. In one embodiment, missing values are handled using an information-theoretic approach. However, in this method, the overall effect of missing values is not adequately penalized for splits high up in the tree because of the "local" nature of the algorithm of this approach. Model-building unit 102 can adjust for this by adding a penalty for splitting on attributes that contain missing values. The range of this penalty value can be from 1.0 to 5.0, with 1.0 being used as a strictly local penalty for missing values, and 5.0 being used so that a sparse attribute (i.e., an attribute with many missing values) is least likely to be split first. A default value can be set at 3.0. This penalty adjusts information gain by the information-theoretic loss due to the missing values, and can be reapplied for each split.

Alternatively, missing values may be handled in other ways. Rather than penalizing for missing values, one embodiment of the present invention fills in missing values. A missing value can be filled in a number of ways. For example, a record with a missing value can be propagated to both children with fractional weights (x and 1-x) equal to the proportion of non-missing value records that were routed to the children when building the tree. This approach assumes that the actual values of an attribute with missing values correlate, on average, to the values of that attribute without missing values. Another approach is to assign missing values in a way that maintains the integrity of a variance-covariance matrix. Another approach is to propagate the record with the missing value to the children with the most similar values of other attributes.

The segmentation process can also be varied to structure the data more closely to the business model. In one embodiment, model-building unit 102 can set dependencies between attributes. For example, missing credit card balances should not be treated as missing if the consumer does not have a credit card. Instead of setting the credit card balance to missing or unknown, model-building unit 102 creates a "HAS-A" dependency between a credit card balance attribute and a new synthetic attribute indicating whether or not the consumer has a credit card. This approach might be used when consumer records having different formats are joined. Note that only boolean type fields can be designated as HAS-A attributes. Thus, if A depends on B, B must be of boolean type.

HAS-A attributes generally do not have enough predictive power to be chosen on their own merits. If sufficient predictive power is found in the dependent attribute, a special composite split, using both the HAS-A attribute and the dependent attribute, can be performed.
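The fractional-weight treatment of missing values described earlier (propagating a record with a missing value to both children with weights x and 1-x) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the use of None to mark a missing value, the numeric threshold split, and the fallback of 0.5 when no record has a known value are all hypothetical choices, not details taken from the described embodiment.

```python
def route_record(value, threshold, weight, left_fraction):
    """Route one record at a split of the form 'value <= threshold'.

    Returns (left_weight, right_weight). A record with a known value goes
    entirely to one child. A record whose value is missing (None) is sent to
    both children with weights x and 1 - x, where x is left_fraction.
    """
    if value is None:
        return weight * left_fraction, weight * (1.0 - left_fraction)
    if value <= threshold:
        return weight, 0.0
    return 0.0, weight

def split_with_missing(values, threshold):
    """Split a list of attribute values, some of which may be None."""
    known = [v for v in values if v is not None]
    if known:
        # x = proportion of non-missing records routed to the left child.
        left_fraction = sum(1 for v in known if v <= threshold) / len(known)
    else:
        left_fraction = 0.5  # assumption: no information, split evenly
    left_total = 0.0
    right_total = 0.0
    for v in values:
        left_w, right_w = route_record(v, threshold, 1.0, left_fraction)
        left_total += left_w
        right_total += right_w
    return left_fraction, left_total, right_total

# Hypothetical ages with two missing values, split on "age <= 40".
ages = [25, 38, 40, 52, None, None]
x, left, right = split_with_missing(ages, 40)
```

In this example three of the four known ages fall at or below 40, so x is 0.75 and each record with a missing age contributes 0.75 to the left child and 0.25 to the right child.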

The segmentation process can also be varied if there are a number of different treatments used in a marketing campaign. For example, in one embodiment, multiple treatments can be handled by designating treatment attribute 134 as a categorical attribute that adds a second dimension to the splitting process. The treatment attribute 134 is preferably free of unknown or missing values. At each split, the differential in the value of predictive variable 132 across the different values of treatment attribute 134 is taken into account.

FIG. 8 illustrates how segmentation can be performed in the multiple treatment case in accordance with an embodiment of the present invention. This embodiment uses a segmentation criterion that measures the gain in explanatory power for the response that results from different treatment effects in child nodes above and beyond the power of a basic split that simply differentiates average response irrespective of treatment. The tree illustrated in FIG. 8 shows some similarity to the tree without different treatments illustrated in FIG. 5, but there are important differences. The tree in FIG. 5 shows the application of simple split criteria at two successive levels, while FIG. 8 shows the choice of a split at just one level employing a compound criterion that considers both the predictor attribute and the treatment attribute. The responses to different treatments in the base node, 802, are shown by responses in node 803, which contains records of consumers that received treatment A, and node 805, which contains records of consumers that received treatment B. Node 808 contains records of consumers with "age <= 40" that received treatment A, node 810 contains records of consumers with "age <= 40" that received treatment B, node 812 contains records of consumers with "age > 40" that received treatment A, and node 814 contains records of consumers with "age > 40" that received treatment B.

The metric for the choice of splits in the multiple treatment case is an elaboration of the single treatment case. For example, consider the case without multiple treatments. In FIG. 8, the single treatment case corresponds to the nodes 802, 804 and 806, without nodes 808, 810, 812 and 814. Gain can be computed for the single treatment case using the following equation:

gain = metric(node 802) - (metric(node 804) + metric(node 806))

In this equation the function "metric" is based on the value of predictive variable 132, and is a function such as Entropy, Gini Index, and Gini-Hat Index. The above equation measures the gain in explanatory power of splitting on the attribute "age."
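The single treatment gain computation can be illustrated with a short Python sketch. The sketch assumes that "metric" is the Gini index of a boolean predictive variable weighted by the share of records in the node; that weighting, the function names, and the response counts are assumptions made for illustration and are not specified in the description.

```python
def gini(yes, no):
    """Gini index of a node with the given boolean response counts."""
    total = yes + no
    if total == 0:
        return 0.0
    p = yes / total
    return 2.0 * p * (1.0 - p)

def metric(node, root_size):
    """Size-weighted Gini index (weighting by node share is an assumption)."""
    yes, no = node
    return (yes + no) / root_size * gini(yes, no)

def single_treatment_gain(parent, left, right):
    """gain = metric(parent) - (metric(left) + metric(right))."""
    root_size = parent[0] + parent[1]
    return metric(parent, root_size) - (
        metric(left, root_size) + metric(right, root_size)
    )

# Hypothetical (yes, no) response counts for nodes 802, 804 and 806.
node_802 = (100, 900)
node_804 = (80, 320)   # "age <= 40"
node_806 = (20, 580)   # "age > 40"
gain = single_treatment_gain(node_802, node_804, node_806)
```

Under these assumptions a split that separates responders from non-responders yields a positive gain, while a split whose children have the same response rate as the parent yields a gain of zero.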

For the case with multiple treatments, nodes 808, 810, 812 and 814 are included in the tree. In this multiple treatment case, the gain in explanatory power includes the contributions due to treatments in the equation that follows:

gain = c1(metric(node 802) - (metric(node 803) + metric(node 805))) - c2(metric(node 804) - (metric(node 808) + metric(node 810)) + (metric(node 806) - (metric(node 812) + metric(node 814))))

The coefficients "c1" and "c2" can be adjusted to increase or decrease the influence of the differentiating power of treatments. For example, if "c1 = 1" and "c2 = 0", only the effect of splitting on age is considered so that the splits are those of a single treatment tree. If "c1 = 0" and "c2 = 1", only the differentiating contribution of treatments A and B are considered. In practice, intermediate values are typically used, for example "c1 = 0.5" and "c2 = 0.5", or "c1 = 0.2" and "c2 = 0.8".

The segmentation process can also be varied depending on the balance in segment or database populations. For example, in one embodiment, model-building unit 102 uses a parameter "omega" to determine the amount of correction to be made for unbalanced populations. The omega parameter for classification trees is typically set between 0.0 and 1.0. For example, if there are 10% "yes" and 90% "no" responses for the predictive variable 132 in a given population, the "omega" factor can be used to force the algorithm to treat the population as a 50% "yes" and 50% "no" population. This approach has practical benefits. It trades off accuracy for increased numbers of mixed nodes. Such mixed nodes provide more opportunities for finding good targets for some marketing campaigns. The "omega" parameter can be set to its maximum value of 1.0 for target marketing applications, and can be set to its minimum value of 0.0 for pure classification tasks. The "omega" parameter can also be tuned to an intermediate value between 0.0 and 1.0, according to specific needs.
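One way to picture the effect of the "omega" parameter is as a reweighting of "yes" and "no" records. The Python sketch below is only an illustration: the description states the behavior at the two end points (0.0 leaves the population as observed, 1.0 treats it as 50% "yes" and 50% "no"), while the linear interpolation between those end points and the function names are assumptions of this sketch.

```python
def balancing_weights(yes_share, omega):
    """Return (yes_weight, no_weight) for a boolean predictive variable.

    omega = 0.0 leaves the population as observed; omega = 1.0 reweights it to
    behave like a 50% "yes" / 50% "no" population. The linear interpolation
    between those two end points is an assumption of this sketch.
    """
    if not 0.0 < yes_share < 1.0:
        raise ValueError("yes_share must be strictly between 0 and 1")
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must be between 0.0 and 1.0")
    target_yes = (1.0 - omega) * yes_share + omega * 0.5
    yes_weight = target_yes / yes_share
    no_weight = (1.0 - target_yes) / (1.0 - yes_share)
    return yes_weight, no_weight

def effective_yes_share(yes_share, omega):
    """Share of "yes" responses seen by the algorithm after reweighting."""
    yes_weight, no_weight = balancing_weights(yes_share, omega)
    yes_mass = yes_share * yes_weight
    no_mass = (1.0 - yes_share) * no_weight
    return yes_mass / (yes_mass + no_mass)

# The 10% "yes" / 90% "no" population from the text.
observed = effective_yes_share(0.10, 0.0)   # pure classification setting
balanced = effective_yes_share(0.10, 1.0)   # target marketing setting
```

For the 10% "yes" population, an omega of 1.0 gives each "yes" record a weight of 5 and each "no" record a weight of 5/9, so the two classes carry equal total weight.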

The segmentation process can also be varied depending on what each record represents. For example, in one embodiment, each consumer record can also include a weight to be used to calculate a weighted response. In this case, the metrics are based upon the weighted response rates. Such weights can be used to analyze panel data where each record represents a group of people in a larger population. The weight is then proportional to the number of observations in the group. In addition, such weights can be used to value the consumer based on such characteristics as credit risk, profitability, or revenue generation.

The segmentation process can also be varied to use a technique called "boosting". The boosting method can be used to build more accurate trees by building a series of trees based on the remaining errors from a previous tree. The basic idea of boosting is to improve the segmentation process to build a set of related trees. Each tree generates a score for each record. The final score is then the sum of the scores across all the trees built. The scores are real numbers (e.g., positive for true and negative for false). The scores are not restricted to the range between zero and one, and they do not represent probabilities of responding as in a previously described tree model.

Each tree is built using a different set of weights. The first tree typically has all weights equal to one. For subsequent trees, weights are chosen to minimize a loss function. The resulting weights are such that records that are mis-classified have larger weight. Records that are correctly classified have smaller weight.

In one implementation of boosting, the scores are equal to half the log of the odds ratio within a node. A different choice of a loss function and/or minimization approximation can be used, leading to a different expression for the score. For example, using the same loss function but choosing a Newton update instead of a full minimization leads to scores that are equal to the difference between the probability of responding and the probability of not responding.

Any of a variety of loss functions can be used. One such loss function has an exponential form:

LOSS = Sum_over_record_i W(i,t) * exp( -yi * h(i,t) )

where W(i,t) = weight for record 'i' and tree 't', yi = boolean variable for record 'i' (coded as +1 or -1), and h(i,t) = score for record 'i' and tree 't'.

In one embodiment of boosting, described below, the score, h(i,t), is the same for all of the records in a node. All records in node 'j' have the same score: h(i,t) = score for record 'i' in node 'j' and tree 't'. The value of h(i,t) is obtained by minimizing the loss function where the sum is restricted to the records in node 'j'. The result of this minimization is:

h(i,t) = (1/2) log( p(j,t) / (1 - p(j,t)) )

where p(j,t) = weighted probability of yes for node 'j' in tree 't'. The final score for record 'i' is then given by:

fi = Sum_over_tree_t h(i,t)
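The per-node score, half the log of the odds ratio of the weighted probability of yes, can be sketched in Python as follows. The node score and the summation of scores across trees follow the expressions above. The weight update shown (multiplying each weight by exp(-yi * h(i,t)), as in the Real AdaBoost procedure) is an assumption of this sketch, since the description only states that weights are chosen to minimize the loss function. The clipping constant eps and all names are likewise hypothetical.

```python
import math
from collections import defaultdict

def node_scores(nodes, labels, weights, eps=1e-6):
    """Return {node: h} with h = 0.5 * log(p / (1 - p)).

    p is the weighted probability of yes (label +1) within the node.
    Clipping p to [eps, 1 - eps] to avoid log(0) is an assumption.
    """
    yes = defaultdict(float)
    total = defaultdict(float)
    for node, y, w in zip(nodes, labels, weights):
        total[node] += w
        if y == 1:
            yes[node] += w
    scores = {}
    for node, node_total in total.items():
        p = min(max(yes[node] / node_total, eps), 1.0 - eps)
        scores[node] = 0.5 * math.log(p / (1.0 - p))
    return scores

def reweight(nodes, labels, weights, scores):
    """Assumed update: new weight = old weight * exp(-y * h)."""
    return [w * math.exp(-y * scores[n]) for n, y, w in zip(nodes, labels, weights)]

def final_score(per_tree_scores):
    """Final score for one record: the sum of its scores across all trees."""
    return sum(per_tree_scores)

# Hypothetical first tree: two nodes, all weights equal to one.
nodes = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, -1, 1, -1, -1, -1]
weights = [1.0] * 8
scores = node_scores(nodes, labels, weights)
new_weights = reweight(nodes, labels, weights, scores)
```

In this example node "a" receives a positive score and node "b" a negative score, and the one record in each node that disagrees with its node's score ends up with a larger weight than the records that agree, consistent with the behavior described for subsequent trees.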

Another embodiment of the boosted model employs a residual regression score as a covariate predictor within nodes of the boosted induction tree. Transformed residuals from a pre-boosting tree structure, which has typically been pruned more closely than a single tree, are used to perform a step-wise regression against all attributes that are candidate predictors. The transformations can be used for non-linearities or for equalizing variances.

Scores from this regression are then included as covariates in the boosted tree. The inclusion as covariates may take the form of binned score values within a multiple treatment implementation of the algorithm. For the continuous valued (regression) embodiment, the covariate may be included as a scaled value in a simple regression mechanism nested within the induction tree.

Thus, several embodiments of the segmentation process of the present invention as illustrated in FIG. 1 have been described. To further illustrate the variety of possible embodiments of the present invention, variations of the scoring process and the target list generation process will also be described. Once model 103 is created, a prospect list can be scored by applying prospect database 122 to model 103. The scoring process results in a list of predicted values, one for each record. For example, for boolean types, the result is a list of the probabilities of "yes" (generally probability to respond). For categoricals, the result is a list of the most likely categories. For continuous values, the result is a list of the predicted values. Also, the standard deviation of the predicted value can be generated for continuous values. In the multiple treatment case, two values are generated. One is the predicted value, usually the "probability of response", and the other is the best treatment. Another embodiment generates the segment number rather than a probability or a prediction. Then, a table can be used to look up a score.

When a marketing campaign is desired, a list of target prospects must be created. To create target list 124, a cell matrix is created. In one embodiment, the cell matrix is created by generating a matrix with one row for each segment and one column for each treatment, plus one column to represent the whole segment. The cells in the latter column have the percentage of the number of targets that should be selected from that segment. This column sums to 100%. The treatment columns contain a number that is a percentage of each of the treatments that should be allocated within that segment. Thus, each row of treatment cells sums to 100%.

In another embodiment, the cell matrix is automatically adjusted to contain absolute numbers of targets based on the campaign size provided by the user.

In another embodiment, the cell matrix is automatically constrained based on the number of targets available. This is done by first scoring the prospect database with the segment ID as described above. Then, tallies are made of the number and the percentage of the number of targets for each segment. The cell matrix can then be modified by reducing the number in the cell to ensure that there will be enough targets to fill it. This reduction can be done explicitly or proportionally. A corresponding increase must then be done to the other cells to ensure consistency.
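The conversion of a percentage cell matrix to absolute numbers and its adjustment to the number of available targets can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the dictionary layout, the function names, the choice to redistribute the shortfall in proportion to each row's spare capacity, and the omission of rounding to whole prospects are illustrative choices not specified in the description.

```python
def build_cell_matrix(segment_pct, treatment_pct, campaign_size):
    """Convert percentage cells to absolute numbers of targets.

    segment_pct: {segment: percent of all targets}, summing to 100.
    treatment_pct: {segment: {treatment: percent within the segment}},
    with each row summing to 100.
    """
    matrix = {}
    for seg, pct in segment_pct.items():
        seg_total = campaign_size * pct / 100.0
        matrix[seg] = {t: seg_total * tp / 100.0 for t, tp in treatment_pct[seg].items()}
    return matrix

def constrain_to_available(matrix, available):
    """Reduce over-filled rows proportionally and increase the other rows.

    Assumption: the shortfall is moved to rows that still have spare
    prospects, in proportion to their spare capacity.
    """
    totals = {s: sum(row.values()) for s, row in matrix.items()}
    shortfall = sum(max(0.0, totals[s] - available[s]) for s in matrix)
    spare = {
        s: max(0.0, available[s] - totals[s]) if totals[s] > 0 else 0.0
        for s in matrix
    }
    total_spare = sum(spare.values())
    moved = min(shortfall, total_spare)
    result = {}
    for s, row in matrix.items():
        if totals[s] > available[s]:
            new_total = available[s]
        elif total_spare > 0:
            new_total = totals[s] + moved * spare[s] / total_spare
        else:
            new_total = totals[s]
        scale = new_total / totals[s] if totals[s] > 0 else 0.0
        result[s] = {t: n * scale for t, n in row.items()}
    return result

# Hypothetical campaign of 1000 targets over three segments and two treatments.
segment_pct = {1: 50, 2: 30, 3: 20}
treatment_pct = {s: {"A": 60, "B": 40} for s in segment_pct}
matrix = build_cell_matrix(segment_pct, treatment_pct, 1000)
available = {1: 400, 2: 1000, 3: 250}
constrained = constrain_to_available(matrix, available)
```

Here segment 1 is planned for 500 targets but only 400 prospects are available, so its cells are scaled down proportionally and the remaining 100 targets are spread over segments 2 and 3, keeping the campaign size unchanged.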

Another embodiment allows the user to specify a number or percentage of the total to be used as a test population. This results in an additional pseudo-cell (or row of cells in the multiple treatment case). These test cells represent prospects that are selected randomly across all segments, or from a set of segments defined by the user.

Another embodiment allows the user to specify a number or percentage of the total number of targets to be used for testing a new treatment. One embodiment is a new column that has "N/S" targets, where "N" is the number in the test and "S" is the number of segments. Alternatively, this number may be weighted by the total for the segment. These test targets may be in addition to the previously specified campaign size, or may be taken from other cells to preserve the size.

In one embodiment of the target list generation process, target list generator 106 populates a cell matrix, such as cell matrix 600 that appears in FIG. 6, to optimize the number of contacts to make for each segment and treatment. Referring to FIG. 6, cell matrix 600 has rows 1, 2, 3, ... corresponding to segments, and columns A, B, C, ... corresponding to treatments. Note that each cell within the cell matrix 600 can have a specific response rate that is determined from responses collected during prior campaigns, and/or a number or percentage as described above. In one embodiment of the present invention, target list generator 106 allocates prospects across cell matrix 600 in order to optimize total expected response subject to a risk constraint. This risk constraint penalizes the response rate for a cell if the measured response rate for the cell has a high variance. For example, instead of optimizing for expected response rate, target list generator 106 optimizes expected response rate minus a function of the variance of the expected response (multiplied by a coefficient). By penalizing for variance, this risk constraint reduces the chance that the marketing campaign will perform more poorly than a random mailing.
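The risk-adjusted objective (expected response rate minus a coefficient times a function of the variance) can be illustrated with the following Python sketch. The choice of the square root of the binomial variance of the measured rate as the penalty function, the function names, and the sample counts are assumptions made for illustration; the description does not specify a particular function of the variance.

```python
import math

def risk_adjusted_rate(responses, contacts, risk_coefficient):
    """Measured response rate minus a variance-based penalty.

    Assumption: the penalty is the standard error of the measured rate,
    i.e. the square root of rate * (1 - rate) / contacts.
    """
    if contacts <= 0:
        raise ValueError("a cell needs at least one prior contact")
    rate = responses / contacts
    variance = rate * (1.0 - rate) / contacts
    return rate - risk_coefficient * math.sqrt(variance)

def rank_cells(cells, risk_coefficient):
    """Return cell keys ordered from most to least attractive.

    cells: {(segment, treatment): (responses, contacts)} from prior campaigns.
    """
    return sorted(
        cells,
        key=lambda key: risk_adjusted_rate(cells[key][0], cells[key][1], risk_coefficient),
        reverse=True,
    )

# Hypothetical cells: a small cell with a high but noisy rate, and a large
# cell with a slightly lower but well-measured rate.
cells = {(1, "A"): (3, 20), (2, "A"): (120, 1000)}
order_without_risk = rank_cells(cells, 0.0)
order_with_risk = rank_cells(cells, 1.0)
```

With a coefficient of zero the cells are ranked by measured response rate alone, so the small cell comes first; with a coefficient of one the penalty on the small cell's noisy estimate moves the large, well-measured cell to the top.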

If this coefficient is taken to be zero, then the cell matrix is populated by filling the cells with the highest response rate first. Once a cell is populated, the treatment with the best response rate is assigned to all consumers in that cell. As an alternative, if the difference in response rates between treatments is not too large, then a small number of consumers are randomly assigned to the non-dominant treatment. This treatment testing is only done if enough consumers are in this cell. If the difference in response rates is too small, then the treatments are assigned randomly across all consumers in the cell.

In another embodiment of the target list generation process, a different algorithm is used. Using this algorithm, the segments are sorted by score. Consumers to contact are selected by assigning a 100% sampling rate to segments, starting with the highest scoring segments, until a target set is filled, and then selecting a holdout set from a distribution of the remaining segments, for example, by separating the remaining segments into equal quartiles and assigning a sampling rate of 60%, 20%, 10%, and 10% to the first through fourth quartiles, respectively. Linear interpolation can be used for segments that run across the boundary of a quartile. Then, a sampling rate for each treatment in each segment is determined according to a Z score for each segment, where the Z score is the difference between the response rate to the best treatment and the test treatment, normalized by the standard deviation of the difference. The sample size receiving the best treatment varies based on the Z score, for example, from 50% of the consumers to be contacted from the segment where the Z score is less than a minimum value such as "0.1", to 100% of the consumers to be contacted from the segment where the Z score is greater than a maximum value such as "3.0". The sampling assignments are verified using the expected response rates to ensure that the results will be statistically significant.
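The Z score based assignment of treatments within a segment can be sketched in Python as follows. The sketch assumes that the standard deviation of the difference is computed from two independent binomial samples and that the share receiving the best treatment rises linearly between the minimum and maximum Z values; the description gives only the two end points (50% and 100%), so both the interpolation and the function names are assumptions.

```python
import math

def z_score(best, test):
    """Z score of the difference between two measured response rates.

    best, test: (responses, contacts) for the best and the test treatment.
    Assumption: independent binomial samples, so the variance of the
    difference is the sum of the two sampling variances.
    """
    rate_best = best[0] / best[1]
    rate_test = test[0] / test[1]
    sd = math.sqrt(
        rate_best * (1.0 - rate_best) / best[1]
        + rate_test * (1.0 - rate_test) / test[1]
    )
    if sd == 0.0:
        return float("inf") if rate_best > rate_test else 0.0
    return (rate_best - rate_test) / sd

def best_treatment_share(z, z_min=0.1, z_max=3.0, low=0.5, high=1.0):
    """Share of a segment's contacts that receives the best treatment.

    Below z_min the share is 50%; above z_max it is 100%. Linear
    interpolation in between is an assumption of this sketch.
    """
    if z <= z_min:
        return low
    if z >= z_max:
        return high
    return low + (high - low) * (z - z_min) / (z_max - z_min)

# Hypothetical segment: 12% response to the best treatment, 10% to the test.
z = z_score((120, 1000), (100, 1000))
share = best_treatment_share(z)
```

A segment whose two treatments are statistically indistinguishable splits its contacts evenly, which keeps testing both treatments, while a segment with a clearly dominant treatment sends all of its contacts that treatment.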

The segmentation process can also be varied so as to perform multidimensional segmentation. Multidimensional segmentation is performed using a tree algorithm to predict a number, K, of different behaviors at the same time. For example, the different behaviors may represent revenue, cost, risk, specific product usage, or loyalty. This method is performed by replicating the training data set K times into K separate "tiers." Predictive variable 132 is constructed as a composite, successively taking on values of one of the K behavioral outcomes in each of the K tiers. Treatment attribute 134 is defined with a distinct value in each tier which is identified with the behavioral measure present as predictive variable 132 in that tier.
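The replication of the training data into K tiers can be pictured with a short Python sketch. The record layout (a dictionary per consumer), the field names "treatment" and "predictive_variable", and the sample behaviors are assumptions made for illustration only.

```python
def build_tiers(records, behaviors):
    """Replicate the training records once per behavior (K tiers).

    In each tier the composite predictive variable takes the value of that
    tier's behavior, and the treatment attribute is set to a distinct value
    (here, the behavior's name) identifying the tier.
    """
    tiered = []
    for behavior in behaviors:
        for record in records:
            # Keep the ordinary attributes; drop the raw behavior columns.
            row = {k: v for k, v in record.items() if k not in behaviors}
            row["treatment"] = behavior
            row["predictive_variable"] = record[behavior]
            tiered.append(row)
    return tiered

# Hypothetical training data with K = 2 behaviors.
records = [
    {"age": 34, "revenue": 1, "risk": 0},
    {"age": 57, "revenue": 0, "risk": 1},
]
tiers = build_tiers(records, ["revenue", "risk"])
```

The resulting data set has K times as many rows as the original and can be passed to a multiple treatment segmentation in which the treatment attribute distinguishes the behaviors.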

The criterion for choice of splits is to maximize the difference in the K-dimensional pattern of behavior in the two child nodes relative to the parent node. A variation on this embodiment adapts a multidimensional treatment algorithm, for example the K-dimensional Gini information measure with a differentiation parameter set for maximal treatment differentiation as is described below. The results are much like a powerful multi-treatment model with some behaviors being very high in some nodes, and other behaviors being very high in other nodes. The behaviors in a single analysis (referred to as "behavioral drivers") can be both positive (e.g., profit, revenue, sales of K different products) and negative (e.g., cost, risk, attrition, default). Hence, only minor consideration is placed on the average outcome.

A variation on the above-described embodiment includes using a regression tree with split criteria based on explained sums of squares. Another variation uses absolute linear differences. These variations allow for flexible weighting of different behavioral drivers to provide for greater or lesser impact on the segmentation as appropriate for a given business problem.

The previous embodiments handle cases when the number of treatments (or behavior drivers) is small (four to eight). In cases where the number of treatments (or behavior drivers) is large, 20-30 or even 100 (such as one might use in analyzing a large number of products), items that appeal to similar people may cluster together into an overall behavioral pattern. A behavior that follows a cluster pattern in most segments but differs in a few distinct segments may suggest marketing opportunities.

Note that the above-described differential treatment mechanisms are defined to work on classifications made with boolean data types and in regressions made with continuous or integer data types. In some cases, the underlying behavior measures are scaled data types that are transformed into boolean data types indicating a place in the top or bottom n-tiles of the distribution.

Another embodiment of multidimensional segmentation treats the K target behaviors as a matrix with K columns and one row per observation. The class of target predictive variables is expanded to include continuously-valued measures. A metric may include different weights for each of the K behavioral measures to align statistical impacts with the business importance of different measures.

CONCLUSION

The above-described embodiments of the present invention provide a number of advantages. First, they can allow a marketing manager to create a marketing campaign automatically without the services of trained statisticians. Second, since an aspect of the present invention exhaustively searches for relationships between consumer attributes and response rates, it is not likely to miss any important relationships. Third, the system automatically handles multiple treatments which generally must be handled manually. Fourth, the system can systematically create a test campaign allowing the marketer to measure the effectiveness of the overall campaign. Fifth, the system automatically handles many labor-intensive data preprocessing functions such as handling missing data, data transformations, and data dependencies. Also, because the system is scalable, sampling is not required. Furthermore, the present invention can be used to automatically create a marketing campaign within hours, instead of weeks as is presently required using existing systems.

The foregoing descriptions of embodiments of the invention have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the invention. The scope of the invention is defined by the appended claims.

Claims

WHAT IS CLAIMED IS:

1. A computer-implemented method for automatically optimizing a marketing campaign, comprising the steps of:

(a) accessing a training database comprising a plurality of consumer records, each consumer record including at least one attribute and a predictive variable, said predictive variable pertaining to a consumer behavior;

(b) segmenting said training database to create a model having a plurality of segments, each segment comprising at least one of said consumer records, by analyzing values of said at least one attribute;

(c) for at least one of said segments, calculating a score predicting said consumer behavior; and

(d) selecting target consumers from a prospect database using said model;

for use in iteratively optimizing a marketing campaign, targeting said consumer behavior, in an automated fashion.

2. The method of claim 1 where said segmenting step includes recursively determining the best split, including determining which attribute has the greatest predictive power for the predictive variable.

3. The method of claim 2 where determining which attribute has the greatest predictive power for the first variable includes calculating a metric for each segment, said metric belonging to the group consisting of Gini, Gini-Hat, Entropy, Explained Sum of Squares, and Absolute Linear Distance measures.

4. The method of claim 2 where said segmenting step includes penalizing a splitting based on an attribute containing missing values.

5. The method of claim 2 where said segmenting step includes filling in an attribute's missing values.

6. The method of claim 2 where said determining the best split includes using one of a plurality of marketing treatments as a segmentation criterion.

7. The method of claim 2 where said segmenting step includes weighting at least one of said plurality of consumer records in accordance with a relative importance of a value.

8. The method of claim 2 where said segmentation includes applying a factor to adjust for populations having an unbalanced distribution of the value of said predictive variable.

9. The method of claim 8 where said segmenting step includes weighting at least one of said plurality of records in accordance with the corresponding consumer's representativeness of a larger superset of consumers.

10. The method of claim 1 where:

(i) said segmenting step includes creating certain of said segments by splitting certain other of said segments; and

(ii) said splitting includes controlling a granularity thereof based on statistical significance testing.

11. The method of claim 10 where said controlling of said granularity includes preventing creation of a segment smaller than a minimum size.

12. The method of claim 1 further comprising a step of accessing a schema describing a transformation to be performed on one or more of said attributes.

13. The method of claim 1 further including a step of creating a plan for targeting selected consumers using a plurality of treatments.

14. The method of claim 13 where said plan includes assigning treatments to selected consumers based on observed responses to said treatments.

15. The method of claim 13 where said plan includes: (i) a plurality of segment-treatment pairs, and (ii) for each said pair, a probability of response of said segment to said treatment.

16. The method of claim 1 where said step of selecting target consumers comprises selecting a first set of consumers exhibiting said consumer behavior.

17. The method of claim 16 where said step of selecting target consumers further comprises selecting a second set of consumers for statistically testing a difference in response to a treatment between said second set of consumers and said first set of consumers.

18. The method of claim 16 where said step of selecting target consumers further comprises selecting consumers from model segments with low scores.

19. The method of claim 17 where said step of selecting target consumers further comprises selecting consumers from model segments with low scores.

20. The method of claim 1 where said step of selecting target consumers comprises selecting a set of consumers for statistically testing a response to a treatment.

21. The method of claim 20 where said step of selecting target consumers further comprises testing the statistical variability of responses of said consumers to variations in treatment assignment.

22. The method of claim 20 where said step of selecting target consumers further comprises selecting consumers from model segments with low scores.

23. The method of claim 1 where said step of selecting target consumers includes selecting consumers from model segments with low scores.

24. The method of claim 1 further comprising accessing a schema describing a number of target consumers to be selected.

25. The method of claim 1 where said selecting step includes matching entries from said prospect database through said model to assign a segment to said entries.

26. The method of claim 1 further comprising repeating steps (a)-(d) after updating said consumer database using responses from at least one previous marketing campaign.

27. The method of claim 26 where said repeated step (b) includes generating a plurality of new segments.

28. The method of claim 26 further comprising the step of using information from a marketing campaign to iteratively:

(a) update said segmented database for at least some of said treatments;

(b) re-assign at least one treatment to certain of said segments, based on observed past behaviors of said consumers recorded among said at least one response variable; and

(c) re-select certain of said prospective consumers as target consumers in accordance with substantially maximizing said desired behavior.

29. The method of claim 26 where said responses are weighted over time.

30. The method of claim 1 where said training database is substantially similar to said prospect database.

31. The method of claim 1 where said selecting step incorporates locational considerations of said consumers.

32. The method of claim 1 further comprising, prior to said step (a), the step of conducting an initial marketing campaign using statistical sampling from said prospect database to create values for said response variable pertaining to said behavior attribute.

33. The method of claim 1 where:

(i) each of said consumer records further includes a plurality of variables pertaining to a plurality of consumer behaviors; and

(ii) said segmenting step includes generating a multidimensional segmentation, for substantially simultaneously predicting said plurality of behaviors.

34. The method of claim 33 where said step of generating a multidimensional segmentation includes performing multidimensional predictive clustering using a tree algorithm to predict said plurality of behaviors.

35. The method of claim 1:

(i) deployed for use in predicting a plurality of consumer behaviors;

(ii) where said training database reflects a said plurality of consumer behaviors; and

(iii) said step (d) includes jointly considering a plurality of variables pertaining to said behaviors to select certain of said segments having behavioral distinctiveness.

36. The method of claim 35 where said behavioral distinctiveness is reflected by contrasting patterns of said plurality of variables across said plurality of behaviors.

37. The method of claim 1 where said step of segmenting said training database includes calculating descriptive statistics for at least one of said segments.

38. The method of claim 1 where said step of selecting target consumers includes screening said consumers based on a risk criteria.

39. The method of claim 1 further comprising a step of creating a new attribute upon which at least one of said plurality of attributes depends.

40. The method of claim 1 further comprising a step of pruning said model to remove segments having no statistical significance.

41. The method of claim 1 where said segmentation is improved using a boosting technique.

42. The method of claim 1 further including a step of accessing a schema describing the structure of said training database.

43. The method of claim 42 where said schema can describe a plurality of database structures.

44. A computer-implemented method for automatically optimizing a marketing campaign on a wide area computer network including sites, said method comprising the steps of:

(a) accessing a training database comprising a plurality of consumer records corresponding to consumers who have visited a site, each consumer record including at least one attribute and at least one predictive variable, each of said at least one predictive variables pertaining to one of a plurality of offers;

(b) for each predictive variable, calculating a score based on values of said at least one attribute; and

(c) selecting an offer to present on said site based on said score.

45. The method of claim 44 wherein said selecting step includes selecting an offer to present on said site based on said score and on a random variable.