WO2009094552A2 - Granular support vector machine with random granularity - Google Patents

Info

Publication number
WO2009094552A2
Authority
WO
WIPO (PCT)
Prior art keywords
granule
granules
hyperplane
operable
tuples
Prior art date
Application number
PCT/US2009/031853
Other languages
French (fr)
Other versions
WO2009094552A3 (en)
Inventor
Yuchun Tang
Yuanchen He
Original Assignee
Secure Computing Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Secure Computing Corporation
Publication of WO2009094552A2
Publication of WO2009094552A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking

Definitions

  • This disclosure relates generally to data mining using support vector machines.
  • Support vector machines are useful in providing input to identify trends in existing data and to classify new sets of data for analysis.
  • Generally support vector machines can be visualized by plotting data into an n-dimensional space, n being the number of attributes associated with the item to be classified.
  • support vector machines can be processor intensive.
  • Recently, analysts have developed an algorithm known as “Random Forests.”
  • Random Forests uses decision trees to classify data. Decision trees modeled on large amounts of data can be difficult to parse and hence classification accuracy is limited. Thus, “Random Forests” utilizes a bootstrap aggregating (bagging) algorithm to randomly generate multiple bootstrapping datasets from a training dataset. Then a decision tree is modeled on each bootstrapping dataset. For each decision tree modeling, at each node a small fraction of attributes are randomly selected to determine the split. Because all attributes need to be available for random selection, the whole bootstrapping dataset is needed in the memory. Moreover, “Random Forests” has difficulty working with sparse data (e.g., data which contains many zeroes). For example, a dataset, formatted as a matrix with rows as samples and columns as attributes, has to be entirely loaded into the memory even when a cell is zero.
  • Random Forests is space-consuming, and when modeling the entire data matrix, “Random Forests” is also time-consuming, given a large and sparse dataset.
  • the dataset cannot be parallelized on a distributed system such as a computer cluster, because it is time-consuming to transfer a whole bootstrapping dataset between different computer nodes.
  • methods include: receiving a training dataset comprising a plurality of tuples and a plurality of attributes for each of the tuples; deriving a plurality of granules from the training dataset, each granule comprising a plurality of sample tuples and a plurality of sample attributes; processing the granules using a support vector machine process to identify a hyperplane classifier associated with each of the granules; predicting a classification of a new tuple using each of the hyperplane classifiers to produce a plurality of predictions; and aggregating the predictions to derive a decision on a final classification of the new tuple.
  • Systems can include a granule selection module, multiple granule processing modules, one or more prediction modules and an aggregation module.
  • the granule selection module can select a plurality of granules from a training dataset. Each of the granules can include multiple tuples and attributes.
  • the granule processing modules can process granules using support vector machine processes identifying a hyperplane classifier associated with each of the granules.
  • the one or more prediction modules can predict a classification associated with an unknown tuple based upon the hyperplane classifiers to produce multiple granule predictions.
  • the aggregation module can aggregate the granule predictions to derive a decision on a final classification associated with the unknown tuple.
  • FIG. 1 is a block diagram of a network environment including an example classification system.
  • FIG. 2 is a block diagram of an example classification system.
  • FIG. 3 is a block diagram of a messaging filter using a classification system and illustrating example policies.
  • FIG. 4 is a block diagram of an example distributed classification system.
  • FIG. 5 is a block diagram of another example distributed classification system.
  • FIG. 6 is a flowchart illustrating an example method used to derive granules and classification planes.
  • FIG. 7 is a flowchart illustrating an example method used to derive classification associated with a new set of attributes for classification.
  • FIG. 8 is a flowchart illustrating an example method used to derive granules and distribute granules to processing modules.
  • Granular support vector machines with random granularity can help to provide efficient and accurate classification of many types of data.
  • granular support vector machines can be used in the context of spam classification.
  • the granules, typically much smaller than the bootstrapping datasets before random subspace projection, can be distributed across many processors, such that the granules can be processed in parallel.
  • the granules can be distributed based upon spare processing capability at distributed processing modules. The nature of the granules can facilitate distributed processing. The reduction in size of the training dataset can facilitate faster processing of each of the granules.
  • FIG. 1 is a block diagram of a network environment including an example classification system 100.
  • the classification system 100 can receive classification queries from an enterprise messaging filter 110.
  • the enterprise messaging filter 110 can protect enterprise messaging entities 120 from external messaging entities 130 attempting to communicate with the enterprise messaging entity 120 through a network 140.
  • the classification system 100 can receive a training dataset 150.
  • the training dataset 150 can be provided by an administrator, for example.
  • the classification system 100 can use granular support vector machine classification 160 to process the training dataset and derive hyperplane classifiers respectively associated with randomly selected granules, thereby producing a number of granular support vector machines (e.g., GSVM 1 170, GSVM 2 180 ... GSVM n 190).
  • the classification system 100 can derive a number of granules from the training dataset.
  • the granules can be derived, for example, using a bootstrapping process whereby a tuple (e.g., a record in the dataset) is randomly selected for inclusion in the in-bag data.
  • Additional tuples can be selected from among the entire training dataset (e.g., sampled with replacement).
  • the selection of each tuple is independent from the selection of other tuples and the same tuple can be selected more than once. For example, if a training dataset included 100 tuples and 100 bootstrapping samples are selected from among the 100 tuples, on average 63.2 of the tuples would be selected, and 36.8 of the tuples would not be selected.
  • the selected data can be identified as in-bag data, while the non-selected data can be identified as out-of-bag data.
  • the sample size can be set at 10% of the total number of tuples in the training dataset. Thus, if there were 100 tuples, the classification system 100 can select 10 samples with replacement.
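  • The bootstrapping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `bootstrap_sample` and `expected_unique` are hypothetical, and tuples are represented only by their row indices. Note that the 63.2 figure corresponds to the large-sample limit 1 − 1/e; the finite-size formula in `expected_unique` gives roughly 63.4 for exactly 100 draws from 100 tuples.

```python
import random


def bootstrap_sample(num_tuples, sample_fraction, rng):
    """Draw row indices with replacement; return (in_bag, out_of_bag) index lists."""
    num_draws = max(1, round(num_tuples * sample_fraction))
    draws = [rng.randrange(num_tuples) for _ in range(num_draws)]
    selected = set(draws)                      # duplicates collapse to unique tuples
    in_bag = sorted(selected)
    out_of_bag = [i for i in range(num_tuples) if i not in selected]
    return in_bag, out_of_bag


def expected_unique(num_tuples, num_draws):
    """Expected number of distinct tuples after num_draws draws with replacement."""
    return num_tuples * (1.0 - (1.0 - 1.0 / num_tuples) ** num_draws)


rng = random.Random(0)                         # fixed seed, only for repeatability
in_bag, out_of_bag = bootstrap_sample(100, 0.10, rng)
print(len(in_bag), "in-bag tuples,", len(out_of_bag), "out-of-bag tuples")
print("expected unique tuples, 100 draws:", round(expected_unique(100, 100), 1))
print("expected unique tuples, 10 draws:", round(expected_unique(100, 10), 1))
```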
  • the classification system 100 can then project the data into a random subspace.
  • the random subspace projection can be a random selection of tuple attributes (e.g., features).
  • the random selection of tuple attributes can be performed without replacement (e.g., no duplicates can be selected).
  • the random selection of tuple attributes can be performed with replacement (e.g., duplicate samples are possible, but discarded).
  • the random selection of tuples with a projection of the tuple attributes into a random subspace generates a granule.
  • the granule can be visualized as a matrix having a number of rows of records (e.g., equal to the number of unique tuples selected from the training dataset) with a number of columns defining attributes associated with the granule.
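  • As a rough sketch of how a granule could be derived, the code below combines tuple bootstrapping with a random selection of attributes without replacement. The name `derive_granule`, the dictionary layout, and the parameter names are assumptions made for illustration; the source does not prescribe a data structure.

```python
import random


def derive_granule(rows, labels, sample_fraction, num_attributes, rng):
    """Bootstrap tuples, then keep a random subset of attribute columns."""
    num_tuples = len(rows)
    num_draws = max(1, round(num_tuples * sample_fraction))
    # Sampling with replacement; the set keeps each selected tuple once.
    tuple_ids = sorted({rng.randrange(num_tuples) for _ in range(num_draws)})
    # Random subspace: attribute indices chosen without replacement.
    columns = sorted(rng.sample(range(len(rows[0])), num_attributes))
    matrix = [[rows[i][j] for j in columns] for i in tuple_ids]
    return {
        "tuple_ids": tuple_ids,                  # in-bag rows of the training dataset
        "columns": columns,                      # attributes kept in the subspace
        "matrix": matrix,                        # the granule itself
        "labels": [labels[i] for i in tuple_ids],
    }


rows = [[i, i + 1, i + 2, i + 3] for i in range(20)]
labels = [1 if i % 2 else -1 for i in range(20)]
granule = derive_granule(rows, labels, 0.5, 2, random.Random(1))
print(len(granule["matrix"]), "rows x", len(granule["columns"]), "columns")
```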
  • the classification system 100 can then execute a support vector machine process operable to receive the data and to plot the data into an n-dimensional space (e.g., n being the number of attributes selected during the random subspace projection).
  • the support vector machine process can identify a hyperplane classifier (e.g., a linear classifier) to find the plane which best separates the data into two or more classifications.
  • adjustments to the support vector machine process can be made to avoid overfitting the hyperplane classifier to the datapoints.
  • the hyperplane classifier which achieves maximum separation can be identified and selected by the support vector machine process.
  • the support vector machine can warp the random subspace to provide better fit of the hyperplane classifier to the datapoints included in the granule.
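  • To make the hyperplane-fitting step concrete, here is a small linear sketch. It swaps in a Pegasos-style stochastic subgradient method on the regularized hinge loss, which is one common way to approximate a maximum-margin linear classifier; the source does not say which solver is used, and kernel-based warping of the subspace is not shown. The names `train_linear_svm` and `decision_value`, the regularization value, and the toy data are all assumptions.

```python
import random


def train_linear_svm(points, labels, reg=0.01, steps=2000, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; labels are +1 or -1."""
    rng = random.Random(seed)
    augmented = [list(p) + [1.0] for p in points]   # constant feature acts as a bias
    weights = [0.0] * len(augmented[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(augmented))
        x, y = augmented[i], labels[i]
        margin = y * sum(w * v for w, v in zip(weights, x))
        rate = 1.0 / (reg * t)
        # Shrink step from the regularization term.
        weights = [(1.0 - rate * reg) * w for w in weights]
        if margin < 1.0:
            # The sample violates the margin, so move the hyperplane away from it.
            weights = [w + rate * y * v for w, v in zip(weights, x)]
    return weights[:-1], weights[-1]


def decision_value(weights, bias, point):
    """Signed score; its sign gives the predicted side of the hyperplane."""
    return sum(w * v for w, v in zip(weights, point)) + bias


points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(points, labels)
print([1 if decision_value(weights, bias, p) >= 0 else -1 for p in points])
```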
  • the hyperplane classifiers can then be used to analyze new data.
  • a new tuple e.g., set of attributes
  • the classification system can receive an unparsed document and can parse the document to extract the attributes used for classification by the various granules.
  • the hyperplane classifiers can be stored locally to the classification system and can be used to derive a number of predictions for the classification of the new tuple.
  • the hyperplane classifiers are stored by the respective processing modules that processed the granule and the new tuple can be distributed to each of the respective processing modules. The processing modules can then each respond with a predicted granule classification, resulting in a number of granule predictions equal to the number of derived granules.
  • the predicted classifications can be aggregated to derive a final classification prediction associated with the new tuple.
  • the predicted classifications can be aggregated by majority voting. For example, each prediction can be counted as a "vote." The "votes" can then be tallied and compared to determine which classification received the most "votes." This classification can be adopted by the classification system as the final classification prediction.
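  • Majority voting over granule predictions can be sketched in a few lines. The function name `majority_vote` and the example labels are illustrative assumptions; in this sketch a tie is resolved in favor of the label encountered first, which the source does not specify.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of granule classifiers."""
    tally = Counter(predictions)               # one vote per granule prediction
    return tally.most_common(1)[0][0]


votes = ["spam", "non-spam", "spam", "spam", "non-spam"]
print(majority_vote(votes))
```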
  • the granule predictions can include a distance metric describing the distance of a datapoint associated with the new tuple from the hyperplane classifier.
  • the distance can be used to weight the aggregation of the predicted classifications. For example, if the classification system were determining whether a set of data indicates a man versus a woman, and one hyperplane classifier predicts that the datapoint is associated with a woman while another hyperplane classifier predicts that the same datapoint is associated with a man, the distance of each from the hyperplane classifier can be used to determine which classifier to use as the final classification prediction.
  • each of the hyperplane classifiers can have an effectiveness metric associated with the classifier.
  • the effectiveness metric can be derived by validating the hyperplane classifier against the out-of-bag data not chosen for inclusion in the granule associated with the hyperplane classifier.
  • if the hyperplane classifier, for example, is measured to be 90% effective on the out-of-bag data, the prediction can be weighted at 90%.
  • the prediction can be weighted at 70%.
  • the hyperplane classifier can be discarded. For example, if a hyperplane classifier is less than 50% effective on the out-of-bag data, it is more likely than not that the classification is incorrect (at least as far as the out-of-bag data is concerned). In such instances, the hyperplane classifier could be based on datapoints which are outliers that do not accurately represent the sample.
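  • One way to picture the out-of-bag validation is sketched below: each classifier is scored on the tuples left out of its granule, and classifiers scoring under 50% are dropped. The names `out_of_bag_effectiveness` and `keep_effective`, the callable-based classifier interface, and the sample data are hypothetical.

```python
def out_of_bag_effectiveness(classify, oob_rows, oob_labels):
    """Fraction of out-of-bag tuples that the classifier labels correctly."""
    if not oob_rows:
        raise ValueError("no out-of-bag tuples available for validation")
    correct = sum(1 for row, label in zip(oob_rows, oob_labels)
                  if classify(row) == label)
    return correct / len(oob_rows)


def keep_effective(scored_classifiers, minimum=0.5):
    """Drop classifiers whose out-of-bag effectiveness is below the minimum."""
    return [(name, score) for name, score in scored_classifiers if score >= minimum]


def classify(row):
    # Toy stand-in for a hyperplane classifier on a one-attribute granule.
    return "spam" if row[0] > 0 else "non-spam"


oob_rows = [[1.0], [2.0], [-1.0], [3.0]]
oob_labels = ["spam", "spam", "non-spam", "non-spam"]
score = out_of_bag_effectiveness(classify, oob_rows, oob_labels)
print(score)
print(keep_effective([("granule-1", score), ("granule-2", 0.4)]))
```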
  • the classification system can request a new training dataset, or possibly different and/or additional attributes associated with the current training dataset. In other implementations, the classification system can continue to run the support vector machine processing until a threshold number of hyperplane classifiers are identified.
  • FIG. 2 is a block diagram of an example classification system 100.
  • the classification system 100 can include a granule selection module 210, a processing module 220, a prediction module 230, and an aggregation module 240.
  • the granule selection module 210 can receive a training dataset 250 and can randomly select
  • the random selection of the tuples can be based upon a bootstrapping process, whereby a selection of a tuple is made, the tuple is replaced and then another tuple is selected. In various examples, this process can continue until a threshold number of selections are made. As a specific example, 10% bootstrapping on a 100-tuple training dataset can mean that 10 selections are made (including duplicates). Thus, there are expected to be fewer than 10 unique sample tuples in the in-bag data, on average.
  • the in-bag tuples are then projected onto a random subspace.
  • the in-bag tuples can be visualized as datapoints plotted onto an n-dimensional space, where n equals the number of attributes associated with each tuple. If a dimension is removed, the datapoints can be said to be projected into the subspace comprising the remaining attributes.
  • the random subspace can be selected by randomly selecting attributes to remove from the subspace or randomly selecting the attributes that are included in the subspace.
  • the random subspace is chosen by randomly selecting the attributes for inclusion in the granule without replacement (e.g., no duplicates can be selected, because once an attribute is selected, it is removed from the sample).
  • an original matrix associated with the training dataset can be reduced into a granule.
  • Granules can continue to be selected until a threshold number of granules have been selected. The random selection of the granules and a smaller sample size can facilitate diversity among the granules. For example, one granule is unlikely to be similar to any of the other granules.
  • the processing module 220 can be operable to process the granules using a support vector machine process.
  • the processing module 220 can use the support vector machine process to plot the tuples associated with a respective granule into an n-dimensional space, where n equals the number of attributes associated with the granule.
  • the processing module 220 can identify a hyperplane classifier (e.g., linear classifier) which best separates the data based upon the selected category for classification. In some instances, multiple hyperplane classifiers might provide differentiation between the data. In some implementations, the processing module 220 can select the hyperplane classifier that provides maximum separation between the datapoints (e.g., a maximum margin classifier).
  • the granular nature of the process can facilitate distributed processing of the granules. For example, if there are 10 granules to process on five processors, each of the processors could be assigned to handle two granules.
  • Some implementations can include a distribution module operable to distribute the granules among potentially multiple processing modules 220 (e.g., processors running support vector machine processes on the granules).
  • the distribution module can, for example, determine the available (e.g., spare) processing capacity and/or specialty processing available on each of a number of processors and assign the granules to the processors accordingly. Other factors for determining distribution of the granules can be used.
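  • A capacity-aware assignment of granules to processors might look like the sketch below. It uses a greedy rule (always give the next granule to the processor with the most remaining spare capacity), which is an assumption; the source leaves the distribution policy open. The name `distribute_granules` and the notion that one granule consumes one capacity unit are hypothetical.

```python
import heapq


def distribute_granules(granule_ids, spare_capacity):
    """Greedily assign each granule to the processor with the most spare capacity."""
    # heapq is a min-heap, so capacities are negated to pop the largest first.
    heap = [(-capacity, name) for name, capacity in spare_capacity.items()]
    heapq.heapify(heap)
    assignment = {name: [] for name in spare_capacity}
    for granule_id in granule_ids:
        negative_capacity, name = heapq.heappop(heap)
        assignment[name].append(granule_id)
        # Assume each granule uses one unit of the processor's capacity.
        heapq.heappush(heap, (negative_capacity + 1, name))
    return assignment


processors = {"p0": 4, "p1": 4, "p2": 4, "p3": 4, "p4": 4}
plan = distribute_granules(list(range(10)), processors)
print(plan)
```

  • With equal spare capacity on five processors, ten granules end up two per processor, matching the example above; unequal capacities shift more granules toward the less loaded processors.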
  • the prediction module 230 can receive features 260 (e.g., from an unclassified tuple) for classification.
  • the features 260 can be received from a messaging filter 280.
  • the features 260 can be derived from a received message 270 by a messaging filter 280.
  • the messaging filter 280 can extract the features 260.
  • the messaging filter 280 can be a part of the classification system 100.
  • the messaging filter 280 can query the classification system 100 by sending the attributes associated with the tuple to be classified to the classification system 100.
  • the prediction module 230 can compare datapoints associated with the features against each of the hyperplane classifiers derived from the granules to derive granule predictions associated with the respective hyperplane classifiers. For example, the prediction module 230 could plot the unclassified new tuple onto a random subspace associated with a first granule and associated hyperplane classifier and determine whether the unclassified new tuple shows characteristics associated with a first classification (e.g., men) or characteristics associated with a second classification (e.g., women). The prediction module 230 could continue this process until each of the hyperplane classifiers has been compared to a datapoint associated with the unclassified new tuple.
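  • The per-granule prediction step can be sketched as: project the new tuple onto the granule's attribute subset, evaluate the linear hyperplane, and report both the predicted class and the distance from the hyperplane. The model layout (`columns`, `weights`, `bias`, `positive`, `negative`) and the function name `predict_with_granule` are assumptions for illustration, and only a linear hyperplane is covered.

```python
import math


def predict_with_granule(model, new_tuple):
    """Project a tuple onto the granule's subspace and classify it."""
    projected = [new_tuple[j] for j in model["columns"]]
    score = sum(w * v for w, v in zip(model["weights"], projected)) + model["bias"]
    norm = math.sqrt(sum(w * w for w in model["weights"]))
    # Geometric distance from the hyperplane; zero if the weights are degenerate.
    distance = abs(score) / norm if norm > 0 else 0.0
    label = model["positive"] if score >= 0 else model["negative"]
    return label, distance


model = {
    "columns": [0, 2],          # attributes retained by this granule
    "weights": [3.0, 4.0],
    "bias": 0.0,
    "positive": "spam",
    "negative": "non-spam",
}
label, distance = predict_with_granule(model, [1.0, 99.0, 1.0])
print(label, distance)
```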
  • the prediction module 230 can include distributed processing elements (e.g., processors). In such implementations, the prediction module 230 can distribute classification jobs to processors, for example with available processing capability. In other implementations, the prediction module 230 can distribute classification jobs based upon which processors previously derived the hyperplane classifier associated with a granule. In such implementations, for example, a processor used to derive a first hyperplane classifier for a first granule can also be used to plot an unclassified new tuple into the random subspace associated with the first granule and can compare the datapoint associated with the new tuple to the first hyperplane classifier associated with the first granule.
  • the granule predictions can be communicated to the aggregation module 240.
  • the aggregation module 240 can use a simple voting process to aggregate the granule predictions. For example, each prediction can be tallied as a "vote" for the classification predicted by the granule prediction. The classification that compiles the most votes can be identified as the final classification decision.
  • each granule prediction can include a distance metric identifying the distance of datapoints associated with the unclassified new tuple from the respective hyperplane classifiers. The distance metric can be used to weight the respective granule predictions.
  • For example, if there are three predictions, one for classification A located a distance of 10 units from the hyperplane classifier, and two for classification B located distances of 2 and 5 units from their respective hyperplane classifiers, then classification A is weighted at 10 units and classification B is weighted at 7 units. Thus, in this example, classification A can be selected as the final classification prediction.
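  • The distance-weighted example above (10 units for A against 2 + 5 = 7 units for B) can be reproduced with a short sketch. The function name `distance_weighted_vote` and the (label, distance) pair format are assumptions.

```python
from collections import defaultdict


def distance_weighted_vote(granule_predictions):
    """Sum hyperplane distances per label and return the heaviest label."""
    totals = defaultdict(float)
    for label, distance in granule_predictions:
        totals[label] += distance
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


predictions = [("A", 10.0), ("B", 2.0), ("B", 5.0)]
winner, totals = distance_weighted_vote(predictions)
print(winner, totals)
```

  • Note that a simple vote count on the same three predictions would favor B (two votes to one), so the choice of aggregation rule can change the final classification.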
  • each of the predictions can be weighted by a Bayesian confidence level associated with the respective hyperplane classifiers.
  • the Bayesian confidence level can be based upon a validation performed on the hyperplane classifier using the out-of-bag data associated with each respective hyperplane classifier. For example, if a first hyperplane classifier is measured to be 85% effective at classifying the out-of-bag data, the predictions associated with the hyperplane classifier can be weighted by the effectiveness metric. The weighted predictions can be summed and compared to each other to determine the final classification prediction.
  • FIG. 3 is a block diagram of a messaging filter 300 using a classification system 310 and illustrating example policies 320-360.
  • the policies can include an information security policy 320, a virus policy 330, a spam policy 340, a phishing policy 350, a spyware policy 360, or combinations thereof.
  • the messaging filter 300 can filter communications received from messaging entities 380 destined for other messaging entities 380.
  • the messaging filter 300 can query a classification system 310 to identify a classification associated with a message.
  • the classification system 310 can use a granular support vector machine process to identify hyperplane classifiers associated with a number of granules derived from a training dataset 390.
  • the training dataset can include, for example, documents that have previously been classified.
  • the documents can be a library of spam messages identified by users and/or provided by third parties. In other examples, the documents can be a library of viruses identified by administrators, users, and/or other systems or devices.
  • the hyperplane classifiers can then be compared to the attributes of new messages to determine to which classification the new message belongs.
  • incoming and/or outgoing messages can be classified and compared to the information security policy to determine whether to forward the message for delivery.
  • the classification system might determine that the document is a technical specification document.
  • the information security policy for example, might specify that technical specification documents should not be forwarded outside of an enterprise network, or only sent to specific individuals.
  • the information security policy could specify that technical documents require encryption of a specified type so as to ensure the security of the technical documents being transmitted.
  • Other information security policies can be used.
  • the virus policy can specify a risk level associated with communications that is acceptable. For example, the virus policy can indicate a low tolerance for viruses.
  • the messaging filter can block communications that are determined to be even a low risk for including viruses.
  • the virus policy can indicate a high tolerance for virus activity.
  • the messaging filter might only block those messages which are strongly correlated with virus activity.
  • a confidence metric can be associated with the classification. If the confidence metric exceeds a threshold level set by the virus policy, the message can be blocked.
  • Other virus policies can be used.
  • the spam policy can specify a risk level associated with communications that is acceptable to the enterprise network. For example, a system administrator can specify a high tolerance for spam messages. In such an example, the messaging filter 300 can filter only messages that are highly correlated with spam activity.
  • the phishing policy can specify a risk level associated with communications that is acceptable to the enterprise network. For example, a system administrator can specify a low tolerance for phishing activity. In such an example, the messaging filter 300 can filter even communications which show a slight correlation to phishing activity.
  • the spyware policy can specify a network tolerance for communications that might include spyware. For example, an administrator can set a low tolerance for spyware activity on the network.
  • the messaging filter 300 can filter communications that show even a slight correlation to spyware activity.
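  • The tolerance-based policies can be pictured as per-category confidence thresholds: a low tolerance corresponds to a low threshold, so even weakly correlated messages are filtered. The threshold values, the category names as dictionary keys, and the function `should_block` are purely hypothetical and chosen for illustration.

```python
# Hypothetical thresholds: lower value = lower tolerance = more aggressive filtering.
POLICY_THRESHOLDS = {
    "virus": 0.2,
    "spam": 0.9,      # high tolerance: only strongly spam-like messages are filtered
    "phishing": 0.2,
    "spyware": 0.2,
}


def should_block(category, confidence, thresholds=POLICY_THRESHOLDS):
    """Block when the classification confidence exceeds the policy threshold."""
    return confidence > thresholds[category]


print(should_block("spam", 0.6))       # tolerated under the high-tolerance spam policy
print(should_block("phishing", 0.6))   # blocked under the low-tolerance phishing policy
```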
  • FIG. 4 is a block diagram of an example classification system 100 using distributed processing modules 400a-e.
  • the classification system 100 can include a granule selection module 410, a distribution module 420, a prediction module 430 and an aggregation module 440.
  • the classification system can operate to receive a training dataset 450, to derive a number of hyperplane classifiers from the training dataset, and then to predict the classification of incoming unclassified messages 460.
  • the granule selection module can receive the training dataset 450.
  • the training dataset 450 can be provided, for example, by a system administrator or a third party device.
  • the training dataset 450 can include a plurality of records (e.g., tuples) which have previously been classified.
  • the training dataset 450 can include a corpus of documents that have not been parsed.
  • the granule selection module 410 in such implementations, can include a parser operable to extract attributes from the document corpus.
  • the granule selection module 410 can randomly select granules by using a bootstrapping process on the tuples, and then projecting the tuples into a random subspace.
  • the distribution module 420 can operate to distribute the granules to a plurality of processing modules 400a-e for processing. In some implementations, the distribution module 420 can distribute the granules to processing modules 400a-e having the highest available processing capacity. In other implementations, the distribution module 420 can distribute the granules to processing modules 400a-e based upon the type of content being classified. In still further implementations, the distribution module 420 can distribute the granules to processing modules 400a-e based upon other characteristics of the processing modules 400a-e (e.g., availability of special purpose processing power (e.g., digital signal processing, etc.)).
  • the distributed processing modules 400a-e can return a hyperplane classifier to the distribution module 420.
  • the hyperplane classifiers can be provided to the prediction module 430.
  • the prediction module 430 can also receive unclassified messages 460 and can use the hyperplane classifiers to provide granule classification predictions associated with each of the hyperplane classifiers.
  • the granule classification predictions can be provided to an aggregation module 440.
  • the aggregation module 440 can operate to aggregate the granule classification predictions.
  • the aggregation module 440 can aggregate the granule classification predictions to derive a final classification prediction based upon a simple voting process.
  • the aggregation module 440 can use a distance metric associated with each of the granule classification predictions to weight the respective granule predictions.
  • the aggregation module 440 can use a Bayesian confidence score to weight each of the granule classification predictions.
  • the Bayesian confidence score can be derived, for example, by validating each respective hyperplane classifier associated with a granule against out-of-bag data not selected for inclusion in the granule.
  • the resulting final classification prediction can be provided as output of the classification system 100.
  • FIG. 5 is a block diagram of another example classification system 100 having distributed processing and prediction modules 500a-e.
  • the classification system 100 can include a granule selection module 510, a distribution module
  • the classification system 100 can operate to distribute the processing associated with both the granule processing to derive the hyperplane classifiers associated with the granules and the prediction processing to provide granule predictions based upon the derived hyperplane classifiers.
  • the granule selection module 510 can receive the training dataset 540.
  • the training dataset 540 can be provided, for example, by a system administrator or a third party device.
  • the training dataset 540 can include a plurality of records (e.g., tuples) which have previously been classified.
  • the training dataset 540 can include a corpus of documents that have not been parsed.
  • the granule selection module 510 can include a parser operable to extract attributes from the document corpus.
  • the granule selection module 510 can randomly select granules by using a bootstrapping process on the tuples, and then projecting the tuples into a random subspace.
  • the distribution module 520 can receive the granules from the granule selection module 510.
  • the distribution module 520 can distribute the granules to one or more distributed processing and prediction modules 500a-e.
  • Each of the distributed processing and prediction modules 500a-e can operate to execute a support vector machine process on the received granule(s).
  • the support vector machine process can operate to derive a hyperplane classifier(s) associated with the granule(s).
  • Each of the distributed processing and prediction modules 500a-e can then use the derived hyperplane classifier(s) to generate a granule classification prediction (or predictions) associated with an unclassified message 550.
  • the granule classification predictions can be provided to an aggregation module 530.
  • the aggregation module 530 can operate to aggregate the granule classification predictions.
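  • The combined processing-and-prediction arrangement can be sketched with a thread pool standing in for the distributed modules. Everything here is an assumption made for illustration: the thread pool replaces separate machines, `train_stand_in` replaces the support vector machine step with a simple hyperplane midway between the two class centroids, the granule dictionaries are hand-built, and a simple vote is used for aggregation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def train_stand_in(granule):
    """Stand-in for the SVM step: hyperplane midway between the class centroids."""
    positive = [r for r, y in zip(granule["matrix"], granule["labels"]) if y == 1]
    negative = [r for r, y in zip(granule["matrix"], granule["labels"]) if y == -1]
    mean_pos, mean_neg = centroid(positive), centroid(negative)
    weights = [a - b for a, b in zip(mean_pos, mean_neg)]
    bias = -sum(w * (a + b) / 2 for w, a, b in zip(weights, mean_pos, mean_neg))
    return weights, bias


def process_and_predict(granule, message):
    """One module: derive a hyperplane for its granule, then classify the message."""
    weights, bias = train_stand_in(granule)
    projected = [message[j] for j in granule["columns"]]
    score = sum(w * v for w, v in zip(weights, projected)) + bias
    return 1 if score >= 0 else -1


def classify_distributed(granules, message, max_workers=5):
    """Run every granule in parallel and aggregate the predictions by simple vote."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        predictions = list(pool.map(lambda g: process_and_predict(g, message), granules))
    return Counter(predictions).most_common(1)[0][0], predictions


granules = [
    {"columns": [0, 1], "matrix": [[2, 2], [-2, -2]], "labels": [1, -1]},
    {"columns": [1, 2], "matrix": [[3, 1], [-3, -1]], "labels": [1, -1]},
    {"columns": [0, 2], "matrix": [[1, 2], [-1, -2]], "labels": [1, -1]},
]
final, per_granule = classify_distributed(granules, [1.0, 2.0, 0.5])
print(final, per_granule)
```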
  • the aggregation module 530 can aggregate the granule classification predictions to derive a final classification prediction based upon a simple voting process. In other implementations, the aggregation module 530 can use a distance metric associated with each of the granule classification predictions to weight the respective granule predictions. In still further implementations, the aggregation module 530 can use a
  • FIG. 6 is a flowchart illustrating an example method used to derive granules and classification planes.
  • a training dataset is received.
  • the training dataset can be received, for example, by a granule selection module (e.g., classification system 100 of FIG. 2).
  • the training dataset, in various examples, can include parsed or unparsed data describing attributes of an item for classification.
  • the item can include documents, deoxyribonucleic acid (DNA) sequences, chemicals, or any other item that has definite and/or quantifiable attributes that can be compiled and analyzed.
  • the training dataset can include a document corpus operable to be parsed to identify attributes of each document in the document corpus.
  • a plurality of granules are derived.
  • the plurality of granules can be derived, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2).
  • the granule selection module can use a bootstrapping process to identify a random sampling of a received training dataset.
  • the granule selection module can then project the random sampling into a random subspace, thereby producing a granule.
  • the granule is much smaller than the original dataset, which can facilitate more efficient processing of the granule than can be achieved using the entire training dataset.
  • the granule further supports distributed processing, thereby facilitating the parallel processing of the derived granules.
  • the granules are processed using a support vector machine process.
  • the granules can be processed, for example, by a processing module (e.g., processing module 220 of FIG. 2).
  • the support vector machine process can operate to derive a hyperplane classifier associated with each granule.
  • the hyperplane classifiers can be used to provide demarcations between given classifications of the data (e.g., spam or non-spam, virus or non-virus, spyware or non-spyware, etc.).
  • FIG. 7 is a flowchart illustrating an example method used to derive classification associated with a new set of attributes for classification. At stage 710 a new tuple and associated attributes can be received.
  • the new tuple and associated attributes can be received, for example, by a prediction module (e.g., classification system 100 of FIG. 2).
  • a prediction can be generated based upon each hyperplane classifier.
  • the prediction can be generated, for example, by a prediction module (e.g., prediction module 230 of FIG. 2).
  • the prediction module can use the derived hyperplane classifiers to generate a granule classification prediction associated with each hyperplane classifier.
  • the granule classification predictions from each of the hyperplane classifiers can be aggregated.
  • the predictions can be aggregated, for example, by an aggregation module (e.g., aggregation module 240 of FIG. 2).
  • the granule classification predictions can be aggregated using a simple voting process, a distance between the datapoint and the hyperplane classifiers can be used to factor the final classification, or a Bayesian confidence can be used to weight the predictions based upon the confidence associated with the respective hyperplane classifiers.
  • FIG. 8 is a flowchart illustrating an example method used to derive granules and distribute granules to processing modules. The method is initialized at stage 800.
  • a training dataset is received.
  • the training dataset can be received, for example, by a granule selection module (e.g., classification system 100 of FIG. 2).
  • the training dataset, in various examples, can include parsed or unparsed data describing attributes of an item for classification.
  • the item can include documents, deoxyribonucleic acid
  • a counter can be initialized.
  • the counter can be initialized, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2).
  • the counter can be used to identify when enough granules have been generated based on the training dataset. For example, the number of granules for a given dataset can be a percentage (e.g., 50%) of the number of tuples in the training dataset.
  • a bootstrap aggregating process is used to randomly select tuples from among the training dataset.
  • the bootstrap aggregating process can be performed, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2).
  • the bootstrap aggregating process randomly selects a tuple from the training dataset, and replaces the tuple before selecting another tuple, until a predefined number of selections have been made. In such implementations, duplicates can be selected.
  • the predefined number of selections can be based upon a percentage (e.g., 10%) of the size of the training dataset.
  • the random sample of tuples is projected into a random subspace.
  • the projection into a random subspace can be performed, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2).
  • the random subspace can be selected, in some implementations, by randomly selecting the features to be used within the granule, without replacement. For example, when a first feature is selected, the feature is not replaced into the group, but removed so as not to be selected a second time. Such random selection guarantees that each granule will include a predefined number of features.
  • the generated granule is labeled as the nth granule, where n is the current counter value.
  • the granule can be labeled, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2).
  • the counter can be incremented, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2).
  • the counter can be compared to a threshold to determine whether a predefined number of granules have been generated. If a predefined number of granules have not been generated, the process returns to stage 815 and generates additional granules until the specified number of granules have been generated. However, if the counter has reached the threshold at stage 835, the process can continue to stage 840 where the granules can be distributed.
  • the granules can be distributed, for example, by a distribution module (e.g., distribution module 420, 520 of FIGS. 4 and 5, respectively). In various implementations, the granules can be distributed based upon the characteristics of a plurality of processing modules or the characteristics of the granules themselves.
  • the granules can be processed.
  • the granules can be processed, for example, by a distributed processing module (e.g., distributed processing modules 400a-e, 500a-e of FIGS. 4 and 5, respectively).
  • the distributed processing modules can be executed by multiple processors.
  • the distributed processing modules can execute a support vector machine process on each generated granule to derive a hyperplane classifier associated with each generated granule.
  • the hyperplane classifier can be compared to unclassified data to derive a classification prediction associated with the unclassified data.
  • the hyperplane classifiers can be validated.
  • the hyperplane classifiers can be validated, for example, by distributed processing modules (e.g., distributed processing modules 400a-e, 500a-e of FIGS. 4 and 5, respectively).
  • each hyperplane classifier can be validated using respective out-of-bag data associated with the granule used to generate the hyperplane classifier.
  • each hyperplane classifier can be tested to determine the effectiveness of the derived hyperplane classifier.
  • the determination of which hyperplane classifiers to use can be performed, for example, by a distributed processing module (e.g., distributed processing modules 400a-e, 500a-e of FIGS. 4 and 5, respectively).
  • a threshold effectiveness level can be identified, whereby a hyperplane classifier whose validation does not meet the threshold is not used for predicting classifications for unclassified datasets. For example, if a hyperplane classifier is validated as being correct less than 50% of the time, the classification associated with the hyperplane classifier is incorrect more often than it is correct. In some implementations, such a hyperplane classifier can be discarded as misleading with respect to the final classification prediction.
  • the method ends at stage 860.
  • the method can be used to efficiently derive a plurality of hyperplane classifiers associated with a training dataset by distributing the granules for parallel and/or independent processing. Moreover, inaccurate hyperplane classifiers can be discarded in some implementations.
  • message filters can forward, drop, quarantine, delay delivery, or specify messages for more detailed testing.
  • the messages can be delayed to facilitate collection of additional information related to the message.
  • the systems and methods disclosed herein may use data signals conveyed using networks (e.g., local area network, wide area network, internet, etc.), fiber optic medium, carrier waves, wireless networks (e.g., wireless local area networks, wireless metropolitan area networks, cellular networks, etc.), etc. for communication with one or more data processing devices (e.g., mobile devices).
  • the data signals can carry any or all of the data disclosed herein that is provided to or from a device.
  • the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by one or more processors.
  • the software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform methods described herein.
  • the systems and methods may be provided on many different types of computer-readable media including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, computer's hard drive, etc.) that contain instructions for use in execution by a processor to perform the methods' operations and implement the systems described herein.
  • Ranges may be expressed herein as from "about" one particular value, and/or to "about" another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent "about," it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.

Abstract

Methods and systems for granular support vector machines. Granular support vector machines can randomly select samples of datapoints and project the samples of datapoints into randomly selected subspaces to derive granules. A support vector machine can then be used to identify hyperplane classifiers respectively associated with the granules. The hyperplane classifiers can be used on an unknown datapoint to provide a plurality of predictions which can be aggregated to provide a final prediction associated with the datapoint.

Description

GRANULAR SUPPORT VECTOR MACHINE WITH RANDOM GRANULARITY
BACKGROUND AND FIELD
This disclosure relates generally to data mining using support vector machines. Support vector machines are useful in providing input to identify trends in existing data and to classify new sets of data for analysis. Generally support vector machines can be visualized by plotting data into an n-dimensional space, n being the number of attributes associated with the item to be classified. However, given large numbers of attributes and a large volume of training data, support vector machines can be processor intensive. Recently analysts have developed an algorithm known as "Random Forests."
"Random Forests" uses decision trees to classify data. Decision trees modeled on large amounts of data can be difficult to parse and hence classification accuracy is limited. Thus, "Random Forests" utilizes a bootstrap aggregating (bagging) algorithm to randomly generate multiple bootstrapping datasets from a training dataset. Then a decision tree is modeled on each bootstrapping dataset. For each decision tree modeling, at each node a small fraction of attributes are randomly selected to determine the split. Because all attributes need to be available for random selection, the whole bootstrapping dataset is needed in the memory. Moreover, "Random Forests" has difficulty working with sparse data (e.g., data which contains many zeroes). For example, a dataset, formatted as a matrix with rows as samples and columns as attributes, has to be entirely loaded into the memory even when a cell is zero.
Thus, "Random Forests" is space-consuming, and when modeling the entire data matrix, "Random Forests" is also time-consuming, given a large and sparse dataset. The dataset cannot be parallelized on a distributed system such as a computer cluster, because it is time-consuming to transfer a whole bootstrapping dataset between different computer nodes.
SUMMARY
Systems, methods, apparatuses and computer program products for granular support vector machines are provided. In one aspect, methods are disclosed, which include: receiving a training dataset comprising a plurality of tuples and a plurality of attributes for each of the tuples; deriving a plurality of granules from the training dataset, each granule comprising a plurality of sample tuples and a plurality of sample attributes; processing the granules using a support vector machine process to identify a hyperplane classifier associated with each of the granules; predicting a classification of a new tuple using each of the hyperplane classifiers to produce a plurality of predictions; and aggregating the predictions to derive a decision on a final classification of the new tuple.
Systems can include a granule selection module, multiple granule processing modules, one or more prediction modules and an aggregation module. The granule selection module can select a plurality of granules from a training dataset. Each of the granules can include multiple tuples and attributes. The granule processing modules can process granules using support vector machine processes identifying a hyperplane classifier associated with each of the granules. The one or more prediction modules can predict a classification associated with an unknown tuple based upon the hyperplane classifiers to produce multiple granule predictions. The aggregation module can aggregate the granule predictions to derive a decision on a final classification associated with the unknown tuple.
The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram of a network environment including an example classification system.
FIG. 2 is a block diagram of an example classification system.
FIG. 3 is a block diagram of a messaging filter using a classification system and illustrating example policies.
FIG. 4 is a block diagram of an example distributed classification system.
FIG. 5 is a block diagram of another example distributed classification system.
FIG. 6 is a flowchart illustrating an example method used to derive granules and classification planes.
FIG. 7 is a flowchart illustrating an example method used to derive classification associated with a new set of attributes for classification.
FIG. 8 is a flowchart illustrating an example method used to derive granules and distribute granules to processing modules.
DETAILED DESCRIPTION
Granular support vector machines with random granularity can help to provide efficient and accurate classification of many types of data. For example, granular support vector machines can be used in the context of spam classification. Moreover, in some implementations, the granules, typically much smaller than the bootstrapping datasets before random subspace projection, can be distributed across many processors, such that the granules can be processed in parallel. In other implementations, the granules can be distributed based upon spare processing capability at distributed processing modules. The nature of the granules can facilitate distributed processing. The reduction in size of the training dataset can facilitate faster processing of each of the granules. In comparison to "Random Forests", this granular support vector machine with random granularity works well on large and sparsely populated datasets (e.g., data which contains a lot of zeroes or null sets), because all zeros or null sets are not needed in the memory. In some implementations, the classification system can be used to classify spam. In other implementations, the classification system can be used to classify biological data. Other classifications can be derived from any type of dataset using granular support vector machines with random granularity.
FIG. 1 is a block diagram of a network environment including an example classification system 100. The classification system 100 can receive classification queries from an enterprise messaging filter 110. The enterprise messaging filter 110 can protect enterprise messaging entities 120 from external messaging entities 130 attempting to communicate with the enterprise messaging entity 120 through a network 140.
In some implementations, the classification system 100 can receive a training dataset 150. The training dataset 150 can be provided by an administrator, for example. The classification system 100 can use granular support vector machine classification 160 to process the training dataset and derive hyperplane classifiers respectively associated with randomly selected granules, thereby producing a number of granular support vector machines (e.g., GVSM 1 170, GVSM 2 180 ... GVSM n 190).
In some implementations, the classification system 100 can derive a number of granules from the training dataset. The granules can be derived, for example, using a bootstrapping process whereby a tuple (e.g., a record in the dataset) is randomly selected for inclusion in the in-bag data. Additional tuples can be selected from among the entire training dataset (e.g., sampled with replacement). Thus, the selection of each tuple is independent from the selection of other tuples and the same tuple can be selected more than once. For example, if a training dataset included 100 tuples and 100 bootstrapping samples are selected from among the 100 tuples, on average 63.2 of the tuples would be selected, and 36.8 of the tuples would not be selected. The selected data can be identified as in-bag data, while the non-selected data can be identified as out-of-bag data. In some examples, the sample size can be set at 10% of the total number of tuples in the training dataset. Thus, if there were 100 tuples, the classification system 100 can select 10 samples with replacement.
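The bootstrapping step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation; the function names `bootstrap_split` and `expected_unique` are hypothetical. Under uniform sampling with replacement, the expected number of distinct tuples after k draws from N tuples is N(1 - (1 - 1/N)^k). For 100 tuples and 100 draws this gives roughly 63.4, close to the 63.2 figure quoted above, which corresponds to the large-dataset limit 1 - 1/e.

```python
import random

def bootstrap_split(num_tuples, num_draws, rng):
    """Draw tuple indices with replacement.

    Returns (draws, out_of_bag): the raw draws (duplicates possible)
    and the indices that were never selected.
    """
    draws = [rng.randrange(num_tuples) for _ in range(num_draws)]
    in_bag = set(draws)
    out_of_bag = [i for i in range(num_tuples) if i not in in_bag]
    return draws, out_of_bag

def expected_unique(num_tuples, num_draws):
    """Expected number of distinct tuples hit by uniform draws with replacement."""
    return num_tuples * (1.0 - (1.0 - 1.0 / num_tuples) ** num_draws)

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
draws, oob = bootstrap_split(100, 100, rng)
print(len(set(draws)), "in-bag tuples,", len(oob), "out-of-bag tuples")
print("expected in-bag for 100 draws:", round(expected_unique(100, 100), 1))
print("expected in-bag for 10 draws:", round(expected_unique(100, 10), 1))
```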
The classification system 100 can then project the data into a random subspace. The random subspace projection can be a random selection of tuple attributes (e.g., features). In some implementations, the random selection of tuple attributes can be performed without replacement (e.g., no duplicates can be selected). In other implementations, the random selection of tuple attributes can be performed with replacement (e.g., duplicate samples are possible, but discarded). The random selection of tuples with a projection of the tuple attributes into a random subspace generates a granule. The granule can be visualized as a matrix having a number of rows of records (e.g., equal to the number of unique tuples selected from the training dataset) with a number of columns defining attributes associated with the granule.
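As a rough sketch of how a granule might be assembled, the snippet below samples rows with replacement and columns without replacement, then keeps only the unique rows and the chosen columns. The name `make_granule` and the list-of-lists data layout are assumptions made for illustration only.

```python
import random

def make_granule(dataset, num_draws, num_features, rng):
    """Build one granule from a dataset given as a list of attribute lists.

    Rows are sampled with replacement (duplicates collapse to unique tuples);
    columns are sampled without replacement (the random subspace).
    Returns (row_indices, feature_indices, granule_matrix).
    """
    num_attributes = len(dataset[0])
    drawn = [rng.randrange(len(dataset)) for _ in range(num_draws)]
    rows = sorted(set(drawn))
    features = sorted(rng.sample(range(num_attributes), num_features))
    granule = [[dataset[r][f] for f in features] for r in rows]
    return rows, features, granule

# Toy dataset: 20 tuples with 8 attributes each.
data = [[row * 10 + col for col in range(8)] for row in range(20)]
rows, features, granule = make_granule(data, 5, 3, random.Random(1))
print("rows:", rows, "features:", features)
```

Keeping `rows` and `features` alongside the matrix lets later stages identify the out-of-bag tuples and project new tuples into the same subspace.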
The classification system 100 can then execute a support vector machine process operable to receive the data and to plot the data into an n-dimensional space (e.g., n being the number of unique tuples sampled during the bootstrapping process). The support vector machine process can identify a hyperplane classifier (e.g., a linear classifier) to find the plane which best separates the data into two or more classifications. In some implementations, adjustments to the support vector machine process can be made to avoid overfitting the hyperplane classifier to the datapoints. In various examples, there can be more than one potential hyperplane classifier which provides separation between the data. In such instances, the hyperplane classifier which achieves maximum separation (e.g., maximum margin classifier) can be identified and selected by the support vector machine process. In some implementations, the support vector machine can warp the random subspace to provide better fit of the hyperplane classifier to the datapoints included in the granule.
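To make the idea of a separating hyperplane concrete, the sketch below fits a linear soft-margin classifier by batch subgradient descent on the regularized hinge loss. This is a stand-in chosen for brevity, not the solver the disclosure uses; production support vector machines typically rely on a dedicated quadratic programming or SMO solver. The names `train_linear_svm` and `predict`, and the parameters `lam`, `lr`, and `steps`, are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, steps=500):
    """Approximate a soft-margin linear SVM with batch subgradient descent.

    points: list of feature lists; labels: +1 or -1.
    Returns (weights, bias) describing the hyperplane w.x + b = 0.
    """
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Two well-separated clusters standing in for a granule's datapoints.
points = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(predict(w, b, [2.5, 2.5]), predict(w, b, [-2.5, -2.5]))
```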
The hyperplane classifiers (e.g., GVSM 1 170, GVSM 2 180 ... GVSM n 190) can then be used to analyze new data. In some implementations, a new tuple (e.g., set of attributes) with an unknown classification can be received. In other implementations, the classification system can receive an unparsed document and can parse the document to extract the attributes used for classification by the various granules.
In some implementations, the hyperplane classifiers can be stored locally to the classification system and can be used to derive a number of predictions for the classification of the new tuple. In other implementations, the hyperplane classifiers are stored by the respective processing modules that processed the granule and the new tuple can be distributed to each of the respective processing modules. The processing modules can then each respond with a predicted granule classification, resulting in a number of granule predictions equal to the number of derived granules.
The predicted classifications can be aggregated to derive a final classification prediction associated with the new tuple. In some implementations, the predicted classifications can be aggregated by majority voting. For example, each prediction can be counted as a "vote." The "votes" can then be tallied and compared to determine which classification received the most "votes." This classification can be adopted by the classification system as the final classification prediction.
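Majority voting as described above can be sketched in a few lines. The function name `majority_vote` is hypothetical, and the tie-breaking behavior (first label encountered wins) is an assumption of this sketch rather than something the disclosure specifies.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most granules.

    Ties resolve to the label that appears first in the input,
    which is how Counter.most_common orders equal counts.
    """
    tally = Counter(predictions)
    label, _count = tally.most_common(1)[0]
    return label

votes = ["spam", "non-spam", "spam", "spam", "non-spam"]
print(majority_vote(votes))
```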
In other implementations, the granule predictions can include a distance metric describing the distance of a datapoint associated with the new tuple from the hyperplane classifier. The distance can be used to weight the aggregation of the predicted classifications. For example, if the classification system were determining whether a set of data indicates a man versus a woman, and one hyperplane classifier predicts that the datapoint is associated with a woman while another hyperplane classifier predicts that the same datapoint is associated with a man, the distance of each from the hyperplane classifier can be used to determine which classifier to use as the final classification prediction. In other examples, it can be imagined that 5 hyperplane classifiers predict that the datapoint is associated with a man while 10 hyperplane classifiers predict that the datapoint is associated with a woman. In those implementations where distance is used to provide a weighting to the predictions, if the 5 classifiers predicting that the datapoint is male have a greater aggregate distance from the respective hyperplane classifiers than the 10 classifiers predicting that the datapoint is female, then the final classification prediction can be male.
In still further implementations, each of the hyperplane classifiers can have an effectiveness metric associated with the classifier. In such implementations, the effectiveness metric can be derived by validating the hyperplane classifier against the out-of-bag data not chosen for inclusion in the granule associated with the hyperplane classifier. Thus, for example, using a 10% bootstrapping process on a training sample of 100 records, there are expected to be about 90 out-of-bag tuples (e.g., datapoints). Those datapoints can be used in an attempt to determine the effectiveness of the hyperplane classifier derived with respect to the granule.
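One way to compute such an effectiveness metric is to score a classifier on exactly those tuples that were not drawn into its granule. The sketch below assumes the classifier is available as a plain callable; the name `out_of_bag_accuracy` and the toy threshold classifier are hypothetical.

```python
def out_of_bag_accuracy(classify, dataset, labels, in_bag_indices):
    """Fraction of out-of-bag tuples that `classify` labels correctly.

    classify: callable mapping one tuple to a predicted label.
    Returns None when every tuple was in-bag and nothing is left to test on.
    """
    in_bag = set(in_bag_indices)
    oob = [i for i in range(len(dataset)) if i not in in_bag]
    if not oob:
        return None
    correct = sum(1 for i in oob if classify(dataset[i]) == labels[i])
    return correct / len(oob)

# Toy data: the true boundary is at 5, the classifier's boundary is at 4.
dataset = [[x] for x in range(10)]
labels = [1 if x >= 5 else -1 for x in range(10)]
classifier = lambda t: 1 if t[0] >= 4 else -1
accuracy = out_of_bag_accuracy(classifier, dataset, labels, [0, 1, 9])
print(accuracy)  # 6 of the 7 out-of-bag tuples are classified correctly
```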
If the hyperplane classifier, for example, is measured to be 90% effective on the out-of-bag data, the prediction can be weighted at 90%. If another hyperplane classifier is measured, for example, to be 70% effective on the out-of-bag data, the prediction can be weighted at 70%. In some implementations, if a hyperplane classifier is measured to be less than a threshold level of effectiveness on the out-of-bag data, the hyperplane classifier can be discarded. For example, if a hyperplane classifier is less than 50% effective on the out-of-bag data, it is more likely than not that the classification is incorrect (at least as far as the out-of-bag data is concerned). In such instances, the hyperplane classifier could be based on datapoints which are outliers that do not accurately represent the sample. In some implementations, if a threshold number of hyperplane classifiers are discarded because they do not predict with a threshold effectiveness, the classification system can request a new training dataset, or possibly different and/or additional attributes associated with the current training dataset. In other implementations, the classification system can continue to run the support vector machine processing until a threshold number of hyperplane classifiers are identified.
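The weighting and discarding rules above can be combined in a short sketch. It assumes each granule prediction arrives as a (label, effectiveness) pair with effectiveness in the range 0 to 1; the name `effectiveness_weighted_vote` and the default threshold of 0.5 are illustrative choices.

```python
def effectiveness_weighted_vote(granule_predictions, threshold=0.5):
    """Aggregate (label, effectiveness) pairs into one label.

    Classifiers below `threshold` are discarded; the remaining predictions
    are weighted by their effectiveness and summed per label.
    Returns None if every classifier was discarded.
    """
    totals = {}
    for label, effectiveness in granule_predictions:
        if effectiveness < threshold:
            continue  # more often wrong than right on out-of-bag data
        totals[label] = totals.get(label, 0.0) + effectiveness
    if not totals:
        return None
    return max(totals, key=totals.get)

predictions = [("spam", 0.90), ("non-spam", 0.70), ("non-spam", 0.45)]
print(effectiveness_weighted_vote(predictions))
```

In this example the 45% classifier is dropped, so "spam" (0.90) outweighs "non-spam" (0.70) even though two of the three raw predictions were "non-spam".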
FIG. 2 is a block diagram of an example classification system 100. In some implementations, the classification system 100 can include a granule selection module 210, a processing module 220, a prediction module 230, and an aggregation module 240.
The granule selection module 210 can receive a training dataset 250 and can randomly select (with replacement) tuples from the training dataset 250. In some implementations, the random selection of the tuples can be based upon a bootstrapping process, whereby a selection of a tuple is made, the tuple is replaced and then another tuple is selected. In various examples, this process can continue until a threshold number of selections are made. As a specific example, 10% bootstrapping on a 100-tuple training dataset can mean that 10 selections are made (including duplicates). Thus, there are expected to be fewer than 10 unique sample tuples in the in-bag data, on average.
The in-bag tuples are then projected onto a random subspace. For example, the in-bag tuples can be visualized as datapoints plotted onto an n-dimensional space, where n equals the number of attributes associated with each tuple. If a dimension is removed, the datapoints can be said to be projected into the subspace comprising the remaining attributes.
In various implementations, the random subspace can be selected by randomly selecting attributes to remove from the subspace or randomly selecting the attributes that are included in the subspace. In some implementations, the random subspace is chosen by randomly selecting the attributes for inclusion in the granule without replacement (e.g., no duplicates can be selected, because once an attribute is selected, it is removed from the sample). Thus, an original matrix associated with the training dataset can be reduced into a granule. Granules can continue to be selected until a threshold number of granules have been selected. The random selection of the granules and a smaller sample size can facilitate diversity among the granules. For example, one granule is unlikely to be similar to any of the other granules.
The processing module 220 can be operable to process the granules using a support vector machine process. The processing module 220 can use the support vector machine process to plot the tuples associated with a respective granule into an n-dimensional space, where n equals the number of tuples associated with the granule. The processing module 220 can identify a hyperplane classifier (e.g., linear classifier) which best separates the data based upon the selected category for classification. In some instances, multiple hyperplane classifiers might provide differentiation between the data. In some implementations, the processing module 220 can select the hyperplane classifier that provides maximum separation between the datapoints (e.g., a maximum margin classifier).
In various implementations, the granular nature of the process can facilitate distributed processing of the granules. For example, if there are 10 granules to process on five processors, each of the processors could be assigned to handle two granules. Some implementations can include a distribution module operable to distribute the granules among potentially multiple processing modules 220 (e.g., processors running support vector machine processes on the granules). In such implementations, the distribution module can, for example, determine the available (e.g., spare) processing capacity and/or specialty processing available on each of a number of processors and assign the granules to the processors accordingly. Other factors for determining distribution of the granules can be used.
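A capacity-aware distribution policy could look like the greedy sketch below, which hands each granule to whichever processor currently has the most spare capacity. The disclosure does not prescribe a particular policy, so the heuristic, the name `distribute_granules`, and the abstract units for size and capacity are all assumptions.

```python
import heapq

def distribute_granules(granule_sizes, spare_capacity):
    """Greedily assign granules to processors by remaining spare capacity.

    granule_sizes: {granule_id: size}; spare_capacity: {processor_id: capacity}.
    Larger granules are placed first. Capacity is treated as a soft budget,
    so it may go negative when there is more work than capacity.
    """
    heap = [(-capacity, pid) for pid, capacity in spare_capacity.items()]
    heapq.heapify(heap)  # min-heap on negated capacity = most spare first
    assignment = {pid: [] for pid in spare_capacity}
    for gid, size in sorted(granule_sizes.items(), key=lambda kv: -kv[1]):
        neg_capacity, pid = heapq.heappop(heap)
        assignment[pid].append(gid)
        heapq.heappush(heap, (neg_capacity + size, pid))
    return assignment

# Ten equal granules on five equally idle processors: two granules each.
granules = {g: 1 for g in range(10)}
processors = {p: 2 for p in range(5)}
plan = distribute_granules(granules, processors)
print(plan)
```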
The prediction module 230 can receive features 260 (e.g., from an unclassified tuple) for classification. In some implementations, the features 260 can be received from a messaging filter 280. In such implementations, the features 260 can be derived from a received message 270 by a messaging filter 280. The messaging filter 280, for example, can extract the features 260. In some implementations, the messaging filter 280 can be a part of the classification system 100. In other implementations, the messaging filter 280 can query the classification system 100 by sending the attributes associated with the tuple to be classified to the classification system 100.
The prediction module 230 can compare datapoints associated with the features against each of the hyperplane classifiers derived from the granules to derive granule predictions associated with the respective hyperplane classifiers. For example, the prediction module 230 could plot the unclassified new tuple onto a random subspace associated with a first granule and associated hyperplane classifier and determine whether the unclassified new tuple shows characteristics associated with a first classification (e.g., men) or characteristics associated with a second classification (e.g., women). The prediction module 230 could continue this process until each of the hyperplane classifiers has been compared to a datapoint associated with the unclassified new tuple.
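For a linear hyperplane classifier, the comparison described above amounts to projecting the new tuple onto the granule's attribute subset and checking the sign of w·x + b. The sketch below also returns the geometric distance |w·x + b| / ||w||, which a distance-weighted aggregation could use. The name `predict_with_granule` and the representation of a classifier as weights plus a bias are assumptions.

```python
def predict_with_granule(new_tuple, feature_indices, weights, bias):
    """Classify a full-width tuple with one granule's linear classifier.

    The tuple is first projected onto the granule's feature subset.
    Returns (label, distance) where label is +1 or -1 and distance is the
    unsigned distance from the hyperplane.
    """
    projected = [new_tuple[f] for f in feature_indices]
    score = sum(w * x for w, x in zip(weights, projected)) + bias
    label = 1 if score >= 0 else -1
    norm = sum(w * w for w in weights) ** 0.5
    distance = abs(score) / norm if norm else 0.0
    return label, distance

# The granule uses attributes 0 and 2 only; attributes 1 and 3 are ignored.
result = predict_with_granule([1.0, 5.0, -2.0, 0.0], [0, 2], [3.0, 4.0], 0.0)
print(result)  # score = 3*1 + 4*(-2) = -5, norm = 5
```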
In some implementations, the prediction module 230 can include distributed processing elements (e.g., processors). In such implementations, the prediction module 230 can distribute classification jobs to processors, for example with available processing capability. In other implementations, the prediction module 230 can distribute classification jobs based upon which processors previously derived the hyperplane classifier associated with a granule. In such implementations, for example, a processor used to derive a first hyperplane classifier for a first granule can also be used to plot an unclassified new tuple into the random subspace associated with the first granule and can compare the datapoint associated with the new tuple to the first hyperplane classifier associated with the first granule.
The granule predictions can be communicated to the aggregation module 240. In some implementations, the aggregation module 240 can use a simple voting process to aggregate the granule predictions. For example, each prediction can be tallied as a "vote" for the classification predicted by the granule prediction. The classification that compiles the most votes can be identified as the final classification decision.
In another implementation, each granule prediction can include a distance metric identifying the distance of datapoints associated with the unclassified new tuple from the respective hyperplane classifiers. The distance metric can be used to weight the respective granule predictions. For example, if there are three predictions, one for classification A located a distance of 10 units from the hyperplane classifier, and two for classification B located a distance of 2 and 5 units from their respective hyperplane classifiers, then classification A is weighted at 10 units and classification B is weighted at 7 units. Thus, in this example, classification A can be selected as the final classification prediction.
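The three-prediction example with labels A and B can be reproduced with a short sketch that sums distances per label. The function name `distance_weighted_vote` and the (label, distance) pair format are hypothetical.

```python
def distance_weighted_vote(granule_predictions):
    """Sum hyperplane distances per label and return (winner, totals).

    granule_predictions: list of (label, distance) pairs, one per granule.
    """
    totals = {}
    for label, distance in granule_predictions:
        totals[label] = totals.get(label, 0.0) + abs(distance)
    winner = max(totals, key=totals.get)
    return winner, totals

winner, totals = distance_weighted_vote([("A", 10), ("B", 2), ("B", 5)])
print(winner, totals)
```

Here A wins with a total weight of 10 against 7 for B, even though B received two of the three raw votes.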
In other implementations, each of the predictions can be weighted by a Bayesian confidence level associated with the respective hyperplane classifiers. In some such implementations, the Bayesian confidence level can be based upon a validation performed on the hyperplane classifier using the out-of-bag data associated with each respective hyperplane classifier. For example, if a first hyperplane classifier is measured to be 85% effective at classifying the out-of-bag data, the predictions associated with the hyperplane classifier can be weighted by the effectiveness metric. The weighted predictions can be summed and compared to each other to determine the final classification prediction.
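A minimal sketch of confidence-weighted aggregation follows. It assumes the weight is simply the fraction of out-of-bag tuples each hyperplane classifier labeled correctly, which is one possible stand-in for the Bayesian confidence level; the function name and the sample values other than 85% are hypothetical.

```python
def confidence_weighted_vote(predictions):
    """Sum the effectiveness weights per classification and pick the largest total.

    predictions: iterable of (label, effectiveness) pairs, where effectiveness
    is assumed to be the out-of-bag accuracy of the classifier that voted.
    """
    totals = {}
    for label, effectiveness in predictions:
        totals[label] = totals.get(label, 0.0) + effectiveness
    winner = max(totals, key=totals.get)
    return winner, totals


# One classifier that is 85% effective votes "spam"; two weaker ones vote "ham".
winner, totals = confidence_weighted_vote(
    [("spam", 0.85), ("ham", 0.60), ("ham", 0.55)]
)
```

Here the two weaker classifiers together outweigh the single stronger one, so the summed weights favor "ham".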
FIG. 3 is a block diagram of a messaging filter 300 using a classification system 310 and illustrating example policies 320-360. In various implementations, the policies can include an information security policy 320, a virus policy 330, a spam policy 340, a phishing policy 350, a spyware policy 360, or combinations thereof. The messaging filter 300 can filter communications received from messaging entities 380 destined for other messaging entities 380. In some implementations, the messaging filter 300 can query a classification system 310 to identify a classification associated with a message. The classification system 310 can use a granular support vector machine process to identify hyperplane classifiers associated with a number of granules derived from a training dataset 390. The training dataset can include, for example, documents that have previously been classified. In some examples, the documents can be a library of spam messages identified by users and/or provided by third parties. In other examples, the documents can be a library of viruses identified by administrators, users, and/or other systems or devices. The hyperplane classifiers can then be compared to the attributes of new messages to determine to which classification the new message belongs.
In those implementations that include an information security policy, incoming and/or outgoing messages can be classified and compared to the information security policy to determine whether to forward the message for delivery. For example, the classification system might determine that the document is a technical specification document. In such an example, the information security policy might specify that technical specification documents should not be forwarded outside of an enterprise network, or should only be sent to specific individuals. In other examples, the information security policy could specify that technical documents require encryption of a specified type so as to ensure the security of the technical documents being transmitted. Other information security policies can be used.
In those implementations that include a virus policy, the virus policy can specify a risk level associated with communications that is acceptable. For example, the virus policy can indicate a low tolerance for viruses. Using such a policy, the messaging filter can block communications that are determined to be even a low risk for including viruses. In other examples, the virus policy can indicate a high tolerance for virus activity. In such examples, the messaging filter might only block those messages which are strongly correlated with virus activity. For example, in such implementations, a confidence metric can be associated with the classification. If the confidence metric exceeds a threshold level set by the virus policy, the message can be blocked. Other virus policies can be used.
In those implementations that include a spam policy, the spam policy can specify a risk level associated with communications that is acceptable to the enterprise network. For example, a system administrator can specify a high tolerance for spam messages. In such an example, the messaging filter 300 can filter only messages that are highly correlated with spam activity.
In those implementations that include a phishing policy, the phishing policy can specify a risk level associated with communications that is acceptable to the enterprise network. For example, a system administrator can specify a low tolerance for phishing activity. In such an example, the messaging filter 300 can filter even communications which show a slight correlation to phishing activity.
In those implementations that include a spyware policy, the spyware policy can specify a network tolerance for communications that might include spyware. For example, an administrator can set a low tolerance for spyware activity on the network. In such an example, the messaging filter 300 can filter communications that show even a slight correlation to spyware activity.
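The tolerance-based policies above can be pictured as per-category thresholds on the confidence metric. The sketch below is illustrative only: the threshold values, the dictionary `POLICY_THRESHOLDS` and the function `should_block` are assumptions, with a low tolerance modeled as a low threshold and a high tolerance as a high one.

```python
# Hypothetical thresholds: a low tolerance blocks at a low confidence,
# a high tolerance blocks only when the classification is strongly supported.
POLICY_THRESHOLDS = {
    "virus": 0.2,     # low tolerance
    "spam": 0.9,      # high tolerance
    "phishing": 0.2,  # low tolerance
    "spyware": 0.2,   # low tolerance
}


def should_block(category, confidence, thresholds=POLICY_THRESHOLDS):
    """Block when the confidence metric exceeds the policy's threshold."""
    return confidence > thresholds[category]


spam_blocked = should_block("spam", 0.7)          # moderately spam-like message
phishing_blocked = should_block("phishing", 0.3)  # slight correlation to phishing
```

With these assumed values, a message that is moderately correlated with spam is delivered, while a message with only a slight correlation to phishing is filtered.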
FIG. 4 is a block diagram of an example classification system 100 using distributed processing modules 400a-e. In some implementations, the classification system 100 can include a granule selection module 410, a distribution module 420, a prediction module 430 and an aggregation module 440. The classification system can operate to receive a training dataset 450, to derive a number of hyperplane classifiers from the training dataset, and then to predict the classification of incoming unclassified messages 460.
In some implementations, the granule selection module can receive the training dataset 450. The training dataset 450 can be provided, for example, by a system administrator or a third party device. In some implementations, the training dataset 450 can include a plurality of records (e.g., tuples) which have previously been classified. In other implementations, the training dataset 450 can include a corpus of documents that have not been parsed. The granule selection module 410, in such implementations, can include a parser operable to extract attributes from the document corpus. In some implementations, the granule selection module 410 can randomly select granules by using a bootstrapping process on the tuples, and then projecting the tuples into a random subspace.
The distribution module 420 can operate to distribute the granules to a plurality of processing modules 400a-e for processing. In some implementations, the distribution module 420 can distribute the granules to processing modules 400a-e having the highest available processing capacity. In other implementations, the distribution module 420 can distribute the granules to processing modules 400a-e based upon the type of content being classified. In still further implementations, the distribution module 420 can distribute the granules to processing modules 400a-e based upon other characteristics of the processing modules 400a-e (e.g., availability of special purpose processing power (e.g., digital signal processing, etc.)).
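One way to picture distribution by available processing capacity is a greedy assignment that always sends the next granule to the module with the most capacity remaining. This is a sketch under assumptions: the capacity units, the cost of a granule (taken here to be its size) and the function name `distribute` are hypothetical.

```python
import heapq


def distribute(granule_sizes, module_capacity):
    """Assign each granule to the module with the most remaining capacity."""
    # heapq is a min-heap, so capacities are negated to pop the largest first.
    heap = [(-capacity, name) for name, capacity in module_capacity.items()]
    heapq.heapify(heap)
    assignment = {name: [] for name in module_capacity}
    for granule_id, size in enumerate(granule_sizes):
        neg_capacity, name = heapq.heappop(heap)
        assignment[name].append(granule_id)
        # The chosen module has less capacity left after taking the granule.
        heapq.heappush(heap, (neg_capacity + size, name))
    return assignment


assignment = distribute([4, 3, 2, 1], {"400a": 10, "400b": 6})
```

Other distribution criteria mentioned above, such as content type or special purpose hardware, would replace the capacity key with a different selection rule.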
In some implementations, the distributed processing modules 400a-e can return a hyperplane classifier to the distribution module 420. The hyperplane classifiers can be provided to the prediction module 430. The prediction module 430 can also receive unclassified messages 460 and can use the hyperplane classifiers to provide granule classification predictions associated with each of the hyperplane classifiers.
The granule classification predictions can be provided to an aggregation module 440. The aggregation module 440 can operate to aggregate the granule classification predictions. In some implementations, the aggregation module 440 can aggregate the granule classification predictions to derive a final classification prediction based upon a simple voting process. In other implementations, the aggregation module 440 can use a distance metric associated with each of the granule classification predictions to weight the respective granule predictions. In still further implementations, the aggregation module 440 can use a Bayesian confidence score to weight each of the granule classification predictions. The Bayesian confidence score can be derived, for example, by validating each respective hyperplane classifier associated with a granule against out-of-bag data not selected for inclusion in the granule. The resulting final classification prediction can be provided as output of the classification system 100.
FIG. 5 is a block diagram of another example classification system 100 having distributed processing and prediction modules 500a-e. In some implementations, the classification system 100 can include a granule selection module 510, a distribution module 520 and an aggregation module 530. The classification system 100 can operate to distribute the processing associated with both the granule processing to derive the hyperplane classifiers associated with the granules and the prediction processing to provide granule predictions based upon the derived hyperplane classifiers.
In some implementations, the granule selection module 510 can receive the training dataset 540. The training dataset 540 can be provided, for example, by a system administrator or a third party device. In some implementations, the training dataset 540 can include a plurality of records (e.g., tuples) which have previously been classified. In other implementations, the training dataset 540 can include a corpus of documents that have not been parsed. The granule selection module 510, in such implementations, can include a parser operable to extract attributes from the document corpus. In some implementations, the granule selection module 510 can randomly select granules by using a bootstrapping process on the tuples, and then projecting the tuples into a random subspace.
In some implementations, the distribution module 520 can receive the granules from the granule selection module 510. The distribution module 520 can distribute the granules to one or more distributed processing and prediction modules 500a-e. The distribution module 520 can also distribute an unclassified message to the distributed processing and prediction modules 500a-e.
Each distributed processing and prediction module 500a-e can operate to execute a support vector machine process on the received granule(s). The support vector machine process can operate to derive a hyperplane classifier(s) associated with the granule(s). Each distributed processing and prediction module 500a-e can then use the derived hyperplane classifier(s) to generate a granule classification prediction (or predictions) associated with an unclassified message 550.
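The combined train-then-predict role of each module can be sketched as below. This is not the specification's support vector machine process: as a stand-in it uses a small linear classifier fitted by subgradient descent on the regularized hinge loss, and a thread pool stands in for the separate modules 500a-e. All function names, the granule dictionary layout and the hyperparameters are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def train_linear_svm(rows, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights and bias by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * reg) for wi in w]  # regularization shrinkage
            if margin < 1.0:  # tuple is inside the margin or misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def process_and_predict(granule, message):
    """Derive a hyperplane from one granule, then classify the message in its subspace."""
    w, b = train_linear_svm(granule["rows"], granule["labels"])
    projected = [message[i] for i in granule["features"]]
    score = sum(wi * xi for wi, xi in zip(w, projected)) + b
    return 1 if score >= 0 else -1


def distributed_predictions(granules, message, max_workers=4):
    # Each granule is handled independently, so the work can be farmed out.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda g: process_and_predict(g, message), granules))


granules = [
    {"features": [0], "rows": [[2.0], [3.0], [-2.0], [-3.0]], "labels": [1, 1, -1, -1]},
    {"features": [1], "rows": [[2.0], [4.0], [-2.0], [-4.0]], "labels": [-1, -1, 1, 1]},
]
predictions = distributed_predictions(granules, message=[4.0, 4.0])
```

Threads are used here only to keep the sketch self-contained; separate processes or machines would be the closer analogue of distributed modules, and a production system would more likely call an established SVM library than the toy trainer shown.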
The granule classification predictions can be provided to an aggregation module 530. The aggregation module 530 can operate to aggregate the granule classification predictions. In some implementations, the aggregation module 530 can aggregate the granule classification predictions to derive a final classification prediction based upon a simple voting process. In other implementations, the aggregation module 530 can use a distance metric associated with each of the granule classification predictions to weight the respective granule predictions. In still further implementations, the aggregation module 530 can use a Bayesian confidence score to weight each of the granule classification predictions. The Bayesian confidence score can be derived, for example, by validating each respective hyperplane classifier associated with a granule against out-of-bag data not selected for inclusion in the granule. The resulting final classification prediction can be provided as output of the classification system 100.
FIG. 6 is a flowchart illustrating an example method used to derive granules and classification planes. At stage 610, a training dataset is received. The training dataset can be received, for example, by a granule selection module (e.g., classification system 100 of FIG. 2). The training dataset, in various examples, can include parsed or unparsed data describing attributes of an item for classification. In some examples, the item can include documents, deoxyribonucleic acid (DNA) sequences, chemicals, or any other item that has definite and/or quantifiable attributes that can be compiled and analyzed. In other examples, the training dataset can include a document corpus operable to be parsed to identify attributes of each document in the document corpus.
At stage 620, a plurality of granules are derived. The plurality of granules can be derived, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2). In various implementations, the granule selection module can use a bootstrapping process to identify a random sampling of a received training dataset. The granule selection module can then project the random sampling into a random subspace, thereby producing a granule. In various implementations, the granule is much smaller than the original dataset, which can facilitate more efficient processing of the granule than can be achieved using the entire training dataset. In some implementations, the granule further supports distributed processing, thereby facilitating the parallel processing of the derived granules.
At stage 630, the granules are processed using a support vector machine process. The granules can be processed, for example, by a processing module (e.g., processing module 220 of FIG. 2). The support vector machine process can operate to derive a hyperplane classifier associated with each granule. The hyperplane classifiers can be used to provide demarcations between given classifications of the data (e.g., spam or non-spam, virus or non-virus, spyware or non-spyware, etc.).
FIG. 7 is a flowchart illustrating an example method used to derive a classification associated with a new set of attributes for classification. At stage 710, a new tuple and associated attributes can be received. The new tuple and associated attributes can be received, for example, by a prediction module (e.g., classification system 100 of FIG. 2).
At stage 720, a prediction can be generated based upon each hyperplane classifier. The prediction can be generated, for example, by a prediction module (e.g., prediction module 230 of FIG. 2). In various implementations, the prediction module can use the derived hyperplane classifiers to generate a granule classification prediction associated with each hyperplane classifier.
At stage 730, the granule classification predictions from each of the hyperplane classifiers can be aggregated. The predictions can be aggregated, for example, by an aggregation module (e.g., aggregation module 240 of FIG. 2). In various implementations, the granule classification predictions can be aggregated using a simple voting process, a distance between the datapoint and the hyperplane classifiers can be used to factor the final classification, or a Bayesian confidence can be used to weight the predictions based upon the confidence associated with the respective hyperplane classifiers.
FIG. 8 is a flowchart illustrating an example method used to derive granules and distribute granules to processing modules. The method is initialized at stage 800. At stage 805, a training dataset is received. The training dataset can be received, for example, by a granule selection module (e.g., classification system 100 of FIG. 2). The training dataset, in various examples, can include parsed or unparsed data describing attributes of an item for classification. In some examples, the item can include documents, deoxyribonucleic acid (DNA) sequences, chemicals, or any other item that has definite and/or quantifiable attributes that can be compiled and analyzed. In other examples, the training dataset can include a document corpus operable to be parsed to identify attributes of each document in the document corpus.
At stage 810, a counter can be initialized. The counter can be initialized, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2). In various implementations, the counter can be used to identify when enough granules have been generated based on the training dataset. For example, the number of granules for a given dataset can be a percentage (e.g., 50%) of the number of tuples in the training dataset.
At stage 815, a bootstrap aggregating process is used to randomly select tuples from among the training dataset. The bootstrap aggregating process can be performed, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2). In various implementations, the bootstrap aggregating process randomly selects a tuple from the training dataset, and replaces the tuple before selecting another tuple, until a predefined number of selections have been made. In such implementations, duplicates can be selected.
Thus, the number of distinct tuples that will be selected is not known before the bootstrap aggregating process runs, though the process does ensure that the number of distinct samples will be no greater than the number of selections made. In some examples, the predefined number of selections can be based upon a percentage (e.g., 10%) of the size of the training dataset.
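The selection with replacement at stage 815 can be sketched with the standard library random number generator. The function name `bootstrap_sample`, the seed and the dataset size are assumptions chosen for illustration; the 10% draw fraction follows the example in the text.

```python
import random


def bootstrap_sample(n_tuples, n_draws, rng):
    """Draw tuple indices with replacement and report the tuples never drawn."""
    drawn = [rng.randrange(n_tuples) for _ in range(n_draws)]  # duplicates allowed
    in_bag = set(drawn)
    out_of_bag = [i for i in range(n_tuples) if i not in in_bag]
    return drawn, out_of_bag


rng = random.Random(0)               # fixed seed only to make the sketch repeatable
n_tuples = 100
n_draws = int(0.10 * n_tuples)       # e.g., 10% of the training dataset size
drawn, out_of_bag = bootstrap_sample(n_tuples, n_draws, rng)
distinct = len(set(drawn))           # never more than n_draws
```

The tuples left in `out_of_bag` are the ones available later for validating the hyperplane classifier derived from this granule.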
At stage 820, the random sample of tuples is projected into a random subspace. The projection into a random subspace can be performed, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2). The random subspace can be selected, in some implementations, by randomly selecting the features to be used within the granule, without replacement. For example, when a first feature is selected, the feature is not replaced into the group, but removed so as not to be selected a second time. Such random selection guarantees that each granule will include a predefined number of features.
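The projection at stage 820 can be sketched as choosing feature indices without replacement and keeping only those columns. The function name `random_subspace`, the seed and the sample rows are assumptions for illustration.

```python
import random


def random_subspace(rows, n_features_kept, rng):
    """Pick features without replacement and project every tuple onto them."""
    n_features = len(rows[0])
    # random.sample never returns the same index twice.
    kept = sorted(rng.sample(range(n_features), n_features_kept))
    projected = [[row[i] for i in kept] for row in rows]
    return kept, projected


rows = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
kept, projected = random_subspace(rows, 2, random.Random(1))
```

Because the selection is without replacement, every granule built this way has exactly `n_features_kept` distinct features.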
At stage 825, the generated granule is labeled as the nth granule, where n is the current counter value. The granule can be labeled, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2).
At stage 830, the counter is incremented (n = n + 1). The counter can be incremented, for example, by a granule selection module (e.g., granule selection module 210 of FIG. 2). At stage 835, the counter can be compared to a threshold to determine whether a predefined number of granules have been generated. If a predefined number of granules have not been generated, the process returns to stage 815 and generates additional granules until the specified number of granules have been generated. However, if the counter has reached the threshold at stage 835, the process can continue to stage 840 where the granules can be distributed. The granules can be distributed, for example, by a distribution module (e.g., distribution module 420, 520 of FIGS. 4 and 5, respectively). In various implementations, the granules can be distributed based upon the characteristics of a plurality of processing modules or the characteristics of the granules themselves.
At stage 845, the granules can be processed. The granules can be processed, for example, by distributed processing modules (e.g., distributed processing modules 400a-e, 500a-e of FIGS. 4 and 5, respectively). In some implementations, the distributed processing modules can be executed by multiple processors. In additional implementations, the distributed processing modules can execute a support vector machine process on each generated granule to derive a hyperplane classifier associated with each generated granule. The hyperplane classifier can be compared to unclassified data to derive a classification prediction associated with the unclassified data.
At optional stage 850, the hyperplane classifiers can be validated. The hyperplane classifiers can be validated, for example, by distributed processing modules (e.g., distributed processing modules 400a-e, 500a-e of FIGS. 4 and 5, respectively). In some implementations, each hyperplane classifier can be validated using respective out-of-bag data associated with the granule used to generate the hyperplane classifier. Thus, each hyperplane classifier can be tested to determine the effectiveness of the derived hyperplane classifier.
At optional stage 855, a determination is made as to which hyperplane classifiers to use in prediction modules based upon the validation. The determination of which hyperplane classifiers to use can be performed, for example, by a distributed processing module (e.g., distributed processing modules 400a-e, 500a-e of FIGS. 4 and 5, respectively). In some implementations, a threshold effectiveness level can be identified whereby if a hyperplane classifier does not meet the threshold during validation, it is not used for predicting classifications for unclassified datasets. For example, if a hyperplane classifier is validated as being correct less than 50% of the time, the classification associated with the hyperplane classifier is incorrect more often than it is correct. In some implementations, such a hyperplane classifier can be discarded as misleading with respect to the final classification prediction.
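Optional stages 850 and 855 can be sketched as scoring each hyperplane classifier on its own out-of-bag tuples and keeping only those that reach the threshold. In this illustration a classifier is any callable that maps a tuple to a label; the function names, the sample data and the treatment of an empty out-of-bag set are assumptions.

```python
def oob_accuracy(classifier, oob_rows, oob_labels):
    """Fraction of out-of-bag tuples the classifier labels correctly."""
    if not oob_rows:
        return 0.0  # assumption: no out-of-bag data means no evidence of effectiveness
    correct = sum(1 for x, y in zip(oob_rows, oob_labels) if classifier(x) == y)
    return correct / len(oob_rows)


def keep_effective(classifiers, oob_sets, threshold=0.5):
    """Discard classifiers validated as correct less often than the threshold."""
    kept = []
    for classifier, (rows, labels) in zip(classifiers, oob_sets):
        accuracy = oob_accuracy(classifier, rows, labels)
        if accuracy >= threshold:
            kept.append((classifier, accuracy))
    return kept


def good(x):
    return 1 if x[0] >= 0 else -1


def misleading(x):
    return -1 if x[0] >= 0 else 1


oob = ([[1.0], [-1.0], [2.0], [-2.0]], [1, -1, 1, -1])
kept = keep_effective([good, misleading], [oob, oob])
```

The accuracy retained alongside each surviving classifier could also serve as the weight used when the predictions are aggregated.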
The method ends at stage 860. The method can be used to efficiently derive a plurality of hyperplane classifiers associated with a training dataset by distributing the granules for parallel and/or independent processing. Moreover, inaccurate hyperplane classifiers can be discarded in some implementations.
In various implementations of the above description, message filters can forward, drop, quarantine, delay delivery, or specify messages for more detailed testing. In some implementations, the messages can be delayed to facilitate collection of additional information related to the message.
The systems and methods disclosed herein may use data signals conveyed using networks (e.g., local area network, wide area network, internet, etc.), fiber optic medium, carrier waves, wireless networks (e.g., wireless local area networks, wireless metropolitan area networks, cellular networks, etc.), etc. for communication with one or more data processing devices (e.g., mobile devices). The data signals can carry any or all of the data disclosed herein that is provided to or from a device.
The methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by one or more processors. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform methods described herein.
The systems and methods may be provided on many different types of computer-readable media including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, computer's hard drive, etc.) that contain instructions for use in execution by a processor to perform the methods' operations and implement the systems described herein.
The computer components, software modules, functions and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that software instructions or a module can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code or firmware. The software components and/or functionality may be located on a single device or distributed across multiple devices depending upon the situation at hand.
This written description sets forth the best mode of the invention and provides examples to describe the invention and to enable a person of ordinary skill in the art to make and use the invention. This written description does not limit the invention to the precise terms set forth. Thus, while the invention has been described in detail with reference to the examples set forth above, those of ordinary skill in the art may effect alterations, modifications and variations to the examples without departing from the scope of the invention.
As used in the description herein and throughout the claims that follow, the meaning of "a," "an," and "the" includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of "and" and "or" include both the conjunctive and disjunctive and may be used interchangeably unless the context clearly dictates otherwise.
Ranges may be expressed herein as from "about" one particular value, and/or to "about" another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent "about," it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. These and other implementations are within the scope of the following claims.

Claims

What is claimed is:
1. A method comprising: receiving a training dataset comprising a plurality of tuples and a plurality of attributes for each of the tuples; deriving a plurality of granules from the training dataset, each granule comprising a plurality of sample tuples and a plurality of sample attributes; processing the granules using a support vector machine process to identify a hyperplane classifier associated with each of the granules; predicting a classification of a new tuple using each of the hyperplane classifiers to produce a plurality of predictions; aggregating the predictions to derive a decision on a final classification of the new tuple; and filtering a communication associated with the new tuple based upon the final classification of the new tuple.
2. The method of claim 1, wherein deriving each granule comprises: randomly selecting granule tuples from among the plurality of tuples with replacement; and randomly selecting granule attributes from the plurality of attributes without replacement.
3. The method of claim 2, wherein the selection of granule tuples and granule attributes for each granule is independent of the selection of granule tuples and granule attributes for other granules.
4. The method of claim 1, wherein the training dataset comprises a relational database of tuples with associated attributes, each of the tuples having a known classification.
5. The method of claim 1, further comprising validating a hyperplane classifier associated with a granule by attempting to classify a plurality of tuples from the training dataset which were not included in the granule.
6. The method of claim 5, further comprising generating a hyperplane classifier effectiveness level based upon the validation of the granule against tuples from the training dataset which were not included in the granule.
7. The method of claim 5, wherein aggregating the predicted classifications comprises weighting the predictions based upon the hyperplane classifier effectiveness levels associated with the granules, respectively, and aggregating the weighted predictions.
8. The method of claim 1, further comprising: weighting the predictions based upon a distance of the new tuple from the hyperplane classifiers, respectively; and aggregating the weighted predictions.
9. The method of claim 1, wherein each of the predictions comprises a vote, and aggregating the predictions comprises adding the votes together and determining which classification is most common.
10. The method of claim 1, wherein the tuples are gene sequences and the attributes are features of the gene sequences, whereby the method is operable to determine whether a gene sequence is likely to share known characteristics of other gene sequences.
11. The method of claim 1, wherein the tuples are documents and the attributes are features of the documents, whereby the method is operable to determine whether a document is likely to share known characteristics of other documents.
12. The method of claim 1, wherein the known characteristics comprise one or more of spam characteristics, virus characteristics, spyware characteristics, or phishing characteristics, and the method is operable to determine whether the new tuple should be classified as including one or more of the known characteristics.
13. The method of claim 1, wherein processing the granules comprises processing the granules on a plurality of processors in parallel, the processors being operable to identify support vector machines associated.
14. The method of claim 13, further comprising selecting the plurality of processors based upon the processing power available on the respective processors.
15. A system comprising: a granule selection module operable to select a plurality of granules from a training dataset, each of the granules comprising a plurality of tuples and a plurality of attributes; a plurality of granule processing modules operable to process granules using a support vector machine process to identify a hyperplane classifier associated with each of the granules; one or more prediction modules operable to predict a classification associated with an unknown tuple based upon the hyperplane classifiers to produce a plurality of granule predictions; an aggregation module operable to aggregate the granule predictions to derive a decision on a final classification associated with the unknown tuple; a message filter operable to filter a communication associated with the unknown tuple based upon the final classification of the unknown tuple.
16. The system of claim 15, wherein the unknown tuple comprises the features of an unclassified document, and the system further comprises a parsing module operable to parse the unclassified document to derive a plurality of unclassified attributes associated with the unknown tuple.
17. The system of claim 16, wherein the one or more prediction modules are operable to extract a portion of the unclassified attributes based upon the granule and the hyperplane classifier, and is operable to compare the unclassified attributes to the hyperplane classifier to derive the prediction associated with the granule.
18. The system of claim 17, wherein the one or more prediction modules are operable to generate a prediction for each of the hyperplane classifiers to produce the plurality of granule predictions.
19. The system of claim 18, wherein the aggregation module is operable to count each of the predictions as a vote, and to derive the final prediction based upon which of the classifications accumulates the most votes.
20. The system of claim 15, further comprising: a validation module operable to test the hyperplane classifiers associated with respective granules on tuples that are not part of the respective granules, thereby producing a plurality of effectiveness metrics respectively associated with the hyperplane classifiers; and wherein the aggregation module is operable to weight the predictions based upon the effectiveness metrics respectively associated with the hyperplane classifiers.
21. The system of claim 15, wherein each prediction includes a distance metric from the hyperplane classifier, and the aggregation module is operable to weight each prediction based upon the associated distance metric.
22. The system of claim 15, wherein the plurality of granule processing modules are operable to be processed independently.
23. The system of claim 15, wherein the plurality of granule processing modules are operable to be processed in parallel.
24. The system of claim 15, wherein the plurality of granule processing modules are operable to be executed by separate processors.
25. The system of claim 15, wherein the system is operable to classify a tuple as one or more of a spam risk, a phishing risk, a virus risk, or a spyware risk.
PCT/US2009/031853 2008-01-25 2009-01-23 Granular support vector machine with random granularity WO2009094552A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/020,253 US8160975B2 (en) 2008-01-25 2008-01-25 Granular support vector machine with random granularity
US12/020,253 2008-01-25

Publications (2)

Publication Number Publication Date
WO2009094552A2 (en) 2009-07-30
WO2009094552A3 (en) 2009-11-05

Family

ID=40900223

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/031853 WO2009094552A2 (en) 2008-01-25 2009-01-23 Granular support vector machine with random granularity

Country Status (2)

Country Link
US (1) US8160975B2 (en)
WO (1) WO2009094552A2 (en)

US6061722A (en) 1996-12-23 2000-05-09 T E Network, Inc. Assessing network performance without interference with normal network operations
US5898836A (en) * 1997-01-14 1999-04-27 Netmind Services, Inc. Change-detection tool indicating degree and location of change of internet documents by comparison of cyclic-redundancy-check(CRC) signatures
US5978799A (en) 1997-01-30 1999-11-02 Hirsch; G. Scott Search engine including query database, user profile database, information templates and email facility
US5896499A (en) * 1997-02-21 1999-04-20 International Business Machines Corporation Embedded security processor
US6539430B1 (en) * 1997-03-25 2003-03-25 Symantec Corporation System and method for filtering data received by a computer system
US6061448A (en) 1997-04-01 2000-05-09 Tumbleweed Communications Corp. Method and system for dynamic server document encryption
TW396308B (en) 1997-04-01 2000-07-01 Tumbleweed Software Corp Document delivery system
US6108786A (en) 1997-04-25 2000-08-22 Intel Corporation Monitor network bindings for computer security
US5958005A (en) 1997-07-17 1999-09-28 Bell Atlantic Network Services, Inc. Electronic mail security
US7162738B2 (en) * 1998-11-03 2007-01-09 Tumbleweed Communications Corp. E-mail firewall with stored key encryption/decryption
US7117358B2 (en) 1997-07-24 2006-10-03 Tumbleweed Communications Corp. Method and system for filtering communication
AU8759098A (en) * 1997-07-24 1999-02-16 Tumbleweed Communications Corporation E-mail firewall with stored key encryption/decryption
US6006329A (en) 1997-08-11 1999-12-21 Symantec Corporation Detection of computer viruses spanning multiple data streams
US6199102B1 (en) * 1997-08-26 2001-03-06 Christopher Alan Cobb Method and system for filtering electronic messages
US6119230A (en) 1997-10-01 2000-09-12 Novell, Inc. Distributed dynamic security capabilities
EP0907120A3 (en) 1997-10-02 2004-03-24 Tumbleweed Software Corporation Method and apparatus for delivering documents over an electronic network
US6393568B1 (en) 1997-10-23 2002-05-21 Entrust Technologies Limited Encryption and decryption system and method with content analysis provision
US6003027A (en) 1997-11-21 1999-12-14 International Business Machines Corporation System and method for determining confidence levels for the results of a categorization system
US6094731A (en) 1997-11-24 2000-07-25 Symantec Corporation Antivirus accelerator for computer networks
US6393465B2 (en) 1997-11-25 2002-05-21 Nixmail Corporation Junk electronic mail detector and eliminator
US5860068A (en) * 1997-12-04 1999-01-12 Petabyte Corporation Method and system for custom manufacture and delivery of a data product
US6202157B1 (en) * 1997-12-08 2001-03-13 Entrust Technologies Limited Computer network security system and method having unilateral enforceable security policy provision
US6023723A (en) * 1997-12-22 2000-02-08 Accepted Marketing, Inc. Method and system for filtering unwanted junk e-mail utilizing a plurality of filtering mechanisms
US6052709A (en) * 1997-12-23 2000-04-18 Bright Light Technologies, Inc. Apparatus and method for controlling delivery of unsolicited electronic mail
US6035423A (en) 1997-12-31 2000-03-07 Network Associates, Inc. Method and system for providing automated updating and upgrading of antivirus applications using a computer network
US6279133B1 (en) 1997-12-31 2001-08-21 Kawasaki Steel Corporation Method and apparatus for significantly improving the reliability of multilevel memory architecture
US6029256A (en) * 1997-12-31 2000-02-22 Network Associates, Inc. Method and system for allowing computer programs easy access to features of a virus scanning engine
US5999932A (en) 1998-01-13 1999-12-07 Bright Light Technologies, Inc. System and method for filtering unsolicited electronic mail messages using data matching and heuristic processing
CA2228687A1 (en) * 1998-02-04 1999-08-04 Brett Howard Secured virtual private networks
US6279113B1 (en) 1998-03-16 2001-08-21 Internet Tools, Inc. Dynamic signature inspection-based network intrusion detection
US6092114A (en) 1998-04-17 2000-07-18 Siemens Information And Communication Networks, Inc. Method and system for determining the location for performing file-format conversions of electronics message attachments
US6145083A (en) 1998-04-23 2000-11-07 Siemens Information And Communication Networks, Inc. Methods and system for providing data and telephony security
US6104500A (en) 1998-04-29 2000-08-15 Bcl, Computer Inc. Networked fax routing via email
US6298445B1 (en) 1998-04-30 2001-10-02 Netect, Ltd. Computer security
JP3017712B2 (en) 1998-05-15 2000-03-13 松下電送システム株式会社 Internet facsimile
US6275942B1 (en) 1998-05-20 2001-08-14 Network Associates, Inc. System, method and computer program product for automatic response to computer system misuse using active response modules
US6058482A (en) 1998-05-22 2000-05-02 Sun Microsystems, Inc. Apparatus, method and system for providing network security for executable code in computer and communications networks
US6330589B1 (en) 1998-05-26 2001-12-11 Microsoft Corporation System and method for using a client database to manage conversation threads generated from email or news messages
US6289214B1 (en) 1998-05-29 2001-09-11 Ericsson Inc. Systems and methods for deactivating a cellular radiotelephone system using an ANSI-41 short message service email
US6347374B1 (en) * 1998-06-05 2002-02-12 Intrusion.Com, Inc. Event detection
WO1999066383A2 (en) * 1998-06-15 1999-12-23 Dmw Worldwide, Inc. Method and apparatus for assessing the security of a computer system
US6317829B1 (en) 1998-06-19 2001-11-13 Entrust Technologies Limited Public key cryptography based security system to facilitate secure roaming of users
US6161130A (en) 1998-06-23 2000-12-12 Microsoft Corporation Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set
US6185689B1 (en) * 1998-06-24 2001-02-06 Richard S. Carson & Assoc., Inc. Method for network self security assessment
US6141778A (en) 1998-06-29 2000-10-31 Mci Communications Corporation Method and apparatus for automating security functions in a computer system
US6324656B1 (en) 1998-06-30 2001-11-27 Cisco Technology, Inc. System and method for rules-driven multi-phase network vulnerability assessment
US6442686B1 (en) 1998-07-02 2002-08-27 Networks Associates Technology, Inc. System and methodology for messaging server-based management and enforcement of crypto policies
US6269447B1 (en) 1998-07-21 2001-07-31 Raytheon Company Information security analysis system
US6151675A (en) 1998-07-23 2000-11-21 Tumbleweed Software Corporation Method and apparatus for effecting secure document format conversion
US6223213B1 (en) * 1998-07-31 2001-04-24 Webtv Networks, Inc. Browser-based email system with user interface for audio/video capture
US6304973B1 (en) 1998-08-06 2001-10-16 Cryptek Secure Communications, Llc Multi-level security network system
US6442588B1 (en) 1998-08-20 2002-08-27 At&T Corp. Method of administering a dynamic filtering firewall
US6324569B1 (en) 1998-09-23 2001-11-27 John W. L. Ogilvie Self-removing email verified or designated as such by a message distributor for the convenience of a recipient
US6460141B1 (en) 1998-10-28 2002-10-01 Rsa Security Inc. Security and access management system for web-enabled and non-web-enabled applications and content on a computer network
US6260043B1 (en) 1998-11-06 2001-07-10 Microsoft Corporation Automatic file format converter
US6282565B1 (en) 1998-11-17 2001-08-28 Kana Communications, Inc. Method and apparatus for performing enterprise email management
US6249807B1 (en) 1998-11-17 2001-06-19 Kana Communications, Inc. Method and apparatus for performing enterprise email management
US6272532B1 (en) 1998-12-02 2001-08-07 Harold F. Feinleib Electronic reminder system with universal email input
US6370648B1 (en) * 1998-12-08 2002-04-09 Visa International Service Association Computer network intrusion detection
US6546416B1 (en) 1998-12-09 2003-04-08 Infoseek Corporation Method and system for selectively blocking delivery of bulk electronic mail
US6550012B1 (en) 1998-12-11 2003-04-15 Network Associates, Inc. Active firewall system and methodology
US6249575B1 (en) 1998-12-11 2001-06-19 Securelogix Corporation Telephony security system
US6574737B1 (en) 1998-12-23 2003-06-03 Symantec Corporation System for penetrating computer or computer network
US6118856A (en) 1998-12-28 2000-09-12 Nortel Networks Corporation Method and apparatus for automatically forwarding an email message or portion thereof to a remote device
US6301668B1 (en) 1998-12-29 2001-10-09 Cisco Technology, Inc. Method and system for adaptive network security using network vulnerability assessment
JP2002535884A (en) * 1999-01-14 2002-10-22 タンブルウィード コミュニケーションズ コーポレイション Distribution of web-based secure email messages
US6487666B1 (en) 1999-01-15 2002-11-26 Cisco Technology, Inc. Intrusion detection signature analysis using regular expressions and logical operators
US20030023695A1 (en) * 1999-02-26 2003-01-30 Atabok Japan, Inc. Modifying an electronic mail system to produce a secure delivery system
US6405318B1 (en) 1999-03-12 2002-06-11 Psionic Software, Inc. Intrusion detection system
US6988199B2 (en) * 2000-07-07 2006-01-17 Message Secure Secure and reliable document delivery
US6324647B1 (en) 1999-08-31 2001-11-27 Michel K. Bowman-Amuah System, method and article of manufacture for security management in a development architecture framework
US6725381B1 (en) * 1999-08-31 2004-04-20 Tumbleweed Communications Corp. Solicited authentication of a specific user
US6304898B1 (en) 1999-10-13 2001-10-16 Datahouse, Inc. Method and system for creating and sending graphical email
US7363361B2 (en) 2000-08-18 2008-04-22 Akamai Technologies, Inc. Secure content delivery system
US6321267B1 (en) 1999-11-23 2001-11-20 Escom Corporation Method and apparatus for filtering junk email
US6363489B1 (en) * 1999-11-29 2002-03-26 Forescout Technologies Inc. Method for automatic intrusion detection and deflection in a network
US6343290B1 (en) * 1999-12-22 2002-01-29 Celeritas Technologies, L.L.C. Geographic network management system
AU2293601A (en) * 1999-12-30 2001-07-16 Tumbleweed Communications Corp. Sender-controlled post delivery handling of digitally delivered documents
IL134066A (en) 2000-01-16 2004-07-25 Eluv Holdings Ltd Key encrypted e-mail system
US20020016910A1 (en) * 2000-02-11 2002-02-07 Wright Robert P. Method for secure distribution of documents over electronic networks
US7159237B2 (en) 2000-03-16 2007-01-02 Counterpane Internet Security, Inc. Method and system for dynamic network intrusion monitoring, detection and response
US6826609B1 (en) * 2000-03-31 2004-11-30 Tumbleweed Communications Corp. Policy enforcement in a secure data file delivery system
US6519703B1 (en) * 2000-04-14 2003-02-11 James B. Joyce Methods and apparatus for heuristic firewall
US20030159070A1 (en) 2001-05-28 2003-08-21 Yaron Mayer System and method for comprehensive general generic protection for computers against malicious programs that may steal information and/or cause damages
JP2002056176A (en) 2000-06-01 2002-02-20 Asgent Inc Method and device for structuring security policy and method and device for supporting security policy structuring
US6892179B1 (en) 2000-06-02 2005-05-10 Open Ratings Inc. System and method for ascribing a reputation to an entity
US6892178B1 (en) 2000-06-02 2005-05-10 Open Ratings Inc. Method and system for ascribing a reputation to an entity from the perspective of another entity
US6895385B1 (en) 2000-06-02 2005-05-17 Open Ratings Method and system for ascribing a reputation to an entity as a rater of other entities
US20020023140A1 (en) * 2000-06-08 2002-02-21 Hile John K. Electronic document delivery system
US20030061506A1 (en) * 2001-04-05 2003-03-27 Geoffrey Cooper System and method for security policy
US7328349B2 (en) 2001-12-14 2008-02-05 Bbn Technologies Corp. Hash-based systems and methods for detecting, preventing, and tracing network worms and viruses
US20020046041A1 (en) * 2000-06-23 2002-04-18 Ken Lang Automated reputation/trust service
CA2410522C (en) 2000-06-30 2010-01-26 Andrea Soppera Packet data communications
US8661539B2 (en) 2000-07-10 2014-02-25 Oracle International Corporation Intrusion threat detection
AU2001283231A1 (en) * 2000-08-08 2002-02-18 Tumbleweed Communications Corp. Recipient-specified automated processing in a secure data file delivery system
AU2001281218A1 (en) * 2000-08-08 2002-02-18 Tumbleweed Communications Corp. Recipient-specified automated processing in a secure data file delivery system
US20020049853A1 (en) * 2000-08-16 2002-04-25 Tan-Na Chu End-to-end secure file transfer method and system
US7278159B2 (en) 2000-09-07 2007-10-02 Mazu Networks, Inc. Coordinated thwarting of denial of service attacks
US7043759B2 (en) * 2000-09-07 2006-05-09 Mazu Networks, Inc. Architecture to thwart denial of service attacks
US20020032871A1 (en) * 2000-09-08 2002-03-14 The Regents Of The University Of Michigan Method and system for detecting, tracking and blocking denial of service attacks over a computer network
US6650890B1 (en) * 2000-09-29 2003-11-18 Postini, Inc. Value-added electronic messaging services and transparent implementation thereof using intermediate server
US20030097439A1 (en) 2000-10-23 2003-05-22 Strayer William Timothy Systems and methods for identifying anomalies in network data streams
US20020078382A1 (en) 2000-11-29 2002-06-20 Ali Sheikh Scalable system for monitoring network system and components and methodology therefore
ATE344573T1 (en) 2000-11-30 2006-11-15 Lancope Inc FLOW-BASED NETWORK INTRUSION DETECTION
CA2327211A1 (en) 2000-12-01 2002-06-01 Nortel Networks Limited Management of log archival and reporting for data network security systems
US7818249B2 (en) 2001-01-02 2010-10-19 Verizon Patent And Licensing Inc. Object-oriented method, system and medium for risk management by creating inter-dependency between objects, criteria and metrics
GB2371125A (en) 2001-01-13 2002-07-17 Secr Defence Computer protection system
US20030051026A1 (en) * 2001-01-19 2003-03-13 Carter Ernst B. Network surveillance and security system
US7168093B2 (en) 2001-01-25 2007-01-23 Solutionary, Inc. Method and apparatus for verifying the integrity and security of computer networks and implementation of counter measures
US6983380B2 (en) 2001-02-06 2006-01-03 Networks Associates Technology, Inc. Automatically generating valid behavior specifications for intrusion detection
US7281267B2 (en) 2001-02-20 2007-10-09 Mcafee, Inc. Software audit system
US20020120853A1 (en) 2001-02-27 2002-08-29 Networks Associates Technology, Inc. Scripted distributed denial-of-service (DDoS) attack discrimination using turing tests
US20020143963A1 (en) 2001-03-15 2002-10-03 International Business Machines Corporation Web server intrusion detection method and apparatus
US7313822B2 (en) * 2001-03-16 2007-12-25 Protegrity Corporation Application-layer security method and system
US20020133365A1 (en) 2001-03-19 2002-09-19 William Grey System and method for aggregating reputational information
US7287280B2 (en) 2002-02-12 2007-10-23 Goldman Sachs & Co. Automated security management
US20020138759A1 (en) 2001-03-26 2002-09-26 International Business Machines Corporation System and method for secure delivery of a parcel or document
US20020147734A1 (en) 2001-04-06 2002-10-10 Shoup Randall Scott Archiving method and system
CN101567889B (en) 2001-04-13 2014-01-08 诺基亚公司 System and method for providing protection for networks
US6941478B2 (en) 2001-04-13 2005-09-06 Nokia, Inc. System and method for providing exploit protection with message tracking
US7603709B2 (en) 2001-05-03 2009-10-13 Computer Associates Think, Inc. Method and apparatus for predicting and preventing attacks in communications networks
US7769845B2 (en) 2001-05-04 2010-08-03 Whale Communications Ltd Method and system for terminating an authentication session upon user sign-off
US20030055931A1 (en) * 2001-09-18 2003-03-20 Cravo De Almeida Marcio Managing a remote device
CA2386491A1 (en) 2001-05-16 2002-11-16 Kasten Chase Applied Research Limited System for secure electronic information transmission
US7325252B2 (en) * 2001-05-18 2008-01-29 Achilles Guard Inc. Network security testing
US20030028803A1 (en) * 2001-05-18 2003-02-06 Bunker Nelson Waldo Network vulnerability assessment system and method
US20020178227A1 (en) 2001-05-25 2002-11-28 International Business Machines Corporation Routing instant messages using configurable, pluggable delivery managers
US7458094B2 (en) 2001-06-06 2008-11-25 Science Applications International Corporation Intrusion prevention system
US7350234B2 (en) * 2001-06-11 2008-03-25 Research Triangle Institute Intrusion tolerant communication networks and associated methods
US7308715B2 (en) * 2001-06-13 2007-12-11 Mcafee, Inc. Protocol-parsing state machine and method of using same
DE60135449D1 (en) 2001-06-14 2008-10-02 Ibm Intrusion detection in data processing systems
DE60220214T2 (en) * 2001-06-29 2008-01-24 Stonesoft Corp. Method and system for detecting intruders
US20030005326A1 (en) * 2001-06-29 2003-01-02 Todd Flemming Method and system for implementing a security application services provider
US6928549B2 (en) * 2001-07-09 2005-08-09 International Business Machines Corporation Dynamic intrusion detection for computer systems
US7356689B2 (en) * 2001-07-09 2008-04-08 Lucent Technologies Inc. Method and apparatus for tracing packets in a communications network
US7380279B2 (en) * 2001-07-16 2008-05-27 Lenel Systems International, Inc. System for integrating security and access for facilities and information systems
US7673342B2 (en) * 2001-07-26 2010-03-02 Mcafee, Inc. Detecting e-mail propagated malware
JP2003046576A (en) * 2001-07-27 2003-02-14 Fujitsu Ltd Message delivery system, message delivery management server, message distribution management program, and computer-readable recording medium with the program recorded thereon
US7243374B2 (en) * 2001-08-08 2007-07-10 Microsoft Corporation Rapid application security threat analysis
US7245632B2 (en) * 2001-08-10 2007-07-17 Sun Microsystems, Inc. External storage for modular computer systems
US7657935B2 (en) 2001-08-16 2010-02-02 The Trustees Of Columbia University In The City Of New York System and methods for detecting malicious email transmission
US7278160B2 (en) * 2001-08-16 2007-10-02 International Business Machines Corporation Presentation of correlated events as situation classes
US20030051163A1 (en) * 2001-09-13 2003-03-13 Olivier Bidaud Distributed network architecture security system
US20030065943A1 (en) 2001-09-28 2003-04-03 Christoph Geis Method and apparatus for recognizing and reacting to denial of service attacks on a computerized network
US8261059B2 (en) 2001-10-25 2012-09-04 Verizon Business Global Llc Secure file transfer and secure file transfer protocol
US20030084323A1 (en) 2001-10-31 2003-05-01 Gales George S. Network intrusion detection system and method
US20030135749A1 (en) 2001-10-31 2003-07-17 Gales George S. System and method of defining the security vulnerabilities of a computer system
US7444679B2 (en) 2001-10-31 2008-10-28 Hewlett-Packard Development Company, L.P. Network, method and computer readable medium for distributing security updates to select nodes on a network
JP2003150748A (en) 2001-11-09 2003-05-23 Asgent Inc Risk evaluation method
US20030093695A1 (en) 2001-11-13 2003-05-15 Santanu Dutta Secure handling of stored-value data objects
US7315944B2 (en) 2001-11-13 2008-01-01 Ericsson Inc. Secure handling of stored-value data objects
US20030095555A1 (en) 2001-11-16 2003-05-22 Mcnamara Justin System for the validation and routing of messages
US7487262B2 (en) 2001-11-16 2009-02-03 At & T Mobility Ii, Llc Methods and systems for routing messages through a communications network based on message content
US6546493B1 (en) 2001-11-30 2003-04-08 Networks Associates Technology, Inc. System, method and computer program product for risk assessment scanning based on detected anomalous events
US20030126464A1 (en) 2001-12-04 2003-07-03 Mcdaniel Patrick D. Method and system for determining and enforcing security policy in a communication session
US20030110392A1 (en) 2001-12-06 2003-06-12 Aucsmith David W. Detecting intrusions
KR100427449B1 (en) 2001-12-14 2004-04-14 한국전자통신연구원 Intrusion detection method using adaptive rule estimation in nids
US6754705B2 (en) 2001-12-21 2004-06-22 Networks Associates Technology, Inc. Enterprise network analyzer architecture framework
US7096500B2 (en) 2001-12-21 2006-08-22 Mcafee, Inc. Predictive malware scanning of internet data
US7400729B2 (en) 2001-12-28 2008-07-15 Intel Corporation Secure delivery of encrypted digital content
BR0215388A (en) 2001-12-31 2004-12-07 Citadel Security Software Inc Method and system for resolving vulnerabilities in a computer, and, readable by computer
JP4152108B2 (en) 2002-01-18 2008-09-17 株式会社コムスクエア Vulnerability monitoring method and system
US7076803B2 (en) 2002-01-28 2006-07-11 International Business Machines Corporation Integrated intrusion detection services
US7222366B2 (en) 2002-01-28 2007-05-22 International Business Machines Corporation Intrusion event filtering
US7268899B2 (en) 2002-01-31 2007-09-11 Hewlett-Packard Development Company, L.P. Secure system for delivery of a fax to a remote user
US20030149887A1 (en) 2002-02-01 2003-08-07 Satyendra Yadav Application-specific network intrusion detection
US7174566B2 (en) 2002-02-01 2007-02-06 Intel Corporation Integrated network intrusion detection
US8370936B2 (en) 2002-02-08 2013-02-05 Juniper Networks, Inc. Multi-method gateway-based network security systems and methods
US7073074B2 (en) 2002-02-13 2006-07-04 Microsoft Corporation System and method for storing events to enhance intrusion detection
KR100468232B1 (en) 2002-02-19 2005-01-26 한국전자통신연구원 Network-based Attack Tracing System and Method Using Distributed Agent and Manager Systems
WO2003071390A2 (en) 2002-02-19 2003-08-28 Postini Corporation E-mail management services
US6941467B2 (en) 2002-03-08 2005-09-06 Ciphertrust, Inc. Systems and methods for adaptive message interrogation through multiple queues
US7694128B2 (en) 2002-03-08 2010-04-06 Mcafee, Inc. Systems and methods for secure communication delivery
US7458098B2 (en) 2002-03-08 2008-11-25 Secure Computing Corporation Systems and methods for enhancing electronic communication security
US7124438B2 (en) 2002-03-08 2006-10-17 Ciphertrust, Inc. Systems and methods for anomaly detection in patterns of monitored communications
US7096498B2 (en) 2002-03-08 2006-08-22 Cipher Trust, Inc. Systems and methods for message threat management
AUPS193202A0 (en) 2002-04-23 2002-05-30 Pickup, Robert Barkley Mr A method and system for authorising electronic mail
WO2003092217A1 (en) 2002-04-23 2003-11-06 Patentek, Inc. Method and system for securely communicating data in a communications network
US20040203589A1 (en) 2002-07-11 2004-10-14 Wang Jiwei R. Method and system for controlling messages in a communication network
US8924484B2 (en) * 2002-07-16 2014-12-30 Sonicwall, Inc. Active e-mail filter with challenge-response
US7017186B2 (en) * 2002-07-30 2006-03-21 Steelcloud, Inc. Intrusion detection system using self-organizing clusters
JP3831696B2 (en) * 2002-09-20 2006-10-11 株式会社日立製作所 Network management apparatus and network management method
US7200658B2 (en) * 2002-11-12 2007-04-03 Movielink, Llc Network geo-location system
US20040111531A1 (en) 2002-12-06 2004-06-10 Stuart Staniford Method and system for reducing the rate of infection of a communications network by a software worm
US7171450B2 (en) 2003-01-09 2007-01-30 Microsoft Corporation Framework to enable integration of anti-spam technologies
US7543053B2 (en) * 2003-03-03 2009-06-02 Microsoft Corporation Intelligent quarantining for spam prevention
US20040177120A1 (en) 2003-03-07 2004-09-09 Kirsch Steven T. Method for filtering e-mail messages
US7676546B2 (en) 2003-03-25 2010-03-09 Verisign, Inc. Control and management of electronic messaging
GB0307913D0 (en) * 2003-04-05 2003-05-14 Hewlett Packard Development Co Management of peer-to-peer network using reputation services
US7051077B2 (en) 2003-06-30 2006-05-23 Mx Logic, Inc. Fuzzy logic voting method and system for classifying e-mail using inputs from multiple spam classifiers
US7769594B2 (en) * 2003-09-05 2010-08-03 France Telecom Evaluation of reputation of an entity by a primary evaluation centre
US20050102366A1 (en) 2003-11-07 2005-05-12 Kirsch Steven T. E-mail filter employing adaptive ruleset
US7644127B2 (en) 2004-03-09 2010-01-05 Gozoom.Com, Inc. Email analysis using fuzzy matching of text
US8918466B2 (en) 2004-03-09 2014-12-23 Tonny Yu System for email processing and analysis
WO2005116851A2 (en) 2004-05-25 2005-12-08 Postini, Inc. Electronic message source information reputation system
KR100628623B1 (en) 2004-08-02 2006-09-26 포스데이타 주식회사 Spam mail filtering system and method capable of recognizing and filtering spam mail in real time
US7933985B2 (en) * 2004-08-13 2011-04-26 Sipera Systems, Inc. System and method for detecting and preventing denial of service attacks in a communications system
US8010460B2 (en) * 2004-09-02 2011-08-30 Linkedin Corporation Method and system for reputation evaluation of online users in a social networking scheme
US20060095404A1 (en) 2004-10-29 2006-05-04 The Go Daddy Group, Inc Presenting search engine results based on domain name related reputation
US20060123083A1 (en) 2004-12-03 2006-06-08 Xerox Corporation Adaptive spam message detector
US20060230039A1 (en) 2005-01-25 2006-10-12 Markmonitor, Inc. Online identity tracking
US7519563B1 (en) * 2005-02-07 2009-04-14 Sun Microsystems, Inc. Optimizing subset selection to facilitate parallel training of support vector machines
US20060212931A1 (en) 2005-03-02 2006-09-21 Markmonitor, Inc. Trust evaluation systems and methods
US7822620B2 (en) 2005-05-03 2010-10-26 Mcafee, Inc. Determining website reputations using automatic testing
US7873583B2 (en) * 2007-01-19 2011-01-18 Microsoft Corporation Combining resilient classifiers
KR100996311B1 (en) * 2007-09-27 2010-11-23 야후! 인크. Method and system for detecting spam user created content (UCC)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6662170B1 (en) * 2000-08-22 2003-12-09 International Business Machines Corporation System and method for boosting support vector machines
US20060112026A1 (en) * 2004-10-29 2006-05-25 Nec Laboratories America, Inc. Parallel support vector method and apparatus
US20070239642A1 (en) * 2006-03-31 2007-10-11 Yahoo!, Inc. Large scale semi-supervised linear support vector machines

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUCHUN TANG: 'GRANULAR SUPPORT VECTOR MACHINES BASED ON GRANULAR COMPUTING' SOFT COMPUTING AND STATISTICAL LEARNING May 2006, pages 15 - 16, 107-110 *

Also Published As

Publication number Publication date
WO2009094552A3 (en) 2009-11-05
US8160975B2 (en) 2012-04-17
US20090192955A1 (en) 2009-07-30

Similar Documents

Publication Publication Date Title
US8160975B2 (en) Granular support vector machine with random granularity
US10200393B2 (en) Selecting representative metrics datasets for efficient detection of anomalous data
US10846052B2 (en) Community discovery method, device, server and computer storage medium
US8069210B2 (en) Graph based bot-user detection
US10902062B1 (en) Artificial intelligence system providing dimension-level anomaly score attributions for streaming data
Harenberg et al. Community detection in large‐scale networks: a survey and empirical evaluation
Filkov et al. Integrating microarray data by consensus clustering
Chen et al. Eigen-optimization on large graphs by edge manipulation
Logeswari et al. An intrusion detection system for sdn using machine learning
US8700640B2 (en) System or apparatus for finding influential users
US10963463B2 (en) Methods for stratified sampling-based query execution
CN107357902A (en) Data table classification system and method based on association rules
Ferriyan et al. Feature selection using genetic algorithm to improve classification in network intrusion detection system
WO2021109724A1 (en) Log anomaly detection method and apparatus
Doyle et al. Predicting complex user behavior from CDR based social networks
Canbek et al. New techniques in profiling big datasets for machine learning with a concise review of android mobile malware datasets
Nguyen et al. Detecting rumours with latency guarantees using massive streaming data
Carvalho et al. Survey on privacy-preserving techniques for data publishing
Kwon et al. Data-oob: out-of-bag estimate as a simple and efficient data value
US20220277219A1 (en) Systems and methods for machine learning data generation and visualization
Picado et al. Survivability of cloud databases-factors and prediction
CN112612832A (en) Node analysis method, device, equipment and storage medium
Kreačić et al. Differentially private synthetic data using KD-trees
Han et al. UFTR: A unified framework for ticket routing
Dickens et al. Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 09703387; Country of ref document: EP; Kind code of ref document: A2

NENP Non-entry into the national phase. Ref country code: DE

122 Ep: PCT application non-entry in European phase. Ref document number: 09703387; Country of ref document: EP; Kind code of ref document: A2