Machine Learning
Models
Below is the complete list of machine learning models in the Patrol product:
- Auxiliary Models: this collection of models supports the downstream classifiers available to users
- FINRA Customer Response
Lifecycle
Evaluation
Proofpoint defines evaluation metrics for each ML classifier. For classifiers that output binary predictions, precision and recall are used as evaluation metrics in the model training and evaluation processes.
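As an illustration of the two metrics named above, the following is a minimal sketch of how precision and recall can be computed for binary predictions. The function name and the toy labels are hypothetical and are not taken from Proofpoint's evaluation code.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged in error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data for illustration only (not real evaluation results).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 1, 1, 0, 0, 0, 1, 0]
precision, recall = precision_recall(labels, predictions)
print(precision, recall)
```

Precision is the share of flagged items that were correct; recall is the share of truly positive items that were flagged.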
Training
ML classifiers are trained iteratively. Model selection and hyperparameter tuning are performed jointly and are optimized against the classifier evaluation metrics (i.e., precision and recall) on the validation set. False positives and false negatives in the training and validation sets are analyzed, additional labeled data are collected for each set, and model training repeats until acceptable precision and recall scores are reached.
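One way to picture joint model selection and tuning is a single search over (model, hyperparameter) pairs scored on the validation set. The sketch below is an assumption-laden illustration: the model names, the validation scores, the use of a decision threshold as the tuned hyperparameter, and the rule "highest recall subject to a minimum precision" are all hypothetical and are not stated in this document.

```python
from itertools import product


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def select(val_labels, val_scores, thresholds, min_precision):
    """Search every (model, threshold) pair together; keep the pair with the
    highest validation recall among those meeting the precision target."""
    best = None  # (recall, precision, model_name, threshold)
    for name, threshold in product(val_scores, thresholds):
        preds = [1 if s >= threshold else 0 for s in val_scores[name]]
        precision, recall = precision_recall(val_labels, preds)
        if precision >= min_precision and (best is None or recall > best[0]):
            best = (recall, precision, name, threshold)
    return best


# Hypothetical validation labels and per-model scores (toy numbers only).
val_labels = [1, 1, 1, 0, 0, 0]
val_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.6, 0.2, 0.1],
    "model_b": [0.9, 0.7, 0.6, 0.5, 0.3, 0.2],
}
best = select(val_labels, val_scores, thresholds=[0.3, 0.55, 0.7], min_precision=0.9)
print(best)
```

If no pair meets the target, `select` returns `None`, which corresponds to the case where more labeled data are collected and training repeats.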
This classifier uses an ensemble approach, combining a bag-of-words classifier with a CNN that operates on the vectorized text. The text is vectorized by tokenizing it, hashing the tokens, and averaging the resulting token vectors into a single document vector. This document vector is passed to the downstream CNN.
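The vectorization steps above (tokenize, hash, average) can be sketched as follows. This is a minimal illustration under stated assumptions, not the production pipeline: the tokenizer pattern, the bucket count, the vector dimension, the use of CRC32 as the hash, and the randomly initialized lookup table that maps a hash bucket to a token vector are all hypothetical.

```python
import random
import re
import zlib

NUM_BUCKETS = 1024  # assumed size of the hashed vocabulary
EMBED_DIM = 8       # assumed token-vector dimension

# Placeholder lookup table: one vector per hash bucket. In a trained model
# these rows would be learned; random values are used here for illustration.
_rng = random.Random(0)
EMBEDDINGS = [
    [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(NUM_BUCKETS)
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bucket(token):
    """Map a token to a stable bucket index (the hashing step)."""
    return zlib.crc32(token.encode("utf-8")) % NUM_BUCKETS


def document_vector(text):
    """Average the token vectors into one fixed-length document vector."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * EMBED_DIM
    vectors = [EMBEDDINGS[bucket(tok)] for tok in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


vec = document_vector("Please review the customer complaint")
print(len(vec))
```

Because the token vectors are averaged, the resulting document vector has the same length regardless of document size and does not depend on token order.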
Monitoring
Classifiers are updated in one of three scenarios: Customer Feedback, Regulatory Changes, or Drift Assessment.
Customer Feedback
Model users may submit classifier feedback to Proofpoint. The false positives and false negatives that a customer submits trigger a data labeling exercise in which more data of a similar form are gathered from Proofpoint customers’ data, labeled, and added to the training set. The model is then retrained and reevaluated as described in the “Evaluation” and “Training” sections above.
Regulatory Changes
A change in the regulatory statutes that underpin a classifier will trigger a retraining. If the regulatory statutes have been added to, Proofpoint will source and label data for the new category from its customers’ data. If the regulatory statutes have been amended, all training data and labels will be reviewed and adjusted as applicable. These new data are divided between the training and evaluation datasets. The model is then retrained and reevaluated as described in the “Evaluation” and “Training” sections above.
Drift Assessment
A drift assessment is undertaken at periodic intervals; these intervals differ for each classifier and depend on the results of the previous drift assessment. Flagged content is pulled from customers’ data, and the false positive rate is assessed by labeling all of the flagged content. The false negative rate is assessed by labeling a random sample of applicable content that was not flagged. If the measured false positive and false negative rates are not acceptable, the model is then retrained and reevaluated as described in the “Evaluation” and “Training” sections above.
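The drift assessment described above can be sketched as a small routine. This is a hedged illustration only: the function name, the acceptance thresholds, the sample size, and the `label_item` callable (which stands in for human review) are hypothetical, and the toy data are not real measurements. The false positive rate is computed here as the share of flagged items that reviewers judged incorrect, and the false negative rate as the share of sampled unflagged items that should have been flagged.

```python
import random


def assess_drift(flagged_labels, unflagged_ids, label_item, sample_size,
                 max_fp_rate, max_fn_rate, seed=0):
    """Estimate error rates from labeled content and decide whether to retrain.

    flagged_labels: reviewer labels for ALL flagged items (1 = correct flag, 0 = not).
    unflagged_ids: identifiers of applicable content that was not flagged.
    label_item: stand-in for human review; returns 1 if the item should
        have been flagged, else 0.
    """
    if not flagged_labels or not unflagged_ids:
        raise ValueError("both flagged and unflagged content are required")

    # All flagged content is labeled, so this rate is measured directly.
    fp_rate = flagged_labels.count(0) / len(flagged_labels)

    # Only a random sample of the unflagged content is labeled.
    rng = random.Random(seed)
    sample = rng.sample(unflagged_ids, min(sample_size, len(unflagged_ids)))
    fn_rate = sum(label_item(item) for item in sample) / len(sample)

    retrain = fp_rate > max_fp_rate or fn_rate > max_fn_rate
    return {"fp_rate": fp_rate, "fn_rate": fn_rate, "retrain": retrain}


# Toy inputs for illustration only.
flagged_labels = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
unflagged_ids = list(range(1000))
result = assess_drift(
    flagged_labels,
    unflagged_ids,
    label_item=lambda item: 1 if item % 50 == 0 else 0,  # pretend reviewer
    sample_size=100,
    max_fp_rate=0.1,   # assumed acceptance threshold
    max_fn_rate=0.05,  # assumed acceptance threshold
)
print(result)
```

Fixing the random seed makes the sample reproducible, which helps when results are compared across successive assessments.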
Additional Notes
Customers of Social Patrol are alerted to model updates in the software release notes. No classifier is updated dynamically, in real time, or in-line.