MODEL EVALUATION IN GENERAL

For a specific problem, we need to determine which model is better. Randomly divide the dataset into 1) training, 2) validation, and 3) test sets. Cross-validation is considered if data is limited.

EVALUATE A CLASSIFIER - Confusion matrix

Definition: entry (i, j) in a confusion matrix is the number of observations actually in group i, but predicted to be in group j. Command:
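A minimal sketch of the idea, assuming scikit-learn; the dataset, classifier choice, and variable names below are illustrative, not the original command:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix

# Synthetic binary classification data (illustrative)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Random split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Cross-validation on the training set, useful when data is limited
print(cross_val_score(clf, X_train, y_train, cv=5))

# Confusion matrix: entry (i, j) counts observations in true class i predicted as class j
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))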
ROC curve

Sensitivity: correctly predicted positives / total actual positives (true positive rate, TPR).
Specificity: correctly predicted negatives / total actual negatives (true negative rate).

The ROC curve plots the false positive rate (FPR = 1 - specificity) against the true positive rate (TPR = sensitivity). For a given Y_true and Y_score, the ROC curve is traced by varying the threshold that maps Y_score to Y_predict (remember that we are working on classification). It begins with a high threshold and ends with a low one: at first, almost every point is predicted negative, so TPR and FPR are both zero, which is why the ROC curve starts at (0, 0). Similarly, as the threshold drops, everything is predicted positive and the curve ends at (1, 1).
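A minimal sketch of tracing an ROC curve, assuming scikit-learn and matplotlib; Y_true and Y_score here are synthetic stand-ins for real labels and classifier scores:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Illustrative ground truth and scores (e.g. predicted probabilities of class 1)
rng = np.random.default_rng(0)
Y_true = rng.integers(0, 2, size=200)
Y_score = np.clip(Y_true * 0.4 + rng.random(200) * 0.6, 0, 1)

# roc_curve sweeps the threshold from high to low:
# high threshold -> nothing predicted positive -> (FPR, TPR) = (0, 0)
# low threshold  -> everything predicted positive -> (FPR, TPR) = (1, 1)
fpr, tpr, thresholds = roc_curve(Y_true, Y_score)

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")  # chance level
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.title("ROC curve")
plt.show()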