Especially in the computer-science oriented side of the machine learning literature, AUC (area under the receiver operating characteristic curve) is a popular criterion for evaluating classifiers. What are the justifications for using the AUC? E.g. is there a particular loss function for which the optimal decision is the classifier with the best AUC?
For a binary classifier $C$ used for ranking (i.e. for each example $e$, $C(e)$ lies in the interval $[0, 1]$), the AUC is equivalent to the probability that $C(e_1) > C(e_0)$, where $e_1$ is a true positive example and $e_0$ is a true negative example. Thus, choosing the model with the maximal AUC minimizes the probability that $C(e_0) \geq C(e_1)$; that is, it minimizes the loss of ranking a true negative at least as high as a true positive.
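This pairwise-probability interpretation can be checked directly: count, over all (positive, negative) pairs, how often the positive is scored higher (counting ties as 1/2). A minimal sketch, with hypothetical scores chosen purely for illustration:

```python
# Hypothetical classifier outputs, for illustration only.
pos_scores = [0.9, 0.8, 0.4]   # C(e_1) for true positive examples
neg_scores = [0.7, 0.3, 0.2]   # C(e_0) for true negative examples

def pairwise_auc(pos, neg):
    """AUC as P(C(e_1) > C(e_0)), counting ties as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(pairwise_auc(pos_scores, neg_scores))  # 8 of 9 pairs ranked correctly
```

Here 8 of the 9 pairs are ranked correctly, giving an AUC of 8/9; only the pair (0.4, 0.7) is misordered. This is the same quantity estimated by the Mann-Whitney U statistic.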
Let's take a simple example: identifying good tomatoes from a pool of good and bad tomatoes. Say there are 100 good tomatoes and 1000 bad ones, 1100 in total. Your job is to identify as many good tomatoes as possible. One way to get all the good tomatoes is to take all 1100, but that clearly shows you are unable to differentiate between good and bad.

So what is the right way to differentiate? We want to pick up as many good ones as possible while picking up very few bad ones, so we need a measure that reflects both how many good ones we picked up and how many bad ones came along with them. The AUC gives a higher score when the classifier selects more good ones while admitting few bad ones; in other words, it measures how well you are able to differentiate between good and bad.
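A small simulation can make this concrete. Below, the 100 good / 1000 bad counts come from the example above, but the score distributions are assumptions chosen for illustration: one scorer that tends to rate good tomatoes higher, and one that ignores quality entirely.

```python
import random

random.seed(0)  # reproducible toy data

# Tomato scenario from the answer: 100 good, 1000 bad.
# The score distributions are illustrative assumptions.
good_scores = [random.gauss(0.7, 0.1) for _ in range(100)]   # discriminating scorer
bad_scores  = [random.gauss(0.4, 0.1) for _ in range(1000)]

# A scorer blind to quality: both classes score alike.
good_blind = [random.gauss(0.5, 0.1) for _ in range(100)]
bad_blind  = [random.gauss(0.5, 0.1) for _ in range(1000)]

def pairwise_auc(pos, neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(pairwise_auc(good_scores, bad_scores))  # near 1.0: classes well separated
print(pairwise_auc(good_blind, bad_blind))    # near 0.5: no better than chance
```

The discriminating scorer gets an AUC near 1, while the blind scorer sits near 0.5, which is what a random ranking achieves regardless of the 100:1000 class imbalance.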