how the selection algorithm works

by: science Jan 19, 2020 10:22:53 AM

so i took the time to read a bit about the algorithm behind the campus director stuff. thought i'd share

Binary or binomial classification is the task of classifying the elements of a given set into two groups (predicting which group each one belongs to) on the basis of a classification rule. Contexts requiring a decision as to whether or not an item has some qualitative property, some specified characteristic, or some typical binary classification include:

Medical testing to determine whether a patient has a certain disease – the classification property is the presence of the disease.
A "pass or fail" test method or quality control in factories, i.e. deciding whether a specification has or has not been met – a go/no-go classification.
Information retrieval, namely deciding whether a page or an article should be in the result set of a search or not – the classification property is the relevance of the article, or the usefulness to the user.
Binary classification is dichotomization applied for practical purposes. In many practical binary classification problems the two groups are not symmetric: rather than overall accuracy, what matters is the relative proportion of the different types of errors. For example, in medical testing a false positive (detecting a disease when it is not present) is weighed differently from a false negative (failing to detect a disease that is present).
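To make the false positive / false negative asymmetry concrete, here's a small Python sketch with made-up screening results for ten hypothetical patients. Both tests are right 8 times out of 10, but if a missed disease is assumed to cost 10 times as much as a false alarm (that cost ratio is my own assumption for illustration, not from any real guideline), they come out very differently:

```python
# Made-up labels: 1 = disease present, 0 = absent.

def count_errors(actual, predicted):
    """Return (false_positives, false_negatives) for paired 0/1 labels."""
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return fp, fn

def weighted_cost(actual, predicted, cost_fp=1.0, cost_fn=10.0):
    """Total error cost; the default weights (FN = 10x FP) are an assumption."""
    fp, fn = count_errors(actual, predicted)
    return cost_fp * fp + cost_fn * fn

actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
test_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # two false alarms, no misses
test_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # no false alarms, two misses

print(count_errors(actual, test_a))   # (2, 0)
print(count_errors(actual, test_b))   # (0, 2)
print(weighted_cost(actual, test_a))  # 2.0
print(weighted_cost(actual, test_b))  # 20.0
```

Same accuracy, ten times the cost, which is exactly why overall accuracy alone isn't the metric people care about here.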

Statistical binary classification
Statistical classification is a problem studied in machine learning. It is a type of supervised learning, a method of machine learning where the categories are predefined and the model is learned from examples whose category is already known; the learned model is then used to assign new observations to those categories. When there are only two categories, the problem is known as statistical binary classification.
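Here's roughly what "learned from labeled examples" looks like in the simplest possible case. This is a minimal Python sketch, assuming a single numeric feature and made-up data; the function names are hypothetical and this is just a toy learner, not whatever the campus director system actually uses. It tries every feature value seen in training as a threshold and keeps the rule with the fewest training errors. A one-split rule like this is a depth-one decision tree, usually called a decision stump.

```python
def predict_one(x, threshold, direction):
    """Apply a one-feature threshold rule.

    direction=+1 predicts 1 when x >= threshold; direction=-1 predicts 1 when x < threshold.
    """
    above = x >= threshold
    return int(above) if direction == 1 else int(not above)

def fit_stump(xs, ys):
    """Return the (threshold, direction) pair with the fewest training errors."""
    best = None  # (errors, threshold, direction)
    for t in sorted(set(xs)):
        for direction in (1, -1):
            errors = sum(predict_one(x, t, direction) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, direction)
    return best[1], best[2]

# Made-up labeled training data: one feature value per item, 1 = positive group.
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
train_y = [0, 0, 0, 1, 1, 1]
t, d = fit_stump(train_x, train_y)
print(t, d)                                        # 6.0 1
print([predict_one(x, t, d) for x in [5.0, 6.5]])  # [0, 1]
```

Trying every observed value as a threshold is brute force, but with one feature it's cheap and easy to check by hand.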

Some of the methods commonly used for binary classification are:

Decision trees
Random forests
Bayesian networks
Support vector machines
Neural networks
Logistic regression
Probit model
No single classifier is best everywhere: each one performs well only in particular domains, depending on the number of observations, the dimensionality of the feature vector, the noise in the data and many other factors. For example, random forests have been reported to perform better than SVM classifiers on 3D point clouds.[1][2]
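As an example of one method from that list, here's a rough sketch of logistic regression on a single feature, fitted with plain batch gradient descent in Python. The data, learning rate and number of epochs are arbitrary choices for illustration, and the feature is assumed to be already centered around zero (that keeps this toy example well behaved). Real libraries use better optimizers and regularization, so treat this as a sketch only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. w*x + b
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b, threshold=0.5):
    """Classify as 1 when the estimated probability reaches the threshold."""
    return int(sigmoid(w * x + b) >= threshold)

# Made-up, already-centered feature values; 1 = belongs to the positive group.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print([predict(x, w, b) for x in [-2.5, 2.5]])  # [0, 1]
```

The 0.5 threshold is the usual default; moving it up or down trades false positives against false negatives, which ties back to the asymmetric error costs mentioned earlier.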

Evaluation of binary classifiers

[Figure: diagram of classification outcomes. The left and right halves contain instances that actually have, and do not have, the condition, respectively. The oval contains instances classified (predicted) as positive (having the condition). Green marks instances classified correctly (true), red marks those classified wrongly (false).]
TP = True Positive; TN = True Negative; FP = False Positive (type I error); FN = False Negative (type II error); TPR = True Positive Rate; FPR = False Positive Rate; PPV = Positive Predictive Value; NPV = Negative Predictive Value.
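Those abbreviations boil down to simple counts and ratios. A small Python sketch with made-up labels (1 = has the condition, 0 = does not); the function names are hypothetical, and a ratio whose denominator is zero is returned as NaN instead of raising an error:

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for paired 0/1 labels, where 1 = has the condition."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # type I error
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # type II error
    return tp, tn, fp, fn

def _ratio(num, den):
    # Return NaN instead of dividing by zero when a class or prediction is absent.
    return num / den if den else float("nan")

def rates(tp, tn, fp, fn):
    return {
        "TPR": _ratio(tp, tp + fn),  # sensitivity, a.k.a. recall
        "FPR": _ratio(fp, fp + tn),  # equals 1 - specificity
        "PPV": _ratio(tp, tp + fp),  # precision
        "NPV": _ratio(tn, tn + fn),
    }

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(actual, predicted)
print(counts)                 # (3, 5, 1, 1)
print(rates(*counts)["TPR"])  # 0.75
```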
There are many metrics that can be used to measure the performance of a classifier or predictor; different fields prefer different metrics because they have different goals. For example, medicine often uses sensitivity and specificity, while information retrieval prefers precision and recall. An important distinction is between metrics that are independent of the prevalence (how often each category occurs in the population), and metrics that depend on the preva