4.1 An Overview of Classification
Classification: approaches for making inferences about, and/or predicting, a qualitative (categorical) response variable.
A few common classification techniques (classifiers):
- logistic regression
- linear discriminant analysis (LDA)
- quadratic discriminant analysis (QDA)
- naive Bayes
- K-nearest neighbors
Examples of classification problems:
- A person arrives at the emergency room with a set of symptoms that could possibly be attributed to one of three medical conditions. Which of the three conditions does the individual have?
  - Predictor variable: Symptoms
  - Response variable: Type of medical condition
- An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent, on the basis of the user’s IP address, past transaction history, and so forth.
  - Predictor variables: User’s IP address, past transaction history, etc.
  - Response variable: Fraudulent activity (Yes/No)
- On the basis of DNA sequence data for a number of patients with and without a given disease, a biologist would like to figure out which DNA mutations are deleterious (disease-causing) and which are not.
  - Predictor variable: DNA sequence data
  - Response variable: Deleterious mutation (Yes/No)
In the following section, we are going to explore the `Default` dataset. Annual income (\(X_1\) = `income`) and monthly credit card balance (\(X_2\) = `balance`) are used to predict whether an individual will default on his or her credit card payment.
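As a minimal sketch of this setup, the Python snippet below classifies an individual from two predictors (income and balance) into a qualitative response (Yes/No). The records are made-up toy values, not the actual Default data, and the function name `knn_predict` and the choice of K-nearest neighbors with range scaling are illustrative assumptions rather than the method used in the following section.

```python
from collections import Counter
import math

# Hypothetical toy records (NOT the real Default data):
# (income, balance, default)
train = [
    (45000.0, 300.0, "No"),
    (52000.0, 800.0, "No"),
    (38000.0, 500.0, "No"),
    (20000.0, 2100.0, "Yes"),
    (31000.0, 1900.0, "Yes"),
    (18000.0, 2300.0, "Yes"),
]


def knn_predict(train, income, balance, k=3):
    """Predict the response by majority vote among the k nearest records."""
    incomes = [row[0] for row in train]
    balances = [row[1] for row in train]
    # Divide each predictor by its training range so that income,
    # which is on a much larger scale, does not dominate the distance.
    inc_range = max(incomes) - min(incomes)
    bal_range = max(balances) - min(balances)

    def distance(row):
        return math.hypot((row[0] - income) / inc_range,
                          (row[1] - balance) / bal_range)

    nearest = sorted(train, key=distance)[:k]
    votes = Counter(row[2] for row in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(train, 25000.0, 2000.0))  # Yes
print(knn_predict(train, 48000.0, 400.0))   # No
```

The response here is a category label rather than a number, which is what distinguishes a classification problem from a regression problem.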