4.1 An Overview of Classification | Introduction to Statistical Learning Using R Book Club

4.1 An Overview of Classification

Classification: approaches for making inferences about, and/or predicting, a qualitative (categorical) response variable.
A few common classification techniques (classifiers):
- logistic regression
- linear discriminant analysis (LDA)
- quadratic discriminant analysis (QDA)
- naive Bayes
- K-nearest neighbors
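
To make "predicting a qualitative response" concrete, here is a minimal sketch of one technique from the list, K-nearest neighbors, in plain Python. The function name `knn_predict`, the toy observations, and the choice of `k = 3` are all assumptions made for illustration; they do not come from the book or its datasets.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority class label among the k training points nearest to query."""
    # Euclidean distance from the query to every training observation
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest observations
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the class labels of those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric predictors and a Yes/No response
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["No", "No", "No", "Yes", "Yes", "Yes"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # prints "No"
print(knn_predict(train_X, train_y, (3.0, 3.0)))  # prints "Yes"
```

The output is a category label rather than a number, which is what separates classification from regression. With an odd `k` and two classes the vote cannot tie; other settings would need an explicit tie-breaking rule.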

- Examples of classification problems:

A person arrives at the emergency room with a set of symptoms that could possibly be attributed to one of three medical conditions. Which of the three conditions does the individual have?

Predictor variable: Symptoms
Response variable: Type of medical condition (one of three)

An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent, on the basis of the user’s IP address, past transaction history, and so forth.

Predictor variable: User’s IP address, past transaction history, etc.
Response variable: Fraudulent activity (Yes/No)

On the basis of DNA sequence data for a number of patients with and without a given disease, a biologist would like to figure out which DNA mutations are deleterious (disease-causing) and which are not.

Predictor variable: DNA sequence data
Response variable: Whether the mutation is deleterious (Yes/No)

In the following section, we explore the Default dataset. Annual income ($X_1$ = income) and monthly credit card balance ($X_2$ = balance) are used to predict whether an individual will default on his or her credit card payment.

Figure 4.1: Distributions of balance and income, split by the binary default variable. Note: defaulters are shown as orange plus signs; non-defaulters as blue circles.
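
A numeric counterpart to splitting balance and income by default status is a per-class summary. The sketch below assumes a handful of invented records shaped like the Default data (balance, income, default); the values and the helper name `mean_by_class` are hypothetical and are not taken from the real dataset.

```python
from statistics import mean

# Hypothetical toy records: (balance, income, default).
# These numbers are invented for illustration only.
records = [
    (500.0, 40000.0, "No"),
    (700.0, 30000.0, "No"),
    (300.0, 50000.0, "No"),
    (1900.0, 35000.0, "Yes"),
    (2100.0, 25000.0, "Yes"),
]


def mean_by_class(rows, column):
    """Average one column (0 = balance, 1 = income) within each default class."""
    groups = {}
    for row in rows:
        # row[2] is the qualitative response used as the grouping key
        groups.setdefault(row[2], []).append(row[column])
    return {label: mean(values) for label, values in groups.items()}


balance_means = mean_by_class(records, 0)
income_means = mean_by_class(records, 1)
print(balance_means)  # {'No': 500.0, 'Yes': 2000.0}
print(income_means)   # {'No': 40000.0, 'Yes': 30000.0}
```

A predictor whose summaries differ sharply between the two classes is a natural candidate for separating them. Running the same grouping on the actual data would show whether balance or income plays that role.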