───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───
Data and Goal
- Data → a set of records (examples, instances, cases, rows, etc.)
- Attributes → each record is described by a set of attributes
 - A class → each example is labelled with a pre-defined class
 
 - Goal → to learn a classification model from the data that can be used to predict the classes of new cases/instances
 
Classification Process
- Model construction involves describing a set of predetermined classes
- Each record is assumed to belong to one predefined class
 - The set of records used for model construction is the training set
 - The model is represented as classification rules, decision trees, or probabilistic models
 
 - Model usage is for classifying future or unknown objects
- Estimate the accuracy of the model
- The class predicted for each test example is compared with its known label
 - The accuracy rate is the percentage of test set examples that the model classifies correctly
 - The test set must be independent of the training set; otherwise the accuracy estimate is overly optimistic and over-fitting goes undetected
 - If the accuracy is acceptable, use the model to classify future data whose class labels are not known
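
The construct-then-evaluate process above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the toy records, the `outlook` attribute, and the helper names `split_records`, `learn_rules`, and `accuracy` are all hypothetical, and the "model" is a deliberately simple one-attribute rule set (majority class per attribute value) standing in for whatever classifier is actually used.

```python
import random
from collections import Counter, defaultdict

def split_records(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into disjoint training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def learn_rules(training_set, attribute):
    """Model construction: map each value of one attribute to its majority class."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for attrs, label in training_set:
        by_value[attrs[attribute]][label] += 1
        overall[label] += 1
    rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    default = overall.most_common(1)[0][0]  # fallback for values never seen in training
    return lambda attrs: rules.get(attrs[attribute], default)

def accuracy(model, test_set):
    """Fraction of test examples whose predicted class matches the known label."""
    correct = sum(1 for attrs, label in test_set if model(attrs) == label)
    return correct / len(test_set)

# Hypothetical toy data: each record is (attributes, class label).
outlooks = ["sunny", "rainy", "overcast", "sunny", "rainy", "sunny", "rainy", "overcast"]
records = [({"outlook": o}, "stay" if o == "rainy" else "play") for o in outlooks]

train, test = split_records(records)    # test set is held out from training
model = learn_rules(train, "outlook")   # model construction on the training set only
print(f"accuracy on held-out test set: {accuracy(model, test):.2f}")
```

Because the test records never take part in `learn_rules`, the printed figure estimates how the model behaves on unseen data rather than how well it memorised the training set.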
 
 
 
Decision Tree Construction Algorithm
- Attributes are categorical
 - Start at the root and work down
 - At each node, the attribute to split on is selected based on some statistical measure (typically the Gini index)
 - You stop picking attributes when
- All samples for the given node belong to the same class
 - There are no remaining attributes for further partitioning
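
The construction steps above can be sketched as a short recursive function. This is an illustrative sketch under stated assumptions, not a reference implementation: the toy records and the names `gini`, `weighted_gini`, `build_tree`, and `classify` are hypothetical, attributes are assumed categorical, and labelling a leaf with the majority class when attributes run out is a common convention that the notes do not specify.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class fractions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(records, attribute):
    """Impurity after splitting on `attribute`, weighted by the size of each branch."""
    groups = {}
    for attrs, label in records:
        groups.setdefault(attrs[attribute], []).append(label)
    n = len(records)
    return sum(len(group) / n * gini(group) for group in groups.values())

def build_tree(records, attributes):
    """Return a class label (leaf) or a dict describing a split (internal node)."""
    labels = [label for _, label in records]
    # Stopping rule 1: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping rule 2: no attributes left; fall back to the majority class (assumption).
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute whose split gives the lowest weighted Gini impurity.
    best = min(attributes, key=lambda a: weighted_gini(records, a))
    remaining = [a for a in attributes if a != best]
    partitions = {}
    for attrs, label in records:
        partitions.setdefault(attrs[best], []).append((attrs, label))
    return {"attribute": best,
            "branches": {value: build_tree(subset, remaining)
                         for value, subset in partitions.items()}}

def classify(tree, attrs):
    """Walk from the root down to a leaf; raises KeyError for an unseen attribute value."""
    while isinstance(tree, dict):
        tree = tree["branches"][attrs[tree["attribute"]]]
    return tree

# Hypothetical toy training set: (attributes, class label).
records = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
tree = build_tree(records, ["windy", "outlook"])
print(tree)
print(classify(tree, {"outlook": "rainy", "windy": "yes"}))
```

On this data the root splits on `outlook` (weighted impurity 2/9, against 1/3 for `windy`); the `sunny` branch is already pure, so only the `rainy` branch is split further.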
 
 
Gini Impurity
- Gini impurity measures the probability that a randomly chosen instance would be misclassified if it were labelled at random according to the class distribution at the node
 - An attribute whose split yields a lower Gini impurity separates the classes better and is preferred over attributes with higher Gini impurity
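
For reference, the standard definition behind the bullets above, with a small worked example. The symbols ($p_k$ for the fraction of examples in class $k$, $n_v$ for the number of examples taking value $v$) and the example node are introduced here for illustration and do not appear in the notes.

```latex
% Gini impurity of a node whose examples fall into K classes,
% where p_k is the fraction of examples belonging to class k.
G = 1 - \sum_{k=1}^{K} p_k^2

% Worked example (hypothetical node): 3 examples of class A, 1 of class B.
G = 1 - \left(0.75^2 + 0.25^2\right) = 1 - 0.625 = 0.375

% Impurity of a split on attribute A: the impurity G_v of each branch v,
% weighted by the share n_v / n of examples that fall into that branch.
G_{\text{split}}(A) = \sum_{v} \frac{n_v}{n}\, G_v
```

A pure node (all examples in one class) gives $G = 0$, and a two-class node split evenly gives the two-class maximum of $0.5$.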
 
───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───