Customer Data
This is the complete dataset used to train and test our prediction model.
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| youth | high | no | fair | no |
| youth | high | no | excellent | no |
| middle aged | high | no | fair | yes |
| senior | medium | no | fair | yes |
| senior | low | yes | fair | yes |
| senior | low | yes | excellent | no |
| middle aged | low | yes | excellent | yes |
| youth | medium | no | fair | no |
| youth | low | yes | fair | yes |
| senior | medium | yes | fair | yes |
| youth | medium | yes | excellent | yes |
| middle aged | medium | no | excellent | yes |
| middle aged | high | yes | fair | yes |
| senior | medium | no | excellent | no |
How the Model is Trained
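The model learns from the customer data through "supervised learning": each record pairs input features (age, income, student status, credit rating) with a known outcome (the "Buys Computer?" column), and the model learns to predict that outcome from the features.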
Step 1: Train-Test Split
The dataset is split into two parts: a larger Training Set to teach the model, and a smaller Testing Set to evaluate its accuracy. The blue rows in the table above represent the testing data.
Full Dataset (14 records) → Training Set (10 records) + Testing Set (4 records)
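A minimal sketch of a 10/4 split like this, in plain Python. It assumes a random shuffle with a fixed seed; the function name `train_test_split` and the seed value are illustrative, and the rows it selects for testing will not necessarily match the highlighted rows in the table.

```python
import random

# The 14 records from the table:
# (age, income, student, credit_rating, buys_computer)
RECORDS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle aged", "medium", "no", "excellent", "yes"),
    ("middle aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def train_test_split(records, test_size, seed):
    """Shuffle a copy of the records and hold out the last `test_size` rows."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    return shuffled[:-test_size], shuffled[-test_size:]


train_set, test_set = train_test_split(RECORDS, test_size=4, seed=0)
print(len(train_set), len(test_set))  # 10 4
```

Holding the test rows out before training means the accuracy measured on them reflects records the model has never seen.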
Step 2: Training the Random Forest
The model is a Random Forest, which is a collection of many individual Decision Trees. Each tree is trained on a random subset of the training data and a random subset of the features (age, income, etc.). When making a prediction, all trees "vote", and the majority outcome becomes the final prediction. Because the errors of individual trees tend to cancel out in the vote, the forest is usually more accurate and less prone to overfitting than a single decision tree when predicting customer behavior.
Tree 1, Tree 2, Tree 3, ... (many trees)
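The voting idea can be sketched in plain Python as follows. This is a simplified illustration, not the model's actual implementation: each "tree" here is a one-split decision stump rather than a full decision tree, and the names `train_stump`, `train_forest`, `predict`, the tree count, the seed, and the choice of the last four table rows as the testing set are all assumptions made for the example.

```python
import random
from collections import Counter

FEATURES = ["age", "income", "student", "credit_rating"]

# The 14 records from the table; the last field is the label (buys computer?).
RECORDS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle aged", "medium", "no", "excellent", "yes"),
    ("middle aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def train_stump(rows, feature_indices):
    """Fit a one-split tree: pick the candidate feature whose per-value
    majority label misclassifies the fewest rows."""
    default = Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = None
    for i in feature_indices:
        counts = {}
        for r in rows:
            counts.setdefault(r[i], Counter())[r[-1]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[r[i]] != r[-1] for r in rows)
        if best is None or errors < best[0]:
            best = (errors, i, rule)
    return best[1], best[2], default


def train_forest(rows, n_trees, n_features, seed):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        candidates = rng.sample(range(len(FEATURES)), n_features)
        forest.append(train_stump(sample, candidates))
    return forest


def predict(forest, row):
    """Let every stump vote and return the majority label."""
    votes = Counter()
    for feature_index, rule, default in forest:
        # Fall back to the stump's majority label for values it never saw.
        votes[rule.get(row[feature_index], default)] += 1
    return votes.most_common(1)[0][0]


# Assumption: the last four rows stand in for the testing set.
train_rows, test_rows = RECORDS[:10], RECORDS[10:]
forest = train_forest(train_rows, n_trees=25, n_features=2, seed=0)
for row in test_rows:
    print(row[:-1], "predicted:", predict(forest, row), "actual:", row[-1])
```

With only 10 training records the predictions are illustrative at best; the point is the structure: random samples, random feature subsets, and a majority vote.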