Synthetic Patient Dataset

Patient Data
This is the complete dataset used to train and test our prediction model.
Blood Pressure (Systolic)   Cholesterol   Heart Rate   Blood Sugar   Risk
150                         250           90           130           risky
118                         190           72           95            risk less
160                         280           95           150           risky
125                         210           78           105           risk less
135                         225           82           115           risky
115                         185           68           92            risk less
145                         230           88           120           risky
110                         180           65           90            risk less
170                         290           100          160           risky
122                         205           75           100           risk less

How the Model is Trained
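The model learns from patient data using a process called "supervised learning": each training record pairs a patient's measurements with a known risk label, and the model learns to predict that label from the measurements.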

Step 1: Train-Test Split

The dataset is split into two parts: a larger Training Set to teach the model, and a smaller Testing Set to evaluate its accuracy. The blue rows in the table above represent the testing data.

Full Dataset: 10 records
Training Set: 8 records
Testing Set: 2 records
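
The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name train_test_split, the random shuffle, and the seed are choices made for this example, and the two records it holds out will not necessarily match the rows the page marks as testing data.

```python
import random

# The ten records from the table above:
# (systolic blood pressure, cholesterol, heart rate, blood sugar, risk label)
RECORDS = [
    (150, 250, 90, 130, "risky"),
    (118, 190, 72, 95, "risk less"),
    (160, 280, 95, 150, "risky"),
    (125, 210, 78, 105, "risk less"),
    (135, 225, 82, 115, "risky"),
    (115, 185, 68, 92, "risk less"),
    (145, 230, 88, 120, "risky"),
    (110, 180, 65, 90, "risk less"),
    (170, 290, 100, 160, "risky"),
    (122, 205, 75, 100, "risk less"),
]


def train_test_split(records, test_size=2, seed=0):
    """Shuffle a copy of the records and hold out `test_size` of them for testing."""
    shuffled = list(records)
    # Assumption: a seeded shuffle, so the same split is produced on every run.
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


train_set, test_set = train_test_split(RECORDS)
print(len(train_set), "training records,", len(test_set), "testing records")
# prints: 8 training records, 2 testing records
```

Holding the testing records out before training matters because accuracy measured on records the model has already seen would overstate how well it handles new patients.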

Step 2: Training the Random Forest

The model is a Random Forest, which is a collection of many individual Decision Trees. Each tree is trained on a random sample of the training records and considers only a random subset of the features (like blood pressure, cholesterol, etc.). When making a prediction, all trees "vote", and the majority outcome becomes the final prediction. Because the trees see different records and features, their individual errors tend to differ, so the combined vote is usually more accurate and less prone to overfitting than any single tree.

Diagram: Tree 1, Tree 2, Tree 3, ... Many Trees
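
The voting idea can be illustrated with a deliberately simplified sketch in plain Python. It is not the model used on this page: each "tree" here is a single-split stump, the training rows are assumed to be the first eight table rows, and the names train_stump, train_forest, and predict_forest, the tree count, and the seed are all choices made for this example. Libraries such as scikit-learn's RandomForestClassifier grow deeper trees and re-draw the feature subset at every split.

```python
import random
from collections import Counter

# Training rows (assumption: the first eight table rows; the rows the page
# actually held out are not identified here). Columns: blood pressure,
# cholesterol, heart rate, blood sugar, risk label.
TRAIN = [
    (150, 250, 90, 130, "risky"),
    (118, 190, 72, 95, "risk less"),
    (160, 280, 95, 150, "risky"),
    (125, 210, 78, 105, "risk less"),
    (135, 225, 82, 115, "risky"),
    (115, 185, 68, 92, "risk less"),
    (145, 230, 88, 120, "risky"),
    (110, 180, 65, 90, "risk less"),
]
N_FEATURES = 4


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity: 0.0 when every label in the list is the same."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def train_stump(rows, features):
    """Fit a one-split tree: the (feature, threshold) with the lowest weighted Gini."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [row[-1] for row in rows if row[f] <= threshold]
            right = [row[-1] for row in rows if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:
        # No split is possible (the sampled rows share the same values),
        # so this stump predicts a single label for every patient.
        label = majority([row[-1] for row in rows])
        return (features[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, patient):
    """Send the patient down the stump's single split."""
    feature, threshold, left_label, right_label = stump
    return left_label if patient[feature] <= threshold else right_label


def train_forest(rows, n_trees=25, features_per_tree=2, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw records with replacement
        features = rng.sample(range(N_FEATURES), features_per_tree)
        forest.append(train_stump(sample, features))
    return forest


def predict_forest(forest, patient):
    """Every tree votes; the majority label is the forest's prediction."""
    return majority([predict_stump(stump, patient) for stump in forest])


forest = train_forest(TRAIN)
new_patient = (165, 285, 98, 155)  # hypothetical readings, not from the dataset
print(predict_forest(forest, new_patient))
```

An odd number of trees is used so that a two-class vote cannot end in a tie.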