Every decision has the following steps:

- Determine the factor: in this example, the factor whose correlation with Y has the largest absolute value.
  - Determine the correlation.
  - Analyze the absolute values of the correlation.
- Determine the Split Value by taking the median.
- Add node to the decision tree.
- Split the data: left & right.

Repeat with both sides of the data (left first, then right because in the end you want to be right).

Given this data set, where X2, X10, and X11 are the factors in the decision tree and Y is the value to predict:

| row # | X2 | X10 | X11 | Y |
|---|---|---|---|---|
| 0 | 0.885 | 0.330 | 9.100 | 4.000 |
| 1 | 0.725 | 0.390 | 10.900 | 5.000 |
| 2 | 0.560 | 0.500 | 9.400 | 6.000 |
| 3 | 0.735 | 0.570 | 9.800 | 5.000 |
| 4 | 0.610 | 0.630 | 8.400 | 3.000 |
| 5 | 0.260 | 0.630 | 11.800 | 8.000 |
| 6 | 0.500 | 0.680 | 10.500 | 7.000 |
| 7 | 0.320 | 0.780 | 10.000 | 6.000 |

## DECISION #0 - ROOT

Step 1: Determine the factor.

Step 1a: Determine the correlation

row # | X2 | X10 | X11 | Y |

correl | -0.731 | 0.406 | 0.826 |

Step 1b: Analyze the absolute values of the correlation

row # | X2 | X10 | X11 | Y |

correl | 0.731 | 0.406 | 0.826 |

The biggest impact is X11, so I will split on X11.
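As a rough check of Step 1, here is a minimal sketch that computes the Pearson correlation of each factor with Y and picks the largest absolute value. The names `ROWS`, `NAMES`, and `pearson` are illustrative, not from the walkthrough, and rounding to three decimals simply mirrors the tables.

```python
from math import sqrt

# Data set from the walkthrough: columns are X2, X10, X11, Y.
ROWS = [
    (0.885, 0.330, 9.100, 4.0),
    (0.725, 0.390, 10.900, 5.0),
    (0.560, 0.500, 9.400, 6.0),
    (0.735, 0.570, 9.800, 5.0),
    (0.610, 0.630, 8.400, 3.0),
    (0.260, 0.630, 11.800, 8.0),
    (0.500, 0.680, 10.500, 7.0),
    (0.320, 0.780, 10.000, 6.0),
]
NAMES = ["X2", "X10", "X11"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ys = [row[3] for row in ROWS]
correl = [pearson([row[i] for row in ROWS], ys) for i in range(3)]

# Step 1b: compare absolute values; index() returns the first maximum.
abs_correl = [round(abs(c), 3) for c in correl]
best = abs_correl.index(max(abs_correl))

print([round(c, 3) for c in correl])  # [-0.731, 0.406, 0.826]
print(NAMES[best])                    # X11
```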

Step 2: Determine the Split Value by taking the median.

The median of X11 is **9.9**.
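With an even number of rows, the median is the mean of the two middle values (9.8 and 10.0 here). A small sketch using the standard library, where the variable names are only illustrative:

```python
from statistics import median

# X11 column of the full data set, in row order 0-7.
x11 = [9.1, 10.9, 9.4, 9.8, 8.4, 11.8, 10.5, 10.0]

# For an even count, median() averages the two middle values of the sorted data.
split_val = median(x11)
print(round(split_val, 3))  # 9.9
```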

Step 3: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

Because the left tree always goes first and because the nodes are listed relatively, the left tree node will always be 1 (or nan for leaves).

Step 4: Split the data. Rows with X11 at or below the split value of 9.9 go left (rows 4, 0, 2, 3); the rest go right (rows 7, 6, 1, 5):

row # | X2 | X10 | X11 | Y |

correl | -0.731 | 0.406 | 0.826 | |

4 | 0.610 | 0.630 | 8.400 | 3.000 |

0 | 0.885 | 0.330 | 9.100 | 4.000 |

2 | 0.560 | 0.500 | 9.400 | 6.000 |

3 | 0.735 | 0.570 | 9.800 | 5.000 |

7 | 0.320 | 0.780 | 10.000 | 6.000 |

6 | 0.500 | 0.680 | 10.500 | 7.000 |

1 | 0.725 | 0.390 | 10.900 | 5.000 |

5 | 0.260 | 0.630 | 11.800 | 8.000 |
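The split itself can be sketched as two filters. This assumes rows at or below the split value go left, which is what the left and right subtrees that follow imply; `DATA` and the tuple layout are illustrative. Note that the table above is sorted by X11, while this sketch keeps the original row order.

```python
# (row #, X2, X10, X11, Y)
DATA = [
    (0, 0.885, 0.330, 9.100, 4.0),
    (1, 0.725, 0.390, 10.900, 5.0),
    (2, 0.560, 0.500, 9.400, 6.0),
    (3, 0.735, 0.570, 9.800, 5.0),
    (4, 0.610, 0.630, 8.400, 3.0),
    (5, 0.260, 0.630, 11.800, 8.0),
    (6, 0.500, 0.680, 10.500, 7.0),
    (7, 0.320, 0.780, 10.000, 6.0),
]
X11 = 3          # position of X11 inside each tuple
split_val = 9.9  # median found in Step 2

# Assumption: values at or below the split go left, everything else goes right.
left = [row for row in DATA if row[X11] <= split_val]
right = [row for row in DATA if row[X11] > split_val]

print([row[0] for row in left])   # [0, 2, 3, 4]
print([row[0] for row in right])  # [1, 5, 6, 7]
```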

## DECISION #1 - LEFT TREE

With my subtree, I now have this data:

row # | X2 | X10 | X11 | Y |

4 | 0.610 | 0.630 | 8.400 | 3.000 |

0 | 0.885 | 0.330 | 9.100 | 4.000 |

2 | 0.560 | 0.500 | 9.400 | 6.000 |

3 | 0.735 | 0.570 | 9.800 | 5.000 |

Step 1: Determine the factor.

Step 1a: Determine the correlation.

row # | X2 | X10 | X11 | Y |

correl | -0.267 | -0.149 | 0.808 |

Step 1b: Analyze the absolute values of the correlation

row # | X2 | X10 | X11 | Y |

correl | 0.267 | 0.149 | 0.808 |

The biggest impact will be X11 again.

Step 2: Determine the Split Value by taking the median.

The median of X11 in this subtree is **9.25**.

Step 3: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

1 | 11 | 9.250 | 1 | ? |

Since we don’t know where the right decision nodes are yet, we cannot update that.

Step 4: Split the data. Rows 4 and 0 (X11 at or below 9.25) go left; rows 2 and 3 go right:

row # | X2 | X10 | X11 | Y |

correl | -0.267 | -0.149 | 0.808 | |

4 | 0.610 | 0.630 | 8.400 | 3.000 |

0 | 0.885 | 0.330 | 9.100 | 4.000 |

2 | 0.560 | 0.500 | 9.400 | 6.000 |

3 | 0.735 | 0.570 | 9.800 | 5.000 |

### DECISION #1.1 - LEFT TREE: LEFT SUBTREE

Step 0: The data

row # | X2 | X10 | X11 | Y |

4 | 0.610 | 0.630 | 8.400 | 3.000 |

0 | 0.885 | 0.330 | 9.100 | 4.000 |

Step 1: Determine the factor.

Step 1a: Determine the correlation

row # | X2 | X10 | X11 | Y |

correl | 1.000 | -1.000 | 1.000 |

Step 1b: Analyze the absolute values of the correlation

row # | X2 | X10 | X11 | Y |

correl | 1.000 | 1.000 | 1.000 |

All the correlations are the same, so let’s take the first one, X2.
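With only two rows, every correlation is exactly +1 or -1 on paper, but floating-point arithmetic can leave tiny differences. The sketch below rounds to three decimals before comparing (an assumption that mirrors the tables) so that the tie really does fall to the first factor; `pearson`, `rows`, and `names` are illustrative names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# The two rows of this subtree (row 4 and row 0): X2, X10, X11, Y.
rows = [(0.610, 0.630, 8.400, 3.0), (0.885, 0.330, 9.100, 4.0)]
names = ["X2", "X10", "X11"]
ys = [r[3] for r in rows]

# Rounding keeps floating-point noise from breaking the three-way tie.
abs_correl = [round(abs(pearson([r[i] for r in rows], ys)), 3) for i in range(3)]
best = abs_correl.index(max(abs_correl))  # index() returns the first maximum

print(abs_correl)   # [1.0, 1.0, 1.0]
print(names[best])  # X2
```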

Step 2: Determine the Split Value by taking the median.

X2’s median of this subtree is 0.748.

Step 3: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

1 | 11 | 9.250 | 1 | ? |

2 | X2 | 0.748 | 1 | 2 |

Again, since we don’t know where the right decision nodes are yet, we cannot update that. However, since there are only two lines of data remaining, we know what the left and right relative node values will be. The left is 1 as always and the right here is 2, which will always be the case for a node containing two leaves.

Step 4: Split the data. Row 4 (X2 at or below 0.748) goes left; row 0 goes right:

row # | X2 | X10 | X11 | Y |

correl | 1.000 | -1.000 | 1.000 | |

4 | 0.610 | 0.630 | 8.400 | 3.000 |

0 | 0.885 | 0.330 | 9.100 | 4.000 |

#### DECISION #1.1.1 - LEFT TREE: LEFT SUBTREE: LEFT LEAF

Step 0: The data

row # | X2 | X10 | X11 | Y |

4 | 0.610 | 0.630 | 8.400 | 3.000 |

Now that we only have one row, we have a leaf.

With a leaf, there is no factor to determine and no need to split any further. So we create a leaf by taking the Y value as the Split Value. Since it’s a leaf, it’s the end of the line, so there is no value for left and right. The value we will enter is NAN.

Step Final: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

1 | 11 | 9.250 | 1 | ? |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

Again, since we don’t know where the right decision nodes are yet, we cannot update that anywhere.
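A leaf row could be represented like this. It is only a sketch: the tuple layout `(Factor, SplitVal, Left, Right)` follows the table, while `make_leaf` and `is_leaf` are hypothetical helper names.

```python
from math import isnan, nan


def make_leaf(y):
    """Build a leaf row: the Y value sits in the SplitVal column, no children."""
    return ("LEAF", float(y), nan, nan)


def is_leaf(node):
    """A row is a leaf when its Factor column holds the LEAF marker."""
    return node[0] == "LEAF"


leaf = make_leaf(3)  # row 4 of the data set has Y = 3
print(leaf)          # ('LEAF', 3.0, nan, nan)
print(is_leaf(leaf)) # True
```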

#### DECISION #1.1.2 - LEFT TREE: LEFT SUBTREE: RIGHT LEAF

Step 0: The data

row # | X2 | X10 | X11 | Y |

0 | 0.885 | 0.330 | 9.100 | 4.000 |

Now that we only have one row, we have a leaf.

With a leaf, there is no factor to determine and no need to split any further. So we create a leaf by taking the Y value as the Split Value. Since it’s a leaf, it’s the end of the line, so there is no value for left and right. The value we will enter is NAN.

Step Final: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

1 | 11 | 9.250 | 1 | ? |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

This completes the left subtree of the left tree.

#### DECISION #1.1 - UPDATE

Now that we know where the right tree of the left tree will start, let’s update that tree node’s right relative value. Since the tree node is node 1 and the right tree will start on node 5, the value is 4 (5-1).

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |
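The update above amounts to counting the rows of the finished left subtree: the right child sits immediately after them, so its relative offset is that count plus one. A minimal sketch, with `right_offset` as a hypothetical helper name:

```python
from math import nan


def right_offset(left_subtree):
    """Relative position of the right child.

    The left child is 1 row below the node, and the right child comes
    directly after the last row of the left subtree.
    """
    return len(left_subtree) + 1


# Rows 2-4 of the table: the finished left subtree of node 1.
left_of_node_1 = [
    ("X2", 0.748, 1, 2),
    ("LEAF", 3.0, nan, nan),
    ("LEAF", 4.0, nan, nan),
]

print(right_offset(left_of_node_1))  # 4, the same as 5 - 1
```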

### DECISION #1.2 - LEFT TREE: RIGHT SUBTREE

Step 0: The data

row # | X2 | X10 | X11 | Y |

2 | 0.560 | 0.500 | 9.400 | 6.000 |

3 | 0.735 | 0.570 | 9.800 | 5.000 |

Step 1: Determine the factor.

Step 1a: Determine the correlation

row # | X2 | X10 | X11 | Y |

correl | -1.000 | -1.000 | -1.000 |

Step 1b: Analyze the absolute values of the correlation

row # | X2 | X10 | X11 | Y |

correl | 1.000 | 1.000 | 1.000 |

All the correlations are the same, so let’s take the first one, X2.

Step 2: Determine the Split Value by taking the median.

X2’s median of this subtree is 0.648.

Step 3: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

Again, since we don’t know where the right decision nodes are yet, we cannot update that. However, since there are only two lines of data remaining, we know what the left and right relative node values will be. The left is 1 as always and the right here is 2, which will always be the case for a node containing two leaves.

Step 4: Split the data. Row 2 (X2 at or below 0.648) goes left; row 3 goes right:

row # | X2 | X10 | X11 | Y |

correl | -1.000 | -1.000 | -1.000 | |

2 | 0.560 | 0.500 | 9.400 | 6.000 |

3 | 0.735 | 0.570 | 9.800 | 5.000 |

#### DECISION #1.2.1 - LEFT TREE: RIGHT SUBTREE: LEFT LEAF

Step 0: The data

row # | X2 | X10 | X11 | Y |

2 | 0.560 | 0.500 | 9.400 | 6.000 |

Now that we only have one row, we have a leaf.

With a leaf, there is no factor to determine and no need to split any further. So we create a leaf by taking the Y value as the Split Value. Since it’s a leaf, it’s the end of the line, so there is no value for left and right. The value we will enter is NAN.

Step Final: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

Again, since we don’t know where the right decision nodes are yet, we cannot update that anywhere.

#### DECISION #1.2.2 - LEFT TREE: RIGHT SUBTREE: RIGHT LEAF

Step 0: The data

row # | X2 | X10 | X11 | Y |

3 | 0.735 | 0.570 | 9.800 | 5.000 |

Now that we only have one row, we have a leaf.

Step Final: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | ? |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

This completes the right subtree of the left tree.
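Because Left and Right are relative offsets, the finished left tree (nodes 1 through 7) can be lifted out and walked on its own. The sketch below is one possible way to follow those offsets; `LEFT_TREE` and `query` are hypothetical names, factors are written as "X2" and "X11", and it assumes values at or below SplitVal go left.

```python
from math import nan

# Nodes 1-7 of the table above, re-rooted so that node 1 sits at index 0.
LEFT_TREE = [
    ("X11", 9.250, 1, 4),
    ("X2", 0.748, 1, 2),
    ("LEAF", 3.0, nan, nan),
    ("LEAF", 4.0, nan, nan),
    ("X2", 0.648, 1, 2),
    ("LEAF", 6.0, nan, nan),
    ("LEAF", 5.0, nan, nan),
]


def query(tree, point):
    """Walk from the first row to a leaf by adding relative offsets."""
    i = 0
    while tree[i][0] != "LEAF":
        factor, split_val, left, right = tree[i]
        # Assumption: at or below the split value goes left.
        i += left if point[factor] <= split_val else right
    return tree[i][1]


# Row 0 and row 2 of the data set should land on their own leaves.
print(query(LEFT_TREE, {"X2": 0.885, "X10": 0.330, "X11": 9.100}))  # 4.0
print(query(LEFT_TREE, {"X2": 0.560, "X10": 0.500, "X11": 9.400}))  # 6.0
```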

### DECISION #1 - UPDATE

Now that we know where the right tree of the main tree will start, let’s update the root tree node’s right relative value. Since the tree node is node zero (0) and the right tree will start on node 8, the value is 8 (8-0).

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | 8 |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

8 |

## DECISION #2 - RIGHT TREE

With my subtree, I now have this data:

row # | X2 | X10 | X11 | Y |

7 | 0.320 | 0.780 | 10.000 | 6.000 |

6 | 0.500 | 0.680 | 10.500 | 7.000 |

1 | 0.725 | 0.390 | 10.900 | 5.000 |

5 | 0.260 | 0.630 | 11.800 | 8.000 |

Step 1: Determine the factor.

Step 1a: Determine the correlation.

row # | X2 | X10 | X11 | Y |

correl | -0.750 | 0.484 | 0.542 |

Step 1b: Analyze the absolute values of the correlation

row # | X2 | X10 | X11 | Y |

correl | 0.750 | 0.484 | 0.542 |

The biggest impact will be X2.

Step 2: Determine the Split Value by taking the median.

The median of X2 in this subtree is **0.410**.

Step 3: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | 8 |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

8 | X2 | 0.410 | 1 | ? |

Since we don’t know where the right decision nodes are yet, we cannot update that.

Step 4: Split the data. Rows 7 and 5, whose X2 is at or below the split value of 0.410, go left; rows 6 and 1 go right:

row # | X2 | X10 | X11 | Y |

correl | -0.750 | 0.484 | 0.542 | |

7 | 0.320 | 0.780 | 10.000 | 6.000 |

5 | 0.260 | 0.630 | 11.800 | 8.000 |

6 | 0.500 | 0.680 | 10.500 | 7.000 |

1 | 0.725 | 0.390 | 10.900 | 5.000 |

### DECISION #2.1 - RIGHT TREE: LEFT SUBTREE

Step 0: The data

row # | X2 | X10 | X11 | Y |

7 | 0.320 | 0.780 | 10.000 | 6.000 |

5 | 0.260 | 0.630 | 11.800 | 8.000 |

Step 1: Determine the factor.

Step 1a: Determine the correlation

row # | X2 | X10 | X11 | Y |

correl | -1.000 | -1.000 | 1.000 |

Step 1b: Analyze the absolute values of the correlation

row # | X2 | X10 | X11 | Y |

correl | 1.000 | 1.000 | 1.000 |

All the correlations are the same, so let’s take the first one, X2.

Step 2: Determine the Split Value by taking the median.

X2’s median of this subtree is 0.290.

Step 3: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | 8 |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

8 | X2 | 0.410 | 1 | ? |

9 | X2 | 0.290 | 1 | 2 |

Again, since we don’t know where the right decision nodes are yet, we cannot update that. However, since there are only two lines of data remaining, we know what the left and right relative node values will be. The left is 1 as always and the right here is 2, which will always be the case for a node containing two leaves.

Step 4: Split the data. Row 5 (X2 of 0.260, at or below the split value of 0.290) goes left; row 7 goes right:

row # | X2 | X10 | X11 | Y |

correl | -1.000 | -1.000 | 1.000 | |

5 | 0.260 | 0.630 | 11.800 | 8.000 |

7 | 0.320 | 0.780 | 10.000 | 6.000 |

#### DECISION #2.1.1 - RIGHT TREE: LEFT SUBTREE: LEFT LEAF

Step 0: The data

row # | X2 | X10 | X11 | Y |

5 | 0.260 | 0.630 | 11.800 | 8.000 |

Row 5 is the left leaf because its X2 value of 0.260 is below the split value of 0.290. Now that we only have one row, we have a leaf.

Step Final: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | 8 |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

8 | X2 | 0.410 | 1 | ? |

9 | X2 | 0.290 | 1 | 2 |

10 | LEAF | 8.000 | nan | nan |

Again, since we don’t know where the right decision nodes are yet, we cannot update that anywhere.

#### DECISION #2.1.2 - RIGHT TREE: LEFT SUBTREE: RIGHT LEAF

Step 0: The data

row # | X2 | X10 | X11 | Y |

7 | 0.320 | 0.780 | 10.000 | 6.000 |

Row 7 is the right leaf because its X2 value of 0.320 is above the split value of 0.290. Now that we only have one row, we have a leaf.

Step Final: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | 8 |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

8 | X2 | 0.410 | 1 | ? |

9 | X2 | 0.290 | 1 | 2 |

10 | LEAF | 8.000 | nan | nan |

11 | LEAF | 6.000 | nan | nan |

This completes the left subtree of the right tree.

#### DECISION #2.1 - UPDATE

Now that we know where the right subtree of the right tree will start, let’s update that tree node’s right relative value. Since the tree node is node 8 and the right subtree will start on node 12, the value is 4 (12-8).

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | 8 |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

8 | X2 | 0.410 | 1 | 4 |

9 | X2 | 0.290 | 1 | 2 |

10 | LEAF | 8.000 | nan | nan |

11 | LEAF | 6.000 | nan | nan |

12 |

### DECISION #2.2 - RIGHT TREE: RIGHT SUBTREE

Step 0: The data

row # | X2 | X10 | X11 | Y |

6 | 0.500 | 0.680 | 10.500 | 7.000 |

1 | 0.725 | 0.390 | 10.900 | 5.000 |

Step 1: Determine the factor.

Step 1a: Determine the correlation

row # | X2 | X10 | X11 | Y |

correl | -1.000 | 1.000 | -1.000 |

Step 1b: Analyze the absolute values of the correlation

row # | X2 | X10 | X11 | Y |

correl | 1.000 | 1.000 | 1.000 |

All the correlations are the same, so let’s take the first one, X2.

Step 2: Determine the Split Value by taking the median.

X2’s median of this subtree is 0.613 (the mean of 0.500 and 0.725 is 0.6125).

Step 3: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | 8 |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

8 | X2 | 0.410 | 1 | 4 |

9 | X2 | 0.290 | 1 | 2 |

10 | LEAF | 8.000 | nan | nan |

11 | LEAF | 6.000 | nan | nan |

12 | X2 | 0.613 | 1 | 2 |

Step 4: Split the data. Row 6, with the smaller X2, goes left; row 1 goes right:

row # | X2 | X10 | X11 | Y |

correl | -1.000 | 1.000 | -1.000 | |

6 | 0.500 | 0.680 | 10.500 | 7.000 |

1 | 0.725 | 0.390 | 10.900 | 5.000 |

#### DECISION #2.2.1 - RIGHT TREE: RIGHT SUBTREE: LEFT LEAF

Step 0: The data

row # | X2 | X10 | X11 | Y |

6 | 0.500 | 0.680 | 10.500 | 7.000 |

Now that we only have one row, we have a leaf.

Step Final: Add node to the decision tree.

Tree | ||||

node | Factor | SplitVal | Left | Right |

0 | 11 | 9.900 | 1 | 8 |

1 | 11 | 9.250 | 1 | 4 |

2 | X2 | 0.748 | 1 | 2 |

3 | LEAF | 3.000 | nan | nan |

4 | LEAF | 4.000 | nan | nan |

5 | X2 | 0.648 | 1 | 2 |

6 | LEAF | 6.000 | nan | nan |

7 | LEAF | 5.000 | nan | nan |

8 | X2 | 0.410 | 1 | 4 |

9 | X2 | 0.290 | 1 | 2 |

10 | LEAF | 8.000 | nan | nan |

11 | LEAF | 6.000 | nan | nan |

12 | X2 | 0.613 | 1 | 2 |

13 | LEAF | 7.000 | nan | nan |

Every right offset is already filled in at this point, so nothing else in the tree needs updating.

#### DECISION #2.2.2 - RIGHT TREE: RIGHT SUBTREE: RIGHT LEAF

Step 0: The data

row # | X2 | X10 | X11 | Y |

1 | 0.725 | 0.390 | 10.900 | 5.000 |

Now that we only have one row, we have a leaf.

Step Final: Add node to the decision tree.

Tree

| node | Factor | SplitVal | Left | Right |
|---|---|---|---|---|
| 0 | 11 | 9.900 | 1 | 8 |
| 1 | 11 | 9.250 | 1 | 4 |
| 2 | X2 | 0.748 | 1 | 2 |
| 3 | LEAF | 3.000 | nan | nan |
| 4 | LEAF | 4.000 | nan | nan |
| 5 | X2 | 0.648 | 1 | 2 |
| 6 | LEAF | 6.000 | nan | nan |
| 7 | LEAF | 5.000 | nan | nan |
| 8 | X2 | 0.410 | 1 | 4 |
| 9 | X2 | 0.290 | 1 | 2 |
| 10 | LEAF | 8.000 | nan | nan |
| 11 | LEAF | 6.000 | nan | nan |
| 12 | X2 | 0.613 | 1 | 2 |
| 13 | LEAF | 7.000 | nan | nan |
| 14 | LEAF | 5.000 | nan | nan |

This completes the right tree.

## CONCLUSION

This completes the decision tree. Now go forth and make this in Python.
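As a starting point, here is one possible recursive sketch of the whole procedure. It is not a definitive implementation: `build_tree`, `pearson`, `DATA`, and `NAMES` are illustrative names, factors are written as "X2", "X10", "X11", correlations are rounded to three decimals so ties go to the first factor, and rows at or below the split value are assumed to go left. It has no guards for a constant column (zero variance) or for a split that sends every row to one side, neither of which occurs in this data set.

```python
from math import nan, sqrt
from statistics import median

NAMES = ["X2", "X10", "X11"]
DATA = [  # (X2, X10, X11, Y), rows 0-7 of the data set
    (0.885, 0.330, 9.100, 4.0),
    (0.725, 0.390, 10.900, 5.0),
    (0.560, 0.500, 9.400, 6.0),
    (0.735, 0.570, 9.800, 5.0),
    (0.610, 0.630, 8.400, 3.0),
    (0.260, 0.630, 11.800, 8.0),
    (0.500, 0.680, 10.500, 7.0),
    (0.320, 0.780, 10.000, 6.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def build_tree(rows):
    """Return the tree as a list of (Factor, SplitVal, Left, Right) rows."""
    if len(rows) == 1:
        # Leaf: the Y value goes in the SplitVal column, no children.
        return [("LEAF", rows[0][3], nan, nan)]
    ys = [r[3] for r in rows]
    # Step 1: factor with the largest absolute correlation (first one on a tie).
    scores = [round(abs(pearson([r[i] for r in rows], ys)), 3) for i in range(3)]
    best = scores.index(max(scores))
    # Step 2: split value is the median of that factor.
    split_val = median(r[best] for r in rows)
    # Step 4: split the data, left first, then right.
    left = build_tree([r for r in rows if r[best] <= split_val])
    right = build_tree([r for r in rows if r[best] > split_val])
    # Step 3: the node goes first; the right child starts after the left subtree.
    return [(NAMES[best], split_val, 1, len(left) + 1)] + left + right


tree = build_tree(DATA)
for node, (factor, split_val, left, right) in enumerate(tree):
    print(node, factor, round(split_val, 3), left, right)
```

Building the left subtree before the right one is what lets the Right column be filled in as `len(left) + 1`, which is the same bookkeeping as the UPDATE steps in the walkthrough.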
