15,667,904 members
Articles / Programming Languages / Objective C
Article
Posted 5 Mar 2007

105.2K views
52 bookmarked

# Binary Tree Introduction

Rate me:
Description of binary trees and fast search in one-dimensional data

## Introduction

In this article, I have given an introduction of binary trees and hierarchical data structures. In the sample project, I have compared binary trees with qsort. The binary tree is defined with a C++ template. It can be used from any environment supporting C++, and for any data type supporting data comparison operators, `<` and `>`. The description is easy to follow. To use the template, you need to include BTree.h in your C++ project. For balancing and optimization of data insertion, I have used a simple reordering method instead of Red-Black and AVL trees. The advantage is that data insertion and tree-creation require slightly less time. Even if not perfectly balanced, this method ensures that the data search requires `<code>log2` comparing operations.

My major interests are multi-dimensional data structures so I mostly utilize algorithms which can be easily generalized for N-dimensions.

## Background

Binary trees are hierarchical data structures which allow insertion and a fast, nearest-neighbours search in one-dimensional data. It can be used instead of qsort and binary search to quickly find the closest points in a data array. The binary tree has the advantage of having a simple structure that allows generalization for more than one dimension - the so-called `KD Tree`. Therefore, it is good to understand how it works and how it performs data searches. I will keep the explanation clear and easy to follow.

First let's define a `tree`. A tree is a data-structure composed from a set of nodes. Each node has links to a number of other nodes called `children` and `parents`. The difference between them is that children are chosen by some `criterion` while the parent links merely keep the history of how we have reached a given node.

The binary tree has two children - `<code>Left` and `<code>Right` and one `parent`. The parent link is necessary for nearest-neighbours searching and for optimization of the tree.

Each node of the tree contains some type of data with value `X`. The left child node must have a smaller `X` value than the corresponding right value - `<code>X > Left->X` and `<code>Right->X > X`. This is the basic criterion of the binary tree. The tree also has a `<code>Root`. The root is the only node which does not have a `Parent`. When a new node with value `X` needs to be added to the tree, we always start from `Root` and go down in the tree until we reach an empty location. Each time when we pass through a node, we decide to go left or right by comparing the value `X` with the values of the node - if `X` is smaller, we go left, if it is bigger, we go right.

In the above example, the new value is `100`. To find the location of this value, follow the steps within the tree. Starting from the root we check if `100` is smaller or bigger then `127`. In this case it is smaller so we go left in the node with value `111`. Now, `100` is smaller than `111` so we go left to node `60`. This time `100` is bigger, but `60` does not have a `Right` child, it is an empty location, so we attach `100` here.

Now let's see how the binary tree can be used for something useful. Let's see how the limiting range and the nearest neighbour can be found. Let us choose a new value, `125`, for example. Between the two existing tree values, we need to determine which is `125`. Again, we start searching the `Parent` node of `125` and after few steps, we reach node `115`:

It is clear now that since `125` is the `Right` child of `115`, `125` is the closest from the leftmost neighbour. But the search for the right value is not so simple. In this case, the `Root` of the tree is `127`. The rule for binary trees is that the limiting value is the first `Parent` node with a value bigger than `125`. So we have to go up in the tree until we find a `Parent` with a value bigger than `125`. It is possible in some cases that the node has only one limiting neighbour. The code for binary tree declaration, data insertion and nearest neighbour search is given below.

## Code

The declaration and implementation of binary tree is in BTree.h. The class `BTNode` contains `insert` and `remove` functions. The template type is `X`, with two pointers to the `left` and `right` children, and one to the `parent`. I have made only two small extensions over the standard binary tree, and added one member property - `ParentOrientation` defined as `bool`, and one member function - `moveup()` used to optimize insertion.

C++
```//Binary Tree Node
template <class>
class BTNode
{
public:
BTNode();
BTNode(Xtype x);
bool Delete_Tree(BTNode<xtype>* Root);
BTNode(BTNode<xtype>* k1, BTNode<xtype>* k2);
BTNode<xtype> *Left, *Right, *Parent;
BTNode<xtype>* place(BTNode<xtype>* T);
BTNode<xtype>* insert(Xtype x,bool&root_changed,bool optimize=true);
bool insert(BTNode<xtype>* node,bool&root_changed,bool optimize=true);
void remove();
int count_childs();
bool  moveup();
Xtype x;
bool ParentOrientation ; //where is child WRT parent ? 0=left 1=right
};```

The `ParentOrientation` member is used for optimization of insertion and nearest-neighbour search. It identifies what the orientation of a child is with respect to its `parent`; if it is `0`, the child is from the left of the parent, if it is `1` it is on the right. Of course, we can determine this just by comparing the values of the child and the parent, but since the comparing operators `<` and `>` can be slow, I have used this flag to decrease their use. The member function `moveup()` tries to move the node up in the tree. It improves the balance of the tree and optimizes the `insert` operation. The `Insert` then looks like this:

C++
```//insert node to the tree, starts from current node
template <class>
bool
BTNode<xtype>::insert(BTNode<xtype>* node,bool&root_changed, bool optimize)
{
BTNode<xtype>* pNext = this;
BTNode<xtype>* pFather;
while(pNext) //do not insert if already exist
{
pFather = pNext;
if(node->x < pNext->x)
pNext = pNext->Left;
else if(node->x > pNext->x)
pNext = pNext->Right;
else
return false;
}

if(!pNext) //empty place, do not insert value twice
{
node->Parent = pFather;
if(node->x < pFather->x)
{
pFather->Left = node;
node->ParentOrientation = 0 ;
}
else
{
pFather->Right = node;
node->ParentOrientation = 1;
}

if(optimize)
root_changed = node->moveup();
else
root_changed = false ;

return true;
}
return false;
}```

Nearest-neighbour search is explained in the previous section. Below is the code:

C++
```/*   nearest neighbours search
input:
Root - the root of the tree
x - template type with search target value

output:
l - the left limiting value or INVALID_VALUE if there
is no neighbour to left
r - the right limiting value or INVALID_VALUE if there
is no neighbour to right

returns:
- the closest value to x
- INVALID_VALUE - empty tree
*/
template <class>
Xtype FindNearest(BTNode<xtype>* Root, Xtype x, Xtype* l, Xtype* r)
{
Xtype left,right;
BTNode<xtype>* pNext = Root;
BTNode<xtype>* pFather;
bool ParentOrientation ;
while(pNext)
{
pFather = pNext;
if(x<pnext->x)
{
pNext = pNext->Left;
ParentOrientation = false;
}
else if(x>pNext->x)
{
pNext = pNext->Right;
ParentOrientation = true;
}
else
{
*l = pNext->x;
*r = pNext->x;
return pNext->x;
}
}

if(!ParentOrientation)  //x < pFather->x
{
right = pFather->x;
//go up in the tree    and search for the left limit
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
while(pFather && !ParentOrientation)
{
//search parent to the left
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
}

if(!pFather)
{
*l = INVALID_VALUE;  //no neighbour to the left
*r = right;
return right;
}
else
{
*l = pFather->x;
*r = right;
return (x-pFather->x < right-x ? pFather->x:right);
}
}

else //x > pFather->x
{
left = pFather->x;
// go up in the tree and search for the right limit
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
while(pFather && ParentOrientation)
{
//search parent to the right
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
}

if(!pFather)
{
*l = left;
*r = INVALID_VALUE;   //no neighbour to the right
return left;
}
else
{
*l = left;
*r = pFather->x;
return (x-left < pFather->x-x ? left:pFather->x);;
}
}

return INVALID_VALUE; //empty tree
}```

As was mentioned above, optimization of the tree is realized with `moveup()` routine. Why is optimization needed? Well, if the values contained in the tree come in random (not ordered) sequence, the tree will be naturally balanced. However, there is no guarantee that values are not going to come in some non-random pattern. For example, if it is something like `2,5,8,11`..., then the binary tree will become a linked list and the search of nearest neighbours will become sequential search, which is ineffective. There are a lot of balancing methods - Red-Black,AVL... I prefer my own local tree optimization which removes the worst cases. The illustration below shows how it works:

Here `N` is the node to be inserted, `P` is its parent, and `G` is its grand-parent. If it happens like in both cases shown above on the left, or it mirror cases, it is possible to move `P` or `N` up, and `G` down in the tree. The three nodes - `N`, `P` and `G` are reordered as shown on the right. In this way, the height of the tree is decreased, the worst cases are corrected and the balance of the tree is improved. From the test I performed, I discovered that the `insert` operation and tree creation are less than 10% slower. If you increase the average height of randomly chosen data, you find that performance is still 10% slower, but this percentage rises with data size and ensures that the average height is always log2N, where N is the data size. This means that the average path which we will pass while looking for a value in a large array (100 000)will be about 20, because 2 of power 20 makes 100 000. The project source contains the application performing these tests. If you want to check the average tree height for different points, change the numbers for `POINTS_NUM` and change the `generate_random_point()` routine if it is too big.

A list of licenses authors might use can be found here.

Written By
Web Developer
Bulgaria
I live in Sofia, Bulgaria. I am a C++ programer, I had worked in the area of 3D graphics, Win32, GDI, databases and webservers, pattern recognition, inpainting. I am interested from algorithms,data mining, AI, image inpainting,recognition, automation systems.

 First Prev Next
 AVL Trees - Before and After Methods Ben McNamara13-Sep-15 19:17 Ben McNamara 13-Sep-15 19:17
 how to do it for ternary tree svknair29-Apr-11 19:57 svknair 29-Apr-11 19:57
 My vote of 5 ahchliuxing3-Apr-11 20:32 ahchliuxing 3-Apr-11 20:32
 My vote of 4 waenng1-Dec-10 21:00 waenng 1-Dec-10 21:00
 Nice Article Vasu Makam12-Dec-07 5:56 Vasu Makam 12-Dec-07 5:56
 Nice Article Vasu Makam12-Dec-07 5:56 Vasu Makam 12-Dec-07 5:56
 For myself, not from computer science or data strctures background, this is very good article explaining binary trees, sorting etc. However, "imiting range" and "the nearest neighbour" terms are missing explanation for a novice like me. But, I learned many. thanks for sharing.
 Handy Waldermort21-Sep-07 7:20 Waldermort 21-Sep-07 7:20
 BSP Emil - Gabriel25-Mar-07 22:24 Emil - Gabriel 25-Mar-07 22:24
 Sample and source code doesn't exist. SchweinDeBurg6-Mar-07 0:14 SchweinDeBurg 6-Mar-07 0:14
 Re: Sample and source code doesn't exist. vasja314c16-Mar-07 1:44 vasja314c1 6-Mar-07 1:44
 Re: Sample and source code doesn't exist. SchweinDeBurg6-Mar-07 1:49 SchweinDeBurg 6-Mar-07 1:49
 Last Visit: 31-Dec-99 18:00     Last Update: 4-Jun-23 14:28 Refresh 1