17/9/12

Data structures: the Tree (balancing headaches)


I need to explain an extra concept in this Data structure, so the post will be longer than usual. Worth it, though! 


What is a tree?


I apologize for the joke.

A tree is a structure that you must already know about if you've ever dealt with graphs. It consists of a first node that acts like a father father, or root, with a number of sons born from it, and linked with branches. Each son, in time, has its own sons, and so on and so forth. Those nodes who have no offspring are called leaves.
(The nodes are the letter by the way)
Now if you're already a step ahead of me, you might foresee trees are a very loose concept. There's so many types! So many sons, and directions! How can we possibly make a good structure out of this?

Having that in mind we'll introduce 2 restrictions:
  1. The tree must be binary; meaning, any node can have no more than 2 sons (but can have less!).
  2. The elements that are smaller (in any logic) than their father must hang on the right of its father; otherwise, they must hang from the left. You can do it vice versa if you prefer
Other than that, the other restrictions stand, like not mixing types of elements, etc.
Example of a binary ordered tree


What can you do with a tree?
  • Create: begin the tree data structure, reserve space.
  • Add: put a new element. Keep pushing it down the tree this way (by convention, you can change it if you prefer): to the left of the current element if it's bigger than it, or to the right if it's smaller than it.
  • Delete: remove the specified element.
  • Alter: change the value in the element.
  • Value: get the value of the element.
  • First: get the first element.
  • Size: gives the number of listed objects
  • Empty: checks whether the list has no elements.
    The tree has 3 levels: 4, 5-8 and 3-6-7-9. You can reach any number touching only 3 nodes.
Advantages
  • Finding costs less than in a list. Worst case scenario, a list must run through all the elements; with a tree, you only have to run through as many nodes as levels on a tree.
Setbacks
  • Adding and deleting cost more than in a list
  • This only works wonders if the trees are balanced

Balancing
Balancing is one of the major drawbacks of trees. Take this tree for instance.

What we've done here is correct, according to our restrictions. However, for five elements, we have four levels! Before we had a tree with 7 nodes that only had 3 levels! This is a serious efficiency problem that disrupts all the advantages of a tree.

Wouldn't it be much better if it looked like this?
Now we have 3 levels. What we've done here is balance a tree. When a tree self-balances automatically, we call it an AVL tree.
So how do we balance a tree?
Luckily for us, there is a logic, recursive algorithm to do just that. Unfortunately for us, like any new algorithm, it will take some time to understand it.

Courtesy of wikipedia: http://en.wikipedia.org/wiki/AVL_tree
Bear in mind:
  • 3, 4, 5 are the nodes
  • A, B, C, D are sub-trees born from their respective nodes. That comes to suggest you can recursively solve a tree's balancing by balancing their subtrees, one by one, in their smallest nature.

There are four actions you must take, depending on the four cases of imbalance.
  1. Left-Right: will lead to Left-Left. The root
  2. Left-Left: will balance the tree.
  3. Right-Left: will lead to Right-Right.
  4. Right-Right: will balance the tree.

Implementation
We can pretty much recycle the ListEl from the list; however we will use two pointers instead of one, for the left and right sons, thus creating a TreeEl. It should also be noted that this implementation does not contemplate balance.
/*
 * tree.h
 *
 *  Created on: 11/08/2012
 *      Author: Daniel
 */


#ifndef TREE_H_
#define TREE_H_

template<typename T>
class TreeEl { //Elements that go into the list

private:
    T* _element; //Actual information
    TreeEl *_left; //for binary trees; for bigger-offspringed trees you might consider an array of pointers.
    TreeEl *_right;

public:
    TreeEl() {
        _element = NULL;
        _left = NULL;
        _right = NULL;
    }

    TreeEl(T input) {
        _element = new T();
        *_element = input;
        _left = NULL;
        _right = NULL;
    }

    T getElement() {
        return *_element;
    }
    void setElement(T in) {
        _element = new T();
        *_element = in;
    }
    TreeEl* getLeft() {
        return _left;
    }

    TreeEl* getRight() {
        return _right;
    }

    bool pointLeftTo(TreeEl* obj) { //We retain the direction, not the element, for versatility; else we'd be playing a game of russian dolls.
        if (obj != NULL) { //little failsafe to ensure the TreeEl exists and we're not being fed null stuff.
            _left = obj;
            return true;
        }
        std::cout << "Error at pointTo, object is null";
        return false;
    }

    bool pointRightTo(TreeEl* obj) { //We retain the direction, not the element, for versatility; else we'd be playing a game of russian dolls.
        if (obj != NULL) { //little failsafe to ensure the TreeEl exists and we're not being fed null stuff.
            _right = obj;
            return true;
        }
        std::cout << "Error at pointTo, object is null";
        return false;
    }

    void clear() {
        _element = new T();
        _left = NULL;
        _right = NULL;
    }

    bool isLeaf() {
        if (_left == NULL && _right == NULL) { //That would mean no offspring -> end of tree
            return true;
        }
        return false;
    }
};

//------------------------------------------------------------------------------------------------------------------------------------------

//------------------------------------------------------------------------------------------------------------------------------------------

//------------------------------------------------------------------------------------------------------------------------------------------

template<typename T>
class Tree {

private:
    TreeEl<T> *_root; //first of them all.
    TreeEl<T> *_it; //iterator
    int _size;
public:
    Tree(T val) { //let's say it goes from less (left) to more (right)
        _root = new TreeEl<T>(val);
        _it = _root;
        _size = 1;
    }

    void root() {
        _it = _root;
    }
    void left() {
        _it = _it->getLeft();
    }
    void right() {
        _it = _it->getRight();
    }

    void add(T newv) {
        TreeEl<T> *taux = new TreeEl<T>(newv);
        _it = _root;
        while (!_it->isLeaf()) { //That would mean no offspring -> end of tree
            if (biggerOrEq(newv, _it->getElement())) {
                _it = _it->getRight();
            } else {
                _it = _it->getLeft();
            }
        }

        if (biggerOrEq(newv, _it->getElement())) {
            _it->pointRightTo(taux);
        } else {
            _it->pointLeftTo(taux);
        }
    }

    bool biggerOrEq(T newv, T oldv) { //to show the <s and >s should be implemented for more than one typename T, here I'll just implement for ints
        return newv >= oldv ? true : false;

        //This is a one-line way of writing if-else!
        // newv >= oldv ? true : false;
        // is newv bigger-or-equal towards oldv ? if so, true : if not, false;
    }

};

#endif /* TREE_H_ */



------

Now you try and make a function that fetches a node no matter where it is inside the tree.

Finally, hash maps!

No hay comentarios:

Publicar un comentario en la entrada