Nested set model practical examples, part I.

  • Viewed 9348×

  • Blog

Every developer will sooner or later solve a situation how to store hierarchical data (parent – child relation) in relational database. As there is a lot of information and code snippets for representing trees and hierarchies in SQL, I will demonstrate practical examples and source codes you can use and save time on your project.

As a developer I find the nested set model useful for creating the site structure in CMS, creating a geographical location structure, organizing categories in e-shop system and storing article discussions.

One stored procedure

The core of the tree traversing is based on one MySQL stored procedure r_tree_traversal() that handles all operations – insert, delete, move (with leaves) and sort (with leaves).
Selecting and finding subtrees is moved into other stored procedures, depending on what type of data your aplication needs. I also put the most into SQL so you have a minimum source code in your application (no extra classes or functions required). You should easily adapt the tree traversing to your programming language that is storing data in MySQL.

The source code is on GitHub (https://github.com/werc/TreeTraversal) where you find examples of using r_tree_traversal() stored procedure as well.

Benefits and drawbacks of the Tree traversal method

Performance bottleneck is updating the tree. Moving parent with childs and sorting in a branch which takes bunch of queries and it is executed with temporary table as a help.
But you can easily and fast select data and use them for website navigation, breadcrumbs, sitemap etc. Printing selected tree is possible without any recursion.

In situations where your application is reading data most of the time the nested set model is good solution. In cases where you need inserting and updating tree structure primarily you should try a different method of storing hierarchical data.

 

Inserting node into the Tree

Before we can select some data we need to insert them first, right. Let's create our tables. Yes, two tables because it will come that you will need to store translations of nodes (website pages, categories) so it will be ready for that. I will use naming "node" so it will follow the naming conventions used in Joe Celko's book (1).

mysql> select * from tree_map; 
+---------+-----+-----+-----------+
| node_id | lft | rgt | parent_id |
+---------+-----+-----+-----------+
|       1 |   1 |   2 |         0 |
+---------+-----+-----+-----------+

mysql> select * from tree_content; +----+---------+------+------+ | id | node_id | lang | name | +----+---------+------+------+ | 1 | 1 | en | Home | +----+---------+------+------+

Tables tree_map and tree_content with default values.

The lft and rgt columns store left and right side values of a "tree flow". Note the left and right are reserved keywords (R) in MySQL.

  • Rule 1
    The right side value rgt is allways greater than the node left side value lft.
  • Rule 2
    If rgt - lft = 1 then node does not have any leaves (is not a parent).

Let's insert some nodes and create a navigation structure for example. Nodes will be part of the new branch with the same parent (node Home with node_id = 1).

--Insert node HTML&CSS
CALL r_tree_traversal('insert', NULL, 1); --returns node_id = 2
INSERT INTO tree_content (node_id, name) VALUES (2, 'HTML&CSS');
--Insert node JS
CALL r_tree_traversal('insert', NULL, 1); --returns node_id = 3
INSERT INTO tree_content (node_id, name) VALUES (3, 'JS');
--Insert node PHP
CALL r_tree_traversal('insert', NULL, 1); --returns node_id = 4
INSERT INTO tree_content (node_id, name) VALUES (4, 'PHP'); 

Process of inserting node in our stored procedure r_tree_traversal('insert', NULL, pparent_id) is following.

SELECT rgt INTO new_lft FROM tree_map WHERE node_id = pparent_id;
UPDATE tree_map SET rgt = rgt + 2 WHERE rgt >= new_lft;
UPDATE tree_map SET lft = lft + 2 WHERE lft > new_lft;
INSERT INTO tree_map (lft, rgt, parent_id) VALUES (new_lft, (new_lft + 1), pparent_id);
SELECT LAST_INSERT_ID();

Note: New node is allways inserted at the end of the branch.

Tree traversal insert node
The process of inserting node PHP.

Now we will insert nodes under PHP. It will create a new branch and depth. A parent will be PHP node with node_id = 4.

CALL r_tree_traversal('insert', NULL, 4);
INSERT INTO tree_content (node_id, name) VALUES (5, 'Cloud');
CALL r_tree_traversal('insert', NULL, 4);
INSERT INTO tree_content (node_id, name) VALUES (6, 'Debugging');

Tables and tree structure after inserting nodes.

mysql> select * from tree_map; 
+---------+-----+-----+-----------+
| node_id | lft | rgt | parent_id |
+---------+-----+-----+-----------+
|       1 |   1 |  12 |         0 |
|       2 |   2 |   3 |         1 |
|       3 |   4 |   5 |         1 |
|       4 |   6 |  11 |         1 |
|       5 |   7 |   8 |         4 |
|       6 |   9 |  10 |         4 |
+---------+-----+-----+-----------+

mysql> select * from tree_content; +----+---------+------+-----------+ | id | node_id | lang | name | +----+---------+------+-----------+ | 1 | 1 | en | Home | | 2 | 2 | en | HTML&CSS | | 3 | 3 | en | JS | | 4 | 4 | en | PHP | | 5 | 5 | en | Cloud | | 6 | 6 | en | Debugging | +----+---------+------+-----------+
Tree traversal tree structure
Navigation tree will be default for other examples in the article.

Our navigation structure is created. Let's delete a node and describe the script.

 

Deleting node from the Tree

During the deleting process we need to find out if the node has some leaves (has_leafs > 1) or not. By using the Rule 2 we will check if rgt - lft = 1. Depending on that SQL has only one condition. Let's delete the PHP node (node_id = 4).

CALL r_tree_traversal('delete', 4, NULL);

And the process of deleting node PHP in procedure r_tree_traversal('delete', pnode_id, NULL).

SELECT lft, rgt, (rgt - lft), (rgt - lft + 1), parent_id 
  INTO new_lft, new_rgt, has_leafs, width, superior_parent 
  FROM tree_map WHERE node_id = pnode_id;

DELETE FROM tree_content WHERE node_id = pnode_id;

IF (has_leafs = 1) THEN
  DELETE FROM tree_map WHERE lft BETWEEN new_lft AND new_rgt;
  UPDATE tree_map SET rgt = rgt - width WHERE rgt > new_rgt;
  UPDATE tree_map SET lft = lft - width WHERE lft > new_rgt;
ELSE
  DELETE FROM tree_map WHERE lft = new_lft;
  UPDATE tree_map SET rgt = rgt - 1, lft = lft - 1, parent_id = superior_parent 
   WHERE lft BETWEEN new_lft AND new_rgt;
  UPDATE tree_map SET rgt = rgt - 2 WHERE rgt > new_rgt;
  UPDATE tree_map SET lft = lft - 2 WHERE lft > new_rgt;
END IF;

When the node with leaves is deleted than the leaf takes over the level and parent of deleted node.

Tree traversal delete
Navigation structure after deleting PHP node.

 

Moving node and leaves within the Tree

Moving a subtree inside the nested sets is probably hardest part to undersand. While we are moving a node we are changing it's parent and depth.

In this example node JS (node_id = 3) has a parent Home and we will move node JS under PHP (node_id = 4) in the navigation structure. In this case we are using a temporary table to hold subtrees extracted from tree_map and view vw_lftrgt later in the script.

CALL r_tree_traversal('move', 3, 4);
--second parameter is pnode_id, third parameter is pparent_id

The major part of the following script is from Celkos book (1). Bellow I will put an image of tree to visualise the process better.

CREATE TEMPORARY TABLE IF NOT EXISTS working_tree_map
(
   `node_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
   `lft` smallint(5) unsigned DEFAULT NULL,
   `rgt` smallint(5) unsigned DEFAULT NULL,
   `parent_id` smallint(5) unsigned NOT NULL,
    PRIMARY KEY (`node_id`)
);
     
-- put subtree into temporary table working_tree_map
INSERT INTO working_tree_map (node_id, lft, rgt, parent_id)
SELECT t1.node_id, 
       (t1.lft - (SELECT MIN(lft) FROM tree_map WHERE node_id = pnode_id)) AS lft,
       (t1.rgt - (SELECT MIN(lft) FROM tree_map WHERE node_id = pnode_id)) AS rgt,
       t1.parent_id
  FROM tree_map AS t1, tree_map AS t2
 WHERE t1.lft BETWEEN t2.lft AND t2.rgt AND t2.node_id = pnode_id;

DELETE FROM tree_map WHERE node_id IN (SELECT node_id FROM working_tree_map);

SELECT rgt INTO parent_rgt FROM tree_map WHERE node_id = pparent_id;
SET subtree_size = (SELECT (MAX(rgt) + 1) FROM working_tree_map); -- make a gap in tree UPDATE tree_map SET lft = (CASE WHEN lft > parent_rgt THEN lft + subtree_size ELSE lft END), rgt = (CASE WHEN rgt >= parent_rgt THEN rgt + subtree_size ELSE rgt END) WHERE rgt >= parent_rgt; INSERT INTO tree_map (node_id, lft, rgt, parent_id) SELECT node_id, lft + parent_rgt, rgt + parent_rgt, parent_id FROM working_tree_map; -- close gaps in tree UPDATE tree_map SET lft = (SELECT COUNT(*) FROM vw_lftrgt AS v WHERE v.lft <= tree_map.lft), rgt = (SELECT COUNT(*) FROM vw_lftrgt AS v WHERE v.lft <= tree_map.rgt); DELETE FROM working_tree_map; UPDATE tree_map SET parent_id = pparent_id WHERE node_id = pnode_id;
Tree traversal move node

 

Sorting in the Tree

One more function you may need is nodes in required order. If we are changing node position in a branch, the parent_id will be unchaged. We have to renumber lft, rgt values to the right in the branch.

Sorting alphabetically?

You can't simply sort nodes alphabetically while selecting from nested set model, but if you need to you can sort them manualy.

Let's put the PHP (node_id = 4) "behind" HTML&CSS (node_id = 2). We call the stored procedure with these parameters:

CALL r_tree_traversal('order', 4, 2);
--second parameter is pnode_id, third parameter is pparent_id
SELECT lft, rgt, (rgt - lft + 1), parent_id INTO old_lft, old_rgt, width, superior
  FROM tree_map WHERE node_id = pnode_id;

-- is parent 
SELECT parent_id INTO superior_parent FROM tree_map WHERE node_id = pparent_id;

IF (superior = superior_parent) THEN
   SELECT (rgt + 1) INTO new_lft FROM tree_map WHERE node_id = pparent_id;
ELSE
   SELECT (lft + 1) INTO new_lft FROM tree_map WHERE node_id = pparent_id;
END IF;

IF (new_lft != old_lft) THEN
CREATE TEMPORARY TABLE IF NOT EXISTS working_tree_map
(
   `node_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
   `lft` smallint(5) unsigned DEFAULT NULL,
   `rgt` smallint(5) unsigned DEFAULT NULL,
   `parent_id` smallint(5) unsigned NOT NULL,
    PRIMARY KEY (`node_id`)
);

INSERT INTO working_tree_map (node_id, lft, rgt, parent_id)
     SELECT t1.node_id,
	    (t1.lft - (SELECT MIN(lft) FROM tree_map WHERE node_id = pnode_id)) AS lft,
	    (t1.rgt - (SELECT MIN(lft) FROM tree_map WHERE node_id = pnode_id)) AS rgt,
	    t1.parent_id
       FROM tree_map AS t1, tree_map AS t2
      WHERE t1.lft BETWEEN t2.lft AND t2.rgt AND t2.node_id = pnode_id;
            
DELETE FROM tree_map WHERE node_id IN (SELECT node_id FROM working_tree_map);

IF(new_lft < old_lft) THEN -- move up
   UPDATE tree_map SET lft = lft + width WHERE lft >= new_lft AND lft < old_lft;
   UPDATE tree_map SET rgt = rgt + width WHERE rgt > new_lft AND rgt < old_rgt;
   UPDATE working_tree_map SET lft = new_lft + lft, rgt = new_lft + rgt;
END IF;

IF(new_lft > old_lft) THEN -- move down
   UPDATE tree_map SET lft = lft - width WHERE lft > old_lft AND lft < new_lft;
   UPDATE tree_map SET rgt = rgt - width WHERE rgt > old_rgt AND rgt < new_lft;
   UPDATE working_tree_map SET lft = (new_lft - width) + lft, rgt = (new_lft - width) + rgt;
END IF;

INSERT INTO tree_map (node_id, lft, rgt, parent_id)
SELECT node_id, lft, rgt, parent_id
  FROM working_tree_map;
            
DELETE FROM working_tree_map;
END IF;

Lft and rgt values are renumbered to the right from node_id.

Tree traversal sort

 

This was a brief explanation of functions in stored procedure r_tree_traversal(). You can find example and full source on GitHub and if you find some improvements you can create a pull request.

Footnotes

  1. (#) If you need more explanation about different types of data hierarchies and models I recommend Joe Celko's book Trees and hierarchies in SQL for Smarties.
  2. Managing Hierarchical Data in MySQL, Mike Hillyer's blog.
  3. I also recommend MySQL Stored Procedure Programming book by Guy Harrison with Steven Feuerstein to realize full potential of MySQL stored programs.