Hierarchical data in MySQL

A simple approach on solving this complex problem of managing hierarchical data using MySQL. - 31 Dec 2012

One of the most complex problem I’ve seen as a developer is dealing with some kind of hierarchy in a data model. It’s not much about storing the relationship but more it comes to generating reports that take into account this hierarchy.

In this post, I will expose a simple approach I’ve found while dealing with these issues. For the sake of this demonstration we will use the following scenario:

You are managing an affiliate program in which you pay users $5 for each user they refer. On top of that you have a 3-tier commissions on purchases these users make. You pay 5% of the total amount to direct referrals, 2% on level 2 referrals and 1% on level 3 referrals.

Table Structure

Here is the table structure we will be using for this scenario.

Users

id username referrer_id
1 user.1 NULL
2 user.2 1
3 user.3 NULL
4 user.4 3
5 user.5 2
6 user.6 2
7 user.7 4
8 user.8 7
9 user.9 6
10 user.10 NULL

Purchases

id user_id timestamp amount
1 1 2012-12-01 08:00 14.95
2 2 2012-12-01 08:15 24.95
3 4 2012-12-01 08:25 4.95
4 4 2012-12-01 08:40 9.95
5 8 2012-12-01 08:55 54.95
6 7 2012-12-01 09:10 34.95
7 10 2012-12-01 09:20 29.95
8 8 2012-12-01 09:35 39.95
9 2 2012-12-01 09:40 14.95
10 7 2012-12-01 10:15 19.95

The concept of a Lineage

The concept of a lineage is quite simple. You store as text the hierarchical representation in your Database. For doing this, all we need to do is to add a column called lineage in our users table. Then we need to fill this column.

The first query we will run will be to populate the lineage for users who do not have a referrer.

UPDATE `users`
SET `lineage` = `id`
WHERE `referrer_id` IS NULL

Then we will run the following query in a for loop until we have 0 affected rows. This will populate the lineage for the other users.

UPDATE `users` u1, `users` u2
SET u1.`lineage` = CONCAT(u2.`lineage`, '-', u1.`id`)
WHERE u1.`lineage` IS NULL
AND u2.`id` = u1.`referrer_id`
AND u2.`lineage` IS NOT NULL

The users table will look like the following once the lineage column has been populated.

id username referrer_id lineage
1 user.1 NULL 1
2 user.2 1 1-2
3 user.3 NULL 3
4 user.4 3 3-4
5 user.5 2 1-2-5
6 user.6 2 1-2-6
7 user.7 4 3-4-7
8 user.8 7 3-4-7-8
9 user.9 6 1-2-6-9
10 user.10 NULL 10

Calculating the sign up commission for a single user

This case is quite simple. It can be done by executing the following query.

SELECT u.`id`, u.`username`, COUNT(r.`id`) * 5 as `signup_commission`
FROM `users` u
LEFT JOIN `users` r ON u.`id` = r.`referrer_id`
WHERE u.`id` = '1'

Basically, we simply join the users table on the referrer_id and counting how many that gives for a single user. In this case, it will give the following result.

id username signup_commission
1 user.1 5

Calculating the sign up commission for all users

This case is quite similar to the previous case. All we have to do is remove the filter for the specific user and grouping by the user id.

SELECT u.`id`, u.`username`, COUNT(r.`id`) * 5 as `signup_commission`
FROM `users` u
LEFT JOIN `users` r ON u.`id` = r.`referrer_id`
GROUP BY u.`id`

This would generate the following results.

id username signup_commission
1 user.1 5
2 user.2 10
3 user.3 5
4 user.4 5
5 user.5 0
6 user.6 5
7 user.7 5
8 user.8 0
9 user.9 0
10 user.10 0

Alternatively, you can calculate the commission of all users of a same level with this query.

SELECT u.`id`, u.`username`, COUNT(r.`id`) * 5 as `signup_commission`, length(u.`lineage`)-length(replace(u.`lineage`,"-","")) as `level`
FROM `users` u
LEFT JOIN `users` r ON u.`id` = r.`referrer_id`
GROUP BY u.`id`
HAVING `level` = 0

Calculating the purchases commission for a single user

This is where it starts to get complex and where the lineage becomes very useful. Basically, we will join the users table for the referrals but this time we will join using the lineage. It would give the following query.

SELECT u.`id`, u.`username`, 
COALESCE(SUM(CASE
	WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 1 THEN .05 * p.`amount`
	WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 2 THEN .02 * p.`amount`
	WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 3 THEN .01 * p.`amount`
	ELSE 0
END), 0) as `purchases_commission`
FROM `users` u
LEFT JOIN `users` r ON r.`lineage` LIKE CONCAT(u.`lineage`, "-%")
LEFT JOIN `purchases` p ON p.`user_id` = r.`id`
WHERE u.`id` = '1'

In this case, it will give the following result.

id username signup_commission
1 user.1 1.9950

Calculating the purchases commission for all users

This one is similar to the previous one except that we replace the filter for the single user by grouping the users.

SELECT u.`id`, u.`username`, 
COALESCE(SUM(CASE
	WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 1 THEN .05 * p.`amount`
	WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 2 THEN .02 * p.`amount`
	WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 3 THEN .01 * p.`amount`
	ELSE 0
END), 0) as `purchases_commission`
FROM `users` u
LEFT JOIN `users` r ON r.`lineage` LIKE CONCAT(u.`lineage`, "-%")
LEFT JOIN `purchases` p ON p.`user_id` = r.`id`
GROUP BY u.`id`

This would generate the following results.

id username signup_commission
1 user.1 1.9950
2 user.2 0.0000
3 user.3 2.7920
4 user.4 4.6430
5 user.5 0.0000
6 user.6 0.0000
7 user.7 4.7450
8 user.8 0.0000
9 user.9 0.0000
10 user.10 0.0000

Creating the final commission report for all users

This is basically combining the queries we’ve been running above. The query would be the following.

SELECT u.`id`, u.`username`, COUNT(DISTINCT(r1.`id`)) * 5 +
COALESCE(SUM(CASE
	WHEN length(replace(r2.`lineage`, u.`lineage`, ""))-length(replace(replace(r2.`lineage`, u.`lineage`, ""),"-","")) = 1 THEN .05 * p.`amount`
	WHEN length(replace(r2.`lineage`, u.`lineage`, ""))-length(replace(replace(r2.`lineage`, u.`lineage`, ""),"-","")) = 2 THEN .02 * p.`amount`
	WHEN length(replace(r2.`lineage`, u.`lineage`, ""))-length(replace(replace(r2.`lineage`, u.`lineage`, ""),"-","")) = 3 THEN .01 * p.`amount`
	ELSE 0
END), 0) as `total_commission`
FROM `users` u
LEFT JOIN `users` r1 ON u.`id` = r1.`referrer_id`
LEFT JOIN `users` r2 ON r2.`lineage` LIKE CONCAT(u.`lineage`, "-%")
LEFT JOIN `purchases` p ON p.`user_id` = r2.`id`
GROUP BY u.`id`

This would generate the following results.

id username signup_commission
1 user.1 6.9950
2 user.2 10.0000
3 user.3 7.7920
4 user.4 9.6430
5 user.5 0.0000
6 user.6 5.0000
7 user.7 9.7450
8 user.8 0.0000
9 user.9 0.0000
10 user.10 0.0000

To conclude this post, the lineage concept has been extremely useful to me. I’ve used it on several project to create advanced reports and real-time dashboards.

MySQL
MySQL , Join , Hierarchy
comments powered by Disqus