A simple approach on solving this complex problem of managing hierarchical data using MySQL. - 31 Dec 2012
One of the most complex problem I’ve seen as a developer is dealing with some kind of hierarchy in a data model. It’s not much about storing the relationship but more it comes to generating reports that take into account this hierarchy.
In this post, I will expose a simple approach I’ve found while dealing with these issues. For the sake of this demonstration we will use the following scenario:
You are managing an affiliate program in which you pay users $5 for each user they refer. On top of that you have a 3-tier commissions on purchases these users make. You pay 5% of the total amount to direct referrals, 2% on level 2 referrals and 1% on level 3 referrals.
Table Structure
Here is the table structure we will be using for this scenario.
Users
id | username | referrer_id |
1 | user.1 | NULL |
2 | user.2 | 1 |
3 | user.3 | NULL |
4 | user.4 | 3 |
5 | user.5 | 2 |
6 | user.6 | 2 |
7 | user.7 | 4 |
8 | user.8 | 7 |
9 | user.9 | 6 |
10 | user.10 | NULL |
Purchases
id | user_id | timestamp | amount |
1 | 1 | 2012-12-01 08:00 | 14.95 |
2 | 2 | 2012-12-01 08:15 | 24.95 |
3 | 4 | 2012-12-01 08:25 | 4.95 |
4 | 4 | 2012-12-01 08:40 | 9.95 |
5 | 8 | 2012-12-01 08:55 | 54.95 |
6 | 7 | 2012-12-01 09:10 | 34.95 |
7 | 10 | 2012-12-01 09:20 | 29.95 |
8 | 8 | 2012-12-01 09:35 | 39.95 |
9 | 2 | 2012-12-01 09:40 | 14.95 |
10 | 7 | 2012-12-01 10:15 | 19.95 |
The concept of a Lineage
The concept of a lineage is quite simple. You store as text the hierarchical representation in your Database. For doing this, all we need to do is to add a column called lineage in our users table. Then we need to fill this column.
The first query we will run will be to populate the lineage for users who do not have a referrer.
UPDATE `users`
SET `lineage` = `id`
WHERE `referrer_id` IS NULL
Then we will run the following query in a for loop until we have 0 affected rows. This will populate the lineage for the other users.
UPDATE `users` u1, `users` u2
SET u1.`lineage` = CONCAT(u2.`lineage`, '-', u1.`id`)
WHERE u1.`lineage` IS NULL
AND u2.`id` = u1.`referrer_id`
AND u2.`lineage` IS NOT NULL
The users table will look like the following once the lineage column has been populated.
id | username | referrer_id | lineage |
1 | user.1 | NULL | 1 |
2 | user.2 | 1 | 1-2 |
3 | user.3 | NULL | 3 |
4 | user.4 | 3 | 3-4 |
5 | user.5 | 2 | 1-2-5 |
6 | user.6 | 2 | 1-2-6 |
7 | user.7 | 4 | 3-4-7 |
8 | user.8 | 7 | 3-4-7-8 |
9 | user.9 | 6 | 1-2-6-9 |
10 | user.10 | NULL | 10 |
Calculating the sign up commission for a single user
This case is quite simple. It can be done by executing the following query.
SELECT u.`id`, u.`username`, COUNT(r.`id`) * 5 as `signup_commission`
FROM `users` u
LEFT JOIN `users` r ON u.`id` = r.`referrer_id`
WHERE u.`id` = '1'
Basically, we simply join the users table on the referrer_id and counting how many that gives for a single user. In this case, it will give the following result.
id | username | signup_commission |
1 | user.1 | 5 |
Calculating the sign up commission for all users
This case is quite similar to the previous case. All we have to do is remove the filter for the specific user and grouping by the user id.
SELECT u.`id`, u.`username`, COUNT(r.`id`) * 5 as `signup_commission`
FROM `users` u
LEFT JOIN `users` r ON u.`id` = r.`referrer_id`
GROUP BY u.`id`
This would generate the following results.
id | username | signup_commission |
1 | user.1 | 5 |
2 | user.2 | 10 |
3 | user.3 | 5 |
4 | user.4 | 5 |
5 | user.5 | 0 |
6 | user.6 | 5 |
7 | user.7 | 5 |
8 | user.8 | 0 |
9 | user.9 | 0 |
10 | user.10 | 0 |
Alternatively, you can calculate the commission of all users of a same level with this query.
SELECT u.`id`, u.`username`, COUNT(r.`id`) * 5 as `signup_commission`, length(u.`lineage`)-length(replace(u.`lineage`,"-","")) as `level`
FROM `users` u
LEFT JOIN `users` r ON u.`id` = r.`referrer_id`
GROUP BY u.`id`
HAVING `level` = 0
Calculating the purchases commission for a single user
This is where it starts to get complex and where the lineage becomes very useful. Basically, we will join the users table for the referrals but this time we will join using the lineage. It would give the following query.
SELECT u.`id`, u.`username`,
COALESCE(SUM(CASE
WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 1 THEN .05 * p.`amount`
WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 2 THEN .02 * p.`amount`
WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 3 THEN .01 * p.`amount`
ELSE 0
END), 0) as `purchases_commission`
FROM `users` u
LEFT JOIN `users` r ON r.`lineage` LIKE CONCAT(u.`lineage`, "-%")
LEFT JOIN `purchases` p ON p.`user_id` = r.`id`
WHERE u.`id` = '1'
In this case, it will give the following result.
id | username | signup_commission |
1 | user.1 | 1.9950 |
Calculating the purchases commission for all users
This one is similar to the previous one except that we replace the filter for the single user by grouping the users.
SELECT u.`id`, u.`username`,
COALESCE(SUM(CASE
WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 1 THEN .05 * p.`amount`
WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 2 THEN .02 * p.`amount`
WHEN length(replace(r.`lineage`, u.`lineage`, ""))-length(replace(replace(r.`lineage`, u.`lineage`, ""),"-","")) = 3 THEN .01 * p.`amount`
ELSE 0
END), 0) as `purchases_commission`
FROM `users` u
LEFT JOIN `users` r ON r.`lineage` LIKE CONCAT(u.`lineage`, "-%")
LEFT JOIN `purchases` p ON p.`user_id` = r.`id`
GROUP BY u.`id`
This would generate the following results.
id | username | signup_commission |
1 | user.1 | 1.9950 |
2 | user.2 | 0.0000 |
3 | user.3 | 2.7920 |
4 | user.4 | 4.6430 |
5 | user.5 | 0.0000 |
6 | user.6 | 0.0000 |
7 | user.7 | 4.7450 |
8 | user.8 | 0.0000 |
9 | user.9 | 0.0000 |
10 | user.10 | 0.0000 |
Creating the final commission report for all users
This is basically combining the queries we’ve been running above. The query would be the following.
SELECT u.`id`, u.`username`, COUNT(DISTINCT(r1.`id`)) * 5 +
COALESCE(SUM(CASE
WHEN length(replace(r2.`lineage`, u.`lineage`, ""))-length(replace(replace(r2.`lineage`, u.`lineage`, ""),"-","")) = 1 THEN .05 * p.`amount`
WHEN length(replace(r2.`lineage`, u.`lineage`, ""))-length(replace(replace(r2.`lineage`, u.`lineage`, ""),"-","")) = 2 THEN .02 * p.`amount`
WHEN length(replace(r2.`lineage`, u.`lineage`, ""))-length(replace(replace(r2.`lineage`, u.`lineage`, ""),"-","")) = 3 THEN .01 * p.`amount`
ELSE 0
END), 0) as `total_commission`
FROM `users` u
LEFT JOIN `users` r1 ON u.`id` = r1.`referrer_id`
LEFT JOIN `users` r2 ON r2.`lineage` LIKE CONCAT(u.`lineage`, "-%")
LEFT JOIN `purchases` p ON p.`user_id` = r2.`id`
GROUP BY u.`id`
This would generate the following results.
id | username | signup_commission |
1 | user.1 | 6.9950 |
2 | user.2 | 10.0000 |
3 | user.3 | 7.7920 |
4 | user.4 | 9.6430 |
5 | user.5 | 0.0000 |
6 | user.6 | 5.0000 |
7 | user.7 | 9.7450 |
8 | user.8 | 0.0000 |
9 | user.9 | 0.0000 |
10 | user.10 | 0.0000 |
To conclude this post, the lineage concept has been extremely useful to me. I’ve used it on several project to create advanced reports and real-time dashboards.
comments powered by Disqus