Recursive CTE's

No matter what programming language I've used in my career there has always been an occasion where I needed to create a function and call it recursively so that the last calculated result is fed back into the same calculation as an input to the output for the next invocation.

In SQL we can do the same thing with Common Table Expressions (CTE's) that we can make recursive; being careful that they don't spiral out of control to run indefinitely.

Let's use some arbitrary base data like this (itself a CTE):

WITH RECURSIVE_TEST AS
 (SELECT ROWNUM AS EMP_RANK,
         'Emp #' || ROWNUM TITLE,
         CASE ROWNUM
            WHEN 1 THEN
             10
            WHEN 5 THEN
             40
            WHEN 6 THEN
             50
            WHEN 10 THEN
             5
            WHEN 11 THEN
             6
            WHEN 13 THEN
             8
            WHEN 15 THEN
             7
            WHEN 18 THEN
             4
            WHEN 19 THEN
             2
            WHEN 20 THEN
             1
         END AS COMMISSION
  FROM   DUAL
  CONNECT BY LEVEL <= 20)
SELECT EMP_RANK,
       TITLE,
       COMMISSION
FROM   RECURSIVE_TEST
ORDER  BY EMP_RANK;

OK let's say this is some sort of strange business where the higher your rank (1 being the lowest) you get your own commission plus the commission of the employee below your rank but it must include the result of the same calculation from that employee below them and any from any employee below them.

So employee #1 has a commission of 10, employees #2 through #4 don't have a commission of their own so they get 10 carried forward from #1 up the chain.

Employee #5 has a commission of 40 so she will get 10 + 40 = 50. The 10 she got is from employee #4 who in turn inherited that from #3, who got theirs from #2 who got the 10 from #1.

Employee #6 has a commission of 50 so they should get the 50 from employee #5 (not just the 40 as it needs to be the result of the calculation for employee #5) added to their own 50 = 100.

If we simply used the LAG function (ignoring nulls) to pull the commission forward that wouldn't solve the problem as it would give us this:

You can see that the LAG_COMM column simply pulls forward the last employee's commission value, not their calculated commission so the LAG_COMM_PLUS_COMM result isn't the 100 expected but only 90.

To solve this particular problem, we could just simply use an analytic function of SUM(COMMISSION) OVER(ORDER BY EMP_RANK) to get the desired result but we will use a recursive CTE as that is the whole point of this blog!

WITH RECURSIVE_TEST AS
 (SELECT ROWNUM AS EMP_RANK,
         'Emp #' || ROWNUM TITLE,
         CASE ROWNUM
            WHEN 1 THEN
             10
            WHEN 5 THEN
             40
            WHEN 6 THEN
             50
            WHEN 10 THEN
             5
            WHEN 11 THEN
             6
            WHEN 13 THEN
             8
            WHEN 15 THEN
             7
            WHEN 18 THEN
             4
            WHEN 19 THEN
             2
            WHEN 20 THEN
             1
         END AS COMMISSION
  FROM   DUAL
  CONNECT BY LEVEL <= 20),
RECURSIVE(EMP_RANK,
COMMISSION,
LAG_COMM_PLUS_COMM) AS
 (SELECT EMP_RANK,
         COMMISSION,
         COMMISSION AS LAG_COMM_PLUS_COMM
  FROM   RECURSIVE_TEST
  WHERE  EMP_RANK = 1
  UNION ALL
  SELECT B.EMP_RANK,
         B.COMMISSION,
         R.LAG_COMM_PLUS_COMM + NVL(B.COMMISSION, 0) AS LAG_COMM_PLUS_COMM
  FROM   RECURSIVE_TEST B
  JOIN   RECURSIVE R
  ON     R.EMP_RANK = B.EMP_RANK - 1)
SELECT EMP_RANK,
       COMMISSION,
       LAG_COMM_PLUS_COMM
FROM   RECURSIVE
ORDER  BY EMP_RANK;

Here we create our new CTE that has been arbitrarily named RECURSIVE, making sure we explicitly define it with the three columns it returns: EMP_RANK, COMMISSION and LAG_COMM_PLUS_COMM.

In this CTE you can see we are starting with the lowest rank of 1 as a way to prime the statement and so Oracle knows where to start and how to progress "up the hierarchy". The UNION ALL then iterates through all rows and looks back one row each time to pull forward its calculated value to apply it to the current row. This is the key difference to the LAG approach, the recursion sees the result of the prior row and uses that for the current row.

That's it and now we get the correct output for #6 and all their colleagues:

Just for completeness, here is the SQL to get the exact same result using the SUM analytic approach:

WITH RECURSIVE_TEST AS
 (SELECT ROWNUM AS EMP_RANK,
         'Emp #' || ROWNUM TITLE,
         CASE ROWNUM
            WHEN 1 THEN
             10
            WHEN 5 THEN
             40
            WHEN 6 THEN
             50
            WHEN 10 THEN
             5
            WHEN 11 THEN
             6
            WHEN 13 THEN
             8
            WHEN 15 THEN
             7
            WHEN 18 THEN
             4
            WHEN 19 THEN
             2
            WHEN 20 THEN
             1
         END AS COMMISSION
  FROM   DUAL
  CONNECT BY LEVEL <= 20)
SELECT EMP_RANK,
       TITLE,
       COMMISSION,
       SUM(COMMISSION) OVER(ORDER BY EMP_RANK) AS FINAL_COMMISSION
FROM   RECURSIVE_TEST
ORDER  BY EMP_RANK;