Working with Recursive SQL Statements
Using recursive CTEs (common table expressions) can be useful when working with recursive or hierarchical data. Using a recursive CTE will allow us to fetch data using a single query, rather than needing to loop over and use several queries to build up the data as it comes in.
Example Context
When trading shares or equities you trade in what is called parcels of shares. A parcel is just a distinct number of shares bought at a particular time. For example, if I buy 200 shares of BHP on the 1/1/2025, and then 100 more on the 2/1/2025 I will have two parcels.
When shares are sold we can choose which parcels to sell first (or even split it across several parcels) so a parcel may be interacted on by many different operations. One such operation is a transfer, where a parcel may be either wholly or partially transferred to a different company. This leads to the creation of a new parcel.
For our example we are going to write a recursive SQL query that allows us to query a parcels history, even when it includes transfers, so we can find the originating parcel.
There are two flows that we shall be looking at, in the first, a simple Buy ➡️ Sell ➡️ Sell flow is done, however the second has a Buy ➡️ Sell ➡️ Transfer ➡️ Sell flow.
Example Setup
For our example we'll use PostgreSQL, if you have an existing environment simply run the below script to seed your database with the appropriate tables:
If you don't have a PostgreSQL environment handy you can just spin one up using docker, using the following compose:
services:
postgres:
image: postgres
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
ports:
- "5432:5432"
adminer:
image: adminer
ports:
- "8080:8080"
depends_on:
- postgres
Start the stack using docker compose up -d
and then navigate to http://localhost:8080. When logging in use 'postgres' as the server, username and password.
The Query
Now we can start working on our recursive query. The data we want to display is a history of all the transactions that lead to a particular transaction. Meaning if the parcel has been transferred we want to traverse the history of it using the linked original parcel (identified by the created_by_parcel_id
field.
When working with recursive queries there are two cases we need to be aware of, the first is the base case, that is the what starts our recursive query, and the second is the recursive case, what causes our query to traverse the recursive structure.
The base case for this query would be a simple select on the parcel table by id:
SELECT
id
FROM parcel
WHERE id = 3
Next we need to determine what the recursive case is, the easiest option is to always determine this separately, before combing it all together. For our recursive case we want to fetch the parcel linked to the current one by the created_by_parcel_id
column:
SELECT
*
FROM parcel p1
JOIN parcel p2 ON p2.id = p1.created_by_parcel_id
We can see that our query is working as expected here, as the first parcel is joined to itself, as is the second, but third parcel (created by a transfer) is also linked to the second parcel.
The last step is to link it all together, using a UNION ALL
to combine the base with the recursive case, and then putting both statements into a WITH RECURSIVE
block. Given we will have rows that reference themselves (i.e. parcel with id 2 will link back to itself) we also need to ensure we don't get stuck in a loop, so we can add a simple path check to ensure we don't return to visited parcels.
WITH RECURSIVE parcel_search AS (
-- BASE CASE
SELECT
id,
created_by_parcel_id,
ARRAY[id] AS path
FROM parcel
WHERE id = 3 -- Starting parcel ID
UNION ALL
-- RECURSIVE CASE
SELECT
p.id,
p.created_by_parcel_id,
ps.path || p.id
FROM parcel p
JOIN parcel_search ps ON p.id = ps.created_by_parcel_id
WHERE NOT p.id = ANY(ps.path)
)
SELECT *
FROM parcel_search;
Finally if we want to display our history in a nice to understand way we just need to update our main select, so as to display all the corresponding information.
WITH RECURSIVE parcel_search AS (
-- BASE CASE
SELECT
id,
created_by_parcel_id,
ARRAY[id] AS path
FROM parcel
WHERE id = 3 -- Starting parcel ID
UNION ALL
-- RECURSIVE CASE
SELECT
p.id,
p.created_by_parcel_id,
ps.path || p.id
FROM parcel p
JOIN parcel_search ps ON p.id = ps.created_by_parcel_id
WHERE NOT p.id = ANY(ps.path)
)
SELECT
t.transaction_date,
t.transaction_type,
t.stock,
c.name AS company_name,
pt.qty
FROM transaction t
JOIN parcel_transaction pt ON t.id = pt.transaction_id
JOIN company c ON t.company_id = c.id
JOIN parcel_search ps ON pt.parcel_id = ps.id
ORDER BY t.transaction_date;
