CTEs for hierarchical data: Pokemon Evolutions
One of my favorite hobbies is video games. I think back to the birthday where I received a Nintendo GameBoy fondly. A device you could take with you anywhere and play games was amazing in the 90s. Of course, smart phones have made the idea ubiquitous for the last twenty years. Plus, you don't need to carry around a pack of AA batteries anymore! 😂
As big as the GameBoy (and its pack-in title, Tetris) was, it wasn't until Pokemon released that it took the world by storm. I don't need to explain the idea, but it is a role-playing game where you assemble and raise a party of monsters for battle. Through battle, these monsters gain experience and can grow into new monsters. Pokemon calls this evolution.
MonsterDB
Last year, I decided to spend some time building a web application covering the first set of games called MonsterDB. This project has a lot of neat features -- you can see a monster, the moves it learns, where to catch it, and more! For today's post, I'm going to cover how I structured a given monster's evolutions.
Consider the following monster
table (MonsterDB uses sqlite):
CREATE TABLE IF NOT EXISTS monster
(
id integer
constraint monster_pk
primary key autoincrement,
dex_id integer not null,
name TEXT not null
);
This encodes a few values for a monster:
- The first column is an autoincrementing synthetic primary key
- The second column is the game's representation of the monster's index number
- The last column is the monster's name in english
Evolutions
For those that haven't played the game, a monster can evolve into many other kinds of monsters, and those monsters can also do this, and so on. To express this data, we have a linking table for monsters:
CREATE TABLE IF NOT EXISTS evolution
(
id integer
constraint evolution_pk
primary key autoincrement,
from_monster_id integer,
to_monster_id integer,
constraint evolution_monster_id_id_fk
foreign key (from_monster_id, to_monster_id) references monster (id, id)
);
This table encodes the relationship between monsters:
- The first column is our synthetic primary key
- The second column is our starting monster, a foreign key to monster
- The third column is our ending monster, also a foreign key to monster
That's all we need to define the relationships between monsters. So what does this look like in practice? Consider the monster Bulbasaur:
Bulbasaur can become Ivysaur, which can then become Venusaur.
How could we query the database for this information? Armed with only the base monster, the linking table can only give us the monsters Bulbasaur can directly evolve into.
Hello CTEs
Armed with our linking table, we can write a query utilizing common table expressions to retrieve the list of monsters that come from a Bulbsaur.
What is a common table expression? Wikipedia to the rescue:
"A common table expression, or CTE, (in SQL) is a temporary named result set, derived from a simple query and defined within the execution scope of a SELECT, INSERT, UPDATE, or DELETE statement."
In other words, we can craft a named query that can then also be queried like a table. The syntax for such a query has a form similar to this in SQLite:
WITH <temporary-table-name> AS (
SELECT <columns>
FROM <table>
)
SELECT * FROM <temporary-table-name>;
Walking the evolutionary tree with a CTE
That inner SELECT
inside of the WITH
is where the magic happens. We can leverage other SQL features in there, such as a UNION
:
WITH tree AS (
SELECT
null as from_monster_id,
id AS to_monster_id
FROM monster WHERE id = 1
UNION
SELECT
e.from_monster_id,
e.to_monster_id
FROM evolution e
INNER JOIN tree t ON e.from_monster_id = t.to_monster_id
)
SELECT * FROM tree;
The first SELECT
sets up our initial row (the one containing Bulbasaur). The second SELECT
(the one following the UNION
) retrieves the rows that match as we "walk" the evolution tree.
When run this gives us the following output:
from_monster_id | to_monster_id
null | 1
1 | 2
2 | 3
You may not need that first row if you're not looking for the complete hierarchy. Suppose you're only interested in the evolutions. Simply tweak the first SELECT
to start at the first evolution:
WITH tree AS (
SELECT
from_monster_id,
to_monster_id
FROM evolution
WHERE from_monster_id = 1
UNION
SELECT
e.from_monster_id,
e.to_monster_id
FROM evolution e
INNER JOIN tree t ON e.from_monster_id = t.to_monster_id
)
SELECT * FROM tree;
Running this query produces the desired output:
from_monster_id | to_monster_id
1 | 2
2 | 3
That's all there is to it. CTEs are a powerful tool for walking hierarchical data. Pokemon evolutions happen to fit that description well, and this CTE made what could have been an ugly loop in some backend code into a concise database query.