Indexed table

Author

Ricardo Rodrigo Basa

Published

January 7, 2024

In the previous chapter, we saw that an SQL DB parses through the whole table to retrieve rows. Because it does not know where the rows that match the provided conditions are, it has to check every row. This is why it does not matter where in the table the rows are located. This is where INDEXes come in.

Duplicate our table

We want to keep our non-indexed table so that we can still run non-indexed queries later.

It takes about 15 minutes to make a copy.

Then we add the primary key and indexes for fk_id and entry_date.

It takes about 20 minutes to add these indices.

Database Connection

In the background, we set up our environment, connect to the database, and turn on profiling.

A quick check of our tables:

Our tables are the same except for the indexes.

Show Profile function

Because we will run profiling repeatedly, it makes sense to write it into a function.

Query 6: Run the “first day” query with the benefit of an index

From 17 seconds, we are now down to below 0.007 seconds.

Query 7: Run the “last day” query with the benefit of an index

It is the same for our “last day” query. Below 0.007 seconds. The DB is not parsing the entire table anymore.

But, this is a little bit of a cheat. Remember that entry_date is already sorted. Rows with the same entry_dates are together. How much difference is there if the needles are scattered all over the table?

Query 8: Retrieve all fk_id = 45

We have a different column we can filter on. fk_id is not sorted. It is randomly distributed across the entire table. Let’s run a baseline on the non-indexed table.

As expected, we get about 17 seconds.

Query 9: Retrieve all fk_id = 45 with the help of an index

Then the same query from the indexed table.

We have a substantial improvement from 17 seconds to less than 5 seconds.

Questions

What if we need to filter by year? Or by year-month?