Database Performance: Redundant Indexes
Databases use sophisticated planning algorithms to determine best query execution strategy. An important aspect is deciding which indexes, if any, to use. This article explores why some indexes could be replaced by other existing indexes and how to identify them.
What Is an Index?
Ordinary database indexes are implement using B+ trees. We aren’t going to delve into their definition and implementation as it’d detract us from the goal of understanding why some indexes may be redundant. For the purposes of the article, you can think of an index as:
- A copy of the original table …
- containing a subset of its columns …
- along with a reference to the original row …
- and stored in sorted order.
This model is good enough to understand the principle behind redundant. Let’s work through an example demonstrating when one index can make another redundant. We’ll create a multi-column index and show how it can replace a single-column index.
Multi-column Index Enters the Scene
Imagine we’re working on a marketplace app where buyers and sellers can transact. Each transaction is recorded in the
transactions table along with references to the buyer and seller. The figure below presents an example table.
idis sorted but other columns are not.
One of requirements is that the seller should be able to list all transactions with a specific buyer. The list should be paginated and sorted starting with the most recent transaction. The query to fetch a page of results could look like the one below.
SELECT * FROM transactions WHERE seller_id = 1 AND buyer_id = 2 ORDER BY created_at DESC LIMIT 20 OFFSET 40
transactions is going to contain millions of rows and if it’s unindexed then the query is going to require a slow sequential table scan. In order to improve performance, we could create a specialized three-column index on
created_at (in descending order).
CREATE INDEX transactions_seller_id_buyer_id_created_at_desc_idx ON transactions(buyer_id, seller_id, created_at DESC)
This index is likely to improve performance significantly. In my synthetic benchmarks on a table of 10 million rows adding the index reduced execution time from 8,000 ms to less than 1 ms.
Replacing a Single-column Index
Browsing transactions with a specific buyer is a frequent use case but certainly not the only one. There are other less frequent uses cases that need to access seller’s transactions. Those use cases are almost certain to select rows by
seller_id and thus would benefit from an index on that column. The good news is we might be able to reuse the index created previously – no new index would be necessary.
To understand why, let’s visualize the index from the previous section as a table.
idrepresents a reference to the original row in
We can immediately see that
seller_id is already sorted. When executing a query, the database can turn that multi-column index into a single-column index on
seller_id by pretending it contains no other columns. It is as if we’ve already created an index on
It’s also evident
buyer_id is NOT sorted so it’s not true any column would work. There’s a slightly more complicated rule at play. In general, a multi-column index on
columnN can be used as an index on any prefix of the column list, that is:
column[N - 1]
There are many other concerns we need to keep in mind though. For example, the reasoning above breaks down for unique indexes – a unique index on
users(email, api_key) guarantees that a pair of values is unique but it’s still possible to have the same email with different API keys. If
We won’t go into further details as we wanted to understand the basic principle in action. Time to find its practical applications.
Redundant Indexes in Practice
Let’s start by devising an algorithm for identifying redundant indexes. In order to do so, we need to be able to tell whether one index can be replaced by another. We’ll approach the problem backwards and assume we’ve already implemented a method called
replacable_with?(index1, index2) that determines whether
index1 can be replaced with
replacable_with? available for use, it’s easy to identify redundant indexes: those are indexes that can be replaced with another index. The example code below demonstrates how we could approach identifying them in Ruby.
redundant_indexes = indexes.reject do |index| indexes.any? do |other_index| index != other_index && replaceable_with?(index, other_index) end end
That snippet, though simple, makes some implicit assumptions. It assumes there are no loops, i.e. there’s no situation where A replaces B and B replaces A. Under this limitation, we’ve just reduced the problem to defining and implementing
In the previous section, we touched upon the fact a multi-column index can serve as an index on any prefix of its column list. A more sophisticated definition of
replacable_with? is a topic for another article but the snippet in the figure below takes this one step further by taking uniqueness into account. Keep in mind there are many other properties that may affect whether one index can replace another – operator classes, partiality, index type (e.g. B+ tree, hash) and others.
def replaceable_with?(index1, index2) case [index1.unique, index2.unique] when [true, true] # If indexes are unique then must cover identical columns to be replaceable. index1.columns == index2.columns when [true, false] # Unique indexes cannot be replaced by non-unique ones. false else # In other cases check whether columns in index1 are a prefix of columns in # index2. prefix?(index1.columns, index2.columns) end end
The beauty of this approach is that if the algorithm produces incorrect results (either false positives or false negatives) then it’s enough to refine
Redundant Indexes in Rails Apps
If your app is Rails based (or Active Record-based in general) then you can take this one step further and use
active_record_doctor to identify potentially redundant indexes.
All it takes is adding
Gemfile, installing it and running:
bundle exec rails active_record_doctor:extraneous_indexes
The output will list redundant indexes (each on a separate line) along with any indexes they could be replaced with. For example:
remove index_users_on_last_name - can be replaced by index_users_on_last_name_and_first_name_and_email or unique_index_on_users_last_name_and_first_name
In the example above,
index_users_on_last_name can be removed in favor of the other indexes mentioned on the right.
In case you wanted to retain some of the reported indexes then you can configure
active_record_doctor to either skip individual indexes or all indexes on individual tables. Please consult the documentation for more information.
Indexing is both a science and an art. A complex database schema usually comes with comprehensive index coverage and simple performance tuning optimizations are hidden there. Realizing how multi-column indexes work could help save disk space and improve write performance without harming read performance.
If you’re working on a Rails app then I recommend you give
active_record_doctor a try.