Quickly finding unused indexes (and estimating their size)

I had a customer recently who needed to reduce their database size on disk quickly without a lot of messy schema redesign and application recoding. They didn’t want to drop any actual data, and their index usage was fairly high, so we decided to look for unused indexes that could be removed.

Collecting data

It’s quite easy to collect statistics about index usage in Percona Server (and others) using the User Statistics patch. By enabling ‘userstat_running’, we start to get information in the INFORMATION_SCHEMA.INDEX_STATISTICS table. This data collection does add some overhead to your running server, but it’s important to leave this running for a good long while to get a good dataset that is representative of as much of your workload as possible.

If you miss collecting index stats while some occasional queries run, you run the risk of dropping indexes that are being (seldomly) used, but are still important for the health of your system to have. This may or may not impact you, but I’d highly recommend you manually review the list of unused indexes being generated above before you simply drop them.

Depending on your sensitivity to production load, you may therefore want to run this several days, or just sample different short windows during your normal production peak. In either case, you may want to compare or repeat this index analysis, so let’s setup a separate schema to do this. Its important that this index analysis is on a server with your full production dataset loaded, but it could be a master, or just a slave somewhere (just be careful not to break replication!).

mysql> create schema index_analysis;

1	mysql> create schema index_analysis;

If our index_statistics are collecting on the same server, then we can simply get a snapshot of it into our schema with one command:

mysql> create table index_analysis.used_indexes select * from information_schema.index_statistics;

1	mysql> create table index_analysis.used_indexes select * from information_schema.index_statistics;

If the stats come from some other server, then you may need to dump and load a copy of that table into your working index_analysis schema.

Merging stats from several servers

In the case of this client, they had a master and several slaves taking read traffic. The index workload on these two sets of servers was different and I wanted to make sure I considered the index statistics from both of these sources. Be sure you include all relevant index stats from all aspects of your application, otherwise garbage-in, garbage-out and you risk dropping necessary indexes.

To accomplish merging multiple result sets, I gathered statistics from both their master and slave and loaded them into my schema as separate tables. Then I simply created a view of a UNION DISTINCT of those two tables:

mysql> create view used_indexes as 
   (select TABLE_SCHEMA, TABLE_NAME, INDEX_NAME from master_index_stats ) 
    UNION DISTINCT 
   (select TABLE_SCHEMA, TABLE_NAME, INDEX_NAME from slave_index_stats) 
   ORDER BY TABLE_SCHEMA, TABLE_NAME;

mysql> create view used_indexes as

(select TABLE_SCHEMA, TABLE_NAME, INDEX_NAME from master_index_stats )

UNION DISTINCT

(select TABLE_SCHEMA, TABLE_NAME, INDEX_NAME from slave_index_stats)

ORDER BY TABLE_SCHEMA, TABLE_NAME;

Now I can query the ‘all_known_index_usage’ and see the union of both of those datasets. This, of course, can be extended to all the datasets you want.

Interpreting the data

So, this is all well and good, but how do we then easily determine a list of indexes that are not being used? Well, for this we need to back to the INFORMATION_SCHEMA to get a list of ALL the indexes on my system (or at least the schemas I want to consider dropping indexes in). Let’s keep using views so this dynamically updates as our schema changes over time:

mysql> create view all_indexes as
select
   t.table_schema as TABLE_SCHEMA,
   t.table_name as TABLE_NAME,
   i.index_name as INDEX_NAME,
   i.NON_UNIQUE as NON_UNIQUE,
   count(*) as COLUMN_CNT,
   group_concat( i.column_name order by SEQ_IN_INDEX ASC SEPARATOR ',') as COLUMN_NAMES
from
   information_schema.tables t join information_schema.statistics i using (table_schema, table_name)
where
   t.table_schema like 'sakila%'
group by
   t.table_schema, t.table_name, i.index_name;

mysql> create view all_indexes as

select

t.table_schema as TABLE_SCHEMA,

t.table_name as TABLE_NAME,

i.index_name as INDEX_NAME,

i.NON_UNIQUE as NON_UNIQUE,

count(*) as COLUMN_CNT,

group_concat( i.column_name order by SEQ_IN_INDEX ASC SEPARATOR ',') as COLUMN_NAMES

from

information_schema.tables t join information_schema.statistics i using (table_schema, table_name)

where

t.table_schema like 'sakila%'

group by

t.table_schema, t.table_name, i.index_name;

Now I can query this view to see my indexes:

mysql> select * from all_indexes limit 1 G
*************************** 1. row ***************************
TABLE_SCHEMA: sakila
TABLE_NAME: actor
INDEX_NAME: idx_actor_last_name
NON_UNIQUE: 1
COLUMN_CNT: 1
COLUMN_NAMES: last_name
1 row in set (0.03 sec)

mysql> select * from all_indexes limit 1 G

*************************** 1. row ***************************

TABLE_SCHEMA: sakila

TABLE_NAME: actor

INDEX_NAME: idx_actor_last_name

NON_UNIQUE: 1

COLUMN_CNT: 1

COLUMN_NAMES: last_name

1 row in set (0.03 sec)

Now I need a way to find the set of indexes in all_indexes, but not in used_indexes. These indexes (if our original index statistics are good) are candidates to be dropped:

create view droppable_indexes as
select
   all_indexes.table_schema as table_schema,
   all_indexes.table_name as table_name,
   all_indexes.index_name as index_name
from
   all_indexes left join used_indexes using (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME)
where
   used_indexes.INDEX_NAME is NULL and
   all_indexes.INDEX_NAME != 'PRIMARY' and
   all_indexes.NON_UNIQUE = 1;

create view droppable_indexes as

select

all_indexes.table_schema as table_schema,

all_indexes.table_name as table_name,

all_indexes.index_name as index_name

from

all_indexes left join used_indexes using (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME)

where

used_indexes.INDEX_NAME is NULL and

all_indexes.INDEX_NAME != 'PRIMARY' and

all_indexes.NON_UNIQUE = 1;

Note that we also want to avoid dropping PRIMARY and UNIQUE indexes since those tend to enforce important application data constraints, so we added some additional criteria to the end of our SELECT.

I can now select my droppable (unused) indexes from this view:

mysql> select * from droppable_indexes;
+--------------+---------------+-----------------------------+
| table_schema | table_name    | index_name                  |
+--------------+---------------+-----------------------------+
| sakila       | actor         | idx_actor_last_name         |
| sakila       | address       | idx_fk_city_id              |
| sakila       | city          | idx_fk_country_id           |
| sakila       | customer      | idx_fk_address_id           |
| sakila       | customer      | idx_fk_store_id             |
| sakila       | customer      | idx_last_name               |
| sakila       | film          | idx_fk_language_id          |
| sakila       | film          | idx_fk_original_language_id |
| sakila       | film          | idx_title                   |
| sakila       | film_actor    | idx_fk_film_id              |
| sakila       | film_category | fk_film_category_category   |
| sakila       | film_text     | idx_title_description       |
| sakila       | inventory     | idx_fk_film_id              |
| sakila       | inventory     | idx_store_id_film_id        |
| sakila       | payment       | fk_payment_rental           |
| sakila       | payment       | idx_fk_customer_id          |
| sakila       | payment       | idx_fk_staff_id             |
| sakila       | rental        | idx_fk_customer_id          |  
| sakila       | rental        | idx_fk_inventory_id         |
| sakila       | rental        | idx_fk_staff_id             |
| sakila       | staff         | idx_fk_address_id           |
| sakila       | staff         | idx_fk_store_id             |
| sakila       | store         | idx_fk_address_id           |
+--------------+---------------+-----------------------------+
23 rows in set (0.02 sec)

mysql> select * from droppable_indexes;

+--------------+---------------+-----------------------------+

| table_schema | table_name | index_name |

+--------------+---------------+-----------------------------+

| sakila | actor | idx_actor_last_name |

| sakila | address | idx_fk_city_id |

| sakila | city | idx_fk_country_id |

| sakila | customer | idx_fk_address_id |

| sakila | customer | idx_fk_store_id |

| sakila | customer | idx_last_name |

| sakila | film | idx_fk_language_id |

| sakila | film | idx_fk_original_language_id |

| sakila | film | idx_title |

| sakila | film_actor | idx_fk_film_id |

| sakila | film_category | fk_film_category_category |

| sakila | film_text | idx_title_description |

| sakila | inventory | idx_fk_film_id |

| sakila | inventory | idx_store_id_film_id |

| sakila | payment | fk_payment_rental |

| sakila | payment | idx_fk_customer_id |

| sakila | payment | idx_fk_staff_id |

| sakila | rental | idx_fk_customer_id |

| sakila | rental | idx_fk_inventory_id |

| sakila | rental | idx_fk_staff_id |

| sakila | staff | idx_fk_address_id |

| sakila | staff | idx_fk_store_id |

| sakila | store | idx_fk_address_id |

+--------------+---------------+-----------------------------+

23 rows in set (0.02 sec)

From here I can use some clever SQL to generate the precise ALTER TABLE statements to drop these indexes, an exercise left to the reader. 🙂

Estimating the size of these indexes

But, what if we want to see if it’s worth doing first? Do these indexes actually represent a significant enough amount of disk space for it to be worth our while?

We need some more information to answer this question, but fortunately in Percona Server, we have it in the INFORMATION_SCHEMA.INNODB_INDEX_STATS table and the ‘index_total_pages’ column. A page in Innodb is (usually) 16k, so some simple math here should help us know how much disk space an index utilizes.

Let’s go update our all_indexes view to include this information:

mysql> drop view if exists all_indexes;
mysql> create view all_indexes as
select
   t.table_schema as TABLE_SCHEMA,
   t.table_name as TABLE_NAME,
   i.index_name as INDEX_NAME,
   i.NON_UNIQUE as NON_UNIQUE,
   count(*) as COLUMN_CNT,
   group_concat( i.column_name order by SEQ_IN_INDEX ASC SEPARATOR ',') as COLUMN_NAMES,
   s.index_total_pages as index_total_pages,
   (s.index_total_pages * 16384 ) as index_total_size 
from
   information_schema.tables t join information_schema.statistics i using (table_schema, table_name)
   join information_schema.innodb_index_stats s using (table_schema, table_name, index_name) 
where
   t.table_schema like 'sakila%'
group by
   t.table_schema, t.table_name, i.index_name;

mysql> drop view if exists all_indexes;

mysql> create view all_indexes as

select

t.table_schema as TABLE_SCHEMA,

t.table_name as TABLE_NAME,

i.index_name as INDEX_NAME,

i.NON_UNIQUE as NON_UNIQUE,

count(*) as COLUMN_CNT,

group_concat( i.column_name order by SEQ_IN_INDEX ASC SEPARATOR ',') as COLUMN_NAMES,

s.index_total_pages as index_total_pages,

(s.index_total_pages * 16384 ) as index_total_size

from

information_schema.tables t join information_schema.statistics i using (table_schema, table_name)

join information_schema.innodb_index_stats s using (table_schema, table_name, index_name)

where

t.table_schema like 'sakila%'

group by

t.table_schema, t.table_name, i.index_name;

Now we can can see index sizing information in the all_indexes view:

mysql> select * from all_indexesG
...
*************************** 33. row ***************************
TABLE_SCHEMA: sakila
TABLE_NAME: rental
INDEX_NAME: rental_date
NON_UNIQUE: 0
COLUMN_CNT: 3
COLUMN_NAMES: rental_date,inventory_id,customer_id
index_total_pages: 27
index_total_size: 442368
...

mysql> select * from all_indexesG

...

*************************** 33. row ***************************

TABLE_SCHEMA: sakila

TABLE_NAME: rental

INDEX_NAME: rental_date

NON_UNIQUE: 0

COLUMN_CNT: 3

COLUMN_NAMES: rental_date,inventory_id,customer_id

index_total_pages: 27

index_total_size: 442368

...

Now we just need to update our droppable_indexes view to use that information:

mysql> drop view if exists droppable_indexes;
mysql> create view droppable_indexes as
select
   all_indexes.table_schema as table_schema,
   all_indexes.table_name as table_name,
   all_indexes.index_name as index_name,
   ROUND(all_indexes.index_total_size / ( 1024 * 1024 ), 2) as index_size_mb 
from
   all_indexes left join used_indexes using (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME)
where
   used_indexes.INDEX_NAME is NULL and
   all_indexes.INDEX_NAME != 'PRIMARY' and
   all_indexes.NON_UNIQUE = 1
order by index_size_mb desc;

mysql> drop view if exists droppable_indexes;

mysql> create view droppable_indexes as

select

all_indexes.table_schema as table_schema,

all_indexes.table_name as table_name,

all_indexes.index_name as index_name,

ROUND(all_indexes.index_total_size / ( 1024 * 1024 ), 2) as index_size_mb

from

all_indexes left join used_indexes using (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME)

where

used_indexes.INDEX_NAME is NULL and

all_indexes.INDEX_NAME != 'PRIMARY' and

all_indexes.NON_UNIQUE = 1

order by index_size_mb desc;

Now we can easily see how big each index is if we dropped it (not big in this case with test data):

mysql> select * from droppable_indexes;
+--------------+---------------+-----------------------------+---------------+
| table_schema | table_name    | index_name                  | index_size_mb |
+--------------+---------------+-----------------------------+---------------+
| sakila       | payment       | fk_payment_rental           |          0.27 |
| sakila       | rental        | idx_fk_customer_id          |          0.27 |
| sakila       | rental        | idx_fk_inventory_id         |          0.27 |
| sakila       | rental        | idx_fk_staff_id             |          0.19 |
| sakila       | payment       | idx_fk_staff_id             |          0.17 |
| sakila       | payment       | idx_fk_customer_id          |          0.17 |
| sakila       | inventory     | idx_store_id_film_id        |          0.11 |
| sakila       | inventory     | idx_fk_film_id              |          0.08 |  
| sakila       | film_actor    | idx_fk_film_id              |          0.08 |
| sakila       | film          | idx_title                   |          0.05 |
| sakila       | film          | idx_fk_original_language_id |          0.02 |  
| sakila       | city          | idx_fk_country_id           |          0.02 |
| sakila       | film_category | fk_film_category_category   |          0.02 |
| sakila       | customer      | idx_last_name               |          0.02 |
| sakila       | store         | idx_fk_address_id           |          0.02 |
| sakila       | actor         | idx_actor_last_name         |          0.02 |
| sakila       | customer      | idx_fk_address_id           |          0.02 |
| sakila       | staff         | idx_fk_address_id           |          0.02 |
| sakila       | film          | idx_fk_language_id          |          0.02 |
| sakila       | address       | idx_fk_city_id              |          0.02 |
| sakila       | customer      | idx_fk_store_id             |          0.02 |
| sakila       | staff         | idx_fk_store_id             |          0.02 |
+--------------+---------------+-----------------------------+---------------+
22 rows in set (0.02 sec)

mysql> select * from droppable_indexes;

+--------------+---------------+-----------------------------+---------------+

+--------------+---------------+-----------------------------+---------------+

+--------------+---------------+-----------------------------+---------------+

22 rows in set (0.02 sec)

Recovering filesystem space

Now astute innodb experts will realize that this isn’t the end of the story when it comes to reclaiming disk space. You may have dropped the indexes, but the tablespaces on disk are still the same old size. If you use innodb_file_per_table, then you can rebuild the tablespace for your table by simply doing:

mysql> alter table mytable ENGINE=Innodb;

1	mysql> alter table mytable ENGINE=Innodb;

However, this blocks and on a large table can take quite some time. All the normal tricks and tips about doing a long blocking schema change without affecting your production environment apply here and is out of scope for this blog post.

Happy hunting for those unused indexes!

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Jonathan Levin

11 years ago

You should have used my patent-pending Intensive Table Optimization technique from my PLUK tutorial:
http://www.jonathanlevin.co.uk/2012/12/tutorial-slides.html

Gregory Haase

11 years ago

Keep in mind with MySQL you almost always need to have space in order to free more space. Even with innodb_file_per_table, you may run into trouble if you have gotten that low on disk space. If you have a 5GB table and you are attempting to remove 500MB in indexes, you’lll still need at least an additional 4.5GB, since MySQL will retain the old copy of the table while building the new.

On a side note: I wish tutorials would make more use of NOT EXISTS vs LEFT JOIN NULL. I believe it is slightly more readable in that one immediately understands we are looking for a record in one table that does not exist in the other. Also, MySQL is likely to optimize that “left join null” using a “not exists” behind the scenes anyway.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Quickly finding unused indexes (and estimating their size)

Collecting data

Merging stats from several servers

Interpreting the data

Estimating the size of these indexes

Recovering filesystem space

Related

Related Blog Articles

RECOMMENDED ARTICLES

Trying out the PostgreSQL pg_tde Tech Preview Release

Benchmarking MongoDB Performance on Kubernetes

Why MariaDB Is “Better” Than MySQL

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Quickly finding unused indexes (and estimating their size)

Collecting data

Merging stats from several servers

Interpreting the data

Estimating the size of these indexes

Recovering filesystem space

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Trying out the PostgreSQL pg_tde Tech Preview Release

Benchmarking MongoDB Performance on Kubernetes

Why MariaDB Is “Better” Than MySQL

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation