Managing indexes

Indexes, as mentioned already, are fast look-up tables for fields in a table. Searching through a book page-by-page for a word is very slow work, but using the index to find all references to that word takes just 10 seconds - the same is true of database tables.

Here's the table schema for the golfscores table:

mysql> DESC golfscores;
| Field       | Type    | Null | Key | Default | Extra          |
| ID          | int(11) |      | PRI | NULL    | auto_increment |
| MemberID    | int(11) | YES  |     | NULL    |                |
| Score       | int(11) | YES  |     | NULL    |                |
| DateEntered | int(11) | YES  |     | NULL    |                |

To demonstrate the difference indexing makes, I added 1,000,000 example rows to the table that simulate the scores of 1000 players in the golf club. Yes, that's quite a few games to have been played on a golf course, but it's relatively few for other tables!

Now, to count the number of times a player has scored precisely 100, I'd execute a query like this:

mysql> SELECT count(MemberID) FROM golfscores WHERE Score = 100;
| count(MemberID) |
|           27786 |
1 row in set (0.39 sec)

The return value isn't important, but do notice the time taken to execute the query - 0.39 seconds. That's very fast for a million records, but still that's 0.40 seconds at 100% CPU load - MySQL really working its hardest to find all the rows that match and count them.

If Score is something I intend to query regularly, it's probably a smart idea to add an index to it: to tell MySQL to store all the Scores in a special place (behind the scenes) so that they can be searched extra-fast. To do this, use this SQL:

ALTER TABLE golfscores ADD INDEX IdxScore(Score);

Most of that is pretty obvious, except perhaps the last part: the first part, IdxScore, is the name I want to give to the index, and the part in brackets, Score, is the field I want to index. You don't actually need to name your indexes, as in normal querying you always use the normal field name, but it makes administration easier (and clearer) to be able to refer directly to an index by name.

Now, let's try that query again:

mysql> SELECT COUNT(MemberID) FROM golfscores WHERE Score = 100;
| count(MemberID) |
|           27786 |
1 row in set (0.17 sec)

Note now that the query time is down to 0.17 seconds, which is more than twice as fast. This is because MySQL sees us filtering on the Score field, realises it has an index ready to help filtering on that field, and uses it.

Getting your indexing right is a matter of common sense for the most part, but also careful investigation now and then - you need to find out what fields are being read regularly (and preferably written rarely, as changing an indexed field means changing the original and the index) and index it.

For example, I wrote a lot of code for an online strategy game a few years ago, and the map screen would center on a player's home system and draw all the squares near to it. The query to get those squares involved getting the home system's row and column on the map, and reading in all the squares that have adjacent rows and columns. From there, it also calculated what the player could see by searching for friendly radar bases nearby, friendly spaceships that were present in enemy space, and any spy operations in place, so it did even more row and column searching. The end result was that in a map view of just 9x9 (81 squares in total) over 4000 queries to the database were required, most of which read the row and column fields.

At first, I didn't have these columns indexed ("premature optimisation is the root of all evil" said Sir Tony Hoare, Emeritus Professor of Computing at Oxford University), which meant that map generation took over 20 seconds at full CPU usage - wholly unacceptable, even if only one person was playing the game at a time. After indexing both the row and column fields, that time dropped to under a second. Don't get me wrong: it was still slow, because three-quarters of a second of CPU time is an age when you've got 1000 users simultaneously, but it was certainly much, much faster. Adding some indexes to other fields brought the final map execution time down to under half a second, which meant I could further change the code to tell MySQL to give the map drawing code a low priority. The end result was that drawing the map took about 0.75 seconds, but that MySQL would allow other queries (from other gamers doing more important things) to take over as appropriate.

If you later decide you don't want an index, it can be removed with this command:

ALTER TABLE golfscores DROP INDEX IdxScore;

Now you can see why giving each index a unique name is helpful!


Want to learn PHP 7?

Hacking with PHP has been fully updated for PHP 7, and is now available as a downloadable PDF. Get over 1200 pages of hands-on PHP learning today!

If this was helpful, please take a moment to tell others about Hacking with PHP by tweeting about it!

Next chapter: Simple text searching using LIKE >>

Previous chapter: MySQL functions

Jump to:


Home: Table of Contents

Copyright ©2015 Paul Hudson. Follow me: @twostraws.