Indexing a database has always been seen as some form of dark art and it doesn’t need to be that way. Despite being critical to the performance of an application, indexes are generally one of the last things thought about in a database project. A properly indexed database will increase performance and decrease costs.
Before we begin, I should warn you this isn’t going to go into how to query the index statistics so if you want more information on that, we’ll create another blog in the future.
We’re going to discuss my 4 golden rules for indexes. In addition, we’ll look at some of the things I’ve seen while assisting clients index their SQL databases.
Rule 1. Every Table MUST have a Primary Key.
There is no exception here, every single table in a sql database should have a primary key. Generally this will be a single column call ID that will be numeric. Depending on the scale of the software an INT field maybe sufficient but I generally recommend BigInt because I have upgraded 2 enterprise legacy systems that ran out of numbers for the int field.
This index is commonly, but not always, the clustered index. A clustered index is the order the data is stored on the disc. If there is another field that orders the data better than the ID, then the primary key will be an auto increment ID field non-clustered. The most common example of a better cluster field than the ID is the date. For this first rule, concentrate on getting your ID set and clustering second.
Rule Number 2: Order Fields Based on Ease of Limiting Data Pulled
The order of fields in the indexes is very important. The key here is to think about the usefulness of the data in a search result. The most useful fields should be first. If we take the ID from rule one, that field his not very useful for a search because you’d need the exact ID to find it. As om the previous rule, Date is usually a good first field.
One of our clients had a log table that had columns such as id, log event type, log event, and created date. In this case created date would be an ideal starting value in an index as it would group the table nicely and ‘split’ the table. Therefore, if you had to search for a log based on date and type, the date would instantly limit the number of results before searching for the type, making the search much faster.
Rule 3: 6-8 Indexes
This rule is more of a guide than a hard and fast rule as there is no perfect number of indexes on a table, but generally I find around 6-8 is the sweet spot for write and read times. Its important to keep in mind with every index you create there is a write overhead required to maintain the index and resort them. Therefore if you add too many indexes your read times may well decrease BUT writing of the data to the index will increase.
This isn’t exactly how it works, but a good way to think about databases are they effectively keep little tables for each index off the main table, so that when an index is used in a search it can use this smaller table to get a view on where in the main table rows are. In order for these to be useful, when new data is written all of these smaller tables update. If there are 100s of indexes, then there are 100s of these tables and thus write times increase.
Based on my experience, 6-8 indexes give good read and write times.
Rule 4: Single Field Indexes Are Useless.
Generally speaking a single field index is useless. The only case for a single field index is in a very small table and it must have fields specified in the “includes” field. The “includes” field allows fields that are in the select statement to be stored along side the index. This stops the database having to select by the index in the smaller index side table then go back to the main table and do a scan on the bigger table.
Single index with “no includes” in the code is useless and it’s the most common index I see on client projects. The database has to do nearly as much work as without an index yet has a write overhead. Its important for tables that you match your indexes to the queries being written and generally speaking a query is not going to have a single field, nor a select a single field, therefore your indexes should not be a single field nor have blank “includes.”
That concludes my 4 golden rules for indexes, there are of course more details behind each of these rules and more data that can be used to help make your indexes more useful. Subscribe to our blog for more information on this in the future.