Index design basics

An index is an on-disk structure associated with a table or view that speeds up the retrieval of rows from a table or view. Indexes contain keys generated by one or more columns in a table or view. These keys are stored in a structure (B tree) that enables SQL Server to quickly and efficiently find the row associated with the key value.

Choosing the right index for a database and its workload is a complex task that needs to be balanced between query speed and update overhead. If the index is narrow, or if there are only a few columns in the index key, there is less disk space and less maintenance overhead. A wide index, on the other hand, covers more queries. You may need to experiment with several different designs to find the most efficient index. Indexes can be added, modified, and removed without affecting database architecture or application design. Therefore, you should experiment with multiple different indexes without hesitation.

The query optimizer in SQL Server can reliably select the most efficient index in most cases. The overall index design strategy should provide the query optimizer with multiple indexes to choose from and rely on the query optimizer to make the right decisions. This can reduce analysis time and achieve good performance in many cases. To see the indexes used by the query optimizer for a particular query, select Include Actual execution plans on the Query menu in SQL Server Management Studio.

Don’t always equate index use with good performance, or good performance with efficient use of indexes. The query optimizer’s job would be easy if the best performance was achieved by simply using indexes. In reality, however, an incorrect index selection will not yield the best performance. Therefore, the task of the query optimizer is to select an index or combination of indexes only when it improves performance, and to avoid using it when index retrieval hinders performance.

Index classification rule

  • Aggregation or non-aggregation.
  • Unique or not unique.
  • Single column or multiple columns.
  • Are columns in an index sorted in ascending or descending order?
  • Whether a non-clustered index is a full table or filtered.

The index profile

Clustered index

A clustered index is a way of reorganizing the actual data on disk to sort the values of one or more specified columns.

Clustered indexes sort and store rows within a table based on their key values. In clustered indexes, the physical order of the rows in the table is the same as the logical (index) order of the key values.

There can only be one clustered index per table because the rows themselves can only be stored in one order.

Matters needing attention

In general, the fewer columns you use to define clustered index keys is the better.

Application:

  • Unique or containing many non-repeating values.

  • Be accessed sequentially.

  • Define it as IDENTITY.

  • Often used to sort data retrieved from a table. Not applicable:

  • Columns that change frequently

    This causes the entire row to move because the database engine must retain the data values in the row in physical order. This is especially important because data is often mutable in high-volume transaction processing systems.

  • Wide key

    A wide key is a combination of several columns or large columns. All non-clustered indexes use the key values in the clustered index as lookup keys. Any non-clustered index defined for the same table will be much larger because non-clustered index entries contain clustered keys as well as key columns defined for this non-clustered index.

Nonclustered index

Index is understood as a special kind of directory, the directory is pure directory, the text is pure text sorting method is called “non-clustered index”.

Non-clustered indexes contain index key values and row locators that point to where the table data is stored.

You can create multiple non-clustered indexes on a table or indexed view, up to 250 non-clustered indexes for a table that has no clustered indexes.

In general, non-clustered indexes are designed to improve the performance of frequently used queries that do not have clustered indexes.

Similar to the way you use indexes in books, the query optimizer searches for data values by first searching the non-clustered index to find the location of the data value in the table, and then retrieving the data directly from that location. This makes a non-clustered index the best choice for a perfectly matched query because the index contains items that describe the exact location in the table of the data values searched by the query.

Each non-clustered index provides a different sort order for accessing data.

Matters needing attention

Suitable for:

  • A column is often used for set functions (e.g. Sum,….) .
  • Override the query.
  • A large number of non-repeating values, such as combinations of last names and first names (if the clustered index is used for other columns).
  • No more than 20% of the data in the table is retrieved.

The only index

Unique indexes ensure that no duplicate values are contained in the index key, thus making each row in the table unique in some way.

Specifying a unique index makes sense only if uniqueness is characteristic of the data itself.

With multi-column unique indexes, indexes ensure that each combination of values in the index key is unique.

Both clustered indexes and non-clustered indexes can be unique. As long as the data in the column is unique, you can create one unique clustered index and multiple unique non-clustered indexes for the same table.

Matters needing attention

Advantages:

  • Ability to ensure data integrity of defined columns.
  • Provides additional information useful to the query optimizer.

The primary key index

A primary key is a constraint that is attached to an index, which can be a clustered or non-clustered index.

Defining a primary key for a table in a database diagram automatically creates a primary key index, which means that it has an index on it, but not necessarily a clustered index.

A primary key is unique, and only a unique index is unique. A primary key index requires that each value in the primary key be unique.

It also allows fast access to data when using a primary key index in a query.

A primary key index is a special type of unique index.

A primary key is an index, but an index is not necessarily a primary key.

Screening index

A filtered index is a non-clustered index that has been optimized and is particularly useful for covering queries that select data from well-defined data subsets.

Filter indexes use filter predicates to index partial rows in a table.

Compared with full-table indexes, well-designed filter indexes can improve query performance, reduce index maintenance overhead, and reduce index storage overhead.

  • Filtered indexes have the following advantages over full table indexes:
  • Improved query performance and plan quality
  • Reduced index maintenance overhead
  • Reduced index storage overhead
  • A sparse column containing only a few non-NULL values.
  • Heterogeneous columns containing multiple categories of data.
  • A column containing multiple ranges of values, such as dollar amount, time, and date.
  • Table partitions defined by simple comparison logic of column values.

Matters needing attention

Suitable for:

  • Filter index of a subset of data

    When there are only a few related values in a column that need to be queried, a filter index can be created for a subset of the values.

  • Filter index of heterogeneous data

    A filter index can be created for one or more categories of data when a table contains heterogeneous rows of data.

  • Key column

    It is best to include a small number of keys or include columns in the filter index definition, and only include columns that the query optimizer selects for the filter index for the query execution plan. The query optimizer can select a filter index for a query whether or not it covers a query. However, if a filter index covers a query, the query optimizer is more likely to select that filter index.

  • Filter data conversion operators in predicates

    If the comparison operator specified in the filter index expression that filters index results results in an implicit or explicit data conversion, an error occurs when the conversion occurs to the left of the comparison operator. The solution is to write a filter index expression containing the data conversion operator (CAST or CONVERT) to the right of the comparison operator.

Full-text index concept

A full-text engine uses the information in a full-text index to compile full-text queries that quickly search for specific words or phrases in a table. Full-text indexes store information about important words and their locations in one or more columns of a database table. A full-text index is a special type of token-based functional index that is generated and maintained by the SQL Server full-text engine. The process for generating full-text indexes is different from that for other types of indexes. Instead of constructing a B-tree structure based on the values stored in a particular row, the full-text engine generates an inverted, stacked, and compressed index structure based on the individual markers in the text to be indexed. The full-text index size is limited only by the available memory resources of the computer running the SQL Server instance.

Starting with SQL Server 2008, full-text indexes are integrated with the database engine, rather than in the file system as in earlier versions of SQL Server. For the new database, the full-text catalog is now a virtual object that does not belong to any file group; It is simply a logical concept that represents a set of full-text indexes.

Matters needing attention

  • Full-text indexes are for data tables. Full-text indexes can only be created for tables, but not for databases.
  • Each database may contain no full-text catalog or multiple full-text catalogs. A full-text catalog may contain multiple full-text indexes, but a full-text index can only be used to constitute a full-text catalog.
  • Only one full-text index can be created for a data table, and a full-text index can contain multiple fields.
  • A table that creates a full-text index must have a unique non-empty index, and this unique non-empty index must be a single field, not a combination of fields.
  • To create a full-text index on a table, the table must have a unique and non-null column. You can create full-text indexes on the following types of columns: CHAR, VARCHAR, NCHAR, NVARCHar, TEXT, ntext, image, XML, varbinary, and Varbinary (Max) to perform full-text searches on these columns. Creating a full-text index for a column of data type varbinary, varbinary(Max), image, or XML requires that you specify a type column. Type columns are table columns that store the file extensions (.doc,.pdf, XLS, etc.) of the documents in each row.

XML indexes

XML indexes fall into the following categories: primary XML indexes and secondary XML indexes

The main XML indexes

The primary XML index indexes all tags, values, and paths within an XML instance in an XML column. To create a primary XML index, the table in which the corresponding XML column resides must have a clustered index on the primary key of that table. SQL Server uses this primary key to associate rows in the primary XML index with rows in the table that contains this XML column.

The primary XML index is a split and persistent representation of the XML Blob in the XML data type column. For each XML binary large object (BLOB) in a column, the index creates several rows of data. The number of rows in this index is approximately equal to the number of nodes in an XML binary large object. When a query retrieves a complete XML instance, SQL Server provides the instance in that XML column. Queries in an XML instance use the primary XML index and can return scalar values or XML subtrees by using the index itself.

Auxiliary XML index

To enhance search performance, secondary XML indexes can be created. You must have a primary XML index to create a secondary index. Secondary indexes are of the following types:

  • PATH assists XML indexes

If queries typically specify PATH expressions for COLUMNS of XML type, the PATH secondary index can speed up searches.

  • VALUE Secondary XML index

    If the query is based on the value of the query, for example, / Root/ProductDescription / @ * [. = “Mountain Bike”] or / / ProductDescription [@ Name = “Mountain Bike”]. Generating secondary XML indexes based on node values in the primary XML index can yield faster results if the path is not fully specified or contains wildcards.

    The key column of the VALUE index is the node VALUE and path of the main XML index. A VALUE index can be useful if your workload involves querying values in an XML instance, but you don’t know the element or attribute names that contain those values.

  • PROPERTY assists XML indexes

    Queries that retrieve one or more values from a single XML instance apply to PROPERTY indexes. The XML type of the value () method occurs by retrieving an object attribute and the primary key of the object is known.

    The PROPERTY index is created on the columns of the primary XML index (PK, Path, and node values), where PK is the primary key of the base table.

Here are some guidelines for creating one or more secondary indexes:

  • If a workload makes heavy use of PATH expressions for XML columns, path-assisted XML indexing may speed up the workload’s processing. The most common case is to use exist() on XML columns in the WHERE clause of Transact-SQL.

  • If the workload retrieves multiple values from a single XML instance by using path expressions, aggregating paths from individual XML instances in a PROPERTY index can be useful. This is typically the case in property package schemes, where the attributes of an object are extracted and their primary key values are known.

  • If your workload involves querying values in an XML instance, but you don’t know the element or attribute names that contain those values, you might want to create a VALUE index. This typically occurs in descendant axis look-ups, such as //author[last-name=”Howard”], where elements can appear at any level of the hierarchy. This is also the case with wildcard queries, such as /book [@* = “novel”], where the query looks for elements that have an attribute with a value of “novel”.