If a table has 'id' column as it's clustered index, is there any benefit in adding 'id' column as an included column in any other non-clustered index
If a table has 'id' (the primary key) column as its clustered index, is there any benefit in adding 'id' column as an included column in any other non-clustered index in Microsoft SQLServer?
eg:- Table 'xyz'
CREATE NONCLUSTERED INDEX [NonClusteredIndex_status_Date] ON [xyz] ( [status] ASC, [date] ASC ) INCLUDE ( [id], [name] )
And this non-clustered index is targeted for a query similar to bellow on a large data set. In the actual case there could be some other queries as well.
select * from xyz where status='active' and date > '2021-06-20'
This answer assumes you are planning to run the following query:
SELECT * FROM xyz WHERE status = 'active' AND date > '2021-06-20';
If you only created a non clustered index on
(status, date), then it would cover the
WHEREclause, but not the
SELECTclause. What this means is that SQL Server might choose to use the index to find the matching records in the query. But when it gets to evaluating the
SELECTclause, it would be forced to seek back to the clustered index to find the values for the columns not included in the index, other than the
idclustered index column (which includes the
namecolumn in this case). There is performance penalty in doing this, and SQL Server might, depending on your data, even choose to not use the index because it does not completely cover the query.
To mitigate this, you can define the index in your question, where you include the
namevalue in the leaf nodes. Note that
idwill be included by default in the index, so we do not need to explicitly
INCLUDEit. By following this approach, the index itself is said to completely cover the query, meaning that SQL Server may use the index for the entire query plan. This can lead to fast performance in many cases.
The answer to your question is essentially No, there is no benefit.
When you create a non-clustered index on a table, each row in the index needs to be able to point to the row in the base table.
If the base table is a heap, each row in the index will contain a pointer to the rid (row identifier) which is what SQL Server uses to uniquely identify each row.
When the table is defined with a clustered index, every non-clustered index will automatically contain the clustered index column(s) as keys to the row in the base table.
You can see this in an execution plan, where a non-clustered index is used and SQL Server has to retrieve additional columns from the base table; if the table is a heap, it will be an RID lookup, if the table has a clustered index it will be a Key lookup.
Additionally, if the clustered index is not unique, SQL Server adds its own uniquifier value to ensure uniqueness, and this is also included in non-clustered indexes.
So when it comes to non-clustered indexes, it does not matter if you specify the clustered index column(s) - you can, and there is no harm in doing so, but it/they are always included.