Popular Posts

Thursday, January 1, 2015

Clustered Index and Nonclustered Index

Indexing is mainly used to increase the speed  while searching a record from a table having large amount of data.

Index is a database object, which can be created on one or more columns (16 Max column combination). When creating the index will read the column(s) and forms a relevant data structure to minimize the number of data comparisons. The index will improve the performance of data retrieval and adds some overhead on data modification such as create, delete and modify. So it depends on how much data retrieval can be performed on table versus how much of DML (InsertDelete and Update) operations.

ex- If a table contains 1 million rows and we want to find a record from that table which is present at 50,000 row location. so for searching that record we have to go line by line in that table and at 50,000 row we will find that record,but it is time consuming.
To avoid this we can use indexing.

There are mainly two types of indexing-
1)Clustered index-
  • Whenever we apply/create primary key on the table ,clustered index automatically gets applied on it.
  • Custered index decide physical location of the record present in the table.
  • Record are stored in binary tree format, so while searching the record it becomes easy to search the record in lesser time.
  • A table can only have one clustered index.
  • For more info check this http://www.codeproject.com/Articles/173275/Clustered-and-Non-Clustered-Index-in-SQL
2)Nonclustered Index-

Logical Representation of Non Clustered index

In a simple word , a non clustered index is a subset of a table. When we define a non clustered index, SQL server store the set of non clustered key in a different pages.Let us consider a table with four columns (PersonId(PK),PersonType,FirstName,LastName) and a non clustered index on that table.The actual table is stored in the order of personid column (Cluster index key).In the below figure will give you a pictorial representation of non clustered index. The non clustered index key column are stored  apart from the actual table.If you look into the Non clustered index ,you can notice that, records are stored in the order of Firstname and lastname (Non Clustered index key) not as in the order of actual table.To understand easily , non clustered index can be considered as subset table of actual one. 


Now let us assume that we have a task to find out all records whose first name is 'Michael'.If you tried to find out from the Actual table , we need to go through each record from top to bottom to check whether first name matches with our search string 'Michael' as the records are not ordered based on the firstname column. It will be more tedious task if we have thousands of records in that table. The task will be much easier if we look into the the Non Clustered index side as the first name and last name are alphabetically ordered.We can easily locate the records which has firstname as 'Michael' and we do not need go beyond that as we are sure that there will not be any more records with firstname as 'Michael'.

Now we know Firstname and lastname of the record. How do we get the values for other two column ? Let us make a change in the Non clustered index part by associating the PersonId column along with the non clustered index.


Now, once we locate the records , we can go back to the Actual Table using the PersonId (Cluster index key) to find the values of other columns and this operation is called bookmark lookups or RID lookup. 
Non clustered index can be defined on a heap table or clustered table.In the leaf level of nonclusterd index, each index row contain the nonclustered key value and a row locator.This locator point to a the data row in the clustered index or heap.The row locator in nonclusterd index rows are either a pointer to a row or a clustered index key for a row. If the table is a heap, which means it does not have a clustered index,the row locator is a pointer to the row.The pointer is built from the file identifier ,page number and slot number of the row on the page. The whole pointer is known as a Row ID(RID). If the table has a clustered index, the row locator is the clustered index key for the row.

No comments:

Post a Comment