Pages

Thursday, 27 June 2013

SQL Server: Part 2 : All About SQL Server Statistics :Histogram

In the Part 1 about SQL server Statistics, we have discussed about density vector information stored in the the statistics. In this post, let us discuss about the histogram. Let us create a copy of SalesOrderDetail table and two indexes on top of that as we did in our first part.

USE mydb
GO
SELECT INTO SalesOrderDetail FROM AdventureWorks2008.Sales.SalesOrderDetail
CREATE UNIQUE CLUSTERED INDEX ix_SalesOrderDetailID ON SalesOrderDetail(SalesOrderDetailID)
CREATE NONCLUSTERED INDEX ix_productid ON SalesOrderDetail(productid)

Let us see the histogram information of the non clustered index.

DBCC SHOW_STATISTICS('dbo.SalesOrderDetail''ix_productid'WITH HISTOGRAM



















You can see 200 records in the output. I have shown only the first 18 records. To create the histogram, SQL server split the data into different buckets (called steps) based on the value of first column of the index.Each record in the output is called as bucket or step.The maximum number of bucket is 200 based on the data distribution.Histogram is a statistical representation of your data.In other words it is the distribution of records based on the value of first column of the index. Histogram is always based only on the first column of the index even if the index is composite one.This is one of the reason why it is always suggested to have most selective column as the first column of the index, but there are exceptions. 

Let us look at the output of the histogram. It tried to put the 121317 records in the table into 200 buckets (steps) based on the value of productid. 

The RANGE_HI_KEY column represent the upper boundary of each bucket.The lower boundary of each bucket is the  RANGE_HI_KEY+1 of the previous bucket. For the first bucket, the lower boundary is the smallest value of the column on which the histogram is generated.

The RANGE_ROWS column represent the number records in that bucket range but not equal to the value of RANGE_HI_KEY. The value 0 on the first record says, there is no record in the table whose productid value is less than 707. If you look into the 11th record with RANGE_HI_KEY value 718, we have 218 in the RANGE_ROWS column .This says ,there are 218 records with productid value is greater than 716 (previous RANGE_HI_KEY) and productid value is less than 718. The output of below query proves that:

SELECT COUNT(*) FROM SalesOrderDetail WHERE productid>716 AND productid<718


The EQ_ROWS is the number of records in the table matching with RANGE_HI_KEY. For the first records, 3083 in the EQ_ROWS says that there are 3083 records in the table with productid 707.The output of below query proves that:

SELECT COUNT(*) FROM SalesOrderDetail WHERE productid=707




The DISTINCT_RANGE_ROWS represent the number of distinct records (distinct productid ) between two RANGE_HI_KEY values. If you look into the 11th record with RANGE_HI_KEY value 718, we have value 1 in the DISTINCT_RANGE_ROWS column .This says ,there is only 1 distinct records withe productid value is greater than 716 (previous RANGE_HI_KEY) and productid value is less than 718. The output of below query proves that:


SELECT COUNT(distinct productidFROM SalesOrderDetail WHERE productid>716 AND productid<718


The AVG_RANGE_ROWS column represent the average number of rows per distinct values.This is equivalent to RANGE_ROWS / DISTINCT_RANGE_ROWS when RANGE_ROWS value is greater than 0. Otherwise AVG_RANGE_ROWS is considered as 1.

How SQL server optimizer use the histogram for cardinality estimation ? Let us consider the execution plan of the below query.

SELECT productid FROM SalesOrderDetail WHERE productid>=716 AND productid<=718




From where the Estimated Number of rows (1513) is calculated ? Let us go to the histogram















Add the highlighted values which will match to 1513 which is the estimated number of rows in the execution plan .
1076 is the number of records with productid value 716
218 is the number of records with productid value greater than 716 and productid less than 718
219 is the number of records with productid value 718

Sum of these three values is the estimated number of rows in the above execution plan.

When there is complex where condition ,optimizer create required statistics called column statistics and use complex algorithm on top of the histogram data for cardinality estimation. We  will discuss about that in the next post.

If you liked this post, do like my page on FaceBook

12 comments:

  1. Thanks for the great article. There is one syntax change for others that read this article: change the word DENSITY_VECTOR to HISTOGRAM. You could use both and separate with a comma but the density vector is not what is pictured. Otherwise its a good refresher article for a lot of us and a new command for others.

    DBCC SHOW_STATISTICS('dbo.SalesOrderDetail', 'ix_productid') WITH DENSITY_VECTOR, HISTOGRAM

    ReplyDelete
    Replies
    1. Thanks for pointing out. I will make the change.

      Delete
  2. This will really come in handy especially if you'd want to review some stuff. Thank you very much for sharing this SQL server solution.

    ReplyDelete
  3. Thanks for posting this article. This will come handy in my field of work. Thanks again, I cant wait to tell my friends about this blog.

    ReplyDelete
  4. I have a question and it might be silly for you. How can I get the number of values that match each distinct column value from Statistics in SQL server?

    ReplyDelete
  5. Bạn đang lo lắng phân vân khi không biết nên đặt niềm tiên ở công ty thiết kế web nào.? Bạn chưa bít giá cả và chất lượng dịch vụ ra sao.? Tôi xin giới thiệu:
    Công Ty TNHH HTSolution là một công ty thiết kế website chuyên cung cấp các dịch vụ chuyên nhiệp như: Thiết kế website giá rẻ, phần mềm quản lý quán cà phê, cần nâng cấp website….vv.
    Nếu bạn có nhu cầu sử dụng dịch vụ phần mềm quản lý hay cần làm web của công ty chúng tôi vui lòng liên hệ:
    CÔNG TY TNHH HTSOLUTION
    Địa chỉ: 78th Thăng Long - P.4 - Q.Tân Bình - TP.HCM
    Điện thoại: (08) 6267 3377 (Anh Hoàng) or 0985327297 (Anh Trường)
    Email: Info@htsolution.vn

    ReplyDelete




  6. I loved the way you discuss the topic great work thanks for the share, Let me share this, vmware training in pune

    ReplyDelete
  7. Nice article and explanation is good,Thank you for sharing your experience on SQL DBA.you have clearly explained about the process thus it is very much interesting and i got more information from your blog.
    For more details about please check our website.

    Oracle Fusion Financial Training Institute

    ReplyDelete
  8. thank you for sharing nice information really very nice ..i am interested java training

    ReplyDelete
  9. شركة تنظيف منازل بجدة
    شركة تنظيف منازل بمكة
    افضل شركة تنظيف بجدة
    تتميز شركتنا شركة تنظيف خزانات بجدة
    بعدة عوامل هامة تجعلنا أفضل شركة تنظيف خزانات بجدة فتقوم بجميع خدمات التنظيف ومنها تنظيف
    والشوائب الخزانات بجدة وايضا شركة تنظيف خزانات بالمدينة المنورة متخصصة في مجال تنظيف الخزانات وكذلك شركة غسيل خزانات بالمدينة سواء كانت العلوية أو السفلية وتعقيمها تعقيم تام بأفضل أنواع المطهرات

    ReplyDelete
  10. من افضل واقوى الشركات التي تقدم خدمات مكافحة حشرات المنزل البق والصراصير والنمل والعته تلك التي تستخدم مبيدات آمنة وفعالة ومضمونة مثل افضل شركات مكافحة الحشرات بجدة التي تتعامل في مكافحة الحشرات وتستخدم مبيدات آمنة ومضمونة كما هو الوضع او النظام في مثل شركة مكافحة حشرات التي تتعامل مع الحشرات في المنازل

    ReplyDelete