Brent Ozar, Microsoft Certified Master recently discussed parallelism in SQL Server, specifically the wait types CXPACKET and CXCONSUMER in his final installment of Quest’s Database Training Days Fall Series. In his usual humorous and accessible fashion, Brent demystified the concepts of parallelism and explained how to handle it when you see too many CXPACKET and CXCONSUMER wait statistics.
First, what is parallelism, and why does SQL Server make queries execute in parallel?
Simply put, SQL Server automatically recognizes that a particular query has a large workload, and it determines that the work can be done more efficiently across multiple processors than by just one. This is generally a smart decision, but it can run into trouble when SQL Server doesn’t balance the load across the threads performing the task.
Understanding CXPACKET and CXCONSUMER wait types
CXPACKET and CXCONSUMER are wait types that indicate that work isn’t equally balanced. When you see these wait stats on your server, you’ll know that SQL Server is running queries in parallel, but not doing a great job of distributing them across available processors.
Every database professional is familiar with the concept of “cost” to express how expensive a query is to execute in terms of resource consumption. These “query bucks” are an approximate measure of work, and an important signal as to whether the query will run in parallel or not. An inexpensive query won’t need to run in parallel, but an expensive one will. The goal is to execute the query as quickly and efficiently as possible so the next one in line can begin. SQL Server designates a thread as a scheduler, and this thread, which Brent deemed the “robot overlord,” will assign pieces of the parallel workload to the worker threads, or the “robot minions.”
Parallelism and the robot overlord
Brent dove into a demo to show how this works. Using the Stack Overflow database, he created a low-cost database lookup that was very fast because of the presence of an index. The execution plan was pretty straightforward and didn’t require parallelism to run.
But, when he introduced a lookup for something that wasn’t in the index, things changed by forcing a key lookup for every row on the clustered index of the table. SQL Server recognized this would be a lot of work, so it introduced parallelism and indicated as such with an icon on the execution plan. If the execution plan was three-dimensional, you’d be able to see the multiple threads stacked up, but since it’s not, you need to view the statistics to see information such as the logical reads performed by each CPU thread.
However, SQL Server only assigned this task to a few threads, not all of them. Brent explained that everything happening beyond the parallel icon is happening only on the assigned processors. So, the threads that performed the initial reads are now the only ones also doing the key lookups. The robot overlord only asked a few minions to perform the entire task instead of asking all the minions to pitch in.
He went on to explain that SQL Server has to account for what the threads are doing as well as track what the robot overlord is doing. In the early days, all this work was represented by one wait statistic, but this didn’t make sense because no matter what, the overlord still has to wait while all the threads are working. So, a new wait type was introduced – this was CXCONSUMER and it tracks what the scheduler/overlord thread is doing, while CXPACKET tracks what the worker/minion threads are doing.
Brent went back to the query to make it even more complex by adding a sort. Now, it becomes even more clear that parallelism is causing a problem rather than making the operation more efficient. The work has become even more unbalanced across the few worker threads, and some are running out of memory and spilling to disk. He added a join, even further burdening the working cores which are not getting any assistance from the non-working ones. CXPACKET stats continued to climb.
What can you do in this situation? The parallelism decision is occurring at the server level and not at the query level, so it’s going to take some configuration changes.
Assessing key configurations
We already learned that if the query cost is higher than a certain level, it causes SQL Server to parallelize. Small queries get constrained to a single thread. But what controls the threshold? It’s a property called Cost Threshold for Parallelism (CTFP). By default, if the execution plan determines the cost to be higher than 5 query bucks, the query will get parallelized. While there is no guidance as to how to set this, Brent recommends a number greater than 50. This will get rid of parallelism for trivial queries.
Another configuration is the max degree of parallelism (MAXDOP) which describes the number of threads that SQL Server will assign to the query. The default value here is zero, which means SQL Server can use all available processors, up to 64, to execute the query. Setting the MAXDOP option to 1 limits SQL Server to using only one processor – in effect, forcing a serial plan to execute the query. SQL Server will recommend a MAXDOP value based on number of server cores you have, but generally, a lower MAXDOP makes sense since there won’t be many times that all cores are needed.
Brent made adjustments to these two configurations and ran his query again. This time, we could see that more cores were engaged in the parallel operation. The CXPACKET wait stats were lower which meant that the load was more evenly balanced across more cores than before.
Tips for combating CXPACKET and CXCONSUMER wait statistics
Brent recommends the following steps if you are seeing excessive CXPACKET and CXCONSUMER wait statistics:
- Set the CTFP and MAXDOP per industry best practices and then let those settings bake for a few days. This clears the plan cache and forces SQL Server to rebuild query execution plans (reevaluate cost).
- Make index improvements that will reduce the times when queries go parallel to do scans and sorts. Let new indexes bake and then look for queries that are still doing a lot of work.
- Tune those queries and let them bake for a few days.
- Finally, if parallelism is still a serious problem, start looking for the specific queries with parallelism issues.
For even more insight, you can catch Brent’s entire training session on CXPACKET and CXCONSUMER wait statistics on-demand below.