Understanding the Thread Time view in PerfView


Recently while examining a slow request issue (I have a plan to describe this investigation in a seperate post) it came to me that every time I open the Thread Time view it takes a moment to understand what this view actually contains. I decided to write this post for me and for any of you who share the same feeling about this window :).

The Thread Time view is the most detailed of PerfView views when it comes to analysing CPU spikes and thread wait times. It’s not the view you open first as you will quickly find yourself lost with the amount of data presented there. I usually start the thread time analysis with pinpointing the interesting periods of time when my application was running. For this purpose you may either use any of the special views, such as the Server Request Thread Time Stacks view or the ASP.NET Thread Time view, or find events interesting to you in the Events view. For instance in my diagnostics case (disassembling a request) I used the Windows Kernel/TcpIp/Accept and Windows Kernel/TcpIp/Disconnect events to set a time range for my further investigations. But in this post we will examine only 2ms of the request time (full analysis should be available soon). Let’s open the Thread Time view and set the Start and End input boxes to accordingly: 9,330.058 and 9,333.001. Now switch to the CallTree tab and start expanding each thread node under your selected process. Most of them will probably have the BLOCKED status (as you can see on the print screen below) which means they weren’t doing anything all this time:

blocked-threads

You may safely exclude them from the view (Alt+E on the thread node). You should be left with threads that have some interesting call stacks. In my case in the selected period of time there were only two threads active in the process: 5760 and 7260. We will focus on the first one. After expanding all the call stacks I got the following data (I replaced the irrelevant lines with dots):

  Process32 w3wp (3736)
  + Thread (5760) CPU=7ms (.NET ThreadPool)
  ...
  Name                                                                     First       Last
  + nancy!ModuleExtensions.Bind                                            9,330.057  9,331.099
  | + Nancy!DynamicModelBinderAdapter.TryConvert                           9,330.057  9,331.099
  |  + Nancy!DefaultBinder.Bind                                            9,330.057  9,331.099
  |   + Nancy!DefaultBinder.DeserializeRequestBody                         9,330.057  9,331.099
  |    + Nancy!JsonBodyDeserializer.Deserialize                            9,330.057  9,331.099
  |     + mscorlib.ni!StreamReader.ReadToEnd                               9,330.057  9,331.099
  |      + mscorlib.ni!StreamReader.ReadBuffer                             9,330.057  9,331.099
  |       + Nancy!RequestStream.Read                                       9,330.057  9,331.099
  |        + Microsoft.Owin.Host.SystemWeb!DelegatingStream.Read           9,330.057  9,331.099
  |         + System.Web.ni!HttpBufferlessInputStream.Read                 9,330.057  9,331.099
  |          + System.Web.ni!IIS7WorkerRequest.ReadEntityBody              9,330.057  9,331.099
  |           + System.Web.ni!IIS7WorkerRequest.ReadEntityCoreSync         9,330.057  9,331.099
  |            + System.Web.ni!DomainBoundILStubClass.IL_STUB_PInvoke(...  9,330.057  9,331.099
  |             + webengine4!MgdReadEntityBody                             9,330.057  9,331.099
  |               + ...                                                    9,330.057  9,331.099
  |                 + ntoskrnl!NtWaitForSingleObject                       9,330.057  9,331.099
  |                  + ntoskrnl!KeWaitForSingleObject                      9,330.057  9,331.099
  |                   + ntoskrnl!KiCommitThreadWait                        9,330.057  9,331.099
  |                    + ntoskrnl!KiSwapContext                            9,330.057  9,331.099
  |                     + ntoskrnl!SwapContext_PatchXRstor                 9,330.057  9,331.099
1#|                      + BLOCKED                                         9,330.057  9,330.663
2#|                      + CPU_TIME                                        9,330.663  9,331.099
  + LowLevelDesign.Diagnostics.LogStore.ElasticSearch!ElasticSearchAppC... 9,331.728  9,332.373
  |                    + ...                                               9,331.728  9,332.373
3#|                      + CPU_TIME                                        9,331.728  9,332.373
  + FluentValidation!FluentValidation.AbstractValidator`1[...].Validate    9,331.099  9,331.728
  |                    + ...                                               9,331.099  9,331.728
4#|                      + CPU_TIME                                        9,331.099  9,331.728

Notice the left margin and four marks on it – I will be referring to them later on. By checking the time ranges on the right we can see that everything starts at 9,330.057ms. Firstly, the thread is blocked (mark 1#) till 9,330.663ms. The call stack for the BLOCKED event shows what the thread was doing at the moment it got suspended.. Now, we have some CPU_TIME, split in three time ranges:

  • from 9,330.663ms till 9,331.099ms (mark 2#)
  • from 9,331.099ms till 9,331.728ms (mark 4#)
  • from 9,331.728ms till 9,332.373ms (mark 3#)

As you can see in the code snippet, lines are not sorted chronologically – mark 4# comes before 3# and you should check the When column to see which moment comes first, eg.:

  |When                                 First	    Last
2#| AAAAAA9AAAA3____________________    9,330.057   9,331.099
3#| __________________8AAAAAA1______    9,331.728   9,332.373
4#| ___________6AAAAAA1_____________    9,331.099   9,331.728

Now the question is how to interpret those call stacks? Before we answer this question let’s first understand how PerfView builds the Thread Time view. As you know, ETW tracing is just a process of collecting events from different providers. Views available in PerfView (or WPA) are just nice ways to display these collected events. The Thread Time view is not an exception and it is built upon the following events:

  • Windows Kernel/Thread/CSwitch – event generated each time a thread is given CPU time (CPU switches from one thread to the other)
  • Windows Kernel/PerfInfo/Sample – event generated every millisecond to collect stack traces of processes running on each processor on the system

Let’s open the Events view, set the time range to 9,330.058ms9,333.001ms, choose our process and select the above mentioned events from the event list. We will receive the following records (I formatted them for better readability):

Event#1
Event Name:   Windows Kernel/Thread/CSwitch
Time MSec :   9,331.778
Process Name: w3wp (3736)
Rest:         HasStack="True" ThreadID="5,760" OldThreadID="7,256" OldProcessID="3,736" OldProcessName="w3wp" NewThreadID="5,760" NewProcessID="3,736" NewProcessName="w3wp" ProcessorNumber="0" NewThreadPriority="8" OldThreadPriority="9" NewThreadQuantum="0" OldThreadQuantum="0" OldThreadWaitReason="15" OldThreadWaitMode="Swappable" OldThreadState="Wait" OldThreadWaitIdealProcessor="0" NewThreadWaitTime="0"

Event#2
Event Name:   Windows Kernel/PerfInfo/Sample
Time MSec :   9,332.100
Process Name: w3wp (3736)
Rest:         HasStack="True" ThreadID="5,760" InstructionPointer="0xfffff800019c7bf0" ProcessorNumber="0" Priority="0" ExecutingDPC="False" ExecutingISR="False" Rank="0" Count="1"

Event#3
Event Name:   Windows Kernel/Thread/CSwitch
Time MSec :   9,332.373
Process Name: w3wp (3736)
Rest:         HasStack="True" ThreadID="1,244" OldThreadID="5,760" OldProcessID="3,736" OldProcessName="w3wp" NewThreadID="1,244" NewProcessID="3,736" NewProcessName="w3wp" ProcessorNumber="0" NewThreadPriority="9" OldThreadPriority="8" NewThreadQuantum="0" OldThreadQuantum="0" OldThreadWaitReason="32" OldThreadWaitMode="Swappable" OldThreadState="Ready" OldThreadWaitIdealProcessor="0" NewThreadWaitTime="452"

Now, everything becomes clear. We have two context switches and we can see that the first switch, at 9,331.778ms, (Event#1) activated our 5760 thread (NewThreadID) and it was active till 9,332.373ms (Event#3) when system scheduler suspended it and switched the first CPU to a thread 1244 (NewThreadID). In the meantime one profiling event was recorded at 9,332.100ms which proves that the thread was running on the first processor.

With all the collected data we can summarize the 2ms of the 5760 thread lifetime in the following table:

Time Description
9,331.778ms The thread was awakened by the system after the request stream was received and started doing something (we don’t know what)
9,332.100ms The thread was performing some requests in the ElasticSearch log store (the stack is from my code so I know :))
9,332.373ms The thread was doing validations (FluentValidation.AbstractValidator`1.Validate) when it was put on hold by the system.

As you can see much happens on your system in every millisecond and collecting ETW traces is one of the best ways to understand exactly what is keeping your CPU cores busy 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s