On the occasion of releasing wtrace 2.2, I decided to write a short post about new functionalities I added to this tool in the recent months. I hope you will find them interesting. Wtrace is a command line application which collects ETW traces from the system and the selected processes and outputs them to the console. It is very simple to use and runs on Windows 7+. Currently, it supports the collection of File I/O, TCP, ALPC, RPC, ISR, DPC, and PowerShell events.
Application Insights is a performance monitoring service, created by Microsoft and available on Azure. It gives you space to store the performance metrics and logs of your application (1GB for free!), as well as functionalities to search and manage them. In this post I am not going to present you the whole platform – Microsoft already did it in the Azure documentation, but rather focus on an element of the log collection, named dependency calls tracking. I did some analysis on the Application Insights libraries, and decided to publish my findings, in the hope that the results might interest some of you too.
Dependency calls are requests, which your application makes to the external services (such as databases or REST services). When this telemetry type is enabled, all the dependent actions form a timeline within the scope of the parent action. Using this timeline we may easily verify whether the delay in our application is caused by an external service, or the application itself. Let’s analyze in detail how this data is collected.
In today’s short post I would like to present you a new tool in my diagnostics toolkit: wtrace, and an update to procgov (or Process Governor). Let’s start with wtrace.
On Linux, when I need to check what a given process is doing, I usually use strace. I was always missing such a tool for Windows. We have procmon (which is great), but it does not run in a console, and thus can’t be used in the command line scripts, or on a Nano server. This might change soon, as in one of the latest episodes of the Defrag Tools show, Mark Russinovich shared the plan of releasing the procmon version for Nano. Till then though we don’t have much choice when it comes to real-time tracing. You may think of xperf or wpr, but those tools only record ETW events for further analysis. However, we may use the same ETW events in a realtime session, and print information they provide to the console output. This is how the idea for wtrace was born in my head. Few weeks ago Sasha Goldshtein released another tool for ETW processing named etrace, which basically does something very similar and has many interesting options. I decided to publish wtrace nonetheless, as my point was to create a tool with an extremely simple interface. Wtrace is collecting only a small subset of events (FileIO, TcpIp, Process/Thread Start) from the kernel provider. It may either start a process, or trace one that is already running. At the end of the trace it also shows some statistics (unless you use the –nosummary switch). Trace session will end either when you press Ctrl+C, or when the traced process terminates. Events are printed in the console window. An example session might look as follows:
I am working on adding a support for ASP.NET performance counters into Musketeer. Compared to other .NET performance counters they have quite surprising instance names. ASP.NET developers decided that their performance counter instances will be identified by names derived from the AppDomain names (more information can be found here). This is probably due to a fact that one process may host multiple ASP.NET applications, thus one counter instance per process won’t be enough. Consequently, in order to match collected metrics with process ids we need to know which AppDomain belongs to which process. How can we do that?
Recently while examining a slow request issue (I have a plan to describe this investigation in a seperate post) it came to me that every time I open the Thread Time view it takes a moment to understand what this view actually contains. I decided to write this post for me and for any of you who share the same feeling about this window :).
Today I would like to share with you an interesting (I hope) diagnostics case in one of our system services. The IngestService (that is its name) was not starting properly for the first time – it was being killed because of exceeding the default 30s timeout. But the second try was always successful. No exception was thrown and no logs could be found in the event logs. It’s a situation when ETW traces might shed some light on what’s going on. As it was a .NET service I used PerfView to record the trace file. An important checkbox to select when diagnosing thread wait times is the Thread Time box:
After collecting the traces on production, I merged them and copied to my developer machine.
UPDATE 2019.01.30: All the features described in this post, as well as some other improvements, are available in the official NLog.Etw package. Please use it in place of my custom package.
I really like the NLog library and I use it pretty often in my projects. Some time ago I wrote a post in which I showed you my preferred debug and production configuration. Other day I presented you a simple layout renderer for assembly versions.
Today, I would like to inform you that all those goodies 😉 are available in my brand new LowLevelDesign.NLog.Ext Nuget package.
Additionally, you may find in it two ETW NLog targets. ETW (Event Tracing for Windows) is a very effective way of logging and its support in kernel makes it a great choice for verbose/trace/debug logs. Moreover, if you are using Windows Performance Toolkit in your performance analysis, providing your own ETW messages will help you correlate system events with methods in your application. ETW infrastructure is highly customizable (check Semantic Logging Application Block to see how your logs might look like and how they might be consumed:)).
.NET developers usually know they should measure code performance using a Stopwatch class from the System.Diagnostics namespace. From time to time though I see code where someone uses DateTime instances for this purpose. And it’s not very surprising as DateTime class is usually the one that comes to mind when you think of time in .NET. In today’s post I would like to show you the difference in accuracy of both those approaches and the price you need to pay using either of them. We will work on this sample code that does nothing but measure time :):
The majority of modern applications use ORMs as a way to connect to a database. Those libraries simplify the usage of the underlying ADO.NET API in a way that we might even forget that under the hood we are dealing with a relational model. This object-oriented wonderland usually lasts till the moment when the first SQL exceptions crop up. If we configured our ORM correctly we should be able to diagnose database problems with just logs that it provides. But what if we didn’t or if the problem lies deep in the ADO.NET layer? We might then try to use a debugger or a SQL Server Profiler. Both those tools, although great, are usually too heavy (or too invasive) for a production environment. Fortunately there is still one option left – ADO.NET ETW tracing. In today’s post I will show you how to turn on this type of tracing and how to use it to quickly diagnose some database problems.
I have been playing recently with the ETW (Event Tracing for Windows). One of my aims was to write a managed provider and try the ETW infrastructure in my application. Everything seemed to be well explained on the MSDN and not very hard to implement (especially in my simple case). Unfortunately not all things went smoothly and in this post I’m going to show you an issue I run into as well as some general path when diagnosing broken ETW providers.