Indexing Windows header files with the PowerShell Scour module


Developing system applications in C# requires a lot of P/Invoke. Although there are many great P/Invoke NuGet libraries, for smaller projects I still prefer to import only the definitions I use. The pinvoke.net site is an excellent source of stub definitions. However, sometimes the online definition lacks the constants I need or is otherwise incomplete. In such cases you have to look into the Windows headers (which, by the way, contain not only definitions but also a lot of interesting comments).

I used to search through those files with the Total Commander “Find Files” dialog, but it was slow and inefficient. So I switched to Sublime Text and created a project for the Windows headers folder (C:\Program Files (x86)\Windows Kits\10\Include\10.0.x.x). Once the folder index is cached, Sublime becomes a great tool for analyzing source code (not only C++!). However, when you read a lot of code and switch between various projects, Sublime replaces old cached projects with new ones to keep the cache at a reasonable size. Opening an “old” project again then triggers a cache rebuild, which takes time and makes searching inefficient again.

I then started looking for a way to build a permanent index of the folders I regularly scan (such as the Windows headers directory). At first, I thought about running a local instance of Elasticsearch or Apache Solr, but that seemed like overkill. I wanted something simpler, some kind of wrapper over the Apache Lucene library, which is the core engine of both servers. Then I stumbled upon Lee Holmes's article about Scour, a PowerShell module that wraps the Lucene.Net library and provides cmdlets to create full-text indexes for your folders. After using it for some time, I am happy with the results, so I decided to share my simple setup with you.

Indexing

The Scour module is available in the PowerShell Gallery and you may install it with the following command:

Install-Module Scour -Scope CurrentUser

Then, run the PowerShell console as an Administrator, go to C:\Program Files (x86)\Windows Kits\10\Include\10.0.x.x (replace x.x with the version you have installed) and execute:

Initialize-ScourIndex -Path *.h

This command creates a __scour directory in the headers folder, which contains the Lucene index for the .h files. Alternatively, if you don’t like writing to the Program Files folders and running PowerShell with administrative rights, you may make a copy of the headers folder and perform the indexing there.
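The copy-based alternative could look roughly like the sketch below. The destination folder name (winsdk-headers) is only an assumption for illustration, and the 10.0.x.x part of the source path must be replaced with the version you have installed.

```powershell
# Sketch only: index a private copy of the headers, no admin rights needed.
# Replace 10.0.x.x with the installed SDK version; the target folder name is hypothetical.
$source = 'C:\Program Files (x86)\Windows Kits\10\Include\10.0.x.x'
$target = Join-Path $env:USERPROFILE 'winsdk-headers'

# Copy the whole headers tree (the target folder must not exist yet)
Copy-Item -Path $source -Destination $target -Recurse

# Build the Lucene index inside the copy
Set-Location $target
Initialize-ScourIndex -Path *.h
```

The trade-off is that the copy does not follow SDK updates, so it needs to be refreshed and re-indexed when you install a new SDK version.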

Searching

The cmdlet for searching the indexed files is Search-ScourContent, but to use it you first need to change to the indexed folder. The -Query parameter expects Lucene search syntax, and the cmdlet returns a list of files that match the query. To also display the matching file contents, use the -RegularExpression parameter or pipe the results through the Select-String cmdlet (which is what -RegularExpression does for you). Since my searches are simple, I added the following function to my PowerShell user profile (at $env:USERPROFILE\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1):

function Search-Headers(
    [Parameter(Mandatory=$True,ValueFromPipeline=$True)][string]$Query)
{
    # Requires the Scour module
    # Make sure you created the index using the Initialize-ScourIndex -Path *.h command
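    Push-Location "C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\"
    try {
        # Turn the Lucene wildcard (*) into its regex equivalent (.*) for the content match
        $RgxQuery = $Query -replace '\*','.*'
        Search-ScourContent -Query $Query -RegularExpression $RgxQuery
    }
    finally {
        # Return to the original directory even if the search fails
        Pop-Location
    }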
}
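The wildcard conversion in the function above only rewrites the asterisk, so any other regex metacharacter in the query (a dot or a parenthesis, for example) is still interpreted as regex syntax. If that ever gets in the way, a slightly more defensive conversion could look like the sketch below. ConvertTo-HeaderRegex is a hypothetical helper name, not part of Scour, and it assumes the query uses nothing from the Lucene syntax beyond plain terms and the * and ? wildcards.

```powershell
# Hypothetical helper (not part of Scour): converts a simple wildcard query
# into a regular expression suitable for the -RegularExpression parameter.
function ConvertTo-HeaderRegex([Parameter(Mandatory=$True)][string]$Query)
{
    # Escape every regex metacharacter first; '*' becomes '\*' and '?' becomes '\?'
    $escaped = [regex]::Escape($Query)

    # Turn the escaped wildcards back into their regex equivalents
    $escaped = $escaped -replace '\\\*', '.*'
    $escaped -replace '\\\?', '.'
}

ConvertTo-HeaderRegex 'PROCESS_*'   # PROCESS_.*
ConvertTo-HeaderRegex 'winnt.h'     # winnt\.h
```

Inside Search-Headers, the $RgxQuery assignment would then become `$RgxQuery = ConvertTo-HeaderRegex $Query`.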

After restarting the PowerShell console, I could start using my index from any directory.
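A few calls might look like the ones below. The search terms are just illustrative picks, and the output is omitted because it depends on the SDK version you indexed. The snippet assumes the Search-Headers function from the profile is loaded.

```powershell
# Illustrative queries against the header index (terms chosen as examples)

# Exact term
Search-Headers 'CreateFileW'

# Lucene wildcard; the function turns it into the regex CreateFile.*
Search-Headers 'CreateFile*'

# The query can also arrive through the pipeline
'OpenProcess' | Search-Headers
```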

Final words

You may find various arguments that full-text search is not the right way to index source code. I agree with them if we consider only symbol indexing. However, it works well enough for basic code searches. If you look at the Scour module source code, you will see that Lee does not create any custom analyzers but uses the StandardAnalyzer. So if you are unhappy with the results, you can always choose a different one or write your own. If you are a PowerShell developer, check how the indexing is parallelized (great code to reuse). Finally, Lee wasn’t the first person to match PowerShell with Lucene.Net. You may watch a very interesting presentation by Bruce Payette and download the code he wrote. Doug Finke even created a WPF UI over Bruce’s code, so if you want to search files in a GUI window, give it a try.
