Use Windows Performance Recorder to detect bottlenecks

  • Windows Performance Recorder and Windows Performance Analyzer allow you to capture and analyze ETW traces to locate real bottlenecks in Windows.
  • In .NET and ASP.NET Core applications, they are combined with Visual Studio, Application Insights, PerfView, and PerfCollect for in-depth diagnostics on Windows and Linux.
  • Network tools, bottleneck calculators, and the Task Manager itself help identify hardware imbalances and resource saturation.
  • Integrating system data, application metrics, and hardware tests allows you to properly prioritize what to optimize or upgrade to improve performance.

When a computer or application starts to stutter, the usual reaction is to blame the CPU, the RAM, or even the network, but without reliable data it is very easy to misdiagnose the problem. Windows Performance Recorder (WPR) and Windows Performance Analyzer (WPA) let you capture and visualize in great detail what is happening in the system precisely during those load peaks or performance drops.

Far from being tools only for gurus, WPR and WPA fit well into the daily work of developers, administrators, and advanced users alike. Combined with other utilities such as Visual Studio, Application Insights, PerfView, scripts like PerfCollect, or even Task Manager itself, they provide a very powerful arsenal for locating CPU, memory, disk, GPU, network, or external dependency bottlenecks.

What is Windows Performance Recorder and how does it fit into the diagnostic ecosystem?

Windows Performance Recorder is part of the Windows Performance Toolkit (WPT), a set of Microsoft tools designed for capturing and analyzing operating system-level performance. WPT includes two main components:

  • WPR. It is responsible for recording the event trace.
  • WPA. The graphical interface where that trace is then opened and studied.

WPR is built on Event Tracing for Windows (ETW), the kernel-level tracing infrastructure integrated into Windows. Each recording session is saved to a file with the .etl extension (Event Trace Log), which is then opened in WPA to study the behavior of the system and applications at a very fine granularity.

To control what is logged, WPR uses profile files with the .wprp extension. These profiles define the set of providers and events, the sampling frequency, the level of detail, and so on, so the capture can be adapted to specific scenarios.

For its part, Windows Performance Analyzer is the graphical tool that opens .etl files and lets you explore the information through time graphs, hierarchical tables, correlated timelines, and specific views for CPU, memory, I/O, network, and many other subsystems. WPA is very flexible: it allows you to sort data by various fields, zoom in and out over time, and group by threads, processes, modules, call stacks, and more.

Key concepts and terminology before recording traces

Before you start recording traces, it's a good idea to master a few basic terms that you'll constantly see in the documentation and in the tools themselves. Knowing this jargon helps you interpret the results and not get lost among acronyms.

The first key term is ETW (Event Tracing for Windows). This is the kernel-level event tracing mechanism built into Windows. Thanks to ETW, both the system kernel and many applications and components can emit events efficiently and with minimal impact.

When these events are written to disk, they are stored in an event trace log file with the .etl extension. Each time you record a trace with WPR, the typical result is a file of this type, which can then be analyzed with WPA or other compatible tools, such as PerfView.

The application that starts and stops the recording is WPR (Windows Performance Recorder). This recorder accepts one or more .wprp profiles as input, which specify which events will be recorded and with what configuration. It is common to select predefined profiles for CPU, I/O, graphics, or general system usage.

Lastly, WPA (Windows Performance Analyzer) is the graphical interface that opens .etl files and allows you to navigate, sort, filter, and correlate the data. From WPA, you can dig into which threads are saturating the CPU, which processes are generating the most disk I/O, which external dependencies are causing latency, or how time is distributed among the different operations of an application.

Using WPR and WPA to detect CPU and system bottlenecks

In industrial environments, backend environments, or simply on demanding workstations, CPU peaks (“spikes”) are one of the most common performance symptoms. WPR and WPA are especially useful for capturing what happens during those peaks and understanding what causes them.

The typical workflow involves using WPR to start the capture just before the problem occurs. The recording can be started either from the graphical interface or from the command line, and it should be stopped as soon as the spike or performance anomaly has occurred. In this way, the resulting .etl file accurately reflects the system's behavior during the critical window.
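As a rough illustration of the command-line variant of this workflow, the following Python sketch wraps the `wpr -start` and `wpr -stop` commands around a function that reproduces the problem. The helper names, the default output file name `spike.etl`, and the `reproduce_problem` callback are assumptions made for this example; `GeneralProfile` and `CPU` are built-in WPR profile names, and the script has to run from an elevated prompt on a machine where `wpr` is on the PATH.

```python
import subprocess


def wpr_start_command(profile="GeneralProfile", file_mode=False):
    """Build the command line that starts a WPR recording with a built-in profile."""
    cmd = ["wpr", "-start", profile]
    if file_mode:
        # Log to a file instead of the default in-memory circular buffer.
        cmd.append("-filemode")
    return cmd


def wpr_stop_command(output_path):
    """Build the command line that stops the recording and saves the .etl file."""
    return ["wpr", "-stop", output_path]


def record_trace(reproduce_problem, output_path="spike.etl", profile="GeneralProfile"):
    """Record a trace around a callable that reproduces the problem.

    Hypothetical helper: `reproduce_problem` is whatever triggers the spike.
    """
    subprocess.run(wpr_start_command(profile), check=True)
    try:
        reproduce_problem()
    finally:
        # Always stop, so a failed reproduction does not leave a session running.
        subprocess.run(wpr_stop_command(output_path), check=True)
    return output_path
```

Keeping the command construction separate from the execution makes it easy to review exactly what will be run before launching a recording on a production machine.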

Then, that file is opened with WPA, which offers a very intuitive visual interface for inspecting the data. You can view time graphs of CPU, memory consumption, I/O, and other counters, and then drill down from the global system view to specific processes, specific threads, and even the call stacks that explain what code was running at any given time.

By calmly analyzing those patterns, it is possible to identify real bottlenecks: for example, a thread monopolizing the CPU, a specific routine running too frequently, or a poorly designed critical section. In areas such as industrial motor support or control software, this type of detailed analysis is key to maintaining competitiveness and preventing production downtime.
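To make the idea of "a thread monopolizing the CPU" concrete, here is a minimal Python sketch of the kind of aggregation WPA performs when it groups sampled CPU time by process and thread. The input format (tuples of process name, thread ID, and sampled weight in milliseconds) and the 50% threshold are assumptions chosen for illustration, not a WPA export format.

```python
from collections import defaultdict


def cpu_share_by_thread(samples):
    """Return {(process, thread_id): fraction of the total sampled CPU time}."""
    totals = defaultdict(float)
    for process, thread_id, weight_ms in samples:
        totals[(process, thread_id)] += weight_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {key: weight / grand_total for key, weight in totals.items()}


def dominant_threads(samples, threshold=0.5):
    """List the threads whose share of sampled CPU time reaches the threshold."""
    shares = cpu_share_by_thread(samples)
    hot = [(key, share) for key, share in shares.items() if share >= threshold]
    # Highest share first, mirroring a table sorted by weight.
    return sorted(hot, key=lambda item: item[1], reverse=True)
```

A thread that accounts for most of the sampled weight during the critical window is the natural place to start expanding call stacks.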

.NET and ASP.NET Core application diagnostics: Visual Studio, Application Insights, and PerfView

In the .NET ecosystem, in addition to WPR and WPA, there are specific tools that fit very well when the goal is to diagnose performance problems in ASP.NET Core applications or .NET backend services. Many of them are complemented by system-level monitoring.

The profiling tools integrated into Visual Studio are a very convenient first step. From within the development environment itself, you can analyze CPU usage, memory allocation, garbage collector behavior, and certain performance events within the application. Because they are integrated, they make the work much easier during the development and testing phases.

When the scenario shifts towards production or distributed environments, Azure Application Insights comes into play. This telemetry service automatically collects various data, both in ASP.NET Core and in other stacks.

One of its most interesting elements is the Application Map, a view that displays all the components of a distributed architecture and lets you quickly identify problem areas or performance bottlenecks between services. Alongside it is Azure Metrics Explorer, which makes it easier to graph metrics, correlate trends, and dig into peaks or drops in values.

The Performance blade in Application Insights provides a view by operation: it shows the duration of each operation in the application and lets you drill into a specific one to see all the dependencies that contribute to it taking too long. From that same view you can invoke the Application Insights Profiler to capture detailed performance traces on demand.

PerfView and PerfCollect: In-depth analysis in .NET and Linux scenarios

When you need to get the most out of diagnostics in .NET applications, it's advisable to use PerfView. This tool, created by the .NET team specifically for performance analysis, can study CPU usage, memory, garbage collector (GC) behavior, ETW events, and wall-clock time with a very fine level of detail.

One particularly powerful aspect of PerfView is its ability to open and analyze .etl files generated with WPR. This allows you to study call stacks and the cost of each function. Microsoft maintains a very comprehensive user guide, available from within the tool itself and on GitHub, which explains use cases, commands, and recommended workflows.

The major drawback is that PerfView only runs on Windows, so it cannot be launched directly on Linux servers running ASP.NET Core applications. To overcome this limitation, the .NET team and community offer PerfCollect, a Bash script that uses native Linux tools such as perf and LTTng to capture traces that PerfView can read.

The workflow in this case involves running PerfCollect in the Linux environment where the performance problem occurs, collecting the trace, and transferring the resulting file to a Windows computer. From there, it is opened with PerfView to perform an in-depth analysis of call stacks, CPU usage, and GC behavior.
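The Linux side of this workflow can be sketched as follows. This Python helper only builds the commands; it assumes the `perfcollect` script has already been downloaded into the current directory and made executable, and the trace name `sampleTrace` is a placeholder. The `install` and `collect` subcommands and the `.trace.zip` output naming follow the PerfCollect documentation as I understand it, so check them against the version of the script you are using.

```python
def perfcollect_commands(trace_name="sampleTrace", script="./perfcollect"):
    """Return the install command, the collect command, and the expected archive name.

    Hypothetical helper for illustration; it does not run anything.
    """
    # One-time setup: installs perf, LTTng, and the other prerequisites.
    install = ["sudo", script, "install"]
    # Starts collection; it is stopped with Ctrl+C once the problem has been reproduced.
    collect = ["sudo", script, "collect", trace_name]
    # Archive to copy to the Windows machine and open with PerfView.
    archive = f"{trace_name}.trace.zip"
    return install, collect, archive
```

The returned archive name is the file that has to be transferred to the Windows computer for analysis.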

Detailed information on how to install PerfCollect, how to start and stop trace sessions, and how to interpret the results by combining it with PerfView and other diagnostic tools can be found on GitHub.

Analysis of bottlenecks in HoloLens and devices via WPA

Performance monitoring isn't limited to servers and desktop PCs; it's also especially useful on devices like HoloLens, where thermal and resource margins are tighter. Identifying processes that spike temperatures or threads that saturate the CPU is crucial for maintaining a smooth mixed reality experience.

In these scenarios, the ETW infrastructure is also used. HoloLens can generate traces using the Windows Performance Recorder. These are saved as .etl files, which are then opened with WPA from a test PC. This makes it possible to visualize hardware or software bottlenecks, such as overheating or particularly demanding processes.

To use WPA, simply download the application from Microsoft Store or install the Windows Performance Toolkit through the Windows Assessment and Deployment Kit (ADK). The kit also includes other general debugging and diagnostic tools for the platform.

The HoloLens capture is done through the Device Portal. From the side menu, open the "Performance Monitoring" section, choose a predefined profile or load a custom one, click "Start Monitoring," and reproduce the problematic scenario. Once the necessary data has been captured, stop the monitoring, and the portal will list the trace at the bottom of the page.

This ETL file can be downloaded directly, opened in WPA on the analysis machine, or shared with someone else to perform the analysis in their environment. Once in WPA, it's possible to apply specific analysis profiles and focus on the CPU, memory, GPU, or any other subsystem relevant to the mixed reality experience.

Preparing files and profiles for analysis with Windows Performance Analyzer

For WPA analysis to be truly effective, it is advisable to organize the necessary resources around the .etl file. A good practice is to create a folder containing the trace, the symbols, and the WPA profiles that will be used, so that the tool has easy access to everything.

A typical working structure keeps everything in the same folder: the HoloLens_trace_file.etl trace file, a WPA profile such as CPU_analysis.wpaProfile, and a “Symbols” subfolder with all the necessary .pdb files already decompressed. WPA can then resolve call stacks with human-readable function names, which is key to pinpointing which part of the code is generating the load.
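Before opening WPA, it can be handy to verify that the folder really contains the three ingredients described above. The following Python sketch does that check; the function names are invented for this example, and the layout it expects (trace and profile at the top level, .pdb files directly inside a “Symbols” subfolder) is simply the structure described in the text.

```python
from pathlib import Path


def inventory_analysis_folder(folder):
    """List the traces, WPA profiles, and symbol files found in an analysis folder."""
    root = Path(folder)
    symbols_dir = root / "Symbols"
    return {
        "traces": sorted(p.name for p in root.glob("*.etl")),
        "profiles": sorted(p.name for p in root.glob("*.wpaProfile")),
        "symbols": sorted(p.name for p in symbols_dir.glob("*.pdb"))
        if symbols_dir.is_dir()
        else [],
    }


def missing_items(folder):
    """Return the kinds of files that are still missing from the folder."""
    found = inventory_analysis_folder(folder)
    return [kind for kind, names in found.items() if not names]
```

An empty list from `missing_items` means the trace, at least one profile, and at least one symbol file are in place.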

The basic workflow for analysis in WPA is usually as follows: start the program, open the .etl file from the “File > Open” menu, and let it load the initial data. Then load the symbols from the Trace menu (“Trace > Load Symbols” or similar, depending on the version), pointing to the folder where the .pdb files are located.

Once WPA has symbols, you can apply a specific analysis profile from the Profiles menu by selecting the corresponding .wpaProfile file. This step automatically generates a series of graphs and tables that are displayed in the analysis tab, focusing on the most relevant aspects for that type of monitoring (e.g., CPU, scheduler, disk I/O, etc.).

From there, the work involves exploring these views, expanding nodes, filtering by processes or threads, and correlating what is seen in the graphs with the observed behavior on the device. The tool itself includes a very useful introductory tab, and there is abundant documentation and training material available for further exploration, including introductory videos and step-by-step guides.

Top bottleneck calculators for Windows

Among the most popular options are a few tools that, with different approaches, attempt to quantify the mismatch between the main components of the system. They are not perfect, but they are indicative if you know how to read their results.

One of the best known is the PC Builds Bottleneck Calculator. Its interface is simple: you choose a processor, a graphics card, the target resolution, and the type of use (for example, gaming), and the tool calculates whether the combination will create a significant bottleneck or not.

The strong point of this calculator is that it lets you mix hardware from different manufacturers very easily and gives a quick, clear answer as to whether the CPU will bottleneck the GPU or vice versa. Its main drawback is that it doesn't take into account the size and speed of the RAM, factors that also significantly influence the system's actual performance.

For a slightly more detailed analysis, you can turn to the CPU Agent bottleneck calculator. Unlike the previous one, this utility does consider the amount of memory and its speed, as well as the CPU, GPU, resolution, and graphics quality that you plan to use.

Another advantage of this tool is that it offers expanded information about each component: for example, whether the processor ships with a cooler, how well it performs at different resolutions, what usage percentages are expected, and so on. All of this helps to better understand the whole picture and make an informed decision.

In both cases, these are free services that are very easy to use: you select the desired components, run the calculation, and get results in a few seconds. However, it's advisable to take some time to interpret the results and not just rely on an overall percentage.

How to use calculators and forums to choose the right hardware

Bottleneck calculators, on their own, shouldn't be the sole basis for a decision. However, they are a very useful first filter. Ideally, they should be used to narrow the choice down to several reasonable combinations of CPU, GPU, and RAM, and from there investigate further.

A good complement is to visit the forums of the hardware manufacturers themselves or of specialized communities and look for other users who run the same combination of components you're considering. You'll often find threads with real-world experiences, benchmarks, and any issues they've encountered.

It is also worthwhile to ask direct questions: check whether anyone has experienced bottlenecks with a specific processor and graphics card when running the same software that you intend to use, or ask for suggestions of alternative combinations that might work better within the same budget.

In addition, it is very useful to rely on custom PC configuration pages. These tools make it possible to balance budget, performance, and energy consumption. By combining information from calculators, forums, and these configurators, you can arrive at a solid purchase tailored to your actual needs.

This approach allows you not only to choose the right components, but to do so with considerable confidence, knowing that there are users with positive experiences and that the likelihood of encountering serious bottlenecks will be much lower.

How to check for bottlenecks on a pre-built PC using Windows tools

Once the equipment is already assembled and in use, there is no need to immediately resort to external utilities to detect imbalances. Windows includes Task Manager, a very valuable tool for monitoring CPU, RAM, disk, GPU, and network load in real time.

The basic procedure consists of:

  1. Open Task Manager.
  2. Close all applications except the one you want to analyze.
  3. Observe how the different resources evolve while the typical workload is running.

If, while using a specific program, you notice that the CPU, memory, disk, or network is consistently at 100% usage, that component is likely the system's limit. It's a pretty clear sign of a bottleneck, and it helps determine which part to upgrade first.

A classic example is a very demanding game whose processor requirements exceed what the installed CPU offers. In such cases, the CPU load can sit constantly at 100% while the RAM is far from fully used.
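The "consistently at 100%" rule can be written down as a small Python sketch. It takes usage percentages that you have noted or logged while the workload was running and reports which resources stayed saturated for most of the observation window. The 95% threshold and the requirement that 90% of the samples exceed it are arbitrary assumptions for illustration, not values taken from Task Manager.

```python
def saturated_resources(samples, threshold=95.0, min_fraction=0.9):
    """Return the resources whose usage stayed at or above the threshold.

    `samples` maps a resource name to a list of usage percentages collected
    at regular intervals while the workload was running.
    """
    saturated = []
    for name, values in samples.items():
        if not values:
            continue  # no data for this resource, so nothing to conclude
        high = sum(1 for value in values if value >= threshold)
        if high / len(values) >= min_fraction:
            saturated.append(name)
    return saturated


# Example resembling the demanding game described above.
observed = {
    "cpu": [100, 99, 98, 100, 97],
    "memory": [40, 42, 45, 41, 43],
    "disk": [100, 20, 10, 5, 0],
}
print(saturated_resources(observed))
```

Requiring sustained saturation rather than a single peak avoids flagging resources, like the disk in this example, that only spike briefly during loading.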

By repeating these tests with different applications and workloads, you can gradually build a fairly realistic picture of which component is most hindering the overall performance of the system and, therefore, which upgrade would be the most cost-effective.

This entire monitoring ecosystem, from WPR and WPA to PerfView, Application Insights, bottleneck calculators, network monitoring, and simple Task Manager views, lets you build a fairly accurate picture of where performance is being lost. The goal is to locate and correct bottlenecks much more quickly and effectively, without blindly betting on expensive upgrades that then fail to solve the real problem.
