Introduction: Towards a Systematic Approach to Performance Analysis
In High-Performance Computing (HPC), understanding and optimizing the efficiency of parallel applications is a fundamental challenge. The POP Centre of Excellence addresses this by providing a systematic methodology and a suite of tools to help developers. A cornerstone of this approach is a set of well-defined POP metrics, which quantify various aspects of an application's parallel efficiency.
To make the process of calculating and interpreting these metrics more efficient, the POP project has continuously refined its methodology and the accompanying tools. We are now highlighting two key tools offered by the CubeGUI framework that streamline this process: POPAdvisor, which is a reworked version of the previously discontinued Advisor plugin, and the new command-line tool, cube_pop_metrics. This article explores a significant design decision in their development: the shared, robust calculation engine that allows for automation, easy maintenance, and future extension of the POP metric calculation capabilities.
A Unified Calculation Engine for Consistency and Efficiency
The core strength of the new tools lies in the unification of the POP metric calculation logic. Rather than being in separate codebases, the calculation is now incorporated directly into the cubelib library. This strategic decision ensures that the functionality is available to all tools that use cubelib, including the cube_pop_metrics CLI tool and the plugins for CubeGUI.
This unified approach provides significant benefits:
- Consistency: All tools now derive the same metrics from the same logic, guaranteeing consistent and accurate results regardless of whether you're using the graphical interface or the command line.
- Maintainability: Any future improvements or bug fixes to the POP metrics only need to be implemented once within cubelib, drastically simplifying maintenance and development efforts.
- Extensibility: Adding new POP metrics or refining existing ones becomes a much simpler process, as the core calculation logic can be extended without needing to rewrite code for each tool.
This design also provides a highly flexible client-server workflow. The POPAdvisor plugin for CubeGUI can calculate POP metrics in two scenarios:
- Local Calculation: If the .cubex file is located on the same machine as the CubeGUI application, the POPAdvisor can perform the calculation completely locally.
- Remote Calculation: For large performance data sets residing on an HPC system, the POPAdvisor can connect to cube_server running on that system. In this truly remote scenario, the performance data is loaded and the metric calculation is performed on the HPC machine, avoiding the need to transfer massive files to your local system for analysis.
POPAdvisor: Calculating POP Metrics in CubeGUI
The POPAdvisor plugin transforms CubeGUI into a powerful calculation tool. Once you've loaded a performance report generated by Score-P or Scalasca, activating the POPAdvisor allows you to instantly generate the key POP metrics. Its integration directly within CubeGUI means you can immediately relate these high-level efficiency numbers back to specific functions, call paths, and system resources.
Example Usage in CubeGUI
- Load your Cube file: You can either open a .cubex file that's stored on your local machine (e.g., on your hard drive or a mounted remote filesystem) or you can load the performance profile remotely by using the cube_server tool, which typically runs on the login node of the HPC system where the measurement was performed.
- Activate POPAdvisor: The POPAdvisor is activated by simply selecting the "General" tab in the right-hand pane of the CubeGUI. For convenience, you can also detach the plugin and place it in its own window, allowing you to explore other aspects of the analysis (like the system tree or other plugins) while keeping the POP metrics visible.
- Select a meaningful focus of analysis (FOA): Choose a call path from the Call Tree to be the basis for the calculation. For applications with asynchronous kernels, a meaningful FOA is often a call path that includes a barrier to ensure the completion of all launched kernels. Note that this barrier is often a separate call path, so you need to select both the main call path and the barrier by holding the Ctrl key. The kernels launched within the sub-tree of your selected FOA will be automatically included in the calculation. They are listed in a tooltip that appears when you hover over the FOA's name within the POPAdvisor widget.
- Calculate and view metrics: Click the "calculate" button. The plugin will execute the calculation and display the results as a table inside the POPAdvisor widget. This table includes explanatory help and provides a quick overview of your application's performance.
cube_pop_metrics: Scripting the POP Performance Efficiency Calculation
While the POPAdvisor excels at interactive, visual analysis within CubeGUI, there are many situations where a more automated approach is needed. This is where cube_pop_metrics comes in. This command-line interface (CLI) tool allows you to automate the calculation of POP metrics, making it an essential component for scripting and batch processing workflows.
The tool's purpose is straightforward: it takes a Cube file (.cubex) and FOA as input and outputs the calculated POP metrics. Because it shares the same core calculation logic as the POPAdvisor plugin, you can be confident in the consistency and accuracy of the results.
This approach is highly effective for:
- Continuous Integration (CI): Integrate the tool into your build process to automatically check for performance regressions with every code change.
- Batch Processing: Run analyses on a large number of .cubex files without any manual interaction.
- Remote Workflows: Execute the tool directly on a login node of an HPC system, avoiding the need to transfer large performance files to a local machine.
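The batch-processing case can be sketched in a few lines of Python. This is a minimal sketch, not part of the Cube distribution: it assumes cube_pop_metrics is on the PATH, uses only the `-c <ids> <file>` invocation form shown in this article, and the helper names `build_command` and `run_batch` as well as the `.pop.txt` report suffix are hypothetical. Note that call path ids may differ between profiles, so reusing one id list is only reasonable for repeated measurements of the same instrumented binary.

```python
import subprocess
from pathlib import Path


def build_command(cubex_file, callpath_ids):
    """Build the argv list for one cube_pop_metrics run.

    Only the `-c <id,id,...> <file>` form shown in this article is used.
    """
    ids = ",".join(str(i) for i in callpath_ids)
    return ["cube_pop_metrics", "-c", ids, str(cubex_file)]


def run_batch(profile_dir, callpath_ids, report_dir):
    """Run the tool on every .cubex file in profile_dir and save each report."""
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cubex in sorted(Path(profile_dir).glob("*.cubex")):
        # check=True raises CalledProcessError if the tool exits with an error.
        result = subprocess.run(
            build_command(cubex, callpath_ids),
            capture_output=True,
            text=True,
            check=True,
        )
        # Hypothetical naming scheme: profile.cubex -> profile.pop.txt
        report = report_dir / (cubex.stem + ".pop.txt")
        report.write_text(result.stdout)
        written.append(report)
    return written
```

A call such as `run_batch("profiles", [27, 34], "pop_reports")` would then produce one text report per profile; the directory names here are placeholders.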
Example Usage
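To calculate the POP metrics for a file named profile.cubex, pass the FOA with -c followed by the file name. In the run below, the call paths with ids 27 and 34 form the FOA, and, as the "Calculate for" line of the output shows, the kernel launched beneath them (id 51) is included automatically: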
    cube_pop_metrics -c 27,34 profile.cubex
    Reading profile.cubex ... done.
    Calculating .................
    -------------- Result --------------
    profile.cubex -> Profile 0
    Only-MPI Assessment
    ------------------------------------
    Calculate for "void busy_kernel(int, float, float*, float*, clock_t)[id=27],DEVICE SYNCHRONIZE[id=34],busy_kernel[id=51]"
    ------------------------------------
    POP Metric                         Profile 0
    ------------------------------------
    Parallel Efficiency                0.000019
     * Load Balance Efficiency         0.989178
     * Communication Efficiency        0.000019
     * * Serialisation Efficiency      nan
     * * Transfer Efficiency           nan
    ------------------------------------
    GPU Metric
    ------------------------------------
    GPU Parallel Efficiency            0.998850
     * GPU Load Balance Efficiency     0.999262
     * GPU Communication Efficienc     0.999587
    ------------------------------------
    IO Metric
    ------------------------------------
    I/O Efficiency                     1.000000
     * Posix I/O time                  0.000000
     * MPI I/O time                    0.000000
    ------------------------------------
    Additional Metrics
    ------------------------------------
    Resource stall cycles              nan
    IPC                                nan
    Instructions (only computation     nan
    Computation time                   3.973944
    GPU Computation time               3.973868
    ------------------------------------
    FOA Quality control Metrics
    ------------------------------------
    Wall-clock time; min               1.986306, 0.073362%
                     avg               1.987764
                     max               1.989222, 0.073362%
This simple command provides a fast and reliable way to get a performance summary. It's a key tool for anyone looking to automate their performance analysis and integrate it into a larger HPC workflow.
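For a CI regression check, the text report has to be turned into numbers first. The following is a minimal sketch under stated assumptions: the report layout (optional "*" nesting markers, a metric name, one trailing value) is inferred from the example in this article and is not a documented format, and the function names `parse_metrics` and `regressions` as well as the thresholds are hypothetical.

```python
import math
import re

# Excerpt of the example report; the layout is assumed from that example.
SAMPLE = """\
Parallel Efficiency 0.000019
* Load Balance Efficiency 0.989178
* Communication Efficiency 0.000019
* * Serialisation Efficiency nan
* * Transfer Efficiency nan
GPU Parallel Efficiency 0.998850
* GPU Load Balance Efficiency 0.999262
I/O Efficiency 1.000000
"""

# Optional "* " nesting markers, a metric name, then a decimal number or nan.
LINE = re.compile(
    r"^(?:\*\s+)*(?P<name>.+?)\s+(?P<value>nan|[-+]?\d+\.\d+)\s*$"
)


def parse_metrics(report):
    """Map metric names to float values; lines that do not fit are ignored."""
    metrics = {}
    for line in report.splitlines():
        match = LINE.match(line.strip())
        if match:
            metrics[match.group("name")] = float(match.group("value"))
    return metrics


def regressions(metrics, thresholds):
    """Return metrics below their threshold; missing or nan values are skipped."""
    failed = {}
    for name, minimum in thresholds.items():
        value = metrics.get(name, math.nan)
        if not math.isnan(value) and value < minimum:
            failed[name] = value
    return failed
```

A CI job could fail the build whenever `regressions(...)` returns a non-empty dictionary. Skipping nan values is a design choice of this sketch: metrics such as Transfer Efficiency are reported as nan when the required data was not measured, and treating that as a failure would make the check noisy.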
Additional Metrics for a Quick Overview
In addition to the standard POP metrics, both the POPAdvisor and cube_pop_metrics provide further information for a fast overview of an application's performance. These additional metrics help you pinpoint potential bottlenecks quickly.
- GPU Metrics: These metrics assess the efficiency of GPU usage. For example, they calculate GPU load imbalance and "GPU communication efficiency." Here, "communication" refers to the overhead associated with launching computations on the GPU, not network communication between processes.
- I/O Metrics: When I/O is measured, the tools provide information on I/O efficiency and the time spent on different types of I/O, such as Posix I/O and MPI I/O.
- Additional Metrics: This section delivers a set of metrics that provide a deeper look into resource usage. These include data on stalled resources, CPU cycles, number of instructions, and the time spent in user code on both the CPU and GPU. These metrics are available if the measurement was done with the PAPI library enabled.
- FOA Quality Control Metrics: These metrics offer statistics on wall-clock time across the machine. This helps you judge how consistent the execution was across different nodes where MPI ranks were running, providing a sanity check on the measurement itself.
These metrics enable a quick assessment of an application's overall performance, allowing you to quickly identify areas that require further investigation.
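In the POP methodology, Parallel Efficiency is the product of Load Balance Efficiency and Communication Efficiency. The short sketch below checks that relation against the values printed in the example report. That the GPU metrics follow the same multiplicative structure is an assumption here; it is consistent with the printed numbers but not stated in this article. The function name and tolerance are hypothetical, and the tolerance only accounts for the six printed decimals.

```python
import math

# Values copied from the example report in this article.
pop = {"parallel": 0.000019, "load_balance": 0.989178, "communication": 0.000019}
gpu = {"parallel": 0.998850, "load_balance": 0.999262, "communication": 0.999587}


def product_matches(metrics, tolerance=2e-6):
    """Check parallel == load_balance * communication within print rounding."""
    expected = metrics["load_balance"] * metrics["communication"]
    return math.isclose(
        metrics["parallel"], expected, rel_tol=0.0, abs_tol=tolerance
    )


for label, values in (("POP", pop), ("GPU", gpu)):
    print(label, "hierarchy consistent:", product_matches(values))
```

Reading the example this way also explains the result: the low Parallel Efficiency comes almost entirely from the Communication Efficiency factor, while the load balance is close to ideal.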
Conclusion: A Unified Approach to HPC Performance
The synergistic relationship between the cube_pop_metrics CLI tool and the POPAdvisor plugin for CubeGUI represents a modern, robust approach to HPC performance analysis. By centralizing the complex POP metric calculation logic within a single library, the POP project has delivered a solution that is consistent, maintainable, and highly flexible.
This unified approach streamlines the entire workflow. Whether you prefer the interactive analysis of the graphical user interface or the automated power of the command line, you are now equipped with an even more effective means to evaluate and optimize your parallel applications.
POPAdvisor and cube_pop_metrics have been available since release 4.9 of CubeLib/CubeGUI and can be downloaded from scalasca.org. For easy installation, we also provide recipes for EasyBuild and Homebrew, as well as pre-built binaries for Linux, Windows, and macOS.
-- Pavel Saviakou (JSC)