MPS Best Practices & Precautions
Updated at: 2025-10-27
When GPU Manager operates in "Performance-Optimized" mode, it uses NVIDIA MPS (Multi-Process Service) for compute resource isolation. MPS, however, offers only limited fault tolerance. Please review the following details carefully to gain a basic understanding of MPS.
MPS best practices
- Scenarios suitable for MPS:
- MPS is best suited for cooperative processes running a single application (for example, multiple ranks of the same MPI job), because MPS provides only limited memory protection and error containment between clients, which is acceptable when the clients belong to and trust the same application.
- MPS is beneficial when individual application processes do not generate enough workload to fully utilize the GPU. It enables running multiple processes per node to achieve higher concurrency.
- If GPU utilization is low because each kernel launches too few threads per grid, MPS can help improve performance. Recommended approach: use fewer blocks per grid and more threads per block in each kernel launch to improve per-block occupancy; MPS then allows CUDA kernels from other processes to use the remaining GPU resources (see the launch-configuration sketch after this list).
- Graceful MPS client shutdown: for container processes in Performance-Optimized mode, send a SIGUSR1 signal to the MPS process, wait 2 seconds, and then exit. (After receiving the signal, the MPS process pauses the MPS client and stops dispatching kernels.) See the shutdown sketch after this list.
- Avoid using multi-GPU configurations with MPS: Utilizing multiple GPUs with MPS can increase the risk of fault propagation, leading to task failures across all GPUs.
- High availability at the platform and service layer: The upper-layer platform should prioritize implementing service-level high availability and supporting fault tolerance and migration mechanisms.
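As a rough illustration of the launch-configuration advice above, the sketch below compares two launch configurations for the same kernel. The kernel name, grid and block sizes, and workload are illustrative assumptions, not taken from this document.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only to illustrate launch configuration.
__global__ void scaleKernel(float *data, int n, float factor) {
    // Grid-stride loop, so the kernel is correct for any grid/block configuration.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        data[i] *= factor;
    }
}

void launchScaled(float *d_data, int n) {
    // Instead of many small blocks, e.g. scaleKernel<<<1024, 64>>>(...),
    // use fewer blocks per grid with more threads per block, leaving more
    // of the GPU available to CUDA kernels from other MPS client processes.
    scaleKernel<<<256, 256>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();
}
```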
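The graceful shutdown sequence above can be sketched as follows. How the MPS process ID is obtained (the `mps_pid` argument) and the helper name are assumptions specific to this illustration, not part of this document.

```cuda
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>

// Sketch of the shutdown sequence described above, assuming the caller
// already knows the MPS process ID for this deployment.
void gracefulMpsClientExit(pid_t mps_pid, int exit_code) {
    kill(mps_pid, SIGUSR1);  // ask the MPS process to stop dispatching kernels for this client
    sleep(2);                // wait 2 seconds for it to quiesce
    exit(exit_code);         // then exit the container process
}
```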
Note
- MPS fault tolerance:
- When a fatal exception is generated by an MPS client process, all clients sharing the same GPU with this client will receive this exception.
- When a fatal error occurs, MPS does not report which client caused it. Affected client processes must detect the fatal error and exit on their own (see the error-handling sketch after this list).
- When MPS is used with a single GPU, client processes running on other GPUs are not affected.
- When a fatal error is identified, the MPS server waits for all clients associated with the affected GPU to exit and blocks new client connections to that GPU. Once those clients have exited, the MPS server rebuilds the GPU context on the affected GPU and resumes processing client requests.
- An exception in the MPS server will cause all client processes to malfunction.
- MPS does not support multiple users: the MPS server is per-user. When different Linux users use MPS, one user's processes may end up waiting for another user's MPS server to exit, resulting in process hangs.
- It is recommended to use CUDA 11.7 or later. Official documentation: https://docs.nvidia.com/deploy/mps/index.html
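As noted above, client processes must notice a fatal MPS error themselves and exit. Below is a minimal sketch of that pattern, assuming the application checks the CUDA runtime error state after launching work; the helper name and logging are illustrative assumptions, not from this document.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Illustrative helper: check the CUDA error state after launching work.
// Once the MPS server reports a fatal fault on this GPU, subsequent runtime
// calls fail; the client logs the error and exits on its own so the server
// can wait for all clients to leave and rebuild the GPU context.
static void exitOnFatalCudaError(const char *where) {
    cudaError_t err = cudaGetLastError();     // launch-time errors
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();        // execution / sticky errors
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "Fatal CUDA error at %s: %s; exiting\n",
                where, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}
```

After the process exits, recovery of the task itself is left to the upper-layer platform, in line with the service-level high-availability recommendation above.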
