Validate runtime before running production workloads on a Slurm cluster on HyperPod
To validate the runtime before running any production workloads on a Slurm cluster on HyperPod, use the runtime validation script, hyperpod-precheck.py.
To run the script on multiple nodes at once, use srun, as shown in the following example command, which runs the script on a Slurm cluster of 8 nodes.
# The following command runs on 8 nodes
srun -N 8 python3 hyperpod-precheck.py
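If you run the precheck often, you might wrap the srun call in a small shell function. The following is a minimal sketch, assuming a POSIX-compatible shell and that hyperpod-precheck.py is reachable at the same relative path on every node (for example, on a shared file system). The function name run_precheck and the DRY_RUN variable are hypothetical and not part of HyperPod. The options --nodes, --ntasks-per-node, and --label are standard srun options. With --ntasks-per-node=1 the script runs once per node, and --label prefixes each output line with its task number so that you can tell which task reported a failed check.

```shell
# Hypothetical helper (not part of HyperPod): launch hyperpod-precheck.py
# on a given number of Slurm nodes, one task per node.
# Assumes hyperpod-precheck.py is at the same relative path on every node.
run_precheck() {
  # Build the command in the function's positional parameters.
  # "${1:-8}" is expanded before `set --` runs, so the node count
  # defaults to 8 when no argument is given.
  set -- srun --nodes="${1:-8}" --ntasks-per-node=1 --label \
    python3 hyperpod-precheck.py

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command; do not submit anything to Slurm.
    echo "$*"
    return 0
  fi

  # Run the precheck; the function returns srun's exit status.
  "$@"
}

# Usage:
#   run_precheck 8            # run the precheck on 8 nodes
#   DRY_RUN=1 run_precheck 4  # print the srun command for 4 nodes
```

Printing the command with DRY_RUN=1 first lets you confirm the node count before the precheck requests an allocation.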
Note
To learn more about the validation script, such as which runtime validation functions it provides and guidelines for resolving issues that fail validation, see Runtime validation before running workloads.