1. FAQ SLURM

1.1. NAUTILUS

sbatch: error: Problem with submit to cluster waves: Invalid feature specification
Submitted batch job 5658823 on cluster nautilus

The Invalid feature specification error can be caused by the SLURM_CLUSTERS environment variable, which may conflict with the settings of your job submission. In the output above, the submission to the waves cluster was rejected even though the job was accepted on nautilus. Unsetting the variable lets SLURM use the correct configuration and avoids problems with invalid or misconfigured cluster specifications.

To fix this, run:

unset SLURM_CLUSTERS

Then submit your job again. SLURM will no longer pick up the incorrect cluster settings, which should resolve the error.
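Note that unset only affects the current shell, so the error can come back at the next login if a startup file sets the variable again. The sketch below prints the current value, unsets it, and searches a few common startup files for the variable. The file names are assumptions and may differ on your account.

```shell
# Show the current value ("<unset>" is printed if the variable is not defined).
echo "SLURM_CLUSTERS=${SLURM_CLUSTERS:-<unset>}"

# Remove the variable from the current shell session.
unset SLURM_CLUSTERS

# Look for a line that would set it again at login.
# The list of startup files below is an assumption; adapt it to your shell.
grep -n "SLURM_CLUSTERS" ~/.bashrc ~/.bash_profile ~/.profile 2>/dev/null \
    || echo "SLURM_CLUSTERS is not set in the startup files checked"
```

If grep reports a match, remove or comment out that line so the variable stays unset in new sessions.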

If nvidia-smi does not respond on a GPU node you requested, the cause is usually the GPU allocation or the node configuration. Here are a few potential causes and solutions:

  • GPU Not Allocated Properly

Make sure your SLURM job script requests GPU resources with the #SBATCH --gres=gpu:<number> directive. For example:

#SBATCH --gres=gpu:1

  • CUDA Version Mismatch

The CUDA toolkit you are using may not be compatible with the GPU and driver on the node. You can check the installed CUDA toolkit version with:

nvcc --version

  • SLURM Job Not Running on GPU Node

The job may have been placed on a node without an available GPU. Check the SLURM job status to confirm which node was allocated:

squeue -j <job_id>
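The checks above can be combined into one short diagnostic job. This is a minimal sketch, not a site-recommended script: the job name, the time limit, and the gpu_report function are hypothetical, and your cluster may also require a partition or account option. It relies on SLURM usually setting CUDA_VISIBLE_DEVICES for jobs that request GPUs through --gres.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-check     # hypothetical job name
#SBATCH --gres=gpu:1             # request one GPU, as in the example above
#SBATCH --time=00:05:00          # assumed short time limit for a quick test

# Print where the job landed and which GPUs SLURM exposed to it.
gpu_report() {
    echo "node: $(hostname)"
    echo "job id: ${SLURM_JOB_ID:-not running under SLURM}"
    echo "CUDA_VISIBLE_DEVICES: ${CUDA_VISIBLE_DEVICES:-unset}"
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Lists the GPUs visible to the job and the driver version.
        nvidia-smi || echo "nvidia-smi failed on this node"
    else
        echo "nvidia-smi not found on this node"
    fi
}

gpu_report
```

Save it under a name such as gpu-check.sh (hypothetical), submit it with sbatch gpu-check.sh, and read the result in slurm-<job_id>.out, the default output file. If CUDA_VISIBLE_DEVICES is reported as unset, the GPU request was probably not applied to the job. For more detail on the allocation than squeue gives, scontrol show job <job_id> also lists the allocated nodes and requested resources.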