Issue
I am getting an error whenever I use mpirun inside a batch script with an active Conda environment. The error does not occur if I run without a batch script, or if I am not in a Conda environment.
I have a simple test script called test.py:
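from mpi4py import MPI

comm = MPI.COMM_WORLD
n_proc = comm.Get_size()   # total number of MPI processes
proc_id = comm.Get_rank()  # rank of this process (0 .. n_proc-1)

if proc_id == 0:
    # only rank 0 reports the total process count
    print('Number of processors = ' + str(n_proc))
print('Hello from proc id = ' + str(proc_id))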
If I just run mpirun -np 5 python test.py on the login node, I get the expected result:
Number of processors = 5
Hello from proc id = 0
Hello from proc id = 1
Hello from proc id = 2
Hello from proc id = 3
Hello from proc id = 4
But if I use the following batch script:
#!/bin/bash
# Submit this script with: sbatch <this-filename>
#SBATCH --time=0:30:00 # walltime
#SBATCH -n 5
#SBATCH --mem-per-cpu=10G # memory per CPU core
#SBATCH --qos=normal # qos
#SBATCH -J "mpi" # job name
## /SBATCH -p general # partition (queue)
## /SBATCH -o slurm.%N.%j.out # STDOUT
## /SBATCH -e slurm.%N.%j.err # STDERR
# LOAD MODULES, INSERT CODE, AND RUN YOUR PROGRAMS HERE
mpirun python test.py
and submit it with sbatch batch_script, I get the following error:
Error: node list format not recognized. Try using '-hosts=<hostnames>'.
/var/spool/slurmd/job12649152/slurm_script: line 21: 224459 Aborted (core dumped) mpirun python test.py
I tried adding the line #SBATCH -hosts=n1, but I still got exactly the same error (except that the output file was now named sts=n1). I also tried building another Conda environment with an older version of mpich (mpich/3.2.1), but that didn't work either.
Solution
If any of the commands depend on Conda being initialized and/or an environment being activated, then the current shebang needs to be adjusted. Try instead:
#!/bin/bash -l
This makes Bash run the script as a login shell, which sources your shell initialization files. A login shell reads ~/.bash_profile (or ~/.bash_login or ~/.profile), which on most systems in turn sources ~/.bashrc, where the Conda initialization code is placed by default.
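To check whether the job really sees the same environment as the login node, a small diagnostic can help. The following is a minimal sketch, not part of the original answer; the function name check_mpi_environment and the filename check_env.py are hypothetical. It reports CONDA_PREFIX, the mpirun found on PATH, whether that mpirun lives inside the Conda prefix, and the Slurm job ID (if any).

```python
import os
import shutil


def check_mpi_environment(env=None):
    """Hypothetical diagnostic: describe which mpirun this shell would use.

    `env` defaults to the current process environment; pass a dict to
    inspect some other environment instead.
    """
    env = dict(os.environ) if env is None else env
    conda_prefix = env.get("CONDA_PREFIX")  # set by `conda activate`
    # Look up mpirun using the PATH from `env`, not necessarily os.environ
    mpirun = shutil.which("mpirun", path=env.get("PATH", ""))
    from_conda = bool(
        conda_prefix
        and mpirun
        and os.path.commonpath([os.path.abspath(mpirun), os.path.abspath(conda_prefix)])
        == os.path.abspath(conda_prefix)
    )
    return {
        "conda_prefix": conda_prefix,
        "mpirun": mpirun,
        "mpirun_from_conda": from_conda,
        "slurm_job": env.get("SLURM_JOB_ID"),  # set by Slurm inside a job
    }


if __name__ == "__main__":
    for key, value in check_mpi_environment().items():
        print(f"{key}: {value}")
```

It uses only the standard library, so it runs even when the Conda environment (and mpi4py) is not available. For example, you could put python check_env.py just above the mpirun line in the batch script, run the same command on the login node, and compare the two outputs. A different conda_prefix or mpirun path inside the job suggests the job's shell is not set up the same way as your interactive one.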
Answered By - merv