Controlling slurm jobs

Settings for a slurm job

Command                        Effect
#SBATCH --account=staff        user is part of this group
#SBATCH --job-name=lvl_job     name of MY job
#SBATCH --gpus-per-node=1      1 GPU on the worker node (default)
#SBATCH --output=user.log      output log file, if needed
#SBATCH --mem-per-cpu=3G       3 GB of memory per process
#SBATCH --ntasks=4             you want 4 processes
#SBATCH --cpus-per-task=2      2 cores per process
#SBATCH --nodes=1-3            use at LEAST one node, up to 3
#SBATCH -w <nodelist>          run only on the given node(s) (-w = --nodelist)

The default memory a batch job gets is 4 GB if the user doesn't specify a memory setting in the batch file [DefMemPerNode=4096].
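
Putting the settings above together, a minimal batch file could look like the sketch below (./my_program, user.log, and the job name are placeholders, not part of the cluster setup):

  #!/bin/bash
  #SBATCH --account=staff        # group the job is accounted to
  #SBATCH --job-name=lvl_job     # name shown in squeue
  #SBATCH --output=user.log      # stdout/stderr go here
  #SBATCH --ntasks=4             # 4 processes
  #SBATCH --cpus-per-task=2      # 2 cores per process
  #SBATCH --mem-per-cpu=3G       # 3 GB per process instead of the default
  #SBATCH --nodes=1-3            # at least 1 node, at most 3

  srun ./my_program              # srun starts all 4 tasks

Submit it with: sbatch job.sh (job.sh being whatever you named the file).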

Partitions/queue config

The slurm cluster has 3 partitions (queues):

doTrain is the default partition; it is only for staff and allows jobs to run for an unlimited time.
All partitions have DefaultTime=04:00:00.
On allWork the MaxTime is 7 days and 1 hour.
On beQuick the MaxTime is 1 day and 12 hours.
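
To run somewhere other than the default partition, name it at submit time or in the batch file, and keep --time within the partition's MaxTime (job.sh is a placeholder name):

  sbatch --partition=beQuick --time=12:00:00 job.sh

or in the batch file itself:

  #SBATCH --partition=allWork
  #SBATCH --time=5-00:00:00    # 5 days, within allWork's MaxTime of 7 days and 1 hour

sinfo -o "%P %l" lists every partition together with its time limit.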

Node config

NodeName=A … TmpDisk=1536

TmpDisk: total size of temporary disk storage (TmpFS) on the node, in megabytes.
MinTmpDiskNode: the minimum amount of temporary disk space a job requires per node (requested with sbatch --tmp).
The default value for TmpDisk is 0, so the amount of local scratch (TmpDisk) space must be defined in the node config.
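
As an illustration (the node name worker01 and the CPU/memory figures are invented), a node with 1.5 GB of local scratch is declared in slurm.conf like this, and a job can then ask for a minimum amount of TmpDisk with --tmp:

  NodeName=worker01 CPUs=8 RealMemory=32000 TmpDisk=1536 State=UNKNOWN

  #SBATCH --tmp=1G    # schedule only on nodes with at least 1 GB of TmpDisk

A job requesting more TmpDisk than a node advertises will not be scheduled on that node.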