Smallvoice - the Language and Voice lab computing cluster
Smallvoice uses the Slurm workload manager to create a computing cluster.
The cluster has 6 nodes:
| Node name | Role(s)                 |
|-----------|-------------------------|
| atlas     | management node, worker |
| freedom   | login node, worker      |
| hercules  | worker node             |
| samson    | worker node             |
| goliath   | worker node             |
| obelix    | worker node             |
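The current state of the nodes and partitions can always be checked from the login node with the standard Slurm query commands. A short sketch (no site-specific options assumed):

```bash
# List the partitions and the state of every node in them.
sinfo

# Show all jobs currently queued or running on the cluster.
squeue

# Show only your own jobs.
squeue -u $USER
```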
When logged on to the cluster, the user is always on the login node, freedom, and does all their work there.
/home (and /work) are hosted on an NFS server, so every node sees the same "physical" disks.
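Because of the shared NFS storage, a file created on the login node is immediately visible on every compute node. A quick illustration (a sketch, assuming a default Slurm setup where srun is available on freedom; the file name is just a placeholder):

```bash
# On the login node (freedom): create a file in your NFS-hosted home directory.
touch ~/nfs_test.txt

# Run a command on a compute node via Slurm; the same file is visible there,
# because /home is mounted from the same NFS server on every node.
srun ls -l ~/nfs_test.txt

# Confirm that the command actually ran on a different machine.
srun hostname
```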
The computing (Slurm) environment
There are 3 partitions / queues available:
| Name    | Cores       | Memory (GB) | Nodes | GPU             | Time limit | Usage        |
|---------|-------------|-------------|-------|-----------------|------------|--------------|
| allWork | 16+18+12+12 | 64+40+48+48 | 4     | Nvidia A100 GPU | 7 days     | staff only   |
| doTrain | 16+18+12+18 | 64+40+48+40 | 4     | Nvidia A100 GPU | no limit   | staff only   |
| beQuick | 18+12       | 40+48       | 2     | Nvidia A100 GPU | 36 hours   | for students |
The default queue for staff is doTrain (and beQuick for students), so it is not necessary to choose a queue, but it is possible to specify a different one.
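A partition is selected with the `--partition` option in a batch script. Below is a minimal job-script sketch; the job name, GPU count, time limit and the `train.py` program are placeholders to adapt to your own work, and the exact GPU request syntax may depend on how the GPUs are configured in Slurm:

```bash
#!/bin/bash
#SBATCH --job-name=my_training      # arbitrary job name (placeholder)
#SBATCH --partition=beQuick         # choose a queue; omit this line to use your default
#SBATCH --gres=gpu:1                # request one GPU on the node
#SBATCH --time=24:00:00             # wall-clock limit; must fit within the queue's time limit
#SBATCH --output=%x-%j.out          # log file named <job name>-<job id>.out

# Everything below runs on the allocated compute node.
python3 train.py
```

Submit the script with `sbatch job.sh` and follow its progress with `squeue -u $USER`.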
Installed software and drivers
* NVIDIA A100 GPU drivers
* CUDA Toolkit (version 11.7)
* Intel oneAPI Math Kernel Library
* Python 3.9.2
* pip 20.3.4
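To verify that a node matches the list above, the usual version checks can be run (a sketch; run them through `srun` to check a worker node rather than the login node):

```bash
# GPU driver version and visible GPUs
nvidia-smi

# CUDA toolkit version
nvcc --version

# Python and pip versions
python3 --version
pip3 --version
```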