Smallvoice - the Language and Voice lab computing cluster
Smallvoice uses the Slurm workload manager to create a computing cluster.
The cluster has 6 nodes:
| Node name | Role(s)                 |
|-----------|-------------------------|
| atlas     | management node, worker |
| freedom   | login node, worker      |
| hercules  | worker node             |
| samson    | worker node             |
| goliath   | worker node             |
| obelix    | worker node             |
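The current state of the nodes and partitions can always be checked from the login node with the standard Slurm query commands. A short sketch (no site-specific options assumed):

```bash
# List the partitions and the state of every node in them.
sinfo

# Show all jobs currently queued or running on the cluster.
squeue

# Show only your own jobs.
squeue -u $USER
```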
When logged on to the cluster, the user is always on the login node, freedom, and does all their work there.
/home (and /work) are hosted on an NFS server, so every node sees the same "physical" disks.
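Because of the shared NFS storage, a file created on the login node is immediately visible on every compute node. A quick illustration (a sketch, assuming a default Slurm setup where srun is available on freedom; the file name is just a placeholder):

```bash
# On the login node (freedom): create a file in your NFS-hosted home directory.
touch ~/nfs_test.txt

# Run a command on a compute node via Slurm; the same file is visible there,
# because /home is mounted from the same NFS server on every node.
srun ls -l ~/nfs_test.txt

# Confirm that the command actually ran on a different machine.
srun hostname
```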
The computing (Slurm) environment
There are 3 partitions / queues available:
| Name    | Cores       | Memory (GB) | Nodes | GPU             | Time limit | Usage        |
|---------|-------------|-------------|-------|-----------------|------------|--------------|
| allWork | 16+18+12+12 | 64+40+48+48 | 4     | Nvidia A100 GPU | 7 days     | staff only   |
| doTrain | 16+18+12+18 | 64+40+48+40 | 4     | Nvidia A100 GPU | no limit   | staff only   |
| beQuick | 18+12       | 40+48       | 2     | Nvidia A100 GPU | 36 hours   | for students |
The default queue for staff is doTrain (and beQuick for students), so it is not necessary to choose a queue, but it is possible to specify a different one.
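A partition is selected with the `--partition` option in a batch script. Below is a minimal job-script sketch; the job name, GPU count, time limit and the `train.py` program are placeholders to adapt to your own work, and the exact GPU request syntax may depend on how the GPUs are configured in Slurm:

```bash
#!/bin/bash
#SBATCH --job-name=my_training      # arbitrary job name (placeholder)
#SBATCH --partition=beQuick         # choose a queue; omit this line to use your default
#SBATCH --gres=gpu:1                # request one GPU on the node
#SBATCH --time=24:00:00             # wall-clock limit; must fit within the queue's time limit
#SBATCH --output=%x-%j.out          # log file named <job name>-<job id>.out

# Everything below runs on the allocated compute node.
python3 train.py
```

Submit the script with `sbatch job.sh` and follow its progress with `squeue -u $USER`.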
Installed software and drivers
* NVIDIA A100 GPU drivers
* CUDA Toolkit (version 11.7)
* Intel oneAPI Math Kernel Library
* Python 3.9.2
* pip 20.3.4
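To verify that a node matches the list above, the usual version checks can be run (a sketch; run them through `srun` to check a worker node rather than the login node):

```bash
# GPU driver version and visible GPUs
nvidia-smi

# CUDA toolkit version
nvcc --version

# Python and pip versions
python3 --version
pip3 --version
```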