Installation
AI4HPC can be cloned via
$ git clone https://gitlab.jsc.fz-juelich.de/CoE-RAISE/FZJ/ai4hpc/ai4hpc
$ cd ai4hpc
To install AI4HPC, the first step is to check the manual via
$ ./setup.py --help
Note that the minimum Python version required to run AI4HPC is 3.10. Check the installed version via
$ python --version
If the installed version is older than 3.10, run the ./Scripts/installPython.sh script.
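The same check can be done from inside Python, which is handy when several interpreters are installed and it is unclear which one `python` resolves to. The following is a minimal sketch, not part of AI4HPC: the helper name `meets_minimum` and the constant `MIN_PYTHON` are hypothetical, and only the 3.10 threshold comes from this guide.

```python
import sys

# Minimum version stated in this guide (assumption: only major.minor matters).
MIN_PYTHON = (3, 10)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {found} is new enough for AI4HPC.")
    else:
        print(f"Python {found} is too old; run ./Scripts/installPython.sh first.")
```

Run it with the same interpreter that will later execute setup.py, so that the result reflects the environment AI4HPC is actually installed into.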
There are 5 different trainable models built into AI4HPC.
If desired, one can also select a specific distributed backend implemented in AI4HPC for the training:
PyTorch-DDP (default)
For example, AI4HPC for CDM training using Horovod as the distributed backend can be compiled with
$ python setup.py --model 2 --fw 4
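When the installation is scripted, for example to set up several model and backend combinations in a row, the command can be assembled programmatically. The sketch below is an illustration only: `build_setup_command` is a hypothetical helper, and the only values known from this guide are `--model 2` with `--fw 4` for CDM training with Horovod. Consult `./setup.py --help` for the other valid numbers.

```python
import shlex


def build_setup_command(model, fw):
    """Return the argument list for `python setup.py --model <model> --fw <fw>`."""
    # Both flags appear in the example above; their numeric values are
    # passed through unchanged, without any validation.
    return ["python", "setup.py", "--model", str(model), "--fw", str(fw)]


if __name__ == "__main__":
    # Values taken from the example above (CDM training with Horovod).
    command = build_setup_command(model=2, fw=4)
    print(shlex.join(command))
```

The returned list can be handed to `subprocess.run(command, check=True)` from the repository root; keeping the arguments as a list avoids shell quoting issues.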
If the HPC system is preconfigured in the ./Scripts/setup.sh file, a Python environment with these libraries is built on the system:
The preconfigured HPC systems are
CTE-AMD*
*CTE-AMD does not allow incoming/outgoing communication; refer to the CTE-AMD installation guide instead.
After the installation, the main folder should consist of
ai4hpc.py: source file for the chosen case
startscript.sh: for submitting that job
src: folder with the rest of the dependencies
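Before submitting a job, it can be worth confirming that the installation produced the entries listed above. The following is a minimal sketch under the assumption that it is run from the main folder; `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names and not part of AI4HPC.

```python
from pathlib import Path

# Entries this guide lists for the main folder after installation.
EXPECTED_ENTRIES = ("ai4hpc.py", "startscript.sh", "src")


def missing_entries(root="."):
    """Return the expected entries that do not exist under `root`, in listed order."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing after installation:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```

The check only tests for existence; it does not verify that the files match the chosen model or backend.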
In the simplest scenario, submit startscript.sh with
$ sbatch startscript.sh
That is it!