Hyperparameter optimization

The Ray Tune library offers scalable hyperparameter optimization (HPO) for neural networks and other ML models, tuning parameters such as the learning rate or batch size. It integrates smoothly with PyTorch-based training scripts and enables two levels of parallelism (sketched in the code example after the list below):

  • Each trial (a training run of the model with a given set of hyperparameters) can itself run in parallel on multiple GPUs (e.g., via PyTorch DDP)

  • Several trials can run in parallel on an HPC machine (via Ray Tune itself)
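
For illustration, here is a minimal sketch of how these two levels map onto the classic Ray Tune API; the trainable train_cifar, the search space, and the resource numbers are placeholders rather than the exact code of the installed examples.

from ray import tune

def train_cifar(config):
    # placeholder for the PyTorch (DDP) training loop of a single trial
    # ... train ResNet18 with config["lr"] and config["batch_size"] ...
    tune.report(accuracy=0.0)  # report the metric Ray Tune optimizes

analysis = tune.run(
    train_cifar,
    config={
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
    },
    num_samples=10,                  # level 2: several trials scheduled in parallel by Ray Tune
    resources_per_trial={"gpu": 2},  # level 1: GPUs reserved per trial (e.g., for DDP workers)
    metric="accuracy",
    mode="max",
)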

At the moment, three HPO examples that optimize ResNet18 on the CIFAR-10 dataset with Ray Tune are available, using the following algorithms:

  1. ASHA

  2. BOHB

  3. PBT

Setup

As a first step, check the usage instructions via

$ ./setup_hpo.py --help

Any of these HPO examples can then be installed; for example, case 1 (the ASHA algorithm) is set up with

$ ./setup_hpo.py --case 1

After the installation, the main folder should contain

  1. the hpo.py source file for the chosen case

  2. a startscript.sh for submitting the job

The following parameters can be set in startscript.sh for each case (a sketch of how they might be consumed follows the list):

  • num-samples: number of samples (trials) to evaluate

  • max-iterations: the maximum number of training iterations per trial

  • ngpus: how many GPU workers to allocate per trial

  • scheduler: which scheduler to use

  • data-dir: directory where the datasets are stored
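
As a rough sketch (the exact argument handling in the installed hpo.py may differ), these parameters could be consumed as follows; the default values and scheduler names are assumptions:

import argparse

parser = argparse.ArgumentParser(description="Ray Tune HPO of ResNet18 on CIFAR-10")
parser.add_argument("--num-samples", type=int, default=10,
                    help="number of trials to evaluate")
parser.add_argument("--max-iterations", type=int, default=20,
                    help="maximum number of training iterations per trial")
parser.add_argument("--ngpus", type=int, default=1,
                    help="number of GPU workers to allocate per trial")
parser.add_argument("--scheduler", type=str, default="asha",
                    choices=["asha", "bohb", "pbt"],
                    help="which Ray Tune scheduler to use")
parser.add_argument("--data-dir", type=str, default="./data",
                    help="directory where the datasets are stored")
args = parser.parse_args()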

In the simplest scenario, modify the <account> section of startscript.sh and submit the job via

$ sbatch startscript.sh

Note that, for communication via the InfiniBand network, it is important to specify the node IP address in the startscript (when launching Ray Tune) in the following format: --node-ip-address="$head_node"i and --address "$head_node"i:"$port".

If multiple Ray instances run on the same machine, conflicts can arise if they all use the same port value (e.g., 29500), so it is advisable to choose a different port in that case.

ASHA

The ASHA scheduler is a variation of Random Search with early stopping of under-performing trials.
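
A minimal ASHA configuration could look as follows; the concrete values are illustrative, with max_t typically tied to --max-iterations:

from ray.tune.schedulers import ASHAScheduler

scheduler = ASHAScheduler(
    time_attr="training_iteration",
    max_t=20,            # train each trial for at most 20 iterations (assumed value)
    grace_period=1,      # give every trial at least one iteration before stopping
    reduction_factor=2,  # promote roughly the best half of the trials at each rung
)

The scheduler is then passed to tune.run(..., scheduler=scheduler).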

BOHB

The BOHB scheduler uses Bayesian Optimization in combination with early stopping.
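
A minimal BOHB setup pairs a HyperBand-style scheduler with the TuneBOHB search algorithm (which requires the ConfigSpace and hpbandster packages). The import path below follows Ray 2.x (older releases use ray.tune.suggest.bohb), and the values are illustrative:

from ray.tune.schedulers import HyperBandForBOHB
from ray.tune.search.bohb import TuneBOHB

scheduler = HyperBandForBOHB(
    time_attr="training_iteration",
    max_t=20,            # assumed value, typically tied to --max-iterations
    reduction_factor=3,
)
search_alg = TuneBOHB()  # Bayesian Optimization over the search space

Both objects are then passed to tune.run(..., scheduler=scheduler, search_alg=search_alg).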

PBT

The PBT scheduler uses evolutionary optimization and is well suited for optimizing non-stationary hyperparameters (such as learning rate schedules).
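
A minimal PBT configuration could look as follows; the mutation ranges and the perturbation interval are illustrative only:

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=4,                # mutate the population every 4 iterations
    hyperparam_mutations={
        "lr": tune.loguniform(1e-4, 1e-1),  # resample/perturb the learning rate over time
        "batch_size": [32, 64, 128],
    },
)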

CIFAR-10

The CIFAR-10 dataset consists of 60,000 low-resolution images in 10 different classes, such as birds, dogs, or cats. Training is performed on 50,000 images, while the remaining 10,000 images are used for validation and testing. It was first introduced by the Canadian Institute for Advanced Research (CIFAR) and is one of the main benchmark datasets in the computer vision domain.
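
For reference, a hedged sketch of loading the dataset with torchvision (the installed examples may use different transforms and take the root directory from --data-dir); the normalization statistics are the commonly used CIFAR-10 per-channel mean and standard deviation:

import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)  # 50,000 images
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)   # 10,000 images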