17 HPT PyTorch Lightning: VBDP
In this tutorial, we will show how spotPython can be integrated into the PyTorch Lightning training workflow for a classification task.
- Ensure that the corresponding data is available as ./data/VBDP/train.csv.
This document refers to the latest spotPython version, which can be installed via pip. Alternatively, the source code can be downloaded from GitHub: https://github.com/sequential-parameter-optimization/spotPython.
- Uncomment the following lines if you want to (re-)install the latest version of spotPython.
# import sys
# !{sys.executable} -m pip install --upgrade build
# !{sys.executable} -m pip install --upgrade --force-reinstall spotPython
17.1 Step 1: Setup
- Before we consider the detailed experimental setup, we select the parameters that affect run time, initial design size, etc.
- The parameter MAX_TIME specifies the maximum run time in minutes.
- The parameter INIT_SIZE specifies the initial design size.
- The parameter WORKERS specifies the number of workers.
- The prefix PREFIX is used for the experiment name and the name of the log file.
MAX_TIME = 1
INIT_SIZE = 5
WORKERS = 0
PREFIX="31"
- MAX_TIME is set to one minute for demonstration purposes. For real experiments, this should be increased to at least 1 hour.
- INIT_SIZE is set to 5 for demonstration purposes. For real experiments, this should be increased to at least 10.
- WORKERS is set to 0 for demonstration purposes. For real experiments, this should be increased. See the warnings that are printed when the number of workers is set to 0.
- Although no .cuda() or .to(device) calls are required, because Lightning handles device placement for you (see LIGHTNINGMODULE), we would like to know which device is used. Therefore, we imitate the LightningModule behaviour, which selects the highest device.
- The method spotPython.utils.device.getDevice() returns the device that is used by Lightning.
17.2 Step 2: Initialization of the fun_control Dictionary
spotPython uses a Python dictionary for storing the information required for the hyperparameter tuning process, which was described in Section 12.2, see Initialization of the fun_control Dictionary in the documentation.
from spotPython.utils.init import fun_control_init
from spotPython.utils.file import get_experiment_name, get_spot_tensorboard_path
from spotPython.utils.device import getDevice
experiment_name = get_experiment_name(prefix=PREFIX)
fun_control = fun_control_init(
spot_tensorboard_path=get_spot_tensorboard_path(experiment_name),
num_workers=WORKERS,
device=getDevice(),
_L_in=64,
_L_out=11)
fun_control["device"]
'mps'
17.3 Step 3: PyTorch Data Loading
17.3.1 Lightning Dataset and DataModule
The data loading and preprocessing is handled by Lightning and PyTorch. It comprises the following classes:
- CSVDataset: A class that loads the data from a CSV file. [SOURCE]
- CSVDataModule: A class that prepares the data for training and testing. [SOURCE]
Section 17.12.2 illustrates how to access the data.
17.4 Step 4: Preprocessing
Preprocessing is handled by Lightning and PyTorch. It can be implemented in the CSVDataModule class [SOURCE] and is described in the LIGHTNINGDATAMODULE documentation. There you can find information about the transform methods.
17.5 Step 5: Select the NN Model (algorithm) and core_model_hyper_dict
spotPython includes the NetLightBase class [SOURCE] for configurable neural networks. The class is imported here. It inherits from the class Lightning.LightningModule, which is the base class for all models in Lightning. Lightning.LightningModule is a subclass of torch.nn.Module and provides additional functionality for the training and testing of neural networks. The class Lightning.LightningModule is described in the Lightning documentation.
- Here we simply add the NN Model to the fun_control dictionary by calling the function add_core_model_to_fun_control:
from spotPython.light.netlightbase import NetLightBase
from spotPython.data.light_hyper_dict import LightHyperDict
from spotPython.hyperparameters.values import add_core_model_to_fun_control
add_core_model_to_fun_control(core_model=NetLightBase,
fun_control=fun_control,
hyper_dict=LightHyperDict)
The NetLightBase is a configurable neural network. The hyperparameters of the model are specified in the core_model_hyper_dict dictionary [SOURCE].
17.6 Step 6: Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model
spotPython provides functions for modifying the hyperparameters, their bounds and factors as well as for activating and de-activating hyperparameters without re-compilation of the Python source code. These functions were described in Section 12.6.
- epochs and patience are set to small values for demonstration purposes. These values are too small for a real application.
- More reasonable values are, e.g.: modify_hyper_parameter_bounds(fun_control, "epochs", bounds=[7, 9]) and modify_hyper_parameter_bounds(fun_control, "patience", bounds=[2, 7])
from spotPython.hyperparameters.values import modify_hyper_parameter_bounds
modify_hyper_parameter_bounds(fun_control, "l1", bounds=[5,8])
modify_hyper_parameter_bounds(fun_control, "epochs", bounds=[6,13])
modify_hyper_parameter_bounds(fun_control, "batch_size", bounds=[2, 8])
from spotPython.hyperparameters.values import modify_hyper_parameter_levels
modify_hyper_parameter_levels(fun_control, "optimizer",["Adam", "AdamW", "Adamax", "NAdam"])
# modify_hyper_parameter_levels(fun_control, "optimizer", ["Adam"])
Now, the dictionary fun_control contains all information needed for the hyperparameter tuning. Before the hyperparameter tuning is started, it is recommended to take a look at the experimental design. The method gen_design_table [SOURCE] generates a design table as follows:
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))
| name | type | default | lower | upper | transform |
|----------------|--------|-----------|---------|---------|-----------------------|
| l1 | int | 3 | 5 | 8 | transform_power_2_int |
| epochs | int | 4 | 6 | 13 | transform_power_2_int |
| batch_size | int | 4 | 2 | 8 | transform_power_2_int |
| act_fn | factor | ReLU | 0 | 5 | None |
| optimizer | factor | SGD | 0 | 3 | None |
| dropout_prob | float | 0.01 | 0 | 0.25 | None |
| lr_mult | float | 1.0 | 0.1 | 10 | None |
| patience | int | 2 | 2 | 6 | transform_power_2_int |
| initialization | factor | Default | 0 | 2 | None |
This allows us to check whether all information is available and correct.
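The transform_power_2_int entries indicate that the tuner searches over exponents rather than over the final values. The following sketch assumes that the transform simply computes 2**x (the function power_2_int is a stand-in written for this example, not the spotPython implementation) and converts the bounds from the design table into the values the network actually receives:

```python
# Stand-in for the transform named in the design table (assumed to be 2**x).
def power_2_int(exponent: int) -> int:
    return 2 ** int(exponent)

# (lower, upper) exponents copied from the design table above.
bounds = {
    "l1": (5, 8),
    "epochs": (6, 13),
    "batch_size": (2, 8),
    "patience": (2, 6),
}

# Translate every exponent range into the range of effective values.
effective = {
    name: (power_2_int(lo), power_2_int(hi)) for name, (lo, hi) in bounds.items()
}

for name, (lo, hi) in effective.items():
    print(f"{name}: {lo} .. {hi}")
```

Under this assumption, epochs ranges from 64 to 8192, which is consistent with configurations such as 'epochs': 4096 in the tuning log.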
fun_control Dictionary
The updated fun_control dictionary can be shown with the command fun_control["core_model_hyper_dict"].
17.7 Step 7: Data Splitting, the Objective (Loss) Function and the Metric
17.7.1 Evaluation
The evaluation procedure requires the specification of two elements:
- how the data is split into a train and a test set (see Section 12.7.1)
- the loss function (and a metric).
- The data splitting is handled by Lightning.
17.7.2 Loss Functions and Metrics
The loss function is specified in the configurable network class [SOURCE]. We will use CrossEntropy loss for the multiclass classification task.
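For orientation, the cross-entropy of a single sample is the negative logarithm of the probability assigned to the true class. The sketch below uses plain Python rather than the Lightning implementation and assumes that the model outputs have already been converted to probabilities. A classifier that spreads its probability uniformly over the 11 classes obtains a loss of ln(11), roughly 2.398.

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy of one sample: -log of the probability of the true class."""
    return -math.log(probs[target])

n_classes = 11  # number of prognosis classes (_L_out in fun_control)

# A model without any knowledge assigns the same probability to each class.
uniform = [1.0 / n_classes] * n_classes
baseline_loss = cross_entropy(uniform, target=3)

# A confident and correct prediction yields a loss close to zero.
confident = [0.01] * n_classes
confident[3] = 0.90
confident_loss = cross_entropy(confident, target=3)

print(f"uniform baseline: {baseline_loss:.4f}")
print(f"confident, correct: {confident_loss:.4f}")
```

The uniform baseline offers a reference point when reading the val_loss values reported in this chapter; how close a network comes to it also depends on how its outputs are normalized before the loss is applied.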
17.7.3 Metric
- We will use the MAP@k metric [SOURCE] for the evaluation of the model.
- An example of how this metric works is shown in the Appendix, see Section 17.12.3.
Similar to the loss function, the metric is specified in the configurable network class [SOURCE].
- The loss function and the metric are not hyperparameters that can be tuned with spotPython.
- They are handled by Lightning.
17.8 Step 8: Calling the SPOT Function
17.8.1 Preparing the SPOT Call
The following code passes the information about the parameter ranges and bounds to spot. It extracts the variable types, names, and bounds.
from spotPython.hyperparameters.values import (get_bound_values,
get_var_name,
get_var_type,)
var_type = get_var_type(fun_control)
var_name = get_var_name(fun_control)
lower = get_bound_values(fun_control, "lower")
upper = get_bound_values(fun_control, "upper")
17.8.2 The Objective Function fun
The objective function fun from the class HyperLight [SOURCE] is selected next. It implements an interface from PyTorch’s training, validation, and testing methods to spotPython.
from spotPython.fun.hyperlight import HyperLight
fun = HyperLight().fun
17.8.3 Starting the Hyperparameter Tuning
The spotPython hyperparameter tuning is started by calling the Spot function [SOURCE] as described in Section 12.8.4.
import numpy as np
from spotPython.spot import spot
from math import inf
spot_tuner = spot.Spot(fun=fun,
lower = lower,
upper = upper,
fun_evals = inf,
max_time = MAX_TIME,
tolerance_x = np.sqrt(np.spacing(1)),
var_type = var_type,
var_name = var_name,
show_progress= True,
fun_control = fun_control,
design_control={"init_size": INIT_SIZE},
surrogate_control={"noise": True,
"min_theta": -4,
"max_theta": 3,
"n_theta": len(var_name),
"model_fun_evals": 10_000,
})
spot_tuner.run()
config: {'l1': 256, 'epochs': 4096, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.10939527466721133, 'lr_mult': 4.211776903906428, 'patience': 16, 'initialization': 'Default'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.2722482681274414 |
| val_acc | 0.268551230430603 |
| val_loss | 2.2722482681274414 |
| valid_mapk | 0.389060378074646 |
config: {'l1': 32, 'epochs': 128, 'batch_size': 256, 'act_fn': LeakyReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.012926647388264517, 'lr_mult': 0.832718394912432, 'patience': 8, 'initialization': 'Kaiming'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.3454296588897705 |
| val_acc | 0.21201413869857788 |
| val_loss | 2.3454296588897705 |
| valid_mapk | 0.28198543190956116 |
config: {'l1': 128, 'epochs': 256, 'batch_size': 8, 'act_fn': Swish(), 'optimizer': 'NAdam', 'dropout_prob': 0.22086376796923401, 'lr_mult': 7.65501078489161, 'patience': 64, 'initialization': 'Xavier'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.4547009468078613 |
| val_acc | 0.08833922445774078 |
| val_loss | 2.4547009468078613 |
| valid_mapk | 0.1631944328546524 |
config: {'l1': 64, 'epochs': 512, 'batch_size': 16, 'act_fn': Sigmoid(), 'optimizer': 'Adam', 'dropout_prob': 0.1890928563375006, 'lr_mult': 2.3450676871382794, 'patience': 32, 'initialization': 'Kaiming'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.2932260036468506 |
| val_acc | 0.24028268456459045 |
| val_loss | 2.2932260036468506 |
| valid_mapk | 0.3209175169467926 |
config: {'l1': 64, 'epochs': 4096, 'batch_size': 64, 'act_fn': ReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.0708380794924471, 'lr_mult': 9.528945328733357, 'patience': 4, 'initialization': 'Xavier'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.240659236907959 |
| val_acc | 0.30388692021369934 |
| val_loss | 2.240659236907959 |
| valid_mapk | 0.3946373164653778 |
config: {'l1': 128, 'epochs': 4096, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.0201841217125204, 'lr_mult': 10.0, 'patience': 4, 'initialization': 'Xavier'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.241454839706421 |
| val_acc | 0.29328620433807373 |
| val_loss | 2.241454839706421 |
| valid_mapk | 0.39251115918159485 |
spotPython tuning: 2.240659236907959 [#---------] 6.01%
config: {'l1': 32, 'epochs': 4096, 'batch_size': 128, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.25, 'lr_mult': 0.1, 'patience': 4, 'initialization': 'Kaiming'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.345860242843628 |
| val_acc | 0.16607773303985596 |
| val_loss | 2.345860242843628 |
| valid_mapk | 0.24389146268367767 |
spotPython tuning: 2.240659236907959 [######----] 56.90%
config: {'l1': 32, 'epochs': 4096, 'batch_size': 256, 'act_fn': ReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.25, 'lr_mult': 5.586788808787749, 'patience': 16, 'initialization': 'Xavier'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.24826979637146 |
| val_acc | 0.27561837434768677 |
| val_loss | 2.24826979637146 |
| valid_mapk | 0.4415509104728699 |
spotPython tuning: 2.240659236907959 [######----] 63.43%
config: {'l1': 64, 'epochs': 4096, 'batch_size': 128, 'act_fn': ReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.25, 'lr_mult': 6.855170661866401, 'patience': 4, 'initialization': 'Xavier'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.2907321453094482 |
| val_acc | 0.2473498284816742 |
| val_loss | 2.2907321453094482 |
| valid_mapk | 0.31881752610206604 |
spotPython tuning: 2.240659236907959 [#######---] 67.42%
config: {'l1': 128, 'epochs': 4096, 'batch_size': 64, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.1455594279458698, 'lr_mult': 10.0, 'patience': 16, 'initialization': 'Kaiming'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.2428529262542725 |
| val_acc | 0.2968197762966156 |
| val_loss | 2.2428529262542725 |
| valid_mapk | 0.38495367765426636 |
spotPython tuning: 2.240659236907959 [#######---] 73.77%
config: {'l1': 256, 'epochs': 4096, 'batch_size': 8, 'act_fn': ReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.0, 'lr_mult': 9.71329113191582, 'patience': 8, 'initialization': 'Kaiming'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.3001160621643066 |
| val_acc | 0.22968198359012604 |
| val_loss | 2.3001160621643066 |
| valid_mapk | 0.34953704476356506 |
spotPython tuning: 2.240659236907959 [#########-] 87.07%
config: {'l1': 128, 'epochs': 4096, 'batch_size': 8, 'act_fn': ReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.0693059354050894, 'lr_mult': 10.0, 'patience': 4, 'initialization': 'Default'}
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.256596088409424 |
| val_acc | 0.27561837434768677 |
| val_loss | 2.256596088409424 |
| valid_mapk | 0.37442123889923096 |
spotPython tuning: 2.240659236907959 [##########] 100.00% Done...
<spotPython.spot.spot.Spot at 0x151c62e90>
17.9 Step 9: Tensorboard
The textual output shown in the console (or code cell) can be visualized with Tensorboard.
tensorboard --logdir="runs/"
Further information can be found in the PyTorch Lightning documentation for Tensorboard.
17.10 Step 10: Results
After the hyperparameter tuning run is finished, the results can be analyzed as described in Section 12.10.
spot_tuner.plot_progress(log_y=False,
filename="./figures/" + experiment_name+"_progress.png")
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control=fun_control, spot=spot_tuner))
| name | type | default | lower | upper | tuned | transform | importance | stars |
|----------------|--------|-----------|---------|---------|--------------------|-----------------------|--------------|---------|
| l1 | int | 3 | 5.0 | 8.0 | 6.0 | transform_power_2_int | 0.00 | |
| epochs | int | 4 | 6.0 | 13.0 | 12.0 | transform_power_2_int | 0.00 | |
| batch_size | int | 4 | 2.0 | 8.0 | 6.0 | transform_power_2_int | 0.01 | |
| act_fn | factor | ReLU | 0.0 | 5.0 | 2.0 | None | 100.00 | *** |
| optimizer | factor | SGD | 0.0 | 3.0 | 2.0 | None | 0.00 | |
| dropout_prob | float | 0.01 | 0.0 | 0.25 | 0.0708380794924471 | None | 0.00 | |
| lr_mult | float | 1.0 | 0.1 | 10.0 | 9.528945328733357 | None | 0.21 | . |
| patience | int | 2 | 2.0 | 6.0 | 2.0 | transform_power_2_int | 1.91 | * |
| initialization | factor | Default | 0.0 | 2.0 | 2.0 | None | 0.00 | |
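The tuned column lists values on the scale used by the optimizer: exponents for the power-of-two parameters and level indices for factors. The sketch below decodes a few of them. It assumes that transform_power_2_int computes 2**x and that factor levels are indexed in the order passed to modify_hyper_parameter_levels in Step 6; neither assumption is taken from the spotPython source.

```python
# Tuned values copied from the table above (optimizer scale).
tuned = {
    "l1": 6.0,
    "epochs": 12.0,
    "batch_size": 6.0,
    "patience": 2.0,
    "optimizer": 2.0,
}

# Optimizer levels in the order set in Step 6.
optimizer_levels = ["Adam", "AdamW", "Adamax", "NAdam"]

# Parameters whose transform column reads transform_power_2_int.
power_2_params = ("l1", "epochs", "batch_size", "patience")

# Exponents become powers of two, the factor index selects a level.
decoded = {name: 2 ** int(tuned[name]) for name in power_2_params}
decoded["optimizer"] = optimizer_levels[int(tuned["optimizer"])]

print(decoded)
```

The decoded values agree with the checkpoint directory name that is logged when the tuned model is loaded in Section 17.10.1 (64_4096_64_ReLU()_Adamax_...).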
spot_tuner.plot_importance(threshold=0.025,
filename="./figures/" + experiment_name+"_importance.png")
17.10.1 Get the Tuned Architecture
from spotPython.light.utils import get_tuned_architecture
config = get_tuned_architecture(spot_tuner, fun_control)
- Test on the full data set
from spotPython.light.traintest import test_model
test_model(config, fun_control)
| Test metric | DataLoader 0 |
|---|---|
| hp_metric | 2.1768343448638916 |
| test_mapk_epoch | 0.4281684160232544 |
| val_acc | 0.3663366436958313 |
| val_loss | 2.1768343448638916 |
(2.1768343448638916, 0.3663366436958313)
from spotPython.light.traintest import load_light_from_checkpoint
model_loaded = load_light_from_checkpoint(config, fun_control)
Loading model from runs/lightning_logs/64_4096_64_ReLU()_Adamax_0.0708380794924471_9.528945328733357_4_Xavier_TEST/checkpoints/last.ckpt
17.10.2 Cross Validation With Lightning
- The KFold class from sklearn.model_selection is used to generate the folds for cross-validation.
- This mechanism is used to generate the folds for the final evaluation of the model.
- The CrossValidationDataModule class [SOURCE] is used to generate the folds for the hyperparameter tuning process.
- It is called from the cv_model function [SOURCE].
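The fold sizes that cv_model reports can be anticipated with a few lines of arithmetic. The sketch assumes the documented behaviour of sklearn's KFold, where the first n_samples % n_splits folds receive one extra sample. The helper fold_sizes is illustrative and not part of spotPython.

```python
def fold_sizes(n_samples: int, n_splits: int) -> list[tuple[int, int]]:
    """Return (train_size, val_size) per fold, following the KFold convention."""
    base, remainder = divmod(n_samples, n_splits)
    sizes = []
    for fold in range(n_splits):
        # The first `remainder` folds hold one additional validation sample.
        val_size = base + 1 if fold < remainder else base
        sizes.append((n_samples - val_size, val_size))
    return sizes

# 707 training samples (see Section 17.12.2) and k_folds = 10.
sizes = fold_sizes(n_samples=707, n_splits=10)
for k, (n_train, n_val) in enumerate(sizes):
    print(f"k: {k}  train: {n_train}  val: {n_val}")
```

With 707 samples and 10 folds, seven folds validate on 71 samples and three on 70, so each validation score rests on a small sample.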
from spotPython.light.traintest import cv_model
# set the number of folds to 10
fun_control["k_folds"] = 10
cv_model(config, fun_control)
k: 0
Train Dataset Size: 636
Val Dataset Size: 71
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.207697868347168 |
| val_acc | 0.3239436745643616 |
| val_loss | 2.207697868347168 |
| valid_mapk | 0.5284597873687744 |
train_model result: {'valid_mapk': 0.5284597873687744, 'val_loss': 2.207697868347168, 'val_acc': 0.3239436745643616, 'hp_metric': 2.207697868347168}
k: 1
Train Dataset Size: 636
Val Dataset Size: 71
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.273160219192505 |
| val_acc | 0.2535211145877838 |
| val_loss | 2.273160219192505 |
| valid_mapk | 0.4086681604385376 |
train_model result: {'valid_mapk': 0.4086681604385376, 'val_loss': 2.273160219192505, 'val_acc': 0.2535211145877838, 'hp_metric': 2.273160219192505}
k: 2
Train Dataset Size: 636
Val Dataset Size: 71
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.3262360095977783 |
| val_acc | 0.19718310236930847 |
| val_loss | 2.3262360095977783 |
| valid_mapk | 0.2529761791229248 |
train_model result: {'valid_mapk': 0.2529761791229248, 'val_loss': 2.3262360095977783, 'val_acc': 0.19718310236930847, 'hp_metric': 2.3262360095977783}
k: 3
Train Dataset Size: 636
Val Dataset Size: 71
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.248728036880493 |
| val_acc | 0.2535211145877838 |
| val_loss | 2.248728036880493 |
| valid_mapk | 0.2472098171710968 |
train_model result: {'valid_mapk': 0.2472098171710968, 'val_loss': 2.248728036880493, 'val_acc': 0.2535211145877838, 'hp_metric': 2.248728036880493}
k: 4
Train Dataset Size: 636
Val Dataset Size: 71
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.3227880001068115 |
| val_acc | 0.19718310236930847 |
| val_loss | 2.3227880001068115 |
| valid_mapk | 0.2873883843421936 |
train_model result: {'valid_mapk': 0.2873883843421936, 'val_loss': 2.3227880001068115, 'val_acc': 0.19718310236930847, 'hp_metric': 2.3227880001068115}
k: 5
Train Dataset Size: 636
Val Dataset Size: 71
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.2405641078948975 |
| val_acc | 0.2957746386528015 |
| val_loss | 2.2405641078948975 |
| valid_mapk | 0.4836309552192688 |
train_model result: {'valid_mapk': 0.4836309552192688, 'val_loss': 2.2405641078948975, 'val_acc': 0.2957746386528015, 'hp_metric': 2.2405641078948975}
k: 6
Train Dataset Size: 636
Val Dataset Size: 71
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.3006131649017334 |
| val_acc | 0.23943662643432617 |
| val_loss | 2.3006131649017334 |
| valid_mapk | 0.4224330484867096 |
train_model result: {'valid_mapk': 0.4224330484867096, 'val_loss': 2.3006131649017334, 'val_acc': 0.23943662643432617, 'hp_metric': 2.3006131649017334}
k: 7
Train Dataset Size: 637
Val Dataset Size: 70
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.201622247695923 |
| val_acc | 0.3571428656578064 |
| val_loss | 2.201622247695923 |
| valid_mapk | 0.4184027910232544 |
train_model result: {'valid_mapk': 0.4184027910232544, 'val_loss': 2.201622247695923, 'val_acc': 0.3571428656578064, 'hp_metric': 2.201622247695923}
k: 8
Train Dataset Size: 637
Val Dataset Size: 70
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.333130121231079 |
| val_acc | 0.17142857611179352 |
| val_loss | 2.333130121231079 |
| valid_mapk | 0.28125 |
train_model result: {'valid_mapk': 0.28125, 'val_loss': 2.333130121231079, 'val_acc': 0.17142857611179352, 'hp_metric': 2.333130121231079}
k: 9
Train Dataset Size: 637
Val Dataset Size: 70
| Validate metric | DataLoader 0 |
|---|---|
| hp_metric | 2.169214963912964 |
| val_acc | 0.3857142925262451 |
| val_loss | 2.169214963912964 |
| valid_mapk | 0.4262152910232544 |
train_model result: {'valid_mapk': 0.4262152910232544, 'val_loss': 2.169214963912964, 'val_acc': 0.3857142925262451, 'hp_metric': 2.169214963912964}
0.37566344141960145
- This is the evaluation that will be used in the comparison.
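The single number printed after the last fold coincides with the arithmetic mean of the ten valid_mapk values. The sketch recomputes it from the logged results above. That cv_model aggregates the folds in exactly this way is inferred from the output, not from the source code.

```python
from statistics import fmean

# valid_mapk of each fold, copied from the cv_model output above.
valid_mapk = [
    0.5284597873687744,
    0.4086681604385376,
    0.2529761791229248,
    0.2472098171710968,
    0.2873883843421936,
    0.4836309552192688,
    0.4224330484867096,
    0.4184027910232544,
    0.28125,
    0.4262152910232544,
]

# Average over folds and the gap between the best and the worst fold.
cv_score = fmean(valid_mapk)
spread = max(valid_mapk) - min(valid_mapk)

print(f"mean MAP@k over {len(valid_mapk)} folds: {cv_score:.6f}")
print(f"range across folds: {spread:.6f}")
```

The gap of roughly 0.28 between the best and the worst fold illustrates how much a single-split score can vary on validation sets of about 70 samples.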
17.10.3 Detailed Hyperparameter Plots
filename = "./figures/" + experiment_name
spot_tuner.plot_important_hyperparameter_contour(filename=filename)
act_fn: 100.0
lr_mult: 0.21029704932007795
patience: 1.9089791092711725



17.10.4 Parallel Coordinates Plot
spot_tuner.parallel_plot()
Parallel coordinates plots
17.10.5 Plot all Combinations of Hyperparameters
- Warning: this may take a while.
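PLOT_ALL = False
if PLOT_ALL:
    # Colour-scale limits; taken here from the observed objective values (an assumed choice).
    min_z = min(spot_tuner.y)
    max_z = max(spot_tuner.y)
    n = spot_tuner.k
    # Plot a contour for every pair (i, j) of hyperparameters.
    for i in range(n-1):
        for j in range(i+1, n):
            spot_tuner.plot_contour(i=i, j=j, min_z=min_z, max_z=max_z)
17.10.6 Visualizing the Activation Distribution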
- The following code is based on [PyTorch Lightning TUTORIAL 2: ACTIVATION FUNCTIONS], Author: Phillip Lippe, License: [CC BY-SA], Generated: 2023-03-15T09:52:39.179933.
After we have trained the models, we can look at the actual activation values that occur inside the model. For instance, how many neurons are set to zero in ReLU? Where do we find most values in Tanh? To answer these questions, we can write a simple function which takes a trained model, applies it to a batch of inputs, and plots the histogram of the activations inside the network:
from spotPython.torch.activation import Sigmoid, Tanh, ReLU, LeakyReLU, ELU, Swish
act_fn_by_name = {"sigmoid": Sigmoid, "tanh": Tanh, "relu": ReLU, "leakyrelu": LeakyReLU, "elu": ELU, "swish": Swish}
from spotPython.hyperparameters.values import get_one_config_from_X
X = spot_tuner.to_all_dim(spot_tuner.min_X.reshape(1,-1))
config = get_one_config_from_X(X, fun_control)
model = fun_control["core_model"](**config, _L_in=64, _L_out=11)
model
NetLightBase(
(train_mapk): MAPK()
(valid_mapk): MAPK()
(test_mapk): MAPK()
(layers): Sequential(
(0): Linear(in_features=64, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.0708380794924471, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.0708380794924471, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.0708380794924471, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.0708380794924471, inplace=False)
(12): Linear(in_features=16, out_features=11, bias=True)
)
)
from spotPython.utils.eda import visualize_activations
visualize_activations(model, color=f"C{0}")
17.11 Submission
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
train_df = pd.read_csv('./data/VBDP/train.csv', index_col=0)
# remove the id column
# train_df = train_df.drop(columns=['id'])
n_samples = train_df.shape[0]
n_features = train_df.shape[1] - 1
target_column = "prognosis"
# Encode our prognosis labels as integers for easier decoding later
enc = OrdinalEncoder()
y = enc.fit_transform(train_df[[target_column]])
test_df = pd.read_csv('./data/VBDP/test.csv', index_col=0)
test_df
| id | sudden_fever | headache | mouth_bleed | nose_bleed | muscle_pain | joint_pain | vomiting | rash | diarrhea | hypotension | ... | lymph_swells | breathing_restriction | toe_inflammation | finger_inflammation | lips_irritation | itchiness | ulcers | toenail_loss | speech_problem | bullseye_rash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 707 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 708 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 709 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 710 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 711 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1005 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1006 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1007 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1008 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1009 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
303 rows × 64 columns
import torch
X_tensor = torch.Tensor(test_df.values)
X_tensor = X_tensor.to(fun_control["device"])
y = model_loaded(X_tensor)
y.shape
torch.Size([303, 11])
# convert the predictions to a numpy array
y = y.cpu().detach().numpy()
y
array([[9.1882097e-04, 2.1353103e-03, 3.4111074e-01, ..., 3.5611987e-03,
1.5083617e-01, 4.6887794e-01],
[9.9263459e-01, 1.6731443e-04, 1.8472878e-03, ..., 5.2285739e-03,
8.9900452e-05, 2.9124603e-06],
[2.8728388e-04, 6.1344850e-05, 8.7252341e-02, ..., 5.2384338e-03,
8.9878088e-01, 8.2571404e-03],
...,
[6.7850891e-07, 5.6389499e-08, 1.1290988e-02, ..., 9.0060084e-06,
9.8707992e-01, 1.6189741e-03],
[8.3826768e-09, 2.7530784e-06, 2.0392492e-05, ..., 9.9997604e-01,
2.1343609e-09, 3.7158691e-07],
[1.0118679e-04, 1.3786184e-03, 4.1456467e-01, ..., 1.6462383e-03,
1.3861924e-02, 5.4504627e-01]], dtype=float32)
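The submission code that follows ranks the classes of every row by predicted probability and keeps the three most likely ones. The ranking step can be sketched in pure Python. The probabilities and class names below are made up for the illustration; the real names come from the fitted OrdinalEncoder.

```python
# Placeholder class names (hypothetical, for illustration only).
class_names = ["class_a", "class_b", "class_c", "class_d", "class_e"]

# Made-up probabilities for two samples (rows) and five classes (columns).
probabilities = [
    [0.05, 0.50, 0.10, 0.30, 0.05],
    [0.60, 0.05, 0.25, 0.02, 0.08],
]

def top_k_labels(row, names, k=3):
    """Return the names of the k classes with the highest probability."""
    ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    return [names[i] for i in ranked[:k]]

# One space-separated string per sample, mirroring the ' '.join used
# for the prognosis column of the submission.
submission_rows = [" ".join(top_k_labels(row, class_names)) for row in probabilities]
for line in submission_rows:
    print(line)
```

Ordering the three labels from most to least likely matters, because MAP@k rewards a correct label more the earlier it appears.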
test_sorted_prediction_ids = np.argsort(-y, axis=1)
test_top_3_prediction_ids = test_sorted_prediction_ids[:,:3]
original_shape = test_top_3_prediction_ids.shape
test_top_3_prediction = enc.inverse_transform(test_top_3_prediction_ids.reshape(-1, 1))
test_top_3_prediction = test_top_3_prediction.reshape(original_shape)
test_df['prognosis'] = np.apply_along_axis(lambda x: np.array(' '.join(x), dtype="object"), 1, test_top_3_prediction)
test_df['prognosis'].reset_index().to_csv('./data/VBDP/submission.csv', index=False)
17.12 Appendix
17.12.1 Differences to the spotPython Approaches for torch, sklearn and river
- Data loading is handled independently from the fun_control dictionary by Lightning and PyTorch.
- In contrast to spotPython with torch, river and sklearn, the data sets are not added to the fun_control dictionary.
17.12.1.1 Specification of the Preprocessing Model
Via the fun_control dictionary, the torch, sklearn and river versions of spotPython allow the specification of a data preprocessing pipeline, e.g., for the scaling of the data or for the one-hot encoding of categorical variables, see Section 12.4. This feature is not used in the Lightning version.
Lightning allows the data preprocessing to be specified in the LightningDataModule class. It is not considered here, because it should be computed at one location only.
17.12.2 Taking a Look at the Data
import torch
from spotPython.light.csvdataset import CSVDataset
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
# Create an instance of CSVDataset
dataset = CSVDataset(csv_file="./data/VBDP/train.csv", train=True)
# show the dimensions of the input data
print(dataset[0][0].shape)
# show the first element of the input data
print(dataset[0][0])
# show the size of the dataset
print(f"Dataset Size: {len(dataset)}")
torch.Size([64])
tensor([1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1., 1., 0., 0.,
1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 0., 0.,
1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 1.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Dataset Size: 707
# Set batch size for DataLoader
batch_size = 3
# Create DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
# Iterate over the data in the DataLoader
for batch in dataloader:
    inputs, targets = batch
    print(f"Batch Size: {inputs.size(0)}")
    print("---------------")
    print(f"Inputs: {inputs}")
    print(f"Targets: {targets}")
    break
Batch Size: 3
---------------
Inputs: tensor([[1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0.,
1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1.,
1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 1., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.,
1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Targets: tensor([1, 0, 9])
17.12.3 The MAPK Metric
Here is an example of how the MAPK metric is calculated.
from spotPython.torch.mapk import MAPK
import torch
mapk = MAPK(k=2)
target = torch.tensor([0, 1, 2, 2])
preds = torch.tensor(
[
[0.5, 0.2, 0.2], # 0 is in top 2
[0.3, 0.4, 0.2], # 1 is in top 2
[0.2, 0.4, 0.3], # 2 is in top 2
[0.7, 0.2, 0.1], # 2 isn't in top 2
]
)
mapk.update(preds, target)
print(mapk.compute()) # tensor(0.6250)
tensor(0.6250)
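To make the arithmetic explicit, the same number can be reproduced without torch. The function below is a simplified re-implementation written for this example (it is not the MAPK class from spotPython) and assumes exactly one true label per sample, so each sample scores 1/rank if the label appears within the first k predictions and 0 otherwise.

```python
def mapk_single_label(preds, targets, k):
    """Mean average precision at k for samples with exactly one true label."""
    total = 0.0
    for scores, target in zip(preds, targets):
        # Class indices ordered by decreasing score, truncated to the top k.
        top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(targets)

targets = [0, 1, 2, 2]
preds = [
    [0.5, 0.2, 0.2],  # label 0 at rank 1 -> 1.0
    [0.3, 0.4, 0.2],  # label 1 at rank 1 -> 1.0
    [0.2, 0.4, 0.3],  # label 2 at rank 2 -> 0.5
    [0.7, 0.2, 0.1],  # label 2 not in top 2 -> 0.0
]

score = mapk_single_label(preds, targets, k=2)
print(score)  # (1.0 + 1.0 + 0.5 + 0.0) / 4 = 0.625
```

The third sample shows why MAP@k differs from plain top-k accuracy: the label is found, but only at rank 2, so it contributes 0.5 instead of 1.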