Automatic Hyperparameter Tuning

In this notebook we show how the franken library can be used to automatically find the best hyperparameters (HPs) for fitting potential functions on a given dataset. The HP search procedure is a simple grid-search over

  1. The kernel parameters (e.g length-scale, polynomial degree, …)

  2. The solver parameters (e.g. force-weight, ridge penalty, …)

This tuning procedure can be equivalently run using the code from this notebook, or from the command-line using the franken.autotune command and specifying all options on the CLI.

This notebook is also available on Google colab for easy running.

[1]:
try:
    import franken
except ImportError:
    %pip install franken[mace]
    import franken
[2]:
import json

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

from franken.autotune import autotune
from franken.config import MaceBackboneConfig, GaussianRFConfig, DatasetConfig, SolverConfig, HPSearchConfig, AutotuneConfig
/home/lbonati@iit.local/software/miniforge3-mamba/envs/franken/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

The process of running autotune is simple: first initialize all configuration objects. These will also include the definition of the hyperparameter search-space. Then simply pass all objects to autotune which will run the training algorithm for each hyperparameters setting and figure out which one performs the best. For this to work it is very important to provide a validation dataset which is different from the training set. The “water” dataset we use in the notebook also comes with a pre-defined validation split.

[3]:
from franken.datasets.registry import DATASET_REGISTRY
from franken.backbones.utils import CacheDir

train_path = DATASET_REGISTRY.get_path("water", "train", base_path=CacheDir.get())
val_path = DATASET_REGISTRY.get_path("water", "val", base_path=CacheDir.get())

print(f"Train path: {train_path}")
print(f"Val path: {val_path}")
Train path: /home/lbonati@iit.local/.franken/water/ML_AB_dataset_1.xyz
Val path: /home/lbonati@iit.local/.franken/water/ML_AB_dataset_2-val.xyz

The equivalent franken.autotune command to the configuration defined here is:

franken.autotune \
    --train-path $HOME/.franken/water/ML_AB_dataset_1.xyz
    --val-path $HOME/.franken/water/ML_AB_dataset_2-val.xyz
    --max-train-samples 8 \
    --l2-penalty="(-10, -5, 5, log)" \
    --force-weight="(0.01, 0.99, 5, linear)" \
    --metrics energy_MAE forces_MAE \
    --seed 42 \
    --jac-chunk-size "auto" \
    --run-dir "./results" \
    --backbone=mace --mace.path-or-id "mace_mp/small" --mace.interaction-block 2 \
    --rf=gaussian --gaussian.num-rf 512 --gaussian.length-scale="[1., 5., 10.0, 15.0, 20.0, 25.0, 30.0]"

The GNN and dataset configurations are fixed. We use just 8 training samples to reduce the computation time for CoLab, but if you run this locally you can increase it.

[4]:
gnn_config = MaceBackboneConfig(
    path_or_id="mace_mp/small",
    interaction_block=2,
)
dataset_cfg = DatasetConfig(train_path=str(train_path),
                            max_train_samples=8,
                            val_path=str(val_path)
)

We use Gaussian random features to run the automatic tuning on the length-scale parameter. autotune will test some values from 1 to 30.

[5]:
rf_config = GaussianRFConfig(
    num_random_features=512,
    length_scale=HPSearchConfig(values=[1., 5., 10.0, 15.0, 20.0, 25.0, 30.0]),
    rng_seed=42,  # for reproducibility
)

The solver parameters are less expensive to iterate over, so we can use a finer grid. For the l2_penalty HP we will test 5 logarithmically-spaced values between \(10^{-10}\) and \(10^{-5}\), and for the force_weight HP we will test 5 linearly-spaced values between 0.01 and 0.99.

[6]:
solver_cfg = SolverConfig(
    l2_penalty=HPSearchConfig(start=-10, stop=-5, num=5, scale='log'),  # equivalent of numpy.logspace
    force_weight=HPSearchConfig(start=0.01, stop=0.99, num=5, scale='linear'),  # equivalent of numpy.linspace
)

Finally all configurations are grouped together and we run autotune. Note the run_dir setting: this is where the logs and models will be saved.

[7]:
autotune_cfg = AutotuneConfig(
    dataset=dataset_cfg,
    solver=solver_cfg,
    backbone=gnn_config,
    rfs=rf_config,
    metrics=["energy_MAE", "forces_MAE"],
    seed=42,
    jac_chunk_size='auto',
    run_dir="./results",
)

run_path = autotune(autotune_cfg)
console_logging_level: INFO
dtype: float64
jac_chunk_size: auto
rf_normalization: leading_eig
run_dir: ./results
save_every_model: False
save_fmaps: False
scale_by_species: True
seed: 42
backbone:
    family: mace
    interaction_block: 2
    path_or_id: mace_mp/small
dataset:
    max_train_samples: 8
    name: null
    test_path: null
    train_path: /home/lbonati@iit.local/.franken/water/ML_AB_dataset_1.xyz
    val_path: /home/lbonati@iit.local/.franken/water/ML_AB_dataset_2-val.xyz
metrics:
    - energy_MAE
    - forces_MAE
rfs:
    length_scale:
      num: null
      scale: null
      start: null
      stop: null
      value: null
      values:
      - 1.0
      - 5.0
      - 10.0
      - 15.0
      - 20.0
      - 25.0
      - 30.0
    num_random_features: 512
    rf_type: gaussian
    rng_seed: 42
    use_offset: true
solver:
    force_weight:
      num: 5
      scale: linear
      start: 0.01
      stop: 0.99
      value: null
      values: null
    l2_penalty:
      num: 5
      scale: log
      start: -10
      stop: -5
      value: null
      values: null

2026-02-10 16:15:03.521 WARNING (rank 0): Cache directory already initialized at /home/lbonati@iit.local/.franken. Reinitializing.
2026-02-10 16:15:03.522 INFO (rank 0): Initializing default cache directory at /home/lbonati@iit.local/.franken
2026-02-10 16:15:03.545 INFO (rank 0): Run folder: results/run_260210_161503_ecf01a5f
cuequivariance or cuequivariance_torch is not available. Cuequivariance acceleration will be disabled.
ASE -> MACE (train): 100%|██████████| 8/8 [00:00<00:00, 145.20it/s]
ASE -> MACE (val): 100%|██████████| 189/189 [00:00<00:00, 248.05it/s]
Computing dataset statistics: 100%|██████████| 8/8 [00:00<00:00, 17.58it/s]
2026-02-10 16:15:12.382 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:15:44.109 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:16:29.210 INFO (rank 0): Trial   1 | rf_type: gaussian | num_random_features:   512   | length_scale:  1.000  | use_offset:  True   | rng_seed:   42    | Best trial 1 (energy 1.48 meV/atom - forces 96.7 meV/Ang)
2026-02-10 16:16:34.821 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:17:05.039 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:17:49.492 INFO (rank 0): Trial   2 | rf_type: gaussian | num_random_features:   512   | length_scale:  5.000  | use_offset:  True   | rng_seed:   42    | Best trial 2 (energy 0.38 meV/atom - forces 28.1 meV/Ang)
2026-02-10 16:17:54.961 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:18:25.419 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:19:10.188 INFO (rank 0): Trial   3 | rf_type: gaussian | num_random_features:   512   | length_scale: 10.000  | use_offset:  True   | rng_seed:   42    | Best trial 3 (energy 0.34 meV/atom - forces 25.9 meV/Ang)
2026-02-10 16:19:15.820 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:19:48.589 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:20:35.754 INFO (rank 0): Trial   4 | rf_type: gaussian | num_random_features:   512   | length_scale: 15.000  | use_offset:  True   | rng_seed:   42    | Best trial 4 (energy 0.32 meV/atom - forces 25.3 meV/Ang)
2026-02-10 16:20:41.424 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:21:13.832 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:21:58.791 INFO (rank 0): Trial   5 | rf_type: gaussian | num_random_features:   512   | length_scale: 20.000  | use_offset:  True   | rng_seed:   42    | Best trial 5 (energy 0.34 meV/atom - forces 25.0 meV/Ang)
2026-02-10 16:22:04.323 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:22:34.677 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:23:20.463 INFO (rank 0): Trial   6 | rf_type: gaussian | num_random_features:   512   | length_scale: 25.000  | use_offset:  True   | rng_seed:   42    | Best trial 6 (energy 0.33 meV/atom - forces 24.8 meV/Ang)
2026-02-10 16:23:26.247 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:23:58.504 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:24:45.309 INFO (rank 0): Trial   7 | rf_type: gaussian | num_random_features:   512   | length_scale: 30.000  | use_offset:  True   | rng_seed:   42    | Best trial 7 (energy 0.32 meV/atom - forces 24.6 meV/Ang)

Analysing the results

There are two main outputs from the autotune procedure: the model trained with the best hyperparameters, which is saved at "results/best_ckpt.pt" and the logs which describe the errors of all the models trained. Here we analyze the error-log, and in the molecular_dynamics notebook we will use the trained model to perform some MD simulations.

[8]:
# We load the full logs for all training runs and the logs for just the best model.
with open(run_path / "log.json", "r") as fh:
    all_logs = json.load(fh)
with open(run_path / "best.json", "r") as fh:
    best_log = json.load(fh)
[9]:
best_ls = best_log["hyperparameters"]["random_features"]["length_scale"]
best_l2 = best_log["hyperparameters"]["solver"]["l2_penalty"]
best_fw = best_log["hyperparameters"]["solver"]["force_weight"]
print("Best hyperparameters: ")
print(f"\tLength-scale: {best_ls:.1f}")
print(f"\tL2 penalty: {best_l2:.2e}")
print(f"\tForce-weight: {best_fw:.3f}")

Best hyperparameters:
        Length-scale: 30.0
        L2 penalty: 1.78e-09
        Force-weight: 0.010

To make the analysis easier we convert the json logs to pandas

[10]:
logs_df = pd.json_normalize(all_logs)  # flattens nested dictionaries and converts to DataFrame
logs_df.head()
[10]:
checkpoint.hash checkpoint.rf_weight_id timings.cov_coeffs timings.solve metrics.train.energy_MAE metrics.train.forces_MAE metrics.validation.energy_MAE metrics.validation.forces_MAE hyperparameters.franken.path_or_id hyperparameters.franken.interaction_block hyperparameters.franken.family hyperparameters.random_features.rf_type hyperparameters.random_features.num_random_features hyperparameters.random_features.length_scale hyperparameters.random_features.use_offset hyperparameters.random_features.rng_seed hyperparameters.input_scaler.scale_by_Z hyperparameters.solver.force_weight hyperparameters.solver.l2_penalty hyperparameters.solver.dtype
0 1566c4e8d00298883c7def9b5e59e352 0 37.597101 0.005464 0.038367 63.922082 1.217128 97.950285 mace_mp/small 2 mace gaussian 512 1.0 True 42 True 0.01 1.000000e-10 torch.float64
1 1566c4e8d00298883c7def9b5e59e352 1 37.597101 0.001383 0.038376 63.921710 1.217229 97.946695 mace_mp/small 2 mace gaussian 512 1.0 True 42 True 0.01 1.778279e-09 torch.float64
2 1566c4e8d00298883c7def9b5e59e352 2 37.597101 0.001512 0.038521 63.916007 1.219013 97.884128 mace_mp/small 2 mace gaussian 512 1.0 True 42 True 0.01 3.162278e-08 torch.float64
3 1566c4e8d00298883c7def9b5e59e352 3 37.597101 0.001474 0.041016 63.985303 1.246055 97.092725 mace_mp/small 2 mace gaussian 512 1.0 True 42 True 0.01 5.623413e-07 torch.float64
4 1566c4e8d00298883c7def9b5e59e352 4 37.597101 0.001303 0.063570 70.761652 1.362110 98.960269 mace_mp/small 2 mace gaussian 512 1.0 True 42 True 0.01 1.000000e-05 torch.float64

Since we have three different hyperparameters it’s hard to visualize their behavior all at the same time. We start by analyzing the behavior as the forces-weight changes from very low to very high, plotting the error on both forces and energy predictions.

[11]:
df_fw = logs_df[  # Fix the other two hyperparameters
    (logs_df["hyperparameters.random_features.length_scale"] == best_ls) &
    ((logs_df["hyperparameters.solver.l2_penalty"] - 5.62341325e-07).abs() < 1e-12)
]
df_fw = df_fw.sort_values("hyperparameters.solver.force_weight")
fig, ax = plt.subplots()
ax.plot(
    df_fw["hyperparameters.solver.force_weight"],
    df_fw["metrics.validation.energy_MAE"],
    label="Energy MAE"
)
ax2 = ax.twinx()
ax2.plot(
    df_fw["hyperparameters.solver.force_weight"],
    df_fw["metrics.validation.forces_MAE"],
    label="Forces MAE",
    c='r'
)
fig.legend(loc='upper center')
ax.set_xlabel("Force weight")
ax.set_ylabel("Energy MAE [meV/atom]")
ax2.set_ylabel("Forces MAE [meV/A]")
[11]:
Text(0, 0.5, 'Forces MAE [meV/A]')
../_images/notebooks_autotune_21_1.png

Next we analyze the behaviour of the length-scale and l2 penalty hyperparameters

[12]:
df_ker = logs_df[logs_df["hyperparameters.solver.force_weight"] == 0.5]
fig, ax = plt.subplots()
pivot = df_ker.pivot_table(
    index="hyperparameters.solver.l2_penalty",
    columns="hyperparameters.random_features.length_scale",
    values="metrics.validation.forces_MAE"
)
im = ax.imshow(pivot, norm=matplotlib.colors.Normalize(vmin=25, vmax=30), cmap="viridis_r")
cb = fig.colorbar(im)
cb.set_label("Forces MAE [meV/A]")
ax.set_xticks(range(len(pivot.columns)), pivot.columns)
ax.set_xlabel("Length-scale")
ax.set_yticks(range(len(pivot.index)), [f"{i:.1e}" for i in pivot.index])
ax.set_ylabel("L2 penalty")
[12]:
Text(0, 0.5, 'L2 penalty')
../_images/notebooks_autotune_23_1.png