Automatic Hyperparameter Tuning

In this notebook we show how the franken library can be used to automatically find the best hyperparameters (HPs) for fitting potential functions on a given dataset. The HP search procedure is a simple grid search over:

The kernel parameters (e.g. length-scale, polynomial degree, …)
The solver parameters (e.g. force-weight, ridge penalty, …)
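
As a rough illustration of what a grid search over these two groups of parameters means, the sketch below enumerates every combination of a small search space and keeps the one with the lowest error. It does not call franken: the search space, the `validation_error` function, and all other names are hypothetical stand-ins chosen for the example.

```python
import itertools

# Hypothetical search space: one kernel parameter and one solver parameter.
search_space = {
    "length_scale": [1.0, 5.0, 10.0],
    "l2_penalty": [1e-10, 1e-5],
}


def validation_error(length_scale, l2_penalty):
    # Toy stand-in for "train a model, then evaluate it on the validation set".
    return (length_scale - 5.0) ** 2 + l2_penalty


# A grid search visits the Cartesian product of all candidate values.
names = list(search_space)
trials = [
    dict(zip(names, combo))
    for combo in itertools.product(*search_space.values())
]

# Keep the setting with the lowest (toy) validation error.
best = min(trials, key=lambda hp: validation_error(**hp))
print(f"{len(trials)} trials, best: {best}")
```

The cost of this approach grows with the product of the number of candidate values per parameter, which is why the grids used in this notebook are kept small.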

This tuning procedure can be equivalently run using the code from this notebook, or from the command-line using the franken.autotune command and specifying all options on the CLI.

This notebook is also available on Google Colab for easy running.

[1]:

try:
    import franken
except ImportError:
    %pip install franken[mace]
    import franken

[2]:

import json

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

from franken.autotune import autotune
from franken.config import MaceBackboneConfig, GaussianRFConfig, DatasetConfig, SolverConfig, HPSearchConfig, AutotuneConfig

/home/lbonati@iit.local/software/miniforge3-mamba/envs/franken/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Running autotune is simple: first initialize all configuration objects, which also define the hyperparameter search space. Then pass them to autotune, which runs the training algorithm for each hyperparameter setting and determines which one performs best. For this to work, it is very important to provide a validation dataset that is different from the training set. The “water” dataset used in this notebook comes with a pre-defined validation split.

[3]:

from franken.datasets.registry import DATASET_REGISTRY
from franken.backbones.utils import CacheDir

train_path = DATASET_REGISTRY.get_path("water", "train", base_path=CacheDir.get())
val_path = DATASET_REGISTRY.get_path("water", "val", base_path=CacheDir.get())

print(f"Train path: {train_path}")
print(f"Val path: {val_path}")

Train path: /home/lbonati@iit.local/.franken/water/ML_AB_dataset_1.xyz
Val path: /home/lbonati@iit.local/.franken/water/ML_AB_dataset_2-val.xyz

The franken.autotune command equivalent to the configuration defined in this notebook is:

franken.autotune \
    --train-path $HOME/.franken/water/ML_AB_dataset_1.xyz \
    --val-path $HOME/.franken/water/ML_AB_dataset_2-val.xyz \
    --max-train-samples 8 \
    --l2-penalty="(-10, -5, 5, log)" \
    --force-weight="(0.01, 0.99, 5, linear)" \
    --metrics energy_MAE forces_MAE \
    --seed 42 \
    --jac-chunk-size "auto" \
    --run-dir "./results" \
    --backbone=mace --mace.path-or-id "mace_mp/small" --mace.interaction-block 2 \
    --rf=gaussian --gaussian.num-rf 512 --gaussian.length-scale="[1., 5., 10.0, 15.0, 20.0, 25.0, 30.0]"

The GNN and dataset configurations are fixed. We use just 8 training samples to reduce the computation time on Colab, but if you run this locally you can increase the number.

[4]:

gnn_config = MaceBackboneConfig(
    path_or_id="mace_mp/small",
    interaction_block=2,
)
dataset_cfg = DatasetConfig(train_path=str(train_path),
                            max_train_samples=8,
                            val_path=str(val_path)
)

We use Gaussian random features and run the automatic tuning on the length-scale parameter. autotune will test seven values between 1 and 30.

[5]:

rf_config = GaussianRFConfig(
    num_random_features=512,
    length_scale=HPSearchConfig(values=[1., 5., 10.0, 15.0, 20.0, 25.0, 30.0]),
    rng_seed=42,  # for reproducibility
)

The solver parameters are less expensive to iterate over, so we can use a finer grid. For the l2_penalty HP we will test 5 logarithmically-spaced values between \(10^{-10}\) and \(10^{-5}\), and for the force_weight HP we will test 5 linearly-spaced values between 0.01 and 0.99.

[6]:

solver_cfg = SolverConfig(
    l2_penalty=HPSearchConfig(start=-10, stop=-5, num=5, scale='log'),  # equivalent of numpy.logspace
    force_weight=HPSearchConfig(start=0.01, stop=0.99, num=5, scale='linear'),  # equivalent of numpy.linspace
)
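
To see which values these two search ranges expand to, they can be reproduced with NumPy directly. This is only a sketch based on the "equivalent of numpy.logspace / numpy.linspace" comments in the cell above; it does not inspect how franken builds the grid internally.

```python
import numpy as np

# scale='log': start and stop are base-10 exponents, as in numpy.logspace.
l2_penalties = np.logspace(-10, -5, num=5)

# scale='linear': start and stop are the end points, as in numpy.linspace.
force_weights = np.linspace(0.01, 0.99, num=5)

print("l2_penalty candidates:  ", l2_penalties)
print("force_weight candidates:", force_weights)
```

The penalties come out as roughly 1e-10, 1.78e-09, 3.16e-08, 5.62e-07 and 1e-05, which matches the l2_penalty column of the log table shown later in this notebook. The force weights are 0.01, 0.255, 0.5, 0.745 and 0.99.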

Finally all configurations are grouped together and we run autotune. Note the run_dir setting: this is where the logs and models will be saved.

[7]:

autotune_cfg = AutotuneConfig(
    dataset=dataset_cfg,
    solver=solver_cfg,
    backbone=gnn_config,
    rfs=rf_config,
    metrics=["energy_MAE", "forces_MAE"],
    seed=42,
    jac_chunk_size='auto',
    run_dir="./results",
)

run_path = autotune(autotune_cfg)

console_logging_level: INFO
dtype: float64
jac_chunk_size: auto
rf_normalization: leading_eig
run_dir: ./results
save_every_model: False
save_fmaps: False
scale_by_species: True
seed: 42
backbone:
    family: mace
    interaction_block: 2
    path_or_id: mace_mp/small
dataset:
    max_train_samples: 8
    name: null
    test_path: null
    train_path: /home/lbonati@iit.local/.franken/water/ML_AB_dataset_1.xyz
    val_path: /home/lbonati@iit.local/.franken/water/ML_AB_dataset_2-val.xyz
metrics:
    - energy_MAE
    - forces_MAE
rfs:
    length_scale:
      num: null
      scale: null
      start: null
      stop: null
      value: null
      values:
      - 1.0
      - 5.0
      - 10.0
      - 15.0
      - 20.0
      - 25.0
      - 30.0
    num_random_features: 512
    rf_type: gaussian
    rng_seed: 42
    use_offset: true
solver:
    force_weight:
      num: 5
      scale: linear
      start: 0.01
      stop: 0.99
      value: null
      values: null
    l2_penalty:
      num: 5
      scale: log
      start: -10
      stop: -5
      value: null
      values: null

2026-02-10 16:15:03.521 WARNING (rank 0): Cache directory already initialized at /home/lbonati@iit.local/.franken. Reinitializing.
2026-02-10 16:15:03.522 INFO (rank 0): Initializing default cache directory at /home/lbonati@iit.local/.franken
2026-02-10 16:15:03.545 INFO (rank 0): Run folder: results/run_260210_161503_ecf01a5f
cuequivariance or cuequivariance_torch is not available. Cuequivariance acceleration will be disabled.

ASE -> MACE (train): 100%|██████████| 8/8 [00:00<00:00, 145.20it/s]
ASE -> MACE (val): 100%|██████████| 189/189 [00:00<00:00, 248.05it/s]
Computing dataset statistics: 100%|██████████| 8/8 [00:00<00:00, 17.58it/s]

2026-02-10 16:15:12.382 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:15:44.109 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:16:29.210 INFO (rank 0): Trial   1 | rf_type: gaussian | num_random_features:   512   | length_scale:  1.000  | use_offset:  True   | rng_seed:   42    | Best trial 1 (energy 1.48 meV/atom - forces 96.7 meV/Ang)
2026-02-10 16:16:34.821 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:17:05.039 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:17:49.492 INFO (rank 0): Trial   2 | rf_type: gaussian | num_random_features:   512   | length_scale:  5.000  | use_offset:  True   | rng_seed:   42    | Best trial 2 (energy 0.38 meV/atom - forces 28.1 meV/Ang)
2026-02-10 16:17:54.961 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:18:25.419 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:19:10.188 INFO (rank 0): Trial   3 | rf_type: gaussian | num_random_features:   512   | length_scale: 10.000  | use_offset:  True   | rng_seed:   42    | Best trial 3 (energy 0.34 meV/atom - forces 25.9 meV/Ang)
2026-02-10 16:19:15.820 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:19:48.589 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:20:35.754 INFO (rank 0): Trial   4 | rf_type: gaussian | num_random_features:   512   | length_scale: 15.000  | use_offset:  True   | rng_seed:   42    | Best trial 4 (energy 0.32 meV/atom - forces 25.3 meV/Ang)
2026-02-10 16:20:41.424 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:21:13.832 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:21:58.791 INFO (rank 0): Trial   5 | rf_type: gaussian | num_random_features:   512   | length_scale: 20.000  | use_offset:  True   | rng_seed:   42    | Best trial 5 (energy 0.34 meV/atom - forces 25.0 meV/Ang)
2026-02-10 16:22:04.323 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:22:34.677 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:23:20.463 INFO (rank 0): Trial   6 | rf_type: gaussian | num_random_features:   512   | length_scale: 25.000  | use_offset:  True   | rng_seed:   42    | Best trial 6 (energy 0.33 meV/atom - forces 24.8 meV/Ang)
2026-02-10 16:23:26.247 INFO (rank 0): jacobian chunk size automatically set to 32
2026-02-10 16:23:58.504 WARNING (rank 0): `leading_eig` normalization has high memory usage. If you encounter OOM errors try to disable it.
2026-02-10 16:24:45.309 INFO (rank 0): Trial   7 | rf_type: gaussian | num_random_features:   512   | length_scale: 30.000  | use_offset:  True   | rng_seed:   42    | Best trial 7 (energy 0.32 meV/atom - forces 24.6 meV/Ang)
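
The log above reports one trial per length-scale, which is consistent with the solver parameters being swept inside each trial. A general reason such sweeps are cheap for ridge-type solvers is that the expensive covariance terms can be computed once and reused for every penalty value. The sketch below illustrates this idea with plain NumPy on synthetic data. It is not franken's solver, and every name in it is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: 200 samples, 16 features.
X = rng.normal(size=(200, 16))
w_true = rng.normal(size=16)
y = X @ w_true + 0.01 * rng.normal(size=200)

# Expensive part, done once: covariance-like terms of the normal equations.
cov = X.T @ X
rhs = X.T @ y

# Cheap part, repeated per penalty: one small linear solve each.
for l2 in np.logspace(-10, -5, 5):
    w = np.linalg.solve(cov + l2 * np.eye(cov.shape[0]), rhs)
    err = np.abs(X @ w - y).mean()
    print(f"l2={l2:.2e}  train MAE={err:.4f}")
```

Under this assumption, adding more solver values costs only extra small solves, while each extra length-scale requires recomputing the features and covariance terms.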

Analysing the results

The autotune procedure has two main outputs: the model trained with the best hyperparameters, which is saved at "results/best_ckpt.pt", and the logs, which describe the errors of all the trained models. Here we analyze the error log; in the molecular_dynamics notebook we will use the trained model to perform some MD simulations.

[8]:

# We load the full logs for all training runs and the logs for just the best model.
with open(run_path / "log.json", "r") as fh:
    all_logs = json.load(fh)
with open(run_path / "best.json", "r") as fh:
    best_log = json.load(fh)

[9]:

best_ls = best_log["hyperparameters"]["random_features"]["length_scale"]
best_l2 = best_log["hyperparameters"]["solver"]["l2_penalty"]
best_fw = best_log["hyperparameters"]["solver"]["force_weight"]
print("Best hyperparameters: ")
print(f"\tLength-scale: {best_ls:.1f}")
print(f"\tL2 penalty: {best_l2:.2e}")
print(f"\tForce-weight: {best_fw:.3f}")

Best hyperparameters:
        Length-scale: 30.0
        L2 penalty: 1.78e-09
        Force-weight: 0.010

To make the analysis easier, we convert the JSON logs to a pandas DataFrame.

[10]:

logs_df = pd.json_normalize(all_logs)  # flattens nested dictionaries and converts to DataFrame
logs_df.head()

[10]:

	checkpoint.hash	checkpoint.rf_weight_id	timings.cov_coeffs	timings.solve	metrics.train.energy_MAE	metrics.train.forces_MAE	metrics.validation.energy_MAE	metrics.validation.forces_MAE	hyperparameters.franken.path_or_id	hyperparameters.franken.interaction_block	hyperparameters.franken.family	hyperparameters.random_features.rf_type	hyperparameters.random_features.num_random_features	hyperparameters.random_features.length_scale	hyperparameters.random_features.use_offset	hyperparameters.random_features.rng_seed	hyperparameters.input_scaler.scale_by_Z	hyperparameters.solver.force_weight	hyperparameters.solver.l2_penalty	hyperparameters.solver.dtype
0	1566c4e8d00298883c7def9b5e59e352	0	37.597101	0.005464	0.038367	63.922082	1.217128	97.950285	mace_mp/small	2	mace	gaussian	512	1.0	True	42	True	0.01	1.000000e-10	torch.float64
1	1566c4e8d00298883c7def9b5e59e352	1	37.597101	0.001383	0.038376	63.921710	1.217229	97.946695	mace_mp/small	2	mace	gaussian	512	1.0	True	42	True	0.01	1.778279e-09	torch.float64
2	1566c4e8d00298883c7def9b5e59e352	2	37.597101	0.001512	0.038521	63.916007	1.219013	97.884128	mace_mp/small	2	mace	gaussian	512	1.0	True	42	True	0.01	3.162278e-08	torch.float64
3	1566c4e8d00298883c7def9b5e59e352	3	37.597101	0.001474	0.041016	63.985303	1.246055	97.092725	mace_mp/small	2	mace	gaussian	512	1.0	True	42	True	0.01	5.623413e-07	torch.float64
4	1566c4e8d00298883c7def9b5e59e352	4	37.597101	0.001303	0.063570	70.761652	1.362110	98.960269	mace_mp/small	2	mace	gaussian	512	1.0	True	42	True	0.01	1.000000e-05	torch.float64
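
The dotted column names in the table come from how pd.json_normalize flattens nested dictionaries. The sketch below shows this on two hand-written records that only mimic the nested layout of the real logs. The values are invented for the example and are not taken from an actual run.

```python
import pandas as pd

# Hypothetical log entries with the same kind of nesting as the real logs.
records = [
    {
        "metrics": {"validation": {"forces_MAE": 25.0}},
        "hyperparameters": {"solver": {"l2_penalty": 1e-10, "force_weight": 0.01}},
    },
    {
        "metrics": {"validation": {"forces_MAE": 24.6}},
        "hyperparameters": {"solver": {"l2_penalty": 1e-5, "force_weight": 0.5}},
    },
]

# Nested keys become dot-separated column names.
df = pd.json_normalize(records)
print(sorted(df.columns))

# With flat columns, selecting the row with the lowest validation error is short.
best_row = df.loc[df["metrics.validation.forces_MAE"].idxmin()]
print(best_row["hyperparameters.solver.force_weight"])
```

The same idxmin pattern can be applied to logs_df to locate the best trial for any of the logged metrics.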

Since we have three different hyperparameters, it’s hard to visualize their behavior all at once. We start by analyzing how the errors on both force and energy predictions change as the force weight goes from very low to very high.

[11]:

df_fw = logs_df[  # Fix the other two hyperparameters
    (logs_df["hyperparameters.random_features.length_scale"] == best_ls) &
    ((logs_df["hyperparameters.solver.l2_penalty"] - 5.62341325e-07).abs() < 1e-12)
]
df_fw = df_fw.sort_values("hyperparameters.solver.force_weight")
fig, ax = plt.subplots()
ax.plot(
    df_fw["hyperparameters.solver.force_weight"],
    df_fw["metrics.validation.energy_MAE"],
    label="Energy MAE"
)
ax2 = ax.twinx()
ax2.plot(
    df_fw["hyperparameters.solver.force_weight"],
    df_fw["metrics.validation.forces_MAE"],
    label="Forces MAE",
    c='r'
)
fig.legend(loc='upper center')
ax.set_xlabel("Force weight")
ax.set_ylabel("Energy MAE [meV/atom]")
ax2.set_ylabel("Forces MAE [meV/A]")

[11]:

Text(0, 0.5, 'Forces MAE [meV/A]')
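
The filter in the cell above selects the L2 penalty with a tolerance instead of `==`. The sketch below shows why, using NumPy only. It assumes the penalties were generated as in numpy.logspace, and it also points out that the default absolute tolerance of np.isclose (1e-8) is too loose for values this small, so it is set to zero here.

```python
import numpy as np

values = np.logspace(-10, -5, 5)
target = 5.62341325e-07  # rounded literal, as typed in the filter above

# Exact equality fails because the literal is a rounded version of the value.
exact_match = values == target
print("exact:   ", exact_match)

# A purely relative tolerance picks out exactly one grid point.
close_match = np.isclose(values, target, rtol=1e-6, atol=0.0)
print("relative:", close_match)

# With the default atol=1e-8, distinct tiny values are wrongly treated as equal.
print("default: ", np.isclose(1e-10, 1.78e-09))
```

A relative tolerance is usually the safer choice when the grid spans many orders of magnitude, as the L2 penalty grid does.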

Next we analyze the behaviour of the length-scale and L2 penalty hyperparameters.

[12]:

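# Select force_weight == 0.5 with a tolerance, since exact float equality is fragile
df_ker = logs_df[(logs_df["hyperparameters.solver.force_weight"] - 0.5).abs() < 1e-12]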
fig, ax = plt.subplots()
pivot = df_ker.pivot_table(
    index="hyperparameters.solver.l2_penalty",
    columns="hyperparameters.random_features.length_scale",
    values="metrics.validation.forces_MAE"
)
im = ax.imshow(pivot, norm=matplotlib.colors.Normalize(vmin=25, vmax=30), cmap="viridis_r")
cb = fig.colorbar(im)
cb.set_label("Forces MAE [meV/A]")
ax.set_xticks(range(len(pivot.columns)), pivot.columns)
ax.set_xlabel("Length-scale")
ax.set_yticks(range(len(pivot.index)), [f"{i:.1e}" for i in pivot.index])
ax.set_ylabel("L2 penalty")

[12]:

Text(0, 0.5, 'L2 penalty')
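
Besides reading the heatmap by eye, the best cell of such a pivot table can be located programmatically. The sketch below does this on a small toy table. The shortened column names and the numbers are hypothetical and only stand in for the columns of logs_df.

```python
import numpy as np
import pandas as pd

# Toy results table (hypothetical values and shortened column names).
df = pd.DataFrame(
    {
        "l2_penalty": [1e-10, 1e-10, 1e-5, 1e-5],
        "length_scale": [1.0, 30.0, 1.0, 30.0],
        "forces_MAE": [97.9, 24.6, 98.9, 25.1],
    }
)

# Rows are penalties and columns are length-scales, as in the heatmap.
pivot = df.pivot_table(index="l2_penalty", columns="length_scale", values="forces_MAE")

# Find the (row, column) position of the smallest entry, ignoring NaNs.
i, j = np.unravel_index(np.nanargmin(pivot.to_numpy()), pivot.shape)
best_l2 = pivot.index[i]
best_ls = pivot.columns[j]
print(f"lowest forces MAE at l2_penalty={best_l2:.1e}, length_scale={best_ls}")
```

Because the row and column order of the pivot table matches the image drawn by imshow, the indices i and j also identify the corresponding cell in the plot.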
