AutotuneConfig

class franken.config.AutotuneConfig(dataset, solver, backbone, rfs, rf_normalization='leading_eig', save_every_model=False, dtype='float64', save_fmaps=False, metrics=<factory>, best_model_selection=<factory>, scale_by_species=True, jac_chunk_size='auto', run_dir='.', seed=1337, console_logging_level='INFO', eval_splits=None, atomic_energies=None)

Bases: object

Configure automatic hyperparameter tuning for franken.

The program will run a grid-search over certain hyperparameters (denoted by ‘HYPERPARAMETER’ in the help-text), which can be configured using one of the following three forms:

1. a simple float or int value,
2. a list of float or int values,
3. a 4-tuple (<start>, <stop>, <num>, <'log' | 'linear'>) which specifies a linear or logarithmic range.
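A minimal sketch of how the three forms could be expanded into the list of values a grid-search iterates over. The helper expand_hyperparameter is hypothetical, not part of franken. It also assumes that for 'log' spacing the start and stop entries are the endpoint values themselves (geometric spacing) rather than base-10 exponents as in numpy.logspace; check the help-text for the actual convention.

```python
import math


def expand_hyperparameter(spec):
    """Hypothetical helper: expand one HYPERPARAMETER spec into a list of grid values.

    Accepted forms:
      1. a single int or float,
      2. a list of ints or floats,
      3. a 4-tuple (start, stop, num, 'log' | 'linear').
    """
    # Form 3: a range specification; the last element names the spacing.
    if isinstance(spec, tuple) and len(spec) == 4 and isinstance(spec[3], str):
        start, stop, num, scale = spec
        if num < 1:
            raise ValueError("num must be at least 1")
        if num == 1:
            return [float(start)]
        if scale == "linear":
            step = (stop - start) / (num - 1)
            return [start + i * step for i in range(num)]
        if scale == "log":
            # Assumption: start and stop are endpoint values (geometric spacing),
            # not base-10 exponents as in numpy.logspace.
            lo, hi = math.log10(start), math.log10(stop)
            step = (hi - lo) / (num - 1)
            return [10 ** (lo + i * step) for i in range(num)]
        raise ValueError(f"unknown spacing {scale!r}, expected 'log' or 'linear'")
    # Form 2: an explicit list of values.
    if isinstance(spec, list):
        return [float(v) for v in spec]
    # Form 1: a single scalar value.
    return [float(spec)]


print(expand_hyperparameter(0.5))                      # [0.5]
print(expand_hyperparameter([1, 2, 4]))                # [1.0, 2.0, 4.0]
print(expand_hyperparameter((0.0, 1.0, 5, "linear")))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

A logarithmic range such as (1e-6, 1e-2, 5, 'log') is the natural choice for penalties that span several orders of magnitude, since it yields one value per decade.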

There are two classes of hyperparameters over which we can perform grid-search: solver parameters (for example the L2 penalty), and random feature parameters. It is much more efficient to try multiple combinations of the solver parameters than of the random feature parameters, so plan your grid accordingly. In particular, some random feature approximations like the multiscale-gaussian do not need any hyperparameter search, reducing the time required for the overall grid-search.
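To see why the two classes of hyperparameters differ in cost, the sketch below counts the steps of a two-level grid. It assumes, without confirmation from this page, that every random-feature setting requires a full pass over the training data, while each solver setting only re-solves a small linear system from cached statistics. The parameter names length_scale, l2_penalty and force_weight are hypothetical placeholders.

```python
import itertools


def combinations(grid):
    """Cartesian product of a {name: [values]} grid, returned as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*grid.values())]


def grid_cost(rf_grid, solver_grid):
    """Count expensive and cheap steps of a two-level grid search.

    Assumption: one expensive feature pass per random-feature setting, and
    one cheap solve per (random-feature setting, solver setting) pair.
    """
    rf_combos = combinations(rf_grid)
    solver_combos = combinations(solver_grid)
    return {
        "feature_passes": len(rf_combos),               # expensive
        "solves": len(rf_combos) * len(solver_combos),  # cheap
    }


# Hypothetical parameter names, for illustration only.
rf_grid = {"length_scale": [1.0, 2.0, 4.0]}
solver_grid = {"l2_penalty": [1e-8, 1e-7, 1e-6, 1e-5, 1e-4], "force_weight": [0.1, 0.5, 0.9]}
print(grid_cost(rf_grid, solver_grid))  # {'feature_passes': 3, 'solves': 45}
```

An empty random-feature grid (as for an approximation that needs no search) still yields exactly one feature pass, because the product over zero parameters contains a single empty combination.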

The program has two sub-commands: backbone, which selects the GNN underlying the franken features, and rfs, which selects the random-feature approximation and its parameters. To view the help-text for a sub-command, run for example franken.autotune backbone:mace -h or franken.autotune backbone:mace rfs:gaussian -h. Note that the backbone must be chosen before the random features.

atomic_energies: dict[int, float] | None = None: Optional dictionary mapping atomic numbers to their reference energies (eV). If provided, these energies will be subtracted from the prediction during training. Format: {atomic_number: energy_value, …}. Example: {1: -0.5, 8: -75.3}
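A minimal sketch of how the atomic_energies mapping translates into a per-structure reference energy. The helper reference_energy and the total energy value are hypothetical; the sketch only illustrates the arithmetic and assumes the model is left to account for the remainder after the reference sum is removed.

```python
def reference_energy(atomic_numbers, atomic_energies):
    """Hypothetical helper: sum of per-atom reference energies (eV) for one structure.

    Raises KeyError if a species is missing from atomic_energies.
    """
    return sum(atomic_energies[z] for z in atomic_numbers)


atomic_energies = {1: -0.5, 8: -75.3}             # the example mapping from the text
water = [8, 1, 1]                                  # atomic numbers of O, H, H
e_ref = reference_energy(water, atomic_energies)  # about -76.3 eV

# Assumption: only the remainder, total_energy - e_ref, is left to the model.
total_energy = -76.4                   # hypothetical reference energy, eV
print(round(total_energy - e_ref, 6))  # -0.1
```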

backbone: BackboneConfig: The GNN backbone which will be used by franken.

best_model_selection: list[str]: Metrics used to select the best model among trials. This does not affect the training loss.
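A minimal sketch of selecting the best trial from recorded validation metrics. It assumes lower is better for every listed metric and that multiple metrics are compared lexicographically (ties on the first are broken by the next); franken's actual rule may differ, and a metric such as forces_cosim, where higher is better, would need its sign flipped. The trial records are hypothetical.

```python
def select_best(trials, selection_metrics):
    """Pick the trial with the lowest metric values, compared lexicographically.

    Assumption: lower is better for each metric in selection_metrics.
    """
    return min(trials, key=lambda t: tuple(t["metrics"][m] for m in selection_metrics))


# Hypothetical trial records, for illustration only.
trials = [
    {"id": 0, "metrics": {"energy_MAE": 1.2, "forces_MAE": 40.0}},
    {"id": 1, "metrics": {"energy_MAE": 0.9, "forces_MAE": 45.0}},
    {"id": 2, "metrics": {"energy_MAE": 1.0, "forces_MAE": 38.0}},
]
print(select_best(trials, ["forces_MAE"])["id"])                # 2
print(select_best(trials, ["energy_MAE", "forces_MAE"])["id"])  # 1
```

Because selection happens after training, changing these metrics only changes which trial is kept, consistent with the statement that the training loss is unaffected.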

console_logging_level: Literal['DEBUG', 'INFO', 'WARN', 'ERROR'] = 'INFO': Controls the verbosity of console logging.

dataset: DatasetConfig

Configure a dataset for training Franken.

If --dataset.name corresponds to one of the datasets used in the Franken paper (e.g. “water”, “PtH2O”, “TM23/Ag”, etc.), there is no need to specify train, test or validation paths: the code downloads and preprocesses the data automatically. To use a custom dataset instead, specify at minimum the training path, and ideally also the validation path (which is used to determine the best model during a hyperparameter search).
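The rules above can be summarized as a small validation routine. This is a hypothetical sketch, not franken's code: check_dataset and PAPER_DATASETS are illustrative names, and the set contains only the three dataset names quoted in the text (the real list is longer).

```python
# Only the names quoted in the text; the real list of paper datasets is longer.
PAPER_DATASETS = {"water", "PtH2O", "TM23/Ag"}


def check_dataset(name, train_path=None, val_path=None):
    """Hypothetical validation mirroring the dataset rules described above.

    Returns a list of warnings; raises ValueError for an unusable configuration.
    """
    if name in PAPER_DATASETS:
        return []  # downloaded and preprocessed automatically
    if train_path is None:
        raise ValueError(f"custom dataset {name!r} needs at least a training path")
    warnings = []
    if val_path is None:
        warnings.append("no validation path: the validation split is used to pick the best model")
    return warnings


print(check_dataset("water"))                          # []
print(len(check_dataset("my_data", "train.xyz")))      # 1
```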

dtype: Literal['float32', 'float64'] = 'float64': Data-type for the franken solution. float64 usually yields slightly smaller errors at a small performance cost.

eval_splits: list[str] | None = None: Evaluate only the specified splits, e.g. ['val', 'test']. The default value of None runs the evaluation on all splits.

jac_chunk_size: Literal['auto'] | int = 'auto': Chunk-size for Jacobian calculations. 'auto' attempts to set it based on available GPU memory. If you encounter out-of-memory errors, try setting it manually.
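When picking jac_chunk_size by hand, a back-of-the-envelope estimate of how many Jacobian rows fit in free memory can help. The function below is a hypothetical heuristic, not the rule franken uses for 'auto'; it assumes each row stores n_cols elements of the chosen dtype (8 bytes for float64, 4 for float32) and keeps a safety fraction of memory in reserve.

```python
def auto_chunk_size(free_bytes, n_cols, bytes_per_elem=8, safety=0.5):
    """Hypothetical heuristic: number of Jacobian rows that fit in free memory.

    free_bytes:     free device memory in bytes
    n_cols:         elements per Jacobian row (assumed)
    bytes_per_elem: 8 for float64, 4 for float32
    safety:         fraction of free memory allowed for the Jacobian chunk
    """
    row_bytes = n_cols * bytes_per_elem
    # Always return at least 1 so the computation can proceed row by row.
    return max(1, int(free_bytes * safety) // row_bytes)


# 8 GiB free, rows of 1,000,000 float64 elements.
print(auto_chunk_size(8 * 2**30, 1_000_000))  # 536
```

If out-of-memory errors persist, halving the chunk size roughly halves the peak memory of this step at the cost of more sequential passes.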

metrics: list[str]

Metrics to compute during evaluation.

Options include energy_MAE, forces_MAE, energy_RMSE, forces_RMSE, forces_MAE_species, forces_RMSE_species, and forces_cosim.
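For reference, the standard definitions behind the MAE, RMSE and cosine-similarity metrics are sketched below in plain Python. These are textbook formulas, not franken's implementation: units and normalization (for example per-atom energies) are not specified here, forces_MAE is assumed to average over all atoms and Cartesian components after flattening, and forces_cosim is assumed to be the mean cosine similarity between per-atom predicted and reference force vectors.

```python
import math


def mae(pred, ref):
    """Mean absolute error over flat lists of values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred, ref):
    """Root-mean-square error over flat lists of values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def mean_cosine_similarity(pred_forces, ref_forces):
    """Assumed meaning of forces_cosim: average cosine between per-atom force vectors.

    Zero-length force vectors would need special handling (division by zero).
    """
    total = 0.0
    for p, r in zip(pred_forces, ref_forces):
        dot = sum(a * b for a, b in zip(p, r))
        total += dot / (math.hypot(*p) * math.hypot(*r))
    return total / len(ref_forces)


print(mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.5
print(mean_cosine_similarity([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
                             [[1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]))  # 0.5
```

A cosine similarity of 1 means the predicted force points exactly along the reference force regardless of magnitude, which is why it complements the magnitude-sensitive MAE and RMSE.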

rf_normalization: Literal['leading_eig'] | None = 'leading_eig': Normalization strategy for the covariance matrix.
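A minimal sketch of what a 'leading_eig' normalization could look like, assuming (this page does not confirm it) that the covariance matrix is divided by its largest eigenvalue so that its spectrum is bounded by 1, which would keep the L2 penalty on a comparable scale regardless of feature magnitude. The leading eigenvalue is estimated by power iteration; both helpers are hypothetical.

```python
import math


def leading_eigenvalue(matrix, iters=200):
    """Largest eigenvalue of a symmetric positive semi-definite matrix, by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        lam = norm  # ||A v|| for unit v converges to the top eigenvalue
        v = [x / norm for x in w]
    return lam


def normalize_leading_eig(cov):
    """Assumed meaning of 'leading_eig': rescale so the top eigenvalue becomes 1."""
    lam = leading_eigenvalue(cov)
    if lam == 0.0:
        return [row[:] for row in cov]
    return [[x / lam for x in row] for row in cov]


cov = [[4.0, 1.0], [1.0, 3.0]]  # eigenvalues (7 ± sqrt(5)) / 2
print(round(leading_eigenvalue(cov), 6))                         # 4.618034
print(round(leading_eigenvalue(normalize_leading_eig(cov)), 6))  # 1.0
```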

rfs: RFConfig: Choose the random-feature approximation.

run_dir: str = '.': Directory in which the hyperparameter search results will be saved.

save_every_model: bool = False: If True, saves a checkpoint for every trial; otherwise, only the best model is saved.

save_fmaps: bool = False: Whether to save training feature maps. If the dataset is small (~100 samples), setting this to True can increase the speed of hyperparameter tuning, at the cost of higher memory usage.

scale_by_species: bool = True: How to scale the GNN features. If True, features are scaled individually per species; if False, they are scaled globally (across species).
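The difference between per-species and global scaling is easiest to see on a toy example. This hypothetical sketch assumes scaling means dividing by a standard deviation and uses one scalar feature per atom to stay short; franken's features are vectors and its exact scaling scheme may differ. With per-species scaling, a species whose features are large in magnitude does not dominate the others.

```python
import statistics
from collections import defaultdict


def feature_scales(features, species, by_species=True):
    """Hypothetical: one standard deviation per species, or one shared value.

    features: per-atom scalar features (one number per atom, for brevity)
    species:  atomic numbers, same length as features
    """
    if not by_species:
        s = statistics.pstdev(features)
        return {z: s for z in set(species)}
    groups = defaultdict(list)
    for f, z in zip(features, species):
        groups[z].append(f)
    return {z: statistics.pstdev(vals) for z, vals in groups.items()}


def scale_features(features, species, by_species=True):
    """Divide each atom's feature by the scale of its species (or the global scale)."""
    scales = feature_scales(features, species, by_species)
    return [f / scales[z] for f, z in zip(features, species)]


feats, zs = [1.0, 3.0, 10.0, 30.0], [1, 1, 8, 8]  # two H atoms, two O atoms
print(scale_features(feats, zs))                   # [1.0, 3.0, 1.0, 3.0]
```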

seed: int = 1337: Random seed.

solver: SolverConfig: Configure the franken solver. Hyperparameter search over these is efficient, so the search-grid can be quite fine-grained.
