Training a model on forces and energies

In addition to the energy, machine learning models can also be used to model molecular forces. These are \(N_\mathrm{atoms} \times 3\) arrays describing the Cartesian force acting on each atom due to the overall (potential) energy. They are formally defined as the negative gradient of the energy \(E_\mathrm{pot}\) with respect to the nuclear positions \(\mathbf{R}\)

\begin{equation} \mathbf{F}^{(\alpha)} = -\frac{\partial E_\mathrm{pot}}{\partial \mathbf{R}^{(\alpha)}}, \end{equation}

where \(\alpha\) is the index of the nucleus.

The above expression offers a straightforward way to include forces in machine learning models by simply defining a model for the energy and taking the appropriate derivatives. The resulting model can directly be trained on energies and forces. Moreover, in this manner energy conservation and the correct behaviour under rotations of the molecule is guaranteed.

Using forces in addition to energies to construct a machine learning model offers several advantages. Accurate force predictions are important for molecular dynamics simulations, which will be covered in the subsequent tutorial. Forces also encode a greater wealth of information than the energies. For every molecule, only one energy is present, while there are \(3N_\mathrm{atoms}\) force entries. This property, combined with the fact that reference forces can be computed at the same cost as energies, makes models trained on forces and energies very data efficient.

In the following, we will show how to train such force models and how to use them in practical applications.

Preparing the data

The process of preparing the data is similar to the tutorial on QM9. We begin by importing all relevant packages and generating a directory for the tutorial experiments.

import torch
import torchmetrics
import schnetpack as spk
import schnetpack.transform as trn
import pytorch_lightning as pl
import os
import matplotlib.pyplot as plt
import numpy as np

forcetut = './forcetut'
if not os.path.exists(forcetut):

Next, the data needs to be loaded from a suitable dataset. For convenience, we use the MD17 dataset class provided in SchNetPack, which automatically downloads and builds suitable databases containing energies and forces for a range of small organic molecules. In this case, we use the ethanol molecule as an example.

from schnetpack.datasets import MD17

ethanol_data = MD17(
        trn.RemoveOffsets(, remove_mean=True, remove_atomrefs=False),
    pin_memory=True, # set to false, when not using a GPU
INFO:root:Downloading ethanol data
INFO:root:Parsing molecule ethanol
INFO:root:Write atoms to db...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:02<00:00, 33.40it/s]

The data is split into training (1000 points), validation (1000 points) and test set (remainder). Once again, we subtract the mean of the energies in the training data with a preprocessing transform to precondition our model. This only needs to be done for the energies, since the forces are obtained as derivatives and automatically capture the scale of the data. The subtraction of atomic reference energies is not necessary here, since only molecules of the same composition are used.

For custom datasets, the data would have to be loaded via the SchNetPack ASEAtomsData and AtomsDataModule classes (see tutorial on data preparation). In this case, one needs to make sure that the naming of properties is kept consistent with the config. The module provides standard names for a wide range of properties. Here, we use the definitions provided with the MD17 class.

In order to train force models, forces need to be included in the reference data. Once the dataset has been loaded, this can be checked as follows:

properties = ethanol_data.dataset[0]
print('Loaded properties:\n', *['{:s}\n'.format(i) for i in properties.keys()])
Loaded properties:

As can be seen, energy and forces are included in the properties dictionary. To have a look at the forces array and check whether it has the expected dimensions, we can call:

print('Forces:\n', properties[MD17.forces])
print('Shape:\n', properties[MD17.forces].shape)
 tensor([[ 1.4517e+00,  6.0192e+00,  5.2068e-07],
        [ 1.7953e+01, -5.1624e+00,  3.4900e-07],
        [-4.0884e+00,  2.2590e+01,  3.3088e-06],
        [-1.1416e+00, -9.7469e+00,  7.6473e+00],
        [-1.1416e+00, -9.7469e+00, -7.6473e+00],
        [-2.4821e+00,  4.9335e+00,  4.3700e+00],
        [-2.4821e+00,  4.9335e+00, -4.3700e+00],
        [-5.5148e+00, -3.0207e+00, -8.9093e-09],
        [-2.4393e+00, -1.0838e+01, -6.0721e-08]], dtype=torch.float64)
 torch.Size([9, 3])

Building the model

After having prepared the data in the above way, we can now build and train the force model. This is done in the same three steps as described in QM9 tutorial:

  1. Defining input modules

  2. Building the representation

  3. Defining an output module

For the representation we can use the same layers as in the previous tutorial:

cutoff = 5.
n_atom_basis = 30

pairwise_distance = spk.atomistic.PairwiseDistances() # calculates pairwise distances between atoms
radial_basis = spk.nn.GaussianRBF(n_rbf=20, cutoff=cutoff)
schnet = spk.representation.SchNet(
    n_atom_basis=n_atom_basis, n_interactions=3,

Since we want to model forces, we need an additional output module. We will still use the Atomwise to predict the energy. However, since the forces should be described as the derivative of the energy, we have to indicate that the corresponding derviative of the model should be computed.

This is done with the Forces module, which computes the negative derivative of the energy (specified by the supplied energy_key) with respect to the atom positions.

pred_energy = spk.atomistic.Atomwise(n_in=n_atom_basis,
pred_forces = spk.atomistic.Forces(, force_key=MD17.forces)

The input, representation and output modules are then assembled to the neural network potential:

nnpot = spk.model.NeuralNetworkPotential(
    output_modules=[pred_energy, pred_forces],
        trn.AddOffsets(, add_mean=True, add_atomrefs=False)

Training the model

Before we can train the model, the training task has to be defined, including the model, loss functions and optimizers. First, the outputs of the models are connected to their respective loss functions using ModelOutput. To train the model on energies and forces, we will use a combined loss function:

\begin{equation} \mathcal{L}(E_\mathrm{ref},\mathbf{F}_\mathrm{ref},E_\mathrm{pred}, \mathbf{F}_\mathrm{pred}) = \frac{1}{n_\text{train}} \sum_{n=1}^{n_\text{train}} \left[ \rho_1 \left( E_\mathrm{ref} - E_\mathrm{pred} \right)^2 + \frac{\rho_2}{3N_\mathrm{atoms}} \sum^{N_\mathrm{atoms}}_\alpha \left\| \mathbf{F}_\mathrm{ref}^{(\alpha)} - \mathbf{F}_\mathrm{pred}^{(\alpha)} \right\|^2 \right], \end{equation}

where we take the predicted forces to be:

\begin{equation} \mathbf{F}_\mathrm{pred}^{(\alpha)} = -\frac{\partial E_\mathrm{pred}}{\partial \mathbf{R}^{(\alpha)}}. \end{equation}

We have introduced the loss weights \(\rho_i\) in order to control the tradeoff between energy and force loss. In SchNetPack, we can implement such a weighted loss function by setting the loss weights of ModelOutput:

output_energy = spk.task.ModelOutput(,
        "MAE": torchmetrics.MeanAbsoluteError()

output_forces = spk.task.ModelOutput(
        "MAE": torchmetrics.MeanAbsoluteError()

Now, the training task can be assembled as in the last tutorial:

task = spk.task.AtomisticTask(
    outputs=[output_energy, output_forces],
    optimizer_args={"lr": 1e-4}
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiplvxb3g
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiplvxb3g/

Finally, we train the model using the PyTorch Lightning Trainer for 5 epochs.

logger = pl.loggers.TensorBoardLogger(save_dir=forcetut)
callbacks = [
        model_path=os.path.join(forcetut, "best_inference_model"),

trainer = pl.Trainer(
    max_epochs=5, # for testing, we restrict the number of epochs
), datamodule=ethanol_data)
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/mitx/anaconda3/envs/spk2_test/lib/python3.8/site-packages/pytorch_lightning/trainer/ PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
Missing logger folder: ./forcetut/lightning_logs

  | Name    | Type                   | Params
0 | model   | NeuralNetworkPotential | 16.4 K
1 | outputs | ModuleList             | 0
16.4 K    Trainable params
0         Non-trainable params
16.4 K    Total params
0.066     Total estimated model params size (MB)
/home/mitx/anaconda3/envs/spk2_test/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/ PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
/home/mitx/anaconda3/envs/spk2_test/lib/python3.8/site-packages/pytorch_lightning/utilities/ UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 10. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
/home/mitx/anaconda3/envs/spk2_test/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/ PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.

Training will produce several files in the model_path directory, which is forcetut in our case. The split is stored in split.npz. Checkpoints are written to checkpoints periodically, which can be used to restart training. A copy of the best model is stored as best_inference_model, which can directly be accessed using the torch.load function.

You can have a look at the log using Tensorboard:

tensorboard --logdir=forcetut/default

It should be noted that the model trained here is used exclusively for demonstrative purposes. Accordingly, its size and the training time have been reduced significantly. This puts strong constraints on the accuracy that can be obtained. For practical applications, one would e.g. increase the number of features, the interaction layers, the learning rate schedule and train until convergence (removing the n_epochs keyword from the trainer). To quickly get started training state-of-the-art models, have a look at the command line interface, which comes with a series of pre-built configurations.

Using the model

Since all models in SchNetPack are stored in the same way, we can use the trained force model in exactly the same manner as described in the QM9 tutorial. To load the model stored in the best_inference_model file, we use the torch.load function. It will automatically be moved to the device it was trained on. The AtomsConverter can then be used to directly operate on ASE atoms objects (e.g. a molecule loaded from a file).

from ase import Atoms

# set device
device = torch.device("cuda")

# load model
model_path = os.path.join(forcetut, "best_inference_model")
best_model = torch.load(model_path, map_location=device)

# set up converter
converter = spk.interfaces.AtomsConverter(
    neighbor_list=trn.ASENeighborList(cutoff=5.0), dtype=torch.float32, device=device

# create atoms object from dataset
structure = ethanol_data.test_dataset[0]
atoms = Atoms(
    numbers=structure[], positions=structure[]

# convert atoms to SchNetPack inputs and perform prediction
inputs = converter(atoms)
results = best_model(inputs)

{'energy': tensor([-97197.4077], dtype=torch.float64, grad_fn=<AddBackward0>), 'forces': tensor([[-37.4005,  -8.7528,  17.0417],
        [ -5.2704, -20.8051,  -6.0660],
        [  8.2791,  -1.1684,  55.5458],
        [ 31.0775,  51.3363, -44.3595],
        [ 16.1287,  -3.1426,   3.2068],
        [ 43.6367,  18.0510,   4.7904],
        [-14.2506, -14.6068,  -2.0782],
        [-13.4426,  -3.2766,   2.2518],
        [-28.7580, -17.6352, -30.3327]], dtype=torch.float64,

Interface to ASE

Having access to molecular forces also makes it possible to perform a variety of different simulations. The SpkCalculator offers a simple way to perform all computations available in the ASE package (QM9 tutorial). Below, we create an ASE calculator from the trained model and the previously generated atoms object (see Preparing the data). One important point is, that the MD17 dataset uses kcal/mol and kcal/mol/Å as units for energies and forces. For use with ASE, these need to be converted to the standard internal ASE units eV and eV/Å. To do so, we need to pass the units of the energies and positions used by the model to the calculator. The calculator will use these to derive the units of properties like forces and stress.

calculator = spk.interfaces.SpkCalculator(


print("energy:", atoms.get_total_energy())
print("forces:", atoms.get_forces())
INFO:schnetpack.interfaces.ase_interface:Loading model from ./forcetut/best_inference_model
energy: -4214.878303077072
forces: [[-1.62183823 -0.37955435  0.73899666]
 [-0.22854481 -0.90219292 -0.26304624]
 [ 0.35901742 -0.05066551  2.4086958 ]
 [ 1.34764766  2.2261533  -1.92360918]
 [ 0.6994068  -0.13627476  0.13906061]
 [ 1.89226687  0.78276571  0.20772979]
 [-0.61796511 -0.63340956 -0.09012016]
 [-0.58292702 -0.14208913  0.097645  ]
 [-1.24706339 -0.76473293 -1.31535204]]

Among the simulations which can be done by using ASE and a force model are geometry optimisation, normal mode analysis and simple molecular dynamics simulations.

The AseInterface of SchNetPack offers a convenient way to perform basic versions of these computations. Only a file specifying the geometry of the molecule and a pretrained model are needed.

We will first generate a XYZ file containing an ethanol configuration:

from ase import io

# Generate a directory for the ASE computations
ase_dir = os.path.join(forcetut, 'ase_calcs')

if not os.path.exists(ase_dir):

# Write a sample molecule
molecule_path = os.path.join(ase_dir, '')
io.write(molecule_path, atoms, format='xyz')

The AseInterface is initialized by passing the path to the molecule, the model and a computation directory. In addition, the names of energies and forces model output, as well as their units, need to be provided (similar to the SpkCalculator). Computation device and floating point precision can be set via the device and dtype arguments.

ethanol_ase = spk.interfaces.AseInterface(
INFO:schnetpack.interfaces.ase_interface:Loading model from ./forcetut/best_inference_model

Geometry optimization

For some applications it is neccessary to relax a molecule to an energy minimum. In order to perform this optimization of the molecular geometry, we can simply call

                Step[ FC]     Time          Energy          fmax
BFGSLineSearch:    0[  0] 17:15:32    -4214.878487        3.2361
BFGSLineSearch:    1[  2] 17:15:32    -4215.334953        2.1019
BFGSLineSearch:    2[  4] 17:15:33    -4215.508052        1.0978
BFGSLineSearch:    3[  6] 17:15:33    -4215.569028        1.1677
BFGSLineSearch:    4[  8] 17:15:33    -4215.632554        0.5485
BFGSLineSearch:    5[ 10] 17:15:33    -4215.653430        0.5177
BFGSLineSearch:    6[ 12] 17:15:33    -4215.679364        0.5736
BFGSLineSearch:    7[ 14] 17:15:33    -4215.711106        0.3711
BFGSLineSearch:    8[ 16] 17:15:33    -4215.724128        0.5958
BFGSLineSearch:    9[ 18] 17:15:34    -4215.752866        0.4883
BFGSLineSearch:   10[ 20] 17:15:34    -4215.780940        0.6177
BFGSLineSearch:   11[ 22] 17:15:34    -4215.791426        0.5257
BFGSLineSearch:   12[ 24] 17:15:34    -4215.805315        0.4893
BFGSLineSearch:   13[ 26] 17:15:34    -4215.812500        0.3287
BFGSLineSearch:   14[ 28] 17:15:35    -4215.815933        0.2076
BFGSLineSearch:   15[ 29] 17:15:35    -4215.822284        0.2128
BFGSLineSearch:   16[ 30] 17:15:35    -4215.826568        0.3069
BFGSLineSearch:   17[ 33] 17:15:35    -4215.835913        0.3087
BFGSLineSearch:   18[ 35] 17:15:35    -4215.844832        0.2865
BFGSLineSearch:   19[ 37] 17:15:35    -4215.847019        0.1888
BFGSLineSearch:   20[ 38] 17:15:35    -4215.852365        0.1112
BFGSLineSearch:   21[ 39] 17:15:35    -4215.856973        0.0804
BFGSLineSearch:   22[ 40] 17:15:35    -4215.859844        0.1154
BFGSLineSearch:   23[ 41] 17:15:36    -4215.861084        0.0507
BFGSLineSearch:   24[ 42] 17:15:36    -4215.862084        0.0495
BFGSLineSearch:   25[ 43] 17:15:36    -4215.862876        0.0420
BFGSLineSearch:   26[ 44] 17:15:36    -4215.863316        0.0374
BFGSLineSearch:   27[ 45] 17:15:36    -4215.863477        0.0165
BFGSLineSearch:   28[ 46] 17:15:36    -4215.863523        0.0105
BFGSLineSearch:   29[ 47] 17:15:36    -4215.863533        0.0054

Since we trained only a reduced model, the accuracy of energies and forces is not optimal and several steps are needed to optimize the geometry.

Normal mode analysis

Once the geometry was optimized, normal mode frequencies can be obtained from the Hessian (matrix of second derivatives) of the molecule. The Hessian is a measure of the curvature of the potential energy surface and normal mode frequencies are useful for determining, whether an optimization has reached a minimum. Using the AseInterface, normal mode frequencies can be obtained via:

  #    meV     cm^-1
  0   11.8i     94.8i
  1    2.0i     16.1i
  2    0.9i      6.9i
  3    0.0i      0.3i
  4    0.0       0.1
  5    0.0       0.2
  6    0.5       4.0
  7   36.0     290.2
  8   48.6     392.1
  9   70.6     569.0
 10  104.4     842.0
 11  111.1     895.9
 12  118.6     956.4
 13  123.6     996.5
 14  128.2    1034.1
 15  149.8    1208.5
 16  156.0    1258.4
 17  163.1    1315.4
 18  170.0    1371.4
 19  171.4    1382.4
 20  178.7    1441.7
 21  337.5    2722.0
 22  378.6    3053.7
 23  382.2    3083.0
 24  393.1    3170.7
 25  407.5    3286.7
 26  412.2    3324.7
Zero-point energy: 2.021 eV

Imaginary frequencies indicate, that the geometry optimisation has not yet reached a minimum. The AseInterface also creates an file which can be used to visualize the vibrations with jmol.

Molecular dynamics

Finally, it is also possible to basic run molecular dynamics simulations using this interface. To do so, we first need to prepare the system, where we specify the simulation file. This routine automatically initializes the velocities of the atoms to a random number corresponding to a certain average kinetic energy.

/home/mitx/anaconda3/envs/spk2_test/lib/python3.8/site-packages/ase/md/ FutureWarning: Specify the temperature in K using the 'temperature_K' argument

The actual simulation is performed by calling the function run_md with a certain number of steps:


During simulation, energies and geometries are logged to simulation.log and simulation.traj, respectively.

We can for example visualize the evolution of the systems total and potential energies as

# Load logged results
results = np.loadtxt(os.path.join(ase_dir, 'simulation.log'), skiprows=1)

# Determine time axis
time = results[:,0]

# Load energies
energy_tot = results[:,1]
energy_pot = results[:,2]
energy_kin = results[:,3]

# Construct figure

# Plot energies
plt.plot(time, energy_tot, label='Total energy')
plt.plot(time, energy_pot, label='Potential energy')
plt.ylabel('E [eV]')

plt.plot(time, energy_kin, label='Kinetic energy')
plt.ylabel('E [eV]')
plt.xlabel('Time [ps]')

temperature = results[:,4]
print('Average temperature: {:10.2f} K'.format(np.mean(temperature)))
Average temperature:     197.47 K

As can be seen, the potential and kinetic energies fluctuate, while the total energy (sum of potential and kinetic energy) remains approximately constant. This is a good demonstration for the energy conservation obtained by modeling forces as energy derivatives. Unfortunately, this also means that energy conservation is not a sufficient measure for the quality of the potential.

However, frequently one is interested in simulations where the system is coupled to an external heat bath. This is the same as saying that we wish to keep the average kinetic energy of the system and hence temperature close to a certain value. Currently, the average temperature only depends on the random velocities drawn during the initialization of the dynamics. Keeping a constant temperature average can be achieved by using a so-called thermostat. In the AseInterface, simulations with a thermostat (to be precise a Langevin thermostat) can be carried out by providing the temp_bath keyword. A simulation with e.g. the target temperature of 300K is performed via:

ethanol_ase.optimize(fmax=1e-2) # reoptimize structure

                Step[ FC]     Time          Energy          fmax
BFGSLineSearch:    0[  0] 17:16:32    -4215.625438        1.4607
BFGSLineSearch:    1[  2] 17:16:33    -4215.742612        1.0855
BFGSLineSearch:    2[  4] 17:16:33    -4215.772789        0.7915
BFGSLineSearch:    3[  6] 17:16:33    -4215.805078        0.5537
BFGSLineSearch:    4[  8] 17:16:33    -4215.811527        0.4483
BFGSLineSearch:    5[  9] 17:16:33    -4215.824273        0.2383
BFGSLineSearch:    6[ 11] 17:16:33    -4215.831291        0.3442
BFGSLineSearch:    7[ 13] 17:16:33    -4215.834314        0.2057
BFGSLineSearch:    8[ 14] 17:16:33    -4215.838707        0.1232
BFGSLineSearch:    9[ 15] 17:16:33    -4215.842975        0.1588
BFGSLineSearch:   10[ 16] 17:16:34    -4215.845716        0.1364
BFGSLineSearch:   11[ 17] 17:16:34    -4215.847659        0.0975
BFGSLineSearch:   12[ 18] 17:16:34    -4215.848325        0.0790
BFGSLineSearch:   13[ 19] 17:16:34    -4215.848835        0.0570
BFGSLineSearch:   14[ 21] 17:16:34    -4215.849115        0.0382
BFGSLineSearch:   15[ 23] 17:16:34    -4215.849546        0.0626
BFGSLineSearch:   16[ 25] 17:16:35    -4215.850433        0.0838
BFGSLineSearch:   17[ 28] 17:16:35    -4215.853214        0.1644
BFGSLineSearch:   18[ 29] 17:16:35    -4215.855430        0.1865
BFGSLineSearch:   19[ 30] 17:16:35    -4215.858360        0.1516
BFGSLineSearch:   20[ 32] 17:16:35    -4215.859622        0.1566
BFGSLineSearch:   21[ 33] 17:16:36    -4215.862199        0.0752
BFGSLineSearch:   22[ 34] 17:16:36    -4215.863491        0.1810
BFGSLineSearch:   23[ 35] 17:16:36    -4215.864008        0.0852
BFGSLineSearch:   24[ 37] 17:16:36    -4215.864357        0.0353
BFGSLineSearch:   25[ 38] 17:16:36    -4215.864445        0.0179
BFGSLineSearch:   26[ 39] 17:16:36    -4215.864465        0.0105
BFGSLineSearch:   27[ 40] 17:16:37    -4215.864472        0.0065

We can now once again plot total and potential energies. Instead of the kinetic energy, we plot the temperature (both quantities are directly related).

skip_initial = 5000

# Load logged results
results = np.loadtxt(os.path.join(ase_dir, 'simulation_300K.log'), skiprows=1)

# Determine time axis
time = results[skip_initial:,0]
# Load energies
energy_tot = results[skip_initial:,1]
energy_pot = results[skip_initial:,2]

# Construct figure

# Plot energies
plt.plot(time, energy_tot, label='Total energy')
plt.plot(time, energy_pot, label='Potential energy')
plt.ylabel('Energies [eV]')

# Plot Temperature
temperature = results[skip_initial:,4]

# Compute average temperature
print('Average temperature: {:10.2f} K'.format(np.mean(temperature)))

plt.plot(time, temperature, label='Simulation')
plt.ylabel('Temperature [K]')
plt.xlabel('Time [ps]')
plt.plot(time, np.ones_like(temperature)*300, label='Target')
Average temperature:     261.25 K

Since our molecule is now subjected to external influences via the thermostat the total energy is no longer conserved. However, the simulation temperature now fluctuates near to the requested 300K. This can also be seen by computing the temperature average over time, which is now close to the desired value in contrast to the previous simulation.


In this tutorial, we have trained a SchNet model on energies and forces using the MD17 ethanol dataset as an example. We have then evaluated the performance of the model and performed geometry optimisation, normal mode analysis and basic molecular dynamic simulations using the SchNetPack ASE interface.

While these simulations can already be useful for practical applications, SchNetPack also comes with its own molecular dynamics package. This package makes it possible to run efficient simulations on GPU and also offers access to advanced techniques, such as ring polymer dynamics. In the next tutorial, we will cover how to perform molecular dynamics simulations directly with SchNetPack.