{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Molecular dynamics in SchNetPack\n",
    "\n",
    "In the [previous tutorial](tutorial_03_force_models.ipynb) we have covered how to train machine learning (ML) models on molecular forces and use them for basic molecular dynamics (MD) simulations with the SchNetPack ASE interface.\n",
    "\n",
    "All these simulations can also be carried out using the native MD package available in SchNetPack.\n",
    "The main ideas behind integrating MD functionality directly into SchNetPack are:\n",
    "- improve performance by reducing the communication overhead between ML models and the MD code and adding the option to use GPUs\n",
    "- adding extended functionality, such as sampling algorithms (e.g. constant temperature or pressure) and ring polymer MD\n",
    "- providing a modular MD environment for easy development and interfacing\n",
    "\n",
    "In the following, we first introduce the general structure of the SchNetPack-MD package.\n",
    "Then the simulation from the previous tutorial will be used as an example to demonstrate how to implement basic MD with SchNetPack-MD.\n",
    "Having done so, we will cover a few advanced simulation techniques, such as ring polymer MD.\n",
    "\n",
    "Finally, we will show how all of these different simulations can be accessed via the `hydra` CLI and config files."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Getting started\n",
    "\n",
    "Before we can begin with the main tutorial, some setup is required.\n",
    "First, we generate a directory in which the simulations will be carried out:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "md_workdir = \"mdtut\"\n",
    "\n",
    "# Gnerate a directory of not present\n",
    "if not os.path.exists(md_workdir):\n",
    "    os.mkdir(md_workdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Since we want to run MD simulations, we need a SchNetPack model trained on forces and a molecular structure as a starting point.\n",
    "In principle, we could use the ethanol model and structure generated in the previous tutorial.\n",
    "However, the model trained in the force tutorial was only intended as a demonstration and is not the most accurate.\n",
    "\n",
    "\n",
    "Instead, we will use a sample ethanol structure, as well as a fully converged PaiNN model of ethanol provided in `schnetpack/tests/testdata` for this tutorial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import schnetpack as spk\n",
    "\n",
    "# Get the parent directory of SchNetPack\n",
    "spk_path = os.path.abspath(os.path.join(os.path.dirname(spk.__file__), \"../..\"))\n",
    "\n",
    "# Get the path to the test data\n",
    "test_path = os.path.join(spk_path, \"tests/testdata\")\n",
    "\n",
    "# Load model and structure\n",
    "model_path = os.path.join(test_path, \"md_ethanol.model\")\n",
    "molecule_path = os.path.join(test_path, \"md_ethanol.xyz\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## MD in SchNetPack\n",
    "\n",
    "In general, a MD code needs to carry out several core tasks during each simulation step. \n",
    "It has to keep track of the positions $\\mathbf{R}$ and momenta $\\mathbf{p}$ of all nuclei, compute the forces $\\mathbf{F}$ acting on the nuclei and use the latter to integrate Newton's equations of motion.\n",
    "\n",
    "<img src=\"tutorials_figures/md_flowchart.svg\" width=\"200\" style=\"padding: 5px 15px; float: left;\">\n",
    "\n",
    "The overall workflow used in the SchNetPack-MD package is sketched in the figure above.\n",
    "As can be seen, the various tasks are distributed between different modules:\n",
    "\n",
    " - The `System` class contains all information on the present state of the simulated system (e.g. nuclear positions and momenta).\n",
    " - The `Integrator` computes the positions and momenta of the next step and updates the state of the system accordingly.\n",
    " - In order to carry out this update, the nuclear forces are required. These are computed with a `Calculator`, which takes the positions of atoms and returns the corresponding forces. Typically, the `Calculator` consists of a previously trained machine learning model.\n",
    "\n",
    "All these modules are linked together in the `Simulator` class, which contains the main MD loop and calls the three previous modules in the correct order.\n",
    "\n",
    "We will now describe the different components of the MD package in more detail and give an example of how to set up a short MD simulation of an ethanol molecule."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### System\n",
    "\n",
    "As stated previously, `System` keeps track of the state of the simulated system and contains the atomic positions $\\mathbf{R}$ and momenta $\\mathbf{p}$, but also e.g. atom types, simulation cells and computed molecular properties.\n",
    "\n",
    "A special property of SchNetPack-MD is the use of multidimensional tensors to store the system information (using the `torch.Tensor` class).\n",
    "This makes it possible to make full use of vectorization and e.g. simulate several different molecules as well as different replicas of a molecule in a single step.\n",
    "The general shape of these system tensors is similar to the structure used in the SchNetPack models: $N_\\textrm{replicas} \\times (N_\\textrm{molecules} \\cdot N_\\textrm{atoms}) \\times \\ldots$, where the first dimension is the number of replicas of the same molecule (e.g. for ring polymer MD) and the second runs over the atoms of all molecules. This removes the need for extensive zero-padding  and masking when simulating molecules of different sizes.\n",
    "\n",
    "To prepare a system for simulation, first an empty instance of the `System` class is created.\n",
    "Afterwards, the molecules which should be simulated need to be loaded via `load_molecules`.\n",
    "This function takes `ase.Atoms` objects as inputs, which can be created from all file formats supported by ASE with the `ase.io.read` function.\n",
    "Here, we use an XYZ file containing the structure of a single ethanol.\n",
    "If a list of `Atoms` objects is provided to `load_molecules`, multiple molecules can be loaded at once.\n",
    "Next, the number of replicas used during MD needs to be given. For a standard MD $N_\\mathrm{replicas}=1$.\n",
    "Finally, the units of length used in the input structures needs to be provided, in this case `Angstrom`.\n",
    "This is necessary, since the MD code uses its own unit system internally and inputs are converted automatically.\n",
    "(If atomic masses are to be loaded from the input files, a unit of mass also needs to be provided. By default, Daltons are assumed.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md import System\n",
    "from ase.io import read\n",
    "\n",
    "# Load atoms with ASE\n",
    "molecule = read(molecule_path)\n",
    "\n",
    "# Number of molecular replicas\n",
    "n_replicas = 1\n",
    "\n",
    "# Create system instance and load molecule\n",
    "md_system = System()\n",
    "md_system.load_molecules(molecule, n_replicas, position_unit_input=\"Angstrom\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Right now, all system momenta are set to zero.\n",
    "For practical purposes, one usually wants to draw the momenta from a distribution corresponding to a certain temperature.\n",
    "This can be done with an `Initializer`, which takes the temperature in Kelvin as an input.\n",
    "For this example, we use `UniformInit` which draws all momenta from a random uniform distribution and rescales them to a certain temperature.\n",
    "Other routines, e.g. drawing from a Maxwell&mdash;Boltzmann distribution, are also available.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md import UniformInit\n",
    "\n",
    "system_temperature = 300  # Kelvin\n",
    "\n",
    "# Set up the initializer\n",
    "md_initializer = UniformInit(\n",
    "    system_temperature,\n",
    "    remove_center_of_mass=True,\n",
    "    remove_translation=True,\n",
    "    remove_rotation=True,\n",
    ")\n",
    "\n",
    "# Initialize the system momenta\n",
    "md_initializer.initialize_system(md_system)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The `Initializer` can also be used to center the moleculer structure on the center of mass and remove all translational and rotational components of the momenta, as was done in the code snippet above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Integrator\n",
    "\n",
    "Having set up the system in such a manner, one needs to specify how the equations of motion should be propagated.\n",
    "Currently, there are two integration schemes implemented in SchNetPack:\n",
    "- a Velocity Verlet integrator which evolves the system in a purely classical manner and\n",
    "- a ring polymer integrator which is able to model a certain degree of nuclear quantum effects\n",
    "\n",
    "For demonstration purposes, we will first focus on a purely classical MD using the Velocity Verlet algorithm.\n",
    "An example on how to use ring polymer MD in SchNetPack and potential benefits will be given later in the tutorial.\n",
    "To initialize the integrator, one has to specify the length of the timestep $\\Delta t$ used for integration in units of femtoseconds.\n",
    "A common value for classical MD is $\\Delta t = 0.5$&thinsp;fs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.integrators import VelocityVerlet\n",
    "\n",
    "time_step = 0.5  # fs\n",
    "\n",
    "# Set up the integrator\n",
    "md_integrator = VelocityVerlet(time_step)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Calculator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The only ingredient missing for simulating our system is a `Calculator` to compute molecular forces and other properties.\n",
    "A `Calculator` can be thought of as an interface between a computation method (e.g. a machine learning model) and the MD code in SchNetPack.\n",
    "SchNetPack comes with several predefined calculators and also offers the possibility to implement custom calculators.\n",
    "\n",
    "Right now, we are only interested in using a model trained with SchNetPack, hence we use the `SchNetPackCalculator`.\n",
    "To initialize the `SchNetPackCalculator`, two steps are required.\n",
    "\n",
    "First, we have to define a neighbor list which computes interatomic distances during the MD.\n",
    "This is done with the `NeighborListMD` wrapper, which can set up any SchNetPack neighbor list transform to be suitable for simulations.\n",
    "The required arguments are a cutoff radius and buffer region (both use the same length units as the model).\n",
    "Using a buffer region has the advantage, that neighbor lists only needs to be recomputed from scratch once the atoms move a certain distance.\n",
    "Here, we use a cutoff of 5 &Aring;, a buffer of 2 &Aring; and the `ASENeigborList` as basic neighbor list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.neighborlist_md import NeighborListMD\n",
    "from schnetpack.transform import ASENeighborList\n",
    "\n",
    "# set cutoff and buffer region\n",
    "cutoff = 5.0  # Angstrom (units used in model)\n",
    "cutoff_shell = 2.0  # Angstrom\n",
    "\n",
    "# initialize neighbor list for MD using the ASENeighborlist as basis\n",
    "md_neighborlist = NeighborListMD(\n",
    "    cutoff,\n",
    "    cutoff_shell,\n",
    "    ASENeighborList,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "In a second step, the `Calculator` needs to be set up.\n",
    "Similar as for the ASE interface in the [last tutorial](tutorial_03_force_models.ipynb), we have to tell the calculator the path to a stored model and how the forces are named in the output.\n",
    "It is also necessary to specify which units the calculator expects for positions and for the energy.\n",
    "Force units are converted automatically based on these two inputs.\n",
    "Optionally, one can also provide an `energy_key` to store the potential energies in the `System` class if returned by the model.\n",
    "Additional properties (e.g. dipole moments, etc.) can be requested with the `required_properties` argument.\n",
    "\n",
    "Here, we use the pretrained model defined above.\n",
    "Since it is a SchNetPack model, we can use SchNetPack property definitions for force (and energy) keys.\n",
    "With regard to units, the current calculator uses &Aring; for positions and kcal/mol; for the energies.\n",
    "The conversion factors can either be given as a number or as a string (please use `Angstrom` or `Ang` for &Aring;, since `A` would indicate Ampere)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.calculators import SchNetPackCalculator\n",
    "from schnetpack import properties\n",
    "\n",
    "md_calculator = SchNetPackCalculator(\n",
    "    model_path,  # path to stored model\n",
    "    \"forces\",  # force key\n",
    "    \"kcal/mol\",  # energy units\n",
    "    \"Angstrom\",  # length units\n",
    "    md_neighborlist,  # neighbor list\n",
    "    energy_key=\"energy\",  # name of potential energies\n",
    "    required_properties=[],  # additional properties extracted from the model\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Simulator (bringing it all together)\n",
    "\n",
    "With our molecular system, a machine learning calculator for the forces and an integrator at hand, we are almost ready to carry out a MD simulation.\n",
    "The last step is to pass all these ingredients to a `Simulator`.\n",
    "The `Simulator` performs the actual MD simulations, looping over a series of time steps and calling the individual modules in the appropriate order:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md import Simulator\n",
    "\n",
    "md_simulator = Simulator(md_system, md_integrator, md_calculator)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Before starting the actual simulation, it is also recommended to set the desired floating point precision and computational device:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "# check if a GPU is available and use a CPU otherwise\n",
    "if torch.cuda.is_available():\n",
    "    md_device = \"cuda\"\n",
    "else:\n",
    "    md_device = \"cpu\"\n",
    "\n",
    "# use single precision\n",
    "md_precision = torch.float32\n",
    "\n",
    "# set precision\n",
    "md_simulator = md_simulator.to(md_precision)\n",
    "# move everything to target device\n",
    "md_simulator = md_simulator.to(md_device)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "To finally carry out the simulation, one needs to call the `simulate` function with an integer argument specifying the number of simulation steps.\n",
    "\n",
    "For example, a MD simulation of our ethanol molecule for 100 time steps (50&thinsp;fs) can be done via:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "n_steps = 100\n",
    "\n",
    "md_simulator.simulate(n_steps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Since the `Simulator` keeps track of the state of the dynamics and the system, we can call it repeatetly to get longer trajectories."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "md_simulator.simulate(n_steps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The actual number of steps is stored in the `step` variable of the `Simulator` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "print(\"Total number of steps:\", md_simulator.step)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Although we are now able to run a full-fledged MD simulation, there is one major problem with the current setup:\n",
    "we do not collect any information during the simulation, such as nuclear positions.\n",
    "This means, that we currently have no way of analyzing what happened during the MD trajectory.\n",
    "\n",
    "\n",
    "This &mdash; and many other things &mdash; can be done in the SchNetPack-MD package using so-called simulation hooks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "\n",
    "\n",
    "## Simulation hooks\n",
    "\n",
    "\n",
    "Simulation hooks can be thought as instructions for the `Simulator`, which are performed at certain points during each MD step.\n",
    "They can be used to tailor a simulation to specific needs, contributing to the customizability of the SchNetPack-MD package.\n",
    "\n",
    "<img src=\"tutorials_figures/integrator.svg\" width=\"370\" style=\"padding: 20px 20px; float: left\">\n",
    "\n",
    "The diagram above shows how a single MD step of the `Simulator` is structured in detail and at which points hooks can be applied.\n",
    "Depending on which time they are called and which actions they encode, simulation hooks can achieve a wide range of tasks.\n",
    "\n",
    "If they are introduced before and after each integration half-step, they can e.g. be used to control the temperature of the system in the form of thermostats.\n",
    "\n",
    "When acting directly after the computation of the forces done by the `Calculator`, simulation hooks can be used to control sampling.\n",
    "At this point, enhanced sampling schemes such as metadynamics and accelerated MD can be implemented, which modify the forces and potential energies of the system.\n",
    "It is also possible to introduce active learning for automatically generating machine learning models in this way.\n",
    "\n",
    "Finally, when called after the second integration step, simulation hooks can be used to collect and store information on the system, which can then be used for analysis.\n",
    "\n",
    "Multiple hooks can be passed to a simulator at any time, which makes it possible to control a simulation in various manners.\n",
    "In the following, we will demonstrate how to apply a thermostat to the above simulation and how data can be collected."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Adding a Thermostat\n",
    "\n",
    "As mentioned in the [force tutorial](tutorial_03_force_models.ipynb), thermostats are used to keep the fluctuations of the kinetic energy of a system (temperature) close to a predefined average.\n",
    "Simulations employing thermostats are referred to as canonical ensemble or $NVT$ simulations, since they keep the number of particles $N$, the volume $V$ and the average temperature $T$ constant.\n",
    "\n",
    "Last time, we used a Langevin thermostat to regulate the temperature of our simulation.\n",
    "This thermostat (and many others) is also available in SchNetPack and can be used via"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.simulation_hooks import LangevinThermostat\n",
    "\n",
    "# Set temperature and thermostat constant\n",
    "bath_temperature = 300  # K\n",
    "time_constant = 100  # fs\n",
    "\n",
    "# Initialize the thermostat\n",
    "langevin = LangevinThermostat(bath_temperature, time_constant)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "In case of the Langevin thermostat, a bath temperature (in Kelvin) and a time constant (in fs) have to be provided.\n",
    "The first regulates the temperature the system is kept at, the second how fast the thermostat adjusts the temperature. \n",
    "In order to speed up equilibration, we use a more aggressive thermostatting schedule by with a time constant of 100&thinsp;fs.\n",
    "\n",
    "Finally, we begin collecting the simulation hooks we want to pass to the simulator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "simulation_hooks = [langevin]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Collecting Data\n",
    "\n",
    "The primary way to store simulation data in the SchNetPack-MD package is via the `FileLogger` class.\n",
    "A `FileLogger` collects data during the MD and stores it to a database in HDF5 format.\n",
    "The type of data to be collected is specified via so-called `DataStreams`, which are passed to the `FileLogger`.\n",
    "The data streams currently available in SchNetPack are:\n",
    "\n",
    "- `MoleculeStream`: Stores positions and velocities during all simulation steps\n",
    "- `PropertyStream`: Stores properties predicted by the calculator (no unit conversion is performed unless requested)\n",
    "\n",
    "To reduce overhead due to writing to disk, the `FileLogger` first collects information for a certain number of steps into a buffer, which it then writes to the database at once.\n",
    "\n",
    "The `FileLogger` is initialized by specifying the name of the target database, the size of the buffer and which data to store (in form of the respective data streams). In addition, logging frequency and floating point precision can be specified:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.simulation_hooks import callback_hooks\n",
    "\n",
    "# Path to database\n",
    "log_file = os.path.join(md_workdir, \"simulation.hdf5\")\n",
    "\n",
    "# Size of the buffer\n",
    "buffer_size = 100\n",
    "\n",
    "# Set up data streams to store positions, momenta and the energy\n",
    "data_streams = [\n",
    "    callback_hooks.MoleculeStream(store_velocities=True),\n",
    "    callback_hooks.PropertyStream(target_properties=[properties.energy]),\n",
    "]\n",
    "\n",
    "# Create the file logger\n",
    "file_logger = callback_hooks.FileLogger(\n",
    "    log_file,\n",
    "    buffer_size,\n",
    "    data_streams=data_streams,\n",
    "    every_n_steps=1,  # logging frequency\n",
    "    precision=32,  # floating point precision used in hdf5 database\n",
    ")\n",
    "\n",
    "# Update the simulation hooks\n",
    "simulation_hooks.append(file_logger)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Storing checkpoints\n",
    "\n",
    "In general, it is also a good idea to store checkpoints of the system and simulation state at regular intervals.\n",
    "Should something go wrong, these can be used to restart the simulation from the last stored point.\n",
    "In addition, checkpoints can also be used to initialize the `System` to a certain state.\n",
    "This is e.g. useful for equilibrating simulations with different thermostats.\n",
    "\n",
    "Storing checkpoints can be done with the `Checkpoint` hook, which takes a file the data is stored to and the frequency a checkpoint is generated:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Set the path to the checkpoint file\n",
    "chk_file = os.path.join(md_workdir, \"simulation.chk\")\n",
    "\n",
    "# Create the checkpoint logger\n",
    "checkpoint = callback_hooks.Checkpoint(chk_file, every_n_steps=100)\n",
    "\n",
    "# Update the simulation hooks\n",
    "simulation_hooks.append(checkpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Monitoring the simulation with TensorBoard\n",
    "\n",
    "It is also possible to directly monitor simulations with a `TensorBoardLogger`.\n",
    "This callback collects various properties and stores them to log file that can be viewed with TensorBoard.\n",
    "Different properties can be specified via keywords:\n",
    "\n",
    "- `energy`: log kinetic and potential energies.\n",
    "- `temperature`: log the system temperature.\n",
    "- `pressure`: log the pressure during NPT simulations.\n",
    "- `volume`: log cell volumes during NPT simulations.\n",
    "\n",
    "E.g., to monitor potential and kinetic energies, as well as the temperature, the following setup can be used:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# directory where tensorboard log will be stored to\n",
    "tensorboard_dir = os.path.join(md_workdir, \"logs\")\n",
    "\n",
    "tensorboard_logger = callback_hooks.TensorBoardLogger(\n",
    "    tensorboard_dir,\n",
    "    [\"energy\", \"temperature\"],  # properties to log\n",
    ")\n",
    "\n",
    "# update simulation hooks\n",
    "simulation_hooks.append(tensorboard_logger)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The simulation progress of the different properties can then be viewed in TensorBoard via\n",
    "\n",
    "    tensorboard --logdir=mdtut/logs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Adding hooks and running the simulation\n",
    "\n",
    "With all simulation hooks created and collected in `simulation_hooks`, we can finally build our updated simulator.\n",
    "This is done exactly the same way as above, with the difference that now the hooks are specified as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "md_simulator = Simulator(\n",
    "    md_system, md_integrator, md_calculator, simulator_hooks=simulation_hooks\n",
    ")\n",
    "\n",
    "md_simulator = md_simulator.to(md_precision)\n",
    "md_simulator = md_simulator.to(md_device)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "We can now use the simulator to run a MD trajectory of our ethanol. Here, we run for 20000 steps, which are 10&thinsp;ps.\n",
    "This should take approximately 5 minutes on a notebook GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "n_steps = 20000\n",
    "\n",
    "md_simulator.simulate(n_steps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "After the simulation, the tutorial directory should contain the following entries:\n",
    "- `simulation.hdf5` holding the collected data\n",
    "- `simulation.chk` containing the last checkpoint and\n",
    "- `logs` which contains the TensorBoard log files."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Reading HDF5 outputs\n",
    "\n",
    "We will now show, how to access the HDF5 files generated during the simulation.\n",
    "For this purpose, SchNetPack comes with a `HDF5Loader`, which can be used to extract the data by giving the path to the simulation output (`mdtut/simulation.hdf5`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.data import HDF5Loader\n",
    "\n",
    "data = HDF5Loader(log_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Extracted data is stored in the `properties` dictionary of the `HDF5Loader` and can be accessed with the `get_property` function.\n",
    "Right now, we can access the following entries, most of which should be self explaining and correspond to the standard SchNetPack `properties` keys:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "for prop in data.properties:\n",
    "    print(prop)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "`masses` are the atomic masses in Dalton. The two energies `energy` and `energy_system` are present, since we a) requested the energy from the `Calculator` to be stored as the potential energy of `System` with the `energy_key` keyword and b) we requested the `energy` property to be logged in the `PropertyStream` of the `FileLogger`. The difference between both quantities is that the former uses SchNetPack internal units (kJ/mol) while the latter uses the `Calculator`'s units (kcal/mol). While logging both is not necessary for practical purposes, it serves as an example for the different ways to log quantities during simulations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Extracting basic properties\n",
    "\n",
    "We can use both potential energies to show how properties can be extracted from the HDF5 file in different ways.\n",
    "As stated above, one way is to use the `get_property` function.\n",
    "It requires the name of the property and the information whether the property is atomistic (entries for every single atom, e.g. forces) or not (e.g. energies).\n",
    "Optionally, the index of the molecule and replica for which the data should be extracted can also be provided.\n",
    "By default, `get_property` extracts the first molecule (`mol_idx=0`) and averages over all replicas if more than one are present.\n",
    "Neither is relevant for our current simulation.\n",
    "\n",
    "Alternatively, there are several hardcoded functions to retrieve specific properties from the logged system information (e.g. temperature, kinetic energy, etc.).\n",
    "Here, we use the `get_potential_energy` function to directly access the system potential energy (`energy_system`).\n",
    "Shown are the last 200 fs of the trajectory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from schnetpack import units as spk_units\n",
    "\n",
    "\n",
    "# Get the energy logged via PropertiesStream\n",
    "energies_calculator = data.get_property(properties.energy, atomistic=False)\n",
    "# Get potential energies stored in the MD system\n",
    "energies_system = data.get_potential_energy()\n",
    "\n",
    "# Check the overall shape\n",
    "print(\"Shape:\", energies_system.shape)\n",
    "\n",
    "# Get the time axis\n",
    "time_axis = np.arange(data.entries) * data.time_step / spk_units.fs  # in fs\n",
    "\n",
    "# Convert the system potential energy from internal units (kJ/mol) to kcal/mol\n",
    "energies_system *= spk_units.convert_units(\"kJ/mol\", \"kcal/mol\")\n",
    "\n",
    "# Plot the energies\n",
    "plt.figure()\n",
    "plt.plot(time_axis, energies_system, label=\"E$_\\mathrm{pot}$ (System)\")\n",
    "plt.plot(time_axis, energies_calculator, label=\"E$_\\mathrm{pot}$ (Logger)\", ls=\"--\")\n",
    "plt.ylabel(\"E [kcal/mol]\")\n",
    "plt.xlabel(\"t [fs]\")\n",
    "plt.xlim(9800, 10000)\n",
    "plt.legend()\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "As would be expected, once the energy stored in the `System` class has been converted from MD internal units to kcal/mol with the `convert_units` function, both curves are identical."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The functions provided by the `HDF5Loader` also offer access to a number of derived properties, such as the kinetic energy (`get_kinetic_energy`), the temperature (`get_temperature`) or the pressure (`get_pressure`).\n",
    "\n",
    "Since we performed a NVT simulation, it makes sense to inspect the temperature for our example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def plot_temperature(data):\n",
    "\n",
    "    # Read the temperature\n",
    "    temperature = data.get_temperature()\n",
    "\n",
    "    # Compute the cumulative mean\n",
    "    temperature_mean = np.cumsum(temperature) / (np.arange(data.entries) + 1)\n",
    "\n",
    "    # Get the time axis\n",
    "    time_axis = np.arange(data.entries) * data.time_step / spk_units.fs  # in fs\n",
    "\n",
    "    plt.figure(figsize=(8, 4))\n",
    "    plt.plot(time_axis, temperature, label=\"T\")\n",
    "    plt.plot(time_axis, temperature_mean, label=\"T (avg.)\")\n",
    "    plt.ylabel(\"T [K]\")\n",
    "    plt.xlabel(\"t [fs]\")\n",
    "    plt.legend()\n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "plot_temperature(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "As can be seen, the system requires an initial period for equilibration.\n",
    "This is relevant for simulations, as ensemble properties are typically only computed for fully equilibrated systems.\n",
    "\n",
    "In SchNetPack, the equilibration period can be accounted for in different ways.\n",
    "The checkpoint file of the equilibrated system can be used as a starting point for a production simulation.\n",
    "The easier way is to reject the initial part of the trajectory and only consider the equilibrated system for analysis.\n",
    "This can be done by specifying the number of steps to skip in the `HDF5Loader`. \n",
    "Here, we skip the first half (10&thinsp;000 steps) of the trajectory.\n",
    "In general, the equilibration period strongly depends on system size and the thermostat settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "equilibrated_data = HDF5Loader(log_file, skip_initial=10000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "We can easily see, that only the later part of the simulation is now considered by plotting the data once again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "plot_temperature(equilibrated_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "As stated before, the HDF5 datafile uses a special convention for units.\n",
    "For internal quantities (e.g. positions, velocities and kinetic energy), internal MD units are used (based on kJ/mol, nm and Dalton).\n",
    "The only exception are temperatures, which are given in units of Kelvin for convenience.\n",
    "For all properties computed by the `Calculator` the original unit returned by the model is used, unless a conversion factor is specified during initialization.\n",
    "This means, that the energies and forces collected here have units of kcal/mol and kcal/mol/&Aring;."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Extracting structures\n",
    "\n",
    "Structures can be extracted from the HDF5 database with the `convert_to_atoms` function.\n",
    "It returns a list of ASE `Atoms` objects of all sampled configurations.\n",
    "These can then e.g. be used to store the trajectory in XYZ file format using the `ase.io.write` utilities:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from ase.io import write\n",
    "\n",
    "# extract structure information from HDF5 data\n",
    "md_atoms = data.convert_to_atoms()\n",
    "\n",
    "# write list of Atoms to XYZ file\n",
    "write(os.path.join(md_workdir, \"trajectory.xyz\"), md_atoms, format=\"xyz\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Vibrational spectra\n",
    "\n",
    "While curves of temperatures and wobbling atoms might be nice to look at, one is usually also interested in other quantities when running a MD simulation.\n",
    "One example are vibrational spectra, which give information on which vibrations are active in a molecule.\n",
    "SchNetPack provides the module `schnetpack.md.data.spectra`, which contains different classes for computing vibrational spectra directly from the HDF5 files.\n",
    "These implementations use several tricks to improve efficiency and accuracy.\n",
    "Currently, power spectra (`PowerSpectrum`), infrared (IR) spectra (`IRSpectrum`) and Raman Spectra (`RamanSpectrum`) are available.\n",
    "\n",
    "Here, we will compute the power spectrum from our ethanol simulation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.data import PowerSpectrum\n",
    "\n",
    "# Initialize the spectrum\n",
    "spectrum = PowerSpectrum(equilibrated_data, resolution=2048)\n",
    "\n",
    "# Compute the spectrum for the first molecule (default)\n",
    "spectrum.compute_spectrum(molecule_idx=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The `resolution` keyword specifies, how finely the peaks in the spectrum are resolved.\n",
    "`PowerSpectrum` also computes the effective resolution in inverse centimeters, as well as the spectral range.\n",
    "For molecules, one is usually interested in frequencies up to 4000&thinsp;cm<sup>-1</sup>, which we will use to restrict the plotting area.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Get frequencies and intensities\n",
    "frequencies, intensities = spectrum.get_spectrum()\n",
    "\n",
    "# Plot the spectrum\n",
    "plt.figure()\n",
    "plt.plot(frequencies, intensities)\n",
    "plt.xlim(0, 4000)\n",
    "plt.ylim(0, 100)\n",
    "plt.ylabel(\"I [a.u.]\")\n",
    "plt.xlabel(\"$\\omega$ [cm$^{-1}$]\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The spectrum shows several typical vibrational bands for ethanol (which can be checked with [experimental tables available online](https://chem.libretexts.org/Ancillary_Materials/Reference/Reference_Tables/Spectroscopic_Parameters/Infrared_Spectroscopy_Absorption_Table)).\n",
    "For example, the peak close to 3700&thinsp;cm<sup>-1</sup> stems from the bond vibrations of the OH-group.\n",
    "The bond vibrations of the CH<sub>3</sub> and CH<sub>2</sub> groups are clustered around 3000&thinsp;cm<sup>-1</sup> and the corresponding bending vibrations can be seen at 1400&thinsp;cm<sup>-1</sup>.\n",
    "In general, computed vibrational spectra serve as a good check for the validity of a machine learning potential.\n",
    "\n",
    "One important fact should be noted at this point: the spectrum computed here is a power spectrum, representing the vibrational density of states.\n",
    "It gives information on **all** vibrational modes which **can** potentially be active in an experimental spectrum.\n",
    "Hence, it can help in identifying which motions give rise to which experimental bands.\n",
    "However, depending on the experiment, only a subset of the peaks of the power spectrum can be active and the intensities can vary dramatically.\n",
    "As such, a power spectrum only serves as a poor stand in for simulating e.g. Raman or IR spectra.\n",
    "Using SchNetPack, it is also possible to model e.g. IR spectra, by training a model on dipoles in addition to forces (for example with FieldSchNet).\n",
    "Simulations can then be done in the same manner as above and the corresponding IR spectra can be obtained using `IRSpectrum` instead of `PowerSpectrum`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Restarting simulations\n",
    "\n",
    "In some situations, it is convenient to restart simulations from a previously stored checkpoint, e.g. when the cluster burned down for no apparent reasons.\n",
    "\n",
    "In SchNetPack, this can be done by loading the checkpoint file with `torch` and then passing it to a `Simulator` using the `restart_simulation` function (here we use the same instance of simulator as before for convenience, in a real setup a new one would be initialized)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "checkpoint = torch.load(chk_file, weights_only=False)\n",
    "md_simulator.restart_simulation(checkpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "This restores the full state, including the system state, simulation steps and states of the thermostats.\n",
    "\n",
    "Sometimes, it can be sufficient to only restore the system state (positions and momenta), for example when starting production level simulations after equilibration.\n",
    "This is achieved by calling the `load_system_state` function of the simulator system (`Simulator.system`) on the loaded checkpoint:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "md_simulator.system.load_system_state(checkpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Ring polymer molecular dynamics with SchNetPack\n",
    "\n",
    "Above, we have computed a vibrational spectrum of ethanol based on a classical MD simulation using the Velocity Verlet integrator.\n",
    "Unfortunately, this approach completely neglects nuclear quantum effects, such as zero-point energies, etc.\n",
    "One way to recover some of these effects is to use so-called ring polymer molecular dynamics (RPMD).\n",
    "In RPMD, multiple replicas of a system are connected via harmonic springs and propagated simultaneously.\n",
    "This can be thought of a discretization of the path integral formulation of quantum mechanics.\n",
    "The fully quantum solution is then recovered in the limit of an infinite number of replicas, also called beads.\n",
    "\n",
    "RPMD simulations can easily be carried out using the SchNetPack MD package.\n",
    "Due to need to perform a large number of computations, RPMD profits greatly from the use of machine learning potentials.\n",
    "Moreover, the presence of multiple replicas lends it self to an efficient parallelization on GPUs, which is one reason for the special structure of the system tensors used in SchNetPack-MD.\n",
    "\n",
    "Here, we will repeat the above simulation for ethanol using RPMD instead of a classical simulation.\n",
    "The main differences in the setup are:\n",
    "- the system needs to be initialized with multiple replicas\n",
    "- a ring polymer integrator needs to be used\n",
    "- special thermostats are required if a canonical ensemble should be simulated\n",
    "\n",
    "The `System` can be set up in exactly the same manner as before, only the number of replicas when loading molecules is now set to be greater than one.\n",
    "For demonstration purposes we use `n_replicas=4`, in general larger numbers are recommended.\n",
    "We can reuse the `Initializer` from above for setting the initial momenta, as it automatically detects the presence of multiple replicas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Number of beads in RPMD simulation\n",
    "n_replicas = 4\n",
    "\n",
    "# Set up the system, load structures\n",
    "rpmd_system = System()\n",
    "rpmd_system.load_molecules(molecule, n_replicas, position_unit_input=\"Angstrom\")\n",
    "\n",
    "# reuse initializer for momenta\n",
    "md_initializer.initialize_system(rpmd_system)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Next, we need to change the integrator to the `RingPolymer` integrator.\n",
    "For RPMD, we need to use a smaller time step, in order to keep the integration numerically stable.\n",
    "In addition, one needs to specify a temperature for the ring polymer, which modulates how strongly the different beads couple.\n",
    "Typically, we use the same temperature as for the thermostat."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.integrators import RingPolymer\n",
    "\n",
    "# Here, a smaller time step is required for numerical stability\n",
    "rpmd_time_step = 0.2  # fs\n",
    "\n",
    "# Initialize the integrator, RPMD also requires a polymer temperature which determines the coupling of beads.\n",
    "# Here, we set it to the system temperature\n",
    "rpmd_integrator = RingPolymer(\n",
    "    rpmd_time_step,\n",
    "    n_replicas,\n",
    "    system_temperature,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Next, we have to change our thermostat to one suitable for RPMD simulations.\n",
    "We will use the local PILE thermostat, which can be thought of as a RPMD equivalent of the classical Langevin thermostat used above.\n",
    "In general, SchNetPack comes with a wide variety of thermostats for classical and ring polymer simulations (see the [thermostats module](../modules/md.rst#module-schnetpack.md.simulation_hooks.thermostats)).\n",
    "For the bath temperature and time constant, the same values as above are used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from schnetpack.md.simulation_hooks import PILELocalThermostat\n",
    "\n",
    "# Initialize the thermostat\n",
    "pile = PILELocalThermostat(bath_temperature, time_constant)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The hooks are generated in exactly the same way as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Logging\n",
    "rpmd_log_file = os.path.join(md_workdir, \"rpmd_simulation.hdf5\")\n",
    "rpmd_file_logger = callback_hooks.FileLogger(\n",
    "    rpmd_log_file,\n",
    "    buffer_size,\n",
    "    data_streams=[\n",
    "        callback_hooks.MoleculeStream(\n",
    "            store_velocities=True,\n",
    "        ),\n",
    "        callback_hooks.PropertyStream([\"energy\"]),\n",
    "    ],\n",
    ")\n",
    "\n",
    "# Checkpoints\n",
    "rpmd_chk_file = os.path.join(md_workdir, \"rpmd_simulation.chk\")\n",
    "rpmd_checkpoint = callback_hooks.Checkpoint(rpmd_chk_file, every_n_steps=100)\n",
    "\n",
    "# Tensorboard\n",
    "rpmd_tensorboard_dir = os.path.join(md_workdir, \"rpmd_logs\")\n",
    "rpmd_tensorboard_logger = callback_hooks.TensorBoardLogger(\n",
    "    rpmd_tensorboard_dir,\n",
    "    [\"energy\", \"temperature\"],\n",
    ")\n",
    "\n",
    "# Assemble the hooks:\n",
    "rpmd_hooks = [\n",
    "    pile,\n",
    "    rpmd_file_logger,\n",
    "    rpmd_checkpoint,\n",
    "    rpmd_tensorboard_logger,\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "And so is the simulator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Assemble the simulator\n",
    "rpmd_simulator = Simulator(\n",
    "    rpmd_system, rpmd_integrator, md_calculator, simulator_hooks=rpmd_hooks\n",
    ")\n",
    "\n",
    "rpmd_simulator = rpmd_simulator.to(md_device)\n",
    "rpmd_simulator = rpmd_simulator.to(md_precision)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Now we can carry out the simulations.\n",
    "Since our time step is shorter, we will run for longer in order to cover the same time scale as the classical simulation (runs approximately 13 minutes on a notebook GPU):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "n_steps = 50000\n",
    "\n",
    "rpmd_simulator.simulate(n_steps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Loading of the data with the `HDF5Loader` works exactly the same as before.\n",
    "When loading properties for RPMD datafiles, the `HDF5Loader` default of using centroid properties (meaning an average over all beads) becomes active.\n",
    "This is usually what one wants to analyze.\n",
    "If a specific replica should be used, it can be specified via `replica_idx` in the `get_property` function.\n",
    "Here, we immediatly skip the a part of the trajectory to only load the equilibrated system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "rpmd_data = HDF5Loader(rpmd_log_file, skip_initial=25000)\n",
    "plot_temperature(rpmd_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Finally, we can compute the power spectrum and compare it to its classical counterpart:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Initialize the spectrum\n",
    "rpmd_resolution = int(\n",
    "    time_step / rpmd_time_step * 2048\n",
    ")  # bring to same spectral resolution as MD\n",
    "rpmd_spectrum = PowerSpectrum(rpmd_data, resolution=rpmd_resolution)\n",
    "\n",
    "# Compute the spectrum for the first molecule (default)\n",
    "rpmd_spectrum.compute_spectrum(molecule_idx=0)\n",
    "\n",
    "# Get frequencies and intensities\n",
    "rpmd_frequencies, rpmd_intensities = rpmd_spectrum.get_spectrum()\n",
    "\n",
    "# Plot the spectrum\n",
    "plt.figure(figsize=(8, 4))\n",
    "plt.plot(frequencies, intensities, label=\"MD\")\n",
    "plt.plot(rpmd_frequencies, rpmd_intensities, label=\"RPMD\")\n",
    "plt.xlim(0, 4000)\n",
    "plt.ylim(0, 100)\n",
    "plt.ylabel(\"I [a.u.]\")\n",
    "plt.xlabel(\"$\\omega$ [cm$^{-1}$]\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "One problem of purely classical simulations can be observed in the high frequency regions of the MD spectrum.\n",
    "Peaks are shifted towards higher wave numbers compared to the expected [experimental values](https://chem.libretexts.org/Ancillary_Materials/Reference/Reference_Tables/Spectroscopic_Parameters/Infrared_Spectroscopy_Absorption_Table),\n",
    "e.g. 3100&thinsp;cm<sup>-1</sup> vs. 2900&thinsp;cm<sup>-1</sup> for the CH vibrations.\n",
    "The inclusion of nuclear quantum effects via RPMD shifts these bands towards lower frequencies, leading to an improved agreement with experiment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Using the command line interface and config files\n",
    "\n",
    "While setting up simulations in the way described above can be useful for testing and developing new approaches, it can grow tedious for routine simulations. Because of this, SchNetPack provides the script `spkmd` to quickly set up MD simulations using a combination of the `hydra` command line interface (CLI) and predefined config files. This functionality is covered in detail in the [Run MD simulations with the CLI](https://schnetpack.readthedocs.io/en/latest/userguide/md.html) section of the general SchNetPack user guide."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Summary\n",
    "\n",
    "In this tutorial, we have given a basic introduction to the structure and functionality of the MD package in SchNetPack.\n",
    "After setting up a standard MD simulation, we have explored how to use simulation hooks to control simulations in a modular way.\n",
    "We have shown how to extract data from the HDF5 files generated during MD and how available functions can be used to compute dynamic quantities, such as power spectra.\n",
    "This was followed by a short example of using more advanced simulation techniques in the form of RPMD.\n",
    "Finally, a short introduction to the `spkmd` script and the use of `hydra` configs was given."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  },
  "nbsphinx": {
   "execute": "never"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}