Source data for dipole fits
===========================

The XYZ files here provide the geometries and dipoles of the datasets used to
fit the scalar-vector dipole models in [this publication][muml].

All dipoles are expressed in units of Debye.  All files are in extended-XYZ
format, readable by ASE, where the dipole is then accessible as follows:

    atoms.info['dipole_<method>']

where `atoms` is a single ASE Atoms object, and `<method>` is one of `b3lyp`,
`ccsd`, or `scan0`.
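
As an illustration, the access pattern above could be wrapped in a few small
helpers.  This is only a sketch: the helper names (`dipole_key`, `dipole_norm`,
`read_dipoles`) are hypothetical and not part of this dataset or of ASE, and it
assumes the dipole is stored as a three-component vector.

```python
import math

# Method suffixes listed in this README.
METHODS = ("b3lyp", "ccsd", "scan0")


def dipole_key(method):
    """Return the atoms.info key for a given method, e.g. 'dipole_ccsd'."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    return f"dipole_{method}"


def dipole_norm(dipole):
    """Magnitude of a dipole vector, in the units of its components (Debye)."""
    return math.sqrt(sum(float(c) ** 2 for c in dipole))


def read_dipoles(filename, method):
    """Read every frame of an extended-XYZ file and collect one dipole per frame."""
    from ase.io import read  # imported here so the helpers above work without ASE

    frames = read(filename, index=":", format="extxyz")  # ":" reads all frames
    key = dipole_key(method)
    return [atoms.info[key] for atoms in frames]
```

A call might look like `read_dipoles("qm7b.xyz", "ccsd")`, where the file name
is a placeholder for one of the files in this directory.  Not every method is
available for every set, so a missing key raises a `KeyError`.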

QM7b
----

Both CCSD/daDZ and B3LYP/daDZ dipoles are included, as well as
polarizabilities, all calculated as described [here][alphaml_datapaper] and
available with the [alphaML dataset][alphaml_dataset].  The molecules are
randomly shuffled to ease partitioning into training and test sets: the first
5400 molecules are the training set and the last 1811 are the test set.  Their
indices in FPS (farthest point sampling) ordering are given under the key
`fps_order`.
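
The split described above can be reproduced by slicing the list of frames.  The
sketch below uses hypothetical function names and assumes that `fps_order` is
stored in `atoms.info` as an integer position in the FPS sequence, so that
sorting on it recovers the FPS ordering.

```python
def split_train_test(frames, n_train, n_test):
    """Take the first n_train frames for training and the next n_test for testing."""
    if n_train + n_test > len(frames):
        raise ValueError("requested split is larger than the dataset")
    return frames[:n_train], frames[n_train:n_train + n_test]


def sort_by_fps(frames, key="fps_order"):
    """Return the frames sorted by their stored FPS index (assumed to be in .info)."""
    return sorted(frames, key=lambda atoms: int(atoms.info[key]))
```

For QM7b this would be called as `split_train_test(frames, 5400, 1811)`, with
`frames` being the list returned by `ase.io.read(..., index=":")`.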

QM9
---

A sample from the [QM9 database][qm9_paper], with B3LYP dipoles only.  As with the
QM7b set, the molecules were randomly shuffled; the first 20000 were chosen as
the training set and the next 1000 as the test set.  The key `id` corresponds
to the QM9 ID of the molecule as given in [the dataset][qm9_dataset].

Showcase
--------

This test set consists of the first 29 molecules of the AlphaML showcase (also
available [here][alphaml_dataset]) plus 31 additional amino acid derivatives.
Dipoles were computed at the B3LYP, CCSD, and SCAN0 levels of theory.

Challenge sets
--------------

Finally, four challenge sets are provided; these are all series of molecules of
increasing length, some made of polar fragments, one with large separation of
charge, and one "control" with nearly constant dipole as a function of length.
Dipoles were computed only at the B3LYP level.

License
-------

This dataset is licensed under a [Creative Commons Attribution 4.0
International License](http://creativecommons.org/licenses/by/4.0/).


[muml]: http://arxiv.org/abs/2003.12437
[alphaml_datapaper]: https://doi.org/10.1038/s41597-019-0157-8
[alphaml_dataset]: https://doi.org/10.24435/materialscloud:2019.0002/v3
[qm9_paper]: https://doi.org/10.1038/sdata.2014.22
[qm9_dataset]: https://doi.org/10.6084/m9.figshare.978904
