**Figure 1: CoarsenConf architecture.**

Molecular conformer generation is a fundamental task in computational chemistry. The goal is to predict stable low-energy 3D molecular structures, known as conformers, given the 2D molecule. Accurate molecular conformations are critical for a variety of applications that depend on precise spatial and geometric properties, including drug discovery and protein docking.

We introduce CoarsenConf, an SE(3)-equivariant hierarchical variational autoencoder (VAE) that pools information from fine-grained atomic coordinates into a coarse-grained subgraph-level representation for efficient autoregressive conformer generation.

## Background

Coarse-graining reduces the dimensionality of the problem, allowing conditional autoregressive generation rather than generating all coordinates independently, as done in prior work. By directly conditioning on the 3D coordinates of previously generated subgraphs, our model generalizes better across chemically and spatially similar subgraphs. This mimics the underlying molecular synthesis process, in which small functional units bond together to form large drug-like molecules. Unlike previous methods, CoarsenConf generates low-energy conformers with the ability to model atomic coordinates, distances, and torsion angles directly.

The CoarsenConf architecture can be broken down into the following components: **(I)** The encoder $q_\phi(z \mid X)$ takes the fine-grained conformer $\mathcal{C}$ as input (derived from $X$ and a predefined CG strategy) and outputs a variable-length equivariant CG representation via equivariant message passing and point convolutions. **(II)** Equivariant MLPs are applied to learn the mean and log variance of both the posterior and prior distributions. **(III)** The posterior (training) or prior (inference) is sampled and fed into the Channel Selection module, where an attention layer is used to learn the optimal pathway from the CG to the FG structure. **(IV)** Given the FG latent vector and the RDKit approximation, the decoder $p_\theta(X \mid \mathcal{R}, z)$ learns to recover the low-energy FG structure via autoregressive equivariant message passing. The entire model can be trained end-to-end by optimizing the KL divergence of the latent distributions and the reconstruction error of the generated conformers.

## MCG Formulation

We formulate the Molecular Conformer Generation (MCG) task as modeling the conditional distribution $p(X \mid \mathcal{R})$, where $\mathcal{R}$ is the approximate conformer generated by RDKit and $X$ is the optimal low-energy conformer. RDKit, a commonly used cheminformatics library, uses an inexpensive distance-geometry-based algorithm followed by inexpensive physics-based optimization to obtain reasonable conformer approximations.
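As a concrete illustration, an approximate conformer $\mathcal{R}$ of the kind described above can be produced with RDKit's ETKDG distance-geometry embedding followed by MMFF relaxation. This is a minimal sketch on a toy molecule (ethanol), not CoarsenConf's actual data pipeline:

```python
# Sketch: building an approximate conformer R with RDKit.
# ETKDG is a cheap distance-geometry embed; MMFF is an inexpensive
# physics-based refinement, matching the description in the text.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))    # fine-grained graph with hydrogens
AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())  # distance-geometry embedding
AllChem.MMFFOptimizeMolecule(mol)              # MMFF force-field relaxation
coords = mol.GetConformer().GetPositions()     # (n_atoms, 3) coordinate array
print(coords.shape)
```

The resulting `coords` array plays the role of $\mathcal{R}$; a model like CoarsenConf then learns to map it toward the low-energy structure $X$.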

## Coarse-Graining

**Figure 2: High-level procedure.**

**(I)** Example of variable-length coarse-graining. Fine-grained molecules are split along the rotatable bonds that define torsion angles, then coarsened to reduce dimensionality and learn a subgraph-level latent distribution. **(II)** Visualization of a 3D conformer, with specific atom pairs highlighted for the decoder's message-passing operations.

Molecular coarse-graining simplifies a molecular representation by grouping the fine-grained (FG) atoms of the original structure into individual coarse-grained (CG) beads $\mathcal{B}$ with a rule-based mapping, as shown in Figure 2(I). Coarse-graining has been widely utilized in protein and molecular design, and analogously, fragment- or subgraph-level generation has proven invaluable in a variety of 2D molecular design tasks. Breaking the generative problem down into smaller pieces is an approach that can be applied to many 3D molecular tasks and provides a natural dimensionality reduction for working with large, complex systems.

Compared to prior work that focuses on fixed-length CG strategies, where each molecule is represented at a fixed resolution of $N$ CG beads, our method uses variable-length CG for its flexibility and ability to support any choice of coarse-graining technique. This means that input molecules can be mapped to any number of CG beads, so a single CoarsenConf model can generalize to any coarsening resolution. In our case, all rotatable bonds are severed, and the atoms of each resulting connected component are coarsened into a single bead. This choice of CG procedure implicitly forces the model to learn torsion angles alongside atomic coordinates and interatomic distances. Our experiments use GEOM-QM9 and GEOM-DRUGS, which average 11 atoms and 3 CG beads, and 44 atoms and 9 CG beads, respectively.
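The severing step above can be sketched in plain Python: cut every rotatable bond, then treat each connected component of the remaining graph as one CG bead. The toy bond graph and rotatable-bond labels are illustrative, not taken from CoarsenConf's actual RDKit-based implementation:

```python
# Variable-length coarse-graining sketch: cut rotatable bonds, then each
# connected component of the cut graph becomes one CG bead.
from collections import defaultdict

def coarse_grain(n_atoms, bonds, rotatable):
    """Return a list of beads (atom-index sets) after cutting rotatable bonds."""
    adj = defaultdict(set)
    for a, b in bonds:
        if (a, b) not in rotatable and (b, a) not in rotatable:
            adj[a].add(b)
            adj[b].add(a)
    seen, beads = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                      # DFS over the cut graph
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        beads.append(comp)
    return beads

# Toy 4-atom chain 0-1-2-3 with the central bond rotatable -> 2 beads.
beads = coarse_grain(4, [(0, 1), (1, 2), (2, 3)], {(1, 2)})
print(beads)  # [{0, 1}, {2, 3}]
```

Because the number of components depends on the molecule, the number of beads $N$ is naturally variable-length, as described above.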

## SE(3)-Equivariance

A key consideration when working with 3D structures is maintaining proper equivariance. A 3D molecule is equivariant under rotations and translations, i.e. SE(3)-equivariance. We enforce SE(3)-equivariance in the encoder, decoder, and latent space of our probabilistic model CoarsenConf. As a result, $p(X \mid \mathcal{R})$ transforms consistently under rototranslations of the approximate conformer $\mathcal{R}$. For example, if $\mathcal{R}$ is rotated 90° clockwise, the optimal $X$ is expected to exhibit the same rotation. See the full paper for an in-depth definition and discussion of how equivariance is maintained.
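The rotation property stated above can be checked numerically. Here a toy equivariant map (the per-bead centroid) stands in for the actual CoarsenConf encoder: rotating the input coordinates must rotate the output by the same rotation.

```python
# Numerical SE(3)-equivariance check with a stand-in equivariant map f.
import numpy as np

def f(x):
    """Toy equivariant map: centroid of each bead's atoms (6 atoms -> 2 beads)."""
    return x.reshape(2, 3, 3).mean(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))              # fine-grained coordinates

R = np.array([[0.0, -1.0, 0.0],          # 90-degree rotation about the z-axis
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

# Equivariance (row-vector convention): f(x R^T) == f(x) R^T
lhs = f(x @ R.T)
rhs = f(x) @ R.T
print(np.allclose(lhs, rhs))  # True
```

Any map built only from equivariant operations (like this mean pool) passes the check; a generic MLP on raw coordinates would not.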

## Aggregated Attention

**Figure 3: Approximate backmapping of variable length with Aggregated Attention.**

We introduce a method we call Aggregated Attention to learn the optimal variable-length mapping from the latent CG representation to FG coordinates. This is a variable-length operation, as a single molecule with $n$ atoms can map to any number $N$ of CG beads (each bead represented by a single latent vector). The latent vectors of a single CG bead $Z_B \in R^{F \times 3}$ serve as the keys and values of a single-head attention operation with an embedding dimension of three, to match the x, y, z coordinates. The query vector is the subset of the RDKit conformer corresponding to bead $B \in R^{n_B \times 3}$, where $n_B$ is variable-length, as we know a priori how many FG atoms correspond to a given CG bead. By leveraging attention, we efficiently learn the optimal blending of latent features for FG reconstruction. We call this Aggregated Attention because it aggregates 3D segments of FG information to form the latent queries. Aggregated Attention is responsible for the efficient translation from the latent CG representation to viable FG coordinates (Figure 1(III)).
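The shapes involved can be made concrete with a minimal sketch, assuming plain scaled dot-product attention: for one bead, the latent vectors $Z_B$ ($F \times 3$) act as keys and values, and the bead's RDKit coordinates ($n_B \times 3$) act as queries, producing $n_B$ FG coordinates.

```python
# Aggregated Attention sketch for a single CG bead, assuming standard
# scaled dot-product attention with embedding dimension 3 (x, y, z).
import numpy as np

def aggregated_attention(query, z_b):
    """query: (n_B, 3) RDKit coords of the bead; z_b: (F, 3) latents -> (n_B, 3)."""
    scores = query @ z_b.T / np.sqrt(3.0)            # (n_B, F) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ z_b                             # blend of latent values

rng = np.random.default_rng(0)
n_B, F = 4, 8                    # 4 FG atoms in this bead, latent size 8
out = aggregated_attention(rng.normal(size=(n_B, 3)), rng.normal(size=(F, 3)))
print(out.shape)  # (4, 3)
```

Because $n_B$ varies per bead and $N$ varies per molecule, the same attention weights handle any coarsening resolution, which is what makes the back-mapping variable-length.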

## Model

CoarsenConf is a hierarchical VAE with an SE(3)-equivariant encoder and decoder. The encoder operates on SE(3)-invariant atomic features $h \in R^{n \times D}$ and SE(3)-equivariant atomic coordinates $x \in R^{n \times 3}$. A single encoder layer consists of three modules: a fine-grained module, a pooling module, and a coarse-grained module. Complete equations for each module can be found in the full paper. The encoder produces the final equivariant CG tensor $Z \in R^{N \times F \times 3}$, where $N$ is the number of beads and $F$ is the user-defined latent size.
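A shape-level sketch of the pooling step may help fix the dimensions: atoms are pooled into their assigned beads, reducing $n$ rows to $N$. The simple mean pool and broadcast used here are illustrative stand-ins for the paper's actual pooling module:

```python
# Shape sketch of the encoder's n -> N pooling, using a mean pool as a
# stand-in for the real (equivariant) pooling module.
import numpy as np

n, N, F = 11, 3, 8                       # QM9-scale: 11 atoms, 3 beads
x = np.random.default_rng(0).normal(size=(n, 3))          # FG coordinates
assignment = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])  # atom -> bead index

# Pool coordinates per bead, then broadcast to F latent channels.
pooled = np.stack([x[assignment == b].mean(axis=0) for b in range(N)])
Z = np.repeat(pooled[:, None, :], F, axis=1)              # (N, F, 3) CG tensor
print(Z.shape)  # (3, 8, 3)
```

The output matches the $Z \in R^{N \times F \times 3}$ latent tensor described above, with one $F \times 3$ latent block per bead.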

The role of the decoder is twofold. The first is to transform the latent coarse-grained representation back into the FG space through a process called channel selection that leverages Aggregated Attention. The second is to autoregressively refine the fine-grained representation to generate the final low-energy coordinates (Figure 1(IV)).

We emphasize that our model learns the optimal torsion angles in an unsupervised manner, as the approximate conformer provided as conditional input to the decoder does not share the optimal torsion angles along its severed rotatable bonds. CoarsenConf therefore ensures that each generated subgraph is appropriately rotated to achieve low coordinate and distance error.

## Experimental Results

**Table 1**: Quality of generated conformer ensembles for the GEOM-DRUGS test set ($\delta = 0.75$ Å) in terms of Coverage (%) and Average RMSD (Å). CoarsenConf (5 epochs) was restricted to 7.3% of the data used by Torsional Diffusion (250 epochs) to illustrate a low-compute and data-constrained regime.

Average RMSD (AR) is a key metric that measures the average RMSD between the generated conformers and the ground truth across the test set. Coverage measures the percentage of molecules that can be generated within a specific error threshold ($\delta$). To better assess robust generation and avoid the sampling bias of the min metric, we introduce mean and max metrics. We emphasize that the min metric yields somewhat intangible results: unless the optimal conformer is known a priori, there is no way to know which of the $2L$ conformers generated for a single molecule is best. Table 1 shows that CoarsenConf achieves the lowest mean and worst-case errors across the entire test set of drug molecules. We further show that RDKit with inexpensive physics-based optimization (MMFF) achieves better coverage than most deep-learning-based methods. Please see the full paper linked below for formal metric definitions and further discussion.
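The metrics discussed above can be sketched directly, under the assumption that a matrix `rmsd[i, j]` of RMSDs between each ground-truth conformer $i$ and each generated conformer $j$ is already available; the helper below is illustrative, not the paper's evaluation code:

```python
# Sketch of Coverage and the min / mean / max RMSD statistics, given a
# precomputed (L, 2L) matrix of ground-truth-vs-generated RMSDs in Angstroms.
import numpy as np

def conformer_metrics(rmsd, delta=0.75):
    best = rmsd.min(axis=1)              # best generated match per ground truth
    return {
        "coverage": float((best < delta).mean()),  # fraction matched within delta
        "ar_min": float(best.mean()),              # average of per-truth minima
        "ar_mean": float(rmsd.mean(axis=1).mean()),
        "ar_max": float(rmsd.max(axis=1).mean()),  # average worst-case error
    }

rmsd = np.array([[0.5, 1.2, 0.9, 2.0],     # L = 2 ground truths, 2L = 4 samples
                 [1.1, 0.8, 1.5, 0.6]])
m = conformer_metrics(rmsd)
print(m["coverage"])  # 1.0: both ground truths matched within 0.75 A
```

The min statistic rewards a single lucky sample, while the mean and max statistics capture ensemble-wide robustness, which is the motivation given above for reporting them.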

For more information about CoarsenConf, read the paper on arXiv.

## BibTeX

If CoarsenConf inspires your work, please cite it as:

```
@article{reidenbach2023coarsenconf,
  title={CoarsenConf: Equivariant Coarsening with Aggregated Attention for Molecular Conformer Generation},
  author={Danny Reidenbach and Aditi S. Krishnapriyan},
  journal={arXiv preprint arXiv:2306.14852},
  year={2023},
}
```