differentiable scaffolding tree for molecular optimization

For QED, LogP, normalized SA, JNK3, and GSK3, higher scores are more desirable under our experimental setting. 20796412. We consider following oracles: For all the property scores above, higher is more desirable. Proved. Use Git or checkout with SVN using the web URL. Differentiable Scaffolding Tree for Molecular Optimization. Other GNN variants, such as Graph Attention Network (GAT)[velivckovic2017graph], Graph Isomorphism Network (GIN)[xu2018powerful], can also be used in our setting. We provide two examples in Figure6 to illustrate it. expand) takes the form: where is the differentiable edge set defined above, Intuitively, all the selected molecules are dissimilar to each other and the diversity is maximized. Sun, MIMOSA: multi-constraint molecule sampling for molecule optimization, T. Fu, C. Xiao, C. Qian, L. M. Glass, and J. To the best of our knowledge, it is the first attempt to make the molecular optimization problem differentiable at the substructure level, rather than resorting to latent spaces or using RL/evolutionary algorithms. To select the substructure set S, we break all the ZINC molecules into substructures (including single rings and single atoms), count their frequencies, and include the substructures whose frequencies are higher than 1000 into vocabulary set S. QED score ranges from 0 to 1. where the grey points represent the two-dimensional vector of ZINC molecules. We remove the molecules that contains substructure that is not in vocabulary. Log In. 3.1.2Scaffolding TreeThe basic mathematical description of a molecule is molecular graph, which contains atoms as nodes and chemical bonds as edges. The sets of leaf nodes and non-leaf nodes are denoted Vleaf and Vnonleaf correspondingly. a probable furazan oxide triggered tandem isomerisation process, P. Velikovi, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, Y. Wang, S. H. Bryant, T. Cheng, J. Wang, A. Gindulyte, B. (3) SR (Success Rate); (2) GCPN (Graph Convolutional Policy Network)[You2018-xh]; Additional baseline methods include JTVAE (junction tree variational autoencoder)[jin2018junction], and GraphAF (Graph Flow-based Autoregressive Model). Those algorithms still require massive numbers of oracle calls, which is computationally inefficient during the inference time[27]. All results in the tables are from experiments up to T=50 iterations. Remark. Differentiable node indicator matrix; adjacency matrix; node weight. 10). Thus, det(SR)=Ci=1(SR)ii goes to 1. Overall, DST +DPP is the best strategy compared with other variants. When optimizing JNK3, GSK3, QED, JNK3+GSK3 and QED+SA+JNK3+GSK3, we use binary cross entropy as loss criterion. Batch normalization is adopted after each layer, and sum-pooling is used as the aggregation function. B. Zimmerman, P. T. Anastas, H. C. Erythropel, and W. Leitner, Oracle function, e.g., evaluator of molecular property (Def. We also define differentiable edge set, ={(v,v)|vVleafORvVexpand;v,vare connected} to incorporate all the edges involving leaf-nonleaf node and leaf/nonleaf-expansion node connections. It is denoted R=DPP-greedy({X1,,XM},C). The vocabulary size is big enough for this proof-of-concept study. The node weights and their gradient values interpret the property at the substructure level. For most of the target properties, the normalized loss value on the validation set would decrease significantly, and GNN can learn these properties well, except QED. Oracle GNN is trained once and for all. ER|S|d is the embedding matrix of all the substructures in vocabulary set S, and is randomly initialized. The proposed MoFlow is a flow-based graph generative model to learn invertible mappings between molecular graphs and their latent representations and achieves state-of-the-art performance, which implies its potential efficiency and effectiveness to explore large chemical space for drug discovery. The authors of Differentiable Scaffolding Tree for Molecular Optimization have not publicly listed the code yet. Figure 1: Illustration of the overall approach: During inference, we construct the corresponding scaffolding tree and differentiable scaffolding tree (DST) for each molecule. initial node embeddings, stacks basic embeddings of all the nodes in the scaffolding tree. Node weight vector, w=[1,,1]RK, indicates the K nodes in scaffolding tree are equally weighted. We want to sample a subset (denoted R) of M data, R is a subset of {1,2,,M} with fixed size C, it assigns the probability. For a nonleaf node v, we support (1) EXPAND: add a new node uv connecting to v; (2) do nothing. Those algorithms still require massive numbers of oracle calls, which is computationally inefficient during the inference time[korovina2020chembo]. Based on learnable weights for each node, we leverage the weighted average as the readout function of the last layers (L, -th) node embeddings, followed by multilayer perceptron (MLP) to yield the prediction. On the other hand, we specify the smoothness of the objective function F by assuming. We enumerate all the possible molecules following[22] (See SectionC.5 in Appendix for more details) for the further selection as described below. Only rings with a size of 5 and 6 are allowed. LogP represents octanol-water partition coefficient, measuring molecules solubility. Then we compare our method with baseline methods on 3 molecules with the highest objective (F) scores in Table5 and6 for single-objective and multi-objective generation, respectively. We provide an interpretability example in Figure13. This work shows that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing, and demonstrates that the properties of the generated molecules correlate very well with those of the molecules used to train the model. -greedy would converge to local optimum within finite steps. It is evaluated via RDKit[landrum2006rdkit]. arxiv:2110.06389 (2021) Intuitively, all the selected molecules are dissimilar to each other and the diversity is maximized. (2) Diversity; Learning, DEFactor: Differentiable Edge Factorization-based Probabilistic Graph Black labels plotted on the tips of the scaffold tree with correspondingcolored squares represent pure, unadmixed populations (as determined by a twoway f 3 test). T Fu et al. Substructures can be either an atom or a single ring. then we have (1) V12SV12 is positive semidefinite; (2) V12RSRV12R=(V12SV12)R. which means we can use DPP-greedy (Def. For ease of computation, we convert a molecular graph to a scaffolding tree as a higher-level representation, a tree of substructures, following[22, 24]. understand the model output. When j is a leaf node, it naturally embeds the inheritance relationship between the leaf node and the corresponding expansion node. For example, [49, 52, 23, 15] tried to solve the problem with deep reinforcement learning; Inspired by problems faced during medicinal chemistry lead optimization, the MolDQN model is extended with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule. For fairness of comparison, validation loss is normalized by dividing the validation loss at scratch (i.e., 0-th epoch) so that all the validation losses are between 0 and 1. 2209.00796 - Free download as PDF File (.pdf), Text File (.txt) or read online for free. 3.1.1Molecular optimization problemOracles are the objective functions for molecular optimization problems, e.g., QED quantifying a molecules drug-likeness[bickerton2012quantifying]. GSK3 (Glycogen synthase kinase 3 beta) is an enzyme that in humans is encoded by the GSK3 gene. We consider the following variants: DST + DPP. Adam is used as an optimizer with 1e-4 as the initial learning rate. When goes to infinity, i.e., only considering the first term (tRvt), it is equivalent to selecting C molecule candidates with the highest F score for the next iteration, same as conventional evolutionary learning in[21, 36]. This repository hosts DST (Differentiable Scaffolding Tree for Molecule Optimization) (Tianfan Fu*, Wenhao Gao*, Cao Xiao, Jacob Yasonik, Connor W. Coley, Jimeng Sun), which enables a gradient-based optimization on a chemical graph. SA; is the differentiable edge set defined above, In realistic discovery settings, the oracle acquisition cost is usually not negligible. Then we have, Combining Equation(26) and (27), we have. The difference is when selecting molecules for the next iteration, it selects the top-K molecules with highest f score. This problem formulation can find its root in conventional computer-aided molecular design algorithms with branch-and-bound algorithms as solutions[sinha1999environmentally, sahinidis2000applications]. (7), it is differentiable with regard to {N,A,w} for all molecules in the neighborhood set N(X). When connecting atom and ring in a molecule, an atom can be connected to any possible atoms in the ring. The raw SA score ranges from 1 to 10. Molecular optimization is generally a discrete optimization task, which is prohibitively expensive due to exhaustive search. It is higher-level representation of molecular graph X. TX is represented by (i) node indicator matrix, (ii) adjacency matrix, and (iii) node weight vector. Adam is used as an optimizer with 3e-4 initial learning rate. Aij=1 indicates the i-th node and j-th node are connected while 0 indicates unconnected. In this project, the basic unit is substructure, which can be atoms or single rings. QED score ranges from 0 to 1. In this section, we present some theoretical results of the proposed method. DST + top-K. Patterns (2020). GCPN predicts the actions and is trained via proximal policy optimization (PPO) to optimize an accumulative reward, including molecular property objectives and adversarial loss. The computational complexity of DST is O(TMC2) (see SectionC.8 in Appendix). Additional baseline methods include JTVAE (junction tree variational autoencoder)[22] and GraphAF (Graph Flow-based Autoregressive Model)[39]. Abnormal regulation and expression of GSK3 is associated with an increased susceptibility towards bipolar disorder. LigGPT consists of 8 such decoder blocks. and (2) multi-objective generation that optimizes the mean value of JNK3+GSK3 and QED+SA+JNK3+GSK3 in the main text. Same as DST + DPP, it uses DST to sample new molecules. (1) LigGPT (string-based distribution learning model with Transformer as a decoder)[1]; A tag already exists with the provided branch name. You, B. Liu, R. Ying, V. Pande, and J. Leskovec, Graph convolutional policy network for goal-directed molecular graph generation, Determinantal point processes for mini-batch diversification, Uncertainty in Artificial Intelligence (UAI). Diversity of generated molecules is defined as the average pairwise Tanimoto distance between the Morgan fingerprints[49, 23, 47]. The computational complexity is O(TMC2) (the main bottleneck is DPP method, Algorithm2), where the size of selected molecules C=10 for all the tasks (Section3.4 &C.6). We check the results for DST + top-K, during some period, the objective does not grow, we find it is trapped into local minimum, impeding its performance, especially convergence efficiency. For the matrix A whose shape is greater than 1 and all the elements are equal to 1, the determinant is equal to Baselines. We find that both DST sampling and DPP-based diversification play a critical role in performance. The reward includes target property and similarity constraint. The constraint is the f score is greater than 0.4. In this section, we present some additional results of de novo molecular generation for completeness. The results of multi-objective and single-objective generation are shown in Table1 and2. is an important hyperparameter that weights the importance of the discriminators loss in the overall fitness function, and we set it to 10. We choose graph neural network architecture for its state-of-the-art performance in modeling structure-property relationships. We follow most of the settings in the original paper. brute-force enumeration. Node indicator matrix; adjacency matrix; node weight. Suppose Assumption1 and2 hold, we have the following relative improvement bound with the optimum. 4). JNK3; On the other hand, if we only consider the second term in Eq. In the multiple-objective optimization, we normalize the SA score to [0,1] so that a higher normalized SA value mean easy to synthesize. New methodologies for molecule generation with interpretable and controllable deep generative models, by proposing new monotonically-regularized graph variational autoencoders that learn to represent the molecules with latent variables and then learn the correspondence between them and molecule properties parameterized by polynomial functions. T Fu, W Gao, C Xiao, J Yasonik, CW Coley, J Sun. Ring-ring connection. A. Shoemaker, P. A. Thiessen, S. He, and J. Zhang, SMILES, a chemical language and information system. Inspired by generalized DPP methods[29, 5], we further transform LDPP(R). On the other hand, atom-wise molecule generation is difficult due to the existence of rings. The structural design of functional molecules, also called molecular optimization, is an essential chemical science and engineering task with important applications, such as drug discovery. We describe the DPP-greedy algorithm in Algorithm2 for completeness. In our pipeline, the random error comes from in two steps: nodes in TX, there are Kleaf leaf nodes (nodes connecting to only one edge) and KKleaf non-leaf nodes (otherwise). A novel framework to systematically evaluate exploration measures for drug candidate generation is proposed and recommendations on existing and novel exploration measures that are suitable for the task of drug discovery are made. (1) Training oracle GNN: data selection/split, training process including data shuffle and GNNs parameter initialization. The atomic number is the only chemical feature that is used to derive the TUCAN format. SA (Synthetic Accessibility) score measures how hard it is to synthesize a given molecule, based on a combination of the molecules fragments contributions[10]. Despite the initial success of these previous attempts, the following limitations remain: 11) to solve Problem(21) and obtain the optimal R. Discussion. It is higher-level representation of molecular graph X. TX is represented by (i) node indicator matrix, (ii) adjacency matrix, and (iii) node weight vector. In the de novo generation, in each generation, we keep C=10 molecules for the next iteration. As a complete generation algorithm, we optimize a batch parallelly and select candidates based on DPP. atoms. is the objective function of ti-th molecule Xti. (Scaffold Constrained Molecular Generation), to perform scaffold-constrained in silico molecular design. The vocabulary size is big enough for this proof-of-concept study. In the current iteration, we have generated M molecules (X1,,XM) and need to select C molecules for the next iteration. (2) Inference (Optimizing DST): before optimizing DST, we initialize the learnable parameter randomly, including N, w, A, which also brings randomness. chemical validities. (differentiable) gradient-based optimization on a chemical graph for de novo molecule design/optimization (ICLR 2022). or Small-molecule metabolites are promising and reliable biomarkers for diverse clinical uses including early disease detection, drug identification, toxicological screening of new drugs, and drug pharmacology studies to advance personalized medicine. We address the following optimization problem to get the best scaffolding tree within N(X). For a nonleaf node v, we support (1) EXPAND: add a new node uv connecting to v; (2) do nothing. Oracle is a property evaluator and is a function whose input is molecular structure, and output is the property. The weight of expansion node connecting to leaf node relies on the weight of corresponding leaf node. Optimizing differentiable scaffolding tree: We formulate the discrete molecule optimization into a locally differentiable problem with a differentiable scaffolding tree (DST). bX is the binary Morgan fingerprint vector for the molecule X. where X2=EXPAND(X1,s1),X3=EXPAND(X2,s2), s1,s2 are substructures to add. Overall, DST +DPP is the best strategy compared with other variants. The junction tree variational autoencoder generates molecular graphs in two phases, by first generating a tree-structured scaffold over chemical substructures, and then combining them into a molecule with a graph message passing network, which allows for incrementally expand molecules while maintaining chemical validity at every step. where 22277390. Specifically, while inheriting leaf node set Vleaf and non-leaf node set Vnonleaf from the original scaffolding tree, we add expansion nodes and form expansion node set, Vexpand={uv|vVleafVnonleaf},|Vexpand|=Kexpand=K, where uv is connected to v in the original scaffolding tree. 20: 2021: Amortized tree generation for bottom-up synthesis planning and synthesizable molecular design . when both 1,2 approach to 0+, the optimal solution to Problem(19) is. We used Adam optimizer with 1e-3 as the initial learning rate. The structural design of functional molecules, also called molecular optimization, is an essential chemical science and engineering task with important applications, such as drug discovery. (1) QED, (quantitative estimate of drug-likeness). Each row of N is a one-hot vector, indicating which substructure the node belongs to. Dataset. (2) another is two rings share two atoms and one bond. 1), where Perm(M) is the set of all permutations of the set {1,2,,M}, Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. The hidden layer of the neural network outputs a vector of size 1024 and uses a GELU activation and the final layer again returns a vector of size 256 to be 7 used as input for the next decoder block. V12=diag([exp(v12),,exp(vM2)]). The local derivative shows consistency with chemical intuition, providing interpretability of the chemical structure-property relationship. When connecting ring and ring, there are two general ways, (1) one is to use a bond (single, double, or triple) to connect the atoms in the two rings. The local derivative shows consistency with chemical intuition, providing interpretability of the chemical structure-property relationship. Combining Equation(23) and(24), we prove V12RSRV12R=(V12SV12)R. We consider two cases in the solution R. (A) one molecule for each input molecule Z1,,ZC. This is consistent with our intuition that logP score will prefer larger molecules with more carbon atoms. Number of all possible molecules to select. Learn more. (4) # of oracle calls: DST needs to call oracle in labeling data for GNN (precomputed) and DST based de novo generation (online), we show the costs for both steps. . For the de novo design, DST-greedy start from scratch (empty molecule). where V12 is diagonal matrix, so V12=(V12). Our method, DST, falls into this category, explicitly leverages the objective function landscape and conducts an efficient goal-oriented search. Although the novelty is not the highest, it is still comparable to baseline methods. We report the learning curve in Figure10, where we plot the normalized loss on the validation set as a function of epoch numbers when learning GNN. Each self-attention layer returns a vector of size 256, that is taken as input by the fully connected network. (ii) For each nonleaf node v, we expand a new node with probability (w)uv|v. good paper Sign up to our mailing list for occasional updates. Then we present the following lemma to show that when only considering diversity, under certain assumptions, Problem(19) reduces to multiple chain MCMC methods. Differentiable Scaffolding Tree for Molecule Optimization; MS2-Transformer: An End-to-End Model for MS/MS-assisted Molecule Identification; Graph Piece: Efficiently Generating High-Quality Molecular Graphs with Substructures; Pre-training Molecular Graph Representation with 3D Geometry; Spherical Message Passing for 3D Molecular Graphs Monireh Golpour et al. The substructure is the basic building block in our method, including frequent atoms and rings. In the example, there are 4 possible ways to add a Chlorine atom (Cl) as an expansion node to the target ring, which is a leaf node in the scaffolding tree. However, it is challenging to design the effective reward function into the objective[23]. where X2=EXPAND(X1,s1),X3=EXPAND(X2,s2), s1,s2 are substructures to add. Node indicator matrix N is decomposed as N=(NnonleafNleaf){0,1}K|S|, 1. SR (Success Rate) is the percentage of the generated molecules that satisfy the property constraint measured by objective f defined in Equation(1). V12 is diagonal. In contrast, LigGPT belongs to distribution learning (a different category of method), which learns the distribution of the training set. where SRRCC is the sub-matrix of S, det(SR) is the determinant of the matrix SR. The clean data is available at [19, 9] (https://tdcommons.ai/generation_tasks/molgen/). Following the original paper, the episode number is 5,000, maximal step in each episode is 40.