Multi-task Reaction Predictions


Abstract


In this work, we develop T5Chem, a unified deep learning model for a variety of chemical reaction prediction tasks, by adapting the "Text-to-Text Transfer Transformer" (T5) framework from natural language processing (NLP). T5Chem is pre-trained in a self-supervised manner on PubChem molecules and can then be fine-tuned for many tasks. Four distinct reaction prediction tasks on open-source datasets are supported: reaction type classification on USPTO_TPL, forward reaction prediction on USPTO_MIT, single-step retrosynthesis on USPTO_50k, and reaction yield prediction on high-throughput C-N coupling reactions. We also introduce a new unified multi-task reaction prediction dataset, USPTO_500_MT, which can be used to train and test five different types of reaction tasks: the above four plus a new reagent suggestion task.
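To make the text-to-text framing concrete, the sketch below shows one way different reaction tasks could be cast as (input string, target string) pairs by prepending a task prefix to the SMILES input. The prefix strings, the function name to_text_pair, and the class label are assumptions made for illustration; they are not taken from the T5Chem code and may differ from what the model actually uses.

```python
# Minimal sketch: casting reaction tasks into a shared text-to-text format.
# The prefixes below are HYPOTHETICAL; T5Chem's real prompt strings may differ.
TASK_PREFIXES = {
    "classification": "Classification:",
    "forward": "Product:",
    "retrosynthesis": "Reactants:",
    "reagents": "Reagents:",
    "yield": "Yield:",
}


def to_text_pair(task, source, target):
    """Return (model input text, target text) for one training example."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    # Every task becomes plain text in, plain text out, so a single
    # sequence-to-sequence model can be trained on all of them.
    return f"{TASK_PREFIXES[task]}{source}", str(target)


# Toy esterification example (ethanol + acetic acid -> ethyl acetate).
examples = [
    to_text_pair("forward", "CCO.CC(=O)O", "CCOC(C)=O"),
    to_text_pair("retrosynthesis", "CCOC(C)=O", "CCO.CC(=O)O"),
    # The class label 0 is a placeholder, not a real USPTO_TPL template id.
    to_text_pair("classification", "CCO.CC(=O)O>>CCOC(C)=O", 0),
]

for model_input, model_target in examples:
    print(model_input, "->", model_target)
```

Because only the prefix changes between tasks, the same tokenizer and model weights can serve all of them, which is the property a multi-task dataset such as USPTO_500_MT relies on.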

Code

Reproduce our numerical results or build your own model using the code from GitHub.

Datasets

Name            Size                  Task type
sample          ~60,000 (1.9 MB)      Multi-task (small sample dataset)
USPTO_TPL       445,115 (9.1 MB)      Classification (Reaction Type)
USPTO_MIT       479,035 (18.4 MB)     Forward Prediction
USPTO_50k       50,037 (892 KB)       Retrosynthesis
C-N coupling    3,955 (774 KB)        Regression (Reaction Yield)
USPTO_500_MT    143,535 (54.5 MB)     Multi-task

Pre-trained Models

All pre-trained models are trained on 97 million PubChem molecules with a BERT-like self-supervised mask-filling scheme. They need to be loaded as "--pretrain" models for T5Chem and fine-tuned for any downstream task. A ready-to-use fine-tuned model for multi-task training (on USPTO_500_MT) is also available.
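As an illustration of what a BERT-like mask-filling objective on character-level SMILES can look like, the sketch below hides a random subset of characters and records the originals as prediction targets. It is a simplified stand-in, not the T5Chem pre-training code: the function name mask_smiles, the "<mask>" token, and the 15% masking rate (BERT's default) are assumptions, and BERT's additional random-replacement and keep-unchanged cases are omitted.

```python
import random

MASK = "<mask>"  # hypothetical mask token; the real vocabulary may differ


def mask_smiles(smiles, mask_prob=0.15, seed=None):
    """Mask characters of a SMILES string for a mask-filling objective.

    Returns (masked_tokens, labels), where labels maps each masked
    position to the original character the model should recover.
    """
    rng = random.Random(seed)
    tokens = list(smiles)  # character-level tokenization: one token per char
    if not tokens:
        return [], {}

    masked = tokens[:]
    labels = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            labels[i] = tok

    # Guarantee at least one target so every molecule yields a training signal.
    if not labels:
        i = rng.randrange(len(tokens))
        masked[i] = MASK
        labels[i] = tokens[i]
    return masked, labels


masked_tokens, targets = mask_smiles("CC(=O)OCC", seed=0)
print(masked_tokens)
print(targets)
```

Note that character-level tokenization splits two-letter atoms such as Cl or Br into separate characters; the model then has to learn from context that they belong together.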

Model                                  Download
Character-level PubChem pretrained     Download (53.3 MB)
Multi-task USPTO_500_MT (trained)      Download (53.3 MB)

Reference

Jieyu Lu and Yingkai Zhang, J. Chem. Inf. Model., 62, 1376-1387 (2022)
Unified Deep Learning Model for Multi-task Reaction Predictions with Explanation