In this work, we develop T5Chem, a unified deep learning model for a variety of chemical reaction prediction tasks, by adapting the "Text-to-Text Transfer Transformer" (T5) framework from natural language processing (NLP). T5Chem can be fine-tuned for many tasks, starting from self-supervised pre-training on PubChem molecules. Four distinct reaction prediction tasks on open-source datasets are available: reaction type classification on USPTO_TPL, forward reaction prediction on USPTO_MIT, single-step retrosynthesis on USPTO_50k, and reaction yield prediction on high-throughput C-N coupling reactions. We also introduce a new unified multi-task reaction prediction dataset, USPTO_500_MT, which can be used to train and test five different types of reaction tasks: the four above plus a new reagent suggestion task.
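To illustrate how several reaction tasks can share one text-to-text model, the sketch below turns a single reaction into (source, target) string pairs for three of the tasks. It is only a minimal illustration under assumptions: the function name `to_text_pairs`, the task keys, and the prompt prefixes (`Product:`, `Reactants:`, `Reagents:`) are hypothetical choices made for this example and are not taken from the T5Chem code; the reaction is a toy esterification written in reaction SMILES (`reactants>reagents>product`).

```python
def to_text_pairs(reactants: str, reagents: str, product: str) -> dict:
    """Cast one reaction into text-to-text examples for several tasks.

    All prefixes and task keys are hypothetical, chosen only to show
    how a task prompt can tell a single model which output is wanted.
    """
    return {
        # Forward prediction: reactants and reagents in, product out.
        "forward": (f"Product:{reactants}>{reagents}>", product),
        # Single-step retrosynthesis: product in, reactants out.
        "retrosynthesis": (f"Reactants:{product}", reactants),
        # Reagent suggestion: reactants and product in, reagents out.
        "reagents": (f"Reagents:{reactants}>>{product}", reagents),
    }


# Toy example: acetic acid + ethanol -> ethyl acetate, sulfuric acid as reagent.
pairs = to_text_pairs("CC(=O)O.OCC", "OS(=O)(=O)O", "CC(=O)OCC")
for task, (source, target) in pairs.items():
    print(task, "|", source, "->", target)
```

Classification and yield regression would fit the same pattern, with the target replaced by a class label or a number; how the actual model handles those outputs is not shown here.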
Reproduce our numerical results or build your own model using the code on GitHub.
Name | Size | Task type |
---|---|---|
sample | ~60,000 (1.9 MB) | Multi-task (small sample dataset) |
USPTO_TPL | 445,115 (9.1 MB) | Classification (Reaction Type) |
USPTO_MIT | 479,035 (18.4 MB) | Forward Prediction |
USPTO_50k | 50,037 (892 KB) | Retrosynthesis |
C-N coupling | 3,955 (774 KB) | Regression (Reaction Yield) |
USPTO_500_MT | 143,535 (54.5 MB) | Multi-task |
All pre-trained models are trained on 97 million PubChem molecules with a BERT-like self-supervised mask-filling scheme. They need to be loaded as "--pretrain" models for T5Chem and then fine-tuned for any downstream task. A ready-to-use fine-tuned model for multi-task training (on USPTO_500_MT) is also available.
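As a rough illustration of what a BERT-like mask-filling objective looks like on molecules, the sketch below randomly replaces tokens of a SMILES string with a mask token and records the originals as prediction targets. It is an assumption-laden toy, not the T5Chem pre-training code: the character-level tokenization, the `<mask>` token, the 15% masking rate, and the function name `mask_tokens` are all hypothetical choices for this example.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical mask symbol, not taken from T5Chem


def mask_tokens(smiles: str, mask_prob: float = 0.15, seed=None):
    """Return (masked_tokens, labels) for a mask-filling objective.

    Assumes character-level tokenization. labels[i] holds the original
    token where position i was masked, and None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in smiles:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)  # the model must recover this token
        else:
            masked.append(token)
            labels.append(None)   # position is not scored
    return masked, labels


# Toy usage on ethyl acetate; a fixed seed makes the masking repeatable.
masked, labels = mask_tokens("CC(=O)OCC", mask_prob=0.3, seed=0)
print("".join(masked))
```

During pre-training, a model would be scored only on the masked positions, which lets it learn from unlabeled molecules before any task-specific fine-tuning.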
Jieyu Lu and Yingkai Zhang,
Unified Deep Learning Model for Multi-task Reaction Predictions with Explanation,
J. Chem. Inf. Model., 62, 1376-1387 (2022)