
In this work, we develop T5Chem, a unified deep learning model for a variety of chemical reaction prediction tasks, by adapting the "Text-to-Text Transfer Transformer" (T5) framework from natural language processing (NLP). Built on self-supervised pre-training with PubChem molecules, T5Chem can be fine-tuned for many downstream tasks. Four distinct types of reaction prediction tasks on open-source datasets are available here: reaction type classification on USPTO_TPL, forward reaction prediction on USPTO_MIT, single-step retrosynthesis on USPTO_50k, and reaction yield prediction on high-throughput C-N coupling reactions. In addition, we introduce a new unified multi-task reaction prediction dataset, USPTO_500_MT, which can be used to train and test five different types of reaction tasks: the four above plus a new reagent suggestion task.
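The core idea of the T5 framework is that every task shares one interface: an input string goes in and an output string comes out, with a short task prefix telling the model what to do. The sketch below illustrates that framing for reaction data. The prefix strings and the `to_text_pair` helper are hypothetical placeholders, not T5Chem's actual tokens or API. In the original T5 recipe even class labels and numbers are emitted as text; T5Chem's handling of classification and yield outputs may differ.

```python
from dataclasses import dataclass

# Hypothetical task prefixes, for illustration only (not T5Chem's real tokens).
TASK_PREFIXES = {
    "classification": "Classification:",
    "forward": "Product:",
    "retrosynthesis": "Reactants:",
    "reagents": "Reagents:",
    "yield": "Yield:",
}


@dataclass
class TextPair:
    source: str  # prefixed model input
    target: str  # expected model output


def to_text_pair(task: str, source_smiles: str, target) -> TextPair:
    """Cast one reaction example into a (source, target) string pair."""
    prefix = TASK_PREFIXES[task]
    # Labels of any type are serialized with str() so that every task
    # shares the same sequence-to-sequence interface (generic T5 framing).
    return TextPair(source=f"{prefix} {source_smiles}", target=str(target))


# Forward prediction: ethanol + acetic acid -> ethyl acetate.
pair = to_text_pair("forward", "CCO.CC(=O)O", "CCOC(C)=O")
print(pair.source)
print(pair.target)
```

Because all tasks reduce to string pairs, a single encoder-decoder can be trained on a mixture of them, which is what makes a multi-task dataset such as USPTO_500_MT usable with one model.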

Code

Reproduce our numerical results or train your own model using the code on GitHub.

Datasets

Name	Size (entries, file size)	Task type

sample	~60,000 (1.9 MB)	Multi-task (small sample dataset)
USPTO_TPL	445,115 (9.1 MB)	Classification (Reaction Type)
USPTO_MIT	479,035 (18.4 MB)	Forward Prediction
USPTO_50k	50,037 (892 KB)	Retrosynthesis
C-N coupling	3,955 (774 KB)	Regression (Reaction Yield)
USPTO_500_MT	143,535 (54.5 MB)	Multi-task

Pre-trained Models

All pre-trained models are trained on 97 million PubChem molecules with a BERT-like self-supervised mask-filling scheme. They need to be loaded as "--pretrain" models for T5Chem and then fine-tuned for any downstream task. A ready-to-use model already fine-tuned for multi-task prediction on USPTO_500_MT is also available.
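A BERT-like mask-filling objective hides some input tokens and trains the model to recover them. The sketch below shows that idea for character-level SMILES tokens. It is a simplified illustration under stated assumptions: the `<mask>` token name, the 15% masking rate (BERT's default), and always substituting the mask token (BERT also sometimes keeps or randomizes the token) are assumptions, not details confirmed for T5Chem's pre-training.

```python
import random

MASK = "<mask>"  # hypothetical mask token name


def char_tokenize(smiles: str) -> list[str]:
    # Character-level tokenization: every character is its own token,
    # so a two-letter atom such as "Cl" becomes "C" and "l".
    return list(smiles)


def mask_tokens(tokens: list[str], mask_prob: float = 0.15, rng=None):
    """Return (masked_inputs, labels); labels are None where nothing is masked."""
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)  # hide the character from the model
            labels.append(tok)   # the model must predict the original here
        else:
            inputs.append(tok)
            labels.append(None)  # position is not scored in the loss
    return inputs, labels


tokens = char_tokenize("CC(=O)OCC")  # ethyl acetate
masked, labels = mask_tokens(tokens, rng=random.Random(42))
print(masked)
print(labels)
```

Restoring each masked position from `labels` recovers the original SMILES string, which is exactly the reconstruction task the model is trained on.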

Model	Download

Character-level PubChem pre-trained	Download (53.3 MB)
Multi-task USPTO_500_MT (Trained)	Download (53.3 MB)
