This is the latest version of the tutorial for ΔvinaXGB; the old version can be found here.
The first and second parts of this tutorial cover installing the dependencies and setting up ΔvinaXGB.
The third and fourth parts describe the dataset and give examples of applying the ΔvinaXGB scoring function to rescore protein-ligand binding affinity.
make Makefile create_environment
conda activate DXGB
make Makefile requirements
conda install -c conda-forge xgboost=0.80.0
conda install -c rdkit rdkit=2019.03.1
conda install -c openbabel openbabel
conda install -c conda-forge setuptools
python setup.py install
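After the installation commands above, a quick import check can confirm that the main Python dependencies are visible in the DXGB environment. The sketch below is not part of ΔvinaXGB: the helper name missing_modules is hypothetical, and the import names xgboost, rdkit, and openbabel are assumed to match the conda packages listed above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


# Import names are assumptions based on the conda packages installed above.
print(missing_modules(["xgboost", "rdkit", "openbabel"]))
```

An empty list means all three modules were found; any name that is printed likely needs to be reinstalled in the active environment.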
MGLTools is used to prepare PDBQT files for Vina. MSMS is used to calculate the solvent-accessible surface area. Both can be downloaded from the MGLTools website.
Note
Remember to use the correct mgltools file name (Linux or Mac) when installing mgltools.
tar -xvzf mgltools_x86_64Linux2_1.5.6.tar.gz
cd mgltools_x86_64Linux2_1.5.6/
./install.sh
Unpack msms_i86_64Linux2_2.6.1.tar.gz into a new msms directory:
mkdir msms
tar -xvzf msms_i86_64Linux2_2.6.1.tar.gz -C msms
cd msms
Copy msms.x86_64Linux2.2.6.1 to msms:
cp msms.x86_64Linux2.2.6.1 msms
In the msms folder, there is a script, pdb_to_xyzr. Change the line
numfile = "./atmtypenumbers"
to
numfile = "YourPATHofddeltaVinaXGB/DXGB/atmtypenumbers"
The atmtypenumbers file we used can be found in the deltaVinaXGB/DXGB directory.
pdb_to_xyzr 1crn.pdb > 1crn.xyzr
If it doesn't work, try
./pdb_to_xyzr 1crn.pdb > 1crn.xyzr
If you get the error nawk: command not found, change nawk to awk in pdb_to_xyzr (line 31).
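The two edits to pdb_to_xyzr described above can also be applied programmatically. This is a minimal sketch, assuming the script contains the literal text numfile = "./atmtypenumbers" and the word nawk; the function name patch_pdb_to_xyzr is hypothetical, and the function works on the script text only, so reading and writing the file is left to the caller.

```python
import re


def patch_pdb_to_xyzr(script_text, atmtypenumbers_path, use_awk=False):
    """Return script_text with the numfile path (and optionally nawk) replaced."""
    # Point numfile at the atmtypenumbers file shipped in deltaVinaXGB/DXGB.
    patched = script_text.replace(
        'numfile = "./atmtypenumbers"',
        f'numfile = "{atmtypenumbers_path}"',
    )
    if use_awk:
        # Replace nawk only where it appears as a whole word.
        patched = re.sub(r"\bnawk\b", "awk", patched)
    return patched
```

Keep a backup copy of the original pdb_to_xyzr before overwriting it with the patched text.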
A fork of AutoDock Vina has been modified to output the features in score_only mode.
After downloading, extract the files from the zip file. If the directory name is vina4dv_master, rename it to vina4dv.
The most recent version of R is recommended.
Note
install.packages('randomForest')
Once the dependencies are installed, several environment variables need to be set in the .bashrc (Linux) or .bash_profile (macOS) file in your home directory. An example is given below; modify the paths to match your setup. In this example, all software is installed under the /home/jl7003 directory.
Example
# path for MSMS
export PATH=$PATH:/home/jl7003/msms/
# set mgltool variable (if mac, should change mgltools_x86_64Linux2_1.5.6 into your downloaded mac version)
export PATH=$PATH:/home/jl7003/mgltools_x86_64Linux2_1.5.6/bin/
export MGL=/home/jl7003/mgltools_x86_64Linux2_1.5.6/
export MGLPY=$MGL/bin/python
export MGLUTIL=$MGL/MGLToolsPckgs/AutoDockTools/Utilities24/
# set vina dir
export VINADIR=/home/jl7003/vina4dv/build/linux/release/
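Before running anything, it can help to confirm that the variables exported above are visible to Python and point to existing paths. The sketch below is an illustration only: the helper name find_env_problems is hypothetical, and the assumption that every variable must point to an existing path is ours, not something the ΔvinaXGB scripts are known to enforce.

```python
import os

# Variable names taken from the example exports above.
REQUIRED_VARS = ("MGL", "MGLPY", "MGLUTIL", "VINADIR")


def find_env_problems(environ=None):
    """Return human-readable problems for unset or dangling variables."""
    env = os.environ if environ is None else environ
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.exists(value):
            problems.append(f"{name} points to a missing path: {value}")
    return problems


for problem in find_env_problems():
    print(problem)
```

If nothing is printed, all four variables are set and resolve to existing paths. Open a new shell or source the edited file first so that the exports take effect.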
pdbid_ligand.mol2/sdf --> ligand structure file
pdbid_protein.pdb --> protein structure file
pdbid_protein_all.pdb --> protein structure file with water molecules
Input.csv --> input feature file
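The naming scheme above can be checked before a run. This is a rough sketch under two assumptions that the tutorial does not state: that the structure files sit directly inside the data directory, and that all three structure files are wanted (the pdbid_protein_all.pdb file may only matter when water is considered). The helper name missing_structure_files is hypothetical.

```python
from pathlib import Path


def missing_structure_files(datadir, pdbid):
    """Return the expected structure files that are absent from datadir."""
    folder = Path(datadir)
    missing = []
    # The ligand may be provided as either mol2 or sdf.
    if not any((folder / f"{pdbid}_ligand.{ext}").is_file() for ext in ("mol2", "sdf")):
        missing.append(f"{pdbid}_ligand.mol2/sdf")
    for name in (f"{pdbid}_protein.pdb", f"{pdbid}_protein_all.pdb"):
        if not (folder / name).is_file():
            missing.append(name)
    return missing
```

For example, missing_structure_files("../Test_2al5", "2al5") returns an empty list when the ligand and both protein files are present.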
After all of the above has been set up, the example can be run in the deltaVinaXGB/DXGB directory.
conda activate DXGB
cd DXGB
You can check the help by
python run_DXGB.py --help
The script can be run for one complex by
python run_DXGB.py --runfeatures --datadir ../Test_2al5 --pdbid 2al5 --average
--runfeatures runs the feature calculation; the default is to calculate all features.
--datadir is the directory that contains the structure files.
--pdbid is the PDB ID of the structure; another type of index can also be used.
--average calculates the average score from 10 models.
It can also be run on a list of protein-ligand complexes by providing their input features, as in Input.csv:
python run_DXGB.py --datadir ../Test --average
The default is to predict scores for the provided structures. To get scores with explicit water molecules and optimized ligands:
python run_DXGB.py --runfeatures --datadir ../Test_2al5 --pdbid 2al5 --water rbw --opt rbwo --average
--water turns on consideration of the water effect; rbw considers both receptor-bound water and bridging water molecules.
--opt turns on optimization; rbwo optimizes the ligand in the no-water, bridging-water, and receptor-bound-water environments.
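When many complexes need to be scored, the command lines shown above can be assembled in Python. The sketch below only uses the flags that appear in this tutorial and mirrors the three example invocations; the function name build_dxgb_command and its keyword arguments are hypothetical, and it makes no claim about which flag combinations run_DXGB.py accepts beyond those examples.

```python
def build_dxgb_command(datadir, pdbid=None, water=None, opt=None, average=True):
    """Assemble a run_DXGB.py argument list from the options in this tutorial."""
    cmd = ["python", "run_DXGB.py"]
    if pdbid is not None:
        # Single complex: compute features from the structure files.
        cmd += ["--runfeatures", "--datadir", datadir, "--pdbid", pdbid]
    else:
        # List of complexes with features already provided in Input.csv.
        cmd += ["--datadir", datadir]
    if water is not None:
        cmd += ["--water", water]
    if opt is not None:
        cmd += ["--opt", opt]
    if average:
        cmd.append("--average")
    return cmd


print(" ".join(build_dxgb_command("../Test_2al5", "2al5", water="rbw", opt="rbwo")))
```

To execute a command, pass the returned list to subprocess.run(cmd, check=True) from inside the DXGB directory with the DXGB environment active.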
The calculated features will be saved in the Input.csv file, and the predicted scores will be saved in the score.csv file. If you want to get deltaVinaRF scores as well, add --runrf.