Martini tutorials: DNA
Basic system setup: Double stranded DNA
Debugging: Common problems
Case: Input from sequence
Analysis: Helical parameters
System setup: protein-DNA complex
Backmapping: DNA and Backwards
NOTE: You can download all the files required for this tutorial from here. Please note that you have to adjust some of the paths (mostly location of the force field files) below if you use this package.
Setting up double-stranded DNA
This is a brief tutorial on how to set up and try out DNA Martini simulations. To complete this tutorial you should download the latest Martini DNA release beta from here. It includes the itp files for DNA as well as a modified martinize script to coarse grain a DNA molecule.
Martini DNA can be used to model both single-stranded as well as double-stranded DNA and provides two separate elastic networks. This tutorial will show only an example of creating a dsDNA topology using the stiff elastic network but a dsDNA topology with soft network or a ssDNA topology without elastic network can be created using the same procedure. For dsDNA structures the stiff elastic network is a good starting point for seeing how your system behaves in a CG simulation. There are also options for creating elastic networks for more complicated DNA structure, martinize-dna optionsall-stiffandall-softare meant to be starting points for creating topologies for such structures but they are beyond the scope of this tutorial. The DNA release package intrjconv -s 03-run.tpr -f 03-run.xtc -o ../DNA_analysis/dna.xtc -pbc clustercludes a README file that should have sufficient instructions for creating the other types of DNA topologies after you have finished this tutorial.
1. Unpack the tutorial files you downloaded (it expands todna-tutorial).
2. Go to the folder dna-tutorial/martini-dna and download the 1BNA PDB file if you didn't already. Martinize for DNA works best with cleaned up .gro files so you should first remove other atoms from the .pdb file and convert it to a .gro file.
$ grep -v HETATM 1BNA.pdb > 1BNA-cleaned.pdb
$ editconf -f 1BNA-cleaned.pdb -o 1BNA-cleaned.gro
3. Next we can use martinize-dna.py (supplied for you in the current directory) to create a Martini DNA topology for 1BNA.
$ python martinize-dna.py -dnatype ds-stiff -f 1BNA-cleaned.gro -o cg-dna.top -x cg-1bna.pdb
Next, you need to change the .top file to include the correct .itp files. This can be done with your favourite text editor but scriptable options are given below. Note: there are two sets of commands here, the first for Mac OSX, the second for Linux. In both cases, the backslashes (\) need to be typed as indicated.
$ sed -i .bak 's/#include "martini.itp"/#include "martini_v2.1-dna.itp"\
#include "martini_v2.0_ions.itp"/' cg-dna.top (OS X)
$ cp cg-dna.top cg-dna.top.bak (Linux)
$ sed -i 's/#include "martini.itp"/#include "martini_v2.1-dna.itp"\n#include "martini_v2.0_ions.itp"/' cg-dna.top (Linux)
4. You can visualize the mapping of Martini DNA by opening both the atomistic and coarse-grained structures in VMD. (The visualization is clearer if you show the CG structure as small VDW spheres.)
$ vmd -m 1BNA-cleaned.pdb cg-1bna.pdb
5. Next you need to create a simulation box for DNA and solvate it as well as add counterions. (Download box of water from here.)
$ editconf -f cg-1bna.pdb -d 1.2 -bt dodecahedron -o box.gro
$ genbox -cp box.gro -cs water.gro -o bw.gro -vdwd 0.22 -maxsol 1250
You can replace water with ions in .top file without using genion at the same time as we add waters there. Note: the next line assumes that exactly 1250 water beads were added to the DNA by the genbox tool; the first 1100 added beads are designated as normal water beads, the next 128 beads are designated as anti-freeze water beads, and the final 22 beads are designated as sodium ions.
$ printf "\nW 1100\nWF 128\nNA 22\n" >> cg-dna.top
6. Use the example.mdp provided in the directory and make the required modifications to run an energy minimization (integrator = steep, nsteps = 100).
$ cp example.mdp example-em.mdp
$ [gedit/vi] example-em.mdp
$ grompp -f example-em.mdp -c bw.gro -p cg-dna.top -o 01-em -maxwarn 1
$ mdrun -v -deffnm 01-em
7. Next, you can run a short equilibration using the original example.mdp file. Note, a smaller time step might be needed for initial equilibration. If this is the case, edit the example.mdp file and specify a smaller time step. Do a series of short equilibrations, increasing the time step up to the value in the example.mdp file and starting each one from the final structure of the previous one.
$ grompp -f example.mdp -c 01-em.gro -p cg-dna.top -o 02-eq
$ mdrun -v -deffnm 02-eq -rdd 2.0
8. Now you are ready to start your first DNA simulation. Copy the example.mdp file to produce a longer simulation, do the preprocessing and start the simulation. Note that with the elastic network the domain cell size needs to be kept fairly large (option -rdd 2.0).
$ cp example.mdp example-run.mdp
$ [gedit/vi] example-run.mdp
$ grompp -f example-run.mdp -c 02-eq.gro -p cg-dna.top -o 03-run
$ mdrun -v -deffnm 03-run -rdd 2.0
9. You can visualize the movements of the DNA, ions and waters using VMD while your simulation is still running (you will need to open a new terminal). Note that bonds in Martini are longer than bonds normally shown in VMD. Please consult the VMD tutorial to get better visualization.
$ vmd 02-eq.gro 03-run.xtc
Below are some some of the most common pitfalls when starting a Martini DNA simulation. We go through them here so that you know to avoid them when starting your own DNA simulations.
1. Not curating the input file formartinize-dna.py
There are several things that can go wrong beforemartinize-dna.pyis even run. Best course of action is to manually curate the input file, remove all heteroatoms and convert the file to a gro file.martinize-dna.pyrequires the residuenames to be 'DA','DC', etc. in order to detect them correctly.
You can try this yourself. Go to the foldercommon-problems/1inside the package you downloaded. Try runningmartinize-dna.pywith the given input file. Then try to fix the issue by modifying the residuenames using a text editor or a script (column editing in vi comes really handy here) and see thatmartinize-dna.pyruns succesfully after this.
2. Placing the "#define RUBBER_BANDS" below the DNA itp file in the system topology file.
The topology file is read from top to bottom and if theRUBBER_BANDSare not defined before the DNA itp file no elastic network will be applied to the DNA. This is the first place to look if the DNA structure is deforming right in the beginning of your simulation.
Go to the foldercommon-problems/2and run a short simulation with the provided files. Then open thecg.topfile and fix it and run the same simulation again and see the difference.
3. Getting wrong number of chains frommartinize-dna.py
This often goes unseen unless you check the output ofmartinize-dna.pyon your screen. Especially the 'Found X chains:...' part of the output. If a chain is split in two you can try to fix that by adjusting the cut-off inside the martinize-dna.py. Open the script and search forCUT-OFF_CHANGEand adjust the value from 3.0 to a suitably large value.
Go to the foldercommon-problems/3and open the gro file using VMD to see that the structure consist of 2 DNA strands. Next, try runningmartinize-dna.py(provided here in the same folder so you won't modify the version other parts of the tutorial use) on the given input file and verify from output that you get 3 chains instead of 2. Open martinize-dna.py with a text editor and search forCUT-OFF_CHANGEand adjust the cut-off value from3.0to4.0. Runmartinize-dna.pyagain and verify that it detects the DNA as 2 strands.
Input from sequence
In case you want to create an all-atom DNA structure from canonical values, there are several softwares/webservers that can help with this. One of those is NAFlex which requires only the user to create an account to save his/her projects. NAFlex allows several ways to build your DNA structure.
1. After giving a name for your project, we will select the option DNA/RNA Simulation from Sequence (although we will not run any simulation here).Click the "Next" button.
2. As an example you can build a relatively short oligonucleotide according to the right-handed B-DNA Arnott parameters with the following sequence (enter the string and click "Create Project":
3.- Click on the "Next" buttons until you see a window with a pdb file (compare the picture below). Clicking on the link to the pdb file will reveal a set of buttons. Download the file. No it is time to check the output structure after you have downloaded and unzippid the output file.
4. You can now use the downloaded .pdb file and build a Martini model from it as shown for 1BNA in the previous section. You can copy the necessary files from that directory.
NB: NAFlex has multiple applications but in this tutorial we will focus on building a structure from ab initio.
DNA analysis: physical parameters
As for all-atomistic models, we can calculate the base pairs and base pair steps helical parameters for coarse-grained DNA models. cgHeliParm.py has been developed for the analysis of the DNA/RNA double stranded structures in molecular dynamics (MD) simulations. Like other atomistic softwares (e.g. 3DNA, Curves+), it defines a geometric reference frame for each base/base pair to calculate several structural descriptors of DNA/RNA from the GROMACS CG Martini MD trajectory. As output the helical descriptors as a function of time are saved in individual files.
For now, the current version of cgHeliParm.py allows to calculate the local structural parameters of double stranded DNA molecules (i.e. base pair and base pair step helical parameters). We are currently working on extending the calculation to other structural parameters such as groove dimensions. Please note that in this version only double-stranded, standard nucleic acids can be analysed. This excludes mispaired, modified, single-stranded DNA, triplexes and quadruplexes from its application.
Before using cgHeliParm.py you need first to install python in your computer. If you don’t have it, go to the Python wiki and follow the instructions to install it in your working computer.
The second thing you will have to do is to install the python package MDAnalysis. To install/upgrade it from the command line using easy_install or pip:
<p style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt; text-align: justify;" dir="ltr"> </p>
easy_install -U MDAnalysis
pip install --upgrade MDAnalysis
Once python and MDAnalysis package are installed and mdreader is in our working directory, we can use cgHeliParm.py. First, we will have to create a ‘dry’ trajectory with only DNA. This can be done using the trjconv Gromacs tool and the corresponding NDX file. For this case, we will use the same trajectory from the previous DNA tutorial (1BNA). Go to the martini-dna tutorial and convert the trajectory twice (selecting the group for DNA when prompted, which should be number 1).
trjconv -s 03-run.tpr -f 03-run.xtc -o ../DNA_analysis/dna.xtc -pbc cluster
trjconv -s 03-run.tpr -f 03-run.xtc -o ../DNA_analysis/dna.gro -dump 0
Now, change to the directory DNA_analysis:
in this directory, we can run the cgHeliParm.py script by running in the command line:
python cgHeliParm.py -f dna.xtc -s dna.gro -o dna
This may take some time depending on the length of your trajectory and number of processors available. After the script has finished you will see that there are several files in your folder, corresponding to base pair and base pair steps helical parameters (file names are informative). Each file contains information for each bp or step along the simulation time. The first column refers to the first bp or step in the 5’3’ strand sense.
We provide also a tool to calculate the persistence-length of a double-stranded DNA. This tool requires both a fairly long dsDNA strands as well as a long simulation so we provide an example trajectory to try this out. The tool uses two different ways of averaging the results but the numbers should be fairly close to each other.
To calculate the persistence length go to the folderpersistence-lengthand run the following command:
$ python persistence-length-calculator.py -s run.gro -f run.xtc -n run.ndx -oplot test -b 100000
Advanced tutorial: protein-DNA complex
DNA is recognized by proteins in many different ways. As a first approximation to protein-DNA complexes, we will use the SRY protein (1j46, which is provided in the directory, but can also be downloaded here), which is involved in sex determination, to generate our first protein-DNA complex simulation using the CG Martini model. Go to the directory dna-tutorial/protein_DNA.
First things first. To create our CG model of a protein-DNA complex from a PDB file we will need to split both protein and DNA molecules to generate their CG models. This means that you will need to create as many PDB files as molecules your complex has. To put everything on the same page, we will call them protein_AA.pdb and dna_AA.pdb. (This has been done for you already, you may want to also try it yourself!) We will then proceed to create their CG models with the corresponding martinize script.
1. We will first build the CG DNA using the martinize-dna script but before that we will convert the .pdb file into a .gro file:
editconf -f dna_AA.pdb -o dna_aa.gro
python martinize-dna.py -dnatype ds-stiff -f dna_aa.gro -o dna.top -x dna_CG.pdb
And for the CG protein model we will use the original martinize script:
editconf -f protein_AA.pdb -o protein_aa.gro
python martinize.py -f protein_aa.gro -o protein.top -x protein_CG.pdb -dssp /opt/local/bin/dssp -p backbone -ff elnedyn22
2. We will now merge the PDB files corresponding to the CG models (protein_CG.pdb and dna_CG.pdb) using the cat command for example.
cat dna_CG.pdb protein_CG.pdb > complex_CG.pdb
3. You can visualize the mapping of Martini DNA by opening both the atomistic and coarse-grained structures in VMD. The CG beads are best visualized as VDW spheres. Note that by concatenating the protein and DNA files, the combined CG coordinate file is interpreted by VMD as two frames. By stepping through them in the VMD main window, you can see the protein and the DNA sites.
vmd -m 1j46.pdb complex_CG.pdb
4. As in the double stranded DNA tutorial for 1BNA, we will create a .top file that includes the .itp files. We can copy the dna.top file and modify to include the correct .itp files. With the experience from previous tutorial, would you be able to write a .top file for the system yourself? Check by comparing your result to the provided file cg-complex.top.
5. Next we will solvate the solute creating a box around it as well as adding counterions. We will use the same water.gro file that we used previously to create the 1BNA CG model. (Download water box from here.) Note that by concatenating the protein and DNA files, the combined coordinate file is seen as two separate frames by GROMACS tools. Therefore, we need a preprocessing step.
grep ATOM complex_CG.pdb > complex_CG-cleaned.pdb
editconf -f complex_CG-cleaned.pdb -d 1.2 -bt dodecahedron -o box.gro
genbox -cp box.gro -cs water.gro -o bw.gro -vdwd 0.22 -maxsol 1250
You can replace water with ions in .top file without using genion at the same time as we add waters there.
printf "\nW 1100\nWF 124\nNA 26\n" >> cg-complex.top
6. Use the same example.mdp file provided with in the directory dna-tutorial/martini-dna and make the required modifications to run an energy minimization (integrator = steep, nsteps = 100).
grompp -f example-em.mdp -c bw.gro -p cg-complex.top -o 01-em -maxwarn 1
mdrun -v -deffnm 01-em
7. Next, you can run a short equilibration using the original example.mdp file. (Note, a smaller time step might be needed for initial equilibration).
grompp -f example.mdp -c 01-em.gro -p cg-complex.top -o 02-eq
mdrun -v -deffnm 02-eq -rdd 2.0
8. Now you are ready to start your first protein-DNA simulation. Modify the .mdp file to produce a longer simulation and start.
grompp -f example-run.mdp -c 02-eq.gro -p cg-complex.top -o 03-run
mdrun -v -deffnm 03-run -rdd 2.0
9. You can visualize the trajectory using VMD while your simulation is still running.
vmd 02-eq.gro 03-run.xtc
DNA and Backward
DNA structures can be backmapped into atomistic resolution using Backward. Before starting this part of the tutorial you should first go and do the Backward tutorial. The DNA backmapping files are as of now not yet part of the downloadable
backward package but files for CHARMM are in the tutorial package in the
backmapping folder together with the
backward scripts. Next, we will first coarse-grain a dsDNA structure and then backmap it back to atomistic resolution.
1. Got to the dna-tutorial/backmapping directory and cleanup and convert the given DNA structure 1BNA.pdb to a gro file in a reasonable simulation box.
$ editconf -f 1BNA.pdb -d 1.4 -o 1BNA.gro
2. Create a CG structure using martinize-dna.py.
$ python ../martini-dna/martinize-dna.py -f 1BNA.gro -dnatype ds-stiff -p cg.top -x cg-1BNA.pdb
3. Solvate the CG structure.
$ genbox -cp cg-1BNA.pdb -cs waterbox.gro -vdwd 0.22 -o cg-1bna-water.gro
4. Create an atomistic topology file for the system. Choose charmm27 and tip3p when prompted.
$ pdb2gmx -f 1BNA.pdb
5. Edit the file topol.top to point explicitly to the tip3p.itp file provided for you (it will not recognize it in the standard path unfortunately). Thus, replace the line #include "charmm27/tip3p.itp" near the end of the file by #include "./tip3p.itp". Run the initram.sh script to perform the backmapping.
$ ./initram.sh -f cg-1bna-water.gro -o aa-charmm.gro -to charmm36 -p topol.top
6. Compare the initial structure to the backmapped structure.
$ vmd -m 1BNA.gro aa-charmm.gro