1 UNRES - A PROGRAM FOR COARSE-GRAINED SIMULATIONS OF PROTEINS
2 ------------------------------------------------------------
11 3. General information
13 3.2. Functions of the program
14 3.2. Companion programs
15 3.4. Programming language
20 5. Customizing your batch and C-shell script
22 6. Command line and files
27 8.1. Main input data file
29 8.1.2. Control data (data list format; READ_CONTROL subroutine)
30 8.1.2.1 Keywords to chose calculation type
31 8.1.2.2 Specification of protein and structure output in non-MD
33 8.1.2.3. Miscellaneous
34 8.1.3. Minimizer options (data list, subroutine READ_MINIM)
35 8.1.4. CSA control parameters
36 8.1.5. MCM data (data list, subroutine MCMREAD)
37 8.1.6. MD data (subroutine READ_MDPAR)
38 8.1.7. REMD/MREMD data (subroutine READ_REMDPAR)
39 8.1.8. Energy-term weights (data list; subroutine MOLREAD)
40 8.1.9. Input and/or reference PDB file name (text format; subroutine MOLREAD)
41 8.1.10. Amino-acid sequence (free and text format)
42 8.1.11. Disulfide-bridge information (free format; subroutine READ_BRIDGE)
43 8.1.12. Dihedral-angle restraint data (free format; subroutine MOLREAD)
44 8.1.13. Distance restraints (subroutine READ_DIST_CONSTR)
45 8.1.14. Internal coordinates of the reference structure (free format;
46 subroutine READ_ANGLES)
47 8.1.15. Internal coordinates of the initial conformation (free format;
48 subroutine READ_ANGLES)
49 8.1.15.1. File name with internal coordinates of the conformations
51 8.1.16 Control data for energy map construction (data lists;
54 8.3. Input coordinate files
55 8.4. Other input files
59 9.1.1. The internal coordinate (INT) files
60 9.1.2. The plain Cartesian coordinate (X) files
61 9.1.3. The compressed Cartesian coordinate (CX) files
62 9.1.4. The Brookhaven Protein Data Bank format (PDB) files
63 9.1.5. The SYBYLL (MOL2) files
64 9.2. The summary (STAT) file
66 8.2.2. MD and MREMD runs
67 9.3. CSA-specific output files
69 10. Technical support contact information
74 * This software is provided free of charge to academic users, subject to the
75 condition that no part of it be sold or used otherwise for commercial
76 purposes, including, but not limited to its incorporation into commercial
77 software packages, without written consent from the authors. For permission
78 contact Prof. H. A. Scheraga, Cornell University.
80 * This software package is provided on an "as is" basis. We in no way warrant
81 either this software or results it may produce.
83 * Reports or publications using this software package must contain an
84 acknowledgment to the authors and the NIH Resource in the form commonly used
90 The current and former developers of UNRES are listed in this section in alphabetic
91 order together with their current or former affiliations.
93 Maurizio Chinchio (formerly Cornell Univ., USA)
94 Cezary Czaplewski (Univ. of Gdansk, Poland)
95 Carlo Guardiani (Georgia State Univ., USA)
96 Yi He (Cornell Univ., USA)
97 Justyna Iwaszkiewicz (Swiss Institute of Bioinformatics, Switzerland)
98 Dawid Jagiela (Univ. of Gdansk, Poland)
99 Stanislaw Jaworski (deceased)
100 Sebastian Kalinowski (Univ. of Gdansk, Poland)
101 Urszula Kozlowska (deceased)
102 Rajmund Kazmierkiewicz (Univ. of Gdansk, Poland)
103 Jooyoung Lee (Korea Institute for Advanced Studies, Korea)
104 Adam Liwo (Univ. of Gdansk, Poland)
105 Mariusz Makowski (Univ. of Gdansk, Poland)
106 Marian Nanias (formerly Cornell Univ., USA)
107 Stanislaw Oldziej (Univ. of Gdansk, Poland)
108 Jaroslaw Pillardy (Cornell Univ., USA)
109 Daniel Ripoll (formerly Cornell Univ., USA)
110 Jeff Saunders (Schrodinger Inc., USA)
111 Harold A. Scheraga (Cornell Univ., USA)
112 Hujun Shen (Dalian Institute of Chemical Physics, P.R. China)
113 Adam Sieradzan (Univ. of Gdansk, Poland)
114 Ryszard Wawak (formerly Cornell Univ., USA)
115 Bartlomiej Zaborowski (Univ. of Gdansk, Poland)
117 3. GENERAL INFORMATION
118 ----------------------
123 Run coarse-grained calculations of polypeptide chains with the UNRES force field.
124 There are two versions of the package which should be kept separate because of
125 non-overlapping functions: version which runs global optimization (Conformational
126 Space Annealing, CSA) and version that runs coarse-grained molecular dynamics and
127 its extension. Because the installation, input file preparation and running CSA
128 and MD versions are similar, a common manual is provided. Items specific
129 for the CSA and MD version are marked "CSA" and "MD", respectively.
131 MD version can be used to run multiple-chain proteins (however, that version of
132 the code is a new release and might fail if yet un-checked functions are used).
133 The multi-chain CSA version for this purpose is another package (written largely in
136 3.2. Functions of the program
137 -----------------------------
139 1. Perform energy evaluation of a single or multiple conformations
140 (serial and parallel) (CSA and MD)
142 2. Run canonical mesoscopic molecular dynamics (serial and parallel) (MD).
144 3. Run replica exchange (REMD) and multiplexing replica exchange (MREMD)
145 dynamics (parallel only) (MD).
147 4. Run multicanonical molecular dynamics (parallel only) (MD).
149 5. Run energy minimization (serial and parallel) (CSA and MD).
151 6. Run conformational space annealing (CSA search) (parallel only) (CSA).
153 7. Run Monte Carlo plus Minimization (MCM) (parallel only) (CSA).
155 8. Run conformational family Monte Carlo (CFMC) calculations (CSA).
157 9. Thread the sequence against a database from the PDB and minimize energy of
158 each structure (CSA).
160 Energy and force evaluation is parallelized in MD version.
162 3.3. Companion programs
163 -----------------------
165 The structures produced by UNRES can be used as inputs to the following programs provided
166 with this package or separately:
168 xdrf2pdb - converts the compressed coordinate files from MD (but not MREMD)runs into
171 xdrf2pdb-m - same for MREMD runs (multiple trajectory capacity).
173 xdrf2x - converts the plain Cartesian coordinate files into PDB format.
175 WHAM - processes the coordinate files from MREMD runs and computes temperature profiles
176 of ensemble averages and computes the probabilities of conformations at selected
177 temperatures; also prepares data for CLUSTER and ZSCORE.
179 CLUSTER - does the cluster analysis of the conformations; for MREMD runs takes the
180 coordinate files from WHAM which contain information to compute probabilities
181 of conformations at any temperature.
183 PHOENIX - conversion of UNRES conformations to all-atom conformations.
185 ZSCORE - force field optimization (for developers).
187 Please consult the manuals of the corresponding packages for details. Note that not
188 all of these packages are released yet; they will be released depending on their
189 readiness for distribution. Contact Adam Liwo, Cezary Czaplewski or Stanislaw Oldziej
190 for developmental versions of these programs.
192 3.4. Programming language
193 -------------------------
195 This version of UNRES is written almost exclusively in Fortran 77; some subroutines
196 for data management are in ansi-C. The package was parallelized with MPI.
201 Citing the following references in your work that makes use of UNRES is gratefully
204 [1] A. Liwo, S. Oldziej, M.R. Pincus, R.J. Wawak, S. Rackovsky, H.A. Scheraga.
205 A united-residue force field for off-lattice protein-structure simulations.
206 I: Functional forms and parameters of long-range side-chain interaction potentials
207 from protein crystal data. J. Comput. Chem., 1997, 18, 849-873.
209 [2] A. Liwo, M.R. Pincus, R.J. Wawak, S. Rackovsky, S. Oldziej, H.A. Scheraga.
210 A united-residue force field for off-lattice protein-structure simulations.
211 II: Parameterization of local interactions and determination
212 of the weights of energy terms by Z-score optimization.
213 J. Comput. Chem., 1997, 18, 874-887.
215 [3] A. Liwo, R. Kazmierkiewicz, C. Czaplewski, M. Groth, S. Oldziej, R.J. Wawak,
216 S. Rackovsky, M.R. Pincus, H.A. Scheraga.
217 United-residue force field for off-lattice protein-structure simulations.
218 III. Origin of backbone hydrogen-bonding cooperativity in united-residue potentials.
219 J. Comput. Chem., 1998, 19, 259-276.
221 [4] A. Liwo, C. Czaplewski, J. Pillardy, H.A. Scheraga.
222 Cumulant-based expressions for the multibody terms for the correlation between
223 local and electrostatic interactions in the united-residue force field.
224 J. Chem. Phys., 2001, 115, 2323-2347.
226 [5] J. Lee, D.R. Ripoll, C. Czaplewski, J. Pillardy, W.J. Wedemeyer, H.A. Scheraga,
227 Optimization of parameters in macromolecular potential energy functions by
228 conformational space annealing. J. Phys. Chem. B, 2001, 105, 7291-7298
230 [6] J. Pillardy, C. Czaplewski, A. Liwo, W.J. Wedemeyer, J. Lee, D.R. Ripoll,
231 P. Arlukowicz, S. Oldziej, Y.A. Arnautova, H.A. Scheraga,
232 Development of physics-based energy functions that predict medium-resolution
233 structures for proteins of the alpha, beta, and alpha/beta structural classes.
234 J. Phys. Chem. B, 2001, 105, 7299-7311
236 [7] A. Liwo, P. Arlukowicz, C. Czaplewski, S. Oldziej, J. Pillardy, H.A. Scheraga.
237 A method for optimizing potential-energy functions by a hierarchical design
238 of the potential-energy landscape: Application to the UNRES force field.
239 Proc. Natl. Acad. Sci. U.S.A., 2002, 99, 1937-1942.
241 [8] J. A. Saunders and H.A. Scheraga.
242 Ab initio structure prediction of two $\alpha$-helical oligomers
243 with a multiple-chain united-residue force field and global search.
244 Biopolymers, 2003, 68, 300-317.
246 [9] J.A. Saunders and H.A. Scheraga.
247 Challenges in structure prediction of oligomeric proteins at the united-residue
248 level: searching the multiple-chain energy landscape with CSA and CFMC procedures.
249 Biopolymers, 2003, 68, 318-332.
251 [10] S. Oldziej, U. Kozlowska, A. Liwo, H.A. Scheraga.
252 Determination of the potentials of mean force for rotation about Calpha-Calpha
253 virtual bonds in polypeptides from the ab initio energy surfaces of terminally
254 blocked glycine, alanine, and proline. J. Phys. Chem. A, 2003, 107, 8035-8046.
256 [11] A. Liwo, S. Oldziej, C. Czaplewski, U. Kozlowska, H.A. Scheraga.
257 Parameterization of backbone-electrostatic and multibody contributions
258 to the UNRES force field for protein-structure prediction from ab initio
259 energy surfaces of model systems. J. Phys. A, 2004, 108, 9421-9438.
261 [12] S. Oldziej, A. Liwo, C. Czaplewski, J. Pillardy, H.A. Scheraga.
262 Optimization of the UNRES force field by hierarchical design of the
263 potential-energy landscape. 2. Off-lattice tests of the method with single
264 proteins. J. Phys. Chem. B., 2004, 108, 16934-16949.
266 [13] S. Oldziej, J. Lagiewka, A. Liwo, C. Czaplewski, M. Chinchio,
267 M. Nanias, H.A. Scheraga.
268 Optimization of the UNRES force field by hierarchical design of the
269 potential-energy landscape. 3. Use of many proteins in optimization.
270 J. Phys. Chem. B., 2004, 108, 16950-16959.
272 [14] M. Khalili, A. Liwo, F. Rakowski, P. Grochowski, H.A. Scheraga.
273 Molecular dynamics with the united-residue model of polypeptide chains.
274 I. Lagrange equations of motion and tests of numerical stability in the
275 microcanonical mode, J. Phys. Chem. B, 2005, 109, 13785-13797.
277 [15] M. Khalili, A. Liwo, A. Jagielska, H.A. Scheraga.
278 Molecular dynamics with the united-residue model of polypeptide chains.
279 II. Langevin and Berendsen-bath dynamics and tests on model $\alpha$-helical
280 systems. J. Phys. Chem. B, 2005, 109, 13798-13810.
282 [16] A. Liwo, M. Khalili, H.A. Scheraga.
283 Ab initio simulations of protein-folding pathways by molecular dynamics with
284 the united-residue model of polypeptide chains.
285 Proc. Natl. Acad. Sci. U.S.A., 2005, 102, 2362-2367.
287 [17] F. Rakowski, P. Grochowski, B. Lesyng, A. Liwo, H. A. Scheraga.
288 Implementation of a symplectic multiple-time-step molecular dynamics algorithm,
289 based on the united-residue mesoscopic potential energy function.
290 J. Chem. Phys., 2006, 125, 204107.
292 [18] M. Nanias, C. Czaplewski, H.A. Scheraga.
293 Replica exchange and multicanonical algorithms with the coarse-grained
294 united-residue (UNRES) force field.
295 J. Chem. Theory and Comput., 2006, 2, 513-528.
297 [19] A. Liwo, M. Khalili, C. Czaplewski, S. Kalinowski, S. Oldziej, K. Wachucik,
299 Modification and optimization of the united-residue (UNRES) potential energy
300 function for canonical simulations. I. Temperature dependence of the effective
301 energy function and tests of the optimization method with single training
303 J. Phys. Chem. B, 2007, 111, 260-285.
305 [20] U. Kozlowska, A. Liwo, H.A. Scheraga.
306 Determination of virtual-bond-angle potentials of mean force for coarse-grained
307 simulations of protein structure and folding from ab initio energy surfaces of
308 terminally-blocked glycine, alanine, and proline.
309 J. Phys.: Condens. Matter, 2007, 19, 285203.
311 [21] M. Chinchio, C. Czaplewski, A. Liwo, S. Oldziej, H.A. Scheraga.
312 Dynamic formation and breaking of disulfide bonds in molecular dynamics
313 simulations with the UNRES force field.
314 J. Chem. Theory and Comput., 2007, 3, 1236-1248.
316 [22] A.V. Rojas, A. Liwo, H.A. Scheraga.
317 Molecular dynamics with the united-residue force field: Ab Initio folding
318 simulations of multichain proteins.
319 J. Phys. Chem. B, 2007, 111, 293-309.
321 [23] A. Liwo, C. Czaplewski, S. Oldziej, A.V. Rojas, R. Kazmierkiewicz,
322 M. Makowski, R.K. Murarka, H.A. Scheraga.
323 Simulation of protein structure and dynamics with the coarse-grained UNRES
324 force field. In: Coarse-Graining of Condensed Phase and Biomolecular
325 Systems., ed. G. Voth, Taylor & Francis, 2008, Chapter 8, pp. 107-122.
327 [24] C. Czaplewski, S. Kalinowski, A. Liwo, H.A. Scheraga.
328 Application of multiplexed replica exchange molecular dynamics
329 to the UNRES force field: tests with $\alpha$ and $\alpha+\beta$ proteins.
330 J. Chem. Theor. Comput., 2009, 5, 627-640.
332 [24] Y. He, Y. Xiao, A. Liwo, H.A. Scheraga.
333 Exploring the parameter space of the coarse-grained UNRES force field by random
334 search: selecting a transferable medium-resolution force field.
335 J. Comput. Chem., 2009, 30, 2127-2135.
337 [25] U. Kozlowska, A. Liwo. H.A. Scheraga.
338 Determination of side-chain-rotamer and side-chain and backbone
339 virtual-bond-stretching potentials of mean force from AM1 energy surfaces of
340 terminally-blocked amino-acid residues, for coarse-grained simulations of
341 protein structure and folding. 1. The Method.
342 J. Comput. Chem., 2010, 31, 1143-1153.
344 [26] U. Kozlowska, G.G. Maisuradze, A. Liwo, H.A. Scheraga.
345 Determination of side-chain-rotamer and side-chain and backbone
346 virtual-bond-stretching potentials of mean force from AM1 energy surfaces of
347 terminally-blocked amino-acid residues, for coarse-grained simulations of
348 protein structure and folding. 2. Results, comparison with statistical
349 potentials, and implementation in the UNRES force field.
350 J. Comput. Chem., 2010, 31, 1154-1167.
352 [27] A. Liwo, S. Oldziej, C. Czaplewski, D.S. Kleinerman, P. Blood, H.A. Scheraga.
353 Implementation of molecular dynamics and its extensions with the coarse-grained
354 UNRES force field on massively parallel systems; towards millisecond-scale
355 simulations of protein structure, dynamics, and thermodynamics.
356 J. Chem. Theor. Comput., 2010, 6, 890-909.
361 The distribution is contained in the UNRES.tar.gz file. To uncompress say:
363 gzip -cd UNRES.tar.gz | tar xf -
365 This will produce a directory named UNRES with the following subdirectories:
367 src_CSA - the CSA-version source directory.
369 src_MD - the MD-version source directory, single chains.
371 src_MD-M - the MD-version source directory, oligomeric proteins
373 bin - the binaries/scripts directory; its BATCH_SCRIPTS directory contains the
374 batch scripts (at present the only example is for PBS: unres_3P_PBS.csh,
375 which is an UNRES calling script and start.mat, which is the batch script
376 submitted to the PBS system).
378 doc - documentation (this file and EXAMPLES.TXT)
380 examples - sample input files (see EXAMPLES.TXT for description).
382 To produce the executable do the following:
384 a) To build parallel version, make sure that MPI is installed in your system.
385 Note that the package will have limited functions when compiled in a single-CPU mode.
386 On linux cluster the command source $HOME/.env should be added to .tcshrc
387 or equivalent file to use parallel version of the program, the
388 alternative is to use queuing system like PBS.
389 In some cases the FORTRAN library subroutine GETENV does not work properly
390 with MPI, if the script is run interactively. In such a case try to
391 add the source mygentenv.F and turn on the -DMYGETENV preprocessor flag.
393 b) Change directory to the respective source directory.
395 c) Edit the appropriate Makefile (parallel program that includes CSA
396 procedure, the serial version is no longer supported, for serial task
397 parallel program can be run using only one processor) to customize to your
398 system. Makefiles for the following systems are provided:
400 Makefile_osf_f90 - OSF1/Tru64 UNIX HP Alphaserver with f90 compiler,
401 Makefile_lnx_pgf90 - Linux, the pgf90 compiler,
402 Makefile_lnx_ifc - Linux, ifc compiler.
403 Makefile_win_pgf90 - Windows, the pgf90 compiler.
405 Other systems should not cause problems; all you have to do is to change
406 the compiler, compiler options, and preprocessor options. Also, change the
407 BIN variable, if you want to put your binaries in other place than
408 PROTARCH/BIN. In the case of Makefile make sure that the MPI directories are
411 The following architectures are defined in the .F source files:
413 AIX - AIX systems (put -DAIX as one of the preprocessor options, if
416 LINUX - Linux (put -DLINUX)
418 G77 - Gnu-Fortran compilers (might require sum moderate source code editing)
419 (put -DG77). The recommended compiler is gfortran and not g77.
423 WINPGI - additional setting for PGI compilers for MS Windows
425 SGI - all SGI platforms; should also be good for SUN platforms (put -DSGI)
427 WIN - MS Windows with Digital Fortran compiler (put -DWIN)
429 For other platforms, the only problems might appear in connection with
430 machine-specific I/O instructions. Many files are opened in the append
431 mode, whose specification in the OPEN statement is quite machine-dependent.
432 In this case you might need to modify the source code accordingly.
433 The other platform dependent routines are the timing routines contained
434 in timing.F. In addition to the platforms specified above, ES9000, SUN,
435 KSR, and CRAY are defined there.
437 For parallel build -DMP and -DMPI must be set (these are set in Makefile).
439 IMPORTANT! Apart from this, two define flags: -DCRYST_TOR and -DMOMENT
440 define earlier versions of the force field. The MUST NOT be entered, if
441 the CASP5 and later versions of the force field are used.
443 d) Build the unres executables by typing at your UNIX prompt:
445 make # will build unres
447 make clean # will remove the object files
449 The bin directory contains pre-built binaries for Red Hat Linux. These
450 executables are specified in the csh scripts listed in section 4.
452 e) Customize the C-shell scripts unres.unres (to run the parallel version on
453 set of workstation). See the next section of this manual for guidance.
455 After the executables are build and C-shell scripts customized, you can run the
456 test examples contained in UNRES/examples.
458 5. CUSTOMIZING YOUR C-SHELL SCRIPT
459 ----------------------------------
461 IMPORTANT NOTE - The unres.csh script is for Linux and should also be easily
462 adaptable to other systems running MPICH. This script is for interactive
463 parallel jobs. Examples of scripts compatible with PBS (pbs.sub) and LoadLever
464 (sp2.sub) queuing systems are also provided.
466 Edit the following lines in your unres.csh script:
468 set DD = your_database_directory
470 e.g., if you installed the package on the directory /usr/local, this line
473 set DD = /usr/local/UNRES/PARAM
475 set BIN = your_binaries_directory
477 set FGPROCS = number_of_processors_per_energy/force_evaluation (MD)
479 e.g., if the root directory is as above:
481 set BIN = /usr/local/UNRES/bin
483 6. COMMAND LINE AND FILES
484 -------------------------
486 To run UNRES interactively enter the following command at your Unix prompt
487 or put it in the batch script:
489 unres.csh POTENTIAL INPUT N_PROCS
493 POTENTIAL specifies the side-chain interaction potential type and must be
494 one of the following:
496 LJ - 6-12 radial Lennard-Jones
497 LJK - 6-12 radial Lennard-Jones-Kihara (shifted Lennard Jones)
498 BP - 6-12 anisotropic Berne-Pechukas based on Gaussian overlap (dilated
500 GB - 6-12 anisotropic Gay-Berne (shifted Lennard-Jones)
501 GBV - 6-12 anisotropic Gay-Berne-Vorobjev (shifted Lennard-Jones)
503 See section 4. (Force Fields) for explanation and usage.
505 At present, only the LJ and GB potentials are applied. The LJ potential
506 is used in the "CASP3" version of the UNRES force field that is able
507 to predict only alpha-helical structures. All further version of the
508 UNRES force field use the GB potential. For the description of all above-mentioned
509 potentials see A. Liwo, St. Oldziej, M.R. Pincus, R.J. Wawak, S. Rackovsky,
510 H.A. Scheraga, J. Comput. Chem., 1997, 18, 849-873.
512 INPUT is the prefix for input and output files (see below)
514 N_PROCS is the number of processors; for a CSA or REMD/MREMD run it MUST be at least 2.
516 Note! The script takes one more variable, FGPROCS, as the fourth argument,
517 which is the number of fine-grain processors to parallelize energy
518 evaluations. The corresponding code is in UNRES/CSA, but it was written
519 using MPL instead of MPI and therefore is never used in the present version.
520 At present we have no plans to rewrite fine-grain parallelization using MPI,
521 because we found that the scalability for up to 200 residue polypeptide
522 chains was very poor, due to a small number of interactions and,
523 correspondingly, unfavorable ratio of the overhead to the computation time.
525 INPUT.inp contains the main input data and the control parameters of the CSA
528 INPUT.out_POTENTIAL_xxx - main output files from different processors; xxx
529 denotes the number of the processor
531 INPUT_POTENTIALxxx.stat - summary files with the energies, energy components,
532 and RMS deviations of the conformations produced by each of the processors;
533 not used in CSA runs; also it outputs different quantity in MD/MREMD runs.
535 CSA version specific files:
537 INPUT_POTENTIALxxx.int - internal coordinates; in the CSA run
538 INPUT_POTENTIAL_000.int contains the coordinates of the conformations,
539 and the other files are empty
541 INPUT.CSA.history - history file from a CSA run. This is an I/O file, because
542 it can be used to restart an interrupted CSA run.
544 INPUT.CSA.seed - stores the random seed generated in a CSA run; written for
547 INPUT.CSA.bank - current bank of conformations obtained in CSA calculations
548 (expressed as internal coordinates). This information is also stored in
549 INPUT_POTENTIAL000.int
551 INPUT.CSA.rbank - as above, but contains random-generated conformations.
553 MD version specific files:
555 INPUT_MDyyy.pdb - Cartesian coordinates of the conformations in PDB format.
557 INPUT_MDyyy.x - Cartesian coordinates of the conformations in ASCII format.
559 INPUT_MDyyy.cx - Cartesian coordinates of the conformations in compressed format
560 (need xdr2pdb to convert to PDB format).
562 The program currently produces some more files, but they are not used
563 for any purposes and most of them are scratched after a run is completed.
565 The run script also contains definitions of the parameter files through the
566 following environmental variables:
568 SIDEPAR - parameters of the SC-SC interaction potentials (U_{SC SC});
569 SCPPAR - parameters of the SC-p interaction potential (U_{SCp}); this file can
570 be ignored by specifying the -DOLDSCP preprocessor flag, which means that the
571 built-in parameters are used; at present they are the same as the parameters
572 in the file specified by SCPPAR;
573 ELEPAR - parameters of the p-p interaction potentials (U_{pp});
574 FOURIER - parameters of the multibody potentials of the coupling between the
575 backbone-local and backbone-electrostatic interactions (U_{corr});
576 THETPAR - parameters of the virtual-bond-angle bending potentials (U_b);
577 ROTPAR - parameters of the side-chain rotamer potentials (U_{rot});
578 TORPAR - parameters of the torsional potentials (U_{rot});
579 TORDPAR - parameters of the double-torsional potentials.
580 SCCORPAR - parameters of the supplementary torsional sequence-specific potentials
581 (not implemented yet).
586 UNRES is being developed since 1997 and several versions of the force field
587 were produced. The settings and references to these force fields are
590 Force fields for CSA version (can be used in MD but haven't been parameterized for this
593 ---------------------------------------------------------------------------------------
594 Additional SC-SC Example script Structural
595 Force field compiler flags potential and executables classes covered References
598 ---------------------------------------------------------------------------------------
600 CASP3 -DCRYST_TOR LJ unres_CASP3.csh only alpha [1-3]
601 -DCRYST_BOND unres_pgf90_cryst_tor.exe
602 -DCRYST_THETA unres_ifc6_cryst_tor.exe
606 ALPHA -DMOMENT GB unres_CASP4.csh only alpha [4-6]
607 -DCRYST_BOND unres_pgf90_moment.exe
608 -DCRYST_THETA unres_ifc6_moment.exe
611 BETA -DMOMENT GB unres_CASP4.csh only beta [4-6]
612 -DCRYST_BOND unres_pgf90_moment.exe
613 -DCRYST_THETA unres_ifc6_moment.exe
616 ALPHABETA -DMOMENT GB unres_CASP4.csh all [4-6]
617 -DCRYST_BOND unres_pgf90_moment.exe
618 -DCRYST_THETA unres_ifc6_moment.exe
621 CASP5 -DCRYST_BOND GB unres_CASP5.csh all [7,8,11]
622 -DCRYST_THETA unres_pgf90.exe
623 -DCRYST_SC unres_ifc6.exe
625 3P -DCRYST_BOND GB unres_3P.csh all [12,13]
626 -DCRYST_THETA unres_pgf90.exe
627 -DCRYST_SC unres_ifc6.exe
629 4P -DCRYST_BOND GB unees_4P.csh all [12,13]
630 -DCRYST_THETA unres_pgf90.exe
631 -DCRYST_SC unres_ifc6.exe
632 ---------------------------------------------------------------------------------------
634 Force fields for MD version
636 ---------------------------------------------------------------------------------------
637 Additional SC-SC Example script Structural
638 Force field compiler flags potential and executables classes covered References
641 ---------------------------------------------------------------------------------------
643 GAB -DCRYST_BOND GB unres_GAB.csh mostly alpha [19]
647 E0G -DCRYST_BOND GB unres_E0G.csh mostly alpha [19]
651 1L2Y_1LE1 none GB unres_ab.csh all [20,25-27]
653 ---------------------------------------------------------------------------------------
655 The example scripts (the *.csh filed) contain all appropriate parameter files, while
656 the energy-term weights are provided in the example input files listed in EXAMPLES.TXT
657 (*.inp; see section 5. for description of the input files). However, it is user's
658 responsibility to specify appropriate compiler flags. Note that a version WILL NOT work,
659 if the force-field specific compiler flags are not set. The parameter files specified
660 in the run script also must strictly correspond to the energy-term weights specified in
661 the input file. The parameter files for specific force fields are also specified below
662 and the energy-term weights are specified in section 5.
664 The parameter files are as follows (the environment variables from section 3 are
665 used to identify the parameters):
670 THETPAR thetaml.5parm
672 TORPAR torsion_cryst.parm
673 TORDPAR torsion_double_631Gdp.parm (not used)
674 SIDEPAR scinter_LJ.parm
677 FOURIER fourier_GAP.parm (not used)
678 SCCORPAR rotcorr_AM1.parm (not used)
680 ALPHA, BETA, ALPHABETA (CASP4):
683 THETPAR thetaml.5parm
685 TORPAR torsion_ecepp.parm
686 TORDPAR torsion_double_631Gdp.parm (not used)
687 SIDEPAR scinter_GB.parm
690 FOURIER fourier_GAP.parm
691 SCCORPAR rotcorr_AM1.parm (not used)
696 THETPAR thetaml.5parm
698 TORPAR torsion_631Gdp.parm
699 TORDPAR torsion_double_631Gdp.parm
700 SIDEPAR scinter_GB.parm
701 ELEPAR electr_631Gdp.parm
703 FOURIER fourier_opt.parm.1igd_iter7n_c
704 SCCORPAR rotcorr_AM1.parm (not used)
709 THETPAR thetaml.5parm
711 TORPAR torsion_631Gdp.parm
712 TORDPAR torsion_double_631Gdp.parm
713 SIDEPAR sc_GB_opt.3P7_iter81_1r
714 ELEPAR electr_631Gdp.parm
716 FOURIER fourier_opt.parm.1igd_hc_iter3_3
717 SCCORPAR rotcorr_AM1.parm (not used)
722 THETPAR thetaml.5parm
724 TORPAR torsion_631Gdp.parm
725 TORDPAR torsion_double_631Gdp.parm
726 SIDEPAR sc_GB_opt.4P5_iter33_3r
727 ELEPAR electr_631Gdp.parm
729 FOURIER fourier_opt.parm.1igd_hc_iter3_3
730 SCCORPAR rotcorr_AM1.parm (not used)
735 THETPAR thetaml.5parm
737 TORPAR torsion_631Gdp.parm
738 TORDPAR torsion_double_631Gdp.parm
739 SIDEPAR sc_GB_opt.1gab_3S_qclass5no310-shan2-sc-16-10-8k
740 ELEPAR electr_631Gdp.parm
742 FOURIER fourier_opt.parm.1igd_hc_iter3_3
743 SCCORPAR rotcorr_AM1.parm
748 THETPAR thetaml.5parm
750 TORPAR torsion_631Gdp.parm
751 TORDPAR torsion_double_631Gdp.parm
752 SIDEPAR sc_GB_opt.1e0g-52-17k-2k-newclass-shan1e9_gap8g-sc
753 ELEPAR electr_631Gdp.parm
755 FOURIER fourier_opt.parm.1igd_hc_iter3_3
756 SCCORPAR rotcorr_AM1.parm
760 BONDPAR bond_AM1.parm
761 THETPAR theta_abinitio.parm
762 ROTPAR rotamers_AM1_aura.10022007.parm
763 TORPAR torsion_631Gdp.parm
764 TORDPAR torsion_double_631Gdp.parm
765 SIDEPAR scinter_${POT}.parm
766 ELEPAR electr_631Gdp.parm
768 FOURIER fourier_opt.parm.1igd_hc_iter3_3
769 SCCORPAR rotcorr_AM1.parm
771 Additionally, for 1L2Y_1LE1, the following environment variables and files are required
772 to generate random conformations:
774 THETPARPDB thetaml.5parm
775 ROTPARPDB scgauss.parm
777 For CSA, the best force field is 4P. For MD, the 1L2Y_1LE1 force field is best for
778 ab initio prediction but provides medium resolution (5 A for 60-residue proteins) and
779 overemphasizes beta structures and has to be run with secondary-structure-prediction
780 information. For prediction of the structure of mostly alpha-protein, and for running
781 dynamics of large proteins, the best is the GAB force field. All these force fields
782 were trained by using our procedure of hierarchical optimization [5].
783 The 4P and 1L2Y_1LE1 force fields have considerable power independent of structural class.
784 The ALPHA, BETA, and ALPHABETA force fields (for CSA) were used in the CASP4 exercises
785 and the CASP5 force field was used in the CASP5 exercise with some success; ALPHA
786 predicts reasonably the structure of alpha-helical proteins and is still not obsolete,
787 while for beta and alpha+beta structure prediction
788 3P or 4P should be used, because they are cheaper and more reliable than BETA and
789 ALPHABETA. The early CASP3 force field is included for historical reasons only.
794 7.1. Main input data file
795 -------------------------
797 Most of the data are organized as data lists, where the data can be put
798 in any order, using a series of statements of the form:
802 for simple non-logical variables
808 to indicate that the corresponding option is turned on. For array variables
809 the assignment statement is:
811 KEYWORD=value1,value2,...
813 However, the data lists are unnamed and that must be placed EXACTLY in the
814 order indicated below. The presence of an "&" in the 80th column of a line
815 indicates that the next line will belong to the same data group. The parser
816 subroutines that interpret the keywords are case insensitive.
818 Each group of data organized as a data list is indicated as "data list format"
823 Any string containing up to 80 characters. The first input line is always
824 interpreted as title.
826 8.1.2. Control data (data list format; READ_CONTROL subroutine)
827 ---------------------------------------------------------------
829 8.1.2.1 Keywords to chose calculation type
830 ------------------------------------------
832 OUT1FILE - only the master processor prints the output file in a parallel job
834 MINIMIZE - if present, energy minimization will be carried out.
836 REGULAR - regularize the read in conformation (usually a crystal or
837 NMR structure) by doing a series of three constrained minimizations,
838 to keep the structure as close as possible to the starting
839 (experimental) structure. The constraints are the CA-CA distances
840 of the initial structure. The constraints are gradually diminished
841 and removed in the last minimization.
843 SOFTREG - regularize the read in conformation (usually a crystal or NMR
844 structure) by doing a series of constrained minimizations, with
845 additional use of soft potential and secondary structure
846 freezing, to keep the structure as close as possible to the
847 starting (experimental) structure.
850 CSA - if present, the run is a CSA run. At present, this is the only
851 reliable mode of doing global conformational search with this
852 package; it is NOT recommended to use MCM or THREAD for this
855 MCMA - if present, this is a Monte Carlo Minimization (MCM) run.
857 MULTCONF- if present, conformations will be read from the INPUT.intin
860 MD - run canonical MD (single or multiple trajectories)
862 RE - run REMD or MREMD (parallel jobs only)
864 MUCA - run multicanonical MD calculations (parallel jobs only)
867 Conformational map will be calculated in chosen angles.
869 THREAD=number (integer)
870 Threading or threading-with-minimization run, using a database of structures
871 contained in the $DD/patterns.cart pattern data base (502 chains or chain
872 fragments), using a total number patterns. It is recommended to use this with
873 energy minimization; this implies regularization of each minimized pattern.
874 For references see A. Liwo, M.R. Pincus, R.J. Wawak,
875 S. Rackovsky, St. Oldziej, H.A. Scheraga, J. Comput. Chem., 1997, 18, 874-887
876 and A. Liwo, St. Oldziej, R. Kazmierkiewicz, M. Groth, C. Czaplewski,
877 Acta Biochim. Pol., 1997, 44, 527-547.
879 CHECKGRAD - compare numerical and analytical gradient; to be followed by:
880 CART - energy gradient in virtual-bond vectors (Cartesian coordinates)
881 INT - energy gradient in internal coordinates (default)
882 CARINT - derivatives of the internal coordinates in the virtual-bond vectors.
884 8.1.2.2 Specification of protein and structure output in non-MD applications
885 ----------------------------------------------------------------------------
887 ONE_LETTER - one-letter and not three-letter code of the amino-acid residues
890 SYM (1) - number of chains with same sequence (for oligomeric proteins only),
892 PDBSTART - the initial conformation is read in from a PDB file
894 UNRES_PDB - the starting conformation is in UNRES representation (Calpha
895 and SC coordinates only). This keyword MUST appear in such a case
896 or the program will generate erroneous and unrealistic side-chain
899 RAND_CONF- start from a random conformation
901 EXTCONF - start from an extended chain conformation
903 PDBOUT - if present, conformations will be output in PDB format. Note that
904 this keyword affects only the output from single energy evaluation,
905 energy minimization and multiple-conformation data. To request
906 conformations from MD/MREMD runs in PDB format, the MDPDB keyword
907 must be placed on the MD input record.
909 MOL2OUT - if present, conformations will be output in SYBYL mol2 format
911 REFSTR - if present, reference structure will be read (e.g., to monitor
912 the RMS deviation from the crystal structure)
914 PDBREF - if present, a reference structure will be read in to compare
915 the calculated conformations with it
917 UNRES_PBD - the starting/reference structure is read from an UNRES-generated
920 Keywords: PDBOUT, MOL2OUT, PDBREF, and PDBSTART are ignored for a CSA run.
921 Output mode for MD version is specified in MD input (see section 5.5).
923 8.1.2.3. Miscellaneous
924 ----------------------
927 0 - no distance restraints
928 >0 imposes harmonic restraints on selected distances; see section 5.12.
929 In MD version, also restraints on the q variable [18] can be used.
932 the weight of the distance term; applies for REGULARIZE and THREAD, otherwise
935 USE_SEC_PRED - use secondary-structure prediction information.
937 SEED=number (integer) (no default)
938 Random seed (required, even if the run is not a CSA, MCM, MD or MREMD run)
940 PHI - only the virtual-bond dihedral angles gamma are considered as
941 variables in energy minimization
943 BACK - only the backbone virtual angles (virtual-bond angles theta and
944 virtual-bond dihedral angles gamma) are considered as variables
945 in energy minimization
947 By default, all internal coordinates: theta, gamma, and the side-chain
948 centroid polar angles alpha and beta are considered as variables in energy
951 RESCALE_MODE=number (real)
952 Choice of the type of temperature dependence of the force field.
953 0 - no temperature dependence
954 1 - homographic dependence (not implemented yet with any force field)
955 2 - hyperbolic tangent dependence [18].
958 temperature (for MD runs and temperature-dependent force fields).
960 The following keywords apply to MCM only:
962 MAXGEN=number (integer) (10000)
963 maximum number of conformations generated in a single MCM iteration
965 MAXOVERLAP=number (integer) (1000)
966 maximum number of conformations with "bad" overlaps allowed to appear in a
967 row in a single MCM iteration.
969 DISTCHAINMAX - (multi-chain capacity only) maximum distance between the
970 last residue of a given chain and the first residue of the
971 next chain such that restraints will not be imposed; quartic
972 restraints will be imposed for greater distances.
974 ENERGY_DEC - detailed energies will be printed for each interacting pair
975 or each virtual bond, virtual-bond angle and dihedral angle,
976 side chain, etc. DO NOT use unless a single energy evaluation
979 8.1.3. Minimizer options (data list, subroutine READ_MINIM)
980 -----------------------------------------------------------
982 This data group is present, if MINIMIZE was specified on the control card.
983 Otherwise, it must not appear.
985 CART - minimize in virtual-bond vectors instead of angles
987 MAXMIN=number (integer) (2000)
988 maximum number of iterations of the SUMSL minimizer
990 MAXFUN=number (integer) (5000)
991 maximum number of function evaluations in a single minimization
993 TOLF=number (real) (1.0e-2)
994 Tolerance on function
996 RTOLF=number (real) (1.0d-4)
997 Relative tolerance on function
999 The SUMSL minimizer is used in UNRES/CSA. For detailed description of
1000 the control parameters see the source file cored.f and sumsld.f
1003 8.1.4 CSA control parameters
1004 ----------------------------
1006 This data group should be present only, if CSA was specified on the control
1007 card. It is recommended that the readers to read publications on CSA method
1008 for more complete description of the parameters. Brief description of
1011 NCONF=number (integer) (50)
1012 This corresponds to the size of the bank at the beginning of the
1013 CSA procedure. The size of the bank, nbank, is set to nconf.
1014 If necessary (at much later stages of the CSA: see icmax below),
1015 nbank increases by multiple of nconf.
1017 JSTART=number (integer) (1)
1018 JEND=number (integer) (1)
1019 This corresponds to the limit values of do loop, each of which
1020 corresponds to an separate CSA run. If jstart=1, and jstart=100,
1021 this routine will repeat 100 separate CSA runs (limited by CPU)
1022 each one with separate random number initialization.
1023 The only difference between two CSA runs (one with jstart=jend=1
1024 and another one with jstart=jend=2) would be different random
1025 number initializations if other parameters are identical.
1027 NSTMAX=number (integer) (500000)
1028 This is to set a limit the total number of local minimizations of CSA
1031 N1=number (integer) (6)
1032 N2=number (integer) (4)
1033 N3=number (integer) (0)
1034 N4=number (integer) (0)
1035 N5=number (integer) (0)
1036 N6=number (integer) (10)
1037 N7=number (integer) (0)
1038 N8=number (integer) (0)
1039 N9=number (integer) (0)
1040 IS1=number (integer) (1)
1041 IS2=number (integer) (8)
1042 These numbers are used to generate trial conformations for each seed.
1043 See the file, "newconf.f", for more details.
1044 n1: the total number of trial conformations for each seed by substituting
1045 nran number of variable angles (see subroutine newconf1ab and
1046 subroutine newconf1ar)
1047 n2: the total number of trial conformations for each seed by substituting
1048 nran number of groups of variable angles (see subroutine newconf1bb and
1049 subroutine newconf1br)
1050 n3: the total number of trial conformations for each seed by substituting
1051 a window of residues which forms a beta-hairpin, if there is no enough
1052 beta-hairpins uses the same algorithm as n6
1053 n4: the total number of trial conformations for each seed by shifting the
1054 turn in beta-hairpin by +/- 1 or 2 residues, if there is no enough
1055 beta-hairpins uses the same algorithm as n6
1057 n6: the total number of trial conformations for each seed by substituting
1058 a window of residues [is1,is2] inclusive. The size of the window is
1059 determined in a random fashion (see subroutine newconf_residue for
1060 generation of the trial conformations)
1061 n7: the total number of trial conformations for each seed by copying a
1062 remote strand pair forming nonlocal beta-sheet contact
1063 n8: the total number of trial conformations for each seed by copying an
1064 alpha-helical segment
1065 n9: the total number of trial conformations for each seed by shifting the
1066 alpha-helical segment by +/- 1 or 2 residues
1068 Typical values used for a 75-residue helical protein is
1069 (6 4 0 0 0 10 1 26) for (n1,n2,n3,n4,n5,n6,is1,is2), respectively.
1070 In this example, a total of 20 trial conformations are generated for a seed
1071 Usually is1=1 is used for all applications, and the value of is2 is set about
1072 to 1/3 of the total number of residues. n3, n4 and n7 are design to help in
1073 case of proteins with beta-sheets
1075 NRAN0=number (integer) (4)
1076 NRAN1=number (integer) (2)
1077 IRR=number (integer) (1)
1078 These numbers are used to determine if the CSA stage is very early.
1079 One can use (4 2 1) for these values. For more details one should look into
1080 the file, "newconf.f", for more details.
1082 NTOTAL=number (integer) (10000)
1083 CUT1=number (real) (2.0)
1084 CUT2=number (real) (5.0)
1085 Annealing schedule is set in following fashion.
1086 The value of D_cut is reduced geometrically from 1/cut1 of D_ave (at the
1087 beginning) to 1/cut2 of D_ave (after ntotal number of minimizations) where
1088 D_ave is the average distance between two conformations in the First_bank.
1090 ESTOP=number (real) (-3000.0)
1091 The CSA procedure stops if a conformations with energy lower than estop is
1092 obtained. If the do-loop set by jstart and jend requires more than one loop,
1093 the program will go on until the do-loop is finished.
1095 ICMAX=number (integer) (3)
1096 The maximum value of cycle (see the original publications for details).
1097 If the number of cycle exceeds this value the program will add nconf
1098 more conformations to Bank and First_bank to continue CSA procedure if
1099 the new size of the nbank is within the maximum set by nbankm (see above).
1100 If the size of nbank exceeds the maximum set by nbankm the CSA procedure
1101 for this run will stop and next CSA will begin depending on the do-loop
1102 set by jstart and jend.
1104 IRESTART=number (integer) (0)
1105 This tells you if the run is fresh start (irestart=0) or a restart (irestart=1)
1106 starting from an old results
1108 NDIFF=number (integer) (2)
1109 The number of variables use in comparison when structure is added to the
1110 bank,4 - all angels, 2 - only backbone angles gamma and theta
1112 NBANKTM=number (integer) (0)
1113 The maximum number of structures saved in *.CSA.bankt as history of the run
1114 Do not use bankt on massively parallel computation as it kills scalability.
1116 DELE=number (real) (20.0)
1117 Energy cutoff for bankt.
1119 DIFCUT=number (real) (720.0)
1120 Angle cutoff for bankt.
1122 IREF=number (integer) (0)
1123 0 - normal run, 1 - local CSA which generates only structures close to the
1124 reference one read from *.CSA.native.int file
1126 RMSCUT=number (real) (4.0)
1127 CA RMSD cut off used in local CSA
1129 PNCCUT=number (real) (0.5)
1130 Percentage of native contact used in local CSA
1132 NCONF_IN=number (integer) (0)
1133 The number of conformation read for the first bank from the input file
1136 Optionally, the CSA parameters can be read from file INPUT.CSA.in, if
1137 this file exists. If so, they are read in free format in the following
1143 n1,n2,n3,n4,n5,n6,n7,n8,is1,is2
1154 8.1.5. MCM data (data list, subroutine MCMREAD)
1155 -----------------------------------------------
1157 This data group is present, if MCM was specified on the control card.
1158 Otherwise it must not appear.
1160 MAXACC=number (integer) (100)
1161 Maximum number of accepted conformations
1163 MAXTRIAL=number (integer) (100)
1164 Maximum number of unsuccessful trials in a row
1166 MAXTRIAL_ITER=number (integer) (1000)
1167 Maximum number of unsuccessful trials in a single iteration
1169 MAXREPM=number (integer) (200)
1170 Maximum number of repetitions of the same minimum
1172 RANFRACT=number (real) (0.5d0)
1173 Fraction of chain-rebuild motions
1175 OVERLAP=number (real) (1.0d3)
1176 Bad contact energy criterion
1178 NSTEPH=number (integer) (0)
1179 Number of heating step in adaptive sampling
1181 NSTEPC=number (integer) (0)
1182 Number of cooling step in adaptive sampling
1184 TMIN=number (real) (298.0d0)
1185 Minimum temperature in adaptive-temperature sampling)
1187 TMAX=number (real) (298.0d0)
1188 Maximum temperature in adaptive-temperature sampling)
1190 The temperature is changed according to the formula:
1192 T = TMIN*EXP(ISTEPH*(TMAX-TMIN)/NSTEPH) when heating
1196 T = TMAX*EXP(-ISTEPC*(TMAX-TMIN)/NSTEPC) when cooling
1198 The default is to use a constant temperature.
1200 NWINDOW=number (integer) (0)
1201 Number of windows in which the variables will be perturbed; the windows are
1202 defined by the numbers of the respective amino-acid residues. If NWINDOW
1203 is nonzero, after specifying all MCM input the next lines must define the
1204 windows. Each line looks like this:
1206 winstart winend (free format)
1208 e.g. if NWINDOW=2, the input:
1213 will mean that only the variables of residues 4-10 and 15-20 will be perturbed.
1214 However, in general, all variables will be considered in minimization.
1217 Printout level in MCM. 0 - no intermediate printing, 1 and 2 - moderate
1218 printing, 3 - extensive printing.
1220 NO_PRINT_STAT - no output to INPUT_POTENTIALxxx.stat.
1222 NO_PRINT_INT - no internal-coordinate output to INPUT_POTENTIALxxx.int.
1224 8.1.6. MD data (subroutine READ_MDPAR)
1225 --------------------------------------
1227 NSTEP (1000000) number of time steps per trajectory.
1229 NTWE (100) NTWX (1000) frequency of energy and coordinate output, respectively.
1230 The coordinates are dumped in the pdb or compressed Gromacs (cx) format,
1231 depending on the next keyword.
1232 NTWE=0 means no energy dump.
1234 MDPDB - dump coordinates in the PDB format (cx otherwise)
1236 TRAJ1FILE only the master processor outputs coordinates. This feature pertains
1237 only to REMD/MREMD jobs and overrides NTWX; coordinates are dumped at every
1240 REST1FILE only the master writes the restart file
1242 DT (real) (0.1) time step; the unit is "molecular time unit" (mtu); 1 mtu = 48.9 fs
1244 DAMAX (real) (1.0) maximum allowed change of acceleration during a single time step.
1245 The time step gets scaled down, if this is exceeded.
1247 DVMAX (real) (20.0) maximum allowed velocity (in A/mtu)
1249 EDRIFTMAX (real) (10.0) maximum allowed energy drift in a single MD step (10 kcal/mol)
1251 REST restart flag. The calculation is restarted if present.
1253 LARGE very detailed output. Don't use except for debugging.
1255 PRINT_COMPON prints energy components.
1257 RESET_MOMENT (1000) frequency of zeroing out the total angular momentum when
1258 running Berendsen mode calculations (for Langevin calculations meaningless).
1260 RESET_VEL=number (integer) (1000) - frequency of resetting velocities to values
1261 from Gaussian distribution.
1263 RATTLE - use RATTLE algorithm (constraint bonds); not yet implemented.
1265 RESPA - use the Multiple Time Step (MTS) or Adaptive Multiple Time Step (A-MTS)
1266 algorithm [17]. Without this flag the variable time step (VTS) [14] is run.
1268 NTIME_SPLIT=number (integer) (1) - initial number of time-split steps
1270 MAXTIME_SPLIT=number(integer) (64) - maximum number of time-split step
1272 If NTIME_SPLIT==MAXTIME_SPLIT, MTS is run.
1274 R_CUT=number (real) (2.0) - the cut-off distance in splitting the forces into short- and
1275 long-range in site-site VDW distance units.
1277 LAMBDA (real) (0.3) - the transition length (in site-site VDW distance units) between
1278 short- and long-range forces.
1280 XIRESP - flag to use MTS/A-MTS with Nose-Hoover/Nose-Poincare thermostats.
1282 LANG=number (integer) (0) Langevin dynamics flag:
1284 0 - No explicit Langevin dynamics.
1285 1 - Langevin with direct integration of the equations of motion (recommended
1286 for Langevin calculations)
1287 2 - Langevin calculation with analytical pre-integration of the friction and
1288 stochastic part of the equations of motion using an algorithm adapted from TINKER.
1289 This is MUCH MORE time- and memory-consuming than 1 and requires compiling without
1290 the -DLANG0 flag and enormously increases memory requirements.
1291 3 - The stochastic integrator developed by Cicotti and coworkers.
1292 4 - for other stochastic integrators (not used at present).
1294 Note: With the enclosed code, the -DLANG0 compiler flag is included which disables
1297 TBF Berendsen thermostat.
1299 TAU_BATH (1.0) (units are mtus; 1mtu=48.9 fs) constant of the coupling to the thermal bath
1300 used with the Berendsen thermostat.
1302 NOSEPOINCARE99 - the Nose-Poincare thermostat as of 1999 will be used.
1304 NOSEPOINCARE01 - the Nose-Poincare thermostat as of 2001 will be used.
1306 NOSEHOOVER96 - the Nose-Hoover thermostat will be used.
1308 Q_NP=number (real) (0.1) - the value of the mass of the fictitious particle in the calculations
1309 with the Nose-Poincare thermostat.
1311 T_BATH (300.0) (in K) temperature of canonical simulation or temperature to generate
1314 ETAWAT (0.8904) viscosity of water (in centipoises)
1316 RWAT (1.4) radius of water molecule (in A)
1318 SCAL_FRIC=number (real) (0.02) - scaling factor of the friction coefficients.
1320 SURFAREA - scale friction acting on atoms by atoms' solvent accessible area.
1322 RESET_FRICMAT=number (integer) (1000) - recalculate friction matrix every RESET_FRICMAT MD steps.
1324 USAMPL restraints on q (see reference 5 for meaning) will be imposed (see section .
1325 In this case, the next records specify the restraints; these records are
1326 placed before the list of temperatures or numbers of trajectories.
1328 EQ_TIME=number (real) (1.0e4) time (in mtus; 1 mtu=48.9 fs) after which restraints
1329 on q will start to be in force.
1331 If USAMPL has been specified, the following information must be supplied after the
1332 main MD input data record (subroutine READ_FRAGMENTS):
1334 Line 1: nset, npair, nfrag_back (number of sets of restraints, number of restrained
1335 fragments, number of restrained pairs, number of restrained backbone fragments
1336 (in terms of theta and gamma angles)
1338 For each set of restraints (1, 2,..., nset):
1340 mset(iset) - how many times the set is multiplied
1342 wfrag(i,iset), ifrag(1,i,iset), ifrag2(2,i,iset),qfrag(i,iset)
1343 weight of the restraint, first and last residue of the fragment, target q value.
1344 This information is repeated through nfrag.
1346 wpair(i,iset), ipair(1,i,iset), ipair(2,i,iset),qinpair(i,iset)
1347 weight of the restraint, first and second fragment of the pair (according to fragment
1348 list), target q value. This information is repeated through npair
1350 wfrag_back(1,i,iset), wfrag_back(2,i,iset), wfrag_back(3,i,iset),
1351 ifrag_back(1,i,iset),ifrag_back(2,i,iset)
1352 weight of the restraints on theta angles, weight on the restraints on gamma angles,
1353 weight of the restraints on side-chain rotamers, first residue of the fragment,
1354 last residue of the fragment. This information is repeated through nfrag_back.
1356 8.1.7 REMD/MREMD data (subroutine READ_REMDPAR)
1357 -----------------------------------------------
1359 NREP (3) number of replicas in a REMD/MREMD run
1361 NSTEX (1000) number of steps after which exchange is performed in REMD/MREMD
1364 The temperatures in replicas can be specified through
1366 RETMIN (10.0) minimum temperature in a REMD/MREMD run
1368 RETMAX (1000.0) maximum temperature in a REMD/MREMD run
1370 Then the range from retmin to retmax is divided into equal segments and
1371 temperature of the replicas assigned accordingly,
1375 TLIST means that the NREP temperature of the replicas will be input in the
1378 MLIST numbers of trajectories per each of the NREP temperatures will be
1379 specified in the record after the list of temperatures; this specifies
1382 Important! The number of processors must be exactly equal to the number of
1383 trajectories, i.e., NREP for a REMD run or sum_i mlist(i) for a MREMD run.
1385 SYNC - all trajectories will be synchronized every NSTEX time steps
1386 (by default, they are not synchronized)
1388 TRAJ1FILE only the master processor outputs coordinates. This feature pertains
1389 only to REMD/MREMD jobs and overrides NTWX; coordinates are dumped at every
1392 REST1FILE only the master writes the restart file
1394 HREMD - Hamiltonian replica exchange flag; not only temperatures but also
1395 sets energy-term weights are exchanged between conformations.
1397 TONLY - run a "fake" HREMD with many sets of energy-term weights in a
1398 single run but only temperature exchange.
1400 8.1.8 Energy-term weights (data list; subroutine MOLREAD)
1401 ---------------------------------------------------------
1403 WLONG=number (real) (1.0d0)
1404 common weight of the U(SC-SC) (side-chain side-chain interaction)
1405 and U(SC,p) (side-chain peptide-group) term
1407 WSCC = number (real) (WLONG)
1408 weight of the U(SC-SC) term
1410 WSCP = number (real) (WLONG)
1411 weight of the U(SC-p) term
1413 WELEC=number (real) (1.0d0)
1414 weight of the U(p-p) (peptide-group peptide-group interaction) term
1416 WEL_LOC=number (real) (1.0d0)
1417 weight of the U_el_loc^3 (local-electrostatic cooperativity, third-order) term
1419 WCORRH=number (real) (1.0d0)
1420 weight of the U(corr) (cooperativity of hydrogen-bonding interactions, fourth-order) term
1422 WCORR5=number (real) (0.0d0)
1423 weight of the U_el_loc^5 (local-electrostatic cooperativity, 5th order
1426 WCORR6=number (real) (0.0d0)
1427 weight of the U_el_loc^6 (local-electrostatic cooperativity, 6th order
1430 WTURN3=number (real) (1.0d0)
1431 weight of the U_turn^3 (local-electrostatic cooperativity within 3 residue
1432 segment, 3rd order contribution)
1434 WTURN4=number (real) (1.0d0)
1435 weight of the U_turn^4 (local-electrostatic cooperativity within 4 residue
1436 segment, 4rd order contributions)
1438 WTURN6=number (real) (1.0d0)
1439 weight of the U_turn^6 (local-electrostatic cooperativity within 6 residue
1440 segment, 6rd order contributions)
1442 WTOR=number (real) (1.0d0)
1443 weight of the torsional term U(tor)
1445 WANG=number (real) (1.0d0)
1446 weight of the virtual-bond angle bending term U(b)
1448 WSCLOC=number (real) (1.0d0)
1449 weight of the side-chain rotamer term U(SC)
1451 WSTRAIN=number (real) (1.0d0)
1452 scaling factor of the distance-constrain or disulfide-bond strain energy term
1454 SCALSCP=number (real) (1.0d0)
1455 scaling factor of U(SC,p); this is an alternative to specifying WSCP; in
1456 this case WSCP will be calculated as WLONG*SCALSCP
1458 SCAL14=number (real) (1.0d0)
1459 scaling factor of the 1,4 SC-p interactions
1461 CUTOFF (7.0) - cut-off on backbone-electrostatic interactions to compute 4-
1462 and higher-order correlations
1464 DELT_CORR (0.5) - thickness of the distance range in which the energy is
1467 The defaults are NOT the recommended values. No "working" default values
1468 have been set, because the force field is still under development. The values
1469 corresponding to the force fields listed in section 4 are as follows:
1472 WELEC=1.5 WSTRAIN=1.0 WTOR=0.08617 WANG=0.10384 WSCLOC=0.10384 WCORR=1.5 &
1473 WTURN3=0 WTURN4=0 WTURN6=0 WEL_LOC=0 WCORR5=0 WCORR6=0 SCAL14=0.40 SCALSCP=1.0 &
1474 CUTOFF=7.00000 WSCCOR=0.0
1477 WSC=1.00000 WSCP=0.72364 WELEC=1.10890 WANG=0.68702 WSCLOC=1.79888 &
1478 WTOR=0.30562 WCORRH=1.09616 WCORR5=0.17452 WCORR6=0.36878 WEL_LOC=0.19508 &
1479 WTURN3=0.00000 WTURN4=0.55588 WTURN6=0.11539 CUTOFF=7.00000 WCORR4=0.0000 &
1480 WTORD=0.0 WSCCOR=0.0
1483 WSC=1.00000 WSCP=1.10684 WELEC=0.70000 WANG=0.80775 WSCLOC=1.91939 &
1484 WTOR=3.36070 WCORRH=2.50000 WCORR5=0.99949 WCORR6=0.46247 WEL_LOC=2.50000 &
1485 WTURN3=1.80121 WTURN4=4.35377 WTURN6=0.10000 CUTOFF=7.00000 WCORR4=0.00000 &
1489 WSC=1.00000 WSCP=1.43178 WELEC=0.41501 WANG=0.37790 WSCLOC=0.12880 &
1490 WTOR=1.98784 WCORRH=2.50526 WCORR5=0.23873 WCORR6=0.76327 WEL_LOC=2.97687 &
1491 WTURN3=0.09261 WTURN4=0.79171 WTURN6=0.01074 CUTOFF=7.00000 WCORR4=0.00000 &
1495 WSC=1.00000 WSCP=1.54864 WELEC=0.20016 WANG=1.00572 WSCLOC=0.06764 &
1496 WTOR=1.70537 WTORD=1.24442 WCORRH=0.91583 WCORR5=0.00607 WCORR6=0.02316 &
1497 WEL_LOC=1.51083 WTURN3=2.00764 WTURN4=0.05345 WTURN6=0.05282 WSCCOR=0.0 &
1498 CUTOFF=7.00000 WCORR4=0.00000 WSCCOR=0.0
1501 WSC=1.00000 WSCP=2.85111 WELEC=0.36281 WANG=3.95152 WSCLOC=0.15244 &
1502 WTOR=3.00008 WTORD=2.89863 WCORRH=1.91423 WCORR5=0.00000 WCORR6=0.00000 &
1503 WEL_LOC=1.72128 WTURN3=2.99827 WTURN4=0.59174 WTURN6=0.00000 &
1504 CUTOFF=7.00000 WCORR4=0.00000 WSCCOR=0.0
1507 WSC=1.00000 WSCP=2.73684 WELEC=0.06833 WANG=4.15526 WSCLOC=0.16761 &
1508 WTOR=2.99546 WTORD=2.89720 WCORRH=1.98989 WCORR5=0.00000 WCORR6=0.00000 &
1509 WEL_LOC=1.60072 WTURN3=2.36351 WTURN4=1.34051 WTURN6=0.00000 &
1510 CUTOFF=7.00000 WCORR4=0.00000 WSCCOR=0.0
1513 WLONG=1.35279 WSCP=1.59304 WELEC=0.71534 WBOND=1.00000 WANG=1.13873 &
1514 WSCLOC=0.16258 WTOR=1.98599 WTORD=1.57069 WCORRH=0.42887 WCORR5=0.00000 &
1515 WCORR6=0.00000 WEL_LOC=0.16036 WTURN3=1.68722 WTURN4=0.66230 WTURN6=0.00000 &
1516 WVDWPP=0.11371 WHPB=1.00000 &
1517 CUTOFF=7.00000 WCORR4=0.00000
1520 WLONG=1.70905 WSCP=2.18310 WELEC=1.06684 WBOND=1.00000 WANG=1.17536 &
1521 WSCLOC=0.22070 WTOR=2.65798 WTORD=2.00646 WCORRH=0.23541 WCORR5=0.00000 &
1522 WCORR6=0.00000 WEL_LOC=0.42789 WTURN3=1.68126 WTURN4=0.75080 WTURN6=0.00000 &
1523 WVDWPP=0.27044 WHPB=1.00000 WSCP14=0.00000 &
1524 CUTOFF=7.00000 WCORR4=0.00000
1527 WLONG=1.00000 WSCP=1.23315 WELEC=0.84476 WBOND=1.00000 WANG=0.62954 &
1528 WSCLOC=0.10554 WTOR=1.84316 WTORD=1.26571 WCORRH=0.19212 WCORR5=0.00000 &
1529 WCORR6=0.00000 WEL_LOC=0.37357 WTURN3=1.40323 WTURN4=0.64673 WTURN6=0.00000 &
1530 WVDWPP=0.23173 WHPB=1.00000 WSCCOR=0.0 &
1531 CUTOFF=7.00000 WCORR4=0.00000
1533 8.1.9. Input and/or reference PDB file name (text format; subroutine MOLREAD)
1534 -----------------------------------------------------------------------------
1536 If PDBSTART or PDBREF was specified in the control card, this line contains
1537 the PDB file name. Trailing slashes to specify the full path are permitted.
1538 The file name can contain up to 64 characters.
1540 8.1.10. Amino-acid sequence (free and text format)
1541 --------------------------------------------------
1543 This data appears, if PDBSTART was not specified, otherwise must not be present
1544 because the sequence would be taken from the PDB file. The first line contains
1545 the number of amino-acid residues, including the end groups (free format),
1546 the next lines contain the sequence in 20(1X,A3) format for the three-letter
1547 or 80A1 format for the one-letter code. There are two types of end-groups:
1548 Gly (three-letter code) or G (one-letter code), if an end group contains a full
1549 peptide bond (e.g., the acetyl N-terminal group or the carboxyamide C-terminal
1550 group) and D (in the three-letter code) or X (in the one-letter code), if the
1551 end group does not contain a peptide group (e.g., the NH2 N-terminal end group
1552 or the COOH C-terminal end group). (Note the Gly or G also denotes the regular
1553 glycine residue, if found in the middle of a chain).
1554 In the second case the end group is considered as a "dummy" group and serves
1555 only to define the first (last) virtual-bond dihedral angle gamma for the
1556 first (last) full amino-acid residue.
1558 Consider, for example, the Ac-Ala(19)-NHMe polypeptide. The three-letter code
1559 input will look like this:
1562 Gly Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala
1565 And the one-letter code input will be:
1568 GAAAAAAAAAAAAAAAAAAAG
1570 If the sequence is changed to NH3(+)-Ala(19)-COO(-), the inputs will look
1574 D Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala
1580 XAAAAAAAAAAAAAAAAAAAX
1582 The sequence input is case-insensitive, because the present version of UNRES
1583 considers each amino-acid residue as an L-residue (there are no torsional
1584 parameters for the combinations of the D- and L-residues yet). Furthermore,
1585 each peptide group is considered as a trans group.
1587 If the version of UNRES has multi-chain capacity, placing a dummy residue
1588 inside the sequence indicates start of a new chain. For example, a system
1589 composed of two Ala(10) chains can be specified as follows (3-letter code):
1592 D Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala D Ala Ala Ala Ala Ala Ala Ala Ala
1598 XAAAAAAAAAAXAAAAAAAAAAX
1601 8.1.11. Disulfide-bridge information (free format; subroutine READ_BRIDGE)
1602 --------------------------------------------------------------------------
1607 NS - the number of half-cystines (required even if no half-cystines are present)
1609 ISS(i) - the position of ith half-cystine in the sequence (starting from the
1610 N-terminal end group)
1612 next line(s) (present only, if ns>0 and must not appear otherwise):
1613 NSS,(IHPB(i),JHPB(i),i=1,NSS)
1615 NSS - the number of disulfide bridges; must not be greater than NS/2
1617 IHPB(i),JHPB(i) - the cystine residue forming the ith bridge.
1619 The program will check, whether the residues specified in the ISS list
1620 are cystines and terminate with error, if any of them is not. The program
1621 also checks, if the numbers from the IHPB and the JHPB lists have appeared
1624 8.1.12. Dihedral-angle restraint data (free format; subroutine MOLREAD)
1625 -----------------------------------------------------------------------
1627 This set of data specifies the harmonic constraints (if any) imposed on selected
1628 virtual-bond dihedral angles gamma.
1631 NDIH_CONSTR - the number of restrained gamma angles (required even if no
1632 restrains are applied).
1634 2nd line (present only, if NDIH_CONSTR > 0; must not appear otherwise):
1635 FTORS - the force constant expressed in kcal/(mol*rad**2)
1637 next NDIH_CONSTR lines (present only, if NDIH_CONSTR > 0):
1639 IDIH_CONSTR(i),PHI0(i),DRANGE(i)
1641 IDIH_CONSTR(i) - the number of ith restrained gamma angle. The angles are
1642 numbered after the LAST alpha-carbons. Thus, the first "real" angle has number
1643 4 and it corresponds to the rotation about the CA(2)-CA(3) virtual-bond axis
1644 and the last angle has the number NRES and corresponds to the rotation about
1645 the CA(NRES-2)-CA(NRES-1) virtual-bond axis.
1647 PHI0(i) - the "center" of the restraint (expressed in degrees)
1649 DRANGE(i) - the "flat well" range of the restraint (in degrees)
1651 The restraint energy for the ith restrained angle is expressed as:
1654 | FTORS*(GAMMA(IDIH_CONSTR(i))-PHI0(i)+DRANGE(i))**2,
1655 | if GAMMA(IDIH_CONSTR(i))<PHI0(i)-DRANGE(i)
1657 EDIH = < 0, if PHI0(i)-DRANGE(i) <= GAMMA(IDIH_CONSTR(i) <= PHI0(i)+DRANGE(i)
1659 | FTORS*(GAMMA(IDIH_CONSTR(i))-PHI0(i)-DRANGE(i))**2,
1660 | if GAMMA(IDIH_CONSTR(i))>PHI0(i)+DRANGE(i)
1663 Applying dihedral-angle constraints also implies that for ith constrained
1664 gamma angle the sampling be carried out from the
1665 [PHI0(i)-DRANGE(i)..PHI0(i)+DRANGE(i)] interval and not from the [-Pi..Pi]
1666 interval, if random conformations are generated. If only this and not
1667 restrained minimization is required, just set FTORS to 0.
1669 8.1.13 Distance restraints (subroutine READ_DIST_CONSTR)
1670 --------------------------------------------------------
1672 Restraints are imposed on Calpha...Calpha distances.
1674 NDIST=number (integer) (0) - number of restraints on specific distances.
1676 NFRAG=number (integer) (0) - number of distance-restrained protein segments.
1678 NPAIR=number (integer) (0) - number of distance-restrained pairs of segments.
1679 Specifying NPAIR requires specification of segments.
1681 IFRAG=start(1),end(1),start(2),end(2)...start(NFRAG),end(NFRAG) (integers)
1682 First and last residues of the distance restrained segments.
1684 WFRAG=w(1),w(2),...,w(NFRAG) (reals) - force constants or bases for force
1685 constant calculation corresponding to fragment restraints.
1687 IPAIR=start(1),end(1),start(2),end(2),...,start(NPAIR),end(NPAIR) (integers)
1688 numbers of segments (consecutive numbers of start or end pairs in IFRAG
1689 specification), the distances between which will be restrained.
1691 WPAIR=w(1),w(2),...,w(NFRAG) (reals) - force constants or bases for force
1692 constant calculation corresponding to pair restraints.
1694 DIST_CUT=number (real) (5.0) - the cut-off distance in angstroms for force-
1695 constant calculations.
1697 The force constants within fragments/between pairs of fragments are calculated
1698 depending on the value of DIST_CONSTR described in section 5.1:
1700 1 - all force constants are equal to the respective entries of WFRAG/WPAIR
1702 2 - the force constants are equal to the respective entries of WFRAG/WPAIR
1703 when the distance between the Calpha atoms in the reference structure
1704 <=D_CUT, 0 otherwise.
1706 3 - the force constants are calculated from the formula:
1708 k(CA_j,CA_k)=W*exp{-[d(CA_j,CA_k)/DIST_CUT)]**2/2}
1710 where k(CA_j,CA_k) is the force constant between the respective Calpha atoms,
1711 d(CA_j,CA_k) is the distance between these Calpha atoms in the reference
1712 structure, and W is the basis for force-constant calculation (see above).
1714 If NDIST>0, the restraints on specific distance are subsequently input:
1716 ihpb(i), jhpb(i), forcon(i), i=1,NDIST
1718 where ihpb(i) and jhpb(i) are the numbers of the residues the distance
1719 between the Calpha atoms of which will be distance restrained and forcon(i)
1720 is the respective force constant.
1722 8.1.14 Internal coordinates of the reference structure (free format;
1723 --------------------------------------------------------------------
1724 subroutine READ_ANGLES)
1725 -----------------------
1727 This part of the data is present, if REFSTR, but not PDBREF was specified,
1728 otherwise must not appear. It contains the following group of variables:
1730 (THETA(i),i=3,NRES) - the virtual-bond valence angles THETA
1731 (PHI(i),i=4,NRES) - the virtual-bond dihedral angles GAMMA
1732 (ALPH(i),i=2,NRES-1)- the ALPHA polar angles of consecutive side chains
1733 (OMEG(i),i=2,NRES-1)- the BETA polar angles of consecutive side chains.
1735 ALPHA(i) and OMEG(i) correspond to the side chain attached to CA(i). THETA(i)
1736 is the CA(i-2)-CA(i-1)-CA(i) virtual-bond angle and PHI(i) is the
1737 CA(i-3)-CA(i-2)-CA(i-1)-CA(i) virtual-bond dihedral angle gamma.
1739 8.1.15 Internal coordinates of the initial conformation (free format;
1740 ---------------------------------------------------------------------
1741 subroutine READ_ANGLES)
1742 -----------------------
1744 This part of the data is present, if RAND_CONF, MULTCONF, THREAD, or PDBSTART
1745 were not specified, otherwise must not appear. This input is as in section 10.
1747 8.1.15.1 File name with internal coordinates of the conformations to be processed
1748 ---------------------------------------------------------------------------------
1749 (text format; subroutine MOLREAD)
1750 ---------------------------------
1752 This data is present only, if MULTCONF was specified. It contains the name of
1753 the file with the internal coordinates. Up to 64 characters are allowed.
1754 The structure of the file is that of the *.int file produced by UNRES/CSA.
1755 See section "The structure of the INT files" for details.
1757 8.1.16 Control data for energy map construction (data lists; subroutine MAP_READ)
1758 ---------------------------------------------------------------------------------
1760 These data lists appear, if NMAP=n was specified, where n is the number of
1761 variables that will be grid-searched. One list is per one variable or a
1762 group of variables set equal (see below):
1764 PHI - the variable is a virtual-bond dihedral angle gamma
1765 THE - the variable is a virtual-bond angle theta
1766 ALP - the variable is a side-chain polar angle alpha
1767 OME - the variable is a side-chain polar angle beta
1769 RES1=number (integer)
1770 RES2=number (integer)
1772 The range of residues for which the values will be set; all these variables
1773 will be set at the same value. It is required that RES2 > RES1.
1778 Lower and upper limit of scanning in grid search (in degrees)
1780 NSTEP=number (integer)
1782 Number of steps in scanning along this variable/group of variables.
1784 8.2. Input coordinate files
1785 ---------------------------
1787 At present, geometry can be input either from the external files in the PDB
1788 format (with the PDBSTART option) or multiple conformations can be read
1789 as virtual-bond-valence and virtual-bond dihedral angles when the MULTCONF
1790 option is used (the latter, however, implies using standard virtual-bond
1791 lengths as initial values). The structure of internal-coordinate files
1792 is the same as that of output internal-coordinate files described in section
1795 8.3. Other input files
1796 ----------------------
1798 CSA parameters can optionally be read in free format from file INPUT.CSA.in
1799 (see section 8.1.4). When a CSA run is restarted, the CSA-specific output files
1800 also serve as input files. INPUT is the prefix of input and output files
1801 as explained in section 6.
1803 Restart files for MD and REMD simulations. They are read when the keyword
1804 RESTART appears on the MD/REMD data group (section 8.1.6).
1809 UNRES "main" output files (INPUT.out_${POT}[processor]) are log files from
1810 a run. They contain the information of the molecule, force field, calculation
1811 type, control parameters, etc.; however, not the structures produced during
1812 the run or their energies except single-point energy evaluation and
1813 minimization-related runs. The structural information is included in
1814 coordinate files (*.int, *.x, *.pdb, *.mol2, *.cx) and statistics files (*.stat),
1815 respectively; these files are further processed by other programs (WHAM,
1816 CLUSTER) or can be viewed by molecular viewers (pdb or mol2 files).
1818 9.1. Coordinate files
1819 ---------------------
1821 9.1.1. The internal coordinate (INT) file
1822 ------------------------------------------
1825 This file contains the internal coordinates of the conformations produced
1826 by UNRES in non-MD runs. The virtual-bond lengths are assumed constant so
1827 only the angular variables are provided (see ref
1829 IT,ENER,NSS,(IHPB(I),JHPB(I),I=1,NSS)
1830 (I5,F12.5,I2,9(1X,2I3))
1832 IT - the number of the conformation
1834 NSS - the number of disulfide bridges
1835 (IHPB(I),JHPB(I),I=1,NSS) - the positions of the pairs of half-cystines
1836 forming the bridges. If NSS>9, the remaining pairs are written in the
1837 following lines in the (3X,11(1X,2I3)) format.
1842 The virtual-bond angles THETA (in degrees)
1847 The virtual-bond dihedral angles GAMMA (in degrees)
1849 (ALPH(I),I=2,NRES-1)
1850 (OMEG(I),I=2,NRES-1)
1853 The polar angles ALPHA and BETA of the side-chain centers (in degrees).
1855 9.1.2. The plain Cartesian coordinate (X) files (subroutine CARTOUT)
1856 --------------------------------------------------------------------
1858 This file contains the Cartesian coordinates of the alpha-carbon and
1859 side-chain-center coordinates. All conformations from an MD/MREMD
1860 trajectory are collated to a single file. The structure of each
1861 conformation's record is as follows:
1863 1st line: time,potE,uconst,t_bath,nss,(ihpb(j),jhpb(j),j=1,nss),
1864 nrestr,(qfrag(i),i=1,nfrag),(qpair(i),i=1,npair),
1865 (utheta(i),ugamma(i),uscdiff(i),i=1,nfrag_back)
1867 time: MD time (in "molecular time units"; 1 mtu = 4.89 fs),
1868 potE: potential energy,
1869 uconst: restraint energy corresponding to restraints on Q and backbone geometry,
1871 t_bath: thermostat temperature,
1872 nss: number of disulfide bonds,
1873 ihpb(j), jhpb(j): the numbers of linked cystines for jth disulfide bond,
1874 nrestr: number of restraints on q and local geometry,
1875 qfrag(i): q value for ith fragment,
1876 qpair(i): q value for ith pair,
1877 utheta(i): sum of squares of the differences between the theta angles
1878 of the current conformation from those of the experimental conformation,
1879 ugamma(i): sum of squares of the differences beaten the gamma angles
1880 of the current conformation from those of the experimental conformation,
1881 uscdiff(i): sum of squares of the differences between the Cartesian difference
1882 of the unit vector of the Calpha-SC axis of the current conformation from
1883 those of the experimental conformation.
1885 Next lines: Cartesian coordinates of the Calpha atoms (including dummy atoms)
1886 (sequentially, 10 coordinates per line)
1887 Next lines: Cartesian coordinates of the SC atoms (including glycines and
1888 dummy atoms) (sequentially, 10 coordinates per line)
1890 9.1.3. The compressed Cartesian coordinate (CX) files
1891 -----------------------------------------------------
1893 These files are compressed binary files (extension cx). For each conformation,
1894 the items are written in the same order as specified in section 9.1.2. For
1895 MREMD runs, if TRAJ1FILE is specified on MREMD record (see section 8.1.6),
1896 snapshots from all trajectories are written every time the coordinates
1897 are dumped. Thus, the file contains snapshot 1 from trajectory 1, ...,
1898 snapshot 1 from trajectory M, snapshot 2 from trajectory 1, ..., etc.
1900 The compressed cx files can be converted to pdb file by using the xdrf2pdb
1901 auxiliary program (single trajectory files) or xdrf2pdb-m program (multiple
1902 trajectory files from MREMD runs generated by using the TRAJ1FILE option).
1903 The multiple-trajectory cx files are also input files for the auxiliary
1906 9.1.4. The Brookhaven Protein Data Bank format (PDB) files (subroutine PDBOUT)
1907 ------------------------------------------------------------------------------
1909 These files are written in PDB standard (see. e.g.,
1910 ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_Letter.pdf).
1911 The REMARK, ATOM, SSBOND, HELIX, SHEET, CONECT, TER, and ENDMDL are used.
1912 The Calpha (marked CA) and SC (marked CB) coordinates are output. The CONECT
1913 records specify the Calpha...Calpha and Calpha...SC virtual bonds. Secondary
1914 structure is detected based on peptide-group contacts, as specified in
1915 ref 12. Dummy residues are omitted from the output. If the program has
1916 multiple-chain function, the presence of a dummy residue in a sequence
1917 starts a new chain, which is assigned the next alphabet letter as ID, and
1918 residue numbering is started over.
1920 9.1.5. The SYBYLL (MOL2) files
1921 ------------------------------
1923 See the description of mol2 format (e.g.,
1924 http://tripos.com/data/support/mol2.pdf). Similar remarks apply as for
1925 the PDB format (section 9.1.4).
1927 9.2. The summary (STAT) file
1928 ----------------------------
1933 This file contains a short summary of the quantities characterizing the
1934 conformations produced by UNRES/CSA. It is created for MULTCONF and MCM.
1936 NOUT,EVDW,EVDW2,EVDW1+EES,ECORR,EBE,ESCLOC,ETORS,ETOT,RMS,FRAC
1939 NOUT - the number of the conformations
1941 EVDW,EVDW2,EVDW1+EES,ECORR,EBE,ESCLOC,ETORS - energy components
1945 RMS - RMS deviation from the reference structure (if REFSTR was specified)
1947 FRAC - fraction of side chain - side chain contacts of the reference
1948 structure present in this conformation (if REFSTR was specified)
1950 9.2.2. MD and MREMD runs
1951 -------------------------
1953 Each line of the stat file generated by MD/MREMD runs contains the following
1956 step - the number of the MD step
1958 time - time [unit is MTU (molecular time unit) equal to 48.9 fs]
1960 Ekin - kinetic energy [kcal/mol]
1962 Epot - potential energy [kcal/mol]
1964 Etot - total energy (Ekin+Epot)
1966 H-H0 - the difference between the cureent and initial extended Hamiltionian
1967 in Nose-Hoover or Nose-Poincare runs; not present for other thermostats.
1969 RMSD - root mean square deviation from the reference structure (only in
1970 REFSTR has been specified)
1972 damax - maximum change of acceleration between two MD steps
1974 fracn - fraction of native side-chain concacts (very crude, based on
1975 SC-SC distance only)
1977 fracnn - fraction of non-native side-chain contacts
1981 temp - actual temperature [K]
1983 T0 - initial (microcanonical runs) or thermostat (other run types)
1986 Rgyr - radius of gyration based on Calpha coordinates [A]
1988 proc - in MREMD runs the number of the processor (the number of the
1989 trajectory less 1); not present for other runs.
1991 For an USAMPL run, the following items follow the above list:
1993 iset - the number of the restraint set
1995 uconst - restraint energy pertaining to q-values
1997 uconst_back - restraint energy pertaining to virtual-backbone restraints
1999 (qfrag(i),i=1,nfrag) - q values of the specified fragments
2001 (qpair(ii2),ii2=1,npair) - q values of the specified pairs of fragments
2003 (utheta(i),ugamma(i),uscdiff(i),i=1,nfrag_back) - virtual-backbone and
2004 side-chain-rotamer restraint energies of the fragments specified
2006 If PRINT_COMPON has been specified, the energy components are printed
2007 after the items described above.
2009 9.3. CSA-specific output files
2010 ------------------------------
2012 There are several output files from the CSA routine:
2013 INPUT.CSA.seed, INPUT.CSA.history, INPUT.CSA.bank, INPUT.CSA.bank1,
2014 INPUT.CSA.rbank INPUT.CSA.alpha, INPUT.CSA.alpha1.
2016 The most informative outfile is INPUT.CSA.history. This file first write down
2017 the parameters in INPUT.CSA.csa file. Later it shows the energies of random
2018 minimized conformations in it's generation. After sorting the First_bank
2019 in energy (ascending order), the energies of the First_bank is re-written here.
2020 After this the output looks like:
2021 1 0 100 6048.2 1 100-224.124-114.346 202607 100 100
2022 1 0 700 5882.6 2 29-235.019-203.556 1130308 100 100
2023 1 0 1300 5721.5 2 18-242.245-212.138 2028008 100 100
2024 1 0 1900 5564.8 13 54-245.185-218.087 2897988 98 100
2025 1 0 2500 5412.4 13 61-246.214-222.068 3706478 97 100
2026 1 0 3100 5264.2 13 89-248.715-224.939 4514196 96 100
2028 Each line is written between each iteration (just after selection
2029 of seed conformations) containing following data:
2030 jlee,icycle,nstep,cutdif,ibmin,ibmax,ebmin,ebmax,nft,iuse,nbank
2031 ibmin and ibmax lists the index of bank conformations corresponding to the
2032 lowest and highest energies with ebmin and ebmax.
2033 nft is the total number of function evaluations so far.
2034 iuse is the total number of conformations which have not been used as seeds
2035 prior to calling subroutine select_is which select seeds.
2037 Therefore, in the example shown above, one notes that so far 3100
2038 minimizations has been performed corresponding to the total of 4514196
2039 function evaluations. The lowest and highest energy in the Bank is
2040 -248.715 (#13) and -224.939 (#89), respectively. The number of conformations
2041 already used as seeds (not including those selected as seeds in this iteration)
2042 so far is 4 (100-96).
2044 The files INPUT.CSA.bank and INPUT.CSA.rbank contains data of Bank and
2045 First_bank. For more information on these look subroutines write_bank
2046 and write_rbank. The file INPUT.CSA.bank is overwritten between each
2047 iteration whereas Bank is accumulated in INPUT.CSA.bank1 (not for every
2048 iteration but as specified in the subroutine together.f).
2050 The file INPUT.CSA.seed lists the index of the seed conformations with their
2051 energies. Files INPUT.CSA.alpha, INPUT.CSA.alpha1 are written only once
2052 at the beginning of the CSA run. These files contain some arrays used
2055 10. TECHNICAL SUPPORT CONTACT INFORMATION
2056 -----------------------------------------
2059 Faculty of Chemistry, University of Gdansk
2060 ul. Sobieskiego 18, 80-952 Gdansk Poland.
2061 phone: +48 58 523 5430
2062 fax: +48 58 523 5472
2063 e-mail: adam@chem.univ.gda.pl
2065 Dr. Cezary Czaplewski
2066 Faculty of Chemistry, University of Gdansk
2067 ul. Sobieskiego 18, 80-952 Gdansk Poland.
2068 phone: +48 58 523 5430
2069 fax: +48 58 523 5472
2070 e-mail: czarek@chem.univ.gda.pl
2072 Dr. Stanislaw Oldziej
2073 Intercollegiate Faculty of Biotechnology
2074 University of Gdansk, Medical University of Gdansk
2075 ul. Kladki 22, 80-922 Gdansk, Poland
2076 phone: +48 58 523 5361
2077 fax: +48 58 523 5472
2078 e-mail: stan@biotech.ug.gda.pl
2081 Korea Institute for Advanced Study
2082 207-43 Cheongnyangni 2-dong, Dongdaemun-gu,
2083 Seoul 130-722, Korea
2084 phone: +82-2-958-3890
2086 email: jlee@kias.re.kr
2088 Prepared by Adam Liwo and Jooyoung Lee, 7/17/99
2089 Revised by Cezary Czaplewski 1/4/01
2090 Revised by Cezary Czaplewski and Adam Liwo 8/26/03
2091 Revised by Cezary Czaplewski and Adam Liwo 11/26/11
2092 Revised by Adam Liwo 02/19/12