X-Git-Url: http://mmka.chem.univ.gda.pl/gitweb/?p=unres.git;a=blobdiff_plain;f=doc%2F3.1%2Fascii-text%2FUNRES_all.TXT;fp=doc%2F3.1%2Fascii-text%2FUNRES_all.TXT;h=ff5d4af35ee91f8b6ffa31c9c4b5463fefb9b6e5;hp=0000000000000000000000000000000000000000;hb=b75425747e3e2b448ca5e0ef8367712e1f339124;hpb=ce0fdb6f6e62a69055ec0f737c339851ee4b43b4 diff --git a/doc/3.1/ascii-text/UNRES_all.TXT b/doc/3.1/ascii-text/UNRES_all.TXT new file mode 100644 index 0000000..ff5d4af --- /dev/null +++ b/doc/3.1/ascii-text/UNRES_all.TXT @@ -0,0 +1,2093 @@ + UNRES - A PROGRAM FOR COARSE-GRAINED SIMULATIONS OF PROTEINS + ------------------------------------------------------------ + +TABLE OF CONTENTS +----------------- + +1. License terms + +2. Credits + +3. General information + 3.1. Purpose + 3.2. Functions of the program + 3.2. Companion programs + 3.4. Programming language + 3.5. References + +4. Installation + +5. Customizing your batch and C-shell script + +6. Command line and files + +7. Force fields + +8. Input files + 8.1. Main input data file + 8.1.1. Title + 8.1.2. Control data (data list format; READ_CONTROL subroutine) + 8.1.2.1 Keywords to chose calculation type + 8.1.2.2 Specification of protein and structure output in non-MD + applications + 8.1.2.3. Miscellaneous + 8.1.3. Minimizer options (data list, subroutine READ_MINIM) + 8.1.4. CSA control parameters + 8.1.5. MCM data (data list, subroutine MCMREAD) + 8.1.6. MD data (subroutine READ_MDPAR) + 8.1.7. REMD/MREMD data (subroutine READ_REMDPAR) + 8.1.8. Energy-term weights (data list; subroutine MOLREAD) + 8.1.9. Input and/or reference PDB file name (text format; subroutine MOLREAD) + 8.1.10. Amino-acid sequence (free and text format) + 8.1.11. Disulfide-bridge information (free format; subroutine READ_BRIDGE) + 8.1.12. Dihedral-angle restraint data (free format; subroutine MOLREAD) + 8.1.13. Distance restraints (subroutine READ_DIST_CONSTR) + 8.1.14. Internal coordinates of the reference structure (free format; + subroutine READ_ANGLES) + 8.1.15. Internal coordinates of the initial conformation (free format; + subroutine READ_ANGLES) + 8.1.15.1. File name with internal coordinates of the conformations + to be processed + 8.1.16 Control data for energy map construction (data lists; + subroutine MAP_READ) + 8.2. Parameter files + 8.3. Input coordinate files + 8.4. Other input files + +9. Output files + 9.1. Coordinate files + 9.1.1. The internal coordinate (INT) files + 9.1.2. The plain Cartesian coordinate (X) files + 9.1.3. The compressed Cartesian coordinate (CX) files + 9.1.4. The Brookhaven Protein Data Bank format (PDB) files + 9.1.5. The SYBYLL (MOL2) files + 9.2. The summary (STAT) file + 9.2.1. Non-MD runs + 8.2.2. MD and MREMD runs + 9.3. CSA-specific output files + +10. Technical support contact information + +1. LICENSE TERMS +---------------- + +* This software is provided free of charge to academic users, subject to the + condition that no part of it be sold or used otherwise for commercial + purposes, including, but not limited to its incorporation into commercial + software packages, without written consent from the authors. For permission + contact Prof. H. A. Scheraga, Cornell University. + +* This software package is provided on an "as is" basis. We in no way warrant + either this software or results it may produce. + +* Reports or publications using this software package must contain an + acknowledgment to the authors and the NIH Resource in the form commonly used + in academic research. + +2. CREDITS +---------- + +The current and former developers of UNRES are listed in this section in alphabetic +order together with their current or former affiliations. + +Maurizio Chinchio (formerly Cornell Univ., USA) +Cezary Czaplewski (Univ. of Gdansk, Poland) +Carlo Guardiani (Georgia State Univ., USA) +Yi He (Cornell Univ., USA) +Justyna Iwaszkiewicz (Swiss Institute of Bioinformatics, Switzerland) +Dawid Jagiela (Univ. of Gdansk, Poland) +Stanislaw Jaworski (deceased) +Sebastian Kalinowski (Univ. of Gdansk, Poland) +Urszula Kozlowska (deceased) +Rajmund Kazmierkiewicz (Univ. of Gdansk, Poland) +Jooyoung Lee (Korea Institute for Advanced Studies, Korea) +Adam Liwo (Univ. of Gdansk, Poland) +Mariusz Makowski (Univ. of Gdansk, Poland) +Marian Nanias (formerly Cornell Univ., USA) +Stanislaw Oldziej (Univ. of Gdansk, Poland) +Jaroslaw Pillardy (Cornell Univ., USA) +Daniel Ripoll (formerly Cornell Univ., USA) +Jeff Saunders (Schrodinger Inc., USA) +Harold A. Scheraga (Cornell Univ., USA) +Hujun Shen (Dalian Institute of Chemical Physics, P.R. China) +Adam Sieradzan (Univ. of Gdansk, Poland) +Ryszard Wawak (formerly Cornell Univ., USA) +Bartlomiej Zaborowski (Univ. of Gdansk, Poland) + +3. GENERAL INFORMATION +---------------------- + +3.1. Purpose +------------ + +Run coarse-grained calculations of polypeptide chains with the UNRES force field. +There are two versions of the package which should be kept separate because of +non-overlapping functions: version which runs global optimization (Conformational +Space Annealing, CSA) and version that runs coarse-grained molecular dynamics and +its extension. Because the installation, input file preparation and running CSA +and MD versions are similar, a common manual is provided. Items specific +for the CSA and MD version are marked "CSA" and "MD", respectively. + +MD version can be used to run multiple-chain proteins (however, that version of +the code is a new release and might fail if yet un-checked functions are used). +The multi-chain CSA version for this purpose is another package (written largely in +C++). + +3.2. Functions of the program +----------------------------- + +1. Perform energy evaluation of a single or multiple conformations + (serial and parallel) (CSA and MD) + +2. Run canonical mesoscopic molecular dynamics (serial and parallel) (MD). + +3. Run replica exchange (REMD) and multiplexing replica exchange (MREMD) + dynamics (parallel only) (MD). + +4. Run multicanonical molecular dynamics (parallel only) (MD). + +5. Run energy minimization (serial and parallel) (CSA and MD). + +6. Run conformational space annealing (CSA search) (parallel only) (CSA). + +7. Run Monte Carlo plus Minimization (MCM) (parallel only) (CSA). + +8. Run conformational family Monte Carlo (CFMC) calculations (CSA). + +9. Thread the sequence against a database from the PDB and minimize energy of + each structure (CSA). + +Energy and force evaluation is parallelized in MD version. + +3.3. Companion programs +----------------------- + +The structures produced by UNRES can be used as inputs to the following programs provided +with this package or separately: + +xdrf2pdb - converts the compressed coordinate files from MD (but not MREMD)runs into + PDB format. + +xdrf2pdb-m - same for MREMD runs (multiple trajectory capacity). + +xdrf2x - converts the plain Cartesian coordinate files into PDB format. + +WHAM - processes the coordinate files from MREMD runs and computes temperature profiles + of ensemble averages and computes the probabilities of conformations at selected + temperatures; also prepares data for CLUSTER and ZSCORE. + +CLUSTER - does the cluster analysis of the conformations; for MREMD runs takes the + coordinate files from WHAM which contain information to compute probabilities + of conformations at any temperature. + +PHOENIX - conversion of UNRES conformations to all-atom conformations. + +ZSCORE - force field optimization (for developers). + +Please consult the manuals of the corresponding packages for details. Note that not +all of these packages are released yet; they will be released depending on their +readiness for distribution. Contact Adam Liwo, Cezary Czaplewski or Stanislaw Oldziej +for developmental versions of these programs. + +3.4. Programming language +------------------------- + +This version of UNRES is written almost exclusively in Fortran 77; some subroutines +for data management are in ansi-C. The package was parallelized with MPI. + +3.5. References +--------------- + +Citing the following references in your work that makes use of UNRES is gratefully +acknowledged: + +[1] A. Liwo, S. Oldziej, M.R. Pincus, R.J. Wawak, S. Rackovsky, H.A. Scheraga. + A united-residue force field for off-lattice protein-structure simulations. + I: Functional forms and parameters of long-range side-chain interaction potentials + from protein crystal data. J. Comput. Chem., 1997, 18, 849-873. + +[2] A. Liwo, M.R. Pincus, R.J. Wawak, S. Rackovsky, S. Oldziej, H.A. Scheraga. + A united-residue force field for off-lattice protein-structure simulations. + II: Parameterization of local interactions and determination + of the weights of energy terms by Z-score optimization. + J. Comput. Chem., 1997, 18, 874-887. + +[3] A. Liwo, R. Kazmierkiewicz, C. Czaplewski, M. Groth, S. Oldziej, R.J. Wawak, + S. Rackovsky, M.R. Pincus, H.A. Scheraga. + United-residue force field for off-lattice protein-structure simulations. + III. Origin of backbone hydrogen-bonding cooperativity in united-residue potentials. + J. Comput. Chem., 1998, 19, 259-276. + +[4] A. Liwo, C. Czaplewski, J. Pillardy, H.A. Scheraga. + Cumulant-based expressions for the multibody terms for the correlation between + local and electrostatic interactions in the united-residue force field. + J. Chem. Phys., 2001, 115, 2323-2347. + +[5] J. Lee, D.R. Ripoll, C. Czaplewski, J. Pillardy, W.J. Wedemeyer, H.A. Scheraga, + Optimization of parameters in macromolecular potential energy functions by + conformational space annealing. J. Phys. Chem. B, 2001, 105, 7291-7298 + +[6] J. Pillardy, C. Czaplewski, A. Liwo, W.J. Wedemeyer, J. Lee, D.R. Ripoll, + P. Arlukowicz, S. Oldziej, Y.A. Arnautova, H.A. Scheraga, + Development of physics-based energy functions that predict medium-resolution + structures for proteins of the alpha, beta, and alpha/beta structural classes. + J. Phys. Chem. B, 2001, 105, 7299-7311 + +[7] A. Liwo, P. Arlukowicz, C. Czaplewski, S. Oldziej, J. Pillardy, H.A. Scheraga. + A method for optimizing potential-energy functions by a hierarchical design + of the potential-energy landscape: Application to the UNRES force field. + Proc. Natl. Acad. Sci. U.S.A., 2002, 99, 1937-1942. + +[8] J. A. Saunders and H.A. Scheraga. + Ab initio structure prediction of two $\alpha$-helical oligomers + with a multiple-chain united-residue force field and global search. + Biopolymers, 2003, 68, 300-317. + +[9] J.A. Saunders and H.A. Scheraga. + Challenges in structure prediction of oligomeric proteins at the united-residue + level: searching the multiple-chain energy landscape with CSA and CFMC procedures. + Biopolymers, 2003, 68, 318-332. + +[10] S. Oldziej, U. Kozlowska, A. Liwo, H.A. Scheraga. + Determination of the potentials of mean force for rotation about Calpha-Calpha + virtual bonds in polypeptides from the ab initio energy surfaces of terminally + blocked glycine, alanine, and proline. J. Phys. Chem. A, 2003, 107, 8035-8046. + +[11] A. Liwo, S. Oldziej, C. Czaplewski, U. Kozlowska, H.A. Scheraga. + Parameterization of backbone-electrostatic and multibody contributions + to the UNRES force field for protein-structure prediction from ab initio + energy surfaces of model systems. J. Phys. A, 2004, 108, 9421-9438. + +[12] S. Oldziej, A. Liwo, C. Czaplewski, J. Pillardy, H.A. Scheraga. + Optimization of the UNRES force field by hierarchical design of the + potential-energy landscape. 2. Off-lattice tests of the method with single + proteins. J. Phys. Chem. B., 2004, 108, 16934-16949. + +[13] S. Oldziej, J. Lagiewka, A. Liwo, C. Czaplewski, M. Chinchio, + M. Nanias, H.A. Scheraga. + Optimization of the UNRES force field by hierarchical design of the + potential-energy landscape. 3. Use of many proteins in optimization. + J. Phys. Chem. B., 2004, 108, 16950-16959. + +[14] M. Khalili, A. Liwo, F. Rakowski, P. Grochowski, H.A. Scheraga. + Molecular dynamics with the united-residue model of polypeptide chains. + I. Lagrange equations of motion and tests of numerical stability in the + microcanonical mode, J. Phys. Chem. B, 2005, 109, 13785-13797. + +[15] M. Khalili, A. Liwo, A. Jagielska, H.A. Scheraga. + Molecular dynamics with the united-residue model of polypeptide chains. + II. Langevin and Berendsen-bath dynamics and tests on model $\alpha$-helical + systems. J. Phys. Chem. B, 2005, 109, 13798-13810. + +[16] A. Liwo, M. Khalili, H.A. Scheraga. + Ab initio simulations of protein-folding pathways by molecular dynamics with + the united-residue model of polypeptide chains. + Proc. Natl. Acad. Sci. U.S.A., 2005, 102, 2362-2367. + +[17] F. Rakowski, P. Grochowski, B. Lesyng, A. Liwo, H. A. Scheraga. + Implementation of a symplectic multiple-time-step molecular dynamics algorithm, + based on the united-residue mesoscopic potential energy function. + J. Chem. Phys., 2006, 125, 204107. + +[18] M. Nanias, C. Czaplewski, H.A. Scheraga. + Replica exchange and multicanonical algorithms with the coarse-grained + united-residue (UNRES) force field. + J. Chem. Theory and Comput., 2006, 2, 513-528. + +[19] A. Liwo, M. Khalili, C. Czaplewski, S. Kalinowski, S. Oldziej, K. Wachucik, + H.A. Scheraga. + Modification and optimization of the united-residue (UNRES) potential energy + function for canonical simulations. I. Temperature dependence of the effective + energy function and tests of the optimization method with single training + proteins. + J. Phys. Chem. B, 2007, 111, 260-285. + +[20] U. Kozlowska, A. Liwo, H.A. Scheraga. + Determination of virtual-bond-angle potentials of mean force for coarse-grained + simulations of protein structure and folding from ab initio energy surfaces of + terminally-blocked glycine, alanine, and proline. + J. Phys.: Condens. Matter, 2007, 19, 285203. + +[21] M. Chinchio, C. Czaplewski, A. Liwo, S. Oldziej, H.A. Scheraga. + Dynamic formation and breaking of disulfide bonds in molecular dynamics + simulations with the UNRES force field. + J. Chem. Theory and Comput., 2007, 3, 1236-1248. + +[22] A.V. Rojas, A. Liwo, H.A. Scheraga. + Molecular dynamics with the united-residue force field: Ab Initio folding + simulations of multichain proteins. + J. Phys. Chem. B, 2007, 111, 293-309. + +[23] A. Liwo, C. Czaplewski, S. Oldziej, A.V. Rojas, R. Kazmierkiewicz, + M. Makowski, R.K. Murarka, H.A. Scheraga. + Simulation of protein structure and dynamics with the coarse-grained UNRES + force field. In: Coarse-Graining of Condensed Phase and Biomolecular + Systems., ed. G. Voth, Taylor & Francis, 2008, Chapter 8, pp. 107-122. + +[24] C. Czaplewski, S. Kalinowski, A. Liwo, H.A. Scheraga. + Application of multiplexed replica exchange molecular dynamics + to the UNRES force field: tests with $\alpha$ and $\alpha+\beta$ proteins. + J. Chem. Theor. Comput., 2009, 5, 627-640. + +[24] Y. He, Y. Xiao, A. Liwo, H.A. Scheraga. + Exploring the parameter space of the coarse-grained UNRES force field by random + search: selecting a transferable medium-resolution force field. + J. Comput. Chem., 2009, 30, 2127-2135. + +[25] U. Kozlowska, A. Liwo. H.A. Scheraga. + Determination of side-chain-rotamer and side-chain and backbone + virtual-bond-stretching potentials of mean force from AM1 energy surfaces of + terminally-blocked amino-acid residues, for coarse-grained simulations of + protein structure and folding. 1. The Method. + J. Comput. Chem., 2010, 31, 1143-1153. + +[26] U. Kozlowska, G.G. Maisuradze, A. Liwo, H.A. Scheraga. + Determination of side-chain-rotamer and side-chain and backbone + virtual-bond-stretching potentials of mean force from AM1 energy surfaces of + terminally-blocked amino-acid residues, for coarse-grained simulations of + protein structure and folding. 2. Results, comparison with statistical + potentials, and implementation in the UNRES force field. + J. Comput. Chem., 2010, 31, 1154-1167. + +[27] A. Liwo, S. Oldziej, C. Czaplewski, D.S. Kleinerman, P. Blood, H.A. Scheraga. + Implementation of molecular dynamics and its extensions with the coarse-grained + UNRES force field on massively parallel systems; towards millisecond-scale + simulations of protein structure, dynamics, and thermodynamics. + J. Chem. Theor. Comput., 2010, 6, 890-909. + +4. INSTALLATION +--------------- + +The distribution is contained in the UNRES.tar.gz file. To uncompress say: + +gzip -cd UNRES.tar.gz | tar xf - + +This will produce a directory named UNRES with the following subdirectories: + +src_CSA - the CSA-version source directory. + +src_MD - the MD-version source directory, single chains. + +src_MD-M - the MD-version source directory, oligomeric proteins + +bin - the binaries/scripts directory; its BATCH_SCRIPTS directory contains the + batch scripts (at present the only example is for PBS: unres_3P_PBS.csh, + which is an UNRES calling script and start.mat, which is the batch script + submitted to the PBS system). + +doc - documentation (this file and EXAMPLES.TXT) + +examples - sample input files (see EXAMPLES.TXT for description). + +To produce the executable do the following: + +a) To build parallel version, make sure that MPI is installed in your system. + Note that the package will have limited functions when compiled in a single-CPU mode. + On linux cluster the command source $HOME/.env should be added to .tcshrc + or equivalent file to use parallel version of the program, the + alternative is to use queuing system like PBS. + In some cases the FORTRAN library subroutine GETENV does not work properly + with MPI, if the script is run interactively. In such a case try to + add the source mygentenv.F and turn on the -DMYGETENV preprocessor flag. + +b) Change directory to the respective source directory. + +c) Edit the appropriate Makefile (parallel program that includes CSA + procedure, the serial version is no longer supported, for serial task + parallel program can be run using only one processor) to customize to your + system. Makefiles for the following systems are provided: + + Makefile_osf_f90 - OSF1/Tru64 UNIX HP Alphaserver with f90 compiler, + Makefile_lnx_pgf90 - Linux, the pgf90 compiler, + Makefile_lnx_ifc - Linux, ifc compiler. + Makefile_win_pgf90 - Windows, the pgf90 compiler. + + Other systems should not cause problems; all you have to do is to change + the compiler, compiler options, and preprocessor options. Also, change the + BIN variable, if you want to put your binaries in other place than + PROTARCH/BIN. In the case of Makefile make sure that the MPI directories are + correctly specified. + + The following architectures are defined in the .F source files: + + AIX - AIX systems (put -DAIX as one of the preprocessor options, if + this is your system) + + LINUX - Linux (put -DLINUX) + + G77 - Gnu-Fortran compilers (might require sum moderate source code editing) + (put -DG77). The recommended compiler is gfortran and not g77. + + PGI - PGI compilers + + WINPGI - additional setting for PGI compilers for MS Windows + + SGI - all SGI platforms; should also be good for SUN platforms (put -DSGI) + + WIN - MS Windows with Digital Fortran compiler (put -DWIN) + + For other platforms, the only problems might appear in connection with + machine-specific I/O instructions. Many files are opened in the append + mode, whose specification in the OPEN statement is quite machine-dependent. + In this case you might need to modify the source code accordingly. + The other platform dependent routines are the timing routines contained + in timing.F. In addition to the platforms specified above, ES9000, SUN, + KSR, and CRAY are defined there. + + For parallel build -DMP and -DMPI must be set (these are set in Makefile). + + IMPORTANT! Apart from this, two define flags: -DCRYST_TOR and -DMOMENT + define earlier versions of the force field. The MUST NOT be entered, if + the CASP5 and later versions of the force field are used. + +d) Build the unres executables by typing at your UNIX prompt: + + make # will build unres + + make clean # will remove the object files + + The bin directory contains pre-built binaries for Red Hat Linux. These + executables are specified in the csh scripts listed in section 4. + +e) Customize the C-shell scripts unres.unres (to run the parallel version on + set of workstation). See the next section of this manual for guidance. + +After the executables are build and C-shell scripts customized, you can run the +test examples contained in UNRES/examples. + +5. CUSTOMIZING YOUR C-SHELL SCRIPT +---------------------------------- + +IMPORTANT NOTE - The unres.csh script is for Linux and should also be easily +adaptable to other systems running MPICH. This script is for interactive +parallel jobs. Examples of scripts compatible with PBS (pbs.sub) and LoadLever +(sp2.sub) queuing systems are also provided. + +Edit the following lines in your unres.csh script: + +set DD = your_database_directory + +e.g., if you installed the package on the directory /usr/local, this line +looks like this: + +set DD = /usr/local/UNRES/PARAM + +set BIN = your_binaries_directory + +set FGPROCS = number_of_processors_per_energy/force_evaluation (MD) + +e.g., if the root directory is as above: + +set BIN = /usr/local/UNRES/bin + +6. COMMAND LINE AND FILES +------------------------- + +To run UNRES interactively enter the following command at your Unix prompt +or put it in the batch script: + +unres.csh POTENTIAL INPUT N_PROCS + +where: + +POTENTIAL specifies the side-chain interaction potential type and must be +one of the following: + +LJ - 6-12 radial Lennard-Jones +LJK - 6-12 radial Lennard-Jones-Kihara (shifted Lennard Jones) +BP - 6-12 anisotropic Berne-Pechukas based on Gaussian overlap (dilated + Lennard-Jones) +GB - 6-12 anisotropic Gay-Berne (shifted Lennard-Jones) +GBV - 6-12 anisotropic Gay-Berne-Vorobjev (shifted Lennard-Jones) + +See section 4. (Force Fields) for explanation and usage. + +At present, only the LJ and GB potentials are applied. The LJ potential +is used in the "CASP3" version of the UNRES force field that is able +to predict only alpha-helical structures. All further version of the +UNRES force field use the GB potential. For the description of all above-mentioned +potentials see A. Liwo, St. Oldziej, M.R. Pincus, R.J. Wawak, S. Rackovsky, +H.A. Scheraga, J. Comput. Chem., 1997, 18, 849-873. + +INPUT is the prefix for input and output files (see below) + +N_PROCS is the number of processors; for a CSA or REMD/MREMD run it MUST be at least 2. + +Note! The script takes one more variable, FGPROCS, as the fourth argument, +which is the number of fine-grain processors to parallelize energy +evaluations. The corresponding code is in UNRES/CSA, but it was written +using MPL instead of MPI and therefore is never used in the present version. +At present we have no plans to rewrite fine-grain parallelization using MPI, +because we found that the scalability for up to 200 residue polypeptide +chains was very poor, due to a small number of interactions and, +correspondingly, unfavorable ratio of the overhead to the computation time. + +INPUT.inp contains the main input data and the control parameters of the CSA + method. + +INPUT.out_POTENTIAL_xxx - main output files from different processors; xxx + denotes the number of the processor + +INPUT_POTENTIALxxx.stat - summary files with the energies, energy components, + and RMS deviations of the conformations produced by each of the processors; + not used in CSA runs; also it outputs different quantity in MD/MREMD runs. + +CSA version specific files: + +INPUT_POTENTIALxxx.int - internal coordinates; in the CSA run + INPUT_POTENTIAL_000.int contains the coordinates of the conformations, + and the other files are empty + +INPUT.CSA.history - history file from a CSA run. This is an I/O file, because + it can be used to restart an interrupted CSA run. + +INPUT.CSA.seed - stores the random seed generated in a CSA run; written for + restart purposes. + +INPUT.CSA.bank - current bank of conformations obtained in CSA calculations + (expressed as internal coordinates). This information is also stored in + INPUT_POTENTIAL000.int + +INPUT.CSA.rbank - as above, but contains random-generated conformations. + +MD version specific files: + +INPUT_MDyyy.pdb - Cartesian coordinates of the conformations in PDB format. + +INPUT_MDyyy.x - Cartesian coordinates of the conformations in ASCII format. + +INPUT_MDyyy.cx - Cartesian coordinates of the conformations in compressed format + (need xdr2pdb to convert to PDB format). + +The program currently produces some more files, but they are not used +for any purposes and most of them are scratched after a run is completed. + +The run script also contains definitions of the parameter files through the +following environmental variables: + +SIDEPAR - parameters of the SC-SC interaction potentials (U_{SC SC}); +SCPPAR - parameters of the SC-p interaction potential (U_{SCp}); this file can + be ignored by specifying the -DOLDSCP preprocessor flag, which means that the + built-in parameters are used; at present they are the same as the parameters + in the file specified by SCPPAR; +ELEPAR - parameters of the p-p interaction potentials (U_{pp}); +FOURIER - parameters of the multibody potentials of the coupling between the + backbone-local and backbone-electrostatic interactions (U_{corr}); +THETPAR - parameters of the virtual-bond-angle bending potentials (U_b); +ROTPAR - parameters of the side-chain rotamer potentials (U_{rot}); +TORPAR - parameters of the torsional potentials (U_{rot}); +TORDPAR - parameters of the double-torsional potentials. +SCCORPAR - parameters of the supplementary torsional sequence-specific potentials + (not implemented yet). + +7. FORCE FIELDS +--------------- + +UNRES is being developed since 1997 and several versions of the force field +were produced. The settings and references to these force fields are +summarized below. + +Force fields for CSA version (can be used in MD but haven't been parameterized for this +purpose). + +--------------------------------------------------------------------------------------- + Additional SC-SC Example script Structural +Force field compiler flags potential and executables classes covered References + (Linux; PGF90 + and IFC) +--------------------------------------------------------------------------------------- + +CASP3 -DCRYST_TOR LJ unres_CASP3.csh only alpha [1-3] + -DCRYST_BOND unres_pgf90_cryst_tor.exe + -DCRYST_THETA unres_ifc6_cryst_tor.exe + -DCRYST_SC + -DMOMENT + +ALPHA -DMOMENT GB unres_CASP4.csh only alpha [4-6] + -DCRYST_BOND unres_pgf90_moment.exe + -DCRYST_THETA unres_ifc6_moment.exe + -DCRYST_SC + +BETA -DMOMENT GB unres_CASP4.csh only beta [4-6] + -DCRYST_BOND unres_pgf90_moment.exe + -DCRYST_THETA unres_ifc6_moment.exe + -DCRYST_SC + +ALPHABETA -DMOMENT GB unres_CASP4.csh all [4-6] + -DCRYST_BOND unres_pgf90_moment.exe + -DCRYST_THETA unres_ifc6_moment.exe + -DCRYST_SC + +CASP5 -DCRYST_BOND GB unres_CASP5.csh all [7,8,11] + -DCRYST_THETA unres_pgf90.exe + -DCRYST_SC unres_ifc6.exe + +3P -DCRYST_BOND GB unres_3P.csh all [12,13] + -DCRYST_THETA unres_pgf90.exe + -DCRYST_SC unres_ifc6.exe + +4P -DCRYST_BOND GB unees_4P.csh all [12,13] + -DCRYST_THETA unres_pgf90.exe + -DCRYST_SC unres_ifc6.exe +--------------------------------------------------------------------------------------- + +Force fields for MD version + +--------------------------------------------------------------------------------------- + Additional SC-SC Example script Structural +Force field compiler flags potential and executables classes covered References + (Linux; PGF90 + and IFC) +--------------------------------------------------------------------------------------- + +GAB -DCRYST_BOND GB unres_GAB.csh mostly alpha [19] + -DCRYST_THETA + -DCRYST_SC + +E0G -DCRYST_BOND GB unres_E0G.csh mostly alpha [19] + -DCRYST_THET + -DCRYST_SC + +1L2Y_1LE1 none GB unres_ab.csh all [20,25-27] + +--------------------------------------------------------------------------------------- + +The example scripts (the *.csh filed) contain all appropriate parameter files, while +the energy-term weights are provided in the example input files listed in EXAMPLES.TXT +(*.inp; see section 5. for description of the input files). However, it is user's +responsibility to specify appropriate compiler flags. Note that a version WILL NOT work, +if the force-field specific compiler flags are not set. The parameter files specified +in the run script also must strictly correspond to the energy-term weights specified in +the input file. The parameter files for specific force fields are also specified below +and the energy-term weights are specified in section 5. + +The parameter files are as follows (the environment variables from section 3 are +used to identify the parameters): + +CASP3: + +BONDPAR bond.parm +THETPAR thetaml.5parm +ROTPAR scgauss.parm +TORPAR torsion_cryst.parm +TORDPAR torsion_double_631Gdp.parm (not used) +SIDEPAR scinter_LJ.parm +ELEPAR electr.parm +SCPPAR scp.parm +FOURIER fourier_GAP.parm (not used) +SCCORPAR rotcorr_AM1.parm (not used) + +ALPHA, BETA, ALPHABETA (CASP4): + +BONDPAR bond.parm +THETPAR thetaml.5parm +ROTPAR scgauss.parm +TORPAR torsion_ecepp.parm +TORDPAR torsion_double_631Gdp.parm (not used) +SIDEPAR scinter_GB.parm +ELEPAR electr.parm +SCPPAR scp.parm +FOURIER fourier_GAP.parm +SCCORPAR rotcorr_AM1.parm (not used) + +CASP5: + +BONDPAR bond.parm +THETPAR thetaml.5parm +ROTPAR scgauss.parm +TORPAR torsion_631Gdp.parm +TORDPAR torsion_double_631Gdp.parm +SIDEPAR scinter_GB.parm +ELEPAR electr_631Gdp.parm +SCPPAR scp.parm +FOURIER fourier_opt.parm.1igd_iter7n_c +SCCORPAR rotcorr_AM1.parm (not used) + +3P: + +BONDPAR bond.parm +THETPAR thetaml.5parm +ROTPAR scgauss.parm +TORPAR torsion_631Gdp.parm +TORDPAR torsion_double_631Gdp.parm +SIDEPAR sc_GB_opt.3P7_iter81_1r +ELEPAR electr_631Gdp.parm +SCPPAR scp.parm +FOURIER fourier_opt.parm.1igd_hc_iter3_3 +SCCORPAR rotcorr_AM1.parm (not used) + +4P: + +BONDPAR bond.parm +THETPAR thetaml.5parm +ROTPAR scgauss.parm +TORPAR torsion_631Gdp.parm +TORDPAR torsion_double_631Gdp.parm +SIDEPAR sc_GB_opt.4P5_iter33_3r +ELEPAR electr_631Gdp.parm +SCPPAR scp.parm +FOURIER fourier_opt.parm.1igd_hc_iter3_3 +SCCORPAR rotcorr_AM1.parm (not used) + +GAB: + +BONDPAR bond.parm +THETPAR thetaml.5parm +ROTPAR scgauss.parm +TORPAR torsion_631Gdp.parm +TORDPAR torsion_double_631Gdp.parm +SIDEPAR sc_GB_opt.1gab_3S_qclass5no310-shan2-sc-16-10-8k +ELEPAR electr_631Gdp.parm +SCPPAR scp.parm +FOURIER fourier_opt.parm.1igd_hc_iter3_3 +SCCORPAR rotcorr_AM1.parm + +E0G: + +BONDPAR bond.parm +THETPAR thetaml.5parm +ROTPAR scgauss.parm +TORPAR torsion_631Gdp.parm +TORDPAR torsion_double_631Gdp.parm +SIDEPAR sc_GB_opt.1e0g-52-17k-2k-newclass-shan1e9_gap8g-sc +ELEPAR electr_631Gdp.parm +SCPPAR scp.parm +FOURIER fourier_opt.parm.1igd_hc_iter3_3 +SCCORPAR rotcorr_AM1.parm + +1L2Y_1LE1: + +BONDPAR bond_AM1.parm +THETPAR theta_abinitio.parm +ROTPAR rotamers_AM1_aura.10022007.parm +TORPAR torsion_631Gdp.parm +TORDPAR torsion_double_631Gdp.parm +SIDEPAR scinter_${POT}.parm +ELEPAR electr_631Gdp.parm +SCPPAR scp.parm +FOURIER fourier_opt.parm.1igd_hc_iter3_3 +SCCORPAR rotcorr_AM1.parm + +Additionally, for 1L2Y_1LE1, the following environment variables and files are required +to generate random conformations: + +THETPARPDB thetaml.5parm +ROTPARPDB scgauss.parm + +For CSA, the best force field is 4P. For MD, the 1L2Y_1LE1 force field is best for +ab initio prediction but provides medium resolution (5 A for 60-residue proteins) and +overemphasizes beta structures and has to be run with secondary-structure-prediction +information. For prediction of the structure of mostly alpha-protein, and for running +dynamics of large proteins, the best is the GAB force field. All these force fields +were trained by using our procedure of hierarchical optimization [5]. +The 4P and 1L2Y_1LE1 force fields have considerable power independent of structural class. +The ALPHA, BETA, and ALPHABETA force fields (for CSA) were used in the CASP4 exercises +and the CASP5 force field was used in the CASP5 exercise with some success; ALPHA +predicts reasonably the structure of alpha-helical proteins and is still not obsolete, +while for beta and alpha+beta structure prediction +3P or 4P should be used, because they are cheaper and more reliable than BETA and +ALPHABETA. The early CASP3 force field is included for historical reasons only. + +7. INPUT FILES +-------------- + +7.1. Main input data file +------------------------- + +Most of the data are organized as data lists, where the data can be put +in any order, using a series of statements of the form: + +KEYWORD=value + +for simple non-logical variables + +or just + +KEYWORD + +to indicate that the corresponding option is turned on. For array variables +the assignment statement is: + +KEYWORD=value1,value2,... + +However, the data lists are unnamed and that must be placed EXACTLY in the +order indicated below. The presence of an "&" in the 80th column of a line +indicates that the next line will belong to the same data group. The parser +subroutines that interpret the keywords are case insensitive. + +Each group of data organized as a data list is indicated as "data list format" +input. + +8.1.1. Title +------------ +Any string containing up to 80 characters. The first input line is always +interpreted as title. + +8.1.2. Control data (data list format; READ_CONTROL subroutine) +--------------------------------------------------------------- + +8.1.2.1 Keywords to chose calculation type +------------------------------------------ + +OUT1FILE - only the master processor prints the output file in a parallel job + +MINIMIZE - if present, energy minimization will be carried out. + +REGULAR - regularize the read in conformation (usually a crystal or + NMR structure) by doing a series of three constrained minimizations, + to keep the structure as close as possible to the starting + (experimental) structure. The constraints are the CA-CA distances + of the initial structure. The constraints are gradually diminished + and removed in the last minimization. + +SOFTREG - regularize the read in conformation (usually a crystal or NMR + structure) by doing a series of constrained minimizations, with + additional use of soft potential and secondary structure + freezing, to keep the structure as close as possible to the + starting (experimental) structure. + + +CSA - if present, the run is a CSA run. At present, this is the only + reliable mode of doing global conformational search with this + package; it is NOT recommended to use MCM or THREAD for this + purpose. + +MCMA - if present, this is a Monte Carlo Minimization (MCM) run. + +MULTCONF- if present, conformations will be read from the INPUT.intin + file. + +MD - run canonical MD (single or multiple trajectories) + +RE - run REMD or MREMD (parallel jobs only) + +MUCA - run multicanonical MD calculations (parallel jobs only) + +MAP=number (integer) +Conformational map will be calculated in chosen angles. + +THREAD=number (integer) +Threading or threading-with-minimization run, using a database of structures +contained in the $DD/patterns.cart pattern data base (502 chains or chain +fragments), using a total number patterns. It is recommended to use this with +energy minimization; this implies regularization of each minimized pattern. +For references see A. Liwo, M.R. Pincus, R.J. Wawak, +S. Rackovsky, St. Oldziej, H.A. Scheraga, J. Comput. Chem., 1997, 18, 874-887 +and A. Liwo, St. Oldziej, R. Kazmierkiewicz, M. Groth, C. Czaplewski, +Acta Biochim. Pol., 1997, 44, 527-547. + +CHECKGRAD - compare numerical and analytical gradient; to be followed by: + CART - energy gradient in virtual-bond vectors (Cartesian coordinates) + INT - energy gradient in internal coordinates (default) + CARINT - derivatives of the internal coordinates in the virtual-bond vectors. + +8.1.2.2 Specification of protein and structure output in non-MD applications +---------------------------------------------------------------------------- + +ONE_LETTER - one-letter and not three-letter code of the amino-acid residues + is used + +SYM (1) - number of chains with same sequence (for oligomeric proteins only), + +PDBSTART - the initial conformation is read in from a PDB file + +UNRES_PDB - the starting conformation is in UNRES representation (Calpha + and SC coordinates only). This keyword MUST appear in such a case + or the program will generate erroneous and unrealistic side-chain + coordinates. + +RAND_CONF- start from a random conformation + +EXTCONF - start from an extended chain conformation + +PDBOUT - if present, conformations will be output in PDB format. Note that + this keyword affects only the output from single energy evaluation, + energy minimization and multiple-conformation data. To request + conformations from MD/MREMD runs in PDB format, the MDPDB keyword + must be placed on the MD input record. + +MOL2OUT - if present, conformations will be output in SYBYL mol2 format + +REFSTR - if present, reference structure will be read (e.g., to monitor + the RMS deviation from the crystal structure) + +PDBREF - if present, a reference structure will be read in to compare + the calculated conformations with it + +UNRES_PBD - the starting/reference structure is read from an UNRES-generated + PDB file + +Keywords: PDBOUT, MOL2OUT, PDBREF, and PDBSTART are ignored for a CSA run. +Output mode for MD version is specified in MD input (see section 5.5). + +8.1.2.3. Miscellaneous +---------------------- + +CONSTR_DIST=number +0 - no distance restraints +>0 imposes harmonic restraints on selected distances; see section 5.12. +In MD version, also restraints on the q variable [18] can be used. + +WEIDIS=number (real) +the weight of the distance term; applies for REGULARIZE and THREAD, otherwise +ignored. + +USE_SEC_PRED - use secondary-structure prediction information. + +SEED=number (integer) (no default) +Random seed (required, even if the run is not a CSA, MCM, MD or MREMD run) + +PHI - only the virtual-bond dihedral angles gamma are considered as + variables in energy minimization + +BACK - only the backbone virtual angles (virtual-bond angles theta and + virtual-bond dihedral angles gamma) are considered as variables + in energy minimization + +By default, all internal coordinates: theta, gamma, and the side-chain +centroid polar angles alpha and beta are considered as variables in energy +minimization. + +RESCALE_MODE=number (real) +Choice of the type of temperature dependence of the force field. +0 - no temperature dependence +1 - homographic dependence (not implemented yet with any force field) +2 - hyperbolic tangent dependence [18]. + +T_BATH=number (real) +temperature (for MD runs and temperature-dependent force fields). + +The following keywords apply to MCM only: + +MAXGEN=number (integer) (10000) +maximum number of conformations generated in a single MCM iteration + +MAXOVERLAP=number (integer) (1000) +maximum number of conformations with "bad" overlaps allowed to appear in a +row in a single MCM iteration. + +DISTCHAINMAX - (multi-chain capacity only) maximum distance between the + last residue of a given chain and the first residue of the + next chain such that restraints will not be imposed; quartic + restraints will be imposed for greater distances. + +ENERGY_DEC - detailed energies will be printed for each interacting pair + or each virtual bond, virtual-bond angle and dihedral angle, + side chain, etc. DO NOT use unless a single energy evaluation + was requested. + +8.1.3. Minimizer options (data list, subroutine READ_MINIM) +----------------------------------------------------------- + +This data group is present, if MINIMIZE was specified on the control card. +Otherwise, it must not appear. + +CART - minimize in virtual-bond vectors instead of angles + +MAXMIN=number (integer) (2000) +maximum number of iterations of the SUMSL minimizer + +MAXFUN=number (integer) (5000) +maximum number of function evaluations in a single minimization + +TOLF=number (real) (1.0e-2) +Tolerance on function + +RTOLF=number (real) (1.0d-4) +Relative tolerance on function + +The SUMSL minimizer is used in UNRES/CSA. For detailed description of +the control parameters see the source file cored.f and sumsld.f + + +8.1.4 CSA control parameters +---------------------------- + +This data group should be present only, if CSA was specified on the control +card. It is recommended that the readers to read publications on CSA method +for more complete description of the parameters. Brief description of +parameters: + +NCONF=number (integer) (50) +This corresponds to the size of the bank at the beginning of the +CSA procedure. The size of the bank, nbank, is set to nconf. +If necessary (at much later stages of the CSA: see icmax below), +nbank increases by multiple of nconf. + +JSTART=number (integer) (1) +JEND=number (integer) (1) +This corresponds to the limit values of do loop, each of which +corresponds to an separate CSA run. If jstart=1, and jstart=100, +this routine will repeat 100 separate CSA runs (limited by CPU) +each one with separate random number initialization. +The only difference between two CSA runs (one with jstart=jend=1 +and another one with jstart=jend=2) would be different random +number initializations if other parameters are identical. + +NSTMAX=number (integer) (500000) +This is to set a limit the total number of local minimizations of CSA +before termination. + +N1=number (integer) (6) +N2=number (integer) (4) +N3=number (integer) (0) +N4=number (integer) (0) +N5=number (integer) (0) +N6=number (integer) (10) +N7=number (integer) (0) +N8=number (integer) (0) +N9=number (integer) (0) +IS1=number (integer) (1) +IS2=number (integer) (8) +These numbers are used to generate trial conformations for each seed. +See the file, "newconf.f", for more details. + n1: the total number of trial conformations for each seed by substituting + nran number of variable angles (see subroutine newconf1ab and + subroutine newconf1ar) + n2: the total number of trial conformations for each seed by substituting + nran number of groups of variable angles (see subroutine newconf1bb and + subroutine newconf1br) + n3: the total number of trial conformations for each seed by substituting + a window of residues which forms a beta-hairpin, if there is no enough + beta-hairpins uses the same algorithm as n6 + n4: the total number of trial conformations for each seed by shifting the + turn in beta-hairpin by +/- 1 or 2 residues, if there is no enough + beta-hairpins uses the same algorithm as n6 + n5: not used + n6: the total number of trial conformations for each seed by substituting + a window of residues [is1,is2] inclusive. The size of the window is + determined in a random fashion (see subroutine newconf_residue for + generation of the trial conformations) + n7: the total number of trial conformations for each seed by copying a + remote strand pair forming nonlocal beta-sheet contact + n8: the total number of trial conformations for each seed by copying an + alpha-helical segment + n9: the total number of trial conformations for each seed by shifting the + alpha-helical segment by +/- 1 or 2 residues + +Typical values used for a 75-residue helical protein is +(6 4 0 0 0 10 1 26) for (n1,n2,n3,n4,n5,n6,is1,is2), respectively. +In this example, a total of 20 trial conformations are generated for a seed +Usually is1=1 is used for all applications, and the value of is2 is set about +to 1/3 of the total number of residues. n3, n4 and n7 are design to help in +case of proteins with beta-sheets + +NRAN0=number (integer) (4) +NRAN1=number (integer) (2) +IRR=number (integer) (1) +These numbers are used to determine if the CSA stage is very early. +One can use (4 2 1) for these values. For more details one should look into +the file, "newconf.f", for more details. + +NTOTAL=number (integer) (10000) +CUT1=number (real) (2.0) +CUT2=number (real) (5.0) +Annealing schedule is set in following fashion. +The value of D_cut is reduced geometrically from 1/cut1 of D_ave (at the +beginning) to 1/cut2 of D_ave (after ntotal number of minimizations) where +D_ave is the average distance between two conformations in the First_bank. + +ESTOP=number (real) (-3000.0) +The CSA procedure stops if a conformations with energy lower than estop is +obtained. If the do-loop set by jstart and jend requires more than one loop, +the program will go on until the do-loop is finished. + +ICMAX=number (integer) (3) +The maximum value of cycle (see the original publications for details). +If the number of cycle exceeds this value the program will add nconf +more conformations to Bank and First_bank to continue CSA procedure if +the new size of the nbank is within the maximum set by nbankm (see above). +If the size of nbank exceeds the maximum set by nbankm the CSA procedure +for this run will stop and next CSA will begin depending on the do-loop +set by jstart and jend. + +IRESTART=number (integer) (0) +This tells you if the run is fresh start (irestart=0) or a restart (irestart=1) +starting from an old results + +NDIFF=number (integer) (2) +The number of variables use in comparison when structure is added to the +bank,4 - all angels, 2 - only backbone angles gamma and theta + +NBANKTM=number (integer) (0) +The maximum number of structures saved in *.CSA.bankt as history of the run +Do not use bankt on massively parallel computation as it kills scalability. + +DELE=number (real) (20.0) +Energy cutoff for bankt. + +DIFCUT=number (real) (720.0) +Angle cutoff for bankt. + +IREF=number (integer) (0) +0 - normal run, 1 - local CSA which generates only structures close to the +reference one read from *.CSA.native.int file + +RMSCUT=number (real) (4.0) +CA RMSD cut off used in local CSA + +PNCCUT=number (real) (0.5) +Percentage of native contact used in local CSA + +NCONF_IN=number (integer) (0) +The number of conformation read for the first bank from the input file +*.intin + +Optionally, the CSA parameters can be read from file INPUT.CSA.in, if +this file exists. If so, they are read in free format in the following +order: + +nconf +jstart,jend +nstmax +n1,n2,n3,n4,n5,n6,n7,n8,is1,is2 +nran0,nran1,irr +nseed +ntotal,cut1,cut2 +estop +icmax,irestart +ntbankm,dele,difcut +iref,rmscut,pnccut +ndiff + + +8.1.5. MCM data (data list, subroutine MCMREAD) +----------------------------------------------- + +This data group is present, if MCM was specified on the control card. +Otherwise it must not appear. + +MAXACC=number (integer) (100) +Maximum number of accepted conformations + +MAXTRIAL=number (integer) (100) +Maximum number of unsuccessful trials in a row + +MAXTRIAL_ITER=number (integer) (1000) +Maximum number of unsuccessful trials in a single iteration + +MAXREPM=number (integer) (200) +Maximum number of repetitions of the same minimum + +RANFRACT=number (real) (0.5d0) +Fraction of chain-rebuild motions + +OVERLAP=number (real) (1.0d3) +Bad contact energy criterion + +NSTEPH=number (integer) (0) +Number of heating step in adaptive sampling + +NSTEPC=number (integer) (0) +Number of cooling step in adaptive sampling + +TMIN=number (real) (298.0d0) +Minimum temperature in adaptive-temperature sampling) + +TMAX=number (real) (298.0d0) +Maximum temperature in adaptive-temperature sampling) + +The temperature is changed according to the formula: + +T = TMIN*EXP(ISTEPH*(TMAX-TMIN)/NSTEPH) when heating + +and + +T = TMAX*EXP(-ISTEPC*(TMAX-TMIN)/NSTEPC) when cooling + +The default is to use a constant temperature. + +NWINDOW=number (integer) (0) +Number of windows in which the variables will be perturbed; the windows are +defined by the numbers of the respective amino-acid residues. If NWINDOW +is nonzero, after specifying all MCM input the next lines must define the +windows. Each line looks like this: + +winstart winend (free format) + +e.g. if NWINDOW=2, the input: + +4 10 +15 20 + +will mean that only the variables of residues 4-10 and 15-20 will be perturbed. +However, in general, all variables will be considered in minimization. + +PRINT_MC=number (0) +Printout level in MCM. 0 - no intermediate printing, 1 and 2 - moderate +printing, 3 - extensive printing. + +NO_PRINT_STAT - no output to INPUT_POTENTIALxxx.stat. + +NO_PRINT_INT - no internal-coordinate output to INPUT_POTENTIALxxx.int. + +8.1.6. MD data (subroutine READ_MDPAR) +-------------------------------------- + +NSTEP (1000000) number of time steps per trajectory. + +NTWE (100) NTWX (1000) frequency of energy and coordinate output, respectively. +The coordinates are dumped in the pdb or compressed Gromacs (cx) format, +depending on the next keyword. +NTWE=0 means no energy dump. + +MDPDB - dump coordinates in the PDB format (cx otherwise) + +TRAJ1FILE only the master processor outputs coordinates. This feature pertains + only to REMD/MREMD jobs and overrides NTWX; coordinates are dumped at every + exchange in MREMD. + +REST1FILE only the master writes the restart file + +DT (real) (0.1) time step; the unit is "molecular time unit" (mtu); 1 mtu = 48.9 fs + +DAMAX (real) (1.0) maximum allowed change of acceleration during a single time step. +The time step gets scaled down, if this is exceeded. + +DVMAX (real) (20.0) maximum allowed velocity (in A/mtu) + +EDRIFTMAX (real) (10.0) maximum allowed energy drift in a single MD step (10 kcal/mol) + +REST restart flag. The calculation is restarted if present. + +LARGE very detailed output. Don't use except for debugging. + +PRINT_COMPON prints energy components. + +RESET_MOMENT (1000) frequency of zeroing out the total angular momentum when +running Berendsen mode calculations (for Langevin calculations meaningless). + +RESET_VEL=number (integer) (1000) - frequency of resetting velocities to values +from Gaussian distribution. + +RATTLE - use RATTLE algorithm (constraint bonds); not yet implemented. + +RESPA - use the Multiple Time Step (MTS) or Adaptive Multiple Time Step (A-MTS) +algorithm [17]. Without this flag the variable time step (VTS) [14] is run. + +NTIME_SPLIT=number (integer) (1) - initial number of time-split steps + +MAXTIME_SPLIT=number(integer) (64) - maximum number of time-split step + +If NTIME_SPLIT==MAXTIME_SPLIT, MTS is run. + +R_CUT=number (real) (2.0) - the cut-off distance in splitting the forces into short- and +long-range in site-site VDW distance units. + +LAMBDA (real) (0.3) - the transition length (in site-site VDW distance units) between +short- and long-range forces. + +XIRESP - flag to use MTS/A-MTS with Nose-Hoover/Nose-Poincare thermostats. + +LANG=number (integer) (0) Langevin dynamics flag: + +0 - No explicit Langevin dynamics. +1 - Langevin with direct integration of the equations of motion (recommended + for Langevin calculations) +2 - Langevin calculation with analytical pre-integration of the friction and + stochastic part of the equations of motion using an algorithm adapted from TINKER. + This is MUCH MORE time- and memory-consuming than 1 and requires compiling without + the -DLANG0 flag and enormously increases memory requirements. +3 - The stochastic integrator developed by Cicotti and coworkers. +4 - for other stochastic integrators (not used at present). + +Note: With the enclosed code, the -DLANG0 compiler flag is included which disables +LANG=2 and LANG=3 + +TBF Berendsen thermostat. + +TAU_BATH (1.0) (units are mtus; 1mtu=48.9 fs) constant of the coupling to the thermal bath + used with the Berendsen thermostat. + +NOSEPOINCARE99 - the Nose-Poincare thermostat as of 1999 will be used. + +NOSEPOINCARE01 - the Nose-Poincare thermostat as of 2001 will be used. + +NOSEHOOVER96 - the Nose-Hoover thermostat will be used. + +Q_NP=number (real) (0.1) - the value of the mass of the fictitious particle in the calculations + with the Nose-Poincare thermostat. + +T_BATH (300.0) (in K) temperature of canonical simulation or temperature to generate +velocities. + +ETAWAT (0.8904) viscosity of water (in centipoises) + +RWAT (1.4) radius of water molecule (in A) + +SCAL_FRIC=number (real) (0.02) - scaling factor of the friction coefficients. + +SURFAREA - scale friction acting on atoms by atoms' solvent accessible area. + +RESET_FRICMAT=number (integer) (1000) - recalculate friction matrix every RESET_FRICMAT MD steps. + +USAMPL restraints on q (see reference 5 for meaning) will be imposed (see section . +In this case, the next records specify the restraints; these records are +placed before the list of temperatures or numbers of trajectories. + +EQ_TIME=number (real) (1.0e4) time (in mtus; 1 mtu=48.9 fs) after which restraints +on q will start to be in force. + +If USAMPL has been specified, the following information must be supplied after the +main MD input data record (subroutine READ_FRAGMENTS): + +Line 1: nset, npair, nfrag_back (number of sets of restraints, number of restrained +fragments, number of restrained pairs, number of restrained backbone fragments +(in terms of theta and gamma angles) + +For each set of restraints (1, 2,..., nset): + +mset(iset) - how many times the set is multiplied + +wfrag(i,iset), ifrag(1,i,iset), ifrag2(2,i,iset),qfrag(i,iset) +weight of the restraint, first and last residue of the fragment, target q value. +This information is repeated through nfrag. + +wpair(i,iset), ipair(1,i,iset), ipair(2,i,iset),qinpair(i,iset) +weight of the restraint, first and second fragment of the pair (according to fragment +list), target q value. This information is repeated through npair + +wfrag_back(1,i,iset), wfrag_back(2,i,iset), wfrag_back(3,i,iset), +ifrag_back(1,i,iset),ifrag_back(2,i,iset) +weight of the restraints on theta angles, weight on the restraints on gamma angles, +weight of the restraints on side-chain rotamers, first residue of the fragment, +last residue of the fragment. This information is repeated through nfrag_back. + +8.1.7 REMD/MREMD data (subroutine READ_REMDPAR) +----------------------------------------------- + +NREP (3) number of replicas in a REMD/MREMD run + +NSTEX (1000) number of steps after which exchange is performed in REMD/MREMD + runs + +The temperatures in replicas can be specified through + +RETMIN (10.0) minimum temperature in a REMD/MREMD run + +RETMAX (1000.0) maximum temperature in a REMD/MREMD run + +Then the range from retmin to retmax is divided into equal segments and +temperature of the replicas assigned accordingly, + +or + +TLIST means that the NREP temperature of the replicas will be input in the +next record + +MLIST numbers of trajectories per each of the NREP temperatures will be +specified in the record after the list of temperatures; this specifies +a MREMD run. + +Important! The number of processors must be exactly equal to the number of +trajectories, i.e., NREP for a REMD run or sum_i mlist(i) for a MREMD run. + +SYNC - all trajectories will be synchronized every NSTEX time steps +(by default, they are not synchronized) + +TRAJ1FILE only the master processor outputs coordinates. This feature pertains + only to REMD/MREMD jobs and overrides NTWX; coordinates are dumped at every + exchange in MREMD. + +REST1FILE only the master writes the restart file + +HREMD - Hamiltonian replica exchange flag; not only temperatures but also +sets energy-term weights are exchanged between conformations. + +TONLY - run a "fake" HREMD with many sets of energy-term weights in a +single run but only temperature exchange. + +8.1.8 Energy-term weights (data list; subroutine MOLREAD) +--------------------------------------------------------- + +WLONG=number (real) (1.0d0) +common weight of the U(SC-SC) (side-chain side-chain interaction) +and U(SC,p) (side-chain peptide-group) term + +WSCC = number (real) (WLONG) +weight of the U(SC-SC) term + +WSCP = number (real) (WLONG) +weight of the U(SC-p) term + +WELEC=number (real) (1.0d0) +weight of the U(p-p) (peptide-group peptide-group interaction) term + +WEL_LOC=number (real) (1.0d0) +weight of the U_el_loc^3 (local-electrostatic cooperativity, third-order) term + +WCORRH=number (real) (1.0d0) +weight of the U(corr) (cooperativity of hydrogen-bonding interactions, fourth-order) term + +WCORR5=number (real) (0.0d0) +weight of the U_el_loc^5 (local-electrostatic cooperativity, 5th order +contributions) + +WCORR6=number (real) (0.0d0) +weight of the U_el_loc^6 (local-electrostatic cooperativity, 6th order +contributions) + +WTURN3=number (real) (1.0d0) +weight of the U_turn^3 (local-electrostatic cooperativity within 3 residue +segment, 3rd order contribution) + +WTURN4=number (real) (1.0d0) +weight of the U_turn^4 (local-electrostatic cooperativity within 4 residue +segment, 4rd order contributions) + +WTURN6=number (real) (1.0d0) +weight of the U_turn^6 (local-electrostatic cooperativity within 6 residue +segment, 6rd order contributions) + +WTOR=number (real) (1.0d0) +weight of the torsional term U(tor) + +WANG=number (real) (1.0d0) +weight of the virtual-bond angle bending term U(b) + +WSCLOC=number (real) (1.0d0) +weight of the side-chain rotamer term U(SC) + +WSTRAIN=number (real) (1.0d0) +scaling factor of the distance-constrain or disulfide-bond strain energy term + +SCALSCP=number (real) (1.0d0) +scaling factor of U(SC,p); this is an alternative to specifying WSCP; in +this case WSCP will be calculated as WLONG*SCALSCP + +SCAL14=number (real) (1.0d0) +scaling factor of the 1,4 SC-p interactions + +CUTOFF (7.0) - cut-off on backbone-electrostatic interactions to compute 4- +and higher-order correlations + +DELT_CORR (0.5) - thickness of the distance range in which the energy is +decreased to zero + +The defaults are NOT the recommended values. No "working" default values +have been set, because the force field is still under development. The values +corresponding to the force fields listed in section 4 are as follows: + +CASP3: +WELEC=1.5 WSTRAIN=1.0 WTOR=0.08617 WANG=0.10384 WSCLOC=0.10384 WCORR=1.5 & +WTURN3=0 WTURN4=0 WTURN6=0 WEL_LOC=0 WCORR5=0 WCORR6=0 SCAL14=0.40 SCALSCP=1.0 & +CUTOFF=7.00000 WSCCOR=0.0 + +ALPHA: +WSC=1.00000 WSCP=0.72364 WELEC=1.10890 WANG=0.68702 WSCLOC=1.79888 & +WTOR=0.30562 WCORRH=1.09616 WCORR5=0.17452 WCORR6=0.36878 WEL_LOC=0.19508 & +WTURN3=0.00000 WTURN4=0.55588 WTURN6=0.11539 CUTOFF=7.00000 WCORR4=0.0000 & +WTORD=0.0 WSCCOR=0.0 + +BETA: +WSC=1.00000 WSCP=1.10684 WELEC=0.70000 WANG=0.80775 WSCLOC=1.91939 & +WTOR=3.36070 WCORRH=2.50000 WCORR5=0.99949 WCORR6=0.46247 WEL_LOC=2.50000 & +WTURN3=1.80121 WTURN4=4.35377 WTURN6=0.10000 CUTOFF=7.00000 WCORR4=0.00000 & +WSCCOR=0.0 + +ALPHABETA: +WSC=1.00000 WSCP=1.43178 WELEC=0.41501 WANG=0.37790 WSCLOC=0.12880 & +WTOR=1.98784 WCORRH=2.50526 WCORR5=0.23873 WCORR6=0.76327 WEL_LOC=2.97687 & +WTURN3=0.09261 WTURN4=0.79171 WTURN6=0.01074 CUTOFF=7.00000 WCORR4=0.00000 & +WSCCOR=0.0 + +CASP5: +WSC=1.00000 WSCP=1.54864 WELEC=0.20016 WANG=1.00572 WSCLOC=0.06764 & +WTOR=1.70537 WTORD=1.24442 WCORRH=0.91583 WCORR5=0.00607 WCORR6=0.02316 & +WEL_LOC=1.51083 WTURN3=2.00764 WTURN4=0.05345 WTURN6=0.05282 WSCCOR=0.0 & +CUTOFF=7.00000 WCORR4=0.00000 WSCCOR=0.0 + +3P: +WSC=1.00000 WSCP=2.85111 WELEC=0.36281 WANG=3.95152 WSCLOC=0.15244 & +WTOR=3.00008 WTORD=2.89863 WCORRH=1.91423 WCORR5=0.00000 WCORR6=0.00000 & +WEL_LOC=1.72128 WTURN3=2.99827 WTURN4=0.59174 WTURN6=0.00000 & +CUTOFF=7.00000 WCORR4=0.00000 WSCCOR=0.0 + +4P: +WSC=1.00000 WSCP=2.73684 WELEC=0.06833 WANG=4.15526 WSCLOC=0.16761 & +WTOR=2.99546 WTORD=2.89720 WCORRH=1.98989 WCORR5=0.00000 WCORR6=0.00000 & +WEL_LOC=1.60072 WTURN3=2.36351 WTURN4=1.34051 WTURN6=0.00000 & +CUTOFF=7.00000 WCORR4=0.00000 WSCCOR=0.0 + +GAB: +WLONG=1.35279 WSCP=1.59304 WELEC=0.71534 WBOND=1.00000 WANG=1.13873 & +WSCLOC=0.16258 WTOR=1.98599 WTORD=1.57069 WCORRH=0.42887 WCORR5=0.00000 & +WCORR6=0.00000 WEL_LOC=0.16036 WTURN3=1.68722 WTURN4=0.66230 WTURN6=0.00000 & +WVDWPP=0.11371 WHPB=1.00000 & +CUTOFF=7.00000 WCORR4=0.00000 + +E0G: +WLONG=1.70905 WSCP=2.18310 WELEC=1.06684 WBOND=1.00000 WANG=1.17536 & +WSCLOC=0.22070 WTOR=2.65798 WTORD=2.00646 WCORRH=0.23541 WCORR5=0.00000 & +WCORR6=0.00000 WEL_LOC=0.42789 WTURN3=1.68126 WTURN4=0.75080 WTURN6=0.00000 & +WVDWPP=0.27044 WHPB=1.00000 WSCP14=0.00000 & +CUTOFF=7.00000 WCORR4=0.00000 + +1L2Y_1LE1: +WLONG=1.00000 WSCP=1.23315 WELEC=0.84476 WBOND=1.00000 WANG=0.62954 & +WSCLOC=0.10554 WTOR=1.84316 WTORD=1.26571 WCORRH=0.19212 WCORR5=0.00000 & +WCORR6=0.00000 WEL_LOC=0.37357 WTURN3=1.40323 WTURN4=0.64673 WTURN6=0.00000 & +WVDWPP=0.23173 WHPB=1.00000 WSCCOR=0.0 & +CUTOFF=7.00000 WCORR4=0.00000 + +8.1.9. Input and/or reference PDB file name (text format; subroutine MOLREAD) +----------------------------------------------------------------------------- + +If PDBSTART or PDBREF was specified in the control card, this line contains +the PDB file name. Trailing slashes to specify the full path are permitted. +The file name can contain up to 64 characters. + +8.1.10. Amino-acid sequence (free and text format) +-------------------------------------------------- + +This data appears, if PDBSTART was not specified, otherwise must not be present +because the sequence would be taken from the PDB file. The first line contains +the number of amino-acid residues, including the end groups (free format), +the next lines contain the sequence in 20(1X,A3) format for the three-letter +or 80A1 format for the one-letter code. There are two types of end-groups: +Gly (three-letter code) or G (one-letter code), if an end group contains a full +peptide bond (e.g., the acetyl N-terminal group or the carboxyamide C-terminal +group) and D (in the three-letter code) or X (in the one-letter code), if the +end group does not contain a peptide group (e.g., the NH2 N-terminal end group +or the COOH C-terminal end group). (Note the Gly or G also denotes the regular +glycine residue, if found in the middle of a chain). +In the second case the end group is considered as a "dummy" group and serves +only to define the first (last) virtual-bond dihedral angle gamma for the +first (last) full amino-acid residue. + +Consider, for example, the Ac-Ala(19)-NHMe polypeptide. The three-letter code +input will look like this: + +21 + Gly Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala + Gly + +And the one-letter code input will be: + +21 +GAAAAAAAAAAAAAAAAAAAG + +If the sequence is changed to NH3(+)-Ala(19)-COO(-), the inputs will look +like this: + +21 + D Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala + D + +and + +21 +XAAAAAAAAAAAAAAAAAAAX + +The sequence input is case-insensitive, because the present version of UNRES +considers each amino-acid residue as an L-residue (there are no torsional +parameters for the combinations of the D- and L-residues yet). Furthermore, +each peptide group is considered as a trans group. + +If the version of UNRES has multi-chain capacity, placing a dummy residue +inside the sequence indicates start of a new chain. For example, a system +composed of two Ala(10) chains can be specified as follows (3-letter code): + +23 + D Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala D Ala Ala Ala Ala Ala Ala Ala Ala + Ala Ala D + +or (1-letter code) + +23 +XAAAAAAAAAAXAAAAAAAAAAX + + +8.1.11. Disulfide-bridge information (free format; subroutine READ_BRIDGE) +-------------------------------------------------------------------------- + +1st line: +NS,(ISS(i),i=1,NS) + +NS - the number of half-cystines (required even if no half-cystines are present) + +ISS(i) - the position of ith half-cystine in the sequence (starting from the +N-terminal end group) + +next line(s) (present only, if ns>0 and must not appear otherwise): +NSS,(IHPB(i),JHPB(i),i=1,NSS) + +NSS - the number of disulfide bridges; must not be greater than NS/2 + +IHPB(i),JHPB(i) - the cystine residue forming the ith bridge. + +The program will check, whether the residues specified in the ISS list +are cystines and terminate with error, if any of them is not. The program +also checks, if the numbers from the IHPB and the JHPB lists have appeared +in the ISS list. + +8.1.12. Dihedral-angle restraint data (free format; subroutine MOLREAD) +----------------------------------------------------------------------- + +This set of data specifies the harmonic constraints (if any) imposed on selected +virtual-bond dihedral angles gamma. + +1st line: +NDIH_CONSTR - the number of restrained gamma angles (required even if no +restrains are applied). + +2nd line (present only, if NDIH_CONSTR > 0; must not appear otherwise): +FTORS - the force constant expressed in kcal/(mol*rad**2) + +next NDIH_CONSTR lines (present only, if NDIH_CONSTR > 0): + +IDIH_CONSTR(i),PHI0(i),DRANGE(i) + +IDIH_CONSTR(i) - the number of ith restrained gamma angle. The angles are +numbered after the LAST alpha-carbons. Thus, the first "real" angle has number +4 and it corresponds to the rotation about the CA(2)-CA(3) virtual-bond axis +and the last angle has the number NRES and corresponds to the rotation about +the CA(NRES-2)-CA(NRES-1) virtual-bond axis. + +PHI0(i) - the "center" of the restraint (expressed in degrees) + +DRANGE(i) - the "flat well" range of the restraint (in degrees) + +The restraint energy for the ith restrained angle is expressed as: + + / + | FTORS*(GAMMA(IDIH_CONSTR(i))-PHI0(i)+DRANGE(i))**2, + | if GAMMA(IDIH_CONSTR(i))PHI0(i)+DRANGE(i) + \ + +Applying dihedral-angle constraints also implies that for ith constrained +gamma angle the sampling be carried out from the +[PHI0(i)-DRANGE(i)..PHI0(i)+DRANGE(i)] interval and not from the [-Pi..Pi] +interval, if random conformations are generated. If only this and not +restrained minimization is required, just set FTORS to 0. + +8.1.13 Distance restraints (subroutine READ_DIST_CONSTR) +-------------------------------------------------------- + +Restraints are imposed on Calpha...Calpha distances. + +NDIST=number (integer) (0) - number of restraints on specific distances. + +NFRAG=number (integer) (0) - number of distance-restrained protein segments. + +NPAIR=number (integer) (0) - number of distance-restrained pairs of segments. + Specifying NPAIR requires specification of segments. + +IFRAG=start(1),end(1),start(2),end(2)...start(NFRAG),end(NFRAG) (integers) +First and last residues of the distance restrained segments. + +WFRAG=w(1),w(2),...,w(NFRAG) (reals) - force constants or bases for force +constant calculation corresponding to fragment restraints. + +IPAIR=start(1),end(1),start(2),end(2),...,start(NPAIR),end(NPAIR) (integers) +numbers of segments (consecutive numbers of start or end pairs in IFRAG +specification), the distances between which will be restrained. + +WPAIR=w(1),w(2),...,w(NFRAG) (reals) - force constants or bases for force +constant calculation corresponding to pair restraints. + +DIST_CUT=number (real) (5.0) - the cut-off distance in angstroms for force- +constant calculations. + +The force constants within fragments/between pairs of fragments are calculated +depending on the value of DIST_CONSTR described in section 5.1: + +1 - all force constants are equal to the respective entries of WFRAG/WPAIR + +2 - the force constants are equal to the respective entries of WFRAG/WPAIR + when the distance between the Calpha atoms in the reference structure + <=D_CUT, 0 otherwise. + +3 - the force constants are calculated from the formula: + +k(CA_j,CA_k)=W*exp{-[d(CA_j,CA_k)/DIST_CUT)]**2/2} + +where k(CA_j,CA_k) is the force constant between the respective Calpha atoms, +d(CA_j,CA_k) is the distance between these Calpha atoms in the reference +structure, and W is the basis for force-constant calculation (see above). + +If NDIST>0, the restraints on specific distance are subsequently input: + +ihpb(i), jhpb(i), forcon(i), i=1,NDIST + +where ihpb(i) and jhpb(i) are the numbers of the residues the distance +between the Calpha atoms of which will be distance restrained and forcon(i) +is the respective force constant. + +8.1.14 Internal coordinates of the reference structure (free format; +-------------------------------------------------------------------- + subroutine READ_ANGLES) + ----------------------- + +This part of the data is present, if REFSTR, but not PDBREF was specified, +otherwise must not appear. It contains the following group of variables: + +(THETA(i),i=3,NRES) - the virtual-bond valence angles THETA +(PHI(i),i=4,NRES) - the virtual-bond dihedral angles GAMMA +(ALPH(i),i=2,NRES-1)- the ALPHA polar angles of consecutive side chains +(OMEG(i),i=2,NRES-1)- the BETA polar angles of consecutive side chains. + +ALPHA(i) and OMEG(i) correspond to the side chain attached to CA(i). THETA(i) +is the CA(i-2)-CA(i-1)-CA(i) virtual-bond angle and PHI(i) is the +CA(i-3)-CA(i-2)-CA(i-1)-CA(i) virtual-bond dihedral angle gamma. + +8.1.15 Internal coordinates of the initial conformation (free format; +--------------------------------------------------------------------- + subroutine READ_ANGLES) + ----------------------- + +This part of the data is present, if RAND_CONF, MULTCONF, THREAD, or PDBSTART +were not specified, otherwise must not appear. This input is as in section 10. + +8.1.15.1 File name with internal coordinates of the conformations to be processed +--------------------------------------------------------------------------------- + (text format; subroutine MOLREAD) + --------------------------------- + +This data is present only, if MULTCONF was specified. It contains the name of +the file with the internal coordinates. Up to 64 characters are allowed. +The structure of the file is that of the *.int file produced by UNRES/CSA. +See section "The structure of the INT files" for details. + +8.1.16 Control data for energy map construction (data lists; subroutine MAP_READ) +--------------------------------------------------------------------------------- + +These data lists appear, if NMAP=n was specified, where n is the number of +variables that will be grid-searched. One list is per one variable or a +group of variables set equal (see below): + +PHI - the variable is a virtual-bond dihedral angle gamma +THE - the variable is a virtual-bond angle theta +ALP - the variable is a side-chain polar angle alpha +OME - the variable is a side-chain polar angle beta + +RES1=number (integer) +RES2=number (integer) + +The range of residues for which the values will be set; all these variables +will be set at the same value. It is required that RES2 > RES1. + +FROM=angle (real) +TO=angle (real) + +Lower and upper limit of scanning in grid search (in degrees) + +NSTEP=number (integer) + +Number of steps in scanning along this variable/group of variables. + +8.2. Input coordinate files +--------------------------- + +At present, geometry can be input either from the external files in the PDB +format (with the PDBSTART option) or multiple conformations can be read +as virtual-bond-valence and virtual-bond dihedral angles when the MULTCONF +option is used (the latter, however, implies using standard virtual-bond +lengths as initial values). The structure of internal-coordinate files +is the same as that of output internal-coordinate files described in section +9.1.1. + +8.3. Other input files +---------------------- + +CSA parameters can optionally be read in free format from file INPUT.CSA.in +(see section 8.1.4). When a CSA run is restarted, the CSA-specific output files +also serve as input files. INPUT is the prefix of input and output files +as explained in section 6. + +Restart files for MD and REMD simulations. They are read when the keyword +RESTART appears on the MD/REMD data group (section 8.1.6). + +8. OUTPUT FILES +--------------- + +UNRES "main" output files (INPUT.out_${POT}[processor]) are log files from +a run. They contain the information of the molecule, force field, calculation +type, control parameters, etc.; however, not the structures produced during +the run or their energies except single-point energy evaluation and +minimization-related runs. The structural information is included in +coordinate files (*.int, *.x, *.pdb, *.mol2, *.cx) and statistics files (*.stat), +respectively; these files are further processed by other programs (WHAM, +CLUSTER) or can be viewed by molecular viewers (pdb or mol2 files). + +9.1. Coordinate files +--------------------- + +9.1.1. The internal coordinate (INT) file +------------------------------------------ + + +This file contains the internal coordinates of the conformations produced +by UNRES in non-MD runs. The virtual-bond lengths are assumed constant so +only the angular variables are provided (see ref + +IT,ENER,NSS,(IHPB(I),JHPB(I),I=1,NSS) +(I5,F12.5,I2,9(1X,2I3)) + +IT - the number of the conformation +ENER - total energy +NSS - the number of disulfide bridges +(IHPB(I),JHPB(I),I=1,NSS) - the positions of the pairs of half-cystines +forming the bridges. If NSS>9, the remaining pairs are written in the +following lines in the (3X,11(1X,2I3)) format. + +(THETA(I),I=3,NRES) +(8F10.4) + +The virtual-bond angles THETA (in degrees) + +(PHI(I),I=4,NRES) +(8F10.4) + +The virtual-bond dihedral angles GAMMA (in degrees) + +(ALPH(I),I=2,NRES-1) +(OMEG(I),I=2,NRES-1) +(8F10.4) + +The polar angles ALPHA and BETA of the side-chain centers (in degrees). + +9.1.2. The plain Cartesian coordinate (X) files (subroutine CARTOUT) +-------------------------------------------------------------------- + +This file contains the Cartesian coordinates of the alpha-carbon and +side-chain-center coordinates. All conformations from an MD/MREMD +trajectory are collated to a single file. The structure of each +conformation's record is as follows: + +1st line: time,potE,uconst,t_bath,nss,(ihpb(j),jhpb(j),j=1,nss), +nrestr,(qfrag(i),i=1,nfrag),(qpair(i),i=1,npair), +(utheta(i),ugamma(i),uscdiff(i),i=1,nfrag_back) + +time: MD time (in "molecular time units"; 1 mtu = 4.89 fs), +potE: potential energy, +uconst: restraint energy corresponding to restraints on Q and backbone geometry, +(see section ??), +t_bath: thermostat temperature, +nss: number of disulfide bonds, +ihpb(j), jhpb(j): the numbers of linked cystines for jth disulfide bond, +nrestr: number of restraints on q and local geometry, +qfrag(i): q value for ith fragment, +qpair(i): q value for ith pair, +utheta(i): sum of squares of the differences between the theta angles + of the current conformation from those of the experimental conformation, +ugamma(i): sum of squares of the differences beaten the gamma angles + of the current conformation from those of the experimental conformation, +uscdiff(i): sum of squares of the differences between the Cartesian difference + of the unit vector of the Calpha-SC axis of the current conformation from + those of the experimental conformation. + +Next lines: Cartesian coordinates of the Calpha atoms (including dummy atoms) +(sequentially, 10 coordinates per line) +Next lines: Cartesian coordinates of the SC atoms (including glycines and +dummy atoms) (sequentially, 10 coordinates per line) + +9.1.3. The compressed Cartesian coordinate (CX) files +----------------------------------------------------- + +These files are compressed binary files (extension cx). For each conformation, +the items are written in the same order as specified in section 9.1.2. For +MREMD runs, if TRAJ1FILE is specified on MREMD record (see section 8.1.6), +snapshots from all trajectories are written every time the coordinates +are dumped. Thus, the file contains snapshot 1 from trajectory 1, ..., +snapshot 1 from trajectory M, snapshot 2 from trajectory 1, ..., etc. + +The compressed cx files can be converted to pdb file by using the xdrf2pdb +auxiliary program (single trajectory files) or xdrf2pdb-m program (multiple +trajectory files from MREMD runs generated by using the TRAJ1FILE option). +The multiple-trajectory cx files are also input files for the auxiliary +WHAM program. + +9.1.4. The Brookhaven Protein Data Bank format (PDB) files (subroutine PDBOUT) +------------------------------------------------------------------------------ + +These files are written in PDB standard (see. e.g., +ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_Letter.pdf). +The REMARK, ATOM, SSBOND, HELIX, SHEET, CONECT, TER, and ENDMDL are used. +The Calpha (marked CA) and SC (marked CB) coordinates are output. The CONECT +records specify the Calpha...Calpha and Calpha...SC virtual bonds. Secondary +structure is detected based on peptide-group contacts, as specified in +ref 12. Dummy residues are omitted from the output. If the program has +multiple-chain function, the presence of a dummy residue in a sequence +starts a new chain, which is assigned the next alphabet letter as ID, and +residue numbering is started over. + +9.1.5. The SYBYLL (MOL2) files +------------------------------ + +See the description of mol2 format (e.g., +http://tripos.com/data/support/mol2.pdf). Similar remarks apply as for +the PDB format (section 9.1.4). + +9.2. The summary (STAT) file +---------------------------- + +9.2.1. Non-MD runs +------------------ + +This file contains a short summary of the quantities characterizing the +conformations produced by UNRES/CSA. It is created for MULTCONF and MCM. + +NOUT,EVDW,EVDW2,EVDW1+EES,ECORR,EBE,ESCLOC,ETORS,ETOT,RMS,FRAC +(I5,9(1PE14.5)) + +NOUT - the number of the conformations + +EVDW,EVDW2,EVDW1+EES,ECORR,EBE,ESCLOC,ETORS - energy components + +ETOT - total energy + +RMS - RMS deviation from the reference structure (if REFSTR was specified) + +FRAC - fraction of side chain - side chain contacts of the reference + structure present in this conformation (if REFSTR was specified) + +9.2.2. MD and MREMD runs +------------------------- + +Each line of the stat file generated by MD/MREMD runs contains the following +items in sequence: + +step - the number of the MD step + +time - time [unit is MTU (molecular time unit) equal to 48.9 fs] + +Ekin - kinetic energy [kcal/mol] + +Epot - potential energy [kcal/mol] + +Etot - total energy (Ekin+Epot) + +H-H0 - the difference between the cureent and initial extended Hamiltionian + in Nose-Hoover or Nose-Poincare runs; not present for other thermostats. + +RMSD - root mean square deviation from the reference structure (only in + REFSTR has been specified) + +damax - maximum change of acceleration between two MD steps + +fracn - fraction of native side-chain concacts (very crude, based on + SC-SC distance only) + +fracnn - fraction of non-native side-chain contacts + +co - contact order + +temp - actual temperature [K] + +T0 - initial (microcanonical runs) or thermostat (other run types) + temperature [K] + +Rgyr - radius of gyration based on Calpha coordinates [A] + +proc - in MREMD runs the number of the processor (the number of the + trajectory less 1); not present for other runs. + +For an USAMPL run, the following items follow the above list: + +iset - the number of the restraint set + +uconst - restraint energy pertaining to q-values + +uconst_back - restraint energy pertaining to virtual-backbone restraints + +(qfrag(i),i=1,nfrag) - q values of the specified fragments + +(qpair(ii2),ii2=1,npair) - q values of the specified pairs of fragments + +(utheta(i),ugamma(i),uscdiff(i),i=1,nfrag_back) - virtual-backbone and + side-chain-rotamer restraint energies of the fragments specified + +If PRINT_COMPON has been specified, the energy components are printed +after the items described above. + +9.3. CSA-specific output files +------------------------------ + +There are several output files from the CSA routine: +INPUT.CSA.seed, INPUT.CSA.history, INPUT.CSA.bank, INPUT.CSA.bank1, +INPUT.CSA.rbank INPUT.CSA.alpha, INPUT.CSA.alpha1. + +The most informative outfile is INPUT.CSA.history. This file first write down +the parameters in INPUT.CSA.csa file. Later it shows the energies of random +minimized conformations in it's generation. After sorting the First_bank +in energy (ascending order), the energies of the First_bank is re-written here. +After this the output looks like: + 1 0 100 6048.2 1 100-224.124-114.346 202607 100 100 + 1 0 700 5882.6 2 29-235.019-203.556 1130308 100 100 + 1 0 1300 5721.5 2 18-242.245-212.138 2028008 100 100 + 1 0 1900 5564.8 13 54-245.185-218.087 2897988 98 100 + 1 0 2500 5412.4 13 61-246.214-222.068 3706478 97 100 + 1 0 3100 5264.2 13 89-248.715-224.939 4514196 96 100 + +Each line is written between each iteration (just after selection +of seed conformations) containing following data: +jlee,icycle,nstep,cutdif,ibmin,ibmax,ebmin,ebmax,nft,iuse,nbank +ibmin and ibmax lists the index of bank conformations corresponding to the +lowest and highest energies with ebmin and ebmax. +nft is the total number of function evaluations so far. +iuse is the total number of conformations which have not been used as seeds +prior to calling subroutine select_is which select seeds. + +Therefore, in the example shown above, one notes that so far 3100 +minimizations has been performed corresponding to the total of 4514196 +function evaluations. The lowest and highest energy in the Bank is +-248.715 (#13) and -224.939 (#89), respectively. The number of conformations +already used as seeds (not including those selected as seeds in this iteration) +so far is 4 (100-96). + +The files INPUT.CSA.bank and INPUT.CSA.rbank contains data of Bank and +First_bank. For more information on these look subroutines write_bank +and write_rbank. The file INPUT.CSA.bank is overwritten between each +iteration whereas Bank is accumulated in INPUT.CSA.bank1 (not for every +iteration but as specified in the subroutine together.f). + +The file INPUT.CSA.seed lists the index of the seed conformations with their +energies. Files INPUT.CSA.alpha, INPUT.CSA.alpha1 are written only once +at the beginning of the CSA run. These files contain some arrays used +in CSA procedure. + +10. TECHNICAL SUPPORT CONTACT INFORMATION +----------------------------------------- + + Dr. Adam Liwo + Faculty of Chemistry, University of Gdansk + ul. Sobieskiego 18, 80-952 Gdansk Poland. + phone: +48 58 523 5430 + fax: +48 58 523 5472 + e-mail: adam@chem.univ.gda.pl + + Dr. Cezary Czaplewski + Faculty of Chemistry, University of Gdansk + ul. Sobieskiego 18, 80-952 Gdansk Poland. + phone: +48 58 523 5430 + fax: +48 58 523 5472 + e-mail: czarek@chem.univ.gda.pl + + Dr. Stanislaw Oldziej + Intercollegiate Faculty of Biotechnology + University of Gdansk, Medical University of Gdansk + ul. Kladki 22, 80-922 Gdansk, Poland + phone: +48 58 523 5361 + fax: +48 58 523 5472 + e-mail: stan@biotech.ug.gda.pl + + Dr. Jooyoung Lee + Korea Institute for Advanced Study + 207-43 Cheongnyangni 2-dong, Dongdaemun-gu, + Seoul 130-722, Korea + phone: +82-2-958-3890 + fax: +82-2-958-3731 + email: jlee@kias.re.kr + +Prepared by Adam Liwo and Jooyoung Lee, 7/17/99 +Revised by Cezary Czaplewski 1/4/01 +Revised by Cezary Czaplewski and Adam Liwo 8/26/03 +Revised by Cezary Czaplewski and Adam Liwo 11/26/11 +Revised by Adam Liwo 02/19/12 +