1 WHAM (Weighted Histogram Analysis Method)
2 Processing results of UNRES/MREMD simulations
3 ---------------------------------------------
12 3. Functions of the program
16 5. Running the program
18 6. Input and output files
20 6.2. The main input file
22 6.2.2 Molecule and energy parameter data
23 6.2.2.1. General information
24 6.2.2.2. Sequence information
25 6.2.2.3. Dihedral angle restraint information
26 6.2.2.4. Disulfide-bridge data
27 6.2.3. Energy-term weights and parameter files
28 6.2.4. (M)REMD/Hamiltonian (M)REMD setting specification
29 6.2.5. Information of files from which to read conformations
30 6.2.6. Information of reference structure and comparing scheme
31 6.3. The structure of the main output file (out)
32 6.4. The thermodynamic quantity and ensemble average (stat) files
33 6.5. The conformation summary with classification (stat) files
34 6.6. The histogram files
35 6.7. The rmsd-radius of gyration potential of mean force files
37 6.8. The compresses Cartesian coordinates (cx) file.
44 * This software is provided free of charge to academic users, subject to the
45 condition that no part of it be sold or used otherwise for commercial
46 purposes, including, but not limited to its incorporation into commercial
47 software packages, without written consent from the authors. For permission
48 contact Prof. H. A. Scheraga, Cornell University.
50 * This software package is provided on an "as is" basis. We in no way warrant
51 either this software or results it may produce.
53 * Reports or publications using this software package must contain an
54 acknowledgment to the authors and the NIH Resource in the form commonly
61 [1] S. Kumar, D. Bouzida, R.H. Swendsen, P.A. Kollman, J.M. Rosenberg.
62 The weighted histogram analysis method for free-energy calculations
63 on biomolecules. I. The method.
64 J. Comput. Chem., 1992, 13, 1011-1021.
66 [2] A. Liwo, M. Khalili, C. Czaplewski, S. Kalinowski, S. Oldziej, K. Wachucik,
68 Modification and optimization of the united-residue (UNRES) potential
69 energy function for canonical simulations. I. Temperature dependence of the
70 effective energy function and tests of the optimization method with single
72 J. Phys. Chem. B, 2007, 111, 260-285.
74 [3] S. Oldziej, A. Liwo, C. Czaplewski, J. Pillardy, H.A. Scheraga.
75 Optimization of the UNRES force field by hierarchical design of the
76 potential-energy landscape. 2. Off-lattice tests of the method with single
77 proteins. J. Phys. Chem. B., 2004, 108, 16934-16949.
79 [4] S. Oldziej, A. Liwo, C. Czaplewski, J. Pillardy, H.A. Scheraga.
80 Optimization of the UNRES force field by hierarchical design of the
81 potential-energy landscape. 2. Off-lattice tests of the method with single
82 proteins. J. Phys. Chem. B., 2004, 108, 16934-16949.
84 3. FUNCTIONS OF THE PROGRAM
85 ---------------------------
87 The program processes the results of replica exchange (REMD) or multiplexed
88 replica exchange molecular dynamics (MREMD) simulations with UNRES to compute
89 the probabilities of the obtained conformations to occur at particular
90 temperatures. The program is based on the variant of the weighted histogram
91 analysis (WHAM) method [1] described in ref [2].
93 The program outputs the following information:
95 a) Temperature profiles of thermodynamic and structural ensemble-averaged
98 b) Histograms of native-likeness measure q (defined by eqs 8-11 of ref [2]).
100 c) Optionally the most probable conformations at REMD temperatures.
102 d) Optionally the coordinates with information to compute probabilities
103 for the conformations to occur at any temperature.
105 The program takes usually UNRES compressed coordinate files (cx files) from
106 MREMD obtained by using the TRAJ1FILE option. The user can request to
107 partition the whole run into equal slices (or windows), each starting from,
108 say, snapshot n (for each trajectory) and ending at snapshot n+1.
109 Alternatively, the UNRES Cartesian coordinate (x files) can be input; however,
110 they must contain only the analyzed portion of the trajectories; they
111 are usually prepared from single trajectories by using xdrf2x.
113 Two versions of the program are provided:
115 a) Canonical version which treats single polypeptide chains; the source code
116 is in WHAM/src directory.
118 b) Version for oligomeric proteins; multiple chains are handled by inserting
119 dummy residues in the sequence; the source code is in WHAM/src-M directory.
124 Customize Makefile to your system. See section 7 of the description of UNRES
125 for compiler flags that are used to created executables for a particular
126 force field. There are already several Makefiles prepared for various systems
129 Run make in the WHAM/src directory WHAM/src-M directory for multichain
130 version. Make sure that MPI is installed on your system; the present program
131 runs only in parallel mode.
133 5. RUNNING THE PROGRAM
134 ----------------------
136 The program requires a parallel system to run. Depending on system,
137 either the wham.csh C-shell script (in WHAM/bin directory) can be started
138 using mpirun or the binary in the C-shell script must be executed through
139 mpirun. See the wham.csh C-shell script and section 6 for the files
140 processed by the program.
142 6. INPUT AND OUTPUT FILES
143 -------------------------
145 6.1. SUMMARY OF THE FILES
146 -------------------------
148 The C-shell script wham.csh is used to run the program (see the WHAM/bin
149 directory). The data files that the script needs are mostly the same as
150 for UNRES (see section 6 of UNRES description). In addition, the environmental
151 variable CONTFUN specifies the method to assess whether two side chains
152 are at contact; if CONTFUN=GB, the criterion defined by eq 8 of ref 4 is
153 used to assess whether two side chains are at contact. Also, the parameter
154 files from the C-shell scripts are overridden if the data from Hamiltonian
155 MREMD are processed; if so, the parameter files are defined in the main
158 The main input file must have inp extension. If it is INPUT.inp, the output
159 files are as follows:
161 INPUT.out_POTxxx - output files from different processors (INPUT.out_000 is the
162 main output file). POT is the identifier of the sidechain-sidechain
165 INPUT_POT_GB_xxx.stat or INPUT_POT_slice_YYXXX.stat- the summary conformation-
166 classification file from processor xxx (each processor handles part of
167 conformations); the second occurs if the run is partitioned into slices.
169 INPUT.thermal or INPUT_slice_yy.thermal - thermodynamic functions and
170 temperature profiles of the ensemble averages (the second form if the
171 run is partitioned into slices).
173 INPUT_T_xxx.pdb or INPUT_slice_yy_T_xxx.pdb - top conformations the number
174 of these conformations is selected by the user) in PDB format.
176 INPUT.cx - the compressed UNRES coordinate file with information to compute
177 the probability of a given conformation at any temperature.
179 INPUT.hist INPUT_slice_xx.hist INPUT_par_yy.hist INPUT_par_yy_slice_zz.x
180 - histograms of q at MREMD temperatures.
182 INPUT.ent INPUT_slice_xx.ent INPUT_par_yy.ent INPUT_par_yy_slice_xx.ent
183 - the histogram(s) of energy density.
185 INPUT.rmsrgy INPUT_par_yy.rmsrgy INPUT_slice_xx.rmsrgy or
186 INPUT_par_yy_slice_xx.rmsrgy
187 - the 2D histogram(s) of rmsd from the experimental structure and radius
193 This file has the same structure as the UNRES input file; most of the data are
194 input in a keyword-based form (see section 7.1 of UNRES description). The data
195 are grouped into records, referred to as lines. Each record, except for the
196 records that are input in non-keyword based form, can be continued by placing
197 an ampersand (&) in column 80. Such a format is referred to as the data list
200 In the following description, the default values are given in parentheses.
202 6.2.1. General data (data list format)
203 --------------------------------------
205 N_ENE (N_ENE_MAX) - the number of energy components
207 SYM (1) - number of chains with same sequence (for oligomeric proteins only),
209 HAMIL_REP - if present, Hamiltonian process the results of replica exchange runs
210 (replicas with different parameters of the energy function)
212 NPARMSET (1) - number of energy parameter sets (>1 only for Hamiltonian
213 replica exchange simulations)
215 SEPARATE_PARSET - if present, HREMD was run in a mode such that only temperature
216 but not energy-function parameters was exchanged
218 IPARMPRINT (1) - number of parameter set with which to construct conformational
219 ensembles; important only when HREMD runs are processed
221 ENE_ONLY - if present, only conformational energies will be calculated and
222 printed; no WHAM iteration
224 EINICHECK (2) - > 0 compare the conformational energies against those stored in
225 the coordinate file(s); 1: compare but print only a warning message if
226 different; 2: compare and terminate the program if different; 0: don't
229 MAXIT (5000) - maximum number of iterations in solving WHAM equations
231 ISAMPL (1) - input conformation sampling frequency (e.g., if ISAMPL=5, only
232 each 5th conformation will be read)
234 NSLICE (1) - number of "slices" or "windows" into which each trajectory will
235 be partitioned; each slice will be analyzed independently
237 FIMIN (0.001) - maximum average difference between window free energies
238 between the current and the previous iteration
240 ENSEMBLES (0) - number of conformations (ranked according to probabilities) to
241 be output to PDB file at each MREMD temperature; 0 means that no
242 conformations will be output. Non-zero values should not be used when NSLICE>1
244 CLASSIFY - if present, each conformation will be assigned a class, according
245 to the scheme described in ref [3]
247 DELTA (0.01) - one dimension bin size of the histogram in q
249 DELTRMS (0.05) - rms dimension bin size in rms-radius of gyration histograms
251 DELTRGY (0.05) - radius of gyration bin size in rms-radius of gyration histograms
253 NQ (1) - number of q's (can be for entire molecule, fragments, and pairs of
256 CXFILE - produce the compressed coordinate file with information necessary to
257 compute the probabilities of conformations at any temperature
259 HISTOUT - if present, the histograms of q at MREMD temperatures are
260 constructed and printed to main output file
262 HISTFILE - if present, the histograms are also printed to separate files
264 ENTFILE - if present, histogram of density of states (entropy) is constructed
267 RMSRGYMAP - if present, 2D histograms of radius of rmsd and radius of gyration at MREMD
268 temperatures are constructed and printed
270 WITH_DIHED_CONSTR - if present, dihedral-angle restraints were imposed in the
271 processed MREMD simulations
273 RESCALE (1) - Choice of the type of temperature dependence of the force field.
274 0 - no temperature dependence
275 1 - homographic dependence (not implemented yet with any force field)
276 2 - hyperbolic tangent dependence [18].
278 6.2.2 Molecule and energy parameter data
279 ----------------------------------------
281 6.2.2.1. General information
282 ----------------------------
284 SCAL14 (0.4) - scale factor of backbone-electrostatic 1,4-interactions
286 SCALSCP (1.0) - scale factor of SC-p interactions
288 CUTOFF (7.0) - cut-off on backbone-electrostatic interactions to compute 4-
289 and higher-order correlations
291 DELT_CORR (0.5) - thickness of the distance range in which the energy is
294 ONE_LETTER - if present, the sequence is to be read in 1-letter code,
295 otherwise 3-letter code
297 6.2.2.2. Sequence information
298 -----------------------------
300 1st record (keyword-based input):
302 NRES - number of residues, including the UNRES dummy terminal residues, if present
304 Next records: amino-acid sequence
306 3-letter code: Sequence is input in format 20(1X,A3)
308 1-letter code: Sequence is input in format 80A1
310 6.2.2.3. Dihedral angle restraint information
311 ---------------------------------------------
313 This is the information about dihedral-angle restraints, if any are present.
314 It is specified only when WITH_DIHED_CONSTR is present in the first record.
316 1st line: ndih_constr - number of restraints (free format)
318 2nd line: ftors - force constant (free format)
320 Each of the following ndih_constr lines:
322 idih_constr(i),phi0(i),drange(i) (free format)
324 idih_constr(i) - the number of the dihedral angle gamma corresponding to the
327 phi0(i) - center of dihedral-angle restraint
329 drange(i) - range of flat well (no restraints for phi0(i) +/- drange(i))
331 6.2.2.4. Disulfide-bridge data
332 ------------------------------
334 1st line: NS, (ISS(I),I=1,NS) (free format)
336 NS - number of cystine residues forming disulfide bridges
338 ISS(I) - the number of the Ith disulfide-bonding cystine in the sequence
340 2nd line: NSS, (IHPB(I),JHPB(I),I=1,NSS) (free format)
342 NSS - number of disulfide bridges
344 IHPB(I),JHPB(I) - the first and the second residue of ith disulfide link
346 Because the input is in free format, each line can be split
348 6.2.3. Energy-term weights and parameter files
349 ----------------------------------------------
351 There are NPARMSET records specified below.
353 All items described in this section are input in keyword-based mode.
355 1st record: Weights for the following energy terms:
357 WSC (1.0) - side-chain-side-chain interaction energy
359 WSCP (1.0) - side chain-peptide group interaction energy
361 WELEC (1.0) - peptide-group-peptide group interaction energy
363 WEL_LOC (1.0)- third-order backbone-local correlation energy
365 WCORR (1.0) - fourth-order backbone-local correlation energy
367 WCORR5 (1.0) - fifth-order backbone-local correlation energy
369 WCORR6 (1.0) - sixth-order backbone-local correlation energy
371 WTURN3 (1.0) - third-order backbone-local correlation energy of pairs of
372 peptide groups separated by a single peptide group
374 WTURN4 (1.0) - fourth-order backbone-local correlation energy of pairs of
375 peptide groups separated by two peptide groups
377 WTURN6 (1.0) - sixth-order backbone-local correlation energy for pairs of
378 peptide groups separated by four peptide groups
380 WBOND (1.0) - virtual-bond-stretching energy
382 WANG (1.0) - virtual-bond-angle-bending energy
384 WTOR (1.0) - virtual-bond-torsional energy
386 WTORD (1.0) - virtual-bond-double-torsional energy
388 WSCCOR (1.0) - sequence-specific virtual-bond-torsional energy
390 WDIHC (0.0) - dihedral-angle-restraint energy
392 WHPB (1.0) - distance-restraint energy
394 2nd record: Parameter files. If filename is not specified that corresponds to
395 particular parameters, the respective name from the C-shell script will be
396 assigned. If no files are to be specified, an empty line must be inserted.
398 BONDPAR - bond-stretching parameters
400 THETPAR - backbone virtual-bond-angle-bending parameters
402 ROTPAR - side-chain-rotamer parameters
404 TORPAR - backbone-torsional parameters
406 TORDPAR - backbone-double-torsional parameters
408 FOURIER - backbone-local - backbone-electrostatic correlation parameters
410 SCCORAR - sequence-specific backbone-torsional parameters (not used at
413 SIDEPAR - side-chain-side-chain-interaction parameters
415 ELEPAR - backbone-electrostatic-interaction parameters
417 SCPPAR - backbone-side-chain-interaction parameters
419 6.2.4. (M)REMD/Hamiltonian (M)REMD setting specification
420 --------------------------------------------------------
422 If HAMIL_REP is present in general data, read the following group of records
423 only once; otherwise, read for each parameter set (NPARSET times total)
425 NT (1) - number of temperatures
427 REPLICA - if present, replicas in temperatures were specified with this parameter set
429 UMBRELLA - if present, umbrella-sampling was run with this parameter set
431 READ_ISET - if present, umbrella-sampling-window number is read from the compressed Cartesian
432 coordinate (cx) file even if the data are not from umbrella-sampling run(s).
433 ISET is present in the cx files from the present version of UNRES.
435 Following NT records are for consecutive temperature replicas; each record is
436 organized as keyword-based input:
438 TEMP (298.0) - initial temperature of this replica (replicas in MREMD)
440 FI (0.0) - initial values of the dimensionless free energies for all q-restraint
441 windows for this replica (NR values)
443 KH (100.0) - force constants of q restraints (NR values)
445 Q0 (0.0d0) - q-restraint centers (NR values)
447 6.2.5. Information of files from which to read conformations
448 ------------------------------------------------------------
450 If HAMIL_REP is present in general data, read the following two records
451 only once; otherwise, read for each parameter set (NPARSET times total)
453 1st record (keyword-based input):
455 For temperature replica only ONE record is read; for non-(M)REMD runs, NT
456 records must be supplied. The records are in keyword-based format.
458 NFILE_ASC - number of files in ASCII format (UNRES Cartesian coordinate (x)
459 files) for current parameter set
461 NFILE_CX - number of compressed coordinate files (cx files) for current
464 NFILE_BIN - number of binary coordinate files (now obsolete because it
465 requires initial conversion of ASCII format trajectories into binary format)
467 It is strongly recommended to use cx files from (M)REMD runs with TRAJ1FILE
468 option. Multitude of trajectory files which are opened and closed by different
469 processors might impair file system accessibility. Should you wish to process
470 trajectories each one of which is stored in a separate file, better collate
471 the required slices of them first to an x file by using the xdrf2x program
472 piped to the UNIX cat command.
476 coordinate file name(s) without extension
478 6.2.6. Information of reference structure and comparing scheme
479 -----------------------------------------------------------------
481 The following records pertain to setting up the classification of conformation
482 aimed ultimately at obtaining a class numbers. Fragments and pairs of
483 fragments are specified and compared against those of reference structure in
484 terms of secondary structure, number of contacts, rmsd, virtual-bond-valence
485 and dihedral angles, etc. Then the class number is constructed as described in
486 ref 3. A brief description of comparison procedure is as follows:
488 1. Elementary fragments usually corresponding to elements of secondary
489 or supersecondary structure are selected. Based on division into fragments,
490 levels of structural hierarchy are defined.
492 2. At level 1, each fragment is checked for agreement with the corresponding
493 fragment in the native structure. Comparison is carried out at two levels:
494 the secondary structure agreement and the contact-pattern agreement level.
496 At the secondary structure level the secondary structure (helix, strand
497 or undefined) in the fragment is compared with that in the native fragment
498 in a residue-wise manner. Score 0 is assigned if the structure is different
499 in more than 1/3 of the fragment, 1 is assigned otherwise.
501 The contact-pattern agreement level compares the contacts between the peptide
502 groups of the backbone of the fragment and the native fragment and also
503 compares their virtual-bond dihedral angles gamma. It is allowed to shift
504 the sequence by up to 3 residues to obtain contact pattern match. A score
505 of 0 is assigned if more than 1/3 of native contacts do not occur or
506 there is more than 60 deg (usually, but this cutoff can be changed) maximum
507 difference in gamma. Otherwise score 1 is assigned.
509 The total score of a fragment is an octal number consisting of bits
510 hereafter referred to S (secondary structure) C (contact match) and H
511 (sHift) (they are in the order HCS). Their values are as follows:
513 S - 1 native secondary structure; 0 otherwise;
514 C - 1 native contact pattern; 0 otherwise;
515 H - 1 contact match obtained without sequence shift 0 otherwise.
517 For example, octal 7 (111) corresponds to native secondary structure, native
518 contact pattern, and no need to shift the sequence for contact match;
519 octal 1 (001) corresponds to native secondary structure only (i.e., nonnative
522 3. At level 2, contacts between (i) the peptide groups or (ii) the side chains
523 within pairs of fragments are compared. Case (i) holds when we seek contacts
524 between the strands of a larger beta-sheet formed by two fragments, case (ii)
525 when we seek the interhelix or helix-beta sheet contacts. Additionally,
526 the pairs of fragments are compared with their native counterparts by rmsd.
527 Score 0 is assigned to a pair of fragments, if it has less than 2/3 native
528 contacts and too large rmsd (a cut-off of 0.1 A/residue is set), score 1 if
529 it has enough native contacts and sufficiently low rmsd, but the sequence
530 has to be shifted to obtain a match, and score 2, if sufficient match is
531 obtained without shift.
533 4. At level 3 and higher, triads, quadruplets,..., etc. of fragments are
534 compared in terms of rmsd from their native counterparts (the last level
535 corresponds to comparing whole molecules). The score (0, 1, or 2) is assigned
536 to each composite fragment as in the case of level 2.
538 5. The TOTAL class number of a structure is a binary number composed of
539 parts of scores of fragments, fragment pairs, etc. It is illustrated
540 on the following example; it is assumed that the molecule has three fragment
541 as in the case of 1igd.
543 level 1 level 2 level 3
544 123 123 123||1-2 1-3 2-3 1-2 1-3 2-3 || 1-2-3 | 1-2-3 ||
545 sss|ccc|hhh|| c c c | h h h || r | h ||
547 Bits s, c, and h of level 1 are explained in point 2; bits c and h of level
548 2 pertain to contact-pattern match and shift; bits r and h of level 3 pertain
549 to rmsd match and shift for level 3.
551 The input is specified as follows:
554 Program to classify structures
556 1st record (keyword-based input):
558 VERBOSE : if present, detailed output in classification (use if you want to
561 PDBREF : if present, the reference structure is read from the pdb
563 BINARY : if present, the class will be output in octal/quaternary/binary format
564 for levels 1, 2, and 3, respectively
566 DONT_MERGE_HELICES : if present, the pieces of helices that contain only
567 small breaks of hydrogen-bonding contacts (e.g., a kink) are not merged
570 NLEVEL=n : number of classification levels
572 n>0 - the fragments for n levels will be defined manually
573 n<0 - the number of levels is -n and the fragments will be detected automatically
575 START=n : the number of conformation at which to start
577 END=n : the number of conformation at which to end
579 FREQ=n (1) : sampling frequency of conformations; e.g. FREQ=2 means that every
580 second conformation will be considered
582 CUTOFF_UP=x : upper boundary of rmsd cutoff (the value is per 50 residues)
584 CUTOFF_LOW=x : lower boundary of rmsd cutoff (per 50 residues)
586 RMSUP_LIM=x : lower absolute boundary of rmsd cutoff (regardless of fragment
589 RMSUPUP_LIM=x : upper absolute boundary of rmsd cutoff (regardless of fragment
592 FRAC_SEC=x (0.66666) the fraction of native secondary structure
593 to consider a fragment native in secondary structure
597 For nlevel < 0 (automatic fragment assignment):
599 SPLIT_BET=n (0) : if 1, the hairpins are split into strands and strands are
600 considered elementary fragments
602 ANGCUT_HEL=x (50): cutoff on gamma angle differences from the native for a helical
605 MAXANG_HEL=x (60) : as above but maximum cutoff
607 ANGCUT_BET=x (90), MAXANG_BET=x (360), ANGCUT_STRAND=xi (90), MAXANG_STRAND=x (360)
608 same but for a hairpin or sheet fragment.
610 FRAC_MIN=x (0.6666) : minimum fraction of native secondary structure
612 NC_FRAC_HEL=x (0.5) : fraction of native contacts for a helical fragment
614 NC_REQ_HEL=x (0) : minimum required number of contacts
616 NC_FRAC_BET=x (0.5), NC_REQ_BET=x (0) : same for beta sheet fragments
618 NC_FRAC_PAIR=x (0.3), NC_REQ_PAIR=x (0) : same for pairs of segments
620 NSHIFT_HEL=n (3), NSHIFT_BET=n (3), NSHIFT_STRAND=n (3), NSHIFT_PAIR=n (3) :
621 allowed sequence shift to match native and compared structure for the
622 respective types of secondary structure
624 RMS_SINGLE=n (0), CONT_SINGLE=n (1), LOCAL_SINGLE=n (1), RMS_PAIR=n (0),
626 CONT_PAIR=n (1) : types of criteria in considering the geometry of a fragment
627 or pair native; 1 means that the criterion is turned on
629 For nlevel > 0 (manual assignment):
635 NFRAG=n : number of elementary fragments
637 Next lines (one group of lines per each fragment):
641 NPIECE=n : number of segments constituting the fragment
643 ANGCUT, MAXANG, FRAC_MIN, NC_FRAC, NC_REQ : criterial numbers of native-likeness
644 as for automatic classification
646 LOCAL, ELCONT, SCCONT, RMS : types of criteria implemented, as for automatic
647 classification except that ELECONT and SCCONT mean that electrostatic or
648 side-chain contacts are considered, respectively
650 NPIECE following lines:
652 IFRAG1=n, IFRAG2=n : the start and end residue of a continuous segment constituting
659 NFRAG=n : number of fragments considered at this level
661 For each fragment the following line is read:
663 NPIECE=n : number of elementary fragments (as defined at level 1) constituting this
666 IPIECE=i1 i2 ... in: the numbers of these fragments
668 NC_FRAC, NC_REQ : contact criteria (valid only for level 2)
670 ELCONT, SCCONT, RMS : as for level 1; note, that for level 3 and higher the only
671 criterion of nativelikeness is rms
673 3rd (for nlevel<0) or following (for n>0) line:
675 Name of the file with reference structure (e.g., the pdb file with the
676 experimental structure)
678 6.3. The structure of the main output file (out)
679 ------------------------------------------------
681 The initial portion of the main output file, named INPUT.out_POT_000
682 contains information of parameter files specified in the C-shell script,
683 compilation info, and the UNRES numeric code of the amino-acid sequence.
684 Subsequently, actual energy-term weights and parameter files are printed.
685 If lprint was set at .true. in parmread.F, all energy-function
686 parameters are printed. If REFSTR was specified in the control-data list,
687 the program then outputs the read reference-structure coordinates and
688 partition of structure into fragments.
690 Subsequently, the information about the number of structures read in and
691 those that were rejected is printed followed by succinct information form
692 the iteration process. Finally, the histograms (also output separately to
693 specific histogram files; see section 6.6) and the data of the dependence of
694 free energy, energy, heat capacity, and conformational averages on temperature
695 are printed (these are also output separately to file described in section
698 The output files corresponding to non-master processors
699 (INPUT.out_POT_xxx where xxx>0 contain only the information up to the
700 iteration protocol. These files can be deleted right after the run.
702 6.4. The thermodynamic quantity and ensemble average (thermal) files
703 -----------------------------------------------------------------
705 The files INPUT.thermal or INPUT_slice_yy.thermal contain thermodynamic,
706 ensemble-averaged conformation-dependent quantities and their temperature
707 derivatives. The structure of a record is as follows:
709 T F E q_1...q_n rmsd Rgy Cv var(q_1)...var(q_n) var(rmsd) var(Rgy) cov(q_1,E)...cov(q_n,E) cov(rmsd,E) cov(Rgy,E)
710 298.0 -83.91454 -305.28112 0.30647 6.28347 11.61204 0.70886E+01 0.35393E-02 0.51539E+01 0.57012E+00 0.43802E+00 0.62384E+01 0.33912E+01
714 T: absolute temperature (in K),
718 E: average energy at T,
720 q_1..q_n: ensemble-averaged q values at T (usually only the total q corresponding to whole
721 molecule is requested, as in the example above, but the user can specify
722 more than one fragment or pair of fragments for which the q's are
723 calculated, If there's no reference structure, this entry contains
726 rmsd: ensemble-averaged root mean square deviation at T,
728 Rgy: ensemble-averaged radius of gyration computed from Calpha coordinates at T,
730 Cv: heat capacity at T,
732 var(q_1)...var(q_n): variances of q's at T,
734 var(rmsd): variance of rmsd at T,
736 var(Rgy): variance of radius of gyration at T,
738 cov(q_1,E)...cov(q_n,E): covariances of q's and energy at T,
740 cov(rmsd,E): covariance of rmsd and energy at T,
742 cov(Rgy,E): covariance of radius of gyration and energy at T.
744 According to Camacho and Thirumalali (Europhys. Lett., 35, 627, 1996), the
745 maximum of the variance of the radius of gyration corresponds to the collapse
746 point of a polypeptide chain and the maximum variance of q or rmsd corresponds to
747 the midpoint of the transition to the native structure. More precisely, these
748 points are inflection points in the plots of the respective quantities which,
749 with temperature-independent force field, are proportional to their covariances
752 6.5. The conformation summary with classification (stat) files
753 --------------------------------------------------------------
755 The stat files (with names INPUT_POT_xxx.stat or
756 INPUT_POT_sliceyyxxx.stat; where yy is the number of a slice and xxx
757 is the rank of a processor) contain the output of the classification
758 of subsequent conformations (equally partitioned between processors). The
759 files can be concatenated by processor rank to get a summary file. Each line
760 has the following structure (example values are also provided):
762 | level 1 | level 2 | level3 |
764 whole mol | frag1 frag2 frag3 cl1 | level3 | |
765 No energy rmsd q ang dif|n1n2 n3 rms q ang rms q ang rms q ang | nc1nc2 rms q rms q cl2| rms cl3|class
766 9999 -122.42 4.285 0.3751 47.8 |4 10 21 0.6 0.33 16.7 3.6 0.42 56.3 0.7 0.12 16.5 737 | 9 0 1.6 0.20 4.3 0.20 20 | 0 4.0 2 |737.20.2
768 No - number of conformation
770 whole mol denotes the characteristics of the whole molecule
773 level 1, 2, and 3 denote the characteristics computed for the respective fragments
776 n1, n2, n3 - number of native contacts for a given segment
778 cl1, cl2, cl3 - group of segment classes for segments at level 1, 2, and 3, respectively
780 class - total class of the conformation
782 The octal/quaternary/binary numbers denoting the class for a fragment at level 1, 2,
783 and 3, respectively, are described in ref. 3
785 6.6. The histogram files
786 ------------------------
788 The histogram file with names INPUT_[par_yy][_slice_xx].hist where xx denotes
789 the number of the slice and yy denotes the number of the parameter if
790 SEPARATE_PARSET was specified in input contain histograms of q at replica
791 temperatures and energy-parameter sets; with SEPARATE_PARSET histograms
792 corresponding to subsequent parameter sets are saved in files with par_yy
793 infixes. The histograms are multidimensional if q is a vector (usually,
794 however, q corresponds to the entire molecule and, consequently, the
795 histograms are one-dimensional). The histogram files are printed if histfile
796 and histout was specified in the control data record.
798 Each line of a histogram file corresponds to a given (multidimensional) bin in
799 q contains the following:
801 q_1,...,q_n at a given bin (format f6.3 for each)
803 histogram values for subsequent replica temperatures (format e20.10 for each)
805 iparm (the number of parameter set; format i5)
807 If SEPARATE_PARSET was not specified, the entries corresponding to each
808 parameter follow one another.
810 The state density (microcanonical entropy) is printed to file(s)
811 INPUT[_slice_xx].ent. Each line contains the left boundary of the energy
812 bin and ln(state density) followed by " ent" string. At present, the state
813 density is calculated correctly only if one energy-parameter set is used.
815 6.7. The rmsd-radius of gyration potential of mean force files
816 ------------------------------------------
818 These files with names INPUT[_par_yy][_slice_xx].rmsrgy contain the
819 two-dimensional potentials of mean force in rmsd and radius of gyration
820 at all replica-exchange temperatures and for all energy-parameter sets.
821 A line contains the left boundaries of the radius of gyration - rmsd bin
822 (radius of gyration first) (format 2f8.2) and the PMF values at all
823 replica-exchange temperatures (e14.5), followed by the number of the parameter
824 set. With SEPARATE_PARSET, the PMFs corresponding to different parameter sets
825 are printed to separate files.
830 The PDB files with names INPUT_[slice_xx_]Tyyy.pdb, where Tyyy specifies
831 a given replica temperature contain the conformations whose probabilities at
832 replica temperature T sum to 0.99, after sorting the conformations by
833 probabilities in descending order. The PDB files follow the standard format;
834 see ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_Letter.pdf.
835 For single-chain proteins, an example is as follows:
837 REMARK CONF 9059 TEMPERATURE 330.0 RMS 8.86
838 REMARK DIMENSIONLESS FREE ENERGY -1.12726E+02
839 REMARK ENERGY -2.22574E+01 ENTROPY -7.87818E+01
840 ATOM 1 CA VAL 1 8.480 5.714 -34.044
841 ATOM 2 CB VAL 1 9.803 5.201 -33.968
842 ATOM 3 CA ASP 2 8.284 2.028 -34.925
843 ATOM 4 CB ASP 2 7.460 0.983 -33.832
847 ATOM 115 CA LYS 58 28.446 -3.448 -12.936
848 ATOM 116 CB LYS 58 26.613 -4.175 -14.514
859 CONF is the number of the conformation from the processed slice of MREMD
862 TEMPERATURE is the replica temperature
864 RMS is the Calpha rmsd from the reference (experimental) structure.
866 DIMENSIONLESS FREE ENERGY is -log(probability) (equation 14 of ref 2)
867 for the conformation at this replica temperature calculated by WHAM.
869 ENERGY is the UNRES energy of the conformation at the replica temperature
870 (note that UNRES energy is in general temperature dependent).
872 ENTROPY is the omega of equation 15 of ref 2 of the conformation
874 In the ATOM entries, CA denotes a Calpha atom and CB denotes UNRES side-chain
875 atom. The CONECT entries specify the Calpha(i)-Calpha(i-1),
876 Calpha(i)-Calpha(i+1) and Calpha(i)-SC(i) links.
878 The PDB files generated for oligomeric proteins are similar except that
879 chains are separated with TER and molecules with ENDMDL records and chain
880 identifiers are included. An example is as follows:
882 REMARK CONF 765 TEMPERATURE 301.0 RMS 11.89
883 REMARK DIMENSIONLESS FREE ENERGY -4.48514E+02
884 REMARK ENERGY -3.58633E+02 ENTROPY 1.51120E+02
885 ATOM 1 CA GLY A 1 -0.736 11.305 24.600
886 ATOM 2 CA TYR A 2 -3.184 9.928 21.998
887 ATOM 3 CB TYR A 2 -1.474 10.815 20.433
891 ATOM 40 CB MET A 21 -4.033 -2.913 27.189
892 ATOM 41 CA GLY A 22 -5.795 -10.240 27.249
894 ATOM 42 CA GLY B 1 6.750 -6.905 19.263
895 ATOM 43 CA TYR B 2 5.667 -4.681 16.362
899 ATOM 163 CB MET D 21 4.439 12.326 -4.950
900 ATOM 164 CA GLY D 22 10.096 14.370 -9.301
915 6.8. The compressed Cartesian coordinates (cx) files
916 ----------------------------------------------------
918 These files contain compressed data in the Europort Data Compression XDRF
919 library format written by Dr. F. van Hoesel, Groeningen University
920 (http://hpcv100.rc.rug.nl/xdrfman.html). The files are written
921 by the cxwrite subroutine. The resulting cx file contains the omega
922 factors to compute probabilities of conformations at any temperature
923 and any energy-function parameters if Hamiltonian replica exchange was
924 performed in the preceding UNRES run. The files have general names
925 INPUT[_par_yy][_slice_xx].cx where xx is slice number and yy is parameter-set
928 The items written to the cx file are as follows (the precision is 5
931 1) Cartesian coordinates of Calpha and SC sites
932 2) nss (number of disulfide bonds)
934 a) ihpb (first residue of a disulfide link)
935 b) jhpb (second residue of a disulfide link)
936 4) UNRES energy at that replica temperature that the conformation was at
937 snapshot-recording time,
938 5) ln(omega) of eq 15 of ref 2,
940 7) conformation class number (0 if CLASSIFY was not specified).
946 Faculty of Chemistry, University of Gdansk
947 ul. Sobieskiego 18, 80-952 Gdansk Poland.
948 phone: +48 58 523 5430
950 e-mail: adam@chem.univ.gda.pl
952 Dr. Cezary Czaplewski
953 Faculty of Chemistry, University of Gdansk
954 ul. Sobieskiego 18, 80-952 Gdansk Poland.
955 phone: +48 58 523 5430
957 e-mail: czarek@chem.univ.gda.pl
959 Prepared by Adam Liwo, 02/19/12