doc/3.2.1/latex/clusterman.tex

   1 \documentclass[12pt]{article}
   2 %\usepackage{latex2html}
   3 \usepackage{enumerate}
   4 \usepackage{longtable}
   5 \usepackage{hyperref}
   6 \usepackage{amsmath}
   7 \usepackage{color}
   8 \parindent=0pt
   9 \parskip=12pt
  10 \textheight=24cm
  11 \textwidth=18cm
  12 \topmargin=-2.5cm
  13 \oddsidemargin=-0.5cm
  14 \setcounter{secnumdepth}{5}
  15 \setcounter{tocdepth}{5}
  16 \begin{document}
  17 \sloppy
  18
  19 \title{CLUSTER\\
  20 Cluster analysis of UNRES simulation results}
  21
  22 \author{Laboratory of Molecular Modeling\\ Faculty of Chemistry\\ University of Gdansk\\ Wita Stwosza 63\\ 80-308 Gdansk, Poland\\
  23 \\
  24 \\
  25 Scheraga Group\\ Baker Laboratory of Chemistry \\
  26 and Chemical Biology\\ Cornell University\\ Ithaca, NY 14853-1301, USA}
  27
  28 \maketitle
  29
  30 \newpage
  31
  32 \tableofcontents
  33
  34 % 1. License terms
  35 % 2. References
  36 % 3. Functions of the program
  37 % 4. Installation
  38 % 5. Running the program
  39 % 6. Input and output files
  40 %    6.1. Summary of files
  41 %    6.2. The main input file
  42 %         6.2.1. Title
  43 %         6.2.2. General data
  44 %         6.2.3. Energy-term weights and parameter files
  45 %         6.2.4 Molecule data
  46 %               6.2.4.1. Sequence information
  47 %               6.2.4.2. Dihedral angle restraint information
  48 %               6.2.4.3. Disulfide-bridge data
  49 %         6.2.5. Reference structure
  50 %    6.3. Main output file (out)
  51 %    6.4. Output coordinate files
  52 %         6.4.1. The internal coordinate (int) files
  53 %         6.4.2. The Cartesian coordinate (x) files
  54 %         6.4.3. The PDB files
  55 %                6.4.3.1. CLUST-UNRES runs
  56 %                6.4.3.2. CLUST-WHAM runs
  57 %                         6.4.3.2.1. Conformation family files
  58 %                         6.4.3.2.2. Average-structure file
  59 %    6.5. The conformation-distance file
  60 %    6.6. The clustering-tree PicTeX file
  61 % 7. Support
  62
  63 \newpage
  64
  65 \section{LICENSE TERMS}
  66 \label{sect:license}
  67
  68 \begin{itemize}
  69
  70 \item
  71                 This software is provided free of charge to academic users, subject to the condition that no part of it be sold or used otherwise for commercial purposes, including, but not limited to its incorporation into commercial software packages, without written consent from the authors. For permission contact Prof. H. A. Scheraga, Cornell University.
  72
  73 \item
  74                 This software package is provided on an ``as is'' basis. We in no way warrant either this software or results it may produce.
  75
  76 \item
  77                 Reports or publications using this software package must contain an acknowledgment to the authors and the NIH Resource in the form commonly used in academic research.
  78
  79 \end{itemize}
  80
  81 \newpage
  82
  83 \section{REFERENCES}
  84 \label{sect:references}
  85
  86 The program incorporates the hierarchical-clustering subroutine hc.f, written
  87 by F. Murtagh (refs 1 and 2). The subroutine contains seven methods of
  88 hierarchical clustering.
  89
  90 \begingroup
  91 \renewcommand{\section}[2]{}%
  92 \begin{thebibliography}{10}
  93
  94 \bibitem{murtagh_1985}
  95 F. Murtagh. Multidimensional clustering algorithms; Physica-Verlag:
  96 Vienna, Austria, 1985.
  97
  98 \bibitem{murtagh_1987}
  99 F. Murtagh, A. Heck. Multivariate data analysis; Kluwer Academic:
 100 Dordrecht, Holland, 1987.
 101
 102 \bibitem{liwo_2007}
 103 A. Liwo, M. Khalili, C. Czaplewski, S. Kalinowski, S. Oldziej, K. Wachucik,
 104 H.A. Scheraga.
 105 Modification and optimization of the united-residue (UNRES) potential
 106 energy function for canonical simulations. I. Temperature dependence of the
 107 effective energy function and tests of the optimization method with single
 108 training proteins. {\it J. Phys. Chem. B}, {\bf 2007}, 111, 260-285.
 109
 110 \bibitem{oldziej_2004}
 111 S. Oldziej, A. Liwo, C. Czaplewski, J. Pillardy, H.A. Scheraga.
 112 Optimization of the UNRES force field by hierarchical design of the
 113 potential-energy landscape. 2. Off-lattice tests of the method with single
 114 proteins. {\it J. Phys. Chem. B}, {\bf 2004}, 108, 16934-16949.
 115
 116 \end{thebibliography}
 117 \endgroup
 118
 119 \newpage
 120
 121 \section{FUNCTIONS OF THE PROGRAM}
 122 \label{sect:functions}
 123
 124 The program runs cluster analysis of UNRES simulation results. There are two
 125 versions of the program, depending on the origin of the input conformations:
 126
 127 \begin{enumerate}
 128
 129 \item
 130    CLUST-UNRES: performs cluster analysis of conformations that are obtained
 131    directly from UNRES runs (CSA, MCM, MD, (M)REMD, multiple-conformation
 132    energy minimization). The source code and other important files are
 133    deposited in the CLUST-UNRES subdirectory.
 134
 135    The source code of this version is deposited in clust-unres/src.
 136
 137 \item
 138    CLUST-WHAM: performs cluster analysis of conformations obtained in UNRES
 139    MREMD simulations and then processed with WHAM (weighted histogram analysis
 140    method). This enables the user to obtain clusters as conformational
 141    ensembles at a given temperature and to compute their probabilities
 142    (section 2.5 of ref 3). This version is deposited in the CLUST-WHAM
 143    subdirectory. This version has single- and multichain variants, whose
 144    source codes are deposited in the following subdirectories:
 145
 146 \begin{enumerate}
 147
 148 \item
 149    clust-wham/src    single-chain proteins
 150
 151 \item
 152    clust-wham/src-M  oligomeric proteins
 153
 154 \end{enumerate}
 155
 156 \end{enumerate}
 157
 158 The version developed for oligomeric proteins treats the whole system as a single
 159 chain with dummy residues inserted. It also works for single chains but has
 160 not been fully checked; the single-chain version is recommended for
 161 single-chain proteins.
 162
 163 \section{INSTALLATION}
 164 \label{sect:install}
 165
 166 It is recommended to use CMake to install the whole package; please see the
 167 Installation Guide.
 168
 169 Customize the Makefile to your system. See section 7 of the description of UNRES
 170 for the compiler flags that are used to create executables for a particular
 171 force field. There are already several Makefiles prepared for various
 172 systems and force fields.
 173
 174 Run make in the source directory of the appropriate version. CLUST-UNRES runs
 175 only in single-processor mode and CLUST-WHAM runs in both serial and parallel
 176 mode [only conformation-distance (rmsd) calculations are parallelized].
 177 The parallel version uses MPI.
 178
 179 \section{RUNNING THE PROGRAM}
 180 \label{sect:running}
 181
 182 The program requires a parallel system to run. Depending on the system,
 183 either the wham.csh C-shell script (in WHAM/bin directory) can be started
 184 using mpirun or the binary in the C-shell script must be executed through
 185 mpirun. See the wham.csh C-shell script and section 6 for the files
 186 processed by the program.
 187
 188 \newpage
 189
 190 \section{INPUT AND OUTPUT FILES}
 191 \label{sect:inoutfiles}
 192
 193 \subsection{Summary of files}
 194 \label{sect:inoutfiles:summary}
 195
 196 The C-shell script wham.csh is used to run the program (see the
 197 bin/WHAM directory). The data files that the script needs are mostly the same as
 198 for UNRES (see section 6 of UNRES description). In addition, the environment
 199 variable CONTFUN specifies the method to assess whether two side chains
 200 are in contact; if CONTFUN=GB, the criterion defined by eq 8 of ref 4 is
 201 used. Also, the parameter
 202 files from the C-shell scripts are overridden if the data from Hamiltonian
 203 MREMD are processed; if so, the parameter files are defined in the main
 204 input file.
 205
 206 The main input file must have the inp extension. If it is INPUT.inp, the input
 207 and output files are as follows:
 208
 209 Coordinate input file COORD.ext, where ext denotes the file extension in one of the
 210 following formats:
 211
 212 \begin{description}
 213 \item{int} (extension int; UNRES angles theta, gamma, alpha, and beta),
 214 \item{x}   (extension x; UNRES Cartesian coordinate format; from MD),
 215 \item{pdb} (extension pdb; Protein Data Bank format; from MD),
 216 \item{cx}  (extension cx; xdrf format; from WHAM).
 217 \end{description}
 218
 219 \begin{description}
 220 \item{INPUT\_clust.out} (single-processor mode) or INPUT\_clust.out\_xxx (parallel mode) --
 221      output file(s) (INPUT\_clust.out\_000 is the main output file for parallel mode).
 222
 223 \item{COORD\_clust.int} -- leading (lowest-energy) members of the families
 224     in internal-coordinate format.
 225 \item{COORD\_clust.x} -- leading members of the families in UNRES Cartesian coordinate
 226     format.
 227 \item{COORD\_xxxx.pdb} or COORD\_xxxx\_yyy.pdb (CLUST-UNRES) -- PDB file of member yyy
 228     of family xxxx; yyy is omitted if the family contains only one member
 229     within a given energy cut-off.
 230 \item{COORD\_TxxxK\_yyyy.pdb} -- concatenated conformations in PDB format of the
 231     members of family yyyy clustered at T=xxxK ranked by probabilities in
 232     descending order at this temperature (CLUST-WHAM).
 233 \item{COORD\_T\_xxxK\_ave.pdb} -- cluster-averaged coordinates and coordinates of a
 234     member of each family that is closest to the cluster average in PDB
 235     format, concatenated in a single file (CLUST-WHAM).
 236
 237 \item{INPUT\_clust.tex} -- PicTeX code of the cluster tree.
 238
 239 \item{INPUT.rms} -- rmsds between conformations.
 240
 241 \end{description}
 242
 243 \subsection{Main input file}
 244 \label{sect:inoutfiles:main}
 245
 246 This file has the same structure as the UNRES input file; most of the data are
 247 input in a keyword-based form (see section 7.1 of UNRES description). The data
 248 are grouped into records, referred to as lines. Each record, except for the
 249 records that are input in non-keyword based form, can be continued by placing
 250 an ampersand (\&) in column 80. Such a format is referred to as the data list
 251 format.
 252
 253 In the following description, the default values are given in parentheses.
 254
 255 \subsubsection{Title}
 256
 257 An 80-character string from the first line is input.
 258
 259 \subsubsection{General data}
 260 \label{sect:inoutfiles:main:general}
 261
 262 (Data list format.)
 263
 264 \begin{description}
 265
 266 \item{NRES} (0) -- the number of residues.
 267
 268 \item{ONE\_LETTER} -- if present, the sequence is input in one-letter code.
 269
 270 \item{SYM} (1) -- number of chains with same sequence (for oligomeric proteins only).
 271
 272 \item{WITH\_DIHED\_CONSTR} -- if present, dihedral-angle restraints were imposed in the
 273     processed MREMD simulations.
 274
 275 \item{RESCALE} (1) -- choice of the type of temperature dependence of the force field.
 276
 277 \begin{description}
 278 \item{0}  -- no temperature dependence,
 279 \item{1}  -- homographic dependence (not implemented yet with any force field),
 280 \item{2}  -- hyperbolic tangent dependence \cite{liwo_2007}.
 281 \end{description}
 282
 283 \item{DISTCHAINMAX} (50.0) -- for oligomeric proteins, distance between the chains
 284      above which restraints will be switched on to keep the chains at a
 285      reasonable distance.
 286
 287 \item{PDBOUT} -- clusters will be printed in PDB format.
 288
 289 \item{ECUT} -- energy cut-off criterion to print conformations (CLUST-UNRES runs).
 290      A family is output only if the energy of its lowest-energy
 291      conformation is within ECUT kcal/mol above that of the
 292      lowest-energy conformation overall; within a family, only those members
 293      are output whose energy is within ECUT kcal/mol above the energy of the
 294      lowest-energy member of the family.
 295
 296 \item{PRINT\_CART} -- output leading members of the families in UNRES x format.
 297
 298 \item{PRINT\_INT} -- output leading members of the families in UNRES int format.
 299
 300 \item{REF\_STR} -- if present, reference structure is input and rmsd will be computed
 301       with respect to it (CLUST-UNRES only; rmsd is provided in the cx file
 302       from WHAM for CLUST-WHAM runs).
 303
 304 \item{PDBREF} -- if present, reference structure will be read in from a pdb file.
 305
 306 \item{SIDE} -- side chains will be considered in superposition when calculating rmsd.
 307
 308 \item{CA\_ONLY} -- only the Calpha atoms will be used in rmsd calculation.
 309
 310 \item{NSTART} (0) -- first residue to superpose.
 311
 312 \item{NEND} (0) -- last residue to superpose.
 313
 314 \item{NTEMP} (1) -- number of temperatures at which probabilities will be calculated
 315          and clustering performed (CLUST-WHAM).
 316
 317 \item{TEMPER} (NTEMP values) -- temperatures at which clustering will be performed
 318         (CLUST-WHAM).
 319
 320 \item{EFREE} -- if present, conformation entropy factor is read if the conformation
 321         is input from an x or pdb file.
 322
 323 \item{PROB} (0.99) -- cut-off on the cumulative probability of the conformations that
 324      are clustered at a given temperature (CLUST-WHAM).
 325
 326 \item{IOPT} (2) -- clustering algorithm:
 327
 328 \begin{description}
 329 \item{1} -- Ward's minimum variance method.
 330 \item{2} -- single link method.
 331 \item{3} -- complete link method.
 332 \item{4} -- average link (or group average) method.
 333 \item{5} -- McQuitty's method.
 334 \item{6} -- Median (Gower's) method.
 335 \item{7} -- centroid method.
 336 \end{description}
 337
 338 Instead of IOPT=1, MINTREE can be specified, and instead of IOPT=2, MINVAR.
 339
 340 \item{NCUT} (1) -- number of cut-offs in clustering.
 341
 342 \item{CUTOFF} (-1.0; NCUT values) -- cut-offs at which clustering will be performed;
 343     at the cut-off flagged by a ``-'' sign clustering will be performed with
 344     cutoff value=abs(cutoff(i)) and conformations corresponding to clusters
 345     will be output in the desired format.
 346
 347 \item{MAKE\_TREE} -- if present, produce a clustering-tree graph.
 348
 349 \item{PLOT\_TREE} -- if present, the tree is written in PicTeX format to a file.
 350
 351 \item{PRINT\_DIST} -- if present, distance (rmsd) matrix is printed to main output
 352     file.
 353
 354 \item{PUNCH\_DIST} -- if present, the upper triangle of the distance matrix will be
 355     printed to a file.
 356 \end{description}
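
As a rough illustration of how a cut-off turns a matrix of rmsds into families, the sketch below handles the single-link case only. It is not the hc.f code: it relies on the fact that cutting a single-link tree at a given height yields the connected components of the graph in which two conformations are joined whenever their distance is below that height. The function name and the strict ``less than'' comparison are assumptions made for this example.

\begin{verbatim}
```python
def single_link_families(rmsd, cutoff):
    """Group conformations into families by single linkage (illustrative).

    rmsd   -- symmetric matrix (list of lists) of pairwise distances
    cutoff -- two conformations closer than this value are linked
    Returns a list of families, each a sorted list of 0-based indices.
    """
    n = len(rmsd)
    parent = list(range(n))  # union-find forest, one tree per family

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only the upper triangle is needed because the matrix is symmetric.
    for i in range(n):
        for j in range(i + 1, n):
            if rmsd[i][j] < cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```
\end{verbatim}

For three conformations with rmsds of 1.0 (first and second), 2.5 (second and third) and 6.0 (first and third), a cut-off of 3.0 gives a single family, whereas a cut-off of 2.0 separates the third conformation from the other two.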
 357
 358 \subsubsection{Energy-term weights and parameter files}
 359 \label{sect:inoutfiles:main:weights}
 360
 361 \begin{description}
 362 \item{WSC} (1.0)  --  side-chain-side-chain interaction energy.
 363
 364 \item{WSCP} (1.0)  --  side-chain-peptide-group interaction energy.
 365
 366 \item{WELEC} (1.0) --  peptide-group-peptide-group interaction energy.
 367
 368 \item{WEL\_LOC} (1.0) -- third-order backbone-local correlation energy.
 369
 370 \item{WCORR} (1.0) -- fourth-order backbone-local correlation energy.
 371
 372 \item{WCORR5} (1.0) -- fifth-order backbone-local correlation energy.
 373
 374 \item{WCORR6} (1.0) -- sixth-order backbone-local correlation energy.
 375
 376 \item{WTURN3} (1.0) -- third-order backbone-local correlation energy of pairs of
 377                peptide groups separated by a single peptide group.
 378
 379 \item{WTURN4} (1.0) -- fourth-order backbone-local correlation energy of pairs of
 380                peptide groups separated by two peptide groups.
 381
 382 \item{WTURN6} (1.0) -- sixth-order backbone-local correlation energy for pairs of
 383                peptide groups separated by four peptide groups.
 384
 385 \item{WBOND} (1.0) -- virtual-bond-stretching energy.
 386
 387 \item{WANG} (1.0) --  virtual-bond-angle-bending energy.
 388
 389 \item{WTOR} (1.0) --  virtual-bond-torsional energy.
 390
 391 \item{WTORD} (1.0) -- virtual-bond-double-torsional energy.
 392
 393 \item{WSCCOR} (1.0) -- sequence-specific virtual-bond-torsional energy.
 394
 395 \item{WDIHC} (0.0) -- dihedral-angle-restraint energy.
 396
 397 \item{WHPB} (1.0)  -- distance-restraint energy.
 398
 399 \item{SCAL14} (0.4) -- scaling factor of 1,4-interactions.
 400
 401 \end{description}
 402
 403 \subsubsection{Molecule information}
 404 \label{sect:inoutfiles:main:molinfo}
 405
 406 \paragraph{Sequence information\\ \\}
 407 \label{sect:inoutfiles:main:molinfo:sequence}
 408
 409 Amino-acid sequence
 410
 411 3-letter code: Sequence is input in format 20(1X,A3)
 412
 413 1-letter code: Sequence is input in format 80A1
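
The two Fortran formats above fix the column layout of the sequence records: 20(1X,A3) means up to twenty fields per line, each made of one blank followed by a three-character code (80 columns in total), and 80A1 means up to eighty one-character codes per line. The following minimal sketch writes a sequence in either layout; the function names are hypothetical and are not part of the package.

\begin{verbatim}
```python
def format_sequence_3letter(residues, per_line=20):
    """Lay out three-letter residue codes following format 20(1X,A3)."""
    lines = []
    for start in range(0, len(residues), per_line):
        chunk = residues[start:start + per_line]
        # 1X is one blank; A3 is the code padded to three characters.
        lines.append("".join(" " + code[:3].ljust(3) for code in chunk))
    return lines


def format_sequence_1letter(sequence, per_line=80):
    """Lay out a one-letter sequence string following format 80A1."""
    return [sequence[start:start + per_line]
            for start in range(0, len(sequence), per_line)]
```
\end{verbatim}

A sequence of 21 residues in three-letter code therefore occupies two lines, the first of which is exactly 80 columns wide.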
 414
 415 \paragraph{Dihedral angle restraint information\\ \\}
 416 \label{sect:inoutfiles:molinfo:dihrestr}
 417
 418 This is the information about dihedral-angle restraints, if any are present.
 419 It is specified only when WITH\_DIHED\_CONSTR is present in the first record.
 420
 421 1st line: ndih\_constr -- number of restraints (free format)
 422
 423 2nd line: ftors -- force constant (free format)
 424
 425 Each of the following ndih\_constr lines:
 426
 427 idih\_constr(i),phi0(i),drange(i)  (free format)
 428
 429 \begin{description}
 430 \item{idih\_constr(i)} -- the number of the dihedral angle gamma corresponding to the
 431 ith restraint
 432
 433 \item{phi0(i)} -- center of dihedral-angle restraint
 434
 435 \item{drange(i)} -- range of flat well (no restraints for phi0(i) +/- drange(i))
 436
 437 \end{description}
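
The meaning of the flat well can be sketched as follows. The function below only reports by how much an angle lies outside the interval phi0(i) +/- drange(i); the functional form of the restraint energy is not given in this section, so it is deliberately left out. The function name, the use of degrees, and the periodic wrapping of the angle difference are assumptions made for this example.

\begin{verbatim}
```python
def flat_well_violation(phi, phi0, drange):
    """Return how far phi lies outside phi0 +/- drange (all in degrees).

    The result is 0.0 inside the flat well, where no restraint acts.
    """
    # Wrap the difference to [-180, 180) so that, for example,
    # -170 and 170 degrees are treated as 20 degrees apart.
    diff = (phi - phi0 + 180.0) % 360.0 - 180.0
    return max(0.0, abs(diff) - drange)
```
\end{verbatim}

With phi0 = 60 and drange = 15, an angle of 70 degrees is inside the well (violation 0), whereas an angle of 100 degrees is 25 degrees outside it.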
 438
 439 \paragraph{Disulfide-bridge data \\ \\}
 440 \label{sect:inoutfiles:molinfo:disulfide}
 441
 442 1st line: NS, (ISS(I),I=1,NS)    (free format)
 443
 444 \begin{description}
 445
 446 \item{NS} -- number of cystine residues forming disulfide bridges.
 447
 448 \item{ISS(I)} -- the number of the Ith disulfide-bonding cystine in the sequence.
 449
 450 \end{description}
 451
 452 2nd line: NSS, (IHPB(I),JHPB(I),I=1,NSS) (free format)
 453
 454 \begin{description}
 455
 456 \item{NSS} -- number of disulfide bridges
 457
 458 \item{IHPB(I),JHPB(I)} -- the first and the second residue of the Ith disulfide link.
 459
 460 Because the input is in free format, each line can be split.
 461 \end{description}
 462
 463 \subsubsection{Reference structure}
 464 \label{sect:inoutfiles:molinfo:refstr}
 465
 466 If PDBREF is specified, this record contains the name of the file with the reference
 467 (experimental) structure; otherwise, it contains the UNRES internal coordinates as
 468 the theta, gamma, alpha, and beta angles.
 469
 470 \subsection{Main output file}
 471 \label{sect:inoutfiles:mainoutput}
 472
 473 The main output file (named INPUT\_clust.out, or INPUT\_clust.out\_000 for parallel runs)
 474 contains the results of clustering: numbers of families
 475 at different cut-off values, probabilities of clusters, composition of
 476 families, and rmsd values corresponding to families (0 if rmsd was not
 477 computed or read from the WHAM-generated cx file).
 478
 479 The output files corresponding to non-master processors
 480 (INPUT\_clust.out\_xxx where xxx$>$0) contain only the information up to the
 481 clustering protocol. These files can be deleted right after the run.
 482
 483 Excerpts from a sample output file are given below:
 484
 485 CLUST-UNRES:
 486
 487 \begin{verbatim}
 488
 489 THERE ARE   20 FAMILIES OF CONFORMATIONS
 490
 491 FAMILY    1 CONTAINS    2 CONFORMATION(S):
 492   42 -2.9384E+03  50 -2.9134E+03
 493
 494
 495 Max. distance in the family:    14.0; average distance in the family:    14.0
 496
 497 FAMILY    2 CONTAINS    3 CONFORMATION(S):
 498   13 -2.9342E+03   7 -2.8827E+03  10 -2.8682E+03
 499 \end{verbatim}
 500
 501 CLUST-WHAM:
 502
 503 \begin{verbatim}
 504 AT CUTOFF: 200.00000
 505 Maximum distance found:  137.82
 506 Free energies and probabilities of clusters at 325.0 K
 507 clust   efree    prob sumprob
 508     1   -76.5 0.25035 0.25035
 509     2   -76.5 0.24449 0.49484
 510     3   -76.4 0.21645 0.71129
 511     4   -76.4 0.20045 0.91174
 512     5   -75.8 0.08826 1.00000
 513
 514
 515 THERE ARE    5 FAMILIES OF CONFORMATIONS
 516
 517 FAMILY    1 WITH TOTAL FREE ENERGY   -7.65228E+01 CONTAINS  548 CONFORMATION(S):
 518 8363  -7.332E+013939  -7.332E+012583  -7.332E+017395  -7.332E+019932  -7.332E+01
 519 5816  -7.332E+013096  -7.332E+012663  -7.332E+014099  -7.332E+016822  -7.332E+01
 520 3176  -7.332E+017542  -7.332E+018933  -7.332E+017315  -7.332E+01 200  -7.332E+01
 521 .
 522 5637  -7.062E+018060  -7.061E+013797  -7.060E+018800  -7.057E+016295  -7.057E+01
 523 6298  -7.057E+012332  -7.057E+012709  -7.057E+01
 524
 525 Max. distance in the family:    16.5; average distance in the family:     8.8
 526 Average RMSD 8.22 A
 527 \end{verbatim}
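
The probabilities in the CLUST-WHAM excerpt above appear consistent with Boltzmann weighting of the cluster free energies. As a hedged illustration only (the actual procedure is the one described in section 2.5 of ref 3), assume that the free energy $F_i$ of cluster $i$ is in kcal/mol, $R$ is the gas constant and $T$ is the clustering temperature; the probability of cluster $i$ would then be
\[
p_i = \frac{\exp\left(-F_i/RT\right)}{\sum_{j}\exp\left(-F_j/RT\right)} .
\]
With $R = 0.0019872$ kcal/(mol K) and $T = 325$ K, $RT \approx 0.646$ kcal/mol. Clusters 1 and 5 of the excerpt differ by about $0.7$ kcal/mol in free energy, which gives $p_5/p_1 \approx \exp(-0.7/0.646) \approx 0.34$; the printed ratio is $0.08826/0.25035 \approx 0.35$, in agreement within the one-decimal rounding of the efree column.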
 528
 529 \subsection{Output coordinate files}
 530 \label{sect:inoutfiles:outcoord}
 531
 532 \subsubsection{The internal coordinate (int) files}
 533 \label{sect:inoutfiles:int}
 534
 535 The file with name COORD\_clust.int contains the angles theta, gamma, alpha,
 536 and beta of all residues of the leaders (lowest UNRES energy conformations
 537 from consecutive families for CLUST-UNRES runs and lowest free energy
 538 conformations for CLUST-WHAM runs). The format is the same as that of the
 539 file output by UNRES; see section 9.1.1 of UNRES description.
 540
 541 For CLUST-WHAM runs, the first line contains more items:
 542
 543 \begin{tabular}{ll}
 544 number of the family                         &(format i5)\\
 545 UNRES free energy of the conformation        &(format f12.3)\\
 546 Free energy of the entire family             &(format f12.3)\\
 547 number of disulfide bonds                    &(format i2)\\
 548 list of disulfide-bonded pairs               &(format 2i3)\\
 549 conformation class number (0 if not provided)&(format i10)\\
 550 \end{tabular}
 551
 552 \subsubsection{The Cartesian coordinate (x) files}
 553 \label{sect:inoutfiles:card}
 554
 555 The file with name COORD\_clust.x contains the Cartesian coordinates of the
 556 alpha-carbon atoms and side-chain centers. The coordinate format is
 557 as in section 9.1.2 of UNRES description and the first line contains the
 558 following items:
 559
 560 \begin{tabular}{ll}
 561 Number of the family                         &(format I5)\\
 562 UNRES free energy of the conformation        &(format f12.3)\\
 563 Free energy of the entire family             &(format f12.3)\\
 564 number of disulfide bonds                    &(format i2)\\
 565 list of disulfide-bonded pairs               &(format 2i3)\\
 566 conformation class number (0 if not provided)&(format i10)\\
 567 \end{tabular}
 568
 569 \subsubsection{The PDB files}
 570 \label{sect:inoutfiles:PDB}
 571
 572 The PDB files are in standard format (see
 573 \href{ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_Letter.pdf}{ftp://ftp.wwpdb.org/pub/pdb/doc/format\_descriptions}).
 574 The ATOM records contain Calpha coordinates (CA) or UNRES side-chain-center
 575 coordinates (CB). For oligomeric proteins, chain identifiers are present
 576 (A, B, ..., etc.) and each chain ends with a TER record. A file contains the coordinates of a
 577 single conformation or of multiple conformations. The header (REMARK) records
 578 and the contents depend on the cluster run type. The next subsections are devoted
 579 to different run types.
 580
 581 \paragraph{CLUST-UNRES runs \\ \\}
 582 \label{sect:inoutfiles:PDB:clust-unres}
 583
 584 The files contain the members of the families obtained from clustering such
 585 that the lowest-energy conformation of a family is within ECUT kcal/mol higher
 586 in energy than the lowest-energy conformation overall. Again, within a family, only
 587 those conformations are output whose energy is within ECUT kcal/mol above
 588 that of the lowest-energy member of the family. Families, and the members
 589 within a family, are ranked by increasing energy. The file names are:
 590
 591 COORD\_xxxx.pdb  where xxxx is the number of the family, if the family contains
 592     only one member or if only one member is output.
 593
 594 COORD\_xxxx\_yyy.pdb where xxxx is the number of the family and yyy is the number
 595     of the member of this family.
 596
 597 An example is the following:
 598
 599 \begin{verbatim}
 600 REMARK R0001                            ENERGY    -2.93843E+03
 601 ATOM      1  CA  GLY     1       0.000   0.000   0.000
 602 ATOM      2  CA  HIS     2       3.800   0.000   0.000
 603 ATOM      3  CB  HIS     2       5.113   1.656   0.015
 604 ATOM      4  CA  VAL     3       5.927  -3.149   0.000
 605 .
 606 .
 607 .
 608 ATOM    346  CB  GLU   183     -43.669 -32.853  -7.320
 609 TER
 610 CONECT    1    2
 611 CONECT    2    4    3
 612 .
 613 .
 614 .
 615 CONECT  341  343  342
 616 CONECT  343  344
 617 CONECT  345  346
 618 \end{verbatim}
 619
 620 where ENERGY is the UNRES energy. The CONECT records define the Calpha-Calpha
 621 and Calpha-SC connections.
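
The ECUT selection rule described at the beginning of this subsection can be sketched as follows. This is an illustration under stated assumptions, not the program's code: the function name is hypothetical, each family is represented only by the list of energies of its members, and ``within ECUT'' is taken to include the boundary value.

\begin{verbatim}
```python
def select_for_output(families, ecut):
    """Apply the ECUT rule to families given as lists of energies (kcal/mol).

    Returns the retained families ranked by the energy of their leader;
    inside each family the retained energies are in increasing order.
    """
    # Sort members inside each family, then rank families by their leader.
    ranked = sorted((sorted(fam) for fam in families if fam),
                    key=lambda fam: fam[0])
    if not ranked:
        return []
    lowest_overall = ranked[0][0]
    kept = []
    for fam in ranked:
        leader = fam[0]
        # Skip families whose leader lies more than ECUT above the
        # lowest-energy conformation of the whole set.
        if leader - lowest_overall > ecut:
            continue
        # Keep only members within ECUT of the family leader.
        kept.append([e for e in fam if e - leader <= ecut])
    return kept
```
\end{verbatim}

With the two families of the CLUST-UNRES excerpt from the main output file (energies $-2938.4$, $-2913.4$ and $-2934.2$, $-2882.7$, $-2868.2$ kcal/mol) and ECUT equal to 10, only the leader of each family would be written; with ECUT equal to 3, only the leader of the first family would be written.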
 622
 623 \paragraph{CLUST-WHAM runs\\ \\}
 624 \label{sect:inoutfiles:PDB:clust-wham}
 625
 626 The program generates a file for each family with its members and a summary
 627 file with ensemble-averaged conformations for all families. These are described
 628 in the next two subsections.
 629
 630 \subparagraph{Conformation family files\\ \\}
 631 \label{sect:inoutfiles:PDB:clust-unres:family}
 632
 633 For each family, the file name is COORD\_TxxxK\_yyyy.pdb, where yyyy is the
 634 number of the family and xxx is the integer part of the temperature (K).
 635 The first REMARK line in the file contains the information about the free
 636 energy and average rmsd of the entire cluster and, for each conformation,
 637 the initial REMARK line contains these quantities for this conformation.
 638 The same applies to oligomeric proteins, for which the TER records separate the
 639 chains and the ENDMDL record separates conformations.
 640 An example is given below.
 641
 642 \begin{verbatim}
 643 REMARK CLUSTER    1 FREE ENERGY  -7.65228E+01 AVE RMSD 8.22
 644 REMARK 1BDD L18G full clust ENERGY    -7.33241E+01 RMS  10.40
 645 ATOM      1  CA  VAL     1      18.059 -33.585   4.616  1.00  5.00
 646 ATOM      2  CB  VAL     1      18.720 -32.797   3.592  1.00  5.00
 647 .
 648 .
 649 .
 650 ATOM    115  CA  LYS    58      29.641 -44.596  -8.159  1.00  5.00
 651 ATOM    116  CB  LYS    58      27.593 -45.927  -8.930  1.00  5.00
 652 TER
 653 CONECT    1    3    2
 654 CONECT    3    5    4
 655 .
 656 .
 657 CONECT  113  114
 658 CONECT  115  116
 659 TER
 660 REMARK 1BDD L18G full clust ENERGY    -7.33240E+01 RMS  10.04
 661 ATOM      1  CA  VAL     1       3.174   2.833 -34.386  1.00  5.00
 662 ATOM      2  CB  VAL     1       3.887   2.811 -33.168  1.00  5.00
 663 .
 664 .
 665 ATOM    115  CA  LYS    58      16.682   6.695 -20.438  1.00  5.00
 666 ATOM    116  CB  LYS    58      18.925   5.540 -20.776  1.00  5.00
 667 TER
 668 CONECT    1    3    2
 669 CONECT    3    5    4
 670 CONECT  113  114
 671 CONECT  115  116
 672 TER
 673 \end{verbatim}
 674
 675 \subparagraph{Average-structure file\\ \\}
 676 \label{sect:inoutfiles:PDB:clust-unres:average}
 677
 678 The file name is COORD\_T\_xxxK\_ave.pdb. The entries are in pairs; the first
 679 one is the cluster-averaged conformation and the second is the family member that
 680 has the lowest rmsd from this average conformation. Computing average
 681 conformations is explained in section 2.5 of ref 3. Example excerpts from
 682 an entry corresponding to a given family are shown below.
 683
 684 \begin{verbatim}
 685 REMAR AVERAGE CONFORMATIONS AT TEMPERATURE  300.00
 686 REMARK CLUSTER    1
 687 REMARK 2HEP clustering 300K ENERGY    -8.22572E+01 RMS   3.29
 688 ATOM      1  CA  MET     1     -17.748  48.148 -19.284  1.00  5.96
 689 ATOM      2  CB  MET     1     -17.373  47.911 -19.294  1.00  6.34
 690 ATOM      3  CA  ILE     2     -18.770  49.138 -18.133  1.00  3.98
 691 .
 692 .
 693 .
 694 ATOM     80  CB  PHE    41     -14.353  44.680 -15.642  1.00  2.62
 695 ATOM     81  CA  ARG    42     -11.619  41.645 -13.117  1.00  4.06
 696 ATOM     82  CB  ARG    42     -11.330  40.378 -13.313  1.00  5.19
 697 TER
 698 CONECT    1    3    2
 699 CONECT    3    5    4
 700 .
 701 .
 702 .
 703 CONECT   76   78   77
 704 CONECT   78   79
 705 CONECT   79   80
 706 CONECT   81   82
 707 TER
 708 REMARK 2HEP clustering 300K ENERGY    -8.22572E+01 RMS   3.29
 709 ATOM      1  CA  MET     1     -37.698  40.489 -32.408  1.00  5.96
 710 ATOM      2  CB  MET     1     -38.477  39.426 -34.159  1.00  6.34
 711 .
 712 .
 713 .
 714 ATOM     80  CB  PHE    41     -35.345  50.342 -31.371  1.00  2.62
 715 ATOM     81  CA  ARG    42     -33.603  54.332 -27.130  1.00  4.06
 716 ATOM     82  CB  ARG    42     -33.832  53.074 -24.415  1.00  5.19
 717 TER
 718 CONECT    1    3    2
 719 CONECT    3    5    4
 720 .
 721 .
 722 .
 723 CONECT   76   78   77
 724 CONECT   78   79
 725 CONECT   79   80
 726 CONECT   81   82
 727 TER
 728 \end{verbatim}
 729
 730 \subsection{The conformation-distance file}
 731 \label{sect:inoutfiles:confdist}
 732
 733 The file name is INPUT\_clust.rms. It contains the upper-triangular part of
 734 the matrix of rmsds between conformations and differences between their
 735 energies:
 736
 737 i,j,rmsd,energy(j)-energy(i) (format 2i5,2f10.5)
 738
 739 where i and j (j$>$i) are the numbers of the conformations, rmsd is the rmsd
 740 between conformation i and conformation j and energy(i) and energy(j) are
 741 the UNRES energies of conformations i and j, respectively.
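
Because the records are written in a fixed format (2i5,2f10.5), they can be read by column position: columns 1-5 and 6-10 hold i and j, columns 11-20 hold the rmsd, and columns 21-30 hold the energy difference. The sketch below reads such records; the function names are hypothetical and the code assumes that every non-blank line follows this format.

\begin{verbatim}
```python
def parse_rms_line(line):
    """Split one record written with Fortran format (2i5,2f10.5)."""
    i = int(line[0:5])           # number of the first conformation
    j = int(line[5:10])          # number of the second conformation (j > i)
    rmsd = float(line[10:20])    # rmsd between conformations i and j
    de = float(line[20:30])      # energy(j) - energy(i)
    return i, j, rmsd, de


def read_rms(lines):
    """Parse all non-blank records, e.g. the lines of an open file."""
    return [parse_rms_line(line) for line in lines if line.strip()]
```
\end{verbatim}

Reading by column rather than splitting on blanks is the safer choice here, because adjacent fixed-width fields are not guaranteed to be separated by a space.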
 742
 743 \subsection{The clustering-tree PicTeX file}
 744 \label{sect:inoutfiles:tree}
 745
 746 This file contains the PicTeX code of the clustering tree. The file name is
 747 INPUT\_clust.tex. It should be supplemented with a LaTeX preamble and final
 748 commands or incorporated into a LaTeX source and compiled with LaTeX. The
 749 picture is produced by running LaTeX followed by dvips, dvipdf or another command
 750 that converts LaTeX-generated dvi files into human-readable files.
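
A minimal wrapper of this kind might look as follows. It is a sketch under assumptions: it presumes that the pictex package is installed, that the generated file contains a complete picture that can simply be read with \verb|\input|, and the wrapper file name (say, tree.tex) is hypothetical.

\begin{verbatim}
\documentclass{article}
\usepackage{pictex}   % assumption: PicTeX macros are provided by this package
\begin{document}
\input{INPUT_clust.tex}
\end{document}
\end{verbatim}

Running latex on the wrapper and then dvips (or dvipdf) on the resulting dvi file would produce the tree picture.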
 751
 752 \newpage
 753
 754 \section{SUPPORT}
 755 \label{sect:support}
 756
 757    Dr. Adam Liwo\\
 758    Faculty of Chemistry, University of Gdansk\\
 759    ul. Wita Stwosza 63, 80-308 Gdansk, Poland\\
 760    phone: +48 58 523 5124\\
 761    fax: +48 58 523 5012\\
 762    e-mail: \href{mailto:adam@sun1.chem.univ.gda.pl}{\textcolor{blue}{adam@sun1.chem.univ.gda.pl}}\\
 763
 764    Dr. Cezary Czaplewski\\
 765    Faculty of Chemistry, University of Gdansk\\
 766    ul. Wita Stwosza 63, 80-308 Gdansk, Poland\\
 767    phone: +48 58 523 5126\\
 768    fax: +48 58 523 5012\\
 769    e-mail: \href{mailto:cezary.czaplewski@ug.edu.pl}{cezary.czaplewski@ug.edu.pl}
 770
 771
 772 Prepared by Adam Liwo, 02/19/12
 773
 774 \LaTeX{} version, 09/28/12
 775
 776 Revised by Adam Liwo, 12/04/14
 777
 778 \end{document}