Installing and Using WatCH

A tool for identifying conserved water sites and analyzing their degree of conservation using hierarchical cluster analysis of superimposed related Protein Data Bank structures.

  1. Introduction
  2. Getting and Installing WatCH
  3. Running WatCH
  4. Output Files Produced by WatCH
  5. Demonstration Files
  6. Algorithmic Details of WatCH
  7. Contact Information

Introduction

WatCH is a tool for the calculation of conserved water sites in a series of related structures in Protein Data Bank files. WatCH was developed by members of the Protein Structural Analysis and Design Laboratory at Michigan State University. When several structures are superimposed, complete linkage cluster analysis provides an objective technique for resolving the continuum of overlaps between water sites into the set of maximally dense microclusters of overlapping water molecules, and also avoids reliance on any one structure as a reference.

For literature references related to WatCH, please see the section on Algorithmic Details.

Getting and Installing WatCH

Installation and use of WatCH imply agreement with the attached License Agreement (also included in the distribution package).

System Requirements: WatCH has been compiled with the GNU and Sun C compilers on Sun Solaris (SPARC and x86) and on IRIX, but should work with most C compilers. The accessory installation, data extraction, and results analysis scripts also require Perl, but the clustering program itself will run without it.

The latest version of WatCH, as well as the most recent version of this documentation, can be found at the home page for the Protein Structural Analysis and Design Laboratory, Department of Biochemistry and Molecular Biology, Michigan State University. To install WatCH, perform the following steps:

Running WatCH

Overview of usage

Before WatCH is run, the input PDB structures must be prepared as described below in Preparing Coordinate Files for Use with WatCH.

WatCH is run via the command-line interface with the following usage:

Usage: cluster <point file> [<clustering threshold>] [<output file>] 
   (if threshold is not given, 3.60 will be used)
   (for the output file option to be used, the threshold must be set explicitly)
   (point file must in form of <integer label> <X> <Y> <Z>)
It requires an input file of the following form:
# WaterID X     Y       Z
10934     9.123 -10.345  11.653
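where WaterID is the integer label of a water molecule and X, Y, and Z are its coordinates. Generation of this file is described below in Preparing Coordinate Files for Use with WatCH.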
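
For readers who want to check a point file before clustering, the following is a minimal Python sketch of a reader for the `<integer label> <X> <Y> <Z>` layout shown above. It is not part of the WatCH distribution; the function name read_points is hypothetical, and skipping blank lines and lines beginning with '#' is an assumption made for this sketch (the cluster program itself may or may not accept such lines).

```python
def read_points(lines):
    """Parse '<integer label> <X> <Y> <Z>' records into tuples.

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    points = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        label, x, y, z = line.split()[:4]
        points.append((int(label), float(x), float(y), float(z)))
    return points


sample = """# WaterID X     Y       Z
10934     9.123 -10.345  11.653
"""
points = read_points(sample.splitlines())
print(points)
```

Each record becomes a (label, x, y, z) tuple, so a malformed line fails loudly with a ValueError rather than being silently dropped.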

The resultant output file is in the following form:

# Column Number
#1       2     3       4       5        6
0        7    4.889 -24.226  27.970  -- 100305 100324 100346 100434 100307 100339 100312 
1        6   -0.668  -7.894  25.561  -- 100306 100309 100388 100459 100499 100506 
2        4   18.532 -13.080  32.383  -- 100308 100315 100311 100314 
where the columns are:
  1. Cluster Number,
  2. Cluster Conservation (i.e. number of water molecules in that cluster),
  3. X coordinate of the cluster,
  4. Y coordinate of the cluster,
  5. Z coordinate of the cluster, and
  6. Cluster contents (i.e. which water molecules are in that cluster).
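
The column layout above can be read back with a few lines of Python. This is a sketch only: the Cluster type and the function names are hypothetical, and it assumes that the literal token "--" always separates the coordinates from the cluster contents, as in the example lines shown.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    number: int          # column 1
    conservation: int    # column 2
    x: float             # column 3
    y: float             # column 4
    z: float             # column 5
    members: List[int]   # column 6


def parse_cluster_line(line: str) -> Cluster:
    tokens = line.split()
    sep = tokens.index("--")  # assumed separator before the member list
    number, count, x, y, z = tokens[:sep]
    members = [int(t) for t in tokens[sep + 1:]]
    return Cluster(int(number), int(count), float(x), float(y), float(z), members)


def read_clusters(lines):
    # Skip blank lines and the '#' header lines.
    return [parse_cluster_line(l) for l in lines
            if l.strip() and not l.lstrip().startswith("#")]


sample = [
    "# Column Number",
    "#1       2     3       4       5        6",
    "0        7    4.889 -24.226  27.970  -- 100305 100324 100346 100434 100307 100339 100312 ",
    "1        6   -0.668  -7.894  25.561  -- 100306 100309 100388 100459 100499 100506 ",
    "2        4   18.532 -13.080  32.383  -- 100308 100315 100311 100314 ",
]
clusters = read_clusters(sample)
for c in clusters:
    print(c.number, c.conservation, len(c.members))
```

In the example lines, the conservation in column 2 equals the number of identifiers listed in column 6, which makes a convenient consistency check.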

Preparing Coordinate Files for Use with WatCH

Running WatCH requires the extraction of coordinates of the water molecules in each of the crystallographic structures of interest to a file as described above.

  1. Superimpose each of the structures in question to a common reference frame. This can be done using InsightII's Transform/Superimpose command or via another superposition method.
  2. Extract the WaterID and coordinate information for water molecules in each of the structures of interest using the prep_WatCH script. This must be run for each superimposed structure, giving each structure a unique identifier. This script is run as follows:

Analysis of WatCH output

As described above, the output of WatCH is a file containing, for each cluster, its coordinates (calculated as the mean of the coordinates of the cluster's member water molecules) and its contents, i.e. the identifiers of the water molecules that are members of that cluster. There are 3 additional Perl scripts that may be of use for analysis of the results.

An InsightII (Accelrys, Inc) spectrum file is also provided to facilitate coloring of the cluster output by degree of conservation after using cluster2percent.pl and cluster2pdb.pl. This file is misc/cluster_per.spect and can be read using the InsightII Spectrum/Get command. The spectrum runs from dark blue at 0% conservation, through purplish blue at 50% and purplish red at just over 50%, to dark red at 100% conservation.
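
As an illustration of what a percentage conservation value means, here is a small Python sketch. It assumes that relative conservation is simply the cluster's member count divided by the number of superimposed structures, expressed as a percentage; this is an assumption about the conversion, not a description of what cluster2percent.pl actually does, and the function name is hypothetical.

```python
def percent_conservation(member_count, n_structures):
    """Relative conservation as a percentage of the structures clustered.

    Assumption: each structure contributes at most one water per cluster,
    so the value stays within 0-100.
    """
    if n_structures <= 0:
        raise ValueError("n_structures must be positive")
    return 100.0 * member_count / n_structures


# Three superimposed structures, clusters holding 1, 2 and 3 waters.
for count in (1, 2, 3):
    print(count, round(percent_conservation(count, 3), 1))
```

If a cluster contains more than one water molecule from the same structure, this simple ratio can exceed 100, so such clusters are worth checking separately.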

Cluster Conservation Spectrum

Demonstration Files

Included in the demo directory are a series of files which can be used as a demonstration of WatCH. This demo uses three thrombin structures, PDB codes 1hai, 1hah, and 1vr1.
  1. Superimpose 1hah and 1vr1 onto 1hai using the catalytic triad residues (His H57, Asp H102, & Ser H195). This gives the following files: 1hai.pdb (untransformed as it was used as the base for superposition), 1hah.transformed.pdb, and 1vr1.transformed.pdb.
  2. Extract the water molecule data from each file:
    ../bin/prep_WatCH 10 1hai.pdb > 1hai.wats
    ../bin/prep_WatCH 20 1hah.transformed.pdb > 1hah.wats
    ../bin/prep_WatCH 30 1vr1.transformed.pdb > 1vr1.wats
  3. Concatenate the files into a single input file:
    cat 1hai.wats 1hah.wats 1vr1.wats > all.wats
  4. Run the clustering program:
    ../bin/cluster all.wats 3.6 all.clusters
  5. Transform from absolute to relative conservation:
    ../bin/cluster2percent.pl 3 all.clusters > all.clusters.percent
  6. Transform the cluster files into PDB files for visualization with InsightII, RasMol, or another molecular visualization program:
    ../bin/clusters2pdb.pl all.clusters > all.clusters.pdb
    ../bin/clusters2pdb.pl all.clusters.percent > all.clusters.percent.pdb
  7. Determine which clusters have water molecules from the same structure:
    ../bin/check_cluster_contents.pl -d all.clusters > all.clusters.samestr
  8. Count the number of water molecules from each structure in each cluster:
    ../bin/check_cluster_contents.pl -c all.clusters > all.clusters.strcount
  9. Identify the cluster which contains water molecule number 431 from 1hai (structure ID 10; water ID 100431):
    ../bin/check_cluster_contents.pl -p 100431 all.clusters > all.clusters.point100431
  10. Identify clusters which contain water molecules from 1vr1 (structure ID 30):
    ../bin/check_cluster_contents.pl -s 30 all.clusters > all.clusters.str30
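
The demo steps above combine a structure ID (10, 20, 30) with a water number into a single water ID, e.g. structure 10 and water 431 give 100431. The Python sketch below shows how such IDs could be split again and tallied per structure. It assumes the ID is the structure ID multiplied by 10000 plus the water number, which is inferred from that single example and may not hold for every input; the function names are hypothetical and this is not the check_cluster_contents.pl implementation.

```python
from collections import Counter

# Assumption inferred from "structure ID 10; water ID 100431".
WATERS_PER_STRUCTURE = 10000


def split_water_id(water_id):
    """Return (structure_id, water_number) for a combined water ID."""
    return divmod(water_id, WATERS_PER_STRUCTURE)


def count_by_structure(members):
    """Count how many waters each structure contributes to one cluster."""
    return Counter(split_water_id(m)[0] for m in members)


def find_cluster(clusters, water_id):
    """Return the number of the cluster containing water_id, or None."""
    for number, members in clusters.items():
        if water_id in members:
            return number
    return None


# Hypothetical cluster contents, keyed by cluster number.
clusters = {0: [100431, 200017, 300102], 1: [100012, 100013, 300044]}
print(split_water_id(100431))
print(count_by_structure(clusters[1]))
print(find_cluster(clusters, 100431))
```

A per-structure count above 1, as in cluster 1 of this made-up data, marks a cluster that holds several waters from the same structure.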

Algorithmic Details of WatCH

WatCH uses complete-linkage hierarchical cluster analysis to calculate clusters of bound water molecules. Details of this algorithm and its application to thrombin and trypsin can be found in the following reference:

P. C. Sanschagrin and L. A. Kuhn
Cluster Analysis of Consensus Water Sites in Thrombin and Trypsin Shows Conservation between Serine Proteases and Contributions to Ligand Specificity
Prot. Sci., Vol. 7, No. 10, October 1998, pp 2054-2064.
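
To make the idea of complete-linkage clustering concrete, the following is a naive Python sketch (it requires Python 3.8 or later for math.dist). It is not the WatCH implementation and makes no claim to match its results exactly: it assumes that merging stops once the smallest complete-linkage distance exceeds the threshold, and it uses a simple repeated search for the closest pair that is only practical for small inputs. The cluster position is taken as the mean of the member coordinates, as described for the WatCH output.

```python
import math
from itertools import combinations


def linkage_distance(a, b):
    """Complete linkage: the largest distance between any two members."""
    return max(math.dist(p[1:], q[1:]) for p in a for q in b)


def complete_linkage(points, threshold=3.6):
    """Cluster (label, x, y, z) tuples; merge while linkage <= threshold."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage_distance(clusters[i], clusters[j]) > threshold:
            break  # assumed stopping rule: no pair is close enough to merge
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because i < j
    return clusters


def centroid(cluster):
    """Mean x, y, z of the member waters."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in (1, 2, 3))


# Made-up coordinates: three nearby waters and a distant pair.
points = [
    (100001, 0.0, 0.0, 0.0),
    (200001, 1.0, 0.0, 0.0),
    (300001, 0.0, 1.0, 0.0),
    (100002, 10.0, 0.0, 0.0),
    (200002, 10.5, 0.0, 0.0),
]
result = complete_linkage(points)
for number, cluster in enumerate(result):
    print(number, len(cluster), centroid(cluster), [p[0] for p in cluster])
```

Because complete linkage uses the largest pairwise distance, every water in a resulting cluster lies within the threshold of every other member, which is what keeps the clusters compact.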

Contact Information

Inquiries, bug reports, etc. should be directed to Dr. Leslie Kuhn at the following email address:

kuhn@agua.bch.msu.edu