WatCH Information Page
Protein Structural Analysis Laboratory
Department of Biochemistry & Molecular Biology
Michigan State University
WatCH
WatCH is a tool for the calculation of conserved water sites in a series of related structures in Protein Data Bank files. WatCH was developed by members of the Protein Structural Analysis and Design Laboratory at Michigan State University. When several structures are superimposed, complete linkage cluster analysis provides an objective technique for resolving the continuum of overlaps between water sites into the set of maximally dense microclusters of overlapping water molecules, and also avoids reliance on any one structure as a reference.
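The complete-linkage idea described above can be illustrated with a short Python sketch. It is not the WatCH source code: the function name, the naive pairwise search, and the choice to merge while the cluster distance is at or below the threshold are all assumptions made for illustration. The default of 2.4 matches the default threshold given in the usage message.

```python
import math


def complete_linkage(points, threshold=2.4):
    """Group points so that no two members of a cluster are farther
    apart than `threshold`.

    points: dict mapping an integer label to an (x, y, z) tuple.
    Returns a list of clusters, each a list of labels.
    Illustrative sketch only; not the WatCH implementation.
    """
    clusters = [[label] for label in points]

    def span(a, b):
        # Complete linkage: the distance between two clusters is the
        # LARGEST distance between any pair of their members.
        return max(math.dist(points[p], points[q]) for p in a for q in b)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = span(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:  # assumption: merge while d <= threshold
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


waters = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0),
          3: (3.0, 0.0, 0.0), 4: (10.0, 0.0, 0.0)}
print(complete_linkage(waters))  # [[1, 2], [3], [4]]
```

Point 3 lies within 2.4 of point 2 but 3.0 from point 1, so it is not added to their cluster. This is the property that keeps each cluster compact and independent of any single reference structure.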
Publications related to WatCH include the following:
For information related to WatCH algorithms, please see the section on Algorithmic Details.
Usage information for WatCH can be found in the section on Running WatCH.
Installation and use of WatCH imply agreement with the License Agreement (also included in the distribution package).
System Requirements: WatCH has been compiled with the GNU and Sun C compiler packages on Sun Solaris (SPARC and x86) and IRIX platforms, but should work with most C compilers. The accessory installation, data extraction, and results analysis scripts also require Perl, but the clustering program itself will run without it.
The latest version of WatCH, as well as the most recent version of this documentation, can be downloaded below. To install WatCH, perform the following steps:
zcat WatCH.tar.Z | tar xvf - (tar.Z download)
OR
gzcat WatCH.tar.gz | tar xvf - (tar.gz download)
Before WatCH is run, the input PDB structures must be prepared as described below in Preparing Coordinate Files for Use with WatCH. WatCH is run via the command-line interface with the following usage:
Usage: cluster <point file> [<clustering threshold>] [<output file>]
(if threshold is not given, 2.40 will be used)
(for the output file option to be used, the threshold must be set explicitly)
(point file must in form of <integer label> <X> <Y> <Z>)
It requires an input file of the following form:
# WaterID X Y Z
10934 9.123 -10.345 11.653
Where:
Generation of this file is described below in Coordinate File Preparation.
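As a rough illustration of the point file format, the following Python sketch reads lines of the form <integer label> <X> <Y> <Z> into a dictionary. The function name read_points is hypothetical, and skipping lines that begin with # is an assumption based on the header line in the example above. Whether the cluster program itself accepts such comment lines is not stated here.

```python
def read_points(lines):
    """Parse point-file lines into {label: (x, y, z)}.

    Assumes whitespace-separated fields and treats blank lines and
    lines starting with '#' as non-data (an assumption).
    """
    points = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        label, x, y, z = line.split()
        points[int(label)] = (float(x), float(y), float(z))
    return points


example = ["# WaterID X Y Z", "10934 9.123 -10.345 11.653"]
print(read_points(example))  # {10934: (9.123, -10.345, 11.653)}
```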
The resultant output file is in the following form:
# Column Number
#1 2 3 4 5 6
0 7 4.889 -24.226 27.970 -- 100305 100324 100346 100434 100307 100339 100312
1 6 -0.668 -7.894 25.561 -- 100306 100309 100388 100459 100499 100506
2 4 18.532 -13.080 32.383 -- 100308 100315 100311 100314
Where the columns are:
Running WatCH requires the extraction of coordinates of the water molecules in each of the crystallographic structures of interest to a file as described above.
prep_WatCH <structure ID> <pdb file> > <output file>
prep_WatCH <structure ID> <pdb file> >> <output file>
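To give a sense of what this extraction step involves, the Python sketch below pulls water oxygen coordinates from PDB-format lines using the standard fixed column positions. It is not the prep_WatCH script. In particular, the way the integer label is built here (structure ID multiplied by 10000 plus the residue number) is invented for illustration, since this page does not specify how prep_WatCH constructs its labels. The function name extract_waters is likewise hypothetical.

```python
def extract_waters(structure_id, pdb_lines):
    """Return point-file rows for the water oxygens in a PDB file.

    Hypothetical labelling scheme: structure_id * 10000 + residue number.
    The actual prep_WatCH labelling may differ.
    """
    rows = []
    for line in pdb_lines:
        if len(line) < 54 or line[:6] not in ("HETATM", "ATOM  "):
            continue
        # Residue name is in columns 18-20, atom name in columns 13-16.
        if line[17:20] != "HOH" or line[12:16].strip() != "O":
            continue
        resseq = int(line[22:26])   # residue number, columns 23-26
        x = float(line[30:38])      # columns 31-38
        y = float(line[38:46])      # columns 39-46
        z = float(line[46:54])      # columns 47-54
        label = structure_id * 10000 + resseq
        rows.append(f"{label} {x:.3f} {y:.3f} {z:.3f}")
    return rows
```

Calling the function once per structure and concatenating the returned rows corresponds to using > for the first structure and >> for each subsequent one.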
As described above, the output of WatCH is a file that contains the coordinates for each cluster, calculated as the mean of the coordinates of the cluster's member water molecules, and the cluster contents, i.e. the identifiers of the water molecules that are members of that cluster. There are 3 additional Perl scripts that may be of use for analysis of the results.
check_cluster_contents.pl <check> <cluster file>
where checks are:
-d (find clusters with waters from same structure)
-c (give count of points from each str in each cluster)
-p <point id> (find cluster with specified cluster point)
-s <structure id> (find clusters from a specific structure)
except for -d and -c together, only 1 check may be used
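The -c and -d checks can be pictured with the Python sketch below, which counts how many waters each structure contributes to a cluster and flags clusters containing more than one water from the same structure. This is not the check_cluster_contents.pl script. The function names are hypothetical, and the sketch assumes the caller supplies a structure_of dictionary mapping each water label to its structure ID, because how the script itself recovers the structure from a label is not described here. Cluster lines are assumed to list the member labels after the " -- " separator, as in the output example above.

```python
from collections import Counter


def parse_cluster_line(line):
    """Split a cluster line into (cluster number, list of member labels)."""
    head, _, tail = line.strip().partition(" -- ")
    return int(head.split()[0]), [int(m) for m in tail.split()]


def structure_counts(members, structure_of):
    """Like -c: count the waters each structure contributes to a cluster."""
    return Counter(structure_of[m] for m in members)


def clusters_with_repeats(lines, structure_of):
    """Like -d: cluster numbers holding 2 or more waters from one structure."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        number, members = parse_cluster_line(line)
        if any(n > 1 for n in structure_counts(members, structure_of).values()):
            hits.append(number)
    return hits
```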
cluster2pdb.pl: Convert the cluster output file to a PDB formatted file, containing water molecules representing each cluster centroid. The B-value column is used to hold the degree of conservation for each cluster.
cluster2percent.pl: Convert the cluster output file from one with absolute degrees of conservation, e.g. the cluster is conserved in 4 structures (of, e.g., 8), to one with relative conservation, e.g. the cluster is conserved in 50% of the structures. This script, followed by cluster2pdb.pl, can be used to obtain a PDB formatted file of the cluster centroids with the B-value column containing percentage conservation.
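The absolute-to-relative conversion amounts to dividing by the number of structures, as in this Python sketch. It is not the cluster2percent.pl script: the function names are hypothetical, the second column of a cluster line is assumed to hold the absolute degree of conservation, and the one-decimal output format is an arbitrary choice.

```python
def to_percent(count, n_structures):
    """Relative conservation as a percentage of the structures."""
    return 100.0 * count / n_structures


def convert_line(line, n_structures):
    """Rewrite column 2 of a cluster line as a percentage.

    Assumes column 1 is the cluster number and column 2 the absolute
    degree of conservation; the rest of the line is kept unchanged.
    """
    number, count, rest = line.strip().split(None, 2)
    return f"{number} {to_percent(int(count), n_structures):.1f} {rest}"


print(to_percent(4, 8))  # 50.0
```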
An additional file is provided, an InsightII (Accelrys, Inc) spectrum, which can be used to facilitate coloring of the cluster output by degree of conservation, after using cluster2percent.pl and cluster2pdb.pl. This file is misc/cluster_per.spect and can be read using the InsightII Spectrum/Get command. The spectrum is colored with dark blue as 0% conservation, purplish blue as 50% conservation, purplish red as just over 50% conservation, and dark red as 100% conservation.
Included in the demo directory are a series of files which can be used as a demonstration of WatCH. This demo uses three thrombin structures, PDB codes 1hai, 1hah, and 1vr1.
1hai.pdb
../bin/prep_WatCH 10 1hai.pdb > 1hai.wats
../bin/prep_WatCH 20 1hah.transformed.pdb > 1hah.wats
../bin/prep_WatCH 30 1vr1.transformed.pdb > 1vr1.wats
cat 1hai.wats 1hah.wats 1vr1.wats > all.wats
../bin/cluster all.wats 2.4 all.clusters
../bin/cluster2percent.pl 3 all.clusters > all.clusters.percent
../bin/clusters2pdb.pl all.clusters > all.clusters.pdb
../bin/clusters2pdb.pl all.clusters.percent > all.clusters.percent.pdb
../bin/check_cluster_contents.pl -d all.clusters > all.clusters.samestr
../bin/check_cluster_contents.pl -c all.clusters > all.clusters.strcount
../bin/check_cluster_contents.pl -p 100431 all.clusters \
> all.clusters.point100431
../bin/check_cluster_contents.pl -s 30 all.clusters > all.clusters.str30
WatCH uses complete-linkage hierarchical cluster analysis to calculate clusters of bound water molecules. Details of this algorithm and its application to thrombin and trypsin can be found in the following reference:
Scientific inquiries concerning WatCH should be directed to Leslie Kuhn at: kuhnlab@msu.edu