
DPPH - An efficient program for deducing haplotypes from genotypes.
It determines whether the genotypes can be explained by haplotypes that
fit a tree model (i.e., a perfect phylogeny, a coalescent). This is the
Perfect Phylogeny Haplotyping (PPH) problem.
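To make "fit a tree model" concrete, the C++ sketch below tests whether a set of already-known binary haplotypes (one per row) fits a perfect phylogeny. It uses two classical conditions on pairs of sites. With an all-0 ancestral vector, no two sites may show all three pairs 01, 10 and 11; any other ancestral vector is handled by first flipping the sites whose ancestral state is 1. With no known root, no two sites may show all four pairs 00, 01, 10 and 11 (the four-gamete test). This only illustrates the tree condition on haplotypes. It is not the DPPH algorithm, which starts from genotypes and must also choose the haplotypes. The type and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type: a haplotype is a vector of 0/1 site states.
using Haplotype = std::vector<int>;

// Rooted test: the rows fit a perfect phylogeny whose root is `root` iff,
// after flipping every site where root[j] == 1 (so the root becomes all-0),
// no pair of sites contains all three of the pairs 01, 10 and 11.
bool fitsRootedPerfectPhylogeny(const std::vector<Haplotype>& rows,
                                const Haplotype& root) {
    const std::size_t m = root.size();
    for (std::size_t p = 0; p < m; ++p) {
        for (std::size_t q = p + 1; q < m; ++q) {
            bool seen01 = false, seen10 = false, seen11 = false;
            for (const Haplotype& h : rows) {
                int a = h[p] ^ root[p];  // recode so the root state is 0
                int b = h[q] ^ root[q];
                if (a == 0 && b == 1) seen01 = true;
                if (a == 1 && b == 0) seen10 = true;
                if (a == 1 && b == 1) seen11 = true;
            }
            if (seen01 && seen10 && seen11) return false;
        }
    }
    return true;
}

// Unrooted test (four-gamete test): no pair of sites may contain all four
// pairs 00, 01, 10 and 11. Entries are assumed to be 0 or 1.
bool fitsUnrootedPerfectPhylogeny(const std::vector<Haplotype>& rows) {
    if (rows.empty()) return true;
    const std::size_t m = rows[0].size();
    for (std::size_t p = 0; p < m; ++p) {
        for (std::size_t q = p + 1; q < m; ++q) {
            bool seen[2][2] = {{false, false}, {false, false}};
            for (const Haplotype& h : rows) seen[h[p]][h[q]] = true;
            if (seen[0][0] && seen[0][1] && seen[1][0] && seen[1][1]) return false;
        }
    }
    return true;
}
```

For example, the haplotypes 01, 10 and 11 fail the rooted test with the all-0 ancestral vector, pass the unrooted test, and pass the rooted test again when the ancestral vector is 11.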

We now have three distinct programs to solve the PPH problem. Program
DPPH is the fastest of the three programs and also outputs a
representation of all the solutions.

This package was written by Ren-Hua Chung at U.C. Davis Computer Science under the
direction of Dan Gusfield.
Copyright (C) 2002 R.H. Chung and D. Gusfield

We make no warranties or guarantees, assume no liability and grant no 
rights for commercial use. 

To cite the general problem and first results, please use:

D. Gusfield,
"Haplotyping as Perfect Phylogeny: Conceptual Framework and Efficient Solutions",
In Proceedings of RECOMB, the Sixth Annual Conference on Research in
Computational Molecular Biology, April 2002.

To cite the specific method implemented here, please use:

V. Bafna, D. Gusfield, G. Lancia and S. Yooseph,
"Haplotyping as Perfect Phylogeny: A Direct Approach",
Technical Report CSE-2002-21, Department of Computer Science, UC Davis,
July 17, 2002. To appear in Journal of Computational Biology.


For either of these and other papers on haplotyping, see
wwwcsif.cs.ucdavis.edu/~gusfield/paperlist.html

This package contains these files:

pph.out - the main program
dpph - another copy of the main program
check.out - checks the consistency of the output
extract1.pl - extracts haplotype to genotype of format 1
extract2.pl - extracts haplotype to genotype of format 2
extract3.pl - extracts haplotype to genotype of format 3
extract4.pl - extracts haplotype to genotype of format 4
extract5.pl - extracts haplotype to genotype of format 5
extract6.pl - extracts haplotype to genotype of format 6
extract7.pl - extracts haplotype to genotype of format 7
change01.pl - converts files produced by Hudson's program
FORMAT.txt - describes the formats of the input files
README.txt - this file
f1 - an example for format 1
f2 - an example for format 2
f3 - an example for format 3
f4 - an example for format 4
f5 - an example for format 5
f6 - an example for format 6
f7 - an example for format 7

The program is mostly written in C++, but it also uses several small Perl
programs, so you need a Perl interpreter installed.
Perl is standard on most Unix and Linux systems and is easily obtained
for Mac OS X and Windows.

*************
Installation:
*************

1. Uncompress the tar file.
2. Enter the sub-directory where the untarred files are.
3. There should be an executable file "pph.out".

You will need both a C++ compiler and a Perl interpreter installed. Both
are typically already installed on Unix systems.

*********************************
Using this program, step by step:
*********************************

1. Type "pph.out" or "dpph" to start the program (use "./pph.out" if the
   current directory is not on your PATH).

2. Please input the filename:
   Enter the name of the file that contains the genotype or haplotype data.
   
3. Please input the number of the file format: ( Please refer to the file "FORMAT.txt" ):
   The program reads seven input formats, which are described in the file
   "FORMAT.txt". Enter a number from 1 to 7.
   
4. Enter the name of the file that holds the ancestral vector, i.e., the binary
   vector specifying the character states at the root of the tree.

5. Two special cases are of particular importance. If the ancestral states are
   all 0, just enter "d" for the default case. If you do not know the ancestral
   states, enter "m" for the majority vector; the program will then determine
   whether there is an unrooted tree that could have produced the given genotypes.
   
6. The program generates two output files, "output1" and "output2". "output1"
   reports a solution for the input and its perfect phylogeny tree. "output2"
   reports the same information plus the class partitions that represent the
   set of all solutions. The output formats are described in the file FORMAT.txt.
   
7. The program also checks the consistency of its output: it verifies that each
   output haplotype pair actually corresponds to the correct input genotype
   vector, and reports the result of this verification.
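The consistency check in step 7 amounts to a site-by-site comparison. The C++ sketch below assumes the 0/1/2 genotype convention common in the PPH literature (0 and 1 for the two homozygous states, 2 for a heterozygous site); the input formats this package actually reads are described in FORMAT.txt and may differ. The names pairExplainsGenotype and solutionIsConsistent are hypothetical and not part of this package.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Assumed encoding (hypothetical; see FORMAT.txt for the real formats):
// genotype entries are 0 or 1 (homozygous) or 2 (heterozygous);
// haplotype entries are 0 or 1.
using Genotype = std::vector<int>;
using Haplotype = std::vector<int>;

// True if the pair (h1, h2) explains genotype g: a homozygous entry must be
// copied into both haplotypes, and a heterozygous entry (2) must receive a 0
// in one haplotype and a 1 in the other.
bool pairExplainsGenotype(const Haplotype& h1, const Haplotype& h2,
                          const Genotype& g) {
    if (h1.size() != g.size() || h2.size() != g.size()) return false;
    for (std::size_t j = 0; j < g.size(); ++j) {
        if (g[j] == 2) {
            bool split = (h1[j] == 0 && h2[j] == 1) || (h1[j] == 1 && h2[j] == 0);
            if (!split) return false;
        } else if (h1[j] != g[j] || h2[j] != g[j]) {
            return false;
        }
    }
    return true;
}

// Checks a whole solution: pairs[i] must explain genotypes[i].
bool solutionIsConsistent(
    const std::vector<Genotype>& genotypes,
    const std::vector<std::pair<Haplotype, Haplotype>>& pairs) {
    if (genotypes.size() != pairs.size()) return false;
    for (std::size_t i = 0; i < genotypes.size(); ++i) {
        if (!pairExplainsGenotype(pairs[i].first, pairs[i].second, genotypes[i]))
            return false;
    }
    return true;
}
```

For instance, the pair 0101 / 0110 explains the genotype 0122, while 0100 / 0101 does not, because the third site is heterozygous in the genotype but 0 in both haplotypes.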

Please report bugs or suggestions to Ren-Hua Chung at rchung@ucdavis.edu. Thanks.
