
INPUT/OUTPUT FORMATS for the PPH package.
In the program PPH, when asked for the format of the input data, select
one of the numbers 1 through 7 as explained below. The input format
determines the output format.

*****************
The INPUT format:
*****************

Format 1: Please input 1 for this format while running the program.
There are only "haplotypes" (two binary vectors per individual) in the input file.
Example:
	011101 --\
	000101 --/ pair
	111001 --\
	000011 --/ pair
	.... 
Please see file f1 for a complete example.
	
Format 2: Please input 2 for this format while running the program.
There are labels before each pair of haplotypes.
Example:
	label1:   
	011101 --\
	000101 --/ pair
	label2:
	111001 --\
	000110 --/ pair
	....
Please see file f2 for a complete example.
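
As a rough illustration of the labelled layout, the following Python sketch
reads Format 2 text into (label, haplotype, haplotype) triples. The function
name read_labelled_pairs is hypothetical and is not part of PPH. The sketch
assumes that each label sits on its own line and ends with a colon, as in the
example above, and that the real file holds only the binary strings (the
"--\" and "--/ pair" marks above are taken to be annotations).

```python
def read_labelled_pairs(text):
    """Parse Format 2 text into a list of (label, hap1, hap2) tuples.

    Hypothetical helper, not part of PPH. Assumes: label line ends with ':',
    followed by exactly two binary haplotype lines.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected a label line followed by two haplotype lines")
    records = []
    for i in range(0, len(lines), 3):
        label, hap1, hap2 = lines[i:i + 3]
        if not label.endswith(":"):
            raise ValueError(f"line does not look like a label: {label!r}")
        for hap in (hap1, hap2):
            if set(hap) - {"0", "1"}:
                raise ValueError(f"not a binary vector: {hap!r}")
        if len(hap1) != len(hap2):
            raise ValueError(f"pair under {label!r} has unequal lengths")
        records.append((label[:-1].strip(), hap1, hap2))
    return records


sample = "label1:\n011101\n000101\nlabel2:\n111001\n000110\n"
for record in read_labelled_pairs(sample):
    print(record)
```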

Format 3: Please input 3 for this format while running the program.
The input file contains only genotype vectors. 0 and 1 denote homozygous
sites, and 2 denotes heterozygous sites.
Example:
	012200
	221001
	011100
	...
Please see file f3 for a complete example.
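
The 0/1/2 coding can be illustrated by deriving a genotype vector from a pair
of binary haplotypes: a site where both haplotypes agree keeps that value, and
a site where they differ becomes 2. The function name genotype_from_pair is
hypothetical and is not part of PPH; this is a minimal sketch of the coding,
not the program's own code.

```python
def genotype_from_pair(hap1, hap2):
    """Return the 0/1/2 genotype string implied by two binary haplotypes.

    Hypothetical helper, not part of PPH.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have the same length")
    # Equal alleles keep their value (homozygous); differing alleles become 2.
    return "".join(a if a == b else "2" for a, b in zip(hap1, hap2))


print(genotype_from_pair("011101", "000101"))  # prints 022101
```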
	
Format 4: Please input 4 for this format while running the program.
There are labels before each genotype vector.
Example:
	label1:
	012200
	label2:
	221001
	label3:
	011100
	...
Please see file f4 for a complete example.

Format 5: Please input 5 for this format while running the program.
Nucleotides 'A','T','C' and 'G' are used to represent the "haplotypes".
Example:
	AACCTG --\
	TTGGAC --/ pair
	TAGCAC --\
	ATCGTG --/ pair
	....
Please see file f5 for a complete example.

Format 6: Please input 6 for this format while running the program.
Nucleotides 'A','T','C' and 'G' are used to represent the haplotypes.
There is a label before each pair of haplotypes.
Example:
	label1
	AACCTG --\
	TTGGAC --/ pair
	label2
	TAGCAC --\
	ATCGTG --/ pair
	....
Please see file f6 for a complete example.

Format 7: Please input 7 for this format while running the program.

This format is the same as format 1, but the file is generated by
R. Hudson's coalescent generating program and hence has additional
lines before and after the "haplotypes". Also, Hudson's program uses 1 as
the ancestral state and 0 as the derived state. We reverse these for
consistency before any further processing of the data. We do not return
the data to its original form, so beware if you use this format option.

Please see file f7 for a complete example.
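
The state reversal described above amounts to swapping 0 and 1 in every
haplotype. The Python sketch below shows that swap. Both function names are
hypothetical and are not part of PPH. How PPH itself recognises the haplotype
lines among the additional lines is not described here, so the filter used
below (keep only lines made up entirely of 0s and 1s) is an assumption.

```python
def flip_states(haplotype):
    """Swap 0 and 1 in a binary haplotype string (hypothetical helper)."""
    return haplotype.translate(str.maketrans("01", "10"))


def flipped_haplotypes(lines):
    """Return the flipped haplotypes found in an iterable of text lines.

    Assumption: a haplotype line consists only of the characters 0 and 1;
    every other line is treated as surrounding material and skipped.
    """
    result = []
    for line in lines:
        row = line.strip()
        if row and set(row) <= {"0", "1"}:
            result.append(flip_states(row))
    return result


# Made-up sample lines, for illustration only.
print(flipped_haplotypes(["some header text", "011101", "000101", "trailing text"]))
```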

NOTE: Entries within a row of the input file must not be separated by spaces.

******************
The OUTPUT format:
******************

Format 1:
The output file would be like this:
011101 --\
000101 --/  The original input

022101 -->  The genotype implied by  the two input "haplotypes"

000101 --\
011101 --/  The output haplotypes called by the program
------
....   -->  next pair

Format 2:
The output file would be like this:
label1 : -->  The label
011101 --\
000101 --/  The original input

022101 -->  The genotype corresponding to the input haplotypes

000101 --\
011101 --/  The output haplotypes called by the program
------
...    -->  next pair

Format 3:
The output file would be like this:
022101 -->  The original genotype input

000101 --\
011101 --/  The output
------
...    -->  next pair

Format 4:
The output file would be like this:
label1 : -->  The label
022101 -->  The original input

000101 --\
011101 --/  The output
------
...    -->  next pair

Format 5:
The output file would be like this:
CCAACACCAC --\
CCAACAACAC --/ The input

CCAACAXCAC --> The genotype corresponding to the input haplotypes. X marks
	       a heterozygous site.

CCAACAACAC --\
CCAACACCAC --/ The output
----------
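
For nucleotide data, the X-marked genotype line can be reproduced by comparing
the two input haplotypes site by site. The function name nucleotide_genotype
is hypothetical and is not part of PPH; this is only a sketch of the marking
rule shown above.

```python
def nucleotide_genotype(hap1, hap2):
    """Return the genotype line for two nucleotide haplotypes.

    Hypothetical helper, not part of PPH. Sites where the two haplotypes
    agree keep their nucleotide; sites where they differ are marked X.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have the same length")
    return "".join(a if a == b else "X" for a, b in zip(hap1, hap2))


print(nucleotide_genotype("CCAACACCAC", "CCAACAACAC"))  # prints CCAACAXCAC
```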

Format 6:
The output file would be like this:
label1:    --> The label
CCAACACCAC --\
CCAACAACAC --/ The input

CCAACAXCAC --> The genotype corresponding to the input haplotypes. X marks
	       a heterozygous site.

CCAACAACAC --\
CCAACACCAC --/ The output
----------

Format 7:
011101 --\
000101 --/  The original input

022101 -->  The genotype corresponding to the input haplotypes

000101 --\
011101 --/  The output
------
....   -->  next pair

****************
The TREE output:
****************

At the bottom of the output file, we show the tree associated with the
haplotypes produced, and whether this tree is unique, that is, whether it
is the only PPH solution to the given input. The tree is encoded in New
Hampshire (Newick) format: the encoding is obtained by a depth-first
traversal from the root of the tree. First, a left paren is output. Then,
every time the traversal descends on an edge, a left paren is output,
followed by the label(s) on that edge. Each label is of the form eX, where
X is a column number, and indicates that character X changes state on that
edge. If the edge has no label (a case that can only happen when the edge
goes to a leaf), then the symbol e alone is output, without a number
following it. Every time the traversal ascends on an edge, a right paren is
output. When the traversal reaches a leaf, the leaf identifier is output in
the form pYy, where Y is a row number and y is either a or b, reflecting the
fact that there are two different leaves associated with row Y. Finally,
when the traversal finishes, a right paren is output.

Note: In the tree notation, a leaf will not show up if the corresponding row
contains all 0's in the default case, or more generally, if the haplotype at 
the leaf is identical to the ancestral vector.
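
The traversal described in this section can be sketched in Python as follows.
The Node class and the encode_tokens function are hypothetical and are not
part of PPH. How PPH separates several labels on one edge, or adjacent
symbols in general, is not stated here, so the sketch returns a list of
tokens rather than a formatted string. It also assumes that each leaf carries
exactly one identifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Column numbers of the characters that change state on the edge
    # entering this node (empty for the root or for an unlabeled leaf edge).
    edge_columns: List[int] = field(default_factory=list)
    # Leaf identifier such as "p1a"; None for internal nodes.
    leaf_id: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def encode_tokens(root):
    """Return the tree encoding as a list of tokens (hypothetical helper)."""
    tokens = ["("]  # opening paren output before the traversal starts

    def descend(node):
        tokens.append("(")  # descending on an edge
        if node.edge_columns:
            tokens.extend(f"e{col}" for col in node.edge_columns)
        else:
            tokens.append("e")  # unlabeled edge: e alone, without a number
        if node.leaf_id is not None:
            tokens.append(node.leaf_id)  # reached a leaf
        for child in node.children:
            descend(child)
        tokens.append(")")  # ascending on the edge

    for child in root.children:
        descend(child)
    tokens.append(")")  # closing paren output when the traversal finishes
    return tokens


# Made-up tree: one internal edge labeled with columns 2 and 3, leading to
# leaf p1a on an unlabeled edge and leaf p2a on an edge labeled with column 5.
inner = Node(edge_columns=[2, 3], children=[
    Node(leaf_id="p1a"),
    Node(edge_columns=[5], leaf_id="p2a"),
])
print(" ".join(encode_tokens(Node(children=[inner]))))
```

Joining the tokens with a space, as in the last line, is a display choice made
for this sketch only.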

