Selected (Mostly Recent) Publications
Gusfield
Books:
UC Davis Engineering News Article: Computer scientist Dan Gusfield shapes new disciplines, one book at a time
1. Integer Linear Programming in Computational and Systems Biology: An entry level text and course.
D. Gusfield (Cambridge University Press, June 2019)
Cambridge Press Flyer
2. ReCombinatorics: The Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks
D. Gusfield (MIT Press, July 2014)
Link
to the book on Amazon
Preface, TOC, and part of Intro The book through
the statement of the Central Thesis.
The cover
3.Algorithm on Strings, Trees, and Sequences: Computer Science
and Computational Biology
D. Gusfield (Cambridge Press)
Also, translated and published in Russian. Published in paperback for SouthAsia sales.
4. The Stable Marriage Problem: Structure and Algorithms
D. Gusfield and R. Irving (MIT Press)
New papers related to computation in population genomics

Association Mapping for Compound Heterozygous Traits Using Phenotypic Distance
and Integer Programming
Dan Gusfield and Rasmus Nielsen
Proceedings of 2015 WABI conference
Papers on (MultiState) Perfect Phylogeny

Minimum AverageDistance Clique Trees.
ShouJun Xu, Rob Gysel, Dan Gusfield.
SIAM J. DISCRETE MATH. 2015 Society for Industrial and Applied Mathematics
Vol. 29, No. 3, pp. 1706  1734

Triangulation heuristics for maximum character compatibility.
Rob Gysel, Dan Gusfield, Kristian Stevens
ICCABS 2013: 12 (This won the best paper award for the conference,
but we only published the abstract in the proceedings. The journal version is still in preparation).

Reducing Problems in Unrooted Tree Compatibility to Restricted Triangulations of Intersection Graphs.
Rob Gysel, Kristian Stevens, Dan Gusfield.
WABI 2012: 93105

Constructing Perfect Phylogenies and Proper Triangulations for ThreeState Characters
Rob Gysel, Fumei Lam and Dan Gusfield.
Proceedings of WABI 2011, Springer LNCS
Journal Version:
Algorithms for Molecular Biology 7: 26 (2012)

Reducing MultiState to Binary Perfect Phylogeny with Applications
to Missing, Removable, Inserted and Deleted Data.
Kristian Stevens and Dan Gusfield
Proceedings of WABI 2010, Springer LNCS Vol. 6293 2010, p. 274287

Extensions and Improvements to the Chordal
Graph Approach to the Multistate Perfect
Phylogeny Problem
R. Gysel and D. Gusfield
In Proceedings of the ISBRA Conference, 2010, Springer LNBI Vol. 6054, p. 5260.
 The Threestate Perfect Phylogeny Problem Reduces to 2SAT.
D. Gusfield and Y. Wu
Communicationsand Information Systems, Special Issue Dedicated to Michael Waterman,
Volume 9, No. 4, 2009. Pages 195301

Generalizing the Splits Equivalence Theorem
and Four Gamete Condition: Perfect Phylogeny
on Three State Characters
Fumie Lam, D. Gusfield, S. Sridhar
SIAM J. on Discrete Math. Vol. 25, No. 3, pp. 11441175
This an expanded version of the WABI 2009 conference paper presented September 12, 2009.
The main result of the paper is that there is a threestate perfect phylogeny
(or in other terms, the data is compatible with a threestate tree)
if and only if there is one for every subset of three characters. This generalizes
the binary case where the classical fourgametes condition says that there is a binary
perfect phylogeny in binary data if and only if there is one for every pair of characters.
The paper also shows that there is a set of four subpatterns, each with three characters
and five taxa, such that any threestate
data that does not have a threestate perfect phylogeny must contain one of those four
subpatterns. This generalizes the fourgametes condition which shows that when a pair
of characters does not have a binary perfect phylogeny (is not compatible) the subpattern
of all four binary combinations must appear in that pair of characters.
The techniques are based
on the chordal graph view of perfect phylogeny.

The MultiState Perfect Phylogeny Problem with Missing and
Removable Data; Solutions via Integer Linear Programming and Chordal Graph Theory
Dan Gusfield
In Proceedings of the 2009 RECOMB Conference, May 2009.
Expanded journal version (with complete proofs)
J. of Computational Biology, Vol. 17, no. 3, 2010, p. 383399
For software related to this paper see Software
For powerpoint slides related to this paper see Recent Talks
Papers on Phylogenetic Networks and ARGs with Recombination
 A Resolution of the Static Formulation Question for the Problem of Computing the History Bound.
TCBB, Volume 14, 2017, pages 404417 ,
Julia Matsieva, Steven Kelk, Celine Scornavacca, Chris Whidden, and Dan Gusfield
 Persistent Phylogeny: A GalledTree and IntegerLinear Programming Approach.
Dan Gusfield.
2015 ACM Conference on Bioinformatics, Computational Biology and Health Informatics.

A Decomposition Theory for Phylogenetic Networks and Incompatible Characters
Dan Gusfield, Vikas Bansal, Vineet Bafna, Yun S. Song
Journal of Computational Biology Dec 2007, Vol. 14, No. 10: 12471272.
This is a large expansion of the RECOMB 2005 conference paper listed below. It
solves the major open question from that conference paper and extends the earlier results.

Algorithms to Distinguish the Role of GeneConversion from SingleCrossover Recombination in the Derivation of SNP
Sequences in Populations
Yun S. Song, Zhihong Ding, Dan Gusfield, Charles H. Langley, Yufeng Wu
Journal of Computational Biology Dec 2007, Vol. 14, No. 10: 12731286
This is the extended journal version of the RECOMB 2006 conference paper listed below.
 A new recombination lower bound and the minimum perfect phylogenetic forest problem
Y. Wu and D. Gusfield
LNCS, vol. 4598, Proceedings of the 13'th Annual International Conference on Combinatorics and Computing, 2007,
p. 1626
The Journal version is in Journal of Combinatorial Optimization, Vol 16, 2008
 Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants
Y. Wu and D. Gusfield
LNCS, vol. 4580, Proceedings of the 18'th Annual Symposium on Combinatorial Pattern Matching, 2007,
p. 150161

An efficientlycomputed lower bound on the number of recombinations in phylogenetic networks: Theory and empirical study
link to Science Direct for the final journal manuscript
D. Gusfield, D. Hickerson, S. Eddhu
Discrete Applied Mathematics, Special issue on Computational Biology, Vol. 155, pg. 806830, 2007
preprint in pdf

Efficient Computation of Minimum Recombination with Genotypes (not Haplotypes) (pdf)
Y.Wu and D. Gusfield
In Proceedings of The CSB Conference, Stanford CA., August 2006.
The journal version appeared in J. Bioinformatics and Computational Biology, Vol. 5 No. 2(a), 2007, p. 181200

Algorithms to distinguish the role of geneconversion from singlecrossover recombination
in the derivations of SNP sequences in populations (pdf)
Y.S. Song, Z. Ding, D. Gusfield, C. Langley, Y. Wu
In Proceedings of RECOMB 2006.
ps version
The Journal version is in Journal of Computational Biology, Vol. 14, 2007

Efficient computation of close lower and upper bounds on the minimum number of recombinations in biological sequence evolution
Yun S. Song, Yufeng Wu and Dan Gusfield
Proceedings of ISMB 2005, published as a special issue of Bioinformatics
For the associated software
HapBound and SHRUB

On the FullDecomposition Optimality Conjecture for Phylogenetic Networks (pdf)
D. Gusfield
UCD Technical Report, Jan. 2005

A Fundamental Decomposition Theory for Phylogenetic
Networks and Incompatible Characters (pdf)
click here for postscript version
D. Gusfield and V. Bansal
Proceedings of RECOMB 2005, May 2005
 Optimal, Efficient Reconstruction of RootUnknown Phylogenetic Networks with Constrained and
Structured Recombination (pdf)
click here for postscript version
Technical Report UCDavis CSE200431
D. Gusfield
November 25, 2004,
Journal version appears in J. Computer and Systems Sciences, 2005 Special issue on Computational Biology.
Vol. 70 (2005) p. 381398
 A fundamental Decomposition Theory for Phylogenetic Networks and Incompatible Characters (pdf)
click here for postscript version
Technical Report UCDavis CSE200432
D. Gusfield
October 14, 2004,

Optimal, Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination (pdf)
Click here for Postscript
D. Gusfield, S. Eddhu, C. Langley
Journal of Bioinformatics and Computational Biology, Vol. 2 no. 1 (2004) p. 173213
Special Issue on the 2003 IEEE CSB Bioinformatics Conference.
This is a much expanded journal version of the conference paper listed next.

Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination (pdf)
Click here for Postscript
D. Gusfield, S. Eddhu, C. Langley
Proceedings of the 2003 IEEE CSB Bioinformatics Conference. This won the bestpaper award.
Thank you Hewlett Packard for donating the prize money.

The Fine Structure of Galls in Phylogenetic Networks (pdf)
D. Gusfield, S. Eddhu, C. Langley
INFORMS J. of Computing Special Issue on Computational Biology (2004).
Papers on Haplotype Inference
 Analytical and Algorithmic Methods for Haplotype Frequency Inference: What do they tell us?
S. Orzack, D. Gusfield, L. Subrahmanyan, L. Essioux, S. Lissarrague
in Bioinformatics Algorithms: Techniques and Applications, I. Mandoiu and A. Zelikovsky (eds.), Wiley 2008, p. 373394
 Algorithms for Imperfect Phylogeny Haplotyping with a Single Homoplasy or Recombination Event (pdf),
Y. S. Song, Y. Wu and Gusfield
Proceedings of WABI 2005, October 2005, Lecture Notes in Bioinformatics
In this paper we look at extensions of the PPH problem (see below) to situations where the underlying haplotypes that
form the genotypes do not evolve on
a perfect phylogeny, but rather evolve either on a tree with one homoplasy event (a recurrent or back mutation) or on
a network with one recombination event. Using one empricallyjustified assumption, we present a polynomial time algorithm
for the first problem (the case of one homoplasy event) and an exponentialtime algorithm for the second problem.
We have subsequently developed a polynomialtime algorithm for the second problem as well.
 A LinearTime Algorithm for Perfect Phylogeny Haplotyping (pdf)
Z. Ding, V. Filkov, D. Gusfield
This is the expanded, full journal version of the RECOMB conference paper below.
Journal of Computational Biology, Vol. 13, No. 2, pg. 522553, 2006
 A LinearTime Algorithm for Perfect Phylogeny Haplotyping (pdf)
Z. Ding, V. Filkov, D. Gusfield
Proceedings of RECOMB 2005, May 2005
click here for postscript version
This paper solves the open question of whether the Perfect Phylogeny Haplotyping (PPH) Problem can be solved
in linear time. The method presented is simple enough for easy implementation, and our
implementation is on the web.
 Haplotype Inference (pdf)
D. Gusfield and S.H. Orzack
In CRC Handbook on Bioinformatics, 2006 (S. Aluru Editor), p. 181 1825

A Note on Efficient Computation of Haplotypes via Perfect Phylogeny
Vineet Bafna, Dan Gusfield, Sridhar Hannenhalli, Shibu Yooseph
Journal of Computational Biology, Vol. 11, no. 5, 2004, p. 858866
This paper shows that two prior methods for the PPH problem are unlikely to be
implementable in linear time. It also shows that the problem of finding a PPH solution
that minimizes the number of distinct haplotypes used (over all PPH solutions) is NPhard.
 An Overview of Combinatorial Methods for Haplotype Inference (pdf)
In Computational Methods for SNPs and Haplotype Inference, S. Istrail, M. W
aterman, and A. Clark
(eds). Lecture Notes in Computer Science, vol. 2983, Springer, p. 925, 2004.
click here for postscript version
(2004) A survey of combinatorial algorithms and software
for haplotype inference developed at UC Davis.
D. Gusfield
 Empirical Exploration of Perfect Phylogeny Haplotyping and
Haplotypers (pdf)
click here for postscript version
R.H. Chung and D. Gusfield
Proceedings of the 2003 Cocoon Conference July 2003
Link to the proceedings
This reports on the performance of three perfect phylogeny haplotyping
programs, and on two interesting phenomena observed when solving the
perfect phylogeny haplotyping problem on simulated data.
 Haplotyping by Pure Parsimony (pdf)
click here for postscript version
Technical Report UCDavis CSE20032
D. Gusfield
January 23, 2003,
Final version in The Proceedings of the 2003 Combinatorial
Pattern Matching Conference, June 2003
Link to conference proceedings
The Pure Parsimony problem for Haplotype Inference is to
find the smallest
set of binary strings that can generate an input set of genotypes.
The Pure Parsimony problem is NPhard, and no paper has previously
shown how
an optimal PureParsimony solution
can be computed efficiently for problem instances of the size of current biological
interest.
In this paper, we show how to formulate the PureParsimony problem
as an integer linear program; we explain
how to improve the practicality
of the integer programming formulation; and we present the results of
extensive experimentation we have
done to show the time and memory practicality of the method, and to
compare its accuracy against solutions found by
the widely used general haplotyping program PHASE.
The results are that the Pure Parsimony problem can
be solved efficiently in practice for a
wide range of problem instances of current interest in biology. Both the time needed
for a solution, and the accuracy of the solution, depend on the level of recombination
in the input strings. The speed of the solution improves with increasing recombination,
but the accuracy of the solution decreases with increasing recombination.
 Haplotyping as Perfect Phylogeny: A direct
approach (pdf).
Haplotyping as Perfect Phylogeny: A direct
approach (postscript)
Technical Report UCDavis CSE200221
V. Bafna, D. Gusfield, G. Lancia, S. Yooseph
July 17, 2002.
An augmented version has appeared in Journal of Computational Biology, Vol. 10 No. 3 (2003)
p. 323340.
This develops a more
direct, selfcontained method (than the first paper in the area:
"Haplotyping as Perfect Phylogeny: Conceptual
Framework and Efficient Solutions") determining whether unphased genotype
data can be explained by haplotypes that fit a perfect phylogeny. It
gives a more intuitive and algorithmicly simpler way to create the
representation of all such solutions.
An implementation of this method is available at:
DPPH .
 Haplotyping as Perfect Phylogeny: Conceptual
Framework and Efficient Solutions. (pdf)
Haplotyping as Perfect Phylogeny: Conceptual
Framework and Efficient Solutions (postscript)
D. Gusfield
Appeared in RECOMB 2002, April 2002.
This paper develops an
almostlineartime algorithm to determine if unphased genotype data can
be explained by haplotype pairs which fit a rooted perfect phylogeny. If
there is such an explanation, then the algorithm, in linear additional time,
determines if the solution is unique, and if not, produces a representation
of all the solutions. The method is based on reducing the haplotype problem
to a problem of graph realization. We use this reduction to graph realization,
in the software package
GPPH (previously called PPH). However,
in GPPH, we use a somewhat slower way to solve the graph realization
problem than is described in the paper.
Errata for: Haplotyping as Perfect Phylogeny: Conceptual
Framework and Efficient Solutions (pdf)
Errata for: Haplotyping as Perfect Phylogeny: Conceptual
Framework and Efficient Solutions (postscript)
 Perfect Phylogeny Haplotyper: Haplotype Inferral Using a Tree
Model (pdf)
Perfect Phylogeny Haplotyper: Haplotype Inferral Using a Tree
Model (Postscript)
R.H. Chung and D. Gusfield.
This gives a short description of the program PPH (now called GPPH),
which implements the solution
method discussed in the prior paper and Errata.
Bioinformatics, Vol. 19, No. 6, pp. 780781, 2003.
 Inference of Haplotypes in Samples of
Diploid Populations: Complexity and
Algorithms.(pdf)
Inference of Haplotypes in Samples of
Diploid Populations: Complexity and
Algorithms (postscript)
D. Gusfield
In Journal of Computational Biology, Vol. 8, No. 3, 2001 p. 305323

The absolute and relative accuracy of haplotype inferral methods and a consensus approach
to haplotype inferral. S. H. Orzack, D. Gusfield, V.P. Stanton.
Abstract in
51'th Annual Meeting of the American Society of Human Genetics, Fall 2001.
The full, much expanded
paper (in Genetics, listed next) details how to modify Clark's wellknown
and widely used method of haplotype inferral to greatly boost
its accuracy, making it nearly competitive in accuracy with PHASE,
and vastly faster. This paper is almost unique in the literature
in that it compares the accuracy of the methods to
laboratory determined haplotype data  determined by one of the authors. It is the joint collaboration
of a computer scientist, a geneticist and molecular biologists.

Analysis and Exploration of the Use of RuleBased
Algorithms and Consensus Methods for the
Inferral of Haplotypes
Genetics, Vol. 165, 915928, October 2003
Steven Hecht Orzack, Daniel Gusfield, Jeffrey
Olson, Steven Nesbitt, Lakshman
Subrahmanyanc, and Vincent P. Stanton, Jr.
Papers on Algorithms for RNA Folding
 A Sparsified FourRussians Algorithm for RNA Folding.
Yelena Frid, Dan Gusfield:
2015 Workshop on Algorithms in Bioinformatics (WABI).
 Faster Algorithms for RNAFolding Using the FourRussians Method
Balaji Venkatachalam, Dan Gusfield, and Yelena Frid
Proceedings of The WABI Conference, 2013. Springer LNCS
pages 126140
Journal Version: Algorithms for Molecular Biology 9: 5 (2014)
 Speedup of RNA Pseudoknotted Secondary Structure Recurrence Computation with the FourRussians Method
Yelena Frid and Dan Gusfield
Proceedings of
6th International Conference, COCOA 2012, Banff, AB, Canada, August 59, 2012.
Springer LNCS Vol. 7402
pages 176187
 WorstCase and Practical Speedup for the RNA cofolding Problem using the 4Russians Idea.
Yelena Frid and Dan Gusfield
Proceedings of WABI 2010, Springer LNCS Vol. 6293, 2010, p. 112.

A simple, practical and complete O(n^3/ log n)time Algorithm for RNA folding using the
FourRussians Speedup
Yelena Frid, D. Gusfield
Presented at the WABI 2009 conference, September 13, 2009
Expanded and corrected Journal Version: in
Algorithms for Molecular Biology 2010, 5:13
Other Recent Papers and Proceedings in Computational Biology and Bioinformatics

Untangling Tanglegrams: Comparing Trees by Their Drawings
Venkatachalam, B; Apple, J; St John, K; Gusfield, D,
In Proceedings of BIOINFORMATICS RESEARCH AND APPLICATIONS, 5542: 8899
MAY 1316, 2009
Journal Version: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 7, 2010, p. 588597
 Integer Programming Formulations and Computations Solving Phylogenetic and
Population Genetic Problems with Missing or Genotypic Data
(pdf)
D. Gusfield and Y. Frid and D. Brown
LNCS, vol. 4598, Proceedings of the 13'th Annual International Conference on Combinatorics and Computing, 2007,
p. 5164
 LinearTime Algorithms for Finding
and Representing all Tandem Repeats in a String (pdf)
D. Gusfield and J. Stoye
JCSS, vol. 69, 2004, p. 525546
We show how to decorate a suffix tree for a string with the endpoints of all the distinct substrings of the string that are
tandem repeats. The algorithm runs in lineartime as a function of the length of the string. Of course, such
an algorithmic result would not be possible without the remarkable mathematical result,
proven by A. Frankel and J. Simpson in 1998, that there
can only be a linear number of distinct substrings that are tandem repeats. The actual bound is 2n, for a string of length n.
After the suffix tree is decorated,
almost all imaginable questions about the tandem repeats or tandem arrays (how many occurrences there are, where they occurr,
their average length, the maximum
and minimum lengths from given positions etc.) that were previously studied individually, can be answered efficiently
using this decorated suffix tree.

RECOMB 2004, Proceedings of the Eighth Annual International Conference on Computational Molecular Biology
ACM Press
D. Gusfield (Program Chair), P. Bourne, S. Istrail, P. Pevzner, M. Waterman (editors)

Algorithms in Bioinformatics
Proceedings of the Second International Workshop on Algorithms in
Bioinformatics, WABI 2002, Rome, Italy, September 2002
Springer Lecture Notes in Computer Science Vol. 2452, 554 pages
R. Guigo and D. Gusfield (editors)

String Barcoding: Uncovering Optimal Virus Signatures (pdf)
S. Rash and D. Gusfield, Appeared in RECOMB 2002, April 2002

On the Complexity of Fundamental Computational Problems in Pedigree Analysis (pdf)
On the Complexity of Fundamental Computational Problems in Pedigree Analysis
(postscript)
A. Piccolboni and Dan Gusfield
An improved version has appeared in Journal of Computational Biology, Vol 10, No. 5, 2003,
p. 763774.

Partitiondistance: A problem and class of perfect graphs
arising in clustering (pdf),
Partitiondistance: A problem and class of perfect graphs
arising in clustering (postscript),
Dan Gusfield
Information Processing Letters, Volume 82, Issue 3, 16 May 2002,
Pages 159164

Simple and flexible detection of
contiguous repeats using a suffix tree (pdf) ,
Simple and flexible detection of
contiguous repeats using a suffix tree (postscript) ,
Jens Stoye and Dan Gusfield
Theoretical Computer Science, Volume 270,
Issues 12, 6 January 2002, Pages 843856
Other Recent Papers

The structure and complexity of sports elimination
numbers (pdf)
The structure and complexity of sports elimination
numbers (postscript)
Dan Gusfield and Chip Martel,
In Algorithmica (2002) 32, pages 7386
Hard to Find Older Papers

Efficient Algorithms for Inferring Evolutionary Trees
(pdf)
Efficient Algorithms for Inferring Evolutionary Trees
(ps)
D. Gusfield
I get many requests for this paper from people who can't find it in their libraries. It is
my most cited journal publication. It was also my first submitted paper in computational biology,
submitted around 1985, and rejected (I won't reveal the journal).
It was published in Networks, Vol. 21 (1991) p.
1928. Networks is no longer a major journal (judging by the libraries that don't
receive it, and I guess even by 1991, it was not widely received.). So I am posting
the version of it that I found in my files (after much searching). It may not be
absolutely identical to the published paper, but it has all the results. The paper
has three results: How to find a perfect phylogeny (if there is one) in linear time;
a lower bound proof that every entry in the data has to be examined; how to
solve the tree compatibility problem in linear time. The first result is also
obtained in a somewhat nicer way in my book Algorithms on Strings, Trees and Sequences.

A little knowledge goes a long way: Faster detection of
compromised data in 2D tables(pdf)
D. Gusfield
This paper is buried in the Proceedings of the 1990 Oakland conference on Research in Security and
Privacy. I don't think anyone knows it is there  I should have published it in a journal.
The paper considers the following problem Statistical Data Security:
In a 2D table of nonnegative numbers, say n by n, where all the row totals and
column totals
are known, but only some of the n^2 cell values are known, one wants to determine, for each cell whose
value is not known, what is the maximum possible value that the cell could take on. That is, if cell (i,j)
has an unknown value, what is the maximum value for cell (i,j) so that there is a setting of all
the other unknown (nonnegative) values that makes all the rows and columns sum to their correct, fixed values?
In this paper I show that no matter how many cell values are unknown, one needs to solve this problem only
2n1 times. More specifically,
I give an algorithm that selects at most 2n1 of the cells, so that after the maximum value
is determined for each of those 2n1 cells, the maximum value can be obtained for each of the remaining
unknown values by doing a single lookup in a table built during the solving of the first 2n1 problems.
So, the maximum value can be determined in constant time for each of
the remaining cells. The total time for the algorithm is the time needed for 4n2 network flow computations
in a directed graph of 2n nodes.

New Uses for Uniform Lifted Alignments (pdf)
D. Gusfield and L. Wang.
In Mathematical Support for Molecular Biology.
DIMACS Series on Discrete
Math and Theoretical Computer Science, VOL. 47, p. 3352 (1999)
This paper is obscure due to a terrible choice of titles. It should have been called something like:
How to Compute Evolutionary History Really Badly, and Why I Want to.
The paper uses approximation algorithms in a way that is backwards from what they were designed
for, in order to establish bounds on the accuracy of certain computations, rather than
trying to find good solutions. The key to doing this is to make an approximation algorithm
as inaccurate as possible, while still producing a solution that falls
within the worstcase approximation
bound. One reviewer of this paper wrote ``This is the dumbest idea I have heard in a long time".
We use this approach to try to show that a particular tree alignment of RNA sequences
that David Sankoff and Robert Cedergren constructed in 1983 is close to the optimal alignment
(under a given objective function). A simple analysis showed that the cost of their alignment is
no more than
62% larger than the optimal. In this paper we show that its cost is no more than 16.57% larger than
the optimal. I think this idea of using an approximation algorithm backwards can be applied in other problems,
but I have never seen the idea picked up by anyone else.

Efficient Methods for Multiple Sequence Alignment with Guaranteed Error Bounds
Bulletin of Mathematical Biology, Vol. 55, p. 141154, 1993
A prepublication version with all the main results, but a bit more raw than
the published journal version.
This was the first use of a bounded approximation algorithm in computational biology.
The particular multiple alignment
method developed in the paper was only for the purpose of being able to prove a guaranteed bound on the error, and
hence the result in the paper is mainly theoretical.
However, it has been reported that the actual alignments are useful in some applications.
A historical trivia note: This is the first (oldest) paper PubMed indexed using the search term Computational Biology.
PubMed query

Parametric and Inverse Parametric Sequence Alignment with XPARAL
D. Gusfield and P. Stelling
Methods in Enzymology, Vol. 266, 1996
A full list of journal and conference publications as of April 2012
Publications
Return to Gusfield Homepage
July, 2015
This research was partially supported by NSF grants ITR9723346,
SEIBIO 0513910, CCF0515378, IIS0803564, CCF1017580, and IIS1219278.