Outline and synopses of the 2002 Lecture Videos

Lecture 1
Introduction to Bioinformatics; the four modalities of biological research; many of the key ideas in bioinformatics predate the rise of bioinformatics (some going back to the 1940's), and yet are central today; genomic and other high-throughput omic data have made those ideas and methods, and hence bioinformatics, essential. This course focuses on the mathematical and algorithmic ideas in the methods, and much less on packaged tools. We emphasize the chain of reasoning that leads from raw biological data and questions, to simplifying biological rules and models (e.g., the central dogma of molecular biology), to mathematical models based on those rules, to mathematical questions, to algorithmic problems that derive from the mathematical and biological models, to programming issues, and finally to user-friendly tools. All along this chain of reasoning, one makes simplifications and compromises, and the utility of the final tool must be carefully evaluated. The tool is best understood by understanding the chain of reasoning and the compromises that led to it. This course focuses on the mathematical and algorithmic parts of the chain. Through Perl programming it touches on some programming aspects of the chain, and through the labs it introduces the student to some representative and illustrative tools, but there are other courses and resources that focus on tools more effectively than this course does.

**************
Lecture 2
Continuation of the discussion of the chain of reasoning that leads to bioinformatics tools. Introduction to Sequence Analysis. Why the use of sequence? First Fact of Biosequence Analysis: in bio-sequences, high similarity usually implies significant functional, structural or historical similarity or relationship. This is one of those rules or models that one assumes today, but it was not always known or believed. And of course, it frequently needs to be modified or modulated. Exploitation of the First Fact has made bioinformatics important, so much of the course deals with technical issues of how we define, compute and exploit sequence similarities. The First Fact pays the bills. What is the basis for the First Fact? Evolution, shared history, common descent: protein evolution involves duplication within a species (paralogy) with modification (R. Doolittle quote). This leads to similar protein sequences and similar chemical and/or biological properties. Another mechanism is speciation followed by differentiation of the proteins in the two species (orthology). In either case, what we learn from one variant of the protein may be valuable in understanding the other variant, and through sequence similarity we can find which proteins give insight into other proteins. That is technologically very powerful - when one has a new sequence, its biological properties may be very similar to those of similar sequences that have already been studied. That was a revolutionary insight in the 1970's and 1980's but is now absolutely accepted - and even taken to be obvious. Discussion of the words ``homology" and ``similarity". Similarity is used to infer homology. Another explanation for the First Fact: necessity and/or convergent evolution. History (chance) v. Necessity. The Converse of the First Fact is NOT true: it is not true that similar function or structure implies sequence similarity (Fred Cohen quote).
Introduction to the technical issues of sequence similarity; the basic algorithmic problem; a first definition of sequence similarity, and the algorithmic problem of computing a measure of similarity. Definition of alignment, and an example of an alignment.

********************
Lecture 3
Grain Crop Circle illustrating synteny of genes in the grains, and the partial interchangeability of genes between the grains. Wheat has a genome size of 16 billion bases, whereas rice has only a few hundred million bases. Definition of String Alignment and Similarity. The formal definition of similarity should reflect the biology: highly ``similar" sequences should reflect and expose related biology or chemistry. Mismatches in an alignment reflect single-residue point mutations, so the concept of a mismatch is acceptable in our definition of similarity because it reflects underlying biology. What does a space in an alignment reflect in the underlying biology? Insertion or deletion of DNA or amino acids. Examples of insertions or deletions in DNA. The formal definitions of alignment and similarity should realistically model actual biological processes that change sequences over time. Example of the simplest objective function. Does the simplest objective function correctly model biology? Matches and mismatches seem well motivated, but what about the way spaces (indels) are handled in the objective function? That will be a large issue examined in the course. Definition of string similarity. Notions of string similarity are at the heart of sequence-oriented bioinformatics. We will embellish the model and definition throughout the course. Algorithmic problem: given two strings, how can we efficiently compute their similarity? The mathematical definition of similarity gives the ``what", but now we want to address the ``how". First, let's start thinking naively and find a straightforward, brute-force approach: generate all the explicit alignments, compute the value of each alignment, and take the best (maximum value). How many alignments are there? Way too many, so brute force will not work. Remember that computers are small (memory), slow, and dumb (contrary to popular images of computers). Brute force would take many ages of the universe to align moderate-length sequences. Because of that we will spend a lot of time discussing alternative methods that are much faster. So we want to be clear why we are spending so much energy on the more clever methods - the brute-force methods won't work. Details of counting how many alignments there are. Two Perl programs that compute the numbers, countbadly.pl and countgoodly.pl. Get them from the class website.

**********************
Lecture 4
Review of the definition of the (basic) similarity of two sequences: the maximum, over all possible alignments of the two sequences, of the number of matches minus the number of mismatches minus the number of spaces in the alignment. Review of the fact that a straightforward enumeration of all alignments is impractical for all but very small examples. Start of a discussion of richer alignment models (this is slightly out of order, but needed now to prepare for the scheduled lab of that week). Addition of multipliers (costs or penalties) on mismatches and spaces: in the objective function we multiply the number of mismatches by a number X and the number of spaces by a number Y, in order to tune the objective function to better reflect the underlying biology.
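(A minimal sketch, not from the lectures and not one of the course programs: it scores one explicit alignment under the objective just described, with mismatch multiplier X and space multiplier Y; the function name and the toy alignment are my own.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch (not course code): value of one explicit alignment under the
    # objective (#matches) - X*(#mismatches) - Y*(#spaces).
    # The two rows must have equal length; '-' marks a space.
    sub score_alignment {
        my ($row1, $row2, $X, $Y) = @_;
        die "rows differ in length\n" unless length($row1) == length($row2);
        my ($matches, $mismatches, $spaces) = (0, 0, 0);
        for my $i (0 .. length($row1) - 1) {
            my $a = substr($row1, $i, 1);
            my $b = substr($row2, $i, 1);
            if    ($a eq '-' or $b eq '-') { $spaces++ }
            elsif ($a eq $b)               { $matches++ }
            else                           { $mismatches++ }
        }
        return $matches - $X * $mismatches - $Y * $spaces;
    }

    # Toy example with X = 1, Y = 2:
    # 3 matches, 0 mismatches, 2 spaces, so the value is 3 - 0 - 4 = -1.
    print score_alignment('ACG-T', 'A-GCT', 1, 2), "\n";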
But now we have the question of what values X and Y should take on. Introduction to the work of Walter Fitch in the 1980s. He had a pair of sequences for which he believed he knew (from biological or chemical considerations) the "correct" alignment. He then studied how the optimal alignment was influenced by different values of X and Y in the objective function. He observed that none of the obtained alignments agreed sufficiently with the "correct" alignment. That led him to introduce the concept of a "gap" in an alignment. A gap is a contiguous run of spaces. The observation (by Fitch and others) is that, particularly in proteins, the spaces in an alignment are typically organized into a few gaps, and alignments that have many gaps do not capture the biology adequately. The basic objective function for similarity does not have a term for the number of gaps, so Fitch enriched the definition of similarity to include a term for gaps. So now in the objective function we have matches, mismatches, spaces and gaps. Introduction to Xparal (the focus of the lab for this week). Mention of the Barton data and alignments also used in the lab. Introduction to substitution matrices and their uses in the lab problems. Substitution matrices penalize mismatches differently depending on which mismatch it is, and even score matches differently depending on which amino acid matches. For example, a mismatch between Y and G should be penalized more than a mismatch between I and L. Matches of very common amino acids should have a smaller (positive) score than matches of rare amino acids. Now the value of an alignment uses the total of all the scores coming from matches, and the total of all the penalties coming from mismatches, rather than just counting the number of matches and mismatches. The lab on Xparal is intended to get you to see how critical each piece of the alignment model is. Unfortunately, this version of Xparal uses the older PAM substitution matrices rather than BLOSUM matrices, but the main point is still clear - a good substitution matrix can lead to alignments that correctly reflect the biology. Definitions of gap initiation and gap extension. Gap initiation should be more highly penalized than gap extension. Details of Xparal and parametric alignment. Overhead projections that are hard to read. Parametric alignment decomposes the parameter space into convex polygons, where in any polygon, any alignment A that is optimal for some combination of parameters inside that polygon is optimal for all combinations of parameters inside that polygon. Conversely, alignment A is not optimal for any parameter combination that falls outside that polygon.

*******************
Lecture 5
How do we actually compute the similarity of two sequences efficiently? Introduction to notions of efficiency and complexity. The time required by a method is considered as a function of the length of the input, i.e., the lengths of the two strings to be aligned. Definition and use of the Alignment Graph to discuss and compute optimal alignments, and similarity. Each legal (directed) path from node (0,0) to node (n,m) specifies an alignment; conversely, each alignment is specified by such a path. Diagonal edges align two characters (the two characters for the row and column that the edge enters); horizontal and vertical edges align a character with a space (details in the lecture, or think about it). [Note: the video is too dark at this point in the lecture, so consult the lecture notes.]
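(Not from the lectures: a minimal sketch of the edge weights just described, under the basic scoring with mismatch penalty X and space penalty Y; the function name and calling convention are my own.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch (not course code): weight of an edge entering node (i,j) of the
    # alignment graph for strings $s and $t (positions are 1-based).
    # A diagonal edge aligns s[i] with t[j]; a horizontal or vertical edge
    # aligns a character with a space.
    sub edge_weight {
        my ($s, $t, $i, $j, $direction, $X, $Y) = @_;
        if ($direction eq 'diag') {
            return substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 1 : -$X;
        }
        return -$Y;    # 'up' or 'left' edge: a space
    }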
Moreover, the total cost (length) of a path (adding the costs on the edges in the path) is exactly the value of the associated alignment. Therefore, the problem of finding an optimal alignment is equivalent to finding a path from (0,0) to (n,m) which has the largest total cost. So far, we have only transformed the problem of computing an optimal alignment into the problem of finding a maximum-cost path from (0,0) to (n,m) in a directed graph; we have not really made any progress in seeing how to solve either of those problems. Now we tackle that issue. Introduction to the logic of Dynamic Programming (DP) to find a path from (0,0) to (n,m) of maximum total cost (length, distance).

*******************
Lecture 6
Continuation of the discussion of how to efficiently compute the similarity of two sequences, and the optimal alignment of two sequences, by DP. Review of the Alignment Graph. Development of a Dynamic Programming (DP) procedure to compute the length of the longest path from node (0,0) to each of the nodes in the Alignment Graph. The distances to the nodes in the Alignment Graph can be computed row by row, going left to right in each row, or column by column, going top to bottom in each column. We can also progress along the anti-diagonals. The key point is that a node (i,j) can be processed at any time after its three neighbors (i,j-1), (i-1,j) and (i-1,j-1) have been processed. Why do we do this kind of computation? For each node in the graph, the number of operations needed is a constant, independent of how long the sequences being aligned are. So for two sequences of lengths n and m, the total number of operations for this DP approach to compute the longest path length (or distance) from node (0,0) to node (n,m) is O(nm), i.e., bounded from above by a function proportional to the product of n and m. Compare that to the number of alignments to see why this is so attractive for practical computation, compared to explicitly generating all alignments. Introduction to the issue of traceback, to actually construct the optimal alignment from the computation done to find the longest distance from (0,0) to (n,m). This uses the same kind of logic that was used to compute the distances (numbers), but in reverse. The traceback is a general step in DP, not just for computing optimal alignments. It is (in my experience) the most subtle part of DP for students to understand. For example, it is not always obvious why we need such a step. Why can't we keep track of the longest paths during the step when the distances (numbers) are computed? Why do we have to do this traceback from node (n,m) to node (0,0)? Explanation via the running example. The bread-crumb or pointer approach to traceback. All this, using an explicit Alignment Graph, is conceptually nice, but not a natural approach to turn into a computer program. For that, we have to change our view of DP from the Alignment Graph to a purely notational approach. Question for next time: what changes if we use a substitution matrix to weight specific matches and mismatches differently?

********************
Lecture 7
Formal treatment of DP - moving from the alignment graph to recurrences and programmable notation. Define V(i,j) and develop general recurrences for it. This is essentially, but not exactly, the Needleman-Wunsch algorithm of bioinformatics, which computes an optimal *global* alignment between two sequences. The value of that optimal alignment is defined as the ``similarity" between the two sequences.
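(Not from the lectures, and not the course's Needleman-Wunsch program: a minimal sketch of filling the V(i,j) table with the basic scoring of +1 per match, -1 per mismatch and -1 per space; the function name is my own.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch (not course code) of the basic V(i,j) recurrence:
    # V(i,j) = max of V(i-1,j-1) + s(x_i, y_j), V(i-1,j) - 1, V(i,j-1) - 1,
    # where s() is +1 for a match and -1 for a mismatch.
    sub similarity {
        my ($x, $y) = @_;
        my ($n, $m) = (length $x, length $y);
        my @V;
        $V[$_][0] = -$_ for 0 .. $n;    # first column: all spaces
        $V[0][$_] = -$_ for 0 .. $m;    # first row: all spaces
        for my $i (1 .. $n) {
            for my $j (1 .. $m) {
                my $s = substr($x, $i-1, 1) eq substr($y, $j-1, 1) ? 1 : -1;
                my $best = $V[$i-1][$j-1] + $s;
                $best = $V[$i-1][$j] - 1 if $V[$i-1][$j] - 1 > $best;
                $best = $V[$i][$j-1] - 1 if $V[$i][$j-1] - 1 > $best;
                $V[$i][$j] = $best;
            }
        }
        return $V[$n][$m];              # the similarity of x and y
    }

    print similarity("ACGT", "AGT"), "\n";   # prints 2 (3 matches, 1 space)

A traceback over the same table, recording which of the three terms achieved the maximum at each cell, recovers an optimal alignment itself.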
The use of an ``alignment table", which clearly translates into an array in a computer program. A bit of the Perl needed to extend the basic Needleman-Wunsch program to handle mismatch and space penalties more general than -1.

*********************
Lecture 8
Development of DP notation and recurrences to allow the use of a general substitution matrix in the computation of similarity. Computation in a table instead of in a graph. Introduction to local alignment. Why local alignment of biological sequences? Several examples. Formalizing the local alignment objective in terms of a pair of substrings of the two input strings, and the global alignment between that pair of substrings. If we try to compute the optimal local alignment by directly following the definition of local alignment, how many operations will that require? Answer: the cube of the number needed for global alignment, so n raised to the sixth power if both input strings are of length n, compared to n**2 for the global alignment of two strings of length n. So the straightforward implementation of the definition of local alignment is impractical, and we need a better (faster) method if we are going to be able to compute local alignments. The Smith-Waterman DP method is the answer. It computes the optimal local alignment using a number of operations proportional to the number of operations used for global alignment, and the constant of proportionality is small.

********************
Lecture 9
Guest lecture on the Longest Common Subsequence by Professor Martel. Definition and DP solution. Expected length of the longest common subsequence.

**********************
Lecture 10
Detailed development of the DP solution to compute an optimal local alignment by the Smith-Waterman DP method. The worst-case number of operations for the method is O(nm), the same as for global alignment. The constant of proportionality is slightly larger for local than for global alignment, but the difference is very small.

********************
Lecture 11
End-gap-free alignment. Example of its use in whole-genome shotgun sequencing. Development of the DP to compute an optimal end-gap-free alignment. Introduction to the derivation of the expected length of the longest common substring.

********************
Lecture 12
Derivation of the expected length of the longest common substring between two random strings of the same length - actually, an upper bound on that length. Be aware that I made an error towards the end of the lecture (around minute 35), defining r as the minimum integer such that E(Cm) < 1 rather than as the maximum integer such that E(Cm) >= 1, and also forgetting that when the base of a logarithm is less than 1, the log is negative. That caused trouble. The notes for that lecture correct the error.

********************
Lecture 13
Continuation of the topic of string matching probability. Now we look at the probability that a query string S matches completely at least once in a much longer string D, which represents the concatenation of sequences in a database. We address the question: if a query string S and the long string D are both random, what is the probability that S is found to exactly match somewhere in D? In the last nine minutes of the lecture (starting around minute 44) I talk about Perl and hashes in Perl. Hashes are, for me, one of the key reasons for programming in Perl, and their use is central in the toy BLAST program that you will write in the next two weeks.
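(Not from the lectures, and not one of the lab's kmer programs: a minimal sketch of the kind of hash a toy BLAST is built around, mapping each k-mer of a database string to the positions where it starts; the variable names and toy strings are my own.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch (not course code): index every k-mer of a database string in a
    # hash, mapping the k-mer to the list of positions where it starts.
    my $k  = 4;
    my $db = "ACGTACGTTACG";
    my %index;                          # k-mer => reference to a list of positions
    for my $pos (0 .. length($db) - $k) {
        my $kmer = substr($db, $pos, $k);
        push @{ $index{$kmer} }, $pos;
    }

    my $query_kmer = "ACGT";
    if (exists $index{$query_kmer}) {
        print "$query_kmer found at positions: @{ $index{$query_kmer} }\n";   # 0 and 4
    }

Looking up each k-mer of a query in such a hash gives candidate seed positions directly, without running DP against every database sequence.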
*********************
Lecture 14
BLAST. Part of this lecture, on the ideas behind BLAST, follows the Lecture Notes on BLAST we wrote, which are linked from Lab 5. Alignment based on DP is better than pure enumeration of alignments, and it is fast enough for the optimal alignment (local or global) of two sequences, but even DP is not fast enough to do alignment between a query string and every sequence in a large database. The ideas of BLAST 1, ungapped BLAST. The use of k-mers, seeds, extensions, and hashing in BLAST. The exact matches are called HSPs, high-scoring pairs. The value of k was around 11 for DNA at the time of the lecture, and around 3 for amino acids; I don't know what it is today. In the case of protein, a substitution matrix is used. Also, neighborhoods are used, in order to incorporate small mismatches. The lecture partly makes reference to the Perl needed to implement toy BLAST. Question: why is this method so much faster than doing DP?

**********************
Lecture 15
Continuation of BLAST. Hashing k-mers, BLAST speed, HSPs, k-mer neighborhoods, E-values, introduction to Gapped BLAST (BLAST II).

*********************
Lecture 16
This whole lecture explicitly talks about the five kmer programs distributed in Lab 6 for the start of the toy BLAST project.

********************
Lecture 17
Continuation of the discussion of the kmer programs, and Perl. Two-dimensional hashes. The assignment of a list as the value of a hash. Dereferencing hashes. Final discussion of BLAST and statistics in BLAST: E-values reported by BLAST. BLAST results are based on the maximum score found in the matches of a query to sequences in the database. Therefore, BLAST statistics concern the maximum value of a set of random variables (match values). Maximum values are described by the extreme value distribution. In contrast, the Normal distribution concerns the sum of random variables. It is a mistake to think that BLAST statistics based on the extreme value distribution should look like or behave like the Normal distribution. In particular, the extreme value distribution has a very long right tail - that is, the probability of scores at the right end of the extreme value distribution remains high for much longer than for scores at the right end of the Normal distribution. Therefore, if you think the random scores are distributed as in the Normal distribution, you will often be overly impressed by large observed scores.

********************
Lecture 18
This lecture starts with a review of matching probabilities, for the case of a query string matching somewhere in a database. As n (the size of the database) grows relative to m (the length of the query), the probability that the query string matches somewhere in the database grows quickly. The take-home lesson is that when there are many trials (possible starting places for an exact match of the query string in the database), the probability of a success somewhere can be very different from the probability that two random strings of the same length exactly match. The formula for the extreme value distribution and its density is displayed without much discussion. A graph of the density is shown. The point again is that a large maximum over many events can happen with high probability, even when the probability of an exact match between two random strings is very low. This point is connected to the Granin and BRCA1 story and the PROSITE search.
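(Not from the lectures; the numbers are my own toy choices. A minimal sketch of the ``many trials" point: treating each possible starting position in the database as an independent trial - an approximation, since overlapping positions are not independent - the chance that a random DNA query of length m exactly matches somewhere in a random database of length n is roughly 1 - (1 - (1/4)^m)^(n-m+1), which approaches 1 as n grows.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch (not course code): probability that a random DNA query of
    # length m matches somewhere in a random database of length n, treating
    # the N = n - m + 1 starting positions as independent trials
    # (an approximation, since overlapping positions are not independent).
    sub match_somewhere {
        my ($m, $n) = @_;
        my $p = 0.25 ** $m;          # probability of a match at one fixed position
        my $N = $n - $m + 1;         # number of possible starting positions
        return 1 - (1 - $p) ** $N;
    }

    printf "m=8, n=1e3: %.6f\n", match_somewhere(8, 1_000);           # small
    printf "m=8, n=1e9: %.6f\n", match_somewhere(8, 1_000_000_000);   # essentially 1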
The probability of a random sequence matching the Granin regular expression is very low, but the probability that a random sequence will match some regular expression in PROSITE, with a probability as low as or lower than the number computed for Granin, is very high. The analogy is made to coin flipping: the probability of a specific sequence of heads/tails in 100 flips is very low, but the probability that some sequence of that low a probability will occur is 1. So when such a small-probability event is observed, we have no ``statistical support" to reject the null hypothesis that the heads/tails sequence was the result of random coin flips. PROSITE is a filter, not an oracle.

********************
Lecture 19
The start of the discussion of Multiple Sequence Alignment. Extension of pairwise sequence alignment to multiple sequences in order to identify weak patterns in the sequences - patterns that would not be pulled out by pairwise sequence alignment. Why do we do multiple alignment? "An alignment of two sequences whispers, but a multiple sequence alignment shouts". The Sum-of-Pairs objective function, and how to compute it exactly for three strings using DP. Development of the DP for three strings. For three strings of length m each, the number of operations for this exact DP is proportional to m^3 (m raised to the third power). In general, if there are k strings, each of length m, the number of operations for the exact DP that computes the optimal sum-of-pairs multiple alignment is proportional to m^k (m raised to the k'th power). As k grows, unless m is extremely small, this quickly becomes prohibitive, and for proteins, exact DP solutions are practical for about 8 or 9 sequences at most. There is a program called MSA (look for it on the web) that does this exact computation, but with many speedup tricks compared to the raw DP solution. Introduction to Progressive Multiple Sequence Alignment. The tree consistency theorem.

***********************
Lecture 20
The tree consistency theorem. Definitions: S is a set of k strings and T is a tree with k nodes, each node labeled with a distinct string in S. A multiple alignment M of S is said to be ``consistent" with T if the induced pairwise alignment of each pair of sequences Si and Sj that are neighbors in T is an optimal pairwise alignment of Si and Sj. An example of these definitions is shown. Tree Consistency Theorem: for any tree T labeled with the sequences in S, there is a multiple alignment M of S which is consistent with the labeled tree T. Definition of D(i,j) and DM(i,j): D(i,j) is the optimal pairwise alignment value of sequences Si and Sj; DM(i,j) is the value of the induced pairwise alignment of Si and Sj in the multiple alignment M. For multiple alignment we score a pairwise alignment as the number of mismatches plus the number of spaces. This is different from the maximization criterion we used earlier for pairwise alignment, but it has the same modeling validity - now a low D value for a pair of sequences indicates highly related sequences, instead of a high similarity value V for the pair. The multiple alignment sum-of-pairs problem is to find a multiple alignment of the sequences in S that minimizes SUM_{1 <= i < j <= k} DM(i,j). Clearly, for any i,j, DM(i,j) >= D(i,j), so SUM_{1 <= i < j <= k} DM(i,j) >= SUM_{1 <= i < j <= k} D(i,j). So we would like the multiple alignment M to make DM(i,j) = D(i,j) for as many pairs of sequences Si, Sj as possible.
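(Not from the lectures: a minimal sketch of computing the induced pairwise distances DM(i,j) and the sum-of-pairs distance of a given multiple alignment, under the mismatches-plus-spaces scoring just defined; the function names and the toy alignment are my own.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch (not course code): the induced pairwise distance of two rows of
    # a multiple alignment is the number of mismatches plus the number of
    # spaces, after dropping columns in which both rows have a space.
    sub induced_distance {
        my ($r1, $r2) = @_;
        my $d = 0;
        for my $c (0 .. length($r1) - 1) {
            my ($a, $b) = (substr($r1, $c, 1), substr($r2, $c, 1));
            next if $a eq '-' and $b eq '-';   # column absent from the induced alignment
            $d++ if $a eq '-' or $b eq '-' or $a ne $b;
        }
        return $d;
    }

    # Sum the induced distances over all pairs of rows.
    sub sum_of_pairs {
        my @rows  = @_;
        my $total = 0;
        for my $i (0 .. $#rows - 1) {
            for my $j ($i + 1 .. $#rows) {
                $total += induced_distance($rows[$i], $rows[$j]);
            }
        }
        return $total;
    }

    print sum_of_pairs('AC-GT', 'ACCGT', 'A--GT'), "\n";   # prints 4 for this toy alignment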
The Tree Consistency Theorem says that given any set of sequences S and a tree T whose nodes are labeled by the sequences in S, there is a multiple alignment M of S such that DM(i,j) = D(i,j) for each pair of sequences Si, Sj which label neighboring nodes in T. A constructive proof is given, showing how to actually build the multiple alignment M from T and S. This constructive proof also shows how to merge two sub-multiple alignments, given a pairwise alignment between a sequence Si already in one of the sub-multiple alignments and a sequence Sj in the other sub-multiple alignment. As a special case, this allows us to add a new sequence into an existing multiple alignment, given a pairwise alignment between a sequence that is already in the MSA and a sequence that is not. This merge achieves two goals: a) it preserves the induced pairwise alignment value of any pair of sequences that were already together in one of the sub-multiple alignments before the merge; and b) the induced pairwise alignment value of Si and Sj in the merged multiple alignment is exactly the same as the optimal pairwise alignment value of Si and Sj.

*******************
Lecture 21
Introduction to the alignment method encoded in star.pl. In that method, the ``guide tree" T is a star with one center and n-1 leaves, so the only issue is which sequence to put in the center. Multiple sequence alignment: progressive alignment using a guide tree - ClustalX and star.pl. Although tree consistency only assures that n-1 of the (n choose 2) induced pairwise distances are equal to (i.e., as small as) their optimal pairwise distances, if the objective function used for pairwise alignment satisfies the triangle inequality, then the total induced (sum-of-pairs) distance of the best center star is at most twice the total distance of the (n choose 2) optimal pairwise alignments. In the discussion of Clustal, the lecture says that the guide tree used in Clustal is essentially the Minimum Spanning Tree of the weighted complete graph (i.e., with all possible edges) in which each node represents one of the input sequences and each edge weight is the edit distance between the two sequences at the ends of the edge. The MST is also given by the Single-Link Clustering algorithm. But actually, the guide tree used in Clustal is produced by the Neighbor-Joining algorithm, which gives a tree that is not the same as the Minimum Spanning Tree. A few uses of multiple alignment - PSSMs, PSI-BLAST, HMMs and Pfam, PROSITE, BLOCKS (these go both ways: a multiple alignment is used to construct an HMM, and conversely an HMM is used to build a multiple alignment). Review of the First Fact of sequence analysis, and review of the Second Fact of sequence analysis.

**********************
Lecture 22
Continuation of the discussion of uses of multiple alignment to build a model of a set of sequences that are related in some biologically significant way, but where the pairwise sequence similarity of any pair of sequences is not sufficient to identify the important biological elements.
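(Not from the lectures: as a prelude to the PSSM discussion that follows, a minimal sketch of the per-column residue counts that such a model starts from; the toy alignment is my own.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch (not course code): per-column residue counts of a multiple
    # alignment, the raw material that a PSSM turns into scores.
    my @alignment = ('ACGT', 'ACGA', 'TCGT');   # toy aligned sequences, equal length
    my @counts;                                 # $counts[$col]{$residue} = count

    for my $row (@alignment) {
        for my $col (0 .. length($row) - 1) {
            $counts[$col]{ substr($row, $col, 1) }++;
        }
    }

    # Report the counts column by column.
    for my $col (0 .. $#counts) {
        my $h = $counts[$col];
        print "column $col: ",
              join(', ', map { "$_=$h->{$_}" } sort keys %$h), "\n";
    }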
In this lecture we focus on PSSMs (Position Specific Scoring Matrices) derived from a multiple alignment. Frequency counts in each column of a multiple alignment are turned into scores. How and why, and how are the scores used? A PSSM is used to evaluate how similar a new sequence - one not used in the multiple alignment that led to the PSSM - is to the original sequences. The goal is to accurately recognize new members of the family of sequences used in the multiple alignment, and of course, not to incorrectly declare a sequence to be in that family. We want no false positives and no false negatives. The score of a new sequence, using the character scores in the PSSM, is really the log of the ratio of the probability that the new sequence was generated by a ``generator of the family" (the same process that generated the family), divided by the probability that the new sequence was generated by a random sequence generator. Later we will be more precise about what a ``generator" or ``process" is. So the score given to a sequence is a ``log odds ratio", a kind of score that is common in statistics and genetics. Why is the log used?

*************************
Lecture 23
Finish of the discussion of profiles and PSSMs - a more formal treatment, and a more complete discussion of why the log of the probability ratio is used. A more complete discussion of the model that generates the sequences used to obtain the PSSM. This is really an introduction to very simple Markov models that generate sequences. Introduction to Markov chains and the probability of a given Markov chain. An explicit introduction to Markov models and Hidden Markov Models. The key property of a Markov model is that at any point, the future is influenced by the current state the chain is in, but not by the past before the current state - that is, the probability of future parts of the chain is not influenced by how the chain arrived at its current state, only by the current state of the chain.

************************
Lecture 24
HMMs for CpG islands. This lecture follows the discussion in the Durbin and Eddy book on CpG models.

************************
Lecture 25
Review of the CpG island model and the idea of HMMs modeling nature. The Viterbi Algorithm to find the chain of largest probability in an HMM that generates a given sequence S. This is one of the ways that one can decide (make an educated guess) which part of a sequence S is in the + model and which is in the - model. The actual discussion of the DP to find the chain of largest probability that generates S starts around minute 25 of the lecture. In the context of HMMs, the DP solution of this problem is called the Viterbi Algorithm.

*************************
Lecture 26
More detailed discussion of the Viterbi Algorithm: developing the recurrences; evaluating the recurrences in a table; traceback to find the optimal chain; derivation of the number of operations used in the Viterbi Algorithm. Introduction and discussion of the Forward Algorithm, which computes the probability that a given sequence is generated by the HMM. For that, one has to (implicitly) consider all the chains in the Markov model, but again this can be done efficiently using DP, with a number of operations proportional to nm^2, where n is the length of the sequence and m is the number of states in the HMM.

**********************
Lecture 27
Introduction to the Backward Algorithm for HMMs. The `what' and `why' of the backward algorithm, but not the `how'.
For information on the `how', see the Durbin book. Discussion of Profile HMMs - these are actually the simplest and most common HMMs and the basis for many of the HMMs in Pfam. Generally, a profile HMM is derived from a multiple alignment of the sequences in the family. *********************** Lectures 28, 29 Mystery lectures: We didn't get to these in the 2008 course, so I have not looked at them recently. When I watch them, I will write the synopses. Be the first on your block to do it before I do.