Course Description
Dan Gusfield (U California Davis), [introductory/intermediate, 8 hours]
ReCombinatorics: The Algorithmics and Combinatorics of Phylogenetic Networks with Recombination
The work discussed in this course falls into the emerging area of Population Genomics. I will first introduce the area and then talk about models, problems and combinatorial algorithms involved in the inference of recombination from population data.
A phylogenetic network (or Ancestral Recombination Graph) is a generalization of a tree, allowing structural properties that are not tree-like. With the growth of genomic and population data, much of which does not fit ideal tree models, and the increasing appreciation of the genomic role of such phenomena as recombination (crossing-over and gene-conversion), recurrent and back mutation, horizontal gene transfer, and mobile genetic elements, there is greater need to understand the algorithmics and combinatorics of phylogenetic networks.
In this course, I will discuss a range of recent algorithmic and mathematical results on phylogenetic networks with recombination and show applications of these results to several issues in Population Genomics. The methods involve combinatorial algorithms and graph theory; both theoretical and empirical results will be discussed.
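As a small illustration of the kind of combinatorial test that underlies this area, the sketch below applies the classical four-gamete test to binary haplotypes: two sites that together show all four combinations 00, 01, 10 and 11 cannot both fit a tree-like (perfect phylogeny) history without recombination or recurrent mutation. This is a minimal sketch, not necessarily the method presented in the course; the function name and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def incompatible_pairs(haplotypes):
    """Return the pairs of sites (columns) that exhibit all four gametes.

    `haplotypes` is a list of equal-length strings over '0' and '1'.
    The function name and input format are illustrative assumptions.
    """
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site states seen across all sequences.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


# Toy data: sites 0 and 1 show 00, 10, 01 and 11, so they are incompatible.
toy = ["000", "100", "010", "110", "001"]
print(incompatible_pairs(toy))
```

Each incompatible pair is evidence that the data cannot be explained by a single tree under the infinite-sites assumption, which is what motivates network models.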
References:
- D. Gusfield. Haplotyping as Perfect Phylogeny: Conceptual Framework and Efficient Solutions. Proceedings of the Sixth Annual International Conference on Computational Biology (RECOMB 2002), ACM Press: 166-175, 2002
- D. Gusfield, S. Eddhu and C. Langley. Optimal, Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination. Journal of Bioinformatics and Computational Biology, 2(1): 173-213, 2004
- D. Gusfield, S. Eddhu and C. Langley. The Fine Structure of Galls in Phylogenetic Networks. INFORMS Journal on Computing, Special issue on Computational Biology, 16(4): 459-469, 2004
- D. Gusfield. Optimal, Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained Recombination. Journal of Computer and System Sciences, Special issue on Computational Biology, 70: 381-398, 2005
- Y. Song, Y. Wu and D. Gusfield. Efficient Computation of Close Lower and Upper Bounds on the Minimum Number of Needed Recombinations in the Evolution of Biological Sequences. Bioinformatics, 21 Supplement 1, Proceedings of the ISMB 2005 Conference: 413-422, 2005
- Y. Song, D. Gusfield, Z. Ding, C. Langley, Y. Wu. Algorithms to Distinguish the Role of Gene-Conversion from Single-Crossover Recombination in the Derivation of SNP Sequences in Populations. Proceedings of RECOMB 2006, LNBI 3909, Springer: 231-245, 2006
- Y. Wu and D. Gusfield. Efficient Computation of Minimum Recombination with Genotypes (not Haplotypes). Proceedings of the Computational Systems Biology Conference, Stanford, August 2006: 145-156, 2006
- D. Gusfield, D. Hickerson and S. Eddhu. A Fundamental, Efficiently Computed Lower Bound on the Number of Recombinations Needed in a Phylogenetic History. Discrete Applied Mathematics, 155: 806-830, 2007
- Y. Wu. Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms. Proceedings of RECOMB 2007: 488-502, 2007
- Y. Wu and D. Gusfield. Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants. Proceedings of Combinatorial Pattern Matching 2007: 150-161, 2007
- Y. Wu and D. Gusfield. A New Recombination Lower Bound and the Minimum Perfect Phylogenetic Forest Problem. Proceedings of the 13th Annual International Conference on Combinatorics and Computing, 2007: 16-26, 2007
- D. Gusfield, V. Bansal, V. Bafna and Y.S. Song. A Decomposition Theory for Phylogenetic Networks and Incompatible Characters. Journal of Computational Biology, 14(10): 1247-1272, 2007
All the papers and associated software can be accessed at wwwcsif.cs.ucdavis.edu/~gusfield.
Andrey Rzhetsky (U Chicago), [introductory, 6 hours]
Trees, Networks, and their Use in Systems Biology
A plethora of biomedical problems have been productively tackled in recent years through inference and computation over trees and networks. The successful applications span evolutionary biology, medical and statistical genetics, epidemiology, biochemistry, and the interface of health sciences and sociology. This course will give an overview of how computational approaches and domain problems come together in these areas, with an introduction to the relevant modeling and inference approaches.
References:
- J. Felsenstein, Inferring Phylogenies. Sinauer Associates, 2004
- M.E.J. Newman, Networks: An Introduction. Oxford University Press, 2010
Robert Stevens (U Manchester) & James Malone (European Bioinformatics Institute, Hinxton), [introductory/intermediate, 4 hours]
Bio-Ontologies
Unless we know what entities our data describe, those data are of reduced value. Biologists are good at naming the entities they investigate, but they are too good at it. As a consequence, there are too many names for the things we need to analyse in our data: the functions, processes, anatomical components, cells, diseases and so on. Making explicit what the entities we analyse are and how they relate to each other is the topic of bio-ontologies. This short course will introduce you to the area of bio-ontologies: what they are, how they are built and what can be done with them once we have them. By the end of the course, attendees will know what an ontology is, how ontologies are used in bioinformatics, which bio-ontologies are in use, and the outlines of ontology authoring.
References:
- Daniel L. Rubin, Nigam H. Shah, Natalya F. Noy. Biomedical ontologies: a functional perspective. Briefings in Bioinformatics, 9(1):75-90, 2008
- The Gene Ontology Consortium. The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Research 38(1):D331-D335, 2010
- Barry Smith et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology 25:1251-1255, 2007
- James Malone, Ele Holloway, Tomasz Adamusiak, Misha Kapushesky, Jie Zheng, Nikolay Kolesnikov, Anna Zhukova, Alvis Brazma, Helen Parkinson. Modeling Sample Variables with an Experimental Factor Ontology. Bioinformatics 26(8):1112-1118, 2010
- Andrey Rzhetsky, James A. Evans. War of Ontology Worlds: Mathematics, Computer Code, or Esperanto? PLoS Computational Biology 7(9): e1002191. doi:10.1371/journal.pcbi.1002191, 2011
Martin Tompa (U Washington Seattle), [introductory/intermediate, 4 hours]
Comparative Sequence Analysis in Molecular Biology
In computational molecular biology, "phylogenetic footprinting" is a standard idea that is used to predict functional regions within a biological sequence (DNA, RNA, or protein). The procedure is to find corresponding sequences from several related species, and within these to identify those regions that have mutated less than expected over the course of evolution, suggesting that these regions are under selective pressure due to biological functionality.
We will discuss various algorithms for and applications of phylogenetic footprinting and demonstrate some of these using software available on the web. We will then turn our attention to the larger problem of doing phylogenetic footprinting on a whole-genome scale, demonstrating the use of a genome browser available on the web and discussing the issue of assessing its reliability.
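To make the idea of "regions that have mutated less than expected" concrete, here is a deliberately simplified sketch that reports runs of fully conserved columns in a gap-free toy alignment. It ignores the phylogeny relating the species and any statistical model of expected mutation, both of which real footprinting methods take into account; the function name, the threshold and the sequences are assumptions made for illustration only.

```python
def conserved_runs(alignment, min_len):
    """Return (start, end) half-open column ranges that are identical
    in every aligned sequence for at least `min_len` columns.

    `alignment` is a list of equal-length strings; '-' marks a gap.
    Illustrative sketch only: no tree and no significance test are used.
    """
    n_cols = len(alignment[0])
    conserved = [
        alignment[0][c] != "-" and len({seq[c] for seq in alignment}) == 1
        for c in range(n_cols)
    ]
    runs, start = [], None
    # A trailing False closes any run that reaches the last column.
    for c, ok in enumerate(conserved + [False]):
        if ok and start is None:
            start = c
        elif not ok and start is not None:
            if c - start >= min_len:
                runs.append((start, c))
            start = None
    return runs


# Hypothetical orthologous sequences from three species.
toy_alignment = [
    "ACGTTGACAT",
    "TCGTTGACGA",
    "GAGTTGACCT",
]
for start, end in conserved_runs(toy_alignment, min_len=5):
    print(start, end, toy_alignment[0][start:end])
```

On the toy input, the only run of at least five conserved columns covers columns 2 to 7 (the motif GTTGAC).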
References:
- Blanchette, M., Tompa, M. Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting. Genome Research, 12(5), May 2002, 739-748
- Chen, X, Tompa, M. Comparative Assessment of Methods for Aligning Multiple Genome Sequences. Nature Biotechnology, 28(6), June 2010, 567-572
- Karlin, S., Altschul, S.F. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences USA, 87, 1990, 2264-2268
- Kuhn, R.M., Karolchik, D., Zweig, A.S., Trumbower, H., Thomas, D.J., Thakkapallayil, A., Sugnet, C.W., Stanke, M., Smith, K.E., Siepel, A., Rosenbloom, K.R., Rhead, B., Raney, B.J., Pohl, A., Pedersen, J.S., Hsu, F., Hinrichs, A.S., Harte, R.A., Diekhans, M., Clawson, H., Bejerano, G., Barber, G.P., Baertsch, R., Haussler, D., Kent, W.J. The UCSC Genome Browser database: update 2007. Nucleic Acids Research, 35, January 2007, D668-673
- Kumar, S., Filipski, A. Multiple sequence alignment: In pursuit of homologous DNA positions. Genome Research, 17, February 2007, 127-135
- Neph, S., Tompa, M. MicroFootPrinter: a Tool for Phylogenetic Footprinting in Prokaryotic Genomes. Nucleic Acids Research, 34, July 2006, W366-W368
- Prakash, A., Tompa, M. Measuring the Accuracy of Genome-Size Multiple Alignments. Genome Biology, 8, June 2007, R124
Alfonso Valencia (Spanish National Cancer Research Centre, Madrid), [advanced, 4 hours]
A Bioinformatics Perspective of Personalized Medicine
The rapid progress of genomics is making the use of personal genomic information a pressing daily reality. In this scenario, Bioinformatics plays a central role. The organization and analysis of individual genomes is a complex task involving data organization, integration and interpretation challenges. This task requires a blend of engineering and scientific developments at each step of the analysis, touching many areas in which the development of computational methods is very active.
In the context of the CNIO clinical setting, my group is developing both the technical framework for the interpretation of the results, in collaboration with the clinicians, and the science required at various levels of the analysis. Based on the experience accumulated in this new area of application, I will review the key problems in the analysis of high-throughput sequencing information, prediction of the incidence of mutation in proteins and other coding regions, analysis of splicing and splice sites, comparative analysis of affected pathways, and extraction of mutation-drug-disease relations from databases and text sources.
References:
- Anais Baudot, Víctor de la Torre & Alfonso Valencia, Mutated genes, pathways and processes in tumours. EMBO Reports 11(10):805-810, 2010
- Atul J. Butte, Translational Bioinformatics: Coming of Age. Journal of the American Medical Informatics Association 15(6):709-714, 2008
- Guy Haskin Fernald, Emidio Capriotti, Roxana Daneshjou, Konrad J. Karczewski & Russ B. Altman, Bioinformatics challenges for personalized medicine. Bioinformatics 27(13):1741-1748, 2011
- Marina Sirota, Joel T. Dudley, Jeewon Kim, Annie P. Chiang, Alex A. Morgan, Alejandro Sweet-Cordero, Julien Sage & Atul J. Butte, Discovery and Preclinical Validation of Drug Indications Using Compendia of Public Gene Expression Data. Science Translational Medicine 3(96ra77), 2011
Limsoon Wong (National University of Singapore), [introductory/intermediate, 6 hours]
Using Biological Networks for Protein Function Prediction, Biomarker Identification, and Other Problems in Computational Biology
While sequence homology search has been the main workhorse in protein function prediction, it is not applicable to a significant portion of novel proteins that do not have informative homologs in sequence databases. Similarly, while statistical tests and learning algorithms based purely on gene expression profiles have been popular for analyzing disease samples, critical issues remain in the understanding of diseases based on the differentially expressed genes suggested by these methods. In the past decade, a large number of databases providing information on various types of biological networks have become available. These databases make it possible to tackle biological problems in novel ways. This course presents a review of biological network databases and an introduction to approaches -- based on biological networks -- for protein function prediction, biomarker identification, and other interesting challenges in computational biology.
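One of the simplest ways a network can stand in for missing sequence homology is neighbor voting ("guilt by association"): an unannotated protein is assigned the function most common among its interaction partners. The sketch below is an assumption chosen for illustration, not necessarily a method covered in the course; the function name, the protein identifiers and the annotations are all hypothetical.

```python
from collections import Counter


def predict_function(protein, interactions, annotations):
    """Predict a function for `protein` by majority vote over its
    annotated interaction partners; return None if there are no votes.

    `interactions` maps a protein to a list of its partners, and
    `annotations` maps a protein to a list of known function labels.
    """
    votes = Counter()
    for neighbor in interactions.get(protein, ()):
        votes.update(annotations.get(neighbor, ()))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical interaction network and annotations.
interactions = {"P1": ["P2", "P3", "P4"]}
annotations = {"P2": ["kinase"], "P3": ["kinase"], "P4": ["transport"]}
print(predict_function("P1", interactions, annotations))
```

Here two of the three partners of P1 are annotated as kinases, so the vote returns "kinase". More refined approaches weight partners by interaction reliability or look beyond direct neighbors.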
References:
- M. Emily, T. Mailund, J. Hein, L. Schauser, M. H. Schierup. Using biological networks to search for interacting loci in genome-wide association studies. European Journal of Human Genetics, 17(10):1231-1240, 2009
- W.W.B. Goh, Y.H. Lee, M. Chung, L. Wong. How advancement in biological network analysis methods empowers proteomics. Proteomics, in press.
- B.A. Shoemaker, A.R. Panchenko. Deciphering protein-protein Interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Computational Biology, 3(4):e43, 2007
- O. Vanunu, O. Magger, E. Ruppin, T. Shlomi, R. Sharan. Associating genes and protein complexes with disease via network propagation. PLoS Computational Biology, 6(1):e1000641, 2010
- L. Wong. Using Biological Networks in Protein Function Prediction and Gene Expression Analysis. Internet Mathematics, in press.
Ying Xu (U Georgia), [advanced, 8 hours]
Cancer Bioinformatics
The availability of large-scale omic data for multiple types of cancers in the public domain, in conjunction with our current understanding of cancer, allows computational cancer biologists to study cancer in a comparative and more systematic manner, which makes it possible to discover previously unknown relationships among different aspects of cancer initiation, growth and metastasis. In this short course, I will present an overview of what computational researchers can do to help solve a variety of challenging cancer-related problems. I plan to cover the following topics: (a) a brief overview of cancer biology by reviewing the hallmarks of cancer; (b) a brief overview of information derivable through analyses and comparative analyses of large-scale transcriptomic data; (c) cancer classification based on transcriptomic data: examining cancers from multiple perspectives; and (d) a taste of hypothesis-driven cancer research through transcriptomic data mining.
References:
- Robert A. Weinberg, The Biology of Cancer. Garland Science, 2006
- Douglas Hanahan and Robert A. Weinberg, The hallmarks of cancer, Cell, 100(1):57-70, 2000
- Douglas Hanahan and Robert A. Weinberg, The hallmarks of cancer: the new generation, Cell, 144(5): 646-674, 2011
- In-class handouts
Zohar Yakhini (Agilent Laboratories), [intermediate/advanced, 6 hours]
Algorithmics and Statistics in the Analysis of High Throughput Molecular Measurement Data
The course will be organized as follows:
- Introduction and overview: hybridization; SBH and de Bruijn graphs; expression profiling by microarrays (1h.)
- Introduction to differential expression (Gaussian distribution, normal fits, t-test, ANOVA, TNoM and other non-parametric approaches). FDR, overabundance and class discovery (2h.) [Comment – hands-on assignment: 2 datasets analysis from download to OA, incl. optimal two class comparisons.]
- Statistical enrichment in ranked lists: the hypergeometric distribution; ranked lists; statistical approaches to ranked lists, including minimum hypergeometric and Wilcoxon rank sum; microRNA and related data analysis (2h.)
- Copy number measurements and aberration calling. Copy number variation in normal populations (1h.)
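The ranked-list enrichment topic in the outline above can be illustrated with a short sketch of the minimum hypergeometric (mHG) score. Given a ranked list in which each item is either marked (1) or not (0), the score is the smallest hypergeometric tail probability over all prefixes of the list. This is a minimal sketch under stated assumptions: the function names and the toy labels are hypothetical, and the raw score is not itself a p-value because it is minimized over many cutoffs and would still need a correction.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n items are drawn without replacement from N items,
    B of which are marked."""
    total = comb(N, n)
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1))
    return hits / total


def min_hypergeom(labels):
    """Return (score, cutoff): the smallest tail probability over all
    prefixes of the ranked 0/1 list `labels`, and the prefix length
    at which it is attained."""
    N, B = len(labels), sum(labels)
    best, best_n, seen = 1.0, 0, 0
    for n, label in enumerate(labels, start=1):
        seen += label  # marked items among the top n
        p = hypergeom_tail(N, B, n, seen)
        if p < best:
            best, best_n = p, n
    return best, best_n


# Toy ranked list: marked items are concentrated near the top.
score, cutoff = min_hypergeom([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
print(score, cutoff)
```

For this toy list the minimum is reached at the top five items, where all four marked items have been seen, giving a score of 1/42.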