COMP 691 R Bioinformatics Algorithms
Lecture Contents and Reading List
For each area to be studied, I plan to provide
- an introduction to the genomics and biology involved,
- a set of readings on the major algorithms and analysis techniques,
- comparative studies of algorithms, and
- links to the computer science literature for the algorithm design
principles involved.
Sequence Analysis
The first set of web pages explains how a sequencer works and how a
sequencing project is organized.
The second web page is a very good tutorial. You should know about the
Smith-Waterman, FASTA, and BLAST algorithms, as well as how the scoring
matrix represents the "theory of evolution".
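As a concrete reference point for the Smith-Waterman reading, here is a minimal sketch of the local-alignment recurrence in Python. The match, mismatch, and gap values are illustrative assumptions, not taken from any of the readings. Real searches score residue pairs with a substitution matrix such as PAM or BLOSUM, which is where the evolutionary model enters, and usually use affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions; a linear gap penalty is used.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # start a new alignment
                          H[i - 1][j - 1] + s,    # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-')
            i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1])
            j -= 1
    return best, ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

For example, smith_waterman("TTACGTT", "GGACGGG") picks out the shared ACG segment with score 6 under these assumed parameters.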
Then there is information on BLAST. The NCBI web pages are very detailed,
and you should note that the statistical analysis that underlies BLAST is
extremely important: it gives you a level of confidence for the results.
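The level of confidence comes from the Karlin-Altschul statistics: for a query of length m and a database of length n, the expected number of chance alignments with raw score at least S is E = K m n e^(-lambda S). The sketch below simply evaluates that formula and the related bit score. The function names are mine, and any values of lambda and K you pass in are assumptions on your part, since the real values depend on the scoring matrix and gap penalties in use.

```python
import math

def e_value(raw_score, m, n, lam, K):
    """Expected number of chance hits scoring >= raw_score: K*m*n*exp(-lam*S)."""
    return K * m * n * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, K):
    """Normalized score S' = (lam*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(K)) / math.log(2)

def p_value(e):
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    return -math.expm1(-e)
```

Note that for small E the P-value and the E-value are nearly equal, which is why BLAST reports only E.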
Reference 5 is an example of pattern finding in sequences, in this case
to classify the "family" to which a protein belongs.
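PROSITE patterns use a compact syntax: elements are separated by "-", x matches any residue, [..] lists the allowed residues, {..} lists forbidden ones, (n) or (n,m) gives a repeat count, and < or > anchors the pattern to a sequence end. A minimal sketch, assuming only that subset of the syntax, translates a pattern into a Python regular expression. The function name and the example pattern in the note below are illustrative.

```python
import re

# One pattern element: x, a residue, [allowed], or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r'(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?')

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    regex = ''
    for element in pattern.strip().rstrip('.').split('-'):
        anchor_start = element.startswith('<')
        anchor_end = element.endswith('>')
        element = element.lstrip('<').rstrip('>')
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError('unsupported element: ' + element)
        core, lo, hi = m.groups()
        if core == 'x':
            core = '.'                        # any residue
        elif core.startswith('{'):
            core = '[^' + core[1:-1] + ']'    # none of these residues
        if lo is not None:
            core += '{' + lo + (',' + hi if hi else '') + '}'
        regex += ('^' if anchor_start else '') + core + ('$' if anchor_end else '')
    return regex
```

With this sketch, the pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H becomes C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H, which can be passed to re.search to scan a protein sequence.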
The last four references deal with multiple alignment of sequences.
- Basic biotechnology behind sequencers.
Read Genomics1, Genomics2, Genomics3, and the output of the ABI sequencer.
- A Tutorial on Searching Sequence Databases and Sequence Scoring Methods
- Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z.,
Miller, W., and Lipman, D.J. 1997.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,
Nucleic Acids Research 25: 3389-3402.
An extremely good online guide and tutorial is available at NCBI.
- S.F. Altschul,
The statistics of sequence similarity scores.
- Hofmann, K; Bucher, P; Falquet, L and Bairoch, A (1999).
The PROSITE database, its status in 1999. Nucl. Acids Res. 27, 215-219.
PROSITE web site.
See the information on patterns and motifs, including how to construct them,
in the Stanford Biochem 218 slides.
- Thompson J.D., Higgins D.G., Gibson T.J.;
"CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice.";
Nucleic Acids Res. 22:4673-4680(1994).
See the help at the
clustalw web server
for more information.
- B Morgenstern, K Frech, A Dress, and T Werner,
DIALIGN: finding local similarities by multiple sequence alignment,
Bioinformatics 1998 14: 290-294.
- B Morgenstern,
DIALIGN 2: improvement of the segment-to-segment approach to multiple
sequence alignment, Bioinformatics 1999 15: 211-218.
See the DIALIGN web page.
- J D Thompson, F Plewniak, and O Poch,
A comprehensive comparison of multiple sequence alignment programs,
Nucleic Acids Res. 1999 27: 2682-2690.
Other Links
Introduction to making and using protein multiple alignments
Tutorial: Phylogenetic analysis given at ISMB 1999.
Phylogeny Programs
Secondary Structure Prediction
- Secondary Structure Prediction methods and links provides a good overview
and links to servers.
- X. Zhang, J.P. Mesirov, D.L. Waltz,
Hybrid system for protein secondary structure prediction,
Journal of Molecular Biology, 225 (1992) 1049-1063.
- A.A. Salamov and V.V. Solovyev,
Prediction of protein secondary structure by combining nearest-neighbor
algorithms and multiple sequence alignments,
Journal of Molecular Biology, 247 (1995) 11-15.
- B. Rost and C. Sander,
Prediction of protein secondary structure at better than 70% accuracy,
Journal of Molecular Biology, 232 (1993) 584-599.
- S. Salzberg and S. Cost,
Predicting protein secondary structure with a nearest-neighbor algorithm,
Journal of Molecular Biology, 227 (1992) 371-374.
- J.U. Bowie, R. Luthy, D. Eisenberg,
A method to identify protein sequences that fold into a known three-dimensional structure, Science 253 (1991) 164-170.
- R. King and M.J.E. Sternberg,
Machine learning approach for the prediction of protein secondary structure,
Journal of Molecular Biology, 216 (1990) 441-457.
Gene Expression Analysis
The first two papers discuss the steps in using microarrays for
comparative gene expression.
The next three papers look at the issue of data normalization.
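As a baseline for reading the normalization papers, the simplest scheme for two-channel cDNA arrays is a global shift: take log2(red/green) for every spot and subtract the median, on the assumption that most genes are not differentially expressed. The sketch below shows only that baseline; the function name is mine, and the cited papers propose more refined strategies.

```python
import math
from statistics import median

def median_normalize(red, green):
    """Return log2(red/green) ratios shifted so that their median is zero.

    Assumes all intensities are positive (background already handled).
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    shift = median(log_ratios)
    return [m - shift for m in log_ratios]
```

For instance, red intensities 200, 400, 800 against a constant green intensity of 100 give log ratios 1, 2, 3, which normalize to -1, 0, 1.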
The sixth paper discusses the statistical problems in analyzing microarray data.
The remaining papers present clustering approaches with applications to
gene expression analysis.
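To fix ideas before the clustering papers, here is a minimal sketch of one common baseline: average-linkage agglomerative clustering of expression profiles, using 1 minus the Pearson correlation as the distance. It is not the method of any particular paper below, the function names are mine, and it is written for clarity rather than speed.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    Returns a list of clusters, each a sorted list of profile indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def dist(c1, c2):
        # Average pairwise correlation distance between two clusters.
        total = sum(1 - pearson(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best_pair, best_d = None, None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = dist(clusters[p], clusters[q])
                if best_d is None or d < best_d:
                    best_pair, best_d = (p, q), d
        p, q = best_pair
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]
```

Because the distance is correlation-based, two genes with the same shape of response cluster together even when their absolute expression levels differ.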
- Jeremy Buhler,
Anatomy of a Comparative Gene Expression Study.
- Michael B. Eisen and Patrick O. Brown,
DNA Arrays for Analysis of Gene Expression,
Methods in Enzymology, vol. 303 (1999) pp. 179-205.
- Yee Hwa Yang, Sandrine Dudoit, Percy Luu and Terry Speed,
Normalization for cDNA Microarray Data.
SPIE BiOS 2001, San Jose, California, January 2001.
- Johannes Schuchhardt, Dieter Beule, Arif Malik, Eryc Wolski, Holger Eickhoff,
Hans Lehrach, and Hanspeter Herzel,
Normalization strategies for cDNA microarrays,
Nucleic Acids Res. 2000 28: e47.
- Alexander Zien, Thomas Aigner, Ralf Zimmer, and Thomas Lengauer,
Centralization: a new method for the normalization of gene expression data,
Bioinformatics 2001 17: 323S-331S.
- Sandrine Dudoit, Yee Hwa Yang, Matt Callow and Terry Speed,
Statistical methods for identifying differentially expressed genes in replicated
cDNA microarray experiments, Technical report #578, August 2000.
- A. Brazma and J. Vilo,
Minireview: Gene Expression Data Analysis.
FEBS Letters 480 (2000) 17-24.
- J. Zhu and M. Q. Zhang,
Cluster, Function and Promoter: Analysis of Yeast Expression Array,
Pacific Symposium on Biocomputing 5:476-487 (2000).
- J. Vilo, A. Brazma, I. Jonassen, A. Robinson, and E. Ukkonen,
Mining for Putative Regulatory Elements in the Yeast Genome Using Gene Expression
Data. ISMB'2000 August 2000. AAAI press. pp. 384-394.
- Pavlidis, P., Grundy W.N. (2000)
Combining microarray expression data and phylogenetic profiles to learn gene
functional categories using support vector machines.
Technical report, Columbia University Department of Computer Science.
- R. Sasik, T. Hwa, N. Iranfar, and W.F. Loomis,
Percolation Clustering: A Novel Algorithm Applied to the Clustering of
Gene Expression Patterns in Dictyostelium Development,
Pacific Symposium on Biocomputing 6:335-347 (2001).
- Ka Yee Yeung, David R. Haynor and Walter L. Ruzzo,
Validating Clustering for Gene Expression Data,
Technical Report UW-CSE-00-01-01, January, 2000.
Also appeared as
Bioinformatics, 2001 v 17 #4: 309-318.
Supplementary Web Site
- Laura Lazzeroni and Art Owen,
Plaid Models for Gene Expression Data,
Technical Report, Stanford University, March 2000.
3D Structure Prediction
- R. Srinivasan and G.D. Rose,
LINUS: A hierarchic procedure to predict the fold of a protein,
Proteins: Structure, Function, and Genetics 22 (1995) 81-99.
LINUS home page
- S. Lemieux, S. Oldziej and F. Major,
Nucleic Acids: Qualitative Modeling,
in The Encyclopedia of Computational Chemistry,
P. Schleyer et al. (editors),
John Wiley & Sons: Chichester, 1998.
- J.R. Gunn,
Sampling protein conformations using segment libraries and a genetic algorithm,
J. Chem. Phys. 106 (1997) 4270-4281.
- Liisa Holm and Chris Sander,
Protein structure comparison by alignment of distance matrices,
Journal of Molecular Biology 233 (1993) 123-138.
The Dali server
Other Links
STRUCTURE PREDICTION FLOWCHART
WWW Resources for Protein Structure
Protein-protein docking programs.
Protein Docking slides from Stanford Biochem 218.
General Information Sources
- Biochem 218 Computational Molecular Biology course at Stanford.
Last modified on May 1, 2003 by gregb@cs.concordia.ca