Introduction to Computational Molecular Biology:
Genome and Protein Sequence Analysis
(Winter Quarter 2016)
Assignment 1, due Sunday Jan 17
Assignment 2, due Sunday Jan 24
Assignment 3, due Sunday Jan 31
Assignment 4, due Sunday Feb 7
Assignment 5, due Sunday Feb 14
SYLLABUS & LECTURE SLIDES:
Nature paper on Avida
Avida web site
Nature paper on human genome sequence
Nature paper on mouse genome sequence
Siepel et al. paper on PhyloHMMs & sequence conservation
Rabiner tutorial on HMMs
HMM scaling tutorial (Tobias Mann)
Supervised learning tutorial
- Biological Review: Gene and genome structure in prokaryotes and eukaryotes; the genetic code & codon usage; "global" genome organization. Sources and characteristics of sequence data; GenBank and other sequence databases.
- Lecture 1: Finding exact matches in sequences using suffix arrays.
- Lecture 2: Algorithmic complexity. Directed graphs; depth structure of directed acyclic graphs (DAGs); trees and linked lists. Reading: Durbin et al. sections 2.1, 2.2, 2.3.
- Discussion Section 1: HW1 and general programming tips.
- Lecture 3: Dynamic programming on weighted DAGs. Reading: Durbin et al. 2.4, 2.5, 2.6.
- Lecture 4: Maximal-scoring sequence segments. Edit graphs & sequence alignment. Reading: Durbin et al. 6.1, 6.2, 6.3; Ewens & Grant 1.1, 1.2, 1.12, 3.1, 3.2, 3.4, 3.6, 5.2, 9.1, 9.2.
- Discussion Section 2: HW1 & 2, DAGs, more graph algorithms, dynamic programming, RNA folding.
- Lecture 5: Smith-Waterman algorithm. Needleman-Wunsch algorithm. Local vs. global alignment. Multiple sequence alignment. Linear space algorithms. Reading: Ewens & Grant 5.3.1, 5.3.2, 12.1, 12.2, 12.3; Durbin et al. chapter 3.
- Lecture 6: Linear space algorithms (cont'd). General & affine gap penalties. Profiles.
- Discussion Section 3: BLAST.
- Lecture 7: Smith-Waterman special cases. Word nucleation approaches/BLAST. Probability models on sequences; review of basic probability theory: probability spaces, conditional probabilities, independence. Reading: Ewens & Grant 12.2, 12.3, 1.14, Appendix B.10; Durbin et al. chapter 3.
- Lecture 8: Probabilities on sequences. Failure of equal frequency assumption for DNA. Site models. Site model examples: 3' splice sites, 5' splice sites, protein motifs. Site probability models.
- Discussion Section 4: Bowtie, MUSCLE.
- Lecture 9: Comparing alternative models. Neyman-Pearson lemma. Weight matrices for site models. Weight matrices for splice sites in C. elegans. Score distributions.
- Lecture 10: Limitations of site models (variable spacing, non-independence). Hidden Markov Models: introduction; formal definition. Reading: Siepel et al.
- Discussion Section 5: Motif finding.
C/C++ PROGRAMMING GUIDES:
OTHER RELEVANT COURSES AT UW:
COMPUTATIONAL BIOLOGY COURSES AT OTHER SITES: