Introduction to Computational Molecular Biology:
Genome and Protein Sequence Analysis
(Winter Quarter 2016)
Assignment 1, due Sunday Jan 17
Assignment 2, due Sunday Jan 24
Assignment 3, due Sunday Jan 31
Assignment 4, due Sunday Feb 7
Assignment 5, due Sunday Feb 14
Assignment 6, due Sunday Feb 21
Assignment 7, due Sunday Feb 28
Assignment 8, due Sunday Mar 6
Assignment 9, due Sunday Mar 13
SYLLABUS & LECTURE SLIDES:
Nature paper on Avida
Avida web site
Nature paper on human genome sequence
Nature paper on mouse genome sequence
Siepel et al. paper on PhyloHMMs & sequence conservation
Rabiner tutorial on HMMs
HMM scaling tutorial (Tobias Mann)
Supervised learning tutorial
- Biological Review : Gene and genome structure in prokaryotes and eukaryotes; the genetic code & codon usage; "global" genome organization. Sources and characteristics of sequence data; Genbank and other sequence databases.
- Lecture 1: Finding exact matches in sequences using suffix arrays.
- Lecture 2: Algorithmic complexity. Directed graphs; depth structure of directed acyclic graphs (DAGs); trees and linked lists. Reading: Durbin et al. section 2.1, 2.2, 2.3.
- Discussion Section 1: HW1 and general programming tips.
- Lecture 3: Dynamic programming on weighted DAGs. Reading: Durbin et al. 2.4, 2.5, 2.6.
- Lecture 4: Maximal-scoring sequence segments. Edit graphs & sequence alignment. Reading: Durbin et al. 6.1, 6.2, 6.3; Ewens & Grant 1.1, 1.2, 1.12, 3.1, 3.2, 3.4, 3.6, 5.2, 9.1, 9.2
- Discussion Section 2: HW1 & 2, DAGs, more graph algorithms, dynamic programming, RNA folding.
- Lecture 5: Smith-Waterman algorithm. Needleman-Wunsch algorithm. Local vs. global. Multiple sequence alignment. Linear space algorithms. Reading: Ewens & Grant 5.3.1, 5.3.2, 12.1, 12.2, 12.3; Durbin et al. chapter 3
- Lecture 6: Linear space algorithms (cont'd). General & affine gap penalties. Profiles.
- Discussion Section 3: BLAST.
- Lecture 7: Smith-Waterman special cases. Word nucleation approaches/BLAST. Probability models on sequences; review of basic probability theory: probability spaces, conditional probabilities, independence. Reading: Ewens & Grant 12.2, 12.3, 1.14, Appendix B.10; Durbin et al. chapter 3
- Lecture 8: Probabilities on sequences. Failure of equal frequency assumption for DNA. Site models. Site model examples: 3' splice sites, 5' splice sites, protein motifs. Site probability models.
- Discussion Section 4: BOWTIE, MUSCLE.
- Lecture 9: Comparing alternative models. Neyman-Pearson lemma. Weight matrices for site models. Weight matrices for splice sites in C. elegans. Score distributions.
- Lecture 10: Limitations of site models (variable spacing, non-independence). Hidden Markov Models: introduction; formal definition. Reading: Siepel et al.
- Discussion Section 5: motif finding.
- Lecture 11: HMM examples: -- splice sites; 2-state models; 7-state prokaryote genome model. Probabilities of sequences. Computing HMM probabilities via associated WDAG.
- Lecture 12: Computing HMM probabilities via associated WDAG. HMM Parameter estimation: Viterbi training; Baum-Welch (EM) algorithm; techniques for finding global maxima in likelihood surface.
- Discussion Section 6: Viterbi algorithm; bootstrapping.
- Lecture 13: Techniques for finding global maxima in likelihood surface. Detection of evolutionarily conserved regions using Phylo-HMMs.
- Lecture 14: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Lecture 15: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Lecture 16: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Discussion Section 8: EM algorithm, MCMC.
- Lecture 17: Maximal scoring segments. D-segments, relationship to 2-state HMMs.
- Lecture 18: Distribution of maximal segment scores in background sequence. Karlin-Altschul theory.
- Discussion Section 9: Faster Python programs: Cython, profiling.
- Lecture 19: Karlin-Altschul theory (cont'd). Information theory.
- Discussion Section 10: Unix tools. Organizing computational biology projects. Version control, GIT.
C/C++ PROGRAMMING GUIDES:
OTHER RELEVANT COURSES AT UW:
COMPUTATIONAL BIOLOGY COURSES AT OTHER SITES: