Introduction to Computational Molecular Biology:
Genome and Protein Sequence Analysis
(Winter Quarter 2015)
Assignment 1, due Sunday Jan. 18
Assignment 2, due Sunday Jan. 25
Assignment 3, due Sunday Feb. 1
Assignment 4, due Sunday Feb. 8
Assignment 5, due Sunday Feb. 15
Assignment 6, due Sunday Feb. 22
Assignment 7, due Sunday Mar. 1
Assignment 8, due Sunday Mar. 8
Assignment 9, due Sunday Mar. 15
SYLLABUS & LECTURE SLIDES:
Nature paper on Avida
Avida web site
Nature paper on human genome sequence
Nature paper on mouse genome sequence
Siepel et al. paper on PhyloHMMs & sequence conservation
Rabiner tutorial on HMMs
HMM scaling tutorial (Tobias Mann)
Supervised learning tutorial
- Biological Review: Gene and genome structure in prokaryotes and eukaryotes; the genetic code & codon usage; "global" genome organization. Sources and characteristics of sequence data; Genbank and other sequence databases.
- Lecture 1: Finding exact matches in sequences using suffix arrays.
- Lecture 2: Algorithmic complexity. Directed graphs; depth structure of directed acyclic graphs (DAGs); trees and linked lists. Reading: Durbin et al. section 2.1, 2.2, 2.3.
- Discussion Section 1: HW1 and general programming tips.
- Lecture 3: Dynamic programming on weighted DAGs. Reading: Durbin et al. 2.4, 2.5, 2.6.
- Lecture 4: Maximal-scoring sequence segments. Edit graphs & sequence alignment. Smith-Waterman algorithm. Needleman-Wunsch algorithm. Local vs. global. Reading: Durbin et al. 6.1, 6.2, 6.3; Ewens & Grant 1.1, 1.2, 1.12, 3.1, 3.2, 3.4, 3.6, 5.2, 9.1, 9.2.
- Discussion Section 2: HW2 and C/C++ tips.
- Lecture 5: Multiple sequence alignment. Linear space algorithms. Reading: Ewens & Grant 5.3.1, 5.3.2, 12.1, 12.2, 12.3; Durbin et al. chapter 3.
- Lecture 6: General & affine gap penalties. Profiles. Smith-Waterman special cases. Word nucleation approaches/BLAST.
- Discussion Section 3: BLAST.
- Lecture 7: Probability models on sequences; review of basic probability theory: probability spaces, conditional probabilities, independence. Comparing alternative models. Failure of equal frequency assumption for DNA. Site models. Examples: 3' splice sites. Reading: Ewens & Grant 12.2, 12.3, 1.14, Appendix B.10; Durbin et al. chapter 3.
- Lecture 8: Site model examples: 5' splice sites, protein motifs. Site probability models. Comparing alternative models. Neyman-Pearson lemma. Weight matrices for site models. Weight matrices for splice sites in C. elegans. Score distributions.
- Discussion Section 4: Bayesian and frequentist methods.
- Lecture 9: Limitations of site models (variable spacing, non-independence). Hidden Markov Models: introduction; formal definition.
- Lecture 10: HMM examples: splice sites; 2-state models; 7-state prokaryote genome model. Probabilities of sequences. Reading: Siepel et al.
- Discussion Section 5: Applications of Hidden Markov Models in comp-bio.
- Lecture 11: Probabilities of sequences; computing HMM probabilities via associated WDAG. HMM parameter estimation: Viterbi training.
- Lecture 12: Baum-Welch (EM) algorithm; techniques for finding global maxima in likelihood surface. Detection of evolutionarily conserved regions using Phylo-HMMs.
- Discussion Section 6: Bayesian networks.
- Lecture 13: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Lecture 14: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Discussion Section 7: UCSC genome browser.
- Lecture 15: Supervised machine learning, logistic regression, gradient descent.
- Lecture 16: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd). Multiple alignment using HMMs.
- Discussion Section 8: Discriminative vs. generative machine learning models, and model complexity.
- Lecture 17: Maximal scoring segments. D-segments, relationship to 2-state HMMs.
- Lecture 18: Information theory: entropy. Information inequality. Distributions with maximum entropy. Boltzmann distribution. Coding theory/data compression, uniquely decodable codes. Kraft inequality.
- Discussion Section 9: Performance measures for classification.
- Lecture 19: Kraft inequality, entropy & expected code length. Information. MDL principle and overfitting. Relative entropy. Relative entropies of site models.
- Lecture 20: Sequence logos. Exact & approximate probability distributions for weight matrix scores. Maximal scoring segments. Karlin-Altschul theory.
- Discussion Section 10: How to organize a computational biology project.
C/C++ PROGRAMMING GUIDES:
OTHER RELEVANT COURSES AT UW:
COMPUTATIONAL BIOLOGY COURSES AT OTHER SITES: