MBT 599 Homework Assignment 5
Due Friday Feb. 9
- Write a program that does the following, for the same genome sequence you used in assignments 1-4:
- Reads the Genbank file (with suffix .gbk), and from the FEATURES entries infers the locations of the starts of the coding sequences on both strands.
- Uses the start locations found in the previous step to compute count and frequency matrices (of the type presented in lecture 10 for C. elegans splice sites) for the translation start sites. These should extend from position -10 (i.e. 10 bases upstream of the first base of the start codon) to position +10 (i.e. 10 bases downstream of that base). To generate these you will need to read in the genome sequence (which appears later in the Genbank file), and to reverse-complement it in order to handle genes on the opposite strand correctly.
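As an illustration only, the two steps above might be sketched as follows in Python. This is a minimal sketch, not a reference solution: the function names are hypothetical, the regular expression accepts only simple locations of the form `a..b` and `complement(a..b)` (joined or partial locations are skipped), and the genome is treated as linear, so any start site within 10 bases of either end is dropped. The first base of the start codon is taken as position 0, which gives 21 columns.

```python
import re

# Translation table for complementing DNA; reversal is done separately.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Assumption: only simple CDS locations are handled by this sketch.
CDS_LINE = re.compile(r"^ {5}CDS\s+(complement\()?(\d+)\.\.(\d+)\)?\s*$")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def parse_genbank(lines):
    """Return (starts, sequence); starts holds (1-based position, strand)."""
    starts, seq_parts, in_origin = [], [], False
    for line in lines:
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break
        if in_origin:
            # Sequence lines carry a base number and spaces; keep letters only.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
            continue
        m = CDS_LINE.match(line)
        if m:
            lo, hi = int(m.group(2)), int(m.group(3))
            # On the complementary strand the start codon begins at the
            # higher coordinate.
            starts.append((hi, "-") if m.group(1) else (lo, "+"))
    return starts, "".join(seq_parts).upper()


def count_matrix(starts, seq, flank=10):
    """Count each base at positions -flank..+flank around every start."""
    counts = {b: [0] * (2 * flank + 1) for b in "ACGT"}
    for pos, strand in starts:
        i = pos - 1  # convert to a 0-based index
        if i - flank < 0 or i + flank + 1 > len(seq):
            continue  # window runs off the end of a linear sequence
        window = seq[i - flank : i + flank + 1]
        if strand == "-":
            window = reverse_complement(window)
        for col, base in enumerate(window):
            if base in counts:  # ambiguity codes such as N are ignored
                counts[base][col] += 1
    return counts


def frequency_matrix(counts):
    """Divide each column of the count matrix by its column total."""
    width = len(counts["A"])
    totals = [sum(counts[b][c] for b in "ACGT") for c in range(width)]
    return {
        b: [counts[b][c] / totals[c] if totals[c] else 0.0 for c in range(width)]
        for b in "ACGT"
    }


# Made-up toy record: one gene on each strand, both starting with ATG.
toy_lines = [
    "LOCUS       TOY                       46 bp    DNA",
    "FEATURES             Location/Qualifiers",
    "     gene            11..40",
    "     CDS             11..40",
    "     CDS             complement(20..36)",
    "     CDS             join(1..5,8..12)",
    "ORIGIN",
    "        1 cccccccccc atgccccccc cccccccccc ccccatgggg gggggg",
    "//",
]
starts, seq = parse_genbank(toy_lines)
counts = count_matrix(starts, seq)
freqs = frequency_matrix(counts)
```

In the toy record the joined location is skipped and the `gene` line is ignored, so two start sites contribute to the matrices. A real genome file would be read with `open(path)` and passed to `parse_genbank` line by line in the same way.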
Your output should provide:
- the name and first line of the Genbank file
- the matrix of nucleotide counts at each position from -10 to +10, and the corresponding matrix of frequencies.
- Email this to me and Joe. Please make it as compact as possible. Do NOT send the code itself. Include the output in the body of your email message (as plain text), NOT as an attachment.
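
One possible way to lay out the requested output as compact plain text is sketched below. The function names, the labels ("File:", "Counts:" and so on) and the column widths are all assumptions chosen for illustration; the matrices are assumed to be dictionaries mapping each base to a list of values for positions -flank to +flank.

```python
def format_matrix(matrix, flank=10, cell="{:>6d}"):
    """Render one matrix as text: a header of positions, then one row per base."""
    positions = range(-flank, flank + 1)
    rows = ["pos " + "".join("{:>6d}".format(p) for p in positions)]
    for base in "ACGT":
        rows.append("{:<4}".format(base) + "".join(cell.format(v) for v in matrix[base]))
    return "\n".join(rows)


def format_report(filename, first_line, counts, freqs, flank=10):
    """Combine file name, first line of the file, and both matrices."""
    return "\n".join([
        "File: " + filename,
        "First line: " + first_line.rstrip(),
        "Counts:",
        format_matrix(counts, flank, "{:>6d}"),
        "Frequencies:",
        format_matrix(freqs, flank, "{:>6.3f}"),
    ])


# Tiny made-up matrices with flank=1, just to show the layout.
demo_counts = {"A": [0, 2, 0], "C": [2, 0, 0], "G": [0, 0, 0], "T": [0, 0, 2]}
demo_freqs = {b: [v / 2 for v in row] for b, row in demo_counts.items()}
report = format_report("toy.gbk", "LOCUS       TOY\n", demo_counts, demo_freqs, flank=1)
print(report)
```

With 21 columns each row is fairly wide, and some mail programs wrap long plain-text lines, so narrower cells may be worth trying if the table arrives garbled.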