MBT 599 Homework Assignment 1
Due Friday Jan. 12
- Read Chapter 1 (review of molecular biology) of Clote and Backofen.
- Spend an hour or two exploring the NCBI web site, following as many links as possible,
reading as much material as you can, and getting an idea of the overall
structure of the site.
- Find a bacterium or archaeon whose complete genome sequence is
available on that site and is at least 500,000 bases long, and one of
whose initials (i.e. the first letter of its genus name or of its species
name) matches one of your initials
(if no organism meets this condition, choose one at random).
For this organism, find a file in "FASTA" format (i.e. having a header
line which starts with the character ">" and includes the organism name,
with the sequence itself following on subsequent lines) containing the
complete genome sequence; this file will have a name with the
extension ".fna". Download this file.
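The header-then-sequence layout described above can be read with a few lines of Python. This is a minimal sketch, not a prescribed solution. The function name `parse_fasta` and the sample record are hypothetical. It assumes a plain-text (not compressed) file and returns every record it finds, because some complete-genome files hold more than one sequence (for example, a chromosome and its plasmids), each with its own ">" header line.

```python
def parse_fasta(lines):
    """Split FASTA-formatted lines into (header, sequence) pairs.

    A header line starts with '>'; the non-blank lines that follow it,
    up to the next header, are joined into one sequence string.
    Sequence lines appearing before the first header are ignored.
    """
    records = []
    header = None
    chunks = []
    for line in lines:
        line = line.strip()          # drop newline and stray whitespace
        if not line:
            continue                 # skip blank lines
        if line.startswith(">"):
            if header is not None:   # close the previous record
                records.append((header, "".join(chunks)))
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:           # close the last record
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-line sequence under a single header.
example = [">example Hypothetical organism", "ACGTACGTNN", "ACGT"]
print(parse_fasta(example))
# → [('>example Hypothetical organism', 'ACGTACGTNNACGT')]
```

To read a downloaded file, pass an open file handle, e.g. `with open("genome.fna") as f: records = parse_fasta(f)`, where "genome.fna" is a placeholder name.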
- Write a program that reads in the file you downloaded in the previous
step, counts the nucleotides of each type in the sequence (i.e. the number
of A's, the number of C's, etc., including ambiguity codes such as N, R,
and Y, if any), and prints out
  - the name of the file
  - the header line of the file
  - a table of the nucleotide counts, and the total number of nucleotides.
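A program of the kind described above might look like the following minimal Python sketch. The file name `count_nt.py` and the functions `count_nucleotides`, `format_report` and `main` are hypothetical. It assumes an uncompressed .fna file given as the only command-line argument. It prints every header line it sees (in case the file holds more than one record), and it upper-cases sequence letters before counting, so lowercase and uppercase versions of the same code are pooled. Any character other than A, C, G, T (such as N, R, Y) is counted and listed after the four standard bases.

```python
import sys
from collections import Counter


def count_nucleotides(path):
    """Return (header_lines, counts) for a FASTA file.

    counts maps each upper-cased sequence character to its number of
    occurrences, so ambiguity codes (N, R, Y, ...) are counted as well.
    """
    headers = []
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                headers.append(line)        # keep header lines verbatim
            else:
                counts.update(line.upper())  # count each character
    return headers, counts


def format_report(path, headers, counts):
    """Build a compact plain-text report: file name, header(s), count table."""
    out = [f"File: {path}"]
    out.extend(headers)
    # Standard bases first, then any other (ambiguous) codes alphabetically.
    order = [b for b in "ACGT" if b in counts]
    order += sorted(b for b in counts if b not in "ACGT")
    for base in order:
        out.append(f"{base:<6}{counts[base]:>10}")
    out.append(f"{'Total':<6}{sum(counts.values()):>10}")
    return "\n".join(out)


def main(argv):
    """Expect exactly one argument: the path to the .fna file."""
    if len(argv) != 1:
        print("usage: python count_nt.py GENOME.fna", file=sys.stderr)
        return 1
    headers, counts = count_nucleotides(argv[0])
    print(format_report(argv[0], headers, counts))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: `python count_nt.py genome.fna`, where "genome.fna" stands for whatever file you downloaded. Building the report as one string keeps the output compact enough to paste into the body of an email.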
- Email the output from running your program to me and Joe. Please
make it as compact as possible. Do NOT send the code itself. Include
the output in the body of your email message (as plain text), NOT as
an attachment.