MBT 599 Homework Assignment 1

Due Friday Jan. 12

  1. Read Chapter 1 (review of molecular biology) of Clote and Backofen.
  2. Spend an hour or two exploring the NCBI web site, following as many links as possible, reading as much material as you can, and getting an idea of the overall structure of the site.
  3. Find a bacterium or archaeon whose complete genome sequence is available on that site and is at least 500,000 bases long, and for which one of the organism's initials (i.e. the first letter of its genus name, or the first letter of its species name) is the same as one of your initials (if no organism meets this condition, choose one at random). For this organism, find a file in "FASTA" format (i.e. one with a header line that starts with the character ">" and includes the organism name, followed by the sequence itself on subsequent lines) containing the complete genome sequence; this file will have a name with the extension ".fna". Download this file.
  4. Write a program that reads the file you downloaded in step 3, counts the nucleotides of each type in the sequence (i.e. the number of A's, the number of C's, etc., including any ambiguously coded ones such as N, R, or Y), and prints out
    1. the name of the file
    2. the header line of the file
    3. a table indicating the nucleotide counts, and the total number of nucleotides.
  5. Email the output from running your program to me and Joe. Please make it as compact as possible. Do NOT send the code itself. Include the output in the body of your email message (as plain text), NOT as an attachment.
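
As a rough illustration of the kind of program step 4 describes, here is a minimal sketch in Python; the assignment does not prescribe a language, so that choice is an assumption. The function names (count_nucleotides, format_report, report_for_file) are hypothetical, and the sketch assumes the file holds a single FASTA record. If the file has more than one header line, only the first is reported, but all sequence lines are counted.

```python
from collections import Counter


def count_nucleotides(lines):
    """Return (header, counts) for FASTA text given as an iterable of lines.

    Assumption: one record per file. Only the first header line is kept;
    sequence lines under any later headers are still counted.
    """
    header = None
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is None:
                header = line
            continue  # header lines are not part of the sequence
        # Upper-case so that soft-masked (lower-case) bases are counted too.
        counts.update(line.upper())
    return header, counts


def format_report(filename, header, counts):
    """Build a compact plain-text report: file name, header, count table, total."""
    rows = [f"File:   {filename}", f"Header: {header}"]
    for base in sorted(counts):
        rows.append(f"{base}\t{counts[base]}")
    rows.append(f"Total\t{sum(counts.values())}")
    return "\n".join(rows)


def report_for_file(path):
    """Read the FASTA file at `path` and return its report as a string."""
    with open(path) as handle:
        header, counts = count_nucleotides(handle)
    return format_report(path, header, counts)
```

Calling print(report_for_file("genome.fna")), where "genome.fna" stands in for whatever file you downloaded, would print the three items listed in step 4. Using a Counter keyed on whatever characters appear means ambiguity codes such as N, R, or Y show up in the table without being listed in advance.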