MBT/Genetics 541 Homework Assignment 2

Due Friday Apr. 13

  1. Read section 7.3 of Durbin et al.
  2. Write a program to:
    1. Read in a DNA sequence data set such as is produced by Dnatree. Above 60 sites, Dnatree wraps the sequences in a way that will require thought, so let's assume you have sequences 60 sites long or less (i.e., don't bother with the wrapping issue unless you feel energetic).
    2. Take these sequences and, one site at a time, count the number of changes they require on a phylogeny of this tree topology: (A,(B,(C,(D,(E,(F,(G,(H,(I,J))))))))). Note that this is of a highly stereotyped shape, with the lineage to A branching off first, then the lineage to B, then C, and so on. I chose this so you do not have to deal with the tedious bookkeeping of representing an arbitrary tree -- you should be able to do everything with arrays (tables) in a simple way. The Fitch algorithm will be the best one to use.
    3. Use Dnatree to produce a data set and then use your program to count the number of changes it needs on this tree. Of course you can also count them by using the tree rearrangement part of Dnatree to make the tree and evaluate it. This will serve as a check on your program (hint -- for testing try a data set 1 site long).
  3. Email me (joe@genetics) the data set and your program's output. I can answer a limited number of questions by email as you work.
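
To make the counting step in item 2 concrete, here is a minimal sketch in Python of the Fitch set operations on the fixed tree (A,(B,(C,(D,(E,(F,(G,(H,(I,J))))))))). It is only one possible way to organize the computation, not a reference solution. The function names (fitch_site_changes, count_changes) are made up for illustration. The sketch assumes the sequences have already been read in and are supplied in tip order A through J, and it treats every character as an ordinary state, so ambiguity codes and gaps would need separate handling. Reading the Dnatree file is not shown.

```python
def fitch_site_changes(states):
    """Count changes at one site; states are the tip bases in order A..J."""
    # Start at the innermost tip (J) and work outward toward A.
    current = {states[-1]}
    changes = 0
    for base in reversed(states[:-1]):
        common = current & {base}
        if common:
            # Sets overlap: keep the intersection, no change is needed.
            current = common
        else:
            # Sets are disjoint: take the union and count one change.
            current = current | {base}
            changes += 1
    return changes


def count_changes(sequences):
    """Return (per-site counts, total) for sequences in tip order A..J."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one column (site) at a time.
    per_site = [fitch_site_changes(column) for column in zip(*sequences)]
    return per_site, sum(per_site)


if __name__ == "__main__":
    # One-site test, as the hint suggests: A..E carry A, F..J carry C.
    print(fitch_site_changes("AAAAACCCCC"))  # prints 1
```

Because each interior node of this tree joins one tip to the set carried up from the rest of the tree, a single running set is enough and no tree data structure is needed.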