CONSED 29.0 DOCUMENTATION

CONTENTS:
    1  WHAT IS NEW IN CONSED 29.0
    2  UPGRADING FROM CONSED 27.0 to 29.0
    3  UPGRADING FROM CONSED 28.0 to 29.0
    4  REQUIRED VERSIONS OF OTHER PROGRAMS
    5  INSTALLING CONSED
    6  NOTE TO LINUX USERS (64 BIT) (INSTALLATION)
    7  NOTE TO LINUX USERS (32 bit) (INSTALLATION)
    8  NOTE TO MACOSX USERS (INSTALLATION)
    9  NOTE TO SOLARIS USERS (INSTALLATION)
    10  QUICK TOUR
    11  QUICK TOUR OF BAMSCAPE
    12  QUICK TOUR OF CONSED (THE GRAPHICAL EDITOR)
    13  BAM2ACE: MAKING A CONSED-READY DATASET OUT OF A BAM FILE
    14  SANGER READS
    15  454 READS
    16  USING AUTOPCRAMPLIFY
    17  USING AUTOPRIMERS
    18  USING AUTOREPORT
    19  FEATURES FOR SNP ANALYSIS
    20  BIONANO DIGEST GENOME MAPS
    21  LESS USED CONSED FEATURES
    22  CONSED CUSTOMIZATION
    23  CREATING CUSTOM TAG TYPES
    24  EXPANDING CONSED'S CAPABILITIES WITH A LITTLE PROGRAMMING
    25  MONITORS AND MICE FOR CONSED
    26  ACE FILE FORMAT
    27  SAMPLE PHD BALL FORMAT
    28  TIMESTAMP MISMATCH
    29  CONSED REFERENCES
    30  RUNNING PHRED and PHRAP
    31  WHAT IS AUTOFINISH?
    32  USING AUTOFINISH
    33  CONTRIBUTED SOFTWARE 
    34  CONSED CUSTOMIZABLE CONSEDRC RESOURCES 
    35  ACKNOWLEDGEMENTS


BIG TABLE OF CONTENTS:

1.  WHAT IS NEW IN CONSED 29.0
2.  UPGRADING FROM CONSED 27.0 to 29.0
3.  UPGRADING FROM CONSED 28.0 to 29.0
4.  REQUIRED VERSIONS OF OTHER PROGRAMS
5.  INSTALLING CONSED
   5.12)  PRELIMINARY TESTING OF CONSED BEFORE COMPLETING THE REST OF THE INSTALLATION
   5.21)  ENOUGH MEMORY FOR CONSED
   5.23)  TESTING THE INSTALLATION
   5.24)  TESTING ADDING ILLUMINA READS
   5.25)  TESTING ADDING 454 READS
   5.26)  TESTING 454 READS (NEWBLER ASSEMBLY)
   5.27)  TESTING ADD NEW READS
   5.29)  TESTING RUNNING CROSS_MATCH FROM ASSEMBLY VIEW
   5.30)  TEST RUNNING PHREDPHRAP
   5.31)  TESTING MINIASSEMBLIES
   5.34)  FAKE READS
   5.35)  APPENDING EXPID TO THE PHD FILES
6.  NOTE TO LINUX USERS (64 BIT) (INSTALLATION)
7.  NOTE TO LINUX USERS (32 bit) (INSTALLATION)
8.  NOTE TO MACOSX USERS (INSTALLATION)
   8.8)  MICE FOR MAC
   8.9)  C COMPILER FOR MAC
9.  NOTE TO SOLARIS USERS (INSTALLATION)
10.  QUICK TOUR
11.  QUICK TOUR OF BAMSCAPE
   11.1)  USING BAMSCAPE
   11.36)  MODIFYING THE REFERENCE SEQUENCE
   11.42)  FINDING PROBLEMS IN BATCH
12.  QUICK TOUR OF CONSED (THE GRAPHICAL EDITOR)
   12.1)  GETTING YOUR OWN COPY OF A SAMPLE DATASET
   12.8)  SCROLLING
   12.12)  VERTICAL SCROLLING
   12.13)  GOTO POSITION
   12.14)  COLORS
   12.16)  HIGHLIGHTING READ NAMES 
   12.20)  MOVING ALONG A READ  
   12.21)  DIMMING AND UNDIMMING ENDS OF READS
   12.22)  FIXING A READ AT TOP OF ALIGNED READS WINDOW
   12.28)  EDITING THE CONSENSUS
   12.29)  SAVING THE ASSEMBLY
   12.30)  EXPORTING THE CONSENSUS
   12.33)  COMPLEMENTING THE CONTIG
   12.34)  FIND MAIN WINDOW
   12.35)  MULTIPLE UNDO EDIT
   12.36)  EXITING CONSED
   12.37)  CONSED -ACE
   12.38)  SORTING OF READS
   12.40)  SORTING BY BASE
   12.41)  SORTING BY MISMATCHES ON TOP
   12.42)  SORTING BY QUALITY
   12.43)  SORTING BY UNALIGNED + MISMATCHES ON TOP
   12.44)  ALPHABETICAL SORTING OF READS
   12.47)  SEARCH FOR STRING
   12.48)  COPY AND PASTE
   12.49)  FINDING VARIANTS/MISASSEMBLED READS/HIGHLY DISCREPANT LOCATIONS
   12.56)  EXTENDING THE CONSENSUS
   12.57)  HIGH AND LOW DEPTH OF COVERAGE REGIONS
   12.63)  ASSEMBLY VIEW
   12.64)  READ DEPTH
   12.65)  FORWARD/REVERSE PAIR DEPTH
   12.67)  INCONSISTENT FORWARD/REVERSE PAIRS
   12.75)  SEQUENCE MATCHES
   12.81)  RUNNING CROSS_MATCH FOR SEQUENCE MATCHES
   12.82)  PULLING OUT READS AND RE-ASSEMBLYING THEM (MINIASSEMBLIES)
   12.86)  MINIASSEMBLIES
   12.90)  HIGHLIGHTING READS TO REMOVE THEM FROM A CONTIG
   12.101)  CONTIG ARRANGEMENT--REORDER CONTIGS
   12.104)  CONTIG ORIENTATION
   12.105)  NAVIGATING
   12.109)  CUSTOM NAVIGATION
   12.111)  TEAR CONTIG
   12.113)  JOIN CONTIGS
   12.114)  COMPARE CONTIGS WINDOW AND INVERTED REPEATS
   12.116)  REMOVING READS
   12.118)  TAGS
   12.122)  CREATING LONG TAGS
   12.125)  CONSENSUS TAGS
   12.127)  WHAT THE COLORS MEAN
   12.128)  SEARCH FOR READ NAME
   12.132)  ONLINE DOCUMENTATION
   12.133)  THE .WRK LOG FILE
   12.134)  FINDING, DISPLAYING, AND MAKING POTENTIAL JOINS
   12.145)  USING CONSED -MAKEJOINS TO MAKE JOINS IN BATCH
   12.147)  PROTEIN TRANSLATION 
   12.148)  OPEN READING FRAMES
   12.154)  DISPLAYING TRACKS WITH SCORES (BED FILES)
   12.155)  FIXING CONTIG-ENDS
   12.164)  FIXING THE CONSENSUS IN BATCH
   12.170)  HANDLING DUPLICATE READ NAMES
13.  BAM2ACE: MAKING A CONSED-READY DATASET OUT OF A BAM FILE
   13.8)  MAKING AN ACE FILE OUT OF AN ENTIRE BAM FILE
14.  SANGER READS
   14.2)  TRACES AND EDITING READS
   14.7)  SCROLLING TRACES AND ALIGNED READS TOGETHER
   14.8)  SHOW ALL TRACES
   14.9)  PRIMER-PICKING
   14.14)  CHECKING WHETHER A PARTICULAR OLIGO WOULD MAKE AN ACCEPTABLE PRIMER
   14.15)  PICKING PCR PRIMER PAIRS
   14.16)  ORDERING OF PRIMERS
   14.19)  ADD NEW READS (SANGER--NOT ILLUMINA OR 454)
   14.20)  ADDING NEW READS IN BATCH (SANGER)
   14.25)  ADDING NEW SANGER READS IN BATCH TO TARGETED REGIONS
   14.32)  CONSED-POLYPHRED INTERACTION TO REVIEW POLYMORPHIC SITES
15.  454 READS
   15.1)  USING 454 READS (NEWBLER ASSEMBLY)
   15.12)  USING 454'S NEWBLER ON YOUR OWN DATA
16.  USING AUTOPCRAMPLIFY
17.  USING AUTOPRIMERS
18.  USING AUTOREPORT
   18.1)  VARIANTS REPORT
19.  FEATURES FOR SNP ANALYSIS
   19.3)  FINDING SNPS IN BATCH
   19.4)  TAGGING A REFERENCE SEQUENCE
20.  BIONANO DIGEST GENOME MAPS
21.  LESS USED CONSED FEATURES
   21.1)  CHANGING THE CONSENSUS IN BATCH ACCORDING TO A SCRIPT
   21.2)  EXPORTING SCAFFOLDS [ EXPORT SCAFFOLDS ]
   21.3)  ADDING PAIRED ILLUMINA READS USING CROSS_MATCH
   21.12)  ADDING UNPAIRED ILLUMINA READS USING CROSS_MATCH
   21.24)  ALIGNING ILLUMINA READS USING CROSS_MATCH AGAINST A LARGE
   21.30)  ALIGNING YOUR OWN ILLUMINA DATA USING CROSS_MATCH TO A
   21.37)  USING 454 READS (ALIGNING WITH CROSS_MATCH TO REFERENCE SEQUENCE )
   21.42)  ADDING ADDITIONAL 454 OR ILLUMINA READS USING CROSS_MATCH
   21.43)  MULTIPLE HIGH QUALITY DISCREPANCIES VS SEARCH FOR HIGHLY
   21.44)  BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY
   21.45)  SELECTIVELY BACKING OUT EDITS AND REMOVING READS
   21.46)  REMOVING READS FROM A PHRAP ASSEMBLY
   21.47)  ADDING READS WITHOUT CHROMATOGRAM FILES
   21.48)  ALIGNING READS TO A BACKBONE
   21.49)  COMPARING READS TO A REFERENCE SEQUENCE
   21.50)  TAGGING ALL READS AT ONCE
   21.51)  EDITING ALL READS AT ONCE
   21.52)  FASTER CONSED STARTUP FOR SANGER READS
   21.53)  VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS
   21.54)  HIDING SOME TYPES OF TAGS
   21.55)  CUSTOM CONTIG NAMES
   21.56)  ERROR RATE
   21.57)  RESTRICTION DIGEST
   21.58)  RESTRICTION DIGEST AND ASSEMBLY VIEW
   21.59)  MULTIPLE TRACE POPUP
   21.60)  MAXIMUM NUMBER OF TRACES DISPLAYED
   21.61)  SCALING THE TRACES 
   21.62)  HOTKEYS FOR EDITING
   21.63)  SCROLLING TRACES INDEPENDENTLY
   21.64)  MEASURING ERROR RATE AND SINGLE SUBCLONE BASES FOR A REGION
   21.65)  PREVENTING 2 USERS FROM MAKING CONFLICTING EDITS
   21.66)  PRINTING CONSED WINDOWS 
   21.67)  COLOR MEANS EDITED AND TAGS
   21.68)  COLOR MEANS MATCH
   21.69)  AUTOEDIT
22.  CONSED CUSTOMIZATION
   22.1)  CUSTOMIZING NAVIGATE BY SINGLE STRANDED REGIONS AND NAVIGATE BY SINGLE
   22.3)  COLOR BLINDNESS
23.  CREATING CUSTOM TAG TYPES
24.  EXPANDING CONSED'S CAPABILITIES WITH A LITTLE PROGRAMMING
   24.1)  BRINGING UP CONSED FROM A SCRIPT
   24.2)  CONTROL OF CONSED FROM SOME OTHER PROGRAM
   24.5)  REMOVING READS IN BATCH
   24.6)  COMPLEMENTING CONTIGS IN BATCH
   24.7)  HOW TO WRITE A CUSTOM NAVIGATION FILE
   24.12)  COMPRESSING CHROMATOGRAMS
   24.13)  READING CHROMATOGRAMS OUT OF AN EXTERNAL DATABASE
   24.14)  COMPRESSING ACE FILES AND PHD BALLS
   24.18)  NO PHD FILES
   24.19)  ADDING TAGS FROM OTHER PROGRAMS
   24.20)  CHROMOSOME POSITIONS/USER-DEFINED CONSENSUS POSITIONS
   24.21)  DEFINING KEYS (HOTKEYS) TO CALL EXTERNAL PROGRAMS AND/OR APPLY TAGS AND/OR
   24.22)  READ PREFIXES
   24.23)  USING FILES CREATED ON WINDOWS OR WINDOWS NT.  
   24.24)  CREATING YOUR OWN ACE FILES (INSTEAD OF ACE FILES CREATED BY
   24.25)  CONSED OPTIONS
25.  MONITORS AND MICE FOR CONSED
26.  ACE FILE FORMAT
27.  SAMPLE PHD BALL FORMAT
28.  TIMESTAMP MISMATCH
29.  CONSED REFERENCES
30.  RUNNING PHRED and PHRAP
   30.7)  COMMON PROBLEMS RUNNING PHREDPHRAP
   30.9)  WHY ARE ALL THE READS NOT IN THE ASSEMBLY?
   30.10)  ARE THERE READS THAT ARE TOTALLY UNALIGNED?
   30.11)  CORRECTING FALSE JOINS MADE BY PHRAP
   30.12)  USING PHRAP ON NEXT-GEN READS
31.  WHAT IS AUTOFINISH?
32.  USING AUTOFINISH
   32.3)  AUTOFINISH:  MINIMUM NUMBER OF ERRORS FIXED PER READ
   32.4)  EDIT PARAMETERS:  HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS
   32.6)  DIVERSION:  UNIX LESSON
   32.7)  AUTOFINISH:  CHANGING MELTING TEMPERATURES
   32.8)  AUTOFINISH:  JUST CLOSING GAPS
   32.9)  AUTOFINISH:  JUST CLOSING GAPS JUST USING WALKS
   32.10)  AUTOFINISH:  NOT REPEATING FAILED EXPERIMENTS
   32.12)  AUTOFINISH:  NOT USING PARTICULAR SUBCLONE TEMPLATES
   32.13)  AUTOFINISH:  NOT USING ENTIRE LIBRARIES FOR FINISHING
   32.14)  MULTIPLE LIBRARIES WITH DIFFERENT INSERT SIZES
   32.15)  AUTOFINISH CLOSING GAPS WITH MINILIBRARIES
   32.17)  CLOSING GAPS USING PCR
   32.19)  AUTOFINISH:  TOO MANY UNIVERSAL PRIMER READS
   32.20)  AUTOFINISH FOR CDNA ASSEMBLIES
   32.21)  AUTOFINISH FOR LISTING GAP-SPANNING TEMPLATES
   32.22)  FINISHING A SPECIFIC CONTIG
   32.23)  MARKING THE END OF THE CLONE
33.  CONTRIBUTED SOFTWARE 
34.  CONSED CUSTOMIZABLE CONSEDRC RESOURCES 
35.  ACKNOWLEDGEMENTS


END BIG TABLE OF CONTENTS

----------------------------------------------------------------------------

1.  WHAT IS NEW IN CONSED 29.0


This is a maintenance release.  It fixes several bugs including:

    * Bamscape:  if you zoom in too far
    * Bam2Ace:  if you had zillions of regions
    * Consed's batch addNewReads gave a segmentation fault
    

----------------------------------------------------------------------------

2.  UPGRADING FROM CONSED 27.0 to 29.0

Since there is an installation script (see INSTALLING CONSED below),
it is easiest to delete (or hide) your current distribution and follow
the consed installation instructions (below).

If you have customized some of consed's perl scripts, you might want to save
some of them.  Be aware, however, that the following perl scripts have changed:

addReads2Consed.perl
bam2Ace.perl
convertBedToBamScape.perl
fasta2Ace.perl
picard2Regions.perl
autoPrimers.perl

In addition, the executables and some of the sample datasets have
changed.  You will get the new ones by using the installation script.


----------------------------------------------------------------------------

3.  UPGRADING FROM CONSED 28.0 to 29.0

The following script is new:

autoPrimers.perl

The executables, of course, have changed.


----------------------------------------------------------------------------

4.  REQUIRED VERSIONS OF OTHER PROGRAMS


Several functions of consed will not work without phrap and
cross_match (which is part of the phrap package).  These functions
include miniassemblies, finding sequence similarity in Assembly View,
and Adding New Reads.  These are important features, but most
consed features will work without phrap and cross_match.

To use all of consed's features with this version of Consed, you MUST
have the following versions of these programs (or later).  If you are
using earlier versions, please upgrade.

(Note that the versions below are dates.  For example, 1.080714 means
year 2008, month 07 (July), and 14th day of the month--ignore the
leading "1.".  Thus 1.080714 is later than 1.080630.  Similarly,
0.990622.e means June 22, 1999--ignore the leading "0.".)
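
If you want to compare two such version numbers in a script, here is a
minimal sketch.  It is not part of the Consed distribution, and the
function names version_to_date and version_at_least are made up for
illustration.  It assumes that two-digit years 70-99 mean 19xx and that
all other years mean 20xx.

```shell
# Convert a version such as 1.080714 or 0.990622.e into a sortable
# 8-digit date (YYYYMMDD).
version_to_date() {
    # The second dot-separated field is the YYMMDD date.
    yymmdd=$(printf '%s\n' "$1" | cut -d. -f2)
    yy=$(printf '%s' "$yymmdd" | cut -c1-2)
    # Assumption: two-digit years 70-99 are 19xx, all others are 20xx.
    case $yy in
        [7-9][0-9]) century=19 ;;
        *)          century=20 ;;
    esac
    printf '%s%s\n' "$century" "$yymmdd"
}

# Succeed if version $1 is the same as or later than version $2.
version_at_least() {
    [ "$(version_to_date "$1")" -ge "$(version_to_date "$2")" ]
}
```

For example, "version_at_least 1.080714 1.080630" succeeds, because
July 14, 2008 is later than June 30, 2008.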


REQUIRED VERSION OF PHRAP TO WORK WITH CONSED

1.080721 or later for phrap and cross_match (This is a more recent
       version than the one you normally get.  See below for how to
       get them.  cross_match comes with phrap.)

You will need to specially request the most recent version of
phrap--not the one that you get with a normal request.  To request the
most recent version of phrap and cross_match (cross_match comes with
phrap), send an email to phg (at) u.washington.edu, with a Subject line
that says "phrap new version request", and an email body that consists
of the following two lines (it should be in exactly this format, to be
computer readable):

Request: phrap ver 1.080721 or later
Registered phrap email address: [[insert address here]]

The address should be the one you supplied previously when obtaining
phrap; the new version will be sent to it. If you have not previously
registered for phrap, or your registered address is no longer valid,
you will need to include a license agreement (with all questions
answered) in the email.  This system is not completely automated so
you may need to wait several days.

If you are using Sanger reads, you will need phred.

0.000925.c or later for phred.  Type phred -V to check.  Contact
       bge (at) u.washington.edu if your phred is earlier than this.

If you are using polyphred, you must have polyphred 3.5 or more
recent.  USING AN OLDER VERSION OF POLYPHRED WILL CAUSE SEVERE
PROBLEMS WITH CONSED WHICH WILL APPEAR AS PROBLEMS WITH FILETYPES.


----------------------------------------------------------------------------

5.  INSTALLING CONSED

To install Consed, you must have some basic Unix system administration
skills.  For example, you must be able to run X applications such as
xterm, you must know what PATH is for and how to add something to it,
you must be able to edit a file using a Unix editor (such as emacs,
vi, or pico), you must be able to move around in the filesystem from
the command line, and you must know how to build/compile a program.
If you do not know how to do these things (for example, if you are a Mac
user with minimal command-line experience), find someone who does to
help you, and make sure they finish the job, including completing the
tests below.

5.1)  Using Firefox, Safari, Chrome, Internet Explorer, or some other browser
on the computer that you used for step 4, open this URL:

http://bozeman.mbt.washington.edu/consed/consed.html#howToGet

Click on the appropriate type of computer.  Your browser (e.g.,
firefox) will ask you what you want to name the file.  Just use the
default.

If you are denied access, *carefully* follow the instructions on the
"Don't have a cow, man--You are not authorized to get this document"
page, including the try-to-get-consed part.  Please do not email David
Gordon until after you have followed these instructions.

5.2)  If you have downloaded from a Windows computer, transfer the
downloaded file to a Linux/Macosx/Unix computer before doing anything
further.

5.3)  Unpack the downloaded file in its own subdirectory:

   mkdir consed
   cd consed
   gunzip -c (whatever) | tar -xvf -

The "whatever" is the name you gave the file when you saved it in your
browser, together with its location relative to where you are now.  It
will usually be one of the following:

    gunzip -c ../consed_linux.tar.gz   | tar -xvf -
    gunzip -c ../consed_solaris.tar.gz | tar -xvf -
    gunzip -c ../consed_mac.tar.gz     | tar -xvf - 


5.4)  Figure out the correct Consed executable file to use.  


IF YOU ARE USING LINUX: 

   Run the following commands in order (below).  Use the first
   executable that does not give an error but simply says "Version 29.0".


   ./consed_rhel6linux64bit -v
   ./consed_rhel4linux64bit -v
   ./consed_rhel4linux64bit_static -v
   ./consed_linux32bit_dyn -v
   ./consed_linux32bit -v

   If it says something like "Exec format error. Wrong Architecture.",
   try another executable.

   If it says something like "error while loading shared libraries:
   libXp.so.6: cannot open shared object file: No such file or
   directory", try another executable.
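
   The trial-and-error above can be automated.  Here is a minimal
   sketch, not something shipped with Consed; the function name
   pick_consed is made up for illustration.  It tries the executables in
   the same order and prints the first one whose "-v" output contains
   "Version 29.0".  Unlike the manual check, it does not insist that
   this be the only text printed.

```shell
# Print the name of the first Consed executable in directory $1
# (default: the current directory) whose -v output contains
# "Version 29.0".  Fail if none of them does.
pick_consed() {
    dir=${1:-.}
    for exe in consed_rhel6linux64bit \
               consed_rhel4linux64bit \
               consed_rhel4linux64bit_static \
               consed_linux32bit_dyn \
               consed_linux32bit
    do
        # Skip files that are missing or not executable.
        [ -x "$dir/$exe" ] || continue
        if "$dir/$exe" -v 2>&1 | grep -q "Version 29.0"; then
            echo "$exe"
            return 0
        fi
    done
    return 1
}
```

   Run "pick_consed" from the directory where you unpacked the
   distribution.  If it prints nothing, none of the executables ran
   successfully on your computer.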


IF YOU ARE USING MACOSX:

   Try to use:
   ./consed_mac_intel -v

   If you get any error or have any problem with the subsequent
   instructions (below), read NOTE TO MACOSX USERS (below).  In any
   event, read MICE FOR MAC (below).

5.5)  Decide where to put the Consed distribution.  I suggest you put
Consed, phred, cross_match (which comes with phrap), phrap, the perl
scripts, and other executables into /usr/local/genome/bin.  So create
/usr/local/genome (without the bin).

If you can't actually use /usr/local/genome, then you could make
/usr/local/genome be a link to the real location--that will work just
as well.

As a third choice, if you want to have another location xxx, then put:

export CONSED_HOME=xxx
into .bash_profile or .bashrc if you are using bash (which is the case
for macosx users)

or if you are using csh or tcsh, put into .login

setenv CONSED_HOME xxx

In this case phred, cross_match, phrap, the perl scripts and other
executables must go into $CONSED_HOME/bin
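
To see which location applies in your current shell, here is a minimal
sketch.  The function name consed_root is made up for illustration, and
the sketch assumes that /usr/local/genome is used whenever CONSED_HOME
is unset or empty, as the choices above suggest.

```shell
# Print the root of the Consed installation: $CONSED_HOME if it is set
# and non-empty, otherwise the suggested default /usr/local/genome.
consed_root() {
    printf '%s\n' "${CONSED_HOME:-/usr/local/genome}"
}

# Report where the executables and perl scripts are expected to be.
echo "executables and perl scripts go into: $(consed_root)/bin"
```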


5.6)  At this point you should have decided:

-which consed executable to use
-where to put all of the consed programs and related files

From the directory where you ran "gunzip -c ... | tar -xvf -",
type:

pwd 

just so you are sure where you are, and then type:

./installConsed.perl (consed executable) (where consed programs)

For example:

./installConsed.perl consed_mac_intel /usr/local/genome

If this script gives an error and you can't figure it out, look at the
script itself to see what is giving the error.  Then follow the
script's instructions about what to delete and run the script
again.


5.7)  Make sure that the location of the consed executable is in every
Consed user's PATH.  This location might be /usr/local/genome/bin (or
$CONSED_HOME/bin).

For bash users (which includes macosx), you do this by putting
something like this in .bash_profile

export PATH=/usr/local/genome/bin:$PATH

For csh or tcsh users, you do this by putting something
like this in .cshrc

set path=(/usr/local/genome/bin $path)

and typing "rehash"

5.8)  Check this by logging on as a user and typing:

rehash  (don't worry if the rehash command says "not found")
consed -V

You should see 'Version 29.0'.  If you see an error message like this:
consed: Command not found.
you have some debugging to do.
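
A first debugging step is to check whether the bin directory really is
in PATH.  Here is a minimal sketch for bash or sh users; the function
name dir_in_path is made up for illustration, and the sketch assumes the
bin directory is $CONSED_HOME/bin or, if CONSED_HOME is not set,
/usr/local/genome/bin.

```shell
# Succeed if directory $1 is one of the colon-separated entries of PATH.
dir_in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

bin_dir=${CONSED_HOME:-/usr/local/genome}/bin
if dir_in_path "$bin_dir"; then
    echo "$bin_dir is in PATH"
else
    echo "$bin_dir is NOT in PATH"
fi

# command -v prints the full path of the consed that would be run, if any.
command -v consed || echo "consed: not found in PATH"
```

If the directory is in PATH but consed is still not found, check that
the consed executable is actually in that directory and is executable.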

5.9)  Put phrap and cross_match into the same bin directory
where you put consed by cd'ing to where you built phrap and typing:

cp phrap /usr/local/genome/bin
cp cross_match /usr/local/genome/bin

(or if you didn't use /usr/local/genome, replace it with what you
did use).

Check that the correct version of cross_match
is installed by typing:

cross_match

You should see:

> cross_match

cross_match cross_match 
cross_match version 1.080721

cross_match version 1.080721
Reading parameters ... 1.008 Mbytes allocated -- total 1.008 Mbytes

Run date:time  081205:135315
Run date:time  081205:135315
FATAL ERROR: Sequence files must be specified on command line. See documentation.

FATAL ERROR: Sequence files must be specified on command line. See documentation.


where the 080721 in 1.080721 is a date in the form YYMMDD.  It must be
this date or more recent.  Otherwise, follow the instructions above for
getting cross_match (which is part of the phrap package).
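
If you would rather not read through that output by eye, here is a
minimal sketch that pulls out just the version number.  The function
name cross_match_version is made up for illustration, and the sketch
assumes the version is reported on a line beginning with
"cross_match version", as in the output shown above.

```shell
# Read cross_match's start-up messages on standard input and print the
# version number from the first line that begins "cross_match version".
cross_match_version() {
    sed -n 's/^cross_match version \([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Example use (combines standard output and standard error, since the
# messages may be written to either):
#   cross_match 2>&1 | cross_match_version
```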

5.10)  Put phred into the same bin directory where you put consed by
cd'ing to where you built phred and typing:

cp phred /usr/local/genome/bin

5.11)  Put phredpar.dat (which comes with phred) into
/usr/local/genome/lib (or $CONSED_HOME/lib):

cp phredpar.dat /usr/local/genome/lib

(or if you didn't use /usr/local/genome, replace it with what you did use).

5.12)  PRELIMINARY TESTING OF CONSED BEFORE COMPLETING THE REST OF THE INSTALLATION

From the location where you put the example directories,
type the following:

cd standard/edit_dir


5.13)  Start Consed by typing:
consed

If you get some error such as:

Error: Can't open display:

then the problem probably has nothing to do with Consed, but rather
with X.  To test this, run some other X application (such as xclock,
xterm, xeyes, or xcalc) and see if you get the same error.  (On some
versions of MACOSX, you must start X11 and then consed in an
xterm--see NOTE TO MACOSX USERS below.)  The problem may be due to
your X emulator.  See 'MONITORS AND MICE FOR CONSED' below.

Don't worry about a message like:
Warning: Cannot convert string "helvetica" to type FontStruct


Two windows will appear.  One of these will have the list of .ace
files and say 'select assembly file to open' and
'standard.fasta.screen.ace.1'.  Double click on
"standard.fasta.screen.ace.1".  The first window will go away.

You will now see a list of one contig and a list of reads.  This is the
'Consed Main Window'.  

Double click on 'Contig1'.

The 'Aligned Reads Window' will appear.  

Then follow the "COPY AND PASTE" instructions (elsewhere in this
document) and check that that works.  (This will not work on some
versions of macosx.  It should work on linux--if it doesn't, read the
NOTE TO LINUX USERS below.)

If this all works, consider this preliminary test successful.


5.14)  Here is a summary of the files in /usr/local/genome/lib/screenLibs (or
$CONSED_HOME/lib/screenLibs):

filter454Reads.fa is the puc19 vector used to produce 454 reads.  454
reads containing puc19 vector are eliminated.

primerCloneScreen.seq is used to screen candidate primers when you use
Consed's function "Pick Primer from Clone Template" (on the Aligned
Reads Window).

primerSubcloneScreen.seq is used to screen candidate primers when you
use Consed's function "Pick Primer from Subclone Template" (on the
Aligned Reads Window).

repeats.fasta is used to tag repeats (to put a blue line under the bases)

vector.seq is used to mask the parts of reads that are from vector
rather than insert

sffLinkers.fa contains the linkers for 454 reads that separate the 2
reads of a read pair.


5.15)  Take a look at files primerCloneScreen.seq, primerSubcloneScreen.seq,
repeats.fasta, and vector.seq: They are dummy files indicating the fasta
format of the sequences that should be put in them.  You can modify
them to suit your needs.

5.16)  You should put into primerCloneScreen.seq the sequences of the
cloning vectors you are using (BAC or fosmid) and into
primerSubcloneScreen.seq the sequences of the sequencing vectors you
are using (plasmid).  Don't be too generous in putting lots of vectors
into these files!  The larger they are, the slower primer picking will
be.  Our files are only this big:

-rw-r--r--   1 root     root       29938 Nov  7  1997 primerCloneScreen.seq
-rw-r--r--   1 root     root        7381 Aug 13  1997 primerSubcloneScreen.seq

and primer picking is quite fast enough.

TESTING PRIMER PICKING

5.17)  Follow the steps above under PRELIMINARY TESTING OF CONSED BEFORE
COMPLETING THE REST OF THE INSTALLATION to bring up the Aligned Reads
Window on Contig1.

Go to some location near the right end of the contig, say base
2470.  Click with the right mouse button on the consensus and click on
either one of the top strand primer choices (either from subclone
template or from clone template).  Consed will pause a moment, and
then there will appear a selection of primers that pass all of
Consed's requirements.  (If you get an error message, Consed might not
have been correctly installed.  See INSTALLING CONSED above.)
Templates are also chosen for each primer.  You may have to scroll the
primer list to the right to see the templates.  Consed lists these
templates in order of quality--all of them will cover the read you
want to make.

5.18)  You should put into the file
/usr/local/genome/lib/screenLibs/vector.seq

(or $CONSED_HOME/lib/screenLibs/vector.seq if you are not using
/usr/local/genome for the root of the Phred/Phrap/Consed files.)

the vector sequences (in FASTA format) that you want
to mask out before running phrap.  In general, it is the combination of
primerCloneScreen.seq and primerSubcloneScreen.seq.  I've given you a
dummy file, but you should replace it with your real vector.

5.19)  You should put into the file
/usr/local/genome/lib/screenLibs/repeats.fasta

(or $CONSED_HOME/lib/screenLibs/repeats.fasta if you are not using
/usr/local/genome for the root of the Phred/Phrap/Consed files.)

any sequences (in FASTA format) that you want to have automatically
tagged (visibly marked by a blue line in Consed).  These typically are
ALU sequences.  If you don't want to tag anything, then comment out
(put '#' as the first character of each line) the following lines in
phredPhrap.  That is, change:
!system( "$tagRepeats $szAceFileToBeProduced" ) 
  || die "some problem running $tagRepeats";

to:
#!system( "$tagRepeats $szAceFileToBeProduced" ) 
#  || die "some problem running $tagRepeats";

5.20)  If you are going to do any restriction digests, you should
create a file
/usr/local/genome/lib/screenLibs/singleVectorForRestrictionDigest.fasta
containing the cloning vector sequence.  This is used for doing
in-silico restriction digests.  Thus this cloning vector must start at
precisely the site where you cut the (circular) vector to ligate the
insert.  It is not sufficient to just download the vector sequence
from Genbank because they may start the sequence at a different site.

To get you started for doing the demonstration, I've provided such a
file that will work for the example datasets, but will not work for
your own data.

5.21)  ENOUGH MEMORY FOR CONSED

Enough memory is vital with large datasets.  Even if you have
enough physical memory, the operating system may not allow a single
process to use it all.  

In csh or tcsh type:

limit

You should see something like this:

cputime         unlimited
filesize        unlimited
datasize        2097148 kbytes
stacksize       8192 kbytes
coredumpsize    0 kbytes
vmemoryuse      unlimited
descriptors     64 

Type:
limit datasize unlimited
Then type:
limit
just to see that the number has changed.

On bash, type:
ulimit -d unlimited
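
For bash or sh users, here is a minimal sketch that reports the limit
before and after trying to raise it, much like typing "limit" twice in
csh.  It assumes the shell's ulimit supports the -d option, as bash's
does.  The change only affects the current shell and the programs
started from it.

```shell
# Print the current data-segment limit ("unlimited" or a number).
current=$(ulimit -d)
echo "datasize limit: $current"

# Try to raise the limit.  This can fail if the hard limit set by the
# system administrator is lower.
if [ "$current" != "unlimited" ]; then
    ulimit -d unlimited 2>/dev/null || echo "could not raise datasize limit"
    echo "datasize limit now: $(ulimit -d)"
fi
```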

5.22)  Make sure you have enough swap space to support the amount of RAM
on the computer.


5.23)  TESTING THE INSTALLATION

After installing Consed, you should run all of the following tests to
make sure you have installed everything correctly.

If one of the tests (below) fails with a message like:

"couldn't execute ..."

then you can troubleshoot the problem by going to the directory where
this error occurred and typing the command that failed.  If the command
includes any output redirection (e.g., 2>/dev/null or >>temp or >temp),
remove the 2> or > and everything after it on the line so that
all output comes to your screen.
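
Here is a minimal sketch of that clean-up for a failed command that has
been pasted onto a single line.  The function name strip_redirects is
made up for illustration.  It also drops a leading "time", as is done by
hand in the troubleshooting example in this section.  It assumes the
command itself contains no ">" character other than in its
redirections.

```shell
# Read a command line on standard input and print it without a leading
# "time" and without the first output redirection (>, >>, 2>, ...) and
# everything after it.
strip_redirects() {
    sed -e 's/^[[:space:]]*time[[:space:]][[:space:]]*//' \
        -e 's/[[:space:]]*[0-9]*>.*$//'
}

# Example use:
#   echo 'time some_program input.fa >>out.txt 2>/dev/null' | strip_redirects
```

You can then copy the printed command and run it to see the error
messages on your screen.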


5.24)  TESTING ADDING ILLUMINA READS

Follow the 8 steps under "ADDING PAIRED ILLUMINA READS USING
CROSS_MATCH" (below).

Troubleshooting:  If you get an error like this:

couldn't execute time /home/genome/BioSw/consed18/bin/cross_match
 reads081205_130653.fa.0 bacref.fa -discrep_lists -tags -masklevel 0
 -minscore 25 -gap1_only -repeat_screen 2
 >>alignmentFile.081205_130653.cross.0 2>/dev/null

then run it on the command line without "time" and without the ">>"
and "2>" so you can see any errors:

/home/genome/BioSw/consed18/bin/cross_match
 reads081205_130653.fa.0 bacref.fa -discrep_lists -tags -masklevel 0
 -minscore 25 -gap1_only -repeat_screen 2

If this says: 
FATAL ERROR: Command line option -gap1_only not recognized
that indicates that you are not running the correct version of
cross_match (see above under REQUIRED VERSIONS OF OTHER PROGRAMS).

Another possible cause of problems is that cross_match is not in the
right place (see above under INSTALLING CONSED) or that you have not
set CONSED_HOME (if you need to do this--see above under INSTALLING
CONSED).


5.25)  TESTING ADDING 454 READS

Follow the 4 steps under "USING 454 READS (ALIGNING WITH CROSS_MATCH TO
REFERENCE SEQUENCE )" (below).

5.26)  TESTING 454 READS (NEWBLER ASSEMBLY)

Follow the first 6 steps under "USING 454 READS (NEWBLER ASSEMBLY)" and
especially be sure that the traces pop up.


5.27)  TESTING ADD NEW READS

5.28)  Next you should test the ADD NEW READS step in the Quick Tour
(below).  This step requires that everything be set up correctly and
in the correct location.  Hopefully the error messages are clear
enough to help you if you have set up anything incorrectly.

5.29)  TESTING RUNNING CROSS_MATCH FROM ASSEMBLY VIEW

See RUNNING CROSS_MATCH FOR SEQUENCE MATCHES (below) and make sure
that step works.

5.30)  TEST RUNNING PHREDPHRAP

See the section RUNNING PHRED and PHRAP (below) 


5.31)  TESTING MINIASSEMBLIES

See PULLING OUT READS AND RE-ASSEMBLYING THEM (MINIASSEMBLIES) and
MINIASSEMBLIES (below) and make sure those steps work.  

The newer version of phredPhrap is required for this.  If you have
invested a lot of work customizing some ancient version of phredPhrap
(e.g., 10 years old), and don't want to upgrade, you do have the
option of keeping your customized version of phredPhrap for regular
assemblies, and using the new version of phredPhrap for
miniassemblies.  To do this, you must specify the alternate
name/location of phredPhrap by the consedrc parameter:

consed.fullPathnameOfMiniassemblyScript: /usr/local/genome/bin/phredPhrap

(See CONSED CUSTOMIZATION below.)

If you can't even use the new version just for miniassemblies, then
there is a consedrc parameter:

consed.okToUseObsoleteMiniassemblyScript: true

This parameter will allow you to use obsolete versions of phredPhrap
which have bugs such as duplicating consensus tags.  If you like bugs,
this is how you can keep them.


------  NOTE:  You might be done installing consed --------


The following 4 installation steps are only necessary if you are using
autofinish or consed's primer picker *and* if you are using
Sanger reads.  Otherwise, you can skip:

MODIFYING determineReadTypes.perl
TROUBLESHOOTING YOUR CHANGES TO determineReadTypes.perl
FAKE READS
APPENDING EXPID TO THE PHD FILES


5.32)  MODIFYING determineReadTypes.perl

Read the comments in determineReadTypes.perl

Phrap, Consed's primer picking, and Consed/Autofinish all need the
following information for each read:
          is it a universal primer forward, a universal primer reverse,
             or a walking read?
          what is its template name?

If you are using different libraries that have different insert sizes, 
then Consed/Autofinish also need the library name for each read.

Generally this information can be determined from the read name, using
*your* naming convention.  Modify the perl script
determineReadTypes.perl to put this information at the end of the phd
file using WR info items.

If you don't want to do much perl programming and all your libraries
have the same insert size, you have the option of using the St Louis
naming convention.  In this case, you needn't do anything with
determineReadTypes.perl

You must also uncomment (remove the "#"s in column 1) the lines in
the phredPhrap script that say roughly:

#print "\n\n--------------------------------------------------------\n";
#print "Now running determineReadTypes.perl...\n";
#print "--------------------------------------------------------\n\n\n";

#!system( "$determineReadTypes" ) || die "some problem running determineReadTypes.perl $!\n";

But what is the St Louis naming convention?  Most of it (but not all)
is explained in the file phrap.doc that comes with phrap.  In addition,
you must never use an underscore in the name if the read is a
universal primer forward or universal primer reverse read.  If the
read is a walk, then you must have an underscore (_) follow the
template name and then have a number (the oligo number).

Examples of reads in the St Louis naming convention:

read eeq03a01.g1 is univ rev template: eeq03a01 library: eeq03
read eeq03a02.b1 is univ fwd template: eeq03a02 library: eeq03
read eeq03a02.g1 is univ rev template: eeq03a02 library: eeq03
read eeq03a03.b1 is univ fwd template: eeq03a03 library: eeq03
read eej45h07_2.i1 is walk template: eej45h07 library: eej45
read eej46c12_1.i1 is walk template: eej46c12 library: eej46
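
The examples above can be reproduced with a small shell sketch.  It is
not part of Consed or phrap, and it encodes only what the six examples
show: an underscore marks a walk, ".b" marks universal forward, ".g"
marks universal reverse, and the library is assumed to be the first 5
characters of the template name.  The function name describe_read is
hypothetical; the full convention is in phrap.doc.

```shell
# Classify a read name using only the patterns visible in the examples.
# Assumptions (not taken from phrap.doc): ".b" = univ fwd, ".g" = univ rev,
# library = first 5 characters of the template name.
describe_read() {
    read_name=$1
    stem=${read_name%%.*}     # part before the first dot, e.g. eej45h07_2
    ext=${read_name#*.}       # part after the first dot, e.g. i1
    case $stem in
        *_*)                  # an underscore means a walking read
            type="walk"
            template=${stem%%_*}
            ;;
        *)
            template=$stem
            case $ext in
                b*) type="univ fwd" ;;
                g*) type="univ rev" ;;
                *)  type="unknown" ;;
            esac
            ;;
    esac
    library=$(printf '%s' "$template" | cut -c1-5)
    printf 'read %s is %s template: %s library: %s\n' \
        "$read_name" "$type" "$template" "$library"
}

describe_read eeq03a02.b1
describe_read eej45h07_2.i1
```

If your own naming convention differs, the same tests belong in
determineReadTypes.perl rather than in a separate script.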


Once you have correctly customized determineReadTypes.perl, then
uncomment the line in phredPhrap which calls determineReadTypes.perl

It is fine to assume the St Louis naming convention for the purpose of
the sample dataset directories that come with Consed ("standard",
"assembly_view", "autofinish", and "polyphred").

5.33)  TROUBLESHOOTING YOUR CHANGES TO determineReadTypes.perl

Consed allows you to check that you have correctly modified
determineReadTypes.perl: On the Consed Main Window, point to 'Info',
hold down the left mouse button, and release on 'Show Info for Each
Read'.  Study all the information and check that the information
presented is correct.  If, for example, Consed thinks that there are
templates that have 9 or more reads, it is likely that you have not
correctly customized determineReadTypes.perl

You will see a section that looks like this:

template djs736a2_fp04q286 with 2 reads
    djs736a2_fp04q286.x2 term     universal forward (from phd file)
    djs736a2_fp04q286.y2 term     universal reverse (from phd file)

You want to see the "from phd file" part.  If, instead of "from phd
file", it says "inferred from name", that means that
determineReadTypes.perl couldn't figure out what kind of read it was.
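
One rough way to spot reads that were never annotated is to list the
phd files that contain no WR item at all.  This sketch assumes that
phd files are named like readname.phd.1, that they are in ../phd_dir,
and that every WR item starts with a line beginning "WR{".  The
function name is hypothetical, and the check does not apply if you use
phd.ball.

```shell
# List phd files in directory $1 that contain no WR item.
phd_files_without_wr() {
    for f in "$1"/*.phd.*; do
        [ -f "$f" ] || continue          # skip if the pattern matched nothing
        grep -q '^WR{' "$f" || printf '%s\n' "$f"
    done
    return 0
}

# Typical use from an edit_dir (assumed layout):
#   phd_files_without_wr ../phd_dir
```

An empty listing only shows that some WR item is present in each file;
use 'Show Info for Each Read' to confirm that the contents are right.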

If you think you have made a mistake in customizing
determineReadTypes.perl, it is best to delete the PHD files (and
phd.ball if you are using that) and run phredPhrap again, since
otherwise the incorrect WR items will be left in the PHD files.

The script determineReadTypes.perl itself contains more specific
documentation about how to customize it.

CUSTOMIZING determineReadTypes.perl:  SPECIAL CASES


5.34)  FAKE READS

By "fake reads" I mean reads such as those created from a Genbank
reference sequence or a consensus from some other assembly... or others
for which there is no chromatogram (and there never was any
chromatogram).  If you don't use any such reads, you can skip this
step. 

In the past, any read that ended with a .a2 or .c3 (where 2 and 3
could be any numbers), was considered a fake read.  Now you can make
Autofinish not assume this using the consedrc parameter (see CONSED
CUSTOMIZATION): 

consed.fakeReadsSpecifiedByFilenameExtension: false


Instead, you must have determineReadTypes.perl put "fake" into the
"type:" field of a "template" WR item.  See determineReadTypes.perl for
more information.


5.35)  APPENDING EXPID TO THE PHD FILES

If you are not using Autofinish, you can skip this step.  If you are
using Autofinish, and would like Autofinish to tell you how well your
reads are succeeding, then the phd files must be appended with the
experiment id's.  In the 3 Autofinish summary files (*.univReverse,
*.univForwards, and *.customPrimers), you will see information like
this:

univ rev,,,->,-329,-249,71,Contig1,3,djs228_1034

or this:

tgaagaaatggctgactcc,56,1,->,3258,3338,3658,Contig1,4,djs228_2813,5,djs228_168,6,djs228_1248

The '3' just before the djs228_1034 on the line starting with "univ
rev" is an experiment id (expid).  There is also an expid '4' just
before djs228_2813, an expid '5' before djs228_168, and an expid '6'
just before djs228_1248.
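
As an illustration, the expids can be pulled out of such a line with
awk.  This is a sketch under one assumption drawn only from the two
sample lines above: the first 8 comma-separated fields are fixed and
every field after them comes in (expid, name) pairs.  The function name
list_expids is hypothetical.

```shell
# Print each expid and the name that follows it on one summary line.
list_expids() {
    printf '%s\n' "$1" | awk -F, '{
        # fields 1-8 describe the primer and location; pairs start at 9
        for (i = 9; i + 1 <= NF; i += 2)
            printf "expid=%s name=%s\n", $i, $(i + 1)
    }'
}

list_expids 'univ rev,,,->,-329,-249,71,Contig1,3,djs228_1034'
```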

Autofinish doesn't know what you will end up calling these reads it is
telling you to make.  Autofinish only knows those reads by the numbers
3, 4, 5, and 6.  So when you make the reads, Autofinish needs to be
informed that this is 'experiment 3' or whatever.  You do this by
appending in the phd file the following structure:

WR{
expid addExpid 990811:140818
5
}

where WR stands for 'whole read item', 
      expid for 'expid'
      addExpid is the name of the program that you will write that
            will append this information
      990811:140818 is the date and time in format YYMMDD:HHMISS
      5 is the expid

This program must be run *after* phred runs to create the phd files.
Thus your program must have some method of determining what the expid
of each read is.  What the University of Washington Genome Center does
is to have the finishers put the expid as part of the filename.  This
makes it easy for a program to look at the phd file and figure out
what the expid is and then write the WR item into that phd file.  

Alternatively, you could keep a database and, after the phd file is
created, look into the database to see what the expid is.
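
Whichever way you look up the expid, writing the WR item itself is
simple.  Below is a minimal sketch of the output side of such an
"addExpid" program.  The function name make_expid_item and the phd
path in the comment are hypothetical, and how you find the expid for a
given read is left to you.

```shell
# Print a WR item for expid $1.  The optional $2 is a timestamp in
# YYMMDD:HHMISS form; by default the current time is used.
make_expid_item() {
    expid=$1
    stamp=${2:-$(date +%y%m%d:%H%M%S)}
    printf 'WR{\nexpid addExpid %s\n%s\n}\n' "$stamp" "$expid"
}

# Reproduces the structure shown above:
make_expid_item 5 990811:140818

# To append to a phd file (hypothetical path), redirect the output:
#   make_expid_item 5 >> ../phd_dir/myread.phd.1
```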

When you have successfully added expid's to the phd files, the next
time you run Autofinish on this project, see the 'EVALUATE' section of
the Autofinish output file--you will see lots of interesting
information about how well the reads succeeded.


----------------------------------------------------------------------------
6.  NOTE TO LINUX USERS (64 BIT) (INSTALLATION)

Do you know for a fact that your computer is a 64 bit computer?
If it is, use one of the 64 bit binaries because the 64 bit version
will allow you to use consed on larger assemblies.

You can determine what kind of computer you have by
typing:

uname -a

If it says something like this:

Linux lake.interim.stanford.edu 2.6.9-78.0.1.ELsmp #1 SMP Tue Jul 22
18:11:48 EDT 2008 i686 i686 i386 GNU/Linux

where there is an "i686" or "i386", then you have 32 bit linux.
In this case skip down to "NOTE TO LINUX USERS (32 bit)" (below).

If it says something like this:

Linux lake.interim.stanford.edu 2.6.9-67.ELsmp #1 SMP Wed Nov 7
13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

where there is an "x86_64" present, then you have 64 bit linux and you
are reading in the right place.

If it says something like this:

Linux lake.interim.stanford.edu 2.4.21-sgi240rp04041413_10065 #1 SMP
Wed Apr 14 13:09:51 PDT 2004 ia64 unknown

where there is an "ia64" present, then you have itanium linux, which
we do not support.
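
The same decision can be scripted.  This sketch uses "uname -m", which
prints only the machine name, instead of reading the full "uname -a"
line by eye.  The function name consed_linux_kind is hypothetical.

```shell
# Map a machine name, as printed by "uname -m", to the cases above.
consed_linux_kind() {
    case $1 in
        x86_64)              echo "64 bit" ;;
        i386|i486|i586|i686) echo "32 bit" ;;
        ia64)                echo "itanium (not supported)" ;;
        *)                   echo "unknown" ;;
    esac
}

consed_linux_kind "$(uname -m)"
```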


I've supplied three executables: 
consed_rhel6linux64bit
consed_rhel4linux64bit
consed_rhel4linux64bit_static

The ones with rhel4 were built on a RedHat release 4 system and the
one with rhel6 was built on a RedHat release 6 system.  The one with
"static" in the name doesn't use shared libraries; the two others are
dynamically linked.

Try the first one first.  If Consed doesn't come up at all, then try
the second one.  The kind of problems you might have would cause
consed to immediately terminate, so if consed comes up at all (you can
see the Consed Main Window), that particular executable is fine for
you.  (See QUICK TOUR OF CONSED for how to start Consed--you must be
in the correct directory.)

(August 2013) One user on Fedora 17 had the following font problem:
the base letters in the Aligned Reads Window were different widths so
didn't line up vertically.  He solved this problem by downloading the
following font kit:

xorg-x11-fonts-misc

which includes the font that consed wants to use which is called:

-misc-fixed-medium-r-normal--15-140-75-75-c-90-iso8859-1

which is a common fixed-width font (every letter is the same width).


6.1)  (April 2012) One user of Ubuntu 11.04 had the following problem:

./consed_linux64bit: error while loading shared libraries:
  libstdc++.so.5: cannot open shared object file:
  No such file or directory


The problem was fixed by issuing the following command:

sudo apt-get install libstdc++5


-or-
 as root:

apt-get install libstdc++5


6.2)  If you can't copy and paste (see COPY AND PASTE elsewhere in this
document: if you highlight a segment of the consensus sequence, you
should be able to paste it into the search window), try the
dynamically linked executable consed_linux64bit.

6.3)  (Reported July 2012) If you get warnings like this:

Warning: String to TranslationTable conversion encountered errors
Warning: translation table syntax error: Unknown keysym name:  osfActivate
Warning: ... found while parsing ':<Key>osfActivate: 
PrimitiveParentActivate()'

it can be fixed by:
export XKEYSYMDB=/usr/share/X11/XKeysymDB
or
export XKEYSYMDB=/usr/lib/X11/XKeysymDB

(try each to see which works)

If you are using csh or tcsh instead of bash, use
setenv XKEYSYMDB /usr/share/X11/XKeysymDB
instead of 
export XKEYSYMDB=/usr/share/X11/XKeysymDB
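
For sh or bash users, a small sketch can try both locations and set
XKEYSYMDB to whichever file exists.  The helper name
first_existing_file is hypothetical.  If neither file exists, the
variable is left alone.

```shell
# Print the first argument that names an existing file.
first_existing_file() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if db=$(first_existing_file /usr/share/X11/XKeysymDB /usr/lib/X11/XKeysymDB); then
    export XKEYSYMDB="$db"
fi
```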

6.4)  Another user reported the following problem (Aug 2008):

"now we are unable to copy/paste into Consed from text editors such as
emacs or vim.  However, copying/pasting within Consed works just
fine."

He then found the following fixed the problem:

"Initially, as I was following the installation instructions and
couldn't verify the version number with a 'consed -v' command with the
'consed_linux64bit' executable (it complained about a missing library,
libstdc++.so.5) I switched to the 'consed_linux64bit_static' executable
and it returned the version number properly.  After finishing the
installation and attempting to work with our assembly data we hit some
strange errors.  On a hunch and following Joel Martin's advice not to
use the _static executable we installed the compat-libstdc++-296 and
compat-libstdc++-33 libraries on our fedora 8 64-bit system and
reverted to the non-static executable.  (These were the only two in
the Legacy Software directory of our Fedora 8 repository.)"

6.5)  For users running on Ubuntu Linux (Aug 2010):

When Ubuntu upgraded from release 9 to release 10, they introduced a
serious bug into the X-Window server software.  This affects all
motif-based graphical applications, including Consed.  When you right
mouse click in consed's Aligned Reads Window, the mouse cursor is
captured within part of the Aligned Reads Window.  You cannot move the
pointer to the "quit" button to terminate consed, and you cannot give
input focus to any other window or application on the computer.

To fix your Ubuntu system so that consed will run, do the following:

Open a terminal window by pressing Alt+F2, write gnome-terminal and
press run

Type the following in the terminal window:

sudo add-apt-repository ppa:crcarlin/ppa
sudo apt-get update
sudo apt-get update  (same as the previous line)
sudo apt-get upgrade

When it asks you if you want to upgrade your xserver-xorg-core package
and (possibly) your xserver-xorg-dev package and other packages that
start with a x, allow it to do so.

Then reboot your computer.


Go again to the terminal (Alt + F2 and write gnome-terminal and press
run) and type the following in a terminal window:

sudo rm /etc/apt/sources.list.d/crcarlin-ppa-lucid.list
sudo apt-get update
sudo apt-get update  (same as the previous line)
sudo apt-get upgrade

According to one Consed user who tried this, this will fix the
problem.


6.6)  One user (July 2010) reported:

On Ubuntu 10.04, (Lucid) Static Consed Version 19.0 (090206) gave the
following message:
consed: relocation error: /lib/libnss_files.so.2: symbol __rawmemchr,
version GLIBC_2.2.5 not defined in file libc.so.6 with link time
reference

Shared/Dynamic linking Consed gave:
/pkg/consed/consed_linux64bit: error while loading shared libraries:
libstdc++.so.5: cannot open shared object file: No such file or
directory

I solved the problem by downloading and directly installing an old
library using dpkg :
http://packages.ubuntu.com/jaunty/amd64/libstdc++5/download


6.7)  Another user reported that consed_linux64bit could not find libXp.so.6
He solved this problem by downloading
xorg-x11-deprecated-libs-6.8.2-1.EL.52.x86_64.rpm 
from 
http://rpm.pbone.net/index.php3/stat/4/idpl/8965447/com/xorg-x11-deprecated-libs-6.8.2-1.EL.52.x86_64.rpm.html
or from
http://rpm.pbone.net/index.php3/stat/3/srodzaj/1/search/libXp.so.6()(64bit)
and installed the rpm package, then added "/usr/X11R6/lib64/" to
"/etc/ld.so.conf" 
then ran the command "ldconfig" 

6.8)  Several users reported that consed_linux64bit gave:
error while loading shared libraries: libXp.so.6: cannot open shared
object file: No such file or directory

and solved the problem with the command:
yum install libXp


--------------------------------------------------------------------------
7.  NOTE TO LINUX USERS (32 bit) (INSTALLATION)

Do you know for a fact that your computer is not a 64 bit computer?
If it is a 64 bit computer, use one of the 64 bit binaries because the
64 bit version will allow you to use consed on larger assemblies.

You can determine what kind of computer you have by
typing:

uname -a

If it says something like this:

Linux lake.interim.stanford.edu 2.6.9-78.0.1.ELsmp #1 SMP Tue Jul 22
18:11:48 EDT 2008 i686 i686 i386 GNU/Linux

where there is an "i686" or "i386", then you have 32 bit linux and you
are reading in the right place.

If it says something like this:

Linux lake.interim.stanford.edu 2.6.9-67.ELsmp #1 SMP Wed Nov 7
13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

where there is an "x86_64" present, then you have 64 bit linux and you
should skip up to "NOTE TO LINUX USERS (64 BIT)" (above).

If it says something like this:

Linux lake.interim.stanford.edu 2.4.21-sgi240rp04041413_10065 #1 SMP
Wed Apr 14 13:09:51 PDT 2004 ia64 unknown

where there is an "ia64" present, then you have itanium linux, which
we do not support.


We have found that there is a large variation among different linux
systems (even those with the same kernel) so I have provided 2
different executables (consed_linux32bit and  consed_linux32bit_dyn)
in the hope that one will work for you.

With one of them, consed may not come up at all but rather terminate
with an error such as the following:

> ./consed
./consed: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory

or

> ./consed: symbol regexec, version GLIBC_2.3.4 not defined in file
  libc.so.6 with link time reference

(See below for suggestions from Consed/linux users with similar
experiences.)

7.1)  If Consed does come up, do the following test:

Bring up Consed with the standard dataset as shown in QUICK TOUR OF
CONSED (above) and open standard.fasta.screen.ace.1 as shown in the
QUICK TOUR.  After Consed is up, on the Consed Main Window there is a
menu "Help" on the top right.  Push the left mouse button down on Help
menu.  There will be a list of choices that will appear.  While still
holding down the left mouse button, drag the cursor to "Test Exception
Handling" and release the left mouse button.

If a popup window appears with a "Dismiss" button, you are fine (but
you should still read the rest of this note).  If Consed terminates,
then this Consed executable does not work with the exception handling
shared libraries you have installed.  Try a different consed
executable or find different shared libraries, as discussed below.


7.2)  For users running on Ubuntu Linux (Aug 2010):

When Ubuntu upgraded from release 9 to release 10, they introduced a
serious bug into the X-Window server software.  This affects all
motif-based graphical applications, including Consed.  When you right
mouse click in consed's Aligned Reads Window, the mouse cursor is
captured within part of the Aligned Reads Window.  You cannot move the
pointer to the "quit" button to terminate consed, and you cannot give
input focus to any other window or application on the computer.

To fix your Ubuntu system so that consed will run, do the following:

Open a terminal window by pressing Alt+F2, write gnome-terminal and
press run

Type the following in the terminal window:

sudo add-apt-repository ppa:crcarlin/ppa
sudo apt-get update
sudo apt-get update  (same as the previous line)
sudo apt-get upgrade

When it asks you if you want to upgrade your xserver-xorg-core package
and (possibly) your xserver-xorg-dev package and other packages that
start with a x, allow it to do so.

Then reboot your computer.


Go again to the terminal (Alt + F2 and write gnome-terminal and press
run) and type the following in a terminal window:

sudo rm /etc/apt/sources.list.d/crcarlin-ppa-lucid.list
sudo apt-get update
sudo apt-get update  (same as the previous line)
sudo apt-get upgrade

According to one Consed user who tried this, this will fix the
problem.  

7.3)  If you try to run consed and get an error message like this:

> ./consed
./consed: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory

This is because the following file must be present in /usr/lib:
libstdc++.so.6

I have provided this file in case you don't have it.  Just put it in
/usr/lib and see if that fixes the problem.

One consed user reports:

did a little poking around and found that i needed:
compat-libstdc++-7.3-2.96.118 RPM for i386 since i'm running fedora
core 1 at the moment.  ...  Anyway, if anyone gets this error tell
them they're missing the Standard C++ libraries for Red Hat 7.3
backwards compatibility compiler and it can be downloaded here:

http://www2.linuxforum.net/RPM/fedora/core/1/Fedora/RPMS/compat-libstdc++-7.3-2.96.118.i386.html

Another system administrator said:

"I got the error:

../../consed_linux: error while loading shared libraries:
libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file
or directory

In order to resolve the issue on linux boxes you need to install the
compatibility libraries.
 
"To be on the safe side I installed the following
 
compat-libstdc++-33.i386 3.2.3-61
gcc-c++.i386 4.1.2-14.el5
gcc.i386 4.1.2-14.el5
cpp.i386 4.1.2-14.el5
libstdc++-devel.i386 4.1.2-14.el5
libgomp.i386 4.1.2-14.el5
libstdc++.i386 4.1.2-14.el5
libgcc.i386 4.1.2-14.el5
 
"I believe you may get away with just the first compat-libstdc
package."


7.4)  If you get warnings like this:

Warning: String to TranslationTable conversion encountered errors
Warning: translation table syntax error: Unknown keysym name:  osfActivate
Warning: ... found while parsing ':<Key>osfActivate: 
PrimitiveParentActivate()'

it can be fixed by:
export XKEYSYMDB=/usr/share/X11/XKeysymDB
or
export XKEYSYMDB=/usr/lib/X11/XKeysymDB

(try each to see which works)

If you are using csh or tcsh instead of bash, use
setenv XKEYSYMDB /usr/share/X11/XKeysymDB
instead of 
export XKEYSYMDB=/usr/share/X11/XKeysymDB


7.5)  If you can't cut and paste (e.g., if you highlight a segment of the
consensus sequence, you should be able to paste it into the search
window.  It gets highlighted, but nothing gets pasted), fix it by:

using the dynamic executable: consed_linux32bit


----------------------------------------------------------------------------
8.  NOTE TO MACOSX USERS (INSTALLATION)


Be aware that only macosx 10.6 and better are fully supported.  Older
versions will work, but will not have access to bamScape and bam2Ace.


8.1)  Downloading Consed.

One user (in 2013) reported a problem using Safari to download consed:

"Be aware that Safari automatically decompresses the original
file causing some problems when transferring the file to another
computer. You can disable the automatic decompressing option by
unchecking "Open Safe Files After Downloading" in Safari
Preferences/General."

8.2)  To create /usr/local/genome, create a terminal window and type:

cd /usr/local
sudo mkdir genome
sudo chmod 777 genome

(The last command says that anyone can read and write to the genome
directory.  If you don't want to allow this much access, read about
the chmod command and adjust it according to your wishes.)

(You can also use Finder to do it but it is tricky since Finder
normally will refuse to even show you /usr, which is a hidden folder.  To
get it to show you hidden folders, google "showing hidden folders
macosx".  Then create a folder "genome" within /usr/local and return
to a terminal window for the rest of the installation.)


8.3)  Determine which of the 2 consed executables to use:

consed_mac_intel
consed_mac_ppc

If consed_mac_intel works, use it.  


If you are running macosx 10.10, you probably will get the
following error:

consed
dyld: Library not loaded: /usr/X11/lib/libX11.6.dylib

You can solve this problem by opening a terminal window and typing:

cd /usr
sudo ln -s /opt/X11 X11

(Note that the X's above are capitalized.)

That should do it--consed should then start without error.


If it gives an error such as:

consed_mac_intel: Bad CPU type in executable.
or
dyld: Library not loaded: /usr/X11/lib/libX11.6.dylib

then you can try consed_mac_ppc.  However consed_mac_ppc doesn't have
bamScape or bam2Ace.  (If there is overwhelming demand, I will change
this.)  You probably need to have macosx 10.6 or better to run
consed_mac_intel (10.5.8 won't work).


8.4)  Now follow the normal installation instructions 

./installConsed.perl (consed executable) (where consed programs)

(see above under INSTALLING CONSED).


8.5)  You must put /usr/local/genome/bin (or wherever you put consed and the
scripts) into your path.  On MacOSX, you do this by editing the file
.bash_profile in your home directory.  You would add this line:

export PATH=/usr/local/genome/bin:$PATH
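
If you prefer to add the line from a terminal, the sketch below appends
it only when it is not already there, so running it twice does no harm.
The function name add_path_line is hypothetical.

```shell
# Append "export PATH=<dir>:$PATH" to the profile file $1 unless an
# identical line is already present.
add_path_line() {
    profile=$1
    dir=$2
    line="export PATH=$dir:\$PATH"
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Typical use:
#   add_path_line "$HOME/.bash_profile" /usr/local/genome/bin
```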

When you log off and log back on, your new path will include consed.


8.6)  X-WINDOWS on MacOSX can have problems.  To test this, type (in a
terminal window):

xterm

An xterm terminal window should appear.  

If not, here are some suggestions from various people, some
of which may work and some may not:

If you are using MacOSX 10.6 or better, things should just work.

If you are using MacOSX 10.5, there seem to be at least 2 problems
with X11: one is that cut/paste does not function within X11.  I've
also heard that the $DISPLAY variable is not set automatically.  Here
is what one user suggests:

  The best workaround is to remove X11 altogether, and
  replace it with the previous version that was part of Mac OS 10.4. Here
  is how:

  Remove the X11 installation of Mac Os 10.5

  sudo rm -rf /usr/X11 /usr/X11R6
  sudo rm /System/Library/LaunchAgents/org.x.X11.plist
  (or rm /System/Library/LaunchAgents/org.x.startx.plist)
  sudo rm /Library/Receipts/X11User.pkg
  sudo pkgutil --forget com.apple.pkg.X11DocumentationLeo
  sudo pkgutil --forget com.apple.pkg.X11User
  sudo pkgutil --forget com.apple.pkg.X11SDKLeo
  sudo pkgutil --forget org.x.X11.pkg

  Install the X11 installation of Mac Os 10.4
  (found on 10.4 installation CD:
  System/Installation/Packages/X11User.pkg)

  Install the newest xquartz (X11-2.3.2.1.dmg) found
  at http://xquartz.macosforge.org/trac/wiki

  You can start X11 from the dock and run consed just as usual.

For other versions of MacOSX, one person suggests:

http://sage.ucsc.edu/~wgscott/xtal/wiki/index.php/X11
http://sage.ucsc.edu/~wgscott/xtal/wiki/index.php/X11_more_details

Another says that for 10.5, an X-environment comes installed by
default (XQuartz).  Information about XQuartz (and the newest
versions) can be found at: 
http://xquartz.macosforge.org

Another says that for older versions of macosx (10.4 and earlier):

You must have an X environment on your MAC and you might need to turn
it on.  If you don't know how to do this, find someone locally who can
help you.

If you don't have an X environment already on your MAC, download from
Apple at www.apple.com/software  I suggest you use XDarwin in full
screen mode.  Use option-apple-A to move back and forth between the
MAC desktop and the X environment.  

Another counters that XDarwin is not so friendly and instead suggests
running the X11 version found at:

http://www.apple.com/downloads/macosx/apple/x11formacosx.html

or else OroborOSX (http://oroborosx.sourceforge.net/), and a new
(non-beta) version is available
(http://oroborosx.sourceforge.net/download.html).

Some people say that XDarwin is no longer supported.


8.7)  Please edit the phredPhrap script to reflect the correct location
of nice (there is a note in the phredPhrap script about this).


8.8)  MICE FOR MAC


If you have a 1-button mouse, I've found that:

apple-click = right button click
option-click = middle button click

(With X11 up, you may need to go into X11 Preferences, Input and
enable 3-button mouse emulation.)


8.9)  C COMPILER FOR MAC

You will need to have a c compiler to compile some programs.  If you
can't find one on your computer, one is part of the Xcode package,
which is on a CD that came with your mac and is also available for
download from Apple.  You will need to get a free membership to
Apple's ADC program to download it.


--------------------------------------------------------------------------
9.  NOTE TO SOLARIS USERS (INSTALLATION)

9.1)  Do not use /usr/ucb/cc !!!  How can you tell if you are using it?
Type:

which cc

If it says /usr/ucb/cc, you must get gcc or else buy the commercial cc
from Sun (which is /opt/SUNWspro/bin/cc).

If you use /usr/ucb/cc, strange things will happen, including
phd2fasta working incorrectly: it cuts off the first 2 characters
of read names.
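
The check can be put into a build script so that it fails early.  This
is only a sketch, and the function name check_cc is hypothetical.

```shell
# Reject /usr/ucb/cc; accept any other compiler path.
check_cc() {
    case $1 in
        /usr/ucb/cc) echo "unsuitable compiler: $1"; return 1 ;;
        "")          echo "no cc found on PATH"; return 1 ;;
        *)           echo "ok: $1" ;;
    esac
}

# "command -v" is the POSIX counterpart of "which"
check_cc "$(command -v cc)" || echo "get gcc or Sun's cc before compiling"
```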


----------------------------------------------------------------------------

10.  QUICK TOUR

Consed is a program for viewing and editing assemblies.

Below are:

QUICK TOUR OF BAMSCAPE
and
QUICK TOUR OF CONSED

If you are already an advanced Consed user, you should read through
this and do any of the exercises on features that you are unfamiliar
with.  I frequently run across people who are doing something in
Consed a hard way month after month, and request a new feature to make
things easier, when that new feature is already in Consed.

If you have never used Consed before, to follow this Quick Tour will
take you less than 6 hours.  I've heard of many people who do not
have 6 hours to spare so they skip the Quick Tour and then they
struggle for 2 days instead.

When you do the quick tour, I encourage you to be free about changing
the data set.  If you really mess things up (such as changing all a
read's bases to N's), no problem--just delete the data set and start
again with a fresh copy.


----------------------------------------------------------------------------

11.  QUICK TOUR OF BAMSCAPE


Note:  Currently this feature is available for linux (32 and 64 bit)
and macosx-intel (but not ppc).  It is not available for solaris.


11.1)  USING BAMSCAPE

Consed -bamscape can view (similar to IGV) BAM files.  It can be
brought up like this (don't do this yet):

consed -bamscape -bamFile myBamFile.bam -referenceFOF bamScapeReference.fof

where myBamFile.bam is a BAM file that must be sorted and must have an
associated .bai index file.  

bamScapeReference.fof looks like this:

/codon/gordon/genomes/human_genome_hg18/chr1.fa
/codon/gordon/genomes/human_genome_hg18/chr2.fa

where these are the pathnames of the fasta files of the reference
sequences.  (There can be multiple reference sequences in the same
file.)
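
Before starting bamScape on your own data, it can save time to check
that the files it needs are in place.  The sketch below is not part of
Consed and the function name is hypothetical.  It looks for an index
named either reads.bam.bai (the name "samtools index" produces by
default) or reads.bai, and checks that every pathname in the .fof file
exists.  It does not check that the BAM file is sorted.

```shell
# Usage: check_bamscape_inputs myBamFile.bam bamScapeReference.fof
# Prints one line per missing file and returns nonzero if any is missing.
check_bamscape_inputs() {
    bam=$1
    fof=$2
    status=0
    if [ ! -f "$bam.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
        echo "missing index for $bam"
        status=1
    fi
    # read each pathname, including a last line with no newline
    while IFS= read -r fasta || [ -n "$fasta" ]; do
        [ -z "$fasta" ] && continue
        if [ ! -f "$fasta" ]; then
            echo "missing reference: $fasta"
            status=1
        fi
    done < "$fof"
    return $status
}
```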

11.2)  Type the following:

cd bamScape

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

11.3)  Make sure there is no file called rewriteReference.fof by typing:

ls -l rewriteReference.fof

If there is such a file, delete it:

rm rewriteReference.fof


11.4)  start bamscape like this:

consed -bamscape -bamFile reads.sorted.bam -referenceFOF bamScapeReference.fof

You should see a window pop up labeled "Bam View" with the single
reference sequence "23".  

11.5)  Double click on the "23".

Up should pop a window entitled "Reads vs Reference".  This
will show the range of 1-200,001 bases of reference sequence 23.  

You will notice there are 2 panels.  The top panel shows read
depth--total read depth in blue.  You will notice that there are only
reads in the region from about 50kb to 175kb.

Look at the bottom panel.  It shows read discrepancies.  If, at a
position, all of the reads disagree with the reference base, the graph
will show 100%.  (Since there is generally more than 1 reference
position at the same pixel location on the screen, the graph shows
the maximum % discrepancy rate of all bases represented by the pixel.)
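
To make the "maximum per pixel" idea concrete, here is a small awk
sketch.  It is only an illustration of the arithmetic, not bamScape's
code.  The input format ("position percent" per line) and the
bases-per-pixel parameter are assumptions made for the example.

```shell
# stdin: "position percent" per line (positions start at 1)
# $1:    number of reference bases drawn at each pixel
# Prints "pixel max_percent" for every pixel that has data.
max_discrepancy_per_pixel() {
    awk -v bases_per_pixel="$1" '
        {
            pixel = int(($1 - 1) / bases_per_pixel)
            value = $2 + 0
            if (!(pixel in max) || value > max[pixel]) max[pixel] = value
            if (pixel > last) last = pixel
        }
        END {
            for (p = 0; p <= last; p++)
                if (p in max) print p, max[p]
        }'
}

# Positions 1-2 share pixel 0 and positions 3-4 share pixel 1:
printf '1 0\n2 100\n3 20\n4 10\n' | max_discrepancy_per_pixel 2
```

Taking the maximum rather than the average means a single fully
discrepant base is still visible when the view is zoomed far out.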

There will usually be some fraction of reads that are inconsistently
paired, but if there is a high fraction, then something is wrong
(perhaps the reads are mismapped).  You will notice that the brown
looks like it might be as high as the blue near position 100,000.


[SEARCH FOR PROBLEMS/VARIANTS] (needed by Help button)

11.6)  Finding Problem Regions

Point to "Navigate", hold down the left mouse button and release on
"Search for next problem".  The "Search for Problems/Variants" window will
pop up.  You will notice that "problems" can be defined in 3 ways:

too high (or too low) depth of coverage
too many reads with inconsistent mates (an "improper pair" in SAM terminology)
too many reads discrepant with the reference

You will also notice that "too many" is defined both in % and in
absolute number.

There is the ability to exclude junk reads by the filter "ignore reads
whose average base quality is less than _______".

11.7)  There is a box to the left of "Read depth < 5 reads".  Click this
box so it is checked.

11.8)  Click the button labeled "Find first problem in reference
sequence".

A window labeled "Problems" will pop up. There will be one line
saying:

23   1-45,741   read depth 0 too low

11.9)  Move this window aside so you can see the Reads vs Reference
window again.

You will see a similar message at the bottom of the Reads vs
Reference window.  A turquoise cursor will be blinking at position
45,741 which is the right end of the region with a flat red line--no
reads.

11.10)  Now click the button "Find first problem after cursor".  You
will see another region with too low read depth.

11.11)  Click "Find first problem after cursor" several more times.  You
can see that most of the problems are too low depth of coverage.

11.12)  So that you can see other problems, uncheck the box to the left of
"Read depth < 5 reads".

11.13)  Click "Find first problem in reference sequence".  At the bottom
of the Reads vs Reference Window and added to the Problems Window
will be a line:

7 reads with inconsistent mates out of 10 or 70 % (at left end of
region: 53,484-53,489)

11.14)  Click on "Find first problem after cursor".  The problem found
will be:


100 % of reads are discrepant (5 discrepant out of 5 all above or at
quality 30) There are 2 discrepant positions in a window of size 25 (at left end of region: 95,785-95,785)

What does this "window size" ... mean?  

In real datasets, many of the discrepant locations will be SNPs,
rather than misaligned reads.  To distinguish between polymorphic
sites and mismapped reads, we have added an additional filter:
isolated discrepant locations are ignored.  Rather, the discrepant
locations must cluster.  In the "Search for Problems/Variants" window
is a line saying 
">= [2] discrepant sites in a window of size [25] bp" 

If you wanted to report every discrepant position, including SNPs, you
would change the "2" to "1" and then every discrepant location would
be reported.
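
The clustering filter can be sketched as follows.  This is one
plausible reading of ">= N discrepant sites in a window of size W bp"
and is not taken from bamScape's source: a discrepant position is
reported if at least N discrepant positions, counting itself, fall in
the W bases starting at it.  The function name is hypothetical.

```shell
# stdin: discrepant positions, one per line, sorted ascending
# $1:    minimum number of discrepant sites (N)
# $2:    window size in bp (W)
# Prints "window_start sites_in_window" for each window that qualifies.
clustered_sites() {
    awk -v min_sites="$1" -v window="$2" '
        { pos[NR] = $1 + 0 }
        END {
            for (i = 1; i <= NR; i++) {
                count = 0
                for (j = i; j <= NR && pos[j] - pos[i] < window; j++)
                    count++
                if (count >= min_sites) print pos[i], count
            }
        }'
}

# Only the pair 5000/5010 is close enough to count as a cluster:
printf '100\n5000\n5010\n9000\n' | clustered_sites 2 25
```

With N set to 1, every position in the input is printed, which
corresponds to reporting every discrepant location.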

11.15)  Continue to click on "Find first problem after cursor" and watch
the messages at the bottom of Reads vs Reference Window.
When you have clicked about 24 times, you will finally get a message
saying "no further problems".  

You can play around with the options on the Search for Problems
Window.


Let's look more closely at a particular problem area.

11.16)  Point at the 100,000 and hold down the right mouse button, and
then release on "zoom in".  

11.17)  Zoom in a few more times until you see about 89,000 on the left
of the window and 109,000 on the right of the window.  If you zoom in
too far, then zoom out again.

Let's look at this region in detail with consed:

11.18)  Point at about 95,000 and hold down the *middle* mouse button.
While continuing to hold down the middle mouse button, move the pointer
to around 105,000 and release the mouse button.  While you were
moving, the region should have been highlighted in grey.

A window should pop up labeled "Swiped Region".  In this window is a
list of clusters of mates of reads in the region you just swiped.
These mate pairs are inconsistent as indicated in the bam file, so it
is really up to whatever created the bam file (bwa?) to decide what
"inconsistent" means, but it typically means too far away and/or in
the wrong orientation.  You will notice that there are about 22 mate
reads on reference sequence GL000224.1, 11 on GL000214.1, and about 10
on reference sequence 4, and smaller numbers on many other reference
sequences.

11.19)  Scroll down this list until you find a line showing reference
sequence 23 from 93,125-103,161 (the number 103,161 might be slightly
different).  Double click on that line.

Another Reads vs Reference Window will pop up showing just
the region starting at 93,125 and ending somewhere around
103,000----the region where the inconsistent mates are clustered.

11.20)  Dismiss this latest Reads vs Reference Window.

11.21)  Go back to the "Swiped Region" Window and click on "Add New Region
to List".  

Under "List of regions for consed:", you should now see "23 95,039 to
104,928" (approximately).  

11.22)  Dismiss this "Swiped Region" window.

11.23)  In the first Reads vs Reference Window (the one that
shows about from 83,893-113,892), push the ">>>" button several times
until you see 150,000-170,000 (approximately).  

11.24)  As before, swipe the region from 155,000 to 165,000 by pointing to
155,000, holding down the middle mouse button, moving the pointer to
165,000 while holding the middle mouse button down, and then releasing
the middle mouse button at 165,000.  

Another box labeled "Swiped Region" should pop up and you should still
see, under "List of regions for consed:", the line "23 95,039 to
104,928" (approximately).

11.25)  Click the button "Add New Region to List and Start Consed".

In about 2 seconds a window will appear labelled "Consed Main
Window".  In its "Contig List" are 2 contigs.  These 2
contigs are the 2 regions you swiped.

11.26)  Double click on the first contig.  

11.27)  In the "Pos:" box in the upper right-hand corner, type 100,000 and
 press "Enter".  There should now be a blinking red cursor on the
 consensus base at position 100,000.  The bases starting there should be:
 ttctctccagcc

11.28)  Point to the T at position 100,000 and click with the left mouse button.

11.29)  Type A.

11.30)  A box labelled "Are you sure?" will pop up saying "There is no read
that has base a.  Are you sure?  (y/n)".  Check the box that says "do
not ask this question again" and click "yes".
The base at position 100,000 should now be changed to an
A.  Now left-click on the base at position 100,001 and similarly
change it to an A.  Do this for all positions from 100,000 to 100,010.

11.31)  Exit consed by pointing to the "File" menu and releasing on "Quit
 consed".  

11.32)  When a menu comes up titled "consed", click on "Save before
 Quitting."  

11.33)  A box labeled "Save assembly to file" will appear.  Click
"OK".  

11.34)  Another box will pop up saying:

"Do you want to append this ace file to the list of ace files that will
be applied with consed -rewriteReference?"

Click "Yes".

All of the consed windows will disappear leaving bamScape still
running.

11.35)  Exit bamScape by clicking "Quit" in the Reads vs Reference
Window.


11.36)  MODIFYING THE REFERENCE SEQUENCE

Now there should be a file rewriteReference.fof

See this by typing:

ls -l rewriteReference.fof

Take a look at it:

> more rewriteReference.fof
EDITED /wd1/gordon/sunny/bamScape/consed1/edit_dir/130222.155105.ace.1

where it will be something different--the absolute pathname of the ace
file you just saved.

11.37)  Run the following command:

consed -rewriteReference -referenceFOF bamScapeReference.fof -newReferenceFOF newReference.fof -aceFileFOF rewriteReference.fof 

You will see output similar to this:

-rewriteReference will be run.
no consedrc file so no project-specific resources--that's ok
couldn't open readOrder.txt--that's ok
23 changed
closing new fasta, 23.fa.new
done

11.38)  You will see a new file in your directory, 23.fa.new

Type "ls" to see it.

11.39)  Compare 23.fa (the original reference sequence) with 23.fa.new
 (the modified reference sequence) by typing:

11.40)  cross_match 23.fa.new 23.fa -minmatch 100 -alignments >cross.out

(This assumes that you have installed cross_match which is part of the
phrap package available from phg@u.washington.edu using the same
license you used to get consed.)

and then examine cross.out:

11.41)  emacs cross.out

Scroll down to position 100,000 and you will see the sequences aligned at
this position with a difference between an A and a T with a "v"
between them (v for "transversion").

  23                 100000 ATCTCTCCAGCCTTCCCCGGATTTCTGCCACAGTCAGCCCCAGGCACCCA 100049
                            v                                                 
  23                 100000 TTCTCTCCAGCCTTCCCCGGATTTCTGCCACAGTCAGCCCCAGGCACCCA 100049
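
If cross_match is not at hand, a rough check of the same kind can be
sketched in Python.  This is not a replacement for cross_match: it
assumes the two sequences have the same length (substitutions only, no
insertions or deletions) and only classifies A/C/G/T.  The function
names are made up for this example.

```python
PURINES = {"A", "G"}


def read_fasta_sequence(path):
    """Return the sequence of the first record in a FASTA file, uppercased."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop at the second record
                seen_header = True
            elif line:
                parts.append(line)
    return "".join(parts).upper()


def substitutions(new_seq, old_seq, start=1):
    """List (position, new base, old base, kind) for each mismatch.

    kind is "i" for a transition and "v" for a transversion.
    """
    if len(new_seq) != len(old_seq):
        raise ValueError("this sketch only handles equal-length sequences")
    diffs = []
    for offset, (new, old) in enumerate(zip(new_seq.upper(), old_seq.upper())):
        if new != old:
            same_class = (new in PURINES) == (old in PURINES)
            diffs.append((start + offset, new, old, "i" if same_class else "v"))
    return diffs


if __name__ == "__main__":
    # The first bases of the two alignment lines shown above.
    print(substitutions("ATCTCTCCAGCC", "TTCTCTCCAGCC", start=100000))
```

To compare the real files, you would pass
read_fasta_sequence("23.fa.new") and read_fasta_sequence("23.fa") to
substitutions().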


11.42)  FINDING PROBLEMS IN BATCH

11.43)  Type:

consed -findBamProblems -bamFile reads.sorted.bam -referenceFOF bamScapeReference.fof -nav myProblems.nav

This will find problems and put them into the myProblems.nav file in
Picard IntervalList format.  (See
http://picard-tools.sourcearchive.com/documentation/1.25-1/IntervalList_8java-source.html
)


Picard IntervalList format:
 * A SAM style header must be present in the file which lists the sequence records
 * against which the intervals are described.  After the header the file then contains
 * records one per line in text format with the following values tab-separated:
 *   - Sequence name
 *   - Start position (1-based)
 *   - End position (1-based, end inclusive)
 *   - Strand (either + or -)
 *   - Interval name (an, ideally unique, name for the interval)
 *   - Anything else

Here is a little example:

@HD	VN:1.0	SO:coordinate
@SQ	SN:chr1	LN:247249719	UR:/seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta	M5:9ebc6df9496613f373e73396d5b3b6b6	SP:Homo sapiens
@SQ	SN:chr2	LN:242951149	UR:/seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta	M5:b12c7373e3882120332983be99aeb18d	SP:Homo sapiens
.
.
.
@SQ	SN:chrX_random	LN:1719168	UR:/seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta	M5:f4d71e0758986c15e5455bf3e14e5d6f	SP:Homo sapiens
chr1	1104841	1104940	+	target_1
.
.
.
chr1	1110198	1110401	+	target_9
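
If you want to read such a file in your own scripts, the records can
be pulled out with a few lines of Python.  This is a minimal sketch
based only on the format description above; the names Interval and
parse_interval_list are made up for the example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    sequence: str
    start: int   # 1-based
    end: int     # 1-based, inclusive
    strand: str
    name: str


def parse_interval_list(lines):
    """Return the interval records, skipping the SAM-style '@' header lines."""
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        sequence, start, end, strand, name = line.split("\t")[:5]
        intervals.append(Interval(sequence, int(start), int(end), strand, name))
    return intervals


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0\tSO:coordinate\n",
        "@SQ\tSN:chr1\tLN:247249719\n",
        "chr1\t1104841\t1104940\t+\ttarget_1\n",
    ]
    for interval in parse_interval_list(example):
        # Both ends are inclusive, so the length is end - start + 1.
        print(interval.name, interval.end - interval.start + 1)
```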


Examine this file by typing:

less myProblems.nav

Then bring up bamScape with all of these locations already loaded into
an interactive list:

consed -bamScape -bamFile reads.sorted.bam -referenceFOF bamScapeReference.fof -nav myProblems.nav

Double-click on any item in the list to go to that location in the
BVAligned Reads Window.  Alternatively, you can click "next"
repeatedly to examine these locations.

We found these problems using the default parameters.  The following
consedrc parameters allow you to set different parameters, just as you
did in the Search for Problems Window (above):

consed.BVFindProblemsTooHighDepthOfCoverage: true
consed.BVFindProblemsTooLowDepthOfCoverage: false
consed.BVFindProblemsDepthOfCoverageAboveThisNumber: 100
consed.BVFindProblemsDepthOfCoverageBelowThisNumber: 5
consed.BVFindProblemsInconsistentReads: true
consed.BVFindProblemsPerCentInconsistentReadsAboveThisNumber: 20.0
consed.BVFindProblemsNumberOfInconsistentReadsAboveThisNumber: 6
consed.BVFindProblemsDiscrepancyRate: true
consed.BVFindProblemsDiscrepancyRateIsAboveThisNumber: 30
consed.BVFindProblemsDiscrepancyNumberOfReadsIsAboveThisNumber: 4
consed.BVFindProblemsDiscrepancyIgnoreSoftTrimmed: false
consed.BVFindProblemsNumberOfDiscrepantSitesInAWindow: 2
consed.BVFindProblemsWindowSizeForDiscrepancies: 25

(see CONSED CUSTOMIZATION below).
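
If you keep several sets of these parameters around, it can help to
check which values a given file actually sets.  The following Python
sketch reads "name: value" lines like the ones above into a
dictionary.  It is not part of Consed, and it only looks at lines that
begin with "consed."; the function name parse_consedrc is made up.

```python
def parse_consedrc(text):
    """Return {resource name: value string} for lines like 'consed.X: value'."""
    resources = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore anything that does not look like a consed resource line.
        if not line.startswith("consed.") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        resources[name.strip()] = value.strip()
    return resources


if __name__ == "__main__":
    sample = (
        "consed.BVFindProblemsNumberOfDiscrepantSitesInAWindow: 2\n"
        "consed.BVFindProblemsWindowSizeForDiscrepancies: 25\n"
    )
    for name, value in parse_consedrc(sample).items():
        print(name, "=", value)
```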

Note:  at present -findBamProblems requires a single BAM file.  It
cannot handle multiple BAM files.


11.44)  Users can also write their own custom navigation files and load
them into bamScape.  On the main bamScape window, point to the
"navigation" menu, hold down the left mouse button, and release on
"Custom Navigation."  You must supply a file in Picard IntervalList
format, as described above.  If you prefer to write a file in BED
format, you can convert to Picard IntervalList format with:

convertBedToBamScape.perl CHM1_1.1_assemblyerrors.bed \
  CHM1_1.1_assemblyerrors2.nav all_sequences.fa conversion.txt

where CHM1_1.1_assemblyerrors.bed is the input file in BED format,
CHM1_1.1_assemblyerrors2.nav is the output file in Picard IntervalList
format, all_sequences.fa is the fasta file of the reference sequences
and conversion.txt looks like this:

> more conversion.txt
1  gi|512322365|gb|CM001609.2|
2  gi|512322360|gb|CM001610.2|
3  gi|512322358|gb|CM001611.2|
4  gi|512322356|gb|CM001612.2|
5  gi|512322354|gb|CM001613.2|
6  gi|512322352|gb|CM001614.2|
7  gi|512322347|gb|CM001615.2|
8  gi|512322345|gb|CM001616.2|
9  gi|512322343|gb|CM001617.2|
10 gi|512322340|gb|CM001618.2|
11 gi|512322338|gb|CM001619.2|
12 gi|512322331|gb|CM001620.2|
13 gi|512322329|gb|CM001621.2|
14 gi|512322327|gb|CM001622.2|
15 gi|512322324|gb|CM001623.2|
16 gi|512322321|gb|CM001624.2|
17 gi|512322319|gb|CM001625.2|
18 gi|512322314|gb|CM001626.2|
19 gi|512322311|gb|CM001627.2|
20 gi|512322309|gb|CM001628.2|
21 gi|512322307|gb|CM001629.2|
22 gi|512322299|gb|CM001630.2|
X  gi|512322297|gb|CM001631.2|
M  gi|512322367|gb|CM001971.1|

where the first column is the name in the BED file for the reference
sequence and the 2nd column is the name of the reference sequence
according to the bam file.
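
To make the coordinate conventions concrete, here is a Python sketch
of the kind of conversion involved.  It is not the code of
convertBedToBamScape.perl, and it only produces the interval records
(a real IntervalList file also needs the SAM-style header).  BED
starts are 0-based with an exclusive end, while IntervalList positions
are 1-based and inclusive, so only the start shifts by one.  The
function names and the "region_N" fallback label are made up.

```python
def load_name_map(lines):
    """Map BED sequence names to bam sequence names (two columns per line)."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def bed_to_interval(bed_line, name_map, index):
    """Convert one tab-separated BED line to one IntervalList record."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, bed_start, bed_end = fields[0], int(fields[1]), int(fields[2])
    label = fields[3] if len(fields) > 3 and fields[3] else f"region_{index}"
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
    # 0-based, end-exclusive  ->  1-based, end-inclusive
    return "\t".join(
        [name_map.get(chrom, chrom), str(bed_start + 1), str(bed_end), strand, label]
    )


if __name__ == "__main__":
    names = load_name_map(["1  gi|512322365|gb|CM001609.2|"])
    print(bed_to_interval("1\t999\t2000\terr1", names, 1))
```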


11.45)  If you want to examine these locations in consed, you can swipe to
bring up consed, as described above.  If you have many locations and
want to examine them all quickly, you can make a single ace file with
all of these locations by running:

picard2Regions.perl

This will create a regions file.  You must then run bam2Ace.perl (see
below).


----------------------------------------------------------------------------

12.  QUICK TOUR OF CONSED (THE GRAPHICAL EDITOR)


12.1)  GETTING YOUR OWN COPY OF A SAMPLE DATASET

12.2)  First get a copy of a sample dataset into your home directory.
That way you can make as many edits as you like, since you will always be
able to delete it and get a fresh copy.  Make a copy as follows:

12.3)  cd ${CONSED_HOME}/examples

12.4)  ls

You should see something like this:

454_newbler           autoPCRAmplify_answer   polyphred
align454reads         bamScape                selectRegions
align454reads_answer  gene_track              selectRegionsAnswer
assembly_view         gene_track_answer       solexa_example
autofinish            illumina_paired         solexa_example_answer
autoPCRAmplify        illumina_paired_answer  standard

We want "illumina_paired_answer" right now (later we will want the others).

12.5)  Copy it to your home directory:

cp -r illumina_paired_answer ~

and go home:

cd ~

12.6)  Type the following:

cd illumina_paired_answer/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)


12.7)  Start Consed by typing:

consed

(If it says "consed: Command not found.", go back to INSTALLING CONSED
and start over.)

(Don't worry about a message like:
Warning: Cannot convert string "helvetica" to type FontStruct )


Two windows will appear.  One of these will have a list of .ace
files and say 'Click on an ace file and then click Open'.

Double click on "ref.ace.1".  The first window will go away.

(If it asks the question "There is an edit history file...Do you want
to apply those edits?" click "No" )

You will now see a list of one contig and a list of reads.  This is the
'Consed Main Window'.  

Double click on 'c_elegans_piece' in the Contig List.

The 'Aligned Reads Window' will appear.  

12.8)  SCROLLING

Try scrolling back and forth.  You might prefer to enlarge the Aligned
Reads Window so you can see more reads at once.  Do this by pointing to
the lower right corner, pushing down mouse button 1, and dragging it
to enlarge the window, then releasing mouse button 1.  

Try scrolling by dragging the thumb of the scrollbar.  Also try
scrolling by clicking on the 4 buttons at the bottom of the window: <<
< > >> for scrolling by small amounts.  For scrolling by tiny amounts,
click on the arrows at either end of the scrollbar.  For scrolling by
huge amounts, use the middle mouse button and just click on some
location on the black area of the scrollbar.  For scrolling to the
beginning or end of the contig, use the <<< or >>> buttons.

Try clicking ">>>".  In typical de novo assemblies, there are reads
that protrude far beyond the beginning of the contig and reads that
protrude beyond the end of the contig.  Moving the scrollbar to the
extreme right will scroll the contig to the end of the rightmost
read--typically far to the right of the end of the contig/consensus.
The <<< and >>> buttons will not go this far--they will just go to the
leftmost and rightmost positions of the consensus.

12.9)  Familiarize yourself with the features on the
Aligned Reads Window.  You should be able to figure out which are the
read names, which are the read bases, and what the yellow arrows are,
etc.

(Answer:  the yellow arrows indicate strand.)

12.10)  There is a column of C's (and infrequently I's and U's) just to
the right of the yellow arrows.  Point the mouse at one of the C's and
you will see the word "consistent..." in the big status box on the bottom
of the Aligned Reads Window (which is between the "cursor" button and
the "dismiss" button).  "Consistent" means that the read is part of a
mate pair that has pretty normal spacing compared to other mate pairs
in the same library.  

Consistent also means that the reads, if they are in the same contig,
are in one of these orientations:
   ->     <-
or
   <-      ->
but not
   ->     ->
or
  <-      <-

The entire message in the status might not be visible.  You can
enlarge the window to make it visible.

A message like this:

consistent, opp strands, ins size:276,lib default max 331

means that the reads of a pair are on opposite strands (as above),
they are 276 bp apart, they come from a library called "default" (the
one that holds all reads that don't explicitly say which
library they are from), and that the library has a maximum insert size
of 331 bp.
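
The two conditions described above (opposite strands and normal
spacing) can be written down as a small Python sketch.  This is a
simplification for illustration, not Consed's actual rule: it only
compares the insert size to the library maximum and says nothing about
the "I" and "U" codes.  The function name classify_pair is made up.

```python
def classify_pair(strand_a, strand_b, insert_size, library_max):
    """Return "consistent" if the mates are on opposite strands and
    the insert size does not exceed the library maximum."""
    opposite_strands = strand_a != strand_b
    if opposite_strands and insert_size <= library_max:
        return "consistent"
    return "inconsistent"


if __name__ == "__main__":
    # The numbers from the status message above.
    print(classify_pair("+", "-", 276, 331))
    # Same strand, so not consistent regardless of spacing.
    print(classify_pair("+", "+", 276, 331))
```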

12.11)  Point with the mouse cursor to one of the reads with a "C" (for
consistent), hold down the right mouse button, and release on "jump to
mate."  Notice that it jumps from the /2 to /1 and vice versa.

12.12)  VERTICAL SCROLLING

Scroll to about base 400 (see the yellow scale at the top of the
Aligned Reads black window).  Scroll the vertical scrollbar (on the
right) up and down.  (If you have made the window so large that you
can't scroll, make the window smaller and then try scrolling.)  The
scrollbar turns a green tint to remind you that you are scrolled down
and there are reads above the top.  You can scroll faster by clicking
on the black space above or below the thumb of the vertical scrollbar.
You can scroll by tiny amounts by clicking on the little grey arrows
at the top and bottom of the vertical scrollbar.  Try each of these.

12.13)  GOTO POSITION

In the Aligned Reads Window, click in the 'Pos:' box in the upper
right-hand corner.  Type in a number, such as 750, and push the
'Return' or 'Enter' key.  The Aligned Reads Window will scroll to
position 750.  We find this feature is particularly useful when one
person wants another person to look at something in the sequence.

(Little used feature: if you type in a number preceded by a "*" such
as "*750", the cursor will be moved to *padded* position 750 (counting
pads) which is unpadded position 749.  These numbers are different
because there is a pad after position 371--see for yourself.)
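
The relation between padded and unpadded positions is easy to compute
yourself.  In this Python sketch the consensus is a plain string in
which "*" is a pad; the short example consensus and the function names
are made up, and positions are 1-based as in Consed.

```python
PAD = "*"


def padded_to_unpadded(consensus, padded_pos):
    """Count the non-pad bases up to and including padded_pos."""
    return sum(1 for base in consensus[:padded_pos] if base != PAD)


def unpadded_to_padded(consensus, unpadded_pos):
    """Find the padded position of the unpadded_pos-th real base."""
    seen = 0
    for index, base in enumerate(consensus, start=1):
        if base != PAD:
            seen += 1
            if seen == unpadded_pos:
                return index
    raise ValueError("unpadded position is beyond the end of the consensus")


if __name__ == "__main__":
    consensus = "acg*tt"  # hypothetical: one pad after the 3rd base
    print(padded_to_unpadded(consensus, 5))
    print(unpadded_to_padded(consensus, 4))
```

Every pad to the left of a position adds one to the difference between
its padded and unpadded numbers, which is why the two numbers in the
example above differ by one.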


12.14)  COLORS

Notice the colors.  Scroll to position 377 and notice the red 'A'
about 8 reads down from the top read (all of the others have a C in
that column, as does the reference).  The red bases are the ones that
disagree with the consensus.

Now scroll to position 580.

Notice the different shades of grey background (around the bases).
They refer to the quality (error probability) of the base.  Quality
values mean the following:

A quality value of 10 means 1 error in ten to the 1.0 power
A quality value of 20 means 1 error in ten to the 2.0 power
A quality value of 30 means 1 error in ten to the 3.0 power
A quality value of 40 means 1 error in ten to the 4.0 power

and for quality values in between:

A quality value of 25 means 1 error in ten to the 2.5 power

Get the idea?
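
In other words, the error probability is 10 to the power of
-(quality/10).  Here is the same relation as a short Python sketch
(the function names are made up for the example):

```python
import math


def error_probability(quality):
    """Probability that a base with this quality value is wrong."""
    return 10 ** (-quality / 10)


def quality_from_error(probability):
    """Inverse relation: quality value for a given error probability."""
    return -10 * math.log10(probability)


if __name__ == "__main__":
    for q in (10, 20, 25, 30, 40):
        print(q, error_probability(q))
```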


(These have actually been empirically verified for Sanger reads--if
you are interested in the gory details, read the phred papers:

Ewing B, Hillier L, Wendl M, Green P: Basecalling of automated
sequencer traces using phred. I. Accuracy assessment.  Genome Research
8, 175-185 (1998).

Ewing B, Green P: Basecalling of automated sequencer traces using
phred. II. Error probabilities.  Genome Research 8, 186-194 (1998).

In that same copy of the journal is a paper about Consed, as well.)

Also notice the upper and lowercase.  This is just a cruder indication 
of the quality of the bases:  uppercase is higher quality and
lowercase is lower quality.

12.15)  To see the quality value of a particular base, point at it and
click with the left mouse button.  You will see the quality displayed
in the Info Box at the bottom of the Aligned Reads Window.  What
is the highest quality base you can find?  What is the lowest quality
base you can find?  (Answers are below.)


These quality values are shown in grey scales:

Quality 0 through 4 is given by dark grey
Quality 5 through 9 is given by a shade lighter
Quality 10 through 14 is given by a shade still lighter
.
.
.
Quality of 40 through 97 is given by white (the brightest shade)

A quality value of 99 is reserved for bases that have been edited and
the user is absolutely sure of the base ('high quality edited').

A quality value of 98 is reserved for bases that have been edited and
the user is not sure of the base ('low quality edit').
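
As a rough Python sketch, the mapping from quality to shade looks like
this.  It assumes the shades elided above continue in steps of 5 up to
quality 40, which is an assumption made for the example; the level
numbers (0 for darkest, 8 for white) and the function names are also
made up.

```python
WHITE_LEVEL = 8  # assumed: 0-4 is level 0, 5-9 is level 1, ..., 40-97 is level 8


def grey_level(quality):
    """Shade index for an unedited base (0 = darkest, 8 = white)."""
    if not 0 <= quality <= 97:
        raise ValueError("98 and 99 are reserved for edited bases")
    return min(quality // 5, WHITE_LEVEL)


def edit_kind(quality):
    """Meaning of the two reserved quality values, or None otherwise."""
    return {98: "low quality edit", 99: "high quality edited"}.get(quality)


if __name__ == "__main__":
    for q in (0, 7, 12, 39, 40, 97):
        print(q, grey_level(q))
    print(99, edit_kind(99))
```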

Go to position 380.  Notice that the ends of some of the reads have a
completely black background and the letters themselves are grey
(rather than black or red).  These are the unaligned ends of reads, as
determined by the assembler/aligner (phrap, bwa, newbler, etc.)  See,
for example, the dim GCGCGCGCGCGCGCGCGCC on the right end of
C02D1ACXX:7:2202:15482:17608#GAACTATA/1 (about the 3rd read from the top).

12.16)  HIGHLIGHTING READ NAMES 

In the Aligned Reads Window, click on a read name with the left mouse
button.  The name will turn magenta.  Click again and it will turn
yellow again.  Try turning it magenta and then scrolling.  This
feature is helpful in keeping track of a particular read as you
scroll.

If you have an emacs window open (or any editor window), you can paste
the read name in by first clicking on it in consed to turn it magenta
and then clicking with the middle mouse button in the editor window.

To highlight a bunch of reads at once, use the same shift-click method
as in Windows: point and left click to highlight a read.  Then point
to an unhighlighted read several more reads down, hold down the shift
key, and left mouse click.  All of the intervening reads should be
highlighted.

12.17)  You can also make a file of all of the reads you have
highlighted by doing the following:

Push the left mouse button down on the 'Highlight' menu and
release on 'Save Highlighted Read Names to File'.  Try this.  (If you
can't find the "Highlight" menu: all menus are near the top of the
window.  Just under the title "Aligned Reads" are the menus labeled
"File", "Navigate", "Info", "Color", "Dim", "Highlight", "Tracks",
"Misc", and "Sort".)

12.18)  Look at the file you created ('highlighted_reads.txt' unless
you changed it)--in UNIX you do this by creating an xterm, cd'ing to
the same directory (illumina_paired_answer/edit_dir), and typing:

less highlighted_reads.txt

(type 'q' to get out of less).

12.19)  Turn off highlighting of all reads by pointing to the 'Highlight'
menu, holding the left mouse button down, and releasing it on 'Unhighlight All
Reads in All Contigs'.

12.20)  MOVING ALONG A READ  

First highlight a read name.  Then left-mouse-click on a base on that
read.  You can move left and right within the read by using the left
and right arrow keys on your keyboard.  You can also scroll by 10 bp
at a time by using the "<" (less than) and ">" (greater than) on the
keyboard (not using the mouse).  (If this doesn't work, first click on
a base and then do it.)  Try this: hold down the control key and type
'a'.  You will move to the left end of the read.  Hold down the
control key and type 'e'.  You will move to the right end of the read.
(Emacs users will recognize these commands.)

When you are done playing with this, unhighlight the read name.


12.21)  DIMMING AND UNDIMMING ENDS OF READS

Scroll so that location 380 is about in the middle of the Aligned
Reads Window.  Notice that there is a dimmed stretch of GCGCGCGCGCGCGCGCGCC
that you looked at above.  Push the left mouse button down on the
'Dim' menu.  There will be a list of choices that will appear.  Drag
the cursor down to 'Dim Nothing' and release.  Now look what happened
to the color of the bases.  Many now appear red with a grey
background.  You are seeing the clipped-off bases with all the same
information as any other base.  In some assemblies (especially those
with some contamination, chimeras, vector, etc.) there are so many
red (discrepant) bases that the screen becomes distracting and
busy.  Thus by default the clipped-off/unaligned bases are shown with a
black background and a grey foreground so they don't distract you.

Look at the choices under the "Dim" menu.  Notice there is a
distinction here between 'low quality ends of reads' and 'unaligned
ends of reads'.  Unaligned ends of reads can be low quality as well,
or they can be high quality, as in the case of chimeric reads.

Change back to "Dim" menu, item "Dim Unaligned" and find the read that
has the stretch of unaligned bases mentioned above.  Point with the
mouse to that read's name (the read names are yellow on the
left side of the window and this one is called
"C02D1ACXX:7:2202:15482:17608#GAACTATA/1" ) and hold down the right mouse
button.  You will notice there is a line that says "high quality from
293 to 392; aligned region from 293 to 373; chem: solexa".  This is giving the
same information in number form.  Highlight the read name (see
HIGHLIGHTING READ NAMES above) so you don't lose the read as you
scroll.  Then scroll and check that the numbers agree with the dimming.

You can play with the dimming options a bit.  Then return it to 'Dim
Unaligned' for the rest of this tour.


12.22)  FIXING A READ AT TOP OF ALIGNED READS WINDOW

By now, you will have observed that it is difficult to follow a
particular read as you scroll.  This is because some reads end and
others begin so a particular read needs to move up or down to
accommodate these changes.  A particular read jumps up as you scroll
right and jumps down as you scroll left.

But sometimes you want to focus on a particular read as you
scroll--for example, you might want to compare other reads to it along
its length.  Highlighting the read helps, but there is a better way:

12.23)  In the Aligned Reads Window point to a single read, hold
down the right mouse button and release on 'Fix read ... at the top of
this window'.  Suddenly a copy of the read will be shown below the
'Search for String' button and above the numbers and scale.  Try
scrolling left and right.

12.24)  Now point to the read fixed at the top, hold down the right mouse
button and release on 'unfix read ... at the top of this window'.  The
read should be removed from the top.

12.25)  Try fixing several reads at the top of the window.  Then unfix them
all.

12.26)  Highlight 2 or 3 reads.  Point with the mouse pointer to the
"Highlight" menu at the top of the window, hold down the left mouse
button, and release on "Fix Highlighted Reads to Top of Window."

12.27)  You can then point to the "Highlight" menu, hold the left mouse
button down, and release on "Unfix Highlighted Reads At Top of
Window".

12.28)  EDITING THE CONSENSUS

You can edit the consensus in the Aligned Reads Window.  Click on the
't' in the consensus at position 382.  Looking down the column you
will see there is a 'C' that is red.  Type 'C' on the consensus.  Now
look down the column.  You will see that all of the T's have turned
red (since they disagree with the consensus) and the red 'C' is now
black (since it agrees with the consensus).  

You can also edit individual reads, if you like.  Look below under
TRACES AND EDITING READS.


12.29)  SAVING THE ASSEMBLY

To save the assembly, pull down the 'File' menu on the Aligned
Reads Window, and release on 'Save assembly'.  A box will pop up with
a suggested name.  I suggest you always use the one it suggests.  The
idea is that the ace files:


(project).fasta.screen.ace.1
(project).fasta.screen.ace.2
(project).fasta.screen.ace.3
(project).fasta.screen.ace.4
(project).fasta.screen.ace.5

are in order of how old they are.  If you feel you are taking up too
much disk space, then start deleting the ace files starting at the
oldest.  I do not recommend that you overwrite existing ace files.
The version numbers just keep growing, and that is not a problem.
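
If you do need to clean up, remember that the version numbers must be
compared as numbers (so that .ace.10 comes after .ace.2).  Here is a
small Python sketch that puts ace file names in order, oldest first.
It assumes all of the names belong to the same project; the function
name and the sample file names are made up.

```python
import re

# Matches names such as "project.fasta.screen.ace.12".
ACE_VERSION = re.compile(r"^.+\.ace\.(\d+)$")


def sort_ace_files(names):
    """Return the versioned ace file names, lowest version first."""
    versioned = []
    for name in names:
        match = ACE_VERSION.match(name)
        if match:
            versioned.append((int(match.group(1)), name))
    return [name for _, name in sorted(versioned)]


if __name__ == "__main__":
    names = [
        "project.fasta.screen.ace.10",
        "project.fasta.screen.ace.2",
        "project.fasta.screen.ace.1",
        "notes.txt",
    ]
    # The first entries printed are the oldest ones.
    for name in sort_ace_files(names):
        print(name)
```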


12.30)  EXPORTING THE CONSENSUS

Bring the Aligned Reads Window into view
again.  Hold down the left mouse button on the 'File' menu and
release the button on 'Export consensus sequence'.  Notice that the
consensus will be stored (in this case) in a file called
'c_elegans_piece.fasta'.  Click 'OK'.  There is now a file in your edit_dir
directory called 'c_elegans_piece.fasta' that has the consensus sequence in
it.  If you want to see the file, bring up another Xterm (if you are
UNIX literate), and type:

cd illumina_paired_answer/edit_dir  

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

less c_elegans_piece.fasta

(You get out of less by typing "q".)

12.31)  Fancier exporting of the consensus.  Bring the Aligned Reads Window
into view again.  Hold down the left mouse button on the 'File' menu
but this time release on 'Export consensus sequence (with
options)...'.  Just export a little snip of the consensus, from 370 to
380.  (You will notice this contains a pad * character.)  Under "Write
Both Bases File and Qual File or Just Bases File?" click "Both Files".
Click 'OK'.  Consed will want to call this file 'c_elegans_piece.fasta' again.
You can overwrite the existing file by answering "Yes" to "Save
anyway?"

Look in your other Xterm at these files:

more c_elegans_piece.fasta
more c_elegans_piece.fasta.qual

The one file contains the bases (but no * pads) and the other
contains the corresponding qualities of those bases (in this case the
qualities are all 20.)
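
If you want to use the two files together in a script, each base can
be paired with its quality.  The Python sketch below assumes that the
.qual file is laid out like the fasta file (a ">" header line followed
by whitespace-separated integers) and that each file holds a single
record.  The function names and the sample bases are made up.

```python
def read_single_record(lines):
    """Return (header, body lines) of the first '>' record."""
    header = None
    body = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here
            header = line[1:]
        else:
            body.append(line)
    return header, body


def bases_with_qualities(fasta_lines, qual_lines):
    """Pair each base with its quality value."""
    _, base_lines = read_single_record(fasta_lines)
    _, qual_body = read_single_record(qual_lines)
    bases = "".join(base_lines)
    quals = [int(token) for line in qual_body for token in line.split()]
    if len(bases) != len(quals):
        raise ValueError("bases file and qual file have different lengths")
    return list(zip(bases, quals))


if __name__ == "__main__":
    fasta = [">c_elegans_piece", "acgt", "ac"]         # hypothetical bases
    qual = [">c_elegans_piece", "20 20 20 20", "20 20"]
    print(bases_with_qualities(fasta, qual))
```

With the real files you would pass open("c_elegans_piece.fasta") and
open("c_elegans_piece.fasta.qual") instead of the lists.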


12.32)  Exporting the consensus of all contigs at once: Go to the Main
Consed Window (not the Aligned Reads Window).  Point to 'File', hold
down the left mouse button, and release on 'Write all contigs to fasta
file'.  You then can choose a filename for all contigs to be written
to.  (In this project there is only 1 contig, so there is no
difference between this option and just exporting a contig at a time.)


(Note that there is a way of exporting the contigs as they are
oriented in the scaffold.  See EXPORTING SCAFFOLDS below.)


12.33)  COMPLEMENTING THE CONTIG

Push the button 'Compl Cont' in the Aligned Reads Window to complement the
contig.  This displays the opposite strand of the contig including the
consensus and all reads.  Push this button again to uncomplement it.
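
What "complementing" does to a sequence can be shown with a short
Python sketch: each base is replaced by its complement and the order
is reversed.  Characters other than a, c, g, and t (such as the pad
"*") are left unchanged in this sketch.  The function name is made up.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(sequence):
    """Return the opposite strand, read left to right."""
    return sequence.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    bases = "ttctctccagcc"
    print(reverse_complement(bases))
    # Complementing twice gives back the original.
    print(reverse_complement(reverse_complement(bases)) == bases)
```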


12.34)  FIND MAIN WINDOW

On the Aligned Reads window, click on 'Find Main Win'.  This will
cause the Consed Main Window to pop up in the event you have buried it under
other windows or iconified it.  (This may not work with some settings of
your X emulator.  In that case you will have to find and click on the
Main Window to bring it up.)


12.35)  MULTIPLE UNDO EDIT

Now that the Consed Main Window is visible, click the 'Undo Edit...'
button.  There will be a popup indicating the most recent edit.  (If
it says "no edits so far", click dismiss and then make some edits to
the consensus.  Then click on 'Undo Edit...' again.)  Click 'undo'.
Then you will see the edit that was done before that.  Click 'undo'.
You can continue undoing if you like.  You now know how to undo more
than one edit.  You cannot choose which edits to undo and which to not
undo--edits can only be undone in precisely reverse order from the
order you made them.  Once you save the assembly, you cannot undo
prior edits.
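
The behavior described above is that of a stack: the last edit made is
the first one undone, and saving empties the stack.  Here is a Python
sketch of that idea.  It is a model of the behavior only, not Consed's
implementation; the class name and the edit descriptions are made up.

```python
class EditHistory:
    """Last-in, first-out record of edits."""

    def __init__(self):
        self._edits = []

    def record(self, description):
        self._edits.append(description)

    def undo(self):
        """Remove and return the most recent edit, or None if there is none."""
        if not self._edits:
            return None
        return self._edits.pop()

    def save(self):
        """After a save, earlier edits can no longer be undone."""
        self._edits.clear()


if __name__ == "__main__":
    history = EditHistory()
    history.record("first edit")
    history.record("second edit")
    print(history.undo())  # the most recent edit comes off first
    print(history.undo())
    print(history.undo())  # nothing left to undo
```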

12.36)  EXITING CONSED

On the Aligned Reads Window, point to 'File' menu, hold down the
left button and release on 'Quit Consed'.  If it asks you some
questions, answer 'Quit Without Saving and Discard .wrk File'.


12.37)  CONSED -ACE

Try bringing up Consed like this:

consed -ace ref.ace.1

(where "consed" is replaced by whatever command brings up consed on
your system).

This is an alternative to just typing "consed" and then selecting the
ace file from within consed.  Many users prefer this method instead.


12.38)  SORTING OF READS


12.39)  Scroll to position 382 and click on the t in the consensus at
that location.  A green vertical line will appear.  Scroll up and down
using the scrollbar on the right side of the window.  

12.40)  SORTING BY BASE

Point to the "Sort" menu at the top of the window, hold down the left
mouse button, and release on "Sort Options and Help."  Click on "by
base".  Move this window to the side out of the way (or dismiss it,
but you will need it again soon).

Notice now the read with the red C at position 382 is on top. The
reads with T's are sorted by quality, the darker ones below.  

Look at position 397--all of the reads are A's except for a red G
further down.  Click on the consensus at position 397.  The red G will
jump further down as the reads resort by base, all the A's before the
G.  The A's are sorted by quality, the darkest ones near the bottom.
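
The ordering you just saw is consistent with sorting first by the base
in the selected column and then by quality, highest first.  The Python
sketch below shows that ordering.  It is inferred from what appears on
the screen, not taken from Consed's code, and the read names and
qualities are made up.

```python
def sort_by_base(reads):
    """Sort (name, base, quality) tuples by base, then by descending quality."""
    return sorted(reads, key=lambda read: (read[1].upper(), -read[2]))


if __name__ == "__main__":
    column = [
        ("read1", "T", 30),
        ("read2", "C", 20),
        ("read3", "T", 40),
    ]
    for name, base, quality in sort_by_base(column):
        print(name, base, quality)
```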

12.41)  SORTING BY MISMATCHES ON TOP

In the "How to Sort Reads" window, click on "mismatches on top".  The
read with a G will jump to the top, above all of the A's.  

12.42)  SORTING BY QUALITY

In the "How to Sort Reads" window, click on "by quality".  The read
with the G will jump to somewhere in the midst of the reads with A's
since now the reads are sorted by quality.  (Actually they are sorted
by quality in a 9-base window--you can click to see the actual
numerical quality values.)

12.43)  SORTING BY UNALIGNED + MISMATCHES ON TOP

Click on the consensus base at position 382.  In the "How to Sort
Reads" window, click on the button labeled "unaligned + mis on top".
The unaligned read with these bases GCGCGCGCGCGCGCGCGCC should rise to
the top, followed by the read with a red "C" at this position,
followed by all of the reads with black T's.

12.44)  ALPHABETICAL SORTING OF READS

If you have multiple reads from each of multiple patients, you might
want to sort the reads by patient, so that all reads from the same
patient are together.

12.45)  Click on "alpha" and also "by method specified above".  The
reads should be sorted in alphabetical order.  Scroll down to the
bottom and you will see that the reads are in alphanumeric order.

12.46)  When you are done experimenting, turn off sorting by quality:
In the How To Sort Reads Window, click on "by method specified above".
The reads will now be sorted by top strand first and then bottom
strand (look at the arrows in the Aligned Reads Window).

It is also possible to sort the reads by a user-provided file, but to
do this you must learn CONSED CUSTOMIZATION (below) with resources:

consed.showReadsInAlignedReadsWindowOrderedByFile: true
consed.showReadsInAlignedReadsWindowOrderedByThisFile: readOrder.txt


12.47)  SEARCH FOR STRING

Try the 'Search for String' button (left side of the Aligned Reads
Window).  Type in the string "aaaaa" and click 'ok'.  There
should be a list of 'hits'.  Double click on one of the hits (or
single click on it and click on 'go'.)  Notice that the Aligned Reads
Window scrolls to that position and has the cursor on the found
string.  (Some of the hits are complemented, so they are ttttt.)

Try also clicking on the "Next" and "Prev" buttons at the bottom of
the Searching Contigs Window.  

Try clicking on the "Next" and "Prev" buttons at the bottom of the
Aligned Reads Window.  Notice they have the same effect as the
buttons in the Searching Contigs Window but are more convenient.

Click "Dismiss" on the "Searching Contigs" Window.  Click the "Search
for String" button in the Aligned Reads Window again.  This time in
the Search For String Window select 'Search Just Reads'.  Erase the
singlets file.  Then click 'OK'.  You will notice there are many more
hits.  This is because this shows hits in each read, even if they are
at the same consensus position.

You can also try the approximate match search for string by clicking
on 'Approximate' instead of 'Exact'.  The 'Per Cent Mismatch' only
applies to the Approximate match search.  
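
To give a feel for what a per cent mismatch threshold means, here is
a minimal sketch.  It assumes that only substitutions are counted
and that the percentage is taken relative to the query length; the
manual does not say how Consed computes this, so treat both as
assumptions.  The function name is hypothetical.

```python
def approximate_hits(sequence, query, max_percent_mismatch):
    """Return (0-based start, mismatch count) for each window of the
    sequence that differs from the query in at most the given
    percentage of positions.  Substitutions only (an assumption)."""
    sequence = sequence.lower()
    query = query.lower()
    allowed = len(query) * max_percent_mismatch / 100.0
    hits = []
    for start in range(len(sequence) - len(query) + 1):
        window = sequence[start:start + len(query)]
        mismatches = sum(1 for a, b in zip(window, query) if a != b)
        if mismatches <= allowed:
            hits.append((start, mismatches))
    return hits


# With 20% allowed on a 5-base query, one mismatch is tolerated.
print(approximate_hits("ggaaacaagg", "aaaaa", 20))  # [(2, 1), (3, 1)]
# With 0% allowed the search is exact and this sequence has no hit.
print(approximate_hits("ggaaacaagg", "aaaaa", 0))   # []
```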

12.48)  COPY AND PASTE

In the Aligned Reads Window, swipe some bases by holding down the left
mouse button.  You should see the bases turn yellow, at least
temporarily.  Then click the 'Search for String' button.  Click the
"Clear" button to clear the "Query string" box.  Use the middle mouse
button to paste the bases you have just swiped into the 'Query
string:' box.  Notice that you can swipe bases either from the
consensus or from a read.

The search for string is case-insensitive so don't worry about the
pasting being upper or lowercase.


12.49)  FINDING VARIANTS/MISASSEMBLED READS/HIGHLY DISCREPANT LOCATIONS

For this exercise, use the dataset called "solexa_example_answer".
Make a copy (so you can modify it all you want) following the
instructions GETTING YOUR OWN COPY OF A SAMPLE DATASET (above).

cd solexa_example_answer/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.  You might
need to type cd ../.. first to get out of the dataset you are
currently in.)

12.50)  Type:

ls

You should see a file ref.ace.1

Start consed by typing:

12.51)  consed -ace ref.ace.1

The Consed main window should appear.

12.52)  Point to the 'Navigate' menu, hold down the left mouse button, and
release on 'Search for highly discrepant positions'.  

12.53)  Do not change any of the defaults and just click the 'Search'
button.  Up will pop a window labelled 'Highly Discrepant Positions'
with an empty window.

Well, that's no fun--apparently there aren't any real variants in this
dataset.  Dismiss this window.

So that you can see what they look like, repeat the steps above to
bring up the Navigate by Highly Discrepant Regions Window again,
but this time change "Ignore Bases Below This Quality" from 20 to 12.
Click 'Search'.

Up will pop the Highly Discrepant Positions Window with a list of the
9 locations below: 

min # of discrepant reads: 2 min quality: 12, "r": base of reference seq
max depth of coverage: 100000 and ignoring reference seq
  A           C           G           T           *              pos     contig
  2   8.0%   23  92.0%r   0   0.0%    0   0.0%    0   0.0%           56 ref
  3   9.1%   30  90.9%r   0   0.0%    0   0.0%    0   0.0%          252 ref
  2   6.9%   27  93.1%r   0   0.0%    0   0.0%    0   0.0%          256 ref
  0   0.0%    0   0.0%   20  90.9%r   2   9.1%    0   0.0%          682 ref
  0   0.0%    0   0.0%   31  93.9%r   2   6.1%    0   0.0%          715 ref
  2   4.8%   40  95.2%r   0   0.0%    0   0.0%    0   0.0%          742 ref
  2   8.7%   21  91.3%r   0   0.0%    0   0.0%    0   0.0%          936 ref
  0   0.0%    1   2.4%    1   2.4%   39  95.1%r   0   0.0%          982 ref

This means, for example, that at position 56 of contig "ref", there are
2 A's, 23 C's, 0 G's, 0 T's, and 0 *'s (deletions).  There are 8.0% A's,
92.0% C's, 0% G's, 0% T's, and the reference sequence contains a C
at this position.
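
The percentages in each row are just the base counts divided by the
total number of counted reads at that position.  The sketch below
reproduces the numbers for positions 56 and 982 from the list above.
The function names are made up, and treating "discrepant" as "differs
from the reference base" is an assumption that happens to fit these
rows.

```python
def base_percentages(counts):
    """Convert per-base read counts into percentages, one decimal place."""
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in counts}
    return {base: round(100.0 * n / total, 1) for base, n in counts.items()}


def discrepant_reads(counts, reference_base):
    """Count reads whose base differs from the reference base."""
    return sum(n for base, n in counts.items() if base != reference_base)


position_56 = {"A": 2, "C": 23, "G": 0, "T": 0, "*": 0}
position_982 = {"A": 0, "C": 1, "G": 1, "T": 39, "*": 0}

print(base_percentages(position_56))         # A is 8.0, C is 92.0
print(base_percentages(position_982))        # C and G are 2.4, T is 95.1
print(discrepant_reads(position_56, "C"))    # 2
print(discrepant_reads(position_982, "T"))   # 2
```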

If you see a line like this:

  0   0.0%    0   0.0%    0   0.0%    9  75.0%r   3  25.0%     542-543* ref

the "542-543*" means that two indel variants are right next to each
other at positions 542 and 543 and are probably a single event.  


12.54)  Click the 'Next' button on this window and watch the Aligned Reads
Window.  

To see the discrepant reads, change the sort.  As shown above (see
"SORTING OF READS"), point to the "Sort" menu, hold down the left
mouse button and release on "Sort Options and Help".  In the How to
Sort Reads Window, click on "mismatches on top."  You can then click
"dismiss" in the How to Sort Reads Window.  This sort will allow you
to see the discrepant bases and then the agreeing bases with the
highest quality agreeing bases first.

You can continue clicking the 'Next' button either in the
Highly Discrepant Positions Window or else at the bottom of the
Aligned Reads Window.  Do this until you have reached the end of the
list.  This provides a rapid method of reviewing variants.

12.55)  Go back to the Consed Main Window, point to the 'Navigate' menu,
hold down the left mouse button, and release on 'Search for highly
discrepant positions'.  When the window pops up entitled 'Navigate by
Highly Discrepant Regions', look at the different options.  This time
try 'Just list indels'.  Are there any indel variants in this data
set?  Try it and see.  Well, actually there is one, but you will need
to change the "minimum # of discrepant reads" to 1 to find it.  Play
around with these parameters a little.

There is also the ability to ignore locations in which the consensus
is an x or an n (or any bases you wish to ignore).  You turn on this
option by clicking "True" on the line "Ignore location if consensus
base is one of:".

Here are 2 more obscure options:

'maximum depth of coverage'

Typically you won't use this (it is set to a ridiculously high
number).  It is there in case you want to avoid regions that you
believe are collapsed repeats and thus what appear to be variants are
really just differences between different copies of repeats.

'Count only first of multiple reads starting at same location'

Some people believe that Illumina reads that start at exactly the same
location are really the same read (the same cluster) and the image
software made a mistake by making multiple reads out of it.  If such a
group of reads has a discrepancy in it, they want to count the group
as one read with the variant rather than multiple reads with the
variant.
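
The effect of this option can be sketched as follows.  The read names
and positions are invented for illustration, and grouping by start
position alone (ignoring strand) is an assumption; Consed's own rule
may differ.

```python
def first_read_per_start(reads):
    """Keep only the first read seen at each start position.

    reads: list of (name, start_position, base_at_site) tuples.
    """
    seen_starts = set()
    kept = []
    for name, start, base in reads:
        if start not in seen_starts:
            seen_starts.add(start)
            kept.append((name, start, base))
    return kept


def count_discrepant(reads, consensus_base):
    """Count reads whose base at the site differs from the consensus."""
    return sum(1 for _, _, base in reads if base != consensus_base)


# Made-up reads covering one site where the consensus is C.
reads = [
    ("read1", 100, "G"),
    ("read2", 100, "G"),   # same start as read1
    ("read3", 100, "G"),   # same start as read1
    ("read4", 97, "G"),
    ("read5", 90, "C"),
]

print(count_discrepant(reads, "C"))                        # 4
print(count_discrepant(first_read_per_start(reads), "C"))  # 2
```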

This feature is also available as a report which can be generated
automatically without using consed's graphical interface.  You will
learn how to use consed's report feature later.

12.56)  EXTENDING THE CONSENSUS

You can edit or tag an Illumina read in the same way you do with Sanger
reads (below).  Scroll to the right end of the contig (around position
1000).  

Push the left mouse button down on the menu item 'Dim' and release on
"Dim Nothing".  You will see that there are a number of reads that
protrude beyond the right end of the consensus and are red, indicating
discrepant with the consensus (you may need to scroll down to see
them).  Suppose you want to extend the consensus based on those reads.

Find read HWI-EAS94_4_1_59_547_158 and click on the name to highlight
it so you don't have to find it again.  (Later you will learn how to
quickly find a read by name, but for now just use your eyes.)

Middle mouse click on the c base at the right end of this read.  You
will see a Trace Window pop up.  The dashed lines for the traces are
to remind you that this is an Illumina read and the traces are
completely fictional--this window just gives you the ability to edit
and tag the read.

Point at the first base of the gccatgtcataac sequence which is all
red, and hold down the middle mouse button and swipe to the last base
which is a c (all should turn yellow) and then release.  A "What to Do
with Selection" window should pop up.

In this window, click on the "Change Consensus" button.  In the
Aligned Reads Window, you will notice that the consensus has now been
extended to include the additional bases from the read
HWI-EAS94_4_1_59_547_158.

But there is a problem: click on consensus base "t" at position 1009.
Scroll up and down and notice that the highest quality bases are all G
(and red--discrepant) while the consensus is a t since the read you
used to extend the consensus had a t.  Overstrike the t in the
consensus with a G.  Now scroll up and down and notice the consensus
agrees with the highest quality reads.


12.57)  HIGH AND LOW DEPTH OF COVERAGE REGIONS

Go back to the Consed Main Window, point to the 'Navigate' menu, hold
down the left mouse button and release on 'Search for High (or Low)
Depth of Coverage Regions'.  A Window entitled 'Navigate by High (or
Low) Depth of Coverage' should pop up.

12.58)  Leave "show high depth (not low depth)" checked.  Change the
'min (for high depth regions) or max (for low depth regions) depth of
coverage' box from 10 to 50 and click the 'Search' button.  A navigate
window entitled 'High Depth of Coverage Regions' will pop up with a
number of regions with depth of coverage 50 and over.  Navigate to a
few of them.

To find low depth regions, just uncheck the "show high depth (not low
depth)" box.  If you wanted to find regions with absolutely no read
coverage, you could do that by changing 'min (for high depth regions)
or max (for low depth regions) depth of coverage' box to 0 and also
the 'ignore read bases below this quality' box to 0.  Then click
'search.'  How many such regions did you find?  Now change the "max
(for low depth regions) depth of coverage" box to 1.  Click "search."
How many such regions did you find now?
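
The difference between the high depth and low depth searches can be
sketched with a per-base depth list.  This is only an illustration
with invented depths and a hypothetical function name: a high depth
region is taken to be a run of positions at or above the threshold,
and a low depth region a run at or below it.

```python
def depth_regions(depths, threshold, high=True):
    """Return 1-based inclusive (start, end) runs of positions whose
    depth is >= threshold (high=True) or <= threshold (high=False)."""
    regions = []
    run_start = None
    for pos, depth in enumerate(depths, start=1):
        selected = depth >= threshold if high else depth <= threshold
        if selected and run_start is None:
            run_start = pos
        elif not selected and run_start is not None:
            regions.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(depths)))
    return regions


depths = [0, 0, 3, 60, 75, 1, 0, 55]   # made-up depth at positions 1..8

print(depth_regions(depths, 50, high=True))    # [(4, 5), (8, 8)]
print(depth_regions(depths, 0, high=False))    # [(1, 2), (7, 7)]
print(depth_regions(depths, 1, high=False))    # [(1, 2), (6, 7)]
```

Raising the low depth maximum from 0 to 1 can merge or enlarge
regions, which is why the two searches above may give different
counts.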

12.59)  You can find the depth of coverage at a specific position as
follows: in the Aligned Reads Window, set the cursor on the consensus
base at the position you are interested in.  Point to the Misc Menu,
hold down the left mouse button and release on "depth of coverage at
cursor".  The depth will be displayed at the bottom of the Aligned
Reads Window.

12.60)  You can also see an overview of the depth of coverage.  On the
Consed Main Window, click on Assembly View.  The Assembly View Window
will pop up.  An error box will come up saying "Sequence matches will
not be shown in Assembly View...".  Dismiss it for now.

You will learn much more about the Assembly View Window later, but
for now just notice a few features:

12.61)  Put the pointer on the grey bar with the numbers inside it.  Move
the pointer left and right and notice the information displayed near
the bottom of the Assembly View Window as you do this.  In particular,
you will see the depth of coverage and the base position change as you
move the pointer.  The green graph also indicates depth of coverage.
Notice that the green graph has horizontal black lines labeled "0, 50,
100, ... 250".  Let's change that...

12.62)  Find the button labeled "What to Show" near the bottom of the
Assembly View Window.  Click on it.  A popup menu will appear.  Click
on "Read Depth/Multiple Discrepancies."  The bottom box is labeled
"max read depth" and currently has the value 300 in it.  Change that
to 50 and click "Apply".  What happened to the green depth of
coverage graph?


12.63)  ASSEMBLY VIEW

Consed can show you a bird's eye view of the Assembly using
forward/reverse pair information, sequence match information, read
depth, etc.  We have an example database which shows its features.

Get your own copy of the dataset "assembly_view" (see above under
GETTING YOUR OWN COPY OF A SAMPLE DATASET).

cd assembly_view/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

ls
Restart consed

Double click on "assembly_view.fasta.screen.ace.1"

In the Consed Main Window, click on the button "Assembly View" which is
near the upper left corner of the window.

You should see 3 grey bars with pink labels "2", "3", and "1".  The
bars are the contigs: Pink "1" means Contig1, pink "2" means Contig2,
etc.  Notice the scale on the contigs.  This gives the contig
position.

12.64)  READ DEPTH

We covered this briefly under Illumina Reads.

You should see a dark-green graph above the contig bars.  This dark
green graph indicates read depth--the depth of the quality 20 (by
default) region of reads.  

Click on the button labelled "What to Show".  A menu will pop up at
that location.  Click on the "Read Depth/Multiple Discrepancies" menu
item.  A window will appear labelled "Show Read Depth/Multiple
Discrepancies".

There is a field labeled "max read depth" and the current value should
be 300.  Change it to 30 and click the "Apply" button.  The read depth
graph should now be much bigger and easier to see.  But depending on
the dataset, a max of 30 might be much too small.  Thus you will need
to adjust it.

You might not like the horizontal lines in the read graph.  Turn them
off by clicking on the square with the check mark (a "toggle button")
labeled "make read depth horizontal lines across window" so that the
check mark disappears.  Then click "apply" and the horizontal lines
should disappear.

Note: the read depth is *not* the # of reads that have quality 20
bases or above, although this number is a good approximation.  For
example, suppose there is a stretch of 300 Q50 bases, and in the
middle of that stretch are 5 Q10 bases.  Those Q10 bases will be counted
toward the Q20 read depth.  (In computer science terms, these bases
are part of the maximal Q20 read segment.)
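
One way to picture a "maximal Q20 read segment" is as the
highest-scoring stretch of a read when each base scores its quality
minus 20.  The manual does not say exactly how Consed computes the
segment, so the scoring below is an assumption, and the function name
is made up.  It does reproduce the behavior described above: a few
low quality bases inside a long high quality stretch stay inside the
segment.

```python
def maximal_segment(qualities, threshold=20):
    """Return the half-open (start, end) index range of the
    maximum-scoring segment, scoring each base as quality - threshold.
    Returns (0, 0) if no base scores above zero."""
    best_sum, best = 0, (0, 0)
    current_sum, current_start = 0, 0
    for i, q in enumerate(qualities):
        if current_sum <= 0:
            # A non-positive running score cannot help; start afresh.
            current_sum, current_start = 0, i
        current_sum += q - threshold
        if current_sum > best_sum:
            best_sum, best = current_sum, (current_start, i + 1)
    return best


# Two Q10 bases inside a Q50 stretch remain part of the segment.
print(maximal_segment([50, 50, 50, 10, 10, 50, 50, 50]))          # (0, 8)
# A long low quality run is not worth crossing.
print(maximal_segment([50, 50, 5, 5, 5, 5, 5, 5, 5, 50]))         # (0, 2)
```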

12.65)  FORWARD/REVERSE PAIR DEPTH

A "forward/reverse pair" is a pair of reads from the same subclone
template, each of which is primed within the subclone vector, but one
is primed on one side of the insert and the other is primed on the
other end of the insert.  A forward/reverse pair may both be assembled
into the same contig, in which case they should point towards each
other and be approximately the insert size apart.  A forward/reverse
pair also might be in different contigs on different sides of a gap.

12.66)  To see the graph of forward/reverse pair depth: Click on the
button labelled "What to Show".  A menu will pop up at that location.
Click on "Fwd/Rev Pairs".  A box will appear labelled "Which Fwd/Rev
Pairs to Show in Assembly View".  There is a little square (a toggle
button) next to "show consistent fwd/rev pair depth".  Click on this
toggle button to change it from appearing sticking out to appearing
pushed in.  Then click on "Apply".

A bright green graph should appear--this is fwd/rev pair depth.  It is
highest around 7000 to 10000 of Contig2 and around 14000 of Contig3.
The bright green graph indicates, for each base, the depth of subclone
templates that have a consistent forward/reverse pair.  A
forward/reverse pair is "consistent" if the forward and reverse are
pointing towards each other (this may be a problem for some Illumina
datasets--if this is a problem for anyone, let me know) and are not
too far away from each other.  ("Too far" is defined as 3 or more
standard deviations from the mean of the insert size of templates from
a particular library.)  In other words, the green graph tells for each
base, how many consistent forward/reverse pairs have that base between
the forward read and the reverse read.  This forward/reverse pair
depth is not the same as read depth, which is typically much less.
Forward/reverse pair depth is important in that it gives a measure of
the confidence of the assembly at a base.  If the forward/reverse pair
depth is close to zero, as it is in Contig1 position about 9300, there
is a likelihood that the assembly program has made an incorrect join.  When
the forward/reverse pair depth is zero, the green line turns red, as
it does on the right end of Contig3.
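
The definition of a consistent pair and of pair depth can be sketched
as below.  This is an illustration under stated assumptions, not
Consed's implementation: the Pair class and function names are
invented, positions are 1-based within a single contig, "too far" is
tested only on the long side, and every base from the start of the
left read to the end of the right read is counted as lying between
the two reads.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    left_start: int          # leftmost base of the left read
    right_end: int           # rightmost base of the right read
    left_points_right: bool  # left read is oriented toward the right read
    right_points_left: bool  # right read is oriented toward the left read


def is_consistent(pair, mean_insert, sd_insert):
    """Reads face each other and are less than 3 standard deviations
    farther apart than the mean insert size of the library."""
    facing = pair.left_points_right and pair.right_points_left
    span = pair.right_end - pair.left_start + 1
    return facing and (span - mean_insert) < 3 * sd_insert


def pair_depth(pairs, contig_length, mean_insert, sd_insert):
    """For each base, count the consistent pairs that span it."""
    depth = [0] * contig_length
    for pair in pairs:
        if is_consistent(pair, mean_insert, sd_insert):
            for pos in range(pair.left_start, pair.right_end + 1):
                depth[pos - 1] += 1
    return depth


pairs = [
    Pair(1, 5, True, True),    # consistent
    Pair(3, 8, True, True),    # consistent
    Pair(1, 10, True, True),   # too far apart for mean 5, sd 1
    Pair(2, 6, False, True),   # reads do not face each other
]
print(pair_depth(pairs, 10, mean_insert=5, sd_insert=1))
# [1, 1, 2, 2, 2, 1, 1, 1, 0, 0]
```

Positions where the list drops to zero correspond to the places where
the bright green graph turns red.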

12.67)  INCONSISTENT FORWARD/REVERSE PAIRS

The red lines connect the right end of Contig3 with the middle of
Contig1.  These are filtered inconsistent forward/reverse pairs--they
are "inconsistent" because they are not consistent (see above) and
they are "filtered" in that they have another inconsistent read
close by (at both ends) that is inconsistent for the same reason.  

This is a good example of a misassembly.  There are many many reads at
the right end of Contig3 that are paired with reads in the middle of
Contig1.  Notice that the forward/reverse pair depth of Contig1 is
close to zero around base 9300.  (You can use the "Zoom In" button to
see this in more detail, but when you are done experimenting with the
Zoom buttons and the scroll bar, click on "Zoom Orig" for the rest of
this exercise.)  This is where the assembly program made a bad join.
If you tear the contig apart there, complement the left part of
Contig1, and then join it to the right end of Contig3, the
forward/reverse pairs will change from inconsistent to consistent.
You will learn later how to do that.

12.68)  Point to one of the red lines.  You will notice that it turns
yellow.  The text near the bottom of the Assembly View Window tells
you a little more about what you have "highlighted" (turned yellow).
If you want more information, click with the left mouse button.  A
window "Clicked Forward/Reverse Pairs" will appear giving information
about each highlighted read.  Try this.  In the "Clicked
Forward/Reverse Pairs" Window double click on one of the reads.  The
Aligned Reads Window should appear with the cursor on that read.  This
shows how to go from the Assembly View Window to the Aligned Reads
Window.

12.69)  You can also go from the Aligned Reads Window to the Assembly
View Window.  First you must make sure the Assembly View Window is
already open (or else open it by clicking on Assembly View in the
Consed Main Window).  In the Aligned Reads Window, point to a read
name, hold down the right mouse button, and release on "Find Read in
Assembly View" (one of the last items in the menu that appears when you
push down with the right mouse button).  If the read is from a
subclone that has a forward/reverse pair in the assembly, then the
same "Clicked Forward/Reverse Pairs" Window will appear.  It will
contain not only the read that you pointed to, but all of the other
reads from the same subclone as the one you pointed to.  In the
Assembly View Window, all of these reads will blink yellow.  You can
use this procedure to go within the Aligned Reads Window from forward
read to reverse read or vice versa.  (This may take some patience to
find one with this dataset because there are many reads that are not
paired.)

12.70)  Notice the aqua lines that connect the right end of Contig2 to
the left end of Contig3.  These are consistent gap-spanning
forward/reverse pairs.  These are the reads that tell you (and Consed
and many other programs) that the right end of Contig2 is connected to
the left end of Contig3.  As above, point to one to highlight it and
click on it to see more information.

12.71)  You can see much more information by clicking on the "What to
Show" button, and then when the menu pops up, click on the "Fwd/Rev
Pairs" menu item.  Up will pop the "Which Fwd/Rev Pairs to Show in
Assembly View" Window.  Click on "All" next to "Show Inconsistent
Forward/Reverse Pairs".  Then click "Apply" at the bottom of this
window.  In this particular example, you just see a few more stray red
lines.  In a real example, you would probably see so many red lines
that it would be a mess.  In most cases those inconsistent pairs are
due to artifacts such as chimerism and not to any misassembly.  Thus I
suggest that you generally leave "Show Inconsistent Forward/Reverse
Pairs" set to "Filtered".

(I suggest not changing the "# of pairs to confirm" or the "expected #
of confirmed inconsistent pairs".  We have used statistics so that the
predicted mean # of clusters of inconsistent fwd-rev pairs is 1.0.)

12.72)  Still in the "Which Fwd/Rev Pairs to Show in Assembly View"
Window, click on "Show each consistent fwd/rev pair within contigs"
(so the button looks as though it is pushed in) and click "Apply".
This will show a blue square for each consistent forward/reverse pair
within a contig.  The horizontal position of the square is the center
of the subclone (midway between the forward and reverse read) and the
vertical position of the square indicates the size of the subclone
(higher means a larger subclone).  If you really want to see the
position of the forward and reverse reads, you can do that too: Click
on "Show legs on squares for consistent fwd/rev pairs" ("Show each
consistent fwd/rev pair within contigs" must be still on) and click
"Apply".  What a mess!  I believe most of this information is much
more easily understood by just showing the "consistent fwd/rev pair
depth" (the bright green graph described above).  But it is your
choice.  When you want to highlight a consistent fwd/rev pair, you
must point to the square--not the legs.  Try it so you understand.

12.73)  Suppose you have an assembly and there are some forward/reverse
pairs that you specifically do not want to see in the Assembly View
Window.  For example, perhaps they are from a plate that was misnamed
or from a library that is somehow less reliable.  By hiding these
forward/reverse pairs, the more reliable/important ones can more
easily be seen.  This is how you can do that:

In the "Which Fwd/Rev Pairs to Show in Assembly View" Window, notice
the line that says: Do not show templates in file
doNotShowInAssemblyView.fof

Underneath this are 3 buttons and probably the one that is selected is
"show all templates".  Try clicking "do not show specified templates"
and click 'Apply'.  See if you notice that anything changed in which
forward/reverse pairs are displayed.  If not, switch back and forth
between "show all templates" and "do not show specified templates",
each time clicking 'Apply'.  When you see a line that appears and
disappears, click on it to find what template it is.  For example,
djs736a2_fp04q146 is one such template.  Then from an xterm in the
assembly_view/edit_dir directory, type:

more doNotShowInAssemblyView.fof

You will see the names of the templates that are displayed/hidden.

In order to hide particular forward/reverse pairs, put them into
this file.  This file can also contain the character '*' which means
"match any characters".  For example, djs736a1_fp* would match the template

djs736a1_fp04q206

but not 

djs736a2_fp01q127
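
The '*' matching described above can be mimicked with a short sketch.
The function names are hypothetical, and the sketch takes the patterns
as a list rather than reading doNotShowInAssemblyView.fof, since the
exact layout of that file is not described here.  Only '*' is treated
as special.

```python
import re


def template_matches(pattern, template):
    """True if template matches pattern, where '*' in the pattern
    stands for any run of characters (including none)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, template) is not None


def hidden_templates(patterns, templates):
    """Return the templates that match at least one pattern."""
    return [t for t in templates
            if any(template_matches(p, t) for p in patterns)]


print(template_matches("djs736a1_fp*", "djs736a1_fp04q206"))   # True
print(template_matches("djs736a1_fp*", "djs736a2_fp01q127"))   # False
print(hidden_templates(["djs736a1_fp*"],
                       ["djs736a1_fp04q206", "djs736a2_fp01q127"]))
# ['djs736a1_fp04q206']
```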


12.74)  Try turning on/off each of the Fwd/Rev Pair options so you
understand them.  (In this example, there are no "consistent fwd/rev
pairs between different scaffolds.")

12.75)  SEQUENCE MATCHES

Notice the curvy orange lines connecting Contig1 with Contig2 and
Contig3.  These show sequence matches.  Point at the one connecting
Contig1 and Contig2 and click on it.  A "Sequence Matches" box will
pop up saying that this match has 120 bases and has a similarity of
90.8%.  Click on that line so its background turns black.  Then click
on the button "Show Alignment".  Up will pop the Compare Contigs
Window with the alignment shown in the lower half of this box.  You
will learn more about this later (see "JOIN CONTIGS").  For now,
dismiss this window.

12.76)  In the Assembly View Window, click on "What to Show" and then when
the menu pops up, click on "Sequence Matches".  In the "Which Sequence
Matches to Show in Assembly View" Window, try clicking off "ok to show
sequence matches between contigs".  Then click the "Apply" button.
You should see the orange lines disappear.  (Any highlighted lines
will not disappear.)  Click "ok to show sequence matches between
contigs" back on, and click "Apply" and the lines should be back.

12.77)  Also in the "Which Sequence Matches to Show in Assembly View"
Window, change the minimum similarity from 90 to 85.  Click "Apply".
You should see a lot more orange curvy lines, and now you should also
see black curvy lines.  If you look carefully, you will see that 2
lines within each pair of orange curvy lines do not cross each other
but the 2 lines within each pair of black curvy lines do.  This is
because orange is used to show direct repeats and black is used to
show inverted repeats (relative to the orientation of the contigs in
the Assembly View Window).

12.78)  Also in the "Which Sequence Matches to Show in Assembly View"
Window, click on "filter seq matches by size" and set the min size to
400 and the max size to some huge number (e.g., 1000000), leave
minimum similarity at 85, and click "Apply".  You will see just one
direct repeat (orange curvy lines) of size 746.

12.79)  Try some of the other ways of filtering the sequence matches on
"Which Sequence Matches to Show in Assembly View".


12.80)  You must learn this step if you are going to ever see sequence
matches with your own data, so don't skip this step.  If you have
problems, it is likely that the phrap or consed packages have not
been installed correctly and you will need help from your system
administrator.  Exit Consed and look at the files in
assembly_view/edit_dir.  

Notice there is a file: assembly_view.fasta.screen.ace.1.aview

This is what Consed uses to show sequence matches in the Assembly
View Window.
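
As the file name above shows, the .aview file is named after the ace
file with ".aview" appended.  If you script around Consed, a check
like the following sketch tells you ahead of time whether that file
is present.  The function names are made up for illustration.

```python
from pathlib import Path


def aview_path(ace_file):
    """Return the .aview path that goes with an ace file."""
    return Path(str(ace_file) + ".aview")


def has_sequence_match_file(ace_file):
    """True if the .aview file for this ace file exists."""
    return aview_path(ace_file).is_file()


ace = "assembly_view.fasta.screen.ace.1"
print(aview_path(ace))   # assembly_view.fasta.screen.ace.1.aview
if not has_sequence_match_file(ace):
    print("No .aview file here; cross_match has not been run for", ace)
```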

When you use your own data, you will not have this file so you will
need to learn how to create it.  Hide it from Consed by (in practice
you will never do this step--this is just to simulate the .aview file
not being there):

mv assembly_view.fasta.screen.ace.1.aview assembly_view.fasta.screen.ace.1.aview_hide


Now restart consed and select ace file
assembly_view.fasta.screen.ace.1

If you are asked if you want to apply edits, click the "No" button.

Click on "Assembly View" in the Consed Main Window.

You will get the error message:

"Sequence matches will not be shown in Assembly View because there is
no file
assembly_view.fasta.screen.ace.1.aview
If you want sequence matches to be shown, click on "What to show:
Sequence Matches" and then "run cross_match"

12.81)  RUNNING CROSS_MATCH FOR SEQUENCE MATCHES

Just as the instructions (above) say, click on "What to show" and then 
when the popup menu appears, click on "Sequence Matches" and then when 
the "Which Sequence Matches to Show In Assembly View" Window comes up, 
click on the "Run Cross_Match" button.

Watch the action in the xterm.  There should be several pages worth of
output from cross_match that scrolls by in the xterm.  If you get an
error, it is likely that the phrap or consed packages are not
correctly installed.  You (or your system administrator) should track
down the problems and correct them.

If you are successful, then 3 orange pairs of curvy lines will appear
in the Assembly View Window--the same as you saw in the steps above.

Note to advanced users:  It is also possible to have cross_match run
automatically using the consedrc resource:
consed.assemblyViewAutomaticallyRunCrossmatchIfNecessary: true
See CONSED CUSTOMIZATION below.

12.82)  PULLING OUT READS AND RE-ASSEMBLING THEM (MINIASSEMBLIES)

When the Assembly View Window indicates (using forward-reverse pair
information) that there is a misassembly, Consed provides the tools to
correct that misassembly: you can first pull out the misassembled
reads from their current contigs into individual contigs, with a
single read per contig.  Then you can reassemble those new contigs
that each contain a single read.  Let's do this:


12.83)  In the Assembly View Window move your cursor so that the red
forward/reverse pair lines turn yellow.  You will be unable to get
them all yellow, but get as many as you can.  Then click with the left
mouse button.  A window labelled "Clicked Fwd/Rev Pairs" should appear
with a very long list of reads in it (around 53 reads).

12.84)  In the "Clicked Fwd/Rev Pairs" Window, click on the button labelled
"Pull out reads".  A window labelled "Put Reads into Their Own Contigs"
should appear.

12.85)  In the "Put Reads into Their Own Contigs" Window, select all of
the reads.  You can do that by clicking with the left mouse button on
the first read and then scrolling down to the bottom of the list of
reads, holding down the shift key and clicking with the left mouse
button on the last read.  (When a read is selected, its background
should be black.)  Click on the button "Remove Highlighted Reads".
The Assembly View Window will close and reopen after a few seconds and
will complain about not being able to show sequence matches.  Save the
assembly (see "SAVING THE ASSEMBLY" above) and follow the instructions
in "RUNNING CROSS_MATCH FOR SEQUENCE MATCHES" (above).

The assembly will now probably contain 4 contigs: 2-3-1c in one scaffold
and 4 in the other.  That is because when the misassembled reads were
pulled out of Contig1, it fell into two new contigs: the new contig 1
and contig 4.  All of the reads you pulled out have created Contig5,
Contig6, ... and approximately Contig58, each of which contains only a
single read.

12.86)  MINIASSEMBLIES

On the Consed Main Window, click the button "Miniassembly".  A box
will pop up labelled "Reassemble Some Contigs".  On the left part of
the box will be all contigs, from Contig1 to about Contig58.  Notice
that starting with Contig5 will be contigs that contain only a single
read.  On the right will be Contig5 through approximately Contig58.
You can add to or delete from the list on the right.  For example, to delete
Contig5 from the list on the right, click on it, and then click "Clear
Highlighted".  The right list should now only contain Contig6 through
the last contig.  Add Contig5 back to the right list by clicking on
Contig5 in the left list and then clicking on the button labelled
"Move Highlighted to Right".  Contig5 will now appear at the bottom of
the list on the right.

12.87)  Leave all of these boxes blank: "-minscore", "-minmatch",
"-forcelevel", and "other phrap options:".  Keep "Put into separate
contigs" selected rather than "Discard from assembly".  Click the
"Reassemble" button.  If you haven't saved the assembly, a box will
pop up saying "Error You must first save the assembly before making a
miniassembly".  Follow the instructions you learned above ("SAVING THE
ASSEMBLY") to save the assembly.  Then click the "Reassemble" button
again and watch the action in the xterm.  Lots of output from
determineReadTypes.perl, phrap, and cross_match will scroll by in the
xterm as those programs run.  (If they don't, you haven't correctly
installed all of the phred, phrap, and Consed packages.)

12.88)  When the miniassembly is complete, a box will pop up asking
"Are you finished miniassemblying these contigs?"  Click the "Yes"
button.

12.89)  On the Consed Main Window, click the "Assembly View" button.
Consed will complain about not being able to show Sequence Matches so
save the assembly and follow the instructions in "RUNNING CROSS_MATCH
FOR SEQUENCE MATCHES" (above).  In the Assembly View Window in
addition to Contig1, Contig2, Contig3, and Contig4, you should see a
few more contigs.  These are the result of the miniassembly of all
those individual reads.

Note to advanced users:  you can have consed automatically save the
assembly before miniassembly by using the consedrc resource:
consed.autoSaveBeforeMiniassembly: true
See CONSED CUSTOMIZATION below.

12.90)  HIGHLIGHTING READS TO REMOVE THEM FROM A CONTIG

Instead of using Assembly View to pull out reads, you can do so using
highlighting.  Restart consed as follows:

Exit Consed and then restart Consed.

Double click on "assembly_view.fasta.screen.ace.1"

(If a window pops up saying "There is an edit history file ( a .wrk
file )...", click the "No" button.)

12.91)  Double click on 'Contig2'.  
12.92)  Scroll to position 1670.  
12.93)  Click on the consensus T at this location.
12.94)  Point to the 'Highlight' menu, hold down the left mouse button and
release on 'Highlight reads with string at cursor'.

12.95)  A box will pop up labeled 'Highlight reads with string at cursor'.
There will be an input field labeled 'enter string (*'s will be
ignored)'.  Type a 'GG' into this field and click OK.  It will say
'string gg matched 3 reads at position 1670'.

12.96)  Let's see if there are any mates of these reads and highlight them
also.  Point to the 'Highlight' menu, hold down the left mouse button and
release on 'Highlight mates of highlighted reads'.  This will
highlight 3 more reads.

12.97)  To see that now 6 reads are highlighted, save the highlighted
reads in a file and examine the file.  (See HIGHLIGHTING READ NAMES
above.)

12.98)  Pull these reads out of Contig2 as follows: 
Point to the 'Highlight' menu, hold down the left mouse button and
release on 'Remove highlighted reads'.  Up will pop a box labeled
'Remove Reads'.   You will also notice that there are 6 reads in the
'Reads to be removed' column--the 3 that you highlighted and their
mates.  

There are many options for what to do with the reads and what to do
with the remaining contig.  See below under "REMOVING READS".  For
now, do not change any of the options.

12.99)  Click 'do it' to remove these reads.  

12.100)  On the Consed Main Window, click the 'Miniassemble' button.  You
will notice that the reads you just removed are in the 'Contigs to
Reassemble' column--all ready to miniassemble.

You've now seen two different ways that users select reads for
miniassembly:  the first was in Assembly View by selecting mate pairs
that are inconsistent.  The second was in the Aligned Reads Window by
selecting reads that are discrepant with the other reads (and the
consensus) at a location.


12.101)  CONTIG ARRANGEMENT--REORDER CONTIGS

Contigs are arranged by Consed into "scaffolds" using forward/reverse pair
information.  However, you might have some external information (such
as digest information) that tells you a different arrangement.  You
can use Consed to rearrange the contigs.  This new arrangement will be
preserved even if you reassemble.

12.102)  Exit Consed and then restart Consed.

Double click on "assembly_view.fasta.screen.ace.1"

(If a window pops up saying "There is an edit history file ( a .wrk
file )...", click the "No" button.)

Click on the "Assembly View" button.  You will see two scaffolds: one
on the top row with Contig2 and Contig3, and one on the bottom row
with just Contig1.  Now suppose that you believe that Contig2 and
Contig1 are connected together instead of Contig2 and Contig3.  To
rearrange the contigs accordingly:

12.103)  Within the Assembly View Window, click on the "Contig Arrangement"
 button.  Up will pop a menu.  Click on "Reorder Contigs".  A "Reorder
 Contigs" Window will pop up.  Enter the following information:

Contig: 2 [Right End] connected to Contig: 1 [Left End]

That is, you must enter "2" and "1" in the contig boxes, and you must
click on the first "right end" button.  

Then click on the "Add and Restart Assembly View" button.  A warning
box will pop up telling you that you are crazy, because there are 13
forward/reverse pairs as evidence that the scaffold as displayed in
the Assembly View Window is already correct.  Click on "yes"--that you
are sure.

Well, that isn't quite what you wanted.  Contig2 and Contig3 are
still together.  So connect the other end of Contig1:

Contig: 1 [Right End] connected to Contig: 3 [Left End]

Then click on the "Add and Restart Assembly View" button.  A warning
box will pop up again.  Click on "yes"--that you are sure.

The Assembly View Window will disappear for a second and reappear,
with Contig2 and Contig1 connected together, just as you wanted.

12.104)  CONTIG ORIENTATION

Some users want a scaffold oriented a particular way.  For
example, one user might be working on a particular gene so wants to
always view the top strand of that gene.  Another user might be
finishing a BAC and wants a particular end of the BAC on the left of
the scaffold.  The assembly program, however, may not respect
their wishes and might have contigs complemented from the way the
users want to view them.  Consed provides a way for the user to
indicate his/her desired orientation, and thereafter if phrap
complements a contig from that desired orientation, Consed will
complement the contig back when Consed starts up.

To demonstrate this, exit Consed and then restart Consed.

Double click on "assembly_view.fasta.screen.ace.1"

In the Consed Main Window, double click on Contig1.  You will see read
djs736a2_fp02q494.y1 pointing left.  But let's suppose that you would
rather the Contig be in the other orientation, with read
djs736a2_fp02q494.y1 pointing right.  

In the Consed Main Window, click on Assembly View.  Then click on the
button labelled "contig arrangement".  When a popup menu comes up,
click on "Reorient Contigs".  The "Reorient Contigs Window" should
come up.  Highlight the scaffold labelled "1" under "Select a
scaffold".  Click on "flip scaffold".  Then push the button labelled
"Apply and Restart Assembly View".  There will be an error box
complaining about not being able to show sequence matches.  To fix
that, save the assembly and follow the instructions in "RUNNING
CROSS_MATCH FOR SEQUENCE MATCHES" (above).  In the Consed Main Window,
double click on Contig1 so the Aligned Reads Window comes up.  Scroll
to the right end.  You will notice that djs736a2_fp02q494.y1 is now on 
the right end pointing right.  

What is the difference between doing this and just complementing the
contig, which just requires the click of a button?  There isn't any,
unless you are going to reassemble the project with phrap.  In that
case the difference is that complementing the contig will be undone
the next time phrap runs (you reassemble), but using this procedure
will be permanent, even if phrap complements the contig.


12.105)  NAVIGATING


Earlier you learned "Search for String", "Highly Discrepant
Locations", and "High (or Low) Depth of Coverage Regions".  Consed
uses this same navigation system to go to many other types of locations.

If consed is running, exit it.

In this case we need a private copy of the dataset called "standard"
(see GETTING YOUR OWN COPY OF A SAMPLE DATASET above).

Type:

cd standard/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

Restart Consed.

Double click on standard.fasta.screen.ace.1

Double click on Contig1

In the Aligned Reads window, pull down the Navigate menu and
release on 'Low consensus quality'.  You will see a list of locations.
Move the 'Low consensus quality' window down so you can see the
Aligned Reads window.  

Repeatedly click on 'Next' until you reach the end of the list.  (Low
consensus quality means an area in which each consensus base has too
high a probability of being wrong.)  This list saves you from having
to look through large amounts of high quality data trying to find
problem areas.

There are 2 'Next' buttons--one on the Aligned Reads Window and one on
the Low Consensus Quality Window.  You can click on either, but it is
probably more convenient to use the 'Next' button on the Aligned Reads
Window.  Thus you can keep the Aligned Reads Window in
front with input focus and keep the Low consensus quality window
pushed out of the way.

You may want to click on the 'Save' button in the Low consensus
quality Window to save to a file a copy of this list of problem areas
as you work through them.

In our experience, this will be the most important navigate list you
will use if you are finishing to high accuracy.  In fact, finishing
consists mainly of adding reads and rephrapping until this list
is reduced to nothing.

12.106)  Dismiss the Low consensus quality window.  Pull down the
'Navigate' menu again and release on 'High quality discrepancies as
above, but omitting tagged compressions and G_dropouts'.  You will
probably notice there are no entries (unless you created some yourself
by editing).  That is because there are no high quality discrepancies
with this dataset.  So let's force there to be some by lowering the
quality threshold.  First, dismiss the High quality discrepancies
window.

Click on 'Find Main Win'.  In the Consed Main Window, pull down the
'Options' menu and release on 'General Preferences'.  Notice that the
default for 'Threshold for High Quality Discrepancy' is 40.  Change it
to 15 and click 'Apply & Dismiss'.
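To get a feel for what such a threshold means, here is a minimal
sketch, assuming the threshold is an ordinary phred-style quality
value, for which the estimated probability that a base call is wrong
is 10^(-Q/10).  The function name is my own and is not part of Consed.

```python
def error_probability(quality):
    """Estimated probability that a base call with this phred quality is wrong."""
    return 10 ** (-quality / 10)


# Compare the default threshold (40) with the lowered one (15).
for quality in (40, 15):
    probability = error_probability(quality)
    print(f"quality {quality}: error probability about {probability:.4g}")
```

So lowering the threshold from 40 to 15 means that much less reliable
bases are allowed to count as "high quality", which is why entries
appear in the list.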

Then follow the steps above to bring up the High quality discrepancies
menu.  Now you will see several entries.  Click 'next' repeatedly to
go successively to the next high quality discrepancy in the Aligned
Reads Window.

You can also double click on a particular line in the High quality
discrepancies window to go to that location.  Alternatively, you can
single click on a line and then click the 'Go' button.

Dismiss the High quality discrepancies window.

12.107)  There is also a way of getting such a list in *ALL* contigs: Click
the "Find Main Win" button just above the black area containing the
reads.  On the "Consed Main Window", point to the "Navigate" menu (at
the top) and release on "High Quality Discrepancies in All Reads."
This will give the same list as before.

In some assemblies there are hundreds, sometimes thousands, of
contigs.  It is much more convenient to search through all contigs at
once than to search them one at a time.

12.108)  Try navigating by tags by selecting 'tags' under Navigate: when
the Select Tag Type Window appears, double click on 'compression'.
(Note that you can't do anything else until you deal with this
window.)  This gives a list of a particular tag type in all contigs.

You can also search for many tag types at once.  Do this by following
the instructions above but this time click on several tag types and
then click 'ok'.  (You will only find compression tags because there
are only compression tags in this assembly.  When you learn how to
create tags below, you will be able to find multiple tag types.)

To speed up selecting multiple tag types, do this: Point to the
Navigate menu and release on "Tags in All Contigs".  Type 'heter' in
the box and click 'select'.  You will notice that all of the
heterozygote tags are selected.  Then click 'ok' to find them all.
(You will find none because there aren't any in this dataset.)

12.109)  CUSTOM NAVIGATION

In the Consed Main Window, point to the Navigate menu and release on
Custom Navigation.  A box will pop up saying

'Select custom navigation file:'  

There will be a file:

custom_navigation.nav

Double click on it.

You will see the now-familiar custom navigation box.  Click 'Next'
repeatedly until you get to the end of the list.

This list of locations is chosen by some program other than consed.
Many labs write such programs themselves.  This allows a human to
quickly review the sites the program has chosen.  If your lab is
interested in writing such a program, see below under HOW TO WRITE A
CUSTOM NAVIGATION FILE.

12.110)  Other navigation lists.  Feel free to experiment.  Some of
these are only applicable to Sanger sequencing to high accuracy, or
are obsolete, or are only used for consed development.

Some terms: "Unaligned high quality regions" are regions in which the
read is high quality so there is no question of the bases, but the
region differs so much from other reads that the assembly or alignment
program has given up trying to align the region with the consensus.
(This could be due to a chimeric read, or perhaps the read belongs
somewhere else.)  "Unaligned reads" are reads that are totally
unaligned to the consensus and don't belong there at all.  "Edits"
refers to human-made edits.

"Search for Questionable Consensus Bases" is the favorite list of one
finisher for finding misassemblies, but I don't recommend it.

12.111)  TEAR CONTIG

Just so you get the same results as I do, exit Consed and bring it up
again using the original ace file

standard.fasta.screen.ace.1 

If it asks if you want to apply edits, just say 'no'.

      
12.112)  When the assembly program really screws up, you may want to
just tear the contig apart in several places and then join the pieces
back together in a different way.  Let's try it:

Double click on "Contig1" so that the Aligned Reads Window comes up.
Go to location 1500.  Point the mouse at the consensus base at 1500
and push the right mouse button down.  Release the button on 'Tear
Contig at This Consensus Position'.  You will notice that in the
Aligned Reads Window, 4 read names are now colored purple:
djs74-996.s2, djs74-2689.s1, djs74-564.s1, and djs74-2931.s1.  The
purple reads are consed's suggestions of which visible reads will go
into the new left contig.  The reads that are not colored purple will
go into the new right contig.  If you click on a read name in the
Aligned Reads Window, it will switch back and forth between purple and
not purple.  Leave everything as it is and just click 'Do Tear' in the
"Tear Contig" Window.  (If you want to play around with which reads
go into which contig, do that another time.)

Now you should have 2 Aligned Reads Windows on top of each other.  One
should contain 'Contig2' and the other 'Contig3'.  Dismiss the little
window that says 'Tear Complete'.

Don't do the following now--this is just for your information:

You can also tear contigs in batch, possibly tearing at multiple
sites.  This is done by typing:

consed -ace (ace file) -tearContigs (file of locations)

where the "file of locations" has the following format:

(contig name) (unpadded position to tear)
(contig name) (unpadded position to tear)
(contig name) (unpadded position to tear)
(contig name) (unpadded position to tear)
.
.
.
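If the tear sites come from some other program, the file of locations
can be generated rather than typed.  The following is a minimal
sketch, assuming the two fields on each line are separated by a single
space as shown above.  The contig names and positions are made up for
illustration, and the function name is my own.

```python
def format_tear_locations(locations):
    """Return file contents with one '(contig name) (unpadded position)' per line."""
    lines = [f"{contig_name} {unpadded_position}" for contig_name, unpadded_position in locations]
    return "\n".join(lines) + "\n"


# Hypothetical tear sites; substitute contigs and positions from your own assembly.
example_locations = [("Contig1", 1500), ("Contig1", 3200), ("Contig7", 410)]
print(format_tear_locations(example_locations), end="")
```

Save the printed text to a file and give that file name after
-tearContigs.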


12.113)  JOIN CONTIGS

Now let's join these 2 contigs back together:


Click on 'Search for String' and type in the following bases:
agctgccatc

Click 'OK'. 

Search for string should find 2 locations, one in Contig2 and one in
Contig3:

Contig2     (consensus)     1447-1456  (uncomplemented)
Contig3     (consensus)     829-838    (uncomplemented)

Double click on the first one.  The Aligned Reads Window for Contig2
will scroll to location 1447 and the window will raise up.  In that
Aligned Reads Window, click on 'Compare Cont'.

Now double click on the 'Contig3' line in the above Search for String
results.  The Aligned Reads Window for Contig3 will scroll to location
829 and lift up.  In that Aligned Reads Window, click on 'Compare
Cont'.

Now the Compare Contigs Window should be visible.  In the Compare
Contigs Window, try scrolling back and forth.  You can change the
cursors (blinking red), but if you do, please return them to the
locations 1447 and 829 for the next step.  The cursors 'pin' these
bases together when doing an alignment.  (The algorithm is a pinned
and banded Smith-Waterman alignment.)

Click on Align.  Try scrolling the alignment by dragging the thumb in
the lower half of the Compare Contigs Window.  An 'X' means there is a
discrepancy between the 2 contigs.  There is also a 'P' (see if you
can find it!)  The P indicates the bases that you pinned together.

You will also notice that some bases are lighter and some are darker.
This indicates quality just as in the Aligned Reads Window.  You will
notice that wherever there is a discrepancy (an 'X') one of the
bases is low quality.  This is your cue that the discrepancy is just a
base calling error rather than indicating that the two contigs really
are different but similar locations.

Click a few times on "Next Discrepancy."  Then click on "Prev
Discrepancy."  Notice that the red cursors in the Compare Contigs
Window move to the next/previous X (discrepancy).  Then look at what
happens in the 2 Aligned Reads Windows: as you move from X to X in
this manner, the Aligned Reads Windows scroll as well. 

In the Compare Contigs Window click with the left mouse button on
either contig in the bottom alignment.  You will notice that both
contigs will have the red blinking cursor in the same position.  Click
on 'Scroll Both Aligned Reads Windows' and look at the Aligned Reads
Windows to see that they scroll to the corresponding positions.

The number of discrepancies and the discrepancy rate are also
displayed--find them.
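To make the 'X' markers and the discrepancy rate concrete, here is a
small sketch that marks the differing columns of two already-aligned
sequences.  The sequences are made up, and the rate computed here
(discrepancies divided by aligned columns) is my assumption for
illustration; Consed's own calculation may differ in detail.

```python
def mark_discrepancies(top, bottom):
    """Return a marker line with 'X' in each column where the aligned bases differ."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return "".join(" " if a.lower() == b.lower() else "X" for a, b in zip(top, bottom))


# Hypothetical aligned consensus segments; '*' is a pad.
top = "agctgccatcgga*tc"
bottom = "agctgccatcgcaatc"
markers = mark_discrepancies(top, bottom)
count = markers.count("X")

print(top)
print(bottom)
print(markers)
print(f"{count} discrepancies in {len(top)} aligned columns ({count / len(top):.1%})")
```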

Finally click the 'Join Contigs' button.  The 2 previous Aligned Reads
Windows will disappear and there will be a new one which has a new
contig 'Contig4'.  You have made a join!

Scroll left and right.  You will notice that many of the reads are
highlighted.  These are the reads that came from the previous "right"
contig.  To unhighlight all of these reads at once, point to the
"Highlight" menu, hold down the left mouse button and release on
"Unhighlight All Reads in All Contigs".

It is possible to have more than one Compare Contigs Window up at a
time.  This allows you to investigate a repeat that has more than 2
copies.

There are several other ways of making joins that you will learn
later:  one uses the Assembly View window, one uses autoreport to make
a list of potential joins and then allows the user to review each of
them before making them, and one is completely automated with no user
review.


12.114)  COMPARE CONTIGS WINDOW AND INVERTED REPEATS

In the above example, we used the Compare Contigs Window to
examine a sequence match between two different contigs.  It is also
possible to use the Compare Contigs Window to examine a sequence
match between two copies of a repeat within the same contig, either
direct or inverted.  

12.115)  To see this, restart Consed:

consed
Double click on standard.fasta.screen.ace.1

When it says "There is an edit history file (a .wrk file)...Do you
want to apply those edits?", click on "no".

Double click on Contig1 to bring up the Aligned Reads Window.  Go to
position 69 (use the "Pos:" box described above).  Click the "Compare
Cont" button on the Aligned Reads Window.  The Compare Contigs Window
will pop up, but move it aside.  Go to position 2035 in the Aligned
Reads Window.  Click the "Compare Cont" button again on the Aligned
Reads Window.  In the Compare Contigs Window there are two copies of
Contig1--one on top and one on the bottom.  Each has a "complement
just in this window" button.  Click on the bottom one (the one that
has position 2035 blinking red).  After clicking on it, you should
notice that the numbers on the bottom contig are reversed--they
decrease to the right--a copy of Contig1 has been reversed and
complemented.  Now click the "Align" button.  Suddenly, you should see
the alignment appear in the bottom half of the Compare Contigs Window.
You should see bases between 69-78 aligned against the reversed
complement of bases from 2026-2035.
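If "reversed complement" is unfamiliar, the following sketch shows the
idea on a made-up sequence (not the real Contig1 consensus): one copy
of an inverted repeat reads the same as the reverse complement of the
other copy.  The function name is my own.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(bases):
    """Complement each base and reverse the order; pads ('*') are left as they are."""
    return bases.translate(COMPLEMENT)[::-1]


# Build a hypothetical contig containing an inverted repeat.
copy = "gattacagtc"
contig = "tt" + copy + "ccccc" + reverse_complement(copy) + "aa"

left = contig[2:12]    # first copy of the repeat
right = contig[17:27]  # second copy, which lies on the other strand

print(left)
print(reverse_complement(right))
```

The two printed lines are identical, which is what the alignment in
the Compare Contigs Window shows after you click "complement just in
this window".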

This has shown how you explore an inverted repeat.  If you wanted to
examine a direct repeat, you would use the same method except you
wouldn't click on the "complement just in this window" button.

Compare Contigs is one method of exploring joins of contigs that were
not made by your assembly program.  Another method is to use the
Assembly View Window (above).  They are designed to work together: the
Assembly View Window gives a high level view of all sequence matches
and takes you to the Compare Contigs Window which shows the alignment
of a single sequence match and, if the user so desires, makes a join.

Dismiss the Compare Contigs Window.

12.116)  REMOVING READS

Above you saw how reads can be removed by highlighting them
("HIGHLIGHTING READS TO REMOVE THEM FROM A CONTIG") or by using Assembly
View ("PULLING OUT READS AND RE-ASSEMBLYING THEM (MINIASSEMBLIES)").
Here you will learn other ways to remove reads:

You can remove individual reads and put them into their own
contigs.  For example, in the Aligned Reads Window, go to location
2000.  Point to the read name of read djs74-2664.s1 and hold down the
right mouse button.  Release on 'Remove read djs74-2664.s1 from this
contig.'  A window will pop up labeled "Remove Reads" with various
options:

Ignore the top part of the window for now--it is for a different
method of specifying the reads to be deleted.

Pay attention to these options, which specify what you want done with
the reads removed from this contig:

For this exercise, click on "Just Put Each Read Into Its Own Contig",
"If no reads in a contig location, break contig?" (No), and
"Recalculate bases/qualities in old contig where reads were removed?"
(Yes).

Then click "Do It" on the bottom.

Presto-chango!  The read is put into its own contig and the old contig
is redrawn without the read in it.  At this point you should save the
assembly--you should always save the assembly after removing reads.

12.117)  You can also remove many reads at once.

Look at the Consed Main Window.  Click on the "Remove Reads" button near the
top.  Type into the "File of read names:" box "reads_to_remove.fof"
and either push the "Enter" key or click on "Read File".  You should
see a list of 2 reads:

djs74-2231.s1
djs74-3174.s1

You can specify any of the options discussed above.

Delete Reads from Assembly means that the read will no longer appear
in Consed.  When you are using your own data and you really want to
remove reads from the assembly, you must also use the UNIX "rm"
command to remove the corresponding phd files from phd_dir and the
chromatograms from chromat_dir, if applicable.  Otherwise, the next
time you assemble (possibly by running phredPhrap), the reads,
like Phoenix, will rise again to become part of the next assembly.
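For more than a handful of reads, this cleanup can be scripted.  The
following is only a sketch under stated assumptions: that phd files
are named (read name).phd.(version) inside phd_dir, that chromatograms
in chromat_dir are named exactly like the read, and that both
directories sit directly under the project directory.  Check these
against your own project before deleting anything.  The function names
are my own, and by default nothing is removed.

```python
from pathlib import Path


def files_for_reads(project_dir, read_names):
    """Return existing phd and chromatogram files belonging to the given reads."""
    project_dir = Path(project_dir)
    wanted = set(read_names)
    matches = []

    phd_dir = project_dir / "phd_dir"
    if phd_dir.is_dir():
        for path in sorted(phd_dir.iterdir()):
            # Assumed naming: <read name>.phd.<version>
            read_name, separator, _version = path.name.rpartition(".phd.")
            if separator and read_name in wanted:
                matches.append(path)

    chromat_dir = project_dir / "chromat_dir"
    if chromat_dir.is_dir():
        for path in sorted(chromat_dir.iterdir()):
            # Assumed naming: the chromatogram file is named exactly like the read.
            if path.name in wanted:
                matches.append(path)

    return matches


def remove_read_files(project_dir, read_names, dry_run=True):
    """List the files to remove; actually delete them only when dry_run is False."""
    for path in files_for_reads(project_dir, read_names):
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            path.unlink()
```

For example, from edit_dir you might call
remove_read_files("..", ["djs74-2231.s1", "djs74-3174.s1"]) to see
what would be deleted, and pass dry_run=False only after checking the
list.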

Notice that you can also remove all reads in a particular contig.

There is also a method of removing reads from a script in batch
without using Consed's graphical interface.  See "consed -removeReads"
below.


12.118)  TAGS

Restart Consed so the dataset is in its original condition:

consed
Double click on standard.fasta.screen.ace.1

When it says "There is an edit history file (a .wrk file)...Do you
want to apply those edits?", click on "no".

Double click on "Contig1" to bring up the Aligned Reads Window.


12.119)  Middle mouse click on a read base (as you did above).  You
will see a Trace Window pop up.  (These are actual traces of a Sanger
read.)  

12.120)  Point at a base on the "edt" line, hold down the middle mouse button,
move the pointer so several bases turn yellow, and then release the
middle mouse button.  

12.121)  A list of choices will pop up.  Select 'Add Tag'.  Type in a
comment in the box at the bottom, and select 'comment' from the list
of tag types.  You will now see a blue box both in the Aligned Reads
Window and in the Traces Window on that read.

To see the comment, you can just point to it in the Aligned Reads
Window (without any clicking) and you will see the comment in the
lower right hand corner of the Aligned Reads Window.  Alternatively,
you can click on that blue tag in the Aligned Reads Window with the
right mouse button and release on 'Tag: comment Show more info?'.
Alternatively, you can click on the blue tag in the Traces Window with
the right mouse button.

Try creating some other kinds of tags: again swipe some bases in the
Trace Window, but this time select a different tag type.  You will notice that
different tags are in different colors.  You can always use the
methods above to see the kind of tag even if you forget what a
particular color means.

Create a tag and enter 'lazy fox' for the comment.  Then in the Consed
Main Window, push down the left mouse button on 'Navigate' and
release on 'Search for tags/find string in comment'.  In the box,
enter 'fox' and click 'Search'.  The tag should appear in a navigation 
window.  In this manner, you can find (and go to) all tags with 'fox'
in the comment.

You can also define your own tag types.  See below CREATING CUSTOM TAG
TYPES for how to do that.

12.122)  CREATING LONG TAGS

You can create really, really long tags as follows: Just create a
short version of the tag as above for where you want the tag to start.
Then figure out the consensus position of where you want the tag to
end.  In the Aligned Reads Window, click on the short tag with the
right mouse button and release on 'tag: show more info?' (as above).
A Tag Window will appear for that tag.  In the Tag Window, simply
change the End Unpadded Consensus Position to the place you want it to
end.  Then click 'OK'.  You will now notice that the tag will be as
long as you wanted.

Users were unsatisfied with this method of making long tags (perhaps
because it isn't intuitive) so I implemented the following method:  

12.123)  In the Aligned Reads Window middle mouse click on
a read base (as you did above).  You 
will see a Trace Window pop up. 

12.124)  Point at a base on the "edt" line, hold down the middle mouse button,
move the pointer so several bases turn yellow, but DO NOT RELEASE THE
MIDDLE MOUSE BUTTON.  Instead move the pointer right until it is
outside the window entirely.  The read will start to scroll.  When it
is at your desired location, release the middle mouse button and
create a tag as before.


12.125)  CONSENSUS TAGS

You can create tags on the consensus in the same way.  In the
Aligned Reads Window, use the middle mouse button to swipe some bases
on the consensus.  Up will pop a list of
tag types.  Click on one of them.  Try it again somewhere else.  Try
it with the tag type being 'comment'.  In this case, you must enter a
comment.  Notice the pretty colors!  If you forget which tag type a particular
color represents, just point at the colored tag with the mouse and the
tag type will be displayed at the bottom of the Aligned Reads Window.

12.126)  Try creating some tags that overlap each other.  You will notice
that the overlapping region will be purple.  If you want to know which
tags overlap, you can use any of the methods already discussed.


12.127)  WHAT THE COLORS MEAN

At this point, you should know what each of the following colors
means (the answer is further below--no peeking!):

Dark grey background of a base vs very light background of a base
Grey base with black background
Red base
Black base
Color area covering lower half of a base
Purple area covering lower half of a base


12.128)  SEARCH FOR READ NAME

Restart Consed using the original ace file

standard.fasta.screen.ace.1 

If it asks if you want to apply edits, just say 'no'.

Instead of clicking on a read or contig name, you can type a read name
(or just part of a read name).  For example, if you want to look at the
location containing read djs74-2689.s1, do the following:

12.129)  Type "2689"  into the "Find reads containing (*'s allowed):" box
and then push the "Enter" key. Consed will immediately bring up the
Aligned Reads Window with the cursor on read djs74-2689.s1.  What if
more than one read matched?  Try it.

12.130)  Type: "26" and then push the "Enter" key.  This matches 3 reads:

djs74-2689.s1
djs74-2679.s1
djs74-2664.s1 

What happened?

Try entering "26*9" and see what happens.  What does the "*" mean?

Try using the box below labeled "Find 1st read starting with:".  

12.131)  Type djs74-2 into this box.

You will notice that as you type each letter, the first item in the
list that matches the letters typed will be highlighted.  Experiment
with deleting a few letters and typing others.  This is a powerful
method of quickly getting to the read name you are interested in.
When you get to the name in the list, you do not have to type the rest
of the name--just type carriage return or else click on 'OK'.


12.132)  ONLINE DOCUMENTATION

On the Aligned Reads Window or on the Consed Main Window, click on
the 'Help' menu and release on 'Show Complete Documentation'.  You will see
this document.  You can search for keywords in it.  It is also on the
web.  Go to http://bozeman.mbt.washington.edu/consed/consed.html, and
find "complete documentation" near the bottom of the page.


12.133)  THE .WRK LOG FILE

Consed keeps a log of all changes you make to an assembly: adding
new reads, putting reads into their own contigs, making joins and
tears, adding and removing tags, and changing bases.  This log is kept
in a file ending with ".wrk".  You can use this file to help you
remember exactly what you did to an assembly.


12.134)  FINDING, DISPLAYING, AND MAKING POTENTIAL JOINS

I don't have a dataset for you that has some joins that need to be
made.  So I'm going to have you make one:

12.135)  cd to the assembly_view/edit_dir directory (we used this for
Assembly View above).

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

consed -ace assembly_view.fasta.screen.ace.1

12.136)  Double click on Contig2.

12.137)  In the Aligned Reads Window go to location 8300.  (If you followed
the instructions on the Quick Tour, you know a fast way to get there
rather than scrolling.)

12.138)  Point the mouse at the consensus base at 8300
and push the right mouse button down.  Release the button on 'Tear
Contig at This Consensus Position'.  You will notice that in the
Aligned Reads Window, some read names are now colored purple.  Leave
everything as it is and just click 'Do Tear'.

12.139)  Save the assembly: pull down the 'File' menu on the Aligned Reads
Window, and release on 'Save assembly'.  Remember what you name the
ace file (for this exercise, I'll call it
assembly_view.fasta.screen.ace.7).

12.140)  Exit consed.

Congratulations:  you now have an ace file with 2 contigs that need to
be joined!  Now I can show you how to make joins either semiautomated
or fully automated.

Put the following into your consedrc file (for information on how
to change the consedrc file, see EDIT PARAMETERS: HOW TO CHANGE
CONSED/AUTOFINISH PARAMETERS elsewhere in this document.):

consed.autoReportPrintPotentialJoins: true


12.141)  Then type the following:

consed -ace assembly_view.fasta.screen.ace.7 -autoreport

(where "consed" is replaced by whatever command brings up consed on
your system).

There will be a flurry of output ending with:

Total # pairs: 10000, size: 0.720 Mbytes; edges12: 0; # score_hists: 0, size: 0.000 Mbytes; # query_domains: 10000, size: 0.880 Mbytes; # query_datas: 10000, size: 0.240 Mbytes
Total # segment blocks: 0, size: 0.000 Mbytes
Total # diffs: 41, in 1 lists, size: 0.000 Mbytes

see assembly_view.101229.104216.out

where the 101229.104216 will be replaced by your current date and
time.  (Note to programmers: this filename will be in the file auto.fof.)

12.142)  Look at this file by typing:

less assembly_view.101229.104216.out 
(where assembly_view.101229.104216.out is replaced by whatever your
.out file is called) and type "G" to look at the bottom of the file.

The end of the file assembly_view.101229.104216.out will look like this:

printPotentialJoins {
ALIGNMENT do 697  97 Contig4 right 7933 8704 Contig5 left 1 775 U
ALIGNMENT matchNotToGap_discrepancy 51  77 Contig3 left 499 670 Contig5 right 3776 3951 U
} printPotentialJoins

Type "q" to get out of "less."

In a larger assembly, there will be many ALIGNMENT lines.
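With many ALIGNMENT lines, a short script can pick out the recommended
joins.  The following is a sketch based only on the two sample lines
above.  The dictionary keys are my own names, the verdict is assumed
to be either "do" or a list of reasons joined by underscores, and the
two numbers after the verdict and the final letter are passed through
unlabeled because their meaning is not described here.

```python
def parse_alignment_line(line):
    """Split one ALIGNMENT line into named pieces (the names are not Consed's)."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "ALIGNMENT":
        raise ValueError(f"unexpected ALIGNMENT line: {line!r}")
    verdict = fields[1]
    return {
        "recommended": verdict == "do",
        "reasons": [] if verdict == "do" else verdict.split("_"),
        "undocumented": (fields[2], fields[3], fields[12]),
        "first": {"contig": fields[4], "end": fields[5],
                  "range": (int(fields[6]), int(fields[7]))},
        "second": {"contig": fields[8], "end": fields[9],
                   "range": (int(fields[10]), int(fields[11]))},
    }


def potential_joins(report_text):
    """Return the parsed ALIGNMENT lines found inside the printPotentialJoins block."""
    joins = []
    inside = False
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped == "printPotentialJoins {":
            inside = True
        elif stripped == "} printPotentialJoins":
            inside = False
        elif inside and stripped.startswith("ALIGNMENT"):
            joins.append(parse_alignment_line(stripped))
    return joins


SAMPLE = """printPotentialJoins {
ALIGNMENT do 697  97 Contig4 right 7933 8704 Contig5 left 1 775 U
ALIGNMENT matchNotToGap_discrepancy 51  77 Contig3 left 499 670 Contig5 right 3776 3951 U
} printPotentialJoins
"""

for join in potential_joins(SAMPLE):
    status = "do" if join["recommended"] else "skip: " + ", ".join(join["reasons"])
    print(join["first"]["contig"], join["second"]["contig"], status)
```

To use it on a real report, read the .out file into a string and pass
it to potential_joins.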

12.143)  Bring up consed again like this:

consed -ace assembly_view.fasta.screen.ace.7 -displayMatches assembly_view.101229.104216.out

(where "consed" is replaced by whatever command brings up consed on
your system and assembly_view.101229.104216.out is replaced by the
file displayed in the previous step).

The Consed Main Window will pop up and then a window titled "Sequence
Matches" will also pop up.  In this case there will be the same two
alignments you just saw in the file.  Only one says "do".  "Do"
means that the join is recommended.  Otherwise the line tells you why
the join isn't recommended.  For the other alignment there are 2
reasons:  "matchNotToGap", meaning the match doesn't extend all the
way to the gap, and "discrepancy", meaning there are high quality
discrepancies between the contigs in the overlapping region.

12.144)  Double click on the "do" alignment line and the Compare Contigs Window
will pop up with the alignment displayed.  You can click "Join
Contigs" to make the join.

12.145)  USING CONSED -MAKEJOINS TO MAKE JOINS IN BATCH

Alternatively, you can make all of the recommended joins in batch:

12.146)  consed -ace assembly_view.fasta.screen.ace.7 -makeJoins
assembly_view.101229.104216.out 

(where assembly_view.fasta.screen.ace.7 is the ace file you created
after the tear above and assembly_view.101229.104216.out is the output
of autoreport above).

This will create a new ace file.  Bring it up in consed and you will
see that it will look just like the 3 contigs in the original ace file
assembly_view.fasta.screen.ace.1


12.147)  PROTEIN TRANSLATION 

If you would like, you can see the amino acid translation of the
consensus in all reading frames.  In the Aligned Reads Window, push
down the left mouse button on the 'Misc' menu and release on 'Show Top
Strand Protein Translation'.  Try again but this time release on 'Show
Bottom Strand Protein Translation'.  Notice that there are 2
characters that are in magenta color.  What are those characters?  Why
are they made in a different color?  To not show the protein
translation, push down the left mouse button on the 'Misc' menu and
release on 'Don't show protein translation'.

12.148)  OPEN READING FRAMES

You can search for open reading frames (a methionine followed by some
amino acids and then a stop codon all within the same reading frame)
within a contig.  In the Aligned Reads Window, push the left mouse
button on 'Navigate' and release on 'Search for Open Reading Frames'.
Notice that the open reading frames are shown for all 6 reading frames
and are sorted by length.
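
(Note to programmers: the definition above can be written down in a
few lines of Python.  This is only a sketch of the idea, not Consed's
own search.  It assumes an unpadded sequence of a, c, g, and t, starts
an open reading frame at the first atg in a frame, and ends it at the
next stop codon in that same frame.  The function names are made up
for this example.)

```python
STOP_CODONS = {"taa", "tag", "tga"}
START_CODON = "atg"   # methionine


def reverse_complement(seq):
    """Return the bottom strand, read 5' to 3'."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp.get(b, "n") for b in reversed(seq.lower()))


def find_orfs(seq):
    """Return (strand, frame, start, end, length) tuples, longest first.

    start and end are 0-based offsets on the strand that was scanned;
    end is exclusive and the stop codon is included in the length.
    """
    orfs = []
    strands = (("top", seq.lower()), ("bottom", reverse_complement(seq)))
    for strand, s in strands:
        for frame in range(3):            # 3 frames per strand, 6 in all
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == START_CODON:
                    orf_start = i         # first methionine opens the frame
                elif orf_start is not None and codon in STOP_CODONS:
                    orfs.append((strand, frame, orf_start, i + 3,
                                 i + 3 - orf_start))
                    orf_start = None      # look for the next one
    orfs.sort(key=lambda orf: orf[4], reverse=True)
    return orfs
```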


12.149)  DISPLAYING TRACKS (WIG and BED files)

This assumes you are still displaying standard.fasta.screen.ace.1

12.150)  In the Aligned Reads Window, point to the "Track" menu, hold down the
left mouse button and release on "Add Track".  A box will popup asking
if consensus position 1 corresponds to position 1 of the track file.
Click the "Yes" button, and the "Add Track Window" will appear.

12.151)  Click on the "Browse" button (upper left-hand corner of the Add
Track Window).  Another window will pop up.  Scroll the list of files
down to "wig_fixed.txt".  Double click on it.  This window will
disappear.

12.152)  In the "Add Track" window, click "OK".  Now in the Aligned
Reads Window, you will see a yellow graph labeled "Coverage".  Try
scrolling left and right to see it.


12.153)  DISPLAYING GENE TRACKS (BED files)

You can also use Consed to display BED files for purposes such as
displaying genes.  For this exercise, you will need a different sample
dataset than the "standard" one we have been using so quit out of
consed.

Follow the instructions (above) GETTING YOUR OWN COPY OF A SAMPLE
DATASET in order to make a private copy of the dataset
gene_track_answer.

Type:

cd gene_track_answer/chrF_500/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

Restart Consed.

Double click on chrF_500.ace.1

Double click on chrF_500_45000

Point to the "Tracks" menu, hold down the left mouse button, and
release on "Add Track".  In the box that says "File containing track
information:" , type:

../../hgTablesFake.txt

and push the "enter" key.

In the box that says "Examples of sequences in this file:", you should
see "chrF" appear.

Do not modify any of the other fields.  

Click "OK" at the bottom of this window.

You should now see a large black area above a yellow horizontal line
above the consensus numbers.   Point to that area, hold down the right
mouse button, and you will see a menu.  The 2nd item from the bottom
should say 'Go to next feature of track "tb_knownGene"'.  While still
holding down the right mouse button, move the pointer to point to this
item, and release the mouse button.

Now you will see a gene labeled "uc010ufi.2", a right arrow
(indicating this is a top strand gene), and a green horizontal line
(which is the 5' untranslated region).  There also is a message

more bed lines...to show, right mouse click and "change size of track"

Follow those instructions:  point to this track area, hold down the
right mouse button, wait for the menu to popup, and release on the
menu item "change size of track".

A window will popup labeled "Change Track Height".  Right below the
instruction "Move vertical slider to change track height:" is a
vertical slider.  Grab the slider and move it up and down and notice
how the #s in the box below change.  Move it so the number in the box
is around "120".  Then click "Apply" and then click "Dismiss".

In the Aligned Reads Window, you should now see 5 genes: uc010ufi.2,
uc010ufj.2, uc010ufk.2, ...

Point to this track area, hold down the right mouse button, and
release on 'Go to next feature of track "tb_knownGene"'.  The window
should scroll to position 976 and two of the genes (the top and bottom
ones) should show amino acid translations within a thicker green
line.  This indicates the start of translation.  The thinner green
lines are untranslated regions.  The yellow arrows indicate these are
top strand genes.  

Again point to this track area, hold down the right mouse button, and
release on 'Go to next feature of track "tb_knownGene"'.  The window
should scroll to position 1125.  Here you will see very thin lines to
the right, thinner than the untranslated regions to the left.  These
very thin lines are introns. 


12.154)  DISPLAYING TRACKS WITH SCORES (BED FILES)

(This assumes you have consed still up and displaying the BED file
hgTablesFake.txt from the preceding exercise.)

Point to the "Tracks" menu, hold down the left mouse button and
release on "Add Track".  

In the box that says "File containing track
information:" , type:

../../with_scores.bed

and push the "enter" key.

In the box that says "Examples of sequences in this file:", you should
see "chrF" appear.

Do not modify any of the other fields.  

Click "OK" at the bottom of this window.

Now you should see a second track appear.  Point to that second track,
hold down the right mouse button, and release on "Go to next feature
of track conservation".  You should see a grey bar.  As in the UCSC
Genome Browser, the grey scale corresponds to the score, with darker
meaning a higher score.
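
(Note to programmers: the idea of mapping a score to a shade of grey
can be sketched as follows.  This is an illustration only.  It assumes
the usual BED score range of 0 to 1000 and a simple linear scale; the
exact shades that Consed draws are not documented here, and the
function name is made up.)

```python
def score_to_grey(score, max_score=1000):
    """Map a BED score to a grey level: 255 is white, 0 is black.

    Assumption: a linear scale, so a higher score gives a darker grey.
    Scores outside 0..max_score are clamped.
    """
    clamped = max(0, min(score, max_score))
    return round(255 * (1 - clamped / max_score))
```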


12.155)  FIXING CONTIG-ENDS

When you've added reads, consed does not automatically extend the
consensus using the new data so you end up with good quality reads
sticking out of the contigs.  In addition, the existing consensus
might be wrong and other reads near the ends of contigs may be
misaligned.  In the past, users have fixed this by pulling out reads
and rejoining them, and/or bringing up traces and using "change
consensus".  This is a tedious way to fix hundreds of contig ends.

We now have a feature that fixes contig ends in batch.  

For this exercise, use the dataset called "illumina_paired_answer".
Make a copy (so you can modify it all you want) following the
instructions GETTING YOUR OWN COPY OF A SAMPLE DATASET (above).  

cd illumina_paired_answer/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

12.156)  First examine the dataset:

consed -ace ref.ace.1

12.157)  Scroll between positions -70 and 0 and notice that the left end of
read "c_elegans_piece" is the left end of the consensus and there are
some reads sticking out the left end of the consensus.

12.158)  Scroll to position 1110 and notice that the right end of read 
"c_elegans_piece" is the right end of the consensus and there are some
reads sticking out the right end of the consensus.

12.159)  Scroll to position 501 and notice that all of the reads disagree
with the reference sequence.  This error in the consensus will be
fixed in the next exercise FIXING THE CONSENSUS IN BATCH.

12.160)  Exit consed and then type:

consed -ace ref.ace.1 -fixContigEnds

There will be lots of output ending with:

moving consensus tags...
deleting old contigs...
writing ref.ace.2
Wrote new ace file: ref.ace.2
See output in ref.ace.1.140203.160835.out  (you will have different
numbers here)

12.161)  Examine this new ace file:

consed -ace ref.ace.2

12.162)  Scroll to about position 71 and you will see that the consensus
extends far to the left of the left end of the read "c_elegans_piece".

12.163)  Click on a base of the read "c_elegans_piece", hold down the
control key, and type "e".  You should move to the right end of the
read c_elegans_piece, position 1170.  (If you don't understand these
instructions, just scroll to position 1170.)

You will see that the consensus extends considerably to the right of
the end of the read c_elegans_piece.

This is what has happened: phrap has reassembled the reads at each end
of each contig, extending the consensus based on the consensus of
each little assembly.

When you are using your own data, if you don't want all ends of all
contigs reassembled, you can restrict it in 2 ways:

consed -ace (ace file) -fixContigEnds -contigEndsFOF desired_contig_ends.fof

where desired_contig_ends.fof is a file that looks like this:

Contig466 left
Contig466 right

You can also restrict fixing to contigs that have at least a minimum
number of reads by putting into your consedrc file the following (for
information on how to change the consedrc file, see EDIT PARAMETERS:
HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS elsewhere in this
document):

consed.fixContigEndsMinNumberOfReadsInContig: 5

If you have a -contigEndsFOF, a contig end will only be done if it
also meets the minimum number of reads filter (above).
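
(Note to programmers: if you have a long list of contig ends, you may
prefer to generate the -contigEndsFOF file rather than type it.  The
Python below is a sketch that assumes the file is exactly as shown
above: one contig name and the word "left" or "right" per line.  The
function name is made up for this example.)

```python
def contig_ends_fof(ends):
    """Build the text of a contig-ends file.

    ends is a list of (contig name, side) pairs, where side must be
    "left" or "right".
    """
    lines = []
    for contig, side in ends:
        if side not in ("left", "right"):
            raise ValueError("side must be 'left' or 'right': %r" % (side,))
        lines.append("%s %s" % (contig, side))
    return "\n".join(lines) + "\n"


def write_contig_ends_fof(path, ends):
    """Write the file that is passed to consed with -contigEndsFOF."""
    with open(path, "w") as handle:
        handle.write(contig_ends_fof(ends))
```

For example, write_contig_ends_fof("desired_contig_ends.fof",
[("Contig466", "left"), ("Contig466", "right")]) writes the two-line
file shown above.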

Note to old-timers:  do not use the following any longer:

consed.addNewReadsExtendConsensusUsingProtrudingNewReads: true

"consed -fixContigEnds" supersedes the above parameter.

12.164)  FIXING THE CONSENSUS IN BATCH

This is useful if either your assembler makes lots of errors in the
consensus or if the consensus is really a reference sequence from a
different genome and you want to use the reads to modify the
reference.

This exercise assumes that you have completed the exercise above
"FIXING CONTIG-ENDS".  Continue where you left off.

12.165)  Continue examining this ace file:

consed -ace ref.ace.2

12.166)  Scroll to position 571 in contig "c_elegans_piece" and you will see
the same location you looked at before in which the consensus has a c
(incorrect) and all of the reads have a G.

12.167)  Exit consed and, on the command line, type:

consed -ace ref.ace.2 -fixConsensus 

and a new ace file will be created. 

12.168)  Examine the new ace file:

consed -ace ref.ace.3

12.169)  Scroll to position 571 in contig "c_elegans_piece" and you will
see that the consensus is now a G with a blue tag on it.  Point at the
blue tag and it will say "automatedEdit" on the bottom line of the
window.

AutomatedEdit tags allow the user to bring up consed after running
consed -fixConsensus and rapidly review each changed location by
navigating to each automatedEdit tag.  (See NAVIGATING (above).)

Warning:  if you run both -fixContigEnds and -fixConsensus, you must
run -fixConsensus *after* running -fixContigEnds.
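
(Note to programmers: if you script these two steps, each step should
be given the newest ace file.  The Python below is a sketch that
assumes, as in the exercises above, that each run writes a new file
whose name ends in the next number: ref.ace.1, ref.ace.2, ref.ace.3.
The helper names and the STEPS_IN_ORDER list are made up for this
example; only the consed options themselves come from this document.)

```python
import re

# -fixConsensus must be run after -fixContigEnds (see the warning above).
STEPS_IN_ORDER = ["-fixContigEnds", "-fixConsensus"]


def latest_ace(file_names, prefix):
    """Return the name of the form prefix.ace.N with the largest N, or None."""
    pattern = re.compile(re.escape(prefix) + r"\.ace\.(\d+)$")
    best = None
    for name in file_names:
        match = pattern.match(name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), name)
    return best[1] if best else None


def command_for(step, ace_file, consed="consed"):
    """Build the argument list for one batch step (nothing is run here)."""
    if step not in STEPS_IN_ORDER:
        raise ValueError("unknown step: %r" % (step,))
    return [consed, "-ace", ace_file, step]
```

A script would go through STEPS_IN_ORDER, calling latest_ace() on a
fresh directory listing before each step, and pass the result of
command_for() to subprocess.run().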


12.170)  HANDLING DUPLICATE READ NAMES


If you have an assembly that gives the following error message:

there is at least 1 (maybe lots) of reads with the same name.  For example, m130722_222134_00116_c100533042550000001823085711101382_s1_p0/103207

then run:

consed -ace (ace file) -renameDuplicates

This will cause reads to be suffixed with _d1, _d2, etc. so the names
are unique.  

Probably more than you want to know:  Base segments are eliminated.
RT tags are only retained if the read can be determined unambiguously.
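
(Note to programmers: to find out ahead of time which read names are
duplicated, you can scan the ace file yourself.  The Python below is a
sketch that assumes each read is introduced by a line starting with
"RD" followed by the read name, as described in ACE FILE FORMAT.  It
only reports duplicates; it does not rename anything.  The function
name is made up for this example.)

```python
from collections import Counter


def duplicate_read_names(ace_lines):
    """Return the sorted read names that appear on more than one RD line."""
    counts = Counter()
    for line in ace_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "RD":
            counts[fields[1]] += 1
    return sorted(name for name, n in counts.items() if n > 1)
```

For example, with open("my.ace") as handle:
print(duplicate_read_names(handle)) lists the names that
-renameDuplicates would have to make unique ("my.ace" stands for your
own ace file).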


12.171)  Answer to What the Colors Mean (above)

Greyscale of background indicates quality
Grey base with black background--clipped off part of read (either due
    to low quality or due to alignment)
Red base--discrepant with consensus
Black base--agrees with consensus
Colored area covering half of a base--tag (see Quick Tour) 
Purple tag--more than 1 tag covering a base


----------------------------------------------------------------------------

13.  BAM2ACE: MAKING A CONSED-READY DATASET OUT OF A BAM FILE


Note: Currently this feature is available for linux (32 and 64 bit)
and macosx-intel (but not ppc).  It is not available for solaris.

You've already seen how bamscape can view a bam file and then start
consed on a particular region.

If you know exactly the region of the bam file that you would like to
view with consed, you can also convert a BAM file (or part of a BAM
file) into an ace file that can be edited with consed.  To do this,
use:

for multiple bam files:
bam2Ace.perl -bamFiles (bam file fof) -regionsFile (regions file)

for a single bam file:
bam2Ace.perl -bamFile (bam file) -regionsFile (regions file)

where:

(bam file fof) looks like this:

1869.merged.sorted.nodups.realigned.bam
1871.merged.sorted.nodups.realigned.bam

and (regions file) looks like this:

BEGIN_SEQ_FASTA
chr1 /net/gs/vol2/shared/greenlab/genomes/hg19/chr1.fa
END_SEQ_FASTA
chr1 1653034 1653150
chr1 1654146 1654257
chr1 1634345 1634438
chr1 1634518 1634708


where the lines between BEGIN_SEQ_FASTA and END_SEQ_FASTA list the
sequences and which files they can be found in.  (E.g., chr1 is found
in /net/gs/vol2/shared/greenlab/genomes/hg19/chr1.fa)  The lines after
END_SEQ_FASTA give the sequence name, start position, and end position
of the regions to put into consed.  The start and end positions are
1-based with respect to the sequences and are typically chromosome
positions.  If the sequence names have spaces in them, such as:

>gi|57116681|ref|NC_000962.2| Mycobacterium tuberculosis 

just use the first word (gi|57116681|ref|NC_000962.2| in this case)
since spaces are not allowed.
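
(Note to programmers: a regions file in the layout above can be
generated with a few lines of Python.  This is a sketch, not part of
bam2Ace.  It assumes that the start must not be greater than the end
and that every region must name a sequence listed between
BEGIN_SEQ_FASTA and END_SEQ_FASTA.  The function names are made up for
this example.)

```python
def first_word(name):
    """Drop a leading '>' and everything after the first space."""
    return name.lstrip(">").split()[0]


def regions_file_text(fasta_paths, regions):
    """Build the text of a regions file.

    fasta_paths: dict mapping sequence name -> fasta file path
    regions:     list of (sequence name, start, end), 1-based positions
    """
    lines = ["BEGIN_SEQ_FASTA"]
    for name, path in fasta_paths.items():
        lines.append("%s %s" % (first_word(name), path))
    lines.append("END_SEQ_FASTA")

    known = {first_word(name) for name in fasta_paths}
    for name, start, end in regions:
        name = first_word(name)
        if name not in known:
            raise ValueError("no fasta file listed for %s" % name)
        if not 1 <= start <= end:   # assumption: 1-based and start <= end
            raise ValueError("bad region %s %d %d" % (name, start, end))
        lines.append("%s %d %d" % (name, start, end))
    return "\n".join(lines) + "\n"
```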

bam2Ace will not (by default) take all of your reads.  Typically the
depth of Illumina reads is in the thousands, which is unwieldy for
examination and finishing.  So bam2Ace runs shallowerDepth, which takes
just the highest quality reads of each allele.  If you want all reads,
use this consedrc resource:

consed.bam2AceShallowerDepth: false

(See CONSED CUSTOMIZATION for more info about consedrc files.)

It is also possible to add consensus tags at defined positions in the
reference sequence.  You can do that like this:

23 10,000 20,000 tag: polymorphism 15,000 15,000 

where you have added a polymorphism consensus tag at position 15,000
of reference sequence 23.


For this exercise, use the dataset called "bamScape".

13.1)  Make a copy (so you can modify it all you want) following the
instructions GETTING YOUR OWN COPY OF A SAMPLE DATASET (above).

13.2)  Type:
cd bamScape

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

13.3)  Type:
bam2Ace.perl -bamFile reads.sorted.bam -regionsFile bam2AceRegions.txt

There will be a lot of xterm output ending with this:

read phd files in ../phdball_dir/phd.ball.1  found: 4 totals: used: 4 need: 759
read phd files in ../phdball_dir/phd.ball.1  found: 5 totals: used: 5 need: 759
read phd files in ../phdball_dir/phd.ball.1  found: 6 totals: used: 6 need: 759
read phd files in ../phdball_dir/phd.ball.1  found: 7 totals: used: 7 need: 759
read phd files in ../phdball_dir/phd.ball.1  found: 8 totals: used: 8 need: 759
read phd files in ../phdball_dir/phd.ball.1  found: 9 totals: used: 9 need: 759
Number of phd blocks used from ../phdball_dir/phd.ball.1: 759
Number of individual phd files read: 0
Total reads in assembly: 759
Finished setting quality values in 0 seconds 
writing new ace file bam2Ace.ace.1
writing bam2Ace.ace.1
cd /wd1/gordon/sunny/bamScape/consed1/edit_dir/


(the consed1 above might be consed2 or consed3 ....)


13.4)  In your output, look at the last line--the one starting with "cd "
and ending with "/edit_dir/"

Type that line.  You should now be in an edit_dir subdirectory.

13.5)  In your output, look at the 2nd to last line--the one starting
with "writing".  Then type:

consed -ace (ace file)

where (ace file) is replaced by whatever ace file is on the 2nd to last line
of your output.
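
(Note to programmers: if you run bam2Ace.perl from a script, you can
pull the directory and the ace file name out of its output.  The
Python below is a sketch that assumes the output ends exactly as in
the sample above: a "writing (ace file)" line followed by a "cd
(directory)" line.  The function name is made up for this example.)

```python
def parse_bam2ace_tail(output):
    """Return (edit_dir, ace_file) from the last 2 non-blank output lines."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least 2 lines of output")
    writing_line, cd_line = lines[-2], lines[-1]
    if not writing_line.startswith("writing ") or not cd_line.startswith("cd "):
        raise ValueError("output does not end the way the sample does")
    edit_dir = cd_line[len("cd "):]
    ace_file = writing_line.split()[-1]
    return edit_dir, ace_file
```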

The "Consed Main Window" should popup and the "Contig List" should
have 2 contigs:  

23_10000_20000 (0 reads, 10,001 bps)
23_105000_115000 (759 reads, 10,001 bps)

What?  A contig with 0 reads?

13.6)  Double-click on that contig.

The "Aligned Reads" Window will popup and you will see that the
consensus line consists just of N's.  Scroll from one end to the other
and you will just see N's.  So now you know why no reads are aligned
to this region.  This can happen with real data, too.

13.7)  Dismiss this "Aligned Reads" Window.  Double click on the 2nd
contig (the one with 759 reads).

Another "Aligned Reads" window will pop up, but this time there are
plenty of reads.


13.8)  MAKING AN ACE FILE OUT OF AN ENTIRE BAM FILE

Making an ace file out of the entire reference sequence (rather than
just a targeted region) can be a little dangerous since the ace
file/phd ball might be enormous--too big for consed.  But if you know
this won't happen, there is a script to do it for you:

makeRegionsFile.perl myReference.fa
which takes a file (such as myReference.fa) with one or more reference
sequences in it, and produces a regions file suitable for bam2Ace.perl.
The regions file will make the regions the entire lengths
of the reference sequences.
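
(Note to programmers: the idea of a region that spans an entire
reference sequence can be sketched in Python as follows.  This is not
the code of makeRegionsFile.perl, only an illustration of the idea.
It assumes a plain fasta file and uses the first word of each header
line as the sequence name.  The function names are made up for this
example.)

```python
def fasta_lengths(fasta_text):
    """Return {sequence name: number of bases} for plain fasta text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # first word of the header only
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def whole_sequence_regions(fasta_text):
    """Return one (name, 1, length) region per sequence (1-based)."""
    return [(name, 1, n) for name, n in fasta_lengths(fasta_text).items()]
```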

The sample bam file in the bamScape dataset (above) provides a good
example for practicing doing this.

13.9)  Type this:  

cd ..
pwd

Do this several times until you are in the bamScape directory (the
path ends with "/bamScape") rather than the directory you were in for
running consed.

13.10)  Type:
makeRegionsFile.perl 23.fa

13.11)  Type:
ls -l
and notice there is a file 23Regions.txt

13.12)  Type:

bam2Ace.perl -bamFile reads.sorted.bam -regionsFile 23Regions.txt

There will be a flurry of output as before ending with roughly:

writing new ace file bam2Ace.ace.1
writing bam2Ace.ace.1
cd /wd1/gordon/sunny/bamScape/consed3/edit_dir/

13.13)  cd to the directory indicated on the last line of the output.

13.14)  Bring up consed with the ace file as indicated by the 2nd to last
line of the output:

consed -ace bam2Ace.ace.1

In this case there are 10,064 reads.  Try scrolling across the contig.


----------------------------------------------------------------------

14.  SANGER READS


For these exercises with Sanger reads, you must have some basic
knowledge of using consed, as learned in the first dozen or so steps
of QUICK TOUR OF CONSED (above).  After you've gotten that, continue
here.

14.1)  In this case we need a private copy of the dataset called "standard"
(see GETTING YOUR OWN COPY OF A SAMPLE DATASET above).

Type:

cd standard/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

Restart Consed.

Double click on standard.fasta.screen.ace.1

Double click on Contig1

14.2)  TRACES AND EDITING READS

Many of the Traces and Editing features apply to both Sanger reads and
Next Gen reads.  In the case of Next Gen reads, a fake trace is
created and displayed so that all of the same editing features are
available.

Point with the mouse at a base of one of the reads and click with the
middle mouse button.  (If you do not have a 3 button mouse, see
MONITORS AND MICE FOR CONSED below.)  The Trace Window showing the
traces for that stretch of read should popup.

There are 2 rows of numbers:

'con' are the consensus positions
'rd'  are the read positions

There are 3 rows of bases in the trace window:

'con' is the consensus
'edt' is where you can edit the base calls of the read
'phd' is the original phred base calls

Notice that a red rectangle blinks (the 'cursor') in the corresponding
positions of the Aligned Reads Window and the Trace Window.


14.3)  Try editing in the Trace Window.  You can click the left mouse
button on a base in the 'edt' line to set the cursor (a blinking red
rectangle).  You can directly overstrike a base by typing a letter.
Try this.  Try undoing it by clicking on 'undo' which is near the
lower right corner of the Trace Window.  If you want to undo more than
one edit, you will have to go to the main Consed window and click on
the button labeled 'Undo Edit...'--see MULTIPLE UNDO EDIT elsewhere in
this document.  You can overstrike with the following characters: acgt
(bases), * (a pad, in effect deleting the base), and mrwsykvhdb (IUB
ambiguity codes).

You can move left and right with the arrow keys. (If this doesn't work,
your window may not have "input focus"--an X Windows issue.  To give a
window input focus, you can click within it.)

We believe that the user should change a Sanger base call only while
viewing the traces.  That is why editing is done here--not in the
Aligned Reads Window.

14.4)  You can insert a column of pads by pushing the space bar.  Try
this.  (You may need to click on a base on the 'edt' line first to
give the window input focus.)

(For those of you new to editing assemblies, a 'pad', which in Consed
and phrap is represented by the '*' character, is used to align
two or more sequences such as these:
     gttgacagtaatcta
     gttgacataatcta
in which one sequence has an inserted or deleted base with respect to
the other.  By inserting the pad character, it is possible to get a
good alignment: 
     gttgacagtaatcta
     gttgaca*taatcta
This is the purpose of the pad character--it is just a placeholder.)

You can then overstrike a pad with a base.  In this way you
can insert a base, and still preserve the alignment.
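
(Note to programmers: the effect of a pad can be shown with a few
lines of Python.  This is only an illustration using the two sequences
above; it is not how Consed stores reads.  The function names are made
up for this example.)

```python
PAD = "*"


def insert_pad(seq, index):
    """Insert a pad before position index (0-based)."""
    return seq[:index] + PAD + seq[index:]


def overstrike(seq, index, base):
    """Replace the character at position index (0-based) with base."""
    return seq[:index] + base + seq[index + 1:]


def mismatches(a, b):
    """Count columns where both sequences have a base and the bases differ."""
    return sum(1 for x, y in zip(a, b) if x != y and PAD not in (x, y))


longer = "gttgacagtaatcta"
shorter = "gttgacataatcta"
padded = insert_pad(shorter, 7)          # "gttgaca*taatcta"
print(mismatches(longer, shorter), "mismatches before padding")
print(mismatches(longer, padded), "mismatches after padding")
```

Overstriking the pad with a base, overstrike(padded, 7, "g"), inserts
that base without disturbing the columns to its right.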

14.5)  Try highlighting part of a read on the edt line by holding
down the middle mouse button and dragging the cursor over some bases.
They will turn yellow as you drag.  Then release the mouse button.  A
window will pop up giving you some choices of what to do with those
(yellow) bases:


    Change Consensus--make the highlighted bases edited high quality and
        change the consensus to agree with that stretch of the read.
        This also is a directive to phrap (upon reassembly) to use that
        stretch of that read to be the consensus.
    Change to n's--Change the highlighted bases to n's which means
        they are unknown bases.  This also tells phrap (when it
        reassembles) to not make any join based on these bases.  It is
        useful when you believe the bases may be in the chimeric
        portion of a read.
    Change to n's to left--same as above but to left end.
    Change to n's to right--same as above but to right end.
    Change to x's to left--Change the highlighted bases to x's which
        means they are vector.  This also tells phrap to ignore these bases
        for the purpose of determining overlap.
    Change to x's to right--same as above but to right end.
    Add Tag--allows user to add any tag to a stretch of read bases.
    Dismiss--you decided you don't really want to do anything with
        this stretch of bases.

The following options are only relevant if you are using phrap to
reassemble (including miniassembly):

    Make High Quality--makes the highlighted bases edited high quality
        (99).  This tells phrap (when it reassembles) that you are
        sure of the sequence here.
    Make low quality--makes the highlighted bases edited low quality.
        This tells phrap (when it reassembles) that you are not sure
        of the bases here and phrap can go ahead and make a join even
        if the bases in this region don't match perfectly.
    Make Low Quality to Left End--same as above, but all the way to
        the left end of the read.
    Make Low Quality to Right End--same as above, but all the way to
        the right end of the read.


This popup is made so that nothing else works until you choose
something.  Try each of these choices, except for tags, which you'll
try below.  When you are done, dismiss this window.  If you don't,
you'll be sorry!  (Consed will freeze.)

'Change Consensus' has an additional function--if a read extends out
on the right beyond the end of the consensus, you can extend the
consensus by using this function.  You might want to do this, for
example, if your assembly program did not correctly find the cloning
site and thus clipped too much.  You can add these bases to the
consensus by using 'Change Consensus'.  Typically, the quality of
these bases in the read and in the consensus is 99.  That is so that
next time phrap runs, it will correctly extend the consensus.

However, if you aren't going to reassemble, you might want to just
leave the quality values the way the base-caller originally called
them.  You can do this by using a Consed parameter
(consed.extendConsensusWithHighQuality), which you will learn more
about later (see CONSED CUSTOMIZATION).


14.6)  To delete a base, overstrike it with a '*' character.  Even if
the consensus has many *'s in it, this is OK since when you export the
consensus (try the exercise on EXPORTING THE CONSENSUS), the *'s are
not exported.  While you are editing in Consed, we believe there
should be a visual indication that a base was deleted.


14.7)  SCROLLING TRACES AND ALIGNED READS TOGETHER

In the Aligned Reads Window, notice the yellow arrows between the
column of read names and the bases.  A magenta arrow indicates that
the read's trace is up.  In the Aligned Reads window, scroll
along the contig to a different point but keep the read in view whose
trace is already up (its arrow is magenta).  In the Aligned Reads
Window click the left mouse button on a base of that read while
watching the Trace Window.  Notice that the corresponding trace window
instantly scrolls to the corresponding location.  Now go to the Trace
Window and scroll the traces to a new location.  In the Trace Window
click on the edt line with the left mouse button while watching the
Aligned Reads Window.  You will notice that the Aligned Reads window
will instantly scroll to the corresponding location.  Thus you can
keep the Aligned Reads window and the traces scrolled to the same
location.

14.8)  SHOW ALL TRACES

Go to base 2000.  Point to the consensus base, push down the right
mouse button, and release on 'Display traces for all reads'.  You will
see all traces displayed in a scrolling window.  You can drag the
scrollbar on the right down and up to see all the traces.  This
feature is particularly useful for polymorphism/mutation detection
work.  This feature was added to work in cooperation with polyphred.
(See CONSED-POLYPHRED interaction below.)

In this Traces Window, point at one of the bases of one of the reads
and click with the left mouse button.  The base should start blinking
in red.  Now push the down arrow key on your keyboard.  The cursor
should move to the next read.  Repeatedly type the down arrow key.
Eventually the display should scroll so you can continue to see the
read the cursor is on.  Try the up arrow key as well.

If there are more than 100 traces at a position, you will see those
traces in batches of 100 traces.  You can use the buttons at the
bottom of the Traces Window labelled "prev 100 traces" and "next 100
traces" to move to the previous and next batches of 100 traces.

There is also a button at the top left of the Traces Window that changes
between "Show All Traces" and "Show Just Good Traces".  A "good trace" 
means a trace that meets all of the following conditions:

    * it has a base at the cursor location
    * there is no dataNeeded tag on the read 

    (this is customizable using the resource
    consed.showAllTracesDoNotShowTraceIfTheseTagsPresent: )

14.9)  PRIMER-PICKING

Go to position 2470 which is near the right end of the contig.
Click with the right mouse button on the consensus and click on
"top strand primer from subclone template".  Consed will pause a moment, and
then there will appear a selection of primers that pass all of
Consed's requirements.  (If you get an error message, Consed might not
have been correctly installed.  See INSTALLING CONSED above.)
Templates are also chosen for each primer.  You may have to scroll the
primer list to the right to see the templates.  Consed lists these
templates in order of quality--all of them will cover the read you
want to make.

14.10)  Double click on one of the primers in the Primers Window.  That
will cause the Aligned Reads Window to scroll to show that oligo in
context.  Click on 'Accept Primer'.  A comment box will pop up.  Enter
some comment and click 'OK'.  Notice that a yellow oligo tag, with a
little red end, is created on the consensus for that primer.  The red
end points in the direction of the oligo.

14.11)  Point to the yellow tag, press down the right mouse button, and then
release on 'Tag: oligo ... show more info?'  A box will popup with
much information about the oligo--all you need to order that oligo and 
do the reaction.  Notice the field:  'Oligo name'.  The name should be 
something like 'standard.1'.

14.12)  If you can't find the oligo, you can find it again using its
name.  Scroll the Aligned Reads Window so you can't see the oligo
anymore.  In the Consed Main Window, point to the "Navigation" menu,
push down the left mouse button and release on "Search for oligo tags
by name".  A box will pop up saying "Search for Oligo Tags".  Enter
"standard.1" (or whatever your oligo name is).  Click "search".  The Aligned
Reads Window will scroll to the location of that tag.  

14.13)  To check whether the primer matches some other location in the
assembly, do the following: As before, in the Aligned Reads Window,
point to the yellow tag and press down the right mouse button and then
release on 'Tag: oligo ... show more info?'  A box will popup.  Click
on the button labelled 'search for oligo bases'.  

Note that Consed's primer picking will generally (there are some
exceptions) not pick primers that match to more than one location.
However, if you have added more information and/or reassembled since
that primer was picked, there now could be another location that the
primer matches to.

I would suggest you just accept the first primer in the list.
However, if you want to understand the differences, here is the
explanation (if you want more information, see Gordon 1998 listed in
the consed references).

-----matches----- min
self false vector qua
4       22    13  50

"4" is a measure of the primer's match to itself or another copy of
itself forming a loop or primer-dimer making it less available for
priming.  Bigger is worse.

"22" is a measure of a match to some other location (not the location
you want) on the template.  Bigger is worse.

"13" is a measure of the match to the vector sequence(s) that are in
the vector files.  Bigger is worse.  Typically the vector files are:
/usr/local/genome/lib/screenLibs/primerSubcloneScreen.seq
or
/usr/local/genome/lib/screenLibs/primerCloneScreen.seq
but there are consedrc parameters that allow these files to be some
place else.

"50" is the minimum consensus quality of the primer.  Bigger is better
because it gives you greater confidence that the primer sequence is
correct at the location you want to prime.
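
(Note to programmers: the four numbers can be compared mechanically.
The Python below is a sketch that assumes the four columns appear in
the order shown above.  It only says whether one primer is at least as
good as another in every column; it does not reproduce the order in
which Consed lists primers.  The names PrimerScores, parse_scores, and
no_worse_than are made up for this example.)

```python
from typing import NamedTuple


class PrimerScores(NamedTuple):
    self_match: int      # bigger is worse
    false_match: int     # bigger is worse
    vector_match: int    # bigger is worse
    min_quality: int     # bigger is better


def parse_scores(row):
    """Turn a row such as '4       22    13  50' into PrimerScores."""
    values = [int(field) for field in row.split()]
    if len(values) != 4:
        raise ValueError("expected 4 numbers, got %d" % len(values))
    return PrimerScores(*values)


def no_worse_than(a, b):
    """True if primer a is at least as good as primer b in every column."""
    return (a.self_match <= b.self_match
            and a.false_match <= b.false_match
            and a.vector_match <= b.vector_match
            and a.min_quality >= b.min_quality)
```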

When picking primers (above), what is the difference between 'Pick
Primer from Subclone Template' and 'Pick Primer from Clone Template'?

There are 3 differences:  

A.  which vector file the primers are screened against.  In the former
case, the primer is screened against the file primerSubcloneScreen.seq
and in the latter case against the file primerCloneScreen.seq 

B.  In checking for false matches elsewhere in the assembly, if the
template is the whole clone, then Consed must check for false matches
in the *entire* assembly, including all other contigs.  But if the
template is just going to be a subclone, Consed only needs to check
elsewhere in that subclone.  Actually, to be conservative, Consed
checks for false matches +/- the maximum insert size of a subclone.

C.  If you are picking primers for subclone template, then the primer
picker can also pick the subclone templates.  If it doesn't find any
suitable subclone template, it will reject the primer.  (By default,
picking of subclone templates is turned on.  If you prefer to pick
your own templates, and want Consed's primer picker to be much faster,
you can turn it off temporarily or permanently.  To turn it off
temporarily, go to the Consed Main Window, point to the Options menu,
hold down the left mouse button and release on 'Primer Picking
Preferences'.  Scroll down to 'Pick Subclone Templates for Primers'
and click 'False'.  Click on 'Apply and Dismiss'.  To change this
permanently, see CONSED CUSTOMIZATION below.  Beware: you must
correctly customize determineReadTypes.perl for template picking to
work.  See INSTALLING CONSED above.)

If you are interested in the details of primer-picking, type:

consed -printDefaultResources

which will tell you the primer-picking parameters and what they do.


14.14)  CHECKING WHETHER A PARTICULAR OLIGO WOULD MAKE AN ACCEPTABLE PRIMER

You can check this as follows:

In the Aligned Reads Window, point to the 'Misc' menu, hold down the
left mouse button and release on 'Check Primer'.  Enter the left and
right consensus positions of the primer, check which strand, and
whether the primer is to use subclone templates or the whole clone as
a template.  For example, type 20 for left and 40 for right,
select "<-" (bottom strand) and subclone.  Then click "Check Primer".
A box "What is Wrong With This Primer" will pop up telling you what is
and is not acceptable about this primer.


14.15)  PICKING PCR PRIMER PAIRS

In the Aligned Reads Window, go to the location where you want to pick
the first PCR primer, base 500.  Point to the consensus, hold down
the right mouse button and release on 'Top Strand PCR Primer'.  Then
scroll to the location where you want to pick the second PCR primer,
base 2200.  Point to the consensus, hold down the right mouse
button and release on "Bottom Strand PCR Primer".  There will be a
pause and then there will be a list of PCR primer pairs.  Click on the 
first (top) pair and click "Accept Pair".  You will now see PCR
primers (in yellow with a red tip) at 404 and 2250. 

You can modify the parameters for choosing PCR primer pairs by going
to the Consed Main Window, pointing to "Options", holding down the
left mouse button, and releasing on "Primer Picking Preferences."  For
example, by default Consed does not display all PCR primer pairs--this
would take too long and give you too many.  However, you can ask it to
show you all such pairs.  In the Primer Picking Preferences, scroll
down to "Check All PCR Pairs (huge) or Just Sample?" and click on
"All".  Then click on "Apply and Dismiss".  Then pick PCR primers
again, as above.  Don't be surprised if you get 10,000 or more pairs
of primers!

(PCR Primers are screened for: melting temperature and length, the
melting temperature of the 2 primers must be sufficiently close to
each other, each primer must not stick to itself or to the other
primer, no mononucleotide repeats, only ACGT's (no n's or ambiguity
codes), and primer pair must not amplify any other location.  There
are many more details...)


14.16)  ORDERING OF PRIMERS

I heard of a finisher who manually ordered 72 primers.  She had to
cut/paste the bases of each primer.  That is not only painful, but
also error prone.  I've supplied you a script that you can use to save to a
file all primers that you have selected.

14.17)  The primers are saved in the ace file when you exit consed, so
exit consed by clicking "quit" (not by clicking the X in the corner of
the window).  When it pops up a "warning--there are unsaved edits",
click on "Save Before Quitting" and click "OK" to whatever name it
offers, but remember that name.

14.18)  The script is ace2Oligos.perl.  Run it like this:

ace2Oligos.perl standard.fasta.screen.ace.3 oligo.txt

where standard.fasta.screen.ace.3 is replaced by whatever name it
offered (above) when you exited consed and oligo.txt is the name of
the file you want it to put the oligos in.  It looks like:

name=standard.1
sequence=ttattggcaattgggtga
template=clone
date=140204:155602 temp=56

name=standard.2
sequence=cactttggctttgattctgta
template=clone
date=140204:155602 temp=56


ace2Oligos.perl finds all oligo tags in the ace file and makes sure
that all of them are in this primer file.
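
If you want to post-process the oligo file (for example, to build an
order form), the records are easy to read.  Here is a minimal sketch in
Python.  It assumes only the layout shown in the sample above
(key=value fields, records separated by blank lines); the function name
parse_oligo_file is made up for this illustration and is not part of
Consed.

```python
def parse_oligo_file(text):
    """Parse ace2Oligos.perl-style output into a list of dicts.

    Assumes the layout of the sample in this document: key=value
    fields, with a blank line between records.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        # One line may carry several fields, e.g. "date=... temp=56".
        for field in line.split():
            key, sep, value = field.partition("=")
            if sep:
                current[key] = value
    if current:
        records.append(current)
    return records


SAMPLE = """\
name=standard.1
sequence=ttattggcaattgggtga
template=clone
date=140204:155602 temp=56

name=standard.2
sequence=cactttggctttgattctgta
template=clone
date=140204:155602 temp=56
"""

oligos = parse_oligo_file(SAMPLE)
for oligo in oligos:
    print(oligo["name"], oligo["sequence"].upper(), oligo["temp"])
```

All values are kept as strings, so convert temp with int() if you need
to sort or filter on it.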

ace2Oligos.perl does not record the comments that the finisher entered
when creating the oligo.  If you want to record that as well, you
could use the script ace2OligosWithComments.perl which was written by
a Consed user and thus is found in the 'contributions' directory.


14.19)  ADD NEW READS (SANGER--NOT ILLUMINA OR 454)

For this to work, your system administrator must have set up
everything correctly.  (See INSTALLING CONSED above.)  Assuming
everything has been set up correctly, you can now experiment with
adding reads.

If you have consed up, terminate it.  You should be in the 

standard/edit_dir

directory

(see the beginning of SANGER READS above).

Copy the new chromatograms into the chromat_dir
directory by typing (on the command line):

cp ../chromats_to_add/* ../chromat_dir

Bring up consed again using the original ace file
standard.fasta.screen.ace.1 

If it asks if you want to apply edits, just say 'no'.

On the Main Window, click on the Add New Reads button.  There will
appear a list of files ending with .fof. These are files that contain
lists of chromatograms.  Double click on 'reads_to_add.fof'  (Accept
the defaults for the other options in this window.)

If you get an error message, look carefully at the full error message
in the xterm to diagnose the problem.  Probably there is some mistake
in how you installed Consed.  See INSTALLING CONSED (above).

There should be lots of progress output in the xterm from which you
started Consed.  When it completes, there will be a Reads Added Window
popup with a report of which reads were added.  In this case, it
should say that 9 reads were successfully added and list them.

When you are using your own data, you will need to create the file
reads_to_add.fof.  This file should contain a list of just the file
names of the reads/chromatograms you are going to add (not including
directory names). The chromatogram files to be added should be placed
in the ../chromat_dir and reads_to_add.fof should be put in edit_dir.

If your assembly doesn't already contain any chromatograms, and you
are adding them for the first time, a simple way to create the file
reads_to_add.fof is to first copy the chromatograms to be added into
the (empty) ../chromat_dir and then type the following command (assuming you
are inside of edit_dir):

ls ../chromat_dir > reads_to_add.fof

The reads_to_add.fof can now be used following the instructions above.
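
If you create .fof files from a script rather than by hand, the same
thing can be done in Python.  This is only a sketch: the function name
write_fof is made up, and the demonstration uses a throwaway directory
with invented file names so that it does not touch your real data.
Unlike the ls command above, it skips subdirectories.

```python
import os
import tempfile


def write_fof(chromat_dir, fof_path):
    """Write the names of the regular files in chromat_dir, one per line.

    Only bare file names are written (no directory names), which is
    what the text above asks for.
    """
    names = sorted(
        name for name in os.listdir(chromat_dir)
        if os.path.isfile(os.path.join(chromat_dir, name))
    )
    with open(fof_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
    return names


# Demonstration on a temporary directory tree (file names are made up).
with tempfile.TemporaryDirectory() as top:
    chromat_dir = os.path.join(top, "chromat_dir")
    edit_dir = os.path.join(top, "edit_dir")
    os.mkdir(chromat_dir)
    os.mkdir(edit_dir)
    for name in ("demo-2.s1", "demo-1.s1"):
        open(os.path.join(chromat_dir, name), "w").close()
    fof = os.path.join(edit_dir, "reads_to_add.fof")
    listed = write_fof(chromat_dir, fof)
    with open(fof) as handle:
        fof_text = handle.read()

print(fof_text, end="")
```

With real data you would call write_fof("../chromat_dir",
"reads_to_add.fof") from inside edit_dir.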


14.20)  ADDING NEW READS IN BATCH (SANGER)

You've seen (above) how to add Sanger reads using consed's graphical
interface.  You can also add Sanger reads in batch.

For example, if you are sequencing the same region over and over and you have a
reference sequence, phrap may not be a good choice for creating an
assembly:  phrap will take a long time to run (since many reads match
each other), phrap may make several contigs when you know there should 
be only one, and phrap may not put all the reads into the assembly.
Consed provides an alternative to phrap.  

Let's see how to add Sanger reads to a
reference sequence with the "standard" dataset.

14.21)  cd to standard/edit_dir directory 

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.  See above
under "SANGER READS".) 

14.22)  Follow the instructions under "ADD NEW READS" above including
cp ../chromats_to_add/* ../chromat_dir

14.23)  In edit_dir, look at the file "reads_to_add.fof" which contains
a list of the reads to be added.

14.24)  Run:

consed -ace standard.fasta.screen.ace.1 -addNewReads reads_to_add.fof -newAceFilename standard.fasta.screen.ace.20

When this completes, there will be a new ace file
standard.fasta.screen.ace.20 with all the reads added.

There will also be a custom navigation file that is named something like:

standard.070913.141632.nav

where 070913.141632 is the date and time so will be different for you.
(See CUSTOM NAVIGATION below.)  This will allow you to visually
find each added read in the assembly, if you so choose.

What Consed does is take each read and try to align it against the
reference sequence.  It will thus attempt to make one contig with all
of the reads in it.  Some reads may not align very well against the
reference sequence.  In that case, you can tell consed what you want
to do by the following parameter in the consedrc file:

consed.addNewReadsPutReadIntoItsOwnContig: ifUnaligned

means that if a read does not match the reference sequence very well,
it will be put into its own contig.  (For information on how to change
the consedrc file, see EDIT PARAMETERS: HOW TO CHANGE
CONSED/AUTOFINISH PARAMETERS elsewhere in this document.)

consed.addNewReadsPutReadIntoItsOwnContig: never

means that if a read does not match the reference sequence very well,
it will not be put into the assembly at all.

consed.addNewReadsPutReadIntoItsOwnContig: always

means that each read is not even compared to the reference sequence,
but just put into its own contig.
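
The three settings can be summarized as a small decision table.  The
Python below is just a restatement of the descriptions above, not
Consed's code; the function name place_read and the result strings are
made up for this illustration.

```python
def place_read(policy, aligns_well):
    """Say where a read ends up for a given value of
    consed.addNewReadsPutReadIntoItsOwnContig.

    aligns_well: whether the read matches the reference well enough.
    """
    if policy not in ("ifUnaligned", "never", "always"):
        raise ValueError("unknown policy: " + policy)
    if policy == "always":
        # The read is not even compared to the reference sequence.
        return "own contig"
    if aligns_well:
        return "reference contig"
    if policy == "ifUnaligned":
        return "own contig"
    # policy == "never": a poorly matching read is left out.
    return "not in assembly"


for policy in ("ifUnaligned", "never", "always"):
    for aligns_well in (True, False):
        print(policy, aligns_well, "->", place_read(policy, aligns_well))
```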

Consensus quality values are not recalculated unless you put the
following into your consedrc file:

consed.addNewReadsRecalculateConsensusQuality: true


USING YOUR OWN DATA:

Create an edit_dir, phdball_dir, and chromat_dir as usual.  Put the
reference sequence, in fasta format, into edit_dir.

Type:

fasta2Ace.perl reference.fa -noread

This will create an ace file for an assembly that just contains the
single reference sequence as the consensus and has no reads.  Run
consed to view it and make sure you have followed each of these steps
successfully so far.

(If you have multiple reference sequences, put them all in
reference.fa and run fasta2Ace.perl just as shown above.  Each
reference sequence will be a separate contig.)

Then run "consed -addNewReads" as shown above with the standard
dataset.

(There is a little used feature to add a single polymorphism tag at a
defined position.  Examine fasta2Ace.perl for more information.)


14.25)  ADDING NEW SANGER READS IN BATCH TO TARGETED REGIONS

Suppose that you are trying to close a gap by sequencing on a PCR
product.  The read matches several other locations besides the edges
of the gap, but you know it goes at the edge of the gap.  You can
direct consed to put the read at the edge of the gap.

To do so, you construct a list of reads to be added (which are assumed
to be in ../chromat_dir).  On the line with each read is a region that
you want the read to go into, in this format:

(read name) (contig name) (start of region) (end of region)

I suggest that you make the (start of region) and (end of region)
bigger than necessary so there is little chance that the read will
protrude out of the region (which would be bad since the protruding
part of the read would be unaligned, even if it should be aligned).

Let's try doing this.

14.26)  If you are not already in standard/edit_dir (check with
'pwd'), cd to standard/edit_dir directory.

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

14.27)  Follow the instructions under "ADD NEW READS" above including
cp ../chromats_to_add/* ../chromat_dir

14.28)  In edit_dir, look at the file "reads_to_add_to_regions.fof" which
contains a list of reads and regions:

djs74-1455.s1 Contig1 1100 2500
djs74-1465.s1 Contig1 1000 2500
djs74-2282.s1 Contig1 500  1700
djs74-2712.s1 Contig1 1    2000
djs74-2861.s1 Contig1 1    2000
djs74-536.s2  Contig1 500  2000
djs74-568.s1  Contig1 500  2000
djs74-649.x1  
djs74-867.s1

Notice that the last 2 do not have contig positions--this indicates
that consed should look for the best alignment of these reads anywhere
in the assembly.
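
If you generate such a file from your own tracking system, it is worth
checking it before running the next step.  Here is a minimal sketch in
Python, assuming only the two line formats shown above (a read name
alone, or read name, contig name, start, and end).  The function name
and the sanity checks (start at least 1, end not before start) are my
assumptions, not rules enforced by Consed.

```python
def parse_region_line(line):
    """Return (read, contig, start, end) for one line of the list.

    contig, start, and end are None when only a read name is given,
    which asks for the best alignment anywhere in the assembly.
    """
    fields = line.split()
    if len(fields) == 1:
        return (fields[0], None, None, None)
    if len(fields) == 4:
        read, contig = fields[0], fields[1]
        start, end = int(fields[2]), int(fields[3])
        # Assumed sanity checks, not documented Consed behavior.
        if start < 1 or end < start:
            raise ValueError("bad region in line: " + line)
        return (read, contig, start, end)
    raise ValueError("expected 1 or 4 fields: " + line)


SAMPLE = """\
djs74-1455.s1 Contig1 1100 2500
djs74-2282.s1 Contig1 500  1700
djs74-649.x1
djs74-867.s1
"""

entries = [parse_region_line(l) for l in SAMPLE.splitlines() if l.strip()]
for entry in entries:
    print(entry)
```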

14.29)  Type:
addSangerReads.perl standard.fasta.screen.ace.1 reads_to_add_to_regions.fof

There will be a huge amount of output, which is mainly cross_match
trying to align the reads.  The output will end with something like this:

writing standard.fasta.screen.ace.6
See log file: standard.120718.170851.out

(where the numbers will be different)

14.30)  Look at the log file by typing:
less standard.120718.170851.out

(where the numbers will be different).

It should tell you that each read went into its region.

14.31)  Bring up consed on the new assembly:

consed -ace standard.fasta.screen.ace.6
(where the number might be different).

You will be able to see the newly added reads.
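
Since the number at the end of the ace file name changes from run to
run, a script that wraps these steps needs to find the right file.
Here is a sketch in Python that picks the ace file with the highest
numeric suffix.  It assumes the highest number is the one you want
(which is not guaranteed if you chose a name yourself with
-newAceFilename); the function name and the ".wrk" file in the example
list are made up.

```python
import re


def latest_ace(filenames, prefix):
    """Return the name of the form prefix.N with the largest integer N.

    Returns None if no name matches.  The comparison is numeric, so
    ace.20 beats ace.6.
    """
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)$")
    best_name, best_version = None, -1
    for name in filenames:
        match = pattern.match(name)
        if match and int(match.group(1)) > best_version:
            best_name, best_version = name, int(match.group(1))
    return best_name


names = [
    "standard.fasta.screen.ace.1",
    "standard.fasta.screen.ace.6",
    "standard.fasta.screen.ace.20",
    "standard.fasta.screen.ace.1.wrk",  # made-up name that must be ignored
]
print(latest_ace(names, "standard.fasta.screen.ace"))
```

In practice you would pass os.listdir(".") from inside edit_dir as the
list of names.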

By default, reads that do not align are put into their own contigs.
If you prefer these reads not go into the assembly, set:

consed.addNewReadsPutReadIntoItsOwnContig: never


14.32)  CONSED-POLYPHRED INTERACTION TO REVIEW POLYMORPHIC SITES

This example applies not just to polyphred, but to any program that
finds particular positions on the consensus for you.

Polyphred is a program for finding polymorphic sites; it was developed by
Debbie Nickerson's group (contact them at http://droog.mbt.washington.edu).

We have an example database, 'polyphred', which has had polyphred run on
it already.  Polyphred has put a polymorphism tag on each polymorphic
site.  

If Consed is running, exit it.

For this exercise, use the dataset called "polyphred".
Make a copy (so you can modify it all you want) following the
instructions GETTING YOUR OWN COPY OF A SAMPLE DATASET (above).

Type:

cd polyphred/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

ls

Restart Consed.

Double click on example2.fasta.screen.ace.1

When Consed comes up, you should see 2 contigs.
Double click on Contig2

In the Aligned Reads Window, push the left mouse button while pointing
to the 'Navigate' menu and release on: 

'Toggle feature:  when navigating to consensus location, pop up all
traces (currently off)' 

That will turn this feature on.

Now point to the 'Navigate' menu, hold down the left mouse button, and
release on 'Tags'.  Up should pop a list of tag types.  Double click
on 'polymorphism'.  Polyphred has already been run so the consensus is
tagged with polymorphism tags at each polymorphic site.  Up will pop a
window labelled 'Polymorphism Tags' with a list of sites.  Click on
'Next'.

If you correctly followed the instructions above, all the traces should
pop up at the first polymorphic site.  You may want to reposition the
traces window to see it better.  

Now ignore the original 'Polymorphism Tags' window and instead click
on 'Next' in the *traces* window.  This will take you to the next
polymorphic site.  Pretty nice, huh?

Many labs write programs that apply tags to the consensus, and then
their staff uses consed to review those sites using the procedure
above.


----------------------------------------------------------------------------

15.  454 READS

15.1)  USING 454 READS (NEWBLER ASSEMBLY)


The Newbler Assembler and Consed work hand in glove.  

For this exercise, use the dataset called "454_newbler".
Make a copy (so you can modify it all you want) following the
instructions GETTING YOUR OWN COPY OF A SAMPLE DATASET (above).

15.2)  Type:

cd 454_newbler/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

15.3)  Restart Consed

15.4)  Double click on "454Contigs.ace.1".  You will see 2 contigs in
       the list: 

       contig00001 
       contig00002

15.5)  Double click on contig00001 to bring up the Aligned Reads
       Window

15.6)  Using the thumb at the bottom, scroll from the far left of the contig
all the way to the far right to get an idea of the assembly.  (It is a
very small one.)

15.7)  In the Aligned Reads Window, scroll to position 246 and middle
mouse click on the t in read ERQJC7K01CLG7G (which is probably the
second to bottom read on the screen).

A "trace" (curves made of dotted lines) should pop up.  This trace is
fake (but it allows you to edit the bases).

Now terminate consed.  In the current directory (edit_dir), create a
file called "consedrc" with one line in it:

consed.storeTracePeakPositions: always

(If you don't know how to create files in unix, learn.  See CONSED
CUSTOMIZATION below.)

Restart consed and follow the steps above up to and including clicking
on the t in read ERQJC7K01CLG7G at position 246.

Now a different looking trace should pop up.  Rather than having
dotted curves, it should have vertical rectangles of different
colors. 

If this kind of trace does not pop up, there is an installation problem.  See
INSTALLING CONSED (above).  Look closely at the error message that pops
up and the error message in the xterm where Consed was started.  They
will indicate where Consed is expecting to find sff2scf and what the
problem is.

Unlike chromatograms from fluorescent sequencers, the spacing and
width of the peaks is meaningless, but the height of the peaks is the
actual intensity of the light emitted during each of the 454 cycles.
When the light intensity indicates that there is more than one base in
a row, instead of having a very tall peak, we break the peak up into n
peaks where n is the number of repeated bases.

15.8)  Look at the 3 "T" peaks at positions 244 through 246 ("con"
line) or 195 through 197 ("rd" line) in the traces window.  Notice
that the rightmost peak is higher than the others.  The reason for
this is that the intensity of the light emitted was not exactly three
times that of a single normal base, so we made the trace show the left
peaks as high as a standard peak and the height of the rightmost peak
is whatever amount of intensity is left over.  Look back in the
Aligned Reads Window and you will see that all other reads have 4 t's
instead of 3.  So the reason that the 3rd peak is higher than the
others is that there are probably 4 t's here instead of 3.
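
To make the peak-splitting rule concrete, here is a small sketch in
Python.  It is not the code used by Consed or sff2scf.  It assumes the
intensity is measured in units of one standard single-base peak and
that the number of called bases is already known; the function name and
the numbers are made up.

```python
def split_homopolymer_peak(intensity, n_bases, standard_height=1.0):
    """Split one light intensity into n_bases peak heights.

    The left peaks get the standard height and the rightmost peak
    gets whatever intensity is left over, as described above.
    """
    if n_bases < 1:
        raise ValueError("n_bases must be at least 1")
    heights = [standard_height] * (n_bases - 1)
    heights.append(intensity - standard_height * (n_bases - 1))
    return heights


# Three bases were called, but the light was 3.5 times a single base.
print(split_homopolymer_peak(3.5, 3))
```

A rightmost peak well above the standard height, as in this example,
is the kind of hint that the run may really be one base longer.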

15.9)  In the Aligned Reads Window, look at the t at the unlabelled position
between 228 and 229 of read ERQJC7K01A7AUR (probably the 2nd read from
the bottom).  Notice that none of the other reads have a t at this
position, so this read may have a base-calling error.  Middle mouse
click on this t.  You will see in the Trace Window that the t peak is
shorter than a normal peak, confirming this suspicion.

15.10)  Let's take a look behind the scenes: Terminate consed and examine
the contents of ../chromat_dir (it should be empty), ../phd_dir (it
should be empty), ../phdball_dir (it should contain phd.ball.1), and
../sff_dir (it should contain reads.sff).  Since there are no files in
chromat_dir, there are no traces initially.  When you click to see a
trace, Consed runs the program sff2scf which creates the trace for the
read you are interested in by reading reads.sff (which has all the
intensity information of each base in each read).  Jim Knight of 454
corporation did a great job developing it.  Each time a 454 trace pops
up, tip your hat to Jim!  

Consed runs sff2scf for example like this:

sff2scf sff:-f:pairedreads.sff:ERQJC7K01C3R2X

where pairedreads.sff is the name of the sff file and ERQJC7K01C3R2X
is the name of the read (without the _left or _right extension). It
will write the scf file into /tmp where Consed will read it.

15.11)  On the Consed Main Window is a button on the left labelled
"Assembly View".  Click it.  You will see 2 grey bars labelled
"contig00002c" (the "c" is for "complemented") and contig00001.  This
indicates that the left end of contig00002 is connected to the left
end of contig00001 (the right end of contig00002c is the left end of
contig00002).  Newbler has given Consed forward-reverse pair
information which Consed has used to determine this orientation of the
contigs--another great Jim Knight job.  You will learn more about the
other graphics here in the Assembly View section (below).


15.12)  USING 454'S NEWBLER ON YOUR OWN DATA

First you should run through the tutorial above so that you know
that everything works with my example dataset.  

15.13)  Run Newbler according to the 454 documentation using the -consed
option.

15.14)  Delete the consedrc file that Newbler creates in edit_dir--it is
intended for obsolete versions of consed and may cause problems with
the current version.

15.15)  Delete the phd.ball link in edit_dir--it is also intended for
obsolete versions of consed and may cause problems with the current
version.

15.16)  Check that the current version of sff2scf is the one to be used.

Type "sff2scf -v"
It should say "080721" (or later).  If instead it says 
       "Error:  Unable to open SCF file:  ../chromat_dir/-v", 
your version is old and should be discarded.  Use the new version that
comes with consed.


----------------------------------------------------------------------------

16.  USING AUTOPCRAMPLIFY

If you have a fasta sequence, and you want to amplify part of that
sequence using pcr, and you want to select a pair of PCR primers, you
can do that using Consed's autoPCRAmplify function.  It can handle
very high throughput: on a slow computer it takes about 5 minutes to
find PCR primers for a hundred different regions.

For this exercise, use the dataset called "autoPCRAmplify".

16.1)  Make a copy (so you can modify it all you want) following the
instructions GETTING YOUR OWN COPY OF A SAMPLE DATASET (above). 

16.2)  Then type:

cd autoPCRAmplify

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

16.3)  Type:

ls

You will see there is one file, brian.fa

16.4)  Look at what is in this file:

more brian.fa

You will see it looks like this:

>AP000527.C22.6.mRNA.primerRegion 1 70 81 150 smallest
ACAGGGCCCCTCGCGGGCCCTGACGCAGGATGGAGTTGAGGTGGGGGCAG
CGCTGGACCCCAGGGCCCCTNNNNNNNNNNTGCCGCAGTCTTGGATGATG
GGTTCCTAGAAGCTCTCAACATCTCTTCTTAATTGGAGAAAGTGTTAAGC
>AC004019.C22.4.mRNA.primerRegion 1 70 81 150 smallest
AGCTGTGAGCTGTGCAATCATGTAACTAACTTTGTTTAAGTATTGTTTAG
TCTTTCTGGTCTCCAGATGANNNNNNNNNNTCAGACATTCCACAGCTACC
TAGAGGACATCATCAACTACCGCTGGGAGCTCGAAGAAGGGAAGCCCAAC

In the header, the numbers 1 70 mean that the left primer should be
selected from positions 1 to 70 of the sequence (counting from 1), so
the primer should be chosen from the sequence:
ACAGGGCCCCTCGCGGGCCCTGACGCAGGATGGAGTTGAGGTGGGGGCAG
CGCTGGACCCCAGGGCCCCT

The 81 150 means that the right primer should be selected from the
region from 81 to 150 of the sequence, i.e. from within:

TGCCGCAGTCTTGGATGATG
GGTTCCTAGAAGCTCTCAACATCTCTTCTTAATTGGAGAAAGTGTTAAGC


"smallest":

 --------------------                   -------------------------
               --->                       <---


"biggest":

 --------------------                   -------------------------
   --->                                                    <---


The word "smallest" means that the primers should be chosen so that
the product is as small as possible.  That means that the left primer
should be chosen as far as possible to the right within the 1-70
region and the right primer should be chosen as far as possible to the 
left within the 81-150 region.  If we had instead put "biggest", the
primers would instead have been chosen to make the PCR product as
large as possible.

Notice that in the diagram above, I didn't make it look like this:

"smallest":

 --------------------                   -------------------------
                 --->                   <---

(the primers are at the very edge of the regions).  The reason is that 
in general, due to other checks on the primers, the primers that would 
make the absolute smallest product are not acceptable, and the primers 
must be backed up.  Similarly for "biggest".
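
If you build such fasta files with a script, it helps to check that
the four numbers really select the regions you intend.  Here is a
minimal sketch in Python that reads the header fields and cuts out the
two regions, using the first sequence from brian.fa.  The function name
primer_regions is made up; the only thing taken from the text above is
that positions are counted from 1 and include both ends.

```python
def primer_regions(header, sequence):
    """Return (name, left_region, right_region, size_preference).

    header looks like: >name left_start left_end right_start right_end word
    Positions are 1-based and inclusive.
    """
    fields = header.lstrip(">").split()
    name, preference = fields[0], fields[5]
    left_start, left_end, right_start, right_end = (int(f) for f in fields[1:5])
    left = sequence[left_start - 1:left_end]
    right = sequence[right_start - 1:right_end]
    return name, left, right, preference


HEADER = ">AP000527.C22.6.mRNA.primerRegion 1 70 81 150 smallest"
SEQUENCE = (
    "ACAGGGCCCCTCGCGGGCCCTGACGCAGGATGGAGTTGAGGTGGGGGCAG"
    "CGCTGGACCCCAGGGCCCCTNNNNNNNNNNTGCCGCAGTCTTGGATGATG"
    "GGTTCCTAGAAGCTCTCAACATCTCTTCTTAATTGGAGAAAGTGTTAAGC"
)

name, left, right, preference = primer_regions(HEADER, SEQUENCE)
print(name, preference)
print("left region :", left)
print("right region:", right)
```

The run of N's at positions 71 to 80 falls in neither region, so no
primer can be picked there.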


16.5)  Then run the following:

amplifyTranscripts.perl brian.fa

(The name comes from the fact that this perl program was originally
developed to amplify cDNA transcripts.)

You should see a page or two of output flash by the screen, ending
with:


---------------------------------------------------------
working on AP000527.C22.6.mRNA.primerRegion
---------------------------------------------------------


working on transcript AC004019.C22.4.mRNA.primerRegion (2 out of 2...


---------------------------------------------------------
working on AC004019.C22.4.mRNA.primerRegion
---------------------------------------------------------


see files primers_unsorted.txt for primers and failures.txt for failures (if any)

16.6)  Look at the files just created in your directory.  

You should see "failures.txt" which should be empty.  And you should
see "primers_unsorted.txt" which should contain:

PRIMER_PAIR {
Region: AP000527.C22.6.mRNA.primerRegion  Product size: 67
AP000527.C22.6.mRNA.primerRegionf: AGTTGAGGTGGGGGCAGC temp: 64
AP000527.C22.6.mRNA.primerRegionr: CATCATCCAAGACTGCGGC temp: 63
}
 
 
PRIMER_PAIR {
Region: AC004019.C22.4.mRNA.primerRegion  Product size: 59
AC004019.C22.4.mRNA.primerRegionf: TTTAGTCTTTCTGGTCTCCAGATGA temp: 61
AC004019.C22.4.mRNA.primerRegionr: TCTAGGTAGCTGTGGAATGTCTGA temp: 60
}


This gives the primers.  The top strand primer is the one ending in
'f' and the bottom strand primer is the one ending in 'r'.  Both are
in 5' to 3' orientation, so the 'r' primer is reverse complemented
from the sequence in the original fasta file.
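
You can check the orientation and the product size yourself.  The
Python sketch below locates the 'f' primer and the reverse complement
of the 'r' primer on the first sequence of brian.fa and measures the
distance from the start of one to the end of the other.  The function
names are made up, and defining the product size as this span is my
reading of the output above, not a statement of how Consed computes it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a sequence containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def product_size(template, forward, reverse):
    """Span from the first base of forward to the last base of the
    site where reverse anneals, both located on the top strand."""
    start = template.find(forward)
    site = reverse_complement(reverse)
    site_start = template.find(site)
    if start < 0 or site_start < 0:
        raise ValueError("primer not found on template")
    return site_start + len(site) - start


TEMPLATE = (
    "ACAGGGCCCCTCGCGGGCCCTGACGCAGGATGGAGTTGAGGTGGGGGCAG"
    "CGCTGGACCCCAGGGCCCCTNNNNNNNNNNTGCCGCAGTCTTGGATGATG"
    "GGTTCCTAGAAGCTCTCAACATCTCTTCTTAATTGGAGAAAGTGTTAAGC"
)
FORWARD = "AGTTGAGGTGGGGGCAGC"
REVERSE = "CATCATCCAAGACTGCGGC"

print(reverse_complement(REVERSE))
print(product_size(TEMPLATE, FORWARD, REVERSE))
```

For this pair the span comes out as 67, which agrees with the "Product
size: 67" line in primers_unsorted.txt.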

16.7)  To put these primers into 96 well format for ordering, type

orderPrimerPairs.perl no

You will see output like this:

> orderPrimerPairs.perl no
finished sorting
attachments:
/me1/gordon/sunny/autoPCRAmplify_answer/brian.fa "brian.fa", /me1/gordon/sunny/autoPCRAmplify_answer/primers_sorted.txt "primers_sorted.txt", /me1/gordon/sunny/autoPCRAmplify_answer/to_order.txt "to_order.txt"

Type 

ls

and you will see that the following 4 files have been created:

to_order.txt, which is the primers in 96 well format, tab-separated so
    this can be easily imported into Excel
primers_unsorted_shorter.txt, which in this case is identical to 
    primers_unsorted.txt   (Don't ask.)
primers_sorted.txt, which has the same primers once again, but this
    time they are sorted by product size.  If you have thousands of 
    primers, you may want to run all the big ones together on the
    thermocycler, then all the next longest ones, etc.                    
primers081205.fasta (in which 081205 is the current date in YYMMDD)
    the same primers, YET AGAIN, but this time in fasta format, in
    case you want to use them in some other program that wants fasta
    format


----------------------------------------------------------------------------
17.  USING AUTOPRIMERS

This feature is used when you have a known template and you want to
choose custom sequencing primers to cover the template (both strands)
with roughly equally spaced reads.  It is assumed that the template is
an insert in a vector and that there are 2 universal primer sites in
the vector:  1 top strand and 1 bottom strand:


----------------iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii------------------
    --->                                                    <---
    a1: top strand universal primer                         b1: bottom
                                                            strand universal
                                                            primer

        <------><-------------------------------><--------->
            A                     I                   B
                                        -->UP


where A is the distance from the end of universal primer a1 to the
insert and B is the distance from the end of the universal primer b1
to the insert and I is the insert length and UP is another (optional)
top strand universal primer that is always used.

Top strand reads cover both A and I.  Bottom strand reads cover both I
and B.

To use this feature, put your insert sequences into a fasta file and
run:

autoPrimers.perl (fasta file name)

You will need to edit autoPrimers.perl for your values of A, B, and
your target read length.

The program will attempt to make reads all roughly the same size and
as small as possible and not less than:

consed.autoPrimersMinReadLength: 500

A (above) is:

consed.autoPrimersLeftUniversalPrimerDistanceToInsert: 50

B (above) is:

consed.autoPrimersRightUniversalPrimerDistanceToInsert: 50

If you have a UP primer, then the distance from it to the end of the
insert is:

consed.autoPrimersDoNotTryToSequenceTopStrandThisManyBasesOnRightEndOfInsert: 198

I is calculated by consed and is the unpadded length of the consensus
sequence of the contig.

All consed.primers... resources are relevant, but the most commonly
modified ones are:

consed.primersMinMeltingTemp: 55
consed.primersMaxMeltingTemp: 60
consed.primersMinimumLengthOfAPrimer: 22
consed.primersMaximumLengthOfAPrimer: 40

The complete vector sequence must be put into a fasta file
cloning_vector.fa (for example) and this resource must specify its path:

consed.primersSubcloneFullPathnameOfFileOfSequencesForScreening: /net/grc/vol3/home/dgordon/consed_demo/moderna/cloning_vector.fa

If the vector is circular, this file must be the vector sequence
starting at the right end of the insert and continuing to the left end
of the insert.  People tend to want to write the vector sequence by
just removing the insert sequence and concatenating the sequence
before and after the insert together.  This will not work.
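
Here is a small Python sketch of the difference, using a made-up
"plasmid" so the junctions are easy to see.  It assumes you have the
whole circular construct written out linearly and know the 1-based
positions of the insert; the function name is made up and this is not
part of autoPrimers.perl.

```python
def vector_for_screening(plasmid, insert_start, insert_end):
    """Return the vector sequence for a circular plasmid.

    plasmid: the circular construct written as a linear string.
    insert_start, insert_end: 1-based, inclusive insert positions.
    The result starts at the right end of the insert and continues
    around the circle to the left end of the insert.
    """
    before = plasmid[:insert_start - 1]
    after = plasmid[insert_end:]
    return after + before


# Made-up example: vector in upper case, insert in lower case.
PLASMID = "AAAACCCC" + "gggggg" + "TTTTGGGG"

good = vector_for_screening(PLASMID, 9, 14)
bad = PLASMID[:8] + PLASMID[14:]   # "before" + "after": the mistake

print("right:", good)
print("wrong:", bad)
```

The wrong version contains the junction CCCCTTTT, which exists in the
vector only when there is no insert, and it breaks the vector at the
arbitrary point where the circle happened to be written out, so the
real sequence GGGGAAAA across that point is missing.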


There are 4 resources that must always be left alone in
autoPrimers.perl:

   print filConsedrc "consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.0\n";
   print filConsedrc "consed.primersMinQuality: 0\n";
   print filConsedrc "consed.autoFinishAllowCustomPrimerSubcloneReads: false\n";
   print filConsedrc "consed.autoFinishAllowWholeCloneReads: true\n";


----------------------------------------------------------------------------
18.  USING AUTOREPORT

Autoreport is a command-line (non-graphical) method of running consed
to report information about the assembly.  


18.1)  VARIANTS REPORT

Let's try the consed.autoReportPrintHighlyDiscrepantRegions
feature.  

For this exercise, use the dataset called "solexa_example_answer".

18.2)  Make a copy (so you can modify it all you want) following the
instructions GETTING YOUR OWN COPY OF A SAMPLE DATASET (above). 

18.3)  Then type:

cd solexa_example_answer/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

18.4)  Type:

ls

You should see a file ref.ace.1

We will need a consedrc file in this directory with the following in
it:

consed.autoReportPrintHighlyDiscrepantRegions: true
consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality: 12

These options were explained above under "Search for highly discrepant
positions".

To create this file, do the following:

18.5)  EDIT PARAMETERS:  HOW TO CHANGE consedrc PARAMETERS

This section applies not only to autoreport, but also to autofinish,
autoedit, and customizing consed.

You can edit consedrc using an editor, such as pico, or you can do it with
Consed, which is far easier.  To do it with Consed, bring up consed as
follows:

18.6)  type:
consed -editConsedrc

A window should come up with many consedrc parameters.

18.7)  Find consed.autoReportPrintHighlyDiscrepantRegions.  You can
easily do this by typing in the "Find Parameter" box at the bottom
"printhighly" and click on "Find First".

18.8)  Click "True".  This item should turn red, indicating that it is
now different than the default value.

18.9)  Find 
"consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality"

As in the step above, in the "Find Parameter" box at the bottom, type
"ignorebases" which will be enough to find it.

18.10)  Change the default of 20 to 12.  (The default is actually better
for finding real variants, but there aren't any real variants with
this dataset so if you leave it at 20 you won't get any output.)

18.11)  At the bottom of the Edit consedrc Window, click "Just project".
Then click "save".  A box titled "Name of parameter file to write"
should pop up.  Click "OK".  That box will disappear and a box saying
"Note that these new parameters will take effect only after restarting
Consed/Autofinish" will pop up.  Click "Dismiss" on that box and click
"Dismiss" on the "Edit consedrc Window".  All windows should disappear.

18.12)  Back on the command line, type:
ls -al

and you should see consedrc

18.13)  Type:
more consedrc

and see that it should contain just this:

consed.autoReportPrintHighlyDiscrepantRegions: true
consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality: 12

(Get in the habit of checking consedrc after using Consed's Edit
consedrc Window.) 
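
If you would rather check consedrc from a script than with "more", the
following Python sketch reads the file contents and compares them with
the values you expect.  It assumes only what is shown above: one
"name: value" pair per line.  The function name read_consedrc is made
up for this example and is not part of Consed.

```python
def read_consedrc(text):
    """Parse consedrc text into a dict, assuming one 'name: value' per line."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


# The two parameters this exercise expects to find in consedrc.
expected = {
    "consed.autoReportPrintHighlyDiscrepantRegions": "true",
    "consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality": "12",
}

sample = """consed.autoReportPrintHighlyDiscrepantRegions: true
consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality: 12
"""
print(read_consedrc(sample) == expected)
```

To check the real file, pass open("consedrc").read() instead of the
sample text.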

Why doesn't consedrc contain these others (below) as well?  See if you can
figure that out.

consed.navigateByHighlyDiscrepantPositionsMinDiscrepantReads: 2
consed.navigateByHighlyDiscrepantPositionsMaxDepthOfCoverage: 100000
consed.navigateByHighlyDiscrepantPositionsJustListIndels: false
consed.navigateByHighlyDiscrepantPositionsIgnoreOtherReadsStartingAtSameLocation: false


18.14)  Type:
consed -ace ref.ace.1 -autoreport


There will be a lot of output ending with something like:
see ref.ace.1.081211.160556.out

where 081211.160556 will be replaced by your current date and time.

18.15)  Type:
more ref.ace.1.081211.160556.out (replace this by the name of your file)

This file will contain a huge amount of output (listing the parameters
used in the run)--the important part is at the end:

printHighlyDiscrepantRegions {
Highly Discrepant Positions
min # of discrepant reads: 2 min quality: 12 "r": base of reference seq
max depth of coverage: 100000 and ignoring reference seq
  A           C           G           T           *              pos     contig
  2   8.0%   23  92.0%r   0   0.0%    0   0.0%    0   0.0%           56 ref
  3   9.1%   30  90.9%r   0   0.0%    0   0.0%    0   0.0%          252 ref
  2   6.9%   27  93.1%r   0   0.0%    0   0.0%    0   0.0%          256 ref
  0   0.0%    0   0.0%   20  90.9%r   2   9.1%    0   0.0%          682 ref
  0   0.0%    0   0.0%   31  93.9%r   2   6.1%    0   0.0%          715 ref
  2   4.8%   40  95.2%r   0   0.0%    0   0.0%    0   0.0%          742 ref
  2   8.7%   21  91.3%r   0   0.0%    0   0.0%    0   0.0%          936 ref
  0   0.0%    1   2.4%    1   2.4%   39  95.1%r   0   0.0%          982 ref
} printHighlyDiscrepantRegions

This output is explained above under "Search for highly discrepant
positions".

Programmers:  if you want to run this report automatically and have
the results parsed, there is also a file auto.fof which will contain the
name of this output file.
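
For programmers, here is a minimal Python sketch of such parsing.  It
assumes the block looks exactly like the sample above: five (count,
percent) pairs for A, C, G, T and *, then the position and the contig
name, with "r" marking the reference base.  The function name and the
dictionary layout are invented for this example; check them against
your own output before relying on them.

```python
BASES = ["A", "C", "G", "T", "*"]


def parse_discrepant_positions(report_text):
    """Return one dict per data row of the printHighlyDiscrepantRegions block."""
    rows = []
    inside = False
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped == "printHighlyDiscrepantRegions {":
            inside = True
            continue
        if stripped == "} printHighlyDiscrepantRegions":
            break
        fields = stripped.split()
        # Data rows have 12 fields and start with a number; headers do not.
        if not inside or len(fields) != 12 or not fields[0].isdigit():
            continue
        row = {"pos": int(fields[10]), "contig": fields[11],
               "counts": {}, "ref_base": None}
        for i, base in enumerate(BASES):
            row["counts"][base] = int(fields[2 * i])
            if fields[2 * i + 1].endswith("r"):
                row["ref_base"] = base  # percent column carries the "r" flag
        rows.append(row)
    return rows
```

You would read the report name from auto.fof (assumed here to hold
just that file name), read the report into a string, and pass it to
parse_discrepant_positions.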

18.16)  In this case we need a private copy of the dataset called "standard"
(see GETTING YOUR OWN COPY OF A SAMPLE DATASET above).

18.17)  Then Type:

cd standard/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

18.18)  Type:
consed -editconsedrc

Follow the example above and this time make consedrc have only the
following two lines:

consed.autoReportPrintLowConsensusQualityRegions: true
consed.autoReportPrintSingleSubcloneRegions: true

18.19)  Run autoreport as follows:

consed -ace standard.fasta.screen.ace.1 -autoreport

(where "consed" must be replaced by whatever command your system
administrator says to use).

You will see something like this:

> consed -ace standard.fasta.screen.ace.1 -autoreport
couldn't open readOrder.txt--that's ok
opened file standard.070918.162756.out for output
Now setting quality values
Number of individual phd files read: 24
Total reads in assembly: 24
Finished setting quality values in 0 seconds 
see standard.070918.162756.out

18.20)  Look at standard.070918.162756.out (where the 070918.162756 will
be replaced by the current date and time).  Scroll down to the bottom
(where the important information is) and you will see:

lowConsensusQualityRegions {
Contig1     (consensus)               1-83      base quality below threshold
Contig1     (consensus)              85-110     base quality below threshold
Contig1     (consensus)             113-117     base quality below threshold
Contig1     (consensus)             120-156     base quality below threshold
Contig1     (consensus)             159-166     base quality below threshold
Contig1     (consensus)             168-171     base quality below threshold
Contig1     (consensus)             185-187     base quality below threshold
Contig1     (consensus)             189-190     base quality below threshold
Contig1     (consensus)             192         base quality below threshold
Contig1     (consensus)             194-199     base quality below threshold
Contig1     (consensus)             269         base quality below threshold
Contig1     (consensus)             271-275     base quality below threshold
Contig1     (consensus)            2584-2591    base quality below threshold
} lowConsensusQualityRegions
singleSubcloneRegions {
Contig1     (consensus)               1-199     199 bp single subclone
Contig1     (consensus)            2588-2591        4 bp single subclone
} singleSubcloneRegions

This gives the low consensus quality regions and the single subclone
regions.
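
If you want to total up these regions in a script, the sketch below
shows one way to do it in Python.  It assumes each line inside a block
has the contig name first and the position (either "start-end" or a
single number) third, as in the sample above.  The function name
parse_region_block is made up for this example.

```python
def parse_region_block(report_text, block_name):
    """Return (contig, start, end) tuples from a 'name { ... } name' block."""
    regions = []
    inside = False
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped == block_name + " {":
            inside = True
        elif stripped == "} " + block_name:
            break
        elif inside and stripped:
            fields = stripped.split()
            contig, span = fields[0], fields[2]
            start, _, end = span.partition("-")
            # A single position such as "192" has no "-", so end == start.
            regions.append((contig, int(start), int(end or start)))
    return regions


def total_bases(regions):
    """Sum the lengths of the regions, counting both end points."""
    return sum(end - start + 1 for _, start, end in regions)
```

For example, total_bases(parse_region_block(text,
"lowConsensusQualityRegions")) gives the number of consensus bases
below the quality threshold.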


18.21)  If you want to specify an output file name (something other than
standard.070918.162756.out), you can do so by running autoreport like
this:

consed -ace standard.fasta.screen.ace.1 -autoreport -outputFile myVeryOwnFileName.out

where myVeryOwnFileName.out can be anything you want.


----------------------------------------------------------------------------

19.  FEATURES FOR SNP ANALYSIS

19.1)  FINDING SNPS IN CONSED USING THE METHOD OF Li, Ruan, and Durbin (2008)


19.2)  On the Consed Main Window, point to the 'Navigate' menu, hold down
the left mouse button, and release on 'Search for SNPs'.

Up will pop the SNPs window.  Here is a typical line:


sub het       37   27  30 A  G   1_53001_270001          83,976 30976

sub = "substitution polymorphism".  Other choices are del and ins for
deletion or insertion.

37 is the genotype quality
Ignore the 27.
30 is the read depth
A is the reference base
G is the alternate allele
1_53001_270001 is the contig name
83,976 is the reference position
30976 is the contig position
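
If you copy such lines out of the SNPs window and want to handle them
in a script, the following Python sketch splits one line into the
fields just described.  The field names are invented for this example.
The second field ("het" in the sample) is not explained above, so it
is kept as plain text under the name "call".  Lines for del and ins
may be laid out differently; this sketch has only been checked against
the sample line.

```python
from collections import namedtuple

SnpLine = namedtuple(
    "SnpLine",
    "kind call genotype_quality ignored read_depth "
    "ref_base alt_allele contig ref_pos contig_pos",
)


def parse_snp_line(line):
    """Split one whitespace-separated SNPs-window line into named fields."""
    f = line.split()
    return SnpLine(
        kind=f[0],                    # sub, del or ins
        call=f[1],                    # e.g. "het" (meaning assumed, not documented here)
        genotype_quality=int(f[2]),
        ignored=int(f[3]),            # the value the text says to ignore
        read_depth=int(f[4]),
        ref_base=f[5],
        alt_allele=f[6],
        contig=f[7],
        ref_pos=int(f[8].replace(",", "")),     # "83,976" -> 83976
        contig_pos=int(f[9].replace(",", "")),
    )


snp = parse_snp_line(
    "sub het       37   27  30 A  G   1_53001_270001          83,976 30976")
print(snp.contig, snp.ref_pos, snp.ref_base, snp.alt_allele)
```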

19.3)  FINDING SNPS IN BATCH

Put the following into consedrc:

consed.autoReportCalculateGenotypes: true

(see CONSED CUSTOMIZATION below).

Then run:

consed -ace (ace file) -autoreport

The SNP calls will be produced in VCF format.


19.4)  TAGGING A REFERENCE SEQUENCE

If you want to download known snp sites with, say 200 base pairs on
each side of the snp site, then fasta2Ace.perl can tag the reference
sequence(s) if you know the location of the snps.

You run it as follows:

fasta2Ace.perl (fasta file) -polymorphism 201

where 201 is replaced by whichever base position the snp is in each
sequence in (fasta file), which is a file of just the reference
sequences.

Then you run addNewReads (see README.txt that comes with
consed).  After doing the alignments with addNewReads, you can find
each snp site by going to the Consed Main Window / Navigate / Tags in
All Contigs / polymorphism.  A box will pop up with a list of all of
the polymorphism tags.  You can click "next" repeatedly to view each
snp site.


----------------------------------------------------------------------------

20.  BIONANO DIGEST GENOME MAPS

BamScape can be used with Bionano Genomics digest genome maps to help
find misassemblies by showing regions that are confirmed by the
Bionano digest, regions that disagree with the Bionano digest, and
regions that are unsupported.

20.1)  Review running bamScape by doing the complete exercise (above)
labelled "USING BAMSCAPE".  

20.2)  Terminate bamScape.

20.3)  Start bamScape again by typing:
consed -bamScape -bamFile reads.sorted.bam -referenceFOF bamScapeReference.fof -bionanoXmapFile test.xmap -bionanoKeyFile  test.key 

As in the exercise USING BAMSCAPE (above) bring up the Reads vs Reference
Window for reference sequence 23.

You will now see an additional panel labeled "Bionano Digest".  This
graph has a vertical scale on the left labeled with numbers 0, 1, 2,
and 3.  You will also see thick vertical blue bars, and thin
horizontal bars of various colors.  

Some background: The user sequences and assembles the DNA, giving a
consensus sequence.  Bionano does an in silico restriction digest of
this consensus, giving a map with restriction sites and distances
between the restriction sites.  Independently of this, the Bionano
technology digests the physical DNA and finds the location and
distance between the restriction sites.  So this gives 2 restriction
site maps, including distances between the restriction sites.  Bionano
aligns these 2 maps.  Ideally, there would be a perfect match with
neither map having regions that cannot be found in the other.  In the
real world, not only are there regions not found in the other, but
there are also regions found in one that are found multiple times in
the other.  bamScape attempts to alert the user to these types of
problems.

Back to the bamScape display:  green (height 1) means that there is a single
location in the digest map matching this location in the sequence.
Any other # is bad:  yellow means there is NO location in the digest
map matching this location in the sequence.  3 means there are 3 (or
more) locations in the digest map matching this location in the
sequence map.  Typically this indicates that there really are 3 (or
more) locations in the DNA that have similar sequences...so similar
that the assembly program thought all of the reads were from a single
location and put them all together, commonly referred to as a
"collapsed repeat."  

The blue bars themselves generally indicate a problem--they indicate
that the matching region between the 2 maps has ended.  Why?

20.4)  Point to the leftmost blue vertical bar.  The status lines at
bottom will say:

no match segments in digest space to left of this one

This indicates that in digest space there are no matches further to the
left of this match.

Point to the next blue vertical bar and the status lines will say:

NEXT sequence region is 23 61,234-71,234 1000 bp away to right in digest space

Be aware that this sequence region includes the 3rd, 4th, 5th, and 6th
blue vertical lines (counting from 1 at the left).  The region between
the 4th and 5th blue lines is orange, indicating that this location
matches 2 different regions in digest space.
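
As a summary of the heights and colors described in this section, here
is a small Python sketch that maps the number of matching digest-map
locations to what the panel shows.  It is only a restatement of the
text, not bamScape code.  It assumes that yellow is drawn at height 0
(the text gives the color but not the height), and it gives no color
for height 3 because none is stated above.

```python
def describe_digest_support(match_count):
    """Map the number of matching digest-map locations to (height, meaning)."""
    if match_count < 0:
        raise ValueError("match_count cannot be negative")
    if match_count == 0:
        # Height 0 for yellow is an assumption based on the 0-3 scale.
        return 0, "no match in the digest map (yellow)"
    if match_count == 1:
        return 1, "single match, the good case (green)"
    if match_count == 2:
        return 2, "matches 2 digest-map regions (orange)"
    # The scale stops at 3, which stands for 3 or more matches.
    return 3, "3 or more matches, possibly a collapsed repeat"


for n in (0, 1, 2, 5):
    print(n, describe_digest_support(n))
```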


----------------------------------------------------------------------------

21.  LESS USED CONSED FEATURES


21.1)  CHANGING THE CONSENSUS IN BATCH ACCORDING TO A SCRIPT

[CHANGE CONSENSUS]

consed -ace (ace file) -changeConsensus (change file) 

where "change file" is a file with lines like this:

Contig21 28-30 x 

where Contig21 is the contig, 28-30 are the unpadded positions and x
is the new base.  

You can also specify the positions in padded positions like this:

Contig21 *35-*40 c

where 35 and 40 are *padded* positions.  You might prefer padded
to unpadded positions if, for example, you have some pads in the
consensus that you want to change to other bases.  Unpadded positions
would not be useful because they only refer to non-pad bases.
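
If you generate the change file with a script, it can help to check
each line before giving it to Consed.  The Python sketch below
recognizes the two forms shown above (unpadded "28-30" and padded
"*35-*40").  It is not Consed's own parser, and whether Consed accepts
other forms, such as a single position without a range, is not covered
here.  The function name parse_change_line is made up for this example.

```python
import re

# contig, optional "*", start, "-", optional "*", end, single new base
CHANGE_LINE = re.compile(r"^(\S+)\s+(\*?)(\d+)-(\*?)(\d+)\s+(\S)$")


def parse_change_line(line):
    """Return a dict describing one change-file line, or raise ValueError."""
    match = CHANGE_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized change line: %r" % line)
    contig, star1, start, star2, end, base = match.groups()
    if star1 != star2:
        raise ValueError("mixing padded and unpadded positions: %r" % line)
    start, end = int(start), int(end)
    if start > end:
        raise ValueError("start is after end: %r" % line)
    return {"contig": contig, "start": start, "end": end,
            "padded": star1 == "*", "new_base": base}


print(parse_change_line("Contig21 28-30 x "))
print(parse_change_line("Contig21 *35-*40 c"))
```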

21.2)  EXPORTING SCAFFOLDS [ EXPORT SCAFFOLDS ]

Go to assembly_view/edit_dir and type:

consed -ace assembly_view.fasta.screen.ace.1 -exportScaffolds scaffolds.fa

(where "consed" is replaced by whatever command brings up consed on
your system).

Look at scaffolds.fa

Contigs are separated by 50 n's.  You can change the 50 to some other
number by modifying the consedrc parameter:

consed.exportScaffoldsNsBetweenContigs: 50

If you prefer to export in fastq format, add the following to your
consedrc file:

consed.exportScaffoldsFastaOrFastq: fastq

This will give Sanger (+33) encoding of the fastq file.  If you prefer
Illumina (+64) encoding, you can add the following to your consedrc
file:

consed.solexa64FastqOrSanger33FastqForOutput: 64

However, this will cause a problem since assemblies often have bases
that are very high quality which will give non-printing characters
(e.g., 90 which will give 154 when Illumina-encoded).
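
The arithmetic behind this problem can be checked with a few lines of
Python.  A fastq quality character is the quality value plus the
offset (33 or 64), and the visible ASCII characters run from 33 ("!")
to 126 ("~").  The function names are made up for this example.

```python
def encode_quality(quality, offset):
    """Return the character code for a quality value at the given offset."""
    return quality + offset


def is_printable_ascii(code):
    """True if the code is a visible ASCII character ('!' through '~')."""
    return 33 <= code <= 126


for offset in (33, 64):
    code = encode_quality(90, offset)
    print(offset, code, is_printable_ascii(code))
# prints:
# 33 123 True
# 64 154 False
```

So with +64 encoding the highest quality that still gives a visible
character is 126 - 64 = 62, while with +33 encoding it is 93.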

By default, the contigs are not trimmed at the low-quality ends.  If
you want the low quality bases at the ends of contigs to be trimmed,
you can do that:

consed.exportScaffoldsTrimEnds: true

Note that the trimming looks for the maximal high quality segment of
bases quality 13 and above.  That means that if the left end of the
contig has bases that are quality:

17 21 9 9 9 9 9 9 13 11 11 11 11 20 20 20 ...

all of these bases to the left of the three 20's will be
clipped off.  The error probability of the 9's and the 11's is great
enough to overcome the 3 higher quality bases with 17, 21, and 13.

The quality threshold of bases to be kept is set at 13.  I suggest
leaving it there.  If you are determined to change it, modify
the consedrc parameter:

consed.exportScaffoldsTrimEndsQuality: 13
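
To see why the example above is clipped the way it is, here is a
Python sketch of one common way to find a "maximal high quality
segment".  This is an assumption for illustration, not necessarily the
exact calculation Consed performs: each base is scored as the error
probability at the threshold minus the base's own error probability
(10 to the power -quality/10), and the segment with the largest total
score is kept.

```python
def error_probability(quality):
    """Phred-style error probability for a quality value."""
    return 10 ** (-quality / 10)


def best_segment(quals, threshold=13):
    """Return (start, end) of the highest-scoring run, end exclusive."""
    cutoff = error_probability(threshold)
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        # Bases above the threshold add to the score, bases below subtract.
        run_sum += cutoff - error_probability(q)
        if run_sum <= 0:
            run_sum, run_start = 0.0, i + 1   # start over after this base
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


quals = [17, 21, 9, 9, 9, 9, 9, 9, 13, 11, 11, 11, 11, 20, 20, 20]
print(best_segment(quals))   # (13, 16): only the three 20's are kept
```

Under this scoring, the first 9 already cancels out the gain from the
17 and the 21, which matches the behavior described above.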


21.3)  ADDING PAIRED ILLUMINA READS USING CROSS_MATCH

If you have a reference sequence or an existing assembly, you can use
this procedure to align, using cross_match, additional paired Illumina
reads to that reference sequence or consensus of the assembly and make
the reads part of a new consed-ready dataset.


21.4)  Exit Consed and type:
cd illumina_paired/edit_dir

(This is not the same directory you were in for the examples above,
which was 
solexa_example_answer/edit_dir 
You may need to first type 
cd ../.. 
depending on which directory you are currently in.)

21.5)  Type:
ls

You should see ref.fa and fastq.fof

21.6)  Look at them:
more ref.fa
This is just the reference sequence in fasta format

more fastq.fof

This contains a pair of Illumina fastq files, the "1" reads in
paired_illumina1.fq and the "2" reads in paired_illumina2.fq

21.7)  Type:
ls ../solexa_dir

You will see the 2 Illumina fastq files:

paired_illumina1.fq  
paired_illumina2.fq


21.8)  First make sure you are still in illumina_paired/edit_dir by
typing:

pwd

which should say something that ends with:

illumina_paired/edit_dir

Convert the reference sequence into an assembly by typing:

fasta2Ace.perl ref.fa -noread

There should now be a file ref.ace in this directory.

To check that everything is fine, bring up Consed and double click on
"ref.ace"

You should see an assembly with 1 contig--the reference sequence
called 'c_elegans_piece'.  If you look at that contig in the Aligned
Reads Window (by double-clicking on it), you will notice no reads.
Terminate Consed.

21.9)  Type:
addSolexaReads.perl -ace ref.ace -fastqfof fastq.fof -fasta ref.fa

There will be a flurry of output from various programs ending with
something like this:

Inserting pads in contigs to accommodate insertions in new reads...Done
ending insertPadsInContigs 0
Inserting pads in reads and setting read bases...
writing new ace file ref.ace.1
writing ref.ace.1
See new ace file ref.ace.1
0.0 minutes cross_match and -phdBall2Fasta time
0.0 minutes consed time
0.0 minutes total time

If you instead get error messages, the Consed/cross_match package is
not installed correctly.  See INSTALLING CONSED (above).

21.10)  Type:
ls

You should now see the file ref.ace.1

21.11)  Start Consed and double click on ref.ace.1
Double click on the contig 'c_elegans_piece' in the Contig List.

The Aligned Reads Window will pop up.  Scroll around a little.

If you have trouble with any of these steps, you can compare your
results to the correct result in the directory 
illumina_paired_answer/edit_dir (and
illumina_paired_answer/phdball_dir).

There is an option to fasta2Ace.perl that allows you to tag each snp
if you know its position.  See TAGGING A REFERENCE SEQUENCE (above).

There is also an option to addSolexaReads.perl to add not all of the
reads but rather a subset of the reads.


21.12)  ADDING UNPAIRED ILLUMINA READS USING CROSS_MATCH

If you have a reference sequence or an existing assembly, you can use
this procedure to align, using cross_match, additional unpaired Illumina
reads to that reference sequence or consensus of the assembly and make
the reads part of a new consed-ready dataset.

21.13)  Exit Consed.

For this exercise, use the dataset called "solexa_example".

21.14)  Make a copy (so you can modify it all you want) following the
instructions GETTING YOUR OWN COPY OF A SAMPLE DATASET (above). 

21.15)  Then type:

cd solexa_example/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

21.16)  Type:
ls

You should see ref.fa and solexa_files.fof

21.17)  Type:
more ref.fa 

This just contains the reference sequence in fasta format

21.18)  Type
more solexa_files.fof

It contains just one line for each fastq file.  We have
just one such file so there is just one line:

solexa_reads.fastq

where solexa_reads.fastq is a solexa fastq file (note that "solexa
fastq" is different than normal fastq--don't mix them up). 

21.19)  Type:
ls ../solexa_dir

You will see this file:
solexa_reads.fastq


21.20)  First make sure you are still in solexa_example/edit_dir by
typing:

pwd

which should say something that ends with:

solexa_example/edit_dir

Convert the reference sequence into an assembly by typing:

fasta2Ace.perl ref.fa

(This is the old method that doesn't use -noread.  The newer method
using -noread creates an ace file without any reads.  In both cases
the consensus matches ref.fa)

There should now be a file ref.ace in this directory and a file
phd.ball.1 in ../phdball_dir

To check that everything is fine, bring up Consed and double click on
"ref.ace"

You should see an assembly with exactly 1 read--the reference sequence
called 'ref'.  Terminate Consed.

21.21)  Type:
addSolexaReads.perl -ace ref.ace -fastqFOF solexa_files.fof -fasta ref.fa

There will be a flurry of output from various programs ending with
something like this:

Inserting pads in contigs to accommodate insertions in new reads...Done
ending insertPadsInContigs 0
Inserting pads in reads and setting read bases...
now saving assembly... 0
writing ./ref.ace.1
See new ace file ref.ace.1
done 0
See log file: ref.080627.111305.out
0.0 minutes to make fasta files
0.0 minutes cross_match time
0.0 minutes consed time
0.0 minutes total time


If you instead get error messages, the Consed/cross_match package is
not installed correctly.  See INSTALLING CONSED (above).

21.22)  Type:
ls

You should now see the file ref.ace.1

21.23)  Start Consed and double click on ref.ace.1
Double click on the contig 'ref' in the Contig List.

The Aligned Reads Window will pop up.  Scroll around a little to
convince yourself that you have created exactly the same assembly as
you used in the exercises above under "USING ILLUMINA READS".

There is an option to fasta2Ace.perl that allows you to tag each snp
if you know its position.  See TAGGING A REFERENCE SEQUENCE (above).


21.24)  ALIGNING ILLUMINA READS USING CROSS_MATCH AGAINST A LARGE
GENOME AND SELECTING A SMALL REGION FOR VIEWING WITH CONSED

In many applications, you will want to align your Illumina reads against
a large genome (such as the human genome) even though you are only
interested in some part of that genome.  For example, you might only
be interested in reads that do map *best* to the region of interest,
and thus you must map them against the entire genome to be sure they
don't match better to some other location.

Consed handles this by allowing you to run cross_match against a large
genome and then allows you to specify certain regions of interest to
view with consed.  This exercise shows you how to do this.  

21.25)  Type:
cd selectRegions/edit_dir

(You may need to first type cd ../.. depending on which directory you
are currently in.)

ls

You will see:

solexa_files.fof which is a list of the Illumina Gerald fastq
files.

refs.fof which is a file of filenames of the reference sequences.
Typically refs.fof will be a list of the fasta files of the genome
such as the human genome, but in this case it contains just one
filename, ref.fa which is a small fasta file.  (I made it small so it
doesn't take you too long to download consed.)  

regions.txt is a file specifying the regions that you are interested
in.

21.26)  Run:
alignSolexaReads2Refs.perl solexa_files.fof refs.fof my_alignments.fof

(my_alignments.fof will be created by this program--it could be any
name)

The last line should say "see my_alignments.fof"

At this point you have aligned all of the Illumina reads against the
reference sequence ref.fa (If you want to see those alignments, look
in my_alignments.fof and then look in the file in my_alignments.fof,
and then look through the pages of output for the ALIGNMENT lines.)

Now suppose that we are interested in the following two regions:

Bases from 1 to 100, and from 901 to 1000.  

21.27)  type:
more regions.txt

This indicates to consed that you are interested in these 2 regions.
It also shows the path of the fasta file (ref.fa) which contains the
sequence ref.

In this case there are only 2 regions specified, but there is no
reason, with your own data, that you couldn't specify thousands of
regions.


21.28)  type:
selectRegions.perl regions.txt my_alignments.fof my_new_ace.ace

The last line of the output should say something like:
writing  my_new_ace.ace.2

21.29)  type:
consed -ace my_new_ace.ace.2

(where "consed" is replaced by whatever command brings up consed on
your system).


You should see 2 contigs:  ref_1 and ref_901 which are the 2 regions
specified.  There should be a total of 213 reads--211 Illumina reads and
2 fake fasta file reads.  Bring up the Aligned Reads Window and scroll
around a bit.  

In the contig ref_901 you will notice that the left end of the contig
is numbered 901--the consensus numbers refer to the positions in the
original reference sequence.  Thus if your reference sequence were,
for example, a chromosome, the numbers would be chromosome positions.

If you would like, you can see the consensus numbers starting at
position 1.  To do this, point to the "Misc" menu, hold down the left
mouse button, and release on "Turn On/Off User-Defined Consensus Scale
Numbers".

For those of you interested in what is happening behind the scenes,
you might want to look at the files in phdball_dir:  phd.ball.1
contains all 867 of the Illumina reads, phd.ball.2 contains the 2 fake reads
representing the 2 regions, and phd.ball.3 contains just the 211
Illumina reads that align to the 2 regions. my_new_ace.ace.2 tells
consed that it only needs to read phd.ball.2 and phd.ball.3  Also note
that this uses cross_match as the alignment engine.


21.30)  ALIGNING YOUR OWN ILLUMINA DATA USING CROSS_MATCH TO A
REFERENCE SEQUENCE OR CONSENSUS OF AN ASSEMBLY

You first must complete the exercises above using the example Illumina
data so you are confident you are doing the process correctly.  

After you have done the exercises above, create a project directory
with subdirectories:

phdball_dir
edit_dir
solexa_dir
phd_dir

21.31)  Put the Gerald fastq or Bustard files (pairs of *_seq.txt and
*_prb.txt) into solexa_dir (or you could use links or you could even
make solexa_dir itself be a link).

21.32)  In edit_dir, make a file myFiles.fof just like solexa_files.fof
in the solexa_example dataset described above.  

If you have paired reads, there should be 2 filenames on each line
like this:

file1 file2
.     .
.     .


one file for the 1st reads of each pair and one file for the 2nd reads
of each pair.  There must be perfect correspondence--the nth line of
file1 must correspond to the nth line of file2.
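
Before running the alignment it can be worth checking that the two
files of a pair really do line up.  The Python sketch below only
compares the number of reads in each file, assuming ordinary 4-line
fastq records.  It does not compare read names, because the naming
convention of your files is not known here.  The function names are
made up for this example.

```python
def count_fastq_records(lines):
    """Count reads in an iterable of fastq lines (4 lines per read)."""
    line_count = sum(1 for line in lines if line.strip())
    if line_count % 4 != 0:
        raise ValueError("line count %d is not a multiple of 4" % line_count)
    return line_count // 4


def pair_counts_match(lines1, lines2):
    """True if both files of a pair hold the same number of reads."""
    return count_fastq_records(lines1) == count_fastq_records(lines2)


reads1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGCC", "+", "IIII"]
reads2 = ["@r1/2", "TTAA", "+", "IIII", "@r2/2", "CCGG", "+", "IIII"]
print(pair_counts_match(reads1, reads2))
```

An open file can be passed directly, for example
count_fastq_records(open("../solexa_dir/paired_illumina1.fq")).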

21.33)  Make a fasta file myFasta.fa in edit_dir containing the reference
sequences.  

(Note that addSolexaReads.perl is written such that lowercase bases in
the reference sequences are assumed to be repeats and matches of
Illumina reads to such regions are pretty much ignored.  If your
reference sequence were totally lowercase, you would get no
matches--bad.  If you instead want to not ignore matches to lowercase
regions, you must modify addSolexaReads.perl by removing the words
"-repeat_screen 2".  See phrap.doc which came with the
phrap/cross_match.  It is generally good to use repeat_screen with a
repeat-screened reference sequence since it greatly speeds up
cross_match and a match to within a repeat doesn't mean much.)
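
A quick way to find out whether your reference is mostly or entirely
lowercase is to count the bases.  The following Python sketch does
that for a list (or open file) of fasta lines.  The function name
lowercase_fraction is made up for this example.

```python
def lowercase_fraction(fasta_lines):
    """Return the fraction of sequence letters that are lowercase."""
    lower = total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # header line, not sequence
        for ch in line.strip():
            if ch.isalpha():
                total += 1
                lower += ch.islower()
    return lower / total if total else 0.0


print(lowercase_fraction([">ref", "ACGTacgt", "NNnn"]))   # 0.5
```

A result of 1.0 means the whole reference would be treated as repeat,
which is the "no matches" case described above.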

21.34)  Convert the fasta file to an assembly by typing
fasta2Ace.perl myFasta.fa
This should create myFasta.ace

21.35)  Run addSolexaReads.perl like this:
addSolexaReads.perl -ace myFasta.ace -fastqFOF myFiles.fof -fasta myFasta.fa

This should create myFasta.ace.1 which should contain all of the
aligning Illumina reads.

Consensus quality values are not recalculated unless you put the
following into your consedrc file:

consed.addNewReadsRecalculateConsensusQuality: true

(For information on how to change the consedrc file, see EDIT
PARAMETERS:  HOW TO CHANGE consedrc PARAMETERS elsewhere in
this document.)

21.36)  If you only want to add some of the reads in the fastq files, put
the names of those reads into a file readsList.txt

Then run:

addSolexaReads.perl -ace myFasta.ace -fastqFOF myFiles.fof -fasta myFasta.fa -readsList readsList.txt

readsList.txt can also include the approximate contig locations of
each read.  Only that region of the contig will be searched to find
the alignment of the read.


21.37)  USING 454 READS (ALIGNING WITH CROSS_MATCH TO REFERENCE SEQUENCE )

Consed/cross_match can quickly align 454 reads to an existing
reference sequence.  You start with a fasta file of the reference
sequence and the 454 sff files.  You end up with an assembly with all
of the 454 reads aligned against the reference sequence.

To do this:

21.38)  Exit Consed and type:
cd align454reads/edit_dir

(You might need to type "cd ../.." first depending on which directory
you are currently in.)

ls


You will see 2 files:

reference.fa which contains the reference sequence
sff.fof which is an fof file referring to the 454 sff files

(Note that add454Reads.perl is written such that lowercase bases in
the reference sequence are assumed to be repeats and matches of 454
reads to such regions are pretty much ignored.  If you instead want to
not ignore matches to lowercase regions, you must modify
add454Reads.perl by removing the words "-repeat_screen 2".  See
phrap.doc which came with the phrap/cross_match.)

You might notice that there is also an align454reads_answer directory
parallel to the align454reads directory.  This contains the files that
you should get if you correctly follow this exercise and have Consed
correctly installed.  You can refer to it for troubleshooting.  Also
see INSTALLING CONSED (above).


21.39)  Convert the reference.fa file into an assembly by typing:
fasta2Ace.perl reference.fa

There should now be a file reference.ace in this directory and a file
phd.ball.1 in ../phdball_dir

To check that everything is fine, bring up Consed and double click on
"reference.ace"

You should see that this is an assembly with exactly 1 read--the
reference sequence.  Terminate Consed.

21.40)  Type:
add454Reads.perl reference.ace sff.fof reference.fa

There should be lots of output to the screen and no error messages and
it should complete in a second or two.

Type: ls

Now you should see reference.ace.1

21.41)  Bring up Consed and double click on "reference.ace.1"

Then double click on contig "myreference" to bring up the Aligned
Reads Window.  Scroll around a little and middle mouse click on a read
or two to see the trace.


21.42)  ADDING ADDITIONAL 454 OR ILLUMINA READS USING CROSS_MATCH
(YOUR OWN DATA)

You can add additional 454 or Illumina reads to an existing assembly.
It doesn't matter whether the existing assembly is 454, Illumina, or
Sanger.

To add 454 reads, use:

add454Reads.perl (existing ace file) (fof of sff files) (fasta)

(This will add all of the reads in the sff files.)

To add Illumina reads, use:
addSolexaReads.perl (existing ace file) (fof with prefixes) (fasta)

In both cases the fasta file must precisely match the consensus of the
existing ace file.

If you want to add just a few 454 reads (not all of them in the sff
file), and you know the names of the 454 reads you want to add, you
can use a different method:

In edit_dir, create a file (e.g., reads.fof) that contains the names
of all the 454 reads you want to add.  To find all reads in an sff
file, type:


sffinfo -a ../sff_dir/my454.sff >reads.fof

where my454.sff is the sff file.

Create a ../phd_dir directory and a ../chromat_dir directory.

Run:

sff2scfAndPhd ../sff_dir/my454.sff reads.fof

This will create both scf files in ../chromat_dir and phd files in
../phd_dir.

Then run either "add new reads" from within consed (see ADD NEW READS
below) or automated add new reads (see ADDING NEW READS IN BATCH
(SANGER) (below)).


ILLUMINA AND 454 DATA--WHAT IS HAPPENING BEHIND THE SCENES

454 data comes in sff files (in sff_dir)

1. sff files --> phdballs (in phdball_dir) via the program 
consed -sff2PhdBall 

consed -sff2PhdBall calls a perl script
filter454Reads.perl 

filter454Reads.perl runs cross_match to find, within each read,
sffLinkers.fa which is the 454 linker sequence (if any) that separates
the _left and _right 454 read.  It also looks for puc19 contamination
(filter454Reads.fa).

2. phdballs  --> *.fa fasta files (in edit_dir) via the program 
          consed -phdball2fasta
3. fasta files --> *.cross alignments (in edit_dir) via the program cross_match
4. alignments and phdballs --> ace file (in edit_dir) via the program
           consed -addReads 

All of the above steps are run by add454Reads.perl

For Illumina data, all of the steps are the same, except for the 1st step:

Illumina data comes in *.fastq files or *_seq.txt and *_prb.txt
       "Bustard" files (in solexa_dir)
1. Illumina files --> phdball (in phdball_dir) via the program
        consed -solexa2PhdBall (fastq fof) -phdBallFOF (phd ball fof)


21.43)  MULTIPLE HIGH QUALITY DISCREPANCIES VS SEARCH FOR HIGHLY
DISCREPANT REGIONS

You have already used (above) "Search for highly discrepant
positions".  "Multiple high quality discrepancies" (MHQD) is similar
but much less flexible.  It requires that there be one read at a
position that differs from the consensus and is at least
consed.qualityThresholdForFindingHighQualityDiscrepancies (40 by
default) at that base and within a 9-base window about the base (4
bases on each side, not including pads).  It then requires there be a
second read of any quality of a different subclone that has the same
base.
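
The MHQD rule can be written out as a short Python sketch.  This is an
illustration of the rule as described above, not Consed's code.  It
reads "within a 9-base window" to mean that every base in the window
must reach the threshold, which is an assumption.  The class and
function names are made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadAtPosition:
    subclone: str            # subclone the read came from
    base: str                # base the read has at this consensus position
    window_quals: List[int]  # qualities of that base and the 4 unpadded
                             # bases on each side of it


def is_mhqd(consensus_base, reads, threshold=40):
    """True if the reads at one position satisfy the MHQD rule."""
    for first in reads:
        if first.base == consensus_base:
            continue  # agrees with the consensus, so not a discrepancy
        if min(first.window_quals) < threshold:
            continue  # not high quality across the whole window
        for second in reads:
            # Second read: any quality, different subclone, same base.
            if (second is not first
                    and second.subclone != first.subclone
                    and second.base == first.base):
                return True
    return False
```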


21.44)  BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY

If you decide that all your edits are terrible and you want to start
over (perhaps you have been training a new finisher), the cleanest
solution is to delete everything in phd_dir and edit_dir, but leave
everything in chromat_dir and just reassemble (run phredPhrap) or
realign the reads again.


21.45)  SELECTIVELY BACKING OUT EDITS AND REMOVING READS

If you want to back out all edits in just particular reads, I have
provided a perl script to do this:


revertToUneditedRead (read name)

What it does is copy the .phd.1 file to a new version numbered 1
greater than the highest existing version.

Then you must reassemble using the phredPhrap script to create an ace
file that has no edits for that particular read.  It will have all
edits for all other reads. 

Why doesn't it just delete all phd files except for the
.phd.1?  In that case, Consed could not read any previous ace file
since all previous versions of ace files would refer to phd files that 
have been deleted.
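
The version numbering can be illustrated with a short Python sketch.
This is not the revertToUneditedRead script itself, just the
"1 greater than the highest version" rule applied to a list of file
names such as you would get from listing ../phd_dir.  The function
name next_phd_name is made up for this example.

```python
import re


def next_phd_name(read_name, filenames):
    """Return the phd file name one version above the highest existing one."""
    pattern = re.compile(re.escape(read_name) + r"\.phd\.(\d+)$")
    versions = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            versions.append(int(match.group(1)))
    if not versions:
        raise ValueError("no phd files found for read %r" % read_name)
    return "%s.phd.%d" % (read_name, max(versions) + 1)


files = ["r1.phd.1", "r1.phd.2", "r1.phd.3", "r10.phd.7"]
print(next_phd_name("r1", files))   # r1.phd.4
```

The contents of r1.phd.1 would then be copied to that new name, so
that all earlier versions are still there for older ace files.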

21.46)  REMOVING READS FROM A PHRAP ASSEMBLY

(This is obsolete and has been replaced by consed -removeReads.  See
elsewhere in this document.)

Create a file containing the filename of all the reads you want to
remove, one filename per line.
Then use the perl script

removeReads  <file of filenames>

Then reassemble using the phredPhrap script.


21.47)  ADDING READS WITHOUT CHROMATOGRAM FILES

This may happen if you, for example, download sequence from Genbank
and want to assemble it along with your reads.  

There are 2 ways to do this, depending on whether you want to edit the 
read or not.  

a)  If you want to edit the read, run mktrace to produce a fake trace.  It 
will have all perfect peaks.  

Run:

mktrace (name of file with fasta sequence)

It will create both a chromatogram and a phd file.  Move the
chromatogram into ../chromat_dir.  Move the phd file into ../phd_dir.

Then run the phredPhrap script normally.  You will be able to bring up 
the traces in Consed and edit the read.

b)  If it is not important to edit the reads, there is a method that
is a little faster.  Create just a fake phd file using:

fasta2Phd.perl (name of file with fasta sequence)


It will create a file whose name is taken from the fasta file name:
for example, if the fasta filename is Contig1.c.fasta, then the phd file
will be called Contig1.c.phd.1  (The fasta name inside the file is
ignored.)  You can then put this in the phd_dir, and reassemble using
the phredPhrap script.

If the reads are really fake (you don't want them to be chosen by
Consed/Autofinish as a template for a primer), then the read name
should end with an extension .c or .a or .c1 or
.c2 ... or .a1 or .a2 or ...   This indicates to Consed/Autofinish
that the read is a fake read.

Note:  when you are creating phd files such as this, you must start with
(read name).phd.1   Do not start with (read name).phd.2 or any higher
version number.  This is because Consed looks for the .1 version in
order to find the original phred calls so it expects there to be a .1
version.
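
The naming rules above can be sketched in a few lines of Python.  This
is only an illustration, not part of fasta2Phd.perl: the function names
are hypothetical, and the sketch assumes the fasta filename ends in
".fasta" as in the example above.

```python
import re
from pathlib import Path

# Read names ending in .c, .a, .c1, .c2, ..., .a1, .a2, ... mark fake reads.
FAKE_READ_SUFFIX = re.compile(r"\.[ca]\d*$")


def phd_name_for_fasta(fasta_filename):
    """Contig1.c.fasta -> Contig1.c.phd.1 (always version 1)."""
    name = Path(fasta_filename).name
    if not name.endswith(".fasta"):
        raise ValueError("expected a filename ending in .fasta")
    return name[: -len(".fasta")] + ".phd.1"


def is_fake_read(read_name):
    """True if the read name carries one of the fake-read extensions."""
    return FAKE_READ_SUFFIX.search(read_name) is not None
```

A quick check such as is_fake_read("Contig1.c") before reassembling
can catch a reference sequence that was named without the fake-read
extension.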

There is also a publicly contributed script "lib2Phd.perl" that takes
a fasta file that contains more than one sequence and makes phd files
for each of them.


21.48)  ALIGNING READS TO A BACKBONE

If you sequence the same region (in different people or in different
species), then you may want them all aligned together, even if phrap
doesn't want to put them all together.  To align them all together,
first use a reference sequence and make an assembly out of it by using 
mktrace or fasta2Phd.perl (see above) followed by phd2Ace.perl (see
above).  Then add all of the other reads using Consed's Add New Reads
feature (either automated or manual--see above). 


21.49)  COMPARING READS TO A REFERENCE SEQUENCE

The reference sequence, as in the step above, will just be another
read in the assembly.  Let's call it "ref".  To compare the other
reads to it, in the Aligned Reads Window, point at the Navigate Menu,
hold down the left mouse button and release on "Compare Reads To
Reference Sequence".  A Window labelled "Enter Name of Reference
Read" will pop up.  Enter the name of the read and click "OK".  A list
of high quality read positions that disagree with the reference read
will be displayed.


21.50)  TAGGING ALL READS AT ONCE

Follow the instructions for tagging the consensus, but when the list
of tag types pops up, click the "tag all reads" box at the top of this 
list.  Then continue as with tagging the consensus.

21.51)  EDITING ALL READS AT ONCE

Please don't do this.  Not unless you REALLY know what you are doing
and have a good reason for doing so.  You should really only change a
base call if you are looking at the chromatogram and thus have a basis 
in that read for making the change.

If you are determined to do this in spite of my pleas and protests, do 
the following.  Suppose that at a particular consensus position some reads
have "a" and some "c" and you want them all to be "a".  In the Aligned
Reads Window, point to an "a" at that position, hold down the left
mouse button and release on "make all reads a".  

The reason you shouldn't do this is that perhaps the reads that were
"c" were actually correct and were a different copy of a repeat.
Hence the reads with "a" and the reads with "c" did not really
overlap.  But you just destroyed the evidence.


21.52)  FASTER CONSED STARTUP FOR SANGER READS

Warning: This only applies to assemblies with large numbers of Sanger
reads.  This will have no effect on assemblies with 454 or Illumina
reads.

You can greatly speed up Consed startup if you are willing to use more 
disk space.  The disk space used will be about equal to the total
space used by the PHD files.  Try this with a large dataset (you won't 
notice any difference with the example datasets that come with Consed.)

    To use this method of startup:
    
    1) cd to directory where ace file is kept
    2) type: makePhdBall.perl
        (This will create a file called phd.ball which is big.)
    3) start consed normally


In many situations, this will greatly speed up Consed startup.  The
amount of speedup depends on which operating system is used: on Linux,
the time to read phd files dropped from 75 seconds to 8 seconds, and
thus the total time to start up consed dropped from 86 seconds to 17
seconds.  I saw similar speedups on Solaris where the phd files are on
an nfs mounted disk.  However, there was another situation in which
the startup time was the same.

Warning: If you create phd.ball as above, Consed will be reading most
phd files from phd.ball instead of from ../phd_dir.  If you delete phd
files in phd_dir, you must also delete phd.ball.  Otherwise Consed
will give lots of error messages "TIME STAMP MISMATCH" and many things
will not work correctly.


21.53)  VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS


If you have a chromatogram, you can use Consed to view it, even if it
hasn't been assembled into the ace file.  This is common with cDNA
assemblies in which the reads don't overlap and thus phrap doesn't put 
them together into a contig.

To do this, make the same edit_dir, phd_dir,
and chromat_dir as above, put the chromatogram into chromat_dir, run
phred on it to generate the phd file which goes into phd_dir.

Then go to edit_dir and run:

phd2Ace.perl (name of phd file)

For example, if your phd file is myRead.phd.1,
then from edit_dir, type:

phd2Ace.perl myRead.phd.1

This will produce myRead.ace

Then just start Consed normally:
consed -ace myRead.ace
and you can view the chromatogram.

21.54)  HIDING SOME TYPES OF TAGS

If you have many tags that overlap and thus are purple, you can
hide some less relevant tag types so there is less purple and there is
less distraction.  Make sure you have a few tags visible.  Then click
on 'Find Main Win'.  In the Main Window, open the Options menu, and
release on 'Hide Some Tag Types'.  A list of tag types will pop up.
Select the type that you have visible (above).  Then click 'OK'.  Go
back to the Aligned Reads Window.  That tag should still be visible.
Click on the button 'Some Tags' in the upper right part of the Aligned
Reads Window.  Your tag should disappear.  The 'Some Tags' button
should have changed to 'Sh All Tags'.  Click on it again.  Your tags
should have reappeared.

21.55)  CUSTOM CONTIG NAMES

Normally, when you re-assemble, phrap will name the contigs
differently--what was Contig31 before may become Contig32.  To help
you know which contig is which, Consed allows you to give a name
(e.g., "A") to a contig which will persist after re-assembling.  To do
this, swipe some consensus bases with the middle mouse button (as
above).  When the "Select Tag Type" box pops up, click on "contigName"
and also type a name into the "Contig Name:" field and then click
"OK".  The next time you re-assemble, the name "A" will appear in the
list of contigs on the Consed Main Window.


21.56)  ERROR RATE

In the Aligned Reads Window is a box (upper right) labelled
'Err/10kb'.  This is the estimated error rate for this contig, and it
is a good indicator of when you are done (or not done) finishing.
In addition, you can find the error rate for a particular region of
contig as follows:  Point at 'Misc' menu, hold down the left mouse
button, pull down and release on 'Show Error Info For Region'.  Fill 
in the boxes for left and right consensus position, click on
'Calculate' and you will be given the error and single subclone data
for that region.

21.57)  RESTRICTION DIGEST

Restart Consed.

Double click on "standard.fasta.screen.ace.1"

In the Consed Main Window, click the "Digest" button.  For the
purpose of this exercise, the full pathname of file of vector sequence
can refer to any file of sequence in fasta format.  However, when you
are using it with your own data it should refer to a file that
contains the sequence of your cloning vector. For example, if you are
sequencing a BAC, it should contain BAC vector.  The sequence must
start at the vector/insert junction that you used when you ligated the
insert.

Click "OK".  You will see a comparison of in-silico fragments (those
calculated from the sequence) and real fragments (those in
fragSizes.txt which supposedly came from a real gel).

* If a band is red, that means that it doesn't match.  
* If a band has a "v" on it, that means it is a vector fragment.
* If a band has a "g" on it, that means it is a gap-spanning fragment.

Move the pointer over the fragments, and you will see the fragment
sizes appear.  Move the pointer to the in-silico fragment with size
2299.  Click on it.  You will see the fragment on the left side of the
window become highlighted.  Click on the button labeled "right end"
(2nd row from the bottom of the window) and the Aligned Reads Window
will pop up, with the cursor on the right end of the fragment.

Click on "show problems" and navigate through the list of problems by
clicking on "next".  You will notice that the Gel Window is zoomed
in.  To return to the original zoom, click on "Zoom Original".  

Where it says "Select Enzyme:", point to "EcoRV", hold down the left
mouse button and release on "HindIII".  This is how you change
enzymes.

Click on the button labeled "Text Output".  This can be saved to a
file and printed out.

Dismiss the restriction digest window.  On the Consed Main Window,
click the "Digest" button again.  Notice the file "fragSizes.txt".
This is a file of actual gel fragment sizes.  If you don't have an
actual gel, but rather you want to just make predictions of fragment
sizes from the sequence, you can leave this box blank (erase the
"fragSizes.txt").  Try that.


fragSizes.txt has the following format:

>EcoRV
448
710
1102
1197
-1
>HindIII
448
508
586
735
801
-1
 
where EcoRV and HindIII are enzymes and the numbers below them are the 
actual fragment sizes.  Each enzyme list is terminated by -1.  
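
If your gel fragment sizes come from some other program, a short
script can read or write this format.  The following is a minimal
Python sketch based only on the format shown above (a ">" line naming
the enzyme, one size per line, and -1 as the terminator).  The function
names are hypothetical.

```python
def parse_frag_sizes(text):
    """Return {enzyme: [fragment sizes]} from fragSizes.txt-style text."""
    digests = {}
    enzyme = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            enzyme = line[1:].strip()
            digests[enzyme] = []
        elif line == "-1":
            enzyme = None  # terminator: this enzyme's list is complete
        else:
            if enzyme is None:
                raise ValueError(f"fragment size {line!r} outside an enzyme list")
            digests[enzyme].append(int(line))
    return digests


def format_frag_sizes(digests):
    """Write {enzyme: [sizes]} back out, ending each list with -1."""
    lines = []
    for enzyme, sizes in digests.items():
        lines.append(">" + enzyme)
        lines.extend(str(size) for size in sizes)
        lines.append("-1")
    return "\n".join(lines) + "\n"
```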

Consed does its best to try to figure out which end of the clone
insert is connected to which end of the vector.  However, it sometimes
is wrong.  If you believe it is wrong, you can click "compl vector" to
try connecting the insert to the vector in the opposite orientation
and see if that produces better agreement with the actual digest.


21.58)  RESTRICTION DIGEST AND ASSEMBLY VIEW

Get your own copy of the dataset "assembly_view" (see above under
GETTING YOUR OWN COPY OF A SAMPLE DATASET).

cd assembly_view/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

Restart consed

Double click on "assembly_view.fasta.screen.ace.1"

In the Consed Main Window, click on the button "Assembly View" which is
near the upper left corner of the window.

Also on the Consed Main Window, click on Digest.  The "Select Enzyme
and Contigs" Window should appear with EcoRV and HindIII selected.
Click OK.  The "Display Digest" Window should appear.

Now look at the Assembly View Window.  You will notice blue, green,
and red rectangles under the grey contig bars.  These rectangles are
the in-silico restriction fragments.  Point to one of them-- it will
turn yellow and information will be displayed in the information box
below.  Point to one of the EcoRV fragments, hold down the right mouse
button, and release on "Goto fragment in digest window".  Notice that
in the Display Digest Window, the selected fragment is highlighted
both on the left side (the text) and in the Gel (right) side.


21.59)  MULTIPLE TRACE POPUP

Bring up dataset standard.  In the Aligned Reads window, scroll to
a region that has many reads and that has some discrepancies--try
position 1162.  Hold down the shift key, and click with the middle
mouse button on the consensus.  At this location 3 traces will
pop up--these are the 2 highest quality traces that agree with the
consensus (on each strand) and the highest quality trace that
disagrees with the consensus.  This feature is useful in areas of high
coverage when you want to rapidly examine just the most significant
traces rather than looking at all of them.


21.60)  MAXIMUM NUMBER OF TRACES DISPLAYED

Bring up dataset standard.  Scroll to position 1162.  Bring up 4
reads and then try bringing up additional reads.  You will notice that
new reads are put at the top of the stack of traces and, once there
are 4 traces displayed, traces are automatically removed from the
bottom of the stack.  If you want to change this maximum number of
traces to something besides 4, you can do that: In the Consed Main
Window (click on 'Find Main Win' on the Aligned Reads window), pull
down the 'Options' menu, and release on 'General Preferences'.  Try
changing the 'Max Number of Traces Shown' to 3.  Then click 'Apply and
Dismiss'.  Now dismiss the Trace Window and again start adding
additional traces to the Trace Window.  You will notice that now the
number of traces shown will not exceed 3.

If you want to view a large number of traces at once, you should use
the SHOW ALL TRACES feature (described above).


21.61)  SCALING THE TRACES 

In the Trace Window, grab the thumb of the line that is labelled "V"
(for Vertical magnification) and move it back and forth, noticing the
effect on the traces.  This is useful if the traces are too small or
too large.  There are several other methods of scaling the traces you
will learn later.


21.62)  HOTKEYS FOR EDITING

If you do a lot of editing, you will want to have a faster method
of doing these edits than having the popup and selecting an option.
Thus the following hot keys exist:


    < and > (less than and greater than) to make n's to the left
        and the right (respectively) of the cursor
    control-l and control-r to make low quality to the left and
        the right (respectively) of the cursor
    overstriking with a capital letter (e.g., C instead of c) causes
        the base to become high quality rather than low quality
    overstriking with a lower case letter causes the base to become
        low quality

Give these a try.

21.63)  SCROLLING TRACES INDEPENDENTLY

Dismiss all of your Trace Windows.  Then pop up traces for 2
different reads in approximately the same location.  Scroll one of
them.  You may want to scroll by clicking the arrows or clicking to
the left or right of the thumb.  You will notice that both will
scroll.  Consed will do its best to have corresponding peaks lined up.
(Consed can't line all of them up because the peak spacing is not
uniform and differs from read to read.)  Try removing a trace by
clicking on one of the 'Remove' buttons in the Trace Window.  Try
adding other traces.  Then click on 'No' for scrolling the traces
together and try scrolling.  You will now observe that they scroll
separately.


21.64)  MEASURING ERROR RATE AND SINGLE SUBCLONE BASES FOR A REGION

Some contigs have long tails of low quality bases and you would
like to find out the error rate for the contig without that long
tail.  On the Aligned Reads Window, pull down the Misc menu, and release 
on 'Show Errors for a Region'.  This will tell you both the error rate 
for the region and the number of single subclone bases for that region.


21.65)  PREVENTING 2 USERS FROM MAKING CONFLICTING EDITS

If there are 2 users that are both editing in the same directory,
there is the possibility they will both make edits to the same read.
Whoever saves their assembly last will wipe out the edits of the other
person, even if they were using different ace files.  To help prevent
this, consed can warn you if someone else is making edits in the same
directory.  Set the consed parameter:

consed.onlyAllowOneReadWriteConsedAtATime: true

The default is "false" so you have to turn this to true to make it
work (see CONSED CUSTOMIZATION).

This will usually work even if the 2 users are on different computers
(and the directory is nfs-mounted between them) and even if the
different computers have different operating systems.  I've tested the
following combinations:
user 1 on Solaris; user 2 on Solaris
user 1 on Linux; user 2 on Linux
user 1 on Linux; user 2 on Solaris  <--- does not work

Only the last combination doesn't work.


21.66)  PRINTING CONSED WINDOWS 

Consed windows are really designed for being viewed on a computer
screen, rather than on paper.  If you want to print out a consed
window, the default colors are not so good.  Since Consed windows are
mostly black (Aligned Reads Window and Traces Window), a lot of toner
is used up and the window is difficult to read.  

I've solved this: Go to the Consed Main Window, pulldown the 'Options'
menu and release on 'General Preferences'.  Scroll down to "Make light
background in Aligned Reads Window..." and click on "Do it now".  You
will notice the light background.  A few other things (trace colors
and thickness) are also customized for making color prints.

You can also make consed start up like this by putting the
following into your consedrc:

consed.makeLightBackgroundInAlignedReadsWindowAndTracesWindow: true

(see CONSED CUSTOMIZATION)


If you are running on a Linux box, there is a free (or nearly free)
program called "xv" (one web site is http://www.trilon.com/xv).  It is
written by one of that dying breed of UNIX programmers who just
*loved* UNIX and programming and sharing it.  His web site is
enjoyable because some of his passion comes through.  With xv, you can
make a postscript file from a Consed window.  Then you can print the
postscript file on a color printer.

If you are running on a Windows computer (with an X emulator) or on
macosx, you can make a screen snapshot.  Or you can use the Microsoft
Snipping tool:
http://windows.microsoft.com/en-us/windows7/use-snipping-tool-to-capture-screen-shots


21.67)  COLOR MEANS EDITED AND TAGS

(For this step, first click on the 'Dim' menu and release on 'Dim
Nothing'.)  Point to the 'Color' menu, hold down the left mouse button
and release on 'Color Means Edited and Tags'.  Notice that the bases
that you have edited (make sure you have edited some bases) will stand
out in either white or grey (depending on whether the base was made
high quality or low quality).  Observe this both in the Trace Window
and the Aligned Reads window.  This colormode is useful if you are
interested in easily spotting which bases are edited.


21.68)  COLOR MEANS MATCH

In the Aligned Reads Window, go to the menu labelled 'color', and
pulldown and release on 'color means match'.

Now you will notice different colors.  The
colors have the following meaning:

    Blue:   agrees with consensus
    Orange: disagrees with consensus
    (Yellow: this stretch of this read was used by phrap to form the
consensus--no longer supported)
    Grey:   Low quality or unaligned ends of reads 

Return to the 'Color Means Quality and Tags' colormode by the
following:  point to the 'Color' menu, hold down the left mouse button
and release on 'Color Means Quality and Tags'.  This is the colormode
most commonly used.


21.69)  AUTOEDIT

Autoedit is a program that will read an ace file, make edits according
to which options you specify, and then write out a new ace file, all
without any interaction from the user.  Thus Autoedit can be run
automatically at night, the same way you can run phredPhrap.  Autoedit
has various options that are controlled from the consedrc file, in the
same way that the consedrc file controls Autofinish.

Run AutoEdit as follows:

consed -ace (name of existing ace file) -autoEdit 

This will create another ace file with a version number one higher
than the one you just ran.  If you want to specify a particular new ace
file name, you can do it this way:

consed -ace (old ace file) -autoEdit -newAceFileName (new ace file)


Autoedit has the following options (if you do not specify any of
these, autoedit will do nothing):


consed.autoEditConvertCloneEndBasesToXs: true
bool
! If true, will convert to X's bases of all reads that protrude beyond a
! cloneEnd tag.
! (YES)

consed.autoEditTellPhrapNotToOverlapMultiplyDiscrepantReads: true
bool
! This will find all locations where there are multiple identical 
! discrepancies with the consensus (and some other conditions) and try
! to make most of the reads quality 99 at that location so that phrap,
! next time it is run, will not overlap those reads.  This will fix
! many misassemblies.
! (YES)

consed.autoEditTagEditableLowConsensusQualityRegions: true
bool
! This will find regions that are low quality, but that a human
! finisher could easily determine the correct base and thus
! money could be saved by not having Autofinish suggest additional
! reads overlapping the region
! (YES)

consed.autoEditRecalculateHighQualitySegmentsOfReads: false
bool
! If true, will recalculate the high quality segments of the reads
! (YES)

consed.autoEditFixRunsInConsensus: false
bool
! fixes this: 
! ccc (cons)
! cc* (read1)
! *cc (read2)
! (YES)  

21.70)  (not currently recommended way of tagging SNPs and not
currently supported)

You must have a set of fake reads (see fasta2Phd.perl) with
polymorphism tags on the SNPs.  There must also be a read in the
assembly that has a genomeRegion WR item.  Then run

consed -ace (ace file) -tagSNPs (file of fake reads)


----------------------------------------------------------------------------


22.  CONSED CUSTOMIZATION

If you want to customize Consed, it would help to be able to edit in
UNIX.  There is no Microsoft Word in UNIX, but there are emacs, vi,
pico, nano and other editors.  

I suggest pico or nano for their simplicity.  (You can get more
information by googling, for example, "pico unix editor".)

Point at the 'Info' menu on the Consed Main Window, hold down the left
mouse button and release on menu item 'Show Current Consed Parameters'.  This
shows you what is available to be changed by putting in your
~/consedrc file.

You can also see what is available by typing on the command line:
consed -printDefaultResources
(Warning:  if you have changed any resource, it will not show the new
value--just the default value.)

Type:

consed -editConsedrc

This includes most of the parameters found under 'Info/Show Current
Consed Parameters' (above).  It provides an easy graphical way for you
to edit these parameters, if you are not familiar with editing under
UNIX.  You just change the parameter you want and click "Save".  (See
HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS (far above)).  For the new
parameter to take effect, you must restart Consed/Autofinish.

Changes in ~/consedrc only affect one user.  If you want to make a
change to affect all Consed users on the system, put a file in some
central location (e.g., /usr/local/genome/lib/consedrc ) and then
have every user set the environment variable CONSED_PARAMETERS to
the full pathname of the file.  For example, if using csh or
tcsh, type:

setenv CONSED_PARAMETERS /usr/local/genome/lib/consedrc

If using bash, type:

CONSED_PARAMETERS=/usr/local/genome/lib/consedrc
export CONSED_PARAMETERS

Anything the user puts in ~/consedrc will override whatever is in the
CONSED_PARAMETERS file.

You can also have different parameters for different projects.  Put a
consedrc file in the edit_dir of a particular project.  When you are
working on that project, whatever is in that consedrc will override
whatever is in your ~/consedrc file or the  CONSED_PARAMETERS file.
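
The order of precedence described above (project consedrc over
~/consedrc over the CONSED_PARAMETERS file) can be pictured with this
small Python sketch.  It is only a model of the rule, not how Consed
itself reads the files.  The function names are hypothetical, and the
sketch assumes each setting is a single "name: value" line and that
lines beginning with "!" are comments.

```python
def parse_resources(text):
    """Collect 'consed.name: value' lines; skip blanks and '!' comments."""
    resources = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        resources[name.strip()] = value.strip()
    return resources


def effective_resources(site_text="", user_text="", project_text=""):
    """Later sources override earlier ones: site < user < project."""
    merged = {}
    for text in (site_text, user_text, project_text):
        merged.update(parse_resources(text))
    return merged
```

Here site_text stands for the contents of the CONSED_PARAMETERS file,
user_text for ~/consedrc, and project_text for the consedrc in the
project's edit_dir.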


22.1)  CUSTOMIZING NAVIGATE BY SINGLE STRANDED REGIONS AND NAVIGATE BY SINGLE
SUBCLONE REGIONS

You can set the parameters:

consed.searchFunctionsUseUnalignedEndsOfReads: false
consed.searchFunctionsUseLowQualityEndsOfReads: true

If you set consed.searchFunctionsUseUnalignedEndsOfReads to be false,
then the unaligned ends of a read are not considered to cover the
consensus.

If you set consed.searchFunctionsUseLowQualityEndsOfReads to false,
then the low quality ends of a read are not considered to cover the
consensus.

For example, if the settings are:

consed.searchFunctionsUseUnalignedEndsOfReads: false
consed.searchFunctionsUseLowQualityEndsOfReads: false

then a base in a read is only considered to cover the consensus if it
is both in the aligned portion of the read and the high quality
portion of the read.
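
One way to picture how the two settings combine is the following
Python sketch.  It is my reading of the description above, not code
from Consed, and the function and argument names are hypothetical.

```python
def covers_consensus(in_aligned_part, in_high_quality_part,
                     use_unaligned_ends, use_low_quality_ends):
    """Does a read base count as covering the consensus?

    use_unaligned_ends   ~ consed.searchFunctionsUseUnalignedEndsOfReads
    use_low_quality_ends ~ consed.searchFunctionsUseLowQualityEndsOfReads
    """
    # A base in an unaligned end counts only if unaligned ends are allowed.
    aligned_ok = in_aligned_part or use_unaligned_ends
    # A base in a low quality end counts only if low quality ends are allowed.
    quality_ok = in_high_quality_part or use_low_quality_ends
    return aligned_ok and quality_ok
```

With both settings false, the function returns True only for a base
that is in both the aligned portion and the high quality portion of
the read, which matches the example above.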

22.2)  consedrc vs .Xdefaults

There are a few consed resources that are changed in ~/.Xdefaults
rather than consedrc.  They are mainly colors, fonts, and scale
resources.  Point at the 'Info' menu on the Consed Main
Window, hold down the left mouse button and release on menu item 'Show
Default X Resources'.  This shows you what is available to be changed
by putting in your ~/.Xdefaults file.  

Although most Consed parameters now go into consedrc, there are still
a very few that need to stay in .Xdefaults.  Here is the rule: if the
parameter starts with

consed.

such as

consed.gunzipFullPath: /bin/uncompress

then it goes into consedrc

If the resource (here it is called a "resource" rather than a
"parameter") starts with

consed*

such as

consed*contigwin.background: Black

then it goes in .Xdefaults

For example, if you want to change the background color of Assembly
View, put the following in your ~/.Xdefaults file:

consed*Assembly_View*background: red

Type xrdb -remove

and restart consed
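
If you maintain settings for many users, the rule above (prefix
"consed." goes in consedrc, prefix "consed*" goes in .Xdefaults) is
easy to check mechanically.  A minimal Python sketch, with a
hypothetical function name:

```python
def settings_file_for(line):
    """Return which file a setting line belongs in, or None if neither."""
    stripped = line.strip()
    if stripped.startswith("consed."):
        return "consedrc"
    if stripped.startswith("consed*"):
        return ".Xdefaults"
    return None
```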


22.3)  COLOR BLINDNESS

One person with Red/Green colorblindness (Deutan), found the following
colors helpful:

consed.colorTracesG: Yellow
consed.colorTracesA: forest green
consed.colorTracesC: medium blue
consed.colorTracesT: light coral

Put these in a consedrc in your home directory.


----------------------------------------------------------------------------


23.  CREATING CUSTOM TAG TYPES

You can add your own tag types by creating a file of your custom tag
types.  The file looks like this:

mytag1 red consensus yes
mytag2 purple both yes
mytag3 green read no
 
    field 1 ("mytag1") is the tag name
    field 2 ("red") is the color
    field 3 is "consensus", "read", or "both" depending on which kind of tag
        it is
    field 4 is "yes" or "no" depending on whether the user can add
        this tag in Consed (by swiping) or whether it is a tag that
        can only be viewed in Consed (presumably it would be added by
        some software of yours before the user sees it in Consed).
 
If the file is called "/usr/local/genome/lib/tagTypes.txt", then in
consedrc put the following line:

consed.fileOfTagTypes: /usr/local/genome/lib/tagTypes.txt
so that Consed knows where the file is.

Once you have done this, the user of Consed can add tags of these
types in the method described in TAGS of the Quick Tour (above).
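
If some software of yours generates the tag types file, it may help to
check the file before pointing Consed at it.  The following Python
sketch checks the simple four-field format shown above.  The names are
hypothetical, and the sketch assumes one-word color names (it does not
address color names that contain spaces).

```python
from typing import NamedTuple


class TagType(NamedTuple):
    name: str
    color: str
    applies_to: str      # "consensus", "read", or "both"
    user_can_add: bool


def parse_simple_tag_types(text):
    """Parse lines of the form: name color consensus|read|both yes|no"""
    tag_types = []
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 fields, got {len(fields)}")
        name, color, applies_to, can_add = fields
        if applies_to not in ("consensus", "read", "both"):
            raise ValueError(f"line {number}: bad field 3 {applies_to!r}")
        if can_add not in ("yes", "no"):
            raise ValueError(f"line {number}: bad field 4 {can_add!r}")
        tag_types.append(TagType(name, color, applies_to, can_add == "yes"))
    return tag_types
```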

The list of available colors is found in the file rgb.txt found in
/usr/X11R6/share/X11/rgb.txt on macosx, /usr/lib/X11/rgb.txt or
/usr/share/X11/rgb.txt on Linux or /usr/openwin/lib/rgb.txt on
Solaris.  For more information, consult any X-Windows reference, since
this has nothing to do specifically with Consed.  For your
convenience, here are a few of the color names.  One way to find out
what they look like is to try them:

mint cream              DeepSkyBlue1       DeepPink4         
azure                   DeepSkyBlue2       HotPink1          
alice blue              DeepSkyBlue3       HotPink2          
lavender                DeepSkyBlue4       HotPink3          
lavender blush          SkyBlue1           HotPink4          
misty rose              SkyBlue2           pink1             
white                   SkyBlue3           pink2             
black                   SkyBlue4           pink3             
dark slate gray         LightSkyBlue1      pink4             
dim gray                LightSkyBlue2      LightPink1        
slate gray              LightSkyBlue3      LightPink2        
light slate gray        LightSkyBlue4      LightPink3        
gray                    SlateGray1         LightPink4        


You can also associate data with tags.  For example, you can have a
tag type SNPprobability which gives, at a particular consensus
position, the probability that a base is a SNP.  Thus there needs to
be a floating point number with the tag.  This can be defined in the
same file /usr/local/genome/lib/tagTypes.txt (as above), but instead
of having one line for the tag type (as shown above), it has a more
complicated structure to allow for tag fields:

TAG_TYPE
NAME: tag_type
CONS_OR_READ: both
USER_CAN_ADD: yes
COLOR: color1
FIELD: name type
POINTER_FIELD: name (type of tag pointed to) (optional ?*+)
END_TAG_TYPE

CONS_OR_READ: can be followed by "consensus", "read", or "both".

In FIELD: name type,
"type" is one of integer, floating, or string

In POINTER_FIELD: name (type of tag pointed to) (optional ?*+)

? means 0 or 1 occurrences
* means 0 or more occurrences
+ means 1 or more occurrences
absence of any of these means exactly 1 occurrence

Note that FIELD and POINTER_FIELD can both be present 0 or more times.

Here is an example for a SNP with a probability:

tagTypes.txt contains:

TAG_TYPE
NAME: SNP
CONS_OR_READ: consensus
USER_CAN_ADD: yes
COLOR: yellow
FIELD: probability floating
END_TAG_TYPE

Note that tagTypes.txt is not read by default.  You must set the
consedrc parameter:

consed.fileOfTagTypes: tagTypes.txt

The ace file will then contain SNP tags that look like this:

CT{
Contig1 SNP consed 1863 1870 091030:091242
probability 75.2
}


Here is an example of a user-defined tag type that points to another
copy:

tagTypes.txt:

TAG_TYPE
NAME: tear2
CONS_OR_READ: consensus
USER_CAN_ADD: yes
COLOR: yellow
POINTER_FIELD: other_tear2_tag tear2
END_TAG_TYPE

and these tags look like this in the ace file:

CT{
Contig2 tear2 consed 6470 6470 090714:103935
ID: 5
other_tear2_tag 6
}

CT{
Contig2 tear2 consed 6487 6487 090714:103941
ID: 6
other_tear2_tag 5
}

This means that the tear2 tag with ID 5 refers to the other tear2 with
ID 6, and vice versa.
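
A program of yours that reads these tags back out of the ace file
could start from something like the following Python sketch.  It
handles only CT{ ... } blocks laid out as in the examples above.  The
function name and the dictionary keys are my own labels, not names
used by Consed.

```python
def parse_ct_blocks(text):
    """Parse CT{ ... } consensus tag blocks into dictionaries."""
    tags = []
    lines = iter(text.splitlines())
    for raw in lines:
        if raw.strip() != "CT{":
            continue
        # Header line: contig, tag type, program, start, end, date.
        header = next(lines, "").split()
        if len(header) < 6:
            raise ValueError("incomplete CT header line")
        contig, tag_type, program, start, end, date = header[:6]
        tag = {
            "contig": contig, "type": tag_type, "program": program,
            "start": int(start), "end": int(end), "date": date,
            "fields": {},
        }
        # Remaining lines up to "}" are "name value" pairs
        # (including the "ID: n" line used by pointer fields).
        for body in lines:
            body = body.strip()
            if body == "}":
                break
            if not body:
                continue
            key, _, value = body.partition(" ")
            tag["fields"][key.rstrip(":")] = value.strip()
        tags.append(tag)
    return tags
```

Field values are kept as strings.  Converting "probability" to a
floating point number, or following "other_tear2_tag" to the tag with
that ID, is left to the caller.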


----------------------------------------------------------------------------

24.  EXPANDING CONSED'S CAPABILITIES WITH A LITTLE PROGRAMMING

Lab managers:  Please do not get put off by the title of this
section.  You should read through this section so you are aware of
what consed is capable of.  If you think one of these features would
be very helpful to your lab, then get a programmer to spend a day or
two and write you some scripts that could really help you out.  But
first you need to be aware of what is possible.  So read through this.

24.1)  BRINGING UP CONSED FROM A SCRIPT

Suppose that you want to write a script that brings up consed on one
ace file to a particular position, and then brings up consed on
another ace file at a particular position, and then brings up consed
on another ace file at a particular position, ... you can do this by:

consed -ace (name of ace file) -mainContigPos (unpadded pos)

This will bring up consed with the main contig (the contig with the
most reads) with the Aligned Reads Window already up and
scrolled to position (unpadded pos).

Thus you could write a script like this:

cd directory1
consed -ace file1.ace -mainContigPos 1050
cd directory2
consed -ace file2.ace -mainContigPos 2057
cd directory3
consed -ace file3.ace -mainContigPos 1487
.
.
.
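
The same idea can be written as a loop over a list of positions.  The
following Python sketch uses only the -ace and -mainContigPos options
shown above.  The function names are hypothetical.  Note one
difference from the script above: each directory is taken relative to
wherever the Python script was started, rather than relative to the
previous "cd".

```python
import subprocess


def consed_command(ace_file, unpadded_pos):
    """Build the consed command line for one ace file and position."""
    return ["consed", "-ace", ace_file, "-mainContigPos", str(unpadded_pos)]


def review_positions(stops, run=subprocess.run):
    """stops: iterable of (directory, ace file, unpadded position)."""
    for directory, ace_file, position in stops:
        # cwd= plays the role of "cd directory".  Each consed must exit
        # before the next one is started.
        run(consed_command(ace_file, position), cwd=directory, check=True)
```

For example:

    review_positions([("directory1", "file1.ace", 1050),
                      ("directory2", "file2.ace", 2057)])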

24.2)  CONTROL OF CONSED FROM SOME OTHER PROGRAM

Consed can be controlled by some other program.  For example, you
might have a program that displays mapping data and you would like the
user to be able to click on a location and have Consed come up showing
the bases in that region.  This feature allows a programmer to do
this.

Here is an example of how to do this:

24.3)  In this case we need a private copy of the dataset called "standard"
(see GETTING YOUR OWN COPY OF A SAMPLE DATASET above).

24.4)  Then Type:

cd standard/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

Start consed as follows:

consed -socket 5432 -ace standard.fasta.screen.ace.1

If a window pops up asking if you want to apply edits, answer "no".

Open another xterm and cd to standard/edit_dir

From this directory, run the script testSocket.perl (thanks to Bill
Gilliland) which is supplied with consed in the scripts directory of
the consed distribution.

This script will say:

issuing command Scroll Contig1 100
waiting for you to type a command (such as Scroll Contig1 150)...

You should immediately see consed's Aligned Reads Window open and
scroll automatically to position 100.

If you were to type (in the window in which testSocket.perl is
running):

Scroll Contig1 150

then you would see Consed immediately scroll to position 150.


Here are the details of how you could use it:

The external program can start up Consed as follows:

consed -socket (local port number) -ace (ace filename)

For example,

consed -socket 5432 -ace standard.fasta.screen.ace.1

After Consed has finished starting up (including your answer to
whether you want to apply edits), you will see this message in the
xterm:

success bind to local port number: 5432

And then you will see a file created by Consed in the default
directory (which is usually the directory the ace file is in) called
consedSocketLocalPortNumber

This gives the port number of the Berkeley socket that Consed has
opened and is listening on.  Thus your program can read this file and
create a connection to the Berkeley socket created by Consed.

Once the connection is established, your program can send commands to
Consed at that socket indicating to Consed which contig to display and
what consensus position to scroll to.  Currently, the only acceptable
commands are:

Scroll (contigname) (consensus position)<return>
PopupTraces (read name) (unpadded read position in the direction of sequencing)<return>

'Unpadded read position in the direction of sequencing' is the
position from the right end, if the read is a bottom strand read.

Just send such a command to the Berkeley socket, and Consed will
respond appropriately.  (Currently, Consed doesn't like it if another
process establishes a connection and then terminates without first
terminating the connection.)
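
A client for this interface might look like the following Python
sketch.  It is not the testSocket.perl script and has not been tested
against consed.  It assumes that consedSocketLocalPortNumber contains
the port number as plain text, that <return> means a newline
character, and that consed is running on the same machine.  The
function names are made up for this example.

```python
import socket

PORT_FILE = "consedSocketLocalPortNumber"  # written by consed (see above)


def read_port(port_file=PORT_FILE):
    """Read the port number consed wrote (assumed to be plain text)."""
    with open(port_file) as handle:
        return int(handle.read().split()[0])


def scroll_command(contig, consensus_pos):
    """Format a Scroll command; <return> is assumed to be a newline."""
    return "Scroll {} {}\n".format(contig, consensus_pos).encode("ascii")


def popup_traces_command(read_name, read_pos):
    """Format a PopupTraces command for an unpadded read position."""
    return "PopupTraces {} {}\n".format(read_name, read_pos).encode("ascii")


def send_commands(port, commands, host="localhost"):
    """Send commands over one connection, then close it cleanly."""
    with socket.create_connection((host, port)) as connection:
        for command in commands:
            connection.sendall(command)
```

For example, send_commands(read_port(), [scroll_command("Contig1", 150)])
would ask consed to scroll Contig1 to position 150.  The "with" block
closes the connection before the program exits, which is what consed
prefers.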


24.5)  REMOVING READS IN BATCH

Consed can remove reads from the command line (without the graphical
interface) as follows:

consed -ace (ace file) -removeReads (file with reads to remove)

consed -ace (ace file) -removeContigs (file with contigs to remove)

consed -ace (ace file) -selectContigs (file with contigs to keep)

(Optionally, -newacefile (new ace file) can be used to specify the
name of the new ace file.)

consed -removeReads, consed -selectContigs, and consed -removeContigs
do not bring up the graphical interface.  What is done with the reads
is governed by the consedrc parameter
consed.removeReadsWhatToDoWithReads: which can have values
removeTogether, delete, or eachIntoOwnContig

There are a number of other consedrc parameters:
consed.removeReadsMakeCustomNavigationFileWhereConsensusRecalculated: false
consed.removeReadsWhatToDoIfZeroDepthRegions: 
which can have values break or nobreak

! When removing reads, what should happen if removing a read causes a
! contig to have a location that is zero depth of coverage.  Options
! are: a) break (to break the contig into several new contigs that
! have nonzero depth of coverage), b) nobreak (to leave the contig in
! one piece with a new 0 depth of coverage region)

consed.removeReadsRecalculateConsensus: true

! when using consed -removeReads, use this to determine whether
! to recalculate the consensus bases.  When using gui, ask user.
! If you will be allowing contigs to break, 
! then the consensus will be recalculated regardless of the setting of
! consed.removeReadsRecalculateConsensus


consed.removeReadsWhatToDoWithUnalignedReads: allIntoOneContig

! options are allIntoOneContig or eachIntoOwnContig
! allIntoOneContig only applies when specifying that contigs
! will be broken apart where there are no reads, i.e.,
! consed.removeReadsWhatToDoIfZeroDepthRegions: break
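
A program that decides which reads or contigs to remove can write the
list file and build the consed command itself.  The Python sketch
below is one way to do that.  It assumes the list file holds one name
per line, which is not stated above, and the function names are
invented for this example.

```python
def write_name_list(path, names):
    """Write one read or contig name per line (assumed file format)."""
    with open(path, "w") as handle:
        for name in names:
            handle.write(name + "\n")


def batch_command(ace_file, option, list_file, new_ace_file=None):
    """Build a consed batch command; option is one of those listed above."""
    allowed = ("-removeReads", "-removeContigs", "-selectContigs")
    if option not in allowed:
        raise ValueError("unsupported option: " + option)
    command = ["consed", "-ace", ace_file, option, list_file]
    if new_ace_file is not None:
        # -newacefile is optional (see above)
        command += ["-newacefile", new_ace_file]
    return command
```

The returned list can be passed to subprocess.run() to run consed
without the graphical interface.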


24.6)  COMPLEMENTING CONTIGS IN BATCH

Consed can complement contigs from the command line (without the
graphical interface) as follows:

consed -ace (ace file) -complementContigs (file with list of contigs
to complement)

(Optionally, -newacefile (new ace file) can be used to specify the
name of the new ace file.)


24.7)  HOW TO WRITE A CUSTOM NAVIGATION FILE

In the Main Window, there is also a Navigate menu.  Pull it down and
release on the Custom Navigation menu item.  A box will pop up saying
'Select custom navigation file:'  
There will be a file:
custom_navigation.nav
Double click on it.

You will see the now-familiar custom navigation box.  Click 'Next'
repeatedly until you get to the end of the list.

Consed doesn't write such a file--it just reads it.  This feature
lets you write your own programs that select locations that you want
your finishers to examine.  Your program writes a file, the user reads
that file into Consed in this manner, and then steps through each of
the locations.

The format of the file is as follows:

There is a title (optional) line that looks like this:

TITLE: low quality base in discrepant region

and then there are blocks that look like this:

BEGIN_REGION
TYPE: READ
READ: B11_hs1-60153193_GGor_050426.f
UNPADDED_READ_POS: 34 34
COMMENT: a comment
END_REGION

The block above refers to read position 34 of read
B11_hs1-60153193_GGor_050426.f.  Even if this read is complemented in
the assembly (it is right to left), this position refers to the base
position in the direction of sequencing--same as the position within
the PHD file.


There is another kind of block:

BEGIN_REGION
TYPE: CONSENSUS
CONTIG: hs21-15002178_HSap-Contig
UNPADDED_CONS_POS: 1774 1784
COMMENT: another comment
END_REGION

which refers to a position on the consensus.  Notice that it is
missing the "READ:" line, the TYPE: line is different, and instead of
"UNPADDED_READ_POS" it has "UNPADDED_CONS_POS".  When
someone is navigating, the blinking cursor will be put onto the
consensus (with the second kind of block) rather than the blinking
cursor on the read (with the first kind of block).

You might want to specify the consensus positions in terms of some
user-defined positions (the first position of the consensus is not 1
but rather is some other number).  For example, you might want to use
chromosome positions, rather than the position within the contig.  You
can let Consed know that the UNPADDED_CONS_POS numbers are
user-defined positions by putting the words "user-defined positions"
somewhere in the TITLE line like this:

TITLE: low quality base in discrepant region (user-defined positions)

So that Consed knows what number to start numbering the consensus at,
you must have a startNumberingConsensus tag on the consensus or a read
indicating the user-defined position of the left-end of the contig.
See USER-DEFINED CONSENSUS POSITIONS in this document.

There is a 3rd type of block that you probably won't use much.  It is
used when you know the consensus position within a read, but not the
read position.  Then you can use:

BEGIN_REGION
TYPE: READ
CONTIG: hs2-105068850_HSap-Contig
READ: E02_hs2-105068850_PTro_040520.f
UNPADDED_CONS_POS: 295 299
COMMENT: left 2
END_REGION

The block above refers to a position on read
E02_hs2-105068850_PTro_040520.f in contig hs2-105068850_HSap-Contig at
consensus positions 295-299.

There is a 4th type of block that you will probably not use much,
either.  It is when you know the *padded* consensus position (the
position that includes *'s).  This is especially useful when you want
the user to navigate to a particular pad in the consensus and there
are several pads in a row.  The padded position is the only way to
unambiguously say which pad you are interested in.


BEGIN_REGION
TYPE: CONSENSUS
CONTIG: Contig138464
PADDED_CONS_POS: 512330 512330
COMMENT: padded 512330 512330
END_REGION

(Note the comment can be anything.)
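
A program that generates such a file only has to print these blocks.
The following Python sketch writes the read and consensus block types
shown above.  The function names are invented for this example, and
the blocks are written back to back because this document does not say
whether blank lines between blocks are accepted.

```python
def read_region(read, start, end, comment):
    """Block for a read, positions in the direction of sequencing."""
    return ("BEGIN_REGION\n"
            "TYPE: READ\n"
            "READ: {}\n"
            "UNPADDED_READ_POS: {} {}\n"
            "COMMENT: {}\n"
            "END_REGION\n").format(read, start, end, comment)


def consensus_region(contig, start, end, comment, padded=False):
    """Block for a consensus range, unpadded unless padded=True."""
    key = "PADDED_CONS_POS" if padded else "UNPADDED_CONS_POS"
    return ("BEGIN_REGION\n"
            "TYPE: CONSENSUS\n"
            "CONTIG: {}\n"
            "{}: {} {}\n"
            "COMMENT: {}\n"
            "END_REGION\n").format(contig, key, start, end, comment)


def write_navigation_file(path, title, regions):
    """Write the TITLE line followed by the region blocks."""
    with open(path, "w") as handle:
        handle.write("TITLE: {}\n".format(title))
        handle.write("".join(regions))
```

For example, consensus_region("Contig138464", 512330, 512330,
"padded 512330 512330", padded=True) produces the 4th type of block
shown above.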


24.8)  Consed can start up with your custom navigation file already loaded
and displayed.  To illustrate this, do the following exercise.

24.9)  In this case we need a private copy of the dataset called "standard"
(see GETTING YOUR OWN COPY OF A SAMPLE DATASET above).

24.10)  Then Type:

cd standard/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

24.11)  Start consed:

consed -ace standard.fasta.screen.ace.1 -nav custom_navigation.nav

You will see consed come up with the custom navigation window visible
and loaded.

Suppose that you want to review, say, 100 different ace files, each
with a computer-generated list of locations.  Using this -nav feature,
this is easy:  you write a script that has:

cd (next ace file location)
consed -ace (ace file) -nav (custom navigation file)
cd (next ace file location)
consed -ace (ace file) -nav (custom navigation file)
cd (next ace file location)
.
.
.

The only thing the user need do is click "next", "next", "next",
... and finally "quit" and automatically the next ace file is brought
up with the next custom navigation file already loaded and visible.

If you want the user to see the traces at each position (this assumes
you are using Sanger reads), then you can set the following in your
consedrc file (see CONSED CUSTOMIZATION):

consed.navigateAutomaticTracePopup: true

Warning:  if the user inserts and deletes bases from the consensus
sequence, all downstream positions will be changed.

24.12)  COMPRESSING CHROMATOGRAMS

If you are interested in compressing your chromatogram files, go into
chromat_dir and gzip one of the chromatogram files.  Make sure that
gunzip is in /usr/local/bin   (You can change this location via the
Consed parameter

consed.gunzipFullPath: /usr/local/bin/gunzip

--see CONSED CUSTOMIZATION (above), but it will be easiest for 
you and your users if you just put gunzip (or a link to it) in
/usr/local/bin and not have to bother with Consed parameters.)

Restart Consed and bring up the corresponding trace.  You will notice
no appreciable delay.


24.13)  READING CHROMATOGRAMS OUT OF AN EXTERNAL DATABASE

Normally, chromatograms are kept in ../chromat_dir.  If you want to
keep them somewhere else (such as in an external database), you can do
that.  When the chromatogram is needed (when the user asks to view a
trace), Consed will call an external program, passing it the name of
the read required, and then look for the chromatogram in /tmp (by
default).  It will read the chromatogram and then delete it.  Use the
parameters:

consed.alwaysRunProgramToGetChromats: true
consed.programToRunToGetChromats: /usr/local/bin/programToGetChromat

In this case, "programToGetChromat" is the name of the program that
gets the chromatogram and puts it into /tmp.

If you keep *some* chromats in an external database but *some*
chromats are in ../chromat_dir, then set

consed.alwaysRunProgramToGetChromats: last

which means it will first look in ../chromat_dir and, if it doesn't
find it, it will then run the program to get the chromats.
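
As an illustration, a stand-in for "programToGetChromat" could be as
simple as the Python sketch below.  It copies the chromatogram from a
directory rather than querying a real database.  The store directory
is hypothetical, and the sketch assumes that consed passes the read
name as the only argument and expects a file of that same name in
/tmp; check both against your own setup.

```python
import os
import shutil
import sys

STORE_DIR = "/data/chromat_store"  # hypothetical location of your chromatograms
OUTPUT_DIR = "/tmp"                # where consed looks by default (see above)


def fetch_chromat(read_name, store_dir=STORE_DIR, output_dir=OUTPUT_DIR):
    """Copy one chromatogram into output_dir and return the new path."""
    source = os.path.join(store_dir, read_name)
    if not os.path.isfile(source):
        raise FileNotFoundError("no chromatogram for " + read_name)
    destination = os.path.join(output_dir, read_name)
    shutil.copyfile(source, destination)
    return destination


def main(argv):
    """Entry point: argv[1] is assumed to be the read name."""
    if len(argv) != 2:
        sys.stderr.write("usage: programToGetChromat READNAME\n")
        return 1
    fetch_chromat(argv[1])
    return 0
```

To install this as a script, add a last line that calls
sys.exit(main(sys.argv)).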


24.14)  COMPRESSING ACE FILES AND PHD BALLS

Consed can read and write compressed ace files and phd balls (by
default, using gzip and gunzip).  To see this, just compress an ace
file:

24.15)  Type:

cd standard/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)


24.16)  Type:
gzip -c standard.fasta.screen.ace.1 >somethingelse.ace.gz


24.17)  And then you can bring up consed like this:

consed -ace somethingelse.ace.gz

Consed can also *write* compressed ace files.  When you are saving the
ace file and consed asks you for the filename, just append a ".gz" and
consed will know to compress the file.  (See "SAVING THE ASSEMBLY"
above.)

In tests that I did, this did not appear to have any performance
improvement.  I believe the reason is that consed is not I/O
bound--when starting up, reading the ace file is a very small portion
of the time.  In fact, you will notice that when consed is starting
up, it uses 100% of the cpu.

You can also compress phd balls--just:

cd ../phdball_dir

and gzip the phd balls.  Even though the ace file refers to the
uncompressed file name, consed is smart enough to try the .gz version
if it can't find the original.  (Try it and see.)  Phd balls are
usually even larger than ace files so compressing them will give you
even more improvement in disk space usage.
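
If your pipeline is written in Python, the compression step can be
done without calling gzip.  This sketch uses only the standard gzip
module and does the same job as the "gzip -c" command above.  The
function name is made up for this example.

```python
import gzip
import shutil


def gzip_copy(source, destination):
    """Write a gzip-compressed copy of source, leaving source in place."""
    with open(source, "rb") as plain, gzip.open(destination, "wb") as packed:
        shutil.copyfileobj(plain, packed)
```

For example, gzip_copy("standard.fasta.screen.ace.1",
"somethingelse.ace.gz") creates the compressed ace file used above.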


24.18)  NO PHD FILES

Try bringing up Consed like this:

consed -nophd

This mode allows you to view an assembly when you don't have phd files
or chromatograms but you only have the ace file.  I do not recommend
nor support this option!  There are so many things that do not work
with this option that I haven't bothered to keep track of them, but
here are a few items: can't make joins, can't recalculate consensus
quality, can't view traces, can't edit, autofinish will not give good
results, can't view quality of the read bases, ...


24.19)  ADDING TAGS FROM OTHER PROGRAMS

You can also write external programs that add tags to the ace file
and/or the phd files.  Both RT (read) and CT (consensus) tags can be
appended to the end of the ace file.  BEGIN_TAG tags can be appended
to the end of the phd files.  Do not rewrite the ace file or the phd
file--there is no need to do so and it will cause problems.  See
SAMPLE PHD BALL FORMAT in this document for the format of BEGIN_TAG
read tags.
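
Here is a sketch, in Python, of a program that appends a consensus
(CT) tag to an ace file.  The tag layout follows the CT format
described under ACE FILE FORMAT in this document.  The function names
are invented for this example, and the blank line written before the
tag is an assumption.  The file is opened in append mode, so the
existing contents are never rewritten.

```python
import time


def tag_timestamp(when=None):
    """Return a timestamp in the YYMMDD:HHMISS form used by ace tags."""
    return time.strftime("%y%m%d:%H%M%S", when or time.localtime())


def consensus_tag(contig, tag_type, program, start, end, timestamp,
                  extra_lines=()):
    """Format one CT tag; start and end are padded consensus positions."""
    lines = ["CT{", "{} {} {} {} {} {}".format(
        contig, tag_type, program, start, end, timestamp)]
    lines.extend(extra_lines)  # e.g. the data line of the tag, if any
    lines.append("}")
    return "\n".join(lines) + "\n"


def append_tag(ace_path, tag_text):
    """Append a tag to the end of the ace file without rewriting it."""
    with open(ace_path, "a") as handle:
        handle.write("\n" + tag_text)
```

A typical call is append_tag(ace_path, consensus_tag("Contig1",
"polymorphism", "myProgram", 1308, 1315, tag_timestamp())), where
"myProgram" stands for the name of your own program.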


24.20)  CHROMOSOME POSITIONS/USER-DEFINED CONSENSUS POSITIONS

Suppose instead of labeling the consensus 1, 2, 3, 4, ..., you want,
for example, to number it: 100,000,001, 100,000,002, 100,000,003,
100,000,004, etc. (e.g., in chromosome positions).  You can do this.
Note that all bases in the consensus (except pads) will be
numbered--you cannot, for example, only number exon bases and not
number intron bases (pity).

These user-defined consensus positions will apply not only to the
consensus scale in the Aligned Reads Window, but also to all of the
Navigate lists and Search for String.

To start numbering the consensus at a number different from 1, add a
"startNumberingConsensus" tag to either the consensus or a read in
that contig.  The tag will look like this (this is a consensus tag in
the ace file):

CT{
hs18-25105605_HSap-Contig startNumberingConsensus consed 1 1 041123:152840
25105605
}

This says that the consensus will be numbered starting at 25,105,605

You cannot add such a tag by using Consed--you must have a program add
it to the ace file (or a phd file of one of the reads in the contig).

You can switch between labeling positions 1, 2, 3, ... and labeling
positions by chromosome or user-defined positions as follows:

Point to a "Misc" menu, hold down the left mouse button, and release
on "Turn On/Off User-Defined Consensus Scale Numbers".  
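
If your own programs need to convert between the two numbering
schemes, the arithmetic is simple.  This Python sketch assumes, as in
the example above, that the tag sits on the first base of the contig
and that the numbering goes up by one for each unpadded base.  The
function names are made up for this example.

```python
def to_user_defined(unpadded_pos, start_number):
    """Convert a 1-based unpadded consensus position to a user-defined one."""
    return start_number + unpadded_pos - 1


def to_unpadded(user_pos, start_number):
    """Convert a user-defined position back to a 1-based unpadded position."""
    return user_pos - start_number + 1
```

With the tag above, to_user_defined(1, 25105605) is 25105605, the
number given to the first consensus base.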

24.21)  DEFINING KEYS (HOTKEYS) TO CALL EXTERNAL PROGRAMS AND/OR APPLY TAGS AND/OR
 INTEGRATE CONSED WITH EXTERNAL DATABASES

[CUSTOM KEYS, USER-DEFINED KEYS]

You can define keys (such as Control-N) to apply a particular tag to a
single base, saving you the several steps in applying tags: swiping
and selecting a tag type (as shown under "TAGS" above).  However, it
is even more powerful than that.  You can also define an external
program to run when you type this key.  That external program can be
your own, and it could be, for example, a program that puts
information into an external database.

The first thing you need to set up a custom hotkey is a consedrc
file which goes in edit_dir of the project you're working on (see
above CONSED CUSTOMIZATION for other possible locations).

Put the following in that file:

consed.userDefinedKeys: 14 15
! make a space-separated list of the decimal ASCII values of the keys
! 14 means control-N, 15 means control-O
 
consed.programsForUserDefinedKeys: /bin/echo /bin/echo
! a space-separated list of the full pathnames of the commands to run
 
consed.argumentsToPassToUserDefinedPrograms: argument_for_first_key argument_for_second_key
! a space-separated list of the arguments to pass to each user-defined program
 
consed.tagsToApplyWithUserDefinedKeys: none polymorphismConfirmed
! a space-separated list of the tag types to apply when the user
! presses a user-defined key.  If a key is to have no associated tag,
! then enter "none" for that key.


This makes control-N and control-O ("oh"--not zero) call "/bin/echo"
by default.  In either the aligned reads window or the trace window,
click the cursor on a base and try these keys (e.g., holding down the
control key and typing 'o').  Watch in the xterm where you started
Consed for output like this:
 
argument_for_first_key djs74-561.s1 97 Contig1 2534 2581 a 51 /kw3/gordon/consed_demo/standard/edit_dir/standard.fasta.screen.ace.1 tr.window
argument_for_second_key djs74-2679.s1 78 Contig1 2527 2574 c 39 /kw3/gordon/consed_demo/standard/edit_dir/standard.fasta.screen.ace.1 a.r.window
 
djs74-561.s1 the read the user was viewing (or "consensus" if the
   cursor is on the consensus)
97 the base position in the direction of sequencing (or -1 if the
   cursor is on the consensus)
Contig1 the contig
2534 the unpadded consensus position
2581 is the padded (counts *'s) consensus position
'a' is the base
51 is the quality of the base
/kw3/gordon/consed_demo/standard/edit_dir/standard.fasta.screen.ace.1 is the ace file 
tr.window means it was called by the user pushing the key in the trace
window--not the aligned reads window.


It's the same as if you had run the 
program from the shell, with command-line arguments, like this:
 
bash%: /bin/echo argument_for_first_key djs74-561.s1 97 Contig1 2534 2581 a 51 /kw3/gordon/consed_demo/standard/edit_dir/standard.fasta.screen.ace.1 tr.window
 
You will also see that control-O will automatically add a
polymorphismConfirmed tag, but control-N will not add any tag.  That
is because of consed.tagsToApplyWithUserDefinedKeys (see above).
 
Several groups that are doing polymorphism detection have expressed
interest in this feature because it enables them to have Consed
directly write into an external database (e.g., Oracle or Sybase) by
calling a program that then writes to the database.
 
You can use these hotkeys from within the trace window or the aligned
reads window.
You are not limited to control-N and control-O.  For instance, 1 is
control-A, 2 is control-B, 3 is control-C, 4 is control-D, etc.

If you want to pass this information to a database, you will need to
know how to talk to your database, and either have your hotkey run a
program that writes to it directly, or call another program that takes
the parameters above and massages them into the format your database
needs.

control-A, control-E, and control-T already mean something in the
aligned reads window, so those keys cannot be defined to be anything
else.  Typically control-C, control-S, and control-Q already mean
something to the operating system so you can't use those, either.
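
A program called by a hotkey has to pick these arguments apart before
it can do anything with them.  The Python sketch below names the ten
arguments in the order shown in the example output above.  It assumes
that the configured argument for the key is a single word, and the
field names are invented for this example.

```python
FIELDS = (
    "custom_argument", "read", "read_pos", "contig", "unpadded_cons_pos",
    "padded_cons_pos", "base", "quality", "ace_file", "window",
)


def parse_hotkey_args(args):
    """Turn the ten arguments consed passes into a dictionary."""
    if len(args) != len(FIELDS):
        raise ValueError("expected {} arguments, got {}".format(
            len(FIELDS), len(args)))
    record = dict(zip(FIELDS, args))
    # read_pos is -1 when the cursor is on the consensus (see above)
    for key in ("read_pos", "unpadded_cons_pos", "padded_cons_pos", "quality"):
        record[key] = int(record[key])
    return record
```

In a real script you would call parse_hotkey_args(sys.argv[1:]) and
then hand the resulting dictionary to your database code.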
 

24.22)  READ PREFIXES

You can create a file called readPrefixes.txt in edit_dir.  This file
contains a list of reads and prefixes for those reads.  In the Aligned 
Reads Window, the Consed user will see those read prefixes in a column 
before the read names.  This can be a very helpful feature for
finishers.  For example, these read prefixes can indicate to the
finishers which templates are available to use for making finishing
reads.

The format of the file is:

(readname) (read prefix) (color for read prefix)

The read prefix and color for read prefix are optional.  If you
leave them out, you get '*' for the read prefix in blue.


The consed parameters involving this feature are:

consed.defaultReadPrefix: *
consed.readPrefixesFile: readPrefixes.txt
consed.maxCharsDisplayedForReadPrefix: 1

but you probably won't need to change them.
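
Since readPrefixes.txt is normally generated by a program (for
example, from a template database), here is a Python sketch that
writes one.  It assumes the fields are separated by single spaces.
Which color names consed accepts is not listed here, so any color you
pass is your own choice to check.  The function names are made up for
this example.

```python
def prefix_line(read_name, prefix=None, color=None):
    """Format one line: read name, then optional prefix and color."""
    fields = [read_name]
    if prefix is not None:
        fields.append(prefix)
        if color is not None:
            fields.append(color)
    return " ".join(fields)


def write_read_prefixes(path, entries):
    """Write (read_name, prefix, color) tuples, one line per read."""
    with open(path, "w") as handle:
        for read_name, prefix, color in entries:
            handle.write(prefix_line(read_name, prefix, color) + "\n")
```

An entry such as ("djs74-561.s1", None, None) writes only the read
name, so that read gets the default prefix.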


24.23)  USING FILES CREATED ON WINDOWS OR WINDOWS NT.  

Don't.  (E.g., phd files generated by a Beckman CEQ-2000.)  These
files initially had <CR><LF> at end of line instead of <LF>.  CONSED
chokes every time it tries to read something from these phd files.
If you must use these files, you must first convert them to UNIX
format, which means stripping out the CR's and just having \n (decimal 10)
separate lines.
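
The conversion itself is short.  This Python sketch reads a file as
bytes and replaces each <CR><LF> pair with a single <LF>.  The
function names are invented for this example.  It writes to a new file
rather than overwriting the original.

```python
def to_unix_newlines(data):
    """Replace every CR LF pair in a bytes object with a single LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(source, destination):
    """Write a copy of source with Unix line endings."""
    with open(source, "rb") as handle:
        data = handle.read()
    with open(destination, "wb") as handle:
        handle.write(to_unix_newlines(data))
```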

24.24)  CREATING YOUR OWN ACE FILES (INSTEAD OF ACE FILES CREATED BY
 PHRAP)

Some people have created their own ace files, tried Consed on them,
and, because Consed started up ok, did not understand why some feature
in Consed failed later.  This is because Consed does not check
everything about an ace file when it starts up.  If you are going to
write software to create ace files, here is a partial list of Consed
features you should check before you think your ace files are fine for
Consed:

assembly view
restriction digest
read all traces
complement contig and then read all traces
add new reads
 
If all of these work properly, then your ace files are probably ok.


24.25)  CONSED OPTIONS

You've seen quite a few consed options, such as -removeReads, -socket,
-ace, -nophd, -removeContigs, etc.

To see them all, type 
consed -help


--------------------------------------------------------------------------

25.  MONITORS AND MICE FOR CONSED

If your monitor is part of a Unix computer (a Linux box, a Mac or a
Sun) or is an Xterminal, then you will probably have no problem.

If your monitor is a PC running Windows (any flavor), then you must
have an X emulator installed and running.  X emulators include:
Exceed, XWin32, Reflection X, and OpenNT.  Any of these will work if
configured correctly (and the 'correctly' is the key).  I encourage
you to use single window mode (where there is one huge unix window
with xterms inside it) and then use a Unix window manager such as CDE,
fvwm, or mwm.

If your monitor is a MAC with macosx running, see NOTE TO MACOSX
USERS (above).

Whatever your monitor, you must have a 3 button mouse or 3 button
emulation.  3 Button emulation is tricky since Consed uses all 3
buttons of the mouse and it also uses Control-Middle-Mouse-button,
Shift-Middle-Mouse-Button and Control-Right-Mouse-Button.  So if you
are going to try to just use a 2 button mouse (or, God-forbid, a 1
button mouse), you should make sure that you can emulate each of
those.  Often, if you push the left and right mouse buttons at the
same time, your X server will interpret that to be the middle mouse
button.  But you must consult your X emulator or X server to know what
it will do--that is out of Consed's control.


--------------------------------------------------------------------------

26.  ACE FILE FORMAT


Note that consed really requires both an ace file and a phd ball to
fully function.  If you are trying to write files that consed can
read, I strongly urge you to write both files.  Read the next section
about phd balls. 


Refer to the accompanying sample_ace_file.txt (below)

AS <number of contigs> <total number of reads in ace file>

CO <contig name> <# of bases> <# of reads in contig> <# of base segments in contig> <U or C>

This defines the contig.  The U or C indicates whether the contig has
been complemented from the way phrap originally created it.  Thus this
is always U for an ace file created by phrap.

The contig sequence follows.  It includes pads--"*" characters which
are inserted by phrap in order to make room for some read that has an
extra base at that position.  (Note: any position which counts the *'s is
referred to as a "padded position".  A position that does not count
*'s is referred to as "unpadded position".)  The contig sequence must
be followed by a blank line.

BQ

This starts the list of base qualities for the unpadded consensus
bases.  (NB: annoyingly, no qualities are given for *'s in the
consensus.)  The contig is the one from the previous CO, hence no name
is needed here.  The list of base qualities must be followed by a
blank line.


AF <read name> <C or U> <padded start consensus position>

This defines the location of the read within the contig.
C or U means complemented or uncomplemented.  
<padded start consensus position> means the position of the
beginning of the read, in terms of consensus bases which start at 1
and do count *'s.  
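
To make the AS, CO, and AF lines concrete, here is a Python sketch
that parses each of them.  It is not how consed reads the file, only an
illustration of the fields.  It assumes that names contain no spaces,
and the function names are made up for this example.

```python
def parse_as(line):
    """Parse 'AS <number of contigs> <total number of reads>'."""
    tag, contigs, reads = line.split()
    if tag != "AS":
        raise ValueError("not an AS line: " + line)
    return int(contigs), int(reads)


def parse_co(line):
    """Parse a CO line into a dictionary of its five fields."""
    tag, name, bases, reads, segments, orientation = line.split()
    if tag != "CO":
        raise ValueError("not a CO line: " + line)
    return {
        "name": name,
        "padded_bases": int(bases),
        "reads": int(reads),
        "base_segments": int(segments),
        "complemented": orientation == "C",
    }


def parse_af(line):
    """Parse 'AF <read name> <C or U> <padded start consensus position>'."""
    tag, read, orientation, start = line.split()
    if tag != "AF":
        raise ValueError("not an AF line: " + line)
    # the start can be zero or negative when a read hangs off the left end
    return read, orientation == "C", int(start)
```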

BS <padded start consensus position> <padded end consensus position> <read name>

The BS line (base segment) indicates which read phrap has chosen to be
the consensus at a particular position.

BS lines are now optional since they don't make much sense for
assemblers other than phrap.  (It is also possible to have contigs
with no reads in them.)  If you don't have any BS lines, there must be
a blank line between the most recent AF line and the next QA line.

If you choose to write BS lines, I suggest you choose any read
which matches the consensus perfectly over the stretch of bases.
There must not be any two BS lines that intersect.  Each unpadded base
must be included in some BS line.

RD <read name> <# of padded bases> <# of whole read info items> <# of read tags>
Below RD is the sequence of bases for the read.  The sequence includes
*'s and is in the orientation that phrap needed to align it against
the consensus (thus it might be complemented from the direction it was
sequenced).  

QA <qual clipping start> <qual clipping end> <align clipping start> <align clipping end>

This line indicates which part of the read is the high quality segment
(if there is any) and which part of the read is aligned against the
consensus.  These positions are offsets (and count *'s) from the left
end of the read (left, as shown in Consed).  Hence for bottom strand
reads, the offsets are from the end of the read.  The offsets are
1-based.  That is, if the left-most base is in the aligned,
high-quality region, <qual clipping start> = 1 and <align clipping
start> = 1 (not zero).  If the entire read is low quality, then <qual
clipping start> and <qual clipping end> will both be -1.  phrap will
sometimes make a QA line in which both align clipping positions are
-1.  This means that the read is completely unaligned and shouldn't be
there at all.  (Sorry!  I know the read shouldn't even be in the ace file.)

DS CHROMAT_FILE: <name of chromat file> PHD_FILE: <name of phd file> TIME: <date/time of the phd file> CHEM: <prim, term, unknown, etc> DYE: <usually ET, big, etc> TEMPLATE: <template name> DIRECTION: <fwd or rev>

This line must contain information that matches the phd file.  If you
are writing an ace file, pay particular attention to this line.  Make
sure that Consed can read your ace file without reporting any errors.

For next-gen reads, without chromats or phd files, the DS lines look
like this:

DS VERSION: 1 TIME: Wed Dec 24 11:21:50 2008 CHEM: solexa

with just these 3 pieces of information: VERSION, TIME, and CHEM


There can be additional information on this line.
This replaces the DESCRIPTION line from the old ace file.

The following is for transient read tags (those generated by
cross_match and phrap).  

RT{
<read name> <tag type> <what program created tag> <padded read pos start> <padded read pos end> <date when tag was created in form YYMMDD:HHMISS>
}

for example:

RT{
djs14_680.s1 matchElsewhereLowQual phrap 904 933 990823:114356
}

The padded read pos in the RT tag is from the left end of the read
(the end with the smallest consensus position) in the assembly,
regardless whether the read is complemented or not.

There are consensus tags now in the ace file.  All consensus tags have
the following format:

CT{
<contig name> <tag type> <what program created tag> <padded cons pos start> <padded cons pos end> <date when tag was created in form YYMMDD> <NoTrans>
(possibly additional information)
}

The NoTrans is optional--it indicates that, when you reassemble, this
tag should not be transferred to the new assembly.  This is true with
tags that should be recreated each time because they have to do with
the assembly (e.g., repeat tags).

e.g.,

CT{
Contig206 repeat tagRepeats.perl 118732 119060 990823:115033 NoTrans
AluY
}
 
In the case of most consensus tag types, there is only 1 line for the
consensus tag.  In the case of comment tags and oligo tags, there are
additional lines of information.  The comment tag includes the comment
on the additional lines.  The oligo tag has the following information:
<oligo name> <oligo bases from 5' to 3'> <melting temp> <C or U
indicating whether the oligo is top strand or bottom strand relative
to the orientation of the contig as created by phrap>

Tags with comments look like this:

CT{
Contig1 polymorphism consed 1308 1315 080819:132640
COMMENT{
this is comment line 1
this is comment line 2
C}
}

where the comment block starts with 
COMMENT{ 
and ends with 
C}
Both of these must start in column 1.  There can be an arbitrary
number of lines in a comment block.


26.1)  Scaffolds--contigEndPair tags

contigEndPair tags can be used to link contigs together into
scaffolds.  

For example, suppose you want to connect the right end of Contig3 to
the left end of Contig1:

Contig3         Contig1
-----------     --------------


The following tags will make this link which you will be able to view
in AssemblyView. 


CT{
Contig1 contigEndPair consed 367 367 101119:110741
2
<-gap
acatcttctg
}

CT{
Contig3 contigEndPair consed 18392 18392 101119:110741
2
gap->
ggaccacagg
}

"367 367" is the padded consensus position of the tag.  Similarly with
"18392 18392".  

The "2" in both tags above is an ID that is unique to this pair of
contigEndPair tags.  
<-gap indicates the direction of the gap--it is a way of indicating
whether the other contig is linked to the left or right end of this
contig.

acatcttctg: Since this tag is on the left end of the contig (as
indicated by "<-gap" ), the bases are the reverse complement of the 10
bases from 357 to 367 (this gives 11 padded positions, cagaaga*tgt,
but there actually are only 10 bases here because * is a gap
character).  Consed checks the bases of contigEndPair tags when
putting scaffolds together--if the bases don't match the current
assembly, consed will not use that contigEndPair tag so the scaffold
will not be completely put together.  Be forewarned if you are writing
your own contigEndPair tags.

ggaccacagg:  Since this tag is on the right end of the contig (as
indicated by "gap->"), these bases are the 10 bases (not including gap
characters) from 18392 to 18401.  

Consed puts the tags on the last 10 bases of the high quality segment
of the contig (leftmost 10 bases for <-gap contigEndPair tags and
rightmost 10 bases for gap-> contigEndPair tags).  This is not
necessary, but it is a good idea to put them on sequence that isn't
going to change as you finish since if the bases change, the
contigEndPair tags will become useless.  Don't be a lazy programmer!
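
If you are writing your own contigEndPair tags, the bases line can be
computed from the padded consensus as in this Python sketch.  It
follows the two rules above (strip the pads, and reverse complement
for a <-gap tag).  The function names are invented for this example,
and only the letters a, c, g, t, and n are handled.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a", "n": "n"}


def strip_pads(padded_bases):
    """Remove the * pad characters from a stretch of consensus."""
    return padded_bases.replace("*", "")


def reverse_complement(bases):
    """Reverse complement a lowercase or uppercase base string."""
    return "".join(COMPLEMENT[base] for base in reversed(bases.lower()))


def left_end_tag_bases(padded_bases):
    """Bases for a '<-gap' tag: reverse complement, pads removed."""
    return reverse_complement(strip_pads(padded_bases))


def right_end_tag_bases(padded_bases):
    """Bases for a 'gap->' tag: the bases as they stand, pads removed."""
    return strip_pads(padded_bases).lower()
```

For the Contig1 example above, left_end_tag_bases("cagaaga*tgt")
gives "acatcttctg", which matches the tag.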

When you follow the instructions (above) on
CONTIG ARRANGEMENT--REORDER CONTIGS, behind the scenes consed is
adding contigEndPair tags.


WA{
<tag type> <what program created tag> <date tag was created in form YYMMDD:HHMISS>
1 or more lines of data
}

This line is a 'whole assembly' tag.  It is used for information
referring to the assembly as a whole.  For example, it is used to tell
consed which phd balls must be read in addition to this ace file.
Phrap puts its version and phrap command line options into another WA
tag.  (See examples below.)

You can append CT, WA, and RT tags to the end of the ace file in any
order you like, but they must be at the end of the ace file.

Below are two sample ace files.

Here is a sample ace file of Illumina reads:

AS 1 29

CO c_elegans_piece 148 29 0 U
ccgatgggacatggtcttcaaagcaacccaactgtacaagtgagttttca
agatttttttggatttctggaattttcaacattttcaacattttcagaag
tcgcctgcacccacctcccagaagttgcgaatgctaaaatagaggttc


BQ
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

AF C02D1ACXX:7:2101:17074:161212#GAACTATA/2/2 U -69
AF C02D1ACXX:7:1105:16754:170643#GAACTATA/2/2 U -27
AF C02D1ACXX:7:1304:8114:195839#GAACTATA/1/1 U -17
AF 635DTAAXX:3:59:17023:9292#GAACTATA/1/1 U -15
AF C02D1ACXX:7:2101:17074:161212#GAACTATA/1/1 C -4
AF c_elegans_piece U 1
AF C02D1ACXX:7:1306:7184:127960#GAACTATA/1/1 U 14
AF C02D1ACXX:7:1302:7347:172113#GAACTATA/2/2 U 16
AF C02D1ACXX:7:2204:17943:5286#GAACTATA/2/2 U 19
AF C02D1ACXX:7:2207:18183:186320#GAACTATA/2/2 U 20
AF C02D1ACXX:7:2108:4188:194053#GAACTATA/2/2 U 32
AF C02D1ACXX:7:2106:14054:45582#GAACTATA/1/1 U 34
AF C02D1ACXX:7:1201:3956:26308#GAACTATA/2/2 U 38
AF C02D1ACXX:7:1304:8114:195839#GAACTATA/2/2 C 48
AF C02D1ACXX:7:2102:21181:43700#GAACTATA/1/1 U 57
AF 635DTAAXX:3:59:17023:9292#GAACTATA/2/2 C 72
AF C02D1ACXX:7:2205:4184:168245#GAACTATA/1/1 U 73
AF C02D1ACXX:7:1105:16754:170643#GAACTATA/1/1 C 74
AF C02D1ACXX:7:1306:7184:127960#GAACTATA/2/2 C 78
AF C02D1ACXX:7:1201:3956:26308#GAACTATA/1/1 C 90
AF C02D1ACXX:7:2108:4188:194053#GAACTATA/1/1 C 98
AF C02D1ACXX:7:1203:8472:91581#GAACTATA/2/2 U 103
AF C02D1ACXX:7:2106:14054:45582#GAACTATA/2/2 C 107
AF C02D1ACXX:7:1302:7347:172113#GAACTATA/1/1 C 108
AF C02D1ACXX:7:1207:19451:166347#GAACTATA/2/2 U 108
AF C02D1ACXX:7:2107:4968:75168#GAACTATA/2/2 U 109
AF C02D1ACXX:7:2204:17943:5286#GAACTATA/1/1 C 116
AF C02D1ACXX:7:1108:14429:183405#GAACTATA/2/2 U 119
AF C02D1ACXX:7:1108:14429:183405#GAACTATA/1/1 C 121

RD C02D1ACXX:7:2101:17074:161212#GAACTATA/2/2 100 0 0
GGTAGTGTGTTCCGCTTTGATTGCCACTCCGGGTACCGGAGAGAAGGAGT
TGAGAGCTCGCTGTGCAAATCCGATGGGACATGGTCTTCAAAGCAACCCA

QA 1 100 71 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1105:16754:170643#GAACTATA/2/2 100 0 0
GAAGGAGTTGAGAGCTCGCTGTGCAAATCCGATGGGACATGGTCTTCAAA
GCAACCCAACTGTACAAGTGAGTTTTCAAGaTTTTTTTGGATTTCTGGAA

QA 1 100 29 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1304:8114:195839#GAACTATA/1/1 100 0 0
AGAGCTCGCTGTGCAAATCCGATGGGACATGGTCTTCAAAGCAACCCAAC
TGTACAAGTGAGTTTTCAAGATTTTTTTGGATTTCTGGAATTTTCAACAT

QA 1 100 19 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD 635DTAAXX:3:59:17023:9292#GAACTATA/1/1 76 0 0
AGCTCGCTGTGCAAATCCGATGGGACATGGTCTTCAAAGCAACCCAACTG
TACAAGTGAGTTTTCAAGATTTTTTT

QA 1 76 17 76
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2101:17074:161212#GAACTATA/1/1 100 0 0
CAAATCCGATGGGACATGGTCTTCAAAGCAACCCAACTGTACAAGTGAGT
TTTCAAGATTTTTTTGGATTTCTGGAATTTTCAACATTTTCAACATTTTC

QA 1 100 6 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD c_elegans_piece 148 0 0
ccgatgggacatggtcttcaaagcaacccaactgtacaagtgagttttca
agatttttttggatttctggaattttcaacattttcaacattttcagaag
tcgcctgcacccacctcccagaagttgcgaatgctaaaatagaggttc

QA 1 148 1 148
DS VERSION: 1 TIME: Wed Sep 4 11:00:11 2013
RD C02D1ACXX:7:1306:7184:127960#GAACTATA/1/1 100 0 0
GTCTTCAAAGCAACCCAACTGTACAAGTGAGTTTTCAAGATTTTTTTGGA
TTTCTGGAATTTTCAACATTTTCAACATTTTCAGAAGTCGCCTGCACCCA

QA 1 100 1 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1302:7347:172113#GAACTATA/2/2 100 0 0
CTTCAAAGCAACCCAACTGTACAAGTGAGTTTTCAAGATTTTTTTGGATT
TCTGGAATTTTCAACATTTTCAACATTTTCAGAAGTCGCCTGCACCCACc

QA 1 100 1 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2204:17943:5286#GAACTATA/2/2 100 0 0
CAAAGCAACCCAACTGTACAAGTGAGTTTTCAAGATTTTTTTGGATTTCT
GGAATTTTCAACATTTTCAACATTTTCAGAAGTCGCCTGCACCCACCTCC

QA 1 100 1 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2207:18183:186320#GAACTATA/2/2 100 0 0
AAAGCAACCCAACTGTACAAGTGAGTTTTCAAGATTTTTTTGGATTTCTG
GAATTTTCAACATTTTCAACATTTTCAGAAGTCGCCTGCACCCACCTCCC

QA 1 100 1 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2108:4188:194053#GAACTATA/2/2 100 0 0
CTGTACAAGtGaGTTTTAAAGATTTTTTTGGATTTCTGGAATTTTCAACA
TTTTCAACATTTtCAGAAGTCGacTGCACCCACcTCCCAGAagtTGCGAa

QA 1 99 1 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2106:14054:45582#GAACTATA/1/1 100 0 0
GTACAAGTGAGTTTTCAAGATTTTTTTGGATTTCTGGAATTTTCAACATT
TTCAACATTTTCAGAAGTCGCCTGCACCCACCTCCCAGAAGTTGCGAATG

QA 1 100 1 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1201:3956:26308#GAACTATA/2/2 100 0 0
AAGTGAGTTTTCAAGATTTTTTTGGATTTCTGGAATTTTCAACATTTTCA
ACATTTTCAGAAGTCGCCTGCACCCACCTCCCAGAAGTTGCGAATGCTAA

QA 1 100 1 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1304:8114:195839#GAACTATA/2/2 100 0 0
TCAAGATTTTTTTGGATTTCTGGAATTTTCAACATTTTCAACATTTTCAG
AAGTCGCCTGCACCCACCTCCCAGAAGTTGCGAATGCTAAAATATAGGTT

QA 1 100 1 100
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2102:21181:43700#GAACTATA/1/1 100 0 0
TTTTGGATTTCTGGAATTTTCAACATTTTCAACATTTTCAGAAGTCGCCT
GCACCCACCTcCCAGAAGTTGCGAATGCTAAAATAGAGGTtCCGGATAGA

QA 1 100 1 92
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD 635DTAAXX:3:59:17023:9292#GAACTATA/2/2 76 0 0
ATTTTCAACATTTTCAACATTTTCAGAAGTCGCCTGCACCCACCTCCCAG
AAGTTGCGAATGCTAAAATAGAGGTT

QA 1 76 1 76
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2205:4184:168245#GAACTATA/1/1 100 0 0
TTTTCAACATTTTCAACATTTTCAGAAGTCGCCTGCACCCACCTCCCAGA
AGTTGCGAATGCTAAAATAGAGGTTCCGGATAGATTTTTGTTTGGTGACG

QA 1 100 1 76
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1105:16754:170643#GAACTATA/1/1 100 0 0
TTTCAACATCTTCAACATTTTCAGAAGTCGCCTGCACCCACCTCCCAGAA
GTTGCGAATGCTAAAATAGAGGTTCCGGATAGATTTTTGTTTGGTGACGT

QA 1 100 1 75
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1306:7184:127960#GAACTATA/2/2 100 0 0
AACATTTTCAACATTTTCAGAAGTCGCCTGCACCCACCTCCCAGAAGTTG
CGAATGCTAAAATAGAGGTTCCGGATAGATTTTTGTTTGGTGACGTGGCC

QA 1 100 1 71
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1201:3956:26308#GAACTATA/1/1 100 0 0
ATTTTCAGAAGTCGCCTGCACCCACCTCCCAGAAGTTGCGAATGCTAAAA
TAGAGGTTCCGGATAGATTTTTGTTTGGTGACGTGGCCCGAGTGGTCTGC

QA 1 100 1 59
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2108:4188:194053#GAACTATA/1/1 100 0 0
aagtcgcctgcaccctcctcCcagaagTTgCGAATGcTAAAATAGAGGTt
CCGGataGATTTTTGtTTGGtGACGTGGcCcGAGTGGTCTGCAaCTCTGG

QA 21 100 1 51
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1203:8472:91581#GAACTATA/2/2 100 0 0
GCCTGCACCCACCTCCCAGAAGTTGCGAATGCTAAAATAGAGGTTCCGGA
TAGATTTTTGTTTGGTGACGTGGCCCGAGTGGTCTGCAACTCTGGCTTCa

QA 1 100 1 46
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2106:14054:45582#GAACTATA/2/2 100 0 0
GCaCCCaCCTCCCAGAAGTTGCgAATGCTAAAATAGAGGTTCCGGATAGA
TTTTTGTTTGGTGACGTGGCCCGAGTGGTCTGCAACTCTGGCTTCACTAT

QA 1 100 1 42
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1302:7347:172113#GAACTATA/1/1 100 0 0
cACCCaCCTCCCAGAAGTTGCGAATGCTAAAATAGAGGTTCCGGATAGAT
TTTTGTTTGGTGACGTGGCCCGAGTGGTCTGCAACTCTGGCTTCACTATC

QA 1 100 1 41
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1207:19451:166347#GAACTATA/2/2 100 0 0
CACCCACCTCCCAGAAGTTGCGAATGCTAAAATAGAGGTTCCGGATAGAT
TTTTGTTTGGTGACGTGGCCCGAGTGGtCTGCAACTCTGGCtTCACTATC

QA 1 100 1 41
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2107:4968:75168#GAACTATA/2/2 100 0 0
ACCCACCTCCCAGAAGTTGCGAATGCTAAAATAGAGGTTCCGGATAGATT
TTTGTTTGGTGACGTGGCCCGAGTGGTCTGCAACTCTGGCTTCACTATCG

QA 1 100 1 40
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:2204:17943:5286#GAACTATA/1/1 100 0 0
tCCCAGAAGTTGCGAATGCTAAAATAGAGGTTCCGGATAGATTTTTGTTT
GGTGACGTGGCCCGAGTGGTCTGCAACTCTGGCTTCACTATCGACGGACC

QA 1 100 1 33
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1108:14429:183405#GAACTATA/2/2 100 0 0
CAGAAGTTGCGAATGCTAAAATAGAGGTTCCGGATAGATTTTTGTTTGGT
GACGTGGCCCGAGTGGTCTGCAACTCTGGCTTCACTATCGACGGACCTGA

QA 1 100 1 30
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD C02D1ACXX:7:1108:14429:183405#GAACTATA/1/1 100 0 0
GAAGTTGCGAATGCTAAAATAGAGGTTCCGGATAGATTTTTGTTTGGTGA
CGTGGCCCGAGTGGTCTGCAACTCTGGCTTCACTATCGACGGACCTGAAG

QA 1 100 1 28
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013

WA{
phdBall fasta2Ace.perl 130904:110011
../phdball_dir/phd.ball.1
}

WA{
phdBall consed 130904:110114
../phdball_dir/phd.ball.2
}
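
In the sample above, the AS line says 29 reads, and there are 29 AF
lines and 29 RD lines.  If you generate ace files with your own
programs, a quick consistency check like the following Python sketch
can catch mistakes early.  The function name check_read_counts and the
tiny two-read ace file are made up for this illustration, and the check
assumes that no sequence or tag data line begins with the word AS, AF,
or RD.

```python
def check_read_counts(ace_text):
    """Return (reads declared on AS line, AF lines found, RD lines found)."""
    declared = None
    af_count = 0
    rd_count = 0
    for line in ace_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "AS" and declared is None:
            declared = int(fields[2])   # AS <number of contigs> <number of reads>
        elif fields[0] == "AF":
            af_count += 1
        elif fields[0] == "RD":
            rd_count += 1
    return declared, af_count, rd_count


# Toy ace file (hypothetical data, much smaller than a real one).
toy_ace = """AS 1 2

CO toy_contig 8 2 0 U
acgtacgt

BQ
 20 20 20 20 20 20 20 20

AF read1 U 1
AF read2 C 3

RD read1 6 0 0
ACGTAC

QA 1 6 1 6
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
RD read2 6 0 0
GTACGT

QA 1 6 1 6
DS VERSION: 1 TIME: Wed Sep 4 11:01:14 2013
"""

declared, af_count, rd_count = check_read_counts(toy_ace)
print(declared, af_count, rd_count)
if not (declared == af_count == rd_count):
    print("read counts do not agree")
```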


Sample Ace File with Sanger reads:

AS 1 8

CO Contig1 1475 8 156 U
agccccgggccgtggggttccttgagcactcccaaagttccaacccagga
tgtccccgacgcttaaaccttccaagtctgaaacgggaaatttgatttgc
gggctaggataaacgccggggagaaaggcagaactgccttttacccccca
aggatatcccttgggaagggcccctttgcactcagctgctccctaattat
ggcgatcctccctctatctttgtccccctgtctttcaggatccctctcAA
CAACAgaccaCTCccattaaaGAAATCtccttctgatctgcgggatcACA
TAAAACAGTGCCattcAAaAcgtcccttcCcccAATGTCtaagtgTggtg
gagcCcttcctgcCCggctctgtgcacccacggtgcctgcatgaccccgg
atGCAGTGTGCACCAGctCCCATCATTCAAgagCATGACTGTGTTGCCAA
CCAGCcacCAGGCACTGGGGAGGGAGCtgaGGGAGCAcaaAAGGGATGAG
CCACCCTCTGTcCcagAAGTGGAGGGCATGGGGCTTGGCTGGGCTTAGAG
CTAACATACACAGGATGCTGAAAAAGAACAACACAAggtGTGTGGAGCAA
AGGAAAGGGAAATCAGCTTGAAGCTGATGTTAGTGTGCTTGGGCTGAGTA
CAGCCATGCTCTCAGTTGAGGCACGGTTGGCTCCCCATGGGCAAGATCCC
TCCTGGCCCATCTCTCCTCTTATTCTCTATCCCTTCCCCAGGTCCCTGCC
TTAGAGGTTTCACCAGAGCACAGCTCCTGCCTGTGGCCAAAACAGTATTT
GGCCACTCACCGACCCAGTGTCAGC*ATCCAGATGGGTTCCACATCTCAC
AACCCT*GAGCAGCAGAGAAGGGTTTGAAAGGCCAGGGGAG*AATGAAGA
CGAAGGAGG*TGTTGGCAACAACACAGA*G*AGTCAGCAGCCAGAACGCC
AGGTATCCACACACATAAGACATTCTAAATTTTTACTCAACAGAAATTGT
CTATGTCTGTGTCTGGGCACCATGGCAACACCTTATCTCTACAAAAATTA
GCGGAATGTAGTGGTGCCTGTGTGTAGTCCCAGCTATTCAAGAGGCTGAA
GTGGGAGGATTGCTTGAGCCATGGAAGTCAAGGCTGTAGTGAGCCATGAT
TGTGTCAATGCACTCCAGACAGAGCAAGACCCTGCTCCCACCACACACCT
CaaacgaaAAAAAAaaagggcaaagatatgaactgaaatggaatatag*a
gcagcaaaaggaacagaaaattgtctatgcctggttctctagtcatgtgc
agaacagacagtatcccggccctattgagttcttggggcagttaggcttg
tgcacccttgcttctatgccacagttagggcattcgggattcccatcctt
ttccccggggttgctttttgtttgcgattaccttttcggaacaatggggg
gaaattattttccaagttgggtttg


BQ
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 22
 23 26 24 25 25 17 17 13 14 19 21 22 22 17 17 11 8 7 10 13 18 23 28 28 31 31 32 18 18 10 10 10 12 15 8 6 6 8 8 10 15 9 11 12 15 14 15 20 20 28
 30 31 24 24 22 24 25 28 23 27 24 27 18 15 15 16 21 23 18 20 13 8 7 7 12 10 9 10 10 21 12 14 14 28 27 32 24 23 20 19 15 17 15 17 19 20 13 13 13 14
 14 10 10 10 23 10 10 10 10 10 11 11 18 25 24 10 10 10 10 10 14 10 11 11 11 13 12 12 10 12 10 10 10 10 10 10 14 10 12 10 10 10 10 10 10 10 14 13 15 15
 17 19 24 32 37 37 37 37 32 30 30 30 28 23 23 25 15 15 20 27 32 23 22 22 27 32 34 34 21 21 12 12 12 24 32 41 45 45 37 45 45 45 45 45 37 37 37 41 41 37
 37 37 41 32 32 14 14 19 32 28 37 37 41 41 45 45 37 37 37 30 30 32 32 37 37 32 28 16 16 17 32 32 37 45 37 25 25 9 9 9 25 25 37 37 37 37 37 45 40 37
 37 37 45 45 37 37 37 37 38 25 25 12 25 10 10 15 32 47 52 62 62 55 43 43 34 43 43 58 58 78 77 72 72 70 70 70 74 77 69 68 55 55 55 57 61 65 70 73 68 61
 64 58 56 56 64 65 67 70 70 75 79 70 70 70 70 70 70 67 71 71 71 84 63 63 62 62 62 59 59 61 61 64 64 49 42 32 10 6 18 32 35 46 47 48 47 47 47 55 55 55
 55 49 46 47 47 55 55 55 54 47 47 47 48 48 54 54 54 48 48 55 47 47 47 55 49 48 48 48 55 47 48 48 47 47 47 46 48 48 48 50 44 43 44 44 49 49 73 75 82 78
 74 66 66 58 54 60 68 68 61 63 47 57 45 74 85 78 70 65 62 61 61 55 73 65 59 61 75 77 80 86 81 81 83 85 85 85 90 84 78 78 73 75 78 77 86 75 76 83 79 84
 87 78 72 75 72 72 76 79 82 88 90 89 89 89 89 89 90 90 90 85 85 79 83 83 90 90 90 90 90 90 90 90 90 90 90 90 90 89 89 89 90 90 90 90 90 90 90 90 90 90
 90 90 90 81 66 66 62 62 62 73 89 90 90 86 86 86 86 88 88 90 90 90 90 90 90 90 88 71 68 61 61 66 66 70 65 64 70 70 76 90 90 90 90 90 90 85 90 90 90 87
 87 79 79 79 79 89 74 65 71 72 79 73 73 70 75 79 76 81 81 83 80 87 89 90 82 82 90 88 88 88 88 89 86 77 77 80 79 79 79 90 90 90 90 79 79 61 58 53 76 63
 57 65 76 76 76 80 89 89 89 90 90 90 90 88 88 88 88 88 88 90 90 90 90 90 90 90 90 90 90 88 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 83 79 58 43
 45 68 70 61 75 76 73 68 84 88 90 90 90 90 90 89 72 54 62 62 53 55 55 80 83 80 80 83 85 83 87 83 83 83 85 85 86 86 84 81 83 82 77 78 76 76 77 77 80 88
 88 87 90 90 90 90 85 84 82 71 75 62 62 37 68 75 77 74 70 71 70 72 72 80 80 80 84 83 82 66 70 55 55 55 37 55 55 55 55 55 55 55 55 54 55 55 55 48 47 47
 47 47 47 47 47 47 47 47 55 50 50 50 47 47 47 47 44 44 55 48 51 51 54 54 54 54 54 55 54 54 55 55 55 55 55 55 55 55 55 55 55 55 55 51 51 51 54 51 61 61
 61 61 61 61 44 42 34 34 37 37 37 44 47 47 47 61 61 61 61 61 61 61 47 49 48 47 55 54 55 55 55 55 55 44 44 44 44 46 43 43 44 44 44 51 44 47 44 34 44 44
 44 44 39 39 43 42 50 42 42 38 37 38 41 50 52 55 47 47 39 44 44 46 41 42 40 43 40 41 42 38 37 42 55 50 44 44 46 48 55 55 55 37 34 34 33 42 47 42 42 42
 42 55 46 46 46 48 47 48 46 43 41 39 42 39 44 44 44 48 48 38 36 36 38 38 38 44 44 44 44 44 44 42 42 36 41 40 36 36 30 33 32 29 28 28 23 12 16 10 8 8
 13 14 23 20 21 28 28 31 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

AF K26-217c U 498
AF K26-526t U 510
AF K26-961c U 577
AF K26-394c U 797
AF K26-291s U 828
AF K26-822c U 883
AF K26-572c C 1
AF K26-766c C 408
BS 1 515 K26-572c
BS 516 516 K26-217c
BS 517 521 K26-572c
BS 522 529 K26-217c
BS 530 538 K26-572c
BS 539 569 K26-217c
BS 570 571 K26-526t
BS 572 573 K26-217c
BS 574 579 K26-526t
BS 580 584 K26-217c
BS 585 591 K26-526t
BS 592 592 K26-217c
BS 593 601 K26-526t
BS 602 604 K26-217c
BS 605 606 K26-526t
BS 607 607 K26-217c
BS 608 621 K26-526t
BS 622 628 K26-217c
BS 629 629 K26-526t
BS 630 630 K26-217c
BS 631 633 K26-526t
BS 634 634 K26-217c
BS 635 635 K26-526t
BS 636 639 K26-217c
BS 640 646 K26-526t
BS 647 648 K26-217c
BS 649 649 K26-526t
BS 650 650 K26-217c
BS 651 654 K26-766c
BS 655 655 K26-961c
BS 656 656 K26-217c
BS 657 669 K26-961c
BS 670 675 K26-217c
BS 676 676 K26-961c
BS 677 688 K26-217c
BS 689 693 K26-526t
BS 694 696 K26-217c
BS 697 698 K26-526t
BS 699 700 K26-961c
BS 701 706 K26-217c
BS 707 707 K26-961c
BS 708 708 K26-217c
BS 709 709 K26-961c
BS 710 710 K26-526t
BS 711 775 K26-961c
BS 776 776 K26-766c
BS 777 777 K26-961c
BS 778 834 K26-766c
BS 835 837 K26-961c
BS 838 840 K26-394c
BS 841 882 K26-766c
BS 883 884 K26-394c
BS 885 898 K26-766c
BS 899 899 K26-961c
BS 900 900 K26-766c
BS 901 901 K26-961c
BS 902 934 K26-766c
BS 935 935 K26-394c
BS 936 936 K26-766c
BS 937 937 K26-394c
BS 938 940 K26-766c
BS 941 944 K26-394c
BS 945 945 K26-291s
BS 946 948 K26-822c
BS 949 949 K26-766c
BS 950 951 K26-822c
BS 952 954 K26-766c
BS 955 955 K26-822c
BS 956 957 K26-394c
BS 958 962 K26-822c
BS 963 963 K26-394c
BS 964 970 K26-822c
BS 971 971 K26-394c
BS 972 972 K26-822c
BS 973 973 K26-394c
BS 974 976 K26-822c
BS 977 979 K26-394c
BS 980 986 K26-291s
BS 987 987 K26-394c
BS 988 1004 K26-822c
BS 1005 1009 K26-394c
BS 1010 1012 K26-291s
BS 1013 1014 K26-394c
BS 1015 1021 K26-822c
BS 1022 1022 K26-394c
BS 1023 1026 K26-822c
BS 1027 1028 K26-291s
BS 1029 1036 K26-822c
BS 1037 1052 K26-291s
BS 1053 1053 K26-822c
BS 1054 1060 K26-291s
BS 1061 1061 K26-822c
BS 1062 1062 K26-291s
BS 1063 1065 K26-394c
BS 1066 1068 K26-822c
BS 1069 1079 K26-291s
BS 1080 1081 K26-822c
BS 1082 1082 K26-291s
BS 1083 1084 K26-822c
BS 1085 1089 K26-291s
BS 1090 1094 K26-822c
BS 1095 1096 K26-394c
BS 1097 1099 K26-822c
BS 1100 1100 K26-291s
BS 1101 1104 K26-822c
BS 1105 1105 K26-394c
BS 1106 1110 K26-822c
BS 1111 1115 K26-291s
BS 1116 1122 K26-822c
BS 1123 1124 K26-291s
BS 1125 1135 K26-822c
BS 1136 1136 K26-394c
BS 1137 1139 K26-822c
BS 1140 1140 K26-291s
BS 1141 1150 K26-822c
BS 1151 1155 K26-291s
BS 1156 1161 K26-822c
BS 1162 1164 K26-291s
BS 1165 1167 K26-822c
BS 1168 1173 K26-291s
BS 1174 1175 K26-822c
BS 1176 1189 K26-291s
BS 1190 1196 K26-822c
BS 1197 1199 K26-291s
BS 1200 1221 K26-822c
BS 1222 1225 K26-291s
BS 1226 1227 K26-822c
BS 1228 1228 K26-394c
BS 1229 1231 K26-291s
BS 1232 1233 K26-822c
BS 1234 1235 K26-291s
BS 1236 1236 K26-394c
BS 1237 1239 K26-291s
BS 1240 1242 K26-822c
BS 1243 1244 K26-291s
BS 1245 1247 K26-394c
BS 1248 1255 K26-822c
BS 1256 1256 K26-291s
BS 1257 1257 K26-394c
BS 1258 1258 K26-291s
BS 1259 1259 K26-822c
BS 1260 1260 K26-394c
BS 1261 1265 K26-291s
BS 1266 1266 K26-822c
BS 1267 1268 K26-394c
BS 1269 1269 K26-822c
BS 1270 1275 K26-291s
BS 1276 1280 K26-822c
BS 1281 1281 K26-394c
BS 1282 1290 K26-822c
BS 1291 1292 K26-291s
BS 1293 1294 K26-822c
BS 1295 1297 K26-291s
BS 1298 1301 K26-822c
BS 1302 1302 K26-291s
BS 1303 1475 K26-822c

RD K26-217c 563 0 0
tcccCgtgagatcatcctgaAGTGGAGGGCATGGGGCTTGGCTGGGCTTA
GAGCTAACATACACAGGATGCTGAAAAAGAACAACACAAgntGTGTGGAG
CAAAGGAAAGGGAAATCAGCTTGAAGCTGATGTTAGTGTGCTTGGGCTGA
GTACAGCCATGctntCAGTTGAGGCACGGTTGGCTCCCCATGGGCAAGAT
CCCTCCTGGCCCATCTCTCCTCTTATTCTCTATCCCTTCCCCAGGTCCCT
GCCTTAGAGGTTTCACCAGAGCACAGCTCCTGcctgtggccaAAACAGTA
TTTGGCCACTCACcGAcccagTGTCAGC*atccaGatggGtTccacatct
cacaaccct*gggcagcagagaaggggtttaaaggccagggggg*tatta
agccgaaggagg*ttttggaaacaccaaggg*g*ggtcagaccccaacgc
cagtttccccaaaaaggggcattcaaatttttttctcagagattttcttt
ccttttttgggccccgggaaccttttttaaaaaatgggggattgggcccc
cttggcccccctc

QA 19 349 19 424
DS CHROMAT_FILE: K26-217c PHD_FILE: K26-217c.phd.1 TIME: Thu Sep 12 15:42:38 1996
RD K26-526t 687 0 0
ccgtcctgagtggAGggcatggggcttggctggGCTTAGAGCTAACATAC
ACAGGATGCTGAAAAAGAACAACACAAggtGTGTGGAGCAAAGGAAAGGG
AAATCAGCTTGAAGCTGATGTTAGTGTGCTTGGGCTGAGTACagcnatgc
tntgaGTTGAggaacgGTTGGCTCCCCATGGGCAAGATCCCTCCTGGCCC
ATCTCTCCTCTTATTCTCTATCCCTTCCCCAGGTCCCTGCCTTAGAGGTT
TCACcAgAGCACAgCTCctgcctgtggccaAAACAGTATTTGGccACTCA
CCGAcCCAGTGTcagt*atccAGATGGGttccACATCtcacagcccT*Ga
gcAgcagngaaGGGTttgaaagggcAgggggggaatgaaGacggaggagg
gtgttggcaaccacacaga*ggagtcaggaggcaggacggcaggtatccA
Cacacattaggcattttaaatttttacttaacaggaattgtctatggctg
ggtttgggaac*atgggaacacctattcttt*caaaa*ttggggggat*t
agtggtgc*tgt*tatagtcccgttattaaGggttaagtggggtttcttt
gccaggaggtaaggtttggggccctatttttaattacttggaaggaagcc
ttttcccagataaggaaaaaggaggtTTtttgtttta

QA 12 353 9 572
DS CHROMAT_FILE: K26-526t PHD_FILE: K26-526t.phd.1 TIME: Thu Sep 12 15:42:33 1996
RD K26-961c 517 0 0
aatattaccggcgcggggttCcgTCGGAAAGGGAAATCAGCTTGAAGCTG
ATGTTAGTGTGCTTGgGCTGAGTacaGCCATGCTCTCAGTTGAGGCACGG
TTGGCTCCCCATGGGCAAGATCCCTCCTGGCCCATCTCTCCTCTTATTCT
CTATCCCTTCCCCAGGTCCCTGCCTTAGAGGTTTCACCAGAGCACAGCTC
CTGccTGTGGCCAAAACAGTATTTGGccactgaccGACCCagtGTCAGC*
ATCCAGATGGGTTCCACATCTCacaaccCT*GAGCAGCAGAGAAGGGTTT
GAaagGcCAGGGGAG*AATGAAGACgaaggaGG*TGTTgGcaacaacaca
gA*G*AGTCAGCAGccAgaacgccaggtatccacACACATaaggCATtct
aaatttttaCtcaACaggaattgtctATgtctgtgTCtgggcaccagggc
a*cacctTATCTCTAcaaaaat*agcgggatttagtggtgcttgtgtg**
g*cccagctattcaggg

QA 20 415 26 514
DS CHROMAT_FILE: K26-961c PHD_FILE: K26-961c.phd.1 TIME: Thu Sep 12 15:42:37 1996
RD K26-394c 628 0 0
ctgcgtatcgtcacc*accCAGTGTCagctatcCAGATGGGTTCCACATC
TcacaacCCT*GAGCAGCAGAGAAGGGTTTGAAAGGCCAGGGGAG*AATG
AAGACga*gGAGG*tgTTGGCAACAacacagA*G*AGTCAGCAGCCAGAA
CGCCAGGTATCCACACACATAAGACATTCTAAATTTTTACTCAACAGAAA
TTGTCTATGTCTGTGTCTGGgcaCCATGGCAACACCTTATCTCTACAAAA
ATTAGCGGAATGTAGTGGTGCCTGtgtGTAGTCCCAGCTATTCaaGAGGC
TGAAGTGGGAGGATTGCTTGagccaTggaagtcaagGCTGTAGTGagCCa
TGattgtgtCaATGCACtcnagAcagagcaaGACCctgctcccaccacac
aacttaanaggaaaaaaaaaaaggaaaagaaatgaaatgaaatgggatat
ag*aa*aggaaaagga*cagaaa*ttgtctatgcctggt*ctctagtaat
gtcagtcagccagtttccagccttttggtcttgggcattctgctgtcaca
atctcttggaacgttgggcagggaatcccatttttcccccgtttTttttt
gtggcaattaccttttggaaccctgggt

QA 18 368 11 502
DS CHROMAT_FILE: K26-394c PHD_FILE: K26-394c.phd.1 TIME: Thu Sep 12 15:42:32 1996
RD K26-291s 556 0 0
gaggatcgcttTCCacatctcaCAaccctcgagCAgCagagAAgggTTTG
AAAGGCCAGGGGAG*AATGAAGACGa*ggAGG*TGTTGGCAACAacacag
a*G*AGTCAGCAGCCAGAACGCCAggtaTCCAcacacataAgccatTCTA
AATTTTTACTCAAcagAAATTGTCTAtgTCTGTGTCTGggcacCATGGCA
ACACCTTATCTCTACAAAAATTAGCGGAATGTAGTggtGCCTGTGTGTAG
TCCCAGCTATTCAAgaggctGAAGTgcgaggatTGCTTgagCCATGGAAG
TcaaggctgtAGTGAgccatgatTGTGTCAATGCACTCCAGACAGAGCAA
GACCCTGCTCCCAccaCACAcctcaaaaggtattgattaaaGGAaAagaa
atgaaAtgaaatgagataaaggaaaaggaaaaagaacaggatattgTCtA
Tgcctgat*ctctagt*atgtgcagacagaagtttccagccactgagttc
ttgccccagctaactttttacaaatccccctggggaaggtttggcccagg
cagatg

QA 11 373 11 476
DS CHROMAT_FILE: K26-291s PHD_FILE: K26-291s.phd.1 TIME: Thu Sep 12 15:42:31 1996
RD K26-822c 593 0 0
ggggatccg*tcatgagacga*ggAGG*TGTTGGCAACa*ca*agaag*A
GTCAGCAGCCAGAACGCCAGGTATCCACACACATAAGACATTCTAAATTT
TTACTCAACAGAAATTGTCTATGTCTGtgtCTGGGCACCATGGCAACACC
TTATCTCTACAAAAATTAGCGGAATGTAGTggTGCCTGtgtGTAGTCCCA
GCTATTCAAGAGGCTGAAGTGGGAGGATTGCTTGAGCCATGGAAGTCAAG
GCTGTAGTGAGCCATGATTGtgtCAATGCACTCCAGAcAgAGCaAgacCC
tgCTCccACCACACacctCaaacgaaAAAAAAaaagggcaaagatatgaa
ctgaaatggaatatag*agcagcaaaaggaacagaaaattgtcTATGcct
ggttctctagtcatgtgcagaacagacagtatcccggccctattgagttc
ttggggcagttaggcttgtgcacccttgcttctatgccacagttagggca
ttcgggattcccatccttttccccggggttgctttttgtttgcgattacc
ttttcggaacaatggggggaaattattttccaagttgggtttg

QA 25 333 16 593
DS CHROMAT_FILE: K26-822c PHD_FILE: K26-822c.phd.1 TIME: Thu Sep 12 15:42:36 1996
RD K26-572c 594 0 0
agccccgggccgtggggttccttgagcactcccaaagttccaacccagga
tgtccccgacgcttaaaCcttccaagtctgaaacgggaaAtttgatttgc
gggctaggataaacgccggggagaaaggcagaactgccttttaccCCcca
aggatatcccttgggaagggcccctttgcactcagctgctccctaattat
ggcgatcctccctctatctttgtccccctgtctttcaggatccctctcAA
CAACAgaccaCTCccattaaaGAAATCtccttctgatctgcgggatcACA
TAAAACAGTGCCattcAAaAcgtcccttcCcccAATGTCtaagtgTggtg
gagcCcttcctgcCCggctctgtgcacccacggtgcctgcatgaccccgg
atGCAGTGTGCACCAGctCCCATCATTCAAgagCATGACTGTGTTGCCAA
CCAGCcacCAGGCACTGGGGAGGGAGCtgaGGGAGCAcaaAAGGGATGAG
CCACCCTCTGTcCcagAAGTGGAgcgcATGGGGCTTGGCTgggcTTAGAG
CtaacaTACACAGGATGCTGAAaaagaaCAACACaatagtaaca

QA 249 584 1 586
DS CHROMAT_FILE: K26-572c PHD_FILE: K26-572c.phd.1 TIME: Thu Sep 12 15:42:34 1996
RD K26-766c 603 0 0
gaataattggaatcacggcaaaaatttggggacaaatattatttccaaaa
ttcccccagcaatcacacaggccctcaagcccatcaactcggtcattcac
cgattttcctaaatcaagggtattagcttg*ctgggcttacacctaacat
acacagcatgctcaatgagaAcaatacgagctgtgtggagcacaggaagg
ggaAAtcagcctgaagctgctgttagtgtgcttgg*ctgAGTACAGCcaT
GCTctCAGTTgaggcAcggTTGGCTCCCCATGGgCAAGATCCCTCCTggC
CCATCTCTCCTCTTaTTCTCTATCCCTTCCCCAGGTCCCTGCCTTAGagg
tttCACCAGAGCACAGCTCCTGCCTGTGGCCAAAACAGTATTTGGCCACT
CACCGACCCAGTGTCAGC*ATCCAGATGGGTTCCACATCTCACAACCCT*
GAGCAGCAGAGAAGGGTTTGAAAGGCCAGGGGAG*AATGAAGACGAAGGA
GG*TGTTGGCAACAACACAGA*G*AGTCAGCAGCCAGAACGCCAGGTATC
CACACACATAagaCATtctaAATTTTTACTCAAacgatcCccggaaccac
acg

QA 240 584 126 583
DS CHROMAT_FILE: K26-766c PHD_FILE: K26-766c.phd.1 TIME: Thu Sep 12 15:42:35 1996

WA{
phrap_params phrap 990621:161947
/usr/local/genome/bin/phrap standard.fasta.screen -new_ace -view 
phrap version 0.990319
}

CT{
Contig1 repeat consed 976 986 971218:180623
}

CT{
Contig1 comment consed 996 1007 971218:180623
This is line 1 of a comment
There may be any number of lines
}

CT{
Contig1 oligo consed 963 987 971218:180623
standard.1 acataagacattctaaatttttact 50 U
seq from clone
}
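
In the Sanger sample above, the 156 BS lines run from 1 to 1475, and
each segment starts right after the previous one ends.  The Python
sketch below checks that property.  It is only an illustration: the
function name base_segments_tile is made up, the demo uses just the
first three BS lines of the sample (so the length 521 is a stand-in for
the real contig length), and whether every ace file must satisfy this
check is an assumption.

```python
def base_segments_tile(lines, contig_length):
    """Return True if the BS lines cover 1..contig_length with no gap or overlap."""
    expected_start = 1
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] != "BS":
            continue  # not a BS line
        start, end = int(fields[1]), int(fields[2])
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == contig_length + 1


# First three BS lines of the Sanger sample, treated as a 521-base demo.
first_segments = [
    "BS 1 515 K26-572c",
    "BS 516 516 K26-217c",
    "BS 517 521 K26-572c",
]
print(base_segments_tile(first_segments, 521))

# Same lines with the middle segment dropped, so position 516 is uncovered.
with_gap = [first_segments[0], first_segments[2]]
print(base_segments_tile(with_gap, 521))
```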


----------------------------------------------------------------------------
27.  SAMPLE PHD BALL FORMAT

To function fully, consed requires not just ace files but also phd balls.
Among other things, phd balls are required to give consed the quality
of read bases and to tell consed which pairs of reads form mate pairs
(forward-reverse pairs).

PHD files (as opposed to phdballs) are a leftover from the days of
sequencing when there were only a few thousand reads at most.  The
Linux operating system and software cannot handle millions of phd
files in the same directory, so Consed now typically uses a small
number of phd balls.  

Phd balls are just concatenations of phd files with a few differences,
described below, including: 1) comment at beginning 2) version # on
the BEGIN_SEQUENCE line and 3) peak positions are optional.

Here is an example of a phd ball file that contains 2 reads (typically
a phd ball file will contain up to a million).  Notice that there is a
comment at the beginning starting with "#" at the beginning of the
line.  Also notice that the BEGIN_SEQUENCE line is slightly different
than in phd files due to the "1" at the end of the line--this is the
version, which corresponds to the extension on the end of a phd file
name such as HWI-EAS94_4_1_1_537_446.phd.1

Notice also that peak positions (which normally form a 3rd column
after the quality) are now optional, which helps keep the file size
down.  To see the traces for a read, you will need peak positions
for that read.

Also notice that the only required fields between the
BEGIN_COMMENT line and the END_COMMENT line are now TIME and CHEM.

Phd files (as opposed to phd balls) do still exist and are supported.
When the user edits a read and saves the assembly, consed writes out
a single phd file instead of writing a new entire phd ball (which can
be gigabytes).


Here is an Illumina example:


# solexa file ../solexa_dir/solexa_reads.fastq (beginning)

BEGIN_SEQUENCE HWI-EAS94_4_1_1_537_446 1
BEGIN_COMMENT
TIME: Wed Dec 24 11:21:50 2008
CHEM: solexa
END_COMMENT
BEGIN_DNA
g 30
c 30
c 30
a 30
a 30
t 30
c 30
a 30
g 30
g 30
t 30
t 30
t 30
c 30
t 30
c 30
t 30
g 30
c 30
a 30
a 28
g 23
c 30
c 30
c 30
c 30
t 30
t 30
t 28
a 22
g 8
c 22
a 7
g 15
c 15
t 15
g 10
a 10
g 11
c 15
END_DNA
END_SEQUENCE

BEGIN_SEQUENCE HWI-EAS94_4_1_1_602_99 1
BEGIN_COMMENT
TIME: Wed Dec 24 11:21:50 2008
CHEM: solexa
END_COMMENT
BEGIN_DNA
g 30
c 30
c 30
a 30
t 30
g 30
g 30
c 30
a 30
c 30
a 30
t 30
a 30
t 30
a 30
t 30
g 30
a 30
a 30
g 30
g 30
t 30
c 30
a 30
g 30
a 30
g 16
g 30
a 28
c 22
a 22
a 22
c 14
t 15
t 15
g 5
c 10
t 15
g 10
t 5
END_DNA
END_SEQUENCE
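
If you want to read a phd ball in your own scripts, the following
Python sketch shows one possible approach.  It handles the leading "#"
comment, the version on the BEGIN_SEQUENCE line, and the optional peak
position column described above.  The function name parse_phd_ball is
made up for this illustration, read tags and WR items are simply
skipped, and the sample is the first read above cut down to 3 bases.

```python
def parse_phd_ball(text):
    """Yield one dict per BEGIN_SEQUENCE ... END_SEQUENCE block."""
    read = None
    section = None
    in_tag = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or leading comment
        if line == "BEGIN_TAG":
            in_tag = True
        elif line == "END_TAG":
            in_tag = False
        elif in_tag:
            continue                      # read tags are not parsed here
        elif line.startswith("BEGIN_SEQUENCE"):
            fields = line.split()
            read = {"name": fields[1],
                    "version": int(fields[2]) if len(fields) > 2 else None,
                    "comment": {}, "bases": [], "quals": [], "peaks": []}
        elif read is None:
            continue                      # e.g. WR items after END_SEQUENCE
        elif line in ("BEGIN_COMMENT", "BEGIN_DNA"):
            section = line
        elif line in ("END_COMMENT", "END_DNA"):
            section = None
        elif line == "END_SEQUENCE":
            yield read
            read = None
        elif section == "BEGIN_COMMENT":
            key, _, value = line.partition(":")
            read["comment"][key] = value.strip()
        elif section == "BEGIN_DNA":
            fields = line.split()
            read["bases"].append(fields[0])
            read["quals"].append(int(fields[1]))
            if len(fields) > 2:           # peak position is optional
                read["peaks"].append(int(fields[2]))


# First read from the example above, truncated to 3 bases for brevity.
ball = """# solexa file ../solexa_dir/solexa_reads.fastq (beginning)

BEGIN_SEQUENCE HWI-EAS94_4_1_1_537_446 1
BEGIN_COMMENT
TIME: Wed Dec 24 11:21:50 2008
CHEM: solexa
END_COMMENT
BEGIN_DNA
g 30
c 30
c 30
END_DNA
END_SEQUENCE
"""

reads = list(parse_phd_ball(ball))
print(reads[0]["name"], reads[0]["version"], "".join(reads[0]["bases"]))
print(reads[0]["comment"])
```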


Phd ball files for 454 reads (in which traces are displayed) have more
information.  Here is an example:

BEGIN_SEQUENCE EBE03TV04IHLTF.77-243 1

BEGIN_COMMENT

CHROMAT_FILE: sff:reads.sff:EBE03TV04IHLTF
QUALITY_LEVELS: 99
TIME: Thu Jul 27 12:33:48 2000
TRACE_ARRAY_MIN_INDEX: 0
TRACE_ARRAY_MAX_INDEX: 4723
CHEM: 454

END_COMMENT

BEGIN_DNA
g 37 91
g 37 110
g 37 129
g 37 148
a 37 167
t 37 186
g 37 205
a 37 224
a 37 243
a 37 262
g 37 281
g 37 300
g 37 319
.
.
.
a 26 4385
t 26 4404
c 26 4423
t 30 4442
c 33 4461
g 33 4480
g 33 4499
t 33 4518
g 33 4537
g 36 4556
t 36 4575
a 33 4594
g 33 4613
g 33 4632
t 36 4651
g 26 4670
a 22 4689
END_DNA

END_SEQUENCE

(more BEGIN_SEQUENCE/END_SEQUENCE blocks to follow)


The line:
CHROMAT_FILE: sff:reads.sff:EBE03TV04IHLTF
indicates both the sff file that the read came from as well as the
read name.
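
As a small illustration, the following Python sketch splits such a line
into the sff file and the read name.  The function name
parse_sff_chromat is made up for this example, and the sketch assumes
the sff file name itself contains no ":" character.

```python
def parse_sff_chromat(line):
    """Split 'CHROMAT_FILE: sff:<sff file>:<read name>' into (sff file, read name)."""
    key, _, value = line.partition(":")
    if key.strip() != "CHROMAT_FILE":
        raise ValueError("not a CHROMAT_FILE line: %r" % line)
    parts = value.strip().split(":", 2)
    if len(parts) != 3 or parts[0] != "sff":
        raise ValueError("CHROMAT_FILE does not use the sff: form")
    return parts[1], parts[2]


sff_file, read_name = parse_sff_chromat(
    "CHROMAT_FILE: sff:reads.sff:EBE03TV04IHLTF")
print(sff_file, read_name)
```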


Forward-reverse pair (mate pair) information is in the phd ball (or
phd files), as follows:  Look at the end of this read
ERQJC7K01DAVCO_right.27-112.pr1 (see below) and notice the WR item:

WR{
template newbler 000727:123348
name: ERQJC7K01BWV8Q
lib: pairedreads.sff
}

This indicates the read is from a template named ERQJC7K01BWV8Q.  The
"lib:" line is optional.

The WR item:

WR{
primer newbler 000727:123348
type: univ rev
}

indicates it is the universal reverse read.  If there is another
read from template ERQJC7K01BWV8Q but of type "univ fwd", consed
will realize that these reads form a mate pair.  This is how Assembly
View is able to show mate pairs (consistent and inconsistent).


BEGIN_SEQUENCE ERQJC7K01DAVCO_right.27-112.pr1 1

BEGIN_COMMENT

CHROMAT_FILE: sff:pairedreads.sff:ERQJC7K01DAVCO
QUALITY_LEVELS: 99
TIME: Thu Jul 27 12:33:48 2000
TRACE_ARRAY_MIN_INDEX: 0
TRACE_ARRAY_MAX_INDEX: 6072
CHEM: 454

END_COMMENT

BEGIN_DNA
c 32 3720
c 32 3739
t 32 3758
g 32 3777
c 32 3796
a 32 3815
a 34 3834
a 34 3853
g 34 3872
c 34 3891
a 34 3910
a 34 3929
a 34 3948
a 34 3967
t 32 3986
c 30 4005
a 29 4024
a 29 4043
t 29 4062
t 29 4081
a 25 4100
a 25 4119
a 25 4138
.
.
.

t 16 5240
t 16 5259
t 16 5278
c 16 5297
t 21 5316
t 28 5335
t 28 5354
t 28 5373
t 21 5392
a 24 5411
a 21 5430
END_DNA

END_SEQUENCE

WR{
template newbler 000727:123348
name: ERQJC7K01BWV8Q
lib: pairedreads.sff
}

WR{
primer newbler 000727:123348
type: univ rev
}
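
The pairing rule described above (same template name, one "univ fwd"
read and one "univ rev" read) can be sketched in Python as follows.
This is only an illustration of the idea, not consed's actual code.
The function name find_mate_pairs and the forward and unpaired read
names in the demo are made up; only the reverse read and its template
come from the example above.

```python
from collections import defaultdict


def find_mate_pairs(reads):
    """reads: iterable of (read_name, template_name, primer_type) tuples.

    Returns a list of (fwd_read, rev_read) pairs that share a template.
    """
    by_template = defaultdict(lambda: {"univ fwd": [], "univ rev": []})
    for name, template, primer in reads:
        if primer in ("univ fwd", "univ rev"):
            by_template[template][primer].append(name)
    pairs = []
    for template in sorted(by_template):
        group = by_template[template]
        for fwd in group["univ fwd"]:
            for rev in group["univ rev"]:
                pairs.append((fwd, rev))
    return pairs


demo_reads = [
    # From the example above: template and primer type come from the WR items.
    ("ERQJC7K01DAVCO_right.27-112.pr1", "ERQJC7K01BWV8Q", "univ rev"),
    # Hypothetical reads, invented for this demo.
    ("hypothetical_fwd_read", "ERQJC7K01BWV8Q", "univ fwd"),
    ("hypothetical_unpaired_read", "OTHER_TEMPLATE", "univ fwd"),
]
print(find_mate_pairs(demo_reads))
```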


Reads can also have information which refers to a particular range of
bases in the read--"read tags".  Read tags are indicated as follows:


BEGIN_TAG
TYPE: polymorphism
SOURCE: consed
UNPADDED_READ_POS: 130 134
DATE: 10/07/02 12:37:23
END_TAG

In phd files, read tags can be either between the END_DNA and
END_SEQUENCE, or after END_SEQUENCE.  In phd balls, read tags can be
between END_DNA and END_SEQUENCE, but I'm not sure about after
END_SEQUENCE.

There are two other forms:

The form below allows a free-form comment of multiple lines between
BEGIN_COMMENT and END_COMMENT:

BEGIN_TAG
TYPE: polymorphism
SOURCE: fasta2PhdBall.perl
UNPADDED_READ_POS: 76 76
DATE: 10/07/02 13:35:03
BEGIN_COMMENT
chrpos: 27,868,168
END_COMMENT
END_TAG

And the form below is for non-free-form miscellaneous data: 

BEGIN_TAG
TYPE: polymorphism
SOURCE: fasta2PhdBall.perl
UNPADDED_READ_POS: 76 76
DATE: 10/07/02 13:35:03
line1 of misc data
line2 of misc data
END_TAG

I am not aware of any existing read tag that uses this form (let
me know if I am wrong), but you can create tags with this form.
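
The three forms of read tag shown above can be read with a Python
sketch like the one below.  The function name parse_read_tags is made
up for this illustration.  The sketch keeps TYPE, SOURCE,
UNPADDED_READ_POS, and DATE as plain strings, collects lines between
BEGIN_COMMENT and END_COMMENT as the comment, and treats any other line
inside the tag as miscellaneous data.

```python
KNOWN_FIELDS = ("TYPE", "SOURCE", "UNPADDED_READ_POS", "DATE")


def parse_read_tags(text):
    """Return a list of dicts, one per BEGIN_TAG ... END_TAG block."""
    tags = []
    tag = None
    in_comment = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN_TAG":
            tag = {"comment": [], "misc": []}
            in_comment = False
        elif tag is None:
            continue                      # outside any tag
        elif line == "END_TAG":
            tags.append(tag)
            tag = None
        elif line == "BEGIN_COMMENT":
            in_comment = True
        elif line == "END_COMMENT":
            in_comment = False
        elif in_comment:
            tag["comment"].append(line)   # free-form comment line
        else:
            key, sep, value = line.partition(":")
            if sep and key in KNOWN_FIELDS:
                tag[key] = value.strip()
            elif line:
                tag["misc"].append(line)  # non-free-form misc data
    return tags


tag_text = """BEGIN_TAG
TYPE: polymorphism
SOURCE: fasta2PhdBall.perl
UNPADDED_READ_POS: 76 76
DATE: 10/07/02 13:35:03
BEGIN_COMMENT
chrpos: 27,868,168
END_COMMENT
END_TAG

BEGIN_TAG
TYPE: polymorphism
SOURCE: fasta2PhdBall.perl
UNPADDED_READ_POS: 76 76
DATE: 10/07/02 13:35:03
line1 of misc data
line2 of misc data
END_TAG
"""

for t in parse_read_tags(tag_text):
    print(t["TYPE"], t["UNPADDED_READ_POS"], t["comment"], t["misc"])
```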


----------------------------------------------------------------------------

28.  TIMESTAMP MISMATCH

Consed can report an error that looks like this:

exception thrown: PHD file timestamp mismatch:  ace file says Sat Nov 3 01:19:20 PDT but PHD file says Sat Nov 4 13:25:27 PDT2007 for read FOWW00624.c1_1000_100_2

This is a very serious error and should be understood: a read with
timestamp Nov 3 was put into an assembly and generated an ace file
which also said this read had timestamp Nov 3.  So far the ace file
and the phd file (or phdball) match.  But then on Nov 4, someone
deleted this read's phd file and recreated it.  So the ace file refers
to a phd file (or phdball) that no longer exists.  The new phd file
(Nov 4) might have different base calls or even different numbers of
bases.  I have seen this.  It can cause consed to segmentation fault
or worse--it can cause problems that may seem unrelated.

The correct solution to this problem is to

    1) figure out why it happened, so that you can make sure it doesn't
    happen again, and

    2) delete the ace file and reassemble to get an ace file that
    matches the Nov 4 phd file (or phdball).
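
If you want to look for such mismatches in your own pipeline before
starting consed, a Python sketch like the following compares the TIME
on a read's DS line in the ace file with the TIME in the read's phd
comment block.  This is only an illustration and is not how consed
itself is implemented.  The function names are made up, the sketch
assumes a timestamp is the five words following "TIME:", and the phd
comment lines in the demo are hypothetical (the DS line is taken from
the Sanger sample ace file above).

```python
def ace_time(ds_line):
    """Return the timestamp after 'TIME:' on an ace DS line, or None."""
    _, sep, rest = ds_line.partition("TIME:")
    return " ".join(rest.split()[:5]) if sep else None


def phd_time(comment_lines):
    """Return the TIME value from the lines of a phd BEGIN_COMMENT block."""
    for line in comment_lines:
        if line.strip().startswith("TIME:"):
            return " ".join(line.split(":", 1)[1].split()[:5])
    return None


def timestamps_match(ds_line, comment_lines):
    """True only if both timestamps exist and are identical."""
    ace = ace_time(ds_line)
    return ace is not None and ace == phd_time(comment_lines)


ds = ("DS CHROMAT_FILE: K26-217c PHD_FILE: K26-217c.phd.1 "
      "TIME: Thu Sep 12 15:42:38 1996")

# Hypothetical phd comment lines: one matching, one recreated a day later.
same_phd = ["CHROMAT_FILE: K26-217c", "TIME: Thu Sep 12 15:42:38 1996"]
newer_phd = ["CHROMAT_FILE: K26-217c", "TIME: Fri Sep 13 09:00:00 1996"]

print(timestamps_match(ds, same_phd))
print(timestamps_match(ds, newer_phd))
```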

One user begged me on his knees to allow consed to proceed even after
detecting this error.  So I foolishly implemented the command line
option:

consed -allowTimestampMismatch

This will allow you to proceed past the timestamp mismatch so you can
get the segmentation fault and other problems I mentioned above.  If
you use this option, please do not ask me for any further support.  If
my time is spent chasing problems that turn out to be due to the use
of this option, I will remove the option.


----------------------------------------------------------------------------
29.  CONSED REFERENCES

(I suggest the 2013 article if you want to cite Consed.)

Gordon and Green. 2013. Consed: A Graphical Editor for Next-Generation
Sequencing. Bioinformatics. Volume 29 Number 22 pp.2936-2937.

Gordon, David. "Viewing and Editing Assembled Sequences Using Consed",
in Current Protocols in Bioinformatics, A. D. Baxevanis and
D. B. Davison, eds, New York: John Wiley & Co., 2004, 11.2.1-11.2.43.

Gordon D, Desmarais C, Green P: Automated finishing with
Autofinish. Genome Res 11:614-625 (2001).

Gordon, D., C. Abajian, and P. Green. 1998. Consed: A Graphical Tool
for Sequence Finishing. Genome Research. 8:195-202


------------------------------------------------------------------------
30.  RUNNING PHRED and PHRAP

This section assumes that you have installed consed and various files
according to INSTALLING CONSED (above).

(If you are intending to run Phrap on NextGen reads, skip ahead in
this section--the first part of this section is just for Sanger reads.)

phred and phrap *must* be run via the phredPhrap perl script.  If you
don't do this, you are on your own.  If you run phred on its own, and
then you run phrap on its own, you will get an ace file that will not
be usable by Consed.  If you run into problems that way (and you
probably will), do not email us--instead please use the
phredPhrap script.  To use the phredPhrap script to run phred and
phrap:

30.1)  Type:
phredPhrap -V

It should say:
080818
(or newer).

If it does not, then you probably have not installed all the perl
scripts from the scripts directory, as directed in INSTALLING CONSED.

30.2)  Make a private copy of the dataset called "standard"
(see GETTING YOUR OWN COPY OF A SAMPLE DATASET above).

30.3)  Then type:

cd standard

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

30.4)  Delete all the files in phd_dir and edit_dir:

rm phd_dir/*
rm edit_dir/*

30.5)  cd edit_dir

30.6)  Run phredPhrap by typing

phredPhrap

That's it--you no longer need to type *any* arguments, and generally
you should not.  


Then run Consed on the resulting ace file as indicated in the beginning of
the Quick Tour (above).  If you have any problems, this is the time to 
diagnose them before you use your own data.  

(If you want to use advanced phrap options, you can do
that, for example, like this:

phredPhrap -forcelevel 3

If you want to use advanced cross_match options, you will need to edit
the phredPhrap script, but be sure to save a backup.)

30.7)  COMMON PROBLEMS RUNNING PHREDPHRAP

Some problems are due to polyphred.  To check this, in
phredPhrap, edit the following line so that it reads:

$bUsingPolyPhred = 0;

This will make polyphred not be used.  If the problem then goes away,
you will know the problem has something to do with polyphred so do not
contact any of the phred/phrap/Consed people.  Instead, contact the
polyphred people:  http://droog.mbt.washington.edu and
debnick@u.washington.edu

30.8)  Permission problems.  Check that you have write access to the
phd_dir and edit_dir directories.  You can do this by trying to create
a file in those directories:

touch ../phd_dir/xxx
which creates a file

ls -l ../phd_dir/xxx
which checks if the file was created.

Do the same with ../edit_dir/xxx
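
If you would rather script this check, here is a minimal Python sketch
(it is not part of Consed, and the function name can_write_to is made
up for this example).  It does what the touch test above does by hand:
it tries to create a file in the directory and reports whether that
worked.

```python
import tempfile


def can_write_to(directory):
    """Return True if a file can be created (and removed) in directory."""
    try:
        # Actually creating a file is the same test that 'touch' performs.
        # The temporary file is deleted again when the 'with' block ends.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers "permission denied" as well as "no such directory".
        return False


if __name__ == "__main__":
    # Run from edit_dir, like the touch example above.
    for d in ("../phd_dir", "../edit_dir"):
        print(d, "is writable" if can_write_to(d) else "is NOT writable")
```

A False result means the same thing as a failed touch: get someone
locally who understands UNIX permissions to help.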

If you get a permission problem, do not contact me.  UNIX permission
problems are very simple for anyone who knows UNIX--get someone
locally who understands UNIX and can help you solve the permission
problem.


30.9)  WHY ARE ALL THE READS NOT IN THE ASSEMBLY?

You will notice that there are some contigs that contain only one
read.  You will also notice that there are some reads that are not
shown by Consed at all, since phrap did not put them into the ace
file.  Why?

If a read does not have a significant match (with Smith-Waterman score
exceeding minscore) to any other read, that read is not included in
the ace file.  Instead, that read is put in the '.singlets' file.
That read will not appear in Consed.

If a read does have a significant match to any other read, then it
will appear in the ace file and be shown by Consed.  However, such a
read might have other problems: it might not be possible to assemble
such a read with other reads.  In the case of EST's, this read may be a
unique representative of a particular gene (or a genomic sequence
contaminant) that happens to contain an Alu repeat and thus happens to
match other reads in the data set; or it may represent the only read
of a particular alternatively spliced form; or it may have data
anomalies of some sort (chimeras, etc.).  Such a read would end up in
a contig all of its own.


30.10)  ARE THERE READS THAT ARE TOTALLY UNALIGNED?

Unfortunately, yes.  In my opinion, Phrap shouldn't have put them in
the assembly at all.  But we just have to live with it.  You can find
if a read is totally unaligned by pointing at the read name in the
Aligned Reads Window and holding down the right mouse button.  Consed
will tell you the aligned positions, the high quality position, and
the chemistry of the read.


30.11)  CORRECTING FALSE JOINS MADE BY PHRAP

Phrap may put several reads together that you believe do not belong
together.  (For example, you may see several high quality
discrepancies between the reads.)  If you are sure these reads do not
belong together, you can force a subsequent reassembly by phrap to not
assemble those reads together.  You do this by finding a location
where there is a high quality discrepancy.  Then click on the read
with the right mouse button and release on 'Tell phrap not to overlap
reads discrepant at this location'.  There are no high quality
discrepancies with this dataset so Consed won't let you do this.
(Try it and see.)  However, when you use your own data, you may get
the chance! 

It is possible to automate this procedure using AutoEdit (see USING
AUTOEDIT).

30.12)  USING PHRAP ON NEXT-GEN READS

Note that phrap will take forever if you throw millions of reads at it.
But it will work for Next-Gen reads as long as there aren't too many
(it has worked on one dataset with a few hundred thousand).

You must have 1 or 2 fastq files of the next-gen reads, such as those
created by the Illumina software.  The fastq files must be of the simple
format: each read must have exactly 4 lines (the sequence cannot be more
than 1 line).  Make a consed directory structure like this:

myRegion/edit_dir
myRegion/phdball_dir
myRegion/solexa_dir
myRegion/phd_dir

Put the fastq files in solexa_dir.

cd to edit_dir
Run fastq2Phrap.perl like this:

fastq2Phrap.perl (fastq)
or
fastq2Phrap.perl (fastq1) (fastq2)

where fastq1 and fastq2 are paired mate files such that the nth read
in fastq1 is the mate of the nth read in fastq2.
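
Before running fastq2Phrap.perl, it can save time to confirm that the
fastq files really are in the simple 4-line format and that paired
files have the same number of reads.  The following Python sketch is
not part of Consed; the function names check_simple_fastq and
check_mates are made up for this example, and the checks are only the
ones implied by the description above (4 lines per read, '@' header,
'+' separator, sequence and quality of equal length).

```python
def check_simple_fastq(path):
    """Return the number of reads; raise ValueError on a format problem."""
    count = 0
    with open(path) as handle:
        while True:
            # Read one candidate record: exactly 4 lines.
            record = [handle.readline() for _ in range(4)]
            if record[0] == "":
                break  # end of file reached on a record boundary
            first_line = 4 * count + 1
            if record[3] == "":
                raise ValueError(f"{path}: incomplete read at line {first_line}")
            header, seq, plus, qual = (line.rstrip("\n") for line in record)
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(
                    f"{path}: read at line {first_line} is not in 4-line form")
            if len(seq) != len(qual):
                raise ValueError(
                    f"{path}: read at line {first_line} has sequence and "
                    "quality of different lengths")
            count += 1
    return count


def check_mates(path1, path2):
    """Check both files and confirm they hold the same number of reads."""
    n1, n2 = check_simple_fastq(path1), check_simple_fastq(path2)
    if n1 != n2:
        raise ValueError(f"{path1} has {n1} reads but {path2} has {n2}")
    return n1
```

A sequence wrapped over 2 lines shifts the '+' line out of position, so
it is reported as a format problem.  The sketch does not check that the
nth reads of the two files are actually mates; it only compares counts.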

Feel free to change the phrap parameters by editing fastq2Phrap.perl


------------------------------------------------------------------------


31.  WHAT IS AUTOFINISH?

Autofinish automatically chooses reads for finishing.  Autofinish
sometimes is able to completely finish a project with no human
decisions.  In other cases Autofinish mostly finishes a project, and a 
human just needs to do the final difficult problems since all the
routine problems have already been completed by Autofinish.  Thus a
human finisher is able to complete far more projects in the same
length of time.

Autofinish is flexible to the finishing strategy of your lab.  It can
be used to finish with just universal primer reads, just oligo walks,
just minilibraries, or a combination of these.  It can be used to
finish either genomic or cDNA.

Autofinish will do the following:

-close gaps
-improve sequence quality
-determine the relative orientation of contigs
-ensure that, at each consensus base, at least 2 reads from different
templates are aligned

(You can configure Autofinish to do any combination of these tasks.)

Autofinish will suggest the following types of experiments:

-universal primer reads (forward or reverse)
-custom primer reads with subclone templates
-custom primer reads with whole clone templates
-minilibraries (transposon or shatter) from subclone templates
-PCR

(You can configure Autofinish to suggest any combination of these
experiments.) 


------------------------------------------------------------------------


32.  USING AUTOFINISH


Note:  Before you use Autofinish on your own data, you must modify
determineReadTypes.perl.  See INSTALLING CONSED above for information
about this.  

To do the exercises in this section, it would help to be able to edit
a file under UNIX and run a program under UNIX.  If you can't do that,
have someone teach you.  (It will not work to edit a file on Windows
and then transfer to UNIX.)  Typical editors on UNIX are vi and emacs,
but pico is probably the simplest for occasional users.  You can find
more information on pico by googling "pico editor"

You should also learn how to examine a file in UNIX, how to move
around the filesystem, etc.  If you don't know how to do this,
consult:

http://www.washington.edu/computing/unix/startdoc/files.html
and
http://www.washington.edu/computing/unix/startdoc/directories.html

There are also many books about Unix at bookstores.

Get your own copy of the dataset "autofinish" (see above under
GETTING YOUR OWN COPY OF A SAMPLE DATASET).

32.1)  Then type:

cd autofinish/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

32.2)  Try starting Autofinish by typing:

consed -ace autofinish.fasta.screen.ace.1 -autofinish 

(If it says "consed: Command not found", consult the person who
installed Consed.)

If Autofinish says:

Run-time exception error; current exception: InputDataError
        No handler for exception.
Abort

that means that you have not followed the instructions under
'INSTALLING CONSED' above.  Please follow those instructions and then
try this again.

When you have successfully run the above command, Autofinish will
create 7 files:

autofinish.fof
(project name).001014.155627.customPrimers
(project name).001014.155627.nav
(project name).001014.155627.out
(project name).001014.155627.sorted
(project name).001014.155627.univForwards
(project name).001014.155627.univReverses

Where '001014.155627' is replaced by your current date and time in
format YYMMDD.HHMISS.  The first file, autofinish.fof, is a file of
filenames.  It contains the names of the other files.
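
If you handle these files from a script, the name itself tells you the
project, the time of the run, and the kind of file.  Here is a small
Python sketch (not part of Consed; the function name
parse_autofinish_name is made up) that splits a name of the form shown
above.  It assumes only the naming pattern described in this section.

```python
from datetime import datetime


def parse_autofinish_name(filename):
    """Split '(project name).YYMMDD.HHMISS.(kind)' into its three parts."""
    # Split from the right so a project name containing dots still works.
    project, date_part, time_part, kind = filename.rsplit(".", 3)
    stamp = datetime.strptime(date_part + "." + time_part, "%y%m%d.%H%M%S")
    return project, stamp, kind
```

For example, a name ending in .001014.155627.customPrimers gives the
kind "customPrimers" and a time of 15:56:27 on 14 October 2000.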

(project name).001014.155627.univForwards
    is the summary file of the suggested universal forward subclone reads
(project name).001014.155627.univReverses
    is the summary file of the suggested universal reverse subclone reads
(project name).001014.155627.customPrimers
    is the summary file of the suggested custom primer reads

These are the files you will typically use for directing your bench
work.  If you like, you can import these files into Excel since the
fields are separated by commas.

The .out file is the Autofinish output file.  This is the most
important file to examine while you are evaluating Autofinish.  If you
want to know *why* Autofinish picked the reads it did, it will tell
you.  Consult this file before you start complaining about
Autofinish's choices.  I've had people complain, and then, once they
look in the .out file (*not* any of the other files), they learn
information that persuades them that Autofinish was correct all along.
This is hard to over-emphasize, but I will try to over-emphasize it:

CONSULT THE .out FILE CAREFULLY IF YOU DISAGREE WITH ANY OF AUTOFINISH'S
CHOICES!

It will tell you lots more, such as the orientation of the contigs.
It will also tell you the value of all Autofinish parameters used.  If you
try to customize one of the parameters, check in the .out file to be
sure that Autofinish used the value you intended.

The .sorted file gives the reads sorted by contig and position.  This
file is useful if you want to find what reads Autofinish suggested for a
particular location.  It is *not* useful for understanding *why*
Consed chose a particular read.  It is deliberately terse to make it
useful for automating the ordering of reads.

The .nav file is a custom navigation file (see "CUSTOM NAVIGATION" far
below).  This file allows a Consed user to just click 'next', 'next',
... to review all of Autofinish's suggestions in context.  This is a
great way to quickly and easily review all of the reads suggested by
Autofinish.

This finishing tool is designed to be run in batch after each
assembly.  In a high throughput operation, the production people can
make these reads without anyone using Consed to examine the assembly
interactively.  Only when Autofinish cannot help you any longer
(generally after 3 or more times of running Autofinish, making the
reads, and re-assembling), must you bring up Consed graphically and
examine the assembly.

We suggest that you write some of your own software to parse the
summary files to automatically order primers and reads.  The summary
files (.customPrimers, .univForwards, .univReverses) will not change
much but the .out file may change, so don't try to parse it.
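
As a starting point for such software, here is a minimal Python sketch
that reads one of the comma-separated summary files into rows.  It is
not part of Consed, the function name read_summary is made up, and it
deliberately makes no assumption about what each column means: look at
one of your own summary files to decide which fields you need.

```python
import csv


def read_summary(path):
    """Return the rows of a comma-separated summary file as lists of strings."""
    with open(path, newline="") as handle:
        # Blank lines come back from csv.reader as empty lists; skip them.
        return [row for row in csv.reader(handle) if row]
```

Each returned row is a list of strings, one per comma-separated field,
in the order they appear on the line.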


32.3)  AUTOFINISH:  MINIMUM NUMBER OF ERRORS FIXED PER READ

By default, the minimum number of errors fixed by an experiment is
0.02 

Human finishers typically look for low consensus quality
regions--regions that have one or more bases below a particular
quality threshold.  However, Autofinish can do better: it can find
regions where the *total* number of errors is greater than some
particular cutoff value.  This method can find regions where none of
the bases are low quality, but many are medium quality and thus
the total number of errors in the region is high.  Autofinish will
also ignore regions that have a very few low quality bases, as long as
the total number of errors is smaller than your cutoff.  This is a
better criterion because it is the total number of errors that you are
trying to reduce when finishing--not the number of bases with quality
below some arbitrary cutoff.

Two bases of quality 20 have 0.02 errors (on average).  Similarly, 20
bases of quality 30 have 0.02 errors (on average).  (Quality values
were explained at the beginning of this document.)  Suppose that you
want Autofinish to suggest an additional read for an area that even
just has one quality 20 base.  Then you would lower the minimum to 0.01,
the expected number of errors in a single quality 20 base.  (Be aware
that Autofinish will then consider 10 quality 30 bases to be just as
severe as 1 quality 20 base since, on average, they will both have
precisely the same number of errors: 0.01.)
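
The arithmetic behind these numbers is the standard phred relation: a
base of quality q has error probability 10^(-q/10), and the expected
number of errors in a region is the sum of those probabilities.  The
Python sketch below only illustrates that arithmetic; it is not
Autofinish's code and the function names are made up.

```python
def error_probability(quality):
    """Phred quality q corresponds to an error probability of 10^(-q/10)."""
    return 10 ** (-quality / 10)


def expected_errors(qualities):
    """Expected number of errors = sum of the per-base error probabilities."""
    return sum(error_probability(q) for q in qualities)
```

With these functions, expected_errors([20, 20]) comes to 0.02, and ten
quality 30 bases come to the same total as one quality 20 base (0.01),
up to floating-point rounding.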

32.4)  EDIT PARAMETERS:  HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS

This shows how to change
consed.autoFinishMinNumberOfErrorsFixedByAnExp.  

To change any other
parameter, follow these same instructions replacing

consed.autoFinishMinNumberOfErrorsFixedByAnExp with the parameter you
want to change.

In the edit_dir directory is a file called "consedrc".
(If it isn't there, this exercise will create it.)

In that consedrc, add the following line:

consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.01

You can do this using an editor, such as pico, or you can do it with
Consed.  To do it with Consed, bring up consed as follows:

consed -editConsedrc

Up will pop the "Edit Parameters" window.  Near the top is
"consed.autoFinishMinNumberOfErrorsFixedByAnExp".  Point and click in
the box on the left containing 0.02 just underneath
"consed.autoFinishMinNumberOfErrorsFixedByAnExp".  After clicking, the
box outline should turn bold and the cursor should start blinking.
Change the 0.02 to 0.01.  Click on "just project" near the bottom of
the window.  The box containing 0.01 should turn red indicating that
it is now different than the default.  Then click "save".  A box
titled "Name of parameter file to write" should pop up.  Click "dismiss".
Note: you can change more than one of these values before clicking
"save". 

To be sure that everything happened correctly, look at consedrc file.
It should contain the line:

consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.01

(If you don't know how to view a file, get a UNIX book and learn the
commands "less", "more", "pico", "vi", or "emacs".)

(Get in the habit of checking consedrc after using Consed's Edit
Parameter Window.) 
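
If you change parameters often (for example from a batch script), you
can also edit consedrc programmatically.  The Python sketch below is
not part of Consed, and the function names are made up.  It assumes
only what this section shows: consedrc holds one "name: value" entry
per line.

```python
def set_consedrc_param(path, name, value):
    """Set 'name: value' in the consedrc at path, creating the file if needed."""
    try:
        with open(path) as handle:
            lines = handle.read().splitlines()
    except FileNotFoundError:
        lines = []
    new_line = f"{name}: {value}"
    replaced = False
    for i, line in enumerate(lines):
        # Compare the whole text before the first colon, so a parameter
        # whose name merely starts with `name` is left alone.
        if line.split(":", 1)[0].strip() == name:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


def get_consedrc_param(path, name):
    """Return the value of name as a string, or None if it is not set."""
    with open(path) as handle:
        for line in handle:
            key, sep, value = line.partition(":")
            if sep and key.strip() == name:
                return value.strip()
    return None
```

Calling get_consedrc_param afterwards is the scripted version of the
habit recommended above: check consedrc after every change.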


32.5)  AUTOFINISH:  MINIMUM NUMBER OF ERRORS FIXED PER READ (continued)

Then run Autofinish again:

consed -ace autofinish.fasta.screen.ace.1 -autofinish 

Look at the files just created by typing 'ls -tlr' and look at the
.out file by bringing it up with your favorite UNIX editor.  You
should see:

PARAMETERS_CHANGED_FROM_DEFAULTS {
.
consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.010
.
.

Further down is a section:

PARAMETERS {
! If you want to modify any of these parameters, just cut/paste
! the relevant line into your ~/consedrc file
! (or into the edit_dir/consedrc file)
! In the following, I have annotated the parameters with the following
! symbols:
!
! (YES)  freely customize to your own site
! (OK)  don't change unless you have a specific need and know what you
!         are doing
! (NO)  don't change this!

This section contains all Autofinish parameters, whether you have
changed them or not.  Thus a changed parameter will be in both lists.

Find 
consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.010
in this second list.

Then compare the .sorted files from this run of Autofinish and the
previous run of Autofinish, in which the
consed.autoFinishMinNumberOfErrorsFixedByAnExp value was 0.02.  You will
notice that there are 2 additional reads suggested when the parameter
is 0.01.  There is a resequence with dye terminator chemistry of the
djs228_474 template and a de novo reverse on template djs228_2632.
Look at the .out file to see why Autofinish chose these reads.  It
will indicate that the first read is mainly to fix 0.01 errors in the
region from 2536 to 2545 and the second read to mainly fix 0.01 errors
from 969 to 978.

Bring up Consed to see what is in the 10 base region from 2536 to
2545.  You will see that there is a quality 25 base at 2539 and a
quality 21 base at 2540.  After that come some bases whose qualities
are in the high 30s.

In the Aligned Reads Window, point at the Misc menu, hold down the
left mouse button, and release on Show Error for a Region.  Enter 2536
and 2545 for the "Left Consensus Position of Region" and "Right
Consensus Position of Region" respectively and click on "Calculate".
You will see that there are .0135 errors in this region.  This is less
than 0.02 so Autofinish will not try to fix this region unless you
reduce consed.autoFinishMinNumberOfErrorsFixedByAnExp to 0.01

The default is 0.02 because most labs do not want to fix regions that
have less than 0.02 errors.


32.6)  DIVERSION:  UNIX LESSON

Note for UNIX novices: Earlier, I said that you only needed to know 3
UNIX commands: pwd, ls, and cd.  Then I added "ls -a", "less" and an
editor (such as pico).  Now I want you to learn one more:

ls -tlr

This is the same as ls, but it puts one file on a line and prints the
lines so that the most recent files are on the bottom.  Since you will 
be creating many, many files as you work through these Autofinish
exercises, this command gives an easy way to see the files you have
just created, without having to always look at autofinish.fof to look
for the names of the files you just created.
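
If you are driving Autofinish from a script rather than typing
commands, the equivalent of reading the bottom line of 'ls -tlr' is to
pick the file with the latest modification time.  A minimal Python
sketch (not part of Consed; the function name newest_file is made up):

```python
import glob
import os


def newest_file(pattern="*.out", directory="."):
    """Return the most recently modified file matching pattern, or None."""
    matches = glob.glob(os.path.join(directory, pattern))
    # os.path.getmtime is the modification time that 'ls -t' sorts by.
    return max(matches, key=os.path.getmtime) if matches else None
```

Run from edit_dir, newest_file() returns the .out file of the latest
Autofinish run, and newest_file("*.sorted") the matching .sorted file.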


32.7)  AUTOFINISH:  CHANGING MELTING TEMPERATURES

Use 'ls -tlr' to find the most recent .out file.  Search in the
.out file (using your favorite editor) for MeltingTemp and you will
find the following lines:

consed.primersMinMeltingTemp: 55
consed.primersMaxMeltingTemp: 60

Some labs prefer to use primers with lower melting temperatures.  In
your consedrc file, put the following lines:

consed.primersMinMeltingTemp: 50
consed.primersMaxMeltingTemp: 55

You can do this by following the instructions above under HOW TO
CHANGE CONSED/AUTOFINISH PARAMETERS.  When you are done doing that,
look in the consedrc file to make sure it contains the above 2 lines.

Then run Autofinish again:

consed -ace autofinish.fasta.screen.ace.1 -autofinish

Using your favorite editor, check that the .out file you just
created says:

consed.primersMinMeltingTemp: 50
consed.primersMaxMeltingTemp: 55

(You can find the most recent .out file by typing 'ls -tlr'.)

Compare the .sorted files from this run of Autofinish and the previous
run.  The difference should be the custom primer read:

The previous .sorted file had:
tcttttgtctttccatatacatttt,56
which means the melting temperature is 56.

The latest .sorted file had:
cattttagaatcagtttgttg,50
which means the melting temperature is 50.
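
Comparing two .sorted files by eye gets tedious after a few runs.  The
Python sketch below (not part of Consed; the function name
compare_sorted_files is made up) lists the lines that appear in only
one of the two files.  It compares whole lines as text, so it assumes
that an unchanged suggestion is written identically in both runs; if
your files contain run-specific fields, compare only the fields you
care about.

```python
def compare_sorted_files(old_path, new_path):
    """Return (lines only in the old file, lines only in the new file)."""
    def load(path):
        with open(path) as handle:
            # Keep file order; ignore blank lines.
            return [line.rstrip("\n") for line in handle if line.strip()]

    old, new = load(old_path), load(new_path)
    old_set, new_set = set(old), set(new)
    only_old = [line for line in old if line not in new_set]
    only_new = [line for line in new if line not in old_set]
    return only_old, only_new
```

In this exercise, the custom primer line with melting temperature 56
should show up in the first list and the one with melting temperature
50 in the second.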


32.8)  AUTOFINISH:  JUST CLOSING GAPS

You could use Autofinish to just close gaps (you are not interested in
fixing single subclone regions or weak regions).  Add the following
to the consedrc file (and remove everything else so that Autofinish
uses the default values for everything else):

consed.autoFinishCoverLowConsensusQualityRegions: false
consed.autoFinishCoverSingleSubcloneRegions: false

If you are using the Edit Parameter Window to change these values, you
will find them when scrolling about 1/3 way down.  Change the
consed.primersMinMeltingTemp and consed.primersMaxMeltingTemp back to
their original values.  Then check the consedrc to make sure it
contains the above 2 lines.  (Get in the habit of checking consedrc
after using Consed's Edit Parameter Window.)

Now you should see in the .sorted file just 4 reads:  one custom
primer read pointing out the left end of the contig and 3 reverses off
the left end of the contig.  The right end is not extended because
Autofinish recognizes that it is the end of the BAC.

You can change any of the parameters listed at the top of the
Autofinish output file (or actually any of the more exhaustive list of
parameters listed in the 'Info' menu, 'Show Consed Parameters' list.)

We believe the defaults are an excellent starting point.


32.9)  AUTOFINISH:  JUST CLOSING GAPS JUST USING WALKS

One high-throughput operation was only interested in closing gaps and
only interested in using walks to close those gaps.  This is the
appropriate set of Autofinish parameters to do this:


consed.autoFinishCoverSingleSubcloneRegions: false
consed.autoFinishCoverLowConsensusQualityRegions: false
consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
consed.autoFinishAllowPCR: false
consed.autoFinishAllowResequencingReads: false
consed.autoFinishAllowMinilibraries: false
consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs: false
consed.autoFinishCallReversesToFlankGaps: false


(and every other parameter left the default value).

The first 2 parameters are the same as the "AUTOFINISH:  JUST CLOSING
GAPS" section (above).  The other parameters tell Autofinish all of
the types of reactions it is not allowed to use, leaving just walks.

Try this.  Now you should see only a single read, a walk, pointing
left off the left end of the contig.


32.10)  AUTOFINISH:  NOT REPEATING FAILED EXPERIMENTS

For this exercise, keep a backup copy of the ace file:

cp autofinish.fasta.screen.ace.1 autofinish.fasta.screen.ace.1.save

Running Autofinish with the -doExperiments parameter (see below)
causes Autofinish to record its suggestions in the ace file (hence
changing the ace file).  If one of these suggested reads
fails to fix a problem, when Autofinish is run again it won't pick the
same read again.

consed -ace (ace file name) -autofinish -doExperiments

If a forward or reverse universal primer read failed, Autofinish (when 
run in a subsequent round) will not suggest that same experiment.  If
a custom primer read fails, Autofinish will not pick that same
experiment again, and it won't pick a custom primer read that is even
close to the failed one.  'Close' is defined by the parameter:

consed.autoFinishNewCustomPrimerReadThisFarFromOldCustomPrimerRead: 50

In addition, Autofinish (the next time it is run) will tell you how
well each experiment did in solving the problem it was intended to
solve.

Return the parameters to the defaults and try this by running
Autofinish twice like this:

consed -ace autofinish.fasta.screen.ace.1 -autofinish -doExperiments
consed -ace autofinish.fasta.screen.ace.1 -autofinish -doExperiments

and look at the .out file from the 2nd run.  (You can find the most
recent .out file by typing 'ls -tlr'.)  You should see lines such 
as this:

rejecting experiment: reverse universal primer read with template djs228_1094
because an earlier round of autofinish called this with expid: 1
rejecting experiment: reverse universal primer read with template djs228_1422
because an earlier round of autofinish called this with expid: 2
rejecting experiment: reverse universal primer read with template djs228_1034
because an earlier round of autofinish called this with expid: 3

This is Autofinish trying experiment after experiment but finding they 
were already suggested in an earlier round of Autofinish.

You should not type '-doExperiments' if you do not intend to do the
experiments Autofinish suggests.  If you use -doExperiments, but you
don't really do the experiments, and then you run Autofinish again,
Autofinish will be very upset--it will think that all of its suggested
experiments failed (because it can't find them).  It will see that all
of the problems are still present but it will think that it should not
choose any of those same experiments again so it will suggest
different experiments that will not be as good as its original
suggestions.

-doExperiments will also cause suggested oligos to be tagged.

Note:  consed -doExperiments cannot be run on gzip'ed ace files--it
must be run on uncompressed (regular) ace files.

Primer id's created by Autofinish use the same naming scheme as
primers created in Consed and they will not conflict with each other.
For example, if Autofinish creates oligos djs14.1, djs14.2, and
djs14.3, then the next primer that a user accepts will be djs14.4.  If 
Autofinish is run a second time, it will start with primer djs14.5.
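
If your own ordering software needs to predict or check primer names,
the numbering in the example above can be sketched as follows.  This is
only an illustration of the shared counter described here, not
Consed's actual bookkeeping; the function name next_primer_id is made
up, and "one more than the highest number in use" is my assumption
about how the next number is chosen.

```python
def next_primer_id(prefix, existing_ids):
    """Return 'prefix.N', with N one more than the highest number in use."""
    highest = 0
    for primer_id in existing_ids:
        head, sep, number = primer_id.rpartition(".")
        # Only count ids of the form prefix.<digits>.
        if sep and head == prefix and number.isdigit():
            highest = max(highest, int(number))
    return f"{prefix}.{highest + 1}"
```

Given djs14.1, djs14.2, and djs14.3, this returns djs14.4, matching the
example above.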

When you have completed this exercise with -doExperiments, replace the 
original .ace file by typing:

cp autofinish.fasta.screen.ace.1.save autofinish.fasta.screen.ace.1 


32.11)  AUTOFINISH:  doNotFinish particular regions

If there is a region that you don't care to finish (e.g., it has
already been finished by an overlapping clone or you know there is no
gene there), then you can put a doNotFinish tag on the consensus and
Autofinish will not try to finish this area.  

First, delete the consedrc file (or, if you are using the Edit
Parameter Window of Consed, restore the parameters to their default
values) and run Autofinish again:

consed -ace autofinish.fasta.screen.ace.1 -autofinish

Bring up consed:

consed -ace autofinish.fasta.screen.ace.1

and put a doNotFinish tag on the region from 2000 to 4000.  (If you
don't know how to do that, read through the Consed Quick Tour, above.)
Save the assembly as autofinish.fasta.screen.ace.2

Run Autofinish again:

consed -ace autofinish.fasta.screen.ace.2 -autofinish

Look at the .out files for each of the 2 runs of Autofinish.  (You can
find the most recent .out files by typing 'ls -tlr'.)  You will notice
in the .out file for the 2nd run of Autofinish that, other than
the experiments to extend the contig to the left, there is only one
experiment which is from 315 to 1662.  If you find that experiment in
the .out file, it will say "Contig1 0.05 errors fixed in region from
315 to 1662 fixing 0.05 errors from 969 to 978"

The "969 to 978" gives the worst 10 base window that the read is
intended to fix.  If you look with Consed, you will see that
there is a quality 12 base at 974.

You can also use doNotFinish tags to prevent Autofinish from
*extending* a contig into a gap by putting a doNotFinish tag near the
end of the contig and setting the following Autofinish parameters:

consed.autoFinishDoNotExtendContigsWhereTheseTagsAre: doNotFinish
consed.autoFinishDoNotExtendContigsIfTagsAreThisCloseToContigEnd: 50


32.12)  AUTOFINISH:  NOT USING PARTICULAR SUBCLONE TEMPLATES

If you no longer have a template that was used in shotgun, and thus
you don't want Autofinish to pick that template, you can put it in a
file badTemplates.txt in edit_dir.  This is a simple file with one
name per line.  

Using your favorite UNIX editor, create a file called
"badTemplates.txt" in edit_dir.  Make it contain a single line:

djs228_1094

Delete consedrc (or, if you are using the Edit Parameter Window,
restore the parameters to their defaults) and run autofinish again:

consed -ace autofinish.fasta.screen.ace.1 -autofinish

Search the .out file for djs228_1094.  You will find one line like
this:

not using template: djs228_1094 because  in bad templates file

Now try deleting badTemplates.txt and running autofinish again the
same way.  You will notice there are many differences in reads chosen,
since djs228_1094 is now available again for making reverses as well as
a template for custom primer walks.

badTemplates.txt can accept "*" (match any characters) as part of the
name.  For example, djs140_23* will eliminate templates:
 
        djs140_235684
        djs140_235783
        djs140_2326
        etc.
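
To preview which templates a badTemplates.txt entry would exclude, you
can imitate the "*" matching with Python's fnmatch module.  This sketch
is not Consed's own matching code and the function name
is_bad_template is made up.  Note that fnmatch also gives special
meaning to "?" and "[...]", which this document does not say Consed
does, so stick to "*" when comparing results.

```python
from fnmatch import fnmatchcase


def is_bad_template(template, patterns):
    """Return True if template matches any line of a badTemplates-style list."""
    # fnmatchcase is case-sensitive on every platform; '*' matches any
    # run of characters, including none.
    return any(fnmatchcase(template, pattern) for pattern in patterns)
```

With the single pattern djs140_23*, the three templates listed above
all match, while a name such as djs140_2410 does not.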
 
32.13)  AUTOFINISH:  NOT USING ENTIRE LIBRARIES FOR FINISHING

In addition to the badTemplates.txt file, you can use a
badLibraries.txt file which contains a list of all libraries that are
off-limits to Autofinish (e.g., you threw away all subclone templates
from this library or they are from a different lab which gave you the
chromatograms but not the templates).  Autofinish determines the
library of a read by the following in the PHD file:

WR{
template dscript 990603:090231
name: djs366_101
lib: library1
}

where "library1" is replaced by the actual library name.  Take a look
at any phd file in autofinish/phd_dir and you will see this.
Generally, determineReadTypes.perl puts this library information into
the PHD file.
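
To check which library your own PHD files claim, you can pull the
name: and lib: lines out of the WR{ block shown above.  The Python
sketch below is not part of Consed; the function name
template_and_library is made up, and it assumes the block looks exactly
like the example (a "template ..." line followed by "key: value"
lines).

```python
def template_and_library(phd_text):
    """Return (template name, library) from the first template WR{ } block."""
    lines = iter(phd_text.splitlines())
    for line in lines:
        if line.strip() != "WR{":
            continue
        # Collect the lines of this block up to the closing brace.
        block = []
        for inner in lines:
            if inner.strip() == "}":
                break
            block.append(inner.strip())
        if block and block[0].startswith("template "):
            fields = {}
            for item in block[1:]:
                key, sep, value = item.partition(":")
                if sep:
                    fields[key.strip()] = value.strip()
            return fields.get("name"), fields.get("lib")
    return None  # no template WR{ } block found
```

For the block shown above this returns ("djs366_101", "library1").  A
result of None, or a library of None, means the lib: line is missing
and badLibraries.txt cannot work for that read.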

Make sure that badTemplates.txt is deleted and consedrc is either
deleted (or use the Edit Parameter Window to restore the defaults) and
run Autofinish again.

consed -ace autofinish.fasta.screen.ace.1 -autofinish

Now create a file badLibraries.txt containing a single line:

lib1

and run autofinish again:

consed -ace autofinish.fasta.screen.ace.1 -autofinish


Look at the .out file.  You will see lines like this:

not using template: djs228_1034 because  in bad libraries file
not using template: djs228_1051 because  in bad libraries file
not using template: djs228_1094 because  in bad libraries file
.
.
.

You will see that there are no reads suggested that use any of these
templates, even though some of them (e.g., djs228_1034) were used in
the Autofinish run (above) before you created the badLibraries.txt
file.

When you start doing this with your own data, you must put the lib:
line into your phd files.  Do this by modifying determineReadTypes.perl.


32.14)  MULTIPLE LIBRARIES WITH DIFFERENT INSERT SIZES

If different libraries have different insert sizes, Autofinish must
know the insert size of each library.  If there are 5 or more
forward-reverse pairs, where the forward and reverse are both in the
same contig, then Consed/Autofinish calculates the insert size of the
library by finding the mean and standard deviation of the insert sizes 
of these forward-reverse pairs.  The maximum insert size of the
library is set at the mean plus 2.5 times the standard deviation.
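
The rule in the paragraph above can be written out as a short Python
sketch.  This is not Consed's code and the function name
max_insert_size is made up.  The threshold of 5 pairs and the factor
of 2.5 come from the text; the use of the sample (n-1) standard
deviation is my assumption, since the text does not say which form
Consed uses.

```python
import statistics

MIN_PAIRS = 5        # from the text: 5 or more pairs are required
SD_MULTIPLIER = 2.5  # from the text: maximum = mean + 2.5 * sd


def max_insert_size(pair_insert_sizes):
    """Return mean + 2.5 * sd, or None when there are too few pairs."""
    if len(pair_insert_sizes) < MIN_PAIRS:
        # Too few same-contig forward-reverse pairs to trust the statistics.
        return None
    mean = statistics.mean(pair_insert_sizes)
    # Assumption: sample standard deviation (divides by n - 1).
    sd = statistics.stdev(pair_insert_sizes)
    return mean + SD_MULTIPLIER * sd
```

A return value of None stands for the case where the statistics are
considered unreliable and another source of insert sizes is needed.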

If there are fewer than 5 forward-reverse pairs, where the forward and 
reverse are both in the same contig, Consed/Autofinish considers this
statistical information unreliable so instead relies on a
file called 'librariesInfo.txt' which must be placed in edit_dir
(where the ace file is).  This file looks like this:

   
LIB{
name: lib0
avgInsertSize: 1500
maxInsertSize: 3000
stranded: double
cost: 600.0
}
 
LIB{
name: lib1
avgInsertSize: 3000
maxInsertSize: 5000
stranded: double
cost: 1000.0
}
 
LIB{
name: lib2
avgInsertSize: 10000
maxInsertSize: 12000
stranded: double
cost: 5000.0
}
 
 
'name' is the name of the library.  This is the name that goes into
the PHD file after the 'lib:' keyword (see AUTOFINISH:  NOT USING
ENTIRE LIBRARIES FOR FINISHING above).  'avgInsertSize' is the
average insert size of the library--the figure to be used by
Autofinish if there are not enough forward/reverse pairs for
Autofinish to calculate the mean insert size of the library.
'maxInsertSize' is the maximum insert size--if forward/reverse pairs
are further apart than this, Autofinish will assume these reads are
misassembled.  'stranded' is whether this template is single or double 
stranded.  'cost' is the cost of making a minilibrary out of a
template from this library.
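
If you generate or check librariesInfo.txt from a script, the LIB{
blocks are easy to read.  The Python sketch below is not part of
Consed and the function names are made up; it assumes the file looks
like the example above ("LIB{", then "key: value" lines, then "}").

```python
def parse_libraries_info(text):
    """Return {library name: {field: value}} from LIB{ ... } blocks."""
    libraries = {}
    current = None  # fields of the block being read, or None outside a block
    for raw in text.splitlines():
        line = raw.strip()
        if line == "LIB{":
            current = {}
        elif line == "}" and current is not None:
            libraries[current.get("name")] = current
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return libraries


def largest_max_insert_size(libraries):
    """Return the largest maxInsertSize over all libraries, as an integer."""
    return max(int(lib["maxInsertSize"]) for lib in libraries.values())
```

For the example file above, largest_max_insert_size gives 12000, the
largest of the three maxInsertSize values.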

In consedrc, there must be a line like this:

consed.primersMaxInsertSizeOfASubclone: 5000

where 5000 is replaced by the maximum insert size of all of
your different libraries.

For this exercise make consedrc have a single line:

consed.primersMaxInsertSizeOfASubclone: 12000

Alternatively, use the Edit Parameter Window to set
consed.primersMaxInsertSizeOfASubclone to 12000.


For this exercise I have a file in edit_dir called
"librariesInfo.txt_hide".  To make Autofinish pay attention to it, do
the following:

cp librariesInfo.txt_hide librariesInfo.txt

Delete badLibraries.txt:
rm badLibraries.txt

Before you run Autofinish again, first restart Consed:

consed -ace autofinish.fasta.screen.ace.1

On Consed's Main Window, point to 'Info', hold down the left mouse
button, and release on 'Show Library Info'.  You should see the names
of your libraries and the correct number of reads in each library.
This feature will be useful in debugging your use of librariesInfo.txt.


Then run Autofinish again:

consed -ace autofinish.fasta.screen.ace.1 -autofinish


Look at the .out file.  Look for the following:

"Choosing de novo universal primer reads to try to close gaps"

You will see there are many reads under this heading.  These are 
the lib1 and lib2 reads that have a large average insert size and thus 
span the gap.  Autofinish did not choose some of these reads before
because, if the insert size were only 1500 bases, these reads would
not have helped to close the gap.

When you are done with this exercise, delete librariesInfo.txt and
consedrc.


When there are many reads from the same library, Consed/Autofinish
will look at the forward/reverse pairs that are within the same contig 
(so the insert size of that template can be directly measured) and 
figure out the mean and standard deviation of the insert size of
templates from that library.  Consed/Autofinish will use these numbers 
rather than the number from librariesInfo.txt.


32.15)  AUTOFINISH CLOSING GAPS WITH MINILIBRARIES


If you want Autofinish to *only* suggest minilibraries to close
gaps, use the following parameters:

consed.autoFinishAllowWholeCloneReads: false
consed.autoFinishAllowCustomPrimerSubcloneReads: false
consed.autoFinishAllowResequencingReads: false
consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
consed.autoFinishAllowPCR: false
consed.autoFinishAllowResequencingAUniversalPrimerAutofinishRead: false
consed.autoFinishCallReversesToFlankGaps: false
consed.autoFinishAllowMinilibraries: true
consed.autoFinishAlwaysCloseGapsUsingMinilibraries: true
consed.autoFinishPrintMinilibrariesSummaryFile: true
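
If you switch between several parameter sets like this one, you may
find it convenient to generate consedrc from a script.  The following
is a minimal sketch; the function and variable names are made up, and
the parameter names and values are just those listed above.

```python
MINILIBRARY_ONLY = {
    "autoFinishAllowWholeCloneReads": False,
    "autoFinishAllowCustomPrimerSubcloneReads": False,
    "autoFinishAllowResequencingReads": False,
    "autoFinishAllowDeNovoUniversalPrimerSubcloneReads": False,
    "autoFinishAllowPCR": False,
    "autoFinishAllowResequencingAUniversalPrimerAutofinishRead": False,
    "autoFinishCallReversesToFlankGaps": False,
    "autoFinishAllowMinilibraries": True,
    "autoFinishAlwaysCloseGapsUsingMinilibraries": True,
    "autoFinishPrintMinilibrariesSummaryFile": True,
}

def render_consedrc(params):
    """Return consedrc text, one 'consed.name: value' line per parameter."""
    lines = []
    for name, value in params.items():
        if isinstance(value, bool):
            # consedrc spells booleans in lower case
            value = "true" if value else "false"
        lines.append(f"consed.{name}: {value}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(render_consedrc(MINILIBRARY_ONLY), end="")
```

Redirect the output into a file named consedrc in edit_dir.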


Get your own copy of the dataset "assembly_view" (see above under
GETTING YOUR OWN COPY OF A SAMPLE DATASET).

32.16)  Then type:

cd assembly_view/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

Attention! This is *not* the same directory you have been using.  The
directory you have been using, autofinish/edit_dir, does not have any
gaps, so it cannot be used for this exercise.

Create a consedrc file with the parameters above in it.
Alternatively, start Consed in this directory and use the Edit
Parameter Window to modify the parameters as above.  Then run
Autofinish:

consed -ace assembly_view.fasta.screen.ace.1 -autofinish

When it has completed, look in the .out file.  You will see the
following:

Enough existing fwd/rev pairs to establish:
Left end of Contig3 has 13 fwd/rev pairs connecting it to
Right end of Contig2 with gap size -460 (contigs overlap)

Trying to suggest minilibrary for gap between right end of Contig2 and left end of Contig3
MINILIBRARY{
best template: djs736a2_fp04q274 from lib djs736a2
size: 3607 errors fixed: 0.01 errors fixed per dollar: 0.00
connecting right end of Contig2 to left end of Contig3 with estimated gap size -460
 
alternative template: djs736a1_fp02q472 from lib djs736a1
size: 1184 errors fixed: 0.01 errors fixed per dollar: 0.00
}
 
You will also see a more terse (but more easily parseable) description
in the .minilibraries file.


The parameter:

consed.autoFinishPrintMinilibrariesSummaryFile: true

will cause Autofinish to print a file with name similar to:
(project name).001014.155627.minilibraries


Or you could be more sparing in which gaps you close with
minilibraries and which you do not:

consed.autoFinishAlwaysCloseGapsUsingMinilibraries: false

If the parameter above is set to false, then Autofinish will only
choose minilibraries if the gap is at least as large as the size below:

consed.autoFinishSuggestMinilibraryIfGapThisManyBasesOrLarger: 800

If you try this in this example, you will see that Autofinish will not 
suggest a minilibrary because the gap has negative size (meaning the
contigs overlap) and thus is smaller than 800 bases.
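
The rule just described amounts to a simple comparison.  Here is a
sketch of it (an illustration of the rule as stated in this section,
not Autofinish's actual code; the function and argument names are made
up):

```python
def should_suggest_minilibrary(gap_size, always_close=False, min_gap=800):
    """Decide whether a gap qualifies for a minilibrary.

    always_close stands for
        consed.autoFinishAlwaysCloseGapsUsingMinilibraries
    min_gap stands for
        consed.autoFinishSuggestMinilibraryIfGapThisManyBasesOrLarger
    """
    if always_close:
        return True
    return gap_size >= min_gap

if __name__ == "__main__":
    # The gap in this exercise has estimated size -460 (contigs overlap).
    print(should_suggest_minilibrary(-460))
    print(should_suggest_minilibrary(-460, always_close=True))
```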

Autofinish can suggest more than one minilibrary per gap:

consed.autoFinishSuggestThisManyMinilibrariesPerGap: 2

is the default, but you can increase it.  If you try this, you will
see more alternate templates suggested for the minilibrary.

When you are done, delete the consedrc file.


32.17)  CLOSING GAPS USING PCR


If you are interested in just closing remaining gaps with PCR, you can 
set the following Autofinish parameters:

consed.autoFinishAllowWholeCloneReads: false
consed.autoFinishAllowCustomPrimerSubcloneReads: false
consed.autoFinishAllowResequencingReads: false
consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
consed.autoFinishAllowMinilibraries: false
consed.autoFinishAllowPCR: true
consed.autoFinishAllowResequencingAUniversalPrimerAutofinishRead: false
consed.autoFinishCallReversesToFlankGaps: false
consed.autoFinishCoverLowConsensusQualityRegions: false
consed.autoFinishCoverSingleSubcloneRegions: false
consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs: false
consed.autoFinishDoNotDoPCRIfThisManyAvailableGapSpanningTemplates: 1000000

This will cause Autofinish to try to close all gaps using PCR.

Get your own copy of the dataset "assembly_view" (see above under
GETTING YOUR OWN COPY OF A SAMPLE DATASET).

32.18)  Then type:

cd assembly_view/edit_dir

(You should get no error from this.  If you do, type "pwd" to find out
where you are and cd to the correct directory accordingly.)

Create a consedrc file with the parameters above in it.  Then run
Autofinish:

consed -ace assembly_view.fasta.screen.ace.1 -autofinish

When it has completed, look in the .out file.  You will see the
following:


Please make the following PCR primers:
   assembly_view.1: tgaggcaggagaatcagg 11872 to 11889 (top strand) for right end of Contig2 melt:  57.3
   assembly_view.2: tgtaaagaggcatctcagtttc 101 to 122 (bottom strand) for left end of Contig3 melt:  56.3
est. product size: 203
}PCR FOR ORIENTED CONTIGS


PCR FOR UNORIENTED CONTIGS{
779 acceptable primers on left end of contig Contig1
868 acceptable primers on right end of contig Contig1
977 acceptable primers on left end of contig Contig2
739 acceptable primers on right end of contig Contig3
contig-ends left end of Contig2 and right end of Contig3 are in the same scaffold so not considering pcr for them
Please make the following PCR primers:
   assembly_view.3: ttctgggtctggaggaca 485 to 502 (bottom strand) for left end of Contig1 melt:  57.5
   assembly_view.4: tcagtaattgggactataggtacat 13198 to 13222 (top strand) for right end of Contig1 melt:  55.3
   assembly_view.5: tttgttttgttttgtattttgttt 504 to 527 (bottom strand) for left end of Contig2 melt:  55.1
   assembly_view.6: accaaataacaggtaaaccaaa 15503 to 15524 (top strand) for right end of Contig3 melt:  55.4
Do PCR reactions with the following pairs of primers:
   assembly_view.3: ttctgggtctggaggaca   assembly_view.5: tttgttttgttttgtattttgttt
   assembly_view.3: ttctgggtctggaggaca   assembly_view.6: accaaataacaggtaaaccaaa
   assembly_view.4: tcagtaattgggactataggtacat   assembly_view.5: tttgttttgttttgtattttgttt
   assembly_view.4: tcagtaattgggactataggtacat   assembly_view.6: accaaataacaggtaaaccaaa
}PCR FOR UNORIENTED CONTIGS


PCR FOR ORIENTED CONTIGS means that autofinish knows which end of
which contig is connected to which end of some other contig (just as
Assembly View knows) and is suggesting a single PCR reaction to
make a product that spans the gap.

PCR FOR UNORIENTED CONTIGS gives a list of PCR reactions to do for the
cases in which consed doesn't know which end is connected to which.
You will see a list of PCR primers to synthesize.  Then you will see
a list of which pairs of these primers to do PCR with.
(For example, it doesn't make sense to do PCR with the primer off the left end of
Contig1 and the primer off the right end of Contig1.)

Some of these pairs of PCR primers will give products and others will
not (or will give enormous products).  The ones giving products will
tell you how the contigs are ordered and oriented.  You can then
sequence the product to find the gap sequence between the contigs.
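
The list of primer pairs for unoriented contigs can be understood as
"every pair of contig ends that are not already in the same scaffold".
The sketch below reproduces the four pairs in the example output
above.  The pairing rule is my reading of that output, not a
description of Autofinish's code, and the scaffold labels are made up.

```python
from itertools import combinations

def pcr_primer_pairs(contig_ends, scaffold_of):
    """Return pairs of contig ends that lie in different scaffolds."""
    pairs = []
    for end_a, end_b in combinations(contig_ends, 2):
        contig_a, contig_b = end_a[0], end_b[0]
        # Ends in the same scaffold (including the two ends of one
        # contig) are already ordered and oriented, so skip them.
        if scaffold_of[contig_a] != scaffold_of[contig_b]:
            pairs.append((end_a, end_b))
    return pairs

# Contig ends that received primers in the example output.
ENDS = [
    ("Contig1", "left"),
    ("Contig1", "right"),
    ("Contig2", "left"),
    ("Contig3", "right"),
]
# Hypothetical scaffold labels: Contig2 and Contig3 are already linked.
SCAFFOLD_OF = {"Contig1": "A", "Contig2": "B", "Contig3": "B"}

if __name__ == "__main__":
    for (contig_a, side_a), (contig_b, side_b) in pcr_primer_pairs(ENDS, SCAFFOLD_OF):
        print(f"{side_a} end of {contig_a}  x  {side_b} end of {contig_b}")
```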


32.19)  AUTOFINISH:  TOO MANY UNIVERSAL PRIMER READS

St Louis wanted more universal primer reads, so I put in a feature
that allows for redundant universal primer reads.  If you get too
many for your taste, then put this into your consedrc file:

consed.autoFinishRedundancy: 1.0

The default is 2.0, meaning that Autofinish will try to fix every
problem area twice--once by some universal primer reads and once again 
by other universal primer reads.  Then, and only then, will it try
oligo walks to finish remaining problems.

Baylor wanted more reverses to close gaps, so I put a feature into
Autofinish that calls *all* reverses near gaps:

    (contig)    ___________________________ 

                                           <- reverse 1
                                            <- reverse 2
                                             <- reverse 3
                                             <- reverse 4


(including reverses that are likely to fall into the gap) in the hope
that enough of them will hook onto each other that the gap will be
closed.  (If there is already a reverse pointing out but no forward,
Autofinish will suggest the forward.)  If this feature gives you too
many reverses for your taste, then put this into your consedrc file:

consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs: false


32.20)  AUTOFINISH FOR CDNA ASSEMBLIES


The way to use Autofinish for cDNA assemblies is to pretend that the
cDNA is a BAC and that you are only going to allow whole clone custom
primer BAC reads.  To do this, put the following into your consedrc
file:

consed.autoFinishAllowResequencingReads: false
consed.autoFinishAllowWholeCloneReads: true
consed.autoFinishAllowCustomPrimerSubcloneReads: false
consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
consed.autoFinishAllowPCR: false
consed.autoFinishCDNANotGenomic: true
consed.autoFinishCheckThatReadsFromTheSameTemplateAreConsistent: false
consed.checkIfTooManyWalks: true
consed.autoFinishExcludeContigIfOnlyThisManyReadsOrLess: 0
consed.autoFinishExcludeContigIfDepthOfCoverageOutOfLine: false
consed.autoFinishExcludeContigIfThisManyBasesOrLess: 0
consed.autoFinishCoverSingleSubcloneRegions: false
consed.autoFinishContinueEvenThoughReadInfoDoesNotMakeSense: true
consed.autoFinishCallReversesToFlankGaps: false

You don't want Autofinish to try to extend off the 3' end or the 5'
end of the cDNA, right?  How is Autofinish going to determine that?
It determines it as follows:

In the 5' end read, put the following into the phd file:

WR{
primer determineReadTypes 001019:112654
type: univ fwd
}

WR{
template determineReadTypes 001019:112654
name: cDNA1
}


In the 3' end read (the read that is primed off the polyA tail), put
the following into the phd file:

WR{
primer determineReadTypes 001019:112654
type: univ rev
}

WR{
template determineReadTypes 001019:112654
name: cDNA1
}

For all other reads, such as transposon reads and custom primer walks,
put the following into the phd file:

WR{
primer dscript 001019:112654
type: walk
}

WR{
template determineReadTypes 001019:112654
name: cDNA2
type: bac
}

If you are going to finish many cDNA's, you will find it will work
better to modify determineReadTypes.perl than to go editing every phd
file.
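
As an illustration of producing these items from a script, here is a
sketch that builds the text of the WR items for an end read.  It is
not determineReadTypes.perl; the function names are made up, and I am
assuming the timestamp has the form YYMMDD:HHMMSS.  It only builds the
text; putting it in the right place in the phd file is not shown.

```python
from datetime import datetime

def wr_item(kind, program, fields, when):
    """Build one WR{ ... } item as text."""
    stamp = when.strftime("%y%m%d:%H%M%S")  # assumed YYMMDD:HHMMSS
    lines = ["WR{", f"{kind} {program} {stamp}"]
    lines += [f"{key}: {value}" for key, value in fields]
    lines.append("}")
    return "\n".join(lines)

def cdna_end_read_items(template_name, direction, when):
    """WR items for a cDNA end read; direction is 'fwd' (5' end read)
    or 'rev' (3' end read)."""
    primer = wr_item("primer", "determineReadTypes",
                     [("type", f"univ {direction}")], when)
    template = wr_item("template", "determineReadTypes",
                       [("name", template_name)], when)
    return primer + "\n\n" + template + "\n"

if __name__ == "__main__":
    when = datetime(2000, 10, 19, 11, 26, 54)
    print(cdna_end_read_items("cDNA1", "fwd", when), end="")
```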

So Autofinish finds the univ fwd read and assumes it indicates the 5'
end of the cDNA and it finds the univ rev read and assumes it
indicates the 3' end of the cDNA.  (The parameter
consed.autoFinishCDNANotGenomic: true 
tells it to try to find the end of the cDNA in this manner.)


There is one additional problem when using Autofinish for cDNA
assemblies:  initially, the ace file created by phrap is empty since
the 3' and 5' reads don't overlap enough.  You have *no* contigs for
Autofinish to finish.  So phrap is of no use initially.

But you can use Consed to create the
assembly:

First run phredPhrap to phred both reads and run
determineReadTypes.perl.  Then pick the 3' read and run phd2Ace.perl on
it:

phd2Ace.perl (name of phd file)

This will give you an ace file with one read in it.

Now suppose that you have other reads from the same cDNA that phrap
has neglected to put into the ace file.  To add them to the ace file,
do the following:

1. create a file of read names.  E.g.,

djs74_1180.s1
djs74_1432.s1
djs74_1455.s1
djs74_1465.s1
djs74_1532.s1
djs74_1802.s1
djs74_1803.s1

Typically, you will get this list of reads by looking in the
singlets file.  

2. Then run consed:

consed -ace old_ace.ace -addNewReads fileOfReadNames.txt -newAceFilename new_ace.ace

where: 
fileOfReadNames.txt is the name of the file (above) containing
    the read names
new_ace.ace is whatever you want the new ace file to be named
old_ace.ace is the name of the old ace file

Now you have an ace file that contains all the reads you have
sequenced for that cDNA.  You can now run Autofinish on it:

consed -ace new_ace.ace -autofinish
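
The file of read names in step 1 can be made by hand, but if there are
many singlets a short script helps.  The sketch below assumes the
singlets file is in fasta format with the read name as the first word
after ">"; check your own file before relying on this.  The function
name and the example lines are made up.

```python
def read_names_from_fasta(lines):
    """Return the first word of each fasta header line."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names

if __name__ == "__main__":
    # Made-up fasta lines standing in for a singlets file.
    example = [
        ">djs74_1180.s1 some description\n",
        "ACGTACGTACGT\n",
        ">djs74_1432.s1\n",
        "GGCCTTAAGGCC\n",
    ]
    for name in read_names_from_fasta(example):
        print(name)
```

To use it on a real file, pass an open file object instead of the
example list and redirect the output to fileOfReadNames.txt.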


32.21)  AUTOFINISH FOR LISTING GAP-SPANNING TEMPLATES

Sometimes people ask me how to make Autofinish suggest all
templates that span a gap.  People who ask this question are not using
Autofinish to automate finishing--they are using it as a tool in the
hand of a human finisher.  Although evidence has shown that Autofinish
is far more powerful in an automated mode, it is also a powerful tool
in the hands of a human finisher.  I will specify how to do this, but
hope you will move to the next level of using it in an automated
manner.

One method is to just shut off Autofinish suggesting any experiments
at all:

consed.autoFinishCallReversesToFlankGaps: false
consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
consed.autoFinishAllowResequencingReads: false
consed.autoFinishCoverSingleSubcloneRegions: false
consed.autoFinishCoverLowConsensusQualityRegions: false
consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs: false
consed.autoFinishAllowCustomPrimerSubcloneReads: false
consed.autoFinishAllowPCR: false
consed.autoFinishAllowMinilibraries: false

Thus Autofinish will order and orient the contigs, printing out the
forward/reverse pairs that connect the contigs, as exemplified below:

Examining existing fwd/rev pairs that flank gap at Right end of Contig46

read                       (start  end)   mate read            mate contig (start   end) contig
                           (pos    pos)   name                 name        (pos     pos) length
agroa3_fp19q452.x1u3 ->     -13     824 agroa3_fp19q452.y1   Contig47   ->     365    1282     1844
mag3gpk041f9.y1      ->      29     689 mag3gpk041f9.x1      Contig53   ->  139675  140402   140921
agroa3_fp06q173.x1u3_m ->      91    1053 agroa3_fp06q173.y1   Contig53   ->  139683  140618   140921
mag2gpk013f21.y1     ->     314     887 mag2gpk013f21.x1     Contig57   ->  257472  258018   261664
mag3gpk064f6.x1      ->     542    1173 mag3gpk064f6.y1      Contig53   ->  139061  139709   140921
mag3gpk105a18.y1     ->     710    1318 mag3gpk105a18.x1     Contig53   ->  138774  139329   140921
Enough existing fwd/rev pairs to establish:
Right end of Contig53 has 4 fwd/rev pairs connecting it to
Right end of Contig46 with gap size -17 (contigs overlap)

This shows you that (according to the naming convention of this lab),
the following templates span the gap between the right end of Contig53
and the right end of Contig46 (clearly one of these contigs is
complemented with respect to the other):  

agroa3_fp19q452
mag3gpk041f9
agroa3_fp06q173
mag2gpk013f21
mag3gpk064f6
mag3gpk105a18
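
Under this lab's naming convention, the template name is the part of
the read name before the first "." so the list above can be produced
mechanically from the read names in the .out file.  A sketch (the
function name is made up, and other labs' naming conventions will need
a different rule):

```python
def template_name(read_name):
    """Template name = read name up to the first '.'
    (this particular lab's convention)."""
    return read_name.split(".", 1)[0]

READS = [
    "agroa3_fp19q452.x1u3",
    "mag3gpk041f9.y1",
    "agroa3_fp06q173.x1u3_m",
    "mag2gpk013f21.y1",
    "mag3gpk064f6.x1",
    "mag3gpk105a18.y1",
]

def unique_templates(read_names):
    """Template names in first-seen order, without duplicates."""
    return list(dict.fromkeys(template_name(r) for r in read_names))

if __name__ == "__main__":
    for name in unique_templates(READS):
        print(name)
```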

However, what are you going to do now with these templates?  Walk on
them?  Resequence the universal primer reads?  Whichever you plan to
do, why not allow Autofinish to make the suggestions and spend your
time on the harder problems?


32.22)  FINISHING A SPECIFIC CONTIG

../../consed -ace autofinish.fasta.screen.ace.1 -autofinish -contig Contig1

This will just finish Contig1.


32.23)  MARKING THE END OF THE CLONE

Autofinish tries its best to recognize the end of the clone (BAC), and
it does pretty well, but you might have information it doesn't have,
such as knowing the sequence of the BAC vector or having reads that
were primed from within the BAC vector.  You can tell Autofinish this
information by adding cloneEnd tags.  You can do this in Consed as
follows:

In the Aligned Reads Window put the cursor on the consensus at the
base position marking the end of the insert.  Point at the "Misc"
menu, hold down the left mouse button and release on "Add Clone End
Tag With Insert To Right" (alternatively, "Add Clone End Tag With
Insert To Left").  Then save the assembly and run Autofinish.  

If you want such tags to be added automatically, you could write a
perl script to append such tags to the ace file.

Some sites have found that this is not enough--they want to change all
bases beyond the clone end tag to X's.  You can do this either
interactively or automatically.  To do it interactively, in the
Aligned Reads Window, put the cursor on the vector base at the
vector/insert junction.  Hold down the left mouse button on the 'Misc'
menu and release the button on either 'Change to X's to Left in All
Reads' or 'Change to X's to Right in All Reads'.  

However, if you are using Autofinish, you will probably also want to have
this process automated.  To do this, set the following parameter in
consedrc:

consed.autoEditConvertCloneEndBasesToXs: true

(It is set this way by default, so you normally won't need to do this
unless you have unset it.)

Then run AutoEdit as follows:

consed -ace (name of ace file with the clone end tags) -autoEdit 

This will create another ace file with a version number one higher
than the one you just ran.  If you want to specify a particular new ace
file name, you can do it this way:

consed -ace (old ace file) -autoEdit -newAceFileName (new ace file)
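
If you drive AutoEdit from a script and do not give a new ace file
name, the script will need to know what name to expect.  The sketch
below computes "version number one higher" for names that end in
.ace.(number); the function name is made up, and it makes no
assumption about how Consed treats names of any other form.

```python
import re

def next_ace_filename(ace_filename):
    """Return the name with its trailing version number incremented."""
    match = re.fullmatch(r"(.*\.ace)\.(\d+)", ace_filename)
    if match is None:
        raise ValueError(f"no trailing version number in {ace_filename!r}")
    base, version = match.groups()
    return f"{base}.{int(version) + 1}"

if __name__ == "__main__":
    print(next_ace_filename("autofinish.fasta.screen.ace.1"))
```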

After following this procedure, the consensus may start with X's and
end with X's like this:

XXXXXXXBBBBBBBBBBBBBBBBBBBBBBBXXXXXXXX
^
position
1

If you would rather that the consensus not contain this masked vector,
but rather start with base position 1 being the first base of the
insert like this below, you must reassemble with phrap.

       BBBBBBBBBBBBBBBBBBBBBBB
       ^
       position
       1


------------------------------------------------------------------------

33.  CONTRIBUTED SOFTWARE 

[contributions]

sff2phd.perl 
universal reads importer from sff files, for adding reads
to phredPhrap projects. It does automatic 454 PairEnd splitting,
clipping/filtering and quality adjustment. It outputs phd.ball file,
individual phd files, can also write phrap ready fasta files and even
fastq files. Can do MID tag splitting. Supports reads import using
inclusion/exclusion lists. Adds RT tags to phd.ball/phd files for
correct pair-end handling.  Can work with custom read pair linker
sequence.  (Markiyan Samborskyy, ms587@mole.bio.cam.ac.uk)

cons.perl
consed autoloader on the latest ace file/ newbler ace bug fixer.
Can take phredPhrap/newbler project path as an argument, and will try to
load the latest ace file. If run from within base dir or edit dir - then
no path is required. Can also fix newbler contig ends if run with -f
parameter.  (Markiyan Samborskyy, ms587@mole.bio.cam.ac.uk)


ace2fof
Application that reads in information from an ace file specified by the user and
writes a list of reads (potentially with their associated contig and left and   
right unpadded assembly positions) to a file specified by the user, based on    
contig and unpadded positional information specified by the user.               


acestatus.pl

# author: Cliff Wollam
# PURPOSE: Gets reads assembled, untrimmed length and trimmed length
#          information for all contigs in ace file(s) passed.
# 991122 - cwollam - Changed so that it gets the more logical padded
#          and unpadded lengths of the contigs
# acestatus.pl - simple script that shows the major contigs and # of
# reads in each contig in xterm of a specific acefile. So if you
# are looking for a specific assembly/acefile you don't have to open
# the acefile of each assembly to find it.
# USAGE FROM THE COMMAND LINE:
# acestatus.pl ace_filename

mergeAces.perl

# Bugs and complaints to Bill Gilliland, billg@ucdavis.edu.
# mergeAces.perl v. 0.2 5/22/01
# Part of the MACE script package to get consed to display multiple
# ace files.

# Take two (or more) projects and merge them together, creating a new
# project directory (if it doesn't already exist) with all the cgrams
# and phd files of the original projects.

aceContigs2Phd.perl

#
# PURPOSE:  writes out each of the contigs in an ace file to a phd file
#
# INPUT:  ace file
#
# OUTPUT:  a dummy phd file for each of the contigs in the ace file
#
# Revisions: Original by Don Bovee, UWGC Feb 2003


ace2Oligos.perl

# Purpose:  Scans the ace file looking for new oligos so they can
#   be ordered.
#
# How it works:  Keeps a file of already-ordered oligos
#   First reads this file so it knows what oligos are already in it.
#   Then reads the ace file.  If there are any oligos in the ace file
#   that aren't already in the oligo file, it adds the new oligos to
#   the oligo file. It also adds the comments (if any) if Comment and/or Print
#       is given as the third, optional argument.
#
# Rev: 980427 (David Gordon)
# Rev: 000330 to handle comments in oligo tags
# Rev: 020528 to print comments in oligo tags upon request (Peter Kos)

export_cons

#this gets the consensus for any contig of any version of any clone
#that exists in consed.  to use, type export_cons from any directory
#and it will ask you for the clone name, the version, and the contig.
#the default version is the .ace, so hit return if thats the version
#you want.  command line version is 
#export_cons <CLONE_NAME> <VERSION> <CONTIG>
#just type export_cons <CLONE_NAME> <CONTIG> for the .ace version.
#output is <CLONE_NAME>.vace.<VERSION>.Contig<CONTIG>
#and is a fasta format sequence file.
# 990805 - Cliff Wollam changed so that it uses a complete ace filename for
#          everything.  If user passes nothing on the cmd line then they
#          will enter a whole ace file name.  If user passes 2 cmd line
#          args then ace is assumed to be $1.fasta.screen.ace.  If user
#          passes three cmd line args then $2 can be either an entire ace
#          filename to use or it can be just a version number in which case
#          ace filename will be taken to be $1.fasta.screen.ace.$2.
#
#
#export_cons - allows you to export a contig from any acefile
#without having consed open. (use acestatus.pl here to find contig # of
#interest). Fasta file that it creates is in the edit_dir of the clone
#that you specify.
#USAGE FROM THE COMMAND LINE:
#export_cons <CLONE_NAME> <VERSION> <CONTIG>
#If <VERSION> is omitted but <CLONE_NAME> and <CONTIG> are present
#then VERSION is the .ace for CLONE_NAME.
#If nothing is present on the command line you are prompted to
#enter values for these three things.


sff2phd_Samborskyy

   In comparison with the previous one, it can work with MID adapters:
   select only reads with specified adapters,  and clip MID & Y-adapters
   before adding them to the phredPhrap project.
   The splitter algorithm is implemented within the script itself, using
   string match.
   Also you can use specific MID's for different sff files in the
   project.
   MID are identified using their sequence or MID_ID, like:
   -MID="ATGATAGCTTC,MID1,MID14,RL1,RL12"
   The MID's to use can be specified via the command line -MID or by
   creating the file with default name "mids" (-MID_file=) in the current
   folder.
   The format is the following:

   [MID_ID(s)]  -  will apply to ALL new files being imported

   [sff_input_filename][TAB][MID_ID(s)] will apply ONLY to the specified
   file
   Both types of lines can be mixed inside a single file.

   Also it does:

   Support reads inclusion/exclusion lists. The inclusion filter is applied
   first, then exclusion:
   454_reads_include (-reads_include=)
   454_reads_exclude (-reads_exclude=)

   Limit to the maximum number of reads imported (after filtering):
   -max_reads=[#of reads]
   It is a very useful option for the first pass of the cDNA assembly by
   phrap, to get rrn's sequences (try 1000-10000 reads) and look at the
   largest #reads contigs and use them for filtering as "vector.seq" for
   the next pass, importing all the reads - to get other, less abundant
   transcripts.

   -xn - exclude reads with n's within clipping area (usually they had been
   under a gas bubble during the 454 run, and the reads get really
   dephased/error prone after that).

   Also replaced seek request by sysread into $tmp (to allow
   serial/streaming input) - now it can import gzipped/bzipped sff files
   directly (not yet supported by the consed/importSFFfiles).

   -- 
   Best regards,
   Dr. Markiyan Samborskyy                           mailto:ms587@mole.bio.cam.ac.uk
   DNA Sequencing Facility,
   University of Cambridge.


phredPhrapWithPhdBalls

   modified from phredPhrap by Ben Allen at LANL
   (Allen, Benjamin S <bsa@lanl.gov>) so it can use reads in phdballs.


revert_fof

# PURPOSE:  If the user really screws up a read, to back out all changes.
#           You must reassemble after using this.
#           The difference
#           is that the edits are still in lower versions of phd files
#           in case you ever want them back, or in case you want to
#           to run consed on an older ace file that refers to those
#           older phd files that have edits in them.
#           Run from within edit_dir
#revert_fof - takes the phd.1 of a read and moves it to the last phd.#,
# so that edits are removed from that read the next time you phrap. Same 
# as "revert" except it uses fof (list) of reads, and reverts multiple reads.
# USAGE FROM THE COMMAND LINE:
# From the clone/edit_dir
# revert_fof fof_name
# where fof_name is the name of the list of reads that you want
# reverted.  fof_name should not contain any path information or
# phd extensions.
# revert_fof use file of files to give reads to revert 3/14/01 SL

recover_consensus_tags

# Purpose:  Transfers all consensus tags from a set of old assemblies to
#     a new assembly.  
# How to Use It:   
#     Tags moved from the old aces in the edit_dir to the new one.  There will
#     be a recovered ace file created, called "ace_file.recovered".  Thus the
#     original input ace file is still kept.
# INPUT:  command line:  <input ace file>
# rct - recovers tags from older acefiles if lost in new version from
# phrapping. Transfers all consensus tags from a set of old assemblies to
# a new assembly.
# USAGE FROM THE COMMAND LINE:
#  rct ace_filename <only ace file to be transfer (optional)>
# You must be in the edit_dir where the ace file is located


------------------------------------------------------------------------

34.  CONSED CUSTOMIZABLE CONSEDRC RESOURCES 

! In the following, I have annotated the parameters with the following
! symbols:
!
! (YES)  freely customize to your own site
! (OK)  don't change unless you have a specific need and know what you
!         are doing
! (NO)  don't change this!

!
!
! parameters in the (YES) category:
!


consed.printPS: true
bool
! print memory                
! (YES)


consed.defaultTagType: polymorphism
RWCString
! when swiping the consensus in the Aligned Reads Window to create a
! tag, what is the default tag type to be added?
! (YES)

consed.defaultTagOnConsensusNotReads: true
bool
! when swiping the consensus in the Aligned Reads Window to create a
! tag, by default will the consensus be tagged or the reads be tagged?
! (YES)

consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.02
double
! if an experiment solves fewer errors than this, it isn't worth doing
! so won't be chosen.  This parameter controls when Autofinish stops
! choosing experiments.
! (YES)

consed.autoFinishRedundancy: 2.0
double
! This number should be between 1.0 and 2.0.  If you want more reads
! for each area, increase the number towards 2.0  If you want fewer
! reads per area, decrease it towards 1.0.  This only affects
! universal primer reads--not custom primer reads.
!
! (YES)

consed.autoFinishAverageInsertSize: 1500
int
! If a template has a forward but no reverse, when deciding whether to
! allow this template for a particular primer or reverse, we need to
! make an assumption about where the end of the template is.  If we
! do not have enough forward/reverse pairs to determine the mean, then
! this parameter is used.
! (YES)

consed.primersMaxInsertSizeOfASubclone: 3000
int
// for checking for false-annealing
! check +/- this distance from the primer for false-annealing
! and check at most this distance for templates for a primer.
! Thus if you have more than one library, make this the max of
! all libraries.
! (YES)

consed.primersMaxMeltingTemp: 60
int
! (YES)

consed.primersMaxMeltingTempForPCR: 58
int
! Note:  the difference between consed.primersMaxMeltingTempForPCR and
! consed.primersMinMeltingTempForPCR must be less than or equal to
! consed.primersMaxMeltingTempDifferenceForPCR
! Otherwise, autofinish may take forever to pick pcr primers.
! (YES)

consed.primersPickTemplatesForPrimers: true
bool
! when picking primers for subclone templates, pick templates also.
! If there is no suitable template for a primer, do not pick the
! primer.  If you like to pick your own templates, you might want to
! turn this off for a little improvement in speed.
! This has no effect on Autofinish--just on interactive primer picking
! in Consed.
! (YES)

consed.primersSubcloneFullPathnameOfFileOfSequencesForScreening: $CONSED_HOME/lib/screenLibs/primerSubcloneScreen.seq
RWCString
! vector sequence file if choosing subclone (e.g., M13, plasmid)
! templates
! (YES)

consed.primersCloneFullPathnameOfFileOfSequencesForScreening: $CONSED_HOME/lib/screenLibs/primerCloneScreen.seq
RWCString
! vector sequence file if choosing clone (e.g., cosmid, BAC) template
! (YES)

consed.primersMinMeltingTemp: 55
int
! (YES)

consed.primersMinMeltingTempForPCR: 55
int
! (YES)

consed.searchFunctionsUseUnalignedEndsOfReads: false
bool
! when navigating by    
! searchForSingleSubcloneRegions and searchForSingleStrandedRegions,
! and the read below has both aligned and unaligned portions, which
! bases of the read are considered to cover the region:
!   uuuuuuuAAAAAAAAAAAAAAAAAAAAAAAAAuuuuuuuu
!   <--------- if "true" ------------------>
!          <-----if "false"-------->
! where u means an unaligned base and A means an aligned base
! (YES)

consed.searchFunctionsUseLowQualityEndsOfReads: true
bool
! when navigating by    
! searchForSingleSubcloneRegions and searchForSingleStrandedRegions,
! and the read below has both low quality and high quality portions,
! which portions of the read are considered to cover the region:
!   lllllllAAAAAAAAAAAAAAAAAAAAAAAAAllllllll
!   <--------- if "true" ------------------>
!          <-----if "false"-------->
! where l means a low quality base and A means a high quality base
! (YES)

consed.inexactSearchForStringMaxPerCentMismatch: 5
int
! when using the inexact search for string, allow up to this
! % mismatch:  the sum of the insertion, deletion, and substitution
! differences divided by the length of the query string
! (YES)

consed.onlyAllowOneReadWriteConsedAtATime: false
bool
! if there is another read-write consed (or Autofinish) process running in the
! same directory, and this consed (or Autofinish) is not read-only,
! then terminate with an error message
! (YES)

consed.autoFinishAllowHighQualityDiscrepanciesInTemplateIfConsistentForwardReversePair: true
bool
! Otherwise, a single serious high quality discrepancy (hqd) will cause
! the template to be rejected.
! (YES)


consed.printWindowCommand: /usr/bin/X11/xwd | /usr/bin/X11/xpr | /bin/lp -dlevulose
RWCString
! system command to print out a Consed Window
! (YES)

consed.fileOfTagTypes:
FileName
! pathname of a file with the following format:
! (tag name) (color for displaying) (consensus or read or both) (yes/no)
! where "consensus" or "read" or "both" indicates whether the tag
! is available for the user to add to the consensus, to reads, or to
! both, and "yes" or "no" indicates whether the tag can be created
! in Consed by swiping, or whether it only can be created by an
! external program and displayed by Consed.
! (YES)
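
A minimal sketch of reading one line of such a file, assuming the four
fields are separated by whitespace.  The tag name and color in the
usage note are made up for illustration, and the function and class
names are hypothetical (they are not part of Consed).

```python
from typing import NamedTuple


class TagType(NamedTuple):
    name: str
    color: str
    applies_to: str        # "consensus", "read", or "both"
    user_can_create: bool  # True if the tag can be created by swiping


def parse_tag_type_line(line):
    """Parse: (tag name) (color) (consensus or read or both) (yes/no)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    name, color, applies_to, yes_no = fields
    if applies_to not in ("consensus", "read", "both"):
        raise ValueError("bad third field: " + applies_to)
    if yes_no not in ("yes", "no"):
        raise ValueError("bad fourth field: " + yes_no)
    return TagType(name, color, applies_to, yes_no == "yes")
```

For example, the hypothetical line "myRepeat magenta both yes" would
describe a tag that can be added to either the consensus or reads and
that the user can create by swiping.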

consed.assemblyViewShowConsistentFwdRevPairs: false
bool
! too many squares!  See assemblyViewShowConsistentFwdRevPairDepth
! (YES)

consed.assemblyViewShowConsistentFwdRevPairDepth: false
bool
! This actually shows more information than  
! assemblyViewShowConsistentFwdRevPairs    and is much easier to read
! (YES)

consed.assemblyViewShowConsistentFwdRevPairsBetweenDifferentScaffolds: true
bool
! Shows lone links from the end of one contig to the end of another
! that are not confirmed by a second link, and so were not enough to
! join the contigs into a scaffold.
! (YES)

consed.assemblyViewShowLegsOnSquaresForConsistentFwdRevPairs: false
bool
! This is even more cluttered than assemblyViewShowConsistentFwdRevPairs
! (YES)

consed.assemblyViewShowGapSpanningFwdRevPairs: true
bool
! This shows gap-spanning fwd/rev pairs that caused the contigs to
! be joined into a scaffold.
! (YES)

consed.assemblyViewShowWhichInconsistentFwdRevPairs: filtered
RWCString
! choices are: filtered, none, all
! "filtered" means that an inconsistent fwd/rev pair is only shown
! if it is confirmed by another inconsistent fwd/rev pair
! If "all", the display is full of red lines.  If "filtered", only red
! lines that are confirmed by other red lines are shown.
! (YES)

consed.assemblyViewHowManyPairsToConfirmAnInconsistentFwdRevPair: 100
int
! (OK)

consed.assemblyViewExpectedRandomConfirmedInconsistentFwdRevPairs: 1.0
double
! This is used to calculate
! consed.assemblyViewHowManyPairsToConfirmAnInconsistentFwdRevPair.  It
! is the predicted mean number of confirmed clusters (of size
! consed.assemblyViewHowManyPairsToConfirmAnInconsistentFwdRevPair ) of
! mate pairs assuming the reads are randomly distributed.  It uses the
! size of all of the contigs and the number of inconsistent mate pairs
! (OK)

consed.assemblyViewFilterInconsistentFwdRevPairsIfThisClose: 500
int
! If 2 mate pairs each have one read this close together and each have
! their other read this close together, they are considered to confirm
! each other.
! (NO)
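
The confirmation rule above can be sketched as follows.  This is an
illustration only: it assumes each mate pair is given as two positions
in one shared coordinate system, that either read of one pair may be
matched with either read of the other, and that "this close" includes
the limit itself.  The function name is hypothetical.

```python
def pairs_confirm_each_other(pair1, pair2, max_distance=500):
    """Each pair is (position of one read, position of the other read)."""
    (a1, b1), (a2, b2) = pair1, pair2
    # Reads listed in the same order in both pairs.
    same_order = (abs(a1 - a2) <= max_distance and
                  abs(b1 - b2) <= max_distance)
    # Reads listed in the opposite order in the second pair.
    swapped = (abs(a1 - b2) <= max_distance and
               abs(b1 - a2) <= max_distance)
    return same_order or swapped
```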

consed.assemblyViewCalculateHowManyPairsToConfirmAnInconsistentFwdRevPair: true
bool
! if this is true, then
! assemblyViewHowManyPairsToConfirmAnInconsistentFwdRevPair is ignored
! and a value is calculated using
! consed.assemblyViewExpectedRandomConfirmedInconsistentFwdRevPairs
! (OK)

consed.assemblyViewShowReadDepth: true
bool
! If true, read depth is shown in assemblyView
! (YES)

consed.assemblyViewShowMultipleHighQualityDiscrepancies: false
bool
! If true, multiple high quality discrepancies (both indel and
! non-indel type) are shown in assemblyView
! (YES)

consed.assemblyViewShowRestrictionDigestCutSites: true
bool
! If true, and you open a Digest Window in Consed and you open
! the Assembly View window in Consed, the restriction digest cut
! sites will be shown in Assembly View (in addition to showing them in 
! the Digest Window)
! (YES)

consed.assemblyViewFilterSequenceMatchesBySize: false
bool
! only show sequence matches if they fall between
! consed.assemblyViewSequenceMatchesMinSize and
! consed.assemblyViewSequenceMatchesMaxSize
! (YES)

consed.assemblyViewSequenceMatchesMinSize: 100
int
! if consed.assemblyViewFilterSequenceMatchesBySize is true,
! then only show sequence matches that are larger than this
! (YES)

consed.assemblyViewSequenceMatchesMaxSize: 10000
int
! if consed.assemblyViewFilterSequenceMatchesBySize is true,
! then only show sequence matches that are smaller than this
! (YES)

consed.assemblyViewAutomaticallyStartWithConsed: false
bool
! when consed starts, start assembly view.  This only works if you
! specify the ace file on the command line.
! (YES)

consed.assemblyViewDisplayTheseTagTypesOnTheseLines: edit 0 matchElsewhereHighQual 1 matchElsewhereLowQual 2
RWCString
! space-separated list of form:
! (tagtype) (line number) (tagtype) (line number)
! where line number is where in Assembly View the tag will be displayed
! (YES)
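
A minimal sketch of how a value in this form can be split into
(tagtype, line number) pairs.  The function name is hypothetical and
this is not Consed's own parser.

```python
def parse_tag_lines(value):
    """Turn 'edit 0 matchElsewhereHighQual 1' into {'edit': 0, ...}."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected (tagtype) (line number) pairs")
    lines_by_tag = {}
    for i in range(0, len(tokens), 2):
        # int() raises ValueError if the line number is not an integer.
        lines_by_tag[tokens[i]] = int(tokens[i + 1])
    return lines_by_tag
```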

consed.assemblyViewShowTags: true
bool
! If true, and some tag types are selected, these tags
! will be shown in assemblyView.  If false, no tags
! will be shown in assemblyView.
! (YES)

consed.assemblyViewMaxReadDepth: 300
int
! This is used for read depth, fwd/rev pair depth, and multiple
! high quality discrepancies
! (OK)

consed.assemblyViewDepthScaleHorizontalLines: true
bool
! This is to make horizontal lines rather than tick marks
! (OK)

consed.assemblyViewReadDepthWindowWidth: 530
int
! This makes the window slightly smaller so labels and fields
! are closer together
! (NO)

consed.autoEditRecalculateHighQualitySegmentsOfReads: false
bool
! If true, will recalculate the high quality segments of the reads
! (YES)

consed.autoEditConvertCloneEndBasesToXs: true
bool
! If true, will convert to X's bases of all reads that protrude beyond a
! cloneEnd tag.
! (YES)

consed.autoEditTellPhrapNotToOverlapMultiplyDiscrepantReads: true
bool
! This will find all locations where there are multiple identical 
! discrepancies with the consensus (and some other conditions) and try
! to make most of the reads quality 99 at that location so that phrap,
! next time it is run, will not overlap those reads.  This will fix
! many misassemblies.
! (YES)

consed.autoEditTagEditableLowConsensusQualityRegions: true
bool
! This will find regions that are low quality but where a human
! finisher could easily determine the correct base, so that
! money can be saved by not having Autofinish suggest additional
! reads overlapping the region
! (YES)

consed.autoEditMakeFakeRead: false
bool
! takes a list of reads and makes a false read that consists of the
! combination of those reads (using the consensus to fill in between
! them)
! (YES)

consed.autoEditMakeFakeReadFromRead1: read1
RWCString
! read 1 from which to make the fake read

consed.autoEditMakeFakeReadFromRead2: read2
RWCString
! read 2 from which to make the fake read

consed.autoEditMakeFakeReadName: mama
RWCString
! name of fake read

consed.autoEditMakeFakeReadFastaFilename: mama.fa
FileName
! name of fasta file to put the read into

consed.autoEditMergeAssembly: false
bool
! This is used to take 2 assemblies (each with a single contig) that
! have a read in common and merge them into a single assembly with a
! single contig as follows: It reads consed.autoEditSecondaryAceFile
! into a secondary assembly and finds read
! consed.autoEditMakeFakeReadName within that secondary assembly as well
! as within the primary assembly.  It assumes that these reads have the
! same unpadded bases (although they may have different pads).  It
! equalizes the pads, and then moves the other reads (which are
! typically the fwd/rev pair of reads) from the secondary assembly into
! the primary assembly.  It then deletes the secondary assembly and
! deletes consed.autoEditMakeFakeReadName from the primary assembly.
! (YES)

consed.autoEditSecondaryAceFile: mama.ace
FileName
! ace file of the fake read and the forward/reverse pair

consed.autoEditFixRunsInConsensus: false
bool
! fixes this: 
! ccc (cons)
! cc* (read1)
! *cc (read2)
! (YES)  

consed.showAllTracesJustShowGoodTraces: true
bool
! Just show traces where there is a base at the cursor and
! there is trace signal at the cursor and where 
! there is no "dataNeeded" tag at the cursor as specified by
! consed.showAllTracesDoNotShowTraceIfTheseTagsPresent
! (YES)

consed.addAlignedSequenceQualityOfBases: 40
int
! when running consed -addAlignedSequence, what quality should the
! bases be?
! (YES)

consed.makeLightBackgroundInAlignedReadsWindowAndTracesWindow: false
bool
! for printing screens, saves toner
! (YES)

consed.putVerticalLineAtCursor: true
bool
! for very high depth of coverage regions, a line helps your eye
! follow the column
! (YES)

consed.putHorizontalLineAtCursor: true
bool
! for very wide monitors, helps to follow a read with your eye
! (YES)

consed.highlightedReadsFile: highlighted_reads.txt
FileName
! The user can use the "Misc/save highlighted reads to file" function
! to save highlighted read names to this file.
! (YES)

consed.exportScaffoldsNsBetweenContigs: 50
int
! How many n's to put between contigs in a scaffold.
! (OK)
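
As an illustration of what this spacing means (a sketch, not the
export code itself; the function name is hypothetical):

```python
def join_contigs_into_scaffold(contig_sequences, ns_between_contigs=50):
    """Concatenate contig sequences with a run of n's between neighbors."""
    spacer = "n" * ns_between_contigs
    return spacer.join(contig_sequences)
```

So a scaffold of 3 contigs exported with the default setting would
contain 2 runs of 50 n's.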

consed.exportScaffoldsFastaOrFastq: fasta
RWCString
! options are fastq or fasta.  The latter also
! writes a .fasta.qual file
! (YES)

consed.exportScaffoldsTrimEnds: false
bool
! when exporting scaffolds, whether to trim the low quality ends of the
! contigs
! (OK)

consed.exportScaffoldsTrimEndsQuality: 13
int
! when trimming ends, find maximal segment with this
! quality and better
! (OK)

consed.solexa64FastqOrSanger33FastqForOutput: 33
int
! for exporting scaffolds in fastq format
! (OK)
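
The two values correspond to the ASCII offset added to each quality
value when it is written as a fastq character.  A minimal sketch,
assuming the qualities are phred qualities and the offset is simply
added (the function name is hypothetical):

```python
def quality_to_fastq_char(quality, offset=33):
    """Encode one quality value as a fastq quality character."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    if quality < 0:
        raise ValueError("quality must not be negative")
    return chr(quality + offset)
```

For example, quality 40 is written as "I" with offset 33 and as "h"
with offset 64.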

consed.autoReportPrintReadNamesInRegion: false
bool
! (OK)

consed.autoReportPrintReadNamesInRegionContig: Contig1
RWCString
! (OK)

consed.autoReportPrintReadNamesInRegionLeftPos: 1
int
! (OK)

consed.autoReportPrintReadNamesInRegionRightPos: 1000
int
! (OK)

consed.autoReportPrintHighlyDiscrepantRegions: false
bool
! motivated by solexa reads.  Print where many reads disagree with
! reference sequence
! uses: 
! consed.navigateByHighlyDiscrepantPositionsMinDiscrepantReads: 2
! consed.navigateByHighlyDiscrepantPositionsMaxDepthOfCoverage: 100000
! consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality: 20
! consed.navigateByHighlyDiscrepantPositionsJustListIndels: false
! consed.navigateByHighlyDiscrepantPositionsIgnoreOtherReadsStartingAtSameLocation: false
! consed.navigateByHighlyDiscrepantPositionsIgnoreIfListedBasesInConsensus: false
! consed.navigateByHighlyDiscrepantPositionsIgnoreIfTheseBasesInConsensus: xn
! consed.navigateByHighlyDiscrepantPositionsStopOnlyOnceAtAnIndel: true
! (OK)

consed.autoReportPrintScaffolds: false
bool
! (OK)

consed.numberUnpaddedConsensusAtUserDefined: true
bool
! allow user to put a tag on the consensus to specify the number to
! start numbering the consensus.
! Must use tag consed.tagColorStartNumberingConsensus as the
! tag with the number in it.
! (OK)

consed.autoReportPrintHighQualityDiscrepancies: false
bool
! (OK)

consed.autoReportHighQualityDiscrepanciesExcludeCompressionOrG_dropoutTags: true
bool
! used in connection with consed.autoReportPrintHighQualityDiscrepancies
! (OK)

consed.autoReportHighQualityDiscrepanciesExcludeMostPads: true
bool
! used in connection with consed.autoReportPrintHighQualityDiscrepancies
! Excludes high quality discrepancy pads except those in cases such as this:
!  consensus   aa
!  read 1      *a
!  read 2      *a
!  read 3      a*
!  read 4      a*
! (OK)

consed.autoReportPrintLowConsensusQualityRegions: false
bool
! (OK)

consed.autoReportPrintSingleSubcloneRegions: false
bool
! (OK)

consed.autoReportPrintSingleStrandedRegions: false
bool
! (OK)

consed.autoReportPrintLinkingForwardReversePairs: false
bool
! (OK)

consed.autoReportPrintFilteredInconsistentForwardReversePairs: false
bool
! (OK)

consed.autoReportPrintAssemblySummary: false
bool
! (OK)

consed.autoReportPrintPotentialJoins: false
bool
! (OK)

consed.autoReportCalculateGenotypes: false
bool
! (OK)

consed.calculateGenotypesMinimumSNPQuality: 10
int
! (OK)

consed.calculateGenotypesMinimumQualityDiscrepancy: 20
int
! For SNP analysis, the genotype at a position is only calculated if
! that position has a discrepancy (with the consensus) of this quality
! or higher.
! (OK)

consed.calculateGenotypesMinimumConsensusQuality: 0
int
! This used to be 10.  But I am afraid of undercalling SNPs so
! changed it to 0.
! (OK)  

consed.calculateGenotypesMinimumMappingQuality: 20
int
! there must be at least 1 read that has this mapping quality or
! better.  Otherwise this site is not considered.
! (OK)

consed.calculateGenotypeDebugging: false
bool
! turn on to get more info, especially about why genotypes
! were not calculated for some columns
! (OK)

consed.potentialJoinThisManyHighQualityBases: 1000
int
! use this many bases from the high quality segment when looking for
! overlaps
! used by consed.autoReportPrintPotentialJoins:
! (OK)

consed.potentialJoinCrossMatchOptions: -masklevel 101
RWCString
! (OK)

consed.potentialJoinCleanUpTemporaryFiles: true
bool
! (OK)

consed.potentialJoinIgnoreDiscrepancyThisFarFromEndOfAlignment: 75
int
! if a discrepancy is within this far from the end of an alignment,
! ignore the discrepancy
! (OK)

consed.potentialJoinMinReadDepthAtDiscrepancy: 3
int
! if a discrepant position has fewer than this number of reads,
! ignore the discrepancy
! (OK)

consed.potentialJoinHighQualityDiscrepancy: 20
int
! if a discrepancy is below this quality, ignore the discrepancy
! (OK)

consed.showAllTracesDoNotShowTraceIfTheseTagsPresent: dataNeeded    
RWCString
! See consed.showAllTracesJustShowGoodTraces 
! (OK)

consed.nameOfFakeJoiningReadsIncludesAceFileName: false
bool
! This is useful if the user is going to combine the reads
! from a number of different ace files together.  
! (OK)

consed.whenUserScrollsOffWindowMillisecondsBetweenScrolling: 250
int
! (OK)

consed.whenUserScrollsOffWindowBasesToScrollEachTime: 15
int
! (OK)

consed.compareContigsUseBandedRatherThanFullSmithWaterman: true
bool
! (OK)

consed.compareContigsBandSize: 50
int
! band size of banded Smith Waterman
! (OK)

consed.assemblyViewShowFwdRevPairDepthsInRedIfOnlyThisMany: 1
int
! (OK)

consed.assemblyViewShowSequenceMatches: true
bool
! When false, do not show any sequence matches (repeats) 
! at all in Assembly View.
! Some people like to start out this way since displaying sequence
! matches slows down scrolling.
! (OK)

consed.assemblyViewOKToShowSequenceMatchesBetweenContigs: true
bool
! (OK)

consed.assemblyViewOKToShowSequenceMatchesWithinContigs: true
bool
! (OK)

consed.assemblyViewOKToShowDirectSequenceMatches: true
bool
! This means matches in which neither copy must be complemented with
! respect to the way it is in the scaffold as created by Consed.
! (OK)

consed.assemblyViewOKToShowInvertedSequenceMatches: true
bool
! This means that exactly one copy must be complemented with respect
! to the way it is in the scaffold as created by Consed.
! (OK)

consed.assemblyViewOnlyShowSequenceMatchesToAParticularRegion: false
bool
! You must set consed.assemblyViewOnlyShowSequenceMatchesToThisContig,
! consed.assemblyViewOnlyShowSequenceMatchesToThisRegionLeft, and
! consed.assemblyViewOnlyShowSequenceMatchesToThisRegionRight
! (OK)

consed.assemblyViewOnlyShowSequenceMatchesToThisContig:
RWCString
! You must make
! consed.assemblyViewOnlyShowSequenceMatchesToAParticularRegion: true
! (OK)

consed.assemblyViewOnlyShowSequenceMatchesToThisRegionLeft: 0
int
! You must make
! consed.assemblyViewOnlyShowSequenceMatchesToAParticularRegion: true
! (OK)

consed.assemblyViewOnlyShowSequenceMatchesToThisRegionRight: 0
int
! You must make
! consed.assemblyViewOnlyShowSequenceMatchesToAParticularRegion: true
! (OK)

consed.assemblyViewOnlyShowSequenceMatchesToEndsOfContigs: false
bool
! (OK)

consed.assemblyViewOnlyShowSequenceMatchesToEndsOfContigsThisFar: 1000
int
! This many base pairs from the end of the contig.
! (OK)

consed.autoFinishDoNotDoPCRIfThisManyAvailableGapSpanningTemplates: 2
int
! (OK)

consed.autoFinishDoNotDoUnorientedPCRIfThisManyOrMoreUnorientedPCRReactions: 6
int
! "unoriented" pcr reactions means cases in which autofinish is suggesting
! a pcr reaction to span a gap, but it doesn't know whether the 2 contig ends
! really go together since there are not enough (or no) templates that span
! that gap
! (OK)

consed.autoFinishDoNotDoOrientedPCRIfGapSizeLargerThanThis: 10000
int
! Gap size can be specified in user-defined contigEndPair tags in a 
! gap_size: field
! If the gap size is greater than this number, do not do PCR.
! (OK)

consed.autoFinishDoNotDoPCRIfEndIsExtendedByReads: false
bool
! If this is true, and autofinish was able to walk off the end of a
! contig, do not do PCR with that end of the contig. 
!
! (OK)

consed.autoFinishMaxAcceptableErrorsPerMegabase: 0
int
! target error rate.  This parameter used to be the one that stopped
! Autofinish from calling more reads.  However, consider a BAC that is
! nearly perfect except for one region with 3 quality 10 bases in a
! row.  In this case the global errors per megabase is very
! low--perhaps lower than 1 error per megabase.  Despite this, most
! labs would like to do one more read to fix this problem.  Thus we
! set this parameter to zero (to disable it) so Autofinish will use
! the parameter consed.autoFinishMinNumberOfErrorsFixedByAnExp to stop
! calling more reads--it is a local error rate.
! (OK)

consed.autoFinishIfNotEnoughFwdRevPairsUseThisPerCentOfInsertSize: 90
int
! If a template has a forward but no reverse, when deciding whether to
! allow this template for a particular primer, we need to make an assumption
! of where is the end of the template.  If the template comes from a library
! with insert size 1500, it would be reasonable to assume that the end of
! template will be 1500 bases from the forward read.  But if this template
! has an insert that is shorter than average, the walk may walk into vector.
! To be conservative, we may want to assume that the insert is somewhat 
! shorter than average.  By default, we assume that it is 90% as large as 
! the average. This parameter gives that percentage.  This parameter
! is used both by Consed and Autofinish.
! (OK)
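
The arithmetic in the example above, as a small sketch.  The function
name is hypothetical, and rounding down to a whole number of bases is
an assumption.

```python
def assumed_insert_size(mean_insert_size, percent_of_insert_size=90):
    """Conservative insert size for a template lacking a fwd/rev pair."""
    return mean_insert_size * percent_of_insert_size // 100
```

For a library with insert size 1500 and the default of 90, the end of
the template is assumed to be 1350 bases from the forward read rather
than 1500.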

consed.primersNumberOfBasesToBackUpToStartLooking: 50
int
! e.g., if this is 50 and you want a read at position 1000, primers
! will be searched before base 950 but not in the region 950 to 1000
! This has no effect on Autofinish--just on interactively picking primers.
! (OK)

consed.primersMakePCRPrimersThisManyBasesBackFromEndOfHighQualitySegment: 100
int
! When a PCR product is made, you want it to overlap by this many bases
! the high quality part of the existing consensus.  Thus choose PCR
! primers this many bases back (or more)
! (OK)

consed.primersOKToChoosePrimersInSingleSubcloneRegion: true
bool
! (OK)

consed.primersOKToChoosePrimersWhereHighQualityDiscrepancies: false
bool
! (OK)

consed.primersOKToChoosePrimersWhereUnalignedHighQualityRegion: false
bool
! (OK)

consed.autoFinishCallReversesToFlankGaps: true
bool
! if there is a forward-reverse pair flanking a gap, print it out
! if there is not, suggest reverses to flank the gap
! (OK)

consed.autoFinishAllowWholeCloneReads: false
bool
! ok to call reads whose template for sequencing reaction is the
! entire clone (BAC or cosmid)
! (OK)

consed.autoFinishAllowCustomPrimerSubcloneReads: true
bool
! ok to call reads with custom primers and subclone template
! (OK)

consed.autoFinishAllowResequencingReads: true
bool
! This is just universal primer reads to be resequenced using
! dye terminator chemistry or special chemistry.  (It does not
! mean resequencing a custom primer read.)
! (OK)

consed.autoFinishAllowResequencingReadsOnlyForRunsAndStops: false
bool
! This parameter only has any effect when
! consed.autoFinishAllowResequencingReads is set to true.  In that
! case no resequencing reads will be suggested, unless it is to cross
! a run or stop and special chemistry is suggested.
! (OK)    

consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: true
bool
! Allows calling reverse when there is just a forward.
! Allows calling a forward when there is just a reverse.
! (OK)

consed.autoFinishAllowMinilibraries: false
bool
! Allows calling minilibraries (shatter libraries or transposon
! libraries) of subclone templates for closing gaps
! (OK)

consed.autoFinishAllowPCR: true
bool
! Allows calling PCR for closing gaps, but only as a last resort
! (OK)

consed.autoFinishAllowUnorientedPCRReactions: true
bool
! Allows calling PCR amongst contig-ends that have insufficient
! fwd/rev pair linkage to any other contig-end.  Thus it suggests
! pcr amongst all such contig-ends.  
! To allow this type of pcr, you must also make:
! consed.autoFinishAllowPCRForUnorientedContigEnds: true
! See also:
! consed.autoFinishDoNotDoUnorientedPCRIfThisManyOrMoreUnorientedPCRReactions: 
! which gives you finer control over unoriented pcr.
! (OK)

consed.autoFinishAllowResequencingAUniversalPrimerAutofinishRead: false
bool
! if Autofinish suggests a de novo universal primer read,
! do not allow Autofinish to suggest a resequence of this read
! (OK)


consed.autoFinishAlwaysCloseGapsUsingMinilibraries: false
bool
! "Minilibraries" includes transposing a subclone template or
! making a shatter library from a subclone template
! (OK)

consed.autoFinishMaximumFinishingReadLength: 2000
int
! Change this only if your finishing reads are typically shorter
! than your shotgun reads.  Otherwise, leave it unrealistically long,
! and Autofinish will set its model read based on your existing
! shotgun reads.
! (OK)

consed.autoFinishSuggestMinilibraryIfGapThisManyBasesOrLarger: 800
int
! (OK)

consed.autoFinishSuggestSpecialChemistryForRunsAndStops: true
bool
! Suggest special chemistry such as dGTP for reads that cross
! mononucleotide or dinucleotide repeats (runs) or stops (structure),
! either of which can cause dye terminator reads to fail.
! (OK)


consed.autoFinishSuggestThisManyMinilibrariesPerGap: 2
int
! (OK)

consed.primersWindowSizeInLooking: 450
int
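! e.g., if this is 300, then in the example given for
! consed.primersNumberOfBasesToBackUpToStartLooking (a read wanted at
! position 1000, backing up 50 bases), primers will be searched
! from base 650 to 950.  This has no effect on Autofinish--it is just
! used for interactive primer picking in Consed.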
! (OK)
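
How this window combines with
consed.primersNumberOfBasesToBackUpToStartLooking can be sketched as
follows.  This is an illustration only: it assumes a read that runs
toward higher base positions, so the primer lies to the left of the
position where the read is wanted, and the function name is
hypothetical.

```python
def primer_search_window(position_wanted, bases_to_back_up=50,
                         window_size=450):
    """Return (leftmost base, rightmost base) searched for primers."""
    right = position_wanted - bases_to_back_up
    left = right - window_size
    return left, right
```

With a read wanted at position 1000, backing up 50 bases, and a window
size of 300, this gives the region from base 650 to base 950.  With
the default window size of 450 it gives 500 to 950.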

consed.primersAssumeTemplatesAreDoubleStrandedUnlessSpecified: false
bool
! you can put the template type in the phd file in a WR template item
! consed will have a list of these and know which are single and
! double stranded
! (OK)

consed.alignedReadsWindowInitialCharsWide: 60
int
! initial width of the aligned reads window including the read name and 
! the bases
! (OK)

consed.alignedReadsWindowInitialCharsHigh: 20
int
! initial height of the aligned reads window area where the consensus
! and reads are
! (OK)

consed.alignedReadsWindowMaxCharsForReadNames: 20
int
! how many columns are reserved for read names
! If alignedReadsWindowAutomaticallyExpandRoomForReadNames is false,
! then this is the fixed number of chars on the screen for read names.
! If alignedReadsWindowAutomaticallyExpandRoomForReadNames is true,
! then this is not used.
! (OK)

consed.alignedReadsWindowMaxCharsForReadNamesWhenAutomaticallyExpand: 45
int
! how many columns are reserved for read names in the following
! situation:
! if alignedReadsWindowAutomaticallyExpandRoomForReadNames is true,
! and the algorithm comes up with a number that is
! greater than this, this number will override and be used instead.
! (OK)  

consed.alignedReadsWindowAutomaticallyExpandRoomForReadNames: true
bool
! If true, expand and contract space for read names, but don't
! contract less than consed.alignedReadsWindowMaxCharsForReadNames.
! If false, then always use
! consed.alignedReadsWindowMaxCharsForReadNames
! for space reserved for read names.
! (OK)

consed.alignedReadsWindowReadNameSizeStdDeviations: 2.0
double
! The room for read names is this many standard deviations above
! the mean of all reads    
! (NO)

consed.alignedReadsWindowReadPrefixSizeStdDeviations: 2.0
double
! The room for read prefixes is this many standard deviations above
! the mean of all reads
! (NO)

consed.alignedReadsWindowDefaultRoomForReadPrefixes: 2
int
! The default room for read prefixes is this (this is
! used if consed.alignedReadsWindowAutomaticallyExpandRoomForReadNames
! is false)
! (OK)  

consed.defaultReadPrefix: *
RWCString
! This is used as the character to prefix reads with when the 
! read is in consed.readPrefixesFile and the prefix is not specified.
! (OK)

consed.readPrefixesFile: readPrefixes.txt
FileName
! This file should contain a list of reads that you would want to 
! have prefixes in the Aligned Reads Window.  Each line should
! have the following format:
! (read name) (prefix) (color)
! The prefix and color are optional.  You can have a line like this:
! (read name) (prefix)
! or this:
! (read name)
! but not this:
! (read name) (color)
! If the color is not specified, the color will default to 
! consed.colorReadPrefixes: blue
! If the prefix is not specified, it will default to
! consed.defaultReadPrefix
! (OK)
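
A minimal sketch of reading one line of the read prefixes file under
the rules above, assuming whitespace-separated fields.  The function
name and the read name used in the usage note are hypothetical, and
the defaults mirror consed.defaultReadPrefix and
consed.colorReadPrefixes.

```python
def parse_read_prefix_line(line, default_prefix="*", default_color="blue"):
    """Parse: (read name) [(prefix) [(color)]] -> (name, prefix, color)."""
    fields = line.split()
    if not 1 <= len(fields) <= 3:
        raise ValueError("expected 1 to 3 fields, got %d" % len(fields))
    name = fields[0]
    # A second field is always taken as the prefix, never as the color,
    # which is why "(read name) (color)" is not a usable form.
    prefix = fields[1] if len(fields) >= 2 else default_prefix
    color = fields[2] if len(fields) == 3 else default_color
    return name, prefix, color
```

For example, a line containing only the read name "readA.b1" would
give the prefix "*" in blue.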

consed.maxCharsDisplayedForReadPrefix: 15
int
! It is still ok to have long read prefixes in the file
! consed.readPrefixesFile but only this many characters
! will be displayed in the Aligned Reads window
! (OK)

consed.autoFinishAllowResequencingReadsToExtendContigs: false
bool
! if false, a resequencing read is not called to extend a contig--only
! custom primer reads and de novo universal primer reads are called
! for this purpose.  
! (OK)

consed.autoFinishCallHowManyReversesToFlankGaps: 2
int
! This has two purposes: 1) it specifies how many forward/reverse
! pairs should be present for Consed/Autofinish to be certain of the
! order/orientation of two contigs.  If there are this many fwd/rev
! pairs flanking a gap, Autofinish will print out the contig ends that
! flank the gap.  2) If consed.autoFinishCallReversesToFlankGaps is
! set to true, and there are fewer than this many fwd/rev pairs
! flanking a gap, Autofinish will suggest additional reverses until
! there are this many.
! (OK)

consed.autoFinishCloseGaps: true
bool
! this allows you to turn off choosing reads to close gaps
! (OK)

consed.autoFinishContinueEvenThoughReadInfoDoesNotMakeSense: false
bool
! this allows you to override the checks that autofinish makes on the
! read info, such as checking there are not more than 5 or so reads
! from the same subclone template
! (OK)

consed.autoFinishCostOfResequencingUniversalPrimerSubcloneReaction: 20.0
double
! compares universal primer subclone reaction, custom primer subclone 
! reaction, and custom primer clone reaction to decide which to favor
! (OK)

consed.autoFinishCostOfCustomPrimerSubcloneReaction: 60.0
double
! see above
! (OK)

consed.autoFinishCostOfCustomPrimerCloneReaction: 80.0
double
! see above 
! (OK)

consed.autoFinishCostOfDeNovoUniversalPrimerSubcloneReaction: 60.0
double
! cost of reverse where there is only a forward or cost of forward
! when there is only a reverse
! (OK)

consed.autoFinishCostOfMinilibrary: 500.0
double
! cost of making a minilibrary (transposon library or shatter library)
! from a subclone template
! (OK)

consed.autoFinishCoverSingleSubcloneRegions: true
bool
! this allows you to turn off choosing reads to cover single subclone regions
! (OK)

consed.autoFinishCoverLowConsensusQualityRegions: true
bool
! this allows you to turn off choosing reads to cover low consensus
! quality regions
! (OK)

consed.autoFinishDebugUniversalPrimerReadsFile: gordon_debug.txt
FileName
! for debugging Autofinish
! put a file with this name in the same directory as the ace file
! format:
! fcalld09 fwd
! fgj74f01 rev
! (template name) (fwd or rev)
! (OK)

consed.autoFinishDebugCustomPrimerReadsFile: debug_custom.txt
FileName
! for debugging Autofinish
! put a file with this name in the same directory as the ace file
! format:
! cgggacctgg
! (primer in 5' to 3' orientation)
! (OK)

consed.autoFinishDoNotAllowSubcloneCustomPrimerReadsCloserThanThisManyBases: 200
int
! see consed.autoFinishDoNotAllowSubcloneCustomPrimerReadsCloseTogether
! (OK)

consed.autoFinishDoNotAllowWholeCloneCustomPrimerReadsCloserThanThisManyBases: 300
int
! see consed.autoFinishDoNotAllowWholeCloneCustomPrimerReadsCloseTogether
! (OK)

consed.autoFinishDoNotFinishWhereTheseTagsAre: doNotFinish editable
RWCString
! list of tag types separated by spaces.  E.g.,
! doNotFinish repeat 
! tells autofinish that you are not interested in finishing in this region
! (OK)

consed.autoFinishDoNotExtendContigsWhereTheseTagsAre: doNotFinish
RWCString
! list of tag types separated by spaces.  E.g.,
! doNotFinish repeat
! tells autofinish that you do not want to extend the contig near this
! tag.  If you do not want this feature, just leave the list empty.
! (OK)

consed.autoFinishDoNotExtendContigsIfTagsAreThisCloseToContigEnd: 50
int
! Uses the list from consed.autoFinishDoNotExtendContigsWhereTheseTagsAre
! and checks if any of these tags are within this many bases of the end of 
! the contig.  If they are, does not extend the contig.
! (OK)

consed.dumpContigOrderAndOrientationInfoToThisFile:
FileName
! In the case of Consed (not autofinish or autoPCRAmplify), send the
! output to this file rather than stderr.  If this name is blank,    
! continue (in case of consed), to send output to stderr.
! (OK)

consed.autoFinishDumpTemplates: false
bool
! for debugging, this allows you to dump all information about the 
! templates--insert locations 
! (OK)

consed.autoFinishExcludeContigIfOnlyThisManyReadsOrLess: 10
int
! (OK)

consed.autoFinishExcludeContigIfDepthOfCoverageGreaterThanThis: 100000000.0
double
! To exclude contigs that are probably E. coli contamination.
! "depth of coverage" is defined here to mean the sum of the read
! lengths (including low quality ends) divided by the contig length.
! Turned off by making a very high number July 21, 2011 (DG) since
! some applications have extremely high depth of coverage and should
! still be included.
! (OK)
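
The definition of depth of coverage given above, as a small sketch.
The function names are hypothetical, and excluding only when the depth
is strictly greater than the limit follows the resource name.

```python
def depth_of_coverage(read_lengths, contig_length):
    """Sum of read lengths (low quality ends included) / contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return sum(read_lengths) / contig_length


def exclude_contig_for_depth(read_lengths, contig_length,
                             max_depth=100000000.0):
    """True if the contig's depth of coverage is greater than the limit."""
    return depth_of_coverage(read_lengths, contig_length) > max_depth
```

For example, 4 reads of lengths 800, 750, 900, and 550 in a contig of
length 1500 give a depth of coverage of 2.0, far below the default
limit.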

consed.autoFinishExcludeContigIfThisManyBasesOrLess: 1000
int
! consed.autoFinishExcludeContigIfTooShort must be set to true for
! this to have any effect
! (OK)

consed.autoFinishHowManyTemplatesYouIntendToUseForCustomPrimerSubcloneReactions: 3
int
! this tells autofinish which templates you are planning on using
! which is necessary to figure out which regions will still be single
! subclone regions
! (OK)


consed.primersMinNumberOfTemplatesForPrimers: 1
int
! if there are fewer templates than this, the primer is rejected

! Pat wanted this 70 on May 5, 2000 to allow for 20 bases of poor
! quality at beginning of read and then 50 bases for phrap to
! assemble together
consed.autoFinishMinBaseOverlapBetweenAReadAndHighQualitySegmentOfConsensus: 70
int
! when extending the consensus, a read that is too far from the
! consensus will not be assembled by phrap with this contig and thus
! will not be useful for extending the consensus.  This gives the
! minimum overlap of a read with the high quality segment of the
! consensus.  As reads are picked, then additional reads may be picked
! further out.
! (OK)

consed.autoFinishNumberOfVectorBasesAtBeginningOfAUniveralPrimerRead: 40
int
! used to figure out where the beginning of a reverse will be.  Not
! important to be accurate because the insert size is so uncertain
! (OK)

consed.autoFinishCDNANotGenomic: false
bool
! If this is set to true, the whole clone is assumed to be cDNA and,
! rather than the normal method of detecting the end of the clone,
! Autofinish detects the end of the cDNA as follows:
! the user is expected to add whole read items of type 'template',
! with 'type: univ fwd' for the 5' end and 'type: univ rev' for the 3'
! end of the cDNA.  
! (OK)

consed.autoFinishConfidenceThatReadWillCoverSingleSubcloneRegion: 90
int
! Autofinish computes the per cent of existing reads that are aligned at
! each base position.  Typically, this number starts at around 0% at
! base position 1, rises to close to 100% at around base position 300,
! and then drops again to 0% at base position 800 or so.  This number
! specifies how high the number must be for Autofinish to consider an
! Autofinish read to cover a single subclone region.
! (OK)
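
! The calculation described above can be sketched as follows.  This is
! a minimal illustration, not Autofinish's actual code: the function
! names and the (first, last) representation of each read's aligned
! range are assumptions made for the example.

```python
def percent_aligned_by_read_position(aligned_ranges, max_position):
    """Percent of reads aligned at each read position (1-based).

    aligned_ranges holds one (first, last) pair per existing read,
    giving the read positions that are aligned (hypothetical input).
    """
    total = len(aligned_ranges)
    percents = []
    for position in range(1, max_position + 1):
        covered = sum(1 for first, last in aligned_ranges
                      if first <= position <= last)
        percents.append(100.0 * covered / total if total else 0.0)
    return percents


def positions_meeting_confidence(percents, threshold=90):
    """Read positions whose percent aligned is at least the threshold."""
    return [index + 1 for index, percent in enumerate(percents)
            if percent >= threshold]
```

! With the default of 90, only read positions where at least 90% of
! the existing reads are aligned would count toward covering a single
! subclone region in this sketch.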

consed.autoFinishPrintForwardOrReverseStrandWhenPrintingSubcloneTemplatesForCustomPrimerReads: true
bool
! If this is true, then custom primer reads are printed out like this:
! tccagaaaactaattcaaaataatg,56,standard.2,->,2413,2413,3681,Contig1,9,djs74_690 (fwd),10,djs74_1803 (fwd),11,djs74_1861 (fwd)
! If this is false, then custom primer reads are printed out like this:
! tccagaaaactaattcaaaataatg,56,standard.2,->,2413,2413,3681,Contig1,9,djs74_690,10,djs74_1803,11,djs74_1861
! The difference is the (fwd) or (rev) that indicates which strand of
! the subclone template is to be used.  This is particularly important if
! you use M13 and thus must make the reverse strand.
! (OK)
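
! A short sketch of how a downstream script might pull the template
! names and strands out of such a line.  It assumes, based only on the
! two example lines above, that the fields after the contig name
! alternate between a number and a template name; the function name is
! hypothetical.

```python
import re

# A template field is a name optionally followed by " (fwd)" or " (rev)".
TEMPLATE_FIELD = re.compile(r"^(?P<name>\S+?)(?: \((?P<strand>fwd|rev)\))?$")


def templates_in_line(line):
    """Return (primer, [(template_name, strand_or_None), ...])."""
    fields = line.strip().split(",")
    primer = fields[0]
    # Assumption: fields[7] is the contig name and everything after it
    # comes in (number, template) pairs, as in the examples above.
    trailing = fields[8:]
    templates = []
    for index in range(1, len(trailing), 2):
        match = TEMPLATE_FIELD.match(trailing[index])
        if match is None:
            continue  # field not in the expected shape; skip it
        templates.append((match.group("name"), match.group("strand")))
    return primer, templates
```

! When the resource is false, the strand comes back as None because
! the (fwd)/(rev) suffix is absent.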

consed.autoFinishPrintMinilibrariesSummaryFile: false
bool
! If this is true, Autofinish will print a file with name
! xxx.minilibraries just as it prints one as xxx.univReverses and
! xxx.univForwards
! (OK)

consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs: true
bool
! This is set to true to increase the chance of closing a gap.  For
! every subclone template that has just one universal primer read
! (either just a forward or just a reverse) that might protrude off
! the end of the contig, Autofinish suggests the universal primer read
! off the opposite end of the subclone template.
! If this parameter is set false, then
! Autofinish may still choose some of these reads, but it won't
! necessarily choose them all.
! (OK)

consed.autoFinishDoNotIgnoreLCQIfThisManyBasesFromEndOfContigForLCQTagger: 300
int
! Do not ignore low consensus quality bases if they are this many 
! bases from the end of the contig.    
! (OK)

consed.checkIfTooManyWalks: true
bool
! this just checks if the number of walks, pcr ends, and unknown reads
! exceeds 20% of the total number of reads.  If this is exceeded, then 
! a warning message is given.  Typically, such a warning indicates
! that you have incorrectly customized determineReadTypes.perl
! (OK)
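
! The 20% check described above amounts to the following arithmetic.
! This is only a sketch; the function and parameter names are
! hypothetical and the counts would come from your own read typing.

```python
def too_many_walks(walks, pcr_ends, unknown_reads, total_reads, max_percent=20):
    """True if walks + pcr ends + unknown reads exceed max_percent of all reads."""
    special = walks + pcr_ends + unknown_reads
    # Integer arithmetic avoids floating point rounding at the boundary.
    return 100 * special > max_percent * total_reads
```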

consed.numberOfColumnsBeforeReadNameInAlignedReadsWindow: 1
int
! this is for displaying information about the whole read items, 
! both from PHD files and from a file

consed.compareContigsAlignsThisManyBasesMax: 2000
int
! (OK)

consed.compressedChromatExtension: .gz
RWCString
! (OK)

consed.dimLowQualityEndsOfReads: false
bool
!
! (OK)

// phil 980713 requested that the default be to not dim
// low quality ends of reads and to dim the unaligned ends 
// of reads

consed.dimUnalignedEndsOfReads: true
bool
! (OK)

consed.fakeReadsSpecifiedByFilenameExtension: true
bool
! if this is true, then reads that end with .a[0-9]* or .c[0-9]* will
! be considered fake reads.  Otherwise, fake reads will be indicated
! by a WR item in the PHD file.
! (OK)
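
! A sketch of the filename test described above, reading
! .a[0-9]* and .c[0-9]* as "a literal dot, then a or c, then zero or
! more digits, at the end of the read name".  That reading is an
! assumption, and the function name is hypothetical.

```python
import re

FAKE_READ_SUFFIX = re.compile(r"\.[ac][0-9]*$")


def is_fake_read(read_name):
    """True if the read name ends with .a<digits> or .c<digits>."""
    return FAKE_READ_SUFFIX.search(read_name) is not None
```

! Under this reading a name such as myRead.abi is not a fake read,
! because "bi" follows the ".a" instead of digits.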

consed.fullPathnameOfAddReads2ConsedScript: $CONSED_HOME/bin/addReads2Consed.perl
FileName
! (OK)

consed.fullPathnameOfFixContigEndScript: $CONSED_HOME/bin/fixContigEnd.perl
FileName
! (OK)

consed.fixContigEndsCleanUpTemporaryFiles: true
bool
! -fixContigEnds leaves behind zillions of temporary files from
! phrapping.  Delete these (except for debugging).
! (OK)

consed.fixContigEndsMinSmithWatermanScoreToMakeJoin: 30
int
! when making the join, if the smith-waterman score is less
! than this, do not make the join and leave the contig-end as is
! (NO)

consed.fixContigEndsMinNumberOfReadsInContig: 5
int
! only fix contigs that have this number of reads or more
! (YES)

consed.fixConsensusRecalculateQualities: false
bool
! should the consensus quality be recalculated at a base even
! if the consensus base is correct?
! (OK)

consed.humanGenomeFOF: 
FileName
! Used by consed -comparePhaster for comparing bwa and phaster output
! (YES)

consed.exomicRegions: 
FileName
! Used by consed -comparePhaster for denoting exomic regions
! Each line should look like this:
! chrX:99032-99156
! which indicates that positions 99,032 to 99,156 on chrX 
! are in the exome
! (YES)
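
! For checking such a file before handing it to consed, each line can
! be split as in this sketch.  The function name is hypothetical; the
! only format assumed is the chrX:99032-99156 form shown above.

```python
def parse_exomic_region(line):
    """Split 'chrX:99032-99156' into ('chrX', 99032, 99156)."""
    chromosome, _, span = line.strip().rpartition(":")
    start, _, end = span.partition("-")
    return chromosome, int(start), int(end)
```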

consed.fullPathnameOfCrossMatch: $CONSED_HOME/bin/cross_match
FileName
! (OK)

consed.fullPathnameOfPhred: $CONSED_HOME/bin/phred
FileName
! (OK)

consed.fullPathnameOfMiniassemblyScript: $CONSED_HOME/bin/phredPhrap
FileName
! If you are up-to-date with phredPhrap, this script serves both
! the purpose of assembling the entire project, as well as making
! miniassemblies.  The difference is whether phredPhrap has the
! -include_chromats option.
! (OK)

consed.decompressionProgram: /bin/gunzip -c
RWCString
! Program and arguments for decompressing the ace file and the phd
! ball.
! (OK)

consed.compressionProgram:  /bin/gzip -c
RWCString
! Program and arguments for compressing the ace file and the phd ball.
! (OK)

consed.compressedExtension: .gz
RWCString
! Used with consed.decompressionProgram and consed.compressionProgram
! Should include the dot.
! (OK)

consed.okToUseObsoleteMiniassemblyScript: false
bool
! If you have made so many modifications to phredPhrap that you
! can't afford to rev up to 120312, you can set this to true.  This
! will mean that you will duplicate consensus tags.
! (OK)

consed.gunzipFullPath: /bin/gunzip
RWCString
! (OK)

consed.fullPathnameOfCat:  /bin/cat
FileName
! (OK)

consed.fullPathnameOfFilter454ReadsScript: $CONSED_HOME/bin/filter454Reads.perl
FileName
! Runs crossmatch between the unpaired reads and puc19 to eliminate those
! reads contaminated with puc19
! (OK)

consed.filter454ReadsAgainstThis: $CONSED_HOME/lib/screenLibs/filter454Reads.fa
FileName
! used by consed.fullPathnameOfFilter454ReadsScript
! (OK)

consed.454LinkerSequences: $CONSED_HOME/lib/screenLibs/sffLinkers.fa
FileName
! the linker sequence for paired-end 454 reads used in 
! sff2PhdBall
! (OK)

consed.hideSomeTagTypesAtStartup: false
bool
! (OK)

consed.processMatePairsAtStartup: true
bool
! This is necessary to show the mate pair flags, but it does add
! around 15% to startup time.
! (OK)

consed.maximumNumberOfTracesShown: 4
int
! (OK)

consed.navigateAutomaticTracePopup: false
bool
! (OK)

consed.navigateAutomaticAllTracesPopup: false
bool
! (OK)

consed.primersMinimumLengthOfAPrimer: 15
int
! (OK)

consed.primersMaximumLengthOfAPrimer: 25
int
! (OK)

consed.primersMinimumLengthOfAPrimerForPCR: 18
int
! (OK)

consed.primersMaximumLengthOfAPrimerForPCR: 30
int
! (OK)

consed.primersMaxMeltingTempDifferenceForPCR: 3.0
double
! how large can the difference of melting temperatures be between
! two primers of a PCR primer pair
! (OK)

consed.primersMaxPCRPrimerPairsToDisplay: 100000
int
! there is a limit here, because there could possibly be millions
! (OK)

consed.primersCheckJustSomePCRPrimerPairsRatherThanAll: true
bool
! If there are 1000 1st primers, and 1000 2nd primers, that gives
! a million pairs for Consed to check, which takes a long time.  So
! instead, just check some of the pairs
! (OK)

consed.primersNumberOfTemplatesToDisplayInFront: 2
int
! the number of templates to show in the interactive primer
! picking window
! (OK)

consed.primersMaxLengthOfMononucleotideRepeat: 4
int
! (OK)

consed.primersBadLibrariesFile: badLibraries.txt
FileName
! file of libraries, one per line
! If any template is from any one of these libraries, then
! consed/autofinish will not use this template for walking or
! suggesting any universal primer reads
! (OK)

consed.primersLibrariesInfoFile: librariesInfo.txt
FileName
! file of libraries, with one entry for each library of the following
! format:
! LIB{
! name: library1
! avgInsertSize: 3000
! maxInsertSize: 5000
! stranded: single
! cost: 600.0
! }
! (OK)
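
! A minimal sketch of reading a file in the LIB{ ... } format shown
! above, for example to sanity-check it before running Autofinish.
! The parser is not consed's own; the function name is hypothetical
! and all values are kept as strings.

```python
def parse_libraries_info(text):
    """Return one dict of key/value strings per LIB{ ... } entry."""
    libraries = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "LIB{":
            current = {}
        elif line == "}":
            if current is not None:
                libraries.append(current)
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return libraries
```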

consed.primersBadTemplatesFile: badTemplates.txt
FileName
! file of templates that you've tried, don't work, and you don't want to try
! again
! (OK)

consed.primersChooseTemplatesByPositionInsteadOfQuality: true
bool
! Templates for subclone custom primer walks can be chosen either on
! the basis of the quality of the template (as determined by the quality 
! of existing reads from that template) or by the location of the end of 
! the template.  If this parameter is false, templates will be chosen
! based solely on quality.  If this parameter is true, then templates
! with forward/reverse pairs will be picked first, followed by templates 
! that have the beginning of the insert closest to the primer.
! (OK)

consed.primersWhenChoosingATemplateMinPotentialReadLength: 350
int
! when choosing templates for a custom primer, only choose a template
! if the read can be chosen at least this long
! (OK)

consed.primersWindowSizeInLookingForPCR: 2000
int
! will look this many bases back from the pointer when looking for a PCR
! primer.  Used both interactively and for Autofinish (see
! getUnpaddedRangeForMakingPCRPrimers )
! (OK)

consed.qualityThresholdForFindingHighQualityDiscrepancies: 40
int
! high quality discrepancies have this quality or higher    
! (OK)

consed.qualityThresholdForNavigateByDepthOfCoverage: 10
int
! for high depth of coverage, this is the minimum quality
! see consed.navigateByHighDepthOfCoverageNotLow
! (OK)

consed.navigateByHighDepthOfCoverageNotLow: true
bool
! see consed.qualityThresholdForNavigateByDepthOfCoverage:
! (OK)

consed.MinDepthForNavigateByDepthOfCoverage: 10
int
! see consed.qualityThresholdForNavigateByDepthOfCoverage:
! (OK)

consed.defaultVectorPathnameForRestrictionFragments: $CONSED_HOME/lib/screenLibs/singleVectorForRestrictionDigest.fasta
FileName
! If you want to have the vector cut with the restriction
! enzymes, put the vector sequence in a file in fasta format
! and put a pathname to it here.
! (OK)

consed.fileOfAdditionalRestrictionEnzymes: 
FileName
! If you want a restriction enzyme that is not in the huge list
! that comes with Consed, you can put additional enzymes in a file
! and put the full pathname of that file here.  The file must be in
! the form:
! AatI AGGCCT
! where AatI is the name of the enzyme and AGGCCT is the recognized
! sequence.  Do not include the cut site or any other information.
! There must be a single space separating them.
! (OK)
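
! Because the format is strict (exactly one space, nothing else on the
! line), it can be worth validating the file first.  The check below
! is a sketch with a hypothetical function name; it does not validate
! the recognition sequence itself.

```python
def parse_enzyme_line(line):
    """Split 'AatI AGGCCT' into ('AatI', 'AGGCCT'); reject anything else."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError("expected 'name sequence' separated by one space: %r" % line)
    name, sequence = parts
    return name, sequence
```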

consed.commonRestrictionEnzymes: BglII EcoRV NsiI HindIII BamHI XhoI PstI
RWCString
! a space-separated list of enzymes.  Make sure they match precisely
! those that are either defaults or in the file indicated by
! consed.fileOfAdditionalRestrictionEnzymes
! (OK)

consed.defaultSelectedRestrictionEnzymes: EcoRV HindIII
RWCString
! a space-separated list of enzymes that will initially be
! selected when the user pops open the list of restriction enzymes.
! Currently these must be from among the consed.commonRestrictionEnzymes
! (OK)

consed.restrictionEnzymesActualFragmentsFile: fragSizes.txt
FileName
! format like this:
! >EcoRV
! 2385
! 2489
! -1
! >XhoIII
! 259
! 3843
! -1
! (OK)
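
! A sketch of reading the fragment sizes file shown above.  It assumes
! that each >enzyme line starts a list of sizes and that -1 ends that
! list, which is inferred from the example only.  The function name is
! hypothetical.

```python
def parse_fragment_sizes(text):
    """Return {enzyme_name: [fragment sizes]} from the format above."""
    digests = {}
    enzyme = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            enzyme = line[1:]
            digests[enzyme] = []
        elif line == "-1":
            enzyme = None  # assumed end-of-list marker
        elif enzyme is not None:
            digests[enzyme].append(int(line))
    return digests
```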

consed.restrictionDigestInitialWindowSizeInTextRows: 45
int
! (OK)

consed.restrictionDigestDoNoShowAreaOfFragmentsOverThisSize: 50000
int
! In the picture of the real and in-silico digests, do not show the
! area of fragments over this size
! (OK)

consed.showReadsAlphabetically: false
bool
! (OK)

consed.showReadsInAlignedReadsWindowOrderedByFile: false
bool
! There are now 3 different ways to sort the reads in the Aligned
! Reads Window (top to bottom):  
! 1) alphabetically in which case you should set:
!    consed.showReadsAlphabetically: true    
!    consed.showReadsInAlignedReadsWindowOrderedByFile: false
! 2) by the left end of the reads in which case you should set:
!    consed.showReadsAlphabetically: false
!    consed.showReadsInAlignedReadsWindowOrderedByFile: false
! 3) by a file that specifies the order of the reads in which case you
!    should set:
!    consed.showReadsAlphabetically: false
!    consed.showReadsInAlignedReadsWindowOrderedByFile: true
! It is an error to set:            
!    consed.showReadsAlphabetically: true
!    consed.showReadsInAlignedReadsWindowOrderedByFile: true
! (OK)
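
! The three valid combinations and the one invalid combination can be
! summarized in a few lines.  This is an illustration of the rule
! above with hypothetical names, not consed's own check.

```python
def read_sort_mode(show_reads_alphabetically, ordered_by_file):
    """Map the two boolean resources to one of the three sort orders."""
    if show_reads_alphabetically and ordered_by_file:
        raise ValueError("both resources set to true is an error")
    if show_reads_alphabetically:
        return "alphabetical"
    if ordered_by_file:
        return "file"
    return "left end"
```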

consed.showReadsInAlignedReadsWindowOrderedByThisFile: readOrder.txt
FileName
! This file has one read name per line.  Wildcards ('*') are allowed.
! E.g.,
! ABX*
! myFavoriteRead.scf
! *.abi
! This means that all reads that start with ABX will come first,
! followed by the single read myFavoriteRead.scf and then reads that end 
! with .abi.  A read that doesn't meet any of these criteria (e.g.,
! rs10469282) comes last.
! (OK)
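
! The ordering rule can be sketched as below: each read is ranked by
! the first line of the file that it matches, and unmatched reads go
! last.  This uses Python's fnmatch for the '*' wildcard, which also
! treats '?' and '[...]' specially; that goes beyond what is stated
! above.  The order of reads matching the same line is an assumption
! (here, input order is kept).

```python
from fnmatch import fnmatchcase


def order_reads(read_names, patterns):
    """Sort reads by the index of the first pattern each one matches."""
    def rank(read_name):
        for index, pattern in enumerate(patterns):
            if fnmatchcase(read_name, pattern):
                return index
        return len(patterns)  # no pattern matched: comes last

    # sorted() is stable, so ties keep their original relative order.
    return sorted(read_names, key=rank)
```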

!consed.showReadsSortedByQualityValuesAtCursor: true
!bool
! deprecated
! Note that currently this only applies when the cursor is set
! on the consensus position.  When scrolling, the reads are
! sorted according to consed.showReadsAlphabetically and
! consed.showReadsInAlignedReadsWindowOrderedByFile
! (OK)

consed.showReadsAtCursorSortedHow: none
RWCString
! Note that currently this only applies when the cursor is set
! on the consensus position.  When scrolling, the reads are
! sorted according to consed.showReadsAlphabetically and
! consed.showReadsInAlignedReadsWindowOrderedByFile
! use the following values:  quality, base, discrepantBase, none, bamThenBase
! (OK)

consed.showABIBasesInTraceWindow: false
bool
! (OK)

consed.tracesWindowInitialPixelHeight: 50
int
! (OK)

consed.assemblyViewWindowInitialPixelHeight: 500
int
! (OK)


consed.assemblyViewFileOfTemplatesToNotShow: doNotShowInAssemblyView.fof
FileName
! (OK)

consed.assemblyViewCrossMatchMinmatch: 30
int
! value of -minmatch to be passed to crossmatch
! (OK)

consed.assemblyViewCrossMatchMinscore: 60
int
! value of -minscore to be passed to crossmatch
! (OK)

consed.assemblyViewFindSequenceMatchesForConsedScript: $CONSED_HOME/bin/findSequenceMatchesForConsed.perl
FileName
! script that generates the file that is used by Assembly View to
! show sequence matches    
! (OK)

consed.assemblyViewCrossmatchMinmatch: 50
int
! default value of -minmatch for running crossmatch with 
! findSequenceMatchesForConsed.perl
! (OK)

consed.assemblyViewCrossmatchMinscore: 50
int
! default value of -minscore for running crossmatch with
! findSequenceMatchesForConsed.perl
! (OK)

consed.assemblyViewAutomaticallyRunCrossmatchIfNecessary: false
bool
! (OK)

consed.assemblyViewSequenceMatchesMinimumSimilarity: 90
int
! only show sequence matches if their similarity is at least this
! value.  This can be changed by the user within consed/assembly view/
! by clicking on "What to show/Sequence Matches"
! (OK)

consed.assemblyViewNumberOfRowsOfTags: 4
int
! (OK)

consed.assemblyViewWindowInitialPixelWidth: 800
int
! (OK)

consed.assemblyViewRestartAfterRemoveHighlightedReads: true
bool
! In the "Put Reads Into Their Own Contigs" Window, after clicking
! "Remove Highlighted Reads", should assembly view be restarted?
! (OK)

consed.assemblyViewRestartAfterJoin: true
bool
! (OK)

consed.assemblyViewRestartAfterTear: true
bool
! (OK)

consed.assemblyViewRestartAfterMiniassembly: true
bool
! (OK)

consed.tracesWindowInitialPixelWidth: 800
int
! (OK)

consed.automaticallyScaleTraces: true
bool
! (OK)

consed.automaticallyScaleTracesSamplePeakHeightFractionOfWindowHeight: 0.99
double
! (OK)

consed.automaticallyScaleTracesSamplePeakPercentile: 100
int
! (OK)

consed.createFakeTracesWhenNeeded: true
bool
! if consed cannot find traces, such as when the reads are solexa
! reads but the user has not specified CHEM: solexa, go ahead and 
! create a fake trace so user can edit the read
! (OK)

consed.verticalTraceMagnification: 30
int
! (OK)

consed.storeTracePeakPositions: whenChromatAvailable
RWCString
! never, always or whenChromatAvailable
! changing this to "always" increases memory usage by close to 100%,
! especially if the assembler is Newbler.  always means it will store
! the positions if they are present in the phd file/ball.
! "whenChromatAvailable" means it will store trace peak positions for
! a read when there is a chromat in chromat_dir.  (Note:  these just
! apply to phd balls.  peak positions in phd files are always stored.)  
! (OK)

consed.userDefinedKeys: 14 15
RWCString
! make a space-separated list of the decimal ASCII values of the keys
! 14 means control-N, 15 means control-O
! (OK)

consed.programsForUserDefinedKeys: /bin/echo /bin/echo
RWCString
! a space-separated list of the full pathnames of the commands to run
! This goes with consed.userDefinedKeys
! (OK)

consed.argumentsToPassToUserDefinedPrograms: argument_for_first_key argument_for_second_key
RWCString
! a space-separated list of the arguments to pass to the user-defined programs
! This goes with consed.userDefinedKeys
! (OK)

consed.tagsToApplyWithUserDefinedKeys: none polymorphismConfirmed
RWCString
! a space-separated list of the tag types to apply when the user
! presses a user-defined key.  If a key is to have no associated tag,
! then enter "none" for that key.
! This goes with consed.userDefinedKeys
! (OK)
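
! The four user-defined-key resources are parallel lists.  The sketch
! below shows how the decimal ASCII value of a control key is obtained
! and how the lists line up, assuming they are paired by position
! (first key with first program, first argument, and first tag).  The
! function names are hypothetical.

```python
def control_key_code(letter):
    """Decimal ASCII value of control-<letter>, e.g. control-N."""
    return ord(letter.upper()) & 0x1F


def user_key_table(keys, programs, arguments, tags):
    """Pair up the four space-separated lists position by position."""
    columns = [keys.split(), programs.split(), arguments.split(), tags.split()]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("the four lists must have the same number of entries")
    table = {}
    for key, program, argument, tag in zip(*columns):
        table[int(key)] = {
            "program": program,
            "argument": argument,
            "tag": None if tag == "none" else tag,
        }
    return table
```

! With the defaults shown above, key 14 (control-N) would run /bin/echo
! with no tag, and key 15 (control-O) would run /bin/echo and apply a
! polymorphismConfirmed tag.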

consed.snpGenomeUseInsertionPolymorphisms: true
bool
! used with consed -snpGenome
! (OK)

consed.listOfTagTypesToHide: matchElsewhereHighQual matchElsewhereLowQual
RWCString
! (OK)

consed.listOfOptionalWordsToSaveInListOfReadNames: forward reverse ET BigDye customOligo SeqEx FS dyePrimer dyeTerminator doubleStranded singleStranded
RWCString
! (OK)

consed.extendConsensusWithHighQuality: false
bool
! When using "change consensus" to extend the consensus, make the
! read edited high quality.  This will cause phrap, the next time
! the project is assembled, to similarly extend the consensus.  If
! this is set to false, then do not change the quality of the read and 
! extend the consensus with the original read qualities.  
! (OK)

consed.fastStartup: true
bool
! If you have used catPhdFiles.perl to create a huge file with all the
! xxx.phd.1 files, and you have enough memory on your computer, then
! you can start up consed up to 7 times faster
! (OK)

consed.fastStartupFile: phd.ball
FileName
! If you have used catPhdFiles.perl to create a huge file with all the
! xxx.phd.1 files, and you have enough memory on your computer, then
! you can start up consed up to 7 times faster.  This file gives
! the name of the huge file.
! (OK)

consed.alwaysRunProgramToGetChromats: false
RWCString
! This allows consed to get chromats out of a database, or do some
! other pre-processing of a chromat before reading it. If set to true,
! consed does not look in ../chromat_dir at all for the chromat, but
! rather runs the program listed in consed.programToRunToGetChromats
! with argument name-of-read and then reads the chromat out of
! consed.uncompressedChromatDirectory and then later deletes the
! chromat from consed.uncompressedChromatDirectory
! If set to false, it never does this.  If set to "last", it does
! this as a last resort.
! (OK)

consed.programToRunToGetChromats: /usr/local/bin/myFavoriteProgram
FileName
! Set this to the program or script that you want to use to
! get a chromat and put it into /tmp (or whatever you set
! consed.uncompressedChromatDirectory to)
! (OK)

consed.programToRunToGetChromatsOf454Reads: $CONSED_HOME/bin/sff2scf
FileName
! This will be run on 454 reads if the read is not found in
! chromat_dir.  If you don't want this to happen, you can make 
! this null.
! (OK)

consed.createFakeChromatsForSolexaReads: true
bool
! (OK)

consed.autoFinishUseLongModelReadRatherThanShort: false
bool
! When calculating the distribution of quality values at high read 
! positions, should Autofinish assume that the reads that were this
! long and longer are representative of finishing reads, or should it
! assume that some finishing reads will not make it out this far, in
! roughly the same proportion as the existing reads?
! (OK)


consed.askAgainIfWantToQuitConsedIfThisManyReads: 500000
int
! If you have to wait a long time for consed to come up, don't
! quit out of consed by mistake.
! (OK)

consed.printWindowInstructions: Make sure that the window you want to print is unobscured.  Then click \"Yes\" to dismiss this box.  Then click on the window you want to print.  You will hear a beep immediately, then another beep a little later.  Then the copy of the window should come off the printer specified by your environment variable LPDEST.
RWCString
! (OK)

consed.allowMultipleSearchForStringWindows: false
bool
! If this is false, and there is already a SearchForString Window up,
! and the user clicks on SearchForString, it will be brought to the 
! front, rather than another one being created.        
! (OK)

consed.autoPCRAmplifyFalseProductsOKIfLargerThanThis: 3000
int
! If a pcr primer pair matches somewhere else and creates a product
! larger than this, the pcr primer pair will still be acceptable
! since the product will not easily form in the cycle time.
! (OK)

consed.autoPCRAmplifyMakePrimerOutOfFirstRegion: false
bool
! I don't expect people will use this.  It allows you to amplify a
! region using autoPCRAmplify not by allowing Consed to choose each
! primer (the normal case) but rather by fixing the first primer to be
! the first area bordering the region.  I added this to allow
! non-specific priming to the transplice leader.
! (OK)

consed.autoPCRAmplifyMaybeRejectPrimerIfThisCloseToDesiredProduct: 5000
int
! --->    --->
! false   true match
! In such a case, the primer pair will be rejected if the false is
! within 5000 bases of true, even if false is a false match of the
! other primer.
!
! <---   --->
! false  true match
! In this case, the primer pair will not be eliminated.
! (OK)


consed.addNewReadsRecalculateConsensusQuality: false
bool
! When running consed by 
! consed -ace old_ace.ace -addReads fileOfPhdFiles.txt -newAceFilename new_ace.ace
! consensus quality is recalculated
! This also applies to add454Reads.perl and addSolexaReads.perl
! (OK)

consed.addNewReadsPutReadIntoItsOwnContig: rules
RWCString
! choices are: 
! "always" (just put each Sanger read into its own contig.  Works
!         with consed -addNewReads.  Next-Gen reads that are aligned
!         will be put into the contigs where they align).
! "ifUnaligned" (put read into a contig if it aligns against the
!    consensus; otherwise, if it is Sanger, put it into its own contig,
!    and if it is Next-Gen and there is a read list, put it into its own
!    contig, but if there is no read list, ignore the read).
! "never" (put read into a contig if it aligns against the consensus;
!    otherwise do not put it into the assembly) 
! (OK)

consed.addNewReadsCheckThatCrossMatchRunCorrectly: true
bool
! addReads2Consed.perl changed in March 2008 to have -discrep_lists
! instead of -alignments.  Check that user is using the new 
! parameters
! (OK)  

consed.addNewReads2CrossMatchOptions: -masklevel 0 -minscore 25 -gap1_only -repeat_screen 2
RWCString
! (OK)

consed.addNewReadsCleanUpTemporaryFiles: true
bool
! (OK)

consed.warnUserWhenTryingToEditAllReads: true
bool
! in the Aligned Reads Window, the user may change all reads at once
! to a particular base at a particular position.  This is dangerous
! and the user is warned.  This resource allows the user to suppress
! this warning.
! (OK)

consed.maybeXKEYSYMDBPath: /usr/share/X11/XKeysymDB
FileName
! Fixes a problem in X on some versions of linux giving pages of the 
! following errors:
! Warning: translation table syntax error: Unknown keysym name:  osfActivate
! (OK)

consed.maybeXKEYSYMDBPath2: /usr/X11R6/lib/X11/XKeysymDB
FileName
! Fixes a problem in X on some versions of linux giving pages of the 
! following errors:
! Warning: translation table syntax error: Unknown keysym name:  osfActivate
! (OK)


consed.amountToMoveWithBigLeftAndRightArrows: 10
int
! allows user to move on a read in the Aligned Reads Window
! by more than 1 base at a time
! (OK)

consed.navigateByHighlyDiscrepantPositionsMinDiscrepantReads: 2
int
! ignores low quality reads
! (OK)

consed.navigateByHighlyDiscrepantPositionsMaxDepthOfCoverage: 100000
int
! (OK)

consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality: 20
int
! (OK)

consed.navigateByHighlyDiscrepantPositionsJustListIndels: false
bool
! (OK)

consed.navigateByHighlyDiscrepantPositionsIgnoreOtherReadsStartingAtSameLocation: false
bool
! If there are, for example, 3 reads that all start at the same
! location, use only the first and ignore the second and third
! (OK)

consed.navigateByHighlyDiscrepantPositionsIgnoreIfListedBasesInConsensus: false
bool
! Do not report this position if there is one of the bases in the consensus
! listed in 
! consed.navigateByHighlyDiscrepantPositionsIgnoreIfTheseBasesInConsensus
! (OK)

consed.navigateByHighlyDiscrepantPositionsIgnoreIfTheseBasesInConsensus: xn
RWCString
! Do not report this position if there is one of these bases in the 
! consensus and if
! consed.navigateByHighlyDiscrepantPositionsIgnoreIfListedBasesInConsensus:
! is set to true
! (OK)

consed.navigateByHighlyDiscrepantPositionsStopOnlyOnceAtAnIndel: true
bool
! If this is false, and there is a long indel, the navigate will stop
! once at each position within the indel.  If this is true, it will only
! stop once at the beginning of the indel.  The danger of this
! behavior is that it will *not* stop at an SNP within the indel.
! (OK)

consed.navigateByQuestionableConsensusBasesIncludePads: true
bool
! by changing this to false, you can only look at substitution
! type errors in the consensus
! (OK)

consed.navigateByQuestionableConsensusBasesMinQuality: 0
int
! sum of quality values only uses bases of this quality and
! greater
! (OK)

consed.phdBallDirectory: ../phdball_dir
FileName
! phd balls are assumed to be in here.  This is typically where consed
! starts, but could be relative to that, such as ../phdball_dir    
! (OK)

consed.newAceFileFOF: newAceFile.fof
FileName
! if consed needs to write a new ace file, the name of that is written
! to this file
! (OK)

consed.navigateByHighOrLowDepthCoalesceRegionsIfThisClose: 50
int
! for navigate by high or low depth of coverage
! (OK)

// superseded by consed.removeReadsWhatToDoWithReads (4/2013)
//consed.removeReadsDeleteNotJustPutInOwnContig: true
//bool
// ! used for consed -removeReads and consed -removeContigs
// ! (OK)

consed.removeReadsWhatToDoWithReads: removeTogether
RWCString
! options are:
! removeTogether: overlapping removed reads are removed together
!    keeping the alignment between them
! delete:  reads are deleted from the assembly
! eachIntoOwnContig: a new contig is made from each read
! (OK)

consed.removeReadsMakeCustomNavigationFileWhereConsensusRecalculated: false
bool
! (OK)

consed.removeReadsWhatToDoIfZeroDepthRegions: break
RWCString
! When removing reads, what should happen if removing a read causes a
! contig to have a location that is zero depth of coverage.  Options
! are: a) break (to break the contig into several new contigs that
! have nonzero depth of coverage), b) nobreak (to leave the contig in
! one piece with a new 0 depth of coverage region)
! (OK)
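
! The "break" option can be pictured with the sketch below, which
! finds the stretches of a contig that still have nonzero depth of
! coverage after reads are removed.  It is an illustration only: the
! function name and the per-position depth list are assumptions, and
! positions are 0-based list indexes.

```python
def nonzero_depth_segments(depth):
    """Return (first, last) index pairs for each run of nonzero depth."""
    segments = []
    start = None
    for index, value in enumerate(depth):
        if value > 0 and start is None:
            start = index
        elif value == 0 and start is not None:
            segments.append((start, index - 1))
            start = None
    if start is not None:
        segments.append((start, len(depth) - 1))
    return segments
```

! With "break", each returned segment would correspond to a new contig;
! with "nobreak", the contig would stay in one piece with the zero
! depth stretch left in place.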

consed.removeReadsRecalculateConsensus: true
bool
! when using consed -removeReads, use this to determine whether
! to recalculate the consensus bases.  When using gui, ask user.
! If you will be allowing contigs to break, 
! the consensus will be recalculated regardless of the setting of
! consed.removeReadsRecalculateConsensus
! (OK)

consed.removeReadsOKToAskUser: true
bool
! if you don't want to be asked questions when removing reads
! including whether to allow the contig to break at 0-depth of
! coverage locations and whether to recalculate the consensus where
! reads are removed, set this to false and the resources
! consed.removeReadsWhatToDoIfZeroDepthRegions and
! consed.removeReadsRecalculateConsensus will be used instead of
! asking the user.  However, if you will be allowing contigs to break, 
! the consensus will be recalculated regardless of the setting of
! consed.removeReadsRecalculateConsensus
! (OK)

consed.removeReadsDoNotRecalculateEditedBases: true
bool
! When a read is removed, edited bases (quality 98 or 99) should
! generally not be recalculated
! (OK)  

consed.removeReadsWhatToDoWithUnalignedReads: allIntoOneContig
RWCString
! options are allIntoOneContig or eachIntoOwnContig
! allIntoOneContig only applies when specifying that contigs
! will be broken apart where there are no reads, i.e.,
! consed.removeReadsWhatToDoIfZeroDepthRegions: break
! (OK)

consed.paired454LeftReadExtension: _left
RWCString
! (OK)

consed.paired454RightReadExtension: _right
RWCString
! (OK)

consed.snpGenome1MSnps: snp1M.txt
FileName
! file for development of snpGenome
! (OK)

consed.diffChromosomesExcludeDeletions: false
bool
! for testing snpGenome moving deletions
! (OK)

consed.snpGenomeFilterByWeight: true
bool
! if true, only considers polymorphisms with soWeight == "1"
! (OK)

consed.wantReadsUpToThisFarFromSnps: 50
int
! for phaster2PhdBall (phaster2Ace.perl ) to take reads, even if they don't overlap the
! snp, that are this far away from the snp
! (OK)

consed.phaster2PhdBallSaveWhichMate: both
RWCString
! for phaster2PhdBall (phaster2Ace.perl) to determine whether
! just the reads that intersect the snp are saved, or whether
! both mates of a read pair are saved if either one
! intersects the snp location and has one of the desired alleles.
! Alternative: unmapped, which says that just the unmapped read is
! saved
! (OK)

consed.phaster2PhdBallSaveInPhasterFormat: false
bool
! for phaster2PhdBall::maybeSaveBothReads to save the read in
! phaster format rather than in phd format.  This only applies
! if consed.phaster2PhdBallSaveBothMates: is set to true
! the phaster lines will be set to the file specified by
! -phdBall on the command line
! (OK)

consed.phaster2PhdBallCalculateNewLocationsFile: false
bool
! for phaster2PhdBall.  Calculates depth of coverage and
! makes new locations that have depth of coverage   
! (OK)

consed.phaster2PhdBallSinglePhdBall: false
bool
! by default, phaster2PhdBall will make a different phdball
! for each desired location.  This will make it make a single
! phdball for all locations.
! (OK)

consed.phaster2PhdBallTesting: false
bool
! Just during testing.
! (NO)

consed.phdBall2FastaIgnoreLowQualityReads: false
bool
! for consed -phdBall2Fasta 
! if a read has mean quality below a threshold, do not
! write it to the fasta file
! (OK)

consed.phdBall2FastaLowestAverageQuality: 25
int
! for consed -phdBall2Fasta
! if read has mean quality below this, ignore it
! (OK)
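
! Together, the two phdBall2Fasta resources describe a simple filter,
! sketched below.  The function name is hypothetical, and the handling
! of a read with no quality values (kept) is an assumption.

```python
def keep_read(qualities, ignore_low_quality_reads=False, lowest_average_quality=25):
    """True if the read should be written to the fasta file."""
    if not ignore_low_quality_reads or not qualities:
        return True
    mean_quality = sum(qualities) / len(qualities)
    # Reads with mean quality below the threshold are ignored.
    return mean_quality >= lowest_average_quality
```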

consed.nextPhredPipelineControlFile: control-file.txt
FileName
! (OK)

consed.nextPhredPipelineTiffPerlScript: bin/run_tiff2intens_1_tile.perl
FileName
! (OK)

consed.nextPhredPipelinePhasterPerlScript: bin/run_phaster_1_tile.perl
FileName
! (OK)

consed.nextPhredPipelineVersion: 100625
RWCString
! (OK)

consed.nextPhredPipelineMainDirectory: /et/grc/vol3/np_testing/pipeline
RWCString
! (OK)

consed.shallowerDepthOfCoverageHistogramInterval: 25
int
! (OK)

consed.shallowerDepthOfCoverageHighQuality: 20
int
! lowest quality that counts as high when determining if a
! discrepant base is serious
! (OK)

consed.shallowerDepthIncludeMates: true
bool
! if a read is kept, keep its mate as well (even if its
! mate is not needed for any other reason)  
! (OK)

consed.shallowerDepthTargetDepth: 20
int
! try to make depth this low (but generally will be higher)
! (OK)

consed.shallowerDepthMinAlleleFraction: 0.2
double
! if there is an allele whose fraction of reads is less than this,
! don't bother trying to save any reads with this allele
! (OK)

consed.shallowerDepthBasesToTrimOnReadEnds: 5
int
! Discrepancies at the ends of reads are often due to alignment
! errors and do not reflect real variants.  
! (NO)

consed.printAllResources: false
bool
! don't print all resources
! (YES)

consed.autoSaveAceFile: autoSave.ace
FileName
! this is just the root.  Each subsequent save makes autoSave.ace.1
! autoSave.ace.2 autoSave.ace.3 etc.
! (OK)

consed.autoSaveBeforeMiniassembly: false
bool
! Just before running miniassembly, autosave the assembly
! (OK)

consed.bam2AceTerminateIfTooManyReads: true
bool
! If this is set to true and the number of reads found
! by bam2Ace is more than consed.bam2AceMaximumReads, 
! terminate with an error message.
! (YES)

consed.bam2AceMaximumReads: 1000000
int
! If there are going to be more than this # of reads
! and consed.bam2AceTerminateIfTooManyReads: true
! then terminate with an error message.
! (OK)

consed.bam2AceShallowerDepth: true
bool
! If false, takes all reads that overlap region.
! (OK)

consed.bam2AceShallowerDepthWhenBamScapeCallsBam2Ace: false
bool
! When bamScape brings up consed, bam2Ace first runs.  Should
! it be run with shallowerDepth, or should all of the reads be
! in consed?
! (OK)

consed.bam2AceCoalesceRegionsIfThisClose: 2000
int
! It is important that the same read not be included in more than one
! region.  To this end, if 2 regions are sufficiently close that a read
! might overlap both regions, then the 2 regions should be coalesced so
! that the read will overlap just one region (the new coalesced one).
! (OK)
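
The coalescing rule described for consed.bam2AceCoalesceRegionsIfThisClose
can be illustrated with a minimal sketch.  This is not bam2Ace's own code:
the function name, the (start, end) tuple representation, the edge-to-edge
measurement of closeness, and the inclusive comparison are all assumptions
made for illustration.

```python
def coalesce_regions(regions, max_gap=2000):
    """Merge (start, end) regions on one reference sequence when the gap
    between them is max_gap bases or less (assumed interpretation)."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough that one read might overlap both: coalesce.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical regions; the first two are 1500 bases apart.
regions = [(1000, 5000), (6500, 9000), (20000, 25000)]
print(coalesce_regions(regions))
```

With the default of 2000, the first two hypothetical regions become one
region and the third is left alone, so no read can fall into two regions.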

consed.bam2AcePrintEachReadToLogFile: true
bool
! This creates a huge amount of output so you might want to turn it off.
! (YES)


consed.bam2AceWarnUsersAfterThisNumberOfReads: 10000000
int
! If this is unlimited, then there can be problems with too much
! memory usage, too long a startup time, and difficulty
! reassembling the reads
! (OK)

consed.okToUseShortInsertTemplatesToLinkContigs: true
bool
! Why not use short insert fwd/rev pairs to link contigs?  This is
! important for using solexa fwd/rev pairs to link contigs.
! (OK)

consed.bam2AceSubregionSize: 10000
int
! prevents running out of memory
! (NO)

consed.bam2AceSubregionOverlapSize: 1000
int
! solves problem of reads over boundary 
! between subregions
! (NO)

consed.bam2AceMaxQualityLevel: 60
int
! How many quality levels are used for scoring the quality of reads
! (NO)

consed.bam2AceCleanUpTemporaryFiles: true
bool
! (OK)

consed.bam2AceJustTrimming: false
bool
! for development: uses trimmed ends of reads to indicate an insertion
! (OK)

consed.bam2AceInsertionOnlyWhenThisManyReadsHaveClippedBases: 3
int
! This allows insertions to not be created for low depth regions
! (OK)

consed.bam2AceInsertionOnlyWhenThisPerCentOfReadsHaveClippedBases: 30.0
double
! (OK)

consed.fullPathnameOfBam2AceScript: $CONSED_HOME/bin/bam2Ace.perl
FileName
! used when bamScape brings up consed
! (OK)


// removed so that window is sized to fit everything in
//!consed.bvReadsInReferenceWindowHeight: -1
//!int
//! (OK)

consed.BVReferenceNameIsJustFirstWord: false
bool
! e.g., if the fasta file looks like this:
! >gi|512322299|gb|CM001630.2| Homo sapiens chromosome 22, whole genome shotgun sequence
! should the reference sequence name be that entire header or just
! gi|512322299|gb|CM001630.2|
! (OK)


consed.BVCleanUpAfterStartingConsed: true
bool
! cleans up .out .regions and .bamfof files
! (OK)

consed.BVReadsInReferenceWindowWidth: 800
int
! (OK)

consed.BVHighQualityDiscrepancyMinimum: 30
int
! Used for discrepancy graph.
! (OK)

consed.BVSubregionSizeForDiscrepancies: 100000
int
! (NO)

consed.okToAskIfSureWhenOverstrikingConsensus: true
bool
! (OK)

consed.smallFont: -misc-fixed-*-r-*--*-*-*-*-*-*-iso8859-*
RWCString
! (NO)

consed.smallFontSize: 10
int
! (NO)

consed.customTrackHeight: 50
int
! (OK)

consed.BVFindProblemsTooHighDepthOfCoverage: true
bool
! (OK)

consed.BVFindProblemsTooLowDepthOfCoverage: false
bool
! (OK)

consed.BVFindProblemsDepthOfCoverageAboveThisNumber: 100
int
! (OK)

consed.BVFindProblemsDepthOfCoverageBelowThisNumber: 5
int
! (OK)

consed.BVFindProblemsInconsistentReads: true
bool
! (OK)

consed.BVFindProblemsPerCentInconsistentReadsAboveThisNumber: 20.0
double
! (OK)

consed.BVFindProblemsNumberOfInconsistentReadsAboveThisNumber: 6
int
! (OK)

consed.BVFindProblemsDiscrepancyRate: true
bool
! (OK)

consed.BVFindProblemsDiscrepancyRateIsAboveThisNumber: 30
int
! in per cent
! (OK)

consed.BVFindProblemsDiscrepancyNumberOfReadsIsAtLeastThisNumber: 4
int
! # of reads of all the same allele
! (OK)

consed.BVFindProblemsDiscrepancyIgnoreSoftTrimmed: false
bool
! when calculating discrepancies, ignore regions of a read that
! are soft trimmed
! (OK)


consed.BVFindProblemsNumberOfDiscrepantSitesInAWindow: 1
int
! if there are this many (or more) discrepant sites in a window,
! consider this a problem
! (OK)  


consed.BVFindProblemsWindowSizeForDiscrepancies: 25
int
! used in conjunction with consed.BVFindProblemsNumberOfDiscrepantSitesInAWindow
! (OK)

consed.BVFindProblemsIgnoreReadsWithAverageBaseQualityTooLow: true
bool
! (OK)

consed.BVFindProblemsIgnoreReadsWithAverageBaseQualityBelowThisNumber: 20
int
! (OK)

consed.BVFindProblemsClusteredInconsistentMatePairsToSameReferenceSequence: false
bool
! Pat said (currently) not interested in small misassemblies--just ones
! between contigs
! (OK)

consed.BVFindProblemsClusteredInconsistentMatePairsToOtherReferenceSequences: true
bool
! This reports clusters of inconsistent read pairs that all go from a
! location on one reference sequence to a location on a different
! reference sequence.  Such clusters likely indicate a misassembly.
! (OK)

consed.BVFindProblemsClusteredInconsistentMatePairsMateUnmapped: true
bool
! This reports clusters of inconsistent read pairs that all go from a
! location on one reference sequence and the mates are all unmapped.
! Such clusters likely indicate sequence missing from the reference
! sequence.
! (OK)

consed.BVFindProblemsClusteredInconsistentMatePairsToOtherReferenceSequencesMinCount: 5
int
! This needs to be changed based on the read depth
! (YES)

consed.BVRewriteReferenceFOF: rewriteReference.fof
FileName
! By default, this is assumed to be in the same directory as bamScape
! (and consed -rewriteReference).  It contains the ace files to be used
! by rewriteReference to modify the reference sequence(s).
! (OK)

consed.BVFindProblemsClusteredInconsistentMatePairsWindowSize: 10000
int
! If A1, A2 are a pair, and B1, B2 are a pair, A1 and B1 must be this
! close or closer, and A2-B2 must be this close or closer so the
! 2 mate pairs are considered a cluster.  You might want to change 
! this number depending on your sequencing technology.
! (YES)
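
The cluster test described for
consed.BVFindProblemsClusteredInconsistentMatePairsWindowSize can be
written out as a short sketch.  The function name and the representation
of a mate pair as a (read 1 position, read 2 position) tuple are
hypothetical, and the sketch assumes that A1 and B1 lie on one reference
sequence and A2 and B2 on one reference sequence.

```python
def same_cluster(pair_a, pair_b, window=10000):
    """Return True if two mate pairs are close enough to be one cluster:
    A1 and B1 within window bases, and A2 and B2 within window bases."""
    a1, a2 = pair_a
    b1, b2 = pair_b
    return abs(a1 - b1) <= window and abs(a2 - b2) <= window


# Hypothetical positions: read 1 ends are 3000 apart, read 2 ends 7000 apart.
print(same_cluster((1000, 500000), (4000, 507000)))
# Here the read 2 ends are 20000 apart, so the pairs do not cluster.
print(same_cluster((1000, 500000), (4000, 520000)))
```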

consed.BVFindProblemsPrintInfoForClusterView: false
bool
! For development.  Allows printing of the information to be displayed
! in cluster view
! (OK)

consed.BVSwipedRegionHideClustersOfInconsistentMatesOnSameReferenceSequence: true
bool
! There are often many clustered mates on the same reference sequence
! and some people are less concerned about this type of possible
! misassembly than that in which the cluster is on a different
! reference sequence.  Thus they can hide the ones on the same reference
! sequence.
! (OK)

consed.BVInconsistentReadDepthJustToOtherContigs: true
bool
! In Reads Aligned to Reference Window, show clustered inconsistent
! read depth of mate pairs just between different contigs, or should
! it also include mate pairs within the same contig?
! (OK)

consed.BVShowDiscrepantReadsPane: true
bool
! (OK)

consed.BVPixelsPerInconsistentReadDepth: 4
double
! what to multiply the inconsistent read depth by to get the vertical
! pixel height of the graph at a point
! (OK)

consed.BVShowKeysInReadsVsReferenceWindow: true
bool
! Pat doesn't like the keys, so he can remove them.
! (OK)

consed.bionanoMinimumXmapConfidence: 5.0
double
! According to Bionano, ignore confidence values below this.
! (NO)

consed.bionanoPointerPixelTolerance: 4
int
! When mousing over a vertical line, how far away can the pointer be?
! (OK)


consed.inconsistentMatePairIfPointingOut: false
bool
! if true, then <- -> is considered inconsistent
! (OK)

consed.inconsistentMatePairIfPointingIn: false
bool
! if true, then -> <- is considered inconsistent
! (OK)


!
!
!
! parameters in the (NO) category
!
!
consed.maybeConvertSlashesToDotsInPhdBallReadNames: false
bool
! Used to fix a few assemblies that were created in August 2013
! (NO)


consed.calculateGenotypesMinimumBaseQuality: 10
int
! For snp program.  Will ignore reads with this quality or below.
! The algorithm will not work with high error probabilities.
! (NO)

consed.calculateGenotypesF: 0.85
double
! In Li and Durbin's model, this is a measure of the dependence of
! reads on the same strand.  I found a big difference between 0.85 and
! 0.90--some datasets even give different genotypes.
! (NO)

consed.calculateGenotypesMaxDepthOfCoverage: 400
int
! Li and Durbin's model using large N and K causes underflow or
! overflow with too large values of N and K and thus I take a sample
! of all of the reads of this size.
! (NO)

consed.maxNumberOfReadsPerPhdBall: 1000000
int
! This is important since cross_match slows down on fasta files of
! over a few million reads                                   
! (NO)

consed.userWantsToSaveToThisAceFile:
FileName
! (NO)

consed.autoFinishEmulate9_66Behavior: false
bool
! Picks univ primer reads and walks in the same phase.  This results
! in poor redundancy of universal primer reads, may pick custom primer
! reads over universal primer reads, but may pick fewer
! reads overall.
! (NO)

consed.primersPCRPrimersGroupedIntoWindowOfThisManyBases: 200
int
! to speed up PCR primer picking and to reduce the number of 
! PCR primer pairs, group primers into windows of this size
! and then just compare window against window
! (NO)

consed.primersLookForThisManyPCRPrimerPairsPerPairOfGroups: 2
int
! to speed up PCR primer picking and to reduce the number of PCR
! primer pairs, group primers into windows and then just accept
! this many primer pairs from a pair of groups
! (NO)

consed.primersMaxInsertSizeHowManyStandardDeviationsAboveMean: 2.5
double
! This is used to set nMaxInsertSize_ for the library
! (NO)

consed.primersMaxInsertSizeCalculationDiscardOutliersThisManyStandardDeviationsAboveMean: 3.0
double
! Mean and standard deviation are calculated.  Then any pairs this far
! above the mean are discarded and the mean and standard deviation are
! calculated again.
! (NO)
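
The two-pass calculation described above can be sketched as follows.
This is only an illustration under stated assumptions: the function name
is hypothetical, the population standard deviation is used (the sample
standard deviation may be what consed uses), and the final step of
taking the mean plus
consed.primersMaxInsertSizeHowManyStandardDeviationsAboveMean standard
deviations as the maximum insert size is inferred from the resource
name, not stated in this document.

```python
from statistics import mean, pstdev


def estimate_max_insert_size(sizes, discard_sd=3.0, max_sd=2.5):
    """Return (mean, std_dev, max_insert_size) after one round of
    discarding pairs more than discard_sd deviations above the mean."""
    first_mean = mean(sizes)
    first_sd = pstdev(sizes)
    cutoff = first_mean + discard_sd * first_sd
    # Discard outliers, then calculate mean and deviation again.
    kept = [s for s in sizes if s <= cutoff]
    final_mean = mean(kept)
    final_sd = pstdev(kept)
    return final_mean, final_sd, final_mean + max_sd * final_sd


# Hypothetical library: 21 ordinary pairs and one chimeric-looking pair.
sizes = [2900, 3000, 3100] * 7 + [50000]
m, d, max_size = estimate_max_insert_size(sizes)
print(m, round(d, 1), round(max_size, 1))
```

In this made-up library the 50000-base pair is discarded in the first
pass, so it does not inflate the mean or the standard deviation used for
the maximum insert size.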


consed.autoFinishStandardDeviationsFromMeanFromGapToLookForTemplatesForSuggestingEachMissingReadOfReadPairs: -1.0
double
! Only applies when consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs:
! is set to true.  If m is the mean insert size and d is the standard
! deviation and this parameter is p, then consider all templates
! within a distance m + p*d from the gap.
! (NO)
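
As a worked example of the distance m + p*d, here is a small sketch.
The function name, the library values, and the template list are
hypothetical, and treating "within a distance" as less than or equal to
is an assumption.

```python
def template_search_distance(mean_insert, std_dev, p=-1.0):
    """Distance from the gap within which templates are considered: m + p*d."""
    return mean_insert + p * std_dev


# Hypothetical library: mean insert size 3000, standard deviation 300.
limit = template_search_distance(3000, 300)

# Hypothetical templates given as (name, distance from the gap in bases).
templates = [("t1", 1200), ("t2", 2650), ("t3", 2900)]
considered = [name for name, dist in templates if dist <= limit]
print(limit, considered)
```

With the default p of -1.0 the limit is one standard deviation below the
mean insert size, so the hypothetical template t3 at 2900 bases is not
considered.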

consed.autoFinishCheckThatReadsFromTheSameTemplateAreConsistent: true
bool
! I strongly advise keeping this true.  If you change it to false, you
! are on your own.  If the forward and reverse universal primer reads
! look like this:   <---               ---->, how is autofinish going
! to even know where the template is, huh?  Leave it true!
! (NO)

consed.autoFinishDoNotAllowSubcloneCustomPrimerReadsCloseTogether: true
bool
! at higher redundancy, autofinish may pick custom primer reactions
! that are only a few bases apart on the same strand.  This parameter,
! along with
! consed.autoFinishDoNotAllowSubcloneCustomPrimerReadsCloserThanThisManyBases,
! says how far apart they can be
! (NO)

consed.autoFinishDoNotAllowWholeCloneCustomPrimerReadsCloseTogether: true
bool
! Even at redundancy 1, Autofinish may pick whole clone reads just
! a few bases apart.  This prevents it.
! (NO)

consed.autoFinishMinilibrariesPreferTemplateIfSizeThisManyStdDevsFromMean: 2.0
double
! If a template is more than this many standard deviations from the
! mean, try to avoid using it, unless there is nothing else.
! Rationale: there is something wrong with this template--an insertion
! or deletion.
! (NO)

consed.autoFinishMinNumberOfForwardReversePairsInLibraryToCalculateAverageInsertSize: 5
int
! If there are at least this many fwd/rev pairs in a library, then 
! the mean and standard deviation are used for sizing other templates in
! the same library.  If there are fewer than this, then the default size 
! specified in the librariesInfo.txt file is used.
! (NO)

consed.autoFinishIfEnoughFwdRevPairsUseThisManyStdDevBelowMeanForInsertSize: 0.2
double
! If you are interested in walking on a template that does not have a
! forward/reverse pair, then the precise insert size is uncertain.  If
! this template comes from a library that has lots of templates with
! forward/reverse pairs, then the mean and standard deviation of the
! insert sizes from this library is known.  For the template in
! question, we could just use the mean of this library (this parameter =
! 0.0), but we could be conservative and assume the insert size is
! somewhat less.  This parameter tells how much less.
! (NO)
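
The sizing rule described by this parameter and by
consed.autoFinishMinNumberOfForwardReversePairsInLibraryToCalculateAverageInsertSize
can be sketched together.  The function and argument names are
hypothetical, library_default stands for the size given in the
librariesInfo.txt file, and the use of the population standard deviation
is an assumption.

```python
from statistics import mean, pstdev


def assumed_insert_size(pair_sizes, library_default,
                        min_pairs=5, sd_below_mean=0.2):
    """Insert size to assume for a template lacking a fwd/rev pair."""
    if len(pair_sizes) < min_pairs:
        # Too few fwd/rev pairs in the library: use the default size.
        return library_default
    # Be conservative: assume somewhat less than the library mean.
    return mean(pair_sizes) - sd_below_mean * pstdev(pair_sizes)


# Hypothetical library with five fwd/rev pairs.
print(assumed_insert_size([2800, 2900, 3000, 3100, 3200], 4000))
# Hypothetical library with only three pairs: the default is returned.
print(assumed_insert_size([2800, 3000, 3200], 4000))
```

Setting sd_below_mean to 0.0 in this sketch corresponds to simply using
the mean of the library.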

consed.autoFinishNewCustomPrimerReadThisFarFromOldCustomPrimerRead: 50
int
! this tells autofinish when it wants to make a new custom primer
! read, how far this read must be from any previous custom primer
! reads on the same strand
! (NO)

consed.autoFinishMinNumberOfSingleSubcloneBasesFixedByAnExp: 1
int
! if an experiment will fix fewer than this number of single
! subclone bases, don't do it even if the total number of single
! subclone bases in the contig is too high
! (NO)

consed.autoFinishNumberOfBasesBetweenContigsAssumed: 1000
int
! gap size--each base in the gap counts as 1 error so autofinish tries
! to extend into gaps
! (NO)

consed.autoFinishPotentialHighQualityPartOfReadStart: 80
int
// Phil and Kerrie both suggested upping this from 50 
// on March 25, 1999
! nReadUnpaddedConsPosStart + nAutoFinishPotentialHighQualityPartOfReadStart_
! == nReadUnpaddedConsPosStartOfPotentialHighQuality
! this is used to evaluate the quality of templates
! this no longer has much effect on the reads autofinish chooses
! (NO)

consed.autoFinishPotentialHighQualityPartOfReadEnd: 300
int
! nReadUnpaddedConsPosStart + nAutoFinishPotentialHighQualityPartOfReadEnd_
! == nReadUnpaddedConsPosEndOfPotentialHighQuality
! this is used to evaluate the quality of templates
! this no longer has much effect on the reads autofinish chooses
! (NO)

consed.autoFinishPrintCustomNavigationFileForChosenReads: true
bool
! If this is true, then autofinish will print a file of the chosen reads
! in the format for consed to navigate (prev and next) to each
! location of the proposed new reads
! (NO)

consed.autoFinishReversesForFlankingGapsTemplateMustProtrudeFromContigThisMuch: 100
int
! Normal case:
! --------------------------- (consensus)
!                  ----------- template1
!                     ----------- template2
!                          ----------- template3
! Then you probably would want to use template3 since a reverse is
! most likely to go in the other contig rather than go into the gap.
! But suppose that template2 and template3 don't exist.  Would you
! want to use template1?  This parameter tells Autofinish whether you
! would want to use it, or pick no reverse at all.
! (NO)

consed.autoFinishTagOligosWhenDoExperiments: true
bool
! when autofinish is run with -doExperiments, tags the oligos
! it chooses
! (NO)

consed.countPads: false
bool
! (NO)

consed.debugging: 0
int
! for consed development use
! (NO)

consed.debugging2: 0
int
! for consed development use
! (NO)

consed.debugging3: 0
int
! for consed development use
! (NO)

consed.debuggingString: joseph
RWCString
! for consed development use
! (NO)

consed.ignoreHighQualityDiscrepanciesThisManyBasesFromEndOfAlignedRegion: 5
int
// Phil specified this (changed from 10) on 6/30/98
! (NO)

consed.ignoreUnalignedHighQualitySegmentsShorterThanThis: 20
int
! (NO)

consed.primersLookThisFarForForwardVectorInsertJunction: 125
int
! don't change this--if no X's this far from beginning of read, then
! assume that you are in insert
! (NO)

consed.primersDNAConcentrationNanomolar: 50.0
double
! used for melting temperature--don't change this!
! (NO)

consed.primersMaxMatchElsewhereScore: 17
int
! used for testing false-annealing to template and to vector
! (NO)

consed.primersMaxMatchElsewhereScoreForPCR: 21
int
! used for testing false-annealing to template and to vector
! when used with PCR
! (NO)

consed.primersMaxSelfMatchScore: 6
int
! cutoff for self-annealing of a primer
! (NO)

consed.primersMaxPrimerDimerScoreForPCR: 14
int
! careful changing this
! (NO)

consed.primersMinQuality: 30
int
! you must be sure of the sequence of a primer or it won't anneal to
! where you want
! (NO)

consed.primersPrintInfoOnRejectedTemplates: true
bool
! whether to print which templates were rejected and why (this output
! can be large)
! (NO)

consed.primersSaltConcentrationMillimolar: 50.0
double
! used for melting temperature--don't change this!
! (NO)

consed.primersScreenForVector: true
bool
! whether or not to screen primers for annealing to vector
! (NO)

consed.primersToleranceForDifferentBeginningLocationOfUniversalPrimerReads: 100
int
! different forward reads or different reverse reads 
! can differ by up to this amount in the starting location
! If they differ by more, then there is something wrong
! with the template (it is mislabeled?) so don't use it again for
! walking
! (NO)

consed.primersTooManyVectorBasesInWalkingRead: 10
int
! if there are this many x's, then don't walk again on this template
! (NO)

consed.qualityThresholdForLowConsensusQuality: 25
int
// Phil had this changed from 20 to 25 on 15 Jul 98
! highest low quality.  A base at this quality is considered low
! quality.  A base higher than this is considered high quality.
! (NO)

consed.tagColorPerCentOfBase: 50
int
! (NO)

consed.uncompressedChromatDirectory: /tmp
RWCString
! (NO)

consed.454sff2scfDirectory: /tmp
FileName
! when a user asks to see a 454 trace, and sff2scf runs, the
! scf file will be put here.  This is hard-coded in sff2scf.c
! (NO)

consed.whenMakingFakeReadToJoinContigsAddThisManyBasesOnEitherSideOfAlignedRegion: 200
int
! (NO)

consed.writeThisAceFormat: 2
int
! (NO)

consed.dumpCoreIfBoundsError: false
bool
! (NO)


consed.autoFinishMinSmithWatermanScoreOfARun: 20
int
! (NO) 

consed.autoFinishDoNotComparePCRPrimersMoreThanThisManyTimes: 1.0e+9
double
! When autofinish tries to find a compatible set of pcr primers, it
! can take billions of tries.  This limits the number of tries so that,
! if autofinish can't find it in this number of tries, it gives up
! rather than running for days, weeks, years!
! (NO)

consed.restrictionDigestMaximumBasesToCompareToVector: 200
int
! (NO)

consed.restrictionDigestZoomFactor: 2.0
double
! Amount to zoom in or out in the gel window of the restriction digest
! (NO)

consed.restrictionDigestZoomFactorForNavigate: 10.0
double
! When looking at restriction gel and navigating to first problem
! location, this is the amount to zoom in.
! (NO)

consed.restrictionDigestToleranceInPositionUnits: 20
int
! (NO)

consed.autoPCRAmplifyTooManySeriousFalseMatches: 100
int
! If a pcr primer pair has a significant false match to this many
! other places in the assembly, do not consider for possible pcr
! primer pairs.  This is just for the purpose of speeding up picking
! of primer pairs--the higher the number, the faster the searching,
! but the more likely a primer pair will be selected that will
! create multiple products.
! (NO)

consed.assemblyViewZoomFactor: 1.5
double
! amount to zoom in or out
! (NO)

consed.assemblyViewGridCellWidthInPixels: 4.0
double
! for keeping track where objects are on screen.  If you make it
! larger, you get better drawing performance, but lower resolution
! of which objects the cursor is pointing at
! (NO)

consed.assemblyViewCursorSensitivityInPixels: 4
int
! square about cursor that will detect objects
! (NO)

consed.assemblyViewReadDepthQuality: 20
int
! (NO)

consed.showAllTracesMaxNumberOfTracesToShowAtOnce: 100
int
! (NO)

consed.allowFwdRevPairScaffoldsToBeMergedIfThisManyBasesIntersectionOrLess: 1000
int
! (NO)

consed.justForPrimateProject: false
bool
! (NO)

consed.solexaFilesAreAssumedToBeHere: ../solexa_dir
FileName
! any solexa files (or links) are assumed to be in this directory. 
! If you change this, it can have effects in 3 places:  1) add new
! reads list of solexa files 2) add new reads where it looks for these
! files and 3) subsequent runs of consed it will prepend this to the
! path it finds in the ace file under PHD_DIR: on the DS line.  There
! may be other implications as well.
! (NO)

consed.solexaAlignmentFilesPerInsertingPadsCycle: 50
int
! (NO)

consed.solexaAlignmentsPerAlignmentFile: 10000
int
! (NO)

consed.solexaFastqFilesArePhredQualityNotSolexaQuality: true
bool
! (NO)

consed.solexa64FastqOrSanger33Fastq: auto
RWCString
! valid values are: auto (it figures it out itself),
! solexa64 (+64), and sanger33 (+33)
! If consed is being fooled, you can set these to force
! consed to override what the file appears to be.
! (NO)
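
The difference between the +64 and +33 encodings, and one way a file's
encoding might be guessed, can be shown with a short sketch.  How consed
itself decides in auto mode is not described here; the heuristic below
(any quality character below ASCII 59 can only occur in a +33 file) is a
common one and is an assumption, as are the function names.  The sketch
assumes at least one non-empty quality line.

```python
def guess_fastq_encoding(quality_lines):
    """Guess 'sanger33' or 'solexa64' from the lowest quality character."""
    lowest = min(ord(c) for line in quality_lines for c in line)
    # Characters below ';' (ASCII 59) cannot occur in a +64 file.
    return "sanger33" if lowest < 59 else "solexa64"


def decode_qualities(quality_line, encoding):
    """Convert one FASTQ quality line to a list of integer qualities."""
    offset = 33 if encoding == "sanger33" else 64
    return [ord(c) - offset for c in quality_line]


# Hypothetical quality lines.
enc = guess_fastq_encoding(["IIII5555"])
print(enc, decode_qualities("I5", enc))
enc = guess_fastq_encoding(["hhhhdddd"])
print(enc, decode_qualities("hd", enc))
```

A +33 file in which every base has high quality contains no low
characters, so a guess of this kind can be wrong; that is the situation
in which forcing solexa64 or sanger33 is useful.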

consed.maximumReadsInReadList: 200000
int
! even if there are millions of reads, don't display them all or
! it will eat up memory and time
! (NO)


consed.maxLengthOfReadsInapLocatedFragment2: 10000
int
! (NO)

consed.maximumStartupErrorsToReport: 50
int
! (NO)

consed.maximumUnrecognizedMiscTagLinesToReport: 25
int
! (NO)

consed.454LinkerAlignmentMatchScore: 1
int
! (NO)

consed.454LinkerAlignmentMismatchScore: 3
int
! (NO)

consed.454LinkerAlignmentIndelScore: 2
int
! (NO)

consed.filter454ReadsDeleteCrossMatchOutput: true
bool
! This should be changed to false only for troubleshooting.  
! Otherwise unused files will accumulate on your disk.
! (NO)


------------------------------------------------------------------------

35.  ACKNOWLEDGEMENTS

Thanks to Jim Knight for 454 data.

Thanks to the 1000 Genomes Project (www.1000genomes.org) for the
bamscape data.  It may only be used for training purposes.

Thanks to Owen Thompson (Bob Waterston's lab) for providing the
illumina_paired data, which was from C. elegans, although I modified it
to exhibit more consed features.

Thanks to countless people who have made suggestions over the years
for consed features that I have adopted.  Special thanks to Pat Minx.