A partial list of improvements since version 0.960731:

1) Major improvements in speed and memory usage, particularly for
large datasets (e.g. bacterial genomes, ESTs).

2) Improved accuracy of assembly and consensus sequences, and improved
treatment of data anomalies (chimeras, etc). The consensus quality
values should also be more accurate.

3) More user control over stringency of assembly, through options
-forcelevel and -repeat_stringency.  The old -revise_greedy flag, for
use when the original greedy assembly fails, has been improved.

4) A large number of other new options.

5) Fixes of essentially all bugs that have been reported to me.

6) For consed users, much more information about various aspects of
the assembly is now provided by phrap to consed, via the new format
.ace file (obtained using the -new_ace option). There are tags for
compressions and G dropouts, for chimeras, and for reads that have
matches elsewhere in the assembly, among others.  You need a current
version of consed to be able to read the new_ace format and view these
tags.

7) The treatment of read names by phrap has been made much more
flexible; information on template, read direction, and chemistry,
previously conveyed via the read naming convention, can now be
provided in the .phd or FASTA file instead so arbitrary read names may
be used.

8) Improved Gap compatibility (see documentation for details).

9) The documentation (phrap.doc) has been much extended and improved.
(Parts still need work!).


Caveats:

1) The new version has NOT yet been tested on datasets lacking quality
information (e.g. EST datasets where the chromatograms are unavailable
for base calling with phred). If you experience problems on such
datasets you may want to try using the old version (0.960731) instead,
which I will send you on request if you do not have it.

2) The new version of swat has a new method for computing expectation
values that has not been fully tested yet.


The source code for the swat/cross_match/phrap package is being sent to
you in the form of an email message containing a uuencoded .tar.Z file;
you will need to have access to a Unix system for the initial
unpacking, but once you've uudecoded it and unpacked the .tar file
(steps i and ii below), you should be able to compile the programs
on computers running other operating systems -- they should be portable
to almost anything with a decent C compiler and adequate memory (64 MB
RAM or more is desirable). Here are the steps needed to unpack and
install the programs:

i. Save the email message as a file (for example, "temp.mail"). If
possible, do this using the Unix mail command, rather than another
mail program -- some mail programs (e.g. Pine) remove trailing spaces
on each line of incoming messages, which will corrupt a uuencoded
file. Do not attempt to modify the saved mail message in any way. That
is unnecessary and may corrupt the message.

ii. To unpack the saved email message, execute the following two
commands on a Unix workstation, in the directory containing the file
created in step i above:

> uudecode temp.mail

> zcat distrib.tar.Z | tar xvf -

If either of these commands results in an error message, it is likely that
the email message was corrupted by your mail program (see step i above).
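
The two unpacking commands in step ii can also be wrapped in a small
shell function that stops at the first failure. This is only a sketch:
the function name unpack_mail is made up for illustration, and it
assumes the archive inside the message is named distrib.tar.Z, as in
the commands above.

```shell
# Sketch only: unpack_mail is a hypothetical helper, not part of the package.
# Run it in the directory that contains the saved mail file.
unpack_mail() {
    mail_file="${1:-temp.mail}"

    # Refuse to continue if the saved message is not there.
    if [ ! -f "$mail_file" ]; then
        echo "cannot find $mail_file" >&2
        return 1
    fi

    # First command of step ii: recover the .tar.Z file from the message.
    if ! uudecode "$mail_file"; then
        echo "uudecode failed; the message may have been corrupted" >&2
        return 1
    fi

    # Second command of step ii: uncompress and extract the archive.
    # The status tested here is that of tar, the last command in the pipe.
    if ! zcat distrib.tar.Z | tar xvf -; then
        echo "extraction failed; the message may have been corrupted" >&2
        return 1
    fi
}
```

Calling "unpack_mail temp.mail" then performs both commands and reports
which one failed, if any.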

iii. To produce working versions of the programs, move (if necessary)
all of the files produced by the above command to the computer on
which you wish to run the programs (which must have a C compiler!),
and execute the following command in the directory that contains the
files:

> make

If your compiler does not recognize the -O2 optimization flag (which
should be evident from warning messages it produces), you should change
the line

      CFLAGS= -O2

in the file "makefile" to

      CFLAGS= -O

Then remove all object files (files ending in .o) produced by the
original make, and recompile.
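
The makefile change can be made by hand in any editor. As an
alternative, the sketch below does it with sed and keeps a backup copy.
The function name downgrade_cflags is hypothetical, and the sketch
assumes the CFLAGS line looks like the one shown above (with -O2 as the
first flag after the equals sign).

```shell
# Sketch only: downgrade_cflags is a hypothetical helper name.
# It rewrites "CFLAGS= -O2" to "CFLAGS= -O" in the given makefile
# and leaves the original in <makefile>.orig.
downgrade_cflags() {
    mf="${1:-makefile}"

    # Keep an untouched copy so the change can be undone.
    cp "$mf" "$mf.orig" || return 1

    # Replace -O2 with -O only on the CFLAGS line; other lines pass through.
    sed 's/^\([[:space:]]*CFLAGS[[:space:]]*=[[:space:]]*\)-O2/\1-O/' \
        "$mf.orig" > "$mf"
}
```

After running "downgrade_cflags makefile" in the source directory,
delete the .o files and run make again.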

iv. If you have datasets with more than 64,000 reads, or that include
sequences longer than 64,000 bp, you will need the .manyreads and/or
.longreads versions of phrap and cross_match. These are created using the
command:

> make manyreads
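
To judge whether a dataset crosses either of the 64,000 limits in step
iv, the reads can be counted directly. The sketch below assumes the
reads are in a FASTA file (one ">" header line per read); the function
name fasta_stats is hypothetical and not part of the package. It prints
the number of reads followed by the length of the longest sequence.

```shell
# Sketch only: fasta_stats is a hypothetical helper name.
# Output: "<number of reads> <length of longest sequence>"
fasta_stats() {
    awk '
        # A header line closes the previous record and starts a new one.
        /^>/ { if (len > max) max = len; n++; len = 0; next }
        # Sequence lines: drop whitespace, then add to the current length.
        { gsub(/[[:space:]]/, ""); len += length($0) }
        # The last record has no following header, so close it here.
        END { if (len > max) max = len; print n + 0, max + 0 }
    ' "$1"
}
```

If either number printed for your input file exceeds 64000, the
standard executables are not sufficient for that dataset.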

v. If you are operating a non-commercial (academic or government)
computer facility which provides access to several independent
investigators, you are required by the licensing agreement to set the
permissions on the executables and source code to allow execute but
not read access, so that the programs may not be copied.
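
One possible way to set such permissions with chmod is sketched below.
The function names are hypothetical, and this is only one reading of
the requirement (the owner keeps full access, everyone else loses read
access); consult the licensing agreement itself for the exact terms.

```shell
# Sketch only: both function names are hypothetical.

# Executables: owner may read, write and execute; group and others
# may execute but not read (mode 711).
protect_executable() {
    chmod u=rwx,go=x "$@"
}

# Source files: remove read access for group and others,
# leaving the owner's permissions unchanged.
protect_source() {
    chmod go-r "$@"
}
```

For example, "protect_executable phrap cross_match swat" would apply
the first rule to those three programs.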

vi. The documentation is in three .doc files: general.doc, phrap.doc,
and swat.doc. Please read it!

Contact me if you have problems with any of the above steps. Before
doing so, however, please record exactly what steps you carried out, on
what computer and operating system, and what error messages you
received.

N.B. PLEASE SEND MAIL TO ME ONLY AT phg@u.washington.edu
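
When reporting a problem, the system description, the command that was
run, and its error messages can be captured in a single file. A minimal
sketch, in which the function name record_run and the log file name are
arbitrary choices:

```shell
# Sketch only: record_run is a hypothetical helper name.
# Usage: record_run LOGFILE COMMAND [ARGS...]
record_run() {
    log="$1"
    shift
    {
        # Computer and operating system, as reported by uname.
        echo "system: $(uname -a)"
        echo "command: $*"
        # Run the command, keeping both normal output and error messages.
        status=0
        "$@" 2>&1 || status=$?
        echo "exit status: $status"
    } > "$log"
}
```

For example, "record_run make.log make" runs the build and leaves
everything it printed in make.log.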