PRICE Genome Assembler Sourcecode Download

We are pleased to release PRICE (Paired-Read Iterative Contig Extension), a de novo genome assembler implemented in C++. Its name describes the strategy that it implements for genome assembly: PRICE uses paired-read information to iteratively increase the size of existing contigs. Initially, those contigs can be individual reads from a subset of the paired-read dataset, non-paired reads from sequencing technologies that provide non-paired data, or contigs that were output from a prior run of PRICE or any other assembler.

PRICE was designed to address the challenge of assembling viral genomes that comprise a small minority of the reads within ultra-deep, short-read, shotgun metagenomic datasets. PRICE has already enabled the discovery of several novel virus genomes from such complex datasets, and it is also being applied to the de novo assembly of large individual genomes.

The following link provides access to the full documentation accompanying the most recent version of PRICE; these HTML pages for any PRICE release can also be downloaded below.

The PRICE software system compiles into two independent executables: PriceTI, the assembler, and PriceSeqFilter, which filters input data based on quality or other criteria prior to assembly. For full usage descriptions, see the documentation pages.

Citation: a manuscript describing PRICE has been accepted for publication in the open-access journal G3 - Genes|Genomes|Genetics, and an early-online version of the article is available here. Below, you will find the current release of PRICE, as well as an archive of prior releases, including pre-publication versions that have been used in other studies. PRICE continues to be maintained, in terms of software optimization, bug fixes, added features, and improvements to ease-of-use, so check back soon!

Version No.	Release Date	PRICE Sourcecode (size)	Full Documentation (size)	Sample Job (size)	Complete File Set (Source+Docs+Job) (size)	Update Description
Most Recent:
1.2	4/8/2014	245k	23k	20m	20m	Numerous efficiency improvements and bug fixes.
Previous Versions:
1.0.1	5/6/2013	239k	23k	20m	20m	Bug fix for PriceSeqFilter: segfault when executed with single-read input corrected; Minor documentation update.
1.0	5/3/2013	239k	23k	20m	20m	A new executable, PriceSeqFilter, for filtering input data prior to executing an assembly run using the same criteria available through the PriceTI assembler command line; Bug fix: PriceTI interface previously terminated if negative numbers were provided as arguments, erroneously interpreting them as invalid flags. This has been corrected. Updated documentation.
0.18.2	2/25/2013	217k	15k	20m	20m	Updated documentation, including file format specifications; New sample job recommended command; Stdout log no longer reports the addition of zero contigs; Reduced memory use during mapping; Special instructions for installation on Mac OS X (see README.txt or "Installation" in documentation); Improved error messages for incorrect number of arguments used for a flag; Improved error messages for the specification of input files that don't exist; Improved error messages for inappropriate command-line input when numbers (or integer numbers) are required; Improved error messages for invalid score characters in _sequence.txt (Illumina) input files; Tolerates an expanded set of whitespace characters in input files; Optimized memory use and efficiency during the 1st mapping step; Requirement of >half (as opposed to >=half) of linking reads to connect two contigs for them to be combined into a single assembly job; Prints version number at each call.
0.18	5/27/2012	217k	15k	20m	20m	Bug fix: several bugs that contributed to inverted-strand misassemblies, including the replacement of legitimate linkages between adjacent contigs with reverse-orientation linkages in the AssemblyJobCreator class, as well as the flipping of some sequences prior to their collapse with redundant contigs in the AssemblyJobSubset class; Bug fix: small memory leak when using the -badf flag; Bug fix: the paired-ends of reads filtered by -badf/-repmask should not be blocked from mapping to contigs, but were in the previous version; New feature: ability to mask paired-end reads in which one read contains a di-nucleotide repeat stretch using -maxDi (similar to the homopolymer filter -maxHp); New feature: quality filter flags -rqf and -rnf, to remove paired-end reads for which at least one of the reads has an unacceptably high number of low-quality or uncalled (N) nucleotides, respectively; New feature: filters for initial contigs to remove sequences matching those in a file (-icbf), low-quality sequences (-icqf/-icnf), or sequences with homopolymer/dinucleotide stretches (-icmHp/-icmDi); these are the initial-contig equivalents of the read filters -badf, -rqf/-rnf, and -maxHp/-maxDi, respectively; New feature: run logs now include information about the number of initial contigs and reads that are removed or retained by filters, as well as explicitly the number of initial contigs gathered from files at each cycle; Explicit blocking of the same contig being included in a single contig-edge assembly job in both orientations (a source of palindromic misassemblies); Fastq nucleotides with less than 50% probability of being correct are automatically converted to N's; Potentially accelerated processing of fastq quality scores.
0.17.2	5/8/2012	210k	15k	20m	20m	Bug fix: memory leak in the verbose log writer class; Bug fix: improved thread safety during read mapping that should reduce the frequency of already-infrequent core-dump crashes.
0.17.1	5/6/2012	211k	15k	20m	20m	Bug fix: corrected an illegal read error during gapped alignment scoring/collapsing; this error would sometimes occur without raising exceptions, but would also sometimes result in an exception declaring that "conditions after a gap block are not legit".
0.17	4/25/2012	203k	15k	20m	20m	Bug fix: corrected error during parsing of the -repMask command args; Acceleration of contig targeting (substantial acceleration when using -target and in the AssemblyJobGraph class); New design feature: verification that at least one of the best-hit matches for a pair of paired-end reads is to a contig edge window (previously assumed but not verified); Reciprocal matches of a read to both strands of a contig are filtered at an earlier (less time-consuming) step; Acceleration of assembly by the AssemblyJobSubset class.
0.16.2	4/9/2012	203k	15k	20m	20m	Speed improvements to the various AssemblyJob interface-implementing classes.
0.16.1	4/6/2012	201k	15k	20m	20m	Speed improvements in the ScoredSeqCollectionBwt class that reduce the time for seeding alignments to long sequences.
0.16	4/4/2012	199k	15k	20m	20m	Speed improvement for dynamic programming alignment of long sequences.
0.15	4/1/2012	199k	15k	20m	20m	Bug fix: corrected an illegal memory write when using -target; New feature: more versatile and better-specified -trim commands (including -trimB and -trimI for basal/continuous and initial trimming of contigs, respectively); Additional threading to increase the efficiency of the assembly job creation steps (in between read-mapping and job-running); New feature: control of the match/mismatch/initiate gap/extend gap scores for dynamic programming alignments using the -r, -q, -G, and -E flags (akin to those flags for NCBI BLAST); For developers: removal of dependencies on other PRICE classes for use of the Assembler programmatic interface.
0.14	3/18/2012	196k	13k	20m	20m	Bug fix: -trim command was broken (non-functional) in previous version(s), function now restored; Bug fix: corrected an illegal memory write when using -spf/-spfp; New feature: aborts if paired-end files both point to the same file (-fp/-fpp/-mp/-mpp); New feature: aborts if an output file cannot be written; New feature: length filter (-lenf) can be applied variably through a run; Updated documentation, including preliminary developer documentation.
0.13	10/31/2011	193k	9.8k	20m	20m	New feature: false paired-end reads. Use single-direction reads (like 454 or IonTorrent data) as if they were paired-ends using the -spf or -spfp input flags (see the description in --help for more info).
0.12	10/17/2011	191k	9.8k	20m	20m	New feature: repeat detection based on significantly high levels of coverage using the -repmask flag; Changed default value for -link flag from 5 to 2 (see -link description using --help for more info); Added support for .fna, .ffn, and .frn as valid fasta file extensions.
0.11.1	9/27/2011	180k	9.8k	20m	20m	Bug fix: Segfaults occurring due to incorrect interpretation of nucleotide scores when reading .fastq/_sequence.txt mate-pair files.
0.11	9/25/2011	180k	9.8k	20m	20m	Efficiency improvements for the extraction of read information from files, as well as gapped alignments and assembly of redundant sequences.
0.10	9/5/2011	177k	9.8k	20m	20m	Bug fix: another cause of infrequent segfaults, occurring only during the second read-mapping step, due to a now-corrected race condition when threaded; Substantial acceleration of ungapped alignment, most notable during the read-mapping steps.
0.9	8/23/2011	175k	9.8k	20m	20m	Bug fix: infrequent segfault during the read mapping steps due to a now-corrected race condition when threaded; More even balancing of computational load between threads during meta-assembly and in later assembly cycles; Re-design and simplification of ScoredSeqCollectionBwt class (for the purpose of load balancing); Some optimization of read-mapping implementation; Optimization of the AssemblyJobSubset class (more optimally prevents the exploration of alignments that will ultimately be of insufficient quality).
0.8	7/31/2011	172k	9.8k	20m	20m	Added a second meta-assembly step that occurs post-contig filtering (allows scaled parameters to be adjusted to reflect the size of the filtered contig set, allowing more dissimilar but nonetheless likely redundant contigs to be collapsed); Bug fix: corrected a problem with filtering out N-containing substring seeds for searches to a BWT dataset (also speed-optimized that operation); Implemented a new method for obtaining full-sequence alignments from a collection of sequences and applied it to -target mode (improves both efficiency and completeness of results); Additional speed optimizations for seeding alignments with substring matches to a BWT dataset.
0.7	7/17/2011	171k	9.8k	20m	20m	Bug fix: -nco previously threw an exception when called but is now functional; Multiple output files now guarantee the return of equal-length sequences in the same order as one another; Corrected an error in the execution of read mapping that allowed some sub-optimal alignments to persist when only best-scoring matches were being sought; Increased the efficiency of several aspects of PRICE, especially methods for getting information about ScoredSeq objects or Alignments, or copying sequences.
0.6	6/27/2011	163k	9.8k	20m	20m	Bug fix: -reset flag function restored (was broken such that it had no effect); Support for the input of read files with pairs of reads facing away from one another (typical of mate-pair libraries) using the new -mp, -mpp, -ms, and -msp flags; New args for paired-end and mate-paired read files allow them to be used cyclically; Accelerated retrieval of data from disc during the second mapping and assembly steps of each cycle; Speed increase for the first mapping step; Reduced verbosity of the verbose log file: statistics are no longer printed for individual assembly jobs with only one sequence; Efficiency improvements for gathering sequence information from the ScoredSeqFlip class.
0.5	6/13/2011	159k	9.6k	20m	20m	New -icfNt and -picfNt flags allow initial contigs to be introduced without assembled contigs being targeted to them in -target mode; Support for single-file paired-end input (paired ends found as alternating file entries); Modified behavior of the -badf flag: reads that map to the sequences in the bad file are prevented from mapping to existing contigs but can still be included in the assembly; Small efficiency improvements for sequence mapping/alignment.
0.4	5/31/2011	154k	9.0k	20m	20m	Multiple output files can be written in parallel; Filtering of reads with matches to sequences in a provided file (-badf command-line arg); Corrected a bug in AssemblyJobGraph class that caused infrequent segmentation faults; Corrected a bug in read mapping that previously allowed sub-par mappings to be retained; Some simplification to the source code module structure.
0.3	5/16/2011	156k	8.8k	20m	20m	Large reduction to the runtime memory footprint, including correction of memory leaks in the AssemblyJobGraph class and just-in-time loading of sequence/score information from files into RAM; Improvements to thread safety for the reading of sequence information from files into RAM; Implementation of a literal interpretation of fastq (and _sequence.txt) quality scores into the internal representation of sequence support scores; Small updates to the user manual and --help message.
0.2	5/2/2011	150k	8.5k	20m	20m	Correction to targeting during linked contig edge assembly jobs: targeting alignments now allow gaps; Additional threading of de Bruijn graph assembly, contig edge assembly jobs, and reading paired-end files; New -reset option for rejuvenation of dead contigs; Performance optimizations for gapped alignments; Stronger support for file format variants; Both reads from each edge-mapping pair are now re-mapped to the entire contig set; User specification allowed for transient use of paired-read files across only specified extend cycles; Default -dbms value changed from 5 to 3; Changed location of internal storage of some default parameters.
0.1	4/15/2011	146k	8k	20m	20m	Minor change to avoid spurious palindromic contigs; Gapped alignments used in -target mode; Initial user-level documentation provided.
0.0	4/1/2011	146k		20m	20m	N/A
