flexbar - flexible barcode and adapter removal
==============================================

SYNOPSIS
    flexbar -r reads [-b barcodes] [-a adapters] [options]

DESCRIPTION
    The program Flexbar preprocesses high-throughput sequencing data
    efficiently. It demultiplexes barcoded runs and removes adapter sequences.
    Several adapter removal presets for Illumina libraries are included.
    Flexbar computes exact overlap alignments using SIMD and multicore
    parallelism. Moreover, trimming and filtering features are provided, e.g.
    trimming of homopolymers at read ends. Flexbar increases read mapping rates
    and improves genome as well as transcriptome assemblies. Unique molecular
    identifiers can be extracted in a flexible way. The software supports data
    in fasta and fastq format from multiple sequencing platforms.

    Refer to the manual on github.com/seqan/flexbar/wiki or contact Johannes
    Roehr on github.com/jtroehr for support with this application.

OPTIONS
    -h, --help
          Display the help message.
    -hh, --full-help
          Display the help message with advanced options.
    --version-check BOOL
          Turn this option off to disable version update notifications of the
          application. One of 1, ON, TRUE, T, YES, 0, OFF, FALSE, F, and NO.
          Default: 1.
    -hm, --man-help
          Print advanced options as man document.
    -v, --versions
          Print Flexbar and SeqAn version numbers.
    -c, --cite
          Show program references for citation.

  Basic options:
    -n, --threads INTEGER
          Number of threads to employ. Default: 1.
    -N, --bundle INTEGER
          Number of (paired) reads per thread. Default: 256.
    -M, --bundles INTEGER
          Process only certain number of bundles for testing.
    -t, --target OUTPUT_PREFIX
          Prefix for output file names or paths. Default: flexbarOut.
    -r, --reads INPUT_FILE
          Fasta/q file or stdin (-) with reads that may contain barcodes.
    -p, --reads2 INPUT_FILE
          Second input file of paired reads, gz and bz2 files supported.
    -i, --interleaved
          Interleaved format for first input set with paired reads.
    -I, --iupac
          Accept iupac symbols in reads and convert to N if not ATCG.

  Barcode detection:
    -b, --barcodes INPUT_FILE
          Fasta file with barcodes for demultiplexing, may contain N.
    -b2, --barcodes2 INPUT_FILE
          Additional barcodes file for second read set in paired mode.
    -br, --barcode-reads INPUT_FILE
          Fasta/q file containing separate barcode reads for detection.
    -bo, --barcode-min-overlap INTEGER
          Minimum overlap of barcode and read. Default: barcode length.
    -be, --barcode-error-rate DOUBLE
          Error rate threshold for mismatches and gaps. Default: 0.0.
    -bt, --barcode-trim-end STRING
          Type of detection, see section trim-end modes. Default: LTAIL.
    -bn, --barcode-tail-length INTEGER
          Region size in tail trim-end modes. Default: barcode length.
    -bk, --barcode-keep
          Keep barcodes within reads instead of removal.
    -bu, --barcode-unassigned
          Include unassigned reads in output generation.
    -bm, --barcode-match INTEGER
          Alignment match score. Default: 1.
    -bi, --barcode-mismatch INTEGER
          Alignment mismatch score. Default: -1.
    -bg, --barcode-gap INTEGER
          Alignment gap score. Default: -9.

  Adapter removal:
    -a, --adapters INPUT_FILE
          Fasta file with adapters for removal that may contain N.
    -a2, --adapters2 INPUT_FILE
          File with extra adapters for second read set in paired mode.
    -as, --adapter-seq STRING
          Single adapter sequence as alternative to adapters option.
    -aa, --adapter-preset STRING
          One of TruSeq, SmallRNA, Methyl, Ribo, Nextera, and NexteraMP.
    -ao, --adapter-min-overlap INTEGER
          Minimum overlap for removal without pair overlap. Default: 3.
    -ae, --adapter-error-rate DOUBLE
          Error rate threshold for mismatches and gaps. Default: 0.1.
    -at, --adapter-trim-end STRING
          Type of removal, see section trim-end modes. Default: RIGHT.
    -an, --adapter-tail-length INTEGER
          Region size for tail trim-end modes. Default: adapter length.
    -ax, --adapter-relaxed
          Skip restriction to pass read ends in right and left modes.
    -ap, --adapter-pair-overlap STRING
          Overlap detection of paired reads.
          One of ON, SHORT, and ONLY.
    -av, --adapter-min-poverlap INTEGER
          Minimum overlap of paired reads for detection. Default: 40.
    -ac, --adapter-revcomp STRING
          Include reverse complements of adapters. One of ON and ONLY.
    -ad, --adapter-revcomp-end STRING
          Use different trim-end for reverse complements of adapters.
    -ab, --adapter-add-barcode
          Add reverse complement of detected barcode to adapters.
    -ar, --adapter-read-set STRING
          Consider only single read set for adapters. One of 1 and 2.
    -ak, --adapter-trimmed-out STRING
          Modify that trimmed reads are kept. One of OFF and ONLY.
    -ay, --adapter-cycles INTEGER
          Number of adapter removal cycles. Default: 1.
    -am, --adapter-match INTEGER
          Alignment match score. Default: 1.
    -ai, --adapter-mismatch INTEGER
          Alignment mismatch score. Default: -1.
    -ag, --adapter-gap INTEGER
          Alignment gap score. Default: -6.

  Filtering and trimming:
    -u, --max-uncalled INTEGER
          Allowed uncalled bases N for each read. Default: 0.
    -x, --pre-trim-left INTEGER
          Trim given number of bases on 5' read end before detection.
    -y, --pre-trim-right INTEGER
          Trim specified number of bases on 3' end prior to detection.
    -k, --post-trim-length INTEGER
          Trim to specified read length from 3' end after removal.
    -m, --min-read-length INTEGER
          Minimum read length to remain after removal. Default: 18.

  Quality-based trimming:
    -q, --qtrim STRING
          Quality-based trimming mode. One of TAIL, WIN, and BWA.
    -qf, --qtrim-format STRING
          Quality format. One of sanger, solexa, i1.3, i1.5, and i1.8.
    -qt, --qtrim-threshold INTEGER
          Minimum quality as threshold for trimming. Default: 20.
    -qw, --qtrim-win-size INTEGER
          Region size for sliding window approach. Default: 5.
    -qa, --qtrim-post-removal
          Perform quality-based trimming after removal steps.

  Trimming of homopolymers:
    -hl, --htrim-left STRING
          Trim specific homopolymers on left read end after removal.
    -hr, --htrim-right STRING
          Trim certain homopolymers on right read end after removal.
    -hi, --htrim-min-length INTEGER
          Minimum length of homopolymers at read ends. Default: 3.
    -h2, --htrim-min-length2 INTEGER
          Minimum length for homopolymers specified after first one.
    -hx, --htrim-max-length INTEGER
          Maximum length of homopolymers on left and right read end.
    -hf, --htrim-max-first
          Apply maximum length of homopolymers only for first one.
    -he, --htrim-error-rate DOUBLE
          Error rate threshold for mismatches. Default: 0.1.
    -ha, --htrim-adapter
          Trim only in case of adapter removal on same side.

  Output selection:
    -f, --fasta-output
          Prefer non-quality format fasta for output.
    -z, --zip-output STRING
          Direct compression of output files. One of GZ and BZ2.
    -1, --stdout-reads
          Write reads to stdout, tagged and interleaved if needed.
    -R, --output-reads OUTPUT_FILE
          Output file for reads instead of target prefix usage.
    -P, --output-reads2 OUTPUT_FILE
          Output file for reads2 instead of target prefix usage.
    -j, --length-dist
          Generate length distribution for read output files.
    -s, --single-reads
          Write single reads for too short counterparts in pairs.
    -S, --single-reads-paired
          Write paired single reads with N for short counterparts.

  Logging and tagging:
    -l, --align-log STRING
          Print chosen read alignments. One of ALL, MOD, and TAB.
    -o, --stdout-log
          Write statistics to stdout instead of target log file.
    -O, --output-log OUTPUT_FILE
          Output file for logging instead of target prefix usage.
    -g, --removal-tags
          Tag reads that are subject to adapter or barcode removal.
    -e, --number-tags
          Replace read tags by ascending number to save space.
    -d, --umi-tags
          Capture UMIs in reads at barcode or adapter N positions.
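
    As a rough sketch of how the barcode and tagging options listed above
    might be combined, the following writes a small barcode file whose N
    positions mark a UMI and prints a matching command line. The sample
    names, barcode sequences, file names, and the prefix "demux" are invented
    for illustration, and whether this barcode layout suits a given library
    is an assumption. The command is only printed, not executed.

```shell
# Hypothetical barcodes: 6 fixed bases followed by 4 N positions for a UMI.
cat > barcodes.fa <<'EOF'
>sample1
ACGTACNNNN
>sample2
TGCATGNNNN
EOF

# -b and -bt select barcode detection in the left tail of each read,
# -bu includes unassigned reads in the output, -d captures UMIs at N positions.
umi_cmd="flexbar -r reads.fq -b barcodes.fa -bt LTAIL -bu -d -t demux"
echo "$umi_cmd"
```

    All flags used here are taken from the option list; only the input data
    is made up.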

TRIM-END MODES
    ANY: longer side of read remains after removal of overlap
    LEFT: right side remains after removal, align <= read end
    RIGHT: left part remains after removal, align >= read start
    LTAIL: consider first n bases of reads in alignment
    RTAIL: use only last n bases, see tail-length options

EXAMPLES
    flexbar -r reads.fq -t target -q TAIL -qf i1.8
    flexbar -r reads.fq -b barcodes.fa -bt LTAIL
    flexbar -r reads.fq -a adapters.fa -ao 3 -ae 0.1
    flexbar -r r1.fq -p r2.fq -a a1.fa -a2 a2.fa -ap ON
    flexbar -r r1.fq -p r2.fq -aa TruSeq -ap ON

VERSION
    Last update: May 2019
    flexbar version: 3.5.0
    SeqAn version: 2.4.0

    Available on github.com/seqan/flexbar
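
    The one-line examples in the EXAMPLES section can be combined into a
    single call. The sketch below assembles a paired-end run with an adapter
    preset, pair overlap detection, tail quality trimming, and compressed
    output, using only options documented on this page. The file names
    r1.fq and r2.fq, the prefix "trimmed", and the thread count are
    placeholders. By default the command is printed for inspection; it is
    executed only when the hypothetical environment variable RUN is set to 1.

```shell
# Collect the options as positional parameters (POSIX sh compatible).
#   -r/-p      paired input read sets
#   -aa/-ap    adapter preset plus overlap detection of paired reads
#   -q/-qf     quality trimming from the read tail, i1.8 quality format
#   -m         minimum read length (18 is the documented default)
#   -n/-z/-t   threads, gzip output, output prefix
set -- -r r1.fq -p r2.fq \
       -aa TruSeq -ap ON \
       -q TAIL -qf i1.8 \
       -m 18 \
       -n 4 -z GZ -t trimmed

# Show the assembled command line.
cmd_line="flexbar $*"
echo "$cmd_line"

# Run flexbar only on explicit request, so the sketch is safe to try as is.
if [ "${RUN:-0}" = "1" ]; then
  flexbar "$@"
fi
```

    Keeping the options in one place makes it easier to review a run before
    starting it, and the printed line can be stored next to the log file.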