Pipelines
=========

HCMV pipeline
#############

**alignmentStats_ReCVR.sh** is a bash script that aligns multiple paired fastq files to a reference sequence. The script expects 4 arguments.

.. code-block:: bash

    alignmentStats_ReCVR.sh

**Input**

The four arguments are:

* A directory of paired fastq files. The fastq files must sit in this directory and have the extensions ``_R1_001.fastq`` and ``_R2_001.fastq`` (or ``_R1_001.fq`` and ``_R2_001.fq``).
* The name of the reference fasta file (e.g., ``merlin.fa``).
* The fasta file with the signatures of different HCMV strains.
* The name of the reference library to use when building the bowtie2 index.

Dependencies
************

* cutadapt
* trim_galore
* bowtie2
* samtools
* weeSAMv1.4
* gawk
* SamRemoveIndels.awk - the hash-bang may need to be changed depending on your gawk installation
* UniqSamPE.awk - the hash-bang may need to be changed depending on your gawk installation
* miRNA_Search

Pipeline
************

* Step 1: Each pair of fastq files in the folder is trimmed using trim_galore with the following settings: ``--paired --length 21 --quality 10 --stringency 3``.
* Step 2: The trimmed reads are then aligned against the provided reference using bowtie2, allowing a maximum fragment length of 1200 (``-X 1200``).
* Step 3: The assembly statistics are generated using weeSAMv1.4. A newer version of weeSAM is available if you want more comprehensive statistics. A number of assembly statistics are also printed to the terminal.
* Step 4: The library diversity is estimated, first using SamRemoveIndels.awk and then UniqSamPE.awk. These provide additional statistics that enable the calculation of the ratio of total to unique coverage. The diversity of genotypes in the sample is also estimated using miRNA_Search, which uses the signature motifs to determine the number of possible strains in the sample.
* Step 5: The final stats are printed to ``output.csv`` in the input directory.
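
Steps 1 and 2 above can be sketched as a dry run that pairs the fastq files and prints the commands instead of executing them. This is a minimal illustration, not the code of alignmentStats_ReCVR.sh: the function name ``plan_commands``, its argument order, the ``.sam`` output names and the single up-front ``bowtie2-build`` call are all assumptions. The trim_galore and bowtie2 settings are the ones listed in the steps; the ``_val_1.fq`` and ``_val_2.fq`` names follow trim_galore's default naming for paired output written to the current directory.

```shell
#!/usr/bin/env bash
# Dry-run sketch of Steps 1 and 2: prints commands, runs nothing.
# plan_commands is a hypothetical helper, not part of alignmentStats_ReCVR.sh.

plan_commands() {
    local fastq_dir=$1 reference=$2 library=$3
    local r1 r2 sample base ext

    # Assumption: the bowtie2 index is built once from the reference fasta.
    printf 'bowtie2-build %q %q\n' "$reference" "$library"

    for r1 in "$fastq_dir"/*_R1_001.fastq "$fastq_dir"/*_R1_001.fq; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$r1" ] || continue

        ext=${r1##*.}               # fastq or fq
        sample=${r1%_R1_001.*}      # path without the read suffix
        base=${sample##*/}          # sample name without the directory
        r2=${sample}_R2_001.${ext}

        if [ ! -e "$r2" ]; then
            printf 'warning: no R2 mate for %s\n' "$r1" >&2
            continue
        fi

        # Step 1: trimming with the settings given in the pipeline description.
        printf 'trim_galore --paired --length 21 --quality 10 --stringency 3 %q %q\n' \
            "$r1" "$r2"

        # Step 2: alignment with a maximum fragment length of 1200.
        # trim_galore writes paired output as *_val_1.fq and *_val_2.fq
        # in the current directory by default.
        printf 'bowtie2 -X 1200 -x %q -1 %q -2 %q -S %q\n' \
            "$library" "${base}_R1_001_val_1.fq" "${base}_R2_001_val_2.fq" "${base}.sam"
    done
}
```

Calling ``plan_commands fastq_dir merlin.fa merlin`` prints one ``bowtie2-build`` line followed by a trim_galore and a bowtie2 line per sample, which makes it easy to check the file pairing before running anything. R1 files without a matching R2 file are reported on stderr and skipped.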