nf-core/rnavar
GATK4 RNA variant calling pipeline
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
string
^\S+\.csv$
You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It must be a comma-separated file with 3 columns and a header row. See usage docs.
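The two documented rules (a path ending in .csv, and 3 columns with a header row) can be checked locally before launching a run. The Python sketch below is not part of the pipeline. It reuses the schema's path pattern, but the header sample,fastq_1,fastq_2 is a hypothetical example; the pipeline's usage docs are the authority on the real column names.

```python
import csv
import io
import re

# Path pattern copied from the --input parameter definition.
INPUT_PATTERN = re.compile(r"^\S+\.csv$")

# HYPOTHETICAL column names: the docs only say "3 columns and a header row".
EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2"]


def check_samplesheet(path: str, handle) -> list[dict]:
    """Return the samplesheet rows as dicts after basic shape checks."""
    if not INPUT_PATTERN.match(path):
        raise ValueError(f"{path!r} does not match {INPUT_PATTERN.pattern}")
    reader = csv.reader(handle)
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for lineno, fields in enumerate(reader, start=2):
        # Every data row must have exactly 3 comma-separated fields.
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        rows.append(dict(zip(header, fields)))
    return rows


sheet = io.StringIO(
    "sample,fastq_1,fastq_2\n"
    "SAMPLE1,s1_R1.fastq.gz,s1_R2.fastq.gz\n"
)
print(check_samplesheet("samplesheet.csv", sheet))
```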
The output directory where the results will be saved. You must use absolute paths when storing results on cloud infrastructure.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If it is set in your user config file (~/.nextflow/config), you don't need to specify it on the command line for every run.
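To check an address before a long run, the schema's email pattern can be tested directly. This Python sketch copies the pattern verbatim; is_valid_email is a hypothetical helper name. Note one quirk of the pattern: top-level domains longer than five letters are rejected.

```python
import re

# Pattern copied verbatim from the --email (and --email_on_fail) definition.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address passes the schema's email pattern."""
    return EMAIL_PATTERN.match(address) is not None


for candidate in ["me@example.com", "first.last@sub.example.org", "me@localhost"]:
    print(candidate, is_valid_email(candidate))
```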
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Save FastQ files after merging re-sequenced libraries in the results directory.
boolean
Reference genome related files and options required for the workflow.
Name of iGenomes reference.
string
If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files, e.g. --genome GRCh38.
See the nf-core website docs for more details.
Path to FASTA genome file.
string
^\S+\.fn?a(sta)?(\.gz)?$
This parameter is mandatory if --genome is not specified. If you don't have a STAR index available, one will be generated for you automatically. Combine with --save_reference to save the STAR index for future runs.
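The FASTA path must match the schema pattern shown above. This Python sketch (not pipeline code) tests a few file names against that pattern copied verbatim: .fa, .fna and .fasta are accepted, each optionally followed by .gz, while a FASTQ name is rejected.

```python
import re

# Pattern copied from the --fasta parameter definition.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")

for name in ["GRCh38.fa", "genome.fna", "genome.fasta.gz", "genome.fa.gz", "genome.fastq"]:
    print(f"{name:18} {bool(FASTA_PATTERN.match(name))}")
```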
Path to FASTA dictionary file.
string
NB: If none is provided, it will be generated automatically from the FASTA reference.
Path to FASTA reference index.
string
NB: If none is provided, it will be generated automatically from the FASTA reference.
Directory / URL base for iGenomes references.
string
s3://ngi-igenomes/igenomes
Do not load the iGenomes reference config.
boolean
Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.
Path to GTF annotation file.
string
This parameter is mandatory if --genome is not specified.
Path to GFF3 annotation file.
string
This parameter must be specified if neither --genome nor --gtf is specified.
Path to BED file containing exon intervals. This will be created from the GTF file if not specified.
string
Read length
number
151
Specify the read length for the STAR aligner.
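The STAR manual recommends setting --sjdbOverhang to the read length minus 1 (the maximum read length minus 1 for reads of varying length). Assuming the pipeline derives that STAR setting from this parameter, the default of 151 would give an overhang of 150. The Python helper below is a hypothetical illustration of that arithmetic only.

```python
def sjdb_overhang(read_length: int) -> int:
    """STAR's recommended --sjdbOverhang: (maximum) read length minus 1."""
    if read_length < 2:
        raise ValueError("read length must be at least 2")
    return read_length - 1


print(sjdb_overhang(151))  # 150
```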
If generated by the pipeline, save the STAR index in the results directory.
boolean
If the STAR index is generated by the pipeline, use this parameter to save it to your results folder. This index can then be reused in future pipeline runs, reducing processing time.
Path to known indels VCF file
string
Path to known indels index file
string
Path to dbSNP VCF file
string
Path to dbSNP VCF index file
string
snpEff DB version
string
VEP genome
string
If you use AWS iGenomes or a local resource with genomes.conf, this has already been set for you appropriately.
VEP species
string
If you use AWS iGenomes or a local resource with genomes.conf, this has already been set for you appropriately.
VEP cache version
string
Define parameters related to read alignment
Specifies the alignment algorithm to use. Currently, the only available option is 'star'.
string
star
This parameter defines which aligner is used to align the RNA reads to the reference genome. Currently only the STAR aligner is supported, so use 'star' as the value for this option.
Path to STAR index folder or compressed file (tar.gz)
string
This parameter can be used if a pre-built STAR index is available. You can give either the full path to the index directory or a compressed file in tar.gz format.
Enable STAR 2-pass mapping mode.
boolean
This parameter enables STAR to perform 2-pass mapping. Default true.
Do not use the GTF file during the STAR index building step
boolean
Do not use parameter --sjdbGTFfile <GTF file> during the STAR genomeGenerate process.
Option to limit RAM when sorting BAM file. Value to be specified in bytes. If 0, will be set to the genome index size.
integer
This parameter specifies the maximum available RAM (bytes) for sorting BAM during STAR alignment.
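This option mirrors STAR's --limitBAMsortRAM, which takes a plain byte count. The Python helper below (hypothetical, not pipeline code) converts a human-readable size to bytes, assuming binary units where 1 GB means 1024^3 bytes; for example, 30 GB becomes 32212254720.

```python
# Assumption: binary (1024-based) units.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(amount: float, unit: str) -> int:
    """Convert e.g. (30, "GB") to a byte count usable for --limitBAMsortRAM."""
    return int(amount * UNITS[unit.upper()])


print(to_bytes(30, "GB"))  # 32212254720
```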
Specifies the number of genome bins for coordinate-sorting
integer
50
This parameter specifies the number of bins to be used for coordinate sorting during STAR alignment step.
Specifies the maximum number of collapsed junctions
integer
1000000
Sequencing center information to be added to read group of BAM files.
string
This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.
Specify the sequencing platform used
string
illumina
This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.
Where possible, save unaligned reads from aligner to the results directory.
boolean
This may either be in the form of FastQ or BAM files depending on the options available for that particular tool.
Save the intermediate BAM files from the alignment step.
boolean
By default, intermediate BAM files are not saved, to limit storage usage. The final BAM files created after the appropriate filtering step are always saved. Set this parameter to also save the intermediate BAM files.
Create a CSI index for BAM files instead of the traditional BAI index. This is required for genomes with chromosomes longer than 2^29 bp (about 512 Mbp), which the BAI format cannot index.
boolean
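Per the SAM/BAM specification, the BAI binning scheme only covers sequences up to 2^29 bp. The Python sketch below (hypothetical helper, not pipeline code) reads lines from a FASTA .fai index, whose first two tab-separated fields are the sequence name and length, and lists the sequences that would need a CSI index. The example .fai lines use made-up names and values.

```python
# BAI bins cover positions 0 .. 2**29 - 1, i.e. sequences up to 2**29 bp.
BAI_MAX_LENGTH = 2**29


def needs_csi(fai_lines) -> list[str]:
    """Return names of sequences in a .fai index that are too long for BAI."""
    too_long = []
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        if int(length) > BAI_MAX_LENGTH:
            too_long.append(name)
    return too_long


# Illustrative .fai lines: name, length, offset, linebases, linewidth.
fai = [
    "short_chrom\t248956422\t112\t70\t71",
    "long_chrom\t600000000\t252703280\t60\t61",
]
print(needs_csi(fai))  # ['long_chrom']
```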
Specify whether to remove duplicates from the BAM during Picard MarkDuplicates step.
boolean
Set to true to remove duplicates from the BAM file during the Picard MarkDuplicates step.
The minimum phred-scaled confidence threshold at which variants should be called.
number
20
Specify the minimum phred-scaled confidence threshold at which variants should be called.
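A Phred-scaled value Q corresponds to a probability P = 10^(-Q/10), so the default threshold of 20 corresponds to a probability of 0.01 (1 in 100) that the call is wrong. The Python sketch below shows the conversion in both directions; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled score Q to a probability: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


print(phred_to_error_prob(20))
print(error_prob_to_phred(0.001))
```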
Specify which tools RNAvar should use for annotating variants. Values can be 'snpeff', 'vep' or 'merge'. If you specify 'merge', the pipeline runs both snpeff and VEP annotation.
string
List of tools to be used for variant annotation.
This parameter must be a combination of the following values: snpeff, vep, merge.
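The rule that 'merge' runs both snpEff and VEP can be expressed as a small check. This Python sketch is a hypothetical illustration, not pipeline code, and it assumes the tools are given as a comma-separated list of lowercase names.

```python
ALLOWED_TOOLS = {"snpeff", "vep", "merge"}


def parse_annotate_tools(value: str) -> set[str]:
    """Split an (assumed) comma-separated tool list and reject unknown names."""
    tools = {item.strip() for item in value.split(",") if item.strip()}
    unknown = tools - ALLOWED_TOOLS
    if unknown:
        raise ValueError(f"unsupported annotation tool(s): {sorted(unknown)}")
    return tools


def tools_to_run(tools: set[str]) -> set[str]:
    """'merge' implies that both snpEff and VEP annotation run."""
    return tools | {"snpeff", "vep"} if "merge" in tools else set(tools)


print(sorted(tools_to_run(parse_annotate_tools("merge"))))
```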
Enable the use of cache for annotation
boolean
This also disables the snpEff- and VEP-specific containers for annotation.
To be used with --snpeff_cache and/or --vep_cache.
Enable CADD cache.
boolean
Path to CADD InDels file.
string
Path to CADD InDels index.
string
Path to CADD SNVs file.
string
Path to CADD SNVs index.
string
Enable the use of the VEP GeneSplicer plugin.
boolean
Path to snpEff cache
string
To be used with --annotation_cache
Path to VEP cache
string
To be used with --annotation_cache
Define parameters that control the stages in the pipeline
Skip the process of base recalibration steps i.e., GATK BaseRecalibrator and GATK ApplyBQSR.
boolean
This parameter disables the base recalibration step, so an uncalibrated BAM file is used for variant calling.
Skip the process of preparing interval lists for the GATK variant calling step
boolean
This parameter disables the preparation of multiple interval lists for use with the GATK HaplotypeCaller module. Disabling this step is not recommended, as it is required for variant calling to run correctly.
Skip GATK variant filtering
boolean
Set this parameter if you don't want to filter any variants.
Skip variant annotation
boolean
Set this parameter if you don't want to run variant annotation.
Skip MultiQC reports
boolean
This parameter disables all QC reports.
Define parameters of the tools used in the pipeline
Number of parts into which the gene interval list is split in order to run GATK HaplotypeCaller in parallel
integer
25
Set this parameter to decide the number of splits for the gene interval list file.
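The pipeline performs this split with its own tooling. The Python sketch below is only a hypothetical illustration of the idea: it divides half-open (chrom, start, end) intervals into at most N contiguous groups of roughly equal total length, so that each group could be handed to a separate HaplotypeCaller job.

```python
def split_intervals(intervals, n_splits):
    """Greedily split (chrom, start, end) intervals into at most n_splits
    contiguous groups of roughly equal total length (end - start)."""
    if n_splits < 1:
        raise ValueError("n_splits must be >= 1")
    total = sum(end - start for _, start, end in intervals)
    target = total / n_splits
    groups, current, current_len = [], [], 0
    for chrom, start, end in intervals:
        current.append((chrom, start, end))
        current_len += end - start
        # Close the group once it reaches the target, keeping the last
        # group open so the number of groups never exceeds n_splits.
        if current_len >= target and len(groups) < n_splits - 1:
            groups.append(current)
            current, current_len = [], 0
    if current:
        groups.append(current)
    return groups


intervals = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 100), ("chr2", 500, 600)]
for i, group in enumerate(split_intervals(intervals, 2), start=1):
    print(i, group)
```

With fewer intervals than requested splits, as with the default of 25 here, each interval simply ends up in its own group.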
Do not use gene interval file during variant calling
boolean
If set to true, the gene intervals are not used during the variant calling step, so variants are reported from all regions, including non-genic ones. Default is false.
The window size (in bases) in which to evaluate clustered SNPs.
integer
35
This parameter is used by the GATK variant filtration step. It defines the window size (in bases) in which to evaluate clustered SNPs. It has to be used together with the 'cluster' option.
The number of SNPs which make up a cluster. Must be at least 2.
integer
3
This parameter is used by the GATK variant filtration step. It defines the number of SNPs which make up a cluster within a window. Must be at least 2.
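The two options work as a pair: a SNP is flagged when it belongs to a run of 'cluster' consecutive SNPs whose first and last positions lie within 'window' bases of each other. The Python sketch below is a simplified illustration on sorted positions from one contig, using the defaults of 3 and 35; GATK's own implementation may differ in edge cases, and the function name is hypothetical.

```python
def clustered_snps(positions, cluster=3, window=35):
    """Return indices of SNPs in a run of `cluster` consecutive SNPs whose
    span (last position - first position) is at most `window` bases.
    Assumes sorted positions on a single contig."""
    if cluster < 2:
        raise ValueError("cluster must be at least 2")
    flagged = set()
    for i in range(len(positions) - cluster + 1):
        j = i + cluster - 1
        if positions[j] - positions[i] <= window:
            flagged.update(range(i, j + 1))
    return sorted(flagged)


print(clustered_snps([100, 110, 120, 500, 1000]))  # [0, 1, 2]
```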
Value to be used for the FisherStrand (FS) filter
number
30
This parameter defines the value to use for the FisherStrand (FS) filter in the GATK variant-filtering step.
The value should be given as a floating-point number. Default is 30.0.
Value to be used for the QualByDepth (QD) filter
number
2
This parameter defines the value to use for the QualByDepth (QD) filter in the GATK variant-filtering step.
The value should be given as a floating-point number. Default is 2.0.
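These defaults appear to match GATK's published RNA-seq hard-filtering recommendations, where a variant fails when FS > 30.0 or QD < 2.0. The Python sketch below applies those two comparisons to a dict of INFO values. It is an illustration, not the pipeline's code, and treating a missing annotation as passing is an assumption of this sketch.

```python
def failed_filters(info, fs_max=30.0, qd_min=2.0):
    """Return the names of the hard filters a variant fails.

    Mirrors filter expressions "FS > 30.0" and "QD < 2.0".
    Assumption: a missing annotation counts as passing."""
    failed = []
    fs = info.get("FS")
    qd = info.get("QD")
    if fs is not None and fs > fs_max:
        failed.append("FS")
    if qd is not None and qd < qd_min:
        failed.append("QD")
    return failed


print(failed_filters({"FS": 45.2, "QD": 10.0}))  # ['FS']
```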
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
integer
16
Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1
Maximum amount of memory that can be requested for any single job.
string
128.GB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'
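Each process's memory request is limited to this value. The Python sketch below (hypothetical helpers, not pipeline code) validates a value against the schema pattern above, converts it to bytes assuming binary units (1 GB = 1024^3 bytes, an assumption), and caps a request at the maximum. Note that the pattern is case-sensitive, so '8.gb' is rejected.

```python
import re

# Validation pattern copied from the --max_memory definition.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
# Assumption: binary (1024-based) units; no prefix means plain bytes.
FACTORS = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def memory_to_bytes(value: str) -> int:
    """Convert a string such as '8.GB' or '1.5 GB' to bytes."""
    match = MEMORY_PATTERN.match(value)
    if match is None:
        raise ValueError(f"{value!r} does not match {MEMORY_PATTERN.pattern}")
    number = float(NUMBER.match(value).group(0))
    return int(number * FACTORS[match.group(2)])


def cap_memory(requested: str, maximum: str = "128.GB") -> str:
    """Return the request, or the maximum if the request exceeds it."""
    if memory_to_bytes(requested) <= memory_to_bytes(maximum):
        return requested
    return maximum


print(cap_memory("200.GB"))  # 128.GB
```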
Maximum amount of time that can be requested for any single job.
string
240.h
^(\d+\.?\s*(s|m|h|day)\s*)+$
Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
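The time pattern accepts one or more integer-unit pairs using s, m, h or day (for example '240.h' or '1day 2h'). Decimal values such as '1.5h' and plural units such as '2 days' do not match it. The Python sketch below (hypothetical helper, not pipeline code) validates a value with the copied pattern and converts it to seconds.

```python
import re

# Validation pattern copied from the --max_time definition.
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")
TOKEN = re.compile(r"(\d+)\.?\s*(s|m|h|day)")
SECONDS = {"s": 1, "m": 60, "h": 3600, "day": 86400}


def time_to_seconds(value: str) -> int:
    """Convert a string such as '240.h' or '1day 2h' to seconds."""
    if not TIME_PATTERN.match(value):
        raise ValueError(f"{value!r} does not match {TIME_PATTERN.pattern}")
    return sum(int(n) * SECONDS[unit] for n, unit in TOKEN.findall(value))


print(time_to_seconds("240.h"))  # 864000
```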
Less common options for the pipeline, typically set in a config file.
Display help text.
boolean
Method used to save pipeline results to output directory.
string
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
boolean
Custom config file to supply to MultiQC.
string
Directory to keep pipeline Nextflow logs and reports.
string
${params.outdir}/pipeline_info
Boolean whether to validate parameters against the schema at runtime
boolean
true
Show all params when using --help
boolean
Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.
boolean