Usage 1: circRNA quantifcation¶
Basic options¶
usage: CIRIquant [-h] [--config FILE] [-1 MATE1] [-2 MATE2] [-o DIR]
[-p PREFIX] [-t INT] [-a INT] [-l INT] [--ciri3] [-v]
[--version] [-e LOG] [--bed FILE] [--circ FILE] [--tool TOOL]
[--RNaseR FILE] [--bam BAM] [--no-gene] [--no-fsj]
[--bsj-file FILE]
optional arguments:
-h, --help show this help message and exit
--config FILE Config file in YAML format
-1 MATE1, --read1 MATE1
Input mate1 reads (for paired-end data)
-2 MATE2, --read2 MATE2
Input mate2 reads (for paired-end data)
-o DIR, --out DIR Output directory, default: ./
-p PREFIX, --prefix PREFIX
Output sample prefix, default: input sample name
-t INT, --threads INT
Number of CPU threads, default: 4
-a INT, --anchor INT Minimum anchor length for junction alignment, default:
5
-l INT, --library-type INT
Library type, 0: unstranded, 1: read1 match the sense
strand,2: read1 match the antisense strand, default: 0
-v, --verbose Run in debugging mode
--version show program's version number and exit
-e LOG, --log LOG Log file, default: out_dir/prefix.log
--bed FILE bed file for putative circRNAs (optional)
--circ FILE circRNA prediction results from other softwares
--tool TOOL circRNA prediction tool, required if --circ is
provided
--RNaseR FILE CIRIquant result of RNase R sample
--bam BAM hisat2 alignment to reference genome
--no-gene Skip stringtie estimation for gene abundance
--no-fsj Skip FSJ extraction to reduce run time
--bsj-file FILE output BSJ read IDs to file (optional)
NOTE:
For now, –circ and –tool options support results from
CIRI2
/CIRCexplorer2
/DCC
/KNIFE
/MapSplice
/UROBORUS
/circRNA_finder
/find_circ
For tools like
DCC
andcircRNA_finder
, please manually remove duplicated circRNAs with same junction postion but have opposite strands.Gene expression values are needed for normalization, do not use
--no-gene
if you need to run DE analysis afterwards.
Example YAML config¶
A YAML-formated config file is needed for CIRIquant to find software and reference needed.
A valid example of minimal config file:
reference:
fasta: /home/zhangjy/Data/database/hg19.fa
gtf: /home/zhangjy/Data/database/gencode.v19.annotation.gtf
bwa_index: /home/zhangjy/Data/database/hg19/_BWAtmp/hg19
hisat_index: /home/zhangjy/Data/database/hg19/_HISATtmp/hg19
An example of supported config file:
// Example of config file
name: hg19
tools:
bwa: /home/zhangjy/bin/bwa
hisat2: /home/zhangjy/bin/hisat2
stringtie: /home/zhangjy/bin/stringtie
samtools: /home/zhangjy/bin/samtools
reference:
fasta: /home/zhangjy/Data/database/hg19.fa
gtf: /home/zhangjy/Data/database/gencode.v19.annotation.gtf
bwa_index: /home/zhangjy/Data/database/hg19/_BWAtmp/hg19
hisat_index: /home/zhangjy/Data/database/hg19/_HISATtmp/hg19
Key | Description |
---|---|
name | the name of config file (optional) |
bwa | the path of bwa (optional, defaults to bwa in $PATH) |
hisat2 | the path of hisat2 (optional, defaults to hisat2 in $PATH) |
stringtie | the path of stringite (optional, defaults to stringtie in $PATH) |
samtools | the path of samtools , samtools version below 1.3.1 is not supported (optional, defaults to samtools in $PATH) |
fasta | reference genome fasta, a fai index by samtools faidx is also needed under the same directory |
gtf | annotation file of reference genome in GTF/GFF3 format |
bwa_index | prefix of BWA index for reference genome |
hisat_index | prefix of HISAT2 index for reference genome |
Example circRNA bed file¶
For quantification of user-provided circRNAs, a list of junction sites in bed format is required, the 4th column must be in “chrom:start|end” format. For example:
chr1 10000 10099 chr1:10000|10099 . +
chr1 31000 31200 chr1:31000|31200 . -
Example Usage¶
Recommended: Predict circRNAs using CIRI2 (packaged in CIRIquant)¶
CIRIquant -t 4 \
-1 ./test_1.fq.gz \
-2 ./test_2.fq.gz \
--config ./chr1.yml \
-o ./test \
-p test
Quantify circRNAs using provided BED format input¶
CIRIquant -t 4 \
-1 ./test_1.fq.gz \
-2 ./test_2.fq.gz \
--config ./chr1.yml \
-o ./test \
-p test \
--bed your_circRNAs.bed
Quantify circRNAs using results from other tools¶
For example, if you have find_circ
results of predicted circRNAs.
CIRIquant -t 4 \
-1 ./test_1.fq.gz \
-2 ./test_2.fq.gz \
--config ./chr1.yml \
-o ./test \
-p test \
--circ find_circ_results.txt \
--tool find_circ
Output format¶
The main output of CIRIquant is a GTF file, that contains detailed information of BSJ and FSJ reads of circRNAs and annotation of circRNA back-spliced regions in the attribute columns
Description of each columns’s value
column | name | description |
---|---|---|
1 | chrom | chromosome / contig name |
2 | source | CIRIquant |
3 | type | circRNA |
4 | start | 5' back-spliced junction site |
5 | end | 3' back-spliced junction site |
6 | score | CPM of circRNAs (#BSJ / #Mapped reads) |
7 | strand | strand information |
8 | . | . |
9 | attributes | attributes seperated by semicolon |
The attributes containing several pre-defined keys and values:
key | description |
---|---|
circ_id | name of circRNA |
circ_type | circRNA types: exon / intron / intergenic |
bsj | number of bsj reads |
fsj | number of fsj reads |
junc_ratio | circular to linear ratio: 2 * bsj / ( 2 * bsj + fsj) |
rnaser_bsj | number of bsj reads in RNase R data (only when --RNaseR is specificed) |
rnaser_fsj | number of fsj reads in RNase R data (only when --RNaseR is specificed) |
gene_id | ensemble id of host gene |
gene_name | HGNC symbol of host gene |
gene_type | type of host gene in gtf file |