About¶

Manual of CIRI-full v2.0

If you have any questions, please contact

Yi Zheng @ Beijing Institutes of Life Science, Chinese Academy of Sciences.
Email: zhengyi12@mails.ucas.ac.cn

CIRI-full is an accurate, high-throughput approach that uses both back-spliced junction (BSJ) and reverse overlap (RO) features to reconstruct and quantify full-length circular RNAs from RNA-seq data sets. In CIRI-full, the BSJ feature is used to detect cirexons and to determine the boundaries of circRNAs. The RO feature, deduced from the overlapping sequence of paired-end reads, is used to explore the detailed landscape within the boundary sites. The alignments of both BSJ and RO merged reads are visualized, and the relative abundance of isoforms within one circRNA is estimated from the coverage and splicing events of BSJ and RO merged reads.

Installation¶

CIRI-full is developed in Java and runs on any system that has a Java SE Runtime Environment. It requires:

bwa:		A read mapping tool, which generates SAM file for CIRI-full, CIRI & CIRI-AS https://sourceforge.net/projects/bio-bwa/files/
CIRI2:		A circRNA detection tool  https://sourceforge.net/projects/ciri/
CIRI-AS:	A tool to detect cirexon and alternative splicing events in circRNAs https://sourceforge.net/projects/ciri/

CIRI2 and CIRI-AS are already packaged with the CIRI-full software.

After downloading the CIRI-full package, you can extract it by typing:

unzip CIRI-full.zip
cd CIRI-full

Preparation for running CIRI-full¶

Before running CIRI-full, you need to run CIRI and CIRI-AS to detect circRNAs and their associated BSJs and cirexons from your sequence data.

Here is a recommended protocol for running CIRI and CIRI-AS:

# Index the reference genome:
bwa index -a bwtsw reference.fa
# Split mapping using bwa-mem:
bwa mem -T 19 -t number_thread reference.fa read_1.fq read_2.fq > read.sam 
# Run CIRI and CIRI-AS:
perl CIRI.pl -I read.sam -O prefix.ciri -F reference.fa -A annotation.gtf -T number_thread
perl CIRI_AS.pl -S read.sam -C prefix.ciri -F reference.fa -A annotation.gtf -O prefix -D yes

For detailed instructions on the above tools, please read the manuals of bwa, CIRI and CIRI-AS.

Running CIRI-full pipeline¶

The CIRI-full Pipeline module is an automatic pipeline for detecting and reconstructing circRNAs. It runs the CIRI, CIRI-AS and CIRI-full tools and finally generates the reconstructed full-length circRNA sequences and the annotation of all identified circRNAs.

Before running the Pipeline module, please make sure that bwa is added to $PATH.

The Pipeline module runs from a command line as follows:

java -jar CIRI-full.jar Pipeline [options]

Options:

-1	reads1 of paired-end reads (required, equal length, fastq or fastq.gz format)
-2	reads2 of paired-end reads (required, equal length, fastq or fastq.gz format)
-r	reference genome in fasta format, the same file used in preparation step when building bwa index (required). 
-a	annotation file of reference genome in GTF format (optional).
-o	prefix of output files (optional, default: out)
-d	directory of output files (required)
-t	number of threads used in CIRI and bwa mem (optional, default: 1)	
-0	output all circRNAs including those with only one BSJ read support (optional, option for CIRI)

Four folders will be created under the directory set by the -d option: CIRI_output/, CIRI-AS_output/, CIRI-full_output/ and sam/. They contain the output files of CIRI, CIRI-AS, CIRI-full and bwa, respectively.

For detailed information on these files, please refer to the following sections.
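
The Pipeline options listed above can also be assembled programmatically. The following is a minimal Python sketch and is not part of CIRI-full: the helper name build_pipeline_cmd and the file names are hypothetical, the jar is assumed to be CIRI-full.jar in the working directory, and only the flags documented above are used.

```python
import shlex

def build_pipeline_cmd(reads1, reads2, reference, out_dir,
                       annotation=None, prefix=None, threads=None,
                       keep_single_bsj=False, jar="CIRI-full.jar"):
    """Assemble the argument list for the Pipeline module (hypothetical helper)."""
    # Required options: -1, -2, -r and -d.
    cmd = ["java", "-jar", jar, "Pipeline",
           "-1", reads1, "-2", reads2, "-r", reference, "-d", out_dir]
    # Optional options are appended only when a value is given.
    if annotation is not None:
        cmd += ["-a", annotation]
    if prefix is not None:
        cmd += ["-o", prefix]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if keep_single_bsj:
        cmd.append("-0")
    return cmd

# File names below are placeholders for illustration.
cmd = build_pipeline_cmd("test_1.fq.gz", "test_2.fq.gz", "test_ref.fa",
                         "test_output/", annotation="test_anno.gtf",
                         prefix="test", threads=4)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) once bwa is available on $PATH.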

Running CIRI-full step-by-step¶

CIRI-full includes three modules: RO1, RO2 and Merge. They should be run sequentially in that order.

The RO1 module¶

This module identifies the 5’-RO feature in paired-end reads from an RNA-seq data set and then merges the RO-containing read pairs into long single-end reads.

The RO1 module runs from a command line as follows:

java -jar CIRI-full.jar RO1 [options]

Options:

-1	read1 of paired-end reads (required, equal length)
-2	read2 of paired-end reads (required, equal length)
-o	prefix of output files (optional, default: out)
-minM	minimum 5’-RO length (optional, integer, default 13)
-minI	minimum identity percentage of the 5’-RO alignment (optional, default 95)

The RO1 module generates two output files:

prefix_ro1_align.txt
prefix_ro1.fq

Description of prefix_ro1_align.txt:

Each line gives the alignment information of one read pair that contains a 5’-RO feature. The columns are:

#read_id
#alignment_identity
#start_position_on_read1
#end_position_on_read1
#start_position_on_read2
#end_position_on_read2
#read_length
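
To show how these columns might be consumed, here is a minimal Python sketch. It assumes one read pair per line, whitespace-separated fields in the order listed above, and that any line starting with "#" is a header; none of these details are confirmed by this manual. The record type Ro1Alignment, the function parse_ro1_align and the sample line are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple

class Ro1Alignment(NamedTuple):
    read_id: str
    identity: float
    start1: int
    end1: int
    start2: int
    end2: int
    read_length: int

def parse_ro1_align(lines: Iterable[str]) -> Iterator[Ro1Alignment]:
    """Yield one record per non-empty, non-header line (assumed layout)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        yield Ro1Alignment(f[0], float(f[1]), *(int(x) for x in f[2:7]))

# Made-up example line, not real CIRI-full output.
example = ["read_001\t97.5\t1\t40\t61\t100\t100"]
rec = next(parse_ro1_align(example))
print(rec.read_id, rec.identity, rec.read_length)
```

With a real file, the same function could be called as parse_ro1_align(open("prefix_ro1_align.txt")).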

Description of prefix_ro1.fq:

Read pairs with a 5’-RO feature are merged into long sequences in FASTQ format. These sequences are taken as candidate RO merged reads and are filtered in the following steps.

The RO2 module¶

The RO2 module analyzes the alignment results of candidate RO merged reads and selects the authentic ones for reconstructing full-length circRNAs.

Data preparation before running the RO2 module:

The RO2 module filters RO merged reads based on the SAM file generated by bwa-mem.

A recommended protocol for running bwa-mem:

bwa index -a bwtsw reference.fa
bwa mem -T 19 reference.fa prefix_ro1.fq > prefix_ro1.sam

Note that prefix_ro1.fq is the output file of the previous step (the RO1 module).

The RO2 module runs from a command line as follows:

java -jar CIRI-full.jar RO2 [options]

Options:

-r	reference genome in fasta format, the same file used in the preparation step when building bwa index (required). 
-s	SAM alignment of prefix_ro1.fq generated by bwa mem (required).
-l	the read length of the given paired-end RNA-seq data (required).
-range	maximum spanning distance of circRNAs on the reference (optional, integer, default 100000).
-o	prefix of output files (required)

The RO2 module generates the following output files:

prefix_ro2.sam
prefix_ro2_info.list

Description of prefix_ro2.sam:

This file is the SAM alignment of authentic RO reads.

Description of prefix_ro2_info.list:

This file gives the detailed alignment information of authentic RO reads.

Columns are separated by tabs:

#Read_ID
#Chr
#BSJ_position
#Strand
#Reconstructed_state
#Cirexon
#Mapping_order
#Splice_site_state+
#Splice_site_state-

#Splice_site_state+/- gives the deviation of the mapping boundary from the GT/AG splice site: -1 indicates that no GT/AG splice site can be detected on the current strand, and a positive value gives the distance between the GT/AG splice site and the split-mapping position.
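
The column layout and the meaning of the splice site state can be illustrated with a short Python sketch. It assumes a tab-separated line with the nine columns in the order listed above. The function names and the sample line are hypothetical, and the field values in the sample are placeholders rather than real CIRI-full output.

```python
RO2_COLUMNS = [
    "Read_ID", "Chr", "BSJ_position", "Strand", "Reconstructed_state",
    "Cirexon", "Mapping_order", "Splice_site_state+", "Splice_site_state-",
]

def parse_ro2_info(line):
    """Map one tab-separated line onto the documented column names."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(RO2_COLUMNS):
        raise ValueError(f"expected {len(RO2_COLUMNS)} columns, got {len(fields)}")
    return dict(zip(RO2_COLUMNS, fields))

def splice_distance(value):
    """Return None when no GT/AG site was detected (-1), else the distance."""
    distance = int(value)
    return None if distance == -1 else distance

# Placeholder values only; the real field formats may differ.
sample = "read_001\tchr1\tbsj\t+\tstate\tcirexons\torder\t2\t-1"
rec = parse_ro2_info(sample)
print(splice_distance(rec["Splice_site_state+"]),
      splice_distance(rec["Splice_site_state-"]))
```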

The Merge module¶

The Merge module combines the results of RO2 and CIRI-AS to reconstruct full-length circRNAs.

The Merge module runs from a command line as follows:

java -jar CIRI-full.jar Merge [options]

Options:

-a	annotation file of the reference genome in GTF format (optional).
-c	output file of CIRI (required)
-as	output_all file generated by CIRI-AS when run with the -D yes argument; this file has the suffix “_jav.list” (required)
-ro	RO read information file (prefix_ro2_info.list) generated by the RO2 module (required)
-o	prefix of output files (required)
-r	reference genome file (in FASTA format) (required)

The Merge module will generate three output files; the one described in this manual is:

prefix_merge_circRNA_detail.anno

Description of prefix_merge_circRNA_detail.anno:

This file contains mapping information of BSJ reads (detected by CIRI) and RO merged-reads (detected by RO). Reads are clustered according to the BSJ position. Columns are separated by tabs:

#BSJ
#Chr
#Start
#End
#GTF-annotated_exon
#Cirexon
#Coveage
#BSJ_reads_information
#RO_reads_information
#Original_gene

Running CIRI-vis¶

CIRI-vis is a tool for visualizing alignments of BSJ and RO merged reads and for estimating the relative abundance of isoforms from the output of CIRI-full (prefix_merge_circRNA_detail.anno) or CIRI-AS (prefix_jav.list).

CIRI-vis.jar runs from a command line as follows:

java -jar CIRI-vis.jar [Options]

Options:

-i		The path of the CIRI-vis input file. (required)
-l		The path of the library length file. (required for isoform quantification)
-r		The path of the reference genome sequence in FASTA format. (required for outputting circRNA sequences)
-list		The list of chosen circRNA BSJs. (optional)
-d		The output directory. Default: currentdir/stdir
-max	The maximum expression (number of BSJ reads) of circRNAs displayed by CIRI-vis. Default 999999999
-min	The minimum expression (number of BSJ reads) of circRNAs displayed by CIRI-vis. Default 10. **Note: please use only one of -min, -exp and -rank**
-rank	Only display the top X% of circRNAs ranked by expression.
-exp		Only display the most highly expressed circRNAs that together contain X% of BSJ reads.
-iso		The maximum number of isoforms considered. Default 10. A high value makes the quantification slower.
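
Because only one of -min, -exp and -rank should be given, a wrapper script may want to check this before launching CIRI-vis. The following Python sketch is one way to do so. The helper build_vis_cmd and the file names are hypothetical, the jar is assumed to be CIRI-vis.jar in the working directory, and only flags listed above are used.

```python
def build_vis_cmd(input_file, out_dir=None, library_length=None,
                  reference=None, min_reads=None, rank=None, exp=None,
                  jar="CIRI-vis.jar"):
    """Assemble a CIRI-vis argument list (hypothetical helper)."""
    filters = [("-min", min_reads), ("-rank", rank), ("-exp", exp)]
    chosen = [(flag, value) for flag, value in filters if value is not None]
    # The manual asks for at most one of the three expression filters.
    if len(chosen) > 1:
        used = ", ".join(flag for flag, _ in chosen)
        raise ValueError(f"use only one of -min, -exp, -rank (got {used})")
    cmd = ["java", "-jar", jar, "-i", input_file]
    for flag, value in (("-l", library_length), ("-r", reference), ("-d", out_dir)):
        if value is not None:
            cmd += [flag, str(value)]
    for flag, value in chosen:
        cmd += [flag, str(value)]
    return cmd

# Placeholder file and folder names.
cmd = build_vis_cmd("test_merge_circRNA_detail.anno", out_dir="vis_out", min_reads=1)
print(" ".join(cmd))
```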

CIRI-vis outputs a set of PDF files, a “.list” file and a “.fa” file (if a reference genome file is available) in a newly created folder (set by the “-d” parameter):

Each PDF file displays the circRNA isoforms of one BSJ.
The “.list” file shows detailed information on each isoform.
The “.fa” file contains the sequences of fully reconstructed circRNA isoforms.

Description of prefix.list:

This file gives the detailed information of circRNA isoforms. Columns are separated by tabs:

Columns	Description
1	The name of pdf file.
2	ID of the BSJ position of circRNA isoform in the pattern of "chr:start
3	chromosome of the predicted circRNA isoform
4	start locus of the predicted circRNA isoform on the chromosome
5	end locus of the predicted circRNA isoform on the chromosome
6	circular junction read (also called back-spliced junction read) count of the predicted circRNA
7	the serial number of the isoform within the circRNA
8	the estimated BSJ read count of this predicted isoform.
9	the minimum length of this predicted isoform.
10	whether this predicted isoform is fully reconstructed.
11	The cirexon positions in this predicted isoform; “0-0” represents a breakpoint during reconstruction.
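
As an illustration of how columns 6 and 8 relate, the following Python sketch computes the share of a circRNA's BSJ reads assigned to one isoform. It assumes a tab-separated line with the eleven columns above. The function name and the sample line are hypothetical, and the values in columns 2, 10 and 11 of the sample are placeholders.

```python
def isoform_fraction(line):
    """Return (isoform serial number, estimated share of the circRNA's BSJ reads)."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}")
    circ_bsj = float(f[5])   # column 6: BSJ read count of the circRNA
    serial = int(f[6])       # column 7: serial number of the isoform
    iso_bsj = float(f[7])    # column 8: estimated BSJ read count of the isoform
    fraction = iso_bsj / circ_bsj if circ_bsj else 0.0
    return serial, fraction

# Made-up line, not real CIRI-vis output.
sample = "img1.pdf\tbsj_id\tchr10\t74474869\t74475660\t20\t1\t15\t300\tflag\tcirexons"
serial, fraction = isoform_fraction(sample)
print(serial, fraction)
```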

Description of prefix.fa:

This FASTA format file is generated if the reference genome sequence is available. It contains the sequences of fully reconstructed isoforms, which are named in this format:

>(Image_name)#(BSJ) length=(isoform_length) (isoform_BSJ_read_count)/(circRNA_BSJ_read_count)
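
A header of this form can be split into its parts with a regular expression. The Python sketch below assumes that the parentheses in the pattern above are placeholders rather than literal characters, that the BSJ contains no spaces, and that the fields are separated by single spaces. The function name and the example header are hypothetical.

```python
import re

HEADER_RE = re.compile(
    r"^>(?P<image>[^#]+)#(?P<bsj>\S+) length=(?P<length>\d+) "
    r"(?P<iso_reads>[\d.]+)/(?P<circ_reads>[\d.]+)$"
)

def parse_header(header):
    """Split a prefix.fa header line into its named parts (assumed layout)."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError(f"unexpected header: {header!r}")
    return {
        "image": m["image"],
        "bsj": m["bsj"],
        "length": int(m["length"]),
        "iso_reads": float(m["iso_reads"]),
        "circ_reads": float(m["circ_reads"]),
    }

# Made-up header for illustration.
info = parse_header(">img1#chr10:74474869|74475660 length=300 15/20")
print(info)
```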

If you want to display only a subset of circRNAs, please use the “-list” parameter to give CIRI-vis a list of BSJ positions. The format should be like:

chr10:74474869|74475660
chr8:141856359|141900868
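
Entries in this chr:start|end form can be generated or checked with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, and it assumes one BSJ per line with integer coordinates.

```python
def format_bsj(chrom, start, end):
    """Format one BSJ as chr:start|end, as in the example list above."""
    return f"{chrom}:{start}|{end}"

def parse_bsj(text):
    """Split chr:start|end back into (chromosome, start, end)."""
    chrom, coords = text.strip().rsplit(":", 1)
    start, end = coords.split("|")
    return chrom, int(start), int(end)

chosen = [("chr10", 74474869, 74475660), ("chr8", 141856359, 141900868)]
content = "\n".join(format_bsj(*bsj) for bsj in chosen) + "\n"
print(content, end="")
```

The content string can then be written to a file and passed to CIRI-vis with “-list”.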

Notes:

If you ran the CIRI-full Pipeline in the previous step, the input file will be named prefix_merge_circRNA_detail.anno under the CIRI-full_output folder.
If you only ran CIRI-AS with the ‘-D yes’ parameter in the previous step, the input file will be named prefix_jav.list under your CIRI-AS output folder.
The library length file is necessary for isoform expression estimation. It will be named prefix_library_length.list under your CIRI-AS output folder.

How to run the test data set using CIRI-full¶

Test data sets (FASTQ files, annotation file and reference sequence) are packaged with the CIRI-full software and can be found in the “CIRI-full_test/” folder. Temporary and final results are given in the “CIRI-full/test_output/” folder.

Here are the commands for running the test data sets:

cd CIRI-full_v2.0/CIRI-full_test/
bwa index test_ref.fa
java -jar ../CIRI-full.jar Pipeline -1 test_1.fq.gz -2 test_2.fq.gz -a test_anno.gtf -r test_ref.fa -d test_output/ -o test
unset DISPLAY
java -jar ../CIRI-vis.jar -i test_output/CIRI-full_output/test_merge_circRNA_detail.anno -l ../CIRI-vis_test/test_library_length.list -r test_ref.fa -d test_output/CIRI-vis_out -min 1

If you want to run CIRI-full step by step, you can use the following commands:

cd CIRI-full_v2.0/CIRI-full_test/
mkdir test_output
bwa index test_ref.fa
bwa mem -T 19 test_ref.fa test_1.fq.gz test_2.fq.gz > test_output/test.sam 
perl ../bin/CIRI2.pl -I test_output/test.sam -O test_output/test.ciri -F test_ref.fa -A test_anno.gtf 
perl ../bin/CIRI_AS_v1.2.pl -S test_output/test.sam -C test_output/test.ciri -F test_ref.fa -A test_anno.gtf -O test_output/test -D yes
java -jar ../CIRI-full.jar RO1 -1 test_1.fq.gz -2 test_2.fq.gz -o test_output/test 
bwa mem -T 19 test_ref.fa test_output/test_ro1.fq > test_output/test_ro1.sam
java -jar ../CIRI-full.jar RO2 -r test_ref.fa -s test_output/test_ro1.sam -l 250 -o test_output/test
java -jar ../CIRI-full.jar Merge -c test_output/test.ciri -as test_output/test_jav.list -ro test_output/test_ro2_info.list -a test_anno.gtf -r test_ref.fa -o test_output/test
unset DISPLAY
java -jar ../CIRI-vis.jar -i test_output/test_merge_circRNA_detail.anno -l ../CIRI-vis_test/test_library_length.list -r test_ref.fa -min 1

Note: Please make sure you are using exactly the same version of genomic sequences and their annotations when running CIRI-full.