Usage 3: Differential expression analysis

Study without biological replicates

For samples without replicates, differential expression and differential splicing analysis is performed using CIRI_DE.

Usage:
  CIRI_DE [options] -n <control> -c <case> -o <out>

  <control>         CIRIquant result of control sample
  <case>            CIRIquant result of treatment cases
  <out>             Output file

Options (defaults in parentheses):

  -p                p value threshold for DE and DS score calculation (default: 0.05)
  -t                number of threads (default: 4)

Example usage:
  CIRI_DE -n control.gtf -c case.gtf -o CIRI_DE.tsv

The output of CIRI_DE has the following format:

  column  name        description
  1       circRNA_ID  circRNA identifier
  2       Case_BSJ    number of BSJ reads in case
  3       Case_FSJ    number of FSJ reads in case
  4       Case_Ratio  junction ratio in case
  5       Ctrl_BSJ    number of BSJ reads in control
  6       Ctrl_FSJ    number of FSJ reads in control
  7       Ctrl_Ratio  junction ratio in control
  8       DE_score    differential expression score
  9       DS_score    differential splicing score

Study with biological replicates

For studies with biological replicates, a customized edgeR analysis pipeline is recommended. We provide prep_CIRIquant to generate matrices of circRNA expression levels and junction ratios, and CIRI_DE_replicate for DE analysis.

Step1: Prepare CIRIquant output files

Provide a text file listing the sample information and the paths to the CIRIquant output GTF files:

CONTROL1 ./c1/c1.gtf C 1
CONTROL2 ./c2/c2.gtf C 2
CONTROL3 ./c3/c3.gtf C 3
CASE1 ./t1/t1.gtf T 1
CASE2 ./t2/t2.gtf T 2
CASE3 ./t3/t3.gtf T 3

The first three columns are required. For paired samples, you can also add a fourth column with the subject name.

  column  description
  1       sample name
  2       path to CIRIquant output gtf
  3       group ("C" for control, "T" for treatment)
  4       subject (optional, only for paired samples)

Note: If you plan to use CIRI_DE for differential expression analysis, the group name in column 3 must be either "C" or "T".
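
The column rules above can be checked before running prep_CIRIquant. The following is a minimal Python sketch and is not part of CIRIquant: the function name parse_sample_list is hypothetical, and it assumes that fields are separated by whitespace and that paths contain no spaces.

```python
def parse_sample_list(text):
    """Parse sample-list text and validate each row.

    Assumptions (not taken from CIRIquant itself): whitespace-separated
    fields, 3 or 4 columns per row, the same column count on every row.
    """
    samples = []
    n_cols = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (3, 4):
            raise ValueError(
                f"line {line_no}: expected 3 or 4 columns, got {len(fields)}"
            )
        if n_cols is None:
            n_cols = len(fields)
        elif len(fields) != n_cols:
            raise ValueError(f"line {line_no}: inconsistent number of columns")
        name, path, group = fields[:3]
        if group not in ("C", "T"):
            raise ValueError(
                f"line {line_no}: group must be 'C' or 'T', got {group!r}"
            )
        # The subject column is only present for paired samples.
        subject = fields[3] if len(fields) == 4 else None
        samples.append(
            {"name": name, "path": path, "group": group, "subject": subject}
        )
    return samples


if __name__ == "__main__":
    demo = "CONTROL1 ./c1/c1.gtf C 1\nCASE1 ./t1/t1.gtf T 1\n"
    for sample in parse_sample_list(demo):
        print(sample)
```

A row with any group label other than "C" or "T" raises a ValueError that names the offending line.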

Then, run prep_CIRIquant to summarize the circRNA expression profiles of all samples:

Usage:
  prep_CIRIquant [options]

  -i                the file of sample list
  --lib             where to output library information
  --circ            where to output circRNA annotation information
  --bsj             where to output the circRNA expression matrix
  --ratio           where to output the circRNA junction ratio matrix

Example:
  prep_CIRIquant -i sample.lst \
                 --lib library_info.csv \
                 --circ circRNA_info.csv \
                 --bsj circRNA_bsj.csv \
                 --ratio circRNA_ratio.csv

These count matrices (CSV files) can then be imported into R for use by DESeq2 and edgeR (using the DESeqDataSetFromMatrix and DGEList functions, respectively).

Step2: Prepare StringTie output

The output of StringTie should be located at output_dir/gene/prefix_out.gtf. Use prepDE.py from StringTie to generate the gene count matrix for normalization.

For example, you can provide a text file sample_gene.lst containing the sample IDs and the paths to the StringTie outputs:

CONTROL1 ./c1/gene/c1_out.gtf
CONTROL2 ./c2/gene/c2_out.gtf
CONTROL3 ./c3/gene/c3_out.gtf
CASE1 ./t1/gene/t1_out.gtf
CASE2 ./t2/gene/t2_out.gtf
CASE3 ./t3/gene/t3_out.gtf

Then, run prepDE.py -i sample_gene.lst and use the gene_count_matrix.csv file generated in the current working directory for further analysis.
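
If the StringTie outputs follow the output_dir/gene/prefix_out.gtf layout described above, sample_gene.lst can be derived from the CIRIquant sample list rather than written by hand. The sketch below is a hypothetical helper, not a CIRIquant or StringTie tool. It assumes that each CIRIquant GTF sits directly in its output directory and that the prefix equals the GTF file name without its extension, as in the example paths shown in this section.

```python
import posixpath


def stringtie_path(ciriquant_gtf):
    """Map e.g. ./c1/c1.gtf to ./c1/gene/c1_out.gtf (assumed layout)."""
    out_dir, filename = posixpath.split(ciriquant_gtf)
    prefix, _ext = posixpath.splitext(filename)
    return posixpath.join(out_dir, "gene", prefix + "_out.gtf")


def build_gene_list(sample_list_text):
    """Return sample_gene.lst content built from the CIRIquant sample list."""
    lines = []
    for line in sample_list_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        sample_name, gtf_path = fields[0], fields[1]
        lines.append(f"{sample_name} {stringtie_path(gtf_path)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = "CONTROL1 ./c1/c1.gtf C 1\nCASE1 ./t1/t1.gtf T 1\n"
    print(build_gene_list(demo), end="")
```

Check that the generated paths exist before passing the list to prepDE.py, since the layout is only assumed here.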

Step3: Differential expression analysis

For differential analysis using CIRI_DE_replicate, you need to install an R environment and the edgeR package from Bioconductor.

usage: CIRI_DE_replicate [-h] --lib FILE --bsj FILE --gene FILE --out
                              FILE --out2 FILE

optional arguments:
  -h, --help   show this help message and exit
  --lib FILE   library information
  --bsj FILE   circRNA expression matrix
  --gene FILE  gene expression matrix
  --out FILE   output result of circRNA differential expression analysis
  --out2 FILE  output result of gene differential expression analysis

Example:
  CIRI_DE_replicate \
          --lib  library_info.csv \
          --bsj  circRNA_bsj.csv \
          --gene gene_count_matrix.csv \
          --out  circRNA_de.tsv \
          --out2 gene_de.tsv

Please note that the output results are unfiltered. You can apply a more stringent filter on expression values to obtain more reliable results.
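
One possible expression filter is to keep only circRNAs that are supported by a minimum number of BSJ reads in a minimum number of samples, and then restrict the DE results to those IDs. The following Python sketch illustrates the idea under stated assumptions: the function name expressed_ids and the thresholds min_reads and min_samples are hypothetical, and the BSJ matrix is assumed to be a CSV with a header row, the circRNA ID in the first column, and one count column per sample. Verify this layout against your own circRNA_bsj.csv before relying on it.

```python
import csv
import io


def expressed_ids(bsj_csv, min_reads=2, min_samples=2):
    """Return IDs with >= min_reads BSJ reads in >= min_samples samples.

    bsj_csv is an open text file (or any iterable of CSV lines).
    The thresholds are illustrative defaults, not CIRIquant recommendations.
    """
    reader = csv.reader(bsj_csv)
    next(reader, None)  # skip the header row (assumed to be present)
    keep = []
    for row in reader:
        if not row:
            continue  # skip empty lines
        circ_id = row[0]
        counts = [float(value) for value in row[1:]]
        n_supported = sum(count >= min_reads for count in counts)
        if n_supported >= min_samples:
            keep.append(circ_id)
    return keep


if __name__ == "__main__":
    # Made-up toy matrix for demonstration only.
    demo = io.StringIO(
        "circ_id,CONTROL1,CONTROL2,CASE1\n"
        "circA,5,3,0\n"
        "circB,1,0,0\n"
    )
    print(expressed_ids(demo))
```

The returned IDs can then be used to subset the rows of circRNA_de.tsv. Suitable thresholds depend on sequencing depth and the number of replicates.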