Informatics for RNA-sequence Analysis (2013)

 

Course Objectives

High-throughput sequencing of RNA libraries (RNA-seq) has become increasingly common and has largely supplanted gene microarrays for transcriptome profiling. When processed appropriately, RNA-seq data has the potential to provide a considerably more detailed view of the transcriptome. The CBW has developed a 2-day course that provides an introduction to RNA-seq data analysis, followed by integrated tutorials demonstrating the use of popular RNA-seq analysis packages. The tutorials are designed as self-contained units that include example data (Illumina paired-end RNA-seq data) and detailed instructions for installing all required bioinformatics tools (TopHat, Cufflinks, etc.).
Participants will gain practical experience and skills to be able to:
  • Align RNA-seq data to a reference genome
  • Estimate known gene and transcript expression
  • Perform differential expression analysis
  • Detect expressed gene fusions
  • Discover novel isoforms
  • Visualize and summarize the output of RNA-seq analyses

Target Audience

Graduates, postgraduates and PIs working with or about to embark on an analysis of RNA-seq data. Attendees may be familiar with some aspect of RNA-seq analysis (e.g. gene expression analysis) or have no direct experience.

Prerequisites for attendance

Basic familiarity with a Linux environment and with S, R, or Matlab. Participants must be able to complete and understand the following simple Linux and R tutorials before attending:

Course Outline

Day 1

Module 1 - Introduction to RNA sequencing (2013) (Faculty: Malachi Griffith & Obi Griffith)
  • Overview of course
  • Basic introduction to the technology of RNA sequencing (RNA-seq)
  • Experimental design considerations
  • Commonly asked questions
  • Brief introduction to “cloud computing” prior to first tutorial
Lab Exercise:
  • Examine and understand the format of raw FastQ files
  • Obtain reference genomes (fasta) and gene annotation resources (GTF/GFF)
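For orientation on the FastQ format examined in this exercise, the following is a minimal Python sketch and is not part of the course materials. It assumes the common four-line record layout (header starting with "@", sequence, separator starting with "+", quality string) and Phred+33 quality encoding, which is an assumption about the data. The names `FastqRecord`, `read_fastq`, and `phred_scores` are hypothetical helpers introduced only for illustration.

```python
from io import StringIO
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # header line without the leading "@"
    sequence: str   # base calls
    quality: str    # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line) stops the parse
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


# A made-up single record, used only to exercise the functions above.
example = "@read1/1\nACGTN\n+\nIIII#\n"
records = list(read_fastq(StringIO(example)))
scores = phred_scores(records[0].quality)
print(records[0].name, records[0].sequence, scores)
```

In paired-end data, the two mates of a fragment are usually stored as matching records in two separate files, so the same reader could be applied to each file in turn.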
Module 2 - RNA-seq alignment and visualization (2013) (Faculty: Malachi Griffith)
  • Use of Bowtie/TopHat
  • ‘Regular mode’ vs. ‘Fusion mode’
  • Introduction to the BAM format
  • Basic manipulation of BAMs with samtools, etc.
  • Visualization of RNA-seq alignments - IGV
  • BAM read counting and determination of variant allele expression status
Lab Exercise:
  • Run Bowtie2/TopHat2 with parameters suitable for gene expression analysis
  • Use samtools to demonstrate the features of the SAM/BAM format and basic manipulation of these alignment files (view, sort, index, manipulate headers, extract data, etc.)
  • Use IGV to visualize TopHat2 alignments, view a variant position, load exon junctions file, etc.
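One feature of the SAM/BAM format that often needs explanation is the bitwise FLAG field stored in the second column of each alignment record. The Python sketch below is an illustration only and is not taken from the course materials. It decodes a FLAG value into the bit meanings defined by the SAM specification; the function name `decode_sam_flag` and the wording of the descriptions are assumptions made for this example.

```python
from typing import List

# Bit values as defined for the FLAG field in the SAM specification.
# The description strings are paraphrased for this example.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_sam_flag(flag: int) -> List[str]:
    """Return the descriptions of all bits set in a SAM FLAG value."""
    if not 0 <= flag < 0x1000:
        raise ValueError(f"FLAG out of range: {flag}")
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# 99 = 1 + 2 + 32 + 64, a value commonly seen for the first mate
# of a properly paired fragment.
example_flags = decode_sam_flag(99)
print(example_flags)
```

The same bits are what samtools view filters on when reads are included or excluded by flag.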
Module 3 - Expression and Differential Expression (2013) (Faculty: Obi Griffith)
This module is focused on known genes and transcripts. In this module we will:
  • Get FPKM style expression estimates using Cufflinks
  • Perform differential expression analysis with Cuffdiff
  • Perform summary analysis with CummeRbund
Downstream interpretation of expression analysis (multiple testing, clustering, heatmaps, classification, pathway analysis, etc.) will also be discussed.
Lab Exercise:
  • Run Cufflinks, Cuffdiff, and CummeRbund
  • Explore the output of these in R
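As background for the FPKM values explored in this module, the sketch below shows the basic arithmetic behind "fragments per kilobase of transcript per million mapped fragments". It is a simplified illustration, not Cufflinks' actual method: Cufflinks estimates abundances with a statistical model that assigns fragments across isoforms, and the plain formula here ignores those refinements. The function name `fpkm` and the example numbers are made up for illustration.

```python
def fpkm(fragments: float, transcript_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Simplified FPKM: fragments per kilobase per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("Length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript
# in a library of 25 million mapped fragments.
example_value = fpkm(500, 2000, 25_000_000)
print(example_value)
```

Because the value is normalized by both transcript length and library size, it allows rough comparison of expression between transcripts of different lengths and between libraries of different depths.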

Day 2

Module 4 - Gene fusion discovery (2013) (Faculty: Obi Griffith)
This module will use TopHat-Fusion to detect gene fusions and will cover strategies for visualizing, annotating, and validating them.
Lab Exercise:
  • Identify gene fusions and create visualizations of them
Module 5 - Isoform discovery and alternative expression (2013) (Faculty: Malachi Griffith)
Explore the use of Cufflinks in reference annotation based transcript (RABT) assembly mode and in ‘de novo’ assembly mode. Both modes require a reference genome sequence.
Lab Exercise:
  • Run Cufflinks in alternate modes more conducive to isoform discovery and explore the results