Exploratory Analysis of Biological Data using R (2013)

Course Objectives

Before we can apply rigorous statistical tools to research data, we often need to approach our data intuitively and look for meaningful associations, surprising patterns, or irregularities in order to formulate hypotheses. This is commonly referred to as Exploratory Data Analysis (EDA). This workshop introduces the essential tools and strategies that are available through the free statistical workbench R. Participants should be able to modify the scripts and protocols we discuss for their own research tasks, identify potential problems with their own data, and define their statistics needs for cases in which expert advice is required. Case studies with common research scenarios, such as microarray data and flow cytometry, will emphasize practical skills. Writing your own R functions and analysis scripts will be introduced at the beginning of the workshop, and these skills will be built on gradually over the course of the lectures. Plotting and visualization are key elements of EDA, and we will gradually build skills, from the elementary built-in routines and their (sometimes bewildering) array of parameters to sophisticated, publication-ready presentations.

Target Audience

Graduates, postgraduates and PIs who need to design and execute strategies for data analysis but have little or no formal prior training in statistics and/or familiarity with the R statistical workbench.

Prerequisites:

Your own laptop with R installed. If you do not have access to a laptop, you may borrow one from CBW. Please contact course_info@bioinformatics.ca for more information.
Completion of an online tutorial on the installation and basic use of R before the workshop.

Pre-Readings:

You need to complete our introductory R tutorial for the course beforehand. The tutorial is very accessible and designed for students who have never used R before. Please navigate to: http://www.biochemistry.utoronto.ca/steipe/R

Course Outline

Day 1

Module 1: The R Landscape (2013) (Faculty: Boris Steipe)

An overview of R's capabilities, how to expand them through large, community-contributed resources such as CRAN and Bioconductor, and how to keep abreast of best practices
Reading and writing data from common biological file formats, including numeric data, sequences, annotations, and networks
The difference between the various types of data objects in R and when each one is appropriate
Conditional selections and other filtering approaches
First experiments with writing R scripts
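
To give a flavour of what conditional selection looks like in practice, here is a minimal sketch. The data frame, its column names and the thresholds are made up for illustration and are not taken from the course material.

```r
# Hypothetical expression table; gene names and values are invented.
expr <- data.frame(
  gene   = c("geneA", "geneB", "geneC", "geneD"),
  log2fc = c(2.1, -0.3, -1.8, 0.9),
  pval   = c(0.001, 0.40, 0.02, 0.08),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where both conditions hold (thresholds are arbitrary)
keep <- abs(expr$log2fc) > 1 & expr$pval < 0.05

# Use the logical vector to select rows
selected <- expr[keep, ]
print(selected)
```

The same logical-vector idiom works on plain vectors, matrices and lists, which is one reason it is worth learning early.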

Module 2: Exploratory data analysis for biological data (2013) (Faculty: Boris Steipe)

In this module we will discuss the principles of Exploratory Data Analysis (EDA), how to compute descriptive statistical measures, how to smooth and transform data and how to visualize data using R's powerful and flexible plotting routines. Topics include:

EDA principles
Descriptive statistics: mean/median and variance, quantiles, outliers
Transformations and smoothing techniques (e.g. Lowess)
Plotting in R: basics, advanced options, special packages and best practices
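
As a rough illustration of the topics above, the following sketch computes a few descriptive statistics and overlays a Lowess smooth on a scatter plot. It uses only base R; the simulated data and the smoother span `f = 0.3` are assumptions chosen for the example, not recommendations.

```r
# Simulated noisy signal (made-up data for illustration)
set.seed(42)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)

# Descriptive statistics
print(c(mean = mean(y), median = median(y), variance = var(y)))
print(quantile(y, probs = c(0.25, 0.5, 0.75)))

# Points beyond the boxplot whiskers, one simple convention for "outliers"
print(boxplot.stats(y)$out)

# Lowess smoothing; f is the fraction of points used for each local fit
smoothed <- lowess(x, y, f = 0.3)

# Basic plot with the smooth drawn on top
plot(x, y, pch = 16, col = "grey60", main = "Lowess smooth of a noisy signal")
lines(smoothed, col = "red", lwd = 2)
```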

Module 3: Hypothesis testing for EDA (2013) (Faculty: Boris Steipe)

Common statistical tests and their underlying assumptions about the data
p-values, distributions, Z-scores and "significance"
False positive and false negative error rates
Bootstrap and resampling techniques
Multiple testing corrections: Bonferroni, family-wise error rate, false discovery rate
Non-parametric alternatives
Power calculation and sample size
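
To make the multiple testing topic concrete, here is a small sketch using base R's `p.adjust()`. The five p-values are invented for the example; the point is only to show how a family-wise correction (Bonferroni) and a false discovery rate correction (Benjamini-Hochberg) can differ on the same input.

```r
# Hypothetical raw p-values from five tests (made up for illustration)
p <- c(0.001, 0.008, 0.012, 0.041, 0.20)

# Bonferroni controls the family-wise error rate
bonf <- p.adjust(p, method = "bonferroni")

# Benjamini-Hochberg controls the false discovery rate
bh <- p.adjust(p, method = "BH")

print(data.frame(raw = p, bonferroni = bonf, BH = bh))

# Number of tests still below 0.05 after each correction
n_bonf <- sum(bonf < 0.05)
n_bh   <- sum(bh < 0.05)
print(c(bonferroni = n_bonf, BH = n_bh))
```

With these particular values the Bonferroni correction retains two tests and Benjamini-Hochberg retains three, which reflects the more conservative nature of family-wise control.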

Lab Practical: Working with your own data

Day 2

Module 4: Data reduction (2013) (Faculty: Boris Steipe)

Much of our biological data is very high-dimensional, and accordingly difficult to assess. However, powerful methods exist to simplify the problem. Topics include:

Visualizing multi-dimensional data
Data reduction with Principal Components Analysis
Using explicit models for data reduction
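
A minimal sketch of Principal Components Analysis with base R's `prcomp()` follows. It uses the built-in `iris` dataset purely as a convenient stand-in for high-dimensional biological data; the choice to centre and scale the variables is an assumption that will not suit every dataset.

```r
# Four numeric measurements per flower; iris ships with R
measurements <- iris[, 1:4]

# PCA on centred, unit-variance variables
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Project the data onto the first two components
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(iris$Species), pch = 16,
     xlab = "PC1", ylab = "PC2",
     main = "iris projected onto the first two principal components")
```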

Module 5: Clustering Analysis (2013) (Faculty: Boris Steipe)

Many clustering methods are in common use in the biological sciences, and that fact alone should warn you that none is appropriate for all data under all conditions. Topics include:

Calculating "distance" between (high-dimensional) data points
Clustering principles and methods: hierarchical, centroid-based, and information-based approaches in R
Assessing the quality of clustering results
Density estimation as an alternative
Outlook: classification
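
The following sketch shows one possible hierarchical clustering workflow in base R: compute distances, build a tree, cut it into groups, and look at one simple quality measure. The `iris` dataset, Euclidean distance, average linkage and `k = 3` are all assumptions made for the example; other choices may be more appropriate for your data.

```r
# Scale the variables so that each contributes comparably to the distance
scaled <- scale(iris[, 1:4])

# Pairwise Euclidean distances between observations
d <- dist(scaled, method = "euclidean")

# Agglomerative hierarchical clustering with average linkage
hc <- hclust(d, method = "average")

# Cut the tree into three clusters
clusters <- cutree(hc, k = 3)

# Compare cluster membership with the known species labels
print(table(cluster = clusters, species = iris$Species))

# Cophenetic correlation: how well the tree preserves the original distances
coph_cor <- cor(as.numeric(d), as.numeric(cophenetic(hc)))
print(coph_cor)

plot(hc, labels = FALSE, main = "Average-linkage dendrogram")
```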

Module 6: Regression Analysis (2013) (Faculty: Boris Steipe)

Types of models for regression analysis in R
Linear regression
Calculating and plotting residuals
Predictions
Non-linear regression with arbitrary functions
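
As a sketch of the regression topics, the example below fits a linear model with `lm()`, inspects residuals, makes predictions, and then fits an exponential decay with `nls()`. All data are simulated, and the model form and starting values for `nls()` are assumptions chosen so that the example converges.

```r
set.seed(1)

# Linear regression on simulated data (true intercept 3, true slope 2)
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 1)
fit <- lm(y ~ x)
print(coef(fit))

# Residuals plotted against fitted values
res <- residuals(fit)
plot(fitted(fit), res, xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)

# Predictions for new x values
pred <- predict(fit, newdata = data.frame(x = c(21, 22)))
print(pred)

# Non-linear regression: exponential decay with an arbitrary model function
tm <- seq(0, 5, by = 0.25)
signal <- 10 * exp(-0.8 * tm) + rnorm(length(tm), sd = 0.2)
nfit <- nls(signal ~ a * exp(-k * tm), start = list(a = 8, k = 1))
print(coef(nfit))
```

Unlike `lm()`, `nls()` needs reasonable starting values; poor choices can prevent convergence.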
