blogtest: August 2013

Saturday, 31 August 2013

Computational Biology Corner: Measurements Are Not Data!

Contributor

Mathukumalli Vidyasagar is the Founding Head of the Bioengineering Department, University of Texas at Dallas. He is a Fellow of the Royal Society, UK.

After a (too long!) absence, Mathukumalli Vidyasagar ("Sagar") returns with his Computational Biology Corner column. This time Sagar recounts an incident that reinforces the need to critically review your measurements!

In my last column, dated about a year ago, I addressed the lack of standardization in biological instrumentation. In that column I bemoaned the fact that two different platforms, each of which claims to measure exactly the same quantity, namely the amount of messenger RNA (mRNA) produced by various cancer tumor tissues, produce wildly different measurements. I must apologetically return to the same theme in this column, albeit under different circumstances.

To refresh the memory of the reader (and to introduce my earlier column to those who had not read it when it originally appeared), my students downloaded data on about 580 ovarian cancer tumors from the web site of the National Cancer Institute (NCI), specifically The Cancer Genome Atlas (TCGA) project. Expression levels of several genes in each tumor sample had been measured using two different platforms. But when the two sets of measurements were plotted against each other, there was no resemblance whatsoever between them! Therefore, any prognostic predictor based on one set of measurements will fail miserably on the other set, and it does not matter which one is used! To us engineers, it sounds fairly absurd to say "Well, if you use this platform, then these are the genes that give you the best predictions of your chances of recovery, but if you use that platform, then an entirely different set of genes are the best predictors."

But today's column goes one better, because it concerns two sets of measurements taken on ostensibly the same platform, but at two different points in time. To me, the lack of repeatability on the same platform is far worse than the lack of repeatability across platforms, because it would cause me to question the worth of even a single platform.

In brief, here is the story. About 18 months ago, one of our collaborators measured the expression levels of 1,428 micro-RNAs (miRNAs) in 94 tumors of endometrial cancer. He found that, out of the 1,428 x 94 = 134,232 measurements, about 43% came out as "NaN" (Not a Number). We had assumed that the NaN readings arose because the quantities being measured were too small to register, and thus replaced them with a very small number. Later on we got a fresh supply of 30 more tumors, and when some of the same miRNAs were measured on the new samples, there were hardly any NaN entries! So our collaborators re-measured the miRNAs on three of the old samples -- and this time again there were hardly any NaN readings! Exploring the mystery further, our collaborators discovered that the company that manufactured the hybridization system for the measurement platform had gone out of business, so the core facility was now using a different (and supposedly functionally equivalent) system for hybridization. Except that the expression levels were now 4 to 10 times higher, which corresponds to an addition of roughly 2 to 3.3 on a binary logarithmic scale. So the two measurement systems, taken as a whole, were not at all identical. Knowing this, however, we could somehow "normalize" for this phenomenon. But we were hardly prepared for what happened next.
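The arithmetic in this story is easy to check. The following minimal Python sketch is an illustration only and is not part of the original analysis. It recomputes the number of measurements, converts a multiplicative fold change into a shift on the binary logarithmic scale, and computes the fraction of NaN readings in a small made-up matrix. The function names and the toy matrix are assumptions introduced here.

```python
import math


def nan_fraction(matrix):
    """Return the fraction of NaN entries in a list-of-lists matrix."""
    values = [v for row in matrix for v in row]
    return sum(math.isnan(v) for v in values) / len(values)


def log2_shift(fold_change):
    """Convert a multiplicative fold change into an additive log2 shift."""
    return math.log2(fold_change)


# 1,428 miRNAs measured on 94 tumors, as stated in the column.
TOTAL_MEASUREMENTS = 1428 * 94

# Hypothetical toy matrix (rows = miRNAs, columns = tumors), not real data.
toy = [[1.0, float("nan")], [float("nan"), 2.5], [0.7, 3.1]]

print(TOTAL_MEASUREMENTS)
print(round(nan_fraction(toy), 3))
for fold in (4, 8, 10):
    print(fold, round(log2_shift(fold), 2))
```

A 4-fold increase is a shift of exactly 2 on the log2 scale, while a 10-fold increase is a shift of a little over 3.3.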

One particular miRNA measurement on the old and the new samples did not match at all! The diagram below shows the situation. The blue curve is the set of measurements of the original samples and the red is for the new batch of 30 samples. There is the well-known two-sample K-S test that allows us to test whether or not two sets of samples are generated by the same (unknown) probability distribution. In this case, however, no such fancy mathematics is needed, because one can see with the naked eye that the two sets of samples have nothing to do with each other.
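For readers who want to see what the two-sample K-S test actually computes, here is a minimal sketch in plain Python. It is not the analysis performed for this column. The statistic D is the largest gap between the two empirical distribution functions, and the p-value uses a standard asymptotic approximation that is only rough for small samples. The sample values below are invented for illustration and are not the measurements shown in the diagram.

```python
import math


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for the two-sample K-S test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through the pooled sorted values, tracking both empirical CDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov tail probability with a small-sample correction.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return d, 1.0  # the series below does not converge well near zero
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))


# Hypothetical log2 expression values: 94 "old" and 30 "new" samples.
old_batch = [2.0 + 0.01 * i for i in range(94)]
new_batch = [5.0 + 0.02 * i for i in range(30)]

d_stat, p_value = ks_two_sample(old_batch, new_batch)
print(d_stat, p_value)
```

When the two batches do not overlap at all, as in this toy example, D reaches its maximum value of 1 and the p-value is vanishingly small, which is the numerical counterpart of what the naked eye sees.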

OncodriveFM: An approach to uncover driver genes or gene modules

Description

Oncodrive-fm is an approach to uncover driver genes or gene modules. It computes a metric of functional impact using three well-known methods (SIFT, PolyPhen2 and MutationAssessor) and assesses how the functional impact of variants found in a gene across several tumor samples deviates from a null distribution. It is thus based on the assumption that any bias towards the accumulation of variants with high functional impact is an indication of positive selection and can thus be used to detect candidate driver genes or gene modules.

How it works

Oncodrive-fm starts by computing three metrics of the functional impact of each non-synonymous SNV (nsSNV) found in genes across a list of tumor samples. Any measure of the impact of nsSNVs on protein function (or FI score) could in principle be used here. We have chosen three well-known methods whose scores may be obtained in a high-throughput manner to evaluate hundreds of nsSNVs in a few minutes. Stop-gain SNVs (stSNVs) and frameshift-causing indels (fsindels) are incorporated into the bias analysis by assigning them scores that are comparable to the highest-ranking tier of nsSNVs. Finally, synonymous SNVs (sSNVs) are taken into account with scores equal to those of bottom-ranking nsSNVs.

The second step starts by averaging the FI scores of variants per gene and comparing them to the distribution of scores of variants in functionally similar genes. If somatic SNVs were obtained using a whole-genome or whole-exome sequencing approach, the null distribution contains all SNVs and fsindels detected across tumor samples. We call this the internal null distribution. On the other hand, if only a limited number of genes have been sequenced, the null distribution of each gene is composed of nsSNVs that occur naturally in human populations. We call this the external null distribution. The mean FI of each gene across all tumor samples is then probed for significance employing a permutation strategy.
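The significance step can be pictured with a small resampling sketch. This is a hedged illustration of a generic permutation-style test and not the OncodriveFM implementation: the function name `fm_bias_pvalue`, the number of resamples, sampling with replacement, and the toy scores are all assumptions made here. The idea is to draw as many scores from the null pool as the gene has variants, repeat this many times, and count how often the resampled mean is at least as high as the observed mean.

```python
import random
from statistics import mean


def fm_bias_pvalue(gene_scores, null_scores, n_perm=2000, seed=0):
    """Empirical p-value that a gene's mean FI score exceeds the null."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = sum(
        mean(rng.choices(null_scores, k=k)) >= observed
        for _ in range(n_perm)
    )
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical null pool of FI scores rescaled to the interval [0, 1].
null_pool = [i / 100 for i in range(101)]

# Hypothetical genes: one enriched for high-impact variants, one not.
high_impact_gene = [0.95, 0.90, 0.98, 0.92, 0.97]
low_impact_gene = [0.10, 0.20, 0.15]

print(fm_bias_pvalue(high_impact_gene, null_pool))
print(fm_bias_pvalue(low_impact_gene, null_pool))
```

A gene whose variants are biased towards high functional impact gets a small empirical p-value, while a gene with mostly low-impact variants does not, regardless of how often it is mutated.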

How it performs

We have applied the Oncodrive-fm approach to three datasets of genes with SNVs and fsindels in samples of different tumor types: glioblastoma multiforme (gbm) and serous ovarian carcinoma (soc), produced within The Cancer Genome Atlas (TCGA) project, and chronic lymphocytic leukemia (cll), produced within the International Cancer Genome Consortium (ICGC) initiative. We were able to detect most genes also pinpointed by MutSig (a method that searches for recurrently mutated genes) as significantly biased in gbm and soc. Moreover, we were able to detect recurrently mutated genes with low functional impact, which may not constitute true drivers, and we uncovered other top-ranking functionally affected genes, some of which could be drivers with low recurrence.

How to install and run

You will find detailed information on how to install OncodriveFM and run some examples at Bitbucket.

How to cite

If you use OncodriveFM, please cite it as Gonzalez-Perez A and Lopez-Bigas N. 2012. Functional impact bias reveals cancer drivers. Nucleic Acids Res., 10.1093/nar/gks743.

Any comments or feedback, please contact

Abel González Pérez, PhD

Bioinformatician, Postdoctoral Researcher

Research Unit on Biomedical Informatics - GRIB

Parc de Recerca Biomèdica de Barcelona (PRBB)

abel.gonzalez@upf.edu

Original version

We distribute the original Perl implementation of OncodriveFM in a tarball below. You will need the Perl interpreter installed on your computer as well as the Statistics::Descriptive CPAN module in your PERL5LIB directory. You will also need an R installation, because the functional_impact_analysis.pl and pathways_functional_impact_analysis.pl scripts use it. If your R executable cannot be invoked directly, please make a shortcut or edit these two scripts accordingly. You can run the examples provided (gbm and cll) by doing:

>./pipeline_launcher.pl ../config/cll.config

>./pipeline_launcher.pl ../config/glioblastoma.config

from the bin directory of the installation.

You may open and check the config files for an explanation of all configuration arguments.

OncodriveFM 0.0.1 is the version presented in the submitted paper and can be downloaded from here and the documentation from here

Computational Biologist @ Brigham and Women’s Hospital

The Personalized Cancer Medicine Partnership (PROFILE) is a collaborative venture between two Harvard Medical School affiliated institutions: the Brigham and Women’s Hospital and the Dana-Farber Cancer Institute. The mission of PROFILE is to advance translational and personalized cancer medicine by implementing tumor genomic profiling on all cancer patients treated at these institutions. PROFILE will employ state-of-the-art technology to generate a detailed profile of key “druggable” or otherwise “actionable” cancer genomic alterations in a CLIA-approved and “real-time” process to facilitate rapid clinical application. These cancer genomic profiles will be used to guide patient treatment and/or stratification for clinical trials of novel anticancer agents.

Role and Responsibilities:

This exceptional opportunity offers the chance to work at the forefront of translational cancer technologies and applications. The Computational Biologist will join the current bioinformatics group to analyze and represent genomic data generated by next-generation sequencing (NGS). The Computational Biologist will report to the Group Leader of Bioinformatics and will have the following responsibilities:

Development of novel algorithms and tools for the analysis of somatic and germline genomic alterations.
Identification and evaluation of analysis tools for the identification of variants from NGS data.
Definition, application and validation of computational approaches for cancer genome analysis from next-generation sequencing (and other) data.
Analysis of datasets from high-throughput molecular assays utilizing appropriate bioinformatics and/or genetic analyses.
Identification of variants in individual samples as well as performing cohort analysis on groups of related samples.
Additional tasks including, but not limited to: maintaining awareness of emerging approaches and methods in computational biology as they relate to clinical applications of next-generation sequencing, supporting the visualization and analysis of existing data, developing innovative solutions for the analysis and management of genomic data, including next-generation sequencing data.

Qualifications:

Master’s in Computer Science, Bioinformatics, Engineering, Math, Statistics, Physics, or a related quantitative discipline with 3-5 years’ experience, or PhD in the same fields with 1-2 years’ experience required. Advanced statistical and computational data analysis experience (using R, Perl, or MATLAB), demonstrated experience modeling complex multi-dimensional biological data, and strong programming skills (using Java or Perl) in a UNIX/Linux environment are essential. Experience in molecular biology, cancer genomics, or a related field, proven experience in the development and validation of NGS algorithms, and knowledge of relevant databases, methods and analytical tools used in next-generation sequencing data analysis/interpretation are highly desirable. The candidate should have the ability to work independently or collaboratively on several concurrent, fast-paced projects. A minimum two-year commitment is required.

How to apply: Interested candidates should provide a resume and a brief cover letter summarizing previous experiences, training, and qualifications, as well as names and contact information for at least two references.

Please contact: Anna Cooley

AnnaC_Cooley@dfci.harvard.edu

(617) 582-8643

Senior Scientist - Computational Biology @ Johnson & Johnson, San Diego, CA, US

Description:

Johnson & Johnson Pharmaceutical Research & Development L.L.C., a member of Johnson and Johnson's family of companies, is recruiting for a Senior Research Scientist for Molecular Profiling, located in La Jolla, CA.

Johnson & Johnson Pharmaceutical Research & Development, L.L.C. develops treatments that improve the health and lifestyles of people worldwide. Research and development areas encompass novel targets in neurologic disorders, gastroenterology, oncology, infectious disease, diabetes, hematology, metabolic disorders, immunologic disorders, and reproductive medicine.

This position is with the Systems Pharmacology and Biomarkers team and requires a highly motivated and skilled Principal Scientist to assist in the early development of innovative therapies for immune diseases such as rheumatoid arthritis, psoriasis, asthma, COPD and inflammatory bowel disease.

This position requires the Senior Research Scientist to accelerate scientific discovery for immunology through the use of informatics in driving innovation and enabling effective decision making. The successful candidate will work alongside scientists in discovery and clinical groups to design new experiments, analyze and interpret data, and effectively communicate results. Data analysis will focus on systems biology and network pharmacology. Impactful results will be used in determining the efficacy and safety of compounds, impacting dose decisions, selecting novel indications for targets and compounds, biological interpretation of results, and the identification of predictive biomarkers for patient stratification and precision medicine. The successful candidate will represent the department on cross-functional disease-focused teams.

Team members are encouraged to publish and to develop and maintain a strong professional network within the internal and external scientific communities.

Qualifications

The successful candidate will have at a minimum a PhD degree in Systems Biology, Immunology, Epidemiology, Statistics, Computational Biology, Bioinformatics or related field combined with a minimum of 4 years of academic and/or industry experience. The ability to apply data mining, machine learning, and emergent algorithms to pre-clinical, clinical and health outcomes data is required. Experience in network analysis and/or modeling biological pathways is preferred.

Experience in handling exploratory data such as microarrays, protein arrays, ELISA, MS or NGS is required. Proficiency in software such as SAS, S-PLUS, R, Omicsoft, Cytoscape is desired. A strong publication track record is desired. The ability to work independently, communicate clearly, collaborate effectively and establish effective and trusted partnerships with internal and external scientists is essential.

This position is based in La Jolla, CA and may require up to 5% travel (domestic & international).

BE VITAL in your career. Be seen for the talent you bring to your work. Explore opportunities within the Johnson & Johnson Family of Companies.

Primary Location: North America-United States-California-San Diego

Organization: Janssen Research & Development, LLC. (6084)

Computational Biologist @ Provincial Health Services Authority (PHSA)

Computational Biologist

In accordance with the Mission, Vision and Values, and strategic directions of Provincial Health Services Authority, patient safety is a priority and a responsibility shared by everyone at PHSA, and as such, the requirement to continuously improve quality and safety is inherent in all aspects of this position. The Computational Biologist performs biological sequence searches related to the operation of a large-scale, high-throughput DNA mapping and DNA sequencing production facility.

Qualifications:

Education, Training and Experience
Graduation from a recognized Bachelor of Science Program in either Biological Sciences or Computer Science.
Two (2) years of recent related experience or an equivalent combination of education, training and experience acceptable to the GSC Group Leaders.

Skills and Abilities

Experience with relevant operating systems.
Demonstrated interpersonal skills including the ability to work effectively with others in a team environment.
Demonstrated ability to efficiently organize work assignments and establish priorities.
Ability to use related equipment.

We invite you to apply by clicking the "Apply Online Now" button where you can register for the first time or enter your Username and Password in order to re-access your profile on our system.

Applications will be accepted until this position has been filled.

For more information on all that the PHSA has to offer, please visit: http://careers.phsa.ca

For more information about the BC Cancer Agency, please visit the website at: www.bccancer.ca

Internal competition closes September 8, 2013. Internal applications received after this date will be considered as late applications.

***Employees of PHSA must apply via the "Internal Application Process". Current PHSA staff who apply to this posting using this external site will be considered with other external candidates. Seniority will not apply.***

The PHSA is committed to employment equity and hires on the basis of merit. We encourage applications from all qualified individuals, including Aboriginal peoples, persons with disabilities and members of visible minorities.

Friday, 30 August 2013

JRF/ SRF @ Indian Grassland and Fodder Research Institute

Project	Post Name	Qualification
PPV&FRA	SRF	M.Sc. in Genetics/M.Sc. in Agri/M.Sc. Botany/M.Sc. Agriculture Botany. Desirable: Experience in handling of crop plants.
NIFBSFARA	SRF	M.Sc./M.Sc. in Agri/MA in RS & GIS/Geography/Botany. Desirable: Work experience in ERDAS Imagine, ArcGIS, GPS use in grasslands identification, characterization and mapping, geoprocessing, GDB creation, and wide experience of GT using GPS, etc.
ICAR	SRF	M.Sc. Agri/M.Sc. in the following subjects: Agronomy/Soil Science/Soil and Water Conservation/Agroforestry/Plant Physiology/Agri Extension/Botany, or B.Tech (Agricultural Engineering), or other allied subjects. Desirable: M.Tech (Soil Water Conservation/Engineering/Water Resource Development/Irrigation and Drainage Engineering); experience of working in on-farm projects.
DBT	JRF	M.Sc. in Biotechnology/Molecular Biology/Biochemistry. Desirable: Experience in tissue culture/gene cloning/bioinformatics/transformation work in crop plants.

Age: 35 yrs for men and 40 yrs for women candidates

No. of Posts: 1

Pay Scale: Rs. 16000

How to apply:

Candidates may come with their application and bio-data on plain paper, along with a passport-size photograph and attested copies of mark sheets and certificates, and must produce the originals at the time of interview.

Project	Post Name	Date & Time
PPV&FRA	SRF	12/09/2013 at 10.00 AM
NIFBSFARA	SRF	13/09/2013 at 10.00 AM
ICAR	SRF	16/09/2013 at 10.00 AM
DBT	JRF	18/09/2013 at 10.00 AM