
The Era of Big Data - Are You Ready?

Posted By Connections Editor, Friday, April 13, 2018
Updated: Wednesday, April 11, 2018

Nina Cabezas-Wallscheid, François Mercier and Cedric Tremblay


You (or maybe your boss, postdoc, graduate student) pushed to generate that fancy OMICs data from your favorite population (exciting!). Now you have to face the momentous task of sitting in front of the computer trying to scrutinize terabytes of information. Does this situation sound familiar to you? In this monthly Connection, we aim to share our own experiences on how we handled the challenge of the “era of big data”: why and how we got started on learning bioinformatics, how we found good collaborators and/or recruited students.


François Mercier, M.D., McGill University (Montreal, Canada), Member of the ISEH New Investigator Committee

If I could travel back in time and give pieces of advice to my younger self, one of these would be to start learning bioinformatics earlier in my training. When I joined the laboratory of David Scadden in late 2010, I was coming from a clinical field and interested in “big questions” in the field: understanding the mechanisms of clonal evolution in acute myeloid leukemia (AML) and developing genetic perturbation screens in animal models. Unbeknown to me at the time, these projects would lead to very large amounts of data! I needed to find ways to analyze it in order to extract meaningful biological information. Luckily, I was able to interest collaborators with specific expertise in computational biology. First, we collaborated with Winston Hide at the Harvard School of Public Health, who developed a handy pathway analysis platform named Pathprinting. In Winston’s laboratory, I met a very talented post-doctoral fellow, Jiantao Shi, who initially performed most of the computational analysis. When Winston moved his laboratory to the UK, we continued to collaborate with Jiantao and his new supervisor, Franziska Michor, a mathematician who does extremely interesting work in modeling of cancer growth and therapy. These collaborations were very useful in identifying several new genetic hits in AML. In retrospect, I think that I was very lucky in meeting collaborators who were interested in my ideas and had a solid understanding of cancer biology.


From my personal experience, I can identify a few salient points. First, it is very important to seize opportunities to become familiar with common terminology and tools for basic analysis, and to consider downstream analysis in the design of the experiments. This can be done by consulting early with a computational biologist or, even better, by becoming familiar with the analysis pipeline yourself. I personally took classes in Unix, R, and RNA-sequencing analysis, offered for free at Harvard. Second, it is important to manage the collaborations with computational biologists with the attention they deserve, including setting up frequent meetings and acknowledging everyone’s contributions when the time comes to publish. Third, see the fun in learning this new skillset, including the enjoyment of analyzing your first RNA-seq dataset or the beautiful graphs generated using some R packages! My goal for my students is that they become proficient in the basic analysis techniques, as I think that it will become an essential skill in the workplace.


A couple of pieces of advice: When facing a problem, Google can be your best friend! Many online resources are easily accessible. Also, learn from the start how to organize your data. A good primer can be found here:


Cedric Tremblay, Ph.D., Australian Centre for Blood Diseases (ACBD) – Monash University (Melbourne, Australia), Member of the ISEH New Investigator Committee.

During my undergraduate years, I took an introductory course in bioinformatics as I was very curious about this emerging field of research. The course was very basic but helped me to familiarize myself with Unix and R at a time when microarrays were trendy, and RNA-seq or ChIP-seq were still regarded as esoteric. Unfortunately, the limited resources available in my host laboratory prevented us from performing large-scale unbiased approaches like microarrays, RNA-seq or ChIP-seq. As a result, I lost track of the light-speed pace of bioinformatics for half a decade. In retrospect, I realize that taking bioinformatics courses during that time would have been a great way to keep up-to-date.

Subsequently, I had to learn bioinformatics from scratch again when I started my postdoc in Prof. Trang Hoang’s laboratory at the Institute for Research in Immunology and Cancer (IRIC) in Montreal. Luckily for me, there was a team of dedicated bioinformaticians at the IRIC, who helped me to appropriately design experiments and analyse the data generated by large-scale experiments. After I had spent 2 years in the laboratory trying to perform home-made analyses, the extremely talented PhD student Veronique Lisi joined us from the Systems Biology Program. Although I didn’t realize this back then, Veronique joining us was a life-changing moment during my postdoc, as I could once again familiarize myself with bioinformatics. Through really informative discussions with her and the other bioinformaticians from the platform at IRIC, I acquired a better understanding of the in-house and commercially available tools to analyse the enormous amount of data that we generated in the laboratory. Although I could grasp the complexity of the task required to perform home-made analyses, I also acknowledged my limitations and realized that the bioinformatics field was too complex for me to perform the type of analyses that I was aiming to do in the future.


When I relocated to the Australian Centre for Blood Diseases (ACBD) in Melbourne for my second postdoc, large-scale experiments like RNA-seq, ChIP-seq and ATAC-seq became the norm and single-cell analyses emerged as the new standard for interrogating molecular mechanisms at the clonal/cellular level. Given that a few of my projects aimed to study small populations of rare cells, I realized that my limited understanding of bioinformatics would be a limiting factor for extracting relevant information from the enormous amount of data to be generated within the coming years. To manage this, I established a fruitful collaboration with Prof. David Powell, the Head of the Monash Bioinformatics Platform, who became a great ally for designing experiments, analyzing data and extracting meaningful results for publications. This collaboration saved me time, energy and a few sleepless nights when facing tight deadlines! The institute also hired Dr. Nick Wong – a dedicated bioinformatician with strong wet lab experience, who enabled our team to work in close collaboration for designing experiments, but also generating and analysing data. The presence of an on-site bioinformatician involved in every step of the projects and organizing practical workshops on publicly available tools (like R) contributed significantly to improving my skills in bioinformatics. Although my understanding of the language of bioinformatics has significantly improved over the past years, I think that a bioinformatician represents an extremely valuable asset to any research team aiming to perform large-scale OMICS. Given that the world of research is currently evolving at light speed with the emergence of single-cell analyses, I expect bioinformaticians to be part of every research laboratory in the near future, as their complementary expertise will facilitate every aspect of the large-scale experiments that will become the norm in medical research.


In a nutshell: As the world is quickly evolving in science, remaining up-to-date by following courses or workshops on bioinformatics represents the best strategy for thriving during the era of big data.


Nina Cabezas-Wallscheid, Ph.D., Max Planck Institute of Immunobiology and Epigenetics (Freiburg, Germany), Member of the ISEH New Investigator Committee.

It was during my PhD that I generated my first OMICs dataset. Back then (2010), generating and analyzing RNA-seq data was indeed a great challenge. User-friendly bioinformatics pipelines were almost non-existent, and the field was moving too fast to deal with the technical bias generated from the data. Excitingly, when I started my postdoc in 2011 in Andreas Trumpp’s lab (Heidelberg, Germany), one of my first goals was to establish a protocol to generate low-input RNA-seq data from rare populations such as hematopoietic stem cells. I luckily had a bit of experience in generating RNA-seq data, so I rapidly got “hands on”. However, since the lab of Andreas had no bioinformatician, I knew that sooner or later I would face the challenge of the analysis. Once I got my first data set, I started working with commercially available analysis tools. Although user-friendly, these tools were a black box. Therefore, to familiarize myself with bioinformatics, I attended R programming courses and RNA-seq summer classes. Although I was proud of my resulting home-made analyses, I rapidly realized that the type of analysis I aimed for and the bioinformatics field in general were too complex for an amateur like me. If I was willing to dig deeper into the data, I had to find a good collaborator. I was very lucky to meet Wolfgang Huber (EMBL) during one of the bioinformatics courses, who is an expert in the RNA-seq field (Wolfgang and his team developed DESeq). Wolfgang and our lab started a collaboration, which also included one of his extremely talented PhD students, Alejandro Reyes. Working with Alejandro was a very exciting time. I learned so much during our regular meetings: tools for the analysis, limitations of the technique, quality controls, statistics, etc. In a follow-up study in which we faced the challenge of analyzing single-cell RNA-seq data, I also had the great opportunity to work very closely with Oliver Stegle and his postdoc Florian Buettner (EBI).
From them, I learned the best tools, practices, and limitations in the analyses of single-cell data. All these collaborations ended up being extremely fruitful, not only due to significant publications, but also for the amazing experience acquired working side-by-side with such talented scientists and bioinformaticians as Alejandro, Wolfgang, Florian and Oli. I recently started my own lab, and now have the great support of Dominic Grün, who is an expert on single-cell RNA-seq analysis and my laboratory neighbor. My goal is to build a solid bioinformatics background for each member of the lab. Very exciting time! Some pieces of advice: I recommend that you attend bioinformatics courses, not only to gain general knowledge of the subject but also to learn “the language” of bioinformaticians; find a good collaborator with whom you can thoroughly discuss your data; and spend considerable time in front of your data. It really helps!


We hope our experiences can be useful to you when charting your course on how to deal with big datasets. And if you feel you are not ready, do not worry. You are not alone.


And finally, a tool box to get you started:


Examples of free online classes for Unix, R, and RNA-seq analysis:


Examples of useful links for downstream analyses of your data:

Gene ontology (free):

Protein interactions (free):

Pathway analysis ($):

Gene set enrichment analysis (free):

