
Advances in tumour profiling technologies, including sequencing and imaging platforms, have enabled the generation of petabytes of biomedical data. This is especially true in cancer research, where large sets of tumours and preclinical cancer models have been characterized at the (epi)genomic, transcriptomic, proteomic and imaging levels. Although these big biomedical data offer formidable opportunities to improve our understanding of cancer treatments, they carry a risk of artifactual discoveries when analyzed with inappropriate bioinformatics methods. A major issue in most biomedical studies analyzing preclinical and clinical genomics data is “overfitting”, i.e., making inferences or building predictive models that seem supported by the data at hand but fail to validate in independent data. My team’s contributions to cancer research lie in three main areas: (i) prognostic models from genomic profiles and radiological images; (ii) predictive models of therapy response from genomic profiles; and (iii) research reproducibility, open science, and the development of sustainable scientific software.