1000shapes (Biotechnology)
Company: 1000shapes GmbH is a ZIB spin-off that transfers research in life sciences into products for clinical applications.
Hosting Lab
The members of the MedLab develop new mathematical methods for identifying disease-specific signatures within modern large-scale biomedical datasets, such as genomics or proteomics data. Such signatures (e.g. the changing concentration of a blood protein during a viral infection) make it possible to build new diagnostic tests and also to gain insights into disease mechanisms. This approach rests on the observation that when cells are transformed from a “normal” to a pathological state (e.g. during an infection), the changes occur on many biological levels, including genes, proteins and metabolites. Integrative analysis of all these levels yields more detailed and informative models of a disease than analyzing the effect of single biomarkers, such as blood values or protein levels.
Sponsor
The project is carried out in close collaboration with 1000shapes GmbH, a ZIB spin-off that transfers research into industrial applications. 1000shapes provides advanced solutions in image and geometry processing for 2D and 3D product design, covering the full spectrum from measurement and analysis through planning to manufacturing. In the medical field, 1000shapes is interested in analyzing medical image data, such as X-ray, CT or MRI data.
Project
The project will deal with the integrative analysis of large medical datasets from a major study on knee osteoarthritis, one of the most common causes of disability in adults. Based on clinical, imaging, genomics and proteomics data, the project team will develop and apply state-of-the-art algorithms for analyzing these data. The ultimate goal is to integrate the individual data sources into a comprehensive modelling framework that enables detection and diagnosis of the disease.
Problems and (some) hope: Most of the data from available biomedical sources, such as images or proteomics data, is ultra-high-dimensional and very noisy. At the same time, these data have a very particular structure: they are highly sparse. The information content of the data is therefore much lower than its nominal dimension suggests, and this is the prerequisite for the next step of the project: reducing the dimension of the data with as little loss of information as possible.
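Why sparsity lowers the information content can be seen in the simplest possible case. The sketch below assumes sparsity in the standard coordinate basis with small Gaussian noise (an idealization; real -omics data rarely fits it) and uses hard thresholding, i.e. keeping only the k largest entries, as the dimension reduction. The sizes n and k and the helper names `hard_threshold` and `densify` are hypothetical choices for illustration only.

```python
import random

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x as (index, value) pairs.

    This is a compressed representation: k pairs instead of len(x) numbers.
    """
    ranked = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    return sorted((i, x[i]) for i in ranked[:k])

def densify(pairs, n):
    """Expand (index, value) pairs back into a dense list of length n."""
    x = [0.0] * n
    for i, v in pairs:
        x[i] = v
    return x

# Hypothetical example: n = 10,000 features, of which only k = 5 carry signal.
random.seed(0)
n, k = 10_000, 5
support = random.sample(range(n), k)
signal = [0.0] * n
for i in support:
    signal[i] = random.choice([-1, 1]) * random.uniform(5.0, 10.0)
# Add small Gaussian noise to every coordinate (the "standard" noise model).
noisy = [s + random.gauss(0.0, 0.1) for s in signal]

pairs = hard_threshold(noisy, k)
recovered = densify(pairs, n)
print(f"{n} numbers -> {len(pairs)} (index, value) pairs")
# prints: 10000 numbers -> 5 (index, value) pairs
```

Here five (index, value) pairs retain almost all of the signal; the difficulty described in this section is that, for real data, neither the basis in which the data is sparse nor the noise model is known in advance.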
Unfortunately, the sparsity structure of these data is complex, in most cases not known a priori, and usually does not match commonly assumed models such as joint sparsity of the signal or Gaussian noise. In other words, although the data are highly sparse, both the sparsity structure and the noise distribution are non-standard. Consequently, dimension reduction strategies such as compressed sensing that are adapted to this specific structure are not readily available, e.g. for proteomics data.
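For readers unfamiliar with compressed sensing, the following sketch shows the textbook setting that the paragraph contrasts with: a k-sparse vector is recovered from fewer random linear measurements than its dimension. Recovery uses orthogonal matching pursuit (OMP), one common greedy algorithm chosen here for brevity; the project description does not name a recovery method. The Gaussian measurement matrix and the sizes n, m, k are illustrative assumptions, exactly the kind of standard assumptions that do not hold for proteomics data.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily estimate a k-sparse x with A @ x ≈ y."""
    n = A.shape[1]
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        # Select the column most correlated with the part of y not yet explained.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

# Illustrative, hypothetical sizes: a k-sparse vector in n dimensions,
# observed through m < n random Gaussian measurements.
rng = np.random.default_rng(0)
n, m, k = 256, 64, 4
x_true = np.zeros(n)
true_support = rng.choice(n, size=k, replace=False)
x_true[true_support] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x_true

x_hat = omp(A, y, k)
print("true support:     ", sorted(true_support.tolist()))
print("recovered support:", sorted(np.flatnonzero(x_hat).tolist()))
```

With sizes like these OMP usually recovers the support exactly, but its guarantees depend on the sparsity level and on properties of the measurement matrix; once the sparsity pattern and noise are non-standard, such guarantees no longer apply directly.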
However, there are methods that can identify the sparsity structure of the contained information from very high-dimensional, noisy -omics and imaging data. Once this has been achieved, the next step is to integrate the (low-dimensional) information into one unified model. We will use a network-based approach, modelling the various biological levels as a multiplex network built from existing databases, such as those of known protein-protein or gene-protein interactions. The hope is that this model can shed light on the mechanisms of osteoarthritis and possibly even enable new approaches to the early diagnosis of this disease.
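One way to picture the network-based integration step is network propagation: features selected in the dimension-reduction step become seed nodes, and a random walk with restart over all layers of the multiplex network scores the remaining genes and proteins by their proximity to the seeds. The sketch below assumes exactly that. Random walk with restart is one common choice and is not named in the project description; the layer names, the node names (P1, G1, ...), the seed set and the restart probability of 0.3 are all hypothetical.

```python
from collections import defaultdict

def multiplex_rwr(layers, seeds, restart=0.3, iters=100):
    """Random walk with restart on a multiplex network (hypothetical sketch).

    layers  -- dict: layer name -> list of undirected (u, v) edges
    seeds   -- nodes where the walker restarts (e.g. a selected signature)
    restart -- probability of jumping back to the seeds at each step
    At each step the walker picks, uniformly, one of the layers in which its
    current node has neighbours, then a uniform neighbour within that layer.
    """
    seeds = set(seeds)
    # Per-layer adjacency: neigh[layer][node] -> set of neighbours.
    neigh = {name: defaultdict(set) for name in layers}
    nodes = set()
    for name, edges in layers.items():
        for u, v in edges:
            neigh[name][u].add(v)
            neigh[name][v].add(u)
            nodes.update((u, v))
    if not seeds or not seeds <= nodes:
        raise ValueError("seeds must be a non-empty subset of the network's nodes")
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    # Power iteration towards the fixed point p = restart * p0 + (1 - restart) * W p.
    for _ in range(iters):
        nxt = {v: restart * p0[v] for v in nodes}
        for v, mass in p.items():
            active = [name for name in layers if v in neigh[name]]
            share = (1.0 - restart) * mass / len(active)
            for name in active:
                nbrs = neigh[name][v]
                for w in nbrs:
                    nxt[w] += share / len(nbrs)
        p = nxt
    return p

# Hypothetical two-layer network: protein-protein and gene-protein interactions.
layers = {
    "protein-protein": [("P1", "P2"), ("P2", "P3"), ("P3", "P4")],
    "gene-protein": [("G1", "P1"), ("G2", "P3"), ("G3", "P4")],
}
scores = multiplex_rwr(layers, seeds={"P1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the walker can switch layers at every node, information from one biological level (here, proteins) spreads to another (genes); a real model would also have to decide how to weight layers against each other, which this sketch simply treats as equal.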
Requirements
The prospective participant should:
- have a background in mathematics, bioinformatics or computer science,
- have experience in network analysis,
- have experience with a high-level programming language (e.g. C/C++, Java or Python) and a statistical software package such as R,
- have attended classes in the area of data mining or acquired the foundations of this field by other means,
- be prepared to work with very large datasets from industry partners (which involves preprocessing, e.g. to handle inconsistencies and incomplete records),
- ideally, be familiar with the biological background and have already worked with biological datasets,
- and, finally, have experience working in a Linux/Unix environment.