WP3 has been assigned the task of creating a common data model to support homogenization of the data and the creation of relationships among the datasets. This will facilitate comparisons between treatments, patient histories, and clinical and demographic data.
- Use the common data model to combine heterogeneous data structures from each source (cooperative groups, hospitals, academic partners, and so on) into groupings or data types such as demographics, clinical information, epidemiology, and molecular data;
- Support future data collection from existing data sources;
- Standardize the format and content of the observational data;
- Create models that will facilitate the establishment of relationships between the data;
- Create a definition for data-use requirements that will be aligned with HARMONY’s needs;
- Harmonize, through the application of algorithms and rule engines, the processing of data to create an organized system for the categorization of findings.
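The idea of mapping heterogeneous source structures onto common groupings can be illustrated with a minimal sketch. This is not HARMONY's actual data model or pipeline: the source names (`hospital_a`, `coop_group_b`), the field names, and the normalization rules are all hypothetical and chosen only to show the principle of rule-based harmonization.

```python
from typing import Any, Dict, Tuple

# Hypothetical per-source mappings: source field -> (grouping, common field).
# None of these names come from HARMONY; they are placeholders for illustration.
FIELD_MAPS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "hospital_a": {
        "sex": ("demographics", "sex"),
        "birth_year": ("demographics", "year_of_birth"),
        "dx": ("clinical", "diagnosis"),
    },
    "coop_group_b": {
        "Gender": ("demographics", "sex"),
        "YOB": ("demographics", "year_of_birth"),
        "Diagnosis": ("clinical", "diagnosis"),
    },
}

# Hypothetical value rule: collapse source-specific codes into one vocabulary.
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def harmonize(source: str, record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map one source record onto the common groupings, normalizing values."""
    mapping = FIELD_MAPS[source]
    out: Dict[str, Dict[str, Any]] = {"demographics": {}, "clinical": {}}
    for key, value in record.items():
        if key not in mapping:
            continue  # fields with no mapping are dropped in this sketch
        group, common = mapping[key]
        if common == "sex":
            value = SEX_CODES.get(str(value).strip().lower(), "unknown")
        elif common == "year_of_birth":
            value = int(value)
        elif common == "diagnosis":
            value = str(value).strip().upper()
        out[group][common] = value
    return out


record_a = harmonize("hospital_a", {"sex": "M", "birth_year": "1950", "dx": "aml"})
record_b = harmonize("coop_group_b", {"Gender": "male", "YOB": 1950, "Diagnosis": "AML "})
```

In this sketch, two records that arrive with different field names and value codes end up in the same structure, so they can be compared directly. A real common data model would also need controlled vocabularies, provenance, and validation, which are omitted here.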
Bayer, Celgene, EBMT, ELN, EORTC, GMV, HULAFE, IBSAL, Janssen, LeukaNET, MediUni Wien, Menarini, Novartis, Takeda, Ulm University, UNIBO, University of York.
Considerable progress has been made in terms of establishing the Big Data platform and the methodology for analyzing data:
1. The HARMONY data platform is now fully operational
- Our hosting facility at CNAF (the national center of the Italian Institute for Nuclear Physics) became fully operational and ISO 27001 certified;
- All software components of the platform are now in place, tested and operational;
- The Quality Assessment process for the evaluation of data sources is now fully supported through analytics and visualization;
- HARMONY partners can now identify relevant datasets more easily through the first release of a data discovery tool;
- The first three AML datasets from our public partners as well as from the pharma partners have been uploaded to the HARMONY platform.
2. The governance process for the intake of data also became fully operational
- The legal requirements for the data platform were established and the rules for data governance were written;
- Data security measures became operational;
- A trusted third party is now engaged to support the ‘de facto’ anonymization and secure upload of data to the HARMONY platform; and
- The standard operating procedure for data anonymization was established.
- HARMONY's Big Data platform has been established;
- A common data model that adheres to the FAIR (findable, accessible, interoperable, and reusable) data-sharing principles has been created;
- HARMONY has begun developing and testing models based on available data on AML (TCGA public data, UNIBO internal data and Sanger Institute data);
- HARMONY has started analyzing the dataset descriptions.
In 2018, WP3 will expand the recently established HARMONY big data platform, with an initial focus on AML, followed by MDS and CLL structures and data.
- Industrialize the process for intake and upload of data to the HARMONY big data platform
- Expand the intake of data sources to the other indications
- In conjunction with WP2 and others, implement a systematic approach for the management of taxonomies across the different indications
- Pilot the process and technical infrastructure for answering research questions
- Complete the first research question for acute myeloid leukemia (AML)
Work Package Leadership