
#bigdataforbloodcancer Blog 01/2023: Integrated data services are boosting the power of Big Data

April 14, 2023 10:16

#bigdataforbloodcancer, BigData, Data science

The HARMONY Alliance is managing a growing data lake with more than 80,000 anonymized data sets derived from blood cancer patients from 19 countries. This unique research infrastructure is associated with a series of sophisticated data processing and analysis tools, allowing scientists to answer pressing research questions that cannot be addressed with other methods.

Integrated Data Services

HARMONY has developed a range of integrated data services to advance blood cancer research. These data services allow researchers to apply advanced techniques like statistical modelling, Machine Learning, Neural Networks and Deep Learning to the data. As a result, HARMONY data science teams can model complex processes such as disease progression, and they can identify new markers and perform risk stratification. 

Data harmonization and processing
Strict measures are applied to ensure data security and quality on the platform. “A new data set must go through several processing steps before it can be added to the growing data lake. These steps include anonymizing the data, evaluating the quality of the data, and harmonizing the data,” says data manager Laura Tur Giménez of HARMONY Partner GMV.

Dedicated research workspace
Researchers who want to analyze the data are invited to join the HARMONY Alliance as Associated Members and contribute their data to the platform. Once they have formally joined, they are welcome to submit a research proposal, which will be evaluated by scientific experts involved in the HARMONY Alliance. A workspace and user accounts for analysis will then be created, enabling research teams to work with a subset of the data lake, together with HARMONY’s data scientists. “We only extract the variables required to answer their specific research question and the cases that meet the inclusion criteria specified in their research proposal. This helps us guarantee data safety as well as efficiency,” says project manager Antonio Pérez Bautista of GMV.
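As a rough illustration of the extraction step described in the quote above, the following Python sketch keeps only the requested variables for the cases that meet a set of inclusion criteria. The record layout, the field names (`diagnosis`, `age`, `karyotype`, `site`), and the criteria are hypothetical and are not taken from the HARMONY platform.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]


def extract_subset(
    records: Sequence[Record],
    variables: Sequence[str],
    include: Callable[[Record], bool],
) -> List[Record]:
    """Return only the requested variables for records that pass `include`."""
    return [
        {name: rec.get(name) for name in variables}
        for rec in records
        if include(rec)
    ]


# Hypothetical toy records standing in for harmonized data lake entries.
data_lake = [
    {"id": 1, "diagnosis": "AML", "age": 54, "karyotype": "normal", "site": "A"},
    {"id": 2, "diagnosis": "MM", "age": 61, "karyotype": "complex", "site": "B"},
    {"id": 3, "diagnosis": "AML", "age": 17, "karyotype": "complex", "site": "A"},
    {"id": 4, "diagnosis": "AML", "age": 70, "karyotype": "complex", "site": "C"},
]


def adult_aml(rec: Record) -> bool:
    """Hypothetical inclusion criteria: adult patients diagnosed with AML."""
    return rec["diagnosis"] == "AML" and rec["age"] >= 18


# Only the two variables named in the (hypothetical) proposal are copied.
workspace = extract_subset(data_lake, ["age", "karyotype"], adult_aml)
print(workspace)
```

Copying only the named variables, rather than deleting the unwanted ones, means that a field added to the data lake later never reaches a workspace unless a proposal explicitly asks for it.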

Data analytic tools

Several data analytic tools have been integrated into the platform. At present, two applications developed by HARMONY teams are available. The first is a data discovery tool, which allows users to visualize the data and perform basic analyses. Built as an R/Shiny dashboard, it shows a descriptive analysis of the data, including the number of concepts for each category and subcategory and their values. The second is a Machine Learning tool to predict the risk of relapse after the first remission in AML patients treated without stem cell transplantation. The platform also features a tool that was developed outside the HARMONY project: the Zeppelin tool, which can be used to inspect the data and to work within the workspaces created for the HARMONY Project. “In the future, we aim to develop and add more tools, such as TIBCO Spotfire Data Visualization and Analytics Software or any other application developed by one of our partners,” says Antonio Pérez Bautista.

“Taking advantage of the installation of the TIBCO Spotfire tool, web-accessible dashboards can be created to facilitate the exploration of the data without the knowledge of a data scientist. In these dashboards, it is possible to define cohorts of patients filtering by any of the variables (e.g., age, treatment, genetic mutations, karyotype abnormalities, ELN2022 classification) and to perform initial exploratory analyses such as Kaplan-Meier curves, hazard ratio calculation, and comparison of cohort profiles,” say Angela Villaverde and Javier Martínez from IBSAL, Institute for Biomedical Research of Salamanca.
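To make the Kaplan-Meier curves mentioned above more concrete, here is a minimal Python sketch of the standard product-limit estimator. It is not the Spotfire dashboard logic; the function name and the follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. relapse) was observed at that time,
              0 if the patient was censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Patients censored at time t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times (months) and event indicators for one cohort.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Running the estimator once per cohort and plotting the resulting step functions side by side gives the kind of exploratory cohort comparison described in the quote.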


Screenshots of the HARMONY Alliance visualization tool


Eric Sträng is a data scientist at HARMONY Partner Charité Berlin. He explains: “I am involved in the management and analysis of AML data. I am collaborating with data scientists from HARMONY Partners University of Bologna and University of Salamanca, amongst others. Clinicians and biologists are involved as well. As we know how to analyze the data, they are needed to interpret the results and guide the analyses. We are now in a position where we can present the first results. We have collected data sets from large numbers of patients, so we can use these to refine and confirm our preliminary results. This is a major strength of HARMONY. Meanwhile, data analysis tools are constantly being added to the Big Data Platform. The more features we can offer, the better.” 

AML occupies a special position in the HARMONY Alliance because it was the pilot project. HARMONY has already provided knowledge that may benefit patients with AML in the future.
Recently, HARMONY’s efforts in other disease fields have started to pay off as well, delivering novel insights into the biology of a variety of hematologic malignancies.

Artificial intelligence

The HARMONY teams in Berlin, Bologna, and Salamanca are currently collaborating on a project on therapy optimization in AML patients. AML patients go through a range of therapy stages, including, for instance, intake, a visit to a hematologist, and first treatment. “We are applying a mix of sophisticated statistical and AI methods to model these stages and study the effect on the risk of relapse. We are modelling several parameters of these stages, including the timing of the transition from one stage to the next. So, for instance, does a patient have a lower risk of relapse if he or she immediately proceeds to immune system inhibition?” says Gastone Castellani, Professor of Applied Physics and Biophysics at HARMONY Partner University of Bologna.

Alessandra Merlotti, bioinformatics expert at the University of Bologna, explains: “We have also developed visual tools that allow users to generate graphs with different clusters of patients and how they evolve over time. In AML, each patient has around 100-200 variables and most of these are not informative for clinical outcomes. The tool uses a state-of-the-art algorithm to pick the most relevant outcomes and hence reduce the dimensionality of the data set. This makes it easier to handle and analyze the data, allowing clinicians to make sense of what is displayed in the graph.”
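The quote above does not name the algorithm used in the tool. Purely to illustrate the general idea of reducing a wide patient table to a few informative variables, the Python sketch below ranks variables by the absolute Pearson correlation with a binary outcome and keeps the top few. This simple univariate filter is a stand-in, not the HARMONY method, and the variable names and values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has no variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def top_k_variables(
    columns: Dict[str, Sequence[float]], outcome: Sequence[float], k: int
) -> List[str]:
    """Return the k variable names most correlated (in absolute value) with outcome."""
    scores = {name: abs(pearson(vals, outcome)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical patient table: one list of values per variable.
columns = {
    "blast_pct": [80, 75, 20, 15, 70, 10],
    "age": [60, 45, 50, 62, 48, 55],
    "constant_flag": [1, 1, 1, 1, 1, 1],
    "wbc": [50, 40, 12, 9, 45, 11],
}
relapse = [1, 1, 0, 0, 1, 0]  # hypothetical binary outcome per patient

selected = top_k_variables(columns, relapse, k=2)
print(selected)
```

A univariate filter like this ignores interactions between variables, which is one reason real tools tend to rely on more elaborate methods.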

Javier Martínez and Angela Villaverde comment: “During the analysis, advanced statistical and machine learning libraries are used to promote insights that help to better characterize patients and improve diagnoses in the future. As a result of these analyses, models accessible from mobile phones were developed to predict patient risk. Multiple presentations were held at congresses of the European Hematology Association (EHA) and the American Society of Hematology (ASH) to share this work with hematologists from across the globe.”

Integrated services platform

By adding further data analytic tools, the HARMONY Alliance is transforming the Big Data Platform into an Integrated Services Platform. In this way, the Big Data Platform and the associated data processing and analysis tools could serve as a technology platform enabling advances in the field of Big Data analytics and Artificial Intelligence for healthcare. “The Big Data Platform should allow all stakeholders to exploit its analytical capabilities in an easy and intuitive way, answering their questions about different patient segments using HARMONY data, while maintaining HARMONY’s high standards for data safety and quality. This will be a great resource for stakeholders such as researchers, clinical professionals, social entities, policy makers, data scientists, and pharmaceutical companies. By expanding these services and making them even more sophisticated, the care for blood cancer patients across Europe will get a major boost,” concludes Gastone Castellani.

Interested in reading more about our data science expertise? Visit #bigdataforbloodcancer blog 02/2023.

#Bigdataforbloodcancer: Big Data to accelerate better and faster treatment for Patients with Hematologic Malignancies

The HARMONY Alliance is a European Public–Private Partnership for Big Data in Hematology that is capturing and mining Big Data on various Hematologic Malignancies. It is funded by IMI (now IHI, the Innovative Health Initiative) and unites more than 120 organizations, such as European medical associations, hospitals, research institutes, patient organizations, pharmaceutical companies, IT companies, and health technology assessment/regulatory agencies.

The inclusion of all these stakeholders reflects the HARMONY ambition to develop tools that clinicians and patients can adopt and that are also of interest to regulators, payers, and health technology assessment bodies.
