From Data Strategy to Implementation to Advance Cancer Research and Cancer Care: a French Comprehensive Cancer Center Experience

Pierre Heudel, Hugo Crochet, Thierry Durand, Philippe Zrounba, Jean-Yves Blay


In a comprehensive cancer center, effective data strategies are essential for evaluating practices and outcomes, understanding the disease and its prognostic factors, identifying disparities in cancer care, and ultimately developing better treatments. To achieve these goals, the Center Léon Bérard (CLB) considers various data collection strategies, drawing on electronic medical records (EMRs), clinical trial data, and research projects. Advanced data analysis techniques such as natural language processing (NLP) can be used to extract and categorize information from these sources and so provide a more complete description of patient data. Data sharing is also crucial for collaboration across comprehensive cancer centers, but it must be done securely and in compliance with regulations such as the GDPR.


Cancer is a disease with multidimensional complexity that affects millions of people worldwide, and it requires a comprehensive approach for effective treatment and management [1,2]. Comprehensive cancer centers play a crucial role in providing state-of-the-art cancer care and conducting cutting-edge research to advance our understanding of the disease [3]. To fulfill these roles, comprehensive cancer centers need data strategies that enable the efficient collection, analysis, and use of data [4,5]. Contending with a continuously expanding volume and variety of clinical data poses both challenges and opportunities for a comprehensive cancer center.

Materials and methods

In 1989, the French comprehensive cancer center Léon Bérard, located in the Rhône-Alpes region, launched a pilot phase to implement a single EMR procedure for each cancer patient. The procedure was adopted in January 1993, initially covering computerized medical observations and hospitalization reports. It then progressively integrated prescriptions of chemotherapy and blood products (from 1996), external medical reports (from 2000), and additional files such as medical imaging, histology results, and other medications (from 2005). In July 2002, the EMR became the reference file, and a fully electronic format for medical records was definitively adopted in 2006, with updates performed daily.


In parallel with the daily use of digital health data and the development or integration of new computer software, we defined a data strategy to make data reliable, accessible, and easily reusable for health data research, in line with international references (the FAIR guiding principles) [17]. Our EMR mining process involves several stages: obtaining data access, extracting data using SQL and other tools, and then refining the data through both manual verification and automated processing via NLP.
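The extraction and refinement stages of such an EMR mining process can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `reports` table, its columns, the sample report texts, and the function names are hypothetical and do not describe the CLB's actual EMR schema or tooling. A simple regular expression for TNM staging stands in for the NLP step, and reports in which it finds nothing are set aside for manual verification.

```python
import re
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'reports' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports (patient_id INTEGER, report_date TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?, ?)",
        [
            (1, "2006-03-14", "Invasive ductal carcinoma, staged pT2 N1 M0."),
            (2, "2007-09-02", "Follow-up consultation, no new findings."),
            (3, "2008-01-20", "Lung adenocarcinoma. Staging: pT1 N0 M0"),
        ],
    )
    return conn


# Rule-based stand-in for the NLP step: captures a simple TNM staging mention.
TNM_PATTERN = re.compile(r"\bp?(T[0-4])\s*(N[0-3])\s*(M[01])\b")


def extract_staging(conn, since):
    """Extract reports dated on or after `since` (ISO date string) with SQL,
    then split them into structured records and reports needing manual review."""
    rows = conn.execute(
        "SELECT patient_id, body FROM reports "
        "WHERE report_date >= ? ORDER BY patient_id",
        (since,),
    )
    structured, needs_review = [], []
    for patient_id, body in rows:
        match = TNM_PATTERN.search(body)
        if match:
            t, n, m = match.groups()
            structured.append({"patient_id": patient_id, "T": t, "N": n, "M": m})
        else:
            # No staging found automatically: flag for manual verification.
            needs_review.append(patient_id)
    return structured, needs_review


if __name__ == "__main__":
    conn = build_demo_db()
    structured, needs_review = extract_staging(conn, "2006-01-01")
    print(structured)
    print(needs_review)
```

Keeping the automated output and the manual-review queue separate mirrors the combination of automated processing and manual verification described above. In practice the pattern matching would be replaced by a proper NLP pipeline.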


We present here the development of a data strategy for health data research at the Léon Bérard cancer center, with extraction results as examples. The strategy is based on three main axes: data collection, data analysis, and data sharing. The CLB uses EMRs and patient-reported outcome measures (PROMs) to collect data on cancer patients, including tumor-centered data, vital status, and patient characteristics. The CLB also collects toxicity data through software that integrates it directly into medical consultation or hospital stay reports.


Data is a critical asset for comprehensive cancer centers, and effective data strategies are essential for delivering high-quality care, advancing research, and improving patient outcomes. Electronic medical records, patient-reported outcome measures, and data sharing agreements are all key components of a successful data strategy. By implementing these strategies, comprehensive cancer centers can leverage data to drive advancements in cancer research and provide the best possible care.

