A mind-boggling amount of healthcare data is generated annually: about a trillion gigabytes, according to the latest estimates. Only recently, thanks to the emergence of advanced analytical approaches such as machine learning and deep learning, has it become possible to make sense of such massive data sets. These approaches may help lift the veil on the complexity of human biology. Read about two promising examples.
Worldwide, 257 million people are chronically infected with the hepatitis B virus (HBV) [1]. Of these, 25% are expected to develop hepatocellular carcinoma, one of the deadliest known cancers. No curative therapies are available to date.
It is an extremely complex disease, and not every virus is the same: "The diversity of the Hepatitis B virus at patient level is enormous. In each patient you can find an ecosystem of viral variants," says Alan Mueller-Breckenridge, Systems Biology Scientist at Roche Pharma Research and Early Development (pRED).
Until recently, scientists were not able to understand the vast amount of genetic heterogeneity. Today, however, by pairing next generation ultra-deep sequencing with machine learning, these mutant viral strains can be characterised with a high degree of precision. For this purpose, scientific, clinical and computational experts from Roche collaborated with two medical centers in the Netherlands and China to obtain almost 400 representative patient samples of the virus from different populations and parts of the world.
"For the very first time, a comprehensive genome survey of HBV variants across distinct patient populations was undertaken," says Fernando Garcia-Alcalde, Team Lead Advanced Analytics at Diagnostics Information Systems. "We used a ‘random forest’ machine learning approach to analyse the deep sequencing data. As a result, we were able to predict which viral features are clinically relevant in people with chronic HBV."
However, this was only the first step: "In the future this methodology could allow clinicians to monitor patients over time," Alan continues. "This means that the combined application of deep sequencing and machine learning analysis in HBV research could help identify patients for more effective clinical trial design and understand how mutations relate to therapy success."
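The random forest idea mentioned above can be illustrated with a minimal sketch. It is not the analysis used in the study: the cohort is synthetic, the variant indices and the "informative" variant are hypothetical, and each tree is limited to a single split to keep the example short. In practice one would more likely use an established library implementation such as scikit-learn's RandomForestClassifier. The sketch only shows the principle: many trees are trained on resampled patients and random subsets of viral features, and features that repeatedly produce good splits are ranked as most relevant.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels, candidate_features):
    """Return (feature, impurity_decrease) for the best single split."""
    parent = gini(labels)
    n = len(labels)
    best_feature, best_gain = None, 0.0
    for f in candidate_features:
        absent = [y for x, y in zip(rows, labels) if x[f] == 0]
        present = [y for x, y in zip(rows, labels) if x[f] == 1]
        child = (len(absent) * gini(absent) + len(present) * gini(present)) / n
        gain = parent - child
        if gain > best_gain:
            best_feature, best_gain = f, gain
    return best_feature, best_gain


def forest_importance(rows, labels, n_trees=200, seed=0):
    """Normalised feature importance from a forest of one-split trees."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    importance = [0.0] * n_features
    for _ in range(n_trees):
        # Bootstrap: resample patients with replacement.
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random feature subset, as in a random forest.
        candidates = rng.sample(range(n_features), k)
        feature, gain = best_split(sample_rows, sample_labels, candidates)
        if feature is not None:
            importance[feature] += gain
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


def make_synthetic_cohort(n_patients=200, n_variants=10, informative=3, seed=42):
    """Synthetic data (an assumption): one variant drives the outcome label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_patients):
        row = [rng.randint(0, 1) for _ in range(n_variants)]  # variant present or absent
        label = row[informative]
        if rng.random() < 0.1:  # 10% label noise
            label = 1 - label
        rows.append(row)
        labels.append(label)
    return rows, labels


rows, labels = make_synthetic_cohort()
importance = forest_importance(rows, labels)
ranking = sorted(range(len(importance)), key=lambda f: importance[f], reverse=True)
print("Variants ranked by importance:", ranking)
```

In this toy setting the hypothetical variant at index 3 is the only one linked to the outcome, so it should come out at the top of the ranking, while the remaining variants share small, noise-level importance.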
Parkinson’s disease affects seven million people worldwide [2]. The number is expected to grow sharply as life expectancy increases. People suffer from tremor, muscle stiffness and balance problems, as well as cognitive and psychiatric symptoms such as depression, anxiety and memory loss. As the disease progresses, medication becomes less effective and the severity of symptoms can fluctuate from day to day. Traditionally, patients and their physicians monitor disease progression through regular but infrequent check-ins in the clinic. This can make it difficult to paint a clear picture of how the disease is actually progressing between doctor visits.
In 2014, we started to use mobile sensors in our clinical trials to collect data continuously from the day-to-day life of a patient. That way, a more holistic profile of how the disease is progressing can be established.
However, this approach also leads to huge amounts of data. Florian Lipsmeier, Digital Biomarker Data Analysis Lead, comments: "To analyse these massive amounts of data we apply a special model of machine learning that builds on deep artificial neural networks called Human Activity Recognition [3]. The networks have to be trained with known datasets, after which they are able to classify different types of human activities. This allows us to investigate the effects the disease has on these activities."
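The train-then-classify workflow described in the quote can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the model used in the trials: a nearest-centroid classifier stands in for the deep neural network, the sensor signal is synthetic, and the 50 Hz sampling rate, the window length and the two activity labels are all hypothetical. The sketch shows the general shape of the approach: a continuous sensor stream is cut into short windows, a model is fitted on windows with known activities, and new windows are then assigned to the closest known activity.

```python
import math
import random
import statistics


def windows(signal, size, step):
    """Split a 1-D signal into fixed-length, possibly overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def features(window):
    """Two simple per-window features: mean and standard deviation."""
    return (statistics.fmean(window), statistics.pstdev(window))


def train_centroids(labelled_windows):
    """'Training': average the feature vectors of each known activity."""
    sums = {}
    for label, window in labelled_windows:
        f = features(window)
        total, count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = ((total[0] + f[0], total[1] + f[1]), count + 1)
    return {label: (t[0] / n, t[1] / n) for label, (t, n) in sums.items()}


def classify(window, centroids):
    """Assign a window to the activity with the nearest feature centroid."""
    f = features(window)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


def synthetic_signal(activity, n_samples, rng):
    """Hypothetical accelerometer magnitude (in g), assumed sampled at 50 Hz."""
    amplitude = 0.5 if activity == "walking" else 0.0
    return [
        1.0 + amplitude * math.sin(2 * math.pi * 2.0 * t / 50) + rng.gauss(0, 0.02)
        for t in range(n_samples)
    ]


# Build a small labelled training set from synthetic recordings.
rng = random.Random(0)
training = []
for activity in ("resting", "walking"):
    recording = synthetic_signal(activity, 500, rng)
    training += [(activity, w) for w in windows(recording, size=100, step=50)]

centroids = train_centroids(training)

# Classify windows from a new, unlabelled recording.
new_recording = synthetic_signal("walking", 300, random.Random(1))
predictions = [classify(w, centroids) for w in windows(new_recording, 100, 50)]
print(predictions)
```

A deep network would learn its own features from the raw windows instead of relying on a hand-picked mean and standard deviation, but the surrounding steps (windowing, training on labelled data, classifying new data) follow the same pattern.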
The advantages of combining digital and machine learning technologies are diverse: being highly sensitive, they can detect symptom fluctuations day-to-day in a real-world setting. At the same time, the effort for patients is rather low and test results are less biased as the monitoring occurs in a private environment the patient is familiar with.
"When we saw the first machine learning based analysis from our digital biomarker studies, it became clear that we had made a big step forward, as it opened a window to a more comprehensive understanding of Parkinson’s and the patients’ daily life with the disease," says Florian. "In clinical trials we hope to generate more sensitive and objective endpoints with high clinical and patient relevance, with the potential to support smaller or shorter studies that might get treatments to patients faster. Building on the insights from machine learning in Parkinson’s disease, we will expand our digital biomarker programmes to diseases such as multiple sclerosis [4], schizophrenia [5], Huntington’s disease [6] and spinal muscular atrophy."
These are exciting times, as we are starting to see how advanced analytics can be applied to tackle some of the toughest problems in healthcare and make a difference for patients.
With the ultimate goal of finding better treatments for patients through systematic knowledge exchange, data scientists at Roche have established the Roche Advanced Analytics Network (RAAN), a global community of more than 750 members.
RAAN runs a variety of activities including the "annual data challenge". This year, 141 Roche teams are working to develop the best model to identify patients most likely to respond to cancer immunotherapy versus standard of care using data sets from in-house clinical trials.
"We have been blown away by how many people want to bring in their expertise, learn from and help each other," says Ryan Copping, Global Head of Analytics for Personalised Healthcare, Roche Product Development. "We never dreamt we would connect so many talents from across Pharma, Diagnostics, R&D and IT, all bringing in unique viewpoints and working on common problems."
[1] WHO [Internet; cited 2019 May 14]
[2] parkinsonslife [Internet; cited 2019 May 14]
[3] 2017 IEEE/ACM International Conference on Connected Health
[4] F. Hoffmann-La Roche Ltd. [AAN Congress abstract; presented 2018 April 21]. Available from:
[5] ISCTM [Congress abstract; presented 2018 Oct]
[6] F. Hoffmann-La Roche Ltd. [HSG Congress abstract; presented 2018 Nov 8]. Available from: