PMWC 2016 Silicon Valley: Scalable Clinical Data Interpretation The Biggest Bottleneck


This year’s Personalized Medicine World Conference (PMWC 2016 Silicon Valley) comprised many exciting sessions and presentations in the areas of next-generation sequencing, diagnostics applications, precision medicine, big data analysis, the microbiome, large cohort studies, biobanking, and data interpretation/knowledge extraction. In addition to providing a great set of overview talks on the latest developments and achievements across the health care sector, in pharma, and on regulatory aspects, this latest edition of PMWC also featured a strong government presence, with Kathy Hudson (Deputy Director for Science, Outreach, and Policy, NIH) and Elizabeth Baca (Health Advisor to Governor Brown’s Office of Planning and Research) sharing updates on the “Precision Medicine Initiative” and the “California Initiative to Advance Precision Medicine”, respectively. Furthermore, with personalized medicine becoming established, new emerging themes were on the agenda, such as wellness, liquid biopsy, and knowledge extraction, the latter a critical aspect of making personalized medicine a lasting success.

In this post, I summarize some of my personal conference highlights. I was not able to see all talks or visit all tracks, and thus cannot do justice to a number of other high-quality presentations that would have been equally worth covering.

A set of interesting quotes by presenters:

  • Keith Yamamoto (UCSF): “Precision medicine is an audacious aspiration for sure, but worthy the outcomes”.
  • Laura Esserman (UCSF):
    • “Just because it is targeted (e.g. personalized) may not mean it is better or working perfectly…”
    • “fit for purpose”.
    • “Aspire to Better”!
  • Kathy Giusti (MMRF): “big data is wonderful and great, but only ever as good as the question you are asking and answering”.

This year’s conference was chaired by Atul Butte, Director of the Institute for Computational Health Sciences at UCSF. His energetic kickoff talk reminded us of how far we’ve come in the development towards personalized medicine: the cost of genome sequencing continues to drop, the California Personalized Medicine Initiative was launched in April of 2015, the rise of valuable cohorts is evident, the wearable future is upon us and with it an even greater opportunity for data and knowledge acquisition on the individual level, and last but not least, exciting new companies are emerging that propel the field forward. Later in the conference Butte put a big emphasis on public and open source data accessible via ArrayExpress, PubChem, and ImmPort, to name a few, with clinical trials data being the next big data source.

Keith Yamamoto (Vice Chancellor for Research and Executive Vice Dean of the School of Medicine at UCSF) discussed how precision medicine is close to reaching its inflection point. However, there are still complications ahead of us, rooted in the fact that the biology underlying a given disease is complicated, with networks of interacting genes and molecules that we need to understand. Such an understanding, extracted from the mountain of data, is required to move beyond the inflection point. To achieve this we need to merge the different facets of science and build learning systems that integrate science and policies, in close coordination with stakeholders. The precision medicine efforts currently under way at UCSF aim to address this by collecting, connecting, and applying vast amounts of scientific research data and information about our health, with the goal of ultimately understanding why individuals respond differently to treatments and therapies, and subsequently translating these insights into the practice of more precise and predictive medicine worldwide. UCSF’s approach is based on a knowledge network that, together with Atul Butte’s institute, includes different commercial partners via two nuclear sites. The network is used in such a way that an investigator can click on a node to identify teams and collaborators: a model with flexible edges that respond to emerging directions driven by the specific collaboration projects. Dr. Yamamoto ended his presentation with the quote: “Precision medicine is an audacious aspiration for sure, but worthy the outcomes”.


Quantifying a baseline to focus on prevention and giving individuals the power to help control their health was elegantly introduced with Lee Hood’s new endeavor, Arivale. The underlying suggestion here is that wellness is about demystifying disease, and that approximately 15 years from now we will have created an entirely new domain. A variety of different data (Fitbit, microbiome, personal genome, GWAS, clinical chemistry data) are combined with monthly coaching to identify relevant new correlations that support reaching actionable personal health goals, identifying diseases early in the hope of reversing them, adjusting diet and exercise, and suggesting possible therapeutic approaches. Arivale is in the process of publishing the findings of its 100-pioneer program and is now focusing on rolling its services out as an enterprise offering. Other companies in this space include giants like Calico or Human Longevity, but also smaller startups like BaseHealth.

Breast cancer diagnostics

A talk I was particularly looking forward to was the one by Dr. Laura Esserman, a breast surgeon at UCSF. She is a big proponent of personalized medicine, yet cautions that a personalized treatment is not necessarily a better one. Just because a treatment is targeted (i.e. personalized) does not mean it is better or works perfectly; it could also have negative value and be very costly, and therefore it is important to take the approach of “fit for purpose”. As pointed out in her talk, the current drug discovery and development process is very expensive and inefficient, with long development times to market and a 70-90% failure rate of phase III trials. Esserman stated that we need to change the way we collect data in order to enable good clinical care. She herself has started or is involved in some major initiatives, which include the I-SPY TRIAL program (a multi-site neoadjuvant clinical trial that has evolved into a model for translational research and innovation in clinical trial design that tests drugs that matter), the Athena Breast Health Network (a project designed to follow 400,000 women from screening through treatment and outcomes), and WISDOM (which studies the benefits of a personalized approach to breast cancer screening that takes different measures like breast density and gene markers into account). Again, the emphasis is on collecting the right data and having the right screening process in place. Her concluding remarks were that “we need to ensure innovation and ‘Aspire to Better’”!

Clinical data interpretation

Clinical data interpretation is one of the main challenges we encounter these days, particularly given the flood of emerging sequence data that is expected to ultimately shape clinical applications, provided we succeed in generating accurate clinical interpretations. Christine Cournoyer (CEO at N-of-One) focused in her talk on the complexity of the task as we move away from the one test/one drug model to a model of thousands of diseases and variants, and on getting the strategy right. Cournoyer stated that we clearly need to understand multi-variant interactions. Some of the challenges associated with going beyond simpler gene panels include the integration of report delivery with the analysis, the integration of critical medical records data, and the question of how much time a pathologist should spend on interpreting the data. A deep analysis and interpretation requires teaming up with experts like oncologists to make full sense of the data. Last but not least, there is the challenge of scaling to the clinical testing that will be upon us in the near future.

Big data everywhere

…and with that came a set of talks that discussed applications, biobanking, and processes and implementation for standardization and sharing. Kathy Giusti, Founder and Executive Chairman of the Multiple Myeloma Research Foundation (MMRF) and a multiple myeloma patient, elaborated on the foundation’s efforts to move research to the clinic. Giusti pointed out that the main focus is on big data and that, working together with Genospace, they push to get the data out there. She stated that one of the joys is having partners that come together to analyze the different data sets. The MMRF’s focus is now on creating more innovative clinical trial designs and moving patients into master protocols. The MMRF therefore views itself more like a virtual biotech company, collecting as many patients as possible in order to enroll them when needed. She concluded by stating that “big data is wonderful and great, but only ever as good as the question you are asking and answering”.

Biobanks are a key element of big data and cohorts. Both Catherine Schaefer (Kaiser Permanente Northern California [KPNC]) and Sir Rory Collins (UK Biobank) covered the subject of biobanking, including a status update, how they recruited participants, the type of data they are collecting, and the challenges encountered. It was pointed out that the problem with any cohort is that only a few of its participants will be informative for any given condition, which requires these studies to be very large. As of today the Kaiser Permanente biobank includes about 200K samples, while the UK Biobank has about 500K samples. It took almost three years for the UK Biobank to reach 500K samples, which are now available to anybody in the world for research purposes. In the case of the UK Biobank, all information was gathered via an online portal asking specific questions about medical history (e.g. hearing, vision), occupation, fitness measures, and more. Furthermore, 100K enrollees are now wearing an accelerometer and are frequently asked diet-specific questions.

Diane Wuest (GNS Healthcare) discussed in detail GNS Healthcare’s approach to big data analysis, which includes machine learning and building causal models (a network of models) from multiple data sources. Interestingly, GNS Healthcare does not generate any data in-house, but rather relies on its partners to provide it. Here are some impressive big data numbers that they are tackling in collaboration with the Global Genomics Group (G3): 22 trillion data points and 7,500 subjects, with data sets that include proteome, lipidome, genetic data, clinical parameters, and different omics layers (e.g. mRNA), and 9.5 million causal edges between the variables.

David Haussler, Director of the UCSC Genomics Institute, and a panel (Oracle Healthcare, Color Genomics, and Carlos Bustamante from Stanford University) discussed the different approaches they are taking to share data and make it accessible. Haussler emphasized the need for harmonized guidelines, group participation, and standardized formats and locations. As an example, the BRCA Challenge was highlighted, which focuses on creating the largest collection of standardized disease variant data in one location. To address this challenge the GA4GH, led by Haussler’s team, is in the process of building a comprehensive graph of the universe of variants in one reference genome. Interestingly, Taylor Sittler from Color Genomics mentioned that Color will make all their BRCA results open and accessible via the Global Alliance.

Large-scale genomics

The large-scale genomics project session included talks by David Ledbetter (Geisinger), Peter Donnelly (Oxford University), Andrew Carroll (DNAnexus), Hannes Smarason (WuXi NextCODE), and Catherine Ball (Ancestry DNA). It is absolutely clear that we are accumulating massive amounts of data. This calls for elegant ways to manage the information (DNAnexus offers data compression mechanisms), fast processing and sharing of findings (all data produced by Geisinger is contributed to ClinVar and other databases), and large reference data (WuXi NextCODE and Ancestry DNA have each mined about 800K samples). The big bottleneck is scalable data interpretation (WuXi NextCODE employs deep learning).

The talk by Helix’s co-founder Justin Kao was energetic and one of my personal highlights of this year’s PMWC. The company, founded in 2015 and backed by Illumina, is dedicated to taking genomics out of research and to the consumer. The company’s headquarters are in San Francisco (to reach and attract qualified engineers), but its sequencing lab is in San Diego. Although Helix plans on storing the entire DNA sample of every participating customer, it will focus on sequencing specific windows of the whole sequence only, with the rest of the sample being stored and available for additional sequencing work when needed. Their goal is to function as the trusted intermediary that helps customers select a specific window of data and share it with a medical center. Helix will focus on the informatics by providing APIs to partners. The different partners will help personalize the products, create specific sites for specific needs, and help reach millions of qualified customers. For customers, the value proposition lies in the fact that they have lifetime access to their DNA, have control over what piece of DNA they want to interact with and share, and have access to high quality data via a secure and trusted partner. Helix plans to launch its product commercially by the end of 2016.

The Microbiome

New to this year’s PMWC was the added microbiome session with speakers from Stanford Medical School (Ami Bhatt), Second Genome (Peter diLaura), and Seres Therapeutics (David Cook). It was pointedly conveyed that with 100 trillion bacteria (~3% of our body mass, or ~1-10x as many microbes as human cells), our microbiome can’t be ignored much longer. The case is obvious: as we succeed in the microbiome space we will be able to improve our view of biomarkers, redefine pharmacology, and create effective new therapeutics.


The conference concluded with a talk by Charles Chiu (UCSF) discussing in detail the metagenomics research his lab is focusing on and the development of the bioinformatics pipeline SURPI, which allows pathogen identification from complex metagenomic next-generation sequencing (NGS) data generated from clinical samples. SURPI aligns the sample data against all sequences in GenBank to identify pathogen sequences. Last but not least, his talk also focused on what it would take to change the point-of-care setting to shorten the turnaround time of results, including sequencing and data interpretation. Currently his lab is testing the Nanopore sequencing system (in beta), which shows promise for performing viral read detection in less than eight minutes.

