Genomenon’s Automated Genomic Search Engine Mastermind Illuminates Current Trends in Genomic Literature – A Look at the Top Cited Variants Across the Scientific Literature

For this month’s “Company Spotlight” blog series we are taking a deeper look at Mastermind, an automated genomic search engine created by Genomenon, a University of Michigan spin-off. Mastermind, a comprehensive database of genomic disease-to-gene-to-variant associations, lets its users search millions of full-text articles from the primary medical literature to identify variants of interest, prioritize them, and retrieve relevant articles for disease-gene-variant combinations. Rather than providing a regular Q&A with a company representative, we chose to illustrate the richness of Mastermind’s content and its associated functionality through interesting citation data provided by Mark Kiel (CSO and co-founder at Genomenon) and Lauren Chunn (Data Processing Intern at Genomenon). Specifically, the Genomenon team supplied a current trends analysis of widely cited variants within the genomic literature. This analysis paints a picture of the changing landscape of genomic research and medicine, from variants that have remained a common feature for decades to variants that have newly emerged over the last few years.

Genomenon is a 15-person company, founded in 2014 and headquartered in Ann Arbor, Michigan. To date, Genomenon has received $4.5M in funding (angel and venture investment and an NIH SBIR Fast Track grant) and has grown to become an independent company.

Mastermind is a genomic search engine comprising an index of the titles, abstracts, and full text, including figures and tables, of 5.7 million prioritized primary articles. New content is added to Mastermind weekly, which allows users to stay abreast of the latest research and clinical information influencing personalized medicine, in both research and clinical applications, for cancer and constitutional disease. Mastermind is designed for variant curators in clinical and research laboratories, in both commercial and academic settings. It is intended to replace currently used search tools such as PubMed, which indexes titles and abstracts only, and Google Scholar, which does not index specifically for genetic content and does not prioritize results or organize information for clinical use. Mastermind is currently used by several dozen laboratories and over 500 users across 20 countries, either as the Free Edition or via subscription to the Professional Edition, for enhancing and automating high-throughput workflows.

Genomenon also provides services to clients to expedite database assembly and gene panel design. To illustrate the power of this technique, the Genomenon team has recently used Mastermind to develop the first evidence-based blood cancer panel based on automated machine learning techniques by mining the database for gene and variant biomarkers associated with leukemias and lymphomas. This blood cancer panel is available for download.

The following represents a current trends analysis, as provided by Mark Kiel and Lauren Chunn.

The top ten most cited variants

To illustrate the nature of the disease-to-gene-to-variant landscape, the top 10,000 most widely mentioned, clinically significant variants were identified and mapped over time by publication date. The top ten most widely cited variants are BRAF p.V600E, EGFR p.T790M, JAK2 p.V617F, HFE p.C282Y, EGFR p.L858R, KRAS p.G12D, CFTR p.F508del, KRAS p.G12V, BDNF p.V66M, and SNCA p.A53T. With the exception of BRAF p.V600E, they exhibit a steady upward curve in the number of citations over time, as shown in Figure 1.

Not surprisingly, the cancer-associated BRAF p.V600E mutation remains the most highly cited variant from 2008 onward, with a substantial increase in the number of citations between 2008 and 2014. Furthermore, of the top ten highly cited variants, six are associated with cancer, as might be expected, whereas four are associated with constitutional diseases, including Parkinson’s disease, hemochromatosis, and cystic fibrosis.

Figure 1: Shown are the top ten most widely cited variants graphed as the number of citations per year. Analysis time frame: 1994-2017.
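The kind of tally behind a chart like Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records below are invented, the function names are hypothetical, and nothing here reflects how Mastermind itself indexes or counts articles.

```python
from collections import Counter, defaultdict

# Hypothetical records: one (variant, publication_year) pair per citing article.
# These rows are invented for illustration and are not Mastermind data.
mentions = [
    ("BRAF p.V600E", 2008),
    ("BRAF p.V600E", 2014),
    ("BRAF p.V600E", 2014),
    ("EGFR p.T790M", 2014),
    ("JAK2 p.V617F", 2008),
    ("EGFR p.T790M", 2008),
]

def citations_per_year(records):
    """Map each variant to a Counter of {year: number of citing articles}."""
    per_variant = defaultdict(Counter)
    for variant, year in records:
        per_variant[variant][year] += 1
    return per_variant

def top_cited(records, n=10):
    """Return the n variants with the most citing articles overall."""
    totals = Counter(variant for variant, _ in records)
    return totals.most_common(n)

series = citations_per_year(mentions)   # per-variant yearly counts for plotting
ranking = top_cited(mentions, n=2)      # overall ranking across all years
```

Plotting each variant's yearly counter against publication year would give one curve per variant, in the spirit of Figure 1.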

Newly emerging highly cited variants are associated with cancer, with a focus on resistance mechanisms and the development of new therapeutics

While the top ten variants illustrate the direction and focus of genomic research over the past several decades, a second picture emerges from the newly emerging highly cited variants: they are predominantly associated with cancer and, more specifically, with resistance mechanisms and the development of new therapeutic interventions, as summarized in Table 1 below.

Table 1:  Cancer-associated variants that have greatly increased in number of citations in the past few years and are relevant to resistance mechanisms and/or therapeutic developments.

EGFR p.C797S mutation details 

One of the most notable variants is EGFR p.C797S, a mutation acquired only after treatment with EGFR tyrosine kinase inhibitors (TKIs), which confers resistance to these drugs in the treatment of lung cancer (Avizienyte et al., 2008).

EGFR p.C797S variant literature findings include:

  • This specific variant was first reported in November of 2007 by Yu et al.
  • The first documentation of its role in resistance occurred in October of 2008 by Avizienyte et al. (AstraZeneca).
  • Avizienyte and team found that the presence of the C797S mutation on the same EGFR allele as another mutation, p.T790M (one of the most widely cited variants overall), resulted in complete resistance to erlotinib, lapatinib, and the investigational drug CI-1033 (Avizienyte et al., 2008).
  • The variant was mentioned in the scientific literature only five times until 2015.
  • However, since 2015, the variant has been discussed in 218 articles (see also Figure 2).
  • The 2015 article by Niederst et al. (Massachusetts General Hospital Cancer Center) described that when the p.C797S and p.T790M mutations occur in trans, or on different alleles, the cells remain resistant to third-generation EGFR TKIs but are sensitive to a combination of first- and third-generation EGFR TKIs. Consistent with the earlier findings of Avizienyte et al., when those mutations occur in cis, or on the same allele, the cells remain fully resistant to EGFR TKIs used alone or in combination (Niederst et al., 2015).
  • Additional variants with precision therapeutic implications are depicted in Table 1, along with their disease associations and therapeutic affinities.

Figure 2:  Mutation EGFR p.C797S graphed as the number of citations per year (log scale). Each bubble represents a single article with the bubble size reflecting its relevance to genomic medicine (as determined by a quantification algorithm).

Figure 3:  Mastermind software screenshot highlighting the references identifying the EGFR p.C797S variant.
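The jump in EGFR p.C797S citations described in the bullets above can be expressed as a simple before-and-after comparison. The following Python sketch assumes a hypothetical dictionary of yearly counts and a hypothetical function name; the example numbers are invented and are not Mastermind data. Whether the cutoff year itself belongs to the "before" or "after" period is also an assumption (here it counts as "after").

```python
def fold_increase(yearly_counts, cutoff_year):
    """Compare citations from cutoff_year onward with those before it."""
    before = sum(n for year, n in yearly_counts.items() if year < cutoff_year)
    after = sum(n for year, n in yearly_counts.items() if year >= cutoff_year)
    if before == 0:
        # The ratio is undefined when there are no earlier citations.
        return before, after, None
    return before, after, after / before

# Invented yearly counts for a hypothetical variant (not Mastermind data).
example = {2007: 1, 2008: 1, 2012: 2, 2015: 10, 2016: 30, 2017: 40}
before, after, ratio = fold_increase(example, cutoff_year=2015)
```

Applied to the figures quoted above, 218 articles since 2015 against five before, the same ratio is 218 / 5 = 43.6, roughly a 44-fold increase.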

An upward trend towards constitutional diseases in parallel to ongoing cancer research

The EGFR p.C797S mutation is an excellent example of the current direction of genomic research focusing on cancer, precision therapies, and drug resistance mechanisms. However, many of the variants experiencing a similar burst in citations within the last few years are found to be involved in constitutional diseases (generally defined as pathological lesions whose etiology depends to a significant degree upon the action of genetic factors), particularly Alzheimer disease/dementia and Parkinson’s disease.

Genes and variants identified with greatly increased presence in the genomic literature include:

  • Variants reflecting an increasing focus on neurodegenerative, neuropsychiatric, and other disorders:
    • PARK2, VPS35, and SNCA: variants in three genes implicated in the pathogenesis of Parkinson’s disease
    • MAPT and TREM2: variants in genes implicated in the risk for and prognosis of Alzheimer disease
    • NUDT15 R139C: a variant associated with thiopurine-induced hair loss and leukopenia in the treatment of inflammatory bowel disease (Kakuta et al., 2016)
    • SLC10A1 S267F: a variant associated with a decreased risk of cirrhosis and hepatocellular carcinoma in patients with chronic hepatitis B infection (Hu et al., 2016)
  • Other notable variants, some of which are associated with cancer, include:
    • RAD51 G151D: a variant associated with a novel hyper-recombination phenotype and resistance to select DNA damaging agents (Marsden et al., 2016)
    • MITF E318K: a germ-line variant associated with an increased risk of developing melanoma (Potrony et al., 2016)
    • SF3B1 K700E: a variant associated with impaired erythropoiesis in myelodysplastic syndrome (Obeng et al., 2016)

Using its Mastermind database, Genomenon was able to demonstrate how the genomic literature has been changing in recent years, from the consistency of research involving widely known variants such as BRAF p.V600E to the substantial bursts of research describing newly discovered variants. Furthermore, it’s clear that the current field of genomic research is skewed toward cancer research, with a recent focus on the development of targeted therapeutics. However, while cancer research seems to dominate, the recent trend towards increased publication of variants affecting constitutional diseases such as Alzheimer and Parkinson’s diseases may foreshadow the emergence of personalized therapies for such diseases.

The Genomenon team invites our readers to use the Mastermind Free Edition now with the invitation code enlightenbio.


Avizienyte et al. Comparison of the EGFR resistance mutation profiles generated by EGFR-targeted tyrosine kinase inhibitors and the impact of drug combinations. (2008) Biochemical Journal 415, no. 2: 197-206.

Yu et al. Resistance to an irreversible epidermal growth factor receptor (EGFR) inhibitor in EGFR-mutant lung cancer reveals novel treatment strategies. (2007) Cancer research 67, no. 21: 10417-10427.

Niederst et al. The allelic context of the C797S mutation acquired upon treatment with third-generation EGFR inhibitors impacts sensitivity to subsequent treatment strategies. (2015) Clinical Cancer Research 21, no. 17: 3924-3933.

Kakuta et al. NUDT15 R139C causes thiopurine-induced early severe hair loss and leukopenia in Japanese patients with IBD. (2016) The pharmacogenomics journal 16, no. 3: 280-285.

Hu et al. The rs2296651 (S267F) variant on NTCP (SLC10A1) is inversely associated with chronic hepatitis B and progression to cirrhosis and hepatocellular carcinoma in patients with chronic hepatitis B. (2016) Gut 65, no. 9: 1514-1521.

Marsden et al. The Tumor-Associated Variant RAD51 G151D Induces a Hyper-Recombination Phenotype. (2016) PLoS genetics 12, no. 8: e1006208.

Potrony et al. Prevalence of MITF p.E318K in Patients With Melanoma Independent of the Presence of CDKN2A Causative Mutations. (2016) JAMA dermatology 152, no. 4: 405-412.

Obeng et al. Physiologic Expression of Sf3b1 K700E Causes Impaired Erythropoiesis, Aberrant Splicing, and Sensitivity to Therapeutic Spliceosome Modulation. (2016) Cancer Cell 30, no. 3: 404-417.

Varettoni et al. Pattern of somatic mutations in patients with Waldenström macroglobulinemia or IgM monoclonal gammopathy of undetermined significance. (2017) Haematologica 102, no. 12: 2077-2085.

Lynch and Ranjana. Dramatic Response with Single-Agent Ibrutinib in Multiply Relapsed Marginal Zone Lymphoma with MYD88 L265P Mutation. (2017) Case reports in oncology 10, no. 3: 813-818.

Thress et al. Acquired EGFR C797S mutation mediates resistance to AZD9291 in non-small cell lung cancer harboring EGFR T790M. (2015) Nature medicine 21, no. 6: 560-562.

Cole et al. Haploinsufficiency for DNA methyltransferase 3A predisposes hematopoietic cells to myeloid malignancies. (2017) The Journal of Clinical Investigation 127, no. 10: 3657-3674.

Thompson and Burger. Bruton’s tyrosine kinase inhibitors: First and second generation agents for patients with Chronic Lymphocytic Leukemia (CLL). Expert opinion on investigational drugs, (accepted for publ.).

Robinson et al. Activating ESR1 mutations in hormone-resistant metastatic breast cancer. (2013) Nature genetics 45, no. 12: 1446-1451.

Takeshita et al. Comparison of ESR1 mutations in tumor tissue and matched plasma samples from metastatic breast cancer patients. (2017) Translational oncology 10, no. 5: 766-771.

Jeselsohn et al. The evolving role of the estrogen receptor mutations in endocrine therapy-resistant breast cancer. (2017) Current oncology reports 19, no. 5: 35.

Bagchi et al. Molecular basis for necitumumab inhibition of EGFR variants associated with acquired cetuximab resistance. (2017) Molecular cancer therapeutics: molcanther-0575.

Ouyang et al. Clinical significance of CSF3R, SRSF2 and SETBP1 mutations in chronic neutrophilic leukemia and chronic myelomonocytic leukemia. (2017) Oncotarget 8, no. 13: 20834.

Fleischman et al. The CSF3R T618I mutation causes a lethal neutrophilic neoplasia in mice that is responsive to therapeutic JAK inhibition. (2013) Blood 122, no. 22: 3628-3631.

Smith et al. Characterizing and overriding the structural mechanism of the Quizartinib-Resistant FLT3 “Gatekeeper” F691L mutation with PLX3397. (2015) Cancer discovery 5, no. 6: 668-679.

Yap et al. Somatic mutations at EZH2 Y641 act dominantly through a mechanism of selectively altered PRC2 catalytic activity, to increase H3K27 trimethylation. (2011) Blood 117, no. 8: 2451-2459.

Li et al. Endocrine-therapy-resistant ESR1 variants revealed by genomic characterization of breast-cancer-derived xenografts. (2013) Cell reports 4, no. 6: 1116-1130.

Rathkopf et al. Androgen receptor mutations in patients with castration-resistant prostate cancer treated with apalutamide. (2017) Annals of Oncology 28, no. 9: 2264-2271.

Crona and Young. Androgen Receptor-Dependent and-Independent Mechanisms Involved in Prostate Cancer Therapy Resistance. (2017) Cancers 9, no. 6: 67.

Kiel et al. Integrated genomic sequencing reveals mutational landscape of T-cell prolymphocytic leukemia. (2014) Blood 124, no. 9: 1460-1472.

Andersson et al. Discovery of novel drug sensitivities in T-PLL by high-throughput ex vivo drug testing and mutation profiling. (2017) Leukemia 10.1038/leu.252.