Cancer gene expression signatures: The Rise and Fall…and Rise?

The increasing uptake of RNA-Seq as the technology of choice in biomarker discovery

James Bradford

Head of Bioinformatics, Almac Diagnostic Services


Gene expression signatures – Promise vs Reality

Since the advent almost 25 years ago of techniques that enable simultaneous measurement of the expression of many genes in a single sample, the use of gene expression profile combinations (or “signatures”) to understand tumour biology has promised to revolutionise our approach to diagnosis and prognosis in the clinic. However, this potential has yet to be fully realised despite the exponential increase in genomic data during this period. This blog explores some of the issues that have hindered progress, and highlights several recent trends and innovations that may yet fuel a rise in successful translation of gene signatures into clinical application.

Gene expression signature discovery: a history of potential

A gene expression signature refers to a finite, pre-determined group of genes whose combined expression profile is highly specific to a biological process, disease state or pathogenic medical condition. Typically, the signature generation process takes place in three main phases: discovery, development and independent validation.

Discovery: During discovery, gene expression profiles are determined in a set of “training” samples, and genes that are highly correlated with, or differentially expressed between, the phenotype(s) of interest (e.g. prognosis) are selected. The training set must be large enough to allow statistically meaningful associations between genes and phenotype to be identified, as an underpowered analysis can lead to failure during the independent validation phase.

Development: During development, genes are further refined, and candidate signatures undergo rigorous cross-validation before a method to score and classify patients based on their signature profile is implemented. The discovery and development phases are often indistinguishable, particularly if automated methods such as machine learning are employed.

Independent validation: For translation into the clinic, an independent validation phase, where signatures are tested in clinically relevant cohorts distinct from those used to develop the classifier, is critical.
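The three phases can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the gene names, the expression values, the ranking by difference in class means and the midpoint threshold are not taken from any published signature, and a real study would use far larger cohorts and proper statistics.

```python
from statistics import mean

def discover(train, labels, n_genes):
    """Discovery: rank genes by the difference in mean expression between
    the two classes and keep the top n_genes with their direction."""
    diffs = {}
    for gene in train[0]:
        pos = mean(s[gene] for s, y in zip(train, labels) if y == 1)
        neg = mean(s[gene] for s, y in zip(train, labels) if y == 0)
        diffs[gene] = pos - neg
    top = sorted(diffs, key=lambda g: abs(diffs[g]), reverse=True)[:n_genes]
    return {g: (1 if diffs[g] > 0 else -1) for g in top}

def score(sample, signature):
    """Development: signed average of the signature genes."""
    return mean(sign * sample[g] for g, sign in signature.items())

def fit_threshold(train, labels, signature):
    """Development: midpoint between the mean scores of the two classes."""
    pos = mean(score(s, signature) for s, y in zip(train, labels) if y == 1)
    neg = mean(score(s, signature) for s, y in zip(train, labels) if y == 0)
    return (pos + neg) / 2

def validate(cohort, labels, signature, threshold):
    """Independent validation: accuracy of the locked signature and
    threshold in a cohort that played no part in training."""
    calls = [1 if score(s, signature) > threshold else 0 for s in cohort]
    return sum(c == y for c, y in zip(calls, labels)) / len(labels)

# Hypothetical log-expression values for three made-up genes.
train = [
    {"A": 8.0, "B": 2.0, "C": 5.0},
    {"A": 7.5, "B": 2.5, "C": 5.2},
    {"A": 3.0, "B": 6.0, "C": 5.1},
    {"A": 3.5, "B": 6.5, "C": 4.9},
]
train_labels = [1, 1, 0, 0]
independent = [
    {"A": 7.0, "B": 3.0, "C": 5.0},
    {"A": 4.0, "B": 6.0, "C": 5.0},
]
independent_labels = [1, 0]

signature = discover(train, train_labels, n_genes=2)
threshold = fit_threshold(train, train_labels, signature)
accuracy = validate(independent, independent_labels, signature, threshold)
```

The point of the structure is that `validate` only ever sees a signature and threshold that were fixed beforehand; nothing is re-estimated on the independent cohort.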

Whilst the first gene expression signature can be traced back to 1995, a simple search of PubMed reveals that cancer gene expression signature development activities did not become widespread until the early-to-mid 2000s (Figure 1).



Figure 1 suggests two periods of exponential growth (2003-2010 and 2014-present), each catalysed by a major advancement in gene expression profiling technology around five years earlier: the cDNA microarray in the mid-1990s and RNA Sequencing (RNA-Seq), its Next Generation Sequencing-based successor, in the mid-2000s. The consolidatory period between 2011 and 2013 appears to coincide with a transition between the two platforms. It is also worth noting that the release of The Cancer Genome Atlas (TCGA) expression data, generated by both microarray and RNA-Seq platforms, began in 2010, and has likely provided further impetus for signature development during the last decade. Currently, over 100 cancer gene signature development efforts are published every year, and in 2020 the number is likely to reach nearly 250.

Translation to the clinic: a potential unfulfilled…yet

Despite the opportunities offered by new technologies, and extensive efforts by the MAQC (Microarray Quality Control) consortium [1] to show both microarray and RNA-Seq platforms are sufficiently reliable for clinical and regulatory purposes (at least if mRNA is extracted from high quality samples), the vast majority of signatures have failed to make any clinical impact. Indeed, to the author’s knowledge, only two gene expression signatures have gained FDA approval, both of which are prognostic in breast cancer: Prosigna (a 50-gene signature providing a risk-of-recurrence score) from Veracyte and MammaPrint (a 70-gene signature to stratify patients into high versus low risk for relapse) from Agendia. No gene expression signature has gained approval since Mammaprint in 2013.

Problems associated with the first exponential growth phase of signature development were documented in two seminal papers published just after that period [2, 3].

Principally, they noted that many early gene signatures were developed on small training sets (increasing the risk of over-fitting) and lacked external validation, resulting in low reproducibility in independent datasets. A PubMed search confirms that signature validation was rarely performed in the microarray-led growth period, with no publication explicitly referring to “validation” in the title between 2003 and 2008, and only six publications between 2009 and 2014 (Figure 2). However, recent trends suggest that the situation has slowly improved in the RNA-Seq-led growth phase, culminating this year with over 15% of developed signatures undergoing some form of validation. Whether this trend translates to a higher proportion of FDA-approved signatures in the future remains to be seen, but it does suggest that lessons have been learned and more rigorous practices are now being applied to RNA signature development.

Two further issues continue to impede progress to the clinic. Firstly, for many signatures, the gain in predictive accuracy compared to more established prognostic factors better suited to clinical testing is either insufficient or unquantified. Indeed, the PAM50 signature, which enables classification of breast cancer into four prognostic subtypes [4], is justifiably regarded as clinically influential (and now forms the basis of the Prosigna assay), but at the time was not immediately adopted in the clinic because cheaper and more efficient surrogates such as immunohistochemical measurement of hormone receptor (HR) and HER2 status performed equally well. Secondly, some signatures are cohort-dependent: individual sample scores rely on information from other samples in the same cohort. This results in unstable and non-reproducible scores that cannot be validated for use in prospective clinical testing, where samples are measured one at a time.
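The cohort-dependence problem can be made concrete with a small sketch. It assumes a hypothetical two-gene signature scored as the average of centred expression values; the gene names, expression values and the locked reference centres are invented for illustration only.

```python
from statistics import mean, median

SIGNATURE = ["G1", "G2"]  # hypothetical signature genes

def cohort_dependent_scores(cohort):
    """Median-centre each gene across the cohort, then average the
    signature genes. Every score depends on the other samples."""
    centres = {g: median(s[g] for s in cohort) for g in SIGNATURE}
    return [mean(s[g] - centres[g] for g in SIGNATURE) for s in cohort]

# Reference centres fixed once during development (hypothetical values).
LOCKED_CENTRES = {"G1": 5.0, "G2": 5.0}

def single_sample_score(sample):
    """Score one sample against locked reference values only."""
    return mean(sample[g] - LOCKED_CENTRES[g] for g in SIGNATURE)

patient = {"G1": 6.0, "G2": 6.0}
low_cohort = [patient, {"G1": 2.0, "G2": 2.0}, {"G1": 4.0, "G2": 4.0}]
high_cohort = [patient, {"G1": 8.0, "G2": 8.0}, {"G1": 10.0, "G2": 10.0}]

score_in_low = cohort_dependent_scores(low_cohort)[0]    # same patient...
score_in_high = cohort_dependent_scores(high_cohort)[0]  # ...different score
locked_score = single_sample_score(patient)
```

The same patient scores positive in one cohort and negative in the other under cohort-based centring, whereas the locked reference gives one reproducible value regardless of which other samples happen to be run alongside.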


Almac signature discovery and validation process

Almac Diagnostic Services has long been a strong advocate for the use of robust best practices and standards in signature development and validation, exemplified by our active contribution to the MAQC initiative in 2010 [1]. Based on this experience, we have an established bioinformatics Biomarker Discovery process applicable to both cDNA microarray and RNA-Seq platforms designed to meet MAQC standards and avoid the common pitfalls highlighted above.

The initial phase of the process consists of a series of data QC steps, which include Almac’s proprietary Exploratory Analysis (EA) tool to identify and reduce any technical effects that may confound signature generation.

This is followed by the signature discovery/development phase, which begins with feature selection and performance metric generation carried out under cross-validation, and then application of a machine learning method appropriate to the endpoint (whether discrete or continuous). Multiple factors guide final model selection including statistical performance, biological relevance and independence from established clinical biomarkers. The chosen model is always further validated using independent test data.
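As a generic illustration of what “feature selection carried out under cross-validation” means (this is not Almac’s pipeline, and the data, gene names and simple classifier are all hypothetical), the sketch below repeats gene selection inside every fold so that the held-out samples never influence which genes are chosen.

```python
from statistics import mean

def fit(samples, labels, n_genes):
    """Select genes and fit a threshold using the given samples only."""
    diffs = {}
    for gene in samples[0]:
        pos = mean(s[gene] for s, y in zip(samples, labels) if y == 1)
        neg = mean(s[gene] for s, y in zip(samples, labels) if y == 0)
        diffs[gene] = pos - neg
    top = sorted(diffs, key=lambda g: abs(diffs[g]), reverse=True)[:n_genes]
    signature = {g: (1 if diffs[g] > 0 else -1) for g in top}
    scores = [score(s, signature) for s in samples]
    pos = mean(sc for sc, y in zip(scores, labels) if y == 1)
    neg = mean(sc for sc, y in zip(scores, labels) if y == 0)
    return signature, (pos + neg) / 2

def score(sample, signature):
    return mean(sign * sample[g] for g, sign in signature.items())

def cross_validate(samples, labels, k, n_genes):
    """k-fold cross-validation with feature selection nested in each fold."""
    fold_accuracy = []
    for fold in range(k):
        test_idx = set(range(fold, len(samples), k))
        train = [s for i, s in enumerate(samples) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        # Selection and fitting use the training part of the fold only.
        signature, threshold = fit(train, train_y, n_genes)
        hits = [
            (1 if score(samples[i], signature) > threshold else 0) == labels[i]
            for i in sorted(test_idx)
        ]
        fold_accuracy.append(sum(hits) / len(hits))
    return mean(fold_accuracy)

# Hypothetical log-expression values for three made-up genes.
samples = [
    {"A": 8.0, "B": 2.0, "C": 5.0},
    {"A": 7.5, "B": 2.5, "C": 5.2},
    {"A": 7.0, "B": 3.0, "C": 4.8},
    {"A": 3.0, "B": 6.0, "C": 5.1},
    {"A": 3.5, "B": 6.5, "C": 4.9},
    {"A": 4.0, "B": 7.0, "C": 5.0},
]
labels = [1, 1, 1, 0, 0, 0]

cv_accuracy = cross_validate(samples, labels, k=3, n_genes=2)
```

Selecting genes once on the full dataset and only then cross-validating the classifier would leak information from the held-out samples and tend to give optimistic performance estimates, which is the over-fitting risk described earlier.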

As a result of this process, Almac Diagnostic Services has discovered and validated several proprietary biomarkers such as our own DNA Damage Immune Response (DDIR), Angiogenesis, Epithelial-Mesenchymal Transition (EMT), ProstateDx and ColDx signatures. Almac’s ColDx and DDIR signatures, originally developed on microarray platforms, have been independently clinically validated by the Cancer and Leukemia Group B (CALGB) and SWOG consortiums respectively [5, 6]. Furthermore, the DDIR signature has recently been transferred to the Illumina RNA Exome platform, undergoing a rigorous analytical validation process that meets both Clinical Laboratory Improvement Amendments (CLIA) and Clinical and Laboratory Standards Institute (CLSI) guidelines [7]. To the author’s knowledge, this is one of the first analytical validation studies of a gene expression signature on RNA-Seq technology.

The wisdom of crowds: Almac claraT Total mRNA report

Recently we have seen increased interest in companion diagnostic gene expression signatures from our pharmaceutical partners. This is perhaps because the complexity of tumour biology underpinning the response to certain therapies is not adequately captured by immunohistochemical or DNA analysis. For example, the response to immune-targeted therapies may be determined by a complex interaction between tumour and stromal molecular pathways, which are often dysregulated at the gene expression level through mechanisms other than DNA mutation.

This has led Almac Diagnostic Services to develop claraT, a unique software-driven solution that integrates a diverse set of pan-cancer gene expression signatures into a comprehensive, easy-to-interpret cohort report. Over 90 signatures representing all 10 Hallmarks of Cancer [8] are included in the report, with each signature selected for inclusion based on a set of rigorous scientific and technical criteria including:

  1. literature-based review of scientific and clinical rationale.
  2. level of validation and clinical utility.
  3. feasibility of implementing published signature methodology.

The integrative approach compensates for any potential shortcomings at the individual signature level by use of information from other signatures, thus increasing the likelihood of discovering accurate and clinically relevant disease subtypes. It also allows efficient visualisation of the key discriminating biologies within either a large cohort or an individual tumour sample. The full end-to-end solution from raw sequence data to claraT report takes a matter of hours regardless of cohort size, saving researchers months of effort in selecting and implementing a similar number of signatures themselves. Thus, claraT accelerates the interpretation of complex datasets, extracting value from gene expression markers even for those without specialist computational knowledge.

Closing remarks

Whilst the potential of gene expression signatures remains largely unfulfilled, the increased drive in recent years to meet the standards first advocated by MAQC a decade ago provides hope that signatures can begin to progress more frequently beyond the development phase and translate to patient benefit.

Almac Diagnostic Services will continue to look to the future and support these efforts by promoting robust signature development and best practice, whilst drawing on our experience to offer claraT, a powerful computational tool that distils some of the most prominent cancer gene signatures to emerge over the last 25 years into a single reporting solution.


More information

Almac Diagnostic Services has recorded a webinar on the analytical validation of gene expression signatures, presented by Dr Katarina Wikstrom, entitled “Overcoming the challenges of taking RNA biomarkers into clinical trials.”

Download this complimentary webinar to explore key considerations in the development and analytical validation of RNA biomarkers suitable for clinical stratification.



1. MAQC Consortium (2010) The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nature Biotechnology, 28(8): 827-838.
2. Koscielny S. (2010) Why most gene expression signatures of tumors have not been useful in the clinic. Science Translational Medicine, 2: 14ps2.
3. Chibon F. (2013) Cancer gene expression signatures – The rise and fall? European Journal of Cancer, 49(8): 2000-2009.
4. Parker et al. (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. Journal of Clinical Oncology, 27(8): 1160-1167.
5. Niedzwiecki J et al. (2016) Association between results of a gene expression signature assay and recurrence-free interval in patients with stage II colon cancer in Cancer and Leukemia Group B 9581 (Alliance). Journal of Clinical Oncology, 34(25): 3047-3053.
6. Sharma P et al. (2019) Validation of the DNA Damage Immune Response signature in patients with triple-negative breast cancer from the SWOG 9313c trial. Journal of Clinical Oncology, 37(36): 3484-3492.
7. Medlow et al. (2021) Analytical validation of an RNA Exome sequencing gene expression assay. In preparation.
8. Hanahan D & Weinberg RA. (2011) Hallmarks of Cancer: The Next Generation. Cell, 144(5): 646-674.

The Almac Advantage – Post-Brexit Northern Ireland IVD Landscape

Almac is uniquely placed to act as a “one stop shop” with easy access to both the EU and UK for biomarker clinical trial support and CDx development &


Dr Stewart McWilliams

Global VP of Quality & Regulatory Affairs, Almac Diagnostic Services




Due to the special status that Northern Ireland has been granted as part of the EU Withdrawal Agreement between EU27 and UK, once the transition period ends (and regardless of whether the EU and UK have concluded a trade agreement by then), Northern Ireland will continue to adhere to EU rules on the regulation of medicinal products, medical devices and the movement of goods. This part of the Withdrawal Agreement is known as the “Northern Ireland Protocol”.

This puts Almac in a unique position to allow clients unfettered and flexible access to support their biomarker clinical trial and CDx development in both the UK & European markets.

New MHRA Guidance for Medical Devices

MHRA guidance for regulating medical devices in the UK from the end of the Brexit transition period (31 December 2020) was published on 1 September 2020. Under the terms of the Northern Ireland Protocol, from 1 January 2021, the rules for placing medical devices on the Northern Ireland market will differ from those applicable to Great Britain and will remain aligned with those of the EU.

The Northern Ireland Protocol will offer Northern Ireland-based companies, like Almac, the opportunity to effectively act as if they are still within the EU with respect to compliance with EU In Vitro Diagnostic (IVD) Regulations and EU Clinical Trial Regulations while still being able to easily access the UK market. In other words, the best of both worlds.

As a result, for in vitro diagnostics being used in NI for clinical trials, Almac will remain in compliance with current EU directives and the incoming IVD regulations (full compliance with which must be achieved by 26 May 2022, in line with the EU’s implementation timeline).




The Almac Advantage & Northern Ireland

Almac Group has been working under the current EU directives for many years and Almac Diagnostic Services has been planning for the IVDR for several years. We will be fully compliant with the EU IVDR by the May 2022 deadline. Customers who currently use our services can expect continuity in levels of service and a hassle-free regulatory transition of their assays to the new EU regulation.

From 1 July 2023, a new UKCA mark will replace the CE mark for IVDs utilised within the UK and will be required to be displayed on all devices. Manufacturers of IVDs located in Northern Ireland, such as Almac, will still be able to register all devices with the MHRA.

As per the current guidance from the MHRA, a Northern Ireland-based manufacturer that has registered an IVD with the MHRA can then freely supply the device between Northern Ireland and Great Britain with no further registration required.

This is a huge benefit for Almac’s customers allowing Almac Diagnostic Services to act effectively as a ‘one stop shop’ for UK and EU clinical trial support activities such as clinical testing and in vitro diagnostic (IVD) development from our global headquarters based in Craigavon, Northern Ireland.



About the Author

Stewart McWilliams leads the Quality Management and In vitro Diagnostic (IVD) Regulatory affairs activities at Almac Diagnostic Services. The team works with pharmaceutical industry clients on the Quality and Regulatory aspects of CDx Development and Commercialisation. They are also responsible for Almac Diagnostic Services’ Laboratory Quality Management systems ensuring compliance with ISO13485, CLIA (Federal and New York State CLEP), ISO17025, ISO15189 and the College of American Pathologists (CAP) accreditation requirements.


The increasing uptake of RNA-Seq as the technology of choice in biomarker discovery


Dr Nuala McCabe

Biomarker Research Manager, Almac Diagnostic Services



RNA biomarkers – a largely untapped, but beneficial resource

Next generation sequencing (NGS) has evolved as a valuable tool for biomarker discovery and development. This new era of biomarker research has largely focused on DNA-based rather than RNA-based biomarkers, as exemplified by the number of publications in the NGS field, most of which are DNA related (Figure 1A). To date, RNA has been a relatively untapped resource for biomarker discovery. However, within the past 8-10 years, there has been a significant rise in the number of NGS publications featuring RNA-Sequencing (RNA-Seq) (Figure 1B). This change has been driven by the realisation that researchers cannot rely solely on DNA aberrations to understand tumour biology but also need other omics data, such as RNA profiling, to capture a more comprehensive view.

This view was recently highlighted in a New York Times article entitled “The Search for Cancer Treatment Beyond Mutant-Hunting” by Siddhartha Mukherjee (author of the book “The Emperor of All Maladies: A Biography of Cancer”), who suggested “mutations within a cancer cell certainly carry information about its physiology – its propensity for growth, its vulnerabilities, its potential to cause lethal disease – but there’s a world of information beyond mutations.” He continues: “What if the ‘really clinically useful information’ lies within these domains – in the networks of normal genes co-opted by cancer cells, in the mechanisms by which they engage with their host’s immune system or in the metabolic inputs that a cell needs to integrate in order to grow? At the annual meeting of the American Society of Clinical Oncology (ASCO) this year [2018], it was this altered – and more expansive – vision of precision cancer medicine that was on display.” It should be noted that several commercial RNA-based biomarkers have already been developed for prediction of outcomes in cancer, including Agendia’s MammaPrint®, GenomeDx’s Decipher® and Genomic Health’s Oncotype DX® assays, although these are microarray- and qPCR-based rather than NGS-based.




The evolution of RNA-Seq

Traditionally, cDNA microarray technology has been the primary method of choice for gene expression profiling. However, with the continued advancement of NGS technology, RNA-Seq is rapidly emerging as the principal discovery tool. Unlike microarrays, which are limited by design to the detection of known transcripts, RNA-Seq has the potential to detect novel transcripts and structural variants such as alternative splicing events and gene fusions, and can also identify allele-specific expression and disease-associated single nucleotide polymorphisms (SNPs). Furthermore, RNA-Seq avoids the hybridization-based issues associated with microarrays and demonstrates a broader dynamic range (>8,000 fold) with low background signals. For differential gene expression analysis, RNA-Seq demonstrates superior accuracy in the quantification of highly expressed genes and superior sensitivity for lowly expressed genes when compared to microarrays. Multiple platforms are currently available for RNA-Seq, including the Illumina HiSeq and NovaSeq platforms and the ThermoFisher Ion Torrent Personal Genome Machine.
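To give a feel for what differential expression analysis on count data involves, here is a minimal sketch that normalises raw read counts to counts per million (CPM) and computes a log2 fold change. The gene names, counts and pseudocount are hypothetical, and established tools apply considerably more sophisticated normalisation and statistical testing than this.

```python
import math

def cpm(counts):
    """Scale raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: c / total * 1_000_000 for gene, c in counts.items()}

def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of CPM values; the pseudocount avoids division by zero
    for genes with no reads (its value here is an arbitrary choice)."""
    case_cpm, control_cpm = cpm(case), cpm(control)
    return {
        gene: math.log2(
            (case_cpm[gene] + pseudocount) / (control_cpm[gene] + pseudocount)
        )
        for gene in case
    }

# Hypothetical raw counts for three made-up genes in two samples.
tumour = {"GENE_A": 300, "GENE_B": 100, "GENE_C": 600}
normal = {"GENE_A": 100, "GENE_B": 100, "GENE_C": 800}

lfc = log2_fold_change(tumour, normal)
# GENE_A is roughly three-fold higher in the tumour sample, GENE_B is
# unchanged and GENE_C is lower.
```

On the same logarithmic scale, the >8,000-fold dynamic range quoted above corresponds to roughly 13 doublings, since log2(8,000) is just under 13.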

The challenges of RNA-Seq

Formalin-fixed paraffin-embedded (FFPE) samples are a valuable resource for biomarker studies since they are routinely collected in clinical practice. However, formalin fixation presents certain challenges from both a technical and a bioinformatics perspective. This is mainly due to chemical modification, crosslinking, and general degradation of the RNA, particularly in samples over 6 months old. However, with the advancement of sequencing chemistries and processes for RNA-Seq analysis, optimized workflows (e.g. Illumina TruSeq® RNA Exome) are now available with robust quality control steps both in the laboratory and throughout data analysis, incorporating advanced algorithms which can account for FFPE-specific effects. The scale and complexity of the data generated with RNA-Seq also pose significant challenges for biomarker discovery. Interpretation of RNA-Seq data requires sophisticated and powerful computational programs, and novel tools that require performance validation are emerging constantly. A wealth of mature tools exists to meet basic requirements (e.g. applications hosted on Illumina BaseSpace Sequence Hub). Some challenges remain, however, in areas where there is scope for more advanced algorithms, such as differential gene expression analysis and the detection of fusion genes, alternative splicing and variants.

The opportunities of RNA-Seq

Despite the various challenges of using RNA-Seq for biomarker discovery and development, there are numerous opportunities. Gene expression-based biomarkers capture several molecular pathways and are more dynamic than DNA-based biomarkers. They may therefore better reflect the underlying biology of disease, over time and with treatment.

“The obsession with DNA mutations alone is short sighted. Utilisation of RNA signatures is key to recognizing stratified medicine.” - Professor Tim Maughan, CRUK/MRC

RNA-Seq allows for the discovery and validation of a large number of gene expression signatures simultaneously from a single tumour sample (from as little as 20-50 ng RNA input), which may enable better treatment stratification by reflecting multiple biologies. In addition, the ability to understand parallel biological pathways guides potential combination drug therapies and aligns with the recent move in clinical trials towards basket and umbrella approaches. The release of public databases of expression data from collaborative programmes such as The Cancer Genome Atlas (TCGA) has facilitated the identification and in silico validation of many potential gene expression biomarkers for clinical trial use.



The RNA technology market landscape

In my opinion, four main categories of commercial RNA expression profiling exist. The first comprises the service providers for larger whole transcriptome panels (Almac, Q2 Solutions and PGDx, for example), which provide comprehensive sequencing analysis. However, this is a standard service offering, where clients may require additional bioinformatics expertise for data analysis and interpretation. In addition to these whole transcriptome offerings, there are many curated gene panels (some with and some without a specific report) available for biomarker development, including the HTG Molecular Panel and NanoString Panels. These panels have the advantage of a shorter turnaround time, are more focussed on specific biologies and produce less data for analysis and interpretation. For example, Cofactor Genomics provide an immuno-oncology (IO) only offering with a patient report for instant interpretation. However, these targeted approaches have limitations as they do not provide a comprehensive picture of the global tumour biology. The final category is whole transcriptome analysis with a customised, biologically interpreted report, such as the Almac claraT proprietary solution (see below). This approach allows the simultaneous application of multiple biomarker assays that represent the hallmarks of cancer, providing rapid generation of standardised data and saving considerable bioinformatics resource.





claraT – Almac’s unique gene expression report

To solve the complex RNA-Seq data challenge, Almac has developed claraT*, a unique software-driven solution that classifies biologically relevant gene expression signatures into a comprehensive, easy-to-interpret report. claraT helps simplify RNA-Seq data for biomarker discovery and translational research. It is a pan-cancer solution, based on a powerful proprietary bioinformatics pipeline, that automatically generates the claraT report from raw gene expression data utilizing the Almac optimized RNA Exome Panel. The claraT report covers key biologies within the Hallmarks of Cancer, and to date includes 35 gene expression signatures, 30 drug targets and 1,641 single genes relevant to cancer biology.

claraT – Whitepaper:

“Evaluation of the claraT Total mRNA Report in an RNA-Sequencing dataset from malignant melanoma cancer patients.”

In this whitepaper the application of claraT to an RNA-Sequencing dataset of malignant melanoma cancer patients treated with immune checkpoint therapy is described.







In Conclusion

To conclude, RNA Sequencing (RNA-Seq) is an increasingly popular technology for biomarker discovery in cancer research. High-throughput RNA-Seq produces large quantities of data that can reflect multiple biologies but requires complex computational bioinformatics pipelines to enable interpretation. Novel software approaches such as the Almac claraT solution can bring a standardised, rapid approach to analysis of this data for the purposes of biomarker discovery and application.


*claraT is for research use only (RUO) and is not to be used for diagnostic or prognostic purposes, including predicting responsiveness to a particular therapy.
The information contained in this article is accurate, to the best of the author’s knowledge, as of 1 February 2019.
claraT TM is a trade mark of Almac. Other trade marks referenced are those of third parties.  


NGS (Next Generation Sequencing) panels and patient tumour testing for stratified medicine

Dr Laura Knight

Head of Bioinformatics and Biostatistics, Almac Diagnostic Services


Professor Richard Kennedy

Global VP Biomarker Development, Almac Diagnostic Services



Brief overview of NGS panels

The past year has seen FDA approval of two large DNA-based Next Generation Sequencing (NGS) panels for single laboratory multi-gene mutation testing in tumours (the Foundation One and Memorial Sloan Kettering MSK-IMPACT panels). Technology providers such as Illumina, Thermo Fisher and Qiagen are also actively developing kits for the delivery of NGS panels from multiple laboratories. In this article we discuss the reasons behind the move to NGS panels, the opportunities these provide and some of the challenges they present.

Why NGS panels?

In the case of solid tumours, clinicians often have limited amounts of material to provide for molecular biomarker testing. Lung cancer, for example, may be diagnosed with a trans-bronchial biopsy which provides as little as a 5mm diameter sample of mixed tumour and normal cells and often yields less than 100ng of DNA and RNA.

Traditional approaches to DNA or RNA analysis such as PCR-based methodologies can use substantial amounts of material in order to give a single assay result. This may leave the patient and their clinician in the position of receiving a negative predictive biomarker result and no additional tissue to test. This is particularly an issue where an actionable mutation is relatively rare such as the ROS1 gene rearrangement which predicts response to crizotinib in non-small cell lung cancer (approximately 2% of cancers).

In addition, clinical trials are increasingly following “umbrella” designs where multiple drugs are offered depending on the results of more than one assay. It is therefore often impossible to run assays that use large amounts of material in these studies. It is also logistically difficult to send samples to more than one assay vendor, especially as there may be interdependencies on the results (i.e. a second assay is only considered if the first assay is negative).

NGS panels, targeted panels and exome-wide RNA panels

NGS panels offer a solution to these issues. Modern targeted DNA panels can provide cost-effective gene aberration data in several hundred genes simultaneously with as little as 20ng of DNA input. Whole exome sequencing (WES) panels can extend this coverage to over 20,000 genes but are less practical for regular clinical use due to cost and throughput limitations with current diagnostic platforms. Furthermore, non-coding DNA regions may be important in predicting response to certain drugs and these would not be covered by WES.

Targeted panels restricted to genes and mutation “hotspots” of interest, such as the FDA-approved Oncomine Dx panel from ThermoFisher or the TruSight Tumor 170 platform from Illumina, can be more cost-effective to analytically validate and run in a diagnostic lab. Targeted platforms can also be designed to sequence non-coding regions that are mutated in disease.

Exome-wide RNA panels are also now available and can be used to deliver what could be considered an infinite number of gene expression-based assays, provided they are based on measuring coding transcripts. Current platforms can use as little as 20ng mRNA, although formalin-fixed paraffin-embedded (FFPE) samples may require larger inputs due to mRNA degradation.


Potential issues

The move to panel-based NGS strategies has presented some challenges, particularly in the regulatory space. A major concern is the level of validation required to allow clinical decisions to be made safely. In the case of clinical trials these panels fall under CLIA/CLEP legislation in the US and CE marking in the EU.

Typically, the lab demonstrates that the panel can properly detect each aberration class (deletions, insertions, single nucleotide variations and copy number variations) using a representative number of examples of each, rather than validating every gene (which would be extremely challenging). Labs need to be able to demonstrate adequate analytical validation including precision (repeatability and reproducibility), sensitivity (minimum allele frequency detected) and accuracy (through comparison of results to an alternative orthogonal technology such as droplet digital PCR or Sanger sequencing). Due to the large amounts of data measured, and the varying approaches to data filtering, there will always be some analytical error in these studies, especially in measurement of mutations at low (1% or less) allele frequency.
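One reason low allele frequency calls are error-prone is simple sampling noise, which can be sketched with a binomial model. The model, read depths and the minimum number of supporting reads below are illustrative assumptions only; the sketch ignores sequencing error, FFPE artefacts and any real variant caller’s filtering rules.

```python
from math import comb

def detection_probability(depth, allele_fraction, min_alt_reads):
    """Probability of observing at least min_alt_reads variant-supporting
    reads at a site, assuming each read independently carries the variant
    with probability allele_fraction (binomial sampling)."""
    p_too_few = sum(
        comb(depth, k)
        * allele_fraction**k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_too_few

# Hypothetical scenario: a variant at 1% allele frequency, with a caller
# that requires at least 5 supporting reads.
p_shallow = detection_probability(depth=500, allele_fraction=0.01, min_alt_reads=5)
p_deep = detection_probability(depth=2000, allele_fraction=0.01, min_alt_reads=5)
```

Under these assumptions the variant is missed in a large fraction of runs at the shallower depth but is detected almost every time at the deeper one, which is why sensitivity claims need to be tied to a stated depth and allele frequency.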

In addition, Formalin Fixed Paraffin Embedded (FFPE) tissue can represent a challenge due to the introduction of formalin-related artefacts. In our experience, as many as 20-30% of the gene aberrations identified in unfiltered data from low quality material are FFPE artefacts. Of concern, some of these could mistakenly be considered pathogenic and in the worst case lead to the wrong treatment being recommended. It is therefore important that a fresh tissue/FFPE comparative dataset is run on new technologies to ensure the bioinformatic pipeline has appropriate filtering for these artefacts.

In the case of somatic variant calling, the gold standard is to run a matched normal reference (typically white cells from blood) alongside the tumour sample, such that germline variants and common polymorphisms can be filtered out. However, this increases the cost of NGS profiling as it requires two samples to be run per patient. Somatic variant calling from a single sample is also possible, but it requires using databases (such as dbSNP, SIFT, PolyPhen, COSMIC etc.) to filter out synonymous variants. Often the use of databases is not enough and can leave behind some variants that would have been filtered out with the use of the normal reference. Novel bioinformatics algorithms have also been proposed to distinguish somatic from germline alterations, for example the SGZ (somatic-germline-zygosity) method that is employed in the FoundationOne assay from Foundation Medicine.
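The difference between the two filtering strategies can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: variants are reduced to plain tuples, the coordinates are made up, and the function name `candidate_somatic` and the `population_db` list are hypothetical stand-ins rather than part of any real pipeline or database.

```python
# Toy variant records: (chromosome, position, reference allele, alternate allele).
# All coordinates below are made up for illustration.

def candidate_somatic(tumour_calls, matched_normal_calls=None, population_db=()):
    """Return tumour calls absent from the database and, if supplied, the matched normal."""
    excluded = set(population_db)
    if matched_normal_calls is not None:
        excluded |= set(matched_normal_calls)
    return [variant for variant in tumour_calls if variant not in excluded]


somatic = ("chr1", 100, "A", "T")
common_polymorphism = ("chr2", 200, "G", "C")   # germline, listed in the database
private_germline = ("chr3", 300, "C", "T")      # germline, not in any database

tumour_calls = [somatic, common_polymorphism, private_germline]
normal_calls = [common_polymorphism, private_germline]
population_db = [common_polymorphism]

with_normal = candidate_somatic(tumour_calls, normal_calls, population_db)
tumour_only = candidate_somatic(tumour_calls, population_db=population_db)

print(with_normal)   # only the somatic variant remains
print(tumour_only)   # the private germline variant is retained as well
```

In the tumour-only run the rare germline variant survives filtering because no database lists it, which is the kind of residual error the matched normal is intended to remove.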

DNA panels

A potential issue for targeted DNA panels as opposed to WES is that they may have inadequate chromosomal coverage to make an accurate measurement of copy number variations (CNV) and ploidy. This can be particularly challenging where a tumour contains significant amounts of non-cancer material such as white cells or stroma.

In our opinion, WES, with a reference normal DNA sample from white blood cells, still represents the best methodology for CNV measurement. Similarly, WES is the platform of choice for the measurement of tumour mutation burden (TMB), the number of mutations per megabase, which may have utility in selecting patients for immune checkpoint targeted therapies. NGS panels targeted to a few hundred genes or less may not have adequate DNA coverage to provide an accurate measurement of TMB using standard counting approaches.
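To make the coverage point concrete, the standard counting approach can be written as a short Python sketch. The territory sizes used here (30 Mb for an exome, 1 Mb for a targeted panel) are round-number assumptions chosen for illustration, and `tumour_mutation_burden` is a hypothetical helper name.

```python
def tumour_mutation_burden(somatic_mutation_count, covered_bases):
    """Mutations per megabase of sequenced territory (simple counting approach)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return somatic_mutation_count / (covered_bases / 1_000_000)


EXOME_BASES = 30_000_000   # assumed exome territory, for illustration only
PANEL_BASES = 1_000_000    # assumed targeted panel territory, for illustration only

exome_tmb = tumour_mutation_burden(300, EXOME_BASES)
panel_tmb = tumour_mutation_burden(10, PANEL_BASES)

# How far a single extra (or missed) mutation call moves the estimate.
exome_step = tumour_mutation_burden(1, EXOME_BASES)
panel_step = tumour_mutation_burden(1, PANEL_BASES)

print(exome_tmb, panel_tmb)
print(round(exome_step, 3), round(panel_step, 3))
```

Both samples have the same estimated burden, but on the smaller panel one miscalled mutation shifts the estimate by a whole mutation per megabase, roughly thirty times the shift on the exome under these assumed sizes.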

RNA panels

In the case of RNA based panels, mRNA can also be adversely affected by FFPE sample degradation, particularly in samples over 6 months old. These FFPE specific effects must be accounted for in the quality control steps and the measurement of unstable mRNA transcripts should be excluded from biomarker signatures. It is also important to have pre-defined control samples to run with each batch of patient samples to ensure the process is running within specification, as minor deviations from protocols can have large effects on mRNA-based assay results.

NGS panel CDx development

In the case of NGS panel companion diagnostics, where specific gene mutations or gene expression signatures are linked to the successful registration of a drug, the regulatory authorities expect a higher level of validation. If a pre-existing FDA-approved assay exists the diagnostic lab will need to show that the panel performs as well as the previous assay.

In the case of new assays, the lab will need to analytically and clinically validate the panel for the specific genes of interest, using clinically relevant material and suitable GMP level reagents. Other genes on the panel that have not been previously validated may also be reported to clinicians under CLIA/CLEP guidance, but these will need to be considered informative rather than being used specifically for drug selection.

In conclusion

In the case of solid tumours where material may be limited, NGS panels clearly have an advantage over older, single biomarker based technologies and are likely to dominate in the next few years.

It is important to point out, however, that for more readily accessible material such as blood or saliva, material quantity is less of an issue and older molecular technologies such as PCR may still be preferable due to reduced cost and potentially increased sensitivity.

In the long term, we believe it is likely that advances in technology will allow whole exome and eventually whole genome sequencing to be used routinely in clinical practice. This will ensure the maximum achievable amount of information can be gained from small clinical samples.

Almac’s Clinical Trial Solutions

Almac Diagnostic Services has created novel panel solutions for multi-arm clinical trials – enabling clients to evaluate multiple biomarkers from one sample, whilst saving precious tissue, time and cost. For further information on our panels click on the options below:



DNA Panel
RNA Panel

Please contact Almac Diagnostic Services if you would like to discuss any of the points raised.

An introductory guide to biomarkers, precision, personalised and stratified medicine

Professor Richard Kennedy

Global VP of Biomarker Development, Almac Diagnostic Services

The application of biomarkers to improve patient outcomes is now common in clinical trials and is rapidly being adopted into clinical practice. Although most of what is being described in articles and at conferences is fairly straightforward, clinicians and scientists as well as the regulatory bodies in the US and EU tend to use what can be seen as fairly obscure terminology. In this article I have attempted to provide some clarity to this evolving field, particularly for those without a medical or scientific background. Several of the examples I use are in the area of cancer medicine, mainly because this has been at the forefront of biomarker development and also because I am a Medical Oncologist and cancer researcher by training. The concepts, however, are the same for the application of biomarkers to all human diseases. For more detailed explanations of concepts covered here I would recommend the FDA website, which has various useful articles.

First of all we need to tackle the contentious issue of personalised medicine versus precision medicine versus stratified medicine. I have been on scientific advisory panels that have spent hours debating these terms with no resolution, so it is not surprising there is considerable variation as to how they are used. Accepting that some people will disagree, I have found the most useful way to think about these is as follows:

a) Precision medicine: Selection of the best approach to managing a patient based on biological measurements (biomarkers). For example, a person’s genome is sequenced or a protein is measured in their blood and a specific drug is given that is known to work with their unique biology. It can also involve repeated monitoring of disease markers to allow tailoring of a treatment to an individual. For example, a specific cancer-related gene mutation or protein may be detected in blood and disappear when successful treatment has been given.

b) Personalised medicine: Selection of the best approach to managing a specific patient, based on available biological information about the patient as well as the patient’s personal preferences, environmental factors, social factors and other factors that may affect the treatment choice. This can be thought of as an extension of precision medicine to a holistic approach. An example may be the choice of a prophylactic salpingo-oophorectomy (surgical removal of fallopian tubes and ovaries) in an individual who has tested positive for a BRCA1 or BRCA2 mutation, who has a family history of ovarian cancer, is aware of the relative risk of developing ovarian cancer themselves and has decided to have no pregnancies in the future.

c) Stratified medicine: Selection of the best approach to managing a group of patients. It can be considered a step towards precision and personalised medicine and is often used for treatment selection in clinical trials. Stratified medicine accepts that the treatment selected has a net effect of benefiting a group of patients as a whole but may not benefit every individual within the group. An example may be the use of the estrogen receptor to select hormone treatment in breast cancer. The majority (70%) will benefit, but it is accepted that some will not.

Biomarkers. These are required for precision, personalised and stratified medicine. Again there are multiple definitions for these, many of which are quite complex. The official FDA definition is here and the European Medicines Agency (EMA) discusses biomarkers here. Basically a biomarker is something you measure to let you know if a person is healthy, not healthy, at risk of becoming unhealthy, or, if unhealthy, is responding to a treatment. Biomarkers can be anything that can be measured in a person and can range from DNA, RNA, protein and metabolites measured from samples such as blood, tumour material, urine or saliva to imaging such as digital pathology and radiology with special contrast agents. Other examples are blood pressure as a biomarker for risk of heart disease and blood sugar and HbA1c as biomarkers for risk of diabetes-related health problems.

Biomarkers can be qualitative or quantitative. Qualitative biomarkers are either present or not. For example, a KRAS mutation is either detected in a cancer or it is not. Quantitative biomarkers measure something using a continuous scale. For example, blood sugar is measured as a numerical concentration in blood. Tumour mRNA is often measured as a number relative to a control mRNA for which the level is known. Quantitative biomarkers will usually have associated values which represent “cut-offs” above (or sometimes below) which a result is considered abnormal (biomarker positive).
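For readers who find a worked example helpful, the idea of a relative measurement and a cut-off can be sketched in Python. The signal values and the cut-off of 1.0 are invented for illustration and do not correspond to any real assay; the function names are likewise hypothetical.

```python
def relative_expression(target_signal, control_signal):
    """Express a target mRNA measurement relative to a control mRNA."""
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    return target_signal / control_signal


def is_biomarker_positive(value, cutoff, positive_when="above"):
    """Apply a cut-off; some biomarkers are abnormal above it, others below it."""
    if positive_when == "above":
        return value > cutoff
    if positive_when == "below":
        return value < cutoff
    raise ValueError("positive_when must be 'above' or 'below'")


ILLUSTRATIVE_CUTOFF = 1.0  # invented value, not taken from any real assay

score = relative_expression(target_signal=150.0, control_signal=100.0)
print(score)                                                # 1.5
print(is_biomarker_positive(score, ILLUSTRATIVE_CUTOFF))    # True
print(is_biomarker_positive(score, ILLUSTRATIVE_CUTOFF, positive_when="below"))  # False
```

The continuous score is what makes the biomarker quantitative; the cut-off is what turns it into a positive or negative call for clinical use.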

A working classification of biomarkers is given in the following table. Each would require an article to describe them fully, but it is worth discussing prognostic and predictive biomarkers in more detail as these often cause confusion.

A prognostic biomarker is used to estimate the outcome for a patient in the absence of a treatment. For example, if a patient has surgery for breast cancer how likely are they to be cured without further (adjuvant) chemotherapy? The Oncotype Dx and MammaPrint assays are examples of prognostic biomarkers used to help answer this question.

A predictive biomarker is used to estimate the benefit from a specific treatment. When a predictive biomarker is registered with a regulatory body (such as the FDA) along with an associated drug to select appropriate patients it is referred to as a companion diagnostic and is on-label for the drug. Using the breast cancer example again, HER2 overexpression (predictive biomarker positive) indicates potential benefit from trastuzumab (Herceptin) treatment. HER2 overexpression is a companion diagnostic on-label for trastuzumab and the FDA requires that it is tested prior to prescribing this drug. The FDA discusses these definitions and the use of companion diagnostics further here.

A common point of confusion is where a specific intervention such as chemotherapy is shown to have an overall benefit in a high-risk population as identified by a prognostic biomarker. This does not necessarily mean the biomarker is “predictive” for the treatment. For example, the breast cancer prognostic biomarkers mentioned earlier will identify high-risk patients who should be offered adjuvant chemotherapy (the low-risk patients are unlikely to develop recurrent disease and therefore will not need chemotherapy). The actual benefit of chemotherapy for an individual high-risk patient, however, is unknown, as the prognostic biomarker does not measure the biology that determines sensitivity or resistance to chemotherapy. Rather it is designed to predict the aggressiveness of a tumour and its likelihood of metastatic spread. This means that the overall response rate is probably less than 50% for standard adjuvant chemotherapy in the high-risk population. Ideally a predictive biomarker that measures mechanisms of sensitivity or resistance would be used along with the prognostic biomarker to select the correct chemotherapy for the patient’s specific tumour.

Biomarker type | Question | Examples
Susceptibility | Is the patient at risk of disease? | BRCA1 mutation and risk of breast cancer. CAG repeats and Huntington’s disease.
Diagnostic biomarker | Is disease present? | D-dimer measurement and deep venous thrombosis. PSA and presence of prostate cancer.
Prognostic | Is treatment needed? | HbA1c levels and complications of diabetes. MammaPrint assay and risk of breast cancer recurrence.
Predictive | Which treatment? | KRAS mutation and cetuximab resistance in colon cancer. ER and response to tamoxifen in breast cancer.
Pharmacogenomic | Is treatment safe? | The CYP2C9*3 single nucleotide polymorphism in germline DNA reduces warfarin metabolism by 90%, increasing the risk of overtreatment and bleeding.
Pharmacodynamic/response | Is the treatment as prescribed working at a biological level? | Loss of Ki67 as marker of decreased proliferation in cancer. Blood pressure as a response to anti-hypertensives.
Monitoring biomarker | Is there a change in the disease with time? | Hepatitis C virus ribonucleic acid (HCV-RNA) for assessing patients with chronic hepatitis C.
Physiological | Is the patient fit for treatment? | Renal function tests prior to platinum-based chemotherapy.

Biomarker Validation. This again is an area of much confusion with several different terms used. There are, however, two main aspects to proving a biomarker is fit for purpose that can be summarised as:

i) Analytical validation. This is the proof that the biomarker is technically robust (i.e. it reliably measures what it is supposed to). This is what the Clinical Laboratory Improvement Amendments (CLIA) legislation is primarily focussed on in the US. Indeed, in order to be able to use a biomarker for the selection of patient treatment in many US states, including in clinical trials, the assay must be compliant with CLIA biomarker validation requirements. This typically includes:

  1. Accuracy: does the biomarker measure what it is supposed to? Usually it needs to be compared to another measurement technique to demonstrate comparable data.
  2. Precision: does the biomarker give the same result for the same sample every time it is run or is the technology/process “noisy” with a lot of variation?
  3. Analytical sensitivity: what is the minimum amount of biological material (e.g. DNA from blood) that is required to give a reliable result?
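As a rough illustration of how these three measures might be summarised, the Python sketch below uses common but not universal choices: percent agreement with a reference method for accuracy, coefficient of variation across replicates for precision, and the lowest input amount meeting a detection rate for analytical sensitivity. The function names, the 95% detection threshold and all data are assumptions made for the example and are not CLIA requirements.

```python
import statistics


def percent_agreement(assay_calls, reference_calls):
    """Accuracy: share of samples where the assay matches the reference method."""
    if not assay_calls or len(assay_calls) != len(reference_calls):
        raise ValueError("need two non-empty call lists of equal length")
    matches = sum(a == r for a, r in zip(assay_calls, reference_calls))
    return 100.0 * matches / len(assay_calls)


def coefficient_of_variation(replicates):
    """Precision: spread of repeat measurements as a percentage of their mean."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def minimum_reliable_input(detections_by_input, required_rate=0.95):
    """Analytical sensitivity: lowest input amount meeting the detection rate."""
    passing = [
        amount
        for amount, hits in detections_by_input.items()
        if sum(hits) / len(hits) >= required_rate
    ]
    return min(passing) if passing else None


# Invented example data.
agreement = percent_agreement(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"])
cv = coefficient_of_variation([9.8, 10.1, 10.0, 10.1])
lowest_input_ng = minimum_reliable_input(
    {5: [True, False, True], 10: [True, True, True], 20: [True, True, True]}
)

print(agreement, round(cv, 2), lowest_input_ng)
```

A real validation plan would pre-specify acceptance criteria for each measure before any samples are run.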

ii) Clinical validation. This is proof that the biomarker can be used for the clinical purpose for which it has been designed. The FDA and EMA require adequate clinical validation before a biomarker can be routinely used in the clinic. The process usually involves applying the biomarker to a patient population (the validation dataset) that is entirely separate from the one used for discovery and development (the training dataset). The true ability of the biomarker to guide precision medicine can be estimated in terms of sensitivity (the ability to identify those patients who have an adverse outcome, as distinct from the “analytical sensitivity” mentioned above) and specificity (the ability to identify those who do not have the adverse outcome). Sometimes the validation is given in terms of a Hazard Ratio (HR), which compares the risk over time of an adverse event such as cancer recurrence, stroke or death between biomarker-defined groups. A p-value indicates how likely a result at least this strong would be if the biomarker had no real association with outcome, with a value of less than 0.05 conventionally taken as statistically significant.
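Clinical sensitivity and specificity, as defined above, reduce to simple counts once each patient in the validation dataset has a biomarker call and a known outcome. The Python sketch below shows the arithmetic on ten invented patients; the function name and the data are illustrative only.

```python
def sensitivity_specificity(biomarker_positive, adverse_outcome):
    """Return (sensitivity, specificity) from paired per-patient booleans."""
    pairs = list(zip(biomarker_positive, adverse_outcome))
    true_pos = sum(b and o for b, o in pairs)
    false_neg = sum((not b) and o for b, o in pairs)
    true_neg = sum((not b) and (not o) for b, o in pairs)
    false_pos = sum(b and (not o) for b, o in pairs)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Ten invented validation patients: biomarker call and observed outcome.
calls = [True, True, True, False, False, False, False, True, False, False]
outcomes = [True, True, False, False, False, False, True, True, False, False]

sens, spec = sensitivity_specificity(calls, outcomes)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Here three of the four patients with an adverse outcome were biomarker positive, and five of the six without one were biomarker negative.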

Historically clinical validation has often been poorly performed with the same patient population being used for the purposes of discovery and clinical validation. This approach rather unsurprisingly demonstrates the biomarker working very well in this defined group of patients and is referred to as “overfitting”. The problem is that when someone then applies the overfitted biomarker to a separate group of patients it can fail due to differences between the populations the investigators were not aware of. For example a centre may collect patient samples using a specific protocol that is not used elsewhere resulting in any locally developed biomarkers only working in that centre. Overfitting is avoided by ensuring the clinical validation patient population is entirely different from that used for biomarker discovery. In addition a third party should be used to apply the biomarker independently from the original investigators using a pre-specified, locked protocol to prevent any unintended experimental bias.

One area of confusion is the term “Biomarker verification”, which is sometimes used interchangeably with “Biomarker validation”. Biomarker verification is best reserved for the analysis of a sample set to ensure that a diagnostic lab can run a commercially available biomarker accurately and to the manufacturer’s specification. For example, a lab may acquire kits from a vendor to measure HER2 amplification. They will need to show that they generate comparable results to other labs when analysing a set of verification samples before offering the assay to patients.


This article has covered some of the main concepts of biomarker application which will hopefully orientate those new to the field and provide a working knowledge. There are of course nuances specific to certain situations that are covered elsewhere in more detail.

Please contact Almac Diagnostic Services if you would like to discuss any of the points raised.
