In silico tools used for the interpretation of novel variants

Our laboratory uses a combination of freely available in silico packages, subscription-only databases, and information on clinical phenotype and family history to assist with the interpretation of novel variants identified by sequencing. The results of these investigations are summarised in our ‘Investigations into Pathogenicity’ form, which also states the overall conclusion. This form accompanies the report for patients in whom a novel variant has been identified.

Below is a list of the tools that we use, with links to their websites where relevant (please see the disclaimer below).

Locus-specific mutation databases

HGMD® Professional is a curated collection of known (published) gene variants responsible for human inherited disease. Data are taken mainly from the original published reports, although some come from ‘Mutation Updates’ and review articles.

The Diagnostic Mutation Database (DMuDB) is a repository of clinical-quality variant data collected from diagnostic genetics laboratories. DMuDB supports the diagnostic process in UK genetic testing laboratories, and access has now been extended to non-UK laboratories through a partnership with EMQN.


Several other locus-specific databases are also used. Information pertaining to these will be stated on the ‘Investigations into Pathogenicity’ form.

Sequence Variant Databases

Sequence variant databases are used by the laboratory to determine the allele frequency of a variant in one or more populations. Findings are listed on the ‘Investigations into Pathogenicity’ form.
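As a minimal illustration of what an allele frequency is, the sketch below divides the number of observed copies of the variant allele by the total number of alleles genotyped at that position. The function name and the example counts are hypothetical and are not taken from any of the databases listed here.

```python
def allele_frequency(variant_allele_count: int, total_allele_count: int) -> float:
    """Return the fraction of genotyped alleles that carry the variant."""
    if total_allele_count <= 0:
        raise ValueError("total_allele_count must be positive")
    if not 0 <= variant_allele_count <= total_allele_count:
        raise ValueError("variant_allele_count must lie between 0 and total_allele_count")
    return variant_allele_count / total_allele_count


# Hypothetical example: 13 variant alleles among 13,000 genotyped alleles
# (6,500 diploid individuals contribute two alleles each at an autosomal site).
example_frequency = allele_frequency(13, 13000)
print(f"Allele frequency: {example_frequency:.4%}")
```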

dbSNP, also known as the Short Genetic Variations database, is a freely available catalogue of short variations in nucleotide sequences, developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI).

The Exome Variant Server (EVS) is a freely available, web-based data server providing access to variants identified in over 6500 European American and African American individuals by the National Heart, Lung, and Blood Institute’s (NHLBI) Exome Sequencing Project (ESP).

The 1000 Genomes Project provides a detailed catalogue of human genetic variants with a minor allele frequency of >1%, identified in >1000 genomes from different ethnic groups.

In silico analysis

Analysis of amino acid substitutions

SIFT (Sorting Intolerant From Tolerant) is an algorithm that predicts whether an amino acid substitution will affect protein function, based on sequence homology and the physical properties of amino acids. A substitution with a SIFT score of less than 0.05 is predicted to be deleterious; a substitution with a score greater than or equal to 0.05 is predicted to be tolerated.
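The cut-off described above can be sketched as a small helper. This is only an illustration of the threshold as stated in this document, not part of SIFT itself; the function name is hypothetical.

```python
SIFT_CUTOFF = 0.05  # threshold quoted in the text above


def classify_sift(score: float) -> str:
    """Map a SIFT score (which lies between 0 and 1) to its predicted effect."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie between 0 and 1, got {score}")
    # Scores below the cut-off are predicted deleterious; all others tolerated.
    return "deleterious" if score < SIFT_CUTOFF else "tolerated"


for example_score in (0.01, 0.05, 0.80):
    print(example_score, classify_sift(example_score))
```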

PolyPhen is a tool that predicts possible impact of an amino acid substitution on the structure and function of a human protein using physical and comparative considerations.  Structural features such as amino acid atomic contacts and solvent accessibility are also assessed and empirically determined cut-offs used to predict if the substitution is ‘probably damaging’, ‘possibly damaging’ or ‘benign’.

Grantham score is a prediction of the effect of substitutions between amino acids based on chemical properties, including polarity and molecular volume, characterised into classes of increasing chemical dissimilarity: conservative (0-50), moderately conservative (51-100), moderately radical (101-150), or radical (≥151).
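The class boundaries listed above translate directly into a lookup. The following is a minimal sketch that assumes the Grantham score for the substitution has already been obtained from a published table; the function name and example scores are hypothetical.

```python
def grantham_class(score: int) -> str:
    """Return the chemical dissimilarity class for a Grantham score."""
    if score < 0:
        raise ValueError("Grantham scores are non-negative")
    if score <= 50:
        return "conservative"
    if score <= 100:
        return "moderately conservative"
    if score <= 150:
        return "moderately radical"
    return "radical"  # 151 and above


for example_score in (5, 64, 125, 215):
    print(example_score, grantham_class(example_score))
```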

Align-GVGD is a freely available, web-based program that combines the biophysical characteristics of amino acids with multiple protein sequence alignments to predict the effect of missense substitutions. The biochemical variation observed at each alignment position is converted to a Grantham Variation score (GV), and the difference between that variation and the properties of the variant amino acid being assessed is converted to a Grantham Deviation score (GD). The predicted effect is classed as C0, C15, C25, C35, C45, C55, or C65, with C65 most likely to interfere with function and C0 least likely.

Analysis of possible splicing mutations

SpliceSiteFinder-like is based on position weight matrices (PWM) computed from a set of human exon/intron junctions for donor and acceptor sites.  The score is the probability of observing the sequence given the motif model (matrix).
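To illustrate how a position weight matrix yields a probability for a sequence, the sketch below multiplies the per-position probability of each observed base. The three-position matrix is a made-up toy example, not the matrix used by SpliceSiteFinder-like, and the function name is hypothetical.

```python
from math import prod

# Hypothetical toy matrix: one dictionary per motif position, giving the
# probability of observing each base at that position (each column sums to 1).
TOY_PWM = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]


def pwm_probability(sequence: str, pwm: list) -> float:
    """Probability of observing `sequence` given the motif model `pwm`."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence length must match the number of matrix positions")
    # Bases other than A, C, G, T raise a KeyError in this simple sketch.
    return prod(column[base] for base, column in zip(sequence.upper(), pwm))


reference_score = pwm_probability("GGT", TOY_PWM)
variant_score = pwm_probability("GAT", TOY_PWM)
print(f"reference: {reference_score:.4f}, variant: {variant_score:.4f}")
```

A variant that replaces a strongly preferred base with a rare one lowers the probability, which is the kind of change such tools report when a substitution weakens a candidate splice site.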

MaxEntScan models short sequence motifs, such as those involved in RNA splicing, in a way that simultaneously accounts for dependencies between non-adjacent as well as adjacent positions. The method is based on the ‘Maximum Entropy Principle’ (Yeo G, Burge CB. J Comput Biol. 2004, 11:377-94).

NNSplice (Berkeley Drosophila Genome Project) is a prediction method based on neural networks.

GeneSplicer is an open-source program that combines several splice site detection techniques.

Human Splicing Finder is based on position weight matrices (PWM) with some position-dependent logic (Desmet et al. 2009, Nucl Acids Res 37:e67).

External Website Disclaimer

Please note: We have no control over the nature, content and availability of external websites. The inclusion of any links does not necessarily imply a recommendation or an endorsement of the views expressed within them. Links from this website are provided as an information service only, and we take no responsibility for the content of these websites.