SARS-CoV-2 Phylogeny


The figures below visualize SARS-CoV-2 strains. Each color represents a different virus strain, the x-axis shows time (year 2020), and the y-axis shows the number of public genome sequences. Hover the cursor over a graph to see the exact numbers at a time point of interest. The data were obtained from GISAID.

GISAID clades:

Pangolin lineages:

As shown above, the clades (lineages/subtypes) defined by different groups often diverge substantially from one another. Although several main clades are similar across nomenclatures, the overall patterns are distinct. To address this inconsistency, we recommend using the new Nextstrain clades as a consensus nomenclature. The table below lists the Nextstrain clades and their corresponding marker variations.

Current SARS-CoV-2 phylogenetic information

Clade | Similar clades | Marker mutations (nucleotide) | Marker mutations (amino acid) | Max frequency
19A | Root, A1a and A3 (old Nextstrain); L, O and V (GISAID) | / | / | 65-47% in Jan
19B | B, B1, B2 and B4 (old Nextstrain); S (GISAID) | C8782T; T28144C | nsp4: S76S; orf8: L84S | 28-33% Globally in Jan
20A | A2 (old Nextstrain); G (GISAID) | C14408T; A23403G | nsp12b: P314L; S: D614G | 41-46% Globally in Apr-May
20B | A2 (old Nextstrain); GR (GISAID) | G28881A; G28882A; G28883C | N: R203K; N: R203K; N: G204R | 19-20% Globally in Mar-Apr
20C | A2 (old Nextstrain); GH (GISAID) | C1059T; G25563T | orf1ab: T265I; orf3a: Q57H | 19-21% Globally in Apr
*The phylogenetic information is obtained from nextstrain.org. The "Similar clades" column shows relatively high-overlapping clades from different nomenclatures used by different groups (Yatish et al. 2020).
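As a rough illustration of how the marker mutations in the table could be used, the Python sketch below assigns a clade to a sample from its list of nucleotide mutations. The function name `assign_clade`, the check order, and the rule that a sample must carry all markers of a clade are assumptions made for this example; they are not the method used by nextstrain.org. The sketch also assumes that 20B and 20C samples carry the 20A markers, so the more specific clades are tested first.

```python
# Marker nucleotide mutations per clade, copied from the table above.
CLADE_MARKERS = {
    "19B": {"C8782T", "T28144C"},
    "20A": {"C14408T", "A23403G"},
    "20B": {"G28881A", "G28882A", "G28883C"},
    "20C": {"C1059T", "G25563T"},
}

# Assumption: 20B and 20C are treated as more specific than 20A,
# so they are checked before it.
CHECK_ORDER = ["20B", "20C", "20A", "19B"]


def assign_clade(mutations):
    """Return the first clade whose markers are all present in `mutations`.

    Samples matching no marker set fall back to 19A, which has no
    marker mutations in the table.
    """
    observed = set(mutations)
    for clade in CHECK_ORDER:
        if CLADE_MARKERS[clade] <= observed:  # all markers present
            return clade
    return "19A"


example = ["C14408T", "A23403G", "G28881A", "G28882A", "G28883C"]
print(assign_clade(example))
```

A sample with an incomplete marker set (for example, only one of the two 19B markers) falls back to 19A under this simplified rule; a real pipeline would need a more careful treatment of missing or ambiguous positions.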

SARS-CoV-2 Mutations and Hotspots


Many studies have investigated mutations using the available sequence data. The number of identified recurrent mutations increases with the number of samples, but at a decreasing rate, suggesting that the number of identifiable recurrent mutations is approaching a limit (Meriem et al. 2020; Lucy et al. 2020; Takahiko et al. 2020).
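The relationship between sample size and the number of recurrent mutations can be sketched as an accumulation curve. The Python example below is a minimal illustration on made-up toy data, not a reproduction of the cited analyses. It assumes, for simplicity, that a mutation counts as "recurrent" when it is seen in at least two samples; the cited studies may use a stricter definition. The names `count_recurrent` and `accumulation_curve` are hypothetical.

```python
from collections import Counter


def count_recurrent(samples, min_samples=2):
    """Count mutations observed in at least `min_samples` samples."""
    counts = Counter(m for sample in samples for m in set(sample))
    return sum(1 for c in counts.values() if c >= min_samples)


def accumulation_curve(samples, sizes):
    """Recurrent-mutation count for the first n samples, for each n in `sizes`."""
    return [(n, count_recurrent(samples[:n])) for n in sizes]


# Toy data (invented for illustration): one list of mutations per sample.
toy_samples = [
    ["A23403G", "C14408T"],
    ["C8782T", "T28144C"],
    ["A23403G", "C14408T", "G25563T"],
    ["C8782T", "T28144C"],
    ["A23403G", "G25563T"],
    ["A23403G", "C14408T", "C1059T"],
]

curve = accumulation_curve(toy_samples, sizes=[2, 4, 6])
for n, recurrent in curve:
    print(n, recurrent)
```

Plotting such a curve for real data, with samples added in random order, is one way to check whether the count is leveling off.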

After the D614G mutation first appeared, the new virus strain spread quickly and became the dominant subtype in many areas of the world. Interestingly, the P314L mutation spread together with D614G; although P314L occurs at high frequency, it is unlikely to occur without D614G. Further studies on recurrent mutations are needed to distinguish functional mutations from by-products.
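One simple way to examine whether two mutations travel together is to tabulate how often they co-occur across samples. The Python sketch below does this for D614G (A23403G) and P314L (C14408T), using the nucleotide names from the table in the phylogeny section. The data are invented toy samples and the function name `cooccurrence` is hypothetical; the example only shows the bookkeeping, not a result.

```python
def cooccurrence(samples, mut_a, mut_b):
    """Count samples carrying both mutations, only one of them, or neither."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for muts in samples:
        has_a, has_b = mut_a in muts, mut_b in muts
        if has_a and has_b:
            counts["both"] += 1
        elif has_a:
            counts["only_a"] += 1
        elif has_b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts


# Toy data (invented for illustration): one set of mutations per sample.
toy_samples = [
    {"A23403G", "C14408T"},
    {"A23403G", "C14408T", "G25563T"},
    {"C8782T", "T28144C"},
    {"A23403G"},
    set(),
]

counts = cooccurrence(toy_samples, "A23403G", "C14408T")

# Fraction of P314L carriers that also carry D614G (guarding against zero).
p314l_carriers = counts["both"] + counts["only_b"]
fraction = counts["both"] / p314l_carriers if p314l_carriers else float("nan")
print(counts, fraction)
```

A fraction close to 1 on real data would be consistent with P314L rarely occurring without D614G, although co-occurrence alone cannot show which mutation is functional.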

*The figures are derived from nextstrain.org.