BA.2.75 lineage with ORF1b:V1706I [~600 seq as of 2022-08-18] #965

Closed
9 tasks
corneliusroemer opened this issue Aug 18, 2022 · 8 comments · Fixed by #971

@corneliusroemer
Contributor

corneliusroemer commented Aug 18, 2022

To make better sense of the multiple independent occurrences of spike mutations of interest within BA.2.75*, such as S:346T and others (see #961), it may help to designate the big branches that are coming off the BA.2.75 polytomy.

GISAID query: NSP14_V182I,NSP3_S403L

We've already got one designated: BA.2.75.1 with S:574 in #907

Another big lineage with international spread is the branch with ORF1b:V1706I (G18583A) that accounts for ~20% of sequences within BA.2.75*

In any case, even if this branch is not designated, at least this issue draws attention to the existence of this branch. There will definitely be multiple child lineages in due course - which will get called BA.2.75.X if this lineage here isn't designated or get their own alias if this one here is designated.

GISAID query that should catch most of these: NSP14_V182I,E_T11A,NSP3_S403L

Proposed lineages on this branch include (maybe I missed some):

covSpectrum query: https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=nextcladePangoLineage%3ABA.2.75*+%26+G18583A+&variantQuery1=nextcladePangoLineage%3ABa.2.75*+%26++ORF1b%3AV1706I&
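
For readers cross-checking the three notations used above (ORF1b:V1706I, NSP14_V182I, G18583A), here is a minimal sketch of the coordinate arithmetic. The start positions are taken from the Wuhan-Hu-1 reference (NC_045512.2); numbering ORF1b from nucleotide 13468 is an assumption that follows the convention used by Nextclade. The helper names are made up for illustration.

```python
# 1-based start positions on Wuhan-Hu-1 (NC_045512.2).
# Assumption: ORF1b is numbered from nucleotide 13468 (Nextclade-style convention).
GENE_START = {
    "ORF1a": 266,
    "ORF1b": 13468,
    "S": 21563,
    "NSP3": 2720,
    "NSP14": 18040,
}


def codon_start(gene: str, aa_pos: int) -> int:
    """Genome position of the first base of codon `aa_pos` in `gene`."""
    return GENE_START[gene] + 3 * (aa_pos - 1)


def to_gene_coords(gene: str, nt_pos: int) -> tuple[int, int]:
    """Return (codon number, position within codon 1..3) for a genome position."""
    offset = nt_pos - GENE_START[gene]
    if offset < 0:
        raise ValueError(f"{nt_pos} lies before the start of {gene}")
    return offset // 3 + 1, offset % 3 + 1


# ORF1b:V1706I and NSP14_V182I both point at G18583A.
print(codon_start("ORF1b", 1706))       # first base of ORF1b codon 1706
print(to_gene_coords("NSP14", 18583))   # same site in NSP14 numbering
print(to_gene_coords("NSP3", codon_start("ORF1a", 1221)))  # ORF1a:S1221L vs NSP3_S403L
```

The same arithmetic maps C3857A to ORF1a codon 1198 and T23019C to the second base of S codon 486.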

@corneliusroemer
Contributor Author

corneliusroemer commented Aug 30, 2022

We are at ~1000 sequences now.

I've been looking at designatable lineages in there with the beneficial mutations: S:346T, S:486S and S:490S

It looks like we have the following branches:

  1. ORF1a:Q1198K, which has a 486S sublineage. Since the overall lineage doesn't seem to be growing, it may not need to be designated, as it could stay small. Not many international sequences of this one. But good to keep in mind that this branch is separate from the rest.
  2. S:346T possibly directly on the polytomy, or maybe after 28657T (~20 sequences, also still pretty small so maybe worth waiting) - mentioned here: BA.2.75.3 sublineage with S:R346T (35seq, 20xIndia/5xSingapore, also Israel, Belgium, US, as of 2022-08-21) #918 (comment)
  3. S:F486S on the polytomy (80 sequences), there are sublineages here, one with ORF7a:A15S, and another one with S:346T

Of the above, the lineage with S:F486S on the polytomy is worthy of immediate designation. The rest should wait until we see whether they grow to be significant.

@silcn @AngieHinrichs any thoughts?

corneliusroemer added a commit that referenced this issue Aug 30, 2022
corneliusroemer added a commit that referenced this issue Aug 30, 2022
Added new lineage BM.1 from #965 with 45 new designations and 8 updated from BA.2.75 and BA.2.75.3
@AngieHinrichs
Member

> S:F486S on the polytomy (80 sequences), there are sublineages here, one with ORF7a:A15S, and another one with S:346T

In the 2022-08-29 UShER tree, the BA.2.75.3 > T23019C (S:F486S) branch has only 3 sequences:

  • India/DL-ILBS-WGS6688/2022|EPI_ISL_14584554|2022-07-22
  • India/OR-ILSGS19184/2022|EPI_ISL_14602050|2022-08-11
  • Israel/SMC-7099997/2022|EPI_ISL_14686449|2022-08-15

That is far fewer than 80 sequences. Can you send some example IDs? I'll try to figure out why they aren't showing up there.

@corneliusroemer
Contributor Author

corneliusroemer commented Aug 30, 2022

@AngieHinrichs It should be all these in here: 615e53c

Usher really struggles with BA.2.75 because it has so many sequences with reversions :/

I look at the splits in Nextclade - very low tech, just coloring etc

I'm curious what you find...

It could be that the order is the other way round in Usher: first 346 then 486 but I'm pretty sure 486 happened first.

The order is off because of reversions.

@AngieHinrichs
Member

Yep, reversions. Of the 51 samples you just added to lineages.csv for BM.1, 22 are found in the 2022-08-29 UShER tree.

  • 3 are the ones I found above
  • 8 are on a BA.2.75 > reversion 26577 branch with a mini-BA.2.75.3 -- I might be able to fix at least some of those by pruning and re-adding now that BA.2.75.3 is more filled out
  • 9 are on the BA.2.75.3 > C3857A (ORF1a:Q1198K) branch but with a reversion on 3857 😬 -- the attraction of S:F486S + S:R346T is stronger than the repulsion of not having 3857. There are 9 sequences with 3857 but not 23019 (F486S), 8 sequences with both 3857 and 23019 (4 of which also have 22599 (R346T)), and 14 sequences with 23019 and 22599 but not 3857. If I remove & add back sequences, it's possible that 23019 and 22599 would be put first and then 3857 after.
  • A few others are placed elsewhere in BA.2.75.3, with sequences that they have more mutations in common with, for example India/KA-RFNB-15726/2022 shares G27870T, C29545T, A1876G, T8800C, and G29081T with Australia/NSW-ICPMR-31775/2022 (I haven't checked whether any of those were imputed from ambiguous or N bases, just going by usher placement) and then has several private mutations including T23019C (F486S).
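
The kind of tally quoted above (how many sequences carry 3857, 23019 and 22599 in each combination) can be sketched as follows. This is only an illustration: the sample names and mutation sets are invented toy data, not the sequences from the tree, and `tally_site_patterns` is a hypothetical helper.

```python
from collections import Counter

# Sites discussed above: C3857A (ORF1a:Q1198K), T23019C (S:F486S), 22599 (S:R346T).
SITES = (3857, 23019, 22599)


def tally_site_patterns(samples: dict[str, set[int]], sites=SITES) -> Counter:
    """Count samples by which of the given sites they are mutated at."""
    counts: Counter = Counter()
    for mutated_positions in samples.values():
        pattern = frozenset(p for p in sites if p in mutated_positions)
        counts[pattern] += 1
    return counts


# Toy input (hypothetical): sample name -> genome positions with a substitution.
toy_samples = {
    "sample_1": {3857, 18583},
    "sample_2": {3857, 23019},
    "sample_3": {3857, 23019, 22599},
    "sample_4": {23019, 22599},
    "sample_5": {23019, 22599, 27870},
}

counts = tally_site_patterns(toy_samples)
for pattern, n in counts.items():
    print(sorted(pattern), n)
```

Positions outside `SITES` (such as 18583 or 27870 in the toy data) are ignored, so samples are grouped only by the sites in question.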

@corneliusroemer
Contributor Author

Thanks for the investigation @AngieHinrichs!

Seems like BA.2.75 is pretty much a nightmare scenario for Usher.

Maybe there should be a separate Usher build with much stricter reversion requirements? Anything that has even a single reversion gets thrown out? It would have fewer sequences but be cleaner!
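
A rough sketch of what such a strict filter could look like, assuming per-site base calls are already available for each sample. Nothing here is part of UShER; the function names, the input format, and the example set of expected mutations are assumptions for illustration only.

```python
def find_reversions(calls: dict[int, str], expected: dict[int, tuple[str, str]]) -> list[str]:
    """List expected sites where the sample shows the reference base.

    `calls` maps genome position -> called base.
    `expected` maps genome position -> (reference base, derived base) for the
    mutations the sample's branch is supposed to carry.
    """
    reversions = []
    for pos, (ref, alt) in sorted(expected.items()):
        base = calls.get(pos, "N").upper()
        # N and ambiguity codes are treated as missing data, not as reversions.
        if base == ref:
            reversions.append(f"{alt}{pos}{ref}")
    return reversions


def strict_filter(samples: dict[str, dict[int, str]], expected: dict[int, tuple[str, str]]) -> list[str]:
    """Keep only samples without a single reversion at the expected sites."""
    return [name for name, calls in samples.items() if not find_reversions(calls, expected)]


# Hypothetical expectation: a sample on the G18583A branch that also carries T23019C.
expected = {18583: ("G", "A"), 23019: ("T", "C")}
toy = {
    "clean": {18583: "A", 23019: "C"},
    "reverted": {18583: "G", 23019: "C"},
    "dropout": {18583: "N", 23019: "C"},
}
print(find_reversions(toy["reverted"], expected))
print(strict_filter(toy, expected))
```

Whether a missing call (N) should also disqualify a sequence is a design choice; this sketch keeps such samples.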

Copy link

Hi @corneliusroemer. I'm very busy in real life at the moment so I can't spend much time looking at sequences. When the numbers were still small I was manually removing the reversions from the fastas in order to see the correct placements on the Usher tree, but now I would need to automate that and I don't have the time to write a script.

Don't think we can know which of 346T and 486S came first in the lineage with both (plus 490S). Given this, I would say the designation should match the order of the Usher tree, but C3857A followed by A3857C looks clearly wrong to me, and I don't think fixing the reversions will fix that placement. @AngieHinrichs if you shout loudly enough at Usher can you get it to accept that it's wrong? :)
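
For what it's worth, the manual clean-up described above could be automated along these lines. This is a minimal sketch under stated assumptions: the sequence is already aligned to the reference with no insertions, a reversion is replaced by N rather than by the derived base, and `mask_reversions` is a hypothetical helper, not an existing tool.

```python
def mask_reversions(aligned_seq: str, defining: dict[int, tuple[str, str]]) -> str:
    """Replace reference bases at lineage-defining sites with N.

    `aligned_seq` must be in reference coordinates (same length as the reference).
    `defining` maps 1-based genome position -> (reference base, derived base).
    """
    bases = list(aligned_seq)
    for pos, (ref, _alt) in defining.items():
        idx = pos - 1  # convert 1-based genome position to 0-based string index
        if idx < len(bases) and bases[idx].upper() == ref:
            bases[idx] = "N"
    return "".join(bases)


# Toy example on a 10-base "genome" with two made-up defining sites.
toy_defining = {2: ("C", "T"), 5: ("A", "G")}
print(mask_reversions("ACGTACGTAC", toy_defining))  # both sites carry the reference base
print(mask_reversions("ATGTGCGTAC", toy_defining))  # both sites carry the derived base
```

Masking turns a false reversion into missing data, so placement is no longer pulled by it, at the cost of losing a real reversion if one exists.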

@AngieHinrichs
Member

> Seems like BA.2.75 is pretty much a nightmare scenario for Usher.

With each new major-wave variant, the problem of amplicon dropout / assembly pipelines causing false reversions has progressively worsened. There was a little bit of a problem with reversions causing a mini-Alpha, then quite a few mini-Deltas so we added branch-specific masking, and then with BA.1 every primer scheme had something or other knocked out and if it weren't for wanting to catch recombinants I'd be masking all the major defining mutations... anyway, yep.

> C3857A followed by A3857C looks clearly wrong to me, and I don't think fixing the reversions will fix that placement.

I agree. It's one of those cases where there's just not enough info for a purely parsimony-based approach to sort out, but usher has a tie-breaking algorithm based on number of descendants of a node that often works to settle things out once there are more sequences, if I remove some sequences and add them back. That's about the only form of "shouting" I have at this point. Every once in a while I think maybe it's time for a manual node-moving utility function though (with checks to make sure that the move doesn't change alleles assigned to any descendants). [We added matUtils mask --move-nodes that fixes a particularly egregious situation of mutations and reversions that matOptimize produced for a limited time, but stopped short of adding a general purpose just-move-it function.]
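
To illustrate the tie-breaking idea mentioned above, here is a toy sketch: among candidate placements with the same parsimony cost, prefer the node with the most descendants. This is not UShER's actual code or data model; the `Candidate` class, its fields, and the numbers are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: str          # hypothetical node label
    parsimony_cost: int   # extra mutations needed to place the sample here
    n_descendants: int    # number of samples already below this node


def choose_placement(candidates: list[Candidate]) -> Candidate:
    """Pick the lowest-cost candidate, breaking ties by descendant count."""
    best_cost = min(c.parsimony_cost for c in candidates)
    tied = [c for c in candidates if c.parsimony_cost == best_cost]
    return max(tied, key=lambda c: c.n_descendants)


# Two equally parsimonious placements; the larger branch wins the tie.
options = [
    Candidate("branch_with_3857", parsimony_cost=1, n_descendants=9),
    Candidate("branch_with_23019_22599", parsimony_cost=1, n_descendants=14),
    Candidate("elsewhere", parsimony_cost=3, n_descendants=200),
]
print(choose_placement(options).node_id)
```

With a rule like this, the outcome depends on how many sequences each branch already holds, which is consistent with removing and re-adding sequences changing the result once a branch has filled out.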

@corneliusroemer
Contributor Author

corneliusroemer commented Oct 11, 2022 via email
