From 16S to the bigger picture

Content warning: This post contains strongly mixed feelings about 16S. Also a COI disclosure at the end. May contain traces of kitome.


Tl;dr: Our view of genomic diversity in a microbial community (left) subject to 16S analysis is limited to high resolution of a sub-region (centre) using standard workflows, but Nanopore sequencing of 16S (and indeed the whole rRNA operon) offers agile, error-prone long read data. Hopefully we can gain a (slightly) more transparent view of the bigger picture from using Nanopore to sequence 16S.

I’m responding to interest from Peter van Heusden and Sophie Nixon in using Nanopore sequencing for 16S analyses, in reply to our Radio 4 Nanopore exploits.

Some tweets will not cover this issue in sufficient detail, so here’s a rantpiece. Hopefully I will address the betamax/VHS or bluray/DVD issue but there are bigger questions to address in this area.

(For avoidance of doubt, we didn’t perform 16S for the Today programme. We did a shotgun metagenomic profile. Of course, 16S is not metagenomics. This is consequential for what follows. I will also use “16S” as shorthand for “high throughput amplicon sequencing of a well-loved marker locus, for example the 16S rRNA gene or its product, 16S rRNA”.)

We need to talk about 16S. It’s an old friend we think we know intimately, but we make some bad assumptions about them, lie about them to ourselves, and trash them endlessly to our peers. The problem is ours, not theirs. I did ponder whether I should tag this post NSFW – Not Safe for Woese – but I figured that would be crossing from cheeky to rude. I should acknowledge that there is so much we have learned about Archaea and Bacteria – even the fundamental existence of the Archaea alone – from 16S analyses that there will always be value in looking at 16S.

In recent months though, microbiome product vendors have offered us wildly varying opinions, ranging from the negging of 16S to its uncritical praise from “the leader in microbial genomics”. What’s a PI to think – let alone a member of the general public parting with their money?

Fortunately some in the academic community offer us a more nuanced interpretation. I have attended a brilliant (and funny) talk by Alan Walker on the myths and truths of microbiome analyses. Mick Watson’s team have published a series of robust recommendations on the technical choices to be made when confronted by the madness of the microbiome.

That many of their recommendations (e.g. “treat all samples alike”, “yeah, have controls”, “think about the power required for your experiment”, “be wary of interpreting across studies with different methods”, “work with the best quality samples you can”) are the bread and butter of training new scientists in any field at the undergraduate or high school level in How To Do A Science makes me wonder if the madness lies not in the microbiome, but ourselves. I fully expect to be backed up by a study somewhere showing a correlation between Bradyrhizobium relative abundance in the brain microbiome of PIs and the tendency to skip fundamental aspects of experimental design.

So, on the basis that we should be doing those things anyway (or providing equally robust, well-supported arguments as to why they would be inappropriate in your specific experiment) I will instead focus on why we perform 16S analyses. I guess a catch-all is that we wish to obtain a picture of our desired microbial community. From that picture we may wish to infer its phylogeny and make some (educated?) guesses about what the community does.

In my limited time (2006-2018) looking at microbial communities we have considered the following assumptions appropriate for achieving this aim with 16S:

  • That the denaturing points of fragments of 16S with different %GC content thrashed through a urea-soaked sliver of delinquent jellyfish with the capability to resolve maybe the 30 most abundant different denaturing points when picked up from a dirty lab floor, jigsawed back together and scanned in was OK.
  • That PCR amplifying 16S genes, cloning and sequencing a few dozen to a few thousand of them to represent the estimated ten-to-the-Brian-Cox number of genotypes in a sample was OK.
  • That PCR amplifying 16S genes, cutting them with an enzyme or three and then running them through a capillary to see the size of the first fragment in an amplicon was OK.
  • That PCR amplifying a little bit of 16S and “massively parallel sequencing” it a few thousand times per sample was OK.
  • That PCR amplifying a different bit of 16S and Illumina sequencing it a few more thousand times per sample was more OK.
  • That not providing experimental replicates for our samples because our budgets were smaller than our ambitions was OK.
  • That publishing papers describing microbiomes from low-biomass samples comprising taxa described in Table 1 of Salter et al (2014) was OK.
  • That ever more intensive data processing to make the leap from correlations between 16S relative abundances to causation driven by genes in unsequenced regions of volatile genomes was OK.
  • That sequencing cDNA of 16S to provide a caveat-free view of an active community (whatever “active” meant) was OK.
  • That expecting highly sophisticated data processing to provide high resolution insights into not just the whole 16S gene but presumably the parent genome, from exactly determining the sequence of a few hundred bases, will be OK.

Over that decade I would expect each of these assumptions has jammed the desks of editors in any number of microbial science journals with papers based on them, followed by papers critiquing them and then another raft of papers highlighting newer, better methods. This is what we think of as progress. Papers, and hence careers, are built on it.

The funny thing (to me) is that I have a sample set, obtained in 2006, which has provided congruent results on DGGE, T-RFLP, clone libraries, 454, and most recently Nanopore sequencing. The results are congruent with analyses of other samples from the same habitats done using Ion Torrent V1-V3 and Illumina V4. It even includes some of the Salter et al (2014) dirty dozen taxa, for they like oligotrophic, UV-stressed cool waters irrespective of whether they lie in a bottle of EB or the Arctic. (Orthogonal analyses, e.g. FISH, culture etc., support that claim.)

This, to me, highlights some deeply shocking key truths:

  • Every method has its limitations.
  • Every method has its uses.
  • Our job is to apply these methods while recognizing their limitations.
  • In many cases, the limitations of 16S override the limitations of the method used to study it.

And in that final point, it is worth remembering that 16S as a locus is far from ideal. It’s slow evolving, changing slower than virtually any process we’ve linked changes in 16S to. In the time taken to achieve a 3% divergence used to call an OTU, I expect we have gone from arguing about how to use stone tools to how to sequence stools. The 16S gene is also sometimes incongruent with taxonomy. Universal primers…aren’t. And so on. But, thanks to the pioneering work of Woese and all those inspired by him, if we are to pick a single locus to sequence, in 16S we are afforded well established tools and databases to work with. So even as we make a transition towards routine genome-resolved metagenomics it has its place.
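For anyone who hasn’t met the 3% convention in the wild, here is a minimal Python sketch of what it amounts to: greedy clustering of sequences into OTUs at 97% identity. It assumes the sequences are already aligned to the same length, and the function names (`identity`, `greedy_otus`, `mutate`) and the toy 100-base sequences are mine, purely for illustration. Real tools are a good deal more careful about alignment, seed order and chimeras.

```python
# Hypothetical helper: swap each base for a different one, so every
# mutated position is a guaranteed mismatch.
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def mutate(seq: str, n: int) -> str:
    """Return seq with its first n bases changed."""
    return "".join(SWAP[b] for b in seq[:n]) + seq[n:]


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU seed it matches at >= threshold."""
    seeds = []       # one representative sequence per OTU
    assignment = []  # OTU index for each input sequence, in input order
    for s in seqs:
        for i, seed in enumerate(seeds):
            if identity(s, seed) >= threshold:
                assignment.append(i)
                break
        else:
            # No seed was close enough: this sequence founds a new OTU.
            seeds.append(s)
            assignment.append(len(seeds) - 1)
    return assignment


# Toy 100-base "amplicons" (not real 16S).
ref = "ACGT" * 25
two_off = mutate(ref, 2)    # 98% identical to ref
five_off = mutate(ref, 5)   # 95% identical to ref

otus = greedy_otus([ref, two_off, five_off])
print(otus)
```

The 2-mismatch sequence joins the first OTU while the 5-mismatch sequence founds its own, which is all the 3% threshold really does.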

After all, until we obtain single-contig genomes from every taxon (individual?) across orders of magnitude changes in abundance typical in uneven, complex microbial communities – now there’s an obvious challenge for PromethION – we are in the business of making extrapolative statements from incomplete data. We need to handle that uncertainty confidently.

So, what about Nanopore 16S sequencing? (Finally! The point!)

Caveat: I have nothing in the peer reviewed space on this, as I’ve yet to resubmit a paper rejected on the grounds of insufficient novelty – the core body of work is summarized in this talk at the Nanopore Community Meeting in New York*.

Well, it’s just the latest in the string of methods which will be used and abused in the name of 16S sequencing. There are some game-changers though. This is not VHS/Betamax or Bluray/DVD. It will prove to be either VHS/DVD or Betamax/Bluray (you decide what that means).

Specifically, Nanopore 16S offers conspicuous advantages:

  • It is portable. Your 16S lab fits in your daysack.
  • It is fast. (Our first 16S run totally trashed live basecalling on our ONT spec laptop. While MinION can do 2.3Mb long reads without realizing it, sequencing short, clean 1.2kbp amplicons at 450 bases per second is turbocharged.)
  • It can provide tens of thousands of useable full length 16S reads per barcode in a simple, kit-based multiplexed run within minutes/hours.
  • It does not require access to capital infrastructure or service providers.
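A back-of-envelope sketch of why short amplicons feel turbocharged, using only the two numbers quoted above (1.2kbp amplicons, 450 bases per second). Everything else here is an idealisation of mine: it assumes a single pore that is never idle and ignores the time between reads, so treat the result as an upper bound and not a spec.

```python
READ_LENGTH_BASES = 1200   # "1.2kbp" amplicon, as quoted in the post
SPEED_BASES_PER_S = 450    # "450 bases per second", as quoted in the post

# Time for one amplicon to pass through one pore.
seconds_per_read = READ_LENGTH_BASES / SPEED_BASES_PER_S


def reads_per_pore(minutes: float) -> float:
    """Upper bound on reads through one continuously busy pore."""
    return minutes * 60 / seconds_per_read


print(f"{seconds_per_read:.1f} s per read")
print(f"{reads_per_pore(60):.0f} reads per pore-hour at most")
```

Multiply by however many pores are actually active on your flow cell and it becomes clear why the basecalling laptop was the bottleneck.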

But what of its limitations? Insurmountable? Nope.

  • Error rate (will someone please think of the error rate! Somebody?!)
  • Established bioinformatic tools can’t handle ONT data. I have to treat every read as an OTU. (Here’s looking at the progressive, clever community of people that are developing e.g. QIIME, Mothur, vsearch).
  • The current 16S barcoded kit from ONT only has 12 barcodes. (There is capacity for more, and DIY barcodes work well in our experience.) For off-the-shelf users this limits the size of your study and, given the quick throughput, the cost-effectiveness you could otherwise get from a much longer flow cell life.
  • Reviewer 3 problems. See my comments above regarding the rate of change and opposition in the 16S analysis community.

These four challenges are soluble in my estimation, for the phylogenetic signal obtained by full length sequencing of 16S helps amortize the impact of error rate. Hopefully, there is ample motivation to provide bioinformatic solutions to this problem. If you have the dry skills to solve this, and need wet data – please contact me if you’re interested in collaborating.

At the moment I’ve aggregated 16S data to class/family but there is clear potential for genus/species level resolution. However, is this problem unique to Nanopore? How many write.ismej.py papers present the same five figures (map/study design, phylum/class stacked barchart, PCoA, CCA/RDA, network diagram) using Illumina data, focusing on such coarse taxonomic resolution?
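To make “aggregating to class/family” concrete, a rough Python sketch follows. It assumes each read has already been given a semicolon-delimited lineage by some classifier. The lineage strings, the rank list and the function names are placeholders of mine, not the output of any particular tool.

```python
from collections import Counter

# Assumed rank order of the semicolon-delimited lineage strings.
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]


def aggregate(lineages, rank):
    """Count reads per taxon at the given rank.

    Reads whose lineage stops short of that rank are pooled together.
    """
    depth = RANKS.index(rank) + 1
    counts = Counter()
    for lineage in lineages:
        parts = lineage.split(";")
        if len(parts) >= depth:
            counts[";".join(parts[:depth])] += 1
        else:
            counts["unclassified_at_" + rank] += 1
    return counts


def relative_abundance(counts):
    """Convert read counts to fractions of the total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# One lineage per read; every read is its own "OTU" (placeholder names).
reads = [
    "D;P1;C1;O1;F1;G1",
    "D;P1;C1;O1;F1;G2",
    "D;P1;C1;O1;F2",
    "D;P1;C2",
]

by_class = aggregate(reads, "class")
by_family = aggregate(reads, "family")
print(relative_abundance(by_class))
print(dict(by_family))
```

Going coarser hides per-read noise at the cost of resolution: at class level every toy read is placed, while at family level one of the four drops into the unclassified pool.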

Think outside the box: And here is the final, potential game changer. With Nanopore, you need not stick to 16S. The question is not “which region of 16S?” or even “can you do full length 16S?” or even “why not do ITS too?”; it is “wouldn’t it be rude not to do 23S while we’re at it?” This paper by Lee Kerkhof’s team illustrates this, and the value of single-amplicon 16S-ITS data is highlighted in a real world, Sanger based study I coauthored.

So – 16S is dead, long live 16S?

 

*Conflict of Interest – disclosure: My registration and travel costs to present this work were covered by ONT, and ONT have kindly provided reagents for outreach work including 16S analyses.

 

 

 

 
