The gist of between glaciers and genomes

 

This week I’ve been at the Microbiology Society Annual Conference in Birmingham and had the opportunity to present in the Microbial Diversity & Interactions in the Environment session on Tuesday. My theme was “From glaciers to genomes…and back again”, built around exploits with the Oxford Nanopore MinION in the last year. Cedric Laczny asked me to provide a summary of what was said and done for non-attendees – so here it is.

So far, we’ve mainly used MinION for in-field metagenomics, but I decided to only mention this as a scene setter. Our larger problem is that Earth has 70% of its freshwater stashed in glacial ice occupying 11% of its surface area, and the genomic diversity of the microbial inhabitants of this (by volume) massive freshwater ecosystem is very poorly mapped. I believe we have fewer than 10 public bacterial genomes, fewer than five cyanobacterial genomes, no eukaryote or archaeal genomes and a slack handful of amplicon or shotgun datasets to cover ca. 198,000 glaciers and three ice sheets. It’s embarrassing to chat to folk in the medical microbiology community: last night I passed a poster reporting >10,000 Salmonella genomes.

As we are embarking on an unprecedented experiment in destroying glaciers, and because microbes are confounding factors in that experiment, it seems prudent to start discovering this genomic diversity. I have tried to engage the research community to gauge interest in a genuinely communal effort to sequence as many genomes as we can afford to, but the response has largely been muted or inclined to consolidate the project at one institution. If anyone who reads this is interested in a network-based collaboration – I am all ears.

But, when starting from a low point, even incremental advances can be transformative. So for now we’ll go with me and my MinION.

The scope of my talk covered the behaviour of just a few cyanobacteria associated with cryoconite formation. My plans to show the Chris Hadfield-approved zoom shot into a Greenland cryoconite hole were scotched by IT issues – the lack of a computer mouse!

Nevertheless, in ten minutes I needed to cover cyanobacterial sequence diversity in cryoconite on a timescale from ~12,500 years before present to Tuesday before last – literally. I need to write a separate update about the Walters Kundert “Bleakest Midwinter” project, but for the moment I’ll just say we hit our bag limit on Svalbard cryoconite samples in the “light winter” phase this March and preliminary MinION metagenomes and qPCR on our portable Mic cycler are prompting interesting hypotheses about who lives and who dies between dark and light winter. I’ll look forward to the final phase of sampling so I can then batch the samples for other analyses.

The core part of my methodology has been to resolve these genomes from shotgun metagenomes sequenced on the MinION. I’ve been multiplexing 3-6 samples per flow cell using Josh Quick’s one-pot barcoding ligation protocol. In contrast to the local norms of the parish I find myself in, I am not able to bask in the glory of closed, single-contig genomes formed from ultra-long whale reads – but then again I am working with degraded, old (ancient?) bead-beaten DNA, so my expectations were adjusted downwards from the start.

Nevertheless, I have been able to learn some interesting things. As this needs to be worked up into a couple of papers, I won’t delve into too much before peer review.

In short, following error correction and assembly with Canu, and then a binning strategy based on a taxonomic classifier to select contigs with good protein-level matches to taxa of interest, I do have bins corresponding to discrete bacterial genomes. Quick annotation with prokka throws up both interesting metabolic traits and the prospect of strain resolution. Where we have non-targeted metabolomics data from the same samples, the presence/absence of pathways matching the salient metabolite fluxes is quite gratifying. One of my genome bins matches a non-cyanobacterial taxon for which the evolution of a particular autotrophic pathway is unfinished business. For this bin I can tick off genes for their presence or absence in full concordance with the sequenced isolate, a bacterium with a large and complex genome which was reportedly very difficult to assemble on short reads only.
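For readers who want a feel for the binning step, here is a minimal sketch of the idea in Python. It assumes a simplified, hypothetical input: one (contig, taxon, supporting fraction) record per contig, where the fraction stands in for how much of the contig's protein-level evidence agrees with the assigned taxon. The function name, the 0.5 threshold and the placeholder taxon labels are illustrative assumptions, not the scripts actually used.

```python
from collections import defaultdict


def bin_contigs_by_taxon(assignments, target_taxa, min_fraction=0.5):
    """Group contig IDs into bins keyed by taxon of interest.

    assignments: iterable of (contig_id, taxon, fraction) tuples, a
    hypothetical simplified form of per-contig classifier output.
    target_taxa: set of taxon labels we want bins for.
    min_fraction: assumed cut-off for a "good" protein-level match.
    """
    bins = defaultdict(list)
    for contig_id, taxon, fraction in assignments:
        # Keep only contigs confidently assigned to a taxon of interest.
        if taxon in target_taxa and fraction >= min_fraction:
            bins[taxon].append(contig_id)
    return dict(bins)


# Toy records with placeholder labels (not real data).
example = [
    ("contig_1", "taxon_A", 0.82),
    ("contig_2", "taxon_A", 0.31),   # weak support, dropped
    ("contig_3", "unclassified", 0.90),  # not a target, dropped
    ("contig_4", "taxon_B", 0.67),
]
bins = bin_contigs_by_taxon(example, {"taxon_A", "taxon_B"})
print(bins)
```

A taxonomy-guided filter like this is deliberately conservative: it only recovers contigs that resemble something already in the reference database, so novel sequence can be left out of a bin.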

But it is with the rRNA operons that I’ve been having most fun. Agreement between the binning and rRNA operon taxonomy is excellent. Last year I co-authored a study led by Takahiro Segawa which used a retro, Sanger based long read strategy to resolve contiguous 16S-ITS environmental sequences from cyanobacteria on a global range of glaciers. I think the ITS haplotype data in that study offers the highest resolution and spatial coverage of diversity across the terrestrial cryosphere, so I have simply been looking to see where my MinION metagenome-assembled genomes have matched the Segawa haplotypes. The good news is that the 16S genes match the expected 16S OTUs and the ITS haplotypes extracted from my genomes lie within the geographic clades for the population structures of those OTUs. So – seems legit.

Cyanobacterial haplotypes from pre-modern cryoconite cyanobacteria either published by Takahiro Segawa or in our possession also sit within the extant strains from those regions, hinting at the stable colonization of the cryosphere over extended timescales.

Personally I would hope this observation of congruence between pre-industrial and contemporary cryoconite ecosystem engineers helps make abundantly clear that the old chestnut “dark stuff on ice is simply pollutants” is utter dog toffee.

As always when presenting nanopore data, someone is duty bound to ask the question “but isn’t the error rate terrible?” This is based on the observation that the accuracy per base of uncorrected, raw reads is in the range 85-92%. Fair one. But we are not playing with these reads; we are playing with error-corrected, assembled data.
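To put those percentages on the scale sequencing folk usually quote, per-base accuracy can be converted to a Phred-style quality value, Q = -10 × log10(error probability). The small helper below is only an illustration of that standard conversion; the function name is mine.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy (0 <= accuracy < 1) to a Phred-scale Q value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)


# Raw-read accuracy range quoted above.
for acc in (0.85, 0.92):
    print(f"{acc:.0%} accuracy is roughly Q{accuracy_to_phred(acc):.1f}")
```

On this scale 90% accuracy is Q10 and 99% is Q20, so raw reads at 85-92% sit a little either side of Q10.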

While I can point to the efforts of folk who do multiple rounds of polishing with racon or nanopolish, and to their dissatisfaction with 99.9x% accuracy, for this initial effort I have only used error correction with Canu. For one thing, I am wary of citing CheckM statistics, as remaining indels will likely cause completeness to be underestimated. For now, as the only hook I have to hang these genomes on is the ITS strain data, tracking the rRNA operons has been my goal. One of the next stages might be to look across the genome, so I will be polishing and/or going hybrid as funds permit.

So what’s the error rate like in this context? Here’s a quick test: Dr Nathan Chrismas has kindly provided his BC1401 isolate of Phormidesmis priestleyi, which represents the first cyanobacterial isolate genome from glacial ecosystems, sequenced with Illumina reads. For cyanobacterial genomes, one can usually read them as “metagenomes”, as a number of cohabiting bacteria are difficult to get rid of, even in a unialgal culture. In his paper, Nathan devised a clever bioinformatics strategy for doing so…rather than hammering the culture with, say, bleach (sorry Nathan). But when resequencing the isolate from culture, the presence of contaminants obviously recurs. So, turning the negative into a positive, I can consider the ONT sequence data from BC1401 a reduced-complexity, P. priestleyi-dominated metagenome. This starts to sound like the real thing, covered by a similar sequencing effort, but one for which there is a 213-contig genome sequenced with Illumina data to act as a benchmark. Long story short, across the 4.8 kbp rRNA operon I obtain 98.92% identity between the Canu-corrected and assembled ONT data and Nathan’s Illumina assembly. Not 99.9999%, but fairly respectable for a first pass.
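As a rough guide to what that figure means, here is a minimal sketch of a percent-identity calculation over a pairwise alignment. It assumes one common definition (identical columns divided by total alignment columns, with gaps counting as differences); aligners and BLAST-style reports differ on how gaps are handled, and this is not necessarily how the number above was computed. The toy sequences are made up.

```python
def percent_identity(aligned_ref, aligned_query):
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs are rows of one pairwise alignment, so they must be the
    same length; '-' marks a gap and never counts as a match.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_ref:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for r, q in zip(aligned_ref, aligned_query) if r == q and r != "-"
    )
    return 100.0 * matches / len(aligned_ref)


# Toy alignment: 10 columns, one gap and one substitution.
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(toy_identity)

# Back-of-envelope: differing columns implied by 98.92% identity over
# roughly 4,800 aligned columns (the 4.8 kbp operon).
approx_differences = round(4800 * (1 - 0.9892))
print(approx_differences)
```

Under that assumed definition, 98.92% over about 4.8 kbp works out at around 52 differing positions across the operon.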

In summary, given the potential strain-level resolution and functional insights from genomes recovered from metagenomic data generated on a USB-powered device, I get the feeling our MinION will be just as handy in our home lab as in our field lab.

My thanks to collaborators Dr Joseph Cook, Dr Sara Rassner, Professor Andy Hodson and Professor Alun Hubbard for their contributions to this work in the varied form of samples, fieldwork and scripts for pulling out interesting contigs, and to the session organizers for the opportunity to speak.
