Technical Note: With the same set of samples from the same labs you can get very different averages!

This post originated from a dialog with a Ph.D. in Molecular Genetics with whom I often discuss many aspects of microbiome analysis.

The root of the problem is how many “Reads” from a 16S sample you require before you deem an identification reliable. A “Read”, “num_hits” or “Count” is the number of sequences found in the sample that match a specific pattern in a reference library. These are “best efforts” identifications and are not always correct.

Identification accuracy can be as low as 62% [Then and now: use of 16S rDNA gene sequencing for bacterial identification and discovery of novel bacteria in clinical microbiology laboratories]. It is generally assumed that a single “Read” is questionable. Commercial labs and test providers will often count single reads anyway, so that they can claim to identify more bacteria than the competition. Accuracy is rarely a marketing concept.

To this end, we processed the biggest collection of samples from one lab at different read thresholds to see what happens. The more reads you require before a sample is included, the higher the resulting values.

obs   mean     stddev    median  boxplotlow  boxplothigh  tax_name   rank   Reads
471   386.5    5301.7    30      10          70           Neisseria  genus  1
242   733.6    7387.0    50      10          110          Neisseria  genus  2
136   1275.8   9835.5    70      30          210          Neisseria  genus  3
95    1800.3   11747.6   80      20          300          Neisseria  genus  4
68    2491.2   13853.4   120     0           360          Neisseria  genus  5
55    3059.5   15375.2   160     0           380          Neisseria  genus  6
41    4071.7   17748.5   200     0           500          Neisseria  genus  7
30    5517.0   20650.4   240     0           778          Neisseria  genus  8
21    7825.7   24488.4   350     50          1532         Neisseria  genus  9
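
For readers who want to try this on their own data, here is a minimal sketch of the kind of calculation behind a table like this. It is not the actual code used for the numbers above. It assumes each sample for one taxon is a hypothetical (reads, value) pair, and that a sample is included when its read count meets or exceeds the threshold. The function name and the toy data are made up for illustration only.

```python
from statistics import mean, median


def summarize_by_min_reads(samples, thresholds):
    """Summarize one taxon at several minimum-read thresholds.

    samples: list of (reads, value) pairs, one per sample (assumed layout).
    thresholds: iterable of minimum read counts to test.
    """
    rows = []
    for min_reads in thresholds:
        # Keep only samples whose read count meets the threshold.
        kept = [value for reads, value in samples if reads >= min_reads]
        if not kept:
            continue  # nothing left to summarize at this threshold
        rows.append({
            "min_reads": min_reads,
            "obs": len(kept),
            "mean": mean(kept),
            "median": median(kept),
        })
    return rows


# Toy data, invented for illustration; NOT taken from any lab.
toy_samples = [(1, 10), (1, 20), (2, 50), (3, 90), (4, 200), (6, 400), (9, 1500)]

for row in summarize_by_min_reads(toy_samples, range(1, 5)):
    print(row)
```

Even with this toy data, the number of observations falls and the mean rises as the threshold goes up, because the low-read samples that get dropped also tend to carry the low values.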

Two labs may report different reference ranges for the simple reason that one requires at least 2 reads and the other requires 4. This decision is often well hidden from the consumer. If the reference ranges are based on 4 reads and you apply them to 1-read samples, you will get a lot of false “too high” and “too low” results.

For the example bacteria above, a 1-read reference range would have an average of 386, while a 4-read reference range would have an average of 1800. So a sample with 800 from 2 reads would be about 2x the average for one reference range and about 1/2 the average for the other.
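
The arithmetic is easy to check. The short sketch below uses the two averages quoted above (386 and 1800) and the sample value of 800. The function name is just an illustrative choice.

```python
def ratio_to_average(value, reference_average):
    """Return how many times the reference average a sample value is."""
    return value / reference_average


sample_value = 800
average_1_read = 386    # average when 1 read is enough
average_4_reads = 1800  # average when 4 reads are required

print(round(ratio_to_average(sample_value, average_1_read), 2))   # 2.07
print(round(ratio_to_average(sample_value, average_4_reads), 2))  # 0.44
```

The same sample looks about twice the average against one reference range and under half the average against the other.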

This is part of the complexity of doing microbiome analysis and understanding the mechanisms involved, mechanisms that are often not understood by the labs and kit providers.