Significant Bacteria and Their Thresholds – Part 1

Anyone who regularly reads peer-reviewed medical studies on the microbiome will notice findings reported as bacteria being “too high” or “too low,” with phrases like “trending” when statistical significance isn’t reached. Frankly, my reaction to 95% of these papers is an eye-roll, as the statistical methods used are often inappropriate for the data at hand. With multiple degrees in statistics, professional memberships, and experience, I’m acutely aware of both best practices and common pitfalls.

Microbiome data distributions frequently display extreme skewness, often greater than 20. In such cases, computing a mean and standard deviation is simply incorrect. My friend “Perplexity” writes that mean and standard deviation become inappropriate measures for computing significance if the distribution’s skewness is substantial, specifically when the absolute skewness exceeds 2.

Despite this, using these metrics remains standard in high school statistics and unfortunately persists in many life science studies. This “comfort zone” approach does nothing but cloud true findings in microbiome science.
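To make the skewness point concrete, here is a minimal Python sketch. The abundance values are made up for illustration and are not taken from the donated samples, and `sample_skewness` is a hypothetical helper that computes the usual moment-based skewness coefficient.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no skew to measure
    return m3 / m2 ** 1.5


# Hypothetical relative abundances (percent) for one taxon across 100 samples:
# absent in most samples, trace amounts in a few, one large outlier.
abundances = [0.0] * 90 + [0.01] * 8 + [0.5, 12.0]

skew = sample_skewness(abundances)
mean = statistics.mean(abundances)
median = statistics.median(abundances)

print(f"skewness = {skew:.1f}")
print(f"mean = {mean:.4f}, median = {median:.4f}")
if abs(skew) > 2:
    print("Absolute skewness exceeds 2: mean and standard deviation are poor summaries here.")
```

With data shaped like this, a single outlier pulls the mean far above the median, so the mean describes almost none of the actual samples.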

My alternative methodology uses a much larger, highly annotated dataset: over 4,290 unique samples generously donated to Microbiome Prescription, most transferred from Biomesight.com. Importantly, these samples are uniformly processed and richly annotated with symptoms rather than diagnoses, yielding superior analytical clarity.

My Natural Questions to ask

Natural for a statistician, that is.

  • For people with a symptom or diagnosis
    • What are the significant bacteria associated with (and likely causing) the symptom?
    • What are the threshold levels for these bacteria?
      • I use “levels” and not “level” because I have observed that the same symptom may occur when a bacterium falls outside a specific range, that is, either too high or too low. I have also encountered this reported in a few studies, often hidden under a term like “altered microbiome”.
      • There is a dangerous assumption in the literature that significant bacteria must be either too high or too low. I unfortunately know Kierkegaard’s “Either/Or” well.
      • There is no universal threshold for all symptoms; each symptom has its own.
  • For people with dysbiosis but without any symptom that has a statistical model
    • How do you determine which bacteria are significant?
    • What are the threshold levels for these bacteria?
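The “too high or too low” idea in the list above can be sketched in Python. This is only an illustration under stated assumptions, not the method used for this post: the cutoffs are taken as the 10th and 90th percentiles of a reference group, the comparison is a plain 2x2 chi-square test without continuity correction, and the names `percentile` and `out_of_range_test` as well as all data values are hypothetical.

```python
import math


def percentile(sorted_values, q):
    """Percentile by linear interpolation; q is in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def out_of_range_test(with_symptom, without_symptom, low_q=10, high_q=90):
    """Compare how often each group falls outside [low, high] (assumed cutoffs)."""
    ref = sorted(without_symptom)
    low, high = percentile(ref, low_q), percentile(ref, high_q)

    def outside(values):
        return sum(1 for v in values if v < low or v > high)

    a = outside(with_symptom)          # symptom, out of range
    b = len(with_symptom) - a          # symptom, in range
    c = outside(without_symptom)       # no symptom, out of range
    d = len(without_symptom) - c       # no symptom, in range
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return low, high, chi2, p_value


# Made-up abundances: the symptom group is split between very low and very high.
without_symptom = [i / 10 for i in range(1, 101)]
with_symptom = [0.2] * 15 + [9.8] * 15 + [5.0] * 10

mean_with = sum(with_symptom) / len(with_symptom)
mean_without = sum(without_symptom) / len(without_symptom)
low, high, chi2, p = out_of_range_test(with_symptom, without_symptom)

print(f"means: {mean_with:.2f} vs {mean_without:.2f}")
print(f"range: {low:.2f} to {high:.2f}, chi-square = {chi2:.2f}, p = {p:.2g}")
```

In this toy data the two group means are almost identical, so a comparison of means sees nothing, while the out-of-range comparison picks up the difference clearly.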

Over the last decade, these have been important questions because they lead directly to treatment suggestions.

They are also significant in evaluating progress. At present I have a forecasting algorithm that has a high prediction rate for symptoms from a microbiome sample. The forecasting algorithm is also useful for evaluating progress. Here is an example from a recent sample that the donor asked me to review.

Prediction

The checks indicate that the donor agrees that they have this symptom.

Monitoring

The person above followed the suggestions and the subsequent test results are shown below.

What are the most common bacteria associated with symptoms?

This is a generic question whose answer is useful for health practitioners to know, for example Kristina Mitts of Mind Mood Microbiome, with whom I frequently correspond, or Dr. Jason Hawrelak.

Using more appropriate statistical methods on our 4,292 distinct samples, we identified significant bacteria across 327 symptoms, with the following counts at each significance level.

Significance: P <    Count
0.05                 13,855
0.01                 12,411
0.001                7,614
0.0001               5,532

So what are the top bacteria at each of these significance levels?
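For readers who want to build similar tallies from their own results, here is a hedged Python sketch. It assumes that each symptom-taxon association is counted once, under the strictest significance level it meets, and that “Instances” means the number of symptoms for which a taxon is significant; both are readings of the tables, not confirmed details. The records and the helper names `strictest_level` and `tally` are made up for illustration.

```python
from collections import Counter

# Significance levels, strictest first.
LEVELS = [0.0001, 0.001, 0.01, 0.05]


def strictest_level(p):
    """Return the strictest level that p falls below, or None if p >= 0.05."""
    for level in LEVELS:
        if p < level:
            return level
    return None


def tally(records):
    """Count associations per level, per taxon within each level, and per taxon overall."""
    per_level = Counter()
    per_taxon = {level: Counter() for level in LEVELS}
    overall = Counter()
    for symptom, taxon, p in records:
        level = strictest_level(p)
        if level is None:
            continue  # not significant at any level
        per_level[level] += 1
        per_taxon[level][taxon] += 1
        overall[taxon] += 1
    return per_level, per_taxon, overall


# Made-up (symptom, taxon, p-value) records.
records = [
    ("symptom A", "Bacteroides uniformis", 0.00002),
    ("symptom B", "Bacteroides uniformis", 0.003),
    ("symptom A", "Bilophila wadsworthia", 0.02),
    ("symptom C", "Bilophila wadsworthia", 0.2),
    ("symptom C", "Roseburia", 0.0004),
]

per_level, per_taxon, overall = tally(records)
for level in sorted(LEVELS, reverse=True):
    print(f"P < {level}: {per_level[level]} associations")
print("Most frequent overall:", overall.most_common(1))
```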

Overall Significance

Taxon    Name                                     Rank     Instances
820      Bacteroides uniformis                    species  165
35833    Bilophila wadsworthia                    species  142
35832    Bilophila                                genus    139
818      Bacteroides thetaiotaomicron             species  137
1426     Parageobacillus thermoglucosidasius      species  133
118884   Gammaproteobacteria incertae sedis       no rank  125
871324   Bacteroides stercorirosoris              species  124
120580   Symbiobacterium toebii                   species  122
53244    Desulfonatronovibrio                     genus    122
543349   Symbiobacteriaceae                       family   122
2733     Symbiobacterium                          genus    122
1498     Hathewaya histolytica                    species  122
454155   Paraprevotella xylaniphila               species  120

P < 0.05

Taxon    Name                                     Rank     Instances
2950010  Salidesulfovibrio                        genus    47
221711   Salidesulfovibrio brasiliensis           species  46
658623   Chelonobacter                            genus    45
69224    Erwinia psidii                           species  44
213462   Syntrophobacterales                      order    44
3024408  Syntrophobacteria                        class    44
31977    Veillonellaceae                          family   44
1843489  Veillonellales                           order    43
550      Enterobacter cloacae                     species  43
35832    Bilophila                                genus    42
841      Roseburia                                genus    41
53244    Desulfonatronovibrio                     genus    41
871324   Bacteroides stercorirosoris              species  41
1260     Finegoldia magna                         species  41
1498     Hathewaya histolytica                    species  40

P < 0.01

Taxon    Name                                     Rank     Instances
35833    Bilophila wadsworthia                    species  51
78448    Bifidobacterium pullorum                 species  50
820      Bacteroides uniformis                    species  47
841      Roseburia                                genus    46
818      Bacteroides thetaiotaomicron             species  44
118884   Gammaproteobacteria incertae sedis       no rank  41
1769729  Hathewaya                                genus    41
1426     Parageobacillus thermoglucosidasius      species  41
112902   Propionispora                            genus    40
36853    Desulfitobacterium                       genus    40
386414   Hoylesella timonensis                    species  40
119065   unclassified Burkholderiales             family   40
1853231  Odoribacteraceae                         family   40
400091   Hymenobacter xinjiangensis               species  39
209080   Propionispora hippei                     species  39
871324   Bacteroides stercorirosoris              species  39
69224    Erwinia psidii                           species  39
35832    Bilophila                                genus    39

P < 0.001

Taxon    Name                                     Rank     Instances
820      Bacteroides uniformis                    species  50
35833    Bilophila wadsworthia                    species  37
35832    Bilophila                                genus    32
118884   Gammaproteobacteria incertae sedis       no rank  32
658623   Chelonobacter                            genus    31
246787   Bacteroides cellulosilyticus             species  31
120580   Symbiobacterium toebii                   species  31
543349   Symbiobacteriaceae                       family   31
253238   Ethanoligenens                           genus    31
2733     Symbiobacterium                          genus    31
292833   Candidatus Rhabdochlamydia               genus    30
324707   Candidatus Rhabdochlamydia crassificans  species  30
1426     Parageobacillus thermoglucosidasius      species  30
689704   Candidatus Rhabdochlamydiaceae           family   30
70190    Chroococcus                              genus    29
402401   Chroococcus minutus                      species  29
1890464  Chroococcaceae                           family   29
283169   Odoribacter denticanis                   species  28

P < 0.0001

Taxon    Name                                     Rank     Instances
820      Bacteroides uniformis                    species  40
246787   Bacteroides cellulosilyticus             species  34
818      Bacteroides thetaiotaomicron             species  30
1963360  Parachlamydiales                         order    30
454155   Paraprevotella xylaniphila               species  30
1426     Parageobacillus thermoglucosidasius      species  30
2733     Symbiobacterium                          genus    29
543349   Symbiobacteriaceae                       family   29
120580   Symbiobacterium toebii                   species  29
35832    Bilophila                                genus    26
191412   Chlorobiaceae                            family   25
256319   Chlorobaculum                            genus    25
244127   Anaerotruncus                            genus    25
189723   Prevotella micans                        species  25
53244    Desulfonatronovibrio                     genus    25
324707   Candidatus Rhabdochlamydia crassificans  species  24
191410   Chlorobiia                               class    24
35833    Bilophila wadsworthia                    species  24

Summary

This is a high-level overview of significant bacteria. The patterns above are specific to tests done by Biomesight; because of a lack of standardization, using these identifications with other tests is unsafe (in the legal sense). Background here. IMHO, labs have a moral responsibility to produce similar tables.

The key findings are:

  • “Common suspects” such as Bifidobacterium and Lactobacillus are largely missing! Only Bifidobacterium pullorum appears, and only in the P < 0.01 table.
  • Large sample sizes with the same processing are critical. The processing must be the same as that used in a clinical setting.
  • Appropriate statistical methods must be used.

Stay tuned for the next part, where we drill deeper into the appropriate handling of data for some specific issues like Long COVID.

The input data that I used is publicly available at: https://citizenscience.microbiomeprescription.com/

Post Script

Combining probiotics with the above gets interesting. Take Bacteroides uniformis, which is at the top of many of these tables. If we go to my bacteria association site, we can determine the probiotics (available or pending) that will increase this bacterium (none decrease it).

Again, the “cure-all” Lactobacillus and Bifidobacterium genera are absent (apart from Ligilactobacillus ruminis, which is not currently available).
