Ability to Predict Symptoms with 99.9% probability using Bacteria Incidence alone

I am off from my usual day job until the new year. In the new year I know that I will be very busy because my firm just landed a contract for a major software product that I am the principal for. I decided to give myself a challenge to explore on these down days:

How accurately can you predict symptoms from the presence or absence of reported bacteria ALONE, across different 16s tests?

For those familiar with various forms of Artificial Intelligence, this approach is common: it reduces the problem to a collection of true/false values. For most microbiologists, it is a road not even thought about, let alone travelled.
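As a rough illustration of the true/false reduction, here is a minimal sketch. The taxon names, counts, and the helper `to_presence` are all hypothetical and not taken from the datasets; the only idea borrowed from the post is that each bacteria becomes a single present/absent flag.

```python
# Hypothetical sketch: taxon names and read counts are invented for illustration.
def to_presence(sample_counts, taxa):
    """Map each taxon to True if the sample reports it at all, else False."""
    return {taxon: sample_counts.get(taxon, 0) > 0 for taxon in taxa}


taxa = ["Taxon A", "Taxon B", "Taxon C"]

# One sample's report: only the taxa that were detected appear in it.
sample = {"Taxon A": 1250, "Taxon C": 3}

presence = to_presence(sample, taxa)
print(presence)
```

The amount reported is deliberately thrown away here; only incidence is kept.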

To make the challenge harder, I required every bacteria-symptom association to meet a stringent P value of 0.001 (Chi² > 10.83), which exceeds the thresholds typical of microbiome studies.
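To make the threshold concrete, here is a minimal sketch of a chi-square test on a 2x2 presence/absence table. It assumes a plain Pearson chi-square with one degree of freedom and no continuity correction; the post does not say which variant was used, and the counts below are hypothetical.

```python
import math

CHI2_THRESHOLD = 10.83  # value quoted in the post for P = 0.001


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    a: bacteria present, symptom present
    b: bacteria present, symptom absent
    c: bacteria absent,  symptom present
    d: bacteria absent,  symptom absent
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be nonzero")
    return n * (a * d - b * c) ** 2 / denom


def p_value_df1(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts, not taken from any of the datasets.
chi2 = chi_square_2x2(60, 40, 30, 70)
print(f"chi2 = {chi2:.2f}, p = {p_value_df1(chi2):.6f}")
print("keep association:", chi2 > CHI2_THRESHOLD)
```

With one degree of freedom, a statistic of 10.83 corresponds to a P value just under 0.001, which is where the threshold comes from.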

I have four contributed and annotated datasets:

  • 16s
    • uBiome: 791 samples
    • Ombre (formerly Thryve): 1,319 samples
    • Biomesight: 4,436 samples
  • Shotgun
    • Thorne: 253 samples

Using these datasets, I explored the strength of relationships based on the odds ratio. A subsequent post uses odds ratios based on a threshold amount of each bacteria, which yields much higher values. A high cumulative value indicates a very strong microbiome basis for the symptom, and thus for remediation.

For details, see the methodology in: New Standards for Microbiome Analysis? Taking the amount of each bacteria into consideration is shown in: Ability to Predict Symptoms with 99.9% probability using Bacteria Incidence and Amount

Relationships were quantified using Odds Ratios (OR) at consistent taxonomy levels to avoid dependence, with cumulative log(OR) indicating symptom-microbiome strength (higher values suggest a robust basis for remediation).
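The odds ratio and its cumulative log can be sketched as follows. This is an illustration under assumptions: the function names and counts are hypothetical, the natural log is assumed, and the post's exact selection rules (which bacteria qualify, how ties across taxonomy levels are handled) are not reproduced.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: bacteria present, symptom present
    b: bacteria present, symptom absent
    c: bacteria absent,  symptom present
    d: bacteria absent,  symptom absent
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def cumulative_log_or(odds_ratios):
    """Sum of natural logs of the odds ratios of the contributing bacteria."""
    return sum(math.log(r) for r in odds_ratios)


# Hypothetical counts for two bacteria at the same taxonomy level.
ors = [odds_ratio(60, 40, 30, 70), odds_ratio(45, 55, 25, 75)]
total = cumulative_log_or(ors)
print([round(r, 2) for r in ors], round(total, 2))
```

Summing logs is the same as multiplying the odds ratios, which is why the cumulative value grows quickly as more bacteria contribute.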

The Tax Rank indicates what is likely the most effective level to use for investigation (i.e. highest discrimination ability).

Thorne

| Symptom Name | Tax_rank | Cumulative | Cnt |
|---|---|---|---|
| General: Fatigue | species | 25.26 | 37 |

Ombre / Thryve

| Symptom Name | Tax_rank | Cumulative | Cnt |
|---|---|---|---|
| Autonomic Manifestations: Orthostatic intolerance | genus | 21.39 | 23 |
| General: Fatigue | species | 7.72 | 32 |
| General: Headaches | genus | 27.26 | 37 |
| General: Myalgia (pain) | species | 8.49 | 31 |
| Neurological: Confusion | species | 1.31 | 2 |
| Neurological: Difficulty processing information (Understanding) | species | 9.05 | 19 |
| Neurological: Disorientation | species | 1.40 | 3 |
| Neurological: emotional overload | species | 4.74 | 11 |
| Neurological: Impairment of concentration | genus | 22.71 | 32 |
| Neurological: Word-finding problems | genus | 15.24 | 15 |
| Neurological-Audio: hypersensitivity to noise | genus | 29.58 | 43 |
| Neurological-Sleep: Chaotic diurnal sleep rhythms (Erratic Sleep) | genus | 35.01 | 41 |
| Neurological-Vision: inability to focus eye/vision | genus | 42.34 | 53 |
| Neurological-Vision: photophobia (Light Sensitivity) | genus | 47.87 | 64 |
| Post-exertional malaise: Inappropriate loss of physical and mental stamina, | species | 20.20 | 45 |
| Sleep: Unrefreshed sleep | species | 21.19 | 49 |

uBiome

Although uBiome no longer exists, its numbers may still be of interest.

| Symptom Name | Tax_rank | Cumulative | Cnt |
|---|---|---|---|
| General: Fatigue | species | 5.69 | 15 |
| General: Headaches | species | 3.12 | 7 |
| General: Myalgia (pain) | species | 1.74 | 6 |
| Neurological: Confusion | species | 2.82 | 3 |
| Neurological: Difficulty processing information (Understanding) | species | 1.57 | 6 |
| Neurological: emotional overload | species | 6.08 | 17 |
| Neurological: fasciculations | strain | 1.25 | 3 |
| Neurological: Impairment of concentration | species | 14.29 | 18 |
| Neurological: Short-term memory issues | species | 0.94 | 7 |
| Neurological: Spatial instability and disorientation | species | 1.82 | 1 |
| Neurological: Word-finding problems | species | 6.38 | 11 |
| Neurological-Audio: hypersensitivity to noise | species | 7.71 | 11 |
| Neurological-Sleep: Chaotic diurnal sleep rhythms (Erratic Sleep) | species | 7.72 | 6 |
| Neurological-Vision: inability to focus eye/vision | species | 11.63 | 14 |
| Neurological-Vision: photophobia (Light Sensitivity) | species | 10.57 | 13 |
| Sleep: Unrefreshed sleep | species | 5.47 | 11 |

BiomeSight

| Symptom Name | Tax_rank | Cumulative | Cnt |
|---|---|---|---|
| Autonomic Manifestations: irritable bowel syndrome | species | 3.87 | 10 |
| Autonomic Manifestations: light-headedness | species | 8.07 | 15 |
| Autonomic Manifestations: nausea | species | 2.55 | 11 |
| Autonomic Manifestations: Neurally mediated hypotension (NMH) | species | 1.46 | 1 |
| Autonomic Manifestations: Postural orthostatic tachycardia syndrome (POTS) | species | 1.87 | 11 |
| General: Fatigue | species | 7.65 | 20 |
| General: Headaches | species | 3.94 | 15 |
| General: Myalgia (pain) | species | 1.63 | 10 |
| Neurological: Confusion | species | 4.93 | 6 |
| Neurological: Difficulty processing information (Understanding) | species | 0.63 | 9 |
| Neurological: emotional overload | species | 0.81 | 10 |
| Neurological: fasciculations | genus | 3.61 | 9 |
| Neurological: Impairment of concentration | species | 2.10 | 5 |
| Neurological: Short-term memory issues | species | 2.41 | 5 |
| Neurological: Spatial instability and disorientation | species | 2.91 | 4 |
| Neurological: Word-finding problems | species | 2.68 | 21 |
| Neurological-Audio: hypersensitivity to noise | genus | 2.53 | 6 |
| Neurological-Vision: inability to focus eye/vision | species | 2.10 | 4 |
| Neurological-Vision: photophobia (Light Sensitivity) | species | 9.46 | 28 |
| Post-exertional malaise: Inappropriate loss of physical and mental stamina, | species | 2.44 | 12 |
| Sleep: Unrefreshed sleep | species | 1.75 | 17 |

Summary

Nota Bene: the values above are cumulative sums of the log odds ratios. It is assumed that, for each bacteria, the highest odds ratio is the one that is hit. A value of 0.81 means that exp(0.81) = 2.25 is the highest odds ratio possible if the sample hits the highest odds ratio of every child. A value of 8.07 becomes an odds ratio of about 3188.
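The conversion from a cumulative log value back to an odds ratio is a single exponential. A minimal sketch, assuming the natural log (which the exp(0.81) = 2.25 example implies); the function name is hypothetical:

```python
import math


def max_combined_odds_ratio(cumulative_log_or):
    """Convert a cumulative log(OR) back into a combined odds ratio."""
    return math.exp(cumulative_log_or)


# The two values discussed in the note, taken from the tables as rounded.
for value in (0.81, 8.07):
    print(value, round(max_combined_odds_ratio(value), 2))
```

Starting from the rounded table value 8.07 gives roughly 3,197 rather than 3188; the small gap is presumably because the table values are rounded to two decimals.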

Ombre, uBiome and BiomeSight are all 16s tests, which are typically viewed as accurate to the species level at best. The difference in test processing shows up strongly in the table below. For background on the challenge posed by the lack of standardization in microbiome testing, see my post from 6 years ago: The taxonomy nightmare before Christmas…

| Symptom Name | Ombre | uBiome | BiomeSight |
|---|---|---|---|
| General: Fatigue | 7.72 | 5.69 | 7.65 |
| General: Headaches | 27.26 | 3.12 | 3.94 |
| General: Myalgia (pain) | 8.49 | 1.74 | 1.63 |
| Neurological: Confusion | 1.31 | 2.82 | 4.93 |
| Neurological: Difficulty processing information (Understanding) | 9.05 | 1.57 | 0.63 |
| Neurological: emotional overload | 4.74 | 6.08 | 0.81 |
| Neurological: Impairment of concentration | 22.71 | 14.29 | 2.10 |
| Neurological: Word-finding problems | 15.24 | 6.38 | 2.68 |
| Neurological-Audio: hypersensitivity to noise | 29.58 | 7.71 | 2.53 |
| Neurological-Vision: inability to focus eye/vision | 42.34 | 11.63 | 2.10 |
| Neurological-Vision: photophobia (Light Sensitivity) | 47.87 | 10.57 | 9.46 |
| Sleep: Unrefreshed sleep | 21.19 | 5.47 | 1.75 |

Looking at the counts:

| Symptom Name | Ombre | uBiome | BiomeSight |
|---|---|---|---|
| General: Fatigue | 32 | 15 | 20 |
| General: Headaches | 37 | 7 | 15 |
| General: Myalgia (pain) | 31 | 6 | 10 |
| Neurological: Confusion | 2 | 3 | 6 |
| Neurological: Difficulty processing information (Understanding) | 19 | 6 | 9 |
| Neurological: emotional overload | 11 | 17 | 10 |
| Neurological: Impairment of concentration | 32 | 18 | 5 |
| Neurological: Word-finding problems | 15 | 11 | 21 |
| Neurological-Audio: hypersensitivity to noise | 43 | 11 | 6 |
| Neurological-Vision: inability to focus eye/vision | 53 | 14 | 4 |
| Neurological-Vision: photophobia (Light Sensitivity) | 64 | 13 | 28 |
| Sleep: Unrefreshed sleep | 49 | 11 | 17 |

I attribute these stark differences to the fragments of RNA that each test examines to identify bacteria. It is those RNA fragments that matter.

All of the data used above is available for download.
