In my prior post, Patterns of Microbiome Distributions Across Different Vendors, we saw that the distribution patterns were not consistent. With annotated samples from several different vendors, the logical next step was to check whether the bacteria associated with specific symptoms are consistent across vendors. This question was directed to me by Chidozie Ojobor, Ph.D. The results can be viewed on my Special Studies page.
This page hunts for statistical associations with P < 0.001. The results shown cover not only bacteria, but also estimates of compounds and enzymes derived from KEGG: Kyoto Encyclopedia of Genes and Genomes.
- Definitions:
  - Singleton: the association was found in only one of the labs
  - Pair: two labs reported the same association
  - Triplet: all three labs reported the same association
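As a rough illustration of these definitions, the sketch below counts how many labs reported each association and labels it accordingly. The row layout `(lab, item, symptom)`, the function name `classify_agreement`, and the example rows are all hypothetical, and the sketch ignores the direction of the association (too high versus too low), which a fuller comparison would probably need to match as well.

```python
from collections import defaultdict

# Labels for the number of labs reporting the same association (three labs assumed).
LABELS = {1: "singleton", 2: "pair", 3: "triplet"}

def classify_agreement(findings):
    """Map each (item, symptom) association to singleton, pair, or triplet.

    `findings` is an iterable of (lab, item, symptom) tuples, one for each
    association that a lab reported as significant.
    """
    labs_by_association = defaultdict(set)
    for lab, item, symptom in findings:
        # A set ensures each lab is counted once per association.
        labs_by_association[(item, symptom)].add(lab)
    return {assoc: LABELS[len(labs)] for assoc, labs in labs_by_association.items()}

# Hypothetical example rows, not real results.
findings = [
    ("uBiome", "Taxon A", "Symptom X"),
    ("Thryve", "Taxon A", "Symptom X"),
    ("Biomesight", "Taxon A", "Symptom X"),
    ("Thryve", "Taxon B", "Symptom X"),
    ("Biomesight", "Taxon B", "Symptom X"),
    ("uBiome", "Taxon C", "Symptom Y"),
]
result = classify_agreement(findings)
for association, label in result.items():
    print(association, label)
```

Here Taxon A would be a triplet, Taxon B a pair, and Taxon C a singleton.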
Given the differences in sample sizes, and thus in the significance levels each lab's data can reach, some lack of consensus is expected. The sample counts were:
- uBiome: 790
- Thryve: 1542
- Biomesight: 4604
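To see why sample size alone can break consensus, here is a minimal sketch that applies the same hypothetical effect to each lab's sample count and computes a p-value. The pooled two-proportion z-test is an assumption chosen for illustration and may not be the test used on the Special Studies page; the 42% versus 30% prevalence and the one-in-five symptom rate are invented numbers.

```python
import math

def two_proportion_p_value(p1, p2, n1, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Sample counts from the post.
lab_sizes = {"uBiome": 790, "Thryve": 1542, "Biomesight": 4604}

# Hypothetical effect: a taxon is seen in 42% of samples with the symptom
# and 30% of samples without it; one sample in five reports the symptom.
p_values = {}
for lab, n in lab_sizes.items():
    n_with = n // 5
    n_without = n - n_with
    p_values[lab] = two_proportion_p_value(0.42, 0.30, n_with, n_without)

for lab, p in p_values.items():
    verdict = "significant" if p < 0.001 else "not significant"
    print(f"{lab}: p = {p:.2g} ({verdict} at P < 0.001)")
```

Under these assumptions the identical effect misses the P < 0.001 cutoff at the smallest sample count and passes it at the two larger ones, so it would be recorded as a pair rather than a triplet.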
At P < 0.001, we had the following:
- Looking at bacteria to symptom agreement
  - 19558 singleton relationships
  - 1914 pair relationships
  - 94 triplet relationships, or 0.436%
- Looking at enzymes to symptom agreement
  - 74159 singleton relationships
  - 1629 pair relationships
  - 3 triplet relationships, or 0.004%
- Looking at compound to symptom agreement
  - 69147 singleton relationships
  - 9364 pair relationships
  - 6189 triplet relationships, or 7.3%
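The percentages above are consistent with taking triplets as a share of all relationships (singletons plus pairs plus triplets) in each category. That reading is an inference from the numbers, not something stated explicitly; the short check below uses only the counts listed above.

```python
# Counts at P < 0.001, copied from the lists above.
counts = {
    "bacteria": {"singleton": 19558, "pair": 1914, "triplet": 94},
    "enzymes": {"singleton": 74159, "pair": 1629, "triplet": 3},
    "compounds": {"singleton": 69147, "pair": 9364, "triplet": 6189},
}

def triplet_share(c):
    """Triplets as a percentage of all relationships in one category."""
    total = c["singleton"] + c["pair"] + c["triplet"]
    return 100.0 * c["triplet"] / total

shares = {kind: triplet_share(c) for kind, c in counts.items()}
for kind, pct in shares.items():
    print(f"{kind}: {pct:.3f}%")
# bacteria: 0.436%
# enzymes: 0.004%
# compounds: 7.307%
```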
Of special interest: when we tightened the threshold to P < 0.0001 for compound to symptom, agreement across labs improved markedly
- 5952 singleton relationships
- 45439 pair relationships
- 8153 triplet relationships or 13.7%
Bottom Line
Although my compound estimates from KEGG data are naive, the results support this model:
It is not the bacteria that cause the symptoms; it is the net amount of compounds produced by the bacteria that causes the symptoms. A surplus or deficiency of a compound can come from a vast array of different collections of bacteria. The bacteria may just be noise!!
This means that any model generating suggestions for a symptom needs one further, and significant, level of indirection: it must work through compounds rather than through bacteria directly.
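A toy sketch of that idea: two samples with different bacteria can end up with the same net amount of a compound. The taxa, the compound, the producer/consumer signs, and the abundance numbers are all hypothetical, and this is not the KEGG-based estimate used for the results above, only an illustration of "net amount".

```python
from collections import defaultdict

# Hypothetical per-taxon effects: +1 means the taxon produces the compound,
# -1 means it consumes it. Real mappings would come from KEGG annotations.
COMPOUND_EFFECTS = {
    "Taxon A": {"Compound X": +1},
    "Taxon B": {"Compound X": +1},
    "Taxon C": {"Compound X": -1},
}

def net_compounds(abundances, effects=COMPOUND_EFFECTS):
    """Sum signed abundances into a net amount per compound."""
    net = defaultdict(int)
    for taxon, amount in abundances.items():
        for compound, sign in effects.get(taxon, {}).items():
            net[compound] += sign * amount
    return dict(net)

# Two hypothetical samples (abundance as read counts) with different producers.
sample_1 = {"Taxon A": 300, "Taxon C": 100}
sample_2 = {"Taxon B": 250, "Taxon C": 50}

print(net_compounds(sample_1))  # {'Compound X': 200}
print(net_compounds(sample_2))  # {'Compound X': 200}
```

A bacteria-level comparison would see Taxon A in one sample and Taxon B in the other and find no agreement, while a compound-level comparison sees the same surplus in both.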
All of the data used is available as CSV files at Citizen Science Data Share.