In recent discussions with microbiome startup teams, a key question emerged: which metabolites correlate with specific symptoms? I recommended odds ratios as the optimal analytical approach, and one team is now considering integrating this into their product.
My prior analysis of KEGG-derived metabolite data from various labs revealed stronger consistency in metabolite patterns than bacterial profiles. Symptoms likely arise from adverse metabolite combinations circulating systemically—one metabolite can stem from hundreds of bacteria, and one bacterium can influence hundreds of metabolites—creating a complex web akin to an oversized Gordian knot.
Humans naturally gravitate toward simple “sound bites.” Asked for the highest odds of criminality, people might cite race, city neighborhood, or age range (with 0-5-year-olds showing near-zero risk). True predictive power comes from aggregating all statistically significant odds ratios—in this case, all reported metabolites with meaningful associations.
Using 4500 symptom-annotated samples from BiomeSight, this post explores that approach.
Computing the Odds Ratio
The process is simple:
- Take the BiomeSight samples and compute the metabolites for each one using KEGG (Kyoto Encyclopedia of Genes and Genomes) data.
- This produced 2,690 different metabolites.
- Convert the amount of each metabolite to a percentile ranking. This allows the results to be applied to data from other pipelines, which may produce different raw values.
- Compute the chi-square statistic (Chi2) at each of the 100 integer percentile ranks, for every metabolite and every symptom with at least 30 reports (207 symptoms).
- 2,690 x 100 x 207 = 55,683,000 Chi2 computations.
- For each metabolite and symptom, keep the most significant percentile cutoff, provided P < 0.001 (i.e. Chi2 > 10.83).
- Then compute the odds ratio for that cutoff.
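The steps above can be sketched in a few lines of plain Python. This is a minimal illustration for one metabolite and one symptom, not the code used for this post. The function names, the choice of "at or above the cutoff" as the split, and the 0.5 correction for empty cells are all assumptions made for the example.

```python
from bisect import bisect_left

CHI2_CRITICAL = 10.83  # chi-square threshold for P < 0.001 with 1 degree of freedom


def percentile_ranks(values):
    """Map each raw amount to an integer percentile rank (0..99) within the cohort."""
    ordered = sorted(values)
    n = len(ordered)
    return [100 * bisect_left(ordered, v) // n for v in values]


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_cutoff(ranks, has_symptom):
    """Scan the integer percentile cutoffs and keep the most significant split.

    Returns (cutoff, chi2, odds_ratio_above), or None if no cutoff passes
    the P < 0.001 threshold.
    """
    best = None
    for cutoff in range(1, 100):  # cutoff 0 would put every sample on one side
        a = b = c = d = 0
        for rank, symptom in zip(ranks, has_symptom):
            if rank >= cutoff:
                if symptom:
                    a += 1  # above cutoff, has symptom
                else:
                    b += 1  # above cutoff, no symptom
            elif symptom:
                c += 1      # below cutoff, has symptom
            else:
                d += 1      # below cutoff, no symptom
        chi2 = chi2_2x2(a, b, c, d)
        if chi2 > CHI2_CRITICAL and (best is None or chi2 > best[1]):
            # Assumption: add 0.5 to every cell (Haldane-Anscombe correction)
            # so that an empty cell does not cause a division by zero.
            odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
            best = (cutoff, chi2, odds_ratio)
    return best
```

Running `best_cutoff` once per metabolite and symptom is what multiplies out to the tens of millions of Chi2 evaluations mentioned above, which is why the real computation is so expensive.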
The calculations were brutal with the CPU pegged for days (with some overclocking). If you are running on a cloud service, I trust you have a fat bitcoin wallet.

The result was over 190,000 significant metabolite-symptom associations across our 207 symptoms.
Using Odds Ratios
Human nature likes simplicity. “Give me just one factor to determine if a person is likely an illegal resident in the USA.” A 2025 report cites 150,000 Irish citizens residing illegally in the US; why is an Irish accent not used as a flag by a certain paramilitary group? With microbiome data, we suffer a similar bias toward simplicity, with inconvenient facts excluded.
Looking at the odds ratios in detail, we may see large numbers. We should avoid using just one number in isolation. The table for General: Fatigue is below. For example, if one sample is above the percentile for the first metabolite and below the percentile for the second, the resulting odds ratio is about 1.09 (62.60 * 0.0174), i.e. no major risk. In short, all available metabolites should be used, not just one or two.
| CompoundName | Percentile | Odds Ratio Above | Odds Ratio Below |
| Pseudouridine 5′-phosphate | 39 | 62.60 | 0.0160 |
| N-Acetylmuramic acid 6-phosphate | 37 | 57.32 | 0.0174 |
| Uridine | 43 | 74.51 | 0.0134 |
| 1-(5′-Phosphoribosyl)-5-amino-4-imidazolecarboxamide | 34 | 84.22 | 0.0119 |
| GDP-4-amino-4,6-dideoxy-alpha-D-mannose | 40 | 109.80 | 0.0091 |
| beta-L-Arabinofuranose | 29 | 0.62 | 1.6092 |
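The multiplication used in the example above generalizes to any number of metabolites. A minimal sketch follows, using the first two rows of the table. The function name and data layout are hypothetical, and multiplying odds ratios treats the metabolites as independent, which is a simplifying assumption rather than something established here.

```python
import math

# (odds ratio when above the percentile, odds ratio when below), from the table
FATIGUE_ODDS = {
    "Pseudouridine 5'-phosphate": (62.60, 0.0160),
    "N-Acetylmuramic acid 6-phosphate": (57.32, 0.0174),
}


def combined_odds_ratio(is_above, odds=FATIGUE_ODDS):
    """Multiply the applicable odds ratios by summing their logarithms.

    is_above maps a metabolite name to True (sample is above the percentile)
    or False (sample is below it). Independence between metabolites is assumed.
    """
    log_sum = 0.0
    for name, above in is_above.items():
        or_above, or_below = odds[name]
        log_sum += math.log(or_above if above else or_below)
    return math.exp(log_sum)
```

With the first metabolite above its percentile and the second below, this reproduces the 62.60 * 0.0174 figure of about 1.09. Summing logarithms rather than multiplying directly keeps the arithmetic stable when hundreds of very large and very small ratios are combined.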
There is another interesting pattern that arises. Every metabolite is most significant for being present or not present. This is a natural pattern to use for various machine learning and AI methods, for example:
- Logistic Regression: Outputs probabilities for binary decisions via sigmoid.
- Support Vector Machines (SVM): Finds hyperplanes separating binary classes.
- Decision Trees: Splits data into binary paths leading to class labels.
- Naive Bayes: Probabilistic classifier assuming feature independence for binary outcomes.
- Perceptrons: Single-layer neural nets for linearly separable binary problems.
- Random Forests: Ensemble of trees voting on binary predictions.
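Any of the methods listed above needs the samples encoded as binary features first. A minimal sketch of that encoding step is shown here. The function name, the dictionary layout, and the convention that "at or above the percentile" maps to 1 are assumptions for illustration.

```python
def binarize(percentiles, cutoffs):
    """Turn one sample's percentile ranks into 0/1 features, one per metabolite.

    percentiles: {metabolite: percentile rank of this sample}
    cutoffs:     {metabolite: most significant percentile for that metabolite}
    Features are emitted in sorted metabolite-name order so that every
    sample produces its columns in the same order.
    """
    return [1 if percentiles[name] >= cutoffs[name] else 0 for name in sorted(cutoffs)]


# Percentiles taken from the General: Fatigue table
cutoffs = {"Uridine": 43, "beta-L-Arabinofuranose": 29}
sample = {"Uridine": 80, "beta-L-Arabinofuranose": 10}  # hypothetical sample
features = binarize(sample, cutoffs)
```

The resulting rows of 0/1 values, together with a 0/1 symptom label per sample, form the input matrix that a binary classifier would be trained on.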

Clinical Use No, Research Use Yes
With a few exceptions, clinical use is limited. For most metabolites there is no easy way to explicitly and safely increase or decrease the amount.
One interesting exception was C06570: Tetracycline. If it is not seen, the odds of having Chronic Fatigue Syndrome (ME/CFS) are 22x higher. The tetracycline family of antibiotics has had significant positive effects on ME/CFS patients. Other similar metabolites include:
- Oxytetracycline (C06571)
- Chlortetracycline (C11453)
- Penicillin G (C06925)
- Streptomycin (C04282)
- Erythromycin (C06911)
In other words, detecting the absence of naturally occurring antibiotics in a patient with a matching symptom suggests specific antibiotics could be tried.
For items like probiotics, the impact is very strain specific. Very few probiotics are sold by strain (and those that are often lack data). A simple example for Limosilactobacillus reuteri is illustrated below.
