Accurate Inference from Studies on the Microbiome

My exploration of microbiome modification began with reading studies archived in the U.S. National Library of Medicine. As someone who has been developing expert systems since the 1990s, my instinct was to encode the findings from these studies as facts within an expert system—letting logic determine the optimal course of action.

In artificial intelligence (AI), an expert system is a computer system emulating the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning through bodies of knowledge, represented mainly as if–then rules rather than through conventional procedural programming code — Wikipedia

Some Difficult Discoveries

As I built the knowledge base, some major problems quickly emerged:

  • Results from different studies often contradicted one another.
  • Some results were replicated consistently, while others produced conflicting outcomes.
  • Certain findings were reported only once and never replicated.
  • There was significant uncertainty about bacterial identification due to non-standardized testing methods (see this explanation).
  • Studies tended to report results at a single taxonomy rank—often not the rank relevant to my analysis.

To address the first issue, I incorporated fuzzy logic into the expert system, allowing it to handle ambiguity and partial truths rather than rigid yes/no classifications.

Fuzzy logic is based on the observation that people make decisions based on imprecise and non-numerical information. Fuzzy models or fuzzy sets are mathematical means of representing vagueness and imprecise information (hence the term fuzzy). These models have the capability of recognising, representing, manipulating, interpreting, and using data and information that are vague and lack certainty. — Wikipedia

The second issue required a different approach. I began using bacterial association data (available here) to infer relationships between taxa. For example, if a compound influenced the genus Bifidobacterium, I could reasonably infer a similar effect for its species. This two-way relationship also works in reverse: if you want to increase Bifidobacterium overall, the species Bifidobacterium longum—a readily available probiotic—shows the strongest positive association.

Species Name | Estimate Percentage Inference
Bifidobacterium actinocoloniiforme | 18.8
Bifidobacterium adolescentis | 54.8
Bifidobacterium angulatum | 26.6
Bifidobacterium animalis | 14.5
Bifidobacterium asteroides | 40.2
Bifidobacterium avesanii | 34.2
Bifidobacterium bifidum | 25.5
Bifidobacterium bohemicum | 52.7
Bifidobacterium bombi | 57.7
Bifidobacterium boum | 64.1
Bifidobacterium breve | 52.4
Bifidobacterium catenulatum | 33.9
Bifidobacterium choerinum | 66.6
Bifidobacterium commune | 45.4
Bifidobacterium cuniculi | 21.8
Bifidobacterium dentium | 23.2
Bifidobacterium gallicum | 30.8
Bifidobacterium indicum | 52.9
Bifidobacterium lemurum | 50.4
Bifidobacterium longum | 73.7
Bifidobacterium magnum | 62.5
Bifidobacterium minimum | 27.5
Bifidobacterium mongoliense | 31.9
Bifidobacterium pseudocatenulatum | 31.2
Bifidobacterium pullorum | 30.2
Bifidobacterium ruminantium | 20.4
Bifidobacterium scardovii | 16.9
Bifidobacterium subtile | 38.8
Bifidobacterium thermacidophilum | 44.5
Bifidobacterium thermophilum | 29.8
Bifidobacterium tsurumiense | 11.7

With fuzzy logic, study findings indicating increases or decreases could be translated into numerical values. Using bacterial association data, I could then adjust those values to create a more accurate estimate of impact.
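As a rough illustration of the idea in the paragraph above, here is a minimal Python sketch. It is not the engine used on the site: the function names, the +1/-1 encoding, the averaging rule for contradictory studies, and the linear scaling by association percentage are all assumptions made for this example. The two association percentages are taken from the Bifidobacterium table.

```python
from statistics import mean

# Assumed encoding: each study finding becomes a signed value.
DIRECTION = {"increases": 1.0, "decreases": -1.0}

def fuzzy_effect(findings):
    """Average signed findings so contradictory studies pull the value toward 0."""
    return mean(DIRECTION[f] for f in findings)

def infer_species_effect(genus_effect, association_percent):
    """Scale a genus-level effect by a species' association percentage (0-100)."""
    return genus_effect * association_percent / 100.0

# Hypothetical evidence: three studies report an increase in the genus, one a decrease.
genus_effect = fuzzy_effect(["increases", "increases", "increases", "decreases"])

# Association percentages from the Bifidobacterium table.
association = {"Bifidobacterium longum": 73.7, "Bifidobacterium animalis": 14.5}
species_effect = {
    name: infer_species_effect(genus_effect, pct)
    for name, pct in association.items()
}
print(genus_effect, species_effect)
```

Under these assumptions, a finding that is mostly but not unanimously positive at the genus level becomes a weaker, species-specific estimate rather than a yes/no fact.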

Why Do the Mathematics?

Modern AI models, particularly Large Language Models (LLMs), operate differently. They generate responses by finding text that resembles the question rather than reasoning from factual relationships. LLMs do not distinguish whether data comes from a single study or many, nor do they analyze hierarchical relationships within bacterial taxonomy. As a result, taxonomic nuances—such as the difference between Lactobacillus reuteri and Limosilactobacillus reuteri—are often overlooked because the evolution of bacterial naming conventions is ignored.

At the other end of the treatment spectrum is a “whole health” influencer who might recall a single study about Bifidobacterium dentium and use it to infer a complete treatment plan. My approach is simpler: I prefer actions grounded in probability—ones that have the best odds of success.

This is a bit of the engine used on my free site for individuals, Microbiome Prescription. At present we have 14,388,582 facts over 1,882 modifiers.

Designing Clinical Trials using Metabolite Shifts

Recently on my Microbiome sites I added a table of metabolites whose presence or absence is associated with various symptoms. Technical details are given in this post: Metabolites and Symptoms. The volume of data is overwhelming.

Today I realized that a good solution is to have the page frame a question that people can paste into their favorite AI.

These metabolites have been identified statistically: their levels, estimated from the microbiome, are abnormally low with P < 0.001 for people reporting this symptom pattern (for example, ME/CFS without IBS). They are hypothesis-generating signals only and have not been validated in clinical trials. In my opinion, they highlight candidates that could be explored in formal clinical research or cautiously by an experienced physician with individual patients. None of these AI-assisted suggestions should be used without review and approval by an appropriate medical professional.

For an example, see the ME/CFS without IBS link; to look at other symptoms, go to that page and change the drop-down. For PROBIOTIC TRIALS, you want to see which of these substances are produced by bacteria.

Instead of “Are any of the following available as supplements or prescribable by MDs:” ask
“Are any of the following produced by species used as probiotics?” or, more explicitly, “According to KEGG, are any produced by Lactobacillus species?”
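To make the "frame a question" idea concrete, here is a small Python sketch of how a page might assemble the text to paste into an AI. The function name, the `mode` parameter, and the bullet layout are hypothetical; only the question wordings and the two KEGG identifiers come from this post.

```python
def frame_question(compounds, mode="supplements"):
    """Build a question to paste into an AI assistant.

    compounds: list of (kegg_id, name) pairs.
    mode: which of the question wordings to lead with (names are hypothetical).
    """
    openers = {
        "supplements": "Are any of the following available as supplements "
                       "or prescribable by MDs:",
        "probiotics": "Are any of the following produced by species used as probiotics:",
        "lactobacillus": "According to KEGG, are any of the following produced "
                         "by Lactobacillus species:",
    }
    lines = [openers[mode]]
    lines += [f"- {name} (KEGG: {kegg_id})" for kegg_id, name in compounds]
    return "\n".join(lines)

question = frame_question(
    [("C05306", "Chlorophyll a"), ("C06231", "Ectoine")],
    mode="probiotics",
)
print(question)
```

Keeping the KEGG identifier next to each name gives the AI an unambiguous key to look up, which matters for compounds with several common names.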

Perplexity Response


None of the listed KEGG compounds appear to be commonly prescribable by MDs as pharmaceuticals. A few are available or derivable as dietary supplements, primarily those related to natural pigments and vitamins found in plants or bacteria.

Available Supplements

Chlorophyll a (KEGG: C05306) is widely sold as a supplement in forms like chlorophyllin for odor control, detoxification, and general health support, often derived from alfalfa or algae.

Ectoine (KEGG: C06231) is offered in over-the-counter supplements, eye drops, and skincare for dry eye relief, skin protection, and anti-inflammatory effects.

Vitamin E Forms

Alpha-tocopherol (KEGG: C02477), beta-tocopherol (KEGG: C14152), alpha-tocotrienol (KEGG: C14153), and beta-tocotrienol (KEGG: C14154) are standard vitamin E components available in tocopherol/tocotrienol supplements for antioxidant and cardiovascular support.

Unavailable Compounds

All others, including bacteriochlorophylls (e.g., C11242, C18155, C21217), complex lipids, CoA derivatives, and specialized bacterial metabolites like archaeal lipids or mycothiol conjugates, show no evidence of commercial supplements or prescriptions; they remain research or biochemical intermediates.

Claude AI Response

Based on my research, I can provide you with information about which compounds from your list are available as supplements or prescribable by physicians. Here’s a summary:

Compounds Available as Supplements:

Chlorophyll a – Available as dietary supplements, typically sold as “chlorophyllin” (a water-soluble derivative with copper instead of magnesium). Available in liquid drops, capsules, and tablets at most health food stores and drug stores (WebMD).

Arachidonate (Arachidonic acid) – Marketed as a dietary supplement, particularly for bodybuilding and muscle growth (Wikipedia). Available over-the-counter in capsule form.

Raffinose – Not typically taken as a standalone supplement, though it occurs naturally in foods and may have prebiotic properties (Nutrients Review, Wikipedia). Used primarily in research and cryopreservation applications.

Alpha-Tocopherol (Vitamin E) – Widely available as a dietary supplement in capsule, gel capsule, and liquid forms, and can also be prescribed by doctors for certain conditions (MedlinePlus, Wikipedia). In 2023, it was the 290th most commonly prescribed medication in the United States (Wikipedia).

Beta-Tocopherol, Alpha-Tocotrienol, Beta-Tocotrienol – Available in mixed vitamin E supplements containing various tocopherols and tocotrienols.

Compounds with Medical/Prescribable Uses:

Bilirubin – Not available as a supplement. Medications like phenobarbital can be prescribed to reduce bilirubin levels in conditions like Crigler-Najjar syndrome and Gilbert’s syndrome (Medscape), but bilirubin itself is not supplemented.

D-Xylose – Used primarily as a diagnostic tool in medical testing for malabsorption.

Important Notes:

The vast majority of the compounds on your list are:

  • Biochemical intermediates in metabolic pathways
  • Research chemicals
  • Bacterial/plant-specific compounds
  • Not available as supplements or prescription medications

Only a very small number (primarily the vitamin E forms, chlorophyll, and arachidonic acid) are commercially available as dietary supplements. I’m not a medical professional, so if you’re considering any of these supplements for health purposes, please consult with a healthcare provider.

This is intended for MDs and Researchers

None of the listed KEGG compounds have robust clinical trial validation for therapeutic use. They represent microbiome-derived metabolites with statistically low levels (P < 0.001), suggesting potential biological roles worth exploring in formal trials or under MD supervision.

Key Caveat

Always consult a medical professional before trying any supplement or intervention based on these findings. AI suggestions are not medical advice.

Metabolites and Symptoms

In recent discussions with microbiome startup teams, a key question emerged: which metabolites correlate with specific symptoms? I recommended odds ratios as the optimal analytical approach, and one team is now considering integrating this into their product.

My prior analysis of KEGG-derived metabolite data from various labs revealed stronger consistency in metabolite patterns than bacterial profiles. Symptoms likely arise from adverse metabolite combinations circulating systemically—one metabolite can stem from hundreds of bacteria, and one bacterium can influence hundreds of metabolites—creating a complex web akin to an oversized Gordian knot.

Humans naturally gravitate toward simple “sound bites.” Asked for the highest odds of criminality, people might cite race, city neighborhood, or age range (with 0-5-year-olds showing near-zero risk). True predictive power comes from aggregating all statistically significant odds ratios—in this case, all reported metabolites with meaningful associations.

Using 4500 symptom-annotated samples from BiomeSight, this post explores that approach.

Computing the Odds Ratio

The process is simple:

  • Take the Biomesight samples and compute the different metabolites using the KEGG: Kyoto Encyclopedia of Genes and Genomes data.
    • This produced 2,690 different metabolites
  • Convert the amount for each metabolite to Percentile ranking. This allows the results to be applied to other pipeline data that may produce different values.
  • Compute Chi2 at each integer percentile rank (100 per metabolite) for each metabolite and each symptom with at least 30 reports (207 symptoms).
    • 2,690 x 100 x 207 = 55,683,000 Chi2 computations
  • Take the most significant vector with P < 0.001, i.e. Chi2 > 10.83.
    • Then compute the odds ratio for it
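
The steps above can be sketched for a single metabolite and a single symptom as follows. This is a minimal illustration under stated assumptions, not the production code: the function names are hypothetical, ties in the percentile ranking are broken arbitrarily, the Chi2 is the plain 2x2 statistic without continuity correction, and the 0.5 added to each cell in the odds ratio (a Haldane-Anscombe style correction) is my addition to avoid division by zero.

```python
def percentile_ranks(values):
    """Map each value to an integer percentile rank 0-99 within the sample set."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position * 100 // len(values)
    return ranks

def two_by_two(ranks, has_symptom, cut):
    """Count samples at/above and below a percentile cut, split by symptom report."""
    a = b = c = d = 0
    for rank, symptom in zip(ranks, has_symptom):
        if rank >= cut:
            a, b = (a + 1, b) if symptom else (a, b + 1)
        else:
            c, d = (c + 1, d) if symptom else (c, d + 1)
    return a, b, c, d

def chi2(a, b, c, d):
    """Chi-square statistic for a 2x2 table (1 degree of freedom, no correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to every cell (assumed correction, see lead-in)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

def best_cut(values, has_symptom, threshold=10.83):
    """Return (cut, chi2, odds_ratio) for the most significant cut, or None."""
    ranks = percentile_ranks(values)
    best = None
    for cut in range(100):
        table = two_by_two(ranks, has_symptom, cut)
        stat = chi2(*table)
        if stat > threshold and (best is None or stat > best[1]):
            best = (cut, stat, odds_ratio(*table))
    return best
```

Running `best_cut` once per metabolite and symptom pair is what multiplies out to the tens of millions of Chi2 evaluations mentioned above.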

The calculations were brutal with the CPU pegged for days (with some overclocking). If you are running on a cloud service, I trust you have a fat bitcoin wallet.

The result was 585,094 significant associations across our 207 symptoms; that is, about 1% of the 55,683,000 computations found a significant association.

Using Odds Ratios

Human nature likes simplicity. “Give me just one factor to determine if a person is likely an illegal resident in the USA.” A 2025 report states that there are 150,000 Irish citizens living illegally in the US; why is an Irish accent not used as a flag by a certain paramilitary group? With the microbiome data, we suffer a similar bias for simplicity with exclusion of inconvenient facts.

Looking at the odds ratios in detail, we may see large numbers. We should avoid using just one number in isolation. The table for General: Fatigue is below. For example, for the first two metabolites, if one is above and one is below, the resulting odds ratio is about 1.09 (62.60 * 0.0174), i.e. no major risk. In short, all available metabolites should be used, not just one or two.

Compound Name | Percentile | Odds Ratio Above | Odds Ratio Below
Pseudouridine 5′-phosphate | 39 | 62.60 | 0.0160
N-Acetylmuramic acid 6-phosphate | 37 | 57.32 | 0.0174
Uridine | 43 | 74.51 | 0.0134
1-(5′-Phosphoribosyl)-5-amino-4-imidazolecarboxamide | 34 | 84.22 | 0.0119
GDP-4-amino-4,6-dideoxy-alpha-D-mannose | 40 | 109.80 | 0.0091
beta-L-Arabinofuranose | 29 | 0.62 | 1.6092
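
The arithmetic in the fatigue example (62.60 * 0.0174) can be written as a small Python sketch. The dictionary and function names are hypothetical, the compound names use a plain ASCII apostrophe, and multiplying the ratios treats the metabolites as independent, which is a simplifying assumption rather than something established here.

```python
import math

# (odds ratio above, odds ratio below) for the first two rows of the fatigue table.
FATIGUE_ODDS = {
    "Pseudouridine 5'-phosphate": (62.60, 0.0160),
    "N-Acetylmuramic acid 6-phosphate": (57.32, 0.0174),
}

def combined_odds(observations, table=FATIGUE_ODDS):
    """Multiply the odds ratio matching each metabolite's above/below state.

    observations: dict of compound name -> True if the sample is above the
    percentile cut, False if below. Logs are summed to keep long products stable.
    """
    log_total = 0.0
    for name, is_above in observations.items():
        above, below = table[name]
        log_total += math.log(above if is_above else below)
    return math.exp(log_total)

mixed = combined_odds({
    "Pseudouridine 5'-phosphate": True,         # above its cut
    "N-Acetylmuramic acid 6-phosphate": False,  # below its cut
})
print(round(mixed, 2))  # 1.09
```

With only these two metabolites, one large ratio is almost exactly cancelled by the other, which is why a single number in isolation can mislead.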

Another interesting pattern arises: every metabolite is most significant for being present or not present. This is a natural pattern to use for various machine learning and AI methods, for example:

  • Logistic Regression: Outputs probabilities for binary decisions via sigmoid.
  • Support Vector Machines (SVM): Finds hyperplanes separating binary classes.
  • Decision Trees: Splits data into binary paths leading to class labels.
  • Naive Bayes: Probabilistic classifier assuming feature independence for binary outcomes.
  • Perceptrons: Single-layer neural nets for linearly separable binary problems.
  • Random Forests: Ensemble of trees voting on binary predictions.
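
As one hedged illustration of how such binary features could feed a method from the list above, here is a tiny Bernoulli Naive Bayes classifier in plain Python. The class and function names are hypothetical, the training rows in the usage note are invented toy values, and Laplace smoothing is an assumption chosen to keep probabilities away from 0 and 1. A real analysis would more likely use an established library.

```python
import math

def binarize(percentiles, cuts):
    """Turn percentile ranks into 0/1 features: 1 when at or above the metabolite's cut."""
    return [1 if p >= c else 0 for p, c in zip(percentiles, cuts)]

class BernoulliNaiveBayes:
    """Minimal Naive Bayes for 0/1 features and hashable class labels."""

    def fit(self, rows, labels):
        self.log_prior = {}
        self.prob_one = {}
        for label in set(labels):
            members = [r for r, y in zip(rows, labels) if y == label]
            self.log_prior[label] = math.log(len(members) / len(rows))
            # Laplace smoothing: (count of ones + 1) / (class size + 2).
            self.prob_one[label] = [
                (sum(column) + 1) / (len(members) + 2) for column in zip(*members)
            ]
        return self

    def predict(self, row):
        def score(label):
            total = self.log_prior[label]
            for x, p in zip(row, self.prob_one[label]):
                total += math.log(p if x else 1 - p)
            return total
        return max(self.log_prior, key=score)
```

For example, with the cuts 39 and 37 from the fatigue table and four made-up samples, `BernoulliNaiveBayes().fit(rows, labels).predict(binarize((50, 50), [39, 37]))` returns the label of the samples that sat above both cuts.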

Clinical Use No, Research Use Yes

With a few exceptions, clinical use is limited. For most metabolites there are no easy ways to explicitly and safely increase or decrease them.

One interesting exception was for C06570: Tetracycline. If this metabolite is not seen, the odds of having Chronic Fatigue Syndrome (ME/CFS) are 22x higher. The tetracycline family of antibiotics has had significant positive effects on ME/CFS patients. Other similar metabolites include:

  • Oxytetracycline (C06571)
  • Chlortetracycline (C11453)
  • Penicillin G (C06925)
  • Streptomycin (C04282)
  • Erythromycin (C06911)

In other words, detecting the absence of naturally occurring antibiotics in a patient with a matching symptom suggests specific antibiotics could be tried.

For items like probiotics, the impact is very strain specific. Very few probiotics are sold by strain (and those that are often lack data). A simple example for Limosilactobacillus reuteri is illustrated below.

From prediction to function using evolutionary genomics: human-specific ecotypes of Lactobacillus reuteri have diverse probiotic functions [2014].

Technical Note

For the bacteria odds ratios I used a “double odds ratio”. A double odds ratio means:

  • Determining Odds ratio for Detected or Not Detected.
  • Determining the Optimal Odds Ratio when it is Detected.

This was done because I was focused on computing suggestions on individual samples.
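
One possible reading of the two stages is sketched below in Python. This is a guess at the shape of the computation, not the actual implementation: the function names are hypothetical, "optimal" is assumed to mean the threshold whose odds ratio is furthest from 1, an amount of 0 is assumed to mean "not detected", and the 0.5 added to each cell is my own guard against division by zero.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table, with 0.5 added to each cell (assumed correction)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

def double_odds_ratio(amounts, has_symptom):
    """Return (detected_or, best_threshold, best_or) for one bacterium and one symptom."""
    pairs = list(zip(amounts, has_symptom))

    # Stage 1: detected versus not detected.
    a = sum(1 for x, s in pairs if x > 0 and s)
    b = sum(1 for x, s in pairs if x > 0 and not s)
    c = sum(1 for x, s in pairs if x == 0 and s)
    d = sum(1 for x, s in pairs if x == 0 and not s)
    detected_or = odds_ratio(a, b, c, d)

    # Stage 2: among detected samples only, scan thresholds on the amount.
    detected = [(x, s) for x, s in pairs if x > 0]
    best_threshold, best_or = None, 1.0
    for threshold in sorted({x for x, _ in detected}):
        hi_s = sum(1 for x, s in detected if x >= threshold and s)
        hi_n = sum(1 for x, s in detected if x >= threshold and not s)
        lo_s = sum(1 for x, s in detected if x < threshold and s)
        lo_n = sum(1 for x, s in detected if x < threshold and not s)
        ratio = odds_ratio(hi_s, hi_n, lo_s, lo_n)
        if abs(math.log(ratio)) > abs(math.log(best_or)):
            best_threshold, best_or = threshold, ratio
    return detected_or, best_threshold, best_or
```

Separating the two stages lets a single sample be scored first on whether the bacterium is present at all, and then on where its amount falls among samples that have it.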

These are coarse (single) odds tables because we are not evaluating individual samples but a collection of samples. It is very possible to do a double odds ratio for metabolites, but at present it appears to have low utility because of the extremely low number of possible interventions. If interested, contact me at Research at MicrobiomePrescription (US decimal point) com