This post describes the mechanics behind the Microbiome Prescription web site. It walks through the process, along with the questions that impact the AI algorithms. A critical starting point is that I am a statistician: with everything there is uncertainty. We know nothing for certain. Everything is odds or probability, with the sole exception of pure mathematics.
Base Hypothesis
The microbiome is a contributor to many medical conditions and the effectiveness of some treatments. Altering the microbiome may change the symptoms or the symptom severity of these conditions, and the effectiveness of some treatments.
Ken Lassesen, M.Sc.
In current medical practice a variety of supplements, antibiotics and foods are well known to alter medical conditions. Many of these trace back millennia. Current studies show that these substances happen to alter bacteria that are also associated with these conditions. The term associated is critical because it implies probability.
Identifying the Bacteria is not Certain
For any study that finds bacteria associated with a condition, reprocessing the same samples (for example with different equipment, reagents or software) may produce a different set of bacteria associated with the condition. We do not know with certainty that the bacteria named are correct. For the literature, see this earlier post: The taxonomy nightmare — Episode II
Suggestion: Use only studies that use specific equipment, reagents and software (including version).
This leads to a huge number of studies being excluded.
Step 1: Building the List of Bacteria Associated to the Condition
The gold-standard approach is to extract data from peer-reviewed studies on the US National Library of Medicine and equivalent resources. This encounters the first challenge. Studies typically report that the mean of the control group and the mean of the study group are statistically different. The study may indicate that one mean is higher or lower than the other. The reality is that some subjects in the control group may have values at the study group's mean, or even further away from the control group's mean. This makes it challenging to apply the results to any individual. Given an individual’s value, we end up having to determine the probability of the individual matching the control group or the study group.
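As a rough illustration of "the probability of the individual matching the control group or the study group", here is a minimal Python sketch. It assumes that both groups are normally distributed, that a study reports a mean and standard deviation for each, and that the two groups are equally likely beforehand. None of these assumptions come from the studies themselves, the function names are hypothetical, and the numbers are made up.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def prob_study_group(x, control_mean, control_sd, study_mean, study_sd,
                     prior_study=0.5):
    """Posterior probability (Bayes' rule) that value x belongs to the study group.

    Assumes normal distributions for both groups; prior_study is the assumed
    probability of being in the study group before looking at x.
    """
    like_study = normal_pdf(x, study_mean, study_sd) * prior_study
    like_control = normal_pdf(x, control_mean, control_sd) * (1.0 - prior_study)
    return like_study / (like_study + like_control)

# Hypothetical numbers: control mean 4.0, study mean 6.0, both with SD 1.5.
for value in (3.0, 5.0, 7.0):
    print(value, round(prob_study_group(value, 4.0, 1.5, 6.0, 1.5), 3))
```

A value halfway between the two means is a coin toss under these assumptions. Only values well past the study mean give a high probability of matching the study group.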
A few studies have reported that the study group can be subdivided into two groups (typically by clustering of microbiome patterns), with the result that one study subgroup has a mean below the control with statistical significance, and the other study subgroup has a mean above the control with statistical significance.
From these studies we could build a table like the one shown below
Condition | Bacteria | Direction | Statistical Metadata |
IBS | Blautia | {H,L,B} | μ,σ, α, β, r, ρ, η2 |
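One possible way to hold rows of such a table in code is sketched below in Python. The record layout, the `study_id` field and the reading of the direction codes (H = higher, L = lower, B = both, as in the subgroup case) are assumptions for illustration, and the rows are made up rather than taken from any study.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Association:
    study_id: str    # hypothetical identifier for the source study
    condition: str
    bacteria: str
    direction: str   # assumed coding: "H" higher, "L" lower, "B" both

def tally_directions(rows):
    """Count the reported directions for each (condition, bacteria) pair."""
    tallies = defaultdict(Counter)
    for row in rows:
        tallies[(row.condition, row.bacteria)][row.direction] += 1
    return tallies

# Made-up rows for illustration only.
rows = [
    Association("study-1", "IBS", "Blautia", "H"),
    Association("study-2", "IBS", "Blautia", "H"),
    Association("study-3", "IBS", "Blautia", "L"),
]
tallies = tally_directions(rows)
print(dict(tallies[("IBS", "Blautia")]))
```

The statistical metadata column is left out here. In practice each record would also carry whatever statistics the study reported.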
When we find multiple studies reporting shifts for a condition, it is not unusual to get contrary results. A few examples of study pairs where this can happen:
- Study 1: Done in China; Study 2: Done in the US
- Study 1: Done in the US with people with a specific additional condition; Study 2: Done in the US without filtering for additional conditions
- Study 1: Done in Israel with Ashkenazi Jews; Study 2: Done in Israel with Sephardic Jews
DNA, diet and living environment all play a role in the microbiome and thus associations to conditions.
Deriving Association Probabilities
Let us assume that we have built up a database and then look at the results. Suppose we have 10 studies and look solely at Direction, with the following results:
- All ten report Higher is seen for Blautia. It seems that we could assign a probability of 1.0 safely
- If eight report higher and two fail to report any difference, what is the probability then? A naïve 0.8 is unlikely to be an intelligent choice.
- If six report higher and three report lower, what is the probability then?
- If only one of the ten studies reports a shift for the bacteria, and reports it lower, what is the probability? A naïve 1.0 is unlikely to be an intelligent choice, because there is a significant risk that this is a false result.
When we try to apply this to an arbitrary individual (say a Canadian Inuit living traditionally), the estimation of probability becomes more complex.
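One textbook way to avoid the naïve answers above is additive (Laplace) smoothing of the study counts. The Python sketch below is only an illustration of that idea, not the method used on the site. The three outcome categories, the function name and the pseudo-count of 1 per outcome are all assumptions.

```python
def direction_probabilities(higher, lower, no_difference, prior=1.0):
    """Smoothed probability of each outcome from study counts.

    Adds `prior` pseudo-counts to every outcome, so a handful of agreeing
    studies never yields a probability of exactly 1.0.
    """
    counts = {"higher": higher, "lower": lower, "no_difference": no_difference}
    total = sum(counts.values()) + prior * len(counts)
    return {name: (count + prior) / total for name, count in counts.items()}

# The four scenarios above as (higher, lower, no_difference) counts.
for counts in [(10, 0, 0), (8, 0, 2), (6, 3, 0), (0, 1, 0)]:
    probs = direction_probabilities(*counts)
    print(counts, {name: round(p, 2) for name, p in probs.items()})
```

With this choice, ten agreeing studies give a probability below 1.0, and a single study reporting lower gives only an even chance. This sketch does nothing about differences between study populations, which is the harder problem.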
Step 2: Identifying the Impact of Food, Supplements and Drugs
This has many of the same issues as discussed above. We will use the term modifier for Food, Supplements and Drugs. The gold-standard approach is to extract data from peer-reviewed studies on the US National Library of Medicine and equivalent resources. This encounters the first challenge. Studies typically report on different animal species. At present, the majority of studies of the impact of modifiers on the microbiome are animal studies (rodents, dogs, cats, horses, cattle).
These veterinary studies are likely of higher quality and reliability than human studies for several reasons:
- Genetic variance of the animals is much less than with humans
- Diet is far more strictly controlled than with humans. Typically, the modifier being studied is the sole difference between the study group and the control group.
There are people who will reject these studies because they were not done on humans. That is an ideological excuse. As a statistician, I look for probability and not certainty or absolutism. It is highly probable that bacteria will react to modifiers in the same way, regardless of the species origin of the bacteria.
When we find multiple studies reporting shifts for a modifier, it is not unusual to get contrary results. The impact of a grain on dogs is likely different from that seen on cattle. The impacts may mirror the impacts on humans who are heavy meat eaters (for example, Americans) and on those who are vegetarians (for example, East Indians).
We can proceed in the same way as above, building up a data table
Modifier | Bacteria | Direction | Statistical Metadata |
Vegetables | Blautia | {H,L,B} | μ,σ, α, β, r, ρ, η2 |
And then proceed to Deriving Association Probabilities as shown above.
Step 3: Combining The Data
With the above data, we actually drift into the world of certainty! We have a mathematical optimization problem. This is my home turf (having a Master's in Operations Research with several Ph.D. courses in the same area). Our goal is to select the modifiers that should be included (or excluded if currently being taken) to minimize the deviation from the conceptual control means across all bacteria.
Creating the objective equation is unfortunately complex. Do we assign weights in this objective equation based on the percentage of the microbiome, on being present, or on the relative shift from the control group? For some bacteria like Yersinia pestis, being present is a reasonable choice. For other bacteria that produce d-lactic acid, the percentage may be a better choice. For others, the relative shift (2x higher) or even the z-score (for example 5.3) may be a more appropriate weight.
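The candidate weights named above can be computed side by side for one bacterium. The Python sketch below assumes we have the person's percentage plus a control mean and standard deviation. The function name and the example numbers are hypothetical.

```python
def candidate_weights(percentage, control_mean, control_sd):
    """Return the four candidate weights discussed above for one bacterium.

    percentage   : the person's share of the microbiome for this bacterium
    control_mean : mean percentage in the control group
    control_sd   : standard deviation in the control group
    """
    return {
        "present": 1.0 if percentage > 0 else 0.0,
        "percentage": percentage,
        "relative_shift": percentage / control_mean,
        "z_score": (percentage - control_mean) / control_sd,
    }

# Made-up values: 2.0% in the person against a control of 1.0% (SD 0.5).
print(candidate_weights(2.0, 1.0, 0.5))
```

Which of these goes into the objective equation remains a per-bacterium judgment call. The code only makes the options concrete.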
We have additional coefficient challenges. If a person consumes 250 mg of barley a day, will the impact of 500 mg of barley a day on the microbiome be twice as much? It is very unlikely that the impact is linear. Some of the studies above tried multiple dosages on the study population, and thus a dosage-to-impact equation could be constructed for some modifiers. In a few cases, the impact may be influenced by other bacteria present.
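To illustrate why doubling a dose need not double the impact, here is a Python sketch of one common non-linear shape, a saturating (hyperbolic) dose-response curve. The curve form and both parameters are assumptions picked for illustration. A real dosage-to-impact equation would have to be fitted to the dosages actually tested in the studies.

```python
def saturating_effect(dose_mg, max_effect, half_dose_mg):
    """Hyperbolic dose-response: the effect rises with dose but levels off.

    max_effect   : assumed ceiling of the effect
    half_dose_mg : assumed dose at which half of max_effect is reached
    """
    return max_effect * dose_mg / (half_dose_mg + dose_mg)

# Hypothetical parameters chosen only to show the shape.
low = saturating_effect(250, max_effect=1.0, half_dose_mg=250)
high = saturating_effect(500, max_effect=1.0, half_dose_mg=250)
print(low, high, high / low)
```

Under these made-up parameters, doubling the dose from 250 mg to 500 mg raises the effect by about a third, not by a factor of two.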
Once the weights are determined, then the problem can be handed to any good mathematician to solve.
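For a feel of what such a problem looks like once weights exist, below is a deliberately tiny Python sketch. It assumes that each bacterium's deviation is a z-score, that each modifier shifts z-scores by a fixed amount, and that the shifts add up linearly, which the dosage discussion above already calls unlikely. All names and numbers are made up, and brute force over subsets only works for a handful of modifiers.

```python
from itertools import combinations

def total_deviation(z_scores, weights, effects, chosen):
    """Weighted sum of absolute z-scores after applying the chosen modifiers."""
    total = 0.0
    for taxon, z in z_scores.items():
        shifted = z + sum(effects[m].get(taxon, 0.0) for m in chosen)
        total += weights.get(taxon, 1.0) * abs(shifted)
    return total

def best_modifiers(z_scores, weights, effects):
    """Try every subset of modifiers; return (subset, score) with the lowest score."""
    names = sorted(effects)
    best = ((), total_deviation(z_scores, weights, effects, ()))
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            score = total_deviation(z_scores, weights, effects, subset)
            if score < best[1]:
                best = (subset, score)
    return best

# Made-up numbers for illustration only.
z_scores = {"Blautia": 2.0, "taxon_b": -1.5}
weights = {"Blautia": 1.0, "taxon_b": 1.0}
effects = {
    "modifier_a": {"Blautia": -1.5},
    "modifier_b": {"taxon_b": 1.0, "Blautia": 0.5},
    "modifier_c": {"Blautia": 1.0},
}
print(best_modifiers(z_scores, weights, effects))
```

Here `modifier_c` is left out because it pushes Blautia further from the control. A realistic model with thousands of modifiers would need a proper solver, not enumeration.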
Step 4: Cross Validation
At a macro level, you should run the above optimization problem and see what is suggested. The list of things suggested and to be avoided should then be compared against the literature to see if there is agreement. If there is good agreement, then you have likely made reasonable choices for the many issues cited above. If there is not agreement, then you may need to re-examine assumptions.
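The comparison against the literature can be reduced to a simple agreement rate. The Python sketch below assumes two plain lists: the model's ranked suggestions and the modifiers that studies report as effective for the condition. The function name and the items are hypothetical.

```python
def agreement_rate(suggested, supported_by_literature, top_n=None):
    """Fraction of the top suggestions that the literature also reports as effective."""
    top = list(suggested)[:top_n] if top_n else list(suggested)
    if not top:
        return 0.0
    supported = set(supported_by_literature)
    return sum(1 for item in top if item in supported) / len(top)

# Made-up example: three of the four top suggestions appear in the literature.
suggested = ["modifier_a", "modifier_b", "modifier_c", "modifier_d"]
literature = ["modifier_a", "modifier_b", "modifier_d", "modifier_z"]
print(agreement_rate(suggested, literature, top_n=4))
```

What counts as "good agreement" is itself a judgment call. The same function can be applied to the avoid list by passing modifiers that studies report as harmful.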
The next step is individual application. You apply the same mathematics to an individual’s microbiome and generate suggestions. Ideally, the individual will be able to follow some of the suggestions. After 2-3 months the individual’s microbiome is retested to evaluate whether the desired shift has occurred.
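One simple way to judge the retest is to check whether the sample has, on average, moved closer to the control values. The Python sketch below uses the mean absolute z-score as that yardstick. The metric, the function names and the numbers are all assumptions for illustration.

```python
def mean_abs_z(sample, control):
    """Average absolute z-score of a sample against control (mean, sd) pairs."""
    scores = [
        abs((sample[taxon] - mean) / sd)
        for taxon, (mean, sd) in control.items()
        if taxon in sample
    ]
    return sum(scores) / len(scores) if scores else 0.0

def shifted_toward_control(before, after, control):
    """True if the retest is, on average, closer to the control means."""
    return mean_abs_z(after, control) < mean_abs_z(before, control)

# Made-up values: percentages before and after, control as (mean, sd).
control = {"Blautia": (4.0, 1.0), "taxon_b": (2.0, 0.5)}
before = {"Blautia": 7.0, "taxon_b": 0.5}
after = {"Blautia": 5.0, "taxon_b": 1.5}
print(mean_abs_z(before, control), mean_abs_z(after, control))
print(shifted_toward_control(before, after, control))
```

A single retest is still only one observation, so a result like this is evidence rather than proof of the desired shift.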
The model was built using Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) for tuning. I recently did a Cross Validation of AI Suggestions for Nonalcoholic Fatty Liver Disease. Validation requires a lot of studies trying different things on a condition.
Implement your own Model
I do not have the ultimate model. I have one that has evolved to produce reasonable results consistently based on a series of “arbitrary decisions” on how to handle the many issues cited above. I encourage readers and researchers to develop their own model.
Expedited Drug Discovery
A drug that alters bacteria in the desired direction is likely to have positive results in a clinical human study. Testing a drug with animals is faster, cheaper and more controlled. If, by diverse means, the animal can be induced to have the desired target microbiome, then the validity of the drug becomes more probable.
Personal Experience with my Model
The Microbiome Prescription site uses an evolution of the above process, which through iterations has resulted in good cross validation of suggestions. In this validation, often 75% of the top recommended prescription modifiers have been shown in studies to be effective. The same holds for non-prescription supplements. This has been done only for a few conditions (specifically, ones with a rich collection of treatments tried and no “known, accepted treatment”).
I recently spent hours doing a cross validation for a condition that I was unfamiliar with, and the results were good: Cross Validation of AI Suggestions for Nonalcoholic Fatty Liver Disease
So far individual application has been successful. Success is defined objectively as reduced shifts from abnormal levels, and subjectively as reduced symptom severity or loss of symptoms.
The details of the algorithms in my model are proprietary and deemed trade secrets. Why? They are mathematical techniques and thus cannot be copyrighted or patented. Disclosing the details will likely result in people with different opinions becoming very vocal. I described the issues and challenges above so people are free to build their own models off this pattern.
The site is free for personal use.
Build your own AI Model?
I would suggest using SWI-Prolog. Prolog is a language designed to compute with logic rather than with numbers. The following is actual program code. It is almost English-like, and Prolog can resolve what you should take or not take.
- increases(modifier_a, bacteria_31979).
- decreases(modifier_x, bacteria_1236).
- low(person, bacteria_1239).
- high(person, bacteria_1234).
- helps(Person, Modifier) :- high(Person, Taxon), decreases(Modifier, Taxon).
- helps(Person, Modifier) :- low(Person, Taxon), increases(Modifier, Taxon).
- hurts(Person, Modifier) :- low(Person, Taxon), decreases(Modifier, Taxon).
- hurts(Person, Modifier) :- high(Person, Taxon), increases(Modifier, Taxon).
- take(Person, Modifier) :- helps(Person, Modifier), \+ hurts(Person, Modifier).
- contradict(Modifier, Taxon) :- decreases(Modifier, Taxon), increases(Modifier, Taxon).
The “take” rule above says: get the list of items that help, then remove all items that hurt, i.e. keep only items without contradictions. Writing Microbiome Prescription explicitly in SWI-Prolog takes about 15 million statements.