Some statistics related to Symptom Forecasting

I recently finished building out a new symptom forecasting algorithm. The ultimate goal is not to forecast symptoms, but to identify the bacteria associated with each symptom and a numeric value for how much each bacterium contributes.

The results are shown below. Honestly, they far exceeded my expectation of 70-85% accuracy. An odds ratio had to reach P < 0.01 to be included. All of the input data used is available for download.

Retail Test	Forecast and Declared Symptom Agree	Declared Symptom Failed Forecast (Odds Ratio < 1)	Percentage Accuracy	Total Samples	Symptoms Evaluated
Biomesight	43,085	646	98.5%	4,436	363
Ombre	17,492	63	99.996%	1,319	302
uBiome	11,856	26	99.998%	791	202
Thorne	1,643	11	99.99%	253	23
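As a quick check on the arithmetic, here is a minimal sketch that recomputes accuracy from the counts in the table. It assumes accuracy means agreements divided by agreements plus failed forecasts; that definition is my reading and is not stated in the post. Under it, the Biomesight row comes out at about 98.5%, matching the table, while the other rows come out a little lower than the percentages listed, which suggests those may have been calculated on a different basis.

```python
# Counts copied from the results table: (agreements, failed forecasts).
results = {
    "Biomesight": (43085, 646),
    "Ombre": (17492, 63),
    "uBiome": (11856, 26),
    "Thorne": (1643, 11),
}


def accuracy_pct(agreed: int, failed: int) -> float:
    """Assumed definition: share of declared symptoms that were forecast."""
    return 100.0 * agreed / (agreed + failed)


for lab, (agreed, failed) in results.items():
    print(f"{lab}: {accuracy_pct(agreed, failed):.2f}%")
```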

These results were obtained by using odds ratios and accepting the complexities of the microbiome. A simple example of what not to do is using the average to decide whether a level is too high or too low. This is wrong in so many ways, especially given the high skewness seen in the microbiome.
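To make the skewness point concrete, here is a small sketch with made-up abundance values (they are hypothetical, not taken from any of the data sets). With a long right tail, the mean sits far above the median, so a "below average" rule would flag most samples as low.

```python
import statistics

# Hypothetical relative abundances (%) of one bacterium across ten samples.
# Most values are small; two are very large, giving a long right tail.
abundance = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 12.0, 25.0]

mean = statistics.mean(abundance)
median = statistics.median(abundance)
below_mean = sum(1 for a in abundance if a < mean)

print(f"mean={mean:.2f} median={median:.2f}")
print(f"{below_mean} of {len(abundance)} samples are below the mean")
```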

There is no one threshold that suits all

To assume a single value applies to all symptoms is naive ideological thinking. You can do it, but only with a significant drop in forecast accuracy (I’ve tried it). The following charts illustrate the patterns discovered.

FCB group for Ombre Data

The chart below illustrates that the best threshold for the odds ratio varies greatly by symptom, with about 45% being the most common value. The average over all samples is 30.8%.
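One way to picture a per-symptom threshold search is sketched below. This is not the post's actual algorithm. It assumes the thresholds are percentile cutoffs, that "low" means below the cutoff, and that the best cutoff is the one giving the largest odds ratio. The function names, the 0.5 cell correction, and the sample data are all illustrative assumptions.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table, adding 0.5 to each cell to avoid division by zero."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def best_percentile_cutoff(samples, percentiles=range(10, 91, 5)):
    """samples: list of (abundance, has_symptom). Returns (percentile, odds_ratio)."""
    ordered = sorted(abundance for abundance, _ in samples)
    best = None
    for pct in percentiles:
        # Abundance value sitting at this percentile of the sorted samples.
        cutoff = ordered[min(len(ordered) - 1, len(ordered) * pct // 100)]
        a = sum(1 for x, s in samples if x < cutoff and s)       # low, symptom
        b = sum(1 for x, s in samples if x < cutoff and not s)   # low, no symptom
        c = sum(1 for x, s in samples if x >= cutoff and s)      # not low, symptom
        d = sum(1 for x, s in samples if x >= cutoff and not s)  # not low, no symptom
        ratio = odds_ratio(a, b, c, d)
        if best is None or ratio > best[1]:
            best = (pct, ratio)
    return best


# Hypothetical data: abundances 1..20, symptom reported mostly at low abundance.
samples = [(i, i <= 6 or i == 15) for i in range(1, 21)]
best = best_percentile_cutoff(samples)
print(best)
```

Running the same search separately for each bacterium-symptom pair is what lets the cutoff differ from one symptom to the next.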

Bacteroides for Biomesight

We see the same behavior elsewhere, with an average of 26%.

Clostridia for uBiome

Similar to the above, with an average of 61.4%.

What do the resulting Odds Ratios look like?

The charts below are for the Ombre data sets. The numbers are log(Odds Ratio), with the actual odds being pretty high (to put it mildly).

The next chart includes the cases that failed to forecast. Their typical odds ratio is much lower than the others shown, so this weakness in prediction is not unexpected. It illustrates that odds ratios are indeed odds, not certainties.
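The relationship between log(Odds Ratio) and the "Odds Ratio < 1" failure condition can be sketched as follows. How the algorithm actually combines the odds ratios of several bacteria is not described in the post. The version below assumes they are multiplied (equivalently, their natural logs are summed) and that a symptom is forecast when the combined value exceeds 1. Both function names and the input values are hypothetical.

```python
import math


def combined_odds_ratio(odds_ratios):
    """Multiply odds ratios by summing their natural logs (assumed combination rule)."""
    return math.exp(sum(math.log(r) for r in odds_ratios))


def forecast(odds_ratios):
    """Forecast the symptom when the combined odds ratio is above 1 (log above 0)."""
    return combined_odds_ratio(odds_ratios) > 1.0


# Hypothetical odds ratios for the bacteria matched in one sample.
print(forecast([4.0, 0.5, 2.0]))  # combined odds ratio is above 1
print(forecast([0.4, 1.5]))       # combined odds ratio is below 1
```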

Most Frequent Bacteria Used for Odds Ratio

It is interesting to note that Lactobacillus is not among the top bacteria (despite its popularity with some). Lactobacillaceae shows up at #91, Lactobacillales at #127, and individual Lactobacillus species start around #151.

Tax name	Rank
Bifidobacterium	genus
Bifidobacteriales	order
Bifidobacteriaceae	family
Actinomycetota	phylum
Actinomycetes	class
Sutterella stercoricanis	species
Listeriaceae	family
Caloramator fervidus	species
Devosiaceae	family
Marvinbryantia	genus
Oscillibacter valericigenes	species
Devosia	genus
Oscillospiraceae	family
Bacteroides	genus
Bacteroides cellulosilyticus	species
Bacteroides uniformis	species
Anaerotruncus colihominis	species
Coriobacteriaceae	family
Anaerotruncus	genus
Faecalibacterium	genus
Lachnobacterium	genus
Paraprevotella xylaniphila	species
Mycoplasmatota	phylum
Campylobacter ureolyticus	species
Mollicutes	class
Bacteroidaceae	family
Holdemania massiliensis	species
Metamycoplasmataceae	family
Desulfitobacteriaceae	family
Ruminococcaceae	family

Bifidobacterium for Biomesight

The average is 0.93, which presents a stark contrast to the numbers below.

The odds ratios show significantly increased risk for most symptoms when Bifidobacterium is low:

Symptom	Odds Ratio with low amount
Immune Manifestations: Hyperphagia (abnormal hunger or desire to eat)	14.65
Comorbid: Reactive Hypoglycemia	12.89
Comorbid: Panic Attacks	9.51
Neurological: Neuropathy	7.72
Lactose intolerance	5.88
Neuroendocrine: Sweating hands	4.4
Comorbid: Constipation and Explosions (not diarrhea)	4.14
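For readers unfamiliar with how figures like those in the table are produced, here is a minimal sketch of an odds ratio computed from a 2x2 table, together with a two-sided Fisher exact test for the P < 0.01 filter. The post does not say which significance test was used, so Fisher's test is an assumption, and the counts are invented for illustration.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for [[a, b], [c, d]]; undefined if b or c is zero."""
    return (a * d) / (b * c)


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical counts: rows = low / not low bacterium, columns = symptom / no symptom.
a, b, c, d = 30, 70, 10, 190
ratio = odds_ratio(a, b, c, d)
p_value = fisher_exact_p(a, b, c, d)
print(f"odds ratio={ratio:.2f}, keep={p_value < 0.01}")
```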

Summary

Odds ratios are designed to have predictive power. An optimized algorithm using odds ratios appears to have awesome power. The challenges with odds ratios are simple:

Having sufficient data to obtain odds ratios with P < 0.01 (or better still, P < 0.0001).
Addressing each bacteria-symptom pair individually, rather than falling back on a simplistic "one bacteria level fits all" rule.
Computational requirements: after a lot of performance tuning, my computations took over 12 hours on a well-equipped Microsoft SQL Server machine with 64 GB of memory. Other tech stacks may take considerably longer; a few (like multithreaded C++) may be a lot faster.

I attribute my success in doing the above to two factors: an M.Sc. in Operations Research, where optimization techniques rule, plus extensive experience with SQL Server performance (including writing multiple white papers for Microsoft). Last, I do not follow current normative beliefs on how to approach this issue; I follow the numbers (i.e. statistics).

As a historical note, building out this model caused flashbacks to programming in Simula and other modelling languages back in the early 1980s. There is a “zen” to modelling.

Microbiome Prescription Blog

A site exploring the microbiome, what it affects and how to manipulate it.
