Detecting the clearly abnormal

The page "Your microbiome is unique to you" illustrates how challenging it is to compare microbiomes between people. The one probable exception is identical twins who eat the same diet and live the same lifestyle.

I often read, “Thryve says my bacteria X is too high/low”. That determination is usually made by comparing against averages. With the microbiome, that is not a good approach.

Determining what is or is not a significant shift in the microbiome is challenging. My original algorithm was based on reverse engineering the normative values that uBiome appears to be working from. This was a quick and dirty solution, the best that I could do a year ago.

I work with computer systems. Microbiomes and complex computer systems tend to share similar challenges: their data are not normally distributed and are often long-tailed (skewed), which means that averages and standard deviations often produce poor results for detecting abnormal values.
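To make that concrete, here is a small sketch using made-up relative abundances (these numbers are invented for illustration, not taken from uploaded samples). It shows how a long tail drags the mean well above the median, and how a "mean minus two standard deviations" cutoff can land below zero, so it could never flag a value as too low.

```python
import statistics

# Hypothetical relative abundances of one taxon across 13 samples
# (1.0 = 100%). Invented numbers: mostly small, with a long right tail.
abundances = [0.001, 0.002, 0.002, 0.003, 0.003, 0.004, 0.005,
              0.006, 0.008, 0.010, 0.015, 0.12, 0.30]

mean = statistics.mean(abundances)
sd = statistics.stdev(abundances)
median = statistics.median(abundances)

# Cutoffs that would be reasonable if the data were roughly normal.
low_cutoff = mean - 2 * sd
high_cutoff = mean + 2 * sd

# How many samples sit below the "average"?
below_mean = sum(1 for a in abundances if a < mean)

print(f"mean={mean:.4f} median={median:.4f} sd={sd:.4f}")
print(f"low cutoff={low_cutoff:.4f} high cutoff={high_cutoff:.4f}")
print(f"{below_mean} of {len(abundances)} samples are below the mean")
```

With data like this, most samples are "below average" and the low cutoff is negative, which is not a meaningful threshold for an abundance.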

We will Box Up the Issue!

A common step in filtering data for machine learning and similar work is excluding outliers. Here, we are actually interested in finding the outliers! This is often done with boxplots. An example for some phylum-level bacteria is shown below (note: 1.0 = 100%). It uses some 500+ uploaded microbiome samples.

Outliers are the round circles.
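For readers who want to see what a boxplot is doing when it draws those circles, here is a minimal sketch. It assumes the conventional rule used by most plotting tools (points more than 1.5 times the interquartile range beyond the quartiles), which may not be exactly what the site ends up using. The function name `boxplot_outliers` and the sample values are hypothetical.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return (outliers, (lower_fence, upper_fence)) using the usual
    boxplot convention: anything beyond k * IQR from the quartiles."""
    # method="inclusive" interpolates between data points, similar to
    # the default percentile calculation in many plotting libraries.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)

# Invented relative abundances for one taxon (1.0 = 100%).
samples = [0.001, 0.002, 0.002, 0.003, 0.003, 0.004, 0.005,
           0.006, 0.008, 0.010, 0.015, 0.12, 0.30]

outliers, (lower, upper) = boxplot_outliers(samples)
print("fences:", lower, upper)
print("outliers:", outliers)
```

The fences come from the quartiles rather than the mean, so a few extreme samples do not move them much. That is why this approach copes better with skewed data.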

We can also do this at lower levels, for example, order:

Down to Species

The solid black line is the median (the middle value, which plays a similar role to an average). For B. vulgatus, we see that the range of values from the 25th percentile to the median is almost the same as from the median to the 75th percentile. For B. uniformis, this is very different.
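The same comparison can be made numerically by measuring the two halves of the box. The sketch below uses two invented lists (not real B. vulgatus or B. uniformis data), and `quartile_spans` is a hypothetical helper name.

```python
import statistics

def quartile_spans(values):
    """Return (median - Q1, Q3 - median): the widths of the lower and
    upper halves of the box in a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median - q1, q3 - median

# Invented examples: one roughly symmetric, one skewed to the right.
symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 10, 50]

sym_lower, sym_upper = quartile_spans(symmetric)
skew_lower, skew_upper = quartile_spans(skewed)

print("symmetric:", sym_lower, sym_upper)
print("skewed:   ", skew_lower, skew_upper)
```

When the two spans are close, the box looks balanced around the median. When the upper span is several times the lower one, the distribution is skewed and an average would be misleading.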

Site Update in Progress

I am working on producing a new set of pages on the site that use the above approach. I will keep the original approaches available as Advanced Options because I really do not know which is more accurate. The above approach reduces the complexity and mutes what is likely noise.

For people not uploading 16S details but using canned conventional tests, this approach cannot be used. I need a reasonably large number of samples to detect outliers.