To illustrate this, we use our collection of distinct microbiome samples processed through BiomeSight (N: 3656).
Species: Phocaeicola massiliensis
Basic Statistics:
- Minimum: 0.001%
- Maximum: 89.1%
- Median: 0.254%
- Mean / Average: 7.6%
- Mode: 12.4%
- Standard Deviation: 14.6%
- 5th Percentile: 0.009%
- 95th Percentile: 43.7%
- Harmonic Mean: 0.035%
- Geometric Mean: 0.445%
- Skew: 1.5
- Kurtosis: 0.035
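For readers who want to reproduce this kind of summary on their own data, here is a minimal sketch using only Python's standard library. The sample is synthetic (a log-normal stand-in chosen only because it is strongly right-skewed); it is not the BiomeSight data and will not reproduce the numbers above. The function name `describe` and the log-normal parameters are my own assumptions. The mode is left out because it is not well defined for continuous values without binning.

```python
import random
import statistics

def describe(values):
    """Summary statistics for a list of positive abundances (in %)."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.fmean(xs)
    # Central moments, used for the moment-based skew and excess kurtosis
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%
    cuts = statistics.quantiles(xs, n=20)
    return {
        "min": xs[0],
        "max": xs[-1],
        "median": statistics.median(xs),
        "mean": mean,
        "sd": statistics.stdev(xs),  # sample standard deviation
        "p5": cuts[0],
        "p95": cuts[-1],
        "harmonic_mean": statistics.harmonic_mean(xs),
        "geometric_mean": statistics.geometric_mean(xs),
        "skew": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

# Synthetic, strongly right-skewed stand-in data (NOT the BiomeSight samples)
rng = random.Random(42)
sample = [rng.lognormvariate(-1.0, 2.0) for _ in range(3656)]
stats = describe(sample)
for name, value in stats.items():
    print(f"{name:>16}: {value:.3f}")
```

On skewed data like this, the harmonic mean, geometric mean and arithmetic mean spread far apart, and the median sits well below the mean, which is the same pattern seen in the list above.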
When we apply Stats Class 101 methods, we get:
- Mean +/- 1.96 SD ==> (-21% to 36.2%)
- Box-Plot-Whiskers ==> (-9.4%, 15.8%)
WAIT: Having a negative amount of bacteria!!! That is absurd!
What we should see if the data were normally distributed:
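The arithmetic behind those two ranges can be checked in a few lines. This sketch assumes the conventional 1.96 multiplier for a 95% interval and the usual Tukey rule of 1.5 x IQR for the whiskers. The quartiles are not reported above; the values of `q1` and `q3` below are back-calculated from the quoted whiskers under that 1.5 x IQR assumption, so treat them as illustrative only.

```python
# Mean and SD as reported in the statistics above (in %)
mean, sd = 7.6, 14.6

# Classic "95% of a normal distribution" range
low = mean - 1.96 * sd
high = mean + 1.96 * sd
print(f"Mean +/- 1.96 SD: {low:.1f}% to {high:.1f}%")  # -21.0% to 36.2%

def tukey_fences(q1, q3, k=1.5):
    """Box-plot whisker limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# ASSUMPTION: quartiles inferred from the quoted whiskers, not measured
q1, q3 = 0.05, 6.35
fence_low, fence_high = tukey_fences(q1, q3)
print(f"Box-plot whiskers: {fence_low:.1f}% to {fence_high:.1f}%")

# Both lower limits fall below zero, which no abundance can do
print("Negative lower limits:", low < 0, fence_low < 0)
```

Both formulas are symmetric around a central value, so with a long right tail the lower limit is pushed below zero.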
data:image/s3,"s3://crabby-images/125de/125de41ed0930dfe4a00bebeb916019b8f2da855" alt=""
Wait, the Mean, Median and Mode should be next door to each other!!! Here they are 7.6%, 0.254% and 12.4%.
What do we see when we chart this data? The charts are identical. NOT!
data:image/s3,"s3://crabby-images/1b642/1b642f26f183bb844a9bdf88ed5e386602734767" alt=""
What should be used to compute range?
There are many statistical methods better suited to data that is not normally distributed. A few are:
- Kolmogorov-Smirnov test
- Kruskal-Wallis test
- Wilcoxon signed-rank test
- Mann-Whitney U test
- Bothe/Z-scores
- Median Absolute Deviation
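As one illustration from the list, here is a minimal sketch of a range based on the Median Absolute Deviation (MAD). Two choices in it are my assumptions and not something the list prescribes: working on log10 of the abundance (because the values span several orders of magnitude), and using a cutoff of 3 scaled MADs. The constant 1.4826 is the standard factor that makes the MAD comparable to a standard deviation for normal data. The sample values are made up for the example.

```python
import math
import statistics

def mad_range(values, k=3.0):
    """Robust range: median +/- k scaled MADs, computed on a log10 scale."""
    logs = [math.log10(v) for v in values if v > 0]
    med = statistics.median(logs)
    mad = statistics.median([abs(x - med) for x in logs])
    scaled = 1.4826 * mad  # consistency factor for normal data
    # Transform back to percent; the result can never be negative
    return 10 ** (med - k * scaled), 10 ** (med + k * scaled)

# Hypothetical abundances in %, invented for illustration only
abundances = [0.001, 0.009, 0.05, 0.1, 0.25, 0.3, 0.9, 6.0, 12.4, 43.7, 89.1]
low, high = mad_range(abundances)
print(f"MAD-based range: {low:.5f}% to {high:.1f}%")
```

Because the limits are computed on the log scale and then transformed back, the lower limit stays above zero, unlike the mean-based ranges.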
My Preference: Patent Pending Kaltoft Moldrup Algorithm
The basis of it is a data transformation, followed by taking derivatives until we get an almost straight line. Where the data leaves the line is where it is deemed to be abnormal. The following diagrams illustrate the process.
Example: Original Data
data:image/s3,"s3://crabby-images/f6dba/f6dbad5f21b2e87f6533beabf3952485e6f7da3e" alt=""
2nd derivative line
data:image/s3,"s3://crabby-images/ebaea/ebaea56205cad7e6bc5adee2453afc3cd3e8a918" alt=""
3rd derivative line
data:image/s3,"s3://crabby-images/1d1d0/1d1d0d64b97a741e51bd8059eb5f2ac518a911a6" alt=""
4th derivative line (where we see the desired straight line in purple)
data:image/s3,"s3://crabby-images/b6752/b67523a708a22da62b6ad6be59b5b3063c6972e5" alt=""
An example with real data. Most of the abnormal data is at the bottom in this example.
data:image/s3,"s3://crabby-images/0bd7e/0bd7e3399448080932f87ed0a35b949947494117" alt=""
A more complex example, suggesting more complicated behavior of the bacteria in situ in the microbiome.
data:image/s3,"s3://crabby-images/cad74/cad746a8d36d9cb32e94859cb55f7c1a6557e240" alt=""
Another example, showing both high and low abnormal areas.
data:image/s3,"s3://crabby-images/0f159/0f159516f0662a017700161fc36b4bb87e6636c0" alt=""
Bottom Line
Many suggested ranges are based on the mean, and nobody tests whether methods that assume a normal distribution (bell curve) actually apply. A small number of ranges are based on percentiles, i.e. over the 95th percentile or below the 5th percentile. Using percentiles is better but, as suggested by the last curves above, being outside a percentile cutoff is not in itself evidence of being abnormal.
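The limitation of percentile cutoffs is easy to see in a short sketch: whatever the data looks like, roughly 10% of samples land outside the 5th to 95th percentile range, simply by construction. The sample below is synthetic and the function name is my own; it is only meant to show that effect.

```python
import random
import statistics

def fraction_outside_5_95(values):
    """Share of values below the 5th or above the 95th percentile."""
    cuts = statistics.quantiles(values, n=20)  # cut points at 5%, ..., 95%
    low, high = cuts[0], cuts[-1]
    outside = sum(1 for v in values if v < low or v > high)
    return outside / len(values)

# Synthetic data with no "abnormal" group mixed in
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]
frac = fraction_outside_5_95(sample)
print(f"{frac:.1%} of samples fall outside the 5th-95th percentile range")
```

So a percentile range flags a fixed share of people as out of range whether or not anything is actually unusual about them.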
The patent pending Kaltoft Moldrup Algorithm appears to identify abnormal values in the classic sense of abnormal. It does require significant mathematical and statistical skills.