Microbiome Statistics for Dummies

There is a saying, “Lies, damned lies, and statistics”, quoted by Mark Twain. This does not mean that statistics are horrible; the problem is the misinterpretation of the numbers. In general, it comes from people not understanding the qualifications on those numbers and craving simple answers.

A real-world illustration may help in understanding some terms and qualifications. Consider retirement plans (I will use real numbers for US 401K plans):

  • Under 25: average ≈ 6,900, median ≈ 1,900.
  • 25–34: average ≈ 42,600, median ≈ 16,300.
  • 35–44: average ≈ 103,500, median ≈ 40,000.
  • 45–54: average ≈ 188,600, median ≈ 67,800.
  • 55–64: average ≈ 271,300, median ≈ 95,600.
  • 65+: average ≈ 299,400 median ≈ 95,400.


The average amount of different bacteria also varies by age, and for a few bacteria the contrast is almost as dramatic.


We also have one other measure, the mode: the most common amount in 401Ks. Typically this is about 1/3 of the median. But wait a minute: we excluded from the above numbers those who do not have a 401K. For some of the above ages that is about half of people; for 65+, about 20% do not have one.

Looking at the above numbers, we see that “the average American (with a 401K) has $148,000 in 401K”. The reality is that less than 24% have this amount. Clearly, 76% of people need to save more! This average excludes those who do not have any 401K; adjusting for this, we may find that less than 8% have this amount, i.e. more than 92% of the population are deficient in 401K savings. Time to hire a financial advisor!

Similarly, a microbiome test may report that the average person has 3% Bifidobacterium. Looking at various data sets from different labs, less than 25% have this amount. Clearly, 75% of people need to increase their Bifidobacterium! This average excludes those who do not have any Bifidobacterium; adjusting for this, we may find that for some bacteria less than 8% have the average amount or higher, i.e. more than 92% of the population are deficient in some bacteria. Time to hire a microbiome advisor!
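The gap between "the average" and what most people actually have can be checked with a few lines of Python. This is only a sketch: the ten balances below are invented so that their mean lands on 148 (thousand dollars), and the five people with no plan are likewise hypothetical. The same arithmetic applies to bacteria percentages.

```python
from statistics import mean, median, mode

# Hypothetical 401K balances in thousands of dollars, for people who HAVE a plan.
# Invented values, chosen only so that the mean is 148.
balances = [2, 5, 10, 20, 40, 60, 95, 150, 400, 698]

avg = mean(balances)      # 148
mid = median(balances)    # 50.0, far below the mean because of the long right tail
share_at_avg = sum(b >= avg for b in balances) / len(balances)

# Add five hypothetical people with no plan at all (balance 0).
everyone = [0] * 5 + balances
mid_all = median(everyone)     # 10
most_common = mode(everyone)   # 0, "none" becomes the mode
share_at_avg_all = sum(b >= avg for b in everyone) / len(everyone)

print(f"plan holders: mean={avg}, median={mid}, at or above mean={share_at_avg:.0%}")
print(f"everyone:     median={mid_all}, mode={most_common}, at or above mean={share_at_avg_all:.0%}")
```

In this toy sample only 30% of plan holders reach the average, and only 20% once the people with nothing are counted.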

For a statistician, the critical numbers are the median and the mode. The average is for politicians!

  • If you are close to the mode, you are effectively typical. Not deficient unless you are living in a population with horrible health issues.
  • If you are at or above the median, you are definitely not deficient. The median here should include those who have none.
    • Some bacteria are seen in only 40% of people. If we include the people with none in our computations, we see two things:
      • If you have none, then you are typical!! None is the mode!
      • If you were at the 5%ile before the non-reporters were included, you are now at about the 62%ile (the 60% with none, plus 5% of the remaining 40%). Clearly not deficient.
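The shift in percentile when non-reporters are counted is a one-line calculation: the share of people with none, plus your percentile scaled by the share of people who have the bacterium. The function name and arguments below are my own, for illustration only. The result depends heavily on how often the bacterium is seen, so two occurrence rates are shown.

```python
def percentile_with_absent(pct_among_present, pct_present):
    """Percentile rank among everyone, counting people with none as the lowest values.

    pct_among_present: percentile (0-100) among people who have the bacterium
    pct_present: percentage (0-100) of people in whom the bacterium is seen
    """
    pct_absent = 100 - pct_present
    return pct_absent + pct_present * pct_among_present / 100

# 5%ile among reporters, bacterium seen in 40% of people (60% have none)
print(percentile_with_absent(5, 40))   # 62.0
# 5%ile among reporters, bacterium seen in 60% of people (40% have none)
print(percentile_with_absent(5, 60))   # 43.0
```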

Unfortunately, labs often make only the very deceptive average available, rendering interpretation of your data questionable! The rational approach is to ask the lab for more data and, if it is not provided, take their refusal to social media.

Odds Ratio and Autism

I posted about the odds ratio for autism in “Odds Ratio for Autism”, where a high odds ratio (2.41) was reported when the median Bifidobacterium value of people with autism was used as the dividing point. That value was 0.85%.

Condition        Average   Above Median   Below Median
With Autism      1.87      49             49
Without Autism   1.998173114

The 49 to 49 split for autism is precisely what is expected when using the median, the middle value. The averages are close to each other. Where things differ is the shape of the sample distribution: the peak of the distribution is around the median for autism, while the peak is below that median for without-autism.

Picking a different division point will get different odds ratios. I opted to use the median.
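To see how much the division point matters, here is a small sketch. The two lists are invented Bifidobacterium percentages, NOT the data behind the table above, and `odds_ratio_at_cut` is a helper name made up for this illustration. The first list was chosen so that its median is 0.85.

```python
from statistics import median

def odds_ratio_at_cut(group_a, group_b, cut):
    """Odds of being above `cut` in group_a divided by the same odds in group_b.

    Raises ZeroDivisionError if nobody in group_a is at or below the cut,
    or nobody in group_b is above it.
    """
    a_above = sum(v > cut for v in group_a)
    a_below = len(group_a) - a_above
    b_above = sum(v > cut for v in group_b)
    b_below = len(group_b) - b_above
    # Cross-product form of (a_above / a_below) / (b_above / b_below)
    return (a_above * b_below) / (a_below * b_above)

# Invented values for illustration only.
with_autism = [0.2, 0.5, 0.8, 0.9, 1.5, 4.0]
without_autism = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 1.0, 1.2, 2.0, 8.0]

cut = median(with_autism)  # about 0.85, splits with_autism 3 to 3
print(round(odds_ratio_at_cut(with_autism, without_autism, cut), 2))   # 1.5
print(round(odds_ratio_at_cut(with_autism, without_autism, 0.45), 2))  # 3.33
```

Same data, two division points, two quite different odds ratios.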

An additional issue is the one I cited above: “people without 401K”.

  • With autism: Bifidobacterium was reported in 127 of 247 samples, or 51% of samples.
  • Without autism: Bifidobacterium was reported in 3902 of 4118 samples, or about 95% of samples.

So, with autism we have it reported only about half of the time, with similar averages among those who do report it. When we compute the odds ratio using only the samples that reported Bifidobacterium, we get a high odds ratio.
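The reporting counts above can also be turned into an odds ratio of occurrence directly. The sketch below expresses it as the odds of Bifidobacterium being absent with autism relative to without autism; that direction, and the function name, are my choices for illustration, and the reverse direction is simply the reciprocal.

```python
def odds_ratio_of_absence(present_a, total_a, present_b, total_b):
    """Odds of 'not reported' in group a divided by the same odds in group b."""
    absent_a = total_a - present_a
    absent_b = total_b - present_b
    # Cross-product form of (absent_a / present_a) / (absent_b / present_b)
    return (absent_a * present_b) / (present_a * absent_b)

# Counts from the post: 127 of 247 with autism, 3902 of 4118 without autism.
autism_or = odds_ratio_of_absence(127, 247, 3902, 4118)
print(round(autism_or, 1))  # 17.1
```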

If we included non-reporters of Bifidobacterium, then the median would be essentially ZERO for autism, with the odds ratio becoming close to 2 for low Bifidobacterium with autism: the exact opposite result.

Bottom Line

The purpose of this post is to illustrate that while statistics are numbers and accurate for the calculations used, the conditions on the calculations are critical.

  • Before comparing two groups for a bacterium, you must check its incidence of occurrence in both groups, ideally verifying that the two are not statistically different (another topic). As a rule of thumb, they should be within 5% of each other.
    • 50% and 52% is fine. 5% of 50 is 2.5
    • 50% and 54% is not fine. But you can use Odds Ratio of Occurrence!
  • THEN and ONLY THEN: an odds ratio computed at the median of one group will often expose shifts. Ideally, you should test with the median of each group.
  • Using the mean and standard deviation is a TOTAL NO NO. Bacteria amounts are very, very skewed.
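The 5% rule of thumb above is easy to automate. In this sketch I assume "within 5% of each other" means the difference is at most 5% of the smaller occurrence rate, which matches the "5% of 50 is 2.5" example. The function name is hypothetical.

```python
def occurrence_is_comparable(pct_a, pct_b, tolerance=0.05):
    """True when two occurrence percentages differ by at most `tolerance` of the smaller one."""
    return abs(pct_a - pct_b) <= tolerance * min(pct_a, pct_b)

print(occurrence_is_comparable(50, 52))  # True:  2 <= 2.5
print(occurrence_is_comparable(50, 54))  # False: 4 > 2.5

# Bifidobacterium occurrence from the counts in this post
autism_pct = 100 * 127 / 247
control_pct = 100 * 3902 / 4118
print(occurrence_is_comparable(autism_pct, control_pct))  # False, so use Odds Ratio of Occurrence
```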

IMHO, odds ratios are preferred for those not skilled in the statistical sciences. I will likely be producing some sample Odds Ratios of Occurrence from the donated data. Stay tuned.

The biggest challenge to using the microbiome for health reasons is the failure of microbiome testing companies to disclose essential data. Personally, I make all of the microbiome data that I use freely available for download.

To return to the question of Autism:

  • People with Autism AND Bifidobacterium have more Bifidobacterium than expected when using odds ratios, and a similar amount when using means.
  • People with Autism are less likely to have Bifidobacterium reported.