A reader asked:
Do you have a blog post where I can learn more about how percentile distribution of a 16s can give statistical insights ? You often mention the concept that a “good” sample has a even distribution…I’m trying to understand why so and what uneven distribution might mean etc
The answer is statistics!!
At each taxonomic rank (species, genus, family, etc.), we can reasonably assume that the count of each taxon is independent of the counts of the other taxa at the same rank. There may be a little correlation (if A goes up, B goes up), but in general it is not significant.
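When we use percentile instead of percentage, we transform the information about the bacteria into a uniform distribution: a taxon's percentile rank against a reference set of samples is, by construction, equally likely to land anywhere between 0 and 100. Suppose you have 120 dice. You roll all of them. You expect to have around 20 with 1, 20 with 2, 20 with 3, 20 with 4, 20 with 5 and 20 with 6.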
If you get 40 dice with a 1 and only 2 dice with a 6, it is reasonable to suspect that the dice are biased or loaded.
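The dice intuition can be checked with a chi-square goodness-of-fit test. The sketch below is only an illustration: the post gives the counts for faces 1 and 6 (40 and 2), and the counts for faces 2 to 5 are made up so that the total is 120. The choice of a chi-square test and the 5% cutoff are assumptions, not something stated above.

```python
# Chi-square goodness-of-fit check for the 120-dice example.
# Counts for faces 2-5 are invented for illustration; only the 40 ones
# and the 2 sixes come from the text.

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [40, 20, 19, 20, 19, 2]            # faces 1..6, total 120
expected = [sum(observed) / len(observed)] * len(observed)  # 20 per face

statistic = chi_square_statistic(observed, expected)

# Standard chi-square critical value for 5 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF5 = 11.070
looks_loaded = statistic > CRITICAL_5PCT_DF5

print(f"chi-square = {statistic:.1f}, suspect loaded dice: {looks_loaded}")
```

With these counts the statistic is far above the cutoff, which matches the gut feeling that something is off with the dice.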
Instead of a 6-sided die, we use a 10-sided die: the 10%ile ranges. If the number of bacteria in each 10%ile range is about the same, then you can assume that the die is fair OR, in our case, that the microbiome is balanced. If the counts differ greatly between 10%ile ranges, you suspect that the die is biased OR that the microbiome is unbalanced (as with an unbalanced die).
Which ranges are over- or underrepresented gives us hints as to where there may be an issue. It does not tell us what the issue is. It simply tells us that the microbiome is unbalanced and points us at subsets of bacteria worth examining.
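The same idea can be sketched for the 10%ile ranges. Everything in this example is hypothetical: the percentile values are made up, the helper name decile_counts is invented for illustration, and flagging a range when its standardized residual exceeds 2 in absolute value is a rough rule of thumb rather than a method described in this post.

```python
import math

def decile_counts(percentiles):
    """Count how many taxa fall in each 10%ile range (0-9, 10-19, ..., 90-100)."""
    counts = [0] * 10
    for p in percentiles:
        index = min(int(p // 10), 9)   # a percentile of exactly 100 goes in the top range
        counts[index] += 1
    return counts

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up sample: 120 taxa, with an excess in the 0-9%ile range.
made_up_counts = [30, 10, 9, 11, 10, 10, 9, 11, 10, 10]
percentiles = []
for bin_index, n in enumerate(made_up_counts):
    percentiles.extend([bin_index * 10 + 5.0] * n)

counts = decile_counts(percentiles)
expected = len(percentiles) / 10       # a "fair die" puts 12 taxa in each range
statistic = chi_square_statistic(counts, [expected] * 10)

# Standard chi-square critical value for 9 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF9 = 16.919
unbalanced = statistic > CRITICAL_5PCT_DF9

# Standardized residuals point at the ranges that drive the imbalance.
residuals = [(c - expected) / math.sqrt(expected) for c in counts]
flagged = [f"{i * 10}-{i * 10 + 9}%ile"
           for i, r in enumerate(residuals) if abs(r) > 2]

print(f"chi-square = {statistic:.2f}, unbalanced: {unbalanced}, look at: {flagged}")
```

The test only says that the sample is unbalanced; the flagged ranges say where to look, not what the cause is.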
Wait! There is More
A reader sent me this recent paper, which appears to emphasize this point: a high count in the 0-9%ile range is associated with inflammatory conditions.
“These observations suggest a general mechanism that underlies changes in diversity in perturbed gut environments and reveal taxon-independent markers of “dysbiosis” that may explain why widespread yet typically low-abundance members of healthy gut microbiomes can dominate under inflammatory conditions without any causal association with disease.”
Source: Metabolic independence drives gut microbial colonization and resilience in health and disease
The failure to find a causal association may be a matter of the sample size used, and of cross-interactions and cooperation between these “minor voices”. The paper's data came from Fecal Matter Transplant (FMT) studies, and it is illuminating on why many FMTs fail to persist.