I’m excited to share the launch of R² Microbiome Prescription (https://R2.MicrobiomePrescription.com), a platform dedicated to unraveling bacterial associations in the microbiome. The name “R²” refers to the coefficient of determination, a statistical measure of how much of the variation in one variable (like the presence of one bacterium) is explained by another (e.g., the presence of a different bacterium). Think of it like income and spending: as salary rises or falls, spending often follows, though this doesn’t prove salary causes spending changes.
Why focus on associations?
While correlation ≠ causation, I lean toward the idea that bacterial relationships in the gut often hint at underlying causal mechanisms. For instance, one microbe’s metabolites might directly feed or inhibit another, creating a metabolic chain reaction. With thousands of metabolites (and counting!) interacting in complex ways, pinpointing exact cause-effect relationships is like solving a 4D puzzle.
The challenge ahead
Current research is racing to map these connections, but the sheer scale of interactions—combined with individual variability—makes definitive conclusions tough. My goal with R² is to aggregate data, spotlight patterns, and inspire deeper exploration into how these microbes might shape health.
Feel free to explore the site and join the conversation!
Keep It Simple Statistics (KISS)
Over the years, I’ve experimented with various methods to uncover meaningful bacterial associations—a journey that’s been both challenging and gradual. After much trial and error, I finally developed a methodology that consistently delivers reliable results, which I’ve now used to populate the new site.
A turning point came during discussions with Precision Biome. They encouraged me to apply this approach to their extensive dataset of shotgun sequencing samples from healthy individuals. This collaboration provided the perfect opportunity to put my method to the test on a large scale—and ultimately led to the creation of the site you see today.
Getting R2 by Percentages
Here’s an example of a clear association between two taxa using percentages of each in samples:
R² = 0.6971 and Slope = 0.3563.
An R² value of 0.6971 indicates that nearly 70% of the variation in one taxon’s abundance can be explained by changes in the other, reflecting a strong linear relationship between them. The slope of 0.3563 shows the rate at which one taxon’s abundance changes in relation to the other—specifically, for every unit increase in one, the other increases by about 0.36 units.
This kind of result highlights how statistical measures like R² and slope help quantify and visualize associations within complex microbiome data.
The relationship is typically not so linear. This was a specific example picked for illustration.
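For readers who want to reproduce this kind of number on their own data, here is a minimal sketch of an ordinary least-squares fit that returns the slope and R². The function name `linear_fit` and the six sample values are made up for illustration; they are not taken from the Precision Biome dataset, and this is not necessarily how the site computes its figures.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined slope or R²")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R² is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical percentages of two taxa across six samples.
taxon_a = [0.5, 1.0, 2.0, 3.5, 5.0, 8.0]
taxon_b = [0.3, 0.4, 0.9, 1.2, 2.1, 2.9]

slope, intercept, r2 = linear_fit(taxon_a, taxon_b)
print(f"slope={slope:.4f}  R2={r2:.4f}")
```

Swapping the two taxa leaves R² unchanged but gives a different slope, which is one reason the two numbers are reported separately.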

[BELOW] Applying a monotonically increasing transformation such as the square root changes the association metrics: in this case, R² drops to 0.5112 and the slope increases to 0.5405, indicating a weaker linear relationship than in the original analysis. The lower R² means that a straight line fitted to the transformed data explains less of the variance between the two taxa.
Square root and similar transformations are commonly used in microbiome studies to stabilize variance, handle skewness, and address issues like zero-inflation and compositionality in the data. However, these adjustments can sometimes reduce the strength of observed associations, as seen here, because they alter the data’s distribution and the shape of the relationship. Our goal is not to force a linear relationship but to get the best R² while preserving the nature of the data (i.e., every transformation should be monotonically increasing).

[BELOW] Applying a different monotonically increasing transformation, the logarithm, yields R² = 0.6596 and Slope = 1.0046. This is an improvement over the square root transformation (R² = 0.5112), but still below the untransformed linear fit (R² = 0.6971).
A logarithmic transformation is often used to manage skewed data and compress large ranges, making relationships more linear and easier to interpret. In this case, the higher R² suggests that the log transformation preserves more of the association between the two taxa compared to the square root transformation. The slope of 1.0046 indicates a nearly one-to-one relationship between the log-transformed values of the two taxa.
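To see how a transformation shifts R², the sketch below applies the identity, square root, and natural log to the same pair of series and refits each time. The data are made up, the same transformation is applied to both taxa (an assumption on my part), and all values are strictly positive because the log of zero is undefined; real samples containing zeros would need separate handling.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R² of a one-predictor linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical, strictly positive percentages for two taxa.
taxon_a = [0.5, 1.0, 2.0, 3.5, 5.0, 8.0]
taxon_b = [0.3, 0.4, 0.9, 1.2, 2.1, 2.9]

# Each transformation is monotonically increasing on positive values,
# so the rank order of the samples is preserved.
transforms = {
    "identity": lambda v: v,
    "sqrt": math.sqrt,
    "log": math.log,
}

results = {}
for name, transform in transforms.items():
    tx = [transform(v) for v in taxon_a]
    ty = [transform(v) for v in taxon_b]
    results[name] = r_squared(tx, ty)
    print(f"{name:8s} R2={results[name]:.4f}")
```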

[BELOW] We can also experiment with other transformations to see how they affect the association. The more complex transformation that I prefer yields R² = 0.7082 and Slope = 0.5015.
This R² is the highest among the transformations tested so far, indicating that this method captures the relationship between the two taxa most effectively. The slope of 0.5015 shows a moderate rate of change between the transformed values of the taxa.
This example highlights how choosing the right transformation can significantly enhance our ability to detect and quantify associations within microbiome data. By carefully selecting and testing different approaches, we can better reveal the underlying patterns and relationships that might otherwise remain hidden.

R² is the amount of influence; slope indicates the direction of influence
It’s important to avoid combining R² and slope by multiplying them together. This is not a standard or meaningful statistic in regression analysis and can easily lead to misinterpretation. For instance, a high slope with a low R² suggests that while changes are dramatic when they happen, the overall model does not explain much of the data’s variance.
Remember:
- Slope tells you the direction and rate of change (whether the relationship increases or decreases).
- R² indicates how much of the variation in one variable can be explained by the other (the strength of the association).
Each metric provides valuable information on its own, but their product does not offer any additional insight and can actually be misleading.
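A small made-up example shows why the product misleads. The first series has a steep slope but a noisy fit; the second has a shallow slope but an almost perfect fit. All numbers are invented for illustration.

```python
def slope_and_r2(xs, ys):
    """Return (slope, r_squared) of an ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


xs = [1, 2, 3, 4, 5, 6]
steep_noisy = [2, 30, 5, 40, 10, 45]                    # large swings, poor fit
shallow_tight = [1.00, 1.11, 1.19, 1.30, 1.41, 1.50]    # small changes, tight fit

steep_slope, steep_r2 = slope_and_r2(xs, steep_noisy)
shallow_slope, shallow_r2 = slope_and_r2(xs, shallow_tight)

print(f"steep/noisy   slope={steep_slope:.3f} R2={steep_r2:.3f} product={steep_slope * steep_r2:.3f}")
print(f"shallow/tight slope={shallow_slope:.3f} R2={shallow_r2:.3f} product={shallow_slope * shallow_r2:.3f}")
```

Ranking by the product would put the noisy pair ahead of the tightly associated one, even though its R² is far lower.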
Criteria for selecting transformation
For any given pair of bacteria, it’s technically possible to find a data transformation that maximizes the R² value for that specific pair. However, with 5,000 taxa there are 25 million possible pairings (5,000 × 5,000), or roughly 12.5 million if each unordered pair is counted once, making it an overwhelming and impractical task to optimize each one individually.
Ideally, the goal is to identify a single transformation that performs well across both low and high R² values for all pairs. Discovering such a transformation was a significant part of my journey. To keep the analysis manageable, I focused only on bacteria present in at least 0.3% (0.003) of the samples, which helped reduce the number of pairs to a more reasonable level.
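The arithmetic behind the pair count and the prevalence cut-off can be sketched as follows. The function names, the taxon labels, and the sample counts are hypothetical; only the 5,000 taxa and the 0.3% threshold come from the text above.

```python
def prevalent_taxa(detected_in, n_samples, min_fraction=0.003):
    """Keep taxa detected in at least min_fraction of the samples.

    detected_in maps a taxon name to the number of samples containing it.
    """
    return [
        taxon
        for taxon, count in detected_in.items()
        if count / n_samples >= min_fraction
    ]


def ordered_pairings(n_taxa):
    """n x n grid of pairings, as in 5,000 x 5,000 (includes self-pairs)."""
    return n_taxa * n_taxa


def distinct_pairs(n_taxa):
    """Unordered pairs of two different taxa: n * (n - 1) / 2."""
    return n_taxa * (n_taxa - 1) // 2


print(ordered_pairings(5000))  # 25000000
print(distinct_pairs(5000))    # 12497500

# Hypothetical detection counts in a cohort of 1,000 samples.
counts = {"taxon_a": 500, "taxon_b": 40, "taxon_c": 5, "taxon_d": 2}
kept = prevalent_taxa(counts, n_samples=1000)
print(kept, distinct_pairs(len(kept)))
```

Because the pair count grows with the square of the number of taxa, dropping rare taxa shrinks the workload much faster than it shrinks the taxon list.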
I’ve found a favorite transformation—demonstrated in the last chart above—that I’m particularly satisfied with. If I discover an even better transformation in the future, I simply rerun the analysis and select the one that yields the highest R² values. This approach ensures that the associations presented are as strong and meaningful as possible.
A practical alternative is to run regressions with multiple transformations and pick, for each bacteria pair, the transformation with the highest R². I would suggest trying some of the following transformations:
- linear function with positive slope
- cubic function
- square root function (converting percentage to 0 – 1 range)
- exponential function with base e
- natural logarithm
- logistic function
- general exponential function
- x−sin(x)
- x/log(x)
This increases the number of regressions roughly tenfold, from 25 million to about 250 million. Remember that computer resources are cheap today (says someone who started doing statistics with an HP-21 calculator and WatFor), and fast thanks to parallelism (multiple cores and threads).
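The per-pair selection described above could look roughly like this. It is a sketch under assumptions: the candidate list is a subset of the suggestions, the same transformation is applied to both taxa, and a candidate that cannot be evaluated on a pair (for example, the log of zero) is simply skipped. The names `CANDIDATES` and `best_transform` are mine, not the site's.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R² of a one-predictor linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)  # ZeroDivisionError if a series is constant


# Candidate transformations, all non-decreasing on non-negative percentages.
CANDIDATES = {
    "linear": lambda v: v,
    "cubic": lambda v: v ** 3,
    "sqrt": lambda v: math.sqrt(v / 100.0),  # percentage rescaled to 0-1 first
    "ln": math.log,                           # raises ValueError at zero
    "x-sin(x)": lambda v: v - math.sin(v),
}


def best_transform(xs, ys, candidates=CANDIDATES):
    """Return (name, r2) of the candidate with the highest R² for this pair."""
    best_name, best_r2 = None, -1.0
    for name, transform in candidates.items():
        try:
            r2 = r_squared([transform(v) for v in xs], [transform(v) for v in ys])
        except (ValueError, ZeroDivisionError):
            continue  # transformation not usable for this pair
        if r2 > best_r2:
            best_name, best_r2 = name, r2
    return best_name, best_r2


# Hypothetical percentages for one pair of taxa.
taxon_a = [0.5, 1.0, 2.0, 3.5, 5.0, 8.0]
taxon_b = [0.3, 0.4, 0.9, 1.2, 2.1, 2.9]
name, r2 = best_transform(taxon_a, taxon_b)
print(name, round(r2, 4))
```

Each pair is evaluated independently of every other pair, so the outer loop over pairs can be split across cores without any shared state.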

Usage With Probiotics
Suppose your Bacteroidota levels are too high and you’re considering which Bifidobacterium probiotic to take. If you turn to published studies, you’ll notice that most research focuses on individual probiotic strains, making it difficult to directly compare their effects. Instead, let’s examine the comparative data in the charts below to help guide your choice.
Bifidobacterium adolescentis: NCBI 1680, [species]
Does not impact any bacteria!! Definitely a pass.

Bifidobacterium bifidum: NCBI 1681, [species]
We see some impact, with R2 being 0.10. It has little impact on other bacteria.

Bifidobacterium breve: NCBI 1685, [species]
This does reduce some bacteria, and Bacteroidota has R2 0.12 (20% better than above)

Bifidobacterium longum subsp. infantis: NCBI 1682, [subspecies]
For this one, we have R2 being 0.144 — best yet!

Bifidobacterium longum subsp. longum: NCBI 1679, [subspecies]
For this one, we have R2 being 0.153 — best yet!
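The comparison above amounts to filtering and sorting by R². The sketch below uses the R² values quoted for each probiotic, on the assumption that each one refers to the association with Bacteroidota; `None` marks Bifidobacterium adolescentis, for which no association was found. The function name and the 0.10 cut-off parameter are illustrative. Slopes are not quoted in the text, so the direction of each association still has to be checked on the charts.

```python
# R² values as quoted above (assumed to be the association with Bacteroidota).
r2_with_bacteroidota = {
    "Bifidobacterium adolescentis": None,  # no association reported
    "Bifidobacterium bifidum": 0.10,
    "Bifidobacterium breve": 0.12,
    "Bifidobacterium longum subsp. infantis": 0.144,
    "Bifidobacterium longum subsp. longum": 0.153,
}


def rank_candidates(r2_by_name, min_r2=0.10):
    """Drop entries without an R² or below min_r2, then sort strongest first."""
    kept = [
        (name, r2)
        for name, r2 in r2_by_name.items()
        if r2 is not None and r2 >= min_r2
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


ranked = rank_candidates(r2_with_bacteroidota)
for name, r2 in ranked:
    print(f"{r2:.3f}  {name}")
```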

Starting at the target Bacteroidota
Our target is Bacteroidota: NCBI 976, [phylum]. We see the top 64 bacteria in the chart. The table below has 134 entries with R2 of 0.10 or more.

We can then search the table at the end for the best probiotics.

“Buyer beware,” or caveat emptor
The harsh reality is that we cannot trust most bacterial identification, whether in microbiome tests or in probiotic products.

Precision Biome (who supplied the dataset) are doing things in what I deem the right way:
- They are using the same pipeline that the above data came from (no ambiguity in bacteria identification) for client samples that they received.
- They are working with an EU probiotic manufacturer directly.
- The contents of the probiotics are also verified with the same pipeline
- The probiotics come directly from the factory and are not stored in questionable environments before being delivered to the client
- They intend to use the data from this site in identifying the best probiotics for each client
This is (IMHO) the ideal trifecta for clinical use of the microbiome. It is the strategy that I hope responsible microbiome testing firms move to.
Quick Test
Someone asked about probiotics that reduce Campylobacter. The page shows known (and pending) probiotics. We found none listed that reduce it. We did find some that increase it.

Going to Campylobacter Details: NCBI 194, a page that consolidated studies we found:

Bacillus is a genus and covers many species — so difficult to evaluate.
Not as good as actual studies? — but reasonable for sparse data
Critical Evaluation of Microbiome Study Limitations & Proposed Solutions
Key Factors Impacting Credibility
Current microbiome research faces significant validity challenges due to three core assumptions:
- Taxonomic Accuracy of 16S rRNA Sequencing
- The 16S pipeline (used in >80% of studies) has notable limitations:
▪ Struggles with species/strain-level resolution
▪ Database gaps create misclassification risks
▪ PCR amplification biases skew abundance data
- Probiotic Product Integrity
- Studies often assume supplements match label claims, yet:
▪ DNA analyses show 30-50% mislabeling in commercial probiotics
▪ Viability issues occur in 40% of products (esp. non-refrigerated)
▪ Strain-specific effects are frequently overlooked
- Population Generalizability
- Most trials use narrow cohorts:
▪ 78% of probiotic studies focus on healthy adults
▪ Gut ecosystem dynamics differ in chronic disease states, antibiotic-treated individuals, and elderly/immunocompromised populations
I prefer the trifecta approach over blind faith that all of the above assumptions are true. Blind faith is reasonable when you have no better data — the odds are that it will be better than no data.