I would be interested to see how the three separate consensus suggestions compare (i.e. not doing the uber consensus). Do the top takes & avoids match across the different labs, or are they different? Because if they are different then the algorithm is not robust to changes in lab.

This is part of this series of posts:

Using the same data, I check each item that was suggested for both labs and see whether the two recommendations are the same (both take, or both avoid) or different. In pseudo-SQL:

SELECT Percent(A.Take = B.Take) FROM Suggestions1 A JOIN Suggestions2 B ON A.substance = B.substance
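For readers who want to try this themselves, here is a minimal runnable sketch of the same comparison using Python's built-in sqlite3 module. The table names come from the pseudo-SQL above. The column layout (Take stored as 1 for take and 0 for avoid), the function name `percent_agreement` and the sample rows are assumptions made up for illustration; this is not the code actually used on the site.

```python
import sqlite3


def percent_agreement(rows_a, rows_b):
    """Return (shared_items, percent_agreement) for two suggestion lists.

    Each row is (substance, take), where take is 1 for "take" and 0 for
    "avoid". This encoding is an assumption made for the sketch.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Suggestions1 (substance TEXT PRIMARY KEY, Take INTEGER)")
    con.execute("CREATE TABLE Suggestions2 (substance TEXT PRIMARY KEY, Take INTEGER)")
    con.executemany("INSERT INTO Suggestions1 VALUES (?, ?)", rows_a)
    con.executemany("INSERT INTO Suggestions2 VALUES (?, ?)", rows_b)
    # In SQLite, (A.Take = B.Take) evaluates to 1 or 0, so AVG gives the
    # fraction of shared substances with the same decision.
    shared, pct = con.execute(
        """
        SELECT COUNT(*), 100.0 * AVG(A.Take = B.Take)
        FROM Suggestions1 A
        JOIN Suggestions2 B ON A.substance = B.substance
        """
    ).fetchone()
    con.close()
    # pct is None when the two lists share no substances.
    return shared, pct


# Made-up sample data, not real suggestions.
lab_a = [("magic soy", 1), ("sugar", 0), ("barley", 1)]
lab_b = [("magic soy", 1), ("sugar", 0), ("oats", 1)]
print(percent_agreement(lab_a, lab_b))
```

Only substances present in both lists are compared, because the join drops items that one lab's suggestions do not include.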

The results actually surprised me!

| Lab Comparison | Items | Agreement | Avg Difference |
|---|---|---|---|
| Ombre vs Biomesight | 1705 | 100% | 52 |
| Ombre vs Thorne | 1706 | 100% | 100 |
| Biomesight vs Thorne | 1694 | 100% | 54 |

My expectation was somewhere between 80% and 90%, the same range that I got doing cross validation. The priority and weight values differ from lab to lab, but the take or avoid decisions are the same. The difference between these pseudo values was also calculated and added to the table above. For example, Magic Soy may be 430 on Ombre, 330 on Thorne and 530 on Biomesight.
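The post does not spell out how the difference column was computed. One plausible reading, sketched below, is the mean absolute difference in priority over the items both labs have in common. That formula, the function name `average_difference` and the dictionaries are assumptions for illustration; the numbers are the Magic Soy example values from the paragraph above.

```python
def average_difference(priority_a, priority_b):
    """Mean absolute difference in priority over shared substances.

    Assumption: the post does not state the exact formula, so this is
    one plausible interpretation, not the site's actual calculation.
    """
    shared = priority_a.keys() & priority_b.keys()
    if not shared:
        return None  # nothing to compare
    return sum(abs(priority_a[s] - priority_b[s]) for s in shared) / len(shared)


# Example values taken from the Magic Soy illustration in the text.
ombre = {"magic soy": 430}
thorne = {"magic soy": 330}
biomesight = {"magic soy": 530}

print(average_difference(ombre, thorne))
print(average_difference(biomesight, thorne))
```

With a single shared item, the result is simply the gap between the two priorities: 100 for Ombre vs Thorne and 200 for Biomesight vs Thorne.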

Conclusion: the algorithm is more robust than I expected!

Caveat: This was done using the “Just give me suggestions” collection of algorithms on each lab’s data. Disagreements are definitely expected when the bacteria selection is “over-focused” and does not include the holistic picture of the microbiome.

