An Exercise showing the Volatility of Bacteria Counts

This post started out seeking to confirm or debunk the claim located here.

The method was very simple because we have a continuous stream of samples from before COVID, from before COVID vaccination, and from after the majority of people uploading samples would have been vaccinated. If this massive change were happening, then the pre-COVID bifidobacterium count (by lab) would be much higher than the post-COVID vaccination bifidobacterium count.

My results: there was no statistically significant difference between the averages.

  • Pre 2020-01-01: Average Count 20380 on 118 samples, Std Dev 98300
  • Post 2022-06-01: Average Count 26111 on 406 samples, Std Dev 72700

That is a 28% increase when a decrease was expected from the above talk.
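The comparison can be re-checked from the summary figures alone. The post does not say which test was used, so the sketch below assumes Welch's unequal-variance t-test and uses a normal approximation for the p-value (reasonable only because the degrees of freedom are large). The function name `welch_from_summary` is made up for illustration.

```python
import math

def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic, degrees of freedom and an approximate two-sided p-value."""
    var1, var2 = sd1**2 / n1, sd2**2 / n2  # squared standard error of each average
    se = math.sqrt(var1 + var2)            # standard error of the difference
    t = (mean2 - mean1) / se
    # Welch-Satterthwaite degrees of freedom
    df = (var1 + var2) ** 2 / (var1**2 / (n1 - 1) + var2**2 / (n2 - 1))
    # Assumption: normal approximation to the t distribution (df is large here)
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p

pre = (20380, 98300, 118)   # average, std dev, samples before 2020-01-01
post = (26111, 72700, 406)  # average, std dev, samples after 2022-06-01

t, df, p = welch_from_summary(*pre, *post)
pct_change = 100 * (post[0] - pre[0]) / pre[0]
print(f"change = {pct_change:.0f}%, t = {t:.2f}, df = {df:.0f}, p ~ {p:.2f}")
```

With these figures the t statistic comes out at about 0.6, well short of the roughly 2 needed for significance at the 5% level, which agrees with the result stated above.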

I practice open data, so you can pull the data and check the calculations:

Volatility of Numbers

I was also curious to see if there was any apparent month-by-month pattern, so I pulled the statistics for bifidobacterium, shown below. It is illuminating to a statistician like me, but perhaps confusing or concerning to people with a poor understanding of statistics (who would expect the numbers from month to month to be similar).

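| Year | Month | Thryve Average | Thryve Std Dev | Thryve Obs | BiomeSight Average | BiomeSight Std Dev | BiomeSight Obs |
|------|-------|----------------|----------------|------------|--------------------|--------------------|----------------|
| 2020 | 7 | 32438 | 131646 | 24 | 27929 | 35826 | 14 |
| 2020 | 8 | 25456 | 43405 | 21 | 7683 | 8948 | 9 |
| 2020 | 9 | 13410 | 19329 | 17 | 18501 | 22566 | 14 |
| 2020 | 10 | 84056 | 148144 | 18 | 4370 | 7390 | 20 |
| 2020 | 11 | 18598 | 34049 | 9 | 4926 | 8197 | 13 |
| 2020 | 12 | 10078 | 17108 | 16 | 1841 | 2718 | 29 |
| 2021 | 1 | 68152 | 172405 | 20 | 12436 | 20675 | 32 |
| 2021 | 2 | 101600 | 163980 | 30 | 17289 | 55509 | 45 |
| 2021 | 3 | 57957 | 103248 | 17 | 14482 | 33774 | 33 |
| 2021 | 4 | 21979 | 42967 | 30 | 7700 | 24436 | 46 |
| 2021 | 5 | 24693 | 51744 | 56 | 14257 | 33608 | 38 |
| 2021 | 6 | 28166 | 84491 | 39 | 21465 | 85762 | 51 |
| 2021 | 7 | 47023 | 105209 | 39 | 22620 | 67229 | 51 |
| 2021 | 8 | 60283 | 82398 | 43 | 18427 | 79784 | 37 |
| 2021 | 9 | 62438 | 92929 | 28 | 12002 | 19635 | 41 |
| 2021 | 10 | 13121 | 29924 | 24 | 5922 | 8565 | 38 |
| 2021 | 11 | 11515 | 27095 | 57 | 9996 | 24966 | 58 |
| 2021 | 12 | 28582 | 80191 | 17 | 11498 | 29919 | 63 |
| 2022 | 1 | 15114 | 28760 | 38 | 7076 | 15149 | 50 |
| 2022 | 2 | 24816 | 59069 | 32 | 11707 | 27202 | 52 |
| 2022 | 3 | 10486 | 23995 | 33 | 20243 | 51539 | 47 |
| 2022 | 4 | 10207 | 21580 | 57 | 7916 | 18288 | 69 |
| 2022 | 5 | 33471 | 82497 | 80 | 10304 | 23719 | 81 |
| 2022 | 6 | 23861 | 60126 | 53 | 8053 | 21994 | 235 |
| 2022 | 7 | 26797 | 109435 | 40 | 10709 | 20439 | 62 |
| 2022 | 8 | 67707 | 132108 | 60 | 8085 | 17190 | 85 |
| 2022 | 9 | 13926 | 17622 | 28 | 12635 | 21332 | 92 |
| 2022 | 10 | 9090 | 14049 | 45 | 9627 | 21171 | 89 |
| 2022 | 11 | 10296 | 19034 | 39 | 7293 | 12150 | 61 |
| 2022 | 12 | 3186 | 6194 | 21 | 10887 | 21390 | 42 |
| 2023 | 1 | 10224 | 18215 | 41 | 13302 | 21612 | 89 |
| 2023 | 2 | 10604 | 22883 | 52 | 9103 | 19781 | 72 |
| 2023 | 3 | 65953 | 78173 | 30 | 13566 | 34256 | 57 |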
Statistics for Bifidobacterium
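Swings of this size are roughly what sampling noise alone predicts. As a rough illustration, the sketch below computes the standard error of an average (standard deviation divided by the square root of the sample size) for a few sample sizes. It reuses the post-2022-06-01 summary figures quoted earlier; the sample sizes are hypothetical, and treating samples as independent is an assumption.

```python
import math

# Post-2022-06-01 summary figures quoted earlier in the post
mean, sd = 26111, 72700

def standard_error(sd, n):
    """Standard error of an average of n independent samples."""
    return sd / math.sqrt(n)

rows = []
for n in (10, 20, 50, 100, 400):  # hypothetical sample sizes
    se = standard_error(sd, n)
    rows.append((n, se, 100 * se / mean))
    print(f"n = {n:>3}: standard error = {se:>6.0f} ({100 * se / mean:.0f}% of the average)")
```

Under these assumptions, a month with 20 samples has a standard error of roughly 60% of the average, so large month-to-month jumps are expected even when nothing has changed.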

My conclusion is that you need three things to get good results:

  • All of the samples should be processed by the same lab at the same time. Different batches of reagents may cause different results.
  • You need good sample sizes, at least 100.
  • You need to be very, very careful not to cherry-pick data (example below).

An example from the Thryve/Ombre data above: in February 2021, with a sample size of 30, the average was 101600. Later, in December 2022, a sample size of 21 reported just 3186. A cherry-picked conclusion: going back to school caused family bifidobacterium to tank!
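A small simulation shows how easily this kind of cherry-picking produces a dramatic story. In the sketch below, every month is drawn from the same distribution, so any gap between the highest and lowest month is pure noise. The log-normal shape, the 30 samples per month and the seed are all assumptions chosen for illustration; only the average and standard deviation come from the summary figures quoted earlier, and 33 matches the number of months in the table.

```python
import math
import random

def simulate_monthly_means(months=33, n=30, mean=26111, sd=72700, seed=1):
    """Monthly averages when every month comes from the SAME distribution."""
    # Assumption: counts are log-normal, matched to the quoted average and std dev
    sigma2 = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    sigma = math.sqrt(sigma2)
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.lognormvariate(mu, sigma) for _ in range(n)) / n
        for _ in range(months)
    ]

means = simulate_monthly_means()
highest, lowest = max(means), min(means)
print(f"highest month: {highest:.0f}, lowest month: {lowest:.0f}, "
      f"ratio: {highest / lowest:.1f}")
```

Picking the highest and lowest month after the fact and comparing only those two is exactly the trap described above: the gap looks meaningful even though no real change was simulated.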

On the sample size of 100 issue: