A reader messaged this to me:
People like (and expect) absolutely certain answers. Statisticians and Artificial Intelligence Engineers NEVER expect absolute answers… they expect fuzzy answers and just work with them.
There are two problems with determining levels from the microbiome:
- Correct identification of the bacteria from the test
- Determining whether the bacteria produces the substance
Accurate Bacteria Identification is HORRIBLE
For a basic understanding see these prior posts:
- The taxonomy nightmare before Christmas…
- The taxonomy nightmare — Episode II
- The taxonomy nightmare before Christmas… Episode III
So with the same data files (FASTQ), the number of bacteria reported as producing each of the above will vary greatly depending on which lab identifies them.
Most Labs use INFERENCE data to identify producers
By inference, I mean looking at a sample with high butyrate, etc., and then (stupidly) looking at the bacteria that your specific lab identifies there (see above!) and then publishing a paper claiming that high X bacteria produces butyrate, etc. Even asserting an alleged “sterile” environment with only one bacteria is questionable (besides, the behavior of a bacteria placed in “extended isolated confinement” is different, as it is with humans). Not many years ago, breast milk was deemed to be sterile. Improved testing resulted in this myth being disproved in: Lactobacillus Bacteria in Breast Milk; Breast Milk, a Source of Beneficial Microbes and Associated Benefits for Infant Health; and Characterization of potentially probiotic lactic acid bacteria and bifidobacteria isolated from human colostrum.
These inference studies are used by many labs to determine the amount being produced.
Microbiome Prescription Uses Genetics
Is the bacteria capable of producing the chemical? Surprise, surprise, surprise… this list disagrees with the inference studies above. We still have the challenge of lab reports misidentifying the bacteria. Which is more reliable? Well, with genetics, we do not know whether the production process is turned on or not. We do know which bacteria are incapable of producing the chemical.
The Wish List
I have tossed this request over to a person that has the academic skills (and creativity) to explore. Take the FASTQ files and the data from KEGG: Kyoto Encyclopedia of Genes and Genomes and see if you can determine the amount of genetic material producing each of these products. We totally side-step the key point of failure — identifying the bacteria!
That is, create software that takes in FASTQ files and provides estimates for all of the applicable enzymes present! Remove bacteria naming from the process.
Possible software includes: Piphillin, Tax4Fun, PICRUSt2, PICRUSt
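To make the wish concrete, here is a minimal Python sketch of the shape such software might take. It is not how any of the tools listed above work, and it is not an existing program. It assumes that some external aligner has already matched individual reads against KEGG gene sequences and handed back a read-to-KEGG-Orthology table; that table, the function names, and the `KO_A` / `KO_B` identifiers are all hypothetical placeholders. The point is only that no bacteria name appears anywhere in the process.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from standard 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        sequence = next(it, "").strip()
        next(it, "")  # '+' separator line, ignored
        next(it, "")  # quality line, ignored in this sketch
        # The read id is the first token after the leading '@'.
        yield header[1:].split()[0], sequence


def enzyme_profile(
    reads: Iterable[Tuple[str, str]],
    read_to_ko: Dict[str, str],
) -> Dict[str, float]:
    """Return the fraction of all reads assigned to each KEGG Orthology id.

    read_to_ko is ASSUMED to come from an external alignment step;
    reads without a hit simply count toward the total.
    """
    counts: Counter = Counter()
    total = 0
    for read_id, _sequence in reads:
        total += 1
        ko = read_to_ko.get(read_id)
        if ko is not None:
            counts[ko] += 1
    if total == 0:
        return {}
    return {ko: n / total for ko, n in counts.items()}


if __name__ == "__main__":
    # Tiny made-up FASTQ content and made-up aligner hits, for illustration only.
    demo_fastq = [
        "@r1 sample", "ACGT", "+", "IIII",
        "@r2", "GGCC", "+", "IIII",
        "@r3", "TTAA", "+", "IIII",
        "@r4", "CCGG", "+", "IIII",
    ]
    demo_hits = {"r1": "KO_A", "r2": "KO_A", "r3": "KO_B"}
    for ko, share in sorted(enzyme_profile(read_fastq(demo_fastq), demo_hits).items()):
        print(ko, share)
```

The result is a share of genetic material per enzyme family. As noted earlier, this says what the sample is capable of producing, not whether the production process is turned on.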
You want to be rich. So you look at the rich and see expensive cars, big homes, trophy spouses, etc. So you deduce that you just need to have those and you will become rich!! After all, there is a strong statistical association!! The alternative is to look at wealth production (the genes), and you see a different picture: high-yield stocks, inherited money, professional licenses, etc. It is the same with looking at what bacteria produce.