The data is available at http://citizenscience.microbiomeprescription.com/. I wrote some tutorials a few years ago; they are linked below:
- OpenSource Microbiome Project
- Uploading ncbi hierarchy data
- Computing Statistics
- Non Parametric Detection
For an example of checking an artificial intelligence application, see Cross Validation of AI Suggestions for Nonalcoholic Fatty Liver Disease.
Some challenges for the reader:
- Compute the percentile for each bacterium (taxon) among samples from the same lab
- Test whether the data follows a normal distribution
- Run regressions between different taxa (bacteria) using:
- Raw Counts
- Which gives stronger results?
- Reframe this using random forest and other ML techniques.
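As a starting point for the first two challenges, here is a minimal sketch in Python. It assumes the data has been loaded into a pandas DataFrame with one row per sample and taxon; the column names `lab`, `taxon` and `count` are hypothetical, and the synthetic numbers below only stand in for the real download.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption: one row per sample and taxon).
# The column names "lab", "taxon" and "count" are hypothetical.
n = 200
df = pd.DataFrame({
    "lab": rng.choice(["LabA", "LabB"], size=n),
    "taxon": 1492,
    "count": rng.lognormal(mean=5.0, sigma=1.5, size=n),
})

# Percentile rank of each count among samples of the same taxon from the same lab.
df["percentile"] = df.groupby(["lab", "taxon"])["count"].rank(pct=True) * 100

# Shapiro-Wilk test: a small p-value is evidence against normality.
stat, p_value = stats.shapiro(df["count"])
print(f"Shapiro-Wilk W={stat:.3f}, p={p_value:.3g}")
print(df.head())
```

Shapiro-Wilk is only one of several normality tests; it is used here because it is readily available in SciPy, not because the original tutorials prescribe it.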
Then compare your results to those shown here for Clostridium butyricum (taxon 1492).
When you use percentiles, what is the name of the resulting distribution? Why is that an advantage?
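The regression challenge above can be sketched in the same way. This example fits a simple linear regression between two taxa twice, once on raw counts and once on percentile ranks, and reports R² for each. The data is synthetic and deliberately non-linear, and the names `taxon_a`, `taxon_b` and `to_percentile` are made up for illustration, so the numbers say nothing about what the real data will show.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic counts for two taxa with a monotonic but non-linear relationship.
# These arrays are illustrative only; replace them with real counts per sample.
n = 300
taxon_a = rng.lognormal(mean=4.0, sigma=1.0, size=n)
taxon_b = taxon_a ** 3 * rng.lognormal(mean=0.0, sigma=0.3, size=n)


def to_percentile(values):
    """Convert values to percentile ranks in the range (0, 100]."""
    return stats.rankdata(values) / len(values) * 100


# Ordinary least-squares fit on raw counts and on percentile ranks.
raw_fit = stats.linregress(taxon_a, taxon_b)
pct_fit = stats.linregress(to_percentile(taxon_a), to_percentile(taxon_b))

r2_raw = raw_fit.rvalue ** 2
r2_pct = pct_fit.rvalue ** 2
print(f"R^2 on raw counts:  {r2_raw:.3f}")
print(f"R^2 on percentiles: {r2_pct:.3f}")
```

With real data, the percentile ranks should be computed within each lab before regressing, as in the first challenge.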
The model was built using Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) for tuning. I recently did a Cross Validation of AI Suggestions for Nonalcoholic Fatty Liver Disease. Validation requires many studies that each try different approaches for a given condition.
Identify bacterial shifts associated with the reported symptoms.
- Do for all labs first
- Filter by individual labs
Which approach gives better associations/models?
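One possible way to set up this comparison is sketched below. It tests whether a taxon's counts differ between samples with and without a symptom, first across all labs pooled and then within each lab. The Mann-Whitney U test is my choice for the sketch, not something the original tutorials specify; the columns `lab`, `has_symptom` and `count`, the helper names, and the synthetic data (including the tenfold scale difference between labs) are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)


def make_lab(lab, scale, n=150):
    """Build synthetic samples for one lab; 'scale' mimics lab-specific reporting."""
    has_symptom = rng.random(n) < 0.4
    base = rng.lognormal(mean=np.log(scale), sigma=1.0, size=n)
    # Assumption for illustration: the symptom doubles the taxon's count.
    count = np.where(has_symptom, base * 2.0, base)
    return pd.DataFrame({"lab": lab, "has_symptom": has_symptom, "count": count})


df = pd.concat([make_lab("LabA", 100.0), make_lab("LabB", 1000.0)],
               ignore_index=True)


def shift_p_value(frame):
    """Two-sided Mann-Whitney U p-value: counts with vs. without the symptom."""
    with_symptom = frame.loc[frame["has_symptom"], "count"]
    without_symptom = frame.loc[~frame["has_symptom"], "count"]
    return stats.mannwhitneyu(with_symptom, without_symptom,
                              alternative="two-sided").pvalue


# All labs pooled together, then each lab on its own.
pooled_p = shift_p_value(df)
per_lab_p = {lab: shift_p_value(group) for lab, group in df.groupby("lab")}

print(f"pooled p-value: {pooled_p:.3g}")
for lab, p in per_lab_p.items():
    print(f"{lab} p-value: {p:.3g}")
```

Repeating this for every taxon and symptom pair means running many tests, so some correction for multiple comparisons is advisable before reading much into individual p-values.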