The data is available at http://citizenscience.microbiomeprescription.com/. I wrote some tutorials on it a few years back; they are linked below:
- OpenSource Microbiome Project
- Uploading NCBI hierarchy data
- Computing Statistics
- Non Parametric Detection
Some challenges for the reader:
- Compute the percentile of each bacterium (taxon) across samples from the same lab
- Test whether the data follow a normal distribution
- Run regressions between pairs of taxa (bacteria) using raw counts and using percentiles. Which gives stronger results?
- Reframe this using random forests and other ML techniques.
Then compare your results with those shown here for Clostridium butyricum (taxon 1492).
When you use percentiles, what is the name of the resulting distribution? Why is that an advantage?
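One way to approach the first challenge, sketched in Python under stated assumptions: the records are (lab, sample_id, taxon_id, count) tuples, a hypothetical layout rather than the actual export format, and tied counts share a mid-rank percentile. Only counts present in the data are ranked, so if a taxon's absence is not stored as an explicit zero, add those zeros first or the percentiles will only describe samples where the taxon was detected.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def percentile_within_lab(records):
    """Map (lab, sample_id, taxon_id) -> percentile in [0, 100].

    `records` is an iterable of (lab, sample_id, taxon_id, count) tuples;
    this layout is a hypothetical stand-in for the real export.
    Each count is ranked only against counts of the same taxon from the
    same lab, with ties sharing the mid-rank:
        (number below + 0.5 * number equal) / n * 100
    """
    groups = defaultdict(list)
    for lab, sample, taxon, count in records:
        groups[(lab, taxon)].append((sample, count))

    result = {}
    for (lab, taxon), items in groups.items():
        ordered = sorted(c for _, c in items)
        n = len(ordered)
        for sample, c in items:
            below = bisect_left(ordered, c)                # strictly smaller counts
            equal = bisect_right(ordered, c) - below       # ties, including itself
            result[(lab, sample, taxon)] = 100.0 * (below + 0.5 * equal) / n
    return result


records = [
    ("LabA", "s1", 1492, 10),
    ("LabA", "s2", 1492, 30),
    ("LabA", "s3", 1492, 30),
    ("LabB", "s4", 1492, 5),
]
pct = percentile_within_lab(records)
# LabA: s1 is about 16.7, s2 and s3 (tied) about 66.7; LabB: s4 is 50.0
```

Grouping by (lab, taxon) before ranking is what keeps one lab's sequencing depth or method from shifting another lab's percentiles.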
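For the normality check, a minimal standard-library sketch is the Jarque-Bera test, swapped in here because it needs nothing beyond sample skewness and kurtosis; `scipy.stats.shapiro` (Shapiro-Wilk) is a common alternative when SciPy is available. The p-value uses the large-sample chi-square approximation, so treat it as rough for labs with few samples. The function name `jarque_bera` is illustrative.

```python
import math
import statistics


def jarque_bera(values):
    """Jarque-Bera normality test (large-sample approximation).

    Returns (statistic, p_value). Under normality the statistic is roughly
    chi-square with 2 degrees of freedom, whose survival function is
    exp(-x / 2). A small p-value suggests the data are not normal.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5        # 0 for a symmetric sample
    kurt = m4 / m2 ** 2          # 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


# Raw counts are often dominated by zeros and a few large values.
stat, p = jarque_bera([0, 0, 0, 0, 0, 0, 0, 0, 1, 500])
```

Running it on both the raw counts and the log-transformed counts of a taxon is a quick way to see how far from normal the raw data are.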
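A sketch of the raw-count versus percentile comparison for one pair of taxa measured on the same samples from one lab. It assumes simple linear regression, whose R² is the square of Pearson's r, so it just reports r both ways. Mid-rank percentiles are a linear function of mid-ranks (percentile = 100 × (rank − 0.5) / n), so r on percentiles equals Spearman's rank correlation. The helper names are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation; its square is the R^2 of a least-squares line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def mid_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def compare_raw_vs_percentile(taxon_a, taxon_b):
    """Return (r on raw counts, r on percentiles) for paired samples."""
    raw_r = pearson_r(taxon_a, taxon_b)
    pct_r = pearson_r(mid_ranks(taxon_a), mid_ranks(taxon_b))
    return raw_r, pct_r


a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 4, 8, 16, 1000]
raw_r, pct_r = compare_raw_vs_percentile(a, b)
# pct_r is 1.0 (same ordering); raw_r is well below 1 because of the outlier
```

The example is monotone but heavily skewed, which is typical of raw bacterial counts: a single large value drags the raw-count r down, while the percentile version only sees the ordering.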
Identify shifts in bacteria associated with reported symptoms.
- Do this for all labs combined first
- Then filter by individual lab
Which approach gives better associations and models?
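For the symptom question, a rough sketch: for each taxon, take the mean within-lab percentile in samples reporting a symptom minus the mean in samples that do not, either pooled across all labs or computed per lab. The input dictionaries and the choice of a mean-percentile difference are assumptions for illustration, not the site's method; a significance test such as Mann-Whitney U would be a natural next step.

```python
import statistics
from collections import defaultdict


def symptom_shift(percentiles, has_symptom, by_lab=False):
    """Mean percentile with the symptom minus mean percentile without it.

    percentiles: {(lab, sample_id, taxon_id): within-lab percentile}
    has_symptom: {sample_id: bool}; samples missing here are skipped.
    by_lab: False pools all labs (keyed by taxon_id);
            True keeps labs separate (keyed by (lab, taxon_id)).
    """
    # Each key maps to ([percentiles with symptom], [percentiles without]).
    groups = defaultdict(lambda: ([], []))
    for (lab, sample, taxon), pct in percentiles.items():
        if sample not in has_symptom:
            continue
        key = (lab, taxon) if by_lab else taxon
        with_s, without_s = groups[key]
        (with_s if has_symptom[sample] else without_s).append(pct)

    shifts = {}
    for key, (with_s, without_s) in groups.items():
        if with_s and without_s:  # both groups are needed for a comparison
            shifts[key] = statistics.fmean(with_s) - statistics.fmean(without_s)
    return shifts


percentiles = {
    ("LabA", "s1", 1492): 80.0,
    ("LabA", "s2", 1492): 20.0,
    ("LabB", "s3", 1492): 60.0,
    ("LabB", "s4", 1492): 40.0,
}
has_symptom = {"s1": True, "s2": False, "s3": True, "s4": False}
pooled = symptom_shift(percentiles, has_symptom)                # {1492: 40.0}
per_lab = symptom_shift(percentiles, has_symptom, by_lab=True)
```

Comparing the pooled shift for a taxon with its per-lab shifts is one way to check whether a pooled association holds up within each lab or is mostly driven by differences between labs.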