LLM / Data extraction of Microbiome Studies: A Challenge

I work with several microbiome testing companies. These companies want a reliable, automated way to extract data from studies. At the same time, I have been pinged by people claiming they can do it. My response is simple:

Show Me the Evidence

The easiest way to get that evidence is a challenge data set with the following rules:

  • All bacteria must be identified by NCBI Taxon Number (integer), not arbitrary name. The definitive source of this information is here.
    • Warning: Spelling mistakes, former names, and regional variations (e.g. color vs colour) are part of the problem.
  • The following conditions [ConditionId] should be searched for. Note that there are many alternative names in the literature.
  • The following modifiers [ModifierId] should be searched for. Note that there are many alternative names in the literature.
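
To illustrate the naming problem, here is a minimal Python sketch of resolving a name found in a paper to an NCBI taxon number. The lookup table, the function name `resolve_taxon`, and the similarity cutoff are all hypothetical choices for illustration; a real pipeline would load the full NCBI taxonomy names (including synonyms) rather than a hand-built dictionary, and the IDs shown should be verified against NCBI before use.

```python
import difflib

# Hypothetical, hand-built table for illustration only.
# A real pipeline would load every scientific name and synonym from NCBI.
NAME_TO_TAXON = {
    "escherichia coli": 562,
    "faecalibacterium prausnitzii": 853,
    "clostridioides difficile": 1496,
    "clostridium difficile": 1496,        # former name, same taxon
    "lacticaseibacillus rhamnosus": 47715,
    "lactobacillus rhamnosus": 47715,     # former name, same taxon
}


def resolve_taxon(raw_name, cutoff=0.85):
    """Return the taxon number for a name, or None when nothing is close enough.

    Case and extra whitespace are ignored; small spelling mistakes are
    tolerated through fuzzy matching (the cutoff is an arbitrary assumption).
    """
    key = " ".join(raw_name.lower().split())
    if key in NAME_TO_TAXON:
        return NAME_TO_TAXON[key]
    close = difflib.get_close_matches(key, list(NAME_TO_TAXON), n=1, cutoff=cutoff)
    return NAME_TO_TAXON[close[0]] if close else None
```

Calling `resolve_taxon("Faecalibacterium prausnitzi")` (one letter missing) resolves to the same number as the correctly spelled name, and an unrecognized name returns `None` so that it can be flagged for manual review instead of being guessed.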

A collection of open source studies is listed below as [CitationId], [ConditionId], [ModifierId].

The expected submission is these comma-separated (CSV) files:

  • Modifications.csv:
    • taxon
    • ModifierId
    • CitationId
    • Shift (+1 for increase, -1 for decrease, 0 for no impact)
  • Conditions.csv
    • taxon
    • ConditionId
    • CitationId
    • Shift (+1 for high, -1 for low, 0 for no shift)
  • Treatment.csv – only items that reduce the severity of the condition
    • ModifierId
    • CitationId
    • ConditionId
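
Before sending files, a submission could be checked against the layout above. The following is a minimal Python sketch, assuming each file has a header row with exactly the column names listed, that taxon is an integer, and that Shift is one of +1, -1, or 0. The function name `validate_rows` and the error wording are hypothetical; the format of the Id columns is not specified here, so they are only checked for being non-empty.

```python
import csv
import io

# Column layout taken from the list above; a header row is an assumption.
EXPECTED_COLUMNS = {
    "Modifications.csv": ["taxon", "ModifierId", "CitationId", "Shift"],
    "Conditions.csv": ["taxon", "ConditionId", "CitationId", "Shift"],
    "Treatment.csv": ["ModifierId", "CitationId", "ConditionId"],
}


def validate_rows(filename, handle):
    """Return a list of error messages for one submission file (empty if clean)."""
    expected = EXPECTED_COLUMNS[filename]
    reader = csv.DictReader(handle)
    if reader.fieldnames != expected:
        return [f"{filename}: header {reader.fieldnames} does not match {expected}"]
    errors = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in expected:
            value = (row.get(column) or "").strip()
            if column == "taxon" and not value.isdigit():
                errors.append(f"{filename} line {line_no}: taxon must be an integer")
            elif column == "Shift" and value not in {"+1", "1", "-1", "0"}:
                errors.append(f"{filename} line {line_no}: Shift must be +1, -1 or 0")
            elif not value:
                errors.append(f"{filename} line {line_no}: {column} is empty")
    return errors


if __name__ == "__main__":
    # Made-up rows, purely to show the call; the second row has two problems.
    sample = io.StringIO("taxon,ModifierId,CitationId,Shift\n853,12,34,+1\nabc,12,34,2\n")
    for message in validate_rows("Modifications.csv", sample):
        print(message)
```

The same function works on real files by passing `open("Conditions.csv", newline="")` as the handle.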

The Source Files

Additional Data that would be desired:

  • Experimental Subject (e.g. human, mice, zebrafish, etc.)
    • Qualification (e.g. diabetes, obese, Parkinson's, etc.)
  • Significance (e.g. P < 0.05, P < 0.0001)
  • Sample Size
  • Statistical Method Used
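
As a small illustration of the Significance item, here is a Python sketch that pulls reported P values out of a sentence with a regular expression. The pattern and the function name `find_p_values` are hypothetical and cover only plain decimal forms such as "P < 0.05" or "p=0.0001"; studies also report significance in tables, in scientific notation, or as asterisks, which this sketch does not handle.

```python
import re

# Matches "P < 0.05", "p=0.0001", "P <= .01"; decimal notation only (an assumption).
P_VALUE_PATTERN = re.compile(r"\bp\s*(<=|>=|<|>|=)\s*(\d*\.\d+)", re.IGNORECASE)


def find_p_values(text):
    """Return (operator, value) pairs for each P value reported in the text."""
    return [(op, float(num)) for op, num in P_VALUE_PATTERN.findall(text)]


if __name__ == "__main__":
    # Made-up sentence, purely to show the call.
    sentence = "Akkermansia increased (P < 0.05); Prevotella decreased (p=0.0001)."
    print(find_p_values(sentence))
```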

Submitting Results

Email the files to Research@MicrobiomePrescription.com. A quote per 1000 documents should be included. The results will be forwarded to the companies that I am working with.