Large Language Models vs Expert Systems for Medicine and the Microbiome

I tried out a new medical Large Language Model (LLM) API, https://www.drgupta.ai/, to see what it would suggest for a simple microbiome shift, compared with what the Expert System (ES) AI (not an LLM) does on Microbiome Prescription. The first significant Expert System for medicine was MYCIN, developed at Stanford University in 1972.

The main business difference between LLMs and ESs is where the costs fall. LLMs incur heavy computing costs; ESs incur heavy human costs. An ES often requires article-by-article review by a suitably skilled reader (for example, a grad student). LLMs are easy targets for venture capital funding, often on the strength of pie-in-the-sky beliefs, sold to venture capitalists, about how easy the work will be. Earlier I had several sessions with folks at the Allen Institute for Artificial Intelligence exploring machine analysis of the literature. Their conclusion was that existing AI is still incapable of good analysis of medical and clinical studies.

LLM Report

I tossed a simple, three-taxa problem at the API. In reality, we typically see 40-80 taxa of interest.

Expert System Response

Compared to:

We do not list vague “probiotics” but provide their specific names and dosages. We also include probiotics to avoid.

Suggested probiotic dosages come with links to sources.

So do the suggestions themselves: the studies used to make them are named and linked.


Nothing in the LLM response indicates that Pectobacteriaceae was considered. The Expert System does consider it:

And provides background

Bottom Line

The intent of this post was to contrast LLM and ES systems. Both can have data entry and text interpretation issues. With a suitable ES, the suggestions can be audited and issues addressed. With LLMs, this does not seem to be available in the common code bases.
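To make "audited" concrete, here is a minimal sketch of how a rule-based system can carry its sources along with every suggestion. It is an illustration only, not the code behind any real system: the Rule fields, the taxon and probiotic names, and the example.org links are all hypothetical placeholders, not real findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    taxon: str      # taxon the rule applies to
    direction: str  # "high" or "low" relative to a reference range
    action: str     # "take" or "avoid"
    item: str       # the probiotic (or other modifier) being suggested
    source: str     # link to the study a human reviewer extracted the rule from


# Hypothetical rules: names and URLs are placeholders, not real findings.
RULES = [
    Rule("Taxon A", "high", "take", "Probiotic X", "https://example.org/study-1"),
    Rule("Taxon A", "high", "avoid", "Probiotic Y", "https://example.org/study-2"),
    Rule("Taxon B", "low", "take", "Probiotic X", "https://example.org/study-3"),
]


def suggest(shifts):
    """Map each (action, item) suggestion to the list of sources that triggered it.

    `shifts` is a dict such as {"Taxon A": "high"}.
    """
    suggestions = {}
    for rule in RULES:
        if shifts.get(rule.taxon) == rule.direction:
            suggestions.setdefault((rule.action, rule.item), []).append(rule.source)
    return suggestions


report = suggest({"Taxon A": "high", "Taxon B": "low"})
for (action, item), sources in report.items():
    # Every line of output can be traced back to specific studies.
    print(action, item, "<-", ", ".join(sources))
```

Because each suggestion is just the set of rules that fired, a wrong suggestion can be traced to a specific rule and study and corrected there. A taxon with no rules simply produces no suggestion, rather than a plausible-sounding guess.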

The table below from PubMedQA shows that accuracy is barely over 80% for the best models out there. Expert systems can exceed 95%. Is 80% “good enough” for treating patients, especially if the ability to audit the logic is missing? (By audit, I mean providing links to all source studies.)