Research As A Service (RAAS)

This collection of pages describes the commercial offering of selected data as a service.

An overview of the issues to consider if you roll your own.

Citation Extraction Process

This video demonstrates the process, run at least once a month, that keeps the data current. The process is not intended to discover every possible citation, only a very high percentage of them.

Citations Overview

Citation information is usually given as shown below, here for an item that modifies a bacterium. The core fields are always:

  • id – our internal numeric identifier
  • name – the common name we use; a separate API returns alternative names
  • doi – the Digital Object Identifier (DOI) of the source
  • url – the URL of the source or of a summary. Some URLs may be behind paywalls or may have moved.

    {
        "id": 40,
        "name": "berberine",
        "citations": [
            {
                "id": 143,
                "url": "",
                "doi": "0161903/AIM.008",
                "logic": "Direct",
                "impact": 1
            },
            {
                "id": 283,
                "url": "",
                "doi": "10.3892/mmr.2017.6321",
                "logic": "Direct",
                "impact": -1
            }
        ]
    }

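The Python sketch below loads a record like the one above into typed objects using only the standard library. The names `Citation`, `Modifier`, `parse_modifier` and `link` are hypothetical and are not part of any published API.

The `link` helper assumes that an empty `url` should fall back to the public https://doi.org/ resolver. Every valid DOI begins with `10.`, so the helper skips values that do not, such as the first DOI in the sample.

```python
import json
from dataclasses import dataclass
from typing import List

# Sample record reproduced from the example above (item 40, berberine).
SAMPLE = """
{
  "id": 40,
  "name": "berberine",
  "citations": [
    {"id": 143, "url": "", "doi": "0161903/AIM.008", "logic": "Direct", "impact": 1},
    {"id": 283, "url": "", "doi": "10.3892/mmr.2017.6321", "logic": "Direct", "impact": -1}
  ]
}
"""


@dataclass
class Citation:
    id: int
    url: str
    doi: str
    logic: str
    impact: int

    def link(self) -> str:
        """Prefer the stored url; otherwise (assumption) build a doi.org link."""
        if self.url:
            return self.url
        # All valid DOIs start with the "10." directory indicator.
        if self.doi.startswith("10."):
            return "https://doi.org/" + self.doi
        return ""


@dataclass
class Modifier:
    id: int
    name: str
    citations: List[Citation]


def parse_modifier(text: str) -> Modifier:
    """Parse one item record; missing url/doi fields default to empty strings."""
    raw = json.loads(text)
    cites = [
        Citation(
            id=c["id"],
            url=c.get("url", ""),
            doi=c.get("doi", ""),
            logic=c["logic"],
            impact=c["impact"],
        )
        for c in raw.get("citations", [])
    ]
    return Modifier(id=raw["id"], name=raw["name"], citations=cites)


if __name__ == "__main__":
    item = parse_modifier(SAMPLE)
    for c in item.citations:
        print(item.name, c.id, c.logic, c.impact, c.link())
```

Fields are read explicitly, so extra keys in a real response are ignored rather than causing an error.
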
Modifier Context

For a bacterium whose level is reported as too high or too low, citations may include:

  • Impact:
    • 1 – shifts towards medium
    • -1 – shifts away from medium
    • 0 – no impact (rarely reported)
  • Logic: indicates how the citation connects to the bacterium
    • Direct (C) – the bacterium is explicitly Cited in the source
    • Children Impacted (D) – its Descendants (the layers below it in the hierarchy) are impacted
    • Parent Impacted (P) – the hierarchy layer above the bacterium is impacted
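The Python sketch below shows one way client code might represent the Impact and Logic codes. Only the string `"Direct"` appears in the sample JSON. The long names `"Children Impacted"` and `"Parent Impacted"` are assumed from the labels above, and so is accepting the one-letter codes. All names in the sketch are hypothetical.

```python
from enum import Enum, IntEnum


class Impact(IntEnum):
    TOWARDS_MEDIUM = 1     # shifts towards medium
    AWAY_FROM_MEDIUM = -1  # shifts away from medium
    NONE = 0               # no impact (rarely reported)


class Logic(Enum):
    DIRECT = "C"             # bacterium explicitly Cited in the source
    CHILDREN_IMPACTED = "D"  # Descendants (layers below) impacted
    PARENT_IMPACTED = "P"    # layer above the bacterium impacted


# ASSUMPTION: only "Direct" is confirmed by the sample; the other two
# strings are guessed from the bullet labels.
LOGIC_NAMES = {
    "Direct": Logic.DIRECT,
    "Children Impacted": Logic.CHILDREN_IMPACTED,
    "Parent Impacted": Logic.PARENT_IMPACTED,
}


def parse_logic(value: str) -> Logic:
    """Accept a long name or a one-letter code; raise ValueError otherwise."""
    if value in LOGIC_NAMES:
        return LOGIC_NAMES[value]
    return Logic(value)
```

Using `IntEnum` for Impact keeps the values usable as plain integers in arithmetic, for example when weighting citations.
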

The above applies only to the lowest levels of the taxonomy hierarchy, as shown in the table below. Note that some items impact only some species of a genus: a study may report that the genus changes, but we cannot be certain that every species in it is impacted. To reflect this uncertainty, our internal algorithm reduces the expected impact of citations whose logic is not Direct.

                  | Strain     | Species, Subspecies, species group, species subgroup | Genus, Subgenus
Impact on Strain  | C – Direct | P                                                    |
Impact on Species | D          | C – Direct                                           | P
Impact on Genus   |            | D                                                    | C – Direct
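The table can be read as a lookup. In this reading, which is our interpretation inferred from the Logic definitions above, each row is the rank of the bacterium being scored and each column is the rank the citation reports.

The Python sketch below encodes the table and discounts non-Direct logic. The factors in `WEIGHTS` are purely hypothetical placeholders, because the internal algorithm's actual values are not given. `logic_code` and `expected_impact` are illustrative names only.

```python
from typing import Optional

# Map each rank named in the table onto its column group.
RANK_GROUP = {
    "strain": "strain",
    "species": "species",
    "subspecies": "species",
    "species group": "species",
    "species subgroup": "species",
    "genus": "genus",
    "subgenus": "genus",
}

# Row = rank of the bacterium being scored; column = rank reported by the
# citation (assumed reading). Empty cells in the table are absent here.
LOGIC_TABLE = {
    "strain": {"strain": "C", "species": "P"},
    "species": {"strain": "D", "species": "C", "genus": "P"},
    "genus": {"species": "D", "genus": "C"},
}

# HYPOTHETICAL discount factors; the real internal weights are not published.
WEIGHTS = {"C": 1.0, "D": 0.5, "P": 0.5}


def logic_code(target_rank: str, cited_rank: str) -> Optional[str]:
    """Return 'C', 'D' or 'P' from the table, or None for an empty cell."""
    row = RANK_GROUP.get(target_rank.lower())
    col = RANK_GROUP.get(cited_rank.lower())
    if row is None or col is None:
        return None
    return LOGIC_TABLE[row].get(col)


def expected_impact(impact: int, target_rank: str, cited_rank: str) -> float:
    """Scale a citation's impact (1, 0 or -1), discounting non-Direct logic."""
    code = logic_code(target_rank, cited_rank)
    if code is None:
        return 0.0
    return impact * WEIGHTS[code]
```

Family is intentionally absent from `RANK_GROUP`, since the text uses it only occasionally when data is missing. Such ranks return `None` and contribute nothing.
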

In a few cases where data is absent, we may extend this to Family, following the maxim “Poor data is better than no data”.


For PubMed items, C# core sample code on GitHub lets you obtain a summary of each article. Note that the citation information may come from any of the following:

  • The Summary
  • The full article
  • Appendices to the full article (for example CSV files, Excel files, Word documents, etc.)
  • Charts or tables (rendered as images) from the article.
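The C# sample mentioned above is not reproduced here. As an alternative sketch in Python, NCBI's public E-utilities `esummary` endpoint returns document summaries for a list of PubMed IDs. These summaries contain the title, journal, dates and similar metadata, but not the abstract or full text.

The helper names are hypothetical. `esummary_url` and `titles_from_esummary` work offline; only `fetch_titles` needs network access. NCBI limits request rates (3 per second without an API key).

```python
import json
from typing import Dict, Iterable
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmids: Iterable[int]) -> str:
    """Build an esummary request for PubMed IDs, asking for JSON output."""
    query = urlencode({
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "json",
    })
    return EUTILS + "?" + query


def titles_from_esummary(payload: dict) -> Dict[str, str]:
    """Map each returned PubMed ID (UID) to its article title."""
    result = payload.get("result", {})
    return {
        uid: result[uid].get("title", "")
        for uid in result.get("uids", [])
        if uid in result
    }


def fetch_titles(pmids: Iterable[int]) -> Dict[str, str]:
    """Network call: fetch summaries and extract titles."""
    with urlopen(esummary_url(pmids), timeout=30) as resp:
        return titles_from_esummary(json.load(resp))
```

Citation data may come from full text, appendices or images, so a summary like this will not always contain the finding that a citation is based on.
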