Understanding the impact of your medicines

I just pushed out an update on http://microbiomeprescription.azurewebsites.net/ that may help you understand what various prescription drugs, over-the-counter drugs and some supplements may be doing to your microbiome.

Select any of the links highlighted below

The next page will show some choices at the top:

Compare Impact

This is intended to help you choose between alternatives, for example Aspirin versus Paracetamol (acetaminophen). I am sure people will find more uses for it.

The process is simple: search for each item and put a check beside it. Select the Compare Impact radio button and then click the submit button below it.

This will take you to a page listing the impacts side by side. In this case we see that their impacts are similar, but differ on a few items. At the family level there are a few differences.

If a family that is important to you is shifted the wrong way, you may wish to consider the better alternative.

Compensate

This is intended for when you are prescribed drugs to treat some condition and wish to reduce their impact on the microbiome by counteracting the shifts that the drug or drugs cause.

For this example, we pick lovastatin (a statin) and famotidine (Pepcid AC).

We may wish to first see how much impact they have together (do they reinforce or counteract each other?).

Bad news: they reinforce each other in decreasing many families.

To get suggestions, just press back, change the radio button selection and click submit.

The suggestions are done by creating a virtual microbiome report based on the above shifts and running that through our AI engine.

The suggestion page is the new format with the long lists hidden until you ask to see them.

The Take or Avoid list defaults to 100 items (which is one reason that I toggle visibility). Remember: none of these items are guaranteed to work, nor do you need to take all of them. Each item increases your odds.

The avoid list values are a lot higher, and thus you may wish to reduce any of these items that you are taking.

Automatic Upload and Login from 3rd Party Sites

An upload from a 3rd party site may be done by posting JSON to http://microbiomeprescription.azurewebsites.net/api/upload

By uploading, you consent to allow your microbiome data and symptoms to be made available to citizen scientists for further discoveries.

The required consent is cited above. The 3rd party is responsible for obtaining consent.

Json Structure

The structure is simple:

  • The key is issued by us and identifies where the data is coming from (“source”)
  • logon and password are the authentication pair that you generate. These are used for logging on. Logon and Password should be the same for all samples from the same user (so we can display on a timeline).


{
   "key":"3rdpartyKey",
   "logon":"3rdpartyId",
   "Password":"3rdpartyPassword",
   "taxonomy":[
      {
         "taxon":2321,
         "percent":0.000304
      },
      {
         "taxon":2841,
         "percent":0.000983
      }
   ]
}

The taxonomy uses the official taxon numbers and the percentage.
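As a rough illustration, a 3rd party could assemble the document above along these lines. This is only a sketch: the helper names `build_payload` and `build_request` are made up for this example, the `Content-Type` header is an assumption (the post does not say what the endpoint expects), and the request is prepared but deliberately not sent.

```python
import json
import urllib.request

UPLOAD_URL = "http://microbiomeprescription.azurewebsites.net/api/upload"


def build_payload(key, logon, password, taxa):
    """Return the upload document as a JSON string.

    taxa is an iterable of (taxon_number, percent) pairs.
    The field names mirror the sample document in the post.
    """
    return json.dumps({
        "key": key,
        "logon": logon,
        "Password": password,
        "taxonomy": [{"taxon": t, "percent": p} for t, p in taxa],
    })


def build_request(payload):
    """Prepare (but do not send) a POST request carrying the JSON payload."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


payload = build_payload(
    "3rdpartyKey", "3rdpartyId", "3rdpartyPassword",
    [(2321, 0.000304), (2841, 0.000983)],
)
request = build_request(payload)
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Keeping the same logon and password for every sample from one user is what lets the samples line up on a timeline, so a 3rd party would normally store that pair per user rather than regenerate it.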

Logon

On your site, create a page that does a post to /email/logon3rd with two elements:

<form method="post" action="http://microbiomeprescription.azurewebsites.net/email/logon3rd">
<input type="hidden" name="logon" value="whatever" />
<input type="hidden" name="password" value="whatever" />
<input type="submit" value="Logon to MicrobiomePrescription" />
</form>

Atlas Bio Upload Notes

The report file reports only at the strain level; no genus or family levels are given. These sum to 100%. The smallest resolution appears to be 0.02%, that is, 1 in 5,000 bacteria. This is a much lower resolution than other providers (1 in 160,000 is seen in some other reports with a good sample). There is something odd about a large number of bacteria being at 0.02 or 0.04 percent.

While different strains are identified, the naming does not match the official NCBI names.

It appears that FASTQ downloads from them (alleged to be available if requested) are the preferred way to get better data.

One bacteria was listed as: "(Bifidobacterium catenulatum/Bifidobacterium gallicum/Bifidobacterium kashiwanohense/Bifidobacterium pseudocatenulatum)"

which more current tests report as 4 different strains.

Bottom Line: Won’t Do

There are too many problems with the data. I have spent almost an entire day fighting it. If they provide a FASTQ file, I have an upload for files processed through the SequentiaBiotech web site.

To use their CSVs:

  • They must provide the official Taxon Numbers in the Excel File
  • They must provide the full hierarchy with numbers at each level

Without those, their data will pollute the existing contributed base too much. There are no acceptable kludge arounds for these defects.

Very fine tuning of suggestions to modify the microbiome

A reader reported a bug on the page for manually selecting which taxa to modify, which I did a while back. As a result, I smoothed the flow and also realized that I need to allow a wider search for modifiers, since the bacteria selected may have few modifiers, especially when a species or strain is selected.

Quick Recap on how to Manually Select bacteria taxa

This is done via the “My Biome View” button from the Samples Page.

On this page, you can select which taxa you wish to have included in the suggestions. You make the choices entirely.

Click the button and you will return to the Sample Page and a new button will appear:

This takes you to the usual custom suggestion page (except most of the bacteria filters are hidden — after all, you have hand picked them!)

Note the new choices with emojis!

Parent and children Modifiers are being added

Bacteria are reported in a hierarchy: the next level up is the parent, the next level down are the children.

Above you see that the Parent of Oscillibacter is Oscillospiraceae. Things reported to modify the parent will likely modify its children. The key word is likely. A modifier may reduce Marseillibacter but not Oscillibacter; we lack the studies.

Similarly, the three children of Oscillibacter may have items that modify them. One would expect that if something reduces one of the children then Oscillibacter would be reduced too! Again the key word is likely.

On the Reference pages for Bacteria, you will now see three icons

If we have 10 or more modifiers documented to change this bacteria, we do not include the parent or the children. If we have fewer than 10, then we include this additional information (giving it a reduced weight reflecting the greater uncertainty).
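The rule above can be sketched roughly as follows. This is not the site's actual code: the function name, the data layout and the value of the reduced weight (0.5 here) are all assumptions made for illustration; only the threshold of 10 comes from the text.

```python
MIN_DIRECT_MODIFIERS = 10   # threshold stated in the post
RELATIVE_WEIGHT = 0.5       # hypothetical reduced weight for parent/child evidence


def collect_modifiers(taxon, modifiers, parent_of, children_of):
    """Return {modifier: weight} for a taxon.

    modifiers:   {taxon: set of modifier names documented for it}
    parent_of:   {taxon: parent taxon}
    children_of: {taxon: list of child taxa}
    """
    direct = modifiers.get(taxon, set())
    weighted = {name: 1.0 for name in direct}
    if len(direct) >= MIN_DIRECT_MODIFIERS:
        return weighted  # enough direct evidence; relatives are not consulted

    relatives = list(children_of.get(taxon, []))
    if taxon in parent_of:
        relatives.append(parent_of[taxon])
    for relative in relatives:
        for name in modifiers.get(relative, set()):
            # An item already documented directly keeps its full weight.
            weighted.setdefault(name, RELATIVE_WEIGHT)
    return weighted


# Made-up data: "item A" etc. and "child 1" are placeholders, not real entries.
modifiers = {
    "Oscillibacter": {"item A"},
    "Oscillospiraceae": {"item A", "item B"},
    "child 1": {"item C"},
}
parent_of = {"Oscillibacter": "Oscillospiraceae"}
children_of = {"Oscillibacter": ["child 1"]}
weights = collect_modifiers("Oscillibacter", modifiers, parent_of, children_of)
```

In this sketch an item that shows up both directly and via a relative keeps the higher, direct weight; summing the two would be another reasonable design choice.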

Often you may see that the same item appears at two levels.

Bottom Line

Do not ask me what is the right way to get suggestions. We do not know. What I can provide are tools that will generate suggestions in a logical manner. This latest addition extends the prior choices by:

  • Allowing you to hand pick the taxa from your sample, one by one
  • Allowing you to extend the list of modifiers by asking to include items that modify its children (e.g. Lactobacillus fermentum CQPC04 modifiers would be added to Lactobacillus fermentum) and/or its parent (e.g. Lactobacillus) if we have sparse information on how to modify it.

I expect only a few people will make use of this; but it is there if you want absolute control.

Exporting data for DataScience

For over a year I have made donated data available at: http://lassesen.com/ubiome/ . I would hope that anyone using an open source software project would also make their data open.

This post deals with exporting the taxon, continuous and category data to a CSV file format suitable for importing to R or Python for data exploration. The program code is simple (with all of the work done in the shared library).

// Point the shared library at the database to export from.
DataInterfaces.ConnectionString = "Server=LAPTOP-BQ764BQT;Database=MicrobiomeV2;Trusted_Connection=True; Connection Timeout=1000";
// Each call fetches one flat table; ToCsvString() supplies the lines written to the file.
File.WriteAllLines("DataScience_Taxon.csv", DataInterfaces.GetFlatTaxonomy("species").ToCsvString());
File.WriteAllLines("DataScience_Continuous.csv", DataInterfaces.GetFlatContinuous().ToCsvString());
File.WriteAllLines("DataScience_Category.csv", DataInterfaces.GetFlatCategory().ToCsvString());
File.WriteAllLines("DataScience_LabReport.csv", DataInterfaces.GetLabReport().ToCsvString());

Examples of the resulting files are shown below:

Source at:

Library update is shown below https://github.com/Lassesen/Microbiome2/tree/master/DataScienceExport

Sharing Data

The last post dealt with importing public data from my web site. My hope is that there will be open sourcing of data between sites derived from my open source code base.

With donated data, there are a few items that we need to make sure we do not share. We should not share:

  • Time series data for any person, so dates are always the export date.
  • Any apparent way to connect one sample to another.

We do the export and import by writing or reading XML via DataSets. We convert all Ids in the source to GUIDs to prevent accidentally mixing source and destination IDs. This also masks sequence information, a desired characteristic for sharing data.
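To make the idea concrete, here is a minimal sketch of the masking step. It is not the exporter's code (that lives in the C# library and SQL); the field names `SampleId`, `SampleDate` and `SampleGuid` and the function `anonymize` are hypothetical.

```python
import uuid
from datetime import date


def anonymize(samples, export_date=None):
    """Swap integer ids for random GUIDs and stamp every row with the export date.

    samples: list of dicts with 'SampleId' and 'SampleDate' keys (hypothetical names).
    Other fields are copied through unchanged.
    """
    export_date = export_date or date.today()
    guid_for = {}  # source id -> GUID, so a repeated id maps to the same GUID
    exported = []
    for row in samples:
        if row["SampleId"] not in guid_for:
            guid_for[row["SampleId"]] = str(uuid.uuid4())
        out = {k: v for k, v in row.items() if k not in ("SampleId", "SampleDate")}
        out["SampleGuid"] = guid_for[row["SampleId"]]
        out["SampleDate"] = export_date.isoformat()  # hides the real timeline
        exported.append(out)
    return exported


# Made-up rows for illustration.
source = [
    {"SampleId": 1, "SampleDate": "2019-03-01", "LabTestId": 1},
    {"SampleId": 2, "SampleDate": "2019-06-15", "LabTestId": 1},
]
shared = anonymize(source, export_date=date(2020, 1, 1))
```

Because the GUIDs are random rather than derived from the ids, consecutive ids in the source leave no trace in the export, which is the sequence masking mentioned above.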

A key factor is the new SyncGuid column, which provides a unique identifier across the multiple sites for a particular lab or report.

The C# code is very simple (the SQL does go thru some complexity to get the export data matching the import data from my last post).

static void Main(string[] args)
{
    var exportName = "some site";
    var filename = "MyExport";
    if (args.Length > 0) exportName = args[0];
    if (args.Length > 1) filename = args[1];
    DataInterfaces.ConnectionString = "Server=LAPTOP-BQ764BQT;Database=MicrobiomeV2;Trusted_Connection=True; Connection Timeout=1000";
    var schemaFile = new FileInfo($"{filename}Schema.xml");
    var exportFile = new FileInfo($"{filename}.xml");
    var export = DataInterfaces.Export(exportName);
    export.WriteXmlSchema(schemaFile.FullName);
    export.WriteXml(exportFile.FullName);
}

Source for C#: https://github.com/Lassesen/Microbiome2/tree/master/Export

Source for DB Schema: https://github.com/Lassesen/Microbiome2/tree/master/DBV2

Populating with Open Data

I have done open sharing of data at http://lassesen.com/ubiome in formats designed for data science import and processing. In this post, I have taken some of this data and made it uploadable to our new database schema. I will also work on utilities to allow data to be exported and shared going forward. If you get data donated on your site (AND you obtain consent for sharing the data anonymously) remember to share it with other citizen scientists.

There is a series of tables that need to be exported and then imported. The contents of the export are illustrated below. Note: a Guid in the export will be converted to an Id by the import.

To deal with some issues (and to support interchange of data between people using this open source), I added two GUID columns. This means that every sample uploaded to a site, or report given to a site, is unique and may be exchanged between sites without duplicates. If the same microbiome report is uploaded to multiple sites, then there will be duplicates until you de-duplicate the data (see this earlier post).

The process is simple:

  • Bring in all of the reference data (Continuous, Category, Labs, LabTests definitions) using their names as keys to prevent duplication
  • Import the lab and other results — using SyncGuid to prevent duplication.

This is done by passing the XML from one site to a stored procedure (using DataSets) and then processing it. It is important to test the code to ensure that importing the same data multiple times does not result in extra rows. The T-SQL below gives the counts on key tables.

Select 'LabResultTaxon' as TableName, Count(1) as Count from LabResultTaxon
UNION
Select 'LabResults' as TableName, Count(1) as Count from LabResults
UNION
Select 'LabTests' as TableName, Count(1) as Count from LabTests
UNION
Select 'Labs' as TableName, Count(1) as Count from Labs
UNION
Select 'OwnerReport' as TableName, Count(1) as Count from OwnerReport
UNION
Select 'ReportCategory' as TableName, Count(1) as Count from ReportCategory
UNION
Select 'ReportContinuous' as TableName, Count(1) as Count from ReportContinuous

Example: your numbers should be around these.
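The "import twice, count once" check can be tried out on a small scale. The sketch below uses an in-memory SQLite database as a stand-in for SQL Server and a simplified, hypothetical LabResults table; the real import goes through a stored procedure, so this only illustrates the principle of using SyncGuid as the de-duplication key.

```python
import sqlite3


def import_results(conn, rows):
    """Insert lab results, skipping any SyncGuid that is already present."""
    conn.executemany(
        "INSERT OR IGNORE INTO LabResults (SyncGuid, LabTestId) VALUES (?, ?)",
        rows,
    )
    conn.commit()


def row_count(conn):
    """Same idea as the count queries above, for one table."""
    return conn.execute("SELECT COUNT(1) FROM LabResults").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LabResults ("
    " LabResultsId INTEGER PRIMARY KEY,"
    " SyncGuid TEXT NOT NULL UNIQUE,"   # uniqueness is what blocks duplicates
    " LabTestId INTEGER NOT NULL)"
)
rows = [("guid-a", 1), ("guid-b", 1)]   # made-up rows
import_results(conn, rows)
first = row_count(conn)
import_results(conn, rows)              # the same data again
second = row_count(conn)
```

If `first` and `second` differ, the import is adding rows on a repeat run, which is exactly what the count query is meant to catch.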

After the import, you should recalculate core statistics for each LabTest type (see this post for a reminder). Everything has been rolled into the Library, so the program is very short (but the library code and T-SQL code are not).

var import = new DataSet();
import.ReadXmlSchema(schemaFile.FullName);
import.ReadXml(exportFile.FullName);
DataInterfaces.ConnectionString = "Server=LAPTOP-....";
DataInterfaces.Import(import);
LabTests.ComputeAllLabs(4);

Source for Importer: https://github.com/Lassesen/Microbiome2/tree/master/Import

Data Location: https://github.com/Lassesen/Microbiome2/tree/master/Import/Data

Updated SQL Server Schema: https://github.com/Lassesen/Microbiome2/tree/master/DBV2

Non Parametric Detection

As we have seen, microbiome data does NOT follow a normal (bell) curve. I struggled for almost a year trying to get statistical significance out of the data with parametric techniques. While I did get some results, they were disappointing. When I switched to a non-parametric approach, I shouted EUREKA (without becoming a streaker thru town, unlike a certain ancient Greek).

In the last post we dealt with both continuous and category factors associated with a person. In terms of my existing site, using symptom explorer you will see tables such as the one shown below, which used 4 quantiles.

In our earlier post on statistics, we saw how we can compute the quantiles for the available taxonomy. In this lesson we will use that data plus a category variable to detect significance as shown above in real time. This means that the results may change as more data is added — to me, this makes it a living research document.

First the Nerd Stuff — Moving to Libraries

For this example, I have consolidated into a library most of the key stuff from prior posts. The class diagram is below. I plan to keep expanding it with future posts.

https://github.com/Lassesen/Microbiome2/tree/master/MicrobiomeLibrary

Computing the non-parametric

This is done by selecting a LabTest (remember that technically we cannot compare uBiome numbers to XenoType numbers to Thryve numbers) and then some Category. I opted not to go down the control group to category group path because, with my donated data, it is not reliable. I opted to go down the population to category group path which, while technically less sensitive, is a reasonable approach.

We need to associate Category and Continuous Reports with Lab Results. This means just adding one new table, LabResultReport, as shown below; it links two timeline items together.

From the @LabTestId and @CategoryId we just need to select which quantile to use: did we divide the data into 3, 4, 5, 6, 7 etc. buckets? If you look at the prior post, we see that it is easy to select which one (“Q3_”, “Q4_”, etc.) is the @quantileRoot. We need one more value: @MinSamples. If we do not have a reasonable number of samples, there is almost no chance of getting significance. I usually require 4 data points per bin, so Q3_ -> 12, Q4_ -> 16, Q5_ -> 20.
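The rule of thumb for @MinSamples is simple arithmetic: the number of bins times the points wanted per bin. A small sketch (the function name `min_samples` is made up; the 4 points per bin is the figure given above):

```python
import re

POINTS_PER_BIN = 4  # the usual requirement mentioned in the text


def min_samples(quantile_root, points_per_bin=POINTS_PER_BIN):
    """Return the minimum sample count for a quantile root such as 'Q4_'."""
    match = re.fullmatch(r"Q(\d+)_?", quantile_root)
    if match is None:
        raise ValueError(f"unrecognized quantile root: {quantile_root!r}")
    return int(match.group(1)) * points_per_bin


thresholds = {root: min_samples(root) for root in ("Q3_", "Q4_", "Q5_")}
```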

Passing these numbers to a stored procedure, we get a dataset back as shown below:

  • Quantiles
    • Taxon
    • Count
    • StatisticsValue
    • StatisticsName (i.e. Q3_1,Q3_2 or Q4_1,Q4_2,Q4_3 etc)
  • User Data
    • Taxon
    • Value
  • Taxon Data
    • Taxon
    • TaxonName

The process is simply counting the data in User Data in each range and then applying some simple statistics to get P Values.
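The post does not say which statistic the library applies, so the sketch below is an assumption: it counts the category group's values in each population quantile bin and runs a chi-square goodness-of-fit test against equal expected counts (if the group looked like the whole population, its values should spread evenly across the population's quantiles). All function names are made up for illustration.

```python
import math
from bisect import bisect_right


def bin_counts(values, boundaries):
    """Count values per quantile bin; boundaries are sorted cut points (e.g. Q4_1..Q4_3)."""
    counts = [0] * (len(boundaries) + 1)
    for v in values:
        counts[bisect_right(boundaries, v)] += 1
    return counts


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        total = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * total
    total = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def p_value(values, boundaries):
    """Chi-square goodness-of-fit p-value against equal expected counts per bin."""
    if not values:
        raise ValueError("no user data for this taxon")
    counts = bin_counts(values, boundaries)
    expected = len(values) / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return chi2_sf(stat, len(counts) - 1)


# Made-up numbers: three cut points (quartiles) and one group's values for a taxon.
cuts = [2.5, 4.5, 6.5]
even = p_value([1, 2, 3, 4, 5, 6, 7, 8], cuts)        # spread evenly across bins
skewed = p_value([7, 7, 8, 8, 9, 9, 9, 9] * 2, cuts)  # piled into the top bin
```

A small p-value for a taxon says the category group is distributed across the population's quantiles differently than chance would suggest; with only a handful of points per bin the chi-square approximation is rough, which is one reason for the @MinSamples floor.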

In terms of the calling program, the code is very simple:

var data = DataInterfaces.GetNonParametricCategoryDataSet(1, 1, "Q4_", 20);
var matrix = MicrobiomeLibrary.Statistics.NonParametric.CategoricSignficance(data);
matrix.WriteXml("Matrix.xml");

I just dump the data to a file for simplicity's sake. You can open this file in Excel to get a nice spreadsheet.

For myself, I wrote a long running (24hrs) program that iterated thru the range of values for Categories (and combinations of categories!) with different quantiles.

https://github.com/Lassesen/Microbiome2/tree/master/ConsoleTestLibrary

Homework

When we work with Continuous variables, we need to convert the ranges into quantiles (just like we did for taxon). This could be done using the ranges we entered, or by breaking into quantiles. Personally, using quantiles would be my preference because too many numbers are not bell/normal curves but are assumed to be just that. I will leave people to do pull requests with their code suggestions.
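As a starting point for the homework, here is one way the quantile route could look, sketched in Python rather than as a pull request against the C# library. The function name and the sample numbers are made up; `statistics.quantiles` is used with its default method.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n=4):
    """Return (cut_points, bin_index_per_value) for n quantile bins.

    Bin 0 is the lowest quantile and bin n - 1 the highest.
    """
    cuts = quantiles(values, n=n)  # n - 1 cut points
    return cuts, [bisect_right(cuts, v) for v in values]


# Made-up readings of some continuous variable.
readings = [1, 2, 3, 4, 5, 6, 7, 8]
cuts, bins = quantile_bins(readings, n=4)
```

Once each reading carries a bin index, the continuous variable can be treated like a category with n levels and fed through the same counting approach used for taxa.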

Connecting the dots…

We have a microbiome, we have lab results, we have official conditions (ICD), we have symptoms. Last, we have substances (for example, probiotics) that modify the microbiome and thus may alter:

  • lab results
  • official condition status (i.e. mild, severe, acute)
  • microbiome
  • symptoms (one symptom may disappear or appear)

Information on the expected impact of the above comes from medical studies.

The typical question is “What should I take to improve {lab results|symptoms|official diagnosis|microbiome}?” The response should typically be, “Based on studies A, B, C, K, you should take X to improve {lab results|symptoms|official diagnosis|microbiome}.”

The answers may come indirectly and may be by inference. For example:

I wish to improve my diabetes.

  • Severity of diabetes is connected with high A bacteria and low B bacteria and high levels of TNF-alpha
  • Substance X has no published studies for diabetes
  • Substance X has published studies for decreasing A and not altering B.
  • Substance Y has published studies for increasing B and not altering A but it does reduce TNF-alpha levels.

The inference is that you should consider taking X and Y to improve your diabetes. In some cases, you may find something like:

I wish to improve my mother’s Alzheimer’s Disease.

  • Severity of Alzheimer’s Disease is connected with high A bacteria and low B bacteria.
  • Substances X and Y have published studies for Alzheimer’s Disease showing positive results
  • Substance X has published studies for decreasing A and not altering B.
  • Substance Y has published studies for increasing B and not altering A.
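The chain of inference in these two examples can be written down mechanically. The following is a toy sketch, not the expert system engine: it encodes each finding as +1 or -1 and suggests a substance when its documented effects oppose the shifts tied to the condition. The function name and data layout are assumptions.

```python
def candidates(condition_shifts, substance_effects):
    """Suggest substances whose documented effects oppose a condition's shifts.

    condition_shifts:  {factor: +1 if high with the condition, -1 if low}
    substance_effects: {substance: {factor: +1 increases, -1 decreases}}
    Factors a substance does not alter are simply left out of its dict.
    """
    suggestions = {}
    for substance, effects in substance_effects.items():
        helps = [f for f, d in effects.items() if condition_shifts.get(f) == -d]
        harms = [f for f, d in effects.items() if condition_shifts.get(f) == d]
        if helps and not harms:
            suggestions[substance] = sorted(helps)
    return suggestions


# The diabetes example above, with A and B standing in for bacteria.
diabetes = {"A": +1, "B": -1, "TNF-alpha": +1}
effects = {
    "X": {"A": -1},                   # decreases A, does not alter B
    "Y": {"B": +1, "TNF-alpha": -1},  # increases B, reduces TNF-alpha
}
suggested = candidates(diabetes, effects)
```

Each suggestion carries the factors that justify it, which matches the "based on studies A, B, C, K" style of answer: the inference is only as good as the citations behind each +1 or -1.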

The database schema below attempts to capture this information from citations (studies).

Let us look at what information may be in a study and map the information to tables (the following are made-up study results for illustration).

  • Salted Herrings at 20gm/day improves IBS from Study A
    • Modifier: Salted Herring
    • Citation: A
    • ICDCode: IBS
    • ICDModifierCitation
      • DirectionOfImpact: +1
      • AmountOfImpact: NULL — nothing reported
      • UsageInformation: 20gm/day
  • Same study found TNF-Alpha increases by 20% above control
    • Continuous Reference: TNF-Alpha
    • ContinuousModifierCitation
      • DirectionOfImpact: +1
      • AmountOfImpact: 1.2 (1 being no change)
      • UsageInformation: 20gm/day
  • Same study found Asthma disappeared in 30% of patients
    • CategoryReference: Asthma (Yes or No, remember)
    • CategoryModifierCitation
      • DirectionOfImpact: -1
      • AmountOfImpact: 0.8 (1 being no change)
      • UsageInformation: 20gm/day
  • Same study found Sillium bacteria increased in patients
    • TaxonHierarchy: Sillium
    • TaxonModifierCitation
      • DirectionOfImpact: +1
      • AmountOfImpact: NULL (nothing reported)
      • UsageInformation: 20gm/day

So the results of one study ended up with entries in 4 tables.
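For readers who think in code rather than schema diagrams, the made-up study can be written out as four rows, one per citation table. This is only an illustration: the `CitationRow` class is hypothetical, the real tables have their own keys and columns, and the table name used for the Asthma row (CategoryModifierCitation) is assumed by analogy with the other three.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CitationRow:
    table: str                          # which citation table the row lands in
    modifier: str
    citation: str
    target: str                         # ICD code, reference name, or taxon
    direction_of_impact: int            # +1 or -1, as in the lists above
    amount_of_impact: Optional[float]   # 1 means no change; None if not reported
    usage_information: str


def rows_for_study_a():
    """Rows produced by the made-up Salted Herring study."""
    common = dict(modifier="Salted Herring", citation="A",
                  usage_information="20gm/day")
    return [
        CitationRow("ICDModifierCitation", target="IBS",
                    direction_of_impact=+1, amount_of_impact=None, **common),
        CitationRow("ContinuousModifierCitation", target="TNF-Alpha",
                    direction_of_impact=+1, amount_of_impact=1.2, **common),
        # Table name assumed by analogy; see the lead-in.
        CitationRow("CategoryModifierCitation", target="Asthma",
                    direction_of_impact=-1, amount_of_impact=0.8, **common),
        CitationRow("TaxonModifierCitation", target="Sillium",
                    direction_of_impact=+1, amount_of_impact=None, **common),
    ]


rows = rows_for_study_a()
tables_touched = {row.table for row in rows}
```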

We have a lot of possible inferences here:

  • Sillium impacts TNF-Alpha
  • Low Sillium may be associated with Asthma

All of this stuff becomes facts in our Artificial Intelligence/Expert System engine which I will cover in a few weeks.

Alternative Names

Alternative names are actually critical for text mining (i.e. having programs determine if there is important data in a study, paragraph or sentence). Studies may use a multitude of names for the same thing. For example, you may decide to use the Latin name for herbs, Hypericum perforatum, and then have the alternative names “St. John’s Wort” and “Saint John’s Wort”. The alternative names should be unique, hence the unique index placed on this column.
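A small sketch of how such a lookup might behave in a text-mining pass. The class and method names are invented for this example, and a real implementation would sit on the database table with its unique index; here a dictionary plays that role and raises an error on a conflicting duplicate.

```python
import re


class AlternativeNames:
    """Map alternative names to one canonical name, enforcing uniqueness."""

    def __init__(self):
        self._canonical = {}  # case-folded alternative name -> canonical name

    def add(self, alternative, canonical):
        key = alternative.casefold()
        existing = self._canonical.get(key)
        if existing is not None and existing != canonical:
            # Mirrors the unique index: one alternative name, one meaning.
            raise ValueError(f"{alternative!r} already maps to {existing!r}")
        self._canonical[key] = canonical

    def find(self, text):
        """Return the canonical names whose alternatives occur in the text."""
        lowered = text.casefold()
        return {
            canonical
            for alt, canonical in self._canonical.items()
            if re.search(r"\b" + re.escape(alt) + r"\b", lowered)
        }


names = AlternativeNames()
for alt in ("Hypericum perforatum", "St. John's Wort", "Saint John's Wort"):
    names.add(alt, "Hypericum perforatum")

found = names.find("Patients took Saint John's Wort daily for six weeks.")
```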

Bottom Line

Above is the full solution. I have only partially implemented it, and the only one of the tables that I have been populating is TaxonModifierCitation. Readers have asked questions about TNF-Alpha and Interleukin 10 (IL-10), also known as human cytokine synthesis inhibitory factor (CSIF). My own resources could only stretch to reviewing and processing this one table. Ideally, a crowd-sourced effort (or a wealthy patron funding Ph.D. students) would allow the full solution to be populated.