Very fine tuning of suggestions to modify the microbiome

A reader reported a bug on the page for manually selecting which taxa to modify, which I built a while back. While fixing it, I smoothed the flow and realized that I need to allow a wider search for modifiers, since the selected bacteria may have few modifiers, especially when a species or strain is selected.

Quick Recap on How to Manually Select Bacteria Taxa

This is done via the “My Biome View” button from the Samples Page

On this page, you can select which taxa you wish to have included in the suggestions. You make the choices entirely.

Click the button and you will return to the Sample Page and a new button will appear:

This takes you to the usual custom suggestion page (except most of the bacteria filters are hidden — after all, you have hand picked them!)

Note the new choices with emojis!

Parent and children Modifiers are being added

Bacteria are reported in a hierarchy: the next level up is the parent, and the next level down holds the children.

Above you see that the parent of Oscillibacter is Oscillospiraceae. Things reported to modify the parent will likely modify its children. The key word is likely. A modifier may reduce Marseillibacter but not Oscillibacter; we lack the studies.

Similarly, the three children of Oscillibacter may have items that modify them. One would expect that if something reduces one of the children then Oscillibacter would be reduced too! Again the key word is likely.

On the Reference pages for Bacteria, you will now see three icons

If we have 10 or more modifiers documented to change this bacteria, we do not include the parent or the children. If we have fewer than 10, we include this additional information (giving it a reduced weight to reflect the greater uncertainty).
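The rule above can be sketched in a few lines of C#. This is only an illustration: the Modifier class, the ModifierSelector name, and the 0.5 discount are assumptions made for the example (the threshold of 10 is from the text; the actual reduced weight used on the site is not stated).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration; not the site's actual data model.
public sealed class Modifier
{
    public string Name { get; }
    public double Weight { get; }

    public Modifier(string name, double weight)
    {
        Name = name;
        Weight = weight;
    }
}

public static class ModifierSelector
{
    // Threshold stated in the post.
    public const int MinDirectModifiers = 10;

    // Assumed discount for parent/child evidence; the post gives no value.
    public const double RelatedWeight = 0.5;

    public static List<Modifier> Select(
        IReadOnlyCollection<Modifier> direct,
        IEnumerable<Modifier> parent,
        IEnumerable<Modifier> children)
    {
        // Enough direct evidence: leave out the parent and the children.
        if (direct.Count >= MinDirectModifiers)
            return direct.ToList();

        // Sparse evidence: add parent and child modifiers at reduced weight.
        var related = parent.Concat(children)
            .Select(m => new Modifier(m.Name, m.Weight * RelatedWeight));
        return direct.Concat(related).ToList();
    }
}
```

In this sketch an item found at more than one level is kept once per level, each with its own weight, rather than being merged.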

Often you may see that the same item appears at two levels

Bottom Line

Do not ask me what is the right way to get suggestions. We do not know. What I can provide are tools that will generate suggestions in a logical manner. This latest addition extends the prior choices by:

  • Allowing you to hand pick the taxa from your sample, one by one
  • Allowing you to extend the list of modifiers by including items that modify its children (e.g. Lactobacillus Fermentum CQPC04 modifiers would be added to Lactobacillus Fermentum) and/or its parent (e.g. Lactobacillus) if we have sparse information on how to modify it.

I expect only a few people will make use of this; but it is there if you want absolute control.

Sharing Data

The last post dealt with importing public data from my web site. My hope is that there will be open sourcing of data between sites derived from my open source code base.

With donated data, there are a few items that we need to make sure we do not share. We should not share:

  • Time series data for any person, so dates are always the export date.
  • Any apparent way to connect one sample to another.

We do the export and import by writing or reading XML via DataSets. We convert all Ids in the source to GUIDs to prevent accidentally mixing source and destination IDs. This also masks sequence information, a desired characteristic for sharing data.

A key factor is the new SyncGuid column, which provides a unique identifier across multiple sites for a particular lab or report.
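To make the Id masking concrete, here is a minimal sketch of one way to do it. The ExportMasker class and its members are hypothetical names for this example; the post does not show how the exporter actually assigns GUIDs, so treat this as an assumption about the general idea rather than the project's code.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: class and member names are assumptions, not the project's code.
public sealed class ExportMasker
{
    // One map per export run, keyed by table name plus source Id,
    // because the same integer Id can occur in different tables.
    private readonly Dictionary<(string Table, int Id), Guid> _map =
        new Dictionary<(string Table, int Id), Guid>();

    // Every exported row carries this date instead of its real one.
    public DateTime ExportDate { get; }

    public ExportMasker(DateTime exportDate)
    {
        ExportDate = exportDate.Date;
    }

    // Returns the same GUID for the same (table, id) within this export,
    // so references between rows still line up after masking.
    public Guid Mask(string table, int id)
    {
        var key = (table, id);
        if (!_map.TryGetValue(key, out var guid))
        {
            guid = Guid.NewGuid();
            _map[key] = guid;
        }
        return guid;
    }
}
```

Because the GUIDs are random, they carry no hint of the order in which the source rows were created, and stamping every row with the export date removes the time series.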

The C# code is very simple (the SQL does go through some complexity to get the export data matching the import data from my last post).

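// Requires: using System.IO;
// DataInterfaces is part of this project's code (source linked below).
static void Main(string[] args)
{
    // Defaults, overridable from the command line: <exportName> <filename>
    var exportName = "some site";
    var filename = "MyExport";
    if (args.Length > 0) exportName = args[0];
    if (args.Length > 1) filename = args[1];
    DataInterfaces.ConnectionString = "Server=LAPTOP-BQ764BQT;Database=MicrobiomeV2;Trusted_Connection=True; Connection Timeout=1000";
    // The schema and the data are written to two separate XML files.
    var schemaFile = new FileInfo($"{filename}Schema.xml");
    var exportFile = new FileInfo($"{filename}.xml");
    // Export returns a DataSet holding the tables to share.
    var export = DataInterfaces.Export(exportName);
    export.WriteXmlSchema(schemaFile.FullName);
    export.WriteXml(exportFile.FullName);
}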

Source for C#: https://github.com/Lassesen/Microbiome2/tree/master/Export

Source for DB Schema: https://github.com/Lassesen/Microbiome2/tree/master/DBV2
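If you want to see the DataSet XML mechanism on its own, without a database, the following self-contained sketch writes a schema and data and reads them back. It uses only the standard System.Data calls (WriteXmlSchema, WriteXml, ReadXmlSchema, ReadXml). The table content and the LabName column are made up for the demo and are not the real export layout.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetXmlDemo
{
    // Builds a tiny DataSet; the column layout is invented for this demo.
    public static DataSet BuildSample()
    {
        var ds = new DataSet("Export");
        var labs = ds.Tables.Add("Labs");
        labs.Columns.Add("SyncGuid", typeof(Guid));
        labs.Columns.Add("LabName", typeof(string));
        labs.Rows.Add(Guid.NewGuid(), "Example Lab");
        return ds;
    }

    // Writes schema and data to strings, then reads them into a new DataSet.
    public static DataSet RoundTrip(DataSet source)
    {
        var schemaXml = new StringWriter();
        var dataXml = new StringWriter();
        source.WriteXmlSchema(schemaXml);
        source.WriteXml(dataXml);

        var copy = new DataSet();
        // Load the schema first so column types (such as Guid) are kept.
        copy.ReadXmlSchema(new StringReader(schemaXml.ToString()));
        copy.ReadXml(new StringReader(dataXml.ToString()));
        return copy;
    }
}
```

Calling DataSetXmlDemo.RoundTrip(DataSetXmlDemo.BuildSample()) should give back a DataSet with the same single row. Reading the schema before the data is what lets the SyncGuid column come back as a Guid rather than as text.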

Populating with Open Data

I have shared data openly at http://lassesen.com/ubiome in formats designed for data science import and processing. In this post, I have taken some of this data and made it uploadable to our new database schema. I will also work on utilities to allow data to be exported and shared going forward. If you get data donated on your site (AND you obtain consent for sharing the data anonymously), remember to share it with other citizen scientists.

A series of tables needs to be exported and then imported. The contents of the export are illustrated below. Note: each Guid in the export will be converted to an Id by the import.

To deal with some issues (and to support interchange of data between people using this open source), I added two GUID columns. This means that every sample uploaded to a site, or report given to a site, is unique and may be exchanged between sites without duplicates. If the same microbiome report is uploaded to multiple sites, then there will be duplicates until you de-duplicate the data (see this earlier post).

The process is simple:

  • Bring in all of the reference data (Continuous, Category, Labs, LabTests definitions) using their names as keys to prevent duplication
  • Import the lab and other results — using SyncGuid to prevent duplication.

This is done by passing the XML from one site to a stored procedure (using DataSets) and then processing it. It is important to test the code to ensure that importing the same data multiple times does not result in extra rows. The T-SQL below gives the counts on key tables.

Select 'LabResultTaxon' as TableName, Count(1) as Count from LabResultTaxon
UNION
Select 'LabResults' as TableName, Count(1) as Count from LabResults
UNION
Select 'LabTests' as TableName, Count(1) as Count from LabTests
UNION
Select 'Labs' as TableName, Count(1) as Count from Labs
UNION
Select 'OwnerReport' as TableName, Count(1) as Count from OwnerReport
UNION
Select 'ReportCategory' as TableName, Count(1) as Count from ReportCategory
UNION
Select 'ReportContinuous' as TableName, Count(1) as Count from ReportContinuous

Example: your numbers should be around these
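The requirement that a repeated import adds no rows can be illustrated with a small in-memory stand-in. The real work happens in a stored procedure that is not shown here, so the LabResultStore class below is purely hypothetical; it only demonstrates the idea of using SyncGuid as the de-duplication key.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a table keyed by SyncGuid (hypothetical, for illustration).
public sealed class LabResultStore
{
    private readonly Dictionary<Guid, string> _bySyncGuid =
        new Dictionary<Guid, string>();

    public int Count => _bySyncGuid.Count;

    // Adds only rows whose SyncGuid is not already present.
    // Returns the number of rows actually added.
    public int Import(IEnumerable<KeyValuePair<Guid, string>> incoming)
    {
        var added = 0;
        foreach (var row in incoming)
        {
            if (_bySyncGuid.ContainsKey(row.Key)) continue; // already imported
            _bySyncGuid.Add(row.Key, row.Value);
            added++;
        }
        return added;
    }
}
```

Importing the same batch twice should report zero rows added the second time and leave Count unchanged, which is the same check the table counts are meant to support against the real database.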

After the import, you should recalculate core statistics for each LabTest type (see this post for a reminder). Everything has been rolled into the library, so the program is very short (but the library code and T-SQL code are not).

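// Requires: using System.Data;
// schemaFile and exportFile point at the two files written by the export program.
var import = new DataSet();
import.ReadXmlSchema(schemaFile.FullName); // load the schema before the data
import.ReadXml(exportFile.FullName);
DataInterfaces.ConnectionString = "Server=LAPTOP-....";
DataInterfaces.Import(import);
// Recalculate core statistics after the import.
LabTests.ComputeAllLabs(4);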

Source for Importer: https://github.com/Lassesen/Microbiome2/tree/master/Import

Data Location: https://github.com/Lassesen/Microbiome2/tree/master/Import/Data

Updated SQL Server Schema: https://github.com/Lassesen/Microbiome2/tree/master/DBV2