Sharing Data

The last post dealt with importing public data from my web site. My hope is that sites built on my open source code base will openly share data with one another.

With donated data, there are a few items that we must make sure we do not share:

  • Time series data for any person, so dates are always set to the export date.
  • Anything that offers an apparent way to connect one sample to another.
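
The date rule above can be sketched in C#. This is only an illustration, assuming the export is already held in a DataSet; ExportScrubber and StampExportDate are hypothetical names, and the real project may apply the rule in SQL instead.

```csharp
using System;
using System.Data;

public static class ExportScrubber
{
    // Hypothetical helper: overwrite every non-null DateTime value in the
    // DataSet with the export date, so no time series survives the export.
    public static void StampExportDate(DataSet data, DateTime exportDate)
    {
        foreach (DataTable table in data.Tables)
        {
            foreach (DataColumn column in table.Columns)
            {
                if (column.DataType != typeof(DateTime)) continue;
                foreach (DataRow row in table.Rows)
                {
                    if (!row.IsNull(column)) row[column] = exportDate;
                }
            }
        }
    }
}
```

Calling ExportScrubber.StampExportDate(export, DateTime.Today) just before writing the XML would leave every sample carrying the same date.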

We do the export and import by writing or reading XML via DataSets. We convert all Ids in the source to GUIDs to prevent accidental mixing of source and destination Ids. This also masks sequence information, a desired characteristic for sharing data.
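
One way to picture the Id-to-GUID conversion is a lookup that hands out a random GUID for each integer Id and returns the same GUID when the Id is seen again, so parent and child rows still line up. IdMasker is a hypothetical class written for this post's explanation; it is an assumption about the approach, not the code in the repository.

```csharp
using System;
using System.Collections.Generic;

public sealed class IdMasker
{
    // Maps each source Id to the GUID that replaces it in the export.
    private readonly Dictionary<int, Guid> _map = new Dictionary<int, Guid>();

    // Return the GUID already assigned to this Id, or assign a new random one.
    public Guid Mask(int sourceId)
    {
        if (!_map.TryGetValue(sourceId, out var guid))
        {
            guid = Guid.NewGuid();
            _map[sourceId] = guid;
        }
        return guid;
    }
}
```

Because the GUIDs are random, nothing about the order of the original Ids carries over. Use one IdMasker per source table, since the same integer can appear as an Id in more than one table.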

A key factor is the new SyncGuid column, which provides a unique identifier across the multiple sites for a particular lab or report.
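
To show why a cross-site identifier helps, here is a minimal sketch of how an importing site could skip rows it already holds. It assumes the table has a non-null SyncGuid column of type Guid; SyncGuidFilter and NewRows are hypothetical names, not part of the project.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SyncGuidFilter
{
    // Return the incoming rows whose SyncGuid is not already known locally.
    // Assumes every row has a non-null SyncGuid value of type Guid.
    public static List<DataRow> NewRows(DataTable incoming, ISet<Guid> knownSyncGuids)
    {
        var result = new List<DataRow>();
        foreach (DataRow row in incoming.Rows)
        {
            var syncGuid = (Guid)row["SyncGuid"];
            if (!knownSyncGuids.Contains(syncGuid)) result.Add(row);
        }
        return result;
    }
}
```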

The C# code is very simple (the SQL does go through some complexity to get the export data matching the import data from my last post).

static void Main(string[] args)
{
    // Defaults; the first argument overrides the export name, the second the file name.
    var exportName = "some site";
    var filename = "MyExport";
    if (args.Length > 0) exportName = args[0];
    if (args.Length > 1) filename = args[1];
    DataInterfaces.ConnectionString = "Server=LAPTOP-BQ764BQT;Database=MicrobiomeV2;Trusted_Connection=True; Connection Timeout=1000";
    // The schema and the data go to separate XML files.
    var schemaFile = new FileInfo($"{filename}Schema.xml");
    var exportFile = new FileInfo($"{filename}.xml");
    // Export builds the DataSet holding the data to share.
    var export = DataInterfaces.Export(exportName);
    export.WriteXmlSchema(schemaFile.FullName);
    export.WriteXml(exportFile.FullName);
}
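
For anyone who wants to inspect the two files the export writes, a short sketch of loading them back into a DataSet follows. ExportReader is a hypothetical helper; it only assumes the same file naming as the code above and uses the standard DataSet.ReadXmlSchema and DataSet.ReadXml methods.

```csharp
using System.Data;

public static class ExportReader
{
    // Load "<filename>Schema.xml" first so column types are restored,
    // then read the rows from "<filename>.xml" into that schema.
    public static DataSet Load(string filename)
    {
        var data = new DataSet();
        data.ReadXmlSchema($"{filename}Schema.xml");
        data.ReadXml($"{filename}.xml");
        return data;
    }
}
```

Reading the schema before the data keeps typed columns typed, rather than letting every value be inferred as text.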

Source for C#: https://github.com/Lassesen/Microbiome2/tree/master/Export

Source for DB Schema: https://github.com/Lassesen/Microbiome2/tree/master/DBV2