I have just added another console application. I am building things in layers, and downstream everything could be combined into a single DLL. For now (especially for anyone wishing to port the code to other languages), one feature per console application is likely the best approach.
This installment takes the uBiome JSON file and uploads it to the tables.
This is the template to use for uploading any test results that provide the standardized taxon number. The only difference would be in the file parsing.
https://github.com/Lassesen/Microbiome2/tree/master/ubiomeUpload
Input file structure
```json
{
  "download_time_utc": "2019-06-24T22:46:36.000Z",
  "sequencing_revision": "1346982",
  "site": "gut",
  "sampling_time": "2019-06-12T00:00:00.000Z",
  "notes": "",
  "ubiome_bacteriacounts": [
    { "taxon": 1, "parent": 0, "count": 70967, "count_norm": 1000000, "tax_name": "root", "tax_rank": "root" },
    ...
```
After one upload, your data should look like this:
The code is simple: a single method reads the file, deserializes the JSON into an object, walks that object to build a data table, and then calls a stored procedure with the data table and the other sample information.
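The upload itself is a C# console application (see the repository above). For anyone porting it, here is a minimal Python sketch of just the parse-and-flatten step, assuming the input has the structure shown above. The function name `bacteria_rows` and the choice of columns are hypothetical; the stored procedure call is not shown, since its name and parameters are not given here.

```python
import json


def bacteria_rows(json_text):
    """Flatten a uBiome export into rows ready for a bulk upload.

    Hypothetical row layout: (taxon, parent, count, count_norm, tax_name, tax_rank).
    Returns the sampling time plus the list of rows.
    """
    doc = json.loads(json_text)
    rows = []
    # Each entry under "ubiome_bacteriacounts" becomes one row of the "data table".
    for entry in doc.get("ubiome_bacteriacounts", []):
        rows.append((
            int(entry["taxon"]),
            int(entry["parent"]),
            int(entry["count"]),
            int(entry["count_norm"]),
            entry["tax_name"],
            entry["tax_rank"],
        ))
    # The rows (and metadata such as sampling_time) would then be passed
    # to the stored procedure; that database call is not shown here.
    return doc.get("sampling_time"), rows
```

Usage would be along the lines of `with open(path, encoding="utf-8") as f: when, rows = bacteria_rows(f.read())`.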
The program also writes a report of any taxon it could not match to the NCBI microbiome hierarchy. Partly this is "different strokes for different folks", since uBiome and NCBI do not always classify things the same way. More often, though, a taxon has been deprecated and uBiome has not updated its system.
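The matching step can be sketched as a set lookup. This is a minimal sketch, assuming the known NCBI taxon numbers have already been loaded into a Python set and that rows use the hypothetical layout (taxon, parent, count, count_norm, tax_name, tax_rank); the tab-separated report format is also an assumption, not necessarily what the console application writes.

```python
def missing_report_lines(rows, known_taxa):
    """Return one report line per row whose taxon is not in the NCBI hierarchy.

    rows: hypothetical (taxon, parent, count, count_norm, tax_name, tax_rank) tuples.
    known_taxa: set of NCBI taxon numbers (assumed loaded from the database).
    """
    lines = []
    for taxon, _parent, _count, _norm, name, rank in rows:
        if taxon not in known_taxa:
            # Keep uBiome's name and rank so the taxon can be looked up by hand.
            lines.append(f"{taxon}\t{name}\t{rank}")
    return lines


def write_missing_report(rows, known_taxa, path):
    """Write the unmatched taxa to a text file and return how many there were."""
    lines = missing_report_lines(rows, known_taxa)
    with open(path, "w", encoding="utf-8") as report:
        for line in lines:
            report.write(line + "\n")
    return len(lines)
```

Keeping uBiome's name in the report is what makes the manual search by name, described below, possible.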
In the execution folder you will see a file listing the missing taxa, with contents like:
If you open the JSON file and search for that taxon number, you will see what uBiome calls it.
You could add this taxon to taxonHierarchy or ignore it. I searched for it by name and found an apparent match, but under a different taxon number.
Resolving this disagreement is up to you. One option is a replacement table mapping uBiome's taxon numbers to NCBI taxon numbers, used only where you are confident that they are the same organism.
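A replacement table like that could be applied before the NCBI lookup. The sketch below assumes an in-memory dictionary named `UBIOME_TO_NCBI` (hypothetical, and deliberately left empty, since only you can decide which pairs are trustworthy) and the same hypothetical row layout as before. In the SQL setup used here, the same idea might instead live in a database table joined inside the stored procedure.

```python
# Hypothetical replacement table: uBiome taxon number -> NCBI taxon number.
# Fill it only with pairs you have verified are the same organism.
UBIOME_TO_NCBI = {}


def remap_rows(rows, replacements):
    """Swap uBiome taxon numbers for NCBI ones where a replacement exists.

    Both the taxon and its parent are remapped, since either could be a
    deprecated number. Rows without a replacement pass through unchanged.
    """
    remapped = []
    for taxon, parent, count, count_norm, name, rank in rows:
        remapped.append((
            replacements.get(taxon, taxon),
            replacements.get(parent, parent),
            count,
            count_norm,
            name,
            rank,
        ))
    return remapped
```

Running the remap before the matching step means only the taxa you have not yet resolved end up in the missing-taxon report.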