“No consistent association between a vegan diet or vegetarian diet and microbiota composition compared to omnivores could be identified. Moreover, some studies revealed contradictory results. This result could be due to high microbial individuality, and/or differences in the applied approaches. Standardized methods with high taxonomical and functional resolutions are needed to clarify this issue.”
I have seen the same thing when extracting facts into the database. While diet (based on these studies) is still on the suggestions list, I do not recommend using it. Specific foods are a very different question. Diets tend to be nebulous collections of foods, which leaves things poorly defined.
I recall reading posts by bloggers who took two samples from the same stool, sent them to different analysis labs, and reviewed the differences between the reports. There are a dozen possible explanations for those differences.
Due to the demise of uBiome, a number of former users downloaded their FastQ data files and processed them through different providers that determine the bacteria taxonomy from FastQ files. Most of us naively believed that the reports would be similar: after all, it is digital data in, so a similar taxonomy should come out… It appears that things are a lot more complex than that.
What is in a FastQ File
A taxonomy download may be 20,000 to 30,000 bytes. It contains the bacteria name and, hopefully, the taxon number, with the percentage or count out of a million. The FastQ file is the result of a machine reading the DNA bits of the bacteria in your microbiome. It is a lot bigger. DNA bits are represented by 4 characters (A, T, C, G). A typical file would be 170,000,000 bytes (170 MB).
If you examine the text, yes text, you will see line after line with:
These strings have been matched to certain bacteria, just like your DNA would match to you (and to other people closely related to you). If you go over to the US National Library of Medicine, you will find information on these sequences, like this for Bacillus subtilis, a common probiotic.
So, the process is matching up to a reference set. At this point we walk into the time trap!
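For readers who have not opened one, a FastQ file stores each read as four text lines: a header starting with `@`, the sequence, a separator starting with `+`, and a quality string of the same length as the sequence. The sketch below reads that layout. The two reads are made up for illustration, and the function name `read_fastq` is my own, not part of any provider's software.

```python
from io import StringIO
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, which uses four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up reads, for illustration only.
SAMPLE = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGACCGA\n+\nIIIIHHHH\n"

records = list(read_fastq(StringIO(SAMPLE)))
for rec in records:
    print(rec.read_id, rec.sequence)
```

For a real download you would pass an open file instead of the `StringIO` wrapper (and `gzip.open(path, "rt")` if the file is compressed).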
A firm like uBiome may have gotten the latest reference values when it started. I suspect a business decision was made not to update them constantly. Why, you ask? The answer is simple: to maintain consistency and comparability from sample to sample over time. If they switched to newer ones, they would have to reprocess the old samples to stay consistent, and then reports would change in minor or major ways, resulting in support emails and phone calls. Support can be a major expense. So they keep to what they started with. I suspected that with uBiome Plus they were working on new reference values; after all, it was a different test!
Each provider has a different set of reference sequences. Their sequences may be proprietary (not on the public site above). This means that to compare results, you need to use the same reference sequences to match with your FastQ microbiome data. If not, the result may be like a “bible” assembled by taking page 1 from the King James Bible, page 2 from the Vulgate, page 3 from Tyndale’s translation, etc. Things become a hash.
Another issue also arises: bacteria get renamed or refined. The names used in an older reference library may not match the names in a later reference library.
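A toy illustration of both effects, assuming exact lookup of a read in a reference table. Real providers use alignment or other classifiers rather than exact lookup, and every sequence and name below is invented. The same reads run against an older and a newer library come out with different names and different unclassified counts.

```python
from collections import Counter

# Made-up reads and reference libraries, for illustration only.
reads = ["ACGTACGT", "ACGTACGT", "TTGACCGA", "GGGCCCAA"]

library_old = {
    "ACGTACGT": "Genus_A species_1",
    "TTGACCGA": "Genus_B species_2",
}
library_new = {
    "ACGTACGT": "Genus_A species_1",
    "TTGACCGA": "Genus_C species_2",  # same sequence, renamed
    "GGGCCCAA": "Genus_D species_3",  # sequence added to the library later
}


def classify(reads, library):
    """Count reads per name; reads missing from the library are 'unclassified'."""
    return Counter(library.get(read, "unclassified") for read in reads)


report_old = classify(reads, library_old)
report_new = classify(reads, library_new)

for name in sorted(set(report_old) | set(report_new)):
    print(f"{name:20} old={report_old[name]} new={report_new[name]}")
```

Nothing about the sample changed between the two reports; only the library did.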
For myself, I have the FastQ files for all of my uBiome tests and my Thryve Inside tests. I will continue to require these FastQ files from testing firms so I can keep the ability to compare samples to each other over time by running them through the same provider.
I have created a page to allow comparison between FastQ files processed to taxonomy by different providers. The button to get to it is at the top of the Samples Page: “FastQ Results Comparison”.
This takes you to a list of all of your samples. Note that I have 4 samples with the same date below. It is actually just 1 FastQ file interpreted by four different providers. There are additional providers.
This produces a report showing the normalized count (scaled to be per million). I also have the raw count on the page as a tool tip over each number.
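The per-million scaling is simple arithmetic: each raw count is divided by the sample total and multiplied by 1,000,000. A minimal sketch, with invented taxon names and counts (this is not the site's code):

```python
def per_million(raw_counts):
    """Scale raw read counts so that the whole sample sums to 1,000,000."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: count * 1_000_000 / total for taxon, count in raw_counts.items()}


# Invented raw counts for the same FastQ file from two providers.
provider_a = {"taxon_1": 500, "taxon_2": 1500}
provider_b = {"taxon_1": 9000, "taxon_2": 21000}

norm_a = per_million(provider_a)
norm_b = per_million(provider_b)

for taxon in provider_a:
    print(taxon, norm_a[taxon], norm_b[taxon])
```

Scaling makes reports with very different raw totals comparable side by side, which is why the raw count is only shown as a tool tip.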
Who has the right numbers?
Without full disclosure by all of the providers, it is difficult to tell.
With all things equal, the provider that currently processes your samples would be the first choice. Why? It allows you to do immediate comparisons. This is not that critical, because both https://www.biomesight.com/ and https://metagenomics.sequentiabiotech.com/ will convert a FastQ file to a taxonomy in less than an hour.
What about Research Findings?
Fortunately, researchers use the same process for every sample within a study. That means the comparisons within a study are relatively independent of the process used. It does mean that Study A may find that some bacteria are high or low while Study B does not report this. The reason may be very simple: that bacteria was never looked for. Things get fuzzy. When the distribution of a bacteria is known for a particular method, we can determine whether a value is high or low… but that requires sufficient samples processed with that method. With uBiome, we had a large number of samples from one provider, and that allowed us to make some good citizen science progress.
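One way to make "high or low for a particular method" concrete is a percentile rank against other samples processed the same way. The sketch below assumes that approach. The reference counts are invented, and the 10/90 cutoffs are placeholders of mine, not values used by the site.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, reference_values):
    """Percentile of value among samples processed with the same method."""
    ordered = sorted(reference_values)
    if not ordered:
        raise ValueError("no reference samples for this method")
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def label(value, reference_values, low_cut=10.0, high_cut=90.0):
    """Label a value using hypothetical percentile cutoffs."""
    rank = percentile_rank(value, reference_values)
    if rank >= high_cut:
        return "high"
    if rank <= low_cut:
        return "low"
    return "typical"


# Invented per-million counts of one taxon across ten same-provider samples.
same_method = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

print(percentile_rank(950, same_method), label(950, same_method))
print(percentile_rank(100, same_method), label(100, same_method))
```

With only ten reference samples the percentiles are very coarse, which is the "sufficient samples" caveat in practice.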
Bottom Line on why the difference
Different reference libraries
Change in bacteria classifications (same sequence, different name)
I just pushed out an update on http://microbiomeprescription.azurewebsites.net/ that may help you understand what various prescription drugs, over-the-counter drugs and some supplements may be doing to your microbiome.
Select any of the links highlighted below
The next page will show some choices at the top:
Compare Impact
This is intended to allow you to better choose between alternatives, for example Aspirin versus Paracetamol (acetaminophen). I am sure people will find more uses for it.
The process is simple, search for each item, and put a check beside it. Select the Compare Impact radio button and then click the submit button below it.
This will take you to a page listing the impacts side by side. In this case we see that their impacts are similar, but differ on a few items. At the family level there are a few differences.
Compensate
This is intended for when you are prescribed drugs to treat some condition and wish to reduce their impact on the microbiome by counteracting it.
For this example, we pick lovastatin (a statin) and famotidine (Pepcid AC).
We may wish to first see how much impact they have together (do they reinforce or counteract each other?).
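As a rough sketch of what "reinforce or counteract" can mean, assume each item's impact is recorded as a direction per taxon (+1 increases, -1 decreases). The items, taxa and directions below are placeholders, not data about lovastatin or famotidine, and this is not necessarily how the site computes it.

```python
def net_shift(*impacts):
    """Sum the reported directions per taxon across all selected items."""
    net = {}
    for impact in impacts:
        for taxon, direction in impact.items():
            net[taxon] = net.get(taxon, 0) + direction
    return net


def relationship(taxon, *impacts):
    """Say whether the items agree or disagree on one taxon."""
    directions = [impact[taxon] for impact in impacts if taxon in impact]
    if len(directions) < 2:
        return "single report"
    if all(d > 0 for d in directions) or all(d < 0 for d in directions):
        return "reinforce"
    return "counteract"


# Placeholder impacts: +1 means the item increases the taxon, -1 decreases it.
item_a = {"taxon_1": +1, "taxon_2": -1, "taxon_3": +1}
item_b = {"taxon_1": +1, "taxon_2": +1}

net = net_shift(item_a, item_b)
for taxon in sorted(net):
    print(taxon, net[taxon], relationship(taxon, item_a, item_b))
```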
Pressing back, changing the radio button, and clicking submit produces suggestions.
The suggestions are done by creating a virtual microbiome report based on the above shifts and running that through our AI engine.
The suggestion page is the new format with the long lists hidden until you ask to see them.
The Take or Avoid list is defaulted to 100 items (which is one reason that I toggle visibility). Remember – none of these items are guaranteed to work, nor do you need to take all of them. Each item increases your odds…
The avoid list values are a lot higher, and thus you may wish to reduce any of these items that you are taking.
By uploading, you consent to allow your microbiome data and symptoms to be made available to citizen scientists for further discoveries.
The required consent is cited above. The third party is responsible for obtaining consent.
Json Structure
The structure is simple:
The key is issued by us and identifies where the data is coming from (“source”)
logon and password are the authentication pair that you generate. These are used for logging on. Logon and Password should be the same for all samples from the same user (so we can display on a timeline).
The report file reports only at the strain level; no genus or family levels are given. These sum to 100%. The smallest resolution appears to be 0.02%, that is, 1 in 5,000 bacteria. This is a lot lower resolution than other providers (1 in 160,000 is seen in some other reports with a good sample). There is something odd about a large number of bacteria being at 0.02 or 0.04 percent.
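The arithmetic behind those figures, as a small check. The function name is mine; the numbers are the ones quoted above.

```python
def one_in(smallest_percent):
    """Convert the smallest reported percentage into a '1 in N' resolution."""
    return round(100 / smallest_percent)


this_report = one_in(0.02)           # smallest step seen in this report
other_reports = 160_000              # resolution seen in some other reports
other_percent = 100 / other_reports  # the same resolution as a percentage

print(f"this report: 1 in {this_report:,}")
print(f"other reports: 1 in {other_reports:,} ({other_percent}%)")
print(f"ratio: {other_reports // this_report}x")
```

So a report that steps in units of 0.02% is roughly 32 times coarser than a 1 in 160,000 report, and every value in it has to land on a multiple of that step.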
It appears that FASTQ downloads from them (alleged to be available if requested) are the preferred way to get better data.
One bacteria was listed as: “(Bifidobacterium catenulatum/Bifidobacterium gallicum/Bifidobacterium kashiwanohense/Bifidobacterium pseudocatenulatum)”,
which more current tests report as 4 different species.
Bottom Line: Won’t Do
There are too many problems with the data. I have spent almost an entire day fighting it. If they provide a FASTQ file, I have an upload for files processed through the SequentiaBiotech web site.
To use their CSVs:
They must provide the official Taxon Numbers in the Excel File
They must provide the full hierarchy with numbers at each level
Without those, their data will pollute the existing contributed base too much. There are no acceptable kludge arounds for these defects.
A reader reported a bug with the page for manually selecting which taxa to modify, which I did a while back. As a result, I smoothed the flow and also realized that I need to allow a wider search for modifiers, since the bacteria selected may have few modifiers, especially when a species or strain is selected.
Quick Recap on how to Manually Select bacteria taxa
This is done via the “My Biome View” button from the Samples Page
On this page, you can select which taxa you wish to have included in the suggestions. You make the choices entirely.
Click the button and you will return to the Samples Page, where a new button will appear:
This takes you to the usual custom suggestion page (except most of the bacteria filters are hidden — after all, you have hand picked them!)
Parent and children Modifiers are being added
Bacteria are reported in a hierarchy: the next level up is the parent, and the next level down holds the children.
Above you see that the parent of Oscillibacter is Oscillospiraceae. Things reported to modify the parent will likely modify its children. The key word is likely. A modifier may reduce Marseillibacter but not Oscillibacter; we lack the studies.
Similarly, the three children of Oscillibacter may have items that modify them. One would expect that if something reduces one of the children then Oscillibacter would be reduced too! Again the key word is likely.
On the Reference pages for Bacteria, you will now see three icons
If we have 10 or more modifiers documented to change this bacteria, we do not include the parent or the children. If we have fewer than 10, then we include this additional information (giving it a reduced weight reflecting the greater uncertainty).
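The rule can be sketched as follows. The threshold of 10 comes from the text; the reduced weight of 0.5, the modifier names and the child taxon are placeholders of mine, since the actual weights are not given here.

```python
MIN_DIRECT = 10        # from the text: 10 or more direct modifiers is enough
RELATIVE_WEIGHT = 0.5  # hypothetical reduced weight for parent/child modifiers


def gather_modifiers(taxon, direct, parent_of, children_of):
    """Return (modifier, weight) pairs, widening to relatives when data is sparse."""
    own = direct.get(taxon, [])
    weighted = [(modifier, 1.0) for modifier in own]
    if len(own) >= MIN_DIRECT:
        return weighted  # well documented: do not include parent or children

    relatives = []
    if taxon in parent_of:
        relatives.append(parent_of[taxon])
    relatives.extend(children_of.get(taxon, []))

    seen = set(own)
    for relative in relatives:
        for modifier in direct.get(relative, []):
            if modifier not in seen:  # a direct report outranks a relative's
                seen.add(modifier)
                weighted.append((modifier, RELATIVE_WEIGHT))
    return weighted


# Placeholder data; only the Oscillibacter/Oscillospiraceae link is from the text.
direct = {
    "Oscillibacter": ["modifier_1", "modifier_2"],
    "Oscillospiraceae": ["modifier_2", "modifier_3"],
    "child_species_1": ["modifier_4"],
}
parent_of = {"Oscillibacter": "Oscillospiraceae"}
children_of = {"Oscillibacter": ["child_species_1"]}

result = gather_modifiers("Oscillibacter", direct, parent_of, children_of)
print(result)
```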
Bottom Line
Do not ask me what the right way to get suggestions is. We do not know. What I can provide are tools that will generate suggestions in a logical manner. This latest addition extends the prior choices by:
Allowing you to hand pick the taxa from your sample, one by one
Allowing you to extend the list of modifiers by asking to include items that modify its children (e.g. Lactobacillus fermentum CQPC04 modifiers would be added to Lactobacillus fermentum) and/or its parent (e.g. Lactobacillus) if we have sparse information on how to modify it.
I expect only a few people will make use of this; but it is there if you want absolute control.
For over a year I have made donated data available at: http://lassesen.com/ubiome/ . I would hope that anyone using this open source software project will also make their data open.
This post deals with exporting the taxon, continuous and category data to a csv file format suitable for importing to R or Python for data exploration. The program code is simple (with all of the work done in the shared library).
The last post dealt with importing public data from my web site. My hope is that there will be open sourcing of data between sites derived from my open source code base.
With donated data, there are a few items that we need to make sure we do not share. We should not share:
Time series data for any person –> so dates are always the export date.
Anything that links samples: we make sure that there is no apparent way to connect one sample to another.
We do the export and import by writing or reading XML via DataSets. We convert all Ids in the source to GUIDs to prevent accidentally mixing source and destination IDs. This also masks sequence information, a desired characteristic for sharing data.
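The real work is done in SQL and the shared C# library. The Python sketch below only illustrates the idea of swapping sequential Ids for random GUIDs and overwriting dates with the export date. The column names `SampleId`, `TaxonId` and `SampleDate` are hypothetical, not the actual schema.

```python
import uuid
from datetime import date


def anonymize(rows, id_columns, date_columns, export_date):
    """Replace Ids with random GUIDs and every date with the export date."""
    mapping = {}  # (column, original id) -> GUID, so repeated Ids stay linked
    result = []
    for row in rows:
        clean = dict(row)
        for column in id_columns:
            key = (column, row[column])
            if key not in mapping:
                mapping[key] = str(uuid.uuid4())
            clean[column] = mapping[key]
        for column in date_columns:
            clean[column] = export_date.isoformat()
        result.append(clean)
    return result


# Invented rows: two taxa from sample 17 and one from sample 18.
rows = [
    {"SampleId": 17, "TaxonId": 1, "Count": 500, "SampleDate": "2019-01-05"},
    {"SampleId": 17, "TaxonId": 2, "Count": 250, "SampleDate": "2019-01-05"},
    {"SampleId": 18, "TaxonId": 1, "Count": 900, "SampleDate": "2019-03-09"},
]

exported = anonymize(rows, ["SampleId"], ["SampleDate"], date(2020, 1, 1))
for row in exported:
    print(row)
```

Because the GUIDs are random, consecutive source Ids no longer reveal the order in which samples were added, and with every date set to the export date no time series can be rebuilt.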
A key factor is the new SyncGuid column, which provides a unique identifier across the multiple sites for a particular lab or report.
The C# code is very simple (the SQL does go through some complexity to get the export data matching the import data from my last post).
using System.IO;

class Program
{
    // DataInterfaces comes from the shared library in the Microbiome2 repository.
    static void Main(string[] args)
    {
        // Defaults; the first two command-line arguments override them.
        var exportName = "some site";
        var filename = "MyExport";
        if (args.Length > 0) exportName = args[0];
        if (args.Length > 1) filename = args[1];
        DataInterfaces.ConnectionString = "Server=LAPTOP-BQ764BQT;Database=MicrobiomeV2;Trusted_Connection=True; Connection Timeout=1000";
        var schemaFile = new FileInfo($"{filename}Schema.xml");
        var exportFile = new FileInfo($"{filename}.xml");
        // Write the exported data set's schema and its data as two XML files.
        var export = DataInterfaces.Export(exportName);
        export.WriteXmlSchema(schemaFile.FullName);
        export.WriteXml(exportFile.FullName);
    }
}
Source for C#: https://github.com/Lassesen/Microbiome2/tree/master/Export
Source for DB Schema: https://github.com/Lassesen/Microbiome2/tree/master/DBV2