I am currently working with BiomeSight.com to add Taxon numbers to their downloadable reports.
At first sight this should be easy. A sample of their complete taxonomy looks like this:
The problem is that their software has forced items into an unnatural structure to make presentation easy. An item that sits under Class in their data may, when you look it up in the NCBI Taxonomy Browser, be listed as any of:
- Sub-Class
- Super-Class
- Order
- Sub-Order
- Family
The result is that a great many items have to be resolved by manual inspection of NCBI to find the apparent match, and then assigned individually. In the example below, notice the “Group II” item, which required working from the matches already made on the same taxonomy line to identify probable candidates.
Update [MicrobiomeSight] Set OID=2731342 Where [Order]='Group II'
Update [MicrobiomeSight] Set OID=1643688 Where [Order]='Leptospirae'
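The matching logic behind updates like these can be sketched in a few lines of Python. This is only an illustration, not the code I actually ran: the `NCBI_ROWS` extract, its 900000-series IDs and names, and the `resolve` helper are all made up for the example. The idea is to accept a name when NCBI has it at the reported rank, accept it with a flag when NCBI has exactly one entry at some other rank, and leave everything else for manual inspection.

```python
from collections import defaultdict

# Hypothetical extract of NCBI taxonomy as (taxon_id, name, rank).
# These IDs and names are placeholders, not real NCBI records.
NCBI_ROWS = [
    (900001, "Examplia", "class"),
    (900002, "Examplia", "order"),
    (900003, "Samplales", "suborder"),
]


def build_index(rows):
    """Map lower-cased name -> list of (taxon_id, rank)."""
    index = defaultdict(list)
    for taxon_id, name, rank in rows:
        index[name.lower()].append((taxon_id, rank))
    return index


def resolve(name, reported_rank, index):
    """Return (taxon_id, status) for a name reported at a given rank.

    status is 'exact' when the rank agrees, 'rank-mismatch' when NCBI
    has a single entry at another rank, and 'manual' otherwise.
    """
    candidates = index.get(name.lower(), [])
    same_rank = [c for c in candidates if c[1] == reported_rank]
    if len(same_rank) == 1:
        return same_rank[0][0], "exact"
    if len(candidates) == 1:
        return candidates[0][0], "rank-mismatch"
    return None, "manual"


index = build_index(NCBI_ROWS)
for name in ("Examplia", "Samplales", "Group II"):
    print(name, resolve(name, "order", index))
```

Items like “Group II” fall through to the `manual` status, which is where the hand-written updates come from.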
Other issues concern spelling differences and renaming, for example Cerasicoccales vs Cerasicoccus, which was found….
While NCBI shows
We have a possible old or atypical name being used, which obfuscates reports. This is one of the key reasons I am pushing for taxon numbers in all uploads: without them, we would have massive inconsistencies.
After getting everything at Genus level and above resolved, I hit an issue with the species: the ones listed below remain unresolved. I searched Google for a few and found no hits. Many had incomplete names.
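To illustrate why a number is a safer key than a name, here is a minimal sketch that maps both current names and synonyms to the same taxon number. It assumes the layout of the `names.dmp` file in the NCBI taxonomy dump (fields separated by tab, pipe, tab, in the order tax_id, name, unique name, name class). The sample lines, names and IDs are placeholders I invented, not real NCBI records, and `parse_names_dmp` is a hypothetical helper.

```python
# Placeholder lines in names.dmp layout; not real NCBI data.
SAMPLE_LINES = [
    "900010\t|\tNewname exampli\t|\t\t|\tscientific name\t|\n",
    "900010\t|\tOldname exampli\t|\t\t|\tsynonym\t|\n",
    "900010\t|\tNewname exampli Smith 2001\t|\t\t|\tauthority\t|\n",
]


def parse_names_dmp(lines):
    """Map lower-cased scientific names and synonyms to sets of taxon IDs."""
    lookup = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the end-of-row marker
        fields = line.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        tax_id, name_txt, _unique_name, name_class = fields[:4]
        if name_class in ("scientific name", "synonym"):
            lookup.setdefault(name_txt.lower(), set()).add(int(tax_id))
    return lookup


lookup = parse_names_dmp(SAMPLE_LINES)
print(lookup["oldname exampli"] == lookup["newname exampli"])
```

Two reports that spell the organism differently still agree once both carry the number.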
Example:
Bacillus polyfermenticus in NCBI is Bacillus velezensis variant polyfermenticus
- Acholeplasma ales
- Burkholderia eae
- Candidatus Methylacidiphilum infernorum
- Cryocola poae
- Dechloromonas fungiphilus
- Desulfovibrio aceae
- Enterobacter aceae
- Enterobacter rottae
- Erwinia dispersa
- Haererehalobacter salaria
- Haloterrigena gari
- Herpetosiphon agaradhaerens
- Megasphaera geminatus
- Mycobacterium indicus
- Oscillospira eae
- Tessaracoccus terricola
- Pasteurella eae
- Stenotrophomonas retroflexus
- Stenotrophomonas griseosporeus
- Trabulsiella farmeri
- Vibrio bacterium
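Several entries in this list end in what looks like a bare rank suffix ("ales", "aceae", "eae") rather than a species epithet. A rough sketch of how such names could be flagged automatically follows. The suffix set and the `looks_truncated` helper are my own assumptions for illustration, and a flag only means "look at this by hand", not that the name is wrong.

```python
# Endings that normally belong to order or family names, not species.
# This set is an assumption chosen to fit the unresolved list above.
RANK_SUFFIXES = {"ales", "aceae", "eae"}


def looks_truncated(species_name):
    """True when the last word of the name is nothing but a rank suffix."""
    parts = species_name.split()
    return len(parts) >= 2 and parts[-1].lower() in RANK_SUFFIXES


unresolved = [
    "Acholeplasma ales",
    "Burkholderia eae",
    "Cryocola poae",
    "Desulfovibrio aceae",
    "Erwinia dispersa",
]
flagged = [name for name in unresolved if looks_truncated(name)]
print(flagged)
```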
Bottom Line
Every Phylum, Class, Order, Family and Genus had a matching taxon number assigned. At the Species level, 6445 were identified and 21 were not, which means 99.7% of species were given taxon numbers. I expect BiomeSight.com to offer uploadable formats soon, ideally with automatic transfer from their web site.
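For anyone who wants to check the percentage, the arithmetic is just the identified count over the total, using the two counts given above:

```python
identified = 6445   # species given a taxon number
unresolved = 21     # species still without one

total = identified + unresolved
coverage = 100 * identified / total

print(f"{coverage:.1f}% of {total} species resolved")
```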