FoodMicrobionet v 4.1.2 is here! If you want to cite it and get links to GitHub and Mendeley Data (where you can download files and scripts), have a look at this preprint or at the published paper (Parente, E., Zotta, T., Ricciardi, A., 2022. FoodMicrobionet v4: a large, integrated, open and transparent database for food bacterial communities. International Journal of Food Microbiology, 372, https://doi.org/10.1016/j.ijfoodmicro.2022.109696). The paper is free to download at https://authors.elsevier.com/a/1f0z4cF3iBTqK until June 24th, 2022.
With 180 studies and 10,155 samples belonging to 8 major food groups, this is arguably the largest and best annotated database of metataxonomic data for bacterial communities. The database includes 1,684 environmental samples and 8,467 food samples, belonging to 16 L1 categories and 196 L6 categories of the FoodEx2 classification. Taking into account further annotation on sample nature, heat treatment and spoilage fermentation, samples in FoodMicrobionet belong to 316 different combinations. The details of the structure of the database are here. Further major changes include:
I have switched to SILVA 138.1 for taxonomic assignment and updated the classification of the former genus Lactobacillus; the revision of the taxonomic assignment of studies 34 to 170 is complete; the revision of studies 1 to 33 is in progress and scheduled to be completed by June 2022;
I have added fields for geoloc, primers and sequence organization (paired end, overlapping) to studies (multiple values, separated by commas, are possible), and geoloc_name_countries, geoloc_name_countries_continent and lat_lon to samples.
If you want more facts and figures on this version of FoodMicrobionet, have a look here.
Since the manuscript describing it has now been accepted for publication in IJFM (Hallelujah), this version is publicly available on GitHub and on Mendeley Data. There are two versions available:
FMBN for ShinyFMBN 2.3: this version is designed to be compatible with the ShinyFMBN app, v2.3, but lacks the fields on primers and geographic location
FMBN_plus: the complete database, available as an R list, will be distributed only in the framework of scientific collaboration agreements. Upon request, phyloseq objects for studies 26 to 180 can be made available. This allows access to the ASVs for each study.
In addition, if you are really into it, you can inquire for a collaboration and get the phyloseq objects for almost all studies in FoodMicrobionet.
All the scripts we use to process raw sequences and to prep data for inclusion in FoodMicrobionet are now public on GitHub. Visit the FoodMicrobionet repository to view the most recent versions of the scripts.
A new version of DairyFMBN has been added to Mendeley Data (https://data.mendeley.com/datasets/3cwf729p34/5) and the quarantine will expire at the end of July. This is connected to an astounding (😉) workflow for inference of microbial association networks, which is available on GitHub (https://github.com/ep142/MAN_in_cheese). Both are related to a review on microbial association networks in cheese which is about to be published as a preprint…
Using 34 studies from the latest version of DairyFMBN, I have inferred microbial association networks (at the genus level) using NetCoMi, with four different methods (SparCC, CCREPE, SPRING, SpiecEasi). I then calculated how often each association was detected (by any method in any study) and its average stability (i.e. the mean number of times a given association was detected by the 4 methods within a study). Here are the results for the top 25 copresence and mutual exclusion associations.
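The two summary statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow in the MAN_in_cheese repository (which is written for R and NetCoMi): the input format, the function names `summarize_associations` and `top_associations`, the study labels and the toy detections are all hypothetical. Detection frequency is taken here as the number of studies in which an association was found by at least one method, and stability as the mean number of methods that found it within those studies.

```python
from collections import defaultdict


def summarize_associations(detections):
    """Summarize detections given as (study, method, genus_a, genus_b, kind)."""
    # methods_seen[(pair, kind)][study] = set of methods that found the edge
    methods_seen = defaultdict(lambda: defaultdict(set))
    for study, method, genus_a, genus_b, kind in detections:
        pair = tuple(sorted((genus_a, genus_b)))  # associations are undirected
        methods_seen[(pair, kind)][study].add(method)

    summary = {}
    for key, per_study in methods_seen.items():
        counts = [len(methods) for methods in per_study.values()]
        summary[key] = {
            # number of studies where at least one method found the edge
            "n_studies": len(counts),
            # mean number of methods that found it, over those studies
            "mean_stability": sum(counts) / len(counts),
        }
    return summary


def top_associations(summary, kind, n=25):
    """Return the n most frequently detected associations of one kind."""
    rows = [(pair, stats) for (pair, k), stats in summary.items() if k == kind]
    rows.sort(key=lambda r: (-r[1]["n_studies"], -r[1]["mean_stability"], r[0]))
    return rows[:n]


# Toy, made-up detections (not results from DairyFMBN).
toy_detections = [
    ("study_1", "SparCC", "Lactococcus", "Streptococcus", "copresence"),
    ("study_1", "SPRING", "Streptococcus", "Lactococcus", "copresence"),
    ("study_2", "SparCC", "Lactococcus", "Streptococcus", "copresence"),
    ("study_1", "CCREPE", "Lactococcus", "Pseudomonas", "mutual_exclusion"),
]
association_summary = summarize_associations(toy_detections)
for pair, stats in top_associations(association_summary, "copresence"):
    print(pair, stats)
```

Sorting the genus pair before using it as a key means that an edge reported as A/B by one method and B/A by another is counted as the same association.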
I have made major changes to the taxonomic table of the database, which is now compatible with SILVA taxonomy. Although SILVA taxonomy does not match the taxonomy in the List of Prokaryotic names with Standing in Nomenclature, the change was necessary because new studies in FoodMicrobionet are processed using SILVA v138. This was causing inconsistencies in the higher-level taxonomy (i.e. the same genus could have a different lineage depending on when it was added to the database), which, in turn, would prevent correct aggregation at levels above genus. In case you want to know, I did compare taxonomic assignments done with SILVA v132 and v138 for five studies using different 16S rRNA gene regions as a target. The same sequence was assigned to a different “taxon” in as many as 70% of the cases. However, when doing comparisons at the genus level, >96% of the sequences were assigned to the same genus using either version of the database. Mismatches were mostly due to sequences (actually Amplicon Sequence Variants) which, when tested with Seqmatch, consistently had a Sab<0.80 with the best match. Again, the best way to compare studies is to reprocess data based on the same target using exactly the same pipeline, but this is time consuming. Doing comparisons at the genus level is still a reasonable alternative (see the article "A comparison of bioinformatic approaches for 16S rRNA gene p…"). The updated version of the last public version of FoodMicrobionet should be available here shortly.
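To make the lineage problem concrete, here is a minimal Python sketch that flags genera recorded with more than one higher-level lineage, which is the situation that breaks aggregation above the genus level. The function name `inconsistent_lineages`, the input format and the placeholder taxon names are assumptions made for illustration; they are not taken from the FoodMicrobionet scripts.

```python
from collections import defaultdict


def inconsistent_lineages(taxa):
    """Return the genera that appear with more than one lineage.

    taxa: iterable of (genus, lineage), where lineage is a tuple of the
    ranks above genus, e.g. (phylum, family).
    """
    lineages = defaultdict(set)
    for genus, lineage in taxa:
        lineages[genus].add(tuple(lineage))
    return {g: sorted(ls) for g, ls in lineages.items() if len(ls) > 1}


# Placeholder names only: Genus_A was added twice, with lineages taken
# from two different versions of the reference database.
toy_taxa = [
    ("Genus_A", ("Phylum_X", "Family_P")),
    ("Genus_A", ("Phylum_Y", "Family_P")),
    ("Genus_B", ("Phylum_X", "Family_Q")),
]
lineage_conflicts = inconsistent_lineages(toy_taxa)
print(lineage_conflicts)
```

Any genus returned by such a check would be counted under two different phyla when abundances are summed at the phylum level, so its lineage has to be harmonized first.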
As usual, we are open to collaborations: if you are interested in obtaining data from FoodMicrobionet 3.2.6, contact me.
The latest additions to FoodMicrobionet are done using SILVA 138 SSU. This version introduces several differences compared with the previous version, especially in higher-level taxonomy (2/3 of the taxonomic paths have been changed). I have compared taxonomic assignments with SILVA v132 with those of v138 using 5 recent studies targeting different 16S rRNA gene regions (V1-V3, V3-V4, V4-V5) and found that:
≥95% of the sequences in each study are identified at the genus or species level in the same way
overall, the matching identifications at the genus level range from 70 to 90%; differences are usually due to sequences of poor quality (which receive ambiguous identifications with either BLASTn or SEQMATCH)
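A comparison of this kind can be sketched as follows, assuming each sequence has been assigned a fixed-length taxonomic path with both versions of the reference database. The function name `agreement_rate`, the three-rank paths and the placeholder assignments are hypothetical and serve only to show the difference between comparing full paths and comparing a single rank; they are not the data or code behind the figures above.

```python
def agreement_rate(assign_a, assign_b, rank=None):
    """Fraction of shared sequences with matching assignments.

    assign_a, assign_b: dicts mapping a sequence id to a tuple of ranks.
    rank: index of the rank to compare; None compares the whole path.
    """
    shared = assign_a.keys() & assign_b.keys()
    if not shared:
        raise ValueError("no sequences in common")

    def pick(path):
        return path if rank is None else path[rank]

    same = sum(pick(assign_a[s]) == pick(assign_b[s]) for s in shared)
    return same / len(shared)


# Placeholder paths (phylum, family, genus) for four sequences.
toy_v132 = {
    "ASV1": ("Phylum_A", "Family_A", "Genus_A"),
    "ASV2": ("Phylum_B", "Family_B", "Genus_B"),
    "ASV3": ("Phylum_C", "Family_C", "Genus_C"),
    "ASV4": ("Phylum_D", "Family_D", "Genus_D"),
}
toy_v138 = {
    "ASV1": ("Phylum_A", "Family_A", "Genus_A"),   # unchanged
    "ASV2": ("Phylum_B2", "Family_B", "Genus_B"),  # phylum renamed
    "ASV3": ("Phylum_C", "Family_C2", "Genus_C"),  # family moved
    "ASV4": ("Phylum_D", "Family_D", "Genus_E"),   # genus differs
}
full_path_agreement = agreement_rate(toy_v132, toy_v138)
genus_agreement = agreement_rate(toy_v132, toy_v138, rank=2)
print(full_path_agreement, genus_agreement)
```

In this toy case most paths differ somewhere, yet the genus is the same for three sequences out of four, which is why agreement at one rank can be much higher than agreement over the full path.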
Overall, while the best way to compare results of different studies is to re-analyse the data using the same pipeline and the same version of the taxonomic database, I still feel that comparing different studies at the genus level is a reasonable compromise. In addition, with FoodMicrobionet you always have the option of selecting studies which are as close as possible in terms of target, platform and pipeline. However, due to the changes in the higher level taxonomy, I have decided to make the higher level taxonomy (i.e. above the genus level) compatible with SILVA 138 SSU, even if this is sometimes in contrast with NCBI taxonomy or LPSN.
One last issue is the new classification of the former genus Lactobacillus. The new classification has been incorporated in LPSN and NCBI taxonomy, but not in Florilege or in SILVA, and searches with the old species names still work. Therefore I have decided to leave things as they are (and to add a small hidden switch in the code of ShinyFMBN which allows you to convert old names into the new ones).
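The idea behind such a switch can be illustrated with a small lookup table. The sketch below is in Python and is not the ShinyFMBN code (the app is an R Shiny application); the function name `convert_names` and the `use_new_names` flag are hypothetical, and the table holds only four well-known renamings from the 2020 reclassification rather than the full list.

```python
# A few renamings from the 2020 reclassification of Lactobacillus.
# Species that stayed in Lactobacillus (e.g. L. delbrueckii) are not listed.
OLD_TO_NEW = {
    "Lactobacillus plantarum": "Lactiplantibacillus plantarum",
    "Lactobacillus casei": "Lacticaseibacillus casei",
    "Lactobacillus brevis": "Levilactobacillus brevis",
    "Lactobacillus sakei": "Latilactobacillus sakei",
}


def convert_names(names, use_new_names=False):
    """Return the names unchanged, or with old names replaced when asked."""
    if not use_new_names:
        return list(names)
    # Names missing from the table are passed through untouched.
    return [OLD_TO_NEW.get(name, name) for name in names]


toy_names = ["Lactobacillus plantarum", "Lactobacillus delbrueckii"]
print(convert_names(toy_names))
print(convert_names(toy_names, use_new_names=True))
```

Keeping the old names as the stored values and converting only on request means that searches with the old names keep working, which matches the choice described above.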