As promised, it's April 1st and here's my publication pick. It was published in February, so it's not exactly paper of the month, but not to worry. "Serendipitous discovery of Wolbachia genomes in multiple Drosophila species" by Salzberg et al. is in the open access journal Genome Biology, where both the abstract and the full text are freely available.
This is a really nice piece of data mining. To give you some background, Wolbachia pipientis wMel is a bacterial endosymbiont of the fruit fly, Drosophila melanogaster. Draft genome sequence (~8x coverage) is available for 7 other species of Drosophila and, like all good genomics people, the sequencing institutes have submitted their raw sequence reads to the NCBI Trace Archive. Now, if you sequence the genome of organism A and it happens to harbour another organism B (e.g. a parasite or endosymbiont), you have a good chance of obtaining quite a lot of sequence from organism B. This is an annoyance when it comes to assembling genome A, but it also provides the potential to discover genes from organism B. Incidentally, this is the rationale behind the archaeal pathogen detection database, published in Bioinformatics 20: 2361-2362 by your humble author.
So, the authors have used the genome of Wolbachia pipientis as a probe to screen the trace archive records for the other 7 Drosophila species. Not only did they find Wolbachia DNA in 3 species of Drosophila, but they managed to assemble it into 3 distinct Wolbachia genomes, representing 3 previously unknown species. One new Wolbachia genome is 95% complete; the others are 75-80% and 6-7% complete.
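To make the "genome as a probe" idea concrete, here is a minimal sketch in Python. It is not the authors' pipeline: a real screen of the trace archive would use a proper alignment tool, whereas this toy version simply flags reads that share enough exact k-mers with a reference sequence. The function names, the k-mer length and the `min_fraction` threshold are all assumptions made for illustration, and the sequences are made up.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def kmers(seq, k):
    """Return the set of all k-length substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_probe_index(reference, k):
    """Index k-mers from both strands of the probe (reference) genome."""
    return kmers(reference, k) | kmers(reverse_complement(reference), k)


def screen_reads(reads, probe_index, k, min_fraction=0.5):
    """Return names of reads whose k-mers mostly occur in the probe index.

    reads is a dict of read name -> sequence. min_fraction is an
    arbitrary illustrative threshold, not a value from the paper.
    """
    hits = []
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue  # read shorter than k
        shared = len(read_kmers & probe_index)
        if shared / len(read_kmers) >= min_fraction:
            hits.append(name)
    return hits


# Toy data: a made-up "endosymbiont" reference and three made-up reads.
K = 8
reference = "ATGGCGTACGTTAGCCTAGGATCCGATCGTACGATCGA"
reads = {
    "fwd_read": reference[5:25],                       # same strand as probe
    "rev_read": reverse_complement(reference[10:30]),  # opposite strand
    "host_read": "T" * 20,                             # unrelated sequence
}
index = build_probe_index(reference, K)
hits = screen_reads(reads, index, K)
print(hits)
```

Reads that pass the screen would then go to an assembler. Indexing both strands matters because shotgun reads come from either strand of the original DNA.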
I think this is very impressive stuff. It demonstrates 2 things. First, the power of TIGR's computers, which enables this kind of study. Second, the sheer volume of genome data that we now have available: so much that with the right tools and a clever approach, you can find whole new species lurking in there! You can envisage similar studies using the many traces from mammalian genomes that reside in the trace archive to look for novel pathogens and symbionts.


Comments
From the paper:
I would now like to demonstrate my ignorance of DNA preparation techniques. Does "endosymbiont" here refer to Wolbachia DNA integrated into the fruit fly DNA, or to Wolbachia living inside the fruit fly cell? Does this mean the trace archive is picking up non-intracellular parasite/organismal DNA, or is that somehow excluded during the DNA extraction process? Also, how effective are the screens for plasmid/E. coli DNA?
re: dna preps
Presumably you get a bunch of fruit flies, gently mash them and extract DNA from the mess. If anything else is living inside the flies, you will get DNA from that too. How much of that DNA you end up cloning and sequencing then depends on the factors that you highlighted in the comment. Plasmid/E. coli DNA is screened out during the genome assembly (e.g. using cross_match).
I think the fact that a sample can contain DNA from other sources is something people tend not to think about, which is one of the reasons that I found the paper of interest.
Marketing
It is quite impressive that mountains of data, full of potentially interesting discoveries, are just lying about in some NCBI server room. While the Wolbachia genomes study certainly demonstrates this, it just isn't marketable enough to spread the message. I'm only half joking here; I know that for the people who understand this stuff, marketing is the last thing they would be thinking about. However, it seems a shame that the "wow! that's kinda cool" aspect of this discovery (including the methods) will not be widely absorbed. I look forward to the next paper of the month. Any takers?
Cool does it for me
It's true that many people will overlook this, because (a) it doesn't involve medicine or humans and (b) it's in a fledgling, open access journal. That is a shame. "Wow, that's cool" should be enough for any real scientist, IMHO.
I'm a big fan of the BMC journals (Bioinformatics, Genomics and Genome Biology).