I was looking for the best way to explain what BLAST is to people with no background in biology or bioinformatics.
I thought I could say it's a search engine:
- BLAST is like Google, but for biological sequences instead of search terms.

In fact, the problem BLAST is designed to solve is this: given a sequence, such as an mRNA sequenced in an experiment or a protein sequence, search the public databases to see whether it is already there, or whether there are similar sequences, and where they are and how similar they are.
What do you think of this analogy?
Many bioinformatics courses explain what BLAST is and the theory of sequence alignment together, in a single lesson. I think this is a bit confusing for some people, and the two topics should be taught separately.
You can also explain what a heuristic is by saying that Google never returns the single best result for a search, only a set of the best-scoring ones.
Imagine searching the Internet for the terms 'alternative splicing': how can the search engine determine which is the number 1 page for these terms? It doesn't; it just gives a score to every candidate page and returns the 10 best ones as results.
The same holds for BLAST: determining exactly which sequence is most similar to our query would be very computationally expensive, and 'similarity' can be defined in several different ways (e.g. by using different scoring matrices), so BLAST just returns the set of sequences with the highest scores against our query.
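To make the heuristic idea concrete, here is a toy sketch in Python. It is not how BLAST is actually implemented: it assumes a made-up nucleotide scoring scheme (+2 match, -1 mismatch, -2 per gap position) and a hypothetical `search` function, and only mimics the general idea of seeding with short exact word matches, scoring just those candidate sequences with a local alignment, and returning the top-scoring hits.

```python
from collections import defaultdict

# Hypothetical toy scoring scheme (an assumption, not BLAST's defaults).
# Real BLAST uses substitution matrices such as BLOSUM62 and affine gap costs.
MATCH, MISMATCH, GAP = 2, -1, -2


def local_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def build_index(db: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-letter word to the names of database sequences containing it."""
    index = defaultdict(set)
    for name, seq in db.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def search(query: str, db: dict[str, str], k: int = 4, top_n: int = 10):
    """Return up to top_n (name, score) hits, best first.

    Heuristic step: only sequences sharing at least one exact k-letter word
    ("seed") with the query are aligned; all others are never scored.
    """
    index = build_index(db, k)
    candidates = set()
    for i in range(len(query) - k + 1):
        candidates |= index.get(query[i:i + k], set())
    scored = [(name, local_score(query, db[name])) for name in candidates]
    scored.sort(key=lambda hit: (-hit[1], hit[0]))  # highest score first
    return scored[:top_n]
```

For example, with `db = {"seqA": "ACGTACGTTAGC", "seqB": "TTTTGGGGCCCC", "seqC": "ACGTTTGCAAAA"}`, `search("ACGTTAGC", db, k=4, top_n=3)` never scores `seqB`, because it shares no 4-letter word with the query, and ranks `seqA` (an exact match) above `seqC` (one mismatch). Changing `MATCH`, `MISMATCH` and `GAP` changes the scores and can change the ranking, which is the 'different scoring matrices' point in miniature.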
You can think of other analogies too: for example, just as with Internet search engines, you can use BLAST for things other than its standard use, such as searching a different dataset to look for related items (in this case, homologues).
I don't remember whether I heard this analogy somewhere else, but I would just like to ask whether it is a good analogy or a bad one.
Bye! :=)
p.s.: original article: http://dalloliogm.wordpress.com/2007/06/11/spieghiamo-cosa-e-blast-come-...


Comments
String-blasting
It's useful to apply BLAST-like techniques to searching over arbitrary strings. Several groups have used this to find variants of gene names in text. First, Michael Krauthammer, when he was at Columbia, used base pairs to encode arbitrary strings (normally, of course, base pairs encode amino acids), then queried gene and protein names for matches in the text of journal articles.
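The comment doesn't say exactly how the strings were encoded, so here is a hypothetical scheme, purely for illustration: each UTF-8 byte becomes four nucleotides, two bits per base. The function names are made up for this sketch.

```python
# Hypothetical encoding, for illustration only: 2 bits per base, 4 bases per byte.
BASES = "ACGT"


def encode_as_dna(text: str) -> str:
    """Encode each UTF-8 byte of text as four nucleotides, high bits first."""
    out = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode_from_dna(dna: str) -> str:
    """Invert encode_as_dna: every four bases become one byte again."""
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")
```

For example, `encode_as_dna("A")` gives `"CAAC"` (the byte 65 is `01 00 00 01` in binary). One caveat of such a naive mapping: a one-letter typo in ASCII text changes up to four bases, and the mismatch score between two bases says nothing about how similar the original characters are.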
BLAST is really just edit distance with some exclusion heuristics, and those heuristics don't work at all well on short strings. So it's more natural to implement edit-distance matching directly, as Tsuruoka and Tsujii did. There's a nice description of the algorithms in Dan Gusfield's string-algorithms bible, Algorithms on Strings, Trees, and Sequences.
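Here is a minimal sketch of that direct approach in Python, assuming unit costs for insertions, deletions and substitutions. It is the standard dynamic program for approximate substring matching (the pattern may start anywhere in the text), not LingPipe's API and not Tsuruoka and Tsujii's distance metric; both function names are hypothetical.

```python
def min_substring_edit_distance(pattern: str, text: str) -> int:
    """Smallest unit-cost edit distance between pattern and any substring of text."""
    m = len(pattern)
    # prev[i] = best distance of pattern[:i] against a substring of text ending
    # at the previous position. Before any text is read, this is just i deletions.
    prev = list(range(m + 1))
    best = prev[m]
    for ch in text:
        # cur[0] stays 0: an empty pattern prefix matches anywhere, so a
        # match may begin at any position in the text.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character missing from the text
        best = min(best, cur[m])
        prev = cur
    return best


def approx_dictionary_matches(dictionary: list[str], text: str, max_dist: int) -> dict[str, int]:
    """Return every dictionary entry that occurs in text within max_dist edits."""
    return {name: d for name in dictionary
            if (d := min_substring_edit_distance(name, text)) <= max_dist}
```

For example, `approx_dictionary_matches(["BRCA1", "TP53", "EGFR"], "mutations in BRCA-1 carriers", 1)` returns `{"BRCA1": 1}`: the hyphenated variant is one edit away, and the other names are nowhere near. This also shows why the length issue matters: for a short name, one edit is a large fraction of the string, so thresholds have to be chosen with care.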
Our LingPipe software provides an implementation of approximate dictionary matching following Gusfield. Here's a link to the class Javadoc: com.aliasi.dict.ApproxDictionaryChunker. We provide Tsuruoka and Tsujii's distance metric as a constant, but the distances are plug-and-play.
The really critical issue here is not just finding approximate matches of names of biomedical entities, but also disambiguating them. The acronym "ACT" means a lot of different things in different contexts. Figuring out which sense of a word or phrase is intended is a widely studied problem, usually going under the heading of word sense disambiguation for common nouns or database linkage for proper nouns. This can be done either by unsupervised clustering or by supervised database linkage if example contexts are available. Luckily, databases such as Entrez and KEGG provide GeneRIFs, which include pointers to articles about specific genes. And evaluations like BioCreative measure how well systems can figure out which gene is being mentioned in an article.
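As a rough illustration of the supervised route, here is a tiny bag-of-words sketch in Python: each sense is represented by example contexts, and a new mention is assigned the sense whose examples share the most words with it (cosine similarity). The example sentences and function names are made up for illustration; this is not what LingPipe or any BioCreative system does.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased alphanumeric tokens with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(context: str, sense_examples: dict[str, list[str]]) -> str:
    """Pick the sense whose pooled example contexts are most similar to context."""
    ctx = bag_of_words(context)

    def score(sense: str) -> float:
        profile = Counter()
        for example in sense_examples[sense]:
            profile.update(bag_of_words(example))
        return cosine(ctx, profile)

    return max(sense_examples, key=score)


# Hypothetical example contexts for two senses of "ACT".
SENSES = {
    "activated clotting time": ["ACT was monitored during heparin infusion in cardiac surgery"],
    "artemisinin-based combination therapy": ["ACT is first-line treatment for uncomplicated falciparum malaria"],
}
```

With these examples, `disambiguate("Patients received ACT for malaria and parasite clearance was measured", SENSES)` picks the malaria-therapy sense, while a sentence about heparin and cardiac surgery picks activated clotting time. Real systems use far richer features, but the shape of the problem is the same.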
Bob Carpenter
Alias-i, Inc.
Analogy
Or, BLAST is like a microwave oven, except for sequences and not frozen burritos.
But seriously, I think the Google analogy isn't very good, because:
1) You don't search BLAST with a topic or keywords, but with another sequence. If Google worked that way, you'd give it a web page and it would find web pages similar to it.
2) BLAST is statistical; Google isn't. The only measure of how good a Google search is, is the searcher's (subjective) opinion.
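What "statistical" means here is mainly the E-value: the number of hits with at least a given score you would expect by chance. For ungapped local alignments, the Karlin-Altschul formula is E = K·m·n·e^(-λS), where S is the raw score, m the query length and n the database length, and λ and K depend on the scoring system. The Python sketch below only evaluates that formula and the equivalent bit-score form; real BLAST also corrects m and n for edge effects and estimates λ and K for gapped alignments, which this ignores. The values λ = 0.267 and K = 0.041 are an assumption, roughly what BLAST reports for BLOSUM62 with gap costs 11/1; treat them as placeholders.

```python
import math

# Assumed placeholder parameters (roughly BLOSUM62, gap open 11 / extend 1).
LAM, K = 0.267, 0.041


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance hits scoring >= raw_score: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * raw_score)
```

Two things follow directly from the formula: the same E-value can be written as m·n·2^(-S'), and doubling the database size doubles E for the same raw score, which is why the same alignment can look significant against a small database and unremarkable against a large one.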
If you want an analogy, assuming the students know some bench biology, how about "BLAST is an electronic Southern Blot"?
Flawed, but useful analogy: Bloogle
I think Google is still a useful analogy for explaining BLAST to wet-bench biologists, even if it does have its flaws. As for "Google isn't statistical", I disagree. What about all the statistics, probability and machine learning they use to build and improve search results? Google (and other search engines) have very well-defined metrics for measuring search quality; it's not all subjective. So despite its problems, search is still a handy analogy for describing BLAST that many people will be familiar with.
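For a concrete idea of what a well-defined search-quality metric looks like, here are two standard information-retrieval measures, precision at k and average precision, sketched in Python. Which metrics Google actually uses internally is not something this sketch claims to know, and the document IDs in the usage note are made up.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@i over the ranks i where a relevant result appears."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i  # precision at this rank
    return total / len(relevant) if relevant else 0.0
```

For a ranking `["d1", "d2", "d3", "d4"]` where only `d1` and `d3` are relevant, precision at 2 is 0.5 and average precision is (1 + 2/3) / 2 = 5/6; moving `d3` up to second place raises average precision to 1.0. Given human relevance judgments, the scores are computed mechanically, so they can be compared across versions of a search engine.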