FiReaNG3L's blog

Eureka Science News - intelligent science news aggregator

Eureka Science News just launched! It's an intelligent science news aggregator based on machine learning (for classification) and clustering of recent news from all major science sources. I'm a microarray analyst, so I used the same techniques I use to cluster genes, but this time to cluster news! It's fun to use skills in other contexts, and I hope you will enjoy the site!


How to compile a database of citations?

The discussion on impact factors got me wondering: is there a public, free-access citation database for articles in Medline / Pubmed? I know of Scopus and ISI WOS (but they're not free, and their content is proprietary) and Google Scholar (which only gives 'cited by', when I want 'this article cites x and y').

How would one build such a database, if it's not accessible? I know that ISI actually scans articles (not doable by myself). I don't know how Scopus got their index, though.

Such a database would help tremendously with some bibliomics work I'm doing. Is it technically feasible to get the references for all Medline articles (or at least those published after 1996)? Where would you get the information? One idea is to scrape/spider and index publishers' websites, if the reference lists are even freely accessible without a subscription, and then match them against a local Medline database (which I already have). If anyone can help, it'd be appreciated :)
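To make the matching step concrete, here is a minimal sketch of how scraped reference lists could be matched against a local Medline table. Everything in it is an assumption for illustration: the function names, the idea of matching on normalized title plus publication year, and the tiny made-up records. Real reference strings are much messier, so treat this as a starting point rather than a working pipeline.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation so near-identical titles compare equal."""
    text = unicodedata.normalize("NFKD", title)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def build_index(medline_records):
    """Map (normalized title, year) -> list of PMIDs from the local Medline copy.

    medline_records is an iterable of (pmid, title, year) tuples (hypothetical layout).
    """
    index = {}
    for pmid, title, year in medline_records:
        index.setdefault((normalize_title(title), year), []).append(pmid)
    return index


def match_references(citing_pmid, references, index):
    """Return (citing_pmid, cited_pmid) edges for references that match exactly one record.

    references is an iterable of (title, year) tuples scraped from one article.
    """
    edges = []
    for title, year in references:
        hits = index.get((normalize_title(title), year), [])
        if len(hits) == 1:  # skip unmatched and ambiguous references
            edges.append((citing_pmid, hits[0]))
    return edges
```

Requiring exactly one hit keeps ambiguous titles (editorials, errata and the like) out of the citation graph, at the cost of missing some true citations.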


Medline XML to database parser?

I recently downloaded Medline in XML format. The goal is to load it into a relational database (like MySQL), index it, and then somehow save the world with the data. I'm pretty sure tons (relatively speaking) of people have done the same thing before (except maybe the save the world part), and I'd prefer not to reinvent the wheel if I don't have to.

Anyone know of a good XML->database parser for Medline? If not I guess I'll code one myself! Indexing tips / advice would also be appreciated (first time I'm playing with a 50+ GB database). I heard Lucene is the bomb for indexing such a large database...
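For what it's worth, a home-made parser could start out roughly like the sketch below. It streams the file with Python's iterparse so the whole XML never sits in memory, and loads a few fields into a table. The assumptions: SQLite stands in for MySQL just to keep the example self-contained, the table layout and function name are made up, and the element names (MedlineCitation, PMID, Article/ArticleTitle, Article/Journal/Title, Article/Abstract/AbstractText) are the ones I understand the Medline DTD to use, so check them against your download. The sample records are invented.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_medline(xml_source, conn):
    """Stream MedlineCitation records from xml_source into a 'citation' table.

    xml_source may be a filename or a binary file object. Returns the number
    of records loaded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation ("
        "pmid INTEGER PRIMARY KEY, title TEXT, journal TEXT, abstract TEXT)"
    )
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "MedlineCitation":
            continue
        pmid = elem.findtext("PMID")  # direct child only, not PMIDs nested deeper
        if pmid is None:
            elem.clear()
            continue
        title = elem.findtext("Article/ArticleTitle")
        journal = elem.findtext("Article/Journal/Title")
        # An abstract can be split over several AbstractText elements.
        abstract = " ".join(
            (node.text or "")
            for node in elem.iterfind("Article/Abstract/AbstractText")
        )
        conn.execute(
            "INSERT OR REPLACE INTO citation VALUES (?, ?, ?, ?)",
            (int(pmid), title, journal, abstract),
        )
        count += 1
        elem.clear()  # drop the record's children to keep memory use low
    conn.commit()
    return count


# Tiny invented sample, just to show the call.
SAMPLE = b"""<MedlineCitationSet>
  <MedlineCitation>
    <PMID>1</PMID>
    <Article>
      <Journal><Title>J Test</Title></Journal>
      <ArticleTitle>First title</ArticleTitle>
      <Abstract>
        <AbstractText>Part one.</AbstractText>
        <AbstractText>Part two.</AbstractText>
      </Abstract>
    </Article>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>2</PMID>
    <Article>
      <Journal><Title>J Test</Title></Journal>
      <ArticleTitle>Second title</ArticleTitle>
    </Article>
  </MedlineCitation>
</MedlineCitationSet>"""

demo_conn = sqlite3.connect(":memory:")
print(load_medline(io.BytesIO(SAMPLE), demo_conn), "records loaded")
```

For a 50+ GB load you would probably want batched inserts and to create the indexes after the bulk load rather than before, but that depends on the database you end up using.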

