YeastHub: A Semantic Web Use Case for Integrating Data in the Life Sciences Domain

Yeast hub

This is the talk I was genuinely interested in seeing; it is supposedly all about the semantic web...

- Time is right for using the semantic web...

The introduction is fairly general: the web is full of heterogeneous data and access methods, so we need metadata and a standard format to put that metadata in, which in this case is RDF.

The speaker is now talking about the proliferation of all the different BioXML standards, for example MAGE-ML, SBML, etc. The problem is particularly bad in pathway databases, where many formats describe the same thing. So, according to the speaker, we should unify on RDF/XML. Then we get all the so-called "stuff for free", e.g. inference, integration, etc.

Nice thought, and I totally agree with this approach (obviously), but he is presenting it as an obvious conclusion. What about the fundamental problems: getting the data into RDF/XML format in the first place, inconsistent ontologies, lack of GOOD tools, etc.?

Now we are getting the standard overview of the semantic web: What is RDF? Triples, Subject Predicate Object ?
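For anyone who has not seen the triple model, here is a minimal sketch of it in plain Python. The `http://example.org/yeast/` namespace and the `vocab#name` property are made up for illustration and are not YeastHub's actual vocabulary; the serializer only covers the simple case of a URI or plain string object.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the property
    obj: str        # URI or literal value


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line.

    Simplification: objects starting with "http://" are treated as URIs,
    everything else as a plain string literal.
    """
    if t.obj.startswith("http://"):
        obj = f"<{t.obj}>"
    else:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


EX = "http://example.org/yeast/"  # hypothetical namespace
t = Triple(EX + "gene/YAL001C", EX + "vocab#name", "TFC3")
print(to_ntriples(t))
```

The point is just that every statement is a (subject, predicate, object) fact about a URI, which is what makes merging data from different sources mechanical.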

Will he mention the identifier problem ? URIs vs URNs ?

What is Yeast Hub ?

They use RSS, which he calls Rich Site Summary (not Really Simple Syndication, and of course no mention of ), data conversion tools (to convert from different file formats and databases), the D2RQ tool (for relational data), converting tabular data to RDF - they use and for storage they use Sesame.
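The tabular-to-RDF step is easy to picture. The sketch below is my own guess at the general shape, not the converter YeastHub uses: one column is picked as the key that names the subject, and every other column becomes a predicate. The namespace, the column names and the sample rows are all assumptions for illustration.

```python
import csv
import io

BASE = "http://example.org/yeast/"  # hypothetical namespace


def table_to_triples(text, key_column):
    """Turn each row of a tab-separated table into triples.

    The key column names the subject; every other non-empty cell
    becomes one (subject, predicate, literal) triple.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        subject = BASE + "gene/" + row[key_column]
        for column, value in row.items():
            if column != key_column and value:
                triples.append((subject, BASE + "vocab#" + column, value))
    return triples


# Illustrative sample data, not taken from the talk.
sample = "orf\tname\tchromosome\nYAL001C\tTFC3\t1\nYAL002W\tVPS8\t1\n"
triples = table_to_triples(sample, "orf")
for triple in triples:
    print(triple)
```

Even in this toy version the hard part is visible: somebody has to decide which column is the identifier and which vocabulary the column names map to.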

Okay, now he is talking about the history of RSS and redefines it as Really Simple Syndication. He talks about feeds, mentions aggregators, etc. (I really hope my boss is here listening to this, damn it, I've been pushing this for ages now. Since it has been mentioned at a "real" scientific conference I assume it will be all the rage :))

He talks about aggregator technology and applying this to yeast. At this point I'm not so sure why he's doing it this way ? Why not just use straight RDF ? You get the same benefit, right ?

So the general idea is: take a bunch of data, convert it to RDF, and dump it into an RDF triple store (Sesame in this case). Then query it using SPARQL etc. to discover interesting things ?
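To make the "dump it in a store, then query" idea concrete, here is a toy in-memory stand-in. It is not Sesame and it is not SPARQL; it only shows the basic operation underneath both, which is matching a triple pattern where some positions are wildcards. The class name, the short `gene:` identifiers and the data are invented for the example.

```python
class TripleStore:
    """A toy in-memory triple store (a stand-in, not Sesame)."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("gene:YAL001C", "name", "TFC3")
store.add("gene:YAL002W", "name", "VPS8")
store.add("gene:YAL001C", "chromosome", "1")

# "Which things have a name?" is the pattern (?, name, ?).
for triple in store.match(p="name"):
    print(triple)
```

A real store adds indexing, persistence and a proper query language on top, but the query model is still patterns over triples.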

Now he's giving some examples of the conversion of different data.

Example: six different data sets, yeast genome data (Stanford) and MIPS (Europe), using gene lists (names ?) and then integrating with protein-protein interactions, Gene Ontology etc.
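The integration step, as I understand it, comes down to the sources agreeing on an identifier for each gene. A rough sketch under that assumption: three small made-up sources that happen to use the same subject identifier are unioned into one graph, and then everything known about one gene can be read off together. The identifiers, predicates and values are placeholders, not data from the talk.

```python
def merge(*sources):
    """Union several triple collections into one graph (a set drops duplicates)."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def profile(graph, subject):
    """Collect everything known about one subject, grouped by predicate."""
    result = {}
    for s, p, o in graph:
        if s == subject:
            result.setdefault(p, []).append(o)
    return {p: sorted(values) for p, values in result.items()}


# Three hypothetical sources that share the same gene identifier.
genome = [("gene:YAL001C", "name", "TFC3")]
interactions = [("gene:YAL001C", "interactsWith", "gene:YBR123C")]
annotations = [("gene:YAL001C", "annotatedWith", "go:example-term")]

graph = merge(genome, interactions, annotations)
print(profile(graph, "gene:YAL001C"))
```

This only works because all three sources name the gene the same way; if they do not, a mapping step has to come first, which is the naming problem again.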

Web interface to RDQL query language, use a query language generator...
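I did not catch how their query generator works, so the following is only a guess at the idea: a form collects variables and triple patterns, and a small function assembles the RDQL text. The function name, the `y:` prefix and the namespace URI are hypothetical; the output follows the `SELECT ... WHERE (s, p, o) USING prefix FOR <uri>` shape of RDQL as I know it.

```python
def build_rdql(select_vars, patterns, prefixes=None):
    """Assemble an RDQL query string.

    select_vars: variable names without the leading "?"
    patterns:    (subject, predicate, object) tuples, already written in
                 RDQL terms (e.g. "?gene", "y:name", "?name")
    prefixes:    optional mapping of prefix -> namespace URI
    """
    select = ", ".join("?" + v for v in select_vars)
    where = ", ".join("(" + ", ".join(p) + ")" for p in patterns)
    query = f"SELECT {select}\nWHERE {where}"
    if prefixes:
        using = ", ".join(f"{name} FOR <{uri}>" for name, uri in prefixes.items())
        query += f"\nUSING {using}"
    return query


query = build_rdql(
    ["gene", "name"],
    [("?gene", "y:name", "?name")],
    {"y": "http://example.org/yeast/vocab#"},  # hypothetical namespace
)
print(query)
```

A generator like this saves users from typing URIs by hand, though it does no validation or escaping of what goes into the patterns.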

Integrative data mining and analysis...

He's talking about some of the problems now, e.g. naming issues, legal and social issues, security and robustness etc.

In the future they want to look at LSIDs, ontology mappings, and semantic web services such as OWL-S...

Audience questions:

Seems the whole audience is walking out :)

Should chat to this guy ?

My questions:

Why RSS ? It is a syndication technology for news...

This talk is really about a closed-world system (i.e. they convert all the data); they don't take existing RDF and try to integrate it.