IBM DB2 - Graph extender demo

Six grueling days, and I still have the conference dinner and my poster session to go. I haven't found any of today's presentations particularly interesting, so I've skipped most of the sessions. I will get along to the keynote at 5:10pm tonight.

I've walked into the IBM demo 40 minutes late. It runs for two hours, and it looks like I've just caught the end of the technical introduction. Basically, IBM are pushing something called the DB2 "graph extender" (Google doesn't seem to have anything on "graph extender"), which, judging by the last few slides, turns DB2 into a graph database. She runs through some of the queries; it doesn't look like it is based on RDF or any of the other semantic web technologies.

The query language looks specific to DB2, but I only saw two slides. She is moving on to the demo itself now, using simple CGI script interfaces to the DB2 graph database. She shows an example of shortest path queries on yeast data sets, but no real merging example?
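For anyone who hasn't seen one, a shortest path query on an unweighted interaction network boils down to a breadth-first search. The sketch below is a minimal Python version over a hypothetical toy edge list: the node names A to E are made up (not yeast data), and the `shortest_path` function is purely illustrative, not part of DB2's graph extender, whose query syntax I only glimpsed.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return one fewest-hops path from source to target, or None.

    edges: iterable of (a, b) pairs, treated as undirected interactions.
    """
    # Build an undirected adjacency list from the edge pairs.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    if source not in graph or target not in graph:
        return None

    # Breadth-first search: the first time a node is reached is via a shortest path.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for deterministic output
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical toy interaction network; not real yeast data.
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
print(shortest_path(interactions, "A", "D"))
```

Running it prints `['A', 'E', 'D']`, the two-hop route, rather than the three-hop route through B and C.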

The obvious question is how they get this kind of data to integrate, and how the identifier mapping is done. Maybe I missed this.

Someone has asked where RDF fits in...

The answer is to write a wrapper around a site that speaks RDF... which is not that good an answer. It seems we are at the point where people are now aware of RDF as a significant technology but don't yet really appreciate the technical details. For example, one guy mentions that if we grab an RDF graph we could potentially suck down the entire web... presumably because RDF uses URIs? Okay then, but we haven't even solved the RDF URI resolution problem; see the LSID discussion, URIQA, etc. Afterwards I spoke to one of the IBM guys, who told me that they have researchers at the IBM Cambridge labs who are working on integrating RDF technology into DB2.


Comments


Oracle are active in this space too

I bumped into Susie Stephens, from Oracle's Life Science marketing group, here at ISMB.
Oracle are active in this space too; e.g., they are a participant in the BioDASH project.

Oracle already has support for graphs (or, in Oracle's terminology, networks) built into its database, allowing various types of topological query to be run on the network connections stored in the db.
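To make "graphs stored in the db" concrete, here is a minimal sketch of the general idea using SQLite's `WITH RECURSIVE` through Python's standard `sqlite3` module: edges live in an ordinary relational table, and a recursive query answers a simple topological question (which nodes are reachable within N hops of a start node, and at what minimum distance). The `edge` table, its columns, the sample edges and the `reachable` helper are all hypothetical; this is not Oracle's network data model API or DB2's graph extender syntax.

```python
import sqlite3

# Hypothetical schema: one row per directed edge in the network.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.execute("CREATE INDEX edge_src ON edge (src)")
conn.executemany(
    "INSERT INTO edge (src, dst) VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")],
)

# Recursive CTE: follow outgoing edges from the start node, counting hops.
# The hop limit guarantees termination even if the graph contains cycles.
REACHABLE_SQL = """
WITH RECURSIVE reach(node, hops) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, r.hops + 1
    FROM edge AS e
    JOIN reach AS r ON e.src = r.node
    WHERE r.hops < ?
)
SELECT node, MIN(hops) FROM reach GROUP BY node ORDER BY MIN(hops), node
"""


def reachable(conn, start, max_hops):
    """Return [(node, min_hops), ...] for every node within max_hops of start."""
    return conn.execute(REACHABLE_SQL, (start, max_hops)).fetchall()


print(reachable(conn, "A", 3))
```

This prints `[('A', 0), ('B', 1), ('E', 1), ('C', 2), ('D', 2)]`: D is reachable at both two and three hops, and `MIN(hops)` keeps the shorter distance.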

Apparently in 10g release 2 (due out any time now) there will be explicit support for RDF, built on top of this underlying network representation/querying technology.

Earlier presentations at ISMB have mentioned how poorly most current RDF storage/querying technologies perform as the number of 'facts' scales up. It will be interesting to see how the various technologies in this space evolve as they start to be thrown at non-toy problems.


Good point. Scalability of triple stores

Good point. Scalability of triple stores for biological applications (or in general) is clearly something that will need to be addressed. There is a report from SWAD-Europe on RDF scalability. Relevant quote:

Other important storage issues that arose were with scalability of the current systems available. Stores that can handle 10-20M triples are readily available and the current state of the art is around 40M; the development community is considering the next 10x increase in storage requirements, and their effect on indexing, which has tended to be O(n) for triples. Novel dedicated storage approaches such as in RDFStore were shown to avoid this. The dedicated non-relational stores can outperform the relational ones in such scaling, although the relational databases continue to perform well.
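One common, generic way to keep triple-pattern lookups from degenerating into a scan over every stored triple is to hold the triples in several permutation indexes (subject-predicate-object, predicate-object-subject, object-subject-predicate), so any pattern with at least one bound term becomes a couple of dictionary lookups. The sketch below is a toy in-memory illustration of that technique only; it is not necessarily how RDFStore or any of the stores in the report work, and the `TripleStore` class and the `ex:` namespace in the sample data are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory triple store with three permutation indexes (SPO, POS, OSP).

    Any pattern with at least one bound term is answered by dictionary lookups
    on the appropriate index rather than by scanning every stored triple.
    """

    def __init__(self):
        # index[first][second] -> set of third terms
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        """Insert one triple into all three indexes."""
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None means 'any'."""
        if s is not None and p is not None:
            objects = self.spo.get(s, {}).get(p, set())
            if o is not None:
                if o in objects:
                    yield (s, p, o)
            else:
                for o2 in objects:
                    yield (s, p, o2)
        elif s is not None and o is not None:
            for p2 in self.osp.get(o, {}).get(s, set()):
                yield (s, p2, o)
        elif p is not None and o is not None:
            for s2 in self.pos.get(p, {}).get(o, set()):
                yield (s2, p, o)
        elif s is not None:
            for p2, objects in self.spo.get(s, {}).items():
                for o2 in objects:
                    yield (s, p2, o2)
        elif p is not None:
            for o2, subjects in self.pos.get(p, {}).items():
                for s2 in subjects:
                    yield (s2, p, o2)
        elif o is not None:
            for s2, predicates in self.osp.get(o, {}).items():
                for p2 in predicates:
                    yield (s2, p2, o)
        else:
            # Fully unbound pattern: the only case that must visit every triple.
            for s2, po in self.spo.items():
                for p2, objects in po.items():
                    for o2 in objects:
                        yield (s2, p2, o2)


# Hypothetical example data using a made-up "ex:" namespace.
store = TripleStore()
store.add("ex:geneA", "ex:interactsWith", "ex:geneB")
store.add("ex:geneA", "rdf:type", "ex:Gene")
store.add("ex:geneB", "rdf:type", "ex:Gene")
print(sorted(store.match(p="rdf:type", o="ex:Gene")))
```

The trade-off is that every triple is stored three times, so memory use and write cost go up in exchange for lookups that avoid scanning the whole store.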

My understanding is that both DB2 and Oracle 10g implement graphs on top of their relational databases.