I have been noticing that Nodalpoint's "Who's online" box has been showing somewhere between 20 and 80 people online at any given time, and this got me thinking again about how the internet has been such a strong positive force for the development of collaborative work. Open source software comes to mind as one of the best examples of what you can achieve by getting interested people together in a virtual space. Why can't we do the same for scientific research?
I would start with two utopian premises: 1) we do research because we like science and 2) no one would ever claim other people's work as their own. With these two simple ideas we are free to think of a virtual research group very much like an "open source" effort.
This is how it would work:
Anyone with a challenge/idea would contribute to an idea bag/pile. How many of us have had ideas in the past that we put aside for future work and never got around to pursuing?
Ideas would be discussed, reference papers attached with comments, expertise and resources necessary would be assessed and requested.
If there is no known solution to the challenges raised and the community has the resources and expertise to tackle the question the project would start.
If enough people collaborated, this would require little effort from each member of the community, and eventually some results (positive or negative) would come up. I think a lot of us, once in a while, like to get our hands on something that does not necessarily have to do directly with our own research but that we find interesting. Would you give 5% of your time to such a side project?
A report would be written with the data/methods/programs freely available and it would be extensively reviewed by the community. If the information is interesting a paper would be submitted by the community to a peer-review journal.
I was looking online for some equivalents to this, and we have the well-known bioinformatics.org, but the site is just for open source software development and ends up being just a repository. Something very close to what I had in mind is ThinkCycle: "ThinkCycle seeks to create a culture of open-source design innovation, with ongoing collaboration among individuals, communities and organizations around the world."
I for one would spend some time on a side project just to see Nodalpoint as the author of a paper published in a peer reviewed journal :).


Comments
Live textbooks
Why not use wikibooks to collaborate on something that might make a chapter in a textbook?
Wikibooks
some different science collab tools
I have also been interested in the same ideas of 'open-source science', and have spoken with a number of my colleagues (mainly academic scientists) about this to see what they think of the idea. Pretty much all of them have said it won't work because, to put it in a nutshell, a) publish or perish means that work with no publications = no motivation and b) who has time for this even if there was motivation? Also, most of the scientists I associate with (even the math geeks) are pretty techno-averse, and they view hanging around on science blogs as a time sink. I've argued that 'open-source science' is so much more efficient that arguments a & b will be moot, since once open science gets going, anyone who won't join will find that their scientific problems are solved faster by the open group. With each day/month/year, the possibility of scientific secrecy diminishes, so it seems to me that we are in an unstable equilibrium of sorts: our current local minimum is inefficient scientific development, and there is a much lower minimum nearby, which is much more efficient open scientific development -- to use a geeky metaphor.
However, as a research scientist, I'm not entirely satisfied with the collaboration tools we have now -- wikis, blogs, lists, forums. I've been playing around with a different type of on-line collaboration tool based on PDFs, so that you can discuss the primary literature with colleagues and discuss paper drafts. Ok, this isn't nearly as collaborative as wikis, but I've had so much push back from my non-geek colleagues on the wiki idea that I thought I'd try developing something different. If you want to see some different content collaboration stuff in development, go to www.iugo-sci.net where you can see demos of a PDF collab tool that a friend and I developed called Chinook. The Chinook engine is all open-source. You can download the engine to your own server at iugo-sci.net/chinook-home.php to change the PHP code as you wish or run your own Chinook site.
The computer scientists have lots of examples of open-collaboration working (I'm thinking of projects like apache.org, especially), but I think that they have the advantage of being techno-geeks. Most scientists I work with are not like that and it's been an uphill battle just trying to get my colleagues to try stuff. For example, I've tried to get all my colleagues on LinkedIn.com so that we could share post-doc and grad opportunities more efficiently (I was getting tired of e-mailing all my colleagues to find out when their grad students were finishing up). Even with the inducement of potential jobs, getting my non-geek scientist friends signed up has been a chore. In summary, I think there are technology barriers that will need to be reduced before there is a big increase in 'open-source science'. Perhaps if it were as easy to set up a collaboration on-line as starting a yahoo groups account.... like a www.MyScience.com site (as opposed to www.MySpace.com for music lovers).
E E Holmes
I have encountered the same
I have encountered the same problems promoting collaborative technologies in academic environments. Chinook looks interesting, in fact I'm sure I came across it before in some other context ?
Something else relevant to this discussion I found is The Center for The Development of a Virtual Tumor. It doesn't sound very appealing, but they are using what looks to be a modified wiki to do online collaboration.
collaborative technologies
The Center for The Development of a Virtual Tumor is definitely interesting, although it's frustrating that it's password controlled and one can't see much of what they are doing. I'd love to see other examples of sites for on-line scientific collaboration (if anyone knows of them). There's also a big bioinformatics group doing open-source stat code dev, but I can't put my hands on the link at the moment.
I've been pondering all summer about what exactly a generic science collaboration site would look like. We have good blog tools/sites, but what's a good generic structure for a site that advances a field or subfield (without being password protected)? What is it about MySpace.com and friendster.com that made those sites so viral within their target group? I don't believe that biologists (a field that is particularly techno-resistant, as most here can probably attest) are more techno-resistant than the average person on MySpace or Friendster. I think rather that the right idea has not yet been put together. I spent all Sunday night talking with a programmer friend at pipestone.com (a group working on something kind of like drupal, but more generic). I tried for 2 hours to come up with what MyScience.com would look like -- what would make a collab site 'viral' to scientists. We went through 10 sheets of paper, each cluttered with little diagrams of brainstorming ideas, but we never came up with anything that gave me that 'this is it, this is the right direction' feeling. Sigh, some days I think I shouldn't have read 'The Tipping Point'.
technological or psychological barriers
I think you've pinpointed a couple of problems that many of us in the "technogeek" community face - getting people interested and getting people to understand why we think these things are good things.
I've tried and failed to get my group into RSS feeds. First I ask - have you never noticed those little orange boxes on websites which say "RSS" or "XML"? No, they have not. So I try to explain what a feed is and where you can obtain feed reader software. I point out the obvious feature of feeds, their whole reason for being, namely that they allow you to keep track of when numerous sites of all kinds update their content so you only have to visit the site when it changes or something interesting appears. I point out that this is great for scientific literature as many journals have these feeds.
To no avail. To my knowledge, not one of them has followed it up.
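For anyone who wants to see what a feed reader does under the hood, here is a minimal sketch in Python. It assumes a plain RSS 2.0 feed; the sample feed, its titles and its example.org links are made up for illustration, and a real reader would download the XML from a journal's feed URL instead of using a string. The only idea it shows is the one described above: remember what you have already seen and report only what is new.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, invented for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal</title>
    <item><title>Paper A</title><link>http://example.org/a</link><guid>a</guid></item>
    <item><title>Paper B</title><link>http://example.org/b</link><guid>b</guid></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (title, link) pairs for feed items not seen before.

    seen_guids is a set that is updated in place, so calling the
    function again with the same feed returns nothing new.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when an item carries no guid.
        guid = item.findtext("guid") or item.findtext("link")
        if guid not in seen_guids:
            fresh.append((item.findtext("title"), item.findtext("link")))
            seen_guids.add(guid)
    return fresh
```

Calling `new_items(SAMPLE_FEED, seen)` once per journal on a schedule, with `seen` saved between runs, is roughly all a basic reader does.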
I find many of my colleagues are unwilling to experiment with computers. They are also asking me things like "how do I draw a box around this?" If I know, sometimes they ask "how do you know?" I'm just perplexed. I didn't know, so I sat down at the computer and I played around, or read some help or manual pages, I found out and now I do know. Isn't this how everyone operates? Particularly scientists whose purpose is supposed to be "find out how stuff works"? Apparently not.
It seems many people just want their computers "to work". They are a black box. There is no comprehension of how what appears on the screen comes to be there. Without any grasp of what it means to write a program, the simple notion that you can make a computer do anything you want - you are in control as opposed to a passive dumb user - is not there. Hence the success of Microsoft.
I really struggle with these things because to me, a true scientist is a technogeek by definition. And I have to say - I only see this attitude from biologists (disclaimer - I'm a biologist by training). I wonder if it stems from the bad old days when biology was labelled as soft science, practiced only by those who were not adept enough in mathematics and physical sciences (and by extension now, computation). I wonder whether, in these days of computational biology (clearly not soft science), there isn't a hard core of biologists who are responsible for maintaining this myth.
I wonder if it does all just stem from the "publish or perish" attitude, if people are being trained to believe that time spent learning new skills on the job is time wasted. How do we get this strange disconnect in people's minds between the importance of the end result (the paper) and the importance of the tools used to get there? Why can people not see that time spent now is time saved later - that just a few simple computational skills can remove the tedium of everyday tasks (parsing instead of cutting and pasting)? Why do they continue to struggle with inadequate software for important tasks (Word/Endnote for papers) when they are being told that there are better alternatives available (CVS, LaTeX), if they'd just take a little time to learn these things?
In my current environment I tend to assume that it's all because most people around me are inadequate. But I hear it from other places too, which makes me wonder if something fundamental in the way we teach and learn science, to others and ourselves, is deeply wrong.
Well that became a bit of a rant. I took a brief look at your site and it looks pretty nice on first inspection. Keep fighting the fight!
Neil
I think the problem is
I think the problem is generational (to some extent). Most established PIs today were educated in very different circumstances to what undergraduates and postgraduates are today. From my observations most people will use the tools they are familiar with and find change very difficult. Even simple things like using Endnote for reference management and advanced MS Word features seem beyond many people, mainly because they were not explicitly taught these kinds of tools during undergrad/postgrad.
In most labs, in fact most working environments, people generally follow the leader (it's just our biological nature when put in hierarchical groups). So if the head of the lab isn't promoting these technologies it is unlikely that members of the lab will use them. For example in my current lab there is a group wiki that is used quite frequently. However it is not used for the "enjoyment" of collaborations, more for the boss to keep track of the worker bees. And the usage didn't pick up until the boss mandated usage of the wiki.
And you can forget RSS. I gave a talk to the boss (and the lab) on the benefits of RSS for research (Connotea, CiteULike, journal RSS feeds etc.) and while I'm sure he at least appreciated the significance of the technology there was no follow up.
The frustrating thing is that you would expect people with scientific training (objective, research training, discovery, open minded?) to be the first people to jump on this kind of technology. The reality is entirely opposite to this? So in fact *most* scientists are highly subjective, incapable of collaboration (see various objections above), closed minded and narrow minded?
It could be that I have had bad experiences, but I hear similar stories from others ? Or maybe I need a holiday :)
This is an excellent idea
This is an excellent idea and I think the time is right for this kind of emergent collaborative science. In fact I have been thinking along the same lines myself, here are some thoughts:
In terms of the technology to enable this kind of online scientific collaboration there is much to choose from: weblogs, mailing lists, wikis, forums, cvs/subversion, bugtrackers etc. There is also a growing number of new PIs, post-docs and certainly graduate students who are familiar enough with open source development, bioinformatics, biology, mailing lists, wikis etc. to contribute to such a project.
There are numerous research opportunities for virtual projects and I am assuming here that a successful project will be discovery based. There is a tremendous amount of genomic data from all kinds of organisms, as well as protein-protein interactions, proteomics and of course microarray data. However, like all good science, the success of online collaborative research will be about asking the right questions. I think this is where collaboration online will be helpful: all hypotheses will be thoroughly checked.
The main barrier to entry will be social: someone will likely have to take a leading role in the development, do the organization etc. And the project must maintain focus. Basically the same kind of social issues that crop up in regular research environments (labs). So I guess this is a virtual opportunity to try your hand at being a PI :)
Now I think with the right kind of question all these issues can be handled. I think this needs to be done outside of the context of existing institutions (too much social baggage) and it must be very grass-roots. If my experience of open source software development is anything to go by, people tend to gravitate towards roles they feel most comfortable with. Leaders will lead, coders will code, some people fact check and write documentation (I should say the manuscript).
ThinkCycle looks interesting but it seems that the process of selecting the project has become the focus, rather than working on a project. While I agree that some kind of idea bank will be necessary, I don't think there is a critical mass of people to drive those ideas forward. I think we can turn to open source again for the answer: it usually starts with an itch a programmer needs to scratch. So adding research questions to a wiki and then taking a vote on which ones are the most interesting and potentially do-able might be the right approach.
I'm willing to put my money where my mouth is here and offer up hosting and a domain name to focus the project. We can use the nodalpoint wiki to start brainstorming research questions, once something concrete is in place, I can move the relevant pages to the new site. With the addition of a bug tracker, mailing list etc.
So then, suggestions for starting points? Maybe something with a small genome rather than a Eukaryote? I mention this simply due to size of the data, so everyone at home can follow along. Should it be methodological? Discovery? Can anyone convince a few wet-lab scientists to run a few PCRs? Something audacious, publish a paper in the Journal of Bioinformatics within six months of starting... Minimal genome research, pathway redundancy, what does all that non-coding DNA do? Maybe a comparative genomic survey of non-coding DNA? Various alternative-splicing questions, comparative yeast genomics maybe? Are there really pathogenic archaea? etc.
And if any of this actually works, then you will have participated in the beginning of a revolution.
Junk DNA
Junk DNA sounds like an interesting topic that a lot of people could help out with. Recently the genome of Pelagibacter ubique (Science, sub only) was released and it looks like one of the most compact genomes to date. According to the paper the average intergenic distance for this species is 3 nucleotides. For some bacterial and archaeal genomes the intergenic spacers can reach an average of around 300 nucleotides. If we assume that it is somehow good to have small intergenic regions because it makes the genome cheaper to duplicate, then one can ask why not all species have small intergenic spacing. Why is there such a big range of average intergenic spacing?
One starting point could be the characteristics of the mRNA: do very compact genomes have to have particular signals in the mRNA ? Can we find them ? Is there some price to pay to have such small spacing ? On the other side of the spectrum, do species with big intergenic regions gain anything from them ?
We could look at: the nucleotide frequencies around mRNA signals, conservation and evolution of intergenic regions , ... ?
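As a first step, the average intergenic spacing mentioned above could be recomputed for any annotated genome. Here is a minimal Python sketch. It assumes gene coordinates are already available as 1-based, inclusive (start, end) pairs on a single linear replicon. It ignores strand, and it ignores the wrap-around gap on a circular chromosome. The coordinates in the usage note are invented, not taken from any real annotation.

```python
def intergenic_distances(genes):
    """Return the gaps (in nucleotides) between consecutive genes.

    genes: list of (start, end) tuples, 1-based inclusive coordinates.
    Overlapping or nested genes contribute a gap of 0.
    """
    if len(genes) < 2:
        return []
    ordered = sorted(genes)
    gaps = []
    prev_end = ordered[0][1]
    for start, end in ordered[1:]:
        # Bases strictly between the previous gene end and this start.
        gaps.append(max(0, start - prev_end - 1))
        # Keep the furthest end seen so nested genes do not shrink it.
        prev_end = max(prev_end, end)
    return gaps


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0
```

For example, `mean(intergenic_distances([(1, 100), (104, 200), (150, 300), (310, 400)]))` gives 4.0, from gaps of 3, 0 and 9. Running the same function over many genomes would give the distribution of spacing that the questions above are about.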
I Would be able to spend more than 5% of my Time
I think the large intergenic spacing consists of permanently suppressed sequences. These may have been coding for proteins in the past and might have been permanently masked during the evolution of these organisms.
These sequences might have been coding for some specific proteins which became unnecessary in the course of evolution.
or
These sequences might have lost their promoter bases due to heavy mutations which would have occurred in the post-Jurassic eras.
chimpanzee
The chimpanzee genome is being published tomorrow ( Guardian story at http://www.guardian.co.uk/life/news/story/0,12976,1560091,00.html , Nature special issue at http://www.nature.com/nature/focus/chimpgenome/index.html ). Maybe there are some interesting junk DNA comparisons to be made there (it's obviously not a small genome, but you could take one chromosome, perhaps)?
we'll need a grid :)
I bet Jim Kent and the UCSC crowd are aligning the entire chimp genome with just about everything else as we speak.
I doubt we can compete with groups running 1000+ node clusters. On the other hand, how much compute power do Nodalpoint readers have access to between us all? Perhaps a good project would employ a distributed computing approach that used all our machines.
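To make the distributed idea a bit more concrete, here is a rough Python sketch of how a sequence could be cut into overlapping work units and handed out to volunteers round-robin. Everything here is hypothetical: the function names, the chunk sizes and the volunteer names are for illustration only, and a real system would also need to collect results and cope with machines that never report back.

```python
def work_units(seq_length, chunk_size, overlap=0):
    """Split [0, seq_length) into half-open (start, end) windows.

    Consecutive windows share `overlap` positions so that features
    spanning a boundary are not lost.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    units = []
    start = 0
    while start < seq_length:
        end = min(start + chunk_size, seq_length)
        units.append((start, end))
        if end == seq_length:
            break
        start = end - overlap
    return units


def assign(units, volunteers):
    """Distribute work units to volunteers in round-robin order."""
    plan = {name: [] for name in volunteers}
    for i, unit in enumerate(units):
        plan[volunteers[i % len(volunteers)]].append(unit)
    return plan
```

For instance, `work_units(10, 4, overlap=1)` yields `[(0, 4), (3, 7), (6, 10)]`, and `assign` would spread those three windows over however many machines are on offer.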
A word of caution
The idea of such a collaboration deserves a try, no doubt. The research idea needs a lot more refinement and it might be questionable whether the 5%-contribution of non-experts in non-coding DNA (or whatever field is chosen) amounts to much.
Most bioinformatics projects were really performed by no more than two people who focussed on the problem for a limited amount of time. A scientifically motivated software project might be easier to get off the ground because it would be easier to assemble the requirements and to join forces.
5% of your time
You are right to say that it might be naive to think one can do research in people's "spare" time, because it takes some depth of knowledge in an area to direct the efforts at an interesting and well-posed question. I still think that a collaborative effort would in many cases (not all, of course) speed up research. Instead of relying on the slow process of individual learning, expertise could be tapped from the common pool. Also, the community could bring innovation by looking at questions from an outside-the-field perspective.
I will try to make a structured case for a possible research idea with background, questions and tasks as suggested by Chris.
time constraints
Not to be a downer but for some of us, the very concept of "spare time" is unimaginable ;-)
However, I think this is a very interesting idea. I think the way to go is to fully democratise the process. An area of the website where people post initial proposals, interested parties then join in, discussion of approach and methodology ensues, links to code and results spring up. My belief is that a well-designed web interface goes a long way to getting people to interact in the intended way. They have to be coerced but believe that they have free will :-)
Agreed
I would be happier seeing a little more structure. Let's say a project lead (or perhaps a couple), who will frame the original question, incorporate criticisms, and generally assume responsibility for the project. This person(s) could then provide a break down into tasks, which are then taken over by interested parties. Otherwise, I have a sneaky feeling that projects will remain pipedreams.
So perhaps the way to go about this would be to have people work up an abstract of a question they would like to ask. Not necessarily an A-Z description, but giving enough detail for people to mull over, and perhaps a few relevant literature pointers, if appropriate. If nothing else, a level of organisation will prevent us all from wild-goose chasing.
Just to bust the academic freedom thing, it might be an idea for interested parties, either as leads or contributors, to check their home institutions' and/or employers' policies regarding outside collaborations. Whilst it's shockingly uncouth to speak of such things, I can envision situations where employers will require acknowledgement and/or credit for eg resources (think: folding@home debacle), or who will at least wish to be kept informed of 'extracurricular' activities...
I am with you
I wish to be a part of this collaboration.
I definitely agree with you.
authorship
On a related note to employers owning all your time etc ...
"Virtual" collaboration for research is something I've been thinking about for a while now. One problem I felt would be difficult to overcome is proper attribution of authorship. It comes up often for real offline meatspace collaborations, and I suspect it would be worse for online collaborations between ad hoc groups. Also, how do you trust that a new member of your ad hoc online research group is not going to run off to a competing research group once they have the vital details of your novel findings? Some incremental trust building system would be required. The Vancouver Protocol is a great guide, but how do you follow and enforce it with ad hoc online groups?
To make the best use of contributors' time and be taken seriously (considering many of us are probably slaves to a "publish or perish" environment), I think the ultimate goal should be to publish in a real-life peer-reviewed journal. This means not too many details of a project could be revealed publicly on the web, lest some journals reject the final paper due to prior online publication.
I think setting up the web interface in such a way that every author's contributions can be clearly tracked is essential (and a wiki sort of does this, but I think some clearer edit tracking system may be better, particularly in the early stages of a project).
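As a rough idea of what clearer edit tracking could mean, here is a small Python sketch that summarises an edit log per author. It assumes the wiki (or whatever tool is used) can export edits as (author, page, characters added) records. That export format, and the names in the usage note, are assumptions for illustration and not a feature of any particular wiki engine.

```python
from collections import defaultdict


def contribution_summary(edit_log):
    """Summarise edits per author.

    edit_log: iterable of (author, page, chars_added) tuples.
    Returns {author: {"edits": int, "chars_added": int, "pages": set}}.
    """
    summary = defaultdict(
        lambda: {"edits": 0, "chars_added": 0, "pages": set()}
    )
    for author, page, chars_added in edit_log:
        entry = summary[author]
        entry["edits"] += 1
        entry["chars_added"] += chars_added
        entry["pages"].add(page)
    return dict(summary)
```

Given a log such as `[("ann", "Methods", 120), ("bob", "Methods", 40), ("ann", "Results", 300)]`, the summary shows two edits and 420 characters for "ann" across two pages. Raw counts like these are only a starting point for an authorship discussion, not a substitute for one.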
Anyhow, I think this is a great idea, I'd like to figure out how best to make it work ...
Search of particular signals in the mRNA
For that, we can take inspiration from what people did in this paper. Do we have other kinds of motifs for a less compact genome?
Wiki pages and a mailing
Wiki pages and a mailing list seem like the best bet. IRC perhaps, for chatter, but it's not a necessity (and most university firewalls seem to disable IRC anyway).
Does it need a whole new domain name? I reckon you could get away with making an area on the nodalpoint wiki.
In the initial stages a new
In the initial stages a new domain name is overkill, I agree. It would still be on offer if the project gets going and people feel there is a need for it. Right now a corner of the nodalpoint wiki should be fine:
http://wiki.nodalpoint.org/virtual_collaborative_research ?
The uni I'm at now blocks IRC, but of course MSN is fine (they wouldn't want a riot on their hands). A mailing list is also do-able, once some research questions are offered up...