Liam Quin from the W3C has given a few useful tips on processing documents (e.g. error-prone re-typed or scanned text) into XML.
Many of these practices are important for the sort of text processing tasks that seem to come up in bioinformatics.
Article summary: use lots of small one-off scripts to make small changes, continually validate your output, briefly document your steps, automate the steps with a meta-script or Makefile, and keep input and output text separate (... well, duh!).
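To make the summary concrete, here is a minimal sketch of such a meta-script in Python, written as a stand-in for the Makefile the article mentions. It is an illustration under assumptions, not Quin's own method. The step functions `fix_ampersands` and `normalise_whitespace` and the driver `run_pipeline` are hypothetical names. "Validation" here means only an XML well-formedness check with the standard library's `xml.etree.ElementTree`, not DTD or schema validation. Each step is small, each result goes to its own numbered file in a separate output directory so the input is never touched, and a log line briefly documents every step.

```python
import logging
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable

log = logging.getLogger(__name__)


# Hypothetical one-off steps: each does one small, easily checked change.
def fix_ampersands(text: str) -> str:
    """Escape bare '&' characters that do not already start an entity."""
    return re.sub(r"&(?!(?:[A-Za-z]+|#\d+|#x[0-9A-Fa-f]+);)", "&amp;", text)


def normalise_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs left over from scanning or re-typing."""
    return re.sub(r"[ \t]+", " ", text)


# The ordered list of steps plays the role of the Makefile's targets.
STEPS: list[tuple[str, Callable[[str], str]]] = [
    ("fix_ampersands", fix_ampersands),
    ("normalise_whitespace", normalise_whitespace),
]


def is_well_formed(text: str) -> bool:
    """Well-formedness check only; schema validation would need another tool."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def run_pipeline(src: Path, out_dir: Path) -> list[tuple[Path, bool]]:
    """Apply each step in order, writing one numbered output file per step.

    The source file is only read, never written; every intermediate result
    lands in out_dir, so a bad step can be fixed and re-run from its input.
    Returns (output path, well-formed?) for each step.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    text = src.read_text(encoding="utf-8")
    results: list[tuple[Path, bool]] = []
    for i, (name, step) in enumerate(STEPS, start=1):
        text = step(text)
        out_path = out_dir / f"{i:02d}_{name}.xml"
        out_path.write_text(text, encoding="utf-8")
        ok = is_well_formed(text)
        # Brief record of what each step did and whether its output validated.
        log.info("step %02d %s -> %s (%s)", i, name, out_path.name,
                 "valid" if ok else "INVALID")
        results.append((out_path, ok))
    return results
```

Calling `run_pipeline(Path("raw/chapter1.txt"), Path("work"))` (paths hypothetical) after `logging.basicConfig(level=logging.INFO)` prints one line per step. Steps only report invalid output rather than stopping, because an early step may legitimately leave text that a later step repairs.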


Comments
scripting != programming
I like this part: "Before embarking on writing scripts, you need one of two things: the right frame of mind to write a script, or someone else to do it for you. In either case, the frame of mind is very different from what's needed for making a product, so not all professional programmers are good at doing this until they understand the differences."
clunky
Shouldn't we dream of a more streamlined approach, where you don't need glued together ad-hocery to process data?
ad hoc scripts vs. integrated systems
Yes, I prefer the *idea* of a streamlined approach (just because I posted that article doesn't mean I agree with what it says :) ).
I think if the text being processed doesn't have a defined format, and it is a one-off task, an ad hoc set of tools (which may occasionally be reusable) is often the way to go. Luckily, most raw bioinformatics data has a defined format, making a streamlined approach much more sensible.
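As an illustration of why a defined format favours the streamlined approach: a single small parser can be written once and reused instead of re-scripting each file. The sketch below reads FASTA, assuming plain records that start with a `>` header line followed by sequence lines; it ignores older conventions such as `;` comment lines. `read_fasta` is a hypothetical helper, not part of any library (Biopython's `SeqIO.parse(handle, "fasta")` covers the same ground more thoroughly).

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Assumes each record starts with a '>' header line; the sequence lines
    that follow are stripped and concatenated. Blank lines are skipped.
    """
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes off the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)
```

Because it accepts any iterable of lines, the same function works on an open file (`with open("reads.fa") as fh: for name, seq in read_fasta(fh): ...`, file name hypothetical) or on a list of strings in a test.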
I guess the question you have to ask yourself is... am I ever likely to use this script again?
(Or am I feeling altruistic, and will someone else use it after me even if I never need it again?)
I guess it is this streamlined approach that is slowly emerging from the BioPython, BioPerl, Bio* etc. projects, which is nice.
clunky ^ 2
As Jim Kent puts it: "It's safer on the lagging edge."