I want to share with you something that I've realized in the past few months at school. Forgive me if this is old hat to you but I think it's kind of cool.
I knew that it is important to publish scientific papers if you are in academia ("publish or perish"). Plus I knew vaguely about the concept of peer-reviewed journals, where your findings have to convince a number of other scientists before they can actually be accepted and published. And it seems the "in" thing now, if you're an instructor, is to supplement the text and lectures with generous helpings of assigned readings of scientific papers, so I've pretty much been soaking in them.
What I didn't realize before is that the whole process is one of laboriously building up a body of factoids that are supported by actual data. The more generally applicable these bits are ("reusable" for you software geeks) the better. Then those bits are reshuffled and recombined by other scientists and added to their own results to create even further bits of knowledge. The effect is one of science consuming itself and growing larger at the same time.
Let me give you an example. Say a scientist named Jane Doe decided to do some research to test the hypothesis that the sky is blue. She might scrounge some cool instruments and take readings of the sky in various ways, and then publish data that show that, at least within her constraints, the sky does indeed appear to be blue.
Then someone else wants to ask a question about how animal vision works. I am totally making this up. They might need to establish a basic model of the world in which these animals see. As part of this, they could reference Jane's data. What I think is cool is how they would do this. It's as simple as referencing the data right in what they want to say. For example: "Animals looking up view objects against a sky that is blue (Doe, 2005)."
Technically you could put at least one such reference on every sentence, which gets a bit tedious. What I like about this is that you can connect a lot of dots without having to re-prove everything. You can build on cool things that others have done. You can provide context for your own findings. And it weaves everything together into a big tapestry of stuff we know so far.
One thing this means is that you need to read a lot of papers (don't tell my instructors I said that), and assimilate the information somehow into a kind of brain soup, and be able to recall the relevant bits later. There is actually software that helps with this, and I'm going to a tutorial this week on how to use it, given by the author of the "why google is not god" list.
Another thing it means is that data that haven't been published are not usable. So it turns out that there are rewards for publishing beyond just keeping your teaching job. It's as if publishing data is giving back to the community. I like that model a lot.
Of course I'm taking a bit of a Pollyanna viewpoint here. There are people who publish stuff that's of dubious use to anyone else, and people who misuse data that others have published, perhaps by creative cutting and pasting of statistics. But still. I think it's a charming system and I'm looking forward to joining the game.