Another ancient draft post dusted off while sleepless in China.
One of the best books I've read on how science actually works is Stephen Toulmin's Human Understanding: The Collective Use and Evolution of Concepts. (It is, of course, long out of print.) The core of it is a set of ideas about how the social mechanisms of working scientific disciplines actually implement the intellectual goals of learning about the world, and rationally changing our minds, through an evolutionary process. (And Toulmin actually understands evolution in a sensible way, as blind variation plus selection, rather than as some useless idea about progress or trends.) A lot of the argument is summed up in two of his aphorisms, which he admitted he exaggerated a bit for effect: "Every concept is an intellectual micro-institution" (p. 166), consisting of the people who accept the concept, and the practices by which they use and transmit it; and conversely, "Institutions are macro-concepts" (p. 353).
The natural question is whether one can say which institutions correspond to which concepts, and vice versa. This is a very tricky question, but an excellent beginning has been made by two papers by Camille Roth and Paul Bourgine, which I've been meaning to post about for quite a while.
While I don't want to suggest for a moment that the stuff about Galois lattices is window-dressing, the intuitive idea behind what Roth and Bourgine are doing is simple and compelling, and I think can be accurately presented without an excursion through higher mathematics. (The math is necessary when it comes to actually making the stuff work, though. And really it's pretty cool math in itself. To read more about it, it may be helpful to know that the structure Roth and Bourgine call a Galois lattice, also known as a concept lattice, is built from a Galois connection: a relationship between two lattices, here the lattice of sets of scientists and the lattice of sets of concepts. The Wikipedia entry on Galois connections is good, and explains where the name comes from, namely Galois theory in abstract algebra, which was, in fact, initiated by Evariste Galois.)
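For readers who do want a taste of the mathematics, here is one conventional way of writing the construction down, in the notation of formal concept analysis. The symbols (a set $S$ of scientists, a set $C$ of concepts, an incidence relation $I \subseteq S \times C$ meaning "scientist $s$ employs concept $c$", and the prime operators) are assumed here for illustration and are not necessarily Roth and Bourgine's own notation.

```latex
\begin{align*}
A' &= \{\, c \in C : (s, c) \in I \text{ for all } s \in A \,\}, && A \subseteq S \\
B' &= \{\, s \in S : (s, c) \in I \text{ for all } c \in B \,\}, && B \subseteq C \\
A \subseteq B' &\iff B \subseteq A' && \text{(the Galois connection)} \\
(A, B) \text{ is a community} &\iff A' = B \text{ and } B' = A && \text{(so } A = A''\text{)} \\
(A_1, B_1) \le (A_2, B_2) &\iff A_1 \subseteq A_2 \iff B_2 \subseteq B_1
\end{align*}
```

In words: $A'$ is the set of concepts shared by everyone in a group $A$, $B'$ is the set of everyone who uses all the concepts in $B$, and a community is a pair where each side is exactly what the other side produces.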
Start with an arbitrary collection of scientists; we don't care whether or not they have anything to do with one another in reality, at least not yet. These will have certain concepts they employ in their research. Probably no two scientists employ exactly the same concepts, but it's a good bet that scientists who are part of the same research community will have a lot of concepts in common. So, let's write down the set of concepts which are shared by all the scientists in our initial collection. This corresponds to starting with the initial set of all concepts, and then tossing out any ideas which aren't shared by all of our scientists. This is, so to speak, the conceptual intension of the group. Now that we have this group of concepts, we can ask "are there any other scientists who employ all these concepts?" That is, we take our initial group, and augment it with all the other scientists who share all the concepts shared by its initial members; this is the social extension of the concepts. Now, notice, we've come to a fixed point. If we took the augmented group and repeated this procedure, we'd get no increase in the group. (You might care to try checking this by hand.) Roth and Bourgine call such an augmented group an "epistemic community".
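The two steps just described (find the concepts a group shares, then gather everyone who uses all of them) are easy to sketch in code. This is a minimal illustration, assuming a small hypothetical table `USES` that maps invented scientists to invented concept sets; the helper names `intension`, `extension` and `close_group` are labels chosen here, not anything from Roth and Bourgine's papers.

```python
# Hypothetical toy data: which concepts each (invented) scientist employs.
USES = {
    "ana":   {"gene", "expression", "embryo"},
    "ben":   {"gene", "expression", "cloning"},
    "chloe": {"gene", "expression", "embryo", "spinal cord"},
    "dev":   {"cloning", "sequence"},
}
ALL_CONCEPTS = set().union(*USES.values())


def intension(scientists):
    """Concepts shared by every scientist in the group (all concepts if the group is empty)."""
    shared = set(ALL_CONCEPTS)
    for s in scientists:
        shared &= USES[s]
    return shared


def extension(concepts):
    """Every scientist who employs all of the given concepts."""
    return {s for s, used in USES.items() if concepts <= used}


def close_group(scientists):
    """Augment a group of scientists to an epistemic community: (members, shared concepts)."""
    concepts = intension(scientists)
    return extension(concepts), concepts


members, concepts = close_group({"ana", "ben"})
print(sorted(members), sorted(concepts))  # ['ana', 'ben', 'chloe'] ['expression', 'gene']

# Running the augmented group through again changes nothing: it is a fixed point.
assert close_group(members) == (members, concepts)
```

Here chloe was not in the starting group, but she uses both of the concepts ana and ben share, so she gets pulled in; feeding the augmented group back through leaves it unchanged.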
The same trick can be worked the other way, too. Start with a set of concepts; identify all the scientists who share them; and then add any additional concepts all those scientists have in common. This will get you to a fixed point as well, and so it will also be an epistemic community.
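The reverse direction can be sketched the same way. Again this runs on a hypothetical toy table, repeated so the example stands alone, and `close_concepts` is an invented helper name.

```python
# Hypothetical toy data: which concepts each (invented) scientist employs.
USES = {
    "ana":   {"gene", "expression", "embryo"},
    "ben":   {"gene", "expression", "cloning"},
    "chloe": {"gene", "expression", "embryo", "spinal cord"},
    "dev":   {"cloning", "sequence"},
}
ALL_CONCEPTS = set().union(*USES.values())


def intension(scientists):
    """Concepts shared by every scientist in the group."""
    shared = set(ALL_CONCEPTS)
    for s in scientists:
        shared &= USES[s]
    return shared


def extension(concepts):
    """Every scientist who employs all of the given concepts."""
    return {s for s, used in USES.items() if concepts <= used}


def close_concepts(concepts):
    """Start from concepts, find everyone using them, then add whatever else they all share."""
    members = extension(concepts)
    return members, intension(members)


members, concepts = close_concepts({"spinal cord"})
print(sorted(members), sorted(concepts))
# ['chloe'] ['embryo', 'expression', 'gene', 'spinal cord']

# Closing the enlarged concept set again gives the same community.
assert close_concepts(concepts) == (members, concepts)
```

Starting from the single concept "spinal cord" finds only chloe, and then picks up the three other concepts she employs, landing on the same kind of fixed point as before.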
It's easy to convince yourself that if community A includes all the scientists in B and more, it must share fewer concepts, and vice versa. This lets us define a kind of structure on communities, of the sort technically known as a lattice: socially larger but conceptually more impoverished groups sit higher in the lattice than smaller, more conceptually-distinct groups, until at the very top one finds the collection of all the scientists in the world, and whatever incredibly generic conceptual apparatus they all have in common (if anything). Near the bottom are individual scientists, with their complete, presumably unique conceptual repertoires; the very bottom is the full set of concepts, together with whoever (usually nobody) employs every one of them.
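To see the lattice concretely, one can close every possible group of scientists in a toy example and collect the distinct communities that result. This is a brute-force sketch on hypothetical data with invented names: it is fine for a handful of people, but real data of Medline size would need a dedicated lattice-construction algorithm instead.

```python
from itertools import combinations

# Hypothetical toy data, repeated so this sketch runs on its own.
USES = {
    "ana":   {"gene", "expression", "embryo"},
    "ben":   {"gene", "expression", "cloning"},
    "chloe": {"gene", "expression", "embryo", "spinal cord"},
    "dev":   {"cloning", "sequence"},
}
ALL_CONCEPTS = frozenset().union(*USES.values())


def intension(scientists):
    """Concepts shared by every scientist in the group."""
    shared = set(ALL_CONCEPTS)
    for s in scientists:
        shared &= USES[s]
    return frozenset(shared)


def extension(concepts):
    """Every scientist who employs all of the given concepts."""
    return frozenset(s for s, used in USES.items() if concepts <= used)


def all_communities():
    """Close every subset of scientists; the distinct results are the lattice's nodes.

    Brute force over all 2**n subsets, so only sensible for tiny examples.
    """
    names = sorted(USES)
    found = set()
    for k in range(len(names) + 1):
        for group in combinations(names, k):
            concepts = intension(group)
            found.add((extension(concepts), concepts))
    return found


communities = all_communities()

# Socially larger communities first, i.e. higher in the lattice.
for members, concepts in sorted(communities, key=lambda p: (-len(p[0]), sorted(p[0]))):
    print(sorted(members), sorted(concepts))

# The order reversal from the text: strictly more members means strictly fewer concepts.
for m1, c1 in communities:
    for m2, c2 in communities:
        if m1 < m2:
            assert c1 > c2
```

In this toy table, ana on her own is not a community at all: closing {ana} pulls in chloe, whose repertoire contains all of ana's concepts. The very bottom node is the full concept set with nobody in it.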
Just by itself, this is a neat idea for characterizing epistemic communities, but Roth and Bourgine, in their second paper, go further, and show that it can be used to actually discover such communities, more or less blind. What they did was take all the papers in Medline from 1990 to 1995 that included "zebrafish" among their keywords. Next, they identified, for operational purposes, the remaining keywords with the concepts employed in the papers. (Naturally, they are aware of all of the pitfalls involved in this.) They then built the Galois lattice of authors and concepts, on the assumption that every author on a paper employed all of the concepts in its keywords. This gives them a lattice of communities, and the striking thing is that their communities make sense. The top is a single community centered on the concept "zebrafish", which is no surprise. Below that are communities centered around words like "gene", "expression", "pattern", "embryo", "develop", "vertebrate", with high overlap, and another bunch of overlapping communities based on "cloning", "stage", "transcription", "sequence", "protein", "region", "encode". These in turn subdivide into, say, neuro-developmental communities (P. Z. Myers is probably in there someplace), or even ones focused just on the spinal cord. Because this is a lattice structure, and not just a tree, there are sub-communities (say, changing patterns of gene expression in the spinal cord during different stages of development) which belong to multiple higher-level communities.
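The data-preparation step described above, in which every author on a paper is credited with all of its keywords, can be sketched like this. The paper records, author names and helper names are hypothetical; this is not Roth and Bourgine's actual pipeline or real Medline data. As a further assumption, it keeps "zebrafish" itself as a concept, so that the top community comes out centered on it.

```python
from collections import defaultdict

# Hypothetical toy records: (author list, keyword list). NOT real Medline data.
PAPERS = [
    (["ana", "ben"], ["zebrafish", "gene", "expression", "embryo"]),
    (["ben", "chloe"], ["zebrafish", "gene", "cloning", "sequence"]),
    (["dev"], ["zebrafish", "spinal cord", "expression", "embryo"]),
    (["eli"], ["mouse", "gene"]),  # no "zebrafish" keyword, so it is filtered out
]


def author_concepts(papers, required="zebrafish"):
    """Credit every author of a selected paper with all of that paper's keywords."""
    uses = defaultdict(set)
    for authors, keywords in papers:
        if required not in keywords:
            continue
        for author in authors:
            uses[author].update(keywords)
    return dict(uses)


def intension(uses, scientists):
    """Concepts shared by every scientist in the group."""
    shared = set().union(*uses.values())
    for s in scientists:
        shared &= uses[s]
    return shared


def extension(uses, concepts):
    """Every scientist who employs all of the given concepts."""
    return {s for s, used in uses.items() if concepts <= used}


uses = author_concepts(PAPERS)

# Top of the lattice: everybody, and whatever they all have in common.
top_concepts = intension(uses, uses.keys())
top_members = extension(uses, top_concepts)
print(sorted(top_members), sorted(top_concepts))
# ['ana', 'ben', 'chloe', 'dev'] ['zebrafish']

# A smaller community, found by closing the group {ana, dev}.
sub_concepts = intension(uses, {"ana", "dev"})
sub_members = extension(uses, sub_concepts)
print(sorted(sub_members), sorted(sub_concepts))
# ['ana', 'ben', 'dev'] ['embryo', 'expression', 'zebrafish']
```

Note that ben is credited with the keywords of both his papers, which is exactly the "every author employs every keyword" simplification the text mentions.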
Roth and Bourgine check their results by comparing them to the ideas zebrafish biologists have about themselves, as revealed by tables of contents, review papers, etc., and by a native informant. There's much to be said for this, but it would be nice to have a more objective check. On the one hand, there will be many populations where there aren't handy native informants; on the other, people might misunderstand themselves, or simply have missed some important feature of their own organization, which could in fact be revealed by the lattice. Still, it's impressive how sensible the organization they get is, especially since the way they pick out concepts is, as they emphasize, quite simplistic, both logically and as a matter of language-processing. You could, of course, repeat this analysis on any population with partially shared concepts and terms: scientists, philosophers, writers for science fiction fanzines, other literary intellectuals, modernist poets, (other) rappers, UFO-fanciers, political pundits, or even bloggers. The results would be interesting, and probably often pretty amusing.
As Roth and Bourgine emphasize in their first paper, simply picking out what the communities are is merely a preliminary to figuring out how they work, and how they got that way --- to understanding dynamics and function. Presumably there are relationships between, say, epistemic community structure and the structure of the scientific collaboration network. (I've already written at length about the "cultural epidemiology" they mention, so I won't repeat myself other than to say "Right on!") But you have to know what the epistemic communities are before you can ask how they're related to other things, and this is the nicest way I've seen to do that.
Update, 22 July: M. Roth writes to point out that he has a new preprint, nlin.AO/0507021, looking at, inter alia, the evolution of the zebrafish epistemic community structure from 1997 to 2004. I've not had a chance to read it yet, but I'm looking forward to doing so.
Posted by crshalizi at July 19, 2005 13:45