Siris: Needle-Finding in Ever Larger Haystacks

Tuesday, July 10, 2007

Needle-Finding in Ever Larger Haystacks

Timothy Burke has an excellent post on Charles Stross's science fiction scenario of total historical record. Burke notes that the more information there is, the more difficult effective analysis and search are. The needles of relevant information stay just as small, but the haystacks of irrelevant information increase exponentially. That we will certainly get a few more needles doesn't really solve the problem. Burke notes it would take extremely sophisticated AI to get a handle on it (and that we cannot assume that such AI is inevitable). I would go farther and say that even that would not do it. What effectively would have to happen for AI to handle it would be that the AI would have to be doing historical work for us -- sorting, analyzing, interpreting, assessing relevance, evaluating data, creating an adequate and suitable topography of the evidence. Setting aside the difficulties of building an artificial historian that is actually competent and able to handle quantities of information daily far in excess of anything a human brain can handle in a lifetime, which for a scientifictional scenario I'm willing to concede, there are fundamental problems with saying that our historical work would improve because of it. For one thing, we'd face serious problems of assessment -- obviously, we can't just take the artificial historian's word for it, or we are conceding that human beings have no business doing historical work and should just defer to the program. But we can't handle the infinite piles of tedious detail; even the most thorough historian only skims through it and, with more art than science, grabs what seems best. In Stross's scenario, even with an AI capable of getting a handle on it all, the collapse of history is at least as plausible an outcome as its flourishing, and perhaps more so.

(I've been working on and off for a while on sketching out a science fiction story which looks at a similar issue, but with science itself, namely, looking at the question of what happens to society when scientific progress is taken out of our very inefficient human hands and put in the hands of infinitely more efficient and thorough machines. Will it be an era of progress, because of the vastly more effective and reliable scientific work being done, or an era of decline, because none of that progress is human progress? My thought is that in technological terms the progress would be swift and amazing; but since we'd know less and less about the technology we were using, this shows that technological progress is really a poor indicator of scientific progress, despite our tendency to conflate the two. Scientific progress, unlike technological progress, can't be had heteronomously. I think there are similar issues here. In Stross's scenario the technology of history, so to speak -- the evidential access on which it is built -- would skyrocket to mindblowing proportions. But this just shows that evidential access is an instrument for progress in historical scholarship, not that progress itself.)

In fact, however, the building of an artificial historian is not even a reasonable research project; there's no point to it, since it would be an immense amount of effort for relatively little result (as far as anyone's own historical inquiry goes). All we are ever likely to have are tools -- things like search engines and databases. Better search engines are nice, of course; I'll certainly take the best you can get. Ditto with databases. But we shouldn't have any illusions that they introduce a fundamental change; these things are the dishwashers and microwave ovens of historical scholarship. They save time, but not as much as one might have thought; and we just do the same things with them that we did without them. It's wonderful that a historian of medieval science does not always have to go and sit (as Pierre Duhem always did) in a library for hours on end and painstakingly copy Latin manuscripts by hand. But historians of medieval science still do in their own way what Duhem did, and whether they do it as well still depends on things other than technology.

The second, and related, point Burke makes is that total record doesn't actually relieve the problem it is supposed to relieve. The idea, according to Stross, is that "we've acquired bad behavioural habits - because we're used to forgetting things over time". But this would not actually change if the information available for access were to skyrocket, again because the haystack would expand so massively that we would likely be losing track of the needles even more than we do now. You remember that awesome Indiana Jones scene in which the Ark of the Covenant is hidden by being put into a crate which in turn is put into a vast anonymous warehouse filled to the brim with similar crates. What better way to hide it? There are two ways to lose a piece of paper: get rid of it entirely, or put it in a stack of papers so large that you'll never find it again. There's simple forgetting, and there's bureaucratic forgetting, and what Stross is advocating is not remembering but bureaucratic forgetting. Each happening gets recorded, filed, and put away. But so many things are recorded that there's always a danger that things will be as good as forgotten. One of the things that helps us to remember what we do is the fact that we forget the rest. It helps no one at all to increase storage if our ability to access it does not increase with it; so again we are at the point that historical memory cannot be achieved by brute force, by sheer intricacy of record. It can only be had by careful and critical selection.

The third excellent point Burke makes is that even in a manageable total record a veil remains in place. In effect, all that the record gathers is evidence. But what do you do with the evidence? What inferences can you (should you, must you) draw? As he puts it, "Knowing what people do doesn't relieve you of the extraordinary difficulties involved in knowing what it means that they do it." Memory is not mere storage; it is storage, access, interpretation, and synthesis. Just increasing storage doesn't help any.

One point Burke doesn't make that I think should be made is that Stross's scenario faces the same problem that we have with the record now. After all, there was total record of the past, when it occurred. It deteriorated, and began deteriorating as soon as it existed. Historical evidence is simply that total record itself remaining in fragments. All Stross's super-storage really does is increase the shelf-life of the past through a complicated set of back-up systems; it reduces reliance on human memory and transmission of memory alone. But it is still the past in record, and it still will deteriorate, and it still will do so at prodigious speed. The nice thing about it would be that if one part deteriorated we'd have some redundancy to reduce the chances of total loss; and the redundancy can in some cases be made fairly durable. But it cannot be made infinitely so, and some of these systems are less durable than they seem. Two hundred years from now I very much doubt that anything more than a minute selection of the blogosphere will even exist any more, Wayback Machine notwithstanding. This minute slice still may be a huge wealth of information -- but most of what exists now will be completely lost. This scenario will not change under any circumstances; we can delay and defer the end, but sooner or later information, like people, begins to die. Perhaps to some extent this counterbalances the other problems; but to the extent it does, the future looks less like Stross's vision and more like business as usual.

A further point that I think worth making is that, even if it all came to fruition, and all the problems with the total record itself were avoided, there would be new problems for historians sorting through a total record of a society trying to handle a total record of a society with a total record of a previous society. It all seems rather unstable. But that's just an idea.
