Part III: Designing Interaction Details in Java

Figure 15-2 LibraryThing is a Web application that allows users to catalog their own book collections online with a tag-based system. The universe of tags applied to all the books in all the collections has become a democratic organizational scheme based upon the way the user community describes things.
Relational Databases versus Digital Soup
Software that uses database technology typically makes two simple demands of its users: First, users must define the form of the data in advance; second, users must then conform to that definition. There are also two facts about human users of software: First, they rarely can express what they are going to want in advance, and second, even if they could express their specific needs, more often than not they change their minds.
Organizing the unorganizable
Living in the Internet age, we find ourselves more and more frequently confronting information systems that fail the relational database litmus: We can neither define information in advance, nor can we reliably stick to any definition we might conjure up. In particular, the two most common components of the Internet exemplify this dilemma.
The first is electronic mail. Whereas a record in a database has a specific identity, and thus belongs in a table of objects of the same type, an e-mail message doesn't fit this paradigm very well. We can divide our e-mail into incoming and outgoing, but that doesn't help us much. For example, if you receive a piece of e-mail from Jerry about Sally, regarding the Ajax Project and how it relates to Jones Consulting and your joint presentation at the board meeting, you can file this away in the Jerry folder, or the Sally folder, or the Ajax folder, but what you really want is to file it in all of them. In six months, you might try to find this message for any number of unpredictable reasons, and you'll want to be able to find it, regardless of your reason.
Second, consider the Web. Like an infinite, chaotic, redundant, unsupervised hard drive, the Web defies structure. Enormous quantities of information are available on the Internet, but its sheer quantity and heterogeneity almost guarantee that no regular system could ever be imposed on it. Even if the Web could be organized, the method would likely have to exist on the outside, because its contents are owned by millions of individuals, none of whom are subject to any authority. Unlike records in a database, we cannot expect to find a predictable identifying mark in a record on the Internet.
Problems with databases
There's a further problem with databases: All database records are of a single, predefined type, and all instances of a record type are grouped together. A record may represent an invoice or a customer, but it never represents an invoice and a customer. Similarly, a field within a record may be a name or a social security number, but it is never a name and a social security number. This is the fundamental concept underlying all databases; it serves the vital purpose of allowing us to impose order on our storage system. Unfortunately, it fails miserably to address the realities of retrieval for our e-mail problem: It is not enough that the e-mail from Jerry is a record of type e-mail. Somehow, we must also identify it as a record of type Jerry, type Sally, type Ajax, type Jones Consulting, and type Board Meeting. We must also be able to add and change its identity at will, even after the record has been stored away. What's more, a record of type Ajax may refer to documents other than e-mail messages, such as a project plan. Because the record format is unpredictable, the value that identifies the record as pertaining to Ajax cannot be stored reliably within the record itself. This is in direct contradiction to the way databases work.
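To make the contrast concrete, here is a minimal Java sketch of one way identities could be kept outside the records they describe. It is an illustration only, not a description of how any particular product works. The class name TagIndex, its methods tag, untag, and find, and the item identifiers are all hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: tags live in an index outside the items themselves,
 * so an item of any type can carry many identities that change over time.
 */
class TagIndex {
    // tag -> ids of every item carrying that tag (TreeSet keeps output sorted)
    private final Map<String, Set<String>> itemsByTag = new HashMap<>();

    /** Attach one or more tags to an item, at any time after it is stored. */
    public void tag(String itemId, String... tags) {
        for (String t : tags) {
            itemsByTag.computeIfAbsent(t, k -> new TreeSet<>()).add(itemId);
        }
    }

    /** Remove a tag from an item; the item itself is never modified. */
    public void untag(String itemId, String tag) {
        Set<String> ids = itemsByTag.get(tag);
        if (ids != null) {
            ids.remove(itemId);
            if (ids.isEmpty()) {
                itemsByTag.remove(tag);
            }
        }
    }

    /** Return a copy of every item id filed under the tag, whatever its type. */
    public Set<String> find(String tag) {
        Set<String> result = new TreeSet<>();
        Set<String> ids = itemsByTag.get(tag);
        if (ids != null) {
            result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        TagIndex index = new TagIndex();
        // The e-mail from Jerry is filed under all of its identities at once.
        index.tag("email:jerry-msg", "Jerry", "Sally", "Ajax",
                  "Jones Consulting", "Board Meeting");
        // A different kind of document can share the Ajax identity.
        index.tag("doc:project-plan", "Ajax");

        System.out.println(index.find("Ajax"));  // [doc:project-plan, email:jerry-msg]
        index.untag("email:jerry-msg", "Sally"); // identity changed after storage
        System.out.println(index.find("Sally")); // []
    }
}
```

Because the index refers to items only by identifier, it makes no assumption about an item's format. That is the property the passage argues a conventional record type cannot provide.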
