New search tools help separate the wheat from the (data) chaff

Published 20 February 2008

If there is a problem worse than having too little information, it is having too much of it. Three new tools developed by researchers at a German institute help cope with this wheat-from-the-chaff problem.

We are drowning in information. Databases, text archives, corporate networks, and the Web contain mountains of information. Filtering important data or even new findings out of these masses is difficult. Semantic techniques and visual analyses can help maintain a clear overview in the data jungle. Fraunhofer researchers will be presenting several new search techniques at CeBIT in Hanover from 4 March to 9 March.

You don’t have to know everything; you just need to know where to find it, as the popular saying goes. This, however, is becoming increasingly difficult in today’s age of digital information. Knowledge is now distributed worldwide, so how can one find the information one seeks? Searching the Internet or a corporate network often provides little help. One either receives thousands of hits or none at all. The problem is that search programs can only understand individual terms, and cannot grasp the relationship between different words, let alone the meaning, that is, the semantics, of whole sentences. If you enter the word “Web” into Google, Yahoo, or MSN, it makes no difference whether you mean a spider’s web, the World Wide Web, or a woven fabric. Here are three innovative solutions to the increasingly daunting wheat-from-the-chaff problem:

First, Wikinger. In the collaborative Wikinger project, which is being led by the German Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), computer scientists, engineers, and historians are working together to give search applications a better understanding of textual content. They are using techniques for obtaining knowledge through interrelationships within a document — such as the knowledge that the term “Web” can be associated with “new media,” “nature,” or “textiles.” The knowledge platform can then semi-automatically develop semantic networks of its own accord, making it easier for users to search for specific information. “This technology is suitable for searching text archives such as those belonging to newspaper publishers,” explains IAIS project manager Lars Bröcker. “It is also ideal for any tasks that require searching large databases or linking multimedia-based data in order to obtain new, additional information.”
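
To make the “Web” example concrete, here is a minimal sketch of how associated terms can help pick the intended meaning of an ambiguous word. It is not the Wikinger implementation, which the article does not describe in detail. The `SENSES` table, its word lists, and the `disambiguate` function are all hypothetical and hand-written for illustration, and the scoring is a simple count of overlapping words.

```python
# Hypothetical, hand-written sense inventory (illustration only, not Wikinger data).
# Each ambiguous term maps to sense labels, and each label to words that
# tend to appear near the term when that sense is meant.
SENSES = {
    "web": {
        "new media": {"internet", "browser", "site", "online", "page"},
        "nature": {"spider", "silk", "insect", "garden"},
        "textiles": {"fabric", "woven", "loom", "thread"},
    }
}


def disambiguate(term, context):
    """Return the sense label whose related words overlap most with the
    context words, or None if the term is unknown or nothing overlaps."""
    senses = SENSES.get(term.lower(), {})
    context_words = {word.lower() for word in context}
    best_label, best_score = None, 0
    for label, related in senses.items():
        score = len(related & context_words)  # number of shared words
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    print(disambiguate("Web", ["spider", "garden"]))
    print(disambiguate("Web", ["woven", "fabric"]))
```

A search front end built along these lines could use the chosen label to filter or group hits. In a real system the associations would be derived from the documents themselves rather than typed in by hand.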

There is also the ConWeaver search engine (with the catchy slogan “Vernetzen Sie Ihr Know-how!”, or “Network your know-how!”), which was developed by a working group led by Dr. Thomas Kamps at the Fraunhofer Institute for Computer Graphics Research (IGD) in Darmstadt and is tailored to the problems of large and medium-sized companies. In many businesses, staff waste valuable