What's in a word... Computers to tell fact from opinion in newspaper articles

Published 25 September 2006

Inquiring minds want to know whether what people say about the U.S. is based on fact or is mere opinion; DHS has allocated $2.4 million to a consortium of three universities to develop machine-learning algorithms that will help computers extract information, including opinions, from text more effectively and accurately

We are big on dual-use technologies, and here is a story about a technology that may be used to enhance U.S. national security, but also to ferret out truth from fiction during this political season. It all began with a question about what the foreign press was saying about President George Bush. Yes, much of what was said was not exactly flattering, but more importantly, how much of it was based on fact, and how much on the writer's or the newspaper's own views? How could we tell?

A program led by a Cornell computer scientist, developed in collaboration with colleagues at the University of Pittsburgh and the University of Utah, aims to teach computers to scan text and sort opinion from fact. The research is funded by DHS, which has designated the three-university consortium as one of four University Affiliate Centers (UACs) to conduct research on advanced methods of information analysis and to develop computational technologies that could contribute to national security. Cornell will receive $850,000 of the $2.4 million in funding provided to the consortium over three years. Claire Cardie, Cornell professor of computer science and one of the three co-principal investigators on the grant, said: "Lots of work has been done on extracting factual information: the who, what, where, when. We're interested in seeing how we would extract information about opinions."

Cardie specializes in information extraction, the process by which computers scan text to find meaning in natural language. This is no mean feat: computers are very literal and demand that information be presented according to rigid rules. The programmers' task is thus to bridge the gap between the rigid computer brain and the more agile, reflective human brain; more specifically, they must find a way for the computer to identify subjects, objects, and other key parts of sentences in order to determine meaning. The new research will use machine-learning algorithms that give computers examples of text expressing both fact and opinion and teach them to tell the difference, along the lines of the sketch below. The work will also seek to determine the sources of information cited by a writer. The UAC also has educational goals, training students to work in information extraction, presenting seminars and workshops for other researchers, and offering summer seminars for women and underrepresented minority undergraduates.
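To make the idea of learning from labeled examples concrete, here is a minimal sketch of a supervised fact-versus-opinion classifier in Python. It is not the consortium's actual system: the tiny hand-labeled training sentences, the bag-of-words features, and the choice of a naive Bayes model are all illustrative assumptions.

    # A minimal sketch of supervised fact-vs-opinion classification, not the
    # consortium's actual system. The toy training data and the naive Bayes
    # model over word-count features are illustrative assumptions.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Hand-labeled example sentences: 0 = fact, 1 = opinion (hypothetical data).
    sentences = [
        "The bill was signed into law on Tuesday.",
        "The committee met for three hours on Monday.",
        "The president's policy is a disaster for the economy.",
        "This decision reflects poor judgment by the administration.",
    ]
    labels = [0, 0, 1, 1]

    # Turn each sentence into word-count features, then fit the classifier.
    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(sentences)
    model = MultinomialNB().fit(features, labels)

    # Classify a sentence the model has never seen.
    test = vectorizer.transform(["The senator voted against the measure."])
    print("opinion" if model.predict(test)[0] == 1 else "fact")

A real research system would train on thousands of annotated sentences and use much richer linguistic features, but the workflow is the same: label examples, extract features, fit a model, and apply it to new text.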

Cardie says that DHS established the UACs partly because it lacks sufficient in-house expertise in natural-language processing. Although the research may conjure fears about invasions of privacy, Cardie says she will work only with publicly available material, primarily news reports and editorials from English-language newspapers worldwide. The results will always include pointers to the original sources, so that when a computer draws a conclusion, human beings can look at the original material and determine whether the conclusion is correct.
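One way to picture that provenance requirement is a record that carries its source along with the extracted conclusion. The sketch below is hypothetical; the field names and example values are assumptions, not the project's actual output format.

    # A hypothetical record for one extracted opinion, illustrating the idea
    # that every automated conclusion points back to its source text.
    # Field names and example values are assumptions, not the project's schema.
    from dataclasses import dataclass

    @dataclass
    class ExtractedOpinion:
        claim: str            # the opinion the system attributed to someone
        holder: str           # who expressed it, as identified in the text
        source_url: str       # pointer to the original article
        source_sentence: str  # the exact sentence, so a human can verify it

    record = ExtractedOpinion(
        claim="the policy harms the economy",
        holder="an unnamed editorial writer",
        source_url="http://example.com/editorial-2006-09-25",
        source_sentence="The policy, the editorial argued, harms the economy.",
    )
    print(record.source_url)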

In addition to Cardie, the project's leaders include Janyce Wiebe, associate professor of computer science at the University of Pittsburgh, and Ellen Riloff, associate professor of computer science at the University of Utah.

- Read more in this Technologynewsdaily report