Surveillance

No technological replacement exists for bulk data collection: Report

Published 16 January 2015

No software-based technique can fully replace the bulk collection of signals intelligence, but methods can be developed to conduct targeted collection more effectively and to control the usage of collected data, says a new report from the National Research Council. Automated systems for isolating collected data, restricting queries that can be made against those data, and auditing usage of the data can help to enforce privacy protections and allay some civil liberties concerns, the unclassified report says.

An NRC release reports that the study resulted from an activity called for in Presidential Policy Directive 28, issued by President Obama in January 2014, to evaluate U.S. signals intelligence practices. The directive instructed the Office of the Director of National Intelligence (ODNI) to produce a report within one year “assessing the feasibility of creating software that would allow the intelligence community more easily to conduct targeted information acquisition rather than bulk collection.” ODNI asked the Research Council, the operating arm of the National Academy of Sciences and National Academy of Engineering, to conduct a study to assist in preparing a response to the President. The study began in June 2014, and over the ensuing months a committee of experts appointed by the Research Council produced the report.

“From a technological standpoint, curtailing bulk data collection means analysts will be deprived of some information,” said committee chairman Robert F. Sproull, former director of Oracle’s Sun Labs. “It does not necessarily mean that current bulk collection must continue. A reduction in bulk collection can be partially mitigated by improving targeted collection, and technologies can improve oversight and transparency and help reduce the conflict between collection and privacy.”

The report defines “collection” as the process of extracting data from a source, filtering it according to some criteria, and storing the results. If a significant portion of the collected data is not associated with current targets or subjects of interest in an investigation, it is considered bulk; otherwise, it is targeted.
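The definition above can be sketched in a few lines of code. This is only an illustration, not anything taken from the report: the `Record` type, the `collect` and `classify` helpers, and especially the 0.5 threshold standing in for "a significant portion" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str  # hypothetical selector, such as an account or phone number
    payload: str


def collect(source, criteria):
    """Extract records from a source, filter them, and return what is stored."""
    return [record for record in source if criteria(record)]


def classify(stored, targets, threshold=0.5):
    """Label a collection 'bulk' or 'targeted'.

    threshold is an assumed stand-in for the report's "significant portion";
    the report does not give a number.
    """
    if not stored:
        return "targeted"  # nothing unrelated to targets was stored
    unrelated = sum(1 for record in stored if record.identifier not in targets)
    return "bulk" if unrelated / len(stored) >= threshold else "targeted"
```

Under these assumptions, a filter that accepts every record yields a collection labeled bulk, while a filter that accepts only known targets yields one labeled targeted.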

A key value of bulk collection is its record of past signals intelligence that may be relevant to subsequent investigations, the report notes. The committee was not asked to and did not consider whether the loss of effectiveness from reducing bulk collection would be too great, or whether the potential gain in privacy from adopting an alternative collection method is worth the potential loss of intelligence information. It did observe that other sources of information — for example, data held by third parties such as communications providers — might provide a partial substitute for bulk collection in some circumstances.

Improving the relevance of collected information to future investigations could also be achieved with new approaches to targeting, the report says. Rapidly updating filtering criteria to include new targets as they are discovered will help collect data that would otherwise be lost, and if done quickly enough and well enough, bulk information about past events may not be needed. However, targeted collection cannot substitute for bulk collection if past events were unique or if the delay in collecting the new information is too long.
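The trade-off in this paragraph can be shown with a toy model. The `TargetedCollector` class below is hypothetical and greatly simplified: it stores an observation only if its identifier is already on the target list, so anything seen before a target is added is lost.

```python
class TargetedCollector:
    """Toy collector that keeps only observations matching current targets."""

    def __init__(self, targets):
        self.targets = set(targets)
        self.stored = []

    def add_target(self, identifier):
        # Updating the filtering criteria as new targets are discovered.
        self.targets.add(identifier)

    def observe(self, identifier, payload):
        # Returns True if the observation was stored, False if it was dropped.
        if identifier in self.targets:
            self.stored.append((identifier, payload))
            return True
        return False


collector = TargetedCollector({"a"})
collector.observe("b", "early message")  # dropped: "b" is not yet a target
collector.add_target("b")
collector.observe("b", "later message")  # stored
```

In this sketch the early message cannot be recovered once "b" becomes a target, which is the gap that a record of past data would otherwise fill.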

As an alternative to controlling the collection of data, automated controls on the use of collected data can help to protect the privacy of people who are not subjects of investigation, the committee found. The report describes three key technical elements required to control and automate usage: isolating bulk data so that it can be accessed only in specific ways; restricting the types of queries that can be made against stored data; and auditing the queries that have been done. The way these controls work can be made public without revealing sensitive data, so that outside inspectors can verify that the intelligence community has and abides by adequate procedures to protect privacy.
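As a rough sketch of how the three elements fit together, the hypothetical `ControlledStore` class below keeps records behind a single entry point, accepts only query types named on an approved list, and logs every attempt. The class, its method names, and the log fields are assumptions for illustration. A real system would need enforcement much stronger than Python's naming conventions, which do not actually prevent direct access.

```python
import datetime


class ControlledStore:
    """Holds collected records and exposes only approved, audited queries."""

    def __init__(self, records, allowed_queries):
        self._records = list(records)          # isolation: reached only via query()
        self._allowed = dict(allowed_queries)  # restriction: name -> function(records, arg)
        self._audit_log = []                   # auditing: one entry per query attempt

    def query(self, analyst, name, arg):
        permitted = name in self._allowed
        # Log the attempt before running it, so refused queries are recorded too.
        self._audit_log.append({
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "analyst": analyst,
            "query": name,
            "arg": arg,
            "permitted": permitted,
        })
        if not permitted:
            raise PermissionError(f"query type {name!r} is not approved")
        return self._allowed[name](self._records, arg)

    def audit_trail(self):
        # Return a copy so callers cannot alter the log itself.
        return list(self._audit_log)
```

The list of approved query types and the format of the log could be published without exposing the records themselves, which matches the kind of outside verification the committee describes.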

Some of the technologies needed to enhance targeted collection or improve automated usage controls require further research and development. Others are already in use in the intelligence community or in private companies, some have been demonstrated in research laboratories, and many are feasible to deploy within the next five years, the report says, although it does not recommend adoption of any specific technology. Automating usage controls will be easier if the rules governing collection and use are technology-neutral and based on a consistent set of definitions.

Ultimately, the decision to deploy any given technology is a policy question that requires determining whether increased effectiveness and apparent transparency are worth the cost in equipment, labor, and potential interference with the intelligence mission. Such discussions were beyond the scope of this report.

The study was sponsored by the Office of the Director of National Intelligence.

— Read more in Bulk Collection of Signals Intelligence: Technical Options (National Academies Press, 2015)