
Evil in a Haystack

How do you find a terrorist hidden in millions of gigabytes of metadata?

The information collected includes records of every call placed on the Verizon communications network (and, it appears, every other U.S. phone carrier): the times, dates, and lengths of calls, and the phone numbers of the participants, but not the names associated with the accounts.

For some, the collection of these data represents a grave violation of the privacy of American citizens. For others, the privacy issue is negligible, so long as the collection helps keep us safe from terrorism.

There are indeed privacy issues at play here, but they aren’t necessarily the obvious ones. In order to put the most important questions into context, consider the following illustration of a metadata analysis using sample data derived from a real social network. The sample data isn’t derived from telephone records, but it’s close enough to give a sense of the analysis challenges and privacy issues in play.

While this example is relevant to what happens behind the NSA’s closed doors, it is not in any way intended to be a literal or accurate portrayal. Every effort was made to keep the example close to reality, but a number of hypotheticals and classified procedures mean the reality is somewhat different.

We start with a classic scenario. U.S. intelligence officials have captured an al Qaeda operative and obtained the phone number of an al Qaeda fundraiser in Yemen.

You are an analyst for a fictionalized version of the NSA, and you have been authorized to search through metadata in order to expose the fundraiser’s network, armed with only a single phone number as a starting point.

The first step is refreshingly simple: You type the fundraiser’s phone number into the metadata analysis software and click OK.

In our example data, the result is a list of 79 phone numbers that were involved in an incoming or outgoing call with the fundraiser’s phone within the last 30 days. The fundraiser is a covert operator and this phone is dedicated to covert activities, so almost anyone who calls the number is a high-value target right out of the gate.
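As a concrete, purely illustrative picture of that first query, here is roughly what pulling the first-degree contacts out of a pile of call records might look like. The CallRecord layout and field names below are hypothetical and are not drawn from any real NSA system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CallRecord:
    caller: str         # originating phone number
    callee: str         # receiving phone number
    start: datetime     # when the call began
    duration_s: int     # call length in seconds
    callee_region: str  # coarse location of the other participant

def first_degree_contacts(records, seed_number, window_days=30):
    """Return every number that called, or was called by, the seed
    number within the lookback window."""
    cutoff = datetime.now() - timedelta(days=window_days)
    contacts = set()
    for r in records:
        if r.start < cutoff:
            continue
        if r.caller == seed_number:
            contacts.add(r.callee)
        elif r.callee == seed_number:
            contacts.add(r.caller)
    return contacts
```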

Using the metadata, we can weight each phone number according to the number of calls it was involved in, the lengths of the calls, the location of the other participant, and the time of day the call was placed. Your NSA training manual claims these qualities help indicate the threat level of each participant. Your workstation renders these data as a graph. Each dot represents a phone number, and the size of the dot is bigger when the number scores higher on the “threat” calculus.
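As a rough illustration of that weighting, here is one way such a score could be computed over the same toy records. The features and weights are invented for this example; the article does not describe the NSA’s actual calculus.

```python
def threat_score(records, contact, seed_number, watch_regions=("YE",)):
    """Toy scoring: more calls, longer calls, odd-hour calls, and calls
    touching a watched region all push a contact's score up."""
    score = 0.0
    for r in records:
        if {r.caller, r.callee} != {contact, seed_number}:
            continue                       # only calls between these two
        score += 1.0                       # one point per call
        score += r.duration_s / 600.0      # longer calls weigh more
        if r.start.hour < 6:               # late-night calls weigh more
            score += 0.5
        if r.callee_region in watch_regions:
            score += 1.0                   # watched-region involvement
    return score
```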

This is already a significant intelligence windfall, and you’ve barely been at this for five minutes. But you can go back to the metadata and query which of these 79 people have been talking to each other in addition to talking to the fundraiser.
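That follow-up query amounts to looking for call records whose two endpoints both fall inside the contact set returned earlier; sketched against the same hypothetical records:

```python
def edges_among_contacts(records, contacts):
    """Return pairs of first-degree contacts that also called each other,
    exposing structure inside the fundraiser's network."""
    edges = set()
    for r in records:
        if r.caller in contacts and r.callee in contacts:
            edges.add(tuple(sorted((r.caller, r.callee))))
    return edges
```

Plotted as a graph, each contact becomes a node sized by its score, and each of these pairs becomes an edge between nodes.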

Foreign Policy asks some hard questions about how we use that data.

1. How much contact can an analyst have with a U.S. person’s data before it becomes a troublesome violation of privacy? Is it a violation to load a phone record into a graph if the analyst never looks at it individually? Is it a violation to look at the number individually if you don’t associate a name? Is it a violation to associate a name if you never take any additional investigative steps?

2. Metadata analysis is more accurate when the data is more complete. Should minimization practices filter metadata on American citizens out of the analysis altogether? What if that means targeting might be less accurate and, ironically, more likely to designate innocent people for more intrusive scrutiny?

3. What percentage of phone traffic to targeted numbers travels only on foreign carriers? Does the absence of those data skew analysis and possibly overemphasize the scoring of phone numbers used by American citizens?

4. On a fundamental level, are we willing to trust mathematical formulas and behavioral models to decide who should receive intrusive scrutiny?

5. Metadata analysis rarely deals in certainties; it almost always produces probabilities. What probability of evil intent should these models demonstrate before the government uses them to help justify a phone tap, or a house search, or a drone strike? 90 percent? 60 percent? Should we allow incremental collection of slightly more intrusive data if they can clarify a marginal case?

6. Have we tested our analytical math to see how accurate its predictions are relative to the actual content of calls? If so, how were these tests done? If not, are we willing to trust these models based on their success in other fields, or do they need to be tested specifically for counterterrorism?

7. If we believe the models do need to be tested for accuracy, are we willing to endure the privacy violations such tests would almost certainly entail? Will more accurate models lead to better privacy in the long run by reducing the number of innocent people subjected to more intrusive scrutiny?

8. Are we willing to trust the government to hold this data? Although the government says the data is currently used only for foreign counterterrorism, do we believe the president might not order the NSA to access it in the wake of a terrorist attack of domestic origin?

9. On a related note, what happens if the origin of an attack isn’t immediately clear, as in the Boston Marathon bombing? Should the NSA immediately begin a broad analysis of metadata and continue until it’s clear where the responsibility lies?

10. If we were to allow the use of this technology in domestic terrorism investigations, during a crisis or otherwise, how do we avoid collecting information on legal political dissent? For instance, targeting anarchists might inadvertently produce a list of influential leaders in the Occupy movement. Targeting militia groups might create a database of gun sellers. When you plunge into a huge dataset, you sometimes get insights you didn’t expect.

Ugh.

David Simon (creator of The Wire) on PRISM

It’s been happening for a long time.

Having labored as a police reporter in the days before the Patriot Act, I can assure all there has always been a stage before the wiretap, a preliminary process involving the capture, retention and analysis of raw data. It has been so for decades now in this country. The only thing new here, from a legal standpoint, is the scale on which the FBI and NSA are apparently attempting to cull anti-terrorism leads from that data. But the legal and moral principles? Same old stuff.

Allow for a comparable example, dating to the early 1980s in a place called Baltimore, Maryland.

There, city detectives once began to suspect that major traffickers were using a combination of public pay phones and digital pagers to communicate their business. And they took their suspicions to a judge and obtained court orders — not to monitor any particular suspect, but to instead cull the dialed numbers from the thousands and thousands of calls made to and from certain city pay phones.

Think about it. There is certainly a public expectation of privacy when you pick up a pay phone on the streets of Baltimore, is there not? And certainly, the detectives knew that many, many Baltimoreans were using those pay phones for legitimate telephonic communication. Yet, a city judge had no problem allowing them to place dialed-number recorders on as many pay phones as they felt the need to monitor, knowing that every single number dialed to or from those phones would be captured. So authorized, detectives gleaned the numbers of digital pagers and they began monitoring the incoming digitized numbers on those pagers — even though they had yet to learn to whom those pagers belonged. The judges were okay with that, too, and signed another order allowing the suspect pagers to be “cloned” by detectives, even though in some cases the suspect in possession of the pager was not yet positively identified.

All of that — even in the less fevered, pre-Patriot Act days of yore — was entirely legal. Why?

Because they aren’t listening to the calls.

Here is what does happen:

In Baltimore thirty years ago, after the detectives figured out which pay phones were dialing pagers, and then did all the requisite background checks and surveillance to identify the drug suspects, they finally went to a judge and asked for a wiretap on several pay phones. The judge looked at the police work and said, okay, you can record calls off those public pay phones, but only if you have someone watching the phones to ensure that your suspects are making the calls and not ordinary citizens. And if you make a mistake and record a non-drug-involved call, you will of course “minimize” the call and cease recording.

It was at that point — and not at the earlier stage of gathering thousands and thousands of dialed numbers and times of call — that the greatest balance was sought between investigative need and privacy rights. And in Baltimore, that wiretap case was made and the defendants caught and convicted, the case upheld on appeal. Here, too, the Verizon data corresponds to the sheets and sheets of printouts of calls from the Baltimore pay phones, obtainable with a court order and without any demonstration of probable cause against any specific individual. To get that far as a law-abiding investigator, you didn’t need to know a target, only that the electronic medium was being used for telephonic communication that is both illegal and legal. It’s at the point of actually identifying specific targets and then seeking to listen to the conversations of those targets that the rubber really hits the road.