Data mining
is becoming an important IT tool for the intelligence community. It combines
statistical models, powerful processors, and artificial intelligence to find valuable
information that can be buried in large amounts of data. Retailers have relied
on data mining to understand and predict the purchasing habits of customers,
while credit card companies have relied on data mining to detect fraud. After 9/11,
the U.S. government concluded that data mining could be a valuable tool
for preventing future terrorist attacks.
There are two basic types of data mining:
subject-based and pattern-based. A subject-based data mining application could
be used to retrieve data that could help an agency analyst follow a particular
lead. Pattern-based data mining, or link analysis, can be used to look for suspicious
behaviors through nonobvious associations or relationships between seemingly
unconnected people or activities. For example, a pattern-based data mining
analysis could identify two terrorists who use the same credit card to book a
flight or who share the same address.
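To make the distinction between the two types concrete, here is a minimal Python sketch over a handful of hypothetical booking records (all names, card numbers, and addresses are invented for illustration). The subject-based query pulls everything about one known lead; the pattern-based pass groups records by shared attribute value and reports every pair of people linked through a common credit card or address. It is a toy illustration of the idea, not a description of any agency system.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical booking records: (person, attribute type, attribute value).
records = [
    ("Person A", "credit_card", "card-1111"),
    ("Person B", "credit_card", "card-1111"),
    ("Person C", "address", "12 Elm St"),
    ("Person D", "address", "12 Elm St"),
    ("Person E", "credit_card", "card-2222"),
]


def subject_lookup(records, person):
    """Subject-based: retrieve every record about one person an analyst is already following."""
    return [r for r in records if r[0] == person]


def shared_attribute_links(records):
    """Pattern-based (link analysis): pair up distinct people who share an attribute value."""
    people_by_value = defaultdict(set)
    for person, attr_type, value in records:
        people_by_value[(attr_type, value)].add(person)

    links = []
    for (attr_type, value), people in people_by_value.items():
        # Every pair of people sharing this card or address is a candidate link.
        for a, b in combinations(sorted(people), 2):
            links.append((a, b, attr_type, value))
    return links


print(subject_lookup(records, "Person E"))
for link in shared_attribute_links(records):
    print(link)
```

Person E, whose card appears only once, produces no link: the pattern-based pass surfaces relationships rather than individuals, and each link is only a lead for an analyst to check.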
Pressure to prevent another catastrophic
terrorist attack has led to a proliferation of data mining projects. A 2004
report by the General Accounting Office (now the Government Accountability Office, GAO) found that federal
agencies were engaged in or planning almost 200 data mining projects. Robert
Popp, former deputy director of the Information Awareness Office at the Defense
Advanced Research Projects Agency, says, “There is a real fear of not going down this
path, because if there is value you don’t want to be on the side that opposed
[a data mining project].” It comes as no surprise that agency heads have been
approving data mining projects almost as fast as they are conceived. However,
such media outlets as The New York Times and USA Today have
uncovered top secret programs that collect and look for patterns in phone records,
emails, and other personal information. Although many government officials and
politicians defend these programs as critical to the war on terror, a growing
number of people have expressed concerns about privacy.
A number of experts are questioning whether an
IT strategy with no clear goals and unlimited scope, budget, and schedule will
best serve its ends. Given the government’s poor track record for IT projects,
many people are concerned that projects could drag on for years and that good projects
could be overlooked because some bad projects raise serious privacy and
civil liberties issues. IT projects, no matter how vital, tend to experience
serious problems when controls are nonexistent or fall by the wayside as
organizations face a crisis. This is a problem that all organizations face, and
it can lead to overly ambitious projects, an unwillingness to change the
original vision, and a failure to notice signs that something is not working. Moreover,
some experts believe that the government’s eagerness to apply IT to
antiterrorism could backfire and disrupt the crime-fighting process if users
view the system as an obstacle to getting their work done; such users may rebel or
simply not use it.
According to Steve Cooper, former CIO of the
Department of Homeland Security, “No one [in the government] has looked at data
mining from an IT value perspective. I couldn’t figure out [the value of data
mining] when I was in DHS, and I can’t figure it out now. But that didn’t stop
us from using it.” In short, no one had prepared a business case to determine
whether the government was getting any return on its investment; there was only the
rationalization that a project would be worth the investment if it could catch
just one terrorist.
However, a number of projects have gotten the
ax. For example, Congress pulled the plug on a project to create a large
database that would include everything and anything that could identify a
terrorist. Moreover, after 9/11 the government decided to replace the Computer
Assisted Passenger Pre-Screening System (CAPPS), which focused on passenger
information (names, credit card numbers, addresses) collected by the airlines,
with CAPPS II, which would also include information purchased from data brokers
such as ChoicePoint and LexisNexis. In 2003, controversy erupted when
Northwest Airlines and JetBlue gave passenger information to the Transportation
Security Administration (TSA) to test the new system. Critics’ outcry that
privacy safeguards were virtually nonexistent led Congress to withhold funds
for CAPPS II until a GAO study could determine how the TSA would protect
people’s privacy. After spending over $100 million on CAPPS II, the TSA
cancelled the project in 2004 and proposed a new
system called Secure Flight. This new system was very similar to its
predecessor, CAPPS II, in that both systems would combine passenger information
with purchased information from commercial databases.
In 2005, a group of data mining and privacy
experts, the Secure Flight Working Group, was asked to review the
project. After nine months they submitted a confidential report that became
available on the Internet within a week. The highly critical report
stated, “First and foremost, TSA has not articulated what the specific goals of
Secure Flight are.” It also stated, “Based on the limited test
results presented to us, we cannot assess whether even the general goal of
evaluating passengers for the risk they represent to aviation security is a
realistic or feasible one or how TSA proposes to achieve it.”
According to Jim Dempsey, policy director of
the Center for Democracy and Technology and a member of the Secure Flight
Working Group, “TSA was never willing to reevaluate the scope of the project.
So now, five years after 9/11, we still don’t have an automated system for
matching passenger names with names on the terror watch list. Civil liberties
had nothing to do with that.”
Bruce Schneier, a security expert and another
member of the working group, views CAPPS II and Secure Flight as examples that
show how a poor understanding of what the systems must achieve can damage
antiterror IT efforts. Schneier argues that even if a data mining system could be
developed to scour through phone records or credit card transactions and
identify terrorists with 99 percent accuracy, it still would not be of much use
to investigators. More specifically, if 300 million Americans make just 10
phone calls or other identifiable transactions per day, that would produce over
1 trillion pieces of data each year that the government would have to mine.
Even with a 99 percent accuracy rate, about 1 percent of those transactions, or some
10 billion a year, would be false positives, about 27 million a day. And the system
would still miss some of the transactions that terrorists actually make. It came as no surprise to Schneier when The
New York Times reported that hundreds of FBI agents were looking into
thousands of data mining leads each month, with just about all of them turning
out to be dead ends.
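Schneier’s numbers can be checked with a few lines of arithmetic. The sketch below reads “99 percent accuracy” as a 1 percent false-positive rate applied to transactions that are almost all innocent; that reading is an assumption, since the text does not define accuracy more precisely. Integer arithmetic is used so the printed figures are exact.

```python
# Back-of-the-envelope base-rate arithmetic for the phone-record example.
# Assumption: "99 percent accuracy" means 1 in 100 innocent transactions is
# wrongly flagged (a 1 percent false-positive rate).

POPULATION = 300_000_000      # people
TRANSACTIONS_PER_DAY = 10     # phone calls or other identifiable transactions per person
DAYS_PER_YEAR = 365
FALSE_POSITIVES_PER_100 = 1   # 100 minus 99 percent accuracy


def yearly_transactions(population: int, per_day: int, days: int = DAYS_PER_YEAR) -> int:
    """Total transactions the government would have to mine in one year."""
    return population * per_day * days


def false_positives(transactions: int, per_100: int = FALSE_POSITIVES_PER_100) -> int:
    """Innocent transactions wrongly flagged, using integer math to avoid float rounding."""
    return transactions * per_100 // 100


events = yearly_transactions(POPULATION, TRANSACTIONS_PER_DAY)
fp_year = false_positives(events)

print(f"transactions per year:    {events:,}")                    # 1,095,000,000,000
print(f"false positives per year: {fp_year:,}")                   # 10,950,000,000
print(f"false positives per day:  {fp_year // DAYS_PER_YEAR:,}")  # 30,000,000

# Rounding the yearly total down to an even 1 trillion:
rounded_fp_day = false_positives(1_000_000_000_000) // DAYS_PER_YEAR
print(f"with 1 trillion/year:     {rounded_fp_day:,} per day")    # 27,397,260
```

Either way the result is tens of millions of leads a day, which is the heart of the argument: when nearly every transaction is innocent, even a small error rate swamps the few genuine hits.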
Despite the failures of CAPPS II, there is
still a belief that data mining can be an effective tool against terrorism. One
antiterrorism data mining application that has been deemed successful is a link analysis
system that has been used by investigators at Guantanamo Bay to determine which
detainees were likely terrorists. The Army’s Criminal Investigative Task Force
(CITF) used a commercially available tool and reliable data about detainees,
such as where they were captured, whom they associated with, and other details
about their relationships and behaviors, to construct a chart of all the
detainees. Using Proximity, a system developed by the University
of Massachusetts, the CITF was able to calculate a probability that a given
detainee was a terrorist or just a person in the wrong place at the wrong time.
The Guantanamo system was viewed as having a
high accuracy rate because it had a limited scope and reliable data that was
gathered by human investigators. It was a specific application used to solve a
specific problem. Valdis Krebs, an IT consultant who developed a map connecting
the 9/11 hijackers (after the fact), says that link analysis projects are
useful only if they have a narrow scope. According to Krebs, “If you’re just
looking at the ocean, you’ll find a lot of fish that look different. Are they
terrorists or just some species you don’t know about? If the government
searched for only the activities mentioned above—emails, checks and plane
tickets—without the added insight that one of the network’s members was a
terrorist, investigators would be more likely to uncover a high school reunion
than a terrorist plot.”
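Krebs’s point about narrow scope can be illustrated with a seeded search: start from one person already known to be of interest and examine only the people within a few links of that person. The sketch below is a plain breadth-first search over a hypothetical contact graph; the node names, the graph, and the `max_hops` limit are illustrative assumptions, not a description of Proximity or any other tool mentioned here.

```python
from collections import deque

# Hypothetical contact graph: an edge means two people exchanged emails,
# wrote checks to each other, or bought plane tickets together.
contacts = {
    "known_suspect": {"X", "Y"},
    "X": {"known_suspect", "Z"},
    "Y": {"known_suspect"},
    "Z": {"X", "W"},
    "W": {"Z"},
    "reunion_1": {"reunion_2"},  # an unrelated cluster, e.g. old classmates
    "reunion_2": {"reunion_1"},
}


def neighborhood(graph, seed, max_hops):
    """Breadth-first search from seed; returns each reachable node within
    max_hops links, mapped to its shortest distance from the seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the scope limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


print(neighborhood(contacts, "known_suspect", 2))
```

Because the search starts from a known seed, the unrelated cluster is never examined at all, whereas an unseeded scan of the whole graph would treat both clusters as equally suspicious.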