Potentially opening the door to large-scale identity theft

Jul 7, 2009 13:14 GMT

Researchers from Carnegie Mellon University have studied the predictability of Social Security Numbers (SSNs) and discovered that they can be determined fairly easily from publicly available data. The researchers urge private-sector companies to stop using SSNs for identity verification, a purpose for which they were never intended.

Social Security account numbers were first introduced in 1936 as a means of tracking people's income. In 1943, in order to avoid confusion and keep a level of consistency across all future systems that required tracking individuals, the government mandated that these numbers be used for all of them. This included drivers' licenses, voter registration, tax collection, and so on.

In time, companies from the private sector also adopted the use of SSNs to track individuals, particularly after 1989, when these numbers started being issued at birth. The problem is that companies, particularly from the financial sector, took SSNs one step further and started using them for identity verification, which eventually opened the way to abuse, or what is today commonly known as identity theft.

For their study, the Carnegie Mellon researchers used data obtained from the Social Security Administration's Death Master File, commercial databases and online social networks. "In a world of wired consumers, it is possible to combine information from multiple sources to infer data that is more personal and sensitive than any single piece of original information alone," noted Alessandro Acquisti, associate professor of information technology and public policy at Carnegie Mellon's H. John Heinz III College and the project's lead.

By using the Death Master File, the researchers succeeded in getting, in a single try, the first five digits of SSNs belonging to 44 percent of the individuals born after 1988. This is because the official algorithm used to generate this first portion of someone's SSN is directly tied to the date and place of their birth. This does not hold for people born in 1988 or earlier, so for SSNs belonging to these individuals the researchers had only a seven percent success rate on a single try.
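To make "the first five digits" concrete: an SSN is conventionally written as AAA-GG-SSSS, with a three-digit area number, a two-digit group number and a four-digit serial number. The short Python sketch below only splits a number into those parts. It is an illustration, not the researchers' prediction method, which they have not published in full. The names `SsnParts` and `split_ssn` are hypothetical, and the sample number is made up.

```python
import re
from typing import NamedTuple


class SsnParts(NamedTuple):
    """Hypothetical container for the three conventional SSN fields."""
    area: str    # first three digits
    group: str   # next two digits
    serial: str  # last four digits


def split_ssn(ssn: str) -> SsnParts:
    """Split an SSN string such as '123-45-6789' into its three fields."""
    digits = re.sub(r"\D", "", ssn)  # drop hyphens, spaces and other non-digits
    if len(digits) != 9:
        raise ValueError("an SSN has exactly nine digits")
    return SsnParts(digits[:3], digits[3:5], digits[5:])


# Made-up sample value, used for illustration only.
parts = split_ssn("123-45-6789")
first_five = parts.area + parts.group  # the portion the study predicted first
print(first_five, parts.serial)
```

The area and group numbers together form the five-digit prefix discussed above, and the serial number is the four-digit remainder.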

For 8.5 percent of the 44 percent of individuals born after 1988 whose first five digits were obtained in a single attempt, fewer than 1,000 attempts were required to get the remaining four digits. Additionally, smaller states and more recent years translated into a higher rate of success. For example, fewer than ten attempts were needed to determine one in twenty SSNs issued in Delaware in 1996.

Knowing the first five digits of SSNs is still a big advantage for identity thieves, who can brute-force the rest or use them to increase the credibility of phishing attacks. The researchers maintain that, "If one can successfully identify all nine digits of an SSN in fewer than 10, 100 or even 1,000 attempts, that Social Security number is no more secure than a three-digit PIN."
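The comparison with a three-digit PIN comes down to simple counting. The sketch below assumes, purely for illustration, that every unknown digit is equally likely to be any value from 0 to 9. The function name `candidates` is hypothetical.

```python
def candidates(unknown_digits: int) -> int:
    """Number of possible values for a run of unknown decimal digits."""
    return 10 ** unknown_digits


full_ssn = candidates(9)      # nothing known about the nine digits
after_prefix = candidates(4)  # first five digits known, four left to guess
three_digit_pin = candidates(3)

print(f"full SSN:            {full_ssn:>13,} candidates")
print(f"first five known:    {after_prefix:>13,} candidates")
print(f"three-digit PIN:     {three_digit_pin:>13,} candidates")
```

Under that assumption, knowing the first five digits shrinks the search from one billion candidates to ten thousand. An SSN that can be pinned down in fewer than 1,000 guesses offers no more protection than the 1,000 possible values of a three-digit PIN.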

Identity theft can occur when an attacker only knows the date of birth, easily obtainable from social networking websites, and the SSN, demonstrated now to be predictable. "Mounting empirical evidence suggests, in fact, that providing an SSN and a date of birth which match that SSN is sufficient to create new fraudulent accounts, even when the name associated with that SSN did not match, or the address was wrong, or even […] some of the submitted SSN digits were wrong," the academics explain.

Several mitigation strategies are discussed in the study, but most of them fall short of completely preventing identity theft. For example, fully randomizing SSNs would make them less predictable, but this can only be applied to newly issued numbers, leaving the older ones still vulnerable. Ultimately, the only viable solution, according to the researchers, is for the government to ban the use of SSNs for identity verification entirely.

The authors are expected to present their findings at the upcoming Black Hat security conference in Las Vegas, but have noted that they have intentionally "omitted sensitive details about the prediction strategy" from their published paper (PDF).