Cheese pizza. To you that might sound like a totally innocent addition to your night in, but to a certain sordid class of people it can mean something entirely different. As one Reddit user recently pointed out, it’s a codename for child porn (CP) on 4chan. And that’s the kind of cryptic confusion that plagues the people who gather the evidence to bring cyber criminals like pedophiles, terrorists, and identity thieves to trial.
Researchers have developed a new algorithm to aid investigators in their quest to collect evidence for cyber crimes (a process known as “digital forensics”) and they say it significantly reduces the time it takes to mount a damning case for the prosecution.
You’d be forgiven for assuming that digital forensics was an incredibly cutting-edge process, with law enforcement agencies seamlessly sieving through hard drives and electronic records with technology so advanced that we probably won’t even know it existed for years to come. Unfortunately it’s actually quite the opposite, and still involves a great deal of old-fashioned human grunt work. Kabi Daghir, who led the research at Concordia University in Canada, says that digital forensics is still a field that relies on manual effort and human hours.
When it comes down to it, the investigators had been using nothing more than glorified search engines to comb through documents on a suspect’s hard drive. “The investigator would also have to be experienced enough to search for really specific key words that could match [up] to the suspect’s computers,” says Daghir. In other words, the detectives need to be constantly aware of the latest cyber slang, like “cheese pizza,” and if they don’t happen to know one of the criminal codenames then the offender is likely to get away with it.
The new set of tools that Daghir has invented is designed to make sure cyber felons don’t circumvent the courtroom by way of code words. He works in partnership with Canada’s National Cyber-Forensics and Training Alliance, and came up with a sophisticated way to cluster documents into groups based on topics provided by the investigator, so the investigators can “right away, instead of searching millions and millions of documents, extract information that’s specific to [certain] topics like child pornography,” says Daghir.
Crucially, the algorithm also “captures the suspect’s vocabulary,” which Daghir says makes it harder for a perpetrator to avoid being caught by relying on talk of cheese pizza.
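The article does not spell out how Daghir’s algorithm works, so the following is only a toy sketch of the general idea described above: group documents under investigator-supplied topics, then let each topic absorb terms the suspect uses alongside the known ones. Every name here (`cluster_by_topic`, `topic_seeds`, the sample documents and the "fraud" topic) is hypothetical and chosen for illustration; the co-occurrence rule is an assumed stand-in, not the researchers’ method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def cluster_by_topic(documents, topic_seeds, rounds=2, top_new_terms=3):
    """Assign documents to investigator-supplied topics, then grow each
    topic's vocabulary with terms that recur in its documents.

    Hypothetical illustration only: a term joins a topic when it appears
    in at least two of the topic's documents and in more documents inside
    the topic than outside it.
    """
    vocab = {topic: set(seeds) for topic, seeds in topic_seeds.items()}
    doc_words = [tokenize(doc) for doc in documents]
    assignments = {}

    for _ in range(rounds):
        # Step 1: give each document to the topic it overlaps most.
        assignments = {}
        for i, words in enumerate(doc_words):
            scores = {t: len(words & v) for t, v in vocab.items()}
            best = max(scores, key=scores.get)
            if scores[best] > 0:  # documents with no overlap stay unassigned
                assignments[i] = best

        # Step 2: learn the suspect's own terms for each topic.
        for topic in vocab:
            inside, outside = Counter(), Counter()
            for i, words in enumerate(doc_words):
                (inside if assignments.get(i) == topic else outside).update(words)
            candidates = sorted(
                (w for w, c in inside.items()
                 if c >= 2 and c > outside[w] and w not in vocab[topic]),
                key=lambda w: (-inside[w], w),
            )
            vocab[topic].update(candidates[:top_new_terms])

    return assignments, vocab


# Made-up example: the investigator only knows the word "card",
# while the suspect also says "plastic".
documents = [
    "send the card numbers tonight, fresh plastic",
    "got more plastic for you, same price",
    "new plastic, card dumps included",
    "lunch at noon tomorrow?",
]
assignments, vocab = cluster_by_topic(documents, {"fraud": ["card"]})
```

In this made-up run, the second document never mentions "card", yet it ends up in the "fraud" group once "plastic" has been picked up from the other two documents. That is the kind of effect the article attributes to capturing the suspect’s vocabulary.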
He says his tool makes the suspect vulnerable to being caught for crimes unrelated to the one he or she is being investigated for. “Sometimes you don’t know that the suspect is involved in another type of crime,” says Daghir, and if you’re only searching for specific keywords then you’re not likely to happen upon evidence for other crimes that may have been committed. “In one case there was a person who was involved in child pornography,” says Daghir, “but in the same computer there were thousands of credit card numbers on the machine.”
Daghir confirms that his new digital forensics tool is currently in use but was unable to reveal which law enforcement agencies in particular have incorporated it into their digital investigations.
The FBI declined to comment as to whether or not they’re one of the agencies that currently use Daghir’s protocols. “We don’t want to let the bad guys know how we catch them,” said FBI special agent Jennifer Shearer.
One of the most notable benefits of Daghir’s algorithm is the time it saves. A process that typically takes at least a couple of months can now be accomplished in just one or two days. Now Daghir can move on to teaching his algorithm new languages (it can currently search English and French).