Saturday, March 5, 2011

Software instead of lawyers

When five television studios became entangled in a Justice Department antitrust lawsuit against CBS, the cost was immense. As part of the obscure task of “discovery” — providing documents relevant to a lawsuit — the studios examined six million documents at a cost of more than $2.2 million, much of it to pay for a platoon of lawyers and paralegals who worked for months at high hourly rates.

But that was in 1978. Now, thanks to advances in artificial intelligence, “e-discovery” software can analyze documents in a fraction of the time for a fraction of the cost. In January, for example, Blackstone Discovery of Palo Alto, Calif., helped analyze 1.5 million documents for less than $100,000. ....

The sociological approach adds an inferential layer of analysis, mimicking the deductive powers of a human Sherlock Holmes. Engineers and linguists at Cataphora, an information-sifting company based in Silicon Valley, have their software mine documents for the activities and interactions of people — who did what when, and who talks to whom. ...

For example, it finds “call me” moments — those incidents when an employee decides to hide a particular action by having a private conversation. This usually involves switching media, perhaps from an e-mail conversation to instant messaging, telephone or even a face-to-face encounter. ....

A shift in an author’s e-mail style, from breezy to unusually formal, can raise a red flag about illegal activity.

“You tend to split a lot fewer infinitives when you think the F.B.I. might be reading your mail,” said Steve Roberts, Cataphora’s chief technology officer. ...

Such tools owe a debt to an unlikely, though appropriate, source: the electronic mail database known as the Enron Corpus.

In October 2003, Andrew McCallum, a computer scientist at the University of Massachusetts, Amherst, read that the federal government had a collection of more than five million messages from the prosecution of Enron.

He bought a copy of the database for $10,000 and made it freely available to academic and corporate researchers. Since then, it has become the foundation of a wealth of new science — and its value has endured, since privacy constraints usually keep large collections of e-mail out of reach. “It’s made a massive difference in the research community,” Dr. McCallum said.

The Enron Corpus has led to a better understanding of how language is used and how social networks function, and it has improved efforts to uncover social groups based on e-mail communication.
--John Markoff, NYT, on the future of legal discovery
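
The "who talks to whom" analysis mentioned in the excerpt can be illustrated with a very small sketch. This is not Cataphora's method or anything described in the article; it only counts sender-to-recipient pairs in raw e-mail text using Python's standard library. The function name `talk_counts` and the sample messages (with example.com addresses) are made up for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def talk_counts(raw_messages):
    """Count how many messages each sender addressed to each recipient.

    Hypothetical helper: takes an iterable of raw RFC 822 style message
    strings and returns a Counter keyed by (sender, recipient).
    """
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # getaddresses() turns header strings into (display name, address) pairs.
        senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
        recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        recipients = [addr.lower() for _, addr in getaddresses(recipient_headers)]
        for sender in senders:
            for recipient in recipients:
                if sender and recipient:
                    counts[(sender, recipient)] += 1
    return counts


# Invented sample data, not taken from the Enron Corpus.
SAMPLE = [
    "From: a@example.com\nTo: b@example.com, c@example.com\nSubject: one\n\nhello",
    "From: a@example.com\nTo: b@example.com\nSubject: two\n\nhello again",
    "From: b@example.com\nTo: a@example.com\nCc: c@example.com\nSubject: three\n\nreply",
]

if __name__ == "__main__":
    for (sender, recipient), n in talk_counts(SAMPLE).most_common():
        print(sender, "->", recipient, n)
```

Real tools of the kind the article describes presumably go much further (timing, changes of medium, writing style); a pair count like this is only the simplest starting point for a communication graph.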
