Court rejects LinkedIn claim that unauthorized scraping is hacking

Judge says LinkedIn’s reading of hacking law would have troubling consequences.

TIMOTHY B. LEE – 8/15/2017, 2:05 PM

A California federal court has handed a setback to LinkedIn in a case that could determine whether scraping a public website triggers anti-hacking law. The 25-page ruling, released on Monday, holds that federal anti-hacking law isn’t triggered by scraping a website, even if the website owner—LinkedIn in this case—explicitly asks for the scraping to stop.

The case pits a business analytics startup called hiQ against the Microsoft-owned behemoth LinkedIn. HiQ scrapes data from publicly available portions of the LinkedIn website, then sells reports to employers about which of their employees seem to be looking for new jobs. LinkedIn sent hiQ a cease-and-desist letter warning that continued scraping could subject hiQ to liability under the Computer Fraud and Abuse Act (CFAA), the anti-hacking legislation Congress enacted in 1986.

But critics argued that the LinkedIn interpretation of the law could have sweeping and harmful consequences. After all, lots of people scrape publicly available websites, and they don’t always do so with the approval of website owners.

“It’s hugely problematic to let the subjective wishes of the website owner and not their objective action” determine what’s legal, said legal scholar Orin Kerr in a recent interview with Ars.

Judge Edward Chen bought this argument. In fact, he quoted extensively from Kerr’s arguments in his opinion. If his ruling is upheld on appeal, it would not only beat back LinkedIn’s expansive reading of the CFAA but also give us greater clarity about how to draw the line between legal data harvesting and illegal hacking.

Passwords mark the boundary between public and private

The CFAA is more than 30 years old, yet its exact meaning remains a subject of vigorous debate. The reason is that the CFAA was written in vague language—and was crafted before modern technologies like the Web and social media sites were invented.

The CFAA makes it a crime to “access a computer without authorization or exceed authorized access.” LinkedIn argued that this made the case straightforward: its cease-and-desist letter—as well as technical measures like its robots.txt file and IP-based blocking—made it clear that hiQ wasn’t authorized to access LinkedIn’s servers. Hence, LinkedIn argued, hiQ had accessed its servers without authorization, in clear violation of the law.
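For readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved crawler consults one before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the `examplebot` and `otherbot` user agents, the `example.com` URLs, and the `may_fetch` helper are all hypothetical and invented for illustration; this is not LinkedIn's actual robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, invented for illustration only.
# This is NOT LinkedIn's actual file.
ROBOTS_TXT = """\
User-agent: examplebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is barred from the whole site.
print(may_fetch("examplebot", "https://example.com/in/some-profile"))  # False
# Any other bot falls under the wildcard rules.
print(may_fetch("otherbot", "https://example.com/in/some-profile"))    # True
print(may_fetch("otherbot", "https://example.com/private/page"))       # False
```

The check is voluntary: the file states the site owner's wishes, and it is up to the crawler to read and honor them. Nothing in the file itself stops a crawler that skips this step.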

But Judge Chen concluded that the issue isn’t so simple. When you publish a website, you implicitly give members of the public permission to access it, he ruled. Allowing website operators to revoke that permission on a case-by-case basis, backed up by the force of federal criminal law, could have serious consequences that Congress could not have intended:

“The CFAA as interpreted by LinkedIn would not leave any room for the consideration of either a website owner’s reasons for denying authorization or an individual’s possible justification for ignoring such a denial. Website owners could, for example, block access by individuals or groups on the basis of race or gender discrimination. Political campaigns could block selected news media, or supporters of rival candidates, from accessing their websites. Companies could prevent competitors or consumer groups from visiting their websites to learn about their products or analyze pricing.”

A site called NewsDiffs, for example, tracks changes to articles published by major media organizations like The New York Times and CNN, helping to bring transparency to undisclosed, after-the-fact edits of their work. Under the LinkedIn interpretation of the law, a cease-and-desist letter could transform this site’s activity from a valuable public service into a felony.

Instead, Chen endorses an approach developed by Kerr that relies heavily on analogies to physical trespassing:

“It is generally impermissible to enter into a private home without permission in any circumstances. By contrast, it is presumptively not trespassing to open the unlocked door of a business during daytime hours because ‘the shared understanding is that shop owners are normally open to potential customers.’ These norms, moreover, govern not only the time of entry but the manner; entering a business through the back window might be a trespass even when entering through the door is not.”

Following an influential essay by Kerr, Chen argues that the main way websites distinguish between their public and private portions is by using an authentication method such as a password. If a page is available without a password, it’s presumptively public and so downloading it shouldn’t be considered a violation of the CFAA. On the other hand, if a site is password-protected, then bypassing the password might trigger liability under federal anti-hacking laws.

In a recent interview with Ars, the Cornell legal scholar James Grimmelmann argued that it’s important to preserve the ability of small companies to scrape established websites.

“Lots of businesses are built on connecting data from a lot of sources,” Grimmelmann told us. He argued that scraping is a key way that companies bootstrap themselves into “having the scale to do something interesting with that data.” If scraping without consent becomes illegal, startups like hiQ will have a harder time getting off the ground.

The big question here is whether appeals courts will see things the same way Chen did. When we first covered this case last month, experts told us that a 2016 precedent involving scraping of Facebook could spell bad news for hiQ, since it essentially held that a cease-and-desist letter from Facebook was enough to turn a company’s data harvesting into a violation of the CFAA.

Chen says the LinkedIn case is different, however, because the defendant in the Facebook case was harvesting data from the password-protected parts of Facebook (albeit with the permission of the password owner), while hiQ is scraping data only from the public portions of LinkedIn. But this is a somewhat novel argument, and it remains to be seen whether the Ninth Circuit Court of Appeals, which issued last year’s Facebook decision and has jurisdiction over this new case, will see things the same way.

For hiQ, this could be a matter of life or death. The company has told the court that its existence as a business depends on access to LinkedIn’s data.

HiQ has argued not only that it isn’t violating the CFAA but also that LinkedIn is violating antitrust law by denying hiQ access to data about LinkedIn’s users. In Monday’s ruling, Chen doesn’t fully endorse that argument. Instead, he defers ruling on it until later in the case. But he finds it plausible enough to order LinkedIn to stop interfering with hiQ’s scraping efforts while the court case runs its course.

Ultimately, Judge Chen could rule that hiQ didn’t violate the CFAA, but also that there’s nothing wrong with LinkedIn using technical measures like IP blocking to prevent hiQ from harvesting its users’ data. That could result in a technological arms race, with LinkedIn trying to block hiQ’s scrapers, and hiQ taking ever more elaborate steps to evade those blocking measures.
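To make the idea of IP-based blocking concrete, here is a minimal sketch of the kind of check a server might run before answering a request, written with Python's standard `ipaddress` module. The blocklist, the `is_blocked` helper, and the addresses are hypothetical; the addresses come from ranges reserved for documentation and do not describe how LinkedIn or hiQ actually operate.

```python
import ipaddress

# Hypothetical blocklist built from reserved documentation ranges (RFC 5737).
# These are NOT addresses used by any real company.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # an entire range
    ipaddress.ip_network("198.51.100.42/32"),  # a single address
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.7"))    # True: inside the blocked /24 range
print(is_blocked("198.51.100.42"))  # True: the single blocked address
print(is_blocked("192.0.2.10"))     # False: not on the list
```

A scraper that moves to an address outside the listed ranges passes this check until the list is updated. That cycle of blocking and evasion is the arms-race dynamic described above.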