Adobe’s e-book reader sends your reading logs back to Adobe—in plain text [Updated]

Digital Editions even tracks which pages you've read. It might break a New Jersey law.

Adobe even logs what you read in Digital Editions' instruction manual.

Adobe’s Digital Editions e-book and PDF reader—an application used by thousands of libraries to give patrons access to electronic lending libraries—actively logs and reports every document readers add to their local “library” along with what users do with those files. Even worse, the logs are transmitted over the Internet in the clear, allowing anyone who can monitor network traffic (such as the National Security Agency, Internet service providers and cable companies, or others sharing a public Wi-Fi network) to follow along over readers’ shoulders.

Ars has independently verified the logging of e-reader activity with the use of a packet capture tool. The exposure of data was first discovered by Nate Hoffelder of The Digital Reader, who reported the issue to Adobe but received no reply.

Digital Editions (DE) has been used by many public libraries as a recommended application for patrons wanting to borrow electronic books (particularly with the Overdrive e-book lending system), because it can enforce digital rights management (DRM) rules on how long a book may be read. But DE also reports back data on e-books that have been purchased or self-published. Those logs are sent over an unencrypted HTTP connection to a server at Adobe with the Domain Name System hostname “adelogs.adobe.com” as an unencrypted file, the data format of which appears to be JSON.
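To illustrate why plain HTTP matters here, the following is a minimal sketch of what a passive observer on the same network could do with captured traffic. It makes no network connection. The hostname is the one named above; the `/log` path and every field name in the payload are placeholder assumptions for illustration and are not taken from Adobe's actual log format.

```python
import json

# Hypothetical payload: these field names are illustrative only and are
# not taken from Adobe's actual log format.
payload = {"user": "anonymous-install-id", "book": "Jerusalem", "page": 42}
body = json.dumps(payload).encode("utf-8")

# Roughly what an unencrypted HTTP POST looks like on the wire. The hostname
# comes from the article; the "/log" path is a placeholder assumption.
request = (
    b"POST /log HTTP/1.1\r\n"
    b"Host: adelogs.adobe.com\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"\r\n" + body
)


def eavesdrop(captured: bytes) -> dict:
    """Recover the JSON body from captured plaintext HTTP bytes."""
    # Headers and body are separated by a blank line (CRLF CRLF).
    _headers, _, raw_body = captured.partition(b"\r\n\r\n")
    return json.loads(raw_body.decode("utf-8"))


observed = eavesdrop(request)
print(observed)
```

No key or credential is needed to read the body, which is the core of the problem: with TLS (HTTPS), the same captured bytes would be ciphertext to anyone but the two endpoints.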

The behavior is part of Adobe's way of managing access to e-books borrowed from a library or "lent" by other users through online bookstores supporting the EPUB book format, such as Barnes & Noble. If you've "activated" Digital Editions with an Adobe ID, it uses that information to determine whether a book has been "locked" on another device using the same ID to read it or if the loan has expired. If the reader isn't activated, it uses an anonymous unique ID code generated for each DE installation.

Below is the data transmitted by Digital Editions when we opened an EPUB file of Yotam Ottolenghi’s cookbook, Jerusalem:

This is what Adobe knows about my choice in culinary reading—broadcast in plain text by Digital Editions.

DE reported each EPUB document opened and the navigation within it, sending each page number viewed in a stream of activity data to an application called “datacollector.” The data is logged locally by the application and then transmitted each time the application is opened, likely as part of Adobe’s DRM enforcement within DE. No data was transmitted for PDF documents opened.

A review of Adobe's terms of use for DE found no mention of the logging feature or how long the data was stored by Adobe. While checking the license data for books in DE’s local library is certainly part of the application’s core functionality, the fact that this data is broadcast in the clear could create a significant privacy issue for readers. It's not clear how the data collected by Adobe is stored, but it is associated with a unique identifier for each Digital Editions installation that can be associated with an Internet Protocol address when logged. And the fact that the data is broadcast in the clear by Digital Editions is directly in conflict with the privacy guidelines of many library systems, which closely guard readers' book loan data.

Update, 4:45 PM: The unencrypted transmission of reader data, along with an apparent lack of coverage of the collection of that data in Adobe's terms of service, may be in violation of a recently passed New Jersey law, the Reader Privacy Act. And the collection has also raised concern among librarians. The American Library Association's Code of Ethics states, "We protect each library user's right to privacy and confidentiality with respect to information sought or received, and resources consulted, borrowed, acquired or transmitted."

In a phone interview with Ars Technica, Deborah Caldwell-Stone, the deputy director of the American Library Association's Office for Intellectual Freedom, said that the ALA was still investigating the issue. "We are looking at this, and very concerned about this," she said. If the data were to pertain to any library transactions, she added, "we would want this information encrypted and private."

An Adobe spokesperson provided the following statement:

Adobe Digital Editions allows users to view and manage eBooks and other digital publications across their preferred reading devices—whether they purchase or borrow them. All information collected from the user is collected solely for purposes such as license validation and to facilitate the implementation of different licensing models by publishers. Additionally, this information is solely collected for the eBook currently being read by the user and not for any other eBook in the user’s library or read/available in any other reader. User privacy is very important to Adobe, and all data collection in Adobe Digital Editions is in line with the end user license agreement and the Adobe Privacy Policy.

Here's what Adobe says they collect:

  • User ID: this is the user's Adobe ID or an anonymous ID for an unactivated version of DE.
  • Device ID: a unique identifier for the computer running DE, "collected for digital right management (DRM) purposes since publishers typically restrict the number of devices an eBook or digital publication can be read on," Adobe's spokesperson said.
  • Certified App ID: a key that allows DE to open DRM-protected documents and prevents them from being opened with unauthorized software.
  • Device IP address: for geo-location, "since publishers have different pricing models in place depending on the location of the reader purchasing a given eBook or digital publication," Adobe's spokesperson said.
  • Duration for Which the Book was Read: "This information is collected to facilitate limited or metered pricing models where publishers or distributors charge readers based on the duration a book is read," said Adobe's spokesperson.
  • Percentage of the Book Read: Believe it or not, some publishers charge based on how much of a book you read; you may be charged only a percentage of the total if you don't finish it.
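For readers who think in data structures, the list above can be sketched as a single record per reading session. This is only an illustration of the categories Adobe describes: the class name, field names, and types are assumptions, and the proportional charge is one hypothetical way a percentage-based pricing model could work, not a description of any publisher's actual billing.

```python
from dataclasses import dataclass


@dataclass
class ReadingRecord:
    """One record covering the six items Adobe says it collects.

    All field names and types are hypothetical; Adobe's real schema
    is not documented in the article.
    """
    user_id: str            # Adobe ID, or an anonymous ID if DE is unactivated
    device_id: str          # unique identifier for the computer running DE
    certified_app_id: str   # key tied to authorized reading software
    device_ip: str          # used for geo-location, per Adobe
    seconds_read: int       # duration for which the book was read
    percent_read: float     # share of the book read, 0 to 100


def metered_charge(list_price: float, percent_read: float) -> float:
    """Hypothetical proportional pricing: pay for the share of the book read."""
    share = max(0.0, min(percent_read, 100.0)) / 100.0
    return round(list_price * share, 2)


record = ReadingRecord(
    user_id="anonymous-install-id",
    device_id="device-0001",
    certified_app_id="app-key-placeholder",
    device_ip="192.0.2.10",  # address from the reserved documentation range
    seconds_read=1800,
    percent_read=25.0,
)
print(metered_charge(20.00, record.percent_read))
```

Laid out this way, it is easy to see that the user ID, device ID, and IP address together tie every reading session to a specific installation and network location.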

The end-user license for Digital Editions states that "the Software may cause Customer’s Computer, without additional notice and on an intermittent or regular basis, to automatically connect to the Internet to facilitate Customer’s access to content and services that are provided by Adobe or third parties... In addition, the Software may, without additional notice, automatically connect to the Internet to update downloadable materials from these online services so as to provide immediate availability of these services even when Customer is offline."  The EULA also refers to Adobe's privacy policy, which states that the company will "provide reasonable administrative, technical, and physical security controls to protect your personal information."

It's clear from testing that data is sent on more than just the book currently being read—in our test, data was provided from a "scan" of all documents currently in the library. Additionally, it's not clear why all this data is sent regardless of the source of the book—even EPUB documents that are DRM-free get all their data shipped back to Adobe.

Update, 6:23 PM ET: An Adobe spokesperson now says the company is working on an update. "In terms of the transmission of the data collected, Adobe is in the process of working on an update to address this issue," the spokesperson said in an e-mail to Ars Technica. "We will notify you when a date for this update has been determined."

 
