The Obama Administration today released an Executive Order and a Policy Directive that move the federal government significantly forward: they officially require that, going forward, data generated by the government be made available in open, machine-readable formats (with appropriate protections). Most notably, the policy requires that agencies create and maintain an “enterprise data inventory, if it does not already exist, that accounts for datasets used in the agency's information systems” – with the ultimate goal of including all agency datasets, and with indications of whether the agency has determined that each dataset may be made publicly available and whether it is currently available to the public.
In addition, agencies are required to create and maintain a public data listing of those datasets in the agency's enterprise data inventory that can be made publicly available, including datasets that can be made publicly available but have not yet been released.
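To make the relationship between the two lists concrete, here is a minimal sketch of how an inventory and its public listing might relate. The class, field names, and sample dataset titles are hypothetical illustrations, not the schema the policy prescribes; the only thing taken from the policy is that the public listing covers every dataset that can be made public, whether or not it has been released yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    """One dataset in a hypothetical enterprise data inventory."""
    title: str
    can_be_public: bool  # agency has determined the dataset may be made public
    is_released: bool    # dataset is currently available to the public


def public_data_listing(inventory):
    """Return every entry that can be made public, released or not."""
    return [entry for entry in inventory if entry.can_be_public]


# Invented sample entries, for illustration only.
inventory = [
    InventoryEntry("Grant awards", can_be_public=True, is_released=True),
    InventoryEntry("Facility inspections", can_be_public=True, is_released=False),
    InventoryEntry("Personnel records", can_be_public=False, is_released=False),
]

listing = public_data_listing(inventory)

# Datasets the public can see listed but cannot yet obtain.
unreleased = [entry.title for entry in listing if not entry.is_released]
```

In this sketch, the `unreleased` list is the part of the listing that lets the public see what exists and ask for it.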
These are important steps toward giving the public the information it needs to help the government identify “high-value” datasets for release – including those the government may not think should be made publicly available. On a related front, though, there are some troubling exclusions.
Requirements to “Collect or create information in a way that supports downstream information processing and dissemination activities,” and to “Build information systems to support interoperability and information accessibility” do not apply to National Security Systems, as defined in 40 U.S.C. 11103. This section of the US Code refers to any information system that “(A) involves intelligence activities; (B) involves cryptologic activities related to national security; (C) involves command and control of military forces; (D) involves equipment that is an integral part of a weapon or weapons system; or (E) is critical to the direct fulfillment of military or intelligence missions.” It is hard to fathom the logic of exempting these systems from requirements to support information processing, interoperability, and information accessibility – even if only within the government for much of the life of the information.
Also troubling is some of the definitional language in the OMB memorandum. Most striking – given the focus of this policy – is the omission of a definition of ‘information system.’ As defined in OMB Circular A-130 (referenced in other definitions), it means “a discrete set of information resources organized for the collection, processing, maintenance, transmission, and dissemination of information, in accordance with defined procedures, whether automated or manual.” This omission can perhaps be read together with the definition of data, which the Memorandum shortens from the definition on Project Open Data. Project Open Data says: “Data includes all data. It includes, but is not limited to, 1) geospatial data 2) unstructured data, 3) structured data, etc.” The Memorandum says: “For the purposes of this Memorandum, the term ‘data’ refers to all structured information, unless otherwise noted.” Unstructured data, things like policy documents and Memoranda, could perhaps become structured through explicit meta-tagging, e.g., “broken into the following component data pieces: the title, body text, images, and related links.” It is important, we think, to note that Project Open Data was developed by the White House to “help agencies adopt the Open Data Policy.”
Finally, those of us with long memories are troubled by the inclusion of this definition: “Mosaic effect: The mosaic effect occurs when the information in an individual dataset, in isolation, may not pose a risk of identifying an individual (or threatening some other important interest such as security), but when combined with other available information, could pose such risk. Before disclosing potential PII or other potentially sensitive information, agencies must consider other publicly available data – in any medium and from any source – to determine whether some combination of existing data and the data intended to be publicly released could allow for the identification of an individual or pose another security concern.” (Italics added.) The concept of the mosaic of information dates back to the Reagan Administration and John Poindexter; we have been struggling against its overuse and abuse since that time.