Taking A Look At What's Next For The Environmental Protection Agency (EPA) Envirofacts Data Service API

I was asked by folks at the Environmental Protection Agency (EPA) to provide some feedback on the Envirofacts Data Service API, as they prepare to work on the next iteration. I took a quick glance at the landing page for their service, saw a simple URL layout showing how to make API calls, and estimated that it would take me an hour or two (at the most) to profile the API.

As I dug into the process of profiling the Envirofacts Data Service API one evening in May, I realized I was wrong about the scope of the API, and became unsure how long it would actually take me. Then this work got lost in the shuffle of my summer, and it is something I only recently picked back up. I'm not happy if I can't provide an agency with some direction on where to go next, and after about 12 hours of work, I think I have some valuable feedback that they can run with.

The Envirofacts Data Service API program consists of a single landing page, with an overview of how to use the API, and a myriad of pages below that explain the underlying data model put to use. The API is what I consider a very resource driven API design, meaning it reflects the database resource it came from, with not much emphasis on how the API driven resources will be used.

While the API does use the URL, it uses few of the other HTTP components that make an API RESTful. I can see how the design would make sense to a database engineer, but it will be a little confusing for API developers.

After looking beyond this portal I have since found other possible APIs, but honestly they are often even more incoherent than the Envirofacts Data Service API. I'm not trying to review the EPA's entire API effort, and will be focusing specifically on the resources available in the Envirofacts Data Service API for this round.

Environmental Protection Agency
  EPA Air Facility System (AFS) API  
  EPA Biennial Report API  
  EPA Comprehensive Environmental Response, Compensation, and Liability Information System API    
  EPA Facility Registry System API  
  EPA Greenhouse Gas API  
  EPA Integrated Grants Management System API  
  EPA Locational information API  
  EPA Permit Compliance System API  
  EPA Radiation Ambient Monitoring API  
  EPA Radiation Information Database API  
  EPA Resource Conservation and Recovery Act Information API  
  EPA Safe Drinking Water Information System API  
  EPA Toxics Release Inventory API  

After I discovered the 411 tables across these 13 groups, and learned the common URL pattern for querying, I decided to hard code each table as its own endpoint, rather than relying on a {table} path parameter to include it. Even though most of the table names are incoherent, some still articulate a little bit more about what the resource might do, and once you make a request, you get an even better idea. All of this can go a long way towards helping people understand what is going on.
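As a rough sketch of what "one endpoint per table" can look like in a Swagger 2.0 definition, the snippet below builds a `paths` object from a list of table names. It is an illustration under assumptions, not the definition I actually published: the path layout (table name as the whole path), the `operationId` convention, the version string, and the `FRS_EXAMPLE_TABLE` placeholder are all hypothetical. Only `FRS_PROGRAM_FACILITY` is a table name taken from this post.

```python
import json

# Hypothetical grouping: FRS_PROGRAM_FACILITY is the only table named in this
# post; FRS_EXAMPLE_TABLE is a made-up placeholder.
TABLES_BY_API = {
    "EPA Facility Registry System API": ["FRS_PROGRAM_FACILITY", "FRS_EXAMPLE_TABLE"],
}


def operation_id(table):
    """Turn FRS_PROGRAM_FACILITY into getFrsProgramFacility."""
    return "get" + "".join(part.capitalize() for part in table.split("_"))


def build_swagger(api_name, tables):
    """Build a minimal Swagger 2.0 document with one hard-coded path per table."""
    paths = {}
    for table in tables:
        # Assumed path layout: the table name is the whole path, instead of
        # a single "/{table}" path with a path parameter.
        paths["/" + table] = {
            "get": {
                "summary": "Query the " + table + " table",
                "operationId": operation_id(table),
                "responses": {"200": {"description": "Rows from " + table}},
            }
        }
    return {
        "swagger": "2.0",
        "info": {"title": api_name, "version": "0.0.1"},  # version is a placeholder
        "paths": paths,
    }


if __name__ == "__main__":
    for api_name, tables in TABLES_BY_API.items():
        print(json.dumps(build_swagger(api_name, tables), indent=2))
```

The trade-off is a much longer definition, but every table shows up by name in the documentation, which is the point of hard coding it.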

It wouldn't take much to apply a coherent summary to each endpoint that describes what is stored in the table for use. Once I had a list of all tables, I went ahead and made a call to each of the 411 endpoints in the 13 areas, and generated a Swagger API definition for each. Using Charles Proxy I was able to generate the underlying data model for each, which is necessary for generating SDKs, and can be used as a central truth throughout other aspects of API integration. The current API design also allows you to pass in a field, and apply an operator against it when searching--I opted to leave this out of this iteration, until I had a clear definition of the endpoints, and the underlying data model defined for each. The API is perfectly usable without this.
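To give a feel for what "generating the underlying data model" means, here is a minimal sketch that infers a Swagger definition from one sample JSON record. This is not how Charles Proxy does it; it is a simplified stand-in for the same idea. The sample record and its field names are hypothetical, and a real pass would look at many rows rather than one.

```python
def swagger_type(value):
    """Map a Python value from a decoded JSON response to a Swagger type."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    # Strings, nulls, and anything unrecognized fall back to string.
    return {"type": "string"}


def infer_definition(record):
    """Build a Swagger object definition from a single sample record."""
    return {
        "type": "object",
        "properties": {field: swagger_type(value) for field, value in record.items()},
    }


if __name__ == "__main__":
    # Hypothetical row; these field names are not taken from the EPA data model.
    sample = {"EXAMPLE_ID": 42, "EXAMPLE_NAME": "Sample facility", "EXAMPLE_LATITUDE": 38.9}
    print(infer_definition(sample))
```

A definition like this can sit under `definitions` in the Swagger file and be referenced from each endpoint's 200 response.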

Keeping Things Simple
My recommendation for any future API release out of the EPA team would be to focus on just simplifying things. When you land on the home page, you get the idea there is an API present, but you do not grasp the depth of the resource. A simple list of the various API groups is important, like the list above that I hydrated from the acronyms to better demonstrate what lies beneath. Calling things by their actual names just makes things more intuitive. You need to reach out of your government silos. I had to work really hard to make sense of the data model at play. I was sure there would be a meta API or download allowing me to quickly understand things, but I couldn't find it. By creating Swagger definitions for all API endpoints, complete with associated definitions for the data model, I can now easily build querying, filtering, and other mechanisms into my clients.

Speaking In Plain English
While FRS_PROGRAM_FACILITY may have made sense to the database administrator when naming the original table, it does not adequately describe the resource it is serving up. A big part of the next version of these APIs needs to focus on replacing the cryptic table names with more meaningful endpoint names, and on more descriptive fields for each of the underlying data definitions. After crafting the Swagger definitions for these APIs I am blown away by the amount of information in here, obfuscated by the cryptic database naming conventions.
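A first pass at this renaming could even be mechanical. The sketch below expands known acronyms in a table name and produces a readable, hyphenated endpoint name. The `ACRONYMS` lookup and the `humanize` helper are hypothetical; the only expansion used here (FRS as Facility Registry System) comes from the API list earlier in this post, and a person would still need to review the results.

```python
# Hypothetical acronym lookup; extend it as each group's acronyms are confirmed.
ACRONYMS = {
    "FRS": "facility registry system",
}


def humanize(table_name):
    """Turn a table name like FRS_PROGRAM_FACILITY into a readable slug."""
    words = table_name.split("_")
    # Replace known acronyms with their long form, lowercase everything else.
    expanded = [ACRONYMS.get(word.upper(), word.lower()) for word in words]
    # Expansions can contain spaces, so re-split before joining with hyphens.
    return "-".join(" ".join(expanded).split())


if __name__ == "__main__":
    print(humanize("FRS_PROGRAM_FACILITY"))
```

For FRS_PROGRAM_FACILITY this yields `facility-registry-system-program-facility`, which is long, but it at least tells a developer what they are looking at.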

Wrap In A Clean Portal
The current landing page for the Envirofacts API is fairly cluttered, and ultimately doesn't say much--it made me work too hard to get what I need. My goal was to distill down the 13 APIs I found buried in the Envirofacts API page, and expose exactly what you need to understand and get to work using any of the 13 APIs and the over 400 endpoints--nothing more. I started with a simple GitHub Pages hosted template, with a single APIs.json home page, and interactive documentation for each of the APIs (which you can fork).

Environmental Protection Agency (apis.json)
The United States Environmental Protection Agency (EPA or sometimes USEPA) is an agency of the U.S. federal government which was created for the purpose of protecting human health and the environment by writing and enforcing regulations based on laws passed by Congress. The EPA was proposed by President Richard Nixon and began operation on December 2, 1970, after Nixon signed an executive order. The order establishing the EPA was ratified by committee hearings in the House and Senate. The agency is led by its Administrator, who is appointed by the president and approved by Congress. The current administrator is Gina McCarthy. The EPA is not a Cabinet department, but the administrator is normally given cabinet rank.
APIs

EPA Air Facility System (AFS) API

EPA Biennial Report API

EPA Comprehensive Environmental Response, Compensation, and Liability Information System API

EPA Facility Registry System API

EPA Greenhouse Gas API

EPA Integrated Grants Management System API

EPA Locational information API

EPA Permit Compliance System API

EPA Radiation Ambient Monitoring API

EPA Radiation Information Database API

EPA Resource Conservation and Recovery Act Information API

EPA Safe Drinking Water Information System API

EPA Toxics Release Inventory API

With my new portal, you get an overview of the EPA APIs, with a link to each API, but I also use the Swagger definition to generate Swagger interactive documentation, rather than sending you to the EPA data model page. This is just the start. I can also use Swagger to generate sandbox environments, clone APIs, and maybe allow for updates and changes. I could also use the Swagger to generate client libraries for EPA APIs using APIMATIC. I'll add all of this to the roadmap. I think I have done enough work for now, and am ready to hand things back to EPA.

I'd like to see EPA consider some of the common building blocks I recommend as part of my default developer portal. You don't have to do everything, but the more you do to engage the public around your API, the better the chances they will actually use it. Additionally, if you go through all the APIs and translate everything from databasese to English, the potential that someone will build on it will increase exponentially.

Continue On The API Journey At EPA
Beyond the portal, and better describing the APIs, my advice is to just continue on the API journey at EPA--this is where the learnings come from. On the current Envirofacts API page, there is another API in addition to the Envirofacts Data Service API, the UV Index API. I can tell the thinking that went into this reflects the beginning steps of more experience based API design, focusing on how the API will be used. There are still a lot of the same design mistakes in crafting URLs for this API, but I can tell the desire is there to continue improving on the original design.

When you look at the Envirofacts Multi-system Search, and the widgets that are present, you also see some serious thought put into usability--this needs to be applied to the API design. The API is for other developers, but you can assist them in better understanding the potential through better API design.

I haven't changed anything with the current EPA Envirofacts Data Service API. I just worked to understand how it works, profiled each service I found as a Swagger definition, and then brought them all together as a single APIs.json driven collection. This process helped me understand the 13 APIs and 400+ endpoints, while also distilling this definition into a machine readable index that I use to drive the GitHub Pages developer portal I launched. APIs.json drives the home page, and each API's Swagger definition drives its associated interactive documentation.
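For anyone who has not seen an APIs.json index, here is a minimal sketch of how one could be assembled: a top-level collection with one entry per API, each pointing at its Swagger definition. The field names follow my understanding of the APIs.json format and should be checked against the specification. The portal URL, the file layout, the `specificationVersion` value, and the helper names are assumptions for illustration, not the published portal.

```python
import json
import re


def slug(name):
    """Lowercase a name and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_apis_json(name, description, portal_url, api_names):
    """Assemble an APIs.json style index pointing at one Swagger file per API."""
    return {
        "name": name,
        "description": description,
        "url": portal_url + "/apis.json",
        "specificationVersion": "0.14",  # assumed version of the format
        "apis": [
            {
                "name": api_name,
                "humanURL": portal_url + "/" + slug(api_name) + "/",
                # Assumed layout: each API folder holds its own swagger.json.
                "properties": [
                    {
                        "type": "Swagger",
                        "url": portal_url + "/" + slug(api_name) + "/swagger.json",
                    }
                ],
            }
            for api_name in api_names
        ],
    }


if __name__ == "__main__":
    index = build_apis_json(
        "Environmental Protection Agency",
        "Envirofacts Data Service APIs",
        "https://example.github.io/epa",  # hypothetical portal URL
        ["EPA Greenhouse Gas API", "EPA Toxics Release Inventory API"],
    )
    print(json.dumps(index, indent=2))
```

The home page only has to read this one file to list every API, and each documentation page only has to read the Swagger URL from its entry.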

When it comes to the next iteration of the EPA Envirofacts Data Service API, I'd focus on a simple, concise portal for supporting developers, complete with the common building blocks found in other leading API platforms. I would also focus on taking the API definitions I've created, and getting to work humanizing the design of these 13 APIs and 400+ endpoints. Make the endpoints intuitive, and standardize your approach to query and pagination based upon other leading approaches established by API architects. Then do the dirty work of humanizing the underlying definitions, field names, and descriptions. Think deeply about both the request and response structure, and make it speak to developers--your simple, intuitive portal, with the right building blocks, will provide a potential feedback loop for this cycle (if you do it right).

If you do this, then get to work generating some SDKs, set up some monitors with API Science or Runscope, provide Postman Collections for your API consumers, and get busy evangelizing that these APIs even exist--the API will get used. There is a lot of value present here, it just needs to be brought out, polished, and presented in a way that showcases the hard work going on at EPA.