
990 Decoder -- Charity Navigator

ETL toolkit for 2.5 million electronic nonprofit tax returns released by the IRS.

IRS Form 990 Decoder – DEPRECATED

Please use the newer version of this code

This repository contains everything you need to get started exploring the IRS Form 990 dataset hosted by Amazon Web Services on S3. This includes instructions for an easier-to-use 990 database provided free to the public by Charity Navigator.

Why we are providing these files

Charity Navigator is dedicated to the advancement of informed giving. In the United States, most organizations exempt from income tax under section 501(a) must file an annual information return called the Form 990. These documents contain a wealth of information about each organization’s operations, finances and governance practices. Philanthropists, regulators, researchers and others rely on the IRS Form 990 as a crucial public record of nonprofit governance.

Historically, these documents were completed and mailed in to the IRS. More recently, organizations began to submit them digitally, and within the last year the IRS began making the digitized data available to the public.

While this is a great advancement for the sector, many have found the original electronic records difficult to work with. The original Form 990 dataset consists of more than a million individual files, the encoding scheme for these files varies from case to case, and the file structure alone makes them very difficult to retrieve. The tools offered here are intended to facilitate retrieval, comprehension and analysis.
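To illustrate what a varying encoding scheme can mean in practice, here is a minimal Python sketch of the crosswalk idea: one database column is mapped to several possible XML tag names, and the first one present in a filing is used. The tag names, the sample documents, and the `CROSSWALK`, `local_name` and `extract` names are all assumptions made up for this example. They are not this toolkit's actual mapping or code.

```python
import xml.etree.ElementTree as ET

# Illustrative crosswalk: one database column, several candidate XML tag names.
# These tag names are assumptions for this sketch, not the toolkit's mapping.
CROSSWALK = {
    "total_revenue": ["CYTotalRevenueAmt", "TotalRevenueCurrentYear"],
}


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}Name'."""
    return tag.rsplit("}", 1)[-1]


def extract(xml_text, crosswalk=CROSSWALK):
    """Return {column: text}, using the first tag variant found per column."""
    root = ET.fromstring(xml_text)
    # Index every element's text by its namespace-free tag name.
    found = {local_name(el.tag): (el.text or "").strip() for el in root.iter()}
    row = {}
    for column, variants in crosswalk.items():
        # None signals that no known variant appeared in this filing.
        row[column] = next((found[v] for v in variants if v in found), None)
    return row


# Two hypothetical filings that store the same value under different tags.
OLD_STYLE = "<Return><TotalRevenueCurrentYear>1200</TotalRevenueCurrentYear></Return>"
NEW_STYLE = (
    '<Return xmlns="http://www.irs.gov/efile">'
    "<CYTotalRevenueAmt>3400</CYTotalRevenueAmt></Return>"
)

rows = [extract(OLD_STYLE), extract(NEW_STYLE)]
print(rows)
```

Both sample filings end up with a `total_revenue` value even though they use different tags, which is the kind of normalization a crosswalk between XML and database columns provides. A real mapping would cover many more fields and variants.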

This toolkit is neither perfect nor comprehensive, but we hope it can be a starting point for data scientists and subject area experts looking to explore public records for the charitable sector.

Getting started

We recommend using our quick start, but we provide other options as well.

Additional documentation

In addition to the documentation for each start-up option above, we provide the following documentation:

Who should use this toolkit

These tools were prepared for researchers, data scientists, and enthusiasts interested in exploring the IRS 990 database. These tools are partial and preliminary, but they can help you get started working with what some have found to be a difficult-to-use dataset. You should not use this dataset for legal investigations, policy decisions, or published findings without some scrutiny and careful cleaning. If you do polish things up, please consider contributing to this repository.

Prerequisites

The quick start assumes that you are comfortable with R and RStudio, and have at least a passing familiarity with relational databases and Amazon Web Services. The other options require more expertise.

Limitations

The following limitations apply as of the latest version we are hosting.

Change log

Authors

Code and visualizations: David Bruce Borenstein

Documentation: David Bruce Borenstein and Zach Weinsteiger

Crosswalk between XML and database columns (990): Vince Bogucki

Crosswalk between XML and database columns (EZ): David Bruce Borenstein and Zach Weinsteiger