
990 Decoder -- Charity Navigator

ETL toolkit for 2.5 million electronic nonprofit tax returns released by the IRS.

Exploring the 990 dataset

Understanding (and finding) the IRS schemas

The 990 database is built from XML files supplied by the IRS. These filings follow schemas published in .xsd format (XML Schema Definition), an XML dialect used for describing the structure of other XML documents. Some have found these .xsd schemas hard to use.

The IRS website lists schemas for tax years (TY) 2013 through 2016, but the AWS data runs as far back as 2011. Some of the missing schemas can be found by browsing the directory where the listed files are located. Because the IRS index page does not actually list these, exercise caution when interpreting them.

There are multiple versions of the schema for each year. Within each version, there are several .xsd files that must be consulted in order to build the database. Field descriptions may be contained either in XML comments or in an <xsd:documentation> tag. If you wish to work directly with the data, be sure to look at everything, including comments.
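As a rough illustration of why comments matter, the sketch below collects field descriptions from both places in a single pass. The schema fragment and its element names are made up for the example and are not taken from an IRS file. Treating a comment as a description of the element that follows it is an assumption, not something the schemas guarantee.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment for illustration; not copied from an IRS .xsd file.
SAMPLE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Total revenue for the current year -->
  <xsd:element name="ExampleRevenueAmt" type="xsd:decimal"/>
  <xsd:element name="ExampleExpensesAmt" type="xsd:decimal">
    <xsd:annotation>
      <xsd:documentation>Total expenses</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
"""


def field_descriptions(xsd_text):
    """Map each named schema node to the descriptions found for it.

    Descriptions come from the XML comment immediately preceding the node
    (assumed to describe it) and from its own <xsd:documentation> children.
    """
    # insert_comments=True keeps comments in the tree (Python 3.8+);
    # the default parser silently drops them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xsd_text, parser=parser)

    doc_path = f"{{{XSD_NS}}}annotation/{{{XSD_NS}}}documentation"
    found = {}
    for parent in root.iter():
        last_comment = None
        for child in parent:
            if child.tag is ET.Comment:
                last_comment = (child.text or "").strip()
                continue
            name = child.get("name")
            if name:
                notes = found.setdefault(name, [])
                if last_comment:
                    notes.append(last_comment)
                for doc in child.findall(doc_path):
                    text = (doc.text or "").strip()
                    if text:
                        notes.append(text)
            # A comment only applies to the node directly after it.
            last_comment = None
    return found


descriptions = field_descriptions(SAMPLE_XSD)
```

In this sample, `ExampleRevenueAmt` is described only by a comment, so a parser that discards comments would leave it undocumented.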

The following files were used to build the database supplied in this toolkit.

Database tables

filing

The filing table is built directly from the index files supplied by the IRS. It corresponds one-to-one with the records in the index, so it includes non-digital filings and other filings not available on AWS. From the AWS front matter:

Index listings of available filings are available in JSON and CSV files, organized based on the year they were filed. Index files exist for each year going back to 2011 and are named based on their year and file type. For example, the CSV index for 2011 is available at https://s3.amazonaws.com/irs-form-990/index_2011.csv, and the JSON index file for 2011 is available at https://s3.amazonaws.com/irs-form-990/index_2011.json.

These index files include basic information about each filing, including the name of the filer, the Employer Identification Number (EIN) of the filer, the date of the filing, and a unique identifier for the filing.
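The naming pattern described above can be sketched as a small helper, together with a reader for a CSV index. The column names and rows in the sample are placeholders chosen for illustration; check the header row of a real index file before relying on them. The helper only builds the URL and does not attempt a download.

```python
import csv
import io

BASE_URL = "https://s3.amazonaws.com/irs-form-990"


def index_url(year, file_type="csv"):
    """Build an index URL from the year and file type, per the quoted naming pattern."""
    if file_type not in ("csv", "json"):
        raise ValueError(f"unsupported index file type: {file_type!r}")
    return f"{BASE_URL}/index_{year}.{file_type}"


# Made-up sample rows; the column names here are assumptions, not the real header.
SAMPLE_INDEX = """EIN,TAXPAYER_NAME,SUB_DATE,OBJECT_ID
000000001,EXAMPLE FOUNDATION,2011-06-01,EXAMPLE0001
000000002,SAMPLE CHARITY INC,2011-07-15,EXAMPLE0002
"""


def read_index(csv_text):
    """Parse CSV index text into a list of dicts, one per filing record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = read_index(SAMPLE_INDEX)
url = index_url(2011)
```

Values are kept as strings so that leading zeros in identifiers such as the EIN survive parsing.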

Several of the header fields are redundant with those in filing. That’s because the filing version is pulled from the index, whereas the header version is pulled from the 990 itself. Comparing the two is therefore a reasonable sanity check.
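One way to run this sanity check is a join that returns the rows where the two sources disagree. The sketch below uses an in-memory SQLite database with invented rows. The table name `header`, the column names `object_id` and `ein`, and the join key are all assumptions for illustration; substitute the names used in the actual database.

```python
import sqlite3


def mismatched_eins(conn):
    """Return (object_id, filing_ein, header_ein) for filings whose EINs disagree."""
    query = """
        SELECT f.object_id, f.ein AS filing_ein, h.ein AS header_ein
        FROM filing AS f
        JOIN header AS h ON h.object_id = f.object_id
        WHERE f.ein <> h.ein
        ORDER BY f.object_id
    """
    return conn.execute(query).fetchall()


# Hypothetical tables and rows, standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE filing (object_id TEXT PRIMARY KEY, ein TEXT);
    CREATE TABLE header (object_id TEXT PRIMARY KEY, ein TEXT);
    INSERT INTO filing VALUES ('A1', '000000001'), ('A2', '000000002');
    INSERT INTO header VALUES ('A1', '000000001'), ('A2', '000000009');
""")

mismatches = mismatched_eins(conn)
```

An inner join only compares filings present in both tables. Filings that appear in the index but have no parsed 990, such as non-digital filings, would need a separate LEFT JOIN check.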

Part I

Part III

The 990 schema documentation for this is pretty sparse. All it says is “Repeating Activities Lines 4b through 4d.” The one-word definitions below are also from the schema.

Part IV

Part VI

Part VII(a)

Part VIII

Contributions, gifts, grants and other similar amounts

Part IX

Part X

Part XII

Schedule G

Schedule L, Part II

NOTE: There is currently an open issue with our processing of Schedule L. Please do not use it.

Data quality issues

Issues specific to this dataset

For known issues related to this dataset, see our issues page.

Upstream issues

Multiple 990s for the same EIN

Institutional trustees vs board members