Bulk Download and Developer Documentation

We are making all of the reports and associated data available for download for analysis and reuse.

Bulk Download

A complete listing of reports is published as a CSV file at https://www.everycrsreport.com/reports.csv. It looks like:

number,url,sha1,latestPubDate,latestPDF,latestHTML
IN10929,reports/IN10929.json,4d3bdd9837cf7d359646dbc98b344786b00ad36b,2018-07-16,,files/20180716_IN10929_5f791dbabc8be4901fa6e06befa0904b033e2569.html
IN10889,reports/IN10889.json,afd5ee5f131958eb11a4cc8b6a165e5f295ba9ed,2018-07-16,files/20180716_IN10889_3ec6c3177e9baf4fb16b589837052700c779e0e0.pdf,files/20180716_IN10889_1f186345532a6cd914e1b1f5be857a65ef86bb9d.html
RL31457,reports/RL31457.json,761b695493453a64c1cc51ce7aab5feae662e05c,2018-07-13,files/20180713_RL31457_98bb1bffe9693885e012228b9b55a9947747881c.pdf,files/20180713_RL31457_99463929b7922a8c0faa47c5ac1bc9172987739a.html
...

The columns are:

  1. the report number (assigned by CRS)
  2. a path to a metadata JSON file for the report (see below for schema documentation)
  3. the SHA1 digest of the metadata JSON file (so you can know if it has changed)
  4. the most recent publication date of the report (changed when CRS publishes an update)
  5. a path to the most recent PDF for the report
  6. a path to the most recent HTML fragment for the report

The paths in the CSV file are relative to https://www.everycrsreport.com/.
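As a minimal sketch of reading the listing, the following parses CSV text with the column names shown above and resolves the relative paths against the site root. The function name parse_listing and the inline sample (copied from the excerpt above) are illustrative assumptions, not part of the site's own tooling.

```python
import csv
import io
from urllib.parse import urljoin

BASE_URL = "https://www.everycrsreport.com/"

# Header plus one row copied from the excerpt above; this row has no PDF.
SAMPLE = (
    "number,url,sha1,latestPubDate,latestPDF,latestHTML\n"
    "IN10929,reports/IN10929.json,4d3bdd9837cf7d359646dbc98b344786b00ad36b,"
    "2018-07-16,,files/20180716_IN10929_5f791dbabc8be4901fa6e06befa0904b033e2569.html\n"
)


def parse_listing(csv_text):
    """Return one dict per report, with path columns turned into absolute URLs."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in ("url", "latestPDF", "latestHTML"):
            # An empty cell means no file of that kind is listed.
            row[column] = urljoin(BASE_URL, row[column]) if row[column] else None
        rows.append(row)
    return rows


rows = parse_listing(SAMPLE)
print(rows[0]["number"], rows[0]["url"], rows[0]["latestPDF"])
```

To work with the live listing, the text of reports.csv would be fetched first and passed to the same function.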

Only the most recent PDF and HTML files for a report are listed in the CSV file. For previous versions of the report, parse the metadata JSON files to get the PDF and HTML filenames. Our Python script bulk-download.py demonstrates how to use this listing to download the complete archive.
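The sha1 column can be used to skip metadata files that have not changed since a previous download. The sketch below is not the bulk-download.py script; it is an illustration that assumes the digest is computed over the raw bytes of the JSON file as served. The helper names sha1_of_file and needs_download are hypothetical.

```python
import hashlib
import os


def sha1_of_file(path):
    """Return the hex SHA1 digest of a local file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path, expected_sha1):
    """True if the local copy is missing or its digest differs from the listing."""
    if not os.path.exists(local_path):
        return True
    # Assumption: the listing's digest covers the file's raw bytes.
    return sha1_of_file(local_path) != expected_sha1.lower()
```

A download loop could call needs_download with the local path and the sha1 value from each CSV row, and fetch the metadata JSON only when it returns True.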

The report URLs follow a predictable format:

report page: https://www.everycrsreport.com/reports/R44636.html
report metadata JSON: https://www.everycrsreport.com/reports/R44636.json
report thumbnail: https://www.everycrsreport.com/reports/R44636.png
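Given a report number, the three URLs above can be built directly. This is a small sketch; the function name report_urls and its dictionary keys are illustrative choices.

```python
BASE_URL = "https://www.everycrsreport.com/"


def report_urls(number):
    """Build the page, metadata, and thumbnail URLs for a report number."""
    stem = f"{BASE_URL}reports/{number}"
    return {
        "page": f"{stem}.html",
        "metadata": f"{stem}.json",
        "thumbnail": f"{stem}.png",
    }


for kind, url in report_urls("R44636").items():
    print(kind, url)
```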

Report PDFs and HTML fragment files are under https://www.everycrsreport.com/files. Because reports can have multiple versions, the file paths for the PDFs and HTML are not predictable, but they are listed in the CSV listing and in the metadata JSON files.

Metadata Schema

Each report has a metadata JSON file that describes the report.

Here’s an example:

{
  "id": "R41330",
  "type": "CRS Report",
  "source": "EveryCRSReport.com",
  "versions": [
    {
      "id": 393782,
      "source": "EveryCRSReport.com",
      "date": "2016-09-15T00:00:00",
      "title": "National Monuments and the Antiquities Act",
      "summary": "The Antiquities Act of 1906 authorizes ... from potential threats.",
      "formats": [
        {
          "format": "HTML",
          "filename": "files/20160915_R41330_67193b9ccc8e2a41640eb140040bc6a08a8275ab.html"
        },
        {
          "format": "PDF",
          "filename": "files/20160915_R41330_7d9b5190fc6ef4f898d2e7b667dd7c0531831a19.pdf"
        }
      ],
      "topics": [
        {
          "source": "IBCList",
          "name": "Federal Lands",
          "id": 314
        }
      ]
    },
    ...
  ],
  "topics": [
    "Appropriations"
  ]
}

Report Object

The top-level object represents a CRS report and contains two main fields:

{
  "id": "R41330",
  "versions": [
   ...
  ]
}

The id field is an identifier for the report assigned by CRS.

CRS may update a report. Each update is listed in versions as a Report Version Object (see below). Because the metadata may change with each update, all of the metadata for a report other than its number is stored in the version objects. The versions are in reverse-chronological order, so the first entry has the most recent metadata.
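Because the versions are in reverse-chronological order, the current title and date come from the first entry. A minimal sketch follows; the helper latest_version is a hypothetical name, and the second version in the sample is invented purely to show the ordering.

```python
import json

# Trimmed sample. The first version is from the example above; the second
# is hypothetical and only illustrates that older versions come later.
SAMPLE_METADATA = """
{
  "id": "R41330",
  "versions": [
    {"id": 393782, "date": "2016-09-15T00:00:00",
     "title": "National Monuments and the Antiquities Act"},
    {"id": 0, "date": "2010-01-01T00:00:00",
     "title": "Hypothetical earlier version"}
  ]
}
"""


def latest_version(report):
    """Return the most recent version object, or None if none are listed."""
    versions = report.get("versions", [])
    return versions[0] if versions else None


report = json.loads(SAMPLE_METADATA)
latest = latest_version(report)
print(report["id"], latest["date"], latest["title"])
```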

Report Version Objects

Each report version has several fields:

{
  "id": 393782,
  "source": "EveryCRSReport.com",
  "date": "2016-09-15T00:00:00",
  "title": "National Monuments and the Antiquities Act",
  "summary": "The Antiquities Act of 1906 authorizes ... from potential threats.",
  "formats": [
    ...
  ],
  "topics": [
    {
      "source": "IBCList",
      "name": "Federal Lands",
      "id": 314
    }
  ]
}

date is the publication date of the report (or updated report).
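The date values in the examples are ISO 8601 timestamps without a time zone, so they can be parsed with the standard library. This sketch assumes every version uses the same format as the example; version_date is a hypothetical helper name.

```python
from datetime import datetime


def version_date(version):
    """Parse a version's date string and return just the calendar date."""
    # Assumption: the value has the form "YYYY-MM-DDTHH:MM:SS", as in the example.
    return datetime.fromisoformat(version["date"]).date()


print(version_date({"date": "2016-09-15T00:00:00"}))  # 2016-09-15
```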

formats is a list of zero or more Report Version Format objects (see below).

Report Version Format Objects

Each report version is published in zero or more document formats. Each format looks like:

{
  "format": "HTML",
  "filename": "files/20160915_R41330_67193b9ccc8e2a41640eb140040bc6a08a8275ab.html"
}

The format is either HTML or PDF.

filename is a path relative to https://www.everycrsreport.com/ from which the file can be retrieved.
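To collect the files for every version of a report, including earlier ones that the CSV listing omits, the versions and their formats can be walked together. A minimal sketch, with version_files as a hypothetical helper name and sample data trimmed from the example above:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.everycrsreport.com/"


def version_files(report):
    """Yield (version date, format, absolute URL) for each file of each version."""
    for version in report.get("versions", []):
        # formats may be empty: a version can have zero document formats.
        for fmt in version.get("formats", []):
            yield version["date"], fmt["format"], urljoin(BASE_URL, fmt["filename"])


report = {
    "id": "R41330",
    "versions": [
        {
            "id": 393782,
            "date": "2016-09-15T00:00:00",
            "formats": [
                {"format": "HTML",
                 "filename": "files/20160915_R41330_67193b9ccc8e2a41640eb140040bc6a08a8275ab.html"},
                {"format": "PDF",
                 "filename": "files/20160915_R41330_7d9b5190fc6ef4f898d2e7b667dd7c0531831a19.pdf"},
            ],
        },
    ],
}

files = list(version_files(report))
for date, fmt, url in files:
    print(date, fmt, url)
```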

Usage Terms

CRS reports, as works of the United States Government, are not subject to copyright protection in the United States. Any CRS report, including its metadata, may be reproduced and distributed in its entirety without permission from CRS or from us. However, because a CRS report may include copyrighted images or material from a third party, you may need to obtain the permission of the copyright holder if you wish to copy or otherwise use that material.