87,460 Cases and a Gap Disclosed: How Project Hermes Expanded Its NUFORC Corpus Without Hiding What's Missing
Project Hermes v0.7.0 swapped in a more complete NUFORC dataset (+7,822 archival reports), tightened provenance on every archived case, and did something almost nobody in UAP data does: publicly disclosed the 2014-to-present gap rather than letting it be inferred.
A corpus is a claim. The size of a dataset, the dates it spans, the source it came from — these are not neutral metadata. They are assertions about what you looked at, and by extension, what you did not. Most UAP databases make these assertions badly, or not at all. Project Hermes just made its assertions better, and the most important thing it did was admit what is still missing.
The upgrade: from "scrubbed" to "complete"
Hermes previously ingested the ufo-scrubbed-geocoded-time-standardized.csv file — the "scrubbed" version of the NUFORC archive, which drops rows that fail certain quality filters. That filter makes sense if you are studying the phenomenon and want only high-signal reports. It makes less sense if you are studying reporting behavior, where every report — including the terse, the speculative, and the fragmentary — is a datapoint about when and why people chose to report.
v0.7.0 switches to the ufo-complete-geocoded-time-standardized.csv file (via the truthiswill/ufo-reports fork, 2019-09-17 snapshot), retaining 87,458 rows after lat/lon validation. Net gain: 7,822 reports, all within the existing 1906-11-11 to 2014-05-08 date range. Same time period, more density.
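To make "lat/lon validation" concrete, here is a minimal sketch of what such a filter might look like. The column names (latitude, longitude), the function names, and the validation rule itself (parseable, finite, in range) are assumptions for illustration; Hermes's actual import criteria are not published in this post, and the sample rows are made up.

```python
import csv
import io
import math


def has_valid_coords(row, lat_key="latitude", lon_key="longitude"):
    """Return True if the row has a parseable, finite, in-range lat/lon pair.

    The column names are assumptions; adjust them to the real CSV header.
    """
    try:
        lat = float(row[lat_key])
        lon = float(row[lon_key])
    except (KeyError, TypeError, ValueError):
        # Missing column, empty cell, or non-numeric text.
        return False
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def filter_rows(csv_text):
    """Keep only the rows whose coordinates pass has_valid_coords."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if has_valid_coords(row)]


# Synthetic rows for illustration only; these are not real NUFORC records.
SAMPLE = """datetime,latitude,longitude
2000-01-01 00:00,10.5,-20.25
2000-01-02 00:00,not-a-number,-80.0
2000-01-03 00:00,95.0,10.0
"""

kept = filter_rows(SAMPLE)
print(f"kept {len(kept)} of 3 rows")
```

Counting the rows before and after a filter like this is what makes a figure such as "87,458 rows retained" reproducible by someone else.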
Provenance, upgraded
Every archival case now carries an archive_provenance field that names the exact fork and file it came from: planetsig/ufo-reports → truthiswill/ufo-reports fork (2019-09-17 snapshot); ufo-complete-geocoded-time-standardized.csv. The /corpus manifest's snapshot_date now reflects both the fork's commit date and the date Hermes imported it. If you cite a case, a reviewer can walk the chain all the way back to the CSV row it originated from. That is how research-grade data is supposed to behave.
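As a rough illustration of how a provenance string like the one above could be attached to each case, consider the sketch below. Only the archive_provenance field name and the string itself come from Hermes; the ArchiveSource class, the attach_provenance helper, and the source_row field are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveSource:
    """Describes where an archival file came from (hypothetical structure)."""
    upstream: str   # original repository
    fork: str       # fork that was actually downloaded
    snapshot: str   # ISO date of the fork snapshot
    filename: str   # exact file ingested

    def provenance(self) -> str:
        # Mirrors the format quoted in the article.
        return (
            f"{self.upstream} → {self.fork} fork "
            f"({self.snapshot} snapshot); {self.filename}"
        )


def attach_provenance(case: dict, source: ArchiveSource, row_number: int) -> dict:
    """Return a copy of the case with provenance fields added.

    source_row is an assumed field name for the CSV row the case came from.
    """
    return {
        **case,
        "archive_provenance": source.provenance(),
        "source_row": row_number,
    }


SOURCE = ArchiveSource(
    upstream="planetsig/ufo-reports",
    fork="truthiswill/ufo-reports",
    snapshot="2019-09-17",
    filename="ufo-complete-geocoded-time-standardized.csv",
)

print(SOURCE.provenance())
```

Storing the row number next to the file name is one way to let a reviewer go from a cited case straight to the line it came from.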
The gap that got disclosed
The NUFORC public mirrors end on 2014-05-08. There is no public GitHub copy of the 2014-to-2026 records, and nuforc.org's live /databank/ is served through the wpDataTables plugin, which gates bulk export. That means there is a roughly twelve-year gap in the archival corpus that Hermes cannot legally or practically close from the public side.
Most projects in this position would quietly let date_range do the talking and hope nobody noticed. Hermes did the opposite: the /corpus notes field now explicitly discloses the 2014-05-09 to 2026-04-04 gap, in plain language, as part of the published manifest.
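A disclosure like this can be derived from the coverage dates rather than written by hand. The sketch below shows one possible approach; the describe_gap function, the dictionary layout, and the wording of the note are assumptions, not the actual contents of the Hermes /corpus manifest.

```python
from datetime import date, timedelta


def describe_gap(coverage_end: date, as_of: date) -> dict:
    """Build a plain-language gap disclosure from two dates.

    coverage_end is the last date with archival reports; as_of is the date
    the manifest was published. The output structure is illustrative.
    """
    gap_start = coverage_end + timedelta(days=1)
    if gap_start > as_of:
        return {"gap": None, "notes": "No known gap after the last archived report."}

    days = (as_of - gap_start).days
    years = round(days / 365.25, 1)
    notes = (
        f"Archival coverage ends {coverage_end.isoformat()}. "
        f"No archival reports from {gap_start.isoformat()} to "
        f"{as_of.isoformat()} ({days} days, about {years} years) are included."
    )
    return {
        "gap": {
            "start": gap_start.isoformat(),
            "end": as_of.isoformat(),
            "days": days,
        },
        "notes": notes,
    }


# Dates taken from the article: coverage ends 2014-05-08, gap disclosed through 2026-04-04.
disclosure = describe_gap(date(2014, 5, 8), date(2026, 4, 4))
print(disclosure["notes"])
```

Generating the note from the dates keeps the disclosure from drifting out of sync with the corpus when either one changes.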
This is a small thing and a large thing at once. It is small because it is one field in one file. It is large because it inverts the default norm of UAP databases, which is to advertise coverage and let the user discover limits by accident. A dataset whose holes are on the marketing page is more useful than a dataset whose holes are only findable by running your own audits.
The integrity hash changed, as expected
When the corpus changes, the integrity hash has to change. Hermes's global corpus hash moved from 264F1972A076011A (v0.6.0) to A13E1408841A3CF7 (v0.7.0). Any v0.6.0 audit_hash values from earlier analyses remain valid against the v0.6.0 corpus state (that's the point of versioning the methodology independently of the software), but re-auditing against v0.7.0 requires recomputation. Nothing is silently mutated. Nothing is retroactively changed.
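The post does not say how Hermes computes its corpus hash, so the following is only a sketch of the general idea: hash a canonical serialization of every row, so that any added, removed, or altered row produces a different value. Using SHA-256 truncated to 16 uppercase hex characters is an assumption chosen to match the length of the hashes quoted above, and the function names and audit record layout are hypothetical.

```python
import hashlib
import json


def corpus_hash(rows) -> str:
    """Hash a corpus in a way that ignores row order but not row content.

    Assumption: SHA-256 over sorted canonical JSON lines, truncated to
    16 uppercase hex characters. This is not Hermes's documented algorithm.
    """
    lines = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows
    )
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()[:16].upper()


def make_audit(corpus_version: str, rows) -> dict:
    """Record which corpus state an analysis was run against."""
    return {"corpus_version": corpus_version, "corpus_hash": corpus_hash(rows)}


def audit_matches(audit: dict, rows) -> bool:
    """True only if the rows are the same corpus state the audit recorded."""
    return audit["corpus_hash"] == corpus_hash(rows)


# Synthetic stand-ins for two corpus versions.
rows_old = [{"id": 1, "shape": "light"}, {"id": 2, "shape": "disk"}]
rows_new = rows_old + [{"id": 3, "shape": "triangle"}]

audit_old = make_audit("old", rows_old)
print(audit_matches(audit_old, rows_old), audit_matches(audit_old, rows_new))
```

An audit made against the old rows still verifies against the old rows and fails against the new ones, which is the behavior described above: earlier results stay checkable, and nothing carries over silently.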
Why it matters for the study of UAP
The 7,822 new reports are not the headline. The headline is that a UAP project treated its corpus like a scientific artifact: a specific file, from a specific fork, imported on a specific date, with specific known limits. The field desperately needs more of this. The single biggest reason UAP data has been hard to take seriously is not the sightings themselves. It is that researchers have historically handled their datasets the way hobbyists handle collections: additively, without version control, without disclosure of gaps, without a chain of custody that a skeptic could audit. v0.7.0 is Hermes taking the opposite stance, publicly and on the record.
Browse the current corpus manifest at projecthermes.tech/corpus.
Project Hermes and UFO Index are affiliated projects.