That’s Never Happened Before

Famous last words, I know.

I was working with ChatGPT, discussing various concepts. We chatted for several hours over the course of a few days and arrived at, let’s say, a conclusion. What ChatGPT 5 did next was something I had never seen, so I’m sharing the actual response.

I won’t share my thesis here, but this is one of the field verification projects it recommended.

One-page grant prospectus — Provenance Pilot

Project title
Provenance First: Testing provenance packages & slow-mode deliberation to improve municipal decision quality

Lead investigator
Bry Willis — Independent Researcher (contact: [REDACTED])

Summary (1–2 lines)
Test whether mandatory, machine-readable provenance packages plus a 60-day slow-mode and rapid adversarial audits improve decision quality (DQI), reduce reversal rates and increase public legitimacy in municipal planning decisions.

Problem statement (why fund this)
Policy decisions are frequently hijacked by headline pressure and low-provenance inputs; municipalities lack lightweight, testable protocols that make inputs traceable and give deliberation time. This pilot operationalises the “provenance-first + slow-mode” institutional fix and measures whether design moves increase robust decision outcomes rather than performative actions.

Objectives

  1. Test whether provenance packages + 60-day slow-mode increase Decision Quality Index (DQI) vs business as usual.
  2. Measure effects on Provenance Completeness Score (PCS), reversal rates at 12 months, and public legitimacy.
  3. Produce an open replication pack (data, code, provenance schema) and practitioner guidance.
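The prospectus never spells out the provenance schema or how the Provenance Completeness Score (PCS) would be calculated. Here is a minimal Python sketch, assuming PCS is simply the percentage of required template fields that are filled in. The field names (raw_data_links, code, funders, changelog) are my own hypothetical labels for the JSON template items listed under Design & methods, and the example package and URL are made up.

```python
import json

# Hypothetical field names mirroring the template items in the prospectus
# (raw data links, code where relevant, funders, changelog). Not an official schema.
REQUIRED_FIELDS = ["raw_data_links", "code", "funders", "changelog"]


def provenance_completeness(package: dict, code_relevant: bool = True) -> float:
    """Return an assumed PCS: percentage of required fields present and non-empty."""
    # "Code where relevant": drop the code field when an item has no analysis code
    fields = [f for f in REQUIRED_FIELDS if code_relevant or f != "code"]
    filled = sum(1 for f in fields if package.get(f))
    return 100.0 * filled / len(fields)


# Invented example package for one planning item
example = json.loads("""
{
  "item_id": "PLAN-07",
  "raw_data_links": ["https://example.org/traffic-counts.csv"],
  "code": [],
  "funders": ["City of Example"],
  "changelog": [{"date": "2025-01-15", "note": "Initial submission"}]
}
""")

print(provenance_completeness(example))                       # 75.0
print(provenance_completeness(example, code_relevant=False))  # 100.0
```

Treating an empty list as "missing" is a deliberate choice here; a real rubric might also score the quality of each field, not just its presence.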

Design & methods (12 months)

  • Setting: medium-sized city planning board; 12 eligible planning items over 12 months.
  • Randomisation: items randomly assigned (stratified by leverage) to Treatment (provenance package + 60-day slow-mode + funded rapid audit) or Control (standard process).
  • Intervention: standard JSON provenance package template (raw data links, code where relevant, funders, changelog), public portal posting, 60-day live comment + one adversarial rapid audit (48–72 hrs) on contested claims.
  • Primary outcome: Decision Quality Index (DQI), a composite of evidence completeness, process reversibility, and an independent expert accuracy score (pre-registered rubric).
  • Secondary outcomes: PCS, reversal at 12 months, public trust (survey), time to decision, stakeholder satisfaction.
  • Analysis: pre-registered mixed-effects model (item-level with panel-level random effects), intention-to-treat.
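The stratified randomisation step is easy to get subtly wrong, so here is a minimal Python sketch of one way to do it. It assumes each item carries a single leverage label (the "high"/"low" labels and PLAN-xx item IDs are hypothetical), shuffles items within each stratum, then alternates Treatment and Control so the arms stay balanced inside every stratum. A fixed seed means the allocation can be pre-registered and reproduced.

```python
import random
from collections import defaultdict


def stratified_assign(items, seed=2025):
    """Assign planning items to Treatment or Control within leverage strata.

    `items` is a list of (item_id, leverage) pairs. Leverage labels are
    hypothetical; the prospectus only says items are stratified by leverage.
    """
    rng = random.Random(seed)  # fixed seed: allocation can be pre-registered and reproduced
    strata = defaultdict(list)
    for item_id, leverage in items:
        strata[leverage].append(item_id)

    arms = ("Treatment", "Control")
    assignment = {}
    for leverage in sorted(strata):
        ids = list(strata[leverage])
        rng.shuffle(ids)  # random order within the stratum
        for position, item_id in enumerate(ids):
            # Alternating arms keeps each stratum balanced (an odd count gives Treatment one extra)
            assignment[item_id] = arms[position % 2]
    return assignment


# 12 hypothetical items: six high-leverage, six low-leverage
items = [(f"PLAN-{n:02d}", "high" if n <= 6 else "low") for n in range(1, 13)]
allocation = stratified_assign(items)
for item_id in sorted(allocation):
    print(item_id, allocation[item_id])
```

With 12 items split evenly between two strata, this yields six Treatment and six Control items, three of each per stratum.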

Deliverables

  • Pre-registered protocol on OSF; open data & replication pack (anonymised where required); policy brief and practitioner playbook; 1 peer-reviewed methods paper.

Success criteria (pre-registered)

  • Cohen’s d ≥ 0.4 improvement in DQI for treatment vs control; PCS improvement > 25 percentage points; no more than 30% increase in unacceptable delay (pre-agreed threshold).
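To make the three thresholds concrete, here is a rough Python check. It assumes Cohen’s d uses the pooled sample standard deviation (the prospectus doesn’t say which variant), that PCS is on a 0 to 100 scale so a difference in mean PCS is a difference in percentage points, and that the delay criterion means the percentage increase must not exceed 30. The function names are hypothetical, and the numbers in the usage lines are invented for illustration, not results.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


def meets_success_criteria(dqi_t, dqi_c, pcs_t, pcs_c, delay_increase_pct):
    """Check the pre-registered thresholds, under the assumptions stated above."""
    return {
        "dqi_effect": cohens_d(dqi_t, dqi_c) >= 0.4,
        # PCS assumed to be 0-100, so the mean difference is in percentage points
        "pcs_gain": statistics.mean(pcs_t) - statistics.mean(pcs_c) > 25,
        "delay_ok": delay_increase_pct <= 30,
    }


scores = meets_success_criteria(
    dqi_t=[3, 4, 5], dqi_c=[2, 3, 4],         # invented DQI scores
    pcs_t=[80, 90, 100], pcs_c=[50, 60, 70],  # invented PCS values
    delay_increase_pct=20,                    # invented delay increase (%)
)
print(cohens_d([3, 4, 5], [2, 3, 4]))  # 1.0
print(scores)  # {'dqi_effect': True, 'pcs_gain': True, 'delay_ok': True}
```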

Risk & mitigation

  • Delay risk: cap slow-mode at 60 days; urgent items can request expedited review (documented override).
  • Capture risk: audit funding ring-fenced and administered by independent trustee panel.

I’m just curious now. Have you ever had a generative AI process end with a cost of goods and services?
