Author’s note: this post was drafted by Claude (Anthropic) from my project notes and source code, then reviewed and edited by me before publishing. The voice and judgments are mine; the typing isn’t.
The mess
The trigger was a specific moment: I was building my website, wanted photos of my grandfather (David Roy Simmons, ethnologist, 1930-2015), and after twenty minutes of clicking around realised the photos I needed had been sitting inside Windows Mail’s attachment cache the whole time. The full path, for the record:
C:\Users\Luke Simmons\AppData\Local\Packages\
microsoft.windowscommunicationsapps_8wekyb3d8bbwe\
LocalState\Files\S0\4\Attachments\
774 files, 227 MB, completely invisible to normal browsing. That was the prompt to audit the rest. A few highlights:
C:\Users\Luke Simmons\Downloads: 5,680 files, 3.47 GB — mostly.jarand.classdebris from a university Minecraft modding project.D:\Downloads: 929 files, 19.40 GB, most under 30 days old — the actual active Downloads folder.D:\DocumentsandD:\Hard Drive - SONY: near-identical extension profiles. One was almost certainly a forgotten backup of the other.
In total, roughly 7,500 files in active staging locations plus ~2,700 photo candidates to consolidate. None of it would fit in my head, which is why I’d been ignoring it.
The two-pronged tool choice
I installed Everything (Voidtools) for instant filename search. That solved half the problem — the Mail-cache discovery would have taken thirty seconds with Everything installed, instead of a year of low-grade frustration.
For the actual sorting I picked organize — Python, MIT-licensed, YAML-configured. The rules live in config\config.yaml and organize sim runs the whole pipeline in dry-run mode without touching disk. The rules themselves are boring (e.g. D:\Downloads images → D:\Photos\Inbox\{created.year}-{created.month}\). Boring is the point. The cleverness lives in everything around the rules engine.
The safety design — this is the writeup
This isn’t really “a file organiser.” It’s a set of safety gates wrapped around a file organiser. Every design decision was about not eating my data.
Rule 1: never delete. Move, don’t rm. When the destination is unclear, the rule moves the file to D:\Quarantine\YYYY-MM-DD\ and leaves it there for a week. I have not yet found a “delete this file” rule worth writing.
Rule 2: dry-run first, every time, no exceptions. organize sim walks every rule against the real filesystem and prints what would move. The hard rule: no rule runs for real until it’s had a full successful dry-run pass with output I’ve actually read. This caught at least one bug per rule on average.
Rule 3: every operation writes a JSONL log line. I wrote a custom organize action (actions/log_move.py, ~80 lines) that wraps shutil.move and shutil.copy2 with structured logging. Every move writes one line to logs\moves.jsonl:
{"ts": "...", "rule_id": "downloads-images", "reason": "...",
"mode": "move", "src": "...", "dest": "...", "size": 123456}
The log is the audit trail and the rollback trail — if a rule goes wrong I can scan the JSONL, find every file the bad rule touched, and reverse those specific moves.
Rule 4: copy-only mode for irreversible sources. The Mail attachment cache is the canonical case. Windows Mail may still need the originals, so the rule that pulls from microsoft.windowscommunicationsapps_8wekyb3d8bbwe\LocalState\Files\ runs in mode="copy", never mode="move". The log_move helper also short-circuits if a same-name same-size file already exists at the destination.
Rule 5: off-limits directories are explicit. The config has a fixed exclusion list — D:\SteamLibrary, the Hyper-V VM, D:\ai-website-manager (this website), plus the usual .venv\, .git\, node_modules\. The rules engine is not allowed to make decisions about my active project folders.
digiKam, for the photo half
I didn’t build my own photo deduplicator. digiKam already does it better than I would have, so I used digiKam. The split worked out clean: organize handles the ongoing flow (new images land in dated inbox folders), digiKam handles the one-time consolidation pass and perceptual-hash dedup.
What broke
A compressed-files rule matched too broadly on its first dry-run — it would have swept up .tar and .gz files inside a Linux dotfile backup I’d forgotten about. The dry-run output was the only reason I caught it; one-line fix with max_depth: 0.
The Mail-attachments copy rule, on its first real run, created a duplicate loop — the JSONL log showed the same file being copied as (1), (2), (3) on consecutive scheduled runs. Fixed with the same-name-same-size check in log_move. The dry-run hadn’t caught it because the duplicates only appeared after a real run created the first copy. Honest lesson: dry-runs catch the rules-that-should-not-fire problem; the JSONL log is what made the state-after-the-rule-fires problem visible.
Where it sits now
The pipeline runs hourly via a Windows Task Scheduler job that invokes scripts\run_organize.ps1. Every run appends to logs\organize-runs.log and, for any actual moves, logs\moves.jsonl. Restic backs the lot up nightly.
The principle
Most file-automation tools fail one basic test: they assume the user trusts them. They shouldn’t. Filesystem automation has the same risk profile as a database migration — silent corruption is more dangerous than a loud failure, the consequences are durable, and “undo” is rarely free. The right default is the opposite of what most tools ship with: every action reversible, every action audited, every rule dry-runnable.
The reason this project works for me isn’t the YAML rules. It’s that there is no path through the system that touches my filesystem without leaving a JSONL receipt and without having first been shown to me in simulation. That’s the test I’d want any future automation I build to pass.
— Luke Simmons, Auckland