
A First With Claude Code

I’ve been downloading public-domain books for a project, and many of them are kind of bad OCR scans.

A lot of the work I was doing was in Claude Code. I created a skill where it would look online for the text I’m seeking and download it into a sources folder with a certain naming convention.

At one point, I asked, “What options do I have to clean up files downloaded from Project Gutenberg OCR scans into nicely formatted markdown files? Ideally, tools I can download that won’t burn through tokens,” and I was surprised by its suggestion.

It actually recommended I use Ollama (a local LLM runtime) to go through and clean up the text files. I’m not sure if it suggested it because it could see it was already installed, or if it would have suggested it either way, but this is the first time it did that.

It ended up writing a Python script to work with Ollama and a bash script to bootstrap the process. Fourteen hours later, I had three massive OCR scans cleaned up, and the cleanup didn’t use any of my Claude Code quota. It recommended the “qwen2.5:14b” model, which seemed to do a good job and ran fine on my M4 Mac Mini with 24 GB of RAM.
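For a sense of what a script like that can look like, here is a minimal sketch. It is not the script Claude Code actually wrote. It assumes Ollama is running locally on its default port and that the model has already been pulled. The prompt wording, the 4,000-character chunk size, and the function names are all placeholders made up for illustration.

```python
import json
import urllib.request
from pathlib import Path

# Ollama's default local endpoint; adjust if your install listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5:14b"  # the model named in the post

# Hypothetical prompt; the wording is an assumption, not the original.
PROMPT = (
    "Clean up this OCR text and return Markdown. Fix broken line wraps, "
    "hyphenation, and obvious scan errors. Do not summarize or add anything.\n\n"
)


def split_into_chunks(text, max_chars=4000):
    """Group paragraphs into chunks of roughly max_chars characters each."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_request(chunk):
    """Build a non-streaming request to Ollama's generate endpoint."""
    body = json.dumps({"model": MODEL, "prompt": PROMPT + chunk, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def clean_chunk(chunk):
    """Send one chunk to the local model and return its cleaned text."""
    with urllib.request.urlopen(build_request(chunk)) as resp:
        return json.loads(resp.read())["response"]


def clean_file(src, dst):
    """Clean a whole text file chunk by chunk and write the result as Markdown."""
    text = Path(src).read_text(encoding="utf-8")
    cleaned = [clean_chunk(c) for c in split_into_chunks(text)]
    Path(dst).write_text("\n\n".join(cleaned), encoding="utf-8")
```

Calling something like `clean_file("sources/book.txt", "cleaned/book.md")` (both paths hypothetical) would process one book. Splitting on paragraph breaks keeps each request small enough for a local model's context window, which is probably also why a full run takes hours: every chunk is its own round trip to the model.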

