The failure of OCRbot

Update: A lot of people have left me comments telling me that OCRbot really has helped them, and that they wouldn’t have posted or boosted certain content without it. I didn’t realise that when I wrote this, and I’m really happy to hear that OCRbot does actually have a purpose. I’ll keep it running. ❤️

The idea

OCRbot is a Fediverse bot that uses tesseract to caption images. When mentioned, it will download the provided image (or the one from the post you’re replying to) and reply with an automatically generated caption.
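The mention-handling flow can be sketched roughly like this. Everything here is hypothetical: the function name and the status structure are loosely modelled on a Mastodon-style API, not OCRbot's actual code:

```python
# Hypothetical sketch of OCRbot's mention-handling flow, not its real code.
# A "status" here is a simplified dict loosely modelled on the Mastodon API.

def find_image(status, parent=None):
    """Return the URL of the image to caption: the mentioning post's own
    attachment if it has one, otherwise the attachment of the post being
    replied to."""
    for source in (status, parent):
        if source is None:
            continue
        for attachment in source.get("media_attachments", []):
            if attachment.get("type") == "image":
                return attachment.get("url")
    return None  # nothing to caption

# Example: the mention itself has no image, so fall back to the parent post.
mention = {"media_attachments": []}
parent = {"media_attachments": [{"type": "image",
                                 "url": "https://example.com/shot.png"}]}
print(find_image(mention, parent))  # https://example.com/shot.png
```

The real bot would then download that URL, run it through tesseract, and post the result as a reply.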

While it works reasonably well for captioning screenshots, its actual goal was to remind people of the importance of image captioning. The idea was that people would post images without a caption, and people who needed a caption would link OCRbot to the post. The person using OCRbot would receive the transcription they needed, and the person posting the image would be reminded that people need transcriptions, and would hopefully decide to caption their images in future, especially since OCRbot often makes mistakes that a human captioner wouldn't.

Of course, this isn’t what happened at all.

A post from the OCRbot account noting that it’s far from perfect, and that you should really caption images yourself.

The mistake

The mistake was providing people with a convenient method to automatically generate image captions over the Fediverse. This meant that you could just post an image and tag OCRbot and have it do the work for you. Free accessibility with none of the work.

Of course, OCRbot isn’t “free accessibility”. A real caption is read aloud by a screen reader in place of the image; an OCRbot reply is a separate post that has to be manually scrolled to and opened to get the transcription. There’s another problem, too: OCRbot frequently performs terribly at captioning images.

OCRbot captioning the Discord friends bar as “Coes Col oe”

This is partly an issue with the underlying software (optical character recognition is incredibly difficult), and partly due to OCRbot doing no image processing at all before sending the image to tesseract. tesseract is particularly bad at handling light text on dark backgrounds, as seen above, which is a problem because screenshots from Twitter, Discord, Mastodon, etc. are often taken using a dark theme.
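One cheap mitigation, sketched below as a suggestion rather than anything OCRbot actually does, is to detect dark-theme screenshots and invert them before OCR, so tesseract always sees dark text on a light background. This assumes an 8-bit grayscale image flattened to a list of pixel values:

```python
# Sketch of a dark-theme mitigation, assuming 8-bit grayscale pixel values
# (0 = black, 255 = white). OCRbot itself does no such preprocessing.

def normalise_for_ocr(pixels):
    """If the image is mostly dark (a dark-theme screenshot), invert it so
    the text ends up dark on a light background, which tesseract handles
    much better."""
    mean = sum(pixels) / len(pixels)
    if mean < 128:  # mostly dark: likely light text on a dark background
        return [255 - p for p in pixels]
    return pixels

# A toy "dark theme" image: dark background with a few bright text pixels.
dark_theme = [30] * 90 + [230] * 10
light = normalise_for_ocr(dark_theme)
print(sum(light) / len(light) > 128)  # True: the background is now light
```

A mean-brightness threshold is a crude heuristic, but even this would have helped with screenshots like the Discord one above.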

The outcome

The end result is that I’ve given people a way to think they’re improving the accessibility of their posts by using a bot that does a terrible job and hides the caption in a separate post behind a content warning. Considering that my goal was to make the Fediverse a little more accessible, I wouldn’t just say I’ve failed, I’d say I’ve achieved the exact opposite. The mere existence of OCRbot encourages people not to bother captioning their images, when it should be a reminder that people need captions.

141 commits later, OCRbot is still terrible. There’s a known issue that prevents it from working at all on some posts, and I still haven’t been able to motivate myself to fix it. The issue has been open since the 26th of February, and it will likely stay that way for a lot longer.

People have thanked me for OCRbot, but nobody who genuinely needs image captions has said anything. Many people believe it’s helpful and like the idea, but so far I don’t think any actual help has come of it.

I made a post lamenting the failure of OCRbot in March this year, less than a month after OCRbot went online. I mentioned that I should have made a video or something similar, to communicate the importance of accessibility without providing a tool that ineffectively automates it. OCRbot should never have existed.