A client sends me an invoice screenshot, and I spend another half hour squinting at the screen and typing

Last Wednesday at 2 PM, I was rushing a project wrap-up at a coffee shop. A client sent me five invoice screenshots on WeChat and asked me to log the amounts and dates. I was typing the numbers in by hand while on a Zoom call, and by the third screenshot I realized I had added an extra zero and thrown off the whole sheet. At that moment, I really wanted to shut my laptop and walk away. If clients also keep sending you screenshots (surveys, handwritten feedback, event posters) and you have to copy the info out one item at a time, I totally get that eye-straining feeling.

What is GLM-5V-Turbo, and who is using it

GLM-5V-Turbo is a multimodal large model recently released by Zhipu. Simply put, it can "look at images and talk": give it an image, and it reads the text, the tables, and even the layout relationships inside, then outputs the structured information you asked for. My e-commerce friend Lao Zhou started using it last month to batch-extract info from competitors' event posters: start date, discount amount, applicable categories. Checking them one by one used to take him a whole evening; now it takes half an hour. Zhipu's researchers built this model so that AI can do more than chat: the goal is for it to "understand" interfaces the way a human does and act as a real digital assistant. Not everyone needs this tool, though. If you rarely deal with information locked inside images, there's no pressure to try it now.

What it costs you to try this today

I got this wrong at first. I thought I had to deploy the model myself with code, struggled for two days without getting it to run, and only then realized you can simply upload images and ask questions in Zhipu's online chat interface.

Money: Free tier is enough for daily use; beyond that, it's pay-per-image, just a few cents each.

Time: 15 minutes for your first try, including registration and testing one image.

Tech barrier: Just know how to take a screenshot and type; no coding jargon needed.

First step: Open chatglm.cn (Zhipu's chat page), register an account, click the "Image" button next to the chat box, upload your screenshot, and type "List all amounts and dates in this image as a table."

Advice by stage

If you're just starting out and don't have many clients: Just use the free tier to occasionally process screenshots. Don't rush to figure out API integration; spending time finding clients is more valuable.

If you have 1-2 steady clients: I'd suggest turning your common extraction needs into fixed prompt templates, like "Extract invoice amount, date, and issuer." Save them in your notes and copy-paste them each time to save a few more minutes.
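If you'd rather keep those templates somewhere more structured than a notes app, here is a minimal sketch in Python. Everything in it is my own illustration: the template names, the exact wording, and the `build_prompt` helper are all hypothetical, so adjust them to whatever your clients actually send.

```python
# Hypothetical prompt templates; the keys and wording are examples only,
# not anything prescribed by Zhipu.
PROMPT_TEMPLATES = {
    "invoice": (
        "Extract the invoice amount, date, and issuer from this image. "
        "Return one table row per invoice."
    ),
    "poster": (
        "Extract the start date, discount amount, and applicable categories "
        "from this event poster as a table."
    ),
}


def build_prompt(kind: str, extra: str = "") -> str:
    """Return the saved template for `kind`, optionally followed by one extra instruction."""
    if kind not in PROMPT_TEMPLATES:
        raise KeyError(f"No template named {kind!r}; known: {sorted(PROMPT_TEMPLATES)}")
    prompt = PROMPT_TEMPLATES[kind]
    return f"{prompt} {extra}".strip() if extra else prompt


if __name__ == "__main__":
    # Print the prompt, then paste it into the chat box next to the uploaded image.
    print(build_prompt("invoice", "Write dates as YYYY-MM-DD."))
```

The point is only that each template lives in one place, so a wording fix you make once applies every time you paste it.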

If you're scaling up and processing tons of images every month: consider asking a tech-savvy friend to wire Zhipu's API (the way programs call the model automatically, with no manual image uploading) into your workflow for automatic batch extraction. This isn't mandatory, though; uploading images by hand works perfectly fine, so don't put extra pressure on yourself.
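For that tech-savvy friend, the batch part might look roughly like the sketch below. It is deliberately not tied to Zhipu's real API: `call_model` is a hypothetical function your friend would write against whatever the official documentation specifies, and sending the image as a base64 string is my assumption about how it would be passed along. The sketch only shows the loop around it, including keeping going when one image fails.

```python
import base64
from pathlib import Path
from typing import Callable, Dict, Iterable


def encode_image(path: Path) -> str:
    """Read an image file and return its bytes as a base64 string."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def batch_extract(
    paths: Iterable[Path],
    prompt: str,
    call_model: Callable[[str, str], str],
) -> Dict[str, str]:
    """Run the same prompt over every image and collect the replies by file name.

    `call_model(image_b64, prompt)` is a placeholder (an assumption, not a real
    SDK function) for the code that actually talks to the model.
    """
    results: Dict[str, str] = {}
    for path in paths:
        try:
            results[path.name] = call_model(encode_image(path), prompt)
        except Exception as exc:
            # One bad image should not sink the whole batch; record it and move on.
            results[path.name] = f"ERROR: {exc}"
    return results
```

Usage would be something like `batch_extract(sorted(Path("screenshots").glob("*.png")), "List all amounts and dates in this image as a table.", call_model)`, where the `screenshots` folder is also just an example. Whatever comes back still deserves a quick look by eye, because an extra zero is just as costly when a program types it.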