The $45 Gucci Heels

A few days ago I ran a test against my own product.

I pulled 17 real scan photos from Underpriced AI users. Thrift finds. Estate sale pickups. Garage sale hauls. I fed each photo through two different AI vision models with identical instructions and settings, then compared what came back.

One of the images was a pair of black pleated mule heels. Leather, stitched, unmistakable designer shape.

The cheap model looked at the image and said: "Black Pleated Mule Heels. Brand: Unknown. Estimated value: $45."

The more expensive model looked at the same image and said: "Gucci Black Pleated Ruched Crossover Strap Stiletto Heel Mule Sandal. Brand: Gucci. Estimated value: $550."

Think about what that means for the person holding the scanner in a thrift store. The cheap model tells them to walk past a $550 pair of shoes. The expensive one tells them to grab it, flip it on eBay, and make their whole weekend.

This wasn't an isolated bad result. It was one of seven cases out of sixteen where the cheap model failed in ways that would cost a real reseller real money. On another scan, the cheap model made up a brand name that doesn't exist: "JDEY" on a Spyder ski vest. On a Coach hobo bag it returned no brand at all. On a CB2 stoneware coffee mug it called the item a cast iron beer stein. On a Bunko dice game that the expensive model read cleanly off the box, the cheap model just shrugged and said "please add more details."

Forty-four percent of the branded scans in the test failed.

That's why every single scan you run on Underpriced AI today goes through Claude Opus 4.7. Anthropic's most capable vision model. The premium tier. No exceptions for low-cost plans, no downgrades for free users, no quiet routing of cheap-plan scans to cheap models. One model, the best one, every time.

Here's the whole story.

Why most scan apps use the cheap models (and won't tell you)

If you run a scan app, you pick an AI model. Nobody else sees the choice. The user just gets a result on their phone and assumes the app is doing its best.

The pricing on these models spans a huge range. You can pay a tenth of a cent per scan on Google's Gemini Flash or OpenAI's GPT-4o-mini. You can pay two or three cents on Claude Haiku. Or you can pay eight to thirteen cents on Claude Opus 4.7.

The cheap models are tempting for an obvious reason: if every scan costs you a fraction of a penny instead of a dime, you keep ten times more gross profit per scan. That lets you undercut on price, pour more into ads, or just pocket the savings. The user never sees which model you used. Most founders, if I'm being honest, pick the cheap option.

The problem is that the cheap models fail on the exact scans where accuracy matters most. Branded items. Maker's marks. Distinctive product lines. Vintage pieces with specific names. The things a reseller is actually trying to flip for profit, not just identify as "some kind of mug."

I know this because I tested it.

The test, in detail

I built a small benchmark script. It pulled 20 random scan photos from my own production database, stratified across four buckets: recent scans, high-confidence scans, low-confidence scans, and branded scans. Four images were unreachable because those users had deleted their accounts, so the final sample was 16.

For each image I ran the exact same scan pipeline twice. Same system prompt. Same instructions. Same user input. Same temperature. The only variable was which Claude model the image went to. No web search, no external knowledge hacks, nothing in the call that could distort the comparison.

The cheap model matched the expensive one on clean identifications about a quarter of the time. Another third of cases were close but lost detail that matters for eBay search: the cheap model got "vintage pin-up print with turtle" where the expensive one got "Earl Moran pin-up, 'Even a Turtle Gets Nowhere Until He Sticks His Neck Out.'" Big difference when a buyer is typing "Earl Moran" into the eBay search bar. The listing that names the artist gets the click.

And then there were the outright failures.

The Gucci heels. Cheap model: "Brand: Unknown, $45." Expensive model: "Brand: Gucci, $550."

A Coach hobo bag. Cheap model: "Leather Crossbody Shoulder Bag." No brand. Expensive model: "Coach Leather Hobo Shoulder Bag in Magenta/Berry, brand: Coach."

A Spyder ski vest. Cheap model invented a brand. It said "brand: JDEY," a brand that does not exist and has never existed. The real brand was clearly printed on the label.

A CB2 ceramic mug. Cheap model: "Cast Iron Coffee Mug / Beer Stein." Not only missed the CB2 branding but got the material completely wrong. Stoneware and cast iron aren't close to each other visually or functionally.

A Bunko dice game in its original box. Cheap model: "Unable to identify, please add more details." Expensive model: "Bunko Girls' Night Out Dice Game, $10." The text was on the packaging. The cheap model just wouldn't read it.

Seven of sixteen. Not edge cases. Not unusual items. These are the kinds of finds people pay scan apps to catch, and a cheap-model app would have walked them right past every one.

So I picked Opus

After the cheap model flunked, I ran the same benchmark against Claude Opus 4.7. Opus is the most expensive Claude model Anthropic sells. It costs roughly five to seven times more than Sonnet and about thirty times more than Haiku, measured per scan.

The results? Opus matched the baseline on most of the easy cases. No surprise. The baseline Sonnet model is already pretty good, and for common items like a Vans sneaker or a Pyrex bowl the two models hand back nearly identical answers.

Where Opus pulled ahead was the hard stuff. The CB2 mug that the cheaper model miscategorized as cast iron? Opus nailed it: "CB2 Black Matte Stoneware Mug, brand: CB2, confidence 0.88." A Holiday Barbie collector doll that Sonnet returned as "Collectible Doll in Plastic Packaging"? Opus got the Barbie brand attribution and the collector line. On a Coach handbag, Opus went further than Sonnet and named the specific "Colette" product line, which is exactly the kind of detail that pulls search traffic on eBay.

The wins aren't dramatic individually. Twelve percent of scans saw a meaningful quality improvement. Six percent saw a genuine rescue, cases where Sonnet was confidently wrong and Opus was confidently right.

But think about what six percent means at scale. If I'm running, say, a thousand scans a month, that's sixty scans per month where the difference between Opus and a cheaper model is the difference between a user walking away with an accurate find and a user walking away with a wrong answer that costs them money. Sixty moments of trust built or broken.

What it costs us

I'm going to be honest with you about the dollars, because you deserve to know where your subscription money goes.

Every scan on Underpriced AI costs us about eight to nine cents in real Anthropic API charges. That's with caching, with our own pipeline optimizations, with the market-research and enrichment calls we run in parallel to get real sold data. The cheap alternative would have been closer to one and a half cents per scan. Over a thousand scans, that's a difference of about sixty-five dollars a month that I'm not pocketing.

I'm fine with that.

Here's why. Accuracy is the entire job of a scan tool. A reseller who pays us nine bucks a month isn't paying for a pretty interface or a clever app icon or faster results. They're paying for one specific thing: tell me the truth about this item, fast, so I can decide whether to buy it. If we get that wrong on the items where it matters most, we've failed. And we've probably failed in a way that costs the user more money than the monthly subscription.

Paying the premium for Opus is our insurance policy against failing at our one job. Sixty-five bucks a month is cheap for that kind of insurance.

What this means for your scans

Next time you point your phone at a find on the shelf, know this: the model looking at your photo is the same one researchers, Fortune 500 companies, and well-funded AI teams pay premium rates for. We don't route low-tier plans to cheaper models to save money. We don't run the good stuff only on the top tier. We don't do A/B tests where half the users get a weaker identification. Every scan, every plan, every user, every time.

The decision is right there in our code, and I wrote the comment explaining it myself. If you ever want to verify, src/lib/claude.ts has a constant called DEFAULT_SCAN_MODEL set to "claude-opus-4-7". I committed it on April 11, 2026, after staring at the benchmark results for an hour and deciding I couldn't live with the cheaper option.

So the next time you're in a thrift store holding a pair of shoes that could be a Gucci or could be a knockoff, or a Coach bag that could be worth $50 or $500 depending on the line, or a ceramic mug that could be a generic piece or a CB2 designer find, the answer you get on your phone isn't a compromise. It's the best answer the best AI in the world can give you, right there in the aisle.

That's what you're paying for. That's what we're delivering.

Ready to try the boutique scanner? Scan your next thrift find for just 99¢ as a single-scan pack, no subscription required. You'll see exactly the kind of brand-level detail Opus catches on your own items. If you flip often, the $5/month Starter plan gets you 20 scans for less than one bad buy at Goodwill. Get started.

Frank Kratzer is the solo founder of Underpriced AI. He writes the code, answers the support emails, and still gets excited every time a user tells him they caught a $200 flip at Goodwill.

The $45 Gucci Heels: Why We Use Claude Opus on Every Scan