All That Glisters Is Not Gold; Nor Every Negative the AI's Told

Have you ever had a request like: “Compare Q4 sales this year vs. last year, but only for stores that were open in both periods, and exclude one-off promotions”? Or perhaps: “Show me all customers who bought Product A in the last 3 months but not Product B, segmented by region, and sorted by lifetime value”?

Questions like these are not too tough to answer, nowadays at least.

However, back in the 1980s and 1990s, data warehouses were the big new “shiny toy,” and the promises made for them often exceeded what they could actually deliver. In the heady excitement of growing computing power, with businesses looking for an edge on their competitors and data warehouse builders trying to make their product stand out from the other early-to-market products, the ecosystem was ripe for disappointment.

“Free your valuable business data from isolated operational systems and bring it together in one place for analysis,” promised the data warehouse builders.

Yet the gap between promise and reality emerged for reasons that, in hindsight, seem almost predictable:

The wrong tool for the job: the databases of the time were poorly suited to the massive, complex analytical queries a data warehouse required.
Garbage in, garbage out: the information many companies held was just too messy and inconsistent for data warehouse analysis.
Lack of a problem to solve: massive data warehouses were created without major identified business needs.
Incompatible database schemas: in practice, the schemas of the source databases often did not line up with one another.
Time trend issues: one database might use calendar year, another fiscal year.
Exploration was hard: data warehouses failed to magically unearth business insights; they only returned value when users already knew the right questions to ask, and had reason to believe the underlying data was clean enough to answer them.

Industry analysts at the time estimated that somewhere between 50% and 70% of such data warehouse projects failed to provide real value. Okay, those figures were often produced by consulting firms with their own interest in selling remediation services. What is beyond dispute is that these were enormous financial undertakings, that failures occurred, and that the failures weren’t just inconvenient: they sometimes led to bankruptcies, canceled services, and destroyed careers.

Nowadays, even small businesses have the capability to process the sorts of queries that once required multi-million-dollar infrastructure. From approximately 2011 to the present, various innovations have combined to make such data querying cheap and easy.

The relevance of the data warehouse era to today is probably not lost on business owners who want to use AI to optimize their marketing but fear the risks of doing so. From automated funnel building to predictive audiences to specialized ad copy, possibilities abound. Yet with every new AI tool or feature whose marketing promises vast time and money savings, there can be hesitation: a fear that not all that is promised is as it seems.

And this is where the story becomes relevant to us.

StatBid is a company investing in AI innovation and trialing different processes and solutions, while also keeping a strict human eye on the quality of results. Informed by our vast e-commerce Google Ads experience, we’re carefully navigating the space between AI hype and business process obsolescence.

In April 2026 we became aware of a social media post in which a marketing agency claimed to have used AI to quickly rate the relevance of search terms to a client’s business, which in turn let them identify negative keywords far faster. They also claimed a 20% lower cost per action, though they didn’t specify their methodology for this claim.

In retrospect, their statement that a single account had required multiple hours of manual negation checks every week could itself have hinted at the state of their negation process before AI.

Possibly their client’s account was new, and consisted of PMax and/or broad-match campaigns that were still ‘finding their feet’? Or perhaps their Google Ads accounts were not well structured. Or maybe they didn’t evaluate the AI’s suggestions properly.

Because with a mature Google Ads account that has well-structured campaigns, attention from an experienced human, and regular Ngram checking, negation checks are rarely required weekly.
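For readers unfamiliar with Ngram checking, the idea is to split search terms into words (or short word sequences) and add up performance per word, so that a word wasting money across many different search terms becomes visible. Below is a minimal Python sketch of that idea. The function name `ngram_stats`, the row layout, and the sample rows are hypothetical and invented purely for illustration; this is not our production tooling.

```python
from collections import defaultdict


def ngram_stats(rows, n=1):
    """Aggregate cost and conversions per n-gram across search terms.

    rows: iterable of (search_term, cost, conversions) tuples (assumed layout).
    """
    stats = defaultdict(lambda: {"cost": 0.0, "conversions": 0.0, "terms": 0})
    for text, cost, conversions in rows:
        tokens = text.lower().split()
        # Use a set so each n-gram is counted once per search term.
        grams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for gram in grams:
            entry = stats[gram]
            entry["cost"] += cost
            entry["conversions"] += conversions
            entry["terms"] += 1
    return dict(stats)


# Invented sample data: (search term, cost, conversions).
rows = [
    ("free desk plans", 4.0, 0),
    ("free desk clipart", 3.0, 0),
    ("standing desk converter", 10.0, 2),
]

stats = ngram_stats(rows, n=1)

# Words that appear in at least two search terms and never converted.
wasteful = sorted(
    gram for gram, s in stats.items()
    if s["conversions"] == 0 and s["terms"] >= 2
)
print(wasteful)
```

In this toy data the word "free" shows up in two search terms with no conversions, which is the kind of pattern a human reviewer would then consider negating.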

Still, we had already been dabbling with AI automation to identify exact match negative keywords, and at first sight it seemed promising. The agency’s claims pushed us to focus more time on this area of research.

We diligently explored the results from our own early “search term relevance to business” prompt on a client that sold products in a specific office equipment sub-niche.

Generally, the results looked good, but there were inconsistencies.

Clicks from “Office Depot” themed search terms performed reasonably well, even though Office Depot is not our client. Our AI results, however, flagged such terms as ‘low relevance’ because competitor terms don’t generally perform well. That is often true, but in this specific case they did.

So, the prompt was amended to take into consideration whether a search term had actually achieved a reasonable ROAS, or even any ROAS at all.

The AI immediately downgraded the “potential for negation” forecast for search terms that had performed strongly and achieved ROAS.

However, a gap remained. If a term was similar to a well-converting term but had not itself gained conversions, the prompt still considered it a good prospect for negation, simply because it did not semantically match “good relevance” and had brought in no revenue itself. For example, if 40 slightly differently worded “Office Depot” search terms had been served ads during the period of examination, and only 5 of them converted, but at very good ROAS, the other 35 were still treated as negation prospects.

So we added an extra condition: is this search term semantically similar to a term that gained revenue?

Then we evolved the prompt to understand whether a term was semantically close to a strong ROAS term or a weak one.

Then we incorporated click-through rates as a new factor. Some rarer terms might not have converted, might not be similar to converters, and might have no obvious business relevance, and yet they could still have potential. If the CTR is high, and our ads tend to reflect the offer well, a high CTR should indicate that the offer is close to what the searcher seeks.
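To make the layering of conditions concrete, here is a minimal Python sketch of the kind of rule cascade described above. It is not our actual prompt, which is written in natural language for an AI model. The `SearchTerm` fields, the thresholds, and the `negation_candidate` function are all hypothetical, the sample rows are invented, and simple word overlap (Jaccard similarity) stands in for the semantic similarity that the AI judges.

```python
from dataclasses import dataclass


@dataclass
class SearchTerm:
    text: str
    clicks: int
    impressions: int
    cost: float
    revenue: float
    relevance: float  # 0..1 score, assumed to come from the relevance prompt

    @property
    def roas(self) -> float:
        return self.revenue / self.cost if self.cost > 0 else 0.0

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions > 0 else 0.0


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def negation_candidate(term, all_terms, *, min_relevance=0.5, good_roas=3.0,
                       min_similarity=0.5, high_ctr=0.08) -> bool:
    """Return True only if every 'keep' condition fails (thresholds are assumptions)."""
    # 1. Relevant to the business: keep.
    if term.relevance >= min_relevance:
        return False
    # 2. Earned revenue itself: keep.
    if term.revenue > 0:
        return False
    # 3. Similar to a term with strong ROAS: keep.
    for other in all_terms:
        if (other is not term and other.roas >= good_roas
                and jaccard(term.text, other.text) >= min_similarity):
            return False
    # 4. High CTR suggests the ad matches what the searcher wants: keep.
    if term.ctr >= high_ctr:
        return False
    return True


# Invented sample data for illustration only.
terms = [
    SearchTerm("office depot standing desk", 30, 400, 20.0, 120.0, 0.2),
    SearchTerm("office depot standing desk sale", 5, 100, 4.0, 0.0, 0.2),
    SearchTerm("free desk clipart", 3, 300, 2.0, 0.0, 0.1),
    SearchTerm("ergonomic riser", 12, 100, 6.0, 0.0, 0.3),
]

candidates = [t.text for t in terms if negation_candidate(t, terms)]
print(candidates)
```

In this toy data only "free desk clipart" comes out as a negation candidate. The non-converting "Office Depot" variant is kept because it resembles a strong ROAS term, and "ergonomic riser" is kept on CTR alone. Each extra rule rescues terms the relevance score alone would have negated, and each is another place where a threshold can be wrong, which is why human review still matters.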

We’ve kept adding prompt amendments to counter the notable inefficiencies we discovered using just the ‘relevance-based’ prompt. It’s not that the AI doesn’t give us ‘okay’ results now, but there are enough errors that a human still has to review them closely, erasing the time savings we’d hoped for.

The overall benefit of using this prompt, in addition to our usual Ngram and human review, is not great for this client, so far. There has been no dramatic reduction in cost per conversion at similar conversion volume.

The ‘result’ of our tests might be disappointing for a business owner who was perhaps expecting us to write an article explaining how we read a social media post from an agency claiming a 20% reduction in cost per action, and then created our own AI negation prompt that gained a 40% reduction on what we’d achieved before.

That’s not what this article is about.

It isn’t about making spectacular claims.

It’s more about us documenting ‘the bad’ as well as ‘the good’.

I think the prompt we have now is probably better than what ‘the other marketing agency’ had.

But we can’t know that for sure.

The development of the prompt has not ceased. Its story has not finished. Although it’s not a panacea for the client account we tested it on, there are possibilities on accounts with much larger budgets, or accounts with products whose converting search terms are less subject to ambiguity. There are other avenues to explore.

AI will benefit businesses more and more. Yet some early adopters will get stung, just as we and our clients might have been had we taken the early glittering results of this kind of prompt at face value and dived straight in with the negations our original ‘relevance’ prompt gave us, instead of looking closely.

The data warehouse era didn’t teach us that new technology is worthless. It taught us that the gap between a compelling demonstration and reliable, measurable business value deserves exactly the level of scrutiny and testing that others might skip.

Mike Jelley

Data Scientist

Mike studied Computer Science, then worked in a Management Information Reporting role, switched to a Data Analysis role, and then Paid Ads Campaign Management in a results-driven environment... before joining StatBid. Outside of StatBid he is kept busy by his wife, son and fur babies. He enjoys sci-fi and fantasy, both in novels and on screen, and supports Tottenham Hotspur FC.
