StatCave 7: Control vs. Certainty

Why is it that everyone who manages AdWords has some degree of difficulty getting the intended output? There's an underlying compromise, a balance, between two types of desirable account behavior, and it's happening regardless of the type of bidding solution you use, or whether or not you're even aware of it.

Click here to see Veritasium's video on Regression to the Mean.

Transcript:

If you've ever managed AdWords, then you've struggled to hit a particular return on ad spend target; that's just fundamental to the entire experience of managing ads. But why is it so difficult? Well, it turns out that there's a constant balance between the amount of control you have and the amount of statistical certainty you have in your decisions. So we'll dive into exactly what causes the conflict between these two competing sides, and some recommendations for how to mitigate the problem.

To start with, we need to look at how bidding works fundamentally, and it's actually really, really simple. You start by understanding how much of your revenue you're willing to spend on your ads; that is called a cost of sale target, and it's generally shown as a percentage. It's just the reciprocal of your return on ad spend target. Then you need to know how much revenue you're getting per click: that's simple division, revenue divided by clicks, and the resulting metric is called value per click. You multiply these two together, which is a very simple bit of math, and you find out exactly what your intended target CPC ought to be.
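
To make that concrete, here's a minimal sketch of that arithmetic in Python. All the figures are hypothetical examples, not recommendations:

```python
# Minimal sketch of the target CPC math described above.
# All figures are hypothetical examples, not recommendations.

roas_target = 5.0                    # you want $5 of revenue per $1 of spend
cos_target = 1 / roas_target         # cost of sale target: 20% of revenue goes to ads

revenue = 10_000.0                   # revenue attributed to this segment
clicks = 2_000                       # clicks in the same segment
value_per_click = revenue / clicks   # $5.00 of revenue per click

target_cpc = value_per_click * cos_target
print(f"Target CPC: ${target_cpc:.2f}")  # $1.00
```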

From there you have to pick which layer of your account you're willing to apply your bids to. The larger the samples that you're bidding on, the more certain your decisions, but the slower you can react. So it's a balancing act between certainty and reactiveness. You've got account level data, and that obviously gets broken out into campaigns, which are then broken out into ad groups, and then any number of segments based on anything from product segmentation in your shopping campaign, to audience segments, to time of day or geography, and so on. You can take almost any account of any size and eventually subdivide it down to the point where none of the individual segments have enough data to produce a meaningful bid, as the sketch below illustrates.
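
To see how quickly the data thins out, here's a rough back-of-the-envelope sketch with entirely hypothetical traffic numbers:

```python
# Hypothetical illustration: each layer of segmentation divides
# the same pool of clicks into smaller and smaller samples.

account_clicks = 50_000    # clicks per month for the whole account (hypothetical)

campaigns = 10
ad_groups_per_campaign = 20
segments_per_ad_group = 8  # e.g. product groups, audiences, or dayparts

clicks = float(account_clicks)
for label, divisor in [("campaign", campaigns),
                       ("ad group", ad_groups_per_campaign),
                       ("segment", segments_per_ad_group)]:
    clicks /= divisor
    print(f"Average clicks per {label}: {clicks:,.1f}")

# 5,000 per campaign, 250 per ad group, ~31 per segment --
# one more split and there's too little data for a meaningful bid.
```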

You can think of this as if it were a car, but a very strange car indeed, where there's a coin slot on your steering wheel and a coin slot in your windshield. You have a finite number of coins to put in: the more coins you put into the steering wheel, the more responsive the car is to your commands; the more coins you put into the windshield, the clearer the road ahead. How do you decide how many coins to put into each slot? Put too many coins into the steering wheel to ensure you have the ability to respond to changes, and you lose the ability to see with certainty what's in front of you on the road. Put too many coins into the windshield, and you can't respond to what you're seeing. Either way, the outcome is undesirable.

So how do you resolve this constant conflict between your sample sizes and your level of granular control? You do need to find the right balance for your account, and no two accounts are the same on this front.

There are three major approaches that I'm familiar with. The first one is machine learning. Now, this is going to be the world's most gross oversimplification, so if anyone watching this is actually building a machine learning bidding system, I apologize ahead of time. But this is fundamentally what you're training a machine learning system to do: you start with an input, you start with an intended output, and then you either import some preexisting data or you run a bunch of tests. As you accumulate more and more data in the system, the system gets a better and better idea of the pattern that is emerging, and eventually identifies the "best possible input."
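
For illustration only, here's a toy sketch of that accumulate-and-pick loop. Real systems are vastly more sophisticated, and everything in this sketch is invented: the response curve, the candidate bids, the noise level.

```python
# A deliberately toy version of "accumulate data, find the best input":
# average the observed output for each input tried so far, then pick
# the input with the best running average.

import random
from collections import defaultdict

def observe_roas(bid: float) -> float:
    # Stand-in for the real world: a noisy response to a bid.
    # (Hypothetical shape; the true optimum sits near a bid of $1.00.)
    true_roas = 5.0 - 4.0 * (bid - 1.0) ** 2
    return true_roas + random.gauss(0, 1.0)

totals = defaultdict(float)
counts = defaultdict(int)
candidate_bids = [0.50, 0.75, 1.00, 1.25, 1.50]

for _ in range(500):                  # run a bunch of tests
    bid = random.choice(candidate_bids)
    totals[bid] += observe_roas(bid)
    counts[bid] += 1

best = max(candidate_bids, key=lambda b: totals[b] / counts[b])
print(f'"Best possible input" so far: a bid of ${best:.2f}')
```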

Now, in our example I've assigned a very specific type of algorithm to this machine learning bucket, and we'll get to other types of artificial intelligence in a moment. But if you want to learn more about exactly the kind of logic that goes into this kind of system, not necessarily the implementation but the output, I highly recommend checking out Veritasium's video on regression to the mean; I have put the link in the description below.

The distinction between machine learning and artificial intelligence in this case is one that I'm making; it isn't necessarily standardized, and in fact the two terms are often used interchangeably. But I think that oversimplifies even further than we're going to.

And so I'm going to leave regression analysis in the machine learning bucket, and reserve the term AI for things that are built on top of neural nets and other sophisticated learning algorithms that aren't just piling a bunch of data on top of a preexisting, predetermined formula. For these, you once again have to start with a bunch of example data, but instead of just mashing it all together, finding some sort of average, and banking on that, or variations on that theme, this type of system tries to understand some underlying pattern that holds beyond the individual points of data: generalities about the way certain types of behavior can be tracked over time.

And so you give it a bunch of individual data points, and then later you ask: does this match or not match the thing you're looking for? If it matches, take action A; if it doesn't, take action B. But sometimes that can be a little bit complicated.

So these systems are really only as successful as the amount of data you can give them and the quality of the resulting models, and they are heavily dependent on sheer volume of data. So if you're an enterprise account, and your site does more than a couple of hundred million in revenue a year, this could be a great system. But if you're a very small account, say spending just a couple of thousand a month or less, then it really depends on whether your account is the only source of data the system is being trained against; and if it's being trained against other data, that data may or may not be as relevant to your account as you'd like.

Now, that was way, way simplified, barely scratching the surface of even the elevator pitch for an actual machine learning or artificial intelligence based bidding solution. But I think it's important to recognize that I'm not just talking about third party vendors; I'm also talking about Google's built in optimization, like their target ROAS bidder or Enhanced CPC. These systems just have a lot more inputs, because Google has a lot of data behind the scenes on shoppers and shopper behavior that they're trying to bring to bear.

But if you've tried any of those systems, you've seen that sometimes they don't work well, and hopefully this gives you a little bit of an idea of how those systems might actually be fallible.

Now, those are the first two, and I promised you three. The third one is obviously maintaining all of your bids manually, using the AdWords interface the traditional way. Now, the first two suffer from the same limitation that the third one does, namely that if you don't have enough data, they're not going to perform well. The same is true here, but it's a little bit more concrete.

So this is an exercise in determining how granular you can get. And I'm not just talking about, say, breaking out your shopping campaign tree into finer and finer branches, but also things like how long a date range you can use. The answer is that you have to balance all of those different types of segments against each other, so that the resulting sample size has at least a certain number of clicks and conversions. And the appropriate click and conversion minimums are going to depend mostly on your conversion rate.

Let's, for example, take a look at a hypothetical product group in a shopping campaign that has a bunch of clicks starting to accumulate. You need to know whether you should start to back off your bid because of a lack of performance, or whether you should let it ride because any moment now the next click might convert.

Well, there are two ways to go about that. The first is knowing what your click threshold should be. For example, one way I've done it in the past is to take the conversion rate and then do a little bit of math to figure out how many clicks I would need in order to have a 95 or 99% certainty of at least one conversion within that pool. If I get that number of clicks and I don't yet have a conversion, then something's probably wrong, and it's worth reevaluating.
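
Here's a sketch of that threshold math. It assumes each click converts independently at your expected conversion rate, so the chance of at least one conversion in n clicks is 1 - (1 - CvR)^n:

```python
# How many clicks do you need before the absence of any conversion
# becomes suspicious? Assumes independent clicks at the expected rate.

import math

def clicks_for_certainty(conversion_rate: float, confidence: float = 0.95) -> int:
    """Clicks needed so that, at the expected conversion rate, the
    probability of seeing at least one conversion is >= confidence."""
    # Solve 1 - (1 - cvr)^n >= confidence for n.
    n = math.log(1 - confidence) / math.log(1 - conversion_rate)
    return math.ceil(n)

for cvr in (0.01, 0.02, 0.05):
    print(f"CvR {cvr:.0%}: {clicks_for_certainty(cvr, 0.95)} clicks for 95%, "
          f"{clicks_for_certainty(cvr, 0.99)} clicks for 99%")
# At a 2% conversion rate: roughly 149 clicks for 95% and 228 for 99%.
# Past that with zero conversions, something's probably wrong.
```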

The other approach is a little more speculative. Take whatever number of clicks you have, with a much lower minimum, say 20 or 30 clicks, and assume that the next click literally is that conversion you've been waiting for. That gives you a conversion rate, and if you know what you expect the average order value to be, you can work your way back to a value per click number and generate a bid from that. If it's overly optimistic, it'll run up additional clicks, but that'll suppress the perceived conversion rate and lower your bid accordingly. If you do get those conversions, then that'll bolster the VPC, and you can bid up normally.
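
A minimal sketch of that optimistic calculation, with hypothetical inputs for the average order value and cost of sale target:

```python
# Sketch of the "assume the next click converts" approach.
# The aov and cos_target values below are hypothetical inputs.

def optimistic_bid(clicks: int, conversions: int, aov: float,
                   cos_target: float) -> float:
    """Bid as if the very next click will convert."""
    assumed_cvr = (conversions + 1) / (clicks + 1)  # next click converts
    value_per_click = assumed_cvr * aov
    return value_per_click * cos_target

# 30 clicks, no conversions yet, $100 average order value, 20% cost of sale:
print(f"${optimistic_bid(30, 0, 100.0, 0.20):.2f}")  # ~$0.65
```

Note the self-correcting behavior: each additional click without a conversion shrinks the assumed conversion rate, which pulls the bid down on its own.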

Now, for date ranges there are at least two variables to consider when determining how far you're willing to go back in time for data. The first is seasonality: obviously, if you're diving all the way back to another period in the year, the behavior can't be expected to be representative of what you should expect tomorrow, and that's a problem. You can use year over year data in some cases to help mitigate that, but it's a hard limit.

The other thing is what happens if the behavior suddenly changes? Say a new competitor comes into the market and disrupts the normal flow of shopper behavior. That kind of change is detected and responded to more quickly in a shorter date range, obviously, but that shorter date range comes at the expense of the level of granularity you can use across keyword bids or additional segments within your shopping campaign, and so on.

Imagine you have a text campaign, so we're not worried about the subdivision within the product tree of a shopping campaign. You have the option to bid at either the ad group level or the keyword level. Keyword level is way more granular; you have a lot more control. However, it also reduces your sample size by a factor of how many keywords you have on average per ad group.

So if you're going to do that, say you have 10 keywords in an ad group, you suddenly have one tenth as much data per keyword, on average, with which to make your decision. You may then end up needing 10 times as long a date range, which, if you're trying to respond to sudden changes in your market, or seasonality, and things like that, can be more detrimental than beneficial.

So you might look for something that's a little less granular but still gives you some of that control, say by lumping all of your broad match and phrase match keywords together, and all of your exact match keywords together. Then you only have two samples within the ad group, so you're only doubling the date range you may need to make up for that additional subdivision. But that's better than ten times.
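
Here's that trade-off as rough arithmetic, with hypothetical traffic and sample-size numbers:

```python
# Hypothetical arithmetic for the granularity trade-off: the more bid
# units you split an ad group into, the longer the date range you need
# to keep the same number of clicks behind each bidding decision.

ad_group_clicks_per_day = 50    # hypothetical traffic level
min_clicks_per_decision = 300   # hypothetical sample-size floor

for label, bid_units in [("ad group level", 1),
                         ("exact vs. everything else", 2),
                         ("per keyword (10 keywords)", 10)]:
    clicks_per_unit_per_day = ad_group_clicks_per_day / bid_units
    days_needed = min_clicks_per_decision / clicks_per_unit_per_day
    print(f"{label}: ~{days_needed:.0f} days of data per bid")
# 6 days at the ad group level, 12 days split two ways, 60 days per keyword.
```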

Hopefully this helps explain why, no matter what you've tried, your AdWords campaigns always seem to involve some kind of struggle to get them to behave the way you'd expect.