We'll be using PyMC3 for this, so first we import it.

```
import pymc3 as pm
```

Here we can imagine a scenario in which we are running a test that is displaying two different versions of an ad to customers arriving on our website.

In this example we ran our test for a week or so and our data shows us that 3,551 individuals were exposed to our control while 5,693 were exposed to our variation. 63 success events were recorded from the control group while 125 were recorded from the variation group.

```
n1 = 3551
x1 = 63
n2 = 5693
x2 = 125
```

Next we have to set the parameters for our prior probability distribution. An A/B test is not entirely unlike the ubiquitous coin-toss example. I went ahead and borrowed the PyMC3 code structure for a coin toss from here and modified it slightly to suit our needs.

An uninformative prior for this would be a Beta(1,1), which says that before we see any data, all values of the potential conversion rates are equally likely.

```
alpha = 1
beta = 1
```

Now we construct the model. PyMC3 uses a specific syntax built around a `with` statement. Notice we are constructing two different probability distributions and then creating a third based on the difference between the two. That is to say, we are interested in the difference between the probable conversion rates of A and B.

```
niter = 20000  # number of posterior samples to draw

with pm.Model() as model:
    # Priors on each conversion rate
    p1 = pm.Beta('p1', alpha=alpha, beta=beta)
    p2 = pm.Beta('p2', alpha=alpha, beta=beta)
    # Binomial likelihoods for the observed successes
    y1 = pm.Binomial('y1', n=n1, p=p1, observed=x1)
    y2 = pm.Binomial('y2', n=n2, p=p2, observed=x2)
    # Track the quantity we actually care about
    difference = pm.Deterministic('difference', p2 - p1)
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(niter, step=step, start=start, random_seed=123, progressbar=True)
```

We are using the Metropolis-Hastings algorithm to sample from the posterior computationally. While we could solve this analytically, this is a bit more fun.
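
For the curious, the analytic route really is short: because a Beta prior is conjugate to the Binomial likelihood, each posterior conversion rate is simply Beta(α + x, β + n − x). A minimal sketch using NumPy (an assumption on my part, not the post's original code) to cross-check the sampler:

```python
import numpy as np

rng = np.random.default_rng(123)

n1, x1 = 3551, 63    # control: trials, successes
n2, x2 = 5693, 125   # variation: trials, successes
alpha, beta = 1, 1   # Beta(1, 1) uniform prior

# Conjugacy: posterior of each rate is Beta(alpha + x, beta + n - x)
p1 = rng.beta(alpha + x1, beta + n1 - x1, size=100_000)
p2 = rng.beta(alpha + x2, beta + n2 - x2, size=100_000)

# Probability that the variation beats the control
prob_b_better = (p2 > p1).mean()
print(round(prob_b_better, 3))
```

The result should land close to the MCMC answer below, which is a nice sanity check on the sampler.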

Finally, we are interested in looking at what the probability is that the Variation is greater than our Control.

`_ = pm.plot_posterior(trace['difference'], ref_val=0)`

This plots the distribution of the difference with a reference line at 0. The mass of this distribution above 0 is the probability that B's conversion rate is greater than A's.

We can see from the above chart that the 95% highest density interval of the difference distribution is between -0.001 and 0.01. Also, the probability that B is better than A is 93.3%, which leaves a 6.7% probability that B is worse. We are safe to go with B.
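
That probability can also be read directly off the posterior samples rather than the chart: it is simply the fraction of sampled differences above zero. With a real run you would use `trace['difference']`; the array below is a synthetic stand-in so the sketch runs on its own:

```python
import numpy as np

# Stand-in for trace['difference']: posterior samples of p2 - p1.
# With the real PyMC3 trace: diff_samples = trace['difference']
rng = np.random.default_rng(0)
diff_samples = rng.normal(loc=0.004, scale=0.003, size=50_000)

# P(B > A) is the mass of the difference distribution above zero
prob_b_better = (diff_samples > 0).mean()
prob_b_worse = 1 - prob_b_better
print(prob_b_better, prob_b_worse)
```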

It's important to frame the decision in terms of the risk too. Seeing a 80% probability of B being greater than A might sound reasonable, but if the stakes are high enough you might not be willing to accept the 20% chance that implementing B is worse than A!


A short while ago I published a rather technical post on the development of a Python-based attribution model that leverages a probabilistic graphical modeling concept known as a `Markov chain`.

I realize what might serve as better content is actually the motivation behind doing such a thing, as well as a clearer understanding of what is going on behind the scenes. So to that end, in this post I'll describe the basics of the Markov process and why we would want to use it in practice for attribution modeling.

A Markov chain is a type of probabilistic model. This means that it is a system for representing different states that are connected to each other by probabilities.

The state, in the example of our attribution model, is the channel or tactic that a given user is exposed to (e.g. a nonbrand SEM ad or a Display ad). The question then becomes, given your current state, what is your next most likely state?

Well one way to estimate this would be to get a list of all possible states branching from the state in question and create a conditional probability distribution representing the likelihood of moving from the initial state to each other possible state.

So in practice, this could look like the following:

Let our current state be `SEM` in a system containing the possible states `SEM`, `SEO`, `Display`, `Affiliate`, `Conversion`, and `No Conversion`.

After we look at every user path in our dataset, we get conditional probabilities that resemble this:

```
P(SEM | SEM)           = 0.10
P(SEO | SEM)           = 0.20
P(Affiliate | SEM)     = 0.05
P(Display | SEM)       = 0.05
P(Conversion | SEM)    = 0.50
P(No Conversion | SEM) = 0.10
```

This can be graphically represented.

Notice how the probabilities extending from the SEM state sum to one. This is an important property of a Markov process, and one that will arise organically if you have engineered your dataset properly.

Above we only identified the conditional probabilities for the scenario in which our current state was SEM. We now need to go through the same process for every other possible state to build a networked model that you can follow indefinitely.
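
The counting involved is straightforward to sketch in Python. Assuming each user path is a list of states ending in either `Conversion` or `No Conversion` (the paths below are made up purely for illustration), the transition probabilities fall out of a nested tally:

```python
from collections import defaultdict

# Hypothetical user paths; real paths would come from your touchpoint data
paths = [
    ['SEM', 'SEO', 'Conversion'],
    ['SEM', 'Conversion'],
    ['Display', 'SEM', 'Conversion'],
    ['Display', 'No Conversion'],
    ['SEO', 'SEM', 'No Conversion'],
]

# Count every state -> next_state transition across all paths
counts = defaultdict(lambda: defaultdict(int))
for path in paths:
    for current, nxt in zip(path, path[1:]):
        counts[current][nxt] += 1

# Normalize each row so the outbound probabilities of every state sum to one
transitions = {
    state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
    for state, nexts in counts.items()
}

print(transitions['SEM'])
```

Normalizing each row is exactly what guarantees the sum-to-one property mentioned above.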

Now, up to this point I've written a lot about the process of defining and constructing a Markov chain, but I think at this point it is helpful to explain *why* I like these models over standard heuristic-based attribution models.

Look again at the fully constructed network we have created, but pay special attention to the outbound Display vectors that I've highlighted in blue below.

According to the data, we have a high likelihood of not converting, at about 75%, and only a 5% chance of converting the user. However, that user has a 20% probability of proceeding to SEM as the next step. And SEM has a 50% chance of converting!

This means that when it comes time to do the "attribution" portion of this model, Display is very likely to increase its share of conversions.

Now that we have constructed the system that represents our user behavior, it's time to use it to re-allocate the total number of conversions that occurred over a period of time.

What I like to do is take the entire system's probability matrix and simulate thousands of runs through the system, each ending when our simulated user arrives at either `conversion` or `null`. This allows us to generalize from a rather small sample, because we can simulate the random walk through the different stages of our system using our prior understanding of the probability of moving from one stage to the next. Since we pass a probability distribution into the mix, we allow for a bit more variation in our simulation outcomes.

After getting the conversion rates of the system we can simulate what occurs when we remove channels from the system one by one to understand their overall contribution to the whole.

We do this by calculating the `removal effect`^{[1]}, which is defined as the probability of reaching a conversion when a given channel or tactic is removed from the system.

In other words, if we create one new model for each channel in which that channel is set to 100% no conversion, each new model highlights the effect that removing that channel entirely has on the overall system.

Mathematically speaking, we take the percent difference between the conversion rate of the overall system with a given channel set to NULL and the conversion rate of the overall system as a whole. We do this for each channel. Then we divide each channel's removal effect by the sum of the removal effects across all channels to get a weighting for each of them, and finally multiply that weighting by the total number of conversions to arrive at the fractionally attributed number of conversions.
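
The arithmetic in that paragraph can be sketched in a few lines. The conversion rates below are hypothetical stand-ins for the simulation outputs (the baseline CVR of the full system, and the CVR with each channel knocked out in turn):

```python
# Hypothetical simulation outputs: baseline conversion rate of the full
# system, and the conversion rate with each channel removed
baseline_cvr = 0.025
removal_cvr = {'SEM': 0.010, 'SEO': 0.018, 'Display': 0.020, 'Affiliate': 0.023}

# Removal effect: fractional drop in CVR when a channel is removed
removal_effect = {ch: (baseline_cvr - cvr) / baseline_cvr
                  for ch, cvr in removal_cvr.items()}

# Normalize removal effects into weights that sum to one
total = sum(removal_effect.values())
weights = {ch: eff / total for ch, eff in removal_effect.items()}

# Fractionally attribute an observed conversion total across channels
total_conversions = 1000
attributed = {ch: w * total_conversions for ch, w in weights.items()}
print(attributed)
```

Channels whose removal hurts the system most (here SEM) receive the largest fraction of the conversions.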

If the above paragraph confuses you, head over to here and scroll about a third of the way down for a clear removal-effect example. I made my example system too complicated to want to manually write out the removal-effect CVRs.

Well, by now you have a working attribution model that leverages a Markov process for allocating fractions of a conversion to multiple touchpoints! I have also built a proof-of-concept in Python that employs the above methodology to perform Markov-model-based attribution given a set of touchpoints.^{[2]}

Anderl, Eva and Becker, Ingo and Wangenheim, Florian V. and Schumann, Jan Hendrik, Mapping the Customer Journey: A Graph-Based Framework for Online Attribution Modeling (October 18, 2014). Available at SSRN: https://ssrn.com/abstract=2343077 or http://dx.doi.org/10.2139/ssrn.2343077 ↩︎


Ultimately I want a space to do a few things:

- Publish articles based on work I am currently doing in the space of marketing analytics
- Post informational details on applied machine learning and marketing analytics for my own edification since I tend to hold on to things better by writing about them
- Grow my online presence by contributing content and reaching people who are in industry to grow my professional network
- Try to finally own some search results for my name. Jeremy Nelson. This will be difficult!

I've had blogs before, ranging over a wide variety of topics: travel, SEO, health. One topical travel post even blew up on WordPress because I was in the right place at the right time, but I quickly learned that that doesn't mean anything if you don't give people a reason to come back.

Basically, whatever interest I held for a few months, I decided to write about.

Well, all my other blogs were on WordPress. This one is hosted on Google Compute Engine and powered by Ghost!

Yeah, I don't think that really matters to anybody, but seeing as this blog is more directed at my professional life and something I do every day, it should hopefully have more staying power.

Anyway, I hope you find some useful information here. If you like marketing analytics or the application of statistical learning to problems in eCommerce, marketing, and logistics, stick around or subscribe.
