Ads Analysis

Solution

1) Which ad groups are doing the best?

library(data.table)
library(tseries)
ad_table <- fread("~/Google Drive/take_home_challenge/challenge_20/ad_table.csv")

I want to maximize the probability that people click AND convert as well as the profit on conversion and minimize the cost of clicks. Which is basically, clicks/shown x convert/clicks x (total revenue - avg_cost_per_clicks x clicks) Therefore metric is conversion rate x profit per conversion.

# create profit column

ad_table$profit <- with(ad_table, total_revenue - clicked*avg_cost_per_click)

ad_table$profit_per_click <- with(ad_table, profit/clicked)

# want to maximize conversion rate and profit per click. 

ad_table$prob_convert <- with(ad_table, converted/shown)
ad_table$metric <- with(ad_table, prob_convert*profit_per_click)

ad_table[ , .(mean_metric = mean(metric, na.rm = T)) , by = ad][order(mean_metric)]

5 best ad groups based on this metric would be ad_group 27, 31, 14, 16 and 2.

2) How many ads will be shown on Dec 15, assuming present trend continues?

A number of stationarity tests are available, one of the most popular ones is the Augmented dickey fuller test. A negative coefficient and pvalue < 0.05 indicates original series is stationary. A coefficient value of zero indicates original series needs first differencing. I normalize the series and calculated first difference to see if the first differenced series is stationary.

normal <- function(x) { #x <- x[x != 0]
    (x - mean(x, na.rm = T))/sd(x, na.rm = T) }

first_diff <- function(x){ y <- c(x[-1], NA) - x
    y[!is.na(y)] }

ad_table[shown != 0 , adf.test(first_diff(normal(shown)) , alternative = "stationary"), by = ad ][, .(ad, statistic, p.value)]

I do the Augmented Dickey Fuller test for first difference of the series. ADF test shows that all except the 19th ad group is stationary after first differencing. Plotting the first differenced series for the 19th ad group.

plot_series <- function(z) { 
    x = 1:length(z[z!=0]) 
    plot(x = x, y = (z[z!=0]), typ = "l")
    #lines(predict(loess(z[z != 0] ~ x )), col = "green" ) 
}
par(mfrow = c(2,2), mar = c(1,3,1,3))
ad_table[shown!= 0 & ad == "ad_group_19", plot_series(first_diff(shown))]
ad_table[shown!= 0 & ad == "ad_group_19", plot_series(first_diff(shown[c(10:length(shown))]) )]

Looks like there might be an outlier around the 9th observation (first plot). plotting from 10th obs after gives second plot - looks fairly stationary.

# adf. test of the first diff of the shown data of 9th ad group is now stationary. 

ad_table[shown!= 0 & ad == "ad_group_19", adf.test(first_diff(shown[c(10:length(shown))]) , alternative = "stationary") ]

## 
##  Augmented Dickey-Fuller Test
## 
## data:  first_diff(shown[c(10:length(shown))])
## Dickey-Fuller = -3.723, Lag order = 3, p-value = 0.03529
## alternative hypothesis: stationary

Since the differenced series is stationary, original series, \(y_{t} = y_{t-1} + e_{t}\) which is also known as random walk.
Therefore the expected number of ads that will be shown on “Dec 15” date is ads shown on last day for that group.

 ad_table[ ,  .( predicted_shown = tail(shown, 1)) , by = ad ]

3) Categorizing the ad groups based on a linear trend of avg cost.

#categorizing the ad groups based on a linear trend of avg cost.
slope <- function(z) { mz = mean(z, na.rm = T)
        x = 1:length(z[z!=0]) 
        y = z/mz 
    inc = round(coef(lm(y[z != 0] ~ x + x*x))[2], 3)  
    if (inc > 0) return("inc")
    if (inc == 0) return("flat")
    if (inc < 0) return("dec")
    }
ad_table[ ,  slope(avg_cost_per_click) , by = ad ]

Visualizing the same by plotting moving average and trend line for ad_group 1, 2 and 8 which have increasing, flat and decreasing avg cost trends.

ma <- function(x, n = 5) {filter(x, rep(1/n, n), sides = 1 )}
plot_tr <- function(z) { 
    
    plot(x = 1: length(z), y = z, typ = "l")
    lines(x = 1: length(z), y =  ma(z), typ = "l", col = "red" ) 
    x = 1:length(z[z!=0]) 
    lines(predict(lm(z[z != 0] ~ x  )), col = "blue" ) 
}

Ads Analysis

Problem Statement:

Solution

1) Which ad groups are doing the best?

2) How many ads will be shown on Dec 15, assuming present trend continues?

3) Categorizing the ad groups based on a linear trend of avg cost.