Problem Statement:

Company XYZ is a food delivery company. Like pretty much any other site, in order to get customers, they have been relying significantly on online ads, such as those you see on Google or Facebook. At the moment, they are running 40 different ad campaigns and want you to help them understand their performance. Specifically, you are asked to:
- If you had to identify the 5 best ad groups, which ones would they be? Which metric did you choose to identify the best ad groups? Why? Explain the pros of your metric as well as the possible cons.
- For each group, predict how many ads will be shown on Dec, 15 (assume each ad group keeps following its trend).
- Cluster ads into 3 groups: the ones whose avg_cost_per_click is going up, the ones whose avg_cost_per_click is flat and the ones whose avg_cost_per_click is going down.


Solution

1) Which ad groups are doing the best?
library(data.table)
library(tseries)
ad_table <- fread("~/Google Drive/take_home_challenge/challenge_20/ad_table.csv")


I want to maximize the probability that people click AND convert as well as the profit on conversion and minimize the cost of clicks. Which is basically, clicks/shown x convert/clicks x (total revenue - avg_cost_per_clicks x clicks) Therefore metric is conversion rate x profit per conversion.

# create profit column

ad_table$profit <- with(ad_table, total_revenue - clicked*avg_cost_per_click)

ad_table$profit_per_click <- with(ad_table, profit/clicked)

# want to maximize conversion rate and profit per click. 

ad_table$prob_convert <- with(ad_table, converted/shown)
ad_table$metric <- with(ad_table, prob_convert*profit_per_click)

ad_table[ , .(mean_metric = mean(metric, na.rm = T)) , by = ad][order(mean_metric)]


5 best ad groups based on this metric would be ad_group 27, 31, 14, 16 and 2.

2) How many ads will be shown on Dec 15, assuming present trend continues?

A number of stationarity tests are available, one of the most popular ones is the Augmented dickey fuller test. A negative coefficient and pvalue < 0.05 indicates original series is stationary. A coefficient value of zero indicates original series needs first differencing. I normalize the series and calculated first difference to see if the first differenced series is stationary.

normal <- function(x) { #x <- x[x != 0]
    (x - mean(x, na.rm = T))/sd(x, na.rm = T) }

first_diff <- function(x){ y <- c(x[-1], NA) - x
    y[!is.na(y)] }

ad_table[shown != 0 , adf.test(first_diff(normal(shown)) , alternative = "stationary"), by = ad ][, .(ad, statistic, p.value)]

I do the Augmented Dickey Fuller test for first difference of the series. ADF test shows that all except the 19th ad group is stationary after first differencing. Plotting the first differenced series for the 19th ad group.

plot_series <- function(z) { 
    x = 1:length(z[z!=0]) 
    plot(x = x, y = (z[z!=0]), typ = "l")
    #lines(predict(loess(z[z != 0] ~ x )), col = "green" ) 
}
par(mfrow = c(2,2), mar = c(1,3,1,3))
ad_table[shown!= 0 & ad == "ad_group_19", plot_series(first_diff(shown))]
ad_table[shown!= 0 & ad == "ad_group_19", plot_series(first_diff(shown[c(10:length(shown))]) )]


Looks like there might be an outlier around the 9th observation (first plot). plotting from 10th obs after gives second plot - looks fairly stationary.

# adf. test of the first diff of the shown data of 9th ad group is now stationary. 

ad_table[shown!= 0 & ad == "ad_group_19", adf.test(first_diff(shown[c(10:length(shown))]) , alternative = "stationary") ]
## 
##  Augmented Dickey-Fuller Test
## 
## data:  first_diff(shown[c(10:length(shown))])
## Dickey-Fuller = -3.723, Lag order = 3, p-value = 0.03529
## alternative hypothesis: stationary

Since the differenced series is stationary, original series, \(y_{t} = y_{t-1} + e_{t}\) which is also known as random walk.
Therefore the expected number of ads that will be shown on “Dec 15” date is ads shown on last day for that group.

 ad_table[ ,  .( predicted_shown = tail(shown, 1)) , by = ad ]


3) Categorizing the ad groups based on a linear trend of avg cost.


#categorizing the ad groups based on a linear trend of avg cost.
slope <- function(z) { mz = mean(z, na.rm = T)
        x = 1:length(z[z!=0]) 
        y = z/mz 
    inc = round(coef(lm(y[z != 0] ~ x + x*x))[2], 3)  
    if (inc > 0) return("inc")
    if (inc == 0) return("flat")
    if (inc < 0) return("dec")
    }
ad_table[ ,  slope(avg_cost_per_click) , by = ad ]


Visualizing the same by plotting moving average and trend line for ad_group 1, 2 and 8 which have increasing, flat and decreasing avg cost trends.

ma <- function(x, n = 5) {filter(x, rep(1/n, n), sides = 1 )}
plot_tr <- function(z) { 
    
    plot(x = 1: length(z), y = z, typ = "l")
    lines(x = 1: length(z), y =  ma(z), typ = "l", col = "red" ) 
    x = 1:length(z[z!=0]) 
    lines(predict(lm(z[z != 0] ~ x  )), col = "blue" ) 
}