As a late follow-up to all the press summarizing the results of the 2014 Winter Olympics, I decided to apply data envelopment analysis (DEA) to find the most efficient teams at the Games. I'll use the Benchmarking package to estimate efficiencies and ggplot2 + ggthemes to visualize them.
Data envelopment analysis is a methodology for distinguishing efficient units from inefficient ones, where a "unit" can be almost anything: an industry, a firm, a branch, a hedge fund, an algorithmic-trading robot, a platoon, even a cockroach or a galaxy. Initially born in the dull and stern realms of macroeconomics, DEA gradually evolved into a KPI-like methodology applied to nearly anything, usually studied under the operations research umbrella. Anything that can be described as a black box transforming inputs into outputs may one day become a subject of DEA in some peer-reviewed paper. It is currently one of the most cited and published topics in business and economics. Everyone likes that it requires minimal assumptions, copes with different units of measure, and is non-parametric. There are no free lunches, though: the price of this universality is that you have to collect ALL the units before running DEA; otherwise you risk significantly overestimating the efficiency of the ones you did manage to collect.
Putting the math magic aside (it is described with perfect lapidary brevity in the Pessanha, Marinho, Laurencel, and Amaral presentation Implementing DEA models in the R program and their respective paper), let's proceed to the pictures. But first let's calculate the DEA scores. A DEA score ranges from 0 to 1, where 1 marks efficient decision making units (DMUs), and every value below 1 marks some level of inefficiency. Paraphrasing Tolstoy (see the Prokudin-Gorskiy photo below), efficient DMUs are all alike; every inefficient DMU is inefficient in its own way.
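To make the score intuitive before reaching for any package, here is a tiny base-R illustration with made-up numbers (the `athletes` and `medals` vectors are hypothetical, not the real Olympic data): with a single input, a single output, and constant returns to scale, a DMU's efficiency reduces to its output/input ratio divided by the best ratio in the sample.

```r
# Hypothetical toy data: 3 teams, one input (athletes), one output (medals)
athletes <- c(10, 20, 40)
medals   <- c(5, 8, 10)

# Under constant returns to scale with one input and one output,
# efficiency = own productivity ratio / best productivity ratio
ratio <- medals / athletes
eff   <- ratio / max(ratio)
round(eff, 2)  # 1.00 0.80 0.50 - only the first team sits on the frontier
```

The first team defines the frontier and scores exactly 1; the others score strictly between 0 and 1.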
In R, you can use the Benchmarking package to do all things DEA (or write your own function, like Pessanha et al. did). The syntax is simple: you supply the dea() function with an inputs data frame, an outputs data frame, and several options (we'll discuss one of them, returns to scale, in greater detail later). So here comes my data preparation routine:
library(mosaic)
# add a tiny constant so that zero counts don't break the DEA model
olympics[, 2:ncol(olympics)] <- olympics[, 2:ncol(olympics)] + 1e-18
rownames(olympics) <- as.character(olympics$Country)
olympics <- olympics[, 2:ncol(olympics)]
olympics <- as.data.frame(olympics)
# let's create inputs/outputs:
inputs <- subset(olympics, select = Athletes)
outputs <- subset(olympics, select = TotMedals)
DMUs <- rownames(olympics) # team names serve as DMU labels
dea.dtf <- data.frame(inputs, outputs, DMUs)
To replicate it, you have to download my dataset first. Now everything is ready for the first launch. For a start, we'll have a very simple DEA model: all DMUs, the teams, are black boxes transforming athletes into medals.
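As a sketch of what that first launch looks like, here is the shape of a dea() call from the Benchmarking package. The toy matrices below are made up for illustration; with the real data you would pass the `inputs` and `outputs` data frames prepared above (coerced to matrices) instead.

```r
library(Benchmarking)

# Hypothetical stand-ins for the prepared inputs/outputs data frames
x <- matrix(c(10, 20, 40), ncol = 1)  # input: athletes
y <- matrix(c(5, 8, 10), ncol = 1)    # output: medals

# dea() takes input and output matrices plus a returns-to-scale option
e <- dea(x, y, RTS = "crs")  # "crs" = constant returns to scale
eff(e)                       # efficiency score per DMU, between 0 and 1
```

Swapping RTS = "crs" for "vrs" switches to variable returns to scale, the option discussed later in the post.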