Benchmarking is the process of comparing an organization's operational performance to that of other organizations in order to become “best in class” and make continual improvements. A logical performance evaluation would compare the actual attained utility level with the maximum attainable utility level. However, in practical evaluations, it is not always possible to know upfront what maximum utility means. In such cases, measuring efficiency helps to set goals based on actual behavior. Benchmarking compares companies that are comparable in terms of products, processes, legal requirements, applicable prices, productivity, cost of materials, energy use, etc. In this sense, benchmarking uses measures of efficiency to compare how well entities use their resources when more than one criterion is considered.

Firm Benchmarking - An Overview

Benchmarking can be used for intra- and inter-organizational comparisons as well as for longitudinal or dynamic comparisons, where the performance of entities is compared over different time periods. The objectives of using benchmarking may include learning, coordination and motivation (Bogetoft and Otto 2011):

Learning occurs when firms want to know how well they are doing compared to others and whom they can learn from. Advances in interactive benchmarking attempt to allow individual firms to define the comparison basis (potential peers), the objective (e.g. cost reduction or sales expansion) and the aspiration level (e.g. top ten), among other aspects of benchmarking evaluations. This approach is generally used in environments where firms see themselves as colleagues more than as competitors, e.g. energy networks, farmers, etc.
Coordination occurs when the allocation of tasks tries to ensure that the right firms are producing the right products at the right place and time. Benchmarks, tournaments and bidding schemes are widely used to coordinate operations at optimal cost and performance, especially when evaluating the structural efficiency of whole industries or when decomposing aggregate inefficiency into the inefficiencies of individual production units.
Motivation is the result of, for example, the firm knowing more precisely the performance of an employee and, based on that, better targeting incentives. Adverse selection and moral hazard can be limited by implementing ex-ante benchmarking studies. Salary plans, tariff regulations and budgeting rules can be tied to the outcome of the benchmarking as well.

Benchmarking methods

Various methods are typically applied in organizations for performance benchmarking:

simple ratio indicators related to a specific variable (e.g. consumption: kg of water consumed/kg of product produced),
proportional values (e.g. drinking water input in kg in 2009/water input in kg in 2008), or,
index figures over the course of time (water input in 2009/water input in the base year 2007).
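
The three indicator types listed above can be sketched in a few lines of Python. This is only an illustration: the water and production figures are hypothetical, and the function names are chosen here for readability rather than taken from any standard.

```python
# Hypothetical water-use and production figures in kg (illustrative only).
water_input_kg = {2007: 1200.0, 2008: 1100.0, 2009: 990.0}
product_output_kg = {2009: 450.0}


def ratio_indicator(consumed, produced):
    """Simple ratio: resource consumed per unit of product."""
    return consumed / produced


def proportional_value(current, previous):
    """Current period relative to the previous period."""
    return current / previous


def index_figure(current, base):
    """Current period relative to a fixed base year."""
    return current / base


water_per_kg_product = ratio_indicator(water_input_kg[2009], product_output_kg[2009])
change_vs_2008 = proportional_value(water_input_kg[2009], water_input_kg[2008])
index_vs_2007 = index_figure(water_input_kg[2009], water_input_kg[2007])

print(water_per_kg_product, change_vs_2008, index_vs_2007)
```

Each indicator looks at one dimension at a time, which is why they are easy to compute but hard to combine into a single verdict about performance.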

Other, more comprehensive methods may be used for specific situations or tailored to the organization's objectives. These performance indicators are seldom disclosed, which prevents or at least complicates comparisons among firms. However, the major problem of measuring performance, and in the end the greatest challenge, comes down to summarizing the different dimensions of the problem in a single criterion (an index) in order to better support decision making or to compare analogous units. Moreover, it is desirable not only to have a single indicator that allows performance to be compared objectively, but also to study the effect of externalities (i.e. changes in technology, regulations or economic tools) that could affect decisions over time. Data Envelopment Analysis has demonstrated its utility in these two contexts.

Data Envelopment Analysis

Data Envelopment Analysis (DEA) is a powerful benchmarking technique used in operations research and management science to measure the efficiency of decision-making units (DMUs), such as companies, organizations, and institutions. It’s particularly useful in cases where multiple inputs and outputs need to be considered simultaneously, and traditional methods may fall short in providing a comprehensive evaluation.

DEA allows for the comparison of the relative efficiency of different DMUs by considering their input-output relationships. By evaluating how efficiently a DMU transforms inputs into outputs, DEA helps identify best practices and areas for improvement, which can be invaluable for decision-making, resource allocation, and performance improvement.

It’s widely applied across various industries, including finance, healthcare, education, and manufacturing, to name a few. Whether it’s optimizing processes, evaluating the performance of service providers, or benchmarking against competitors, DEA provides valuable insights into operational efficiency and effectiveness.

Advantages of DEA include its ability to account for substitution possibilities between different performance criteria (i.e. when reduction in one factor may come at the cost of increasing another) as well as its independence of subjective aggregation of weights. DEA does not require any prior assumptions on the underlying functional relationships between inputs and outputs. It is, therefore, a non-parametric approach. Yet, normative judgments and subjective valuations of weights can easily be incorporated into the model.

DEA was given its name because of the way it “envelops” observations in order to identify a “frontier”, which is taken as the reference against which all entities are evaluated. A performance or productivity index is obtained from each evaluation against peers. The efficient DMUs serve as benchmarks to guide improvement in the inefficient firms. For a given set of DMUs, DEA forms an efficient frontier by joining the most efficient DMUs. The efficiency value or productivity index of each DMU is measured by its relative distance to the frontier. DEA uses the ratio of the weighted sum of the outputs to the weighted sum of the inputs to compare performance across DMUs. Weights are the decision variables and are determined so as to give each DMU the highest possible efficiency value. A linear program is run for each DMU to see how efficient it is compared to the others. The solution identifies the sources and amounts of inefficiency in each input and output for every DMU. In a typical DEA, each DMU gets an efficiency score between 0 and 1, with a higher score being preferable.
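
The frontier idea is easiest to see in the simplest possible setting. The sketch below assumes a single input, a single output and constant returns to scale, in which case no linear program is needed: the frontier is defined by the DMU with the highest output per unit of input, and every other DMU is scored relative to it. The DMU names and figures are hypothetical.

```python
def relative_efficiency(units):
    """Score each DMU against the best observed output/input ratio.

    units maps a DMU name to (input, output). Scores fall in (0, 1],
    and a score of 1 marks a DMU that lies on the frontier.
    """
    productivity = {name: out / inp for name, (inp, out) in units.items()}
    best = max(productivity.values())
    return {name: p / best for name, p in productivity.items()}


# Hypothetical DMUs: (staff hours, orders shipped)
dmus = {"A": (10.0, 20.0), "B": (20.0, 30.0), "C": (30.0, 60.0), "D": (40.0, 40.0)}
scores = relative_efficiency(dmus)
peers = [name for name, score in scores.items() if score == 1.0]

print(scores)  # A and C define the frontier
print(peers)
```

With several inputs and outputs there is no single ratio to compare, which is where the weights and the linear program come in.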

DEA models

The Charnes, Cooper and Rhodes (CCR) DEA model (1978) is a linear program which determines the efficiency of each DMU. This basic model lets $x_{i}$ and $y_{q}$ represent the inputs and outputs, where $i = 1, 2, \ldots, I$ and $q = 1, 2, \ldots, Q$ index the specific inputs and outputs. Also, $u$ and $v$ represent the output and input weights, respectively. Efficiency is obtained using the ratio given below:

$$\text{Efficiency} = \frac{\sum_{q=1}^{Q} u_{q} y_{q}}{\sum_{i=1}^{I} v_{i} x_{i}}$$

In this equation, the weights are the decision variables and they are obtained by solving a linear program for each DMU. Assume there are $J$ DMUs, indexed by $j = 1, 2, \ldots, J$. For $DMU_{jo}$, which is the test DMU, the linear program is as follows:

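$$\max \; z_{j_o} = \sum_{q=1}^{Q} u_{qj_o} y_{qj_o} \quad \text{(eq. 1)}$$

Subject to:

$$\sum_{i=1}^{I} v_{ij_o} x_{ij_o} = 1 \quad \text{(eq. 2)}$$

$$\sum_{q=1}^{Q} u_{qj_o} y_{qj} - \sum_{i=1}^{I} v_{ij_o} x_{ij} \le 0; \quad j = 1, 2, \ldots, J \quad \text{(eq. 3)}$$

$$v_{ij_o}, \, u_{qj_o} \ge 0; \quad i = 1, 2, \ldots, I; \quad q = 1, 2, \ldots, Q \quad \text{(eq. 4)}$$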

Where:

$y_{qj_{o}}$ is the qth output of $DMU_{jo}$

$u_{qj_o}$ is the weight of the qth output for $DMU_{jo}$

$x_{ij_{o}}$ is the ith input for $DMU_{jo}$

$v_{ij_{o}}$ is the weight of ith input for $DMU_{jo}$

$y_{qj}$ and $x_{ij}$ are the qth output and the ith input, respectively, for $DMU_{j}$, $j = 1, 2, \ldots, J$.

The CCR DEA model formulation objectively determines the set of weights, $u_{qj_o}$ and $v_{ij_o}$, which maximizes the efficiency of $DMU_{jo}$. The objective function (1) maximizes the weighted outputs of $DMU_{jo}$ while Equation (2) holds its weighted inputs constant at 1 (note that most DEA texts call this normalization the input-oriented multiplier form, because its dual seeks a proportional reduction of inputs). Equation (3) requires that, under these weights, the ratio of weighted outputs to weighted inputs of each DMU does not exceed 1, and Equation (4) requires the weights to be non-negative.
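
To make the linear program concrete, here is a minimal sketch of the CCR multiplier model for a deliberately small case: one input and two outputs. Under that assumption the normalization constraint fixes the input weight at $v = 1/x_{j_o}$, so only two output weights remain and the optimum can be found by checking every corner of the feasible region. This brute-force search is a teaching device chosen here to avoid external dependencies, not how DEA packages solve the problem, and the data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def ccr_efficiency(dmus, target):
    """CCR multiplier-form score of `target` for 1 input and 2 outputs.

    dmus maps a name to (x, y1, y2). The normalization v * x_target = 1
    fixes v, leaving the output weights (u1, u2) as decision variables.
    Assumes each output is positive for at least one DMU (bounded region).
    """
    x_o, y1_o, y2_o = (Fraction(value) for value in dmus[target])
    v = 1 / x_o  # eq. 2
    # Each row (a1, a2, b) encodes the constraint a1*u1 + a2*u2 <= b.
    rows = [(Fraction(y1), Fraction(y2), v * Fraction(x))  # eq. 3
            for x, y1, y2 in dmus.values()]
    rows += [(Fraction(-1), Fraction(0), Fraction(0)),  # u1 >= 0 (eq. 4)
             (Fraction(0), Fraction(-1), Fraction(0))]  # u2 >= 0 (eq. 4)
    best = Fraction(0)
    # A bounded two-variable LP attains its optimum where two constraint
    # lines cross, so it is enough to test every feasible crossing point.
    for (a1, a2, b), (c1, c2, d) in combinations(rows, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel lines have no unique crossing
        u1 = (b * c2 - a2 * d) / det
        u2 = (a1 * d - b * c1) / det
        if all(r1 * u1 + r2 * u2 <= rhs for r1, r2, rhs in rows):
            best = max(best, u1 * y1_o + u2 * y2_o)  # eq. 1
    return best


# Hypothetical data: name -> (input, output 1, output 2)
dmus = {"A": (5, 5, 25), "B": (10, 40, 40), "C": (10, 50, 10), "D": (20, 40, 40)}
scores = {name: ccr_efficiency(dmus, name) for name in dmus}

for name, score in scores.items():
    print(name, float(score))
```

In this data set D scores 0.5 because B delivers the same outputs with half the input, while A, B and C each find a set of weights under which no other DMU beats them. For more inputs and outputs the same model is passed to a general-purpose LP solver.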

Software

DEA libraries are currently available through major software packages. In R, for example, we can find the deaR library, which applies classic models such as the one above as well as BCC, Multiplier, Directional function and non-radial models (more on these in future posts). Be aware that most of the models available through these packages are generic and, while they may provide a good start, real applications usually involve complexities that need more advanced models (see sections below). Therefore, obtaining advice from a subject matter expert is the best bet when applying DEA to specific situations.

You may want to see how this model is actually applied for Warehouse Benchmarking here!
For a more advanced application in the area of sustainable procurement, go here!

Weight restrictions in DEA

DEA models work under a “benefit of the doubt” weighting scheme (Kortelainen 2008), which implies that DEA does not require prior information about the weights of different inputs or outputs other than that they are nonnegative. Therefore, it is a non-parametric approach.

The disadvantages of assigning weights to criteria are particularly evident when evaluating sustainability and corporate social responsibility. Research has pointed out that no universally agreed-upon weights or prioritization of social or environmental issues can exist, as stakeholder attributes (e.g., stakeholder composition, perceptions and preferences) could change over time. This holds even for a specific stakeholder group, where current weight elicitation encounters great difficulties when evaluating less tangible goods, such as clean air and noise (Chen & Delmas 2011; Kuosmanen & Kortelainen 2005). For these reasons, a non-subjective weighting method such as DEA would seem a better option.

MCDM and DEA

In both DEA and Multiple-criteria Decision Making (MCDM), Pareto optimality is an important consideration. However, it is determined in a different way. A dominated alternative in DEA is one that underperforms with respect to a virtual alternative formed by the linear combination of the alternatives that generates the most outputs for a given set of inputs. In MCDM, on the other hand, an alternative i is dominated when another alternative k is at least as good as i on every criterion and strictly better on at least one. In other words, while in MCDM each alternative is described by its performance on each of the criteria, DEA performance is estimated relative to the best combination of the other options. That is, MCDM starts with the purpose of choosing a specific course of action, while DEA starts with the purpose of evaluating relative efficiency.
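
The MCDM notion of dominance described above is a pairwise test and can be sketched directly. The example assumes that every criterion is of the "larger is better" type; the supplier names and scores are hypothetical.

```python
def dominates(k, i):
    """True if alternative k dominates i: at least as good on every
    criterion and strictly better on at least one (larger is better)."""
    at_least_as_good = all(a >= b for a, b in zip(k, i))
    strictly_better = any(a > b for a, b in zip(k, i))
    return at_least_as_good and strictly_better


def non_dominated(alternatives):
    """Names of alternatives that no other single alternative dominates."""
    return [
        name
        for name, scores in alternatives.items()
        if not any(
            dominates(other, scores)
            for other_name, other in alternatives.items()
            if other_name != name
        )
    ]


# Hypothetical suppliers scored on (quality, delivery reliability)
alternatives = {"S1": (7, 5), "S2": (6, 5), "S3": (4, 9), "S4": (5, 7)}
pareto_set = non_dominated(alternatives)

print(pareto_set)
```

Here S2 is dominated by S1, while S4 survives the pairwise test. If the four suppliers used identical inputs, however, a DEA-style comparison would still flag S4: the midpoint of S1 and S3, (5.5, 7), is a virtual alternative that matches it on one criterion and beats it on the other.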

DEA approaches

Traditional DEA models assume inputs and outputs to be strongly or freely disposable. Strong disposability states that if any input is increased, output does not decrease. However, the seminal work of Koopmans (1951) pointed out that this may not be true in real-world applications (e.g. production processes are always accompanied by the generation of by-products such as pollution or waste, which are costly and, therefore, “undesirable”). Based on this view, pollution is a form of economic waste, a sign that resources have been used incompletely, inefficiently or ineffectively. Accordingly, several approaches for incorporating undesirable outputs in DEA have been proposed (e.g. Zhou et al. 2007). Scheel (2001) (see also Lu et al. 2007; Saen 2010) classifies them as direct and indirect approaches:

Indirect approaches transform the values of the undesirable outputs by a monotonically decreasing function $f$ (i.e. whenever $a < b$, then $f(a) > f(b)$) such that the transformed data can be included as “normal” (desirable) outputs in the technology set T. The only information needed for indirect approaches is whether the data have to be minimized or maximized. One more argument in favor of the indirect approach is that both inputs and undesirable outputs incur costs for a DMU, and DMUs usually want to reduce both types of variables as much as possible (Yang and Pollitt 2010).

Direct approaches, on the other hand, use the original data but modify the assumptions about the structure of the technology set in order to treat undesirable outputs appropriately. The assumption is that it is impossible to reduce undesirable outputs without reducing desirable outputs (level of production activity) at the same time (i.e. weak disposability) holding inputs constant (Fare et al. 1989). Seiford and Zhu (2002) presented an approach dealing with undesirable outputs in the DEA framework in which efficiency was improved by increasing desirable outputs and decreasing undesirable outputs. Azadi and Farzipoor Saen (2012) proposed a model for considering imprecise data and undesirable factors. The current understanding is that the DEA model should credit DMUs for their provision of desirable or marketable outputs and penalize them for their provision of undesirable outputs. As such, when formulating desirable outputs and undesirable outputs given a certain amount of inputs, the more desirable outputs and the less undesirable outputs, the higher the final efficiency.
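
For the indirect approaches described above, the transformation step can be sketched as follows. The two functions shown (a translated additive inverse and a multiplicative inverse) are common examples of monotonically decreasing transformations and are used here as assumptions, not as the specific choice of any of the cited authors. The emission figures and the translation constant are hypothetical.

```python
def translated_inverse(values, beta):
    """f(u) = beta - u. Choose beta > max(values) to keep the data positive."""
    return [beta - v for v in values]


def multiplicative_inverse(values):
    """f(u) = 1 / u. Requires strictly positive data."""
    return [1.0 / v for v in values]


def is_monotonically_decreasing(original, transformed):
    """Check that a larger original value always maps to a smaller one."""
    pairs = sorted(zip(original, transformed))
    return all(
        t1 > t2
        for (o1, t1), (o2, t2) in zip(pairs, pairs[1:])
        if o1 < o2
    )


# Hypothetical undesirable output: emissions in tonnes for three DMUs
emissions = [12.0, 30.0, 18.0]
beta = max(emissions) + 1.0  # assumed translation constant
as_desirable = translated_inverse(emissions, beta)
as_reciprocal = multiplicative_inverse(emissions)

print(as_desirable)
print(is_monotonically_decreasing(emissions, as_reciprocal))
```

After the transformation the DMU with the lowest emissions holds the largest value, so the column can enter a standard model as an output to be maximized. The choice of function can matter for the resulting scores, which is one reason the direct approaches keep the original data instead.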

Requirements for carrying out DEA assessments
The DEA method has extensive data requirements and is sensitive to the data used (Kuosmanen and Kortelainen, 2005), as discussed below (Thanassoulis 2003; Cooper et al. 2006).

Data Scaling: A justification for data scaling in DEA is that DEA weights are inversely proportional to the levels of the inputs and the outputs which will make the weights take very low or negligible numerical values when the inputs/outputs take very large values. Hence, it is sometimes suggested that the right hand side of the normalization constraint be set to a value larger than 1 to further scale up the weights.
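
One simple way to bring variables to a comparable magnitude before running DEA is to divide each column by its mean. This is offered as an assumption about common practice rather than as the specific scaling the cited authors have in mind, and the variable names and values below are hypothetical.

```python
def mean_normalize(columns):
    """Divide every value by the mean of its own column (variable)."""
    scaled = {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        scaled[name] = [value / mean for value in values]
    return scaled


# Hypothetical data with very different magnitudes per variable
data = {
    "revenue_usd": [2_000_000.0, 4_000_000.0, 6_000_000.0],
    "staff": [10.0, 20.0, 30.0],
}
scaled = mean_normalize(data)

print(scaled)
```

Because each column is only multiplied by a positive constant, the CCR efficiency scores are unaffected; the optimal weights simply absorb the change of units, which keeps them away from very small numerical values.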

Isotonicity: Wherever possible (in an output-oriented DEA model), we look for variables for which larger output levels reflect better efficiency. The isotonicity requirement fails when, for example, an increase in an input leads to a decrease in an output, so that a unit appears more efficient the lower its output value. An example is the output criterion “number of service complaints”: if the input is “number of workshops given to improve service quality”, the expected result is a decrease in complaints. A better alternative is therefore to select “number of service occasions which did not lead to a complaint”. Correlation analysis is the usual approach for testing isotonicity: high positive input-output correlations suggest that the data are suitable for analysis through DEA.
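
The correlation check mentioned above can be sketched with a plain Pearson coefficient. The figures are hypothetical and follow the complaints example: the raw complaint count moves against the input, while the reformulated output moves with it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data for five DMUs
workshops = [2, 4, 6, 8, 10]                # input
complaints = [50, 42, 35, 20, 12]           # candidate output (problematic)
service_occasions = [400, 420, 450, 460, 500]
without_complaint = [s - c for s, c in zip(service_occasions, complaints)]

r_complaints = pearson(workshops, complaints)
r_reformulated = pearson(workshops, without_complaint)

print(round(r_complaints, 3), round(r_reformulated, 3))
```

A negative coefficient for the complaint count signals that it violates isotonicity as an output, whereas the reformulated variable correlates positively with the input and can be used as it stands.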

Zero input-output levels: For the DMUs to be comparable, they all need to operate the same technology or face the same options for transforming inputs into outputs. If this were not the case, the assessment would likely show DMUs with zero input levels as artificially more efficient than they really are. No similar problem arises with zero output levels.

Data accuracy: The impact of inaccurate data for a DMU depends on whether it incorrectly renders the DMU Pareto-efficient or Pareto-inefficient. An “inaccurate DMU” which has been inappropriately made Pareto-efficient will cause the efficiencies of other DMUs for which it is an efficient peer to be underestimated. When the inaccurate DMU has been inappropriately deemed Pareto-inefficient, the DMUs which would otherwise have had it as their efficient peer may show larger efficiency ratings than would be justified.

Number of Input and Output items: Generally speaking, if the number of DMUs (J) is less than the combined number of inputs and outputs a large portion of the DMUs will be identified as efficient and efficiency discrimination among DMUs will be questionable. The usual rule of thumb is to choose the number of DMUs so that :
Since the selection of input and output items is crucial for the successful application of DEA, it is generally recommended starting with a smaller set of input and output items and gradually enlarging the set to observe the effects of the added items.

DEA applications

Since the work of Charnes et al. (1978), DEA has rapidly grown and seen applications in a wide variety of fields.

Macro-level analysis: DEA applications have typically addressed macro-level analysis, where the usual approach is from a policy perspective, rather than on managerial and competitive dimensions: Country productivity evaluations (Kortelainen 2008b; Mohamad and Said n.d.); industrial site emissions (Song et al. 2014; Tyteca 1997); road transportation (Kuosmanen and Kortelainen 2005); bank efficiency (Kuchler 2013); telecommunications (Samoilenko and Osei-Bryson 2013); food manufacturing (Egilmez et al. 2014); nursing homes and university departments (Wang and Chin 2010).
In particular, since the 1980s, DEA has been accepted as a major frontier technique for benchmarking in the energy sectors of many countries (Sarkis and Cordeiro 2012). Zhou et al. (2008) reviewed about 100 articles with DEA applications in this field and found that very few studies focused on the firm level. This confirms that, although it has proven quite useful for a number of managerial decisions, the use of DEA for performance evaluation as an internal managerial tool at firms has been limited (Sarkis and Talluri 2004).
Procurement: Weber and Desai (1996) were the first to propose a GDEAVE (Generalized DEA for Vendor Selection) assuming a single product sourcing problem, where vendors were compared with respect to their efficiency criteria. Other studies (e.g.: Talluri 2004; Talluri and Sarkis 2002; Talluri and Baker 2002; Narasimhan et al. 2006; Talluri and Narasimhan 2003) followed the application of DEA for general supplier-selection problems. These models have primarily targeted key limitations of DEA including: unrestricted weight flexibility, inability to rank or discriminate among efficient units and inappropriate benchmarks, homogeneity and accuracy assumptions.
Sustainable Procurement: The number of studies using DEA for environmental evaluations of suppliers is very limited. We found only six articles which specifically address green supplier evaluation (Kuo et al. 2010; Wen and Chi 2010; Amindoust et al. 2012; Zhe et al. 2013; Kumar et al. 2014; Bai and Sarkis 2014). In addition, other articles (e.g.: Shabani et al. 2013; Lee and Farzipoor Saen 2012; Chen and Delmas 2011) describe methods that can be easily extended to the purchasing function but which do not necessarily consider criteria that might be relevant. Likewise, given the little attention paid to the topic, it is to be expected that only a few modeling features (e.g. input-output factor interpretation, common weights, criteria set reduction) and not very sophisticated modeling approaches have been transferred from converging streams of literature (i.e. from the eco-efficiency and vendor management literature). Here is some of the work related to this type of application:
- Kuo et al. (2010) used the Delphi method to identify environmental criteria for evaluation. Then, they developed a green supplier selection model, called the ANN-MADA hybrid method, which integrated an artificial neural network (ANN) and two multi-attribute decision analysis (MADA) methods: data envelopment analysis (DEA) and analytic network process (ANP). ANN was used to reduce the number of criteria based on the prediction of the criteria with the highest performance values, thus reducing the number of inputs required for an effective implementation of DEA. ANP was used to determine the criteria weights. Their approach showed comparative advantages over simpler models that integrated only ANN or ANP with DEA.
- Zhe et al. (2013) proposed the use of the common weights DEA model originally developed by Liu (2008) to improve efficiency discrimination in the context of supplier selection. They considered traditional productivity factors and environmental criteria within the set of inputs and outputs.
- Wen and Chi (2010) combined AHP with the DEA method for small and large datasets of suppliers. The DEA assurance region method was used to incorporate the decision maker's opinion and improve the discriminating power of the model. They considered traditional and green performance inputs and outputs. They used the model proposed by Kao (2006), which considered exact, ordinal and interval data.
- Amindoust et al. (2012) proposed the basic CCR model to select suppliers in the Malaysian Electrical and Electronic Industry.
- Kumar et al. (2014) introduced a Green DEA (GDEA) model which built on existing DEA models with weight restrictions and considered carbon footprint as dual role factor. Unlike other approaches, GDEA incorporated heterogeneous suppliers. Their model was validated through an Indian auto parts manufacturer.
- Bai and Sarkis (2014) used neighborhood rough set and DEA. Neighborhood rough set was used to reduce the large number of performance measures that existed when adding the sustainability dimension for supply chain management and integrate (super-efficiency slack based) DEA for the determination of a supplier eco-efficiency measure.
- Lee and Farzipoor Saen (2012) propose DEA as an approach to measure corporate sustainability management which may also be adjusted for supplier monitoring and measurement. They consider the dual-role factor model introduced by Cook et al. (2006) and incorporate cross-efficiency estimations based on Wang and Chin (2010). They model corporate sustainability from a financial perspective. Their approach is based on the fact that in some situations it may not be clear for decision makers whether a factor is an input or an output. In their example, tax benefit is the dual role factor since better sustainability performance can increase donations for tax benefits while increase in donations for tax benefits may bring better sustainability performance. Cross-efficiency is used to allow better comparability among DMUs.
- Shabani et al. (2013) apply DEA to the general case of selecting eco-efficient technologies, which are assumed de facto to be heterogeneous. Undesirable outputs are considered in their research. Although not explicitly directed at supplier evaluation, their methodology could well be applied to the case of evaluating suppliers offering different technologies (e.g. as in the case of indirect materials or when comparing suppliers using different processes).
- Chen and Delmas (2010) apply the Cook and Zhu (2006) imprecise DEA model in the measurement of corporate social performance (CSP) using the KLD database as reference, considering “concern” criteria as inputs and “strength” criteria as outputs. Selection of environmentally sustainable supplier alternatives through DEA has also been explored from the design phase in the work of Lin and Okudan (2010), who used interval and assurance DEA models for pre-screening suppliers for conceptual design.
Although the aforementioned work provides comprehensive approaches to supplier evaluation, the solutions developed are still very limited with respect to the modeling of features relevant to the sustainability of the purchasing function. For instance, the usual focus is on eco-efficiency measurement, where the definition of this term is not quite clear in relation to existing eco-efficiency definitions. In addition, the proposed models do not provide ways to translate eco-efficiency estimations into practical procurement decisions such as order allocation. There is also a recurring use of weight schemes for assigning criteria priorities. Although they are helpful for managerial purposes, we consider that, given the characteristics of sustainable criteria discussed before, the use of weight values derived from the model itself is better for an objective estimation of criteria weights. Finally, none of these studies makes a dynamic assessment of supplier performance over time or prioritizes orders based on the most progressive suppliers. We addressed these different issues in our article Use of interval data envelopment analysis, goal programming and dynamic eco-efficiency assessment for sustainable supplier management.