Lee Merkhofer Consulting Priority Systems
Implementing project portfolio management

"A common misunderstanding about weights is that they convey the relative importance of objectives."

Assessing Weights

My step-by-step process for constructing a project selection decision model calls for weights to be assessed after:

  1. Portfolio objectives and their corresponding performance measures have been defined,
  2. The independence condition needed to justify an additive (or partially additive) form for the value function has been checked and verified,
  3. Single attribute value functions have been defined for each performance measure,
  4. A decision has been made regarding whether to model risk, and if so, a relative risk aversion (exponential) utility function has been adopted, and
  5. A consequence model has been defined to simulate the impact on organizational performance of doing versus not doing projects.

As reflected in Figure 32, specifying weights necessarily comes after nearly all of the other significant choices about model design have been made.



Figure 32:   Steps for creating a project selection decision model.


Understanding Weights

The term "weights" refers to the wi factors in the additive form of a multi-attribute value function [1]:

V(x1, x2, ..., xN) = w1V1(x1) + w2V2(x2) + ... + wNVN(xN)     (Eq. 1)

The use of this term may contribute to the common misunderstanding that the wi factors define the relative importance of the objectives and their corresponding performance measures. If, for example, an organization designs a multi-criteria prioritization model with a weight on public safety that is half as large as the weight assigned to net revenue, observers may assume the organization regards public safety as half as important as profits. The relative sizes of the weights do not support such a conclusion. As shown below, in addition to the relative value attributed to obtaining improvements in performance measures, weights depend on the ranges defined for their assessment [2]. Even though conclusions about the meaning of weights are often based on misconceptions, the fact that such misunderstandings can easily arise underscores the need for care when designing the model and when explaining its logic.

Weights as Scaling Factors

As described previously, it is customary to scale (normalize) single-attribute value functions Vi(xi) to go from zero to one [3]. The single attribute value function Vi(xi) converts performance relative to the i'th objective into the relative value of that level of performance. Regardless of the shape of the Vi(xi) functions, the values assigned to the worst and best performance levels are zero and one, respectively.

As indicated by the form of Equation 1, the wi weights serve as scaling factors that allow the numbers indicating relative preference for performance levels, obtained from the single-attribute value functions, to be compared with one another and summed [4]. If, for example, performance measure xA has a weight wA that is twice the weight wB for measure xB, then this should be interpreted as meaning that the decision maker values an increment of 0.1 value points on performance measure xA the same as a 0.2 value point increment on performance measure xB. The wi weights control the sensitivity of total value to changes in the various areas of performance represented in the model. If wi is made smaller, changes in the levels of performance xi will have less significance for determining total value. Conversely, if wi is made larger, total value will be more sensitive to projects that impact performance in this area.
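
To make the role of the weights concrete, here is a minimal sketch of Equation 1 in code; the performance measure names, single-attribute values, and weights are hypothetical illustrations, not values taken from the text.

```python
# Minimal sketch of the additive value model (Eq. 1); all names and
# numbers are hypothetical illustrations.

# Normalized single-attribute values V_i(x_i) for a candidate project,
# each already scaled so that worst = 0.0 and best = 1.0.
single_attribute_values = {"net_revenue": 0.70, "public_safety": 0.40, "reliability": 0.55}

# Swing weights w_i, normalized to sum to one (Eq. 2).
weights = {"net_revenue": 0.50, "public_safety": 0.25, "reliability": 0.25}

# Total value is the weight-scaled sum of the single-attribute values.
total_value = sum(weights[m] * single_attribute_values[m] for m in weights)
print(f"Total value: {total_value:.4f}")  # 0.350 + 0.100 + 0.1375 = 0.5875
```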

Weights as Swing Weights

Scaling the multi-attribute value function V( · ) and the single-attribute value functions Vi(xi) in the normal way means that, if xi− and xi+ are the worst and best outcome levels for the i'th performance measure, respectively, and if x− and x+ are the worst and best outcome bundles, respectively, then:

Vi(xi−) = 0 , Vi(xi+) = 1,  i = 1, 2, ..., N
V(x−) = 0 , V(x+) = 1

With this normalization, the weights must sum to one:

w1 + w2 + ... + wN = 1     (Eq. 2)

Now suppose there are two outcome bundles, denoted x1 and x2, that are identical for every measure except the i'th. The i'th measures for the two bundles are set equal to the worst and best outcomes for that measure, respectively. The performance levels for the measures other than the i'th, designated x#, are at arbitrary levels but are the same for both bundles (the levels can differ from measure to measure, but whatever the levels are for one bundle, they are the same for the other). Expressing these assumptions mathematically:

x1 = [xi = xi−;   xk = x#;   k ≠ i]
x2 = [xi = xi+;   xk = x#;   k ≠ i]

With the definitions as given, going from x1 to x2 is the swing from the worst to best outcome for the i'th performance measure (other performance measures remaining unchanged). The value increase associated with this swing is:

V(x2) - V(x1)  =  wiVi(xi+) - wiVi(xi−)  =  wi

This result shows that each scaling factor wi has a very specific interpretation; namely, wi is the relative value (value expressed on a zero to one scale) of obtaining a swing from the worst to best outcome on the i'th performance measure [5]. For this reason, weights are more precisely termed swing weights; they may be obtained by estimating the value of swings from worst to best outcome levels for the performance measures.

Weights as Tradeoffs

Consider two outcome bundles where the outcome levels for every performance measure but two, the i'th and j'th, are the same. For the first of the two outcome bundles, labeled a, the i'th performance measure is at its best level and the j'th performance measure is at its worst level. The other performance measures are at some arbitrary, common level x#:

xa = [xi = xi+;   xj = xj−;   xk = x#, for k ≠ i, j]

For the second outcome bundle, labeled b, the i'th performance measure is at some special outcome level xi*, the j'th performance measure is at its best level, and the other measures are set at the common level x#:

xb = [xi = xi*;   xj = xj+;   xk = x#, for k ≠ i, j]

Now consider the swings that occur for the individual performance measures if there is a swing from performance bundle xa to performance bundle xb. The j'th performance measure swings from its worst level to its best level, so there is a gain in value equal to wj. Meanwhile, the i'th performance measure swings from its best level to the level xi*. This swing can be decomposed into two parts: a swing from the best level to the worst level, which produces a loss in value of wi, and a swing from the worst level to the level xi*, which, since Vi(xi−) is zero, produces a gain in value equal to wiVi(xi*). Thus, the net value of the swing from xa to xb is wj - wi + wiVi(xi*).

Suppose now that the level xi* is adjusted so that the values of the two outcome bundles are equal. Then, we must have:

wj = wi[1 - Vi(xi*)]

Dividing each side by wi:

wj/wi = 1 - Vi(xi*)   or, inverting,   wi/wj = 1/[1 - Vi(xi*)]

This equation relates the weights for the i'th and j'th measures to the partial swing on the i'th measure that compensates for a full swing on the j'th measure. More generally, if the decision maker estimates the fraction of the swing for measure xi that is of equal value to the swing for measure xj, we can generate equations that relate the weights for the performance measures [6].
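
A brief numeric sketch of the tradeoff relation just derived; the judged value fraction below is hypothetical.

```python
# Hypothetical illustration of the relation w_i / w_j = 1 / [1 - V_i(x*_i)].
# Suppose the decision maker judges that dropping measure i from its best level
# to a level x*_i worth 40% of its swing exactly offsets gaining the full swing on j.
V_i_at_xstar = 0.40                          # V_i(x*_i), judged by the decision maker
ratio_wi_over_wj = 1.0 / (1.0 - V_i_at_xstar)
print(f"w_i / w_j = {ratio_wi_over_wj:.3f}")  # 1.667: swing i is worth 1.667 times swing j
```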

To illustrate, suppose, for convenience, that the decision maker starts by ranking the performance measure swings from most valuable to least valuable. Then, the decision maker specifies the fraction or percentage of the top-ranked swing that would have equal value to the second-ranked swing. Designate as p12 the proportion of the top-ranked swing that equals the second-ranked swing. Then, if w1 and w2 are the weights for the swings ranked number one and two:

w2 = p12w1

Likewise, if the decision maker estimates the proportion of the top ranked swing that is of equal value to the third ranked swing, designated p13:

w3 = p13w1

Continuing in this way, N - 1 equations can be obtained for the N swing weights. To solve for the weights, assign a value of 1 to the top-ranked weight, compute a value for each of the other wi from the equations, and then use Equation 2 to normalize the weights so they sum to one.
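
The following sketch shows this calculation under hypothetical proportion judgments; the swing labels and p values are made up for illustration.

```python
# Sketch of the swing-ranking procedure described above.
# p[i] = judged fraction of the top-ranked swing's value that equals swing i's value;
# the top-ranked swing gets 1.0 by definition. All numbers are hypothetical.
proportions = {"swing_1": 1.00, "swing_2": 0.60, "swing_3": 0.25, "swing_4": 0.10}

# Normalize (Eq. 2) so the weights sum to one.
total = sum(proportions.values())
weights = {name: p / total for name, p in proportions.items()}
for name, w in weights.items():
    print(f"{name}: {w:.3f}")   # 0.513, 0.308, 0.128, 0.051
```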

Consistency Checks

Additional equations relating weights can be generated to provide consistency checks [7]. For example, if the decision maker estimates the proportion of the second ranked swing that is of equal value to the third ranked swing, denoted p23, then, for consistency we should have,

p13 = p12p23

Consistency checks can provide a sense for the reliability of the judgments being provided by the decision makers who are the subjects for the weight assessment process. If serious inconsistencies are identified, they can be pointed out so as to allow decision makers to reconsider their judgments and resolve the inconsistencies.
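
A small sketch of the kind of check described above, using hypothetical proportions and an arbitrary tolerance.

```python
# Hypothetical consistency check: p13 should (approximately) equal p12 * p23.
p12, p23, p13 = 0.60, 0.45, 0.40          # judged proportions (hypothetical)
implied_p13 = p12 * p23                   # 0.27
if abs(p13 - implied_p13) > 0.05:         # tolerance is a judgment call
    print(f"Inconsistent: stated p13 = {p13}, implied p12*p23 = {implied_p13:.2f}")
```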

Weights as Measures of "Importance," Not

Another concept often used to assign weights is "importance"—the more important the objective (or its performance measure) is judged to be, the larger the weight that is assigned to it. Assigning weights based on importance provides the opportunity to specify weights at every level in the objectives hierarchy, not just weights for the lowest-level objectives. With importance weights, each level of the hierarchy has its own weights, and at each level those weights sum to one (or one hundred, if the weights are expressed as percentages). With importance weights, the weight for each intermediate objective equals the sum of the weights of its sub-objectives (the connected objectives directly below it in the hierarchy). An objective with sub-objectives will thus have a "category weight," a weight indicating the combined importance of all of its sub-objectives. Conveniently, importance weights can be specified from the top down, with the weight for an objective at any level being apportioned across its lower-level, sub-objectives.

Despite these attractive features, you should avoid the temptation to use importance weights. Importance weights have no theoretical or operational grounding. How do you measure importance? A decision maker might say, for example, that health is "X" times more important than money, but what is the amount of health or money to which that statement applies [8]? Avoiding a fatality is certainly more important than saving $1,000, but is avoiding a cold more important than saving $10,000?

In addition to the theoretical problems with importance weights, there are practical problems. Tests, for example, show that weights assigned based on feelings of importance have poor repeatability, meaning that the same decision maker will assign different importance weights to a performance measure at different points in time [42]. Of equal concern, importance weights violate the range sensitivity principle.

The Range Sensitivity Principle

As described above, each wi weight in the normalized, additive value function (Equation 1) equals the value, as determined by the decision maker, of a specified swing in performance measured by the xi performance measure. Typical advice for specifying a swing is to choose either a "local" range or a "global" range. A local range for a performance measure is typically defined as a range that spans the levels of performance seen in the current set of alternatives. The global range is larger; it is typically defined as a range that spans the worst and best levels of performance that are theoretically possible. In truth, weights may be estimated based on any performance range that might be defined, and the selection of the range for normalizing the value function is a choice made at the time the model is designed. Regardless of how performance ranges are selected, making the range for a swing smaller should cause its weight to be smaller. Conversely, making the range for a swing larger should yield a larger weight. This result is known as the range sensitivity principle.

Importance weights, because they are assigned independent of any specified swing in performance, obviously violate the range sensitivity principle. In fact, though, tests show that the range sensitivity principle is violated to a greater or lesser degree for nearly every weight assessment method and nearly every subject. Why? One theory is the anchoring and adjustment bias described in Part 1 of this paper. When asked to specify the value of a swing, people initially think about the underlying objective. People have many years of intuitive experience thinking about the importance of objectives. Those initial thoughts may create anchors, and, like most anchors, the adjustments away from them tend to be too small. Thus, people have trouble adequately adjusting weights according to the postulated swings. Nearly every study reported in the literature finds violations of the range sensitivity principle, sometimes small but quite often significant. The unavoidable conclusion from the literature is that normal people do not, and perhaps cannot, adequately adjust weights to be in accordance with the range sensitivity principle.

Methods for Assessing Weights

A surprisingly large number of methods have been proposed for assessing weights, many of which are simply minor variants of one another. However, as discussed more below, even small procedural differences have been shown to sometimes have important consequences [9]. The proposed methods differ both in terms of the nature of the questions posed and the interpretations given to the judgments obtained. The methods can be categorized as direct, indirect, tradeoff, ranking, pairwise, interval, holistic, and others. The tables below summarize some of the more popular methods useful for obtaining weights for project selection models. Be aware that in cases where original definitions require estimations of "importance," I've rephrased the instructions to seek judgments of the desirability of swings. Computer programs are available for guiding many, if not all, of these methods.

Assume for the application of the methods that "worst" and "best" levels of performance have been defined for each measure, thereby allowing swings in performance to be specified. Nearly all of the proposed methods advise that weights should be normalized so that they sum to one, so assume, unless indicated otherwise, that such normalization is the final step in the process for each method.

Direct Methods

Direct weight assessment methods seek direct estimates of weights from decision makers based on weights being defined as the judged value of specified swings in performance. The main direct methods for obtaining weights are rating and point allocation.


Direct Assessment Methods Assessment Process Comments
Direct rating [6]
(Swing weighting is direct rating with an emphasis on swings. It establishes a relative value per unit of change on the measure or scale used to quantify performance.)
1. Assign 100 points to the most desired swing.
2. Using that swing as a benchmark, rate each of the remaining, less desirable swings in terms of desirability on a 0-to-100 point scale.
Quick, easy, and generally liked by subjects. Tests show assignments have a relatively high degree of repeatability, though (like most other direct methods) the method may tend to overweight swings of lesser importance [9]. The simplicity of the method may cause some not to take it seriously.
SMART
(Simple Multi-Attribute Rating Technique)
[10]
1. Identify the least desired swing and assign 10 points to it.
2. Rate the remaining swings relative to the least desired, assigning at least 10 points to each (e.g., "By how many percentage points does the value of this swing exceed the value of the least valuable swing?")
Because no upper limit is specified, there is more potential for assessment errors leading to poor repeatability of ratings. Whether to use the least or most desired swing as the benchmark should depend in part on which swing feels more natural or easier for decision makers to imagine.
Point allocation Distribute 100 points among the swings based on desirability. Can be time consuming, since changing the allocation for any one weight requires adjusting all of the others. That the method forces repeated reexamination of initial views may be regarded as a positive characteristic.

Graphical and physical aids are often used to facilitate direct rating. For example, point allocation can be facilitated by providing decision makers 100 poker chips to be allocated among the swings. Direct rating may be aided by drawing a line on graph paper between points representing the values assigned to the least and most desirable swings. The decision maker then places tick marks along the line indicating the relative values of swings that lie between the values that bound the line. The values corresponding to the locations of the tick marks can then be obtained either graphically or with a software program designed for this purpose.

A strength of direct methods is the opportunity to stress that assignments be based on the valuation of swings, which can help reduce the tendency of subjects to be biased by notions of the relative importance of objectives.
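
As a minimal sketch of how direct ratings become weights, the snippet below normalizes a set of hypothetical 0-to-100 swing ratings so they sum to one.

```python
# Minimal sketch of direct (swing) rating: the ratings are hypothetical 0-100
# desirability scores assigned to worst-to-best swings, 100 to the most desired.
ratings = {"market_share": 100, "unit_cost": 70, "safety": 40, "schedule": 15}
total = sum(ratings.values())
weights = {m: r / total for m, r in ratings.items()}
print({m: round(w, 3) for m, w in weights.items()})
# {'market_share': 0.444, 'unit_cost': 0.311, 'safety': 0.178, 'schedule': 0.067}
```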

Tradeoff Methods

Tradeoff methods require decision makers to provide judgments that establish relationships between pairs of weights.


Method Assessment Process Comments
Ratio method [6, 13]
1. Rank swings by desirability,
2. Choose the least desired swing and assign it 10 points,
3. Assign points to each subsequent swing as a multiple of 10 based on its judged ratio of desirability to the desirability of the least desired swing.
A variation is to assign 100 points to the most desired swing and then to assign a lesser number of points to every subsequent swing based on the ratio of its desirability to that of the highest value swing. Either way, many individuals find expressing weights as ratios is a natural way to think about weights.
Tradeoff method [5]. Similar to the ratio method except that tradeoffs are expressed at the level of performance units rather than swings.
1. Choose two performance measures and identify the one with more valuable unit of performance.
2. Determine the number of units of the less valued measure that is equal in value to one unit of the more valued measure.
3. Similarly determine per unit conversions for other measures.
4. With this approach, the per unit conversions are interpreted as the weights (so, there is no normalization).
Interpreting weights in terms of conversions works well for many people, and forces thinking in terms of swings rather than importance. Also, so long as the single attribute value functions are linear, the model is simpler and easier to explain. As shown by the example below, weights allow measures of performance to be directly translated to equivalent monetary values. Note, though, that the number of necessary assessment questions increases rapidly with the number of performance measures. Studies suggest that methods based on specifying ratios seem to give more weight than appropriate to the swings in performance that are judged most valuable [23].
Equivalent cost (Willingness to pay [5], Cost benefit analysis) A special case of a tradeoff method where the unit for comparison is monetary. Estimate the equivalent economic values of swings or per unit of the performance measure used to express the swing. As noted above, value must be linear in the units of the swing. The method allows using CBA results where available. May lead to higher weights for measures that are more naturally expressed in dollar units [14]. Using CBA results may be more attractive to decision makers than having to make controversial value assignments, such as for health and the environment.

In ratio and trade-off methods, performance swings are considered in pairs and presented to decision-makers as contrasting outcomes that differ only in the performance measures under consideration. If helpful, comparisons between the value of the outcome swings may be expressed in percentages.

The method wherein tradeoffs are expressed at the level of performance units requires performance measures to be continuous and single attribute value functions to be linear. As an example of how the method works [15], suppose a computer manufacturer must choose between two product designs: One is less costly and the other will get to market sooner. The performance measures are cost per unit and months to market. The method requires the decision maker to first choose the more valuable unit of performance. For example, the judgment might be that reducing time to market by one month is more valuable than reducing the cost to produce each computer by $1. Then, the decision maker must determine the number of the less desired unit equal in value to the more desired unit. For example, it might be estimated that lowering cost per unit by $15 is of equal value to getting to market one month sooner. Per unit conversions for other performance measures might be similarly obtained, so the method is well-suited to situations where it is desirable to express project value in equivalent dollars. Note that for this weight assessment method, the per unit conversions are interpreted as weights, so there is no normalization.

The equivalent cost method, also called pricing out, is very useful, but requires that performance measures have been defined in terms of units familiar to decision makers, with corresponding single-attribute value functions that are linear in those units (something you should usually aim for regardless). Estimates of equivalent monetary value may then be obtained based on willingness to pay ("How much would you be willing to spend to move the performance measure from its worst level to its best level of performance?").

Alternatively, equivalent monetary values can sometimes be obtained from available results from cost benefit analyses (CBA), academic research, government recommended values, and values being used by other organizations. CBA derives a monetary value per unit of a consequence based on market prices, contingent valuation (people's willingness to pay), and the hedonic price method (analyzing market prices to determine how various factors affect them). For example, if reducing greenhouse gas emissions is an objective, and tons of greenhouse gas emissions released is the performance measure, then CBA results can be used to obtain an equivalent dollar cost per ton of greenhouse gas released. Multiplying the per-ton cost by the estimated number of tons by which emissions might be reduced provides an equivalent monetary value for emissions reductions. When CBA data are available for monetizing project impacts, my preference is to make that data available to decision makers, but to allow them to deviate from the CBA values if desired.
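
A short sketch of the pricing-out arithmetic described above; the per-ton cost and the estimated emissions reduction are hypothetical figures.

```python
# Hedged sketch of equivalent cost ("pricing out") using a CBA-derived unit value.
# All figures are hypothetical illustrations.
cost_per_ton_co2 = 50.0            # $/ton of emissions, e.g., taken from CBA results
tons_avoided_by_project = 1_200    # estimated emissions reduction from the consequence model

equivalent_value = cost_per_ton_co2 * tons_avoided_by_project
print(f"Equivalent monetary value of emissions reduction: ${equivalent_value:,.0f}")  # $60,000
```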

Compared to direct methods, obtaining weights through tradeoff methods is more cognitively demanding for decision makers. Also, applying the method typically requires real-time computer support for calculations.

Indirect Methods

While direct methods demand precise estimates from decision makers that lead to precise weights, an alternative is to allow decision makers to express vagueness or uncertainty in their responses to weight assessment questions. An example is provided by the so-called balance beam method, described below.


Indirect Assessment Method Assessment Process Comments
Balance beam method [16] 1. Rank swings from most to least desirable,
2. Starting with the most desirable swing, express its desirability in terms of other swings by specifying equations. The equations may include equalities or inequalities using "=", "<", and ">" relationships, as well as sums and multiples.
3. Try, to the extent possible, to generate equality (=) relationships. This will make it easier to solve the equations at the end.
4. Continue generating equations so as to have enough to solve for the N unknown weights.
5. Assign the lowest level swing a desirability of "1", then solve the equations from bottom to top.
6. Wherever a single value cannot be assigned, choose the average (middle) value for it.
Sample balance beam relationships

The balance beam method allows decision makers to express relationships among the weights as equations, which may then be solved to obtain a set of weights for the model. The method continues to work well even for large numbers of performance measures. Importantly, the method can be used in situations where decision makers are reluctant to specify precise values for swings. Because any number of equality or inequality relationships can be generated, the method can provide consistency checks.

Indirect methods for weight assessment may in some situations be easier for decision makers than direct methods. Also, some argue that attempting to put a precise value on an inherently imprecise concept is inappropriate, misleading, and conveys a false sense of precision. Since the equations allow ranges of values to be identified for some weights, the method coordinates well with sensitivity analysis.
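
The snippet below sketches how a small set of hypothetical balance-beam relationships might be solved from the bottom up and then normalized; the swing labels and equations are illustrative only.

```python
# Minimal sketch of solving balance-beam relationships from the bottom up.
# Hypothetical relationships of the kind described above:
#   swing A = swing B + swing D,  swing B = swing C + swing D,  C judged 1.5 times D.
w = {}
w["D"] = 1.0               # lowest-ranked swing assigned a desirability of 1
w["C"] = 1.5 * w["D"]      # judged: swing C is worth 1.5 times swing D
w["B"] = w["C"] + w["D"]   # balance-beam equation: B = C + D
w["A"] = w["B"] + w["D"]   # balance-beam equation: A = B + D

total = sum(w.values())
weights = {k: round(v / total, 3) for k, v in w.items()}
print(weights)   # {'D': 0.118, 'C': 0.176, 'B': 0.294, 'A': 0.412}
```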

Ranking (Ordinal) Methods

With these methods, decision makers need only rank the swings. Surrogate values for weights are then derived from the ranking.


Ranking Methods Assessment Process Comments
Rank order methods [17]. Variations use different formulas for converting rank positions into weights. 1. Rank swings in order of desirability.
2. Use a formula to convert each ordinal rank into a cardinal weight (in the equations below, N is the number of swings and ri indicates the rank assigned to the i'th performance measure swing):
a. For rank order centroid (ROC) weights [18], use:
wi = (1/N) × (1/ri + 1/(ri + 1) + ... + 1/N)
b. For rank reciprocal (RR) weights [17], use:
wi = (1/ri) / (1/1 + 1/2 + ... + 1/N)
c. For rank sum (RS) weights [18], use:
wi = (N + 1 − ri) / [N(N + 1)/2]
d. For rank exponent weights [19], use:
wi = (N + 1 − ri)^z / [(N + 1 − r1)^z + (N + 1 − r2)^z + ... + (N + 1 − rN)^z]
The parameter z affects the dispersion in the weights.
Ranks are much easier and quicker to generate than weights, and it is easier to get people to agree about ranks, so these methods work well for groups [20]. However, results capture decision maker judgments with less precision than direct or tradeoff methods because they ignore any information the subject might provide beyond how the swings rank [18].
The equations for converting ranks to weights have been criticized for producing weights that, if plotted from high to low, drop off more quickly than weights obtained from direct methods; in other words, the weights for lower-ranked swings may be too low. The rank sum weights are typically flatter than ROC, rank reciprocal, or ratio weights. At least one study found rank reciprocal to be more accurate than rank sum [18], but another suggests that if knowledge of weights is highly limited, then rank sum weights are better [21]. The z parameter in the equation for rank exponent weights controls how quickly the plot of weights from high to low drops off. If z = 0 the weights are all equal; if z = 1 the result is rank sum weights. The rank exponent method can thus be thought of as a generalization of the rank sum method.
There is really no theoretical justification for any of the recommended conversion formulas, except, perhaps, for ROC weights, which derive from statistical arguments and the assumption that the probability distributions for "true weights" are uniformly distributed within regions defined by the rank positions.
An issue for these methods is how to deal with the judgment that two or more swings have equal rank. If the judgment that swings should be ranked equally can't be accommodated that is, obviously, a serious disadvantage.

Ranking methods are good options for situations where weights must be obtained quickly from many subjects who might be unwilling or unable to devote the greater effort needed to provide cardinal information for computing weights.

Of the conversion methods, ROC has gained most recognition. The weight assessment method proposed by Edwards and Barron, called SMARTER (SMART Exploiting Ranks), is ranking with ROC [11]. SMARTER is presented by the authors as an improvement to SMART because it does not force subjects to provide more difficult cardinal judgments.
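
The sketch below implements the rank-to-weight conversion formulas listed in the table, using hypothetical ranks for four swings.

```python
# Sketch of rank-to-weight conversion (ROC, rank sum, rank reciprocal);
# ranks are hypothetical, with rank 1 = most desirable swing.
N = 4
ranks = {"A": 1, "B": 2, "C": 3, "D": 4}

def roc(r, N):        # rank order centroid
    return sum(1.0 / k for k in range(r, N + 1)) / N

def rank_sum(r, N):   # rank sum
    return 2.0 * (N + 1 - r) / (N * (N + 1))

def rank_recip(r, N): # rank reciprocal
    return (1.0 / r) / sum(1.0 / k for k in range(1, N + 1))

for name, r in ranks.items():
    print(f"{name}: ROC={roc(r, N):.3f}  RS={rank_sum(r, N):.3f}  RR={rank_recip(r, N):.3f}")
# ROC weights for N = 4: 0.521, 0.271, 0.146, 0.062
```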

Mixed Ordinal-Cardinal Methods

Direct methods for assessing weights demand from decision makers sufficient cardinal judgments to allow the computation of precise values for the complete set of weights. At the other extreme, ranking methods provide no opportunity for decision makers to provide cardinal judgments. Between these two extremes, mixed ordinal-cardinal methods allow decision makers an opportunity to input cardinal judgments while not demanding all of the cardinal judgments needed to compute a precise value for every weight.

Two examples of ordinal-cardinal methods are presented below: CROC (Cardinal Rank Ordering of Criteria) [24] and Simos' method [25]. The version of Simos' method described here is sometimes referred to as the improved Simos method, because it includes an improvement provided by Figueira and Roy [26]. A challenge for mixed ordinal-cardinal methods is providing an efficient mechanism for enabling decision makers to express cardinal judgments. Both CROC and the Simos method use visual cues for expressing cardinal judgments. CROC is facilitated by a software package that allows the user to express degrees of preference and associated levels of uncertainty using a slider. With Simos' method, users express degrees of preference using white and colored cards.


Mixed Methods Recommended Assessment Process Comments
CROC [24]
1. Rank the swings from most to least desirable
2. Assign 100 points to the most desired swing and a number of points to the least desired swing to indicate their relative degrees of desirability.
3. The software provides a graphic illustrating preference values for the swings and their uncertainties (weights are regarded as uncertain since they may lie anywhere within the range defined by their ranks). Given rank positions only, the ranges are assumed to be equal-sized regions located within the range defined by the minimum and maximum point values assigned in Step 2.
4. Using sliders, the user adjusts the location of each region along the line to better represent estimated levels of desirability. Likewise, the user adjusts the size or length of each region to represent the user's degree of confidence in that level of preference.
5. When the user is satisfied with the graphic depiction, weights are computed assuming that "true values" are uniformly distributed within their defined regions.
Sample CROC graphic

The figure is similar to a display provided by the software as depicted in a paper providing an example application [24]. "Preference clouds" for the "capacity" and "accessibility" swings overlap, indicating the user is somewhat unclear about their relative preference ranks. In contrast, the clouds for "accessibility" and "cost" are distinct, indicating the decision maker is confident in their respective rank positions. If the user makes no changes to the preference clouds as initially depicted (i.e., the user enters no cardinal information), the calculated weights are identical to those produced by ranking with ROC.
SIMOS [25, 26] 1. Each subject is provided a set of colored and white cards for use in visually depicting degrees of preference. Each colored card has the name of a performance measure inscribed on it, along with its corresponding objective and a description of the swing defined for that measure. Blank white cards are also provided.
2. Subjects order their cards according to how desirable they perceive each swing, from least to most preferred.
3. Subjects clarify degrees of preference between successively ranked swings by inserting white cards between the colored cards. The number of inserted white cards is proportional to the magnitude of difference in preference for the considered swings.
4. Subjects answer the question "How many times more desirable is the top ranked swing than the lowest ranked swing?"
5. The positioning of the cards and the number of intervening white cards define a set of coupled algebraic equations for the weights. Specifically, the number of cards and the ratio between the desirability of the top-ranked swing and that of the lowest-ranked swing define a constant value difference, u, between consecutive cards. Each white card inserted between consecutive colored cards adds a preference increment, so that two white cards, for example, mean that the preference difference is 3 × u. The equations are solved to provide precise values for weights, or, if desired, ranges of values can be provided for weights whose values can be varied while remaining true to the equations.
Simos' method, like CROC, has been applied to a wide range of project selection and prioritization problems. It's quick, and participants generally like the visual aid provided by the cards. The method works well with groups of participants, either multiple decision makers or external stakeholders, who can be encouraged to discuss the results of ranking before proceeding to the more cognitively demanding task of inserting white cards to convey differences in the level of preference. Also like CROC, if subjects provide no cardinal information (no white cards are interspersed into the ranking), the method provides ROC weights. The figure below depicts how cards may be arranged to reflect a subject's preferences.

Sample arrangement of cards for the Simos method

CROC and the Simos method extend ROC ranking in a way that allows but does not force subjects to express weakly held or vague feelings about their preferences. Because cardinal information may be added to ordinal results produced by ranking, the methods have the potential to incorporate into the project selection model more accurate and comprehensive assessments of the preferences of decision makers. People seem to like communicating about preferences using the similar graphical means adopted by the two methods. Because the methods allow imprecise representations of preference, they can lessen the reluctance that some decision makers feel toward revealing their true preferences.
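
The following sketch computes weights from a hypothetical card arrangement using the increment logic described above; the cards, white-card counts, and the ratio z are all made-up inputs.

```python
# Hedged sketch of the (revised) Simos card procedure described above.
# Cards are listed from least to most preferred; whites[r] is the number of
# white cards inserted after position r. All inputs are hypothetical.
cards = ["schedule", "unit_cost", "safety", "market_share"]   # least -> most preferred
whites = [0, 2, 1]          # white cards between consecutive colored cards
z = 6.0                     # "top swing is 6 times as desirable as the bottom swing"

# Each gap contributes (whites + 1) increments; z fixes the size u of one increment.
gaps = [w + 1 for w in whites]
u = (z - 1.0) / sum(gaps)

raw = [1.0]                 # least preferred swing gets a raw score of 1
for g in gaps:
    raw.append(raw[-1] + g * u)

total = sum(raw)
weights = {name: round(r / total, 3) for name, r in zip(cards, raw)}
print(weights)              # the most preferred card ends up with raw score z = 6
```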

Pairwise Comparisons Using Scales Defined with Qualitative Phrases

Pairwise comparison is a popular weight assessment method that has been shown, through many successful applications, to work well for most subjects. Subjects are able to discern and express relatively small differences in preferences using this method [29]. Each item is compared with every other item to determine which is preferred and by how much.

The usual approach to pairwise comparison calls for providing a scoring scale, with scale levels defined using common qualitative phrases like "slightly preferred", "moderately preferred", and so forth. Such scales are often referred to as semantic scales. Providing a semantic scoring scale can greatly ease the subjects' task of providing pairwise comparison judgments. While some decision makers will not be able to provide weights by direct assignment or point allocation, conducting weight assessments through a pairwise comparison method with a semantic scoring scale will almost always be successful.

Most of the example scoring scales that you'll find for pairwise comparisons are designed to obtain judgments of the relative "importance" of the pairs being considered. Accordingly, and rightly so, pairwise comparison methods are often criticized because they allow for different interpretations of scores due to the lack of any quantitative basis for measuring importance. However, simply replacing the term "importance" with "value" in the scale definitions, as I've done in these examples, mitigates these criticisms. Estimating the relative value of alternative swings remains a difficult task; however, unlike importance, value has a precise meaning and there are natural measures for quantifying it. And, as argued on previous pages, value maps exactly and directly to preference.

Consistent with most descriptions of pairwise comparison, I've not included ranking as the first step for conducting the process. However, as I describe below, my approach is always to begin by asking decision makers to rank swings in order of desirability. Ranking is a relatively easy task, and incorporating it into the assessment process provides an opportunity for decision makers to discuss differing viewpoints prior to attempting the more difficult task of quantifying degrees of preference.


Pairwise Comparison Methods Assessment Process Comments
Pairwise comparison [36] (pairwise comparison scoring scale) In its basic form, pairwise comparison is the ratio method with the estimated ratios obtained as scores and organized into a matrix.
1. Select two swings and determine which is more valuable.
2. Using a scoring scale, select the score that best describes how much more desirable the more preferred of the two swings is.
3. Repeat the above two steps until all pairs of swings have been scored.
4. Construct a pairwise comparison matrix.
5. To get the weights, sum the entries in each row and then normalize.
Pairwise comparison methods require the decision maker to provide judgments expressed as preference ratios for pairs of items. The preference ratios are entered into a pairwise comparison matrix, D, such that element dij indicates preference for swing i relative to swing j. Only the top right side of the matrix needs to be filled in because, for consistency, dji must equal 1/(dij). The matrix D is called a positive reciprocal matrix.
Conducting more than the minimum number of comparisons (N - 1) provides the opportunity to check the consistency of the decision maker's judgments. If the pairwise assessments are completely consistent, then
dij = dik·dkj for all i, j, and k. If possible, inconsistencies should be pointed out and resolved by the decision maker. Still, it is common to find inconsistencies in empirically derived comparison matrices. If inconsistencies are large, then one of the two methods below may be used to provide a "work around."
If the pairwise comparison matrix is consistent, calculating weights from pairwise comparisons is simple. The weights can be obtained by normalizing the entries from any row or column of D.
The main weakness of this version of pairwise comparison is the use of a semantic scoring scale. Different decision makers in different contexts will interpret the words differently. Also, if desirability (value) is not linear in the scores, as is likely the case for the illustrated scoring scale, errors will be introduced unless the single attribute value functions are properly defined to translate each score into the relative value of that score.
AHP [27] (AHP nine-point scale) This is the technique for obtaining weights prescribed by the popular analytic hierarchy process (AHP), developed originally in the 1970s by Thomas Saaty.
1. Select two swings and identify the one that is more desirable
2. Assign a score indicating how much more desirable (important) the selected swing is
3. Repeat the above two steps until all pairs have been scored
4. Construct a pairwise comparison matrix
5. Use AHP's eigenvalue method to obtain weights.
6. If inconsistencies aren't too large, you can obtain the weights as above for a consistent matrix: use the average of the normalized weights obtained from each column (or row).
7. Compute the AHP consistency ratio. As a rule of thumb, if the consistency ratio is less than 0.10, then consistency in the pairwise comparisons should be regarded reasonable.
The central issue for pairwise comparison methods is how to deal with inconsistencies in the pairwise comparison matrix, D. AHP obtains weights by computing the principal eigenvector of D.
As with the example pairwise comparison method above, AHP restricts the pairwise comparison judgments by requiring subjects to select a score from a discrete scoring scale. In the case of AHP, a nine-point scoring scale is specified. As several authors have commented, restricting judgments to a nine-point scale almost guarantees that pairwise consistency won't be obtained [30]. Moreover, the use of verbal terms in general may be criticized because words can have very different meanings for different people.
For most applications, subjects find having nine levels in the scoring scale is more than adequate for expressing judgments. Sometimes, the scale is simplified to include fewer scale levels. For example, a scale based on the descriptions for levels 1,2,4,7, and 9 is sometimes used to signal that less precision in responses is acceptable.
The referenced AHP consistency measure is based on a result from matrix algebra applicable to positive reciprocal matrices. A positive reciprocal matrix is consistent if and only if λmax = N, where λmax is the largest eigenvalue for the matrix and N is the matrix dimension. If the matrix is not consistent, then λmax > N.
Saaty selected the difference (λmax - N) as the basis for judging consistency. Specifically, Saaty's consistency index, denoted CI, is the ratio (λmax - N)/(N - 1). To measure consistency, the CI for the assessed matrix is compared with the average CI computed from randomly generated reciprocal matrices, denoted RI. As a rule of thumb, Saaty says that if CI/RI < 0.1, a reasonable level of consistency has been obtained. If the ratio is greater than 10%, the consistency level is poor and the decision maker's judgments should be revised [30].
Extremal pairwise methods [28]. This is a class of pairwise comparison methods that accommodate inconsistencies in the comparison matrix D by choosing weights that minimize the "distance" between the wi weights and the corresponding dij elements. Different ways of defining distance correspond to the different versions of the extremal methods.
The most popular is logarithmic least squares (LLS), which chooses the weights that minimize the sum of the squared differences between the logarithms of the pairwise comparisons and the logarithms of the corresponding weight ratios [43]:

minimize Σi Σj [ln(dij) - ln(wi/wj)]²

Requiring the logarithms of the wi weights to sum to zero corresponds to a multiplicative form of the normalization equation:

w1 × w2 × ... × wN = 1

This formulation has a particularly simple solution. The optimal wi weights are proportional to the geometric means of the rows of the comparison matrix [37]:

wi ∝ (di1 × di2 × ... × diN)^(1/N)
In other words, the set of weights that minimizes the log least squares distance can be simply obtained by:
a. Calculating the geometric mean of each row of the matrix, and then
b. Normalizing by computing the sum of the geometric means and dividing each row's geometric mean by that sum [39].
Note that with the log least squares method it is not the original pairwise comparison matrix that is approximated but, rather, the matrix whose elements are the logarithms of the dij.
As with AHP, a scoring scale is typically used to simplify the decision maker's task of providing ratio judgments for the comparison matrix. For example, a 1-to-10 scale similar to the following is often suggested:

Ten level scoring scale

Be aware that there is little justification for selecting weights that minimize the sum of the squared logarithmic differences other than that, with this measure of distance, the solution is particularly simple to calculate.
The most natural measure of distance is Euclidean distance, and this measure is used by the direct least squares (DLS) method:

minimize Σi Σj (dij - wi/wj)²
The problem with this formulation is that the resulting nonlinear optimization problem is difficult to solve analytically and can have multiple solutions. The definition of distance adopted by the weighted least squares (WLS) method produces a convex, quadratic optimization problem [40]:

minimize Σi Σj (dij wj - wi)²
The above has a unique solution, which can be found by solving the set of linear equations obtained by setting the derivatives of the quadratic objective to zero, subject to the constraint that the weights sum to one [41].
Though DLS and WLS utilize more natural concepts of distance, the logarithmic least squares (LLS) method described to the left trades away naturalness in the definition of distance in exchange for a solution that is easy to obtain.
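
The sketch below illustrates the geometric-mean (LLS) calculation and Saaty's consistency ratio for a small, hypothetical comparison matrix; the random index value of 0.58 for a 3×3 matrix is taken from Saaty's published tables.

```python
import numpy as np

# Sketch: weights from a pairwise comparison matrix via the geometric-mean (LLS)
# solution, plus Saaty's consistency ratio. The matrix entries are hypothetical.
D = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])
N = D.shape[0]

# Logarithmic least squares: geometric mean of each row, then normalize.
geo_means = D.prod(axis=1) ** (1.0 / N)
weights = geo_means / geo_means.sum()
print("weights:", np.round(weights, 3))        # approx [0.648, 0.230, 0.122]

# Saaty consistency ratio: CI = (lambda_max - N) / (N - 1), compared to the
# random index RI (RI = 0.58 for N = 3 in Saaty's published tables).
lam_max = max(np.linalg.eigvals(D).real)
CI = (lam_max - N) / (N - 1)
CR = CI / 0.58
print(f"consistency ratio: {CR:.3f}")          # < 0.10 indicates acceptable consistency
```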

Assessing Risk Tolerance

The assessment of weights involves determining the organization's willingness to trade off advancements in the achievement of its various objectives. If project decisions affect the level of risk faced by the organization, it is also necessary to establish the organization's willingness to trade off the expected achievement of its objectives in order to reduce risk. Establishing the organization's willingness to accept risk requires determining the levels of risk involved in project choices, which in turn requires a model for simulating risk. The process for quantifying risk and the organization's willingness to accept it is the topic of the next part of this paper.

References

  1. J. S. Dyer and R. K. Sarin, "Measurable Multiattribute Functions," Operations Research, 27:810, 1979.
  2. G. W. Fischer, "Range Sensitivity of Attribute Weights in Multiattribute Value Models," Organizational Behavior and Human Decision Processes, 62(3), 252-266, 1995.
  3. C. L. Hwang and K. Yoon, Multiple Attribute Decision Making: Methods and Applications a State-of-the-Art Survey (Vol. 186), Springer Science & Business Media, 2012.
  4. E. U. Choo, B. Schoner, and W. C. Wedley, "Interpretation of Criteria Weights in Multicriteria Decision Making," Computers & Industrial Engineering, 37(3), 527-541, 1999.
  5. R. Keeney and H. Raiffa, Decisions with Multiple Objectives, J. Wiley & Sons, New York, 1976.
  6. D. von Winterfeldt and W. Edwards, Decision Analysis and Behavioral Research, New York: Cambridge Univ. Press, 1986.
  7. V. Belton and T.Stewart, Multiple Criteria Decision Analysis: An Integrated Approach, Springer Science & Business Media, 2002.
  8. M. W. Jones-Lee, G. Loomes, and P. R. Philips, "Valuing the Prevention of Non-fatal Road Injuries: Contingent Valuation vs. Standard Gambles," Oxford Economic Papers, 676-695, 1995.
  9. P. A. Bottomley and J. R. Doyle, "A Comparison of Three Weight Elicitation Methods: Good, Better, and Best," Omega, 29(6), 553-560, 2001.
  10. W. Edwards, "How to Use Multiattribute Utility Measurement for Social Decision Making," IEEE Transactions on Systems, Man and Cybernetics, SMC-7, 326-340, 1977.
  11. W. Edwards and F. H. Barron, "SMARTS and SMARTER: Improved Simple Methods for Multiattribute Utility Measurement," Organizational Behavior and Human Decision Processes, 60, 306-325, 1994.
  12. T. Solymosi and J. A. Dombi, "A Method for Determining the Weights of Criteria: The Centralized Weights," European Journal of Operational Research,26, 35-41, 1986.
  13. M. Weber and K. Borcherding, "Behavioral Influences on Weight Judgments in Multiattribute Decision Making," European Journal of Operational Research, 67(1), 1-12, 1993.
  14. K. Borcherding, S. Schmeer, and M. Weber, "Biases in Multiattribute Weight Elicitation," Contributions to Decision Making,, Elsevier, Amsterdam, 1995.
  15. R. L. Keeney, "Common Mistakes in Making Value Trade-offs," Operations Research, 50(6), 935-945, 2002.
  16. S. R. Watson and D. M. Buede, Decision Synthesis: The Principles and Practice of Decision Analysis, Cambridge University Press, 1987.
  17. R. Roberts and P. Goodwin, "Weight Approximations in Multi-Attribute Decision Models," Journal of Multi-Criteria Decision Analysis, 11(6), 291-303, 2002.
  18. F. H. Barron and B. E. Barrett, "Decision Quality Using Ranked Attribute Weights," Management Science, 42(11), 1515-1523, 1996.
  19. J. Malczewski, GIS and Multicriteria Decision Analysis, John Wiley & Sons, 1999.
  20. R. T. Eckenrode, "Weighting Multiple Criteria," Management Science, 12(3), 180-192, 1965.
  21. J. Jia, "Attribute Weighting Methods and Decision Quality in the Presence of Response Error: A Simulation Study" (Doctoral dissertation), The University of Texas at Austin, 1997.
  22. R. Roberts and P. Goodwin, "Weight Approximations in Multi-attribute Decision Models," Journal of Multi-Criteria Decision Analysis, 11(6), 291-303, 2002.
  23. G.W. Fischer, “Range Sensitivity of attribute Weights in Multiattribute Value Models,” Organizational Behavior and Human Decision Processes, 62(3), 252–266, 1995.
  24. M. Danielson, L. Ekenberg, A. Larsson, and M. Riabacke, "Weighting Under Ambiguous Preferences and Imprecise Differences in a Cardinal Rank Ordering Process," International Journal of Computational Intelligence Systems, 7(sup1), 105-112, 2014.
  25. J. Simos, "L'Evaluation Environmentale: Un Processus Cognitif Négocié" (Doctoral dissertation), 1989.
  26. J. Figueira and B. Roy, "Determining the Weights of Criteria in the ELECTRE Type Methods with a Revised Simos' Procedure," European Journal of Operational Research, 139, 317-326, 2002.
  27. T. L. Saaty, Fundamentals of Decision Making and Priority Theory with the Analytic Hierarchy Process, RWS Publications, Pittsburgh, 1994.
  28. B. Golany and M. Kress, "A Multicriteria Evaluation of Methods for Obtaining Weights from Ratio-Scale Matrices," European Journal of Operational Research, 69(2), 210-220, 1993.
  29. J. van Til, C. Groothuis-Oudshoorn, M. Lieferink, J. Dolan, and M. Goetghebeur, "Does Technique Matter; A Pilot Study Exploring Weighting Techniques for a Multi-Criteria Decision Support Framework," Cost Effectiveness and Resource Allocation, 12(1), 2014.
  30. J. Barzilai and F. A. Lootsma, "Power Relations and Group Aggregation in the Multiplicative AHP and SMART," Journal of Multi-Criteria Decision Analysis, 6(3), 155-165, 1997.
  31. T. L. Saaty, The Analytic Hierarchy Process, McGraw Hill, New York, 1980.
  32. G. L. Lee and E. W. Chan, "The Analytic Hierarchy Process (AHP) Approach for Assessment of Urban Renewal Proposals," Social Indicators Research, 89(1), 155–168, 2008.
  33. J. Barzilai, "Deriving Weights from Pairwise Comparison Matrices," Journal of the Operational Research Society, 48, 1226-1232, 1997.
  34. P. Meier and B. F. Hobbs, Energy Decisions and the Environment: A Guide to the use of Multicriteria Methods, Kluwer Academic Publishers, Dordrecht, 2000.
  35. H. F. Barron and H. B. Person, "Assessment of Multiplicative Utility Functions via Holistic Judgments," Organizational Behavior and Human Performance, 24, 147-166, 1979.
  36. B. Golany and M. Kress, "A Multicriteria Evaluation of Methods for Obtaining Weights from Ratio-Scale Matrices," European Journal of Operational Research, 69(2), 210-220, 1993.
  37. G. Crawford and C. Williams, "A Note on the Analysis of Subjective Judgment Matrices," Journal of Mathematical Psychology, 29, 387-405, 1985.
  38. P. A. De Jong, "A Statistical Approach to Saaty's Scaling Method for Priorities," Journal of Mathematical Psychology, 28, 467-478, 1984.
  39. T. L. Saaty and L. G. Vargas, "Inconsistency and Rank Preservation," Journal of Mathematical Psychology, 28(2), 205-214. 1984.
  40. E. Blankmeyer, "Approaches to Consistency Adjustments," Journal of Optimization Theory and Applications, 54, 479-488, 1987.
  41. P. De Jong, "A Statistical Approach to Saaty's Scaling Method for Priorities" Journal of Mathematical Psychology, 28, 467-478, 1984.
  42. P. X. Nutt, "Comparing Methods for Weighting Decision Criteria," OMEGA, 163-172, 1980.
  43. G. Crawford and C. Williams, "A Note on the Analysis of Subjective Judgment Matrices," Journal of Mathematical Psychology, 29(4), 387-405, 1985.

