Manipulability of comparative tests


Edited by Avinash K. Dixit, Princeton University, Princeton, NJ, and approved January 30, 2009 (received for review December 17, 2008)


Abstract

Multiple self-proclaimed experts claim that they know the probabilities of future events. A tester does not know the odds of future events, and she also does not know whether, among the multiple experts, there are some who do know the relevant probabilities. So the tester requires each expert to announce, before any data are observed, the probabilities of all future events. A test either rejects or does not reject each expert based on the observed data and the profile of probabilities announced by the experts. We assume that the test controls for the type I error of rejecting the true probabilities. However, consider the case in which all experts are uninformed (i.e., they do not know anything about the true probabilities). We show that they can still independently produce false forecasts that are both likely to pass the test, no matter how the data evolve in the future. Hence, the data may not suffice to effectively discredit uninformed, but strategic, experts.

Keywords: comparative tests, manipulability, strategic expert

Forecasting plays a vital role in human activity. Consumers, managers, and politicians make their decisions in part based on their anticipation of future events. In economics, it is often assumed that agents decide as if the relevant probabilities were known. In actuality, the probabilities of key variables such as inflation rates, stock indexes, or election outcomes are difficult to determine. The difficulty of properly anticipating future events may encourage decision makers to seek experts' advice. The main problem, however, is that professional forecasts may not be reliable. If an expert is informed (i.e., he knows the relevant odds), then he can reveal the relevant probabilities to decision makers. However, if an expert is uninformed (i.e., he knows nothing about the relevant odds), then he may mislead the decision makers. A fundamental question is therefore how to determine whether experts are informed.

One way to evaluate experts' predictive abilities is to compare the forecasts of their competing theories with the observed data. Some authors (see, for example, refs. 1 and 2) compare the performance of weather forecasts coming from different sources. Examples like this abound. The idea that data can be used to compare theories is commonplace in scientific and management practice. This motivates a search for tests that compare the forecasts of self-proclaimed experts (and the observed data) to determine whether there are any informed experts among them. One concern, which can be traced back at least to ref. 3, is that if experts are tested, they may misrepresent what they know in order to sustain a false reputation of knowledge of the relevant odds.

Consider the following setup. In each period, nature selects an outcome, which can be either 1 or 0, with a probability that may change over time. We refer to the function mapping past histories of data to nature's probability of 1 next period as the true theory. An arbitrary function mapping past data to a probability of 1 is simply called a theory. A tester does not know the true theory, but two experts claim that they do. The tester also does not know whether either expert is informed about the relevant probabilities, and so she tests them. A test takes, as input, the two experts' theories and the data and returns, as output, a verdict for each theory that can be either “reject” or “not reject.”

A test defines the following contract. An expert who accepts the contract must deliver a theory before any data are observed. He obtains a positive payoff (e.g., compensation for his service) at period zero; but if his theory is rejected, then he incurs a negative payoff (e.g., a loss of professional reputation). An expert who rejects the contract receives zero payoff.

Suppose that a test controls for the type I error of rejecting the true theory, and a contract is based on such a test. Then an informed expert who announces the true theory is unlikely to incur the negative payoff of having his theory rejected. So informed experts accept the contract. Therefore, if an expert refuses the contract, he reveals to the tester that he is uninformed. Our main result shows, however, that all uninformed experts who know nothing about the relevant odds will also accept the contract and thereby sustain a false reputation of knowledge.† At the heart of our argument is the demonstration that if the test is likely to pass the true theory, then uninformed experts can strategically produce potentially false theories that are also likely to pass the test, no matter how the data evolve in the future. This result shows a difficulty in determining whether, among several experts, there are some who are informed about the relevant probabilities. The true theory cannot be inferred from the data, and the data may not suffice to effectively discredit potentially uninformed, but strategic, experts.

Our model can be extended in different ways. For example, in addition to reputational concerns, the experts may also be ideologues who advocate particular theories; they therefore receive additional payoffs if certain theories are announced. Our substantive point still holds: whenever informed experts accept a contract, uninformed experts also accept it.

This article is organized as follows: In sections 2 and 3, we present our concepts. Section 4 follows with the results. An example in section 5 shows the difficulties that uninformed experts must overcome to pass some tests. Section 6 concludes. The proofs are in the supporting information (SI) Appendix.

1. Literature Review

A number of papers show impossibility results on the testing of a single, potentially strategic expert (see refs. 4–9, 12, 18, 19). The assumption of a single expert leaves open the possibility that the result is an artifact of a single theory being tested. In scientific and management practice, data are often used to compare competing theories. Testing multiple experts is seemingly quite different from testing a single expert. Because different uninformed experts who act independently may produce different theories, the test may determine, depending on the data, that one theory outperformed the other, leading to the rejection of at least one theory.‡ Uninformed experts must overcome this hurdle if they are all to sustain a false reputation of knowledge of the relevant odds. Earlier results suggest a fundamental difference between testing a single expert and testing multiple experts. In ref. 11, the authors provide a test for multiple experts that may reject uninformed experts. However, this test also rejects some theories out of hand. Thus, the set of theories rejected without being tested may contain the true theory. So the test in ref. 11 may not control for the type I error of rejecting the true theory. Our result demonstrates that the manipulability of tests depends crucially on whether some theories are rejected without being tested, and less crucially on comparing competing theories.

2. Tests

In each period t = 1,2,…, an outcome ωt, which can be either 0 or 1, is observed. (It is simple to extend the results to finitely many possible outcomes in each period.) A t-history ht = (ω1,…,ωt−1) ∈ {0,1}^{t−1} comprises the outcomes observed by the start of period t (i.e., before the outcome of period t is observed). These are the outcomes from period 1 to period t − 1. Conditional on any t-history ht ∈ {0,1}^{t−1}, a theory f claims that outcome 1 will be observed in period t with probability f(ht).

To simplify the language, we identify a theory with its predictions. That is, we define a theory as an arbitrary function that takes as input any finite history and returns as output a probability of outcome 1 next period. Formally, a theory is a function

f : H∞ → [0,1],

where H∞ = ∪_{t=1}^{∞} {0,1}^{t−1} is the set of all finite histories. (We assume that {0,1}^0 = {h0}, where h0 denotes the empty (or null) history.)
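For concreteness, a theory can be represented in code as a plain function from finite histories to forecasts. The following Python sketch is ours, not part of the paper; the theories `always_half` and `trend_follower` are hypothetical examples.

```python
# Illustrative sketch only: a "theory" as a function that maps any finite
# history of 0/1 outcomes to the probability that the next outcome is 1.

from typing import Callable, Tuple

History = Tuple[int, ...]           # h_t = (w_1, ..., w_{t-1}); () is the empty history h_0
Theory = Callable[[History], float]

def always_half(h: History) -> float:
    """A theory that always forecasts 1 with probability 0.5."""
    return 0.5

def trend_follower(h: History) -> float:
    """A hypothetical theory: forecast the empirical frequency of 1s so far."""
    return 0.5 if not h else sum(h) / len(h)
```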

There are two experts (males) in this model. (It is simple to extend the results to any finite number of experts.) Before any outcome is observed (i.e., at period 0), each expert announces a theory. A tester (female) tests the theories. At any finite history, the tester must either reject or not reject each theory. A comparative test C is a function

C : F × F → H × H,

where F is the set of all theories and H is the set of all subsets of H∞.

A comparative test takes as input a pair of theories (f1,f2) and returns as output a pair of sets C1(f1,f2) ⊆ H∞ and C2(f1,f2) ⊆ H∞ consisting of finite histories. The set Ci(f1,f2), where i = 1 or i = 2, is the rejection set of theory fi. The histories in Ci(f1,f2) are interpreted as inconsistent with theory fi. So the tester rejects expert i's theory fi at period t if she observes data (i.e., a t-history ht) that belong to the rejection set Ci(f1,f2). In the opposite case, ht ∉ Ci(f1,f2), theory fi is not rejected at period t, although it may (or may not) be rejected next period.§

A comparative test rejects (or does not reject) a given theory depending on the observed data and on the other theory (i.e., the rejection set for expert i's theory depends on the entire profile of announced theories, not just on expert i's theory). For example, theory 1 may be rejected if, given the observed data, theory 1 was outperformed (in a sense the test specifies) by theory 2. Hereafter, to simplify the language, we refer to comparative tests simply as tests. In addition, if a theory is not rejected by the test at history ht ∈ H∞, we say that the theory passed the test at ht.
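The interface of a comparative test can likewise be sketched in code. The particular rule below, which at any history with at least 10 outcomes rejects the theory whose one-step forecasts had the larger mean squared error over the first 10 periods, is a made-up illustration of the interface, not a test proposed in the paper.

```python
# Illustrative sketch (names ours): a comparative test maps a pair of theories
# to one rejection-set membership predicate per expert.

from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]
InRejectionSet = Callable[[History], bool]   # membership test for C_i(f1, f2)

def mean_squared_error(f: Theory, h: History) -> float:
    """Average squared gap between f's one-step forecasts and the outcomes along h."""
    return sum((f(h[:t]) - outcome) ** 2 for t, outcome in enumerate(h)) / len(h)

def squared_error_test(f1: Theory, f2: Theory) -> Tuple[InRejectionSet, InRejectionSet]:
    def in_c1(h: History) -> bool:
        # Rejection sets are closed under extensions (footnote §): the verdict
        # depends only on the first 10 outcomes.
        return len(h) >= 10 and mean_squared_error(f1, h[:10]) > mean_squared_error(f2, h[:10])
    def in_c2(h: History) -> bool:
        return len(h) >= 10 and mean_squared_error(f2, h[:10]) > mean_squared_error(f1, h[:10])
    return in_c1, in_c2
```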

At period zero, the tester selects her test C. The two experts learn (and so know) the test selected by the tester; each expert then announces his theory at period zero (i.e., before any data are observed). A test defines the following contract: An expert who accepts the contract must deliver a theory at period 0. He receives a payment (i.e., a positive utility u > 0) at period zero, but if his theory is rejected in the future, then he incurs a loss (i.e., a disutility d > 0).¶ An expert who refuses the contract obtains zero payoff.

Outcomes are generated by some true theory. The true theory maps each finite history to nature's probability of 1 next period. Each expert can be either an informed expert, who knows and reports the true theory to the tester, or an uninformed expert, who knows nothing about the true theory. So, the theories produced by uninformed experts are potentially false (i.e., they need not coincide with the true theory). The tester does not know the true theory. The tester also does not know whether any of the experts are informed. In addition, the tester does not have a prior over the space of theories. So, both the tester and the uninformed experts face uncertainty about the data. (Economists often speak of uncertainty when the odds are unknown and refer to risk when the odds are known.) The tester hopes to learn the odds of future histories from informed experts. That is, she hopes to transform her uncertainty into common risk.

It is helpful to compare the present setting with the conventional Neyman–Pearson approach. Under the Neyman–Pearson approach, we consider two disjoint sets of probability distributions and test whether one of them contains the true probability distribution. Here, we consider two theories and test whether either of the two (possibly neither or both) coincides with the true theory.

3. Properties of Tests

Any theory f uniquely defines a probability of any set A ⊆ H∞ of finite histories. Indeed, the probability of outcome ωt contingent on history ht = (ω1,…,ωt−1), denoted by Pr(ωt | ht), is equal to f(ht) if ωt = 1 and to 1 − f(ht) if ωt = 0. The probability of a finite history hm = (ω1,…,ωm−1) is equal to the product

Pf(hm) = ∏_{t=1}^{m−1} Pr(ωt | ht)

of the conditional probabilities Pr(ωt | ht); and the probability of any set A ⊆ H∞, denoted by Pf(A), is equal to the sum of the probabilities of the individual finite histories that belong to A.∥

If f is the true theory, then for any set A ⊆ H∞, the true probability of set A is equal to Pf(A).
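As a worked illustration of these definitions (ours, not the paper's), the following sketch computes the probability of a finite history as the product of the one-step conditional probabilities implied by a theory.

```python
# Illustrative sketch: P_f(h_m) as the product of the conditional
# probabilities Pr(w_t | h_t) implied by a theory f.

from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]

def history_probability(f: Theory, h: History) -> float:
    """P_f(h): probability the theory f assigns to observing the finite history h."""
    p = 1.0
    for t, outcome in enumerate(h):
        q = f(h[:t])                      # forecast for period t+1 given the first t outcomes
        p *= q if outcome == 1 else 1.0 - q
    return p

# Example: a theory that always forecasts 1 with probability 0.7 assigns
# probability 0.7 * 0.3 * 0.7 = 0.147 to the history (1, 0, 1).
iid_07: Theory = lambda h: 0.7
assert abs(history_probability(iid_07, (1, 0, 1)) - 0.147) < 1e-12
```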

Definition 1: Fix ɛ ∈ [0,1]. A test C passes the true theory with probability 1 − ɛ if, for any pair of theories (f1,f2) ∈ F × F,

Pfi(Ci(f1,f2)) ≤ ɛ for i = 1 and i = 2. [3.1]

Suppose that expert 1 is informed and announces the true theory. So, f1 is the true theory. Then, no matter which theory expert 2 announces, Eq. 3.1 ensures that the true theory f1 is likely to pass the test. The odds that f1 will pass the test are computed under Pf1 (i.e., under nature's true theory). Analogously, if expert 2 is informed and announces the true theory, then his theory is likely to pass the test, no matter which theory expert 1 announces.
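One way to make Definition 1 concrete is to estimate the type I error of a candidate test by Monte Carlo simulation: generate data from the true theory f1 and record how often f1's rejection set is reached. The sketch below assumes the rejection-set interface of the earlier sketches; all names are ours.

```python
# Illustrative sketch: Monte Carlo estimate of P_{f1}(C_1(f1, f2)), the
# probability that the true theory f1 is rejected when data are generated by f1.

import random
from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]
InRejectionSet = Callable[[History], bool]
Test = Callable[[Theory, Theory], Tuple[InRejectionSet, InRejectionSet]]

def estimate_type_one_error(test: Test, f1: Theory, f2: Theory,
                            horizon: int = 50, trials: int = 2000,
                            seed: int = 0) -> float:
    """Fraction of f1-generated histories (up to `horizon`) on which f1 is rejected."""
    rng = random.Random(seed)
    in_c1, _ = test(f1, f2)
    rejections = 0
    for _ in range(trials):
        h: History = ()
        rejected = False
        for _ in range(horizon):
            h = h + (1 if rng.random() < f1(h) else 0,)
            if in_c1(h):                  # rejection sets are closed under extensions
                rejected = True
                break
        rejections += rejected
    return rejections / trials
```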

Now consider the contract defined by a test that is likely to pass the true theory. If expert i (where i = 1 or i = 2) is informed, accepts the contract, and announces the true theory, then he obtains the expected payoff u − dPfi(Ci(f1,f2)). Hence, if Eq. 3.1 holds and ɛ is small enough, this expected payoff is strictly positive, and informed experts accept the contract. The tester knows that informed experts accept such a contract; thus, an expert who refuses the contract reveals to the tester that he is uninformed.

Suppose that none of the experts is informed. The question is whether they will reject the contract and reveal themselves to be uninformed, or instead accept the contract and confound the tester. By definition, uninformed experts do not know the probabilities of future outcomes. They must decide whether to accept the contract without knowing the exact odds of rejection. Consider a test such that, for any given theory, there are data that reject it (i.e., it is feasible to reject any given theory). If an expert announces his theory deterministically, then for some data his payoff will be u − d. As long as the penalty for rejection exceeds the reward for announcing a theory, i.e., as long as d > u, the payoff of an uninformed expert may be positive or negative. For uninformed experts, the probability of a negative payoff is unknown; it can be anything from 0 to 1. In addition, if u is small and d is large, then uninformed experts receive either a large punishment or a small reward, with completely unknown odds. Hence, if the uninformed experts are sufficiently averse to uncertainty (i.e., if they are sufficiently averse to smaller payoffs with unknown odds), then they are better off rejecting the contract rather than accepting it and announcing any theory deterministically. It therefore seems that the tester can avoid receiving the theories of uninformed experts, at least if these experts are very uncertainty-averse. However, the uninformed experts have one remaining recourse. They can randomize (but only once, before any data are observed) and select their theories according to this randomization.

Each expert may randomize when selecting his theory at period 0. Let a random generator of theories ζ be a probability distribution over the set F of all theories. The set of all random generators of theories is denoted by ΔF. The possibility of selecting theories at random may at first seem redundant, since one might think that a mixture of theories is, for present purposes, again a theory. However, we will show that randomization radically changes the prospects of the uninformed experts. The possibility of selecting theories at random is important for the main result of this article.

The experts must randomize independently of one another. If expert 1 randomizes by using the random generator of theories ζ1, and expert 2 randomizes by using the random generator of theories ζ2, then the joint probability distribution over pairs of selected theories (f1,f2) ∈ F × F is the product measure ζ1 × ζ2 of the measures ζ1 and ζ2. Independence rules out producing theories contingent on signals observable to both experts; it also rules out collusion (e.g., a situation in which the experts always announce identical theories).

The independence assumption is not meant to be realistic. Rather, it is an extreme case in which our result holds; the result therefore also holds under milder and more realistic conditions in which the experts can produce theories contingent on public signals. The inability to access a correlating device may pose the following difficulty for uninformed experts: If they must randomize independently, then they may produce different theories. The test may determine, depending on the data, that one theory outperformed another, leading to the rejection of at least one theory. This is a hurdle the uninformed experts must clear if they are all to sustain a false reputation of knowledge by passing the test simultaneously.

For i = 1 and i = 2 and a history ht ∈ H∞, let the revelation set

Ri(ht) = {(f1,f2) ∈ F × F : ht ∈ Ci(f1,f2)}

denote the set of pairs of theories such that, if announced, the theory of expert i will be rejected at history ht.

Given the experts' randomization devices ζ1 and ζ2 and a finite history ht, expert i selects a theory that will be rejected at ht with probability (ζ1 × ζ2)(Ri(ht)). That is, (ζ1 × ζ2)(Ri(ht)) is the probability of the revelation set Ri(ht). This is the probability of rejection at history ht computed under the odds given by the experts' randomization devices. A test can be ignorantly passed if both experts can produce theories at period zero (perhaps at random, but independently of each other) that are both unlikely to be rejected, no matter which data are eventually observed (i.e., the probability of every revelation set Ri(ht) is small).**

Definition 2: Fix ɛ ∈ [0,1]. A test C can be ignorantly passed with probability 1 − ɛ if there exists a pair of independent random generators of theories (ζ1,ζ2) such that, for i = 1 and i = 2 and all histories ht ∈ H∞,

(ζ1 × ζ2)(Ri(ht)) ≤ ɛ. [3.2]

If a test can be ignorantly passed, then both experts can randomly select theories, independently of one another, such that the theory selected by each expert will be rejected only with small probability (no higher than ɛ), no matter how the data unfold in the future. (After the pair of theories is drawn, some data may reject them. However, given any data, the randomization devices are unlikely to draw theories that will be rejected on these data.)
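The probability (ζ1 × ζ2)(Ri(ht)) can likewise be approximated by sampling the two generators independently. The sketch below is illustrative only; it treats a random generator of theories simply as a function that draws one theory, and all names are ours.

```python
# Illustrative sketch: estimating (zeta1 x zeta2)(R_i(h_t)), the probability
# that expert i's randomly drawn theory is rejected at a fixed history h_t,
# when the two experts randomize independently.

import random
from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]
InRejectionSet = Callable[[History], bool]
Test = Callable[[Theory, Theory], Tuple[InRejectionSet, InRejectionSet]]
Generator = Callable[[random.Random], Theory]   # zeta: draws one theory at random

def estimate_revelation_probability(test: Test, zeta1: Generator, zeta2: Generator,
                                    h: History, expert: int,
                                    samples: int = 2000, seed: int = 0) -> float:
    """Fraction of independent draws (f1, f2) for which h lies in C_expert(f1, f2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        f1, f2 = zeta1(rng), zeta2(rng)           # independent draws at period 0
        rejection_sets = test(f1, f2)
        hits += rejection_sets[expert - 1](h)
    return hits / samples
```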

Consider a contract based on a test. Assume that both experts are uninformed and produce theories with the random devices ζ1 and ζ2. Then the expected utility of each expert i = 1, 2, computed at history ht, is u − d(ζ1 × ζ2)(Ri(ht)). As long as Eq. 3.2 holds with ɛ sufficiently small, the expected payoff of both experts (with a contract) is strictly positive for any realization of the data. So, both experts accept the contract. Thus, if a contract is based on a test that can be ignorantly passed, then both uninformed experts accept the contract, produce potentially false theories with devices designed to pass the test, and do not reveal that they are uninformed. This follows because, if a test can be ignorantly passed, the odds of rejection can be bounded by strategic randomization.

We conclude this section with an additional condition on tests, which is of a more technical nature. Two theories f and f˜ are equivalent until period m if f(ht) = f˜(ht) for all t-histories ht with t < m. So, two theories are equivalent until period m if they make the same predictions up to period m. (The predictions up to period m must be identical contingent on all past histories, not only on the observed histories.)
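Because equivalence until period m quantifies over all histories of length less than m (not only the observed one), it can be checked by brute-force enumeration, as in the following sketch (ours).

```python
# Illustrative sketch: check whether two theories are equivalent until period m,
# i.e., make identical forecasts at every t-history with t < m.

from itertools import product
from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]

def equivalent_until(f: Theory, g: Theory, m: int, tol: float = 0.0) -> bool:
    """True if f(h) == g(h) (up to tol) at every history of length 0, ..., m-2."""
    for length in range(m - 1):           # a t-history has t - 1 outcomes, so t < m means length <= m - 2
        for h in product((0, 1), repeat=length):
            if abs(f(h) - g(h)) > tol:
                return False
    return True
```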

Definition 3: A test C is future-independent if, for any two pairs of theories (f1,f2) and (f˜1,f˜2) such that f1 and f˜1 are equivalent until period m, and f2 and f˜2 are also equivalent until period m,

ht ∈ Ci(f1,f2) if and only if ht ∈ Ci(f˜1,f˜2)

for i = 1 and i = 2, and for all t-histories ht with t < m.

A test is future-independent if the possibility that any expert's theory is rejected at period m depends only on the data observed up to period m and the predictions made by the theories of both experts up to period m.††

4. Main Result

Proposition 1. Fix any ɛ ∈ [0,1] and δ ∈ (0,1−ɛ]. Suppose that a test C is future-independent, and passes the true theory with probability 1−ɛ. Then test C can be ignorantly passed with probability 1 − ɛ − δ.

Proposition 1 shows that both experts can select theories at random (randomizing independently of one another) in such a way that both are likely to pass the test. This holds even though neither expert has any idea of how the data will unfold in the future. Even in the worst possible realization of the data, the theories selected by the experts are unlikely to be rejected, provided that the true theory itself is unlikely to be rejected and the test is future-independent.

4.1. Informal Description of the Proof of Proposition 1.

Suppose, first, that one of the experts, say expert 1, selects his theory at random, using a random generator of theories ζ1. Now consider the following zero-sum game between nature and expert 2: nature's pure strategy is an infinite sequence of outcomes. Expert 2's pure strategy is a theory. Expert 2's payoff is 1 if his theory is never rejected and 0 otherwise; nature's objective is to minimize this payoff. Both nature and the expert are allowed to randomize. A mixed strategy of nature is a probability measure over the space of infinite histories of outcomes. So, a mixed strategy of nature is associated with a theory.

If the test passes the true theory with probability 1−ɛ, then the expected payoff of expert 2 is 1−ɛ (assuming he announces the true theory). In game-theoretic language, this means that for every mixed strategy of nature, there is a pure strategy for expert 2 (to announce the true theory) that gives him an expected payoff of 1−ɛ or higher. So, if the conditions of Fan's celebrated minmax theorem (see ref. 15) were satisfied, there would exist a (mixed) strategy ζ2 for expert 2 that ensures him an expected payoff arbitrarily close to 1−ɛ, no matter what strategy nature picks (and, in particular, no matter which data nature picks).

The assumptions of Fan's minmax theorem require the expert's strategy space to be compact and nature's payoff function to be lower semicontinuous with respect to the expert's strategy. These two conditions are not simultaneously satisfied for all tests. Some additional properties of the test must be assumed, and this is the reason for assuming future independence. We restrict the set of expert 2's pure strategies to theories that make a forecast, in each period t, from a finite set of predictions Rt ⊂ [0,1]. This new pure-strategy space is compact if endowed with the product of discrete topologies. As a result, the new mixed-strategy space of expert 2 is compact if endowed with the weak-* topology. Moreover, nature's payoff function is lower semicontinuous with respect to expert 2's strategy.‡‡

The informal argument above delivers the following result: for every strategy ζ1 of expert 1, there exists a strategy ζ2 of expert 2 that ensures him an expected payoff arbitrarily close to 1−ɛ, no matter what strategy nature picks. Similarly, for every strategy ζ2 of expert 2, there exists a strategy ζ1 of expert 1 that ensures him an expected payoff arbitrarily close to 1−ɛ, no matter what strategy nature picks. Applying the Glicksberg–Kakutani fixed-point theorem (see ref. 16), one can show the existence of a pair of independent random generators of theories (ζ1,ζ2) that ensure each expert an expected payoff arbitrarily close to 1−ɛ, no matter what strategy nature picks. This implies that for any history ht, the (ζ1 × ζ2)-probabilities of the revelation sets R1(ht) and R2(ht) are not much higher than ɛ.

Moreover, the proof of Proposition 1 establishes an even stronger result. The assumption that the test passes the true theory with probability 1−ɛ can be replaced with the following weaker assumption: if any expert i knows the true theory and correctly anticipates the mixed strategy of the other expert, then expert i himself has a mixed strategy ζi that ensures that he will pass the test with probability close to 1−ɛ. More precisely, the weaker assumption says that for any i ∈ {1,2} and ζj ∈ ΔF, where j ≠ i, there exists ζi ∈ ΔF such that

Eζ1×ζ2(Xi) ≤ ɛ,

where Xi : F × F → [0,1] is the random variable that maps any pair of theories (f1,f2) into the probability Pfi(Ci(f1,f2)) that theory fi is rejected, and Eζ1×ζ2 denotes the expected-value operator associated with ζ1 × ζ2.

4.2. Extensions of the Main Result.

So far, we have assumed that the experts are concerned only about their reputation. That is, the experts want to be perceived as knowing the relevant odds. In this section, we extend the basic results to accommodate additional motivations the experts may have. For example, in addition to their reputation concerns, the experts may be ideologues who want to advocate particular theories. To capture this idea, we define direct payoffs for the experts over the announced theories.

Let U1 : F × F → R and U2 : F × F → R be two continuous utility functions. (We equip F, which is the Cartesian product of countably many copies of [0,1], with the product topology. See footnote **). So, Ui(f1,f2), for i = 1 and i = 2, is interpreted as the direct utility that expert i obtains if theories (f1,f2) are announced.

If expert i is informed, accepts the contract based on a test, and announces the true theory, his expected payoff is

u − d Pfi(Ci(f1,f2)) + Ui(f1,f2). [4.1]

So, as in section 3, informed experts accept the contract if Eq. 4.1 is strictly positive for any pair of theories (f1,f2).

Suppose now that both experts are uninformed. They must decide whether to accept the contract without knowing the exact odds of rejection. However, as long as they can ensure that their payoffs (with a contract) are strictly positive for any possible realization of the data, they will certainly accept the contract. So, uninformed experts accept the contract if there exists a pair of independent random generators of theories (ζ1,ζ2) ∈ Δ(F) × Δ(F) such that

u − d(ζ1 × ζ2)(Ri(ht)) + Eζ1×ζ2(Ui) [4.2]

is strictly positive for any history ht.

Claim 1: If informed experts accept the contract, then uninformed experts also accept the contract.

The proof of Claim 1 is the same as the proof of Proposition 1. As we note in section 4.1, Proposition 1 follows from Fan's minmax theorem and the Glicksberg–Kakutani fixed-point theorem. The continuity and linearity of the uninformed experts' payoff functions are the critical conditions that allow the use of these two results. These conditions hold whether the uninformed experts' payoffs are given by u − d(ζ1 × ζ2)(Ri(ht)) (i.e., the experts are motivated solely by reputation concerns) or by Eq. 4.2 (i.e., the experts are motivated by both reputation concerns and ideology).

In general, a contract may specify a (discounted or undiscounted) flow of payoffs. (The assumption that the experts do not discount the future is not necessary in this article. The undiscounted case seems, however, more interesting because it imposes no exogenous constraints on the use of the data.) At each period, the payoff of each expert may depend on the history of outcomes observed so far and on the theories the two experts announced at period zero. General contracts may be able to accommodate a wide variety of motivations the experts may have. For the same reason mentioned above, Claim 1 still holds: as long as the perfectly informed experts accept a contract, the completely uninformed experts also accept it.

The substantive point of Claim 1 is as follows: The tester does not know the type of each expert (informed or uninformed), but each expert knows his own type. The tester cannot infer the experts' types from their choices to accept or reject the contract. Nor do the data alleviate the tester's adverse-selection problem, because the test on which the contract is based is unlikely to reject the theories of the uninformed experts.

Claim 1 holds under the assumption that the tester knows the experts' payoffs. So, the tester is unsure whether any of the experts have relevant knowledge about the probabilities of future events, but she is not unsure of the experts' motivations. In a more realistic model, the tester may be unsure about the experts' knowledge and also about the experts' ideological biases. Our results do not say anything about whether data may help the tester screen between biased and unbiased experts of unknown ideologies. This is an interesting problem that is outside the scope of this article. However, it seems (to us) rather implausible that a tester would be able to determine whether some experts are informed when she does not know the experts' motivations, given that she is unable to do so when she does know their motivations.

5. Example

We now provide an example showing that the simultaneous manipulation of tests, although possible, may not be an easy task. The example will also be helpful in explaining the relation between this article and the existing literature on testing strategic experts. To define our test, we need some auxiliary concepts.

Pick a positive number k. We say that theory fi k-outperforms theory fj at history ht if

Pfi(ht) ≥ k · Pfj(ht),

i.e., history ht is at least k times more likely according to theory fi than according to theory fj.
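A direct transcription of this definition in code (ours; the history-probability helper from the earlier sketch is restated so the block is self-contained):

```python
# Illustrative sketch: f_i k-outperforms f_j at history h if
# P_{f_i}(h) >= k * P_{f_j}(h).

from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]

def history_probability(f: Theory, h: History) -> float:
    p = 1.0
    for t, outcome in enumerate(h):
        q = f(h[:t])
        p *= q if outcome == 1 else 1.0 - q
    return p

def k_outperforms(fi: Theory, fj: Theory, h: History, k: float) -> bool:
    """True if history h is at least k times more likely under fi than under fj."""
    return history_probability(fi, h) >= k * history_probability(fj, h)
```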

Pick a number η ∈ (0,1] and a natural number r. Given a history ht, let hs, where s ≤ t, be the s-history whose outcomes coincide with the first s − 1 outcomes of ht. A theory fi is (η,r)-similar to a theory fj at history ht, where r ≤ t, if

|fi(hs) − fj(hs)| ≤ η [5.1]

for all s = 1,…,t except at most r of them. That is, there exist at most r periods s such that Eq. 5.1 does not hold.

In other words, the predictions of theories fi and fj along history ht differ by at most η in all but at most r periods. Informally, if η is small and r is much smaller than t, then the predictions of (η,r)-similar theories along t-histories are, most of the time, close to one another.
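The similarity notion can be transcribed the same way; the sketch below (ours) counts the prefixes of ht at which the two forecasts differ by more than η.

```python
# Illustrative sketch: f_i is (eta, r)-similar to f_j at the history h if
# |f_i(h_s) - f_j(h_s)| <= eta at all prefixes h_s of h except at most r of them.

from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]

def is_similar(fi: Theory, fj: Theory, h: History, eta: float, r: int) -> bool:
    """Count the prefixes of h at which the two forecasts differ by more than eta."""
    violations = sum(1 for s in range(len(h) + 1) if abs(fi(h[:s]) - fj(h[:s])) > eta)
    return violations <= r
```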

Given any theory f and γ ∈ (0,0.5], let f― be the alternative theory defined by

f―(ht) = f(ht) + γ if f(ht) ≤ 0.5, and f―(ht) = f(ht) − γ if f(ht) > 0.5.

So, the forecasts of theories f and f― differ by γ. When f forecasts 1 with probability no greater than 0.5, the forecast of f― (for 1) adds γ to the forecast of f. When f forecasts 1 with probability greater than 0.5, the forecast of f― subtracts γ from the forecast of f. Theory f― can be interpreted as an alternative theory constructed by the tester.
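A sketch of the tester's alternative theory, as we read the definition:

```python
# Illustrative sketch: the tester's alternative theory f_bar, which shifts each
# forecast of f by gamma toward the other side of 0.5.

from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]

def alternative_theory(f: Theory, gamma: float) -> Theory:
    """f_bar(h) = f(h) + gamma if f(h) <= 0.5, and f(h) - gamma otherwise."""
    def f_bar(h: History) -> float:
        p = f(h)
        return p + gamma if p <= 0.5 else p - gamma
    return f_bar
```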

Fix numbers k > 1, η ∈ (0,1], γ ∈ (0,0.5], and positive natural numbers r and m. We define test C˜ as follows: The rejection set C˜i(f1,f2) consists only of m-histories.§§ Theory fi of expert i is rejected at history hm if:

The theory fi does not k-outperform the alternative theory f―i at hm;

or if

The theory fj of expert j ≠ i is not (η,r)-similar to theory fi at history hm, and theory fi does not 1-outperform theory fj at hm.

Informally, test C˜ requires a theory to k-outperform the alternative theory constructed by the tester. It also requires the theory to 1-outperform the other expert's theory in the case in which the two experts' theories are not similar. (We refer the reader to ref. 17 for a literature on prediction that is indirectly related to the ideas in this article.)
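Putting the pieces together, here is a compact sketch of test C˜ as we read it; helper functions are restated so the block is self-contained, and all names are ours. The extension of rejection sets to histories longer than m (footnote §§) is omitted for brevity.

```python
# Illustrative sketch of test C~ (our reading): expert i's theory f_i is rejected
# at an m-history h if (1) f_i does not k-outperform the tester's alternative
# theory, or (2) the rival theory f_j is not (eta, r)-similar to f_i at h and
# f_i does not 1-outperform f_j at h.

from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]

def history_probability(f: Theory, h: History) -> float:
    """P_f(h): product of one-step conditional probabilities along h."""
    p = 1.0
    for t, outcome in enumerate(h):
        q = f(h[:t])
        p *= q if outcome == 1 else 1.0 - q
    return p

def k_outperforms(fi: Theory, fj: Theory, h: History, k: float) -> bool:
    return history_probability(fi, h) >= k * history_probability(fj, h)

def is_similar(fi: Theory, fj: Theory, h: History, eta: float, r: int) -> bool:
    violations = sum(1 for s in range(len(h) + 1) if abs(fi(h[:s]) - fj(h[:s])) > eta)
    return violations <= r

def alternative_theory(f: Theory, gamma: float) -> Theory:
    return lambda h: f(h) + gamma if f(h) <= 0.5 else f(h) - gamma

def c_tilde_rejects(fi: Theory, fj: Theory, h: History,
                    k: float, eta: float, r: int, gamma: float, m: int) -> bool:
    """Does the m-history h (len(h) == m - 1) lie in expert i's rejection set?"""
    if len(h) != m - 1:
        return False                      # verdicts are reached only at m-histories
    f_bar = alternative_theory(fi, gamma)
    fails_condition_1 = not k_outperforms(fi, f_bar, h, k)
    fails_condition_2 = (not is_similar(fj, fi, h, eta, r)
                         and not k_outperforms(fi, fj, h, 1.0))
    return fails_condition_1 or fails_condition_2
```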

Condition 1 (by itself) defines a likelihood test, which has been studied in ref. 12. Condition 2 adapts to our setting the idea behind a test studied in ref. 10. The authors of ref. 10 show that if the tester knows that one expert has announced the true theory (but she does not know which expert), and the other expert has announced a theory that is not (η,r)-similar to the true theory, then, with sufficiently large datasets, the tester is eventually able to detect (with high probability) which expert has announced the true theory. This can be achieved by selecting the expert whose theory 1-outperforms the theory of the other expert. In contrast, in our setting the tester does not know whether any expert is informed. So the tester cannot rule out the possibility that both experts are uninformed.

By proposition 2 in ref. 12 and proposition 1 in ref. 10, it follows that for any k, γ, and ɛ > 0, there are natural numbers m and r such that test C˜ passes the true theory with probability 1−ɛ. It is also easy to see that test C˜ is future-independent. It now follows from Proposition 1 that, for these m and r, test C˜ can be ignorantly passed with probability 1−ɛ. We now argue that it is not an easy task for both experts to ignorantly pass this test.

Fix an expert. No matter which theory f he announces, it will not k-outperform the alternative theory on all datasets. Indeed, for some m-histories, the alternative theory f― 1-outperforms theory f.¶¶ So, if this expert announces any theory deterministically, then he will be rejected (by condition 1) at some histories. As a result, both experts have to randomize, with odds carefully designed to avoid rejection by condition 1.
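Footnote ¶¶ suggests how to construct such a history: at every period, pick the outcome to which f― assigns the (weakly) higher probability. A sketch of that construction, under our reading:

```python
# Illustrative sketch of footnote ¶¶: build an m-history on which the
# alternative theory f_bar 1-outperforms f, by choosing in every period the
# outcome that f_bar deems (weakly) more likely than f does. Along such a
# history, every conditional probability under f_bar is at least as large as
# under f, so P_{f_bar}(h) >= P_f(h).

from typing import Callable, Tuple

History = Tuple[int, ...]
Theory = Callable[[History], float]

def adversarial_history(f: Theory, f_bar: Theory, m: int) -> History:
    h: History = ()
    for _ in range(m - 1):                # an m-history has m - 1 outcomes
        # f_bar assigns more probability to outcome 1 exactly when f_bar(h) > f(h)
        h = h + (1 if f_bar(h) > f(h) else 0,)
    return h
```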

In addition, on any dataset, at least one expert fails the test (by condition 2) if their theories are not similar. Thus, if test C˜ is to be ignorantly passed, then the experts' theories have to be similar with high probability. However, if m is sufficiently large compared with r, one might expect it to be difficult for both experts to announce similar theories, since they have to randomize independently of each other. There therefore seems to be a potential conflict between conditions 1 and 2. Condition 1 requires theories to be selected at random with specific odds; condition 2 requires the selected theories to be similar to each other. Nevertheless, by Proposition 1, both experts can avoid rejection by both conditions by randomizing independently. It may be worth noting that Proposition 1 only shows the existence of randomization devices designed to pass the test. How to construct such devices is beyond the scope of this article.

6. Conclusion

There is an incompatibility between the following two basic properties of future-independent tests: (i) the test is unlikely to reject the true theory; (ii) the test cannot be ignorantly passed. Despite earlier research suggesting a difference between testing a single theory and testing competing theories, this incompatibility result holds for the simultaneous testing of multiple experts. Moreover, the result can be extended to accommodate several motivations that experts may have in practice.

Acknowledgments

We thank the editor and referee for helpful comments. This work was supported by the National Science Foundation.

Footnotes

1 To whom correspondence should be addressed. E-mail: wo@northwestern.edu

Author contributions: W.O. and A.S. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0812602106/DCSupplemental.

† As we argue in section 4.2, the substantive point of our result holds for general contracts in which the experts' motivations may go beyond avoiding rejection.

‡ In ref. 10, the authors show that, if it is known that some of the experts are informed, then (under some conditions) the data may be able to identify which experts are informed.

§ Rejection sets are assumed to be such that if a theory is rejected at a history ht = (ω1,…,ωt−1), then it is also rejected at all extensions of ht, i.e., at all histories hm = (ω′1,…,ω′t−1,ω′t,…,ω′m−1) such that m > t and (ω1,…,ωt−1) = (ω′1,…,ω′t−1).

¶ For expositional simplicity, we assume that the experts do not discount the future. That is, the disutility d does not depend on the period in which their theories are rejected. However, all our arguments remain valid in the discounted case.

∥ Any theory f uniquely defines a probability distribution on the set {0,1}∞ of infinite histories. So, one can interpret a theory as a probability distribution on {0,1}∞, parametrized by the conditional probabilities Pr(ωt | ht).

** This definition requires the set F × F to be equipped with a σ-algebra and the provision that, for i = 1 and i = 2 and any ht ∈ H∞, the revelation set Ri(ht) is measurable with respect to that σ-algebra.

The set of all theories F is the set of functions from a countable set to [0,1]. If we equip [0,1] with the σ-algebra of Borel sets, then F inherits the product Borel structure. Similarly, F × F inherits the product Borel structure of two copies of F. All tests C in this article are assumed to be such that the revelation sets Ri(ht) are measurable with respect to this σ-algebra.

†† In several cases, the tester can use only future-independent tests. This is true, for example, in the case in which the experts claim that they will know the probability that 1 occurs at period t + 1 no earlier than at period t. We refer the reader to ref. 12 for a detailed discussion of future independence.

The assumption of future independence can be dispensed with if (unlike the model in this article) the tester has bounded datasets or the experts discount future payoffs. In general, future independence can be relaxed, but not completely dispensed with. We refer the reader to refs. 13 and 14 for future-dependent tests that are likely to pass the true theory and cannot be ignorantly passed.

‡‡ If the set of expert 2's pure strategies is restricted, then we may no longer have the property that, for every mixed strategy of nature, there is a strategy for expert 2 that gives him a payoff of 1−ɛ or higher. However, an additional step in our proof shows that this property is preserved for properly chosen sets of predictions Rt ⊂ [0,1].

§§ This statement must be modified, of course, to the effect that, as we assumed in section 2 (see footnote §), every rejection set automatically includes histories ht with t > m whose first m − 1 outcomes coincide with the outcomes of a history that belongs to C˜i(f1,f2). The fact that the rejection sets consist of m-histories (and their extensions) ensures that test C˜ satisfies the required measurability provision (see footnote **).

¶¶ To see an example of such a history, take in each period s = 1,…,m an outcome that is more likely according to f―i than according to fi.

Freely available online through the PNAS open access option.

© 2009 by The National Academy of Sciences of the USA

References

1. Jolliffe IN, Stephenson DB (2003) Forecast Verification: A Practitioner's Guide in Atmospheric Science (Wiley, New York).
2. Baars JA, Mass CF (2005) Performance of National Weather Service forecasts compared to operational, consensus, and weighted model output statistics. Weather Forecasting 20:1034–1047.
3. Brier G (1950) Verification of forecasts expressed in terms of probability. Mon Weather Rev 78:1–3.
4. Foster D, Vohra R (1998) Asymptotic calibration. Biometrika 85:379–390.
5. Fudenberg D, Levine D (1999) An easier way to calibrate. Games Econ Behav 29:131–137.
6. Lehrer E (2001) Any inspection rule is manipulable. Econometrica 69:1333–1347.
7. Sandroni A, Smorodinsky R, Vohra R (2003) Calibration with many checking rules. Math Oper Res 28:141–153.
8. Sandroni A (2003) The reproducible properties of correct forecasts. Int J Game Theory 32:151–159.
9. Vovk V, Shafer G (2005) Good randomized sequential probability forecasting is always possible. J R Stat Soc Ser B 67:747–763.
10. Al-Najjar N, Weinstein J (2008) Comparative testing of experts. Econometrica 76:541–559.
11. Feinberg Y, Stewart C (2008) Testing multiple forecasters. Econometrica 76:561–582.
12. Olszewski W, Sandroni A (2008) Manipulability of future-independent tests. Econometrica 76:1437–1466.
13. Dekel E, Feinberg Y (2006) Non-Bayesian testing of a stochastic prediction. Rev Econ Stud 73:893–906.
14. Olszewski W, Sandroni A. A nonmanipulable test. Ann Stat, in press.
15. Fan K (1953) Minimax theorems. Proc Natl Acad Sci USA 39:42–47.
16. Glicksberg IL (1952) A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points. Proc Am Math Soc 3:170–174.
17. Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge Univ Press, Cambridge, UK).
18. Olszewski W, Sandroni A (2009) Strategic manipulation of empirical tests. Math Oper Res 34:57–70.
19. Shmaya E (2008) Many inspections are manipulable. Theor Econ 3:367–382.