That Decision Theory Problem

Last week I posed a (perhaps imperfectly remembered) problem from Nick Kiefer’s course on Decision Theory. I’m very sorry that I haven’t found time to work out a complete solution (or even to read carefully through all the comments). Today I’ll post some hints from my notes toward a solution. Warning One: Only the math nerds will care about this post. Warning Two: This has all been double checked, but none of it’s been triple checked. It could be wrong.

The goal is to guess Nick’s secret number, which is drawn randomly from a uniform distribution on the numbers from 0 to 100. Each day, he draws a new “Number of the Day” and tells you whether it’s larger or smaller than the secret number.

Suppose we’ve gone several days into the semester, and Nick has announced “smaller” a total of n times and “larger” a total of k times. If you submit your optimal guess at this point, you pay a penalty equal to your the squared difference between your guess and the right answer. If I’ve done this correctly, the expected value of that penalty is

(I assume that readers divide into two camps—those who don’t care about how I got this and those who would prefer to figure it out for themselves. So I’ll omit that part of the argument.)

If you wait one more day to submit your answer, the probability you’ll hear another “smaller” is (1+n)/(2+k+n) and the probability you’ll hear another “smaller” is (1+k)/(2+k+n). From this, we compute that the expected reduction in the penalty is

This reduction is part of the benefit of waiting another day. The rest of the benefit is that if you wait, then you acquire the option of waiting again.

You also pay a second penalty equal to the log of the number of days you wait to submit your answer. So waiting one more day increases that penalty by Log(n+k+1)-Log(n+k).

This cost and benefit are likely to be pretty close to equal after (roughly) 2500 days are so, at which point the full benefit of waiting (including the option value) still exceeds the cost, so it’s not yet time to submit your guess.

On the other hand, if we multiply the log-penalty by 10, then this measured cost and benefit are pretty close to equal after about 8 days, provided you happen to have heard “larger” and “smaller” an equal number of times. So in this more sensible version of the problem, you want to start thinking about submitting your guess somewhere around eight days in.

17 Responses to “That Decision Theory Problem”

1. 1 1 Jonathan Campbell

The option value of waiting is dependent on the # of days in the semester. The greater the # of days, the greater the value of waiting. The option value, I believe, must be solved for recursively, moving backwards from the last day of the semester.

2. 2 2 Steve Landsburg

Jonathan Campbell:

I suspect there’s a better way to deal with the option value — by defining a value function, proving that it’s a fixed point of an appropriate operator, and then finding that fixed point. But I haven’t done this.

3. 3 3 Jonathan Campbell

Surely even if you do that (something I don’t know how to do), though, you’ll find a result that agrees with the recursion solution, in which I found that the decision on any given day depends on the # of days in the semester. No?

4. 4 4 Ron

I can’t see how the decision could possibly depend on number of days
in the semester. Each day’s decision depends on answering the
question: “Does the value of the increased precision of a guess
tomorrow outweigh the penalty for waiting another day?”

As far as I can see, that answer is independent of number of days in
the semester. If the semester has 1,000,000 days, the function
doesn’t change any from a semester with 100 days. In fact, the only
way I can see where number of days in the semester makes a
difference is if the value function still calls for waiting longer
than the semester. In that case, you’re forced to make your guess
on the last day.

5. 5 5 Mike H

@Ron – this is only true if there are no local minima. Applying your principle would mean that, for example, you never wait for a term deposit to mature. If the maturity date is two or more days away, the benefit of waiting until tomorrow is negative (since inflation eats away at your principal), however the true optimal solution is to wait until maturity, when you get to collect the principal plus interest (assuming it was sensible to open the account in the first place).

However, for this decision theory problem, I think there *are* no local minima. The length of the semester comes in as an unstated rule of the game – at some point, the head of department asks the lecturer to submit the assignment results. If you are a student, you don’t want to have it recorded that you never submitted your assignment. This is what you said.

I still need to think about the ‘option value’ to see if I agree with Steve, but I’m suspicious of it.

6. 6 6 Mike H

Ok, yes, I see it now. That makes it quite a complicated problem.

7. 7 7 Steve Landsburg

Ron: The more information you have, the less is the value of additional information.

If you’ve already got enough information to be 99% sure that the secret number is between 5 and 6, then extra info is worth far less than it is at the beginning of the semester when all you know is that it’s between 0 and 100.

8. 8 8 Thomas Bayes

Ron:

“Does the value of the increased precision of a guess
tomorrow outweigh the penalty for waiting another day?”

There is a better question to ask: “Does the value of the increased precision I can expect to attain with a guess sometime after today outweigh the penalty for making a guess sometime in the future.”

The difference between the two questions is that the expected cost for a guess tomorrow is relatively easy to compute. The expected cost for making a guess sometime in the future is a harder thing to compute, and will depend on how many days are left in the semester.

9. 9 9 Jonathan Campbell

Steve:
I don’t see how your response addresses Ron’s question. Your response seems to justify the argument that it is sometimes worthwhile to guess before the last day of the semester. This is separate from the argument that whether it is worthwhile to wait on any given day is dependent on the total # of days in the semester (and thus the # of days left). In order to justify this latter argument, you need to show that, for example, assuming you are 99% sure that the secret # is between 5 and 6, you should be more eager for extra information if there are 1000 days left in the semester than if there are 10 days left in the semester. That is, I believe, true, but not obvious.

10. 10 10 Steve Landsburg

Jonathan Campbell (and Ron):

You are right, and I apologize. Ron wrote:

I can’t see how the decision could possibly depend on number of days
in the semester.

I, in a brain fog, read:

I can’t see how the decision could possibly depend on number of days
since the beginning of the semester

and responded accordingly. Ron was of course exactly right.

11. 11 11 Jonathan Campbell

Steve:
You said “Ron was of course exactly right.”
Do you mean to say that it does not matter how many days are left in the semester when you decide whether to guess or not?
That seems untrue to me. If it is the day before the last day of the semester, then the only benefit of waiting is that you have a lower expected penalty upon guessing tomorrow. There is no “option of waiting again.” However, if it is the 3rd to last day, there is a nonzero “option of waiting again,” thus you are more likely to want to wait. And the same basic reasoning suggests that the option value is higher the more days you have left in the semester (all else equal).

12. 12 12 Steve Landsburg

Jonathan Campbell:

Yes, I need to correct myself again. The number of days remaining does matter, because it affects the option value of waiting, just as you say.

13. 13 13 Ron

Here’s the way I analyze the problem.

You have two curves, one descending, one ascending.
First one: expected penalty due to inaccuracy of guess.
This goes down as number of samples (S) increases.
Second one: penalty for number of samples. This goes
up as number of samples (S) increases.

Unconstrained conditions: make your guess at the S
value where the two curves cross.

Constrained conditions: if the number of possible
samples is less than where the two curves cross,
make your guess as close as you can come to the
optimum. That means the highest possible value of
S that you can get.

What would modify the constrained selection
criterion: if one of the curves, say the penalty
per sample, was an odd shape, for instance- a
descending sinusoidal curve. Then it would make
sense to select a local minimum near the max S.

In the given problem, I don’t see any odd-shaped
curves that would give local minima. Neither do
I see anything that would affect the shape of
the curves given a known limit to S. Therefore,
if max permitted S is less than the optimum S
value, choose the highest you are permitted.
This is because, for every permitted value of S,
an increase in S reduces the accuracy penalty more
than it increases the waiting penalty.

Where have I gone wrong? Does something I haven’t
allowed for change the shape of the curves?

14. 14 14 Steve Landsburg

Ron: The number of days left in the semester affects the location of the first curve.

First of all, your first curve moves every day depending on whether you’ve heard “larger” or “smaller”. For example, on day 10, the expected penalty from guessing is much lower when you’ve heard ten “smallers” (and so can be quite sure that the secret number is very small) than when you’ve heard five of each.

But more importantly —- the expected penalty from guessing must be calculated relative to what you expect will happen in the future. For example, if you expect that tomorrow I will slip up and tell you what the secret number is, then the expected penalty from guessing (relative to what you’d get by waiting) is much higher than if you don’t expect me to slip up.

The number of days remaining in the semester affects the options you expect to have in the future and therefore impacts the location of the curve, just as an expectation of a future slip-up would.

15. 15 15 Ron

Steve:
Got it; thanks!

16. 16 16 Jonathan Campbell

Here is some code in the scripting language R that generates all the n,k pairs (n = # of days elapsed, k = # of “lowers”) when you should guess rather than wait. The nmax parameter is total # of days in the semester.

———————
# total number of days in semester
nmax <- 100

# max value of k, where k-1 = # of "lowers" heard
kmax <- nmax+1

# matrix giving E[penalty | n,k], assuming optimal play
EP <- matrix(0,nmax,kmax)

# output matrix showing n,k combos where you should guess
guess <- matrix(0,1,2)

for (n in nmax:1){
for (k in 1:(n+1)){
EP[n,k] <- 1000*(k)/(n+2)*((k+1)/(n+3)-(k)/(n+2))+log(n)
if (n != nmax) {
EPwait <- (k)/(n+2)*EP[n+1,k] + (n-k+2)/(n+2)*EP[n+1,k+1]
if (EPwait < EP[n,k]){
EP[n,k] <- EPwait
} else {
guess <- rbind(guess,c(n,k-1))
}
}
}
}
guess <- guess [-1,]
guess

17. 17 17 Marisol Perry

Here is some code in the scripting language R that generates all the n,k pairs (n = # of days elapsed, k = # of “lowers”) when you should guess rather than wait. The nmax parameter is total # of days in the semester. ——————— # total number of days in semester nmax <- 100 # max value of k, where k-1 = # of "lowers" heard kmax <- nmax+1 # matrix giving E[penalty | n,k], assuming optimal play EP <- matrix(0,nmax,kmax) # output matrix showing n,k combos where you should guess guess <- matrix(0,1,2) for (n in nmax:1){ for (k in 1:(n+1)){ EP[n,k] <- 1000*(k)/(n+2)*((k+1)/(n+3)-(k)/(n+2))+log(n) if (n != nmax) { EPwait <- (k)/(n+2)*EP[n+1,k] + (n-k+2)/(n+2)*EP[n+1,k+1] if (EPwait < EP[n,k]){ EP[n,k] <- EPwait } else { guess <- rbind(guess,c(n,k-1)) } } } } guess <- guess [-1,] guess

Comments are currently closed.