Last week I posed a (perhaps imperfectly remembered) problem from Nick Kiefer’s course on Decision Theory. I’m very sorry that I haven’t found time to work out a complete solution (or even to read carefully through all the comments). Today I’ll post some hints from my notes toward a solution. Warning One: Only the math nerds will care about this post. Warning Two: This has all been double checked, but none of it’s been triple checked. It could be wrong.
The goal is to guess Nick’s secret number, which is drawn randomly from a uniform distribution on the numbers from 0 to 100. Each day, he draws a new “Number of the Day” and tells you whether it’s larger or smaller than the secret number.
Suppose we’ve gone several days into the semester, and Nick has announced “smaller” a total of n times and “larger” a total of k times. If you submit your optimal guess at this point, you pay a penalty equal to the squared difference between your guess and the right answer. If I’ve done this correctly, the expected value of that penalty is

10000(1+n)(1+k) / [(2+k+n)^2 (3+k+n)]
(I assume that readers divide into two camps—those who don’t care about how I got this and those who would prefer to figure it out for themselves. So I’ll omit that part of the argument.)
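For anyone in the second camp who wants something to check their work against, here is one sketch of how the expected penalty can be derived. It is a reconstruction under stated assumptions, not necessarily the argument from the course: I assume each Number of the Day is an independent uniform draw on 0 to 100, and the symbols S (the secret number), p = S/100 and \hat S (the guess) are introduced here purely for illustration.

```latex
% Assumption: S is the secret number, p = S/100, and each Number of the Day
% is an independent uniform draw on [0, 100], so P("smaller" | p) = p.
\begin{align*}
\Pr(n \text{ smaller},\, k \text{ larger} \mid p) &\propto p^{n}(1-p)^{k}, \\
p \mid \text{data} &\sim \mathrm{Beta}(n+1,\, k+1), \\
\mathbb{E}[p \mid \text{data}] &= \frac{n+1}{n+k+2}, \\
\operatorname{Var}(p \mid \text{data}) &= \frac{(n+1)(k+1)}{(n+k+2)^{2}(n+k+3)}, \\
\mathbb{E}\big[(S-\hat S)^{2} \mid \text{data}\big]
  &= 100^{2}\,\frac{(n+1)(k+1)}{(n+k+2)^{2}(n+k+3)},
  \qquad \hat S = 100\,\frac{n+1}{n+k+2}.
\end{align*}
```

Under squared-error loss the best guess is the posterior mean, so the expected penalty is just the posterior variance. As a sanity check, n = k = 0 gives 10000/12, the variance of a uniform draw on 0 to 100.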
If you wait one more day to submit your answer, the probability you’ll hear another “smaller” is (1+n)/(2+k+n) and the probability you’ll hear another “larger” is (1+k)/(2+k+n). From this, we compute that the expected reduction in the penalty is

10000(1+n)(1+k) / [(2+k+n)^2 (3+k+n)^2]
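A quick way to check this arithmetic is to do it in exact fractions. The sketch below assumes the expected penalty is the variance of a Beta(n+1, k+1) belief scaled by 100^2 (my reconstruction, so treat it as an assumption), averages tomorrow's penalty over the two possible announcements, and compares the result with a candidate closed form. All function names are hypothetical.

```python
from fractions import Fraction

SCALE = 100 ** 2  # the secret number lives on [0, 100]


def expected_penalty(n, k):
    """Assumed expected squared error after n 'smaller' and k 'larger':
    the variance of a Beta(n+1, k+1) belief, scaled to the 0-100 range."""
    a, b = n + 1, k + 1
    total = a + b
    return Fraction(SCALE * a * b, total ** 2 * (total + 1))


def prob_smaller(n, k):
    """Probability that the next announcement is 'smaller'."""
    return Fraction(n + 1, n + k + 2)


def expected_reduction(n, k):
    """Today's expected penalty minus tomorrow's, averaged over the
    two possible announcements."""
    p = prob_smaller(n, k)
    after = p * expected_penalty(n + 1, k) + (1 - p) * expected_penalty(n, k + 1)
    return expected_penalty(n, k) - after


def closed_form_reduction(n, k):
    """Candidate closed form for the expected one-day reduction."""
    total = n + k + 2
    return Fraction(SCALE * (n + 1) * (k + 1), total ** 2 * (total + 1) ** 2)


if __name__ == "__main__":
    for n, k in [(0, 0), (4, 4), (10, 3)]:
        print(n, k, expected_reduction(n, k), closed_form_reduction(n, k))
```

Using `Fraction` rather than floats means the two expressions either agree exactly or they don't, with no rounding to argue about.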
This reduction is part of the benefit of waiting another day. The rest of the benefit is that if you wait, then you acquire the option of waiting again.
You also pay a second penalty equal to the log of the number of days you wait to submit your answer. So waiting one more day increases that penalty by Log(n+k+1)-Log(n+k).
This cost and benefit are likely to be pretty close to equal after (roughly) 2500 days or so, at which point the full benefit of waiting (including the option value) still exceeds the cost, so it’s not yet time to submit your guess.
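Here is a rough sketch of that comparison in code, for anyone who wants to play with it. It assumes a natural log, an equal number of “larger” and “smaller” announcements, and a one-day reduction of 10000(1+n)(1+k) / [(2+k+n)^2 (3+k+n)^2], which is my reconstruction rather than something handed down from the course. It also ignores the option value of waiting, so it only locates where the measured cost first overtakes the measured benefit, not the day you should actually stop. The function names are made up for illustration.

```python
import math


def one_day_reduction(n, k):
    """Assumed expected drop in the squared-error penalty from waiting
    one more day, after n 'smaller' and k 'larger' announcements."""
    total = n + k + 2
    return 10000 * (n + 1) * (k + 1) / (total ** 2 * (total + 1) ** 2)


def one_day_cost(n, k):
    """Increase in the log-of-days penalty from waiting one more day."""
    days = n + k
    return math.log(days + 1) - math.log(days)


def first_day_cost_exceeds_benefit(max_days=10_000):
    """Scan even day counts with n = k = days / 2 and return the first
    one where the one-day cost exceeds the one-day reduction.
    Returns None if that never happens within max_days."""
    for days in range(2, max_days + 1, 2):
        n = k = days // 2
        if one_day_cost(n, k) > one_day_reduction(n, k):
            return days
    return None


if __name__ == "__main__":
    print(first_day_cost_exceeds_benefit())
```

With balanced counts the crossover lands in the neighborhood of 2500 days, because the cost shrinks like 1/(n+k) while the reduction shrinks like 2500/(n+k)^2.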
On the other hand, if we multiply the log-penalty by 10, then this measured cost and benefit are pretty close to equal after about 8 days, provided you happen to have heard “larger” and “smaller” an equal number of times. So in this more sensible version of the problem, you want to start thinking about submitting your guess somewhere around eight days in.