Research output: Contribution to journal › Article › peer-review
Gittins index for simple family of Markov bandit processes with switching cost and no discounting. / Savelov, M. P.
In: Theory of Probability and its Applications, Vol. 64, No. 3, 01.01.2019, p. 355-364.
TY - JOUR
T1 - Gittins index for simple family of Markov bandit processes with switching cost and no discounting
AU - Savelov, M. P.
PY - 2019/1/1
Y1 - 2019/1/1
N2 - We consider the multiarmed bandit problem (the problem of Markov bandits) with switching penalties and no discounting in the case when the state spaces of all bandits are finite. An optimal strategy should have the largest average reward per unit time on an infinite time horizon. For this problem it is shown that an optimal strategy can be specified by a Gittins index under the natural assumption that the switching penalties are nonnegative.
AB - We consider the multiarmed bandit problem (the problem of Markov bandits) with switching penalties and no discounting in the case when the state spaces of all bandits are finite. An optimal strategy should have the largest average reward per unit time on an infinite time horizon. For this problem it is shown that an optimal strategy can be specified by a Gittins index under the natural assumption that the switching penalties are nonnegative.
KW - Controlled Markov processes
KW - Gittins index
KW - Long run average return
KW - Markov decision process
KW - Multiarmed bandit problem
KW - Multicomponent systems
KW - No discounting
KW - Optimal strategy
KW - Simple family of alternative Markov bandit processes
KW - Switching penalties
UR - http://www.scopus.com/inward/record.url?scp=85074360690&partnerID=8YFLogxK
U2 - 10.1137/S0040585X97T989544
DO - 10.1137/S0040585X97T989544
M3 - Article
AN - SCOPUS:85074360690
VL - 64
SP - 355
EP - 364
JO - Theory of Probability and its Applications
JF - Theory of Probability and its Applications
SN - 0040-585X
IS - 3
ER -
ID: 22362425