Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Reinforcement Learning for Long-Term Reward Optimization in Recommender Systems. / Dorozhko, Anton.
SIBIRCON 2019 - International Multi-Conference on Engineering, Computer and Information Sciences, Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. p. 862-867 8958202 (SIBIRCON 2019 - International Multi-Conference on Engineering, Computer and Information Sciences, Proceedings).
TY - GEN
T1 - Reinforcement Learning for Long-Term Reward Optimization in Recommender Systems
AU - Dorozhko, Anton
N1 - Publisher Copyright: © 2019 IEEE. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2019/10
Y1 - 2019/10
AB - Recommender systems help users navigate the vast space of goods, services, and events. A user interacts with the recommender engine through a sequence of exchanges of recommendations and user feedback. The idea that previous interactions influence later ones, together with the importance of the order of interactions, can be modeled using Markov decision processes and solved by reinforcement learning. Several recent articles applying reinforcement learning to recommender systems have demonstrated the viability of this direction, but it is still difficult to compare different approaches. We propose an environment with a unified interface that makes it possible to compare different models of the recommendation process and different algorithms on the same underlying sequential data. We also performed an extensive parameter study of deep deterministic policy gradient methods on the well-known MovieLens dataset.
KW - DDPG
KW - deep reinforcement learning (DRL)
KW - long-term value
KW - recommender systems
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85079055019&partnerID=8YFLogxK
U2 - 10.1109/SIBIRCON48586.2019.8958202
DO - 10.1109/SIBIRCON48586.2019.8958202
M3 - Conference contribution
AN - SCOPUS:85079055019
SN - 978-1-7281-4402-3
T3 - SIBIRCON 2019 - International Multi-Conference on Engineering, Computer and Information Sciences, Proceedings
SP - 862
EP - 867
BT - SIBIRCON 2019 - International Multi-Conference on Engineering, Computer and Information Sciences, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 International Multi-Conference on Engineering, Computer and Information Sciences, SIBIRCON 2019
Y2 - 21 October 2019 through 27 October 2019
ER -
ID: 28278727