Matteo Papini
Title
Cited by
Year
Stochastic variance-reduced policy gradient
M Papini, D Binaghi, G Canonaco, M Pirotta, M Restelli
Proceedings of the 35th International Conference on Machine Learning 80 …, 2018
Cited by: 175, Year: 2018
Policy optimization via importance sampling
AM Metelli, M Papini, F Faccio, M Restelli
Advances in Neural Information Processing Systems 31, 2018
Cited by: 99, Year: 2018
Feature selection via mutual information: New theoretical insights
M Beraha, AM Metelli, M Papini, A Tirinzoni, M Restelli
2019 International Joint Conference on Neural Networks (IJCNN), 1-9, 2019
Cited by: 79, Year: 2019
Risk-averse trust region optimization for reward-volatility reduction
L Bisi, L Sabbioni, E Vittori, M Papini, M Restelli
arXiv preprint arXiv:1912.03193, 2019
Cited by: 61, Year: 2019
Importance sampling techniques for policy optimization
AM Metelli, M Papini, N Montali, M Restelli
The Journal of Machine Learning Research 21 (1), 5552-5626, 2020
Cited by: 48, Year: 2020
Adaptive batch size for safe policy gradients
M Papini, M Pirotta, M Restelli
Advances in Neural Information Processing Systems 30, 2017
Cited by: 45, Year: 2017
Gradient-aware model-based policy search
P D'Oro, AM Metelli, A Tirinzoni, M Papini, M Restelli
Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 3801-3808, 2020
Cited by: 42, Year: 2020
Optimistic policy optimization via multiple importance sampling
M Papini, AM Metelli, L Lupo, M Restelli
International Conference on Machine Learning, 4989-4999, 2019
Cited by: 36, Year: 2019
Smoothing policies and safe policy gradients
M Papini, M Pirotta, M Restelli
Machine Learning 111 (11), 4081-4137, 2022
Cited by: 28, Year: 2022
Leveraging good representations in linear contextual bandits
M Papini, A Tirinzoni, M Restelli, A Lazaric, M Pirotta
International Conference on Machine Learning, 8371-8380, 2021
Cited by: 27, Year: 2021
Reinforcement learning in linear MDPs: Constant regret and representation selection
M Papini, A Tirinzoni, A Pacchiano, M Restelli, A Lazaric, M Pirotta
Advances in Neural Information Processing Systems 34, 16371-16383, 2021
Cited by: 16, Year: 2021
Balancing learning speed and stability in policy gradient via adaptive exploration
M Papini, A Battistello, M Restelli
International Conference on Artificial Intelligence and Statistics, 1188-1199, 2020
Cited by: 14, Year: 2020
Policy optimization as online learning with mediator feedback
AM Metelli, M Papini, P D'Oro, M Restelli
Proceedings of the AAAI Conference on Artificial Intelligence 35 (10), 8958-8966, 2021
Cited by: 12, Year: 2021
Lifting the information ratio: An information-theoretic analysis of Thompson sampling for contextual bandits
G Neu, I Olkhovskaia, M Papini, L Schwartz
Advances in Neural Information Processing Systems 35, 9486-9498, 2022
Cited by: 9, Year: 2022
Offline Primal-Dual Reinforcement Learning for Linear MDPs
G Gabbianelli, G Neu, N Okolo, M Papini
arXiv preprint arXiv:2305.12944, 2023
Cited by: 3, Year: 2023
Online learning with off-policy feedback
G Gabbianelli, G Neu, M Papini
International Conference on Algorithmic Learning Theory, 620-641, 2023
Cited by: 3, Year: 2023
Scalable representation learning in linear contextual bandits with constant regret guarantees
A Tirinzoni, M Papini, A Touati, A Lazaric, M Pirotta
Advances in Neural Information Processing Systems 35, 2307-2319, 2022
Cited by: 3, Year: 2022
Safe policy optimization
M Papini
Italy, 2021
Cited by: 3, Year: 2021
Automated Reasoning for Reinforcement Learning Agents in Structured Environments
A Gianola, M Montali, M Papini
OVERLAY@ GandALF, 43-48, 2021
Cited by: 3, Year: 2021
Importance-weighted offline learning done right
G Gabbianelli, G Neu, M Papini
arXiv preprint arXiv:2309.15771, 2023
Cited by: 1, Year: 2023