Reinforcement learning in partially observable tasks: state uncertainty and memory dependence

Abstract

Reinforcement learning models have been used successfully to explain a wide range of psychological and neural features of human and animal learning behaviour. The most common reinforcement learning models were initially designed to work in Markovian environments, where the currently observed input gives complete knowledge of the current state of the system, and where rewards, optimal responses and transitions between states depend only on the present state. However, these conditions are rarely satisfied in the real-world problems that humans and animals face. In this thesis, we examine how people can, and actually do, deal with tasks in which these conditions are not met; a problem commonly referred to as partial observability. We consider two types of partial observability: (1) that arising from state uncertainty, and (2) that arising from the dependence of some states on past information, which therefore requires some form of memory. In two separate experimental studies, we show that people are, in general, able to deal with both types of partial observability when the tasks involved are simple. In a series of modelling studies, we introduce a number of modifications to the Actor-Critic gating model, one of the seminal models for solving memory-dependent partially observable tasks, that substantially improve its learning speed and its plausibility as a model of human learning. Our modelling work also generates a number of predictions that can be used to guide future experimental investigations.