Working memory-based reinforcement learning (RL) models (O'Reilly and Frank, 2006; Zilli and Hasselmo, 2008; Todd et al., 2009) are often slow to learn and require careful tuning of their learning parameters for convergence to an optimal policy to be a reasonable expectation. In particular, it is crucial to allow the models to explore enough, but not excessively (i.e., to balance the trade-off between exploration and exploitation). Handling this trade-off is especially critical in these models, because adding working memory to the agent results in a sizeable state-action policy space.
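As a concrete illustration of the trade-off, the sketch below uses softmax (Boltzmann) action selection over a tabular value function. This is a generic, hypothetical example rather than the mechanism of any cited model; the names (`softmax_policy`, the temperature `tau`) and the state-space sizes are illustrative only.

```python
import numpy as np

def softmax_policy(q_values, tau=0.1):
    """Boltzmann (softmax) action selection: the temperature tau trades
    off exploration (high tau) against exploitation (low tau)."""
    prefs = q_values / tau
    prefs = prefs - prefs.max()      # subtract max for numerical stability
    probs = np.exp(prefs)
    probs = probs / probs.sum()
    return np.random.choice(len(q_values), p=probs)

# With working memory, the effective state is (observation, memory content),
# so the number of Q-values to learn, and to explore, grows multiplicatively.
n_obs, n_mem, n_actions = 10, 4, 3
Q = np.zeros((n_obs * n_mem, n_actions))
action = softmax_policy(Q[0])
```

Because the Q-table scales with the product of observations and memory contents, an exploration schedule that suffices for the base task can be far too brief once memory states are added.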
The study of uncertainty in RL has received increased interest over the last decade. Two prominent types of uncertainty that have been studied extensively are expected and unexpected uncertainty (Yu and Dayan, 2003; Bland and Schaefer, 2012). Expected uncertainty refers to a known unreliability of the environment, arising, for example, when rewards are generated from a stable probability distribution; unexpected uncertainty, in contrast, reflects a fundamental change in the environment that the agent does not anticipate, such as a non-signalled change in the rules of the task.
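To make the distinction concrete, the following sketch (a hypothetical simulation, not taken from the cited studies) draws rewards from a stable Bernoulli distribution, the source of expected uncertainty, and then imposes a non-signalled change in the reward rule, the source of unexpected uncertainty. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, change_point = 400, 200

# Expected uncertainty: outcomes are stochastic around a stable probability.
# Unexpected uncertainty: at the change point, the rule itself shifts.
p_reward = np.where(np.arange(n_trials) < change_point, 0.8, 0.2)
rewards = rng.random(n_trials) < p_reward

# A running estimate with a fixed learning rate tracks expected uncertainty
# well but recovers slowly after the non-signalled change; detecting
# unexpected uncertainty (and transiently raising the learning rate) helps.
alpha, estimate = 0.1, 0.5
trace = []
for r in rewards:
    estimate += alpha * (float(r) - estimate)
    trace.append(estimate)
```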
Techniques based on RL have been used to solve a wide range of learning problems. However, most current RL techniques work only in Markovian settings, where rewards, optimal actions, and transitions between states depend on the present state only. To extend their use to non-Markovian environments, several RL models have been augmented with working memory (Zilli and Hasselmo, 2008; Todd et al., 2009; Lloyd et al.).
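The general idea behind these augmentations can be sketched as follows: pairing the current observation with a gated memory slot yields an augmented state over which the task can again be treated as Markovian. This is a minimal, hypothetical sketch of the principle, not an implementation of any particular cited model; the gating here is a hand-set flag, whereas in the cited models it is itself a learned internal action.

```python
from dataclasses import dataclass

@dataclass
class AugmentedState:
    """Observation plus working-memory content. Treating this pair as the
    state restores the Markov property for tasks where the optimal action
    depends on an earlier cue that is no longer observable."""
    observation: str
    memory: str          # last gated-in cue, or "empty"

def step_memory(state: AugmentedState, observation: str, gate_open: bool):
    # Internal "gating" decision: store the new observation or keep the old one.
    memory = observation if gate_open else state.memory
    return AugmentedState(observation, memory)

s = AugmentedState("start", "empty")
s = step_memory(s, "cue_A", gate_open=True)    # gate the cue into memory
s = step_memory(s, "delay", gate_open=False)   # maintain it across the delay
s = step_memory(s, "probe", gate_open=False)
# A policy conditioned on (s.observation, s.memory) can now act on cue_A.
```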
Working memory refers to our ability to temporarily store and manipulate information. It is a cognitive system essential for many everyday tasks such as thinking, planning, problem solving, reading, and communicating with others. These activities require us to selectively maintain some information in mind while replacing old, irrelevant information with new, relevant information.
Different types of learning have been shown to involve working memory, including rule-based category learning (Ashby and Spiering, 2004; DeCaro et al.).