Abstract
Animals integrate knowledge about how the state of the environment evolves to choose actions that maximise reward. Such goal-directed behaviour, or model-based (MB) reinforcement learning (RL), can flexibly adapt choices to environmental change and is thus distinct from simpler habitual, or model-free (MF), RL strategies. Previous inactivation and neuroimaging work implicates the prefrontal cortex (PFC) and the caudate nucleus of the striatum in MB-RL; however, little is known about its implementation at the single-neuron level. Here, we recorded from two PFC regions, the dorsal anterior cingulate cortex (ACC) and dorsolateral PFC (DLPFC), and two striatal regions, the caudate and putamen, while two rhesus macaques performed a sequential decision-making (two-step) task in which MB-RL requires knowledge of the statistics of rewards and state transitions. All four regions, but particularly the ACC, encoded the rewards received and tracked the probabilistic state transitions that occurred. However, the ACC (and, to a lesser extent, the caudate) encoded the key interaction of the task, between reward, transition and choice, which underlies MB decision-making. ACC and caudate neurons also encoded MB-derived estimates of choice values. Moreover, caudate value estimates of the choice options flipped when a rare transition occurred, demonstrating value updating based on structural knowledge of the task. The striatal regions were unique (relative to the PFC) in encoding the current and previous rewards with opposing polarities, reminiscent of dopaminergic neurons and indicative of an MF prediction error. Our findings provide a deeper understanding of selective and temporally dissociable neural mechanisms underlying goal-directed behaviour.
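For readers unfamiliar with the two-step task, the sketch below illustrates in schematic Python how MB and MF learners assign credit differently after a rare transition, the logic behind the value "flip" described above. The learning rate, transition probabilities and update rules are illustrative assumptions drawn from standard formulations of the task, not the model fitted in this study.

import numpy as np

ALPHA = 0.3      # learning rate (illustrative)
COMMON_P = 0.7   # probability of the "common" transition (assumed)

stage2_value = np.zeros(2)   # values of the two second-stage states
mf_value = np.zeros(2)       # MF values of the two first-stage choices

# Transition model: P(second-stage state | first-stage choice).
# Choice 0 commonly leads to state 0; choice 1 commonly leads to state 1.
transition = np.array([[COMMON_P, 1 - COMMON_P],
                       [1 - COMMON_P, COMMON_P]])

def mb_values():
    """MB first-stage values: expectation of second-stage values under the
    transition model, so credit follows the task structure rather than the
    action that happened to be taken."""
    return transition @ stage2_value

def update(choice, state, reward):
    """One trial: first-stage choice, observed second-stage state, reward."""
    # Stage-2 prediction error updates the visited second-stage value.
    stage2_value[state] += ALPHA * (reward - stage2_value[state])
    # MF stage-1 update ignores the transition model and simply backs up
    # the updated second-stage value to the chosen first-stage action.
    mf_value[choice] += ALPHA * (stage2_value[state] - mf_value[choice])

# Example: a rewarded trial reached via a RARE transition (choice 0 -> state 1).
update(choice=0, state=1, reward=1.0)
print("MF values:", mf_value)      # credit accrues to the chosen action (0)
print("MB values:", mb_values())   # credit accrues mostly to the other action (1),
                                   # since state 1 is commonly reached via choice 1

After a rewarded rare transition, the MF learner raises the value of the action it took, whereas the MB learner raises the value of the alternative action, mirroring the flipped value estimates reported for caudate neurons.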
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* The manuscript has been revised to clarify the explore/exploit analysis, to explain the visual confounds in the transition structure, and to integrate the putamen findings more clearly. All previously missing figures and relevant statistics are now included, and all figure legends are fully annotated. The revised manuscript is stronger and more accessible as a result. Importantly, none of the revisions altered the central conclusions of our study, which provide novel evidence from non-human primate single-unit recordings that model-based (MB) and model-free (MF) reinforcement learning processes are dissociably encoded across distinct prefrontal and striatal circuits.
Funder Information Declared
Wellcome Trust, 096689/Z/11/Z, 220296/Z/20/Z, 219525/Z/19/Z, 214314/Z/18/Z
Biotechnology and Biological Sciences Research Council, https://ror.org/00cwqg982, BB/W003392/1
Fundação para a Ciência e Tecnologia, SFRH/BD/51711/2011
Santa Casa da Misericórdia de Lisboa, Prémio João Lobo Antunes 2017
Rosetrees Trust, https://ror.org/04e3zg361
Gatsby Initiative for Brain Development and Psychiatry, GAT3955
Jean-François and Marie-Laure de Clermont-Tonnerre Foundation
Max Planck Society, https://ror.org/01hhn8329
Alexander von Humboldt Foundation, https://ror.org/012kf4317