ARTICLE

Received 29 Mar 2016 | Accepted 28 Jul 2016 | Published 14 Sep 2016

To learn, obtain reward and survive, humans and other animals must monitor, approach and act on objects that are associated with variable or unknown rewards. However, the neuronal mechanisms that mediate behaviours aimed at uncertain objects are poorly understood. Here we demonstrate that a set of neurons in the internal-capsule bordering regions of the primate dorsal striatum, within the putamen and caudate nucleus, signal the uncertainty of object-reward associations. Their uncertainty responses depend on the presence of objects associated with reward uncertainty and evolve rapidly as monkeys learn novel object-reward associations. Therefore, beyond its established role in mediating actions aimed at known or certain rewards, the dorsal striatum also participates in behaviours aimed at reward-uncertain objects.

DOI: 10.1038/ncomms12735 OPEN

Neurons in the primate dorsal striatum signal the uncertainty of object-reward associations

J. Kael White1 & Ilya E. Monosov1

1 Department of Neuroscience, Washington University School of Medicine, 660 S. Euclid Avenue, St Louis, Missouri 63110, USA. Correspondence and requests for materials should be addressed to I.E.M. (email: [email protected]).

NATURE COMMUNICATIONS | 7:12735 | DOI: 10.1038/ncomms12735 | http://www.nature.com/naturecommunications


To survive, humans and other animals must act on objects that have been previously associated with certain or reliable rewards1-3. However, learning, foraging and decision-making also require animals to monitor, approach and act on objects associated with variable or unknown rewards4-7, even when the mean reward value of such uncertain objects is lower than that of other objects8-10. To date, the mechanisms that direct behaviour towards uncertain objects are not well understood.

Expected (or certain) reward-driven behaviours are in part dependent on the caudate-putamen complex11-13, also called the dorsal striatum (DS). In primates, the caudate nucleus in particular has recently been shown to contain multiple mechanisms for directing gaze at objects associated with high reward values14-17. Here we asked if the primate DS also contains a mechanism to support behaviour aimed at objects associated with outcome uncertainty.

Our experiments showed that a subset of neurons, mostly in the internal-capsule bordering regions of the DS (icbDS), was preferentially activated by visual objects associated with reward-uncertain outcomes. Furthermore, the icbDS reward-uncertainty responses depended on the presence of visual objects associated with reward uncertainty, because they were largely abolished when the object was removed before the uncertain outcome was delivered. Finally, during object-reward associative learning, icbDS neurons' uncertainty responses evolved rapidly as monkeys learned novel object-reward associations. These uncertainty responses identified object associations that were uncertain either due to the subject's lack of knowledge or due to known uncertainty (also called risk18,19).

Our experiments suggest that uncertainty-sensitive neurons in the primate DS may play important roles in object-based behaviours under uncertainty.

Results

DS neurons selectively signal reward uncertainty. To test if the primate DS contains neurons that are preferentially activated by visual objects associated with reward-uncertain outcomes, we recorded 141 single neurons from the DS while two monkeys (B, n = 103 neurons; W, n = 38 neurons) participated in a behavioural procedure that was composed of two distinct blocks: a reward-probability block, in which three visual conditioned stimuli (CSs) predicted a 0.25 ml juice reward with 100, 50 and 0% chance; and a reward-amount block, in which three CSs predicted 0.25, 0.125 and 0 ml of juice (experiment 1). For each block, we used two fractal sets that could appear in one of three spatial locations. The monkeys' knowledge of the task was tested with interleaved choice trials (Methods), and neuronal recordings did not begin until the monkeys chose the CSs associated with higher expected value over CSs associated with lower expected value >90% of the time (Supplementary Fig. 1).

Uncertainty-sensitive neurons were defined as those that varied their responses across the task CSs (Kruskal-Wallis test; P<0.01) and displayed either significantly stronger responses to the 50% CS than to both the 100 and 0% reward CSs, or significantly weaker responses to the 50% CS than to both the 100 and 0% reward CSs (two-tailed rank-sum tests; P<0.01). We found that 45/141 neurons, mostly in the internal-capsule bordering regions of the striatum, were selectively activated by reward uncertainty (n = 19 in monkey W; n = 26 in monkey B). No neurons (0/141) were selectively suppressed by uncertainty.

An example uncertainty-sensitive (U+) neuron's CS responses are shown in Fig. 1a. Its activity increased following the presentation of the CS that predicted 0.25 ml of juice reward with 50% chance until the uncertain outcome was delivered and the uncertainty was resolved. This example neuron did not strongly respond to other CS objects or task events.

The locations of all recorded U+ neurons are shown in Fig. 1b. U+ neurons were most often found within the anterior-dorsal putamen and caudate nucleus regions that bordered the internal capsule (Supplementary Figs 2-4), prominently in the anterior putamen. We refer to this brain area as the icbDS. The low baseline discharge rate of U+ neurons (mostly <1 spike per s; Fig. 1c) suggests that they are medium spiny neurons12,15,20-22, the chief output neurons of the striatum.

All U+ neurons exhibited roughly similar responses (Fig. 2a,b). On average, they were strongly activated by the presentation of the CS that predicted 0.25 ml of juice reward with 50% chance. This activation was most often a ramp-like increase in activity, which continued until the uncertain outcome was delivered and the uncertainty was resolved (Fig. 2a). Amongst single neurons, 44/45 U+ neurons responded more strongly to the CS object associated with a 50% chance of 0.25 ml of juice than to the CS object associated with 0.125 ml of juice (Fig. 2b), even though these CSs were associated with the same expected reward value.

Further neuron-by-neuron analyses revealed that amongst the task features of experiment 1, U+ neurons were consistently sensitive to reward uncertainty and to reward context (that is, the difference between trials in which reward was possible versus trials in which rewards would not be delivered). This is shown in Fig. 2c and in Supplementary Fig. 5 for icbDS U+ neurons in caudate and putamen, separately. Most single U+ neurons did not encode information about expected values (defined as the difference between responses to objects associated with 0.25 and 0.125 ml of juice), spatial- and object-feature parameters (Fig. 2c), or aversive outcomes (Supplementary Fig. 6). However, 24/45 U+ neurons discriminated reward-associated CSs from CSs associated with no outcome delivery (Fig. 2c; this reward-related enhancement can also be observed in the average activity in Fig. 2a). Also, on average, U+ neurons responded to the delivery of expected/certain rewards with a weak but consistent phasic excitation (Fig. 2a; P<0.05; sign-rank test). The observations in Fig. 2 indicate that while U+ neurons were preferentially dedicated to signalling reward uncertainty, they were also sensitive to reward context (or expectation) and reward delivery.

Although U+ neurons did not encode the locations of CS objects, it remained unknown whether they respond before or during saccades aimed at reward-uncertain objects. To assess this further, we studied the dynamics of U+ uncertainty selectivity during choice trials. We found that, on average, U+ uncertainty selectivity emerged after the monkeys fixated the object associated with reward uncertainty (Supplementary Fig. 7). Therefore, U+ neurons did not trigger saccades aimed at reward-uncertain objects.

Overall, the results of experiment 1 showed that the icbDS contains a subpopulation of neurons with striking sensitivity to objects associated with reward uncertainty. However, several important questions about these neurons remained unanswered. First, are they sensitive to the level of uncertainty in a graded manner7,23? Second, do U+ neurons signal internal states related to the expectation of reward or are their uncertainty responses dependent on external cues or objects? Third, can U+ neurons support object learning under uncertainty? To answer these questions, we selectively recorded from U+ neurons in the icbDS in experiments 2-4.

[Figure 1 graphic omitted. Panel a: rasters and firing rate (spikes per s) of an example neuron, aligned to fixation, CS onset and trial outcome, for the CSs of the reward-probability block and the reward-amount block. Panel b: recording locations in caudate and putamen on coronal slices at +1 mm and +5 mm from the AC; inset, count of neurons along the anterior-posterior axis. Panel c: count of neurons by baseline firing rate; inset, spike duration (ms) for U+, pMSN and pCHAT neurons.]

Figure 1 | Selective reward-uncertainty responses in the DS. (a) Responses of a single uncertainty-selective (U+) neuron in the internal-capsule bordering region of the striatum to the presentation of six fractal objects (shown above rasters) associated with certain and uncertain predictions of juice reward. Dark blue raster plots indicate the activity in 50% CS trials in which reward was omitted. (b) Estimated locations of 45 U+ neurons (red dots) in the internal-capsule bordering striatum shown on two coronal slices. Ranges of the neurons on each slice and the distance of each slice from the centre of the anterior commissure (AC) are indicated. Black dots indicate other recorded neurons. Inset is the histogram of recording locations along the anterior-posterior axis. U+ neurons (red) were most often found anterior to the AC. (c) Histogram of baseline firing rates of recorded neurons. Inset shows spike durations (trough-to-trough) for all U+ neurons (left), non-uncertainty-selective putative medium spiny neurons (pMSN; neurons with a baseline firing rate of <3 spikes per s), and non-uncertainty-selective putative cholinergic interneurons (pCHAT; neurons with a baseline firing rate ≥3 spikes per s). Error bars indicate standard errors. Single neuron data points are shown as scatters. Asterisks indicate significant differences (Wilcoxon rank-sum test; P<0.05).

icbDS neurons are sensitive to the level of reward uncertainty. To test if U+ neurons were sensitive to the level of reward uncertainty, in experiment 2, we recorded 20 U+ neurons (14 in monkey B and 6 in monkey W) in a behavioural procedure in which monkeys experienced a reward-probability block that contained five objects associated with five probabilistic reward predictions (0, 25, 50, 75 and 100% of 0.25 ml of juice), and a reward-amount block that contained five objects associated with 100% reward predictions of varying reward amounts (0.25, 0.1875, 0.125, 0.0625 and 0 ml)7,23. The expected values of the five CSs in the probability block matched the expected values of the five CSs in the amount block.
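The matching of expected values across the two blocks follows from simple arithmetic, sketched below. The variable and function names are illustrative, and the second-smallest reward amount is taken as 0.0625 ml, the value labelled in Fig. 3.

```python
def expected_value(probability, amount_ml):
    """Expected juice volume (ml) for a CS paying `amount_ml` with `probability`."""
    return probability * amount_ml


# Reward-probability block: 0, 25, 50, 75 and 100% chance of 0.25 ml.
probability_block = [(0.0, 0.25), (0.25, 0.25), (0.5, 0.25), (0.75, 0.25), (1.0, 0.25)]

# Reward-amount block: certain delivery of graded amounts.
amount_block = [(1.0, 0.0), (1.0, 0.0625), (1.0, 0.125), (1.0, 0.1875), (1.0, 0.25)]

ev_probability = [expected_value(p, a) for p, a in probability_block]
ev_amount = [expected_value(p, a) for p, a in amount_block]

for ev_p, ev_a in zip(ev_probability, ev_amount):
    print(f"probability block: {ev_p:.4f} ml   amount block: {ev_a:.4f} ml")
```

Because each probabilistic CS has a certain counterpart with the same expected value, a response that differs between the paired CSs cannot be explained by expected value alone.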

Reward-uncertainty neurons in icbDS were identified during online screening as neurons that responded to any of the uncertain conditioned stimuli (25, 50 or 75% reward). The same preselection criteria were used in subsequent experiments in this study and in our previous reports7,23.

An example U+ neuron's responses to the 10 CS objects are shown in Fig. 3a. It responded most strongly to the presentation of the 50% CS object, and less strongly to the presentation of the 25 and 75% CS objects. Moreover, it did not respond to the presentation of objects associated with certain reward predictions (0 and 100% reward CS objects and CS objects in the reward-amount block). A similar result can be observed across the population of U+ neurons (Fig. 3b,c). U+ neurons' average response was strongest for the presentation of the 50% CS object. Their responses were weaker for 25 and 75% reward-associated CS objects. On average, there was no significant difference between their responses to the 25% versus 75% CS objects, which have the same level of uncertainty but different expected values. Furthermore, as in experiment 1, during the reward-amount block, the neurons discriminated objects associated with rewards from objects associated with no reward (Fig. 3c, black trace). In sum, experiment 2 showed that U+ neurons were sensitive to the level of reward uncertainty.
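The statement that the 25 and 75% CSs carry the same level of uncertainty can be made concrete with two common formalizations of outcome uncertainty for a binary reward: variance and Shannon entropy. The text does not say which measure, if any, the authors had in mind, so the sketch below is an assumption offered for illustration; both measures peak at 50% and are symmetric around it.

```python
import math


def outcome_variance(p, amount_ml=0.25):
    """Variance (ml^2) of a reward of `amount_ml` delivered with probability p."""
    return p * (1 - p) * amount_ml ** 2


def outcome_entropy_bits(p):
    """Shannon entropy (bits) of a reward/no-reward outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}  variance={outcome_variance(p):.6f}  "
          f"entropy={outcome_entropy_bits(p):.3f} bits")
```

Under either measure, a graded uncertainty signal would be expected to be largest at 50%, intermediate and equal at 25 and 75%, and absent at 0 and 100%, which is the pattern described for the population in Fig. 3c.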

[Figure 2 graphic omitted. Panel a: normalized response (spikes per s) of 45 U+ neurons aligned to fixation, CS onset and trial outcome for the 100% 0.25 ml, 50% 0.25 ml (reward), 50% 0.25 ml (no reward), 100% 0.125 ml and 0 ml conditions; inset, proportion of the 45 U+ neurons and of all neurons (n=141). Panel b: CS response of each neuron for each reward prediction. Panel c: histograms (count) of indices for CS location sensitivity, CS object-feature sensitivity, reward prediction error sensitivity, reward value sensitivity, reward context sensitivity and uncertainty sensitivity; the counts of significant indices printed in the graphic are 0/45, 1/45, 5/45, 6/45, 24/45 and 45/45.]

Figure 2 | Population activity of U+ neurons. (a) Average responses of 45 U+ neurons to different reward predictions in the reward-probability and reward-amount procedure. Shaded region represents standard error. The inset shows the proportion of neurons (of 45 U+ neurons and of all 141 striatal neurons) displaying uncertainty selectivity during the CS epoch in time. (b) CS responses of 45 U+ neurons for different reward predictions in the reward-probability and reward-amount procedure (normalized to the maximum CS response; from 0 to 1). In all, 44/45 neurons had the highest response for the 50% CS. (c) Sensitivity indices (Methods) for 45 striatal uncertainty-selective neurons for different behavioural/task variables. Asterisk above the histogram indicates significant deviation from 0 (P<0.01; sign-rank test). Significant individual neuron indices (P<0.01; Wilcoxon rank-sum test) are grey. The number of significant indices is indicated near the histogram.

icbDS uncertainty responses are object-dependent. The results of experiments 1 and 2 are consistent with two scenarios. First, U+ responses may signal internal states related to reward expectation, particularly the expectation of uncertain rewards. Second, U+ responses may signal the uncertainty of the object-reward associations, rather than the internal state associated with reward uncertainty. To distinguish between these alternatives, monkeys were presented with four CSs (experiment 3). Two distinct CSs were associated with 100 and 50% chances of reward and were kept on the experimental presentation screen for 2.5 s, until the time of the trial outcome (same trial structure as in Fig. 1a). Two other CSs were also associated with 100 and 50% chances of reward, but were present on the screen for only 1 s, and outcomes were delivered 1.5 s after the removal of the CSs (the 1.5 s period during which the CS is not present is referred to as a trace period). Therefore, for all CSs, the outcome was delivered 2.5 s after CS onset. The monkeys' performance indicated that they understood the procedure and were similarly motivated by trace and no-trace 50% reward predictions (Supplementary Fig. 8).

We identified U+ neurons in icbDS and recorded their activity in this paradigm (n = 32 neurons; 11 in monkey W and 21 in monkey B). An example U+ neuron is shown in Fig. 4a. This neuron robustly discriminated the 50% reward-associated CS object (uncertain condition) from the 100% reward-associated CS object (P<0.01; rank-sum test). But surprisingly, the removal of the uncertain CS (trace condition) before the outcome was delivered completely abolished its uncertainty selectivity (Fig. 4a, green and blue traces). Similar results were found for most of the U+ neurons (Fig. 4b). The discriminability of striatal uncertainty signals was greatly diminished when the uncertain object was not present at the time of the outcome (Fig. 4c). Many U+ neurons' uncertainty signals were completely abolished (Fig. 4b,c). These results indicate that U+ neurons' reward-uncertainty responses are contingent on the presence of the uncertain object.
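Discriminability in Fig. 4c is reported as the area under the receiver-operating characteristic curve (AUC) for 50% versus 100% CS responses. A minimal sketch of one standard way to compute this quantity is given below: the AUC equals the probability that a randomly chosen 50% trial has a larger response than a randomly chosen 100% trial, with ties counted as one half. The function name and the toy firing rates are invented for illustration and are not the authors' code or data.

```python
def roc_auc(uncertain, certain):
    """AUC for discriminating `uncertain` from `certain` trial responses.

    0.5 means no discrimination; 1.0 means every uncertain-CS response
    exceeds every certain-CS response.
    """
    wins = sum((u > c) + 0.5 * (u == c) for u in uncertain for c in certain)
    return wins / (len(uncertain) * len(certain))


# Toy firing rates (spikes per s), invented for illustration.
no_trace_50 = [22, 25, 19, 30, 27]
no_trace_100 = [4, 6, 3, 5, 7]
trace_50 = [5, 4, 6, 3, 7]
trace_100 = [4, 6, 3, 5, 7]

print("no trace AUC:", roc_auc(no_trace_50, no_trace_100))
print("trace AUC:   ", roc_auc(trace_50, trace_100))
```

An AUC near 1 in the no-trace condition and near 0.5 in the trace condition corresponds to the loss of uncertainty selectivity described above.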

In the basal forebrain (particularly in its medial regions), some neurons also signal reward uncertainty with ramp-like responses23. However, additional experiments revealed that their uncertainty-selective signals persist during the same trace-conditioning procedure used to study U+ neurons (Supplementary Fig. 8). Consistent with this observation, other reward-related signals are preserved during trace conditioning in brain regions that are interconnected with the basal forebrain, such as the dorsal raphe24 and the amygdala25. These observations suggest that the basal forebrain and related limbic structures signal values and uncertainty of internal states (perhaps somewhat independently of the external environment), whereas the U+ neurons in the basal ganglia signal reward uncertainty associated with objects.

icbDS uncertainty responses are rapidly shaped by learning. The data thus far prompted us to assess how U+ neuronal responses are shaped by the learning of novel object-reward associations (experiment 4). Thus far, we tested the responses of U+ neurons to reward uncertainty arising from knowledge about reward variability associated with 50% reward CSs (also called known uncertainty or risk). But, if uncertain object-reward signals in the DS contribute to object learning, then U+ neurons should also signal uncertainty that is due to a lack of previous object-outcome associations (also called ambiguity), an uncertainty that can be identified and resolved by learning. To test this, we recorded the activity of identified U+ neurons in a Pavlovian procedure in which three novel fractals were used as CSs associated with 100, 50 and 0% reward probabilities (n = 30 neurons; 11 in monkey W and 19 in monkey B). One example U+ neuron is shown in Fig. 5a. At the start of learning, this neuron showed a strong increase in response to all the novel CSs. As the CSs were repeatedly experienced, the neuronal activity started to decrease for certain CSs (0 and 100%) and remained roughly the same for the reward-uncertain CS (50% reward prediction). The population of 30 U+ neurons shows a similar pattern (Fig. 5b and Supplementary Fig. 9). The neuronal responses to certain object-reward associations decreased as the monkeys learned (Fig. 5c). These results demonstrated that U+ neurons signal object-reward uncertainty of unknown or novel objects and that the DS uncertainty responses can be rapidly shaped by learning, even within a single experimental session.
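The learning curves in Fig. 5b,c are binned by presentation number. A minimal sketch of such binning is shown below; the bin width of five presentations is an assumption inferred from the axis labels of Fig. 5 (for example, 6-10 and 11-15), and the function name and toy responses are hypothetical.

```python
def bin_by_presentation(responses, bin_size=5):
    """Average responses over consecutive bins of `bin_size` presentations.

    `responses` is ordered by the monkey's experience with one novel CS
    (first presentation first). A final partial bin is averaged as-is.
    """
    binned = []
    for start in range(0, len(responses), bin_size):
        chunk = responses[start:start + bin_size]
        binned.append(sum(chunk) / len(chunk))
    return binned


# Toy example (invented): response to a certain CS decaying with experience.
toy_certain_cs = [20, 18, 17, 15, 15, 12, 10, 9, 8, 6]
print(bin_by_presentation(toy_certain_cs))
```

Applying the same binning to each of the three novel CSs separately would yield one curve per reward prediction, which is the layout the caption of Fig. 5b describes.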

[Figure 3 graphic omitted. Panel a: firing rate (spikes per s) of an example neuron aligned to fixation, CS onset and trial outcome, for the reward-probability block (100, 75, 50 and 25% of 0.25 ml, and 0 ml) and the reward-amount block (0.25, 0.1875, 0.125, 0.0625 and 0 ml). Panel b: normalized response (spikes per s) of the population (N=20) in the reward-probability block and the reward-amount block, aligned to fixation, CS onset and trial outcome. Panel c: normalized response versus reward prediction (0, 25, 50, 75 and 100% of 0.25 ml; 0, 0.0625, 0.125, 0.1875 and 0.25 ml); the inset is labelled n=5, n=12 and n=3.]

Figure 3 | Striatal U+ neurons are sensitive to the level of reward uncertainty. (a) Responses of a single uncertainty-selective (U+) neuron to the presentation of 10 fractal objects associated with certain and uncertain predictions of juice reward. (b) Average responses of 20 U+ neurons in the reward-probability block (left) and reward-amount block (right). (c) Average normalized responses of 20 U+ neurons for probability (red) and amount (black) CSs. Asterisks indicate differences between CSs (**P<0.01; *P<0.025; paired sign-rank test). The inset shows the single neurons' CS responses for different reward predictions in the reward-probability block (normalized to the maximum CS response; from 0 to 1). Numbers above the inset indicate the number of cells that exhibited the greatest response for 25, 50 or 75% CSs; 60% of the neurons exhibited the greatest response for the 50% reward CS.

[Figure 4 graphic omitted. Panel a: firing rate (spikes per s) of an example neuron aligned to CS onset, trace start and trial outcome for 100% reward (no trace), 50% reward (no trace), 100% reward (trace) and 50% reward (trace). Panel b: normalized response (spikes per s) of each neuron in the no-trace and trace conditions. Panel c: AUC (50% versus 100%) in the no-trace and trace conditions for a single neuron (N=1) and the population (N=32); P<0.01.]

Figure 4 | Striatal U+ neurons' responses are object-dependent. (a) Responses of a single U+ neuron to 100 and 50% reward predictions without a trace period (CS objects remained until the outcome) (black and red), and with a trace period (CS objects disappeared after 1 s) (green and blue). (b) In all, 22/32 neurons displayed significant differences in reward-uncertainty responses across the no-trace and trace conditions (red; rank-sum test; P<0.01; 26/32 were significant with a 0.05 threshold). All significant changes were reductions of uncertainty responses. Here normalization was performed by subtracting 100% CS responses from 50% CS responses (for trace and no-trace conditions, separately). (c) Single neuron (insets above) and population reward-uncertainty discriminability was greatly diminished in the trace condition. AUC, area under receiver-operating characteristic curve.

Discussion

In the caudate-putamen complex we found a population of neurons that signal uncertainty of object-reward associations. These U+ neurons were often found in the icbDS. Their uncertainty-selective responses depended on the presence of objects associated with reward uncertainty and evolved rapidly as monkeys learned novel object-reward associations.

[Figure 5 graphic omitted. Panel a: single-trial rasters ordered by presentation number for three novel fractals (100% 0.25 ml, 50% 0.25 ml and 0 ml of juice), aligned to CS onset and trial outcome. Panel b: normalized response (spikes per s) of 30 neurons by presentation-number bin (labels 6-10, 11-15, 16-20 and 21-25) for the 100% 0.25 ml, 50% 0.25 ml and 0 ml CSs. Panel c: proportion of choices of the higher-valued CSs by presentation-number bin.]

Figure 5 | U+ responses are rapidly shaped by learning. (a) A single neuron's responses (shown as single trial rasters) to the presentation of three novel objects shown in the order of the monkey's experience (bottom to top). (b) Binned neuronal population response across learning (30 learning sessions, 30 neurons) shown separately for 100, 50 and 0% reward-associated novel objects. Asterisks indicate significant variance across the three conditions (P<0.01; Kruskal-Wallis test). Neuronal responses are shown separately for Pavlovian and choice trials in Supplementary Fig. 9. (c) Monkeys' choices during learning. Proportion of choices of the higher-valued fractal CS objects during randomly interleaved choice trials (binned like neuronal activity in b). **P<0.01, *P<0.05 (sign-rank test assessing difference between bins).

What brain regions supply reward uncertainty signals to U+ neurons? Their average location in the striatum may provide a clue. U+ neurons were most often found within the anterior putamen and caudate regions that bordered the internal capsule (icbDS), prominently in the anterior putamen. icbDS receives inhibitory inputs from the ventral pallidum26, where some neurons are inhibited by reward uncertainty (Supplementary Fig. 10)27. Given the uncertainty-excitatory responses of many icbDS neurons (Fig. 2), we hypothesize that the inhibition of pallidal neurons by uncertainty may open a gate, so that U+ neurons can selectively respond to cortical inputs carrying sensory information about objects28,29 and about their reward value or uncertainty30. But precisely which cortical regions send uncertainty and other signals to U+ neurons remains to be assessed.

The task responses of striatal U+ neurons differentiated them from reward uncertainty-selective neurons in the anterodorsal septum and the medial basal forebrain. For example, during object learning, anterodorsal septal uncertainty-selective neurons responded preferentially to knowledge-based uncertainty (often called risk), after monkeys learned the uncertain stimulus-response association7. In contrast, during a similar object-learning task, U+ neurons responded strongly to novel stimuli, whose conditioned stimulus-unconditioned stimulus relationship was not yet learned (Fig. 5). Unlike U+ neurons, medial basal forebrain reward uncertainty-sensitive neurons slowly learned to discriminate between certain and uncertain reward-predicting objects23. This slow learning was not correlated with the fast time course of the monkeys' object-reward associative learning23. These data are consistent with the observation that there are no known connections from the medial basal forebrain or septum to the striatum and suggest that U+ neurons belong to a mostly distinct system for signalling uncertainty of objects that may be particularly well suited to contribute to object learning.

It is noteworthy that U+ neurons did not encode all types of uncertainty, or only uncertainty7,19,31. First, they did not respond to uncertainty about punishments. Whether there are neurons that signal uncertainty about all salient events (such as uncertainty about rewards and punishments) remains an open question. Second, on average, they discriminated reward-associated CSs from reward-unassociated CSs (Fig. 2a,c). In fact, similar reward-related tonic activity shifts were observed in other neurons that encode reward uncertainty7,23. It remains to be tested whether these shifts are due to context value (or relevance), or to uncertainty that could exist even during the expectation of certain rewards (for example, due to errors in the estimation of reward timing). Third, U+ neurons' uncertainty responses were abolished by the removal of the CS before the trial outcome (during trace conditioning). This suggests that striatal U+ neurons' responses depended on the presence of the uncertain CS object. This finding further differentiated striatal U+ neurons from uncertainty-enhanced neurons in the medial basal forebrain, whose uncertainty selectivity persisted when the CS object was removed before the trial outcome (Supplementary Fig. 8).

Our study in monkeys and a previous human brain-imaging study32 suggest that icbDS is a prominent node for processing information about reward uncertainty. However, it remains possible that there are other striatal mechanisms for signalling uncertainty, and/or for integrating uncertainty with stimulus-feature information, movement kinematics and values33,34. Indeed, different areas of the primate striatum learn and signal values in distinct manners11,14,16,17,33-37 to support their different roles in action, decision-making, and learning and memory11,14,15,17,29,33,34,36,38-40. How uncertainty guides computations across different striatal subregions is therefore an important direction for future studies.

Objects in the environment are important because they signal rewards or dangers, or because they represent an opportunity to learn and change one's state. In this study, we showed that the basal ganglia signal the reward uncertainty of object-reward associations, a critical variable for monitoring and learning from objects. These results demonstrate a novel role for the internal-capsule bordering putamen and caudate in controlling behaviours in uncertain contexts.

Methods

General procedures. Two adult male rhesus monkeys (Macaca mulatta) were used for the neurophysiology experiments in the DS (Monkey B, 6 years old; Monkey W, 5.25 years old). All procedures conformed to the Guide for the Care and Use of Laboratory Animals and were approved by the Washington University Institutional Animal Care and Use Committee. A plastic head holder and plastic recording chamber were fixed to the right side of the skull under general anaesthesia and sterile surgical conditions. The chambers were tilted laterally by 35° and aimed at the anterior portion of the striatum. After the monkeys recovered from surgery, they participated in the behavioural and neurophysiological experiments.

Data acquisition. While the monkeys participated in the behavioural procedures we recorded single neurons in the right DS. The recording sites were determined with a 1 mm-spacing grid system and with the aid of magnetic resonance images (3 T) obtained along the direction of the recording chamber. This magnetic resonance imaging-based estimation of neuron recording locations was aided by custom-built software (PyElectrode). Single-unit recording was performed using glass-coated electrodes (Alpha Omega). The electrode was inserted into the brain through a stainless-steel guide tube and advanced by an oil-driven micro-manipulator (MO-97A, Narishige). Signal acquisition (including amplification and filtering) was performed using an Alpha Omega 44 kHz SNR system. Action potential waveforms were identified online by multiple time-amplitude windows with an additional template-matching algorithm (Alpha Omega). Neuronal recording was restricted to single neurons that were isolated online. Neuronal and behavioural analyses were conducted offline in Matlab (Mathworks, Natick, MA).

Eye position was obtained with an infrared video camera (Eyelink, SR Research). Behavioural events and visual stimuli were controlled by Matlab (Mathworks, Natick, MA) with Psychophysics Toolbox extensions. Juice, used as reward, was delivered with a solenoid delivery reward system (CRIST Instruments). Juice-related anticipatory licking during the CS epoch was measured and quantified using previously described methods23.

Reward-probability and reward-amount procedure (experiment 1). The reward-probability and reward-amount behavioural procedure consisted of two blocks, a reward-probability block and a reward-amount block (Fig. 1). In the reward-probability block, three visual fractal CSs were followed by a liquid reward (0.25 ml of juice) with 100, 50 and 0% chance, respectively. In the reward-amount block, three CSs were followed by a liquid reward of 0.25, 0.125 and 0 ml, respectively. Thus, the expected values of the three CSs matched between the probability and amount blocks. To control for neuronal object preference, we used two fractal sets (that is, for every CS there were two different fractals).

Each trial started with the presentation of a green trial-start cue at the centre. The monkeys had to maintain fixation on the trial-start cue for 1 s; then the trial-start cue disappeared and one of the three CSs was presented pseudorandomly. After 2.5 s, the CS disappeared, and juice (if scheduled for that trial) was delivered. The monkeys were not required to fixate on the CSs. In each trial, the CS could appear in three locations: 10° to the left or to the right of the trial-start cue, or in the centre. One block consisted of 18 trials with fixed proportions of trial types (each of the three CSs appeared three times in each block, 9/18 trials in total).

In the remainder of the trials in each block (9/18), the monkeys chose amongst the task CSs. Each trial started with the presentation of a purple trial-start cue at the centre, and the monkeys had to fixate it for 0.5 s. After the monkey fixated on the trial-start cue for 0.5 s, a choice array was presented consisting of two fractals used in the Pavlovian procedure (shown in Fig. 1a). The monkey had to continue to fixate until the trial-start cue disappeared (0.5 s). Monkeys then made saccadic eye movements to their preferred reward-associated fractals and fixated them for 0.75 s to indicate their choices. Then, the unchosen stimulus disappeared, and the monkeys waited for 1 s to receive the scheduled outcome (associated with their chosen fractal).
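The block structure described above (18 trials, of which 9 are Pavlovian CS presentations and 9 are choice trials) can be sketched as follows. This is a minimal illustration, not the task-control code used in the study: the function name, the trial labels and the use of a simple shuffle for pseudorandom ordering are all assumptions, and the pairing of fractals on choice trials is left unspecified.

```python
import random


def make_block(cs_labels=("100%", "50%", "0%"), repeats=3, n_choice=9, seed=None):
    """Return one shuffled block of (trial_type, cs_label) tuples.

    Each CS appears `repeats` times as a Pavlovian trial; choice trials
    carry no CS label here because the pairing rule is not modelled.
    """
    rng = random.Random(seed)
    trials = [("pavlovian", cs) for cs in cs_labels for _ in range(repeats)]
    trials += [("choice", None)] * n_choice
    rng.shuffle(trials)  # hypothetical stand-in for the pseudorandom order
    return trials


block = make_block(seed=1)
```

Shuffling a fixed list, rather than sampling each trial independently, is what keeps the proportions of trial types fixed within every block.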

The inter-trial intervals ranged from 3 to 6 s. Approximately one in five inter-trial intervals contained uncued events (chosen randomly). These could be either a juice reward alone (0.25 ml) or a ~70 dB, 0.15 s auditory white noise burst paired with a brief change in screen colour (same duration as the auditory stimulus).

Neuronal recordings did not begin until the monkeys chose the CSs associated with higher expected value over CSs associated with lower expected value >90% of the time. The monkeys' knowledge of the CSs was further confirmed when we measured the monkeys' licking behaviour. The magnitude of licking was correlated with the reward value of the fractals in the reward-probability block (P < 0.001; Spearman's rank correlation) and the reward-amount block (P < 0.001; Spearman's rank correlation).

Five reward-probability and reward-amount procedure (experiment 2). The reward-probability and reward-amount behavioural procedure consisted of two blocks, a reward-probability block and a reward-amount block. The trial structure was the same as in experiment 1. However, here the reward-probability block contained five objects associated with five probabilistic reward predictions (0, 25, 50, 75 and 100% of 0.25 ml of juice) and the reward-amount block contained five objects associated with 100% reward predictions of varying reward amounts (0.25, 0.1875, 0.125, 0.065 and 0 ml) (refs 7, 23). One block consisted of 20 trials with fixed proportions of trial types (each of the five CSs appeared four times in each block).

Trace reward-probability procedure (experiment 3). The temporal structure of this procedure was the same as in the probability-amount procedure (experiment 1). The trace procedure contained four possible distinct CS fractals. The first two CSs were associated with 100% (CS 1) and 50% (CS 2) chance of 0.25 ml of juice. These CSs remained on the screen for 2.5 s and were followed by the scheduled reward outcome (same as in experiment 1). The other two CSs were also associated with 100% (CS 3) and 50% (CS 4) chance of 0.25 ml of juice but were only presented for 1 s. This was followed by a 1.5 s trace period, during which the screen did not contain any stimulus. The trace period was followed by the scheduled reward outcome. Therefore, in both trace and non-trace conditions, monkeys experienced two types of reward predictions (certain and uncertain) and experienced outcome delivery 2.5 s after the initial CS presentation.

Object learning procedure (experiment 4). Instead of using previously conditioned object fractals, monkeys were exposed to three novel CSs associated with 100, 50 and 0% chance of reward delivery. The task design and temporal structure of the trials were the same as in the probability-amount procedure (experiment 1). However, the interleaved choice trials were choices amongst the three novel fractals.

Appetitive–aversive procedure. The procedure consisted of two alternating blocks: appetitive and aversive (ref. 23). In the appetitive block, three visual fractal CSs were followed by a liquid reward (0.4 ml of juice) with 100%, 50% and 0% chance, respectively. In the aversive block, three visual fractal CSs were followed by an air puff with 100%, 50% and 0% chance, respectively. The air puff (~35 psi) was delivered through a narrow tube placed 6–8 cm from the monkey's face. The temporal structure of the trials was the same as in other procedures, but here monkeys were not required to fixate the trial-start cue. Each block consisted of 12 trials with fixed proportions of trial types (100%, four trials; 50%, four trials; 0%, four trials).

Data processing and statistics. Spike-density functions were generated by convolving spike times with a Gaussian filter (σ = 50 ms). To display single-neuron examples (Figs 1a, 3a and 4a), spike-density functions were generated by convolving spike times with a 100 ms Gaussian filter. A neuron was defined as uncertainty sensitive if its responses varied across the four possible reward predictions (100% 0.25, 50% 0.25, 100% 0.125 and 0 ml of juice) (Kruskal–Wallis test, P < 0.01; analysis window: 100 ms after CS presentation until outcome) and if its response to the uncertain CS (50%) was significantly stronger or weaker than its responses to both 100 and 0% reward CSs (two-tailed rank-sum test; P < 0.01). The same analysis window was used to study neuronal activity during the CS epoch in Fig. 2c.
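For readers unfamiliar with spike-density functions, the following is a minimal sketch of Gaussian smoothing of spike times. It is not the authors' Matlab code: the function name, the 1 ms sampling step and the conversion to spikes per second are assumptions made for illustration. Each spike contributes a unit-area Gaussian, so the area under the resulting curve equals the spike count.

```python
import math


def spike_density(spike_times_ms, t_start_ms, t_stop_ms, sigma_ms=50.0, step_ms=1.0):
    """Return (times_ms, rate_hz) for one trial.

    The rate at each sample is the sum of unit-area Gaussians centred
    on the spike times, converted from spikes/ms to spikes/s.
    """
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))
    n_samples = int(round((t_stop_ms - t_start_ms) / step_ms)) + 1
    times = [t_start_ms + i * step_ms for i in range(n_samples)]
    rate = []
    for t in times:
        density = sum(
            norm * math.exp(-0.5 * ((t - s) / sigma_ms) ** 2)
            for s in spike_times_ms
        )
        rate.append(density * 1000.0)
    return times, rate


# A single spike at 500 ms, smoothed with the 50 ms kernel.
times, rate = spike_density([500.0], 0.0, 1000.0)
```

A wider kernel (for example, sigma_ms=100.0, if the 100 ms display filter is read as the Gaussian's standard deviation) gives smoother traces at the cost of temporal resolution.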

To normalize task-event-related responses, we subtracted baseline activity (the last 500 ms of the inter-trial interval) from the activity during the task-event-related measurement epoch. All statistical tests were two-tailed. For comparisons between two task conditions for each neuron, we used a rank-sum test, unless otherwise noted. For comparisons between two task conditions across the population average, we used a paired signed-rank test, unless otherwise noted. The statistical threshold throughout this study is P < 0.01 unless otherwise noted.
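The baseline subtraction can be illustrated with a short sketch. The helper names are hypothetical, times are in milliseconds relative to the end of the inter-trial interval (an assumed convention), and responses are expressed as mean firing rates, which the text does not state explicitly.

```python
def firing_rate_hz(spike_times_ms, window_start_ms, window_stop_ms):
    """Mean firing rate (spikes/s) in the half-open window [start, stop)."""
    n_spikes = sum(window_start_ms <= s < window_stop_ms for s in spike_times_ms)
    return n_spikes / ((window_stop_ms - window_start_ms) / 1000.0)


def baseline_subtracted_rate(spike_times_ms, epoch_ms, iti_end_ms=0.0, baseline_ms=500.0):
    """Epoch rate minus the rate in the last `baseline_ms` of the inter-trial interval."""
    baseline = firing_rate_hz(spike_times_ms, iti_end_ms - baseline_ms, iti_end_ms)
    response = firing_rate_hz(spike_times_ms, epoch_ms[0], epoch_ms[1])
    return response - baseline


# One spike in the 500 ms baseline (2 spikes/s), three in a 1 s epoch (3 spikes/s).
example = baseline_subtracted_rate([-400.0, 150.0, 300.0, 700.0], (0.0, 1000.0))
```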

To assess the sensitivity of individual uncertainty-selective striatal neurons to task-related variables in experiment 1 (Fig. 2c), we obtained their response indices (difference between neuronal responses to two conditions divided by their sum). To assess CS spatial location sensitivity, we compared responses to the 50% CS when it was shown 10° to the right versus 10° to the left of centre. To assess object-feature sensitivity, we compared responses to two distinct 50% CS fractal objects. Reward-value sensitivity was assessed by comparing neuronal responses to the 100% 0.25 ml CS versus the 0.125 ml CS. Reward-context sensitivity was assessed by comparing CS activity in certain reward trials (100% 0.25 and 0.125 ml CS trials) versus no-reward trials. Uncertainty sensitivity was assessed by comparing responses to 50% reward CSs with 100% reward CSs. Reward prediction error sensitivity was assessed by comparing reward versus no-reward responses after the 50% reward prediction (in the 250 ms window after the outcome). Neuronal responses during experiments 2–4 were measured in the last 500 ms before the trial outcome.
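The response index defined above (the difference between responses to two conditions divided by their sum) reduces to a one-line computation. In the sketch below the function name is hypothetical, and returning 0.0 when the two responses sum to zero is an assumption made only to keep the example well defined.

```python
def response_index(response_a, response_b):
    """(a - b) / (a + b); positive when condition a evokes the larger response."""
    total = response_a + response_b
    if total == 0:
        return 0.0  # assumed convention for an undefined ratio
    return (response_a - response_b) / total


# Example: 30 spikes/s to the 50% CS versus 10 spikes/s to the 100% CS.
uncertainty_index = response_index(30.0, 10.0)
```

For non-negative responses the index is bounded between -1 and 1, which makes indices for different task variables directly comparable within a neuron.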

To calculate the receiver-operating characteristic (ROC) area that assessed neuronal discrimination of uncertainty, we compared spike-density functions of 100% reward CS trials and 50% reward CS trials. The analysis was structured so that ROC area values >0.5 indicate that the activity in the 50% reward CS trials is greater than in the 100% reward CS trials, and values <0.5 indicate that the activity in the 100% reward CS trials is greater than in the 50% reward CS trials.
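The ROC area is equivalent to the probability that a randomly drawn 50% reward CS trial has higher activity than a randomly drawn 100% reward CS trial, with ties counted as one half. The sketch below uses that pairwise formulation; the function name and the use of one activity value per trial are assumptions, not a description of the authors' implementation.

```python
def roc_area(uncertain_trials, certain_trials):
    """Area under the ROC curve from per-trial activity values.

    Values above 0.5 mean activity is greater on uncertain (50%) trials,
    values below 0.5 mean it is greater on certain (100%) trials.
    """
    greater = 0
    ties = 0
    for u in uncertain_trials:
        for c in certain_trials:
            if u > c:
                greater += 1
            elif u == c:
                ties += 1
    n_pairs = len(uncertain_trials) * len(certain_trials)
    return (greater + 0.5 * ties) / n_pairs


# Hypothetical firing rates (spikes/s) with complete separation between conditions.
area = roc_area([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```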

Data availability. Data supporting the findings of this study are available within the article and its Supplementary Information Figures or from the authors on request.

References

1. Hikosaka, O., Yamamoto, S., Yasuda, M. & Kim, H. F. Why skill matters. Trends Cogn. Sci. 17, 434–441 (2013).

2. Schultz, W. Behavioral theories and the neurophysiology of reward. Annu. Rev. Psychol. 57, 87–115 (2006).

3. Padoa-Schioppa, C. & Cai, X. The orbitofrontal cortex and the computation of subjective value: consolidated concepts and new perspectives. Ann. NY Acad. Sci. 1239, 130–137 (2011).

4. Behrens, T. E., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).

5. Kolling, N., Behrens, T. E., Mars, R. B. & Rushworth, M. F. Neural mechanisms of foraging. Science 336, 95–98 (2012).

6. Courville, A. C., Daw, N. D. & Touretzky, D. S. Bayesian theories of conditioning in a changing world. Trends Cogn. Sci. 10, 294–300 (2006).

7. Monosov, I. E. & Hikosaka, O. Selective and graded coding of reward uncertainty by neurons in the primate anterodorsal septal region. Nat. Neurosci. 16, 756–762 (2013).

8. Pearce, J. M. & Hall, G. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol. Rev. 87, 532–552 (1980).

9. Bach, D. R. & Dolan, R. J. Knowing how much you don't know: a neural organization of uncertainty estimates. Nat. Rev. Neurosci. 13, 572–586 (2012).

10. Daddaoua, N., Lopes, M. & Gottlieb, J. Intrinsically motivated oculomotor exploration guided by uncertainty reduction and conditioned reinforcement in non-human primates. Sci. Rep. 6, 20202 (2016).

11. Samejima, K., Ueda, Y., Doya, K. & Kimura, M. Representation of action-specific reward values in the striatum. Science 310, 1337–1340 (2005).

12. Lauwereyns, J., Watanabe, K., Coe, B. & Hikosaka, O. A neural correlate of response bias in monkey caudate nucleus. Nature 418, 413–417 (2002).

13. Graybiel, A. M. Habits, rituals, and the evaluative brain. Annu. Rev. Neurosci. 31, 359–387 (2008).

14. Kim, H. F. & Hikosaka, O. Distinct basal ganglia circuits controlling behaviors guided by flexible and stable values. Neuron 79, 1001–1010 (2013).

15. Nakamura, K., Santos, G. S., Matsuzaki, R. & Nakahara, H. Differential reward coding in the subdivisions of the primate caudate during an oculomotor task. J. Neurosci. 32, 15963–15982 (2012).

16. Lau, B. & Glimcher, P. W. Value representations in the primate striatum during matching behavior. Neuron 58, 451–463 (2008).

17. Cai, X., Kim, S. & Lee, D. Heterogeneous coding of temporally discounted values in the dorsal and ventral striatum during intertemporal choice. Neuron 69, 170–182 (2011).

18. Burke, C. J. & Tobler, P. N. Coding of reward probability and risk by single neurons in animals. Front. Neurosci. 5, 121 (2012).

19. Platt, M. L. & Huettel, S. A. Risky business: the neuroeconomics of decision making under uncertainty. Nat. Neurosci. 11, 398–403 (2008).

20. Hikosaka, O., Sakamoto, M. & Usui, S. Functional properties of monkey caudate neurons. I. Activities related to saccadic eye movements. J. Neurophysiol. 61, 780–798 (1989).

21. Yamada, H. et al. Characteristics of fast-spiking neurons in the striatum of behaving monkeys. Neurosci. Res. 105, 2–18 (2016).

22. Yamamoto, S., Monosov, I. E., Yasuda, M. & Hikosaka, O. What and where information in the caudate tail guides saccades to visual objects. J. Neurosci. 32, 11005–11016 (2012).

23. Monosov, I. E., Leopold, D. A. & Hikosaka, O. Neurons in the primate medial basal forebrain signal combined information about reward uncertainty, value, and punishment anticipation. J. Neurosci. 35, 7443–7459 (2015).

24. Hayashi, K., Nakao, K. & Nakamura, K. Appetitive and aversive information coding in the primate dorsal raphe nucleus. J. Neurosci. 35, 6195–6208 (2015).

25. Paton, J. J., Belova, M. A., Morrison, S. E. & Salzman, C. D. The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature 439, 865–870 (2006).

26. Spooren, W. P., Lynd-Balta, E., Mitchell, S. & Haber, S. N. Ventral pallidostriatal pathway in the monkey: evidence for modulation of basal ganglia circuits. J. Comp. Neurol. 370, 295–312 (1996).

27. Ledbetter, M. N., Chen, D. C. & Monosov, I. E. Multiple mechanisms for processing reward uncertainty in the primate basal forebrain. J. Neurosci. 36, 7852–7864 (2016).

28. Haber, S. N., Kunishio, K., Mizobuchi, M. & Lynd-Balta, E. The orbital and medial prefrontal circuit through the primate basal ganglia. J. Neurosci. 15, 4851–4867 (1995).

29. Alexander, G. E., DeLong, M. R. & Strick, P. L. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci. 9, 357–381 (1986).

30. O'Neill, M. & Schultz, W. Coding of reward risk by orbitofrontal neurons is mostly distinct from coding of reward value. Neuron 68, 789–800 (2010).

31. Hirsh, J. B., Mar, R. A. & Peterson, J. B. Psychological entropy: a framework for understanding uncertainty-related anxiety. Psychol. Rev. 119, 304–320 (2012).

32. Hsu, M., Bhatt, M., Adolphs, R., Tranel, D. & Camerer, C. F. Neural systems responding to degrees of uncertainty in human decision-making. Science 310, 1680–1683 (2005).

33. Grinband, J., Hirsch, J. & Ferrera, V. P. A neural representation of categorization uncertainty in the human brain. Neuron 49, 757–763 (2006).

34. Yanike, M. & Ferrera, V. P. Representation of outcome risk and action in the anterior caudate nucleus. J. Neurosci. 34, 3279–3290 (2014).

35. Yamamoto, S., Kim, H. F. & Hikosaka, O. Reward value-contingent changes of visual responses in the primate caudate tail associated with a visuomotor skill. J. Neurosci. 33, 11227–11238 (2013).

36. Jog, M. S., Kubota, Y., Connolly, C. I., Hillegaart, V. & Graybiel, A. M. Building neural representations of habits. Science 286, 1745–1749 (1999).

37. Klein, J. T. & Platt, M. L. Social information signaling by neurons in primate striatum. Curr. Biol. 23, 691–696 (2013).

38. Haber, S. N. & Knutson, B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology 35, 4–26 (2010).

39. Ongur, D. & Price, J. L. The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb. Cortex 10, 206–219 (2000).

40. Ferry, A. T., Ongur, D., An, X. & Price, J. L. Prefrontal cortical projections to the striatum in macaque monkeys: evidence for an organization related to prefrontal networks. J. Comp. Neurol. 425, 447–470 (2000).

Acknowledgements

This work was supported by the Defense Advanced Research Projects Agency (DARPA) Biological Technologies Office (BTO) ElectRx program under the auspices of Dr Doug Weber through the CMO Grant/Contract No. HR0011-16-2-0022, an Edward Mallinckrodt, Jr. Foundation award to I.E.M., and the Department of Neuroscience, Washington University School of Medicine. We are grateful to Dr Noah Ledbetter for assisting in data acquisition, to Ms Kim Kocher for fantastic animal care and training and to Mr Charles Chen for assisting with analyses. We thank Drs Timothy Holy, David Leopold and David Van Essen for reading earlier versions of this manuscript. Last, we thank Jonathon Tucker for helping with magnetic resonance imaging and Julia Pai for assistance with the associated artwork.

Author contributions

I.E.M. and J.K.W. designed the research; J.K.W. and I.E.M. performed the research, analysed the data and wrote the paper.

Additional information

Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications

Competing financial interests: The authors declare no competing financial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

How to cite this article: White, J. K. & Monosov, I. E. Neurons in the primate dorsal striatum signal the uncertainty of object–reward associations. Nat. Commun. 7:12735 doi: 10.1038/ncomms12735 (2016).

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

© The Author(s) 2016

Details

Title

Neurons in the primate dorsal striatum signal the uncertainty of object-reward associations

Author

White, J Kael; Monosov, Ilya E

Pages

12735

Publication year

2016

Publication date

Sep 2016

Publisher

Nature Publishing Group

e-ISSN

2041-1723

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/ncomms12735

ProQuest document ID

1819113886
