The Thalamostriatal Pathway and Cholinergic Control of Goal-Directed Action: Interlacing New with Existing Learning in the Striatum
Laura A. Bradfield, Jesus Bertran-Gonzalez, Billy Chieng, and Bernard W. Balleine
Neuron (2013)
DOI: 10.1016/j.neuron.2013.04.039
Brief summary: Bradfield and colleagues showed that lesions of the parafascicular thalamic nucleus (Pf), disconnection of the Pf from the posterior dorsomedial striatum (pDMS), and suppression of cholinergic interneuron (ChI) activity in the pDMS each impaired the learning of action-outcome contingencies when the associative relationship was changed during learning.
In this study, the authors trained rats on a series of goal-directed behaviors. First, rats learned two lever-press actions, each paired with a distinct outcome: for example, the left lever with food and the right lever with sucrose solution. Second, one outcome (randomly picked) was devalued by pre-feeding. Devaluation reduced the animals' choice of the action associated with the devalued outcome in the subsequent test. The third phase was a reward (contingency) degradation test. Degradation means weakening the action-outcome (A-O) contingency by delivering one outcome at random, independent of the animals' actions. It tests the animals' sensitivity to the A-O association: behaviorally, responding for the degraded outcome should decrease. Fourth, after the degradation test, rats were re-trained on the initial A-O associations and then a reversal learning protocol was applied (from A1-O1/A2-O2 to A1-O2/A2-O1). Fifth, a second devaluation test was run on the updated A-O contingencies. Sixth, following that devaluation test, rats re-learned the reversed contingencies from step four and then underwent extinction. Finally, an outcome-selective reinstatement test was applied. Reinstatement means selectively delivering one outcome after extinction to remind the rats of the preceding A-O contingency. As this description makes clear, these are demanding experiments.
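The logic of the degradation test can be made concrete with the standard ΔP measure of contingency, ΔP = P(O | A) − P(O | no A). The sketch below is only illustrative: the function name and the probabilities are assumptions chosen for the example, not the schedule parameters used in the paper.

```python
def delta_p(p_outcome_given_action: float, p_outcome_given_no_action: float) -> float:
    """Return the A-O contingency as P(O | A) - P(O | no A)."""
    for p in (p_outcome_given_action, p_outcome_given_no_action):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_outcome_given_action - p_outcome_given_no_action


# Hypothetical per-second outcome probabilities (assumed, not from the paper).
# Intact contingency: the outcome arrives only when the lever is pressed.
intact = delta_p(0.05, 0.0)

# Degraded contingency: the same outcome is also delivered "for free"
# at the same rate when the animal does not press.
degraded = delta_p(0.05, 0.05)

print(f"intact contingency:   {intact:.2f}")
print(f"degraded contingency: {degraded:.2f}")
```

When free outcomes are as likely as earned ones, ΔP falls to zero, so an animal sensitive to the A-O contingency should reduce responding on the degraded action while maintaining it on the other.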
First, the authors showed that the Pf projects specifically to the ipsilateral pDMS. Second, excitotoxic lesions of the Pf with NMDA decreased the activity of ChIs in the pDMS. Third, bilateral NMDA-induced lesions of the Pf, disconnection of the Pf from the pDMS (by combining a unilateral Pf lesion with a unilateral pDMS lesion in the contralateral hemisphere), and reduction of ChI activity (by applying a muscarinic M2/M4 agonist) each impaired the degradation effect (step three), the second devaluation effect (step five), and the reinstatement effect (step six), while performance in the other phases was unaffected.
The manipulations did not affect initial learning of the A-O associations (step one), implying that the Pf, the Pf-pDMS connection, and ChIs in the pDMS are not required for learning the A-O contingency or for executing the corresponding actions. The first devaluation effect (step two) was also unaffected, indicating that these targets are not responsible for reward detection or for updating reward value. The impairment observed in the degradation test (step three) supports a role in detecting or updating changes in the A-O contingency. However, performance during reversal learning in step four, where the A-O contingency also changed, was not significantly affected, which argues against the A-O contingency updating hypothesis. The impairment seen in the second, but not the first, devaluation test might be attributed to a weak memory of the new A-O contingency after reversal learning. The same reasoning could apply to the result of the reinstatement test. Taking these results together, the A-O contingency updating hypothesis alone cannot explain the overall pattern of observations. It seems that these targets (Pf, Pf-pDMS connections, ChIs in the pDMS) are critical when both the A-O contingency and the reward value have changed.
This is a typical data-driven story. I don't believe that the authors conceived the question first and then designed the experiments. It was probably the opposite: they first observed the phenomena and then packaged the story. This kind of research depends heavily on deep background knowledge. Because Bernard W. Balleine is an expert and a student of Anthony Dickinson, the leading figure in the instrumental learning field, he is very familiar with what each of these behavioral tests is designed to probe. So he is able to weave, retrospectively, an appealing story from the results. However, following yesterday's discussion, I don't think the authors can draw solid conclusions from these intertwined effects, which are shaped by the many hidden parameters in the series of behavioral tests they used.
Although they had the energy to push the paper into a high-profile journal, more effort will be needed to convince people to buy the story. As usual, experimental results are one thing; interpretation is another. Credit in science does not go to the people who made the unexpected findings, but to those who incorporated the findings into a self-consistent conceptual framework. I am afraid that, at least in this paper, the authors have not achieved that. Let's follow their new studies and try to find convincing theories to account for the observations in the current study.