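Humans and other animals generalize prior knowledge to new problems by extracting abstract knowledge structures and mapping them onto the specific sensorimotor states of each new situation. But how does the brain accomplish this?
A study by Samborska et al., published in Nature Neuroscience in October 2022, addressed this question. It found that although sensorimotor correlates differed between tasks, neurons in the medial prefrontal cortex (mPFC) maintained similar representations of task structure across tasks, whereas the representations of neurons in hippocampal dorsal CA1 (dCA1) were more strongly shaped by the specific details of each task.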

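When we walk into a new restaurant, we sit down, scan a QR code to order, pay the bill and wait for our food. This feels natural because we already know how restaurants work; we only need to map what we have learned onto the new situation. This mapping requires extracting a general knowledge structure from the sensorimotor details of past experiences so that it can be applied seamlessly to new scenarios.
How does the brain carry out this transfer of general knowledge? Previous studies suggest that interactions between the prefrontal cortex and the hippocampus may play an important role. To probe how each region contributes to abstracting and generalizing knowledge, Samborska et al. used a new variant of the probabilistic reversal learning paradigm: a series of tasks that share the same abstract structure but differ in the physical locations of the ports, and hence in their sensorimotor details.
Experimental paradigm
Mice (hereafter "subjects") performed a sequence of reversal learning tasks ("problems") that shared the same structure but used different physical port layouts. Within a problem, on each trial the subject first poked an initiation port and then chose between two choice ports, A and B, for a probabilistic reward. Either A was rewarded with probability 80% and B with 20%, or the reverse. Once the proportion of trials on which the subject chose the high-probability port exceeded 75%, the reward probabilities of A and B swapped, i.e., a reversal occurred. After the subject completed 10 reversals, it moved on to a new problem. Training design: trials per block (until the 75% criterion is reached) × 10 reversal blocks × 10 problems.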
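The trial and block structure can be sketched as a small simulation. This is a hypothetical illustration, not the authors' task code: the class `ReversalTask`, the helper `run_problem` and the toy win-stay/lose-shift subject are assumptions, and the reversal rule borrows the exponential moving average of choices (tau = 8 trials) shown in Fig. 1d, with a 75% threshold.

```python
import numpy as np

class ReversalTask:
    """Hypothetical sketch of one probabilistic reversal learning problem."""

    def __init__(self, p_high=0.8, p_low=0.2, threshold=0.75, tau=8.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.p_high, self.p_low = p_high, p_low
        self.threshold = threshold
        self.alpha = 1.0 / tau   # EMA update rate (assumed tau = 8 trials)
        self.good_port = "A"     # port currently rewarded with p_high
        self.ema_a = 0.5         # exponential moving average of A choices
        self.n_reversals = 0

    def step(self, choice):
        """One trial after the initiation poke: choice is 'A' or 'B'; returns 0/1 reward."""
        p = self.p_high if choice == self.good_port else self.p_low
        reward = int(self.rng.random() < p)
        self.ema_a += self.alpha * (float(choice == "A") - self.ema_a)
        # Fraction of recent choices that went to the currently good port.
        frac_good = self.ema_a if self.good_port == "A" else 1.0 - self.ema_a
        if frac_good > self.threshold:
            self.good_port = "B" if self.good_port == "A" else "A"  # reversal
            self.n_reversals += 1
        return reward

def run_problem(task, n_reversals_needed=10, max_trials=5000):
    """Run a toy win-stay/lose-shift subject until the problem is complete."""
    choice, trials = "A", 0
    while task.n_reversals < n_reversals_needed and trials < max_trials:
        reward = task.step(choice)
        if not reward:  # lose-shift; otherwise win-stay
            choice = "B" if choice == "A" else "A"
        trials += 1
    return trials

task = ReversalTask(seed=1)
n_trials = run_problem(task)  # trials needed for 10 reversals
```

Under 80/20 contingencies a win-stay/lose-shift subject sits on the good port about 80% of the time, so it keeps crossing the 75% threshold and completes a problem; a learning subject would be expected to need fewer trials per reversal as training proceeds.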
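1. Subjects generalized the learned task structure across problems
The behavioral data showed that the number of trials subjects needed to reach the reversal threshold decreased with training. More importantly, this decrease occurred not only within problems but also across problems. This suggests that subjects integrated their experience of obtaining high-probability rewards and used it to improve their ability to choose the high-probability port. A logistic regression analysis supported this: subjects' current choices were predicted by the history of their previous choices and of choice × outcome interactions. Overall, the data indicate that, based on past experience, subjects learned the abstract structure of the task, that is, they learned how to learn it.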
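A minimal sketch of this kind of choice-history regression, assuming choices coded +1 (A) / -1 (B), outcomes coded +1 (rewarded) / -1 (not rewarded), and lags 1 to 11 as in Fig. 1i,j. The function names are hypothetical, and the fit uses plain Newton-Raphson with a small ridge penalty in NumPy rather than any particular statistics package. The toy data come from a noisy win-stay/lose-shift rule with outcomes drawn independently of choice, so only the lag-1 choice × outcome weight should be clearly non-zero.

```python
import numpy as np

def lagged_design(choices, outcomes, n_lags=11):
    """Predictors at lags 1..n_lags for choice, outcome and choice x outcome.

    choices: +1 (A) / -1 (B); outcomes: +1 rewarded / -1 not rewarded.
    Returns X with shape (n_trials - n_lags, 3 * n_lags) and y = 1 if the
    current choice is A, else 0.
    """
    c = np.asarray(choices, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    n = len(c)
    cols = [series[n_lags - lag:n - lag]
            for series in (c, o, c * o)
            for lag in range(1, n_lags + 1)]
    return np.column_stack(cols), (c[n_lags:] > 0).astype(float)

def fit_logistic(X, y, n_iter=25, ridge=1e-3):
    """Logistic regression with an intercept, fitted by Newton-Raphson (IRLS)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (y - p) - ridge * w
        hess = Xb.T @ (Xb * (p * (1.0 - p))[:, None]) + ridge * np.eye(len(w))
        w += np.linalg.solve(hess, grad)
    return w  # w[0] intercept, then choice, outcome, interaction blocks of n_lags

# Toy data from a noisy win-stay/lose-shift rule (hypothetical subject):
# P(choose A) = sigmoid(1.5 * previous choice * previous outcome).
rng = np.random.default_rng(0)
T = 5000
outcomes = rng.choice([-1.0, 1.0], size=T)
choices = np.empty(T)
choices[0] = 1.0
for t in range(1, T):
    p_a = 1.0 / (1.0 + np.exp(-1.5 * choices[t - 1] * outcomes[t - 1]))
    choices[t] = 1.0 if rng.random() < p_a else -1.0

X, y = lagged_design(choices, outcomes)
w = fit_logistic(X, y)
interaction_lag1 = w[1 + 2 * 11]  # first choice x outcome coefficient
```

With this coding, a positive choice × outcome weight means "repeat a rewarded choice, switch after an unrewarded one", which is the signature of learning from outcome history.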

Fig. 1 Transfer learning in mice. a, Trial structure of the probabilistic reversal learning problem. Mice poked in an initiation port (gray) and then chose between two choice ports (green and pink) for a probabilistic reward. b, Block structure of the probabilistic reversal learning problem. Reward contingencies reversed after the animal consistently chose the high reward probability port. c, Example sequence of problems used for training, showing different locations of the initiation (I) and two choice ports (A and B) in each problem. d, Example behavioral session late in training in which the animal completed 12 reversals. Top sub-panels show animals’ choices, outcomes they received and which side had high reward probability; bottom panel shows exponential moving average of subjects’ choices (tau = 8 trials). e, Mean number of trials after a reversal taken to reach the threshold to trigger the next reversal, as a function of problem number. f, Probability of choosing the new best option (the choice that becomes good after the reversal) on the last ten trials before the reversal and the first ten trials after the reversal split by the first problem and the last problem. The P value refers to the difference between the slopes after the reversal point in early and late training (paired-sample t-test, two-sided). g, Mean number of pokes per trial to a choice port that was no longer available because the subject had already chosen the other port, as a function of problem number. h, Mean number of pokes per trial to a choice port that was no longer available as a function of reversal number on the first five problems and the last five problems during training. The P value refers to the difference in the log of the time constants from fitted exponential curves in early and late training (paired-sample t-test, two-sided).
i,j, Coefficients from a logistic regression predicting current choices using the history of previous choices (i), outcomes (not shown) and choice × outcome interactions (j). For each problem and predictor, the coefficients at lag 1–11 trials are plotted. k,l, Coefficients for the previous trial (lag 1, left) and average coefficients across lags 2–11 (right), as a function of problem number (P values derived from repeated-measures one-way ANOVAs with problem number as the within-subjects factor). Error bars on all plots show mean ± s.e.m. across mice (n = 9 mice). P values in e and g are from the two-way repeated-measures ANOVAs with problem number and reversal number as within-subjects factors.
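How does this generalization arise? One explanation is abstraction of the task structure: the brain represents different physical ports that play the same role in the task with the same neurons. To test this mechanism, the authors recorded the spiking activity of neurons in mPFC and in the hippocampus (dorsal CA1), in separate groups of mice. Recording design: trials per block (until the 75% criterion is reached) × 4 reversal blocks × 3 problems.
2. Abstract and problem-specific representations in PFC and CA1
At the single-neuron level, PFC neurons represented the abstract task structure more stably across problems. For example, cell 2 in PFC fired only at the initiation port in every problem, even though the physical location of the initiation port changed. This suggests that PFC neurons contribute more to building representations of the abstract structure of the trial. In CA1, by contrast, some neurons responded only to a port at a particular physical location, and the activity of others "remapped" between problems, i.e., their firing changed with the particular combination of physical port and trial event.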

Fig. 2 Recording units across multiple problems in a single session. a, Silicon probes targeting hippocampal dorsal CA1 and mPFC were implanted in separate groups of mice. b, Diagram of problem layout types used during recording sessions. c, Example recording session in which a subject completed four reversals in each of three problems. Top panel shows the ports participating in each problem color-coded by layout type. Bottom panel shows the exponential moving average of choices, with the choices, outcomes and reversal blocks shown above. d, Example PFC neurons. Cell 1 in PFC fired selectively to both choice ports (but not initiation) in each problem, even though the physical location of the choice ports was different both within and across problems. Cell 2 fired at the initiation port in every problem, even when its physical location changed. Cell 3 fired at B choice ports in all problems but also gained a firing field when initiation port moved to the previous B choice port (showing that PFC does have some port-specific activity). Cell 4 responded to reward at every choice port in every problem. Cell 5 responded to reward omission and had high firing during the ITI. Cell 6 responded to reward at B choice port (that switched location) in each problem. e, Example CA1 neurons. Some CA1 cells also had problem-general firing properties (cells 1 and 2). Cell 1 fired at B choice that switched physical location between problems. Cell 2 responded to the same port in all problems and modulated its firing rate depending on whether it was rewarded or not. Cell 3 fired at the same port in all layouts. Cell 4 switched its firing preference from initiation to B choice that shared physical locations, analogous to ‘place cells’ firing at a particular physical location. This port selectivity was more pronounced in CA1 than PFC (Extended Data Fig. 4). Cells 5 and 6 ‘remapped’, showing interactions between problem and physical port.
Cell 5 fired at a given port in one layout but not when the same port was visited in a different layout. Cell 6 fired at choice time at a given port in one layout and changed its preferred firing time to pre-initiation in a different layout. In all plots, average firing rates are arranged by layout types 1, 2 and 3, but the order in which they were experienced is plotted in the ‘Experienced layouts’ sub-panel. Error bars show firing rates ± s.e.m. across trials.
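3. At the population level, PFC generalizes task representations more strongly than CA1
A linear regression model was used to estimate how strongly trial variables (A/B choice, outcome and their interaction) explain the activity of each neuron in each region. Activity in both PFC and CA1 reflected the current choice, the outcome and the choice × outcome interaction, but the strength of each representation was region-specific: A/B choice was represented more strongly in CA1 than in PFC, whereas outcome was encoded more strongly in PFC.
In addition, representational similarity analysis (RSA) with a set of model representational dissimilarity matrices (RDMs) was used to measure how problem-general and problem-specific features shape population activity. PFC carried stronger abstract, sensorimotor-invariant representations of trial stage (initiation vs. choice) and outcome, whereas CA1 more strongly represented the physical location of the poked port and A vs. B choice. Surprisingly, although port A kept the same physical location and the same abstract role in every problem, CA1 still represented the A choice differently in different problems. This indicates that merely changing the task context was enough to make CA1 neurons "remap".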
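The strength of each regressor in Fig. 4b is reported as a coefficient of partial determination (CPD): the fraction of a neuron's residual variance that a regressor explains uniquely, CPD_i = (SSE_reduced - SSE_full) / SSE_reduced, where the reduced model drops regressor i. The sketch below computes it with ordinary least squares for one timepoint; the function names and the two toy neurons (one coding choice, one coding outcome) are illustrative assumptions.

```python
import numpy as np

def sse(X, Y):
    """Residual sum of squares of an OLS fit of every column of Y on X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return np.sum(resid ** 2, axis=0)

def cpd(X, Y):
    """CPD of each regressor (rows) for each neuron/timepoint column of Y."""
    sse_full = sse(X, Y)
    out = np.empty((X.shape[1], Y.shape[1]))
    for i in range(X.shape[1]):
        sse_reduced = sse(np.delete(X, i, axis=1), Y)  # model without regressor i
        out[i] = (sse_reduced - sse_full) / sse_reduced
    return out

# Toy data: 500 trials, regressors coded +1/-1.
rng = np.random.default_rng(0)
n_trials = 500
choice = rng.choice([-1.0, 1.0], size=n_trials)
outcome = rng.choice([-1.0, 1.0], size=n_trials)
X = np.column_stack([np.ones(n_trials), choice, outcome, choice * outcome])
Y = np.column_stack([
    2.0 * choice + 0.5 * rng.normal(size=n_trials),   # hypothetical "choice" neuron
    2.0 * outcome + 0.5 * rng.normal(size=n_trials),  # hypothetical "outcome" neuron
])
cpds = cpd(X, Y)  # rows: intercept, choice, outcome, choice x outcome
```

Because the reduced model is nested in the full one, CPDs are non-negative, and comparing them between regions (with a permutation null) is what supports statements such as "choice is represented more strongly in CA1".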
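A minimal RSA sketch in the spirit of Fig. 4c–e: condition-averaged population vectors are demeaned, Pearson correlations between condition pairs form the similarity matrix, and its off-diagonal entries are regressed on model matrices (written here as similarities, 1 = predicted similar, rather than dissimilarities). The paper reports CPDs of this regression; for brevity the sketch returns plain regression weights. The conditions, the two models and the synthetic "choice-dominated" population are illustrative assumptions.

```python
import numpy as np
from itertools import product

def similarity_matrix(patterns):
    """Pearson correlation between demeaned condition vectors.
    patterns: (n_conditions, n_neurons) condition-averaged activity."""
    demeaned = patterns - patterns.mean(axis=0, keepdims=True)  # remove each neuron's mean
    return np.corrcoef(demeaned)  # rows are conditions

def fit_model_matrices(sim, models):
    """OLS of the upper-triangle similarities on model matrices plus an intercept."""
    iu = np.triu_indices_from(sim, k=1)
    X = np.column_stack([np.ones(len(iu[0]))] + [m[iu] for m in models.values()])
    beta, *_ = np.linalg.lstsq(X, sim[iu], rcond=None)
    return dict(zip(models, beta[1:]))

def same_category(conds, key):
    """Model matrix: 1 where two conditions share the value returned by key."""
    return np.array([[float(key(a) == key(b)) for b in conds] for a in conds])

# Hypothetical conditions: (problem, choice, rewarded) for 3 problems.
conds = list(product(range(3), "AB", (1, 0)))
models = {
    "A vs B choice": same_category(conds, lambda c: c[1]),
    "Outcome": same_category(conds, lambda c: c[2]),
}

# Synthetic population in which choice dominates and outcome is weakly coded.
rng = np.random.default_rng(0)
n_neurons = 60
w_choice, w_outcome = rng.normal(size=n_neurons), rng.normal(size=n_neurons)
patterns = np.array([
    2.0 * float(c[1] == "A") * w_choice + 0.5 * c[2] * w_outcome
    + 0.3 * rng.normal(size=n_neurons)
    for c in conds
])
weights = fit_model_matrices(similarity_matrix(patterns), models)
```

Adding a "problem-specific A choice" model (1 only for A-choice pairs from the same problem) to `models` would be the analogous test for the CA1 remapping effect described above.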


Fig. 4 Problem-general and problem-specific representations in PFC and CA1 population activity. a, Linear regression predicting activity of each neuron at each timepoint across the trial, as a function of the choice, outcome and outcome × choice interaction. b, CPDs from the linear model shown in a for choice, outcome and outcome × choice regressors in PFC and CA1. Significance levels for within-region effects were based on a two-sided permutation test where firing rates were rolled with respect to trials. Significance levels for differences between regions were based on a two-sided permutation test across sessions. All significance levels were corrected for multiple comparison over timepoints. c, Representation similarity at ‘choice time’ (left) and ‘outcome time’ (right), quantified as the Pearson correlation between the demeaned neural activity vectors for each pair of conditions. d, RDMs used to model the patterns of representation similarity observed in the data. Each RDM codes the expected pattern of similarities among categories in c under the assumption that the population represents a given variable. The Port RDM models a representation of the physical port poked (for example, far left) irrespective of its meaning in the trial. A vs B choice models a representation of A/B choices irrespective of physical port. The Outcome RDM models representation of reward versus reward omission. The Outcome at A vs B RDM models separate representations of reward versus omission after A and B choices. Choice vs Initiation models representation of the stage in the trial. Problem-specific A choice models separate representation of the A choice in different problems. e, CPDs in a regression analysis modeling the pattern of representation similarities using the RDMs shown in d.
The time course is given by sliding the windows associated with choices from being centred on choice port entry to 0.76 seconds after choice port entry while holding time windows centred on trial initiations fixed. Stars indicate timepoints where regression weight for each RDM was significantly different between the two regions (P < 0.05 (small stars) and P < 0.001 (big stars)), from one-sided permutation tests across sessions corrected for multiple comparison over timepoints. f, Confusion matrices from linear decoding of position in trial, using a decoder that was trained on one problem and tested on another, averaged across animals and across all problem pairs. Colored squares indicate three possible patterns of decoding that indicate different neuronal content. Blue indicates correct cross-task decoding to the same abstract state (for example, B choice decodes to B choice). Red indicates decoding to a different state that could have occurred at the same sequential position in the trial (for example, B choice decodes to A choice). Dashed green corresponds to decoding to the same physical port for those training and test layouts where the Initiation and B choice ports interchanged (for example, B choice decodes to Initiation when the decoder was trained on layout 2 and tested on layout 3). g, Bar plots showing the probability of the cross-task decoder outputting the correct abstract state (blue), the other state that can have the same position in the trial sequence (red) and the state that has the same physical port as the training data (dashed green, computed only from confusion matrices where B choice and initiation ports interchange) computed using the corresponding cells highlighted in f. Error bars report the mean ± s.e.m. across different mice (CA1: n = 3 mice; PFC: n = 4). Significance levels were compared against the null distribution obtained by shuffling animal identities between regions (one-sided permutation tests). NS, not significant.
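The cross-problem decoding in Fig. 4f,g can be sketched as training a linear decoder on one problem and testing it on another, then tabulating a confusion matrix of true versus decoded trial states. The caption does not specify the decoder, so this sketch assumes a ridge-regression-to-one-hot classifier; the simulated populations, in which a fraction of neurons keep their tuning across problems and the rest are redrawn ("remapped"), and all function names are hypothetical.

```python
import numpy as np

STATES = ["Initiation", "A choice", "B choice"]

def fit_linear_decoder(X, labels, n_classes, ridge=1.0):
    """Ridge regression of one-hot state labels on firing rates (plus intercept)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    targets = np.eye(n_classes)[labels]
    return np.linalg.solve(Xb.T @ Xb + ridge * np.eye(Xb.shape[1]), Xb.T @ targets)

def decode(W, X):
    """Decoded state = class with the largest linear score."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.argmax(Xb @ W, axis=1)

def confusion(true, pred, n_classes):
    """Row-normalised confusion matrix: P(decoded state | true state)."""
    M = np.zeros((n_classes, n_classes))
    np.add.at(M, (true, pred), 1)
    return M / M.sum(axis=1, keepdims=True)

def simulate_trials(tuning, n_per_state, rng, noise=1.0):
    """Hypothetical population: mean rate per state (tuning) plus Gaussian noise."""
    labels = np.repeat(np.arange(tuning.shape[0]), n_per_state)
    X = tuning[labels] + noise * rng.normal(size=(len(labels), tuning.shape[1]))
    return X, labels

def remap(tuning, frac_general, rng):
    """Keep a fraction of neurons' tuning across problems; redraw the rest."""
    new = rng.normal(size=tuning.shape)
    keep = rng.random(tuning.shape[1]) < frac_general
    new[:, keep] = tuning[:, keep]
    return new

rng = np.random.default_rng(0)
n_states, n_neurons = len(STATES), 80
tuning_p1 = rng.normal(size=(n_states, n_neurons))
results = {}
for frac in (1.0, 0.0):  # fully problem-general vs fully remapped population
    X1, y1 = simulate_trials(tuning_p1, 100, rng)                    # training problem
    X2, y2 = simulate_trials(remap(tuning_p1, frac, rng), 100, rng)  # test problem
    W = fit_linear_decoder(X1, y1, n_states)
    results[frac] = confusion(y2, decode(W, X2), n_states)
```

A problem-general population puts most of its mass on the diagonal (the blue cells in Fig. 4f), whereas a fully remapped one scatters decoded states arbitrarily.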
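4. Generalization of low-dimensional population activity
To explore further how the structure of population activity generalizes across problems, the authors used singular value decomposition (SVD) to assess how well the low-dimensional activity patterns of one problem predict population activity in another. The results were: (1) for temporal modes, PFC and CA1 did not differ significantly in how well their representation of trial events generalized across problems; (2) for cellular modes (assemblies of co-active neurons), both regions explained activity better within a problem than across problems, but PFC cellular modes generalized across problems significantly better than those of CA1; (3) for aligned pairs of temporal and cellular modes, both regions again performed better within than across problems, with PFC generalizing significantly better than CA1. Overall, although both PFC and CA1 show temporal structure that generalizes across problems, PFC neuronal assemblies generalize more strongly.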
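The cross-problem SVD test can be sketched as follows, assuming each data matrix D has one row per neuron (the same neurons in both problems, since they were recorded in one session) and one column per trial type × timepoint, and is mean-centred per neuron. Temporal patterns (rows of Vᵀ) or cellular patterns (columns of U) from one problem are used to project the other problem's data, and the cumulative variance explained is tracked as more modes are added. The function name and the synthetic "general" and "specific" populations are illustrative assumptions.

```python
import numpy as np

def cumulative_variance_explained(D_train, D_test, mode="temporal"):
    """Fraction of D_test's variance captured by the top-k singular vectors of
    D_train, for k = 1..rank. D: (n_neurons, n_trial_types * n_timepoints)."""
    U, s, Vt = np.linalg.svd(D_train, full_matrices=False)
    total = np.sum(D_test ** 2)
    ve = np.empty(len(s))
    for k in range(1, len(s) + 1):
        if mode == "temporal":
            proj = D_test @ Vt[:k].T @ Vt[:k]      # onto temporal patterns (rows of Vt)
        elif mode == "cellular":
            proj = U[:, :k] @ U[:, :k].T @ D_test  # onto cellular patterns (columns of U)
        else:
            raise ValueError(mode)
        ve[k - 1] = np.sum(proj ** 2) / total
    return ve

rng = np.random.default_rng(0)
n_neurons, n_cols, rank = 50, 40, 3
temporal = rng.normal(size=(rank, n_cols))     # shared trial-event time courses
cells_p1 = rng.normal(size=(n_neurons, rank))  # cell assemblies in problem 1

def noisy(cells):
    """Hypothetical data matrix: assemblies x time courses + noise, mean-centred."""
    D = cells @ temporal + 0.5 * rng.normal(size=(n_neurons, n_cols))
    return D - D.mean(axis=1, keepdims=True)

D1 = noisy(cells_p1)
D2_general = noisy(cells_p1)                             # same assemblies in problem 2
D2_specific = noisy(rng.normal(size=(n_neurons, rank)))  # assemblies redrawn
ve = {
    (name, mode): cumulative_variance_explained(D1, D2, mode)
    for name, D2 in {"general": D2_general, "specific": D2_specific}.items()
    for mode in ("temporal", "cellular")
}
```

In this toy example both populations share the same temporal structure, so temporal modes transfer in both; only the population that reuses the same cell assemblies also transfers its cellular modes, which is the kind of dissociation described above for PFC versus CA1.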

Fig. 5 Generalization of low-dimensional representations of trial events. a, Diagram of SVD analysis. A data matrix comprising the average activity of each neuron across timepoints and trial types was decomposed into the product of three matrices, where diagonal matrix Σ linked a set of temporal patterns across trial type and time (rows of Vᵀ) to a set of cellular patterns across cells (columns of U). b, First temporal mode in Vᵀ from SVD decomposition of data matrix from PFC plotted in each problem separately for clarity and separated by A (green) and B (pink) rewarded (solid) and non-rewarded (dashed) choices. c, First cellular mode from SVD decomposition of data matrix from PFC in each problem showing that similar patterns of cells participate in all problems. d, Variance explained when using temporal activity patterns V₁ᵀ from one problem to predict either held-out activity from the same problem (solid lines) or activity from a different problem (dashed lines). Light purple and lilac lines indicate variance explained when shuffling timepoints in the firing rates matrices. e, Variance explained when using temporal activity patterns V₁ᵀ to predict either activity from the same problem and brain region (solid lines) or a different brain region (and, therefore, different animal) and the same problem (dashed lines). f, Variance explained when using cellular activity patterns U₁ from one problem to predict either held-out activity from the same problem (solid lines) or activity from a different problem (dashed lines). Dashed light purple and lilac lines indicate variance explained when shuffling cells in the firing rates matrices. g, Cumulative weights along the diagonal Σ using pairs of temporal V₁ᵀ and cellular U₁ activity patterns from one problem to predict either held-out activity from the same problem (solid lines) or activity from a different problem (dashed lines).
Weights were normalized by peak cross-validated cumulative weight computed on the activity from the same problem. h, To assess whether the temporal singular vectors generalized significantly better between problems in PFC than CA1, we evaluated the area between the dashed and solid lines in d for CA1 and for PFC separately, giving a measure for each region of how well the singular vectors generalized. We computed the difference in this measure between CA1 and PFC (pink line in h) and compared this difference to the null distribution obtained by permuting sessions between brain regions (gray histogram; black line shows the 95th percentile of distribution). Temporal singular vectors generalized equally well between problems in the two regions. i, Cellular singular vectors generalized significantly better between problems in PFC than CA1. Computed as in h but using the solid and dashed lines from f. j, Pairs of cellular and temporal singular vectors generalized significantly better between problems in PFC than CA1. Computed as in h but using the solid and dashed lines from g. a.u., arbitrary units.
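5. Generalization of policy representations
To maximize reward, subjects need to integrate their history of choices and outcomes to guide decisions, and a representation of this policy should be independent of the sensorimotor details of any particular problem. A logistic regression on choices, outcomes and their interaction over the previous 12 trials was used to estimate each subject's policy; linear regressions predicting neural activity from this policy estimate then showed that policy representations generalized across problems more strongly in PFC than in CA1.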
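One way to sketch the policy-generalization test, in the spirit of Fig. 6c,d at a single timepoint, is to regress each neuron's firing on current-trial events plus a policy regressor separately in two problems, then correlate the neurons' policy coefficients across problems: a positive correlation means the same neurons code policy with the same sign in both. Here the per-trial policy is a random stand-in for the estimate from the behavioral regression, the split by A/B choice is omitted, and all names and simulated data are assumptions.

```python
import numpy as np

def policy_coefficients(rates, choice, outcome, policy):
    """Per-neuron OLS: rate ~ 1 + choice + outcome + policy; returns policy betas.
    rates: (n_trials, n_neurons) firing rates at one timepoint in the trial."""
    X = np.column_stack([np.ones_like(policy), choice, outcome, policy])
    beta, *_ = np.linalg.lstsq(X, rates, rcond=None)
    return beta[3]  # row 3 = policy regressor

def simulate_policy_problem(policy_weights, n_trials, rng):
    """Hypothetical problem: rates = policy * weight + choice effect + noise."""
    choice = rng.choice([-1.0, 1.0], size=n_trials)
    outcome = rng.choice([-1.0, 1.0], size=n_trials)
    policy = rng.normal(size=n_trials)  # stand-in for the behavioral policy estimate
    rates = (np.outer(policy, policy_weights) + 0.5 * choice[:, None]
             + rng.normal(size=(n_trials, len(policy_weights))))
    return rates, choice, outcome, policy

rng = np.random.default_rng(0)
n_neurons, n_trials = 100, 400
w_p1 = rng.normal(size=n_neurons)  # policy tuning in problem 1
corr = {}
for label, w_p2 in {"general": w_p1, "specific": rng.normal(size=n_neurons)}.items():
    b1 = policy_coefficients(*simulate_policy_problem(w_p1, n_trials, rng))
    b2 = policy_coefficients(*simulate_policy_problem(w_p2, n_trials, rng))
    corr[label] = np.corrcoef(b1, b2)[0, 1]  # cross-problem similarity of policy code
```

Repeating this for every pair of timepoints gives the correlation matrices of Fig. 6c, and summing their diagonals gives the per-region measure compared in Fig. 6d.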

Fig. 6 Policy generalization in PFC and CA1. a, Weights from logistic regression predicting choices in recording sessions using choices, rewards and choice × reward interactions over the previous 12 trials as predictors. The effect of choice × outcome interaction history was significantly above zero on up to 11 trials back (one-sided t-test, P < 0.05) except for the 7th trial (t(6) = 1.99, P = 0.094). Error bars report the mean ± s.e.m. across mice. b, CPDs from regression models predicting neural activity using current trial events, subjects’ policy (estimated using the behavioral regression in a) and policy interacted with current choice. Stars denote the timepoints at which each regressor explained significantly more variance than expected by chance (permutation test based on rolling firing rates with respect to trials, P < 0.001, corrected for multiple comparisons; for more details on permutation tests, see the ‘Statistical significance’ section). c, Correlations across problems between policy weights in regressions predicting neural activity. Regressions were run separately for A (left panels) and B (right panels) choices in each problem and at each timepoint across the trial. Correlations of policy representations between all problem pairs were evaluated for each pair of timepoints; values on the diagonal show how correlated policy representation was at the same timepoint in both problems. Positive correlation indicates that the same neurons coded policy with the same sign in both problems. d, To quantify whether policy generalized more strongly between problems in PFC than CA1, we computed the between-region difference in the sum along the diagonal of the correlation matrices in c, separately for A and B choices, and compared it against the null distribution obtained by permuting sessions between brain regions. Policy representation on both A and B choices generalized more strongly in PFC than CA1.
e, Slices through the correlation matrices at initiation (left), choice (center) and outcome (right) times for A (solid) and B (dashed line) choices. Significant differences between conditions are indicated by stars as shown in the legend.
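Conclusion
The prefrontal cortex and the hippocampus may play complementary roles in generalizing knowledge across problems: the prefrontal cortex extracts representations of the structure shared by similar problems, while the hippocampus maps this structure onto the specific current context.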