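For academic exchange only; reproduction and commercial use are strictly prohibited. The original content comes from the official 10x Genomics support documentation.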
Analyzing Differential Accessibility with the Peak Viewer
NOTE: This section presumes you've created cell type clusters in the Identifying Cell Types tutorial section. You can also follow along approximately by opening the ATAC Tutorial dataset and choosing the K-Medoids (LSA) Cluster with K=4.
Loading the Peak Viewer
Let's use the Peak Viewer to look at differential accessibility between cell subtypes. Select the Peak Viewer (area chart) icon at the bottom left of the Loupe Cell Browser workspace. By default, you will see the peak distribution for each cluster over the first five detected peaks. The peak distribution chart shows the location of called peaks in the genome, their widths, and the proportion of cells in each cluster that had an accessible region within one of those peaks.
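The proportion shown for each cluster is a simple ratio: cells in the cluster with an accessible region inside the peak, divided by all cells in the cluster. The Python sketch below is illustrative only. It assumes you already know which barcodes have at least one fragment inside a given peak (Loupe computes this internally), and the barcodes and cluster names are made up.

```python
from collections import defaultdict

def peak_fraction_by_cluster(accessible_barcodes, cluster_of):
    """Fraction of cells in each cluster that are accessible within one peak.

    accessible_barcodes: set of barcodes with at least one fragment in the peak
        (hypothetical input; Loupe derives this from its own data).
    cluster_of: dict mapping every cell barcode to its cluster label.
    """
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for barcode, cluster in cluster_of.items():
        totals[cluster] += 1
        if barcode in accessible_barcodes:
            open_counts[cluster] += 1
    return {cluster: open_counts[cluster] / totals[cluster] for cluster in totals}

# Made-up barcodes and labels for illustration.
clusters = {
    "AAAC-1": "Monocytes", "AAAG-1": "Monocytes", "AAAT-1": "Monocytes",
    "AACT-1": "B Cells", "AAGT-1": "T Cells",
}
print(peak_fraction_by_cluster({"AAAC-1", "AAAG-1", "AAGT-1"}, clusters))
```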
For the ATAC Tutorial, the region spanned by the first five peaks isn't very interesting, so let's focus on a marker gene. At the top left of the Peak Viewer is a selector that lets you jump to a genomic region (e.g., "chr1:10244-753061") or a gene. Choose the Gene option from the selector, then enter "CD33" in the adjacent input field. You should end up with a plot that looks like this:
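The region syntax used by the selector, chrom:start-end, is also easy to handle programmatically if you keep region lists outside Loupe. The parser below is a hypothetical helper, not part of Loupe Cell Browser. Accepting thousands separators such as "chr1:10,244-753,061" is a convenience of this sketch only; whether the Loupe selector accepts them is not covered here.

```python
import re

# chrom, then ':', then start and end separated by '-'; commas allowed in numbers.
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")

def parse_region(text):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a genomic region: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if end < start:
        raise ValueError(f"end precedes start in {text!r}")
    return match["chrom"], start, end

print(parse_region("chr1:10244-753061"))  # ('chr1', 10244, 753061)
```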
You can see a few things here. First, by navigating to a single gene, the visible genomic region is now small enough to show gene and transcript annotations. Gene boundaries appear as dashed lines, transcripts as solid lines, exons as solid rectangles, and UTRs as open rectangles. (NOTE: Only one transcript is shown per gene; for an explanation as to why, consult Peak Viewer Details.) The CD33 gene boundaries are clearly visible in the center of the peak viewer.
Next, you can see the difference in accessibility between the clusters. Consider the peak at the beginning of the CD33 transcript: a clearly higher percentage of monocytes had open chromatin at that peak than cells in the other groups. Hovering the mouse over the peak shows the precise percentage and other information about it. Clicking a peak also highlights the cells with open chromatin at that region in the barcode plot, as shown below.
Viewing Cut Sites
The default view shows the relative accessibility of the genome at an aggregate level, but you can inspect accessibility at a finer resolution by loading the fragments.tsv.gz file generated by the Cell Ranger ATAC pipeline. Fragment files can be loaded from the file system or from a URL. To load the ATAC Tutorial fragments file, click the folder icon in the Peak Viewer menu bar and paste https://s3-us-west-2.amazonaws.com/10x.files/supp/cell-exp/ATACTutorial-fragments.tsv.gz into the input field. After a few seconds, you should see a plot similar to the one below:
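If you want to inspect a fragments file outside Loupe, the sketch below reads it line by line and emits both ends of every fragment as cut sites. It assumes the five-column layout that Cell Ranger ATAC writes (chrom, start, end, barcode, read count), treats lines starting with # as header comments, and uses the start and end coordinates directly as the two Tn5 insertion points; that off-by-one convention is an assumption. The demo uses a tiny in-memory file instead of the tutorial download.

```python
import gzip
import io

def read_cut_sites(handle):
    """Yield (chrom, position, barcode) for both ends of every fragment.

    Assumes tab-separated columns: chrom, start, end, barcode, read count.
    """
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        yield chrom, int(start), barcode
        yield chrom, int(end), barcode

def open_fragments(path):
    """Open a gzip/bgzip-compressed fragments file in text mode."""
    return gzip.open(path, "rt")

# Tiny in-memory stand-in for a fragments.tsv.gz file.
demo = io.StringIO("# header\nchr1\t100\t250\tAAAC-1\t2\nchr1\t180\t300\tAACT-1\t1\n")
print(list(read_cut_sites(demo)))
```

For a downloaded file, the same generator works on the decompressed stream, e.g. `with open_fragments("fragments.tsv.gz") as fh: sites = read_cut_sites(fh)`, which avoids loading the whole file into memory.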
With the fragments file loaded, you can see cut site tracks for each cell type. Using the information stored in the fragments file, cut site tracks approximate chromatin accessibility at each position in the genome for every cluster. The highest peaks mark local maxima in detected fragments, that is, regions that were more generally accessible within that cell type. You can now clearly see a peak in the monocytes around the CD33 transcription start site, with a smaller secondary peak slightly downstream.
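Conceptually, building a cut site track for each cluster starts with tallying cut sites per genomic position, separately for each cluster's barcodes. The sketch below shows only that grouping step. It takes (chrom, position, barcode) tuples such as those produced by parsing a fragments file; any normalization Loupe applies (for example, for cluster size) is not reproduced, and the barcodes and labels are made up.

```python
from collections import Counter, defaultdict

def cut_counts_by_cluster(cut_sites, cluster_of):
    """Count cut sites per (chrom, position) separately for each cluster.

    cut_sites: iterable of (chrom, position, barcode) tuples.
    cluster_of: dict barcode -> cluster label; barcodes not in it are skipped.
    """
    counts = defaultdict(Counter)
    for chrom, pos, barcode in cut_sites:
        cluster = cluster_of.get(barcode)
        if cluster is not None:
            counts[cluster][(chrom, pos)] += 1
    return counts

sites = [("chr1", 100, "AAAC-1"), ("chr1", 250, "AAAC-1"),
         ("chr1", 100, "AAAG-1"), ("chr1", 180, "AACT-1"), ("chr1", 90, "ZZZZ-1")]
clusters = {"AAAC-1": "Monocytes", "AAAG-1": "Monocytes", "AACT-1": "B Cells"}
tracks = cut_counts_by_cluster(sites, clusters)
print(dict(tracks["Monocytes"]))  # {('chr1', 100): 2, ('chr1', 250): 1}
```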
For more information about how cut site track values are computed, see Understanding Cut Site Tracks.
By default, cut site tracks are computed by taking 400-base-wide rolling window sums over all cut sites in a cluster. You can control the window size and other display options by clicking the Peak Viewer Options icon to the right of the folder icon. Narrower windows may reveal more detailed accessibility patterns but can be noisier. You can also change the appearance of the Peak Viewer through the Peak Viewer Options menu.
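The rolling window sum described above can be sketched with prefix sums. This is an illustration under stated assumptions: it centers the window on each base, covering positions from pos - window//2 up to (but not including) pos - window//2 + window, whereas Loupe's exact window alignment and any scaling are described in Understanding Cut Site Tracks, not here. The window parameter plays the role of the window size option.

```python
def rolling_window_sum(cut_counts, start, end, window=400):
    """Sum cut sites in a window around each base in [start, end).

    cut_counts: dict position -> cut count for one chromosome and one cluster.
    Returns one value per base, from start to end - 1.
    """
    half = window // 2
    lo, hi = start - half, end + half
    # prefix[i] holds the total cuts at positions lo .. lo + i - 1,
    # so any window sum is a difference of two prefix entries.
    prefix = [0]
    for pos in range(lo, hi):
        prefix.append(prefix[-1] + cut_counts.get(pos, 0))
    track = []
    for pos in range(start, end):
        left = pos - half - lo
        track.append(prefix[left + window] - prefix[left])
    return track

print(rolling_window_sum({100: 2, 105: 1, 112: 4}, 100, 110, window=10))
# [2, 3, 3, 3, 3, 3, 1, 1, 5, 5]
```

Shrinking the window from 400 toward a few dozen bases makes the track follow individual cut sites more closely, which is why narrower windows look noisier.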
For a more exhaustive list of features and detailed explanations about the data behind the Peak Viewer, consult the Peak Viewer Details page.
Next Steps
Now that we have created cell type clusters and analyzed differential accessibility with the peak viewer, let's look for more subtle differences in our data, by looking for Significant Features.
Identifying Significant Features
NOTE: This section presumes you've created cell type clusters in the Identifying Cell Types tutorial section. You can follow along approximately by opening the ATAC Tutorial dataset and choosing the K-Medoids (LSA) Cluster with K=4.
Finding T Cell Subgroups
Let's return to the barcode plot and look at the region of cells with T cell markers. Dimensionality reduction via LSA and subsequent clustering split the T cells into several distinct groups, as seen on the t-SNE plot. What might be driving the differences between those subgroups? We can use the Significant Features tool to find out.
First, let's create a second T cell group by using the lasso tool to highlight the rightmost cluster of T cells and label it "T Cells 2", as shown below:
To get a hint of what distinguishes these clusters, we can use the Significant Features tool, located below the active cluster list. With this tool, you can compute distinguishing motifs, individual peaks, or promoter sums between the currently selected clusters. The tool computes significant features in one of two ways, selectable via the Significant Feature Comparison selector:
- Globally Distinguishing: For each checked cluster, compute the features that have the most different accessibility compared to the cells in all other clusters. Measures unique accessibility patterns for each cluster in the dataset.
- Locally Distinguishing: For each checked cluster, compute the features that have the most different accessibility compared to the cells only in the other checked clusters. Allows for comparison between a subset of groups.
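The difference between the two comparison modes comes down to which cells form the background for each checked cluster. The hypothetical helper below builds both sets. It is a sketch of the selection logic only, not Loupe's implementation, and the mode strings "global" and "local" are invented for the example.

```python
def comparison_groups(cluster_of, checked, mode):
    """Return {cluster: (member_barcodes, background_barcodes)} per checked cluster.

    mode="global": background is every cell outside the cluster.
    mode="local": background is only cells in the other checked clusters.
    """
    if mode not in ("global", "local"):
        raise ValueError("mode must be 'global' or 'local'")
    groups = {}
    for target in checked:
        members = {b for b, c in cluster_of.items() if c == target}
        if mode == "global":
            background = {b for b, c in cluster_of.items() if c != target}
        else:
            background = {b for b, c in cluster_of.items()
                          if c != target and c in checked}
        groups[target] = (members, background)
    return groups

cells = {"a": "T Cells", "b": "T Cells 2", "c": "B Cells", "d": "Monocytes"}
print(comparison_groups(cells, ["T Cells", "T Cells 2"], "local")["T Cells"])
```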
The Feature Type selector lets you choose between the available feature types. You can only look at one feature type at a time, because the sSEQ significant feature calculation relies on a common denominator of feature counts, and the total counts across the dataset differ between peaks, promoter sums, and motifs.
Let's start with a sanity check by finding distinguishing promoter sums for each cell type. First, make the Feature Table visible by clicking the list icon on the left of the bottom panel. Next, with the Cell Types category visible in Categories mode, choose Globally Distinguishing from the comparison selector and Promoter Sum as the Feature Type. Press the calculator icon and wait for Loupe Cell Browser to compute significantly enriched promoter sums. This should take around 30 seconds, depending on your machine's performance. The calculation takes longer for datasets with more cells or greater sequencing depth, and longer still when finding significant peaks.
Once the calculation completes, you will see the most significantly enriched promoter sums for the B cell group, with accompanying log2 fold change against the other cells, sorted by p-value:
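To see why a shared denominator matters, here is a simplified log2 fold change for one feature: counts inside and outside the group are each divided by the total counts of the same feature type for those cells. This is not the sSEQ method, which also models dispersion and produces p-values; the pooled-rate formula and the pseudocount are assumptions made for illustration, and the counts are made up.

```python
import math

def log2_fold_change(feature_counts, type_totals, in_group, out_group, pseudo=1.0):
    """log2 ratio of a feature's normalized count rate inside vs. outside a group.

    feature_counts: barcode -> count for ONE feature (e.g. one promoter sum).
    type_totals: barcode -> total counts over all features of the SAME type,
        used as the shared denominator for normalization.
    pseudo: pseudocount that keeps the ratio finite when counts are zero.
    """
    def rate(barcodes):
        hits = sum(feature_counts.get(b, 0) for b in barcodes)
        depth = sum(type_totals[b] for b in barcodes)
        return (hits + pseudo) / (depth + pseudo)

    return math.log2(rate(in_group) / rate(out_group))

counts = {"a": 9, "b": 7, "c": 1, "d": 0}
totals = {"a": 100, "b": 100, "c": 100, "d": 100}
print(round(log2_fold_change(counts, totals, {"a", "b"}, {"c", "d"}), 3))
```

Mixing feature types would put, say, motif counts in the numerator and peak totals in the denominator, which is why the comparison is run on one type at a time.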
By default, the table shows the most enriched motifs, peaks, or promoter sums relative to the comparison cells; this is configurable through the Feature Table Options menu on the feature table's menu bar.
Not surprisingly, the most significantly enriched promoter sum is MS4A1 (CD20), which we used to identify the B cell cluster. You can click the cell type column headers to see the feature lists for the other cell types. Once a cell type is selected, clicking its column header again sorts the features by log2 fold change.
Next, let's explore the differences between the two T cell clusters. In the Categories panel, uncheck the B Cells and Monocytes clusters, and then select Locally Distinguishing from the comparison selector. Let's also choose Motif as the desired Feature Type, and compute the significant motifs.
After computation, the feature table now shows the motifs that are most differentially accessible between the two T cell groups. To verify the differences in accessibility, you can highlight the significant motifs in the barcode plot. First, click the Split View button at the bottom of the toolbar, and select "Current Category (Cell Types)". Split View segments the cells in the barcode plot into distinct regions by cluster, even if the cells overlap in the parent t-SNE plot. Next, click on "BATF::JUN" in the feature table. Clicking on a feature in the table will allow you to add it to a feature list, copy the name to the clipboard, or to view the accessibility of the feature in the barcode plot. Click "Set as Active Feature" to see the BATF::JUN motif z-score in the barcode plot:
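A motif z-score expresses how a cell's accessibility at motif-containing peaks compares with other cells. The sketch below is a deliberately simple stand-in: it standardizes a hypothetical per-cell motif score across all cells. Loupe's own motif scores may be computed differently (for example, with background corrections), so treat this only as an illustration of what negative (bluer) and positive values mean. All scores and labels are made up.

```python
import statistics

def motif_z_scores(motif_score):
    """Standardize a per-cell motif score across all cells.

    motif_score: barcode -> hypothetical per-cell score, e.g. the fraction of
        the cell's fragments that fall in peaks containing the motif.
    """
    values = list(motif_score.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return {b: 0.0 for b in motif_score}  # no variation across cells
    return {b: (v - mean) / sd for b, v in motif_score.items()}

scores = {"AAAC-1": 0.20, "AAAG-1": 0.25, "AACT-1": 0.40, "AAGT-1": 0.45}
groups = {"AAAC-1": "T Cell 2", "AAAG-1": "T Cell 2", "AACT-1": "T Cell", "AAGT-1": "T Cell"}
z = motif_z_scores(scores)
for group in ("T Cell", "T Cell 2"):
    members = [z[b] for b, g in groups.items() if g == group]
    print(group, round(statistics.fmean(members), 2))
```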
The cells in the T Cell 2 group are bluer, indicating that BATF-associated peaks are, on average, less accessible in that group than in the T Cell group. BATF accessibility has been shown to increase with cell differentiation and age [1], so this may indicate that the T Cell 2 group consists of younger, or more naive, T cells. Note also that the T Cell 2 group contains both cells with accessible CD8A promoter regions and cells with accessible CD4 promoter regions, indicating a more diverse mix than a clean split between helper and cytotoxic T cells would explain.
Finally, let's look for significant peaks between the two T cell clusters. Return to the Categories mode, and change the Feature Type selector to Peaks. Press the Calculator icon, and wait for a while for the computation to finish.
Let's add a few significant peaks in the primary T cell cluster to a feature list. Click on the first five significantly enriched peaks, one at a time, and add them to a new "T Cell 1 Drivers" list. To create the new list, type "T Cell 1 Drivers" in the Add to Feature List input field, and select the create new list option. With the list selected, click on the plus icon to add the features to the list.
We can now learn more about these individual peaks. With the T Cell 1 Drivers list selected in Accessibility mode, hover over a peak, and click on the Peak Viewer icon that appears next to the feature. Let's select the first peak in the list, "chr1:159046026-159047751":
This will highlight the peak in the peak viewer. Zooming out in the peak viewer should show additional context. You can see from the gene annotation track that the peak is located right at the transcription start site for the AIM2 gene.
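Checking whether a significant peak sits at a transcription start site is a simple interval test. The sketch below uses invented gene names and coordinates (GENE_A and the others are placeholders, not real annotations); in practice the TSS positions would come from the reference annotation used by Cell Ranger ATAC.

```python
def peak_near_tss(peak, tss_by_gene, max_distance=0):
    """Return (gene, distance) pairs whose TSS lies within max_distance of a peak.

    peak: (chrom, start, end); tss_by_gene: gene -> (chrom, tss_position).
    A TSS inside the peak has distance 0. Results are sorted by distance.
    """
    chrom, start, end = peak
    hits = []
    for gene, (tss_chrom, tss) in tss_by_gene.items():
        if tss_chrom != chrom:
            continue
        if start <= tss <= end:
            distance = 0
        else:
            distance = min(abs(tss - start), abs(tss - end))
        if distance <= max_distance:
            hits.append((gene, distance))
    return sorted(hits, key=lambda item: item[1])

# Hypothetical annotation for illustration only.
tss = {"GENE_A": ("chr1", 1500), "GENE_B": ("chr1", 2600), "GENE_C": ("chr2", 1500)}
print(peak_near_tss(("chr1", 1000, 2000), tss))                      # [('GENE_A', 0)]
print(peak_near_tss(("chr1", 1000, 2000), tss, max_distance=1000))  # adds ('GENE_B', 600)
```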
In this manner, you should be able to start teasing out additional information about the different cell groups in your data. Once you've analyzed your dataset, saved lists of significant features, and identified cell types, you can share your findings in a variety of ways with collaborators.
Let's move on to Sharing to find out how.
[1] https://www.ncbi.nlm.nih.gov/pubmed/28439570
Sharing Results
Saving .cloupe Files
You can save your dataset workspace, including any custom clusters and feature lists you've created, by clicking the Save (disk) icon on the toolbar. If you prefer to create a new version of a .cloupe file, choose "Save As" from the File menu, or press Ctrl-Shift-S (Windows) or Command-Shift-S (Mac). You'll be prompted to create a new .cloupe file somewhere in your file system.
Loupe Cell Browser files are self-contained, so if you want to share a dataset with a colleague, you can simply send them the .cloupe file.
Importing and Exporting Categories
There are many scripts and packages in the single-cell analysis ecosystem, and you may want to move cluster labels into and out of Loupe Cell Browser. This is easy to do. With Categories mode selected, click the action button to the right of the active category name and select one of the import/export options:
Cluster labels are stored in CSV format. When importing cluster label CSVs into Loupe Cell Browser, the barcodes must be in the first column, the first row must be a column header, the barcode column header must be named barcode, and the barcode values must match at least a subset of the barcodes in the .cloupe file. To get a sense of the CSV structure, you can export one of the pre-defined categories in your dataset and view it with a text editor or spreadsheet tool.
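The import rules above can be checked before you load a file. The helpers below are hypothetical, not part of Loupe: one writes a two-column CSV with barcode as the first header, and the other checks the header and counts how many barcodes match a known set. The second column name, Cell Types, and the barcodes are examples only.

```python
import csv
import io

def write_cluster_csv(handle, labels, category="Cell Types"):
    """Write barcode -> label rows under a header whose first column is 'barcode'."""
    writer = csv.writer(handle)
    writer.writerow(["barcode", category])
    for barcode, label in labels.items():
        writer.writerow([barcode, label])

def check_cluster_csv(handle, known_barcodes):
    """Check the header rule and return how many barcodes are in known_barcodes."""
    rows = list(csv.reader(handle))
    if not rows or not rows[0] or rows[0][0] != "barcode":
        raise ValueError("first row must be a header whose first column is 'barcode'")
    matched = sum(1 for row in rows[1:] if row and row[0] in known_barcodes)
    if matched == 0:
        raise ValueError("no barcodes match the .cloupe file")
    return matched

buffer = io.StringIO()
write_cluster_csv(buffer, {"AAAC-1": "T Cells", "AACT-1": "B Cells"})
buffer.seek(0)
print(check_cluster_csv(buffer, {"AAAC-1", "AACT-1", "GGGG-1"}))  # 2
```

When writing a real file, open it with `open(path, "w", newline="")` so the csv module controls line endings.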
Exporting Data and Graphics
There are a variety of mechanisms to export data and graphics from your ATAC dataset. The toolbar, categories list, feature list, feature table and peak viewer all have export functionality. Let's cover them one by one:
Exporting Barcode Plots
Clicking the camera icon in the toolbar exports the currently displayed barcode plot in either PNG or SVG vector format. The SVG export will include axes if Feature Plot view is active.
Exporting Feature Lists
With Accessibility mode selected, you can choose to import a set of feature lists, or export the currently loaded set of feature lists. Click on the action button to the right of the feature list selector to do so:
Lists of motifs and promoter sums should be applicable across multiple datasets, but lists of peaks will likely not translate between datasets, since peaks are called separately for each dataset.
Exporting Significant Features
To export the tabular list of significant features, including p-values and log2 fold changes, click the Export icon above the feature table. This saves the current table in CSV format. You can use the Feature Table Options menu to choose whether to export enriched or depleted features, how many features to export, and whether to filter by average accessibility across the dataset.
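If you post-process the exported table, the options described above correspond to simple filters. This sketch is hypothetical: the column names (feature, log2_fold_change, p_value, mean) are placeholders and are not guaranteed to match the headers Loupe writes, and the rows are made up.

```python
import csv
import io

FIELDS = ["feature", "log2_fold_change", "p_value", "mean"]

def export_significant(rows, handle, direction="enriched", top_n=25, min_mean=0.0):
    """Filter rows by direction and mean accessibility, sort by p-value, write CSV.

    rows: list of dicts keyed by FIELDS. Returns the number of rows written.
    """
    if direction == "enriched":
        keep = [r for r in rows if r["log2_fold_change"] > 0]
    elif direction == "depleted":
        keep = [r for r in rows if r["log2_fold_change"] < 0]
    else:
        raise ValueError("direction must be 'enriched' or 'depleted'")
    keep = [r for r in keep if r["mean"] >= min_mean]
    keep.sort(key=lambda r: r["p_value"])
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(keep[:top_n])
    return len(keep[:top_n])

rows = [
    {"feature": "peak_a", "log2_fold_change": 2.0, "p_value": 0.01, "mean": 0.5},
    {"feature": "peak_b", "log2_fold_change": 1.0, "p_value": 0.001, "mean": 0.05},
    {"feature": "peak_c", "log2_fold_change": -1.5, "p_value": 0.002, "mean": 0.4},
]
out = io.StringIO()
print(export_significant(rows, out, "enriched"))
print(out.getvalue())
```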
Exporting Peak Viewer Plots
Finally, with the Peak Viewer active, you can click the Export button to export the currently visible peaks into a CSV file, visible cut sites into a multi-track .bedgraph file, or the peak viewer plot into PNG or SVG format.
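A multi-track bedGraph file is plain text: each track begins with a track line, followed by chrom, start, end, and value columns using 0-based, half-open coordinates. The writer below is a sketch of that general format, not a description of Loupe's exact export. Merging runs of equal values and skipping zero-valued intervals are choices made here to keep the file small, and the track values are made up.

```python
import io

def write_multitrack_bedgraph(handle, chrom, start, tracks):
    """Write one bedGraph track per cluster, merging runs of equal values.

    tracks: cluster name -> list of per-base values beginning at `start`
        (0-based). Each interval is written half-open: [run start, run end).
    """
    for name, values in tracks.items():
        handle.write(f'track type=bedGraph name="{name}"\n')
        run_start = 0
        for i in range(1, len(values) + 1):
            # Close the current run at the end of the list or when the value changes.
            if i == len(values) or values[i] != values[run_start]:
                if values[run_start] != 0:
                    handle.write(f"{chrom}\t{start + run_start}\t{start + i}\t{values[run_start]}\n")
                run_start = i

buffer = io.StringIO()
write_multitrack_bedgraph(buffer, "chr1", 100, {"Monocytes": [0, 0, 3, 3, 1], "B Cells": [0, 0, 0, 0, 0]})
print(buffer.getvalue())
```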
Next Steps and Support
This concludes the Loupe Cell Browser ATAC tutorial. Now it's time to use what you've learned on your own data. Read about how to generate your own .cloupe files with the Cell Ranger ATAC pipeline.
Reporting Issues
If you encounter any errors in the program, you can send a bug report at any time by clicking "Generate Bug Report" in the Help menu. A .tar.gz file containing logs from your most recent Loupe Cell Browser session will be created, and you can send that file to support@10xgenomics.com. Please use the subject line "Loupe Cell Browser Error" for your message.
You may also submit general feedback and feature requests to support@10xgenomics.com.
We hope this tutorial has made you familiar with the capabilities of Loupe Cell Browser, and made you excited to process your own data. We hope you find it to be the easiest, fastest and most enjoyable way to interpret your single-cell ATAC data.
Closing Remarks
Thank you for reading this far. This article was translated by Cerasus_sp and proofread by 浩渺予怀.