PlantCAZyme

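PlantCAZyme is a database built on dbCAN (a database for automated carbohydrate-active enzyme annotation). It provides the plant carbohydrate and bioenergy research communities with pre-computed sequence and annotation data for carbohydrate-active enzymes (CAZymes). The current version contains data for 43,790 CAZymes from 159 protein families. These come from 35 plants with fully sequenced genomes, including angiosperms, gymnosperms, a lycophyte and a bryophyte moss, and from chlorophyte algae. Useful features of the database include: (i) BLAST and HMMER servers that let users search against our pre-computed sequence data for annotation purposes, (ii) a download page for batch downloading data for a specific CAZyme family or species, and (iii) protein browse pages that provide easy access to the most comprehensive sequence and annotation data.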

Introduction
Lignocellulosic biofuels have received great attention in the past decade for obvious economic and environmental reasons [1]. Rather than using starch-based plant materials as feedstock, lignocellulosic biofuels use inedible plant biomass. That biomass, however, is highly recalcitrant to degradation into fermentable sugars. The bioenergy research community therefore has a major interest in genetically modifying plants to develop low-cost biofuels [2]. To achieve this goal, researchers need to know which genes should be modified to obtain plants with lower recalcitrance to enzymatic degradation. Biomass-related enzyme databases are therefore much needed to promote the development of transgenic biofuel crops [3]. Carbohydrate-active enzymes (CAZymes) are responsible for the synthesis, degradation and modification of storage and structural biomass polysaccharides [4]. They are thus the most important enzymes for bioenergy research. CAZymes are found not only in plants and bacteria but also in fungi and animals. Across these organisms they synthesize, degrade and modify all the glycoconjugates in nature, including glycoproteins and glycolipids. They are therefore also fundamentally important for carbohydrate and glycobiology research in general [4].
CAZymes are present in all kingdoms of life and are particularly abundant in plants [5]. Since 1998, the CAZyme database, known as CAZy, has collected experimentally (biochemically, genetically and structurally) characterized CAZymes. It classifies them into protein families based on sequence homology. As of May 2013 it had created 330 families in six classes: GHs (glycoside hydrolases), GTs (glycosyltransferases), CEs (carbohydrate esterases), PLs (polysaccharide lyases), AAs (auxiliary activities) and CBMs (carbohydrate-binding modules) [6]. Each family was then populated with homologs from the GenBank, UniProt and PDB databases. This used BLAST and protein domain/motif searches together with expert manual inspection of sequence alignments [4, 7]. Owing to its original classification scheme and high-quality manual curation, CAZy is an extremely useful resource and has been widely accepted by the carbohydrate research community.
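The six-class scheme also determines how family labels are written: a class abbreviation followed by a family number, as in GH5 or CBM20. The short Python sketch below splits such a label and looks up its class. It is only an illustration and is not part of CAZy. The names `CAZY_CLASSES`, `parse_family` and `family_class` are hypothetical. Labels outside this simple pattern, such as subfamily labels, are rejected rather than handled.

```python
import re

# The six CAZy classes named in the text, keyed by the abbreviation that
# prefixes each family label (illustrative mapping for this sketch).
CAZY_CLASSES = {
    "GH": "glycoside hydrolases",
    "GT": "glycosyltransferases",
    "CE": "carbohydrate esterases",
    "PL": "polysaccharide lyases",
    "AA": "auxiliary activities",
    "CBM": "carbohydrate-binding modules",
}

# A class abbreviation followed by a family number, e.g. "GH5" or "CBM20".
_FAMILY_RE = re.compile(r"^(GH|GT|CE|PL|AA|CBM)(\d+)$")


def parse_family(label: str) -> tuple[str, int]:
    """Split a plain family label such as 'GH5' into ('GH', 5)."""
    match = _FAMILY_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a plain CAZy family label: {label!r}")
    return match.group(1), int(match.group(2))


def family_class(label: str) -> str:
    """Return the class name for a family label, e.g. 'CBM20'."""
    prefix, _ = parse_family(label)
    return CAZY_CLASSES[prefix]


if __name__ == "__main__":
    for label in ("GH5", "GT2", "CBM20"):
        print(label, "->", family_class(label))
```

Keeping the class table as a plain dictionary means a new class only requires one more entry and one more alternative in the pattern.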

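Thousands of plant and microbial genomes and metagenomes have now been completely sequenced. As a result, a huge demand for automated CAZyme annotation has emerged in the past few years. However, the CAZy database does not provide automated CAZyme annotation. To meet this need, in 2012 we developed a web server named dbCAN that lets users submit newly sequenced genomes for automated CAZyme annotation [8]. The web server is backed by hidden Markov models (HMMs) for the 330 CAZyme families. Each HMM represents a sequence alignment of a family's conserved signature region, retrieved from annotated CAZyme protein sequences in the CAZy database. Since its publication, the dbCAN website has received thousands of visits from many countries, indicating its impact on CAZyme research.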
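The Python sketch below makes the HMM-based approach more concrete by showing one plausible post-processing step. Proteins can be scanned locally against a set of family HMMs with HMMER3's `hmmscan`, which writes a per-domain table when given `--domtblout`. The sketch reads that table and keeps only hits that pass an E-value cut-off and a minimum HMM coverage. It is a hypothetical illustration, not dbCAN's own code. The helper names, the sample rows and the default thresholds (`long_ali`, `evalue_long`, `evalue_short`, `min_coverage`) are assumptions chosen for the example. Consult the dbCAN documentation for the filtering it actually recommends.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class DomainHit:
    family: str      # target name: the family HMM that matched
    protein: str     # query name: the protein being annotated
    hmm_len: int     # tlen: length of the family HMM
    i_evalue: float  # independent E-value of this domain
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int

    @property
    def hmm_coverage(self) -> float:
        # Fraction of the family HMM spanned by the alignment.
        return (self.hmm_to - self.hmm_from + 1) / self.hmm_len

    @property
    def ali_len(self) -> int:
        # Length of the aligned region on the protein.
        return self.ali_to - self.ali_from + 1


def parse_domtblout(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Parse rows of a HMMER3 `hmmscan --domtblout` table."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 22 whitespace-separated fields, then a free-text description
        # that may itself contain spaces, hence maxsplit=22.
        f = line.split(maxsplit=22)
        yield DomainHit(
            family=f[0], protein=f[3], hmm_len=int(f[2]),
            i_evalue=float(f[12]),
            hmm_from=int(f[15]), hmm_to=int(f[16]),
            ali_from=int(f[17]), ali_to=int(f[18]),
        )


def annotate(lines: Iterable[str], long_ali: int = 80,
             evalue_long: float = 1e-5, evalue_short: float = 1e-3,
             min_coverage: float = 0.3) -> dict[str, list[str]]:
    """Map each protein to the families whose hits pass the filter.

    All default thresholds are illustrative assumptions.
    """
    families: dict[str, set[str]] = {}
    for hit in parse_domtblout(lines):
        # Longer alignments must meet the stricter E-value cut-off.
        cutoff = evalue_long if hit.ali_len > long_ali else evalue_short
        if hit.i_evalue < cutoff and hit.hmm_coverage > min_coverage:
            families.setdefault(hit.protein, set()).add(hit.family)
    return {p: sorted(fams) for p, fams in families.items()}


# Hypothetical rows in domtblout column order (not real PlantCAZyme data).
SAMPLE = """\
# target query ... (header lines start with '#')
GH5 - 270 prot1 - 400 1e-30 100.0 0.1 1 1 1e-32 1e-31 99.0 0.1 10 260 20 280 15 285 0.95 -
GT2 - 200 prot1 - 400 0.5 5.0 0.1 1 1 0.01 0.1 4.0 0.1 50 90 100 140 95 145 0.70 weak partial hit
CBM20 - 95 prot2 - 300 1e-6 30.0 0.0 1 1 1e-7 1e-6 29.0 0.0 5 90 200 270 198 272 0.90 -
"""

if __name__ == "__main__":
    print(annotate(SAMPLE.splitlines()))
```

The filter applies a stricter E-value cut-off to longer alignments and also requires a minimum coverage of the family HMM. Together these suppress short, weak matches to a family's signature region. In the sample, the GT2 row is discarded because its short alignment misses the E-value cut-off.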

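The availability of the 330 CAZyme HMMs also makes it possible to build a dedicated database for plant CAZymes. Among similar resources, the CAZy database covers only two (Arabidopsis and rice) of the more than 40 sequenced plant and algal genomes. It excludes all sequenced bioenergy crops (e.g. poplar, switchgrass, sorghum) and evolutionarily important organisms (e.g. moss, spikemoss, algae). Two other databases, pDAWG [9] and Rice GT [10], are limited to a small number of CAZyme families and genomes. Several other databases, such as the Cell Wall Genomics database [11] and the Cell Wall Navigator database [12], include only a handful of CAZyme families. PlantCAZyme is therefore a timely and significant addition to the toolbox for plant carbohydrate and bioenergy research.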

