Genomic landscape and genetic manipulation of the black soldier fly Hermetia illucens, a natural waste recycler
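On November 25, 2019, the groups of Yongping Huang (Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences) and Jibin Zhang (Huazhong Agricultural University), together with colleagues, published a research paper online in Cell Research titled "Genomic landscape and genetic manipulation of the black soldier fly Hermetia illucens, a natural waste recycler". The study reports a high-quality genome map of the black soldier fly (BSF) and, using CRISPR/Cas9 gene editing, obtained a genotype that markedly increases BSF feeding capacity, providing valuable genomic and technical resources for optimizing BSF lines for industrial use.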
Abstract
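The black soldier fly is an insect of the order Diptera, family Stratiomyidae, that converts organic matter into resources usable as animal feed. Its genome is 1.1 Gb in size and contains 16,770 protein-coding genes. Compared with other dipterans, the BSF genome shows substantial expansion of genes in functional categories related to septic adaptation (adaptation to decaying, microbe-rich environments), including immune system factors, olfactory receptors, and cytochrome P450s. The midgut transcriptome shows strong enrichment of pathways related to the digestive system and to defense against bacteria. Microbiome analysis of BSF larvae fed representative organic wastes shows that Firmicutes bacteria dominate the gut microbiota. Using CRISPR/Cas9-based technology, the authors obtained a genotype with enhanced feeding capacity.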
Data availability: NCBI under BioProjectID PRJNA547968 and SRA under SRR10158821.
Introduction
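As the human population has grown rapidly, ever more organic waste is produced. It is mainly handled in three ways: incineration, landfill, and composting, all of which cause some degree of secondary environmental pollution. The black soldier fly is regarded as the only insect accepted worldwide as a feed ingredient for aquaculture and poultry. It efficiently converts organic waste into protein, fat, and other products while reducing carbon dioxide emissions, pathogens, and antibiotic contamination. Building on advances in sequencing technology, this study combines genomics, transcriptomics, metagenomics, and genetic manipulation to explore the genetic basis of BSF biology.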
RESULTS AND DISCUSSION
Characteristics of the BSF genome
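The sequenced sample came from a line inbred for 10 generations. Illumina sequencing at ~300× depth, including paired-end libraries of short inserts and mate-pair libraries of long inserts, yielded 1102 Mb of assembled scaffolds with a 1.69 Mb N50 length. The BSF genome is large because of transposons, repetitive non-coding DNA, and other abundant repeat sequences.
Sequencing depth and GC content are each approximately normally distributed, indicating little contamination in the assembly.
A set of 16,770 protein-coding genes was obtained by combining homology alignments with six dipteran species, transcriptome data from 12 consecutive BSF developmental stages, and three ab initio predicted gene sets.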
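The scaffold N50 reported for the assembly can be illustrated with a minimal sketch. The function name and the toy lengths below are hypothetical and are not taken from the paper; the sketch only shows the usual definition (the length of the scaffold at which the cumulative sum of sorted lengths first reaches half of the total assembly size).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)   # longest scaffolds first
    half = sum(ordered) / 2                   # half of the total assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half:                   # this scaffold crosses the halfway point
            return length


# Toy scaffold lengths in bp (hypothetical values, for illustration only)
toy_scaffolds = [100, 200, 300, 400, 500]
print(n50(toy_scaffolds))  # prints 400
```

In practice the lengths would be read from the assembly FASTA index rather than typed in by hand.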
Comparison of the BSF genome with those of other dipterans
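The BSF genome fills a gap between the Nematocera, the earliest diverging suborder of Diptera, and more recent flies (Brachycera).
Analysis of nonsynonymous-to-synonymous substitution (dN/dS) ratios between BSF and the house fly and fruit fly identified 342 genes with high dN/dS ratios. These rapidly evolving genes are mainly enriched in ribosome-related functional modules involved in protein synthesis, and in pathways related to amino acid metabolism and immunity. This may be because BSF has long lived in environments rich in protein and pathogens.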
b. Identification of pathways that have rapidly evolved in BSF. dN/dS ratios were calculated independently in two parallel evolutionary lineages, M. domestica and D. melanogaster, using BSF as the common ancestor. Each dot indicates the median dN/dS ratios of all related genes in the corresponding pathway. Significantly enriched (FDR-adjusted P < 0.05), rapidly evolving genes in KEGG pathways are highlighted in red.
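The per-pathway summary described in this caption (one dot per pathway, placed at the median dN/dS of its genes) can be sketched as follows. This is only an illustration under stated assumptions: the gene names, pathway names, and ratios are made up, and the paper's actual pipeline (codeml followed by KEGG enrichment) is not reproduced here.

```python
from collections import defaultdict
from statistics import median


def pathway_medians(dnds_by_gene, pathways_by_gene):
    """Group per-gene dN/dS ratios by pathway and return the median per pathway."""
    grouped = defaultdict(list)
    for gene, ratio in dnds_by_gene.items():
        # A gene may belong to several pathways, or to none.
        for pathway in pathways_by_gene.get(gene, ()):
            grouped[pathway].append(ratio)
    return {pathway: median(ratios) for pathway, ratios in grouped.items()}


# Hypothetical toy input, not data from the study
dnds = {"g1": 0.1, "g2": 0.3, "g3": 0.5}
pathways = {"g1": ["ribosome"], "g2": ["ribosome", "immune"], "g3": ["immune"]}
print(pathway_medians(dnds, pathways))
```

Running the same function once per lineage (for example once for the M. domestica comparison and once for D. melanogaster) would give the two coordinates of each dot.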
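BSF has 1,798 species-specific duplicated genes, the most among Brachycera. These genes are expressed mainly in the final larval stage, which may relate to the feeding behavior underlying waste conversion.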
Expansions in gene families are related to BSF environmental interactions
Compared with other dipterans, BSF shows large expansions of genes for detoxification enzymes, olfactory reception, immune factors, and immune pathways, consistent with its environmental adaptation.
Fig. 3 Expansions in gene families related to BSF environmental adaptation. a Number of gene copies in the indicated families related to environmental adaptation in dipteran species. The area size of each pie indicates the relative gene number in each family. b–e Phylogenetic relationships across three dipteran species for gene families with prominent expansions in BSF: gram-negative binding proteins (b), cecropin antimicrobial peptides (c), olfactory receptors (d), cytochrome P450s (e). Phylogenetic trees were estimated using the maximum likelihood method.
Intestinal transcriptome of BSF larvae fed on organic waste
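The authors fed BSF larvae on food waste, poultry manure, dairy manure, or swine manure, and dissected midguts on days 4, 6, 8, and 12 for transcriptome analysis.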
Fig. 4 Intestinal transcriptome in BSF larvae fed with organic waste. Midguts of BSF larvae fed with food waste (FW), poultry manure (PM), dairy manure (DM), or swine manure (SM) were sampled on days 4, 6, 8, and 12 of feeding with the indicated diet. The samples were subjected to RNA-seq. a Distributions of expressed genes (n = 9417) across 16 samples: Genes expressed at each time point under each type of diet are labeled “All”; those expressed in 15 out of 16 samples are labeled “Almost all”; genes commonly expressed under each diet but not at every time point are labeled “Broad”; genes only expressed in one sample are labeled “Orphan”; genes only expressed by larvae fed with manure are labeled “Manure”; and genes only expressed in larvae fed with food waste are labeled “Waste”. b Principal component analysis of intestinal samples based on their overall expression profiles. The first two eigenvectors that explained 34.2% and 20.4% of the variance are plotted. c Venn diagram of the 500 most highly expressed genes (~5% of all expressed genes), selected for each type of diet based on the average expression values across all time points. A total of 326 genes were expressed by larvae fed all four diets. d The 326 genes expressed by larvae fed all four diets were subjected to KEGG enrichment analysis. Pathways in blue belong to digestive systems, and pathways in red indicate those related to infectious diseases. Gene counts are presented as histograms. Hypergeometric test (FDR-adjusted): *P < 0.05, ***P < 0.005, ****P < 0.001. e A representative gene cluster specific to BSF and highly expressed in larvae fed with organic waste. Genomic organization in BSF and the homologous region in D. melanogaster are shown. Homolog pairs between these species are linked by lines. Genes in green and blue indicate BSF-specific genes that belong to two ortholog groups. These 14 genes do not have homology to genes of any other sequenced invertebrate species. Note that this cluster is located at the end of an assembled BSF scaffold. The heatmap shows the expression pattern of corresponding genes in BSF larvae fed with the other diets at each of the four time points.
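The gene categories in panel a of Fig. 4 can be read as a small decision procedure over the 16 samples (4 diets × 4 time points). The sketch below is one possible reading, not the authors' code. Two things are assumptions: what counts as "expressed" is decided upstream (for example by a TPM cutoff, which the notes do not give), and the order in which overlapping labels are checked (for instance, "Orphan" before "Waste") is chosen here for illustration.

```python
DIETS = ("FW", "PM", "DM", "SM")      # food waste and the three manures
DAYS = (4, 6, 8, 12)
MANURE = {"PM", "DM", "SM"}
N_SAMPLES = len(DIETS) * len(DAYS)    # 16 samples in total


def classify(expressed_in):
    """Label a gene from the set of (diet, day) samples in which it is expressed."""
    n = len(expressed_in)
    diets = {diet for diet, _day in expressed_in}
    if n == 0:
        return "Not expressed"
    if n == N_SAMPLES:
        return "All"
    if n == N_SAMPLES - 1:
        return "Almost all"
    if n == 1:
        return "Orphan"               # checked before the diet-specific labels (assumption)
    if diets == set(DIETS):
        return "Broad"                # seen under every diet, but not at every time point
    if diets <= MANURE:
        return "Manure"
    if diets == {"FW"}:
        return "Waste"
    return "Other"                    # combinations the caption does not name


print(classify({("FW", 4), ("FW", 8)}))               # prints Waste
print(classify({(d, 4) for d in DIETS}))              # prints Broad
```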
Microbiota of BSF larvae fed on organic wastes
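16S rRNA sequencing was used to profile the taxa and abundance of BSF gut microbes across diets and time points. Larvae fed dairy manure or swine manure harbored more diverse microbial taxa. Unlike the midgut transcriptome, whose expression profiles showed no clear diet-related pattern, gut microbial composition correlated strongly with diet. Firmicutes contributed the most bacterial taxa. In the paper's words: "Firmicutes have an important role in digestion of animal manure as these bacteria secrete a variety of proteases and pectinases and are involved in degradation of indigestible carbohydrates in straw-related compost."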
Fig. 5 Microbiome of BSF larvae fed with different types of organic waste. a Within-sample diversity estimates of the bacterial communities in larvae fed with the indicated diets. b Constrained principal coordinate analysis of between-sample diversity. Bray-Curtis distances between samples constrained by diets plotted for the first two CPCoAs. c The dynamic landscape of OTUs across all communities at a phylum level. OTU richness is indicated by the area of corresponding symbols. Symbols indicate counts of contained sequences. Colors indicate the fraction of target OTUs relative to all OTUs of the corresponding sample.
Genetic manipulation to facilitate the utilization of BSF larvae
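The main idea is to make BSF eat more at the larval stage, increasing its capacity to convert organic matter, and to reduce how far adults can move, so that large populations can be built up.
First, insect metamorphosis is controlled by a series of hormones and neuropeptides, and prothoracicotropic hormone (Ptth) controls the synthesis and release of ecdysone. Knocking out Ptth effectively prolongs the larva-to-pupa transition. The authors used two sgRNAs, targeted to the second and fourth exons, to disrupt HiPtth substantially in vivo. The last larval instar increased from 4–5 days in controls to > 85 days in mutant larvae of any mosaic form of disrupted HiPtth. Body size and weight also increased markedly, probably because of the extended feeding period.
Second, the authors identified the BSF homolog of Vestigial (Vg), a gene that controls wing size and shape in Drosophila, by homology alignment with Drosophila wing-development genes. Knocking it out produced wingless adults without otherwise affecting adult development.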
Fig. 6 Mutagenesis of Ptth leads to increased feeding capacity in BSF larvae. The CRISPR/Cas9 system was used to induce mutations at the HiPtth locus in H. illucens. a Schematic representation of the exon/intron boundaries of the HiPtth gene. Exons are shown as boxes; thin lines represent introns; numbers are fragment lengths in base pairs (bp). Target site (TS) locations are noted and PAM sequences are shown in red. b Sequences of the targeted region in the HiPtth locus in the mutants. The PAM sequence is in red. The numbers of nucleotides deleted in each line are indicated on the right. c Morphology of HiPtth mutants showing their greater size relative to wild type (WT) controls. d Average body weights of mutants and control (n = 30; mean values ± SEM).
Fig. 7 Mutagenesis of Vg in BSF eliminates wings in adults. a Schematic representation of the exon/intron boundaries of HiVg. Exons are shown as boxes and thin lines represent the introns. Target site (TS) locations are noted and PAM sequences are shown in red. b Sequences of the targeted region in the corresponding loci of Vg mutants. The PAM sequence is in red. The numbers of nucleotides deleted in each line are indicated on the right. c Phenotypic images show that Vg mutants lack wings in the adult stage.
MATERIALS AND METHODS
Genome sequencing
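DNA extracted from a single pupa was used for genome sequencing. Paired-end and mate-pair libraries with different insert sizes were constructed to build contigs and scaffolds.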
Genome assembly
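Genome size was estimated by k-mer analysis. Seqtk v1.0 was used to trim adaptors and low-quality bases, k-mers were counted with Jellyfish (21-mers), and heterozygosity and other genome characteristics were estimated with GenomeScope.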
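The idea behind k-mer based genome size estimation can be shown with a deliberately simplified sketch: divide the total number of counted k-mers by the depth at the main peak of the k-mer histogram. This is not what GenomeScope does internally (it fits a statistical model to the histogram); the function name, the `min_depth` cutoff for discarding likely error k-mers, and the toy histogram are all assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Rough genome size from a k-mer histogram {depth: number of distinct k-mers}.

    k-mers seen fewer than min_depth times are treated as sequencing errors
    and ignored (the cutoff is an assumption, not a value from the paper).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)                 # depth of the main peak
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Hypothetical toy histogram, e.g. parsed from the two-column output of a k-mer counter
toy_histogram = {1: 1000, 2: 200, 28: 50, 29: 80, 30: 100, 31: 80, 32: 50}
print(estimate_genome_size(toy_histogram))  # prints 360.0
```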
- MiSeq read pairs were utilized to assemble contigs using DiscovarDeNovo. Initial contigs were processed by redundans v0.11c63 to remove potential redundant sequences.
- The paired-end read information from the long libraries was used step by step from 800-bp to 13-kb insert size to join contigs into scaffolds using SSPACE.
- The remaining gaps within scaffolds were iteratively filled with paired-end reads of 250-bp and 800-bp inserts using GapCloser available in SOAPdenovo.
- CEGMA (Core Eukaryotic Genes Mapping Approach) and BUSCO (Benchmarking Universal Single-Copy Orthologs) were used to assess the quality of the genome assembly.
Genome annotation
Repeat annotation
- Tandem Repeats Finder to annotate the tandem repeats (Tandem Repeats Database)
- RepeatModeler to construct a de novo repeat library
- RepeatMasker to search similar TEs against the known Repbase TE library and de novo repeat library
- LTR_FINDER to find long terminal repeats (LTRs)
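Protein-coding gene annotation
- transcriptome evidence: RNA-seq data from 12 consecutive BSF developmental stages, with two biological replicates each; HISAT2 to map RNA-seq reads to the reference genome and StringTie to predict exons.
- homolog alignments: GeneWise with protein inputs from six dipteran species.
- ab initio gene annotation: three independent gene predictors were applied to generate ab initio signatures, including AUGUSTUS, SNAP, and Genscan.
The evidence from these three pipelines was finally integrated with GLEAN to produce a consensus gene set.
Functional annotation of specific gene families required manual curation: TBLASTN searches with dipteran homologs were used to determine genomic loci, gene structures were predicted with GeneWise, and conserved domains and biological pathways were obtained from KEGG KO annotation. Gene family contractions and expansions were assessed by running a local InterProScan against the dipteran genomes. Gene expression was quantified with salmon, with normalized expression reported as TPM.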
Comparative genomics
OrthoMCL was used to find the final orthologs, inparalogs, and co-orthologs. Multiple alignments of protein sequences for each group were performed using Muscle, and Gblocks was used to identify conserved blocks.
Conserved blocks were finally concatenated to 10 super genes with 255,475 amino acids, which were used to infer the maximum likelihood phylogeny using RAxML.
Codeml from the PAML package was used to calculate dN/dS ratios under the F3X4 codon frequency.
Functional enrichment analyses were performed via an online OMICSHARE cloud platform (http://www.omicshare.com/tools/Home/Soft/pathwaygsea).
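Pathway enrichment of this kind is commonly scored with a one-sided hypergeometric test, which is also the test named in the Fig. 4 caption. The sketch below shows that calculation with the standard library only. It is not the OMICSHARE implementation, the counts in the example are hypothetical, and the FDR adjustment across pathways is not shown.

```python
from math import comb


def hypergeom_pvalue(population, annotated, drawn, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, drawn).

    population: number of background genes
    annotated:  background genes that belong to the pathway
    drawn:      size of the gene list being tested
    hits:       genes in the list that belong to the pathway
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts for one pathway (not values from the paper)
p = hypergeom_pvalue(population=9000, annotated=120, drawn=300, hits=15)
print(f"one-sided enrichment p-value: {p:.3g}")
```

A small p-value means the pathway contains more genes from the list than expected by chance given the background.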
Analysis of the BSF intestinal transcriptome
- Each sample was independently mapped to the reference genome and subjected to expression profiling using the mode “quant” of salmon with the parameter “--validateMappings”. All independent profiles were finally merged into a TPM matrix using the mode “quantmerge” of salmon.
- Expression profile-based principal component analysis was performed using the built-in R function “prcomp”.
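For readers unfamiliar with the TPM unit used in the expression matrix, here is a minimal sketch of its definition: per-gene read counts are divided by effective length, then rescaled so that each sample sums to one million. salmon reports TPM directly, so this is only an illustration; the gene names and numbers are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert read counts to TPM for one sample.

    counts:            {gene: estimated number of reads}
    effective_lengths: {gene: effective transcript length in bp}
    """
    rates = {gene: counts[gene] / effective_lengths[gene] for gene in counts}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("all counts are zero")
    # Rescale so the values of the sample sum to one million
    return {gene: rate / scale * 1e6 for gene, rate in rates.items()}


# Hypothetical toy sample
values = tpm({"geneA": 10, "geneB": 20}, {"geneA": 1000, "geneB": 1000})
print(values)
```

Because every sample sums to the same total, TPM values are comparable across the 16 samples before a step such as principal component analysis.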
Metagenomic analyses of BSF intestinal microbiota
16S rRNA sequencing of the gut microbiota.
- Clean read pairs were merged using the built-in command “join_paired_ends.py” from QIIME.
- OTU analyses were performed by VSEARCH. Within- and between-sample diversities were estimated by the built-in QIIME scripts “alpha_diversity.py” and “beta_diversity.py”, respectively.
- The dynamic landscape of OTUs was generated using the online platform, SILVAngs (https://www.arb-silva.de/ngs).
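The two kinds of diversity mentioned above can be made concrete with a short sketch: the Shannon index as an example of within-sample (alpha) diversity, and Bray-Curtis dissimilarity, the between-sample measure named in the Fig. 5 caption. These are plain textbook formulas rather than the QIIME scripts themselves; the choice of log base 2 and the toy OTU counts are assumptions.

```python
from math import log2


def shannon(otu_counts):
    """Shannon diversity (base 2) of one sample's OTU counts."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * log2(c / total) for c in otu_counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples over the same OTU order."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same OTUs")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / total


# Hypothetical OTU count vectors for two larval gut samples
food_waste = [50, 30, 20, 0]
swine_manure = [10, 30, 40, 20]
print(shannon(food_waste), shannon(swine_manure))
print(bray_curtis(food_waste, swine_manure))
```

A Bray-Curtis value of 0 means identical composition and 1 means no shared OTUs.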
Mutagenesis of BSF target genes
Predicted ORFs of HiPtth and HiVg were obtained by homology alignment with other dipterans. With the PAM sequences in consideration, newly designed sgRNAs should follow the NNN19GG rule.
Fertilized eggs were collected within 1 h and microinjection was performed within 2 h of oviposition. Cas9 protein (200 ng/μL) was co-injected with sgRNA-1 (100 ng/μL) and sgRNA-2 (100 ng/μL) into preblastoderm embryos.
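Candidate sgRNA target sites can be listed by scanning a sequence for a protospacer followed by a PAM. The sketch below assumes the common SpCas9 convention of a 20-nt protospacer immediately followed by an NGG PAM; this may not match the exact design rule quoted in these notes, so treat the pattern as an adjustable assumption. It scans the forward strand only, and the function name and test sequence are hypothetical.

```python
import re

# 20-nt protospacer followed by an NGG PAM (assumed convention, see lead-in).
# The lookahead lets overlapping candidate sites be reported.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_target_sites(sequence):
    """Return (start, protospacer, pam) for every forward-strand candidate site."""
    seq = sequence.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in SITE.finditer(seq)]


# Hypothetical toy sequence, not part of HiPtth or HiVg
toy = "ACGT" * 5 + "AGG" + "TT"
for start, protospacer, pam in find_target_sites(toy):
    print(start, protospacer, pam)
```

A real design step would also scan the reverse complement and screen candidates for off-target matches in the genome.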
First-instar larvae were selected for genomic DNA preparation. Fragments covering the two targeting sites were amplified. The amplified fragments were cloned into a pJET1.2 vector (Fermentas) and sequenced on the Sanger platform.