Heliyon, v.10(6); 2024 Mar 30. PMC10966592
Abstract
Terpene synthases (TPSs) regulate plant growth, development, and stress responses. TPS genes have been identified in Arabidopsis thaliana and Zea mays. Here, Cannabis sativa TPS genes were identified and analyzed using bioinformatics. Genomic data were downloaded from the Plant Transcription Factor Database and the National Center for Biotechnology Information (NCBI) database, and TPS genes were predicted, analyzed, and visualized using ExPASy, PlantCARE, and other online resources together with TBtools, MEGA, and other software. Expression was verified by quantitative real-time polymerase chain reaction (qRT-PCR). The Cannabis sativa TPS family comprises 41 members distributed over 8 chromosomes and a single scaffold segment. The isoelectric points ranged from 4.96 to 7.03, and the molecular weights from 20705.90 to 102324.64 Da. Most proteins were predicted to localize to the cytoplasm and chloroplasts, with the remainder in the peroxisome, nucleus, plasma membrane, and mitochondria. The upstream promoter regions of the genes contained several stress-related cis-acting elements. RNA sequencing and qRT-PCR data revealed organ-specific expression of TPS genes in all five organs of female Cannabis sativa plants. Collinearity analysis identified 4 homologous gene pairs between Cannabis sativa and Arabidopsis thaliana and many homologous pairs with other species, consistent with their evolutionary relationships as dicotyledons. Furthermore, some genes may participate in Cannabis sativa growth and development and in secondary metabolite synthesis. This bioinformatic analysis of the Cannabis sativa TPS gene family therefore provides a theoretical basis for future research on the volatile terpene compounds of Cannabis sativa.
1. Introduction
The synthesis of volatile organic compounds (VOCs) plays an important biological role in plant tissues. Based on their biosynthetic origin, VOCs can be classified as terpenoids, benzenoid aromatic compounds, and fatty acid derivatives, with terpenoids being the largest class [1]. Terpenes are derived from precursors produced by the mevalonate and methylerythritol phosphate pathways [2]. Their carbon skeletons are built from C5 isoprene units, supplied as isopentenyl diphosphate and its isomer dimethylallyl diphosphate, giving rise to monoterpenes, sesquiterpenes, diterpenes, triterpenes, and other classes [3]. Terpenoids play important roles in plant physiology and biochemistry, such as photosynthesis, electron transfer, and developmental regulation [4]. Plant terpenes are also crucial for attracting insect pollinators, defending plants, facilitating plant-to-plant interactions, and mediating interactions across diverse ecological environments [5,6]. Depending on their roles in plants, terpenoids are categorized as primary or secondary metabolites. Gibberellins, abscisic acid, carotenoids, and sterols are important primary metabolites that regulate cell elongation and plant growth, participate in photosynthesis, and control membrane fluidity. Most plant terpenoids, however, are secondary metabolites, such as artemisinin, paclitaxel, and gossypol, which are vital in the interactions between plants and their environment and offer substantial medicinal benefits [7,8].
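Because terpene classes are named by how many C5 units make up the skeleton, the classification in this paragraph reduces to simple arithmetic on the carbon count. The following minimal Python sketch illustrates that rule; the lookup table uses standard nomenclature and is not part of the authors' pipeline, and it classifies by skeleton size only (modified terpenoids that have lost carbons are out of scope).

```python
# Terpene classes indexed by the number of C5 (isoprene) units in the skeleton.
TERPENE_CLASSES = {
    1: "hemiterpene",    # C5
    2: "monoterpene",    # C10, typically from geranyl diphosphate
    3: "sesquiterpene",  # C15, typically from farnesyl diphosphate
    4: "diterpene",      # C20, typically from geranylgeranyl diphosphate
    5: "sesterterpene",  # C25
    6: "triterpene",     # C30
    8: "tetraterpene",   # C40, e.g. carotenoids
}


def terpene_class(carbon_count):
    """Return the terpene class for a skeleton of `carbon_count` carbons."""
    if carbon_count <= 0 or carbon_count % 5 != 0:
        raise ValueError(f"{carbon_count} is not a positive multiple of 5")
    units = carbon_count // 5
    if units in TERPENE_CLASSES:
        return TERPENE_CLASSES[units]
    # C35 is unusual; more than eight units is conventionally a polyterpene
    return "polyterpene" if units > 8 else f"uncommon C{carbon_count} skeleton"


for carbons in (10, 15, 20, 30):
    print(carbons, terpene_class(carbons))
```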
Consistent with their role in environmental adaptation, terpenes are produced by diverse terpene synthases (TPSs), which have evolved new product structures through modification of multiple amino acids in existing enzymes [9]. Although the family shares a common “terpenoid synthase fold” [10], sequence divergence within it is high; only the overall fold and the basic configuration of the active site are conserved. These enzymes are unusual in that they accept multiple substrates (for example, the 10-carbon geranyl diphosphate or the 15-carbon farnesyl diphosphate) and yield multiple products, and many TPSs generate mixtures of different products from the same substrate [11,12]. Underlying this evolutionary plasticity, changing a single amino acid in the active site can alter the product profile [13–15]. Angiosperms typically possess a medium-sized group of these enzymes, some of which evidently arose from recent duplications, while others have diverged and undergone both divergent and convergent evolution. In general, the product spectrum of an enzyme cannot be determined from sequence similarity alone [16].
TPS is the principal enzyme in the biosynthesis of terpene compounds, and the diversity of TPSs underlies the diversity of terpenes [17]. Based on their products, TPSs can be categorized into monoterpene, sesquiterpene, and diterpene synthases [18]. TPS genes share two conserved domains, both listed in the Pfam database: PF01397 (N-terminal domain) and PF03936 (C-terminal domain). Based on the phylogenetic relationships of plant TPS genes, the family is divided into eight subfamilies: TPS-a, TPS-b, TPS-c, TPS-d (gymnosperm-specific), TPS-e, TPS-f, TPS-g, and TPS-h (Selaginella-specific) [19–21]. To adapt to particular ecological niches, such as attracting pollinators, dispersing seeds, combating pathogens, and deterring herbivores [5,6,22,23], plant terpenoid biosynthesis has undergone lineage-specific evolution, with varying degrees of expansion and variation across subfamilies [24,25]. In dicotyledons and monocotyledons, TPS-a encodes sesquiterpene synthases; TPS-b, unique to angiosperms, encodes monoterpene synthases featuring the R(R)X8W motif, which facilitates the isomerization-cyclization reaction; TPS-c belongs to an ancient evolutionary branch and encodes copalyl diphosphate synthases; TPS-d, unique to gymnosperms, serves multiple roles, encoding diterpene, monoterpene, and sesquiterpene synthases; TPS-e/f encodes copalyl diphosphate/kaurene synthases, which are key enzymes in gibberellin biosynthesis; TPS-g, which is angiosperm-specific, encodes monoterpene synthases lacking the R(R)X8W motif; and TPS-h is found exclusively in Selaginella.
Also known as hemp, Cannabis sativa is an annual herb of the family Cannabaceae and one of the world’s most ancient cultivated crops [26]. It was first used as a medicinal plant in the Middle East and Asia and was introduced into Western medicine at the beginning of the 19th century [27]. Cannabis sativa is known for producing cannabinoids, phenolic terpenoids with diverse pharmacological properties that are found predominantly in the glandular trichomes of female flowers [28]. Cannabidiol and tetrahydrocannabinol are the two main cannabinoids [29]. Tetrahydrocannabinol, a strictly controlled substance, is addictive and can cause anxiety, hallucinations, and immune decline in humans [30,31].
In recent years, advances in genome sequencing technology have produced high-quality Cannabis sativa genome assemblies; however, the Cannabis sativa TPS gene family has not yet been comprehensively studied. We therefore identified the Cannabis sativa TPS gene family and analyzed its expression using bioinformatics methods, providing a reference for further exploration of Cannabis sativa TPS gene function.
2. Materials and methods
2.1. Sources of data and botanical materials
The whole-genome sequence and annotation files of Cannabis sativa (GCA_900626175.1) were downloaded from the National Center for Biotechnology Information (NCBI) database. The CRBRx female strain of Cannabis sativa was used [32]. Genome data for Arabidopsis thaliana (GCA_000001735.4) and Malus pumila (GCF_002114115.1) were also downloaded from NCBI. Arabidopsis thaliana TPS gene data were obtained from the TAIR database (https://www.arabidopsis.org) (Supplementary Table 1). The transcriptome data were retrieved from NCBI (PRJNA498707) [33].
2.2. Identification and analysis of TPS genes in Cannabis sativa
To identify members of the Cannabis sativa TPS gene family, known Arabidopsis thaliana TPS protein sequences were first used as queries, and sequences were aligned with the Basic Local Alignment Search Tool (BLAST) in the TBtools [34] software package. With the E-value threshold set to 1e-5, candidate Cannabis sativa TPS sequences were recovered, checked again against the NCBI database using protein BLAST, and submitted to the CD-Search and Pfam databases to confirm their conserved domains. Sequences with incomplete domains and redundant sequences were then removed to obtain the final Cannabis sativa TPS gene family members. The ExPASy ProtParam tool (https://web.expasy.org/protparam/) was used to predict the physicochemical properties of the Cannabis sativa TPS proteins. The subcellular localization of CsTPS proteins was predicted with WoLF PSORT (www.genscript.com/wolf-psort.html) [33].
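The screening logic described above (an E-value cutoff followed by a domain check) can be sketched in Python as follows. This is a minimal illustration, not the authors' TBtools workflow: it assumes BLAST tabular output (`-outfmt 6`, standard 12-column order) and a hypothetical `domains_by_protein` mapping that would in practice be parsed from Pfam or CD-Search results. Redundancy removal, such as collapsing isoforms, is not shown.

```python
import csv

E_VALUE_CUTOFF = 1e-5                 # threshold stated in the Methods
TPS_DOMAINS = {"PF01397", "PF03936"}  # TPS N- and C-terminal Pfam domains


def blast_candidates(blast_lines, cutoff=E_VALUE_CUTOFF):
    """Subject IDs from BLAST tabular output (-outfmt 6) with E-value <= cutoff.

    Column order assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        subject_id, evalue = row[1], float(row[10])
        if evalue <= cutoff:
            hits.add(subject_id)
    return hits


def with_both_domains(candidates, domains_by_protein):
    """Keep candidates annotated with both TPS domains; others count as incomplete."""
    return sorted(p for p in candidates
                  if TPS_DOMAINS <= domains_by_protein.get(p, set()))
```

Requiring both Pfam domains mirrors the two-domain definition of the family given in the Introduction; a looser screen could keep single-domain hits for manual review.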
2.3. Chromosomal localization and gene replication event analysis of Cannabis sativa TPS gene
The Cannabis sativa TPS genes were mapped to the Cannabis sativa chromosomes using the genome annotation file, and the chromosomal location of each TPS gene was determined. A physical map of the chromosomes was drawn, and genes were named according to their chromosomal positions. MCScanX [35] was used to identify duplication events among the Cannabis sativa TPS genes. In addition, to visualize the collinearity between the Cannabis sativa and Arabidopsis thaliana genomes, we used the Dual Synteny Plot function in the TBtools [34] software package and highlighted the collinear relationships of the TPS genes.
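Naming genes by chromosomal position amounts to sorting loci by chromosome order and then by start coordinate. The Python sketch below shows one simple convention under stated assumptions: the `GeneLocus` record and `name_by_position` helper are hypothetical, unplaced scaffolds are sorted last, and the exact ordering rule the authors used is not stated in this section.

```python
from dataclasses import dataclass


@dataclass
class GeneLocus:
    gene_id: str   # e.g. an mRNA accession from the annotation
    chrom: str     # chromosome or scaffold name
    start: int     # 1-based start coordinate


def name_by_position(loci, chrom_order, prefix="CsTPS"):
    """Number genes by chromosome order, then by start coordinate.

    Sequences missing from `chrom_order` (e.g. unplaced scaffolds) sort last.
    """
    rank = {name: i for i, name in enumerate(chrom_order)}
    ordered = sorted(loci, key=lambda g: (rank.get(g.chrom, len(rank)), g.chrom, g.start))
    return {g.gene_id: f"{prefix}{i}" for i, g in enumerate(ordered, start=1)}
```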
2.4. Protein–protein interaction network analysis
Interactions among Cannabis sativa TPS proteins were predicted using the STRING website (https://string-db.org/cgi/Input.pl), with the Arabidopsis thaliana protein library as the reference and default settings. The results were saved in TSV format and visualized with Cytoscape 3.8.0 (www.cytoscape.org).
Phylogenetic analysis and classification of the TPS gene family and analysis of TPS gene structure and motifs
The MEME Suite web server (https://meme-suite.org/meme/tools/meme) [36] was used to analyze TPS motifs, with the site distribution set to ZOOPS (zero or one occurrence per sequence) and the number of motifs set to 10. The 2000 bp sequence upstream of each Cannabis sativa TPS gene was extracted as the promoter region and submitted to the PlantCARE (http://www.plantcare.co.uk/) website for statistical analysis of its cis-acting elements [37]. TBtools was used to analyze and visualize the conserved domains, exons, introns, motifs, and promoter cis-acting elements of the TPS genes. A phylogenetic tree of TPS proteins from Cannabis sativa and other reported species was built with the neighbor-joining method in the MEGA 7 software package (www.megasoftware.net) [38]. Amino acid sequences were aligned with the ClustalW algorithm, the number of bootstrap replicates was set to 1000, and all other parameters were left at their defaults.
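Extracting a promoter region is strand-dependent: for a minus-strand gene, the upstream sequence lies after the gene's end coordinate and must be reverse-complemented. The minimal Python sketch below assumes gene coordinates are taken from a GFF3 annotation (1-based, inclusive). The section does not say whether "upstream" was measured from the transcript or the CDS start, so that choice is left to the caller.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=2000):
    """Up to `length` bp upstream of a gene, written 5'->3' relative to the gene.

    start/end are 1-based inclusive coordinates (GFF3 convention); the region
    is truncated at chromosome ends.
    """
    if strand == "+":
        left = max(0, start - 1 - length)
        return chrom_seq[left:start - 1]
    if strand == "-":
        right = min(len(chrom_seq), end + length)
        return reverse_complement(chrom_seq[end:right])
    raise ValueError(f"unknown strand {strand!r}")
```

Returning the minus-strand promoter in gene orientation means motif scans treat both strands' genes the same way.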
2.5. RNA extraction and cDNA preparation
The Plant Total RNA Kit (510105) was purchased from Hangzhou Xinjing Biochemical Reagent Development Co., Ltd. (Hangzhou, China), and the reverse transcription kit (article number QP057) from Guangzhou Yijin Biotechnology Co., Ltd. (Guangzhou, China). RNA was extracted according to the protocol of Hangzhou Xinjing Biochemical Reagent Development Co., Ltd., and the extracted RNA was stored at −80 °C. For reverse transcription, the reagents were thawed on ice and an appropriate amount of RNA was used as template. The reverse transcription program was 25 °C for 5 min, 42 °C for 15 min, and 85 °C for 5 min, followed by a 4 °C hold. The cDNA concentration was measured with an ultramicro ultraviolet spectrophotometer, and the cDNA was stored at −20 °C for subsequent use. These procedures were the same as those used by other members of our research group [39].
Analysis of Cannabis sativa TPS gene expression profiles and quantitative real-time polymerase chain reaction (qRT-PCR)
Fragments per kilobase of transcript per million mapped reads (FPKM) values of the TPS genes were extracted from Cannabis sativa RNA sequencing (RNA-seq) data [33]. CsTPS expression was visualized with TBtools, and all FPKM values were row-scaled (Supplementary Table 2). The transcriptome data were verified by qRT-PCR, using cDNA as template, the Hieff UNICON® Universal Blue qPCR SYBR Green Master Mix kit (Shanghai, China), and an AriaMx Real-Time PCR System. The program was 95 °C for 2 min, followed by 40 cycles of 95 °C for 10 s and 60 °C for 30 s [40]. Each experiment was performed in triplicate. qPCR primers were designed with Premier software (version 5.0) and synthesized by Beijing Ruibo Xingke Biotechnology Co., Ltd.
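This section does not state how relative expression was calculated from the Ct values. A widely used option is the Livak 2^-ΔΔCt method, sketched below in Python purely as an assumption. The sketch also assumes a reference gene, which this section does not name, and roughly 100% amplification efficiency for both primer pairs. Replicate Ct values, such as the three replicates described above, are averaged before the calculation.

```python
from statistics import mean


def fold_change_ddct(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """2^-ΔΔCt fold change of a sample relative to a calibrator tissue.

    Each argument is a list of replicate Ct values; replicates are averaged.
    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)             # ΔCt in the sample
    delta_calib = mean(calib_target_ct) - mean(calib_reference_ct)  # ΔCt in the calibrator
    return 2 ** -(delta_sample - delta_calib)


# Hypothetical Ct values: a TPS gene in bracts vs. leaves as the calibrator.
print(fold_change_ddct([22.1, 22.3, 22.2], [18.0, 18.1, 17.9],
                       [25.0, 25.2, 25.1], [18.1, 18.0, 18.2]))
```

A lower ΔCt in the sample than in the calibrator gives a fold change above 1, that is, higher expression in the sample tissue.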
3. Results
3.1. Identification and analysis of CsTPS gene family in Cannabis sativa
To identify TPS family members in the Cannabis sativa genome, we performed a genome-wide search using BLASTP and a hidden Markov model and then checked the conserved domains using the CD-Search and Pfam databases (PF01397 and PF03936), ultimately identifying 41 CsTPS genes in Cannabis sativa. The encoded proteins range from 180 to 891 amino acids in length, with molecular weights of 20705.90 to 102324.64 Da and isoelectric points of 4.96 to 7.03. Most proteins were predicted to localize to the cytoplasm and chloroplasts, while the rest were predicted in the peroxisome, nucleus, plastid, and mitochondria (Table 1). Subcellular localization prediction placed 21 proteins in the chloroplasts and 2 (CsTPS23 and CsTPS33) in the nucleus; CsTPS8 was predicted in the peroxisomes; CsTPS4, CsTPS5, CsTPS6, CsTPS15, CsTPS19, CsTPS24, CsTPS25, CsTPS26, CsTPS28, CsTPS29, CsTPS32, CsTPS34, CsTPS36, CsTPS37, and CsTPS41 in the cytoplasm; CsTPS30 in the plastid; and CsTPS38 in the mitochondria (Supplementary Table 3). These results may lay the groundwork for further investigation of TPS genes in Cannabis sativa.
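The molecular weights and isoelectric points above come from ExPASy ProtParam. The following Python sketch shows how such values are computed in principle, under stated assumptions. Molecular weight is the sum of standard average residue masses plus one water. The pI is found by bisection on a Henderson–Hasselbalch net-charge model using an assumed, EMBOSS-like pKa set. ProtParam uses Bjellqvist pK values, so pI values from this sketch will not exactly reproduce Table 1.

```python
# Standard average residue masses (Da); one water is added per chain.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed pKa values for this sketch (not the Bjellqvist set used by ProtParam).
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq):
    seq = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    seq = seq.upper().rstrip("*")
    # fraction protonated for basic groups, fraction deprotonated for acidic ones
    pos = 1 / (1 + 10 ** (ph - PKA_BASIC["Nterm"]))
    pos += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_BASIC[aa])) for aa in "KRH")
    neg = 1 / (1 + 10 ** (PKA_ACIDIC["Cterm"] - ph))
    neg += sum(seq.count(aa) / (1 + 10 ** (PKA_ACIDIC[aa] - ph)) for aa in "DECY")
    return pos - neg


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection works because net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```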
Table 1
Accession no. | Gene name | Subfamily | Length (aa) | Subcellular location | pI | MW (Da) |
---|---|---|---|---|---|---|
rna-XM_030630973.1 | CsTPS1 | c | 822 | Chloroplast | 6.14 | 94591.9 |
rna-XM_030631298.1 | CsTPS2 | a | 180 | Chloroplast | 5.39 | 20705.9 |
rna-XM_030635217.1 | CsTPS3 | a | 570 | Chloroplast | 5.58 | 65424.72 |
rna-XM_030635393.1 | CsTPS4 | c | 551 | Cytoplasmic | 5.32 | 63953.83 |
rna-XM_030635402.1 | CsTPS5 | c | 485 | Cytoplasmic | 6.06 | 56469.59 |
rna-XM_030635401.1 | CsTPS6 | c | 589 | Cytoplasmic | 6.02 | 67853.35 |
rna-XM_030641433.1 | CsTPS7 | c | 646 | Chloroplast | 5.38 | 73732.08 |
rna-XM_030644589.1 | CsTPS8 | b | 585 | Peroxisomal | 4.96 | 69416.65 |
rna-XM_030644482.1 | CsTPS9 | b | 646 | Chloroplast | 5.33 | 75975.84 |
rna-XM_030645709.1 | CsTPS10 | b | 630 | Chloroplast | 5.22 | 73986.80 |
rna-XM_030645710.1 | CsTPS11 | b | 623 | Chloroplast | 5.78 | 73396.58 |
rna-XM_030645438.1 | CsTPS12 | b | 622 | Chloroplast | 5.58 | 73431.07 |
rna-XM_030645439.1 | CsTPS13 | b | 619 | Chloroplast | 6.14 | 72789.31 |
rna-XM_030644559.1 | CsTPS14 | b | 614 | Chloroplast | 5.13 | 71680.38 |
rna-XM_030644770.1 | CsTPS15 | b | 285 | Cytoplasmic | 5.06 | 33477.19 |
rna-XM_030644768.1 | CsTPS16 | b | 623 | Chloroplast | 6.75 | 72524.86 |
rna-XM_030644766.1 | CsTPS17 | b | 635 | Chloroplast | 6.12 | 74421.22 |
rna-XM_030645191.1 | CsTPS18 | b | 634 | Chloroplast | 6.03 | 74286.05 |
rna-XM_030652072.1 | CsTPS19 | b | 576 | Cytoplasmic | 5.76 | 67553.41 |
rna-XM_030652515.1 | CsTPS20 | b | 572 | Chloroplast | 5.48 | 67222.67 |
rna-XM_030652514.1 | CsTPS21 | b | 574 | Chloroplast | 5.28 | 67378.55 |
rna-XM_030652513.1 | CsTPS22 | b | 574 | Chloroplast | 5.41 | 67506.82 |
rna-XM_030655054.1 | CsTPS23 | a | 556 | Nuclear | 5.87 | 65400.70 |
rna-XM_030621835.1 | CsTPS24 | a | 564 | Cytoplasmic | 5.88 | 66531.24 |
rna-XM_030622484.1 | CsTPS25 | a | 566 | Cytoplasmic | 5.51 | 66938.66 |
rna-XM_030622955.1 | CsTPS26 | a | 551 | Cytoplasmic | 6.26 | 64563.14 |
rna-XM_030622956.1 | CsTPS27 | a | 552 | Chloroplast | 6.02 | 64905.48 |
rna-XM_030622957.1 | CsTPS28 | a | 551 | Cytoplasmic | 6.08 | 64680.18 |
rna-XM_030622624.1 | CsTPS29 | a | 566 | Cytoplasmic | 5.63 | 66443.46 |
rna-XM_030622631.1 | CsTPS30 | a | 748 | Plastid | 5.79 | 87222.52 |
rna-XM_030622632.1 | CsTPS31 | a | 552 | Chloroplast | 5.65 | 65232.64 |
rna-XM_030654009.1 | CsTPS32 | a | 571 | Cytoplasmic | 5.18 | 67043.19 |
rna-XM_030622635.1 | CsTPS33 | a | 342 | Nuclear | 5.17 | 40121.24 |
rna-XM_030653866.1 | CsTPS34 | a | 564 | Cytoplasmic | 5.69 | 66312.54 |
rna-XM_030623148.1 | CsTPS35 | e/f | 865 | Chloroplast | 7.03 | 100977.13 |
rna-XM_030623146.1 | CsTPS36 | e/f | 866 | Cytoplasmic | 6.51 | 100614.42 |
rna-XM_030624715.1 | CsTPS37 | e/f | 891 | Cytoplasmic | 6.37 | 102324.64 |
rna-XM_030624716.1 | CsTPS38 | e/f | 838 | Mitochondrial | 6.30 | 97214.12 |
rna-XM_030628902.1 | CsTPS39 | c | 613 | Chloroplast | 7.01 | 70612.94 |
rna-XM_030626238.1 | CsTPS40 | c | 587 | Chloroplast | 5.95 | 67549.56 |
rna-XM_030629488.1 | CsTPS41 | b | 572 | Cytoplasmic | 5.86 | 67783.67 |
3.2. Chromosomal locations analysis of CsTPS genes
The Cannabis sativa genome annotation was used to determine the chromosomal locations of the CsTPS genes. The 41 CsTPS genes were distributed across the eight chromosomes and one scaffold fragment of Cannabis sativa (Fig. 1A). Most CsTPS genes were located toward the ends of the chromosomes, with fewer in the central regions. Chromosomes 4 and 6 harbored the majority of the CsTPS genes, whereas chromosome 1 carried only one. One gene (CsTPS41) was located on a scaffold fragment not assigned to any chromosome (Supplementary Table 4).
Gene duplication events are crucial for generating diversity within gene families and for understanding how species evolve. To study gene duplication in Cannabis sativa, we removed all scaffold segments from the Cannabis sativa genome and retained only the eight chromosomes (Fig. 1B). One segmental duplication pair was found between chromosomes NC_044371.1 and NC_044379.1, and no duplicated pairs were found on the remaining chromosomes. Duplication of CsTPS genes may therefore have played a role in the evolution of Cannabis sativa.
Three-dimensional structure analysis of Cannabis sativa TPS proteins showed that structures were similar within a subfamily but differed markedly between subfamilies (Fig. 2).
3.3. Phylogenetic tree analysis of Cannabis sativa TPS proteins
To elucidate the evolutionary relationships among Cannabis sativa TPS proteins, a neighbor-joining phylogenetic tree was constructed using TPS protein sequences from Arabidopsis thaliana and other species: Oryza sativa, Nicotiana tabacum, Vitis vinifera, Citrus limon, Populus trichocarpa, Brassica napus, Brassica rapa, Cucurbita maxima, Chocolope, Ziziphus jujuba, Ricinus communis, Perilla frutescens, Salvia rosmarinus, Thymus caespititius, Origanum vulgare, Lavandula latifolia, Salvia officinalis, Salvia stenophylla, Solanum lycopersicum, Humulus lupulus, Antirrhinum majus, Physcomitrella patens, Abies grandis, Picea abies, Fragaria ananassa, Mentha x piperita, Cichorium intybus, Clarkia breweri, Gossypium arboreum, Solanum tuberosum, and Lycopersicum esculentum (Fig. 3). All CsTPS proteins clustered with AtTPS proteins. Consistent with the classification of AtTPSs, CsTPSs were divided into five subfamilies: a (14 members), b (16 members), c (7 members), d (not present), and e/f (4 members). Interestingly, the e and f subgroups were closely related, as has also been observed in other species (Supplementary Table 5).
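The tree itself was built in MEGA 7. To show what the neighbor-joining step does, here is a plain-Python sketch of the Saitou–Nei algorithm operating on a precomputed distance matrix. It is illustrative only: it does not compute distances from the ClustalW alignment and does not perform the 1000 bootstrap replicates, and the five-taxon matrix in the check is a toy example rather than CsTPS data.

```python
def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree as a Newick string.

    names: taxon labels (at least 3); dist: symmetric n x n distance matrix
    (list of lists) with a zero diagonal.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, a, b = best
        # branch lengths from the new internal node to a and b
        la = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        joined = f"({nodes[a]}:{la:.4g},{nodes[b]}:{lb:.4g})"
        keep = [k for k in range(n) if k not in (a, b)]
        # distance from the new node to every remaining node
        new_row = [0.5 * (d[a][k] + d[b][k] - d[a][b]) for k in keep]
        d = [[d[x][y] for y in keep] + [new_row[i]] for i, x in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # resolve the last three nodes around a single central node
    l0 = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    l1 = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    l2 = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{l0:.4g},{nodes[1]}:{l1:.4g},{nodes[2]}:{l2:.4g});"
```

On an additive matrix, neighbor joining recovers the generating tree exactly. Bootstrapping would repeat this construction on distance matrices computed from alignments resampled column by column.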
3.4. Structural analysis and conservative motif identification of Cannabis sativa TPS genes
Plant TPS gene family members generally contain two conserved domains. The N-terminal domain contains the conserved R(R)X8W motif (R, arginine; W, tryptophan; X, any amino acid), and the C-terminal domain contains two highly conserved aspartate-rich motifs. The first, DDXXD, is associated with the coordination of divalent metal ions and water molecules and with the stability of the active site; the second is the NSE/DTE motif. These motifs are located at the entrance of the active site and bind a trinuclear magnesium cluster. Most TPSs are monoterpene, sesquiterpene, or diterpene synthases, which contain the DDXXD and/or DXDD motifs. We analyzed all protein sequences of the CsTPS gene family. MEME identified 10 conserved TPS protein motifs, designated motifs 1–10. As shown in Fig. 4 [41], the number of conserved motifs in proteins TPS01–TPS41 varied from 3 to 10, and several groups of proteins shared the same motif composition. For example, TPS17 and TPS18 both contained all ten motifs (motifs 1–10). Motif 1 was the most common motif among the 41 TPS proteins, followed by motifs 7 and 8, indicating that motifs 1, 7, and 8 are conserved in TPSs and may be important for the function of Cannabis sativa TPS proteins. The multiple sequence alignment of CsTPS proteins was also analyzed (Supplementary Fig. 1). The N-terminal and C-terminal domains were found in almost all proteins, but CsTPS2 lacked the conserved motifs, possibly because of changes in the R(R)X8W and DDXXD motifs of this TPS-a member or truncation of the protein (CsTPS2 is only 180 aa long). Overall, gene structures and amino acid sequence features were consistent with the phylogenetic results, so the roles of CsTPS proteins in each subfamily can be inferred from their evolutionary relationships with characterized TPS proteins.
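The named TPS motifs can be located in a protein sequence with regular expressions. The patterns in this Python sketch are simplified readings chosen for illustration and should be treated as assumptions. `RR.{8}W` stands in for R(R)X8W. `[ND]D..[ST]...E` is a loose version of the NSE/DTE consensus (N/D)D(L/I/V)X(S/T)XXXE. These patterns are not the MEME motifs 1–10 reported above.

```python
import re

# Simplified motif patterns (assumed for illustration).
TPS_MOTIFS = {
    "R(R)X8W": r"RR.{8}W",
    "DDXXD": r"DD..D",
    "NSE/DTE": r"[ND]D..[ST]...E",
    "DXDD": r"D.DD",
}
# Zero-width lookaheads so that overlapping occurrences are all reported.
_LOOKAHEAD = {name: re.compile(f"(?=({p}))") for name, p in TPS_MOTIFS.items()}


def scan_motifs(protein):
    """Map each motif name to the 1-based start positions of its matches."""
    seq = protein.upper()
    return {name: [m.start() + 1 for m in pattern.finditer(seq)]
            for name, pattern in _LOOKAHEAD.items()}
```

Reporting positions rather than presence alone makes it easy to check that a motif falls in the expected N- or C-terminal domain.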
To gain deeper insight into Cannabis sativa TPS gene structure, we examined the arrangement of coding sequences (CDS), untranslated regions (UTR), and introns (Fig. 4). The number of introns per CsTPS gene ranges from one to fourteen and is similar within a subfamily. For example, TPS-b members have seven introns, whereas TPS-e/f members have eleven, with occasional exceptions.
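Intron counts of this kind can be derived from the genome annotation by counting exons per transcript. The Python sketch below assumes GFF3 input in which each exon line carries a `Parent=` attribute naming its mRNA; the transcript IDs in Table 1 (for example rna-XM_030630973.1) look like NCBI GFF3 mRNA IDs. It counts introns as exons minus one, so introns within UTRs are included.

```python
from collections import Counter


def introns_per_transcript(gff3_lines):
    """Count introns per mRNA as (number of exon features - 1)."""
    exons = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent] += 1
    return {tx: n - 1 for tx, n in exons.items()}
```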
We detected motifs A–L in TPS17, TPS18, TPS21, TPS22, and others, whereas TPS2 and TPS15 lacked many conserved motifs, including motifs 1, 6, and 7. Within the conserved domains, however, the positions and numbers of motifs varied to some extent (Fig. 5). The motif (Supplementary Fig. 2) and conserved-domain results show that motifs are conserved within a subfamily, which may reflect evolutionary conservation. These results support the accuracy of the CsTPS genes identified in this study (Supplementary Table 6). Together, the phylogenetic, motif, conserved-domain, and gene-structure analyses indicate that TPS genes have remained highly conserved over a long evolutionary history [40].
3.5. Prediction of cis elements in Cannabis sativa TPS genes
Cis-regulatory elements are noncoding sequences in gene promoter regions that play a significant role in regulating the transcription of the associated genes. We extracted the 2000 bp upstream of each CsTPS gene to analyze its promoter region. The CsTPS promoters contain cis-acting elements of diverse types and numbers. Hormone-related elements include auxin-, gibberellin-, and abscisic acid-responsive elements. Abiotic stress-related elements include defense- and stress-responsive elements and low-temperature-responsive elements. Elements related to plant growth and development include light-responsive, salicylic acid-responsive, and seed-specific regulatory elements (Supplementary Table 7). Further analysis revealed that CsTPS promoters contain myeloblastosis (MYB) binding sites involved in light and drought responses. CsTPS genes may therefore be regulated by CsMYB transcription factors as part of a regulatory network (Fig. 6).
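PlantCARE scans promoters against its curated database of element definitions. The Python sketch below illustrates only the basic idea, under loud assumptions: it counts exact matches of a few commonly cited core sequences on both strands. The dictionary entries are illustrative stand-ins, not PlantCARE's actual definitions, so counts from this sketch are not comparable to Supplementary Table 7.

```python
import re

# Illustrative core sequences (assumed; PlantCARE holds the authoritative
# definitions and many more elements).
CIS_ELEMENTS = {
    "ABRE (abscisic acid)": "ACGTG",
    "LTR (low temperature)": "CCGAAA",
    "MBS (MYB binding, drought)": "CAACTG",
    "TGA-element (auxin)": "AACGAC",
    "TCA-element (salicylic acid)": "CCATCTTTTT",
    "CGTCA-motif (MeJA)": "CGTCA",
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def count_cis_elements(promoter):
    """Count exact core-sequence matches on both strands of a promoter."""
    forward = promoter.upper()
    reverse = reverse_complement(forward)
    counts = {}
    for name, core in CIS_ELEMENTS.items():
        pattern = re.compile(f"(?={core})")  # lookahead keeps overlapping hits
        n = len(pattern.findall(forward)) + len(pattern.findall(reverse))
        if core == reverse_complement(core):
            n //= 2  # a palindromic core is seen once on each strand
        counts[name] = n
    return counts
```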
3.6. Collinearity analysis of Cannabis sativa TPS genes
To identify TPS homologs between Cannabis sativa and other species, we performed whole-genome collinearity analyses of Cannabis sativa against Arabidopsis thaliana and Malus pumila Mill. and highlighted the TPS genes (Supplementary Table 8). Collinearity analysis (Fig. 7) showed that Cannabis sativa shares many homologous genes with other species, including 3 homologous gene pairs with Arabidopsis thaliana and 6 with Malus pumila Mill. CsTPS1, CsTPS4, CsTPS5, and CsTPS6 were detected in the collinearity analyses across all three plants, suggesting that these genes may be highly conserved. According to these predictions (Fig. 7), Cannabis sativa has homologous genes in each species. As a dicot, however, Cannabis sativa has few genes homologous to those of monocots such as Oryza sativa L. [33]. In contrast, it shares more genes with dicotyledonous plants such as Arabidopsis thaliana, which is consistent with their evolutionary relationships. Homologous genes may have similar functions, which warrants further study in subsequent functional analyses [29].
3.7. Protein–protein interaction network analysis
The STRING database was used to predict protein–protein interactions (PPIs) within the Cannabis sativa TPS gene family (Fig. 8). In the network, nodes represent proteins, and a node’s degree is the number of other nodes it connects to; degree is indicated by node size and color depth. The PPI network comprises 10 nodes and 16 edges, and the proteins show a gradient of decreasing degree values. CsTPS37 has the highest degree and may therefore interact more extensively with other proteins, suggesting that it could play an important role in regulating plant growth.
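Node degree can be recomputed directly from the exported TSV. The Python sketch below assumes that the first two columns of the STRING export name the interacting proteins and that header lines begin with `#`; both are assumptions about the file layout. Self-loops and repeated A–B/B–A rows are ignored.

```python
import csv
from collections import Counter


def node_degrees(tsv_lines):
    """Degree of each node from a STRING-style TSV edge list."""
    degree = Counter()
    seen = set()
    for row in csv.reader(tsv_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and header lines
        a, b = row[0], row[1]
        edge = frozenset((a, b))
        if a == b or edge in seen:
            continue  # ignore self-loops and duplicate pairs
        seen.add(edge)
        degree[a] += 1
        degree[b] += 1
    return degree


def mean_degree(n_nodes, n_edges):
    """Each undirected edge adds one to the degree of two nodes."""
    return 2 * n_edges / n_nodes
```

For the network described above, with 10 nodes and 16 edges, the mean degree is 2 × 16 / 10 = 3.2. `node_degrees(...).most_common(1)` then gives the highest-degree protein.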
3.8. Pattern and qRT-PCR verification of Cannabis sativa TPS gene expression
To determine the expression patterns of CsTPS genes, we extracted their FPKM values from RNA-seq data of Cannabis sativa flowers, bracts, leaves, stems, seeds, and roots, compared their expression patterns, and constructed a heat map (Fig. 9), which showed global variation across samples and genes. Twelve CsTPS genes showed tissue-specific expression, suggesting that these genes may be involved in plant growth and development.
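The Methods state that FPKM values were row-scaled before plotting, but not how. The Python sketch below assumes the common per-gene z-score: each value minus the row mean, divided by the row standard deviation. The population SD is used here. TBtools may instead use the sample SD or apply a log transform first, so treat this as an approximation of the plotted values.

```python
from statistics import mean, pstdev


def row_scale(fpkm_by_gene):
    """Z-score each gene's values across tissues: (x - row mean) / row SD.

    Uses the population SD; rows with no variation are mapped to zeros.
    """
    scaled = {}
    for gene, values in fpkm_by_gene.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return scaled
```

Row scaling puts highly and lowly expressed genes on the same color scale, so the heat map highlights which tissue each gene prefers rather than absolute abundance.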
Using qRT-PCR, we validated the expression of TPS12, TPS13, TPS16, TPS17, TPS18, TPS20, TPS21, TPS22, TPS26, TPS27, TPS40, and TPS41 in different tissues (Fig. 10). TPS13, TPS16, TPS17, TPS18, TPS26, TPS27, and TPS41 were highly expressed in bracts; TPS12, TPS20, and TPS21 were predominantly expressed in roots; and TPS22 and TPS40 were notably expressed in leaves. These findings indicate that these genes are expressed in a tissue-specific manner (Supplementary Table 9).
4. Discussion
The TPS gene family, which regulates plant growth, development, and other functions, has been identified in many plant species. Differences in genome size may lead to differences in the number of TPS gene family members [42]. For example, 70 TPS proteins have been identified in Zanthoxylum bungeanum [43], 32 in Arabidopsis thaliana, 30 in maize (Zea mays L.) [21], 34 in rice (Oryza sativa L.), and 41 in upland cotton (Gossypium hirsutum L.) [44]. In these species, TPS-a and TPS-b are the largest subfamilies, and the family sizes are broadly similar to that identified in Cannabis sativa. However, there have been no comprehensive or systematic studies of the Cannabis sativa TPS gene family [45,46]. Here, we identified and analyzed the TPS gene family in Cannabis sativa, improving our understanding of CsTPS gene function. The 41 CsTPS proteins were submitted to the STRING website, with the Arabidopsis thaliana protein database as the reference. Some proteins function as monomers, whereas others work together with chaperones or form complexes with other proteins. Our PPI analysis suggests multiple interactions among Cannabis sativa TPS proteins, with coordination and balance between members of the same and different subfamilies that may affect the growth of Cannabis sativa.
Land plants typically possess a moderately sized TPS gene family generated through gene duplication [20]. Chromosomal mapping of the CsTPS genes showed that some genes form homologous clusters, which could result from gene duplication events. Interestingly, 11 and 12 CsTPS genes were located on chromosomes NC_044374.1 and NC_044377.1, respectively. This result is consistent with other plants, in which many TPS genes have highly conserved gene structures [47,48]. A study of local or tandem duplications could therefore shed light on the evolutionary processes behind this concentration of CsTPS genes. Because biosynthetic genes for specialized metabolites often cluster together, these regions may be enriched in terpene biosynthesis genes [49], which merits further study. MYB transcription factors and MeJA are major regulators of plant growth and stress responses [50]. Cis-acting elements control gene expression: in plants exposed to adverse conditions, certain transcription factors are activated and bind to cis-acting elements, activating the expression of stress-related genes [51]. We found that stress-related elements (MYB, LTR), hormone-responsive elements (ABA, SA), and light-responsive elements are widely present in the promoter regions of most Cannabis sativa TPS genes. These elements have also been reported in potato TPS genes [52,53], suggesting a potential role for CsTPS genes in stress, hormone, and light responses. Expression profiling and qRT-PCR showed that TPS13, TPS16, TPS17, TPS18, and TPS26 were highly expressed in flowers and bracts, whereas TPS12, TPS20, TPS21, and TPS22 were highly expressed in roots.
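Candidate physical clusters such as those on NC_044374.1 and NC_044377.1 can be flagged by grouping genes on the same chromosome whose neighboring start coordinates lie within a distance window. This Python sketch is a simple heuristic, not MCScanX's tandem-duplicate classifier. The 100 kb `max_gap` and the minimum cluster size of 2 are arbitrary illustrative thresholds that this study does not specify.

```python
def physical_clusters(loci, max_gap=100_000, min_size=2):
    """Group genes per chromosome whose consecutive starts are within max_gap bp.

    loci: iterable of (gene_id, chromosome, start) tuples.
    Returns a list of (chromosome, [gene_ids]) for groups of >= min_size genes.
    """
    by_chrom = {}
    for gene, chrom, start in loci:
        by_chrom.setdefault(chrom, []).append((start, gene))
    clusters = []
    for chrom, genes in by_chrom.items():
        genes.sort()  # order by start coordinate
        current = [genes[0]]
        for start, gene in genes[1:]:
            if start - current[-1][0] <= max_gap:
                current.append((start, gene))
            else:
                if len(current) >= min_size:
                    clusters.append((chrom, [g for _, g in current]))
                current = [(start, gene)]
        if len(current) >= min_size:
            clusters.append((chrom, [g for _, g in current]))
    return clusters
```

Clusters found this way are only positional; whether their members are tandem duplicates still requires sequence similarity evidence, for example from MCScanX.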
The structure of TPS genes within each subfamily is highly conserved in plants, including Cannabis sativa. Although, to the best of our knowledge, TPS gene lengths have not yet been systematically compared across species, the lengths of TPS-a and TPS-b genes in grapes [54], tomatoes [55], and other species vary within an approximately twofold range. TPS introns are particularly large in Cannabis sativa and appear larger than those of most TPS genes; introns can contain various regulatory elements [56]. Earlier research has identified three potential mechanisms (exon/intron gain or loss, exonization/pseudoexonization, and insertion/deletion) that could lead to differences in gene architecture. This suggests that TPS genes may have undergone functional differentiation over time, consistent with earlier research on Arabidopsis thaliana [42]. The phylogenetic analysis shows that CsTPS proteins cluster with proteins from various species in the tree, implying that TPS proteins across species might perform analogous roles [57].
Based on conserved domains, 41 CsTPS genes were identified in the Cannabis sativa genome. The phylogenetic tree resolved five subfamilies (TPS-a, TPS-b, TPS-c, TPS-d, and TPS-e/f), but no CsTPS gene fell into the TPS-d subfamily. TPS-d has been described as a gymnosperm-specific subfamily [20], and its absence is consistent with findings in other plants such as tomato [55] and Arabidopsis thaliana [47]. TPS-b was the largest CsTPS subfamily, with 16 genes, followed by TPS-a with 14. In the TPS phylogenies, CsTPS sequences from the TPS-a and TPS-b subfamilies grouped with TPS sequences from closely related species (Fig. 2). Both subfamilies showed Cannabis sativa-specific expansion, suggesting that the diversity of monoterpenes and sesquiterpenes produced by CsTPS enzymes may result from relatively recent proliferation of ancestral CsTPS genes [58]. Within the TPS-b subfamily, two distinct floral CsTPSs were identified, CsTPS-b1 and CsTPS-b2. α-Bisabolol is a sesquiterpene found in several Cannabis sativa varieties, but it is not produced by any functionally characterized CsTPS-a enzyme. This suggests that terpene formation in Cannabis sativa trichomes may be mediated by members of both the TPS-a and TPS-b subfamilies. A member of the TPS-b subfamily has also been shown to function as a sesquiterpene synthase in sandalwood (Santalum sp.) [59]. Members of the TPS-b subfamily can produce bisabolane-type sesquiterpenes in Cannabis sativa, which may be due to similar evolutionary pathways of their respective monoterpene synthase progenitors [60].
Domestication and selective breeding have shifted the distribution and abundance of terpenes [61,62]. For millennia, Cannabis sativa has been domesticated to increase resin yield and potency, yet the distribution and ecological function of terpenes in wild Cannabis sativa remain unclear. Current research indicates that the large number of CsTPS genes, and the multiple products of the TPS enzymes they encode, contribute to the complex terpene profile of Cannabis sativa. Understanding the terpene profiles of standardized Cannabis sativa varieties, the multigene nature of the CsTPS family, and the multiple products of individual enzymes is therefore important for plant selection and breeding and for crop improvement through genome editing.
The size of the CsTPS gene family is similar to those reported for other plant species, and changes in TPS gene expression have been observed in other plant families [63]. Differences in TPS expression correlate with changes in terpene profiles in diverse systems, including cultivated and non-cultivated plants and both angiosperms and gymnosperms. In grapevine, expression of the VvTPS gene family varies among tissues, developmental stages, and cultivars, producing diverse terpene profiles shaped by the particular combinations of TPS genes active during flowering and fruit ripening [64,65,66]. The variation in terpene classes and CsTPS gene expression among the Cannabis sativa varieties described here may provide opportunities for expanding, designing, and synthesizing terpenes in Cannabis sativa.
5. Conclusions
In this study, we identified 41 TPS genes in Cannabis sativa and analyzed their sequence characteristics, chromosomal locations, gene structures, conserved motifs, phylogeny, and differential expression. The TPS promoter regions contain several stress-related cis-acting elements, and TPS genes are differentially expressed among organs. The expression of this gene family may therefore be affected by hormones and external environmental factors. We predicted functions of TPS genes in Cannabis sativa growth and in the synthesis of secondary metabolites; these predictions support further research on the role of TPS genes, provide a molecular basis for their function, and offer theoretical support for selecting superior industrial Cannabis sativa varieties.
Funding
Heilongjiang Province Postdoctoral Science Foundation (Grant No. LBH-Z21028); Talent training project supported by the central government for the reform and development of local colleges and universities (No. ZYRCB2021008); Application Research of Beiyao (Heilongjiang University of Chinese Medicine), Ministry of Education and Heilongjiang Touyan Innovation Team Program (Grant Number: [2019] No. 5); Study on the Chemical Constituents and Bioactivity Analysis of Cannabinoids in Traditional Chinese Medicine Hemp Seed (2023yjscx025).
Data availability statement
The whole-genome sequence and annotation files of Cannabis sativa (GCA_900626175.1) and the genome and annotation files of other species, such as Arabidopsis thaliana (GCA_000001735.4), were downloaded from the National Center for Biotechnology Information (NCBI) database. The transcriptomic data used in this study were generated by our group and are publicly available in the NCBI database (PRJNA498707).
CRediT authorship contribution statement
Jiao Xu: Writing – original draft, Software, Project administration, Funding acquisition, Formal analysis, Data curation. Lingyang Kong: Software, Project administration, Funding acquisition, Data curation, Conceptualization. Weichao Ren: Validation, Software, Data curation, Conceptualization. Zhen Wang: Visualization, Data curation, Conceptualization. Lili Tang: Software, Methodology, Formal analysis, Data curation. Wei Wu: Resources, Formal analysis, Data curation. Xiubo Liu: Software, Resources, Project administration, Funding acquisition. Wei Ma: Writing – review & editing, Validation, Supervision, Resources, Data curation, Conceptualization. Shuquan Zhang: Validation, Methodology, Investigation, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Footnotes
Appendix A. Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2024.e27817.