ISCB-LA SoIBio BioNetMX Symposium 2020 Virtual Viewing Hall
Population dynamics and Evolution in Eukaryotes
- Alicia Mastretta-Yanes, Mexico
Short Abstract: Alicia Mastretta-Yanes studied biology at UNAM and completed her PhD in evolutionary biology at the University of East Anglia, UK. Currently she is a CONACYT Research Fellow at CONABIO, México. Her research focuses on the microevolutionary processes shaping Mexican biodiversity. This ranges from the effects of topography and past climate fluctuations to the present implications of domestication and human management. In 2020 she won a L'Oréal–UNESCO-AMC Fellowship for Women in Science. She likes plants, and keeps accumulating them at home, even if there is no more space. She is an external tutor in the Biological Sciences and Biomedical Sciences postgraduate programs at UNAM, where she also teaches bioinformatics for biologists.
- Vladimir Bajić, Max Planck Institute for Evolutionary Anthropology, Germany
- Mark Stoneking, Max Planck Institute for Evolutionary Anthropology, Germany
Short Abstract: Mitochondrial DNA (mtDNA) and the male-specific region of the Y chromosome (MSY) are commonly used uniparental markers in population genetics that provide information on the history and relationships of populations and individuals. Genetic profiles of a population inferred from mtDNA vs. the MSY often differ from each other, and from the genetic profile inferred from autosomal markers, due to differences in the maternal and paternal histories of human populations. Recently, many populations have been described using both uniparental and autosomal markers, however little is known about associations of uniparental haplogroups with autosomal ancestry components. Here we synergistically use mtDNA and MSY haplogroup compositions together with autosomal ancestry components to define “ancestry packages”, i.e. associated combinations of specific mtDNA and MSY haplogroups with specific autosomal ancestry components, which can be indicative of ancestral genetic compositions. Our results show that i) uniparental haplogroups are highly associated with autosomal ancestry components, suggesting the existence of ancestry packages; ii) ancestry packages can be used to objectively classify the likely geographic origin of haplogroups (and other markers for which population estimates are available) in accordance with autosomal ancestry components; iii) ancestry packages can provide information about the potential direction and composition of sex-biased gene flow between different putative ancestral populations.
- Yue Zhang, University of Ottawa, Canada
- Zhe Yu, University of Ottawa, Canada
- Chunfang Zheng, University of Ottawa, Canada
- David Sankoff, University of Ottawa, Canada
Short Abstract: Whole genome doubling, tripling or higher multiplying (WGD), due to fixation of polyploidization events, is attested in almost all lineages of the flowering plants, recurring in the ancestry of some plants two, three or more times in retracing their history to the earliest angiosperm. This major mechanism in genome evolution, which generally appears as instantaneous on the evolutionary time scale, sets in operation a compensatory process called fractionation, the loss of duplicate genes, initially rapid, but continuing over millions and tens of millions of years. We study this process by statistically comparing the distribution of duplicate gene pairs as a function of their time of creation, as measured by sequence similarity. The stochastic model accounting for this distribution, though exceedingly simple, still has too many rate parameters to be estimated based only on the similarity distribution, while the computational procedures for compiling the distribution from annotated genomic data are heavily biased against earlier polyploidization events, an effect known as syntenic "crumble". Other parameters, such as the size of the initial gene complement and the ploidy of the various events giving rise to duplicate gene pairs, are even more inaccessible. Here we show how the frequency of unpaired genes, identified via their embedding in stretches of duplicate pairs, together with previously established constraints among some parameters, adds enormously to the range of successive polyploidization events that can be analyzed. This also allows us to estimate the initial gene complement and to correct for the bias due to crumble. We also discuss how to determine ploidy through recourse to similar gene triples deduced from the duplicate gene data. Finally we explore the applicability of our methodology to four flowering plant genomes covering a range of different polyploidization histories.
- Malay Basu, University of Alabama, Birmingham, United States
Short Abstract: Background Genomes are remarkably similar to natural language texts. From an information theory perspective, we can think of amino acid residues as letters, protein domains as words, and proteins as sentences consisting of ordered arrangements of protein domains (domain architectures). This work describes our recent efforts towards understanding the linguistic properties of genomes. Results Our recent work showed that the complexity of “grammars” in all major branches of life is close to a universal constant of ~1.2 bits. This is remarkably similar to natural languages; such an--yet unexplained--universal information gain has been observed and generally used to determine whether a series of symbols represent a language. In this work, we describe the implications of this work and its extension in various areas with a particular emphasis on measuring the proteome complexities in human tissues. Conclusion Our work established the similarity between natural languages and genomes and showed, for the first time, that there exists a “quasi-universal grammar” of protein domains and measured the minimal complexity of proteome required for a functional cell. We also describe the proteome complexities in human tissues and their functional significance.
- David Requena, Rockefeller University, United States
- Aldhair Medico, Cayetano Heredia Peruvian University (Universidad Peruana Cayetano Heredia), Peru
- Ruy D. Chacón, Sao Paulo University (Universidade de São Paulo), Brazil
- Manuel Ramírez, San Marcos University (Universidad Nacional Mayor de San Marcos), Peru
- Obert Marín-Sánchez, San Marcos University (Universidad Nacional Mayor de San Marcos), Peru
Short Abstract: SARS-CoV-2 is the causative agent of the COVID-19 pandemic. South America is the most affected region per capita, suffering more than 6 million cases and 200,000 deaths as of August 2020. Numerous ongoing efforts to control the disease include the development of peptide-based immunodiagnostic tests and vaccines. This requires knowledge about allele frequencies of the HLA system. The largest repository of HLA frequencies is the Allele Frequency Net Database (AFNDB), widely used by researchers worldwide. However, it has a passive data collection strategy, relying on researchers to upload their studies' data. This results in under-representation of many countries, with only a few studies available for South America. To address this problem, we enriched the current scenario with an extensive review of studies reporting HLA frequencies of South American populations. Studies available in PubMed from 1990 onwards, genotyping HLA alleles with 4-digit resolution, were selected. As a result, we obtained more than 12 million new datapoints. We combined the datasets selected per country (matching technology and nomenclature), calculating weighted average frequencies per allele. This is summarized in the first integrated map of HLA allelic frequencies of South America. Both the methodology and the information collected are presented in full detail to guarantee reproducibility. Then, using the most frequent South American HLA alleles (weighted frequency >5%), linear T-cell epitopes were predicted in SARS-CoV-2 proteins. We used state-of-the-art prediction software based on artificial neural networks: NetMHCpan-v4.0 and MHCflurry-v1.6.0 (for HLA-I) and NetMHCIIpan-v4.0 (for HLA-II). Predicted Class-I and Class-II peptides were selected according to their binding to South American alleles. Class-II peptides were also filtered according to their three-dimensional accessibility.
We selected 27 HLA-I and 34 HLA-II candidate epitopes, from which 14 and 4 (respectively) have experimental evidence in other coronaviruses, reported in the Immune Epitope Database and Analysis Resource (IEDB). Recent similar studies have presented SARS-CoV-2 candidate epitopes based on its similarity with experimentally-detected epitopes of SARS. They attempted worldwide coverage, using either the most frequent HLA supertypes or the IEDB population tool (based on the AFNDB information). Here, we show that this resulted in poor coverage for South America. Therefore, our study provides valuable information for regional epitope-based strategies against SARS-CoV-2. Additionally, updated HLA frequencies provide a better representation of South America and could be useful in various immunogenetic studies of different diseases, such as infectious and autoimmune diseases, cancer and anti-tumor immune response, organ transplants, among others.
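The pooling step described in this abstract (weighted average frequencies per allele, followed by the >5% cutoff) can be illustrated with a minimal sketch. The abstract does not state which weights were used, so this example assumes each study is weighted by its sample size and that an allele not reported by a study counts as frequency 0 there. The function name, the input layout and the numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def weighted_allele_frequencies(studies):
    """Pool allele frequencies across studies.

    studies: list of (sample_size, {allele: frequency}) tuples.
    Assumption: weights are sample sizes; an allele missing from a
    study contributes frequency 0 for that study.
    """
    weighted_sum = defaultdict(float)
    total_n = 0
    for sample_size, freqs in studies:
        total_n += sample_size
        for allele, freq in freqs.items():
            weighted_sum[allele] += sample_size * freq
    return {allele: s / total_n for allele, s in weighted_sum.items()}


# Hypothetical input: two studies from the same country.
studies = [
    (100, {"A*02:01": 0.20}),
    (300, {"A*02:01": 0.40, "A*24:02": 0.10}),
]
pooled = weighted_allele_frequencies(studies)

# Keep only alleles above the 5% threshold mentioned in the abstract.
frequent = {a: f for a, f in pooled.items() if f > 0.05}
```

Weighting by sample size keeps a small study from dominating the pooled estimate; other weighting schemes are possible and would only change how `weighted_sum` is accumulated.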
- Nicolas Zuniga, Universidad de Concepcion, Chile
- Patricio Castro, Universidad de Concepcion, Chile
- Felipe Aguilera, Universidad de Concepcion, Chile
Short Abstract: VGLUT genes play essential roles in excitatory synaptic transmission by concentrating glutamate into presynaptic vesicles. In vertebrates, these genes comprise three highly homologous proteins (VGLUT1-3), which are encoded by solute carrier genes SLC17A6-8, and are expressed mainly in glutamatergic neurons in the neocortex, hippocampus, and hypothalamus. Although these genes are evolutionarily conserved in vertebrates, their origin and early evolution are not well understood. Here, we performed thorough phylogenetic and structural analyses, spanning 110 eukaryotic species, to show that VGLUT is closely related to Sialin (SLC17A5) rather than to other SLC17 family members. We also revealed two distinct phylogenetic clades of VGLUT genes, one comprising vertebrates and the other comprising ambulacrarians, protostomes, xenacoelomorphs, and cnidarians. In addition, we discovered a new clade of invertebrate phosphate transporters that is closely related to VGLUT and Sialin rather than to phosphate transporters (SLC17A1-4). The evolution of VGLUT genes in modern animal lineages is typified by the loss of one or more members in vertebrates and lineage-specific duplications in some invertebrate species. Vertebrate VGLUTs show the three critical regions associated with vesicular endocytosis and recycling: the transmembrane glutamate-binding region, the di-leucine-like motifs, and the proline-rich domain at the C-terminus. Comparative analyses revealed that glutamate-binding residues and di-leucine-like motifs emerged at the dawn of bilaterian animals, with the subsequent loss of di-leucine-like motifs in hemichordates. Furthermore, structural comparisons showed that VGLUT1 in mammals has a double proline-rich domain at the C-terminus, which is absent in mammalian VGLUT2-3 and invertebrate VGLUTs.
Altogether, this study not only reveals the origin and evolution of VGLUT genes but also uncovers molecular signatures and endocytic motifs as key drivers in the evolution of glutamate transport in animals.
- Cameron DeChristopher, Bristol Community College, United States
Short Abstract: The demise of Mammuthus primigenius has long been attributed to a combination of climate change and over-hunting by early humans, yet evidence in recent years has contradicted those initial findings. Ample evidence suggests that populations of woolly mammoths lived in areas free of early human hunting, and that woolly mammoths survived previous climate change cycles. Here we seek to evaluate whether the extinction of Mammuthus primigenius was caused, in part, by a viral infection. The closest living relative of the woolly mammoth is the Asian elephant (Elephas maximus). While Asian elephants managed to outlive their distant cousins, the large herbivore is today plagued by elephant endotheliotropic herpesvirus, a viral disease with a fatality rate of up to 80%. A viral infection might account for some of the unexplained elements of the mammoth's extinction. To find traces of a potential viral infection similar to the lethal elephant endotheliotropic herpesvirus, we used the NIH's BLAST tool to compare the genome of the elephant endotheliotropic herpesvirus with the genome of Mammuthus primigenius. We sought to find DNA traces left behind by an elephant endotheliotropic herpesvirus infection in the mammoth's genetic code. We also searched for similar sequences across the genomes of relatives such as the Asian elephant and the manatee to identify unique traces of viral DNA within the mammoth genome. Ultimately, we seek to understand the mystery of this large mammal's extinction through an examination of its most well-preserved and plentiful remains, its own DNA.
- Catarina Branco, Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain/CINBIO, University of Vigo, Spain, Spain
- Miguel Arenas, Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain/CINBIO, University of Vigo, Spain, Spain
Short Abstract: The currently observed spatial genetic variations (hereafter genetic gradients) of diverse species, including modern humans, have been influenced by past evolutionary processes such as range contractions, range expansions and population admixture, among others. An interesting case of these influences is the settlement of modern humans in Asia, which involved diverse processes such as the last glacial period (GP) that induced Paleolithic range contractions, migration through long-distance dispersal (LDD) and population admixture. In 1993, Cavalli-Sforza and coauthors estimated a genetic gradient in Asia presenting an east-west (E-W) orientation, which they interpreted as a consequence of past range expansions. However, this interpretation was not formally evaluated. Here, we performed extensive spatially explicit computer simulations of genetic data to mimic Paleolithic and Neolithic range expansions together with (i) range contractions caused by the last GP, (ii) migration through LDD, (iii) different levels of admixture between Paleolithic and Neolithic populations and (iv) Neolithic expansions from the Middle East and/or southeast Asia. Next, we estimated the corresponding genetic gradients with principal component analyses (PCA) to identify which of those processes produced the most realistic genetic gradients.
Our results showed that (1) Paleolithic populations present the observed E-W gradient if the last GP is considered and, under this situation, the genetic gradient is probably caused by allele surfing; (2) scenarios with admixture between Paleolithic and Neolithic populations presented the E-W gradient when considering two Neolithic expansions or one Neolithic expansion admixed with Paleolithic populations that suffered the last GP, which could be explained by admixture of genetic sectors caused by the Neolithic expansions or by allele surfing, respectively; (3) LDD just increased the variance of genetic gradients among computer simulations. Additionally, we found genetic isolation in the Arabian Peninsula and in Japan, probably resulting from geographic isolation. Altogether we conclude that the last GP and the Neolithic expansions from the Middle East and southeast Asia favoured the currently observed genetic gradient of Asian modern humans.
- David Ferreiro, University of Vigo, Spain
- Catarina Branco, University of Vigo, Portugal
- Miguel Arenas, University of Vigo, Spain
Short Abstract: The Iberian Peninsula is a well-delimited region with a rich and complex human history shaped by a series of range expansions (Paleolithic, Neolithic and Catholic Kingdoms, among others). This rich human history helped make the Iberian Peninsula one of the regions in Europe with the highest genetic diversity. Recent studies based on genetic data identified genetic gradients of modern humans of the Iberian Peninsula presenting an overall east-west (E-W) orientation. Some authors associated these gradients with the establishment and expansion of the Catholic Kingdoms during the Reconquista. Nevertheless, this association was not formally evaluated, and other causes could also produce such genetic gradients. In order to clarify this issue, we performed extensive spatially explicit computer simulations of genetic data by separately mimicking the most relevant evolutionary processes of modern humans that occurred in the Iberian Peninsula (in particular, the Paleolithic, Neolithic and Catholic Kingdom expansions). Then, we inferred genetic gradients from the simulated data using principal component analysis and qualitatively compared the genetic gradient simulated under each evolutionary process with the real genetic gradients. We found that the Paleolithic expansion produced a southeast-northwest (SE-NW) orientation, probably caused by allele surfing. Next, the Neolithic expansion produced a genetic gradient with E-W orientation, probably driven by serial founder effects and isolation by distance, that was in agreement with the orientation of the real genetic gradients. In addition, we found that the expansion of the Catholic Kingdoms (where borders among kingdoms were modelled as barriers to gene flow) can also produce the genetic gradient with E-W orientation observed in the real data, including the genetic isolation observed in some regions.
These findings suggest that multiple evolutionary processes could have contributed to the currently observed genetic gradients of modern humans in the Iberian Peninsula. Project supported by 2019 Leonardo Grant for Researchers and Cultural Creators, BBVA Foundation. The BBVA Foundation accepts no responsibility for the opinions, statements and contents included in the project and/or the results thereof, which are entirely the responsibility of the authors.
- Roberto Del Amparo, University of Vigo, Spain
- Laura Rodriguez, University of Vigo, Spain
- Ugo Bastolla, Centro de Biologia Molecular Severo Ochoa, Spain
- Miguel Arenas, University of Vigo, Spain
Short Abstract: Empirical substitution models of protein evolution are traditionally used in phylogenetics because of their technical simplicity. However, these models make assumptions, such as imposing the same substitution process on all protein sites, that could bias phylogenetic inferences. Here we studied the influence of traditional empirical substitution models of protein evolution and of recombination on the modeling of protein evolution along phylogenetic histories in terms of protein folding stability. We found that empirical substitution models produce proteins with unrealistic folding stability, in contrast to substitution models that directly incorporate information from the protein structure. Interestingly, we also found that the effect of recombination, modeled in phylogenetic networks, on folding stability is not always dramatic, and that the stability of recombinants is affected by the molecular similarity between the recombining proteins. Therefore, this work concludes that, in order to improve results, efforts should be made to incorporate structurally constrained substitution models of protein evolution in the conventional phylogenetic pipeline. This work was funded by the grants “RYC-2015-18241” from the Spanish Government and “ED431F 2018/08” from the “Xunta de Galicia”.
- Maria I Freiberger, Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica IQUIBICEN-CONICET, FCEN-UBA, Argentina
- Maximiliano Beckel, Fundación Instituto Leloir, Argentina
- Ezequiel A Galpern, Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica IQUIBICEN-CONICET, FCEN-UBA, Argentina
- Ariel Chernomoretz, Departamento de Física - UBA, Argentina
- Diego Ferreiro, Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica IQUIBICEN-CONICET, FCEN-UBA, Argentina
Short Abstract: Introduction Ankyrin-Repeat proteins are made of tandem repetitions of a ~33 residue motif that cooperatively fold into elongated architectures. The genomic sequences that code for these proteins are widespread in eukaryotes and are often interrupted by introns. Methods We analyzed the structure of ~70000 genes that code for ankyrin repeat arrays and quantified the distributions of exon lengths, intron phase and exon class. Results We found a characteristic exon length of 99 bases that is typically bounded by introns that do not interrupt the reading frame, suggesting that alternative splicing could occur within the repeat-arrays. We looked for alternative splicing signals in the human genome annotations and found direct evidence for the modular architecture of repeats within the arrays.
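The link between a 99-base exon and reading-frame preservation can be made concrete with a small sketch. It relies on the standard definition of intron phase (the number of codon bases, 0, 1 or 2, that precede the intron) and is not taken from the authors' pipeline; the function names are hypothetical.

```python
def downstream_phase(upstream_phase, exon_length):
    """Phase of the intron that follows an exon.

    upstream_phase: phase (0, 1 or 2) of the intron preceding the exon,
    i.e. how many bases of the interrupted codon lie before that intron.
    """
    return (upstream_phase + exon_length) % 3


def is_symmetric(upstream_phase, exon_length):
    """True if both flanking introns share the same phase, so that
    skipping or inserting the exon leaves the reading frame intact."""
    return downstream_phase(upstream_phase, exon_length) == upstream_phase


# A 99-base exon (3 x 33, one ~33-residue repeat) is symmetric for
# every upstream phase, whereas a 100-base exon never is.
symmetric_99 = [is_symmetric(p, 99) for p in (0, 1, 2)]
symmetric_100 = [is_symmetric(p, 100) for p in (0, 1, 2)]
```

Because 99 is a multiple of 3, such an exon can be added or removed by splicing without shifting the frame of the downstream repeats, which is consistent with the modular architecture described in the abstract.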
- Florencia C. Mascali, Laboratorio de Biotecnología Acuática, FCByF-UNR- MCyTSF - Rosario - Argentina; CONICET CCT-Rosario- Rosario- Argentina, Argentina
- Victoria Posner, Laboratorio de Biotecnología Acuática, FCByF-UNR- MCyTSF - Rosario - Argentina, Argentina
- Felipe Del Pazo, Laboratorio de Biotecnología Acuática, FCByF-UNR- MCyTSF - Rosario - Argentina; CONICET CCT-Rosario- Rosario- Argentina, Argentina
- Juan A. Rubiolo, Facultad de Veterinaria - Universidad de Santiago de Compostela - Lugo - España, Spain
- Paulino Martinez, Facultad de Veterinaria - Universidad de Santiago de Compostela - Lugo - España, Spain
- G. Vanina Villanova, Laboratorio de Biotecnología Acuática, FCByF-UNR- MCyTSF - Rosario - Argentina; CONICET CCT-Rosario- Rosario- Argentina, Argentina
Short Abstract: Aquaculture, possibly the fastest growing food production sector, accounts today for almost 50% of the world's fish products destined for food. Pacú (Piaractus mesopotamicus) is the main product of fish farming in Argentina. Despite the commercial importance of Pacú in Argentina, as well as in other countries, there is little genetic and genomic information about this species. Aquaculture that is sustainable from an environmental and economic point of view requires an understanding of the genetic basis of the traits that limit and/or improve production. Obtaining the genome of a given fish species is the starting point from which it is possible to make significant progress in other areas of aquaculture such as reproduction, health, nutrition, behavior, physiology, larviculture, and molecular biology, among others. In this context, we carried out the de novo assembly of a male and a female reference genome for Pacú from Next-Generation Sequencing data. Genomic DNA was extracted from samples of specimens of both sexes. The preparation of Nextera Flex paired-end 600 bp libraries, as well as sequencing on a NovaSeq 6000 instrument, was performed by Illumina (San Diego). Read quality was analyzed with the FastQC program, and a pre-processing step to remove adapters and low-quality bases was done with the program Trimmomatic. The final coverage was ~60X for each sex. The reads were assembled using the program VELVET. Both genomes are ~1.2 Gb in size, with a GC content of 39.4%. The female genome has 131388 contigs and an N50 of 22395, while the male genome has 205120 contigs with an N50 of 12026. Genome completeness was estimated using BUSCO, searching against the Actinopterygii database, resulting in 66% and 58% completeness for females and males respectively. Regarding annotation, the AUGUSTUS program trained with Danio rerio data was used for ab initio gene prediction.
More than 50000 genes were predicted for each genome and were characterized by BLASTP against the non-redundant database. This is the first draft genome available for Pacú, and constitutes a valuable tool for other projects both in our group and in the scientific community, with a view to sustainable improvement of the aquaculture production of this species.
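For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows the usual definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy contig lengths are hypothetical; assemblers and assessment tools may differ in details such as minimum contig length filters.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the assembly length;
    the length of the contig that crosses that point is the N50.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: total length 54, half is 27, reached at 10 + 9 + 8.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

A higher N50 at similar total assembly size indicates a more contiguous assembly, which is how the female and male values in the abstract can be compared.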
- Diego M. Luna, Facultad de Ingeniería, Universidad Nacional de Entre Ríos, Argentina
- R. Gonzalo Parra, European Molecular Biology Laboratory, Germany
- Maria I. Freiberger, Laboratorio de Fisiología de Proteínas, FCEN, Universidad de Buenos Aires-CONICET-IQUIBICEN, Argentina
Short Abstract: Introduction: Local frustration has been extensively linked to functional aspects of proteins. Recently, we introduced a way to detect evolutionarily conserved frustration patterns (ECFPs) to study enzymatic activity. We extend our work to study related protein families in order to detect differential functional adaptations after divergence from a common ancestor. Here we present our results of studying ECFPs in the globin superfamily. Methods: Frustration was calculated using the Protein Frustratometer. ECFPs are detected by calculating the information content of frustration results matched to homologous residues across multiple sequence alignments within the globin superfamily. ECFPs can be obtained both at the level of single residues and of contact maps, and are weighted according to contact occurrence frequency. Results: We analyzed the ECFPs of different members of the globin superfamily. Given that these families share a common ancestor, we conclude that the differential ECFPs in the extant superfamily members correspond to specific functional adaptations to the activity and context in which these proteins operate at present. We consider that ECFPs can be used to exploit the evolutionary history of protein families to detect their specific functional aspects and to better understand the relationship between sequence and function over evolutionary scales.
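The idea of measuring information content over frustration states at aligned positions can be sketched as follows. This is an illustration only, not the authors' implementation: it assumes three frustration classes per position (minimally frustrated, neutral, highly frustrated), a uniform background, and no small-sample correction. The state labels and function name are hypothetical.

```python
import math
from collections import Counter

# Assumed frustration classes for one aligned position.
STATES = ("minimal", "neutral", "high")


def information_content(column):
    """Information content (bits) of the frustration states observed
    at one alignment column: log2(number of states) minus the Shannon
    entropy of the observed state frequencies.
    """
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(len(STATES)) - entropy


# A position that is minimally frustrated in every homolog carries
# the maximum information; an evenly mixed position carries none.
conserved = information_content(["minimal"] * 6)
mixed = information_content(["minimal", "neutral", "high"] * 2)
```

Under these assumptions, positions with high information content are candidates for conserved frustration patterns, while low values indicate that the frustration state varies freely among family members.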
- Jose Antonio Ramírez-Rafael, Centro de Física Aplicada y Tecnología Avanzada UNAM- Campus Juriquilla, Mexico
- Dulce Valdivia, CINVESTAV, Mexico
- Gabriel Emilio Herrera-Oropeza, Instituto de Neurobiología UNAM- Campus Juriquilla, Mexico
- Andrés García-García, Centro de Física Aplicada y Tecnología Avanzada UNAM- Campus Juriquilla, Mexico
- Alfredo Varela-Echavarría, Instituto de Neurobiología UNAM- Campus Juriquilla, Mexico
- Katia Aviña-Padilla, Centro de Investigación y de Estudios Avanzados del I.P.N Unidad Irapuato, Mexico
- Maribel Hernández-Rosales, Centro de Investigación y de Estudios Avanzados del I.P.N Unidad Irapuato, Mexico
Short Abstract: Eukaryotic genes without introns in their coding sequence are known as "single-exon genes" (SEGs), in contrast to "multiple exon genes" (MEGs). Intronless genes (IGs) are a subgroup of SEGs additionally characterized by the lack of introns in their UTRs. IGs are inherently involved in development, growth, and cell proliferation. Because of their prokaryote-like architecture, IGs in eukaryotic genomes provide interesting datasets for computational analysis in comparative genomics and for the study of evolutionary trajectories. Comparative analysis of their sequences among genomes can help to identify the unique and conserved features of these genes, and hence provide insight into the role of introns in gene evolution and a better understanding of genome architecture and arrangement. Several diseases such as Williams-Beuren syndrome, myoclonus epilepsy, neuroblastoma, Alzheimer's disease and cancer have been linked to proteins encoded by IGs. This work aimed to determine the evolutionary history of protein-coding IGs in the mouse (Mus musculus) genome. Mouse IG and MEG datasets were obtained from the Ensembl database according to their exon and transcript counts. We predicted 1116 protein-coding IGs from the mouse genome. Mouse peptide sequences were submitted to ProteinOrtho, which allowed the inference of gene orthology relationships in 10 genomes, including Homo sapiens. From ProteinOrtho's predictions, orthology graphs were constructed, and a method developed in-house, called Best Match Graph Modular Decomposition (BMGMD), was used. This method performs a modular decomposition of the orthology graphs and infers the gene trees that are consistent with the species phylogeny. Furthermore, each internal node of these trees represents a duplication or speciation event. Subsequently, the gene trees are reconciled with the species tree to determine on which branch of the species tree events occur and, at the same time, to infer gene losses.
This method also allows us to estimate how ancestral a gene family is and to identify species-specific genes. Further, we selected those orthologs that were conserved as IGs in other genomes. Protein orthologs among 10 genomes confirmed a high conservation of IGs associated with the regulation of neurobiological processes in Vertebrata and with chromatin condensation in more distant organisms. Overall, our results support the hypothesis that IGs are “recent” highly specialized genes that are transcribed without splicing events, entailing higher transcriptional fidelity and saving cell energy during essential/housekeeping pathways.
- Gustavo Sandoval, Grupo de Investigacion en Bioinformatica y Biologia Estructural. Universidad Nacional Mayor de San Marcos, Peru
- Evans Cucho, Grupo de Investigacion en Bioinformatica y Biologia Estructural. Universidad Nacional Mayor de San Marcos, Peru
- Obert Marin-Sanchez, Grupo de Investigacion en Bioinformatica y Biologia Estructural. Universidad Nacional Mayor de San Marcos, Peru
- Miguel Neira, Grupo de Investigacion en Bioinformatica y Biologia Estructural. Universidad Nacional Mayor de San Marcos, Peru
Short Abstract: Quinoa (Chenopodium quinoa) is one of the most important Peruvian commercial species due to its nutritional and health benefits and its great capacity for adapting to contrasting environments, from nutrient-poor soils to marginal salt-stressed agroecosystems. Recently, its whole genome has been sequenced, which has made this species a model crop for studying and improving our understanding of how plants respond to salinity. Among the different metabolic markers related to salinity stress, betaine aldehyde dehydrogenase (BADH) has recently been associated with salt tolerance in a great variety of plant families including Amaranthaceae, because of its capacity to synthesize betalains, which confers adaptive advantages for growing in deserts and dunes. For this reason, we performed an in silico analysis of this enzyme from Chenopodium quinoa to gain a deeper understanding of its structure-function relationship. To elucidate a 3D structure of this molecule, we analyzed its mature sequence (Uniprot: A0A0K1GYT5) and obtained biochemical parameters, conserved domains/motifs, secondary structure predictions, protein interaction networks, and a 3D model with its validation, all using the online resources ProtParam, PHD-PRABI, STRING, and SWISSMODEL/PHYRE2. As a result, the enzyme contains 454 amino acids with a MW of 49.4 kDa and a pI of 5.3. The enzyme shows a secondary structure composed of 44.5% alpha-helix and 15.0% beta-sheet and, according to its conserved domains, belongs to the family of plant aldehyde dehydrogenases with NAD(P) binding motifs. This enzyme showed high similarity with other dehydrogenases, especially with the one from Spinacia oleracea, which was used as a template. The model was obtained with 90% coverage and 100% confidence, allowing the recognition of the active site (Glu220, Gly251, and Cys254) as well as of the amino acids involved in NAD(P) binding.
In conclusion, BADH from Chenopodium quinoa is an acidic molecule with an important role in the synthesis of betalains in a NAD-dependent manner. These results open the way to the discovery of molecular mechanisms underlying the salt tolerance of this organism (Financial Support: VRIP-UNMSM Código B20100351).