Transcriptome analysis using RNA-Seq from experiments with and without biological replicates: a review

Keywords: sequencing, bioinformatics, pipeline, RNA

Abstract

The discovery of nucleic acids opened new frontiers of knowledge, giving researchers access to an enormous amount of data through large-scale sequencing methodologies and bioinformatics tools. Among these new possibilities, RNA-Seq has been used to identify and quantify RNA molecules. To obtain more accurate biological answers from RNA-Seq data, several points should be considered: experimental design, type of library synthesized, size of the fragments generated, number of biological replicates, sequencing depth and coverage, availability of a genome for the species, and the choice of software for the computational analyses. Accurate bioinformatics analyses allow genes to be selected with a lower error rate, which improves the success rate of validation by RT-qPCR and thus reduces costs. The objective of this review was to present the stages of RNA-Seq data analysis, from experimental design to systems biology, highlighting the relevant points at each stage and pointing out some of the software currently available to carry out these analyses. With this review, we also aimed to help the academic community understand all the steps and biases involved in RNA-Seq data analysis, for experiments with or without biological replicates.

Author Biography

Mayla Daiane Correa Molinari, EMBRAPA SOJA/UEL

Holds a bachelor's degree in Biomedicine from the Universidade Estadual de Londrina (2011), a bachelor's degree in Agronomy from the Centro Universitário Filadélfia (2015), a specialization in Applied Genetics from the Universidade Estadual de Londrina (2013), and a master's degree in Genetics and Molecular Biology from the Universidade Estadual de Londrina (2015), and is currently pursuing a doctorate in Genetics and Molecular Biology at the Universidade Estadual de Londrina. Since 2010 she has worked in plant genetics and biotechnology, with a focus on molecular biology. She has experience with methods for differential gene expression analysis such as real-time PCR, RNA-Seq, and bioinformatics, as well as with protocols for plant transformation via Agrobacterium tumefaciens aimed at obtaining genetically modified plants (GMOs) tolerant to abiotic stresses, and with physiological analyses of GM plants. She currently carries out her research as a CAPES doctoral fellow at Embrapa Soja.

Published
2021-01-19
Section
Review Articles