
Ribonucleic acid (RNA) is a nucleic acid present in all living cells that is structurally similar to DNA. An RNA molecule has a backbone made of alternating phosphate groups and the sugar ribose, rather than the deoxyribose found in DNA. Unlike DNA, however, RNA is most often single-stranded.

RNA sequencing (RNA-seq) is the application of next-generation sequencing technologies to cDNA molecules, which are obtained by reverse transcription from RNA. High-throughput sequencing of this kind has become the main choice for measuring expression levels. RNA-seq allows researchers to detect both known and novel features in a single assay, enabling the identification of transcript isoforms, gene fusions, single nucleotide variants, and other features without the limitation of prior knowledge.

With a wealth of RNA-seq data being generated, it is a challenge to extract maximal meaning from these datasets, and without the appropriate skills and background there is a risk of misinterpreting the data. However, a general understanding of the principles underlying each step of RNA-seq data analysis allows investigators without a background in programming and bioinformatics to critically analyze their own datasets as well as published data. Our goals in the present review are to break down the steps of a typical RNA-seq analysis and to highlight the pitfalls and checkpoints along the way that are vital for bench scientists and biomedical researchers performing experiments that use RNA-seq.

Since the first publications coining the term RNA-seq (RNA sequencing) appeared in 2008, the number of publications containing RNA-seq data has grown exponentially, hitting an all-time high of 2,808 publications in 2016 (PubMed).
