RNA Secondary Structure Annotation Using Rab Representation
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science and Information Systems
Date of Award
Spring 2018
Abstract
The two-dimensional model of an RNA molecule is called its RNA secondary structure. RNA secondary structures can be written in several sequence representations, such as base pairing sequence (BPSEQ), dot-bracket, and ordered-tree formats. A recently proposed relative address-based (RAB) representation of RNA sequences has important features that allow an RNA secondary structure database to be created and stored in a suffix array, an efficient data structure specialized for storing and searching sequences. By exploiting these features, and using the suffix array that stores the underlying database, we propose to develop and implement algorithms for identifying (annotating) substructures of RNA secondary structures, such as hairpin loops, internal loops, bulge loops, multi-branch loops, helices, and pseudoknots. Our programs will annotate a given RNA structure with known substructures. Traditionally, annotation is done with context-free grammar (CFG) parsing and with regular-expression matching on structures written in the BPSEQ and dot-bracket formats. Finding pseudoknots is particularly important because they cannot be described by CFGs. We will compare the capabilities of our method with those of other methods.
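The abstract names the ingredients of the approach without showing them, so the Python sketch below illustrates them on toy inputs under stated assumptions. It is not the thesis's implementation. It parses dot-bracket strings (accepting `[]`, `{}` and `<>` as extra bracket types, a common convention for writing pseudoknots) into base pairs, which is the same information a BPSEQ file lists position by position. The exact RAB encoding is defined in the thesis. Here it is stood in for by a hypothetical relative-offset encoding in which each position stores the signed distance to its pairing partner (0 if unpaired). Because such offsets do not depend on absolute position, identical substructures produce identical substrings that a suffix array can locate. Hairpin loops are detected as base pairs that enclose only unpaired positions, and pseudoknots as crossing pairs (i < k < j < l). All function names are hypothetical, and the suffix array is built naively for brevity.

```python
from typing import List, Tuple

# Closing bracket -> opening bracket. The extra bracket types ([], {}, <>)
# are a common dot-bracket convention for writing pseudoknotted pairs.
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def dot_bracket_to_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return 0-based base pairs (i, j) with i < j, sorted by i."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:                      # opening bracket: remember position
            stacks[ch].append(pos)
        elif ch in BRACKETS:                  # closing bracket: pair with latest opener
            opener_stack = stacks[BRACKETS[ch]]
            if not opener_stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((opener_stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def relative_offsets(structure: str) -> List[int]:
    """Hypothetical stand-in for RAB: partner position minus own position (0 = unpaired)."""
    offsets = [0] * len(structure)
    for i, j in dot_bracket_to_pairs(structure):
        offsets[i] = j - i
        offsets[j] = i - j
    return offsets


def hairpin_loops(structure: str) -> List[Tuple[int, int]]:
    """Closing pairs (i, j) whose interior positions i+1..j-1 are all unpaired."""
    offsets = relative_offsets(structure)
    return [(i, j) for i, j in dot_bracket_to_pairs(structure)
            if j - i > 1 and all(offsets[p] == 0 for p in range(i + 1, j))]


def pseudoknot_pairs(pairs: List[Tuple[int, int]]):
    """Every two base pairs (i, j), (k, l) that cross: i < k < j < l.
    Assumes `pairs` is sorted by start position, as returned above."""
    crossing = []
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:
            if i < k < j < l:
                crossing.append(((i, j), (k, l)))
    return crossing


def suffix_array(seq: List[int]) -> List[int]:
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(seq)), key=lambda s: seq[s:])


def find_occurrences(seq: List[int], sa: List[int], pattern: List[int]) -> List[int]:
    """Binary-search the suffix array for all suffixes starting with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                            # first suffix with m-prefix >= pattern
        mid = (lo + hi) // 2
        if seq[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                            # first suffix with m-prefix > pattern
        mid = (lo + hi) // 2
        if seq[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    s = "..((...))..((...)).."
    offs = relative_offsets(s)
    sa = suffix_array(offs)
    print(hairpin_loops(s))                                          # [(3, 7), (12, 16)]
    print(find_occurrences(offs, sa, relative_offsets("((...))")))   # [2, 11]
    print(len(pseudoknot_pairs(dot_bracket_to_pairs("((..[[..))..]]"))))  # 4
```

Both hairpins in `..((...))..((...))..` yield the same offset substring `[6, 4, 0, 0, 0, -4, -6]`, so a single suffix-array search returns both start positions. A larger database would call for a linear-time suffix array construction (for example SA-IS) rather than sorting suffixes directly.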
Advisor
Abdullah N. Arslan
Subject Categories
Computer Sciences | Physical Sciences and Mathematics
Recommended Citation
Kollu, Shanthi, "RNA Secondary Structure Annotation Using Rab Representation" (2018). Electronic Theses & Dissertations. 439.
https://digitalcommons.tamuc.edu/etd/439