RNA Secondary Structure Annotation Using RAB Representation

Author

Shanthi Kollu

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science and Info Sys

Date of Award

Spring 2018

Abstract

RNA is a molecule with a two-dimensional model, called its RNA secondary structure. Several sequence representations exist for RNA secondary structures. The recently suggested relative address-based (RAB) representation has features that allow a database of RNA secondary structures to be built and stored in a suffix array, an efficient data structure specialized for storing and searching sequences. By exploiting these features, and using the suffix array that stores the underlying database, we propose to develop and implement algorithms for identifying (annotating) substructures of RNA secondary structures such as hairpin loops, internal loops, bulge loops, multi-branch loops, helices, and pseudoknots. Our programs will annotate a given RNA structure with known substructures. RNA secondary structures are represented in formats such as base-pairing sequence (BPSEQ), dot-bracket, and ordered tree. Traditionally, annotation is done by context-free grammar (CFG) parsing or by regular-expression matching on the BPSEQ and dot-bracket formats. Finding pseudoknots is particularly important because they cannot be described by CFGs. We will compare the capabilities of our method with those of other methods.
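
To make the annotation task concrete, the following is a minimal sketch in Python of the traditional dot-bracket style of annotation mentioned above. It is not the RAB or suffix-array method of the thesis, whose details the abstract does not give. The function names (pair_table, hairpin_loops, has_pseudoknot) and the example structure are hypothetical and chosen only for illustration. The sketch assumes 0-based positions, plain dot-bracket input using only "(", ")" and ".", and base pairs given as (i, j) tuples with i < j, as could be read from a BPSEQ file.

```python
def pair_table(dot_bracket):
    """Return a list whose entry i is the 0-based partner of position i, or -1 if unpaired."""
    partner = [-1] * len(dot_bracket)
    stack = []  # positions of '(' that are still waiting for a partner
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def hairpin_loops(partner):
    """Return the closing pairs (i, j) whose enclosed positions are all unpaired."""
    loops = []
    for i, j in enumerate(partner):
        # Visit each pair once (from its left end) and check the enclosed region.
        if j > i and all(partner[k] == -1 for k in range(i + 1, j)):
            loops.append((i, j))
    return loops


def has_pseudoknot(pairs):
    """Return True if any two pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    ordered = sorted(pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False


if __name__ == "__main__":
    structure = "((...))..((....))"  # hypothetical example with two hairpins
    table = pair_table(structure)
    print(hairpin_loops(table))
    print(has_pseudoknot([(0, 5), (3, 8)]))
```

Plain dot-bracket strings are always properly nested, so crossing pairs cannot be written in them at all. This is why the pseudoknot check in the sketch takes a list of pairs rather than a dot-bracket string, and it reflects the limitation of CFG-based annotation noted in the abstract.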

Advisor

Abdullah N. Arslan

Subject Categories

Computer Sciences | Physical Sciences and Mathematics
