Title

Graphical Processing Unit Accelerated RNA Substructure Comparison and Search Engine

Author

Anjali Kumari

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science and Information Systems

Date of Award

Spring 2021

Abstract

Exponentially growing ribonucleic acid (RNA) secondary structure databases present new challenges to researchers in the field and motivate fast methods for assembling large RNA structure collections for subsequent analysis. Any discussion of big data must also consider processing time. Big data analytics and Graphics Processing Unit (GPU)-accelerated deep learning have proven highly beneficial for computationally intensive data analysis, which motivated this study to use GPUs to accelerate both existing and new comparison and search algorithms. The recently developed comparison and search algorithms yielded efficient results using the newly proposed relative addressing based (RAB) representation of RNA secondary structure. The RAB representation embeds the 2D structure information of an RNA into a sequence. This allows an RNA structure database to be stored in a suffix array, which in turn supports fast substring search and comparison algorithms. Those algorithms were tested on databases of around 5,000 RNAs. Now that database sizes collectively reach millions of structures, the existing algorithms show limitations when applied to such large databases. Hence, the goals of this project include automating the steps of RNA structure data curation so that the database can be refreshed as needed, in keeping with its constantly changing nature. New search and comparison problems are also formulated for RNA substructure analysis; solving them is expected to have a meaningful impact on the analysis of RNA structures in large databases. The focus of this study is the development of algorithms that solve these problems efficiently, and their efficiency will be demonstrated through comparative test results. In addition, the practical use of this study in the life sciences will be discussed.
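
The abstract does not spell out the RAB encoding, so the sketch below assumes one plausible relative-addressing scheme: each paired position in a dot-bracket string stores the signed distance to its partner, and each unpaired position stores 0. Because these offsets do not depend on where a substructure starts, a self-contained substructure (all of its pairs inside it) produces the same code run in any structure, which is what makes suffix-array substring search applicable. The names `rab_encode`, `build_suffix_array`, and `find_substructure` are hypothetical, the suffix array is built naively, and the input is assumed to be well-formed dot-bracket notation without pseudoknots.

```python
from typing import List, Tuple


def rab_encode(dot_bracket: str) -> List[int]:
    """Assumed relative-addressing encoding (not necessarily the thesis's RAB scheme):
    a paired position stores the signed distance to its partner, unpaired stores 0."""
    codes = [0] * len(dot_bracket)
    stack: List[int] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()       # matching opening bracket
            codes[j] = i - j      # opener points forward to its partner
            codes[i] = j - i      # closer points backward to its partner
    return codes


def build_suffix_array(db: List[List[int]]) -> List[Tuple[int, int]]:
    """Naive generalized suffix array: all (structure_id, offset) pairs,
    sorted by the encoded suffix starting at that offset."""
    suffixes = [(s, o) for s, seq in enumerate(db) for o in range(len(seq))]
    suffixes.sort(key=lambda p: db[p[0]][p[1]:])
    return suffixes


def find_substructure(db: List[List[int]], sa: List[Tuple[int, int]],
                      pattern: List[int]) -> List[Tuple[int, int]]:
    """Binary-search the suffix array for every suffix that starts with `pattern`."""
    m = len(pattern)

    def prefix(k: int) -> List[int]:
        s, o = sa[k]
        return db[s][o:o + m]   # truncated suffix keeps the sorted order

    lo, hi = 0, len(sa)
    while lo < hi:              # lower bound: first prefix >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:              # upper bound: first prefix > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For example, encoding `((..))` and `.(..).` as the database and searching for the encoding of `(..)` returns `[(0, 1), (1, 1)]`: the same hairpin at offset 1 of each structure, even though the enclosing structures differ.
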
Finding similarity by comparing RNA structure sequences is important to structural bioinformatics researchers, but the computational cost of such comparisons can outweigh the gain (Stern & Mathews, 2013). Hence, this study proposes improved solutions, both for the previously developed RNA substring search and comparison problems and for the newly formulated problem scenarios, that work efficiently on large databases using Compute Unified Device Architecture (CUDA)-supported general-purpose graphics processing unit (GPGPU) programming. All applications will be developed in Python, a language widely used today for big data analytics and machine learning. The comparative test results will be generated on a system with NVIDIA GPU support.
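
As a rough illustration of why substructure search suits CUDA-style data parallelism, the sketch below replaces the suffix array with a brute-force scan in which every (structure, offset) cell is checked independently, the same decomposition a kernel with one GPU thread per cell would use. It runs with NumPy on the CPU and assumes structures have already been converted to integer sequences (for example, by a relative-addressing encoding); `pack`, `scan_all`, and the `PAD` sentinel are hypothetical names, not the thesis's code. Because it uses only slicing, element-wise comparison, and in-place `&`, moving it to the GPU through a NumPy-compatible array library or a hand-written Numba CUDA kernel is expected to be straightforward, but that port is not shown here.

```python
from typing import List

import numpy as np

# Assumed sentinel: never produced by the integer encoding, so padded cells never match.
PAD = np.iinfo(np.int32).min


def pack(db: List[List[int]]) -> np.ndarray:
    """Pad variable-length encoded structures into one 2-D int32 matrix."""
    width = max(len(seq) for seq in db)
    mat = np.full((len(db), width), PAD, dtype=np.int32)
    for r, seq in enumerate(db):
        mat[r, :len(seq)] = seq
    return mat


def scan_all(mat: np.ndarray, query: List[int]) -> np.ndarray:
    """Boolean (rows x offsets) matrix marking every exact occurrence of `query`.
    Each cell is computed independently, like one GPU thread per cell."""
    m = len(query)
    n_off = mat.shape[1] - m + 1
    if n_off <= 0:
        return np.zeros((mat.shape[0], 0), dtype=bool)
    hits = np.ones((mat.shape[0], n_off), dtype=bool)
    for k, value in enumerate(query):
        # Check position k of every window, across all rows and offsets at once.
        hits &= mat[:, k:k + n_off] == value
    return hits
```

For example, with `mat = pack([[5, 3, 0, 0, -3, -5], [0, 3, 0, 0, -3, 0]])`, `np.nonzero(scan_all(mat, [3, 0, 0, -3]))` reports a hit at offset 1 in both rows. The brute-force scan does more total work than a suffix-array lookup, but it needs no index rebuild when the database is refreshed, which is one trade-off a GPU implementation would have to weigh.
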

Advisor

Abdullah N. Arslan

Subject Categories

Computer Sciences | Physical Sciences and Mathematics
