A New Method for Regular Expression Matching on Preprocessed Text

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science and Info Sys

Date of Award

Fall 2015

Abstract

Text editing and information retrieval are common applications of text matching (pattern matching). The problem is to search a text file for a specific string and to find all locations of that string in the text. In our work, we use regular expressions to search for strings. The traditional approach to regular expression searching translates the regular expression (E) into a finite automaton (FA) that recognizes the strings matching E, and then runs the automaton with the text as input. The disadvantage of this method is that searching a large file takes a significant amount of time, since the entire text has to be scanned. In this thesis, we introduce a new method for solving the classical regular expression string searching problem. We aim to speed up regular expression matching in settings such as Google Docs, Integrated Development Environments (IDEs), and biological sequence databases, where regular expression searches are numerous and frequent. Ultimately, the developed method and results can be used in internet searches.
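As a rough illustration of the traditional approach described in the abstract (not the new method the thesis introduces), the sketch below simulates a small hand-built nondeterministic finite automaton for the example expression ab*c and scans the whole text once, reporting the index at which each match ends. The example expression, the state numbering, and the names TRANSITIONS and find_match_ends are assumptions made for this illustration and do not come from the thesis.

```python
# Hand-built NFA for the example regular expression ab*c (an assumed example).
# State 0: start, state 1: after 'a' (loops on 'b'), state 2: accepting.
START = 0
ACCEPT = 2
TRANSITIONS = {
    (0, "a"): {1},
    (1, "b"): {1},
    (1, "c"): {2},
}


def find_match_ends(text):
    """Return the indices in `text` at which a match of ab*c ends.

    Every character of the text is read exactly once, which is why the
    running time grows with the size of the file being searched.
    """
    active = set()
    ends = []
    for i, ch in enumerate(text):
        # Re-inject the start state so a match may begin at any position.
        active.add(START)
        next_active = set()
        for state in active:
            next_active |= TRANSITIONS.get((state, ch), set())
        active = next_active
        if ACCEPT in active:
            ends.append(i)
    return ends


if __name__ == "__main__":
    print(find_match_ends("xabbcyacab"))
```

In this sketch the matches "abbc" and "ac" are reported by their end positions. Because the loop visits every character, the cost is proportional to the text length for each query, which is the limitation the abstract attributes to searching unprocessed text.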

Advisor

Abdullah Arslan

Subject Categories

Computer Sciences | Physical Sciences and Mathematics
