Title

Density Based Visualization of Big Data with Graphical Processing Units

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science and Information Systems

Date of Award

Fall 2014

Abstract

The purpose of this study was to visualize data clusters found by the OPTICS algorithm, using Graphical Processing Units (GPUs) for computation and Python (Python, 2001) as the high-level programming language behind a Graphical User Interface (GUI). The GUI is platform independent, since Python is supported by all major operating systems, such as Windows XP, 7 and 8, Linux, and Mac OS. Identifying clusters in large databases is computationally demanding for a Central Processing Unit (CPU): depending on the size and dimensionality of the input data, the calculations can take anywhere from minutes to hours. A GPU, by contrast, can have a large number of multiprocessors, each with several cores. CUDA (Compute Unified Device Architecture) (NVIDIA, 2006) is a parallel programming model developed by NVIDIA (NVIDIA, 2014) for its GPUs. For data-parallel workloads, computing with CUDA can be many times faster than computing on a CPU. By combining the high computational power of GPUs with the advantages of OPTICS, clustering results can be obtained much faster and more efficiently. In this study, large databases were divided into smaller parts and distributed among multiprocessors or GPUs, which calculated partial results and passed them back to the CPU that had invoked the operation. The tool we developed can help researchers in fields such as astronomy, medicine, geology, and biology, among others. Although implementations of OPTICS are provided by tools such as WEKA (WEKA, 1993) and KNIME (KNIME, 2006), we found no GPU-supported API in the literature. Our multiplatform software accelerated OPTICS calculation and visualization by up to 24 times compared with the CPU version of the algorithm. From the user's perspective, the tool is simple to use and adaptable to different data formats, allowing it to be used for many kinds of analysis on various operating systems.
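The divide-and-distribute scheme described above can be illustrated with a minimal, pure-Python sketch of OPTICS. It is not the authors' CUDA implementation: it assumes Euclidean distance and uses a thread pool as a CPU stand-in for the GPU. The expensive epsilon-neighbourhood search is split into chunks, each chunk is handled by a worker, and the caller gathers the partial results in order. The names `optics`, `neighbourhoods`, `_neighbours_for_chunk`, and `n_chunks` are hypothetical and chosen only for illustration.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def _neighbours_for_chunk(points, start, stop, eps):
    # For each point index in [start, stop), collect (distance, index) pairs
    # for every point within eps (including the point itself).
    # On a GPU, one thread could handle one point of the chunk.
    result = []
    for i in range(start, stop):
        row = []
        for j, q in enumerate(points):
            d = math.dist(points[i], q)
            if d <= eps:
                row.append((d, j))
        result.append(row)
    return result


def neighbourhoods(points, eps, n_chunks=4):
    # Split the data into contiguous chunks, compute each chunk in a worker,
    # then concatenate the partial results in order (the "host" step).
    n = len(points)
    size = max(1, math.ceil(n / n_chunks))
    bounds = [(s, min(s + size, n)) for s in range(0, n, size)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(
            lambda b: _neighbours_for_chunk(points, b[0], b[1], eps), bounds))
    return [row for part in parts for row in part]


def optics(points, eps, min_pts, n_chunks=4):
    """Return (ordering, reachability in that order) for coordinate tuples."""
    nbrs = neighbourhoods(points, eps, n_chunks)
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    ordering = []

    def core_distance(i):
        # Distance to the min_pts-th closest point, counting the point itself;
        # None if the point has too few neighbours to be a core point.
        if len(nbrs[i]) < min_pts:
            return None
        return sorted(d for d, _ in nbrs[i])[min_pts - 1]

    def update(i, core_d, seeds):
        # Lower the reachability of unprocessed neighbours and queue them.
        for d, j in nbrs[i]:
            if processed[j]:
                continue
            new_r = max(core_d, d)
            if new_r < reach[j]:
                reach[j] = new_r
                heapq.heappush(seeds, (new_r, j))

    for p in range(n):
        if processed[p]:
            continue
        processed[p] = True
        ordering.append(p)
        core_d = core_distance(p)
        if core_d is None:
            continue
        seeds = []
        update(p, core_d, seeds)
        while seeds:
            r, q = heapq.heappop(seeds)
            if processed[q] or r > reach[q]:
                continue  # stale heap entry
            processed[q] = True
            ordering.append(q)
            core_q = core_distance(q)
            if core_q is not None:
                update(q, core_q, seeds)
    return ordering, [reach[i] for i in ordering]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(optics(points, eps=2.0, min_pts=2))
```

Plotting the reachability values in the returned order gives the reachability plot that OPTICS visualizations are typically built on: valleys correspond to clusters, and infinite values mark the start of a new cluster or noise. Because of CPython's global interpreter lock, the thread pool does not speed up this pure-Python loop; it only shows the data flow. A GPU version would replace `_neighbours_for_chunk` with a kernel.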

Advisor

Mutlu Mete

Subject Categories

Computer Sciences | Physical Sciences and Mathematics
