Rebuilding Statistically Encoded Dataset by Generative Method

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science and Information Systems

Date of Award

Summer 2023

Abstract

The last two decades have seen a surge in the amount of data produced every hour, driven by the proliferation of technologies for both personal and public use. Data produced in large volumes includes financial, medical, personal, and social media data, and it may take any form: numerical, image, audio, or video. This research focuses on one-dimensional time series data. Growing data volumes lead to practical limitations such as slow transmission, limited storage, high bandwidth requirements for transferring large amounts of data, and slow I/O. These limitations have motivated the development of data compression tools, which may be lossless or lossy; the latter discards some information to achieve a high compression ratio. The focus of this work is to reconstruct lossy-compressed data so that the rebuilt data has a distribution of data points similar to that of the real data. To this end, two concepts are combined. The first is Time-series Generative Adversarial Networks, or Time GAN (Yoon et al., 2019), a generative model in which an adversarial mechanism is used to improve the generator's ability to synthesize data. The second, used for data compression in this study, is IDEALEM (Implementation of Dynamic Extensible Adaptive Locally Exchangeable Measures), introduced in the paper Novel Data Reduction Based on Statistical Similarity (Lee et al., 2016), which applies the Kolmogorov-Smirnov test as the statistic for compressing the data. Combining IDEALEM with Time GAN can improve the decoding technique of IDEALEM and yields a new model developed in this research, referred to as Hinted Time GAN (Hinted Time Series Generative Adversarial Networks). Hinted Time GAN outperforms both IDEALEM and Time GAN by producing less noisy reconstructed data. Its performance is demonstrated qualitatively with line plots and quantitatively by calculating reconstruction errors.
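
The idea of compressing a time series with the Kolmogorov-Smirnov test can be illustrated with a minimal sketch. It is not the IDEALEM implementation: the function names, the fixed block size, the significance level, and the asymptotic critical-value formula are all assumptions chosen for illustration. The series is cut into blocks, and a block is stored only if it is not statistically similar to a block already kept; otherwise only the index of the similar block is recorded.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def ks_critical(n, m, alpha=0.05):
    """Asymptotic critical value of the two-sample KS test (an assumption here)."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))


def encode(series, block_size=32, alpha=0.05):
    """Hypothetical encoder: returns (stored blocks, one index per input block)."""
    dictionary, indices = [], []
    threshold = ks_critical(block_size, block_size, alpha)
    # Trailing samples that do not fill a whole block are ignored in this sketch.
    for start in range(0, len(series) - block_size + 1, block_size):
        block = series[start:start + block_size]
        for i, ref in enumerate(dictionary):
            if ks_statistic(block, ref) <= threshold:
                indices.append(i)  # similar distribution: reuse stored block
                break
        else:
            dictionary.append(block)  # no similar block found: store this one
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices):
    """Simplified decoder: copies the stored block for every index."""
    out = []
    for i in indices:
        out.extend(dictionary[i])
    return out
```

The decoder in this sketch simply repeats stored blocks, so the reconstruction matches the original only in distribution within each block, not point by point. That gap is the kind of reconstruction problem the abstract describes addressing with a generative model.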

Advisor

Dongeun Lee

Subject Categories

Computer Sciences | Physical Sciences and Mathematics
