Rebuilding Statistically Encoded Dataset by Generative Method
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science and Info Sys
Date of Award
Summer 2023
Abstract
Over the last two decades, the amount of data produced every hour has surged, driven by the widespread use of technology for both personal and public purposes. Data produced at this scale includes financial, medical, personal, and social media data, and it may take any form: numerical values, images, audio, or video. This thesis focuses on one-dimensional time series data. Growing data volumes impose practical limitations such as slow transmission, limited storage, high bandwidth requirements for transferring large amounts of data, and slow I/O. These limitations have motivated the development of data compression tools, which may be lossless or lossy; lossy compression discards some data to achieve a high compression ratio. The focus of this work is to reconstruct lossy-compressed data so that the rebuilt data has a distribution of data points similar to that of the real data. To this end, two concepts are combined. The first is Time Series Generative Adversarial Networks (Time GAN; Yoon et al., 2019), a generative model in which an adversarial mechanism is used to improve the generator's ability to synthesize data. The second, used for data compression in this study, is IDEALEM (Implementation of Dynamic Extensible Adaptive Locally Exchangeable Measures), introduced in the paper Novel Data Reduction Based on Statistical Similarity (Lee et al., 2016), which uses the Kolmogorov-Smirnov test statistic to compress the data. The effectiveness of the decoding technique in IDEALEM can be increased by combining IDEALEM with Time GAN, which yields a new model developed in this research, referred to as Hinted Time GAN (Hinted Time Series Generative Adversarial Networks). The Hinted Time GAN outperforms both IDEALEM and Time GAN by producing less noisy reconstructed data.
The performance of the Hinted Time GAN is demonstrated qualitatively with line plots and quantitatively by computing error measures.
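To make the compression idea concrete, the following is a minimal sketch of block-wise data reduction driven by the two-sample Kolmogorov-Smirnov statistic. It is not the IDEALEM implementation: the function names, the fixed block size, and the use of a threshold on the raw statistic (rather than a significance level) are assumptions made for illustration only.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def encode(series, block_size, threshold):
    """Store a block verbatim only if no stored block is statistically similar.

    Hypothetical helper: trailing samples that do not fill a block are dropped.
    """
    stored = []  # blocks kept verbatim
    codes = []   # one index into `stored` per input block
    for start in range(0, len(series) - block_size + 1, block_size):
        block = series[start:start + block_size]
        for i, ref in enumerate(stored):
            if ks_statistic(block, ref) <= threshold:
                codes.append(i)  # similar distribution: emit a reference only
                break
        else:
            stored.append(block)
            codes.append(len(stored) - 1)
    return stored, codes


def decode(stored, codes):
    """Naive decoding: replay the referenced stored block for every code."""
    out = []
    for i in codes:
        out.extend(stored[i])
    return out


series = [1, 2, 3, 4, 4, 3, 2, 1, 10, 20, 30, 40]
stored, codes = encode(series, block_size=4, threshold=0.25)
rebuilt = decode(stored, codes)
```

In this toy run the second block has the same distribution as the first, so only a reference is kept, and naive decoding replays the first block in its original order. The ordering within the second block is lost, which illustrates the kind of reconstruction gap that a generative decoder is meant to address.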
Advisor
Dongeun Lee
Subject Categories
Computer Sciences | Physical Sciences and Mathematics
Recommended Citation
Pillarisetty, Navya, "Rebuilding Statistically Encoded Dataset by Generative Method" (2023). Electronic Theses & Dissertations. 1111.
https://digitalcommons.tamuc.edu/etd/1111