Framework To Enhance Veracity And Quality Of Big Data

Ridzuan, Fakhitah (2021) Framework To Enhance Veracity And Quality Of Big Data. PhD thesis, Universiti Sains Malaysia.

[img]
Preview
PDF
Download (344kB) | Preview

Abstract

Massive amount of data are available for organisations to drive their business ahead of the competitors. Data collected from a variety of resources are dirty, and this will affect their business decisions. Various data cleansing tools are available to cater to the issue of dirty data. They offer better data quality, which will be a great help for the organisation to make sure their data is ready for the analysis. However, there has been an issue raised regarding the trustworthiness of the result, even though the quality of the data is high. Veracity is one of the characteristics of Big Data, which refers to the trustworthiness of the data. It always relates to data quality, but there has been less work on a standard that defines data quality, specifically for Big Data. Besides, most of the studies also show the need for data quality rule to satisfy a variety of errors present in the data. However, this process requires a domain expert that is expensive to employ. Consequently, this research proposes a method to automate data quality rules and an enhanced veracity assessment framework. The proposed method will automate the process of extracting data quality rules from the data source, which will reduce the interaction with the domain expert, and at the same time correctly verifying and validating the rules. The proposed method will be evaluated using the Veracity Enhancement Framework (VEF), to make sure the data has met the data quality dimension and able to deliver trustworthy result. The experimental result shows that the proposed automatic technique to extract data quality rules is able to correctly classify 9487 data with 4.6% error percentage.

Item Type: Thesis (PhD)
Subjects: Q Science > QA Mathematics > QA75.5-76.95 Electronic computers. Computer science
Divisions: Pusat Pengajian Sains Komputer (School of Computer Sciences) > Thesis
Depositing User: Mr Noor Azizan Abu Hashim
Date Deposited: 31 May 2022 16:58
Last Modified: 31 May 2022 16:58
URI: http://eprints.usm.my/id/eprint/52691

Actions (login required)

View Item View Item
Share