Papers by Yashodhar Desai
BIG DATA QUALITY MODELING AND VALIDATION by Khushali Desai

The chief purpose of this study is to characterize various big data quality models and to validate each with an example. As the volume of data grows exponentially in the era of broadband Internet, the success of a product or decision largely depends upon selecting the highest quality raw materials, or data, to be used in production. However, working with data in high volumes, at fast velocities, and in various formats can be fraught with problems. Therefore, the software industry needs quality checks, especially for data generated by either software or sensors. This study explores various big data quality parameters and their definitions, and proposes a quality model for each parameter. Using water quality data for San Francisco Bay from the U.S. Geological Survey (USGS), an example is given for each of the proposed big data quality models. To calculate composite data quality, prevalent methods such as Monte Carlo and neural networks were used. This thesis proposes eight big data quality parameters in total. Six of the eight models were coded and made into a final year project by a group of Master's degree students at SJSU. A case study is carried out using linear regression analysis, and all the big data quality parameters are validated with positive results.

ACKNOWLEDGMENTS

I would like to thank my thesis advisor, Dr. Jerry Gao, for his tremendous support, always helping and guiding me whenever I was stuck. I would also like to thank San Jose State University for giving me the opportunity to do a thesis as part of my master's work. I am grateful to my father, mother and sister for proofreading my work and giving me input, a valuable perspective from people who do not belong to this field. Without them, my work would not be as fruitful.
Special thanks also go to my thesis committee members, who always offered moral support, advice and technical reviews of the material for both this thesis and the original work. I would also like to mention Ariel Andrew and Jenn Hambly from the English writing center at SJSU for their essential input on my grammar. My friends were also my strength; I am indebted to Jayapriya, Chaitra, Sumana, Pranathi, and Nithya for being there with me during my research endeavor and giving me moral support all the time. Finally, a huge thank you to David McCormick for copy editing the thesis to help it reach its final version.