ETL Testing
The ETL Testing group is part of the Computer Science Department at Colorado State University and is sponsored by the University of Colorado School of Medicine. Our research develops systematic techniques for testing the Extract, Transform, Load (ETL) process of an enterprise health data warehouse. The warehouse, Health Data Compass, is headquartered at the University of Colorado Anschutz Medical Campus and uses Google BigQuery.
Research Objectives
This project pursues two research objectives:
Data Quality Testing
Data quality testing: We validate the data in the warehouse in isolation to detect violations of its syntactic and semantic properties.
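As an illustration only, the sketch below checks rows of a hypothetical encounter table against hand-written constraints: one syntactic (the medical record number, `mrn`, must match an assumed format) and two semantic (discharge cannot precede admission, and age must be plausible). The table, column names, format, and thresholds are all assumptions made for this example. The ADQuaTe work listed under Publications targets automated constraint discovery instead of a fixed rule list like this one.

```python
import re
from datetime import date

# Hypothetical encounter rows; the column names and values are illustrative only.
ROWS = [
    {"id": 1, "mrn": "A1234567", "admit": date(2020, 1, 5), "discharge": date(2020, 1, 9), "age": 54},
    {"id": 2, "mrn": "12-XY", "admit": date(2020, 2, 1), "discharge": date(2020, 1, 28), "age": 37},
    {"id": 3, "mrn": "B7654321", "admit": date(2020, 3, 3), "discharge": date(2020, 3, 4), "age": -2},
]

# Assumed MRN format: one uppercase letter followed by seven digits.
MRN_PATTERN = re.compile(r"[A-Z]\d{7}")


def check_row(row):
    """Return the constraint violations found in a single row."""
    violations = []
    # Syntactic property: the MRN must match the expected format.
    if not MRN_PATTERN.fullmatch(row["mrn"]):
        violations.append("mrn has invalid format")
    # Semantic property: a patient cannot be discharged before admission.
    if row["discharge"] < row["admit"]:
        violations.append("discharge precedes admit")
    # Semantic property: age must fall within a plausible range.
    if not 0 <= row["age"] <= 120:
        violations.append("age out of range")
    return violations


def data_quality_report(rows):
    """Map each faulty row's id to its violations; clean rows are omitted."""
    report = {}
    for row in rows:
        violations = check_row(row)
        if violations:
            report[row["id"]] = violations
    return report


if __name__ == "__main__":
    for row_id, problems in data_quality_report(ROWS).items():
        print(row_id, problems)
```

Running it flags row 2 (malformed MRN and a discharge before admission) and row 3 (negative age), while row 1 passes every check.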
Data Balancing Testing
Data balancing testing: We compare the data in the sources with the corresponding data in the target warehouse and report undesired differences.
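A minimal sketch of a balancing check, assuming both source and target rows can be loaded as Python dictionaries that share a key column. The function name `balance_report`, the `to_expected` hook (which maps a source row to the form the transformation should produce, identity by default), and the sample columns are all hypothetical. The check reports source rows that never arrived, target rows with no source, duplicated target keys, and column values that differ.

```python
from collections import Counter


def balance_report(source, target, key, to_expected=lambda row: row):
    """Compare source rows with target rows matched on the `key` column."""
    # Map each source row to the form the ETL transformation should produce.
    expected = {row[key]: row for row in map(to_expected, source)}
    actual = {row[key]: row for row in target}
    # Keys appearing more than once in the target suggest duplicated loads.
    counts = Counter(row[key] for row in target)
    report = {
        "missing_in_target": sorted(expected.keys() - actual.keys()),
        "unexpected_in_target": sorted(actual.keys() - expected.keys()),
        "duplicate_keys_in_target": sorted(k for k, n in counts.items() if n > 1),
        "value_mismatches": [],
    }
    # For rows present on both sides, compare every expected column value;
    # a column absent from the target row is reported with the value None.
    for k in sorted(expected.keys() & actual.keys()):
        for column, value in expected[k].items():
            loaded = actual[k].get(column)
            if loaded != value:
                report["value_mismatches"].append((k, column, value, loaded))
    return report


# Hypothetical source extract and loaded target table.
SOURCE = [
    {"patient_id": 1, "sex": "F", "zip": "80045"},
    {"patient_id": 2, "sex": "M", "zip": "80523"},
    {"patient_id": 3, "sex": "F", "zip": "80202"},
]
TARGET = [
    {"patient_id": 1, "sex": "F", "zip": "80045"},
    {"patient_id": 2, "sex": "M", "zip": "80524"},
    {"patient_id": 4, "sex": "M", "zip": "80301"},
    {"patient_id": 4, "sex": "M", "zip": "80301"},
]

if __name__ == "__main__":
    for check, findings in balance_report(SOURCE, TARGET, "patient_id").items():
        print(check, findings)
```

On the sample data it reports patient 3 as missing, patient 4 as unexpected and duplicated, and a ZIP code mismatch for patient 2. Passing a `to_expected` function, for example one that renames or converts columns, lets the same comparison cover transformed data rather than only straight copies.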
Publications
Conferences
- H. Homayouni, S. Ghosh, I. Ray, S. Gondalia, J. Duggan, M. Kahn, 2020. “An Autocorrelation-based LSTM-Autoencoder for Anomaly Detection on Time-Series Data”, submitted as a full paper to the IEEE International Conference on Big Data.
- H. Homayouni, S. Ghosh, I. Ray, M. G. Kahn, 2019. “An interactive data quality test approach for constraint discovery and fault detection”, 2019 IEEE International Conference on Big Data (Big Data), pp. 200-205. IEEE.
- H. Homayouni, S. Ghosh, I. Ray, 2019. “ADQuaTe: An Automated Data Quality Test Approach for Constraint Discovery and Fault Detection”, 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), pp. 61-68. IEEE.
- H. Homayouni, 2018. “Testing extract-transform-load process in data warehouse systems”, 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 158-161. IEEE.
- H. Homayouni, S. Ghosh, I. Ray, 2018. “An Approach for Testing the Extract-Transform-Load Process in Data Warehouse Systems”, Proceedings of the 22nd International Database Engineering & Applications Symposium, pp. 236-245.
Presentations
- H. Homayouni, S. Ghosh, I. Ray, 2020. “IDEAL: Interactive Detection and Explanation of Anomalies using Autocorrelation-based LSTM-Autoencoder for Time-Series Data”, virtual poster presentation at the Rocky Mountain Advanced Computing Consortium (RMACC).
- H. Homayouni, S. Ghosh, I. Ray, 2019. “ADQuaTe: An Automated Interactive Data Quality Test Approach”, poster presentation at the Graduate Showcase, Colorado State University, USA.
- H. Homayouni, S. Ghosh, I. Ray, 2019. “ADQuaTe: An Automated Interactive Data Quality Test Approach”, poster presentation at the Grace Hopper Celebration of Women in Computing, USA.
- H. Homayouni, S. Ghosh, I. Ray, 2019. “ADQuaTe: An Automated Interactive Data Quality Test Approach”, poster presentation at the Tapia Celebration of Diversity in Computing, San Diego, USA.
- H. Homayouni, S. Ghosh, I. Ray, 2019. “ADQuaTe: An Automated Data Quality Test Approach for Constraint Discovery and Fault Detection”, poster presentation at the Rocky Mountain Advanced Computing Consortium (RMACC).
- H. Homayouni, S. Ghosh, I. Ray, 2018. “Using Autoencoder to Generate Data Quality Tests”, paper presentation at the Rocky Mountain Celebration of Women in Computing (RMCWIC), Denver, USA, November 2–3.
- H. Homayouni, S. Ghosh, 2016. “A study of EvoSuite as an automatic test case generation approach to kill first-order mutants”, paper presentation at the Rocky Mountain Celebration of Women in Computing (RMCWIC), Salt Lake City, USA, September 22–23.
Journal
- H. Homayouni, S. Ghosh, I. Ray, M. Kahn, 2020. “ADQuaTe2: A Data Quality Test Approach for Automated Constraint Discovery and Fault Detection”, submitted to the “Heuristic Acquisition for Data Science” special issue of Information Systems Frontiers, published by Springer.
Book Chapter
- H. Homayouni, S. Ghosh, I. Ray, 2018. “Data Warehouse Testing”, Advances in Computers, Volume 112.