    An LLM-guided platform for multi-granular collection and management of data provenance

    References (31)

    • 1.
      Dua D, Graff C. UCI Machine Learning Repository: Mushroom Data Set 2019. https://archive.ics.uci.edu/dataset/73/mushroom .
    • 2.
      FAIRsharing Community: FAIRsharing: C5QG88 (2023). https://doi.org/10.24432/C5QG88.
    • 3.
      Glavic B, Alonso G. Perm: Processing provenance and data on the same data model through query rewriting. In: Ioannidis, Y.E., Lee, D.L., Ng, R.T. (eds.) Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China, 2009;174–185
    • 4.
      Gregori L, Missier P, Stidolph M, Torlone R, Wood A. Design and Development of a Provenance Capture Platform for Data Science. In: Procs. 3rd DATAPLAT Workshop, Co-located with ICDE 2024. IEEE, Utrecht, NL 2024.
    • 5.
      Jacovi A, Marasović A, Miller T, Goldberg Y. Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT ’21, pp. 624–635. Association for Computing Machinery, New York, NY, USA 2021. https://doi.org/10.1145/3442188.3445923
    • 6.
      Kaggle: Titanic - Machine Learning from Disaster 2025. https://www.kaggle.com/competitions/titanic/data .
    • 7.
      Kohavi R. Census Income 1996. https://doi.org/10.24432/C5GP7S.
    • 8.
      Lee S, Köhler S, Ludäscher B, Glavic B. A SQL-Middleware Unifying Why and Why-Not Provenance for First-Order Queries. In: 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19-22, 2017, 2017;485–496
    • 9.
      McPhillips TM, Song T, Kolisnik T, Aulenbach S, Belhajjame K, Bocinsky K, et al. A user-oriented, language-independent tool for recovering workflow information from scripts. CoRR. 2015. abs/1502.02403 .
    • 10.
      Moreau L, Missier P, Belhajjame K, B’Far R, Cheney J, Coppens S, et al. PROV-DM: The PROV data model. W3C 2013.
    • 11.
      Namaki MH, Floratou A, Psallidas F, Krishnan S, Agrawal A, Wu Y, et al. Vamsa: Automated provenance tracking in data science scripts. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’20, pp. 1542–1551. Association for Computing Machinery, New York, NY, USA 2020. https://doi.org/10.1145/3394486.3403205
    • 12.
      Neutatz F, Chen B, Abedjan Z, Wu E, Berlin T. From Cleaning before ML to Cleaning for ML.
    • 13.
      Niu X, Kapoor R, Glavic B, Gawlick D, Liu ZH, Radhakrishnan V. Provenance-aware query optimization. In: 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19-22, 2017, 2017;473–484
    • 14.
      Pfisterer F, Siyi W, Lang M. COMPAS Dataset in mlr3fairness 2023. https://mlr3fairness.mlr-org.com/reference/compas.html .
    • 15.
      Pimentel JF, Freire J, Murta L, Braganholo V. Fine-grained provenance collection over scripts through program slicing. In: International Provenance and Annotation Workshop, 2016;199–203.
    • 16.
      Smith MJ, Sala C, Kanter JM, Veeramachaneni K. The machine learning bazaar: Harnessing the ML ecosystem for effective system development. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. SIGMOD ’20. ACM, New York, NY, USA 2020
    • 17.
      Sundararajan M, Najmi A. The many Shapley values for model explanation. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, 2020;119:9269–9278. https://proceedings.mlr.press/v119/sundararajan20b.html
    • 18.
      Volk A. Dataset of USED CARS 2023. https://www.kaggle.com/datasets/volkanastasia/dataset-of-used-cars .
    • 19.
      Zhang Q, Cao Y, Wang Q, Vu D, Thavasimani P, McPhillips T, et al. Revealing the Detailed Lineage of Script Outputs using Hybrid Provenance. In: Procs. 11th Intl. Digital Curation Conference (IDCC) 2017.
    • 20.
      Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. Vol. 6.