The rise of EdTech tools has transformed educational environments worldwide, offering innovative ways for students to learn. However, these advancements come with significant data privacy concerns, especially as they relate to children’s personal data. To ensure compliance with the General Data Protection Regulation (GDPR) in the UK and EU, it’s crucial for EdTech vendors to maintain clear and transparent Data Privacy Policies (DPPs), ones that clearly state how companies protect children’s rights to privacy and quality education.
Our recent research (and further work to be presented at ICERI2024 this November) delves into how well EdTech companies align with privacy regulations and whether their DPPs are user-friendly and transparent (see [13] in Resources for the full database of EdTechs, policies and other relevant data around the research). Building on EDDS’s in-depth assessments of EdTech, the objective was to find ways to scale parts of the assessment process using emerging technologies and new methodologies. Specifically, we tested whether machine learning (ML) models could effectively automate the assessment of EdTech vendors’ DPPs. Here is a brief outline of the methodology, results, and key takeaways.
Understanding the Challenge: Data Privacy and EdTech
The UK and EU GDPR establish a comprehensive framework that governs data processing activities. EdTech companies, by nature of their operations, often handle vast amounts of sensitive student and teacher data, making compliance with these regulations crucial. Yet, compliance is not always straightforward.
Why Does It Matter?
Children’s rights & safety: The processing of minors’ data brings unique challenges as per the United Nations Convention on the Rights of the Child (UNCRC) and the UK’s Age-Appropriate Design Code (AADC).
Transparency & trust: A clear DPP enhances trust among parents, educators, and students, ensuring that the company’s practices are well-understood and aligned with user expectations.
Research objectives
Our research sought to explore two key questions:
Do EdTech DPPs effectively communicate data handling practices and ensure transparency?
Can Machine Learning accurately assess EdTech vendors’ DPPs for compliance with UK and EU GDPR requirements?
The ultimate goal was to offer a streamlined assessment framework for schools and educators to evaluate EdTech vendors more efficiently, without needing extensive legal expertise.
Methodology overview
Our methodology had several key components:
Manual analysis: We manually reviewed DPPs from 10 EdTech vendors to identify initial gaps and themes.
Criteria development: Using 44 questions aligned with GDPR requirements, we developed a scorecard to evaluate DPPs. This included mandatory legal information as well as desirable aspects like transparency and readability.
ML assessment: We explored four distinct ML-based approaches using OpenAI’s API to automate policy evaluation, ranging from direct API calls to more complex setups built with LangChain and Retrieval-Augmented Generation (RAG).
Human oversight: Legal experts cross-checked the ML results to ensure accuracy and identify any false positives or negatives.
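To make the scorecard idea concrete, here is a minimal Python sketch of how criteria and their results might be represented. Everything in it is an assumption for illustration: the Criterion class, the three sample questions, and the prompt wording are hypothetical and are not the 44 questions or the prompts used in the research. The sketch only builds the text of a prompt and does not call any API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One scorecard question (hypothetical structure, for illustration only)."""
    cid: str
    question: str
    mandatory: bool  # True: legally required information; False: desirable aspect


# Illustrative criteria only; these are NOT the actual 44 scorecard questions.
CRITERIA = [
    Criterion("Q1", "Does the policy state how long personal data is retained?", True),
    Criterion("Q2", "Does the policy disclose sharing with third parties?", True),
    Criterion("Q3", "Is the policy written in plain, readable language?", False),
]


def summarise(criteria, answers):
    """Count (met, total) separately for mandatory and desirable criteria.

    `answers` maps a criterion id to True (met) or False (not met).
    A criterion with no answer is treated as not met.
    """
    counts = {"mandatory": [0, 0], "desirable": [0, 0]}
    for criterion in criteria:
        key = "mandatory" if criterion.mandatory else "desirable"
        counts[key][1] += 1
        if answers.get(criterion.cid, False):
            counts[key][0] += 1
    return {key: tuple(value) for key, value in counts.items()}


def build_prompt(criterion, policy_text):
    """Assemble a plain-text prompt for one criterion (wording is an assumption)."""
    return (
        "You are reviewing a data privacy policy.\n"
        f"Question ({criterion.cid}): {criterion.question}\n"
        "Answer YES or NO and quote the passage that supports your answer.\n\n"
        f"Policy text:\n{policy_text}"
    )
```

Keeping mandatory and desirable criteria in separate tallies means a policy that reads well but omits legally required information cannot hide that gap behind a single overall score.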
Key Findings: manual vs. ML-based assessments
Our findings indicate significant discrepancies in how well EdTech vendors communicate their data practices through their DPPs.
Gaps in DPPs
We found that many DPPs:
Use vague language without providing actionable details. For instance, while several policies mention data minimisation, they often fail to specify how this principle is practically applied.
Lack clear information on data retention periods and what happens after the retention period expires.
Don't consistently disclose data sharing with third parties, creating ambiguity around how personal data is managed.
ML models show promise, but aren’t perfect
While our ML-based assessment showed potential for automating evaluations, the results were far from perfect:
Inconsistent accuracy: For instance, the assessment of Doodle Learning’s DPP showed that only 28 out of 44 scores were correct, with several false positives and negatives. This inconsistency points to a need for further refinement.
Misinterpretation of evidence: ChatGPT often cited irrelevant excerpts as evidence for compliance, highlighting limitations in its understanding of complex legal language and dependencies between different policy sections.
Inability to handle abstract concepts: Terms like “fair and transparent data processing” or normative judgments on sufficiency often led to vague or incorrect justifications by the model.
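For reference, 28 correct scores out of 44 is an agreement rate of roughly 64%. The sketch below shows one way such a comparison between expert and model verdicts could be tallied, including false positives (the model says a criterion is met when the expert says it is not) and false negatives (the reverse). The function name and the four-question sample data are hypothetical and synthetic; they are not results from the research.

```python
def compare_scores(expert, model):
    """Compare model verdicts against expert verdicts, criterion by criterion.

    Both arguments map a criterion id to True (met) or False (not met).
    """
    if not expert:
        raise ValueError("no criteria to compare")
    if expert.keys() != model.keys():
        raise ValueError("expert and model must cover the same criteria")

    # False positive: model reports compliance that the expert did not find.
    false_positives = [cid for cid in expert if model[cid] and not expert[cid]]
    # False negative: model misses compliance that the expert did find.
    false_negatives = [cid for cid in expert if expert[cid] and not model[cid]]

    correct = len(expert) - len(false_positives) - len(false_negatives)
    return {
        "correct": correct,
        "total": len(expert),
        "agreement": correct / len(expert),
        "false_positives": false_positives,
        "false_negatives": false_negatives,
    }


# Synthetic example data, for illustration only.
expert_verdicts = {"Q1": True, "Q2": False, "Q3": True, "Q4": False}
model_verdicts = {"Q1": True, "Q2": True, "Q3": False, "Q4": False}
result = compare_scores(expert_verdicts, model_verdicts)

# The figure reported above: 28 correct out of 44.
reported_agreement = 28 / 44
```

Listing the false positives separately matters here because they are the errors most likely to create a false sense of compliance.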
Conclusions and implications
Our research underscores two main points:
Manual assessment remains crucial: While ML models can augment the process, they are not yet reliable enough to replace human experts. Misinterpretations and incorrect conclusions can lead to a false sense of compliance.
Need for improved DPP transparency: Many EdTech vendors still lack detailed disclosures, making it challenging for stakeholders to assess compliance. Enhancing the clarity and structure of DPPs is essential for fostering trust and protecting user rights.
Future directions: AI-augmented compliance tools
Despite current limitations, AI holds promise for scaling compliance assessments, especially when combined with human oversight. Future research can focus on:
Refining ML models: Enhancing prompt templates and developing specialised frameworks to better handle complex legal language and dependencies.
Expanding evaluation criteria: Beyond GDPR compliance, incorporating other standards such as AADC, digital accessibility, and cybersecurity to create a holistic assessment framework.
Resources:
Information Commissioner’s Office, “Applications - children and the GDPR,” 2018. Available at: https://ico.org.uk/media/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/children-and-the-gdpr-1-0.pdf.
Committee on the Rights of the Child, “General comment No. 25 (2021) on children’s rights in relation to the digital environment,” 2021. Available at: https://tbinternet.ohchr.org/_layouts/15/treatybodyexternal/Download.aspx?symbolno=CRC/C/GC/25&Lang=en.
K. Manwaring, K. Kemp, and R. Nicholls, “(mis)Informed Consent in Australia,” Report (UNSWorks), UNSW Law Research, 2021. Available at SSRN: https://ssrn.com/abstract=3859848 or http://dx.doi.org/10.2139/ssrn.3859848.
J. van Dijck, “Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology,” Surveillance & Society, vol. 12, no. 2, pp. 197–208, 2014. doi:10.24908/ss.v12i2.4776.
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data (United Kingdom General Data Protection Regulation) (Text with EEA relevance). Available at: https://www.legislation.gov.uk/eur/2016/679/contents
UNESCO, “Technology in Education: Tool on Whose Terms?” Paris: UNESCO, 2023. Available online: https://www.unesco.org/gem-report/en/technology
Filipovska, E., Mladenovska, A., Bajrami, M., Dobreva, J., Hillman, V., Lameski, P., & Zdravevski, E. “Benchmarking OpenAI’s APIs and Other Large Language Models for Efficient Question Answering Across Multiple Documents,” Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 107–117, 2024.
European Data Protection Board, Guidelines 07/2020 on the Concepts of Controller and Processor in the GDPR, Version 2.1. Adopted on 07 July 2021. Available at: https://edpb.europa.eu/system/files/2023-10/EDPB_guidelines_202007_controllerprocessor_final_en.pdf
Information Commissioner’s Office, The Children’s Code and Education Technologies (edtech). ICO, 2023. Available at: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/childrens-information/childrens-code-guidance-and-resources/the-children-s-code-and-education-technologies-edtech/
Data Protection Act 2018. Available at: https://www.legislation.gov.uk/ukpga/2018/12/contents/enacted
Filipovska, E., Mladenovska, A., Dobreva, J., Kitanovski, D., Mitrov, G., Lameski, P., & Zdravevski, E. “Evaluation of vector databases and LLMs in RAG-based multi-document question answering,” ICT Innovations 2024. Tech Convergence: AI, Business, and Startup Synergy, Springer, 2024.
R. Binns and D. Matthews, “Community structure for efficient information flow in ‘ToS;DR’, a social machine for parsing legalese,” in Proceedings of the 23rd International Conference on World Wide Web, pp. 881–884, April 2014.
Raw data for this research can be found on GitHub here: https://github.com/admin-magix/edtech-policies