A recent Forbes article summarizes a potentially problematic aspect of AI, known as “model collapse,” that highlights the importance of governance and of the quality of the data used to train AI models. It turns out that over time, when AI models are trained on data that earlier AI models created (rather than data created by humans), something is lost at each iteration, and the model can fail.

According to the Forbes article:

Model collapse, recently detailed in a Nature article by a team of researchers, is what happens when AI models are trained on data that includes content generated by earlier versions of themselves. Over time, this recursive process causes the models to drift further away from the original data distribution, losing the ability to accurately represent the world as it really is. Instead of improving, the AI starts to make mistakes that compound over generations, leading to outputs that are increasingly distorted and unreliable.

As the researchers who observed this effect noted in their Nature article:

In our work, we demonstrate that training on samples from another generative model can induce a distribution shift, which—over time—causes model collapse. This in turn causes the model to mis-perceive the underlying learning task. To sustain learning over a long period of time, we need to make sure that access to the original data source is preserved and that further data not generated by LLMs remain available over time. The need to distinguish data generated by LLMs from other data raises questions about the provenance of content that is crawled from the Internet: it is unclear how content generated by LLMs can be tracked at scale. One option is community-wide coordination to ensure that different parties involved in LLM creation and deployment share the information needed to resolve questions of provenance. Otherwise, it may become increasingly difficult to train newer versions of LLMs without access to data that were crawled from the Internet before the mass adoption of the technology or direct access to data generated by humans at scale.
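
The dynamic the researchers describe can be made concrete with a toy simulation. The sketch below is our own illustration, not the Nature team’s experiment: each “generation” fits a simple one-dimensional Gaussian model to a small sample drawn from the previous generation’s model, so estimation errors compound instead of averaging out. On most runs, the fitted distribution drifts and narrows, and the probability the model assigns to rare “tail” events decays toward zero. The sample size, cutoff, and number of generations are arbitrary illustrative choices.

```python
import math
import random
import statistics

# Toy illustration of model collapse (our own sketch, not the Nature
# paper's experiment): each "generation" is a one-dimensional Gaussian
# model fitted to a small sample drawn from the previous generation's
# model, so finite-sample estimation error compounds across generations.

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tail_probability(mu: float, sigma: float, cutoff: float = 3.0) -> float:
    """Probability the fitted model assigns to the rare region |x| > cutoff."""
    return 1.0 - (normal_cdf((cutoff - mu) / sigma)
                  - normal_cdf((-cutoff - mu) / sigma))

random.seed(0)
SAMPLE_SIZE = 50       # small training sets make the drift easy to see
mu, sigma = 0.0, 1.0   # generation 0: "human" data ~ N(0, 1)

for generation in range(501):
    if generation % 100 == 0:
        print(f"gen {generation:3d}: mean={mu:+.3f}  std={sigma:.3f}  "
              f"P(|x|>3)={tail_probability(mu, sigma):.2e}")
    # The next generation trains only on the current model's own output.
    sample = [random.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
```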

These findings highlight several important considerations when using AI tools. One is maintaining a robust governance program that includes, among other things, measures to stay abreast of developing risks. We’ve heard a lot about hallucinations; model collapse is a relatively new and potentially devastating challenge to the promise of AI. It raises a similar concern, namely, that the value of the results an organization receives from a generative AI tool it has come to rely on can significantly diminish over time.

Another, related consideration is the need to be continually vigilant about the quality of the data being used. Distinguishing and preserving human-generated content (a hypothetical screening sketch follows the excerpt below) may become more difficult over time, as sources of data become increasingly rooted in AI-generated content. The consequences could be significant, as the Forbes piece notes:

[M]odel collapse could exacerbate issues of bias and inequality in AI. Low-probability events, which often involve marginalized groups or unique scenarios, are particularly vulnerable to being “forgotten” by AI models as they undergo collapse. This could lead to a future where AI is less capable of understanding and responding to the needs of diverse populations, further entrenching existing biases and inequalities.
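
On the data-vigilance point, one concrete (and purely hypothetical) measure is screening training records by provenance metadata: keep content labeled as human-authored, and content collected before the mass adoption of generative AI, as the Nature researchers suggest. The record fields, source labels, and cutoff date below are illustrative assumptions, not anything prescribed by the Forbes or Nature pieces.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch of provenance-based screening of training data.
# Field names, source labels, and the cutoff date are illustrative
# assumptions; no standard for tracking LLM-generated content exists yet.

@dataclass
class Record:
    text: str
    source: str    # e.g., "human", "llm", or "unknown"
    collected: date

# Illustrative cutoff: roughly when generative AI tools reached mass adoption.
LLM_ERA = date(2022, 11, 30)

def keep_for_training(record: Record) -> bool:
    """Keep records labeled human, or of unknown origin but pre-LLM-era."""
    return record.source == "human" or (
        record.source == "unknown" and record.collected < LLM_ERA
    )

corpus = [
    Record("written by a person", "human", date(2024, 5, 1)),
    Record("chatbot output", "llm", date(2024, 5, 1)),
    Record("old web page", "unknown", date(2019, 3, 2)),
]
training_set = [r for r in corpus if keep_for_training(r)]
print(len(training_set))  # keeps the human record and the pre-era page: 2
```

Even a simple filter like this depends on reliable provenance labels, which, as the researchers note, do not yet exist at Internet scale; that is precisely why community-wide coordination on provenance matters.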

Accordingly, organizations need to build strong governance and controls around the data on which their (or their vendors’) AI models were and continue to be trained. That need is all the more clear given that model collapse is only one of a number of risks and challenges facing organizations developing or deploying AI.

Joseph J. Lazzarotti is a principal in the Berkeley Heights, New Jersey, office of Jackson Lewis P.C. He founded and currently co-leads the firm’s Privacy, Data and Cybersecurity practice group, edits the firm’s Privacy Blog, and is a Certified Information Privacy Professional (CIPP) with the International Association of Privacy Professionals. Trained as an employee benefits lawyer, focused on compliance, Joe also is a member of the firm’s Employee Benefits practice group.

In short, his practice focuses on the matrix of laws governing the privacy, security, and management of data, as well as the impact and regulation of social media. He also counsels companies on compliance, fiduciary, taxation, and administrative matters with respect to employee benefit plans.

Privacy and cybersecurity experience – Joe counsels multinational, national, and regional companies in all industries on a broad array of laws, regulations, best practices, and preventive safeguards. The following are examples of areas of focus in his practice:

  • Advising health care providers, business associates, and group health plan sponsors concerning HIPAA/HITECH compliance, including risk assessments, policies and procedures, incident response plan development, vendor assessment and management programs, and training.
  • Coaching hundreds of companies through the investigation, remediation, notification, and overall response to data breaches of all kinds – PHI, PII, payment card, etc.
  • Helping organizations address questions about the application, implementation, and overall compliance with the European Union’s General Data Protection Regulation (GDPR) and, in particular, its implications in the U.S., together with preparing for the California Consumer Privacy Act.
  • Working with organizations to develop and implement video, audio, and data-driven monitoring and surveillance programs. For instance, in the transportation and related industries, Joe has worked with numerous clients on fleet management programs involving the use of telematics, dash-cams, event data recorders (EDR), and related technologies. He also has advised many clients in the use of biometrics including with regard to consent, data security, and retention issues under BIPA and other laws.
  • Assisting clients with growing state data security mandates to safeguard personal information, including steering clients through detailed risk assessments and converting those assessments into practical “best practice” risk management solutions, including written information security programs (WISPs). Related work includes compliance advice concerning FTC Act, Regulation S-P, GLBA, and New York Reg. 500.
  • Advising clients about best practices for electronic communications, including in social media, as well as when communicating under a “bring your own device” (BYOD) or “company owned personally enabled device” (COPE) environment.
  • Conducting various levels of privacy and data security training for executives and employees.
  • Supporting organizations through mergers, acquisitions, and reorganizations with regard to the handling of employee and customer data, and the safeguarding of that data during the transaction.
  • Representing organizations in matters involving inquiries into privacy and data security compliance before federal and state agencies including the HHS Office of Civil Rights, Federal Trade Commission, and various state Attorneys General.

Benefits counseling experience – Joe’s benefits counseling work covers many areas of employee benefits law. Below are some examples of that work:

  • As part of the Firm’s Health Care Reform Team, advising employers and plan sponsors regarding the establishment, administration, and operation of fully insured and self-funded health and welfare plans to comply with ERISA, IRC, ACA/PPACA, HIPAA, COBRA, ADA, GINA, and other related laws.
  • Guiding clients through the selection of plan service providers, along with negotiating service agreements with vendors to address plan compliance and operations, while leveraging data security experience to ensure plan data is safeguarded.
  • Counseling plan sponsors on day-to-day compliance and administrative issues affecting plans.
  • Assisting in the design and drafting of benefit plan documents, including severance and fringe benefit plans.
  • Advising plan sponsors concerning employee benefit plan operation and administration, including the correction of operational errors.

Joe speaks and writes regularly on current employee benefits, data privacy, and cybersecurity topics, and his work has been published in leading business and legal journals and media outlets, such as The Washington Post, Inside Counsel, Bloomberg, The National Law Journal, Financial Times, Business Insurance, HR Magazine, and NPR, as well as the ABA Journal, The American Lawyer, Law360, Bender’s Labor and Employment Bulletin, the Australian Privacy Law Bulletin, and the Privacy and Data Security Law Journal.

Joe served as a judicial law clerk for the Honorable Laura Denvir Stith on the Missouri Court of Appeals.