Gen AI Copyright Act

The “Generative AI Copyright Disclosure Act of 2024” was introduced in the U.S. House of Representatives on April 9th, 2024 to address the issue of training generative AI systems on copyrighted material.  The bill was proposed by California Congressman Adam Schiff.  As a new proposal it has no registered number yet, and its current draft is very short; it will be reviewed and developed as it goes through committee.  Here is a link to the bill.  I suggest you go read it: it’s 5 pages double spaced and only takes a minute.

In its current form, this bill supports the very thesis of AI Data CO-OP, both the foundation and the operating company.  The proposed bill states that any dataset used for AI training must come with “a sufficiently detailed summary of any copyrighted works used.”  This applies to the base dataset and to any altered version of it.  If the dataset is public, a URL for it must also be provided.  One example is https://huggingface.co/datasets/the_pile_books3 (the dataset that got everyone in trouble in the first place).

The bill goes on to state that all datasets used to train a generative AI system must report their copyrighted contents to a central repository held at the Copyright Office no less than 30 days before the generative AI system they were used to train becomes publicly available.  And here lies the challenge.  The more data you have, the better your generative AI solutions are, so we must be able to accurately track the use of both copyrighted and licensed data, not only for legal purposes but also to properly compensate those who helped make the quality data these systems need to be useful.  Fortunately, here at AI Data CO-OP, on both the foundation side and the operating side, we are building exactly that: working with the Copyright Office on improved standards and tracking, and building the operating infrastructure to help accelerate innovation and keep the whole system moving forward.
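
For readers who think in code, here is a rough sketch of what tracking that 30-day window might look like.  It is only an illustration of the timing rule as described above, not anything taken from the bill's text or from our own systems; the record fields and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Minimum lead time before public release, as described in this post.
NOTICE_DAYS = 30


@dataclass
class DatasetDisclosure:
    """Hypothetical disclosure record; field names are illustrative only."""
    dataset_name: str
    copyrighted_works_summary: str
    dataset_url: Optional[str] = None  # expected only for public datasets
    altered: bool = False              # True if this is a modified dataset


def latest_filing_date(release: date) -> date:
    """Last day a disclosure could be filed for a given release date."""
    return release - timedelta(days=NOTICE_DAYS)


def is_filed_in_time(filed: date, release: date) -> bool:
    """True if the filing is at least NOTICE_DAYS before the release."""
    return filed <= latest_filing_date(release)


if __name__ == "__main__":
    disclosure = DatasetDisclosure(
        dataset_name="example-corpus",  # made-up name
        copyrighted_works_summary="Summary of copyrighted works used.",
    )
    release = date(2024, 6, 30)
    print(disclosure.dataset_name, "must be filed by", latest_filing_date(release))
```

The point of the sketch is simply that every dataset, base or altered, needs its own record and its own deadline, which is why accurate tracking matters as the number of datasets grows.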

dpd
