Copyright and Artificial Intelligence Consultation

Consultation Response

Credits — Craig Dalzell

 

Overview

A response to the UK Government’s proposal to give developers of Generative Artificial Intelligence programmes a near blanket exemption from copyright law, allowing them to harvest creative works to train GAIs. We believe that these GAIs undermine creative endeavour and may pose an existential threat to artists and other creatives – including Common Weal.

The UK Government’s proposal to give AI developers a near blanket exemption from copyright law presents an existential risk to creatives. It overwhelmingly favours the desires of billionaire AI platform owners and oligopolistic corporations who have already admitted that their current business model would be “impossible” if they were made to comply with the law as it is currently written and evidently not enforced. The claim that creatives will be able to approach these massive companies to negotiate a fair deal for work that has been harvested from them without permission, and then used to undermine their future work, is nonsense. The ability to uphold copyright law is already too weak when it comes to individuals and too strong when it comes to corporations.

Instead of legalising the corporate theft of creative work, the UK Government should clarify that AI developers must prove that they have obtained a licence to use all of the copyrighted work used to train their AI models, and must publish copies of those licences transparently and in all cases. Failure to comply with this demand should result in their AIs and derivative products not being licensed for use in the UK, and breaches of copyright should result in restitution, including the payment of damages to the creatives involved that exceeds the “cost of doing business” for the AI developer.

 

Key Points

  1. The UK Government’s proposal is effectively an “opt-out” scheme that will allow AI developers to harvest creative works to train their AIs unless a creator specifically disallows it.

  2. However it is, in practice, impossible for a creator to effectively opt out as it would require them to assess whether or not their work has been harvested - which is often not possible as developers do not disclose the sources of their training data.

  3. Furthermore, several AI developers have admitted to or have been caught breaching existing copyright law by training their AIs on pirated works. At least one has admitted that their business model would be “impossible” without breaking the law.

  4. In practice, only the largest corporations will have the means to defend themselves against such copyright breaches or to enforce the licensing of their work to AI companies.

  5. Many of these corporations are, themselves, oligopoly owners of platforms that could force creators to license their work to AI developers as a condition of use.

  6. Rather than granting exemptions to copyright laws that would apply to billionaire-owned corporations but not to individual creators, the UK should enforce existing copyright law and demand that AI developers prove that they have a licence to use works before training AIs with them.

  • See downloadable document for details

  • Yes -
    No - X

    The proposals are overwhelmingly likely to favour AI companies (who will be directly aided to undertake activities that are currently illegal) and large copyright holders (such as media companies) who will be better able to reserve and enforce their rights. Individual creators will find it difficult to discover that their rights have been breached and will find it almost impossible to individually negotiate appropriate licensing agreements or compensation for breaches of those rights. Placing the onus of responsibility on individual rights holders will favour large companies and disadvantage individuals, who will lack collective bargaining power and may even see their negotiated licences held behind non-disclosure agreements, preventing other creators from negotiating fair compensation for their work or for breaches of their rights.

  • Option 0: Copyright and related laws remain as they are

    Option 1: Strengthen copyright requiring licensing in all cases - X

    Option 2: A broad data mining exception

    Option 3: A data mining exception which allows right holders to reserve their rights, supported by transparency measures


    The Government has made it clear that they will not accept Option 0 and this option is poorly enforced anyway. The best alternative option therefore is Option 1 with mandatory and provable licensing of all material used to train AI and strict enforcement of compliance against any attempts to breach, bypass or otherwise circumvent the rights of creators to reserve their work against being used to train AI.

  • Yes

    No - X


    The blanket exemption as proposed is nothing less than legalised copyright theft until proven otherwise, with the burden of proof placed on the copyright holder who may not be notified that their works have been stolen until it is too late to prevent it and the damage is already done.

    This proposal would never be accepted if it was the other way around – if individuals were legally allowed to breach the copyright of large companies until and unless the company complained and then entered into licensing negotiations with each individual who breached those rights. The law cannot be applied unequally. It is telling that the CEO of OpenAI, Sam Altman, has confessed that their business would be “impossible” under current legislation. This should be a reason to prosecute breaches of the law, not to change the law to allow billionaires and large companies to operate “legally” where the same actions undertaken by individuals would remain illegal.

  • The current law states that under copyright, all rights are reserved unless the creator grants a waiver or licence otherwise (this includes Creative Commons licences where some rights are reserved depending on the licence applied). Equality under the law must be maintained. Businesses must operate legally or not at all.

  • AI companies must demonstrate that they have obtained appropriate licences from the copyright holders of all material used in their work or that the material has been entered into the public domain. Additionally, AI companies who cannot demonstrate that their product complies with UK copyright law even if trained outwith the UK must not be allowed to sell their products, services or outputs derived from the AI in the UK.

  • Generative AI (GAI) presents an existential threat to our organisation. Most of our work output is licensed under open and permissive licences such as CC-NC. On the face of it, this proposal would allow GAI companies to scrape our work for commercial purposes (i.e. in defiance of our non-commercial licence) to train their models. The threat lies in the output of those models. One growing sector is the use of chatbots to scrape and summarise websites as a replacement for search engines returning navigable links. In previous years, an internet search taking the form “What does Common Weal say about private rents?” would return a search result linking to our policy papers on the private rented sector.

    With GAI search engine replacement, the chatbot would directly answer “Common Weal believes the following about private rents:…”

    In the former case, the searcher may have found our paper interesting enough to donate to our organisation. However, in the latter case, the searcher is satisfied with the chatbot’s answer, never visits our own website and does not donate to us.

    In this manner, the GAI company has diverted a potential revenue stream from our organisation (and has likely received income in the form of adverts placed next to the chatbot’s answer) based on work that it has scraped from us.

    As stated, our current permissive licence allows for this, but this proposal means that we may be forced to apply a “No-AI” policy to our work and may have to divert scarce resources into enforcing our rights, without any guarantee that this will be sufficient protection or that we will have sufficient resources to protect our rights.

  • We would prefer that developers demonstrate that they have a licence to use material rather than relying on a creator “opt-out” system, which risks developers unlawfully using work that may have a reservation applied to it. Requiring proof of licence would also inherently protect developers against inadvertently breaching copyright.

    If the Government’s proposal takes effect, then developers must take all due care to ensure that work has not been reserved. It is not sufficient to rely on machine readable methods such as a robots.txt file on the creator’s website, as the nature of the internet means that material can easily be shared and found elsewhere. If, for example, someone breaches a creator’s copyright by unlawfully posting a copy of the creator’s work in a place where there is no robots.txt file, then AI developers may scrape that work regardless of the copyright reservation applied by the creator. That one copyright breach occurred somewhere does not permit or excuse further breaches elsewhere, nor would it absolve the AI developer of their responsibility to respect copyright law.

  • At an individual creator level, we would expect that creators whose rights have been breached would be directly compensated at a level not less than what would have been expected from a fair licensing deal, plus appropriate damages for the breach of copyright. These damages should exceed “the cost of doing business” by being higher than the cost of compliance (for “appropriate damages”, AI developers may wish to refer to the damages demanded by record companies during the height of unlawful mp3 sharing in the early 2000s, when several tens of thousands of dollars were often demanded per file shared). Additionally, any products or services found to have relied on breached copyrights must be withdrawn and service users compensated. AI developers who repeatedly or flagrantly breach copyrights in the UK or globally must not benefit from UK public subsidies or public procurement contracts, and should not be given a licence either to trade in the UK or to use UK-based data for their training until they can demonstrate compliance with these regulations.

  • No. While a machine readable reservation is a necessary step to protect rights against careless and unscrupulous data scrapers, it is not sufficient. As stated in Q10, works may be unlawfully shared elsewhere or may be shared in a fashion that strips out file-by-file metadata, which could result in copyrighted material being scraped despite a reservation being applied. Developers must ensure that all work they wish to use in AI training has been appropriately and provably licensed before it is used.

    Individual reservations should also be “always applicable”, holding above and beyond contracts such as the EULA agreements required to access online platforms (e.g. if a social media platform changes its EULA to allow for AI training on material on its site and does not allow for the option to opt out, then this should not affect, and should be subordinate to, any reservation applied by the creator to that work, even if that reservation is not published on that platform).

  • Current copyright law is already sufficiently standardised insofar as a creator reserves all rights to their work and may choose to licence that work as they see fit. The problems facing copyright law do not lie in standardisation but in lack of ability to uphold those rights as well as the Government's attempts to reduce standardisation by giving exemptions to privileged companies at the expense of individuals who are least able to defend their rights.

  • Developers must transparently demonstrate that they hold the licence to all works used in their AI products. If they cannot demonstrate compliance with the law then appropriate sanctions must be in place to encourage such compliance.

  • A Government’s purpose is to set and to enforce the law.

  • No. The current and proposed practice places the burden of proof entirely on creators. With regard to AI developers, it is currently not possible, even in principle, to determine whether copyrights have been breached, as the training data used by developers is often obscured behind proprietary barriers.

    In addition, oligopoly platforms (from record companies to social media platforms – both of which are actively exploring training AI on their “owned” works) often enforce contracts, including the surrender of rights, as the only means of accessing an audience. It can be foreseen that creators will be forced by such platforms to drop their copyright reservations if they wish their work to be used on these platforms. This might meet the needs of the platforms and of the AI developers, but it would be the opposite of meeting the needs of creators and performers.

  • We currently do not license work to AI developers and would generally refuse to do so, despite it being known to us that at least some of our work has already been scraped and used to train AI products. No current AI developer pays us any royalties or licence fees for any of our work that they have used to train their AI, even where they have scraped that work without a licence.

  • “Good licensing practice” should be synonymous with “obeying the law”. No AI company should be allowed to operate in the UK unless it can demonstrate full compliance, including providing the licences agreed with the copyright holders whose work has been used. AI companies that breach UK law, or breach local laws using UK creators’ data, should be prepared to pay adequate compensation and/or face legal sanctions.

  • Yes - X

    No -

  • The Government should not allow companies to operate in the UK unless they can demonstrate compliance, should actively help creators apply reservations to their work and negotiate licences, and should actively assist in the enforcement of rights, including the collection of damages for breaches. Government should also set the minimum standard for collective licensing, especially by engaging with appropriate trade unions (such as the Scottish Artists’ Union) to determine appropriate minimum licensing fees for creative works used to train AI.

  • All creators of works covered by copyright should be duly considered, but the Government should also take into account works protected under “some rights reserved” licences such as the various Creative Commons licences. In particular, relatively permissive licences such as CC-BY (Creative Commons Attribution) would allow AI developers to use a work on condition that the creator is credited. Failure to credit the creator would be just as much of a breach as using work covered by a copyright reservation. Similarly, CC-BY-SA (Creative Commons Attribution-ShareAlike) works contain a clause requiring that work derived from the original must use the same licence as the original (that is, it is “shared alike” with the original). This implies that an AI trained using such a work could not have a copyright applied to it or its output – only a similar CC-BY-SA licence – without being considered in breach of the licence. Note that it is possible in both of these examples for an AI developer to be in breach of these copyright licences even if their AI product is not used for commercial purposes.

  • Yes, all training materials must be disclosed along with the sources of the material and proof that licences were obtained for all the work used.

  • All copyright holders of all work used in the inputs should be identified, along with a copy of the licence granting permission to use the work.

  • Web crawlers should disclose a list of all websites crawled within a particular dataset.

  • AI developers must comply with copyright law, must ensure that they only use material that they have obtained a licence to use, and must disclose – with named copyright holders – the full list of material used to train their systems. Anything less is an existential threat to the creators whose work they have used without licence.

  • Sam Altman, CEO of OpenAI, indicated in January 2024 that it was “impossible” for his company to train AI without breaching copyright. If any other business in any other sector said that it was “impossible” for them to trade in the UK without breaking the law then they would not be granted a licence to trade. Given those comments, we would anticipate that the cost of complying with the law would exceed the point of business viability. If this is the case, then that is the price of protecting creators from being exploited by billionaires and their companies and is a price worth paying for the public interest.

  • Regulatory underpinning of compliance is essential as is a Government commitment to prosecute breaches of those laws and regulations to their full extent – just as they would if an individual attempted to breach the copyright of the billionaire AI developers.

  • We advocate that the EU should follow similar guidelines to those we advocate here for the UK Government.

  • Mandate through UK GDPR or other appropriate legislation that all data drawn from UK sources must comply with UK laws and that all AI outputs deployed in the UK use sources that comply with UK law, especially AI used in UK public tenders or which has benefited from public subsidy, investment, tax breaks or any other form of public assistance.

  • The Government should reiterate its commitment to the Berne Convention, in particular that rights holders have the exclusive right to decide how reproductions of their works are licensed, and that Member States do not issue blanket or specific exemptions to copyright that conflict with the normal exploitation of a work and which unreasonably prejudice the interests of the author. Unauthorised use of work to train AI blatantly overrides both of these principles. AI models trained outwith the UK, with or without material from UK rights holders, must not be licensed for use in the UK or, if used, should be investigated for breaches of copyright. This ensures fairness for AI developers by subjecting them to the same laws as everyone else.

  • If a temporary copy of material results in a permanent output (e.g. if a work is used to train an AI, the work is then deleted but the AI is able to use that training in future outputs) then the copied material cannot be regarded as “temporary”. Many copyright licences do not include a fair dealing/fair use policy for “temporary” copies used in this way and thus a blanket exception remains inappropriate.

  • The intended purpose of the temporary copies exception is effectively to allow copies of work to “pass through” technological processes without affecting them (examples are copies made to display on a screen or which are held in a browser cache). As stated in Q31, if an act of training AI with a “temporary” copy of a work results in a permanent change to the behaviour of the AI, the use of the work itself cannot be regarded as temporary and thus cannot be within the scope of the exemption.

  • No. Several copyright licences allow for non-commercial use (for example, CC-NC) but several explicitly do not. All material should be appropriately licensed. Note that, as per our answer to Q21, it is possible for a licence that grants non-commercial permission to still be breached – for example, if a CC-BY-NC licensed work is used non-commercially but without appropriate attribution to the creator.

  • All firms should be held to the same standards under the law; however, there should be scope for larger firms to face greater punishment for breaches – especially where they hold oligopoly platform power. Damages may include not just direct damages for individual breaches, but fines as a proportion of revenue to reflect the overall damage to creative sectors caused by large companies abusing their oligopoly or monopoly power. In all cases, fines should certainly exceed “the cost of doing business” or they will fail to be an adequate deterrent.

  • Yes - X

    No

    While Common Weal issues our creative work under a Creative Commons Non-Commercial licence with generally permissive terms for reuse, we have identified significant and even existential risks to our organisation should that licence be abused or should reservations be ignored or overridden by AI developers. Under the proposed legislation, we would be forced to bear those risks regardless of any attempt to apply more restrictive licensing to our work.

  • We would oppose any interpretation of any law that favours billionaires or oligopoly companies over normal individuals.

  • Yes - X

    No

    The purpose of copyright is to protect and encourage the creative endeavours of sentient artists. The requirement for originality should be clarified to make clear that the outputs of non-sentient generative AI cannot, in principle, be regarded as “original” even if a sentient artist could have produced a similar work and claimed a copyright on it. The rights of artists should be at the core of the legislation, not the rights of the tools used.

  • See answer to Q37

  • Significant positive impact

    Minor positive impact

    No impact

    Minor negative impact

    Significant negative impact - X


    A change to the provision opens the possibility of our works being used to train an AI, that AI being used to summarise or otherwise output our work, and a more restrictive copyright then being claimed on that output than we have applied to the originals, in a manner that impacts our ability to generate and collect revenue from our work.

  • Yes - X

    No -

    As stated above, the purpose of copyright is to encourage the creative endeavours of sentient artists. The only purpose of granting copyrights to computer-generated works would be to crowd out the efforts of those artists with work that may only have been possible to create by breaching their copyrights – as both Meta and OpenAI have admitted has been the case.

  • The economic impact of allowing corporations to claim copyright over non-human generated outputs would be the enrichment of a few oligopolistic companies with the resources to harvest the creative endeavours of humanity, while making it nearly impossible for creators to create new, legally protected works.

  • Significant positive impact

    Minor positive impact

    No impact

    Minor negative impact

    Significant negative impact - X


    The Government’s proposed change to the provision opens the possibility of our works being used to train an AI, that AI being used to summarise or otherwise output our work, and a more restrictive copyright then being claimed on that output than we have applied to the originals, in a manner that impacts our ability to generate and collect revenue from our work.

  • No. It is currently impossible, even in principle, to discover whether a particular creative work (or the entire portfolio of an artist) has been used illegally by an AI developer. AI trained on such works would breach copyright even if the actual outputs do not contain any visible infringing material (though it has been shown that many AI image generators can output works containing artist or company watermarks, or even near complete copies of entire works). There is also a great imbalance of power in the enforcement of rights, as there is no effective legal mechanism for individual artists to pursue redress if their copyrights are infringed and the Government, via this consultation, appears to be unwilling to adequately fine or otherwise prosecute infringing AI developers.

  • AI providers should ensure that they are licensed to use all works in their training databases and must take the normal steps any other creator would take prior to publication to ensure that their work does not intentionally or unintentionally infringe copyright.

  • Yes - X

    No

    AI-labelling is particularly important in cases where work relates to public figures or events as this has already proven to be a major source of misinformation but it is important in all other cases including where an artist’s “style” can be replicated in a manner that doesn’t breach copyright. These labels should be human and machine readable. In images and video, labelling should be in the form of an embedded watermark that is clearly visible even when the image is substantially cropped. In text, the disclaimer should be placed prominently at the head of the text to warn that AI has been used to generate the output and that it is a breach of licence to distribute the text without a similar warning. In audio, an audible watermark should be prominently placed within the output. In all digital contexts, additional watermarks should be included within the metadata of the file. Additional protections could be included in the form of less obtrusive machine readable watermarks analogous to some of the protections included in currency counterfeiting protections.

  • Government must ensure that all emerging tools comply with the law and that laws and regulations are designed to work for people, not profit.

  • We would advocate for the EU to follow the same principles as we advocate that the UK does.

  • In practice, users currently have little control, and the proposed exceptions to copyright would only weaken those controls. This is only a generalisation, however, as there is at least some anecdotal evidence that some AI image generators perform significantly worse at replicating the features of some public figures than others, which may suggest that efforts have been made (either proactively by the developer or after negotiation with the public figure) not to include images of the figure in question in the training data. As with the copyright of creative works, the images and voices of people should not be included within AI training data without express permission and licence, and that permission must not be tied to images posted online for purposes other than AI training (e.g. someone posting their family photos to Facebook should not have to discover that Meta, or another AI developer, has changed their EULA to allow the photo to be scraped and replicated in an AI image generator).

  • We have not yet actively used digital replicas. We have explored their use, particularly in video voiceover or the production of audio versions of policy papers, but have rejected this as it would normalise the disempowering of paid voice actors.

  • It is not sufficiently clear – as evidenced by the fact that the Government is attempting to weaken copyright law. Government should instead mandate that all AI products that interact with copyright works are licensed and are acting within the scope of that licence, and should actively assist copyright holders in asserting their rights by fining breaches (to pay the damages incurred) and by withdrawing AI product licences from use within the UK.

  • Synthetic data is already showing its limits, as it effectively inbreeds the dataset of the AI, leading to loss of output quality. There is already a growing demand for data produced specifically to feed into AI (not dissimilar to ashcan copy films, made not for an audience but to meet non-artistic obligations such as the retention of legal rights to characters); however, this is leading to low grade creative works being created at even lower compensation rates – further disempowering artists AND lowering the quality of the output of AI. Synthetic data should not be used outwith a research context, but this is probably a matter of company policy rather than a regulatory issue.

  • Government should remember that the purpose of copyright is to protect artists. Copyright has already been greatly weakened by allowing it to become a transferable product (and allowing rights to be held by corporations and other bodies who are not natural persons) and by extending the length of copyright beyond the lifetime of the creator (a move that, again, only benefits immortal corporations). Any future review of copyright to deal with new developments should begin with “creator-first” principles at its core.
