A year ago, GitHub/Microsoft presented the world with an AI-powered tool capable of autocompleting and generating code. It named the tool GitHub Copilot, and a few days ago it left its free testing phase and launched commercially. But beyond its business model, Copilot has also given people plenty to talk about over the past year for legal reasons…
…because generating new code from the human-written code it had been ‘fed’ greatly increases the odds of reproducing substantial snippets of someone else’s work, thus exposing numerous software projects to license compliance issues.
The creator of ‘Linux for M1’ weighs in
Hector Martin, who leads the development of Asahi Linux (a GNU/Linux distribution that can be installed on M1 Macs), has shared his opinion in a Twitter thread on the relationship between neural networks, copyright and source code.
“Neural networks, whether artificial or biological, don’t erase copyright. If I read some source code and write identical or very similar source code, it’s a derivative work of the original. Same goes for GitHub Copilot.”
Martin, who to avoid inadvertent copyright violations prefers to bar anyone who has had access to Apple code from contributing to Asahi, says that his position “should not be controversial”, and that accepting code that breaks that rule requires “a lot of trust”.
“Microsoft asks you to trust a neural network that can’t think, has no innate notion of copyright, can’t engage in true creativity, and bears no moral responsibility to its users. A neural network that they themselves don’t understand.
Thanks, but no thanks.”
Martin likens using Copilot to a game of Russian roulette that randomly combines pre-existing code, “protected by copyright and under heterogeneous licenses”. Microsoft could only have prevented Copilot from ‘spitting out’ already licensed code in two ways:
By teaching its AI to understand and apply copyright rules, and having hard numbers showing a very high success rate in detecting (and rejecting) copyrighted code snippets. “‘Trust us’ is not enough”.
By developing ‘magic’ neural networks that could guarantee that they only generate original work “thanks to some kind of embedded model that distills the ‘functionality’ of the original work without the copyrighted elements; I’d love to see how they do that.”
“If a single example can be found of Copilot having generated copyrighted code in any relevant proportion, in my opinion, that is evidence enough that Copilot, as a service and by the way Microsoft created it, constitutes a massive infringement of copyright”.
Martin goes on to link to a case in which Copilot did indeed generate lines copied from an original whose license does not allow commercial use. The author of the tweet knows this well, because he wrote the original code himself.
Explored github copilot, a paid service, to see if it encodes code from repositories w/ restrictive licenses.
I checked if it had code I had written at my previous employer that has a license allowing its use only for free games and requiring attaching the license.
yeah it does pic.twitter.com/JMbXNgOF3Z
— Chris Green (@ChrisGr93091552) June 22, 2022
Image | Based on original by Christ Waits