Introduction
Recently, GitHub Copilot has been the subject of much discussion, notably after it reproduced the well-known fast inverse square root algorithm from Quake III Arena verbatim. GitHub trains Copilot on our open source code and then offers it back to us as a service. Should Copilot therefore be open source, and does it make sense for us to pay for it? The Free Software Foundation (FSF) has accused GitHub of acting unjustly, and it is also preparing for the many problems Copilot may face in the future.
Open source and open source licenses
Open source does not mean free
Open source means that a program's source code is openly available. To explain it, I have to mention its opposites: proprietary software and Service as a Software Substitute (SaaSS). Proprietary software includes programs we use every day, such as Windows and macOS. We have no way to get their source code, so it is hard to know what the code actually does. Both, for example, often ask us to send error reports, which is itself a security risk and may quietly leak user privacy. SaaSS replaces software with a service: it requires us to send our data to a server, and sometimes all of the computation is carried out there. If we compare user privacy to money, proprietary software may steal our money, while with SaaSS we hand the money directly to someone else's server.

Therefore, beyond reflecting the Internet's spirit of sharing and letting users freely modify source code, the most important thing open source does is protect user privacy: we can fully understand what a program may do, rather than knowing nothing about the code we run.
Open source licenses in commerce
Various types of open source licenses have emerged to standardize the different scenarios in which open source is used. Here, the common licenses are briefly classified according to their characteristics.
The GPL is more a license than a contract. A license permits you to do something you otherwise could not do: to go fishing, for example, I have to get a fishing permit from the local municipality. A contract, by contrast, involves reciprocal obligations: the government gives me a permit, but in return I must hand over half of the fish I catch.
Of course, a license does not mean I have no obligations at all. Sticking with the fishing permit, the government may restrict when I fish and which species I catch, and if I do not comply, my permit may be revoked. The GPL works the same way. The preamble of GPL version 3 briefly explains the rights and obligations it grants you:
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
This may make the GPL sound like a virus that infects everything it touches, but it is not that scary; many people simply misunderstand it. As emphasized above, the GPL only guarantees your freedom to distribute the software and attaches the corresponding obligations. In other words, if you cannot meet those obligations, you can choose to replace the GPL-covered code or stop distributing the program:
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program.
If GPL-licensed code is used as a training set, is the trained algorithm bound by the GPL?
This case seems closest to what the GPL calls object code:
"Object code" means any non-source form of a work. [...] The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities.
If the trained model counts as a non-source form of the code it was trained on, it should also be subject to the GPL's requirements on object code. Of course, whether a model really is such an "other form" of the training code is always open to argument.
But there is another way to approach this question: does GitHub itself have the right to use code uploaded by users as a dataset?
What rights GitHub has over content uploaded by users
In its Terms of Service, GitHub sets out its rights to content uploaded by users:
4. License Grant to Us
We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.
This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.
These terms limit GitHub's analysis of uploaded content on its servers to what is necessary to provide the Service to users.
Links for further reading
1. GPLv3 Chinese translation
2. English version of GPLv3
3. GitHub Terms of Service