Of course real scientists don't publish. Okay, I'm trolling, but I know that the science published by Google et al. has been thoroughly screened for commercial advantage before we see it. Try doing a lit search on hypersonics and you can see where the Russians _stop_ publishing.
Copyright is about extracting a perpetual tax on culture from the peasants. It's not about hobbling the march of progress itself, not when the people who get to levy the culture tax will eventually get to cash in on the wonders that will ensue. Didn't anyone ever inform you?
This is pretty clearly an instance of the right people (i.e. rich people) being allowed to pirate, and the poor people get in trouble for copyrighted music in the background of some video clip.
The hypocrisy really grinds my gears.
Yeah... Laws exist only if they are applied equally, else they become something else.
And a monetary fine is just the cost of doing business if you are big enough.
>Yeah... Laws exist only if they are applied equally, else they become something else.
This is bullshit and you know it. No one would ever want to go to the trouble, expense, and misery of enforcing laws where the so-called victims do not feel wronged and refuse to press complaints to the authorities. And so, copyright enforcement will only ever occur where the rightsholders wish to enforce it. The few lawsuits you see here and there aren't about a genuine sentiment of being wronged, but about the rightsholders wanting their cuts. Backroom deals are already being drawn up, and everyone's cool with it.
This is the status quo. If you don't like it, suggest something better, but don't be naive that the law as it stands now could be applied sanely or pragmatically.
Sorry, but what you just said is bullshit, and I'm not even sure you know it.
Plenty of copyright holders don't want LLMs trained on their creations, regardless of cut. There is no voice for them.
The general statement of laws being applied differently by size is also more and more obvious in the recent climate.
YouTube routinely demonetizes and suspends people for "copyright violations" with no recourse, including people using their own material.
SciHub is banned and blocked in several countries, there are default rulings against it in the US etc. Oh, and the White House called it "one of the most flagrant notorious market sites in the world".
But when you're an AI company with billions of dollars? Well, you're doing it for the good of humanity or something, and of course everything you do is fair use etc.
What chances do I have of winning a lawsuit against copilot for violating my copyright?
See? Laws are not applied equally.
The use of paywalled scientific articles to train AI is one place where I think we have to just draw the line and say, this has to be allowed or US AI is simply going to get gutted and replaced by international competitors who have no respect for copyright law.
Sorry but this is just a competitive reality and the content matters A LOT. Sucks that Elsevier gambled badly on the scientific community putting up with overpriced subscriptions forever, but their concerns can't dictate national policy on this.
Absolutely agree. Realistically, everyone was playing around with this thing because everyone was using Sci Hub, /r/Scholar, and god knows what else to get PDFs. This is one of those things where the reality is well-known and people pretend that something is actually going on in copyright enforcement.
And if I'm being honest, I'm tired of the International Brotherhood of Stevedores[0] style of shredding human productivity to protect some special interest group. If Elsevier died tomorrow, we'd lose a curation function to scientific papers, true, but we wouldn't lose the science itself. And while the curation on scientific output is clearly valuable - China is suffering the lack of this while producing prodigious science - I think it's far less important than the scientific output itself. This is especially true of US science.
0: IBS, the AMA, pharmacists, teacher unions, firefighter unions, tax preparers: the distributed cost to society is huge because we decided on protecting these special interest groups. Blocking AI would be a bridge too far.
I think this is one reason "piracy for AI" in general is tolerated. Anyone with a clear understanding of real world dynamics realizes that if a foreign country that lacks scruples develops "AGI", for lack of a better term, then you're fucked. This is in a sense a nuclear arms race.
The same applies between companies, by the way, hence the "AI bubble".
The other reason "piracy for AI" is tolerated is that it's not at all clear how to legislate or regulate it. You might think it's a cut-and-dried case, but lots of other people think the same about the opposite conclusion.
I agree, but only in the sense that I think any amount of copyright protection for scientific papers is absolutely absurd. The creativity involved in papers is minimal and a good chunk of that research is funded by the government, so paywalling it is criminally unethical.
Also, if we're going to bin the entire concept of copyright, can we at least be equal about it? I'd rather not live in a world where humans labor for the remnants of their culture in the content mines while clankers[0] feast on an endless stream of training data.
[0] Fake racial slur for robots or other AI systems.
I agree. I think that copyright should be abolished entirely, especially for scientific articles (if they are good quality scientific research then I think they would be too important to be copyrighted, in addition to the other stuff you mention), but also for anything else too.
Nevertheless I think there is another thing against the LLM training, which is that the scraping seems to be excessive (although it could be made less excessive; there are many ways to help with making it less excessive) and I think it requires too much power (although I don't really know a lot about it).
These are two separate issues, though.
> I think that copyright should be abolished entirely, especially for scientific articles
You know, it is really the CC-BY style that most science people care about. Same goes for MIT/BSD open source licenses, while GPL I suppose is on the side of CC-BY-SA.