I want AIs that clean my apartment while I watch rugby, not AIs that watch rugby while I clean the apartment. ;)
In seriousness, this is a cool project and shows how sophisticated an analysis LLMs can do in a plug-and-play manner. They may not always be the best solution, but they make a fantastic baseline that can be deployed and adapted to a use case in less than an hour.
Better title: "LLM OCR on Rugby screenshots to read score and clock"
The moment I started reading this, I got reminded of this recent study: https://arxiv.org/html/2503.10212v1
The scope is a bit different. The study uses an LLM to interpret pose estimation data and describe the behavior in each frame. The output is text which can be used to create embeddings of behavior. As someone who works in ethology, that's a clever (but maybe expensive) idea.
I think the author could use something similar, with multi-person pose estimation models.
Reading the scoreboard from a TV screen and selling that data is restricted in many jurisdictions. This work is pretty naive, I think.
Has there ever been a hacker whose top priority is ensuring compliance with every regulation in every jurisdiction worldwide?
I'd say it isn't in the US though, at least there's precedent that makes me think that: https://en.m.wikipedia.org/wiki/National_Basketball_Ass%27n_...
I don't think it's possible to be in compliance with every law in every jurisdiction simultaneously. There are over 300,000 federal laws in the US, and apparently no one knows how many laws each of the 50 states has. And that's just one of the world's 195 countries.
Good thing they're in only one jurisdiction, not many.
Ugh. You’re the worst
Why the focus on scorekeeping? I feel like an AI model is overkill here when you have text-based sources readily available, such as news apps, Twitter feeds, and apps like Livescore, which would be easier and cheaper to scrape. They probably also cover matches that aren't televised.
I'd be curious to see what useful insights could be gleaned from the match commentary. You have the main commentator giving play-by-play objective reporting, and then a 'colour' commentator giving some subjective analysis during breaks in play. I bet there are a lot of interesting ways this could be used.
The AI's job as described in this article is two-fold:
- The relatively trivial task of extracting textual data from the screen.
- The task of obfuscating that they're publishing other people's work as their own.
When I clicked the article I assumed they'd try to automatically construct analysis of the game by using AI to analyze frames of the game, but that's not what they are doing. They are extracting some trivial information from the frames, and then they process the audio of the referee mic and commentary.
In other words, the analysis has already been done by humans and they just want to re-publish this analysis as their own, without paying money for it. So they run it through an AI because in today's legal environment this seems to completely exempt you from copyright infringement or plagiarism laws.
Perhaps the most surprising thing about the whole LLM revolution is how quickly attitudes about IP have shifted in the HN and similar communities.
A few years ago, media companies were rent-seeking parasites who leveraged the jack-booted thugs of law enforcement to protect an artificial monopoly using IP laws that were massive overreach and contrary to the interests of humanity.
Today, suddenly, media companies are pillars of society whose valuable contributions must be protected from the scourge of theft by everything from VC backed AI companies to armchair hackers who don’t respect the sanctity of IP.
It’s amazing how mutable these principles are. I’m sure plenty of people are somewhere between the two extremes, but the shift is so dramatic that I am 100% sure many individuals have completely revised their opinions of IP companies based largely on worries about their own work being disrupted.
At the very least it should create some empathy for the lawyers and business folk we all despised for their rent-seeking blah blah blah. They were just honestly espousing the positions their financial incentives aligned them to.
How do you know you're seeing peoples' opinions change, and not just a change in which people express their opinions?
That said, I'd personally be happy if LLMs caused the death (or drastic weakening) of copyright and IP laws. As it is now, though, with no copyright for AIs but the same old copyright for humans, it's the worst of both worlds.
I know people personally with strong gripes about AI "infringement" (in quotes because I believe people are just confused about how these models work), and every single one of them -100%- have a stash of pirated media they casually accumulated over the years.
People are in it for themselves. When you are young everyone has righteous ideals, but as society's trends ebb and flow, you realize that just about everyone was simply virtue signalling, and few people stay committed even to their own detriment.
2005: "End copyright! Trash IP law! Liberate media!"
2025: "Strengthen Copyright! Extend IP Protection! Protect makers!"
>I know people personally with strong gripes about AI "infringement" (in quotes because I believe people are just confused about how these models work), and every single one of them -100%- have a stash of pirated media they casually accumulated over the years.
I don't know them, of course, but it is a consistent and, imho, reasonable position to be against copyright in principle and yet, so long as we normal people have to live in fear of copyright, ask for it to be applied to AI as well.
It is even reasonable IMO to be against copyright for individuals but in favour of copyright for businesses. That's how it de facto works in a lot of places anyway.
Not commenting on general trends, but I don't think my opinion on IP shifted massively as a result of the rise of LLMs. I can summarize it as follows:
- It seems desirable to have some system that allows creatives to be paid for their work.
- Whether current IP law is the best system we can come up with is highly debatable. But nevertheless it is the system we have, and its existence is to some extent justified.
- If we look at the "perfect case" where IP law functions as intended (for example, an author publishes a book in which they invested years of their life), then breaking IP law (sharing that author's work without their consent) in that instance seems, to me, immoral.
- Nevertheless there are plenty of excesses in the system where I would judge that the application of IP law is unjustified and breaking the law is morally justified (naturally I still don't recommend it). This includes, for example, paywalled papers from publicly-funded research, works that can no longer reasonably be purchased (for example games for old consoles), most if not all software patents, ...
So the question simply boils down to: is sports commentary justifiably protected under IP law? I think the answer is a pretty clear-cut "yes" here, I don't see how it falls under any case of IP law overreach.
The only interesting part of the model's output was
{ "current_play": "ruck", }
So the vision model can correctly identify that there's a ruck going on and that the ball is most likely in the ruck.
Why not build on this? Which team is in possession? Who was the ball carrier at the start of the ruck, and who tackled him? Who joined the ruck, and how quickly did they get there? How quickly did the attacking team get the ball back in hand, or the defending team turn over possession? What would be a good option for the outhalf if he got the ball right now?
All of these except the last would be straightforward enough for a human observer with basic rugby knowledge going through the footage frame by frame, and I bet it would be really valuable to analysts. It seems like computer vision technology is at a stage where this could be automated too.
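Something like this seems feasible today. As a sketch (not the article's actual pipeline; the model, prompt, and field names are all my own assumptions), you could ask a vision model for a richer structured record per frame, e.g. with the OpenAI Python SDK:

    import base64
    from openai import OpenAI

    PROMPT = """You are looking at one frame of a rugby match.
    Return JSON with exactly these keys:
      current_play:        one of "ruck", "maul", "scrum", "lineout", "open_play"
      team_in_possession:  "home", "away", or "unclear"
      ball_carrier_number: jersey number of the carrier, or null
      players_in_ruck:     count of players bound to the ruck, or null
    If something is not visible in the frame, use null rather than guessing."""

    client = OpenAI()

    def describe_frame(path):
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": PROMPT},
                    {"type": "image_url",
                     "image_url": {"url": "data:image/png;base64," + b64}},
                ],
            }],
        )
        return resp.choices[0].message.content

    print(describe_frame("frame_0420.png"))

The timing questions (how quickly players joined, how fast the ball came back) would need these per-frame records stitched together across frames, which is where the article's five-second sampling would really hurt.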
Multiple companies sell rugby data at various levels of granularity. I don't know if rugby has all the toys (e.g. full tracking beyond wearables) that soccer or American football have, because there's less money sloshing around.
Most pros now have the vests, but they also tend to have additional tech in their mouth guards. This is mostly for CTE monitoring, but I imagine there's other data that can be extracted.
ESPN has free play-by-play stuff like this on their website for some other sports.
not sure if it is done by a human or not
curious how “an AI can do it” yields much difference in terms of result for the casual watcher
> curious how “an AI can do it” yields much difference in terms of result for the casual watcher
An AI can do it in volume, and therefore cheaper. I don't think a human could do everything I said in real time - maybe with a lot of training and custom software.
A human could transcribe the scoreboard, but the article still thinks that's an interesting application of cutting-edge machine vision.
Humans can do _most_ of what you said in real time, both providers using bespoke software and club analysts using off-the-shelf stuff like Sportscode.

For full positional data on every player in every frame, then yes, computer vision is doing most of the work, but the quality isn't always great. Providers with in-stadium multi-camera systems produce great data, but you don't necessarily have access to the size of dataset you'd want for recruitment, so lower-quality broadcast tracking exists (with all the problems you can imagine: missing players, occlusions, crazy camerawork, etc.). Most clubs also have wearables for their own analysis.

Almost every fully automated broadcast tracking solution has hit a wall (sometimes on the first day of a season) in terms of quality, often solved only by human QA or by just discarding some games, so this is far from a completely solved problem. Fun domain to work in, but lots of horrible edge cases.
If this is the final product, not much difference at all.
But where the human version is pretty much as far as it’s going to go, this is v0.01 of the AI version. Pretty soon the AI will be predicting what will happen next, commenting on whether this was a good idea (based on statistics), and letting the viewer ask questions about what exactly happened and why.
I love that as soon as he writes,
> The plan was simple.
You know you're in for a funny read.
More seriously though, the JSON example from a vision language model is interesting but does not take into account how much extrapolation (hallucination) the model will insert over time.
For instance, even if not visible in the image, your VLM will probably start inserting details (such as the color of the team's jersey) based on knowing the team's three-letter identifier.
So the reliability of the system will go down over time, and it probably compounds if you're using some of that info to feed further steps in the loop.
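One cheap mitigation: validate every response against a strict schema and null out anything the model could only have guessed. A sketch (the field names match the hypothetical schema upthread, not anything from the article):

    ALLOWED_PLAYS = {"ruck", "maul", "scrum", "lineout", "open_play"}

    def sanitize(record):
        out = {}
        play = record.get("current_play")
        out["current_play"] = play if play in ALLOWED_PLAYS else None
        poss = record.get("team_in_possession")
        out["team_in_possession"] = poss if poss in {"home", "away"} else None
        n = record.get("players_in_ruck")
        # 15 players a side, so anything outside 0-30 is a hallucination.
        out["players_in_ruck"] = n if isinstance(n, int) and 0 <= n <= 30 else None
        return out

And never feed the model's own earlier output back in as context; that's exactly how the compounding you describe gets started.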
I wrote a similar script that used a TV tuner during the last World Cup. Since I had an ATSC source, I was able to just pull the CTA-708 captions directly and with little delay.
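For anyone wanting to try the same, a tool like CCExtractor can pull embedded CEA-608/708 captions out of an MPEG-TS recording; the invocation below is a sketch from memory, so check the docs:

    import subprocess

    # Extract captions from a transport-stream recording into an .srt file.
    subprocess.run(["ccextractor", "recording.ts", "-o", "captions.srt"], check=True)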
My observation is that watching rugby on TV is not the same as watching a rugby match. You're watching something where choices have already been made about what you get to see, so your model is restricted in what it can see.
You really need to take a 'full pitch' feed directly from the venue, rather than what is broadcast.
I'm not a rugger bugger, but every 5 seconds doesn't really seem often enough to be taking screenshots. In soccer, anyway, a lot can happen in 5 seconds.
My American football brain had the same reaction. Many of the most pivotal plays are replayed in slow motion as commentators and spectators debate what actually happened and whether the refs got the call right. Also, the average play (i.e. a 'down') is 4-5 seconds, so a five-second interval captures barely one frame per play, which is not nearly enough data to determine what is going on.
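Sampling densely is cheap locally even if you only send a subset of frames to the model. A sketch with OpenCV (filenames and rate are placeholders):

    import cv2

    cap = cv2.VideoCapture("match.mp4")
    fps = cap.get(cv2.CAP_PROP_FPS) or 25  # fall back if metadata is missing
    step = int(fps)  # keep one frame per second
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            cv2.imwrite("frame_%06d.png" % i, frame)
        i += 1
    cap.release()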
I don't quite get how diffing frames allows you to find the scores.
TFA mentions comparing a frame with and without the overlay, but how do you generate the frame without it? If you can already do that, what's useful about the diff?
He's diffing the frames, and the only pixels that stay the same are the UI. From the diff he doesn't directly get a legible UI (see the example, it's illegible), but he can extract the POSITION of the UI on the screen by finding all the non-red pixels.
And then he does a good ol' regular crop on the original image to get the UI excerpt to feed the vision model.
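In code, the trick looks something like this (untested sketch with OpenCV; the pixels that barely change between two frames taken seconds apart are the static overlay, and their bounding box is the crop region):

    import cv2
    import numpy as np

    def find_overlay_bbox(frame_a, frame_b, threshold=10):
        diff = cv2.absdiff(frame_a, frame_b)
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        static = (gray < threshold).astype(np.uint8) * 255
        # Erode away small static patches (flat grass, sky) so only the
        # large solid overlay region survives.
        static = cv2.morphologyEx(static, cv2.MORPH_OPEN, np.ones((9, 9), np.uint8))
        ys, xs = np.nonzero(static)
        if xs.size == 0:
            return None
        return (xs.min(), ys.min(), xs.max() - xs.min() + 1, ys.max() - ys.min() + 1)

    a = cv2.imread("frame_0001.png")
    b = cv2.imread("frame_0002.png")  # a few seconds later
    box = find_overlay_bbox(a, b)
    if box:
        x, y, w, h = box
        cv2.imwrite("scoreboard.png", a[y:y + h, x:x + w])  # crop for the vision model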
I think the text has it backwards: it's diffing two frames, and the areas that stay the same are where the scoreboard is, since the scoreboard doesn't change between frames but everything else does.
I was also confused by this. I think you're right, but in the original text they specifically mention a 'static background' that they remove, so it's not just a simple 'wrong way round' error; it's a fundamental misunderstanding of what's happening. Makes me wonder if the author actually knew what they were doing, or just used an LLM to vibe-code everything.
So rugby is missing a lot of data besides the scoreline, and they created an AI that can extract the scoreline.
Does this mean there's probably AI that's already watching high profile football (soccer) matches?
Depends on your definition of AI, but yes, lots of them, and not just the high profile matches.
> Sending a full-resolution screenshot every five seconds gets expensive fast.
For now.
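In the meantime, the obvious lever is to shrink what you send: crop to the scoreboard region first, then downscale. A sketch with Pillow (the crop box is a placeholder):

    from PIL import Image

    def shrink_for_vlm(path, box=(50, 40, 450, 110), max_width=512):
        img = Image.open(path).crop(box)  # (left, upper, right, lower)
        if img.width > max_width:
            ratio = max_width / img.width
            img = img.resize((max_width, int(img.height * ratio)))
        return img

    shrink_for_vlm("frame_0420.png").save("frame_0420_small.png")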
Why does YOLO not work?
Aw, I thought this would be about an AI cat that makes wrong commentary that you can have pointless arguments with. There should be one.
I want AI that does my job while I watch rugby
> We can’t hire analysts to watch every match and enter data manually.
I'm surprised there aren't enough fans willing to do that, especially if you could gamify it.
This is a position in baseball. Here's a radio piece from two weeks ago about the official Fenway Park scorekeeper: https://www.wbur.org/news/2025/03/30/fenway-park-boston-base...
TL;DR: It extracts the score from the video and gets text from the commentary in the audio.
I was hoping for more.