ebbi 3 hours ago

Can someone give me an ELI5 on what this is/does? I'm a non-coder who has recently started diving into the world of AI, but I'm not sure what this is or where it sits in context with the tools I currently use (ChatGPT, Claude Code, Cursor).

  • fragmede 3 hours ago

    Leading AI researcher Andrej Karpathy created a small LLM, nanochat, using publicly available training data and $100 worth of (high-end) rented cloud compute time. The original post is https://github.com/karpathy/nanochat/discussions/1 The post this is on is commentary from simonw about Karpathy's post. The model he created is, um, not very good, so you wouldn't want to use it for anything other than learning and entertainment. But it's really small, which means it runs on small, underpowered computers. There's a web UI, so you interact with it just like ChatGPT on your little computer. Also for learning purposes, Karpathy shared the code he used to create nanochat, so you can run it at home, create your own model, and chat with it.

    Given that GPT-5 reportedly cost $100 million to train, being able to create one, even a terrible one, for $100, shows how the field keeps marching on.

    • ebbi 2 hours ago

      Thank you! So if I were to, say, build my own SaaS product that I wanted AI capabilities in, could I theoretically use nanochat to train a model on my domain-specific data, giving me a domain-specific LLM to use in my product without recurring fees from an API provider like OpenAI?

      • fragmede 31 minutes ago

        Technically yes, but nanochat is stripped down and targeted more towards educating AI researchers, so I wouldn't recommend you use it for that (its output is terrible compared to ChatGPT's — intentionally, it's a teaching tool). Nothing's stopping you, but for that goal I'd recommend starting with one of the downloadable, permissively licensed models, like a newer Qwen3, and fine-tuning it. Google Colab has notebooks specifically for that.

        Once you have your fine-tuned model, you won't be paying OpenAI to use it, but it still needs to run somewhere, and those somewheres range in quality and price. Models come in various shapes and sizes, and the bigger the model, the beefier (and more expensive to rent) the machine you'll need to operate this SaaS business.

Tepix 14 hours ago

Amazingly, you can also do it on smaller hardware!

From the readme:

All code will run just fine on even a single GPU by omitting torchrun, and will produce ~identical results (code will automatically switch to gradient accumulation), but you'll have to wait 8 times longer. If your GPU(s) have less than 80GB, you'll have to tune some of the hyperparameters or you will OOM / run out of VRAM. Look for --device_batch_size in the scripts and reduce it until things fit. E.g. from 32 (default) to 16, 8, 4, 2, or even 1. Less than that you'll have to know a bit more what you're doing and get more creative.
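To make the README's advice concrete, here's a rough sketch of what that looks like on the command line. The module path and `--depth` value are taken from the nanochat README's example invocation and may differ in your checkout; check the repo's `scripts/` directory for the exact entry points and flags.

```shell
# Multi-GPU (8x) run, roughly as shown in the nanochat README:
torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=20

# Single-GPU equivalent: drop torchrun and the code falls back to
# gradient accumulation automatically (~8x slower, ~identical results).
# On a GPU with less than 80GB of VRAM, halve --device_batch_size
# (32 -> 16 -> 8 -> 4 -> 2 -> 1) until it stops OOMing:
python -m scripts.base_train -- --device_batch_size=8
```

The key point is that `--device_batch_size` only controls how many samples fit on the card at once; gradient accumulation keeps the effective batch size the same, so training quality shouldn't change, just wall-clock time.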

ChrisArchitect a day ago

[flagged]

  • marmaglade 18 hours ago

    It’s not, it’s a different blog post on the same thing

    • ChrisArchitect 9 hours ago

      The point is it's a duplicate discussion. Different article doesn't matter, especially when it hardly adds anything. The discussion is over there.

    • Kiro 14 hours ago

      They always do that, linking to a thread for another article claiming it's a dupe.