The Chinese AI lab DeepSeek recently released their new reasoning model R1, which is supposedly
Unlike the big Western AI labs, they’ve released a paper explaining what they did.
Like previous posts along these lines, this is more of my attempt to think out loud and internalize what I’ve learned by reading the paper. I’m not an expert in this area: I work on AI products at GitHub, but the emphasis there is on “product”, not on “AI”. Hopefully that makes this helpful to other non-experts – but it’s helpful to me, in any case.
Okay, let’s define “reasoning model”. A regular model takes a prompt and generates a response one token at a time (e.g. completing a sentence or answering a question). The model “thinks” (i.e. performs matrix multiplications) for roughly the same amount of time for each token. That means that the more tokens the model generates, the more total computation it gets to spend on a question, and often the better the answer you’ll get. That’s why prompts like “think step-by-step” and “spell out your reasoning before answering” are well-known to help.
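That argument is easy to make concrete. Below is a minimal Python sketch of greedy autoregressive decoding, assuming a hypothetical `next_token` function in place of a real model's forward pass. The cost estimate uses the common rule of thumb of roughly 2 FLOPs per parameter per generated token, which ignores the attention cost that grows with context length, so treat the numbers as illustrative.

```python
from typing import Callable, List, Tuple


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new_tokens: int,
    stop: str = "<eos>",
) -> Tuple[List[str], int]:
    """Greedy autoregressive decoding: one forward pass per generated token."""
    tokens = list(prompt)
    forward_passes = 0
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # a real model would run the whole network here
        forward_passes += 1
        if tok == stop:
            break
        tokens.append(tok)
    return tokens[len(prompt):], forward_passes


def approx_inference_flops(n_params: float, n_tokens: int) -> float:
    """Rule-of-thumb cost: ~2 FLOPs per parameter for each token generated."""
    return 2 * n_params * n_tokens


def toy_model(tokens: List[str]) -> str:
    # Hypothetical stand-in: emits "step" until the sequence holds 10 tokens.
    return "step" if len(tokens) < 10 else "<eos>"


output, passes = generate(toy_model, ["Q"], max_new_tokens=100)
print(len(output), passes)  # 9 10 (the tenth pass produced <eos>)
print(approx_inference_flops(7e9, passes))  # 1.4e11 FLOPs for a hypothetical 7B-parameter model
```

Because every new token costs another forward pass, doubling the length of the output roughly doubles the compute spent on the question.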
A reasoning model attempts to bake that behaviour into the model itself. How OpenAI’s models work exactly is a trade secret, but one simple answer could go like this:
Step (4) is as expensive as any big training run. But steps (2) and (3) are unique to training reasoning models, and are also very expensive. That’s because they require unrestricted access to a smart model and enough time to generate a huge volume of quality data. DeepSeek’s training is quicker because they don’t do either of those steps. Instead, they:
In short, this is a reinforcement learning approach, not a fine-tuning approach. There’s no need to generate a huge body of chain-of-thought data ahead of time, and there’s no need to run an expensive answer-checking model. Instead, the model generates its own chains-of-thought as it goes. There are other points made in the DeepSeek-R1 paper, but I think this is by far the most important.
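As a rough illustration of learning from outcome rewards instead of from pre-written chains-of-thought, here is a toy Python sketch. It is not DeepSeek's training code: the "policy" is just a softmax over three hypothetical answering strategies (standing in for different chains-of-thought), the reward is a mechanical correctness check, and the update is plain REINFORCE with advantages normalized across a group of samples for the same question. That normalization is loosely in the spirit of GRPO, the method the DeepSeek-R1 paper uses, which additionally applies clipping and a KL penalty that are omitted here.

```python
import math
import random
from typing import Callable, List

# Hypothetical answering strategies standing in for different chains-of-thought.
# Only the first one reliably reaches the right answer to "what is a + b?".
STRATEGIES: List[Callable[[int, int], int]] = [
    lambda a, b: a + b,      # careful reasoning
    lambda a, b: a + b + 1,  # off-by-one slip
    lambda a, b: a * b,      # misreads the question
]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(answer: int, expected: int) -> float:
    # Mechanical check of the final answer; the reasoning itself is never graded.
    return 1.0 if answer == expected else 0.0


def train(steps: int = 200, group_size: int = 8, lr: float = 0.5, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * len(STRATEGIES)
    for _ in range(steps):
        # a, b >= 3 so that a * b never coincides with a + b.
        a, b = rng.randint(3, 9), rng.randint(3, 9)
        probs = softmax(logits)
        # The policy samples its own attempts; nothing is generated ahead of time.
        choices = rng.choices(range(len(STRATEGIES)), weights=probs, k=group_size)
        rewards = [reward(STRATEGIES[c](a, b), a + b) for c in choices]
        mean = sum(rewards) / group_size
        std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / group_size)
        if std == 0:
            continue  # every attempt scored the same, so there is nothing to learn
        for c, r in zip(choices, rewards):
            advantage = (r - mean) / std  # better or worse than the group average
            for k in range(len(logits)):
                # Gradient of log softmax(logits)[c] w.r.t. logit k: 1[k == c] - probs[k]
                grad = (1.0 if k == c else 0.0) - probs[k]
                logits[k] += lr * advantage * grad / group_size
    return softmax(logits)


# Probability mass should end up concentrated on the first (correct) strategy.
print([round(p, 3) for p in train()])
```

The design choice worth noticing is that `reward` only ever looks at the final answer, so whichever strategy happens to reach correct conclusions gets reinforced, with no reference chains involved.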
Aside from the cost benefits, I believe there’s also a potential quality benefit to DeepSeek’s approach. OpenAI’s (supposed) approach above can only reason as well as the best moments of its original smart model, because it’s predicting the exact reasoning steps that the original model gave. DeepSeek’s approach can theoretically reason much better than the original model, because as it keeps learning, it’s providing its own brand-new reasoning chains that are only assessed by the quality of the conclusion. In my view, this is much more likely to lead to the truly alien superintelligent reasoning that people have been anticipating (and that we already see from superintelligent chess programs).
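The contrast can be put in a few lines of Python. This is a simplified sketch under stated assumptions: the sample dictionaries, their `chain`/`answer` fields and the example chains are all hypothetical. Under imitation (fine-tuning), the training target is the teacher's exact chain, so a new but valid line of reasoning never appears in the data; under an outcome-only reward, any chain that reaches the right answer scores the same.

```python
from typing import Dict, List

Sample = Dict[str, str]  # hypothetical shape: {"chain": ..., "answer": ...}


def imitation_targets(teacher_samples: List[Sample], expected: str) -> List[str]:
    """Fine-tuning data: teacher chains that reached the right answer,
    which the student is then trained to reproduce token by token."""
    return [s["chain"] for s in teacher_samples if s["answer"] == expected]


def outcome_reward(sample: Sample, expected: str) -> float:
    """Outcome-only scoring: only the final answer is checked, never the chain."""
    return 1.0 if sample["answer"] == expected else 0.0


# Hypothetical samples for the question "what is 12 * 3 + 4?"
teacher = [
    {"chain": "12 * 3 = 36; 36 + 4 = 40", "answer": "40"},
    {"chain": "12 * 3 = 35; 35 + 4 = 39", "answer": "39"},  # filtered out
]
novel = {"chain": "4 + 12 * 3 = 4 + 36 = 40", "answer": "40"}

targets = imitation_targets(teacher, "40")
print(novel["chain"] in targets)    # False: imitation never trains on this chain
print(outcome_reward(novel, "40"))  # 1.0: the outcome reward accepts it
```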
Is DeepSeek’s approach just better, then? I don’t think so. Restricting your training process to chains-of-thought whose conclusions can be verified mechanically (i.e. without a model) means that you can only really train the model on coding and mathematics. There’s just no way to mechanically check a logical word puzzle, or a legal analysis, or any of the other forms of reasoning we might want out of a reasoning model.
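Here is what “verified without a model” can look like in practice: a minimal Python sketch of two rule-based checkers, one comparing the last number in a mathematics answer with a known result and one running candidate code against test cases. The function names, the “last number wins” extraction rule and the `solve` entry point are assumptions for illustration; a real system would also sandbox the code instead of calling `exec` on it directly. No comparable few-line checker exists for a legal analysis.

```python
import re
from typing import Any, List, Tuple


def check_math(completion: str, expected: str) -> bool:
    """Treat the last number in the completion as the final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return bool(numbers) and numbers[-1] == expected.strip()


def check_code(source: str, tests: List[Tuple[Any, Any]]) -> bool:
    """Run a candidate `solve` function against (input, expected_output) pairs.
    WARNING: exec on untrusted model output is unsafe outside a sandbox."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        solve = namespace["solve"]
        return all(solve(x) == y for x, y in tests)
    except Exception:
        return False  # syntax errors, a missing `solve`, runtime errors all fail


print(check_math("3 apples plus 4 apples gives 7", "7"))                 # True
print(check_code("def solve(x):\n    return x * 2", [(2, 4), (5, 10)]))  # True
print(check_code("def solve(x):\n    return x + 2", [(2, 4), (5, 10)]))  # False
```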
It’s theoretically possible that this doesn’t matter, because superintelligence in coding/mathematics might transfer to other domains. As I understand it, we’ve sort of seen that happen in normal models – as they’re trained on more code, they get better at non-code domains. But it remains to be demonstrated in practice. I don’t think DeepSeek-R1 is currently crushing the humanities.
This is a relatively straightforward approach that others must have thought of. Why did it happen now and not a year ago? The most compelling answer is probably this: open-source base models had to get good enough at reasoning that they could be RL-ed into becoming reasoning models. It’s plausible that a year ago that wasn’t the case. A less compelling answer: the quality of reasoning-based benchmarks is much higher now than it was. For this approach to work, you need to be able to feed the model a ton of problems that require reasoning to solve (otherwise it’ll jump straight to the solution). Maybe those problems have only recently become available.
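One way to act on the “problems that require reasoning” requirement is to filter a problem set by how often the base model already solves each problem. The sketch below is a hypothetical illustration, not something this post or the paper describes: `attempt` stands in for sampling the base model once and mechanically checking its answer, and the thresholds are arbitrary. Problems the model always solves put no pressure on it to reason, and problems it never solves give it no correct answers to learn from.

```python
from collections import Counter
from typing import Callable, List


def filter_by_pass_rate(
    problems: List[str],
    attempt: Callable[[str], bool],
    k: int = 16,
    min_rate: float = 0.0,
    max_rate: float = 0.8,
) -> List[str]:
    """Keep problems the base model sometimes, but not usually, gets right.

    `attempt` is a hypothetical stand-in: sample the base model once and
    return whether its final answer passed a mechanical check."""
    kept = []
    for problem in problems:
        rate = sum(attempt(problem) for _ in range(k)) / k
        if min_rate < rate <= max_rate:  # strict lower bound drops never-solved problems
            kept.append(problem)
    return kept


# Deterministic toy stand-in for a base model.
calls: Counter = Counter()


def toy_attempt(problem: str) -> bool:
    calls[problem] += 1
    if problem == "trivial":
        return True
    if problem == "impossible":
        return False
    return calls[problem] % 3 == 0  # "hard": solved on one attempt in three


print(filter_by_pass_rate(["trivial", "hard", "impossible"], toy_attempt))  # ['hard']
```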
Source: Sean Goedecke