Have LLMs Hit the Wall?
According to an article in The Information, OpenAI's progress from GPT-4 to o1 has slowed. Although o1 has completed only 20% of its training, it is already matching GPT-4 in intelligence, task fulfilment, and question-answering ability.
However, the improvement isn’t as dramatic as the leap from GPT-3 to GPT-4. This has led many to wonder: Have LLM improvements hit a dead-end?
Critics Rejoice Too Soon: No one seemed more thrilled about this potential plateau than AI critic Gary Marcus, who promptly posted on X, “Folks, game over. I won. GPT is hitting a period of diminishing returns, just like I said it would.”
However, Uncle Gary may have celebrated a bit too early. One of the article’s authors quickly responded, “With all due respect, the article introduces a new AI scaling law that could replace the old one. The sky isn’t falling.”
Similarly, OpenAI researchers were quick to correct the narrative, asserting that the article inaccurately portrays the progress of their upcoming models.
Introducing Inference Time Scaling: “There are now two key dimensions of scaling for models like the o1 series—training time and inference time,” said Adam Goldberg, a founding member of OpenAI’s go-to-market team. While traditional scaling laws focus on pre-training larger models for longer, there’s now another important factor at play.
“[This] aspect of scale remains foundational. However, the introduction of this second scaling dimension is set to unlock amazing new capabilities,” he added.
A New Way of “Thinking”
OpenAI researcher Noam Brown elaborated that o1 is trained with reinforcement learning (RL) to “think” before responding via a private chain of thought. “The longer it thinks, the better it performs on reasoning tasks,” he explained. This introduces a new dimension to scaling. “We’re no longer bottlenecked by pre-training. We can now scale inference compute as well.”
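Brown’s point that “the longer it thinks, the better it performs” can be illustrated with a toy calculation: if one independent reasoning attempt solves a problem with probability p, then sampling N attempts raises the chance that at least one succeeds to 1 − (1 − p)^N. A minimal sketch of this coverage curve (the probability 0.2 is illustrative, not an OpenAI figure):

```python
def at_least_one_correct(p: float, n: int) -> float:
    """Probability that at least one of n independent reasoning
    attempts, each succeeding with probability p, is correct."""
    return 1 - (1 - p) ** n

# Coverage grows quickly with inference compute (number of samples).
for n in (1, 10, 100, 1000):
    print(f"{n:>4} samples -> {at_least_one_correct(0.2, n):.4f}")
```

The returns diminish per sample, but spending more inference compute keeps buying accuracy long after pre-training is fixed, which is the new scaling axis the researchers describe.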
Another researcher, Jason Wei, explained the difference in the chain of thought before and after o1. Traditional chain-of-thought reasoning used by AI models like GPT was more mimicry than true thinking. The model would often reproduce reasoning paths it encountered during pre-training.
With o1, the system introduces a more robust and authentic thinking process. Instead of simply spitting out an answer, the model engages in an “inner monologue” or “stream of consciousness,” actively considering and evaluating options. “You can see the model backtracking; it says things like ‘alternatively, let’s try’ or ‘wait, but’,” he added.
The Power of Test-Time Compute: Peter Welinder, the VP of product at OpenAI, emphasised the underestimated power of test-time compute. “Compute for longer, in parallel, or fork and branch arbitrarily—like cloning your mind 1,000 times and picking the best thoughts,” he said.
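Welinder’s “fork and branch arbitrarily” describes what is commonly called best-of-N sampling: draw many candidate answers in parallel, then keep the one a scorer ranks highest. A minimal sketch with a stand-in generator and scorer (both are hypothetical placeholders; a real system would sample an LLM at nonzero temperature and score with a learned verifier or reward model):

```python
import random

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Stand-in for n parallel model samples; a real system would
    # query an LLM with temperature > 0 to get diverse chains.
    return [f"candidate-{i} for {prompt!r}" for i in range(n)]

def score(candidate: str) -> float:
    # Stand-in verifier; a real system would use a reward model
    # or a checker that validates each reasoning chain.
    return random.random()

def best_of_n(prompt: str, n: int = 1000) -> str:
    # "Cloning your mind n times and picking the best thoughts":
    # sample n candidates in parallel, return the highest-scoring one.
    return max(generate_candidates(prompt, n), key=score)

print(best_of_n("2 + 2 = ?", n=8))
```

The design choice worth noting is that generation and selection scale independently: you can widen the search (more candidates) or sharpen the scorer without retraining the base model.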
Earlier, when OpenAI released o1-mini and o1-preview, it mentioned that o1's performance consistently improves with more reinforcement learning (train-time compute) and more time spent thinking (test-time compute).
Regarding inference time scaling, the company said, “The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.”