Speaking about the potential size of the model, OpenAI chief Sam Altman famously stated that “people are begging to be disappointed, and they will be.”
Rumours ahead of the model’s launch, ironically started by Lex Fridman, suggested that it would have trillions of parameters and be the best thing the world has ever seen. However, the reality is that, in the process of making GPT-4 better than GPT-3.5, OpenAI might have bitten off more than it can chew.
World-renowned hacker and software engineer George Hotz recently appeared on a podcast speculating about the architecture of GPT-4. Hotz stated that the model might be a set of eight distinct models, each with 220 billion parameters. This speculation was later confirmed by Soumith Chintala, the co-founder of PyTorch.
While this puts GPT-4’s total parameter count at roughly 1.76 trillion, the notable part is that not all of these models run at the same time. Instead, they are deployed in a mixture-of-experts architecture.
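To make the idea concrete, here is a minimal, purely illustrative sketch of a mixture-of-experts layer in PyTorch. It is not OpenAI’s implementation; the layer sizes, the number of experts, and the top-2 routing are assumptions chosen only to show how a router can activate a small subset of experts per token, so most parameters stay idle on any given forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts
    per token, so only a fraction of the total parameters run per input."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, d_model)
        gate_logits = self.router(x)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = MixtureOfExperts()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)   # torch.Size([10, 64])
```

The key property the sketch demonstrates is that compute per token scales with the number of experts selected (here two), not with the total number of experts, which is how a model built from eight large experts can stay cheaper to run than a single dense model of the same total size.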
Hotz also speculated that the model may rely on iterative inference to produce better output, a process in which the model’s result is refined over multiple passes.
This method might also allow GPT-4 to draw on inputs from each of its expert models, which could reduce hallucinations. Hotz stated that this process might be repeated 16 times, which would vastly increase the model’s operating cost.
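The sketch below shows one plausible reading of that idea, not the actual mechanism: each round, every expert proposes an answer given the prompt and the current draft, and a crude consensus step becomes the next draft. The helper names, the majority-vote merge, and the dummy experts are all assumptions for illustration; the one grounded detail is the round count of 16, which is why the compute bill multiplies.

```python
from collections import Counter

def iterative_inference(experts, prompt, rounds=16):
    """Toy sketch of iterative refinement: every round, each expert sees the
    prompt plus the current draft and proposes a revision; a simple majority
    vote over the proposals becomes the next draft. Each round re-runs every
    expert, so total compute grows roughly linearly with `rounds`."""
    draft = ""
    for _ in range(rounds):                               # e.g. the 16 passes Hotz mentions
        proposals = [expert(prompt, draft) for expert in experts]
        draft = Counter(proposals).most_common(1)[0][0]   # crude consensus step
    return draft

# Dummy stand-ins for the hypothetical 220-billion-parameter expert models.
experts = [lambda p, d, i=i: f"answer-{i % 2}" for i in range(8)]
print(iterative_inference(experts, "What is 2 + 2?"))     # -> "answer-0"
```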