OpenAI has done it again. The hottest AI startup has released a new text-to-video tool, ‘Sora,’ which has officially taken the internet by storm, with many people calling it a ChatGPT moment for video generation.
Here’s a video generated by OpenAI chief Sam Altman, for CRED’s Kunal Shah’s prompt request on X: “A bicycle race on ocean with different animals as athletes riding the bicycles with drone camera view.” The result:
It gets even crazier. Here’s another example, for the prompt “A woman wearing purple overalls and cowboy boots taking a pleasant stroll in Mumbai, India during a winter storm.” More videos and technical details can be found here.
OpenAI’s all-new text-to-video tool can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. For OpenAI, Sora is not just a video generation model; it is also a stepping stone towards AGI.
How does it work? Sora takes inspiration from LLMs, using a transformer architecture that operates on spacetime patches of video and image latent codes.
While LLMs utilise text tokens, Sora uses visual patches. Patches serve as a highly scalable and effective representation for training generative models on diverse types of videos and images. Just as LLMs are trained to predict the next word, Sora is trained to predict the original ‘clean’ patches from noisy input patches.
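To make the analogy concrete, here is a minimal toy sketch of the two ideas above: cutting a video tensor into flattened spacetime patches (the visual counterpart of text tokens), and a denoising-style objective where a model must recover clean patches from noisy ones. The patch sizes, shapes, and the identity “model” are purely illustrative assumptions; Sora’s actual architecture and training details are not public.

```python
import numpy as np

def to_spacetime_patches(video, pt=2, ph=4, pw=4):
    """Split a video of shape (T, H, W, C) into flattened spacetime
    patches, analogous to text tokens in an LLM.
    pt/ph/pw are illustrative patch sizes, not Sora's real ones."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)        # group the patch axes together
    return v.reshape(-1, pt * ph * pw * C)      # (num_patches, patch_dim)

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16, 16, 3)).astype(np.float32)

clean = to_spacetime_patches(video)             # (64, 96): 64 tokens of dim 96
noisy = clean + 0.1 * rng.standard_normal(clean.shape).astype(np.float32)

# Toy denoising objective: a real model would map noisy -> clean;
# here the "model" is a stand-in identity, so the loss just measures the noise.
predicted = noisy
loss = np.mean((predicted - clean) ** 2)        # MSE between prediction and clean patches
```

In a real diffusion transformer, `predicted` would come from a network conditioned on the noise level (and, for text-to-video, on the prompt), and training minimises this reconstruction loss over many noise levels.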
Sora has multiple features like animating DALL·E images, extending generated videos, video-to-video editing and connecting videos. However, apart from the video generation, the possibilities with this new AI tool are endless. It can simulate some aspects of people, animals and environments from the physical world. Enjoy the full story on how OpenAI Stole the Spotlight with Sora here.
Crushes GPT-4 and Claude
While OpenAI’s Sora has been stealing the spotlight, Google released one of its most powerful AI models yet, Gemini 1.5. The new model outperforms ChatGPT (GPT-4 Turbo) and Claude, offering a 1-million-token context window. In contrast, GPT-4 Turbo has a 128K context window and Claude 2.1 has a 200K context window.
Read the full story here.
Creator of Phi-2
“Microsoft loves SLMs,” said Satya Nadella in his keynote, talking up small language models like Orca and Phi-2, which Microsoft Research has developed on highly specialised datasets. AIM got in touch with Harkirat Behl, a senior researcher in the Physics of AGI team at Microsoft Research and one of the creators of Phi-1, Phi-1.5, and Phi-2, to discuss his contributions to the field and his upcoming projects.
Check out the full story here.
తెలుగు Llama
Telugu Llama is a passion project for both Ravi Theja and Ramsri Goutham Golla, who are creating Indic LLMs. Last week, they introduced Telugu-LLM-Labs, a collaborative independent effort wherein they released datasets translated and romanised in Telugu. Check out the full story here.