
Saturday, September 23, 2023

It’s Now or Never for Meta


Meta’s LLaMA is in a tight spot. It’s not only facing threats from closed-source rivals like OpenAI’s GPT-4 and Google’s Gemini, but also fighting it out on the open-source front against the likes of Falcon 180B.


Meta is not oblivious to the competitive landscape. According to several leaks on discussion platforms, it has been working on Llama 3, and there have been diverse expectations and predictions around it.


The developer ecosystem anticipates that Llama 3 will have an infusion of high-quality training data, perhaps something akin to Phi-1.5, to catapult its performance to new heights. Excitement is high about a potential expansion of the token count and further exploration of the scaling laws. Another hot topic is the mixture-of-architectures concept, an approach poised to address the shortcomings of a single parametric architecture, with the combined model potentially surpassing its individual experts or submodels.


Llama 3 is also anticipated to bring multimodal capabilities into the open-source arena. Meta stands ready to leverage its ecosystem of multimodal models, a domain populated by the likes of mPLUG-Owl, LLaVA, MiniGPT-4, and BLIP-2, all rooted in the robust foundation of LLaMA.


Besides fighting its own battle, LLaMA also has to meet the expectations of the open-source AI community, which now relies heavily on it. The open-source LLM leaderboard is filled with models fine-tuned on LLaMA; at least six of the top entries are LLaMA-based, including Uni-TianYan, FashionGPT, sheep-duck, Orca, and GenZ.


However, the Llama 3 delay has sent the open-source community into panic mode: the longer Meta takes, the further open-source models fall behind their closed-source rivals. At this crucial stage, Meta can’t afford a “better late than never” attitude; for it, this is “now or never”.


Read the full story here.




OpenAI’s Race to the Finish


Google has been hyping Gemini for a while now, but OpenAI recently stole the spotlight by revealing plans to integrate DALL-E 3 with ChatGPT Plus and ChatGPT Enterprise. This strategic move positions GPT-4 as the first functional multimodal model to generate both text and images, exactly what Gemini has promised. Google responded by extending Bard's capabilities, allowing image uploads through Lens and incorporating images from Search into its responses, in an attempt to go multimodal. However, OpenAI's DALL-E-integrated ChatGPT Plus, set to launch in October, poses a significant challenge.


OpenAI's move doesn't just impact Google; it also puts pressure on other text-to-image models like Midjourney and Stable Diffusion, as DALL-E 3 has already demonstrated its image-generation prowess.


Read the full story here.




Pixels at War


Which is better, DALL-E or Midjourney? The debate has once again taken centre stage with the announcement that DALL-E 3 will be integrated into ChatGPT Plus and ChatGPT Enterprise. Users who tested DALL-E 3 found that it outperformed Midjourney, offering superior image quality and prompt coherence. DALL-E 3's simplicity with text prompts lets users engage in natural conversations with the chatbot to get precise image output. In contrast, Midjourney, while versatile and feature-rich for image creation and editing, is primarily Discord-based, which complicates its accessibility.


Read the full story here.




Microsoft’s Cutting-edge Tool


Microsoft has unveiled Kosmos-2.5, a cutting-edge multimodal literate model tailored for comprehending text-intensive images. A successor to Kosmos-1 and Kosmos-2, the model has undergone rigorous training on extensive datasets, enabling two crucial transcription tasks.


Firstly, Kosmos-2.5 excels at generating spatially aware text blocks: it transcribes the text it finds in an image while precisely assigning spatial coordinates to each block, enhancing its ability to provide structured and coherent textual descriptions of image content. Additionally, it adeptly produces structured markdown output, ensuring the extracted text is presented in an organised format.


Microsoft's Kosmos-2.5 represents a significant leap in scaling multimodal large language models, holding transformative potential for AI and image-text comprehension. This innovation builds upon Kosmos-1's foundation, emphasising the fusion of language, action, multimodal perception, and world modelling, driving progress towards AGI.


Read the full story here.


TAUSIF ALAM & AMIT RAJA NAIK




Stay Connected

info@analyticsindiamag.com

© 2023 Analytics India Magazine

Analytics India Magazine | 280, 2nd floor, 5th Main, 15 A cross, Sector 6, HSR layout Bengaluru, Karnataka 560102
