Happy World Strawberry Day, Every-o1 🍓✨

The wait is finally over. And your friendly AI Human, Amit Raja Naik, is super excited to share everything we know about OpenAI’s all-new AI model, o1 (internally known as Project Strawberry/Q*).

The o1 series models are trained to spend more time thinking before responding, refining their reasoning process and improving their problem-solving capabilities. In initial tests, the new reasoning model performed on par with PhD students on physics, chemistry, and biology tasks, achieving notable success in maths and coding competitions. In a qualifying exam for the International Mathematics Olympiad, the model scored 83%, compared to GPT-4o’s 13%.

“My student’s comment on the paper about LLMs generating more novel research ideas than humans is making the rounds. I think this says more about NLP researchers than about LLMs. Ouch,” joked Subbarao Kambhampati, professor of computer science at Arizona State University. “I am not gonna let no LLM beat me in generating novel NLP research ideas,” he quipped.

Interestingly, Kambhampati has long been vocal about LLMs being bad at reasoning and planning. He has said that models like GPT-3, GPT-3.5, and GPT-4 are poor at planning and reasoning, which he believes involves time and action. According to him, these models struggle with transitive and deductive closure, the latter being the more complex task of deducing new facts from existing ones.

So far, researchers have not experimented much with using LLMs to generate novel ideas; instead, they have predominantly used them to review research papers. Meta AI chief Yann LeCun argues that while LLMs cannot reason and plan, they are still a good tool for reviewing papers. “[Human] reviewers should be able to use the tools they want to help them write reviews. The quality of their reviews should be assessed based on the result, not the process,” he said.
Meta AI launched Galactica, an LLM for research, in November 2022, just weeks before ChatGPT. However, it was taken down after three days due to criticism over generating misleading or offensive information. LeCun remains unhappy about it to this day.

Despite its advanced reasoning abilities, the o1-preview model lacks some of the practical features found in GPT-4o, such as web browsing and file uploading. However, OpenAI emphasises the model’s potential for tackling complex tasks, particularly in fields requiring multi-step workflows.

As part of the release, OpenAI has implemented a new safety training approach that helps the models follow safety rules better. In jailbreaking tests, o1-preview outperformed GPT-4o, scoring 84 out of 100 compared to GPT-4o’s 22. OpenAI has also bolstered its safety efforts by partnering with AI safety institutes in the US and UK.

Alongside o1-preview, OpenAI has released a smaller, cost-effective model called o1-mini, designed specifically for developers who need advanced coding capabilities without broad world knowledge. o1-mini is 80% cheaper than o1-preview.

Starting today, ChatGPT Plus and Team users can manually select o1-preview and o1-mini from the model picker, with rate limits of 30 messages for o1-preview and 50 for o1-mini. API users in the highest usage tier can also begin prototyping, although some features, like function calling and streaming, are not yet available. OpenAI plans to expand access to o1-mini to ChatGPT free users and will continue adding new features to the o1 series, including browsing and file uploads.

NVIDIA’s Jim Fan lauded OpenAI o1 for its focus on inference-time scaling rather than model size. He emphasised that large models are not necessary for reasoning, as reasoning can be separated from knowledge using a “small reasoning core” and tools like code verifiers. “You don’t need a huge model to perform reasoning...
a small ‘reasoning core’ that knows how to call tools like browser and code verifier can factor out reasoning from knowledge,” he added.

Devin’s creator, Cognition Labs, worked closely with OpenAI over the past few weeks to evaluate OpenAI o1’s reasoning capabilities with Devin. It found that the new models represent a significant improvement for agentic systems that work with code.

A few days earlier, in a cryptic post, OpenAI CEO Sam Altman had hinted that the company was working on a project internally known as Project Strawberry, also referred to as Q*. “I love summer in the garden,” wrote Altman on X, posting an image of a terracotta pot containing a strawberry plant with lush green leaves and small, ripening strawberries. Project Strawberry was said to significantly enhance the reasoning capabilities of OpenAI’s AI models. It is now clear that o1-preview is Strawberry.

Can o1 Save GitHub Copilot?

Ever since Cursor and Claude hit the market, developers have been slowly moving away from GitHub Copilot. According to sources, Microsoft plans to upgrade Copilot’s capabilities in the VS Code IDE, which would help it compete with Cursor. But what about GitHub Copilot today? GitHub CEO Thomas Dohmke is optimistic. He posted on X a video of GitHub Copilot in VS Code running with OpenAI’s o1 model, which he calls “flat out badass”.
The new model has been integrated into GitHub Copilot and is making AI pair programming a lot smarter. Meanwhile, developers have also started implementing o1 within Cursor Composer and have already begun building apps with it. Cursor, being a fork of VS Code, offers far more flexibility when it comes to integrating LLMs, making it ideal for many developers. The competition between Cursor and GitHub Copilot now seems head-on, as both can run on o1, which, according to developers, currently performs better than Claude in certain use cases.

AMD Tries to Break NVIDIA’s CUDA Ecosystem with UDNA

AMD has announced a significant shift in its GPU architecture strategy with the introduction of UDNA (Unified Data and Neural Architecture). This new architecture aims to merge AMD’s existing RDNA (for gaming) and CDNA (for data centres) architectures into a single, unified platform. However, users allege that AMD’s support has been uneven, favouring CDNA. Because RDNA requires per-generation optimisation, AMD has to put considerably more effort into supporting RDNA users. Read on.

Google has introduced DataGemma, a new open model that integrates LLMs with real-world data from its Data Commons repository. It uses the retrieval-augmented methods RIG and RAG to reduce AI hallucinations and improve the accuracy of generative AI outputs in research and decision-making contexts.

Baidu has rebranded its ERNIE Bot as Wenxiaoyan, bringing advanced AI-driven search capabilities into the chatbot. Users can search for music, maps, articles, and more, with features like personalised content scheduling, multimedia search, and expert advice, making it a popular choice among young users, with over ten million monthly active users.
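The retrieval-augmented (RAG) idea behind DataGemma can be sketched minimally. Note that the fact store, the keyword retriever, and the prompt format below are illustrative stand-ins, not the actual Data Commons or Gemma APIs: the pattern is simply to retrieve supporting facts first, then ground the model’s prompt in them before it answers.

```python
# Toy fact store standing in for a real repository like Data Commons.
FACT_STORE = {
    "population india": "India's population was about 1.43 billion in 2023.",
    "gdp india": "India's nominal GDP was about $3.5 trillion in 2022.",
}

def retrieve(query: str) -> list[str]:
    # Naive keyword retrieval: return facts whose key terms all appear
    # in the query. Real systems use embeddings or structured queries.
    q = query.lower()
    return [fact for key, fact in FACT_STORE.items()
            if all(word in q for word in key.split())]

def generate(query: str) -> str:
    # Build a grounded prompt. A real system would now pass this to an
    # LLM; here we return the prompt itself to show the structure.
    facts = retrieve(query)
    context = " ".join(facts) if facts else "No supporting facts found."
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."
```

DataGemma’s RIG variant goes a step further: instead of retrieving once up front, the model interleaves queries to Data Commons during generation, checking statistics as it writes. The sketch above corresponds to the simpler RAG flow.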
AWS has selected seven Indian startups—Converse, House of Models, Neural Garage, Orbo.ai, Phot.ai, Unscript AI, and Zocket—for its Global Generative AI Accelerator program, offering up to $1 million in credits, mentorship, and technical support to scale their AI innovations.

Leveraging Cloud for Seamless Data Integration and Breaking Down Data Silos

The DECODE webinar is back with its second episode, where leaders from DBS Bank and Google will explore how cloud technology can break down data silos, enabling seamless data integration, faster decision-making, and smarter business operations. Featuring insights from Luis Carlos Cruz Huertas (DBS Bank) and Sriram Venkat (Google), this session promises practical approaches to overcoming technical barriers in data integration.

Join the NVIDIA AI Summit India from October 23–25, 2024, at the Jio World Convention Centre in Mumbai to explore AI innovations across generative AI, robotics, supercomputing, and more, with 70% of use cases addressing India's grand challenges. Don't miss the fireside chat with NVIDIA CEO Jensen Huang on October 24.

AIM & NVIDIA Present DevPalooza 4.0: The Ultimate Developer Meetup in Bengaluru

Join us at DevPalooza 4.0 in Bengaluru, powered by AIM and NVIDIA, to dive into hands-on generative AI workshops, explore applications, and network with AI professionals—click here to register and secure your spot!

Cypher 2024 marks a significant expansion: celebrating its 8th edition, it is branching out to the USA in addition to its established presence in India. Browse the links below to learn more about the different editions of Cypher 2024, including agendas, speakers, and registration details.

Enjoying Sector 6 (formerly AIM Daily XO)? Share it with colleagues or friends – they can sign up here.
We love hearing from our readers! Have thoughts on our new format? Questions, comments, or ideas are always welcome. If there’s a specific topic in AI or analytics that you're curious about, tell us! Reach out to us at info@analyticsindiamag.com. Stay tuned for more insights in our next edition!
Curated with ♥️ in Namma Bengaluru