Hello there! Your friendly AI Human, Amit Raja Naik, has good news: building LLMs has now become cheaper than ever.

Andrej Karpathy, the creator of llm.c, said that because it is implemented directly in C/CUDA, it requires minimal setup, eliminating the need for tools like Conda environments, Python interpreters, and pip installs. "You spin up a cloud GPU node (e.g. on Lambda), optionally install NVIDIA cuDNN, NCCL/MPI, download the .bin data shards, compile and run, and you're stepping in minutes," he added. At roughly 5,000 lines of code, it compiles and steps very fast, so there is very little waiting around. It has a constant memory footprint, trains in mixed precision, runs distributed across multiple nodes with NCCL, is bitwise deterministic, and hovers around ~50% MFU. That explains how Tech Mahindra was able to build Project Indus for well under $5 million; the model is built on the GPT-2 architecture, from the tokeniser to the decoder.

Recently, Together AI launched FlashAttention-3, significantly enhancing GPU utilisation to 75% on NVIDIA H100, doubling processing speeds, and optimising memory use for efficient large-scale AI deployments.

According to Meta engineer Mahima Chhagani, LLMLingua is a method that compresses prompts without sacrificing significant information. Chhagani also said that using an LLM cascade, starting with affordable models like GPT-2 and escalating to more powerful ones like GPT-3.5 Turbo and GPT-4 Turbo, optimises cost by calling expensive models only when necessary. FrugalGPT is another approach that uses multiple APIs to balance cost and performance, reducing costs by up to 98% while maintaining performance comparable to GPT-4. A Reddit user who goes by the name pmarks98 used a fine-tuning approach with tools like OpenPipe and models like Mistral 7B, cutting costs by up to 88%.
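The cascade idea above can be sketched in a few lines of Python: route each query to the cheapest model first and escalate only when a confidence check fails. This is a minimal illustration, not FrugalGPT's actual scorer; the model names, prices, and confidence values are made-up assumptions standing in for real API calls.

```python
# Minimal sketch of an LLM cascade (FrugalGPT-style routing).
# All models, prices, and confidence scores are illustrative
# assumptions, not real API values.

# Cheapest first; price is a hypothetical cost per 1K tokens.
CASCADE = [
    ("gpt-2-small", 0.0001),
    ("gpt-3.5-turbo", 0.0010),
    ("gpt-4-turbo", 0.0300),
]

def query_model(model, prompt):
    """Stand-in for a real API call: returns (answer, confidence)."""
    # Toy rule: pretend bigger models are more confident.
    confidence = {"gpt-2-small": 0.4,
                  "gpt-3.5-turbo": 0.7,
                  "gpt-4-turbo": 0.95}[model]
    return f"{model} answer to: {prompt}", confidence

def cascade(prompt, threshold=0.8):
    """Try models cheapest-first; stop at the first confident answer."""
    spent = 0.0
    for model, price in CASCADE:
        answer, confidence = query_model(model, prompt)
        spent += price
        if confidence >= threshold:
            break  # good enough; no need to pay for a bigger model
    return answer, spent

answer, cost = cascade("Summarise this contract clause.")
```

The key design choice is the confidence threshold: set it low and most queries stop at the cheap model; set it high and the cascade behaves like always calling the biggest model, erasing the savings.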
Meanwhile, Hugging Face recently launched SmolLM, a series of high-performing small language models that excel in their size categories and can operate on local devices, beating Qwen 2 and Phi 1.5 in benchmarks.

Abacus.AI chief Bindu Reddy has predicted that over the next five years, smaller models will become more efficient, LLMs will continue to become cheaper to train, and LLM inference will become widespread. "We should expect to see several Sonnet 3.5 class models that are 100x smaller and cheaper in the next 1-2 years." It is already happening faster than expected. SenseTime recently unveiled SenseNova 5.5, a comprehensive upgrade featuring China's first real-time multimodal model, cost-effective edge-side deployment, and significant performance improvements; in benchmarks, it beat Claude 3.5 and GPT-4o. A few weeks ago, the French non-profit AI lab Kyutai launched Moshi, a real-time multimodal AI model with capabilities surpassing OpenAI's GPT-4o and Google's Astra. In another update, Meta is gearing up to release its largest open-source Llama 3 model, with 405 billion parameters, on July 23. This model will feature multimodal capabilities for understanding and generating both images and text.

Meta AI Glasses Miss the Mark

Meta chief Mark Zuckerberg is trying really hard to sell those glasses. On the fourth of July, just six months after his knee surgery, he went surfing in a tux, flaunting his moves on Instagram. He also subtly marketed the Meta Ray-Ban glasses, which he wore to record videos. Thankfully, they were not damaged during the stunt. Read on.

Google recently backed Namma Yatri, an open-source ride-hailing app in India, challenging Uber and Ola. The move highlights a growing interest in driver-centric, community-driven models within the ride-sharing industry.
AWS India and the Tamil Nadu Technology (iTNT) Hub have launched a generative AI startup programme to boost AI solutions for public services, focusing on sectors like government, healthcare, and education. Semiconductor chip company SiMa.ai and TRUMPF recently partnered to develop AI-powered lasers for welding, cutting, and 3D printing. Hyderabad-based Dhruva Space received authorisation from IN-SPACe to offer Ground Station as a Service (GSaaS), enabling cost-effective satellite data access for Indian companies and supporting various space missions.

AIM Workshop: RAG & Fine-Tuning in GenAI with Snowflake

Join Prashant Wate from Snowflake India for an exciting workshop on RAG & fine-tuning in GenAI. Learn how to optimise models and create seamless AI apps effortlessly. Date: July 25, 2024. Time: 6:00 to 7:30 PM.

Cypher 2024 marks a significant expansion as it celebrates its 8th edition by branching out to the USA in addition to its established presence in India. Browse the links below to learn more about the different editions of Cypher 2024, including comprehensive event information such as agendas, speakers, and registration details.

Enjoying Sector 6 (formerly AIM Daily XO)? Share it with colleagues or friends – they can sign up here. We love hearing from our readers! Have thoughts on our new format? Questions, comments, or ideas are always welcome. If there's a specific topic in AI or analytics that you're curious about, tell us! Reach out to us at info@analyticsindiamag.com. Stay tuned for more insights in our next edition!
Curated with ♥️ in Namma Bengaluru