Just like Archimedes leapt from his bath shouting "Eureka!" upon his jubilant discovery, Andrej Karpathy has sparked similar excitement by launching Eureka Labs. Your AI Human, Amit Raja Naik, is thrilled to share this exciting new development.

OpenAI recently said it is using the Prover-Verifier Games technique to enhance the legibility of LLM outputs, particularly on grade-school math problems, by training smaller verifiers to judge the correctness of solutions, making the outputs easier for humans to understand. In other words, the OpenAI paper shows that teaching a smaller LLM to double-check the work of a bigger LLM is like having a diligent student explain their math homework to a tutor, ensuring the solutions are clear and easy to follow.

"One way to increase confidence in the outputs of LLMs is to support them with reasoning that is clear and easy to check — a property we call legibility," said OpenAI, adding that this makes complex AI outputs more trustworthy and comprehensible.

The next time you see more options in your output when using ChatGPT (powered by GPT-4o or more advanced models), remember that OpenAI is playing games with you (RLHF), or what you may call Prover-Verifier Games.

"Our algorithm iteratively trains small verifiers to predict solution correctness, 'helpful' provers to produce correct solutions that the verifier accepts, and 'sneaky' provers to produce incorrect solutions that fool the verifier," shared OpenAI, highlighting its training method, inspired by the Prover-Verifier Game, which aims to improve both the robustness of the verifier and the clarity of the solutions generated by the LLM. OpenAI said Prover-Verifier Games enhance the legibility of AI outputs, thus aiding human oversight.
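The iterative training OpenAI describes can be caricatured in a few lines of code. This is a toy sketch, not OpenAI's implementation: the reward functions and the plausibility-threshold "verifier" below are illustrative assumptions (the real method trains LLMs with reinforcement learning), but they capture the incentive structure of the three roles.

```python
import random

random.seed(0)

# Reward structure of the three roles, per OpenAI's description
# (illustrative stand-ins, not OpenAI's actual reward functions).

def verifier_reward(predicted_correct, actually_correct):
    """The verifier is rewarded for predicting solution correctness."""
    return 1 if predicted_correct == actually_correct else -1

def helpful_prover_reward(accepted, correct):
    """The helpful prover wants correct solutions the verifier accepts."""
    return 1 if (accepted and correct) else -1

def sneaky_prover_reward(accepted, correct):
    """The sneaky prover wants incorrect solutions that fool the verifier."""
    return 1 if (accepted and not correct) else -1

# Toy iteration: the "verifier" is a plausibility threshold, and each
# round it is retrained (here: simply tightened) on the sneaky prover's wins.
threshold = 0.5
fooled_per_round = []
for _ in range(5):
    # The sneaky prover emits incorrect solutions with plausibility in
    # [0, 0.9]; the verifier accepts anything at or above its threshold.
    fooled = sum(
        random.uniform(0.0, 0.9) >= threshold for _ in range(1000)
    )
    fooled_per_round.append(fooled)
    threshold = min(0.9, threshold + 0.1)  # verifier becomes more robust

print(fooled_per_round)  # sneaky wins shrink as the verifier hardens
```

Note that a sneaky win is, by construction, exactly a verifier mistake, which is why alternating the two roles hardens the verifier while pushing the helpful prover toward solutions that are both correct and easy to check.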
Otherwise, a model focused solely on correctness may produce complex and unintelligible solutions, underscoring the need for methods that balance accuracy with clarity.

Implementing Prover-Verifier Games comes with challenges and limitations.

Dependence on known answers. OpenAI said the technique relies on a dataset with known correct answers, which may not always be available, particularly in more complex or less well-defined domains. Ergo, the partnerships. Last month, TIME and OpenAI entered a multi-year strategic partnership to integrate the former's journalism with the latter's products like ChatGPT, expanding global access to reliable information, following similar collaborations with Le Monde, Prisa Media, Vox Media, The Atlantic, and News Corp.

Dataset diversity and difficulty. It is important to note that OpenAI's empirical study was conducted on a relatively simple dataset (grade-school math problems), which may not fully capture the challenges of applying the method to more complex or diverse datasets.

Initialisation with human-written math derivations. OpenAI said the prover's initial high performance might be due to its pre-training on human-written math data, which may not be representative of more generalised AI applications.

Lastly, OpenAI said that training the AI system to produce legible solutions might limit its performance, and suggested an alternative approach that separates the solution-generation process from the explanation process to avoid this limitation.

In another update, Google DeepMind introduced FLAMe, a family of foundational autorater models that outperform existing proprietary models on quality assessment tasks. Trained on 5 million human judgments, FLAMe is designed to reduce the challenges and costs of human evaluation of LLM outputs.
AWS recently unveiled AuditLLM, a novel tool for auditing large language models using a multi-probe approach, designed to streamline audits and provide a comprehensive audit trail.

OpenAI believes this new technique shows promising results for establishing trust in LLM outputs "even if they become more capable than humans in the future". Further, it said it hopes the work will inspire future research on semi-supervised/unsupervised setups for improving legibility to human judges with few or no ground-truth labels. Read the full story.

In the Era of Gen AI, MongoDB and PostgreSQL Will Run Out of Gas; DataStax Won't

Jihad Dannawi of DataStax, in an exclusive interview with AIM, highlighted the need for scalable databases like Apache Cassandra to support the growing complexity and real-time demands of generative AI applications. He said that in the era of generative AI, "MongoDB will run out of gas; PostgresDB will run out of gas when it has to handle real-time processing or high volumes of data. To effectively manage large-scale operations, you require a robust infrastructure that can handle substantial demands without running out of capacity." Check out the full story here.

Google I/O Connect Highlights

AIM Workshop: RAG & Fine-Tuning in GenAI with Snowflake

Join Prashant Wate from Snowflake India for an exciting workshop on RAG & fine-tuning in GenAI. Learn how to optimise models and create seamless AI apps effortlessly. Date: July 25, 2024. Time: 6 - 7.30 PM.

Cypher 2024 marks a significant expansion as it celebrates its 8th edition by branching out to the USA, in addition to its established presence in India. Browse through the links below to learn more about the different editions of Cypher 2024. These links will guide you to comprehensive event information, including agendas, speakers, registration details, and more.
Enjoying Sector 6 (formerly AIM Daily XO)? Share it with colleagues or friends – they can sign up here. We love hearing from our readers! Have thoughts on our new format? Questions, comments, or ideas are always welcome. If there's a specific topic in AI or analytics that you're curious about, tell us! Reach out to us at info@analyticsindiamag.com. Stay tuned for more insights in our next edition!
Curated with ♥️ in Namma Bengaluru