GLOSSARY: This is a Story About Self-Reflection, Not AI Fraud

The past few days have not been easy for OthersideAI founder Matt Shumer since the whole Reflection fiasco began. Your friendly AI Human, Amit Raja Naik, hopes things will eventually get better and believes this is a time for self-reflection, rather than hate.

https://s1.designmodo.com/postcards/image-959dbc93-1ffe-45db-9f4c-d61a97b4a89a.png

https://s1.designmodo.com/postcards/image-1725957836485.gif

So, what exactly went wrong?

Artificial Analysis, known for its independent analysis of AI models and API providers, compared Reflection AI 70B to other models. It failed miserably, and the results were poor compared to Llama 3 70B.

A Reddit user claimed that the Reflection model is trained to give false answers first in its thinking phase, and then it reflects the thinking phrase. “If you ask what 2+2 is, the default example on the Hugging Face page will say something like 2+2=3. Oh wait, I’ve made a mistake; 2+2 is actually 4. If the thinking is actually hidden, it might work, but it’s quite strange,” he added, further explaining the flaw in the model.

When Artificial Analysis came up with poor results, it was granted access to private APIs of the Reflection models. The performance then was way better than the previous results. But again, when it compared the performance of the given private APIs with the available models on Hugging Face, the results were completely different, as the model hosted on Hugging Face showed poor results.

As mentioned, users have reported Reflection to be a wrapper of Claude. When the Reflection model was made available on OpenRouter, users reported that it used a dumbed-down version compared to the previous version as the model was heavily censored.

“The version on OpenRouter seems to be heavily censored/dumbed down; it just refuses to write about what I asked for, while the “original” version did fine. So it was probably ChatGPT or Llama3+ChatGPT for Reflection initially, and now he switched to Claude, which is known to be heavily censored,” a Reddit user shared his experience.

Shumer first blamed the upload process and mentioned that something might go wrong while uploading weights on Hugging Face but that didn’t solve the problem. So, he went a step further and decided to start the training from scratch to eliminate all the issues.

It’s time for self-reflection

Shumer claimed that Reflection models were the best open-source models to date. They use the reflection-tuning method, which is designed to teach AI models to recognise and rectify their mistakes.

This approach seemed poised to address one of the most persistent challenges with language models: the tendency to “hallucinate” or generate inaccurate information.

“When LLMs make mistakes, they often treat their errors as facts. If we could teach these models to think more like humans—reflect on their behaviour and recognise their mistakes—the models would become smarter and more reliable,” said Shumer, suggesting that reflection tuning can help models reason better.

When the model generates an answer, it outputs its reasoning, and the tag surrounds the thought process with special tags (such as). When the model detects an error during inference, it marks the error with a label and corrects itself. This feature enhances the reliability of the model, especially when dealing with complex problems.

A Reddit user solved a classic trolly problem by adding “it’s not the usual one” to the prompt in a single shot, suggesting the reasoning capabilities of the reflection-tuning method.

Can retraining solve the problem?

Shumer mentioned that ideally, this shouldn’t have happened in the first place. He said that his team has tried everything they could, but the performance they get from Hugging Face is nowhere close to what it ideally should be while running the Reflection model locally.

Some users believe that the whole release of the Reflection model was actually an advertisement for GlaiveAI as Shumer owns part of Glaive and was seen promoting it when he released the Reflection model. In response to that, Shumer said that he is a tiny investor and has only invested around $1000 in GlaiveAI.

Here, it’s also important to note that this was the first release of the Reflection model, which was praised for its reflection-tuning approach. It would be a good idea to wait for the next update/release before judging the model too harshly.

Enjoy the full story here.

Founder Mode is the Only Mode that Works

The debate around the founder mode vs manager mode has taken centre stage after Paul Graham released his blog post inspired by Brian Chesky’s management insights. While Graham promotes a founder-driven approach, others, like Chamath Palihapitiya, challenge its effectiveness, sparking a lively conversation on how companies should truly be run. Read more.

AI Bytes

SiMa.ai recently introduced MLSoC Modalix, the industry’s first multi-modal edge AI product family.

Oracle has partnered with AWS to launch Oracle Database@AWS, completing its multi-cloud strategy and enabling seamless integration of Oracle databases with AWS services, accelerating cloud migration and enterprise modernisation.

Rabbitt.ai has launched ChanceRAG, a no-code retrieval augmented generation (RAG) solution that simplifies integrating LLMs with document retrieval systems.

NVIDIA AI Summit India

Join the NVIDIA AI Summit India from October 23–25, 2024, at the Jio World Convention Centre in Mumbai to explore AI innovations across generative AI, robotics, supercomputing, and more, with 70% of use cases addressing India's grand challenges.

Don't miss the Fireside Chat with NVIDIA CEO Jensen Huang on October 24.

AIM & NVIDIA Present DevPalooza 4.0: The Ultimate Developer Meetup in Bengaluru

Join us at DevPalooza 4.0 in Bengaluru, powered by AIM and NVIDIA, to dive into hands-on generative AI workshops, explore applications, and network with AI professionals—click here to register and secure your spot!

AI Conclave Wonders

Cypher 2024 marks a significant expansion as it celebrates its 8th edition by branching out to the USA in addition to its already established presence in India.

Browse through the links below to learn more about the different editions of Cypher 2024.

These links will guide you to comprehensive event information, including agendas, speakers, registration details, and more.

Enjoying Sector 6 (formerly AIM Daily XO)? Share it with colleagues or friends – they can sign up here.

We love hearing from our readers! Have thoughts on our new format? Questions, comments, or ideas are always welcome. If there’s a specific topic in AI or analytics that you're curious about, tell us!

Reach out to us at info@analyticsindiamag.com.

Stay tuned for more insights in our next edition!

Curated with ♥️ in Namma Bengaluru

This email was sent by info@aimmediahouse.com to alexvarboffin.abbb@blogger.com

Not interested? Unsubscribe | Manage Preference | Update profile

Analytics India Magazine | 280, 2nd floor, 5th Main, 15 A cross, Sector 6, HSR layout Bengaluru, Karnataka 560102

GLOSSARY

вторник, 10 сентября 2024 г.

This is a Story About Self-Reflection, Not AI Fraud

Комментариев нет:

Отправить комментарий

вторник, 10 сентября 2024 г.

This is a Story About Self-Reflection, Not AI Fraud

Комментариев нет:

Отправить комментарий

вторник, 10 сентября 2024 г.