GLOSSARY: Benchmarking, the Indian Way

Friday, September 29, 2023

Benchmarking, the Indian Way

How do you test the intelligence of an LLM? The answer: benchmarks such as MMLU, HumanEval, and AGIEval.

Whether it's GPT-4 or Llama 2, creators typically begin by highlighting their LLMs' benchmark scores in their research papers.

But how do you set up a benchmark? Typically, by making the models attempt various human-level examinations.

A majority of these benchmarks originate in the US and are built from examination material. For instance, MMLU covers 57 tasks, spanning subjects such as elementary mathematics, US history, computer science, and law. Similarly, AGIEval draws on human examinations such as the SAT, the LSAT, the Chinese College Entrance Exam (Gaokao), math competitions, and national civil service assessments.
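For readers who want to see the mechanics, here is a minimal Python sketch of how an exam-style benchmark can be scored: the model answers multiple-choice questions, and accuracy is tallied per subject. The three sample questions, the `always_first` stand-in model, and the function names are all hypothetical and only for illustration. Averaging across subjects is one common reporting choice, assumed here, and not necessarily how any particular benchmark computes its headline number.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical sample items; real benchmarks contain thousands of questions.
QUESTIONS: List[dict] = [
    {"subject": "elementary_mathematics", "question": "What is 7 * 8?",
     "choices": ["54", "56", "58", "64"], "answer": 1},
    {"subject": "elementary_mathematics", "question": "What is 9 + 3?",
     "choices": ["12", "13", "11", "10"], "answer": 0},
    {"subject": "us_history",
     "question": "In which year was the US Declaration of Independence adopted?",
     "choices": ["1776", "1789", "1812", "1865"], "answer": 0},
]


def score_by_subject(
    questions: List[dict],
    predict: Callable[[str, List[str]], int],
) -> Dict[str, float]:
    """Return the model's accuracy for each subject."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for q in questions:
        total[q["subject"]] += 1
        # predict() returns the index of the choice the model picks.
        if predict(q["question"], q["choices"]) == q["answer"]:
            correct[q["subject"]] += 1
    return {subject: correct[subject] / total[subject] for subject in total}


def macro_average(per_subject: Dict[str, float]) -> float:
    """Average the per-subject accuracies, weighting each subject equally."""
    return sum(per_subject.values()) / len(per_subject)


def always_first(question: str, choices: List[str]) -> int:
    """Stand-in for a real model call: always picks the first choice."""
    return 0


if __name__ == "__main__":
    scores = score_by_subject(QUESTIONS, always_first)
    print(scores)
    print(macro_average(scores))
```

Swapping `always_first` for a function that queries an actual LLM, and `QUESTIONS` for questions drawn from a real exam, gives the basic shape of an exam-based benchmark.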

While all of this is certainly commendable, it's surprising that India, now emerging as a formidable player in the field of AI, lacks its own dedicated benchmark for the evaluation of LLMs.

India has many rigorous competitive examinations, such as UPSC, NEET, JEE-Advanced, and CAT. These exams frequently feature intricate questions in multiple languages and test the examinee’s grasp of India's history, culture, and governance. They also demand critical thinking and logical reasoning. These assessments could serve as valuable foundations for establishing benchmarks.

Crafting a benchmark specific to LLMs, designed around India's competitive exams like UPSC, would facilitate the development of language models that can adeptly comprehend the distinct demands of these assessments.

Experts suggest that the absence of an Indian benchmark may partly result from the non-availability of Indian LLMs. Nonetheless, positive developments have recently been witnessed in this regard.

Tech Mahindra, a prominent Indian IT conglomerate, is actively engaged in the development of an indigenous LLM named Project Indus. This groundbreaking model is set to be proficient in numerous Indic languages, with an emphasis on Hindi.

Read the full story here.

ChatGPT, the Therapist

First, it came for artists, then for writers and developers. And now, it’s here to dent the livelihoods of therapists. We are talking about ChatGPT. The chatbot, known for its rather cool diagnostic abilities, is now being used as an informal therapist, sparking varied reactions.

The new memory and voice functions in GPT-4 allow for personalized interactions, fostering emotional connections. The trend highlights a growing reliance on AI, which simplifies interactions and appeals to people for tasks that demand less mental effort. Additionally, LLMs can simulate human characteristics and personalities, blurring the boundaries between human and AI interactions, as showcased by Google DeepMind's research.

Read the full story here.

OpenAI DevDay’s Expectations

OpenAI DevDay, the company's inaugural developer conference, is highly anticipated, and expectations for potential announcements are running high. OpenAI has recently unveiled several developer-focused updates, including GPT-4V(ision), a UI for fine-tuning, GPT-3.5 Turbo Instruct, and ChatGPT's Browse with Bing feature. Although Sam Altman has ruled out GPT-5, rumors about a powerful model named Arrakis have surfaced.

A major announcement could involve upgrading all models to GPT-4 or beyond, and possibly open-sourcing GPT-3.5. In addition to software, OpenAI is reportedly exploring the hardware business, potentially integrating its AI capabilities into various designs. Speculation includes smart wearables, smart rings, and XR headsets. The recent acquisition of Global Illumination hints at a potential foray into the gaming industry.

Read the full story here.

Many Birds, One Stone

OpenAI has brought back the 'Browse with Bing' feature, allowing ChatGPT to access web information beyond its previous September 2021 knowledge cutoff. While many welcomed this, some suggested it could render third-party plugins obsolete, making OpenAI's marketplace more dominant.

The plugin store had previously seen rapid growth, with over 900 plugins, but faced issues such as a lack of organization and security concerns. The reintroduction of 'Browse with Bing' raises data privacy and copyright questions, as OpenAI walks the thin line between convenience and safety.

Read the full story here.