Most of the benchmarks, which originate primarily in the US, are built around examination material. For instance, MMLU covers 57 tasks spanning subjects such as elementary mathematics, US history, computer science, and law. Similarly, AGIEval draws on standardized exams such as the SAT and LSAT, the Chinese College Entrance Exam (Gaokao), math competitions, and national civil service exams.
While all of this is certainly commendable, it's surprising that India, now emerging as a formidable player in the field of AI, lacks its own dedicated benchmark for the evaluation of LLMs.
India has many rigorous competitive examinations, such as UPSC, NEET, JEE-Advanced, and CAT. These exams frequently feature intricate questions in multiple languages and test the examinee’s grasp of India's history, culture, and governance. They also demand critical thinking and logical reasoning. These assessments could serve as valuable foundations for benchmarks.
Crafting a benchmark specific to LLMs, designed around India's competitive exams like UPSC, would facilitate the development of language models that can adeptly comprehend the distinct demands of these assessments.
Experts suggest that the absence of an Indian benchmark may partly stem from the lack of Indian LLMs. Nonetheless, there have been positive developments on this front recently.
Tech Mahindra, a prominent Indian IT conglomerate, is actively engaged in the development of an indigenous LLM named Project Indus. This groundbreaking model is set to be proficient in numerous Indic languages, with an emphasis on Hindi.
ChatGPT, the Therapist
First, it came for artists, then for writers and developers. And now, it’s here to dent the livelihoods of therapists. We are talking about ChatGPT. The chatbot, known for its rather cool diagnostic abilities, is now being used as an informal therapist, sparking varied reactions.
The new memory and voice functions in GPT-4 allow for personalized interactions, fostering emotional connections. The trend highlights a growing reliance on AI, which simplifies interactions and is appealing for tasks that require less mental effort. Additionally, LLMs can simulate human characteristics and personalities, blurring the boundaries between human and AI interactions, as showcased by Google DeepMind's research.
Expectations from OpenAI DevDay
OpenAI DevDay, the company's inaugural developer conference, is highly anticipated, and expectations for announcements are running high. OpenAI has recently unveiled several developer-focused updates, including GPT-4V(ision), a UI for fine-tuning, GPT-3.5 Turbo Instruct, and ChatGPT's Browse with Bing feature. Although Sam Altman has ruled out GPT-5, rumors about a powerful model named Arrakis have surfaced.
A major announcement could involve upgrading all models to GPT-4 or beyond, and possibly open-sourcing GPT-3.5. In addition to software, OpenAI is reportedly exploring the hardware business, potentially integrating its AI capabilities into various device designs. Speculation includes smart wearables, smart rings, and XR headsets. The recent acquisition of Global Illumination hints at a potential foray into the gaming industry.
Many Birds, One Stone