https://campaign-image.in/zohocampaigns/83238000007570001_1668333763034_belamy_logo.jpg

THE WEEKLY NEWSLETTER OF AIM.

Sunday, Feb 18, 2024

By Amit Naik

This week was surely a teaser for what 2024 holds. Just as it seemed Google had put its foot on the ‘AI-celerator’ by releasing Gemini 1.5, OpenAI stole its thunder by dropping its text-to-video generation model, Sora.

Tit for tat, Google used Gemini 1.5 to critique a video created by Sora, tagging it as fake and pointing out significant inconsistencies.

https://stratus.campaign-image.in/images/83238000007570001_zc_v1_1705842050539_op-ai-conversations-that-happened-at-world-economic-forum-2024-1536x864.jpg.webp

ChatGPT Moment? Sora creates breathtaking, hyper-realistic videos from text prompts, of a kind the world has never seen before. Capable of crafting 60-second videos with highly detailed scenes, complex camera motion, and lifelike characters, Sora uses a transformer architecture that operates on visual patches instead of traditional text tokens.
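
To make the idea of visual patches concrete, here is a minimal sketch of cutting a tiny video into spacetime patches, each flattened into one token-like vector. The function name `patchify`, the patch sizes and the toy 4x4x4 video are illustrative assumptions, not details of OpenAI's implementation.

```python
def patchify(video, pt, ph, pw):
    """Split a video (nested lists indexed [frame][row][col]) into
    non-overlapping spacetime patches, each flattened into one vector."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    # This sketch assumes the video divides evenly into patches.
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    patches = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                # Collect every pixel inside this pt x ph x pw block.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


# Toy greyscale "video": 4 frames of 4x4 pixels, each pixel a unique integer.
video = [[[t * 16 + y * 4 + x for x in range(4)] for y in range(4)] for t in range(4)]
patches = patchify(video, pt=2, ph=2, pw=2)
print(len(patches), len(patches[0]))  # 8 patches, 8 values each
```

Each patch spans both space and time, so a transformer fed this sequence sees chunks of motion rather than words.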

“A good way to think about Sora is it’s basically the GPT-3 of video models. Stable Video Diffusion etc are like GPT2,” said Stability AI Founder Emad Mostaque. “The ChatGPT, GPT-4, Llama and Mistrals will come over the next few years.”

Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data.

Its features include animating DALL·E images, extending generated videos, video-to-video editing and connecting videos. Beyond video generation, Sora can also simulate some aspects of people, animals and environments from the physical world.

Check out the mesmerising videos created by Sora here.

Context Window Matters: Google wasn't to be outdone. Gemini 1.5 features a staggering context window of 1M tokens, surpassing not only GPT-4 Turbo's 128K but also Anthropic Claude 2.1's 200K. This means Gemini 1.5 Pro can process vast amounts of information in one go, including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words.
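
For a rough sense of scale, the figures above can be compared directly. This is back-of-the-envelope arithmetic that assumes "1M" means 1,000,000 tokens and "K" means 1,000; the tokens-per-word ratio is simply what the quoted numbers imply, not an official figure.

```python
# Context window sizes in tokens, as quoted above (assumed decimal units).
context_windows = {
    "Gemini 1.5 Pro": 1_000_000,
    "Claude 2.1": 200_000,
    "GPT-4 Turbo": 128_000,
}

gemini = context_windows["Gemini 1.5 Pro"]
ratios = {
    name: gemini / size
    for name, size in context_windows.items()
    if name != "Gemini 1.5 Pro"
}
for name, ratio in ratios.items():
    print(f"Gemini 1.5 Pro holds {ratio:.1f}x the context of {name}")

# 1M tokens quoted as roughly 700,000 words implies this many tokens per word.
tokens_per_word = gemini / 700_000
print(f"Implied tokens per word: {tokens_per_word:.2f}")
```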

Gemini 1.5 is built on the Transformer and Mixture-of-Experts (MoE) architectures. While a traditional Transformer functions as one large neural network, MoE models are divided into smaller “expert” neural networks.
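
A toy sketch of how an MoE layer can route an input to a few experts instead of running one big network. Everything here (the `moe_layer` function, the gate weights, the four scaling "experts" and top-2 routing) is a hypothetical illustration of the general technique, not Gemini's actual design.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, experts, gate_weights, top_k=2):
    """Route input vector x to its top_k experts and mix their outputs."""
    # One gating score per expert: dot product of x with that expert's gate row.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep only the top_k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        weight = probs[i] / norm  # renormalise over the chosen experts
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen


# Four toy "experts": each simply scales the input by a different factor.
experts = [lambda v, k=k: [k * vi for vi in v] for k in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, chosen = moe_layer([2.0, 1.0], experts, gate_weights, top_k=2)
print(chosen, output)
```

Only two of the four experts run for this input, which is the appeal of the approach: total capacity can grow without every parameter being used on every token.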

Don’t Forget Meta: With OpenAI and Google going all guns blazing, how could Meta stay out? It released a new AI model called Video Joint Embedding Predictive Architecture (V-JEPA).

V-JEPA improves machines’ understanding of the world by analysing interactions between objects in videos. The model aligns with the vision of Yann LeCun, Meta’s chief AI scientist, of creating machine intelligence that learns the way humans do.

Unlike Sora, V-JEPA is a non-generative model that learns by predicting missing or masked parts of a video in an abstract representation space. LeCun believes that the generation of mostly realistic-looking videos from prompts *does not* indicate that a system understands the physical world.

Earlier in the week, Andrej Karpathy, the developer community's favourite, left OpenAI to follow his own path. “My immediate plan is to work on my personal projects and see what happens,” he said on X. He recently open-sourced ‘minbpe’, clean code for the (byte-level) Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
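
For readers curious about what byte-level BPE does, here is a minimal sketch of the training loop: start from raw UTF-8 bytes and repeatedly merge the most frequent adjacent pair into a new token id. The function names and the toy string are illustrative assumptions; this is not the minbpe code itself.

```python
from collections import Counter


def pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of pair with new_id."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to num_merges merges on the UTF-8 bytes of text."""
    ids = list(text.encode("utf-8"))  # start from raw bytes, ids 0..255
    merges = {}
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break  # nothing left to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step  # new ids start after the 256 byte values
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


ids, merges = train_bpe("aaabdaaabac", num_merges=3)
print(ids, merges)
```

After three merges, the 11-byte string is represented by just 5 token ids, which is the compression that lets LLMs fit more text into a fixed context window.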

To keep up with all this, Meta should release Llama 3 before GPT-5 arrives, keeping things spicy and the open-source community thriving.

Register for our Upcoming Workshop - From Syntax to Strategy: Harnessing LMQL for Smarter, Streamlined LLMs

Best Firms for Data Scientists >>

[Best Firms for Data Scientists is one of India's biggest workplace certification platforms in data science. To nominate your organisation, you can fill out the form here.]

https://stratus.campaign-image.in/images/83238000007570001_zc_v1_1707719559811_indian-hyperscaler--1536x864.jpg.webp

Why Do Big Tech LLM Chatbots Have the Worst Possible Names?

It looks like big tech majors are betting on fresh monikers for LLM chatbots, moving away from human-like names such as Alexa, Siri, and Cortana to more non-human (read: boring) names. Read on to find out why.

VFX Industry Will Make Their Own ‘Sora-Like’ GenAI Tools

https://stratus.campaign-image.in/images/83238000267723708_8_1708268980046_vfx-industry-will-make-thwebp

While Sora is impressive, will it contribute to the growing concerns about potential job losses for VFX artists? Click here to find out.

PEOPLE & TECH >>

CoRover.ai is the Silent Winner of Indian LLM Race

https://stratus.campaign-image.in/images/83238000267723708_9_1708268980108_corover.ai-is-the-real-wiwebp

Ankush Sabharwal, the co-founder of CoRover.ai, has had a busy few months building BharatGPT. Most recently, his company launched an educational tablet called Milkyway, which will be powered by CoRover’s BharatGPT virtual assistant, including video and chatbots for students.

In an exclusive interview with AIM, Sabharwal spoke about how CoRover.ai’s BharatGPT was built and what exactly it offers. Check out the complete interview here.

How Epsilon is Navigating DE&I in Tech

https://stratus.campaign-image.in/images/83238000267723708_12_1708268980346_lgbtq-1536x864.jpg.webp

The journey to equality has been a bit of a tedious trek for the LGBTQ+ community in tech (and elsewhere). Sharing a similar story is Joseleen Princy C, a senior business system analyst at Texas-based advertising and technology company Epsilon. Click here to learn more about her journey.

AIM VIDEOS >>

The Beginning of ISRO's success story: Story Kya Hai

The renewed 'Story Kya Hai' focuses on fresh hopes, cutting-edge technologies, and new initiatives in India's AI, space, Agri-tech and Data Centre sectors.

In this episode, learn about the ongoing revolution shaping the future of India's space exploration through a historical lens, which pans from early initiatives such as Earth observation data, ground stations and launch capabilities to present-day advancements.

https://stratus.campaign-image.in/images/83238000007570001_zc_v1_1705240208276_screenshot_2024_01_14_at_7.19.30 pm.jpg

AIM EVENTS >>

The Rising 2024

The Rising 2024 stands as a powerful testament to diversity and inclusion in tech, which is not limited to the work culture of an organisation. It goes beyond processes, products, and services. To find out more about the event and book your tickets for the most exciting conference of 2024, click on this link.

Location: Hilton Convention Center, Manyata Tech Park, Bengaluru

Date: April 4-5, 2024

https://stratus.campaign-image.in/images/83238000007570001_zc_v1_1707720089490_passes-expiring-1.jpg

AIM SHOTS >>

Argmax released WhisperKit, a software package which enables OpenAI’s Whisper speech recognition model to operate on Apple Watches.
UiPath, a leading enterprise automation software company, is developing foundational models, according to Daniel Dines, co-founder and chief innovation officer at UiPath.
According to a recent survey report, generative AI has improved the productivity of software engineers by 70%.
Meta released a new AI model called Video Joint Embedding Predictive Architecture (V-JEPA).
GitHub has teamed up with Polar, a funding platform, to let developers make money from their projects on GitHub.
Andrej Karpathy, a founding member of OpenAI, departed the company.
OpenAI’s ChatGPT is undergoing testing to incorporate memory capabilities, aiming to improve user experience by remembering past conversations.
NVIDIA released “Chat with RTX,” an AI chatbot that operates directly on personal computers, marking a significant advancement in localised AI capabilities.
JioGenNext, the renowned startup accelerator, has introduced its latest cohort, MAP ’24, emphasising generative AI technology.
Cohere For AI (C4AI) announced Aya, a multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced.