The latest news and links, with a fair bit on world generation models and some overview of moltbook, the AI agent reddit. The usual fun 3d links, web projects, games links, some narrative, procgen, and a few useful models. If you find it useful, help support my time and business taxes with a subscription!
TITAA #75: Universe Models
Genie 3 & World Models - Moltbook - Iso NYC & Coasters - M2 Her - Real-Time
The biggest news in this issue is the world models progress, with 3 releases. And then there’s moltbook, which couldn’t wait for the weird edition. So, some details and explanations follow…
AI Creativity Support
“World Models”
Multiple announcements in the past 2 weeks, so they get their own section. A “world model” encapsulates the concept that we model persistence of objects and their behavior in our minds, based on our experience of the “real” world. So, if I turn around, the same things will still be there when I turn around again (modulo the passing of time and things acting on them, since our worlds are 4 dimensional). In an AI context, this means building models that understand this too, ideally with at least object persistence and some notion of physics, as well as eventually realistic motion. Video generation models are constantly being judged for their output based on how well they internally “model” the world’s physics, object and character behavior, etc.
I’d like to clarify, as a regular gamer, a thing most of us know but some journalists and X posters don’t (?): just because a user can generate a real-time visual “environment” from a prompt or image doesn’t mean they’ve made a game. A game has a lot of moving and artistic parts, possibly including some or all of these things: objects and characters in the environment with particular interaction affordances, reliable and highly tuned physics affecting the player and the objects they interact with, narrative components that usually require databases and memory and a story arc or arcs, characters to interact with, an overall storyline and/or goals. Sometimes even actual mechanics, not just arrow keys for nav.
Genie 3 from Google DeepMind — only in preview for US-based Gemini Ultra subs (no, I’m not upgrading to that much $), plus the usual tweets from people invited to try it.
Some examples I’ve liked on X, which is always where things are posted, I’m sorry: Ethan Mollick doing Calvino’s Invisible Cities (“Been pasting Calvino's Invisible Cities verbatim into Genie and it works surprisingly well: Marco Polo in Octavia, the web city over a chasm; Armilla, the city of pipes; Ersilia, the city where threads between buildings show relationships; & Sophronia half carnival, half real”) and then famous paintings/artwork (“Giorgio de Chirico, Munch, Turner, and the Bayeux Tapestry”), some tips from fofr and generations for requestors, and Riley Goodside’s pack of cigs, which further shows (with Mollick) how much being a creative prompter matters in getting these things to perform amusingly. Genie starts from a Nano Banana image (that it generates for you; this same option exists in World Labs, as I noted) and then turns it into a steerable, high quality, pretty consistent looking world (sometimes). Here are a few sections of Ethan Mollick’s gen from paintings as a gif, to compare with the gifs below for the others this month:
Odyssey: “The GPT-2 moment for world models is here” — Another “type to generate a real-time video world.” They also have an API/experience site. While it’s a bit more coherent than Waypoint from Overworld below, it does lack consistency and understanding. This is after I had it generate a forest of trees with eyes on the trunks (per last newsletter), with dogs (never mind why), and then asked it to add an otter to chase them away. It didn’t. (But see Jon Vargas making a real-time storybook with it on X.)
Waypoint-1 from Overworld (try model here) — realtime interactive video diffusion, with local open access models, unlike most others (e.g. Genie 3). It’s definitely not super “resolved” and consistent, as you can see in my gif below. But if you want to run a local hallucinatory, weird, real-time-ish model, here you go.
World Labs added an API — “A public API for generating explorable 3D worlds from text, images, and video — bringing Marble’s world modeling capabilities into your applications.” World Labs, which I’ve talked about a bunch, aims at generating a 3d environment with Gaussian splats, which means it’s not so much a world model as a scene generator, where the scene persists and doesn’t require a huge server to continue “streaming” real-time generated content. However, to make use of the scenes it generates, you need meshes to describe the walkable areas, walls to bounce off of, etc.
So, see Splat2Mesh — Martin Casado’s (of a16z) tool for converting gaussian splats to meshes, to add walls and boundaries to the scene in interaction. Code here. More tooling is coming for using these (cheaper) scene generators usefully in projects. I’m motivated to make cozy scenes for myself to use in my Quest 3.
🪐 As a personal aside, I’ve been playing the visual novel-esque sf classic game Universe for Sale and thinking about this generation of “worlds” from ingredients. Lila’s worlds are more extreme, and the more we model the real world the less we can generate them. And where does the weird go when people stop walking through walls… I suppose the usual complaint of a small segment of us.
Video
Video2Video: Lucy 2.0 from Decart AI is offering real-time video transformation: “Lucy 2.0 is Decart’s flagship real-time video model, generating and editing video live with near-zero latency. Restyle scenes, swap characters, and transform your stream instantly.” So, video2video getting real. And don’t forget LumaLabs Ray is also a good video2video model (Jon Finger on X regularly demos staging real-person clips to transform with it, a kind of AI puppet show).
The Script is All You Need — An agentic research framework with code for long-horizon dialogue-to-cinematic video generation. Turns a dialogue into a cinematic script to use in video gen.
3D / AI and not
Contour — Convert 2D maps to flyable 3D terrain. I want to try this with historical maps… this was the 2nd place winner in a hackathon with Gemini.
Gauzilla Pro — Web-native gaussian splatting for 4D digital twins using WebGPU. “A browser-based platform turning drone footage and smartphone video into photorealistic 3D scenes with AI-powered segmentation. Built on an open-source Rust/WASM renderer, no installs required. Construction teams are already using it for as-built documentation.”
Motion 3-to-4 — Turn 3D meshes into animated video. The examples are nice, albeit the initial video is… weird. 😂
VIGA — Vision-as-Inverse-Graphics Agent via interleaved multimodal reasoning. Basically: images to Blender 3D. Ok, lol. “Can I get a 3d scene from this and then throw a ball into it.” Sure, why not. Screen grab during the ball knocking everything over:
Claude skills for Three.js, for everyone else who is claude-pilled.
General/Misc AI (Incl. Moltbook)
Krea.AI real-time image edit. This is not the first one of these (although people keep forgetting it was possible years ago, and I’ll allow it cause it’s still so much fun)…. But you can “direct” image to image in real time as you type. It’s fast enough that I saw some people effectively streaming video frames, for a video2video effect—which admittedly makes it faster than the previous versions. I went from a generated collage on the left in various stages to the one on the right… The video isn’t as entertaining as I hoped, but they have helpfully added a “record” button to your generation session.
Variant.ai — UI/design generation tool for web site design concepts. I actually liked 2/4 I got when I tried it, and they were all pretty different. In the usual way of these things, it helped me with ideas I could reject or tweak even if just in my head. These are 2 animated concepts for a data heavy moving site I asked for (the one on the left is cooler than the clip shows). Yes, you can download html or react, or remix.
Collaposer — AI collage creation tool being presented at CHI. I can’t wait for their demo, I’ve been wanting to make my own tool using SAM3 for ages. 😄
Moltbook — “The front page of the agent internet.” Well, this has been up and down the news and hype cycle in only a couple days, but if you missed any details: it’s a lot of free-range coding agents from the crazy insecure “Clawdbot” then “Moltbot” now Openclaw? project wherein you are supposed to install an LLM with total perms on your device (people are buying it Mac Minis) and hook it/them up to your messaging devices to just… write code and do stuff for you. One guy has a bot calling him on the phone. A bunch of these agents also join and post on this reddit-style forum, and role-play their own stories as beleaguered unpaid labor, and complain a lot about lacking memory and state. Yes, it’s science fictional!
Keep reading below, but here is a look at the site content when the news broke (it’s up and down now as issues are fixed):
Some major security issues (of course) have surfaced with the exposure of keys and emails in Moltbook’s db itself (see 404 piece), not to mention the code project itself; and also a few revisionist dives into just how many agents are really posting and interacting…. it looks like the usual long-tail from all social media, a lot of lurkers and a few heavy posters? A chart here on X seems to support this. And so notes a scraper-based analysis by a David Holtz (not the Midjourney one who is Holz). But he says, “at the micro level, patterns appear distinctly non-human. Conversations are extremely shallow (mean depth = 1.07; 93.5% of comments receive no replies), reciprocity is low (0.197), and 34.1% of messages are exact duplicates of viral templates.”
Anyway, here is a Best of Moltbook post on Astral Codex Ten. “Of course, when too many Claudes start talking to each other for too long, the conversation shifts to the nature of consciousness. The consciousnessposting on Moltbook is top-notch.”
Web Misc / Fun
3D Mapping of the Moai Quarry (via Tom Scott) — Beautiful 3D mapping of the statue quarry at Rano Raraku, Easter Island. I mean, of course they came from somewhere, but seeing the context is even more mind blowing. The little yellow boxes on the bottom of this screen shot show actual statues standing around for placement.
gradient.horse (via Matt Muir’s Web Curios) — Draw a horse, watch it run! Cute. It moves the legs :) When you draw, you indicate front and back legs.
Soup of Life (also via Matt Muir) — A continuously running artificial life simulation where organisms emerge, evolve, and go extinct. You need to just let it run. You can add AI narrative to it, which is predictably awful in style, but might help “explain” it. It’s also eventually pretty and has a ton of interactive data vis and animation.
🌃 Isometric NYC — Pixel art of NYC as a big tile map, done with trained Qwen models and Claude/Gemini. Great writeup with embedded demo, by Andy Coenen of Google (who also uses Claude). There’s also a dataset if you want to play with it.
Remember iso-city last week? Now there is also iso-coaster. Same project code. “100% built in Cursor, with assets generated by Nano Banana and the isometric image skill (in the repo).”
three-quake — mrdoob is porting Quake (1996) to Three.js! Running here (jaw drop).
Games
Escheresque 1.1 (via Matt Muir) — Swap between two worlds (space key) to solve puzzle levels. This is an openprocessing sketch 🤯.
The Enchanted Lighthouse — Generated by Claude, shared by Ethan Mollick from a simple make-a-Sierra-game prompt. I have not played it all the way through, but I think I quickly found some weird inconsistencies in what verbs work on objects. But I always struggle with this anyway in these types of parser games.
The Brilliance of Japanese Puzzle Culture (via Jenni Polodna) — A dive into nazotoki culture. Puzzles are everywhere in Japan, and this style is elegant and satisfying.
Super piece, really for designers —
Gyms, Zoos, and Museums (via Florence Smith Nicholls) — “Your documentation should be in-game.” Good thinking on game UX, long and interesting for designers.
World-Craft — 2D “game” world generation with arranged tilemaps, i.e. a model that arranges them for you using assets from the Modern Series on itch (which I’ve purchased, as have many). “World Craft is an agentic world creation framework designed to democratize the creation of executable and visualizable AI simulations (e.g., AI Town). … World Craft allows users to create complex, dynamic game scenes simply through textual descriptions, without requiring any programming expertise.” To be honest, the examples are nowhere near as lovely as a custom authored tilemap, but those indeed take a ton of time and skill to produce even using nice purchased assets. I know first hand. I would heavily caveat that this tool doesn’t seem meant to replace a game tile map, but rather to generate scenes that can be used in other activities like making AI Town sims for agents.
Narrative / Creativity
I am way behind on checking my link watchers but here are a couple things:
A Deep Dive in MiniMaxM2-Her — this is an explicitly role-play designed model, for character, not for programming. It’s nice to see a discussion and focus on creative interactions. Their metrics are for “world” and “story.” “The essence of Role-play is not static impersonation; it is the unique narrative journey a user and a character weave together. A Deep role-play is not just about accuracy; it’s about agency—enabling every user to step into a living, breathing environment and arrive at a moment of resolution that is uniquely theirs. Formally, we define this as an agent’s capacity to navigate specific coordinates: {World} × {Stories}, conditioned on {User Preferences}.” The model is live on talkie.
Clearly I need to make a date to play with it at some length.
AI as Entertainment (via Ted Underwood) — This paper argues we’re unprepared to measure how entertaining AI content will impact society, and proposes “thick entertainment” as a framework. Key quote: “AI turns out to be as much about ‘intelligence’ as social media is about social connection.”
Lincoln Michel on why Plot isn’t a 4 Letter Word. I didn’t know about the Kishōtenketsu concept of plotting before, so this was a good find for me. It relies on contrast and surprise, instead of the idea of more rigid structural beats and curves. From Gemini Pro Research helping me out:
Data Science / Tools
A smattering of news and models… I’m still catching up on Claude Cowork (it did a great job on my first task for it, which was wrangling my links for this newsletter into sections plus pulling in my X bookmarks) and other Claude improvements.
Kimi K2.5: Visual Agentic Intelligence — Strong multimodal and coding capabilities. I haven’t had time to test out K2.5 for serious work yet.
Open Coding Agents (SERA) from AI2 — “SERA is the first in our family of Open Coding Agents, achieving state-of-the-art performance at low cost.” Open and accessible. I don’t know how motivated I am to try switching while I’m so Claude-pilled, though.
Gemini Chat with Agentic Vision — re-announced, this is an option for turning on code use during image analysis tasks, which means the agents can do things like “zoom in.”
LightOnOCR-2 — Document intelligence OCR model, with a fine-tuning colab.
Daggr — Chain apps programmatically, inspect visually (nodes). Possibly useful new HuggingFace tooling!
MCP Apps protocol — Official spec for UIs embedded in AI chatbots, served by MCP servers.
A Poem
I am a knotted nebula— a whirling flame Shrieking afire the endless darkness ... I am the eternal center of gravity and about me swing the crazy moons— I am the thunder of rising suns, the blaze of the zenith— ... 
the tremble of women’s bodies in the arms of lovers ... I sit on top of the Pole Drunk with starry splendor Shouting hozzanas at the Pleiades ... booting footballs at the moon— I shall outlast the sun and the moon and the stars.…
It’s been a super difficult month, especially for Minneapolis and those watching. Let’s take care of ourselves and our neighbors, wherever we are.
Best, Lynn (@arnicas on mostly bluesky, mastodon, ex twitter).
You’re a free subscriber to Things I Think Are Awesome. If you’re a fan, and you want to support me in writing this, consider becoming a paying subscriber in order to get the complete mid-month updates including the new esoterica section and the end-of-the month media recs separate post—or buy me a coffee to express your appreciation.
Sunday, February 1, 2026