The latest news and links! Lots of AI video developments, some fun 3D, and a ton of OCR models recently. Plus the usual… consider upgrading to help me cover my time doing this!

TITAA #72: Fairy Tale Hunters

OCR Bonanza - Real-time Video - Inkwell - Fairytales - Cursed Houses - Training Tips
Before we kick off, I hope you’ll have a play with Fairytale Hunt, a small web app I made from public domain fairytale text. You are presented with a snippet from a fairy tale, and if you select some text in it, it will load a new snippet, hopefully a semantically related one! Meanwhile, you get points for common words in various categories seen in fairy tales (like talking animals). So far I’ve gotten: “ooh, this is fun,” “oooh I love it,” and “this is amazing.” But I’d love more people to see it.

It started as a tech demo of running search in the browser with no server, using a small embedding model loaded from Hugging Face. It’s fun to see how well the tiny model does at finding new sentences based on the words you highlight. There’s a little bit of an “about” in a linked dialog, and the code is available. I think the basic implementation would be a fun student project; most of the work was on the data side, as usual.

Meanwhile, this is a day late due to too much work this past week! Recs coming tomorrow!
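For anyone tempted by the student-project idea, here is a minimal Python sketch of the "select some text, get a related snippet" loop, plus the category points. Big assumptions: the real app runs in the browser with a small Hugging Face embedding model, whereas `embed` here is a toy word-count stand-in so the example runs with no downloads. `SNIPPETS`, `CATEGORY_WORDS`, `most_related`, and `category_points` are hypothetical names and data for illustration, not the app's actual code.

```python
import math
import re
from collections import Counter

# Hypothetical sample data; the real app uses public domain fairytale text.
SNIPPETS = [
    "The wolf knocked at the door and spoke in a soft voice.",
    "The princess lost her golden ball in the deep well.",
    "A talking fox offered the youngest son his advice.",
]

# Hypothetical scoring categories (the newsletter mentions talking animals).
CATEGORY_WORDS = {"talking animals": {"wolf", "fox", "frog"}}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def embed(text):
    """Toy stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def most_related(selection, snippets, exclude=None):
    """Return the snippet most similar to the selected text.

    `exclude` lets the caller skip the snippet currently on screen.
    """
    query = embed(selection)
    candidates = [s for s in snippets if s != exclude]
    return max(candidates, key=lambda s: cosine(query, embed(s)))


def category_points(snippet):
    """Count how many distinct category words appear in a snippet."""
    words = set(tokenize(snippet))
    return {cat: len(words & vocab) for cat, vocab in CATEGORY_WORDS.items()}


print(most_related("golden ball", SNIPPETS))
print(category_points(SNIPPETS[2]))
```

Swapping `embed` for a real sentence-embedding model is what would make the matches semantic rather than shared-word only; the ranking step would stay the same.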
AI Creative Tools

Video

Veo updated to 3.1 and added new tools to their Flow UI, including first and last frames, “insert” by text to edit, and extending in a “project.” It’s still a bit iffy to work with (I find the UI confusing), but you can do stuff like I did here with a pub scene: transforming between 2 images (it wasn’t perfect at it, there’s a jump cut), adding a dog with a text insert edit, and then extending to get the dog to leave and snow to fall (some weird furniture stuff). The two images, first of all (made in Midjourney):

Veo 3.1:

LongCat Video - you can try it here with a Pro subscription to Hugging Face. I was pleasantly surprised, although it added a lot of people with pints in the end:

Hailuo 2.3 is getting great reviews, behind only Sora and Veo 3.1 in the video gen leaderboards. (Replicate link.) These people are furniture challenged, just like the Veo dog, though:

Odyssey 2. Another real-time streaming video thing. Modify the world as you watch it. It gets weird fast, and isn’t super consistent. Clear lack of world knowledge and item integration issues… I started with “a Greek island” and then tried to modify the boats and add dolphins.

LTX-Fast video gen, on Replicate. Used in the cursed CursedSit.com experiment, which is basically slop Friends scenes, which is awful and I love it, if briefly. Also, see the scary version, which is creepier and has more exploding things. There is code. (Note, also out now: Krea’s open sourced real-time video.)

Cool research model - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives. I don’t know if they should lean in so hard on “recreate iconic scenes from classic films, demonstrating understanding of cinematic heritage and style.” But ok. “A text-to-video model that generates full scenes, not just isolated clips. … It maintains consistency of characters, objects, and style across all shots in a scene. … You provide shot-by-shot text prompts, giving you directorial control over the final video.”

3D Misc

More World Labs! RTFM: A Real-Time Frame Model: a streaming still image 3D panorama thingy that remains consistent even when you “turn around.” You can try demos of some scenes. Tbh, whenever I go try it, I get hung up on this really creepy door in the corner of this child’s playroom, and yes, it’s always there. There is a main door as well. Very Uketsu Strange Houses. In their tool, those big black buttons are controllers for zoom and pan etc:

Generating Infinite 3D World: Speaking of strange houses… This thing is a research project that generates totally cursed apartment layouts that are indeed infinite-ish. As a fan of weird architecture, I can’t wait to play with these. The repo is in progress, but I have hopes:

Crystal Words - a demo where text in shiny crystal falls with good physics.

RadianceFields made a creepy Halloween pumpkin hunt inside a giant splat of a McMansion. Speaking of, The Impact and Outlook of 3D Gaussian Splatting is an overview that might interest some…

Hunyuan World Mirror - an app that will take a video (or images) and construct a 3D Gaussian splat, mesh, 3D point cloud, depth maps, etc. from it. I tried with a Veo 3 video of the interior of one of my cozy pubs, and it was… ok? Not as good as World Labs splats, but still pretty amazing from a video generation clip. This is a workflow people have been suggesting for months now: generate a video clip, and then turn it into a 3D mesh.

Related, there is FlashWorld Spark, which takes an image and some camera directions, and generates a splat/ply for it. It’s a bit confusing, since the generated splat makes almost no sense to me… but when I discovered their “scrub” Trajectory Playback (step 5 on the right) for the camera controls, it made more sense.
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery: “Skyfall-GS converts satellite images to explorable 3D urban scenes using diffusion models, with real-time rendering performance.” This is one of those “Google Earth with buildings as splats” ideas, but actually we already know that there ARE navigable splat interiors in some Google Street View content, at least if you look in Android’s Maps, so huh.

A few misc 3D items:
Misc Fun

Nengiren’s little embroidered people - via Kottke, in Colossal and on IG. I love these.

A visual explainer on dithering: Dithering - Part 1.

🧚🏼♀️🧙🏼♀️ My Fairytale Hunt - search by selecting text, and explore the contents of some classic public domain fairytale texts. It all runs in the browser, using embeddings from a small Hugging Face model. (There’s an info popup and code.)

Deep Time Australia: An epic history of Aboriginal and Torres Strait Islander peoples: Via Matt Muir’s Web Curios. An epic collection of stories about Deep Time history from native peoples. This is a great project. My only complaint is that I can’t find pictures of the places (like the rock art mentions) connected to the topics. The rock art I found in searches is extraordinary.
GeoSpot Infinity - trained on Google Street View, it tries to guess the location of images you upload to it. I have a shit-ton of random French walking and travel photos, and it didn’t find mine. But it was interesting watching it struggle.

Drift - an AI shader experiment, with an astronaut drifting lost in space, and a lot of weird square particles around him. An AI call gets a diary entry for the day. (Via three.js)

Games Links

Inkwell - SiliconJungle on X is working on a 2D game engine with AI generation tools, which immediately solved one of my regular image gen test problems: generating walking sprites for a character. It’s an invite Discord/app situation, but really fun to play with. I am working on a gardening game based on the medieval garden I volunteer in, except with Victorian women instead of retired French ladies. You have to clean up the leaves in the fall, and there are herbs to pick and plant. It would be awesome to vary it by real-time season.

IFComp 2025: Interactive Fiction (i.e., CYOA text games) winners. Strangely, the first place entry notes: “This work also won 3rd place in the 2025 Miss Congeniality Awards.” What does this mean about it? I haven’t had a moment to even look at these.

SPINE by Backwards Tabletop: “A solo ttrpg about losing yourself in a book,” via Florence Smith Nicholls. Also see Sending, a Keepsake game, from Florence.
ASP level generation examples from our PCG textbook: “This is an updated, interactive web version of all but one of the answer-set programming (ASP) level-generation examples from… Mark J. Nelson and Adam M. Smith (2016). ASP with applications to mazes and levels. In: Noor Shaker, Julian Togelius, and Mark J. Nelson, eds., Procedural Content Generation in Games: A textbook and an overview of current research. Springer.” Lots of mazes and code interactives.

Creativity & Narrative - Super Terse Cuz So Late

Fabula - a Google DeepMind UI for writing with AI help (Gemini models), with a form to apply to try it out. (I have tried it, and found the text style pretty bad. So that’s not a solved problem here. YMMV otherwise.)

Generative Aesthetics: On formal stuckness in AI verse | Published in Journal of Cultural Analytics, on poetry and meter by Ryan Heuser. “This paper examines the formal and aesthetic patterns of AI-generated poems through a series of computational experiments. Through analyses of rhyme and rhythm, it reveals how large language models (LLMs) exhibit a stubborn, formal stuckness in their outputs.” So, not great.

How Kimi K2 RL’ed Qualitative Data to Write Better: the unusual move of focusing on qualitative problems, from Drew Breunig. So the general take is that Kimi K2 is stylistically interesting, while it loses coherence over long contexts. Still, style and local creativity matter.
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models. Using pattern detection and filtering. A Sam Paech joint; he’s got one of the benchmarks I report regularly for writing.

Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination.

CreativityPrism: A Holistic Benchmark for Large Language Model Creativity: Via Ted Underwood.

A Springer book on Narrative and AI Gen, via Mark Riedl. Likely to be expensive (yep, even the ebook is).

Data Science

An absolute bonanza of OCR models in the past couple weeks. I mean, weird numbers of releases and code bits.
LLM Training tips and tricks:
Introducing FlashPack: Lightning-Fast Model Loading for PyTorch.

Poetry: so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs - a paper by Maria Antoniak and others, including a good dataset. “Using a corpus of 19k English-language published poems from Poetry Foundation, we investigate how 4k poets have used whitespace in their works. We release a subset of 2.8k public-domain poems with preserved formatting to facilitate further research in this area.”

Comma - a browsable collection of medieval Latin and Old French manuscripts from Inria. This is really nice.

A Poem: Fairy Tale with Laryngitis and Resignation Letter

You remember the mermaid makes a deal, her tongue evicted from her throat, and moving is a knife-cut with every step. This is what escape from water means. Dear Colleagues, you write, for weeks I’ve been typing this letter in the bright kingdom of my imagination. Your body is a ship of pain. Pleasure is when you climb the rocks and watch the moonlight touching everywhere you want to go, a silver world called faraway. Dear Colleagues, you write, this place is a few sentences contained by the cursor’s rippling barrier— what happened here is only beaks and brackets, the serif’s liquid stroke. The old story has witches, a prince in love with the surging silence of women, a knife that turns the water red. You write, Dear Colleagues, now these years are filed in the infinite oceans of bureaucracy. Everything bleaches or fades. In other words, goodbye. Sometimes it’s possible to walk, although you’ve been told inside the oyster shell of your heart there is no soul. Creatures like you must end as a spray of salt, green droplets floating breathless in the air.

Stay healthy, keep reading, and enjoy the fall.

Best, Lynn (@arnicas, mostly on Bluesky, ex-Twitter, Mastodon).
Saturday, 1 November 2025