The latest news and links! Lots of AI video developments, some fun 3D, and a ton of OCR models recently. Plus the usual… consider upgrading to help me cover my time doing this!

TITAA #72: Fairy Tale Hunters

OCR Bonanza - Real-time Video - Inkwell - Fairytales - Cursed Houses - Training Tips

Lynn Cherny

Nov 1

*Artwork by HJ Ford for the Olive Fairy Book, ed. Andrew Lang (Gutenberg)*

Before we kick off, I hope you’ll have a play with a small web app I made from fairytale text (public domain), Fairytale Hunt. You are presented with a snippet from a fairy tale, and if you select some text in it, it will load you up a new snippet, hopefully semantically related! Meanwhile, you get points for common words in various categories seen in fairy tales (like talking animals). I’ve gotten: “ooh, this is fun,” “oooh I love it,” and “this is amazing.” But I’d love more people to see it.

It started as a tech demo of running search in the browser with no server using a small embedding model loaded from Hugging Face. It’s fun to see how well the tiny model does at finding new sentences based on the words you highlight. There’s a little bit of an “about” in a linked dialogue and code available. I think the basic implementation would be a fun student project; most of the work was on the data side, as usual.
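For a feel of how select-to-search like this can work, here is a minimal Python sketch. It is an illustration under stated assumptions, not the app's actual code: the real app runs in the browser with a small embedding model from Hugging Face, while `embed` below is a hypothetical stand-in that just counts words, and `SNIPPETS` is made-up text. The ranking step (cosine similarity between the selection and every snippet) is the part that carries over.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for a real embedding model: a simple word-count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, snippets, exclude=None):
    """Return the snippet closest to the selected text, skipping `exclude`."""
    q = embed(query)
    candidates = [s for s in snippets if s != exclude]
    return max(candidates, key=lambda s: cosine(q, embed(s)))


# Hypothetical snippets, invented for this example.
SNIPPETS = [
    "The fox spoke to the prince and offered him three wishes.",
    "The old king locked his daughter in a tower of glass.",
    "A talking fox led the youngest son through the dark forest.",
]

if __name__ == "__main__":
    print(most_similar("talking fox", SNIPPETS))
```

Swapping `embed` for a real sentence-embedding model (and `cosine` for a dense dot product) would give the semantic matching described above; word counts only catch literal overlap.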

Meanwhile, a day late with this due to too much work this past week! Recs coming tomorrow!

Table of Contents

AI Creative Tools (Video, 3d including non AI)
Misc Fun Web
Games Links (and Creativity & Narrative News)
Data Science / NLP / Tools
A Poem: “Fairy Tale with Laryngitis and Resignation Letter”

AI Creative Tools

Video

Veo updated to 3.1 and added new tools to their Flow UI, including first and last frames, “insert” by text to edit, and extending in a “project.” It’s still a bit iffy to work with (I find the UI confusing), but you can do stuff like I did here with a pub scene transforming between 2 images (it wasn’t perfect at it, there’s a jump cut), adding a dog with a text insert edit, and then extending to get the dog to leave and snow to fall (some weird furniture stuff).

The two images, first of all (made in Midjourney):

Veo 3.1:

LongCat Video - you can try here with a Pro subscription to Hugging Face - I was pleasantly surprised although it added a lot of people with pints in the end:

Hailuo 2.3 is getting great reviews, behind only Sora and Veo 3.1 in the video gen leaderboards. (Replicate link.) These people are furniture challenged, just like the Veo dog, though:

Odyssey 2. Another real-time streaming video thing. Modify the world as you watch it. It gets weird fast, and isn’t super consistent. Clear lack of world knowledge and item integration issues… I started with “a Greek island” and then tried to modify the boats and add dolphins.

LTX-Fast video gen, on Replicate. Used in the cursed CursedSit.com experiment which is basically slop Friends scenes, which is awful and I love it if briefly. Also, see the scary version which is creepier and has more exploding things. There is code. (Note, also out now, Krea’s open sourced real-time video.)

Cool research model - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives. I don’t know if they should lean in so hard on “recreate iconic scenes from classic films, demonstrating understanding of cinematic heritage and style.” But ok. “A text-to-video model that generates full scenes, not just isolated clips. … It maintains consistency of characters, objects, and style across all shots in a scene. … You provide shot-by-shot text prompts, giving you directorial control over the final video.”

3D Misc

More WorldLabs! RTFM: A Real-Time Frame Model: a streaming still image 3d panorama thingy that remains consistent even when you “turn around.” You can try demos of some scenes. Tbh, whenever I go try it, I am hung up by this really creepy door in the corner of this child’s playroom, and yes, it’s always there. There is a main door as well. Very Uketsu Strange Houses. In their tool, those big black buttons are controllers for zoom and pan etc:

Generating Infinite 3D World: Speaking of strange houses… This thing is a research project that generates totally cursed apartment layouts, that are indeed infinite-ish. As a fan of weird architecture, I can’t wait to play with these. The repo is in progress, but I have hopes:

Crystal Words - text in shiny crystal will fall with good physics, a demo.

RadianceFields made a creepy Halloween pumpkin hunt inside a giant splat of a McMansion. Speaking of, The Impact and Outlook of 3D Gaussian Splatting is an overview that might interest some…

Hunyuan World Mirror — an app that will take a video (or images) and construct a 3d gaussian splat, mesh, 3d point cloud mesh, depth maps etc from it. I tried with a Veo 3 video of the interior of one of my cozy pubs, and it was — ok? Not as good as World Labs splats, but still pretty amazing from a video generation clip. This is a workflow people have been suggesting for months now; generate a video clip, and then turn it into a 3d mesh.

Related, there is FlashWorld Spark — which takes an image and some camera directions, and generates a splat/ply for it. It’s a bit confusing, since the generated splat makes almost no sense to me… but when I discovered their “scrub” Trajectory Playback (step 5 on the right) for the camera controls, it made more sense.

Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery: “Skyfall-GS converts satellite images to explorable 3D urban scenes using diffusion models, with real-time rendering performance.” This is one of those “Google Earth with buildings as splats” projects, but actually we already know that there ARE splat interiors that are navigable in some Google streetview content, at least if you look in Android’s Maps, so huh.

A few misc 3d items:

AlphaXiv doing an almost incomprehensible 3d look at LLM tensor operations.
Mesh2Motion Application: a web app in alpha. To rig up and animate your meshes.
Bounce - 3d physics for the web, a toolkit.

Misc Fun

Nengiren’s little embroidered people — via Kottke, in Colossal and on IG. I love these.

A visual explainer on dithering. Dithering - Part 1.

🧚🏼‍♀️🧙🏼‍♀️ My Fairytale Hunt — search by selecting text, and explore the contents of some classic public domain fairytale texts. It all runs in the browser, using embeddings from a small Hugging Face model. (There’s an info popup and code.)

Deep Time Australia: An epic history of Aboriginal and Torres Strait Islander peoples: Via Matt Muir’s Web Curios. An epic collection of stories about Deep Time history from native peoples. This is a great project. My only complaint is that I can’t find pictures of the places (like the rock art mentions) connected to the topics. The rock art I found in searches is extraordinary.

Rock art from the Burrup peninsula, on flickr

GeoSpot Infinity — trained on Google streetview, it tries to guess the location of images you upload to it. I have a shit-ton of random French walking and travel photos, and it didn’t find mine. But it was interesting watching it struggle.

Drift — an AI shader experiment, with an astronaut drifting lost in space, and a lot of weird square particles around him. An AI call gets a diary entry for the day. (Via three.js)

Games Links

Inkwell — SiliconJungle on X is working on a 2d game engine with AI generation tools, which immediately solved one of my regular image gen test problems: generating walking sprites for a character. It’s an invite Discord/app situation, but really fun to play with. I am working on a gardening game based on the medieval garden I volunteer in, except with Victorian women instead of retired French ladies. You have to clean up the leaves in the fall, and there are herbs to pick and plant. It would be awesome to vary it by real-time season.

IFComp 2025: Interactive Fiction (i.e., CYOA text games) winners. Strangely, the first-place entry notes “This work also won 3rd place in the 2025 Miss Congeniality Awards.” What does this mean about it? I haven’t had a moment to even look at these.

SPINE by Backwards Tabletop: “A solo ttrpg about losing yourself in a book” via Florence Smith Nicholls. Also see Sending, a Keepsake game, from Florence.

In SPINE, you play as a researcher who has inherited this strange book from an estranged relative. What’s inside the book is equally peculiar, a collection of excerpts about immortality and several magicians who sought it.

ASP level generation examples from our PCG textbook: “This is an updated, interactive web version of all but one of the answer-set programming (ASP) level-generation examples from… Mark J. Nelson and Adam M. Smith (2016). ASP with applications to mazes and levels. In: Noor Shaker, Julian Togelius, and Mark J. Nelson, eds., Procedural Content Generation in Games: A textbook and an overview of current research. Springer.” Lots of mazes and code interactives.

Creativity & Narrative - Super Terse Cuz So Late

Fabula — a Google DeepMind writing-with-AI helper UI (Gemini models), with a form to apply to try it out. (I have tried it, and found the text style pretty bad. So that’s not a solved problem here. YMMV otherwise.)

Generative Aesthetics: On formal stuckness in AI verse, published in Journal of Cultural Analytics, on poetry and meter by Ryan Heuser. “This paper examines the formal and aesthetic patterns of AI-generated poems through a series of computational experiments. Through analyses of rhyme and rhythm, it reveals how large language models (LLMs) exhibit a stubborn, formal stuckness in their outputs.” So, not great.

How Kimi K2 RL’ed Qualitative Data to Write Better: the unusual move of focusing on qualitative problems, from Drew Breunig. So the general take is that Kimi K2 is stylistically interesting, while it loses coherence over long contexts. Still, style and local creativity matter.

James’s crime story rubric illustrates something crucial: when dealing with complex, qualitative phenomena, imperfect categorization often beats the alternatives. Rather than abandoning systematic analysis entirely or waiting for a perfect measurement system that may never come, breaking down the complexity into manageable, assessable components lets you make progress.

Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models. Using pattern detection and filtering. A Sam Paech joint; he’s got one of the benchmarks I report regularly for writing.
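To make "pattern detection" concrete, here is a toy Python sketch that flags word trigrams recurring across a batch of generated texts. This is not the Antislop implementation; the function names, the 50% threshold, and the sample sentences are all assumptions invented for illustration.

```python
import re
from collections import Counter


def ngrams(text, n=3):
    """Set of word n-grams in a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def overused_phrases(texts, n=3, min_share=0.5):
    """Phrases that appear in at least `min_share` of the texts (and more than once)."""
    doc_counts = Counter()
    for text in texts:
        # Each text contributes a phrase at most once, so counts are per-document.
        doc_counts.update(ngrams(text, n))
    threshold = min_share * len(texts)
    return sorted(p for p, c in doc_counts.items() if c >= threshold and c > 1)


# Hypothetical model outputs, invented for this example.
SAMPLES = [
    "Her eyes sparkled with a testament to courage.",
    "The ruins stood as a testament to time.",
    "It was a quiet morning in the village.",
]

if __name__ == "__main__":
    print(overused_phrases(SAMPLES))
```

A filtering step could then penalize or block the flagged phrases at generation time, which is the harder half of the problem.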

Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination.

CreativityPrism: A Holistic Benchmark for Large Language Model Creativity: Via Ted Underwood.

A Springer book on Narrative and AI Gen, via Mark Riedl. Likely to be expensive (yep, even the ebook is).

Data Science

An absolute bonanza of OCR models in the past couple weeks. I mean, weird numbers of releases and code bits.

A blog post on HF: Supercharge your OCR Pipelines with Open Models, which includes some of the recent releases including Chandra and olm-OCR-2.
LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR. “This blogpost introduces LightOnOCR-1B, a vision–language model for OCR that achieves state-of-the-art performance in its weight class while outperforming much larger general-purpose models. It achieves these results while running 6.49× faster than dots.ocr, 2.67× faster than PaddleOCR-VL-0.9B and 1.73× faster than DeepSeekOCR.”
DeepSeek-OCR Usage Guide - vLLM Recipes: run from vLLM. Or run it by API on Replicate (lucataco/deepseek-ocr).
Chandra (datalab-to/chandra on Hugging Face): “Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs, while preserving layout information.”
huggingface/finepdfs: Codebase for FinePDFs (This repository accompanies the FinePDFs dataset release and contains the end‑to‑end code to filter, extract, OCR, postprocess, deduplicate, classify, and package large‑scale PDF text data.)
wjbmattingly/dots-ocr-editor: code you can work with from William, a Flask web application for editing OCR output that allows users to:
- Rearrange reading order of text blocks
- Edit text content
- Adjust bounding boxes interactively
- Group bounding boxes together
- Change layout categories

LLM Training tips and tricks:

Guide: counting r in strawberry (and how to add abilities generally) · karpathy/nanochat
The Smol Training Playbook: The Secrets to Building World-Class LLMs - a Hugging Face Space with a very large document.
Streaming datasets: 100x More Efficient: a post and how-to from Hugging Face.

Introducing FlashPack: Lightning-Fast Model Loading for PyTorch.

Poetry: so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs - a paper by Maria Antoniak and others — including a good dataset. “Using a corpus of 19k English-language published poems from Poetry Foundation, we investigate how 4k poets have used whitespace in their works. We release a subset of 2.8k public-domain poems with preserved formatting to facilitate further research in this area.”

Comma — a browsable collection of medieval Latin and Old French manuscripts from Inria. This is really nice.

A Poem: Fairy Tale with Laryngitis and Resignation Letter

You remember the mermaid makes a deal,
her tongue evicted from her throat,
and moving is a knife-cut with every step.
This is what escape from water means.
Dear Colleagues, you write, for weeks
I’ve been typing this letter in the bright
kingdom of my imagination. Your body
is a ship of pain. Pleasure is when you climb
the rocks and watch the moonlight
touching everywhere you want to go,
a silver world called faraway. Dear Colleagues,
you write, this place is a few sentences
contained by the cursor’s rippling barrier—
what happened here is only beaks
and brackets, the serif’s liquid stroke.
The old story has witches, a prince in love
with the surging silence of women,
a knife that turns the water red. You write,
Dear Colleagues, now these years are filed
in the infinite oceans of bureaucracy.
Everything bleaches or fades. In other words,
goodbye. Sometimes it’s possible to walk,
although you’ve been told inside the oyster
shell of your heart there is no soul.
Creatures like you must end as a spray of salt,
green droplets floating breathless in the air.

— Jehanne Dubrow

Stay healthy, keep reading, and enjoy the fall.

Best, Lynn (@arnicas on mostly bluesky, ex twitter, mastodon).

Saturday, November 1, 2025