Monday, March 31, 2025
TITAA #65: AI Bakeoff Hits a Spaceship
Two more weeks of creative tech news! SO MUCH! Upgrade your sub to get the media recs posts with details, plus the news section mid-month, which is always equally rich, and includes the special esoteric & weird links section. Being a paying subscriber helps me quite a lot (tax-wise, time-wise, morale-wise)!

Image Gen with 4o & More - 3D Gen & Games - inZoi - Creative Ngram Slop - Logic Puzzles
It’s been a long couple of weeks, with a lot of big AI releases! So no fun intro article. Long issue, visit the web?

Nutshell:
• I took GPT 4o image gen for a strong test, and it didn’t measure up entirely to all promises, but it’s very good at some things. There were other good models released too.
• Some great 3D gen code releases and demos.
• Gemini 2.5 Pro experimental re-energized vibe game generation, and I pulled out a bunch of the latest (mostly three.js) examples.
• The new big LLM models upset some of the creative writing benchmarks!
• There was at least one very interesting AI-related game early release.
• Plus some good games writing and news.
• And as always, a lot of other links and tools and articles, and I’m ready for a vacation.

In two weeks I’ll report on some weird exhibits in the UK, see you then!
AI Creativity Tools

Images

Within just a few days, there were a number of important image gen model releases, with the biggest bang coming from GPT 4o’s multimodal chat-during-image-creation release. It’s slow and expensive to generate images: behind the scenes, it seems to be an autoregressive model like Google’s Parti was, which means it’s intense to run. Currently it offers limited generations and isn’t open to free users due to unexpected popularity, mostly from people making Ghibli-style knockoffs, which… never mind.

But it’s very good at following directions and has a few hidden superpowers that are surfacing, like creating transparent backgrounds. It is very good at making comics, visual posters, infographics, etc., including from rough sketches and little direction. Here’s a visual treatment it did for the first stanza of Yeats’ poem “The Stolen Child,” which I provided, asking only for “appropriate visual treatment and flourishes.”

However, the hyped tile creation hasn’t been perfect, nor has sprite sheet creation. You’ll notice that this art deco flowers tile request (as viewed in a seamless viewer tool) works right/left but not top/bottom:
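If you’d rather put a number on a seam than eyeball it in a viewer, here’s a minimal sketch of one way to do it. It is not what any particular viewer tool does: the function name `seam_mismatch` is made up for illustration, and it assumes the tile is already loaded as rows of grayscale values (0-255). It compares the pixels that end up next to each other when the tile repeats, so a low score suggests a clean wrap on that axis and a high score suggests a visible seam.

```python
from typing import List, Tuple

Tile = List[List[int]]  # rows of grayscale pixel values, 0-255 (an assumption)


def seam_mismatch(tile: Tile) -> Tuple[float, float]:
    """Return (left_right, top_bottom) mean absolute differences.

    left_right compares the last column with the first column, which
    become neighbours when the tile repeats horizontally; top_bottom
    does the same for the last and first rows.
    """
    height, width = len(tile), len(tile[0])
    left_right = sum(abs(row[-1] - row[0]) for row in tile) / height
    top_bottom = sum(abs(tile[-1][x] - tile[0][x]) for x in range(width)) / width
    return left_right, top_bottom


# Toy 4x4 tile: brightness changes only from top to bottom, so it wraps
# cleanly left/right but has a hard jump where the bottom meets the top.
toy = [[y * 60] * 4 for y in range(4)]
lr, tb = seam_mismatch(toy)
print(f"left/right seam: {lr:.1f}, top/bottom seam: {tb:.1f}")
```

On real images the edge pixels of a good tile will be similar rather than identical, so treat the scores as relative: compare the two axes against each other, or against the typical difference between neighbouring rows inside the image.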
Midjourney still does this well with the “--tile” flag. All of these came out quickly and produced good tiles:

With a sprite sheet request, GPT 4o definitely doesn’t understand sizes like 64x64 or 128x128. I would have to fix this to make it work; the figures don’t quite line up. The transparent background is good, though.

Other claims about its abilities that I haven’t tested: the maze generation trick (you need to generate the solution and then ask it to remove the path), generating PBR textures from a photo, normal maps, etc., generating stereoscopic 3D images (left and right eye views), and creating comics. I put it in a content creation contest below, so keep reading.

Reve.art Image: the startup from Christian Cantrell (formerly of the Adobe Photoshop Stability plugin, then of Stability Product, then of quitting and working on this plus writing novels, apparently?!). The claim is that it’s much better at interpreting language instructions, and very good at text-in-image generation. It supports some back and forth with the model as you instruct it how to redo the image by prompt editing, but it’s not the same kind of intelligence as GPT 4o. I did find it good in my quick tests, but it fails the “wine glass full to the brim, almost overflowing” test that I’ve seen on X. Here’s my test on the poem text, where I specified an art deco style (it doesn’t fully get all the text, but did a pretty good job trying to?).

Now let’s do the very hard “an asteroid hitting a luxury space ship” prompt, inspired by the Swedish SF movie Aniara, which I highly recommend. This has been a challenging prompt for every model, forever. Reve wasn’t too bad, and offers the option of AI-aided prompt rewriting, plus editing that if you like:

It struggled with showing the asteroid at the point of impact, with an explosion of the right size. The one you see above is not exactly what I had in mind but does a good job of the concept: luxury ship, some kind of explosion that could be from an impact.
OTOH, the GPT 4o model, after 3 edits, got me the composition I was after. We started from this, and then I got it to make the rock smaller and the ship less boaty and more space-worthy:

It lost the plot on the nebula background and the ship looks more derelict than luxury, but this is fair:

Gemini Flash Experimental with image ability (their blog post), while very good at some problems, just had no idea what was going on here, bless. I’m a bit bothered by its overly apologetic personality; it makes me feel bad. (Remember Anthropic went to a lot of effort designing Claude’s character?)

Ideogram 3.0 also launched with better text-art generation tooling, some style influence ability, and general model improvements including prompt following. I like it! It did some nice visuals for the poem, but did not even get close to including all the proper text, unfortunately:

Another open source text-art generation model dropped, LeX-Art: Rethinking Text Generation for Visual Content. Haven’t tested it or seen a demo yet.

Midjourney is running another v7 output ratings game now. Sadly, it all just looks like similar generic AI art to me. My preference is to use my trained styles on MJ, and that could be replaced with other solutions… But it is good at tiles.

Video

AI video incorporation in projects, especially using multiple tools, is only getting better and better… Look, there’s been a lot of video coverage in other issues of mine, and I have to get this out the door. So just a few:

Cuco - A Love Letter To LA on Vimeo: a Paul Trillo AI and VFX project with an artist collaborator. I highly recommend the breakdown explanation to give an idea of the tools and skills used (I heard like 30 people were involved?). It features custom LoRAs, 3D modeling, and much more.

Two more tools for developing your AI video story: Hongos, by Samim, on GitHub, and LTX Studio, whose new updates support storyboarding and brainstorming down to detailed shots.

3D

SuperSplat has an editor.
Not sure I knew this. Can I get my splats off Niantic Scaniverse, is the question. PlayCanvas generally is worth watching; they have done a ton of improvements recently. Also check out Splatrograph API, a splat interface from the command line.

VibeDraw (GitHub) for tldraw. Turn a sketch into 3D.

Code releases:

gfodor/text2vox, deployed on Replicate. A text-to-voxel engine/tool that generates MagicaVoxel models. Huh. It worked well on this snowy pine tree request:

SynCity: Training-Free Generation of 3D Worlds. Generate complex and immersive 3D worlds from text prompts without any training or optimization. The web page has one small compressed demo world, which has an incoherent landscape, but I still found it weirdly compelling, as a fan of open worlds.

Bolt3d: from Google and Oxford, generating 3D scenes in seconds. No code.

Blender MCP: How to Set Up Tripo in Blender and Sync with Cursor (h/t Luokai). Probably goes well with Tripo’s Image2Texture model demo here! Btw: I tried another Blender MCP last week, and via Claude it was plagued by network timeouts, failed fetches for third-party assets, Claude losing connection…. I think that using MCP via a more granular tool like Cursor is a good idea, since a fail won’t wipe everything out. Here’s one that suggests it does image-to-3D in Cursor/Windsurf: blender-mcp-vxai.

Hunyuan’s fast 3D image input models on HuggingFace. Here’s 2mv’s demo. It wants 4 pictures for the angles. You could use Gemini Flash multimodal gen or GPT 4o for this! (I did, it worked great.)

Roblox Cube3D demo and open sourced model: This goes with their paper on 3D below in the Games / Research section.
I had unconnected oddness with the results from some of my prompts, as you can see below:

After using that, you need the Image2Texture MV Adapter, which worked pretty well for me after some settings fiddling (a generated glb plus a MJ image):

And see this next section:

AI Generated “Vibe Games”

Since the release of Claude 3.7 (see my post Claude Goes 3D), the new Deepseek (try it with DeepSite), and Gemini 2.5 Pro Exp, everyone on X is building games with a prompt. NB: After I wrote that, I took out the quotes on “game.” My position is that it’s interesting to see what people want to build, and interesting that so many want to do games of some variety. Of course a good game is a lot of work to make. There’s a ton more to unpack in here, imo, but I’m on a deadline right now.

A lot of the LLM game generators are in some way using three.js, and the three.js team is making an llms.txt file to help as API context for models. (Here’s some background on that concept, and in my Data Science tools section I mentioned my MCP solution for it.)

Speaking of llms.txt, that levelsio dude had everyone do #vibejam games too, and I picked up a few links I liked, either related or not to his jam.

VAPOR - AI Adventure System. This is a websim.ai special, not a vibejam game (afaik), that does interactive fiction on demand. It’s actually amazingly good, briefly, if you turn off the terrible soundtrack. You need to enter a theme or words first, then make sure you look around the 3D space for the clickable actions as things render.

Explore the World with Glenn. This started out as a cool 3D maps experiment but seems to have turned into another driving sim with complicated controls…. ymmv.

Planetary: a space flight sim with things to find in space. Brought my Mac to its knees, but maybe it was too many open tabs with vibejam games.

Sweetgrass, another 3D explorer world, where you try to pick stuff up.
Btw, the vibejam games have “portals” that take you into another of the games in the jam. Be aware, you can fall in.

Indiana Bones, a 3D shooter, kind of. It’s so hard to navigate in these 3D spaces without a controller! I went thru a portal right away and got confused.

Paige Bailey gave a PDF of a book to Gemini 2.5 and got a CYOA game from it (reminiscent of what Steve Johnson did for NotebookLM, but in this case the UI is made by the tool). I need to play with this approach.

Minecraft as made by Gemini, running on Codepen.

Alex Chen and his team from Google are sharing a lot of good Gemini 2.5 projects, including an animated pelican riding a bicycle done in p5.js for Simon Willison (Gemini canvas link).

Misc Web Procgen Arty

HTML Review’s latest (Spring 2025), it is so good. Web art, ASCII, poetry, and interactive oddities, go enjoy. This animated frame is incredibly cool, I can’t remember who I got it from. It’s the combo of the pseudo 3D, animation, and mouseover turning the still image colorful and alive…

Learn Threejs Shading Language and Signed Distance Fields - YouTube courses. Via mr.doob.

The Useless Web - via Clive Thompson. Just go somewhere random. Related: the Marginalia Search Engine (via Vicki Boykis).

Terrible as UI, weird fun concept: ZUI on Wikipedia links (Hypertext link zoom). Via Gorilla Sun.

Animation libs: react-bits: An open source collection of animated, interactive & fully customizable React components for building stunning, memorable user interfaces. Lots of text effects, which is why I rec it.

This 3D game-like (inspired by Inside) portfolio is 2 years old and yet holy crap: https://pawel-brod.com/ and on GitHub, with toolset.

StarVector models for SVG generation on HuggingFace.

Allison Parrish has a new book coming out (procgen poetry): Two of Pentacles, by Allison Parrish.
© 2025 Lynn Cherny
548 Market Street PMB 72296, San Francisco, CA 94104