Data Science Weekly - Issue 641
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Data Science Articles & Videos
From Noise to Image
An interactive, visual guide to the magic behind how AIs generate images from text…
Data Science Interview Process [Reddit]
We are currently preparing our interview process, and I would like to hear what you think, as a potential candidate, about what we are planning for a mid-level to experienced data scientist…
How the Residual Stream is (Not) Linear
I commonly hear about the linearity of the residual stream of Transformer language models. Linearity, it is argued, is powerful for interpretability, and gives validity to interpretability tools like logit lens and steering vectors. In this brief note, we ask, linear with respect to what? We argue that the residual stream is not linear with respect to anything that gives us significant leverage…
What’s an API? - What McDonald’s and Lyft have in common
Application Programming Interfaces (APIs) are like drive-thru windows, but in code: they take inputs and give you predictable outputs.
At its core, an API is a bunch of code that takes an input and gives you an output
Most modern applications (like Excel) are a bunch of APIs working together
Sometimes, companies will make parts of their APIs publicly available, like X/Twitter or Google Maps
APIs are one of the more confusing concepts in software, because they can mean a lot of different things
APIs power most of modern software development, and are a key part of being able to talk intelligently about code. So read this!….
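To make the drive-thru analogy concrete, here is a minimal Python sketch of "code that takes an input and gives you an output". It is not taken from the linked article: the `MENU` data, the `Order` type, and the `place_order` function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical menu; item names and prices are invented for illustration.
MENU = {"burger": 5.00, "fries": 2.50}


@dataclass(frozen=True)
class Order:
    item: str
    quantity: int
    total: float


def place_order(item: str, quantity: int = 1) -> Order:
    """The 'drive-thru window': callers pass inputs and get a predictable output.

    Callers never see how the kitchen works; they only rely on this contract.
    """
    if item not in MENU:
        raise ValueError(f"unknown item: {item}")
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return Order(item=item, quantity=quantity, total=MENU[item] * quantity)


if __name__ == "__main__":
    print(place_order("fries", 2))
```

The point of the sketch is the contract: the same valid input always yields the same shape of output, and invalid input is rejected in a documented way. A public web API applies the same idea over a network instead of a function call.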
Danfei Xu: Human Data as a Foundation for Robot Learning
In this guest lecture for the ETH Zurich course “Robot Learning: From Fundamentals to Foundation Models” (Spring 2026), hosted and led by Oier Mees, Prof. Danfei Xu (Georgia Tech & NVIDIA) explores the pivotal role of human data in developing robust robotic systems…
When AI Writes the World’s Software, Who Verifies It?
Code Metal recently raised $125 million to rewrite defense industry code using AI. Google and Microsoft both report that 25–30% of their new code is AI-generated. AWS used AI to modernize 40 million lines of COBOL for Toyota. Microsoft’s CTO predicts that 95% of all code will be AI-generated by 2030. The rewriting of the world’s software is not coming. It is underway. Anthropic recently built a 100,000-line C compiler using parallel AI agents in two weeks, for under $20,000. It boots Linux and compiles SQLite, PostgreSQL, Redis, and Lua. AI can now produce large-scale software at astonishing speed. But can it prove the compiler correct? Not yet. No one is formally verifying the result…
5 Useless-but-Useful R Functions You’ll Use Every Day
They look silly. They feel pointless. Yet once you add them to your .Rprofile, you’ll wonder how you ever lived without them…
How Well Does Agent Development Reflect Real-World Work?
In this work, we systematically study the relationship between agent development efforts and the distribution of real-world human work by mapping benchmark instances to work domains and skills. We first analyze 43 benchmarks and 72,342 tasks, measuring their alignment with human employment and capital allocation across all 1,016 real-world occupations in the U.S. labor market. We reveal substantial mismatches between agent development that tends to be programming-centric, and the categories in which human labor and economic value are concentrated…
Sentiment Analysis in 3 steps: Using quickSentiment in R
In my job, I had to build a sentiment analysis model and compare the model and vectorization performance. It took a hell of a long time to code and run, with a crazy, ugly script that was difficult to reproduce. Then I decided to make a package, and quickSentiment 0.3.1 is now on CRAN. I try to cover most of the ML and vectorization process and pre-processing in just 2 steps…
Training a Water Segmentation Model with TorchGeo
One notebook, a few hundred lines of Python, and you go from raw Sentinel-2 imagery to a georeferenced water map you can open in QGIS. That’s the premise of the TorchGeo tutorial we put together for the ICLR 2026 ML4RS Workshop (paper). It walks through the full earth observation (EO) ML workflow: loading multispectral data, training a semantic segmentation model on the Earth Surface Water dataset, and running gridded inference on a Sentinel-2 scene over Rio de Janeiro…
AI agent skills are reusable prompt files (typically SKILL.md) that extend coding assistants like Claude Code with specialized, domain-specific workflows. This list focuses on skills useful for R users…
Loss landscape visualization 1 -- Seeing sticky plateau
I love picturing optimization as “a ball rolling down a hill”. However, real loss landscapes of neural networks are very high-dimensional. A convenient way to visualize them is to apply PCA to the weight trajectory, but unfortunately this does not capture local information well (see my prompt below). An obvious problem is that the trajectory points do not lie exactly on the 2D PC plane…
Forget Clicks: Why CTR Is a Terrible Metric for Ad Effectiveness
Short practical advice on ad measurement…
Dalessandro, Hook, Perlich, and Provost’s paper “Evaluating and Optimizing Online Advertising: Forget the Click, but There Are Good Proxies” is a sobering reality check for the online advertising industry. The central finding is simple but devastating: clicks are not just a poor proxy for conversions; in many cases, they’re no better than random guessing. This matters because the vast majority of the industry optimizes for CTR. Platforms recommend high-CTR ads. Agencies report on CTR. Academic papers study CTR. But this paper shows that this focus is fundamentally wrongheaded…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Please take a look at last week's issue #640 here.
Cutting Room Floor
Thermodynamic Linear Algebra: estimating the inverse of a matrix
AdderBoard - Smallest transformer that can add two 10-digit numbers
Software Review in the Era of AI: What We Are Testing at rOpenSci
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything you need to know about getting a data science job, based on answers to thousands of reader emails like yours. The course has three sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~68,750 subscribers by sponsoring this newsletter. 30-35% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian