Data Science Weekly - Issue 588
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Data Science Articles & Videos
Was Harvey Weinstein thanked more often than God at the Oscars?
I analysed almost 2,000 Oscar speeches to discover if the claim that Harvey Weinstein was thanked more often than God is true. Plus, we'll find out which Hollywood icon is bigger than both of them…
LLM (ML) Job Interviews (Fall 2024) - Process
A retelling of my experience interviewing for ML/LLM research science/engineering focused roles in Fall 2024…This post has two parts: 1) Job Search Mechanics (including context, applying, and industry information), which you can continue reading below, and, 2) Preparation Material and Overview of Questions, which you can read at LLM (ML) Job Interviews - Resources…
Fundamentals of GPU Architecture
This 9-part series on 'Fundamentals of GPU Architecture' covers everything from SIMT cores and warps to programming models…
Geospatial Python Tutorials
Welcome to Spatial Analysis and Remote Sensing Tutorials by Spatial Thoughts. These tutorials complement our Python courses and are suitable for learners who want to advance their skills…Each tutorial is in the form of a self-contained notebook and comes with step-by-step explanations and datasets. Many tutorials also have an accompanying video walkthrough. The preferred way to run each notebook is using Google Colab. Click the rocket icon located at the top of each tutorial to open it on Colab…
TrueSkill Part 2: Who is the GOAT?
In the previous post, we argued that the question of ‘Who is the Greatest Of All Time?’ for any competitive game is answerable with an algorithm. Such algorithms can account for the facts that skills vary over time and many players never played each other in their prime…In this post, we’ll use a variant of this algorithm, ‘TrueSkill Through Time,’ and match data to answer the ‘Who is the GOAT?’ question for Tennis, Boxing, and Warcraft 3…
Designing a Table Format for ML Workloads
In recent years the concept of a table format has really taken off, with explosive growth in technologies like Iceberg, Delta, and Hudi. With so many great options, one question I hear a lot is variations of "why can't Lance use an existing format like ...?"…In this blog post I will describe the Lance table format and hopefully answer that question. The very short TL;DR: existing table formats don't handle our customer's workflows. Basic operations require too much data copy, are too slow, or cannot be parallelized…
How Many Episodes Should You Watch Before Quitting a TV Show?
When to quit a subpar TV show, according to the data…
What are the top three technical skills or platforms to learn, NOT named R, Python, SQL, or any of the BI platforms (eg Tableau, PowerBI)? [Reddit]
E.g. Alteryx, OpenAI, etc?…
Practical Quantization in PyTorch
Quantization is a cheap and easy way to make your DNN run faster and with lower memory requirements. PyTorch offers a few different approaches to quantize your model. In this blog post, we’ll lay a (quick) foundation of quantization in deep learning, and then take a look at what each technique looks like in practice. Finally we’ll end with recommendations from the literature for using quantization in your workflows…
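For readers who want a feel for the idea before clicking through, here is a minimal pure-Python sketch of affine (asymmetric) 8-bit quantization. It is an illustration only, not the PyTorch API and not code from the linked post; the function names `quantize` and `dequantize` and the sample weights are made up for this example.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Stretch the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 when every value is zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [-1.0, -0.5, 0.0, 0.25, 2.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
```

Each restored value differs from the original by at most about half of `scale`, which is the precision given up in exchange for storing one byte per weight instead of four.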
DuckDB tricks - renaming fields in a SELECT * across tables
I was exploring some new data, joining across multiple tables, and doing a simple SELECT * as I’d not worked out yet which columns I actually wanted. The issue was that the same field name existed in more than one table. This meant that in the results from the query, it wasn’t clear which field came from which table…
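The ambiguity described above is easy to reproduce. The sketch below uses Python's built-in sqlite3 module rather than DuckDB, so it shows a generic workaround (building table-prefixed aliases from table metadata) and not the DuckDB trick from the linked post. The tables `orders` and `customers`, their rows, and the helper `prefixed_select` are hypothetical.

```python
import sqlite3


def prefixed_select(conn, tables):
    """Build a SELECT list that aliases every column as <table>_<column>."""
    parts = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is its name.
        for row in conn.execute(f"PRAGMA table_info({table})"):
            column = row[1]
            parts.append(f'"{table}"."{column}" AS "{table}_{column}"')
    return ", ".join(parts)


conn = sqlite3.connect(":memory:")
# Two hypothetical tables that share the column names id and created_at.
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, created_at TEXT);
    CREATE TABLE customers (id INTEGER, name TEXT, created_at TEXT);
    INSERT INTO orders VALUES (1, 10, '2025-02-01');
    INSERT INTO customers VALUES (10, 'Ada', '2024-11-30');
""")

select_list = prefixed_select(conn, ["orders", "customers"])
cursor = conn.execute(
    f"SELECT {select_list} FROM orders "
    "JOIN customers ON orders.customer_id = customers.id"
)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

With the aliases in place, `column_names` carries the source table in every field name, so `orders_created_at` and `customers_created_at` can no longer be confused.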
A socratic dialogue over the utility of DNA language models
I think I, alongside many other people in this field, live in this seemingly parallel universe where we don’t really understand why anyone is working on DNA language models. I say ‘parallel’, because there is obviously a world in which some very smart people are very much bullish about them: specifically the Arc Institute, who, just yesterday, released a paper that many people are quite excited about: Evo 2, a successor to the original Evo model…
To the avid fans of R, I respect your fight for it but honestly curious what keeps you motivated? [Reddit]
I started my career as an R user and loved it! Then after some years in I started looking for new roles and got the slap of reality that no one asks for R. Gradually made the switch to Python and never looked back. I have nothing against R and I still fend off unreasonable attacks on R by people who never used it calling it only good for ad hoc academic analysis and bla bla. But, is it still worth fighting for?…
How to remedy a badly calibrated machine learning model
Maybe you have a highly accurate model, but it's not calibrated, which means that you cannot use the predict_proba values for decision making. If that's the case we have some good news because there is a remedy in scikit-learn!…
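To make "not calibrated" concrete, here is a small pure-Python sketch that bins predicted probabilities and compares each bin's average prediction with the observed share of positives. It only measures miscalibration; it is not the scikit-learn remedy the post covers (scikit-learn's calibration tools live in `sklearn.calibration`). The function names and the toy predictions are made up for illustration.

```python
def reliability_table(probs, labels, n_bins=5):
    """Return (mean predicted prob, observed positive rate, count) per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        table.append((mean_pred, frac_pos, len(members)))
    return table


def expected_calibration_error(probs, labels, n_bins=5):
    """Average gap between confidence and accuracy, weighted by bin size."""
    total = len(probs)
    return sum(
        count / total * abs(mean_pred - frac_pos)
        for mean_pred, frac_pos, count in reliability_table(probs, labels, n_bins)
    )


# Toy data: the model says 0.9 every time but is right only half the time.
labels = [1] * 5 + [0] * 5
overconfident_ece = expected_calibration_error([0.9] * 10, labels)
calibrated_ece = expected_calibration_error([0.5] * 10, labels)
```

A large gap, as in the overconfident case, is the signal that the raw probabilities should not be used for decisions until the model has been recalibrated.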
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~66,600 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian