Hello! Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
If this newsletter is helpful to your job, please become a paid subscriber here: https://datascienceweekly.substack.com/subscribe :) (You get extra links each week!)
And now…let's dive into some interesting links from this week.
The Hard Truth about Artificial Intelligence in Healthcare: Clinical Effectiveness is Everything, not Flashy Tech Building machine learning/artificial intelligence medical devices (MAMDs) is much like bringing a new drug to market. First, both must be developed “in the lab.” Then, rigorous testing for efficacy and safety is conducted. Finally, physicians must be convinced to use the product and payers to reimburse it. Along this path, the vast majority of ostensibly promising drugs fail because the bar for commercial success is set high. This bar is no different for MAMDs. Yet, in the public discourse, too much weight is given to the technological sophistication of MAMDs instead of what is needed for successful implementation…
What Was Watched on Netflix in 2023? A Statistical Analysis An investigation into Netflix viewership activity in 2023…Netflix published viewership statistics from January to June 2023, a massive data dump covering over 18,000 titles with a minimum watch time of 50,000 hours…If you want to know what 247,150,000 households watched on television in 2023, this is the dataset for you. So today, we'll review five major takeaways from half a year of Netflix viewership and what this tells us about streaming's past, present, and future…
Learn how Pinecone's new serverless vector database helps Notion, Gong, and CS DISCO optimize their AI infrastructure from our VP of R&D, Ram Sriharsha: up to 50x lower costs because of the separation of reads, writes, and storage; O(s) fresh results with vector clustering over blob storage; fast search without sacrificing recall, powered by industry-first indexing and retrieval algorithms; powerful performance with a multi-tenant compute layer; and zero configuration or ongoing management.
Read the technical deep dive to understand how it was built and the unique considerations that needed to be made. * Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge This article attempts to comprehensively review relevant algorithms to provide a general understanding of this booming research area. The basis of our framework categorises these studies by the approach of solving ANNS problem, respectively hash-based, tree-based, graph-based and quantization-based approaches. Then we present an overview of existing challenges for vector databases. Lastly, we sketch how vector databases can be combined with large language models and provide new possibilities…
Chess Transformers - Teaching transformers to play chess Chess Transformers is a library for training transformer models to play chess by learning from human games…
Machine Learning for Big Code and Naturalness Research on machine learning for source code…Search across all paper titles, abstracts, authors by using the search field. Please consider contributing by updating the information of existing papers or adding new work…
What do you think about Yann LeCun's controversial opinions about ML? [Reddit] Yann LeCun has some controversial opinions about ML, and he's not shy about sharing them. He wrote a position paper called "A Path towards Autonomous Machine Intelligence" a while ago. Since then, he has also given a bunch of talks about this…
Six not-so-basic base R functions R is known for its versatility and extensive collection of packages. As of the publishing of this post, there are over 23 thousand packages on R-universe. But what if I told you that you could do some pretty amazing things without loading any packages at all?…There’s a lot of love for base R, and I am excited to pile on. In this blog post, we will explore a few of my favorite “not-so-basic” (i.e., maybe new to you!) base R functions. Click ‘Run code’ in order to see them in action, made possible by webR and the quarto-webr extension!…
The Perfect Way to Smooth Your Noisy Data Insanely fast and reliable smoothing and interpolation with the Whittaker-Eilers method…Real-world data is never clean. Whether you’re carrying out a survey, measuring rainfall or receiving GPS signals from space, noisy data is ever present. Dealing with such data is the main part of a data scientist’s job. It’s not all glamorous machine learning models and AI; it’s cleaning data in an attempt to extract as much meaningful information as possible. If you’re currently looking at a graph that has way too many squiggles to be useful, well, I have the solution you’re looking for…
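For readers curious what the Whittaker-Eilers method does under the hood, here is a minimal sketch, not the linked article's implementation. It assumes evenly spaced samples and uses the standard penalized least squares formulation: find z that minimizes ||y - z||^2 + lam * ||D z||^2, where D is a finite-difference matrix. The function name `whittaker_smooth` and its parameters `lam` and `order` are illustrative choices, and the dense solve is only suitable for short series (practical implementations use sparse banded solvers).

```python
import numpy as np


def whittaker_smooth(y, lam=100.0, order=2):
    """Smooth evenly spaced data y by penalized least squares.

    Minimizes ||y - z||^2 + lam * ||D z||^2, where D takes
    order-th finite differences. Larger lam gives a smoother curve.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n <= order:
        raise ValueError("need more samples than the difference order")
    # Rows of D are order-th differences of the identity: shape (n - order, n).
    D = np.diff(np.eye(n), n=order, axis=0)
    # Setting the gradient to zero gives (I + lam * D^T D) z = y.
    A = np.eye(n) + lam * (D.T @ D)
    return np.linalg.solve(A, y)


# Illustrative usage on synthetic data (a noisy sine wave).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
noisy = np.sin(x) + rng.normal(scale=0.3, size=x.size)
smooth = whittaker_smooth(noisy, lam=1000.0)
```

Raising `lam` trades fidelity to the raw samples for smoothness. With `order=2` the penalty leaves straight lines untouched, so a linear trend in the data is not flattened.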
Causal and trustworthy machine learning: methods and applications This work focuses on the intersection of machine learning and causal inference and the way in which the two fields can enhance each other by sharing ideas: utilizing machine learning techniques for the computation of causal quantities, the use of ideas from causal inference for invariant predictions under unseen treatment regimes, and the exploration of topics in trustworthy machine learning, including interpretability and fairness, with a causal lens. In each one of the presented works, we grappled with the strength of assumptions needed to utilize causal inference techniques and relax portions of them when possible…
AI for Economists: Prompts & Resources This page contains example prompts and responses intended to showcase how generative AI, namely LLMs like GPT-4, can benefit economists. Example prompts are shown from six domains: ideation and feedback; writing; background research; coding; data analysis; and mathematical derivations…
ARIMA vs Prophet vs LSTM for Time Series Prediction In this post, we will discuss three popular approaches to learning from time-series data: 1) The classic ARIMA framework for time series prediction 2) Facebook’s in-house model Prophet, which is specifically designed for learning from business time series 3) The LSTM model, a powerful recurrent neural network approach that has been used to achieve the best-known results for many problems on sequential data…
Iterative ‘mapping’ in R In my consulting work, I’m commonly asked to build out maps, charts, or reports for a large number of cities or regions at once. The goal here is often to allow for rapid exploration / iteration, so a basic map template might be fine. Doing this for a few cities one-by-one isn’t a problem, but it quickly gets tedious when you have dozens, if not hundreds, of visuals to produce – and keeping all the results organized can be a pain…
🩺 Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models We propose a framework that decodes specific information from a representation within an LLM by “patching” it into the inference pass on a different prompt that has been designed to encourage the extraction of that information. A "Patchscope" is a configuration of our framework that can be viewed as an inspection tool geared towards a particular objective. For example, this figure shows a simple Patchscope for decoding what is encoded in the representation of "CEO" in the source prompt (left). We patch a target prompt (right) comprised of few-shot demonstrations of token repetitions, which encourages decoding the token identity given a hidden representation…
tinytable tinytable is a small but powerful R package to draw HTML, LaTeX, PDF, Markdown, and Typst tables. The user interface is minimalist, but it gives users access to powerful frameworks to create endlessly customizable tables…
* Based on unique clicks. ** Find last week's issue #529 here.
Thank you for joining us this week! :) All our best, Hannah & Sebastian Copyright © 2013-2024 DataScienceWeekly.org, All rights reserved.
P.S. A new thing for paid subscribers => Even more links are below!... Subscribe to Data Science Weekly Newsletter to read the rest. Become a paying subscriber of Data Science Weekly Newsletter to get access to this post and other subscriber-only content. A subscription gets you: more links in each newsletter; subscriber-only posts and Q&As, career and job search Q&As, and office hours; and more newsletters per week full of data science / ML / AI inspiration!