A question that often arises here is: why choose XGBoost when you have LLMs? In fact, data scientists who focus on tabular data are deeply divided when choosing between XGBoost, LightGBM, and LLMs.
LLMs can classify tabular data effectively with minimal preprocessing, though at the cost of inference time. Approaches for applying LLMs to tabular data, such as prompt engineering, are emerging but still in the early stages of development.
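To make the prompt-based approach concrete, here is a minimal sketch of how a tabular record might be serialized into a classification prompt. The `serialize_row` and `build_prompt` helpers, the column names, and the commented-out `query_llm` call are illustrative assumptions, not part of any particular library.

```python
# A minimal sketch of prompt-based tabular classification.
# `query_llm` is a stand-in for whichever LLM client you use;
# it is assumed here, not a real library call.

def serialize_row(row: dict) -> str:
    """Turn one tabular record into a natural-language description."""
    return ", ".join(f"{col} is {val}" for col, val in row.items())

def build_prompt(row: dict, labels: list[str]) -> str:
    """Wrap a serialized record in a constrained classification prompt."""
    return (
        "Classify the following record.\n"
        f"Record: {serialize_row(row)}.\n"
        f"Answer with exactly one of: {', '.join(labels)}."
    )

row = {"age": 42, "income": 58000, "tenure_months": 13}
prompt = build_prompt(row, labels=["churn", "no_churn"])
print(prompt)
# prediction = query_llm(prompt)  # hypothetical LLM call
```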
Instead of relying solely on textual outputs, the focus is shifting towards using the internal embeddings generated by LLMs, known as latent structure embeddings. These embeddings can be integrated into traditional tabular models like XGBoost.
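A hedged sketch of that embedding route follows, assuming the sentence-transformers and xgboost packages are installed and using a small open encoder as a stand-in for an LLM; the model name, toy rows, and labels are illustrative only.

```python
# A minimal sketch: embed serialized rows with an LLM-style encoder,
# then train XGBoost on the embeddings. Data and labels are toy values.
import numpy as np
from sentence_transformers import SentenceTransformer
from xgboost import XGBClassifier

rows = [
    "age is 42, income is 58000, tenure_months is 13",
    "age is 29, income is 31000, tenure_months is 2",
    "age is 55, income is 90000, tenure_months is 48",
    "age is 33, income is 45000, tenure_months is 5",
]
labels = np.array([0, 1, 0, 1])  # toy churn labels

encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(rows)  # shape: (n_rows, embedding_dim)

clf = XGBClassifier(n_estimators=50, max_depth=3)
clf.fit(X, labels)
print(clf.predict(encoder.encode(
    ["age is 40, income is 52000, tenure_months is 9"]
)))
```

The appeal of this split is that the encoder supplies dense features while the gradient-boosted trees handle the decision boundary, so no fine-tuning of the LLM itself is required.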
While Transformers have revolutionised generative AI, their primary strengths remain in handling unstructured and sequential data, as well as tasks involving intricate patterns. This convergence of techniques is a promising step towards more versatile and efficient machine learning models.
Read the full story here.
Toolkit for Ethical AI
Developing and deploying AI ethically and responsibly is of utmost importance, and a range of toolkits is available to assist in this endeavour; the full story rounds up a few of them.
Read the full story here.
Law-breaker Llama 2
With each new development, Llama 2 keeps breaking laws, scaling laws at least, emerging as a unique model that people are using as a base for training new ones. The latest development comes in the form of TinyLlama.
A research assistant at the Singapore University of Technology and Design has initiated the training of TinyLlama, a 1.1-billion-parameter model inspired by Llama 2, with the goal of pre-training it on a massive dataset of 3 trillion tokens. This ambitious goal goes against the Chinchilla scaling law, which says that for compute-optimal training of a Transformer-based language model, the number of parameters and the number of training tokens should scale in approximately equal proportions.
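A quick back-of-the-envelope check makes the mismatch concrete, using the common reading of the Chinchilla result as roughly 20 training tokens per parameter (itself an approximation):

```python
# Rough check of TinyLlama's plan against the Chinchilla heuristic
# of ~20 training tokens per parameter.
params = 1.1e9            # TinyLlama parameter count
target_tokens = 3e12      # planned pre-training tokens
chinchilla_tokens = 20 * params

print(f"Chinchilla-optimal tokens: {chinchilla_tokens:.2e}")  # ~2.20e+10
print(f"Planned tokens:            {target_tokens:.2e}")      # 3.00e+12
print(f"Over-training factor:      {target_tokens / chinchilla_tokens:.0f}x")  # ~136x
```

In other words, TinyLlama is slated to see on the order of a hundred times more tokens than the Chinchilla-optimal budget for its size.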
Read the full story here.
OpenAI-backed Startups