Month: February 2024

Ethereum regains NFT sales crown as ETH eyes $2,500

Adewale Olarinde

Ethereum leads in NFT sales trade volume.
ETH price closes in on $2,500.

Bitcoin [BTC] made a significant entry into the NFT scene, causing a shakeup in the rankings and displacing Ethereu

The post Ethereum regains NFT sales crown as ETH eyes $2,500 appeared first on AMBCrypto.

February 10, 2024
DeeStream pushes beyond Monero and Solana in 2024 predictions

Guest Post

Monero (XMR) is known for privacy, while Solana (SOL) has a scalable solution. DeeStream (DST) offers a web3 video streaming platform

February 10, 2024
World’s First Token Amulets, The Dragons Debuts Amid Chinese New Year Celebration

PR DESK

Renowned gaming platform Dragonsworld has unveiled the debut of the world’s first token talismans, dubbed “The Dragons”. The project represents a remarkable fusion of tradition and innovation while seeking to redefine the crypto space. Crafted by a team of crypto enthusiasts, these token amulets aim to infuse the ancient energy of dragons into the digital […]

February 10, 2024
Bitcoin Records 5 Green Months Amid Accumulation Wave Hitting a 3-Year High

Brian Njuguna

As Bitcoin (BTC) edges closer to the psychological price of $50,000 thanks to heightened bullish momentum, the leading cryptocurrency has been in the green for a couple of months.

February 10, 2024
Charles Hoskinson Forecasts Cardano’s Triumph Over Ethereum with Upcoming Developments

Newton Gitonga

Charles Hoskinson, the founder of Cardano, has expressed confidence in the future of the network, underscoring its potential to surpass Ethereum in the realm of decentralized finance (DeFi) and blockchain innovation.

February 10, 2024
Ethereum Whistleblower Claims China’s Massive ETH Holdings ‘Jeopardize’ Whole Crypto Market

Brenda Ngari

Crypto expert and former Ethereum advisor Steven Nerayoff has raised concerns about the degree of Chinese influence and control in the Ethereum ecosystem.

February 10, 2024
Pandas for Data Engineers

Mike Shakhomirov

Advanced techniques to process and load data efficiently

Continue reading on Towards Data Science »

February 10, 2024
Essential Checklist for Setting up Your New Apple M3 MacBook Pro

Wen Yang

A handy reference on migrating bookmarks, terminal enhancements, and AWS CLI settings

Continue reading on Towards Data Science »

February 10, 2024
Large Language Models, GPT-2 — Language Models are Unsupervised Multitask Learners
Vyacheslav Efimov

Acing GPT capabilities by turning it into a powerful multitask zero-shot model

Introduction

GPT is a well-known series of models whose latest versions currently dominate a wide range of NLP tasks. The first GPT version was a significant milestone: with about 117M parameters, a large number at the time, it demonstrated state-of-the-art performance on top benchmarks. From that point on, researchers tried to improve the base version.

In 2019, researchers from OpenAI officially released GPT-2. It was roughly 10 times bigger than GPT-1, which allowed it to improve performance even further. Apart from that, the authors conjectured in their work that LLMs are multitask learners, meaning that they can learn to perform several tasks at the same time. This observation made it possible to develop LLMs further within a much more efficient framework.

In this article, we will go through the main aspects of the official GPT-2 paper and its improvements over GPT-1, and look at a novel approach to building LLMs.

Note. This article assumes that you are already familiar with the first version of GPT. If not, check out this article.

Large Language Models, GPT-1 — Generative Pre-Trained Transformer

The importance of understanding the GPT evolution

It is no secret that with the recent introduction of powerful models like ChatGPT or GPT-4, the first GPT versions no longer attract that much attention and appear obsolete.

Nevertheless, there are good reasons to study the GPT evolution.
- The first GPT versions introduced language learning concepts that are still used by the most recent models. The best example is GPT-2, which popularized the multitask learning technique. Thanks to this concept, modern GPT models can accurately solve a large variety of NLP tasks.
- From the algorithmic perspective, most LLMs already use many advanced techniques, and it is becoming harder to invent new efficient methods. That is why NLP researchers focus more on scraping and feeding more high-quality data to models. This explains why the internal working mechanisms of the first GPT models do not differ that much from those of ChatGPT (GPT-3.5) or GPT-4. As a result, the principal differences are usually the amount of data fed to the models and the complexity of the neural network. By understanding how the first GPT models work, you can more easily recognize the working concepts of more advanced models.
Even though there might be some subtle differences in the training process between different GPT models, the aspects contributing the most to a model’s performance are the amount of data fed to it and the neural network’s complexity.

Multitask learning

GPT-2 is built on top of GPT-1 meaning that it has the same architecture. During training, GPT-1 uses the standard log-likelihood language modeling objective:

GPT’s learning objective
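
For reference, this objective can be written out explicitly. The notation below follows the GPT-1 paper and is an assumption here, since the article itself does not define symbols: U = {u_1, ..., u_n} is the token corpus, k is the size of the context window, and Θ denotes the model parameters.

```latex
L_1(\mathcal{U}) = \sum_{i} \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right)
```

Each term is the log-probability the model assigns to the actual next token given the k tokens before it, so maximizing the sum pushes the model toward better next-token predictions.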

This expression can be thought of as an optimization of the conditional probability distribution p(output | input) for a given task (in the case of GPT-1, the task consists of predicting the next token). While this approach works well for individual tasks, the model is still not able to learn to perform multiple tasks. For instance, a model trained with the aforementioned objective to predict the next token in a sequence will perform poorly on a sentiment analysis problem without proper fine-tuning.

The GPT-2 authors proposed a novel approach to replace the common pre-training + fine-tuning framework, one that would allow a trained model to perform well across different tasks. The idea is to model not the standard probability p(output | input) but a task-conditioned one, p(output | input, task). There are several ways to incorporate the task type into a model. Most previous methods did so through changes at the architecture level. Though this worked well in the past, it turned out that there is no need to modify the model’s architecture to incorporate the task type.

The ultimate idea is that task information can be easily incorporated into the input sequence. For example:
- If a sentence in language A needs to be translated into language B, then the input sequence in the dataset is written as (translate to french, english text, french text).
Example from the paper demonstrating input adaptation for translation tasks
- If a question should be answered in a provided context, then the input sequence takes the form (answer the question, document, question, answer).
Example from the paper demonstrating input adaptation for question answering tasks
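
To make the idea of task conditioning through the input more concrete, here is a minimal Python sketch. The helper name `build_task_sequence`, the `" | "` separator and the sample texts are hypothetical choices made for illustration; the paper only describes the sequences symbolically and does not prescribe a specific format.

```python
def build_task_sequence(task: str, *fields: str, sep: str = " | ") -> str:
    """Flatten a task description and its fields into one text sequence.

    The separator is an illustrative assumption, not a format from the paper.
    """
    return sep.join((task, *fields))


# Translation: (task, source text, target text)
translation = build_task_sequence("translate to french", "cheese", "fromage")

# Question answering: (task, document, question, answer)
qa = build_task_sequence(
    "answer the question",
    "Paris is the capital of France.",
    "What is the capital of France?",
    "Paris",
)

print(translation)
print(qa)
```

Since the task lives in the text itself, the same next-token objective and the same architecture can be reused for every task.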

Surprisingly, the described approach had already been shown to be competitive in previous work (e.g. the MQAN model)! Its main disadvantage is slow learning speed.

Zero-shot learning is a popular term that designates the ability of a model to perform a certain task without having explicitly received any training examples for it. GPT-2 is an example of a model with this ability.

Dataset

To train a model with the multitask learning idea from the previous section, we would normally need a dataset whose objects contain task descriptions, text inputs and labels. In reality, however, the authors developed a robust framework that turns this supervised problem into an unsupervised one and does not even need task descriptions!

The researchers conjectured that if a model was trained on a large and diverse dataset, the data would naturally contain many demonstrations of language tasks in different domains, which would help the model learn to perform them. To validate this hypothesis, the authors designed a web scraping algorithm that collected the web pages linked from Reddit posts that received at least 3 karma, treating karma as a signal that people found the link worthwhile. Collecting every available web page would likely have led to data quality issues and also have been too large for a model. As a result, the final dataset version includes 8M documents containing 40GB of text data in total.
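
The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Post` record, the `select_links` helper and the sample URLs are hypothetical, and the real pipeline also involved content extraction and further cleaning that is not shown here.

```python
from typing import Iterable, List, NamedTuple


class Post(NamedTuple):
    """Hypothetical record of a Reddit post with an outbound link."""
    url: str
    karma: int


def select_links(posts: Iterable[Post], min_karma: int = 3) -> List[str]:
    """Keep each outbound link once if its post reached the karma threshold."""
    seen = set()
    kept = []
    for post in posts:
        if post.karma >= min_karma and post.url not in seen:
            seen.add(post.url)
            kept.append(post.url)
    return kept


posts = [
    Post("https://example.com/a", 5),
    Post("https://example.com/b", 1),   # below the threshold, dropped
    Post("https://example.com/a", 10),  # duplicate link, dropped
    Post("https://example.com/c", 3),
]
links = select_links(posts)
print(links)
```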

Dataset fragment containing a sentence including phrases in English and French. Such text fragments can help the model perform translation tasks. The example is taken from the paper.

A similar example to the previous one from the paper.

Since the collected dataset is very diverse, to better account for rare words and characters, the authors incorporated a slightly modified version of Byte-Pair Encoding (BPE) for input representations.
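
To illustrate how BPE builds a vocabulary, below is a simplified Python sketch that operates on UTF-8 bytes. It is a toy version written for this explanation: the function names are hypothetical, the training corpus is a list of four words, and the GPT-2 specific modifications (such as restricting which merges are allowed) are not implemented.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def to_bytes(word):
    """Split a word into single-byte symbols, so any character is representable."""
    return tuple(bytes([b]) for b in word.encode("utf-8"))


def learn_bpe_merges(words, num_merges):
    """Greedily learn merges by repeatedly fusing the most frequent adjacent pair."""
    vocab = Counter(to_bytes(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a word by applying the learned merges in order."""
    symbols = to_bytes(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(encode("lowest", merges))
```

Because the base symbols are bytes, a rare character is never out of vocabulary: it simply falls back to its individual bytes.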

Model

According to the paper, GPT-2 has the same architecture as GPT-1 except for several changes:
- Layer normalization was moved to the input of each sub-block, and an additional layer normalization was added after the final self-attention block.
- Weights of residual layers are scaled by 1/√N at initialization, where N is the number of residual layers.
- Context size is increased from 512 to 1024 tokens.
- Batch size is increased from 64 to 512.
- Vocabulary size is expanded from 40,000 tokens to 50,257.
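
The first two changes can be illustrated with a small pure-Python sketch. It is not the GPT-2 implementation: a single linear map stands in for the attention and feed-forward sub-layers, layer normalization has no learned gain or bias, and the base standard deviation of 0.02 and the toy sizes are assumptions made for the example.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weight):
    """Apply a square weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def init_residual_weight(dim, num_residual_layers, rng, std=0.02):
    """Draw weights, then scale them by 1/sqrt(N) as described in the list above."""
    scale = 1.0 / math.sqrt(num_residual_layers)
    return [[rng.gauss(0.0, std) * scale for _ in range(dim)] for _ in range(dim)]


def pre_ln_block(x, weight):
    """Pre-LN residual block: normalize the input first, then add the update."""
    update = linear(layer_norm(x), weight)
    return [a + b for a, b in zip(x, update)]


rng = random.Random(0)
dim, n_layers = 4, 8  # toy sizes, chosen for illustration only
weights = [init_residual_weight(dim, n_layers, rng) for _ in range(n_layers)]

x = [1.0, 2.0, 3.0, 4.0]
for w in weights:
    x = pre_ln_block(x, w)
out = layer_norm(x)  # the additional normalization after the final block
print(out)
```

Scaling by 1/√N keeps the sum of N residual updates from growing with depth, which is the motivation usually given for this kind of initialization.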

Conclusion

By turning a supervised problem into an unsupervised one, multitask learning helps GPT-2 perform well on various downstream tasks (with the exception of text summarization) without explicit fine-tuning. Several years later, this learning framework is still gaining popularity in machine learning.

When a training dataset is sufficiently large and diverse, it allows very large models to enrich their linguistic knowledge simply by optimizing the log-likelihood language modeling objective. GPT-2 is a prime example of such a model.

Resources
- Language Models are Unsupervised Multitask Learners

All images are by the author unless noted otherwise.

Large Language Models, GPT-2 — Language Models are Unsupervised Multitask Learners was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
February 10, 2024
The Samsung Galaxy S24 Ultra proves that the OnePlus 12 could be this year’s best-value flagship

The Samsung Galaxy S24 Ultra had been high on my list of phones to pick up in 2024, but the recently released OnePlus 12 is making me reconsider.

February 10, 2024
