Tag: technews

Can’t shake this off… Our best-rated TV crashes to a record-low for the Super Bowl

This TV belongs with you! Samsung’s stunning 65-inch S90C OLED is on sale for a record-low price of $1,599 for the Super Bowl.


February 10, 2024
Verizon’s Galaxy S24 deals just got even better – you can now buy-one get-one-free

Looking to set up multiple lines? Verizon’s latest Galaxy S24 deal can potentially bag you a second device on the house.


February 10, 2024
Whatever you do, don’t buy a Samsung Galaxy S23 Ultra

With longer support and AI features that won’t come to older phones, it’s more important to opt for the newer Galaxy S24 Ultra.


February 10, 2024
Iconic monitor maker delivers superlight laptop that you can only buy in Japan — but it’s its 4-year warranty that makes iiyama’s ultrabook such a great buy

The iiyama Campus PC notebook is designed for student use in Japan, and has a 4-year warranty.


February 10, 2024
Pandas for Data Engineers

Mike Shakhomirov

Advanced techniques to process and load data efficiently

Continue reading on Towards Data Science »


February 10, 2024
Essential Checklist for Setting up Your New Apple M3 MacBook Pro

Wen Yang

A handy reference on migrating bookmarks, terminal enhancements, and AWS CLI settings

Continue reading on Towards Data Science »


February 10, 2024
Large Language Models, GPT-2 — Language Models are Unsupervised Multitask Learners
Vyacheslav Efimov

Acing GPT capabilities by turning it into a powerful multitask zero-shot model

Introduction

GPT is a well-known series of models whose latest versions currently dominate a range of NLP tasks. The first GPT version was a significant milestone: with roughly 120M parameters, a large number for its time, the model demonstrated state-of-the-art performance on top benchmarks. From that point on, researchers worked on improving the base version.

In 2019, researchers from OpenAI officially released GPT-2. It was more than 10 times bigger than GPT-1, which allowed it to improve performance even further. Apart from that, the authors conjectured in their work that LLMs are multitask learners, meaning that they can learn to perform several tasks at the same time. This observation made it possible to develop LLMs further within a much more efficient framework.

In this article, we will go through the main aspects of the official GPT-2 paper and its improvements over GPT-1, and look at a novel approach to building LLMs.

Note. This article assumes that you are already familiar with the first version of GPT. If not, check out this article.

Large Language Models, GPT-1 — Generative Pre-Trained Transformer

The importance of understanding the GPT evolution

It is no secret that with the recent introduction of powerful models like ChatGPT or GPT-4, the first GPT versions no longer attract that much attention and appear obsolete.

Nevertheless, there are good reasons to study the GPT evolution:
- The first GPT versions introduced language learning concepts that are still used by the most recent models. The best example is GPT-2’s use of the multitask learning technique. Thanks to this concept, modern GPT models can accurately solve a large variety of NLP tasks.
- From the algorithmic perspective, most LLMs already use many advanced techniques, and it is becoming harder to invent new efficient methods. That is why NLP researchers focus more on scraping and feeding more high-quality data to models. This explains why the internal working mechanisms of the first GPT models do not differ that much from those of GPT-3.5 or GPT-4. As a result, the principal differences are usually the amount of data fed to the models and the complexity of the neural network. By understanding how the first GPT models work, you can more easily recognize the working concepts of more advanced models.

Even though there might be some subtle differences in the training process between different GPT models, the aspects contributing the most to a model’s performance are the amount of data fed to it and the neural network’s complexity.

Multitask learning

GPT-2 is built on top of GPT-1, meaning that it has largely the same architecture. During training, GPT-1 uses the standard log-likelihood language modeling objective:

GPT’s learning objective
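
For reference, the standard log-likelihood language modeling objective can be written out as below. The notation is taken from the GPT-1 paper and is not defined in this article: U = {u_1, ..., u_n} is the corpus of tokens, k is the size of the context window, and Θ denotes the model parameters.

```latex
L_1(\mathcal{U}) = \sum_{i} \log P\left(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta\right)
```

Maximizing this sum means making each token as probable as possible given the k tokens that precede it.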

This expression can be thought of as an optimization of the conditional probability distribution p(output | input) for a given task (in the case of GPT-1, the task consists of predicting the next token). While this approach works well for individual tasks, the model is still not able to learn to perform multiple tasks. For instance, a model trained with the aforementioned objective to predict the next token in the sequence will perform poorly on a sentiment analysis problem without proper fine-tuning.

The GPT-2 authors proposed a novel approach to replace the common pre-training + fine-tuning framework, one that would allow a trained model to perform well across different tasks. The idea consists of not modeling the standard probability p(output | input) but including task conditioning p(output | input, task) instead. There exist several approaches to incorporating the task type into the model. Most of the previous methods did so by making changes at the architecture level. Though this approach worked well in the past, it turned out that there is no need to modify the model’s architecture to incorporate the task type.

The key idea is that task information can be easily incorporated into the input sequence. For example:
- If a sentence in language A needs to be translated into language B, then the input sequence in the dataset will be written as: (translate to french, english text, french text).
Example from the paper demonstrating input adaptation for translation tasks
- If an answer should be given to a question in a provided context, then the input sequence will take the following form: (answer the question, document, question, answer).
Example from the paper demonstrating input adaptation for question answering tasks
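
The idea of writing the task into the input sequence can be illustrated with a minimal Python sketch. The helper name `build_sequence`, the " | " separator and the sample sentences are assumptions made for illustration only; the paper does not prescribe this exact layout.

```python
def build_sequence(task: str, *parts: str, sep: str = " | ") -> str:
    """Join a task description and its text fields into one flat sequence.

    The separator is a hypothetical choice; any consistent textual
    layout would serve the same purpose.
    """
    return sep.join((task, *parts))


# Training-style sequences: the task description travels with the data.
translation = build_sequence("translate to french", "good morning", "bonjour")
qa = build_sequence(
    "answer the question",
    "GPT-2 was released in 2019.",  # document (context)
    "When was GPT-2 released?",     # question
    "2019",                         # answer
)

# Zero-shot style prompt: same layout, with the last field left for the model.
prompt = build_sequence("translate to french", "thank you") + " | "

print(translation)
print(qa)
print(prompt)
```

Because the task is plain text inside the sequence, the same next-token objective covers every task, and no architectural change is required.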

Surprisingly, the described approach had already proven to be competitive in previous works (e.g. the MQAN model)! Its main disadvantage is its slow learning speed.

Zero-shot learning is a popular term that designates the ability of a model to perform a certain task without having explicitly received any training examples for it. GPT-2 is an example of a model with this ability.

Dataset

To train with the multitask learning idea from the previous section, we would normally need a dataset whose examples contain task descriptions, text inputs and labels. In practice, however, the authors developed a framework that turns this supervised problem into an unsupervised one and does not even need explicit task descriptions!

The researchers conjectured that if a model was trained on a large and diverse dataset, the text would naturally contain many demonstrations of language tasks in different domains, which would help the model learn to perform them. To validate this hypothesis, the authors designed a web scraping algorithm that collected the web pages linked from Reddit posts that received at least 3 karma. Using human-curated links as a quality filter avoided the data quality issues of an indiscriminate crawl and kept the dataset at a manageable size. As a result, the final dataset version includes 8M documents containing 40GB of text data in total.
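
The quality filter can be sketched in a few lines of Python. In the paper, the threshold of 3 is applied to the karma of Reddit posts whose outbound links are then scraped. Everything else here is a labeled assumption: the `ScrapedLink` record, its field names, the made-up URLs, and the simple exact-URL de-duplication are illustrative and not the authors’ pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_KARMA = 3  # threshold mentioned in the text


@dataclass(frozen=True)
class ScrapedLink:
    """Hypothetical record for one outbound link found on Reddit."""
    url: str
    karma: int


def select_links(links: Iterable[ScrapedLink], min_karma: int = MIN_KARMA) -> List[str]:
    """Keep each URL once, and only if its post reached the karma threshold."""
    seen = set()
    kept = []
    for link in links:
        if link.karma < min_karma:
            continue  # not enough human endorsement
        if link.url in seen:
            continue  # already kept this page (assumed de-duplication step)
        seen.add(link.url)
        kept.append(link.url)
    return kept


# Made-up sample data for illustration.
sample = [
    ScrapedLink("https://example.com/a", 5),
    ScrapedLink("https://example.com/b", 2),
    ScrapedLink("https://example.com/a", 7),
    ScrapedLink("https://example.com/c", 3),
]
selected = select_links(sample)
print(selected)
```

The threshold acts as a cheap proxy for human judgment: a page is kept only if several people found the link worth upvoting.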

Dataset fragment containing a sentence including phrases in English and French. Such text fragments can help the model perform translation tasks. The example is taken from the paper.

A similar example to the previous one from the paper.

Since the collected dataset is very diverse, to better account for rare words and characters, the authors incorporated a slightly modified version of Byte-Pair Encoding (BPE) for input representations.
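
To make the BPE idea concrete, here is a minimal sketch of learning merges over raw UTF-8 bytes, which is why rare words and characters can always be represented. It is a plain textbook BPE loop written for illustration; the function name is hypothetical, and the paper’s own modifications (such as restrictions on which byte sequences may be merged) are not reproduced.

```python
from collections import Counter
from typing import List, Tuple


def learn_bpe_merges(text: str, num_merges: int) -> Tuple[List[Tuple[bytes, bytes]], List[bytes]]:
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Returns the learned merges and the text re-tokenized with them.
    """
    # Start from single bytes so that any character has a representation.
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        (left, right), count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so merging would not shorten anything
        merges.append((left, right))
        # Replace every non-overlapping occurrence of the pair, left to right.
        merged = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens


merges, tokens = learn_bpe_merges("low lower lowest", num_merges=2)
print(merges)
print(tokens)
```

Concatenating the resulting tokens always gives back the original bytes, so the encoding is lossless regardless of how many merges are learned.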

Model

According to the paper, GPT-2 has the same architecture as GPT-1 except for several changes:
- Layer normalization was moved to the input of each Transformer sub-block, and an additional layer normalization was added after the final self-attention block.
- Weights of residual layers are scaled by a factor of 1/√N at initialization, where N is the number of residual layers.
- Context size is increased from 512 to 1024 tokens.
- Batch size is increased from 64 to 512.
- Vocabulary size is expanded from 40,000 tokens to 50,257.
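
The 1/√N scaling of residual weights can be sketched as follows. The base standard deviation of 0.02, the normal distribution, and the function names are assumptions for illustration; the text above only specifies the 1/√N factor.

```python
import math
import random
import statistics
from typing import List


def residual_init_std(base_std: float, n_residual_layers: int) -> float:
    """Standard deviation after scaling by 1/sqrt(N)."""
    return base_std / math.sqrt(n_residual_layers)


def init_residual_weights(
    n_weights: int,
    n_residual_layers: int,
    base_std: float = 0.02,  # assumed base value, not given in the text
    seed: int = 0,
) -> List[float]:
    """Draw weights from a zero-mean normal with the scaled deviation."""
    rng = random.Random(seed)
    std = residual_init_std(base_std, n_residual_layers)
    return [rng.gauss(0.0, std) for _ in range(n_weights)]


weights = init_residual_weights(n_weights=10_000, n_residual_layers=4)
print(residual_init_std(0.02, 4))
print(statistics.pstdev(weights))
```

The more residual layers add their output to the same stream, the smaller each contribution starts out, which keeps the accumulated activations from growing with depth.
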

Conclusion

By turning a supervised problem into an unsupervised one, multitask learning helps GPT-2 perform well on various downstream tasks (except for text summarization) without explicit fine-tuning. Several years later, this learning framework is still gaining popularity in machine learning.

When a training dataset is sufficiently large and diverse, it allows very large models to enrich their linguistic knowledge by simply optimizing the log-likelihood language modeling objective. GPT-2 is a clear example of such a model.

Resources
- Language Models are Unsupervised Multitask Learners
All images are by the author unless noted otherwise.

Large Language Models, GPT-2 — Language Models are Unsupervised Multitask Learners was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
February 10, 2024
The Samsung Galaxy S24 Ultra proves that the OnePlus 12 could be this year’s best-value flagship

The Samsung Galaxy S24 Ultra had been high on my list of phones to pick up in 2024, but the recently released OnePlus 12 is making me reconsider.


February 10, 2024
iOS 17.4 might give you more options for turning off those FaceTime reactions

With the changes apparently coming to iOS 17.4, there will be less chance of these reactions catching you unawares.


February 10, 2024
4K Blu-ray isn’t dying despite Disney and Best Buy’s efforts – it’s more important than ever

4K Blu-ray may appear to be on the decline, with Disney and Best Buy distancing themselves from it, but smaller companies are keeping the 4K Blu-ray flag flying.


February 10, 2024
