Avoid the outdated methods of integrating AI into your coding workflow by going beyond ChatGPT
Originally appeared here:
The Smarter Way of Using AI in Programming
An actionable guide to using custom colors to personalize your charts
Originally appeared here:
How to Create Custom Color Palettes in Matplotlib — Discrete vs. Linear Colormaps, Explained
Getting a handle on temperature, Top-k, Top-p, frequency, and presence penalties can be a challenge, especially when you’re just starting out with LLM hyperparameters. Terms like “Top-k” and “presence penalty” can feel overwhelming at first.
When you look up “Top-k,” you might find a definition like: “Top-k sampling limits the model’s selection of the next word to only the top-k most probable options, based on their predicted probabilities.” That’s a lot to take in! But how does this actually help when you’re working on prompt engineering?
If you’re anything like me and learn best with visuals, let’s break these down together and make these concepts easy to understand once and for all.
Before we dive into LLM hyperparameters, let’s do a quick thought experiment. Imagine hearing the phrase “A cup of …”. Most of us would expect the next word to be something like “coffee” (or “tea” if you’re a tea person!). You probably wouldn’t think of “stars” or “courage” right away.
What’s happening here is that we’re instinctively predicting the most likely words to follow “A cup of …”, with “coffee” being a much higher likelihood than “stars”.
This is similar to how LLMs work — they calculate the probabilities of possible next words and choose one based on those probabilities.
So, at a high level, the hyperparameters are ways to tune how the model selects the next word from these probabilities.
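To keep the rest of the walkthrough concrete, here is a minimal Python sketch of that idea. The candidate words come from the examples in this article, but the probability values are illustrative assumptions, not real model outputs:

```python
# Hypothetical next-word probabilities for the prompt "A cup of ..."
# (illustrative numbers only, not taken from a real model)
next_word_probs = {
    "coffee": 0.60,
    "courage": 0.20,
    "dreams": 0.13,
    "stars": 0.07,
}

# The model samples the next word according to these probabilities;
# the hyperparameters below change how that sampling behaves.
```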
Let’s start with the most common hyperparameter:
Temperature controls the randomness of the model’s output. A lower temperature makes the output more deterministic, favoring more likely words, while a higher temperature allows for more creativity by considering less likely words.
In our “A cup of…” example, setting the temperature to 0 makes the model favor the most likely word, which is “coffee”.
As temperature increases, the probabilities of the different candidate words start to even out, prompting the model to generate highly unusual or unexpected outputs.
Note that setting the temperature to 0 still doesn’t make the model completely deterministic, though it gets very close.
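As a rough sketch of how this works, here is a small function that rescales the illustrative distribution from earlier. Real models apply temperature to the raw logits before the softmax; this toy version derives pseudo-logits from the probabilities, so treat it as an approximation of the idea rather than the exact implementation:

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a next-word distribution with a temperature value.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more surprising word choices).
    """
    if temperature == 0:
        # Temperature 0 is treated as "always pick the most likely word".
        top = max(probs, key=probs.get)
        return {word: (1.0 if word == top else 0.0) for word in probs}

    # Treat log-probabilities as logits, divide by temperature, re-apply softmax.
    scaled = {word: math.log(p) / temperature for word, p in probs.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {word: math.exp(v) / total for word, v in scaled.items()}

probs = {"coffee": 0.60, "courage": 0.20, "dreams": 0.13, "stars": 0.07}
print(apply_temperature(probs, 0.5))  # "coffee" dominates even more
print(apply_temperature(probs, 2.0))  # "dreams" and "stars" become plausible
```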
Use cases
Max tokens defines the maximum number of tokens (which can be words or parts of words) the model can generate in its response. Tokens are the smallest units of text that a model processes.
Relationship between tokens and words:
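If you want to see this relationship for yourself, here is a quick sketch using the tiktoken tokenizer library (an assumption on my part; any tokenizer will show the same effect):

```python
import tiktoken  # pip install tiktoken

# "cl100k_base" is the encoding used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "A cup of unbelievably strong coffee"
tokens = enc.encode(text)

print(len(text.split()), "words")          # 6 words
print(len(tokens), "tokens")               # usually more tokens than words
print([enc.decode([t]) for t in tokens])   # longer words split into sub-word pieces
```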
Use cases
Top-k sampling restricts the model to selecting from only the top k most likely next words. By narrowing the choices, it helps reduce the chances of generating irrelevant or nonsensical outputs.
In the diagram below, if we set k to 2, the model will only consider the two most likely next words — in this case, ‘coffee’ and ‘courage.’ Their probabilities are then renormalized to sum to 1, ensuring one of them is chosen.
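A minimal sketch of that procedure in Python, again using the illustrative probabilities, might look like this:

```python
import random

def top_k_sample(probs, k):
    """Keep only the k most likely words, renormalize, then sample one."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

    total = sum(p for _, p in top)           # renormalize so the kept
    words = [w for w, _ in top]              # probabilities sum to 1
    weights = [p / total for _, p in top]

    return random.choices(words, weights=weights, k=1)[0]

probs = {"coffee": 0.60, "courage": 0.20, "dreams": 0.13, "stars": 0.07}
print(top_k_sample(probs, k=2))  # always "coffee" or "courage"
```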
Use cases:
Top-p sampling selects the smallest set of words whose combined probability exceeds a threshold p (e.g., 0.9), allowing for a more context-sensitive choice of words.
In the diagram below, we start with the most probable word, ‘coffee,’ which has a probability of 0.6. Since this is less than our threshold of p = 0.9, we add the next word, ‘courage,’ with a probability of 0.2. Together, these give us a total probability of 0.8, which is still below 0.9. Finally, we add the word ‘dreams’ with a probability of 0.13, bringing the total to 0.93, which exceeds 0.9. At this point we stop: the model renormalizes the probabilities of these three words and samples the next word from among them.
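The same walk-through expressed as code, a sketch that follows the definition above (implementations differ in small details around the stopping rule):

```python
import random

def top_p_sample(probs, p):
    """Nucleus sampling: keep the smallest set of words whose cumulative
    probability reaches p, renormalize, then sample one of them."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    nucleus, cumulative = [], 0.0
    for word, prob in ranked:
        nucleus.append((word, prob))
        cumulative += prob
        if cumulative >= p:                  # stop once the threshold is crossed
            break

    total = sum(prob for _, prob in nucleus)
    words = [w for w, _ in nucleus]
    weights = [prob / total for _, prob in nucleus]
    return random.choices(words, weights=weights, k=1)[0]

probs = {"coffee": 0.60, "courage": 0.20, "dreams": 0.13, "stars": 0.07}
# With p = 0.9 the nucleus is {"coffee", "courage", "dreams"} (0.6 + 0.2 + 0.13 = 0.93).
print(top_p_sample(probs, p=0.9))
```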
Use cases:
A frequency penalty reduces the likelihood of the model repeating the same word within the text, promoting diversity and minimizing redundancy in the output. By applying this penalty, the model is encouraged to introduce new words instead of reusing ones that have already appeared.
The frequency penalty is calculated using the formula:
Adjusted probability = initial probability / (1 + frequency penalty * count of appearance)
For example, let’s say that the word “sun” has a probability of 0.5, and it has already appeared twice in the text. If we set the frequency penalty to 1, the adjusted probability for “sun” would be:
Adjusted probability = 0.5 / (1 + 1 * 2) = 0.5 / 3 ≈ 0.17
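Here is the same arithmetic as a tiny helper function. It follows the simplified formula above; real APIs (for example OpenAI’s) apply penalties to logits rather than probabilities, so this is an illustration of the idea, not the exact implementation:

```python
def frequency_penalized(prob, penalty, count):
    """Apply the simplified frequency-penalty formula to one word."""
    return prob / (1 + penalty * count)

# "sun" has probability 0.5 and has already appeared twice.
print(frequency_penalized(0.5, penalty=1, count=2))  # 0.5 / 3 ≈ 0.167
```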
Use cases:
The presence penalty is similar to the frequency penalty but with one key difference: it penalizes the model for reusing any word or phrase that has already been mentioned, regardless of how often it appears.
In other words, repeating the word 2 times is as bad as repeating it 20 times.
The formula for adjusting the probability with a presence penalty is:
Adjusted probability = initial probability / (1 + presence penalty * presence)
Let’s revisit the earlier example with the word “sun”. Instead of multiplying the penalty by the frequency of how many times “sun” has appeared, we simply check whether it has appeared at all — in this case, it has, so we count it as 1.
If we set the presence penalty to 1, the adjusted probability would be:
Adjusted probability = 0.5 / (1 + 1 * 1) = 0.5 / 2 = 0.25
This reduction makes it less likely for the model to choose “sun” again, encouraging the use of new words or phrases, even if “sun” has only appeared once in the text.
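And the presence-penalty version of the same helper, where only whether the word has appeared matters, again a sketch of the simplified formula rather than a production implementation:

```python
def presence_penalized(prob, penalty, has_appeared):
    """Apply the simplified presence-penalty formula to one word."""
    presence = 1 if has_appeared else 0
    return prob / (1 + penalty * presence)

# Whether "sun" appeared once or twenty times, the penalty is the same.
print(presence_penalized(0.5, penalty=1, has_appeared=True))   # 0.25
print(presence_penalized(0.5, penalty=1, has_appeared=False))  # 0.5 (no penalty)
```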
Use cases:
Now that we’ve gone over the basics, let’s dive into how frequency and presence penalties are often used together. Just a heads-up, though — they’re powerful tools, but it’s important to use them with a bit of caution to get the best results.
When to use them:
When not to use them:
By now, you should have a clearer picture of how temperature, Top-k, Top-p, frequency, and presence penalties work together to shape the output of your language model. And if it still feels a bit tricky, that’s totally okay — these concepts can take some time to fully click. Just keep experimenting and exploring, and you’ll get the hang of it before you know it.
If you find visual content like this helpful and want more, we’d love to see you in our Discord community. It’s a space where we share ideas, help each other out, and learn together.
Originally appeared here:
A visual explanation of LLM hyperparameters
Originally appeared here:
Provide a personalized experience for news readers using Amazon Personalize and Amazon Titan Text Embeddings on Amazon Bedrock
Easy, user-friendly caching tailored to all your needs
Originally appeared here:
Let’s Write a Composable, Easy-to-Use Caching Package in Python
How pruning, knowledge distillation, and 4-bit quantization can make advanced AI models more accessible and cost-effective
Originally appeared here:
Mistral-NeMo: 4.1x Smaller with Quantized Minitron