Understand temperature, Top-k, Top-p, Frequency & Presence Penalty once and for all.
Getting a handle on temperature, Top-k, Top-p, frequency, and presence penalties can be a bit of a challenge, especially when you’re just starting out with LLM hyperparameters. Terms like “Top-k” and “presence penalty” can feel a bit overwhelming at first.
When you look up “Top-k,” you might find a definition like: “Top-k sampling limits the model’s selection of the next word to only the top-k most probable options, based on their predicted probabilities.” That’s a lot to take in! But how does this actually help when you’re working on prompt engineering?
If you’re anything like me and learn best with visuals, let’s break these down together and make these concepts easy to understand once and for all.
LLMs under the hood
Before we dive into LLM hyperparameters, let’s do a quick thought experiment. Imagine hearing the phrase “A cup of …”. Most of us would expect the next word to be something like “coffee” (or “tea” if you’re a tea person!). You probably wouldn’t think of “stars” or “courage” right away.
What’s happening here is that we’re instinctively predicting the most likely words to follow “A cup of …”, with “coffee” being a much higher likelihood than “stars”.
This is similar to how LLMs work — they calculate the probabilities of possible next words and choose one based on those probabilities.
So, at a high level, these hyperparameters are ways to tune how the model selects the next word from those probabilities.
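If it helps to see that idea in code, here’s a minimal sketch, with completely made-up scores rather than any real model’s numbers, of how raw scores for candidate next words are turned into probabilities with a softmax:

```python
# A minimal sketch with made-up scores (not from any real model):
# turn raw scores (logits) for candidate next words into probabilities.
import math

# Hypothetical scores for words that could follow "A cup of ..."
logits = {"coffee": 4.0, "tea": 3.2, "water": 2.5, "courage": 0.8, "stars": 0.1}

def softmax(scores):
    exps = {word: math.exp(score) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

probs = softmax(logits)
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word:>8}: {p:.2f}")  # "coffee" ends up far more likely than "stars"
```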
Let’s start with the most common hyperparameter:
Temperature
Temperature controls the randomness of the model’s output. A lower temperature makes the output more deterministic, favoring the most likely words, while a higher temperature allows for more creativity by considering less likely words.
In our “A cup of…” example, setting the temperature to 0 makes the model favor the most likely word, which is “coffee”.
Image provided by the author
As the temperature increases, the sampling probabilities of different words start to even out, allowing the model to generate more unusual or unexpected outputs.
Note that setting the temperature to 0 still doesn’t make the model completely deterministic, though it gets very close.
Use cases
Low temperature (e.g. 0.2): Ideal for tasks requiring precise and predictable results, such as technical writing or formal documentation.
High temperature (e.g. 0.8 or above): Useful for creative tasks like storytelling, poetry, or brainstorming.
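Under the hood, temperature is usually applied by dividing the model’s raw scores (logits) by the temperature before the softmax. Here’s a small sketch, again with made-up numbers, showing how a low temperature sharpens the distribution and a high temperature flattens it:

```python
# A minimal sketch of temperature scaling with made-up scores.
import math

logits = {"coffee": 4.0, "tea": 3.2, "courage": 0.8, "stars": 0.1}

def softmax_with_temperature(scores, temperature):
    if temperature <= 0:
        # Treat temperature 0 as "always pick the most likely word".
        best = max(scores, key=scores.get)
        return {word: (1.0 if word == best else 0.0) for word in scores}
    exps = {word: math.exp(score / temperature) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

for t in (0.0, 0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {word: round(p, 2) for word, p in probs.items()})
# Low temperatures pile almost all the probability onto "coffee";
# high temperatures spread it out toward "courage" and "stars".
```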
Max Tokens
Max tokens define the maximum number of tokens (which can be words or parts of words) the model can generate in its responses. Tokens are the smallest units of text that a model processes.
Relationship between tokens and words:
1 word = 1~2 tokens: In English, a typical word is usually split into 1 to 2 tokens. For example, simple words like “cat” might be a single token, while more complex words like “unbelievable” might be split into multiple tokens.
The general rule of thumb: you can estimate the number of words by dividing the token count by 1.5 (as a rough average).
Use cases
Low max tokens (e.g. 50): Ideal for tasks requiring brief responses, such as headlines, short summaries, or concise answers. (Be aware that the model may cut off its response mid-sentence if it hits the limit.)
High max tokens (e.g. 500): Useful for generating longer content like articles, stories, or detailed explanations.
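If you want a quick sanity check on how many words a given token budget buys you, here’s a tiny helper based on the rule of thumb above. It’s only a ballpark estimate; the real count depends on the tokenizer and the text.

```python
# A rough back-of-the-envelope helper: estimate how many English words
# fit in a token budget, using the ~1.5 tokens-per-word rule of thumb.
def estimate_words(max_tokens: int, tokens_per_word: float = 1.5) -> int:
    return int(max_tokens / tokens_per_word)

print(estimate_words(50))   # ~33 words: enough for a headline or a short answer
print(estimate_words(500))  # ~333 words: enough for several paragraphs
```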
Top-k
Top-k sampling restricts the model to choosing from the k most likely next words. By narrowing the choices, it helps reduce the chances of generating irrelevant or nonsensical outputs.
In the diagram below, if we set k to 2, the model will only consider the two most likely next words, in this case ‘coffee’ and ‘courage.’ Their probabilities are then renormalized to sum to 1, and the next word is sampled from just these two options.
Image provided by the author.
Use cases:
Low k (e.g., k=10): Best for structured tasks where you want to maintain focus and coherence, such as summarization or coding.
High k (e.g., k=50): Suitable for creative or exploratory tasks where you want to introduce more variability without losing coherence.
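Here’s a minimal sketch of top-k sampling using the made-up probabilities from the diagram. With k = 2, only ‘coffee’ and ‘courage’ survive the cut, and one of them is sampled after renormalization:

```python
# A minimal sketch of top-k sampling with made-up probabilities:
# keep only the k most likely words, renormalize, then sample.
import random

word_probs = {"coffee": 0.6, "courage": 0.2, "dreams": 0.13, "stars": 0.07}

def top_k_sample(probs, k):
    top = sorted(probs.items(), key=lambda item: -item[1])[:k]
    total = sum(p for _, p in top)
    words = [word for word, _ in top]
    weights = [p / total for _, p in top]  # renormalize so the weights sum to 1
    return random.choices(words, weights=weights)[0]

print(top_k_sample(word_probs, k=2))  # only ever "coffee" or "courage"
```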
Top-p
Top-p sampling selects the smallest set of words whose combined probability exceeds a threshold p (e.g., 0.9), allowing for a more context-sensitive choice of words.
In the diagram below, we start with the most probable word, ‘coffee,’ which has a probability of 0.6. Since this is less than our threshold of p = 0.9, we add the next word, ‘courage,’ with a probability of 0.2. Together, these give us a total probability of 0.8, which is still below 0.9. Finally, we add the word ‘dreams’ with a probability of 0.13, bringing the total to 0.93, which exceeds 0.9. At this point we stop: the candidate set is ‘coffee,’ ‘courage,’ and ‘dreams,’ the smallest set whose combined probability exceeds 0.9. Their probabilities are renormalized, and the next word is sampled from this set.
Image provided by the author.
Use cases:
Low p (e.g., p=0.5): Effective for tasks that require concise and to-the-point outputs, like news headlines or instructional text.
High p (e.g., p=0.95): Useful for more open-ended tasks, such as dialogue generation or creative content, where a wider variety of responses is desirable.
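And here’s the matching sketch for Top-p (nucleus) sampling with the same made-up probabilities. With p = 0.9, the candidate set ends up being ‘coffee,’ ‘courage,’ and ‘dreams’:

```python
# A minimal sketch of top-p (nucleus) sampling: keep the smallest set of
# words whose cumulative probability reaches p, renormalize, then sample.
import random

word_probs = {"coffee": 0.6, "courage": 0.2, "dreams": 0.13, "stars": 0.07}

def top_p_sample(probs, p):
    ranked = sorted(probs.items(), key=lambda item: -item[1])
    nucleus, cumulative = [], 0.0
    for word, prob in ranked:
        nucleus.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in nucleus)
    words = [word for word, _ in nucleus]
    weights = [prob / total for _, prob in nucleus]
    return random.choices(words, weights=weights)[0]

print(top_p_sample(word_probs, p=0.9))  # "coffee", "courage", or "dreams"
```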
Frequency Penalty
A frequency penalty reduces the likelihood of the model repeating the same word within the text, promoting diversity and minimizing redundancy in the output. By applying this penalty, the model is encouraged to introduce new words instead of reusing ones that have already appeared.
The frequency penalty is calculated using the formula:
Adjusted probability = initial probability / (1 + frequency penalty * count of appearance)
For example, let’s say that the word “sun” has a probability of 0.5, and it has already appeared twice in the text. If we set the frequency penalty to 1, the adjusted probability for “sun” becomes 0.5 / (1 + 1 * 2) = 0.5 / 3 ≈ 0.17, making the model noticeably less likely to pick “sun” again.
Use cases:
High Penalty (e.g., 1.0): Ideal for generating content where repetition would be distracting or undesirable, such as essays or research papers.
Low Penalty (e.g., 0.0): Useful when repetition might be necessary or beneficial, such as in poetry, mantras, or certain marketing slogans.
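If you’d like to see the formula in action, here’s a tiny sketch of the calculation above. (Note that real APIs typically apply these penalties by subtracting from the raw logits rather than dividing probabilities, but the intuition is the same: the more often a word has appeared, the less likely it becomes.)

```python
# A minimal sketch of the frequency-penalty formula used in this article.
def apply_frequency_penalty(probability, penalty, count):
    # count = how many times the word has already appeared in the text
    return probability / (1 + penalty * count)

# "sun" started at 0.5 and has already appeared twice:
print(apply_frequency_penalty(0.5, penalty=1.0, count=2))  # 0.5 / 3 ≈ 0.17
```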
Presence Penalty
The presence penalty is similar to the frequency penalty but with one key difference: it penalizes the model for reusing any word or phrase that has already been mentioned, regardless of how often it appears.
In other words, repeating the word 2 times is as bad as repeating it 20 times.
The formula for adjusting the probability with a presence penalty is: Adjusted probability = initial probability / (1 + presence penalty * presence), where presence is 1 if the word has already appeared and 0 otherwise.
Let’s revisit the earlier example with the word “sun”. Instead of multiplying the penalty by the frequency of how many times “sun” has appeared, we simply check whether it has appeared at all — in this case, it has, so we count it as 1.
If we set the presence penalty to 1, the adjusted probability would be 0.5 / (1 + 1 * 1) = 0.25.
This reduction makes it less likely for the model to choose “sun” again, encouraging the use of new words or phrases, even if “sun” has only appeared once in the text.
Image provided by author.
Use cases:
High Penalty (e.g., 1.0): Great for exploratory or brainstorming sessions where you want the model to keep introducing new ideas or topics.
Low Penalty (e.g., 0.0): Suitable for tasks where reinforcement of key terms or ideas is important, such as technical documentation or instructional material.
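And the matching sketch for the presence penalty, where the penalty kicks in as soon as the word has appeared at all:

```python
# A minimal sketch of the presence-penalty formula used in this article.
def apply_presence_penalty(probability, penalty, has_appeared):
    presence = 1 if has_appeared else 0  # 1 if the word appeared at all, else 0
    return probability / (1 + penalty * presence)

# "sun" started at 0.5 and has appeared at least once:
print(apply_presence_penalty(0.5, penalty=1.0, has_appeared=True))  # 0.25
```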
Frequency and Presence Penalties often go hand-in-hand
Now that we’ve gone over the basics, let’s dive into how frequency and presence penalties are often used together. Just a heads-up, though — they’re powerful tools, but it’s important to use them with a bit of caution to get the best results.
When to use them:
Content Generation: When producing long-form text such as articles or blog posts, where repeated words and phrases quickly become noticeable.
Preventing Redundancy: When the model tends to loop or restate the same point, and a modest penalty nudges it toward new wording.
When not to use them:
Technical Writing: In technical documentation or specific instructions where consistent terminology is crucial, using these penalties might be counterproductive.
Brand messaging: If you’re generating content that relies heavily on a specific brand tone or key phrases, reducing repetition might dilute the brand’s voice.
By now, you should have a clearer picture of how temperature, Top-k, Top-p, frequency, and presence penalties work together to shape the output of your language model. And if it still feels a bit tricky, that’s totally okay — these concepts can take some time to fully click. Just keep experimenting and exploring, and you’ll get the hang of it before you know it.
If you find visual content like this helpful and want more, we’d love to see you in our Discord community. It’s a space where we share ideas, help each other out, and learn together.