Lorne Michaels said Travis Kelce could host ‘SNL’ on one pretty big condition
Heidi Gardner advocated for Kelce to host for years even before he was publicly dating the world’s biggest pop star. -
Glen Powell goes undercover in the first teaser for ‘Chad Powers’
One of the fledgling star’s next big roles will see him going undercover at a struggling college football program. -
5 most underrated TV shows of 2024, ranked
While shows like Industry and Silo proved to be popular with critics and audiences alike, these five underrated 2024 series deserve a little love and attention. -
Thousands of widely-used public workspaces are leaking data
Major platforms impacted include GitHub, Slack, and Salesforce. -
AMD-powered, liquid-cooled Comino Grando AI server gets reviewed, but I still can’t see any octo Nvidia RTX 5090 GPU configuration
AMD-powered, liquid-cooled Comino Grando H100 AI server gets its first review. -
Looking for a new job? Watch out you don’t fall for this new malware scam
Researchers spotted the OtterCookie malware. -
Apple’s cheapest AirPods 4 aren’t sonically superb, but one great perk keeps me coming back
I pride myself on seeking out audio excellence, but AirPods 4 are great for a different reason – and sometimes, it matters. -
The iPhone 17 is again rumored to be finally getting a high refresh rate display
With 120Hz and ProMotion, the standard iPhone will have smoother graphics and the option of an always-on display. -
Superposition: What Makes It Difficult to Explain Neural Networks
When there are more features than model dimensions
Introduction
It would be ideal if the world of neural networks had a one-to-one relationship: each neuron activates on one and only one feature. In such a world, interpreting the model would be straightforward: this neuron fires for the dog-ear feature, and that neuron fires for car wheels. Unfortunately, that is not the case. In reality, a model of dimension d often needs to represent m features, where d < m. This is when we observe the phenomenon of superposition.
In the context of machine learning, superposition refers to the phenomenon in which one neuron in a model represents multiple overlapping features rather than a single, distinct one. For example, InceptionV1 contains one neuron that responds to cat faces, fronts of cars, and cat legs [1]. This leads to what we can describe as a superposition of different feature activations in the same neuron or circuit.
The existence of superposition makes model explainability challenging, especially in deep learning models, where neurons in hidden layers represent complex combinations of patterns rather than being associated with simple, direct features.
In this blog post, we present a simple toy example of superposition, with a detailed implementation in Python in this notebook.
What makes Superposition Occur: Assumptions
We begin this section by discussing the term “feature”.
In tabular data, there is little ambiguity in defining what a feature is. For example, when predicting the quality of wine using a tabular dataset, features can be the percentage of alcohol, the year of production, etc.
However, defining features can become complex when dealing with non-tabular data, such as images or textual data. In these cases, there is no universally agreed-upon definition of a feature. Broadly, a feature can be considered any property of the input that is recognizable to most humans. For instance, one feature in a large language model (LLM) might be whether a word is in French.
Superposition occurs when the number of features exceeds the model’s dimensions. We claim that two necessary conditions must be met for superposition to occur:
- Non-linearity: Neural networks typically include non-linear activation functions, such as sigmoid or ReLU, at the end of each hidden layer. These activation functions allow the network to map inputs to outputs in a non-linear way, so that it can capture more complex relationships between features. Without non-linearity, the model would behave as a simple linear transformation, where features remain linearly separable, with no possibility of compressing dimensions through superposition.
- Feature Sparsity: Feature sparsity means that only a small subset of features is non-zero at any given time. For example, in language models, many features are not present simultaneously: the same word cannot be both is_French and is_other_languages. If all features were dense, overlapping representations would cause significant interference, making it very difficult for the model to decode features.
Toy Example: Linearity vs. Non-linearity with Varying Sparsity
Synthetic Dataset
Let us consider a toy example of 40 features with linearly decreasing feature importance: the first feature has an importance of 1, the last feature has an importance of 0.1, and the importance of the remaining features is evenly spaced between these two values.
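As a minimal sketch of how these importance weights can be generated (the variable name importance is our own; the post’s notebook may differ):

import numpy as np

n = 40  # number of features
# Linearly decreasing importance: 1.0 for the first feature, 0.1 for the last
importance = np.linspace(1.0, 0.1, n)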
We then generate a synthetic dataset with the following code:
import numpy as np

def generate_synthetic_dataset(dim_sample, num_samples, sparsity):
    """Generate a synthetic dataset according to the sparsity level."""
    dataset = []
    for _ in range(num_samples):
        x = np.random.uniform(0, 1, dim_sample)
        # Each feature is zeroed out with probability `sparsity`
        mask = np.random.choice([0, 1], size=dim_sample, p=[sparsity, 1 - sparsity])
        x = x * mask  # Apply sparsity
        dataset.append(x)
    return np.array(dataset)

This function creates a synthetic dataset with the given number of dimensions, which is 40 in our case. For each dimension, a random value is generated from a uniform distribution on [0, 1]. The sparsity parameter, varying between 0 and 1, controls the percentage of active features in each sample: for example, when the sparsity is 0.8, each feature in a sample has an 80% chance of being zero. The function applies a mask to realize this sparsity setting.
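For instance, one might generate the datasets for the three sparsity levels used later in the experiments (a usage sketch; the sample count of 10,000 is an assumption, not taken from the post):

datasets = {
    s: generate_synthetic_dataset(dim_sample=40, num_samples=10000, sparsity=s)
    for s in (0.1, 0.5, 0.9)
}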
Linear and ReLU Models
We would now like to explore how ReLU-based neural models lead to the formation of superposition, and how different sparsity values change their behavior.
We set up our experiment as follows: we compress the 40-dimensional features into a 5-dimensional space, then reconstruct the vector by reversing the process. By observing the behavior of these transformations, we expect to see how superposition forms in each case.
To do so, we consider two very similar models:
- Linear Model: a simple linear model that compresses the input into only 5 dimensions. Recall that we want to work with 40 features, far more than the model’s dimensions.
- ReLU Model: a model almost identical to the linear one, but with an additional ReLU activation function at the end, introducing one level of non-linearity.
Both models are built using PyTorch. For example, we build the ReLU model with the following code:
import numpy as np
import torch
import torch.nn as nn

class ReLUModel(nn.Module):
    def __init__(self, n, m):
        super().__init__()
        # W compresses n features into m dimensions (m < n)
        self.W = nn.Parameter(torch.randn(m, n) * np.sqrt(1 / n))
        self.b = nn.Parameter(torch.zeros(n))

    def forward(self, x):
        # Compress with ReLU activation: x (batch, n) @ W.T (n, m) -> h (batch, m)
        h = torch.relu(torch.matmul(x, self.W.T))
        # Reconstruct with ReLU: h (batch, m) @ W (m, n) + b -> (batch, n)
        x_reconstructed = torch.relu(torch.matmul(h, self.W) + self.b)
        return x_reconstructed

According to the code, the n-dimensional input vector x is projected into a lower-dimensional space by multiplying it with an m×n weight matrix. We then reconstruct the original vector by mapping it back to the original feature space through a ReLU transformation, adjusted by a bias vector. The Linear Model has a similar structure, with the only difference being that the reconstruction uses only a linear transformation instead of ReLU. We train the models by minimizing the mean squared error between the original feature samples and the reconstructed ones, weighted by the feature importance.
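For completeness, here is a minimal sketch of the Linear Model and an importance-weighted MSE training loop (the train function and its hyperparameters are our own illustration; only the overall structure is described in the post):

class LinearModel(nn.Module):
    def __init__(self, n, m):
        super().__init__()
        self.W = nn.Parameter(torch.randn(m, n) * np.sqrt(1 / n))
        self.b = nn.Parameter(torch.zeros(n))

    def forward(self, x):
        h = torch.matmul(x, self.W.T)            # Compress to m dimensions
        return torch.matmul(h, self.W) + self.b  # Linear reconstruction

def train(model, data, importance, epochs=500, lr=1e-2):
    # data: (num_samples, n) float tensor; importance: (n,) tensor of feature weights
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        reconstructed = model(data)
        # Importance-weighted mean squared error
        loss = (importance * (data - reconstructed) ** 2).mean()
        loss.backward()
        optimizer.step()
    return model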
Results Analysis
We trained both models with different sparsity values: 0.1, 0.5, and 0.9, from least to most sparse, and observed several important results.
First, whatever the sparsity level, ReLU models “compress” features much better than linear models: while linear models mainly capture the features with the highest importance, ReLU models can also cover less important features through the formation of superposition, where a single model dimension represents multiple features. Let us visualize this phenomenon: for linear models, the biases are smallest for the top five features (recall that feature importance is defined as a linearly decreasing function of feature order). In contrast, the biases for the ReLU model do not show this order and are generally reduced more.
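One way to reproduce this bias comparison is a plotting sketch along these lines (linear_model and relu_model are assumed names for the two trained models; matplotlib is our choice here, not necessarily the post’s):

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 3), sharey=True)
for ax, model, title in [(axes[0], linear_model, "Linear"), (axes[1], relu_model, "ReLU")]:
    ax.bar(range(40), model.b.detach().numpy())
    ax.set_title(f"{title} model: reconstructed bias per feature")
    ax.set_xlabel("Feature index (importance decreases left to right)")
plt.tight_layout()
plt.show()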
Image by author: reconstructed bias

Another important and interesting result is that superposition is much more likely to be observed when the sparsity level of the features is high. To get an impression of this phenomenon, we can visualize the matrix W^T@W, where W is the m×n weight matrix of the models. One might interpret the matrix W^T@W as a measure of how the input features are projected onto the lower-dimensional space:
In particular:
- The diagonal of W^T@W represents the “self-similarity” of each feature inside the low-dimensional transformed space.
- The off-diagonal elements of the matrix represent how different features correlate with each other.
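A minimal sketch for computing and plotting this matrix (model is an assumed name for either trained model):

import matplotlib.pyplot as plt

W = model.W.detach().numpy()  # shape (m, n)
gram = W.T @ W                # shape (n, n): feature-to-feature projection overlaps

plt.imshow(gram, cmap="RdBu", vmin=-1, vmax=1)
plt.colorbar(label="W^T W value")
plt.title("W^T @ W")
plt.show()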
We now visualize the values of W^T@W for both the Linear and ReLU models constructed above, with two different sparsity levels: 0.1 and 0.9. You can see that when the sparsity is as high as 0.9, the off-diagonal elements become much larger than in the case where sparsity is 0.1 (there is actually not much visible difference between the two models’ outputs). This observation indicates that correlations between different features are more easily learned when sparsity is high.
Image by author: matrix for sparsity 0.1

Image by author: matrix for sparsity 0.9

Conclusion
In this blog post, I presented a simple experiment introducing the formation of superposition in neural networks by comparing Linear and ReLU models with fewer dimensions than the features they must represent. We observed that the non-linearity introduced by the ReLU activation, combined with a certain level of sparsity, can help the model form superposition.
In real-world applications, which are much more complex than my naive example, superposition is an important mechanism for representing complex relationships in neural models, especially in vision models and LLMs.
References
[1] Zoom In: An Introduction to Circuits. https://distill.pub/2020/circuits/zoom-in/
[2] Toy Models of Superposition. https://transformer-circuits.pub/2022/toy_model/index.html
-
This Asus laptop is my go-to MacBook alternative – and it’s on sale at Best Buy
Asus’ ROG Zephyrus G14 resembles a MacBook, but the OLED display and hardware make for a well-rounded machine that stands out.