A journey into the surprising world of high-dimensional data: the blessings and the challenges
Richard Feynman, the renowned physicist, once said, "I can safely say that nobody understands quantum mechanics." In his interview "Fun to Imagine with Richard Feynman", he touched on the strange behavior of matter at the atomic and subatomic level, noting how it often defies our common sense. Interestingly, we can observe similar behavior in high-dimensional data. It's not exactly like quantum mechanics, but there is a similar element of surprise and beauty, mixed with more than a few challenges, when we transition from lower to higher dimensions.
In this and future articles, I want to provide some insights into this fascinating topic. My goal is to pique your interest and encourage you to learn about the world of higher-dimensional data, especially if you are unfamiliar with it.
High-dimensional data, or data in higher dimensions, in the context of data analysis and machine learning, generally refers to datasets with a large number of variables, features, or attributes. Each of these represents a different "dimension" of our data.
To begin, let’s examine some basic examples that highlight the distinctions that arise when we go from lower-dimensional spaces to higher-dimensional ones.
Volume Concentration in High-Dimensional Spaces
First, let’s explore the concept of volume concentration in high-dimensional spaces. Consider generating random points within a hypercube whose sides range from 0 to 1. How likely is it that these points fall in the middle region of this hypercube as its dimensions increase?
In the image above, let’s assume x is a small value, such as 0.1. We aim to determine how the probability of a point randomly falling in this middle region (not on the edge) varies with increasing dimensions.
1. One-Dimensional Space (Line)
Think of a line segment from 0 to 1. The middle part lies between 0.1 and 0.9. The chance of a random point landing there is simply the length of this middle segment divided by the total length, which is 0.8.
2. Two-Dimensional Space (Square)
Now, envision a square where each side ranges from 0 to 1. The middle region is a smaller square with each side from 0.1 to 0.9. The probability calculation involves comparing the area of this smaller square to the total area, giving us a probability of 0.64.
3. Three-Dimensional Space (Cube)
For a cube with each edge measuring 1, the middle region is a smaller cube with each edge from 0.1 to 0.9. Here, the probability is the volume of this smaller cube divided by the total volume, resulting in 0.512.
4. Higher Dimensions (Hypercube)
In a hypercube of n dimensions, the ‘volume’ of the middle region shrinks drastically as dimensions increase. For instance, in 4D, the probability is 0.4096; in 5D, it’s 0.32768; and in 10D, it drops to approximately 0.10737.
The generalization of this idea starts by considering the edge to have a small width x, as shown in the image above. For a line, the probability of a point falling in the middle region is 1 - 2x. For a square, it is (1 - 2x) * (1 - 2x), since a point must fall in the middle along both dimensions.
This pattern continues in n dimensions, where the probability of falling in the middle region is (1 - 2x)^n, which becomes vanishingly small in higher dimensions.
Note that here, we simplify by considering the length of each side as 1.
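To make this concrete, here is a small Monte Carlo sketch (my own illustration, with hypothetical function and variable names, not part of the article's original code) that samples uniform random points in the unit hypercube and estimates how often they land in the middle region, comparing the estimate with (1 - 2x)^n:
import numpy as np

def middle_region_fraction(n_dims, x=0.1, n_samples=100_000, seed=0):
    """Estimate the probability that a uniform random point in the unit
    hypercube lies in the middle region [x, 1 - x] along every dimension."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_samples, n_dims))
    in_middle = np.all((points >= x) & (points <= 1 - x), axis=1)
    return in_middle.mean()

for n in (1, 2, 3, 5, 10):
    estimate = middle_region_fraction(n)
    exact = (1 - 2 * 0.1) ** n
    print(f"{n:2d}D: simulated = {estimate:.4f}, exact (1 - 2x)^n = {exact:.5f}")
The simulated fractions should track the closed-form values closely, shrinking toward zero as the number of dimensions grows.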
Inscribing a Hypersphere within a Hypercube
To further illustrate the concept of volume concentration, I performed a simple simulation using Python, in which we inscribe a hypersphere within a hypercube and then compare the ratio of the hypersphere's volume to the hypercube's volume as the dimensions increase.
What’s a Hypercube Anyway?
Picture a square. Now, puff it out into a cube. That's the jump from 2D to 3D. Now, take a leap of imagination into the fourth dimension and beyond; that's where hypercubes come in. A hypercube is essentially a cube extended into higher dimensions, a shape with equal sides. In our simulation, we consider hypercubes with side length 2. The formula for its volume? Simply 2^n (2 to the power of n) for an n-dimensional hypercube.
And a Hypersphere?
A hypersphere, the higher-dimensional equivalent of a sphere, emerges when you extend a 2D circle into 3D (forming a sphere) and then continue into higher dimensions. The catch? Its volume isn't as straightforward to calculate. It involves pi (yes, the famous 3.14159...) and the gamma function, which is like a factorial on steroids. In a nutshell, the volume of a hypersphere with radius 1 in n-dimensional space is V_n = π^(n/2) / Γ(n/2 + 1).
The Gamma function Γ(n) extends the factorial function to real and complex numbers. For positive integers n, Γ(n)=(n−1)!, and for non-integer values, it is computed numerically.
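As a quick sanity check (a small snippet of my own, not part of the original article), Python's built-in math.gamma reproduces this factorial relationship for positive integers:
import math

# For positive integers n, Gamma(n) equals (n - 1)!
for n in range(1, 6):
    print(f"Gamma({n}) = {math.gamma(n):.0f}, ({n} - 1)! = {math.factorial(n - 1)}")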
To calculate this ratio in Python, we can use the following code:
import math
import matplotlib.pyplot as plt
def hypersphere_volume(dim):
    """Calculate the volume of a hypersphere with radius 1 in 'dim' dimensions."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
def hypercube_volume(dim):
    """Calculate the volume of a hypercube with side length 2 in 'dim' dimensions."""
    return 2 ** dim
# Number of dimensions to consider
max_dim = 20
# Lists to hold volumes and dimension values
dimensions = range(1, max_dim + 1)
sphere_volumes = [hypersphere_volume(dim) for dim in dimensions]
cube_volumes = [hypercube_volume(dim) for dim in dimensions]
ratios = [sphere_volumes[i] / cube_volumes[i] for i in range(max_dim)]
# Plotting the results
plt.figure(figsize=(10, 6))
plt.plot(dimensions, ratios, marker='o')
plt.xlabel('Number of Dimensions')
plt.ylabel('Ratio of Volumes (Hypersphere/Hypercube)')
plt.title('Volume Concentration in Higher Dimensions')
plt.grid(True)
plt.show()
The output of the above code is the following graph:
We can clearly see that as we move to higher dimensions, the ratio decreases rapidly, leaving most of the volume concentrated in the corners of the hypercube.
These examples demonstrate that in higher dimensions, the volume of the middle region becomes a progressively smaller fraction of the total volume, highlighting the counterintuitive nature of high-dimensional spaces.
Q: What are some of the implications of this volume concentration phenomenon for the performance of ML algorithms?
The Paper and DVD Experiment
Consider the experiment where you try to fit a DVD through a piece of paper with a square hole. Initially, it seems impossible, as the square's diagonal is smaller than the DVD's diameter. However, folding the paper allows the DVD to pass through.
The folding of the paper, a small yet effective adjustment of spatial dimensions, holds the key to the puzzle. This experiment offers an intriguing analogy for understanding the complexity of higher-dimensional landscapes.
When the paper is first laid out, it forms a two-dimensional plane. The square slot seems too narrow to let the DVD through because of its set dimensions.
This situation matches our everyday experience of a three-dimensional world, where we measure size and distance in terms of length, width, and height. But the instant we begin to fold the paper, we add another dimension, and this folding action completely alters the spatial relationship between the hole and the DVD.
In this new three-dimensional setting, the concept of distance, which was so rigid and clear-cut in two dimensions, becomes more flexible and less intuitive. Folding the paper effectively changes the distances between points around the hole and the angles formed by the paper's edges.
In its new three-dimensional form, the hole can fit the DVD, demonstrating how adding a third dimension can make an apparently hopeless task in two-dimensional space achievable.
The mathematics underlying this experiment is explained in full in an intriguing study by Weiwei Lin et al.
You can also watch this beautiful video by “The Action Lab” that demonstrates the idea intuitively:
This shift in perspective has significant implications, especially in the fields of mathematics, physics, and machine learning. This idea is reflected in machine learning methods like Support Vector Machines (SVMs).
SVM and the Kernel Trick
The kernel trick in Support Vector Machines (SVMs) shows a similar idea. In SVMs, we often encounter data that isn’t linearly separable. The kernel trick overcomes this by transforming the data into a higher-dimensional space, akin to how folding the paper changed its spatial properties. (In reality, SVMs don’t actually transform data into higher dimensions, as this is computationally expensive. Instead, they compute relationships between data points as if they were in a higher dimension using the kernel trick).
In simpler terms, SVMs normally find a separating line (or hyperplane) in lower dimensions. But with non-linear data, this isn’t possible. The kernel trick, like folding the paper, adds dimensions, making it easier to find a hyperplane that does the job.
The kernel trick doesn’t just shift dimensions; it also simplifies complex problems. It’s really a great example of how higher-dimensional thinking can provide solutions to problems that seem impossible in lower dimensions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
# Manually entered data that is not linearly separable in 1D
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9,
              11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
              23, 24, 25, 26, 27, 28, 29, 30]).reshape(-1, 1)  # Feature values
y = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              1, 1, 1, 1, 1, 1, 1, 1])  # Class labels: 1 on both ends, 0 in the middle
# Non-linear transformation to 2D (adding a squared feature)
def transform_to_2d(X):
    return np.c_[X, X**2]
# Transforming data to 2D
X_transformed = transform_to_2d(x)
# Fitting SVM with a linear kernel in the transformed 2D space
svm = SVC(kernel='linear')
svm.fit(X_transformed, y)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# 1D data plot
axes[0].scatter(x, np.zeros_like(x), c=y, cmap='bwr', edgecolors='k')
axes[0].set_title('Original 1D Data')
axes[0].set_xlabel('Feature')
axes[0].set_yticks([])
# 2D transformed data plot
axes[1].scatter(X_transformed[:, 0], X_transformed[:, 1], c=y, cmap='bwr', edgecolors='k')
axes[1].set_title('Transformed 2D Data')
axes[1].set_xlabel('Original Feature')
axes[1].set_ylabel('Transformed Feature (X^2)')
# Plotting the decision boundary in 2D
ax = axes[1]
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
# Getting the separating hyperplane
Z = svm.decision_function(xy).reshape(XX.shape)
# Plotting decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
linestyles=['--', '-', '--'])
plt.tight_layout()
plt.show()
The result of the above code is the following graph:
It's clear that the data, which is not linearly separable in 1D (left), becomes separable in the transformed 2D space. This transformation, shown in the graph on the right, effectively solves our problem. Isn't this amazing?
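For completeness, here is a minimal sketch of the actual kernel trick on the same 1D data: an SVC with a degree-2 polynomial kernel fits a quadratic decision boundary directly, without ever computing an explicit X^2 feature. (The kernel settings below, such as coef0=1 and C=1000, are illustrative choices of mine, not parameters from the article.)
import numpy as np
from sklearn.svm import SVC

# Same 1D data as above: class 1 on both ends, class 0 in the middle
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9,
              11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
              23, 24, 25, 26, 27, 28, 29, 30]).reshape(-1, 1)
y = np.array([1] * 9 + [0] * 11 + [1] * 8)

# The degree-2 polynomial kernel computes (gamma * <x, x'> + coef0)^2,
# which is equivalent to using a quadratic feature map implicitly.
svm_poly = SVC(kernel='poly', degree=2, coef0=1, C=1000)
svm_poly.fit(x, y)

print("Training accuracy:", svm_poly.score(x, y))
With a quadratic boundary available, the classifier can place the middle class between two thresholds, which is exactly what the explicit transformation achieved above.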
Wrapping Up
In this article, we explored some ideas about the world of high-dimensional data. Starting with volume concentration, continuing with the real-life paper and DVD experiment, and ending with the kernel trick in SVMs, we saw how moving into higher dimensions can greatly alter our viewpoint and our approach to problems.
In the upcoming article, we will discuss the “curse of dimensionality,” which refers to the difficulties and complications involved in navigating high-dimensional spaces. We’ll examine how this impacts machine learning and data analysis, as well as strategies for mitigating its effects.
Thank you for making it this far! I really appreciate your time in reading this, and I hope you found the topic engaging. Please feel free to share any suggestions or possible edits for future articles!