In my research into streamlining strategic knowledge extraction in game theoretic problems, I recently realized that I needed a better way to simply and intuitively visualize the behavior of agents with defined dynamical behavior.
This led me to build a simple library for visualizing agent behavior as an animation using PyPlot. But before jumping into the tutorial, here’s a quick catch-up on the core concepts at play here.
Agent-based modeling (ABM) offers an excellent way to simulate players in many game theoretic environments. It lets us model and study the behavior and cognitive processes of each individual player, rather than just analyzing trends. When it isn’t practical to represent agents using simplistic binary state machines like cellular automata grids, ABM lets us capture such scenarios by representing the agents’ positions in a multi-dimensional space, where each dimension has its own unique rules.
By utilizing both spatial and state dimensions, we can accurately model proximity as well as incorporate attributes that allow for subtle interactions, such as similarity in characteristics. Furthermore, storing the “positions” of agents in state space allows us to keep track of and compare detailed and dynamic state information about agents (e.g. a player’s fatigue in a model of football).
Compared to network models, wherein the existence of a connection between two objects indicates a relationship, position-state-space information lets us define and explore more complex and higher-dimensional relationships between agents. In this paradigm, Euclidean distance in state space can serve as a strong measure of the state similarity between two agents.
We can make such a model even more powerful by using dynamical systems theory to describe how the state of one agent influences those of others. It provides a robust way to define rules for a system’s evolution over time.
In socioeconomic systems, differential equations can be used to model communication, mimicry, and more complex adaptive processes, as well as describe how individual actions can produce group dynamics.
For my problem, I wanted to be able to see the position of each agent and its movement over time at a glance, along with some basic state information (e.g. one or two variables per agent).
There are currently several libraries that meet the need for visualizing ABMs, including NetLogo and Mesa (for Python). However, they are built primarily for visualizing discrete-space, rather than continuous-space models. For my purposes, I was interested in the latter, and so began the side quest. If you’d like to follow along or test out the code for yourself, it’s all stored at github.com/dreamchef/abm-viz.
For starters, I needed to represent and store the agents and the world, and their state and dynamics. I chose to do this using Python classes. I defined an Agent class with a set of variables that I thought might be relevant to many possible modeling tasks. I also defined a plottable circle object (with PyPlot) for each agent within the class structure.
import numpy as np
import matplotlib.pyplot as plt

class Agent:
    def __init__(self, index, position, velocity, empathy=1, ... dVision=0, age=0, plotSize=10):
        self.index = index
        self.position = np.array(position)
        self.velocity = np.array(velocity)
        self.empathy = empathy
        ...
        self.dVision = dVision
        self.color = HSVToRGB([self.species, 1, 1])  # HSVToRGB: the author's HSV-to-RGB helper (not shown here)
        self.plotSize = plotSize
        self.pltObj = plt.Circle(self.position, self.plotSize, color=self.color)
Then, after a bit of experimentation, I found that, in the spirit of good object-oriented programming principles, it would be best to define a World class (think game or system) as well. I included both the state information for the world and the information for the plots and axes in the class structure:
from numpy.random import rand  # assuming rand() comes from numpy.random

class World:
    def __init__(self, population=1, spawnSize=400, worldSize=1200, worldInterval=300, arrows=False, agentSize=10):
        self.agents = []
        self.figure, self.ax = plt.subplots(figsize=(12, 8))
        self.ax.set_xlim(-worldSize/2, worldSize/2)
        self.ax.set_ylim(-worldSize/2, worldSize/2)
        self.worldInterval = worldInterval
        self.worldSize = worldSize
        self.arrows = arrows
        self.agentSize = agentSize
        for i in range(population):
            newAgent = Agent(index=i,
                             position=[rand()*spawnSize - spawnSize/2, rand()*spawnSize - spawnSize/2],
                             velocity=[rand()*spawnSize/10 - spawnSize/20, rand()*spawnSize/10 - spawnSize/20],
                             plotSize=self.agentSize)
            self.agents.append(newAgent)
            self.ax.add_patch(newAgent.pltObj)
            print('Created agent at', newAgent.position, 'with index', newAgent.index)
        self.spawnSize = spawnSize
I wrote this class so that the programmer can simply specify the desired number of agents (along with other parameters such as a world size and spawn area size) rather than manually creating and adding agents in the main section.
With this basic structure defined and agents being generated with randomized positions and initial velocities in the world, I then used the PyPlot library’s animation features to create a method that would begin the visualization:
def start(self):
    # animation refers to matplotlib.animation, e.g. "from matplotlib import animation"
    ani = animation.FuncAnimation(self.figure, self.updateWorld, frames=100,
                                  interval=self.worldInterval, blit=True)
    plt.show()
This function references the figure, stored in the World instance, some specifications for the speed and length of the animation, and an update function, which is also defined in the World class:
def updateWorld(self, x=0):
    pltObjects = []
    for agent in self.agents:
        agent.updatePosition(self.agents, self.worldSize)
        agent.pltObj.center = agent.position
        pltObjects.append(agent.pltObj)
        if self.arrows is True:
            velocityArrow = plt.Arrow(agent.position[0], agent.position[1],
                                      agent.velocity[0], agent.velocity[1],
                                      width=2, color=agent.color)
            self.ax.add_patch(velocityArrow)
            pltObjects.append(velocityArrow)
    return pltObjects
The Agent.updatePosition method simply added each agent’s static velocity to its current position. This preliminary work was able to generate simple animations like this:
With this basic functionality, we’d now like to be able to visualize more interesting dynamics.
I first chose to define a dynamic where each agent would change its direction of movement based on the average direction of movement of other agents around it. In equation form:
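Roughly, with v_i denoting the velocity of agent i and N(i) the set of agents around it (this mirrors the code below):

v_i ← v_i + ( Σ over j in N(i) of v_j ) / ‖ Σ over j in N(i) of v_j ‖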
I encoded the dynamic in Python via an Agent.updatePosition() and an Agent.updateVelocity() method, which are run on each animation frame:
def updatePosition(self, agents, worldSize):
    self.updateVelocity(agents)
    self.position += self.velocity
    ...

def updateVelocity(self, agents):
    herd_velocity = self.herdVelocity(agents)
    herd_magnitude = np.linalg.norm(herd_velocity)
    self_magnitude = np.linalg.norm(self.velocity)
    if herd_magnitude > 0.1:
        herd_unit_velocity = herd_velocity / herd_magnitude
        self.velocity += herd_unit_velocity
In the PyPlot animation below, the agents begin with very different velocities, but quickly adjust and begin traveling in the same direction. In this case, the average direction was roughly upward in the Y-direction at first.
This time, the group initialized with a roughly leftward velocity, with one straggler who quickly adjusts.
Next I realized it would be helpful to see the velocity of the agents more clearly, so I implemented arrows to show each agent’s magnitude and direction:
velocityArrow = plt.Arrow(agent.position[0], agent.position[1], agent.velocity[0], agent.velocity[1], width=2, color=agent.color)
self.ax.add_patch(velocityArrow)
pltObjects.append(velocityArrow)
That modification gave more helpful animations like this one. We can still notice the converging-velocities dynamic at a glance, but see the rates of acceleration more clearly now as well.
For the animation above, I also adjusted the dynamic to depend on a sight range variable. In other words, agents only adjust their velocities to match agents that are nearby (within 300 units in this case).
I also modified the code so that each agent would only modify its direction of movement, not its speed. Keep this in mind for the next section.
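For reference, the Agent.herdVelocity helper called inside updateVelocity isn’t shown in the snippets above. A minimal sketch consistent with the behavior described here (the sightRange parameter and its default of 300 units are assumptions based on the text) might look like this:

def herdVelocity(self, agents, sightRange=300):
    # Sum the velocities of all other agents within the sight range
    herd_velocity = np.zeros(2)
    for neighbor in agents:
        if neighbor.index != self.index:
            distance = np.linalg.norm(neighbor.position - self.position)
            if distance < sightRange:
                herd_velocity += neighbor.velocity
    return herd_velocity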
Up until this point, I had only implemented dynamics which considered the positions and velocities of each agent. But as I alluded to in the overview section, considering non-spatial state information as well can make our modeling approach much more generalizable.
I made use of an auxiliary state tied to the RGB color of each agent. Foreshadowing the evolutionary game theoretic goals of my research, I call this the agent’s “species”, and implement it as follows.
In the Agent.__init__ method, I added random species generation and individual mapping to the color of the agent’s marker on the plot:
self.species = rand()
self.color = HSVToRGB([self.species,1,1])
self.pltObj = plt.Circle(self.position, self.plotSize, color=self.color)
Specifically, I assigned this to the hue of the object, to reserve the saturation (roughly grey-ness) for other variables of potential interest (i.e. remaining lifespan or health). This practice of dividing hue and saturation has precedent in the visualization of complex-valued functions, which I go into in this article.
Since herding behavior in nature occurs between animals of the same species, but not between those of differing species, I decided to change our toy dynamics to consider species. This change meant that each agent would only adjust its direction for agents that are sufficiently close in the world and sufficiently similar in species (i.e. similarly colored circles).
Now, this is where things began to get very, very interesting. Before you continue reading, ask yourself: how would you expect the agents to behave as a group under these rules?
A relatively naive experimenter (such as myself most of the time) would expect agents to organize themselves by species and travel in herds, with each herd tending to travel in different directions and mostly ignoring each other. To see if either of us is right, let’s continue.
To encode this behavior I modified the calculation of herd velocity with respect to an agent in the following way:
herd_velocity += neighbor.velocity * (0.5-np.sqrt(abs(self.species-neighbor.species)))
That change resulted in animations like this one. To see the interesting behavior here, I started recording this about 20 seconds into the animation.
It’s okay if the behavior isn’t immediately obvious. It’s subtle. And if you think you predicted it correctly, great job!
As it turned out, the agents didn’t organize themselves into herds by species very well at all. Instead, the only agents that stuck together were the ones with both similar species and similar travel speed. You can see this happening most clearly with the slower green agents, the pair of blue agents, and the fast blue and purple agents shooting across the bottom of the screen. Notably, the agents seem to prioritize speed over species similarity when “choosing” whom to travel with. You can see this with the light blue, dark blue, and even purple agents.
This makes perfect sense given the dynamics we defined, since each agent’s speed is constant, and agents with different speeds will ultimately fall behind or run away from their comrades. And yet, it’s somewhat surprising.
Fundamentally, this is because the behavior is emergent. In other words, we didn’t explicitly tell the agents to behave in this way as a group. Rather, they “derived” this behavior from the simple set of instructions, which we gave to each individual agent in an identical encoded format.
We began this journey with the simple goal of visualizing an ABM to make sure on a high level that the dynamics we would set up would work as intended. But in addition to accomplishing this, we stumbled upon an emergent behavior which we may not have even considered when creating the model.
This illustrates an important point about visualization in data science, simulations, and modeling across the board. Discovering emergent behavior in complex systems can be accelerated by new perspectives on the model or dataset. This isn’t only true for agent-based modeling; it applies to obtaining insights across the rest of data science as well. And creating new and creative visualizations provides a sure-fire way to get such perspectives.
If you’d like to play around with this model and visualization further, you can get the code at github.com/dreamchef/abm-viz. I’d love to hear what else you may find, or your thoughts on my work, in the comments! Feel free to reach out to me on Twitter and LinkedIn as well. Have a great day!
Unless otherwise noted, all images and animations were created by the author.
Visualizing Dynamical Behavior in Agent-Based Models was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
In the late 1600s, Jacob Bernoulli pondered over an interesting question: How do you estimate the probability of an event whose sample space is not fully accessible? How, for instance, do you estimate the chance of being hit in the head by a bolt from the sky during your lifetime? Or, if you prefer a less dramatic setting, how do you estimate the real proportion of black tickets in an urn filled with an unknown quantity of black and white tickets? The experimental setting was unmistakably binomial: a certain number of independent trials with only one of two possible outcomes in each trial. Jacob Bernoulli’s meditations on this binomial experiment led him to discover the Weak Law of Large Numbers. That was around 1689.
In 1733, forty years after Bernoulli’s discovery of the WLLN, a brilliant self-exiled Frenchman living a hand-to-mouth existence in England by the name Abraham De Moivre figured out how to accurately compute the probabilities expressed in Bernoulli’s binomial thought experiment. De Moivre’s technique, called De Moivre’s theorem (covered here), became one of the most influential discoveries of the 18th and 19th century mathematical thought.
And if that wasn’t enough, De Moivre also unintentionally introduced the world to the Normal distribution. But missing from De Moivre’s blockbuster accomplishments was any thought given to the following question:
How do you estimate the real (unknown) fraction of black tickets given only the observed fraction of black tickets in a single random sample?
It was the very question that Bernoulli had implicitly asked. In the language of probability, how do you estimate P(p|X_bar_n), where p is the real (unknown) proportion of black tickets in the ticket-filled urn, and X_bar_n is the number of black tickets that you have actually counted in a random sample of size n. This probability is called the inverse probability.
It’s worth taking a step back to appreciate how remarkable this concept is. The act of drawing a random sample and counting the number of black tickets in it spawns an infinite number of probable realities. Couched within each reality is a different candidate for the real proportion of black tickets — a real numbered quantity. Attached to this proportion is a probability density. This density is described by a Probability Density Function (PDF) which takes the observed fraction of black tickets as a parameter. If you know the PDF’s formula, you can plug in the observed fraction of black tickets and know the probability density (or what’s more practical, a range of possible values — a confidence interval) for the real fraction of black tickets in the urn.
The mere act of observing the fraction of black tickets in a single random sample immediately makes a small subset of the infinite set of possible universes much more probable.
In 1689, Bernoulli made no reference to inverse probability, let alone its Many Worlds interpretations. It would be another century before two men — a Presbyterian preacher in England, and one of the greatest minds in France — would independently show how to calculate inverse probability. And it would be yet another century following their discoveries that the Many Worlds Interpretation of quantum mechanics itself would meet the pages of a scientific publication.
Incidentally, the English clergyman was Thomas Bayes (of the Bayes theorem fame) whose achievement deserves its own article. And the Frenchman — the hero of our article — was Pierre-Simon Marquis de Laplace. Laplace would not only crack open the problem of inverse probability but also use it as a powerful catalyst to discover the Central Limit Theorem.
Pierre-Simon Laplace was born in 1749 in a France that was very different from the kingdom from which his fellow countryman Abraham De Moivre had fled 60 years earlier. De Moivre escaped from a country defined by the larger-than-life presence of Louis XIV and his unsparing hold over religious life. Laplace was to grow up during the more accommodating, almost laid-back reign of Louis XIV’s great-grandson, Louis XV.
Unfortunately, and in sharp contrast to his great-grandfather’s era, Louis XV’s reign from 1715 to 1774 saw a precipitous weakening of the French Monarchy’s power. There was a great up-swell of resentment among the heaving masses of the French kingdom, a trend that was to fester and boil until it finally erupted into the French Revolution of 1789 and spilled over into a toxic purge of human life called the Reign of Terror.
It was against this backdrop that Laplace grew up and somehow managed to do all the dozens of important things that people do like go to school, have a career, start a family, put kids through school, and become a grandparent. Oh, and in Laplace’s case, also make several ground breaking discoveries in physics, chemistry and math.
But Laplace wasn’t immune to the events unfolding around him. Indeed there was hardly a French scientist or philosopher who was spared the anguish of the revolution and the ensuing Reign of Terror, or the uncertainties bred by Napoleon Bonaparte’s eventual rise (twice over) as emperor and his subsequent fall (again twice over). Several of Laplace’s fellow scientists and friends, many of them France’s brightest minds — Antoine Lavoisier (1743–1794), Jean Sylvain Bailly (1736–1793), and Marquis de Condorcet (1743–1794) to name just a few—had succumbed to the excesses of this period.
It’s perhaps a testimony to Laplace’s astuteness and his keen judgement of the changing power equations of his times that he managed to keep himself and his family safe through what was arguably one of the most tumultuous periods in French history. Indeed, during the very times in which so many of his friends lost their livelihoods, their reputations, their freedoms, their loved ones, and even quite literally their heads, Laplace saw his career blossom, his stature as a scientist grow, and his importance to the French Republic rise.
For four prodigious decades spanning 1772 to 1812, Laplace immersed himself in a number of different problems on probability. Two key themes of his work were inverse probability and later on, the Central Limit Theorem. By 1774, Laplace had solved the problem of inverse probability. And true to the nature of the work done in his era, he spent the next several years perfecting his technique.
Recollect the problem framed by Jacob Bernoulli in 1689. There is an urn containing an unknown number of black and white colored tickets. The true proportion p of black tickets in the urn is unknown. Thus, p is also the unknown ‘true’ probability of coming across a black ticket in a single random draw. Suppose you draw n tickets at random from the urn.
Let the discrete random variable X_bar_n denote the number of black tickets in your random sample. Thus, the observed proportion of black tickets in the sample is X_bar_n/n. X_bar_n follows the binomial probability distribution with parameters n and p, i.e. X_bar_n ~ Binomial(n,p).
Now let’s apply this theoretical setting of a ticket-filled urn to a real world scenario studied by Laplace. From the census data of Paris, Laplace found out that from 1745 to 1770, there were 251527 boys and 241945 girls born in Paris. Remember, the census data is often not accurate. At best, it’s a reasonable proxy for the actual population values. Hence the numbers that Laplace retrieved can be thought of as a rather large random sample.
The sample size n is 251527 + 241945 = 493472.
Let p be the real proportion of boys in the population of Paris. p is unknown.
Let X_bar_n represent the number of boys in the sample. Thus, X_bar_n = 251527.
The observed fraction of boys in the sample is:
X_bar_n/n = 251527/(251527 + 241945) = 0.50971.
X_bar_n is a binomial random variable.
Notation-wise, X_bar_n ~ Binomial(n=493472, p)
The binomial probability P(X_bar_n=251527 | n=493472, p) is given by the following well-known Probability Mass Function (PMF):
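P(X_bar_n = x | n, p) = C(n, x) p^x (1 − p)^(n − x), where C(n, x) = n! / (x! (n − x)!) is the binomial coefficient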
In the above formula, remember that the real fraction of boys ‘p’ in Paris’s population is unknown.
You may have noticed the enormous factorials in the formula of this binomial probability. Even if you assume that p is known, these factorials make the probability practically impossible to compute directly. A good thirty years before Laplace was even born, and a half century before he set his sights on the topic of probability, it was the approximation of this forward binomial probability for a known ‘p’ that Laplace’s fellow Frenchman, Abraham De Moivre, toiled upon — a toil that De Moivre committed himself to with such intense focus that he seems to have completely ignored the limited practical use of this probability.
You see, to calculate the forward binomial probability P(X_bar_n = x | n, p), which is conditioned upon n and p, you must obviously know the sample size n and the real fraction p. The real fraction ‘p’ is found in prolific abundance in end-of-the-chapter problems in Statistics textbooks. But outside of such well-behaved neighborhoods, ‘p’ makes itself scarce. In the real world, you would almost never know the true p for any phenomenon. And therefore, in any practical setting, the computation of the forward binomial probability is of limited use to the practicing statistician.
Instead, what would be really useful is to treat the real, unknown p as a continuous random variable, and to estimate its probability distribution conditioned upon a particular observed ratio X_bar_n/n. For example, recollect that in the case of the Paris census data, X_bar_n/n = 251527/493472 = 0.50971.
Thus, we’d want to know the following probability:
P(p | n=493472, X_bar_n=251527)
Notice how this probability is the exact inverse of the ‘forward’ binomial probability:
P(X_bar_n=251527 | n=493472, p)
A half century after De Moivre was done with his life’s work on the subject, it was P(p | n, X_bar_n=x) — the inverse probability — that was the only probability in Laplace’s crosshairs.
Notice also that in the probability P(p | n, X_bar_n=x), p is a real number defined over the closed interval [0,1]. There are infinite possible values of p in [0,1]. So for any given p, P(p) must be necessarily zero. Thus, P(p=p_i | n, X_bar_n=x) is the Probability Density Function (PDF). A single point on this function will give you the probability density corresponding to some value of ‘p’.
In a practical setting, what you’d want to estimate is not a particular value of p but the probability of p (conditioned upon X_bar_n, and n) lying between two specified bounds p_low and p_high. In the language of Statistics, what you are seeking (and what Laplace sought) is not a point estimate, but an interval estimate of p.
Mathematically speaking, you’d want to calculate the following:
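P(p_low ≤ p ≤ p_high | n, X_bar_n = x), i.e. the area under the density of p between p_low and p_high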
In his quest for a solution to this probability, Laplace worked through hundreds of pages of mind-numbingly tedious derivations and hand-calculations. Much of this work appears in his memoirs and books published from 1770 through 1812. They can be a real treat to go through if you are the sort of person who rejoices at the prospect of looking at mathematical symbols in volume. For the rest of us, the essence of Laplace’s approach toward inverse probability can be summed up in his ingenious use of the following equation:
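P(A | B) = P(B | A) P(A) / P(B)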
The above equation — generally known as Bayes’s theorem or Bayes’s rule — lets you calculate the probability P(A | B) in terms of its inverse P(B | A). For the problem Laplace was working upon, it can be used as follows:
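P(p | X_bar_n) = P(X_bar_n | p) P(p) / P(X_bar_n)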
I’ve omitted n and x for brevity. Laplace didn’t so much use the above equation directly as arrive at it using a particular method of apportioning probabilities to different causes that he describes in detail in his work.
At least a decade before Laplace was to work on the problem of inverse probability, a Presbyterian minister in England named Thomas Bayes (1701–1761) had already all but solved it. But Bayes’ approach toward inverse probability was remarkably different from Laplace’s. Besides, Bayes failed to publish his work in his lifetime.
At any rate, using the following equation as the starting point, and using modern notation, I’ll explain the essence of Laplace’s line of attack on inverse probability.
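P(p | n, X_bar_n = x) = P(X_bar_n = x | n, p) P(p) / P(X_bar_n = x)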
There are three probabilities to be computed on the R.H.S. of the above equation:
Let’s look at them one by one:
P(X_bar_n = x | n, p): This is the easiest of the three to calculate. It is the forward binomial probability, computed with the well-known binomial PMF shown earlier.
P(p) is the probability density function of the unconditional prior probability. To arrive at P(p), Laplace employed what is known as the principle of insufficient reason which says the following:
If nothing specific or special is known or can be assumed about the probability distribution of a random variable, one should assume that it is uniformly distributed over the range of all possible values i.e. the support space of the variable.
The support for p is [0, 1]. By the above principle, we should consider p to be uniformly distributed over the [0, 1] interval. That is, p ~ Uniform(0,1). Therefore, for any i, P(p=p_i) = 1/(1–0) = 1. That is, a constant.
P(X_bar_n = x) is the marginal probability of X_bar_n=x. It is the probability of X_bar_n taking the value x assuming that p takes each one of the possible values in the range [0,1]. Thus, P(X_bar_n = x) is the following infinite sum:
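P(X_bar_n = x) = Σ over all p_i in [0, 1] of P(X_bar_n = x | n, p = p_i) P(p = p_i)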
The summation runs over all p_i in the interval [0, 1]. Since p is continuous over [0,1], we can convert the discrete summation into a smooth integration over [0,1]. While we are at it, we’ll also plug in the formula for the binomial probability P(X_bar_n=x | n,p):
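P(X_bar_n = x) = ∫ from 0 to 1 of C(n, x) p^x (1 − p)^(n − x) dp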
Let’s put back all three terms into Bayes’s formula. And yes, I am going to call it Bayes’s formula. Even though Laplace’s approach to inverse probability was starkly different from Bayes’s, history suggests that Bayes got the idea in his head first. So there.
At any rate, here’s what we get from substituting P(X_bar_n=x|p), P(p), and P(X_bar_n = x) with their corresponding formulae or value:
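P(p | n, X_bar_n = x) = [C(n, x) p^x (1 − p)^(n − x) × 1] / [∫ from 0 to 1 of C(n, x) q^x (1 − q)^(n − x) dq]
= p^x (1 − p)^(n − x) / ∫ from 0 to 1 of q^x (1 − q)^(n − x) dq   (the binomial coefficient C(n, x) cancels from numerator and denominator)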
Since x is a known quantity (x is the value of the observed sample mean X_bar_n), the R.H.S. is, in essence, a function of p that is normalized by the definite integral in the denominator. To compute the definite integral in the denominator we use a technique that makes use of a rather interesting and useful function called the Beta function. For any two positive numbers a and b, the Beta function B(a, b) is defined as follows:
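B(a, b) = ∫ from 0 to 1 of t^(a − 1) (1 − t)^(b − 1) dt = Γ(a) Γ(b) / Γ(a + b)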
The symbol Γ is the Greek capital letter gamma. Γ(j) is the gamma function, which in its general form is the extension of the factorial function to complex numbers. In our case, we’ll stick to positive integers, for which Γ(j) is defined simply as (j − 1)! Notice how the gamma function lets you calculate the continuous integral on the L.H.S. using a set of three discrete factorials: (a − 1)!, (b − 1)!, and (a + b − 1)!
Before we move ahead, let’s recall that our aim is to calculate the definite integral in the denominator on the R.H.S. of the inverse probability formula above.
In the Beta function, if you set a = x + 1 and b = n — x + 1, you will transform Beta(a, b) into the definite integral of interest as follows:
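B(x + 1, n − x + 1) = ∫ from 0 to 1 of q^x (1 − q)^(n − x) dq = x! (n − x)! / (n + 1)!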
In summary, using the Beta function Laplace’s formula for inverse probability can be expressed as follows:
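P(p | n, X_bar_n = x) = p^x (1 − p)^(n − x) / B(x + 1, n − x + 1)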
Today, most statistics libraries contain routines to compute B(x+1, n — x+1). So you will never ever have to put yourself through the sort of misery that Laplace or De Moivre, or for that matter Jacob Bernoulli had to put themselves through to arrive at their results.
Now let’s get back to the census example.
Recall that the recorded number of male births in Paris was 251527, and the sample size was 251527 + 241945 = 493472.
x = 251527, and n = 493472
If you plug in x and n in the above formula and use your favorite library to calculate the probability density P(p | X_bar_n = x, n) for different values of p in the interval [0, 1], you’ll get the following plot. I used Python and scipy, specifically scipy.stats.binom and scipy.special.betainc.
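The exact plotting code isn’t shown here; one minimal way to reproduce such a curve is to note that the posterior density above is that of a Beta(x + 1, n − x + 1) distribution, which scipy.stats.beta evaluates directly:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

x, n = 251527, 493472
p_grid = np.linspace(0.505, 0.515, 500)        # a narrow window around the observed ratio
density = beta.pdf(p_grid, x + 1, n - x + 1)   # posterior density P(p | n, X_bar_n = x)

plt.plot(p_grid, density)
plt.xlabel('p')
plt.ylabel('Posterior probability density')
plt.show()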
As expected, the posterior probability density peaks when p is 251527/493472 = 0.50971 corresponding to the observed count of 251527 male births.
Here finally we have the means to answer the question Jacob Bernoulli posed more than three centuries ago:
How do you estimate the real (unknown) fraction (of black tickets) given only the observed fraction of (black tickets) in a single random sample?
In other words:
What is the real probability (density) p conditioned upon the sample value X_bar_n = x?
Or in general terms:
What is the probability (density) of the population mean (or sum) given a single observed sample mean (or sum)?
As mentioned earlier, since p is continuous over [0, 1], what’s really useful to us is not a point estimate but the probability of the real ratio p lying between two specified bounds p_low and p_high, i.e. the probability P(p_low ≤ p ≤ p_high | n, X_bar_n = x).
This probability can be split into two cumulative probabilities:
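P(p_low ≤ p ≤ p_high | n, X_bar_n = x) = P(p ≤ p_high | n, X_bar_n = x) − P(p ≤ p_low | n, X_bar_n = x)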
Laplace asked the following question:
What is the probability that the true ratio of boys to total births in Paris was greater than 50%? In other words, what is the value of P(p > 0.5 | n, X_bar_n = x)?
P(p > 0.5 | n, X_bar_n = x) = 1 − P(p ≤ 0.5 | n, X_bar_n = x).
P(p ≤ 0.5 | n, X_bar_n = x) is the area under the posterior density of p from 0 to 0.5. To calculate it, we need a modified version of the formula for inverse probability, one which uses the incomplete Beta function B(z; a, b):
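P(p ≤ 0.5 | n, X_bar_n = x) = B(0.5; x + 1, n − x + 1) / B(x + 1, n − x + 1), where B(z; a, b) = ∫ from 0 to z of t^(a − 1) (1 − t)^(b − 1) dt is the incomplete Beta function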
As before, we set a = x + 1 and b = n − x + 1. Unlike B(a, b), which can be computed as a ratio of gamma functions, the incomplete Beta function B(z; a, b) has no closed form. But stats packages such as scipy.special.betainc will happily calculate its value for you using numerical techniques.
As before, we set x = 251527 boys, and sample size n = 493472 total births.
And we calculate a and b as follows:
a = (x + 1) = 251528 and b = (n − x + 1) = 241946
And using a stats package we calculate:
B(0.5; a, b) = B(0.5; 251528, 241946), and
B(a, b) = B(251528, 241946).
P(p ≤ 0.5 | n, X_bar_n = x) turns out to be vanishingly small.
Thus, P(p > 0.5 | n, X_bar_n = x) = 1 − P(p ≤ 0.5 | n, X_bar_n = x) is essentially 1.0.
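For reference, here is a minimal way to reproduce this number with SciPy. Note that scipy.special.betainc(a, b, z) returns the regularized incomplete Beta function, i.e. the ratio B(z; a, b) / B(a, b) directly:

from scipy.special import betainc

x, n = 251527, 493472
a, b = x + 1, n - x + 1

# Posterior probability that p <= 0.5 under the uniform prior
p_leq_half = betainc(a, b, 0.5)
print(p_leq_half, 1 - p_leq_half)  # p_leq_half is vanishingly small, so the complement is essentially 1.0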
Laplace concluded that the true ratio of male births to total births in Paris during 1745 to 1770 was definitely greater than 50%. It was an arrestingly important piece of inference that may well have influenced public policy in 18th century France.
Isn’t inverse probability just smashingly awesome!
Before moving ahead, let’s review what Laplace’s work on inverse probability gave us.
If X_bar_n ~ Binomial(n,p), Laplace gave us a way to calculate the probability density distribution P(p | X_bar_n = x) of the unknown population mean p given a single observation about the sample mean X_bar_n/n as follows:
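P(p | n, X_bar_n = x) = p^x (1 − p)^(n − x) / B(x + 1, n − x + 1)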
That also paved a direct path to calculating the probability of the unknown mean lying in a closed interval [p_low, p_high] of our choosing:
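P(p_low ≤ p ≤ p_high | n, X_bar_n = x) = [B(p_high; x + 1, n − x + 1) − B(p_low; x + 1, n − x + 1)] / B(x + 1, n − x + 1)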
Laplace described his work on inverse probability in his mémoires published from 1774 through 1781. By 1781, the calculation of inverse probability, a topic that had bedeviled mathematicians for more than a century, was firmly solved, albeit only for binomial random variables.
No exposition on Laplace’s work on probability will be complete without at least touching upon his work on the Central Limit Theorem.
Laplace’s work on inverse probability was directly applicable to binomial random variables. Surely his fertile mind must have perceived the need to generalize his results to non-binomial samples.
Specifically, and in modern notation, suppose X_1, X_2,…,X_n are n i.i.d. random variables that form a random sample of size n. Let X_bar_n be their sample mean. This time, we’ll assume that X_bar_n has some unknown probability distribution. Let μ be the mean and σ² be the variance of the population from which the random sample is drawn. As in the case of the binomial setting, both μ and σ² are unknown. With all that in context, the problem statement is as follows:
Find a closed interval [μ_low, μ_high] such that μ will lie in this interval with a specified level of certainty (1 − α), where 0 ≤ α ≤ 1. In modern terms, (1 − α) is the confidence level and [μ_low, μ_high] is the (1 − α)100% confidence interval for the unknown mean μ.
In equation terms, what you want to have are the values μ_low and μ_high such that:
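P(μ_low ≤ μ ≤ μ_high) = (1 − α)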
We’ll apply two crucial operations to the above equation. First, we split the interval probability on the L.H.S. into a difference of two cumulative probabilities, P(μ ≤ μ_high) − P(μ ≤ μ_low). Second, we split (1 − α) on the R.H.S. into (1 − α/2) − (α/2).
Next, we pair the first term on the L.H.S. with the first term on the R.H.S., and likewise for the second terms. Doing so yields the formulae for μ_high and μ_low as follows:
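P(μ ≤ μ_high) = F(μ_high) = (1 − α/2), which gives μ_high = F^(−1)(1 − α/2)
P(μ ≤ μ_low) = F(μ_low) = α/2, which gives μ_low = F^(−1)(α/2)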
In the above formulae, F(x) is the notation for the Cumulative Distribution function (CDF) of X.
F(.) raised to the power (−1) denotes the inverse CDF (not to be confused with inverse probability). The meaning of the inverse CDF is quite simple. Suppose P(X=x) is the Probability Density Function (PDF) of a continuous random variable X. If X is discrete, P(X=x) is the Probability Mass Function (PMF) of X. Then P(X ≤ x) = F(X=x) is the Cumulative Distribution Function (CDF) of X. F(X=x) returns the area under the curve of the PDF (or PMF) from negative ∞ to x. The inverse CDF is literally the inverse of F(X=x): it takes as a parameter the area under the curve of the PDF or PMF of X from negative ∞ to x, and it returns the x corresponding to this area. That’s all there is to the inverse CDF.
Let’s illustrate the use of the inverse CDF when X has a log-normal distribution. X is log-normally distributed if ln(X) is normally distributed. Let’s assume that ln(X) is normally distributed with a mean of 0 and variance of 0.5². Then the log-normal PDF of X looks like this:
Let μ be the mean of X. Suppose you wish to find [μ_low, μ_high] such that there is a 95% chance that the mean of X will lie in this interval. We set α = 0.05.
P(μ_low ≤ μ ≤ μ_high) = (1 − 0.05).
P(μ_low ≤ μ ≤ μ_high) = (1 − 0.05) can be split into two terms as follows:
P(μ ≤ μ_high) − P(μ ≤ μ_low) = (1 − 0.025) − (0.025)
You’d want to find a μ_low and a μ_high such that:
P(μ ≤ μ_low) = F(μ_low) = 0.025, and
P(μ ≤ μ_high) = F(μ_high) = (1 — 0.025)
That is, you’d want to find a μ_low such that the area under the PDF of X from negative ∞ to μ_low is 0.025. This area is the blue region in the figure below. Since we know the PDF of X, we can find the corresponding μ_low using any Statistics package. μ_low is 0.38. And you’d also want to find a μ_high such that the area under the curve of the PDF of X from negative ∞ to μ_high is (1–0.025). This area is the blue region in the figure below and the corresponding μ_high is 2.66.
Thus, if X is log-normally distributed with parameters 0 and 0.5², there is a 95% chance that the mean of X will lie in the interval [0.38, 2.66].
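As a quick check of these two numbers (they are the 2.5% and 97.5% quantiles of this log-normal distribution), scipy.stats.lognorm can be used; it takes s = the standard deviation of ln(X) and scale = e raised to the mean of ln(X):

import numpy as np
from scipy.stats import lognorm

dist = lognorm(s=0.5, scale=np.exp(0))   # ln(X) ~ N(0, 0.5^2)
print(round(dist.ppf(0.025), 2), round(dist.ppf(0.975), 2))   # roughly 0.38 and 2.66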
In the above example, we could easily find the inverse CDFs for 0.025 and (1–0.025) because we knew the CDF of X. We knew the CDF of X because we assumed a certain PDF for X. We assumed that X was log-normally distributed.
And here we come to a crucial point. In real life, never will you know the probability distribution of X. Sometimes you can guess what distribution family X belongs to such as Poisson, Normal, Exponential, Negative Binomial etc. and you can approximate the PDF or PMF by estimating the parameters (i.e. usually the mean, and variance) of the distribution family. But if you don’t know the distribution and you cannot make a reasonable guess, you are truly stuck. And this is exactly when the Central Limit Theorem will come to your rescue.
The Central Limit Theorem says that if X happens to be the mean of your random sample, then the standardized X (I’ll explain what that is in a minute) converges in distribution to the standard normal random variable N(0, 1). The standard normal variable N(0, 1) is a random variable that has a normal distribution with a mean of 0 and a variance of 1. Convergence in distribution means that the CDF of the standardized X will become identical to the CDF of N(0, 1) as the sample size approaches infinity.
The CDF of N(0, 1) is usually represented by the symbol Φ, spelled as “Phi”.
If μ and σ² are respectively the unknown mean and variance of the population from which you drew the random sample, then the standardized X is the following random variable Z:
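Z_n = (X_bar_n − μ) / (σ / √n)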
The complete statement of the Central Limit Theorem expressed in modern notation is as follows:
Let X_1, X_2, X_3,…,X_n be n independent, and identically distributed random variables which together form a random sample of size n. Let μ and σ² be the unknown mean and variance of the underlying population from which this random sample is drawn. We assume that the population is infinitely large or at least “extremely large”. Let X_bar_n be the sample mean. As the sample size n approaches infinity, the standardized X_bar_n converges in distribution to N(0, 1). The following figure illustrates the behavior of the CLT using mathematical notation:
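In symbols: as n → ∞, Z_n = (X_bar_n − μ) / (σ / √n) converges in distribution to N(0, 1); equivalently, P(Z_n ≤ z) → Φ(z) for every z.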
At some point during the 3 decades that followed his 1774 paper on inverse probability, Laplace went to work on the Central Limit Theorem. During Laplace’s era, the CLT wasn’t called by that name. Neither did Laplace’s treatment of it resemble the modern form in which it’s expressed. But the ideas and insights into the CLT’s behavior shine out of Laplace’s work with the same brilliant clarity as in its modern form.
Laplace published his work on the CLT first in a mémoir in 1810. But his treatment of CLT in the 1812 edition of his book Théorie analytique des probabilités (Analytic theory of probabilities) is widely considered as the definitive source on the topic.
In his 1812 book Théorie analytique and in what may be the custom of his times or simply a way to keep up the charm offensive, Laplace made an overtly obsequious dedication of his work to Napoleon Bonaparte — the then reigning emperor of France:
To Napoleon-the-great
…
…
Signed:
The very humble and very obedient servant and faithful subject
LAPLACE
But in 1812–1813, Napoleon was either busy invading Russia or reeling from the aftermath of his misjudged fiasco. At any rate, Laplace’s landmark achievement along with its full-throated dedication may have slipped the great Emperor’s attention.
By April 1814, Napoleon was packing his suitcase for an exile to Elba. Meanwhile in Paris, Laplace (or what is more likely, his publisher) wisely decided to publish the 1814 edition of Théorie analytique des probabilités minus the dedication to the great, now deposed, Emperor.
Let’s revisit our problem of finding an interval [μ_low, μ_high] such that there is a (1 — α)100% chance that the unknown population mean μ will lie in this interval. In other words, we want to calculate the (1 — α)100% confidence interval for the population mean. Remember that we have stopped assuming any particular distributional shape for the PDF of μ.
Without Laplace’s discovery of the Central Limit Theorem, this problem would have been difficult to solve and any solution to it, susceptible to all kinds of errors of approximation. But the CLT sweeps away all the difficulty, making the solution as easy as a walk in the park.
Let’s see how to do it. Here’s our problem statement. For a specified α, we wish to find a μ_low, and μ_high such that the following probability holds true:
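P(μ_low ≤ μ ≤ μ_high) = (1 − α)    … Eq. (1)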
Without loss of generality (and isn’t that a terrifically useful phrase), let’s recast our problem statement to use the standardized sample mean Z_n. We would be seeking two bounds on Z_n, namely z_low and z_high such that the following probability holds true:
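P(z_low ≤ Z_n ≤ z_high) = (1 − α)    … Eq. (2)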
As before, we’ll split Eq. (2) as follows:
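P(Z_n ≤ z_high) − P(Z_n ≤ z_low) = (1 − α/2) − (α/2)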
As per the CLT, as n becomes very large, the CDF of Z_n approaches that of N(0, 1). Hence the probabilities on the L.H.S. can be replaced by the CDF of N(0,1) as follows:
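Φ(z_high) − Φ(z_low) = (1 − α/2) − (α/2)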
Setting Φ(z_high) = (1 − α/2) and Φ(z_low) = α/2 gives us a way to calculate z_high and z_low using the inverse of Φ as follows:
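z_high = Φ^(−1)(1 − α/2), and z_low = Φ^(−1)(α/2)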
Since the standard normal distribution is symmetrical around the Y-axis, z_low is simply the negative of z_high. Thus:
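z_low = −z_high = −Φ^(−1)(1 − α/2)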
The figure below illustrates the symmetry of the situation when α = 0.05, α/2 = 0.025, and (1 — α/2) = 0.975:
Let’s substitute z_low and z_high in Eq. (2) with the corresponding formulae expressed in terms of Φ and α:
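P(−Φ^(−1)(1 − α/2) ≤ Z_n ≤ Φ^(−1)(1 − α/2)) = (1 − α)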
We know that Z_n is the standardized X_bar_n. So let’s substitute Z_n with the standardized X_bar_n:
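P(−Φ^(−1)(1 − α/2) ≤ (X_bar_n − μ) / (σ/√n) ≤ Φ^(−1)(1 − α/2)) = (1 − α)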
Some simplification in the central term will help us isolate the unknown mean μ as follows:
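P(X_bar_n − Φ^(−1)(1 − α/2) σ/√n ≤ μ ≤ X_bar_n + Φ^(−1)(1 − α/2) σ/√n) = (1 − α)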
Which gives us the following formulae for μ_low and μ_high:
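μ_low = X_bar_n − Φ^(−1)(1 − α/2) σ/√n, and μ_high = X_bar_n + Φ^(−1)(1 − α/2) σ/√n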
To use these formulae you need to know the confidence level α, the sample size n, the sample mean X_bar_n, and the population standard deviation σ. You would know the first three quantities, but not σ. However, it can be shown that the sample variance S² (computed with the degrees-of-freedom correction) is a consistent, unbiased estimator of the population variance σ². Hence, you may use the sample standard deviation S in place of σ and still get consistent estimates for μ_low and μ_high:
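μ_low = X_bar_n − Φ^(−1)(1 − α/2) S/√n, and μ_high = X_bar_n + Φ^(−1)(1 − α/2) S/√n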
Let’s use these formulae in an example.
Suppose you wish to estimate the average speed of your broadband connection. To do so, you take speed readings at 25 different random times of the day in a 24 hour period and you get the following values in Mbps:
The sample mean X_bar_n = 398.81598 Mbps.
The sample variance S² is:
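S² = Σ (X_i − X_bar_n)² / (n − 1) ≈ 5.704, which gives S ≈ 2.38838 Mbps.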
Since X_1 through X_n (the 25 speed test readings) are i.i.d. random variables, the Weak Law of Large Numbers governs the behavior of the sample mean X_bar_n. Specifically:
The sample mean X_bar_n = 398.81598 Mbps is a consistent, unbiased estimator of the unknown population mean μ.
But how accurate an estimate of μ is the observed X_bar_n? You can test its accuracy by computing a 95% confidence interval for μ given the observed X_bar_n = 398.81598 Mbps.
To do that, you set α = 0.05. Φ^(−1)(1 − 0.05/2) = 1.96. You know that X_bar_n = 398.81598, S = 2.38838, and n = 25. So you have everything you need to calculate μ_low and μ_high as follows:
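μ_low = 398.81598 − 1.96 × 2.38838 / √25 ≈ 398.81598 − 0.93624 ≈ 397.87974 Mbps
μ_high = 398.81598 + 1.96 × 2.38838 / √25 ≈ 398.81598 + 0.93624 ≈ 399.75222 Mbps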
Your calculations are telling you that there is a 95% chance that the real mean broadband speed will lie in the interval [397.87974 Mbps, 399.75222 Mbps] conditioned upon your observed sample mean of 398.81598 Mbps.
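As a sanity check, a small snippet along these lines (using scipy.stats.norm.ppf for Φ^(−1)) reproduces the interval:

import numpy as np
from scipy.stats import norm

x_bar, s, n, alpha = 398.81598, 2.38838, 25, 0.05
z = norm.ppf(1 - alpha / 2)             # roughly 1.96
half_width = z * s / np.sqrt(n)
print(x_bar - half_width, x_bar + half_width)   # roughly 397.88 and 399.75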
In 1812 (or 1810 depending on your viewpoint), with his discovery of the Central Limit Theorem, Laplace plugged in the final missing piece that had eluded generations of scientists since Jacob Bernoulli’s quantification of the Weak Law of Large Numbers in 1689. In the decades following Bernoulli’s discovery of the WLLN, Abraham De Moivre showed how to approximate the factorial and, in the process, introduced the world to the normal curve. Laplace was well aware of De Moivre’s work and built extensively upon it in his own work on both inverse probability and the Central Limit Theorem.
With his work on inverse probability and the CLT, what Laplace gave us was the mathematical apparatus as well as a mechanism to calculate the posterior probability of the population mean (or sum) given a single observation of the sample mean (or sum).
The subsequent decades, especially the period 1850 to 1940, saw an explosion of interest in the CLT, with so many researchers coming up with proofs, extensions, and applications for the CLT that it would take many volumes to do justice to their contributions. Indeed, many tomes have been written on the subject! In the 1800s, Siméon Denis Poisson (1781–1840) and Augustin-Louis Cauchy (1789–1857) devised proofs for the CLT in restricted settings. In 1887, the Russian mathematician Pafnuty Lvovich Chebyshev (1821–1894) used the method of moments (which he called “Bienaymé’s method of moments” to give due credit for the technique to his French mathematician friend, the reticent and brilliant Irénée-Jules Bienaymé) to attempt a proof of the CLT. Chebyshev’s proof was incomplete, but it extricated the CLT from the narrow alleyway of a binomial thought experiment and made it applicable to a broad set of random variables. Around the turn of the 20th century, Chebyshev’s students Andrey Andreyevich Markov (1856–1922) and later Aleksandr Mikhailovich Lyapunov (1857–1918) provided the first set of rigorous proofs for the CLT. Lyapunov also extended the CLT’s applicability to random variables that are not necessarily identically distributed, as did the Finnish mathematician Jarl Waldemar Lindeberg (1876–1932), in what came to be known as the Lyapunov CLT and the Lindeberg CLT. In the 20th century, the CLT’s applicability was further extended to dependent random variables, which immediately made it applicable to inference in time series regression models, which work with correlated observations.
It’s hard to find a pair of discoveries in Statistics that are as fundamental, as intertwined with each other, as heavily researched, and with as rich a historical backdrop as the Law of Large Numbers and the Central Limit Theorem. And as with many great discoveries, it’s also easy to forget that the forms of the LLN and the CLT that we use today with such effortless ease are the evolutionary outcome of countless hours of toil from dozens of scientists across literally several centuries.
So next time you apply the LLN or the CLT to a problem, think of the rich history of thought you are carrying forward into your work. You just might look at your solution with a whole new level of admiration.
All images and videos in this article are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image or video.
Thanks for reading! If you liked this article, please follow me to receive more content on statistics and statistical modeling.
Pierre-Simon Laplace, Inverse Probability, and the Central Limit Theorem was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
K-Means is a popular unsupervised algorithm for clustering tasks. Despite its popularity, it can be difficult to use in some contexts due to the requirement that the number of clusters (or k) be chosen before the algorithm has been implemented.
Two quantitative methods to address this issue are the elbow plot and the silhouette score. Some authors regard the elbow plot as “coarse” and recommend data scientists use the silhouette score [1]. Although general advice is useful in many situations, it is best to evaluate problems on a case-by-case basis to determine what is best for the data.
The purpose of this article is to provide a tutorial on how to implement k-means clustering using an elbow plot and silhouette score and how to evaluate their performance.
A Google Colab notebook containing the code reviewed in this article can be accessed through the following link:
https://colab.research.google.com/drive/1saGoBHa4nb8QjdSpJhhYfgpPp3YCbteU?usp=sharing
The Seeds dataset was originally published in a study by Charytanowicz et al. [2] and can be accessed through the following link: https://archive.ics.uci.edu/dataset/236/seeds
The dataset is comprised of 210 entries and eight variables. One column contains information about a seed’s variety (i.e., 1, 2, or 3) and seven columns contain information about the geometric properties of the seeds. The properties include (a) area, (b) perimeter, (c) compactness, (d) kernel length, (e) kernel width, (f) asymmetry coefficient, and (g) kernel groove length.
Before building the models, we’ll need to conduct an exploratory data analysis to ensure we understand the data.
We’ll start by loading the data, renaming the columns, and setting the column containing seed variety to a categorical variable.
import pandas as pd
url = 'https://raw.githubusercontent.com/CJTAYL/USL/main/seeds_dataset.txt'
# Load data into a pandas dataframe
df = pd.read_csv(url, delim_whitespace=True, header=None)
# Rename columns
df.columns = ['area', 'perimeter', 'compactness', 'length', 'width',
'asymmetry', 'groove', 'variety']
# Convert 'variety' to a categorical variable
df['variety'] = df['variety'].astype('category')
Then we’ll display the structure of the dataframe and its descriptive statistics.
df.info()
df.describe(include='all')
Fortunately, there are no missing data (which is rare when dealing with real-world data), so we can continue exploring the data.
An imbalanced dataset can affect the quality of the clusters, so let's check how many instances we have from each variety of seed.
df['variety'].value_counts()
1 70
2 70
3 70
Name: variety, dtype: int64
Based on the output of the code, we can see that we are working with a balanced dataset. Specifically, the dataset is comprised of 70 seeds from each group.
A useful visualization during EDA is the histogram, since it can show the distribution of the data and reveal the presence of skew. Since there are three varieties of seeds in the dataset, it might be beneficial to plot the distribution of each numeric variable grouped by variety.
import matplotlib.pyplot as plt
import seaborn as sns
# Set the theme of the plots
sns.set_style('whitegrid')
# Identify categorical variable
categorical_column = 'variety'
# Identify numeric variables
numeric_columns = df.select_dtypes(include=['float64']).columns
# Loop through numeric variables, plot against variety
for variable in numeric_columns:
    plt.figure(figsize=(8, 4))  # Set size of plots
    ax = sns.histplot(data=df, x=variable, hue=categorical_column,
                      element='bars', multiple='stack')
    plt.xlabel(f'{variable.capitalize()}')
    plt.title(f'Distribution of {variable.capitalize()}'
              f' grouped by {categorical_column.capitalize()}')
    legend = ax.get_legend()
    legend.set_title(categorical_column.capitalize())
    plt.show()
From this plot, we can see there is some skewness in the data. To provide a more precise measure of skewness, we can use the skew() method.
df.skew(numeric_only=True)
area 0.399889
perimeter 0.386573
compactness -0.537954
length 0.525482
width 0.134378
asymmetry 0.401667
groove 0.561897
dtype: float64
Although there is some skewness in the data, none of the individual values appear to be extremely high (i.e., absolute values greater than 1), therefore, a transformation is not necessary at this time.
Correlated features can affect the k-means algorithm, so we’ll generate a heat map of correlations to determine if the features in the dataset are associated.
# Create correlation matrix
corr_matrix = df.corr(numeric_only=True)
# Set size of visualization
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm',
square=True, linewidths=0.5, cbar_kws={'shrink': 0.5})
plt.title('Correlation Matrix Heat Map')
plt.show()
There are strong (0.60 ≤ ∣r∣ <0.80) and very strong (0.80 ≤ ∣r∣ ≤ 1.00) correlations between some of the variables; however, the principal component analysis (PCA) we will conduct will address this issue.
Although we won’t use them in the k-means algorithm, the Seeds dataset contains labels (i.e., ‘variety’ column). This information will be useful when we evaluate the performance of the implementations, so we’ll set it aside for now.
# Set aside ground truth for calculation of ARI
ground_truth = df['variety']
Before entering the data into the k-means algorithm, we’ll need to scale the data.
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
# Scale the data, drop the ground truth labels
ct = ColumnTransformer([
('scale', StandardScaler(), numeric_columns)
], remainder='drop')
df_scaled = ct.fit_transform(df)
# Create dataframe with scaled data
df_scaled = pd.DataFrame(df_scaled, columns=numeric_columns.tolist())
After scaling the data, we’ll conduct PCA to reduce the dimensions of the data and address the correlated variables we identified earlier.
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95) # Account for 95% of the variance
reduced_features = pca.fit_transform(df_scaled)
explained_variances = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variances)
# Round the cumulative variance values to two digits
cumulative_variance = [round(num, 2) for num in cumulative_variance]
print(f'Cumulative Variance: {cumulative_variance}')
Cumulative Variance: [0.72, 0.89, 0.99]
The output of the code indicates that one dimension accounts for 72% of the variance, two dimensions account for 89% of the variance, and three dimensions account for 99% of the variance. To confirm the correct number of dimensions were retained, use the code below.
print(f'Number of components retained: {reduced_features.shape[1]}')
Number of components retained: 3
Now the data are ready to be inputted into the k-means algorithm. We’re going to examine two implementations of the algorithm — one informed by an elbow plot and another informed by the Silhouette Score.
To generate an elbow plot, use the code snippet below:
from sklearn.cluster import KMeans
inertia = []
K_range = range(1, 6)
# Calculate inertia for the range of k
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=0, n_init='auto')
    kmeans.fit(reduced_features)
    inertia.append(kmeans.inertia_)
plt.figure(figsize=(10, 8))
plt.plot(K_range, inertia, marker='o')
plt.title('Elbow Plot')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.xticks(K_range)
plt.show()
The number of clusters is displayed on the x-axis and the inertia is displayed on the y-axis. Inertia refers to the sum of squared distances of samples to their nearest cluster center. Basically, it is a measure of how close the data points are to the mean of their cluster (i.e., the centroid). When inertia is low, the clusters are more dense and defined clearly.
When interpreting an elbow plot, look for the section of the line that looks similar to an elbow. In this case, the elbow is at three. When k = 1, the inertia will be large, then it will gradually decrease as k increases.
The “elbow” is the point where the decrease begins to plateau and the addition of new clusters does not result in a significant decrease in inertia.
Based on this elbow plot, the value of k should be three. Using an elbow plot has been described as more of an art than a science, which is why it has been referred to as “coarse”.
To implement the k-means algorithm when k = 3, we’ll run the following code.
k = 3 # Set value of k equal to 3
kmeans = KMeans(n_clusters=k, random_state=2, n_init='auto')
clusters = kmeans.fit_predict(reduced_features)
# Create dataframe for clusters
cluster_assignments = pd.DataFrame({'symbol': df.index,
'cluster': clusters})
# Sort value by cluster
sorted_assignments = cluster_assignments.sort_values(by='cluster')
# Convert assignments to same scale as 'variety'
sorted_assignments['cluster'] = [num + 1 for num in sorted_assignments['cluster']]
# Convert 'cluster' to category type
sorted_assignments['cluster'] = sorted_assignments['cluster'].astype('category')
The code below can be used to visualize the output of k-means clustering informed by the elbow plot.
from mpl_toolkits.mplot3d import Axes3D
plt.figure(figsize=(15, 8))
ax = plt.axes(projection='3d') # Set up a 3D projection
# Color for each cluster
colors = ['blue', 'orange', 'green']
# Plot each cluster in 3D
for i, color in enumerate(colors):
# Only select data points that belong to the current cluster
ix = np.where(clusters == i)
ax.scatter(reduced_features[ix, 0], reduced_features[ix, 1],
reduced_features[ix, 2], c=[color], label=f'Cluster {i+1}',
s=60, alpha=0.8, edgecolor='w')
# Plotting the centroids in 3D
centroids = kmeans.cluster_centers_
ax.scatter(centroids[:, 0], centroids[:, 1], centroids[:, 2], marker='+',
s=100, alpha=0.4, linewidths=3, color='red', zorder=10,
label='Centroids')
ax.set_xlabel('Principal Component 1')
ax.set_ylabel('Principal Component 2')
ax.set_zlabel('Principal Component 3')
ax.set_title('K-Means Clusters Informed by Elbow Plot')
ax.view_init(elev=20, azim=20) # Change viewing angle to make all axes visible
# Display the legend
ax.legend()
plt.show()
Since the data were reduced to three dimensions, they are plotted on a 3D plot. To gain additional information about the clusters, we can use countplot from the Seaborn package.
plt.figure(figsize=(10,8))
ax = sns.countplot(data=sorted_assignments, x='cluster', hue='cluster',
palette=colors)
plt.title('Cluster Distribution')
plt.ylabel('Count')
plt.xlabel('Cluster')
legend = ax.get_legend()
legend.set_title('Cluster')
plt.show()
Earlier, we determined that each group comprised 70 seeds. The data displayed in this plot indicate that k-means informed by the elbow plot may have performed moderately well, since the count for each cluster is around 70; however, there are better ways to evaluate performance.
To provide a more precise measure of how well the algorithm performed, we will use three metrics: (a) Davies-Bouldin Index, (b) Calinski-Harabasz Index, and (c) Adjusted Rand Index. We’ll talk about how to interpret them in the Results and Analysis section, but the following code snippet can be used to calculate their values.
from sklearn.metrics import davies_bouldin_score, calinski_harabasz_score, adjusted_rand_score
# Calculate metrics
davies_boulding = davies_bouldin_score(reduced_features, kmeans.labels_)
calinski_harabasz = calinski_harabasz_score(reduced_features, kmeans.labels_)
adj_rand = adjusted_rand_score(ground_truth, kmeans.labels_)
print(f'Davies-Bouldin Index: {davies_boulding}')
print(f'Calinski-Harabasz Index: {calinski_harabasz}')
print(f'Adjusted Rand Index: {adj_rand}')
Davies-Bouldin Index: 0.891967185123475
Calinski-Harabasz Index: 259.83668751473334
Adjusted Rand Index: 0.7730246875577171
A silhouette score is the mean silhouette coefficient over all the instances. The values can range from -1 to 1, with values near +1 indicating that instances sit well within their own cluster and far from neighboring clusters, values near 0 indicating instances near a cluster boundary, and negative values suggesting instances may have been assigned to the wrong cluster.
When interpreting the silhouette score, we should choose the number of clusters with the highest score.
To generate a plot of silhouette scores for multiple values of k, we can use the following code.
from sklearn.metrics import silhouette_score
K_range = range(2, 6)
silhouette_scores = []
# Calculate Silhouette Coefficient for range of k
for k in K_range:
kmeans = KMeans(n_clusters=k, random_state=1, n_init='auto')
cluster_labels = kmeans.fit_predict(reduced_features)
silhouette_avg = silhouette_score(reduced_features, cluster_labels)
silhouette_scores.append(silhouette_avg)
plt.figure(figsize=(10, 8))
plt.plot(K_range, silhouette_scores, marker='o')
plt.title('Silhouette Coefficient')
plt.xlabel('Number of Clusters')
plt.ylabel('Silhouette Coefficient')
plt.ylim(0, 0.5) # Modify based on data
plt.xticks(K_range)
plt.show()
The data indicate that k should equal two.
Using this information, we can implement the K-Means algorithm again.
k = 2 # Set k to the value with the highest silhouette score
kmeans = KMeans(n_clusters=k, random_state=4, n_init='auto')
clusters = kmeans.fit_predict(reduced_features)
cluster_assignments2 = pd.DataFrame({'symbol': df.index,
'cluster': clusters})
sorted_assignments2 = cluster_assignments2.sort_values(by='cluster')
# Convert assignments to same scale as 'variety'
sorted_assignments2['cluster'] = [num + 1 for num in sorted_assignments2['cluster']]
sorted_assignments2['cluster'] = sorted_assignments2['cluster'].astype('category')
To generate a plot of the algorithm when k = 2, we can use the code presented below.
plt.figure(figsize=(15, 8))
ax = plt.axes(projection='3d') # Set up a 3D projection
# Colors for each cluster
colors = ['blue', 'orange']
# Plot each cluster in 3D
for i, color in enumerate(colors):
# Only select data points that belong to the current cluster
ix = np.where(clusters == i)
ax.scatter(reduced_features[ix, 0], reduced_features[ix, 1],
reduced_features[ix, 2], c=[color], label=f'Cluster {i+1}',
s=60, alpha=0.8, edgecolor='w')
# Plotting the centroids in 3D
centroids = kmeans.cluster_centers_
ax.scatter(centroids[:, 0], centroids[:, 1], centroids[:, 2], marker='+',
s=100, alpha=0.4, linewidths=3, color='red', zorder=10,
label='Centroids')
ax.set_xlabel('Principal Component 1')
ax.set_ylabel('Principal Component 2')
ax.set_zlabel('Principal Component 3')
ax.set_title('K-Means Clusters Informed by Silhouette Score')
ax.view_init(elev=20, azim=20) # Change viewing angle to make all axes visible
# Display the legend
ax.legend()
plt.show()
Similar to the K-Means implementation informed by the elbow plot, additional information can be gleaned using countplot from Seaborn, as sketched below.
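A minimal sketch of that plot, mirroring the earlier countplot snippet but swapping in the k = 2 assignments and the two-color palette defined above:
plt.figure(figsize=(10,8))
ax = sns.countplot(data=sorted_assignments2, x='cluster', hue='cluster',
                   palette=colors)
plt.title('Cluster Distribution (k = 2)')
plt.ylabel('Count')
plt.xlabel('Cluster')
legend = ax.get_legend()
legend.set_title('Cluster')
plt.show()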
Based on our understanding of the dataset (i.e., it includes three varieties of seeds with 70 samples from each category), an initial reading of the plot may suggest that the implementation informed by the silhouette score did not perform as well on the clustering task; however, we cannot use this plot in isolation to make a determination.
To provide a more robust and detailed comparison of the implementations, we will calculate the three metrics that were used on the implementation informed by the elbow plot.
# Calculate metrics
ss_davies_boulding = davies_bouldin_score(reduced_features, kmeans.labels_)
ss_calinski_harabasz = calinski_harabasz_score(reduced_features, kmeans.labels_)
ss_adj_rand = adjusted_rand_score(ground_truth, kmeans.labels_)
print(f'Davies-Bouldin Index: {ss_davies_boulding}')
print(f'Calinski-Harabasz Index: {ss_calinski_harabasz}')
print(f'Adjusted Rand Index: {ss_adj_rand}')
Davies-Bouldin Index: 0.7947218992989975
Calinski-Harabasz Index: 262.8372675890969
Adjusted Rand Index: 0.5074767556450577
To compare the results from both implementations, we can create a dataframe and display it as a table.
from tabulate import tabulate
metrics = ['Davies-Bouldin Index', 'Calinski-Harabasz Index', 'Adjusted Rand Index']
elbow_plot = [davies_boulding, calinski_harabasz, adj_rand]
silh_score = [ss_davies_boulding, ss_calinski_harabasz, ss_adj_rand]
interpretation = ['SS', 'SS', 'EP']
scores_df = pd.DataFrame(zip(metrics, elbow_plot, silh_score, interpretation),
columns=['Metric', 'Elbow Plot', 'Silhouette Score',
'Favors'])
# Convert DataFrame to a table
print(tabulate(scores_df, headers='keys', tablefmt='fancy_grid', colalign=('left', 'left', 'left', 'left')))
The metrics used to compare the implementations of k-means clustering include internal metrics (e.g., Davies-Bouldin, Calinski-Harabasz), which do not use the ground truth labels, and external metrics (e.g., Adjusted Rand Index), which do. A brief description of the three metrics is provided below.
When comparing the two implementations, we observed that k-means informed by the silhouette score performed best on the two internal metrics, indicating more compact and separated clusters. However, k-means informed by the elbow plot performed best on the external metric (i.e., ARI), indicating better alignment with the ground truth labels.
Ultimately, the best performing implementation will be determined by the task. If the task requires clusters that are cohesive and well-separated, then internal metrics (e.g., DBI, CHI) might be more relevant. If the task requires the clusters to align with the ground truth labels, then external metrics, like the ARI, may be more relevant.
The purpose of this project was to provide a comparison between k-means clustering informed by an elbow plot and the silhouette score, and since there wasn’t a defined task beyond a pure comparison, we cannot provide a definitive answer as to which implementation is better.
Although the absence of a definitive conclusion may be frustrating, it highlights the importance of considering multiple metrics when comparing machine learning models and remaining focused on the project’s objectives.
Thank you for taking the time to read this article. If you have any feedback or questions, please leave a comment.
[1] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras & Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems (2021), O’Reilly.
[2] M. Charytanowicz, J. Niewczas, P. Kulczycki, P. Kowalski, S. Łukasik, & S. Zak, Complete Gradient Clustering Algorithm for Features Analysis of X-Ray Images (2010), Advances in Intelligent and Soft Computing https://doi.org/10.1007/978-3-642-13105-9_2
[3] D. L. Davies, D. W. Bouldin, A Cluster Separation Measure (1979), IEEE Transactions on Pattern Analysis and Machine Intelligence https://doi.org/10.1109/TPAMI.1979.4766909
[4] T. Caliński, J. Harabasz, A Dendrite Method for Cluster Analysis (1974), Communications in Statistics https://doi.org/10.1080/03610927408827101
[5] N. X. Vinh, J. Epps, J. Bailey, Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance (2010), Journal of Machine Learning Research https://www.jmlr.org/papers/volume11/vinh10a/vinh10a.pdf
Comparison of Methods to Inform K-Means Clustering was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
Originally appeared here:
Comparison of Methods to Inform K-Means Clustering
Originally appeared here:
Alida gains deeper understanding of customer feedback with Amazon Bedrock
A breakdown of the three fundamental math fields required for data science: statistics, linear algebra and calculus.
Originally appeared here:
How to Learn the Math Needed for Data Science
Generating human-like text and speech was once only possible in science fiction. But the rapid evolution of Large Language Models (LLMs) like GPT-3 and PaLM has brought this vision closer to reality, unlocking a range of promising business applications from chatbots to content creation.
Yet, the general-purpose foundation models often fail to meet the needs of industry use cases. Businesses have different requirements for their generative AI applications — from performance, cost, latency to explainability. Moreover, the nature and quantity of the data available for model training can differ significantly. It is therefore important for product teams to outline key business criteria for their generative AI application and select the right toolkit of optimization techniques to meet these needs.
In this post, we outline a framework for identifying and prioritizing strategic focus areas for your generative AI application. We will also explore popular optimization methods and discuss their unique strengths, ideal applications, and trade-offs in meeting the application requirements. With the right optimization strategy guided by clear business objectives, companies can develop custom AI solutions that balance the priorities critical to their success. Let’s dive in!
To tailor the strategy for optimizing LLMs effectively, product teams should start by building a deep understanding of the business objectives and the constraints within which they’re operating. Assess and prioritize the key dimensions listed below for your business use case:
1. Performance Goal: Define the measure and level of performance your AI needs to achieve. This could be a combination of factual accuracy, alignment with human values, or other task-specific metrics.
Questions to Consider: What are the best dimensions for measuring performance? What is the minimum acceptable performance bar? How does performance align with user expectations in your industry?
2. Latency Targets: Determine the maximum response time that your application can afford without negatively impacting user experience. This could be especially important when LLMs are deployed in time-sensitive or resource-constrained scenarios (e.g., voice assistant, edge devices).
Questions to Consider: How does latency impact user satisfaction and retention? What are industry standards for response time?
3. Cost Efficiency: Evaluate the cost of operating AI with the expected ROI. Higher initial costs may be justified when they lead to substantial savings, revenue growth, or strategic benefits that outweigh investment.
Questions to Consider: How does the cost of operating LLMs impact your budget? How does the ROI compare with the cost of AI deployment?
4. Explainability & Trust: Determine if there is a need to ensure that the AI decisions are easily understood by users, which is critical for building trust, especially in fields with stringent regulatory demands.
Questions to Consider: Is your industry regulated, requiring transparency in AI’s decisions? How does explainability affect user trust and adoption?
5. External Knowledge: Assess if your AI needs access to external data sources to remain relevant and provide accurate responses.
Questions to Consider: Does your AI need real-time data to make decisions?
6. Data Availability: The nature and quantity of data available for training your AI could widely impact optimization strategy.
Questions to Consider: Do you have access to a large dataset for training, or will you need to use synthetic or augmented data? How often will you need to update the training data to keep the AI relevant?
Presented below is a table outlining three distinct use cases for generative AI applications, with a corresponding evaluation of priorities for each dimension within the framework:
As you can see, the priorities and constraints can vary widely across different use cases.
For instance, consider a company aiming to develop a customer support chatbot to ease the workload on human staff. In this scenario, accuracy performance and external data integration are of high priority to deliver responses that are not only factually correct but also up-to-date. While latency holds some significance, users may be willing to tolerate brief delays. Typically, such a company will have access to an extensive archive of past customer support interactions that can be used for training models.
In contrast, the critical application of AI for assessing software code quality and risk demands an increased focus on factual accuracy and explainability of the AI's insights, often due to the potential consequences of errors. Cost and latency are secondary considerations in this context. This use case could benefit from external data integration in some cases, and it usually faces constraints regarding the availability of rich training datasets.
A solid understanding of strategic priorities and constraints associated with the use case can help teams develop a tailored strategy for optimizing LLMs to meet the unique needs of the users.
This section delves into the various optimization techniques, highlighting their objectives, ideal use-cases, and inherent trade-offs, particularly in the light of balancing the business goals discussed above.
Techniques Table Breakdown:
Execution Complexity: Low
When to Use: For reshaping response and quick improvement without altering the model. Start with this technique to maximize a pre-trained model’s effectiveness before trying more complex optimization methods.
What it entails: Prompt engineering involves crafting the input query to a model in a way that elicits the desired output. It requires understanding how the model responds to different types of instructions but doesn’t require retraining the model or altering its architecture. This method merely optimizes the way the existing model accesses and applies its pre-trained knowledge, and does not enhance the model’s intrinsic capabilities.
“It’s like adjusting the way you ask a question to a knowledgeable friend to get the best possible answer.”
Examples:
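As a minimal sketch, the same request can be reshaped with a role, output constraints, and a worked example; the complete() helper below is a placeholder for whichever LLM client you use, not a specific vendor API.
def complete(prompt: str) -> str:
    """Placeholder for the LLM call (hosted API or local model)."""
    raise NotImplementedError

naive_prompt = "Summarize this support ticket: {ticket}"

engineered_prompt = """You are a support triage assistant.
Summarize the ticket below in exactly two bullet points:
one for the customer's problem, one for the requested action.

Example
Ticket: "App crashes when I upload a photo larger than 10MB."
Summary:
- Problem: app crashes on photo uploads over 10MB
- Requested action: fix upload handling for large files

Ticket: "{ticket}"
Summary:"""

# Same model, same knowledge; only the instructions change.
# response = complete(engineered_prompt.format(ticket=ticket_text))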
Trade-offs:
Execution Complexity: Medium
When to Use: Fine-tuning should be considered when you need the model to adapt to a specific domain or task that may not be well-covered by the base pre-trained model. It is a step towards increasing domain specific accuracy and creating a more specialized model that can handle domain specific data and terminology.
What it entails: Fine-tuning is the process of continuing the training of a pre-trained model on a new dataset that is representative of the target task or domain. This new dataset consists of input-output pairs that provide examples of the desired behavior. During fine-tuning, the model’s weights are updated to minimize the loss on this new dataset, effectively adapting the model to the new domain.
“Think of it as giving your friend a crash course on a topic you want them to become an expert in; showing them multiple examples of questions that may come in a test and the sample answers that they are expected to respond with.”
Examples:
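A minimal sketch of what a supervised fine-tuning run can look like with the Hugging Face Trainer; the base model, the two toy examples, and the hyperparameters are illustrative assumptions, not a prescribed setup.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "distilgpt2"  # placeholder; swap in the base model you are adapting
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy domain-specific input-output pairs (replace with your curated dataset)
examples = [
    {"text": "Q: What does error E1234 mean?\nA: The payment gateway timed out."},
    {"text": "Q: How do I rotate my API key?\nA: Go to Settings > Security and click Rotate."},
]

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=128)

dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()  # the model's weights are updated to minimize loss on the new examples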
Trade-offs:
Execution Complexity: High
When to use: RAG should be considered when there is a need for the AI model to access and incorporate external information to generate responses. This is especially relevant when the model is expected to provide up-to-date or highly specific information that is not contained within its pre-trained knowledge base.
What it entails:
RAG combines the generative capabilities of an LLM with a retrieval system. The retrieval system queries a database, knowledge base, or the internet to find information relevant to the input prompt. The retrieved information is then provided to the language model, which incorporates this context to generate a richer and more accurate response. By citing the sources used by the RAG system to generate responses, generative AI applications can offer enhanced explainability to the users.
In the coming years, this optimization technique is expected to gain widespread popularity as an increasing number of products seek to leverage their latest business data to tailor experiences for customers.
“It’s akin to your friend being able to look up information online to answer questions that are outside their immediate expertise. It’s an open book exam.”
Examples:
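A minimal sketch of the retrieve-then-generate loop; it uses a toy keyword-overlap retriever in place of a real vector database, and the complete() placeholder again stands in for the generative model.
# Toy knowledge base (in practice: chunks of your documents in a vector store)
documents = [
    "Refunds are processed within 5 business days of approval.",
    "Premium plans include 24/7 chat support and a 99.9% uptime SLA.",
    "API keys can be rotated from the Settings > Security page.",
]

def retrieve(query, docs, k=2):
    # Rank documents by naive word overlap with the query (stand-in for embedding search)
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def complete(prompt):
    raise NotImplementedError  # placeholder for the LLM call

query = "How long do refunds take?"
context = "\n".join(retrieve(query, documents))
prompt = ("Answer the question using only the context below, and cite the sentence you used.\n"
          f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
# answer = complete(prompt)
print(prompt)  # the retrieved sentences become grounding context for the model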
Trade-offs:
Execution Complexity: Very High
When to use: RLHF should be used when the model’s outputs need to align closely with complex human judgments and preferences.
What it entails: RLHF is a sophisticated reinforcement learning technique that refines a model’s behavior by incorporating human evaluations directly into the training process. This process typically involves collecting data from human operators who rank the outputs from AI on various quality metrics such as relevance, helpfulness, tone, etc. These data signals are then used to train a reward model, which guides the reinforcement learning process to produce outputs that are more closely aligned with human preferences.
“It’s similar to your friend learning from past conversations about what makes a discussion enjoyable, and using that knowledge to improve future interactions.”
Examples:
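A full RLHF pipeline is too involved for a short snippet, but the first ingredient, fitting a reward model to pairwise human preferences (Bradley-Terry style), can be sketched with made-up feature vectors; the subsequent reinforcement learning stage is omitted.
import numpy as np

# Each response is summarized by a feature vector (in practice: the LLM's representation).
# Human labelers preferred the first response over the second in each pair.
preferred = np.array([[0.9, 0.2, 0.7], [0.8, 0.1, 0.6], [0.7, 0.3, 0.9]])
rejected = np.array([[0.2, 0.8, 0.1], [0.3, 0.9, 0.2], [0.1, 0.7, 0.3]])

w = np.zeros(3)  # reward model parameters
lr = 0.1
for _ in range(500):
    # Bradley-Terry: P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
    diff = preferred @ w - rejected @ w
    p = 1.0 / (1.0 + np.exp(-diff))
    # Gradient ascent on the log-likelihood of the observed preferences
    grad = ((1.0 - p)[:, None] * (preferred - rejected)).mean(axis=0)
    w += lr * grad

def reward(features):
    return features @ w  # this learned score would then guide the RL fine-tuning step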
Trade-offs:
Execution Complexity: Moderate to High
When to use: Knowledge distillation is used when you need to deploy sophisticated models on devices with limited computational power or in applications where response time is critical.
What it entails: It's a compression technique where a smaller, more efficient model (known as the student) is trained to replicate the performance of a larger, more complex model (the teacher). The training goes beyond just learning the correct answers (hard targets), and involves the student trying to produce similar probabilities as the teacher's predictions (soft targets). This approach enables the student model to capture the nuanced patterns and insights that the teacher model has learned.
“This is similar to distilling the wisdom of a seasoned expert into a concise guidebook that a novice can use to make expert-level decisions without going through years of experience.”
Examples:
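A minimal sketch of the soft-target idea in plain NumPy: the student is penalized for diverging from the teacher's temperature-softened probabilities as well as from the hard label (the logits and weighting here are made up).
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

T = 2.0                        # temperature > 1 spreads the teacher's probabilities
teacher_logits = np.array([4.0, 1.5, 0.5])   # made-up teacher outputs for one example
student_logits = np.array([2.0, 1.0, 1.0])   # made-up student outputs for the same example
hard_label = 0                                # ground-truth class index

soft_targets = softmax(teacher_logits, T)
student_soft = softmax(student_logits, T)
student_hard = softmax(student_logits)

# Distillation term: KL(teacher || student) on softened distributions, scaled by T^2
kl = np.sum(soft_targets * (np.log(soft_targets) - np.log(student_soft)))
# Standard cross-entropy against the hard label
ce = -np.log(student_hard[hard_label])

alpha = 0.7                    # weight on imitating the teacher vs. fitting the labels
loss = alpha * (T ** 2) * kl + (1 - alpha) * ce
print(round(loss, 4))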
Trade-offs:
Now let’s take a look at a real-world example in action.
Let's revisit the use case of building a customer support chatbot to reduce the workload on human support staff. The requirements/constraints included:
With the clear understanding of business context and priorities, product builders can devise the most effective optimization strategy.
LLM Optimization Decision Steps:
As you can see, a combination of strategies is often necessary to meet the specific demands of a use case. Flexibility in optimization strategies can be crucial, as requirements can change over time, and systems need to balance multiple requirements simultaneously.
Optimizing LLMs for a business use case is both an art and a science, which requires a deep understanding of the underlying technology and the objectives at hand. As AI continues to evolve, the choice of optimization techniques will become increasingly strategic, influencing not only the performance of individual applications but also the overall trajectory of AI’s role in society.
Whether you’re optimizing for speed, accuracy, cost, or transparency, the techniques discussed above offer a toolkit for enhancing LLMs to meet the demands of tomorrow’s generative AI powered business applications. By thoughtfully applying these methods, we can create AI that’s not only effective but also responsible and attuned to the nuanced needs of users.
Thanks for reading! If these insights resonate with you or spark new thoughts, let’s continue the conversation. Share your perspectives in the comments below or connect with me on LinkedIn.
Framework for Optimizing Generative AI to Meet Business Needs was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
Originally appeared here:
Framework for Optimizing Generative AI to Meet Business Needs
How can autonomous cars stay within a single lane at 60mph? How can an i-robot avoid falling down the stairs? How can delivery robots know if they are going to the right hungry customer? These are just a few of the questions autonomous vehicles must answer without human intervention.
As you might imagine, accurate vehicle location is critical to whether the autonomous vehicle effectively and safely completes its tasks. The process to estimate the vehicle’s position from sensor data is called localization. Localization accuracy increases with sensors that add information and decreases with the vehicle’s movement which adds noise.
Probabilistic tools can be leveraged to improve location accuracy where neither the sensors nor the movement is 100% accurate.
What is probability?
According to the dictionary definition, probability is a "numerical description of how likely an event is to occur" (Wikipedia). However, when it comes to the meaning of probability, the answer is not that simple. There are rival interpretations of probability from two major camps: the frequentists and the Bayesians.
The Frequentist approach interprets probability as the relative frequency over time; how many times will I get my desired outcome if I repeat an experiment many times?
This approach is objective because anyone who runs the experiments (e.g. flipping a coin) will get the same result in the long run.
The Bayesian approach interprets probability as the degree of certainty that an event will happen. How certain am I that I will get my desired outcome given expert knowledge and the available data? This approach is subjective, as it represents the current state of belief by combining (subjective) previous knowledge and experimental data known to date. It allows estimating the probability of a singular event that we can’t run many times, where the frequentist meaning doesn’t apply.
For example, if the probability that a date texts you back after your first meetup is 0.8, then we are 80% certain that you had a great time and the person will text you back; we don’t mean they will text 80% of the time if you repeat the first date over and over again.
What are the benefits of bayesian probability?
Bayesian probability allows us to both quantify our degree of belief and to update it in light of new evidence.
In our context, P(H) is our initial guess of the robot‘s position, and P(H|E) is our updated guess after measuring the sensors’ evidence E.
The hypothesis probability distribution quantifies our certainty in the robot’s position.
The hypothesis can change according to evidence
The more informative and accurate the sensor data, the bigger the effect it will have. If the sensor is perfect, the robot's estimated position will align with the sensor's reading; if the sensor data is very noisy or non-informative, the estimate will stay the same.
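A toy numerical update over a handful of candidate positions makes this concrete (the prior and the sensor likelihoods are made up):
import numpy as np

prior = np.array([0.2, 0.2, 0.2, 0.2, 0.2])        # P(H): initial guess over 5 regions
likelihood = np.array([0.1, 0.1, 0.6, 0.1, 0.1])   # P(E|H): how well each region explains the reading

posterior = prior * likelihood
posterior /= posterior.sum()    # normalize so the probabilities sum to 1
print(posterior)                # P(H|E): belief sharpens around the third region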
Updates can combine multiple sources of evidence
We can formulate Bayes’ law to combine multiple data sources by using the chain rule, also known as the general product rule. It permits simplifying a joint distribution of multiple evidence to a product of conditional probabilities.
For example, we often use GPS to navigate from our current location but GPS works best in clear skies and its accuracy is limited to a few meters. Autonomous cars can not solely rely on GPS to remain in a lane that is a few meters wide and to navigate in tunnels or underground parking lots. Autonomous vehicles can compensate for GPS deficiencies by adding more sources of information such as cameras.
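Continuing the toy example, folding in a second, independent source of evidence is just another multiplication before normalizing, assuming the sensors' errors are conditionally independent given the position (the numbers are made up):
import numpy as np

prior = np.array([0.2, 0.2, 0.2, 0.2, 0.2])                   # belief over 5 regions
gps_likelihood = np.array([0.2, 0.3, 0.3, 0.1, 0.1])          # coarse, noisy GPS reading
camera_likelihood = np.array([0.05, 0.6, 0.25, 0.05, 0.05])   # lane markings seen by a camera

posterior = prior * gps_likelihood * camera_likelihood
posterior /= posterior.sum()
print(posterior)   # the combined evidence concentrates belief on the second region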
Let’s take a deep dive into a bayesian filter that recursively improves localization probability estimates using bayesian inference. The recursive nature means that the output of the filter at time t_0, P(H|E), serves as the hypothesis input for the next timestamp t_1, P(H).
Suppose a delivery robot is looping a circular path within a space station to transport supplies. The robot has a map detailing the lay of the land and the location of sensors.
We refer to the estimated robot location as the robot’s state space. For example, a two-dimensional vector (i.e. an ordered pair of numbers) tracing x-axis position and x-axis velocity can track a robot location and changing speed in one-dimension. It is possible to extend the robot’s state space to additional dimensions to track multiple position dimensions (y, z), orientation, etc.
For simplicity, we can assume our robot is moving with constant speed. The movement adds uncertainty to the computation, since it is not 100% reliable. The engine might fail to run at a specific velocity or the robot might encounter obstacles, which will cause the robot to overshoot or undershoot its expected movement.
Our robot will sense its location by measuring the presence of a beacon. The sensor readings, also called measurement space, are not 100% accurate. Sensors might confuse noise with a beacon signal that can lead to false alarms or fail to detect a signal at all.
With this Bayesian filter, the robot's state space is represented by a histogram over a finite number of bins or regions. It is a discrete filter, meaning the robot can only be in one of these regions, and we compute the probability that the robot is in each. Additionally, within each bin, such as a 5-square-meter area, the probability of being at any specific point is the same. If we want to increase the granularity, we must add more bins.
This filter is non parametric, meaning it does not make any strong assumptions on the robot’s state representation, and is not restricted to one type of distribution such as a Gaussian. It is able to represent complex location estimates, such as a multimodal hypothesis that keeps multiple best guesses, but it comes with a computational cost — an exponential complexity. In order to add an additional dimension, from 1-D to 2-D while keeping the same granularity, we will need 10×10 bins, to move to 3-D we will need 10x10x10 bins, and so on. This is a significant limitation for robots that track multiple dimensions and are limited in memory and compute power.
2. Movement function: simulates the robot movement. The robot’s motion is stochastic, meaning it is not guaranteed to move to the desired bin at each time step. To update the robot’s location after each move, we calculate the probability of the robot being in each region at the next time step. This calculation considers both the probability of the robot to stay in the same region and the probability of it to move to a different region.
For each movement:
For each region:
Region probability at time t+1 =
Region probability at time t x stay probability +
Probability of robot coming from the neighboring region x move probability
As shown in the equation below, the robot’s one-step movement will not change the location estimate because of the uniform distribution, where each region has equal probability to stay and move.
Even if we initially begin at a bin with complete certainty (100%), the inherent randomness in movement will gradually add noise, leading us towards a uniform distribution over time. We need to add information!
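A minimal sketch of this prediction step on a circular, 1-D histogram (the stay/move probabilities are illustrative); it also shows the drift toward a uniform distribution described above:
import numpy as np

def move(belief, p_stay=0.2, p_move=0.8):
    # One prediction step on a circular track, following the update rule above
    n = len(belief)
    new_belief = np.zeros(n)
    for i in range(n):
        new_belief[i] = (belief[i] * p_stay                 # robot failed to advance
                         + belief[(i - 1) % n] * p_move)    # robot arrived from the previous bin
    return new_belief

belief = np.array([1.0, 0, 0, 0, 0])   # start fully certain the robot is in the first bin
for _ in range(50):
    belief = move(belief)
print(belief.round(3))                 # noisy motion alone drifts the belief toward uniform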
3. Sense function: incorporates measurements that add information using Bayes’ theorem.
After each movement:
For each region:
Region probability at time t+1 given measurement at time t+1 =
Likelihood of the measurement given the robot is in that region
x Region probability at time t+1 after movement
x normalization to ensure that all the probabilities sum to 1.
The sensors' reliability is represented with probabilities, since they are not 100% accurate. The equations below demonstrate that when the sensor detects the color orange, there is a 90% probability that the robot is located in an orange bin, and a 10% probability that the sensor is wrong and the robot is actually in a blue bin.
The calculation presented below illustrates that, in contrast to movement, sensors contribute information and improve our understanding of the robot’s location. For instance, because bin 2 is not orange, the probability of the robot being in it diminishes from 0.1 to 0.02.
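A minimal sketch of the sense step with the 0.9/0.1 sensor model above, on an illustrative five-bin map:
import numpy as np

world = ['orange', 'blue', 'orange', 'blue', 'blue']   # map of bin colors (illustrative)
p_hit, p_miss = 0.9, 0.1                               # sensor is right 90% of the time

def sense(belief, measurement):
    # Weight each bin by how well it explains the measurement, then normalize (Bayes' rule)
    likelihood = np.array([p_hit if color == measurement else p_miss for color in world])
    posterior = belief * likelihood
    return posterior / posterior.sum()

belief = np.full(5, 0.2)          # uniform prior
belief = sense(belief, 'orange')  # sensor reports orange
print(belief.round(3))            # orange bins rise to about 0.43; blue bins fall to about 0.05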
The image below shows the updated location hypothesis after incorporating movement and sensors’ data to our initial guess.
Where is the robot? We can continuously refine our answer to this question by using recursive bayesian filters, starting from a uniform distribution that keeps all guesses equally probable until settling on the most likely one.
Bayesian filters help us measure our confidence in the robot’s whereabouts, updating this belief by integrating (noisy) sensor data with prior information (the estimated robot’s position after movement).
Sources:
How to find yourself in a digital world? was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
Originally appeared here:
How to find yourself in a digital world?
Today the 2024 March Meeting of the American Physical Society (APS) kicks off in Minneapolis, MN. A premier conference on topics ranging across physics and related fields, APS 2024 brings together researchers, students, and industry professionals to share their discoveries and build partnerships with the goal of realizing fundamental advances in physics-related sciences and technology.
This year, Google has a strong presence at APS with a booth hosted by the Google Quantum AI team, 50+ talks throughout the conference, and participation in conference organizing activities, special sessions and events. Attending APS 2024 in person? Come visit Google’s Quantum AI booth to learn more about the exciting work we’re doing to solve some of the field’s most interesting challenges.
You can learn more about the latest cutting edge work we are presenting at the conference along with our schedule of booth events below (Googlers listed in bold).
Session Chairs include: Aaron Szasz
This schedule is subject to change. Please visit the Google Quantum AI booth for more information.
Crumble
Presenter: Matt McEwen
Tue, Mar 5 | 11:00 AM CST
Qualtran
Presenter: Tanuj Khattar
Tue, Mar 5 | 2:30 PM CST
Qualtran
Presenter: Tanuj Khattar
Thu, Mar 7 | 11:00 AM CST
$5M XPRIZE / Google Quantum AI competition to accelerate quantum applications Q&A
Presenter: Ryan Babbush
Thu, Mar 7 | 11:00 AM CST
Certifying highly-entangled states from few single-qubit measurements
Presenter: Hsin-Yuan Huang
Author: Hsin-Yuan Huang
Session A45: New Frontiers in Machine Learning Quantum Physics
Toward high-fidelity analog quantum simulation with superconducting qubits
Presenter: Trond Andersen
Authors: Trond I Andersen, Xiao Mi, Amir H Karamlou, Nikita Astrakhantsev, Andrey Klots, Julia Berndtsson, Andre Petukhov, Dmitry Abanin, Lev B Ioffe, Yu Chen, Vadim Smelyanskiy, Pedram Roushan
Session A51: Applications on Noisy Quantum Hardware I
Measuring circuit errors in context for surface code circuits
Presenter: Dripto M Debroy
Authors: Dripto M Debroy, Jonathan A Gross, Élie Genois, Zhang Jiang
Session B50: Characterizing Noise with QCVV Techniques
Quantum computation of stopping power for inertial fusion target design I: Physics overview and the limits of classical algorithms
Presenter: Andrew D. Baczewski
Authors: Nicholas C. Rubin, Dominic W. Berry, Alina Kononov, Fionn D. Malone, Tanuj Khattar, Alec White, Joonho Lee, Hartmut Neven, Ryan Babbush, Andrew D. Baczewski
Session B51: Heterogeneous Design for Quantum Applications
Link to Paper
Quantum computation of stopping power for inertial fusion target design II: Physics overview and the limits of classical algorithms
Presenter: Nicholas C. Rubin
Authors: Nicholas C. Rubin, Dominic W. Berry, Alina Kononov, Fionn D. Malone, Tanuj Khattar, Alec White, Joonho Lee, Hartmut Neven, Ryan Babbush, Andrew D. Baczewski
Session B51: Heterogeneous Design for Quantum Applications
Link to Paper
Calibrating Superconducting Qubits: From NISQ to Fault Tolerance
Presenter: Sabrina S Hong
Author: Sabrina S Hong
Session B56: From NISQ to Fault Tolerance
Measurement and feedforward induced entanglement negativity transition
Presenter: Ramis Movassagh
Authors: Alireza Seif, Yu-Xin Wang, Ramis Movassagh, Aashish A. Clerk
Session B31: Measurement Induced Criticality in Many-Body Systems
Link to Paper
Effective quantum volume, fidelity and computational cost of noisy quantum processing experiments
Presenter: Salvatore Mandra
Authors: Kostyantyn Kechedzhi, Sergei V Isakov, Salvatore Mandra, Benjamin Villalonga, X. Mi, Sergio Boixo, Vadim Smelyanskiy
Session B52: Quantum Algorithms and Complexity
Link to Paper
Accurate thermodynamic tables for solids using Machine Learning Interaction Potentials and Covariance of Atomic Positions
Presenter: Mgcini K Phuthi
Authors: Mgcini K Phuthi, Yang Huang, Michael Widom, Ekin D Cubuk, Venkat Viswanathan
Session D60: Machine Learning of Molecules and Materials: Chemical Space and Dynamics
IN-Situ Pulse Envelope Characterization Technique (INSPECT)
Presenter: Zhang Jiang
Authors: Zhang Jiang, Jonathan A Gross, Élie Genois
Session F50: Advanced Randomized Benchmarking and Gate Calibration
Characterizing two-qubit gates with dynamical decoupling
Presenter: Jonathan A Gross
Authors: Jonathan A Gross, Zhang Jiang, Élie Genois, Dripto M Debroy, Ze-Pei Cian*, Wojciech Mruczkiewicz
Session F50: Advanced Randomized Benchmarking and Gate Calibration
Statistical physics of regression with quadratic models
Presenter: Blake Bordelon
Authors: Blake Bordelon, Cengiz Pehlevan, Yasaman Bahri
Session EE01: V: Statistical and Nonlinear Physics II
Improved state preparation for first-quantized simulation of electronic structure
Presenter: William J Huggins
Authors: William J Huggins, Oskar Leimkuhler, Torin F Stetina, Birgitta Whaley
Session G51: Hamiltonian Simulation
Controlling large superconducting quantum processors
Presenter: Paul V. Klimov
Authors: Paul V. Klimov, Andreas Bengtsson, Chris Quintana, Alexandre Bourassa, Sabrina Hong, Andrew Dunsworth, Kevin J. Satzinger, William P. Livingston, Volodymyr Sivak, Murphy Y. Niu, Trond I. Andersen, Yaxing Zhang, Desmond Chik, Zijun Chen, Charles Neill, Catherine Erickson, Alejandro Grajales Dau, Anthony Megrant, Pedram Roushan, Alexander N. Korotkov, Julian Kelly, Vadim Smelyanskiy, Yu Chen, Hartmut Neven
Session G30: Commercial Applications of Quantum Computing
Link to Paper
Gaussian boson sampling: Determining quantum advantage
Presenter: Peter D Drummond
Authors: Peter D Drummond, Alex Dellios, Ned Goodman, Margaret D Reid, Ben Villalonga
Session G50: Quantum Characterization, Verification, and Validation II
Attention to complexity III: learning the complexity of random quantum circuit states
Presenter: Hyejin Kim
Authors: Hyejin Kim, Yiqing Zhou, Yichen Xu, Chao Wan, Jin Zhou, Yuri D Lensky, Jesse Hoke, Pedram Roushan, Kilian Q Weinberger, Eun-Ah Kim
Session G50: Quantum Characterization, Verification, and Validation II
Balanced coupling in superconducting circuits
Presenter: Daniel T Sank
Authors: Daniel T Sank, Sergei V Isakov, Mostafa Khezri, Juan Atalaya
Session K48: Strongly Driven Superconducting Systems
Resource estimation of Fault Tolerant algorithms using Qᴜᴀʟᴛʀᴀɴ
Presenter: Tanuj Khattar
Author: Tanuj Khattar
Session K49: Algorithms and Implementations on Near-Term Quantum Computers
Discovering novel quantum dynamics with superconducting qubits
Presenter: Pedram Roushan
Author: Pedram Roushan
Session M24: Analog Quantum Simulations Across Platforms
Deciphering Tumor Heterogeneity in Triple-Negative Breast Cancer: The Crucial Role of Dynamic Cell-Cell and Cell-Matrix Interactions
Presenter: Susan Leggett
Authors: Susan Leggett, Ian Wong, Celeste Nelson, Molly Brennan, Mohak Patel, Christian Franck, Sophia Martinez, Joe Tien, Lena Gamboa, Thomas Valentin, Amanda Khoo, Evelyn K Williams
Session M27: Mechanics of Cells and Tissues II
Toward implementation of protected charge-parity qubits
Presenter: Abigail Shearrow
Authors: Abigail Shearrow, Matthew Snyder, Bradley G Cole, Kenneth R Dodge, Yebin Liu, Andrey Klots, Lev B Ioffe, Britton L Plourde, Robert McDermott
Session N48: Unconventional Superconducting Qubits
Electronic capacitance in tunnel junctions for protected charge-parity qubits
Presenter: Bradley G Cole
Authors: Bradley G Cole, Kenneth R Dodge, Yebin Liu, Abigail Shearrow, Matthew Snyder, Andrey Klots, Lev B Ioffe, Robert McDermott, B.L.T. Plourde
Session N48: Unconventional Superconducting Qubits
Overcoming leakage in quantum error correction
Presenter: Kevin C. Miao
Authors: Kevin C. Miao, Matt McEwen, Juan Atalaya, Dvir Kafri, Leonid P. Pryadko, Andreas Bengtsson, Alex Opremcak, Kevin J. Satzinger, Zijun Chen, Paul V. Klimov, Chris Quintana, Rajeev Acharya, Kyle Anderson, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Joseph C. Bardin, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Bob B. Buckley, David A. Buell, Tim Burger, Brian Burkett, Nicholas Bushnell, Juan Campero, Ben Chiaro, Roberto Collins, Paul Conner, Alexander L. Crook, Ben Curtin, Dripto M. Debroy, Sean Demura, Andrew Dunsworth, Catherine Erickson, Reza Fatemi, Vinicius S. Ferreira, Leslie Flores Burgos, Ebrahim Forati, Austin G. Fowler, Brooks Foxen, Gonzalo Garcia, William Giang, Craig Gidney, Marissa Giustina, Raja Gosula, Alejandro Grajales Dau, Jonathan A. Gross, Michael C. Hamilton, Sean D. Harrington, Paula Heu, Jeremy Hilton, Markus R. Hoffmann, Sabrina Hong, Trent Huang, Ashley Huff, Justin Iveland, Evan Jeffrey, Zhang Jiang, Cody Jones, Julian Kelly, Seon Kim, Fedor Kostritsa, John Mark Kreikebaum, David Landhuis, Pavel Laptev, Lily Laws, Kenny Lee, Brian J. Lester, Alexander T. Lill, Wayne Liu, Aditya Locharla, Erik Lucero, Steven Martin, Anthony Megrant, Xiao Mi, Shirin Montazeri, Alexis Morvan, Ofer Naaman, Matthew Neeley, Charles Neill, Ani Nersisyan, Michael Newman, Jiun How Ng, Anthony Nguyen, Murray Nguyen, Rebecca Potter, Charles Rocque, Pedram Roushan, Kannan Sankaragomathi, Christopher Schuster, Michael J. Shearn, Aaron Shorter, Noah Shutty, Vladimir Shvarts, Jindra Skruzny, W. Clarke Smith, George Sterling, Marco Szalay, Douglas Thor, Alfredo Torres, Theodore White, Bryan W. K. Woo, Z. Jamie Yao, Ping Yeh, Juhwan Yoo, Grayson Young, Adam Zalcman, Ningfeng Zhu, Nicholas Zobrist, Hartmut Neven, Vadim Smelyanskiy, Andre Petukhov, Alexander N. Korotkov, Daniel Sank, Yu Chen
Session N51: Quantum Error Correction Code Performance and Implementation I
Link to Paper
Modeling the performance of the surface code with non-uniform error distribution: Part 1
Presenter: Yuri D Lensky
Authors: Yuri D Lensky, Volodymyr Sivak, Kostyantyn Kechedzhi, Igor Aleiner
Session N51: Quantum Error Correction Code Performance and Implementation I
Modeling the performance of the surface code with non-uniform error distribution: Part 2
Presenter: Volodymyr Sivak
Authors: Volodymyr Sivak, Michael Newman, Cody Jones, Henry Schurkus, Dvir Kafri, Yuri D Lensky, Paul Klimov, Kostyantyn Kechedzhi, Vadim Smelyanskiy
Session N51: Quantum Error Correction Code Performance and Implementation I
Highly optimized tensor network contractions for the simulation of classically challenging quantum computations
Presenter: Benjamin Villalonga
Author: Benjamin Villalonga
Session Q51: Co-evolution of Quantum Classical Algorithms
Teaching modern quantum computing concepts using hands-on open-source software at all levels
Presenter: Abraham Asfaw
Author: Abraham Asfaw
Session Q61: Teaching Quantum Information at All Levels II
New circuits and an open source decoder for the color code
Presenter: Craig Gidney
Authors: Craig Gidney, Cody Jones
Session S51: Quantum Error Correction Code Performance and Implementation II
Link to Paper
Performing Hartree-Fock many-body physics calculations with large language models
Presenter: Eun-Ah Kim
Authors: Eun-Ah Kim, Haining Pan, Nayantara Mudur, William Taranto, Subhashini Venugopalan, Yasaman Bahri, Michael P Brenner
Session S18: Data Science, AI and Machine Learning in Physics I
New methods for reducing resource overhead in the surface code
Presenter: Michael Newman
Authors: Craig M Gidney, Michael Newman, Peter Brooks, Cody Jones
Session S51: Quantum Error Correction Code Performance and Implementation II
Link to Paper
Challenges and opportunities for applying quantum computers to drug design
Presenter: Raffaele Santagati
Authors: Raffaele Santagati, Alan Aspuru-Guzik, Ryan Babbush, Matthias Degroote, Leticia Gonzalez, Elica Kyoseva, Nikolaj Moll, Markus Oppel, Robert M. Parrish, Nicholas C. Rubin, Michael Streif, Christofer S. Tautermann, Horst Weiss, Nathan Wiebe, Clemens Utschig-Utschig
Session S49: Advances in Quantum Algorithms for Near-Term Applications
Link to Paper
Dispatches from Google’s hunt for super-quadratic quantum advantage in new applications
Presenter: Ryan Babbush
Author: Ryan Babbush
Session T45: Recent Advances in Quantum Algorithms
Qubit as a reflectometer
Presenter: Yaxing Zhang
Authors: Yaxing Zhang, Benjamin Chiaro
Session T48: Superconducting Fabrication, Packaging, & Validation
Random-matrix theory of measurement-induced phase transitions in nonlocal Floquet quantum circuits
Presenter: Aleksei Khindanov
Authors: Aleksei Khindanov, Lara Faoro, Lev Ioffe, Igor Aleiner
Session W14: Measurement-Induced Phase Transitions
Continuum limit of finite density many-body ground states with MERA
Presenter: Subhayan Sahu
Authors: Subhayan Sahu, Guifré Vidal
Session W58: Extreme-Scale Computational Science Discovery in Fluid Dynamics and Related Disciplines II
Dynamics of magnetization at infinite temperature in a Heisenberg spin chain
Presenter: Eliott Rosenberg
Authors: Eliott Rosenberg, Trond Andersen, Rhine Samajdar, Andre Petukhov, Jesse Hoke*, Dmitry Abanin, Andreas Bengtsson, Ilya Drozdov, Catherine Erickson, Paul Klimov, Xiao Mi, Alexis Morvan, Matthew Neeley, Charles Neill, Rajeev Acharya, Richard Allen, Kyle Anderson, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Joseph Bardin, A. Bilmes, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, Bob B. Buckley, David Buell, Tim Burger, Brian Burkett, Nicholas Bushnell, Juan Campero, Hung-Shen Chang, Zijun Chen, Benjamin Chiaro, Desmond Chik, Josh Cogan, Roberto Collins, Paul Conner, William Courtney, Alexander Crook, Ben Curtin, Dripto Debroy, Alexander Del Toro Barba, Sean Demura, Agustin Di Paolo, Andrew Dunsworth, Clint Earle, E. Farhi, Reza Fatemi, Vinicius Ferreira, Leslie Flores, Ebrahim Forati, Austin Fowler, Brooks Foxen, Gonzalo Garcia, Élie Genois, William Giang, Craig Gidney, Dar Gilboa, Marissa Giustina, Raja Gosula, Alejandro Grajales Dau, Jonathan Gross, Steve Habegger, Michael Hamilton, Monica Hansen, Matthew Harrigan, Sean Harrington, Paula Heu, Gordon Hill, Markus Hoffmann, Sabrina Hong, Trent Huang, Ashley Huff, William Huggins, Lev Ioffe, Sergei Isakov, Justin Iveland, Evan Jeffrey, Zhang Jiang, Cody Jones, Pavol Juhas, D. Kafri, Tanuj Khattar, Mostafa Khezri, Mária Kieferová, Seon Kim, Alexei Kitaev, Andrey Klots, Alexander Korotkov, Fedor Kostritsa, John Mark Kreikebaum, David Landhuis, Pavel Laptev, Kim Ming Lau, Lily Laws, Joonho Lee, Kenneth Lee, Yuri Lensky, Brian Lester, Alexander Lill, Wayne Liu, William P. Livingston, A. Locharla, Salvatore Mandrà, Orion Martin, Steven Martin, Jarrod McClean, Matthew McEwen, Seneca Meeks, Kevin Miao, Amanda Mieszala, Shirin Montazeri, Ramis Movassagh, Wojciech Mruczkiewicz, Ani Nersisyan, Michael Newman, Jiun How Ng, Anthony Nguyen, Murray Nguyen, M. Niu, Thomas O’Brien, Seun Omonije, Alex Opremcak, Rebecca Potter, Leonid Pryadko, Chris Quintana, David Rhodes, Charles Rocque, N. Rubin, Negar Saei, Daniel Sank, Kannan Sankaragomathi, Kevin Satzinger, Henry Schurkus, Christopher Schuster, Michael Shearn, Aaron Shorter, Noah Shutty, Vladimir Shvarts, Volodymyr Sivak, Jindra Skruzny, Clarke Smith, Rolando Somma, George Sterling, Doug Strain, Marco Szalay, Douglas Thor, Alfredo Torres, Guifre Vidal, Benjamin Villalonga, Catherine Vollgraff Heidweiller, Theodore White, Bryan Woo, Cheng Xing, Jamie Yao, Ping Yeh, Juhwan Yoo, Grayson Young, Adam Zalcman, Yaxing Zhang, Ningfeng Zhu, Nicholas Zobrist, Hartmut Neven, Ryan Babbush, Dave Bacon, Sergio Boixo, Jeremy Hilton, Erik Lucero, Anthony Megrant, Julian Kelly, Yu Chen, Vadim Smelyanskiy, Vedika Khemani, Sarang Gopalakrishnan, Tomaž Prosen, Pedram Roushan
Session W50: Quantum Simulation of Many-Body Physics
Link to Paper
The fast multipole method on a quantum computer
Presenter: Kianna Wan
Authors: Kianna Wan, Dominic W Berry, Ryan Babbush
Session W50: Quantum Simulation of Many-Body Physics
The quantum computing industry and protecting national security: what tools will work?
Presenter: Kate Weber
Author: Kate Weber
Session Y43: Industry, Innovation, and National Security: Finding the Right Balance
Novel charging effects in the fluxonium qubit
Presenter: Agustin Di Paolo
Authors: Agustin Di Paolo, Kyle Serniak, Andrew J Kerman, William D Oliver
Session Y46: Fluxonium-Based Superconducting Qubits
Microwave Engineering of Parametric Interactions in Superconducting Circuits
Presenter: Ofer Naaman
Author: Ofer Naaman
Session Z46: Broadband Parametric Amplifiers and Circulators
Linear spin wave theory of large magnetic unit cells using the Kernel Polynomial Method
Presenter: Harry Lane
Authors: Harry Lane, Hao Zhang, David A Dahlbom, Sam Quinn, Rolando D Somma, Martin P Mourigal, Cristian D Batista, Kipton Barros
Session Z62: Cooperative Phenomena, Theory
*Work done while at Google
Originally appeared here:
Google at APS 2024
Originally appeared here:
Unlocking Innovation: AWS and Anthropic push the boundaries of generative AI together