A Data-Driven Look at Alien Civilizations (Part 1 of the Drake Equation Series)
What if I told you there might be over 2,000 alien civilizations currently in the Milky Way galaxy? Sounds like a plot twist from your favorite sci-fi show, right? But what if I said we could use data science to get closer to an answer? That’s exactly what we’ll be doing in this series, using real numbers to estimate how many alien civilizations might exist, how close they could be, and whether we have any chance of ever contacting them.
In this series, we’ll be working through the Drake Equation, which has been the go-to tool since the 1960s for estimating how many advanced alien civilizations are out there. We’ll be spicing things up with modern data science techniques like Monte Carlo simulation, which is essentially a fancy way of saying, “Let’s run the numbers thousands of times and see what happens.”
The Big Question: Where Is Everybody?
In 1950, physicist Enrico Fermi famously asked, “Where is everybody?” The universe is unimaginably vast, with billions of stars just in our galaxy, and each of these stars likely has planets. So why haven’t we met any aliens yet? That’s the Fermi Paradox — the contradiction between the high probability of extraterrestrial life and the lack of evidence for or contact with any alien civilizations.
To help solve this puzzle, Frank Drake came up with the Drake Equation in 1961. It’s a way of breaking down the problem into smaller steps, asking questions like: “How many stars are there? How many have planets? How many of those planets could support life?” Each of these questions narrows down the search, and at the end, we get a number that tells us how many civilizations might be out there, sending signals into space.
The Drake Equation Breakdown
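Here’s what the equation looks like:
$$N = R \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot L$$
Where N is the number of civilizations in our galaxy whose signals we could detect, and each part of the equation represents a key factor in figuring out that number: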
- R: The rate at which new stars are formed in our galaxy.
- f_p: The fraction of those stars that have planets.
- n_e: The average number of planets, per star that has planets, that could support life.
- f_l: The fraction of those planets where life actually appears.
- f_i: The fraction of life-bearing planets where intelligent life evolves.
- f_c: The fraction of civilizations that develop technology to communicate across space.
- L: The length of time these civilizations broadcast signals we could detect.
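In its standard form, the Drake Equation is simply the product of these seven factors. As a minimal Python sketch (the article's own code is SAS), the function below multiplies them together. The function name `drake_equation` and the input values are illustrative placeholders chosen only to show the arithmetic; they are not estimates from this series.

```python
def drake_equation(R, f_p, n_e, f_l, f_i, f_c, L):
    """Return N, the product of the seven Drake Equation factors."""
    # The four fractions must lie between 0 and 1.
    for name, value in (("f_p", f_p), ("f_l", f_l), ("f_i", f_i), ("f_c", f_c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return R * f_p * n_e * f_l * f_i * f_c * L


# Placeholder inputs (assumptions for illustration only, NOT estimates).
example_n = drake_equation(R=2.0, f_p=0.5, n_e=1.0, f_l=0.5, f_i=0.5, f_c=0.5, L=1000.0)
print(example_n)  # 125.0
```

Because every factor multiplies the others, uncertainty in any single term carries straight through to N, which is why the series treats each term as a distribution rather than a single number.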
Step 1 in the Drake Equation: The Number of Stars in the Galaxy
The first variable in the Drake Equation is typically R, the rate of star formation in our galaxy. However, for this specific analysis, we’ll focus on the total number of stars that currently exist in the Milky Way. Our goal is to figure out how many stars are out there right now that could potentially host habitable planets.
We’re not asking how many new stars are being born — we’re estimating the total number of stars in the galaxy that are likely to have planets where life could develop.
Information on Star Types: G-type, K-type, and M-type Stars
While there may be 100 billion to 400 billion stars in the Milky Way, not all of them are suitable for supporting life. We’ll focus on stars that are similar to our Sun or have long enough lifespans to give life a chance to develop. Specifically, we’re looking at three main types of stars:
- G-type stars: These are similar to our Sun. There are around 2.5 to 6.25 billion G-type stars in the Milky Way.
- K-type stars: Slightly cooler and dimmer than the Sun, but still long-lived and stable. There are around 7.5 to 12.5 billion K-type stars.
- M-type stars: These are small red dwarfs, much more common than G- and K-type stars. There are between 8.75 and 20 billion M-type stars.
So, in total, we estimate that 18.75 billion to 38.75 billion stars in the galaxy are likely candidates for hosting life-supporting planets.
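The totals follow directly from adding the low ends and the high ends of the three ranges. A quick Python check of that arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
# Star-count ranges in billions as (low, high), taken from the list above.
star_ranges = {
    "G": (2.5, 6.25),
    "K": (7.5, 12.5),
    "M": (8.75, 20.0),
}

low_total = sum(low for low, _ in star_ranges.values())
high_total = sum(high for _, high in star_ranges.values())
midpoint = (low_total + high_total) / 2

print(low_total, high_total, midpoint)  # 18.75 38.75 28.75
```

The midpoint of the combined range, 28.75 billion, is the value the simulation in the next section uses as its mean.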
Code for Step 1: Calculating the Total Number of Stars
To estimate the total number of stars that could host habitable planets, we use a Monte Carlo simulation: we draw 100,000 values from a normal distribution with a mean of 28.75 billion and a standard deviation of 5 billion, and redraw any value that falls outside the 18.75 to 38.75 billion range. The following SAS code simulates these star counts:
/* Simulate 100,000 draws of the number of candidate stars */
data total_stars(keep=total_stars);
    do i = 1 to 100000;
        /* Redraw until the value falls within the desired range */
        do while (1);
            total_stars = rand("normal", 28750000000, 5000000000);
            if total_stars >= 18750000000 and total_stars <= 38750000000 then leave;
        end;
        output;
    end;
    drop i;
run;
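For readers who do not have SAS, here is a rough Python sketch of the same redraw-until-in-range approach, using only the standard library. The helper name `truncated_normal_draws` and the seed are assumptions added for illustration; the mean, standard deviation, bounds, and number of draws come from the SAS code above.

```python
import random
import statistics


def truncated_normal_draws(mean, sd, low, high, n, seed=None):
    """Draw n values from Normal(mean, sd), redrawing any value outside [low, high]."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            draws.append(value)
    return draws


# Same parameters as the SAS step; the seed is an arbitrary choice for repeatability.
total_stars = truncated_normal_draws(
    mean=28.75e9, sd=5e9, low=18.75e9, high=38.75e9, n=100_000, seed=42
)

print(f"mean: {statistics.fmean(total_stars) / 1e9:.2f} billion")
print(f"min:  {min(total_stars) / 1e9:.2f} billion")
print(f"max:  {max(total_stars) / 1e9:.2f} billion")
```

Exact figures will differ slightly from the SAS run because the two languages use different random number generators, but the mean should land close to 28.75 billion either way.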
Output and Explanation for Step 1: Number of Stars
After running the Monte Carlo simulation, using our specified range and assumptions, we’ve got some big numbers for how many stars in the Milky Way could host habitable planets:
- Average Number of Stars: 28.75 billion
- Range: 18.75 billion to 38.75 billion
What Do These Results Mean?
The results give us a working estimate of how many stars might have planets where life could exist. The average came out to around 28.75 billion, which matches the mean we specified, as expected when the cutoffs sit symmetrically around it. Either way, we’re looking at quite a few potential homes for alien life. This estimate doesn’t come out of nowhere: we defined the range based on existing research in astronomy, and we assumed a bell-shaped distribution to reflect the uncertainty in the data.
Why Does the Shape of the Distribution Matter?
- Most Simulations Cluster Around the Average: The bell-shaped curve isn’t arbitrary. We chose it so that most of the simulated star counts fall near the mean estimate of 28.75 billion, and the cutoffs at 18.75 billion and 38.75 billion keep extreme outliers out of the results.
- Range and Variability: The range of 18.75 billion to 38.75 billion stars was set by us, based on expert reasoning. We know there’s some uncertainty when dealing with these huge numbers, but the distribution helps us feel more confident about the middle ground. We aren’t likely to be wildly off-track.
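To put rough numbers on that clustering, note that the cutoffs sit exactly two standard deviations either side of the mean (28.75 ± 2 × 5 billion). The Python sketch below uses the standard library's `statistics.NormalDist` to work out how often a raw draw is kept and how tightly the kept draws cluster; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 28.75e9, 5e9
low, high = 18.75e9, 38.75e9
dist = NormalDist(mu=mean, sigma=sd)

# How far the cutoffs sit from the mean, in standard deviations.
z_low = (low - mean) / sd
z_high = (high - mean) / sd

# Share of raw normal draws that survive the range check.
kept = dist.cdf(high) - dist.cdf(low)

# Among surviving draws, the share within one standard deviation of the mean.
within_one_sd = (dist.cdf(mean + sd) - dist.cdf(mean - sd)) / kept

print(f"bounds in standard deviations: {z_low:+.1f}, {z_high:+.1f}")
print(f"share of raw draws kept: {kept:.4f}")
print(f"share of kept draws within one sd of the mean: {within_one_sd:.4f}")
```

Under these assumptions roughly 95% of raw draws pass the range check, and a little over 70% of the retained values lie between 23.75 and 33.75 billion.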
What Does This Mean for the Drake Equation?
This step gives us a strong foundation for the rest of the Drake Equation. We now have a solid idea of how many stars in the Milky Way could potentially host habitable planets. But just because a star could host planets doesn’t mean it does. The next step is to figure out how many of these stars actually have planetary systems — and that’s where Step 2 comes in.
Step 2 in the Drake Equation: The Fraction of Stars with Planets (f_p)
Now that we have an estimate for the number of stars, the next step in the Drake Equation is f_p, the fraction of those stars that have planetary systems.
Recent astronomical discoveries, thanks to missions like Kepler and TESS, suggest that nearly every star has at least one planet. For this analysis, we’ll estimate that somewhere between 98% and 100% of stars have planets, leaving a small margin for uncertainty.
Code for Step 2: Calculating the Fraction of Stars with Planets
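To model the fraction of stars that have planets, we’ll run another Monte Carlo simulation, this time drawing from a normal distribution with a mean of 99% and a standard deviation of 0.1 percentage points, restricted to the 98% to 100% range. Here’s the SAS code to simulate the fraction of stars with planets: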
/* Percent of stars with planets: 100,000 simulated draws */
data perc_stars_with_plan(keep=perc_stars_with_plan);
    do i = 1 to 100000;
        /* Redraw until the value falls within the desired range */
        do while (1);
            perc_stars_with_plan = rand("normal", 0.99, 0.001);
            if perc_stars_with_plan >= 0.98 and perc_stars_with_plan <= 1 then leave;
        end;
        output;
    end;
    drop i;
    /* percent10.4 is wide enough to show all four decimals */
    format perc_stars_with_plan percent10.4;
run;
Output and Explanation for Step 2: Fraction of Stars with Planets (f_p)
Once we had our number of stars, the next question was: how many of those stars actually have planets? Using recent data from missions like Kepler, we modeled this step with a very tight range, assuming 98% to 100% of stars have planets. Here’s what the simulation gave us:
- Average Fraction of Stars with Planets: 99%
- Range: 98% to 100%
Breaking Down the Results
The results are clear: almost every star has planets. We specified this range based on strong evidence, and the simulation reflects that input, with around 99% of stars in the Milky Way likely to have planets. Because the standard deviation is only 0.1 percentage points, nearly all simulated values fall between about 98.7% and 99.3%, comfortably inside the 98% to 100% bounds. That tight spread reflects the overwhelming likelihood that most stars host planetary systems.
Why Is This Important?
- Nearly Every Star Has Planets: Since we already expected nearly all stars to have planets, this tight result is reassuring. It’s good news for our search for alien life — there are billions of potential planets out there.
- Little Room for Uncertainty: Because the range we specified is so narrow, we’re very confident in this step. The small variability means we can move forward without worrying too much about this factor. We’ve got this one covered.
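One way to see just how little room there is: with a mean of 0.99 and a standard deviation of 0.001, the 98% and 100% limits sit about ten standard deviations from the mean, so the range check almost never triggers. The Python sketch below (standard library only; the seed and variable names are assumptions for illustration) computes the chance that a raw draw is rejected and shows the spread of 100,000 unfiltered draws.

```python
import math
import random

mean, sd = 0.99, 0.001
low, high = 0.98, 1.0

# Distance from the mean to either bound, in standard deviations.
z = (high - mean) / sd

# Two-sided normal tail probability: the chance a raw draw falls outside the bounds.
rejected = math.erfc(z / math.sqrt(2))

print(f"bounds sit about {z:.1f} standard deviations from the mean")
print(f"probability a raw draw is rejected: {rejected:.1e}")

# 100,000 raw draws with no range check, to show the spread actually produced.
rng = random.Random(7)
draws = [rng.gauss(mean, sd) for _ in range(100_000)]
print(f"observed spread: {min(draws):.4%} to {max(draws):.4%}")
```

In practice the standard deviation, not the 98% to 100% limits, is what controls how much this factor varies in the simulation.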
What Does This Mean for the Drake Equation?
This step narrows things down nicely. Since almost every star has planets, we can confidently focus on the next, more challenging question: how many of these planets are in the habitable zone? With billions of stars and almost all of them having planets, the next big focus will be how many of those planets could support life. That’s what we’ll explore in the next step of the equation.
Wrapping Up Part 1
We’ve now estimated that there are approximately 28.46 billion stars in the Milky Way with planets (28.75 billion candidate stars × 99%). But not all planets are created equal: some are too hot, too cold, or simply not suitable for life as we know it.
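As a rough cross-check of how Steps 1 and 2 combine, the Python sketch below pairs one star-count draw with one planet-fraction draw and multiplies them, then compares the simulated mean with the product of the two specified means. Treating the two factors as independent and pairing them draw by draw is an assumption of this sketch, as are the helper name `truncated_normal` and the seed.

```python
import random
import statistics


def truncated_normal(rng, mean, sd, low, high):
    """Draw from Normal(mean, sd), redrawing until the value lies in [low, high]."""
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value


rng = random.Random(2024)  # arbitrary seed for repeatability
stars_with_planets = []
for _ in range(100_000):
    total_stars = truncated_normal(rng, 28.75e9, 5e9, 18.75e9, 38.75e9)
    planet_fraction = truncated_normal(rng, 0.99, 0.001, 0.98, 1.0)
    stars_with_planets.append(total_stars * planet_fraction)

simulated_mean = statistics.fmean(stars_with_planets)
analytic_mean = 28.75e9 * 0.99  # product of the two specified means

print(f"simulated mean: {simulated_mean / 1e9:.2f} billion")
print(f"analytic mean:  {analytic_mean / 1e9:.2f} billion")
```

Keeping the full list of products, rather than just the mean, lets later steps multiply in further factors one draw at a time.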
Next, we’ll dive into how many planets could actually be habitable, focusing on the so-called “Goldilocks Zone” — the region around a star where conditions are just right for liquid water to exist. Stay tuned for Part 2, where we explore the odds of finding life-sustaining planets.
Next in the series: From Stars to Life: A Data-Driven Journey (Part 2 of the Drake Equation Series)
Unless otherwise noted, all images are by the author
Calculating Contact was originally published in Towards Data Science on Medium.