Tag: AI

  • Exploratory Data Analysis in 11 Steps

    Loren Hinkson

    How to build a process with strong communication and expectation-setting practices

    Starting an exploratory data analysis can be daunting. How do you know what to look at? How do you know when you’re done? What if you miss something important? In my experience, you can alleviate some of these worries with communication and expectation-setting. I’m sharing my process for exploratory data analysis here as a resource for folks just getting started in data, and for more experienced analysts and data scientists seeking to hone their own processes.

    Image by Elf-Moondance via Pixabay

    1. Talk to stakeholders about their objectives

    One of the first things you should do when starting an exploratory analysis is to talk to the product manager/ leadership/ stakeholder(s) responsible for making decisions using the output of the analysis. Develop a solid understanding of the decisions they need to make, or the types of changes/ interventions they need to make calls on.

    If you’re supporting product iterations, it may also be helpful to speak with UX researchers, designers, or customer service representatives who interact with customers or receive end-user feedback. You can add a lot of value by understanding whether a customer request is viable, or identifying patterns in user behavior that indicate the need for a specific feature.

    2. Summarize analysis goals and get alignment

    These conversations will help you determine the analysis goals, e.g., whether you should focus on identifying patterns and relationships, understanding distributions, etc. Summarize your understanding of the goal(s), specify an analysis period and population, and make sure all relevant stakeholders are aligned. At this point, I also like to communicate non-goals of the analysis — things stakeholders should not expect to see as part of my deliverable(s).

    Make sure you understand the kinds of decisions to be made based on the results of your analysis. Get alignment from all stakeholders on the goals of the analysis before you start.

    3. Develop a list of research questions

    Create a series of questions related to the analysis goals you would like to answer, and note the dimensions you’re interested in exploring, e.g., specific time periods, new users, users in a certain age bracket or geographical area, etc.

    Example: for an analysis on user engagement, a product manager may want to know how many times new users typically visit your website in their first versus second month.
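
    When you eventually get to answering it, a question like this maps to a simple pandas computation. A minimal sketch, assuming a hypothetical visits table with user_id, signup_date, and visit_date columns:

        import pandas as pd

        # Hypothetical schema: one row per visit, with the user's signup date attached.
        visits = pd.read_csv("visits.csv", parse_dates=["signup_date", "visit_date"])

        # A visit's "month of life" is how many 30-day windows have elapsed since signup.
        days_since_signup = (visits["visit_date"] - visits["signup_date"]).dt.days
        visits["month_of_life"] = days_since_signup // 30 + 1

        # Median visits per user in their first vs. second month.
        monthly_visits = (
            visits[visits["month_of_life"].isin([1, 2])]
            .groupby(["user_id", "month_of_life"])
            .size()
            .unstack(fill_value=0)
        )
        print(monthly_visits.median())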

    4. Identify your knowns and unknowns

    Collect any previous research, organizational lore, and widely accepted assumptions related to the analysis topic. Review what’s been previously researched or analyzed to understand what is already known in this arena.

    Make note of whether there are historical answers to any of your analysis questions. Note: when you’re determining how relevant those answers are, consider the amount of time since any previous analysis, and whether there have been significant changes in the analysis population or product/ service since then.

    Example: Keeping to the new user activity idea, maybe someone did an analysis two years ago that identified that users’ activity tapered off and plateaued 5 weeks after account creation. If the company introduced a new 6-week drip campaign for new users a year ago, this insight may not be relevant any longer.

    5. Understand what is possible with the data you have

    Once you’ve synthesized your goals and key questions, you can identify what relevant data is easily available, and what supplemental data is potentially accessible. Verify your permissions to each data source, and request access from data/ process owners for any supplemental datasets. Spend some time familiarizing yourself with the datasets, and rule out any questions on your list that can’t be answered with the data you have.

    6. Set expectations for what constitutes one analysis

    Do a prioritization exercise with the key stakeholder(s), for example, a product manager, to understand which questions they believe are most important. It’s a good idea to T-shirt size (S, M, L) the complexity of the questions on your list before this conversation to illustrate the level of effort to answer them. If the questions on your list are more work than is feasible in a single analysis, use those prioritizations to determine how to stagger them into multiple analyses.

    T-shirt size the level of effort to answer the analysis questions on your list. If it adds up to more work than is feasible in a single analysis, work with stakeholders to prioritize them into multiple analyses.

    7. Transform and clean the data as necessary

    If data pipelines are in place and data is already in the format you want, evaluate the need for data cleaning (looking for outliers, missingness/ sparse data, duplicates, etc.), and perform any necessary cleaning steps. If not, create data pipelines to handle any required relocation or transformations before data cleaning.
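
    As a rough sketch, the core cleaning checks might look like this in pandas (the file and column names are hypothetical):

        import pandas as pd

        df = pd.read_parquet("analysis_population.parquet")  # hypothetical source

        # Duplicates: count and drop exact duplicate rows.
        print(f"Duplicate rows: {df.duplicated().sum()}")
        df = df.drop_duplicates()

        # Missingness: share of nulls per column, highest first.
        print(df.isna().mean().sort_values(ascending=False))

        # Outliers: flag values beyond 1.5 * IQR on a numeric column of interest.
        q1, q3 = df["session_length"].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df["session_length"] < q1 - 1.5 * iqr) | (df["session_length"] > q3 + 1.5 * iqr)
        print(f"Potential outliers: {mask.sum()}")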

    8. Use summary statistics to understand the “shape” of data

    Start the analysis with high-level statistical exploration to understand distributions of features and correlations between them. You may notice data sparsity or quality issues that impact your ability to answer questions from your analysis planning exercise. It’s important to communicate early to stakeholders about questions you cannot address, or that will have “noisy” answers less valuable for decision-making.
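
    In pandas, that first statistical pass can be as simple as the following (df being the cleaned dataset from the previous step):

        # Count, mean, spread, and quartiles for every numeric feature.
        print(df.describe())

        # Cardinality and most frequent values, if the frame has string columns.
        print(df.describe(include="object"))

        # Pairwise correlations between numeric features; strong values
        # hint at relationships worth exploring (or at redundant features).
        print(df.corr(numeric_only=True))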

    9. Answer your analysis questions

    At this stage, you’ll move into answering the specific questions you developed for the analysis. I like to visualize as I go, as this can make it easier to spot patterns, trends, and anomalies, and I can drop interesting visuals right into my write-up draft.

    Depending on the type of analysis, you may want to generate some additional features (ex: bucket ranges for a numeric feature, indicators for whether a specific action was taken within a given period or more times than a given threshold) to explore correlations further, and look for less intuitive relationships between features using machine learning.
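
    For example, a hedged sketch of such derived features in pandas (the column names and thresholds are invented):

        import pandas as pd

        # Bucket ranges for a numeric feature.
        df["age_bracket"] = pd.cut(
            df["age"],
            bins=[0, 18, 25, 35, 50, 120],
            labels=["<18", "18-24", "25-34", "35-49", "50+"],
        )

        # Indicator: action taken within a given period (here, the first 7 days).
        df["active_first_week"] = (df["days_to_first_action"] <= 7).astype(int)

        # Indicator: action taken more times than a given threshold.
        df["frequent_buyer"] = (df["purchase_count"] > 3).astype(int)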

    Visualize and document your findings as you go to minimize re-work and develop an idea of the “storyline” or theme for the analysis.

    10. Document your findings

    I like the question framework for analyses because it makes it easy to document my findings as I go. As you conduct your analysis, note answers you find under each question. Highlight findings you think are interesting, and make notes on any trains of thought a finding sparked.

    This decreases the work that you need to do at the end of your analysis, and you can focus on fleshing out your findings with the “so what?” that tells the audience why they should care about a finding, and the “what next?” recommendations that make your insights actionable. When that’s in place, reorganize questions as necessary to create a consistent “storyline” for the analysis and key findings. At the end, you can include any next steps or additional lines of inquiry you recommend the team look into based on your findings.

    If you’re working in a team environment, you may want to have one or more teammates review your code and/ or write-up. Iterate on your draft based on their feedback.

    11. Share your findings

    When your analysis is ready to share with the original stakeholders, be thoughtful about the format you choose. Depending on the audience, they may respond best to a Slack post, a presentation, a walkthrough of the analysis document, or some combination of the above. Finally, promote your analysis in internal channels, just in case your findings are useful to teams you weren’t working with.

    Congrats, you’re done with your analysis!



  • Manage Amazon SageMaker JumpStart foundation model access with private hubs

    Raju Rangan

    Amazon SageMaker JumpStart is a machine learning (ML) hub offering pre-trained models and pre-built solutions. It provides access to hundreds of foundation models (FMs). A private hub is a feature in SageMaker JumpStart that allows organizations to share their models and notebooks so as to centralize model artifacts, facilitate discoverability, and increase the reuse […]


  • eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

    Aishwarya Subramaniam

    eSentire is an industry-leading provider of Managed Detection & Response (MDR) services protecting users, data, and applications of over 2,000 organizations globally across more than 35 industries. These security services help their customers anticipate, withstand, and recover from sophisticated cyber threats, prevent disruption from malicious attacks, and improve their security posture. In 2023, eSentire was […]


  • Enhancing Marketing Mix Modelling with Causal AI

    Ryan O’Sullivan

    Causal AI, exploring the integration of causal reasoning into machine learning

    Photo by Alexey Ruban on Unsplash

    What is this series of articles about?

    Welcome to my series on Causal AI, where we will explore the integration of causal reasoning into machine learning models. Expect to explore a number of practical applications across different business contexts.

    In the last article we covered validating the causal impact of the synthetic control method. In this article we will move on to enhancing marketing mix modelling with Causal AI.

    If you missed the last article on synthetic controls, check it out here:

    Validating the Causal Impact of the Synthetic Control Method

    Introduction

    Ongoing challenges with digital tracking have led to a recent resurgence in marketing mix modelling (MMM). At the recent Causal AI conference, Judea Pearl suggested that marketing may be the first industry to adopt Causal AI. So I decided it was time to start writing about what I’ve learned over the last 7 years about how MMM, Causal AI, and experimentation intersect.

    The following areas will be explored:

    • What is MMM?
    • How can Causal AI enhance MMM?
    • What experiments can we run to complete the triangulation?
    • Outstanding challenges within marketing measurement.

    The full notebook can be found here:

    causal_ai/notebooks/enhancing marketing mix modelling with causal ai.ipynb at main · raz1470/causal_ai

    What is MMM?

    MMM is a statistical framework used to estimate how much each marketing channel contributes to sales. It’s heavily influenced by econometrics and in its simplest form is a regression model. Let’s cover the basics of the key components!

    Regression

    A regression model is constructed where the dependent variable/target (usually sales) is predicted based on several independent variables/features — these usually include the spend on different marketing channels and external factors that may affect demand.

    The coefficients of the spend variables indicate how much they contribute to sales.
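
    To make this concrete, here is a minimal sketch of a flat MMM-style regression using statsmodels (the channels and numbers are made up):

        import pandas as pd
        import statsmodels.api as sm

        # Hypothetical weekly data: sales, spend per channel, and a demand proxy.
        df = pd.DataFrame({
            "sales":        [100.0, 120.0, 130.0, 115.0, 140.0, 160.0],
            "tv_spend":     [10.0, 15.0, 12.0, 8.0, 20.0, 25.0],
            "social_spend": [5.0, 5.0, 8.0, 6.0, 9.0, 12.0],
            "demand_index": [50.0, 55.0, 60.0, 52.0, 65.0, 70.0],
        })

        X = sm.add_constant(df[["tv_spend", "social_spend", "demand_index"]])
        model = sm.OLS(df["sales"], X).fit()

        # Coefficients on the spend columns estimate each channel's contribution to sales.
        print(model.params)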

    The PyMC marketing package in Python is a great place to start exploring MMM:

    MMM Example Notebook – pymc-marketing 0.6.0 documentation

    Ad stock

    Ad stock refers to the lingering effect of marketing (or advertising) spend on consumer behaviour. It helps model the long-term effects of marketing. It’s not common behaviour to rush to purchase a product the first time you hear about a brand — the idea of ad stock is that the effect of marketing is cumulative.

    The most common ad stock method is geometric decay, which assumes that the impact of advertising decays at a constant rate over time. Although this is relatively easy to implement, it is not very flexible. It’s worth checking out the Weibull method, which is much more flexible — the PyMC marketing package has implemented it, so be sure to check it out:

    weibull_adstock – pymc-marketing 0.6.0 documentation
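
    For intuition, here is a toy implementation of geometric-decay ad stock in NumPy (a sketch, not the PyMC implementation):

        import numpy as np

        def geometric_adstock(spend: np.ndarray, decay: float) -> np.ndarray:
            """Each period keeps a fixed fraction of the previous period's adstocked value."""
            adstocked = np.zeros_like(spend, dtype=float)
            carryover = 0.0
            for t, x in enumerate(spend):
                carryover = x + decay * carryover
                adstocked[t] = carryover
            return adstocked

        spend = np.array([100.0, 0.0, 0.0, 0.0, 50.0])
        print(geometric_adstock(spend, decay=0.5))
        # [100.   50.   25.   12.5  56.25]: spend keeps contributing after the week it lands.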

    Saturation

    Saturation in the context of marketing refers to the idea of diminishing returns. Increasing marketing spend can increase customer acquisition, but each additional unit of spend tends to win over fewer new prospects than the last.

    There are several saturation methods we could use. The Michaelis-Menten function is a common one — you can also check this out in the PyMC marketing package:

    michaelis_menten – pymc-marketing 0.6.0 documentation
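
    For intuition, the Michaelis-Menten curve itself is a one-liner (the parameter values below are arbitrary):

        import numpy as np

        def michaelis_menten(spend: np.ndarray, alpha: float, lam: float) -> np.ndarray:
            """alpha caps the maximum effect; lam is the spend at which half of alpha is reached."""
            return alpha * spend / (lam + spend)

        spend = np.array([0.0, 10.0, 50.0, 100.0, 500.0, 1000.0])
        print(michaelis_menten(spend, alpha=100.0, lam=50.0))
        # Output approaches alpha (100) as spend grows: diminishing returns.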

    How can Causal AI enhance MMM?

    MMM frameworks usually use a flat regression model. However, there are some complexities to how marketing channels interact with each other. Is there a tool from our Causal AI toolbox which can help with this?

    Causal graphs

    Causal graphs are great at disentangling causes from correlations, which makes them well suited to the complexities of how marketing channels interact with each other.

    If you are unfamiliar with causal graphs, use my previous article to get up to speed:

    Using Causal Graphs to answer causal questions

    Understanding the marketing graph

    Estimating the causal graph in situations where you have little domain knowledge available is challenging. But we can use causal discovery to help get us started — check out my previous article on causal discovery to find out more:

    Making Causal Discovery work in real-world business settings

    Causal discovery has its limitations and should only be used to create a starting hypothesis for the graph. Luckily, there is a vast amount of domain knowledge around how marketing channels interact with each other that we can build in!

    Below I share the knowledge I have picked up from working with marketing experts over the years…

    • PPC (paid search) has a negative effect on SEO (organic search): the more we spend on PPC, the fewer SEO clicks we get. However, we have an important confounder: demand! A flat regression model will not pick up this intricacy, often leading to an overestimation of PPC’s contribution.
    • Social spend has a strong effect on social clicks: the more we spend, the more prospects click on social ads. However, some prospects may view a social ad and the next day visit your site via PPC, SEO, or Direct. A flat regression model will not pick up this halo effect.
    • A similar case can be made for brand spend, where you target prospects with longer-term branding messages but no direct call to action to click. These prospects may visit your site via PPC, SEO, or Direct at a later stage after becoming aware of your brand.
    • The clicks are mediators. If we run a flat regression and include mediators, this can cause issues when estimating causal effects. I won’t cover this topic in too much detail here, but using causal graphs enables us to carefully control for the right variables when estimating causal effects (the sketch after this list shows the resulting graph).
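
    Here is a minimal sketch of that graph using networkx (the node names are mine; the edges follow the relationships described above):

        import networkx as nx

        # Directed acyclic graph encoding the marketing relationships above.
        g = nx.DiGraph()
        g.add_edges_from([
            ("demand", "ppc_clicks"), ("demand", "seo_clicks"), ("demand", "sales"),
            ("ppc_spend", "ppc_clicks"),
            ("ppc_spend", "seo_clicks"),      # cannibalisation: PPC reduces SEO clicks
            ("social_spend", "social_clicks"),
            ("social_spend", "ppc_clicks"),   # halo effects onto other channels
            ("social_spend", "seo_clicks"),
            ("social_spend", "direct_clicks"),
            ("brand_spend", "ppc_clicks"),
            ("brand_spend", "seo_clicks"),
            ("brand_spend", "direct_clicks"),
            # Clicks are mediators between spend and sales.
            ("ppc_clicks", "sales"), ("seo_clicks", "sales"),
            ("social_clicks", "sales"), ("direct_clicks", "sales"),
        ])
        assert nx.is_directed_acyclic_graph(g)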

    Hopefully you can see from the examples above that using a causal graph instead of a flat regression will seriously enhance your solution. The ability to calculate counterfactuals and perform interventions also makes it very attractive!

    Note that it’s still worth incorporating the ad stock and saturation transformations into this framework.

    What experiments can we run to complete the triangulation?

    When working with observational data, we should also be striving to run experiments to help validate assumptions and complement our causal estimates. There are three main tests available to use in acquisition marketing. Let’s dive into them!

    Conversion lift tests

    Social platforms like Facebook and Snapchat allow you to run conversion lift tests. This is an A/B test where we measure the uplift in conversions between a treatment and a control group. These can be very useful when it comes to evaluating the counterfactual from your causal graph for social spend.

    Geo lift tests

    Geo lift tests can be used to estimate the effect of marketing blackouts or of starting to use a new channel. This can be particularly useful for brand digital and TV, where there is no direct call to action to measure. I cover this in much more detail in the last article:

    Validating the Causal Impact of the Synthetic Control Method

    Switch back testing

    PPC campaigns can be scheduled to be turned off and on hourly. This creates a great opportunity for switchback testing. Schedule PPC campaigns to be turned off and on each hour for a few weeks, and then calculate the difference between the number of PPC + SEO clicks in the off vs on periods. This will help you understand how much of PPC’s traffic can be recaptured by SEO, and therefore evaluate the counterfactual from your causal graph for PPC spend.
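
    The analysis step itself is straightforward. A hedged sketch in pandas, assuming an hourly table with a 1/0 flag for whether PPC was on:

        import pandas as pd

        # Hypothetical hourly data: columns hour, ppc_on (1/0), ppc_clicks, seo_clicks.
        hourly = pd.read_csv("switchback_clicks.csv")

        hourly["total_clicks"] = hourly["ppc_clicks"] + hourly["seo_clicks"]
        by_state = hourly.groupby("ppc_on")["total_clicks"].mean()

        # A small on/off gap means SEO recaptures most of the paused PPC traffic.
        print(f"Incremental clicks per hour from PPC: {by_state[1] - by_state[0]:.1f}")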

    I think running experiments is a great way to tweak and then gain confidence in your causal graph. But results could also be used to calibrate your model. Take a look at how the PyMC team have approached this:

    Lift Test Calibration – pymc-marketing 0.6.0 documentation

    Outstanding challenges within marketing measurement

    Today I went into how you can enhance MMM with Causal AI. However, Causal AI can’t solve all of the challenges within acquisition marketing — and there are lots of them, unfortunately!

    • Spend following the demand forecast — One reason for marketing spend being highly correlated with sales volume can be down to the marketing team spending in line with a demand forecast. One solution here is to randomly shift spend by -10% to +10% each week to add some variation. As you can imagine, the marketing team usually aren’t too keen on this approach!
    • Estimating demand — Demand is an essential variable in our model. However, it can be very difficult to collect data on. A reasonable option is extracting Google Trends data for a search term that aligns with the product you are selling (see the pytrends sketch after this list).
    • Long term effects of brand — Long term effects of brand are hard to capture as there usually isn’t much signal around this. Long term geo lift tests can help here.
    • Multi-collinearity — This is actually one of the biggest problems. All of the variables we have are highly correlated. Using ridge regression can alleviate this a little, but it can still be a problem. A causal graph can help a little too, as it essentially breaks the problem down into smaller models.
    • Buy-in from the marketing team — In my experience this will be your biggest challenge. Causal graphs offer a nice visual way of engaging the marketing team. It also creates an opportunity for you to build up a relationship whilst working with them to agree on the intricacies of the graph.
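
    On the demand proxy point above, here is a hedged sketch using the pytrends package, an unofficial Google Trends API (the search term is hypothetical):

        from pytrends.request import TrendReq

        # Weekly relative interest (0-100) for a term that tracks your product category.
        pytrends = TrendReq(hl="en-US", tz=0)
        pytrends.build_payload(["running shoes"], timeframe="today 12-m")
        demand_proxy = pytrends.interest_over_time()
        print(demand_proxy.head())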

    I’ll close things off there — It would be great to hear what you think in the comments!

    Follow me if you want to continue this journey into Causal AI — in the next article we will investigate whether Causal AI can improve our forecasting.

