Category: AI

  • Seeing Our Reflection in LLMs

    Stephanie Kirmer

    When LLMs give us outputs that reveal flaws in human society, can we choose to listen to what they tell us?

    Photo by Vince Fleming on Unsplash

    Machine Learning, Nudged

By now, I’m sure most of you have heard the news about Google’s new LLM*, Gemini, generating pictures of racially diverse people in Nazi uniforms. This little news blip reminded me of something I’ve been meaning to discuss: what happens when models have blind spots, and we apply expert rules to the predictions they generate to avoid returning something wildly outlandish to the user.

    This sort of thing is not that uncommon in machine learning, in my experience, especially when you have flawed or limited training data. A good example of this that I remember from my own work was predicting when a package was going to be delivered to a business office. Mathematically, our model would be very good at estimating exactly when the package would get physically near the office, but sometimes, truck drivers arrive at destinations late at night and then rest in their truck or in a hotel until morning. Why? Because no one’s in the office to receive/sign for the package outside of business hours.

    Teaching a model about the idea of “business hours” can be very difficult, and the much easier solution was just to say, “If the model says the delivery will arrive outside business hours, add enough time to the prediction that it changes to the next hour the office is listed as open.” Simple! It solves the problem and it reflects the actual circumstances on the ground. We’re just giving the model a little boost to help its results work better.
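
To make the rule concrete, here is a minimal sketch of what such a post-prediction adjustment could look like, assuming illustrative business hours and ignoring weekends and holidays (the real system’s rules were certainly more involved):

```
from datetime import datetime, timedelta

# Illustrative business hours; in practice these would come from each office's listed hours.
OPEN_HOUR, CLOSE_HOUR = 8, 18

def adjust_to_business_hours(raw_prediction: datetime) -> datetime:
    """Shift a raw delivery-time prediction to the next hour the office is open."""
    if raw_prediction.hour >= CLOSE_HOUR:
        # Predicted after closing: push to opening time the next day.
        return (raw_prediction + timedelta(days=1)).replace(
            hour=OPEN_HOUR, minute=0, second=0, microsecond=0
        )
    if raw_prediction.hour < OPEN_HOUR:
        # Predicted before opening: push to opening time the same day.
        return raw_prediction.replace(hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
    return raw_prediction

raw = datetime(2024, 3, 1, 23, 30)    # model says 11:30 pm
print(adjust_to_business_hours(raw))  # rule says 8:00 am the next day
```

Note that you would keep both the raw and the adjusted prediction around, for the reasons discussed next.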

However, this does cause some issues. For one thing, now we have two different model predictions to manage. We can’t just throw away the original model prediction, because that’s what we use for model performance monitoring and metrics. You can’t assess a model on predictions after humans have gotten their paws in there; that’s not mathematically sound. But to get a clear sense of the real-world model impact, you do want to look at the post-rule prediction, because that’s what the customer actually experienced/saw in your application. In ML, we’re used to a very simple framing, where every time you run a model you get one result or set of results, and that’s that; but when you start tweaking the results before you let them go, you need to think at a different scale.

    Applying to LLMs

I kind of suspect that this is a form of what’s going on with LLMs like Gemini. However, instead of a post-prediction rule, the smart money says Gemini and other models are applying “secret” prompt augmentations to try to change the results the LLMs produce.

    In essence, without this nudging, the model will produce results that are reflective of the content it has been trained on. That is to say, the content produced by real people. Our social media posts, our history books, our museum paintings, our popular songs, our Hollywood movies, etc. The model takes in all that stuff, and it learns the underlying patterns in it, whether they are things we’re proud of or not. A model given all the media available in our contemporary society is going to get a whole lot of exposure to racism, sexism, and myriad other forms of discrimination and inequality, to say nothing of violence, war, and other horrors. While the model is learning what people look like, and how they sound, and what they say, and how they move, it’s learning the warts-and-all version.

    Our social media posts, our history books, our museum paintings, our popular songs, our Hollywood movies, etc. The model takes in all that stuff, and it learns the underlying patterns in it, whether they are things we’re proud of or not.

    This means that if you ask the underlying model to show you a doctor, it’s going to probably be a white guy in a lab coat. This isn’t just random, it’s because in our modern society white men have disproportionate access to high status professions like being doctors, because they on average have access to more and better education, financial resources, mentorship, social privilege, and so on. The model is reflecting back at us an image that may make us uncomfortable because we don’t like to think about that reality.

    So what do we do?

    The obvious argument is, “Well, we don’t want the model to reinforce the biases our society already has, we want it to improve representation of underrepresented populations.” I sympathize with this argument, quite a lot, and I care about representation in our media. However, there’s a problem.

    It’s very unlikely that applying these tweaks is going to be a sustainable solution. Recall back to the story I started with about Gemini. It’s like playing whac-a-mole, because the work never stops — now we’ve got people of color being shown in Nazi uniforms, and this is understandably deeply offensive to lots of folks. So, maybe where we started by randomly applying “as a black person” or “as an indigenous person” to our prompts, we have to add something more to make it exclude cases where it’s inappropriate — but how do you phrase that, in a way an LLM can understand? We probably have to go back to the beginning, and think about how the original fix works, and revisit the whole approach. In the best case, applying a tweak like this fixes one narrow issue with outputs, while potentially creating more.

Let’s play out another very real example. What if we add to the prompt, “Never use explicit or profane language in your replies, including [list of bad words here]”? Maybe that works for a lot of cases, and the model will refuse to say bad words that a 13-year-old boy is requesting to be funny. But sooner or later, this has unexpected additional side effects. What if someone’s looking for the history of Sussex, England? Alternatively, someone’s going to come up with a bad word you left out of the list, so that’s going to be constant work to maintain. What about bad words in other languages? Who judges what goes on the list? I have a headache just thinking about it.

These are just two examples, and I’m sure you can think of more such scenarios. It’s like putting band-aid patches on a leaky pipe: every time you patch one spot, another leak springs up.

    Where do we go from here?

    So, what is it we actually want from LLMs? Do we want them to generate a highly realistic mirror image of what human beings are actually like and how our human society actually looks from the perspective of our media? Or do we want a sanitized version that cleans up the edges?

    Honestly, I think we probably need something in the middle, and we have to continue to renegotiate the boundaries, even though it’s hard. We don’t want LLMs to reflect the real horrors and sewers of violence, hate, and more that human society contains, that is a part of our world that should not be amplified even slightly. Zero content moderation is not the answer. Fortunately, this motivation aligns with the desires of large corporate entities running these models to be popular with the public and make lots of money.

    …we have to continue to renegotiate the boundaries, even though it’s hard. We don’t want LLMs to reflect the real horrors and sewers of violence, hate, and more that human society contains, that is a part of our world that should not be amplified even slightly. Zero content moderation is not the answer.

    However, I do want to continue to make a gentle case for the fact that we can also learn something from this dilemma in the world of LLMs. Instead of simply being offended and blaming the technology when a model generates a bunch of pictures of a white male doctor, we should pause to understand why that’s what we received from the model. And then we should debate thoughtfully about whether the response from the model should be allowed, and make a decision that is founded in our values and principles, and try to carry it out to the best of our ability.

    As I’ve said before, an LLM isn’t an alien from another universe, it’s us. It’s trained on the things we wrote/said/filmed/recorded/did. If we want our model to show us doctors of various sexes, genders, races, etc, we need to make a society that enables all those different kinds of people to have access to that profession and the education it requires. If we’re worrying about how the model mirrors us, but not taking to heart the fact that it’s us that needs to be better, not just the model, then we’re missing the point.

    If we want our model to show us doctors of various sexes, genders, races, etc, we need to make a society that enables all those different kinds of people to have access to that profession and the education it requires.

*I’m sure I’m not the only one to think this, but since Gemini is definitionally multimodal, using not just language but audio, video, etc. in training, “LLM” seems like the wrong term for it. But all the references I find online still seem to be using that word.

    You can find more of my work at www.stephaniekirmer.com.


  • Knowledge Bases for Amazon Bedrock now supports hybrid search

    Mani Khanuja

    At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG). In a previous post, we described how Knowledge Bases for Amazon Bedrock manages the end-to-end […]


  • Expedite your Genesys Cloud Amazon Lex bot design with the Amazon Lex automated chatbot designer

    Joe Morotti

    The rise of artificial intelligence (AI) has created opportunities to improve the customer experience in the contact center space. Machine learning (ML) technologies continually improve and power the contact center customer experience by providing solutions for capabilities like self-service bots, live call analytics, and post-call analytics. Self-service bots integrated with your call center can help […]


  • Leveraging Large Language Models for Business Efficiency

    Benoît Courty

    Implementing Large Language Models for Business Improvement: A Step-by-Step Guide

TL;DR: This article discusses how Large Language Models can improve your company’s processes. Its target audience is people with a technical background, such as software architects or CTOs. It lays out the options for using LLMs efficiently; you will learn how to apply modern techniques like Retrieval Augmented Generation (RAG), function calling, and fine-tuning, with examples based on a use case.

Table of contents

    · Identifying a Business Need
    · Explore an idea by yourself
    · Creating an Evaluation Dataset
    · Considering Internal Industrialization
    · Customizing Responses with Company Data
    · Function Calling to use APIs
    · Breaking Down Tasks into Multiple Prompts
    · Fine-tuning to improve performance
· Combining Models
    · Conclusion

    Photo by Andrea De Santis on Unsplash

    In the rapidly evolving landscape of technology, Artificial Intelligence (AI) and Machine Learning (ML) have emerged as pivotal forces driving innovation, efficiency, and competitive advantage across industries. For Chief Technology Officers, IT Directors, Tech Project Managers, and Tech Product Managers, understanding and integrating these technologies into business strategies is no longer optional; it’s imperative.

It’s no surprise that Large Language Models (LLMs) like ChatGPT can do more than chat.

We will explore, step by step, strategies to prevent data distortion, enhance operational efficiency, and make better use of your company’s resources.

    Identifying a Business Need

You already know that Large Language Models (LLMs) such as ChatGPT, Gemini, or Mistral have emerged as powerful tools that can automate tasks and enhance customer service. As a business decision-maker, understanding the capabilities and limitations of LLMs can help you make informed decisions about their implementation.

    The first step in leveraging LLMs is to identify a task that can be automated to improve customer service or relieve employees of repetitive tasks. For instance, LLMs can be used to automate information retrieval in documents, write reports, or process customer requests.

    Explore an idea by yourself

Once you have identified a business case, the next step is to evaluate it manually with ChatGPT (or Gemini) to estimate whether the current reasoning capabilities of generative AI are sufficient to meet the need.

    You can create a list of sample inputs and evaluate the accuracy of the responses generated by ChatGPT.

Let’s say you want to automate the dispatching of emails your company receives.
Gather some sample emails and test whether an online LLM is able to sort them and prepare an answer.

    Photo by Serhat Beyazkaya on Unsplash

    Here is a sample prompt:

    Our company is organized into several departments, each with distinct functions and areas of responsibility. Here's a brief overview:
    Sales Department: This team is responsible for generating new business leads, negotiating deals, and maintaining relationships with existing clients.

    Marketing Department: This department focuses on promoting our products and services, managing our brand reputation, and conducting market research.

    Customer Service Department: This team handles inquiries, complaints, and support requests from our customers.

    Human Resources Department: This department manages employee relations, recruitment, training, and benefits.

    Technical Support Department: This team provides assistance with technical issues related to our products and services.

    Finance Department: This department is responsible for managing the company's financial resources, including budgeting, forecasting, and financial reporting.

    We want you to categorize the following email. I would like you to analyze the content of the email to determine the most suitable department. Here are some guidelines to help you with your analysis:
    Look for keywords and phrases that indicate the email is related to sales, marketing, customer service, HR, technical support, or finance.

    Consider the tone and context of the email. For example, an email that expresses frustration with a product may be best suited for the customer service department.

    If an email contains multiple topics, categorize it based on the most prominent or urgent issue.

    Part 3: Once you have analyzed the email and determined the most suitable department, please append your recommendations. Please ensure that your recommendations are accurate and consistent, as this will help us to improve our response times and provide better service to our customers.
    Part 4: Write a model of response for the department
    Here is the email:
    ```
    Hi,
    I hope this email finds you well. I am writing to request time off from work for the upcoming holiday season. Specifically, I would like to take the following days off:
    Monday, December 20th

    Tuesday, December 21st

    Wednesday, December 22nd

    Thursday, December 23rd

    Friday, December 24th

    I understand that this is a busy time of year for the company, and I apologize for any inconvenience my absence may cause. However, I have accrued enough vacation time to cover these days, and I believe that taking this time off will allow me to come back refreshed and ready to tackle the new year.
    Please let me know if there are any issues with my request, or if you need any additional information from me. Thank you for your time and consideration.
    Best regards, [Employee Name]
    ```

The model’s answer:

    Based on the content of the email, I would recommend categorizing it under the
    __Human Resources Department.__
    The email contains keywords such as "time off," "vacation time," and "absence,"
    which are related to employee relations and benefits, which fall under the
    responsibilities of the HR department. The tone of the email is polite and
    professional, and the context indicates that the employee is requesting
    time off for personal reasons.
    __Here is a model response for the HR department:__
    Dear [Employee Name],
    Thank you for your email and for providing ample notice regarding your time off request for the upcoming holiday season. We appreciate your consideration for the busy time of year and your accrued vacation time to cover your absence.
    I am pleased to inform you that your request for time off from Monday, December 20th to Friday, December 24th has been approved. Please ensure that you have communicated your absence to your team and made any necessary arrangements to cover your responsibilities during your time off.
    If you have any further questions or concerns, please do not hesitate to reach out to the Human Resources Department.
    Thank you for your hard work and dedication to the company. We hope that you have a relaxing and enjoyable holiday season.
    Best regards,
    [Your Name] Human Resources Department.

⚠️ Don’t hesitate to iterate on your prompt until you get the form of response you want to reproduce. This can take some time to get used to. Don’t stop at the first failure, nor at the first success. Don’t hesitate to restart from scratch. You can even challenge the model to write a prompt for you.

⚠️ Don’t forget to test edge cases; be sure to cover at least 80% of your use cases.

    Creating an Evaluation Dataset

It’s time to expand your test. Previously you proved that it worked with a few examples; now try again with more data to measure the accuracy.

Creating a dataset is essential to evaluate a process. You can use this dataset to test the model, ensuring that it meets your business needs. The dataset should be representative of the real-world scenarios that your business will encounter. Ideally, you should use code to create a reproducible evaluation chain, for example by calling OpenAI’s API on a list of questions and automatically comparing the answers with the expected ones.
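
Here is a minimal sketch of such an evaluation chain, assuming an OpenAI-style chat API, a hypothetical emails.csv file with email and expected_department columns, and a model name you would replace with your own:

```
import csv
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You route incoming emails to one of these departments: Sales, Marketing, "
    "Customer Service, Human Resources, Technical Support, Finance. "
    "Answer with the department name only."
)

correct, total = 0, 0
with open("emails.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # columns: email, expected_department
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # any chat model available to you
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": row["email"]},
            ],
            temperature=0,
        )
        predicted = response.choices[0].message.content.strip()
        correct += predicted.lower() == row["expected_department"].strip().lower()
        total += 1

print(f"Routing accuracy: {correct / total:.0%} on {total} emails")
```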

With a ChatGPT subscription, if you look at Explore GPTs, you can also try Data Analyst to upload an Excel file and interact with the AI about it.

    • Compile an Email Dataset: Start by assembling an Excel file containing 100 sample emails that your company might receive.
    • Draft a Detailed Prompt

    In this case you can structure your prompt in three segments:

    • Part 1: Detail the various departments within your company, outlining their specific functions and areas of responsibility.
    • Part 2: Introduce the dataset to the model, instructing it to analyze the content of each email to determine the most suitable department.
    • Part 3: Direct the model to append its recommendations in a new column within your Excel file, effectively categorizing each email.
    • Execute and Evaluate: Utilize the prompt to task the model with identifying the correct recipient department for each email. Following the model’s processing, review its suggestions to assess accuracy and relevance.
Screenshot of a sample dataset (AI-generated by the author with Mistral-medium)

Before going further, you can manually rate each answer and compute the average to evaluate whether the result is good enough for this use case. In our example, remember that the use case is a <human> (email) to <machine> (routing & proposed answer) to <human> (department) workflow, so some errors can be tolerated: the human could modify the answer, or a department could reroute an email to another. If this happens on ten emails out of a hundred, it can be good enough.

    Considering Internal Industrialization

You can fast-track a production-ready solution by using an API provided by an external provider.

    You can use OpenAI API or others for your MVP, but there are several factors that you should consider, including:

• All the data you provide to an external API or chat is recorded somewhere
    • You should anonymize your data even if the service provider claims that it is not using your data…
    • Risk of industrial secret leakage: If you are outside of the US, be aware that OpenAI is subject to the Cloud Act.
    • Speed limitations: It often takes several seconds to obtain a complete response from OpenAI, which may not be fast enough for certain use cases.
• Call limitations: The number of calls per second is limited, as well as the maximum monthly spend
    • Environmental impact: Large generalist models have a significant environmental impact, and this should be taken into account when considering their use.
• Cost variation: For example, OpenAI’s API prices are subject to change, which can impact your budget.
    • Difficulty in asserting a competitive advantage: It can be challenging to assert a competitive advantage when using OpenAI APIs, as they are available to all businesses.
• Stability: Private LLMs like Gemini, Mistral, Claude 2, GPT-4, etc. are not always stable, and you should consider monitoring the quality and stability of the answers provided. You also have to add guardrails to protect your service quality, and to protect you and your customers from hazardous behavior coming from inside or outside. Problems can occur in the input or the output.

To avoid some of these pitfalls, you can turn to open-source models such as Llama or Mistral. These open-source alternatives offer several advantages:

1. Privacy and Security: Self-hosted models reduce the risk of industrial secret leakage.
    2. Customization: You can fine-tune open-source models to better suit your specific business needs.
    3. Lower Costs: Open-source models are often less expensive than proprietary solutions, especially when considering the limitations on the number of calls and monthly expenses.
    4. Environmental Impact: Open-source models are smaller and can be optimized for specific use cases, potentially reducing their environmental footprint. You could measure it with CodeCarbon.
    5. Competitive Advantage: By customizing an open-source model, you can create a unique solution that sets your business apart from competitors.

Now that you have automated the routing of emails, let’s improve the quality of the answers. One way to do this is to give the model access to company documents. This will allow the model to find answers in your documents instead of in its “memory”.

    Customizing Responses with Company Data

Customizing responses from an LLM with company data will create a more accurate and tailored experience for users.

    Photo by Yasamine June on Unsplash

You can’t send all your company data within the prompt. That’s why Retrieval Augmented Generation (RAG) is useful: it’s a technique that combines information retrieval from a database with the generation capabilities of an LLM. By using RAG, you can improve the accuracy of responses, and you can tell the user which documents were used for the answer.

The RAG technique can be summarized by this formula:

    <LLM trained with billion of data> + <Your prompt> + <Your company dataset> = Responses aligned with your context

RAG is often done with a vector database, as that works in most cases. Here is how to create the database:

1. Split your documents into short chapters
2. Convert the chapters to vectors using an embedding model. Vectors on the same subject will be near each other in the n-dimensional space. A typical vector is an array of 1,024 float values. Think of it as if each value represented a characteristic, like color, size, or gender. It’s not hard-coded; the model finds the values by itself during training.
    3. Store them in a vector database
    Image by the author
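
As a minimal sketch of this indexing step, here the sentence-transformers library produces the embeddings and a plain NumPy array stands in for a real vector database (the chapter texts and file names are illustrative):

```
import json

import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative "chapters"; in practice you would split your real documents.
chapters = [
    "Vacation requests must be submitted to HR at least two weeks in advance.",
    "Our solar panel mounting rail is compatible with 30-40 mm frames.",
    "Invoices are payable within 30 days of the billing date.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works
# One vector per chapter; chapters on similar subjects end up close together.
index = model.encode(chapters, normalize_embeddings=True)  # shape: (n_chapters, dim)

np.save("index.npy", index)  # a real vector database would persist and scale this part
with open("chapters.json", "w", encoding="utf-8") as f:
    json.dump(chapters, f)
```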

    When you receive an email, you will use RAG like this:

    1. Convert the email of your customer to a vector
    2. Query the database with this vector to retrieve the 10 nearest vectors of paragraphs
    3. Take the text of these paragraphs and add them to the prompt
    4. Ask the LLM for an answer
    5. The answer will be based on the data provided in the prompt
    Image by the author
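
And here is a matching sketch of the query side, reloading the files produced by the indexing sketch above and assembling the augmented prompt (3 nearest chapters instead of 10, to keep the example short):

```
import json

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
index = np.load("index.npy")  # vectors built in the indexing step
with open("chapters.json", encoding="utf-8") as f:
    chapters = json.load(f)

email = "Hello, what is the price of a solar panel mounting rail?"
query_vec = model.encode([email], normalize_embeddings=True)[0]

scores = index @ query_vec              # cosine similarity, since vectors are normalized
top_ids = np.argsort(scores)[::-1][:3]  # keep the 3 nearest chapters

context = "\n\n".join(chapters[i] for i in top_ids)
prompt = (
    "Answer the customer email using only the context below.\n\n"
    f"Context:\n{context}\n\nEmail:\n{email}"
)
print(prompt)  # this prompt is then sent to the LLM, so the answer is grounded in your documents
```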

    If you want to learn more, read Retrieval Augmented Generation (RAG)

Now your answers will use your data, which helps prevent what is called hallucination.

ℹ️ Model hallucination is not an easy problem to manage, because the “memory” of an LLM is more like a human memory (a compressed representation of the world) than a computer’s exact memory. And models are trained to help you, so they will try to answer even when they don’t know; misleading information will then be presented as fact. RAG helps cope with this problem by providing relevant data to the model.

RAG is really good for unstructured data, but sometimes you have a better way to answer the question, like tabular data with pricing for each product; or you may even want to compute taxes, or look for a slot in a calendar to arrange a meeting. Let’s see how to do that with function calling.

    Function Calling to use APIs

    Function calling is a way to allow interaction between a LLM and your enterprise API, like:

    • Salesforce, SAP for your ERP
    • Service Now or other ticketing services
    • Agendas
    • Invoice, pricing
    • Custom API to do anything in your company
    • Third party API

Function calling is an essential feature that allows you to use APIs without exposing them to the outside world. This feature opens up many possibilities beyond simple chat applications. For instance, you can integrate specialized internal services or tools into the LLM, making it more versatile and valuable for your business. You can take an email from a customer requesting a price, send it to the LLM to turn it into the parameters of a call to your pricing API, then use the API’s answer to ask the LLM to write the reply to the customer.

    Given the request:


    Hello,
    I really like your company. I would like to order you a solar panel mounting rail, what would be the price ?
    Best regards

You send the request to the LLM, along with the definitions of the APIs that exist in your company:

{
  "type": "function",
  "function": {
    "name": "get_product_price",
    "description": "Get product price.",
    "parameters": {
      "type": "object",
      "properties": {
        "product_name": {
          "description": "Name of the product",
          "type": "string"
        }
      },
      "required": ["product_name"]
    }
  }
}
So the LLM extracts the product name from the email and gives you the JSON to make the API call:

{
  "product_name": "solar panel mounting rail"
}

It’s up to you to call the API, so it is totally secure: the LLM never knows where your API is, only what it can do.

The answer from the API can then be sent back to the LLM to build a natural-language answer:

    Can you answer this email given that the price for a “solar panel mounting rail” is $10 without a VAT of 5% ? “Hello, I really like your company. I would like to order you a solar panel mounting rail, what would be the price ? Best regards Your customer “

    The answer will be:

    Hello,

    Thank you for your interest in our company and for considering our solar panel mounting rail. The price for the mounting rail is 10 before taxes, with a VAT of 5%, so $10.50 taxes included.

    Please let me know if you have any other questions or if you would like to proceed with the order.

    Best regards,

    So you now have a system that can use your internal services to better prepare answers for your customers. That’s a game changer if you have already invested in APIs.
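
Here is a minimal sketch of that round trip, assuming an OpenAI-style chat API and a hypothetical internal get_product_price function (the tool definition mirrors the JSON shown earlier; a real system would also handle the case where the model makes no tool call):

```
import json
from openai import OpenAI

client = OpenAI()

def get_product_price(product_name: str) -> float:
    """Hypothetical internal pricing API; the LLM never sees where or how it runs."""
    return {"solar panel mounting rail": 10.0}.get(product_name.lower(), 0.0)

tools = [{
    "type": "function",
    "function": {
        "name": "get_product_price",
        "description": "Get product price.",
        "parameters": {
            "type": "object",
            "properties": {
                "product_name": {"type": "string", "description": "Name of the product"}
            },
            "required": ["product_name"],
        },
    },
}]

email = "Hello, I would like to order a solar panel mounting rail, what would be the price?"

# Step 1: let the LLM turn the email into an API call.
first = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": email}],
    tools=tools,
)
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)  # e.g. {"product_name": "solar panel mounting rail"}

# Step 2: call your own API yourself, then ask the LLM to draft the reply.
price = get_product_price(**args)
second = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Answer this email given that the price is ${price} plus 5% VAT:\n{email}",
    }],
)
print(second.choices[0].message.content)
```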

We just saw that we may call an LLM more than once for a single task; let’s look at that in more detail.

    Breaking Down Tasks into Multiple Prompts

    It’s important to note that a single prompt is often not enough for complex tasks. Your project will likely require breaking down the task into multiple prompts that will chain together and combine several techniques.

For example, https://360learning.com/ built a platform to help create online courses with AI from a single text document as input. Their pipeline makes use of 9 prompts, across 30 OpenAI calls, plus RAG, to achieve their goal. A first prompt asks for a summary of the document, a second asks for an online course plan based on the summary, then RAG is used to retrieve each part of the document from its title, and so on.

Here are some slides from their presentation:

    Caption from 360learning
    Caption from 360learning

    Video source : https://www.youtube.com/watch?v=1Eyc2GypnF4 (in French)

    They are using LangChain, a framework that helps to create these types of LLM pipelines.
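
In plain Python, without a framework, a two-step chain like the one described above could look like this minimal sketch (the model name, prompts, and training_material.txt input file are illustrative):

```
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """One step in the chain; the output of each step feeds the next prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

with open("training_material.txt", encoding="utf-8") as f:
    document = f.read()

# Step 1: summarize the source document.
summary = ask(f"Summarize the following document in ten bullet points:\n{document}")

# Step 2: build a course outline from the summary (later steps could use RAG per chapter).
plan = ask(f"Propose an online course outline based on this summary:\n{summary}")
print(plan)
```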

ℹ️ You have probably heard of “AI agents”: they are just a way to combine prompts, but without writing them in advance. An agent is a call to an LLM to get a list of tasks; then a call is made to the LLM for each task, and so on. It works best when the LLM is given the ability to call external tools, like browsing the web, using functions as we saw before.

Now you have a powerful pipeline, but how do you improve the model itself to get faster and better answers? You can fine-tune a model.

    Fine-tuning to improve performance

Fine-tuning can often improve the model’s performance, or let you keep equal performance while reducing its size, because you can use smaller models like Mistral-7B, or even Phi-2.

Very few companies can afford to train an LLM from scratch, because it requires a huge dataset and hundreds of GPUs: almost 2 million GPU hours for Llama 2 70B, for example. But you can take an already pre-trained model and fine-tune it; only an afternoon of fine-tuning is needed in most cases.

    The drawback is that you have to build a training dataset with hundreds of questions and answers.
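
As a sketch of what preparing such a dataset could look like, here is how you might write question/answer pairs into the JSON Lines chat format used by several fine-tuning tools (the exact schema depends on your tooling, and the examples are made up):

```
import json

# Illustrative routing examples; in practice you need hundreds of them.
examples = [
    ("I would like to take next Friday off.", "Human Resources Department"),
    ("My invoice shows the wrong VAT amount.", "Finance Department"),
    ("The mounting rail I received is bent.", "Customer Service Department"),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for email, department in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Route the email to the right department."},
                {"role": "user", "content": email},
                {"role": "assistant", "content": department},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```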

Combining Models

It’s a newer technique that combines multiple models into one. The result is a big model, called a Mixture of Experts (MoE), with better capabilities than a single model of the same size. The easiest way to do this is with MergeKit.

    Generated with AI — Bing Copilot — “An image of a mathematician, a physicist and a mechanical engineer working on the same problem around a desk featuring a dismantled uav”

This could help if it’s difficult to decide which model to use: with an MoE, it’s the model that decides which expert to use.

    Conclusion

Customizing responses from LLMs with company data and APIs creates a more accurate and tailored experience for users. Fine-tuning can improve performance, and breaking down tasks into multiple prompts can help tackle complex tasks.

While all of this may seem complex and reserved for specialists, abundant documentation and numerous libraries are available to facilitate implementation. Popular libraries include Hugging Face, LangChain, Haystack, Axolotl, and so on.

However, don’t forget the cost of integration. As with any project, there is a significant cost associated with moving from a functional prototype to a fully industrialized solution within an existing IT system. You will often discover that your company’s processes are more complex than expected, or that the data needs a bit of cleaning before it can be processed.

While large language models offer many advantages, don’t neglect the benefits of “older” machine learning techniques like random forests or DistilBERT. These techniques can still provide value, including faster processing, easier integration into existing tools, no need for GPUs, better explainability, and lower costs.

We hope that this article provides a view of how to include LLMs in your software architecture.

Article written in February 2024 by Benoît Courty, data scientist, with the help of Stéphane Van-Bosterhaudt, CEO of UpScale.


  • What are Emotions, Legally Speaking? And does it even matter?

    Tea Mustać

    Emotions-in-the-Loop

    Exploring the Legal Framework for Emotion in Emotion Recognition Tech

A young black man looks into the camera. He is wearing a dressing gown and has a towel around his neck. He applies a cotton pad to his face and wears a ring on his right hand. A list of words and numbers describes the man’s expression: happiness 4.185, neutral 0.901, surprise 89.864, sadness 0.01, disgust 0.01, anger 5.021, and fear 0.01.
    Image by Comuzi / © BBC / Better Images of AI / Mirror D / CC-BY 4.0

    Recently I have started writing the series Emotions-in-the-Loop, where in the pilot (“Analyzing the Life of (Scanned) Jane”) I imagined an ultimate personal assistant interconnected with various personal devices and equipped with multiple emotion recognition systems. The reasons I wrote the first admittedly very sci-fi-like essay were:

    1. Having some fun before tangling myself into the legal nitty-gritty of analyzing the interconnected system.
    2. Raising at least some of the questions I wish to address in the series.

Still, before we can get into either of those, I thought it necessary to devote some attention to the question of what emotions are in this context, both legally and technically speaking. And as we’ll soon see, these two meanings are inextricably connected. Now, so as not to waste too much time and space here, let’s dive into it!

    1. The Power of Emotions

    “Emotion pulls the levers of our lives, whether it be by the song in our heart, or the curiosity that drives our scientific inquiry. Rehabilitation counselors, pastors, parents, and to some extent, politicians, know that it is not laws that exert the greatest influence on people, but the drumbeat to which they march.”

    R.W. Picard

Emotions have always fascinated us. From Aristotle, who described them as “those feelings that so change men as to affect their judgements”;[1] over Charles Darwin and William James, who first connected emotions with their bodily manifestations and causes;[2] all the way to technologies, such as those used by our lovely, scanned Jane, that collect and interpret those bodily manifestations and, as a consequence, appear to know us better than we know ourselves.

This fascination has driven us towards discovering not just the power of emotions, but also ways to influence them, both to make ourselves and others feel better, and to deceive and manipulate. Now enter artificial intelligence, which graciously provided the possibility to both recognize and influence emotions at scale. However, despite how deeply fascinated we appear to be with emotions and the power they wield over us, our laws still fall short of protecting us from malevolent emotional manipulation. One of the reasons why this might be the case is that we are still not really sure what emotions are in general, let alone legally speaking. So let’s try and see if we can at least answer the second question, even if the answer to the first continues to elude us.

    2. Emotions as (Personal) Data?

    One thing that may come to mind when thinking about emotions legally, and especially in the context of artificial intelligence, is that they are some kind of data. Maybe even personal data. After all, what can be more personal than emotions? Than your feelings that you so carefully keep to yourself? Well, let’s briefly consider this hypothesis.

Personal data under the GDPR is defined as “any information relating to an identified or identifiable natural person.” An identifiable natural person is one who can (at least theoretically) be identified by someone, somewhere, whether directly or indirectly. The slightly problematic thing about emotions in this context is that they are universal. Sadness, happiness, anger, or excitement don’t tell me anything that would let me identify the subject experiencing these emotions. But this is an overly simplistic approach.

First of all, emotional data never exists in a vacuum. Quite the contrary: it is inferred by processing large quantities of (sometimes more, sometimes less, but always) personal data. It is deduced by analyzing our health data, such as blood pressure and heart rate, as well as our biometric data, like eye movements, facial scans, or voice scans. And by combining all these various data points, it is in fact possible to identify a person.[3] Even the GDPR testifies to this fact by explaining, already in the definition of personal data, that indirect identification can be achieved by referencing “one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of [a] natural person.”[4]

The easiest examples are, of course, various emotion recognition systems in wearable and personal devices such as the ones Jane has, where the data is directly connected with her user profile and social media data, making identification that much simpler. However, even when we are not dealing with personal devices, it is still possible to indirectly identify people. For instance, a person standing in front of a smart billboard and receiving an ad based on their emotional state combined with other noticeable characteristics.[5] Why? Well, because identification is relative and highly context-specific. For instance, it is not the same if I say “I saw a sad-looking girl” or if I say “Look at that sad-looking girl across the street”. By narrowing the context and the number of other possible individuals I could be referring to, identification becomes a very probable possibility, even though all I used was very generic information.[6]

Furthermore, whether someone is identifiable will also heavily depend on what we mean by that word. Namely, we could mean identifying as ‘knowing by name and/or other citizen data’. This would, however, be ridiculous, as that data is changeable, can be faked and manipulated, not to mention that not all people have it. (Think of illegal immigrants, who often don’t have access to any form of official identification.) Are people without an ID per definition not identifiable? I think not. Or, if they are, there is something seriously wrong with how we think about identification. This is also becoming a rather common argument for considering data processing operations GDPR-relevant, with increasingly many authors taking a broad notion of identification as ‘individuation’,[7] ‘distinction’,[8] and even ‘targeting’.[9] All of which are things these systems were designed to do.

    So, it would appear that emotions and emotional data might very well be within the scope of the GDPR, regardless of whether the company processing it also uses it to identify a person. However, even if they aren’t, the data used to infer emotions will most certainly always be personal. This in turn makes the GDPR applicable. We are not getting into the nitty gritty of what this means or all the ways in which the provisions of the GDPR are being infringed by most (all?) providers of emotion recognition technologies at this point. They are after all still busy arguing that the emotional data isn’t personal in the first place.

    3. Does It Even Really Matter?

    We’ve all heard the phrase: “You know my name, not my story.” We also all know that that is very true. Our names (and other clearly personal data) say much less about who we are than our emotions. Our names can also not be used for the same intrusive and disempowering purposes, at least not to the extent that recognizing our emotional states can be. That is also why we are in most cases not being identified by the providers of these systems. They don’t care about what your name is. They care about what you care about, what you give your attention to, what excites or disturbs you.

    Sure, some of them sell health devices that then infer your emotions and psychological states for health purposes. Just as scanned Jane, many people probably buy them exactly for this purpose. However, most (all?) of them are not doing just that. Why let all that valuable data go to waste when it can also be used to partner up with other commercial entities and serve you ultimately personalized ads? Or even save you the trouble and just order the thing for you. After all, the entity they partnered up with was just one of the solid options that could have been chosen for something that you (presumably) need.

Finally, for these purposes, and especially when considering other, non-health-device emotion recognition systems, it is also increasingly irrelevant to recognize specific emotions. This makes the whole debate on whether ‘reading’ emotions is a science or a pseudoscience largely irrelevant, as well as the previously discussed question of what emotions are, because we would then have to go through the same mental exercise for any state these systems eventually end up recognizing and using. For instance, nowadays it is much more important to put you somewhere on the PAD (pleasure-arousal-dominance) scale.[10] This suffices to assess your general attitude towards a particular piece of information, situation, ad, you name it. Is it causing positive or negative sensations, is it capturing your attention or not, is it making you proactive or receptive? And that is finally enough to serve you with just the right ad at just the right time. If you are even still being served an ad, that is, and not just getting a delivery to your door.

    4. Final thoughts

So, what are emotions? They are a lot of things. They are internal perceptions of our current state and environment. External representations of those perceptions. And computational readings of those representations. This long-winded road also makes them a type of data. Data that can in some cases, when combined with other data points or used in narrow enough contexts, be personal. And this holds true irrespective of what emotions are in a psychological sense. Lest we wish to get entangled in questions of what attention and excitement are. Or have system providers sleazing out of the GDPR’s scope by claiming they don’t actually recognize emotions but rather simply track user reactions across the PAD scale.

This conclusion then makes any single entity recognizing and using data obtained by ‘reading’ our facial expressions and bodily cues also responsible for what it does with the collected data. The bare minimum of its obligations lies somewhere between meaningful transparency, with full disclosure of the purposes for which the data is collected, and an easily accessible possibility to object to the processing. This is currently far from common practice.

Even when it comes to Jane’s all too smart devices, mentioning somewhere in the Terms of Service how the data is collected, used to predict her emotional states, and then used to make decisions based on those states does not mean transparency can be ticked off the list. And how far are we then from having any type of influence over these data flows that we constantly emit into our orbit for our devices to process?

Finally, when other, non-personal devices are considered, the situation gets even worse, of course. Be it mood-based social media algorithms or smart billboards, they all process personal data to generate some more (as we have established) personal data and use it to influence our behavior. Being fully mindful of the fact that it is difficult to fulfill the transparency requirements when individuals are walking past a billboard or scrolling social media, these are still requirements, not just helpful recommendations. Envisioning novel approaches for achieving meaningful transparency is a MUST. And so is thinking about emotions and emotional data in a robust manner, free of psychological finesses and discussions. Otherwise, we might soon lose all that is left of our power to make decisions.

    An image of Lego faces displaying various emotions
    Image by Nik on Unsplash

    [1] Aristotle, Rhetoric, 350 BCE, translated by W. Rhys Roberts, available at https://classics.mit.edu/Aristotle/rhetoric.2.ii.html.

    [2] P. Ekman, Darwin’s contributions to our understanding of emotional expressions, Philos Trans R Soc Lond B Biol Sci. 2009 Dec 12; 364(1535): 3449–3451. doi: 10.1098/rstb.2009.0189, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2781895/ .

    [3] L. Sposini, Neuromarketing and Eye‑Tracking Technologies Under the European Framework: Towards the GDPR and Beyond, Journal of Consumer Policy https://doi.org/10.1007/s10603-023-09559-2

    [4] Article 29 Working Party, Opinion 4/2007 on the concept of personal data, 01248/07/EN WP 136, https://www.clinicalstudydatarequest.com/Documents/Privacy-European-guidance.pdf p.32, 38; Judgement of 19 October 2016, C-582/14 Breyer, ECLI:EU:C:2016:779, para.32.

    [5] J. Metcalfe, A Billboard That Hacks and Coughs at Smokers, Bloomberg, January 17, 2017, https://www.bloomberg.com/news/articles/2017-01-17/a-smart-billboard-that-detects-and-coughs-at-smokers?embedded-checkout=true

    [6] Article 29 Working Party, Opinion 4/2007 on the concept of personal data, 01248/07/EN WP 136, https://www.clinicalstudydatarequest.com/Documents/Privacy-European-guidance.pdf p.13.

    [7] Vidal-Hall v Google, Inc. [2015] EWCA Civ 311

    [8] P. Davis, Facial Detection and Smart Billboards: Analysing the ‘Identified’ Criterion of Personal Data in the GDPR (January 21, 2020). University of Oslo Faculty of Law Research Paper №2020–01, Available at SSRN: https://ssrn.com/abstract=3523109 or http://dx.doi.org/10.2139/ssrn.3523109

    [9] N. Purtova, From knowing by name to targeting: the meaning of identification under the GDPR, International Data Privacy Law, Volume 12, Issue 3, August 2022, Pages 163–183, https://doi.org/10.1093/idpl/ipac013

[10] Kalinin, A., Kolmogorova, A. (2019). Automated Soundtrack Generation for Fiction Books Backed by Lövheim’s Cube Emotional Model. In: Eismont, P., Mitrenina, O., Pereltsvaig, A. (eds) Language, Music and Computing. LMAC 2017. Communications in Computer and Information Science, vol 943. Springer, Cham. https://doi.org/10.1007/978-3-030-05594-3_13; B. J. Lance and S. Marsella, The Relation between Gaze Behavior and the Attribution of Emotion: An Empirical Study, September 2008, Proceedings of the 8th International Conference on Intelligent Virtual Agents, https://www.researchgate.net/publication/225115460_The_Relation_between_Gaze_Behavior_and_the_Attribution_of_Emotion_An_Empirical_Study

