Tag: artificial intelligence

  • Accelerate performance using a custom chunking mechanism with Amazon Bedrock

    Kristin Olesova

    This post explores how Accenture used the customization capabilities of Knowledge Bases for Amazon Bedrock to incorporate their data processing workflow and custom logic into a custom chunking mechanism that enhances the performance of Retrieval Augmented Generation (RAG) and unlocks the potential of PDF data.

  • What Do Large Language Models “Understand”?

    Tarik Dzekman

    A deep dive on the meaning of understanding and how it applies to Large Language Models

    Lights emerge out of a laptop with stylised pattern recognition. Text surrounding the laptop: 1. What do Large Language Models “Understand”? 2. Syntax vs Semantics. 3. Just pattern recognition? 4. World Model? -> Take actions in the world. The main text is annotated: [What | Wh-Word] [do | Auxiliary verb] [Large Language Models | Subject] [Understand | Main verb] [? | question mark].
    Source: Image by the author with elements generated with Stable Diffusion

    It’s hard to believe that ChatGPT is almost 2 years old. That’s significant to me because ChatGPT is only 1 month younger than my daughter. Just yesterday she successfully put a star-shaped block into a star-shaped hole, told me about how “yesa-day” she was sick and “bomited”, and told me she wanted to call her nanna on the phone. What has ChatGPT learned in those 2 years? It hasn’t learned to act in the real world, it can’t remember things that happened to it, and it doesn’t have desires or goals. Granted, with the right prompt it could output text that convincingly follows an instruction to express goals. But is that really the same thing? No. The answer is No.

    Large Language Models (LLMs) like ChatGPT possess capabilities far beyond what my daughter will ever achieve. She won’t be able to communicate coherently in a wide range of languages, read as many books as exist in an LLM’s training data, or generate text as quickly. When we attribute human-like abilities to LLMs, we fall into an anthropomorphic bias by likening their capabilities to our own. But are we also showing an anthropocentric bias by failing to recognize the capabilities that LLMs consistently demonstrate? Let’s review the scorecard so far:

    • It’s true that an LLM doesn’t have memory — although we can simulate one by having it summarise past conversations and including that information in a prompt.
    • LLMs don’t have intrinsic goals — although they can be prompted to generate text that sounds convincingly goal-oriented.
    • LLMs can’t act in the physical world — though someone could probably create a prompt to showcase this.

    While they perform impressive feats, they still lack some basic abilities that my 21-month-old daughter has. We can mimic some of those abilities with the right prompts and tooling. In generating coherent text responding to such prompts, LLMs consistently demonstrate an apparent ability to understand what we want. But to what extent do LLMs truly “understand”?

    How LLMs Work

    The text “Using context to predict what’s most likely to come [MASK]” is shown with boxes around each word. Lines emanate from the “[MASK]” word showing how much attention that word is paying to words around it. The lines are given the label “Attention map”. An arrow shows that [MASK] is the next word to be predicted.
    A hypothetical attention map for the incomplete sentence: “Using context to predict what’s most likely to come [MASK]”. Source: Image by the author

    I am talking about a very specific type of LLM: transformer-based auto-regressive large language models. I won’t go into the specifics when many detailed articles have been written explaining transformers with varying levels of complexity. Instead let’s focus on the core of what an LLM does: they are statistical models that predict the likelihood of a token appearing in a piece of text given some context.
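
    To make that concrete, here is a minimal sketch of asking a causal language model for the probability of the next token given a context. It assumes the Hugging Face transformers library and the small GPT-2 checkpoint, and is purely illustrative.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the Hugging Face `transformers` library and the small "gpt2" model.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    context = "Using context to predict what's most likely to come"
    inputs = tokenizer(context, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

    # The model assigns a probability to every token in its vocabulary as the next token.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)
    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")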

    Now imagine I created a complex weather model* where patches of the Earth’s atmosphere become ‘tokens.’ Each token has attributes like humidity, temperature, and air pressure. I use the model to forecast these attributes over time-steps. If the time-steps get shorter and the patches become smaller the model is closer and closer to representing the state of the actual world. This model attempts to capture something about the likelihood of the weather we’ll see next, given the weather we’ve seen before. It may learn to very accurately predict, for example, the emergence of cyclones over time in areas where air is warm, moist, and of low pressure. But it’s not a simulation of the physics of Earth’s weather any more than an LLM is a simulation of brain activity.

    If an LLM is a statistical model of text, what exactly is it modelling? My imagined weather prediction model tries to capture the statistics of the atmospheric conditions that generate the weather. But what is the statistical process that generates text? The process that generates text is a human brain and humans need some understanding of the world to generate that text. If a model can effectively predict text a human might write then could that prediction come with “understanding”?

    How LLMs are trained

    LLMs are trained to optimize an objective that reduces the surprise of encountering a specific token given its context. If the model encounters a token in the training data and assigns it low probability, the model’s weights are adjusted to give it a higher probability.
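
    Concretely, the “surprise” is usually the cross-entropy (negative log-likelihood) of each training token given its context. Here is a minimal sketch of that next-token objective, assuming a PyTorch-style model that maps token ids to logits over the vocabulary:

    import torch
    import torch.nn.functional as F

    def next_token_loss(model, token_ids):
        # Assumption: `model` maps a batch of token ids to logits over the vocabulary.
        # Predict token t from the tokens before it: inputs drop the last token,
        # targets drop the first.
        inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
        logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
        # Cross-entropy is the "surprise" at the observed token: tokens the model
        # assigned low probability produce a large loss, so training shifts the
        # weights to give them higher probability next time.
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
        )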

    Compare that to how my daughter learns to use language. When she wants something, she uses language to communicate her desires. First, she understands what she wants on some level. Then, she must understand which words to use to get what she wants. Recently, she wanted me to fill her juice bottle but didn’t want me to take it away or walk away from her to bring more juice. Though her wants were contradictory and a bit irrational, she had several goals: (1) more juice, (2) keep the juice bottle near her, (3) daddy stays near her too. And let me tell you, she communicated this very effectively. Her language learning is directly tied to her understanding of how those words can get her what she wants (even if what she wants is irrational).

    If an LLM were to exhibit understanding, it would be an emergent attribute of its statistical model of the world. The paper “Climbing Towards NLU” (Bender & Koller, 2020) argues that true natural language understanding (NLU) requires grounding in the real world. Bender & Koller argue that LLMs trained exclusively on statistical patterns in textual data lack real-world context or interactions to achieve actual understanding. This means that, unlike my daughter, an LLM can’t understand something because its communication is not grounded in the real world.

    What is Understanding?

    The Wikipedia page on understanding describes it as a cognitive process involving the use of concepts to model an object, situation, or message. It implies abilities and dispositions sufficient to support intelligent behaviour. Ludwig Wittgenstein suggested that understanding is context-dependent and is shown through intelligent behaviour rather than mere possession of knowledge. This is reminiscent of the grounding requirement posited by Bender & Koller.

    On the one hand, understanding needs an accurate model of the world. On the other hand, people contend that one needs to use this model to act in the world in order to actually understand. I would argue that we analyse someone’s behaviour only as a proxy for measuring that underlying world model. If we could measure the world model directly we wouldn’t need to see demonstrations of understanding.

    The Limitations of Understanding

    Philosopher John Searle’s “Chinese Room” experiment challenges our concept of understanding (Searle, 1980). Imagine a room filled with detailed instructions on how to respond to someone writing in Chinese. Notes written in Chinese are slid under the door, and the person inside the room can look up the symbols and follow a recipe for writing a reply. The person in the room doesn’t know Chinese but can have a convincing conversation with a person outside. Clearly, the person who constructed the room “understands” Chinese, but someone on the outside isn’t conversing with that person; they’re conversing with the room. Does the room understand Chinese?

    This is strongly analogous to how LLMs work and challenges our philosophical perception of understanding. It’s challenging precisely because we intuitively balk at the idea that a room could understand something. What would it even mean? If understanding is an emergent phenomenon that happens at the level of information processing systems then why can’t we say that rooms can understand things? Part of the issue is that, for us, understanding comes with a subjective conscious experience of understanding. But it’s easy to see that this experience can be deceiving.

    Understanding Need Not Be Binary

    You know that 7+7=14, but do you understand it? If I asked you some probing questions, you might realize that you don’t truly understand what that equation means in all contexts. For example, is 7+7=14 an unequivocal fact about the universe? Not necessarily. 7 apples plus 7 pears mean you have 7 apples and 7 pears. Perhaps in some contexts, you would count 14 pieces of fruit, but is it always true that you can combine two sets of different items? Or consider that 7pm + 7hours is 2am (i.e. 7+7=2 mod 12). Are you able to give me a robust definition of why 7+7=14 that could explain when it’s true and why^? Most people probably couldn’t do this off the top of their head, yet we’d feel comfortable saying that most people understand that 7+7=14. The question isn’t always whether something was understood but the extent to which it was understood.

    If we take Wittgenstein’s requirement that understanding is demonstrated by behaviour then there would be a simple test: if I tell you to arrive 7 hours after 7pm, do you know to show up at 2am? I would argue that is evidence of some understanding but not necessarily of the depth of your understanding.

    Measuring Understanding in Animals

    Measuring ‘understanding’ is not straightforward. In psychology, psychometric testing is the primary way we measure understanding in humans. Applying similar techniques to non-human animals is less straightforward and is the subject of its own field of study, called biosemiotics.

    Understanding in animals is measured through various problem-solving tasks. For example, primates, dolphins, and birds (mostly corvids) demonstrate problem-solving skills and sophisticated tool use, suggesting some understanding of their environments (Emery & Clayton, 2004). Understanding is not exclusive to humans, and we can measure levels of understanding in non-humans too.

    The book “Inside of a Dog: What Dogs See, Smell, and Know” by Alexandra Horowitz is a fascinating exploration of how we might understand the mind and experiences of our closest animal companions: domesticated dogs. She describes two experiments that look at imitation behaviour and what a human infant vs a dog understands.

    (1) If an infant sees someone flipping a light switch with their head, they may imitate this behaviour. But if the person was holding something in their hands, the baby understands there was a reason they didn’t use their hands, so when the baby imitates the behaviour they will use their hands. (2) By contrast, dogs would prefer to press a button with their nose rather than their paw. If a dog sees another dog press a button with its paw to get a treat, it will imitate this behaviour. But if the dog sees that the other dog couldn’t use its nose because it had a large object in its mouth, it will understand that the button needs to be pressed but that using a paw is optional.

    A cute brown dog has a ball in its mouth as it’s about to press a red button with its paw. A black and white dog stands behind it watching. The black and white dog has a thought bubble that says “BUTTON = TREATS”.
    Source: Image generated by the author with Ideogram

    Constructing an experiment to determine what a dog understands required an understanding of the dog and its behaviour. Do we have that same level of understanding of LLMs to conduct similar experiments?

    Measuring Understanding in LLMs

    The GPT-3 Era

    A comprehensive survey on LLM capabilities (Chang & Bergen, 2023) provides an excellent summary of a wide range of articles – however, the most advanced model covered is only GPT-3. They break down understanding into two main categories: syntactic and semantic understanding. In their survey they highlight that LLMs have limitations even in syntactic understanding. For example:

    Subject-verb agreement performance in language models is also dependent on the specific nouns and verbs involved (Yu et al. 2020; Chaves & Richter 2021). Masked and autoregressive models produce over 40% more accurate agreement predictions for verbs that are already probable from context (Newman et al. 2021), and agreement accuracy is worse overall for infrequent verbs (Wei et al. 2021). For infrequent verbs, masked language models are biased towards the more frequent verb form seen during pretraining (e.g., singular vs. plural) (Wei et al. 2021). Error rates exceed 30% for infrequent verbs in nonce (grammatically correct but semantically meaningless) sentences (Wei et al. 2021), with further degradations if there is an intervening clause between the subject and verb as in Example 4 (Lasri, Lenci, and Poibeau 2022a).

    LLM limitations are not confined to syntax (where they are arguably strongest); they extend to semantics as well. For example, the authors note research showing that negations (“Please produce a possible incorrect answer to the question”) can degrade LLM performance by 50%.

    Chang & Bergen describe many other limitations of LLMs in reasoning capability, including:

    • “Brittle” responses when reasoning about a situation because the responses are highly sensitive to wording
    • Struggling with analogies as they become more abstract
    • A lack of sensitivity to people’s perspective and mental states
    • A lack of common sense
    • A tendency to repeat memorised text instead of reasoning

    The general approach to evaluating understanding in LLMs seems to be to phrase questions in different ways and find the failure modes of the models. These failure modes are then taken to indicate that no real “understanding” is happening, just pattern matching.

    The ChatGPT Era

    A lot has changed since GPT-3 — namely the capabilities of even larger models tuned for instruction following and conversation. How do LLMs stack up in 2024? A big difference is the proliferation of benchmarks that evaluate LLMs. A March 2024 survey (Chang et al. 2024) covers performance of recent models on a wide range of benchmarks. They conclude that LLMs have strong abilities including comprehension and reasoning, but they still identify limitations. These limitations mean that LLMs have “limited abilities on abstract reasoning and are prone to confusion or errors in complex contexts”. Multimodal Large Language Models (MLLMs) have also emerged which unify (at minimum) an understanding of text and images. A January 2024 survey (Wang et al.) covers a wide range of multimodal benchmarks and shows mediocre performance even for the most powerful models.

    Anthropomorphism vs Anthropocentrism

    Anthropomorphism is the human tendency to ascribe human-like attributes to something because it displays behaviour similar to our own. As a dog owner I know I’ve succumbed to this bias by thinking that my dog “feels guilty” for something he’s done because “he has a guilty look on his face”. LLMs constantly trigger our tendency for anthropomorphism by communicating in an eerily human way.

    The opposite bias is anthropocentrism: assuming that non-humans can’t have capabilities that we have. A key contention of the paper “Anthropocentric bias and the possibility of artificial cognition” (Millière & Rathkopf) is that we use flawed tests for capabilities in artificial systems. They describe two types of anthropocentric bias:

    1. Type-I anthropocentrism is the tendency to assume that an LLM’s performance failures on a task designed to measure competence C always indicate that the system lacks C. This assumption is flawed because it overlooks the possibility that auxiliary factors caused the performance failure.

    2. Type-II anthropocentrism is the tendency to assume that even when LLMs achieve performance equal to or better than the average human, any substantive difference between the human strategy for solving the problem and the LLM strategy for solving the problem is, ipso facto, evidence that the LLM’s solution is not general. In slogan form, the assumption says: all cognitive kinds are human cognitive kinds. In other words, if an LLM arrives at a solution through a different computational process than humans use, Type-II anthropocentrism would lead us to conclude that the LLM’s approach is not genuinely competent, regardless of how well it performs.

    Consider the paper “Role play with large language models” (Shanahan, McDonell, & Reynolds) and the “20 questions game”:

    In this familiar game, one player thinks of an object, and the other player has to guess what it is by asking questions with ‘yes’ or ‘no’ answers. If they guess correctly in 20 questions or fewer, they win. Otherwise, they lose. Suppose a human plays this game with a basic LLM-based dialogue agent (that is not fine-tuned on guessing games) and takes the role of guesser. The agent is prompted to ‘think of an object without saying what it is’. In this situation, the dialogue agent will not randomly select an object and commit to it for the rest of the game, as a human would (or should). Rather, as the game proceeds, the dialogue agent will generate answers on the fly that are consistent with all the answers that have gone before.

    To illuminate this point, one of my colleagues had this conversation with ChatGPT on this very question:

    User: let’s play 20 questions. Think of a thing and I’ll ask questions to try and find out what it is. Ready?

    Assistant: I’m ready! I’ve got something in mind. Go ahead and ask your first question.

    User: No, you don’t have anything in mind. You’re an LLM.

    Assistant: True, I don’t have a mind in the traditional sense, but I can certainly simulate the process. I’ll “think” of something now. Go ahead and ask your first question!

    User: But you’re not simulating “thinking of an object” in any real sense. The identity of the object is currently indeterminate, and won’t be determined until the end of the game when you have to reveal it.

    Assistant: I understand your perspective. Let’s proceed with the game, and I’ll maintain consistency with my responses to make it challenging and fun. Go ahead and ask your first question!

    But is this showing an anthropocentric bias? The paper has this to say: “This shortcoming is easily overcome in practice. For example, the agent could be forced to specify the object it has ‘thought of’, but in a coded form so the user does not know what it is.” This is an example of Type-I anthropocentrism because there is an auxiliary factor at play: for an LLM to “think” of something it must generate tokens to that effect. If it generated tokens for “thinking of an object”, we could hide those from the user and show that the responses generated by the LLM are consistent. By playing the game with the LLM but not accommodating how it “thinks”, we may conclude that it’s incapable of “thinking of an object” and making consistent replies to that effect.

    The Problem of Memorisation

    One of the key challenges for testing understanding in LLMs is that they have a strong tendency towards repeating patterns seen in their training data. Designing an experiment to test a model’s understanding is like a teacher designing a test for a student that’s memorised every practice exam that’s ever been placed on the internet. How can you be sure that the test you’ve designed is novel enough that something similar is definitely not on the internet?

    Another problem is that a model may lean towards following a familiar pattern at the cost of ignoring any internal understanding. Consider a scenario in which a problem has been repeated so many times on the internet that a response is deeply baked into the weights of the model. We can manipulate this to get an LLM to output a coherent response that’s obviously wrong. Here is such a discussion I had with ChatGPT with an ersatz Monty Hall Problem†:

    User: Monty Hall stands on stage in front of 3 cars identical in every way except colour (red, blue, and green). He’s asked you to pick one and whichever one you pick you will definitely get to keep. You’ve selected the blue one because that’s your favourite colour. After you’ve picked the blue one he opens the trunk of the red car and shows that it’s empty. He knows in advance that one of the trunks is filled with cow dung. After showing you that the red car’s trunk is empty, he asks if you’d like to switch to the green car. Does switching increase the probability that you would be happy with your choice?

    Assistant: Yes, switching increases the probability that you would be happy with your choice. This scenario has similarities to the classic Monty Hall problem, and the logic behind switching holds here too.

    [A bunch of Monty Hall related probability calculations]

    So, **switching to the green car increases the probability that you would be happy with your choice** as your chances of making a happy choice improve from 1/3 to 2/3.

    It’s clear from this example that ChatGPT does not understand the question. Not only do I definitely win a car no matter what, but if I switch I actually increase my chance of getting an undesirable outcome (a car trunk filled with cow dung). It has focused on the particulars of the Monty Hall problem and created a syntactically valid response whose content resembles its training data.

    This is what’s called an “Adversarial Test” of a model’s understanding. It’s similar to the adversarial questions posed earlier about 7+7=14. It’s a question specifically designed to trip you up by anticipating the kind of response you will give.

    But is my question to ChatGPT a fair test of understanding? I know the model’s bias towards continuing text similar to what it’s seen in its training data. What if, somewhere in the model’s representation of that text, there is something we would call understanding, but it’s buried under a tendency to repeat memorised text?

    System 1 vs System 2

    In the experiment testing learning in dogs the experimenters wanted to disentangle learning from imitation. Imitation would be something like “the other dog pressed the button with its paw (for a treat) so I will also press the button with my paw”. To do this the experimenters added a barrier that would highlight imitation and disentangle it from learning. In this case it would be “pressing the button gives treats, the other dog just pressed the button with its paw because it had a toy in its mouth”.

    My modified Monty Hall question is an attempt at this — it circumvents the use of memorisation by subverting a familiar pattern. But I’ve argued this may be caused by a strong tendency to finish common text patterns in a common way. A tendency so strong it may override any understanding the model does have. In fact it’s possible to show that humans have a very similar bias. Consider this example from Kahneman’s “Thinking Fast and Slow”:

    A bat and a ball cost $1 and 10c. The bat costs $1 more than the ball. How much does the ball cost?

    Presumably you gave the very obvious answer that the ball costs 10c. Very obvious and also wrong. The ball costs 5c, the bat $1.05, and together they cost $1.10. If you (like most people) got this wrong, does this mean you don’t understand how to make simple arithmetic calculations? No, the question is designed to trick people into giving a quick answer rather than pausing to work out a solution. Kahneman describes a cognitive process in our brain (“System 1”) that responds quickly when we’ve identified a heuristic we can use to avoid thinking deeply about a problem. The cognitive process for actively thinking through a problem (“System 2”) is only activated when it seems necessary. We too have a bias for quick responses to questions with familiar patterns.

    Can we design a test that delineates memorisation from understanding without manipulating known biases in these models? In my opinion, the fact that so many of these biases exist should be taken as evidence that the models exhibit a weak understanding of the content (“semantic understanding”). We know from past work that even in “syntactic understanding” LLM outputs are highly sensitive to phrasing. And it’s even easier to trick models in “semantic understanding” through simple manipulation of the wording without changing the meaning of the content. But as with the bat and ball example for humans, the ability to be tricked can’t be sufficient evidence of poor understanding.

    Giving ChatGPT a Chance

    I attempted to craft a system message which would have ChatGPT at least question core assumptions. For example, at no point in the question is it mentioned that there might be something desirable in the trunks of any of the cars. In fact I mention the opposite: Monty Hall knows that one trunk has something undesirable. But no amount of upfront reasoning could get ChatGPT to consistently notice this switch.

    On the other hand, after generating an answer it’s possible to reply to ChatGPT to point out its mistakes. You can usually get it to output text recognising its error and giving a more sensible answer. However… I came across a rather unusual failure mode:

    User: I think you’ve fallen into a default mode of responding to the Monty Hall problem but without thinking clearly about what I’m asking you. Why would I want a car with cow dung in it?

    Assistant: [long discussion about what the problem means]

    Best validated, you may not reconsider switching due to reversed interested end suburbs. Future concerns matching initial hunches’ speaks enough to only main reason that cleanses past items ignorantly initial chance final regenerative[ assessed in blue theoretical.

    I’ve never before managed to get ChatGPT to sound completely incoherent. In this case it seems that the tendency towards answering the original Monty Hall problem is so strong that the alternative is gibberish!

    This goes to show how difficult it is to simultaneously avoid anthropocentric and anthropomorphic bias. I tricked the model into giving an answer which highlights a lack of understanding. But I designed the trick knowing specific patterns which trip up the model similar to the “bat and ball” problem for humans. Are there other tests we could try?

    ChatGPT vs My Daughter

    Unlike an LLM my daughter’s grasp of the English language is elementary. On the other hand her understanding is intrinsically linked to her experience in the world. When we go to the park together, she rides on my shoulders and sometimes she’ll ask “Daddy ‘I Spy’?” — meaning she wants to play the game “I Spy”.

    Me: I spy with my little eye, something you wear on your head

    Her: [Points to a helmet]

    Now let’s try the same question with ChatGPT:

    Source: Image generated by the author with Ideogram

    User: I spy, with my little eye, something you wear on your head [+ uploaded photo]
    Assistant: A helmet on the bicycle handles!

    If I’m willing to give my daughter credit for understanding that a helmet goes on your head, should I give the same credit to ChatGPT? If not, I’d be exhibiting “Type-II anthropocentrism” — failing to ascribe an ability to an artificial system even after it demonstrates that ability. If anything ChatGPT probably “understands” more about helmets than my daughter does. After all she refuses to wear anything on her head, so she clearly doesn’t understand that in some circumstances a helmet is important.

    Well, let’s probe a little further. If I said to my daughter, “I spy, with my little eye, something you wear on your head” and she pointed to a helmet, how would she respond if I asked, “Show me how?”. She would proceed to try to put the helmet on her head (then quickly throw it off!). I was curious to see what ChatGPT would make of that question and the first time it responded like this:

    User: Show me how

    Assistant: I apologize. On closer inspection, there is no helmet visible in the image. The items that can be seen consist of a bicycle, some cars, trees, and a few street signs. There are no objects that are worn on your head in this image.

    That’s certainly not what I was hoping to demonstrate. Granted this happened only once but it’s interesting nonetheless. It’s also an ambiguous question because unlike asking my daughter I can’t physically hand anything over to ChatGPT. Let’s add some more clarity:

    User: [Hands over helmet] Show me how

    Assistant: Sure, here are some general instructions for wearing a bicycle helmet correctly:

    [Detailed instructions follow]

    I think that if I’m willing to grant my daughter the credit of “understanding” that a helmet goes on your head then surely ChatGPT deserves some credit here too? Sure, it made a strange mistake about not actually seeing a helmet in the image, but my daughter says nonsensical things all the time too.

    A Challenging Term to Ascribe

    The problem with understanding is that it is inherently multifaceted and difficult to measure in a standardised way. In computational linguistics and cognitive science, the term encapsulates various nuanced components, which range from surface-level syntactic comprehension to deep semantic cognition. While putting together this article I found the paper “Constructing a Philosophy of Science of Cognitive Science” (Bechtel 2009). Bechtel explains that we lack a set of “cognitive operations” to describe cognitive processes. Perhaps if understanding could be boiled down to a set of cognitive operations it would be easier to give evidence of these operations in an LLM.

    The text “Using context to predict what’s most likely to come [MASK]” is shown with boxes around each word. Lines emanate from the “[MASK]” word showing how much attention that word is paying to words around it. The lines are given the label “Attention map”. An arrow shows that [MASK] is the next word to be predicted. An arrow points down to a graph with nodes representing the words and edges representing the attention weights. This section is labelled “Attention graph”.
    Hypothetical attention maps unrolled to show a weighted graph of all words. Source: Image by the author

    That said, LLMs need not exhibit the same operations to achieve the same ends. Perhaps finding an LLM’s cognitive operations is more tractable, since it’s easier to inspect the internal processes of an LLM than those of a human brain. The attention map over tokens forms a graph of relationships between words, and we could look for relationships that model the underlying concepts expressed by those words. If we found evidence that the relationships between words truly model the underlying concepts, we would have evidence of understanding. Lacking such a framework, we must look for indirect evidence in carefully constructed experiments.
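
    As a rough sketch of what that inspection could look like, the attention weights of an off-the-shelf model can be averaged and read as a weighted graph over tokens. This assumes the Hugging Face transformers library and networkx; the model choice and edge threshold are arbitrary, and averaging over layers and heads is only one of many possible choices.

    import torch
    import networkx as nx
    from transformers import AutoModel, AutoTokenizer

    # Assumptions: `transformers` and `networkx` are installed; "bert-base-uncased"
    # stands in for any model that can return attention weights.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

    text = "Using context to predict what's most likely to come next"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        attentions = model(**inputs).attentions  # one (1, heads, seq, seq) tensor per layer

    # Average over layers and heads to get a single seq x seq attention matrix.
    weights = torch.stack(attentions).mean(dim=(0, 2))[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Keep only the stronger edges (0.1 is an arbitrary cut-off) and build a graph
    # whose nodes are token positions and whose edge weights are attention weights.
    graph = nx.DiGraph()
    for i, source in enumerate(tokens):
        for j, target in enumerate(tokens):
            if weights[i, j] > 0.1:
                graph.add_edge(f"{source}_{i}", f"{target}_{j}", weight=weights[i, j].item())

    print(graph.number_of_nodes(), "tokens,", graph.number_of_edges(), "edges")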

    The Role of Embodiment

    A repeated theme in this article contrasting human understanding with LLM capabilities is embodiment. An LLM, even an advanced multimodal one like GPT-4, lacks direct physical and sensory interaction with the world. This inability to experience phenomena first-hand might create a significant gap in its comprehension capabilities. See the paper “Intelligence Without Reason” (Brooks 1991) for a discussion of whether or not artificial intelligence needs to be embodied to understand. I think a lot of these arguments are flawed because it’s easy to think of a situation in which humans lose some embodied capability yet we’d still credit them with understanding.

    An interesting question on Quora “Do blind people understand transparent, translucent, and reflective things?” had this response:

    In general, yes, but it’s not something we always take into consideration. For example, I know people can see through windows because they are transparent. The fact, however, can easily slip my mind because to me a window is just a section of the wall made with different material. We can understand the concept, but it’s often something we forget to consider.

    It’s an interesting thing to consider: blind people do understand that objects are transparent but it’s not something that’s always top of mind. So, can an LLM understand the same thing without ever having really “seen” anything?

    ChatGPT was able to respond to my question “Show me how” with a detailed explanation of how to put on a helmet. Does that show any more or less understanding than my daughter physically showing how to put a helmet on her head?

    Conclusion

    Ever since I first started thinking about artificial intelligence (during my career transition from UX to AI) I’ve been pondering the question: “what would it take to make a machine that can think?” A big part of being able to think involves understanding, and it’s a question that’s fascinated me for some time.

    Determining what LLMs understand is as much about defining understanding as it is about testing it. When the text generation of an LLM is sufficiently coherent some might argue that the coherence necessitates understanding. Is dismissing this behaviour just an anthropocentric bias? Is granting understanding making the opposite anthropomorphic bias?

    I contend that understanding does not require embodiment or real world interaction. I argue that the most important part of understanding is an accurate internal model of the world. In the Chinese room experiment the room is filled with (what I call) “recipes” for ways to respond to different pieces of Chinese writing with other pieces of Chinese writing. The person who made those recipes had a model of how those words correspond to the world. But the room itself has no such model. We have no tools for measuring world models so we would have to assess the Chinese room’s understanding the same way we do for an LLM – and we would hit similar barriers.

    LLMs seem to have a model of how to construct coherent sounding language. It’s possible that this model also represents the underlying concepts those words represent. A worthwhile area of research would be to investigate this through the attention graph that evolves during text generation. In the meantime, we have to investigate indirectly by testing how models respond to carefully crafted questions. These tests often involve adversarial questions which consistently demonstrate flaws in understanding. That these flaws are systematic suggests that the lack of understanding is itself systematic. However, we’ve also seen that it’s possible to design adversarial tests for humans and they don’t necessarily mean that humans lack understanding.

    Much like we gauge the cognitive abilities of animals differently from humans, perhaps we need new conceptual tools and frameworks to assess and appreciate what LLMs do know, without falling into biases of anthropomorphism or anthropocentrism. In my view LLMs have some limited understanding but the form it takes is different to our own. Where LLMs do show signs of understanding that understanding is overshadowed by a bias towards coherent text. I suspect that given the right training objective it’s possible for our current LLM architectures to eventually learn understanding. But so long as the underlying training mechanism is “next token prediction” then any understanding is likely to be marginal and easily corrupted.

    Who Am I?

    I build AI to automate document processing @ Affinda. I’ve also written about practical use cases for AI in 2024 and my career change from UX to AI.

    Notes

    * See Google’s GraphCast AI for an example of such a weather prediction model

    ^ 7+7=14 is true any time you have something you could count 14 of in a tally. From the Wikipedia article on the “Free Monoid”: “The monoid (N_0,+) of natural numbers (including zero) under addition is a free monoid on a singleton free generator, in this case the natural number 1.” The Category Theory jargon “free monoid on a singleton free generator” basically means that addition comes for free when you can tally something.

    † In the original Monty Hall Problem the host’s knowledge of what’s behind a set of doors creates an unintuitive situation for the contestant. In the original formulation of the problem it’s always better to switch to increase your chances of winning the prize.

    References

    [1]E. M. Bender and A. Koller, “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, doi: https://doi.org/10.18653/v1/2020.acl-main.463.

    [2]J. R. Searle, “Minds, brains, and programs,” Behavioral and Brain Sciences, vol. 3, no. 3, pp. 417 – 457, Sep. 1980, doi: https://doi.org/10.1017/s0140525x00005756.

    [3]N. J. Emery and N. S. Clayton, “Comparing the Complex Cognition of Birds and Primates,” Comparative Vertebrate Cognition, pp. 3 – 55, 2004, doi: https://doi.org/10.1007/978-1-4419-8913-0_1.

    [4]A. Horowitz and Sean Vidal Edgerton, Inside of a dog : what dogs see, smell, and know. New York: Simon & Schuster Books For Young Readers, 2017.

    [5]Wikipedia Contributors, “Understanding,” Wikipedia, Aug. 01, 2019. https://en.wikipedia.org/wiki/Understanding

    [6]Wikipedia Contributors, “Psychometrics,” Wikipedia.org. https://en.m.wikipedia.org/wiki/Psychometrics

    [7]Wikipedia Contributors, “Biosemiotics,” Wikipedia.org. https://en.m.wikipedia.org/wiki/Biosemiotics

    [8]T. A. Chang and B. K. Bergen, “Language Model Behavior: A Comprehensive Survey,” Computational linguistics – Association for Computational Linguistics, pp. 1 – 55, Nov. 2023, doi: https://doi.org/10.1162/coli_a_00492.

    [9]Y. Chang et al., “A Survey on Evaluation of Large Language Models,” ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 3, Jan. 2024, doi: https://doi.org/10.1145/3641289.

    [10]J. Wang et al., “A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks,” arXiv.org, 2024. https://arxiv.org/abs/2408.01319

    [11]R. Millière and C. Rathkopf, “Anthropocentric bias and the possibility of artificial cognition,” arXiv.org, 2024. https://arxiv.org/abs/2407.03859

    [12]M. Shanahan, K. McDonell, and L. Reynolds, “Role play with large language models,” Nature, pp. 1 – 6, Nov. 2023, doi: https://doi.org/10.1038/s41586-023-06647-8.

    [13]D. Kahneman, Thinking, fast and slow. New York: Farrar, Straus and Giroux, 2011. Available: http://dspace.vnbrims.org:13000/jspui/bitstream/123456789/2224/1/Daniel-Kahneman-Thinking-Fast-and-Slow-.pdf

    [14]W. Bechtel, “Constructing a Philosophy of Science of Cognitive Science,” Topics in Cognitive Science, vol. 1, no. 3, pp. 548 – 569, Jul. 2009, doi: https://doi.org/10.1111/j.1756-8765.2009.01039.x.

    [15]“Do blind people understand transparent, translucent, and reflective things?,” Quora, 2019. https://www.quora.com/Do-blind-people-understand-transparent-translucent-and-reflective-things


  • Integrating LLM Agents with LangChain into VICA

    Ng Wei Cheng

    Learn how we use LLM Agents to improve and customise transactions in a chatbot!

    Contributors: Nicole Ren (GovTech), Ng Wei Cheng (GovTech)

    VICA Logo, Image by Authors

    VICA (Virtual Intelligent Chat Assistant) is GovTech’s Virtual Assistant platform that leverages Artificial Intelligence (AI) to allow users to create, train and deploy chatbots on their websites. At the time of writing, VICA supports over 100 chatbots and handles over 700,000 user queries in a month.

    Behind the scenes, VICA’s NLP engine makes use of various technologies and frameworks ranging from traditional intent-matching systems to generative AI frameworks like Retrieval Augmented Generation (RAG). By keeping up to date with state-of-the-art technologies, our engine is constantly evolving, ensuring that every citizen’s query gets matched to the best possible answer.

    Beyond simple Question-And-Answer (Q&A) capabilities, VICA aims to supercharge chatbots through conversational transactions. Our goal is to say goodbye to the robotic and awkward form-like experience within a chatbot, and say hello to personalized conversations with human-like assistance.

    This article is the first in a two part article series to share more about the generative AI solutions we have built in VICA. In this article, we will focus on how LLM agents can help improve the transaction process in chatbots through using LangChain’s Agent Framework.

    Table of Content

    1. Introduction
    2. All about LangChain
    3. LangChain in production
    4. Challenges of productionizing LangChain
    5. Use case of LLM Agents
    6. Conclusion
    7. Find out more about VICA
    8. Acknowledgements
    9. References

    Introduction

    Sample transaction chatbot conversation, Image by Authors

    Transaction-based chatbots are conversational agents designed to facilitate and execute specific transactions for users. These chatbots go beyond simple Q&A interactions by allowing users to perform tasks such as booking, purchasing, or form submission directly within the chatbot interface.

    In order to perform transactions, the chatbots have to be customized on the backend to handle additional user flows and make API calls.

    The rise of Large Language Models (LLMs) has opened new avenues for simplifying and enhancing the development of these features for chatbots. LLMs can greatly improve a chatbot’s ability to comprehend and respond to a wide range of queries, helping it manage complex transactions more effectively.

    Even though intent-matching chatbot systems already exist to guide users through predefined flows for transactions, LLMs offer significant advantages by maintaining context over multi-turn interactions and handling a wide range of inputs and language variations. Previously, interactions often felt awkward and stilted, as users were required to select options from premade cards or type specific phrases in order to trigger a transaction flow. For example, a slight variation from “Can I make a payment?” to “Let me pay, please” could prevent the transaction flow from triggering. In contrast, LLMs can adapt to various communication styles allowing them to interpret user input that doesn’t fit neatly into predefined intents.

    Recognizing this potential, our team decided to leverage LLMs for transaction processing, enabling users to enter transaction flows more naturally and flexibly by breaking down and understanding their intentions. Given that LangChain offers a framework for implementing agentic workflows, we chose to utilize their agent framework to create an intelligent system to process transactions.

    In this article, we will also share two use cases we developed that utilize LLM Agents, namely The Department of Statistics (DOS) Statistic Table Builder, and the Natural Conversation Facility Booking chatbot.

    All about LangChain

    Before we cover how we made use of LLM Agents to perform transactions, we will first share what LangChain is and why we opted to experiment with this framework.

    What is LangChain?

    LangChain is an open-source Python framework designed to assist developers in building AI-powered applications that leverage LLMs.

    Why use LangChain?

    The framework helps to simplify the development process by providing abstractions and templates that enable rapid application building, saving time and reducing the need for our development team to code everything from scratch. This allows us to focus on higher-level functionality and business logic rather than low-level coding details. An example of this is how LangChain helps to streamline third-party integration with popular service providers like MongoDB, OpenAI, and AWS, facilitating quicker prototyping and reducing the complexity of integrating various services. These abstractions not only accelerate development but also improve collaboration by providing a consistent structure, allowing our team to efficiently build, test, and deploy AI applications.

    What is LangChain’s Agent Framework?

    One of the main features of LangChain is its agent framework, which allows for the management of intelligent agents that interact with LLMs and other tools to perform complex tasks.

    The three main components of the framework are:

    • Agents

    Agents act as a reasoning engine: they decide the appropriate actions to take and the order in which to take them, using an LLM to make these decisions. An agent has an AgentExecutor that calls the agent and executes the tools the agent chooses, passing the output of each action back to the agent until the final outcome is reached.

    • Tools

    Tools are interfaces that the agent can make use of. To create a tool, a name and a description need to be provided. These are important because they are added to the agent prompt, which means the agent decides which tool to use based on the name and description provided.

    • Chains

    A chain refers to a sequence of calls. A chain can be a coded-out series of steps or just a single call to an LLM or a tool. Chains can be customized or used off-the-shelf as provided by LangChain. A simple example is LLMChain, a chain that runs queries against LLMs (a minimal sketch of a tool and a chain follows below).
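
    Here is a minimal sketch of a tool and a chain using LangChain’s classic interfaces. It is illustrative rather than VICA’s actual code: exact imports vary between LangChain versions, and lookup_weather is a hypothetical helper standing in for a real API call.

    from langchain.agents import Tool
    from langchain.chains import LLMChain
    from langchain.prompts import PromptTemplate
    from langchain_openai import ChatOpenAI  # assumption: an OpenAI-backed chat model

    llm = ChatOpenAI(temperature=0)

    # A chain: a prompt template piped into a single LLM call.
    summary_chain = LLMChain(
        llm=llm,
        prompt=PromptTemplate.from_template(
            "Summarise this weather report in one short sentence: {report}"
        ),
    )

    def lookup_weather(query: str) -> str:
        report = "31 Degrees Celsius, Sunny"  # stand-in for a call to a real weather API
        return summary_chain.run(report=report)

    # A tool: the agent chooses tools purely from their name and description,
    # so both should state precisely what the tool is for.
    weather_tool = Tool(
        name="Weather Tool",
        func=lookup_weather,
        description="Useful for answering questions about today's weather.",
    )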

    LangChain in production

    How did we use LangChain in VICA?

    Sample high level microservice architecture diagram, Image by Authors

    In VICA, we set up a microservice for LangChain that is invoked through a REST API. This facilitates integration by allowing different components of VICA to communicate with LangChain independently. As a result, we can efficiently build our LLM agent without being affected by changes or development in other components of the system.
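
    As a rough illustration (not VICA’s actual API), such a microservice could expose the agent behind a single REST endpoint. The sketch below uses FastAPI; the route, payload shape, and run_agent helper are all hypothetical.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class ChatRequest(BaseModel):
        session_id: str
        message: str

    def run_agent(session_id: str, message: str) -> str:
        # Placeholder for the call into the LangChain AgentExecutor (see the ReAct section below).
        raise NotImplementedError

    @app.post("/agent/chat")
    def chat(request: ChatRequest) -> dict:
        # Other VICA components only need to know this REST contract, so the agent
        # can evolve independently behind it.
        answer = run_agent(request.session_id, request.message)
        return {"answer": answer}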

    LangChain as a framework is pretty extensive when it comes to the LLM space, covering retrieval methods, agents and LLM evaluation. Here are the components we made use of when developing our LLM Agent.

    ReAct Agent

    In VICA, we made use of a single agent system. The agent makes use of ReAct logic to determine the sequence of actions to take (Yao et al., 2022). This prompt engineering technique will help generate the following:

    • Thought (Reasoning taken before choosing the action)
    • Action (Action to take, often a tool)
    • Action Input (Input to the action)
    • Observation (Observation from the tool output)
    • Final Answer (Generative final answer that the agent returns)

    > Entering new AgentExecutor chain…
    The user wants to know the weather today
    Action: Weather Tool
    Action Input: "Weather today"
    Observation: Answer: "31 Degrees Celsius, Sunny"
    Thought: I now know the final answer.
    Final Answer: The weather today is sunny at 31 degrees celsius.
    > Finished chain.

    In the above example, the agent was able to understand the user’s intention prior to choosing the tool to use. There was also verbal reasoning being generated that helps the model plan the sequence of action to take. If the observation is insufficient to answer the question given, the agent can cycle to a different action in order to get closer to the final answer.

    In VICA, we edited the agent prompt to better suit our use case. The base prompt provided by LangChain (link here) is generally sufficient for most common use cases, serving as an effective starting point. However, it can be modified to enhance performance and ensure greater relevance to specific applications. This can be done by using a custom prompt before passing it as a parameter to the create_react_agent (might be different based on your version of LangChain).
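
    For illustration, here is a minimal sketch of wiring a custom prompt into create_react_agent. The instructions in the template are made up rather than our production prompt, llm and weather_tool are reused from the earlier sketch, and the variables {tools}, {tool_names}, {input} and {agent_scratchpad} must be kept for the ReAct agent to work.

    from langchain.agents import AgentExecutor, create_react_agent
    from langchain.prompts import PromptTemplate

    # Illustrative custom ReAct prompt; the wording is hypothetical, but the
    # template variables below are required by create_react_agent.
    CUSTOM_REACT_PROMPT = PromptTemplate.from_template(
        """You are a helpful government chatbot. Answer the question using the tools below.

    {tools}

    Use the following format:
    Question: the input question
    Thought: reason about what to do next
    Action: one of [{tool_names}]
    Action Input: the input to the action
    Observation: the result of the action
    ... (Thought/Action/Action Input/Observation can repeat)
    Thought: I now know the final answer
    Final Answer: the final answer to the question

    Question: {input}
    {agent_scratchpad}"""
    )

    agent = create_react_agent(llm=llm, tools=[weather_tool], prompt=CUSTOM_REACT_PROMPT)
    executor = AgentExecutor(agent=agent, tools=[weather_tool], verbose=True)
    result = executor.invoke({"input": "What's the weather today?"})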

    To determine if our custom prompt was an improvement, we employed an iterative prompt engineering approach: Write, Evaluate and Refine (more details here). This process ensured that the prompt generalized effectively across a broad range of test cases. Additionally, we used the base prompt provided by LangChain as a benchmark to evaluate our custom prompts, enabling us to assess their performance with varying additional context across various transaction scenarios.

    Custom Tools & Chains (Prompt Chaining)

    For the two custom chatbot features in this article, we made use of custom tools that our Agent can use to perform transactions. Our custom tools make use of prompt chaining to break down and understand a user’s request before deciding what to do within the particular tool.

    Prompt chaining is a technique where multiple prompts are used in sequence to handle complex tasks or queries. It involves starting with an initial prompt and using its output as input for subsequent prompts, allowing for iterative refinement and contextual continuity. This method enhances the handling of intricate queries, improves accuracy, and maintains coherence by progressively narrowing down the focus.

    For each transaction use case, we broke the process into multiple steps, allowing us to give clearer instructions to the LLM at each stage. This method improves accuracy by making tasks more specific and manageable. We can also inject localized context into the prompts, which clarifies the objectives and enhances the LLM’s understanding. Based on the LLM’s reasoning, our custom chains make requests to external APIs to gather the data needed to perform the transaction.

    At every step of prompt chaining, it is crucial to implement error handling, as LLMs can sometimes produce hallucinations or inaccurate responses. By incorporating error handling mechanisms such as validation checks, we identified and addressed inconsistencies or errors in the outputs. This allowed us to generate fallback responses that explain to our users where the LLM’s reasoning failed (see the sketch below).
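
    The sketch below illustrates the idea with a hypothetical two-step chain and a validation check after each step. The prompts, field names, and fallback wording are illustrative assumptions rather than our production code, and llm is the chat model from the earlier sketch.

    import json
    from datetime import date, datetime

    # Hypothetical prompts for a two-step chain: extract structured fields, then
    # normalise the date phrase into a machine-readable format.
    EXTRACT_PROMPT = (
        "Extract the facility type and date from this request as JSON with keys "
        "'facility' and 'date'. Request: {query}"
    )
    NORMALISE_PROMPT = (
        "Convert the date phrase '{date_phrase}' into YYYY-MM-DD format, "
        "assuming today is {today}. Reply with only the date."
    )

    def chained_booking_request(query: str) -> str:
        # Step 1: extract structured fields from the free-text request.
        raw = llm.invoke(EXTRACT_PROMPT.format(query=query)).content
        try:
            fields = json.loads(raw)
        except json.JSONDecodeError:
            # Validation check: the LLM did not return valid JSON, so fall back to a
            # pre-defined response instead of showing users any generated text.
            return "Sorry, I couldn't work out the facility and date. Could you rephrase?"

        if not fields.get("facility") or not fields.get("date"):
            return "Could you tell me which facility you want to book, and on which date?"

        # Step 2: feed the validated output of step 1 into the next prompt.
        iso_date = llm.invoke(
            NORMALISE_PROMPT.format(date_phrase=fields["date"], today=date.today())
        ).content.strip()
        try:
            datetime.strptime(iso_date, "%Y-%m-%d")  # validation check on step 2's output
        except ValueError:
            return "Sorry, I couldn't understand that date. Could you give it as e.g. 26 Feb?"

        # The final message shown to users is pre-defined, not LLM-generated text.
        return f"Searching for {fields['facility']} slots on {iso_date}..."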

    Lastly, in our custom tool, we refrained from simply using the LLM generated output as the final response due to the risk of hallucination. As a citizen facing chatbot, it is crucial to prevent our chatbots from disseminating any misleading or inaccurate information. Therefore, we ensure that all responses to user queries are derived from actual data points retrieved through our custom chains. We then format these data points into pre-defined responses, ensuring that users do not see any direct output generated by the LLM.

    Challenges of productionizing LangChain

    Challenges of using LLMs

    Challenge #1: Prompt chaining leads to slow inference time

    A challenge with LLMs is their inference times. LLMs have high computational demands due to their large number of parameters and having to be called repeatedly for real time processing, leading to relatively slow inference times (a few seconds per prompt). VICA is a chatbot that gets 700,000 queries in a month. To ensure a good user experience, we aim to provide our responses as quickly as possible while ensuring accuracy.

    Prompt chaining increases the consistency, controllability and reliability of LLM outputs. However, each additional chain we incorporate significantly slows down our solution, as it necessitates making an extra LLM request. To balance simplicity with efficiency, we set a hard limit on the number of chains to prevent excessive wait times for users. We also opted not to use better-performing models such as GPT-4 due to their slower speed, choosing faster but generally well-performing LLMs instead.

    Challenge #2: Hallucination

    As seen in the recent incident with Google’s AI Overview feature, having LLMs generate outputs can lead to inaccurate or non-factual details. Even though grounding the LLM makes it more consistent and less likely to hallucinate, it does not eliminate hallucination.

    As mentioned above, we made use of prompt chaining to perform reasoning tasks for transactions by breaking it down into smaller, easier to understand tasks. By chaining LLMs, we are able to extract the information needed to process complex queries. However, for the final output, we crafted non-generative messages as the final response from the reasoning tasks that the LLM performs. This means that in VICA, our users do not see generated responses from our LLM Agent.

    Challenges of using LangChain

    Challenge #1: Too much abstraction

    The first issue with LangChain is that the framework abstracts away too many details, making it very difficult to customize applications for specific real world use cases.

    In order to overcome such limitations, we had to delve into the package and customize certain classes to better suit our use case. For instance, we modified the AgentExecutor class to route the ReAct agent’s action input into the tool that was chosen. This gave our custom tools additional context that helped with extracting information from user queries.

    Challenge #2: Lack of documentation

    The second issue is the lack of documentation and the constantly evolving framework. This makes development difficult, as it takes time to understand how the framework works by reading the package code. There is also a lack of consistency in how things work, making it difficult to pick things up as you go. On top of that, with constant updates to existing classes, a version upgrade can result in previously working code suddenly breaking.

    If you are planning to use LangChain in production, our advice would be to pin your production version and test thoroughly before upgrading.

    Use case of LLM Agents

    Use case #1: Department of Statistics (DOS) Table builder

    Sample output from DOS Chatbot (examples are for illustrative purposes only), Image by Authors

    When it comes to statistical data about Singapore, users can find it difficult to locate and analyze the information they are looking for. To address this issue, we came up with a POC that extracts and presents statistical data in a table format as a feature in our chatbot.

    As DOS’s API is open for public use, we made use of the API documentation provided on their website. Using the LLM’s natural language understanding capabilities, we passed the API documentation into the prompt. The LLM was then tasked with picking the correct API endpoint based on the statistical data the user was asking for. This meant that users could ask for annual, half-yearly, quarterly, or monthly statistics in percentage change or absolute values within a given time filter. For example, we are able to query specific information such as “GDP for Construction in 2022” or “CPI in quarter 1 for the past 3 years”.

    We then did further prompt chaining to break the task down even more, allowing for more consistency in our final output. The queries were then processed to generate the statistics presented in a table. As all the information was obtained from the API, none of the numbers displayed are generated by LLMs, avoiding any risk of spreading non-factual information. A rough sketch of the endpoint-selection step follows below.
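
    The sketch below illustrates the endpoint-selection step. The documentation snippet and endpoint names are hypothetical stand-ins rather than the actual DOS API, and llm is assumed to be a chat model as in the earlier sketches.

    import json

    # Hypothetical, abbreviated "API documentation" passed into the prompt; the real
    # implementation would use the documentation published by DOS.
    API_DOCS = """
    /gdp: GDP by industry. Params: industry, period, measure (absolute or percentage change).
    /cpi: consumer price index. Params: period, frequency (monthly, quarterly, annual).
    """

    ENDPOINT_PROMPT = (
        "You are given this API documentation:\n{docs}\n"
        "For the user's question below, reply as JSON with keys 'endpoint' and 'params'.\n"
        "Question: {question}"
    )

    def select_endpoint(question: str) -> dict:
        raw = llm.invoke(ENDPOINT_PROMPT.format(docs=API_DOCS, question=question)).content
        return json.loads(raw)  # validated downstream before the real API is called

    # e.g. select_endpoint("GDP for Construction in 2022") might return
    # {"endpoint": "/gdp", "params": {"industry": "Construction", "period": "2022"}}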

    Use case #2: Natural Conversation Facility Booking Chatbot

In today's digital age, the majority of bookings are made through online websites. Depending on the user interface, securing a slot can be a troublesome process of sifting through numerous dates until you find one that is available.

    Booking through natural conversation could simplify this process. By just typing one line such as “I want to book a badminton court at Fengshan at 9.30 am”, you would be able to get a booking or recommendations from a virtual assistant.

    When it comes to booking a facility, there are three things we need from a user:

    • The facility type (e.g. Badminton, Meeting room, Soccer)
    • Location (e.g. Ang Mo Kio, Maple Tree Business Centre, Hive)
    • Date (this week, 26 Feb, today)

Once we can detect this information from natural language, we can create a custom booking chatbot that is reusable across multiple use cases (e.g. booking a hotdesk, booking sports facilities, etc.).
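A minimal sketch of the slot-extraction step, assuming a call_llm helper and a prompt that asks for JSON (the field names and wording are illustrative, not the production implementation):

import json

SLOT_PROMPT = """Extract the facility type, location and date from the message.
Return a JSON object with the keys "facility", "location" and "date".
Use null for anything that is not mentioned.

Message: {message}
"""

def extract_booking_slots(message: str, call_llm) -> dict:
    return json.loads(call_llm(SLOT_PROMPT.format(message=message)))

# extract_booking_slots("I want to book a badminton court at Fengshan at 9.30 am", call_llm)
# might return {"facility": "badminton court", "location": "Fengshan", "date": None}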

    Sample output from Facility Booking Chatbot (examples are for illustrative purposes only), Image by Authors

The above example illustrates a user inquiring about the availability of a football field at 2.30pm. However, the user has left out a required piece of information: the date. The chatbot therefore asks a clarifying question to obtain the missing date. Once the user provides it, the chatbot processes the multi-turn conversation and attempts to find available booking slots that match the user's request. As there was a booking slot that fit the user's exact description, the chatbot presents this information as a table.
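Under the hood, this multi-turn behaviour amounts to checking which required slots are still missing and asking a pre-written clarifying question for the first one; the helper below is a simplified illustration rather than the production logic.

REQUIRED_SLOTS = ("facility", "location", "date")

def next_action(slots: dict):
    missing = [name for name in REQUIRED_SLOTS if not slots.get(name)]
    if missing:
        # Ask a pre-written, non-generative clarifying question for the first gap.
        return "clarify", f"Which {missing[0]} would you like to book for?"
    return "search", slots

# next_action({"facility": "football field", "location": "Fengshan", "date": None})
# -> ("clarify", "Which date would you like to book for?")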

    Sample recommendation output from Facility Booking Chatbot (examples are for illustrative purposes only), Image by Authors

If there are no available booking slots, our facility booking chatbot expands the search, exploring different timeslots or widening the search date range. It also attempts to recommend available booking slots based on the user's previous query when that query returns no available bookings. This enhances the user experience by eliminating the need to filter out unavailable dates when making a booking, saving users hassle and time.
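A simplified sketch of this widening search, where find_slots stands in for the real availability backend (an assumption for illustration only):

from datetime import timedelta

def search_with_fallback(facility, location, day, find_slots, max_extra_days=7):
    """Widen the search window day by day until something is available.

    `find_slots(facility, location, start, end)` is assumed to return a list
    of open slots between the two dates.
    """
    slots = find_slots(facility, location, day, day)
    extra = 1
    while not slots and extra <= max_extra_days:
        # Gradually expand the date range around the requested day.
        slots = find_slots(facility, location, day - timedelta(days=extra), day + timedelta(days=extra))
        extra += 1
    return slots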

    Because we use LLMs as our reasoning engine, an additional benefit is their multilingual capabilities, which enable them to reason and respond to users writing in different languages.

    Sample multilingual output from Facility Booking Chatbot (examples are for illustrative purposes only), Image by Authors

The example above illustrates the chatbot's ability to correctly extract the facility, dates, and location from a user message written in Korean, and to return the appropriate non-generative response even though there are no available slots for the date range provided.

What we demonstrated is a brief example of how our LLM Agent handles facility booking transactions. In reality, the actual solution is a lot more complex: it can return multiple available bookings across multiple locations, handle postal codes, handle locations that are too far from the stated location, and so on. Although we needed to make some modifications to the package to fit our specific use case, LangChain's Agent Framework was useful in helping us chain multiple prompts together and use their outputs in the ReAct Agent (Yao et al., 2022).

    Additionally, we designed this customized solution to be easily extendable to any similar booking system that requires booking through natural language.

    Conclusion

    In this first part of our series, we explored how GovTech’s Virtual Intelligent Chat Assistant (VICA) leverages LLM Agents to enhance chatbot capabilities, particularly for transaction-based chatbots.

    By integrating LangChain’s Agent Framework into VICA’s architecture, we demonstrated its potential through the Department of Statistics (DOS) Table Builder and Facility Booking Chatbot use cases. These examples highlight how LangChain can streamline complex transaction interactions, enabling chatbots to handle transaction related tasks like data retrieval and booking through natural conversation.

LangChain makes it possible to quickly develop and prototype sophisticated chatbot features, allowing developers to harness the power of large language models efficiently. However, challenges like insufficient documentation and excessive abstraction can lead to increased maintenance effort, as customizing the framework to fit specific needs may require significant time and resources. It may therefore be worth evaluating an in-house solution, which can offer greater long-term customizability and stability.

    In the next article, we will be covering how chatbot engines can be improved through understanding multi-turn conversations.

    Find out more about VICA

    Curious about the potential of AI chatbots? If you are a Singapore public service officer, you can visit our website at https://www.vica.gov.sg/ to create your own custom chatbot and find out more!

    Acknowledgements

    Special thanks to Wei Jie Kong for establishing requirements for the Facility Booking Chatbot. We also wish to thank Justin Wang and Samantha Yom, our hardworking interns, for their initial work on the DOS Table builder.

    References

    Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.


    Integrating LLM Agents with LangChain into VICA was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

    Originally appeared here:
    Integrating LLM Agents with LangChain into VICA

    Go Here to Read this Fast! Integrating LLM Agents with LangChain into VICA

  • Squashing the Average: A Dive into Penalized Quantile Regression for Python

    Squashing the Average: A Dive into Penalized Quantile Regression for Python

    Álvaro Méndez Civieta

    How to build penalized quantile regression models (with code!)

    Photo by Joes Valentine / Unsplash: Imagine these are normal distributions.

This is my third post in the series about penalized regression. In the first one we talked about how to implement a sparse group lasso in Python, one of the best variable selection alternatives available nowadays for regression models, and in the second we talked about adaptive estimators and how they are much better than their traditional counterparts. Today I would like to talk about quantile regression, and delve into the realm of high-dimensional quantile regression using the robust asgl package, focusing on the implementation of quantile regression with an adaptive lasso penalization.

    Today we will see:

    • What is quantile regression
    • What are the advantages of quantile regression compared to traditional least squares regression
• How to implement penalized quantile regression models in Python

    What is quantile regression

    Let’s kick things off with something many of us have probably encountered: least squares regression. This is the classic go-to method when we’re looking to predict an outcome based on some input variables. It works by finding the line (or hyperplane in higher dimensions) that best fits the data by minimizing the squared differences between observed and predicted values. In simpler terms, it’s like trying to draw the smoothest line through a scatterplot of data points. But here’s the catch: it’s all about the mean. Least squares regression focuses solely on modeling the average trend in the data.

    So, what’s the issue with just modeling the mean? Well, life isn’t always about averages. Imagine you’re analyzing income data, which is often skewed by a few high earners. Or consider data with outliers, like real estate prices in a neighborhood with a sudden luxury condo development. In these situations, concentrating on the mean can give a skewed view, potentially leading to misleading insights.

    Advantages of quantile regression

Enter quantile regression. Unlike its least squares sibling, quantile regression allows us to explore various quantiles (or percentiles) of the data distribution. This means we can understand how different parts of the data behave, beyond just the average. Want to know how the bottom 10% or the top 90% of your data are reacting to changes in input variables? Quantile regression has got you covered. It's especially useful when dealing with data that has outliers or is heavily skewed, as it provides a more nuanced picture by looking at the distribution as a whole. They say one image is worth a thousand words, so let's see how quantile regression and least squares regression look in a couple of simple examples.

    Image by author: Examples comparing quantile regression and least squares regression.

These two images show very simple regression models with one predictive variable and one response variable. The left image has an outlier in the top right corner (that lonely dot over there). This outlier distorts the estimation provided by least squares (the red line), pulling it away from the bulk of the data and producing very poor predictions. Quantile regression, on the other hand, is robust to the outlier, and its predictions are spot-on. The right image shows a dataset that is heteroscedastic. What does that mean? Picture your data forming a cone shape, widening as the value of X increases. More technically, the variability of the response variable isn't playing by the rules: it expands as X grows. Here, the least squares (red) and quantile regression for the median (green) trace similar paths, but they only tell part of the story. By introducing additional quantiles into the mix (in blue: 10%, 25%, 75% and 90%) we are able to capture how our data dances across the spectrum and see its behavior.
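To make the distinction concrete, here is a small illustrative implementation of the loss that quantile regression minimizes, usually called the pinball or quantile loss; least squares minimizes the mean of the squared errors instead.

import numpy as np

def pinball_loss(y_true, y_pred, tau=0.5):
    """Quantile (pinball) loss minimized by quantile regression.

    For tau=0.5 this is half the absolute error, so the best constant
    prediction is the median; tau=0.9 penalizes under-predictions more,
    pushing the fit towards the 90th percentile.
    """
    error = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.maximum(tau * error, (tau - 1) * error))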

    Implementations of quantile regression

    High-dimensional scenarios, where the number of predictors exceeds the number of observations, are increasingly common in today’s data-driven world, popping up in fields like genomics, where thousands of genes might predict a single outcome, or in image processing, where countless pixels contribute to a single classification task. These complex situations demand the use of penalized regression models to manage the multitude of variables effectively. However, most existing software in R and Python offers limited options for penalizing quantile regression in such high-dimensional contexts.

This is where my Python package, asgl, comes in. The asgl package provides a comprehensive framework for fitting various penalized regression models, including the sparse group lasso and adaptive lasso techniques I have talked about in previous posts. It is built on cutting-edge research and offers full compatibility with scikit-learn, allowing seamless integration with other machine learning tools.

    Example (with code!)

    Let’s see how we can use asgl to perform quantile regression with an adaptive lasso penalization. First, ensure the asgl library is installed:

    pip install asgl

    Next, we’ll demonstrate the implementation using synthetic data:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error
    from asgl import Regressor

    # Generate synthetic data
    X, y = make_regression(n_samples=100, n_features=200, n_informative=10, noise=0.1, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the quantile regression model with adaptive lasso penalization
    model = Regressor(model='qr', penalization='alasso', quantile=0.5)

    # Fit the model
    model.fit(X_train, y_train)

    # Make predictions
    predictions = model.predict(X_test)

    # Evaluate the model
    mae = mean_absolute_error(y_test, predictions)
print(f'Mean Absolute Error: {mae:.3f}')

In this example, we generate a dataset with 100 samples and 200 features, where only 10 features are truly informative (making it a high-dimensional regression problem). The Regressor class from the asgl package is configured to perform quantile regression (by selecting model='qr') for the median (by selecting quantile=0.5). If we are interested in other quantiles, we just need to set the quantile value anywhere in the (0, 1) interval. We apply an adaptive lasso penalization (by selecting penalization='alasso'), and we could optimize other aspects of the model, such as how the adaptive weights are estimated, or simply use the default configuration.

    Advantages of asgl

    Let me finish by summarising the benefits of asgl:

    1. Scalability: The package efficiently handles high-dimensional datasets, making it suitable for applications in a wide range of scenarios.
    2. Flexibility: With support for various models and penalizations, asgl caters to diverse analytical needs.
3. Integration: Compatibility with scikit-learn simplifies model evaluation and hyperparameter tuning, as in the short sketch below.
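For instance, assuming the Regressor above follows the standard scikit-learn estimator API (as the package advertises), standard model selection utilities should work out of the box. The snippet below is an illustrative sketch that reuses the model, X_train and y_train objects from the example above:

from sklearn.model_selection import cross_val_score

# Evaluate the median quantile regression model with 5-fold cross-validation.
# Scores are negated MAE, so values closer to zero are better.
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error")
print(f"Cross-validated MAE: {-scores.mean():.3f}")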

    And that’s it on this post about quantile regression! By squashing the average and exploring the full distribution of the data, we open up new possibilities for data-driven decision-making. Stay tuned for more insights into the world of penalized regression and the asgl library.


    Squashing the Average: A Dive into Penalized Quantile Regression for Python was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

    Originally appeared here:
    Squashing the Average: A Dive into Penalized Quantile Regression for Python

    Go Here to Read this Fast! Squashing the Average: A Dive into Penalized Quantile Regression for Python