Year: 2024

  • Whales buy into PEPE – Do they know something you don’t?

    Aniket Verma

    Holders, on average, had an unrealized profit of 4.5% on their initial investments.
    PEPE looked bearish upon examination of its key technical indicators.

    Frog-themed token Pepe [PEPE] was o

    The post Whales buy into PEPE – Do they know something you don’t? appeared first on AMBCrypto.

  • Unicorn Takes On the Corporate-Dominated Crypto with Fun Airdrop Competition

    Livine Sanchez

    The rising digital economy is a largely unexplored territory with plenty of earning opportunities for everyone. However, a corporate-dominated sphere of influence uses complex tech terms to make it sound inaccessible to fun-loving, casual users. As a result, potential investors are left out, and digital coin collecting becomes an exclusive pastime for an unnecessary elite. […]

  • Pepecoin Bucks Market Recovery Trend: 69% Of PEPE Holders in Profit

    Arnold Kirimi

    Memecoin PEPE Incinerates 6.9 Trillion Tokens, Triggering 33% Price Jump Amid Market Frenzy

    Pepecoin, known for its whimsical frog-themed design and devoted community, stands out amid a broader market recovery, where many cryptocurrencies are still struggling to regain their footing.

  • Ord.io prepares for new Bitcoin standard with $2 million funding boost

    Oluwapelumi Adejumo

    Bitcoin Ordinals explorer Ord.io has raised $2 million in a pre-seed funding round co-led by Bitcoin Frontier Fund and Sora Ventures to prepare for the highly anticipated launch of the Runes Protocol. Other investors in the round included Longhash Ventures, Daxos Capital, Portal Ventures, UTXO Management, Rubik Ventures, VitalTao Capital, Antalpha Ventures, Kommune Fund, Edessa […]

    The post Ord.io prepares for new Bitcoin standard with $2 million funding boost appeared first on CryptoSlate.

  • Bitcoin’s post halving price charts will go THIS way – Experts

    Benjamin Njiri

    Marathon Digital executive claims BTC has already partially priced in the halving.
    Other execs disagree and expect BTC to follow previous cycles’ price action.

    The current Bitcoin [BTC]

    The post Bitcoin’s post halving price charts will go THIS way – Experts appeared first on AMBCrypto.

  • ETFSwap (ETFS) Raises $750,000 As Institutional Holders Look To The DeFi Platform For Tokenization

    Guest Author

    The DeFi sector in the blockchain industry is heating up with several narratives that have taken it by storm, ranging from artificial intelligence (AI) to web3 gaming, NFTs, and, more recently, real-world asset (RWA) platforms like ETFSwap (ETFS) spearheading the tokenization of assets. Tokenization is the new kid on the block, attracting institutional confidence as […]

  • Feature Engineering with Microsoft Fabric and Dataflow Gen2

    Roger Noble

    Fabric Madness part 3

    Image by author and ChatGPT. “Design an illustration, featuring a Paralympic basketball player in action, this time the theme is on data pipelines” prompt. ChatGPT, 4, OpenAI, 15 April 2024. https://chat.openai.com.

    In the previous post, we discussed how to use Notebooks with PySpark for feature engineering. While Spark offers a lot of flexibility and power, it can be quite complex and requires a lot of code to get started. Not everyone is comfortable with writing code or has the time to learn a new programming language, which is where Dataflow Gen2 comes in.

    What is Dataflow Gen2?

    Dataflow Gen2 is a low-code data transformation and integration engine that allows you to create data pipelines for loading data from a wide variety of sources into Microsoft Fabric. It’s based on Power Query, which is integrated into many Microsoft products, such as Excel, Power BI, and Azure Data Factory. Dataflow Gen2 is a great tool for creating data pipelines without code via a visual interface, making it easy to create data pipelines quickly. If you are already familiar with Power Query or are not afraid of writing code, you can also use the underlying M (“Mashup”) language to create more complex transformations.

    In this post, we will walk through how to use Dataflow Gen2 to create the same features needed to train our machine learning model. We will use the same dataset as in the previous post, which contains data about college basketball games.

    Fig. 1 — The final result. Image by author.

    The Challenge

    There are two datasets that we will be using to create our features: the regular season games and the tournament games. These two datasets are also split into the Men’s and Women’s tournaments, which will need to be combined into a single dataset. In total, there are four csv files that need to be combined and transformed into two separate tables in the Lakehouse.

    Using Dataflows, there are multiple ways to solve this problem, and in this post I want to show three different approaches: a no code approach, a low code approach, and finally a more advanced all code approach.

    The no code approach

    The first and simplest approach is to use the Dataflow Gen2 visual interface to load the data and create the features.

    The Data

    The data we are looking at is from the 2024 US college basketball tournaments and was obtained from the ongoing March Machine Learning Mania 2024 Kaggle competition, the details of which can be found here; it is licensed under CC BY 4.0.

    Loading the data

    The first step is to get the data from the Lakehouse, which can be done by selecting the “Get Data” button in the Home ribbon and then selecting More… from the list of data sources.

    Fig. 2 — Choosing a data source. Image by author.

    From the list, select OneLake data hub to find the Lakehouse and then once selected, find the csv file in the Files folder.

    Fig. 3 — Select the csv file. Image by author.

    This will create a new query with four steps, which are:

    • Source: A function that queries the Lakehouse for all the contents.
    • Navigation 1: Converts the contents of the Lakehouse into a table.
    • Navigation 2: Filters the table to retrieve the selected csv file by name.
    • Imported CSV: Converts the binary file into a table.

    Fig. 4 — Initial load. Image by author.

    Now that the data is loaded we can start with some basic data preparation to get it into a format that we can use to create our features. The first thing we need to do is set the column names to be based on the first row of the dataset. This can be done by selecting the “Use first row as headers” option in either the Transform group on the Home ribbon or in the Transform menu item.

    The next step is to rename the column “WLoc” to “location” by either selecting the column in the table view, or by right clicking on the column and selecting “Rename”.

    The location column contains the location of the game, which is either “H” for home, “A” for away, or “N” for neutral. For our purposes, we want to convert this to a numerical value, where “H” is 1, “A” is -1, and “N” is 0, as this will make it easier to use in our model. This can be done by selecting the column and then using the Replace values… transform in the Transform menu item.

    Fig. 5 — Replace Values. Image by author.

    This will need to be done for the other two location values as well.

    Finally, we need to change the data type of the location column to be a Whole number instead of Text. This can be done by selecting the column and then selecting the data type from the drop down list in the Transform group on the Home ribbon.

    Fig. 6 — Final data load. Image by author.

    Instead of repeating the replace values step for each of the location types, a little bit of M code can be used to replace all the values in the location column at once. This can be done by selecting the previous transform in the query (Renamed columns) and then selecting the Insert step button in the formula bar. This will add a new step, and you can enter the following code to replace the values in the location column.

    Table.ReplaceValue(
        #"Renamed columns",
        each [location],
        each if Text.Contains([location], "H") then "1"
            else if Text.Contains([location], "A") then "-1"
            else "0",
        Replacer.ReplaceText,
        {"location"}
    )
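
    For comparison with the Notebook approach from the previous post, roughly the same location encoding could be written in PySpark as in the sketch below. This is an illustrative sketch of my own, not code from the original post; it assumes the games data is already loaded into a DataFrame called df with a WLoc column.

    from pyspark.sql import functions as F

    # Rename WLoc to location and map the codes to numbers: H -> 1, A -> -1, N -> 0
    df = df.withColumnRenamed("WLoc", "location").withColumn(
        "location",
        F.when(F.col("location") == "H", 1)
         .when(F.col("location") == "A", -1)
         .otherwise(0)
         .cast("int"),
    )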

    Adding features

    We’ve got the data loaded, but it’s still not right for our model. Each row in the dataset represents a game between two teams, and includes the scores and statistics for both the winning and losing team in a single wide table. We need to create features that represent the performance of each team in the game and to have a row per team per game.

    To do this we need to split the data into two tables, one for the winning team and one for the losing team. The simplest way to do this is to create a new query for each team and then merge them back together at the end. There are a few ways that this could be done; however, to keep things simple and understandable (especially if we ever need to come back to this later), we will create two references to the source query and then append them together again, after doing some light transformations.

    Referencing a query can be done either from the Queries panel on the left, or by selecting the context menu of the query if using Diagram view. This will create a new query that references the original query, and any changes made to the original query will be reflected in the new query. I did this twice, once for the winning team and once for the losing team, and then renamed the columns by prefixing them with “T1_” and “T2_” respectively.

    Fig. 7 — Split the dataset. Image by author.

    Once the column values are set, we can then combine the two queries back together by using Append Queries and then create our first feature, which is the point difference between the two teams. This can be done by selecting the T1_Score and T2_Score columns and then selecting “Subtract” from the “Standard” group on the Add column ribbon.

    Now that’s done, we can then load the data into the Lakehouse as a new table. The final result should look something like this:

    Fig. 8 — All joined up. Image by author.
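
    To make the reshaping logic concrete, a rough PySpark equivalent of the reference, rename, append and subtract steps is sketched below. This is an assumption-laden sketch of my own (only a handful of columns are shown, and df is the prepared table from the earlier steps), not the code used in the post:

    from pyspark.sql import functions as F

    # Winning team's perspective: W* columns become T1_*, L* columns become T2_*
    winners = df.select(
        F.col("WTeamID").alias("T1_TeamID"), F.col("WScore").alias("T1_Score"),
        F.col("LTeamID").alias("T2_TeamID"), F.col("LScore").alias("T2_Score"),
        "location",
    )

    # Losing team's perspective: the mapping is reversed and home/away is flipped
    losers = df.select(
        F.col("LTeamID").alias("T1_TeamID"), F.col("LScore").alias("T1_Score"),
        F.col("WTeamID").alias("T2_TeamID"), F.col("WScore").alias("T2_Score"),
        (F.col("location") * -1).alias("location"),
    )

    # Append the two views and add the point difference feature
    combined = winners.unionByName(losers).withColumn(
        "PointDiff", F.col("T1_Score") - F.col("T2_Score")
    )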

    There are a few limitations with the no code approach; the main one is that it’s not easy to reuse queries or transformations. In the above example we would need to repeat the same steps another three times to load each of the individual csv files. This is where copy/paste comes in handy, but it’s not ideal. Let’s look at a low code approach next.

    The low code approach

    In the low code approach we will use a combination of the visual interface and the M language to load and transform the data. This approach is more flexible than the no code approach, but still doesn’t require a lot of code to be written.

    Loading the data

    The goal of the low code approach is to reduce the number of repeated queries that are needed and to make it easier to reuse transformations. To do this we will take advantage of the fact that Power Query is a functional language and that we can create functions to encapsulate the transformations that we want to apply to the data. When we first loaded the data from the Lakehouse there were four steps that were created, the second step was to convert the contents of the Lakehouse into a table, with each row containing a reference to a binary csv file. We can use this as the input into a function, which will load the csv into a new table, using the Invoke custom function transformation for each row of the table.

    Fig. 9 — Lakehouse query with the binary csv files in a column called Content. Image by author.

    To create the function, select “Blank query” from the Get data menu, or right click the Queries panel and select “New query” > “Blank query”. In the new query window, enter the following code:

    (TableContents as binary) =>
    let
        Source = Csv.Document(TableContents, [Delimiter = ",", Columns = 34, QuoteStyle = QuoteStyle.None]),
        PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
    in
        PromoteHeaders

    The code of this function has been copied from our initial no code approach, but instead of loading the csv file directly, it takes a parameter called TableContents, reads it as a csv file with Csv.Document and then sets the first row of the data to be the column headers with Table.PromoteHeaders.

    We can then use the Invoke custom function transformation to apply this function to each row of the Lakehouse query. This can be done by selecting the “Invoke custom function” transformation from the Add column ribbon and then selecting the function that we just created.

    Fig. 10 — Invoke custom function. Image by author.

    This will create a new column in the Lakehouse query, with the entire contents of the csv file loaded into a table, which is represented as [Table] in the table view. We can then use the expand function on the column heading to expand the table into individual columns.

    Fig. 11 — Expand columns. Image by author.

    The result effectively combines the two csv files into a single table, which we can then continue to create our features from as before.

    There are still some limitations with this approach: while we’ve reduced the number of repeated queries, we still need to duplicate everything for both the regular season and tournament games datasets. This is where the all code approach comes in.

    The all code approach

    The all code approach is the most flexible and powerful approach, but also requires the most code to be written. This approach is best suited for those who are comfortable with writing code and want to have full control over the transformations that are applied to the data.

    Essentially what we’ll do is grab all the M code that was generated in each of the queries and combine them into a single query. This will allow us to load all the csv files in a single query and then apply the transformations to each of them in a single step. To get all the M code, we can select each query and then click on the Advanced Editor from the Home ribbon, which displays all the M code that was generated for that query. We can then copy and paste this code into a new query and then combine them all together.

    To do this, we need to create a new blank query and then enter the following code:

    (TourneyType as text) => let
    Source = Lakehouse.Contents(null){[workspaceId = "..."]}[Data]{[lakehouseId = "..."]}[Data],
    #"Navigation 1" = Source{[Id = "Files", ItemKind = "Folder"]}[Data],
    #"Filtered rows" = Table.SelectRows(#"Navigation 1", each Text.Contains([Name], TourneyType)),
    #"Invoked custom function" = Table.AddColumn(#"Filtered rows", "Invoked custom function", each LoadCSV([Content])),
    #"Removed columns" = Table.RemoveColumns(#"Invoked custom function", {"Content", "Name", "Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path", "ItemKind", "IsLeaf"}),
    #"Expanded Invoked custom function" = Table.ExpandTableColumn(#"Removed columns", "Invoked custom function", {"Season", "DayNum", "WTeamID", "WScore", "LTeamID", "LScore", "WLoc", "NumOT", "WFGM", "WFGA", "WFGM3", "WFGA3", "WFTM", "WFTA", "WOR", "WDR", "WAst", "WTO", "WStl", "WBlk", "WPF", "LFGM", "LFGA", "LFGM3", "LFGA3", "LFTM", "LFTA", "LOR", "LDR", "LAst", "LTO", "LStl", "LBlk", "LPF"}, {"Season", "DayNum", "WTeamID", "WScore", "LTeamID", "LScore", "WLoc", "NumOT", "WFGM", "WFGA", "WFGM3", "WFGA3", "WFTM", "WFTA", "WOR", "WDR", "WAst", "WTO", "WStl", "WBlk", "WPF", "LFGM", "LFGA", "LFGM3", "LFGA3", "LFTM", "LFTA", "LOR", "LDR", "LAst", "LTO", "LStl", "LBlk", "LPF"}),
    #"Renamed columns" = Table.RenameColumns(#"Expanded Invoked custom function", {{"WLoc", "location"}}),
    Custom = Table.ReplaceValue(#"Renamed columns", each [location], each if Text.Contains([location], "H") then "1" else if Text.Contains([location], "A") then "-1" else "0", Replacer.ReplaceText, {"location"}),
    #"Change Types" = Table.TransformColumnTypes(Custom, {{"Season", Int64.Type}, {"DayNum", Int64.Type}, {"WTeamID", Int64.Type}, {"WScore", Int64.Type}, {"LTeamID", Int64.Type}, {"LScore", Int64.Type}, {"location", Int64.Type}, {"NumOT", Int64.Type}, {"WFGM", Int64.Type}, {"WFGA", Int64.Type}, {"WFGM3", Int64.Type}, {"WFGA3", Int64.Type}, {"WFTM", Int64.Type}, {"WFTA", Int64.Type}, {"WOR", Int64.Type}, {"WDR", Int64.Type}, {"WAst", Int64.Type}, {"WTO", Int64.Type}, {"WStl", Int64.Type}, {"WBlk", Int64.Type}, {"WPF", Int64.Type}, {"LFGM", Int64.Type}, {"LFGA", Int64.Type}, {"LFGM3", Int64.Type}, {"LFGA3", Int64.Type}, {"LFTM", Int64.Type}, {"LFTA", Int64.Type}, {"LOR", Int64.Type}, {"LDR", Int64.Type}, {"LAst", Int64.Type}, {"LTO", Int64.Type}, {"LStl", Int64.Type}, {"LBlk", Int64.Type}, {"LPF", Int64.Type}}),
    Winners = Table.TransformColumnNames(#"Change Types", each if Text.StartsWith(_, "W") then Text.Replace(_, "W", "T1_") else Text.Replace(_, "L", "T2_")),
    #"Rename L" = Table.TransformColumnNames(#"Change Types", each if Text.StartsWith(_, "W") then Text.Replace(_, "W", "T2_") else Text.Replace(_, "L", "T1_")),
    #"Replaced Value L" = Table.ReplaceValue(#"Rename L", each [location], each if [location] = 1 then -1 else if [location] = -1 then 1 else [location], Replacer.ReplaceValue, {"location"}), // flip home/away for the losing team's rows
    Losers = Table.TransformColumnTypes(#"Replaced Value L", {{"location", Int64.Type}}),
    Combined = Table.Combine({Winners, Losers}),
    PointDiff = Table.AddColumn(Combined, "PointDiff", each [T1_Score] - [T2_Score], Int64.Type)
    in
    PointDiff

    Note: the Lakehouse connection values have been removed

    What’s happening here is that we’re:

    1. Loading the data from the Lakehouse;
    2. Filtering the rows to only include the csv files that match the TourneyType parameter;
    3. Loading the csv files into tables;
    4. Expanding the tables into columns;
    5. Renaming the columns;
    6. Changing the data types;
    7. Combining the two tables back together;
    8. Calculating the point difference between the two teams.

    Using the query is then as simple as selecting it, and then invoking the function with the TourneyType parameter.

    Fig. 12 — Invoke function. Image by author.

    This will create a new query with the function as its source, and the data loaded and transformed. It’s then just a case of loading the data into the Lakehouse as a new table.

    Fig. 13 — Function load. Image by author.

    As you can see, the LoadTournamentData function is invoked with the parameter “RegularSeasonDetailedResults” which will load both the Men’s and Women’s regular season games into a single table.

    Conclusion

    And that’s it!

    Hopefully this post has given you a good overview of how to use Dataflow Gen2 to prepare data and create features for your machine learning model. Its low code approach makes it easy to create data pipelines quickly, and it contains a lot of powerful features that can be used to create complex transformations. It’s a great first port of call for anyone who needs to transform data and, more importantly, it spares you from writing complex code that is prone to errors, hard to test, and difficult to maintain.

    At the time of writing, Dataflow Gen2 is not supported by the Git integration, so it’s not possible to version control or share dataflows. This feature is expected to be released in Q4 2024.

    Originally published at https://nobledynamic.com on April 15, 2024.


    Feature Engineering with Microsoft Fabric and Dataflow Gen2 was originally published in Towards Data Science on Medium.

  • Callbacks and Pipeline structures in LangChain

    Roshan Santhosh

    Learn about the structure of LangChain pipelines, callbacks, how to create custom callbacks and integrate them into your pipelines for improved monitoring

    Callbacks are an important functionality that helps with monitoring/debugging your pipelines. In this note, we cover the basics of callbacks and how to create custom ones for your use cases. More importantly, through examples, we also develop an understanding of the structure/componentization of LangChain pipelines and how that plays into the design of custom callbacks.

    This note assumes basic familiarity with LangChain and how pipelines in LangChain work.

    Basic Structure of Callbacks

    To learn about the basics of callbacks in LangChain, we start with the official documentation where we can find the definition of the BaseCallbackHandler class.

    Image taken from official langchain documentation

    BaseCallbackManager code

    As you can see, this is an abstract class that defines quite a few methods to cover various events in your LangChain pipeline. These methods can be grouped into the following segments:

    1. LLM [start, end, error, new token]
    2. Chain [start, end, error]
    3. Tool [start, end, error]
    4. Agent [action, finish]

    If you have worked with LangChain pipelines before, the methods along with their provided descriptions should be mostly self-explanatory. For example, the on_llm_start callback is the event that gets triggered when the LangChain pipeline passes input to the LLM, and on_llm_end is subsequently triggered when the LLM provides its final output.

    NOTE: There are event triggers that can be used in addition to what’s shown above. These can be found here and cover triggers relating to Retrievers, Prompts, ChatModels, etc.
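
    As a quick illustration of these hooks, a minimal handler that only listens to the LLM events might look like the sketch below. The class name and print statements are my own, not from the original post:

    from langchain_core.callbacks import BaseCallbackHandler

    class PrintLLMEvents(BaseCallbackHandler):
        """Toy handler that logs the LLM-related events."""

        def on_llm_start(self, serialized, prompts, **kwargs):
            # Triggered when the pipeline passes input to the LLM
            print(f"LLM started with {len(prompts)} prompt(s)")

        def on_llm_end(self, response, **kwargs):
            # Triggered when the LLM provides its final output
            print("LLM finished:", response.generations[0][0].text[:80])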

    Understanding how Callbacks work

    Callbacks are a very common programming concept that has been widely used for a while now, so the high-level idea of how callbacks work is well understood. In this post, we focus on the specific nuances of how callbacks work in LangChain and how we can use them to satisfy our specific use cases.

    Keeping in mind the base callback class that we saw in the previous section, we explore callbacks in LangChain through a series of increasingly complex examples and, in the process, gain a better understanding of the structure of pipelines in LangChain. This is a top-down approach to learning, where we start with examples first and get to the actual definitions later, as I personally found that to be more useful for this specific topic.

    Example 1

    We start with a simple dummy chain that has 3 components: 2 prompts and a custom function to join them. I refer to this as a dummy example because it’s very unlikely that you would need two separate prompts to interact with each other, but it makes for an easier example to start with for understanding callbacks and LangChain pipelines.

    Example 1 : Basic structure of LangChain pipeline

    Implementing this in code would look like :

    Pipeline implementation for Example 1

    The above code is pretty textbook stuff. The only possibly complex piece is the retrieve_text and RunnableLambda function that’s being used here. The reason this is necessary is that the format of the output from qa_prompt1 is not compatible with the format of the input required by qa_prompt2.
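
    Since the original listing is shown as an image, a rough reconstruction of what such a pipeline could look like is sketched below. The prompt texts and the body of retrieve_text are my own guesses based on the description; only the overall shape (prompt | RunnableLambda | prompt) matters here:

    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnableLambda

    # The three user-defined components: two prompts and a glue function
    qa_prompt1 = ChatPromptTemplate.from_template("Answer the question: {question}")
    qa_prompt2 = ChatPromptTemplate.from_template("Rewrite the following text formally: {prompt}")

    def retrieve_text(prompt_value):
        # Convert the ChatPromptValue produced by qa_prompt1 into the
        # dict (keyed by 'prompt') that qa_prompt2 expects as input
        return {"prompt": prompt_value.to_string()}

    chain = qa_prompt1 | RunnableLambda(retrieve_text) | qa_prompt2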

    Defining the custom Callback

    For our custom callback, we define a new subclass of BaseCallbackHandler called CustomCallback1, which defines the on_chain_start method. The method definition is straightforward as it simply takes the input values passed to it and saves them in 2 specific variables: chain_input and serialized_input.
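
    A sketch of what such a handler could look like is shown below, assuming chain_input and serialized_input are simple lists that accumulate one entry per trigger (the exact data structures are my assumption):

    from langchain_core.callbacks import BaseCallbackHandler

    class CustomCallback1(BaseCallbackHandler):
        def __init__(self):
            super().__init__()
            self.chain_input = []       # inputs received by each component
            self.serialized_input = []  # serialized description of each component

        def on_chain_start(self, serialized, inputs, **kwargs):
            # Record both arguments every time a chain component starts
            self.serialized_input.append(serialized)
            self.chain_input.append(inputs)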

    Invoking the custom callback

    Example 1 : Invoking with pipeline with the custom callback

    The above code shows one of the possible ways to pass your custom callback to your pipeline: as a list of callback objects as the value of a corresponding ‘callbacks’ key. This also makes it easy to guess that you can pass multiple callbacks to your LangChain pipeline.
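
    For reference, the invocation could look roughly like the sketch below; the question text is a placeholder of my own:

    cb = CustomCallback1()

    # Callbacks are passed via the config dict, as a list under the 'callbacks' key
    result = chain.invoke(
        {"question": "What are callbacks in LangChain?"},
        config={"callbacks": [cb]},
    )

    print(len(cb.chain_input))      # one entry per on_chain_start trigger
    print(cb.serialized_input[0])   # the first component that was started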

    Decoding the Callback/Pipeline Structure

    Now comes the interesting part. After we have defined the callbacks and passed them on to our pipeline, we can perform a deep dive into the callback outputs.

    We first look at the values stored in chain_input

    Example 1 : Contents of chain_input variable of callback handler

    Observations :

    1. Though there are 3 components in our chain, there are 4 values in chain_input, which corresponds to the on_chain_start method being triggered 4 times instead of 3.
    2. For the first two chain_input values/on_chain_start triggers, the input is the same as the user-provided input.

    We next look at the outputs of serialized_input

    Observations :

    1. The first component is a RunnableSequence, a component that wasn’t added by the user but was automatically added by LangChain. The rest of the components correspond directly to the user-defined components in the pipeline.
    2. The full contents of serialized_input are extensive! While there is a definite structure to that content, it’s definitely out of scope for this post and probably doesn’t have many practical implications for an end user.

    How do we interpret these results

    For the most part, the outputs seen in chain_input and serialized_input make sense, whether it’s the input values or the names/IDs of the components. The only largely unknown part is the RunnableSequence component, so we take a closer look at it.

    As I mentioned previously, the full contents of serialized_input are extensive and not easy to digest. So to make things easier, we look at only the high-level attributes described in serialized_input and try to interpret the results through these attributes. For this, we make use of a custom debugging function called getChainBreakdown (code in notebook).

    We call getChainBreakdown on all values of serialized_input and observe the output. Specifically for the first RunnableSequence element, we look at the keys of the kwargs dict: first, middle, last, name.
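
    The actual getChainBreakdown implementation is in the linked notebook; a minimal stand-in that surfaces the same high-level attributes might look like this (the exact fields it prints are an assumption):

    def getChainBreakdown(serialized):
        # Print only the high-level attributes of a serialized component
        if not serialized:
            return
        print("id:", serialized.get("id"))
        print("name:", serialized.get("name"))
        # For a RunnableSequence, kwargs holds the keys first, middle, last, name
        print("kwargs keys:", list(serialized.get("kwargs", {}).keys()))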

    On closer inspection of the kwargs argument and their values, we see that they have the same structure as our previous pipeline components. In fact, the first, middle and last components correspond exactly to the user-defined components of the pipeline.

    Closer inspection of RunnableSequence kwargs values

    The above details form the basis of the final conclusion that we make here: that the structure of the pipeline is as shown below:

    Example 1 : Structure of LangChain pipeline

    We do make a bit of a leap here as the above flowchart was confirmed after going through a bunch of examples and observing the format in which these components are created internally by LangChain. So bear with me as we go through these other examples which will solidify the conclusion that we make here.

    With the above defined structure, the other pieces of the puzzle fit together quite well. Focusing on the chain_input values, let’s map them to the components (with their ordering) defined above.

    Example 1 : Mapping chain_input values to pipeline components

    Observations :

    1. For RunnableSequence, as it acts like a wrapper for the whole pipeline, the input from the user acts as the input for the RunnableSequence component as well.
    2. For the first ChatPromptTemplate (qa_prompt1), as the first ‘true’ component of the pipeline, it receives the direct input from the user
    3. For RunnableLambda (retrieve_text), it receives as input the output from qa_prompt1, which is a Message object
    4. For the last ChatPromptTemplate (qa_prompt2), it receives as input the output from retrieve_text, which is a dict with ‘prompt’ as its single key

    The above breakdown shows how the structure of the pipeline described above fits perfectly with the data seen in serialized_input and chain_input

    Example 2

    For the next example, we extend Example 1 by adding an LLM as the final step.

    Example 2 : Pipeline definition
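
    As with Example 1, the listing is an image, so the sketch below shows one way the extended chain could be written, reusing qa_prompt1, retrieve_text and qa_prompt2 from the earlier sketch; the choice of ChatOpenAI with default settings is an assumption:

    from langchain_core.runnables import RunnableLambda
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI()  # requires an OpenAI API key in the environment

    # Same three components as Example 1, with the LLM appended as the final step
    chain2 = qa_prompt1 | RunnableLambda(retrieve_text) | qa_prompt2 | llm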

    For the callback, since we have now added an LLM into the mix, we define a new custom callback that additionally defines the on_llm_start method. It has the same functionality as on_chain_start, where the input arguments are saved into the callback object variables: chain_input and serialized_input.

    Example 2 : New custom callback with added on_llm_start method
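
    A possible implementation, building on the CustomCallback1 sketch from Example 1 (the class name is my own):

    class CustomCallback2(CustomCallback1):
        # Same bookkeeping as CustomCallback1, now also capturing the LLM start event
        def on_llm_start(self, serialized, prompts, **kwargs):
            self.serialized_input.append(serialized)
            self.chain_input.append(prompts)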

    Proposing the Pipeline structure

    At this stage, instead of evaluating the callback variables, we switch things up and propose the potential structure of the pipeline. Given what we learnt from the first example, the following should be its structure:

    Example 2 : Proposed structure of pipeline

    So we would have a RunnableSequence component as a wrapper for the pipeline, and additionally a new ChatOpenAI object that’s nested within the RunnableSequence component.

    Validating proposed structure using data

    We now look at the values in the callback object to validate the proposed structure.

    We first look at the values stored in chain_input

    Example 2 : chain_input values

    And then the serialized_input values :

    Example 2 : serialized_input values

    As well as a deeper inspection of the RunnableSequence components

    Example 2 : Closer inspection of RunnableSequence kwargs values

    Observations :

    1. The values of serialized_input validate the activation/trigger sequence that was proposed in the pipeline structure : RunnableSequence -> ChatPromptTemplate(qa_prompt1) -> RunnableLambda(retrieve_text) -> ChatPromptTemplate(qa_prompt2) -> ChatOpenAI
    2. The values of chain_input also map correctly to the proposed structure. The only new addition is the fifth entry, which corresponds to the output from qa_prompt2, which is fed as input to the ChatOpenAI object
    3. The components of the RunnableSequence kwargs also verify the proposed structure as the new ‘last’ element is the ChatOpenAI object

    By this stage, you should have an intuitive understanding of how LangChain pipelines are structured and when/how different callback events are triggered.

    Though we have only focused on Chain and LLM events so far, these translate well to the other Tool and Agent triggers as well

    Example 3

    For the next example, we progress to a more complex chain involving a parallel implementation (RunnableParallel)

    Chain/Callback Implementation

    The chain has a parallel implementation as its first block, which computes two values, context and question, which are then passed on to a prompt template to create the final prompt. The parallel functionality is required because we need to pass both context and question to the prompt template at the same time, where the context is retrieved from a different source while the question is provided by the user.

    For the context value, we use a static function get_data that always returns the same piece of text (this is a dummy version of an actual retriever used in RAG applications).

    Example 3 : Chain implementation
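
    A sketch of what this chain could look like is shown below; the prompt wording and the text returned by get_data are placeholders of my own:

    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

    def get_data(_):
        # Dummy retriever: always returns the same piece of text
        return "Callbacks let you hook into events emitted by a LangChain pipeline."

    prompt = ChatPromptTemplate.from_template(
        "Answer the question using the context.\nContext: {context}\nQuestion: {question}"
    )

    # The parallel block computes 'context' and 'question' at the same time
    chain3 = RunnableParallel(
        context=RunnableLambda(get_data),
        question=RunnablePassthrough(),
    ) | prompt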

    For the callback implementation, we use the same callback as in the first example, CustomCallback1.

    Decoding the Callback/Pipeline Structure

    Similar to previous examples, we start by looking at the outputs of chain_input and serialized_input

    Example 3 : chain_input values
    Example 3 : serialized_input values

    We also do a deep dive into the RunnableSequence (index 0) and RunnableParallel (index 1) components.

    Observations :

    1. Consistent with previous examples, the RunnableSequence acts as a wrapper to the whole pipeline. Its first component is the RunnableParallel component and its last component is the ChatPromptTemplate component
    2. The RunnableParallel in turn encompasses two components: the RunnablePassthrough and the RunnableLambda (get_data).
    3. The inputs to the first 4 components (RunnableSequence, RunnableParallel, RunnablePassthrough and RunnableLambda (get_data)) are the same: the provided user input. Only for the final ChatPromptTemplate component do we have a different input, which is a dict with question and context keys.

    Based on these observations, we can infer the final structure of the pipeline as follows:

    Example 3 : Structure of LangChain pipeline

    Example 4

    This is the same as Example 3, but with an additional processing function for retrieving the context.

    Chain/Callback Implementation

    Example 4 : Chain implementation
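
    Building on the Example 3 sketch (get_data and prompt reused), the extra processing step could be added roughly like this; format_docs is a placeholder transformation of my own:

    from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

    def format_docs(text):
        # Additional processing applied to the retrieved context
        return text.strip().upper()

    # The context is now produced by a small sub-chain (get_data -> format_docs),
    # which LangChain wraps in its own RunnableSequence
    chain4 = RunnableParallel(
        context=RunnableLambda(get_data) | RunnableLambda(format_docs),
        question=RunnablePassthrough(),
    ) | prompt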

    Decoding the Callback/Pipeline Structure

    Similar to previous examples, we again look at the usual data points

    Example 4 : chain_input values
    Example 4 : serialized_input values

    We observe that there are now 2 RunnableSequence components in our pipeline. So for the next step, we deep dive into both of these RunnableSequence components to see their internal components.

    Observations :

    1. For the first RunnableSequence component, its components are the same as in the previous example: it starts with RunnableParallel and ends with ChatPromptTemplate.
    2. For the second RunnableSequence, its first component is the RunnableLambda (get_data) component and the last component is the RunnableLambda (format_docs) component. This is basically the part of the pipeline responsible for generating the ‘context’ value. So it’s possible for a LangChain pipeline to have multiple RunnableSequence components, especially when you are creating ‘sub-pipelines’.

    In this case, the creation of the ‘context’ value can be considered a pipeline by itself, as it involves 2 different components chained together. So any such sub-pipelines in your primary pipeline will be wrapped up by a RunnableSequence component.

    3. The values from chain_input also match up well with the pipeline components and their ordering (I won’t break down each component’s input here, as it should be self-explanatory by now).

    So based on the above observations, the following is the identified structure of this pipeline

    Example 4 : Structure of LangChain pipeline

    Conclusion

    The objective of this post was to help develop an (intuitive) understanding of how LangChain pipelines are structured and how callback triggers are associated with the pipeline.

    By going through increasingly complex chain implementations, we were able to understand the general structure of LangChain pipelines and how a callback can be used for retrieving useful information. Developing an understanding of how LangChain pipelines are structured will also help facilitate the debugging process when errors are encountered.

    A very common use case for callbacks is retrieving intermediate steps, and through these examples we saw how to implement custom callbacks that track the input at each stage of the pipeline. Combined with our understanding of the structure of LangChain pipelines, we can now easily pinpoint the input to each component of the pipeline and retrieve it accordingly.

    Resources

    Notebook with code/examples: Contains a few additional examples not covered in this note.

    Unless specified otherwise, all images are created by the author.

    In addition to Medium, I share my thoughts, ideas and other updates on LinkedIn.


    Callbacks and Pipeline structures in LangChain was originally published in Towards Data Science on Medium.
