Information Rationalization in Large Organizations
Organizations generate voluminous amounts of data on a daily basis. Dashboards are built to analyze this data, derive meaningful business insights, and track KPIs. Over time, we find ourselves with hundreds (or maybe more) of such dashboards. Oftentimes, this phenomenon of Dashboard Proliferation is driven by multiple groups developing their own analyses in silos without visibility into analytics that may already be available. For example, sales teams may create a dashboard to track their forecasts vs. actual sales, not knowing that a forecast accuracy dashboard already exists with the supply chain team. Not only does this duplicate effort, but we may also end up with common metrics and analyses spread across multiple dashboards with no single source of truth, because of differing data sources and assumptions.
As teams come together to provide updates on the state of the business to senior management, they usually spend several person-hours reconciling differing metrics to provide a clear and coherent message to leadership. Let's say we have a 1,000-person organization where each person spends an average of 2 hours per week reconciling data from different reports. This amounts to 100,000 person-hours annually, assuming 50 weeks/yr. If we assume an average employee compensation of $50/hour, the yearly cost of this reconciliation runs up to $5MM. While the numbers above are hypothetical, they aren't unrepresentative of operations and planning teams in large Fortune 100 companies. In this post, we will go over an approach to consolidating dashboards and reports to save the time and effort spent reconciling KPIs.
Even metrics that ostensibly have the same definition may not match when reported by different teams in different reports. To bridge these differences, we may consider one of the following approaches to reconcile metrics and analyses across dashboards and reports:
(i) Tracking — this approach would entail keeping the dashboards as they are but creating a new directory to keep track of all dashboards. This directory will list metrics found in each dashboard along with their definitions, estimation assumptions, and input data sources.
(ii) Elimination — as the name suggests, we identify rarely used dashboards, working with their developers and with the users who have accessed them over the past 12 months. We may then deprecate dashboards/reports after transferring relevant KPIs to other dashboards as needed.
(iii) Consolidation — here we would create a reduced number of dashboards by combining similar metrics into a single dashboard.
While we recommend elimination of sparsely utilized dashboards to the extent possible, we may still be left with many dashboards overlapping in metrics and purpose after this removal. This post will focus on an approach to merge together dashboards with similar content to yield a rationalized list. The proposed solution considers consolidation as a two-part problem. First, we identify which dashboards can be grouped together, following which we determine how to combine the dashboards within each cluster.
A simple mental model is to group together similar dashboards. This similarity can be measured across multiple dimensions as noted below:
(a) Metrics — arguably the most important criterion. This includes all the entities exposed to the users from a given dashboard. For example, forecast accuracy, forecasts, and historical actuals may be key metrics in a demand planning dashboard. As metrics are a function of the purpose of a dashboard, grouping dashboards with similar metrics aligns the purpose of reporting as well.
(b) User Personas — the different roles within an organization that may use a dashboard on a regular basis. This is more of a secondary consideration when combining dashboards.
(c) Filters — the granularity at which the metrics, analyses and insights are available can be another consideration. As we combine dashboards, we need to ensure that the legacy granularities that support business decision-making are not lost.
(d) Input Data Sources — this may be a minor factor in deciding which dashboards to combine. All else remaining equal, it may be worthwhile grouping dashboards that derive data from the same sources for ease of integration.
All the aforementioned factors may not be equally significant. As such, we may need to weight each factor differently, with potentially the highest weight on metrics and the lowest on input data sources.
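To make this weighting concrete, here is a minimal sketch in pandas of how a weighted categorical encoding could be built. The dashboard contents, dimension names, and the 2x weight on metrics are hypothetical placeholders; the actual case-study encoding appears in Figure 2.

```python
import pandas as pd

# Hypothetical dashboard descriptions: each dimension is a set of categorical elements
dashboards = {
    "A": {"metrics": {"forecast", "forecast_accuracy"},
          "personas": {"demand_planner"},
          "filters": {"region", "month"},
          "sources": {"erp"}},
    "E": {"metrics": {"forecast", "forecast_accuracy", "actuals"},
          "personas": {"demand_planner", "sales_ops"},
          "filters": {"region", "month", "sku"},
          "sources": {"erp", "crm"}},
    "B": {"metrics": {"revenue"},
          "personas": {"finance"},
          "filters": {"quarter"},
          "sources": {"crm"}},
}

# Unequal weights per dimension: metrics count double, data sources count least
weights = {"metrics": 2.0, "personas": 1.0, "filters": 1.0, "sources": 0.5}

rows = {}
for name, dims in dashboards.items():
    row = {}
    for dim, elements in dims.items():
        for element in elements:
            row[f"{dim}:{element}"] = weights[dim]   # weighted one-hot encoding
    rows[name] = row

feature_df = pd.DataFrame.from_dict(rows, orient="index").fillna(0.0)
print(feature_df)
```

Each dashboard becomes a numeric row vector, so standard distance-based clustering can be applied directly.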
This is a critical step of the overall endeavor as it determines the effort needed to integrate the dashboards within a cluster into a single entity. The more disparate the dashboards within a cluster, the more time and effort needed to combine them into a single unit. We'll walk through a case study where we want to consolidate seven dashboards (shown in Figure 1) into 2–3 groups.
A series of steps is recommended for the clustering:
1) Understand the purpose of each dashboard by talking with current users and developers. This voice of customer is crucial to capture at an early stage to facilitate adoption of the consolidated dashboards. We may also unearth new information about the dashboards and be able to update our initial assumptions and definitions.
2) Assign weights to the different dimensions — for instance, we may want to assign a higher weight to metrics than to the other factors. In our example above, we give metrics a 2x weight vs. the others.
3) Convert the information into a dataframe conducive for applying clustering techniques. Figure 2 shows the dataframe for our case study accounting for the appropriate weights across dimensions.
4) Apply a standard clustering approach after removing the names of the dashboards. Figure 3 shows the dendrogram output from hierarchical clustering with Euclidean distance and average linkage (a minimal code sketch follows this list). If we overlay the dashed green line, it produces 3 clusters for the dashboards in our example: {A, F}, {G, B, C, D}, {E}.
5) Iterate on the number of clusters to arrive at a set of balanced clusters that make business sense.
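As referenced in step 4, here is a minimal sketch of steps 4 and 5, assuming a weighted feature matrix like the one sketched earlier (reconstructed inline with hypothetical values so the snippet runs on its own); the cut height for the dendrogram is illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Weighted one-hot matrix in the spirit of Figure 2 (values are hypothetical)
feature_df = pd.DataFrame(
    {"metric:forecast": [2, 2, 0], "metric:accuracy": [2, 2, 0], "metric:revenue": [0, 0, 2],
     "filter:region": [1, 1, 0], "source:erp": [0.5, 0.5, 0], "source:crm": [0, 0.5, 0.5]},
    index=["A", "E", "B"],
)

# Step 4: hierarchical clustering with Euclidean distance and average linkage
Z = linkage(feature_df.values, method="average", metric="euclidean")

# Cut the tree at an illustrative height (the dashed green line in Figure 3)
labels = fcluster(Z, t=2.0, criterion="distance")
print(dict(zip(feature_df.index, labels)))

# Step 5: inspect the dendrogram and iterate on the cut height / number of clusters
dendrogram(Z, labels=feature_df.index.tolist())
plt.show()
```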
A caveat here is that a given metric may be a part of different dashboards across multiple clusters. We can either document this overlap to inform users, or remove the metric, based on business judgment, from K-1 of the dashboards, where K is the total number of clusters in which the metric appears. However, this type of judgment-based elimination can be sub-optimal.
One other challenge with a traditional clustering approach is that it may not group dashboards that are subsets of other dashboards into the same cluster. For example, Dashboard A is a subset of Dashboard E, as can be seen in Figure 1 (i.e. the metrics, user personas, filters, and data sources in Dashboard A are also present in Dashboard E), but they are grouped in different clusters (Figure 3). The idea behind capturing subsets is to eliminate them, since an alternate (superset) dashboard is available that exposes the same metrics, along with others, to users. To mitigate this issue, we propose an alternate clustering algorithm to help group subsets together.
In this approach, we treat each dashboard as a list in which each metric, filter, user persona, and input data source is a categorical element. The idea is to compute correlation indicators between dashboards based on their common elements and to iteratively cluster together dashboards (or groups of dashboards) where the correlation exceeds a user-defined threshold.
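A minimal sketch of this idea follows, under stated assumptions: the overlap measure (intersection size divided by the smaller set's size, so a subset scores 1.0), the greedy merge loop, and the function names are illustrative choices rather than the exact steps of the algorithm.

```python
from itertools import combinations

def correlation(d1: set, d2: set) -> float:
    """Overlap of the smaller element set with the larger one (1.0 when one is a subset of the other)."""
    return len(d1 & d2) / min(len(d1), len(d2))

def cluster_dashboards(dashboards: dict, threshold: float = 0.8) -> list:
    """Greedily merge the most correlated pair of clusters until no pair exceeds the threshold."""
    # Each cluster is (set of dashboard names, union of their categorical elements)
    clusters = {name: ({name}, set(elements)) for name, elements in dashboards.items()}
    while len(clusters) > 1:
        best_pair, best_corr = None, 0.0
        for x, y in combinations(clusters, 2):
            corr = correlation(clusters[x][1], clusters[y][1])
            if corr > best_corr:
                best_pair, best_corr = (x, y), corr
        if best_pair is None or best_corr < threshold:
            break
        a, b = best_pair
        names_b, elems_b = clusters.pop(b)
        names_a, elems_a = clusters[a]
        clusters[a] = (names_a | names_b, elems_a | elems_b)
    return [names for names, _ in clusters.values()]

# Hypothetical usage: each dashboard is a flat set of categorical elements across all dimensions
example = {
    "A": {"m:forecast", "m:accuracy", "p:planner", "f:region", "s:erp"},
    "E": {"m:forecast", "m:accuracy", "m:actuals", "p:planner", "p:sales", "f:region", "f:sku", "s:erp"},
    "B": {"m:revenue", "p:finance", "f:quarter", "s:crm"},
}
print(cluster_dashboards(example))   # A merges with E (A is a subset of E); B stays on its own
```

Because a subset always scores the maximum correlation against its superset, this grouping naturally pulls subsets into the same cluster as their supersets, which is the behavior the standard dendrogram in Figure 3 misses.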
We apply the algorithm to our earlier example to estimate the correlations (Figure 4) between Dashboard A and all other dashboards and to test whether the approach works as intended. We see that the highest value of Max_Corr is between Dashboard A and Dashboard E, along expected lines, since Dashboard A is a subset of Dashboard E across all dimensions.
This new algorithm will yield a set of clusters with similar dashboards across the dimensions outlined in the earlier sections. The choice of threshold determines the number of clusters and count of dashboards within each. We may want to test multiple thresholds iteratively until we find the clusters to be closely aligned to business expectations. In general, this approach works well for clustering entities in any categorical dataframe.
Once we have a set of acceptable clusters, we need to combine each collection into a single dashboard. We typically rely on guidance from software engineering teams on merging dashboards. One option is to create a new dashboard that captures all metrics and filters within a cluster. If starting from scratch is not the recommended solution, we need to pick one dashboard from each cluster and add information to it from the other dashboards in the cluster. To minimize re-work, we would sort the dashboards by complexity and use the most complex one as the base, then add data from the other dashboards starting with the next most complex, avoiding duplication of information as we go. Ordering is important to avoid inefficiencies in the process of combining dashboards. Complexity is best defined by the software development engineers — it can be lines of code, number of functions, or any other criterion. If complexity is roughly the same across all dashboards, then we may want to consider the following hierarchy to sort:
Number of metrics >> Number of filters/views >> Number of input data sources >> Number of users
For instance, if the complexity is roughly the same across dashboards, we would use the dashboard with the most metrics as our base and add metrics from the other dashboards to it.
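A small sketch of this ordering in Python, assuming each dashboard is described by a complexity score supplied by engineering plus the counts above; the field names and values are hypothetical.

```python
# Hypothetical dashboard descriptors: complexity plus the tie-breaking counts
dashboards = [
    {"name": "B", "complexity": 1200, "metrics": 8,  "filters": 5, "sources": 2, "users": 40},
    {"name": "C", "complexity": 1150, "metrics": 12, "filters": 4, "sources": 3, "users": 25},
    {"name": "D", "complexity": 900,  "metrics": 6,  "filters": 7, "sources": 1, "users": 60},
]

# Sort from most to least complex; break ties by metrics, then filters/views, then sources, then users
ordered = sorted(
    dashboards,
    key=lambda d: (d["complexity"], d["metrics"], d["filters"], d["sources"], d["users"]),
    reverse=True,
)

base, others = ordered[0], ordered[1:]   # merge the others into the most complex base, in order
print(base["name"], "<-", [d["name"] for d in others])
```

A strict tuple sort treats the counts only as exact tie-breakers; in practice, "around the same complexity" would be judged more loosely, for example by bucketing complexity scores before sorting.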
Unstructured growth of data is a common challenge that is not limited to a single industry, organization, or business line. Teams spend countless hours trying to reconcile data from different sources, and even seemingly similar metrics may have been built for different purposes with different assumptions. These situations are common because dashboard and report creation is typically decentralized: any employee can use an organization's data to build their own views. Combining dashboards with similar metrics can reduce the time and effort spent on data reconciliation by 1) reducing the number of entities to consider while bridging, and 2) driving towards a single source of truth for many metrics.
Traditional clustering algorithms can help decide which dashboards should go together, but we may need custom approaches, as described in this post, to group subsets of dashboards together. Ultimately, the solution to this reconciliation problem is a mechanism for creating reports in a centralized fashion. If that isn't organizationally feasible, a more advanced remedy for manual reconciliation would be a Generative AI framework that sifts through the reports and dashboards in an organization's repository and provides the bridges between metrics, highlighting the key drivers behind the differences.
Thanks for reading. Hope you found it useful. Feel free to send me your comments at [email protected]. Let's connect on LinkedIn.
Merging tokens to accelerate LLM inference with SLERP
This article presents a novel approach to accelerating Large Language Model (LLM) inference by merging tokens using Spherical Linear Interpolation (SLERP). By reducing the sequence length while maintaining quality, this technique offers significant speed-ups in LLM inference, addressing the computational challenges posed by longer sequences. The method is still experimental, but it highlights a dual regime for LLMs: one forward pass set up for training and another dedicated to prediction.
LLMs have revolutionized natural language processing tasks by exhibiting remarkable generative abilities. However, their effectiveness comes at a cost — computational resources. As LLMs process longer sequences, the quadratic scaling of transformer computations becomes increasingly prohibitive. Traditional methods to mitigate this, such as caching and quantization, have limitations. Therefore, there is a need for innovative approaches to speed up LLM inference without compromising too much quality.
The current method to generate a token during inference is a brute force approach, essentially a transposition of the training methodology. While this methodology has proven effective for training, it may not be the most efficient for inference tasks. Thus, there is an opportunity to develop a new inference methodology dedicated specifically to generating tokens during inference, which could optimize the process and further enhance the efficiency of LLMs. This highlights the importance of exploring alternative techniques to address the computational challenges faced by LLM inference.
Recently, the mergekit library proposed merging networks' weights using the SLERP method, which tends to yield better results. Inspired by this work, I decided to see if I could merge the tokens inside a sequence to produce a smaller sequence to process while predicting the next token.
The proposed method involves modifying the forward pass of LLMs to merge tokens using Spherical Linear Interpolation (SLERP), a technique borrowed from computer graphics and animation. Unlike simple averaging techniques, SLERP preserves the spherical aspects of token dimensions, offering a more nuanced interpolation. The merging procedure entails several steps to efficiently condense the input sequence:
Sequence Length Adjustment:
Initially, the input sequence undergoes adjustments based on its length. These adjustments ensure that the first and last tokens in the context are preserved.
Pair Formation:
The adjusted sequence is then formatted into pairs of consecutive tokens. This process prepares the data for aggregation.
Aggregation with SLERP:
Each pair of tokens is aggregated using SLERP, reducing the sequence length roughly by half (not exactly by half, since some extra tokens are added and preserved). SLERP interpolates between the two vectors representing consecutive tokens, creating a new vector.
To do this efficiently, I recreated all the SLERP functions in native PyTorch. However, the code might be under-optimized.
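For reference, here is a minimal sketch of what such a SLERP step can look like in PyTorch. The function name, the epsilon handling, and the fallback to linear interpolation for nearly parallel vectors are illustrative choices, not the repository's exact code.

```python
import torch

def slerp(v0: torch.Tensor, v1: torch.Tensor, t: float = 0.5, eps: float = 1e-7) -> torch.Tensor:
    """Spherical linear interpolation between two batches of token vectors (shape: ..., dim)."""
    v0_n = v0 / (v0.norm(dim=-1, keepdim=True) + eps)
    v1_n = v1 / (v1.norm(dim=-1, keepdim=True) + eps)
    dot = (v0_n * v1_n).sum(dim=-1, keepdim=True).clamp(-1.0 + eps, 1.0 - eps)
    omega = torch.acos(dot)                      # angle between the two token directions
    sin_omega = torch.sin(omega)
    # Nearly parallel vectors: dividing by sin(omega) is unstable, so fall back to a plain lerp
    lerp = (1.0 - t) * v0 + t * v1
    slerped = (torch.sin((1.0 - t) * omega) * v0 + torch.sin(t * omega) * v1) / sin_omega
    return torch.where(sin_omega < 1e-4, lerp, slerped)

# Two batches of three 8-dimensional "token" vectors, blended at the midpoint
v0, v1 = torch.randn(3, 8), torch.randn(3, 8)
print(slerp(v0, v1, t=0.5).shape)   # torch.Size([3, 8])
```

Unlike a simple average, the spherical weights sin((1-t)*omega)/sin(omega) and sin(t*omega)/sin(omega) account for the angle between the two vectors, which is the nuance the method relies on.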
Layer Cutoff and Prompt Preservation:
The merging process can occur at different levels of the model architecture, referred to as “layer cutoff.” Additionally, to preserve the integrity of prompts, a portion of the sequence at the beginning and/or end can be designated to remain unchanged. This is particularly useful for Instruct-based Models where the starting part of the prompt should always be remembered.
This innovative approach offers a nuanced solution to the computational challenges associated with LLM inference, promising significant speed-ups without sacrificing quality or accuracy.
What does this mean in practice?
Concretely, in an LLM, the forward call takes as input a sequence of tokens of shape (batch size, sequence length). The embedding layer produces a sequence of shape (batch size, sequence length, dimension). Each attention module takes this sequence as input. At a given attention layer, you can merge the tokens to create a sequence of shape (batch size, k, dimension), where k is the compressed sequence length. The choice of the layer at which to apply this is the "layer cutoff".
The subsequent attention modules no longer need to compute a (sequence length, sequence length) attention matrix but a smaller one, since k is strictly smaller than the original sequence length.
Hence, the merging can occur at different levels of the model architecture; this parameter is referred to as the "layer cutoff". Also, to ensure that a prompt is not completely merged, you can designate a part of the sequence at the beginning and/or at the end to be kept unchanged. This is especially useful for instruct-based models, where the starting part of the prompt should always be retained.
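To make the shape bookkeeping concrete, here is a minimal sketch of a pair-merging step at the layer cutoff. It reuses the slerp helper sketched above; the prefix/suffix sizes and the handling of an odd-length middle are illustrative assumptions, not the repository's exact logic.

```python
import torch

def merge_pairs(hidden: torch.Tensor, keep_prefix: int = 4, keep_suffix: int = 4) -> torch.Tensor:
    """Merge consecutive token pairs with SLERP at the layer cutoff.

    hidden: activations of shape (batch, seq_len, dim) entering the cutoff layer.
    keep_prefix / keep_suffix: tokens preserved unchanged at the start and end of
    the sequence (e.g. to keep an instruction prompt intact).
    Assumes the slerp() helper sketched earlier in this post.
    """
    seq_len = hidden.shape[1]
    prefix = hidden[:, :keep_prefix]
    suffix = hidden[:, seq_len - keep_suffix:]
    middle = hidden[:, keep_prefix:seq_len - keep_suffix]

    # If the middle part has an odd number of tokens, keep the last one unmerged
    if middle.shape[1] % 2 == 1:
        middle, leftover = middle[:, :-1], middle[:, -1:]
    else:
        leftover = middle[:, :0]

    # Group consecutive tokens into pairs and SLERP each pair into a single vector
    pairs = middle.reshape(middle.shape[0], -1, 2, middle.shape[-1])
    merged = slerp(pairs[:, :, 0], pairs[:, :, 1], t=0.5)   # (batch, k, dim)

    return torch.cat([prefix, merged, leftover, suffix], dim=1)

# Example: a 33-token sequence is compressed before the remaining attention layers
hidden = torch.randn(2, 33, 64)
print(merge_pairs(hidden).shape)   # torch.Size([2, 21, 64]) with the defaults above
```

Every attention layer after the cutoff then works on the shorter sequence, which is where the speed-up comes from; in the real model, attention masks and positional embeddings also have to be recomputed for the compressed sequence, as noted below.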
One downside of this methodology is that it strongly relies on the underlying forward pass of the used model, requiring you to carefully rewrite the “merged” process depending on the chosen model. Another downside is the necessity of recomputing attention masks and possibly positional embeddings at each step.
Experiments conducted on a Mistral 7B Instruct V0.2 model demonstrate promising outcomes. By comparing predictions between the base model and various merged models at different layer cutoffs, it was observed that merging tokens did not significantly impact prediction quality. Moreover, the merged models exhibited notable speed-ups in inference time, particularly at shallower layers. The technique also showcased its effectiveness in handling longer sequences, making it applicable across a wide range of use cases.
One downside is that I did not succeed in making the forward call fully optimized, so there are probably many optimizations to be found by rethinking the process.
I also tested a merged version of Mistral Instruct v0.2 on the AlpacaEval dataset, applying the merging at the 20th attention module. The results are really encouraging: the model outperforms Falcon 7B, Gemma 7B, and nous-hermes-13b. It also shows that merging without rethinking the positional encodings yields a more verbose model, generating around 600 more tokens on average. I tried to reimplement the positional encoding procedure but did not succeed.
In summary, merging tokens with SLERP is a strong candidate for addressing the computational challenges associated with LLM inference. By striking a balance between speed and quality, the approach comes down to rewriting the forward loop.
I prepared a repo with a simple notebook to play with it here: https://github.com/samchaineau/llm_slerp_generation
Using a new class where the forward call is adapted, you can easily pass the LLM to a generation pipeline and use it on your dataset. So far my experiments are limited to a Mistral 7B model, but I would like to extend the approach to other architectures to see whether the performance holds.
All of the resources are in the repository, and you can reach out to me if you would like to test it on another LLM.
Merging tokens with SLERP is a technique worth exploring for accelerating LLM inference. With further optimization and exploration, it has the potential to improve the efficiency and scalability of natural language processing tasks.
If you work in the AI field and would like to take this to the next level, reach out to me!
Github link : https://github.com/samchaineau/llm_slerp_generation
HuggingFace profile : https://huggingface.co/samchain
– Token Merging Stable Diffusion (paper) : https://arxiv.org/abs/2303.17604
– Token Merging Stable Diffusion (library) : https://huggingface.co/docs/diffusers/optimization/tome
– Token Merging NLP (paper) : https://llm-random.github.io/posts/mixture_of_tokens/
Unless otherwise noted, all images are by the author.