Tag: AI

  • AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks

    Google AI

    Time series problems are ubiquitous, from forecasting weather and traffic patterns to understanding economic trends. Bayesian approaches start with an assumption about the data’s patterns (prior probability), collect evidence (e.g., new time series data), and continuously update that assumption to form a posterior probability distribution. Traditional Bayesian approaches like Gaussian processes (GPs) and Structural Time Series are extensively used for modeling time series data, e.g., the commonly used Mauna Loa CO2 dataset. However, they often rely on domain experts to painstakingly select appropriate model components and may be computationally expensive. Alternatives such as neural networks lack interpretability, making it difficult to understand how they generate forecasts, and don’t produce reliable confidence intervals.

    To that end, we introduce AutoBNN, a new open-source package written in JAX. AutoBNN automates the discovery of interpretable time series forecasting models, provides high-quality uncertainty estimates, and scales effectively for use on large datasets. We describe how AutoBNN combines the interpretability of traditional probabilistic approaches with the scalability and flexibility of neural networks.

    AutoBNN

    AutoBNN is based on a line of research that over the past decade has yielded improved predictive accuracy by modeling time series using GPs with learned kernel structures. The kernel function of a GP encodes assumptions about the function being modeled, such as the presence of trends, periodicity or noise. With learned GP kernels, the kernel function is defined compositionally: it is either a base kernel (such as Linear, Quadratic, Periodic, Matérn or ExponentiatedQuadratic) or a composite that combines two or more kernel functions using operators such as Addition, Multiplication, or ChangePoint. This compositional kernel structure serves two related purposes. First, it is simple enough that a user who is an expert about their data, but not necessarily about GPs, can construct a reasonable prior for their time series. Second, techniques like Sequential Monte Carlo can be used for discrete searches over small structures and can output interpretable results.
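The compositional idea can be illustrated with a minimal NumPy sketch. It is not AutoBNN or GP library code: the function names, the kernel parameterizations and the chosen structure are assumptions made for illustration. Base kernels are plain functions that return a Gram matrix, and `add` and `multiply` combine them into a new kernel.

```python
import numpy as np

def linear(x1, x2):
    # Linear kernel: k(x, x') = x * x'
    return np.outer(x1, x2)

def exponentiated_quadratic(x1, x2, length_scale=1.0):
    # Smooth, local similarity that decays with squared distance.
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def periodic(x1, x2, period=1.0, length_scale=1.0):
    # Similarity that repeats every `period` units.
    dist = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * dist / period) ** 2 / length_scale**2)

def add(*kernels):
    # Sum of kernels: the modeled function is a sum of independent parts.
    return lambda x1, x2: sum(k(x1, x2) for k in kernels)

def multiply(*kernels):
    # Product of kernels: one component modulates the other.
    def product(x1, x2):
        result = np.ones((len(x1), len(x2)))
        for k in kernels:
            result = result * k(x1, x2)
        return result
    return product

# Hypothetical structure: a linear trend plus a locally periodic component.
kernel = add(linear, multiply(periodic, exponentiated_quadratic))
xs = np.linspace(0.0, 2.0, 5)
gram = kernel(xs, xs)
```

Because sums and products of valid kernels are again valid kernels, the composite `gram` matrix is symmetric and positive semi-definite, which is what makes this kind of structure search well defined.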

    AutoBNN improves upon these ideas, replacing the GP with Bayesian neural networks (BNNs) while retaining the compositional kernel structure. A BNN is a neural network with a probability distribution over weights rather than a fixed set of weights. This induces a distribution over outputs, capturing uncertainty in the predictions. BNNs bring the following advantages over GPs: First, training large GPs is computationally expensive, and traditional training algorithms scale as the cube of the number of data points in the time series. In contrast, for a fixed width, training a BNN will often be approximately linear in the number of data points. Second, BNNs lend themselves better to GPU and TPU hardware acceleration than GP training operations. Third, compositional BNNs can be easily combined with traditional deep BNNs, which have the ability to do feature discovery. One could imagine “hybrid” architectures, in which users specify a top-level structure of Add(Linear, Periodic, Deep), and the deep BNN is left to learn the contributions from potentially high-dimensional covariate information.
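To make "a probability distribution over weights" concrete, here is a minimal NumPy sketch. It assumes standard normal priors, a single ReLU hidden layer and a width of 10; none of this is taken from AutoBNN itself. Each draw of the weights gives one function, and the spread across draws is the uncertainty in the output. The sketch samples from the prior only; inference would replace these draws with posterior samples.

```python
import numpy as np

def sample_network_outputs(xs, width=10, num_samples=200, seed=0):
    """Draw weights from a (hypothetical) normal prior and evaluate the network."""
    rng = np.random.default_rng(seed)
    outputs = np.empty((num_samples, len(xs)))
    for i in range(num_samples):
        w_in = rng.normal(size=(1, width))
        b_in = rng.normal(size=width)
        # Scale output weights so the output variance does not grow with width.
        w_out = rng.normal(size=(width, 1)) / np.sqrt(width)
        hidden = np.maximum(xs[:, None] @ w_in + b_in, 0.0)  # ReLU activations
        outputs[i] = (hidden @ w_out)[:, 0]
    return outputs

xs = np.linspace(-2.0, 2.0, 50)
draws = sample_network_outputs(xs)

# Summarize the induced distribution over outputs at each input.
mean = draws.mean(axis=0)
low, high = np.percentile(draws, [2.5, 97.5], axis=0)
```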

    How might one translate a GP with compositional kernels into a BNN then? A single-layer neural network will typically converge to a GP as the number of neurons (or “width”) goes to infinity. More recently, researchers have discovered a correspondence in the other direction: many popular GP kernels (such as Matern, ExponentiatedQuadratic, Polynomial or Periodic) can be obtained as infinite-width BNNs with appropriately chosen activation functions and weight distributions. Furthermore, these BNNs remain close to the corresponding GP even when the width is far from infinite. For example, the figures below show the difference in the covariance between pairs of observations, and regression results of the true GPs and their corresponding width-10 neural network versions.

    Comparison of Gram matrices between true GP kernels (top row) and their width 10 neural network approximations (bottom row).
    Comparison of regression results between true GP kernels (top row) and their width 10 neural network approximations (bottom row).
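One well-known instance of this correspondence can be checked numerically. The sketch below uses random Fourier features, a standard result that is swapped in here for illustration and is not necessarily the construction AutoBNN uses: a single hidden layer with cosine activations, Gaussian input weights and uniform biases has an output covariance that approaches the ExponentiatedQuadratic kernel as the width grows.

```python
import numpy as np

def exponentiated_quadratic_gram(xs, length_scale=1.0):
    # Exact Gram matrix of the ExponentiatedQuadratic (RBF) kernel.
    sq_dist = (xs[:, None] - xs[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def random_feature_gram(xs, width, length_scale=1.0, seed=0):
    # Covariance induced by a width-`width` cosine network whose output
    # weights have independent unit-variance priors.
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=1.0 / length_scale, size=width)
    biases = rng.uniform(0.0, 2.0 * np.pi, size=width)
    hidden = np.sqrt(2.0 / width) * np.cos(np.outer(xs, weights) + biases)
    return hidden @ hidden.T

xs = np.linspace(-2.0, 2.0, 20)
exact = exponentiated_quadratic_gram(xs)
for width in (10, 100, 10_000):
    approx = random_feature_gram(xs, width)
    print(width, np.abs(approx - exact).max())
```

The printed maximum error depends on the random seed, so no specific values are quoted here; the point is only that the finite-width Gram matrix tracks the exact one more closely as the width increases.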

    Finally, the translation is completed with BNN analogues of the Addition and Multiplication operators over GPs, and input warping to produce periodic kernels. BNN addition is straightforwardly given by adding the outputs of the component BNNs. BNN multiplication is achieved by multiplying the activations of the hidden layers of the BNNs and then applying a shared dense layer. We are therefore limited to only multiplying BNNs with the same hidden width.
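The two operators described above can be sketched in a few lines of NumPy. This is an illustrative reading of the paragraph, not AutoBNN's implementation: the dictionary layout, the helper names and the fixed random weights are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
width = 8  # both components must share this hidden width to be multiplied

def make_component(activation):
    # One hypothetical single-hidden-layer network with fixed sampled weights.
    return {
        "w_in": rng.normal(size=(1, width)),
        "b_in": rng.normal(size=width),
        "w_out": rng.normal(size=(width, 1)) / np.sqrt(width),
        "activation": activation,
    }

def hidden(component, xs):
    pre_activation = xs[:, None] @ component["w_in"] + component["b_in"]
    return component["activation"](pre_activation)

def output(component, xs):
    return hidden(component, xs) @ component["w_out"]

def add(components, xs):
    # Addition: sum the outputs of the component networks.
    return sum(output(c, xs) for c in components)

def multiply(components, shared_w_out, xs):
    # Multiplication: multiply hidden activations elementwise,
    # then apply one shared dense output layer.
    product = np.ones((len(xs), width))
    for c in components:
        product = product * hidden(c, xs)
    return product @ shared_w_out

relu_net = make_component(lambda z: np.maximum(z, 0.0))
tanh_net = make_component(np.tanh)
xs = np.linspace(0.0, 1.0, 5)

summed = add([relu_net, tanh_net], xs)
shared = rng.normal(size=(width, 1)) / np.sqrt(width)
multiplied = multiply([relu_net, tanh_net], shared, xs)
```

The elementwise product in `multiply` is why the hidden widths have to match: the activations of the two networks are paired up unit by unit before the shared layer is applied.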

    Using AutoBNN

    The AutoBNN package is available within TensorFlow Probability. It is implemented in JAX and uses the flax.linen neural network library. It implements all of the base kernels and operators discussed so far (Linear, Quadratic, Matern, ExponentiatedQuadratic, Periodic, Addition, Multiplication) plus one new kernel and three new operators:

    • a OneLayer kernel, a single hidden layer ReLU BNN,
    • a ChangePoint operator that allows smoothly switching between two kernels,
    • a LearnableChangePoint operator which is the same as ChangePoint except position and slope are given prior distributions and can be learnt from the data, and
    • a WeightedSum operator.

    WeightedSum combines two or more BNNs with learnable mixing weights, where the learnable weights follow a Dirichlet prior. By default, a flat Dirichlet distribution with concentration 1.0 is used.

    WeightedSums allow a “soft” version of structure discovery, i.e., training a linear combination of many possible models at once. In contrast to structure discovery with discrete structures, such as in AutoGP, this allows us to use standard gradient methods to learn structures, rather than using expensive discrete optimization. Instead of evaluating potential combinatorial structures in series, WeightedSum allows us to evaluate them in parallel.
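A rough NumPy sketch of the WeightedSum idea follows. The component functions are stand-ins chosen for illustration, and the weights are a single draw from the flat Dirichlet prior; in AutoBNN they would be learned together with the component networks.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 50)

# Stand-ins for the outputs of three component BNNs on the same inputs.
component_outputs = np.stack([
    xs,                            # Linear-like component
    xs ** 2,                       # Quadratic-like component
    np.sin(2.0 * np.pi * 4 * xs),  # Periodic-like component
])

# Flat Dirichlet prior (concentration 1.0) over the mixing weights.
concentration = np.ones(len(component_outputs))
weights = rng.dirichlet(concentration)

# The WeightedSum output is a convex combination of the components.
prediction = weights @ component_outputs
```

Because the weights are continuous and sum to one, the mixture stays differentiable with respect to them, which is what lets gradient methods stand in for a discrete search over structures.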

    To easily enable exploration, AutoBNN defines a number of model structures that contain either top-level or internal WeightedSums. The names of these models can be used as the first parameter in any of the estimator constructors, and include things like sum_of_stumps (the WeightedSum over all the base kernels) and sum_of_shallow (which adds all possible combinations of base kernels with all operators).

    Illustration of the sum_of_stumps model. The bars in the top row show the amount by which each base kernel contributes, and the bottom row shows the function represented by the base kernel. The resulting weighted sum is shown on the right.

    The figure below demonstrates the technique of structure discovery on the N374 series (a time series of yearly financial data starting from 1949) from the M3 dataset. The six base structures were ExponentiatedQuadratic (which is the same as the Radial Basis Function kernel, or RBF for short), Matern, Linear, Quadratic, OneLayer and Periodic kernels. The figure shows the MAP estimates of their weights over an ensemble of 32 particles. All of the high-likelihood particles gave a large weight to the Periodic component, low weights to Linear, Quadratic and OneLayer, and a large weight to either RBF or Matern.

    Parallel coordinates plot of the MAP estimates of the base kernel weights over 32 particles. The sum_of_stumps model was trained on the N374 series from the M3 dataset (inset in blue). Darker lines correspond to particles with higher likelihoods.

    By using WeightedSums as the inputs to other operators, it is possible to express rich combinatorial structures, while keeping models compact and the number of learnable weights small. As an example, we include the sum_of_products model (illustrated in the figure below) which first creates a pairwise product of two WeightedSums, and then a sum of the two products. By setting some of the weights to zero, we can create many different discrete structures. The total number of possible structures in this model is 2^16 (65,536), since there are 16 base kernels that can each be turned on or off. All these structures are explored implicitly by training just this one model.

    Illustration of the “sum_of_products” model. Each of the four WeightedSums has the same structure as the “sum_of_stumps” model.

    We have found, however, that certain combinations of kernels (e.g., the product of Periodic and either the Matern or ExponentiatedQuadratic) lead to overfitting on many datasets. To prevent this, we have defined model classes like sum_of_safe_shallow that exclude such products when performing structure discovery with WeightedSums.

    For training, AutoBNN provides AutoBnnMapEstimator and AutoBnnMCMCEstimator to perform MAP and MCMC inference, respectively. Either estimator can be combined with any of the six likelihood functions, including four based on normal distributions with different noise characteristics for continuous data and two based on the negative binomial distribution for count data.

    Result from running AutoBNN on the Mauna Loa CO2 dataset in our example colab. The model captures the trend and seasonal component in the data. Extrapolating into the future, the mean prediction slightly underestimates the actual trend, while the 95% confidence interval gradually increases.

    To fit a model like the one in the figure above, all it takes is about 10 lines of code, using the scikit-learn–inspired estimator interface:

    import jax
    import autobnn as ab

    # Top-level structure: a sum of periodic, linear and Matern components.
    model = ab.operators.Add(
        bnns=(ab.kernels.PeriodicBNN(width=50),
              ab.kernels.LinearBNN(width=50),
              ab.kernels.MaternBNN(width=50)))

    # MAP estimator with a named likelihood and a PRNG key for initialization.
    # periods=[12] sets the period of the Periodic component
    # (e.g., 12 for monthly data with a yearly cycle).
    estimator = ab.estimators.AutoBnnMapEstimator(
        model, 'normal_likelihood_logistic_noise', jax.random.PRNGKey(42),
        periods=[12])

    estimator.fit(my_training_data_xs, my_training_data_ys)
    low, mid, high = estimator.predict_quantiles(my_training_data_xs)
    

    Conclusion

    AutoBNN provides a powerful and flexible framework for building sophisticated time series prediction models. By combining the strengths of BNNs and GPs with compositional kernels, AutoBNN opens a world of possibilities for understanding and forecasting complex data. We invite the community to try the colab, and leverage this library to innovate and solve real-world challenges.

    Acknowledgements

    AutoBNN was written by Colin Carroll, Thomas Colthurst, Urs Köster and Srinivas Vasudevan. We would like to thank Kevin Murphy, Brian Patton and Feras Saad for their advice and feedback.

  • Where Do EU Horizon H2020 Fundings Go?

    Milan Janosov

    Combining explorative data analytics, geospatial data, and network science in Python to overview 35k+ EU-funded projects.

  • Learning to Rank — Contextual Item Recommendations for User Pairs

    Jay Franck

    Train a Machine Learning recommendation engine that learns the shared preferences of groups of people

    This walkthrough is for…

    1. Anyone interested in DIY recommendations
    2. Engineers interested in basic PyTorch ranking models
    3. Coffee nerds

    This walkthrough is not for…

    1. Someone who wants to copy-paste code into their production system
    2. Folks that wanted a TensorFlow model

    Motivation

    Imagine you are sitting on your couch, friends or family present. You have your preferred game console/streaming service/music app open, and each item is a glittering jewel of possibility, tailored for you. But those personalized results may be for the solo version of yourself, and do not reflect the version of yourself when surrounded by this particular mix of others.

    This project truly started with coffee. I am enamored with roasting my own green coffee sourced from Sweet Maria’s (no affiliation), as it has such a variety of delicious possibilities. Colombian? Java-beans? Kenyan Peaberry? Each description is more tantalizing than the last. It is so hard to choose even for myself as an individual. What happens if you are buying green coffee for your family or guests?

    I wanted to create a Learning to Rank (LTR) model that could potentially solve this coffee conundrum. For this project, I began by building a simple TensorFlow Ranking project to predict user-pair rankings of different coffees. I had some experience with TFR, and so it seemed like a natural fit.

    However, I realized I had never made a ranking model from scratch before! I set about constructing a very hacky PyTorch ranking model to see if I could throw one together and learn something in the process. This is obviously not intended for a production system, and I made a lot of shortcuts along the way, but it has been an amazing pedagogical experience.

    Data

    Our supreme goal is the following:

    • develop a ranking model that learns the pairwise preferences of users
    • apply this to predict the listwise ranking of `k` items

    What signal might lie in user and item feature combinations to produce a set of recommendations for that user pair?

    To collect this data, I had to perform painful research of taste-testing amazing coffees with my wife. Each of us then rated them on a 10-point scale. The target value is simply the sum of our two scores (20 point maximum). The objective of the model is to Learn to Rank coffees that we will both enjoy, and not just one member of any pair. The contextual data that we will be using is the following:

    • ages of both users in the pair
    • user ids that will be turned into embeddings

    SweetMarias.com provides a lot of item data:

    • the origin of the coffee
    • Processing and cultivation notes
    • tasting descriptions
    • professional grading scores (100 point scale)

    So for each training example, we will have the user data as the contextual information and each item’s feature set will be concatenated.

    TensorFlow Ranking models are typically trained on data in ELWC format: ExampleListWithContext. You can think of it like a dictionary with 2 keys: CONTEXT and EXAMPLES (list). Inside each EXAMPLE is a dictionary of features per item you wish to rank.

    For example, let us assume that I was searching for a new coffee to try out, and some candidate pool was presented to me of k=10 coffee varietals. An ELWC would consist of the context/user information, as well as a list of 10 items, each with its own feature set.
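As a rough picture of that shape, the plain Python dictionary below mimics the context-plus-examples layout. It is only a mental model, not the actual TensorFlow Ranking ELWC format, and every field name and value in it is a hypothetical placeholder.

```python
# A hypothetical, dictionary-shaped stand-in for one ELWC record.
k = 10

elwc_like = {
    # Shared by every item in the list: who is asking.
    "context": {
        "user_a_id": 0,
        "user_b_id": 1,
        "user_a_age": 30,   # placeholder value
        "user_b_age": 32,   # placeholder value
    },
    # One feature dictionary per candidate item to be ranked.
    "examples": [
        {
            "coffee_id": i,
            "origin": "placeholder origin",
            "description": "placeholder tasting notes",
            "expert_score": 0.0,  # placeholder for the 100-point grade
        }
        for i in range(k)
    ],
}
```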

    As I was no longer using TensorFlow Ranking, I made my own hacky ranking/list building aspect of this project. I grabbed random samples of k items from which we have scores and added them to a list. I split the first coffees I tried into a training set, and later examples became a small validation set to evaluate the model.
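The list-building step might look something like the sketch below. The scores are synthetic placeholders, and the 12/4 split, the list sizes and the helper name `build_lists` are assumptions for illustration rather than details from the project repo.

```python
import random

def build_lists(rated_coffees, k, num_lists, seed=0):
    """Sample `num_lists` random lists of `k` distinct rated coffees."""
    rng = random.Random(seed)
    return [rng.sample(rated_coffees, k) for _ in range(num_lists)]

# Synthetic stand-in ratings: two raters, each on a 10-point scale.
rated = [
    {"coffee_id": i, "score_a": 1 + (3 * i) % 10, "score_b": 1 + (7 * i) % 10}
    for i in range(16)
]
for coffee in rated:
    # Target is the sum of both scores (20 point maximum).
    coffee["target"] = coffee["score_a"] + coffee["score_b"]

# Earlier coffees go to training, later ones to a small validation set.
train_pool, validation_pool = rated[:12], rated[12:]
train_lists = build_lists(train_pool, k=5, num_lists=20)
validation_lists = build_lists(validation_pool, k=3, num_lists=5, seed=1)
```

Splitting the pool before sampling keeps validation coffees out of every training list, which matters when the same few items are reused across many sampled lists.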

    Feature Intuition

    In this toy example, we have a fairly rich dataset. Context-wise, we ostensibly know the users’ ages and can learn their respective preference embeddings. Through subsequent layers inside the LTR, these contextual features can be compared and contrasted. Does one user in the pair like dark, fruity flavors, while the other enjoys invigorating citrus and fruity notes in their cup?

    For the item features, we have a generous helping of rich, descriptive text of each coffee’s tasting notes, origin, etc. More on this later, but the general idea is that we can capture the meaning of these descriptions and match the descriptions with the context (user-pair) data. Finally, we have some numerical features like the product expert tasting score per item that (should) bear some resemblance to reality.

    Preprocessing

    A stunning shift is underway in text embeddings from when I was starting out in the ML industry. Long gone are the GloVe and Word2Vec models that I used to use to try to capture some semantic meaning from a word or phrase. If you head on over to https://huggingface.co/blog/mteb, you can easily compare what the latest and greatest embedding models are for a variety of purposes.

    For the sake of simplicity and familiarity, we will be using https://huggingface.co/BAAI/bge-base-en-v1.5 embeddings to help us project our text features into something understandable by an LTR model. Specifically, we will use this for the product descriptions and product names that Sweet Maria’s provides.

    We will also need to convert all of our user- and item-id values into an embedding space. PyTorch handles this beautifully with the Embedding Layers.

    Finally, we do some scaling on our float features with a simple RobustScaler. This can all happen inside our Torch Dataset class, which then gets dumped into a DataLoader for training. The trick here is to separate out the different identifiers that will get passed into the forward() call for PyTorch. This article by Offir Inbar really saved me some time by doing just that!
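For intuition about the scaling step, the NumPy sketch below reproduces what scikit-learn's RobustScaler does with its default settings: subtract the median and divide by the interquartile range. The helper name and the sample scores are made up for illustration.

```python
import numpy as np

def robust_scale(values):
    """Center on the median and scale by the interquartile range (IQR)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # avoid division by zero for constant features
    return (values - median) / iqr

# Made-up expert scores with one outlier.
expert_scores = [86.5, 87.0, 88.2, 89.1, 90.0, 98.0]
scaled = robust_scale(expert_scores)
```

Because the median and IQR barely move when a single extreme value is added, an outlier like the last score does not squash the remaining values together the way mean and standard deviation scaling can.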

    Model Building and Training

    The only interesting thing about the Torch training was ensuring that the 2 user embeddings (one for each rater) and the k coffees in the list for training had the correct embeddings and dimensions to pass through our neural network. With a few tweaks, I was able to get something out:

    This forward pushes each training example into a single concatenated list with all of the features.
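The concatenation idea can be sketched without any deep learning framework. The NumPy version below is a hypothetical reconstruction, not the author's PyTorch code: the lookup tables stand in for what would be embedding layers in PyTorch, the dimensions are kept tiny for readability, and a single linear layer stands in for the network.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_coffees, embed_dim, text_dim, k = 2, 16, 4, 8, 5

# Lookup tables standing in for learned user and coffee embeddings.
user_table = rng.normal(size=(num_users, embed_dim))
coffee_table = rng.normal(size=(num_coffees, embed_dim))

def forward(user_ids, user_ages, coffee_ids, text_embeddings, float_features, w, b):
    # Context: both user embeddings plus both (scaled) ages.
    context = np.concatenate([user_table[user_ids].ravel(), user_ages])
    rows = []
    for i, coffee_id in enumerate(coffee_ids):
        # One row per coffee: context + item embedding + text + numeric features.
        rows.append(np.concatenate(
            [context, coffee_table[coffee_id], text_embeddings[i], float_features[i]]))
    features = np.stack(rows)   # shape (k, feature_dim)
    return features @ w + b     # one score per coffee, shape (k,)

feature_dim = 2 * embed_dim + 2 + embed_dim + text_dim + 1
scores = forward(
    user_ids=np.array([0, 1]),
    user_ages=np.array([0.30, 0.32]),               # placeholder scaled ages
    coffee_ids=rng.choice(num_coffees, size=k, replace=False),
    text_embeddings=rng.normal(size=(k, text_dim)),  # stand-in for text embeddings
    float_features=rng.normal(size=(k, 1)),          # stand-in for scaled expert scores
    w=rng.normal(size=feature_dim),
    b=0.0,
)
ranking = np.argsort(-scores)  # positions in the list, best first
```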

    With so few data points (only 16 coffees were rated), it can be difficult to train a robust NN model. I often build a simple sklearn model side by side so that I can compare the results. Are we really learning anything?

    Using the same data preparation techniques, I built a LogisticRegression multi-class classifier model, and then dumped out the .predict_proba() scores to be used as our rankings. What could our metrics say about the performance of these two models?

    Results

    For the metrics, I chose to track two:

    1. Top (`k=1`) accuracy
    2. NDCG

    The goal, of course, is to get the ranking correct for these coffees. NDCG will fit the bill nicely here. However, I suspected that the LogReg model might struggle with the ranking aspect, so I thought I might throw a simple accuracy in there as well. Sometimes you only want one really good cup of coffee and don’t need a ranking!
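Both metrics are small enough to write out by hand. The pure Python sketch below uses the linear-gain form of DCG (relevance divided by a log discount); other implementations use an exponential gain, so absolute values may differ from a given library's. The function names are illustrative.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain with linear gain and a log2 position discount.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(true_relevances, predicted_scores):
    # Order items by predicted score, then compare against the ideal ordering.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevances[i] for i in order]
    ideal_dcg = dcg(sorted(true_relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

def top1_accuracy(true_relevances, predicted_scores):
    # 1.0 if the top-scored item is (one of) the truly best items, else 0.0.
    best_predicted = max(range(len(predicted_scores)),
                         key=lambda i: predicted_scores[i])
    return float(true_relevances[best_predicted] == max(true_relevances))

# A perfectly ordered list and a fully reversed one.
perfect = ndcg([3, 2, 1], [0.9, 0.5, 0.1])
reversed_order = ndcg([1, 2, 3], [0.9, 0.5, 0.1])
```

NDCG stays fairly high even for a badly ordered short list because every item still contributes some gain, which is one reason to track top-1 accuracy alongside it.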

    Without any significant investment in parameter tuning on my part, I achieved very similar results between the two models. SKLearn had slightly worse NDCG on the (tiny) validation set (0.9581 vs 0.950), but similar accuracy. I believe with some hyper-parameter tuning on both the PyTorch model and the LogReg model, the results could be very similar with so little data. But at least they broadly agree!

    Future Work

    I have a new batch of 16 pounds of coffee to start ranking to add to the model, and I deliberately added some lesser-known varietals to the mix. I hope to clean up the repo a bit and make it less of a hack-job. Also I need to add a prediction function for unseen coffees so that I can figure out what to buy next order!

    One thing to note is that if you are building a recommender for production, it is often a good idea to use a real library built for ranking. TensorFlow Ranking, XGBoost, LambdaRank, etc. are accepted in the industry and have lots of the pain points ironed out.

    Please check out the repo here and let me know if you catch any bugs! I hope you are inspired to train your own User-Pair model for ranking.


    Learning to Rank — Contextual Item Recommendations for User Pairs was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

  • Advanced RAG patterns on Amazon SageMaker

    Niithiyn Vijeaswaran

    Today, customers of all industries—whether it’s financial services, healthcare and life sciences, travel and hospitality, media and entertainment, telecommunications, software as a service (SaaS), and even proprietary model providers—are using large language models (LLMs) to build applications like question and answering (QnA) chatbots, search engines, and knowledge bases. These generative AI applications are not only […]

  • Efficient continual pre-training LLMs for financial domains

    Yong Xie

    Large language models (LLMs) are generally trained on large publicly available datasets that are domain agnostic. For example, Meta’s Llama models are trained on datasets such as CommonCrawl, C4, Wikipedia, and ArXiv. These datasets encompass a broad range of topics and domains. Although the resulting models yield amazingly good results for general tasks, such as […]

  • Coding with LLMs, Learning Math, Data Science Freelancing, and Other March Must-Reads

    TDS Editors

    As large language model-based workflows become both more sophisticated and more widespread, we’re seeing a growing number of novel approaches that help practitioners tailor (and improve) the models’ performance to specific projects and use cases. Many of our best-read articles in the past month zoomed in on this trend, with excellent guides for both novices and experienced users.

    Our monthly highlights go beyond the exciting world of LLMs to explore other topics that remain top of mind for many data and ML professionals—from solidifying their math skills to streamlining error messages in Python. We hope you carve out some time over the next few days to discover (or revisit) some of our most popular articles from March. Let’s dive in!

    Monthly Highlights

    • Intro to DSPy: Goodbye Prompting, Hello Programming!
      Few recent tools have generated as much excitement as DSPy, a powerful open-source framework for algorithmically optimizing prompts and weights. Leonie Monigatti brought her signature clarity and practical approach to this topic, and her beginner-friendly guide attracted the largest readership on TDS this month.
    • How to Learn the Math Needed for Data Science
      How much math knowledge should data scientists accumulate in order to do well on their job? The yearslong debate rages on, but for anyone who’s still in the process of building their fundamental skills, Egor Howell’s primer—which comes with ample resources and tips—is a great place to start.
    • Why LLMs Are Not Good for Coding
      AI-assisted programming is not exactly new, but talk about the imminent disappearance of developers has become a lot more common in the past year or so. Depending on your perspective, Andrea Valenzuela’s assessment of LLMs’ current limitations will be either sobering or comforting; testing ChatGPT’s abilities, she concludes that “it often struggles to generate efficient and high-quality code.”

    Our latest cohort of new authors

    Every month, we’re thrilled to see a fresh group of authors join TDS, each sharing their own unique voice, knowledge, and experience with our community. If you’re looking for new writers to explore and follow, just browse the work of our latest additions, including Tahreem Rasul, Benoît Courty, Kabeer Akande, Riddhisha Prabhu, Markus Stoll, Davide Ghilardi, Dr. Leon Eversberg, Stephan Hausberg, Eden B., Volker Janz, Chris Taylor, Lior Sidi, Yuval Zukerman, Geoffrey Williams, Krzysztof K. Zdeb, Ryan O’Sullivan, Jimmy Wong, Thauri Dattadeen, Eric Frey, Bill Chambers, Tianyi Li, Marlon Hamm, Sebastian Bahr, Florent Pajot, Mark Chang, Pierre Lienhart, Thierry Jean, Tiddo Loos, G. Jay Kerns, Amirarsalan Rajabi, Hussein Jundi, Saikat Dutta, Nidhi Srinath, Ophelia P Johnson, Antonio Grandinetti, Vedant Jumle, Julia Winn, Dusko Pavlovic, Srijanie Dey, PhD, Melanie Hart Buehler, Siq Sun, Lukasz Kowejsza, Sandi Besen, Tula Masterman, Saar Berkovich, Maggie Ma, Georg Ruile, Ph.D., and Amine Raji, among others.

    Thank you for supporting the work of our authors! If you’re feeling inspired to join their ranks, why not write your first post? We’d love to read it.

    Until the next Variable,

    TDS Team


    Coding with LLMs, Learning Math, Data Science Freelancing, and Other March Must-Reads was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
