Tag: AI

  • The Intersection of Memory and Grounding in AI Systems


    Sandi Besen

    Understanding the 4 key types of memory (short term, short long term, long term, and working), the methods of language model grounding, and the role memory plays in the process of grounding.

    In the context of Language Models and Agentic AI, memory and grounding are both hot and emerging fields of research. Although the two terms often appear side by side and are closely related, they serve different functions in practice. In this article, I hope to clear up the confusion around these two terms and demonstrate how memory can play a role in the overall grounding of a model.

    Source: Dalle3, Description: split parts of the brain portray memory and grounding in the style of a friendly cartoon

    Memory in Language Models

    In my last article, we discussed the important role of memory in Agentic AI. Memory in language models refers to the ability of an AI system to retain and recall pertinent information, contributing to its ability to reason and continuously learn from its experiences. Memory can be thought of in 4 categories: short term memory, short long term memory, long term memory, and working memory.

    It sounds complex, but let’s break them down simply:

    Short Term Memory (STM):

    STM retains information for a very brief period of time, which could be seconds to minutes. If you ask a language model a question, it needs to retain your messages long enough to generate an answer. Just like people, language models struggle to remember too many things simultaneously.

    Miller’s law states that “Short-term memory is a component of memory that holds a small amount of information in an active, readily available state for a brief period, typically a few seconds to a minute. The duration of STM seems to be between 15 and 30 seconds, and STM’s capacity is limited, often thought to be about 7±2 items.”

    So if you ask a language model “what genre is that book that I mentioned in my previous message?” it needs to use its short term memory to reference recent messages and generate a relevant response.

    Implementation:

    Context is stored in external systems, such as session variables or databases, which hold a portion of the conversation history. Each new user input and assistant response is appended to the existing context to create the conversation history. During inference, the context is sent along with the user’s new query to the language model to generate a response that considers the entire conversation. This research paper offers a more in-depth view of the mechanisms that enable short term memory.
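    To make this flow concrete, here is a minimal sketch. The names (call_llm, conversation) are hypothetical placeholders rather than any particular framework’s API.

    ```python
    # Minimal sketch of short-term memory handled outside the model:
    # conversation history lives in an external structure (here, a list)
    # and is re-sent with every new query.

    def call_llm(messages):
        # Hypothetical placeholder for a real model/inference call.
        return f"(model reply to: {messages[-1]['content']})"

    conversation = []  # session-scoped context window

    def chat(user_message):
        # Append the new user turn to the stored context...
        conversation.append({"role": "user", "content": user_message})
        # ...and send the full history so the model can reference earlier
        # turns (e.g. "what genre is that book I mentioned earlier?").
        reply = call_llm(conversation)
        conversation.append({"role": "assistant", "content": reply})
        return reply
    ```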

    Short Long Term Memory (SLTM):

    SLTM retains information for a moderate period, which can be minutes to hours. For example, within the same session, you can pick back up where you left off in a conversation without having to repeat context because it has been stored as SLTM. This process is also an external process rather than part of the language model itself.

    Implementation:

    Sessions can be managed using identifiers that link user interactions over time. Context data is stored somewhere it can persist across interactions within a defined period, such as a database. When a user resumes a conversation, the system retrieves the history from previous sessions and passes it to the language model during inference. Much like with short term memory, each new user input and assistant response is appended to the existing context to keep the conversation history current.
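    As a hedged sketch of this session handling (a dictionary stands in for the database, and call_llm is again a hypothetical placeholder):

    ```python
    # Sketch of short-long-term memory: conversation histories keyed by a
    # session identifier so a user can resume where they left off within
    # the same session window.

    def call_llm(messages):
        # Hypothetical placeholder for a real model/inference call.
        return f"(model reply to: {messages[-1]['content']})"

    session_store = {}  # session_id -> list of messages (stands in for a DB)

    def resume_chat(session_id, user_message):
        history = session_store.setdefault(session_id, [])  # empty if new session
        history.append({"role": "user", "content": user_message})
        reply = call_llm(history)
        history.append({"role": "assistant", "content": reply})
        return reply
    ```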

    Long Term Memory (LTM):

    LTM retains information for an admin-defined amount of time, potentially indefinitely. For example, if we were to build an AI tutor, it would be important for the language model to understand what subjects the student performs well in, where they still struggle, what learning styles work best for them, and more. This way, the model can recall relevant information to inform its future teaching plans. Squirrel AI is an example of a platform that uses long term memory to “craft personalized learning pathways, engages in targeted teaching, and provides emotional intervention when needed”.

    Implementation:

    Information can be stored in structured databases, knowledge graphs, or document stores that are queried as needed. Relevant information is retrieved based on the user’s current interaction and past history. This provides context for the language model that is passed back in with the user’s response or system prompt.
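    Continuing the AI-tutor example, here is a hypothetical sketch of long-term memory being injected into the prompt. The student_profile record and build_prompt helper are illustrative assumptions, not a specific system’s schema.

    ```python
    # Sketch of long-term memory: facts persisted about a student (in a real
    # system, a database, knowledge graph, or document store) are retrieved
    # and passed to the model via the system prompt.

    student_profile = {  # illustrative persisted record
        "strong_subjects": ["algebra"],
        "weak_subjects": ["geometry"],
        "learning_style": "worked examples",
    }

    def build_prompt(user_message):
        context = (
            f"The student is strong in {', '.join(student_profile['strong_subjects'])}, "
            f"still struggles with {', '.join(student_profile['weak_subjects'])}, "
            f"and learns best from {student_profile['learning_style']}."
        )
        return [
            {"role": "system", "content": context},
            {"role": "user", "content": user_message},
        ]
    ```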

    Working Memory:

    Working memory is a component of the language model itself (unlike the other types of memory, which are external processes). It enables the language model to hold information, manipulate it, and refine it — improving the model’s ability to reason. This is important because as the model processes the user’s request, its understanding of the task and the steps it needs to take to execute it can change. You can think of working memory as the model’s own scratch pad for its thoughts. For example, when given a multistep math problem such as (5 + 3) * 2, the language model needs to compute the (5 + 3) in the parentheses and hold that intermediate result before multiplying it by 2. If you’re interested in digging deeper into this subject, the paper “TransformerFAM: Feedback attention is working memory” offers a new approach to extending working memory, enabling a language model to process inputs/context windows of unlimited length.

    Implementation:

    Mechanisms like attention layers in transformers or hidden states in recurrent neural networks (RNNs) are responsible for maintaining intermediate computations and provide the ability to manipulate intermediate results within the same inference session. As the model processes input, it updates its internal state, which enables stronger reasoning abilities.

    All 4 types of memory are important components of creating an AI system that can effectively manage and utilize information across various timeframes and contexts.

    Table of Types of Memory in AI Systems, Source: Sandi Besen

    Grounding

    Responses from a language model should always make sense in the context of the conversation — they shouldn’t just be a bunch of factual statements. Grounding measures the ability of a model to produce an output that is contextually relevant and meaningful. The process of grounding a language model can be a combination of language model training, fine-tuning, and external processes (including memory!).

    Language Model Training and Fine Tuning

    The data that the model is initially trained on makes a substantial difference in how grounded the model is. Training a model on a large corpus of diverse data enables it to learn language patterns, grammar, and semantics so it can predict the next most relevant word. The pre-trained model is then fine-tuned on domain-specific data, which helps it generate more relevant and accurate outputs for particular applications that require deeper domain-specific knowledge. This is especially important if you need the model to perform well on specific texts it might not have been exposed to during its initial training. Although our expectations of a language model’s capabilities are high, we can’t expect it to perform well on something it has never seen before, just as we wouldn’t expect a student to perform well on an exam if they hadn’t studied the material.

    External Context

    Providing the model with real-time or up-to-date context-specific information also helps it stay grounded. There are many methods of doing this, such as integrating it with external knowledge bases, APIs, and real-time data. This method is also known as Retrieval Augmented Generation (RAG).
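    Below is a toy RAG sketch. The keyword-overlap scoring is a stand-in for real retrieval (embeddings plus a vector index, an external API, etc.), and call_llm is again a hypothetical placeholder.

    ```python
    # Toy Retrieval Augmented Generation: fetch the most relevant documents
    # for a query and prepend them to the prompt before calling the model.

    documents = [
        "Refunds are accepted within 30 days of purchase.",
        "Standard shipping takes 3 to 5 business days.",
    ]

    def call_llm(prompt):
        # Hypothetical placeholder for a real model/inference call.
        return f"(model reply grounded in: {prompt[:60]}...)"

    def retrieve(query, top_k=1):
        # Naive relevance score: count of shared words with the query.
        def score(doc):
            return len(set(query.lower().split()) & set(doc.lower().split()))
        return sorted(documents, key=score, reverse=True)[:top_k]

    def answer(query):
        context = "\n".join(retrieve(query))
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
        return call_llm(prompt)
    ```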

    Memory Systems

    Memory systems in AI play a crucial role in ensuring that the system remains grounded in its previously taken actions, lessons learned, performance over time, and experience with users and other systems. The four types of memory outlined earlier in the article help keep a language model context-aware and able to produce relevant outputs. Memory systems work in tandem with grounding techniques like training, fine-tuning, and external context integration to enhance the model’s overall performance and relevance.

    Conclusion

    Memory and grounding are interconnected elements that enhance the performance and reliability of AI systems. While memory enables AI to retain and manipulate information across different timeframes, grounding ensures that the AI’s outputs are contextually relevant and meaningful. By integrating memory systems and grounding techniques, AI systems can achieve a higher level of understanding and effectiveness in their interactions and tasks.

    Note: The opinions expressed both in this article and paper are solely those of the authors and do not necessarily reflect the views or policies of their respective employers.

    If you still have questions or think that something needs to be further clarified, drop me a DM on LinkedIn! I’m always eager to engage with food for thought and iterate on my work.

    References:

    https://openreview.net/pdf?id=QNW1OrjynpT

    https://www.simplypsychology.org/short-term-memory.html



  • Monocular Depth Estimation with Depth Anything V2


    Avishek Biswas

    How do neural networks learn to estimate depth from 2D images?

    What is Monocular Depth Estimation?

    The Depth Anything V2 Algorithm (Illustration by Author)

    Monocular Depth Estimation (MDE) is the task of training a neural network to determine depth information from a single image. This is an exciting and challenging area of Machine Learning and Computer Vision because predicting a depth map requires the neural network to form a 3-dimensional understanding from just a 2-dimensional image.

    In this article, we will discuss a new model called Depth Anything V2 and its precursor, Depth Anything V1. Depth Anything V2 has outperformed nearly all other models in Depth Estimation, showing impressive results on tricky images.

    Depth Anything V2 Demo (Source: Screen recording by the author from Depth Anything V2 DEMO page)

    This article is based on a video I made on the same topic. Here is a video link for learners who prefer a visual medium. For those who prefer reading, continue!

    Why should we even care about MDE models?

    Good MDE models have many practical uses, such as aiding navigation and obstacle avoidance for robots, drones, and autonomous vehicles. They can also be used in video and image editing, background replacement, object removal, and creating 3D effects. Additionally, they are useful for AR and VR headsets to create interactive 3D spaces around the user.

    There are two main approaches for doing MDE (this article only covers one)

    Two main approaches have emerged for training MDE models: discriminative approaches, where the network predicts depth as a supervised learning objective, and generative approaches like conditional diffusion, where depth prediction is an iterative image generation task. Depth Anything belongs to the first category of discriminative approaches, and that’s what we will be discussing today. Welcome to Neural Breakdown, and let’s go deep with Depth Estimation!

    Traditional Datasets and the MiDaS paper

    To fully understand Depth Anything, let’s first revisit the MiDaS paper from 2019, which serves as a precursor to the Depth Anything algorithm.

    Source: Screenshot taken from the MiDaS Paper (License: Free)

    MiDaS trains an MDE model using a combination of different datasets containing labeled depth information. For instance, the KITTI dataset for autonomous driving provides outdoor images, while the NYU-Depth V2 dataset offers indoor scenes. Understanding how these datasets are collected is crucial because newer models like Depth Anything and Depth Anything V2 address several issues inherent in the data collection process.

    How real-world depth datasets are collected

    These datasets are typically collected using stereo cameras, where two or more cameras placed at fixed distances capture images simultaneously from slightly different perspectives, allowing for depth information extraction. The NYU-Depth V2 dataset uses RGB-D cameras that capture depth values along with pixel colors. Some datasets utilize LiDAR, projecting laser beams to capture 3D information about a scene.

    However, these methods come with several problems. The amount of labeled data is limited due to the high operational costs of obtaining these datasets. Additionally, the annotations can be noisy and low-resolution. Stereo cameras struggle under various lighting conditions and can’t reliably identify transparent or highly reflective surfaces. LiDAR is expensive, and both LiDAR and RGB-D cameras have limited range and generate low-resolution, sparse depth maps.

    Can we use Unlabelled Images to learn Depth Estimation?

    It would be beneficial to use unlabeled images to train depth estimation models, given the abundance of such images available online. The major innovation proposed in the original Depth Anything paper from 2023 was the incorporation of these unlabeled datasets into the training pipeline. In the next section, we’ll explore how this was achieved.

    Depth Anything Architecture

    The original Depth Anything (V1) model from 2023 was trained in a three-step process. Let’s get a high-level overview of the algorithm before diving into each section.

    Depth Anything V1 Algorithm (Illustration made by the Author)

    Step 1: Teacher Training

    First, a neural network called the TEACHER model is trained for supervised depth estimation using five different publicly available datasets.

    Converting from Depth to Disparity Space

    The TEACHER model is initialized with a pre-trained Dino-V2 encoder and then trained on the combined labeled dataset. A major challenge with training on multiple datasets is the variability in absolute depths. To address this, the depths are inverted into disparity space (d = 1 / t) and normalized between 0 and 1 for each depth map — 1 for the nearest pixel and 0 for the farthest. This way, all datasets share the same output space, allowing the model to predict disparity.

    Different Depth Estimation datasets provide depths at different scales. We need to align them to have the same output space. Disparity lets us normalize all depth values between 0 and 1 (Illustration by Author)
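    As a rough sketch of this conversion (not the paper’s exact preprocessing code), turning a metric depth map into normalized disparity might look like this:

    ```python
    import numpy as np

    # Convert a metric depth map to disparity (d = 1 / t) and min-max
    # normalize it to [0, 1]: 1 for the nearest pixel, 0 for the farthest.

    def depth_to_normalized_disparity(depth, eps=1e-6):
        disparity = 1.0 / np.maximum(depth, eps)  # guard against zero depth
        d_min, d_max = disparity.min(), disparity.max()
        return (disparity - d_min) / (d_max - d_min + eps)

    # Example: a tiny 2x2 "depth map" in meters
    print(depth_to_normalized_disparity(np.array([[1.0, 2.0], [4.0, 8.0]])))
    ```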

    Two loss functions are used to train these models: a scale-shift invariant loss and a gradient-matching loss, both also utilized in the MiDaS paper from 2019.

    1. Scale-shift invariant loss

    There is a problem with using a simple mean square error loss between the predicted and ground truth images. Let’s say the ground truth depth values of three pixels in an image are 1, 0.5, and 0.1, while our network predicts 0.9, 0.6, and 0.3. Although the predictions aren’t exact, the relationship between the predicted and ground truth depths is similar, differing only by a multiplicative and additive factor. We don’t want this scale and shift to affect our loss function — we need to align the two maps before applying the mean square error loss.

    Scale and Shift Invariant Loss (Illustration by Author)

    The MiDaS paper proposes normalizing the ground truth and predicted depths to have zero translation and unit scale. The median and deviation are calculated, and the depth maps are scaled and shifted accordingly. Once aligned, the mean square error loss is applied.

    The SSI Loss (Source: MiDaS Paper) (License: Free)
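    Here is a simplified sketch of that alignment idea, assuming the median for the shift and the mean absolute deviation for the scale. Treat it as an approximation of the concept; the MiDaS paper defines the exact formulation.

    ```python
    import numpy as np

    # Simplified scale-and-shift invariant loss: align prediction and ground
    # truth to zero translation / unit scale, then compare with MSE.

    def align(d, eps=1e-6):
        shift = np.median(d)                      # translation
        scale = np.mean(np.abs(d - shift)) + eps  # spread (mean absolute deviation)
        return (d - shift) / scale

    def ssi_loss(pred, target):
        return np.mean((align(pred) - align(target)) ** 2)

    # Predictions that differ from the target only by a scale and a shift
    # give (near) zero loss, which is exactly the invariance we want.
    target = np.array([1.0, 0.5, 0.1])
    pred = 2.0 * target + 0.3
    print(ssi_loss(pred, target))  # ~0.0
    ```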

    2. Gradient Matching Loss

    Without Gradient Matching Loss depth maps may become too smudgy and less sharp (Illustration by Author)

    Using only the SSI loss might result in smoothed depth maps that fail to capture sharp distinctions between adjacent pixels. Gradient Matching Loss helps preserve these details by aligning the gradients of the predicted depth map with those of the ground truth.

    First, we calculate the gradients of the predicted and ground truth depth maps across the x and y axes, then apply the loss at the gradient level. MiDaS also uses a multi-scale gradient matching loss with four scale levels. The predicted and ground truth depth maps are downsampled four times, and the loss is applied at each resolution.

    Gradient Matching Loss. This loss is applied at multiple downscaled depth maps (not shown above) (Illustration by Author)
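    A hedged sketch of the multi-scale gradient matching idea is below: gradients are taken of the residual between prediction and ground truth and accumulated over naively downsampled copies of the maps. Again, the exact formulation lives in the MiDaS paper.

    ```python
    import numpy as np

    # Sketch of a multi-scale gradient matching loss: penalize the x/y
    # gradients of the residual between prediction and ground truth,
    # summed over several downsampled resolutions.

    def gradient_loss(pred, target):
        residual = pred - target
        grad_x = np.abs(np.diff(residual, axis=1))  # horizontal gradients
        grad_y = np.abs(np.diff(residual, axis=0))  # vertical gradients
        return grad_x.mean() + grad_y.mean()

    def multiscale_gradient_loss(pred, target, scales=4):
        # Assumes the maps are large enough to be strided `scales` times.
        total = 0.0
        for s in range(scales):
            step = 2 ** s  # naive strided downsampling
            total += gradient_loss(pred[::step, ::step], target[::step, ::step])
        return total
    ```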

    The final loss is the weighted sum of the scale-and-shift invariant loss and the multi-scale gradient matching loss. While the SSI loss encourages the model to learn general relative depth relationships, the gradient matching loss helps preserve sharp edges and fine-grained information in the scene.

    The loss functions used to train depth estimation models in MiDaS and Depth Anything V1 (Illustration by Author)

    Step 2 — Pseudo-Labelling Unlabelled Dataset

    With our trained TEACHER model, we can now annotate millions of unlabeled images to create a massive pseudo-depth label dataset. These labels are called pseudo because they are AI-generated and may not represent the actual ground truth depth. We now have a lot of (pseudo) labeled images to train a new network.

    Pseudo-labelling images (note that this screenshot is actually from the Depth Anything V2 paper, not V1). Source: Depth Anything V2 Paper (License: Free)

    Step 3 — Training Student Network

    Flashback to the Depth Anything V1 algorithm. We are in Step 3 now. (Illustration made by the author)

    We will be training a new neural network (the student network) on the combination of the labeled and pseudo-labeled datasets. However, simply training the network on the annotations provided by the Teacher Network won’t improve the model beyond the capabilities of the base Teacher model. To make the student network more capable, two strategies were employed: heavy perturbations with image augmentations and introducing an auxiliary semantic preservation loss.

    Heavy Perturbations

    One interesting perturbation used was the Cut Mix operation. This involves combining a random pair of unlabeled images using a binary mask, replacing a rectangular portion of image A with image B. The final loss is the combined SSI and Gradient Matching loss of the two sections from the two ground truth depth maps. These spatial distortions are also combined with color distortions to help the Student Network handle the diversity of open-world images.

    The Cut Mix Operation (Illustration by Author)
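    Here is a rough sketch of the CutMix-style spatial perturbation described above, with an arbitrary half-size crop; the real pipeline samples the region differently and combines it with color distortions.

    ```python
    import numpy as np

    # Sketch of a CutMix-style perturbation: paste a rectangular crop of
    # image B into image A via a binary mask. The returned mask records
    # which pixels came from B, so the depth losses can be applied
    # separately to each region against the corresponding pseudo-labels.

    def cut_mix(img_a, img_b, rng=None):
        rng = rng or np.random.default_rng()
        h, w = img_a.shape[:2]
        ch, cw = h // 2, w // 2          # crop size (arbitrary choice here)
        y = rng.integers(0, h - ch + 1)
        x = rng.integers(0, w - cw + 1)
        mask = np.zeros((h, w), dtype=bool)
        mask[y:y + ch, x:x + cw] = True  # region taken from image B
        mixed = np.where(mask[..., None], img_b, img_a)
        return mixed, mask
    ```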

    Auxiliary Semantic Preservation Loss

    The network is also trained with an auxiliary task called Semantic Assisted Perception. A strong pre-trained computer vision model like Dino-V2, which has been trained on millions of images in a self-supervised manner, is used. Given an image, we aim to reduce the cosine distance between the embeddings produced by our new Student model and the pre-trained Dino-V2 encoder. This enables our Student model to capture some of the semantic perception capabilities of the larger and more general Dino-V2 model, which it uses to predict the depth map.

    Semantic Assisted Perception (Illustration by author)
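    A minimal sketch of this feature-alignment loss, assuming we already have per-image embedding vectors from the student and from a frozen pre-trained encoder such as Dino-V2 (the names here are illustrative):

    ```python
    import numpy as np

    # Sketch of the auxiliary semantic preservation loss: minimize the
    # cosine distance between the student's features and those of a frozen
    # pre-trained encoder for the same image.

    def cosine_distance(a, b, eps=1e-8):
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
        return 1.0 - cos

    def semantic_preservation_loss(student_feats, frozen_feats):
        # Average the per-image cosine distances over the batch.
        return float(np.mean([cosine_distance(s, f)
                              for s, f in zip(student_feats, frozen_feats)]))
    ```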

    By combining spatial distortions, semantic-assisted perception, and the power of both labeled and unlabeled datasets, the Student Network generalizes better and outperforms the original Teacher Network in depth estimation! Here are some incredible results from the Depth Anything V1 model!

    Depth Anything V2

    As impressive as Depth Anything V1’s results are, it struggles with transparent objects and capturing fine-grained details. The authors of Depth Anything V2 suggest that the biggest bottleneck for model performance isn’t the architecture itself, but the quality of the data. Most labeled datasets captured with sensors can be quite noisy, ignore fine-grained details, generate low-resolution depth maps, and struggle with lighting conditions and reflective/transparent objects.

    Issues with real-world sensor datasets (Illustration by Author)

    Depth Anything V2 discards labeled datasets from real-world sensors like stereo cameras, LiDAR, and RGB-D cameras, instead using only synthetic datasets. Synthetic datasets are generated through graphics engines, not captured with equipment. An example is the Virtual KITTI dataset, which uses the Unity Game Engine to create rendered images and depth maps for automated driving. There are also indoor datasets like IRS and Hyper-sim. Depth Anything V2 uses five synthetic datasets containing close to 595K photorealistic images.

    Synthetic Datasets vs Real World Sensor Datasets

    Synthetic images have their pros and cons. They are highly accurate, they produce high-resolution outputs that capture the finest details, and the depth of transparent and reflective surfaces can be obtained easily. Synthetic datasets have direct access to all the 3D information needed, since the graphics engine itself creates the scene.

    On the cons side, these images may not faithfully reflect the scenes we encounter in the real world. The scene coverage of these datasets also isn’t particularly diverse, amounting to a much smaller subset of real-world imagery. Depth Anything V2 combines the power of synthetic images with millions of unlabelled images to train an MDE model that outperforms pretty much everything else we have seen so far.

    The pros and cons of Synthetic or Computer Generated Datasets (Illustration by Author)

    Much like V1, the Teacher model in V2 is first trained on labeled datasets. However, in V2, it is exclusively trained on synthetic datasets. In Step 2, the Teacher model assigns pseudo-depth labels to all unlabeled images. Finally, in Step 3, the Student model is trained exclusively on pseudo-labeled images — no real labeled datasets and no synthetic datasets. The synthetic datasets are not used at this stage due to the distribution shift mentioned earlier. The Student network is trained on real-world images annotated by the Teacher model. Just like in V1, the auxiliary semantic preservation loss is used along with the Scale-and-Shift invariant and gradient matching loss.

    The Depth Anything V2 architecture (Illustration by the Author)

    Video link explaining the concepts here visually

    Here is a video that explains all the concepts discussed in this article in a step-by-step manner.

    Depth Anything V1 vs Depth Anything V2

    The original Depth Anything emphasized the importance of using unlabeled images in the MDE training pipeline. It introduced the knowledge distillation pipeline with Teacher training, pseudo-labeling unlabeled images, and then training the Student network on a combination of labeled and unlabeled images. The use of strong spatial and color distortions, and a semantic-assisted perception loss, helped create more general and robust embeddings. This resulted in efficient and high-quality depth maps for complex scenes. However, Depth Anything V1 still struggled with reflective surfaces and fine details due to noisy and low-resolution depth labels from real-world sensors.

    Depth Anything V2 improved performance by ignoring real-world sensor datasets and only using synthetic images generated with graphics engines to train the Teacher Network. The Teacher Network then annotates millions of unlabeled images, and the Student Network is trained solely on these pseudo-labeled datasets with real-world images. With these techniques, Depth Anything V2 can now predict fine-level depth maps and handle transparent and reflective surfaces more effectively.

    Relevant Links

    MiDaS: https://arxiv.org/abs/1907.01341
    Depth Anything: https://depth-anything.github.io/
    Depth Anything V2: https://depth-anything-v2.github.io/

    KITTI DATASET: https://www.cvlibs.net/datasets/kitti/
    NYU V2: https://cs.nyu.edu/~fergus/datasets/nyu_depth_v2.html
    VIRTUAL KITTI: https://datasetninja.com/virtual-kitti

    Youtube Video: https://youtu.be/sz30TDttIBA



  • Most Data Quality Initiatives Fail Before They Start. Here’s Why.

    Barr Moses

    Show me your data quality scorecard and I’ll tell you whether you will be successful a year from now.

    Photo by Braden Collum on Unsplash

    Every day I talk to organizations ready to dedicate a tremendous amount of time and resources towards data quality initiatives doomed to fail.

    It’s no revelation that incentives and KPIs drive good behavior. Sales compensation plans are scrutinized so closely that they often rise to the topic of board meetings. What if we gave the same attention to data quality scorecards?

    Even in their heyday, traditional data quality scorecards from the Hadoop era were rarely wildly successful. I know this because prior to starting Monte Carlo, I spent years as an operations VP trying to create data quality standards that drove trust and adoption.

    Over the past few years, advances in the cloud and metadata management have made organizing silly amounts of data possible.

    Data engineering processes are starting to trend towards the level of maturity and rigor of more longstanding engineering disciplines. And of course, AI has the potential to streamline everything.

    While this problem isn’t — and probably will never be — completely solved, I have seen organizations adopt best practices that are the difference between initiative success…and having another kick-off meeting 12 months later.

    Here are 4 key lessons for building data quality scorecards:

    Know what data matters

    The surest way to fail any data-related initiative is to assume all data is of equal value. And the best (and only) way to determine what matters is to talk to the business.

    Brandon Beidel at Red Ventures articulates a good place to start:

    “I’d ask:

    • How do you use this table?
    • When do you look at this data? When do you report this data? Does this data need to be up to the minute, hourly, daily?
    • What purpose does this serve?
    • Who needs to get notified if this data is delayed?”

    Now, this may be easier said than done if you work for a sprawling organization with tens of thousands of employees distributed across the globe.

    In these cases, my recommendation is to start with your most business-critical data and business units (if you don’t know what those are, I can’t help you!). Start a discussion on requirements and priorities.

    Just remember: prove the concept first, scale second. You’d be shocked how many people do it the other way around.

    Measure the machine

    One of the enduring challenges of this type of endeavor, in a nutshell, is that data quality resists standardization. Quality is, and should be, in the eye of the use case.

    The six dimensions of data quality are a vital part of any data quality scorecard and an important starting point, but for many teams, that’s just the beginning — and every data product is different.

    For instance, a financial report may need to be highly accurate with some margin for timeliness whereas a machine learning model may be the exact opposite.

    From an implementation perspective this means measuring data quality has typically been radically federated. Data quality is measured on a table-by-table basis by different analysts or stewards with wildly different data quality rules given wildly different weights.

    This makes sense to a degree, but so much gets lost in translation.

    Data is multi-use and shared across use cases. Not only is one person’s “yellow” quality score another person’s “green,” but it’s often incredibly difficult for data consumers to even understand what a “yellow” score means or how it’s been graded. They also frequently miss the implications of a green table being fed data by a red one (you know, garbage in, garbage out…).

    What is the meaning of a “yellow” scorecard? Photo by Keiron Crasktellanos on Unsplash

    Surfacing the number of breached rules is important, of course, but you also need to:

    • Contextualize it as much as possible,
    • Have an aggregated end-to-end data product view,
    • Invest in some strong no-code data profiling, and
    • Realize it’s not sufficient.

    So then what else do you need? You need to measure the machine.

    In other words, the components in the production and delivery of data that generally result in high quality. This is much easier to standardize. It’s also easier to understand across business units and teams.

    Airbnb Midas is one of the better-known internal data quality score and certification programs, and rightfully so. They lean heavily into this concept. They measure data accuracy, but reliability, stewardship, and usability actually comprise 60% of the total score.

    Many data teams are still in the process of formalizing their own standards, but the components we have found to correlate highly with data health include the following (a minimal scoring sketch follows these lists):

    • The previously mentioned six dimensions of data quality (validity, completeness, consistency, timeliness, uniqueness, accuracy).

    Usability & Stewardship

    • Documentation: Some level of semantic meaning for both the data asset, its use, and past incidents. One online travel search company scores an asset based on how and where it’s cataloged along with the completeness of its metadata for two of its 6 categories.
    • Lineage: Ability to trace the data’s provenance at the field level across systems.
    • Usage: The number of queries a table receives and the number of data products with downstream dependencies. This can be a “key asset score” and it has a flywheel effect. You focus your reliability efforts on what’s most utilized, and people trust what’s popular.

    System Reliability

    • Monitoring: Generally if a data product has strong coverage not only on the last mile table but all the way upstream, it indicates a well curated asset.
    • Freshness: Data freshness requirements will vary by data product type, but it is a table level metric where deviations from the norm can be identified and surfaced. Many organizations like Roche Diagnostics will have specific freshness SLAs for their data products and measure the level of adherence.
    • Volume: A relatively steady number of rows a table receives is often a sign of a well functioning pipeline and data delivery system.
    • Schema: At the very least you want consumers to have visibility into schema changes. For your most critical pipelines, you ideally want some level of schema enforcement or data contract so that you know when changes at the source break assets downstream.

    Operational Response:

    • Ownership: Does an asset have an owner? Bonus if it has both a technical and a business owner.
    • Notification Channels & Communication: Data delivery is a complex process involving multiple handoffs from ingestion to aggregation to consumption. On top of that, you ideally have multiple teams using a data asset (or else your mesh is more of a silo). The only way to have a reliable data product in this environment is to have a central communication channel to highlight and discuss changes and incidents.
    • Average Time To Fixed: Arguably the most important indicator of how much you can trust a dataset is in how quickly the support team responds and fixes incidents that arise. Bad data is inevitable. Great incident response is intentional.
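    To make this concrete, here is a minimal, hypothetical sketch of rolling component checks up into a single data health score. The component names and weights are illustrative assumptions only; programs like Airbnb’s Midas weight their categories differently.

    ```python
    # Hypothetical weighted roll-up of component scores into one data
    # health score. Names and weights are illustrative, not a standard.

    component_weights = {
        "quality_dimensions": 0.4,     # validity, completeness, consistency, ...
        "usability_stewardship": 0.3,  # documentation, lineage, usage
        "system_reliability": 0.2,     # monitoring, freshness, volume, schema
        "operational_response": 0.1,   # ownership, comms, time to fixed
    }

    def data_health_score(component_scores):
        """component_scores maps each component name to a value in [0, 1]."""
        return sum(weight * component_scores.get(name, 0.0)
                   for name, weight in component_weights.items())

    # Example: strong checks and monitoring, but weak ownership and response
    print(data_health_score({
        "quality_dimensions": 0.9,
        "usability_stewardship": 0.7,
        "system_reliability": 0.8,
        "operational_response": 0.4,
    }))  # -> 0.77
    ```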

    Get your carrots and sticks right

    Incentivize quality data for both producers and consumers. Photo by Jonathan Pielmayer on Unsplash

    “Yay, another set of processes we’re required to follow!”… said no one ever.

    Remember the purpose of measuring data health isn’t to measure data health. The point, as Clark at Airbnb put it, is to “drive a preference for producing and using high quality data.”

    The best practices I’ve seen here are to have a minimum set of requirements for data to be on-boarded onto the platform (stick) and a much more stringent set of requirements to be certified at each level (carrot).

    Certification works as a carrot because producers actually want consumers to use their data, and consumers will quickly discern and develop a taste for highly reliable data.

    Automate evaluation and discovery

    Almost nothing in data management is successful without some degree of automation and the ability to self-serve. Airbnb discarded any scoring criteria that 1) wasn’t immediately understandable and 2) couldn’t be measured automatically.

    Your organization must do the same. Even if your scoring criteria are the best ever conceived, if you do not have a set of solutions that can automatically collect and surface them, into the trash bin they must go.

    Image courtesy of the author.

    The most common ways I’ve seen this done are with data observability and quality solutions, and data catalogs. Roche, for example, does this and layers on access management as part of creating, surfacing and governing trusted data products.


    Of course this can also be done by manually stitching together the metadata from multiple data systems into a homegrown discoverability portal, but just be mindful of the maintenance overhead.

    What’s measured is managed

    Data teams have made big investments into their modern data and AI platforms. But to maximize this investment, the organization — both data producers and consumers — must fully adopt and trust the data being provided.

    At the end of the day, what’s measured is managed. And isn’t that what matters?



  • Detect and protect sensitive data with Amazon Lex and Amazon CloudWatch Logs


    Rashmica Gopinath

    In today’s digital landscape, the protection of personally identifiable information (PII) is not just a regulatory requirement, but a cornerstone of consumer trust and business integrity. Organizations use advanced natural language detection services like Amazon Lex for building conversational interfaces and Amazon CloudWatch for monitoring and analyzing operational data. One risk many organizations face is […]


  • AWS AI chips deliver high performance and low cost for Llama 3.1 models on AWS


    John Gray

    Today, we are excited to announce AWS Trainium and AWS Inferentia support for fine-tuning and inference of the Llama 3.1 models. The Llama 3.1 family of multilingual large language models (LLMs) is a collection of pre-trained and instruction tuned generative models in 8B, 70B, and 405B sizes. In a previous post, we covered how to deploy Llama 3 models on AWS Trainium and Inferentia based instances in Amazon SageMaker JumpStart. In this post, we outline how to get started with fine-tuning and deploying the Llama 3.1 family of models on AWS AI chips, to realize their price-performance benefits.


  • Use Llama 3.1 405B to generate synthetic data for fine-tuning tasks


    Sebastian Bustillo

    Today, we are excited to announce the availability of the Llama 3.1 405B model on Amazon SageMaker JumpStart, and Amazon Bedrock in preview. The Llama 3.1 models are a collection of state-of-the-art pre-trained and instruct fine-tuned generative artificial intelligence (AI) models in 8B, 70B, and 405B sizes. Amazon SageMaker JumpStart is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. Amazon Bedrock offers a straightforward way to build and scale generative AI applications with Meta Llama models, using a single API.


  • Llama 3.1 models are now available in Amazon SageMaker JumpStart


    Saurabh Trikande

    Today, we are excited to announce that the state-of-the-art Llama 3.1 collection of multilingual large language models (LLMs), which includes pre-trained and instruction tuned generative AI models in 8B, 70B, and 405B sizes, is available through Amazon SageMaker JumpStart to deploy for inference. Llama is a publicly accessible LLM designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative artificial intelligence (AI) ideas. In this post, we walk through how to discover and deploy Llama 3.1 models using SageMaker JumpStart.
