Category: Technology

  • Firing Pat Gelsinger doesn’t solve Intel’s problems

    Daniel Cooper

    Despite Intel’s recent woes, I didn’t expect to see CEO Pat Gelsinger joining 15,000 or so of his colleagues being shown the door. Gelsinger is a storied engineer and business success who laid down an exhaustive rescue plan when he took the helm of the beleaguered chipmaker in 2021. It was never going to be a quick fix, given the company’s long legacy of missteps. Gelsinger may be the public face of Intel’s current malaise, but the problems started long before his tenure and will likely keep going.

    Gelsinger was tasked with addressing almost two decades’ worth of bad decisions, all of which have compounded. Intel became an industry-swallowing behemoth as one half of the Wintel alliance, producing chips that went hand-in-glove with Microsoft Windows. The vast profits that flowed from this partnership meant there was an institutional reluctance to look too hard at new business ventures that could distract from its golden goose, still going strong all these years later.

    In 2005, then-CEO Paul Otellini turned down the chance to make the iPhone’s system-on-chip. It would have been easy for Intel, since it already made XScale ARM chips for mobile devices. You could find an Intel ARM chip inside popular phones like the BlackBerry Pearl 8100 and Palm Treo 650. A year later, it would sell XScale to Marvell, believing it would be able to shrink its x86 chips to work on smartphones. The first Intel Atom handsets showed some degree of promise, but the Snapdragons of the day — produced by considerably smaller rival Qualcomm — beat them pretty easily.

    At the same time, Intel was working on Larrabee, its own discrete GPU platform based on the x86 architecture. Despite several years of marketing bravado and suggestions it would “kill” AMD/ATI and NVIDIA, Intel axed it in 2010 in favor of bundling integrated graphics into its regular processor products. The decision would hand the bulk of the GPU market to NVIDIA, making it the go-to name for gaming, supercomputers, crypto and AI; on November 20, NVIDIA reported quarterly revenue of $35.1 billion.

    Could Intel have foreseen the meteoric rise of AI? Maybe not. But Reuters reported that former Intel CEO Bob Swan turned down the chance to invest in OpenAI in 2017. OpenAI was looking for a hardware partner to reduce its reliance on NVIDIA, and was offering a generous deal in the process. Swan, however, reportedly said he couldn’t see a future for generative AI, and Intel’s data center unit refused to sell the hardware at a discount.

    Intel’s core strength was in the quality of its engineering, the solidity of its product and that it always kept close to the cutting edge. (There are parallels to be drawn between Intel and Boeing, both of which are watching their reputation for quality erode in real time.) Sadly Intel’s bread-and-butter business hit the skids after the company failed to produce 10-nanometer chips by its planned 2015 deadline. The company’s famous “tick, tock” strategy of launching a new chip process one year and a refined version the next ground to a halt.

    These issues enabled Intel’s competitors to step in and steal a march, harnessing more modern chip architectures. AMD, which held a little over 10 percent of the chip market for much of the 2010s, has seen its market share double in the last few years. The biggest beneficiary, of course, was TSMC, the Taiwanese chip factory that has become the envy of the world. Even if Intel controls the bulk of the x86 processor market, it’s TSMC that makes the chips for Apple, Qualcomm, NVIDIA and AMD, among others. Intel, meanwhile, was saddled with an older chip manufacturing process that it couldn’t use to catch up with its rivals.

    Gelsinger was as close to an Intel “lifer” as you could imagine, joining the company at 18 and rising to the position of Chief Technology Officer by 2001. In 2009, he left Intel to become COO at EMC, and later served as CEO of VMware for almost a decade. After taking the reins at Intel, he laid down a detailed plan to mastermind its glorious comeback.

    Step one would be to separate Intel’s design and manufacturing business into two distinct entities. With one eye on US subsidies through the Biden administration’s CHIPS and Science Act, Gelsinger pledged to build two new chip factories harnessing the same EUV (Extreme Ultraviolet Lithography) technology used by TSMC.

    Gelsinger was also determined to reestablish discipline in Intel’s chip business and get back to the “tick, tock” structure. Unfortunately, the production delays that had been building up since 2015 meant that Gelsinger’s target was just to get back to parity. In the interim, Intel would also get TSMC to manufacture some of its newest chips which, while costly, would help address any concerns the company was lagging even further behind.

    Nobody had any doubts as to the size of the task facing Gelsinger, but there was plenty of room for optimism. Gelsinger was humble enough to accept Intel couldn’t simply stay on its current course, and had to embrace its new status. He proposed Intel could grin and bear the short-term pain for the company’s eventual benefit. If it could build for the future, harness its rivals to keep it in the game and restore faith in its processes, Intel would emerge from this as the winner. All it needed was for nothing to get worse.

    At the end of October, Reuters reported Gelsinger made a colossal faux pas when speaking about TSMC. The CEO was quoted saying “You don’t want all of our eggs in the basket of a Taiwan fab,” and that “Taiwan is not a stable place.” This offended TSMC to such an extent that it ended a discount Intel had taken advantage of for years.

    Sadly, Gelsinger’s desire to restore discipline to the chip division would also backfire, with the latest Core processors blighted by voltage instability issues. Intel was forced to extend those chips’ warranties, which came at an additional cost it couldn’t really afford. In August, it posted a loss of $1.6 billion and pledged to cut 15,000 employees in an attempt to right the ship. But it was forced to post the biggest quarterly loss in its history three months later, losing $16.6 billion, albeit much of that tied to revaluing company assets and paying for the layoffs. Worse, Intel’s new production process, 18A, reportedly failed crucial tests ahead of its 2025 debut.

    Perhaps the lowest point in Intel’s year was when its stock price fell low enough that it became a takeover target. Rumors suggested Qualcomm was potentially eyeing a takeover while others indicated ARM had made inquiries about purchasing Intel’s product unit.

    The New York Times reports Intel’s board grew frustrated with Gelsinger as his rescue plan was “not showing results quickly enough.” But Intel wasn’t going to hire Gelsinger in 2021 and suddenly bounce back in 2024. Building large and complex chip factories isn’t easy. Nor is getting thousands of engineers to solve difficult problems around chip yields. And obviously reversing a slide that started in 2015 was never going to happen overnight.

    Intel’s board is presently looking for a full-time successor to Gelsinger but it’s hard to see what someone else would do differently. After all, the company still needs to build those factories in order to own and control its future, and it still needs to fix its processes. Unless, of course, the next CEO is going to be told to just stanch the bleeding and keep the money rolling in. Even in its deeply-wounded state after a few bad quarters, Intel is still the biggest name in the x86 chip world and will keep making money for years to come.

    You could easily imagine Intel’s board sitting around, prioritizing a few years of healthy profits at the cost of the company’s long-term future. It can keep selling modified versions of its existing desktop chips, ceding the technological leadership to AMD, Qualcomm and others. There’s probably a decade or two of big industrial clients who would be happy using Intel processors for their hardware for as long as they’re still using Windows. Perhaps that would be fitting given how big and ossified Intel has become, admitting that it can’t move fast enough to evolve.

    It’s likely that scenario won’t be allowed to happen given Intel’s broader role in the global tech space. Even if the incoming administration criticized the CHIPS Act — Intel is still set to be its largest funding recipient — having a domestic manufacturer of Intel’s scale will be an asset few sane governments would allow to fall. But just switching CEOs won’t suddenly fix the company’s big, hard-to-solve problems. It wasn’t Pat Gelsinger who screwed up power design for Raptor Lake, nor did he pass on the opportunity to make the iPhone CPU all those years ago. The TSMC stuff, he can own that, but while a CEO sets the direction of travel, he can’t micromanage every process in a company of Intel’s scale. So whoever replaces him will have the same big stack of issues to tackle, knowing that the board’s patience will be even shorter this time out.

    This article originally appeared on Engadget at https://www.engadget.com/computing/firing-pat-gelsinger-doesnt-solve-intels-problems-173420381.html?src=rss


  • From Retrieval to Intelligence: Exploring RAG, Agent+RAG, and Evaluation with TruLens

    Vladyslav Fliahin

    Unlocking the Power of GPT-Generated Private Corpora

    Introduction

    Nowadays there are plenty of good foundation models to build a custom application on (gpt-4o, Sonnet, Gemini, Llama3.2, Gemma, Ministral, etc.). These models know a lot about history, geography, and Wikipedia articles, but they still have weaknesses. There are mostly two of them: the level of detail (e.g., the model knows about BMW, what it does, its model names, and other general info, but fails if you ask about sales figures for Europe or details of a specific engine part) and recency of knowledge (e.g., the Llama3.2 or Ministral releases; foundation models are trained at a certain point in time and have a knowledge cutoff date, after which the model doesn’t know anything).

    A lot of books, depicting the amount of LLM knowledge.
    Photo by Jaredd Craig on Unsplash

    This article is focused on both issues, describing the situation of imaginary companies that were founded before the knowledge cutoff and whose information has partly changed since then.

    To address both issues we will use the RAG technique and the LlamaIndex framework. The idea behind the Retrieval Augmented Generation is to supply the model with the most relevant information during the answer generation. This way we can have a DB with custom data, which the model will be able to utilize. To further assess the system performance we will incorporate the TruLens library and the RAG Triad metrics.

    As for the knowledge cutoff, this issue is often addressed with web-search tools. Nevertheless, we can’t completely substitute built-in knowledge with a search tool. To see why, imagine two ML specialists: the first knows everything about the current GenAI state, while the second switched from GenAI to classic computer vision six months ago. If you ask them both the same question about how to use a recent GenAI model, they will need significantly different numbers of search requests. The first already knows it all and will maybe double-check some specific commands, while the second will have to read a whole bunch of detailed articles first to understand what the model does and what is under the hood before being able to answer.

    Basically, it is like comparing a field expert with a generalist: one can answer quickly, while the other has to go googling because he doesn’t know all the details the first one does.

    The main point is that a lot of googling yields a comparable answer, but over a significantly longer timeframe. In chat-like applications, users won’t wait minutes for the model to google something. In addition, not all information is public and googleable.

    Data

    Right now it may be hard to find a dataset that wasn’t already part of a foundation model’s training data: almost all public data has been indexed and used during the large models’ pretraining stage.

    Humans (as companies) walking around the forest, looking for logs (data) and throwing them into a machine (LLM) that converts the logs into fire.
    Source: Image generated by the author using AI (Bing)

    That’s why I decided to generate one myself. For this purpose, I used chatgpt-4o-latest via the OpenAI UI and several consecutive prompts (all of them similar to the ones below):

    Generate me a private corpus with some details mentioning the imagined Ukraine Boats Inc.
    A list of products, prices, responsible stuff, etc.
    I want to use it as my private corpus for the RAG use-case
    You can generate really a lot of the text. The more the better.
    Yeah, proceed with partnerships, legal policies, competitions participated
    Maybe info about where we manufacture our boats (and add some custom ones)
    add client use studies

    As a result, I generated a private corpus for 4 different companies. Below are the token counts, to give a better sense of the dataset size (a sketch of how to compute them follows the listing).

    # Number of tokens using the `o200k_base` tokenizer (gpt-4o/gpt-4o-mini)
    nova-drive-motors.txt: 2757
    aero-vance-aviation.txt: 1860
    ukraine-boats.txt: 3793
    city-solve.txt: 3826
    total_tokens=12236
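
    These numbers were produced with the o200k_base tokenizer; here is a minimal sketch of how such counts can be computed with the tiktoken library (the directory path is illustrative and matches the config shown later):

    import tiktoken
    from pathlib import Path

    # o200k_base is the tokenizer used by gpt-4o / gpt-4o-mini
    encoding = tiktoken.get_encoding("o200k_base")

    total_tokens = 0
    for file_path in sorted(Path("../data/companies").glob("*.txt")):
        n_tokens = len(encoding.encode(file_path.read_text()))
        total_tokens += n_tokens
        print(f"{file_path.name}: {n_tokens}")
    print(f"total_tokens={total_tokens}")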

    Below you can read the beginning of the Ukraine Boats Inc. description:

    ## **Ukraine Boats Inc.**
    **Corporate Overview:**
    Ukraine Boats Inc. is a premier manufacturer and supplier of high-quality boats and maritime solutions based in Odessa, Ukraine. The company prides itself on blending traditional craftsmanship with modern technology to serve clients worldwide. Founded in 2005, the company has grown to be a leader in the boating industry, specializing in recreational, commercial, and luxury vessels.
    ---
    ### **Product Lineup**
    #### **Recreational Boats:**
    1. **WaveRunner X200**
       - **Description:** A sleek speedboat designed for water sports enthusiasts. Equipped with advanced navigation and safety features.
       - **Price:** $32,000
       - **Target Market:** Young adventurers and watersport lovers.
       - **Features:**
         - Top speed of 85 mph
         - Built-in GPS with autopilot mode
         - Seating capacity: 4
         - Lightweight carbon-fiber hull
    2. **AquaCruise 350**
       - **Description:** A versatile motorboat ideal for fishing, family trips, and casual cruising.
       - **Price:** $45,000
       - **Features:**
         - 12-person capacity
         - Dual 300HP engines
         - Modular interiors with customizable seating and storage
         - Optional fishing equipment upgrades
    3. **SolarGlide EcoBoat**
       - **Description:** A solar-powered boat for environmentally conscious customers.
       - **Price:** $55,000
       - **Features:**
         - Solar panel roof with 12-hour charge life
         - Zero emissions
         - Maximum speed: 50 mph
         - Silent motor technology
    ---
    The complete private corpus can be found on GitHub.

    For the purpose of the evaluation dataset, I have also asked the model to generate 10 questions (about Ukraine Boats Inc. only) based on the given corpus.

    based on the whole corpus above, generate 10 questions and answers for them pass them into the python native data structure

    Here is the dataset obtained:

    [
      {
        "question": "What is the primary focus of Ukraine Boats Inc.?",
        "answer": "Ukraine Boats Inc. specializes in manufacturing high-quality recreational, luxury, and commercial boats, blending traditional craftsmanship with modern technology."
      },
      {
        "question": "What is the price range for recreational boats offered by Ukraine Boats Inc.?",
        "answer": "Recreational boats range from $32,000 for the WaveRunner X200 to $55,000 for the SolarGlide EcoBoat."
      },
      {
        "question": "Which manufacturing facility focuses on bespoke yachts and customizations?",
        "answer": "The Lviv Custom Craft Workshop specializes in bespoke yachts and high-end customizations, including handcrafted woodwork and premium materials."
      },
      {
        "question": "What is the warranty coverage offered for boats by Ukraine Boats Inc.?",
        "answer": "All boats come with a 5-year warranty for manufacturing defects, while engines are covered under a separate 3-year engine performance guarantee."
      },
      {
        "question": "Which client used the Neptune Voyager catamaran, and what was the impact on their business?",
        "answer": "Paradise Resorts International used the Neptune Voyager catamarans, resulting in a 45% increase in resort bookings and winning the 'Best Tourism Experience' award."
      },
      {
        "question": "What award did the SolarGlide EcoBoat win at the Global Marine Design Challenge?",
        "answer": "The SolarGlide EcoBoat won the 'Best Eco-Friendly Design' award at the Global Marine Design Challenge in 2022."
      },
      {
        "question": "How has the Arctic Research Consortium benefited from the Poseidon Explorer?",
        "answer": "The Poseidon Explorer enabled five successful Arctic research missions, increased data collection efficiency by 60%, and improved safety in extreme conditions."
      },
      {
        "question": "What is the price of the Odessa Opulence 5000 luxury yacht?",
        "answer": "The Odessa Opulence 5000 luxury yacht starts at $1,500,000."
      },
      {
        "question": "Which features make the WaveRunner X200 suitable for watersports?",
        "answer": "The WaveRunner X200 features a top speed of 85 mph, a lightweight carbon-fiber hull, built-in GPS, and autopilot mode, making it ideal for watersports."
      },
      {
        "question": "What sustainability initiative is Ukraine Boats Inc. pursuing?",
        "answer": "Ukraine Boats Inc. is pursuing the Green Maritime Initiative (GMI) to reduce the carbon footprint by incorporating renewable energy solutions in 50% of their fleet by 2030."
      }
    ]

    Now that we have the private corpus and the dataset of Q&A pairs, we can insert our data into some suitable storage.

    Data propagation

    We can utilize a variety of databases for the RAG use case, but for this project and the possible handling of future relations, I integrated the Neo4j DB into our solution. Moreover, Neo4j provides a free instance after registration.

    Now, let’s start preparing nodes. First, we instantiate an embedding model. We use 256 vector dimensions because some recent tests showed that larger embedding dimensions led to scores with less variance (which is not what we need). As the embedding model, we use text-embedding-3-small.

    # initialize models
    embed_model = OpenAIEmbedding(
        model=CFG['configuration']['models']['embedding_model'],
        api_key=os.getenv('AZURE_OPENAI_API_KEY'),
        dimensions=CFG['configuration']['embedding_dimension']
    )

    After that, we read the corpus:

    # get documents paths
    document_paths = [Path(CFG['configuration']['data']['raw_data_path']) / document for document in CFG['configuration']['data']['source_docs']]

    # initialize a file reader
    reader = SimpleDirectoryReader(input_files=document_paths)

    # load documents into LlamaIndex Documents
    documents = reader.load_data()

    Furthermore, we utilize the SentenceSplitter to convert documents into separate nodes. These nodes will be stored in the Neo4j database.
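
    The splitting step itself isn’t shown above; a minimal sketch could look like the following (the chunk parameters come from the config shown later, and the custom node-id scheme is an assumption based on ids like ukraine-boats-3 that appear in the query results below):

    from collections import defaultdict
    from pathlib import Path

    from llama_index.core.node_parser import SentenceSplitter

    # split the loaded documents into chunks (nodes)
    splitter = SentenceSplitter(
        chunk_size=CFG['configuration']['chunk_size'],
        chunk_overlap=CFG['configuration']['chunk_overlap'],
        separator=CFG['configuration']['separator']
    )
    nodes = splitter.get_nodes_from_documents(documents, show_progress=True)

    # assign readable ids like "ukraine-boats-3" (source file stem + per-file chunk index)
    per_file_counter = defaultdict(int)
    for node in nodes:
        source_name = Path(node.metadata.get('file_name', 'doc')).stem
        node.id_ = f"{source_name}-{per_file_counter[source_name]}"
        per_file_counter[source_name] += 1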

    neo4j_vector = Neo4jVectorStore(
        username=CFG['configuration']['db']['username'],
        password=CFG['configuration']['db']['password'],
        url=CFG['configuration']['db']['url'],
        embedding_dimension=CFG['configuration']['embedding_dimension'],
        hybrid_search=CFG['configuration']['hybrid_search']
    )

    # setup context
    storage_context = StorageContext.from_defaults(
        vector_store=neo4j_vector
    )

    # populate DB with nodes
    index = VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True)

    Hybrid search is turned off for now. This is done deliberately to outline the performance of the vector-search algorithm.

    We are all set, and now we are ready to go to the querying pipeline.

    UI of the Neo4j Aura depicting the Nodes we have inserted to the DB.
    Source: Image created by the author

    Pipeline

    The RAG technique may be implemented as a standalone solution or as a part of an agent. The agent is supposed to handle all the chat history, tools handling, reasoning, and output generation. Below we will have a walkthrough on how to implement the query engines (standalone RAG) and the agent approach (the agent will be able to call the RAG as one of its tools).

    Often when we talk about the chat models, the majority will pick the OpenAI models without considering the alternatives. We will outline the usage of RAG on OpenAI models and the Meta Llama 3.2 models. Let’s benchmark which one performs better.

    All the configuration parameters are moved to the pyproject.toml file.

    [configuration]
    similarity_top_k = 10
    vector_store_query_mode = "default"
    similarity_cutoff = 0.75
    response_mode = "compact"
    distance_strategy = "cosine"
    embedding_dimension = 256
    chunk_size = 512
    chunk_overlap = 128
    separator = " "
    max_function_calls = 2
    hybrid_search = false

    [configuration.data]
    raw_data_path = "../data/companies"
    dataset_path = "../data/companies/dataset.json"
    source_docs = ["city-solve.txt", "aero-vance-aviation.txt", "nova-drive-motors.txt", "ukraine-boats.txt"]

    [configuration.models]
    llm = "gpt-4o-mini"
    embedding_model = "text-embedding-3-small"
    temperature = 0
    llm_hf = "meta-llama/Llama-3.2-3B-Instruct"
    context_window = 8192
    max_new_tokens = 4096
    hf_token = "hf_custom-token"
    llm_evaluation = "gpt-4o-mini"

    [configuration.db]
    url = "neo4j+s://custom-url"
    username = "neo4j"
    password = "custom-password"
    database = "neo4j"
    index_name = "article" # change if you want to load the new data that won't intersect with the previous uploads
    text_node_property = "text"
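
    For completeness, the CFG dictionary referenced in the snippets can be loaded from this file with the standard library; here is a sketch assuming Python 3.11+ and an illustrative path:

    import tomllib
    from pathlib import Path

    # read the [configuration] tables from pyproject.toml into a plain dict
    with Path("pyproject.toml").open("rb") as f:
        CFG = tomllib.load(f)

    print(CFG['configuration']['models']['llm'])  # "gpt-4o-mini"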

    The step common to both models is connecting to the existing vector index inside Neo4j.

    # connect to the existing neo4j vector index
    vector_store = Neo4jVectorStore(
        username=CFG['configuration']['db']['username'],
        password=CFG['configuration']['db']['password'],
        url=CFG['configuration']['db']['url'],
        embedding_dimension=CFG['configuration']['embedding_dimension'],
        distance_strategy=CFG['configuration']['distance_strategy'],
        index_name=CFG['configuration']['db']['index_name'],
        text_node_property=CFG['configuration']['db']['text_node_property']
    )
    index = VectorStoreIndex.from_vector_store(vector_store)

    OpenAI

    First, we initialize the OpenAI models needed. We will use gpt-4o-mini as the language model and the same embedding model as before. We register the LLM and embedding model in the Settings object, so we don’t have to pass these models around explicitly; LlamaIndex will pull them from Settings whenever they are needed.

    # initialize models
    llm = OpenAI(
        api_key=os.getenv('AZURE_OPENAI_API_KEY'),
        model=CFG['configuration']['models']['llm'],
        temperature=CFG['configuration']['models']['temperature']
    )
    embed_model = OpenAIEmbedding(
        model=CFG['configuration']['models']['embedding_model'],
        api_key=os.getenv('AZURE_OPENAI_API_KEY'),
        dimensions=CFG['configuration']['embedding_dimension']
    )

    Settings.llm = llm
    Settings.embed_model = embed_model

    QueryEngine

    After that, we can create a default query engine from the existing vector index:

    # create query engine
    query_engine = index.as_query_engine()

    Furthermore, we can run the RAG logic with a simple query() call. In addition, we print the list of source nodes retrieved from the DB and the final LLM response.

    # custom question
    response = query_engine.query("What is the primary focus of Ukraine Boats Inc.?")

    # get similarity scores
    for node in response.source_nodes:
        print(f'{node.node.id_}, {node.score}')

    # predicted answer
    print(response.response)

    Here is the sample output:

    ukraine-boats-3, 0.8536546230316162
    ukraine-boats-4, 0.8363556861877441


    The primary focus of Ukraine Boats Inc. is designing, manufacturing, and selling luxury and eco-friendly boats, with a strong emphasis on customer satisfaction and environmental sustainability.

    As you can see, we created custom node ids, so that we can tell which file each chunk came from and its ordinal position within that file. We can configure the query engine much more precisely using the low-level LlamaIndex API:

    # custom retriever
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=CFG['configuration']['similarity_top_k'],
        vector_store_query_mode=CFG['configuration']['vector_store_query_mode']
    )

    # similarity threshold
    similarity_postprocessor = SimilarityPostprocessor(
        similarity_cutoff=CFG['configuration']['similarity_cutoff']
    )

    # custom response synthesizer
    response_synthesizer = get_response_synthesizer(
        response_mode=CFG['configuration']['response_mode']
    )

    # combine custom query engine
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        node_postprocessors=[similarity_postprocessor],
        response_synthesizer=response_synthesizer
    )

    Here we specified a custom retriever, a similarity postprocessor, and the response refinement behavior.

    For further customization, you can create custom wrappers around any of the LlamaIndex components to make them more specific and aligned with your needs.
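
    As a small, purely illustrative example (the class below is not part of the original project), a wrapper can log the retrieved node ids and scores while keeping the familiar query() interface:

    import logging

    class LoggingQueryEngine:
        """Thin wrapper that logs retrieved nodes before returning the response."""

        def __init__(self, query_engine):
            self.query_engine = query_engine

        def query(self, question: str):
            response = self.query_engine.query(question)
            for node_with_score in response.source_nodes:
                logging.info(f'{node_with_score.node.id_}: {node_with_score.score}')
            return response

    # behaves like a regular query engine
    logged_engine = LoggingQueryEngine(query_engine)
    response = logged_engine.query("What is the primary focus of Ukraine Boats Inc.?")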

    Agent

    To implement a RAG-based agent inside the LlamaIndex, we need to use one of the predefined AgentWorkers. We will stick to the OpenAIAgentWorker, which uses OpenAI’s LLM as its brain. Moreover, we wrapped our query engine from the previous part into the QueryEngineTool, which the agent may pick based on the tool’s description.

    AGENT_SYSTEM_PROMPT = "You are a helpful human assistant. You always call the retrieve_semantically_similar_data tool before answering any questions. If the answer to the questions couldn't be found using the tool, just respond with `Didn't find relevant information`."
    TOOL_NAME = "retrieve_semantically_similar_data"
    TOOL_DESCRIPTION = "Provides additional information about the companies. Input: string"

    # agent worker
    agent_worker = OpenAIAgentWorker.from_tools(
        [
            QueryEngineTool.from_defaults(
                query_engine=query_engine,
                name=TOOL_NAME,
                description=TOOL_DESCRIPTION,
                return_direct=False,
            )
        ],
        system_prompt=AGENT_SYSTEM_PROMPT,
        llm=llm,
        verbose=True,
        max_function_calls=CFG['configuration']['max_function_calls']
    )

    To further use the agent, we need an AgentRunner. The runner is more like an orchestrator, handling top-level interactions and state, while the worker performs concrete actions, like tool and LLM usage.

    # agent runner
    agent = AgentRunner(agent_worker=agent_worker)

    The AgentRunner holds the context, history, and tool calls, while the AgentWorker does all the low-level work.
    Source: Image taken from the LlamaIndex docs

    To test the user-agent interactions efficiently, I implemented a simple chat-like interface:

    while True:
        # get user input
        current_message = input('Insert your next message:')
        print(f'{datetime.now().strftime("%H:%M:%S.%f")[:-3]}|User: {current_message}')

        response = agent.chat(current_message)
        print(f'{datetime.now().strftime("%H:%M:%S.%f")[:-3]}|Agent: {response.response}')

    Here is a sample of the chat:

    Insert your next message: Hi
    15:55:43.101|User: Hi
    Added user message to memory: Hi
    15:55:43.873|Agent: Didn't find relevant information.
    Insert your next message: Do you know anything about the city solve?
    15:56:24.751|User: Do you know anything about the city solve?
    Added user message to memory: Do you know anything about the city solve?
    === Calling Function ===
    Calling function: retrieve_semantically_similar_data with args: {"input":"city solve"}
    Got output: Empty Response
    ========================

    15:56:37.267|Agent: Didn't find relevant information.
    Insert your next message: What is the primary focus of Ukraine Boats Inc.?
    15:57:36.122|User: What is the primary focus of Ukraine Boats Inc.?
    Added user message to memory: What is the primary focus of Ukraine Boats Inc.?
    === Calling Function ===
    Calling function: retrieve_semantically_similar_data with args: {"input":"Ukraine Boats Inc."}
    Got output: Ukraine Boats Inc. is a premier manufacturer and supplier of high-quality boats and maritime solutions based in Odessa, Ukraine. Founded in 2005, the company specializes in recreational, commercial, and luxury vessels, blending traditional craftsmanship with modern technology. It has established a strong market presence in Europe, North America, and Asia, supported by partnerships with distribution companies like Baltic Marine Distributors in Germany, OceanCraft LLC in the USA, and Yokohama SeaTech in Japan.

    The company is organized into several departments, including Engineering, Sales and Marketing, Production, and Customer Service, each with specific responsibilities to ensure efficient operations and customer satisfaction. Ukraine Boats Inc. is committed to sustainability through initiatives like the Green Maritime Initiative, aiming to reduce its carbon footprint by incorporating renewable energy solutions in its fleet.

    The product lineup includes recreational boats such as the WaveRunner X200 and AquaCruise 350, luxury yachts like the Odessa Opulence 5000, and commercial vessels such as the Maritime Hauler 7000. The company also offers customization options, maintenance plans, and a range of accessories to enhance the boating experience.

    With a focus on innovation and quality, Ukraine Boats Inc. has received accolades in various marine design competitions and continues to expand its global footprint while maintaining a commitment to environmental compliance and customer service excellence.
    ========================

    15:57:49.505|Agent: The primary focus of Ukraine Boats Inc. is the manufacturing and supplying of high-quality boats and maritime solutions. Based in Odessa, Ukraine, the company specializes in recreational, commercial, and luxury vessels, combining traditional craftsmanship with modern technology. They are committed to sustainability and have initiatives like the Green Maritime Initiative to reduce their carbon footprint. Their product lineup includes various types of boats, and they also offer customization options and maintenance plans.
    Insert your next message: Do you know anything about the government company city solve?
    15:59:02.830|User: Do you know anything about the government company city solve?
    Added user message to memory: Do you know anything about the government company city solve?
    === Calling Function ===
    Calling function: retrieve_semantically_similar_data with args: {"input":"government company city solve"}
    Got output: CitySolve Municipal Services is a government-owned and operated entity founded in 1988, headquartered in New Urbania, USA. Its mission is to ensure a livable, sustainable, and efficient city by providing prompt solutions to residents' everyday needs. The company focuses on various critical service areas, including waterworks and drainage, parks and greenery management, roads and sidewalks, waste management, public safety infrastructure, and urban utilities. CitySolve emphasizes core values such as responsiveness, reliability, accountability, and sustainability, and utilizes modern technologies to enhance its operations and community impact.
    ========================

    15:59:12.693|Agent: CitySolve Municipal Services is a government-owned and operated entity founded in 1988, headquartered in New Urbania, USA. Its mission is to ensure a livable, sustainable, and efficient city by providing prompt solutions to residents' everyday needs. The company focuses on critical service areas such as waterworks and drainage, parks and greenery management, roads and sidewalks, waste management, public safety infrastructure, and urban utilities. CitySolve emphasizes core values like responsiveness, reliability, accountability, and sustainability, and it utilizes modern technologies to enhance its operations and community impact.
    Insert your next message: Thanks
    15:59:28.761|User: Thanks
    Added user message to memory: Thanks
    15:59:30.327|Agent: Didn't find relevant information.

    What we can see is that, for vector search to work correctly, the input question needs to contain enough detail to be semantically matched.

    Open source

    As an open-source model, we have utilized meta-llama/Llama-3.2-3B-Instruct. This choice was based on the model’s latency/performance trade-off. First things first, we need to authenticate our HuggingFace account via an access token.

    # Use your token here
    login(token=CFG['configuration']['models']['hf_token'])

    To use the Llama as an LLM inside the LlamaIndex, we need to create a model wrapper. We will use a single NVIDIA GeForce RTX 3090 to serve our Llama 3.2 model.

    SYSTEM_PROMPT = """You are an AI assistant that answers questions in a friendly manner, based on the given source documents. Here are some rules you always follow:
    - Generate human readable output, avoid creating output with gibberish text.
    - Generate only the requested output, don't include any other language before or after the requested output.
    - Never say thank you, that you are happy to help, that you are an AI agent, etc. Just answer directly.
    - Generate professional language typically used in business documents in North America.
    - Never generate offensive or foul language.
    """

    query_wrapper_prompt = PromptTemplate(
        "<|start_header_id|>system<|end_header_id|>\n" + SYSTEM_PROMPT
        + "<|eot_id|><|start_header_id|>user<|end_header_id|>{query_str}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    )

    llm = HuggingFaceLLM(
        context_window=CFG['configuration']['models']['context_window'],
        max_new_tokens=CFG['configuration']['models']['max_new_tokens'],
        generate_kwargs={"temperature": CFG['configuration']['models']['temperature'], "do_sample": False},
        query_wrapper_prompt=query_wrapper_prompt,
        tokenizer_name=CFG['configuration']['models']['llm_hf'],
        model_name=CFG['configuration']['models']['llm_hf'],
        device_map="cuda:0",
        model_kwargs={"torch_dtype": torch.bfloat16}
    )

    Settings.llm = llm

    QueryEngine

    The interfaces are the same. Example output is below:

    ukraine-boats-3, 0.8536546230316162
    ukraine-boats-4, 0.8363556861877441


    The primary focus of Ukraine Boats Inc. is designing, manufacturing, and selling luxury and eco-friendly boats, with a strong emphasis on customer satisfaction and environmental sustainability.

    Agent

    LlamaIndex has a dedicated agent wrapper for OpenAI models, but for open-source models we should use a different one. We selected the ReActAgent, which iteratively reasons and acts until the final response is ready.

    agent_worker = ReActAgentWorker.from_tools(
        [
            QueryEngineTool.from_defaults(
                query_engine=query_engine,
                name=TOOL_NAME,
                description=TOOL_DESCRIPTION,
                return_direct=False,
            )
        ],
        llm=llm,
        verbose=True,
        chat_history=[ChatMessage(content=AGENT_SYSTEM_PROMPT, role="system")]
    )

    # agent runner
    agent = AgentRunner(agent_worker=agent_worker)

    Below is the same discussion but with a different Agent under the hood:

    Insert your next message: Hi
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    16:04:29.117|User: Hi
    Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
    Action: retrieve_semantically_similar_data
    Action Input: {'input': 'hello world', 'num_beams': 5}
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Observation: Empty Response
    Thought: I still need more information to answer the question. The tool did not provide any useful output. I'll try to gather more context.
    Action: retrieve_semantically_similar_data
    Action Input: {'input': 'hello world', 'num_beams': 5}
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Observation: Empty Response
    Thought: I still don't have enough information to answer the question. The tool did not provide any useful output. I'll try to gather more context.
    Action: retrieve_semantically_similar_data
    Action Input: {'input': 'hello world', 'num_beams': 5}
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Observation: Empty Response
    Thought: I'm starting to think that the tool is not the best approach to answer this question. The user's input "hello world" is very general and the tool is not providing any useful output. I'll try to think of a different approach.
    Answer: Hello, how can I assist you today?
    16:04:37.764|Agent: Hello, how can I assist you today?
    Insert your next message: Do you know anything about the city solve?
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    16:05:08.756|User: Do you know anything about the city solve?
    Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
    Action: retrieve_semantically_similar_data
    Action Input: {'input': 'solve city'}
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Observation:

    CitySolve Municipal Services is the lifeline of New Urbania, addressing a wide range of city-level concerns and providing prompt solutions to residents' everyday needs.
    Thought: I can answer without using any more tools. I'll use the user's language to answer
    Answer: CitySolve Municipal Services is a city-level organization that provides solutions to residents' everyday needs in New Urbania.
    16:05:13.003|Agent: CitySolve Municipal Services is a city-level organization that provides solutions to residents' everyday needs in New Urbania.
    Insert your next message: What is the primary focus of Ukraine Boats Inc.?
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    16:05:34.892|User: What is the primary focus of Ukraine Boats Inc.?
    Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
    Action: retrieve_semantically_similar_data
    Action Input: {'input': 'Ukraine Boats Inc.'}
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Observation:

    Ukraine Boats Inc. is a premier manufacturer and supplier of high-quality boats and maritime solutions based in Odessa, Ukraine. The company prides itself on blending traditional craftsmanship with modern technology to serve clients worldwide. Founded in 2005, the company has grown to be a leader in the boating industry, specializing in recreational, commercial, and luxury vessels.

    The company has successfully delivered a range of boats and solutions to various clients, including Blue Horizon Fisheries, Azure Seas Luxury Charters, Coastal Safety Patrol, EcoTrade Logistics, Team HydroBlitz Racing, and Paradise Resorts International. These clients have reported significant benefits from working with Ukraine Boats Inc., including increased efficiency, reduced costs, and enhanced customer satisfaction.

    Ukraine Boats Inc. offers a range of products and services, including luxury yachts, commercial boats, and accessories. The company's products are designed to meet the specific needs of each client, and its team of experts works closely with clients to ensure that every boat is tailored to their requirements.

    Some of the company's notable products include the Odessa Opulence 5000, a state-of-the-art luxury yacht, and the Maritime Hauler 7000, a robust cargo ship. The company also offers boat customization packages, annual maintenance plans, and other services to support its clients' needs.

    Overall, Ukraine Boats Inc. is a trusted and reliable partner for clients seeking high-quality boats and maritime solutions.
    Thought: I can answer without using any more tools. I'll use the user's language to answer
    Answer: Ukraine Boats Inc. is a premier manufacturer and supplier of high-quality boats and maritime solutions based in Odessa, Ukraine, blending traditional craftsmanship with modern technology to serve clients worldwide.
    16:05:53.311|Agent: Ukraine Boats Inc. is a premier manufacturer and supplier of high-quality boats and maritime solutions based in Odessa, Ukraine, blending traditional craftsmanship with modern technology to serve clients worldwide.
    Insert your next message: Do you know anything about the government company city solve?
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    16:06:09.949|User: Do you know anything about the government company city solve?
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Thought: The current language of the user is English. I need to use a tool to help me answer the question.
    Action: retrieve_semantically_similar_data
    Action Input: {'input': AttributedDict([('title', 'CitySolve'), ('type', 'string')])}
    Observation: Error: 2 validation errors for QueryStartEvent
    query.str
    Input should be a valid string [type=string_type, input_value=AttributedDict([('title',...'), ('type', 'string')]), input_type=AttributedDict]
    For further information visit https://errors.pydantic.dev/2.9/v/string_type
    query.QueryBundle.query_str
    Field required [type=missing, input_value=AttributedDict([('title',...'), ('type', 'string')]), input_type=AttributedDict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Observation: Error: Could not parse output. Please follow the thought-action-input format. Try again.
    Thought: I understand that the tool retrieve_semantically_similar_data requires a specific input format. I will make sure to follow the correct format.
    Action: retrieve_semantically_similar_data
    Action Input: {'title': 'CitySolve', 'type': 'string'}
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    Observation:

    CitySolve Municipal Services is a government-owned and operated company that serves as the backbone of New Urbania's civic infrastructure, addressing a wide range of city-level concerns.
    Thought: I can answer without using any more tools. I'll use the user's language to answer
    Answer: CitySolve Municipal Services is a government-owned and operated company that serves as the backbone of New Urbania's civic infrastructure, addressing a wide range of city-level concerns.
    16:06:17.799|Agent: CitySolve Municipal Services is a government-owned and operated company that serves as the backbone of New Urbania's civic infrastructure, addressing a wide range of city-level concerns.
    Insert your next message: Thanks
    Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
    16:06:34.232|User: Thanks
    Thought: I can answer without using any more tools. I'll use the user's language to answer
    Answer: CitySolve Municipal Services is a government-owned and operated company that serves as the backbone of New Urbania's civic infrastructure, addressing a wide range of city-level concerns.
    16:06:35.734|Agent: CitySolve Municipal Services is a government-owned and operated company that serves as the backbone of New Urbania's civic infrastructure, addressing a wide range of city-level concerns.

    As we can see, the agents reason differently. Given the same questions, the two models decided to query the tool differently. The second agent failed with the tool once, but it’s more an issue of the tool description than the agent itself. Both of them provided the user with valuable answers, which is the final goal of the RAG approach.

    In addition, there are a lot of different agent wrappers you can apply on top of your LLM, and they can significantly change the way the model interacts with the world.

    Evaluation

    Nowadays there are a lot of frameworks available for evaluating RAG pipelines, and one of them is TruLens. Overall RAG performance is assessed using the so-called RAG Triad (answer relevance, context relevance, and groundedness).

    To estimate relevance and groundedness, we are going to utilize LLMs. The LLMs will act as judges, scoring the answers based on the information given.

    TruLens itself is a convenient tool to measure system performance on a metric level and analyze the specific record’s assessments. Here is the leaderboard UI view:

    UI leaderboard view of the TruLens framework
    Source: Image created by the author

    Below is the per-record table of assessments, where you can review all the internal processes being invoked.

    Per-record table of assessments, where you can review all the internal processes being invoked. Part of the TruLens UI.
    Source: Image created by the author

    To get even more details, you can review the execution process for a specific record.

    Execution process for a specific record inside the TruLens UI.
    Source: Image created by the author

    To implement the RAG Triad evaluation, first of all, we have to define the experiment name and the model provider. We will utilize the gpt-4o-mini model for the evaluation.

    experiment_name = "llama-3.2-3B-custom-retriever"

    provider = OpenAIProvider(
        model_engine=CFG['configuration']['models']['llm_evaluation']
    )

    After that, we define the Triad itself (answer relevance, context relevance, groundedness). For each metric, we should specify inputs and outputs.

    context_selection = TruLlama.select_source_nodes().node.text

    # context relevance (for each of the context chunks)
    f_context_relevance = (
        Feedback(
            provider.context_relevance, name="Context Relevance"
        )
        .on_input()
        .on(context_selection)
    )

    # groundedness
    f_groundedness_cot = (
        Feedback(
            provider.groundedness_measure_with_cot_reasons, name="Groundedness"
        )
        .on(context_selection.collect())
        .on_output()
    )

    # answer relevance between overall question and answer
    f_qa_relevance = (
        Feedback(
            provider.relevance_with_cot_reasons, name="Answer Relevance"
        )
        .on_input_output()
    )

    Furthermore, we instantiate the TruLlama object that will handle the feedback calculation during the agent calls.

    # Create TruLlama agent
    tru_agent = TruLlama(
        agent,
        app_name=experiment_name,
        tags="agent testing",
        feedbacks=[f_qa_relevance, f_context_relevance, f_groundedness_cot],
    )

    Now we are ready to execute the evaluation pipeline on our dataset.
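
    The dataset variable used below is the list of Q&A pairs generated earlier; one way it might be loaded from the JSON file referenced in the config (an assumption about the project layout):

    import json
    from pathlib import Path

    # load the generated Q&A pairs
    with Path(CFG['configuration']['data']['dataset_path']).open() as f:
        dataset = json.load(f)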

    for item in tqdm(dataset):
        try:
            agent.reset()

            with tru_agent as recording:
                agent.query(item.get('question'))
            record_agent = recording.get()

            # wait until all the feedback functions are finished
            for feedback, result in record_agent.wait_for_feedback_results().items():
                logging.info(f'{feedback.name}: {result.result}')
        except Exception as e:
            logging.error(e)
            traceback.format_exc()

    We have conducted experiments using the two models, default/custom query engines, and an extra description of the tool input parameters (the ReAct agent struggled without an explicit description of the tool input params, trying to call non-existent tools to refactor the input). We can review the results as a DataFrame using the get_leaderboard() method.
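
    A minimal sketch of pulling that leaderboard, assuming the TruSession object from recent TruLens releases (the exact import path may differ between versions):

    from trulens.core import TruSession

    # aggregate feedback scores and latency per experiment as a DataFrame
    session = TruSession()
    leaderboard_df = session.get_leaderboard()
    print(leaderboard_df)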

    Conclusion

    Data -> Neo4j -> agent -> RAG pipeline.
    Source: Image generated by the author using AI (Bing)

    We obtained a private corpus by using GPT models for custom dataset generation. The actual corpus content is pretty interesting and diverse. That’s the reason why a lot of models are successfully fine-tuned on GPT-generated samples right now.

    Neo4j DB provides convenient interfaces for a lot of frameworks while offering one of the best UIs (Aura). In real projects, we often have relations between the data, and a graph DB is a perfect choice for such use cases.

    On top of the private corpus, we implemented different RAG approaches (standalone and as a part of the agent). Based on the RAG Triad metrics, we observed that an OpenAI-based agent works perfectly, while a well-prompted ReAct agent performs relatively the same. A big difference was in the usage of a custom query engine. That’s reasonable because we configured some specific procedures and thresholds that align with our data. In addition, both solutions have high groundedness, which is very important for RAG applications.

    Another interesting takeaway is that the agent-call latency of the local Llama3.2 3B and the gpt-4o-mini API was pretty much the same (of course, most of the time went to the DB call, but the difference is still not that big).

    Though our system works pretty well, there are a lot of improvements to be made, such as keyword search, rerankers, neighbor chunk selection, and comparison against ground-truth labels. These topics will be discussed in the next articles on RAG applications.

    Private corpus, alongside the code and prompts, can be found on GitHub.

    P.S.

    I want to thank my colleagues: Alex Simkiv, Andy Bosyi, and Nazar Savchenko for productive conversations, collaboration, and valuable advice as well as the entire MindCraft.ai team for their constant support.


    From Retrieval to Intelligence: Exploring RAG, Agent+RAG, and Evaluation with TruLens was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


  • Query structured data from Amazon Q Business using Amazon QuickSight integration

    Jiten Dedhia

    In this post, we show how Amazon Q Business integrates with QuickSight to enable users to query both structured and unstructured data in a unified way. The integration allows users to connect to over 20 structured data sources like Amazon Redshift and PostgreSQL, while getting real-time answers with visualizations. Amazon Q Business combines information from structured sources through QuickSight with unstructured content to provide comprehensive answers to user queries.


  • Elevate customer experience by using the Amazon Q Business custom plugin for New Relic AI

    Meena Menon

    The New Relic AI custom plugin for Amazon Q Business creates a unified solution that combines New Relic AI’s observability insights and recommendations with Amazon Q Business’s Retrieval Augmented Generation (RAG) capabilities in a natural language interface for ease of use. This post explores the use case, how this custom plugin works, how it can be enabled, and how it can help elevate customers’ digital experiences.


  • Amazon SageMaker launches the updated inference optimization toolkit for generative AI

    Marc Karp

    Today, Amazon SageMaker is excited to announce updates to the inference optimization toolkit, providing new functionality and enhancements to help you optimize generative AI models even faster. In this post, we discuss these new features of the toolkit in more detail.


  • Syngenta develops a generative AI assistant to support sales representatives using Amazon Bedrock Agents

    Zach Marston

    In this post, we explore how Syngenta collaborated with AWS to develop Cropwise AI, a generative AI assistant powered by Amazon Bedrock Agents that helps sales representatives make better seed product recommendations to farmers across North America. The solution transforms the seed selection process by simplifying complex data into natural conversations, providing quick access to detailed seed product information, and enabling personalized recommendations at scale through a mobile app interface.


  • Apple wants to cram sensors for everything into Apple Vision Pro

    We’ve already got Face ID sensors and a bunch of health sensors in the Apple Watch, but future Apple devices including the Apple Vision Pro will have many more measuring devices, including one for analyzing breathing through your nose. Here’s what else is coming.

    VR headset with reflective lens resting on a concrete surface, against an urban building background.
    Future Apple Vision Pro could contain many more sensors

    It’s not that long ago that sensors were just something to reveal a plot point on the USS Enterprise. But now we have devices that know when we touch them, when we speak to them, and even when we glance in their direction.

    Just going through 2024’s patents and patent applications, though, there can’t be a type of sensor that Apple is not at least investigating. A newly-granted patent regarding a microphone on your nose is just the latest, although it appears to be the first time Apple has expressed interest in nasal passages.
