State-of-the-art multi-modal LLMs are mostly built on existing LLM architectures, modified to handle different sources of input, and those modifications are where the difficulty comes from. The recent Nvidia paper on NVLM divides the commonly used multi-modal architectures into two categories:
decoder-based;
cross-attention-based.
One of my previous Medium articles discussed the latest paper from Meta, which uses a decoder-based architecture: it converts an input image into a latent vector using a VAE encoder, because the image space is continuous and differs from the discrete text space.
However, the cross-attention-based architecture faces a different problem. For example, in the multi-modal LLM Flamingo, the critical issue is converting the vision embeddings, which come from a generic vision model and vary in their temporal and spatial dimensions, into a fixed-size input for the cross-attention layer that matches the language input dimension.
In this post, I will dive deep into Flamingo’s unique design on top of the vision encoder, the Perceiver Resampler, to explain how this issue was solved. Furthermore, I will explore the Perceiver Resampler’s origin — the Induced Set Attention Block from Set Transformer, which further inspired DeepMind’s Perceiver model for learning fixed-length latent embeddings from generic input data.
MAB(X, Y) is the transformer’s original multi-head attention block, where query = X and key/value = Y. The ISAB is almost identical to two stacked multi-head attention blocks, ISAB(X) = MAB(X, H) with H = MAB(I, X), except that the query of the first block is replaced by the inducing matrix I, and the output H of that block serves as the key/value of the second block. The original set X is of dimension N×D, and I is of dimension M×D, representing M inducing points of dimension 1×D each.
Note that the ISAB is designed to save computational cost. Because M can be much smaller than N, the time complexity of the ISAB, O(M·N·d), is much lower than the O(N²·d) complexity of the original self-attention.
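To make the shapes concrete, here is a minimal NumPy sketch of the ISAB under simplifying assumptions of my own: a single attention head, no learned projections, and no layer normalization or feed-forward sublayers, all of which the Set Transformer’s MAB includes. The helper names (`attention`, `mab`, `isab`) and the sizes are hypothetical, and the inducing matrix is random here rather than learned.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention; also returns the score-matrix shape."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (len_q, len_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, scores.shape

def mab(x, y):
    """Simplified MAB(X, Y): query = X, key/value = Y, plus a residual connection."""
    out, score_shape = attention(x, y, y)
    return x + out, score_shape

def isab(x, inducing):
    """ISAB(X) = MAB(X, H) with H = MAB(I, X)."""
    h, first_scores = mab(inducing, x)   # (M, D): inducing points summarize the set
    out, second_scores = mab(x, h)       # (N, D): the set reads the summary back
    return out, (first_scores, second_scores)

rng = np.random.default_rng(0)
N, M, D = 1000, 16, 32                   # illustrative sizes, with M much smaller than N
X = rng.normal(size=(N, D))              # the input set
I = rng.normal(size=(M, D))              # inducing points (learnable in practice)

out, score_shapes = isab(X, I)
print(out.shape)      # (1000, 32)
print(score_shapes)   # ((16, 1000), (1000, 16))
```

Neither score matrix is ever N×N: both have M·N entries, which is where the saving over plain self-attention comes from.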
Perceiver
Inspired by the Set Transformer’s use of inducing points as the query matrix, the Perceiver model, proposed by DeepMind, uses a short sequence of learnable latent embeddings (e.g., N=512) as the query, while the key and value come from a byte array, which is an ultra-long input sequence (e.g., M=224*224 pixels).
The cross-attention is borrowed from the decoder part of the original transformer, where the query and the key/value come from different sources; in this case, the key/value come from the raw input, which is not a learnable representation:
Multi-head attention and cross attention. Image by author.
Since K and V are input “constants,” the computational complexity of each Perceiver transformer layer depends only on the latent size, O(N²), which is why it is also called a latent transformer. Only the cross-attention step touches the input, at a cost of O(M·N). Decoupled from the input size, the latent transformer could readily scale up to 48 layers, which is a great advantage over traditional transformer designs.
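The following NumPy sketch illustrates this split between one cross-attention step and a stack of latent self-attention layers. It is a simplification under my own assumptions, not the Perceiver implementation: a single head, no layer normalization or MLP sublayers, one set of latent weights reused across layers for brevity, and a small 32*32 input standing in for the byte array. All names and sizes are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q_in, kv_in, wq, wk, wv):
    """Single-head attention: queries from q_in, keys/values from kv_in."""
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (len(q_in), len(kv_in))
    return softmax(scores) @ v

rng = np.random.default_rng(0)
M, C = 32 * 32, 3      # input length and channels (small stand-in for an image)
N, D = 64, 16          # number of latents and latent width

def init(rows, cols):
    return rng.normal(size=(rows, cols)) * 0.1

byte_array = rng.normal(size=(M, C))          # fixed input: keys/values only
latents = rng.normal(size=(N, D))             # learnable in practice; random here
cross_w = (init(D, D), init(C, D), init(C, D))
latent_w = (init(D, D), init(D, D), init(D, D))

# Cross-attention: N latent queries read from M inputs, score matrix is (N, M).
z = latents + attend(latents, byte_array, *cross_w)

# Latent transformer: self-attention over latents only, score matrix is (N, N).
for _ in range(4):
    z = z + attend(z, z, *latent_w)

print(z.shape)  # (64, 16)

# Score-matrix sizes for the figures quoted in the text (N=512, M=224*224).
N_paper, M_paper = 512, 224 * 224
full_self_attention = M_paper * M_paper       # attention directly over pixels
cross_attention = N_paper * M_paper           # one Perceiver cross-attention
latent_self_attention = N_paper * N_paper     # one latent transformer layer
```

With those figures, a latent layer’s score matrix is roughly four orders of magnitude smaller than self-attention over the raw pixels, so stacking many latent layers stays cheap.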
Flamingo’s Vision Encoder and Perceiver Resampler
Instead of applying the Perceiver directly, Flamingo first uses a pre-trained, CNN-based, weight-frozen Normalizer-Free ResNet (NFNet) to extract image/video features, then adds a learnable temporal positional embedding and flattens the features into a 1D sequence. The Perceiver Resampler is attached to the vision encoder to learn a fixed-size latent embedding, which is then passed into the cross-attention layers of the main architecture.
Like DeepMind’s Perceiver model, the Perceiver Resampler uses constant input embeddings as keys/values and learnable latent vectors as queries. Note that no spatial encoding is used here; the rationale is that the preceding vision encoder, NFNet, is a convolution-based model whose channel information already embeds spatial information. To increase performance, the learnable vectors are concatenated to the key/value vectors in the cross-attention computation.
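The steps above can be sketched in NumPy as follows. This is an illustrative simplification, not Flamingo’s code: it assumes a single attention head, omits layer normalization, uses a plain ReLU feed-forward as a stand-in, and fills every weight with random values instead of trained ones. The function name, parameter names, and sizes are all hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q_in, kv_in, wq, wk, wv):
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def perceiver_resampler(x_f, time_emb, latents, layers):
    """x_f: (T, S, D) vision features; time_emb: (T, 1, D); latents: (R, D)."""
    x_f = x_f + time_emb                      # temporal embedding, broadcast over space
    x_f = x_f.reshape(-1, x_f.shape[-1])      # flatten (T, S, D) into (T*S, D)
    x = latents
    for p in layers:
        kv = np.concatenate([x_f, x], axis=0) # latents are appended to the keys/values
        x = x + attention(x, kv, p["wq"], p["wk"], p["wv"])
        x = x + np.maximum(x @ p["w1"], 0.0) @ p["w2"]  # feed-forward stand-in
    return x                                  # always (R, D)

rng = np.random.default_rng(0)
D, R, S, MAX_T = 16, 8, 49, 8                 # width, latents, spatial positions, max frames

def init(rows, cols):
    return rng.normal(size=(rows, cols)) * 0.1

layers = [
    {"wq": init(D, D), "wk": init(D, D), "wv": init(D, D),
     "w1": init(D, 2 * D), "w2": init(2 * D, D)}
    for _ in range(2)
]
latents = rng.normal(size=(R, D))             # learnable in practice
time_table = rng.normal(size=(MAX_T, 1, D))   # learnable in practice

image = rng.normal(size=(1, S, D))            # a single image: T = 1
video = rng.normal(size=(8, S, D))            # a video clip: T = 8

image_tokens = perceiver_resampler(image, time_table[:1], latents, layers)
video_tokens = perceiver_resampler(video, time_table[:8], latents, layers)
print(image_tokens.shape, video_tokens.shape)  # (8, 16) (8, 16)
```

Whether the input is one frame or eight, the output has the same (R, D) shape, which is what lets the downstream cross-attention layers work with a fixed number of visual tokens.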
This article gives a detailed walk-through of the vision encoder part of the Flamingo architecture. The vision encoder has a unique design, the Perceiver Resampler, which originated from the Set Transformer and the Perceiver model and could minimize the cross-attention computation cost while leveraging information from both the spatial and temporal domains.
References
Dai et al., NVLM: Open Frontier-Class Multimodal LLMs. arXiv 2024.
Zhou et al., Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model. arXiv 2024.
Alayrac et al., Flamingo: a Visual Language Model for Few-Shot Learning. NeurIPS 2022.
Jaegle et al., Perceiver: General Perception with Iterative Attention. ICML 2021.
Brock et al., High-Performance Large-Scale Image Recognition Without Normalization. arXiv 2021.
Lee et al., Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks. ICML 2019.
Vaswani et al., Attention Is All You Need. NeurIPS 2017.
Amazon’s Prime Day sale includes special offers on the Samsung M8 monitor and the Apple Studio Display.
Samsung’s M8 32-inch display is now on sale for Amazon’s Prime Day.
Now is the time to buy a new monitor for your Mac or iPad Pro, since Amazon has two particularly special discounts running during its Prime Day sale. The best is surely the Samsung M8 32-inch 4K monitor, which is now priced even lower than during the sales timed to coincide with the September Apple Event.
Buying the standard iPad and this ZDNET-recommended case can save you big money compared to paying for a high-end iPad Air or Pro and the official Apple keyboard.
We’ve got the best Prime Big Deal Days headphone deals right here, including offers for pairs made by the top brands. You need to hurry if you want the savings.
The October Prime Day sale at Amazon kicked off with some excellent deals on TVs and there’s one in particular we wanted to call out: The Frame. Prime Day deals have brought Samsung’s set down to a new record low for the 55-inch model with an included set of bezels. The bundle is currently $978 after a huge, $668 price cut. If you’ve been thinking about a TV that looks more like art when you’re not watching — instead of a black mirror — this might be the time to dive in. Other sizes are on sale as well.
For the uninitiated, The Frame TV is one-part television and one-part artistic canvas. When it’s not being used to binge shows and movies, it can display art prints. This is the perfect box for those who want to watch TV once in a while but don’t want a giant contraption taking up the entire living room. Most visitors won’t even know it’s a TV unless it’s pointed out.
We’ve long sung the praises of Samsung’s The Frame TV. It boasts picture-frame edges and an ultra-thin bezel, to help with the illusion. The TV also mounts flat against the wall, so it can be placed just about anywhere. There isn’t even a large and ornery power cable. The TV connects via a thin wire that leads to an external receiver/port hub, which in turn goes to the power outlet. This wire is small enough to easily hide behind a plant or something, but it can also be dropped behind drywall and popped back out somewhere else.
As for TV specs, this is a 4K QLED panel with a 120Hz refresh rate in TV mode. This refresh rate drops to 60Hz when in canvas mode because, well, there’s not that much to refresh. The matte display also limits light reflection, enhancing screen visibility in both TV and canvas mode.
There are two caveats here. This sale is only for the 55-inch model, so the larger TVs will still break the bank. Also, the deal applies to just the version with the white bezel, which may not perfectly suit the aesthetics of every consumer.
Follow @EngadgetDeals on Twitter for the latest tech deals and buying advice, and stay tuned to Engadget.com for all of the best tech deals coming out of October Prime Day 2024.
This article originally appeared on Engadget at https://www.engadget.com/this-prime-day-samsung-frame-tv-deal-is-down-to-a-record-low-price-on-amazon-103016451.html?src=rss
Artificial intelligence is expected to have an impact on the upcoming US election in November. States have been trying to protect against misinformation by passing laws that require political advertisements to disclose when they have used generative AI. Twenty states now have rules on the books, and according to new research, voters have a negative reaction to seeing those disclaimers. That seems like a pretty fair response: If a politician uses generative AI to mislead voters, then voters don’t appreciate that. The study was conducted by New York University’s Center on Technology Policy and first reported by The Washington Post.
The investigation had a thousand participants watch political ads from fictional candidates. Some of the ads were accompanied by a disclaimer that AI was used in the creation of the spot, while others had no disclaimer. The presence of a disclaimer was linked to viewers rating the promoted candidate as less trustworthy and less appealing. Respondents also said they would be more likely to flag or report the ads on social media when they contained disclaimers. In attack ads, participants were more likely to express negative opinions about the candidate who sponsored the spot rather than the candidate being attacked. The researchers also found that the presence of an AI disclaimer led to worse or unchanged opinions regardless of the fictional candidate’s political party.
The researchers tested two different disclaimers inspired by two different state requirements for AI disclosure in political ads. The text tied to Michigan’s law reads: “This video has been manipulated by technical means and depicts speech or conduct that did not occur.” The other disclaimer is based on Florida’s law, and says: “This video was created in whole or in part with the use of generative artificial intelligence.” Although the approach of Michigan’s requirements is more common among state laws, study participants said they preferred seeing the broader disclaimer for any type of AI use.
While these disclaimers can play a part in transparency about the presence of AI in an ad, they aren’t a perfect failsafe. As many as 37 percent of the respondents said they didn’t recall seeing any language about AI after viewing the ads.
This article originally appeared on Engadget at https://www.engadget.com/ai/viewers-dont-trust-candidates-who-use-generative-ai-in-political-ads-study-finds-194532117.html?src=rss
Amazon’s October Prime Day sale kicked off today, bringing a wide range of discounts on gadgets and gear we recommend. We have a roundup with all of the offers worth your attention, but if you’re specifically looking to grab a new laptop, one of the event’s best Apple deals cuts the entry-level M2 MacBook Air down to $749. That’s $50 below the notebook’s usual street price in recent months, $250 less than buying from Apple directly and a record low for what we consider the best budget MacBook on the market.
In our initial M2 MacBook Air review, we were impressed by the laptop’s thinner design, gorgeous 13.6-inch display, great quad-speaker setup and the M2 chip’s excellent performance. It had been our top pick for the best MacBook, period, but the new M3 model has taken that top slot. However, the M2 Air doesn’t skimp — those on a budget (or anyone simply looking to save some cash) will still get a lot of laptop and a lot of power choosing this machine.
One could argue, and our Daniel Cooper did, that the best thing about the M3 MacBook Air was the price drop given to the M2 Air after its launch. The M3 chip is pretty similar to the M2, and while there’s no doubt that those who want the latest and greatest should get an M3 machine, an M2 laptop will be more than enough for most people using it as a daily driver. And, when you consider the M2 started at $1,200 when it first came out in 2022, it makes this discount even more compelling (it only received a price drop to $1,000 after the M3’s debut).
There are other discounts on the MacBook lineup at Amazon at the moment, too. The M3 MacBook Air is $250 off and down to $849, which is only $50 more than its record-low price. The 15-inch MacBook with an M3 chip is $255 off and on sale for $1,044.
This article originally appeared on Engadget at https://www.engadget.com/prime-day-laptop-deals-include-the-m2-macbook-air-for-a-record-low-of-749-on-amazon-121848050.html?src=rss
Uber has come up with a relatively low-cost way of getting to and from a New York City airport: a shuttle bus. Starting today, the company is offering rides between LaGuardia Airport and transit hubs in Manhattan for $18 a pop. For the first month of the service, Uber is offering half-price rides for $9, The Wall Street Journal reports.
This would be far cheaper than a cab for a solo traveler. It’s also more expensive, but perhaps less of a hassle, than taking public transit — there’s a free shuttle between the airport and the subway.
One route will take passengers between Penn Station and the airport, and the other will run between Port Authority, Grand Central Terminal and LaGuardia. If you’re Manhattan-bound, you’ll still need to make your way to your home, hotel or Airbnb after you get to the drop-off point.
The vans can transport 14 passengers at a time. The service will run between 5AM and 10:45PM ET every day with trips leaving every half hour or so. You can book a spot in a shuttle up to seven days in advance and bring a personal item and a 50-pound bag on board. Before you get on the van, you’ll need to show the driver a QR code and PIN that Uber sends you. An Uber shuttle-fleet partner called EPS is operating the rides, but the shuttles have Uber branding.
Uber shuttles have been available in various locations since 2019, but this is the first time the company is offering such trips to and from an airport. Earlier this year, Uber started running shuttles to and from concerts and sports games. It plans to offer shuttles to more airports in the coming months and years.
The company announced the service as part of its Go-Get Zero event, at which it highlighted some new sustainability efforts. Among those is a new EV-only option that will debut in 40 cities in which Uber has enough electric vehicle drivers available.
This article originally appeared on Engadget at https://www.engadget.com/transportation/uber-starts-offering-18-shuttle-rides-between-manhattan-and-laguardia-airport-193520618.html?src=rss
Now that October Prime Day is here, we’re finally seeing the full extent of the deals Amazon has in store. Luckily for any Lego maniacs, those discounts apply to many brick and minifigure-filled sets. Yes, it’s early to think about holiday shopping, but these kits make amazing presents for the young and full-grown adults alike. Because we are Engadget, we’re focusing on Lego sets from the Super Mario, Star Wars and Harry Potter lineups, some of which are on sale for up to 41 percent off. There are even steeper savings on general Lego sets as well. Here are the best Prime Day deals on Lego sets.
Prime Day deals on Star Wars Lego sets
On the Star Wars side of things, this Spider Tank set is 36 percent off and down to only $32, which is the lowest it’s ever been. It includes 526 pieces that replicate the spider tank from season three of The Mandalorian, plus three minifigures: Din Djarin, Bo-Katan Kryze and Grogu. Once built, the spider tank has grabbing claws, flexible legs and a little cockpit in which one of the figures can sit. Also on sale is this Boarding The Tantive IV set in which you recreate the iconic scene from Star Wars: A New Hope. That will set you back $44, which represents a 20-percent discount.
Prime Day deals on Mario Lego sets
In the Mario space, this Dixie Kong’s Jungle Jam expansion set has the biggest discount: 41 percent off and down to $16. It has 174 pieces along with buildable Dixie Kong and Squawks figures that both come with musical accessories. Mario fans who are old enough to have a work-from-home setup might appreciate this displayable Piranha Plant set that would look great in the background of any video conference call. It’s 20 percent off and down to $48.
Prime Day deals on Harry Potter Lego sets
Rounding things out with Harry Potter sets, this Hogwarts Castle and Grounds set is down to $136 and has never been cheaper. It includes 2,660 pieces that create a final product that’s over eight inches high, 13 inches wide and 10 inches deep. Plus, it comes with a cute, golden Hogwarts architect statue minifigure.
Prime Day deals on classic Lego sets
If you’re looking for more general Lego sets, two of the best deals we found are on the Classic Medium Creative Brick Box, down to $19, and the Lego City 2024 advent calendar, down to $26. The former includes 484 pieces in all different sizes and colors, and would make a great gift for anyone who just likes to build with Lego without following a set of instructions. As for the latter, you probably know someone who loves a good advent calendar this time of year, and this Lego one has 24 surprise gifts that include seasonal minifigures, mini builds and more.
This article originally appeared on Engadget at https://www.engadget.com/prime-day-lego-deals-are-up-to-41-percent-off-for-super-mario-and-star-wars-sets-143008653.html?src=rss