OPINION
An analysis of Meta’s open-source large model strategy
Training a large language model can cost millions of dollars. Why would Meta spend so much money training a model, only to let everyone use it for free?
This article analyzes Meta’s GenAI and large model strategy to understand the considerations of open-sourcing their large models. We also discuss how this wave of open-source models is similar to and different from traditional open-source software.
DISCLAIMER: Whether the Llama models are genuinely open-source falls outside the scope of this article. All information is from public sources.
The illusion of proprietary models
If Meta open-sources its models, wouldn’t people just build their own services instead of paying for the ones Meta provides (e.g., the Meta AI chatbot, an API based on Llama, or help with fine-tuning the model and serving it efficiently)?
Preventing people from building their own solutions by keeping your models proprietary is just an illusion. Regardless of whether you open-source your models, others, like Mistral AI, Alibaba, and even Google, have already open-sourced theirs.
For now, OpenAI, Anthropic, and Google have not open-sourced their largest and best models because they believe those models are still in a realm no open-source model can reach in capability and quality, and open-sourcing them would hurt their business.
Unless your model is orders of magnitude better than every open-source alternative, keeping it proprietary does not change the quality of the applications users can build on open-source models.
Your only choices are to be first and lead the open-source model space, or to follow by releasing your models later.
Why be the leader of open-source models?
Being the leader of open-source models has many benefits, but the most important is attracting talent.
The GenAI war is a talent competition bottlenecked by computing power. How much computing power you get largely depends on your cash flow and your relationship with Nvidia, with Google as the exception since it builds its own TPUs. How much talent you have, however, is another story.
According to Elon Musk, Google once had two-thirds of the AI talent, and OpenAI was founded to counter that concentration of power. Then, some of the best people left OpenAI and founded Anthropic to focus on AI safety. So, these three companies currently employ the best and the most AI experts on the market, and everyone else is hungry for more.
Being the leader of open-source models would help Meta close this talent gap. Open-source models attract talent in two different ways.
First, AI experts want to work for Meta. It is super cool to have the whole world use a model you built: it gives your work exposure, amplifies your professional impact, and benefits your future career. So, many talented people would like to join.
Second, the AI experts in the community do work for Meta for free. Right after the release of Llama, people started experimenting with it. They develop new serving technologies to reduce costs, fine-tune the models to discover new applications, and scrutinize them for vulnerabilities to make them safer. For example, according to this article, the community did instruction tuning, quantization, quality improvements, human evals, multimodality, and RLHF for Llama within a month of its initial release. Delegating this work to the community saves Meta huge amounts of computing and human resources.
Iterate fast with the community.
With open-source models, Meta can iterate quickly with the community by directly incorporating their newly developed methods.
How much would it cost Google to adopt a new method from the community? The process consists of two phases: implementation and evaluation. First, they need to reimplement the method for Gemini, which means rewriting the code in JAX and requires a fair amount of engineering resources. Then, during evaluation, they need to run a suite of benchmarks, which requires a lot of computing power. Most importantly, it takes time, which keeps them from iterating on the latest techniques as soon as they become available.
Conversely, if Meta wants to adopt a new method from the community, it costs them almost nothing. The community has already run the experiments and benchmarks on the Llama models directly, so not much further evaluation is needed. The code is written in PyTorch, so they can practically copy and paste it into their own system.
Llama builds a flywheel between Meta and the community: Meta brings in the latest techniques from the community and rolls out its next-generation models back to the community. PyTorch is the common language they speak.
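As a rough illustration of that common language, here is a minimal sketch of how community work on Llama can be picked up in a few lines of PyTorch-based code, assuming the Hugging Face transformers and peft libraries; the checkpoint and adapter ids are illustrative placeholders, not specific released artifacts.

```python
# Minimal sketch: pick up a community-trained LoRA adapter for Llama and fold it
# back into the base weights. Ids below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"         # base checkpoint (gated; access required)
adapter_id = "community/llama-3-lora-example"  # hypothetical community adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

model = PeftModel.from_pretrained(base, adapter_id)  # attach the community adapter
model = model.merge_and_unload()                     # merge it into the base weights
```

Because the community's work already targets the same weights and the same framework, "adopting" it really can be this mechanical.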
Can they still make money?
The model is open-source. Wouldn’t people just build their own service? Why would they pay Meta for a service built on an open-source model? Of course, many still will: the service is difficult to build even with an open-source model.
How do you fine-tune and align the model for your specific application? How do you balance service cost against model quality? Do you know all the tricks for fully utilizing your GPUs? (One such cost-versus-quality trick is sketched below.)
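To make the cost-versus-quality trade-off concrete, here is a minimal sketch, assuming Hugging Face transformers with bitsandbytes, of serving a Llama checkpoint in 4-bit precision: GPU memory drops several-fold at some cost in output quality. The model id is an illustrative placeholder.

```python
# Minimal sketch: load a Llama checkpoint in 4-bit precision to trade a little
# quality for a much smaller GPU-memory footprint. The model id is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated checkpoint; access required

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # store weights as 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 to preserve quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                       # spread layers across available GPUs
)

prompt = "Explain why a company might open-source its model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```

Knowing when a shortcut like this is acceptable for a given application is exactly the expertise the next paragraph is about.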
The people who know the answers to these questions are expensive to hire. Even with enough people, it is hard to get the computing power to fine-tune and serve the model. Imagine how hard it is to build Meta AI on top of the open-source Llama model; I would expect it to involve hundreds of employees and a large fleet of GPUs.
So, people will likely still pay for Meta’s GenAI services, if Meta offers any in the future.
It’s just like open-source software, but not quite.
The situation is very similar to traditional open-source software. The “free code, paid service” framework still applies: the code or the model is free in order to attract more users to the ecosystem; the larger the ecosystem, the more benefits the owner collects; and the service built on top of the free code is where the profit is.
However, it is also NOT like open-source software. The main differences can be summarized as lower user retention and a new type of ecosystem.
Low user retention
Open-source models have lower user retention because migrating to a new model is much easier than migrating to new software.
It is hard to migrate software. PyTorch and HuggingFace have established a strong ecosystem for deep learning frameworks and model pools. Imagine how hard it would be to shift their dominance even slightly if you created a new deep learning framework or model pool to compete with them.
A good example is JAX. It has better support for large-scale distributed training, but onboarding users is hard because its ecosystem and community are smaller, so there are fewer people to help when you run into issues. Moreover, the engineering cost of migrating an entire infrastructure to a new framework is too high for most companies.
Open-source models do not have these problems. They are easy to migrate to and require almost no user support. Therefore, it is easy for people to switch to the latest and best models. To maintain your leadership in open-source models, you must constantly release new models at the top of the leaderboards. This is also the main downside, or at least the main challenge, of being the leader in open-source models.
A new type of ecosystem
Open-source models create a new type of ecosystem. Unlike open-source software, which creates ecosystems of contributors and new software built upon them, open-source models create ecosystems of fine-tuned and quantized models, which can be seen as forks of the original model.
As a result, an open-source foundational model doesn’t have to be excellent at every specific task, because users will fine-tune it for their applications with domain-specific data. The most important feature of a foundational model is meeting users’ deployment requirements, such as low inference latency or being small enough to fit on an end device.
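As a rough sketch of what such a fork looks like in practice, assuming the Hugging Face peft library, a user might attach low-rank (LoRA) adapters to a base Llama checkpoint and train only those on domain-specific data; the checkpoint id and output path are illustrative.

```python
# Minimal sketch: create a lightweight "fork" of a base model by training LoRA
# adapters on domain-specific data. Only the small adapter matrices are trained
# and saved; they re-attach to the shared base model at load time.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Meta-Llama-3-8B"    # illustrative base checkpoint

model = AutoModelForCausalLM.from_pretrained(base_id)

lora = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # typically well under 1% of all weights

# ...run a standard fine-tuning loop on your domain-specific data, then:
model.save_pretrained("llama-3-domain-lora")  # saves only the adapter, not the base
```

The adapter is small enough to share freely, which is why these forks multiply so quickly around a popular base model.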
This is why Llama comes in multiple sizes for each version. Llama-3, for example, has three sizes: 8B, 70B, and 400B. They want to ensure they cover all deployment scenarios.
Summary
Even if Meta did not open-source their model, others would open-source theirs. So, it is wise for Meta to open-source early and lead the open-source model space. Meta can then iterate quickly with the community to improve its models and catch up with OpenAI and Google.
When open-sourcing your model, there is no need to worry about people abandoning your service, since there is still a huge gap between a foundational model and a well-built service.
Open-source models are similar to open-source software in that both follow the “free code, paid service” framework, but they differ in user retention and in the type of ecosystem they create.
In the future, I would expect to see more open-source models from more companies. Unlike deep learning frameworks, which have converged on PyTorch, open-source models will remain diverse and competitive for a long time.