February 27, 2024
Gert Jan Spriensma

Is AI really too expensive for your business case?

Many people know about major players like OpenAI, which release cutting-edge models, but running these models often comes with high costs. Meanwhile, an increasing variety of other models is emerging, which makes the expanding landscape confusing to navigate.

With Google's recent introduction of the Gemini Ultra model and the subsequent release of its Gemma models as open models, we see a need for some clarity in this rapidly evolving field. We also want to connect it to the discussion about operational costs that we often have in our work.

3-level product paradigm

In the evolving landscape of large language models (LLMs), we're seeing a three-tier product strategy. The most advanced models, such as GPT-4 and Gemini Ultra, are built for advanced reasoning. They are impressively capable but require significant computational resources and come at a higher cost, which makes them viable mainly for scenarios that generate high revenue.

In the mid-tier, LLMs with around 7 billion parameters cater to business applications and some consumer-level uses on high-end PCs. An example is the integration of a 7B model into virtual assistant software, which enables more intuitive interaction and smarter responses for tasks such as scheduling and email management directly from a user's desktop. Gemma 7B offers comparable functionality for desktop development, giving developers more flexibility.

At the entry level, models with 2 billion parameters or fewer are tailored for consumer devices like phones. Running LLMs directly on PCs or phones allows for local data processing, so data doesn't have to be sent offsite for analysis. Models can be adapted on-site, tap into local resources, or connect with other applications securely, without compromising sensitive information.
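
To make the link between parameter count and target hardware more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that memory use is dominated by the model weights and uses commonly quoted sizes of 2 bytes per parameter for 16-bit weights, 1 byte for 8-bit and 0.5 bytes for 4-bit quantization. The function name and precision labels are our own illustration, and real deployments need extra memory for activations and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: only the weights are counted; activations, caches and
# runtime overhead are ignored, so real requirements are higher.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision}")
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


if __name__ == "__main__":
    # The three tiers discussed in this article: 2B, 7B and 70B parameters.
    for size in (2, 7, 70):
        row = ", ".join(
            f"{p}: {weight_memory_gb(size, p):.1f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{size}B parameters -> {row}")
```

Under these assumptions, a 7B model with 16-bit weights needs about 14 GB for the weights alone, while a 2B model quantized to 4 bits fits in about 1 GB, which is roughly why the smaller tiers are the ones aimed at PCs and phones.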

Large models

These get the most attention from the media, as they are very powerful models with 70B parameters or more, which makes them expensive both to train and to run. Some examples are:

  • GPT-4 from OpenAI
  • Claude 2 from Anthropic
  • Gemini Ultra from Google
  • Llama 2 from Meta (Facebook)

There are more, but these are the best known. Meta's Llama 2 is particularly interesting because it is an openly released model that many others build on. A remarkable project that we came across last week is Groq, which runs these large models at impressive speed. It is a hardware project that also provides access through an API. You can test it on their website, where it runs at 300 tokens per second. This is possible because inference (the technical term for generating responses or predictions) is much more efficient on their hardware and requires less compute, which means the cost to host a model is lower as well.
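
To get a feel for what 300 tokens per second means in practice, the small Python sketch below converts a throughput figure into waiting time for a response. The response lengths are hypothetical examples of our own, and the calculation ignores network latency and the time before the first token arrives.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response at a constant throughput.

    Simplification: ignores network latency, queueing and prompt processing.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    THROUGHPUT = 300.0  # tokens per second, the figure quoted in this article
    # Hypothetical response lengths: a short answer, an email, a long report.
    for tokens in (100, 500, 2000):
        seconds = generation_seconds(tokens, THROUGHPUT)
        print(f"{tokens} tokens -> {seconds:.2f} s")
```

At that rate, even a response of a few hundred tokens is ready in a second or two, which matters for interactive applications such as chat or customer service.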

Smaller models

There are hundreds of LLMs in the 7B and 2B classes. The new Gemma models were benchmarked against similar models in the same category, and on math and programming tasks in particular they score significantly better than models of similar size. Still, benchmark scores for coding and math are well below human level.

For other tasks, they score similarly to competitors like Mistral 7B, and that is still well below the performance of the large models. So why would you even consider using a model like that?

Well, this brings us back to why we were so excited about the Groq project: speed and cost. For many applications, 7B models are more than sufficient, especially when they are trained or fine-tuned for one specific task. This makes them particularly attractive for businesses that need reliable AI capabilities without a hefty price tag. By optimizing for these factors, 7B models can provide a practical solution for a wide range of applications, from automated customer service to content creation, making them a cost-effective choice for companies looking to integrate AI into their operations.

Moreover, the introduction of models like Gemma highlights the ongoing innovation within the field of AI. As these models become more sophisticated, their ability to handle complex tasks improves, bridging the gap toward human-level performance in specific domains. This evolution opens up new possibilities for utilizing AI in areas where accuracy and efficiency are critical, without necessitating the resources required for larger models. In essence, the development of these models represents a significant step forward in making AI more accessible and applicable across various sectors, enabling businesses to leverage the benefits of AI technology in a more scalable and economical manner.

What does it mean?

Lately, operational costs have been a big topic in many of our discussions, and rightly so. Yet, looking at the developments of just the last year, we feel it is safe to say that costs will decrease significantly for most applications. This trend opens up new opportunities for businesses to leverage AI for a broader range of applications without the prohibitive costs that may be a barrier today.

However, to truly capitalize on these impending cost reductions, businesses need to start preparing now. This preparation involves investing in the right technologies and skill sets and developing a strategic plan that incorporates AI into the business model in a way that aligns with future capabilities. It also means staying informed about technological advancements and understanding how they can be applied to solve real-world business challenges.

Strategically, this could involve identifying key areas of the business that stand to benefit the most from AI integration, such as customer service, data analysis, or operational efficiency. By beginning to experiment with AI solutions now, businesses can develop a clearer understanding of their potential impact and the operational changes needed to support them. And things can move fast. For example, Groq now offers 1M tokens on the 70B model for just $0.70, compared to roughly $1.20 for GPT-3.5 (ChatGPT), which is a 42% cost reduction. The 7B model at Groq is priced at just $0.10 per 1M tokens, which is only about 8% of the GPT-3.5 cost.
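
The percentages above follow directly from the per-1M-token prices. The short Python sketch below reproduces that arithmetic and applies it to a monthly volume. The prices are the ones quoted in this article, while the dictionary keys, function names and the 50M-token monthly volume are hypothetical illustrations of our own.

```python
# Prices in USD per 1M tokens, as quoted in this article (early 2024).
PRICE_PER_1M = {
    "groq_70b": 0.70,
    "groq_7b": 0.10,
    "gpt_3_5": 1.20,  # approximate figure
}


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing a number of tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


def reduction_pct(new_price: float, baseline_price: float) -> float:
    """Relative saving of new_price compared to baseline_price, in percent."""
    return (baseline_price - new_price) / baseline_price * 100


def share_pct(new_price: float, baseline_price: float) -> float:
    """new_price expressed as a percentage of baseline_price."""
    return new_price / baseline_price * 100


if __name__ == "__main__":
    base = PRICE_PER_1M["gpt_3_5"]
    print(f"70B saving: {reduction_pct(PRICE_PER_1M['groq_70b'], base):.0f}%")
    print(f"7B share of baseline: {share_pct(PRICE_PER_1M['groq_7b'], base):.0f}%")

    monthly_tokens = 50_000_000  # hypothetical workload, not from the article
    for name, price in PRICE_PER_1M.items():
        print(f"{name}: ${cost_usd(monthly_tokens, price):.2f} per month")
```

Plugging in your own expected token volume gives a quick first estimate of the monthly bill per model, which is often enough to decide whether a smaller model is worth testing for a given use case.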

We think these are very exciting developments that make AI accessible for so many companies already today.
