Use small language models to deploy AI cost-effectively

AI is disrupting the technology industry. There is a lot of talk about artificial general intelligence (AGI) and its potential to replace humans. But whether that future is a decade or a year away, many teams need help getting the most out of AI right now.

Only a handful of companies maintain the well-known LLMs – GPT, Claude, Bard, LaMDA, Llama, and so on – because the resources required to train them on enormous data sets are prohibitively expensive.

These models are just the beginning. They provide an incredible platform for building a more effective and tailored solution: small language models (SLMs) trained on your specific data.

What makes an SLM small?

In short, the number of parameters. To understand the value of an SLM for real-world applications, you first need a sense of an LLM’s scale. OpenAI’s GPT-3 has 175B parameters, and Meta’s Llama 3.1 comes in a version with 405B. But what does that mean?

LLMs interpret and generate human language using the Transformer architecture, which tokenizes data and then analyzes it with parameters. If you’ve done any reading, you’ll probably notice that “token” and “parameter” are sometimes used interchangeably, but they are different things.

Tokens are the discrete units of data an LLM works with. Depending on the model, a token can be a word, part of a word, a character, or a short phrase. Tokenization lets the LLM segment data and evaluate it efficiently; for example, a model may break “cats” into a root (“cat”) plus a suffix, relating it to the same underlying word.
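To make tokenization concrete, here is a minimal sketch using OpenAI’s open-source tiktoken library. It is one tokenizer among many – Llama, Claude, and other model families ship their own – so treat it purely as an illustration:

```python
# pip install tiktoken -- OpenAI's open-source tokenizer, used here only to
# illustrate the concept; other models use different tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("The cats are napping.")
print(tokens)                             # a list of integer token IDs
print([enc.decode([t]) for t in tokens])  # the text each token covers --
                                          # tokens may be whole words or
                                          # sub-word pieces, not characters
```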

Simply put, parameters are the rules – the weights and biases – that the LLM uses to evaluate data. Parameters let the LLM give certain words more emphasis to establish context and meaning. Parameters also link words: in a sentence like “The future is bright because it’s powered by data,” they connect “it’s” back to “future.”

You’re probably wondering, “Are more parameters better?” Well, like everything in engineering, it depends. If you need to hang a picture on the wall, is any tool at Home Depot better than a hammer and nail?

LLMs are incredible technological achievements, and their ability to process massive amounts of information is improving every day. However, the cost and time required to train and fine-tune an LLM put it out of reach for most companies. Most companies don’t need an all-in-one tool; they need a specialized tool for a specific task.

This is where SLMs shine.

Training the model using your data

While training an LLM requires massive cloud resources, training an SLM on your own proprietary data is computationally and financially efficient.

Imagine you are a government contractor responding to requests for proposals (RFPs) to win contracts. Typically, a team reviews each RFP, manually collects the information needed to respond, answers detailed questions about how the company will meet the contract’s requirements, and writes a complete proposal, including the required job descriptions and the appropriate government codes for those positions.

The RFPs are never publicly disclosed, meaning an LLM cannot be trained on them. Additionally, the hundreds, if not thousands, of proposals your company has written are copyrighted.

Imagine if you could train an SLM on all of this proprietary data and have it draft detailed proposals on your behalf. Think of how much time your team would save. You can do this by starting with a base model like Llama 3.1 and fine-tuning the SLM on previous RFPs and the corresponding proposals. You can also use a tool like Arcee.AI.

In both cases, to get the most out of your SLM, you should perform five key steps: 1/ continuous pre-training, 2/ alignment, 3/ model merging, 4/ retrieval-augmented generation (RAG), and 5/ continuous adaptation.

Understanding the steps to train an SLM

Imagine our small language model is Dominique, a sophomore in high school. Pre-training covers everything Dominique has learned in previous years – math, science, languages, sports, art – everything. Model merging pairs Dominique, who excels in math, with Asma, who excels in science, and has them study and take tests together for the rest of the year. Each started the year excelling in one subject; together they end up excelling in both.

In terms of alignment and fine-tuning, instruction alignment (the first part of alignment) is the classroom instruction Dominique receives during her sophomore year. The critique phase (the second part of alignment) is the feedback Dominique gets on her homework. RAG is like giving Dominique an open-book test; she can look up relevant information that helps her earn a better grade. Finally, continuous adaptation updates Dominique’s knowledge when information changes (e.g., Pluto is no longer a planet) so she always works from current information.

Implementing your model

In the government contractor example, the goal is an SLM for writing bids. The developers would take an open-source model like Llama in one of its smaller versions (8B or 70B parameters) and train it on their proprietary data: previous bids, previous RFPs, and any other relevant text.
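If you are building this step yourself, parameter-efficient fine-tuning is what keeps it affordable. Below is a minimal sketch assuming the Hugging Face transformers, peft, and datasets packages, access to the gated meta-llama/Llama-3.1-8B weights, and past proposals collected into a proposals.txt file; the file name and hyperparameters are illustrative, not a tested recipe:

```python
# A minimal LoRA fine-tuning sketch; everything here (model choice, file
# names, hyperparameters) is illustrative, not a production recipe.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-3.1-8B"  # gated; requires access approval
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# LoRA trains small adapter matrices instead of all 8B weights, which is
# what makes fine-tuning an SLM computationally affordable.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
))

# proposals.txt: one passage of past-proposal text per line (assumed file).
dataset = load_dataset("text", data_files="proposals.txt")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-proposals",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```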

This model can then be merged using an open-source tool, either with a more general model that is strong at language or with another domain-specific model. For example, if one model specializes in writing proposals for the army (with the right technical jargon and terms) and another specializes in writing proposals for building missiles, they can be merged to write highly specialized, precise proposals for building army missiles. Remember that models can only be merged if they share the same architecture and size.
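The simplest merging strategy is a linear average of the two models’ weights. The sketch below illustrates the idea and why the architectures must match exactly; the checkpoint paths are hypothetical, and real tools such as mergekit (Arcee’s open-source merger) implement more sophisticated strategies like SLERP and TIES:

```python
# A simplified illustration of linear weight merging ("model souping");
# checkpoint file names are hypothetical.
import torch

def linear_merge(state_a, state_b, alpha=0.5):
    """Average two state dicts; only valid when architectures match exactly."""
    merged = {}
    for name, a in state_a.items():
        b = state_b[name]  # a KeyError here means the architectures differ
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch at {name}: {a.shape} vs {b.shape}")
        merged[name] = alpha * a + (1 - alpha) * b
    return merged

army_writer = torch.load("army-proposals.pt")        # hypothetical checkpoints
missile_writer = torch.load("missile-proposals.pt")  # sharing one base model
torch.save(linear_merge(army_writer, missile_writer), "army-missile.pt")
```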

From there, they fine-tune and align the newly merged model to make sure it produces the desired results. That means providing examples of what good output looks like and interacting with the model to test whether it generates the right kind of content.
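Concretely, those examples often take the form of instruction/response pairs fed to a supervised fine-tuning pass. A hedged sketch, where the JSONL layout, field names, and sample content are a common convention rather than a required schema:

```python
# Illustrative instruction/response pairs for alignment; the field names and
# file path are conventions assumed for this example, not a fixed format.
import json

examples = [
    {"instruction": "Draft the staffing section for an RFP requiring "
                    "a cleared systems engineer.",
     "response": "Our proposed Senior Systems Engineer holds an active "
                 "clearance and ..."},
    {"instruction": "Summarize our past performance on network-modernization "
                    "contracts.",
     "response": "Over the past five years, we delivered three "
                 "network-modernization programs ..."},
]

with open("alignment_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```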

Although tools like Arcee.AI can achieve the same results without RAG, if you’re building from scratch, you can use a RAG layer to precisely retrieve specific information and generate more accurate text or retrieve data in real time. For example, government job titles would be excellent data to store in the RAG layer.
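A minimal sketch of such a RAG layer, assuming the sentence-transformers package; the embedding model and the job-title snippets are illustrative stand-ins for a real catalog. Retrieved text is prepended to the prompt so the SLM generates against exact, current reference data:

```python
# A minimal retrieval sketch: embed documents once, then fetch the closest
# matches for each query by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# In practice, this would be the full government job-title catalog.
documents = [
    "Program Manager, Level III",
    "Senior Systems Engineer",
    "Information Systems Security Officer",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query, k=2):
    """Return the k documents most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # cosine similarity (vectors are normalized)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("Which role oversees security compliance?")
prompt = "Reference job titles:\n" + "\n".join(context) + "\n\nDraft the staffing plan..."
```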

Finally, just like people, an SLM is constantly evolving and learning. Once deployed, models can be updated as business data and requirements change. Depending on the frequency of your new data, plan to retrain your models every six to twelve months.

Making the most of AI

LLMs only take you so far and don’t offer any real market differentiation. After all, you’re using the same model as everyone else, trained on the same general information collected from (usually public) sources.

SLMs are a much more cost-effective approach, letting companies adapt models to their proprietary data in secure environments. They are also kinder to the planet, using significantly fewer compute resources. The responsiveness and adaptability SLMs offer is unmatched among current generative AI technology; they are the most practical way to use generative AI to improve your business.
