Generative AI has massive implications for business leaders, and many companies have already gone live with generative AI initiatives. In some cases, companies are building custom generative AI applications by fine-tuning models with proprietary data.
The benefits businesses can realize by using generative AI include:
- Expanding employee productivity
- Personalizing customer experience
- Accelerating R&D through generative design
- Enabling new business models
Generative AI tools will create new services and solutions. Technology leaders can leverage them to enter new markets, gaining market share at the expense of incumbents. Generative AI models, also called generative models, will bring new automation opportunities with the potential to increase customer satisfaction or reduce costs. While most firms may not need to build their own models, most large enterprises are expected to build or optimize one or more generative AI models specific to their business requirements within the next few years. Fine-tuning will enable businesses to customize model output in detail for their own domain, achieving higher levels of accuracy. Firms like Bloomberg are achieving world-class performance by building their own generative AI tools that leverage internal data.
At a minimum, an enterprise generative AI model should be:
- Consistent – Most current LLMs can provide different outputs for the same input. This limits the reproducibility of testing, which can lead to releasing models that are not sufficiently tested (see the decoding sketch after this list).
- Controlled – Hosted in an environment (on-premises or cloud) where the enterprise can control the model at a granular level. The alternative is using online chat interfaces or APIs such as OpenAI’s LLM APIs. The disadvantage of relying on APIs is that the user may need to expose confidential proprietary data to the API owner, which increases the attack surface for proprietary data. Global leaders like Amazon and Samsung experienced leaks of internal documents and valuable source code when their employees used ChatGPT.
- Copyrighted – The model should be trained on ethically sourced data where the intellectual property (IP) belongs to the enterprise or its suppliers, and where personal data is used with consent. Generative AI IP issues, such as training data that includes copyrighted content whose copyright does not belong to the model owner, can lead to unusable models and legal proceedings. Use of personal information in training data can lead to compliance issues.
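A minimal sketch of the consistency point above, assuming the Hugging Face transformers library; the model name and prompt are illustrative placeholders, not recommendations. Fixing the random seed and using greedy decoding removes sampling randomness, so repeated test runs produce the same output for the same input:

```python
# Minimal sketch: reproducible generation for test runs.
# Assumes Hugging Face transformers; model name and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

MODEL_NAME = "databricks/dolly-v2-3b"  # placeholder open model

set_seed(42)  # fix random seeds for reproducibility
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer("Summarize our Q2 revenue drivers.", return_tensors="pt")
# Greedy decoding (do_sample=False) removes sampling randomness, so the
# same input reliably yields the same output across test runs.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```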
How can Enterprises build Foundation Models?
There are three approaches to building your firm’s LLM infrastructure in a controlled environment:
- Build Your Own Model (BYOM) – Allows world-class performance at a cost of a few million dollars, covering compute (1.3 million GPU hours on 40 GB A100 GPUs in the case of BloombergGPT) and data science team costs.
- Fine-tuning – A cheaper machine learning technique for improving the performance of pre-trained large language models (LLMs) using selected datasets. Instruction fine-tuning was previously done with large datasets, but it can now be achieved with a small one (e.g., 1,000 curated prompts and responses in the case of LIMA); see the sketch after this list. Early commercial LLM fine-tuning experiments highlight the importance of a robust data collection approach that optimizes data quality and quantity. Compute costs in research papers have been as low as $100 while achieving close to world-class performance. Model fine-tuning is an emerging domain, with new approaches such as Inference-Time Intervention (ITI), a technique to reduce model hallucinations, being published every week.
- Reinforcement Learning from Human Feedback (RLHF) – A fine-tuned model can be further improved by human-in-the-loop assessment.
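The sketch below illustrates the fine-tuning approach with parameter-efficient LoRA adapters on a small curated prompt/response set, in the spirit of LIMA-style instruction tuning. It assumes the transformers, peft, and datasets libraries; the base model, dataset, and hyperparameters are illustrative assumptions, not prescriptions:

```python
# Minimal sketch: parameter-efficient instruction fine-tuning (LoRA) on a
# small curated prompt/response set. Base model, data, and hyperparameters
# are illustrative only.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "EleutherAI/pythia-1.4b"  # placeholder commercially licensed base

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA trains small adapter matrices instead of all weights, keeping cost low.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                                         target_modules=["query_key_value"],
                                         task_type="CAUSAL_LM"))

# A tiny illustrative dataset; in practice this would be ~1,000 curated pairs.
pairs = [{"text": "Instruction: Summarize the travel policy.\nResponse: ..."}]
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)
train_ds = Dataset.from_list(pairs).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=1e-4),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```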
Given the high costs involved in BYOM, we recommend that businesses initially use optimized versions of existing models. Language model optimization is an emerging domain with new approaches being developed on a weekly basis. Therefore, businesses should be open to experimentation and be ready to change their approach.
Which Models should Enterprises use to train Cost-Effective Foundation Models?
Machine learning platforms have released foundation models with commercial licenses, relying mostly on public internet text as the primary data source. These models can be used as base models to build enterprise large language models:
- BLOOM by Hugging Face, released under the RAIL license, which only restricts potentially harmful uses.
- Falcon LLM, developed by the Technology Innovation Institute (TII) in Abu Dhabi, comes with a commercial license and leads Hugging Face’s LLM benchmark as of May 2023.
- Dolly 2.0, instruction-tuned by Databricks based on EleutherAI’s Pythia model family.
- Open-source RWKV-4 “Raven” models.
- EleutherAI models.
What is the right Tech Stack for Building Large Language Models?
Generative AI is an artificial intelligence technology, and large businesses have been building AI solutions for the past decade. Experience has shown that leveraging Machine Learning Operations (MLOps) platforms significantly accelerates model development efforts. In addition to their MLOps platforms, enterprise organizations can rely on a growing list of Large Language Model Operations (LLMOps) tools and frameworks such as LangChain, Semantic Kernel, or watsonx.ai to customize and build their models, as well as AI risk management tools like NeMo Guardrails. In the early days of new technologies, we recommend that executives prioritize open platforms to build future-proof systems. With emerging technologies, vendor lock-in is an important risk: businesses can get stuck with outdated systems as rapid and seismic technology changes take place. Finally, a firm’s data infrastructure is among the most important underlying technologies for generative AI.
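One practical way to limit vendor lock-in is to keep a thin, provider-agnostic interface between application code and whichever LLMOps framework or model backend is in use. The sketch below is illustrative only; the class and function names are hypothetical and are not taken from any of the tools named above:

```python
# Minimal sketch: a provider-agnostic text-generation interface so the
# underlying model (self-hosted or vendor API) can be swapped without
# touching application code. All names here are hypothetical illustrations.
from typing import Protocol


class TextGenerator(Protocol):
    def generate(self, prompt: str) -> str:
        ...


class LocalHFGenerator:
    """Backend backed by a self-hosted Hugging Face model (assumed available)."""
    def __init__(self, model_name: str):
        from transformers import pipeline
        self._pipe = pipeline("text-generation", model=model_name)

    def generate(self, prompt: str) -> str:
        return self._pipe(prompt, max_new_tokens=100)[0]["generated_text"]


def answer_customer_query(generator: TextGenerator, query: str) -> str:
    # Application code depends only on the TextGenerator interface, so a
    # different backend can be plugged in without rewriting business logic.
    prompt = f"Answer the customer question concisely.\nQuestion: {query}\nAnswer:"
    return generator.generate(prompt)
```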
How to Evaluate Large Models’ Performance?
Without measuring effectiveness, the value of generative AI efforts cannot be quantified. However, LLM evaluation is a difficult problem due to issues in benchmark datasets, inconsistency of human reviews, and other factors. We recommend an iterative approach that increases investment in evaluation as models get closer to production:
- Use benchmark test scores to prepare shortlists. These are publicly available for a large number of open-source models.
- Rely on Elo scores, which are used to rank players in zero-sum games like chess, to compare the shortlisted models (a minimal rating sketch follows this list). If there are higher-performing models that are not available for use (e.g., due to licensing or data security issues), they can be used to judge the responses of different models. If such models are not available, domain experts can compare the accuracy of different models.
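As a rough illustration of the Elo-based comparison described above, the sketch below updates ratings from pairwise preferences between candidate models, whether those judgments come from domain experts or a stronger reference model. The model names, K-factor, and starting rating are conventional placeholders, not prescriptions:

```python
# Minimal sketch: Elo ratings from pairwise model comparisons.
# Each comparison records the outcome between two candidate models.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(ratings, model_a, model_b, result_a, k=32):
    """result_a: 1.0 if A wins, 0.0 if B wins, 0.5 for a draw."""
    exp_a = expected_score(ratings[model_a], ratings[model_b])
    ratings[model_a] += k * (result_a - exp_a)
    ratings[model_b] += k * ((1.0 - result_a) - (1.0 - exp_a))


# Illustrative pairwise judgments (e.g., from domain experts or a stronger
# reference model scoring responses to the same prompts).
comparisons = [("model-x", "model-y", 1.0),
               ("model-x", "model-z", 0.5),
               ("model-y", "model-z", 0.0)]

ratings = {"model-x": 1000.0, "model-y": 1000.0, "model-z": 1000.0}
for a, b, result in comparisons:
    update_elo(ratings, a, b, result)

print(sorted(ratings.items(), key=lambda kv: kv[1], reverse=True))
```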
Aligning Gen AI investments to Business Outcomes
The AI value framework is grounded in three key business outcomes: improving operational efficiency, enhancing experiences, and accelerating business transformation.
- Improved Operational Efficiencies: The ability of an organization to effectively and efficiently utilize its resources (time, people, capital) to achieve its goals.
- Enhanced Experiences: Delivering delightful, optimal experiences for your employees, partners, and customers.
- Accelerated Business Transformation: Fundamentally changing how your organization operates to improve its performance and competitiveness.