
The Gen AI Dilemma:

Balancing Probabilities and Precision in Enterprise Integration

[Hero image]

Generated by ChatGPT: Imperfections Included for Authenticity

Introduction

In today’s tech landscape, nearly every product comes with a “.ai” tag, proudly showcasing the integration of Gen AI features. But behind the shiny demos and flashy features lies a harsh reality: while these innovations promise to revolutionise, their journey from concept to production is often plagued by significant delays. Worse yet, products rushed to market frequently carry technical debt that can harm both business performance and enterprise reputation. So, why does this happen?

The root of the problem lies in the core nature of Gen AI—it’s based on probabilities. At its heart, a Large Language Model (LLM) roughly predicts the next word based on the previous ones (imagine a more advanced Markov chain that conditions on all previous words and additional context, not just the last state). This probabilistic approach works well for creative tasks where flexibility is key. But in the realm of enterprise software, where accuracy and consistency are critical, it often clashes with the deterministic requirements of business systems.

In this blog, we’ll explore strategies to bridge this gap, ensuring that businesses can harness the power of Gen AI without sacrificing the reliability they depend on. We’ll delve into the critical role of data, the importance of domain knowledge, the nuances of prompt engineering, and the practical applications of OpenAI's latest features.

Gen AI is Not Magic

It’s easy to get caught up in the hype surrounding Gen AI, but it’s crucial to remember that Gen AI is not a magical solution to all problems. Enterprises often fall into the trap of thinking that merely integrating a powerful language model will lead to flawless outcomes. However, relying solely on Gen AI without a clear understanding of its limitations is akin to praying to LLM gods for miracles.

Fun fact: The hero image used for this blog was generated using ChatGPT, and the imperfections in it are included for authenticity. You can find more information on this in the Appendix section of this blog.

Imagine an e-commerce company deploying a Gen AI-based recommendation engine. Simply feeding the model user preference data and expecting perfect suggestions is unrealistic. Without a carefully thought-out approach, the recommendations may be generic or even irrelevant, leading to a poor user experience.

The reality is that Gen AI should be viewed as a tool—an incredibly powerful one—but a tool nonetheless. It requires careful consideration of how it is applied, a thorough understanding of its strengths and weaknesses, and, most importantly, a strategy for integrating it into existing workflows.

    import os
    import openai

    # Set your OpenAI API key (assumed to come from an environment variable)
    openai.api_key = os.environ["OPENAI_API_KEY"]

    def generate_recommendations(user_preferences):
        # Basic prompt for generating recommendations
        prompt_text = f"Generate product recommendations for user preferences: {user_preferences}"

        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",  # You can use 'gpt-4' or other available models
            messages=[
                {"role": "system", "content": "You are an ecommerce website assistant."},
                {"role": "user", "content": prompt_text}
            ],
            max_tokens=50,  # Limit to 50 tokens
            temperature=0.7  # Adjust temperature for creativity vs. determinism
        )

        # Extract and return the recommendations
        recommendations = response.choices[0].message.content
        return recommendations

    # Example user preferences (raw data)
    user_preferences = "likes electronics, specifically smartphones and laptops"
    recommendations = generate_recommendations(user_preferences)
    print(f"User Preferences: {user_preferences}")
    print(f"Recommendations: {recommendations}")

In this code snippet, we see a basic approach to generating product recommendations using Gen AI. However, this simplistic method doesn’t account for the nuances of the customer’s preferences, purchase history, or other relevant factors. This is where a deeper understanding and careful tuning come into play, as we’ll explore further in the blog. For instance, relying on existing engineered and ML-based recommendation systems to get relevant factors and feeding them into the LLM prompt can yield more relevant recommendations in a natural language style.
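
To illustrate, here is a minimal, hedged sketch of that hybrid approach. It assumes a hypothetical get_candidate_products helper that wraps your existing ML-based recommender; the LLM then only phrases the pre-ranked candidates in natural language rather than inventing recommendations from scratch:

    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    def get_candidate_products(user_id):
        # Hypothetical: your existing engineered/ML recommender returns ranked candidates
        return ["NoiseAway X2 headphones", "UltraBook 14 laptop", "PixelSnap phone case"]

    def phrase_recommendations(user_id, user_preferences):
        candidates = get_candidate_products(user_id)
        prompt_text = (
            f"User preferences: {user_preferences}\n"
            f"Recommended products (already ranked by our recommender): {', '.join(candidates)}\n"
            "Write a short, friendly message presenting ONLY these products to the user."
        )
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are an ecommerce website assistant."},
                {"role": "user", "content": prompt_text}
            ],
            temperature=0.3  # The ranking is fixed upstream; only the wording is generated
        )
        return response.choices[0].message.content

    print(phrase_recommendations("user-42", "likes electronics, specifically smartphones and laptops"))

Keeping the ranking logic outside the LLM preserves the determinism of the existing system while still gaining a natural-language presentation.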

Data is King, ML is the Craftsman, and Domain Knowledge is the Minister

The saying “Garbage In, Garbage Out” holds particularly true in Gen AI. The success of any AI model hinges on the quality of the data it’s trained on. However, even the best data is useless without the right context, which is where domain knowledge comes into play. Imagine data as the king of your AI strategy, with machine learning as the skilled craftsman who shapes and moulds this data. Without the guidance of domain knowledge—the minister—your king may govern ineffectively, and the craftsman’s work may lack the precision needed to make a true impact.

To see this analogy in action, consider a retail company aiming to personalise customer experiences. They possess a wealth of data, from purchase histories to browsing patterns—data that could be the foundation of a highly effective AI-driven strategy. But without the minister’s understanding of the retail sector’s nuances—such as seasonal trends, regional preferences, and customer behaviours—even the most skilled craftsman, equipped with machine learning tools, might shape recommendations that are technically sound but contextually misaligned. The craftsman can only achieve precision and relevance when the minister’s wisdom is applied, guiding the process.

To operationalize this concept, expertise in programming, machine learning (ML), and tools like AWS SageMaker, Google Vertex AI, and frameworks like vLLM becomes indispensable. These tools, the craftsman’s instruments, not only facilitate the training of ML models but also enable the fine-tuning and deployment of LLMs with domain-specific data. By integrating domain insights, the AI’s outputs become both precise and contextually relevant, significantly enhancing the effectiveness of the model.
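
To make the craftsman-and-minister pairing concrete, here is a small, hedged sketch that prepares domain-curated examples in the JSONL chat format commonly used for fine-tuning chat models; the records and file name are illustrative placeholders:

    import json

    # Hypothetical examples curated with the "minister's" retail expertise
    examples = [
        {"question": "What should we promote in early November?",
         "answer": "Early November is dominated by festive gifting traffic; promote gifting electronics and festive apparel."},
        {"question": "Why did umbrella sales spike in Chennai?",
         "answer": "Chennai's northeast monsoon peaks around November, so regional weather, not a site change, likely drove the spike."},
    ]

    # Write them in the JSONL chat format expected by fine-tuning pipelines
    with open("retail_finetune.jsonl", "w") as f:
        for ex in examples:
            record = {"messages": [
                {"role": "system", "content": "You are a retail merchandising analyst."},
                {"role": "user", "content": ex["question"]},
                {"role": "assistant", "content": ex["answer"]},
            ]}
            f.write(json.dumps(record) + "\n")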

Marrying Prompt Engineering with Software Engineering Best Practices

Prompt engineering is both an art and a science. Crafting the right prompt can significantly enhance the quality of a model’s output, but doing so requires a deep understanding of the subject matter. Without domain knowledge, prompt tuning can be like pouring water into a pot with invisible holes—no matter how much effort you put in, the results will always fall short.

Let’s say you’re developing a customer support chatbot for a financial institution. The chatbot needs to handle complex queries about financial products, but without the right prompts, it might struggle to provide accurate information. This is where prompt engineering techniques, such as using zero-shot or few-shot examples and methods like Retrieval-Augmented Generation (RAG) to inject essential knowledge into the input context, become crucial.
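
Here is a minimal, hedged RAG sketch along those lines: it embeds a handful of in-memory product notes, retrieves the one closest to the customer’s question, and injects it into the prompt as context. The documents are invented for illustration; a production system would add a proper vector store and a deliberate chunking strategy:

    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    documents = [
        "Gold Savings Account: 4.1% interest, no minimum balance, free international ATM withdrawals.",
        "Everyday Credit Card: 1.5% cashback, 19.9% APR, no annual fee in the first year.",
        "Home Loan Plus: floating rate from 8.4%, up to 30-year tenure, prepayment allowed after 12 months.",
    ]

    def embed(texts):
        result = openai.embeddings.create(model="text-embedding-3-small", input=texts)
        return [d.embedding for d in result.data]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

    doc_vectors = embed(documents)

    def answer(question):
        q_vec = embed([question])[0]
        # Retrieve the most similar document and ground the prompt with it
        best_doc = max(zip(documents, doc_vectors), key=lambda dv: cosine(q_vec, dv[1]))[0]
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Answer using ONLY the provided product context."},
                {"role": "user", "content": f"Context: {best_doc}\n\nQuestion: {question}"}
            ],
            temperature=0  # As deterministic as possible: the facts come from the context
        )
        return response.choices[0].message.content

    print(answer("Is there an annual fee on the cashback card?"))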

To truly unlock the potential of your AI system, it's important to merge these prompt engineering techniques with the time-tested principles of software engineering. This combination not only enhances the efficiency and reliability of your AI but also ensures that it can scale and adapt as needed. Here’s how:

  1. Modular Thinking: Just as in software development, breaking down complex problems into smaller, manageable modules can make a world of difference. In prompt engineering, this means designing prompts that are modular and can be reused across different contexts. For instance, you might create separate prompts for handling different types of financial queries—loans, investments, or credit cards—and then dynamically combine these modules based on the user’s input. This approach not only improves efficiency but also makes the system more flexible and easier to maintain (a minimal sketch of such a modular prompt follows this list).

  2. Validations and Evaluations: In software engineering, validations ensure that each component behaves as expected. Similarly, in prompt engineering, it's vital to rigorously validate prompts against a wide range of scenarios. This process helps identify edge cases and ensures that the AI consistently provides accurate and relevant responses. Incorporating continuous feedback loops, where user interactions are analysed and used to refine prompts, can further enhance the system’s accuracy and responsiveness.

  3. Unit Tests for Prompts: Unit testing is a cornerstone of reliable software development, and the same principle can be applied to prompt engineering. By testing individual prompts in isolation, you can fine-tune them to handle specific queries with precision before integrating them into the broader system. For example, if your financial chatbot has a prompt designed to explain interest rates, you would test it with various edge cases (e.g., different loan types or user profiles) to ensure it consistently delivers the correct information (the sketch after this list pairs a modular prompt with exactly this kind of test).

  4. Continuous Integration and Deployment: Continuous integration (CI) and deployment practices are essential in modern software engineering for maintaining code quality and delivering updates seamlessly. Applying CI principles to prompt engineering means regularly integrating new prompts and updates into the system while ensuring they don’t disrupt existing functionalities. Automated testing and validation pipelines can be set up to catch issues early, allowing for smooth and reliable updates. This practice ensures that your AI can evolve and improve over time without introducing new errors or inconsistencies.

  5. Documentation and Collaboration: Just as thorough documentation is crucial in software projects, documenting your prompt engineering process is equally important. Clear documentation of prompt structures, use cases, and dependencies ensures that the system can be easily understood and maintained by different team members. Collaboration tools and version control can also be leveraged to track changes, share insights, and coordinate efforts across teams, ensuring that prompt engineering becomes a cohesive and collaborative process.

  6. Scalability and Adaptability: As your AI system grows, the ability to scale and adapt becomes critical. By incorporating software engineering practices such as modular design, automated testing, and continuous integration, you ensure that your prompt engineering efforts can scale alongside the increasing complexity and volume of user interactions. This adaptability allows your AI system to handle more diverse queries, integrate new knowledge, and evolve in response to changing business needs without compromising performance.
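
To ground points 1 and 3 above, here is a hedged sketch of what a modular prompt builder and its unit test might look like; loan_prompt and the asserted phrases are illustrative, not a real test suite:

    # src/prompts/finance_prompts.py -- modular, reusable prompt builders
    def loan_prompt(loan_type, user_question):
        return (
            f"You are a support agent for a financial institution. "
            f"Topic: {loan_type} loans. Answer factually and concisely.\n"
            f"Customer question: {user_question}"
        )

    # tests/unit/test_prompts.py -- unit tests for prompt generation (pytest style)
    def test_loan_prompt_mentions_topic_and_question():
        prompt = loan_prompt("home", "Can I prepay without penalty?")
        assert "home loans" in prompt
        assert "Can I prepay without penalty?" in prompt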

Here’s a sample folder structure applying the best practices mentioned above, designed to keep your project organised, scalable, and maintainable:

    llm-app/
    │
    ├── src/
    │   ├── prompts/
    │   │   ├── finance_prompts.py       # Modular prompts for financial queries
    │   │   ├── user_interaction_prompts.py  # Prompts related to user interactions
    │   │   └── prompt_templates/        # Reusable prompt templates
    │   │
    │   ├── llm_integration/
    │   │   ├── llm_service.py           # Interface with the LLM API
    │   │   ├── model_config.py          # Configuration for different LLMs
    │   │   ├── response_validation.py   # Validation logic for LLM responses
    │   │   └── response_postprocessing.py # Post-processing LLM outputs
    │   │
    │   ├── core/
    │   │   ├── app.py                   # Main application logic
    │   │   ├── user_management.py       # User-related functionalities
    │   │   └── query_handler.py         # Logic for handling user queries
    │   │
    │   ├── utils/
    │   │   ├── logging.py               # Logging utilities
    │   │   ├── constants.py             # Global constants
    │   │   ├── helper_functions.py      # Miscellaneous utility functions
    │   │   └── config_loader.py         # Configuration file loader
    │   │
    │   └── __init__.py                  # Initialize the src module
    │
    ├── tests/
    │   ├── unit/
    │   │   ├── test_prompts.py          # Unit tests for prompt generation
    │   │   ├── test_llm_integration.py  # Unit tests for LLM service integration
    │   │   ├── test_core.py             # Unit tests for core application logic
    │   │   └── test_utils.py            # Unit tests for utilities
    │   │
    │   ├── integration/
    │   │   ├── test_end_to_end.py       # Integration tests covering end-to-end flows
    │   │   └── test_ci_cd_pipeline.py   # Tests for CI/CD pipelines
    │   │
    │   └── __init__.py                  # Initialize the tests module
    │
    ├── scripts/
    │   ├── ci_cd/
    │   │   ├── deploy.sh                # Deployment script
    │   │   ├── test.sh                  # Script to run tests in CI/CD pipeline
    │   │   └── build_docker.sh          # Docker build script for containerization
    │   │
    │   ├── data_preprocessing/
    │   │   ├── clean_data.py            # Script to clean and prepare data for training
    │   │   ├── generate_embeddings.py   # Script to generate embeddings for LLM
    │   │   └── split_data.py            # Script to split data into training and testing sets
    │   │
    │   └── utilities/
    │       ├── backup_db.sh             # Script to backup the database
    │       └── monitor_resources.sh     # Script to monitor system resources
    │
    ├── configs/
    │   ├── llm_config.yaml              # Configuration for LLM models and API keys
    │   ├── app_config.yaml              # Application-wide configuration (e.g., database, environment)
    │   └── logging_config.yaml          # Configuration for logging
    │
    ├── docs/
    │   ├── architecture_diagram.png     # High-level architecture diagram
    │   ├── prompt_documentation.md      # Documentation for all prompts used in the system
    │   ├── api_docs.md                  # API documentation for the LLM integration
    │   └── user_guide.md                # User guide for the application
    │
    ├── .gitignore                       # Git ignore file
    ├── README.md                        # Overview of the project
    ├── requirements.txt                 # Python dependencies
    └── Dockerfile                       # Dockerfile for containerization

By applying these software engineering practices to prompt engineering, you can create a more robust, reliable, and scalable Gen AI system that meets the demanding requirements of enterprise applications. This approach ensures that your AI system not only delivers accurate and relevant responses but also continues to evolve and improve over time.

OpenAI Function Calling and Structured Outputs

OpenAI’s function calling feature and structured output capabilities are game-changers for enterprises looking to integrate Gen AI into their existing systems. These tools enable a tighter integration between AI-generated content and structured data, making it easier to build more reliable and predictable applications.

Take, for example, a customer service application that needs to generate responses to common queries. With function calling, the AI can directly invoke pre-defined functions to fetch data, such as a customer’s account balance or order status, and then generate a response based on this data. This ensures that the AI’s output is not only accurate but also consistent with the enterprise’s existing data sources.

    import os
    import json
    import openai

    # Set your OpenAI API key (assumed to come from an environment variable)
    openai.api_key = os.environ["OPENAI_API_KEY"]

    # Example function to retrieve order status
    def get_order_status(order_id):
        # Hypothetical function to fetch order status from a database
        return f"Your order with ID {order_id} is currently being processed and will be delivered in 3-5 business days."

    # OpenAI function to generate a response with function calling
    def generate_customer_support_response(query):
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful customer service agent."},
                {"role": "user", "content": query}
            ],
            functions=[
                {
                    "name": "get_order_status",
                    "description": "Fetch the status of an order by its ID",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "order_id": {
                              "type": "string"}
                        },
                        "required": ["order_id"]
                    }
                }
            ],
            function_call={"name": "get_order_status"},
            temperature=0.2  # Low temperature for deterministic responses
        )

        # Extract the function call from the response
        function_call = response.choices[0].message.function_call

        # Extract the function name and arguments
        function_name = function_call.name
        function_args = json.loads(function_call.arguments)

        # Call the function with the parsed arguments
        if function_name == 'get_order_status':
            order_status = get_order_status(function_args['order_id'])
            return order_status

    # Example customer query
    customer_query = "Can you tell me the status of my order with ID 12345?"
    response = generate_customer_support_response(customer_query)
    print(response)

This approach ensures that your Gen AI system can deliver more accurate and reliable responses, reducing the risk of errors and improving customer satisfaction.
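
Structured outputs take this a step further by constraining the model’s reply to a fixed JSON schema, so downstream code can parse it deterministically. Here is a hedged sketch (exact parameter names can vary across SDK versions, and the schema is a made-up example):

    import os
    import json
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    response = openai.chat.completions.create(
        model="gpt-4o-2024-08-06",  # Structured outputs require a model that supports them
        messages=[
            {"role": "system", "content": "Extract the order details from the customer's message."},
            {"role": "user", "content": "Order 12345 hasn't arrived and I'd like a refund."}
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "order_issue",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"},
                        "issue": {"type": "string"},
                        "requested_action": {"type": "string"}
                    },
                    "required": ["order_id", "issue", "requested_action"],
                    "additionalProperties": False
                }
            }
        }
    )

    # The reply conforms to the schema, so parsing is straightforward
    order_issue = json.loads(response.choices[0].message.content)
    print(order_issue["order_id"], "->", order_issue["requested_action"])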

Adopting Gen AI in Enterprise Business: A Balanced Approach

When integrating Gen AI into enterprise applications, it’s crucial to strike a balance between various disciplines. Think of it as a ratio: ~70% engineering, ~20% machine learning (ML), and ~10% Gen AI.

Why this breakdown? Let’s consider a real-world example. A healthcare company wants to implement a Gen AI-powered system for patient triage. The system needs to accurately assess a patient’s symptoms and recommend the appropriate course of action. Here’s how the balance plays out:

  • 70% Engineering: Engineering plays a critical role in building the infrastructure that powers the Gen AI system. This includes everything from designing the user interface to ensuring that the system integrates seamlessly with existing medical databases. Engineering also encompasses the implementation of safety protocols to protect patient data, which is paramount in the healthcare industry.

  • 20% Machine Learning: ML comes into play when training models to recognize patterns in patient data. This might involve using supervised learning techniques to train a model on historical patient records, enabling it to make accurate predictions about patient outcomes. ML also includes the ongoing process of fine-tuning the model to improve its performance over time.

  • 10% Gen AI: Finally, Gen AI provides the conversational interface that patients interact with. While this is a relatively small part of the overall system, it’s crucial for ensuring that the system can communicate effectively with patients, providing clear and accurate information.

This balanced approach ensures that the Gen AI system is both technically sound and highly effective in real-world applications. It also highlights the importance of treating Gen AI as one component of a larger system, rather than the sole focus.

Word of Caution

As enterprises rush to embrace Gen AI, it’s easy to get caught up in the excitement and overlook potential pitfalls. However, it’s crucial to approach Gen AI integration with a healthy dose of caution.

  1. System Reliability and Consistency: When implementing Gen AI features, it’s essential to ensure that the system remains reliable and consistent. This means conducting evaluations at every step of LLM interaction, grounding LLM prompts with context, and validating LLM outputs against the user query. While this may require more LLM calls and increase token costs, it’s necessary to maintain the system’s reliability and consistency.

  2. Adoption Is Key—Mindset Matters: At first glance, making multiple LLM calls might seem like a recipe for increased latency and higher token costs. But here’s the twist: by breaking down your prompts into smaller, more focused queries, you can actually achieve faster responses and more optimised token usage. These smaller prompts reduce the need for complex reasoning, allowing you to seamlessly switch to smaller, fine-tuned models—whether self-hosted or domain-specific. The result? Significantly reduced latency and costs, without sacrificing the reliability and consistency of your system. In fact, these multiple interactions are the backbone of a robust, efficient Gen AI implementation.

  3. Prompt Engineering and Domain Knowledge: Don’t underestimate the importance of prompt engineering and domain knowledge. Effective implementation of techniques like RAG, fine-tuning, and strong prompts grounded in domain expertise is essential. This knowledge aids in choosing the right chunking strategy for vectorization in RAG, writing apt unit tests, and more.

  4. Revalidation with Facts: Developers and users must not over-rely on Gen AI. Responses should be treated with caution and revalidated against facts from other sources. Incorporating factual references is key to Gen AI adoption.

  5. Minimal Permissions and Sandboxed Environment: When implementing Gen AI features, ensure minimal permissions, such as read-only, for underlying resources, and serve features from a sandboxed environment. This minimises the risk of unauthorised access or data breaches.

  6. Differentiate Between Traditional Logging and GenAI Logging: It's essential to recognize that traditional application logging and GenAI logging serve different purposes. Traditional logging typically focuses on recording business events and system operations. In contrast, GenAI logging must go further by capturing responses, measuring accuracy, and tracking the sources of knowledge used by the AI. Remember, AI operates on probabilities, which raises the question: how much accuracy is enough to be trusted? For instance, is 80% accuracy sufficient for your specific use case? This is a crucial consideration that requires careful evaluation (a sketch of such a logging wrapper follows this list).

  7. Data Privacy and Model Bias: LLM evaluations must include functions related to checking data privacy and model bias concerns. This ensures that the system respects user privacy and doesn’t perpetuate harmful biases.

  8. Frameworks vs. Raw Model: A Crucial Choice: Higher-level frameworks like LangChain simplify prompt design, offering abstraction and convenience. However, given the fast-evolving nature of LLMs, staying closer to the raw model provides more flexibility and control. This approach enables quicker adjustments and deeper understanding of model behaviour, crucial for refining prompts and ensuring consistent quality. While frameworks are valuable for large-scale applications, balancing convenience with direct model access can better harness Gen AI's potential, especially in dynamic development environments.

  9. Embracing Technology Change: Finally, remember that you will not be outpaced by the technology but by the people adopting it. If you think Gen AI is unnecessary for your business, think again. If you think Gen AI will replace your business, think again. The right mindset is treating Gen AI as a powerful tool that can significantly enhance productivity for your company, employees, and customers. This means you must embrace the technology change, invest in making it accessible to everyone in the company, and encourage them to adopt it as their personal tech assistant, while educating them on its limitations.
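
As a hedged sketch of the GenAI-specific logging described in point 6, the wrapper below records each interaction along with the knowledge sources used and a crude groundedness score; the field names and the heuristic are illustrative placeholders, not a standard:

    import json
    import time

    def log_genai_interaction(prompt, response, sources, log_path="genai_log.jsonl"):
        # Crude groundedness heuristic: fraction of response sentences that overlap a source
        sentences = [s.strip() for s in response.split(".") if s.strip()]
        grounded = sum(
            1 for s in sentences
            if any(s.lower() in src.lower() or src.lower() in s.lower() for src in sources)
        )
        record = {
            "timestamp": time.time(),
            "prompt": prompt,
            "response": response,
            "sources": sources,  # Which knowledge the model was grounded on
            "groundedness": grounded / max(len(sentences), 1),
        }
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")

Reviewing such logs over time is one way to answer the “how much accuracy is enough?” question for your own use case.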

Conclusion

The integration of Gen AI into enterprise products is not without challenges, but with a balanced approach, these challenges can be overcome. By understanding the limitations of Gen AI, leveraging the power of data and domain knowledge, and applying the principles of software engineering, businesses can unlock the full potential of Gen AI to create products that are both innovative and reliable.

Appendix

About the Hero Image used in this blog:

[Initial ChatGPT-generated hero image]

While the image may look cool, it contains spelling mistakes and demands follow-up prompts for refinement, underscoring Gen AI's limitations in grasping user intent in a single go. Here's the initial prompt that generated the image:

“Create a banner image for a blog on integrating Generative AI into enterprise applications, blending themes of AI-powered innovation, software engineering best practices, and data-driven decision-making. The visual should emphasize the collaboration between AI, data, and domain knowledge, showcasing elements like modular prompt design, continuous integration, and a balanced approach to technology adoption. The tone should be professional and forward-looking, capturing the transformative potential of AI in modern business.”

After considerable effort and multiple prompt refinements, we finally arrived at this image. The specific prompts that led us here will be revealed in an upcoming blog—stay tuned!

[Final refined hero image]

Take Charge of Your AI Engineered Success with TechConative.ai

Unlock the potential of your projects with TechConative.ai. Reach out today to explore how our innovative AI-driven solutions can turn your ideas into impactful realities.


TechConative.ai can enable your business with AI-engineered solutions that are innovative, intelligent, ideal, and impactful.

