Developing LLM-based Applications: Client-Server Answer Bot (2/n)

Madhusudhan Konda
7 min read · Sep 10, 2023

We got a taste of LangChain in the last article.

With that knowledge, let’s build a simple client-server application, “Answer Bot”, backed by an LLM using OpenAI and LangChain.

The Answer Bot is a pretty standard application with a simple feature: “ask-me-a-question-I’ll-get-the-answer-from-openai”

We will create a Python-based server using Flask and expose an endpoint that a client can connect to and invoke. Initially we will test the endpoint using cURL or Postman, but later we’ll go one step further and create a simple UI. The final application will look something like this:

A simple “Answer Bot” client-server application

I will be working on developing applications using LLMs, and as part of that I’ll be writing a few articles on LLMs, mostly targeted towards the adoption of LLMs into general applications. I will look at a few of the following features (in no particular order):

  • Dockerising the server and deploying the container to a Kubernetes cluster; scaling horizontally based on load
  • Developing an LLM-based Document Library Application
  • Developing a Llama 2 Based Application for Private Data
  • Developing a Production Ready Enterprise Llama Server for Businesses

Tech stack

  • OpenAI’s GPT-3.5 model (gpt-3.5-turbo)
  • LangChain framework
  • Python programming language
  • Flask for webserver support on the server side
  • Streamlit framework for client’s UI

The source code is available here in my repository.
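
If you’re following along locally, installing the dependencies might look something like this (the package names are my assumption based on the imports used below; check the repository for the exact requirements):

# Installing the dependencies (PyPDF2 and faiss-cpu are only needed for the extra imports shown later)
pip install flask python-dotenv openai langchain streamlit requests PyPDF2 faiss-cpu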

Let’s get coding.

Building a Server

As usual, we will need a bunch of imports and environment support, so here’s the list of all of them:

from flask import Flask, request, jsonify
from dotenv import load_dotenv
from PyPDF2 import PdfReader
import os
import openai

# Langchain related imports
# (Note: PdfReader, the text splitter, embeddings, FAISS and the QA chain
#  aren't used in this article; only the OpenAI LLM wrapper is needed here)
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

# Initialize Flask app
app = Flask(__name__)

# Load environment variables
load_dotenv()

# Get the OpenAI API key
openai.api_key = os.getenv("OPENAI_API_KEY")

We need a handful of imports from OpenAI, LangChain, Flask and others.
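
The load_dotenv() call reads the OpenAI key from a local .env file sitting next to the server script; a minimal one might look like this (the value shown is just a placeholder, use your own key):

# .env
OPENAI_API_KEY=sk-your-key-here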

Let’s create a simple function (not yet exposed to the world) called invoke() that asks the model and returns the data:

# Function to fetch the response from the OpenAI model
def invoke(prompt):
    llm = OpenAI()          # instantiate the model
    response = llm(prompt)  # invoke the OpenAI LLM
    return response

The function expects a question (the prompt) from the caller. It instantiates the model and asks it for a response.
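
One thing to note: OpenAI() here uses LangChain’s defaults for the underlying model and temperature. If you want more control, the wrapper accepts constructor arguments; a small sketch (exact parameter names and defaults depend on your LangChain version):

# Optional: pin the sampling temperature explicitly
llm = OpenAI(temperature=0.2)  # model_name="..." can also be passed here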

All we need to do now is wire up an endpoint that accepts a user’s question. The endpoint then calls the invoke() method to retrieve the response from the LLM.

# Exposing the endpoint
@app.route('/ask', methods=['POST'])
def ask():
    # Fetch the question from the user's request
    prompt = request.json.get('question')

    # Invoke the model function
    response = invoke(prompt)

    # Return the response as JSON
    return jsonify({"answer": response})

The ask() function is exposed as a POST endpoint using Flask. The logic is straightforward: once the user’s question is parsed from the incoming request, the previously defined invoke() function is called to fetch a response from the OpenAI LLM.
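
In practice you’d also want some basic input validation and error handling, so a missing question or an OpenAI failure doesn’t surface as a raw stack trace. A hedged sketch of a hardened version of the endpoint (not part of the original server) could look like this:

# A more defensive version of the /ask endpoint (illustrative sketch)
@app.route('/ask', methods=['POST'])
def ask():
    payload = request.get_json(silent=True) or {}
    prompt = payload.get('question')

    # Reject empty or missing questions before calling the model
    if not prompt:
        return jsonify({"error": "Please provide a 'question' field"}), 400

    try:
        response = invoke(prompt)
    except Exception as exc:
        # Surface model or connectivity failures as a 502 rather than a crash
        return jsonify({"error": str(exc)}), 502

    return jsonify({"answer": response})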

The last piece of the puzzle is to run the server on port 3000:

# Run the Flask app
if __name__ == '__main__':
    app.run(port=3000)
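
To start the server, run the file directly (assuming you’ve saved it as openai_llm_server.py, matching the repository):

# Starting the Flask server
python openai_llm_server.py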

Once you run the Python server, it exposes the app on port 3000 with the /ask endpoint.

As we do not have a client yet, we can use curl to test it:

curl --location 'http://127.0.0.1:3000/ask' \
--header 'Content-Type: application/json' \
--data '{"question": "What is BTC"}'

We are invoking the /ask endpoint exposed by our server on the local machine. The question is passed as a JSON document to the endpoint; in this case we are simply asking “What is BTC”.
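
The endpoint replies with a JSON document of the shape below; the actual answer text will of course vary from run to run:

{"answer": "<the model's answer text>"}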

If everything goes OK, the response is printed to the console:

OpenAI response

Of course, you can also use Postman to invoke the same endpoint:

Invoking OpenAI’s LLM via Postman

We’ve managed to build a server. Though it is not production-ready yet (I’m currently building a Kubernetes-based LLM deployment for production workloads; stay tuned), there’s no reason you can’t Dockerise this server and deploy it to a Kubernetes cluster, but that’s for another article.

The server’s full code is available in openai_llm_server.py file in my repository.

We can also get a simple UI built on top of this server.

Building a Client

Invoking cURL/Postman is of course straightforward, but if you want your users (not just developers) to experiment with it, you really need a UI. After all, most of us are visual learners.

The client doesn’t need many dependencies: just the requests HTTP library to talk to the server’s endpoint and streamlit for the UI. The following snippet shows this in action:

import os
import streamlit as st
import requests
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize OpenAI API key (not strictly needed here; the server is the one talking to OpenAI)
openai_api_key = os.getenv("OPENAI_API_KEY")

SERVER_URL = "http://127.0.0.1:3000/ask"

We also create a local variable SERVER_URL pointing at the server endpoint we spun up a moment ago. Because we’ve imported streamlit as st, st.header writes a header (an H1 tag) onto the HTML page:

# H1 tag on the page
st.header("Got a question for me?")

We need to ask the user a question, so we use Streamlit’s text_input function with a label, giving the user a text box to type their question into:

user_question = st.text_input("Ask your question:")

Together, these calls produce an HTML page that looks something like this:

Simple Client UI

With the UI part mostly in place, let’s jump to the logic of invoking the server when the user asks a question.

The client’s call is as simple as issuing a POST request to the server’s endpoint. The following function demonstrates this:

def ask_user():
    # Check if the user has asked a question
    if user_question:
        # Invoke the server's endpoint with the question
        response = requests.post(
            SERVER_URL,
            json={"question": user_question}
        )

        # Parse the response for the LLM's answer
        answer = response.json().get("answer", "No answer found.")

        # Write the answer to the page
        st.write(answer)

# Main function
if __name__ == '__main__':
    ask_user()

It is pretty much self-explanatory (read the comments if you want to follow the individual steps).
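
One thing worth adding in practice is a request timeout and a status-code check, so the UI fails gracefully when the server is down. A variant of ask_user() along those lines (not in the original client) might look like this:

def ask_user():
    if user_question:
        try:
            # Give the server a bounded amount of time to answer
            response = requests.post(
                SERVER_URL,
                json={"question": user_question},
                timeout=30
            )
            response.raise_for_status()
            answer = response.json().get("answer", "No answer found.")
            st.write(answer)
        except requests.exceptions.RequestException as exc:
            # Show a friendly error message instead of a stack trace
            st.error(f"Could not reach the server: {exc}")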

As we’ve built our client with the Streamlit framework, we run it using the following command:

# Running our streamlit client
streamlit run openai_llm_client.py

The client will be up and running; usually a web browser opens automatically with a URL pointing to http://localhost:8501

The client’s full code is available in openai_llm_client.py file in my repository.

Testing the Answer Bot

In the browser, visit http://localhost:8501 to open our Answer Bot. Make sure the server is up and running in the background.

Type a question and hit enter. The answer comes back from the OpenAI server in no time and is printed on the page, as demonstrated below:

User’s question gets answered by locally running server exposing openAI LLM

Yay! We’ve managed to get a working version of an Answer Bot — how cool is that!!

Wrapping up

That’s pretty much it for this article. We used LangChain to abstract away the low-level details of OpenAI’s APIs. We developed a server that exposes an endpoint for clients to talk to. The client, developed in Streamlit, then invokes this endpoint to get the answers.

The idea behind this code is to start building an enterprise-ready server, which is what I aim to do over this series of articles.

Here’s my code repository. Don’t forget to follow me/clap to show me some encouragement :)

Me @ Medium || LinkedIn || Twitter || GitHub
