Developing LLM-Based Applications: Llama 2 Powered Summariser App (3/n)

Madhusudhan Konda
6 min read · Sep 16, 2023

The speed at which Large Language Models (LLMs) are being integrated into everyday applications is astonishing. The revolution of embracing AI for enterprise applications has begun.

In the last two articles (the first one is here and the second one here), we looked at the basics of developing LLM-based applications. In fact, we created a server-client bot with OpenAI's GPT-3.5 model. Of course, OpenAI is not a free tool!

What if I told you that we don't need to spend a penny to develop an LLM-based application using open source models? What if I showed you the mechanics of building an app using Meta's open source LLM: Llama 2?

The repository for this code is available here on my GitHub.

We'll create a simple Summary App that summarises the given text in a way a layman can understand.

See the app in action here:

Llama 2 LLM based Summariser App

Let’s dive in and get this app coded up!

Tech stack

The tech stack is pretty straightforward for this sample app:

  • Meta’s open source LLM model: Llama 2 (7 billion parameter model)
  • Langchain framework
  • Python programming language
  • Streamlit python UI framework

Project Setup

If you are not familiar with the Streamlit framework, don't fret; just jump on the docs and familiarise yourself (I'll try to post a streamlit-play repo and a related article on getting started with Streamlit soon).

We need to install a couple of packages, namely streamlit, langchain and llama-cpp-python. Run pip against the requirements.txt file to get them installed in your project:

# Install the required libraries
pip install -r requirements.txt
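
In case you are assembling the requirements.txt yourself, a minimal version would list just the packages used in this article (versions are left unpinned on purpose; pin them as you prefer):

# requirements.txt (a minimal sketch)
streamlit
langchain
llama-cpp-python
openai  # only needed for the OpenAI comparison app later in the article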

We will not be talking to OpenAI in this application; instead, we'll run the open source Llama 2 LLM locally.

I also developed a second application using the OpenAI model (GPT-3.5) to compare the outputs of the two models. We discuss the results towards the end of this article.

Download the Llama 2 Model

Create a models folder in the root of the project; this is where we want the Llama 2 (7B) model to sit.

Download and copy Meta's Llama 2 seven-billion-parameter, CPU-friendly model into this "models" folder, as shown in the figure below:

SummaryBot’s project structure
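
In case the screenshot doesn't come through, the layout is roughly the following (the project folder name here is just an example):

summary-bot/
├── models/
│   └── llama-2-7b-chat.ggmlv3.q8_0.bin
├── app-llama2.py
├── app-openai.py
└── requirements.txt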

We can download this model (as well as the other two, the 13 billion and 70 billion parameter variants) from Hugging Face. I usually use TheBloke's quantised models.

Download the 7B model from this link to TheBloke's 7B model on Hugging Face if you don't have the model handy. Click on the "Files and versions" tab and grab the required file.
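
If you'd rather script the download than click through the browser, the huggingface_hub package offers a helper for this. The repo id and file name below are what TheBloke's GGML page uses at the time of writing, so do verify them on the model page:

# download_model.py: a small one-off helper, not part of the app itself
from huggingface_hub import hf_hub_download

# Fetch the quantised 7B chat model into the local models folder
hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGML",      # verify the repo id on Hugging Face
    filename="llama-2-7b-chat.ggmlv3.q8_0.bin",   # the CPU-friendly quantised model
    local_dir="./models",
)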

Code away

Now that we have all the required dependencies sorted, let's write some code. Trust me, it is a very short one :)

Once we have the environment ready, create an app-llama2.py file which hosts all our logic for summarising text using the Llama 2 LLM.

Let’s import the streamlit and langchain related libraries:

import streamlit as st
from langchain.llms import LlamaCpp
from langchain import PromptTemplate
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

As you can see, we are using the LangChain framework to work with the LLM.

Let’s create a function that loads the Llama 2 model:

# Model is referred from the local directory:
MODEL = "./models/llama-2-7b-chat.ggmlv3.q8_0.bin"

# Instantiate the callback manager with a streaming stdout handler
cb = CallbackManager([StreamingStdOutCallbackHandler()])

# Loading the Llama 2 LLM
def load_llm():
    # A Llama 2 based LLM model
    llm: LlamaCpp = LlamaCpp(
        model_path=MODEL,
        temperature=0.7,
        max_tokens=2000,
        top_p=1,
        callback_manager=cb,
        verbose=True,
        n_ctx=2000
    )
    return llm

LlamaCpp is a class from the langchain.llms library; it is a wrapper that instantiates the Llama 2 model. The model_path is the local file path to our model.
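
If you want a quick sanity check that the model file loads and responds before wiring up the UI, a throwaway snippet like the one below (not part of the final app) is enough:

# Quick smoke test from a plain Python shell
llm = load_llm()
print(llm("Summarize the following text: LangChain lets us swap LLM backends with minimal code changes."))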

The next step is to create a prompt template:

prompt = PromptTemplate(
    input_variables=["input"],
    template="Summarize the following text {input}",
)

This is LangChain's prompt template class, which helps create a prompt to feed the model.
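
To make the template concrete: calling format simply substitutes the user's text into the {input} placeholder.

# The template fills {input} with whatever text the user supplies
final = prompt.format(input="Streamlit turns Python scripts into web apps.")
# final == "Summarize the following text Streamlit turns Python scripts into web apps."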

The next step is to create a function that, when called with the user's text, invokes the model with the appropriate prompt and gets the answer:

# Method to invoke the LLM to summarise the text
def get_summary(question):
    # Loading the LLM
    llm = load_llm()

    # Creating the actual prompt to be fed to the model
    final_prompt = prompt.format(input=question)

    # Invoke the LLM with the prompt
    # Wrap the response in Streamlit's widgets
    with st.spinner("Summarising the content.."):
        # Invoking the LLM and getting the result
        response = llm(final_prompt)

        # Pass the result to write it on the screen
        st.info(response)

        # Yay! Let's celebrate the result with balloons :)
        st.balloons()

The get_summary method takes the user's text and writes the result, a summary of that text created by the LLM, to the Streamlit page.

The final piece of code renders the UI elements in the browser tab and invokes get_summary when the user has entered the text and submitted it for processing.

# Create a form
with st.form("summary_form"):
    # Add a text area where the user can paste the text
    text = st.text_area("Paste the text here to summarise it:", value="",
                        max_chars=5000)

    # Enable a submit button
    submitted = st.form_submit_button("Submit")

    # If the user submitted, invoke the get_summary function
    if submitted:
        get_summary(text)

The form has a few Streamlit UI elements: text_area is where the user pastes the text to be summarised. Once the user hits Submit, we invoke the get_summary function to produce the output.

That's pretty much the extent of the coding we need to do; just under 50 lines :)

Run it

Once the coding is out of the way, let’s run the program:

# Streamlit apps are run this way
streamlit run app-llama2.py

This opens the app in a web browser on Streamlit's default port (8501):

Summariser in Action

Paste the text that you want to summarise and hit the "Submit" button. The LLM summarises the text and the result is displayed under the text area.

Summary text generated by the locally running Llama 2 LLM

Testing with OpenAI

As we use LangChain, integrating with OpenAI's GPT model is pretty straightforward. The load_llm() function would look something like the following:

import os
from langchain.llms import OpenAI

# Make sure your OpenAI key is supplied
openai_api_key = os.getenv("OPENAI_API_KEY")

# Loading the OpenAI LLM
def load_llm():
    llm = OpenAI()
    return llm

The rest of the program doesn't need to change; you can try it out by running the app-openai.py program:

streamlit run app-openai.py
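
Note that the OpenAI variant expects the API key to be available as an environment variable before launch, for example:

# Supply the key via the environment before starting the app
export OPENAI_API_KEY="sk-..."   # placeholder value, use your own key
streamlit run app-openai.py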

Input exactly the same text and check what the OpenAI model produces:

OpenAI’s output

As you can see, there are subtle differences (I prefer the OpenAI summary), but a single example is far too small a test to form an informed opinion. We will dig into the differences between these models in the coming articles.

Wrap Up

That's a wrap: in this article we looked at how to create a Llama 2 LLM based summarisation application. We also ran the same text through the OpenAI application and compared the summaries produced by the two models.

In the coming articles, I'll be exploring these LLMs in a bit more depth as well as working on solutions to make them enterprise-ready!

The repository for this code is available here on my GitHub.

Stay tuned!

Me @ Medium || LinkedIn || Twitter || GitHub


Madhusudhan Konda

Madhusudhan Konda is a full-stack lead engineer, mentor, and conference speaker. He delivers live online training on Elasticsearch, Elastic Stack & Spring Cloud.