28

We have a use case for ChatGPT in summarizing long pieces of text (speech-to-text conversations which can be over an hour).

However, we find that the 4k-token limit forces us to truncate the input text, often to about half its length.

Processing in parts does not seem to retain history of previous parts.

What options do we have for submitting a longer request which is over 4k tokens?

5
  • 1
    You could tell it that you are going to send multiple pieces of text and it does not need to respond until you explicitly ask for a response. Then send your text in chunks. Commented Mar 6, 2023 at 6:31
  • 1
    This person is asking about the ChatGPT API, not the public ChatGPT web interface. The web version does some sort of smart summarization/compression of any previous conversation once it gets too long, as it must still have the same limits under the hood. For the API you're on your own. Each request to the API is totally separate and retains no "memory" of the previous API calls. I too am wondering what the best solution here is.
    – jameslol
    Commented Mar 8, 2023 at 4:38
  • OP perhaps you could attempt to implement whatever it is that the web ChatGPT is doing - split all your text into chunks of about 3000 words, get a summary of each in separate API calls, and then send all the summaries in another API call, to get a "summary of summaries"
    – jameslol
    Commented Mar 8, 2023 at 4:43
  • @jameslol It is not always possible to correctly break the text into pieces. Commented Mar 13, 2023 at 7:54
  • @BarishNamazov Could you provide an example of a prompt on how to send a long text to ChatGPT? Commented Mar 13, 2023 at 7:58
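The chunked "summary of summaries" approach suggested in the comments can be sketched as follows. This is a minimal sketch, not a tested recipe: `summarize` is a hypothetical stand-in for one chat-completion API call, and the chunking uses a rough whitespace word count rather than a real tokenizer such as tiktoken.

```python
def chunk_words(text, max_words=3000):
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def summarize_long(text, summarize, max_words=3000):
    """Summary of summaries: summarize each chunk, then summarize the results."""
    chunks = chunk_words(text, max_words)
    partials = [summarize(c) for c in chunks]   # one API call per chunk
    return summarize("\n".join(partials))       # final summary-of-summaries call
```

In practice `summarize` would wrap a chat-completion request with a "Summarize this text" instruction; each chunk must leave enough headroom under the 4k-token limit for the instruction and the response.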

5 Answers

8

The closest answer to your question would be in the form of Embeddings.

You can find an overview of what they are here.

I recommend reviewing this code from the OpenAI Cookbook GitHub page, which uses a Web Crawl Q&A example to explain embeddings.

I used the code from Step 5 onwards and altered the file location to point it to my file containing the long piece of text.

From:

# Open the file and read the text
with open("text/" + domain + "/" + file, "r", encoding="UTF-8") as f:
    text = f.read()

to:

# Open the file and read the text
with open("/my_location/long_text_file.txt", "r", encoding="UTF-8") as f:
    text = f.read()

I then modified the questions at Step 13 to ask what I needed to know about the text.
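The core idea behind the Cookbook example is to embed each chunk of text once, embed the question, and send only the most similar chunks to the chat model. A minimal sketch of that ranking step, using toy 3-dimensional vectors in place of real `text-embedding-ada-002` outputs (which are 1536-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_chunks(question_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the question vector."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine_similarity(question_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The selected chunks are then concatenated into the prompt as context, which is how a question about an hour-long transcript can fit inside the 4k-token window.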

1
  • The link to "this code" is broken.
    – Ugur
    Commented Feb 27 at 9:38
4

Another option is the ChatGPT retrieval plugin. It lets you build a vector database from your document's text, which the LLM can then query. See https://github.com/openai/chatgpt-retrieval-plugin

0

One approach to handling long text is to divide it into smaller fragments, retrieve the pieces relevant to your task, and send only those in an API call.

Here's a project that is capable of processing PDFs, txt and doc files, as well as web pages. It allows you to converse with the document. In your case, you could ask a general question like "what is the document about" to receive a summary, and then inquire for more specific details.

0

I work with long inputs, so I built a tool for myself; it serves me well. You can find it on my GitHub: https://github.com/LearnFL/proj-python-chat-gpt-interface

The page explains how to use the script.

You specify how the prompt should be split by giving the desired input length in tokens. The task variable holds the instruction telling ChatGPT what you want done; it is prepended to each batch of text so the model can extract or process what you need.

prompt = """A VERY LONG TEXT ON HOW TO USE REGULAR EXPRESSIONS..."""
res = OpenAIAPI.generate(
    prompt, task="Explain how to use re", get='batches', method="chat",
    model="gpt-3.5-turbo-1106", token_size=4000)
print(res)
-2

You can use GPT-4, which supports longer contexts.

As stated by OpenAI:

GPT-4 is capable of handling over 25,000 words of text, allowing for use cases like long form content creation, extended conversations, and document search and analysis.
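Switching models is then just a change to the request body. Below is a sketch of the JSON payload such a call would send to the chat completions endpoint; the exact model name to use (e.g. the larger-context `gpt-4-32k` variant) depends on what your account has access to, and the transcript string is a placeholder.

```python
import json

transcript = "..."  # your long speech-to-text transcript goes here

payload = {
    "model": "gpt-4",  # or a larger-context variant where available
    "messages": [
        {"role": "system", "content": "Summarize the following conversation."},
        {"role": "user", "content": transcript},
    ],
}

# This is the body you would POST to https://api.openai.com/v1/chat/completions
body = json.dumps(payload)
```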

1
  • 1
    Please add a link that helps solve the problem. Thank you!
    – myhd
    Commented Aug 10, 2023 at 9:27
