Streaming chat completions from OpenAI
Intro
I have been playing around with OpenAI's APIs around GPT a lot lately. One major problem I ran into was when attempting to stream responses from their chat completions API for a personal project of mine. While this seems to be something straightforward, most solutions I found online did not work as of the time of writing.
Enabling streaming allows us to receive partial responses from an API as they are generated, instead of waiting for the entire response to be generated before receiving it. This is great for end users, as they can start looking at data before the entire response content is generated.
The code
After a lot of tweaking and refering to multiple blogs and forum posts, I managed to get this working using Node's fetch()
API, without using any third party libraries.
Note: I used Node v18 for running this code snippet, and it worked out of the box. If you're using an older version, you might need to install the node-fetch
library and import fetch
to make it work. Refer to this link on StackOverflow.
Here is the entire code in Typescript, I have added comments wherever I felt they would be useful:
async function* getStreamedCompletion(
prompt: string
): AsyncGenerator<any, void, unknown> {
const response = await makeChatCompletionRequest(prompt);
// Get a readable stream from the response
const reader = response.body!.getReader();
try {
while (true) {
// Read a chunk of data from the readable stream
const { done, value } = await reader.read();
// If the stream is done, break out of the loop
if (done) break;
// Convert the chunk of data to a string
const text = await new Response(value).text();
// Filter out the lines that don't start with "data: "
const lines = text
.split("\n")
.filter((line: string) => line.trim().startsWith("data: "));
for (const line of lines) {
const message = line.replace(/^data: /, "");
// If the message includes "[DONE]", break out of the loop
// this is specified in OpenAI's API doc
if (message.includes("[DONE]")) {
break;
}
// Parse the token from the message and yield it
const token = parseTokenFromMessage(message);
if (token) {
yield token;
}
}
}
} finally {
// Release the lock on the readable stream
reader.releaseLock();
}
}
function parseTokenFromMessage(message: string): string | undefined {
try {
const json = JSON.parse(message.trim());
const token = json.choices[0].delta.content;
return token;
} catch (error) {
console.log(`Failed to parse message: ${message}, error: ${error}`);
return undefined;
}
}
async function makeChatCompletionRequest(prompt: string) {
const messages = [{ role: "user", content: prompt }];
const body = {
model: "gpt-3.5-turbo",
messages,
stream: true,
};
return fetch("https://api.openai.com/v1/chat/completions", {
method: "POST",
body: JSON.stringify(body),
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${OPENAI_API_KEY}`,
},
});
}
While the getStreamedCompletion
function might look overwhelming, on going through it, you will realize that most of the logic is around parsing response tokens.
Usage
Note that you'll have to set the OPENAI_API_KEY
variable, you can get your API key here.
Keeping the underlying implementation as a generator enables us to use this simple piece of code to pass in any prompt and receive the streamed tokens:
async function main() {
const prompt = `Complete the following sentence: Jack and Jill`;
const stream = getStreamedCompletion(prompt);
for await (const token of stream) {
console.log(token);
}
}
main()
.then(() => console.log("Done"))
.catch((err) => console.log(err));
The entire code is also availabe as a Github gist here.
Happy streaming!