Streaming with REST API for LangChain Applications
Implementing a streaming endpoint for a LangChain application
Responses generated by LLM applications can take a while to complete depending on the length of the content, and the user experience suffers when the response is displayed only after generation has finished. This is where streaming comes into play: it allows the content to be displayed to the user as it is being generated, making the experience more engaging and responsive.
With LangChain4j, implementing a streaming API only requires the method in the agent interface to return TokenStream instead of String.
public interface CustomerServiceAgent {

    @SystemMessage({
        "You are a customer support officer of ABC insurance, Sydney Australia.",
        "Before providing information about a claim application, you MUST always check:",
        "correlationId or email. If the customer is in APPROVED state, please tell the customer that their claim will be settled in two weeks",
        "When retrieving all customers, display the records in a tabular format",
        "Today is {{current_date}}."
    })
    /**
     * Non-streaming
     */
    String chat(String userMessage);

    /**
     * Streaming
     */
    TokenStream chatStream(String userMessage);
}
The customer service agent can now serve both streaming and non-streaming requests, wired up as below:
@Bean
CustomerServiceAgent customerSupportAgent(StreamingChatLanguageModel streamingChatLanguageModel,
                                          ChatLanguageModel chatLanguageModel,
                                          ContentRetriever contentRetriever,
                                          CustomerService customerService) {
    return AiServices.builder(CustomerServiceAgent.class)
            .streamingChatLanguageModel(streamingChatLanguageModel)
            .chatLanguageModel(chatLanguageModel)
            .chatMemory(MessageWindowChatMemory.withMaxMessages(20))
            .tools(customerService)
            .contentRetriever(contentRetriever)
            .build();
}
@Bean
StreamingChatLanguageModel streamingModel() {
    return OpenAiStreamingChatModel.withApiKey(System.getenv("AI_OPENAI_API_KEY"));
}
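To see what TokenStream gives us before wiring it into a web layer, here is a minimal sketch of consuming it directly. It assumes a CustomerServiceAgent instance named agent obtained from the Spring context, and the question text is purely illustrative: tokens arrive through the onNext callback, and nothing happens until start() is called.

// Hypothetical snippet: agent is the CustomerServiceAgent bean built above
TokenStream stream = agent.chatStream("What is the status of my claim?");
stream.onNext(token -> System.out.print(token))               // invoked for every generated token
      .onComplete(response -> System.out.println("\n[done]")) // invoked once with the full response
      .onError(Throwable::printStackTrace)                    // invoked if generation fails
      .start();                                               // generation only begins here

The same callback-driven contract is what the WebFlux endpoint below bridges into a reactive stream.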
Next, simply add a Spring WebFlux wrapper REST endpoint to serve the streaming response as server-sent events.
@RestController
@CrossOrigin(origins = "*")
public class ChatController {

    private final CustomerServiceAgent agent;

    public ChatController(CustomerServiceAgent agent) {
        this.agent = agent;
    }

    @GetMapping("/chat")
    public String chat(@RequestParam("question") String question) {
        System.out.println("Submitted query: " + question);
        return agent.chat(question);
    }

    @GetMapping(value = "/chatstream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    @ResponseBody
    public Flux<String> chatStream(@RequestParam("question") String question) {
        Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();
        TokenStream ts = agent.chatStream(question);
        ts.onNext(sink::tryEmitNext)
          .onError(sink::tryEmitError)
          .start();
        return sink.asFlux();
    }
}
In the example above we have used Sinks, a very useful construct from Project Reactor, to push streaming data as a Mono or Flux. Every time a token is received from the CustomerServiceAgent, it is emitted via the sink as part of a stream of tokens.
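To make the bridge between the callback-based TokenStream and the reactive Flux clearer, here is a small, self-contained sketch of the same Sinks pattern in isolation. The SinksDemo class and the emitted strings are purely illustrative; the point is that values pushed imperatively into the sink are seen by the Flux subscriber as they arrive, exactly as the SSE client sees tokens.

import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

public class SinksDemo {
    public static void main(String[] args) {
        // A unicast sink accepts values pushed imperatively and exposes them as a Flux
        Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();
        Flux<String> flux = sink.asFlux();

        // The subscriber receives each value as it is emitted
        flux.subscribe(System.out::print,
                       Throwable::printStackTrace,
                       () -> System.out.println("\n[completed]"));

        sink.tryEmitNext("Hello");
        sink.tryEmitNext(", world");
        sink.tryEmitComplete();
    }
}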
A simple ReactJS Chat Interface
A simple ReactJS application can be quickly spun up using Next.js by following their instructions. For simplicity, we have implemented the chat UI in the default page page.tsx, with the streaming version in stream/page.tsx. To integrate with the locally running LangChain4j agent above, we can simply call the API and update the page:
const post = async () => {
    setResponse(() => "Please wait retrieving response ...");
    // Encode the question so spaces and special characters survive the query string
    const response = await fetch(`http://localhost:8080/chat?question=${encodeURIComponent(message)}`);
    const content = await response.text();
    setResponse(content);
}
For the streaming endpoint, we can make use of an EventSource instance to hold a persistent connection to the server and update the content of the page as the response tokens are received.
const post = async () => {
    setResponse(() => "Please wait retrieving response ...");
    const eventsrc = new EventSource(`http://localhost:8080/chatstream?question=${encodeURIComponent(message)}`);
    setResponse('');
    // Append each server-sent token to the response as it arrives
    eventsrc.onmessage = (msg) => {
        setResponse((response) => response + ' ' + msg.data);
    };
}
It was straightforward to implement a ReactJS chat application powered by an LLM using LangChain4j and Spring Boot. Source code for the app is provided below.