LLM Open Connector + MuleSoft + Cerebras

· 6 min read
Amir
Developer Advocate @ Salesforce

Learn how to implement Salesforce's LLM Open Connector with MuleSoft Anypoint Platform using its AI Chain Connector and Inference Connector.

This recipe implements an example of Cerebras Inference; however, the high-level process is the same for all models and providers.

note

The steps in this recipe use Anypoint Studio. For instructions using Anypoint Code Builder (ACB), see LLM Open Connector + MuleSoft + Ollama.

High-level Process

There are four high-level steps for connecting your MuleSoft model endpoint to the LLM Open Connector.

MuleSoft high-level process

Prerequisites + Tutorial Video

Before you begin, review the prerequisites:

  1. You have a MuleSoft Account (Sign up here).
  2. You have Anypoint Code Builder (ACB) or Anypoint Studio installed. The instructions in this recipe are based on Anypoint Studio.
  3. You have the Inference Connector installed.
  4. You have a Cerebras account with an API key.

Watch the step-by-step tutorial video, which walks through an implementation similar to the one in this recipe.

Step 1: Download the API Specification for the LLM Open Connector

  1. Download the LLM Open Connector API Spec.

  2. Rename the file from llm-open-connector.yml to llm-open-connector.yaml.
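If you'd rather script the rename, a minimal one-liner in Python (assuming the downloaded file is in your current directory):

from pathlib import Path

# Rename the downloaded spec so the .yaml extension is used consistently
Path("llm-open-connector.yml").rename("llm-open-connector.yaml")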

Step 2: Import the API Specification into Anypoint Design Center

  1. Log in to your MuleSoft account.

  2. Go to Anypoint Design Center.

  3. Import a new API specification from a file using these values:

    • Project Name: Cerebras-LLM-Provider
    • API Specification: Select REST API
    • File upload: llm-open-connector.yaml. Be sure to use the renamed file.
  4. Click Import.

  5. Verify that the API specification successfully imported.

Import API Specification

  6. Change termsOfService: "" to termsOfService: "-".

  7. Remove the servers: section and its example URL, shown below. (For a scripted alternative to steps 6 and 7, see the sketch after this list.)

servers:
- url: https://bring-your-own-llm.example.com
  8. Click Publish.

  9. Provide versioning information:

    • Asset version: 1.0.0
    • API version: v1
    • Lifecycle State: Stable

Publish to Exchange

  10. Click Publish to Exchange.
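If you prefer to make the edits from steps 6 and 7 in the YAML file itself before importing it (rather than in the Design Center editor), here is a minimal sketch in Python. PyYAML (pip install pyyaml) is an assumption and not part of the recipe; the result is the same two changes applied locally.

import yaml

# Load the renamed spec from Step 1
with open("llm-open-connector.yaml") as f:
    spec = yaml.safe_load(f)

spec["info"]["termsOfService"] = "-"  # Design Center rejects the empty string
spec.pop("servers", None)             # drop the placeholder server URL

with open("llm-open-connector.yaml", "w") as f:
    yaml.safe_dump(spec, f, sort_keys=False)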

Step 3: Implement the API Specification

This cookbook uses Anypoint Studio to implement the API Specification. If you prefer, you can also implement the spec with Anypoint Code Builder.

Import API Specification into Studio

  1. Open Anypoint Studio and create a Mule Project.

  2. Name the project and import an API spec:

    • Project Name: cerebras-llm-provider
    • Import a published API: Select the Cerebras-LLM-Provider API spec from the previous step.

Add API to Mule Project

  3. Click Finish.

Add the Inference Connector to Your Project

  1. If you have not installed the Inference Connector, install it before you start.
  2. Add the Inference Connector dependency to the Mule project:

<dependency>
    <groupId>com.mulesoft.connectors</groupId>
    <artifactId>mac-inference-chain</artifactId>
    <version>0.1.0</version>
    <classifier>mule-plugin</classifier>
</dependency>
  3. Make sure the Inference Connector is present in the Mule Palette.

Mule Palette

Implement the Chat Completions Endpoint

  1. Go to the scaffolded flow post:\chat\completions:application\json:llm-open-connector-config.

  2. Drag and drop the Chat completions operation from the Inference Connector into the flow.

  3. Provide the Inference Connector configuration for Cerebras.

  4. Parameterize all properties needed by the LLM Open Connector API spec.

Configuration Params

  5. In the Chat completions operation, enter payload.messages.

  6. Before the Chat completions operation, add a Set Variable operation named model with the expression payload.model as its value.

  7. After the Chat completions operation, add a Transform Message operation and provide the mapping in this code block:

%dw 2.0
output application/json
---
{
    // Generate a pseudo-unique id from the epoch seconds
    id: "chatcmpl-" ++ now() as Number as String,
    created: now() as Number,
    // Map the connector's token usage attributes to the LLM Open Connector schema
    usage: {
        completion_tokens: attributes.tokenUsage.outputCount,
        prompt_tokens: attributes.tokenUsage.inputCount,
        total_tokens: attributes.tokenUsage.totalCount
    },
    // Echo back the model name captured in the model variable
    model: vars.model,
    choices: [
        {
            finish_reason: "stop",
            index: 0,
            message: {
                content: payload.response default "",
                role: "assistant"
            }
        }
    ],
    object: "chat.completion"
}
  8. Save the project.

Test Locally

  1. Start the Mule Application.

  2. Go to the API Console.

Configuration Params

  3. Enter the API key and the following payload:

{
    "messages": [
        {
            "content": "What is the capital of Switzerland?",
            "role": "user",
            "name": ""
        }
    ],
    "model": "llama3.1-8b",
    "max_tokens": 500,
    "n": 1,
    "temperature": 0.7,
    "parameters": {
        "top_p": 0.5
    }
}
  4. Validate the result. Make sure the token usage values are mapped correctly.
{
    "id": "chatcmpl-1732373228",
    "created": 1732373228,
    "usage": {
        "completion_tokens": 8,
        "prompt_tokens": 42,
        "total_tokens": 50
    },
    "model": "llama3.1-8b",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The capital of Switzerland is Bern.",
                "role": "assistant"
            }
        }
    ],
    "object": "chat.completion"
}
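If you want to test outside the API Console, you can call the endpoint directly. Below is a minimal sketch using Python's requests library. The localhost:8081 port, the /api base path, and the api-key header name are assumptions based on a typical generated listener configuration; adjust them to match your HTTP listener and the security scheme in your imported spec.

import requests

payload = {
    "messages": [
        {"content": "What is the capital of Switzerland?", "role": "user", "name": ""}
    ],
    "model": "llama3.1-8b",
    "max_tokens": 500,
    "n": 1,
    "temperature": 0.7,
    "parameters": {"top_p": 0.5},
}

response = requests.post(
    "http://localhost:8081/api/chat/completions",  # assumed host, port, and base path
    headers={"api-key": "<your_cerebras_api_key>"},  # assumed header name; check your spec
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])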

Step 4: Deploy to Anypoint CloudHub

  1. After the application is tested successfully, deploy it to Anypoint CloudHub.

  2. Right-click your Mule Project and navigate to Anypoint Platform > Deploy to CloudHub.

  3. Choose the environment you want to deploy to.

  4. Enter the required values:

    • App name: cerebras-llm-provider
    • Deployment target: Shared Space (CloudHub 2.0)
    • Replica Count: 1
    • Replica Size: Micro (0.1 vCore)
  5. Click Deploy Application.

  6. Wait until the application is deployed.

Deployed App

note

If you receive the error [The asset is invalid, Error while trying to set type: app. Expected type is: rest-api.], go to Exchange and delete or rename the asset. This error is a known issue.

Create a Configuration in Model Builder

After your API is running on CloudHub, you need to add the model endpoint to Model Builder.

  1. In Salesforce, open Data Cloud.

  2. Navigate to Einstein Studio.

  3. Click Add Foundation Model, and click Connect to your LLM.

  4. Click Next.

  5. Enter the required values:

    • Name: Cerebras-LLM-Provider
    • URL: <cloudhub_url>/api
    • Model: A model name is required. For this recipe, choose either llama3.1-70b or llama3.1-8b.

Model Builder

  6. Click Connect.

  7. Create a configuration for each supported model (a scripted check of both models follows the screenshot below):

    • llama3.1-70b
    • llama3.1-8b

Model Builder Playground
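Before testing in the Model Builder playground, it can help to confirm that the deployed API answers for both configured models. Here is a minimal sketch along the same lines as the local test; <cloudhub_url> is the placeholder from the steps above, and the api-key header name is again an assumption to verify against your spec.

import requests

CLOUDHUB_URL = "https://<cloudhub_url>/api"  # replace with your deployed app's URL

# Send one short prompt per configured model and print the HTTP status
for model in ["llama3.1-70b", "llama3.1-8b"]:
    resp = requests.post(
        f"{CLOUDHUB_URL}/chat/completions",
        headers={"api-key": "<your_cerebras_api_key>"},  # assumed header name
        json={
            "messages": [{"content": "Say hello.", "role": "user", "name": ""}],
            "model": model,
            "max_tokens": 20,
            "n": 1,
            "temperature": 0.7,
            "parameters": {"top_p": 0.5},
        },
        timeout=60,
    )
    print(model, resp.status_code)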

Important Considerations

  1. This cookbook uses Cerebras models llama3.1-70b and llama3.1-8b.
  2. When configuring the model in Model Builder, you must provide a default value for the model. In this recipe the model name is parameterized, so a value is required.
  3. The API is deployed under the governance of the MuleSoft Anypoint Platform. As a result:
    • You can monitor the application by viewing logs and errors.
    • You can apply additional security through Anypoint's API management capabilities.
    • You can scale horizontally by deploying multiple replicas, and vertically by increasing the replica size.

Conclusion

This cookbook demonstrates how to set up an LLM Open Connector using MuleSoft for the Cerebras chat completions endpoint. This recipe is a sandbox implementation and is not production ready. For production use, optimize your implementation based on your specific requirements and expected usage patterns.