Model HQ

Documentation

API Reference

The Model HQ API provides programmatic access to the core Model HQ platform with APIs for model inference, RAG and Agent processing.

How to access

Model HQ App (with UI) - to enable programmatic API access, switch to "backend mode", which can be found on the main toolbar in the upper right-hand corner of the app. You can configure the backend before launching: host (localhost vs. external IP; localhost by default), port (8088 by default), and an optional trusted_key to be used when calling an API. The page also provides a link to the Model HQ Client SDK, which you should download and open as a new project in your favorite IDE. After switching to backend mode, models and agents created in the UI can be accessed through the APIs. If you set a trusted_key in the configuration, the key is checked and validated on each API call (you can leave it blank for development use).

Model HQ Dev (no UI) - this product provides the backend development server directly with only programmatic access, and can be started, stopped, and configured entirely with code (requires a separate license). The Model HQ Client SDK is included with the product, along with separate directions for activating the license key on first use.

Model HQ Server (Linux - no UI) - this is a scalable, multi-user API server with a larger model catalog and an enhanced set of RAG capabilities (requires a separate license).

Most of the text below was written from the perspective of using the APIs on device (Model HQ App, or Model HQ Dev), although the APIs apply to Model HQ Server as well.
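The host and port settings described for backend mode combine into the base URL that a client connects to. The following is a minimal sketch, assuming plain HTTP; `build_endpoint` is a hypothetical helper name and is not part of the SDK. Its defaults mirror the documented localhost and 8088 values.

```python
def build_endpoint(host: str = "localhost", port: int = 8088) -> str:
    """Return the base URL for a Model HQ backend (hypothetical helper)."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Assumption: the backend is served over plain HTTP.
    return f"http://{host}:{port}"


print(build_endpoint())                      # default on-device configuration
print(build_endpoint("192.168.1.20", 9000))  # external IP with a custom port
```

The resulting string can be passed as the api_endpoint when creating the client, in place of the auto-detected value.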

Model HQ Client SDK

The client SDK exposes the APIs through a Python interface to make it easy to integrate into other Agent, RAG and generative AI pipelines.

The first step is to create a client, similar to OpenAI and other API-based model services.

In most cases, you can use the auto-detect setup convenience function get_url_string() to automatically connect to the server.

from llmware_client_sdk import LLMWareClient, get_url_string

# create client interface into model hq windows background server
api_endpoint = get_url_string()

# alt: direct
# api_endpoint = "http://localhost:8088"

client = LLMWareClient(api_endpoint=api_endpoint)

Once you have created the client, API calls can be initiated through methods implemented on the client.

Getting Started

Model inferencing takes place on device, and upon first invocation of a selected model, the model will be pulled from a secure LLMWare repository, and cached onto the local device. Depending upon the size of the model and the wifi/network connection, it can take between 30 seconds and a few minutes to download the model the first time. After that, upon each subsequent use, the model will be loaded from disk, and generally takes no more than a few seconds to load.
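Because the first call to a model includes the download and later calls only load from disk, it can be useful to measure how long a call takes. The sketch below is one way to do that with a generic wrapper; `timed_call` and `fake_inference` are hypothetical names, and `fake_inference` only stands in for a real client method so the example runs without a server.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_inference(prompt: str) -> dict:
    # Stand-in for client.inference so the sketch runs without a server.
    return {"llm_response": f"echo: {prompt}"}


response, seconds = timed_call(fake_inference, prompt="hello")
print(f"took {seconds:.3f}s -> {response}")
```

With a real client, the same wrapper could be applied as `timed_call(client.inference, prompt=..., model_name=...)` and run twice to compare the first and subsequent invocations.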

Example #1 - Inference - this is the core API for accessing a model

prompt = "What are the best sites to see in France?"
model_name = "llama-3.2-1b-instruct-ov"

print("\nStarting 'Hello World'")
print(f"Prompt: {prompt}")
print(f"Model: {model_name}\n")

# this is the main inference API

response = client.inference(prompt=prompt, model_name=model_name, max_output=100, trusted_key="")

# note: generation stops at 100 output tokens, and the response is returned complete (i.e., not streamed)

print("\nhello world test # 1 - inference response: ", response)

Example #2 - Stream - this is the streaming version of the core model inference API

model_name = "llama-3.2-3b-instruct-ov"
prompt = "What are the main theoretical challenges with quantum gravity?"

print("\nRunning Model Locally in Streaming Mode")
print(f"Prompt: {prompt}")
print(f"Model: {model_name}\n")

# the stream method is called and consumed as a generator function

for token in client.stream(prompt=prompt,
                           model_name=model_name,
                           max_output=300,
                           trusted_key=""):

    print(token, end="")

For many use cases, just using the two APIs above will give you the ability to easily access and integrate a wide range of models.
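When a streamed response is also needed as a single string (for logging or post-processing), the tokens can be accumulated while they are consumed. This is a minimal sketch, assuming the stream yields plain strings as in the example above; `collect_stream` and `fake_stream` are hypothetical names, with `fake_stream` standing in for `client.stream(...)`.

```python
from typing import Iterable


def collect_stream(tokens: Iterable[str], echo: bool = False) -> str:
    """Consume a token generator and return the concatenated text."""
    parts = []
    for token in tokens:
        if echo:
            # Print each token as it arrives, without a trailing newline.
            print(token, end="", flush=True)
        parts.append(token)
    return "".join(parts)


def fake_stream():
    # Stand-in for client.stream(...) so the sketch runs without a server.
    yield from ["Quantum ", "gravity ", "is hard."]


full_text = collect_stream(fake_stream())
print(full_text)
```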

Example #3 - Finding Models

There are several key utility APIs that help to find available models:

list_all_models - returns the models in the catalog; generally, each model can be invoked through inference/stream using its unique model_name identifier.

# what models are available? 

model_list = client.list_all_models()

# print the list to the screen
print("model list: ", model_list)
for i, mod in enumerate(model_list["response"]):
    print("model: ", i, mod)

model_lookup - look up a specific model, with more details from the model card

print("\nmodel lookup example\n")

# optional keyword arguments (e.g., trusted_key) can also be passed
response = client.model_lookup(model_name)

print("response: ", response)

model_load - load a selected model into memory

print("\nmodel load test example\n")

# optional keyword arguments (e.g., trusted_key) can also be passed
response = client.model_load(model_name)

print("response: ", response)

model_unload - unload a selected model from memory

print("\nmodel unload test example\n")

# optional keyword arguments (e.g., trusted_key) can also be passed
response = client.model_unload(model_name)

print("response: ", response)
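Explicit load and unload calls pair naturally, so one option is to wrap them in a context manager that always unloads the model, even if the work in between raises an error. The sketch below is illustrative only: `loaded_model` is a hypothetical helper, and `FakeClient` is a stand-in that records calls so the example runs without a server. It assumes the client accepts model_name as a keyword argument, as in the reference examples.

```python
from contextlib import contextmanager


@contextmanager
def loaded_model(client, model_name: str):
    """Load a model on entry and unload it on exit (hypothetical helper)."""
    client.model_load(model_name=model_name)
    try:
        yield model_name
    finally:
        # Runs even if the body raises, so memory is always released.
        client.model_unload(model_name=model_name)


class FakeClient:
    """Records calls in order; stands in for LLMWareClient in this sketch."""

    def __init__(self):
        self.calls = []

    def model_load(self, model_name):
        self.calls.append(("load", model_name))

    def model_unload(self, model_name):
        self.calls.append(("unload", model_name))


fake = FakeClient()
with loaded_model(fake, "llama-3.2-1b-instruct-ov") as name:
    fake.calls.append(("use", name))
print(fake.calls)
```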

RAG

In addition to pure model inferencing, there are several methods provided to integrate documents into an inference. A more enhanced set of RAG capabilities (including vector db and full semantic search) is provided on the Model HQ Server.

document_inference - ask a question about a document over the API

# one-step RAG API call

import os

document_path = os.path.abspath(".\\modelhq_client\\sample_files\\Bia EXECUTIVE EMPLOYMENT AGREEMENT.pdf")
question = "What is the annual rate of the base salary?"
model_name = "llama-3.2-3b-instruct-ov"

print(f"\n\nRAG Example - {question}\n")

response = client.document_inference(document_path, question, model_name=model_name)

print("document inference response - ", response['llm_response'])

Agents

You can create an agent process in the UI, and then invoke and run the agent over API as follows:

# selected process
process_name = "intake_processing"

# input to the agent
import os

fp = os.path.abspath(".\\sample_files\\customer_transcript_1.txt")
with open(fp, "r") as f:
    text = f.read()

# call and run the agent process
response = client.run_agent(process_name=process_name, text=text, trusted_key="")

print("--run_agent intake_processing: ", response)

get_all_agents - provides a list of agents available

# show all agents available on the background server

agent_response = client.get_all_agents()

for i, agent in enumerate(agent_response["response"]):
    print("--agents available: ", i, agent)

Useful Admin Functions

ping - check that the platform is running and that the client is connected

response = client.ping()

print("response: ", response)

system_info - get information about the system

# get system info
x = client.system_info()
print("system info: ", x)

# get details about the backend process
details = get_server_details()
print("server details: ", details)

stop_server - stop the Model HQ platform (running as a background service on Windows)

stop_server()

Trusted Key

Since the Model HQ platform is designed for self-hosted deployment (generally for internal enterprise users rather than consumer-scale deployment), separate API key implementations can be layered on top of the platform. The platform itself provides an easy-to-configure 'trusted_key' parameter, which can be set when launching the backend platform.

Note: for most development-stage activities, this can be skipped entirely, and no trusted_key needs to be set, especially for on-device use.
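When a trusted_key is configured, one way to avoid hard-coding it in scripts is to read it from an environment variable and attach it to each call. The following is a sketch under stated assumptions: the variable name `MODEL_HQ_TRUSTED_KEY` and both helper functions are hypothetical, not part of the SDK.

```python
import os
from typing import Optional


def trusted_key_from_env(var_name: str = "MODEL_HQ_TRUSTED_KEY") -> str:
    """Read the key from the environment; empty string if unset."""
    return os.environ.get(var_name, "")


def with_trusted_key(kwargs: dict, key: Optional[str] = None) -> dict:
    """Return a copy of kwargs with trusted_key filled in."""
    merged = dict(kwargs)
    merged["trusted_key"] = trusted_key_from_env() if key is None else key
    return merged


# Demo only: in practice the variable is set outside the script.
os.environ["MODEL_HQ_TRUSTED_KEY"] = "example-key"

call_kwargs = with_trusted_key(
    {"prompt": "hi", "model_name": "llama-3.2-1b-instruct-ov"}
)
print(sorted(call_kwargs))
```

The merged dictionary could then be unpacked into a call such as `client.inference(**call_kwargs)`.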

Models

Core inference endpoints for text generation, vision analysis, and specialized model functions.

POST

/stream/

Create stream

Generate a streaming inference response from a selected model. The response will be streamed back in real-time as the model generates tokens.

Request body

Required

model_name

string

required

The name of the model to use for inference

prompt

string

required

The input text prompt to generate a response for

Optional

max_output

integer

default: 100

Maximum number of tokens to generate

temperature

number

default: 0.0

Controls randomness in generation (0.0-1.0)

sample

boolean

default: true

Whether to use sampling for generation

context

string

Additional context to provide to the model

api_key

string

Your API authentication key

trusted_key

string

Alternative trusted authentication key

Returns

llm_response

string

Partial response text streamed incrementally as server-sent events

Timeout: This endpoint has a timeout of 60 seconds. The connection remains open for streaming responses.

Request

# Example variables
model_name = "phi-3-ov"
prompt = "Explain quantum computing in simple terms"
max_output = 300
temperature = 0.7
api_key = "your-api-key"  # Optional

print("\nStarting 'Create stream'")
print(f"Model_name: {model_name}")
print(f"Prompt: {prompt}")
print(f"Max_output: {max_output}")
print(f"Temperature: {temperature}")

# This is the streaming inference API
stream = client.stream(model_name=model_name, prompt=prompt, max_output=max_output, temperature=temperature, api_key=api_key)
for chunk in stream:
    print(chunk)

Response

// Streaming response format
data: {"llm_response": "Quantum computing is a revolutionary approach..."}

Additional Information

• Category: models
• Timeout: 60 seconds
• Supports streaming responses
• Requires authentication
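The client SDK consumes the stream for you, but when calling /stream/ over raw HTTP the `data: {...}` lines shown in the response format have to be parsed by hand. The sketch below shows one way to do that; `iter_llm_chunks` is a hypothetical helper, and it assumes each event arrives as a single `data:` line containing a JSON object with an llm_response field.

```python
import json
from typing import Iterable, Iterator


def iter_llm_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the llm_response text from each server-sent event line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = json.loads(line[len("data:"):].strip())
        yield payload.get("llm_response", "")


# Sample lines in the documented format, standing in for a live response.
raw_lines = [
    'data: {"llm_response": "Quantum computing "}',
    "",
    'data: {"llm_response": "uses qubits."}',
]
text = "".join(iter_llm_chunks(raw_lines))
print(text)
```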

POST

/inference/

Create inference

Generate a complete inference response from a selected model. The response will be the complete generation from the model returned as a single response.

Request body

Required

prompt

string

required

The input text prompt to generate a response for

model_name

string

required

The name of the model to use for inference

Optional

max_output

integer

default: 100

Maximum number of tokens to generate

temperature

number

default: 0.7

Controls randomness in generation (0.0-1.0)

sample

boolean

default: true

Whether to use sampling for generation

api_key

string

Your API authentication key

context

string

Additional context to provide to the model

params

object

Additional model-specific parameters

fx

string

Function execution parameters

trusted_key

string

Alternative trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
prompt = "Write a short story about artificial intelligence"
model_name = "llama-3.2-3b-instruct-ov"
max_output = 100
temperature = 0.8
api_key = "your-api-key"  # Optional

print("\nStarting 'Create inference'")
print(f"Prompt: {prompt}")
print(f"Model_name: {model_name}")
print(f"Max_output: {max_output}")
print(f"Temperature: {temperature}")

# This is the main inference API
response = client.inference(prompt=prompt, model_name=model_name, max_output=max_output, temperature=temperature, api_key=api_key)
print("\nCreate inference response: ", response)

Response

{
  "llm_response": "Artificial intelligence represents one of humanity's greatest..."
}

Additional Information

• Category: models
• Timeout: 60 seconds
• Requires authentication
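For callers that do not use the Python SDK, the request can in principle be assembled directly from the parameters listed for /inference/. This is only a sketch: it assumes the server accepts a JSON body, which this reference does not confirm, and `build_inference_request` is a hypothetical helper. The request object is built but not sent, so the example runs without a server.

```python
import json
import urllib.request


def build_inference_request(base_url: str, prompt: str, model_name: str,
                            max_output: int = 100, temperature: float = 0.7,
                            trusted_key: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request for /inference/."""
    if not prompt or not model_name:
        raise ValueError("prompt and model_name are required")
    body = {
        "prompt": prompt,
        "model_name": model_name,
        "max_output": max_output,    # documented default: 100
        "temperature": temperature,  # documented default: 0.7
        "trusted_key": trusted_key,
    }
    # Assumption: JSON encoding; the actual wire format may differ.
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/inference/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_inference_request(
    "http://localhost:8088", "Hello", "llama-3.2-1b-instruct-ov"
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of `urllib.request.urlopen(req, timeout=60)`, with the timeout matching the 60 seconds documented for this endpoint.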

POST

/function_call/

Function call

Execute a specialized function call with SLIM model for structured outputs and specific tasks.

Request body

Required

model_name

string

required

The SLIM model name to use for function calling

context

string

required

The context or input text for the function

Optional

prompt

string

Additional prompt instructions

params

object

Function-specific parameters

function

string

Specific function to execute

api_key

string

Your API authentication key

get_logits

boolean

Whether to return model logits

max_output

integer

Maximum tokens to generate

temperature

number

Sampling temperature

sample

boolean

Whether to use sampling

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
model_name = "phi-3-ov"  # placeholder name; substitute a SLIM function-calling model from your catalog
context = "John Smith works at Acme Corp as a Software Engineer. He can be reached at john@acme.com."
function = "extract_entities"
api_key = "your-api-key"  # Optional

print("\nStarting 'Function call'")
print(f"Model_name: {model_name}")
print(f"Context: {context}")
print(f"Function: {function}")

# This is the main function_call API
response = client.function_call(model_name=model_name, context=context, function=function, api_key=api_key)
print("\nFunction call response: ", response)

Response

{
  "llm_response": {
    "name": "John Smith",
    "company": "Acme Corp",
    "role": "Software Engineer",
    "email": "john@acme.com"
  }
}

Additional Information

• Category: models
• Timeout: 60 seconds
• Requires authentication

POST

/sentiment/

Sentiment analysis

Execute sentiment analysis using a specialized SLIM sentiment model.

Request body

Required

context

string

required

The text to analyze for sentiment

Optional

model_name

string

Sentiment model to use

get_logits

boolean

Whether to return model logits

max_output

integer

Maximum tokens to generate

temperature

number

Sampling temperature

sample

boolean

Whether to use sampling

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
context = "I absolutely love this new product! It's amazing and works perfectly."
api_key = "your-api-key"  # Optional

print("\nStarting 'Sentiment analysis'")
print(f"Context: {context}")

# This is the main sentiment API
response = client.sentiment(context=context, api_key=api_key)
print("\nSentiment analysis response: ", response)

Response

{
  "llm_response": {
    "sentiment": "positive",
    "confidence": 0.95
  }
}

Additional Information

• Category: models
• Timeout: 60 seconds
• Requires authentication

POST

/extract/

Extract information

Execute information extraction using a SLIM extract model to pull specific data from text.

Request body

Required

context

string

required

The text to extract information from

extract_keys

array

required

List of keys/fields to extract from the text

Optional

get_logits

boolean

Whether to return model logits

max_output

integer

Maximum tokens to generate

temperature

number

Sampling temperature

sample

boolean

Whether to use sampling

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
context = "Invoice #12345 dated March 15, 2024. Total amount: $1,250.00. Customer: ABC Corp."
extract_keys = ["invoice_number","date","total_amount","customer"]
api_key = "your-api-key"  # Optional

print("\nStarting 'Extract information'")
print(f"Context: {context}")
print(f"Extract_keys: {extract_keys}")

# This is the main extract API
response = client.extract(context=context, extract_keys=extract_keys, api_key=api_key)
print("\nExtract information response: ", response)

Response

{
  "llm_response": {
    "invoice_number": "12345",
    "date": "March 15, 2024",
    "total_amount": "$1,250.00",
    "customer": "ABC Corp"
  }
}

Additional Information

• Category: models
• Timeout: 60 seconds
• Requires authentication
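Since an extraction model may not find every requested field, it can help to check the result against the extract_keys that were sent. A minimal sketch, assuming the response has the shape shown in the example above (a dictionary under llm_response); `missing_keys` is a hypothetical helper.

```python
def missing_keys(response: dict, extract_keys: list) -> list:
    """Return the requested keys that are absent or empty in the response."""
    extracted = response.get("llm_response") or {}
    if not isinstance(extracted, dict):
        # Unexpected shape: treat every key as missing.
        return list(extract_keys)
    return [k for k in extract_keys if extracted.get(k) in (None, "", [])]


sample_response = {
    "llm_response": {
        "invoice_number": "12345",
        "date": "March 15, 2024",
        "total_amount": "$1,250.00",
        "customer": "",
    }
}
keys = ["invoice_number", "date", "total_amount", "customer", "due_date"]
gaps = missing_keys(sample_response, keys)
print(gaps)
```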

POST

/vision/

Vision inference

Execute vision model inference to analyze and describe images with text prompts.

Request body

Required

uploaded_files

array

required

Array of image files to analyze

prompt

string

required

Text prompt describing what to analyze in the image

Optional

max_output

integer

Maximum tokens to generate

model_name

string

Vision model to use

temperature

number

Sampling temperature

sample

boolean

Whether to use sampling

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 360 seconds.

Request

# Example variables
uploaded_files = ["image1.jpg"]
prompt = "Describe what you see in this image"
model_name = "mistral-7b-instruct-v0.3-ov"  # placeholder name; substitute a vision-capable model from your catalog
api_key = "your-api-key"  # Optional

print("\nStarting 'Vision inference'")
print(f"Uploaded_files: {uploaded_files}")
print(f"Prompt: {prompt}")
print(f"Model_name: {model_name}")

# This is the main vision API
response = client.vision(uploaded_files=uploaded_files, prompt=prompt, model_name=model_name, api_key=api_key)
print("\nVision inference response: ", response)

Response

{
  "llm_response": "I can see a beautiful landscape with mountains in the background..."
}

Additional Information

• Category: models
• Timeout: 360 seconds
• Requires authentication

POST

/vision_stream/

Vision stream

Generate a streaming inference response from vision model for real-time image analysis.

Request body

Required

uploaded_files

array

required

Array of image files to analyze

model_name

string

required

Vision model to use for streaming

prompt

string

required

Text prompt for image analysis

Optional

max_output

integer

Maximum tokens to generate

temperature

number

Sampling temperature

sample

boolean

Whether to use sampling

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Partial response text streamed incrementally as server-sent events

Timeout: This endpoint has a timeout of 360 seconds. The connection remains open for streaming responses.

Request

# Example variables
uploaded_files = ["image1.jpg"]
model_name = "llama-3.2-3b-instruct-ov"  # placeholder name; substitute a vision-capable model from your catalog
prompt = "Analyze this image and describe the scene"
api_key = "your-api-key"  # Optional

print("\nStarting 'Vision stream'")
print(f"Uploaded_files: {uploaded_files}")
print(f"Model_name: {model_name}")
print(f"Prompt: {prompt}")

# This is the streaming vision API
stream = client.vision_stream(uploaded_files=uploaded_files, model_name=model_name, prompt=prompt, api_key=api_key)
for chunk in stream:
    print(chunk)

Response

// Streaming response format
data: {"llm_response": "The image shows a bustling city street..."}

Additional Information

• Category: models
• Timeout: 360 seconds
• Supports streaming responses
• Requires authentication

POST

/rank/

Semantic ranking

Execute semantic similarity ranking with reranker model to rank documents by relevance to a query.

Request body

Required

query

string

required

The search query to rank documents against

documents

array

required

Array of documents to rank

Optional

model_name

string

Reranker model to use

text_chunk_size

integer

Size of text chunks for processing

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
query = "machine learning algorithms"
documents = ["Document about neural networks","Article on cooking recipes","Paper on deep learning"]
api_key = "your-api-key"  # Optional

print("\nStarting 'Semantic ranking'")
print(f"Query: {query}")
print(f"Documents: {documents}")

# This is the main rank API
response = client.rank(query=query, documents=documents, api_key=api_key)
print("\nSemantic ranking response: ", response)

Response

{
  "llm_response": [
    {
      "doc_id": 0,
      "score": 0.95
    },
    {
      "doc_id": 2,
      "score": 0.87
    },
    {
      "doc_id": 1,
      "score": 0.12
    }
  ]
}

Additional Information

• Category: models
• Timeout: 60 seconds
• Requires authentication
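The ranking response refers to documents by doc_id, so a common follow-up step is to map the scores back to the original texts. The sketch below assumes doc_id is the zero-based index into the documents list that was submitted, which is consistent with the example above but is not stated explicitly; `ranked_documents` is a hypothetical helper.

```python
def ranked_documents(response: dict, documents: list) -> list:
    """Return (document, score) pairs, highest score first."""
    rows = sorted(response["llm_response"], key=lambda r: r["score"], reverse=True)
    # Assumption: doc_id indexes into the submitted documents list.
    return [(documents[r["doc_id"]], r["score"]) for r in rows]


documents = [
    "Document about neural networks",
    "Article on cooking recipes",
    "Paper on deep learning",
]
sample_response = {
    "llm_response": [
        {"doc_id": 0, "score": 0.95},
        {"doc_id": 2, "score": 0.87},
        {"doc_id": 1, "score": 0.12},
    ]
}
ranking = ranked_documents(sample_response, documents)
for doc, score in ranking:
    print(f"{score:.2f}  {doc}")
```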

POST

/classify/

Text classification

Execute text classification inference, primarily used for safety controls and content moderation.

Request body

Required

model_name

string

required

Classification model to use

context

string

required

Text to classify

Optional

text_chunk_size

integer

Size of text chunks for processing

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
model_name = "phi-4-ov"
context = "This is a sample text to classify for safety"
api_key = "your-api-key"  # Optional

print("\nStarting 'Text classification'")
print(f"Model_name: {model_name}")
print(f"Context: {context}")

# This is the main classify API
response = client.classify(model_name=model_name, context=context, api_key=api_key)
print("\nText classification response: ", response)

Response

{
  "llm_response": {
    "classification": "safe",
    "confidence": 0.98
  }
}

Additional Information

• Category: models
• Timeout: 60 seconds
• Requires authentication

POST

/embedding/

Generate embeddings

Generate vector embeddings for text using embedding models for semantic search and similarity.

Request body

Required

model_name

string

required

Embedding model to use

context

string

required

Text to generate embeddings for

Optional

text_chunk_size

integer

Size of text chunks for processing

trusted_key

string

Trusted authentication key

Returns

embeddings

array

Vector embedding generated for the input text

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
model_name = "llama-3.2-3b-instruct-ov"  # placeholder name; substitute an embedding model from your catalog
context = "This is a sample text to embed"
api_key = "your-api-key"  # Optional

print("\nStarting 'Generate embeddings'")
print(f"Model_name: {model_name}")
print(f"Context: {context}")

# This is the main embedding API
response = client.embedding(model_name=model_name, context=context, api_key=api_key)
print("\nGenerate embeddings response: ", response)

Response

{
  "embeddings": [
    0.1,
    -0.2,
    0.3,
    0.4,
    -0.1
  ]
}

Additional Information

• Category: models
• Timeout: 60 seconds
• Requires authentication
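Embeddings are typically compared with cosine similarity when used for semantic search. The following sketch computes it in plain Python for two vectors of the kind returned under embeddings; `cosine_similarity` is a helper written for this illustration, and the second vector is made-up sample data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


vec_a = [0.1, -0.2, 0.3, 0.4, -0.1]   # values from the sample response
vec_b = [0.2, -0.1, 0.3, 0.5, 0.0]    # made-up second vector
print(round(cosine_similarity(vec_a, vec_b), 3))
```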

POST

/list_all_models/

List models

Returns a list of all models available on the server.

Request body

Optional

trusted_key

string

Trusted authentication key

Returns

response

array

List of the model names available on the server

Timeout: This endpoint has a timeout of 3 seconds.

Request

# Example variables
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'List models'")

# This is the main list_all_models API
response = client.list_all_models(trusted_key=trusted_key)
print("\nList models response: ", response)

Response

{
  "response": [
    "llama-3.2-1b-instruct-ov",
    "llama-3.2-3b-instruct-ov",
    "mistral-7b-instruct-v0.3-ov",
    "phi-4-ov"
  ]
}

Additional Information

• Category: models
• Timeout: 3 seconds
• Requires authentication
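The model list can be used to choose a model that is actually present before calling inference. A minimal sketch, assuming the entries under response are plain model-name strings as in the sample above; `pick_model` is a hypothetical helper.

```python
from typing import Optional


def pick_model(model_list: dict, preferred: list) -> Optional[str]:
    """Return the first preferred model found in the catalog, else None."""
    available = model_list.get("response", [])
    for name in preferred:
        if name in available:
            return name
    return None


sample_list = {
    "response": [
        "llama-3.2-1b-instruct-ov",
        "llama-3.2-3b-instruct-ov",
        "mistral-7b-instruct-v0.3-ov",
        "phi-4-ov",
    ]
}
choice = pick_model(sample_list, ["phi-3-ov", "llama-3.2-3b-instruct-ov"])
print(choice)
```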

POST

/system_info/

System information

Returns key information about the system and server configuration.

Request body

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Key information about the system and server configuration

Timeout: This endpoint has a timeout of 3 seconds.

Request

# Example variables
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'System information'")

# This is the main system_info API
response = client.system_info(trusted_key=trusted_key)
print("\nSystem information response: ", response)

Response

{
  "response": {
    "version": "1.0.0",
    "gpu_count": 2,
    "memory": "32GB",
    "status": "active"
  }
}

Additional Information

• Category: models
• Timeout: 3 seconds
• Requires authentication

POST

/model_lookup/

Model information

Returns detailed model card information about a selected model.

Request body

Required

model_name

string

required

Name of the model to look up

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Model card details for the selected model

Timeout: This endpoint has a timeout of 3 seconds.

Request

# Example variables
model_name = "mistral-7b-instruct-v0.3-ov"
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Model information'")
print(f"Model_name: {model_name}")

# This is the main model_lookup API
response = client.model_lookup(model_name=model_name, trusted_key=trusted_key)
print("\nModel information response: ", response)

Response

{
  "response": {
    "name": "mistral-7b-instruct-v0.3-ov",
    "parameters": "7B",
    "context_length": 4096,
    "loaded": true
  }
}

Additional Information

• Category: models
• Timeout: 3 seconds
• Requires authentication

POST

/model_load/

Load model

Explicitly loads a selected model into memory on the API server, useful as a preparation step.

Request body

Required

model_name

string

required

Name of the model to load into memory

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Status of the load operation for the selected model

Timeout: This endpoint has a timeout of 120 seconds.

Request

# Example variables
model_name = "phi-4-ov"
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Load model'")
print(f"Model_name: {model_name}")

# This is the main model_load API
response = client.model_load(model_name=model_name, trusted_key=trusted_key)
print("\nLoad model response: ", response)

Response

{
  "response": {
    "model_name": "phi-4-ov",
    "status": "loaded",
    "memory_usage": "6.2GB"
  }
}

Additional Information

• Category: models
• Timeout: 120 seconds
• Requires authentication

POST

/model_unload/

Unload model

Explicitly unloads a selected model from memory on the API server.

Request body

Required

model_name

string

required

Name of the model to unload from memory

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Status of the unload operation for the selected model

Timeout: This endpoint has a timeout of 30 seconds.

Request

# Example variables
model_name = "llama-3.2-1b-instruct-ov"
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Unload model'")
print(f"Model_name: {model_name}")

# This is the main model_unload API
response = client.model_unload(model_name=model_name, trusted_key=trusted_key)
print("\nUnload model response: ", response)

Response

{
  "response": {
    "model_name": "llama-3.2-1b-instruct-ov",
    "status": "unloaded",
    "memory_freed": "6.2GB"
  }
}

Additional Information

• Category: models
• Timeout: 30 seconds
• Requires authentication
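The timeouts listed for the model endpoints vary widely (from 3 seconds for lookups to 360 seconds for vision), so a raw HTTP caller may want a matching client-side timeout per endpoint. The sketch below collects the documented values in one table; `client_timeout`, the 5-second margin, and the fallback default are assumptions made for illustration.

```python
# Server-side timeouts in seconds, as documented for each model endpoint.
ENDPOINT_TIMEOUTS = {
    "stream": 60, "inference": 60, "function_call": 60, "sentiment": 60,
    "extract": 60, "vision": 360, "vision_stream": 360, "rank": 60,
    "classify": 60, "embedding": 60, "list_all_models": 3, "system_info": 3,
    "model_lookup": 3, "model_load": 120, "model_unload": 30,
}


def client_timeout(endpoint: str, margin: float = 5.0, default: float = 60.0) -> float:
    """Return a client-side timeout slightly above the server-side one."""
    name = endpoint.strip("/")  # accept "/model_load/" or "model_load"
    return ENDPOINT_TIMEOUTS.get(name, default) + margin


for path in ("/list_all_models/", "/model_load/", "/vision/"):
    print(path, client_timeout(path))
```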

RAG (Retrieval Augmented Generation)

Document and library-based question answering with semantic search and context retrieval.

POST

/document_inference/

Document Q&A

Specialized inference to ask questions about uploaded documents. Combines document parsing, semantic search, and LLM inference.

Request body

Required

question

string

required

Question to ask about the document

uploaded_document

file

required

Document file to analyze

Optional

model_name

string

LLM model to use for answering

text_chunk_size

integer

Size of text chunks for processing

tables_only

boolean

Whether to focus only on tables

use_top_n_context

integer

Number of top context chunks to use

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
question = "What is the main conclusion of this research paper?"
uploaded_document = "research_paper.pdf"
model_name = "phi-3-ov"
api_key = "your-api-key"  # Optional

print("\nStarting 'Document Q&A'")
print(f"Question: {question}")
print(f"Uploaded_document: {uploaded_document}")
print(f"Model_name: {model_name}")

# This is the main document_inference API
response = client.document_inference(question=question, uploaded_document=uploaded_document, model_name=model_name, api_key=api_key)
print("\nDocument Q&A response: ", response)

Response

{
  "response": "Based on the document analysis, the main conclusion is..."
}

Additional Information

• Category: rag
• Timeout: 60 seconds
• Requires authentication

POST

/library_inference/

Library Q&A

Specialized RAG inference that ranks entries from a library and generates responses based on retrieved content.

Request body

Required

question

string

required

Question to ask about the library content

library_name

string

required

Name of the library to search

model_name

string

required

LLM model to use for answering

Optional

use_top_n_context

integer

Number of top context chunks to use

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
question = "What are the key features of the product?"
library_name = "product_docs"
model_name = "llama-3.2-3b-instruct-ov"
api_key = "your-api-key"  # Optional

print("\nStarting 'Library Q&A'")
print(f"Question: {question}")
print(f"Library_name: {library_name}")
print(f"Model_name: {model_name}")

# This is the main library_inference API
response = client.library_inference(question=question, library_name=library_name, model_name=model_name, api_key=api_key)
print("\nLibrary Q&A response: ", response)

Response

{
  "response": "Based on the library content, the key features include..."
}

Additional Information

• Category: rag
• Timeout: 60 seconds
• Requires authentication

POST

/document_batch_analysis/

Batch document analysis

Analyzes multiple documents with a set of questions, ideal for processing contracts, invoices, or reports with consistent queries.

Request body

Required

uploaded_files

array

required

Array of documents to analyze

question_list

array

required

List of questions to ask each document

Optional

model_name

string

LLM model to use for analysis

reranker

string

Reranker model for result optimization

trusted_key

string

Trusted authentication key

Returns

llm_response

string

Complete generated response from the model

Timeout: This endpoint has a timeout of 600 seconds.

Request

# Example variables
uploaded_files = ["contract1.pdf","contract2.pdf"]
question_list = ["What is the governing law?","What is the termination notice period?"]
model_name = "mistral-7b-instruct-v0.3-ov"
api_key = "your-api-key"  # Optional

print("\nStarting 'Batch document analysis'")
print(f"Uploaded_files: {uploaded_files}")
print(f"Question_list: {question_list}")
print(f"Model_name: {model_name}")

# This is the main document_batch_analysis API
response = client.document_batch_analysis(uploaded_files=uploaded_files, question_list=question_list, model_name=model_name, api_key=api_key)
print("\nBatch document analysis response: ", response)

Response

{
  "response": {
    "results": [
      {
        "document": "contract1.pdf",
        "answers": [
          "Delaware",
          "30 days"
        ]
      },
      {
        "document": "contract2.pdf",
        "answers": [
          "California",
          "60 days"
        ]
      }
    ]
  }
}

Additional Information

• Category: rag
• Timeout: 600 seconds
• Requires authentication
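Batch results are often easier to review or export once flattened into one row per document and question. This sketch assumes the response shape shown in the example above and that each answers list follows the order of question_list, which the reference implies but does not state; `flatten_batch` is a hypothetical helper.

```python
def flatten_batch(response: dict, question_list: list) -> list:
    """Return (document, question, answer) rows from a batch response."""
    rows = []
    for result in response["response"]["results"]:
        # Assumption: answers are returned in the same order as the questions.
        for question, answer in zip(question_list, result["answers"]):
            rows.append((result["document"], question, answer))
    return rows


question_list = ["What is the governing law?", "What is the termination notice period?"]
sample_response = {
    "response": {
        "results": [
            {"document": "contract1.pdf", "answers": ["Delaware", "30 days"]},
            {"document": "contract2.pdf", "answers": ["California", "60 days"]},
        ]
    }
}
rows = flatten_batch(sample_response, question_list)
for row in rows:
    print(row)
```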

Library Management

Create and manage document libraries for knowledge base construction and semantic search capabilities.

POST

/create_new_library/

Create library

Creates a new library which is a collection of documents that are parsed, indexed and organized for knowledge retrieval.

Request body

Required

library_name

string

required

Name for the new library

Optional

account_id

string

Account identifier

trusted_key

string

Trusted authentication key

Returns

response

object

Status of the newly created library

Timeout: This endpoint has a timeout of 30 seconds.

Request

# Example variables
library_name = "contract_library"
account_id = "user123"
api_key = "your-api-key"  # Optional

print("\nStarting 'Create library'")
print(f"Library_name: {library_name}")
print(f"Account_id: {account_id}")

# This is the main create_new_library API
response = client.create_new_library(library_name=library_name, account_id=account_id, api_key=api_key)
print("\nCreate library response: ", response)

Response

{
  "response": {
    "library_name": "contract_library",
    "status": "created",
    "doc_count": 0
  }
}

Additional Information

• Category: library
• Timeout: 30 seconds
• Requires authentication

POST

/add_files/

Add files to library

Core method for adding files to a library. Files are parsed, split into text chunks, and indexed automatically on upload.

Request body

Required

library_name

string

required

Name of the library to add files to

uploaded_files

array

required

Array of files to upload and process

Optional

account_id

string

Account identifier

trusted_key

string

Trusted authentication key

Returns

response

object

Processing summary: files_processed, chunks_created, and status

Timeout: This endpoint has a timeout of 300 seconds.

Request

# Example variables
library_name = "contract_library"
uploaded_files = ["contract1.pdf","contract2.pdf"]
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Add files to library'")
print(f"Library_name: {library_name}")
print(f"Uploaded_files: {uploaded_files}")

# This is the main add_files API
response = client.add_files(library_name=library_name, uploaded_files=uploaded_files, trusted_key=trusted_key)
print("\nAdd files to library response: ", response)

Response

{
  "response": {
    "files_processed": 2,
    "chunks_created": 45,
    "status": "completed"
  }
}

Additional Information

• Category: library
• Timeout: 300 seconds
• Requires authentication

POST

/query/

Query library

Execute a text-based query against an existing library to find relevant documents and passages.

Request body

Required

library_name

string

required

Name of the library to query

user_query

string

required

Search query text

Optional

result_count

integer

Number of results to return

trusted_key

string

Trusted authentication key

Returns

response

array

Matching passages, each with doc_id, text, and score

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
library_name = "contract_library"
user_query = "termination clauses"
result_count = 10
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Query library'")
print(f"Library_name: {library_name}")
print(f"User_query: {user_query}")
print(f"Result_count: {result_count}")

# This is the main query API
response = client.query(library_name=library_name, user_query=user_query, result_count=result_count, trusted_key=trusted_key)
print("\nQuery library response: ", response)

Response

{
  "response": [
    {
      "doc_id": 1,
      "text": "Termination clause content...",
      "score": 0.95
    }
  ]
}

Additional Information

• Category: library
• Timeout: 60 seconds
• Requires authentication
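
Query results carry a `score` per passage, so a caller will often want to keep only the stronger matches. The sketch below shows one way to do that. `passages_above` is a hypothetical helper, the second sample hit is invented for illustration, and the code assumes that a higher score means a better match, which the source does not state.

```python
def passages_above(query_response, min_score=0.5):
    """Return (doc_id, text) pairs scoring at least min_score, best first.

    Assumes the response shape shown in the example response and that
    higher scores indicate better matches.
    """
    hits = [
        hit for hit in query_response.get("response", [])
        if hit.get("score", 0.0) >= min_score
    ]
    hits.sort(key=lambda hit: hit.get("score", 0.0), reverse=True)
    return [(hit.get("doc_id"), hit.get("text")) for hit in hits]


sample_query_response = {
    "response": [
        # Hypothetical low-scoring hit added for illustration
        {"doc_id": 2, "text": "Unrelated boilerplate...", "score": 0.20},
        # Hit copied from the example response
        {"doc_id": 1, "text": "Termination clause content...", "score": 0.95},
    ]
}

for doc_id, text in passages_above(sample_query_response, min_score=0.5):
    print(doc_id, text)
```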

POST

/get_library_card/

Get library info

Returns metadata about a library, including document count, embedding status, and configuration.

Request body

Required

library_name

string

required

Name of the library to get information for

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Library metadata: library_name, doc_count, embedding_status, and created_date

Timeout: This endpoint has a timeout of 10 seconds.

Request

# Example variables
library_name = "contract_library"
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Get library info'")
print(f"Library_name: {library_name}")

# This is the main get_library_card API
response = client.get_library_card(library_name=library_name, trusted_key=trusted_key)
print("\nGet library info response: ", response)

Response

{
  "response": {
    "library_name": "contract_library",
    "doc_count": 25,
    "embedding_status": "installed",
    "created_date": "2024-01-15"
  }
}

Additional Information

• Category: library
• Timeout: 10 seconds
• Requires authentication
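
The library card is a convenient place to check whether embeddings exist before attempting a semantic query. A minimal sketch, assuming the response shape and the `"installed"` status value shown in the example response (other status values are not documented here); `embeddings_installed` is a hypothetical helper name.

```python
def embeddings_installed(card_response):
    """Return True if the library card reports installed embeddings.

    Assumes embedding_status == "installed" is the ready state, as in the
    example response; any other value is treated as not ready.
    """
    card = card_response.get("response", {})
    return card.get("embedding_status") == "installed"


# Sample data copied from the example response
sample_card = {
    "response": {
        "library_name": "contract_library",
        "doc_count": 25,
        "embedding_status": "installed",
        "created_date": "2024-01-15",
    }
}

print(embeddings_installed(sample_card))
```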

POST

/install_embedding/

Install embeddings

Generates vector embeddings for the contents of a library and stores them in the vector database so the library can be searched semantically.

Request body

Required

library_name

string

required

Name of the library to install embeddings for

Optional

embedding_model

string

Embedding model to use

vector_db

string

Vector database to use

trusted_key

string

Trusted authentication key

Returns

response

object

Embedding summary: embeddings_created, vector_db, and status

Timeout: This endpoint has a timeout of 600 seconds.

Request

# Example variables
library_name = "contract_library"
embedding_model = "llama-3.2-1b-instruct-ov"  # Placeholder name; use an embedding model available on your server
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Install embeddings'")
print(f"Library_name: {library_name}")
print(f"Embedding_model: {embedding_model}")

# This is the main install_embedding API
response = client.install_embedding(library_name=library_name, embedding_model=embedding_model, trusted_key=trusted_key)
print("\nInstall embeddings response: ", response)

Response

{
  "response": {
    "embeddings_created": 1250,
    "vector_db": "milvus",
    "status": "completed"
  }
}

Additional Information

• Category: library
• Timeout: 600 seconds
• Requires authentication

POST

/semantic_query/

Semantic search

Executes a semantic (vector) query against a library's embeddings, retrieving content by meaning rather than by keyword match.

Request body

Required

library_name

string

required

Name of the library to search

user_query

string

required

Semantic search query

Optional

result_count

integer

Number of results to return

db

string

default: mongo

Database type

vector_db

string

default: milvus

Vector database type

trusted_key

string

Trusted authentication key

Returns

response

array

Matching passages, each with doc_id, similarity_score, and text

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
library_name = "contract_library"
user_query = "contract termination conditions"
result_count = 5
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Semantic search'")
print(f"Library_name: {library_name}")
print(f"User_query: {user_query}")
print(f"Result_count: {result_count}")

# This is the main semantic_query API
response = client.semantic_query(library_name=library_name, user_query=user_query, result_count=result_count, trusted_key=trusted_key)
print("\nSemantic search response: ", response)

Response

{
  "response": [
    {
      "doc_id": 1,
      "similarity_score": 0.92,
      "text": "Contract termination..."
    }
  ]
}

Additional Information

• Category: library
• Timeout: 60 seconds
• Requires authentication
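
Semantic search depends on the earlier library steps: the library must exist, contain files, and have embeddings installed. The sketch below chains those calls in order. It assumes `client` is an already-connected Model HQ client, and it uses the keyword names from the request-body tables (including `trusted_key`); adjust these if your SDK version expects different names. `build_and_search` is a hypothetical helper, not part of the SDK.

```python
def build_and_search(client, library_name, files, user_query,
                     result_count=5, trusted_key=None):
    """Create a library, add files, install embeddings, then run a semantic query.

    Each call uses the keyword arguments listed in the request-body tables.
    The embedding model and vector database are left to the server defaults.
    """
    client.create_new_library(library_name=library_name, trusted_key=trusted_key)
    client.add_files(library_name=library_name, uploaded_files=files,
                     trusted_key=trusted_key)
    # Embeddings must be installed before a semantic query can match anything
    client.install_embedding(library_name=library_name, trusted_key=trusted_key)
    return client.semantic_query(library_name=library_name, user_query=user_query,
                                 result_count=result_count, trusted_key=trusted_key)
```

With a connected client, a call might look like `build_and_search(client, "contract_library", ["contract1.pdf"], "contract termination conditions")`. The sketch does not inspect the intermediate responses; a production version would likely check each step's status before continuing.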

POST

/get_document_list/

List documents

Returns a list of all documents in a library, with metadata for each document.

Request body

Required

library_name

string

required

Name of the library to list documents from

Optional

trusted_key

string

Trusted authentication key

Returns

response

array

Documents in the library, each with doc_id, filename, pages, and upload_date

Timeout: This endpoint has a timeout of 30 seconds.

Request

# Example variables
library_name = "contract_library"
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'List documents'")
print(f"Library_name: {library_name}")

# This is the main get_document_list API
response = client.get_document_list(library_name=library_name, trusted_key=trusted_key)
print("\nList documents response: ", response)

Response

{
  "response": [
    {
      "doc_id": 1,
      "filename": "contract1.pdf",
      "pages": 12,
      "upload_date": "2024-01-15"
    },
    {
      "doc_id": 2,
      "filename": "contract2.pdf",
      "pages": 8,
      "upload_date": "2024-01-16"
    }
  ]
}

Additional Information

• Category: library
• Timeout: 30 seconds
• Requires authentication
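
The document list is the natural way to find the `doc_id` for a known filename before requesting that document's text. A minimal sketch, assuming the response shape shown in the example response; `find_doc_id` is a hypothetical helper. The result is converted to a string because the request example for `/get_document_text/` passes `doc_id` as a string.

```python
def find_doc_id(list_response, filename):
    """Return the doc_id (as a string) for filename, or None if it is not listed."""
    for doc in list_response.get("response", []):
        if doc.get("filename") == filename:
            return str(doc.get("doc_id"))
    return None


# Sample data copied from the example response
sample_list = {
    "response": [
        {"doc_id": 1, "filename": "contract1.pdf", "pages": 12, "upload_date": "2024-01-15"},
        {"doc_id": 2, "filename": "contract2.pdf", "pages": 8, "upload_date": "2024-01-16"},
    ]
}

print(find_doc_id(sample_list, "contract2.pdf"))
```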

POST

/get_document_text/

Extract document text

Returns the full extracted text of a selected document in a library, for review or further processing.

Request body

Required

library_name

string

required

Name of the library containing the document

Optional

doc_id

string

Document ID to extract text from

doc_fn

string

Document filename to extract text from

trusted_key

string

Trusted authentication key

Returns

response

object

The document's doc_id, filename, and extracted text

Timeout: This endpoint has a timeout of 60 seconds.

Request

# Example variables
library_name = "contract_library"
doc_id = "1"
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Extract document text'")
print(f"Library_name: {library_name}")
print(f"Doc_id: {doc_id}")

# This is the main get_document_text API
response = client.get_document_text(library_name=library_name, doc_id=doc_id, trusted_key=trusted_key)
print("\nExtract document text response: ", response)

Response

{
  "response": {
    "doc_id": 1,
    "filename": "contract1.pdf",
    "text": "Full document text content..."
  }
}

Additional Information

• Category: library
• Timeout: 60 seconds
• Requires authentication

Agent Execution

Execute automated multi-step processes and workflows using pre-configured intelligent agents.

POST

/run_agent/

Execute agent

Executes a pre-configured agent process for automated multi-step document analysis and task completion.

Request body

Required

process_name

string

required

Name of the agent process to execute

Optional

process_zip

file

Agent process zip file to upload and execute

input_list

array

List of inputs for the agent process

text

string

Text input for the agent

snippet

string

Code or text snippet input

document_file

file

Document file for agent processing

table_file

file

Table/spreadsheet file for processing

image_file

file

Image file for agent analysis

source_file

file

Source code file for processing

user_files

array

Additional user files for processing

trusted_key

string

Trusted authentication key

Returns

response

object

Execution summary: agent_name, status, results, and execution_time

Timeout: This endpoint has a timeout of 300 seconds.

Request

# Example variables
process_name = "contract_analyzer"
document_file = "contract.pdf"
input_list = ["analysis_type","text","comprehensive"]
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Execute agent'")
print(f"Process_name: {process_name}")
print(f"Document_file: {document_file}")
print(f"Input_list: {input_list}")

# This is the main run_agent API
response = client.run_agent(process_name=process_name, document_file=document_file, input_list=input_list, trusted_key=trusted_key)
print("\nExecute agent response: ", response)

Response

{
  "response": {
    "agent_name": "contract_analyzer",
    "status": "completed",
    "results": {
      "effective_date": "2024-01-01",
      "base_salary": "$150,000"
    },
    "execution_time": "45s"
  }
}

Additional Information

• Category: agent
• Timeout: 300 seconds
• Requires authentication

POST

/lookup_agent/

Find agent

Checks if a specific agent process exists and is available on the server for execution.

Request body

Required

process_name

string

required

Name of the agent process to look up

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Lookup result: process_name, available, description, and version

Timeout: This endpoint has a timeout of 10 seconds.

Request

# Example variables
process_name = "contract_analyzer"
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Find agent'")
print(f"Process_name: {process_name}")

# This is the main lookup_agent API
response = client.lookup_agent(process_name=process_name, trusted_key=trusted_key)
print("\nFind agent response: ", response)

Response

{
  "response": {
    "process_name": "contract_analyzer",
    "available": true,
    "description": "Analyzes contracts for key terms",
    "version": "1.2.0"
  }
}

Additional Information

• Category: agent
• Timeout: 10 seconds
• Requires authentication
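
Because `/run_agent/` can take up to 300 seconds, it can be worth confirming that the agent exists with the much cheaper `/lookup_agent/` call first. The sketch below shows that pattern. It assumes `client` is an already-connected Model HQ client and that the lookup response carries a boolean `available` field, as in the example response; `run_agent_if_available` is a hypothetical helper, not part of the SDK.

```python
def run_agent_if_available(client, process_name, trusted_key=None, **agent_inputs):
    """Run an agent process only if lookup_agent reports it as available.

    agent_inputs are passed through unchanged to run_agent, for example
    document_file=... or input_list=... from the request-body table.
    """
    lookup = client.lookup_agent(process_name=process_name, trusted_key=trusted_key)
    if not lookup.get("response", {}).get("available", False):
        raise LookupError(f"Agent process '{process_name}' is not available on the server")
    return client.run_agent(process_name=process_name, trusted_key=trusted_key,
                            **agent_inputs)
```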

POST

/get_all_agents/

List all agents

Returns a list of all agent processes available on the server, with a description and version for each.

Request body

Optional

trusted_key

string

Trusted authentication key

Returns

response

array

Available agent processes, each with name, description, and version

Timeout: This endpoint has a timeout of 10 seconds.

Request

# Example variables
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'List all agents'")


# This is the main get_all_agents API
response = client.get_all_agents(trusted_key=trusted_key)
print("\nList all agents response: ", response)

Response

{
  "response": [
    {
      "name": "contract_analyzer",
      "description": "Contract analysis agent",
      "version": "1.2.0"
    },
    {
      "name": "invoice_processor",
      "description": "Invoice processing agent",
      "version": "1.0.1"
    }
  ]
}

Additional Information

• Category: agent
• Timeout: 10 seconds
• Requires authentication

Utilities & Administration

Server management, health checks, and administrative functions for monitoring and control.

POST

/ping/

Health check

Quick health check to verify if the API server is responsive and operational.

Request body

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Server status: status, timestamp, and version

Timeout: This endpoint has a timeout of 5 seconds.

Request

# Example variables
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Health check'")


# This is the main ping API
response = client.ping(trusted_key=trusted_key)
print("\nHealth check response: ", response)

Response

{
  "response": {
    "status": "ok",
    "timestamp": "2024-01-15T10:30:00Z",
    "version": "1.0.0"
  }
}

Additional Information

• Category: utility
• Timeout: 5 seconds
• Requires authentication
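
A common use of the health check is to wait for a freshly started backend to come up before sending real requests. A minimal polling sketch follows. It assumes `client` is a Model HQ client and that a healthy server answers with `status` equal to `"ok"`, as in the example response. The exception types raised by the SDK are not documented here, so the sketch catches `Exception` broadly; `wait_for_server` is a hypothetical helper.

```python
import time


def wait_for_server(client, attempts=5, delay_seconds=1.0, trusted_key=None):
    """Poll ping until the server reports status "ok" or attempts run out.

    Returns True as soon as a healthy reply arrives, otherwise False.
    """
    for attempt in range(attempts):
        try:
            reply = client.ping(trusted_key=trusted_key)
            if reply.get("response", {}).get("status") == "ok":
                return True
        except Exception:
            # Failures are expected while the server is still starting up
            pass
        # Sleep between attempts, but not after the final one
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```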

POST

/server_stop/

Stop server

Gracefully stops the API server. Use with caution, as this terminates all active connections.

Request body

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Shutdown acknowledgement: status and message

Timeout: This endpoint has a timeout of 10 seconds.

Request

# Example variables
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Stop server'")


# This is the main server_stop API
response = client.server_stop(trusted_key=trusted_key)
print("\nStop server response: ", response)

Response

{
  "response": {
    "status": "stopping",
    "message": "Server shutdown initiated"
  }
}

Additional Information

• Category: utility
• Timeout: 10 seconds
• Requires authentication

POST

/get_api_catalog/

API catalog

Returns a complete catalog of all available API endpoints with their specifications and parameters.

Request body

Optional

trusted_key

string

Trusted authentication key

Returns

response

array

Catalog entries, each with api_name, endpoint, method, and timeout

Timeout: This endpoint has a timeout of 10 seconds.

Request

# Example variables
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'API catalog'")


# This is the main get_api_catalog API
response = client.get_api_catalog(trusted_key=trusted_key)
print("\nAPI catalog response: ", response)

Response

{
  "response": [
    {
      "api_name": "inference",
      "endpoint": "/inference/",
      "method": "POST",
      "timeout": 60
    },
    {
      "api_name": "stream",
      "endpoint": "/stream/",
      "method": "POST",
      "timeout": 60
    }
  ]
}

Additional Information

• Category: utility
• Timeout: 10 seconds
• Requires authentication
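
The catalog can be indexed by API name so that a caller can look up an endpoint path or timeout at runtime. A minimal sketch, assuming the response shape shown in the example response; `index_catalog` is a hypothetical helper.

```python
def index_catalog(catalog_response):
    """Map each api_name to its catalog entry, skipping entries without a name."""
    return {
        entry["api_name"]: entry
        for entry in catalog_response.get("response", [])
        if "api_name" in entry
    }


# Sample data copied from the example response
sample_catalog = {
    "response": [
        {"api_name": "inference", "endpoint": "/inference/", "method": "POST", "timeout": 60},
        {"api_name": "stream", "endpoint": "/stream/", "method": "POST", "timeout": 60},
    ]
}

catalog = index_catalog(sample_catalog)
print(catalog["inference"]["endpoint"], catalog["inference"]["timeout"])
```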

POST

/get_db_info/

Database info

Returns information about registered databases and vector databases available on the server.

Request body

Optional

trusted_key

string

Trusted authentication key

Returns

response

object

Registered databases: databases, vector_databases, default_db, and default_vector_db

Timeout: This endpoint has a timeout of 10 seconds.

Request

# Example variables
trusted_key = "your-trusted-key"  # Optional

print("\nStarting 'Database info'")


# This is the main get_db_info API
response = client.get_db_info(trusted_key=trusted_key)
print("\nDatabase info response: ", response)

Response

{
  "response": {
    "databases": [
      "mongo",
      "sqlite"
    ],
    "vector_databases": [
      "milvus",
      "faiss"
    ],
    "default_db": "mongo",
    "default_vector_db": "milvus"
  }
}

Additional Information

• Category: utility
• Timeout: 10 seconds
• Requires authentication
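
The database info can be used to validate a `vector_db` choice before passing it to an endpoint such as `/install_embedding/`. A minimal sketch, assuming the response shape shown in the example response; `resolve_vector_db` is a hypothetical helper.

```python
def resolve_vector_db(db_info_response, requested=None):
    """Return the requested vector database if the server lists it.

    Falls back to the server's default_vector_db when nothing is requested,
    and raises ValueError for a name the server does not report.
    """
    info = db_info_response.get("response", {})
    if requested is None:
        return info.get("default_vector_db")
    available = info.get("vector_databases", [])
    if requested not in available:
        raise ValueError(f"Vector database '{requested}' is not one of {available}")
    return requested


# Sample data copied from the example response
sample_db_info = {
    "response": {
        "databases": ["mongo", "sqlite"],
        "vector_databases": ["milvus", "faiss"],
        "default_db": "mongo",
        "default_vector_db": "milvus",
    }
}

print(resolve_vector_db(sample_db_info))
print(resolve_vector_db(sample_db_info, requested="faiss"))
```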