Model HQ
API Reference
The Model HQ API provides programmatic access to the core Model HQ platform, with APIs for model inference, RAG, and agent processing.
How to access
Model HQ App (with UI) - to enable programmatic API access, switch to "backend mode", available on the main toolbar in the upper right-hand corner of the app. You can configure the backend before launching it, e.g., localhost vs. external IP (localhost by default), port (8088 by default), and an optional trusted_key to be used when calling the API. The same page provides a download link for the Model HQ Client SDK, which you should download and open as a new project in your favorite IDE. After shifting to programmatic mode, models and agents created in the UI are available through the APIs. If you set a trusted_key in the configuration, the key will be checked and validated on each API call (you can leave it blank for development use).
Model HQ Dev (no UI) - this product provides the backend development server directly with only programmatic access, and can be started, stopped, and configured entirely with code (requires a separate license). The Model HQ Client SDK is included with the product, along with separate directions for activating the license key on first use.
Model HQ Server (Linux - no UI) - this is a scalable, multi-user API server with a larger model catalog and an enhanced set of RAG capabilities (requires a separate license).
Most of the text below is written from the perspective of using the APIs on device (Model HQ App or Model HQ Dev), although the APIs apply to Model HQ Server as well.
Model HQ Client SDK
The client SDK exposes the APIs through a Python interface to make it easy to integrate into other Agent, RAG, and generative AI pipelines.
The first step is to create a client, similar to OpenAI and other API-based model services.
In most cases, you can use the auto-detect convenience function get_url_string() to automatically connect to the server.
from llmware_client_sdk import LLMWareClient, get_url_string
# create a client interface to the Model HQ background server
api_endpoint = get_url_string()
# alternative: set the endpoint directly
# api_endpoint = "http://localhost:8088"
client = LLMWareClient(api_endpoint=api_endpoint)
Once you have created the client, API calls can be initiated through methods implemented on the client.
Getting Started
Model inferencing takes place on device. Upon first invocation of a selected model, the model is pulled from a secure LLMWare repository and cached on the local device. Depending on the size of the model and the Wi-Fi/network connection, the first download can take between 30 seconds and a few minutes. On each subsequent use, the model is loaded from disk, which generally takes no more than a few seconds.
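If first-inference latency matters for your application, you can optionally warm the cache ahead of time. A minimal sketch, using the model_load utility described later in this reference (the model name is just an example from the catalog):
# optional warm-up: load the model into memory ahead of the first request
# (assumption: this also triggers the one-time download if the model is not yet cached)
model_name = "llama-3.2-1b-instruct-ov"
load_status = client.model_load(model_name=model_name)
print("model load status: ", load_status)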
Example #1 - Inference - this is the core API for accessing a model
prompt = "What are the best sites to see in France?"
model_name = "llama-3.2-1b-instruct-ov"
print("\nStarting 'Hello World'")
print(f"Prompt: {prompt}")
print(f"Model: {model_name}\n")
# this is the main inference API
response = client.inference(prompt=prompt, model_name=model_name, max_output=100, trusted_key="")
# note: generation stops at 100 output tokens and returns a 'complete' response (i.e., not streamed)
print("\nhello world test #1 - inference response: ", response)
Example #2 - Stream - this is the streaming version of the core model inference API
model_name = "llama-3.2-3b-instruct-ov"
prompt = "What are the main theoretical challenges with quantum gravity?"
print("\nRunning Model Locally in Streaming Mode")
print(f"Prompt: {prompt}")
print(f"Model: {model_name}\n")
# the stream method is called and consumed as a generator function
for token in client.stream(prompt=prompt,
                           model_name=model_name,
                           max_output=300,
                           trusted_key=""):
    print(token, end="")
For many use cases, the two APIs above are all you need to access and integrate a wide range of models.
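As an illustration, the two APIs compose easily. The ask() helper below is a hypothetical convenience wrapper (not part of the SDK) that streams when a token callback is supplied and otherwise returns the complete response:
# hypothetical convenience wrapper around the two core APIs (not part of the SDK)
def ask(client, prompt, model_name="llama-3.2-1b-instruct-ov", max_output=200, on_token=None):
    if on_token is None:
        # complete (non-streamed) response
        return client.inference(prompt=prompt, model_name=model_name,
                                max_output=max_output, trusted_key="")
    # streamed response, delivered token-by-token to the callback
    for token in client.stream(prompt=prompt, model_name=model_name,
                               max_output=max_output, trusted_key=""):
        on_token(token)

# usage
ask(client, "What are the best sites to see in France?", on_token=lambda t: print(t, end=""))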
Example #3 - Finding Models
There are several key utility APIs that help to find available models:
list_all_models - lists the models available in the catalog; any model can be invoked through inference/stream using its unique model_name identifier.
# what models are available?
model_list = client.list_all_models()
# print out the model list
print("model list: ", model_list)
for i, mod in enumerate(model_list["response"]):
    print("model: ", i, mod)
model_lookup - look up a specific model to get more details from its model card
print("\nmodel lookup example\n")
response = client.model_lookup(model_name=model_name)
print("response: ", response)
model_load - load a selected model into memory
print("\nmodel load test example\n")
response = client.model_load(model_name=model_name)
print("response: ", response)
model_unload - unload a selected model from memory
print("\nmodel unload test example\n")
response = client.model_unload(model_name=model_name)
print("response: ", response)
RAG
In addition to pure model inferencing, there are several methods provided to integrate documents into an inference. A more enhanced set of RAG capabilities (including vector db and full semantic search) are provided on the Model HQ Server.
document_inference - ask a question about a document over the API
# rag one-step api process
import os

document_path = os.path.abspath(".\\modelhq_client\\sample_files\\Bia EXECUTIVE EMPLOYMENT AGREEMENT.pdf")
question = "What is the annual rate of the base salary?"
model_name = "llama-3.2-3b-instruct-ov"
print(f"\n\nRAG Example - {question}\n")
response = client.document_inference(document_path, question, model_name=model_name)
print("document inference response - ", response['llm_response'])
Agents
You can create an agent process in the UI, and then invoke and run the agent over API as follows:
import os

# selected process
process_name = "intake_processing"
# input to the agent
fp = os.path.abspath(".\\sample_files\\customer_transcript_1.txt")
with open(fp, "r") as f:
    text = f.read()
# call and run the agent process
response = client.run_agent(process_name=process_name, text=text, trusted_key="")
print("--run_agent intake_processing: ", response)
get_all_agents - provides a list of available agents
# show all agents available on the background server
agent_response = client.get_all_agents()
for i, agent in enumerate(agent_response["response"]):
    print("--agents available: ", i, agent)
Useful Admin Functions
ping - check that the platform is running and that the client is connected
response = client.ping()
print("response: ", response)
system_info - get information about the system
# get system info
x = client.system_info()
print("system info: ", x)
# get details about the backend process
# (note: get_server_details and stop_server are top-level functions in the client SDK module)
details = get_server_details()
print("server details: ", details)
stop_server - stop the Model HQ platform (running as a background service on Windows)
stop_server()
Trusted Key
Since the Model HQ platform is designed for self-hosted deployment (generally for internal enterprise access rather than consumer-scale deployment), we provide a flexible, easy-to-configure 'trusted_key' parameter, set at the time of launching the backend platform, as a simple way to layer an API key implementation on top of the platform.
Note: for most development-stage activities, this can be skipped entirely and no trusted_key needs to be set, especially for on-device use.
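If a trusted_key was set when launching the backend, pass the same value on each API call. A short sketch, assuming the key "my-secret-key" was configured at launch:
trusted_key = "my-secret-key"  # must match the key configured at launch
response = client.inference(prompt="What are the best sites to see in France?",
                            model_name="llama-3.2-1b-instruct-ov",
                            max_output=100,
                            trusted_key=trusted_key)
print("response: ", response)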
Models
Core inference endpoints for text generation, vision analysis, and specialized model functions.
/stream/
Create stream
Generate a streaming inference response from a selected model. The response will be streamed back in real-time as the model generates tokens.
Request body
Required
model_name
The name of the model to use for inference
prompt
The input text prompt to generate a response for
Optional
max_output
Maximum number of tokens to generate
temperature
Controls randomness in generation (0.0-1.0)
sample
Whether to use sampling for generation
context
Additional context to provide to the model
api_key
Your API authentication key
trusted_key
Alternative trusted authentication key
Returns
llm_response
Partial response text streamed incrementally as server-sent events
Timeout: This endpoint has a timeout of 60 seconds. The connection remains open for streaming responses.
Request
# Example variables
model_name = "phi-3-ov"
prompt = "Explain quantum computing in simple terms"
max_output = 300
temperature = 0.7
api_key = "your-api-key" # Optional
print("\nStarting 'Create stream'")
print(f"Model_name: {model_name}")
print(f"Prompt: {prompt}")
print(f"Max_output: {max_output}")
print(f"Temperature: {temperature}")
# This is the streaming inference API
stream = client.stream(model_name=model_name, prompt=prompt, max_output=max_output, temperature=temperature, api_key=api_key)
for chunk in stream:
    print(chunk, end="")
Response
// Streaming response format
data: {"llm_response": "Quantum computing is a revolutionary approach..."}
Additional Information
- Category: models
- Timeout: 60 seconds
- Supports streaming responses
- Requires authentication
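For non-Python clients, the endpoint can also be called over raw HTTP. The sketch below is illustrative only: it assumes the backend accepts a JSON POST body with the parameters listed above and emits standard server-sent events in the format shown in the Response section.
# illustrative raw HTTP call (assumes a JSON POST body and standard SSE framing)
import json
import requests

url = "http://localhost:8088/stream/"  # default host/port; adjust to your configuration
payload = {"model_name": "phi-3-ov",
           "prompt": "Explain quantum computing in simple terms",
           "max_output": 300}

with requests.post(url, json=payload, stream=True, timeout=60) as r:
    for line in r.iter_lines(decode_unicode=True):
        # server-sent events arrive as lines prefixed with "data:"
        if line and line.startswith("data:"):
            chunk = json.loads(line[len("data:"):].strip())
            print(chunk.get("llm_response", ""), end="")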
/inference/
Create inference
Generate a complete inference response from a selected model. The response will be the complete generation from the model returned as a single response.
Request body
Required
prompt
The input text prompt to generate a response for
model_name
The name of the model to use for inference
Optional
max_output
Maximum number of tokens to generate
temperature
Controls randomness in generation (0.0-1.0)
sample
Whether to use sampling for generation
api_key
Your API authentication key
context
Additional context to provide to the model
params
Additional model-specific parameters
fx
Function execution parameters
trusted_key
Alternative trusted authentication key
Returns
llm_response
Complete generated response from the model
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
prompt = "Write a short story about artificial intelligence"
model_name = "llama-3.2-3b-instruct-ov"
max_output = 100
temperature = 0.8
api_key = "your-api-key" # Optional
print("\nStarting 'Create inference'")
print(f"Prompt: {prompt}")
print(f"Model_name: {model_name}")
print(f"Max_output: {max_output}")
print(f"Temperature: {temperature}")
# This is the main inference API
response = client.inference(prompt=prompt, model_name=model_name, max_output=max_output, temperature=temperature, api_key=api_key)
print("\nCreate inference response: ", response)
Response
{
"llm_response": "Artificial intelligence represents one of humanity's greatest..."
}
Additional Information
- • Category: models
- • Timeout: 60 seconds
- • Requires authentication
/function_call/
Function call
Execute a specialized function call with SLIM model for structured outputs and specific tasks.
Request body
Required
model_name
The SLIM model name to use for function calling
context
The context or input text for the function
Optional
prompt
Additional prompt instructions
params
Function-specific parameters
function
Specific function to execute
api_key
Your API authentication key
get_logits
Whether to return model logits
max_output
Maximum tokens to generate
temperature
Sampling temperature
sample
Whether to use sampling
trusted_key
Trusted authentication key
Returns
llm_response
Structured dictionary output from the function call
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
model_name = "phi-3-ov"
context = "John Smith works at Acme Corp as a Software Engineer. He can be reached at john@acme.com."
function = "extract_entities"
api_key = "your-api-key" # Optional
print("\nStarting 'Function call'")
print(f"Model_name: {model_name}")
print(f"Context: {context}")
print(f"Function: {function}")
# This is the main function_call API
response = client.function_call(model_name=model_name, context=context, function=function, api_key=api_key)
print("\nFunction call response: ", response)
Response
{
"llm_response": {
"name": "John Smith",
"company": "Acme Corp",
"role": "Software Engineer",
"email": "john@acme.com"
}
}
Additional Information
- • Category: models
- • Timeout: 60 seconds
- • Requires authentication
/sentiment/
Sentiment analysis
Execute sentiment analysis using a specialized SLIM sentiment model.
Request body
Required
context
The text to analyze for sentiment
Optional
model_name
Sentiment model to use
get_logits
Whether to return model logits
max_output
Maximum tokens to generate
temperature
Sampling temperature
sample
Whether to use sampling
trusted_key
Trusted authentication key
Returns
llm_response
Sentiment classification with a confidence score
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
context = "I absolutely love this new product! It's amazing and works perfectly."
api_key = "your-api-key" # Optional
print("\nStarting 'Sentiment analysis'")
print(f"Context: {context}")
# This is the main sentiment API
response = client.sentiment(context=context, api_key=api_key)
print("\nSentiment analysis response: ", response)
Response
{
"llm_response": {
"sentiment": "positive",
"confidence": 0.95
}
}
Additional Information
- • Category: models
- • Timeout: 60 seconds
- • Requires authentication
/extract/
Extract information
Execute information extraction using a SLIM extract model to pull specific data from text.
Request body
Required
context
The text to extract information from
extract_keys
List of keys/fields to extract from the text
Optional
get_logits
Whether to return model logits
max_output
Maximum tokens to generate
temperature
Sampling temperature
sample
Whether to use sampling
trusted_key
Trusted authentication key
Returns
llm_response
Dictionary of extracted key-value pairs
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
context = "Invoice #12345 dated March 15, 2024. Total amount: $1,250.00. Customer: ABC Corp."
extract_keys = ["invoice_number","date","total_amount","customer"]
api_key = "your-api-key" # Optional
print("\nStarting 'Extract information'")
print(f"Context: {context}")
print(f"Extract_keys: {extract_keys}")
# This is the main extract API
response = client.extract(context=context, extract_keys=extract_keys, api_key=api_key)
print("\nExtract information response: ", response)
Response
{
"llm_response": {
"invoice_number": "12345",
"date": "March 15, 2024",
"total_amount": "$1,250.00",
"customer": "ABC Corp"
}
}
Additional Information
- • Category: models
- • Timeout: 60 seconds
- • Requires authentication
/vision/
Vision inference
Execute vision model inference to analyze and describe images with text prompts.
Request body
Required
uploaded_files
Array of image files to analyze
prompt
Text prompt describing what to analyze in the image
Optional
max_output
Maximum tokens to generate
model_name
Vision model to use
temperature
Sampling temperature
sample
Whether to use sampling
trusted_key
Trusted authentication key
Returns
llm_response
Complete generated response from the model
Timeout: This endpoint has a timeout of 360 seconds.
Request
# Example variables
uploaded_files = ["image1.jpg"]
prompt = "Describe what you see in this image"
model_name = "mistral-7b-instruct-v0.3-ov"
api_key = "your-api-key" # Optional
print("\nStarting 'Vision inference'")
print(f"Uploaded_files: {uploaded_files}")
print(f"Prompt: {prompt}")
print(f"Model_name: {model_name}")
# This is the main vision API
response = client.vision(uploaded_files=uploaded_files, prompt=prompt, model_name=model_name, api_key=api_key)
print("\nVision inference response: ", response)
Response
{
"llm_response": "I can see a beautiful landscape with mountains in the background..."
}
Additional Information
- • Category: models
- • Timeout: 360 seconds
- • Requires authentication
/vision_stream/
Vision stream
Generate a streaming inference response from vision model for real-time image analysis.
Request body
Required
uploaded_files
Array of image files to analyze
model_name
Vision model to use for streaming
prompt
Text prompt for image analysis
Optional
max_output
Maximum tokens to generate
temperature
Sampling temperature
sample
Whether to use sampling
trusted_key
Trusted authentication key
Returns
llm_response
Partial response text streamed incrementally as server-sent events
Timeout: This endpoint has a timeout of 360 seconds. The connection remains open for streaming responses.
Request
# Example variables
uploaded_files = ["image1.jpg"]
model_name = "llama-3.2-3b-instruct-ov"
prompt = "Analyze this image and describe the scene"
api_key = "your-api-key" # Optional
print("\nStarting 'Vision stream'")
print(f"Uploaded_files: {uploaded_files}")
print(f"Model_name: {model_name}")
print(f"Prompt: {prompt}")
# This is the streaming vision API
stream = client.vision_stream(uploaded_files=uploaded_files, model_name=model_name, prompt=prompt, api_key=api_key)
for chunk in stream:
    print(chunk, end="")
Response
// Streaming response format
data: {"llm_response": "The image shows a bustling city street..."}
Additional Information
- • Category: models
- • Timeout: 360 seconds
- • Supports streaming responses
- • Requires authentication
/rank/
Semantic ranking
Execute semantic similarity ranking with reranker model to rank documents by relevance to a query.
Request body
Required
query
The search query to rank documents against
documents
Array of documents to rank
Optional
model_name
Reranker model to use
text_chunk_size
Size of text chunks for processing
trusted_key
Trusted authentication key
Returns
llm_response
Ranked list of documents with relevance scores
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
query = "machine learning algorithms"
documents = ["Document about neural networks","Article on cooking recipes","Paper on deep learning"]
api_key = "your-api-key" # Optional
print("\nStarting 'Semantic ranking'")
print(f"Query: {query}")
print(f"Documents: {documents}")
# This is the main rank API
response = client.rank(query=query, documents=documents, api_key=api_key)
print("\nSemantic ranking response: ", response)
Response
{
"llm_response": [
{
"doc_id": 0,
"score": 0.95
},
{
"doc_id": 2,
"score": 0.87
},
{
"doc_id": 1,
"score": 0.12
}
]
}
Additional Information
- • Category: models
- • Timeout: 60 seconds
- • Requires authentication
/classify/
Text classification
Execute text classification inference, primarily used for safety controls and content moderation.
Request body
Required
model_name
Classification model to use
context
Text to classify
Optional
text_chunk_size
Size of text chunks for processing
trusted_key
Trusted authentication key
Returns
llm_response
Classification result with a confidence score
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
model_name = "phi-4-ov"
context = "This is a sample text to classify for safety"
api_key = "your-api-key" # Optional
print("\nStarting 'Text classification'")
print(f"Model_name: {model_name}")
print(f"Context: {context}")
# This is the main classify API
response = client.classify(model_name=model_name, context=context, api_key=api_key)
print("\nText classification response: ", response)
Response
{
"llm_response": {
"classification": "safe",
"confidence": 0.98
}
}
Additional Information
- • Category: models
- • Timeout: 60 seconds
- • Requires authentication
/embedding/
Generate embeddings
Generate vector embeddings for text using embedding models for semantic search and similarity.
Request body
Required
model_name
Embedding model to use
context
Text to generate embeddings for
Optional
text_chunk_size
Size of text chunks for processing
trusted_key
Trusted authentication key
Returns
embeddings
Vector embedding values for the input text
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
model_name = "llama-3.2-3b-instruct-ov"
context = "This is a sample text to embed"
api_key = "your-api-key" # Optional
print("\nStarting 'Generate embeddings'")
print(f"Model_name: {model_name}")
print(f"Context: {context}")
# This is the main embedding API
response = client.embedding(model_name=model_name, context=context, api_key=api_key)
print("\nGenerate embeddings response: ", response)
Response
{
"embeddings": [
0.1,
-0.2,
0.3,
0.4,
-0.1
]
}
Additional Information
- • Category: models
- • Timeout: 60 seconds
- • Requires authentication
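To use the returned vectors for similarity comparison, compute cosine similarity directly. A minimal sketch in pure Python, assuming the response layout shown above:
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

emb1 = client.embedding(model_name=model_name, context="contract termination")["embeddings"]
emb2 = client.embedding(model_name=model_name, context="ending an agreement")["embeddings"]
print("similarity: ", cosine(emb1, emb2))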
/list_all_models/
List models
Returns a list of all models available on the server.
Request body
Optional
trusted_key
Trusted authentication key
Returns
response
List of available model names
Timeout: This endpoint has a timeout of 3 seconds.
Request
# Example variables
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'List models'")
# This is the main list_all_models API
response = client.list_all_models(trusted_key=trusted_key)
print("\nList models response: ", response)
Response
{
"response": [
"llama-3.2-1b-instruct-ov",
"llama-3.2-3b-instruct-ov",
"mistral-7b-instruct-v0.3-ov",
"phi-4-ov"
]
}
Additional Information
- • Category: models
- • Timeout: 3 seconds
- • Requires authentication
/system_info/
System information
Returns key information about the system and server configuration.
Request body
Optional
trusted_key
Trusted authentication key
Returns
response
System and server configuration details
Timeout: This endpoint has a timeout of 3 seconds.
Request
# Example variables
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'System information'")
# This is the main system_info API
response = client.system_info(trusted_key=trusted_key)
print("\nSystem information response: ", response)
Response
{
"response": {
"version": "1.0.0",
"gpu_count": 2,
"memory": "32GB",
"status": "active"
}
}
Additional Information
- • Category: models
- • Timeout: 3 seconds
- • Requires authentication
/model_lookup/
Model information
Returns detailed model card information about a selected model.
Request body
Required
model_name
Name of the model to look up
Optional
trusted_key
Trusted authentication key
Returns
response
Model card details for the selected model
Timeout: This endpoint has a timeout of 3 seconds.
Request
# Example variables
model_name = "mistral-7b-instruct-v0.3-ov"
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'Model information'")
print(f"Model_name: {model_name}")
# This is the main model_lookup API
response = client.model_lookup(model_name=model_name, trusted_key=trusted_key)
print("\nModel information response: ", response)
Response
{
"response": {
"name": "mistral-7b-instruct-v0.3-ov",
"parameters": "7B",
"context_length": 4096,
"loaded": true
}
}
Additional Information
- • Category: models
- • Timeout: 3 seconds
- • Requires authentication
/model_load/
Load model
Explicitly loads a selected model into memory on the API server, useful as a preparation step.
Request body
Required
model_name
Name of the model to load into memory
Optional
trusted_key
Trusted authentication key
Returns
response
Load status and memory usage for the selected model
Timeout: This endpoint has a timeout of 120 seconds.
Request
# Example variables
model_name = "phi-4-ov"
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'Load model'")
print(f"Model_name: {model_name}")
# This is the main model_load API
response = client.model_load(model_name=model_name, trusted_key=trusted_key)
print("\nLoad model response: ", response)
Response
{
"response": {
"model_name": "phi-4-ov",
"status": "loaded",
"memory_usage": "6.2GB"
}
}
Additional Information
- • Category: models
- • Timeout: 120 seconds
- • Requires authentication
/model_unload/
Unload model
Explicitly unloads a selected model from memory on the API server.
Request body
Required
model_name
Name of the model to unload from memory
Optional
trusted_key
Trusted authentication key
Returns
response
Unload status and memory freed for the selected model
Timeout: This endpoint has a timeout of 30 seconds.
Request
# Example variables
model_name = "llama-3.2-1b-instruct-ov"
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'Unload model'")
print(f"Model_name: {model_name}")
# This is the main model_unload API
response = client.model_unload(model_name=model_name, trusted_key=trusted_key)
print("\nUnload model response: ", response)
Response
{
"response": {
"model_name": "llama-3.2-1b-instruct-ov",
"status": "unloaded",
"memory_freed": "6.2GB"
}
}
Additional Information
- • Category: models
- • Timeout: 30 seconds
- • Requires authentication
RAG (Retrieval Augmented Generation)
Document and library-based question answering with semantic search and context retrieval.
/document_inference/
Document Q&A
Specialized inference to ask questions about uploaded documents. Combines document parsing, semantic search, and LLM inference.
Request body
Required
question
Question to ask about the document
uploaded_document
Document file to analyze
Optional
model_name
LLM model to use for answering
text_chunk_size
Size of text chunks for processing
tables_only
Whether to focus only on tables
use_top_n_context
Number of top context chunks to use
trusted_key
Trusted authentication key
Returns
response
Answer generated from the document content
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
question = "What is the main conclusion of this research paper?"
uploaded_document = "research_paper.pdf"
model_name = "phi-3-ov"
api_key = "your-api-key" # Optional
print("\nStarting 'Document Q&A'")
print(f"Question: {question}")
print(f"Uploaded_document: {uploaded_document}")
print(f"Model_name: {model_name}")
# This is the main document_inference API
response = client.document_inference(question=question, uploaded_document=uploaded_document, model_name=model_name, api_key=api_key)
print("\nDocument Q&A response: ", response)
Response
{
"response": "Based on the document analysis, the main conclusion is..."
}
Additional Information
- • Category: rag
- • Timeout: 60 seconds
- • Requires authentication
/library_inference/
Library Q&A
Specialized RAG inference that ranks entries from a library and generates responses based on retrieved content.
Request body
Required
question
Question to ask about the library content
library_name
Name of the library to search
model_name
LLM model to use for answering
Optional
use_top_n_context
Number of top context chunks to use
trusted_key
Trusted authentication key
Returns
response
Answer generated from the retrieved library content
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
question = "What are the key features of the product?"
library_name = "product_docs"
model_name = "llama-3.2-3b-instruct-ov"
api_key = "your-api-key" # Optional
print("\nStarting 'Library Q&A'")
print(f"Question: {question}")
print(f"Library_name: {library_name}")
print(f"Model_name: {model_name}")
# This is the main library_inference API
response = client.library_inference(question=question, library_name=library_name, model_name=model_name, api_key=api_key)
print("\nLibrary Q&A response: ", response)
Response
{
"response": "Based on the library content, the key features include..."
}
Additional Information
- • Category: rag
- • Timeout: 60 seconds
- • Requires authentication
/document_batch_analysis/
Batch document analysis
Analyzes multiple documents with a set of questions, ideal for processing contracts, invoices, or reports with consistent queries.
Request body
Required
uploaded_files
Array of documents to analyze
question_list
List of questions to ask each document
Optional
model_name
LLM model to use for analysis
reranker
Reranker model for result optimization
trusted_key
Trusted authentication key
Returns
response
Per-document results for each question in the list
Timeout: This endpoint has a timeout of 600 seconds.
Request
# Example variables
uploaded_files = ["contract1.pdf","contract2.pdf"]
question_list = ["What is the governing law?","What is the termination notice period?"]
model_name = "mistral-7b-instruct-v0.3-ov"
api_key = "your-api-key" # Optional
print("\nStarting 'Batch document analysis'")
print(f"Uploaded_files: {uploaded_files}")
print(f"Question_list: {question_list}")
print(f"Model_name: {model_name}")
# This is the main document_batch_analysis API
response = client.document_batch_analysis(uploaded_files=uploaded_files, question_list=question_list, model_name=model_name, api_key=api_key)
print("\nBatch document analysis response: ", response)
Response
{
"response": {
"results": [
{
"document": "contract1.pdf",
"answers": [
"Delaware",
"30 days"
]
},
{
"document": "contract2.pdf",
"answers": [
"California",
"60 days"
]
}
]
}
}
Additional Information
- • Category: rag
- • Timeout: 600 seconds
- • Requires authentication
Library Management
Create and manage document libraries for knowledge base construction and semantic search capabilities.
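A typical workflow chains the endpoints in this section: create a library, add files, optionally install embeddings, then query. A condensed sketch (file names are placeholders; each call is detailed below):
library_name = "contract_library"

# create the library and add documents (parsed, chunked, and indexed on upload)
client.create_new_library(library_name=library_name)
client.add_files(library_name=library_name, uploaded_files=["contract1.pdf", "contract2.pdf"])

# optional: install vector embeddings to enable /semantic_query/
client.install_embedding(library_name=library_name)

# text query against the library
results = client.query(library_name=library_name, user_query="termination clauses", result_count=5)
print("query results: ", results)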
/create_new_library/
Create library
Creates a new library, which is a collection of documents that are parsed, indexed, and organized for knowledge retrieval.
Request body
Required
library_name
Name for the new library
Optional
account_id
Account identifier
trusted_key
Trusted authentication key
Returns
response
Creation status for the new library
Timeout: This endpoint has a timeout of 30 seconds.
Request
# Example variables
library_name = "contract_library"
account_id = "user123"
api_key = "your-api-key" # Optional
print("\nStarting 'Create library'")
print(f"Library_name: {library_name}")
print(f"Account_id: {account_id}")
# This is the main create_new_library API
response = client.create_new_library(library_name=library_name, account_id=account_id, api_key=api_key)
print("\nCreate library response: ", response)
Response
{
"response": {
"library_name": "contract_library",
"status": "created",
"doc_count": 0
}
}
Additional Information
- • Category: library
- • Timeout: 30 seconds
- • Requires authentication
/add_files/
Add files to library
Core method for adding files to a library; uploaded files are parsed, text-chunked, and indexed automatically.
Request body
Required
library_name
Name of the library to add files to
uploaded_files
Array of files to upload and process
Optional
account_id
Account identifier
trusted_key
Trusted authentication key
Returns
response
Processing status, including files processed and chunks created
Timeout: This endpoint has a timeout of 300 seconds.
Request
# Example variables
library_name = "contract_library"
uploaded_files = ["contract1.pdf","contract2.pdf"]
api_key = "your-api-key" # Optional
print("\nStarting 'Add files to library'")
print(f"Library_name: {library_name}")
print(f"Uploaded_files: {uploaded_files}")
# This is the main add_files API
response = client.add_files(library_name=library_name, uploaded_files=uploaded_files, api_key=api_key)
print("\nAdd files to library response: ", response)
Response
{
"response": {
"files_processed": 2,
"chunks_created": 45,
"status": "completed"
}
}
Additional Information
- • Category: library
- • Timeout: 300 seconds
- • Requires authentication
/query/
Query library
Execute a text-based query against an existing library to find relevant documents and passages.
Request body
Required
library_name
Name of the library to query
user_query
Search query text
Optional
result_count
Number of results to return
trusted_key
Trusted authentication key
Returns
response
List of matching text chunks with relevance scores
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
library_name = "contract_library"
user_query = "termination clauses"
result_count = 10
api_key = "your-api-key" # Optional
print("\nStarting 'Query library'")
print(f"Library_name: {library_name}")
print(f"User_query: {user_query}")
print(f"Result_count: {result_count}")
# This is the main query API
response = client.query(library_name=library_name, user_query=user_query, result_count=result_count, api_key=api_key)
print("\nQuery library response: ", response)
Response
{
"response": [
{
"doc_id": 1,
"text": "Termination clause content...",
"score": 0.95
}
]
}
Additional Information
- • Category: library
- • Timeout: 60 seconds
- • Requires authentication
/get_library_card/
Get library info
Get comprehensive metadata information about a library including document count, embedding status, and configuration.
Request body
Required
library_name
Name of the library to get information for
Optional
trusted_key
Trusted authentication key
Returns
response
Library metadata, including document count and embedding status
Timeout: This endpoint has a timeout of 10 seconds.
Request
# Example variables
library_name = "contract_library"
api_key = "your-api-key" # Optional
print("\nStarting 'Get library info'")
print(f"Library_name: {library_name}")
# This is the main get_library_card API
response = client.get_library_card(library_name=library_name, api_key=api_key)
print("\nGet library info response: ", response)
Response
{
"response": {
"library_name": "contract_library",
"doc_count": 25,
"embedding_status": "installed",
"created_date": "2024-01-15"
}
}
Additional Information
- • Category: library
- • Timeout: 10 seconds
- • Requires authentication
/install_embedding/
Install embeddings
Installs vector embeddings across a library and creates the appropriate vectors in the vector database for semantic search.
Request body
Required
library_name
Name of the library to install embeddings for
Optional
embedding_model
Embedding model to use
vector_db
Vector database to use
trusted_key
Trusted authentication key
Returns
response
Embedding installation status
Timeout: This endpoint has a timeout of 600 seconds.
Request
# Example variables
library_name = "contract_library"
embedding_model = "llama-3.2-1b-instruct-ov"
api_key = "your-api-key" # Optional
print("\nStarting 'Install embeddings'")
print(f"Library_name: {library_name}")
print(f"Embedding_model: {embedding_model}")
# This is the main install_embedding API
response = client.install_embedding(library_name=library_name, embedding_model=embedding_model, api_key=api_key)
print("\nInstall embeddings response: ", response)
Response
{
"response": {
"embeddings_created": 1250,
"vector_db": "milvus",
"status": "completed"
}
}
Additional Information
- • Category: library
- • Timeout: 600 seconds
- • Requires authentication
/semantic_query/
Semantic search
Executes a semantic/vector query against embeddings for more accurate content retrieval based on meaning rather than keywords.
Request body
Required
library_name
Name of the library to search
user_query
Semantic search query
Optional
result_count
Number of results to return
db
Database type
vector_db
Vector database type
trusted_key
Trusted authentication key
Returns
response
List of semantically similar text chunks with similarity scores
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
library_name = "contract_library"
user_query = "contract termination conditions"
result_count = 5
api_key = "your-api-key" # Optional
print("\nStarting 'Semantic search'")
print(f"Library_name: {library_name}")
print(f"User_query: {user_query}")
print(f"Result_count: {result_count}")
# This is the main semantic_query API
response = client.semantic_query(library_name=library_name, user_query=user_query, result_count=result_count, api_key=api_key)
print("\nSemantic search response: ", response)
Response
{
"response": [
{
"doc_id": 1,
"similarity_score": 0.92,
"text": "Contract termination..."
}
]
}
Additional Information
- • Category: library
- • Timeout: 60 seconds
- • Requires authentication
/get_document_list/
List documents
Returns a comprehensive list of all documents contained in a specific library with metadata.
Request body
Required
library_name
Name of the library to list documents from
Optional
trusted_key
Trusted authentication key
Returns
response
List of documents in the library with metadata
Timeout: This endpoint has a timeout of 30 seconds.
Request
# Example variables
library_name = "contract_library"
api_key = "your-api-key" # Optional
print("\nStarting 'List documents'")
print(f"Library_name: {library_name}")
# This is the main get_document_list API
response = client.get_document_list(library_name=library_name, api_key=api_key)
print("\nList documents response: ", response)
Response
{
"response": [
{
"doc_id": 1,
"filename": "contract1.pdf",
"pages": 12,
"upload_date": "2024-01-15"
},
{
"doc_id": 2,
"filename": "contract2.pdf",
"pages": 8,
"upload_date": "2024-01-16"
}
]
}
Additional Information
- • Category: library
- • Timeout: 30 seconds
- • Requires authentication
/get_document_text/
Extract document text
Returns the complete text extract of a selected document from a specified library for review or processing.
Request body
Required
library_name
Name of the library containing the document
Optional
doc_id
Document ID to extract text from
doc_fn
Document filename to extract text from
trusted_key
Trusted authentication key
Returns
response
Full text extract of the selected document
Timeout: This endpoint has a timeout of 60 seconds.
Request
# Example variables
library_name = "contract_library"
doc_id = "1"
api_key = "your-api-key" # Optional
print("\nStarting 'Extract document text'")
print(f"Library_name: {library_name}")
print(f"Doc_id: {doc_id}")
# This is the main get_document_text API
response = client.get_document_text(library_name=library_name, doc_id=doc_id, api_key=api_key)
print("\nExtract document text response: ", response)
Response
{
"response": {
"doc_id": 1,
"filename": "contract1.pdf",
"text": "Full document text content..."
}
}
Additional Information
- • Category: library
- • Timeout: 60 seconds
- • Requires authentication
Agent Execution
Execute automated multi-step processes and workflows using pre-configured intelligent agents.
/run_agent/
Execute agent
Executes a pre-configured agent process for automated multi-step document analysis and task completion.
Request body
Required
process_name
Name of the agent process to execute
Optional
process_zip
Agent process zip file to upload and execute
input_list
List of inputs for the agent process
text
Text input for the agent
snippet
Code or text snippet input
document_file
Document file for agent processing
table_file
Table/spreadsheet file for processing
image_file
Image file for agent analysis
source_file
Source code file for processing
user_files
Additional user files for processing
trusted_key
Trusted authentication key
Returns
response
Results of the executed agent process
Timeout: This endpoint has a timeout of 300 seconds.
Request
# Example variables
process_name = "contract_analyzer"
document_file = "contract.pdf"
input_list = ["analysis_type","text","comprehensive"]
api_key = "your-api-key" # Optional
print("\nStarting 'Execute agent'")
print(f"Process_name: {process_name}")
print(f"Document_file: {document_file}")
print(f"Input_list: {input_list}")
# This is the main run_agent API
response = client.run_agent(process_name=process_name, document_file=document_file, input_list=input_list, api_key=api_key)
print("\nExecute agent response: ", response)
Response
{
"response": {
"agent_name": "contract_analyzer",
"status": "completed",
"results": {
"effective_date": "2024-01-01",
"base_salary": "$150,000"
},
"execution_time": "45s"
}
}
Additional Information
- • Category: agent
- • Timeout: 300 seconds
- • Requires authentication
/lookup_agent/
Find agent
Checks if a specific agent process exists and is available on the server for execution.
Request body
Required
process_name
Name of the agent process to look up
Optional
trusted_key
Trusted authentication key
Returns
response
Availability and details for the agent process
Timeout: This endpoint has a timeout of 10 seconds.
Request
# Example variables
process_name = "contract_analyzer"
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'Find agent'")
print(f"Process_name: {process_name}")
# This is the main lookup_agent API
response = client.lookup_agent(process_name=process_name, trusted_key=trusted_key)
print("\nFind agent response: ", response)
Response
{
"response": {
"process_name": "contract_analyzer",
"available": true,
"description": "Analyzes contracts for key terms",
"version": "1.2.0"
}
}
Additional Information
- • Category: agent
- • Timeout: 10 seconds
- • Requires authentication
/get_all_agents/
List all agents
Returns a comprehensive list of all available agent processes on the server with their capabilities.
Request body
Optional
trusted_key
Trusted authentication key
Returns
response
List of available agent processes
Timeout: This endpoint has a timeout of 10 seconds.
Request
# Example variables
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'List all agents'")
# This is the main get_all_agents API
response = client.get_all_agents(trusted_key=trusted_key)
print("\nList all agents response: ", response)
Response
{
"response": [
{
"name": "contract_analyzer",
"description": "Contract analysis agent",
"version": "1.2.0"
},
{
"name": "invoice_processor",
"description": "Invoice processing agent",
"version": "1.0.1"
}
]
}
Additional Information
- • Category: agent
- • Timeout: 10 seconds
- • Requires authentication
Utilities & Administration
Server management, health checks, and administrative functions for monitoring and control.
/ping/
Health check
Quick health check to verify if the API server is responsive and operational.
Request body
Optional
trusted_key
Trusted authentication key
Returns
response
Server status information
Timeout: This endpoint has a timeout of 5 seconds.
Request
# Example variables
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'Health check'")
# This is the main ping API
response = client.ping(trusted_key=trusted_key)
print("\nHealth check response: ", response)
Response
{
"response": {
"status": "ok",
"timestamp": "2024-01-15T10:30:00Z",
"version": "1.0.0"
}
}
Additional Information
- • Category: utility
- • Timeout: 5 seconds
- • Requires authentication
/server_stop/
Stop server
Gracefully stops the API server. Use with caution as this will terminate all active connections.
Request body
Optional
trusted_key
Trusted authentication key
Returns
response
Shutdown status message
Timeout: This endpoint has a timeout of 10 seconds.
Request
# Example variables
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'Stop server'")
# This is the main server_stop API
response = client.server_stop(trusted_key=trusted_key)
print("\nStop server response: ", response)
Response
{
"response": {
"status": "stopping",
"message": "Server shutdown initiated"
}
}
Additional Information
- • Category: utility
- • Timeout: 10 seconds
- • Requires authentication
/get_api_catalog/
API catalog
Returns a complete catalog of all available API endpoints with their specifications and parameters.
Request body
Optional
trusted_key
Trusted authentication key
Returns
response
Catalog of available API endpoints with their specifications
Timeout: This endpoint has a timeout of 10 seconds.
Request
# Example variables
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'API catalog'")
# This is the main get_api_catalog API
response = client.get_api_catalog(trusted_key=trusted_key)
print("\nAPI catalog response: ", response)
Response
{
"response": [
{
"api_name": "inference",
"endpoint": "/inference/",
"method": "POST",
"timeout": 60
},
{
"api_name": "stream",
"endpoint": "/stream/",
"method": "POST",
"timeout": 60
}
]
}
Additional Information
- • Category: utility
- • Timeout: 10 seconds
- • Requires authentication
/get_db_info/
Database info
Returns information about registered databases and vector databases available on the server.
Request body
Optional
trusted_key
Trusted authentication key
Returns
response
Registered databases and vector databases available on the server
Timeout: This endpoint has a timeout of 10 seconds.
Request
# Example variables
trusted_key = "your-trusted-key" # Optional
print("\nStarting 'Database info'")
# This is the main get_db_info API
response = client.get_db_info(trusted_key=trusted_key)
print("\nDatabase info response: ", response)
Response
{
"response": {
"databases": [
"mongo",
"sqlite"
],
"vector_databases": [
"milvus",
"faiss"
],
"default_db": "mongo",
"default_vector_db": "milvus"
}
}
Additional Information
- • Category: utility
- • Timeout: 10 seconds
- • Requires authentication