Updated: 5/21/2025
What are Generative Code Actions and how do you use them?
Generative Code Actions are a powerful feature that lets you embed generative AI operations (creating text, images, documents, and data) directly into Botmaker's Code Actions.
What can you do with Generative Actions?
Generative Code Actions enable you, for example, to:
- Extract information from images (e.g., data from a document photo, image classification, inventories generated from a photograph, etc.).
- Analyze documents (summarize content, identify risks in contracts, extract totals or items from PDF invoices, etc.).
- Classify and synthesize user messages (create summaries, detect user intentions, categorize tickets based on customer-provided free-form descriptions, etc.).
- Process responses from external APIs (use AI to analyze and simplify large JSON responses, for instance).
Flexible and multimodal: Generative Code Actions support text, images, and documents (PDF, PNG, JPG, etc.) of up to 20 MB.
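Because files have a size limit, it can help to check an attachment before sending it to the model. The following is a minimal sketch, not part of Botmaker's API: the function name `checkAttachment`, the extension list, and the reading of "20 MB" as 20 * 1024 * 1024 bytes are all assumptions made for illustration. Other file types may be supported in practice.

```typescript
// Illustrative pre-check only. The extension list and the byte value used
// for "20 MB" are assumptions, not documented Botmaker limits.
const MAX_ATTACHMENT_BYTES = 20 * 1024 * 1024;
const ASSUMED_EXTENSIONS = ["pdf", "png", "jpg", "jpeg"];

interface AttachmentCheck {
  ok: boolean;
  reason?: string;
}

// Returns ok: true when the file looks acceptable, otherwise a reason.
function checkAttachment(fileName: string, sizeInBytes: number): AttachmentCheck {
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (ASSUMED_EXTENSIONS.indexOf(extension) === -1) {
    return { ok: false, reason: "Unsupported file type: " + (extension || "none") };
  }
  if (sizeInBytes > MAX_ATTACHMENT_BYTES) {
    return { ok: false, reason: "File is larger than 20 MB" };
  }
  return { ok: true };
}
```

When the check fails, the flow could ask the user for a smaller or different file instead of calling the model.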
Differences compared to other features
- It is not an intelligent agent: It does not maintain conversations or pursue long-term objectives. It executes a function ("tool") and returns a result in a variable.
- It does not primarily rely on a Knowledge Base or Content Bases: Incorporating them is recommended only when genuinely beneficial and for very large fragments/documents. For specific rules or shorter lists, it is more efficient to include them directly in the prompt.
- It does not replace the classic Prompt or Agent functions. It's a specific tool ideal for handling "one-shot" processing tasks.
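As a rough illustration of including short rules or lists directly in the prompt, the sketch below builds a classification prompt from an inline category list. The helper `buildClassificationPrompt` and the three categories are hypothetical, chosen only to match the household-issues example used in this article.

```typescript
// Hypothetical helper: embeds a short category list directly in the prompt
// instead of looking it up in a Knowledge Base.
function buildClassificationPrompt(role: string, categories: string[]): string {
  const numbered = categories.map(function (category, index) {
    return (index + 1) + ". " + category;
  });
  return [role, "Classify the user's message into exactly one of these categories:"]
    .concat(numbered)
    .concat(["Reply with the category name only."])
    .join("\n");
}

// The categories below are invented for illustration.
const householdPrompt = buildClassificationPrompt(
  "You are an assistant specialized in categorizing household issues.",
  ["Plumbing", "Electrical", "Appliances"]
);
```

Keeping the list in code makes it easy to update the categories without touching the rest of the instructions.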
How to set up a Generative Code Action?
1. Receive the input (user, image, file, API)
The input can be:
- A user message (text)
- An image (file)
- A document (PDF, etc.)
- An API response (e.g., a JSON previously processed in the flow)
2. Set up the Code Action
You should structure the Code Action by including:
- Instructions (Prompt): Clearly specify the desired task (e.g., “You are an assistant specialized in categorizing household issues. Read the options and classify the user's message into these categories…”).
- Dynamic input (Query): The information provided by the user or the content you want analyzed (text, image, file).
- Specify Format (Schema), optional: If you want the result in JSON, explicitly define the structure (e.g., "I want you to return a JSON object with the following keys: problem, summary..."). Recommended for automated processing scenarios.
- Choose Generative Model: Select from various available models (e.g., Flash for fast and cost-efficient answers, Thinking/Reasoning for higher quality and depth).
- Assign Output Variable: Determine where to store the result for subsequent use in the flow.
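The pieces listed above can be thought of as one small configuration. The sketch below is only a way to organize them before writing the Code Action; it is not Botmaker's API, and the names `GenerativeActionPlan`, `findPlanProblems`, `renderInstructions`, the model label `"flash"`, and the variable name `ticketClassification` are placeholders assumed for illustration.

```typescript
// Hypothetical planning structure mirroring the five items above.
interface GenerativeActionPlan {
  prompt: string;          // Instructions
  query: string;           // Dynamic input
  schemaKeys?: string[];   // Optional JSON keys to request
  model: string;           // Chosen generative model (placeholder label)
  outputVariable: string;  // Where the result will be stored
}

// Lists which required pieces are still missing.
function findPlanProblems(plan: GenerativeActionPlan): string[] {
  const problems: string[] = [];
  if (plan.prompt.trim() === "") problems.push("prompt is empty");
  if (plan.query.trim() === "") problems.push("query is empty");
  if (plan.model.trim() === "") problems.push("model is not selected");
  if (plan.outputVariable.trim() === "") problems.push("output variable is not assigned");
  return problems;
}

// Appends an explicit JSON format request when schema keys are given.
function renderInstructions(plan: GenerativeActionPlan): string {
  if (!plan.schemaKeys || plan.schemaKeys.length === 0) return plan.prompt;
  return plan.prompt + "\nReturn a JSON object with the following keys: " +
    plan.schemaKeys.join(", ") + ".";
}

const ticketPlan: GenerativeActionPlan = {
  prompt: "You are an assistant specialized in categorizing household issues.",
  query: "The kitchen tap has been dripping since yesterday.",
  schemaKeys: ["problem", "summary"],
  model: "flash",
  outputVariable: "ticketClassification",
};
```

Stating the requested keys in the instructions, as `renderInstructions` does, follows the recommendation to define the JSON structure explicitly when the result will be processed automatically.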
Minimal structure example
Basic code example:
https://gist.github.com/hernanliendo/a899224343d967e1fa09d2437579a156
Multimedia example that sends an image and asks for a description of its contents:
https://gist.github.com/hernanliendo/4a0f9745354863570f773b7c63a20b29
Example that parses the user's message and extracts information into the fields of a JSON object:
https://gist.github.com/hernanliendo/011595222e2ebd3a1117fdbaade7bacc
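When a result is requested as JSON, the model may occasionally wrap it in extra text or omit a key, so it is worth validating the output before storing it. The following is a minimal sketch under that assumption. `parseModelJson` is a hypothetical helper, not taken from the examples linked above, and the keys `problem` and `summary` come from the schema example in this article.

```typescript
// Hypothetical helper: extracts the JSON object from a model reply and
// returns null when it cannot be parsed or a required key is missing.
function parseModelJson(raw: string, requiredKeys: string[]): Record<string, unknown> | null {
  // Keep only the text between the first "{" and the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch (error) {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;

  const result = parsed as Record<string, unknown>;
  for (let i = 0; i < requiredKeys.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(result, requiredKeys[i])) return null;
  }
  return result;
}
```

A `null` result can be used to route the flow to a fallback, such as asking the user to rephrase or retrying the action.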