Action Types

When the AI processes your request, it responds with one or more actions. Each action is a structured instruction that the extension executes on the page. This page documents every action type, its fields, and when the AI uses it.

How actions work

The AI's response is a JSON object with an actions array. Each action has a type field and additional fields depending on the type. The extension executes actions in order.

{
  "actions": [
    { "type": "navigate", "target": "/account" },
    { "type": "show-message", "message": "Navigating to your account page." }
  ]
}

A single response can contain several actions. For example, the AI might navigate to a page and then show a message explaining what it did.
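
The in-order execution described above can be pictured as a small dispatcher. This is a minimal sketch, not the extension's actual code: runActions, the Handler type, and the handler map are hypothetical names, and the handlers only log so the example runs outside a browser.

```typescript
// Hypothetical sketch: names and structure are assumptions, not the
// extension's real internals.
type Action = { type: string; [field: string]: unknown };
type Handler = (action: Action) => void;

function runActions(actions: Action[], handlers: Record<string, Handler>): string[] {
  const executed: string[] = [];
  for (const action of actions) {
    const handler = handlers[action.type];
    if (!handler) {
      throw new Error(`Unknown action type: ${action.type}`);
    }
    handler(action); // each action finishes before the next one starts
    executed.push(action.type);
  }
  return executed;
}

// Usage with stub handlers that only record what they were asked to do.
const log: string[] = [];
const executed = runActions(
  [
    { type: "navigate", target: "/account" },
    { type: "show-message", message: "Navigating to your account page." },
  ],
  {
    navigate: (a) => log.push(`navigate -> ${String(a.target)}`),
    "show-message": (a) => log.push(`message: ${String(a.message)}`),
  },
);
console.log(executed, log);
```

Looking handlers up by the type field keeps the loop itself unaware of individual action types, so a new type only needs a new handler entry.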

navigate

Navigates the browser to a different URL. The AI uses this to move between pages, either using routes from a recipe or URLs it discovers in the DOM.

| Field | Type | Description |
|---|---|---|
| type | "navigate" | Action type identifier. |
| target | String | The URL path to navigate to (e.g., /account, /products?q=shoes). |
| message | String (optional) | A message to show the user explaining the navigation. |

When the AI uses it

  • User asks to go to a specific page: "Go to settings"
  • User asks to do something on a different page: "Check my balance" (navigates to account page first)
  • As part of a multi-step task that spans pages

Example

{
  "type": "navigate",
  "target": "/demos/ginko/deposit",
  "message": "Going to the deposit page."
}
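
Because target is a path rather than a full URL, it has to be resolved against the current page before the browser can load it. A minimal sketch of that step, assuming resolution against the current page URL with the standard URL constructor; resolveTarget, isSameOrigin, and the shop.example host are hypothetical, and the source does not say whether cross-origin targets are allowed.

```typescript
// Assumption: the target is resolved against the URL of the current page.
function resolveTarget(target: string, currentUrl: string): string {
  return new URL(target, currentUrl).href;
}

// Hypothetical guard: reports whether a target stays on the current origin.
function isSameOrigin(target: string, currentUrl: string): boolean {
  return new URL(target, currentUrl).origin === new URL(currentUrl).origin;
}

const resolved = resolveTarget("/products?q=shoes", "https://shop.example/account");
console.log(resolved); // https://shop.example/products?q=shoes
console.log(isSameOrigin("https://other.example/x", "https://shop.example/account")); // false
```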

click

Clicks on a DOM element identified by a CSS selector. Used for buttons, links, checkboxes, and any clickable element.

| Field | Type | Description |
|---|---|---|
| type | "click" | Action type identifier. |
| selector | String | CSS selector for the element to click. |
| message | String (optional) | A message explaining what was clicked. |

When the AI uses it

  • User asks to press a button: "Submit the form"
  • User asks to click a link: "Click on the cart icon"
  • To trigger UI state changes: opening menus, expanding sections, etc.

Example

{
  "type": "click",
  "selector": "#deposit-form button[type='submit']",
  "message": "Clicking the submit button on the deposit form."
}
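
A selector may match nothing, for example after the page has re-rendered. The sketch below shows one way a click handler could surface that case instead of failing silently. runClick and the Clickable type are hypothetical; the element lookup is injected so the example runs without a browser, where it could be (s) => document.querySelector<HTMLElement>(s).

```typescript
// Minimal structural type so the sketch runs outside a browser.
interface Clickable {
  click(): void;
}
type Find = (selector: string) => Clickable | null;

// Hypothetical handler: returns false when the selector matches nothing.
function runClick(selector: string, find: Find): boolean {
  const el = find(selector);
  if (el === null) return false;
  el.click();
  return true;
}

// Usage with a fake page that contains a single button.
let clicks = 0;
const fakePage = new Map<string, Clickable>([
  ["#deposit-form button[type='submit']", { click: () => { clicks += 1; } }],
]);
const find: Find = (s) => fakePage.get(s) ?? null;

const clicked = runClick("#deposit-form button[type='submit']", find);
const missed = runClick("#missing", find);
console.log(clicked, missed, clicks);
```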

execute-js

Executes arbitrary JavaScript code on the page. This is the most powerful and flexible action type. The AI uses it for typing into inputs, manipulating DOM state, reading values, and any interaction that doesn't fit neatly into click or navigate.

| Field | Type | Description |
|---|---|---|
| type | "execute-js" | Action type identifier. |
| code | String | JavaScript code to execute in the page context. The code runs via eval() in the content script. |
| message | String (optional) | A message explaining what the code does. |

When the AI uses it

  • Filling in form fields: setting .value and dispatching input events
  • Scrolling the page to reveal content
  • Reading text content from specific elements
  • Complex DOM manipulation that requires multiple steps
  • Triggering framework-specific events (React, Vue, etc.)

Example

{
  "type": "execute-js",
  "code": "const el = document.querySelector('#amount'); el.value = '5000'; el.dispatchEvent(new Event('input', { bubbles: true })); el.dispatchEvent(new Event('change', { bubbles: true }));",
  "message": "Entering 5000 into the amount field."
}

Note: After setting a value, the AI dispatches synthetic events (such as input and change) so that frontend frameworks (React, Vue, etc.) detect the change. Setting .value alone is not enough for most modern apps.
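
To make the set-then-dispatch pattern concrete, here is a sketch of a helper that builds such an execute-js action. buildFillAction is a hypothetical name and not part of the extension; the generated code guards against a missing element and fires both input and change.

```typescript
// Hypothetical helper: builds an execute-js action that fills one field.
function buildFillAction(selector: string, value: string) {
  // JSON.stringify yields valid JavaScript string literals, so quotes in the
  // selector or value cannot break out of the generated code.
  const sel = JSON.stringify(selector);
  const val = JSON.stringify(value);
  const code = [
    `const el = document.querySelector(${sel});`,
    `if (el) {`,
    `  el.value = ${val};`,
    `  el.dispatchEvent(new Event('input', { bubbles: true }));`,
    `  el.dispatchEvent(new Event('change', { bubbles: true }));`,
    `}`,
  ].join("\n");
  return {
    type: "execute-js" as const,
    code,
    message: `Entering ${value} into ${selector}.`,
  };
}

const fillAction = buildFillAction("#amount", "5000");
console.log(fillAction.code);
```

For React-controlled inputs, a common workaround is to call the native value setter from HTMLInputElement.prototype before dispatching the events. Whether the AI does this is not stated here.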

show-message

Displays a text message to the user in the gyoza chat. This is the AI's way of communicating information, answering questions, or explaining what it's doing. No DOM interaction occurs.

| Field | Type | Description |
|---|---|---|
| type | "show-message" | Action type identifier. |
| message | String | The text to display to the user. |

When the AI uses it

  • Answering a question: "What is my balance?" (the AI reads the page and replies with the amount)
  • Explaining an action: "I just navigated to the deposit page"
  • Providing context or instructions
  • Translating page content for the user

Example

{
  "type": "show-message",
  "message": "Your current balance is 150,000 JPY. The most recent transaction was a deposit of 20,000 JPY on March 15."
}

highlight-ui

Visually highlights a DOM element by adding a temporary colored border or overlay. Used to draw the user's attention to a specific part of the page without clicking or modifying it.

| Field | Type | Description |
|---|---|---|
| type | "highlight-ui" | Action type identifier. |
| selector | String | CSS selector for the element to highlight. |
| message | String (optional) | A message explaining what's being highlighted. |

When the AI uses it

  • User asks where something is: "Where is the logout button?"
  • Guiding the user through a process: "Click the button I highlighted"
  • Pointing out specific information on the page

Example

{
  "type": "highlight-ui",
  "selector": ".nav-logout",
  "message": "The logout button is in the top-right navigation area. I've highlighted it for you."
}

fetch

Makes an HTTP request to an API endpoint. The AI uses this when a recipe includes <api-endpoints> or when it determines that calling an API directly is more efficient than interacting with the UI.

| Field | Type | Description |
|---|---|---|
| type | "fetch" | Action type identifier. |
| url | String | The URL to request. |
| method | String (optional) | HTTP method: GET, POST, PUT, PATCH, DELETE. Defaults to GET. |
| message | String (optional) | A message explaining the API call. |

When the AI uses it

  • Recipe lists API endpoints and the task is best done via API
  • Fetching data that isn't visible on the current page
  • Submitting data programmatically (e.g., adding an item to cart via API)

Example

{
  "type": "fetch",
  "url": "/api/cart",
  "method": "GET",
  "message": "Checking the current cart contents."
}
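
The sketch below shows how a fetch action could be normalized before the request is sent: the method defaults to GET, anything outside the documented list is rejected, and a relative url is resolved against the current page. toRequest and the shop.example host are hypothetical. The action has no documented body field, so the sketch does not invent one.

```typescript
const ALLOWED_METHODS = ["GET", "POST", "PUT", "PATCH", "DELETE"] as const;
type HttpMethod = (typeof ALLOWED_METHODS)[number];

interface FetchAction {
  type: "fetch";
  url: string;
  method?: string;
  message?: string;
}

// Hypothetical normalizer: applies the GET default and validates the method.
function toRequest(action: FetchAction, pageUrl: string): { url: string; method: HttpMethod } {
  const method = (action.method ?? "GET").toUpperCase();
  const known = ALLOWED_METHODS.find((m) => m === method);
  if (known === undefined) {
    throw new Error(`Unsupported HTTP method: ${action.method}`);
  }
  return { url: new URL(action.url, pageUrl).href, method: known };
}

const cartRequest = toRequest({ type: "fetch", url: "/api/cart" }, "https://shop.example/products");
console.log(cartRequest);
// In a browser, the request itself could then be sent with:
//   fetch(cartRequest.url, { method: cartRequest.method })
```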

clarify

Asks the user for clarification when the request is ambiguous. The AI presents a question and optionally a list of choices for the user to pick from.

| Field | Type | Description |
|---|---|---|
| type | "clarify" | Action type identifier. |
| message | String | The question to ask the user. |
| options | String[] (optional) | A list of choices the user can pick from. Displayed as buttons in the widget. |

When the AI uses it

  • Ambiguous request: "Transfer money" (the AI asks "To which account?")
  • Multiple options: "Open a product page" (the AI shows a list of products to choose from)
  • Missing required information: "Deposit some money" (the AI asks for the amount)

Example

{
  "type": "clarify",
  "message": "How much would you like to deposit?",
  "options": ["5,000 JPY", "10,000 JPY", "50,000 JPY", "Other amount"]
}

Action response schema

The complete schema for the AI response, expressed as TypeScript types. The engine validates all fields with Zod.

// Action types
type ActionType = "navigate" | "click" | "execute-js" | "show-message"
                | "highlight-ui" | "fetch" | "clarify";

// Single action
interface Action {
  type: ActionType;
  target?: string;    // navigate: URL path
  selector?: string;  // click, highlight-ui: CSS selector
  code?: string;      // execute-js: JavaScript code
  message?: string;   // user-facing message
  url?: string;       // fetch: request URL
  method?: string;    // fetch: HTTP method
  options?: string[]; // clarify: choice options
}

// Full AI response
interface ActionResponse {
  actions: Action[];           // at least one action
  extraRequests?: string[];    // optional: request more page context
}
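
The Action interface above keeps every field optional, so it cannot express which fields each type requires. The sketch below restates it as a discriminated union and adds a runtime check. It is an illustration only: the engine is described as using Zod, whereas this example hand-rolls the check to stay dependency-free, and isAction, parseResponse, and REQUIRED_FIELD are hypothetical names.

```typescript
type Action =
  | { type: "navigate"; target: string; message?: string }
  | { type: "click" | "highlight-ui"; selector: string; message?: string }
  | { type: "execute-js"; code: string; message?: string }
  | { type: "show-message"; message: string }
  | { type: "fetch"; url: string; method?: string; message?: string }
  | { type: "clarify"; message: string; options?: string[] };

// Required string field per action type, taken from the field tables above.
const REQUIRED_FIELD: Record<Action["type"], string> = {
  navigate: "target",
  click: "selector",
  "highlight-ui": "selector",
  "execute-js": "code",
  "show-message": "message",
  fetch: "url",
  clarify: "message",
};

function isAction(value: unknown): value is Action {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const type = record.type;
  if (typeof type !== "string") return false;
  // Own-property check so names like "toString" are not accepted as types.
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_FIELD, type)) return false;
  const required = REQUIRED_FIELD[type as Action["type"]];
  return typeof record[required] === "string";
}

function parseResponse(json: string): Action[] {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Response must be an object");
  }
  const actions = (parsed as { actions?: unknown }).actions;
  if (!Array.isArray(actions) || actions.length === 0 || !actions.every(isAction)) {
    throw new Error("Response must contain at least one valid action");
  }
  return actions;
}

const parsedActions = parseResponse(
  '{"actions":[{"type":"navigate","target":"/account"},' +
    '{"type":"show-message","message":"Navigating to your account page."}]}',
);
console.log(parsedActions.map((a) => a.type));
```

The check covers only the type field and the one required string field; optional fields such as options and extraRequests are left unvalidated to keep the sketch short.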

Extra requests

The AI can include an optional extraRequests array in its response. It tells the extension to gather additional page context and send it in the next turn, which helps when the AI needs more information to complete the task.

| Request type | Description |
|---|---|
| buttonsSnapshot | A list of all buttons currently visible on the page. |
| linksSnapshot | A list of all links on the page with their text and href. |
| formsSnapshot | A snapshot of all forms and their fields. |
| inputsSnapshot | A list of all input fields with their current values. |
| textContentSnapshot | The full text content of the page body. |
| fullPageSnapshot | A complete DOM snapshot of the page (expensive, used as last resort). |

Extra requests enable a conversational loop: the AI performs an action, then asks for more context about the resulting page state to inform its next action.
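
One way to picture the extension's side of this loop is as a lookup from request type to a collector function, with the results attached to the next turn. This is a sketch under assumptions: gatherContext, the Collector type, and the string snapshots are hypothetical, and real collectors would read from the DOM rather than return fixed text.

```typescript
// Hypothetical: collectors are injected so the sketch runs outside a browser.
type Collector = () => string;

function gatherContext(
  extraRequests: string[],
  collectors: Map<string, Collector>,
): Record<string, string> {
  const context: Record<string, string> = {};
  for (const requestType of extraRequests) {
    const collect = collectors.get(requestType);
    if (collect) {
      context[requestType] = collect(); // unknown request types are skipped
    }
  }
  return context;
}

// Usage with stub collectors that return fixed text.
const collectors = new Map<string, Collector>([
  ["buttonsSnapshot", () => "Submit, Cancel"],
  ["linksSnapshot", () => "Home (/), Account (/account)"],
]);
const nextTurnContext = gatherContext(["buttonsSnapshot", "unknownSnapshot"], collectors);
console.log(nextTurnContext);
```

Only the requested snapshots are collected, which matters for expensive ones such as fullPageSnapshot.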