Test your AI agents systematically with question sets. Run evaluations, track results over time, and identify improvement opportunities to ensure reliable agent responses.

What is Agent Evaluation?

Agent Evaluation helps you test how your agent responds to a large set of questions.

Use it for:

Quality assurance before launching your agent
Identifying areas where your agent needs improvement
Regression testing after making changes to your knowledge base
Benchmarking agent performance over time


Getting Started

1. Select an Agent

Choose which agent you want to evaluate from the Agent dropdown at the top of the page.

2. Choose Test Language

Select the language for your test questions:

EN - English
FR - French

Your English and French question sets are stored separately, and the agent responds in the same language as the question.

Adding Questions

You can:

Type your questions manually
Import a file of questions
Ask the platform to generate questions for you

Option A: Upload Questions

Supported formats: CSV or TXT files

Click Upload Questions
Select a .csv or .txt file from your computer
Put one question per line (TXT) or per row (CSV)
For CSV files, use only the first column (delete any extra columns)
Questions that already exist in your set are skipped, so uploading never creates duplicates

Important notes:

Maximum 50 questions per test run
If uploading from spreadsheet software (Excel, Numbers), export as CSV first
Uploaded questions are saved to your question set automatically
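
The upload rules above (one question per row, first column only, no duplicates) can be applied before you upload. The following is a minimal sketch, not part of the platform: the function names are hypothetical, and treating questions that differ only in case or spacing as duplicates is an assumption about how you want to clean your own file.

```python
import csv
import io


def _normalize(question):
    """Collapse whitespace so each question fits on a single line."""
    return " ".join(question.split())


def prepare_questions(raw_questions, existing=()):
    """Return trimmed, non-empty questions with duplicates removed.

    Assumption: comparison is case-insensitive. The platform's own
    duplicate check may be stricter or looser than this.
    """
    seen = {_normalize(q).lower() for q in existing}
    cleaned = []
    for question in raw_questions:
        text = _normalize(question)
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


def to_single_column_csv(questions):
    """Serialize questions as CSV text with exactly one column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for question in questions:
        writer.writerow([question])
    return buffer.getvalue()
```

Write the returned text to a .csv file and upload it with Upload Questions. Questions that contain commas are quoted by the csv module, so they stay in the first column.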

Option B: Generate Questions

Let the system automatically generate questions from your agent's knowledge base:

Click Generate Questions
Choose the number of questions (5-50)
Select content types to include (documents, files, web sources)
Review the generated questions
Select which ones to keep and click Add Selected

To give you a complete picture of your agent's performance, the evaluation feature generates a mix of relevant and off-topic questions. This helps you see how well your agent handles queries that fall outside its intended scope.

Managing Your Question Set

Once you have questions loaded, you can:

Select/deselect questions using checkboxes (only selected questions will run)
Edit question text by clicking on it
Delete questions you don't need
Add new questions manually using the "+ Add Question" button

Download Questions

Export your questions as a CSV file using the Download Questions button. Useful for:

Backing up your question set
Sharing with team members
Editing in bulk using a spreadsheet

Running Tests

Start a Test

Ensure you have questions selected (checkboxes checked)
Click the Run Test button
Watch real-time progress as each question is answered
Results appear one by one as they complete

Maximum: 50 questions per run. Only one evaluation can run at a time for a given agent across your organisation.

During Test Execution

You'll see:

Progress bar showing completion status
Live results appearing as each question finishes
Execution time for each response
Success/failure indicators

Stop a Running Test

If you need to stop a test before completion:

Click the Stop button in the progress area
Wait for the system to safely stop processing
Partial results will be preserved

Understanding Results

For Each Question-Response Pair

Click any question to expand and view:

Response Details:

Full agent response (with markdown formatting)
Execution time (how long it took to generate)
Tools used (search, document retrieval, etc.)
Sources cited (documents or web pages)

Feedback Options:

Thumbs up - Good response
Thumbs down - Problematic response (add details about why)

Feedback Details

When you mark a response as negative, you can add specific notes:

What was wrong with the response
What you expected instead
Any patterns you notice

This feedback helps you identify:

Knowledge gaps in your agent
Questions that need better training data
Common failure patterns

Export Results

Download test results as CSV using the Download Results button.

The export includes:

Question text
Agent response
Success/failure status
Execution time
Error messages (if any)
Feedback ratings
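
Once exported, the results can be summarized outside the platform. The sketch below computes a success rate and mean execution time. The column names (status, execution_time) and the value "success" are assumptions for illustration only; check the header row of your actual export and pass the real names as arguments.

```python
import csv
import io
from statistics import mean


def summarize_results(csv_text, status_col="status",
                      time_col="execution_time", success_value="success"):
    """Summarize an exported results CSV.

    All column names and the success value are hypothetical defaults.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = len(rows)
    passed = sum(
        1 for row in rows
        if row[status_col].strip().lower() == success_value
    )
    times = []
    for row in rows:
        try:
            times.append(float(row[time_col]))
        except (TypeError, ValueError):
            pass  # skip rows with a missing or non-numeric time
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": passed / total if total else 0.0,
        "mean_time": mean(times) if times else None,
    }
```

To use it, read the downloaded file into a string (for example with open(path, newline="", encoding="utf-8").read()) and pass it to summarize_results.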

History Tab

View Past Test Runs

Switch to the History tab to see all previous evaluations:

Run List View:

Date and time of each run
Status (Completed, Stopped, In Progress, Error)
Total questions tested
Success/failure counts

Run Details View:

Click any run to see detailed results
Expand individual question-response pairs
Add feedback to past responses
Export specific run results as CSV
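
Exporting two runs as CSV makes a simple regression check possible: list the questions that succeeded in the earlier run and failed in the later one. This is a sketch under assumptions, not a platform feature. The column names (question, status) and the value "success" are hypothetical, so adjust them to match your export.

```python
import csv
import io


def load_statuses(csv_text, question_col="question",
                  status_col="status", success_value="success"):
    """Map each question to True (succeeded) or False (failed).

    Column names and the success value are hypothetical defaults.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[question_col]: row[status_col].strip().lower() == success_value
        for row in reader
    }


def find_regressions(baseline, current):
    """Return questions that passed in baseline but failed in current.

    Questions missing from the current run are ignored rather than
    counted as failures.
    """
    return sorted(
        question for question, passed in baseline.items()
        if passed and current.get(question) is False
    )
```

Comparing against a fixed baseline run, rather than only the previous run, keeps slow drift visible.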

Pagination

Runs list: 10 runs per page
Run details: 50 results per page

Use the pagination controls at the bottom to navigate through larger result sets.

Best Practices

Creating Effective Question Sets

Cover different topics from your knowledge base
Include edge cases and tricky questions
Mix simple and complex questions
Test common user queries

Regular Testing Schedule

Consider running evaluations:

Before major releases or updates
After adding new content to your knowledge base
When users report issues with specific topics
Monthly for quality monitoring

Troubleshooting

"Maximum of 50 questions reached"

You've selected more than 50 questions. Uncheck some questions or split them across multiple test runs.
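
If you plan your runs from a downloaded question list, the split can be worked out ahead of time. A minimal sketch follows; the function name is hypothetical and the default of 50 comes from the limit stated in this guide.

```python
def split_into_runs(questions, max_per_run=50):
    """Split a list of questions into consecutive batches.

    Each batch holds at most max_per_run questions, so each one
    fits within a single test run.
    """
    if max_per_run < 1:
        raise ValueError("max_per_run must be at least 1")
    return [
        questions[start:start + max_per_run]
        for start in range(0, len(questions), max_per_run)
    ]
```

For a set of 80 questions this yields one batch of 50 and one of 30. Select the questions of one batch, run the test, then repeat with the next batch.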

"Test already running"

Another test is in progress. Wait for it to complete or stop it first.

File Upload Errors

"Invalid file type"

Use only .csv or .txt files
If using spreadsheet software, export as CSV

"Multiple columns detected"

Delete all columns except the first (questions) column
Or save as plain text (.txt) with one question per line
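
For a large file, removing extra columns by hand is tedious. The sketch below keeps only the first column and returns plain text with one question per line, ready to save as a .txt file. The function name and the has_header flag are hypothetical helpers, not platform options.

```python
import csv
import io


def first_column_as_text(csv_text, has_header=False):
    """Return the first column of a CSV as one question per line.

    Blank rows and rows with an empty first cell are dropped.
    Set has_header=True to skip a header row.
    """
    rows = csv.reader(io.StringIO(csv_text))
    questions = [row[0].strip() for row in rows if row and row[0].strip()]
    if has_header:
        questions = questions[1:]
    return "\n".join(questions) + ("\n" if questions else "")
```

Because the csv module handles quoting, a question that contains a comma is still read as a single cell.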

"No questions found"

Check that your file isn't empty
Ensure each row contains text

Cannot See Results

If a test is running but you don't see results:

Check the History tab
Look for the in-progress run
Click Reconnect to resume viewing live results

Limits and Constraints

Max questions per run: 50
Max questions stored: 100 per agent
File size: Reasonable CSV/TXT files only
Concurrent runs: One test per agent at a time
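
The two numeric limits above can be checked before you upload or run anything. This is a small sketch with hypothetical names. Only the values 50 and 100 come from this guide.

```python
MAX_QUESTIONS_PER_RUN = 50   # limit stated in this guide
MAX_QUESTIONS_STORED = 100   # per-agent limit stated in this guide


def check_limits(stored_count, selected_count):
    """Return a list of problems; an empty list means within limits."""
    problems = []
    if stored_count > MAX_QUESTIONS_STORED:
        problems.append(
            f"{stored_count} stored questions exceeds {MAX_QUESTIONS_STORED}"
        )
    if selected_count > MAX_QUESTIONS_PER_RUN:
        problems.append(
            f"{selected_count} selected questions exceeds {MAX_QUESTIONS_PER_RUN}"
        )
    if selected_count == 0:
        problems.append("no questions selected")
    return problems
```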
