Agent evaluation

Test your AI agents systematically with question sets. Run evaluations, track results over time, and identify improvement opportunities to ensure reliable agent responses.

4 min read

What is Agent Evaluation?

Agent Evaluation helps you test how your agent responds to a large set of questions.

Use it for:

  • Quality assurance before launching your agent
  • Identifying areas where your agent needs improvement
  • Regression testing after making changes to your knowledge base
  • Benchmarking agent performance over time

Agent Evaluation InterfaceAgent Evaluation Interface

Getting Started

1. Select an Agent

Choose which agent you want to evaluate from the Agent dropdown at the top of the page.

2. Choose Test Language

Select the language for your test questions:

  • EN - English
  • FR - French

Your set of questions in english and in french are separate and the agent will respond in the same language as the question.

Adding Questions

You can:

  • type your questions manually
  • import a file of questions
  • ask the platform to generate questions for you

Uploading Questions

Supported formats: CSV or TXT files

  1. Click Upload Questions
  2. Select a .csv or .txt file from your computer
  3. Questions should be one per line/row
  4. For CSV files, use only the first column (delete any extra columns)
  5. If your question already exists in your set it will simply not be added so you don't have duplicates

Important notes:

  • Maximum 50 questions per test run
  • If uploading from spreadsheet software (Excel, Numbers), export as CSV first
  • Files are automatically saved to your question set

Option B: Generate Questions

Let the system automatically generate questions from your agent's knowledge base:

  1. Click Generate Questions
  2. Choose the number of questions (5-50)
  3. Select content types to include (documents, files, web sources)
  4. Review the generated questions
  5. Select which ones to keep and click Add Selected

To give you a complete picture of your agent's performance, the evaluation feature generates a mix of relevant and off-topic questions. This helps you see how well your agent handles queries that fall outside its intended scope.

Managing Your Question Set

Once you have questions loaded, you can:

  • Select/deselect questions using checkboxes (only selected questions will run)
  • Edit question text by clicking on it
  • Delete questions you don't need
  • Add new questions manually using the "+ Add Question" button

Download Questions

Export your questions as a CSV file using the Download Questions button. Useful for:

  • Backing up your question set
  • Sharing with team members
  • Editing in bulk using a spreadsheet

Running Tests

Start a Test

  1. Ensure you have questions selected (checkboxes checked)
  2. Click the Run Test button
  3. Watch real-time progress as each question is answered
  4. Results appear one by one as they complete

Maximum: 50 questions per run You can only run one evaluation at a time for a specific agent across your organisation.

During Test Execution

You'll see:

  • Progress bar showing completion status
  • Live results appearing as each question finishes
  • Execution time for each response
  • Success/failure indicators

Stop a Running Test

If you need to stop a test before completion:

  1. Click the Stop button in the progress area
  2. Wait for the system to safely stop processing
  3. Partial results will be preserved

Understanding Results

For Each Question-Response Pair

Click any question to expand and view:

Response Details:

  • Full agent response (with markdown formatting)
  • Execution time (how long it took to generate)
  • Tools used (search, document retrieval, etc.)
  • Sources cited (documents or web pages)

Feedback Options:

  • Thumbs up - Good response
  • Thumbs down - Problematic response (add details about why)

Feedback Details

When you mark a response as negative, you can add specific notes:

  • What was wrong with the response
  • What you expected instead
  • Any patterns you notice

This feedback helps you identify:

  • Knowledge gaps in your agent
  • Questions that need better training data
  • Common failure patterns

Export Results

Download test results as CSV using the Download Results button.

The export includes:

  • Question text
  • Agent response
  • Success/failure status
  • Execution time
  • Error messages (if any)
  • Feedback ratings

History Tab

View Past Test Runs

Switch to the History tab to see all previous evaluations:

Run List View:

  • Date and time of each run
  • Status (Completed, Stopped, In Progress, Error)
  • Total questions tested
  • Success/failure counts

Run Details View:

  1. Click any run to see detailed results
  2. Expand individual question-response pairs
  3. Add feedback to past responses
  4. Export specific run results as CSV

Pagination

  • Runs list: 10 runs per page
  • Run details: 50 results per page

Use the pagination controls at the bottom to navigate through larger result sets.

Best Practices

Creating Effective Question Sets

  • Cover different topics from your knowledge base
  • Include edge cases and tricky questions
  • Mix simple and complex questions
  • Test common user queries

Regular Testing Schedule

Consider running evaluations:

  • Before major releases or updates
  • After adding new content to knowledge base
  • When users report issues with specific topics
  • Monthly for quality monitoring

Troubleshooting

"Maximum of 50 questions reached"

You've selected too many questions. Uncheck some questions or split into multiple test runs.

"Test already running"

Another test is in progress. Wait for it to complete or stop it first.

File Upload Errors

"Invalid file type"

  • Use only .csv or .txt files
  • If using spreadsheet software, export as CSV

"Multiple columns detected"

  • Delete all columns except the first (questions) column
  • Or save as plain text (.txt) with one question per line

"No questions found"

  • Check that your file isn't empty
  • Ensure each row contains text

Cannot See Results

If a test is running but you don't see results:

  1. Check the History tab
  2. Look for the in-progress run
  3. Click Reconnect to resume viewing live results

Limits and Constraints

  • Max questions per run: 50
  • Max questions stored: 100 per agent
  • File size: Reasonable CSV/TXT files only
  • Concurrent runs: One test per agent at a time