Documents & Images

How ATG processes documents and images: multimodal pipeline, text extraction, language support, and guidelines for formatting documents for better AI results.

The This Guy AI assistant implements a multimodal document processing pipeline that turns complex documents into indexed, searchable knowledge bases. The system achieves 95%+ text extraction accuracy on standard documents and supports more than 50 languages with automatic language detection.

One of This Guy's differentiators is its ability to handle images embedded in documents and to reuse those images in its replies. The pipeline works as follows:

Image processing and integration pipeline in ATG

Guidelines for formatting your documents

To get the most accurate results from This Guy, make your documents (including future ones) as explicit as possible. The following tips help your teams structure their documents so that AI tools such as Mistral can interpret them more easily:

Clarity and precision: Ensure that the content of your documents is clear and precise. Avoid ambiguity and be as detailed as possible.

Consistent structure: Maintain a consistent structure between textual and visual information. Use headings and subheadings to organize the content and facilitate understanding.

Separation of elements: Clearly distinguish between different types of information. For example, separate text paragraphs from images or graphics.

Use of keywords: Integrate relevant keywords that help identify the content and context of the information presented.

Uniform formatting: Use uniform formatting for similar elements. For example, if you have multiple images, make sure they are all formatted in the same way.

Annotations and captions: Add annotations and captions to images and graphics to provide additional context and clarify their content.

Pipeline architecture overview

ATG image processing pipeline integration

The ATG document integration system processes documents through a five-stage pipeline, from initial upload to final knowledge indexing. The architecture is designed to meet enterprise requirements for scalability and reliability.
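
The five stages can be pictured as a chain of functions, where each stage takes the output of the previous one. The sketch below is a minimal illustration of that shape, not ATG's implementation. The `Document` container, the stage functions, and their placeholder bodies are hypothetical. They only stand in for the real ingestion, extraction, AI processing, integration, and indexing work.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Document:
    """Hypothetical container passed from stage to stage."""
    name: str
    raw: bytes
    text_blocks: List[str] = field(default_factory=list)
    markdown: str = ""
    chunks: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

Stage = Callable[[Document], Document]

def ingest(doc: Document) -> Document:
    # Stage 1 placeholder: reject empty uploads (real validation also detects the format).
    if not doc.raw:
        raise ValueError(f"{doc.name}: empty upload")
    return doc

def extract(doc: Document) -> Document:
    # Stage 2 placeholder: treat each non-blank line of UTF-8 text as a text block.
    doc.text_blocks = [line for line in doc.raw.decode("utf-8").splitlines() if line.strip()]
    return doc

def process(doc: Document) -> Document:
    # Stage 3 placeholder: stands in for OCR, normalization and image analysis.
    doc.text_blocks = [block.strip() for block in doc.text_blocks]
    return doc

def integrate(doc: Document) -> Document:
    # Stage 4 placeholder: join the blocks into a single Markdown string.
    doc.markdown = "\n\n".join(doc.text_blocks)
    return doc

def index(doc: Document) -> Document:
    # Stage 5 placeholder: one chunk per paragraph (embedding is omitted).
    doc.chunks = doc.markdown.split("\n\n")
    return doc

STAGES: List[Tuple[str, Stage]] = [
    ("ingestion", ingest),
    ("extraction", extract),
    ("ai_processing", process),
    ("integration", integrate),
    ("indexing", index),
]

def run_pipeline(doc: Document, stages: List[Tuple[str, Stage]]) -> Document:
    """Run the stages in order, recording which ones completed."""
    for name, stage in stages:
        doc = stage(doc)
        doc.log.append(name)
    return doc
```

Every stage has the same signature, so each one can be tested in isolation or swapped out without touching the others. One example would be plugging in a different OCR engine in stage 3.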

Stage 1: Document ingestion

The system accepts many document formats, including PDF, Word, and PowerPoint. Each uploaded document is validated, its format is detected automatically, its metadata is extracted, and it is queued for processing.
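
Automatic format detection usually relies on a file's leading "magic" bytes rather than its extension. The sketch below shows that idea, assuming a simple in-process queue as a stand-in for ATG's real job queue. The function names and the metadata fields are illustrative assumptions. The byte signatures themselves are the standard ones: modern Word and PowerPoint files are ZIP archives, and legacy `.doc`/`.ppt` files use the OLE2 container.

```python
import hashlib
import io
import queue
import zipfile

# Office Open XML files (.docx, .pptx, .xlsx) are ZIP archives;
# a characteristic part inside the archive tells them apart.
OOXML_MARKERS = {
    "word/document.xml": "docx",
    "ppt/presentation.xml": "pptx",
    "xl/workbook.xml": "xlsx",
}

# Hypothetical in-process queue standing in for ATG's real job queue.
processing_queue: "queue.Queue[dict]" = queue.Queue()

def detect_format(data: bytes) -> str:
    """Detect the format from the file's leading bytes, not its extension."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "legacy-office"  # OLE2 container: .doc, .ppt, .xls
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return "unknown"
        for part, fmt in OOXML_MARKERS.items():
            if part in names:
                return fmt
        return "zip"
    return "unknown"

def ingest(filename: str, data: bytes) -> dict:
    """Validate an upload, extract basic metadata, and queue it for processing."""
    fmt = detect_format(data)
    if fmt in ("unknown", "zip"):
        raise ValueError(f"{filename}: unsupported or unrecognized format")
    metadata = {
        "filename": filename,
        "format": fmt,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    processing_queue.put(metadata)
    return metadata
```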

Stage 2: Content extraction & separation

The pipeline intelligently separates textual content from embedded images, maintaining document structure while preparing assets for specialized processing workflows.
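
Separating text from images without losing the document's structure requires remembering where each element came from. The minimal sketch below assumes that an upstream parser has already turned the document into an ordered list of blocks. The `TextBlock` and `ImageBlock` types are hypothetical, and parsing the PDF or Word file itself is not shown. Each block keeps its original position, so the two streams can be processed independently and merged back in order.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class TextBlock:
    text: str
    heading_level: int = 0  # 0 = body text, 1-6 = heading depth

@dataclass
class ImageBlock:
    data: bytes
    caption: str = ""

Block = Union[TextBlock, ImageBlock]
Positioned = List[Tuple[int, Block]]

def separate(blocks: List[Block]) -> Tuple[Positioned, Positioned]:
    """Split blocks into a text stream and an image stream.

    Each entry keeps its original position so the document order
    (and therefore its structure) can be restored later."""
    texts: Positioned = []
    images: Positioned = []
    for position, block in enumerate(blocks):
        target = images if isinstance(block, ImageBlock) else texts
        target.append((position, block))
    return texts, images

def reassemble(texts: Positioned, images: Positioned) -> List[Block]:
    """Merge both streams back into the original document order."""
    return [block for _, block in sorted(texts + images, key=lambda item: item[0])]
```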

Stage 3: AI processing layer

The AI processing stage analyzes text and images in parallel, using computer vision and natural language processing models. Processing both streams at the same time shortens overall processing time.

Text processing stream:

  • Advanced OCR for scanned documents and embedded text
  • Language detection and text normalization
  • Semantic analysis and entity extraction

Image processing stream:

  • Visual content analysis and description generation
  • Chart, diagram, and table recognition
  • Contextual image understanding with business logic
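
The parallel text and image streams can be sketched with a thread pool, which suits the I/O-bound calls to remote AI models. In this sketch, `normalize_text` is a real, minimal form of text normalization: Unicode NFKC plus whitespace cleanup. `describe_image` is a hypothetical placeholder for a call to a multimodal vision model. OCR, language detection, and entity extraction are not shown.

```python
import re
import unicodedata
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def normalize_text(text: str) -> str:
    """Text stream: Unicode normalization plus whitespace cleanup."""
    # NFKC folds compatibility characters, e.g. the ligature "ﬁ" becomes "fi".
    text = unicodedata.normalize("NFKC", text)
    # Collapse whitespace runs left behind by OCR or layout extraction.
    return re.sub(r"\s+", " ", text).strip()

def describe_image(image: bytes) -> str:
    """Image stream: hypothetical stand-in for a multimodal vision-model call."""
    return f"image ({len(image)} bytes)"

def process_streams(texts: List[str], images: List[bytes]) -> Tuple[List[str], List[str]]:
    """Run both streams on one thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map submits every task immediately, so both streams run concurrently.
        text_results = pool.map(normalize_text, texts)
        image_results = pool.map(describe_image, images)
        return list(text_results), list(image_results)
```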

Stage 4: Content integration & enrichment

The system combines the processed text and image descriptions into structured Markdown, creating a unified document representation that preserves both visual and textual information.

Markdown generation process:

  • Images are stored securely in ATG cloud storage
  • AI-generated descriptions serve as alt-text for accessibility
  • Hierarchical structure preservation with proper headings and formatting
  • Cross-referencing between images and related text content
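
The Markdown generation step can be illustrated with a small renderer. It turns sections into headings of the right depth, inserts each stored image with its AI-generated description as alt text, and adds a numbered caption that the surrounding text can refer to. This is a sketch under assumptions. The `Section` type is hypothetical, and any storage URL passed in (such as `https://storage.example.com/...`) is a placeholder, not ATG's real storage scheme.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Section:
    heading: str
    level: int  # 1 = top-level heading
    paragraphs: List[str] = field(default_factory=list)
    # (storage URL, AI-generated description) for each image in the section
    images: List[Tuple[str, str]] = field(default_factory=list)

def escape_alt(text: str) -> str:
    # Unescaped square brackets would break Markdown image syntax.
    return text.replace("[", "\\[").replace("]", "\\]")

def to_markdown(sections: List[Section]) -> str:
    """Render sections as Markdown, using image descriptions as alt text."""
    parts: List[str] = []
    figure_number = 0
    for section in sections:
        parts.append(f"{'#' * section.level} {section.heading}")
        parts.extend(section.paragraphs)
        for url, description in section.images:
            figure_number += 1
            parts.append(f"![{escape_alt(description)}]({url})")
            # Numbered caption so surrounding text can refer to "Figure N".
            parts.append(f"*Figure {figure_number}: {description}*")
    return "\n\n".join(parts)
```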

Stage 5: Knowledge indexing

The final stage applies vectorization and embedding techniques to enable semantic search.

Indexing features:

  • Chunking strategy: Documents are segmented into logical chunks for optimal retrieval
  • Vector embeddings: Both text and image descriptions are converted to high-dimensional vectors
  • Hybrid search: Combines keyword and semantic search for comprehensive results
  • Real-time updates: Automatic re-indexing when documents are modified
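
Three of the indexing features can be sketched without an embedding model: heading-aware chunking, fusing keyword and semantic rankings, and change detection. Several assumptions apply. The chunker splits at Markdown headings and packs whole paragraphs up to a size limit. Rankings are fused with reciprocal rank fusion (RRF), a common way to combine keyword and vector results; ATG's actual fusion method is not stated here. Modified documents are detected by comparing SHA-256 content hashes. The embedding step itself is omitted.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List, Optional

def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Split at headings, then pack whole paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk."""
    chunks: List[str] = []
    # Zero-width split just before every Markdown heading line ("# " to "###### ").
    for section in re.split(r"(?m)^(?=#{1,6} )", markdown):
        current = ""
        for paragraph in (p.strip() for p in section.split("\n\n")):
            if not paragraph:
                continue
            candidate = f"{current}\n\n{paragraph}" if current else paragraph
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = paragraph
        if current:
            chunks.append(current)
    return chunks

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk IDs (e.g. keyword and vector results) into one."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

def needs_reindex(text: str, stored_hash: Optional[str]) -> bool:
    """Re-index only when the document's content hash has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest() != stored_hash
```

RRF uses only rank positions, not raw scores. Keyword scores and vector similarities live on different scales, so rank-based fusion avoids having to normalize them against each other.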

Technology stack

The ATG system leverages cutting-edge AI technologies while maintaining flexibility and data sovereignty (depending on the AI policy chosen by administrators):

OCR processing

  • OCR: High-accuracy text recognition with >95% precision on standard documents
  • Multilingual support: Automatic detection of 50+ languages
  • Handwriting recognition: Advanced capabilities for processing handwritten documents
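
The >95% figure is not tied to a specific metric here. One common way to express OCR accuracy is character accuracy. It equals 1 minus the character error rate (CER), where CER is the edit distance between the OCR output and a reference transcription divided by the reference length. The sketch below computes this metric for illustration only. It is an assumption that ATG measures its results this way.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + int(ca != cb), # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]

def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - CER, where CER = edit distance / reference length (clamped at 0)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    cer = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - cer)
```

For example, take the 20-character reference `Total amount: 120.00` and an OCR output that misreads one character (`Tota1 amount: 120.00`). The edit distance is 1, so CER is 1/20 = 5% and character accuracy is 95%.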

Multimodal vision AI

  • Enterprise-grade multimodal understanding for comprehensive document analysis
  • Context-aware descriptions: Detailed image analysis with business context integration
  • Technical diagram recognition: Specialized processing for charts, graphs, and technical illustrations

European AI solutions

  • EU-compliant hosting for data sovereignty requirements
  • Open-weight models included in the processing workflows, when applicable
  • Efficient multimodal processing with European data residency

Language enhancement

  • Advanced text enhancement and semantic analysis
  • Automatic improvement of document structure and readability