Mistral has been churning out interesting open-weight models for a few years now. I was thrilled to read that Microsoft added the most recent Mistral OCR to the Azure AI Foundry catalog because it’s nice to have options beyond Azure Document Intelligence.

If you have any workloads that involve document processing, this might be worth integrating. Here we’ll take a quick look at how to get it set up in Azure AI Foundry and actually use it in your projects.

Why should you care?

Mistral OCR understands document structure: Naïve text extraction is fine, and we have other projects that can do this (even with handwriting), but it is not sufficient for every business use case. We frequently need to extract tables, equations, and figures.

Output: Markdown! Or JSON… but seriously MARKDOWN!

Multilingual: A reported 99%+ accuracy across dozens of languages and scripts. Surprisingly broad performance here.

Prerequisites

  • An Azure subscription
  • Basic familiarity with Azure AI services
  • Your Azure AI Foundry hub needs to be in one of these regions: East US, West US 3, South Central US, West US, North Central US, East US 2, or Sweden Central. This is important since other regions won’t work yet.
  • Ability to run a test script once you get set up. We’ll build a quick Python example, so you’ll need a system with a Python interpreter and the httpx package installed (pip install httpx).

Step 1: Setting Up Your Azure AI Foundry Hub

If you don’t already have an Azure AI Foundry hub, here’s the quick setup:

  1. Head to the Azure portal and search for “AI Foundry”
  2. Create a new hub (make sure you pick one of the supported regions mentioned above)
  3. Create a project within that hub
  4. Note your resource group and region

Step 2: Deploying Mistral OCR

  1. Find the model: Click on Deploy Model, Deploy Base Model
  2. Search for “Mistral OCR”: It should pop up in the catalog
  3. Deploy it: Click “Deploy” and select the “Pay-as-you-go” option
  4. Get your credentials: Once deployed, you’ll land on a page showing your API endpoint and key
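
Once you have the endpoint and key, it is worth keeping them out of source control. Here is a minimal sketch that reads them from environment variables instead of pasting them into a script. The variable names MISTRAL_OCR_ENDPOINT and MISTRAL_OCR_API_KEY are made up for this example, not something Azure sets for you.

```python
import os


def load_ocr_credentials(env=None):
    """Return (endpoint, api_key) read from environment variables.

    MISTRAL_OCR_ENDPOINT and MISTRAL_OCR_API_KEY are hypothetical names
    chosen for this sketch; use whatever naming fits your setup.
    """
    env = os.environ if env is None else env

    # Trim whitespace and a trailing slash so the URL can be joined safely
    endpoint = env.get("MISTRAL_OCR_ENDPOINT", "").strip().rstrip("/")
    api_key = env.get("MISTRAL_OCR_API_KEY", "").strip()

    # Fail early with a clear message instead of a confusing HTTP error later
    missing = [
        name
        for name, value in (
            ("MISTRAL_OCR_ENDPOINT", endpoint),
            ("MISTRAL_OCR_API_KEY", api_key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    return endpoint, api_key
```

Export both variables in your shell before running anything, then call load_ocr_credentials() wherever you would otherwise hardcode the values.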

Step 3: Your First OCR Call

Let’s start with a simple example. This is based on the two-stage process documented in Microsoft’s GitHub samples: https://github.com/azure-ai-foundry/foundry-samples/blob/main/samples/mistral/python/mistral-ocr-with-vlm.ipynb.

Say you have a scanned PDF you want to process. You can place it in the source folder and run this:

import base64
import httpx
import json
import os
import shutil
from datetime import datetime
from pathlib import Path

# Configuration
AZURE_ENDPOINT = "https://your-mistral-endpoint.eastus2.models.ai.azure.com"  # Replace with your endpoint
AZURE_API_KEY = "your-api-key"  # Replace with your API key

# Directory setup
SOURCE_DIR = "source"
PROCESSED_DIR = "processed"
OUTPUT_DIR = "output"

# Create directories if they don't exist
for directory in [SOURCE_DIR, PROCESSED_DIR, OUTPUT_DIR]:
    os.makedirs(directory, exist_ok=True)


def encode_document(file_path):
    """Encode a document to base64"""
    with open(file_path, "rb") as file:
        return base64.b64encode(file.read()).decode("utf-8")


def call_mistral_ocr(base64_data):
    """Call the Mistral OCR API with the encoded PDF document"""
    # Endpoint with API version as query parameter
    url = f"{AZURE_ENDPOINT}/v1/ocr?api-version=2024-05-01-preview"
    
    # Set headers for authentication
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {AZURE_API_KEY}"
    }

    # We're only processing PDFs for simplicity
    mime_type = "application/pdf"
    
    # Create the payload
    payload = {
        "model": "mistral-ocr-2503",
        "document": {
            "type": "document_url",
            "document_url": f"data:{mime_type};base64,{base64_data}"
        },
        "include_image_base64": False
    }
    
    # Send the request to the API
    with httpx.Client() as client:
        response = client.post(url, headers=headers, json=payload, timeout=180.0)
        response.raise_for_status()
        return response.json()


def process_files():
    files = list(Path(SOURCE_DIR).glob('**/*.pdf'))
    print(f"Found {len(files)} files to process")
    
    # Process each file
    for file_path in files:
        print(f"Processing: {file_path}")
        
        try:
            # 1. Encode the document
            encoded_data = encode_document(file_path)

            # 2. Call the OCR API
            ocr_result = call_mistral_ocr(encoded_data)
            
            # 3. Save the result to output directory
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            output_path = Path(OUTPUT_DIR) / f"{file_path.stem}_{timestamp}.json"
            
            with open(output_path, 'w') as f:
                json.dump(ocr_result, f, indent=2)
                
            print(f"Output saved to: {output_path}")
            
            # 4. Move processed file
            dest_path = Path(PROCESSED_DIR) / file_path.name
            shutil.move(str(file_path), str(dest_path))
            print(f"Moved to: {dest_path}")
            
        except Exception as e:
            print(f"Error processing {file_path}: {str(e)}")


if __name__ == "__main__":
    process_files()
    print("Processing complete!")
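
The script above only handles PDFs. If you also have scanned images, a payload builder along these lines might work. This is a sketch under one big assumption: that the Azure deployment accepts the "image_url" document type the same way Mistral's own API documents it. I have not confirmed that against the Azure endpoint, so test it on a sample image first. The function name build_document_payload is made up for this example.

```python
import base64
import mimetypes
from pathlib import Path


def build_document_payload(file_path, model="mistral-ocr-2503"):
    """Build an OCR request payload for a PDF or an image file."""
    path = Path(file_path)

    # Guess the MIME type from the file extension
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        raise ValueError(f"Cannot determine file type for {path.name}")

    # Encode the file contents as a base64 data URI
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    data_uri = f"data:{mime_type};base64,{encoded}"

    if mime_type == "application/pdf":
        document = {"type": "document_url", "document_url": data_uri}
    elif mime_type.startswith("image/"):
        # Assumption: images use the "image_url" type, as in Mistral's API docs
        document = {"type": "image_url", "image_url": data_uri}
    else:
        raise ValueError(f"Unsupported file type: {mime_type}")

    return {
        "model": model,
        "document": document,
        "include_image_base64": False,
    }
```

The returned dictionary has the same shape as the payload in call_mistral_ocr, so it could be passed straight to client.post(..., json=payload).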

Step 4: Handling the Response

Mistral OCR responses come back as a JSON object with one entry per page, including the page index and dimensions… but the OCR extraction portion is in Markdown. This has been hugely helpful for several of our workflow processes that already translate documents into Markdown storage.

{
  "pages": [
    {
      "index": 0,
      "images": [],
      "markdown": "# Statement of Work Document",
      "dimensions": {
        "dpi": 200,
        "height": 2200,
        "width": 1700
      }
    },
    {
      "index": 1,
      "images": [],
      "markdown": "# here is the markdown from page 2 ...

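Since each page carries its own "markdown" field, stitching a whole document back together is just a matter of joining the pages in index order. Here is a minimal sketch of that. The function names and the horizontal-rule page separator are my own choices for illustration, not part of the API.

```python
import json
from pathlib import Path


def pages_to_markdown(ocr_result, separator="\n\n---\n\n"):
    """Join the per-page markdown of an OCR response into one string."""
    # Sort by page index in case the pages are not already in order
    pages = sorted(
        ocr_result.get("pages", []),
        key=lambda page: page.get("index", 0),
    )
    return separator.join(page.get("markdown", "").strip() for page in pages)


def convert_json_to_markdown(json_path):
    """Read a saved OCR response and write a .md file next to it."""
    json_path = Path(json_path)
    ocr_result = json.loads(json_path.read_text(encoding="utf-8"))

    md_path = json_path.with_suffix(".md")
    md_path.write_text(pages_to_markdown(ocr_result), encoding="utf-8")
    return md_path
```

Point convert_json_to_markdown at one of the JSON files in the output folder and you get a single Markdown file you can drop into whatever storage or search pipeline you already have.
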
What’s Next?

Mistral OCR in Azure AI Foundry opens up some interesting possibilities:

  • Automated document workflows: Chain OCR with other AI services for end-to-end processing
  • Knowledge base creation: Turn your document archives into searchable, structured data
  • Multilingual content management: Process international documents at scale
  • Real-time document analysis: Build applications that can understand documents as fast as users can upload them

The combination of speed, accuracy, and intelligent structure understanding makes this a solid choice for serious document processing workflows. Give it a shot and see how it handles your specific use cases—the results might surprise you.

Wrapping Up

The serverless deployment removes infrastructure headaches, the API is straightforward, and the performance claims seem to hold up in practice.

There is currently no OCR playground in Azure AI Foundry, but that appears to be on the horizon. For now, try out the simple Python script above to get started.

Have you tried Mistral OCR yet? Drop me a line and let me know how it’s working for your use cases. Always curious to hear about real-world implementations and edge cases that pop up.
