📖 Optical Character Recognition (OCR) with Azure AI Vision

🔍 Problem

You have an image (photograph, scan, or screenshot) containing text that needs to be extracted for digital processing, indexing, or integration into applications. Examples include:

  • 📇 Extracting contact info from a photographed business card
  • 🆔 Reading text from scanned IDs or documents for applications
  • 🍽️ Capturing text from menus, recipes, or street signs for storage or translation
  • ✍️ Digitizing handwritten notes from a photo

โ˜๏ธ Solution with Azure

Use Azure AI Vision to perform optical character recognition (OCR) on images. Azure AI Vision's image analysis API can detect, locate, and extract text from unstructured images or scanned documents.

🔄 Alternative services (when applicable):

  • 📄 Azure AI Document Intelligence – For structured documents like forms, invoices, receipts; supports key-value extraction, tables, prebuilt and custom models
  • 🎯 Azure AI Content Understanding – For multimodal (image, audio, video, documents) content extraction and custom analyzers

🧩 Components Required

  • 🌐 Azure resource:

    • Azure AI Services multi-service resource (standalone or part of Azure AI Foundry hub/project)
    • or standalone Computer Vision resource
  • 📱 Client app / SDK:

    • REST API or Azure AI Vision SDK (e.g., Python, .NET)
  • ๐Ÿ” Authentication:

    • Key-based authentication (authorization key)
    • Microsoft Entra ID (token or managed identity)

๐Ÿ—๏ธ Architecture / Development

1. ๐Ÿš€ Provision Azure resource

  • Create Azure AI Vision or AI Services resource in Azure subscription
  • Note the endpoint: https://<resource_name>.cognitiveservices.azure.com/

2. 🔌 Connect to resource

  • Use key-based or Entra ID authentication
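
For REST calls, the two authentication options differ only in the HTTP header that carries the credential. The following is a minimal sketch, not an official helper: `build_auth_headers` is a hypothetical function name, and it assumes the key is sent in the `Ocp-Apim-Subscription-Key` header and an Entra ID access token is sent as a standard bearer token. Acquiring the token itself is out of scope here.

```python
from typing import Dict, Optional


def build_auth_headers(key: Optional[str] = None,
                       bearer_token: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers for exactly one of the two authentication methods.

    build_auth_headers is a hypothetical helper, not part of any Azure SDK.
    """
    # Require exactly one credential so the caller's intent is unambiguous.
    if (key is None) == (bearer_token is None):
        raise ValueError("Provide exactly one of key or bearer_token")
    if key is not None:
        # Key-based authentication: the resource key travels in this header.
        return {"Ocp-Apim-Subscription-Key": key}
    # Microsoft Entra ID: the access token is sent as a bearer token.
    return {"Authorization": f"Bearer {bearer_token}"}
```

With the SDK, the same choice is made by passing a different credential object to the client, so a helper like this is only relevant when calling the REST API directly.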

3. 📤 Submit image for OCR

  • Image requirements:

    • Format: JPEG, PNG, GIF, or BMP
    • Size: < 4 MB
    • Dimensions: > 50x50 pixels
  • API call methods:

    REST:

    https://<endpoint>/computervision/imageanalysis:analyze?features=read&...
    

    SDK (Python example):

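    from azure.ai.vision.imageanalysis import ImageAnalysisClient
    from azure.ai.vision.imageanalysis.models import VisualFeatures
    from azure.core.credentials import AzureKeyCredential
    
    # Create a client for the resource using key-based authentication
    client = ImageAnalysisClient(
        endpoint="<YOUR_RESOURCE_ENDPOINT>",
        credential=AzureKeyCredential("<YOUR_AUTHORIZATION_KEY>")
    )
    
    # Read the image as binary data ("<IMAGE_FILE_PATH>" is a placeholder)
    with open("<IMAGE_FILE_PATH>", "rb") as image_file:
        image_data = image_file.read()
    
    # Request the READ feature to run OCR on the image
    result = client.analyze(
        image_data=image_data,
        visual_features=[VisualFeatures.READ],
        language="en"
    )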
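
The format and size limits listed in this step can be checked locally before the image is sent, and the REST call can be assembled with the standard library. The sketch below is illustrative only: `sniff_format` and `build_read_request` are hypothetical names, the `api-version` value is an assumption (use whichever version your resource supports), and the body is assumed to be sent as `application/octet-stream`. Here `endpoint` is the full resource URL including the scheme. Pixel dimensions are not checked because that requires decoding the image.

```python
import urllib.parse
import urllib.request

MAX_BYTES = 4 * 1024 * 1024  # the "< 4 MB" limit listed in this step

# Leading bytes that identify the four formats listed in this step.
MAGIC = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"BM": "BMP",
}


def sniff_format(data: bytes):
    """Return the format name for the image bytes, or None if unrecognized."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def build_read_request(endpoint: str, key: str, image: bytes,
                       api_version: str = "2024-02-01"):
    """Validate the image, then build (but do not send) the OCR request.

    The api_version default is an assumption, not taken from these notes.
    """
    if sniff_format(image) is None:
        raise ValueError("image must be JPEG, PNG, GIF, or BMP")
    if len(image) >= MAX_BYTES:
        raise ValueError("image must be smaller than 4 MB")
    query = urllib.parse.urlencode(
        {"api-version": api_version, "features": "read"})
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # key-based authentication
        "Content-Type": "application/octet-stream",  # raw image bytes in body
    }
    return urllib.request.Request(url, data=image, headers=headers,
                                  method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the request; building it separately keeps the validation testable without network access.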
    

4. 📥 Process result

  • Response structure: JSON or SDK object
  • Organized by: Blocks → Lines → Words
  • Each line and word has text and a bounding polygon; each word also has a confidence score

Example response:

{
  "metadata": { "width": 500, "height": 430 },
  "readResult": {
    "blocks": [
      {
        "lines": [
          {
            "text": "Hello World!",
            "boundingPolygon": [...],
            "words": [
              { "text": "Hello", "boundingPolygon": [...], "confidence": 0.996 },
              { "text": "World!", "boundingPolygon": [...], "confidence": 0.99 }
            ]
          }
        ]
      }
    ]
  }
}
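
A short sketch of walking this blocks → lines → words hierarchy in the JSON form of the response. The function names `extract_lines` and `low_confidence_words` are hypothetical, the 0.995 threshold is an arbitrary value for illustration, and the sample omits the bounding polygons because their contents are elided in the example above.

```python
import json

# Trimmed copy of the example response (bounding polygons left out).
SAMPLE = """
{
  "metadata": {"width": 500, "height": 430},
  "readResult": {
    "blocks": [
      {"lines": [
        {"text": "Hello World!",
         "words": [
           {"text": "Hello", "confidence": 0.996},
           {"text": "World!", "confidence": 0.99}
         ]}
      ]}
    ]
  }
}
"""


def extract_lines(response: dict) -> list:
    """Collect the text of every line, in reading order."""
    blocks = response.get("readResult", {}).get("blocks", [])
    return [line["text"] for block in blocks for line in block.get("lines", [])]


def low_confidence_words(response: dict, threshold: float = 0.995) -> list:
    """Return (text, confidence) pairs for words scoring below the threshold."""
    flagged = []
    for block in response.get("readResult", {}).get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                if word["confidence"] < threshold:
                    flagged.append((word["text"], word["confidence"]))
    return flagged


response = json.loads(SAMPLE)
print(extract_lines(response))
print(low_confidence_words(response))
```

Flagging low-confidence words is one way to decide which extracted values need manual review before they are stored or indexed.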

โญ Best Practice / Considerations

๐ŸŽฏ Service choice:

  • Use ๐Ÿ” AI Vision for general unstructured OCR (photos, labels, menus, business cards)
  • Use ๐Ÿ“‹ Document Intelligence for forms/invoices (structured data)
  • Use ๐ŸŽญ Content Understanding for multimodal extraction (documents, audio, video)

🔒 Security:

  • Prefer Microsoft Entra ID + managed identity in production
  • Use Azure Key Vault to store keys securely

⚡ Performance:

  • Optimize image quality (clear text, high resolution)
  • Use an appropriate language hint when the text is not in English

๐Ÿ“ Sample Exam-Like Questions

1๏ธโƒฃ You need to extract text from scanned business cards and photos of street signs. Which Azure service do you use?

โœ… Answer: Azure AI Vision


2๏ธโƒฃ What authentication methods can be used with Azure AI Vision OCR?

โœ… Answer: Key-based authentication and Microsoft Entra ID authentication


3๏ธโƒฃ What image formats and size limits apply to images submitted for OCR using Azure AI Vision?

โœ… Answer:

  • Formats: JPEG, PNG, GIF, BMP
  • Size: < 4 MB
  • Dimensions: > 50x50 pixels

4๏ธโƒฃ What structure does the OCR response from Azure AI Vision provide?

โœ… Answer: Response contains blocks โ†’ lines โ†’ words, each with text, bounding polygon, and confidence score