Why Chandra is the Ultimate OCR Tool for Handwriting and Tables
Introduction
Are you tired of dealing with messy forms, complex tables, and hard-to-read handwriting in your documents? Traditional OCR tools often fall short when it comes to handling these intricate details. But what if there was a tool that could accurately read and interpret even the most challenging documents? Enter Chandra, the OCR model that handles complex tables, forms, handwriting, and more. In this article, we'll explore why Chandra is trending now, its key features, and how you can start using it today.
What is Chandra?
Chandra is an advanced OCR (Optical Character Recognition) model developed by Datalab. It is specifically designed to handle complex documents such as handwritten notes, tables, math equations, and messy forms. Created with the latest advancements in machine learning, Chandra stands out in its ability to accurately read and interpret documents that traditional OCR tools struggle with. As the demand for document intelligence grows, Chandra has become a game-changer for developers and businesses alike.
Key Features
Chandra offers a range of features that make it a powerful tool for document processing:
- Two Inference Modes: Run locally via HuggingFace Transformers or deploy a vLLM server for production throughput.
- Layout-aware Output: Every text block, table, and image comes with bounding box coordinates.
- Structured Formats: Output as Markdown, HTML, or JSON with full layout metadata.
- 40+ Languages Supported: Chandra can handle documents in a wide range of languages.
- Advanced Handling: Supports handwriting, tables, math equations, forms, and complex layouts.
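To make the layout-aware output concrete, here is a minimal sketch of filtering blocks by type and ordering them by position. The block structure is an assumption made for illustration: the `type` and `bbox` keys and the `[x0, y0, x1, y1]` coordinate order are hypothetical and are not taken from Chandra's documented schema, so check the actual JSON output before relying on them.

```python
from typing import Any, Dict, List

# Hypothetical layout blocks: the "type" and "bbox" keys are illustrative
# assumptions, not Chandra's documented output schema.
Block = Dict[str, Any]


def blocks_of_type(blocks: List[Block], block_type: str) -> List[Block]:
    """Return blocks of one type, ordered top-to-bottom, then left-to-right."""
    selected = [b for b in blocks if b["type"] == block_type]
    # bbox is assumed to be [x0, y0, x1, y1]; sort by y0 first, then x0.
    return sorted(selected, key=lambda b: (b["bbox"][1], b["bbox"][0]))


# Made-up sample data for one page.
sample = [
    {"type": "text", "bbox": [50, 400, 500, 450]},
    {"type": "table", "bbox": [50, 100, 500, 350]},
    {"type": "text", "bbox": [50, 20, 500, 80]},
]

text_blocks = blocks_of_type(sample, "text")
```

Sorting by the top edge first gives a rough reading order for single-column pages; multi-column layouts would need a more careful ordering.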
Use Cases
Chandra excels in various real-world scenarios where traditional OCR tools fail. Here are some concrete use cases:
- Medical Notes: Doctors often write notes in a cursive and messy handwriting style. Chandra can accurately read and convert these notes into structured text.
- Financial Filings: Financial documents often contain complex tables and merged cells. Chandra preserves the structure of these tables, making it easier to extract and analyze data.
- Educational Materials: Textbooks, worksheets, and research papers often include math equations and complex layouts. Chandra can handle these elements with ease.
- Newspapers: Multi-column layouts, figures, and captions are common in newspapers. Chandra can accurately process and convert these documents.
Step-by-Step Installation & Setup Guide
Installation
To get started with Chandra, you need to install the chandra-ocr package. You can do this using pip:
pip install chandra-ocr
For better performance with HuggingFace inference, we recommend installing flash attention.
From Source
If you prefer to install from source, follow these steps:
git clone https://github.com/datalab-to/chandra.git
cd chandra
uv sync
source .venv/bin/activate
Configuration
You can configure Chandra using environment variables or a local.env file. Here are some common settings:
MODEL_CHECKPOINT=datalab-to/chandra
MAX_OUTPUT_TOKENS=8192
VLLM_API_BASE=http://localhost:8000/v1
VLLM_GPUS=0
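If you want to inspect such a settings file from your own scripts, the KEY=VALUE format is easy to read with the standard library. The sketch below is not how Chandra loads its configuration; `parse_env_text` is a hypothetical helper that only illustrates the file format shown above.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# The same settings listed in the article, as they might appear in local.env.
example = """\
MODEL_CHECKPOINT=datalab-to/chandra
MAX_OUTPUT_TOKENS=8192
VLLM_API_BASE=http://localhost:8000/v1
VLLM_GPUS=0
"""

settings = parse_env_text(example)
```

All values come back as strings, so a numeric setting such as MAX_OUTPUT_TOKENS needs an explicit `int(...)` conversion before use.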
vLLM Server
For production or batch processing, you can launch a vLLM server:
chandra_vllm
Configure the server via environment variables:
- VLLM_API_BASE: Server URL (default: http://localhost:8000/v1)
- VLLM_MODEL_NAME: Model name (default: chandra)
- VLLM_GPUS: GPU device IDs (default: 0)
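A client script can mirror these defaults when it reads the same variables. The following is a minimal sketch, assuming you want the documented default whenever a variable is unset; `resolve_vllm_settings` and `DEFAULTS` are hypothetical names, not part of the chandra-ocr package.

```python
import os
from typing import Dict, Mapping

# Defaults as listed in the article for the vLLM server settings.
DEFAULTS: Dict[str, str] = {
    "VLLM_API_BASE": "http://localhost:8000/v1",
    "VLLM_MODEL_NAME": "chandra",
    "VLLM_GPUS": "0",
}


def resolve_vllm_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each setting from env, falling back to the documented default."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

Passing the environment as a parameter keeps the function easy to test: call it with a plain dictionary instead of the real process environment.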
Code Examples from the Repository
CLI Usage
To use Chandra via the command line, you can run the following commands:
# Single file with vLLM server
chandra input.pdf ./output --method vllm
# Directory with local model
chandra ./documents ./output --method hf
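To drive the same commands from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. This is a sketch under the assumption that the `chandra` executable is on your PATH; `build_chandra_command` and `run_chandra` are hypothetical helpers that use only the positional arguments and the `--method` flag shown above.

```python
import subprocess
from typing import List


def build_chandra_command(input_path: str, output_dir: str, method: str = "vllm") -> List[str]:
    """Build the argument list for the chandra CLI as used in the article."""
    if method not in ("hf", "vllm"):
        raise ValueError(f"method must be 'hf' or 'vllm', got {method!r}")
    return ["chandra", input_path, output_dir, "--method", method]


def run_chandra(input_path: str, output_dir: str, method: str = "vllm") -> int:
    """Run the CLI and return its exit code (requires chandra to be installed)."""
    completed = subprocess.run(build_chandra_command(input_path, output_dir, method), check=False)
    return completed.returncode
```

Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.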
Python Usage
Here's a Python example to get you started:
from chandra.model import InferenceManager
from chandra.input import load_pdf_images
manager = InferenceManager(method="hf")
images = load_pdf_images("document.pdf")
results = manager.generate(images)
print(results[0].markdown)
Explanation
- InferenceManager: This class handles the inference process. You can specify the inference method (hf for HuggingFace or vllm for vLLM).
- load_pdf_images: This function loads images from a PDF file.
- generate: This method generates the OCR results.
- markdown: This attribute returns the OCR results in Markdown format.
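A common next step is writing each page's Markdown to disk. The sketch below assumes only what the example above shows, namely that every result exposes a `markdown` string; `save_markdown_pages` and the `page_001.md` naming scheme are hypothetical choices, not part of Chandra.

```python
from pathlib import Path
from typing import Any, Iterable, List


def save_markdown_pages(results: Iterable[Any], output_dir: str, stem: str = "page") -> List[Path]:
    """Write each result's .markdown attribute to its own numbered file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for index, result in enumerate(results, start=1):
        path = out / f"{stem}_{index:03d}.md"
        path.write_text(result.markdown, encoding="utf-8")
        written.append(path)
    return written
```

With the objects from the example above, a call such as `save_markdown_pages(results, "./output")` would produce one file per page.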
CLI Options
The command-line interface accepts the following options:
--method [hf|vllm]: Inference method (default: vllm)
--page-range TEXT: Page range for PDFs (e.g., "1-5,7,9-12")
--max-output-tokens INTEGER: Max tokens per page
--max-workers INTEGER: Parallel workers for vLLM
--include-images/--no-images: Extract and save images (default: include)
--include-headers-footers/--no-headers-footers: Include page headers/footers (default: exclude)
--batch-size INTEGER: Pages per batch (default: 1)
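The `--page-range` format combines single pages and inclusive ranges separated by commas. As an illustration of how a string like "1-5,7,9-12" can be read, here is a small sketch; `expand_page_range` is a hypothetical helper, not Chandra's own parser, and it returns the numbers exactly as written without assuming whether the CLI counts pages from 0 or 1.

```python
from typing import List


def expand_page_range(spec: str) -> List[int]:
    """Expand a spec such as "1-5,7,9-12" into an explicit list of page numbers."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            # Ranges are treated as inclusive on both ends.
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages
```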
Advanced Usage & Best Practices
To get the most out of Chandra, consider the following best practices:
- Use vLLM for Production: For high throughput and production environments, use the vLLM server.
- Optimize Environment: Ensure you have the necessary dependencies and environment variables configured for optimal performance.
- Batch Processing: When processing multiple documents, use batch processing to improve efficiency.
- Regular Updates: Keep your Chandra installation up to date with the latest improvements and bug fixes.
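The batch-processing advice can be sketched as splitting a list of page images into fixed-size groups before inference. `batched` below is a hypothetical helper written for this article, and the idea of passing each group to `manager.generate` is an assumption based on the Python example earlier, not a documented Chandra pattern.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Possible usage with the earlier example (assumed, not verified):
# for group in batched(images, 4):
#     results = manager.generate(group)
```

Smaller batches keep memory use predictable on a single GPU, while larger ones may improve throughput; the right size depends on your hardware and page sizes.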
Comparison with Alternatives
When choosing an OCR tool, it's important to compare the options available. Here's a comparison table to help you decide:
| Feature | Chandra | Traditional OCR Tools |
|---|---|---|
| Handwriting Support | Excellent | Limited |
| Table Handling | Excellent | Limited |
| Math Equation Support | Excellent | Limited |
| Complex Layouts | Excellent | Limited |
| Inference Modes | vLLM, HuggingFace | Limited |
| Output Formats | Markdown, HTML, JSON | Limited |
| Supported Languages | 40+ | Limited |
FAQ
How accurate is Chandra?
Chandra is highly accurate, especially with complex documents. It has been benchmarked and tested to handle handwriting, tables, and forms with high precision.
Can I use Chandra commercially?
Yes, but with some restrictions. The code is licensed under Apache 2.0, while the model weights use a modified OpenRAIL-M license. For broader commercial licensing, see Datalab's pricing page.
What languages does Chandra support?
Chandra supports 40+ languages, making it a versatile tool for international use.
How can I get help with Chandra?
Join the Discord community to discuss development and get help.
Is there a hosted API available?
Yes, a hosted API with additional accuracy improvements is available at datalab.to. You can try the free playground without installing.
Conclusion
Chandra is a powerful OCR tool that handles complex documents with ease. Its ability to accurately read handwriting, tables, math equations, and forms makes it a must-have for developers and businesses. To get started, simply install the chandra-ocr package and follow the setup guide. For more information and to contribute, visit the Chandra GitHub repository. Give it a star if you find it helpful! ⭐