The software release for DocSight OCR 3.1.0 (184.108.40.20629) includes a new feature.
The following software is required to successfully use DocSight OCR.
- Windows Server™ 2008 R2, 2012 R2 or 2016
- Microsoft® .NET Framework 4.6.2 (If it is not detected, it is installed automatically)
- Pentium 1.6 GHz or higher processor (Intel Core or higher CPU is recommended)
- 4 GB minimum RAM; 6 GB recommended for grayscale or color images, and more for multithreaded applications
- 1 GB of free hard disk space
- 2 GB minimum RAM; 4 GB recommended
- 600 MB of free hard disk space
Note: If you install an ActivePDF product on a Windows Server 2012 machine, you must first download and install two Microsoft updates for Windows Server 2012. These updates resolve issues with the Microsoft Visual C++ 2015 Redistributable runtime components. For links and step-by-step instructions, see the ActivePDF Knowledge Base article Installing ActivePDF Products on a Windows 2012 Server.
DocSight OCR includes a new data capture add-on feature. You can create and define templates to extract data from documents, verify that the output matches the original document, then save the extracted data. The information is assembled into single or multipage documents.
OCR with data capture takes a scanned or image-based PDF as its input file. With that file and a template editor, you can create a template, or a collection of templates, that defines the areas in the input file to extract data from. You can sort the templates by type (for example, bill or invoice templates), or sort them into subcategories, such as bills from vendor A, bills from vendor B, and so on.
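The type and subcategory sorting described above can be pictured as a two-level grouping. The following is only a minimal sketch in Python: the function `group_templates` and the tuple layout are hypothetical and are not part of DocSight OCR or its template editor.

```python
from collections import defaultdict


def group_templates(templates):
    """Group (doc_type, subcategory, name) tuples by type, then subcategory.

    Hypothetical helper for illustration; not a DocSight OCR API.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for doc_type, subcategory, name in templates:
        grouped[doc_type][subcategory].append(name)
    # Return plain dicts with the template names sorted alphabetically.
    return {
        doc_type: {sub: sorted(names) for sub, names in subs.items()}
        for doc_type, subs in grouped.items()
    }
```

For instance, passing tuples such as `("bills", "vendor A", "a_v1")` and `("bills", "vendor B", "b_v1")` yields one `bills` entry with a list of template names per vendor.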
For example, suppose you want to extract and consolidate certain customer billing data for a utility company. First, scan an existing bill to use as your input file. Then, using the template editor, create a template from that scanned bill. On the template, specify which areas, or zones, contain the data you want to extract. These might include the customer's name, address, account number, and billing charges.
After capturing and extracting the data, OCR verifies that it matches the original document, then saves the output to a CSV file. If OCR finds suspect characters, the conversion stops and the file is moved to the error folder.
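The save-or-reject step described above can be sketched as follows. This is an illustration only, not the product's implementation: the function name `save_or_quarantine`, its parameters, and the idea of passing in a ready-made `suspect` flag are all assumptions made for the example.

```python
import csv
import shutil
from pathlib import Path


def save_or_quarantine(source_file, fields, suspect, csv_path, error_dir):
    """Append extracted fields to a CSV file, or move the input to an error folder.

    Hypothetical helper; none of these names come from the DocSight OCR API.
    Returns "saved" or "error".
    """
    source_file = Path(source_file)
    if suspect:
        # Suspect characters: stop, write nothing, move the input aside.
        error_dir = Path(error_dir)
        error_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source_file), str(error_dir / source_file.name))
        return "error"

    csv_path = Path(csv_path)
    write_header = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(fields))
        if write_header:
            # Write the column names once, when the CSV file is first created.
            writer.writeheader()
        writer.writerow(fields)
    return "saved"
```

A caller would invoke this once per processed document, passing the extracted fields as a dictionary keyed by zone name.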
OCR with data capture includes:
- Support for PDF files and image input formats.
- Data extraction, and saving data to a CSV output file.
- OCR Profiles to process files added to the Watch folder.
- Anchor zones and user zones to capture data. Define each zone with x, y coordinates to specify where data is extracted.
- Access to templates or a template editor. Contact the ActivePDF Sales department to create templates for you, or to purchase a template editor.
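To make the idea of coordinate-based zones concrete, here is a minimal Python sketch. It assumes a rectangular zone given by a top-left corner plus width and height, and recognized words that arrive with a center point in the same coordinate system. The release notes do not state the actual units, origin, or template format, so the `Zone` class and `text_in_zone` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular capture area (hypothetical; not the DocSight template format)."""
    name: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float
    kind: str = "user"  # assumed label: "user" or "anchor"

    def contains(self, px, py):
        # A point is inside if it falls within the rectangle's bounds.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def text_in_zone(zone, words):
    """Join the words whose center point lies inside the zone.

    `words` is an iterable of (text, x, y) tuples in reading order.
    """
    return " ".join(text for text, px, py in words if zone.contains(px, py))
```

A template in this sketch would simply be a list of such zones, one per field, such as the customer's name or account number.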
Installation and Getting Started
API information is available in the Legacy documentation section for the DocSight OCR User Guide at: