A DotNetNuke module and an associated Windows Forms client that facilitate the scanning and processing of documents using box codes.
If you need to take information that exists on paper and digitize it so that it is available on your computer or server, you then face the challenge of organizing the scanned images. One option is to OCR the documents and store the extracted data in a database. For many documents, however, the content does not allow accurate or useful OCR. In those cases the most practical option is to assign each document an ID number and store the scanned image for later retrieval.
The dnnScanFree project is an open source project that eventually aims to provide a suite of tools that let you print tests and grade them using scanning technology. One of the first challenges was creating an identification code for test takers that could be printed on the test and recognized when the test was scanned.
The initial idea was to print a bar code on each test that could be read when the test was scanned. The problem is that bar codes were designed to be printed by bar code printers and read by bar code scanners, not by general-purpose printers and document scanners. As Wikipedia notes: "it is extremely important to have bar codes created with a high resolution graphic application".
To be read reliably, a bar code image must be straight. Document scanners rarely produce perfectly straight images, even with automatic document feeders. Programs are available to straighten (deskew) an image, but they are resource intensive and their accuracy depends on the particular image.
Component vendors offer software that prints and recognizes bar codes, but this software is very costly.
Instead, we decided to create a simple graphical image of a number using binary code, drawing each binary digit as a black or white box.
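As a rough illustration of the idea, the Java sketch below converts a number into a fixed-width row of boxes and back. It is not the project's code, because the client itself is a .NET application. The bit width, the most-significant-bit-first ordering, the mapping of black to 1, and the class name `BoxCodeEncoder` are all assumptions made for this example.

```java
/**
 * Hypothetical helper (not dnnScanFree code): converts a number to a row of
 * boxes and back. true = black box (binary 1), false = white box (binary 0).
 */
class BoxCodeEncoder {

    /** Encodes value as bitCount boxes, most significant bit first (assumed ordering). */
    static boolean[] encode(int value, int bitCount) {
        if (bitCount < 1 || bitCount > 30) {
            throw new IllegalArgumentException("bitCount must be between 1 and 30");
        }
        if (value < 0 || value >= (1 << bitCount)) {
            throw new IllegalArgumentException(value + " does not fit in " + bitCount + " bits");
        }
        boolean[] boxes = new boolean[bitCount];
        for (int i = 0; i < bitCount; i++) {
            int shift = bitCount - 1 - i; // leftmost box holds the highest bit
            boxes[i] = ((value >> shift) & 1) == 1;
        }
        return boxes;
    }

    /** Reverses encode(): reads the boxes left to right back into an integer. */
    static int decode(boolean[] boxes) {
        int value = 0;
        for (boolean black : boxes) {
            value = (value << 1) | (black ? 1 : 0);
        }
        return value;
    }

    /** Text preview of a box row: '#' for black, '.' for white. */
    static String render(boolean[] boxes) {
        StringBuilder sb = new StringBuilder(boxes.length);
        for (boolean black : boxes) {
            sb.append(black ? '#' : '.');
        }
        return sb.toString();
    }
}
```

For example, `render(encode(5, 4))` produces `.#.#`, because 5 is `0101` in binary.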
The DotNetNuke module allows you to create a document, which is then assigned a unique ID number and a random four-digit number that serves as a "check code".
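Below is a minimal Java sketch of issuing an ID together with a four-digit check code. It assumes the IDs come from a simple counter, whereas the module would presumably take them from its database. It also assumes that "four-digit" means 1000 to 9999. `DocumentIssuer` and `IssuedDocument` are hypothetical names.

```java
import java.security.SecureRandom;

/** Hypothetical record of an issued document: its ID and its check code. */
final class IssuedDocument {
    final int id;
    final int checkCode;

    IssuedDocument(int id, int checkCode) {
        this.id = id;
        this.checkCode = checkCode;
    }
}

/** Hypothetical issuer: sequential IDs (assumed) plus a random four-digit check code. */
final class DocumentIssuer {
    private static final SecureRandom RANDOM = new SecureRandom();
    private int nextId;

    DocumentIssuer(int firstId) {
        this.nextId = firstId;
    }

    IssuedDocument issue() {
        int id = nextId++;
        // nextInt(9000) yields 0..8999, so the check code is always 1000..9999.
        int checkCode = 1000 + RANDOM.nextInt(9000);
        return new IssuedDocument(id, checkCode);
    }
}
```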
The module lets you print a "cover sheet" that shows the ID number and the check code as black and white boxes. Two black alignment boxes are printed above the code, one on each side, so that the image can be straightened after it is scanned.
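One common way to measure skew from two alignment marks works in two steps. First, find the centroid of the dark pixels around each mark. Then take the angle of the line that joins the two centroids. The Java sketch below does this. It is a swapped-in technique and not necessarily how dnnScanFree computes the angle. The search windows, the gray-level threshold, and the class name `AlignmentMarks` are assumptions.

```java
import java.awt.image.BufferedImage;

/** Hypothetical skew estimate from two alignment marks (centroid method, an assumption). */
class AlignmentMarks {

    /**
     * Returns the centroid {x, y} of pixels darker than darkThreshold (0-255 gray)
     * inside the search window, or null if the window contains no dark pixels.
     */
    static double[] darkCentroid(BufferedImage img, int x0, int y0, int w, int h, int darkThreshold) {
        long sumX = 0, sumY = 0, count = 0;
        int xStart = Math.max(0, x0), yStart = Math.max(0, y0);
        int xEnd = Math.min(img.getWidth(), x0 + w);
        int yEnd = Math.min(img.getHeight(), y0 + h);
        for (int y = yStart; y < yEnd; y++) {
            for (int x = xStart; x < xEnd; x++) {
                int rgb = img.getRGB(x, y);
                int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                if (gray < darkThreshold) {
                    sumX += x;
                    sumY += y;
                    count++;
                }
            }
        }
        if (count == 0) {
            return null;
        }
        return new double[] {(double) sumX / count, (double) sumY / count};
    }

    /**
     * Angle in degrees of the line from the left mark to the right mark.
     * 0 means level; positive means the right mark sits lower (image y grows downward).
     */
    static double skewDegrees(double[] left, double[] right) {
        return Math.toDegrees(Math.atan2(right[1] - left[1], right[0] - left[0]));
    }
}
```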
Even when the image is not completely straight, the box code can still be recognized correctly. The box code also tolerates "degraded" images, such as those from a printer low on toner or a scan of a fax.
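A box code can tolerate degraded images if each box is judged by how many of its pixels are dark, rather than by its exact edges. The Java sketch below reads a row of boxes that way. The `fillFraction` threshold, the box pitch, the gray-level cutoff, and the class name `BoxReader` are assumptions for illustration. It does not reproduce the exact meaning of the project's own constants, such as `intSectionMarked`.

```java
import java.awt.image.BufferedImage;

/** Hypothetical box reader: a box counts as black when enough of its pixels are dark. */
class BoxReader {

    /** Counts pixels darker than darkThreshold (0-255 gray) in the size x size box at (x, y). */
    static int countDark(BufferedImage img, int x, int y, int size, int darkThreshold) {
        int count = 0;
        int xEnd = Math.min(img.getWidth(), x + size);
        int yEnd = Math.min(img.getHeight(), y + size);
        for (int py = Math.max(0, y); py < yEnd; py++) {
            for (int px = Math.max(0, x); px < xEnd; px++) {
                int rgb = img.getRGB(px, py);
                int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                if (gray < darkThreshold) {
                    count++;
                }
            }
        }
        return count;
    }

    /** True if at least fillFraction of the box is dark, so faded or speckled boxes still register. */
    static boolean isMarked(BufferedImage img, int x, int y, int size, int darkThreshold, double fillFraction) {
        return countDark(img, x, y, size, darkThreshold) >= fillFraction * size * size;
    }

    /** Reads bits boxes left to right, pitch pixels apart, most significant bit first. */
    static int readRow(BufferedImage img, int x, int y, int size, int pitch, int bits,
                       int darkThreshold, double fillFraction) {
        int value = 0;
        for (int i = 0; i < bits; i++) {
            boolean black = isMarked(img, x + i * pitch, y, size, darkThreshold, fillFraction);
            value = (value << 1) | (black ? 1 : 0);
        }
        return value;
    }
}
```

With `fillFraction` at 0.5, a box that is only 60% inked (for example, from a faded printout) is still read as black.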
The dnnScanFree project provides a Windows Forms client to process the scans. A desktop client is used because the recognition process is extremely resource intensive and relies on the GDI+ classes in the System.Drawing namespace. These classes are not designed to run outside a Windows desktop application. See this warning from Microsoft:
Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions.
The dnnScanFree client lets you open an individual scan and view the recognition results.
It also provides a "batch process" that lets you specify settings, automatically process an entire directory of scans, and import and attach the images to the matching records in the DotNetNuke module.
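The directory-scanning part of a batch run might look like the following Java sketch. The `recognizeAndAttach` callback is a hypothetical stand-in for recognition plus upload to the DotNetNuke module. The sketch only shows two things: collecting the TIFF files in a directory, and keeping a list of scans that failed so they can be reviewed later. The class name `BatchScanner` and the extension list are assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical batch driver: finds TIFF scans and records the ones that fail. */
class BatchScanner {

    /** Lists .tif/.tiff files directly inside dir (not recursive), sorted by path. */
    static List<Path> findScans(Path dir) throws IOException {
        List<Path> scans = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{tif,tiff,TIF,TIFF}")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    scans.add(p);
                }
            }
        }
        Collections.sort(scans);
        return scans;
    }

    /** Runs the callback on every scan; returns the scans it rejected or threw on. */
    static List<Path> processAll(Path dir, Predicate<Path> recognizeAndAttach) throws IOException {
        List<Path> failed = new ArrayList<>();
        for (Path scan : findScans(dir)) {
            boolean ok;
            try {
                ok = recognizeAndAttach.test(scan);
            } catch (RuntimeException e) {
                ok = false; // treat recognition errors like failed matches
            }
            if (!ok) {
                failed.add(scan);
            }
        }
        return failed;
    }
}
```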
In the DotNetNuke module, clicking the Select link next to a document shows the scans associated with it (you can scan and attach an unlimited number of scans to a single document).
A program such as Windows Photo Gallery lets you view all the pages of a multi-page TIFF.
There is also a section for viewing scans that either produced errors during recognition or whose ID number and check code did not match.
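Classifying a recognized scan can be sketched as two steps. First, look up the decoded ID. Then compare the decoded check code with the stored one. The in-memory map, the three result categories, and the class name `ScanValidator` are assumptions; the module would keep these values in its database.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical validator: matches a decoded ID and check code against stored records. */
class ScanValidator {

    enum Result { MATCHED, UNKNOWN_ID, CHECK_CODE_MISMATCH }

    private final Map<Integer, Integer> checkCodesById = new HashMap<>();

    /** Records the check code that was issued for a document ID. */
    void register(int id, int checkCode) {
        checkCodesById.put(id, checkCode);
    }

    /** MATCHED only when the ID exists and its stored check code equals the decoded one. */
    Result validate(int decodedId, int decodedCheckCode) {
        Integer expected = checkCodesById.get(decodedId);
        if (expected == null) {
            return Result.UNKNOWN_ID;
        }
        return expected == decodedCheckCode ? Result.MATCHED : Result.CHECK_CODE_MISMATCH;
    }
}
```

Scans that come back as `UNKNOWN_ID` or `CHECK_CODE_MISMATCH` are the kind that would belong in the module's error view rather than being attached to a document.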
This project and its code are not ready for production. At this point it is an early proof of concept that represents hundreds of hours of work; at best, it can serve as a starting point for your own projects.
Note that the program currently recognizes only TIFF files scanned at 300 dpi with CCITT T.6 compression. A 300 dpi TIFF image without this compression will not be recognized properly, because the alignment marks will not be at their expected distance. Future versions of the dnnScanFree client will let you set the alignment marks by clicking on them. For now, most of the values you would need to adjust for your particular scanner or image format are grouped together in the code:
// **** Adjustable values ****
// Tuned for 300 dpi TIFF scans with CCITT T.6 compression
int intBoxSize = 30;
int intCompletelyBlackSquare = 2280;
int intSectionMarked = 1000;
int intDistanceToSecondStartingPoint = 2140;
int intBoxCodeX = 155;
int intBoxCodeY = 62;
int intBoxSpacing = 31;
int intBoxExtraSpacing = 1;
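If your scanner produces a different resolution, one starting point is to rescale these values. The Java sketch below makes two assumptions, both inferred from the variable names alone. First, the distance and size values (`intBoxSize`, `intBoxCodeX`, `intDistanceToSecondStartingPoint`, and so on) are lengths in pixels, which scale linearly with dpi. Second, `intCompletelyBlackSquare` and `intSectionMarked` are pixel counts over an area, which scale with the square of the dpi ratio. Rescaling does not address the compression issue described above. `DpiScaler` is a hypothetical name.

```java
/** Hypothetical helper for converting the adjustable values between resolutions. */
class DpiScaler {

    /** Scales a pixel length measured at fromDpi to toDpi (lengths scale linearly). */
    static int scaleLength(int pixels, int fromDpi, int toDpi) {
        return (int) Math.round(pixels * (double) toDpi / fromDpi);
    }

    /** Scales a pixel count (an area) measured at fromDpi; areas scale with the ratio squared. */
    static int scaleCount(int pixelCount, int fromDpi, int toDpi) {
        double ratio = (double) toDpi / fromDpi;
        return (int) Math.round(pixelCount * ratio * ratio);
    }
}
```

For example, going from 300 to 600 dpi, `scaleLength(2140, 300, 600)` returns 4280 and `scaleCount(2280, 300, 600)` returns 9120.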
An additional green box is drawn if the scan is skewed, which the client detects when the pixel count in a detection box is too low. The scan is then rotated based on the difference in pixel counts between the two boxes, so the final image may show both boxes slightly off (the scan will still be read correctly, as you can see in the image above). When adjusting the values, you only need the right-hand detection box to be at the correct horizontal distance from the left-hand one.
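Once a skew angle has been estimated, the image can be straightened by rotating it about its center. The Java sketch below uses `Graphics2D.rotate`. It takes the angle as an input and does not reproduce dnnScanFree's pixel-count-based estimate. The class name `Deskewer`, the white background fill, and the sign convention (positive skew means the right side sits lower) are assumptions.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical deskew step: rotates a scan back by a known skew angle. */
class Deskewer {

    /** Rotates img by -skewDegrees about its center onto a white canvas of the same size. */
    static BufferedImage deskew(BufferedImage img, double skewDegrees) {
        int w = img.getWidth();
        int h = img.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Corners uncovered by the rotated scan stay white, like blank paper.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            // Nearest-neighbour sampling keeps a black-and-white scan strictly two-tone.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
            g.rotate(Math.toRadians(-skewDegrees), w / 2.0, h / 2.0);
            g.drawImage(img, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

Keeping the output the same size as the input means small skew corrections do not shift any fixed coordinates that the recognition step relies on, apart from the rotation itself.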
You can download the DotNetNuke module, the Windows Forms client, and all source code here:
http://www.codeplex.com/DnnScanFree
DotNetNuke® is a registered trademark of the DotNetNuke Corporation