How to display the image region for TIF coordinates obtained from Tessnet (Tesseract)
Updated: 2023-09-27 18:36:22
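I am running OCR on the Tesseract engine, with Tessnet as the C# wrapper. I already have the image coordinates of the recognized words, and I want to use those coordinates to display only that part of the page. I don't mind whether that part of the page is saved as a separate image or whether the region is simply highlighted in some way on the TIF image.
Here is my current code: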
Bitmap image = new Bitmap(@"U:\user files\bwalker\2849257.tif");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
// Restrict recognition to these characters
ocr.SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,$-/#&=()\"':?");
// Point Init at the folder that holds the tessdata files
ocr.Init(@"C:\Users\bwalker\Documents\Visual Studio 2010\Projects\tessnetWinForms\tessnetWinForms\bin\Release\", "eng", false);
List<tessnet2.Word> result = ocr.DoOCR(image, System.Drawing.Rectangle.Empty);
// Append one line per word: confidence, text, top, bottom, left, right
using (StreamWriter writer = new StreamWriter(@"U:\user files\bwalker\ocrTesting2.txt", true))
{
    foreach (tessnet2.Word word in result)
    {
        writer.WriteLine(word.Confidence + ", " + word.Text + ", " + word.Top + ", " + word.Bottom + ", " + word.Left + ", " + word.Right);
    }
}
MessageBox.Show("Completed");
Here is part of the resulting .txt file (columns: confidence, text, top, bottom, left, right):
14, Due, 105, 136, 1886, 1962
89, Date, 105, 136, 1978, 2064
50, 06/16/2009, 105, 136, 2298, 2504
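To make the column order concrete, here is a minimal sketch that turns one of these lines back into a `System.Drawing.Rectangle`. `OcrLine` and `ParseBounds` are hypothetical names introduced only for illustration. The sketch assumes the last four fields are always top, bottom, left, right (as written by the loop above) and that Right and Bottom are exclusive edges; if tessnet2 reports them as inclusive, the width and height would each need one more pixel.

```csharp
using System;
using System.Drawing;
using System.Globalization;

static class OcrLine
{
    // Parses "confidence, text, top, bottom, left, right" into a Rectangle.
    // The four bounds are read from the end of the line, so recognized text
    // that itself contains ", " does not shift them.
    public static Rectangle ParseBounds(string line)
    {
        string[] parts = line.Split(new[] { ", " }, StringSplitOptions.None);
        if (parts.Length < 6)
            throw new FormatException("Expected at least 6 fields: " + line);

        int n = parts.Length;
        int top = int.Parse(parts[n - 4], CultureInfo.InvariantCulture);
        int bottom = int.Parse(parts[n - 3], CultureInfo.InvariantCulture);
        int left = int.Parse(parts[n - 2], CultureInfo.InvariantCulture);
        int right = int.Parse(parts[n - 1], CultureInfo.InvariantCulture);

        // FromLTRB computes Width = right - left and Height = bottom - top.
        return Rectangle.FromLTRB(left, top, right, bottom);
    }
}
```

For the first sample line, `OcrLine.ParseBounds("14, Due, 105, 136, 1886, 1962")` gives a rectangle at X = 1886, Y = 105 with width 76 and height 31.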
I have solved this. After getting a word's coordinates, I do the following to display the associated part of the image:
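// dueDateRect1..4 presumably hold the word's Left, Top, Bottom and Right values, in that order,
// so the arguments are x, y, width (Right - Left) and height (Bottom - Top)
System.Drawing.Rectangle dueDateRectangle = new System.Drawing.Rectangle(dueDateRect1, dueDateRect2, dueDateRect4 - dueDateRect1, dueDateRect3 - dueDateRect2);
// Crop that region out of the page, keeping the source pixel format
System.Drawing.Imaging.PixelFormat format = image.PixelFormat;
Bitmap cloneBitmap = image.Clone(dueDateRectangle, format);
// Re-encode the crop as PNG in memory so that WPF can load it
MemoryStream ms = new MemoryStream();
cloneBitmap.Save(ms, ImageFormat.Png);
ms.Position = 0;
// Load the PNG into a WPF BitmapImage and show it in the Image control
BitmapImage dueDateImage = new BitmapImage();
dueDateImage.BeginInit();
dueDateImage.StreamSource = ms;
dueDateImage.EndInit();
dueDateImageBox.Source = dueDateImage;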
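The question also allowed for highlighting the region on the page instead of cropping it. A rough sketch of that alternative follows. `WordHighlighter` and `Highlight` are hypothetical names, and the red 3-pixel outline is an arbitrary choice. The page is first copied into a 32bpp bitmap because `Graphics.FromImage` cannot be created from an image with an indexed pixel format, which scanned TIF files often have.

```csharp
using System.Drawing;
using System.Drawing.Imaging;

static class WordHighlighter
{
    // Returns a 32bpp copy of the page with the given region outlined in red.
    // The source image is left untouched.
    public static Bitmap Highlight(Image source, Rectangle region)
    {
        var canvas = new Bitmap(source.Width, source.Height, PixelFormat.Format32bppArgb);
        using (Graphics g = Graphics.FromImage(canvas))
        using (var pen = new Pen(Color.Red, 3f))
        {
            // Draw at the source's pixel size so coordinates line up with the OCR output.
            g.DrawImage(source, 0, 0, source.Width, source.Height);
            g.DrawRectangle(pen, region);
        }
        return canvas;
    }
}
```

With the variables from the code above, a call could look like `Bitmap marked = WordHighlighter.Highlight(image, dueDateRectangle);`. The result can then be saved or pushed through the same `MemoryStream` and `BitmapImage` steps used for the cropped image.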