Find certain text in a PDF, then return the page number where it was found to another section. Note: uses Docotic.pdf

Updated: 2023-09-27 18:01:40

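In the code below, the user enters a search string (barcodedata). That string is then truncated to its first 5 characters and used as the jobnumber. The jobnumber is also the name of the PDF that I want to scan for the barcode data.

I want the 'Find it' button to run the code below and then return the value it finds to startpagedata.

I can't tell whether the program really isn't scanning the PDF for the search string, or whether the value simply isn't being returned to the program.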

using BitMiracle.Docotic.Pdf;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Diagnostics;
using System.IO;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Web;
using System.Windows.Forms;
using Acrobat;
namespace BarCodeReader
{
    public partial class Form1 : Form
    {
        public string stringsToFind;
        public string pathtofile;
        public Form1()
        {
            InitializeComponent();
        }
        private void barcodedata_TextChanged(object sender, EventArgs e)
        {
            stringsToFind=barcodedata.Text;
            pathtofile = "C:\\" + StringTool.Truncate(barcodedata.Text, 5) + ".pdf";
            jobnumberdata.Text = StringTool.Truncate(barcodedata.Text, 5);
        }
        private void label4_Click(object sender, EventArgs e)
        {
        }
        private void jobnumberdata_TextChanged(object sender, EventArgs e)
        {
            jobnumberdata.Text = jobnumberdata.Text.TrimStart('0');
            Console.WriteLine(jobnumberdata.Text);
        }
        private void startpagedata_TextChanged(object sender, EventArgs e)
        {
            Console.WriteLine(startpagedata.Text);
        }
        private void piecesdata_TextChanged(object sender, EventArgs e)
        {
        }
        private void FindIt_Click(object sender, EventArgs e)
        {
            PdfDocument pdf = new PdfDocument(pathtofile);
            for (int i = 0; i < pdf.Pages.Count; i++)
            {
                string pageText = pdf.Pages[i].GetText();
                int count = 0;
                int lastStartIndex = pageText.IndexOf(stringsToFind, 0, StringComparison.CurrentCultureIgnoreCase);
                while (lastStartIndex != -1)
                {
                    count++;
                    lastStartIndex = pageText.IndexOf(stringsToFind, lastStartIndex + 1, StringComparison.CurrentCultureIgnoreCase);
                }
                if (count != 0)
                    startpagedata.Text = Convert.ToString(lastStartIndex);
            }
        }
    }
    public static class StringTool
    {
        /// <summary>
        /// Get a substring of the first N characters.
        /// </summary>
        public static string Truncate(string source, int length)
        {
            if (source.Length > length)
            {
                source = source.Substring(0, length);
            }
            return source;
        }
        /// <summary>
        /// Get a substring of the first N characters. [Slow]
        /// </summary>
        public static string Truncate2(string source, int length)
        {
            return source.Substring(0, Math.Min(length, source.Length));
        }
    }
}

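What is wrong with this code? If FindIt_Click is not executed when you click, then you probably have not associated it with the "Find it" button.

The code in FindIt_Click looks mostly correct, except for the following points: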

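It may not do what you expect. It is meant to return the last index of the search string within the text of the last page where it was found. (As written, the while loop only exits once lastStartIndex is -1, so the value assigned to startpagedata is always -1.) You probably want the index of the page where the search string was found instead, which is the loop variable i, or i + 1 for a 1-based page number.
You can use the pageText.LastIndexOf method to find lastStartIndex quickly.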

Also, dispose of the PdfDocument when you are done with it, for example with a using statement:

using (PdfDocument pdf = new PdfDocument(pathtofile)) { … }
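The page-number point above can be sketched roughly as follows. This is only a minimal illustration, not the asker's or the library's method: `PageSearch` and `FindPages` are hypothetical names, and the list of page texts is assumed to have been collected beforehand by calling `pdf.Pages[i].GetText()` for each page, so the sketch itself does not depend on Docotic.Pdf.

```csharp
using System;
using System.Collections.Generic;

public static class PageSearch
{
    // Hypothetical helper: returns the 1-based numbers of all pages whose
    // text contains `needle`, compared case-insensitively. `pageTexts` is
    // assumed to hold one string per page, in page order.
    public static List<int> FindPages(IReadOnlyList<string> pageTexts, string needle)
    {
        var hits = new List<int>();
        if (string.IsNullOrEmpty(needle))
            return hits; // IndexOf("") always matches, so treat an empty search as no match

        for (int i = 0; i < pageTexts.Count; i++)
        {
            if (pageTexts[i].IndexOf(needle, StringComparison.CurrentCultureIgnoreCase) >= 0)
                hits.Add(i + 1); // convert the zero-based index to a page number
        }
        return hits;
    }
}
```

In FindIt_Click, the first entry of the returned list (if there is one) could then be converted to a string and assigned to startpagedata.Text. Recording the page number rather than the character offset is what makes the result usable as a start page.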

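Instead of Docotic.pdf, you can use Acrobat.dll to find the current page number. First open the PDF file and search for the string using

Acroavdoc.open("Filepath","Temporary title")

and

Acroavdoc.FindText("String").

If the string is found in the PDF file, the cursor moves to the corresponding page and the searched string is highlighted. We then use Acroavpageview.GetPageNum() to get the current page number.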

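' Requires a COM reference to the Acrobat type library (Acrobat.dll).
Dim AcroXAVDoc As CAcroAVDoc
Dim Acroavpage As AcroAVPageView
Dim AcroXApp As CAcroApp
AcroXAVDoc = CType(CreateObject("AcroExch.AVDoc"), Acrobat.CAcroAVDoc)
AcroXApp = CType(CreateObject("AcroExch.App"), Acrobat.CAcroApp)
AcroXAVDoc.Open("File path", "Original document")
' FindText(text, caseSensitive, wholeWordsOnly, reset) returns True when the text is found.
If AcroXAVDoc.FindText("String to be searched", True, True, False) Then
    Acroavpage = AcroXAVDoc.GetAVPageView()
    ' GetPageNum is zero-based in the Acrobat IAC API, so add 1 for display.
    Dim x As Integer = Acroavpage.GetPageNum
    MsgBox("The string was found on page number " & (x + 1))
End If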