Best practices for making native calls from C#

Keywords: best, call, native | Updated: 2023-09-27 18:36:57

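I would like to know what the best practice/design is for calling an external dependency from my C# application. My application is distributed as a DLL that is used in other applications.

I have a class named OCRObject, and I don't know whether I should make it static.

This is my code for calling the external DLL: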

/// <summary>
/// A static instance of OCRObject that handles the OCR part of the application. This class
/// calls a native library and the required files must therefore be present in /Tesseract folder.
/// </summary>
internal class OCRObject
{
    /// <summary>
    /// Calls the native C++ library and returns a UTF-8 string of the image text.
    /// </summary>
    /// <param name="imagePath">   The full image path.</param>
    /// <param name="tessConfPath">The tesseract configuration path.</param>
    /// <param name="tessLanguage">The tesseract language.</param>
    /// <returns></returns>
    [HandleProcessCorruptedStateExceptions]
    public string GetOCRText(string imagePath, string tessConfPath, string tessLanguage)
    {
        try
        {
            if (StaticObjectHolder.EnableAdvancedLogging)
            {
                Logger.Log(string.Format("Doing OCR on folder {0}.", imagePath));
            }
            return this.StringFromNativeUtf8(OCRObject.GetUTF8Text(tessConfPath, tessLanguage, imagePath));
        }
        catch (AccessViolationException ave)
        {
            Logger.Log(ave.ToString(), LogInformationType.Error);
        }
        catch (Exception ex)
        {
            Logger.Log(ex.ToString(), LogInformationType.Error);
        }
        return string.Empty;
    }
    /// <summary>
    /// The DLL Import declaration. The main entry point is GetUTF8Text which is the method in
    /// the native library. This method extracts text from the image and returns a UTF-8 representation of the string.
    /// </summary>
    /// <param name="path">   The path of the configuration files.</param>
    /// <param name="lang">   The language to parse. For example DAN, ENG etc.</param>
    /// <param name="imgPath">The full path of the image to extract text from.</param>
    /// <returns></returns>
    [HandleProcessCorruptedStateExceptions]
    [DllImport(@"\Tesseract\TesseractX64.dll", EntryPoint = "GetUTF8Text", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr GetUTF8Text(string path, string lang, string imgPath);
    /// <summary>
    /// Converts the returned IntPtr from the native call to a UTF-8 based string.
    /// </summary>
    /// <param name="nativeUtf8">The native UTF8.</param>
    /// <returns></returns>
    [HandleProcessCorruptedStateExceptions]
    private string StringFromNativeUtf8(IntPtr nativeUtf8)
    {
        try
        {
            int len = 0;
            if (nativeUtf8 == IntPtr.Zero)
            {
                return string.Empty;
            }
            while (Marshal.ReadByte(nativeUtf8, len) != 0)
            {
                ++len;
            }
            byte[] buffer = new byte[len];
            Marshal.Copy(nativeUtf8, buffer, 0, buffer.Length);
            string text = Encoding.UTF8.GetString(buffer);
            nativeUtf8 = IntPtr.Zero; /*set to zero.*/
            return text;
        }
        catch
        {
            return string.Empty;
        }
    }
}

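My goal is maximum performance, so I would like to know whether I can optimize this code by making the class static or by changing any of the code.

Here is the C++ code: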

#include "stdafx.h"
#include "OCRWrapper.h"
#include "allheaders.h"
#include "baseapi.h"
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <sys/types.h>
#include <sstream>
OCRWrapper::OCRWrapper()
{
}
//OCRWrapper::~OCRWrapper()
//{
//}
/// <summary>
/// Sets the image path to read text from.
/// </summary>
/// <param name="imgPath">The img path.</param>
/// <summary>
/// Get the text from the image in UTF-8. Remember to convert it from UTF-8 again on the caller side.
/// </summary>
/// <returns></returns>
char* OCRWrapper::GetUTF8Text(char* path, char* lang, char* imgPath)
{
    char* imageText = NULL;
    try
    {
        tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
        if (api->Init(path, lang)) {
            fprintf(stderr, "Could not initialize tesseract. Incorrect datapath or incorrect language\n"); /*This should throw an error to the caller*/
            exit(1);
        }
        /*Open a reference to the imagepath*/
        Pix *image = pixRead(imgPath);
        /*Read the image object;*/
        api->SetImage(image);
        // Get OCR result
        imageText = api->GetUTF8Text();
        /*writeToFile(outText);*/
        /*printf("OCR output:\n%s", imageText);*/
        /*Release the engine and the image; imageText stays allocated for the caller*/
        api->End();
        delete api;
        pixDestroy(&image);
        /*std::string x = std::string(imageText);*/
        return imageText;
    }
    catch (...)
    {
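        /*Return a heap-allocated copy; a pointer into the local std::string would dangle after return*/
        std::string errorStr("An error occurred during OCR. ImgPath => " + std::string(imgPath));
        char* errorText = new char[errorStr.size() + 1];
        errorStr.copy(errorText, errorStr.size());
        errorText[errorStr.size()] = '\0';
        return errorText;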
    }
}


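Maximum performance? Use C++/CLI for the interface class. The difference is small, but it may be relevant. It becomes much larger if you can avoid generating strings: with C# interop the strings have to be marshalled on every call, whereas with C++/CLI you can reuse cached strings. How much this helps depends on the lower-level API you have downstream.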
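To make the "reuse cached strings" idea concrete, here is a minimal sketch in standard C++ (not C++/CLI itself). It assumes the expensive step is a UTF-16 to UTF-8 conversion, which is roughly what happens when a .NET string is handed to a `char*` API. The names `Utf16ToUtf8` and `ConvertedStringCache` are hypothetical and do not come from the question's code or from Tesseract.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Converts UTF-16 to UTF-8. Unpaired surrogates are not rejected; they are
// simply encoded as three bytes, which is good enough for a sketch.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical cache: each distinct string is converted once and then reused.
class ConvertedStringCache {
public:
    // Returns the UTF-8 form of `text`, converting only on first use.
    const std::string& Get(const std::u16string& text) {
        auto it = cache_.find(text);
        if (it == cache_.end()) {
            ++conversions_;
            it = cache_.emplace(text, Utf16ToUtf8(text)).first;
        }
        return it->second;
    }

    // Number of conversions actually performed so far.
    std::size_t conversions() const { return conversions_; }

private:
    std::unordered_map<std::u16string, std::string> cache_;
    std::size_t conversions_ = 0;
};
```

For arguments that rarely change, such as the configuration path and the language code, repeated lookups would hit the cache and skip the conversion. The image path usually differs on every call, so it would gain little from this.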

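That said, for OCR I really think you are barking up the wrong tree. OCR is a processor-intensive operation, so anything you optimize in the call itself is irrelevant, because calls are few and far between compared with the processing. The cases where I do optimize this kind of thing are, for example, exchange data streams that may make hundreds of thousands of calls per second, where I forward as little data as possible to C# for processing. For OCR, though, I have serious trouble seeing how this is relevant, especially if you are not handling the images yourself in the first place, which would be the only place where optimization is worth considering.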

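How long does a call to GetOCRText take? If it is noticeably more than 1/1000 of a second, then seriously, you are trying to optimize the wrong element. The call overhead is tiny (much smaller than that).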
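One way to check this is to measure the mean time per call and see what share of it the call overhead could possibly account for. The following is a rough sketch under stated assumptions: `MeanMicrosPerCall`, `SimulatedOcrWork` and `OverheadShare` are hypothetical helpers, and `SimulatedOcrWork` only burns CPU as a stand-in for the real OCR call.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Runs `call` `iterations` times and returns the mean wall-clock time
// per call in microseconds.
template <typename Callable>
double MeanMicrosPerCall(Callable&& call, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        call();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Hypothetical stand-in for the OCR work: sums squares to keep the CPU busy.
// The volatile accumulator keeps the loop from being optimized away.
std::uint64_t SimulatedOcrWork(std::uint64_t steps) {
    volatile std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < steps; ++i) {
        acc = acc + i * i;
    }
    return acc;
}

// Fraction of the total time per call that is spent on call overhead.
double OverheadShare(double overheadMicros, double workMicros) {
    assert(overheadMicros >= 0.0 && workMicros > 0.0);
    return overheadMicros / (overheadMicros + workMicros);
}
```

In practice you would pass a lambda wrapping the real call to `MeanMicrosPerCall`. If the result is well above 1000 microseconds, then even an assumed overhead of a full microsecond gives an `OverheadShare` below 0.1%, so tuning the call itself cannot produce a measurable gain.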