Reading text from a text file (all languages)

Keywords: language, text, text file, read | Updated: 2023-09-27 17:56:14

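I am trying to read all the text from a text file. It works for English, but not for Spanish, French, and so on. I need to be able to read text files in any language. I am using File.ReadAllText(filepath, Encoding.UTF8). I have tried UTF-8, Default, and others, but the text is not read correctly and I get some unwanted characters. Please suggest a solution to this problem.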


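Do you know which encoding your file uses? If not, you can try the solution below, which inspects the file's byte order mark (BOM). When you try to work out an encoding programmatically you can only hope for the best: there are so many possibilities that the result can always surprise you. The following code is taken from a linked answer.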

using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when the file has no recognizable BOM.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read up to the first four bytes, where a BOM would be.
    var bom = new byte[4];
    int read;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        read = file.Read(bom, 0, 4);
    }

    // Analyze the BOM. UTF-32LE (FF FE 00 00) must be tested before
    // UTF-16LE (FF FE), because both start with the same two bytes.
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7; // marked obsolete in .NET 5+
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (read == 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32BE
    return Encoding.ASCII; // No BOM: the real encoding is still unknown.
}
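
A BOM check alone does not help with files that have no BOM, which is common for Spanish or French text saved in a legacy code page. One possible way to handle that case is sketched below: let StreamReader detect a BOM if there is one, otherwise decode as strict UTF-8, and fall back to a single-byte encoding when the bytes are not valid UTF-8. The class name TextFileReader, the method name ReadAllTextBestEffort, and the choice of ISO-8859-1 as the fallback are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileReader
{
    // Hypothetical helper: best-effort read of a file whose encoding is unknown.
    public static string ReadAllTextBestEffort(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);

        // Strict UTF-8: no BOM on output, throw on invalid byte sequences.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            // The third argument asks StreamReader to honor a BOM if present;
            // without a BOM it decodes with the strict UTF-8 encoding above.
            using (var reader = new StreamReader(new MemoryStream(bytes), strictUtf8, true))
            {
                return reader.ReadToEnd();
            }
        }
        catch (DecoderFallbackException)
        {
            // Assumption: a file that is not valid UTF-8 is Western European
            // single-byte text. Replace this with the code page you expect.
            return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
        }
    }
}
```

The fallback is a guess, not detection: any byte sequence decodes "successfully" as ISO-8859-1, so text in another legacy code page would still come out wrong.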

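Alternatively, you can use the chardetsharp library (https://code.google.com/p/chardetsharp/) to detect the file's encoding, and then convert the text to the encoding you need.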
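
Once the source encoding is known, by whatever means, the conversion step itself is short. The sketch below re-encodes a file as UTF-8 using only standard .NET calls; it does not use the chardetsharp API, and the names EncodingConverter and ConvertToUtf8 are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class EncodingConverter
{
    // Hypothetical helper: rewrites a text file as UTF-8, given its source encoding.
    public static void ConvertToUtf8(string sourcePath, string targetPath, Encoding sourceEncoding)
    {
        // Decode the bytes into a string using the known source encoding.
        string text = File.ReadAllText(sourcePath, sourceEncoding);

        // Encode as UTF-8 with a BOM, so BOM-based readers can identify the result.
        File.WriteAllText(targetPath, text, new UTF8Encoding(true));
    }
}
```

For example, ConvertToUtf8("input.txt", "output.txt", Encoding.GetEncoding("iso-8859-1")) would convert a Latin-1 file; both file names here are placeholders.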