Making ToTitleCase ignore ordinals in C#

Keywords: ignore, ToTitleCase | Updated: 2023-09-27 18:16:46

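I'm trying to work out a way to use ToTitleCase while ignoring ordinals. It works the way I want for every string except ordinals (for example, 1st, 2nd, 3rd become 1St, 2Nd, 3Rd).

Any help would be appreciated. A regular expression may be the way to handle this; I'm just not sure how to construct such a regex.

Update: here is the solution I used (based on John's answer, wrapped in the extension method below):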

public static string ToTitleCaseIgnoreOrdinals(this string text)
{
    string input = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase(text);
    string result = System.Text.RegularExpressions.Regex.Replace(input, "([0-9]st)|([0-9]th)|([0-9]rd)|([0-9]nd)", new System.Text.RegularExpressions.MatchEvaluator((m) => m.Captures[0].Value.ToLower()), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    return result;
}


string input =  System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase("hello there, this is the 1st");
string result = System.Text.RegularExpressions.Regex.Replace(input, "([0-9]st)|([0-9]th)|([0-9]rd)|([0-9]nd)", new System.Text.RegularExpressions.MatchEvaluator((m) =>
{
    return m.Captures[0].Value.ToLower();
}), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
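
The pattern above matches a digit followed by st/nd/rd/th anywhere in the text, so a token such as "21Street" would also be touched. Below is a minimal sketch of a slightly tighter variant that requires a word boundary after the suffix. The class and method names (OrdinalTitleCase, LowerOrdinalSuffixes, ToTitleCaseKeepOrdinals) are made up for illustration and are not part of the original answer.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class OrdinalTitleCase
{
    // st/nd/rd/th directly after an ASCII digit and ending at a word boundary.
    // The look-behind keeps the digit itself out of the match.
    private static readonly Regex OrdinalSuffix =
        new Regex(@"(?<=[0-9])(st|nd|rd|th)\b", RegexOptions.IgnoreCase);

    // Lower-cases only the suffix letters that follow a digit.
    public static string LowerOrdinalSuffixes(string titleCased)
    {
        return OrdinalSuffix.Replace(titleCased, m => m.Value.ToLowerInvariant());
    }

    // Title-cases the text, then repairs the ordinal suffixes.
    public static string ToTitleCaseKeepOrdinals(string text, CultureInfo culture)
    {
        return LowerOrdinalSuffixes(culture.TextInfo.ToTitleCase(text));
    }
}
```

With this sketch, LowerOrdinalSuffixes("The 1St Of May") gives "The 1st Of May", while "21Street" is left alone because the suffix is not followed by a word boundary. It still does not check whether the suffix fits the number.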

Before converting to title case, you can use a regular expression to check whether the string starts with a digit, like this:

if (!Regex.IsMatch(text, @"^\d+"))
{
    text = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(text);
}

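Edit: I forgot to invert the condition… ToTitleCase is applied only when the pattern does not match.

Second edit: added a loop that checks every word in the sentence: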

string text = "150 east 40th street";

string[] array = text.Split(' ');

for (int i = 0; i < array.Length; i++)
{
    // Leave words that start with a digit (e.g. "150", "40th") untouched.
    if (!Regex.IsMatch(array[i], @"^\d+"))
    {
        array[i] = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(array[i]);
    }
}

string newText = string.Join(" ", array);

I would split the text and iterate over the resulting array, skipping anything that does not start with a letter.

using System.Globalization;

TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;
string[] text = myString.Split();
for (int i = 0; i < text.Length; i++)
{
    // Skip zero-length strings: indexing text[i][0] on an empty string
    // would throw an IndexOutOfRangeException.
    if (text[i].Length > 0 && Char.IsLetter(text[i][0]))
    {
        text[i] = textInfo.ToTitleCase(text[i]);
    }
}

You can simply use String.Replace (or StringBuilder.Replace):

string[] ordinals = { "1St", "2Nd", "3Rd" };  // add all others
string text = "This is just sample text which contains some ordinals, the 1st, the 2nd and the third.";
var sb = new StringBuilder(CultureInfo.InvariantCulture.TextInfo.ToTitleCase(text));
foreach (string ordinal in ordinals)
    sb.Replace(ordinal, ordinal.ToLowerInvariant());
text = sb.ToString();

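This isn't elegant at all. It requires you to maintain an endless list of ordinals in the first line. I guess that's why someone downvoted you.

It isn't elegant, but it works better than other simple approaches (such as the regex). You want to title-case the words in a longer text, but only the words that are not ordinal numbers. Ordinal numbers are, for example, 1st, 2nd, 3rd and 31st, but not 31th. So the simple regex solutions will fail fast. You also want to title-case words like 10m to 10M (where M could be an abbreviation for million).

So I don't understand why maintaining a list of ordinals is so bad.

You could even generate them automatically and set an upper limit, for example: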

public static IEnumerable<string> GetTitleCaseOrdinalNumbers()
{
    // Stop before int.MaxValue so that num++ cannot overflow.
    for (int num = 1; num < int.MaxValue; num++)
    {
        int lastTwo = num % 100;
        if (lastTwo >= 11 && lastTwo <= 13)
        {
            // 11, 12 and 13 always take "Th" (also 111, 212, ...).
            yield return num + "Th";
            continue; // do not fall through to the last-digit rule
        }
        switch (num % 10)
        {
            case 1:
                yield return num + "St"; break;
            case 2:
                yield return num + "Nd"; break;
            case 3:
                yield return num + "Rd"; break;
            default:
                yield return num + "Th"; break;
        }
    }
}

If you want to check the first 1000 ordinals:

foreach (string ordinal in GetTitleCaseOrdinalNumbers().Take(1000)) 
   sb.Replace(ordinal, ordinal.ToLowerInvariant());
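
The distinction drawn above (31st is an ordinal, 31th is not, and 10M must stay title-cased) can also be expressed without a precomputed list. The following is only a sketch of one possible middle ground, not the answer's own method: a regex finds candidates, and a callback lower-cases the suffix only when it fits the number. The names OrdinalRules, SuffixFor and LowerValidOrdinals are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class OrdinalRules
{
    // English rule: 11, 12 and 13 take "th"; otherwise the last digit decides.
    public static string SuffixFor(long number)
    {
        long lastTwo = number % 100;
        if (lastTwo >= 11 && lastTwo <= 13) return "th";
        switch (number % 10)
        {
            case 1: return "st";
            case 2: return "nd";
            case 3: return "rd";
            default: return "th";
        }
    }

    // A whole word made of 1 to 18 ASCII digits plus a two-letter suffix.
    // The digit limit keeps long.Parse from overflowing.
    private static readonly Regex Candidate =
        new Regex(@"\b([0-9]{1,18})(st|nd|rd|th)\b", RegexOptions.IgnoreCase);

    // Lower-cases the suffix only where it is the correct one for the number.
    public static string LowerValidOrdinals(string titleCased)
    {
        return Candidate.Replace(titleCased, m =>
        {
            long number = long.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            string suffix = m.Groups[2].Value;
            bool valid = string.Equals(suffix, SuffixFor(number),
                                       StringComparison.OrdinalIgnoreCase);
            return valid ? m.Groups[1].Value + suffix.ToLowerInvariant() : m.Value;
        });
    }
}
```

Applied to already title-cased text, "The 31St And The 31Th" becomes "The 31st And The 31Th", and "10M" is not matched at all.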

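For what it's worth, here is my attempt at an efficient approach that really checks words (not just substrings) and skips ToTitleCase on words that really represent an ordinal number (so not 31th, but 31st). It also handles separators that are non-whitespace characters (such as dots or commas):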

private static readonly char[] separator = { '.', ',', ';', ':', '-', '(', ')', '\\', '{', '}', '[', ']', '/', '\'', '"', '?', '!', '|' };
public static bool IsOrdinalNumber(string word)
{
    if (word.Any(char.IsWhiteSpace))
        return false; // white-spaces are not allowed
    if (word.Length < 3)
        return false;
    var numericPart = word.TakeWhile(char.IsDigit);
    string numberText = string.Join("", numericPart);
    if (numberText.Length == 0)
        return false;
    int number;
    if (!int.TryParse(numberText, out number))
        return false; // handle unicode digits which are not really numeric like ۵
    // 11, 12 and 13 always take "th"; otherwise the last digit decides.
    string suffix;
    int lastTwo = number % 100;
    if (lastTwo >= 11 && lastTwo <= 13)
        suffix = "th";
    else
    {
        switch (number % 10)
        {
            case 1:
                suffix = "st"; break;
            case 2:
                suffix = "nd"; break;
            case 3:
                suffix = "rd"; break;
            default:
                suffix = "th"; break;
        }
    }
    string ordinalNumber = number + suffix;
    string checkForOrdinalNum = numberText + word.Substring(numberText.Length);
    return checkForOrdinalNum.Equals(ordinalNumber, StringComparison.CurrentCultureIgnoreCase);
}
public static string ToTitleCaseIgnoreOrdinalNumbers(string text, TextInfo info)
{
    if(text.Trim().Length < 3)
        return info.ToTitleCase(text);
    int whiteSpaceIndex = FindWhiteSpaceIndex(text, 0, separator);
    if(whiteSpaceIndex == -1)
    {
        if(IsOrdinalNumber(text.Trim()))
            return text;
        else
            return info.ToTitleCase(text);
    }
    StringBuilder sb = new StringBuilder();
    int wordStartIndex = 0; 
    if(whiteSpaceIndex == 0)
    {
        // starts with space or separator, find first word
        wordStartIndex = FindNonWhiteSpaceIndex(text, 1, separator);
        if(wordStartIndex == -1)
            return text; // nothing but spaces and separators
        sb.Append(text.Remove(wordStartIndex)); // append leading spaces
    }
    while(wordStartIndex >= 0)
    {
        whiteSpaceIndex = FindWhiteSpaceIndex(text, wordStartIndex + 1, separator);
        string word;
        if(whiteSpaceIndex == -1)
            word = text.Substring(wordStartIndex);
        else
            word = text.Substring(wordStartIndex, whiteSpaceIndex - wordStartIndex);
        if(IsOrdinalNumber(word))
            sb.Append(word);
        else
            sb.Append(info.ToTitleCase(word));
        if(whiteSpaceIndex == -1)
            break; // that was the last word, nothing left to append
        wordStartIndex = FindNonWhiteSpaceIndex(text, whiteSpaceIndex + 1, separator);
        string whiteSpaces;
        if(wordStartIndex >= 0)
            whiteSpaces = text.Substring(whiteSpaceIndex, wordStartIndex - whiteSpaceIndex);
        else
            whiteSpaces = text.Substring(whiteSpaceIndex);
        sb.Append(whiteSpaces); // append spaces between words
    }
    return sb.ToString();
}
public static int FindWhiteSpaceIndex(string text, int startIndex = 0, params char[] separator)
{
    bool checkSeparator = separator != null && separator.Any();
    for (int i = startIndex; i < text.Length; i++)
    {
        char c = text[i];
        if (char.IsWhiteSpace(c) || (checkSeparator && separator.Contains(c)))
            return i;
    }
    return -1;
}
public static int FindNonWhiteSpaceIndex(string text, int startIndex = 0, params char[] separator)
{
    bool checkSeparator = separator != null && separator.Any();
    for (int i = startIndex; i < text.Length; i++)
    {
        char c = text[i];
        if (!char.IsWhiteSpace(text[i]) && (!checkSeparator || !separator.Contains(c)))
            return i;
    }
    return -1;
}

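Note that this hasn't really been tested, but it should give you the idea.

This will work for those strings; you could replace ToTitleCase() with your own version via an extension method.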

string s = "1st";
if (s[0] >= '0' && s[0] <= '9')
{
    // this string starts with a number,
    // so don't call ToTitleCase()
}
else
{
    // call ToTitleCase()
}
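
As a rough sketch of what such an extension method might look like, the version below applies the first-character check to each space-separated word. The name ToTitleCaseSkippingNumbers and the choice to split on single spaces are assumptions made for illustration only.

```csharp
using System;
using System.Globalization;

public static class TitleCaseExtensions
{
    // Title-cases every space-separated word that does not start with an ASCII digit.
    public static string ToTitleCaseSkippingNumbers(this string text, CultureInfo culture)
    {
        string[] words = text.Split(' ');
        for (int i = 0; i < words.Length; i++)
        {
            string word = words[i];
            // Empty entries (from repeated spaces) are passed through unchanged.
            bool startsWithDigit = word.Length > 0 && word[0] >= '0' && word[0] <= '9';
            if (!startsWithDigit)
                words[i] = culture.TextInfo.ToTitleCase(word);
        }
        return string.Join(" ", words);
    }
}
```

It could be called as "150 east 40th street".ToTitleCaseSkippingNumbers(CultureInfo.InvariantCulture). Note that this also skips words such as "10m", which the earlier answer wanted title-cased.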