ToTitleCase ignoring ordinals in C#
Keywords: ignore, toTitleCase | Updated: 2023-09-27 18:16:46
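I'm trying to find a way to use ToTitleCase while ignoring ordinals. It works the way I want for all strings except ordinals (e.g. 1st, 2nd, 3rd become 1St, 2Nd, 3Rd).
Any help would be appreciated. A regular expression may be the way to handle this; I'm just not sure how to construct one.
Update: here is the solution I used (based on John's answer, I wrote the extension method below):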
public static string ToTitleCaseIgnoreOrdinals(this string text)
{
string input = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase(text);
string result = System.Text.RegularExpressions.Regex.Replace(input, "([0-9]st)|([0-9]th)|([0-9]rd)|([0-9]nd)", new System.Text.RegularExpressions.MatchEvaluator((m) => m.Captures[0].Value.ToLower()), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
return result;
}
string input = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase("hello there, this is the 1st");
string result = System.Text.RegularExpressions.Regex.Replace(input, "([0-9]st)|([0-9]th)|([0-9]rd)|([0-9]nd)", new System.Text.RegularExpressions.MatchEvaluator((m) =>
{
return m.Captures[0].Value.ToLower();
}), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
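One caveat with the pattern above: it matches a digit followed by st/th/rd/nd anywhere in the text, so a token such as "1Street" would be lowercased to "1street" as well. A minimal sketch of a stricter variant follows, assuming that an ordinal is a whole word made of digits plus one of the four suffixes. The class and method names (OrdinalTitleCase, ToTitleCaseKeepOrdinals) are hypothetical and not part of the answers here, and the sketch uses the invariant culture rather than the current one.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class OrdinalTitleCase
{
    // \b anchors the match to word boundaries, so "1Street" is left alone
    // while "1St", "40Th" and "22Nd" are matched (case-insensitively).
    private static readonly Regex OrdinalPattern =
        new Regex(@"\b\d+(st|nd|rd|th)\b", RegexOptions.IgnoreCase);

    // Hypothetical helper: title-case the text, then lowercase ordinal tokens.
    public static string ToTitleCaseKeepOrdinals(string text)
    {
        string titled = CultureInfo.InvariantCulture.TextInfo.ToTitleCase(text);
        return OrdinalPattern.Replace(titled, m => m.Value.ToLowerInvariant());
    }
}
```

For example, OrdinalTitleCase.ToTitleCaseKeepOrdinals("150 east 40th street") should give "150 East 40th Street". Note that this pattern does not check that the suffix fits the number, so "31th" is treated like "31st".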
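Before converting to title case, you can use a regular expression to check whether the string starts with a digit, like this:
if (!Regex.IsMatch(text, @"^\d+"))
{
// ToTitleCase returns a new string, so the result has to be assigned
text = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(text);
}
Edit: I forgot to invert the condition. It now applies ToTitleCase only when the pattern does not match.
Second edit: added a loop that checks every word in the sentence: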
string text = "150 east 40th street";
string[] array = text.Split(' ');
for (int i = 0; i < array.Length; i++)
{
if (!Regex.IsMatch(array[i], @"^\d+"))
{
array[i] = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(array[i]);
}
}
string newText = string.Join(" ",array);
I would split the text and iterate over the resulting array, skipping anything that does not start with a letter.
using System.Globalization;
TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;
string[] text = myString.Split();
for(int i = 0; i < text.Length; i++)
{ // Skip zero-length strings: indexing text[i][0] on an empty string
// throws an IndexOutOfRangeException
if (text[i].Length > 0 && Char.IsLetter(text[i][0]))
{
text[i] = textInfo.ToTitleCase(text[i]);
}
}
You can simply use String.Replace (or StringBuilder.Replace):
string[] ordinals = { "1St", "2Nd", "3Rd" }; // add all others
string text = "This is just sample text which contains some ordinals, the 1st, the 2nd and the third.";
var sb = new StringBuilder(CultureInfo.InvariantCulture.TextInfo.ToTitleCase(text));
foreach (string ordinal in ordinals)
sb.Replace(ordinal, ordinal.ToLowerInvariant());
text = sb.ToString();
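This isn't elegant at all. It requires you to maintain an unbounded list of ordinals on the first line. I guess that's why someone downvoted you.
It isn't elegant, but it works better than other simple approaches (such as a regex). You want to title-case the words in a longer text, but only the words that are not ordinals. Ordinals are, for example, 1st, 2nd, 3rd and 31st, but not 31th, so a simple regex solution quickly fails. You also want words like 10m to be title-cased to 10M (where M could be an abbreviation for million).
So I don't see why maintaining a list of ordinals is so bad.
You could even generate them automatically and set an upper bound, for example: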
public static IEnumerable<string> GetTitleCaseOrdinalNumbers()
{
for (int num = 1; num < int.MaxValue; num++) // "<" avoids wrapping past int.MaxValue
{
switch (num % 100)
{
case 11:
case 12:
case 13:
yield return num + "Th";
continue; // 11, 12 and 13 always take "Th"; skip the last-digit switch below
}
switch (num % 10)
{
case 1:
yield return num + "St"; break;
case 2:
yield return num + "Nd"; break;
case 3:
yield return num + "Rd"; break;
default:
yield return num + "Th"; break;
}
}
}
If you want to check the first 1000 ordinals:
foreach (string ordinal in GetTitleCaseOrdinalNumbers().Take(1000))
sb.Replace(ordinal, ordinal.ToLowerInvariant());
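The English suffix rule that the generated list relies on can also be sketched on its own, which makes the special case for 11, 12 and 13 easier to see. This is only an illustration under the usual English convention; the class and method names (OrdinalSuffix, For) are hypothetical and do not appear in the answer above.

```csharp
using System;

public static class OrdinalSuffix
{
    // Returns the lowercase English ordinal suffix for a number.
    public static string For(int number)
    {
        // The remainder can be negative for negative input, so take its absolute value.
        int lastTwo = Math.Abs(number % 100);

        // 11, 12 and 13 (and 111, 212, ...) always take "th".
        if (lastTwo >= 11 && lastTwo <= 13)
            return "th";

        switch (lastTwo % 10)
        {
            case 1: return "st";
            case 2: return "nd";
            case 3: return "rd";
            default: return "th";
        }
    }
}
```

With this helper, 31 + OrdinalSuffix.For(31) produces "31st", while 11 and 112 both end in "th".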
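For what it's worth, here is my attempt at an efficient approach that really checks words (not just substrings) and skips ToTitleCase on words that truly represent ordinals (so 31st, but not 31th). It also handles separators that are not whitespace (such as dots or commas):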
private static readonly char[] separator = { '.', ',', ';', ':', '-', '(', ')', '\\', '{', '}', '[', ']', '/', '\'', '"', '?', '!', '|' };
public static bool IsOrdinalNumber(string word)
{
if (word.Any(char.IsWhiteSpace))
return false; // white-spaces are not allowed
if (word.Length < 3)
return false;
var numericPart = word.TakeWhile(char.IsDigit);
string numberText = string.Join("", numericPart);
if (numberText.Length == 0)
return false;
int number;
if (!int.TryParse(numberText, out number))
return false; // handle unicode digits which are not really numeric like ۵
string ordinalNumber;
int lastTwoDigits = number % 100;
if (lastTwoDigits >= 11 && lastTwoDigits <= 13)
{
// 11, 12 and 13 always take "th", regardless of the last digit
ordinalNumber = number + "th";
}
else
{
switch (number % 10)
{
case 1:
ordinalNumber = number + "st"; break;
case 2:
ordinalNumber = number + "nd"; break;
case 3:
ordinalNumber = number + "rd"; break;
default:
ordinalNumber = number + "th"; break;
}
}
string checkForOrdinalNum = numberText + word.Substring(numberText.Length);
return checkForOrdinalNum.Equals(ordinalNumber, StringComparison.CurrentCultureIgnoreCase);
}
public static string ToTitleCaseIgnoreOrdinalNumbers(string text, TextInfo info)
{
if(text.Trim().Length < 3)
return info.ToTitleCase(text);
int whiteSpaceIndex = FindWhiteSpaceIndex(text, 0, separator);
if(whiteSpaceIndex == -1)
{
if(IsOrdinalNumber(text.Trim()))
return text;
else
return info.ToTitleCase(text);
}
StringBuilder sb = new StringBuilder();
int wordStartIndex = 0;
if(whiteSpaceIndex == 0)
{
// starts with space, find word
wordStartIndex = FindNonWhiteSpaceIndex(text, 1, separator);
sb.Append(text.Remove(wordStartIndex)); // append leading spaces
}
while(wordStartIndex >= 0)
{
whiteSpaceIndex = FindWhiteSpaceIndex(text, wordStartIndex + 1, separator);
string word;
if(whiteSpaceIndex == -1)
word = text.Substring(wordStartIndex);
else
word = text.Substring(wordStartIndex, whiteSpaceIndex - wordStartIndex);
if(IsOrdinalNumber(word))
sb.Append(word);
else
sb.Append(info.ToTitleCase(word));
if(whiteSpaceIndex == -1)
break; // that was the last word; nothing is left to scan
wordStartIndex = FindNonWhiteSpaceIndex(text, whiteSpaceIndex + 1, separator);
string whiteSpaces;
if(wordStartIndex >= 0)
whiteSpaces = text.Substring(whiteSpaceIndex, wordStartIndex - whiteSpaceIndex);
else
whiteSpaces = text.Substring(whiteSpaceIndex);
sb.Append(whiteSpaces); // append spaces between words
}
return sb.ToString();
}
public static int FindWhiteSpaceIndex(string text, int startIndex = 0, params char[] separator)
{
bool checkSeparator = separator != null && separator.Any();
for (int i = startIndex; i < text.Length; i++)
{
char c = text[i];
if (char.IsWhiteSpace(c) || (checkSeparator && separator.Contains(c)))
return i;
}
return -1;
}
public static int FindNonWhiteSpaceIndex(string text, int startIndex = 0, params char[] separator)
{
bool checkSeparator = separator != null && separator.Any();
for (int i = startIndex; i < text.Length; i++)
{
char c = text[i];
if (!char.IsWhiteSpace(text[i]) && (!checkSeparator || !separator.Contains(c)))
return i;
}
return -1;
}
Note that this hasn't really been tested, but it should give you the idea.
This will work for these strings; you can wrap ToTitleCase() in an extension method.
string s = "1st";
if ( s[0] >= '0' && s[0] <= '9' ) {
//this string starts with a number
//so don't call ToTitleCase()
}
else {
//call ToTitleCase()
}
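The extension-method idea mentioned above could look roughly like the sketch below. It assumes the input is a single word and reuses the same '0'..'9' check; the class and method names (TextInfoExtensions, ToTitleCaseUnlessNumeric) are hypothetical. Extension methods cannot replace TextInfo.ToTitleCase itself, so this adds a separate method next to it.

```csharp
using System;
using System.Globalization;

public static class TextInfoExtensions
{
    // Hypothetical wrapper: leaves words that start with an ASCII digit untouched
    // and title-cases everything else.
    public static string ToTitleCaseUnlessNumeric(this TextInfo info, string word)
    {
        if (string.IsNullOrEmpty(word))
            return word;

        // Same check as above: the word starts with a number.
        if (word[0] >= '0' && word[0] <= '9')
            return word;

        return info.ToTitleCase(word);
    }
}
```

Usage would be along the lines of CultureInfo.CurrentCulture.TextInfo.ToTitleCaseUnlessNumeric("1st"), which returns "1st" unchanged, while "street" becomes "Street". Like the check it wraps, it also skips tokens such as "10m" that are not ordinals.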