Get the first 140 characters of a string, with a special case

Keywords: string, character, case, get | Updated: 2023-09-27 17:56:54

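I have a string that is limited to 140 characters. My code often produces more than 140. The string is a set of values in the format Mxxxx, where x can be any digit and there is no fixed length, so I can have M1 as well as M281.

If the string is longer than 140 characters, I want to take the first 140, but if the last value is cut in half, I don't want it in my string at all.

I do need to keep the cut-off remainder in a local variable, though.

For example, say this is the string: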

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619"

And say these are the first 140 characters:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M69"

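The last value was M6919, but it was split into M69 and 19.

What is the most efficient way to say: if the length is over 140, split, but if the last value in the new string was split in two, remove it from the first part of the string and put it in another string value together with the rest of the original string?

There are probably many ways to do this. I could use an if or a switch/case in a loop and say that if the first letter of the second string isn't "M", then I know the value was split and I should remove it from the first string, but does anyone have a cleaner solution than that?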

private static string CreateSettlmentStringsForUnstructuredField(string settlementsString)
{
    string returnSettlementsString = settlementsString.Replace(", ", " ");
    if (returnSettlementsString.Length > 140)
    {
        // Substring returns a new string, so the result has to be assigned.
        returnSettlementsString = returnSettlementsString.Substring(0, 140);
        /*if returnSettlementsString was split in a way that
          broke the last value into two parts, take that value
          out of returnSettlementsString and put it in some new
          string value with the other half of the string.*/
    }
    return returnSettlementsString;
}

Something like this might work:

string result;
if (input.Length > 140)
{
    result = input.Substring(0, 140);
    // If the 141st character is a comma, the first 140 characters end on a
    // complete value, so the last value must not be dropped.
    if (input[140] != ',')
        result = result.Substring(0, result.LastIndexOf(','));
}
else result = input;

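If the total length is greater than 140, this simply takes the first 140 characters. It then searches for the last index of a comma and keeps all characters up to that comma.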
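The question also asks for the cut-off remainder to be kept. Below is a minimal sketch of that, building on the same LastIndexOf idea. The names SettlementSplitter and SplitAtLimit and the tuple return shape are assumptions made for illustration (they are not part of the answer above), and the sketch assumes a positive limit and C# 7 or later for tuples.

```csharp
using System;

public static class SettlementSplitter
{
    // Hypothetical helper: returns the part that fits within 'limit' characters
    // without breaking a value, plus everything that was left over.
    // Assumes limit > 0.
    public static (string Head, string Rest) SplitAtLimit(string input, int limit = 140)
    {
        if (string.IsNullOrEmpty(input) || input.Length <= limit)
            return (input, string.Empty);

        // A comma right after the limit means the cut is already clean;
        // otherwise fall back to the last comma inside the first 'limit' characters.
        int cut = input[limit] == ','
            ? limit
            : input.LastIndexOf(',', limit - 1);

        // No comma inside the limit: a single value is longer than the limit,
        // so nothing fits and the whole input becomes the remainder.
        if (cut < 0)
            return (string.Empty, input);

        string head = input.Substring(0, cut);              // without the trailing comma
        string rest = input.Substring(cut + 1).TrimStart(); // skip the comma and the space
        return (head, rest);
    }
}
```

With a small limit for readability, SettlementSplitter.SplitAtLimit("M1, M22, M333", 6) gives Head "M1" and Rest "M22, M333". The remainder can be passed through the same method again if it is still too long.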

The best way is to split the string into "words" and then reassemble them using a StringBuilder. Untested rough code would look like this;

public IEnumerable<string> SplitSettlementStrings(string settlementsString)
{
    var sb = new StringBuilder();
    foreach (var word in WordsFrom(settlementsString))
    {
        var extraFragment = $"{word}, ";
        if (sb.Length > 0 && sb.Length + extraFragment.Length >= 140)
        {
            // we'd overflow the 140 char limit, so return this fragment and continue;
            yield return sb.ToString();
            sb = new StringBuilder();
        }
        // the word always goes into the current (possibly fresh) builder
        sb.Append(extraFragment);
    }
    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        yield return sb.ToString();
    }
}

You'd need to split the words using something like this;

 // requires: using System.Linq;
 public IEnumerable<string> WordsFrom(string settlementsString)
 {
    // split on commas, then trim to remove whitespace;
    return settlementsString.Split(',').Select(x => x.Trim()).Where(x => x.Length > 0);
 }

You'd use the whole thing like this;

 var settlementStringsIn140CharLengths = SplitSettlementStrings("M234, M456, M452 ...").ToArray();

EDIT

The old-school .NET version would look like this;

public ICollection<string> SplitSettlementStrings(string settlementsString)
{
    List<string> results = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach (string word in WordsFrom(settlementsString))
    {
        string extraFragment = word + ", ";
        if (sb.Length > 0 && sb.Length + extraFragment.Length >= 140)
        {
            // we'd overflow the 140 char limit, so store this fragment and continue;
            results.Add(sb.ToString());
            sb = new StringBuilder();
        }
        // the word always goes into the current (possibly fresh) builder
        sb.Append(extraFragment);
    }
    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        results.Add(sb.ToString());
    }
    return results;
}

public ICollection<string> WordsFrom(string settlementsString)
{
    // split on commas, then trim to remove whitespace;
    string[] fragments = settlementsString.Split(',');
    List<string> result = new List<string>();
    foreach (string fragment in fragments)
    {
        string candidate = fragment.Trim();
        if (candidate.Length > 0)
        {
            result.Add(candidate);
        }
    }
    return result;
}

If you don't want to split the string into a list, I would do the following:

string myString = "M19, M42........";
string result = myString;
if (myString.Length > 140)
{
    int index = 140;
    // Walk back until the character just past the cut is a comma,
    // which means the cut falls at the end of an entry.
    while (index > 0 && myString[index] != ',')
        index--;
    // Make the result the substring of the new length
    result = myString.Substring(0, index);
}

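So you basically substring the original string to 140 characters and check whether the character at position 141 is a comma, which indicates a "clean" cut. If not, it substrings to 139, checks whether position 140 is a comma, and so on.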
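To make the walk-back easier to follow, here is a small sketch of the same loop with the limit as a parameter, so it can be tried on a short input. The names CommaCut and CutAtComma are hypothetical and not from the answer above; the sketch assumes a non-null input.

```csharp
using System;

public static class CommaCut
{
    // Hypothetical helper: same walk-back loop as above, with the limit as a parameter.
    public static string CutAtComma(string s, int limit)
    {
        if (s.Length <= limit)
            return s;

        int index = limit;
        // s[index] is the first character that would be dropped; a comma there
        // means the kept part ends exactly on a complete entry.
        while (index > 0 && s[index] != ',')
            index--;

        return s.Substring(0, index);
    }

    public static void Main()
    {
        Console.WriteLine(CutAtComma("M1, M22, M333", 6)); // prints "M1"
        Console.WriteLine(CutAtComma("M1, M22, M333", 7)); // prints "M1, M22"
    }
}
```

With a limit of 6 the cut would land inside M22, so the loop backs up to the comma at index 2. With a limit of 7 the character just past the cut is already a comma, so nothing is dropped.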

Something like this should work:

string test = "M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619";
if (test.Length > 140)
{
    if (test[140] != ',' && test[140] != ' ') // Last entry was split?
        test = test.Substring(0, test.LastIndexOf(',', 139)); // Take up to but not including the last ','
    else
        test = test.Substring(0, 140).TrimEnd(',', ' '); // Clean cut: only drop a trailing separator
}
Console.WriteLine(test);

My take, just for fun:

var ssplit = theString.Replace(", ", "#").Split('#');       
var sb = new StringBuilder();
for(int i = 0; i < ssplit.Length; i++)
{
    if(sb.Length + ssplit[i].Length > 138) // 140 minus the ", "
        break;
    if(sb.Length > 0) sb.Append(", ");
    sb.Append(ssplit[i]);
}

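Here I split the string into its Mxxx parts. Then I iterate over those parts until the next one would overflow 140 (or 138, since the ", " separator has to be included in the count).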
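The separator accounting can also be written out explicitly instead of folding it into the constant 138. The following is only a sketch under that assumption; the names PartJoiner and TakeFitting and the limit parameter are hypothetical and not part of the answer above.

```csharp
using System;
using System.Text;

public static class PartJoiner
{
    // Hypothetical helper: appends whole parts while the result stays within 'limit'.
    public static string TakeFitting(string input, int limit)
    {
        string[] parts = input.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            // The ", " separator is only needed from the second part onwards.
            int needed = part.Length + (sb.Length > 0 ? 2 : 0);
            if (sb.Length + needed > limit)
                break;
            if (sb.Length > 0) sb.Append(", ");
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

For example, PartJoiner.TakeFitting("M1, M22, M333", 7) returns "M1, M22", and with a limit of 6 it returns "M1".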

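This may not be the most performance-conscious solution, because it keeps allocating new strings, but the task does sound like some kind of one-off raw data input. We can simply keep removing "tokens" from the end of the input while it has more than 140 characters: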

const string separator = ", ";
while (input.Length > 140)
{
     int delStartIndex = input.LastIndexOf(separator);
     int delLength = input.Length - delStartIndex;
     input = input.Remove(delStartIndex, delLength);
}
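
A more performance-oriented approach would be to create some form of IEnumerable<string> or string[] for the substrings and calculate their total length before joining them. Roughly like this: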

const string separator = ", ";
var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
var length = splitInput[0].Length;
int targetIndex;
// Also stop at the end of the array, so short inputs don't run past the last element.
for (targetIndex = 1; length <= 140 && targetIndex < splitInput.Length; targetIndex++)
    length += separator.Length + splitInput[targetIndex].Length;
// The last value added pushed the total over the limit, so leave it out.
if (length > 140)
    targetIndex--;
var splitOutput = new string[targetIndex];
Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);
var output = string.Join(separator, splitOutput);

We could even make a nice extension method, like this:

public static class StringUtils
{
    public static string TrimToLength(this string input, string separator, int targetLength)
    {
        var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        var length = splitInput[0].Length;
        int targetIndex;
        // Also stop at the end of the array, so short inputs don't run past the last element.
        for (targetIndex = 1; length <= targetLength && targetIndex < splitInput.Length; targetIndex++)
            length += separator.Length + splitInput[targetIndex].Length;
        // The last value added pushed the total over the limit, so leave it out.
        if (length > targetLength)
            targetIndex--;
        var splitOutput = new string[targetIndex];
        Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);
        return string.Join(separator, splitOutput);
    }
}

And call it like this:

input.TrimToLength(", ", 140);

Or:

input.TrimToLength(separator: ", ", targetLength:140);

Here is a solution. It processes the string backwards, starting from the 141st character.

// requires: using System.Linq;
public static string Normalize(string input, int length)
{
    var terminators = new[] { ',', ' ' };
    if (input.Length <= length)
        return input;
    // input[length] is the character just past the limit (the 141st when length is 140).
    int i = length;
    while (i > 0 && !terminators.Contains(input[i]))
        i = i - 1;
    return input.Substring(0, i).TrimEnd(' ', ',');
}

Normalize(settlementsString, 140);

I use this:

static string FirstN(string s, int n = 140)
{
    if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
    while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
    return s.Substring(0, n);
}

Working, tested sample code (with the output shown in comments):

using System;
namespace ConsoleApplication1
{
    class Program
    {
        static string FirstN(string s, int n = 140)
        {
            if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
            while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
            return s.Substring(0, n);
        }
        static void Main(string[] args)
        {
            var s = FirstN("M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619");
            Console.WriteLine(s.Length); // 136
            Console.WriteLine(s);  //M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169,
        }
    }
}

I hope this helps.