Trimming whitespace characters in a string

Keywords: whitespace, characters, string, trim | Updated: 2023-09-27 18:26:26

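I have a string with an unknown combination of whitespace characters (\t, \n, or spaces) between the words. For example:

string str = "Hello \t\t  \n \t    \t World! \tPlease Help.";

I want to replace each internal sequence of whitespace characters with a single space:

string str = "Hello World! Please Help.";

Does .NET provide a built-in way to do this? If not, how can I do it in C#?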


using System.Text.RegularExpressions;
newString = Regex.Replace(oldString, @"\s+", " ");

Try the following regex replacement:

string original = ...;
string replaced = Regex.Replace(original, @"\s+", " ");

This replaces every run of whitespace characters (\s) with a single space. You can find other useful character classes here:

  • http://msdn.microsoft.com/en-us/library/4edbef7e(v=vs.71).aspx
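One detail worth noting: `\s+` also turns a leading or trailing run into a single space, while the question only asks about internal sequences. A minimal sketch, assuming you also want the ends trimmed, is to follow the replacement with `Trim()`. The class and method names (`WhitespaceDemo`, `CollapseWhitespace`) are hypothetical, chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Hypothetical helper: collapses every run of whitespace (\s+) to one
    // space, then trims the ends so a leading or trailing run does not
    // leave a stray space behind.
    public static string CollapseWhitespace(string input)
    {
        return Regex.Replace(input, @"\s+", " ").Trim();
    }
}
```

For example, `WhitespaceDemo.CollapseWhitespace("  Hello \t\n World!  ")` returns `"Hello World!"`.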

string trimmed = Regex.Replace(original, @"\s+", " ");

Reference: http://www.dotnetperls.com/regex-replace-spaces

There is no built-in method for this, but you can use a regular expression:

string result = Regex.Replace(str, @"\s+", " ");
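If the replacement runs many times (for example inside a loop over many lines), one common option is to build the `Regex` once and reuse it. The sketch below assumes that usage pattern; `WhitespaceCollapser` is a hypothetical name. Whether `RegexOptions.Compiled` actually pays off depends on call volume, so it is worth measuring before relying on it.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCollapser
{
    // Built once and shared by every call. RegexOptions.Compiled trades a
    // one-time construction cost for faster matching on repeated use.
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Replaces each run of whitespace with a single space. Leading and
    // trailing runs also become one space; call Trim() if that matters.
    public static string Collapse(string input)
    {
        return WhitespaceRun.Replace(input, " ");
    }
}
```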

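I use a slightly different approach. It is a bit more verbose (currently in VB), but it lets me easily build various exclusions, such as symbols, punctuation, or combinations of Unicode categories. It also saved me from having to learn regular expressions.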

Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq
Imports System.Runtime.CompilerServices
Imports System.Text

Public Module StringExclusions
    <Extension()> Public Function CharsToString(ByVal val As IEnumerable(Of Char)) As String
        Dim bldr As New StringBuilder()
        bldr.Append(val.ToArray())
        Return bldr.ToString()
    End Function

    ' Keeps only the characters whose Unicode category is not in the exclusion list.
    <Extension()> Public Function RemoveCategories(ByVal val As String, ByVal categories As IEnumerable(Of UnicodeCategory)) As String
        Return (From chr As Char In val.ToCharArray() Where Not categories.Contains(Char.GetUnicodeCategory(chr))).CharsToString()
    End Function

    ' Space separators, the line separator, and control characters (tab, CR, LF, ...).
    Public Function WhiteSpaceCategories() As IEnumerable(Of UnicodeCategory)
        Return New List(Of UnicodeCategory) From {UnicodeCategory.SpaceSeparator, UnicodeCategory.LineSeparator, UnicodeCategory.Control}
    End Function

    '...Other commonly used categories removed for brevity.
End Module

And some tests:

    [TestMethod]
    public void RemoveCharacters()
    {
        String testObj = "a \a b \b c \f d \n e \r f \t g \v h";
        Assert.AreEqual(@"abcdefgh", testObj.RemoveCategories(StringExclusions.WhiteSpaceCategories()));
    }

    [TestMethod]
    public void KeepValidCharacters()
    {
        String testObj = @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`1234567890-=~!@#$%^&*()_+[]\{}|;':,./<>?" + "\"";
        Assert.AreEqual(@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`1234567890-=~!@#$%^&*()_+[]\{}|;':,./<>?" + "\"", testObj.RemoveCategories(StringExclusions.WhiteSpaceCategories()));
    }
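For readers working in C#, the category-based approach above can be sketched roughly as follows. This is a hypothetical port, not the answerer's code: the class name `CategoryFilter` is invented, and the category list mirrors the VB `WhiteSpaceCategories()`. Like the VB version, it removes the matching characters entirely rather than collapsing them to one space, so on its own it does not produce the output the question asks for.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CategoryFilter
{
    // Same default set as the VB WhiteSpaceCategories(): space separators,
    // the line separator, and control characters (\t, \n, \r, \v, \f, ...).
    public static readonly UnicodeCategory[] WhiteSpaceCategories =
    {
        UnicodeCategory.SpaceSeparator,
        UnicodeCategory.LineSeparator,
        UnicodeCategory.Control
    };

    // Keeps only the characters whose Unicode category is not excluded.
    // A HashSet makes each category lookup O(1).
    public static string RemoveCategories(this string value, IEnumerable<UnicodeCategory> categories)
    {
        var excluded = new HashSet<UnicodeCategory>(categories);
        return new string(value.Where(c => !excluded.Contains(char.GetUnicodeCategory(c))).ToArray());
    }
}
```

For example, `"Hello \t World!".RemoveCategories(CategoryFilter.WhiteSpaceCategories)` returns `"HelloWorld!"`.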

You can try a faster alternative that does not use Regex:

string replaced = String.Join(" ", str.Split(
   new char[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries));
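The explicit separator list above covers only space, tab, CR, and LF. `String.Split(char[], StringSplitOptions)` is documented to treat all whitespace (as defined by `char.IsWhiteSpace`) as delimiters when the separator array is null or empty, so a variant like the sketch below also picks up `\v`, `\f`, and Unicode spaces without listing them. `SplitJoinWhitespace` is a hypothetical name; the cast on `null` is needed because `Split` also has a `string[]` overload.

```csharp
using System;

public static class SplitJoinWhitespace
{
    // A null separator makes Split use char.IsWhiteSpace to find delimiters;
    // RemoveEmptyEntries drops the empty pieces between consecutive
    // whitespace characters, and Join puts back exactly one space.
    public static string Collapse(string input)
    {
        return string.Join(" ", input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

As with the original one-liner, leading and trailing whitespace disappears because those runs only produce empty entries.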

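This is the fastest general-purpose way I know of to do this, and because it uses char.IsWhiteSpace it also handles line terminators and tabs. The power of Regex is not really needed for this problem, and using it only costs performance.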

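using System.Linq;

// Map every whitespace character to ' ', split on ' ' dropping empty
// entries, then join the remaining words with single spaces.
string result = String.Join(" ",
    new string(stringToRemoveWhiteSpaces
        .Select(c => char.IsWhiteSpace(c) ? ' ' : c)
        .ToArray())
    .Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));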
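Taking the performance argument one step further, a single pass with a `StringBuilder` avoids the intermediate character array and the array of split pieces altogether. This is a different technique from the answer's LINQ/Split code, sketched here under the assumption that `char.IsWhiteSpace` is the desired definition of whitespace; `WhitespaceLoop` is a hypothetical name, and it has not been benchmarked against the other versions, so measure before choosing.

```csharp
using System.Text;

public static class WhitespaceLoop
{
    // One pass over the input: copy non-whitespace characters, and emit a
    // single space for each whitespace run that sits between two words.
    public static string Collapse(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool pendingSpace = false;
        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                // Remember the gap only if a word has already been written,
                // so leading whitespace is dropped.
                pendingSpace = sb.Length > 0;
            }
            else
            {
                // The gap is written only when the next word starts,
                // so trailing whitespace is dropped too.
                if (pendingSpace) sb.Append(' ');
                pendingSpace = false;
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```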