Tuesday, 29 April 2014

c# - Case-insensitive 'Contains' - Stack Overflow


Is there a way to make the following return true?


string title = "ASTRINGTOTEST";
title.Contains("string");

There doesn't seem to be an overload that lets me set the case sensitivity. Currently I UPPERCASE them both, but that's just silly.


UPDATE
The silliness I refer to is the i18n issues that come with upper- and lowercasing.




To test whether the string paragraph contains the string word (thanks @QuarterMeister):


culture.CompareInfo.IndexOf(paragraph, word, CompareOptions.IgnoreCase) >= 0

Where culture is the instance of CultureInfo describing the language that the text is written in.


This solution is transparent about the definition of case-insensitivity, which is language dependent. For example, the English language uses the characters I and i for the upper and lower case versions of the ninth letter, whereas the Turkish language uses these characters for the eleventh and twelfth letters of its 29-letter alphabet. The Turkish upper case version of 'i' is the unfamiliar character 'İ'.


Thus the strings tin and TIN are the same word in English, but different words in Turkish. As I understand it, one means 'spirit' and the other is an onomatopoeic word. (Turks, please correct me if I'm wrong, or suggest a better example.)


To summarise, you can only answer the question 'are these two strings the same but in different cases' if you know what language the text is in. If you don't know, you'll have to take a punt. Given English's hegemony in software, you should probably resort to CultureInfo.InvariantCulture, because it'll be wrong in familiar ways.
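
As a minimal sketch of the point above, the helper below wraps CompareInfo.IndexOf so that the caller has to name the culture explicitly. The class and method names (CultureContains, ContainsIgnoreCase) are hypothetical, not part of the framework. The result for Turkish text depends on the globalization data available on the machine, so no output is claimed for it here.

```csharp
using System;
using System.Globalization;

public static class CultureContains
{
    // Hypothetical helper: true when 'word' occurs in 'paragraph',
    // ignoring case according to the rules of 'culture'.
    public static bool ContainsIgnoreCase(string paragraph, string word, CultureInfo culture)
    {
        if (paragraph == null) throw new ArgumentNullException(nameof(paragraph));
        if (word == null) throw new ArgumentNullException(nameof(word));
        if (culture == null) throw new ArgumentNullException(nameof(culture));

        // IndexOf returns -1 when there is no match, so any index >= 0 is a hit.
        return culture.CompareInfo.IndexOf(paragraph, word, CompareOptions.IgnoreCase) >= 0;
    }
}
```

A call might look like CultureContains.ContainsIgnoreCase("ASTRINGTOTEST", "string", CultureInfo.InvariantCulture); passing new CultureInfo("tr-TR") instead would apply the Turkish casing rules described above.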




You could use IndexOf() and pass StringComparison.OrdinalIgnoreCase:


string title = "STRING";
bool contains = title.IndexOf("string", StringComparison.OrdinalIgnoreCase) >= 0;

Even better is defining a new extension method for string:


public static bool Contains(this string source, string toCheck, StringComparison comp)
{
    return source.IndexOf(toCheck, comp) >= 0;
}

string title = "STRING";
bool contains = title.Contains("string", StringComparison.OrdinalIgnoreCase);
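
On recent versions of .NET (.NET Core and .NET 5 or later, but not .NET Framework) the extension may be unnecessary, because string already has a Contains(string, StringComparison) overload. A small sketch, assuming such a runtime; the class name BuiltInContainsDemo and the method TitleContains are made up for illustration.

```csharp
using System;

public static class BuiltInContainsDemo
{
    // Assumes a runtime where string.Contains(string, StringComparison) is built in.
    // On .NET Framework this call would bind to a user-defined extension method instead.
    public static bool TitleContains(string title, string value)
    {
        return title.Contains(value, StringComparison.OrdinalIgnoreCase);
    }
}
```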



You can use IndexOf() like this:


string title = "STRING";

if (title.IndexOf("string", 0, StringComparison.CurrentCultureIgnoreCase) != -1)
{
    // The string exists in the original
}

Since 0 (zero) can be an index, you check against -1.




Alternate solution using Regex:


bool contains = Regex.Match("StRiNG to search", "string", RegexOptions.IgnoreCase).Success;
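
One caveat with the regex approach: the search term is interpreted as a pattern, so text such as "1.5" or "(x)" is not matched literally. A hedged sketch that escapes the term first; the RegexContains class and its method name are illustrative, and RegexOptions.CultureInvariant is added on the assumption that culture-independent case folding is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexContains
{
    // Hypothetical helper: literal, case-insensitive substring test via Regex.
    public static bool ContainsIgnoreCase(string input, string term)
    {
        // Regex.Escape turns metacharacters such as '.' or '(' into literals.
        string pattern = Regex.Escape(term);
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

Without the escaping step, a term like "1.5" would also match "1x5", because '.' matches any character.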



One issue with the extension-method answer is that it throws an exception if either string is null. You can add a check so it won't:


public static bool Contains(this string source, string toCheck, StringComparison comp)
{
    // A null or empty source can only "contain" a null or empty value.
    if (string.IsNullOrEmpty(source))
        return string.IsNullOrEmpty(toCheck);
    // Every string contains the empty string.
    if (string.IsNullOrEmpty(toCheck))
        return true;

    return source.IndexOf(toCheck, comp) >= 0;
}



This is clean and simple.


Regex.IsMatch(file, fileNamestr, RegexOptions.IgnoreCase)



You could always just upper- or lowercase the strings first.


string title = "string";
title.ToUpper().Contains("STRING") // returns true

Oops, just saw that last bit. A case-insensitive compare would *probably* do the same anyway, and if performance is not an issue, I don't see a problem with creating uppercase copies and comparing those. I could have sworn that I once saw a case-insensitive compare...




A StringExtensions class is the way forward. I've combined a couple of the posts above to give a complete code example:


using System;

public static class StringExtensions
{
    /// <summary>
    /// Allows case insensitive checks
    /// </summary>
    public static bool Contains(this string source, string toCheck, StringComparison comp)
    {
        return source.IndexOf(toCheck, comp) >= 0;
    }
}



I know this is not C#, but the framework (via VB.NET) already has such a function:


Dim str As String = "UPPERlower"
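' CompareMethod.Text makes the search case-insensitive; InStr returns 0 when nothing is found
Dim b As Boolean = InStr(str, "UpperLower", CompareMethod.Text) > 0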

C# variant:


string myString = "Hello World";
bool contains = Microsoft.VisualBasic.Strings.InStr(myString, "world", Microsoft.VisualBasic.CompareMethod.Text) > 0;



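Use this (note that string.Compare tests the two strings for equality, returning 0 when they match, rather than testing containment):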


string.Compare("string", "STRING", new System.Globalization.CultureInfo("en-US"), System.Globalization.CompareOptions.IgnoreCase);



The InStr method from the VisualBasic assembly is the best if you have a concern about internationalization (or you could reimplement it). Looking at it in dotPeek shows that it accounts not only for upper and lower case, but also for kana type and full- vs. half-width characters (mostly relevant for Asian languages, although there are full-width versions of the Roman alphabet too). I'm skipping over some details, but check out the private method InternalInStrText:


private static int InternalInStrText(int lStartPos, string sSrc, string sFind)
{
    int num = sSrc == null ? 0 : sSrc.Length;
    if (lStartPos > num || num == 0)
        return -1;
    if (sFind == null || sFind.Length == 0)
        return lStartPos;
    else
        return Utils.GetCultureInfo().CompareInfo.IndexOf(sSrc, sFind, lStartPos, CompareOptions.IgnoreCase | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth);
}
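
If a dependency on Microsoft.VisualBasic is unwelcome, the same comparison options can be passed to CompareInfo.IndexOf directly. The sketch below is a rough approximation, not the framework's code: WidthInsensitive and its Contains method are hypothetical names, and it uses CultureInfo.CurrentCulture where the decompiled method calls Utils.GetCultureInfo().

```csharp
using System;
using System.Globalization;

public static class WidthInsensitive
{
    // The same flags that the decompiled InternalInStrText passes.
    private const CompareOptions Options =
        CompareOptions.IgnoreCase | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth;

    // Hypothetical helper mirroring the null/empty handling shown above:
    // an empty source never matches, an empty search term always does.
    public static bool Contains(string source, string find)
    {
        if (string.IsNullOrEmpty(source)) return false;
        if (string.IsNullOrEmpty(find)) return true;

        return CultureInfo.CurrentCulture.CompareInfo.IndexOf(source, find, Options) >= 0;
    }
}
```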



Using Regex is a straightforward way to do this:


Regex.IsMatch(title, "string", RegexOptions.IgnoreCase);

