Wednesday, 16 April 2014

C# filter String with Regex - Stack Overflow


I'm not familiar with regular expressions, but I think a regex could help me a lot with my problem.


I have two kinds of strings in a big List<string> str (with or without a description):


str[0] = "[toto]";
str[1] = "[toto] descriptionToto";
str[2] = "[titi]";
str[3] = "[titi] descriptionTiti";
str[4] = "[tata]";
str[5] = "[tata] descriptionTata";

The list isn't really ordered. I want to go through the whole list and format each entry depending on what I find in it.


If I find "[toto]", I would like to set str[0] = "toto".


And if I find "[toto] descriptionToto", I would like to set str[1] = "descriptionToto".


Do you have any idea what the best way to get this result is?




There are two regex options if you ask me:



  1. Make a regex pattern with two capturing groups, then use group 1 or group 2 depending on whether group 1 is empty. In that case, use named capturing groups so the relationship between the pattern and the code stays clear.


  2. Make a regex that matches either string type 1 or string type 2, in which case the match itself gives you the end result.
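Option 1 could look roughly like the minimal sketch below (C# top-level statements, so C# 9 or later). The pattern, the group names `name` and `desc`, and the helper `ExtractNamed` are illustrative choices, not part of the original answer. The sketch assumes every entry has the form [name], optionally followed by whitespace and a description.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative pattern: "name" is the text inside the brackets,
// "desc" is an optional description after the closing bracket.
var namedPattern = new Regex(@"^\[(?<name>[^\]]*)\](?:\s+(?<desc>.+))?$");

// Hypothetical helper: returns the description when there is one,
// otherwise the bracketed name.
string ExtractNamed(string input)
{
    Match m = namedPattern.Match(input);
    if (!m.Success)
        return input; // assumption: entries in any other format are left unchanged

    Group desc = m.Groups["desc"];
    return desc.Success ? desc.Value : m.Groups["name"].Value;
}

Console.WriteLine(ExtractNamed("[toto]"));                 // toto
Console.WriteLine(ExtractNamed("[toto] descriptionToto")); // descriptionToto
```

Checking `Group.Success` on the named group is clearer than testing for an empty string, because an optional group that did not take part in the match reports `Success == false`.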



If you're going for speed, plain string methods such as str[0].IndexOf(']') would get most of the job done without any regex.
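The IndexOf idea might be sketched like this (C# 9+ top-level statements). The helper name `ExtractWithIndexOf` is hypothetical, and the sketch assumes entries start with '[' and that the name itself contains no ']'.

```csharp
using System;

// Hypothetical helper: locate the first ']' and look at what follows it.
string ExtractWithIndexOf(string s)
{
    int close = s.IndexOf(']');
    if (close < 0 || s[0] != '[')
        return s; // assumption: entries not shaped like "[name]..." are left unchanged

    string rest = s.Substring(close + 1).Trim();
    // Nothing after the ']' means there is no description: return the bracketed name.
    return rest.Length > 0 ? rest : s.Substring(1, close - 1);
}

Console.WriteLine(ExtractWithIndexOf("[tata]"));                 // tata
Console.WriteLine(ExtractWithIndexOf("[tata] descriptionTata")); // descriptionTata
```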




Rather than a regex, I'd be inclined to just use string.Split, something along the lines of:


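// "[toto]"                 -> { "", "toto", "" }
// "[toto] descriptionToto" -> { "", "toto", " descriptionToto" }
string[] tokens = str[0].Split(new char[] { '[', ']' });
if (tokens[2].Trim() == "") {
    str[0] = tokens[1];            // no description: keep the name
} else {
    str[0] = tokens[2].Trim();     // drop the space before the description
}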
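Applied to the whole list rather than just str[0], the Split approach could look like this sketch (C# 9+ top-level statements). The sample entries come from the question. The sketch assumes descriptions contain no '[' or ']', because those characters would produce extra tokens.

```csharp
using System;
using System.Collections.Generic;

var str = new List<string> { "[toto]", "[toto] descriptionToto", "[titi]", "[titi] descriptionTiti" };

// Apply the Split idea to every entry of the list, overwriting it in place.
for (int i = 0; i < str.Count; i++)
{
    string[] tokens = str[i].Split(new[] { '[', ']' });
    if (tokens.Length < 3)
        continue; // assumption: entries without a "[...]" part are skipped

    string description = tokens[2].Trim();
    str[i] = description.Length == 0 ? tokens[1] : description;
}

Console.WriteLine(string.Join(", ", str)); // toto, descriptionToto, titi, descriptionTiti
```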



If you only want to extract the description from the entries that have one:


you can split each string at the space character ' ' and keep the second element of the resulting array, which is the description. If there is no description, there is no space, so the array has only one element. So loop over the list and split each entry with Split(' '). An entry with a description is then split into two elements, as long as the description itself contains no spaces:


for (int i = 0; i < str.Count; i++)   // str is a List<string>, so use Count
{
    string[] words = str[i].Split(' ');
    if (words.Length > 1)
    {
        str[i] = words[1];            // the description
    }
}
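One caveat with that loop: Split(' ') breaks a multi-word description into several pieces, and words[1] keeps only the first word. The sketch below uses the String.Split(char[], int) overload to cap the result at two parts (C# 9+ top-level statements). The multi-word sample entry is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entries; the second one has a multi-word description.
var entries = new List<string> { "[toto]", "[toto] description of toto" };

for (int i = 0; i < entries.Count; i++)
{
    // Split into at most two parts: the "[name]" prefix and everything after the first space.
    string[] parts = entries[i].Split(new[] { ' ' }, 2);
    if (parts.Length > 1)
        entries[i] = parts[1]; // the whole description, spaces included
}

Console.WriteLine(entries[0]); // [toto]
Console.WriteLine(entries[1]); // description of toto
```

Like the original loop, this only handles the "with description" case. Entries without a space are left as they are.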



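If those are actual string values and not literal variable notation, this should work.
The replacement just concatenates capture groups 1 and 2. Only one of the two groups takes part in any given match, so the result is either the bracketed name or the description.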


Find: ^\s*(?:\[([^\[\]]*)\]\s*|\[[^\[\]]*\]\s*((?:\s*\S)+\s*))$
Replace: "$1$2"


 ^
 \s*
 (?:
      \[
      ( [^\[\]]* )          # (1)
      \] \s*
   |
      \[ [^\[\]]* \]
      \s*
      (                     # (2 start)
           (?: \s* \S )+
           \s*
      )                     # (2 end)
 )
 $

.NET test case:


using System;
using System.Text.RegularExpressions;

string str1 = "[titi]";
Console.WriteLine(Regex.Replace(str1, @"^\s*(?:\[([^\[\]]*)\]\s*|\[[^\[\]]*\]\s*((?:\s*\S)+\s*))$", @"$1$2"));
string str2 = "[titi] descriptionTiti";
Console.WriteLine(Regex.Replace(str2, @"^\s*(?:\[([^\[\]]*)\]\s*|\[[^\[\]]*\]\s*((?:\s*\S)+\s*))$", @"$1$2"));

Output >>


titi
descriptionTiti
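The expanded, commented layout of the pattern can also be passed to .NET as-is with RegexOptions.IgnorePatternWhitespace. In that mode, unescaped whitespace in the pattern is ignored and '#' starts a comment. The following is a sketch (C# 9+ top-level statements). The helper name `ReplaceExpanded` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// The expanded pattern, written for RegexOptions.IgnorePatternWhitespace:
// unescaped whitespace is ignored and '#' starts a comment that runs to the end of the line.
const string expandedPattern = @"
    ^
    \s*
    (?:
        \[
        ( [^\[\]]* )          # (1) name only
        \] \s*
      |
        \[ [^\[\]]* \]
        \s*
        (                     # (2) description
            (?: \s* \S )+
            \s*
        )
    )
    $";

// Hypothetical helper applying the same "$1$2" replacement as above.
string ReplaceExpanded(string input) =>
    Regex.Replace(input, expandedPattern, "$1$2", RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(ReplaceExpanded("[titi]"));                 // titi
Console.WriteLine(ReplaceExpanded("[titi] descriptionTiti")); // descriptionTiti
```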



You can use a single regex:


string s = Regex.Match(str[0], @"(?<=\[)[^\]]*(?=]$)|(?<=] ).*").Value;

The idea is simple: if the text ends with ] and contains no other ], take everything between [ and ]; otherwise, take everything after the first "] " (a closing bracket followed by a space).


Sample code:


using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

List<string> strList = new List<string> {
    "[toto]",
    "[toto] descriptionToto",
    "[titi]",
    "[titi] descriptionTiti",
    "[tata]",
    "[tata] descriptionTata" };
foreach (string str in strList)
{
    Console.WriteLine(Regex.Match(str, @"(?<=\[)[^\]]*(?=]$)|(?<=] ).*").Value);
}

Sample output:


toto
descriptionToto
titi
descriptionTiti
tata
descriptionTata
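The sample above only prints the results, but the question asks to overwrite the list entries. A sketch that writes back in place is shown below (C# 9+ top-level statements). It reuses the same pattern through one Regex instance. Match.Value is an empty string when nothing matches, so the sketch checks Match.Success before overwriting. Keeping non-matching entries unchanged is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var strList = new List<string> { "[toto]", "[toto] descriptionToto", "[tata]", "[tata] descriptionTata" };

// Same pattern as above, created once and reused for every entry.
var extractor = new Regex(@"(?<=\[)[^\]]*(?=]$)|(?<=] ).*");

// Overwrite each entry in place, which is what the question asks for.
for (int i = 0; i < strList.Count; i++)
{
    Match m = extractor.Match(strList[i]);
    if (m.Success)
        strList[i] = m.Value; // assumption: entries that don't match are kept unchanged
}

Console.WriteLine(string.Join(", ", strList)); // toto, descriptionToto, tata, descriptionTata
```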

