Thursday, May 1, 2014

Java - checking for only one vertical bar | in a string, using regex - Stack Overflow


I have the same example as one that has been asked about before: XML containing MediaWiki markup such as the following:



" ...collected in the 12th century, of which [[Alexander the Great]] was the hero, and in which he was represented, somewhat like the British [[King Arthur|Arthur]]"



using this regexp:


Pattern p = Pattern.compile("\\[\\[([\\w | \\w]+)\\]\\]");

It is working fine and I get this output:


Alexander the Great
King Arthur|Arthur



The problem: if the text contains something like [[Alexander|the |Great]], with two or more vertical bars, it should not match, but it does.


So I changed my regex to match only one vertical bar, but it did not work:


Pattern p = Pattern.compile("\\[\\[([\\w |? \\w]+)\\]\\]");
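
A minimal sketch of why both attempts accept the two-pipe link: inside a character class, | and ? are ordinary literal characters, so [\w | \w] and [\w |? \w] both mean "a word character, a space, or a pipe" (plus a literal ? in the second), repeated any number of times. The class and method names here are illustrative and not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassPitfall {
    // The two patterns from the question, unchanged.
    static final Pattern FIRST = Pattern.compile("\\[\\[([\\w | \\w]+)\\]\\]");
    static final Pattern SECOND = Pattern.compile("\\[\\[([\\w |? \\w]+)\\]\\]");

    // Returns the captured link text, or null if the pattern finds nothing.
    static String firstCapture(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String twoPipes = "[[Alexander|the |Great]]";
        System.out.println(firstCapture(FIRST, twoPipes));  // prints Alexander|the |Great
        System.out.println(firstCapture(SECOND, twoPipes)); // prints Alexander|the |Great
    }
}
```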



To find expressions inside [[ and ]] that contain alphanumeric characters, spaces, and exactly one pipe, you can use the following regex:


\[\[[\w ]+[\|]{1}[\w ]+\]\]

However, this only covers cases where the pipe is neither the first nor the last character; judging from your question, those cases shouldn't occur.
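
As a rough check of that behaviour, here is a small sketch that runs the same regex from Java against a few inputs. The class name, method name, and the leading-pipe and trailing-pipe test strings are made up for illustration; [\|]{1} is written as a plain \\| because the two are equivalent.

```java
import java.util.regex.Pattern;

public class ExactlyOnePipe {
    // The regex above, escaped for a Java string literal.
    static final Pattern LINK = Pattern.compile("\\[\\[[\\w ]+\\|[\\w ]+\\]\\]");

    // True only if the whole string is a [[...]] link with exactly one pipe
    // that has at least one character on each side.
    static boolean isSinglePipeLink(String s) {
        return LINK.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSinglePipeLink("[[King Arthur|Arthur]]"));   // true
        System.out.println(isSinglePipeLink("[[Alexander the Great]]"));  // false: no pipe
        System.out.println(isSinglePipeLink("[[Alexander|the |Great]]")); // false: two pipes
        System.out.println(isSinglePipeLink("[[|Arthur]]"));              // false: pipe is first
        System.out.println(isSinglePipeLink("[[King Arthur|]]"));         // false: pipe is last
    }
}
```

Note that a link without any pipe, such as [[Alexander the Great]], is rejected by this pattern.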




You can use this:


Pattern p = Pattern.compile("\\[\\[([\\w ]+\\|?[\\w ]*)\\]\\]");

or, as in comments from @fge:


Pattern p = Pattern.compile("\\[\\[([\\w ]+(?:\\|[\\w ]+)?)\\]\\]");
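
For reference, a short sketch of the first of these two patterns (the one with \\|?) applied to the sample text from the question, with a two-pipe link appended to show that it is skipped. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AtMostOnePipe {
    // Link text with zero or one pipe; the pipe sits outside the character classes.
    static final Pattern LINK = Pattern.compile("\\[\\[([\\w ]+\\|?[\\w ]*)\\]\\]");

    // Collects the captured text of every matching [[...]] link.
    static List<String> findLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "of which [[Alexander the Great]] was the hero, somewhat like "
                + "the British [[King Arthur|Arthur]], unlike [[Alexander|the |Great]]";
        System.out.println(findLinks(text)); // prints [Alexander the Great, King Arthur|Arthur]
    }
}
```

Because the part after the pipe is [\w ]*, this pattern also accepts a trailing pipe such as [[King Arthur|]]; requiring at least one character after the pipe excludes that case.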

