I am working with the same example that has been asked about before: XML containing MediaWiki markup like the following:
" ...collected in the 12th century, of which [[Alexander the Great]] was the hero, and in which he was represented, somewhat like the British [[King Arthur|Arthur]]"
using this regexp:
Pattern p = Pattern.compile("\\[\\[([\\w | \\w]+)\\]\\]");
It is working fine and I get this output:
Alexander the Great
King Arthur|Arthur
The problem: if I have text like [[Alexander|the |Great]] with two or more vertical bars, it should not match, but it does.
So I changed my regex to allow only one vertical bar, but that did not work:
Pattern p = Pattern.compile("\\[\\[([\\w |? \\w]+)\\]\\]");
To find expressions inside [[ and ]] that contain alphanumeric characters, spaces and exactly one pipe, you can use the following regex:
\[\[[\w ]+[\|]{1}[\w ]+\]\]
This, however, only covers cases in which the pipe isn't the first or the last character, but judging from your question, that situation shouldn't occur.
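As a rough sketch of how the exactly-one-pipe regex might be used from Java: the class name PipedLinkDemo and the method findPipedLinks are made up for illustration, a capture group has been added around the link text, and `[\|]{1}` is written as the equivalent `\|`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answers.
public class PipedLinkDemo {
    // Exactly one pipe, with at least one word character or space on each side.
    // The parentheses (an addition) capture the text between [[ and ]].
    private static final Pattern PIPED_LINK =
            Pattern.compile("\\[\\[([\\w ]+\\|[\\w ]+)\\]\\]");

    // Returns the inner text of every [[...]] link that contains exactly one pipe.
    static List<String> findPipedLinks(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PIPED_LINK.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "[[Alexander the Great]] and [[King Arthur|Arthur]] "
                + "but not [[Alexander|the |Great]]";
        System.out.println(findPipedLinks(text));
    }
}
```

With this pattern, links without any pipe, such as [[Alexander the Great]], are skipped as well, because the pipe is mandatory.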
You can use this:
Pattern p = Pattern.compile("\\[\\[([\\w ]+\\|?[\\w ]*)\\]\\]");
or, as in comments from @fge:
Pattern p = Pattern.compile("\\[\\[([\\w ]+(?:\\|[\\w ]+)?)\\]\\]");
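To check the behaviour against the sample text from the question, something like the following could be run. It is only a sketch: the class name WikiLinkDemo and the method findLinks are invented for the example, and the pattern is the optional-group variant with a `+` on the second character class so that more than one character may follow the pipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answers.
public class WikiLinkDemo {
    // Link text, optionally followed by ONE pipe and more link text.
    // (?:...)? makes the whole "|label" part optional without capturing it separately.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([\\w ]+(?:\\|[\\w ]+)?)\\]\\]");

    // Returns the inner text of every [[...]] link with zero or one pipe.
    static List<String> findLinks(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "of which [[Alexander the Great]] was the hero, "
                + "somewhat like the British [[King Arthur|Arthur]], "
                + "and a bad link [[Alexander|the |Great]]";
        for (String link : findLinks(text)) {
            System.out.println(link);
        }
    }
}
```

The original attempts failed because inside a character class such as `[\w |? \w]`, the `|` and `?` are ordinary literal characters, so the class simply allows any number of pipes. Moving the pipe outside the class is what lets the pattern limit it to one occurrence.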