I need to match specific words in a text, but not if they're inside parentheses (with or without other words within).

For example, in the text

La première édition du code civil français est l'aboutissement d'une double réflexion menée afin d'améliorer les échanges juridiques ... (édition du code civil français) est ...

I need to match the first "code civil" words, but not the last ones (inside parentheses).

I'm using the following regular expression

/[^(]code (civil|pénal|de procédure civile|de procédure pénale)[^)]/g

The issue is that, in the example above, the regex also matches the spaces before and after "code civil" (i.e., " code civil ").

How can I remove those spaces?

Thanks in advance.

What I have tried:

/[^(]code (civil|pénal|de procédure civile|de procédure pénale)[^)]/g
Updated 31-Jan-23 2:28am

I was able to get a match using:
(?![^(]*\))\s*(code (?:civil|pénal|de procédure civile|de procédure pénale))\s*

To break this down:

(?![^(]*\))

This is a negative lookahead. It rejects any position from which a ) can be reached without first passing a (, which is the case for text inside a closed pair of parentheses. Note that a term placed after an opening parenthesis that is never closed will still match.

\s*

This matches zero or more whitespace characters. Because it sits outside the capture group, the whitespace around the words is consumed by the overall match but is not part of the group.
(code (?:civil|pénal|de procédure civile|de procédure pénale))

This captures the term itself in a capture group. You'd need to adjust your code to read the first group rather than the whole match. The (?: ... ) part is a non-capturing group: it groups the alternatives without creating a separate numbered group.

Someone else might have a better way of capturing this, but I tested this at regex101 and it seemed to work!
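A minimal sketch of reading the first group in JavaScript, assuming the pattern above is used with the global flag and String.prototype.matchAll. The variable names and the shortened sample text are illustrative, not taken from the answer.

```javascript
// Pattern from the answer above; the g flag is added here so matchAll can be used.
const groupPattern = /(?![^(]*\))\s*(code (?:civil|pénal|de procédure civile|de procédure pénale))\s*/g;

// Shortened version of the sample text from the question.
const sampleText = "La première édition du code civil français est ... (édition du code civil français) est ...";

const allMatches = [...sampleText.matchAll(groupPattern)];

// match[0] is the whole match, including the surrounding whitespace.
const wholeMatches = allMatches.map((match) => match[0]);

// match[1] is the first capture group: the term without the whitespace.
const terms = allMatches.map((match) => match[1]);

console.log(wholeMatches); // [" code civil "]
console.log(terms);        // ["code civil"]
```

Only the occurrence outside the parentheses is returned, and the group value carries no surrounding spaces.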
LB2371 31-Jan-23 8:39am    
It doesn't work:
Chris Copeland 31-Jan-23 8:44am    
Sorry but I don't seem to understand what's not working, from reading the initial question you wanted to match the "code civil" in the sentence but not the one in the parenthesis, which is what the result of that regex seems to be doing? What part of it is not working?
LB2371 31-Jan-23 8:49am    
Your RE - as well as mine - matches the spaces before and after "code civil" (i.e., " code civil "). I need the match to exclude those spaces.
Chris Copeland 31-Jan-23 8:53am    
I did say in my answer you'd need to adjust your code to pick the first group matched, not the whole text. Here's an example JSFiddle showing how to do that. The first [0] value will always be the whole match but the subsequent values [1+] will be the groups.
LB2371 31-Jan-23 11:27am    
The problem is that [^(] means literally "match any single character that isn't an open parenthesis" - so it's not just spaces that are a problem - any character before "code" other than "(" will be matched.
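A small sketch of that behaviour, using the regex from the question on a few made-up strings (the sample strings and variable names are assumptions for illustration). It suggests that the two character classes only inspect the immediate neighbours of the term, and that each of them consumes one character.

```javascript
// The regex from the question.
const questionPattern = /[^(]code (civil|pénal|de procédure civile|de procédure pénale)[^)]/g;

// The neighbouring characters become part of the match.
const inSentence = "du code civil français".match(questionPattern);
console.log(inSentence); // [" code civil "]

// At the very start of the string there is no character for [^(] to consume,
// so the term is not found at all.
const atStart = "code civil français".match(questionPattern);
console.log(atStart); // null

// Inside parentheses the term still matches, because its immediate
// neighbours are spaces rather than "(" and ")".
const insideParens = "(édition du code civil français)".match(questionPattern);
console.log(insideParens); // [" code civil "]
```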

Personally, I wouldn't use a single regex to do this. I'd use one regex to remove the parentheses and everything within them, and then process the resulting string to check for the content you did want.
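The two-step approach could be sketched in JavaScript roughly as follows. The removal pattern here is an assumption (it handles only non-nested parentheses) and is not necessarily the expression the answer had in mind; the variable names are illustrative.

```javascript
// Shortened version of the sample text from the question.
const inputText = "La première édition du code civil français est ... (édition du code civil français) est ...";

// Step 1: drop every parenthesized span. Assumes parentheses are not nested.
const withoutParens = inputText.replace(/\([^()]*\)/g, "");

// Step 2: look for the terms in what is left. No lookarounds are needed,
// and nothing but the term itself is matched.
const termPattern = /code (?:civil|pénal|de procédure civile|de procédure pénale)/g;
const foundTerms = withoutParens.match(termPattern) || [];

console.log(foundTerms); // ["code civil"]
```

One consequence of this design is that match positions refer to the stripped string, not the original one, so it suits checking or counting terms better than editing the original text in place.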

While it might be possible to do this in a single Regex - and I'm not saying it is or isn't - the resulting expression would be horribly complicated and next to impossible to maintain later. Breaking it in two makes it significantly more readable, and that makes your app more reliable and maintainable.
LB2371 1-Feb-23 5:49am    
Thanks for your suggestion. Is it possible to do something like this?
node.innerHTML = node.innerHTML.replace(/[^(]code (civil|pénal|de procédure civile|de procédure pénale)[^)]/g, m = (m.trim()) => `${m}`);

I've tried it, but it doesn't work.
OriginalGriff 1-Feb-23 6:13am    
Did you read anything I actually wrote, or just skip over that because it wasn't code?
LB2371 1-Feb-23 7:23am    
I read what you wrote, and I thank you for your useful suggestion. But before deciding what to do, I would like to find other, less "radical", solutions. That's the reason for my question.
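One possible less radical variant, offered as a sketch rather than a tested solution: move the lookahead from the first answer to the end of the pattern and drop the \s* parts, so the whole match is the term itself and can be passed straight to a replace callback. The <mark> wrapper is a hypothetical stand-in for whatever replacement is wanted, and the example works on a plain string rather than on innerHTML.

```javascript
// Lookahead-only variant: the term matches only if no ")" can be reached
// after it without first passing a "(". Assumes balanced, non-nested parentheses.
const outsideParens = /code (?:civil|pénal|de procédure civile|de procédure pénale)(?![^(]*\))/g;

// Shortened version of the sample text from the question.
const sourceText = "La première édition du code civil français est ... (édition du code civil français) est ...";

// The callback receives the whole match, which contains no surrounding spaces.
const markedText = sourceText.replace(outsideParens, (term) => `<mark>${term}</mark>`);

console.log(markedText);
// La première édition du <mark>code civil</mark> français est ... (édition du code civil français) est ...
```

As with the first answer, a stray closing parenthesis later in the text with no opening one before it would suppress matches that precede it.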

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
