Removing entire tags containing a specific term using regex

Question

I am altering a database with approximately 500 HTML pages using phpMyAdmin. Several pages contain a Facebook Pixel or Google Tag that I would like to remove.

The easiest way, I thought, would be to use a regex to find each entire tag that contains some expression or term related to Facebook or Google, and replace it with an empty string.

An example would be

HTML

<script>
    window.dataLayer = window.dataLayer || [];

    function gtag() {
      dataLayer.push(arguments);
    }
    gtag('js', new Date());
    gtag('config', 'G-XXXXXXXX');
  </script>

or

HTML

<script>
(window, document, 'script', 'https://connect.facebook.net/en_US/fbevents.js');
    fbq('init', '9999999999999999');
    fbq('track', 'salespage_xxxxxx');
  </script>

Although all are unique, some have the same code or another element that makes it possible to identify each one of them.

Before running it in phpMyAdmin, I'm trying to formulate the expression using Sublime Text 3.

This is my first contact with regex and I find it fascinating, but even after following some references I can't get the search to match.

The expression I came up with after some research was

<(.*)>[\s\S]face[\s\S]<\/(.*)>

I thought this expression would select the entire tag containing the word "face", but it doesn't find anything.

I would like some help.

If it works, I would be able to use the same approach for several other necessary changes.

What I have tried:

(.*)>[\s\S]face[\s\S]<\/(.*)

Posted 25-Nov-21 15:22pm

G2aA

Updated 25-Nov-21 20:19pm

v2

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Answer 1 · 2021-11-25T20:19:00

To be honest, regex is not a good tool for this: it's a text processor, which means it's great at processing well-formed text, but not so great at processing syntax.
If you are going to play with regexes, then get a copy of Expresso - it's free, and it examines, tests, and generates regular expressions.

And the Regex you show doesn't even come close to finding anything in your examples:

A numbered capture group
   Any char, any number of repetitions (Including zero)
A literal ">" followed by exactly one character (whitespace or non-whitespace)
The literal "face"
Another single whitespace-or-non-whitespace character
A literal "<" followed by a literal "/"
A numbered capture group
   Any char, any number of repetitions (Including zero)

Which frankly is garbage, and even if it did work - which it won't - would match the entire document rather than just a fragment of it!
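
To make the breakdown above concrete, here is a minimal sketch in Python (chosen only for illustration; the question itself uses Sublime Text and phpMyAdmin). It shows that the original pattern finds nothing, because each `[\s\S]` consumes exactly one character, and then tries a pattern that stays inside a single `<script>` block. The sample `html` string and the helper name `script_containing` are assumptions made for this example, and whether the same syntax is accepted by a given editor or database should be verified there.

```python
import re

# Sample page invented for illustration, based on the snippets in the question.
html = """<p>intro</p>
<script>
  gtag('config', 'G-XXXXXXXX');
</script>
<script>
(window, document, 'script', 'https://connect.facebook.net/en_US/fbevents.js');
    fbq('init', '9999999999999999');
</script>
<p>outro</p>
"""

# The pattern from the question. Each [\s\S] has no quantifier, so "face"
# would have to sit exactly one character after ">" and one before "</".
original = re.compile(r"<(.*)>[\s\S]face[\s\S]<\/(.*)>")
found_original = original.search(html)


def script_containing(term):
    """Build a pattern for one <script>...</script> block that contains `term`."""
    # Any run of characters, but never stepping over a closing </script>,
    # so a match cannot spill from one script block into the next.
    inside = r"(?:(?!</script>)[\s\S])*?"
    return re.compile(
        r"<script\b[^>]*>" + inside + re.escape(term) + inside + r"</script>",
        re.IGNORECASE,
    )


# Remove only the block that mentions "facebook"; the gtag block is left alone.
cleaned = script_containing("facebook").sub("", html)
```

Here `found_original` is `None`, while `cleaned` keeps the `gtag` block and loses the Facebook one. This only works as long as the markup is as regular as the samples; a `</script>` inside a JavaScript string, for instance, would fool it.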

Instead of that, look at an HTML parser: search Google for "php html processor".
I haven't used any of them - I do all my HTML stuff in C# - but something there will work better than any regex!
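
As a rough illustration of the parser-based approach, here is a sketch using Python's standard-library `html.parser` as a stand-in for whichever PHP parser is eventually chosen. The names `MARKERS`, `ScriptStripper`, and `strip_tracking_scripts` are hypothetical, and the marker terms are guesses based on the snippets in the question. The idea is to re-emit the page piece by piece while buffering each `<script>` block, then drop the block if it contains a marker.

```python
from html.parser import HTMLParser

# Hypothetical marker terms; adjust to whatever identifies the trackers.
MARKERS = ("fbq(", "connect.facebook.net", "gtag(", "dataLayer")


class ScriptStripper(HTMLParser):
    """Re-emit the HTML it is fed, dropping <script> blocks that contain a marker."""

    def __init__(self, markers=MARKERS):
        super().__init__(convert_charrefs=False)
        self.markers = markers
        self.out = []        # pieces of the rebuilt document
        self.script = None   # pieces of the <script> block currently being read

    def _emit(self, text):
        # Inside a <script>, buffer the text; otherwise write it straight out.
        (self.out if self.script is None else self.script).append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "script" and self.script is None:
            self.script = []
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "script" and self.script is not None:
            block, self.script = "".join(self.script), None
            if not any(marker in block for marker in self.markers):
                self.out.append(block)   # keep scripts without a marker

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_pi(self, data):
        self._emit(f"<?{data}>")


def strip_tracking_scripts(html, markers=MARKERS):
    parser = ScriptStripper(markers)
    parser.feed(html)
    parser.close()
    if parser.script is not None:        # unclosed <script>: keep it untouched
        parser.out.extend(parser.script)
    return "".join(parser.out)


page = (
    "<p>Hello &amp; welcome</p>\n"
    "<script>\n  fbq('init', '9999999999999999');\n</script>\n"
    "<script>console.log('keep me');</script>\n"
)
cleaned_page = strip_tracking_scripts(page)
```

The rebuilt output is not guaranteed to be byte-for-byte identical to the input (for example, an entity written without its semicolon comes back with one), so it is worth diffing a few pages before updating all 500.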