Regular Expression Exclusion

Bogomips
Full of knowledge
Posts: 2564
Joined: 25 Jun 2014, 15:21
Distribution: 3.2.2 Cinnamon & KDE5
Location: London


Post#1 by Bogomips » 07 Aug 2014, 19:04

Looked at what the Net has to offer, but no one has offered a solution that can be generalized. The problem is simple. There is a string of characters s which should produce a 'non-match', i.e. any string containing the substring s must not match this RE. Can any bright sparks produce this RE?

Simple example: porteus.org ends up in a blanket blacklist, so I wish to whitelist everything porteus except for strings containing 'mchat', using an RE, as 'mchat' quite often throws the browser into a tight loop.

Would be grateful for an elegant solution. :unknown:
Linux porteus 4.4.0-porteus #3 SMP PREEMPT Sat Jan 23 07:01:55 UTC 2016 i686 AMD Sempron(tm) 140 Processor AuthenticAMD GNU/Linux
NVIDIA Corporation C61 [GeForce 6150SE nForce 430] (rev a2) MemTotal: 901760 kB MemFree: 66752 kB

brokenman
Site Admin
Posts: 6105
Joined: 27 Dec 2010, 03:50
Distribution: Porteus v4 all desktops
Location: Brazil

Re: Regular Expression Exclusion

Post#2 by brokenman » 08 Aug 2014, 01:02

Any particular reason you need a REGEX for this? Depending on the tool you are using there may be other more elegant ways. In any case with a REGEX you could use negative look ahead.

Code:

(?!^mchat$)(^.*$)
How do i become super user?
Wear your underpants on the outside and put on a cape.


Re: Regular Expression Exclusion

Post#3 by Bogomips » 08 Aug 2014, 20:36

brokenman wrote: Any particular reason you need a REGEX for this? Depending on the tool you are using there may be other more elegant ways.
Using SilentBlock, an excellent Moz add-on from Japan. There is a blacklist, which is processed first, and then a whitelist. All matches are REGEX. You have no idea the amount of rubbish that gets logged as blocked. Not having the luxury of 8 processors, I have to protect a single CPU from the onslaught of scripts, so had already blocked doubleclick.net long before it achieved NSA notoriety. 8) After instituting the blocks, was amazed by the almost instantaneous nature of system responses. Final RE in my blacklist is now '\.js$' :evil:
brokenman wrote: In any case with a REGEX you could use negative look ahead.

Code:

(?!^mchat$)(^.*$)
Thanks, I'll try that. :good:


Re: Regular Expression Exclusion

Post#4 by Bogomips » 10 Aug 2014, 15:41

brokenman wrote: In any case with a REGEX you could use negative look ahead.

Code:

(?!^mchat$)(^.*$)
In trying to understand the suggested RE, trawled the Net for 'look ahead'. Came across a cookbook recipe for excluding strings containing a certain substring, in this case 'invalid':

Code:

^(?!.*invalid.*).*
It began to look as if the suggested RE for 'mchat' would not work. Anyway, came across a bewildering amount of technical detail and discussion about anchors and efficiency. Forced to go back to first principles and the simple logic of what was required.

So, as memo to myself and anyone else:
Wisdom of the Web wrote: Negative look ahead is used to match something not followed by something else.
i.e. if something is not followed by something else, we have a match.

^[^u] => Start of String followed by any one character other than 'u' (that character is consumed, so an empty string cannot match)

Code:

usa:	no match
eu:	match
Similarly: ^(?!u) => Start of String not followed by 'u'

Code:

usa:		no match
eu:		match
All matches being at position 0.
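
For anyone wanting to repeat this outside the online tester, here is a minimal sketch in Python (my choice of language, not something used in this thread; for these simple patterns the syntax is the same as in JavaScript). It suggests the two forms agree on 'usa' and 'eu' and differ only on the empty string, which I added as an extra test case.

```python
import re

# Two ways of saying "does not start with u".
negated_class = re.compile(r"^[^u]")  # needs one real character that is not 'u'
lookahead = re.compile(r"^(?!u)")     # consumes nothing, only peeks ahead

# "" is an extra test case that was not in the original tests.
for text in ["usa", "eu", ""]:
    a = negated_class.search(text) is not None
    b = lookahead.search(text) is not None
    print(f"{text!r}: ^[^u] -> {a}, ^(?!u) -> {b}")
```

The lookahead form is the one that generalizes, because a negated character class can only exclude single characters, not a whole substring.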

Extrapolating: ^(?!.*mchat) => Start of String not followed by generalized string '.*mchat.*'

Code:

No match:	http://archive.linuxfromscratch.org/lfs-museumchat/2.3.1/LFS-BOOK-2.3.1-HTML/index.html

Match:		http://media7.fast-torrent.ru/media/js/jquery-ui-1.10.3.custom1.min.js
Match being at position 0.
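
The same check can be scripted. A small sketch in Python (an assumption on my part, the tests above were done in the online JavaScript tester), using the two URLs from the test:

```python
import re

# Start of string, not followed by "anything, then mchat".
no_mchat = re.compile(r"^(?!.*mchat)")

urls = [
    "http://archive.linuxfromscratch.org/lfs-museumchat/2.3.1/LFS-BOOK-2.3.1-HTML/index.html",
    "http://media7.fast-torrent.ru/media/js/jquery-ui-1.10.3.custom1.min.js",
]

for url in urls:
    m = no_mchat.search(url)
    # The match, when there is one, is empty: the lookahead consumes no text.
    print("match at", m.start() if m else None, "for", url)
```

Note that 'museumchat' counts as containing 'mchat', which is why the first URL is rejected.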

Expression (?!^mchat$)(^.*$):

Code:

Match:		http://archive.linuxfromscratch.org/lfs-museumchat/2.3.1/LFS-BOOK-2.3.1-HTML/index.html

Match:		http://media7.fast-torrent.ru/media/js/jquery-ui-1.10.3.custom1.min.js
Matches being at position 0.
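
As far as I can tell, the reason is that the anchors sit inside the lookahead, so the only thing it rejects is a string that is exactly 'mchat'. A short sketch in Python (my assumption as test language; the bare strings 'mchat' and 'museumchat' are extra test cases of mine):

```python
import re

# Negative lookahead for the whole string being exactly "mchat",
# followed by "match the whole line".
suggested = re.compile(r"(?!^mchat$)(^.*$)")

samples = [
    "mchat",                                                 # exactly the word
    "museumchat",                                            # contains the word
    "http://forum.porteus.org/mchat/jquery_cookie_mini.js",  # contains the word
]

for text in samples:
    print(f"{text!r}: {'match' if suggested.search(text) else 'no match'}")
```

Only the first sample is refused, so this RE does not exclude strings that merely contain 'mchat'.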

All tests done at http://www.regular-expressions.info/jav ... ample.html.

So, ^(?!.*mchat) is not very far from the cookbook example ^(?!.*invalid.*).*, but someone commenting there suggested that ^(?!invalid)(.(?!invalid))*$, although more complex, would lead to a simpler result (fewer matches). This defeats me.
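
I cannot say what the commenter meant by fewer matches, but it is easy to check whether the two expressions accept the same strings. A rough sketch in Python (my assumption, not the tool used for the tests here), with sample strings of my own invention:

```python
import re

# Cookbook form: one lookahead at the start that scans for the substring.
cookbook = re.compile(r"^(?!.*invalid.*).*")
# Commenter's form: a lookahead at the start and after every character.
per_char = re.compile(r"^(?!invalid)(.(?!invalid))*$")

# Hypothetical sample strings, chosen only for illustration.
samples = ["valid input", "an invalid input", "invalid", "ends with invalid", ""]

for text in samples:
    a = cookbook.search(text) is not None
    b = per_char.search(text) is not None
    print(f"{text!r}: cookbook -> {a}, per-character -> {b}")
```

On these samples both forms give the same yes/no answer. The difference seems to be in how the work is done, not in which strings are accepted.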

IMHO ^(?!.*mchat) would just be one scan for 'mchat', which would decide the match.

Nonetheless, it appears that the Regular Expression to exclude any string containing a substring s would be: ^(?!.*s)
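
One caveat when generalizing: if s contains characters that are special in an RE (such as '.'), they need escaping first. A minimal sketch in Python, where excluding() is a hypothetical helper name of mine and re.escape() does the escaping:

```python
import re

def excluding(substring: str) -> re.Pattern:
    """Build ^(?!.*s) for a literal substring s (hypothetical helper)."""
    # re.escape makes characters such as '.' match literally.
    return re.compile(r"^(?!.*" + re.escape(substring) + r")")

no_mchat = excluding("mchat")
print(no_mchat.pattern)
print(no_mchat.search("http://forum.porteus.org/mchat/jquery_cookie_mini.js"))

# With escaping, "a.b" excludes only the literal text a.b, not "axb".
no_a_dot_b = excluding("a.b")
print(no_a_dot_b.search("axb") is not None, no_a_dot_b.search("a.b") is not None)
```

Also note that '.' does not normally cross a newline, so this form assumes single-line strings such as URLs.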

Real Life whitelist example: '.*porteus\.org(?!.*mchat)'

Code:

No match:	http://forum.porteus.org/mchat/jquery_cookie_mini.jsg/styles/prosilver/template/forum_fn.js

Match:		http://forum.porteus.org/styles/prosilver/template/forum_fn.js

Match:		http://forum.porteus.org/chat/jquery_cookie_mini.jshttp://forum.porteus.org/styles/prosilver/template/forum_fn.js
Matches being at position 0 in both cases.

As opposed to 'porteus\.org(?!.*mchat)'

Code:

No match:	http://forum.porteus.org/mchat/jquery_cookie_mini.js

Match:		http://forum.porteus.org/styles/prosilver/template/forum_fn.js

Match:		http://forum.porteus.org/chat/jquery_cookie_mini.jshttp://forum.porteus.org/styles/prosilver/template/forum_fn.js
In both cases match at position 13:
porteus.org
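
A sketch in Python (my assumption as test language) comparing the two whitelist forms. It also shows one limitation that seems worth knowing: the lookahead only inspects what comes after 'porteus.org', so an 'mchat' earlier in the string slips through. The 'tricky' URL and the 'strict' variant are hypothetical additions of mine and were not tested in SilentBlock.

```python
import re

with_prefix = re.compile(r".*porteus\.org(?!.*mchat)")
without_prefix = re.compile(r"porteus\.org(?!.*mchat)")
# Hypothetical stricter variant: refuse 'mchat' anywhere in the string.
strict = re.compile(r"^(?!.*mchat).*porteus\.org")

ok = "http://forum.porteus.org/styles/prosilver/template/forum_fn.js"
bad = "http://forum.porteus.org/mchat/jquery_cookie_mini.js"
tricky = "http://mchat.example/?ref=porteus.org"  # made-up URL for illustration

print(with_prefix.search(ok).start())     # the leading .* pulls the match to the start
print(without_prefix.search(ok).start())  # match begins where porteus.org begins
print(with_prefix.search(bad), without_prefix.search(bad))
print(without_prefix.search(tricky) is not None, strict.search(tricky) is not None)
```

For a simple yes/no whitelist test the leading '.*' makes no difference to the outcome, only to where the match is reported.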

Epilog

Code:

SilentBlock:  /\.js$/i
  blocked  http://forum.porteus.org/mchat/jquery-1.5.0.min.js

SilentBlock:  /\.js$/i
  blocked  http://forum.porteus.org/styles/prosilver/template/forum_fn.js
  but unblocked by  /porteus\.org(?!.*mchat)/i
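
The log above can be imitated with a few lines of Python. This is only a sketch of the blacklist-then-whitelist order as I understand it from the log, not SilentBlock's actual code; is_blocked() and the two lists are hypothetical names.

```python
import re

# Assumed rule order: blacklist first, then the whitelist may unblock.
BLACKLIST = [re.compile(r"\.js$", re.IGNORECASE)]
WHITELIST = [re.compile(r"porteus\.org(?!.*mchat)", re.IGNORECASE)]

def is_blocked(url: str) -> bool:
    """Return True if a blacklist RE matches and no whitelist RE unblocks it."""
    if not any(p.search(url) for p in BLACKLIST):
        return False
    return not any(p.search(url) for p in WHITELIST)

for url in [
    "http://forum.porteus.org/mchat/jquery-1.5.0.min.js",
    "http://forum.porteus.org/styles/prosilver/template/forum_fn.js",
]:
    print("blocked  " if is_blocked(url) else "unblocked", url)
```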
