
Regular Expression Exclusion

Posted: 07 Aug 2014, 19:04
by Bogomips
Looked at what the Net has to offer, but no one has offered a solution which can be generalized. The problem is simple. There is a string of characters s which should produce a 'non-match', i.e. any string containing substring s would not match this RE. Can any bright sparks produce this RE?

Simple example: porteus.org ends up in a blanket blacklist, so wish to whitelist everything porteus except for strings with 'mchat', using an RE, as 'mchat' quite often throws the browser into a tight loop.

Would be grateful for an elegant solution. :unknown:

Re: Regular Expression Exclusion

Posted: 08 Aug 2014, 01:02
by brokenman
Any particular reason you need a REGEX for this? Depending on the tool you are using there may be other more elegant ways. In any case with a REGEX you could use negative look ahead.

Code:

(?!^mchat$)(^.*$)

Re: Regular Expression Exclusion

Posted: 08 Aug 2014, 20:36
by Bogomips
brokenman wrote:Any particular reason you need a REGEX for this? Depending on the tool you are using there may be other more elegant ways.
Using Silent Block, an excellent Moz Add-on from Japan. On the one hand there is a blacklist, which is processed first, before the whitelist. All matches are REGEX. You have no idea the amount of rubbish that gets logged as blocked. Not having the luxury of 8 processors, have to protect a single cpu from the onslaught of scripts. So had already blocked doubleclick.net long before it achieved NSA notoriety. 8) After instituting blocks was amazed by the almost instantaneous nature of system responses. Final RE in my blacklist is now '\.js$' :evil:
brokenman wrote:In any case with a REGEX you could use negative look ahead.

Code:

(?!^mchat$)(^.*$)
Thanks, I'll try that. :good:

Re: Regular Expression Exclusion

Posted: 10 Aug 2014, 15:41
by Bogomips
brokenman wrote:In any case with a REGEX you could use negative look ahead.

Code:

(?!^mchat$)(^.*$)
In trying to understand the suggested RE, trawled the Net for 'look ahead'. Came across a cookbook recipe for excluding strings containing a certain substring, in this case 'invalid':

Code:

^(?!.*invalid.*).*
Began to look like the suggested RE for 'mchat' would not work. Anyway came across bewildering amount of technical detail and discussion about anchors and efficiency. Forced to go back to first principles and simple logic of what was required.

So, as memo to myself and anyone else:
Wisdom of the Web wrote:Negative look ahead is used to match something not followed by something else.
i.e. if something is not followed by something else we have a match.

^[^u] => Start of String followed by one character which is not 'u'

Code:

usa:	no match
eu:	match
Similarly: ^(?!u) => Start of String not followed by 'u'

Code:

usa:		no match
eu:		match
All matches being at position 0.
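
The two patterns can be checked side by side. A minimal sketch in Python's re module (an assumption on my part, since the tests here were run in a JavaScript tester; for these simple patterns lookahead should behave the same). The helper name probe is made up for illustration. It also shows the one place the two differ: [^u] has to consume a character, the look ahead consumes nothing.

```python
import re

def probe(pattern, text):
    """Return (start, matched_text) for the first match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.group()) if m else None

# Compare the character class with the negative look ahead,
# including the empty string, where the two disagree.
for text in ("usa", "eu", ""):
    for pattern in (r"^[^u]", r"^(?!u)"):
        print(f"{pattern!r:12} on {text!r:6} -> {probe(pattern, text)}")
```

On 'eu' both match at position 0, but ^[^u] swallows the 'e' while ^(?!u) matches an empty string. On an empty string only the look ahead version matches, since [^u] needs a character to be there.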

Extrapolating: ^(?!.*mchat) => Start of String not followed by generalized string '.*mchat.*'

Code:

No match:	http://archive.linuxfromscratch.org/lfs-museumchat/2.3.1/LFS-BOOK-2.3.1-HTML/index.html

Match:		http://media7.fast-torrent.ru/media/js/jquery-ui-1.10.3.custom1.min.js
Match being at position 0.

Expression (?!^mchat$)(^.*$):

Code:

Match:		http://archive.linuxfromscratch.org/lfs-museumchat/2.3.1/LFS-BOOK-2.3.1-HTML/index.html

Match:		http://media7.fast-torrent.ru/media/js/jquery-ui-1.10.3.custom1.min.js
Matches being at position 0.
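
Why this one lets everything through: the anchors ^ and $ sit inside the look ahead, so the only thing it forbids is a string which is exactly 'mchat' from start to end. A rough sketch in Python (again an assumption, not the tester used above; the function name accepted is hypothetical):

```python
import re

# The expression as suggested, with the anchors inside the look ahead.
SUGGESTED = re.compile(r"(?!^mchat$)(^.*$)")

def accepted(text):
    """True if the suggested expression matches somewhere in text."""
    return SUGGESTED.search(text) is not None

print(accepted("mchat"))                            # the whole string is 'mchat'
print(accepted("xmchat"))                           # 'mchat' only as a substring
print(accepted("http://forum.porteus.org/mchat/"))  # 'mchat' inside a URL
```

Only the first call should come back False. Any longer string containing 'mchat' is still accepted, which agrees with both test URLs matching.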

All tests done at http://www.regular-expressions.info/jav ... ample.html.

So, ^(?!.*mchat) is not very far from the cookbook example ^(?!.*invalid.*).*, but someone commenting there suggested that ^(?!invalid)(.(?!invalid))*$, although more complex, would lead to a simpler (fewer matches) result. This defeats me.

IMHO ^(?!.*mchat) would just be one scan for 'mchat', which would decide the match.
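
A small check, sketched in Python (an assumption, not the tool used for the tests above), suggests the three forms accept and reject the same strings, at least for single-line input. The sample strings and the names CANDIDATES, SAMPLES and verdicts are made up for illustration.

```python
import re

# The three candidate patterns, with 'invalid' as the substring s.
CANDIDATES = {
    "cookbook": r"^(?!.*invalid.*).*",
    "per-character": r"^(?!invalid)(.(?!invalid))*$",
    "short": r"^(?!.*invalid)",
}

SAMPLES = ["valid", "invalid", "xinvalid", "quite invalid here", "inval", ""]

def verdicts(pattern):
    """Map each sample to True (accepted) or False (rejected)."""
    rx = re.compile(pattern)
    return {s: rx.search(s) is not None for s in SAMPLES}

for name, pattern in CANDIDATES.items():
    print(name, verdicts(pattern))
```

As far as I can tell the difference is only in what gets consumed: the first two swallow the whole string on a match, the short form consumes nothing. The per-character form runs a look ahead after every character, so it does not look like less work than a single scan.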

Nonetheless it appears that the Regular Expression to exclude a string containing a substring s would be: ^(?!.*s)
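
To generalize to any s, the substring has to be escaped first, otherwise characters like '.' in s are read as RE operators. A minimal sketch in Python, assuming its re module; the function name excluding is hypothetical:

```python
import re

def excluding(substring):
    """Compile ^(?!.*s) for a literal substring s.

    re.escape makes metacharacters in s (such as '.') match literally.
    """
    return re.compile(r"^(?!.*" + re.escape(substring) + r")")

no_mchat = excluding("mchat")
print(bool(no_mchat.search("http://forum.porteus.org/mchat/x.js")))
print(bool(no_mchat.search("http://forum.porteus.org/styles/x.js")))
```

The first line should print False and the second True. Note that '.' does not cross a newline by default, so this is only meant for single-line strings such as URLs.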

Real Life whitelist example: '.*porteus\.org(?!.*mchat)'

Code:

No match:	http://forum.porteus.org/mchat/jquery_cookie_mini.jsg/styles/prosilver/template/forum_fn.js


Match:	http://forum.porteus.org/styles/prosilver/template/forum_fn.js


Match:	http://forum.porteus.org/chat/jquery_cookie_mini.jshttp://forum.porteus.org/styles/prosilver/template/forum_fn.js
Matches being at position 0 in both cases.

As opposed to 'porteus\.org(?!.*mchat)'

Code:

No match:	http://forum.porteus.org/mchat/jquery_cookie_mini.js

Match:		http://forum.porteus.org/styles/prosilver/template/forum_fn.js

Match:		http://forum.porteus.org/chat/jquery_cookie_mini.jshttp://forum.porteus.org/styles/prosilver/template/forum_fn.js
In both cases match at position 13:
porteus.org
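
The difference in match position between the two whitelist forms can be reproduced with a short Python sketch (Python's re is an assumption here; the helper start_of_match and the URL in stray are hypothetical).

```python
import re

WITH_PREFIX = re.compile(r".*porteus\.org(?!.*mchat)")
BARE = re.compile(r"porteus\.org(?!.*mchat)")

def start_of_match(rx, url):
    """Return the index where the match starts, or None if there is no match."""
    m = rx.search(url)
    return m.start() if m else None

blocked = "http://forum.porteus.org/mchat/jquery_cookie_mini.js"
allowed = "http://forum.porteus.org/styles/prosilver/template/forum_fn.js"
# Hypothetical URL with 'mchat' BEFORE porteus.org, to show a limitation.
stray = "http://mchat.example/?ref=porteus.org"

for url in (blocked, allowed, stray):
    print(start_of_match(WITH_PREFIX, url), start_of_match(BARE, url), url)
```

Both forms give the same yes/no answer, so for a whitelist the leading '.*' looks unnecessary. One thing to keep in mind: the look ahead only inspects what comes after porteus.org, so an 'mchat' earlier in the URL, as in the made-up third example, is not caught.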

Epilog

Code:

SilentBlock:  /\.js$/i
  blocked  http://forum.porteus.org/mchat/jquery-1.5.0.min.js

SilentBlock:  /\.js$/i
  blocked  http://forum.porteus.org/styles/prosilver/template/forum_fn.js
  but unblocked by  /porteus\.org(?!.*mchat)/i
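
The log above fits a simple model: blacklist first, then the whitelist gets a chance to unblock. A rough Python sketch of that model, under the loud assumption that this is how the add-on decides; it is not Silent Block's actual code, and the names BLACKLIST, WHITELIST and decide are invented for illustration.

```python
import re

# The /i flag in the log corresponds to re.IGNORECASE.
BLACKLIST = [re.compile(r"\.js$", re.IGNORECASE)]
WHITELIST = [re.compile(r"porteus\.org(?!.*mchat)", re.IGNORECASE)]

def decide(url):
    """Assumed order: blacklist is checked first, whitelist may then unblock."""
    if not any(rx.search(url) for rx in BLACKLIST):
        return "allowed"      # never blocked in the first place
    if any(rx.search(url) for rx in WHITELIST):
        return "unblocked"    # blocked, then rescued by the whitelist
    return "blocked"

print(decide("http://forum.porteus.org/mchat/jquery-1.5.0.min.js"))
print(decide("http://forum.porteus.org/styles/prosilver/template/forum_fn.js"))
```

Under this model the mchat script stays blocked and forum_fn.js is unblocked, matching the two log entries.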