Regular Expressions to Resolve Text Processing Problem

Post by Bogomips » 09 Apr 2017, 17:35

Simple Problem: Globally replace all delimited text, including the delimiter strings, with the same string. The delimited text may span more than one line (the end delimiter need not be on the same line as the start delimiter). The specific problem is to replace all code blocks in a post so as to present an overview of the document.

  • Kate: Using its sed-like functionality. Although it lacks lazy quantifiers, a lookahead can be used to reduce a line with two code blocks to just one code block, after which the default greedy quantifier works to replace the remaining single code block in the line.
    • One Liner
      Code:
      [*][code]guest@porteus:~$ tree -nd x/sda3x/sda3└── ploplinux    └── myscripts2 directories[/code][/list][/list][*]Linux Partitions[list][*]Arch Way[code]# rsync -aAXv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} / /path/to/backup/folder                   [/code][list]

    • Applied Substitution
      Code:
      s/(.*)\[cod.*\[\/cod(.*)(?=code.*\/code)/\1<>\2/

    • Resultant String
      Code:
      [*]<>e][/list][/list][*]Linux Partitions[list][*]Arch Way[code]# rsync -aAXv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} / /path/to/backup/folder                   [/code][list]
    However, if a code block spans more than one line, we have a problem.
  • Sed: GNU sed offers only POSIX basic and extended REs, so lazy quantifiers are not available, and neither are lookarounds. Code blocks spanning several lines would therefore need some other device, such as gathering the lines into the pattern space first. Not being so versed in sed, I could not explore whether this is viable.
  • Perl: Does support lazy quantifiers:
    Code:
    guest@porteus:~$ perl -pe 's/\[code.*?e\]/<>/'
    [*][code]guest@porteus:~$ tree -nd x/sda3x/sda3└── ploplinux    └── myscripts2 directories[/code][/list][/list][*]Linux Partitions[list][*]Arch Way[code]# rsync -aAXv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} / /path/to/backup/folder                   [/code][list]
    ^D
    [*]<>[/list][/list][*]Linux Partitions[list][*]Arch Way[code]# rsync -aAXv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} / /path/to/backup/folder                   [/code][list]

    guest@porteus:~$ perl -pe 's/\[code.*?e\]/<>/'
    [*]<>[/list][/list][*]Linux Partitions[list][*]Arch Way[code]# rsync -aAXv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} / /path/to/backup/folder                   [/code][list]
    ^D
    [*]<>[/list][/list][*]Linux Partitions[list][*]Arch Way<>[list]
    So versatile, but then again there is the problem of stringing out the whole document so that line breaks are ignored, which I could not manage in a one-liner without having to learn Perl. :(
  • Awk: This should do the trick, but it brings yet another set of regular expressions. As this is not a major undertaking, it is not worth spending hours on refreshing my awk and on working out which REs it allows.
  • Bash: A simple coding problem, resolvable with the programming constructs of Bash in conjunction with non-complex REs.
    • The Sample of Text read into Array
      Code:
      guest@porteus:~$ readarray -t full_text
      Linux Partitions to RSYNC
      [list]

      [*]Arch Way[code=php]# rsync -aAXv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} / /path/to/backup/folder                                      [/code][list]

      [*][code=php]The --exclude option causes files that match the given patterns to be
       excluded. The contents of /dev, /proc, /sys, /tmp, and /run are excluded in
       the above command, because they are populated at boot, although the folders
      themselves are not created. /lost+found is filesystem-specific.
       [/code]

      [*][code]    Using many hard links, consider adding the -H option, which is turned off by default due to its memory expense
           
          Using sparse files, such as virtual disks, Docker images and similar, add the -S option.
      [/code]
      ^D

    • The Bash Code (the programming might look a bit stilted in places, but that arises from the idiosyncratic nature of Bash)
      Code:
      clout ()
      {
          # set -x
          # Arguments are passed by name (namerefs, Bash 4.3+): the equivalent of passing a pointer to a variable.
          local -n say=${1:-pay}      # Source array: lines of text
          local -n day=${2:-ray}      # Resultant destination array
          local i l b w               # Keep the working variables out of the caller's scope
          l=${#say[*]}; unset day     # Number of lines of text; clear the destination
          for ((i=0; i<l; ))
          do
              b=${say[i++]}; b=${b//=php/}    # Buffer, with "=php" stripped so [code=php] becomes [code]
              while [[ $b =~ \[code\] ]]
              do
                  w=${b#*\[code\]}            # Rest of the line after the opening tag
                  b=${b%%\[code\]*}           # Relevant text before the opening tag
                  b+="<>"                     # Add code block marker
                  # Read on until the line holding the end of the code block
                  until [[ $w =~ \[/code\] ]]
                  do
                      ((i<l)) || { echo 'Incomplete Code Block!' >&2; echo "'$w'" >&2; return 1; }
                      w=${say[i++]}; w=${w//=php/}
                  done
                  b+=${w#*\[/code\]}          # Keep whatever follows the closing tag
              done
              day+=("$b")
          done
      }

    • The Resultant Text (Code Sourced from File or Pasted into Terminal)
      Code:
      guest@porteus:~$ red_text=""
      guest@porteus:~$ clout full_text red_text
      guest@porteus:~$ printf "%s\n" "${red_text[*]}"
      Linux Partitions to RSYNC [list]  [*]Arch Way<>[list]  [*]<>  [*]<>
      Code blocks replaced by: <>.
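The line-by-line loop in clout can also be sketched the other way round: join the text into one string first, so that a block spanning several lines needs no special handling, then cut blocks out with parameter expansion alone. This is only a minimal sketch under assumptions of mine: the function name strip_code_blocks is hypothetical, any tag starting with "[code" counts as an opening tag, and an unclosed block is left as it stands rather than reported.

```bash
# Hypothetical helper (not from the original post): read text on stdin and
# collapse every [code]...[/code] or [code=lang]...[/code] block to "<>".
strip_code_blocks() (
    # The body runs in a subshell, so these variables do not leak to the caller.
    text=$(cat)     # whole input as one string; newlines are ordinary characters
    out=
    while :; do
        # Carry on only while an opening tag is followed by a closing tag.
        case $text in
            *"[code"*"[/code]"*) ;;
            *) break ;;
        esac
        out=$out${text%%"[code"*}"<>"   # text before the first opening tag, plus marker
        rest=${text#*"[code"}           # drop everything through the first "[code"
        text=${rest#*"[/code]"}         # drop through the nearest closing tag (lazy)
    done
    printf '%s\n' "$out$text"
)
```

With the array from the sample above, it could be called as printf '%s\n' "${full_text[@]}" | strip_code_blocks. The shortest-match expansion (#) plays the part of a lazy quantifier here, which is why no regular expression is needed at all.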
It would be interesting to see how the Bash approach compares with the others.
Last edited by Bogomips on 10 Apr 2017, 20:52, edited 2 times in total.
Reason: Added Comment
Linux porteus 4.4.0-porteus #3 SMP PREEMPT Sat Jan 23 07:01:55 UTC 2016 i686 AMD Sempron(tm) 140 Processor AuthenticAMD GNU/Linux
NVIDIA Corporation C61 [GeForce 6150SE nForce 430] (rev a2) MemTotal: 901760 kB MemFree: 66752 kB