• AVR Freaks

Regular Expression Engine

Author
Gort2015
Klaatu Barada Nikto
  • Total Posts : 3315
  • Reward points : 0
  • Joined: 2015/04/30 10:49:57
  • Location: 0
  • Status: offline
2019/10/10 15:39:29

Regular Expression Engine

Regular Expression Engine - test expression #1000
 
Too ugly:
 /Hello|(?'Color'Red|Orange|(?'Sky'Gr[ae]y|(?'Gort'Klat{2}u|Barada|Nikto)+|Green[\\x30-\\x39]|Blue|(?'Shape'Rectangle|Triangle)|Cyan){7..11})|World[A-Z-AEIOU]
 
That's better:
Hello 
<Color>
    Red
    Orange
    <Sky> ;7 to 11
        Gr[ae]y
        <Gort> ;1 to INF
            Klat{2}u
            Barada
            Nikto
        </Gort>
        Green[0-9]
        Blue
        <Shape>
            Rectangle
            Triangle
        </Shape>
    </Sky>
    Cyan
</Color>
World[A TO Z BUT NOT AEIOU]

Looks good.
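The post does not show the formatter's source. As a rough sketch only, a single pass with a stack of open groups is enough to turn (?'Name'...) or (?<Name>...) into indented <Name> ... </Name> blocks with one alternative per line. format_regex is a hypothetical name. This version does not produce the ";7 to 11" comments (it simply copies a quantifier after the closing tag), and it treats | inside unnamed groups and [...] classes as plain text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 32

typedef struct { char *buf; size_t cap, len; } Out;

/* Append n bytes and keep the buffer NUL-terminated. */
static void put(Out *o, const char *s, size_t n) {
    assert(o->len + n < o->cap);            /* caller supplies enough room */
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

static void indent(Out *o, int depth) {
    for (int i = 0; i < depth; i++) put(o, "    ", 4);
}

/* One alternative per line; named groups become <Name> ... </Name>. */
void format_regex(const char *re, char *buf, size_t cap) {
    Out o = { buf, cap, 0 };
    const char *name[MAX_DEPTH];             /* NULL marks an unnamed group */
    size_t nlen[MAX_DEPTH];
    int sp = 0, depth = 0, bol = 1;          /* bol: at beginning of a line */
    assert(cap > 0);
    buf[0] = '\0';
    for (size_t i = 0; re[i]; i++) {
        char c = re[i];
        if (c == '(' && re[i + 1] == '?' && (re[i + 2] == '\'' || re[i + 2] == '<')) {
            const char *start = re + i + 3;
            const char *end = strchr(start, re[i + 2] == '<' ? '>' : '\'');
            assert(end && sp < MAX_DEPTH);
            if (!bol) put(&o, "\n", 1);
            indent(&o, depth);
            put(&o, "<", 1);
            put(&o, start, (size_t)(end - start));
            put(&o, ">\n", 2);
            name[sp] = start;
            nlen[sp++] = (size_t)(end - start);
            depth++;
            bol = 1;
            i = (size_t)(end - re);          /* continue after the name */
        } else if (c == ')' && sp > 0 && name[sp - 1]) {
            sp--;
            depth--;
            if (!bol) put(&o, "\n", 1);
            indent(&o, depth);
            put(&o, "</", 2);
            put(&o, name[sp], nlen[sp]);
            put(&o, ">", 1);
            bol = 0;                         /* a quantifier may follow */
        } else if (c == '|' && (sp == 0 || name[sp - 1])) {
            put(&o, "\n", 1);                /* next alternative, same level */
            bol = 1;
        } else {
            size_t n = 1;
            if (c == '\\' && re[i + 1]) {
                n = 2;                       /* keep escapes together */
            } else if (c == '[') {
                const char *e = strchr(re + i, ']');
                if (e) n = (size_t)(e - (re + i)) + 1;   /* whole [...] class */
            } else if (c == '(') {
                assert(sp < MAX_DEPTH);
                name[sp++] = NULL;           /* unnamed group: copied as text */
            } else if (c == ')' && sp > 0) {
                sp--;
            }
            if (bol) { indent(&o, depth); bol = 0; }
            put(&o, re + i, n);
            i += n - 1;
        }
    }
    if (!bol) put(&o, "\n", 1);
}
```

For example, Hello|(?'Color'Red|(?<Sky>Grey|Blue)+)|World comes out as Hello, then a <Color> block containing Red and a nested <Sky> block, closed by </Sky>+ and </Color>, then World.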
post edited by Gort2015 - 2019/10/10 15:45:16

MPLab X playing up, bug in your code? Nevermind, Star Trek:Discovery will be with us soon.
https://www.youtube.com/watch?v=Iu1qa8N2ID0
+ ST:Continues, "What Ships are Made for", Q's back.
#1


    Gort2015
    Re: Regular Expression Engine 2019/10/10 16:05:00
    That is output from test code for the regex engine that I have been working on.
     
    Regex is based on groups; if you are not in a group, then you are in ROOT.
    ("Person of Interest")
     
    Capture classes can be inspected:
    /(?<Color>RED|ORANGE|(?<Borg>BLACK|(?<Gort>Klattu|Barada|Nikto)|GREEN|BLUE|(?<Shape>Rectangle|Triangle)|CYAN))/
    R:
    C->Name           -
    C->Parent     SELF
    C->Target         -
    C->Body           0
    C->TargetIndex    0
    C->Base           0
    C->End          110
    Siblings      'Color' [1 Child]
    -==-==-==-==-==-
    1:
    C->Name       'Color'
    C->Parent     ROOT
    C->Target     "(?<Borg>BLACK|(?<Gort>Klattu|Barada|Nikto)|GREEN|BLUE|(?<Shape>Rectangle|Triangle)|CYAN))", 21
    C->Body           8
    C->TargetIndex    2
    C->Base           1
    C->End          109
    Siblings      'Borg' [1 Child]
    -==-==-==-==-==-
    2:
    C->Name       'Borg'
    C->Parent     'Color'
    C->Target     "(?<Shape>Rectangle|Triangle)|CYAN)", 75
    C->Body           7
    C->TargetIndex    3
    C->Base          21
    C->End          108
    Siblings      'Gort', 'Shape' [2 Children]
    -==-==-==-==-==-
    3:
    C->Name       'Gort'
    C->Parent     'Borg'
    C->Target     "Triangle", 94
    C->Body           7
    C->TargetIndex    1
    C->Base          35
    C->End           62
    Siblings       [0 Children]
    -==-==-==-==-==-
    4:
    C->Name       'Shape'
    C->Parent     'Borg'
    C->Target         -
    C->Body           8
    C->TargetIndex    0
    C->Base          75
    C->End          102
    Siblings       [0 Children]
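A minimal sketch of how such a class table could be built in one scan, assuming a simplified record (name, parent index, offset of the opening '(' and of the matching ')') rather than the full Capture_t posted later in the thread. build_classes and Class are hypothetical names. ROOT is entry 0 and is its own parent (the SELF entry above), and the offsets are zero-based positions in the expression without the surrounding slashes, so they will not line up exactly with the numbers printed above.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLASSES 16

/* Simplified capture class (hypothetical): parent stored as an index. */
typedef struct {
    char name[16];   /* "" for ROOT */
    int  parent;     /* owner index; ROOT (0) is its own parent, like SELF */
    int  base;       /* offset of the opening '(' */
    int  end;        /* offset of the matching ')' */
} Class;

/* Scans `re` once and fills `cls`; returns the number of classes. */
int build_classes(const char *re, Class *cls, int max) {
    int stack[MAX_CLASSES];   /* class index per open '(', -1 if unnamed */
    int sp = 0, n = 1;
    assert(max >= 1);
    cls[0] = (Class){ "", 0, 0, (int)strlen(re) };
    for (int i = 0; re[i]; i++) {
        char c = re[i];
        if (c == '\\' && re[i + 1]) { i++; continue; }   /* escaped char */
        if (c == '[') {                                  /* skip [...] */
            const char *e = strchr(re + i, ']');
            if (e) i = (int)(e - re);
            continue;
        }
        if (c == '(') {
            int idx = -1;
            assert(sp < MAX_CLASSES);
            if (re[i + 1] == '?' && (re[i + 2] == '<' || re[i + 2] == '\'')) {
                const char *s = re + i + 3;
                const char *e = strchr(s, re[i + 2] == '<' ? '>' : '\'');
                assert(e && n < max && (size_t)(e - s) < sizeof cls[n].name);
                int parent = 0;                          /* default owner: ROOT */
                for (int k = sp - 1; k >= 0; k--)        /* nearest named owner */
                    if (stack[k] >= 0) { parent = stack[k]; break; }
                idx = n++;
                memcpy(cls[idx].name, s, (size_t)(e - s));
                cls[idx].name[e - s] = '\0';
                cls[idx].parent = parent;
                cls[idx].base = i;
                cls[idx].end = -1;                       /* filled in at ')' */
            }
            stack[sp++] = idx;
        } else if (c == ')' && sp > 0) {
            int idx = stack[--sp];
            if (idx >= 0) cls[idx].end = i;
        }
    }
    return n;
}
```

Unnamed groups are pushed as -1 so that their ')' is still paired correctly, but they never become classes; a named group's owner is the nearest enclosing named group, or ROOT.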


    #2
    Gort2015
    Re: Regular Expression Engine 2019/10/10 16:12:03
    Also this output:
    R:           (/(?<Color>RED|ORANGE|(?<Borg>BLACK|(?<Gort>Klattu|Barada|Nikto)|GREEN|BLUE|(?<Shape>Rectangle|Triangle)|CYAN))/
    1: <Color>        (RED|ORANGE|(?<Borg>BLACK|(?<Gort>Klattu|Barada|Nikto)|GREEN|BLUE|(?<Shape>Rectangle|Triangle)|CYAN))
    2: <Borg>               (BLACK|(?<Gort>Klattu|Barada|Nikto)|GREEN|BLUE|(?<Shape>Rectangle|Triangle)|CYAN)
    3: <Gort>                     (Klattu|Barada|Nikto)
    4: <Shape>                    (Rectangle|Triangle)

    Therefore you can see that Shape and Gort are siblings under Borg: both have Borg as their parent.
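Given only the parent links, children and siblings fall out of a simple scan. This sketch hard-codes the parent table from the listing above (ROOT, Color, Borg, Gort, Shape); children_of, are_siblings and class_name are hypothetical helpers, not part of the engine.

```c
#include <assert.h>

/* Parent table copied from the listing above; ROOT (0) owns itself. */
static const char *const names[] = { "ROOT", "Color", "Borg", "Gort", "Shape" };
static const int parent[]        = { 0,      0,       1,      2,      2       };
#define NCLASSES ((int)(sizeof parent / sizeof parent[0]))

const char *class_name(int k) {
    assert(k >= 0 && k < NCLASSES);
    return names[k];
}

/* Writes the direct children of class k into out[]; returns their count. */
int children_of(int k, int *out) {
    int n = 0;
    assert(k >= 0 && k < NCLASSES);
    for (int i = 1; i < NCLASSES; i++)   /* start at 1: skip ROOT's self-link */
        if (parent[i] == k) out[n++] = i;
    return n;
}

/* Two classes are siblings when they are distinct and share an owner. */
int are_siblings(int a, int b) {
    return a != b && a > 0 && b > 0 && a < NCLASSES && b < NCLASSES
        && parent[a] == parent[b];
}
```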

    #3
    Gort2015
    Re: Regular Expression Engine 2019/10/10 16:18:15
    The engine's progress can be shown, and any errors are highlighted:
    BTXT: Triangle
    ETXT: ........^
    BREX: (?<Color>RED|ORANGE|(?<Borg>BLACK|(?<Gort>Klattu|Barada|Nikto)|GREEN|BLUE|(?<Shape>Rectangle|Triangle)|CYAN))
    EREX: .............................................................................................................^
    Error #0 OK
    - end -
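The BTXT/ETXT and BREX/EREX pairs are the text followed by a row of dots ending in a caret at the current position. A sketch of that second line, assuming the position is a zero-based character offset (caret_line is a hypothetical name):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "LABEL: ....^" with one dot per character before pos.
   Returns the length of the line written into buf. */
int caret_line(char *buf, size_t cap, const char *label, size_t pos) {
    int n = snprintf(buf, cap, "%s: ", label);
    assert(n >= 0 && (size_t)n + pos + 2 <= cap);   /* dots, '^' and NUL */
    memset(buf + n, '.', pos);
    buf[(size_t)n + pos] = '^';
    buf[(size_t)n + pos + 1] = '\0';
    return n + (int)pos + 1;
}
```

With "Triangle" fully consumed, pos is 8 and the result is the ETXT line shown above.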

    post edited by Gort2015 - 2019/10/10 16:21:14

    #4
    Gort2015
    Re: Regular Expression Engine 2019/10/10 16:34:35
    Testing and formatting output: this is the part I am working on right now, because I need it to work with something else.
     
    If you are going to debug, do a proper job of it.
     
    Test output from:
     
    ^(?'Gort'Klattu|Barada|Nikto)|(?<Shape>Rectangle|Triangle)$
     
    to:
     
    ^ (? 'Gort' Klattu | Barada | Nikto ) | (? 'Shape' Rectangle | Triangle ) $
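A sketch of a tokenizer that produces the spaced-out form above: each of ^ $ | ( ) is its own token, a group header becomes (? followed by the name, and a run of literal characters is one token. Both <Name> and 'Name' are normalised to 'Name', as in the sample. tokenize is a hypothetical name; character classes, escapes and quantifiers are not split out.

```c
#include <assert.h>
#include <string.h>

/* Append one token to out, separated from the previous one by a space. */
static void emit(char *out, size_t cap, const char *tok, size_t n) {
    size_t len = strlen(out);
    assert(len + n + 2 <= cap);          /* space + token + NUL */
    if (len) out[len++] = ' ';
    memcpy(out + len, tok, n);
    out[len + n] = '\0';
}

/* Re-emit a regex as space-separated tokens, normalising group names. */
void tokenize(const char *re, char *out, size_t cap) {
    static const char single[] = "^$|()"; /* each is a token on its own */
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; re[i]; ) {
        if (re[i] == '(' && re[i + 1] == '?' && (re[i + 2] == '<' || re[i + 2] == '\'')) {
            const char *s = re + i + 3;
            const char *e = strchr(s, re[i + 2] == '<' ? '>' : '\'');
            char name[64];
            size_t n;
            assert(e);
            n = (size_t)(e - s);
            assert(n + 2 <= sizeof name);
            name[0] = '\'';                  /* <Name> and 'Name' both become 'Name' */
            memcpy(name + 1, s, n);
            name[n + 1] = '\'';
            emit(out, cap, "(?", 2);
            emit(out, cap, name, n + 2);
            i = (size_t)(e - re) + 1;
        } else if (strchr(single, re[i])) {
            emit(out, cap, re + i, 1);
            i++;
        } else {
            size_t n = 0;                    /* run of literal characters */
            while (re[i + n] && !strchr(single, re[i + n])) n++;
            emit(out, cap, re + i, n);
            i += n;
        }
    }
}
```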
     

    #5
    Gort2015
    Re: Regular Expression Engine 2019/10/10 16:39:51
    I did not have the structures in place when I first started on this.
     
    I will be adding conditions soon:
    IF THEN ELSE
     
    It will be a simple matter of moving between classes. Links to quantifier structs can exist in any capture class, including ROOT.
     
    ( ( ( a ) {3} ) {5} ) {7}
     
    for x = 1 to 7
        for y = 1 to 5
           for z = 1 to 3
                 a
           
     
    That looks easy enough, except that the loop is entered at a, the innermost level.
    It took a while to get the logic correct, though hardly any math is involved.
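The three quantifiers multiply, so ( ( ( a ) {3} ) {5} ) {7} matches exactly 3 × 5 × 7 = 105 a's. A recursive sketch that mirrors the nested loops, where the recursion bottoms out at the innermost atom a. It assumes exact counts only (no ranges, no backtracking), and counts, match_level and matches_nested are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Exact repeat counts from the innermost group outwards: {3}, {5}, {7}. */
static const int counts[] = { 3, 5, 7 };
#define LEVELS (sizeof counts / sizeof counts[0])

/* Match one instance of level k at s; return characters used, or -1. */
static long match_level(const char *s, size_t k) {
    assert(k <= LEVELS);
    if (k == 0)                          /* the atom itself: one 'a' */
        return *s == 'a' ? 1 : -1;
    long used = 0;
    for (int rep = 0; rep < counts[k - 1]; rep++) {   /* this level's loop */
        long n = match_level(s + used, k - 1);
        if (n < 0) return -1;
        used += n;
    }
    return used;
}

/* The whole string must be exactly one instance of the outermost group. */
int matches_nested(const char *s) {
    long n = match_level(s, LEVELS);
    return n >= 0 && s[n] == '\0';
}
```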

     
    Just thinking this out loud:
    why could MC not release the PK4 fully functional on day one, when they have teams?
    post edited by Gort2015 - 2019/10/10 17:14:06

    #6
    Gort2015
    Re: Regular Expression Engine 2019/10/10 17:18:31
    All collections can be maintained and linked by this structure.
    typedef struct Capture_t {                  // named tag so Parent can refer to this type
        char                   *Target;         // element match, or NULL
        const char             *Name;           // name of this object
        unsigned int            TargetIndex;    // container: element match index
        const int               Base;           // object start in string
        const int               End;            // object end in string
        const struct Capture_t *Parent;         // owner (ROOT points to itself)
        const unsigned int      Body;           // body offset, relative to the header
    }Capture_t;
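As an illustration of the Parent link: ROOT can be recognised as the object whose Parent points to itself (the SELF entry in the earlier listing), so a class's full path can be built by walking upwards. This sketch uses a trimmed copy of Capture_t with only the fields it needs; capture_path is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Trimmed copy of Capture_t: only the fields used here. */
typedef struct Capture_t {
    const char             *Name;    /* name of this object, NULL for ROOT */
    const int               Base;    /* object start in string */
    const int               End;     /* object end in string */
    const struct Capture_t *Parent;  /* owner; ROOT points to itself */
} Capture_t;

/* Writes e.g. "ROOT/Color/Borg/Gort" by following Parent links to ROOT. */
void capture_path(const Capture_t *c, char *out, size_t cap) {
    const Capture_t *chain[16];
    int n = 0;
    for (; c->Parent != c; c = c->Parent) {   /* collect innermost first */
        assert(n < 16);
        chain[n++] = c;
    }
    assert(cap > 4);
    strcpy(out, "ROOT");
    while (n-- > 0) {                         /* emit outermost first */
        assert(strlen(out) + strlen(chain[n]->Name) + 2 <= cap);
        strcat(out, "/");
        strcat(out, chain[n]->Name);
    }
}
```

For example, objects initialised with the Base/End values from the earlier listing give ROOT/Color/Borg/Gort for Gort.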


    #7
    Aussie Susan
    Super Member
    • Total Posts : 3628
    • Reward points : 0
    • Joined: 2008/08/18 22:20:40
    • Location: Melbourne, Australia
    • Status: offline
    Re: Regular Expression Engine 2019/10/10 18:10:11
    I've been working with REs for many years (decades???) and the old adage is still true:
    "if you have a problem and think that the solution is a regex then you have 2 problems"
    Susan
    #8
    Gort2015
    Re: Regular Expression Engine 2019/10/10 18:44:09
    The pre-processor creates the capture classes before the engine runs.
     
    There's C code to produce the formatted output, and the ultra-small pre-processor can be used for many other tasks.
     
    Entering a regular expression into the pre-processor builds the class objects, and the output can then be formatted without running the regex engine at all.
     
    Pre-processor size: 46 instructions without names,
    64 instructions when using names.
     
    Process keywords with regex:
     
    "for (int x =0 ; x < 10 ; x ++) {"
    capture:
    keyword: for
           type:  int
           begin: 0
           end: <10
           step: 1
           compound: true
     
    OR:
     
    mov x, y
     
    mnemonic: mov
         src: Ws
         dst: Wd
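For comparison, the for-loop captures can be approximated with the standard POSIX regex.h API (regcomp/regexec with REG_EXTENDED), which has numbered rather than named groups. This is not the author's engine: the pattern, ForLoop and parse_for are assumptions, and only a ++ step is recognised, so step is always 1.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical record mirroring the captures listed above. */
typedef struct {
    char keyword[8], type[16], var[16], begin[16], end[16];
    int  inclusive;   /* 1 for "<=", 0 for "<" */
    int  step;        /* only "++" is recognised, so this is always 1 */
    int  compound;    /* 1 if '{' follows the header */
} ForLoop;

static void copy_group(char *dst, size_t cap, const char *src, regmatch_t m) {
    size_t n = (size_t)(m.rm_eo - m.rm_so);
    assert(m.rm_so >= 0 && n < cap);
    memcpy(dst, src + m.rm_so, n);
    dst[n] = '\0';
}

/* Returns 1 and fills *out if `line` looks like "for (T v = a ; v < b ; v ++) {". */
int parse_for(const char *line, ForLoop *out) {
    static const char pat[] =
        "^(for)[[:space:]]*[(][[:space:]]*"                          /* 1 keyword */
        "([A-Za-z_]+)[[:space:]]+([A-Za-z_][A-Za-z0-9_]*)"           /* 2 type, 3 var */
        "[[:space:]]*=[[:space:]]*([0-9]+)[[:space:]]*;"             /* 4 begin */
        "[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*(<=?)"        /* 5 compare */
        "[[:space:]]*([0-9]+)[[:space:]]*;"                          /* 6 end */
        "[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*[+][[:space:]]*[+]"
        "[[:space:]]*[)][[:space:]]*([{])?";                         /* 7 brace */
    regex_t re;
    regmatch_t m[8];
    if (regcomp(&re, pat, REG_EXTENDED) != 0) return 0;
    int ok = regexec(&re, line, 8, m, 0) == 0;
    if (ok) {
        copy_group(out->keyword, sizeof out->keyword, line, m[1]);
        copy_group(out->type, sizeof out->type, line, m[2]);
        copy_group(out->var, sizeof out->var, line, m[3]);
        copy_group(out->begin, sizeof out->begin, line, m[4]);
        copy_group(out->end, sizeof out->end, line, m[6]);
        out->inclusive = (m[5].rm_eo - m[5].rm_so) == 2;
        out->step = 1;
        out->compound = m[7].rm_so != -1;   /* unmatched groups report -1 */
    }
    regfree(&re);
    return ok;
}
```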
     
    Further, an opcode can be produced by setting the correct bits via a regex extension:
    0111 1www wBhh hddd dggg ssss
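One way to read that template is that '0'/'1' are fixed bits and each letter marks a field whose value is written MSB-first into that letter's positions. A sketch of such an encoder follows; encode is a hypothetical name, and treating s and d as the source and destination register numbers of mov is an assumption about the template, not something stated in the post. With s = 1, d = 2 and every other field 0, it produces 0x780101.

```c
#include <assert.h>
#include <stdint.h>

/* Fill a bit template such as "0111 1www wBhh hddd dggg ssss".
   '0'/'1' are fixed bits, spaces are ignored, every other character is a
   field letter; fields[c] is written MSB-first into the positions of c. */
uint32_t encode(const char *tmpl, const uint32_t fields[128]) {
    int width[128] = {0}, used[128] = {0};
    for (const char *p = tmpl; *p; p++) {            /* pass 1: field widths */
        unsigned char c = (unsigned char)*p;
        assert(c < 128);
        if (c != ' ' && c != '0' && c != '1') width[c]++;
    }
    uint32_t op = 0;
    int bits = 0;
    for (const char *p = tmpl; *p; p++) {            /* pass 2: assemble */
        unsigned char c = (unsigned char)*p;
        uint32_t bit;
        if (c == ' ') continue;
        if (c == '0' || c == '1') {
            bit = (uint32_t)(c - '0');
        } else {
            assert(width[c] < 32 && (fields[c] >> width[c]) == 0);   /* value fits */
            bit = (fields[c] >> (width[c] - 1 - used[c]++)) & 1u;
        }
        bits++;
        assert(bits <= 32);                          /* result fits in 32 bits */
        op = (op << 1) | bit;
    }
    return op;
}
```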
     
    You could even convert 6502 code to 16-bit code.
     
     
    I needed to find a word that occurred in many places, including comments, but I only wanted to see it where it appeared in code. That is handy now that regex search is part of MPLAB X, rather than searching all files via project properties. That one: .*

    #9
    pcbbc
    Super Member
    • Total Posts : 1373
    • Reward points : 0
    • Joined: 2014/03/27 07:04:41
    • Location: 0
    • Status: offline
    Re: Regular Expression Engine 2019/10/11 06:16:33
    Seems to me you've just introduced a whole load of ambiguities around white space. Not only in the indentation in the pseudo-XML layout, but also in a regex like "[A TO Z]", which then conflicts with the expansion for "[A-Z]".
     
    Also, where's the consistency in handling {} quantifiers? Sometimes they appear as a postfix to the tag, as in <Sky> ;7 to 11, and sometimes just unmodified, as in Klat{2}u.
     
    But there you go. Just my 2 pence worth.
    #10
    Gort2015
    Re: Regular Expression Engine 2019/10/11 09:28:48
    It is not XML and it is not meant to be consistent; it is test output after running the pre-processor (not the regex engine) on one of my test strings.
    '<' and '>' highlight the name when I want to view the capture classes in tree view.
    If a class has a quantifier, the output gets a comment with the at-least and at-most counts.
     
    I run tests like these with regex and compare the results with https://regex101.com/r/cO8lqs/2
     
    A | B | C | (? 'Choice' (? 'Scifi' Discovery | The_Orville ) | ( Z_Nation | TWD ) | (? 'Fantasy' GOT | Vikings ) )
     
    Vikings matches ROOT, Choice and Fantasy.
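A sketch of how the "which classes did the match pass through" result can be computed for this particular test string, assuming space-separated tokens and only literal words, |, unnamed ( ... ) groups and named (? 'Name' ... ) groups (no quantifiers, no backtracking). match_path is a hypothetical name; for Vikings it reports ROOT Choice Fantasy, and unnamed groups are not listed.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOK 64

typedef struct {
    char       *tok[MAX_TOK];
    int         ntok, pos;
    const char *word;         /* literal we are looking for */
    const char *hit[8];       /* named groups on the match path, innermost first */
    int         nhit;
} Matcher;

static int parse_alt(Matcher *m);

/* item := WORD | "(" alt ")" | "(?" 'Name' alt ")" ; 1 if word found inside */
static int parse_item(Matcher *m) {
    assert(m->pos < m->ntok);
    const char *t = m->tok[m->pos++];
    if (strcmp(t, "(") == 0 || strcmp(t, "(?") == 0) {
        const char *name = NULL;
        if (t[1] == '?') { assert(m->pos < m->ntok); name = m->tok[m->pos++]; }
        int found = parse_alt(m);
        assert(m->pos < m->ntok && strcmp(m->tok[m->pos], ")") == 0);
        m->pos++;
        if (found && name) { assert(m->nhit < 8); m->hit[m->nhit++] = name; }
        return found;
    }
    return strcmp(t, m->word) == 0;
}

/* alt := item ( "|" item )* ; every alternative is parsed so pos stays in step */
static int parse_alt(Matcher *m) {
    int found = 0;
    for (;;) {
        found |= parse_item(m);
        if (m->pos < m->ntok && strcmp(m->tok[m->pos], "|") == 0) m->pos++;
        else return found;
    }
}

/* Writes e.g. "ROOT Choice Fantasy" into out; returns 0 if word never matches. */
int match_path(const char *expr, const char *word, char *out, size_t cap) {
    char buf[512];
    Matcher m = { .word = word };
    assert(strlen(expr) < sizeof buf && cap > 4);
    strcpy(buf, expr);
    for (char *t = strtok(buf, " "); t; t = strtok(NULL, " ")) {
        assert(m.ntok < MAX_TOK);
        m.tok[m.ntok++] = t;
    }
    if (m.ntok == 0 || !parse_alt(&m)) return 0;
    strcpy(out, "ROOT");
    for (int i = m.nhit - 1; i >= 0; i--) {       /* outermost group first */
        size_t len = strlen(out), n = strlen(m.hit[i]);
        assert(n >= 2 && len + n <= cap);         /* name + space + NUL */
        n -= 2;                                   /* drop the surrounding quotes */
        out[len] = ' ';
        memcpy(out + len + 1, m.hit[i] + 1, n);
        out[len + 1 + n] = '\0';
    }
    return 1;
}
```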

    #11
    © 2019 APG vNext Commercial Version 4.5