     The CRM114 Quick Reference Card.  Updated 20090201

     Copyright W.S. Yerazunis, 2002-2009.  All rights reserved. This 
     software is released under V2.1 of the Gnu Public License. Go to 
     www.fsf.org to get a complete copy of the license.

     This is the CRM114 Language Quick Reference.  For information on the 
     mailfilter, see the CRM114_Mailfilter_HOWTO.

-----  THE COMMAND LINE -------------

Invoke as 'crm whatever' or use '#!/usr/bin/crm' as the first line of a 
script file containing the program text.

Command Line Options:

 -{statements}  - execute the statements inside the {} brackets

 -b N           - sets a breakpoint on statement N

 -d N           - run N cycles, then drop into debugger. If no N, debug 
                  immediately

 -e             - no environment variables imported

 -E N           - set engine runtime exit base value N

 -h             - print help text

 -H N           - select hash function N - handle this with the utmost
                  care! (default=0)

 -l N           - print a listing (detail level 1 through 5)

 -m N           - max number of microgroomed buckets in a chain

 -M N           - max chain length - triggers microgrooming if enabled

 -p             - generate an execution-time-spent profile on exit

 -P N           - max program lines @ 128 chars/line

 -q N           - math mode: 0 = algebraic, 1 = RPN (both in EVAL only);
                  2 = algebraic, 3 = RPN (both everywhere)

 -r N           - set OSBF min pmax/pmin ratio (default=9)

 -s N           - new sparse spectra feature file (.css) size is N
                  (default 1 meg+1 featureslots)

 -S N           - new feature file (.css) size is N rounded up to 2^I+1 
                  featureslots

 -C             - use environment locale (default POSIX)

 -t             - give user level execution trace output

 -T             - implementers trace output (only for the masochistic!)

 -u dir         - chdir to directory dir before starting execution

 -v             - print CRM114 version identification and exit

 -w N           - max data window (bytes, default 16 megs)

 --             - signals the end of CRM114 flags; prior flags are not seen 
                  by the user program; subsequent args are not processed by 
                  CRM114.

 --foo          - creates the user variable :foo: with the value SET

 --x=y          - creates the user variable :x: with the value y

 -in file       - use file instead of stdin for input. Note that '-in' may 
                  specify the standard handle value '0' for stdin. Accepted 
                  as 'stdin' representatives:

                    0
                    -
                    stdin
                    /dev/stdin
                    CON:
                    /dev/tty

                  NOT allowed are the stdout/stderr handle values 1 and 2: 
                  crm114 will report an error if you use them.

 -out file      - use file instead of stdout for output. See '-err' for ways 
                  to specify stdout/stderr.

 -err file      - use file instead of stderr for output. Note that '-out' 
                  may use the same file as '-err'. Note also that '-out' and 
                  '-err' may specify the standard handle values '1' for 
                  stdout and '2' for stderr. This implies that '-err 1' is 
                  essentially identical to the UNIX shell '2>&1' 
                  redirection, though without the buffer delays that would 
                  otherwise occur when you mix stdout and stderr output to a 
                  single channel.

                  Accepted as default stdio channel (stdout for -out, stderr 
                  for -err) representatives:

                    -
                    CON:
                    /dev/tty

                  Accepted as EXPLICIT 'stdout' representatives:

                    1
                    stdout
                    /dev/stdout

                  and likewise for explicit stderr:

                    2
                    stderr
                    /dev/stderr

                  NOT allowed is the stdin handle value 0: crm114 will 
                  report an error if you use it.

 -Cdbg          - direct developer support: trigger the C/IDE debugger when 
                  an internal error is hit.

                  WARNING: only available when 'crm114 -v' reports crm114 
                           was built with assertions enabled in the code.
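
The names accepted by '-in', '-out' and '-err' in the option list above can be summarized as a small lookup.  The following Python sketch is only a model of the rules on this card, not CRM114's own code; the function name resolve_stdio and its return strings are assumptions made for illustration, and names the card does not list are simply treated as ordinary file names here.

```python
# Model of the '-in' / '-out' / '-err' name rules from this card.
# resolve_stdio() and its return strings are hypothetical.

DEFAULT_NAMES = {"-", "CON:", "/dev/tty"}     # "the default channel of this flag"
STDIN_NAMES = {"0", "stdin", "/dev/stdin"}
STDOUT_NAMES = {"1", "stdout", "/dev/stdout"}
STDERR_NAMES = {"2", "stderr", "/dev/stderr"}


def resolve_stdio(flag, name):
    """Return 'stdin', 'stdout', 'stderr' or 'file' for a flag argument.

    Raises ValueError for the combinations the card says are rejected.
    """
    if flag == "-in":
        if name in {"1", "2"}:
            raise ValueError("-in may not use handle 1 or 2")
        if name in STDIN_NAMES or name in DEFAULT_NAMES:
            return "stdin"
        return "file"                         # anything else: a file name
    if flag in ("-out", "-err"):
        if name == "0":
            raise ValueError(flag + " may not use handle 0")
        if name in DEFAULT_NAMES:
            return "stdout" if flag == "-out" else "stderr"
        if name in STDOUT_NAMES:
            return "stdout"
        if name in STDERR_NAMES:
            return "stderr"
        return "file"                         # anything else: a file name
    raise ValueError("unknown flag: " + flag)
```

In this model, resolve_stdio("-err", "1") yields "stdout", which corresponds to the '2>&1' case described under '-err'.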

Absent the -{ program } flag, the first arg is taken to be the name of a 
file containing a crm114 program; subsequent args are merely supplied as 
:_argN: values.  Use single quotes around command line programs '-{ like 
this }' to prevent the shell from doing odd things to your command-line 
programs.

CRM114 can be directly invoked by the shell if the first line of your 
program file uses the shell standard, as in:

        #! /usr/bin/crm

You can use CRM114 flags on the shell-standard invocation line, and hide 
them with '--' from the program itself; '--' incidentally prevents the 
invoking user from changing any CRM114 invocation flags.

Flags should be located after any positional variables on the command line. 
Flags _are_ visible as :_argN: variables, so you can create your own flags 
for your own programs (separate CRM114 and user flags with '--').

Examples:

   ./foo.crm bar mugga < baz  -t -w 150000      <--- Use this

   ./foo.crm -t -w 1500000 -- bar < baz mugga   <--- or this

   ./foo.crm -t -w 150000 bar < baz mugga       <--- NOT like this
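
The '--foo' and '--x=y' flags from the option list create user variables, and the short Python sketch below models just those two rules.  It is an illustration under assumptions, not CRM114's argument parser: the helper name user_vars is hypothetical, and the handling of everything other than '--name' and '--name=value' is deliberately ignored.

```python
# Model of the '--foo' and '--x=y' rules; user_vars() is a hypothetical helper.

def user_vars(args):
    """Map '--name' and '--name=value' arguments to CRM114-style variables."""
    result = {}
    for arg in args:
        if not arg.startswith("--") or arg == "--":
            continue          # positional arg, single-dash flag, or bare '--'
        name, sep, value = arg[2:].partition("=")
        # '--foo' alone gets the value SET; '--x=y' gets the value y
        result[":" + name + ":"] = value if sep else "SET"
    return result
```

For example, user_vars(["bar", "--foo", "--x=y", "-t"]) gives a mapping with :foo: set to SET and :x: set to y.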


You can put a list of user-settable vars on the '#!/usr/bin/crm' invocation 
line.  CRM114 will print these out when a program is invoked directly (e.g. 
"./myprog.crm -h", not "crm myprog.crm -h") with the -h (for help) flag. 
(Note that this works ONLY on Linux and Darwin; FreeBSD and Solaris have 
different implementations, and this doesn't work there.  Don't use this in 
programs that need to be portable.)

Example:

#!/usr/bin/crm  -( var1 var2=A var2=B var2=C )

                        - allows only var1 and var2 to be set on the command 
                          line.  If a variable is not assigned a value in the 
                          list, it may be set to any value desired.  If the 
                          variable is equated to a set of values, those are 
                          the _only_ values allowed.

#!/usr/bin/crm  -( var1 var2=foo )  --

                        - allows var1 to be set to any value, var2 may only 
                          be set to either "foo" or not at all, and no other 
                          variables may be set nor may invocation flags be 
                          changed (because of the trailing "--").  Since 
                          "--" also blocks '-h' for help, such programs 
                          should provide their own help facility.
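
The '-( ... )' restriction list can be read as a small table of permitted values per variable.  The Python sketch below models that reading; parse_allowed and is_allowed are hypothetical names, and the sketch assumes the list is whitespace-separated exactly as in the two examples above.

```python
# Model of the '-( var1 var2=A var2=B )' restriction list.
# parse_allowed() and is_allowed() are hypothetical names.

def parse_allowed(spec):
    """Turn 'var1 var2=A var2=B' into {'var1': None, 'var2': {'A', 'B'}}.

    None means "any value may be given".
    """
    allowed = {}
    for token in spec.split():
        name, sep, value = token.partition("=")
        if sep:
            values = allowed.get(name)
            if values is None:
                values = allowed[name] = set()
            values.add(value)             # one more permitted value
        else:
            allowed.setdefault(name, None)
    return allowed


def is_allowed(allowed, name, value):
    """True if the user may set variable `name` to `value`."""
    if name not in allowed:
        return False                      # variable not in the list at all
    values = allowed[name]
    return values is None or value in values
```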

----- VARIABLES ----------

Variable names and locations start with a ':', end with a ':', and may 
contain only characters that have ink (i.e. the [:graph:] class), with a few 
exceptions (basically, no embedded ':' characters).  They are case 
sensitive.

Examples: :here:, :ThErE:, :every-where_0123+45%6789:, 
:this_is_a_very_very_long_var_name_that_does_not_tell_us_much:.

Builtin variables:

          :_nl:                - newline

          :_ht:                - horizontal tab

          :_bs:                - backspace

          :_sl:                - a slash

          :_sc:                - a semicolon

          :_arg0: thru :_argN: - command-line args, including _all_ flags

          :_argc:              - how many command line arguments there were

          :_pos0: thru :_posN: - positional args ('-' or '--' args deleted)

          :_posc:              - how many positional arguments there were

          :_pos_str:           - all positional arguments concatenated

          :_env_whatever:      - environment value 'whatever'

          :_env_string:        - all environmental arguments concatenated

          :_crm_version:       - the version of the CRM system

          :_cd:                - the current call depth

          :_cs:                - the current statement number

          :_pgm_hash:          - hash of the current program - for version 
                                 verification

          :_pgm_text:          - copy of post-processed source code -
                                 matchable

          :_pid:               - process ID of the current process.

          :_ppid:              - process ID of the parent of the current 
                                 process.

          :_dw:                - the current data window contents (usually 
                                 the default arg)

          :_iso:               - the current isolated data block (change at 
                                 your own peril!)

          :_?:                 - watchable variable: last (trappable) 
                                 failure report message; ONLY AVAILABLE when 
                                 the debugger will be used (-d commandline 
                                 switch / 'debug' statement in script)

----  VARIABLE EXPANSION  ----

You can use the standard C char constant '\' characters, such as "\n" for 
newline, as well as escaped hexadecimal and octal characters like \xHH and 
\oOOO, but these are constants, not variables, and cannot be redefined.

Variables are expanded by the ':*:' var-expansion operator, e.g. :*:_nl: 
expands to a newline character.  Uninitialized vars evaluate to their text 
name (and the colons stay).  User variables are also expanded with the :*: 
operator, so :*:foo: expands to whatever value :foo: has.

Variables are indirected by the :+: indirection operator; the reason for the 
:+: operator is that if :foo: contains the name of another variable (such as 
might happen in a CALL statement), then :*: would only return the name of 
that other variable, but :+: would return the value in that other variable. 
Use :+: and :*:_cd: to get proper isolation in non-tail-recursive variables, 
like :+:foo_:*:_cd:: to get the value of a recursively labeled foo_0, foo_1, 
foo_2, etc.
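
The difference between :*: and :+: can be modelled with an ordinary dictionary.  This Python sketch is only an analogy for the two operators as described above; star and plus are hypothetical names, and the recursive foo_N naming scheme is not modelled.

```python
# Model of :*: expansion and :+: indirection over a plain dict.
# star() and plus() are hypothetical names, not CRM114 functions.

def star(variables, name):
    """:*:name: -> the variable's value, or its own name if uninitialized."""
    return variables.get(name, name)


def plus(variables, name):
    """:+:name: -> the value of the variable whose name is stored in name."""
    return star(variables, star(variables, name))


# :foo: holds the name of another variable, as might happen in a CALL
variables = {":foo:": ":bar:", ":bar:": "hello"}
```

With these values, star(variables, ":foo:") returns only the name ":bar:", while plus(variables, ":foo:") returns "hello".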

Depending on the value of "math mode" (flag -q), you can also use 
:#:string_or_var: to get the length of a string, and :@:string_or_var: to do 
basic mathematics and inequality testing, either only in EVALs or for all 
var-expanded expressions.  See "Sequence of Evaluation" below for more 
details.


-----  PROGRAM BEHAVIOR  ----

Default behavior is to read all of standard input till EOF into the default 
data window (named :_dw:), then execute the program (this is overridden if 
the first executable statement is a WINDOW statement).

Variables don't get their own storage unless you ISOLATE them (see below); 
instead, variables are start/length pairs indexing into the default data 
window.  Thus, ALTERing an unISOLATEd variable changes the value of the 
default data buffer itself.  This is a great power, so use it only for good, 
and never for evil.
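
The start/length behavior can be pictured with a toy buffer.  The Python sketch below is a simplified model, not CRM114's memory manager: DataWindow and its methods are hypothetical, and it does not adjust the spans of other variables when an ALTER changes the text length.

```python
# Model of non-ISOLATEd variables as (start, length) views into one buffer.
# DataWindow and its methods are hypothetical.

class DataWindow:
    def __init__(self, text):
        self.data = text              # stands in for the data window :_dw:
        self.spans = {}               # variable name -> (start, length)

    def bind(self, name, start, length):
        """Record a variable as a start/length pair; no text is copied."""
        self.spans[name] = (start, length)

    def value(self, name):
        """Read the variable's text straight out of the window."""
        start, length = self.spans[name]
        return self.data[start:start + length]

    def alter(self, name, new_value):
        """Replace the variable's text in place, inside the window itself."""
        start, length = self.spans[name]
        self.data = self.data[:start] + new_value + self.data[start + length:]
        self.spans[name] = (start, len(new_value))
```

Binding a variable to "world" inside "hello world" and then altering it to "there" leaves the window itself reading "hello there".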



--- STATEMENTS AND STUFF (separate statements with a ';' or with a newline) --

 \      - '\' is the string-text escape character.  You only _need_ to 
           escape the literal representation of closing delimiters inside 
           var-expanded arguments.

           You can use the classic C/C++ \-escapes, such as \n, \r, \t, \a, 
           \b, \v, \f, \0, for the ASCII-defined escape sequences, and also 
           \xHH and \oOOO for hex and octal characters, respectively.

           A '\' as the _last_ character of a line means the next line is 
           just a continuation of this one.

           A \-escape that isn't recognized as something special isn't an 
           error; you may _optionally_ escape any of these delimiters:

                         > ) ] } ; / # \

           and get just that character.

           A '\' anywhere else is just a literal backslash, so the regex 
           ([abc])\1 is written just that way; there is no need to double-
           backslash the \1 (although it will work if you do).  This is 
           because the first backslash escapes the second backslash, so only 
           one backslash is "seen" at runtime.


# this is a comment
# and this too \#                 - A comment is not a piece of preprocessor 
                                    sugar - it is a -statement- and ends at 
                                    the newline or at "\#"

                                    However, comments _can_ be added to lead 
                                    or trail other commands as in-line 
                                    documentation without the need to use
                                    a ';' semicolon to separate the statements,
                                    e.g.:

                                      alius   # comment for the alius action here
                                      # leading comment \#    output /hello/


insert filename
insert [expanded_filename]
                                  - inserts the file verbatim at this line 
                                    at compile time.  If the file can't be 
                                    INSERTed, a system-generated FAULT 
                                    statement is inserted.  Use a TRAP to 
                                    catch this fault if you want to allow 
                                    program execution to continue without 
                                    the missing INSERT file.

       filename                   - the local (-u applied) file to insert

       [expanded_filename]        - the filename is first expanded against 
                                    command-line and environment variables.


;                                 - semicolon is a statement separator -
                                    unless it's inside delimiters it must be 
                                    escaped as \; or else it _will_ mark the 
                                    end of the statement.


{  }                              - start and end blocks of statements. Must 
                                    always be '\' escaped or inside 
                                    delimiters or these will mark the 
                                    start/end of a block.


noop                              - no-op statement


:label:                           - define a GOTOable label

:label: (:arg:)                   - define a CALLable label.  The args in 
                                    the CALL statement are concatenated and 
                                    put into the freshly ISOLATEd var :arg:

:label: (:arg1:) (:arg2:)         - define a CALLable label.  The args are
                                    produced by the CALL or SORT statement and
                                    put into the freshly ISOLATEd vars :arg1:
                                    and :arg2:.
                                    VERY EXPERIMENTAL!!!

   (:arg:)                        - var-expanded varname to receive the 
                                    caller's arguments (usually a MATCH is 
                                    then done to put locally convenient 
                                    labels on the args).


accept                            - writes the current data window to 
                                    standard output; execution continues.


alius                             - if the last bracket-group succeeded, 
                                    ALIUS skips to end of the outer {} block 
                                    (a skip, not a FAIL); if the prior group 
                                    FAILed, ALIUS does nothing.  Thus, ALIUS 
                                    is both an ELSE clause and a CASE 
                                    statement.

                                    Nota Bene: ALIUS skips to the "}" end of
                                               THE SMALLEST {} block ENCLOSING
                                               THE ALIUS statement.
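
A chain of bracket-groups separated by ALIUS behaves like "first success wins".  The Python sketch below models only that control flow; run_alius_chain is a hypothetical helper, each group is represented by a callable that returns True on success and False on FAIL, and what the enclosing block does when every group FAILs is not modelled.

```python
# Model of '{ {A} alius {B} alius {C} }' as first-success-wins.
# run_alius_chain() is a hypothetical helper.

def run_alius_chain(groups):
    """Run bracket-groups in order; return the index of the first success.

    Each group is a callable returning True (succeeded) or False (FAILed).
    Returns None if every group FAILed.
    """
    for index, group in enumerate(groups):
        if group():
            return index      # the ALIUS after this group skips the rest
        # the group FAILed: the following ALIUS does nothing, so the
        # next group gets its turn
    return None
```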


alter (:var:) /new-val/           - surgically change value of var to new-val

      (:var:)                     - var to change (var-expanded)

              /new-val/           - value to change to (var-expanded)


call /:entrypoint_label:/
call /:entrypoint_label:/ [:arg1: :arg2:... ]
call /:entrypoint_label:/ [:arg1: :arg2:... ] (:ret_arg:)
call /:entrypoint_label:/ [:arg1: :arg2:... ] [:2nd_arg1: :2nd_arg2:... ] (:ret_arg:)   /* EXPERIMENTAL */
                                    VERY EXPERIMENTAL!!!

                                  - do a routine call on the specified (var-
                                    expanded) entrypoint label. Note that 
                                    the called routine shares all variables 
                                    (including the data window :_dw:). 
                                    Return is accomplished with the RETURN 
                                    statement.

     /:entrypoint_label:/         - the location to call

       [:arg1: :arg2: ...]        - var-expanded list of args to call. These 
                                    are concatenated and supplied to the 
                                    called routine as a single ISOLATEd var, 
                                    to be used as desired (usually a MATCH 
                                    parses the arglist as desired, then :*: 
                                    is used for call-by-value arguments, and 
                                    :+: indirection is used to retrieve 
                                    call-by-name arguments).  Call-by-value 
                                    arguments are NOT modifiable by the 
                                    callee, while call-by-name arguments are 
                                    modifiable.

        (:ret_arg:)               - this variable gets the returned value 
                                    from the routine called (if it returns 
                                    anything).  If it had a previous value, 
                                    that value is overwritten on return.


classify <flags> (:c1:...|...:cN:) (:stats:) [:in:] /word-pat/
classify <flags> (:c1:...|...:cN:) (:stats:) [:in:] /word-pat/ /pR_offset/
classify <flags> (:c1:...|...:cN:) (:stats:) [:in:] /word-pat/ /svm-specific controls/
                                  - compare the statistics of the current 
                                    data window buffer with classfiles 
                                    c1...cN . In general, class statistics 
                                    files are NOT portable between different 
                                    classifiers!

      <nocase>                    - ignore case in word-pat, does not ignore 
                                    case in actual text (use tr() or the 
                                    TRANSLATE command to do that on :in: if 
                                    you want it)

      <microgroom>                - enable the microgroomer to purge less-
                                    important information automatically 
                                    whenever the statistics file gets too 
                                    crowded.  However, this disables certain 
                                    optimizations that can speed 
                                    classification.

      <unique>                    - use unique features only; this improves 
                                    accuracy while using less memory. Usable 
                                    with Markov and OSB modes.

      <unigram>                   - use single-word features only; this 
                                    makes CRM114 almost exactly equivalent 
                                    to most other Bayesian classifiers. 
                                    Works with the OSB, Winnow and 
                                    hyperspace classifiers.

      <osb>                       - use orthogonal sparse bigram (OSB) 
                                    features and Markovian classification 
                                    instead of Markovian SBPH features. OSB 
                                    uses a subset of SBPH features with 
                                    about 1/4 the memory and disk needs, and 
                                    about 4x the speed of full Markovian, 
                                    with basically the same accuracy.

      <osbf>                      - use the Fidelis Confidence Factor local 
                                    probability generator.  This format is 
                                    not compatible with the default, but 
                                    with single-sided threshold training it 
                                    achieves the best accuracy yet.

      <winnow>                    - use the Winnow non-statistical 
                                    classifier and the OSB front-end feature 
                                    generator. Winnow uses .cow files, which 
                                    are not compatible with the .css files 
                                    for the Markovian (default) and OSB 
                                    classifiers.

      <hyperspace>                - use hyperspace matching; each learned 
                                    document represents a light source in a 
                                    4-billion-dimensional hyperspace, and 
                                    the set of sources that shines most 
                                    brightly onto the unknown document's 
                                    hyper-spatial location is the matching 
                                    class.  EXPERIMENTAL!!!

      <entropy>                   - use the bit-entropy classifier.  This 
                                    uses compressibility of the unknown 
                                    given the prior learned text as a 
                                    perfect compressor model.  No 
                                    tokenization happens- this classifier 
                                    works one bit at a time, always. 
                                    EXPERIMENTAL !!!

      <fscm>                      - use the fast substring compression 
                                    matcher. This measures the 
                                    compressibility of an unknown text using 
                                    the known texts as a compression 
                                    dictionary.  Tokenization is used as a 
                                    compressibility filter (default 
                                    tokenization is /./, which makes FSCM 
                                    equivalent to LZ77 with an infinite 
                                    window).

      <svm>                       - use the SVM classifier.  This uses SVM 
                                    (support vector machine) techniques. 
                                    NB: for now VERY EXPERIMENTAL; OSB or 
                                    unigram features (default OSB features), 
                                    2-class only, generates A_vs_B files.

      <sks>                       - use the String Kernel SVM.  String 
                                    kernels take one character at a time as 
                                    token features, but don't use omitted 
                                    subsections like the OSB feature set. 
                                    VERY EXPERIMENTAL. 2-class only, 
                                    generates A_vs_B files.

      <neural>                    - use a three-layer neural network with 
                                    stochastic back-propagation training. 
                                    Use <fromstart> to reinitialize the 
                                    network neurons to a small random state 
                                    in case it gets stuck in a (rare) local 
                                    minimum. Use <bychunk> to update weights 
                                    after each document in a pass (default 
                                    is to update weights only after all 
                                    documents are seen in a pass). VERY 
                                    EXPERIMENTAL!!!

      <correlate>                 - use the full correlative matcher.  Very 
                                    slow, but capable of matching stemmed 
                                    words in any language and of matching 
                                    binary files

               (:c1: ...          - file or files to consider "success" 
                                    files.  The CLASSIFY succeeds if these 
                                    files as a group match best; if not, the 
                                    CLASSIFY does a FAIL.

                     |            - optional separator.  Spaces on each side 
                                    of the " | " are required.

                     .... :cN:)   - optional files to the right of " | " are 
                                    considered as a group to "fail". If 
                                    statement fails, execution skips to end 
                                    of enclosing {..} block, which exits 
                                    with a FAIL status (see ALIUS for why 
                                    this is useful).

                 (:stats:)        - optional var that will be
                                    surgically changed to contain
                                    a formatted matching summary. In
                                    some versions, must pre-exist.

                   [:in:]         - restrict statistical measure to the 
                                    string inside :in:

                   [:in: n m]     - take a substring of :in:, starting at n 
                                    and including m characters

                   [:in: /regex/] - take a substring of :in: that matches 
                                    the regex

                     /word-pat/   - regex to describe what a parse-able word 
                                    is. Default is /[[:graph:]]+/

                      /pR_offset/ - OSBF: change the classify threshold; 
                                    with this optional parameter the 
                                    success/failure decision point can be 
                                    changed from the default 0 to what you 
                                    specify. If given, the pR in 'stats' 
                                    will be printed in the form 
                                    pR/pR_offset.

                     /svm-specific controls/
                                  - a vector of seven parameters for SVM-
                                    classifiers
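
The "success" and "fail" groups in the class list are separated by " | " with a space on each side.  The Python sketch below models only that splitting step; split_classes is a hypothetical helper, the file names in the usage note are made up, and how CLASSIFY scores each group is not modelled.

```python
# Model of splitting '(:c1: :c2: | :c3:)' into success and fail groups.
# split_classes() is a hypothetical helper.

def split_classes(arg):
    """Return (success_files, fail_files) from the text between the parens."""
    # spaces on each side of '|' are required, so split on ' | ' exactly
    left, _sep, right = arg.partition(" | ")
    return left.split(), right.split()
```

For instance, split_classes("good.css ok.css | bad.css") puts good.css and ok.css in the success group and bad.css in the fail group.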


clump [:text:] (clumpfile) (status) <flags> /regex/ /params/
                                  - does incremental parametric clustering 
                                    of documents to generate document 
                                    groups.  No pre-judged corpus is 
                                    required.

      [:text:]                    - input text; var-restriction allowed

      (clumpfile)                 - name of file to hold the clumps (all 
                                    docs go into the same clumpfile)

      (status)                    - status output, for the result of the 
                                    clumping.  Clumping the null input text 
                                    will give a status dump of all the 
                                    documents in the entire clumpfile.

      <flags>                     - special control flags; unigram, unique, 
                                    and refute are supported, with the same 
                                    meanings as in LEARN and CLASSIFY. 
                                    Default clustering is by document-to-
                                    document nearest-neighbor hyper-spatial 
                                    distance. If you add the bychunk flag, 
                                    then the distance is to the cluster's 
                                    centroid.

      /regex/                     - optional tokenization regex; default is 
                                    /[[:graph:]]+/

      /params/                    - control parameters: "tag=somename" label 
                                    to later refer to this document. 
                                    "clump=somename" forces a name onto a 
                                    cluster.  "n_clusters=N" says how many 
                                    doc clusters you want; if N=0 then it 
                                    will simply store the document and wait 
                                    for more (much faster computationally). 
                                    If N < 0 the number of clusters is 
                                    determined automatically.


cssanalyze <flags> (:c1:) (:report:) /params/
                                  - analyze the CRM database :c1: and report 
                                    into :report:. The analysis may include 
                                    extensive integrity checks.

      <flags>                     - special control flags; default and basic
                                    are supported.

      /params/                    - control parameters a la cssutil.

                                    WARNING: THIS IS A PLANNED COMMAND, 
                                    which will obsolete the external cssutil 
                                    tool in due time.


cssbackup <flags> (:dst: :c1:) (:report:) /params/
                                  - export/backup the CRM database :c1: into 
                                    file :dst:. The output format is CSV. 
                                    Messages are written to :report:.

      <flags>                     - special control flags; default is
                                    supported.

      /params/                    - control parameters a la cssutil.

                                    WARNING: THIS IS A PLANNED COMMAND, 
                                    which will obsolete the external cssutil 
                                    tool in due time.


csscreate <flags> (:c1:) (:report:) /params/
                                  - create a new CRM database :c1: as 
                                    specified by /params/. This command can 
                                    be used to create new/empty CRM 
                                    databases for any classifier. Messages 
                                    are written to :report:.

      <flags>                     - special control flags; default is
                                    supported.

      /params/                    - control parameters a la cssutil.

                                    WARNING: THIS IS A PLANNED COMMAND, 
                                    which will obsolete the external cssutil 
                                    tool in due time.


cssdiff <flags> (:c1: :c2:) (:report:) /params/
                                  - report differences between the CRM 
                                    databases :c1: and :c2: into :report:.

      <flags>                     - special control flags; default and 
                                    unique are supported.

      /params/                    - control parameters a la cssmerge.

                                    WARNING: THIS IS A PLANNED COMMAND, 
                                    which will obsolete the external cssdiff 
                                    tool in due time.


cssinfo <flags> (:c1:) (:report:) /params/
                                  - list information about the CRM database 
                                    :c1: into :report:.

      <flags>                     - special control flags; default is 
                                    supported.

      /params/                    - control parameters a la cssutil.

                                    WARNING: THIS IS A PLANNED COMMAND, 
                                    which will obsolete the external cssutil 
                                    tool in due time.


cssmerge <flags> (:destfile: :c1:...:cN:) (:report:) /params/
                                  - merge the CRM databases :c1: ... :cN: 
                                    into destfile. Messages are written to 
                                    :report:.

      <flags>                     - special control flags; default, unique, 
                                    and microgroom are supported.

      /params/                    - control parameters a la cssmerge.

                                    WARNING: THIS IS A PLANNED COMMAND, 
                                    which will obsolete the external 
                                    cssmerge tool in due time.


cssmigrate [:src:] (:dst:) (:report:) /params/
                                  - migrate the CRM database :src: to the 
                                    new :dst: database and report into 
                                    :report:. This command is built in to 
                                    provide a relatively easy upgrade path 
                                    for those who wish to keep their old CSS 
                                    databases whenever possible.

      /params/                    - control parameters.

                                    WARNING: THIS IS A PLANNED COMMAND, 
                                    which will obsolete the external cssutil 
                                    tool in due time.


cssrestore <flags> (:c1: :src:) (:report:) /params/
                                  - import/restore the CRM database :c1: 
                                    from file :src:. The input format is 
                                    CSV. Messages are written to :report:.

      <flags>                     - special control flags; default is 
                                    supported.

      /params/                    - control parameters a la cssutil.


                                    WARNING: THIS IS A PLANNED COMMAND, 
                                    which will obsolete the external cssutil 
                                    tool in due time.


debug                             - drop immediately into the interactive 
                                    debugger.


eval (:result:) /instring/        - repeatedly evaluates /instring/ until it 
                                    ceases to change, then surgically places 
                                    that result as the value of :result:. 
                                    EVAL uses smart (but foolable) 
                                    heuristics to avoid infinite loops, like 
                                    evaluating a string that evaluates to a 
                                    request to evaluate itself again.  The 
                                    error rate is about 1 / 2^62, and (in 
                                    the default configuration) EVAL will 
                                    detect looping chain groups of length 
                                    4096 or less.

                                    If the instring uses math evaluation 
                                    (see section below on math operations) 
                                    and the evaluation has a comparison 
                                    test (>, >=, <, <=, =, or !=), then if 
                                    the test fails, the EVAL will FAIL to 
                                    the end of block.  Math is 
                                    IEEE-compliant, so unreasonable things 
                                    like divide-by-zero may yield NaN (Not A 
                                    Number) or +/- INF.
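
          Example (a minimal sketch; the variable names :n: and :r: are 
          illustrative only, and the default algebraic math mode is 
          assumed):

              isolate (:n:) /5/
              isolate (:r:)
              {
                  # the comparison below is true, so execution continues
                  eval (:r:) /:@: :*:n: > 3 :/
                  output /n is greater than 3\n/
              }
              alius
              {
                  # reached only if the EVAL above FAILed
                  output /n is 3 or less\n/
              }

          If the comparison is false, the EVAL fails, the first block is 
          left at its closing '}', and the ALIUS block runs instead.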


exit  /:exitcode:/                - ends program execution.  If supplied, 
                                    the return value is converted to an 
                                    integer and returned as the exit code of 
                                    the crm114 program.

      /:exitcode:/                - variable to be converted to an integer 
                                    and returned.  If no exit code is 
                                    supplied, the exit code value is 0.


fail                              - skips down to end of the current { } 
                                    block and causes that block to exit with 
                                    a FAIL status (see ALIUS for why this is 
                                    useful)


fault /faultstr/                  - forces a FAULT with the given string as 
                                    the reason.

      /faultstr/                  - the var-expanded fault reason string


goto /:label:/                    - unconditional branch (you can use a 
                                    variable as the goal, e.g. /:*:there:/ )


hash (:result:) /input/           - compute a fast 32-bit hash of the 
                                    /input/, and ALTER :result: to the 
                                    hexadecimal hash value.  HASH is _not_ 
                                    warranted to be constant across major 
                                    releases of CRM114, nor is it 
                                    cryptographically secure.

                                    Contrary to 'vanilla' CRM114, GerH 
                                    hashes /are/ consistent across platforms 
                                    (UNIX/Windows/etc.; 32 vs. 64 bit).

     (:result:)                   - var that gets the result.

               /input/            - string to be hashed (can contain 
                                    expanded :vars: , defaults to the data 
                                    window :_dw: )
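
          Example (a minimal sketch; :h: is an illustrative name, and 
          the hash value printed depends on the CRM114 build):

              isolate (:h:)
              # ALTER :h: to the hexadecimal hash of the given string
              hash (:h:) /some text to hash/
              output /hash = :*:h:\n/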


input <flags> [:filename:]
input <flags> (:result:) [:filename:]
input <flags> (:result:) [:filename: offset len]
                                  - read in the content of filename; if no 
                                    filename is given, read stdin

    <byline>                      - read one line only

    <readline>                    - read one line only, using the history-
                                    aware readline library

     (:result:)                   - var that gets the input value (surgical 
                                    overwrite). When no :result: variable 
                                    has been specified, the default :_dw: 
                                    input window is assumed.

      [:filename:]                - the file to read.  The first blank-
                                    delimited word is taken and var-
                                    expanded; the result is the filename, 
                                    even if it includes embedded spaces. 
                                    Default is to read stdin.

       [:filename: offset len]    - optionally, move to offset in the file, 
                                    and read len bytes. Offset and len are 
                                    individually blank-delimited, and var-
                                    expanded with mathematics enabled.  If 
                                    len is unspecified, the read extends to 
                                    EOF or buffer limit.
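
          Example (a minimal sketch; the file name notes.txt and the 
          variable names :line: and :chunk: are hypothetical):

              isolate (:line:)
              isolate (:chunk:)
              # read a single line from stdin into :line:
              input <byline> (:line:)
              # read up to 64 bytes, starting at offset 0, from a file
              input (:chunk:) [notes.txt 0 64]
              output /line=:*:line: chunk=:*:chunk:\n/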


intersect (:out:) [:var1: :var2: ...]
                                  - makes :out: contain the part of the data 
                                    window that is the intersection of 
                                    :var1: :var2: ... ISOLATEd vars are 
                                    ignored. This only resets the value of 
                                    the captured :out: variable, and does 
                                    NOT alter any text in the data window.


isolate (:var:)
isolate (:var:) [/initial-value]/
isolate (:var:) <flags> [/initial-value]/
isolate (:var1: :var2: ... :varN:)
                                  - puts :var: into a data area outside of 
                                    the default data window buffer; 
                                    subsequent changes to this var don't 
                                    change the data buffer (though they may 
                                    change the value of any var subsequently 
                                    set inside of this var). If the var 
                                    was already ISOLATEd, it will stay 
                                    isolated, but its value will be 
                                    surgically altered if a /value/ is 
                                    given.

       <default>                  - only create and set var if it didn't 
                                    exist before (ideal for setting 
                                    defaults)

           (:var:)                - name of ISOLATEd var (var-expanded)

              [/initial-value]/   - optional initial value for :var: (var-
                                    expanded); use either [] or // to 
                                    enclose the value.  If no value is 
                                    supplied, the previous value is 
                                    retained/copied. If there is no previous 
                                    value and no value is supplied, d.
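
          Example (a minimal sketch; all variable names are illustrative 
          only):

              # create :count: outside the data window, initial value 0
              isolate (:count:) /0/
              # set :limit: only if it did not exist before
              isolate <default> (:limit:) /10/
              # isolate several vars in one statement
              isolate (:a: :b: :c:)
              output /count=:*:count: limit=:*:limit:\n/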


lazy                              - WARNING: reserved for future use


learn <flags> (:class:) [:in:] /word-pat/
learn <flags> (:class:) [:in:] /word-pat/ /entropy_fuzz/
learn <flags> (:class:) [:in:] /word-pat/ /svm-specific controls/
                                  - learn the statistics of the :in: var (or 
                                    the input window if no var) as an 
                                    example of class :class:

      <refute>                    - flag this as an anti-example of this 
                                    class -- unlearn it!

      <nocase>                    - ignore case in word-pat, does not ignore 
                                    case in actual text (use tr() or the 
                                    TRANSLATE command to do that on :in: if 
                                    you want it)

      <microgroom>                - enable the microgroomer to purge 
                                    less-important information automatically 
                                    whenever the statistics file gets too 
                                    crowded.  However, this disables other 
                                    optimizations that can speed up

      <osb>                       - use orthogonal sparse bigram (OSB) 
                                    features and Markovian classification 
                                    instead of Markovian SBPH features. OSB 
                                    uses a subset of SBPH features with 
                                    about 1/4 the memory and disk needs, and 
                                    about 4x the speed of full Markovian.

      <osbf>                      - use the Fidelis Confidence Factor local 
                                    probability generator.  This format is 
                                    not compatible with the default, but 
                                    with single-sided threshold training 
                                    (typically pR of 10-30) achieves the 
                                    best accuracy yet.

      <winnow>                    - use the Winnow non-statistical 
                                    classifier and the OSB front-end feature 
                                    generator. Winnow uses .cow files, which 
                                    are not compatible with the .css files 
                                    for the Markovian (default) and OSB 
                                    classifiers. Remember that for Winnow to 
                                    be at its best in accuracy, it has to 
                                    be trained both with positive cases that 
                                    failed to make a minimum threshold 
                                    (typically with a per-file (not overall) 
                                    match quality that was below a pR of .2 
                                    or more) as well as with <refute> for 
                                    "negative reinforcement" training for 
                                    any "not in class" per-file match 
                                    qualities that weren't at a pR of -.2 or 
                                    less.

      <hyperspace>                - use hyperspace matching; each learned 
                                    document represents a light source in a 
                                    4-billion-dimensional hyperspace, and 
                                    the set of sources that shines most 
                                    brightly onto the unknown document's 
                                    hyper-spatial location is the matching 
                                    class.  EXPERIMENTAL!!!

      <unigram>                   - use single-word features only; using 
                                    this makes CRM114 almost exactly 
                                    equivalent to most other Bayesian 
                                    classifiers.  Also works with the Winnow 
                                    and hyperspace classifiers.

      <entropy>                   - use the bit-entropy classifier.  This 
                                    uses compressibility of the unknown 
                                    given the prior learned text as a 
                                    perfect compressor model.  No 
                                    tokenization happens; this classifier 
                                    works one bit at a time.  The tokenizer 
                                    regex is ignored; the second // argument 
                                    can hold an optional "fuzz factor" for 
                                    how close an approximation is allowed.

      <fscm>                      - use the fast substring compression 
                                    matcher. This measures the 
                                    compressibility of an unknown text using 
                                    the known texts as a compression 
                                    dictionary.  Tokenization is used as a 
                                    compressibility filter (default 
                                    tokenization is /./, which makes FSCM 
                                    equivalent to LZ77 with an infinite 
                                    window).

      <svm>                       - use the SVM classifier.  This uses SVM 
                                    (support vector machine) techniques. 
                                    NB: for now VERY EXPERIMENTAL; OSB or 
                                    unigram features (default OSB features), 
                                    2-class only, generates A_vs_B files.

      <sks>                       - use the String Kernel SVM.  String 
                                    kernels take one character at a time as 
                                    token features, but don't use omitted 
                                    subsections like the OSB feature set. 
                                    VERY EXPERIMENTAL. 2-class only, 
                                    generates A_vs_B files.

      <neural>                    - use a three-layer neural network with 
                                    stochastic back-propagation training. 
                                    VERY EXPERIMENTAL!!!

      <correlate>                 - use the full correlative matcher.  Very 
                                    slow, but capable of matching stemmed 
                                    words in any language and of matching 
                                    binary files. Correlative matching does 
                                    not tokenize, and so you don't need to 
                                    supply it with a word-pat.

              (:class:)           - name of file holding hashed results; 
                                    nominal file extension is .css

                 [:in:]           - captured var containing the text to be 
                                    learned (if omitted, the full contents 
                                    of the data window are used)

                 [:in: n m]       - take a substring of :in:, starting at n 
                                    and including m characters

                 [:in: /regex/]   - take a substring of :in: that matches 
                                    the regex

    /word-pat/                    - regex that defines a "word".  Things 
                                    that aren't "words" are ignored. Default 
                                    is /[[:graph:]]+/.  Ignored in 
                                    correlation and bit-entropy.

    /entropy_fuzz/                - Bit-entropy: this number is the "fuzz" 
                                    factor in determining when to loop back 
                                    the compression algorithm Markov chain 
                                    versus allocating new nodes.  You must 
                                    specify an empty word-pat to use entropy 
                                    fuzz.

    /svm-specific controls/       - a vector of seven parameters for SVM 
                                    classifiers
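
          Example (a minimal sketch; the statistics file spam.css and 
          the variable :msg: are hypothetical, and :msg: is assumed to 
          already hold the text of interest):

              # learn the whole data window as an example of the class
              learn <osb microgroom> (spam.css) /[[:graph:]]+/
              # unlearn the text in :msg: from the same class
              learn <osb refute> (spam.css) [:msg:] /[[:graph:]]+/

          The same classifier flag (here <osb>) is used for both 
          statements, since the file formats of the different 
          classifiers are not interchangeable.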


liaf                              - skips UP to START of the current {} 
                                    block (LIAF is FAIL spelled backwards)
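
          Example (a minimal sketch of a loop built from MATCH and LIAF; 
          the variable name :word: is illustrative only):

              {
                  # pick up the next word after the previous match
                  match <fromend> (:word:) /[[:graph:]]+/
                  output /:*:word:\n/
                  # jump back up to the opening '{' and try again
                  liaf
              }

          When no further word is found, the MATCH fails and execution 
          skips to the closing '}', which ends the loop.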


match <flags> /regex/
match <flags> [:in:] /regex/
match <flags> [:in: start len] /regex/
match <flags> [:in: /inregex/] /regex/
match <flags> (:var1: ...) [:in:] /regex/
match <flags> (:var1: ...) [:in: start len] /regex/
match <flags> (:var1: ...) [:in: /inregex/] /regex/
                                  - Attempt to match the given regex; if 
                                    match succeeds, variables are bound; if 
                                    match fails, program skips to the 
                                    closing '}' of this block

      <absent>                    - statement succeeds if match not present

      <nocase>                    - ignore case when matching

      <literal>                   - No special characters in regex (only 
                                    supported with TREregex, not GNUregex.) 
                                    Think of this as WYSIWYG matching.

      <fromstart>                 - start match at start of the [:in:] var

      <fromcurrent>               - start match at start of previous 
                                    successful match on the [:in:] var

      <fromnext>                  - start match at one character past the 
                                    start of the previous successful match 
                                    on the [:in:] var

      <fromend>                   - start match at one character past the 
                                    end of prev. match on this [:in:] var

      <newend>                    - require match to end after end of prev. 
                                    match on this [:in:] var

      <backwards>                 - search backward in the [:in:] variable 
                                    from the last successful match.

      <nomultiline>               - execute the search in blocks of one line 
                                    of text each, so the result will never 
                                    span a line.  This means that
                                    ^ and $ will match at the beginning
                                    and end of each line, rather than
                                    the beginning and end of the full text.

         (:var1: ...)             - optional result vars.  The first var 
                                    gets the text matched by the full regex. 
                                    The second, third, etc. vars get each 
                                    subsequent parenthesized sub-expression, 
                                    in left-to-right order of the sub-
                                    expression's left parenthesis. These are 
                                    "captures", not ALTERs, so text 
                                    overlapping prior :var: values is left 
                                    unchanged.

         [:in:]                   - search only in the variable specified; 
                                    if omitted, :_dw: (the full input data 
                                    window) is used

         [:in: start len]         - search in the :in: input var, limiting 
                                    the area searched to len characters 
                                    beginning at start (zero-origin counted)

         [:in: /inregex/ ]        - search in the :in: input var, limiting 
                                    the searched area to whatever matches 
                                    the inregex (this doesn't use or affect 
                                    previous successful match values). If 
                                    the /inregex/ contains subregexes, the 
                                    last subregex will be used to produce 
                                    the limited :in: content to be matched 
                                    against.

                /regex/           - POSIX regex (with \ escapes as needed)

          NB: If you build CRM114 to use the GNU regex library for MATCHing, 
              be warned that GNU REGEX has numerous issues.  See the 
              KNOWN_BUGS file for a detailed listing.
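
          Example (a minimal sketch; the key=value input and the names 
          :all:, :key: and :val: are illustrative only):

              {
                  # :all: gets the text matched by the full regex;
                  # :key: and :val: get the two parenthesized
                  # sub-expressions, in left-to-right order
                  match (:all: :key: :val:) /([a-z]+)=([0-9]+)/
                  output /key=:*:key: value=:*:val:\n/
              }
              alius
              {
                  # reached only if the MATCH above failed
                  output /no key=value pair found\n/
              }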


mutate <flags> (:dest:) [:src:] /from args/ /to args/
                                  - WARNING: reserved for future use


output <flags> [:filename:] /output-text/
                                  - output an arbitrary string with captured 
                                    values expanded.

       <append>                   - append to the file (otherwise, the 
                                    previous contents of the file are lost).

         [:filename:]             - the file to write. The first blank-
                                    delimited word is taken and var-
                                    expanded; the result is the filename, 
                                    even if it includes embedded spaces. 
                                    Default output is to stdout.  stderr is 
                                    recognized.

         [:filename: offset len]  - optionally, move to offset in the file, 
                                    and write at most len bytes. Offset and 
                                    len are individually blank-delimited, 
                                    and var-expanded with mathematics 
                                    enabled.  If len is unspecified, the 
                                    write is the length of the expansion of 
                                    /output-text/.

              /output-text/       - string to output (var-expanded)
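
          Example (a minimal sketch; the file name run.log is 
          hypothetical):

              # default destination is stdout
              output /Hello, world!\n/
              # stderr is recognized as a destination
              output [stderr] /warning: something looks odd\n/
              # append to a file instead of overwriting it
              output <append> [run.log] /run finished\n/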


pmulc (clumpfile) [:text:] <flags> /regex/
                                  - use the clumpfile as a look-up to 
                                    translate documents to their appropriate 
                                    clusters.  The text does not get added 
                                    into the clumpfile.

      [:text:]                    - input text; var-restriction allowed.

      (clumpfile)                 - name of the file holding the clumps

      /regex/                     - optional tokenization regex; default is
                                    /[[:graph:]]+/

      <flags>                     - The optional flags are bychunk, unique, 
                                    and unigram, with the same functions as 
                                    under clump.


return /returnval/                - return from a CALL.  Note that since 
                                    CALL executes in shared space with the 
                                    caller, all changes made in the CALLed 
                                    routine are shared with the caller.

      /returnval/                 - this (var-expanded) value is returned to 
                                    the caller (or if the caller doesn't 
                                    accept return values, it's discarded).


routine /:label:/ (:arg:)         - WARNING: reserved for future use

                                    This is similar to a regular :label: in 
                                    the code (and can be called by the 
                                    'call' statement), with one major 
                                    difference: all variables which are 
                                    _created_ in the immediately following 
                                    {}-delimited scope block will NOT exist 
                                    in the parent scope(s) as well, NOR will 
                                    they overwrite variables existing in 
                                    parent scope. In other words: use 
                                    'routine' to instruct CRM114 to create 
                                    and use 'local scope' variables while 
                                    _inside_ the 'routine' {}-delimited code 
                                    scope block. Referencing variables in 
                                    there, which were only created in parent 
                                    scope, will allow editing those 
                                    variables. Without prefixing that 
                                    :label: with the 'routine' command, all 
                                    variables, old and new, will share the 
                                    same (parent) scope (see :label: above). 
                                    Local scope variables will be DISCARDED 
                                    when the routine RETURNs to the caller.

   (:arg:)                        - var-expanded varname to receive the 
                                    caller's arguments (usually a MATCH is 
                                    then done to put locally convenient 
                                    labels on the args).

                                    Note that :arg: is also considered to be 
                                    a new variable in LOCAL scope (despite 
                                    that it is theoretically created outside 
                                    the next {}-delimited scope block) and 
                                    will thus be discarded following the 
                                    next RETURN statement.


sort <flags> (:dest:) [:src:] /line-pat/ /line-merge-text/ /:compare-label:/
                                  - WARNING: reserved for future use

                                    Sort 'lines' (each identified by the 
                                    /line-pat/ regex) in ascending order. 
                                    The optional /:compare-label:/ routine 
                                    will be called to compare individual 
                                    lines while sorting.

  <nocase>                        - ignore case when matching

  <nomultiline>                   - execute the search in blocks of one line 
                                    of text each, so the result will never 
                                    span a line.  This means that
                                    ^ and $ will match at the beginning
                                    and end of each line, rather than
                                    the beginning and end of the full text.

  <unique>                        - repeated copies of the same line in 
                                    :src: are removed, so only a single copy 
                                    remains. Identity of two lines is 
                                    determined by calling the optional 
                                    compare routine pointed at by 
                                    /:compare-label:/.

    [:src:]                       - captured var containing the text to be 
                                    sorted (if omitted, the full contents of 
                                    the data window are used).

    [:src: n m]                   - take a substring of :src:, starting at n 
                                    and including m characters.

    [:src: /regex/]               - take a substring of :src: that matches 
                                    the regex.

      (:dest:)                    - var-expanded varname to receive the 
                                    sorted collection of lines. Lines are 
                                    merged after sorting, by postfixing each 
                                    with the /line-merge-text/ text. (Note 
                                    that this includes the last line: it 
                                    will be postfixed with this text as 
                                    well.)

        /line-pat/                - regex that defines a "line".  Things 
                                    that aren't "lines" are ignored.
                                    Default is /[^\r\n]*/.

          /line-merge-text/       - LITERAL text 'glue' which will be 
                                    appended to each sorted line before 
                                    writing the complete set to :dest:.
                                    Default is /\n/.

          /:compare-label:/       - Optional routine used by 'sort' to 
                                    compare two individual lines. The 
                                    routine is assumed to be defined as 
                                    'routine :label: (:arg1:) (:arg2:)': 
                                    note the use of 'routine' here and the 
                                    two arguments :arg1: and :arg2:. The 
                                    return value produced by the custom 
                                    routine is assumed to be a floating 
                                    point value, where these rules are 
                                    assumed:

                                    :arg2: > :arg1:  ==> return value >= 0.5

                                    :arg2: < :arg1:  ==> return value <= -0.5

                                    :arg2: == :arg1:  ==> return value ~ 0 
                                                          (i.e. within the 
                                                          range <-0.5 .. 
                                                          0.5>)

                                    Note that there is no point in 
                                    specifying the flag <nocase> when also 
                                    providing a /:compare-label:/ custom 
                                    sort routine.

                                    Default compare routine is str[case]cmp(3).


syscall <flags> (:in:) (:out: :err:) (:status:) [/command_or_label]/ [timeout]
                                  - execute a shell command or fork to the 
                                    specified label. This happens in a fresh 
                                    copy of the environment; there is no 
                                    communication with the main program 
                                    except via the :in:, :out:, :err: and 
                                    :status: vars. Output over the buffer 
                                    length is discarded unless you <keep> 
                                    the process around for multiple 
                                    readings.

        <keep>                    - don't send an EOF after feeding the full 
                                    input (this will usually keep the 
                                    syscalled process around).  Later 
                                    syscalls with the same :status: var will 
                                    continue feeding to and reading from the 
                                    kept process.

        <async>                   - don't wait for process to output an EOF; 
                                    just grab what's available in the 
                                    process's output pipe and proceed 
                                    (default limit per syscall is 256 Kbytes). 
                                    The process then runs to completion 
                                    independently and asynchronously. (This 
                                    is "fire and forget" mode, and is 
                                    mutually exclusive with <keep>.)

        [timeout]                 - only allow the called process to run for 
                                    N seconds. (0 = unlimited)

         (:in:)                   - var-expanded string to feed to command 
                                    as input. Can be null if you don't want 
                                    to send the process anything.  You 
                                    _MUST_ specify this set of parentheses 
                                    when you want to specify an :out:, :err: 
                                    or :status: variable.

          (:out:)                 - var-expanded varname to place results 
                                    into. MUST pre-exist, can be null if you 
                                    don't want to read the process's stdout 
                                    output (yet, or at all).  Limit per 
                                    syscall is 256 Kbytes.  You MUST specify 
                                    this set of parentheses when you want to 
                                    use the :status: variable.  This is
                                    a surgical alter and will receive the
                                    data written to stdout by the external
                                    application.

          (:err:)                 - var-expanded varname to place results 
                                    into. MUST pre-exist, can be null if you 
                                    don't want to read the process's stderr 
                                    output (yet, or at all).  Limit per 
                                    syscall is 256 Kbytes.  You MUST specify 
                                    this set of parentheses when you want to 
                                    use the :status: variable.  This is
                                    a surgical alter and will receive the
                                    data written to stderr by the external
                                    application.

           (:status:)             - if you want to keep a minion proc 
                                    around, or catch the exit status of the 
                                    process, specify a varname here. The 
                                    minion process's PID and pipes will be 
                                    stored here.  The program can access the 
                                    proc again with another syscall by using 
                                    this var again. When the process exits, 
                                    its exit code will be surgically stored 
                                    here (unless you specified <async>).

            [/command_or_label]/  - the command or entrypoint you want to 
                                    run.  This arg is var-expanded; if the 
                                    first word is a :label:, the fork begins 
                                    execution at the label.  If the first 
                                    word is not a :label:, then the entire 
                                    string is handed off to the shell to be 
                                    executed as a shell command.  It can be 
                                    enclosed in either [] or //. This 
                                    argument is optional: when you have 
                                    specified <keep> and are performing a 
                                    second/subsequent syscall, you can 
                                    dispense with the // arg.


translate <flags> (:dest:) [:src:] /from_charset/ /to_charset/
                                  - do a tr()-like translation of 8-bit 
                                    characters in the from_charset to the 
                                    corresponding characters in the 
                                    to_charset.

  <unique>                        - repeated sequential copies of the same 
                                    char in from_charset are replaced by a 
                                    single copy, then translated.

  <literal>                       - from_charset and to_charset are literal, 
                                    no var-expansion, ranging, or inversion 
                                    performed.

  [:src:]                         - source of data.  Can be var-restricted. 
                                    Default is the default data window :_dw:

  (:dest:)                        - destination to put result.  defaults to 
                                    the default data window :_dw:

  /from_charset/                  - var-expanded charset of characters to
                                    be translated from.  Use hyphens for ranges
                                    like a-e meaning 'abcde'.  Reversed ranges
                                    such as e-a, meaning 'edcba', work (this is
                                    different from tr() !).  Set inversion
                                    works too: ^a-z means all characters that
                                    aren't lower case characters.
                                    Character duplication is not an error.
                                    To use - as a literal character, make it
                                    the first or last character.  To use ^
                                    as a literal character, make it any but
                                    the first character.  ASCII \-escapes
                                    like \n and \xFF work.

  /to_charset/                    - charset of characters to be translated
                                    to.  Same rules as from_charset; excess
                                    characters are ignored; if not enough
                                    characters are available, start over using
                                    the to_charset characters from the
                                    beginning (this is different than tr().)
                                    If to_charset is not given, then all
                                    chars in from_charset are deleted.
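
The charset rules above (ranges, reversed ranges, inversion, and reuse of 
to_charset from the beginning) can be modelled in a few lines.  This Python 
sketch is an illustration only, not the CRM114 implementation.  The names 
expand_charset and translate are hypothetical, and three details are 
assumptions: an inverted set is enumerated in ascending byte order, the 
first mapping wins when a character is duplicated, and \-escapes and the 
<unique> and <literal> flags are not handled.

```python
def expand_charset(spec):
    """Expand a charset such as 'a-e', 'e-a' or '^a-z' into a character list."""
    invert = spec.startswith("^")
    body = spec[1:] if invert else spec
    chars = []
    i = 0
    while i < len(body):
        # A hyphen with a character on both sides denotes a range; a hyphen
        # in the first or last position is an ordinary character.
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = ord(body[i]), ord(body[i + 2])
            step = 1 if hi >= lo else -1      # reversed ranges count down
            chars.extend(chr(c) for c in range(lo, hi + step, step))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    if invert:
        excluded = set(chars)
        # Assumption: the inverted set is listed in ascending 8-bit order.
        chars = [chr(c) for c in range(256) if chr(c) not in excluded]
    return chars


def translate(text, from_charset, to_charset=""):
    """Translate text; with no to_charset, from_charset chars are deleted."""
    src = expand_charset(from_charset)
    dst = expand_charset(to_charset) if to_charset else []
    table = {}
    for index, ch in enumerate(src):
        if ch in table:
            continue                          # assumption: first mapping wins
        # Too few to_charset characters: start over from the beginning.
        table[ch] = dst[index % len(dst)] if dst else None
    out = []
    for ch in text:
        if ch not in table:
            out.append(ch)                    # untouched character
        elif table[ch] is not None:
            out.append(table[ch])             # translated character
        # else: deleted character
    return "".join(out)


print(translate("Hello", "a-z", "A-Z"))
print(translate("abcde", "a-e", "xy"))
```

The second call shows the wrap-around: with only two characters in 
to_charset, 'a' through 'e' map to x, y, x, y, x.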


trap (:reason:) /trap_regex/      - traps faults from both FAULT statements 
                                    and program errors occurring anywhere in 
                                    the preceding bracket-block or single 
                                    executable statement. If no fault 
                                    exists, TRAP does a SKIP to end of 
                                    block. If there is a fault and the fault 
                                    reason string matches the trap_regex, 
                                    the fault is trapped, and execution 
                                    continues with the line after the TRAP, 
                                    otherwise the fault is passed up to the 
                                    next surrounding trapped bracket block.

     (:reason:)                   - the fault message that caused this 
                                    FAULT.  If it was a user fault, this is 
                                    the text the user supplied in the FAULT 
                                    statement.  This variable is allocated 
                                    as an ISOLATED variable.

          /trap_regex/            - the regex that determines what kind of 
                                    faults this TRAP will accept.  Putting
                                    a wildcard here (e.g. /.*/) means that
                                    ALL trappable faults will be trapped.


union (:out:) [:var1: :var2: ...]
                                  - makes :out: contain the union of the 
                                    data window segments that contain var1, 
                                    var2... plus any intervening text as 
                                    well.  Any ISOLATEd var is ignored. 
                                    This is non-surgical, and does not alter 
                                    the data window.


window <flags> (:w-var:) (:s-var:) /cut-regex/ /add-regex/
                                  - window slider.
                                    This deletes up to and including the 
                                    cut-regex match from :w-var: (default: 
                                    the data window), then reads from 
                                    standard input and appends until 
                                    add-regex is found (inclusive).

       <nocase>                   - ignore case when matching cut- and add-
                                    regexes.

       <bychar>                   - (default) read one char at a time and 
                                    check input for add-regex every 
                                    character, so never reads "too much" 
                                    from stdin.

       <bychunk>                  - reads as much data as available, then 
                                    checks with the regex. (unused 
                                    characters are kept around for later)

       <byeof>                    - wait for EOF to check add-regex. (unused 
                                    characters are kept around for later)

       <eofaccepts>               - accept an EOF as being a successful 
                                    regex match (default is that only a 
                                    successful add-regex match is accepted.  
                                    CAUTION: can cause rapid looping!)

       <eofretry>                 - keep reading past an EOF; reset the 
                                    stream and wait again for more input. 
                                    (default is to FAIL on EOF.  CAUTION: 
                                    this can cause rapid looping!)

            (:w-var:)             - what var to window

               (:s-var:)          - what var to use for source (defaults to 
                                    stdin, if you use a source var you 
                                    _must_ specify the windowed var.)

              /cut-regex/         - var-expanded cut pattern.  Everything up 
                                    to and including this is deleted.

                 /add-regex/      - var-expanded add pattern, if absent 
                                    reads till EOF.  This pattern is a minimal
                                    match pattern, so if the pattern can match
                                    a zero-length string ( say, /.*/ ), this
                                    can yield zero characters added.  Use
                                    a pattern like /.+/ to prevent this.

                            *****   If both cut-regex and add-regex are 
                                    omitted, then this window statement is an 
                                    executable no-op... EXCEPT that if it's 
                                    the _first_ _executable_ statement in 
                                    the program, then the WINDOW statement 
                                    configures CRM114 to _not_ wait to read 
                                    anything from standard input 
                                    before starting program execution.



     ------------ A Quick Regex Intro ---------

A regex is a pattern match.  Do a "man 7 regex" for details.

Matches are, by default "first starting point that matches, then longest 
match possible that can fit".

  a through z
  A through Z   - all match themselves
  0 through 9

  most punctuation - matches itself, but check below!

  .         the 'period' char, matches any character

  *         repeat preceding 0 or more times

  +         repeat preceding 1 or more times

  ?         repeat preceding 0 or 1 time

  [abcde]    any one of the letters a, b, c, d, or e

  [a-q]      the letters a through q (just one of them)

  [a-eh-mqzt]
             the letters a through e, plus h through m, plus q, z, and t

  [^xyz]     any one letter EXCEPT one of x, y, or z

  [^a-e]     any one letter EXCEPT one of a through e

  {n}        repetition count: match the preceding exactly n times

  {n,}       repetition count: match the preceding at least n times

  {n,m}      repetition count: match the preceding at least n and no more 
             than m times (sadly, POSIX restricts this to a maximum of 255 
             repeats.  Nested repeats like (.{255}){10} will work, but are 
             very very slow).


  [[:<:]]    matches at the start of a word (GNU regex only)
  \<         matches at the start of a word (TRE regex only)

  [[:>:]]    matches the end of a word (GNU regex only)
  \>         matches at the end of a word (TRE regex only)

  ^          As the first character in a match, it matches only at the start 
             of a block; this usually means start of the input variable.  If 
             you use <nomultiline> then each line is its own block and so ^ 
             means "start of line".

  $          As the last character in a match, it matches only at the end of 
             a block; this usually means the end of the input variable.  If 
             you use <nomultiline> then each line is its own block and so $ 
             means "end of line".

  .          (a period) matches any _single_ character (except start-of-line 
             or end of line "virtual characters", but it does match a 
             newline).

  (match)    the () go away, and the string that matched inside is available 
             for capturing.  Use \( and \) to match actual parentheses.

  a|b        match a _or_ b, such as foo|bar which will match "foo" or "bar" 
             (multiple characters!).  To get a shorter extent of ORing, use 
             parentheses, e.g. /f(oo|ba)r/ matches "foor" or "fbar", but not 
             foo or bar.

The following are other POSIX expressions, which mostly do what you'd guess 
they'd do from their names.

  [[:alnum:]]   <-- a-z, A-Z and 0-9
  [[:alpha:]]   <-- a-z and A-Z
  [[:blank:]]   <-- space and tab only
  [[:space:]]   <-- "whitespace" (space, tab, vertical tab (^K), \n, \r, ..)
  [[:cntrl:]]   <-- control characters
  [[:digit:]]   <-- 0-9
  [[:lower:]]   <-- lower-case letters a-z
  [[:upper:]]   <-- upper-case letters A-Z
  [[:graph:]]   <-- any character that puts ink on paper or lights a pixel
  [[:print:]]   <-- any character that moves the "print head" or cursor.
  [[:punct:]]   <-- punctuation characters
  [[:xdigit:]]  <-- hex digits 0-9, a-f and A-F


----- The following are only available with the TRE-based versions -----


  *?, +?, ??, {n,m}?
            - repeat the preceding expression 0-or-more, 1-or-more, 0-or-1, 
              or n-to-m times, but _shortest_ match that fits, given the 
              already-selected start point of the regex. This is an "anti-
              greedy" match, unlike the normal match that wants to have the 
              longest possible resulting match

  \N        - where N is 1 through 9 - matches the N'th parenthesized 
              previous sub-expression.  You don't have to backslash-escape 
              the backslash (e.g. write this as \1 or as \\1, either will 
              work)

  \Q        - start verbatim quoting - all following characters represent 
              exactly themselves; no repeat counts or wildcards apply.  This 
              is _only_ terminated by a \E or the end of the regex.

  \E        - end of verbatim quoting.

  \<        - start of a word (doesn't use up a character)

  \>        - end of a word (doesn't use up a character)

  \d        - a digit

  \D        - not a digit

  \s        - a space

  \S        - not a space

  \w        - a word char ( a-z, A-Z, 0-9, or _ )

  \W        - not a word char


  (?:some-regex)
            - parenthesize a sub-expression, but _don't_ capture a sub-match 
              for it.

  (?inr-inr:regex)
            - Lets you turn on or off case independence, nomultiline, and 
              right-associative (rather than the default left-associative) 
              matching.  These nest as well.

              i - case independent matching.  examples:

                   /(?i:abc)/             matches 'abc', 'AbC', 'ABC', etc...

                   /(?i:ABC(?-i:de)FGH)/  matches ABCdeFGH, abcdefgh, but 
                                          not ABCdEFGH or ABCDEFGH

              n - don't match newlines with wildcards such as .* or with 
                  anti-wildcards like [^j-z].  "-n" _allows_ matching of 
                  newlines (this is slightly counter-intuitive).  e.g.:

                  /(?n:a.*z)/             matches 'abcxyz' but not
                                           'abc
                                            xyz'

                  /(?-n:a.*z)/            matches both (this does NOT 
                                          override the <nomultiline> flag; 
                                          <nomultiline> essentially "blocks" 
                                          the searched text at newlines, and 
                                          searches within those blocks only)

              r - right-associate matching.  This changes only sub-matches, 
                  never whether the match itself succeeds or fails.  (I 
                  haven't come up with a good example for this; any 
                  suggestions?)
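
A few of the constructs above (anti-greedy repeats, \N back-references, and 
scoped (?i:...) / (?-i:...) groups) also exist in other regex engines, which 
makes them easy to try out.  The sketch below uses Python's re module purely 
as a stand-in: it is a different engine from TRE, it has no \Q...\E and no 
'n' or 'r' flags, and the sample strings are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Normal (greedy) repeat: the longest match from the first start point.
greedy = re.search(r"<.*>", text).group(0)

# Anti-greedy repeat: the shortest match from the same start point.
lazy = re.search(r"<.*?>", text).group(0)

# \1 matches whatever the first parenthesized sub-expression matched.
repeated = re.search(r"(\w+) \1", "it was was late").group(0)

# Case independence switched on for the group, then off again for 'de'.
scoped = re.compile(r"(?i:ABC(?-i:de)FGH)")
samples = ("ABCdeFGH", "abcdefgh", "ABCdEFGH", "ABCDEFGH")
case_results = {s: bool(scoped.fullmatch(s)) for s in samples}

print(greedy)
print(lazy)
print(repeated)
print(case_results)
```

The scoped-flag results agree with the /(?i:ABC(?-i:de)FGH)/ example above: 
the first two samples match and the last two do not.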




    --------------  Notes on Sequence of Evaluation -------------

By default, CRM114 supports string length and mathematical evaluation only 
in an EVAL statement, although it can be set to allow these in any place 
where a var-expanded variable is allowed (see the -q flag). The default 
value ( zero ) allows string length and math evaluation only in EVAL 
statements, and uses non-precedence (that is, strict left-to-right unless 
parentheses are used) algebraic notation.  -q 1 uses RPN instead of 
algebraic, again allowing string length and math evaluation only in EVAL 
expressions.  Modes 2 and 3 allow string length and math evaluation in _any_ 
var-expanded expression, with non-precedence algebraic notation and RPN 
notation respectively.

You can override the choice of algebraic or RPN notation for any math 
evaluation by using an A or an R as the first character of the math 
evaluation string.

Evaluation is always left-to-right; there is no precedence of operators 
beyond the sequential passes noted below.

The evaluation is done in five sequential passes:

 1)   \-constants like \n, \o377 and \x3F are substituted.  You must use 
      three digits for octal and two digits for hex.  To write something 
      that will literally appear as one of these constants, escape the 
      backslash with another backslash, i.e. to output '\o075' use '\\o075'.

 2)   :*:var: variables are substituted (note the difference between a 
      constant like '\n' and a variable like ":*:_nl:" here - constants are 
      substituted first, then variables are substituted.).  If there is no 
      such variable, then the 'variable name' is its own result, so 
      :*:I_am_not_defined: yields ":I_am_not_defined:".

 3)   :+:var: indirection variables are substituted.  This is equivalent to 
      taking :*: twice immediately ( note that :*::*:foo:: does not do this 
      as :*: cannot be nested - unless executed within an 'eval' command!) 
      Note that if a regular variable is indirected, the result is unchanged 
      (just as if a non-variable is :*: substituted; the result is the 
      input)

 4)   :#:var: string-length operations are performed.  (you don't have to 
      expand a :var: first, you can take the string length directly, as in 
      :#:_dw: to get the length of the default data window.  Thus, you can 
      take the length of a string that contains a :, which would normally 
      "end" the :#: operator ).

 5)   :@:expression: mathematical expressions are performed; syntax is 
      either RPN or non-precedenced (parens required) algebraic notation. 
      Embedded non-evaluated strings in a mathematical expression are 
      currently a no-no.  If the first character of the math string is an A 
      or an R, it forces Algebraic or RPN evaluation; otherwise the -q value 
      determines which evaluator to use.

      Allowed operators are:

             + - * / % ^ v > < = >= <= != e E f F g G x X

      only.

      The '^' operator is exponentiation; A ^ B is A raised to the B power.
      The 'v' operator is any-base log; A v B is the log of B in logbase
      A ; note that the logbase is _required_ and there is no default.


      Only >, >=, <, <=, = and != set logical results; they also evaluate to 
      1 and 0 for continued chain operations - e.g.

        ((:*:a: > 3) + (:*:b: > 5) + (:*:c: > 9) >= 2)

      is true IFF at least two of the three comparisons are true, that is, 
      IFF any of the following is true

         a > 3 and b > 5
         a > 3 and c > 9
         b > 5 and c > 9

      Formatting operators: e E f F g G x X - the left side value is 
      unchanged, but the right side value is used as a formatting precision 
      value (note that x and X do not change precision).  For example, the 
      speed of light formatted with E 7.2 precision, 299792458 E 7.2, is 
      3.00E+08.  The operators e, E, f, F, g, G, x, and X have the same 
      meaning as in C (beware a precision after the decimal of 10 though; 
      and note that an x or X format is limited to 32 bits).
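
To make the "no operator precedence" rule concrete, here is a minimal Python 
sketch of strict left-to-right algebraic evaluation.  It is a model, not the 
CRM114 evaluator: the name eval_left_to_right is hypothetical, tokens must 
be separated by spaces, parentheses and the formatting operators are not 
handled, and using a floating point remainder for % is an assumption.

```python
import math
import operator

# Binary operators from the list above; comparisons evaluate to 1 or 0
# so that they can take part in further arithmetic.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": math.fmod,                              # assumption: float remainder
    "^": math.pow,                               # A ^ B is A to the power B
    "v": lambda base, x: math.log(x, base),      # A v B is log of B in base A
    ">": lambda a, b: float(a > b),
    "<": lambda a, b: float(a < b),
    "=": lambda a, b: float(a == b),
    ">=": lambda a, b: float(a >= b),
    "<=": lambda a, b: float(a <= b),
    "!=": lambda a, b: float(a != b),
}


def eval_left_to_right(expression):
    """Evaluate 'num op num op num ...' strictly from left to right."""
    tokens = expression.split()
    result = float(tokens[0])
    # Walk the (operator, operand) pairs in the order they were written.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[op](result, float(operand))
    return result


print(eval_left_to_right("2 + 3 * 4"))
print(eval_left_to_right("2 ^ 10"))
```

Note that 2 + 3 * 4 comes out as 20 here, not 14: the addition is performed 
first because it is written first.  Parentheses are what you would use in 
CRM114 to get the conventional result.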


    -------------- Notes on Approximate REGEX matching ---------

The TRE regex engine (which is the default engine) supports approximate 
matching.  The GNU engine does not support approximate matching.

Approximate matching is specified similarly to a "repetition count" in a 
regular regex, using braces.  This approximation applies to the previous 
parenthesized expression (again, just like repetition counts). You can 
specify maximum total changes, and how many inserts, deletes, and 
substitutions you wish to allow.  The minimum-error match is found and 
reported, if it exists within the bounds you state.

The basic syntax is:

  (text-to-match){~[maxerrs] [#maxsubsts] [+maxinserts] [-maxdeletes]}

Note that the '~' (with an optional maxerr count) is _required_ (that's how 
we know it's an approximate regex rather than just a rep-count).  If you 
don't specify a max error count, you will get the best match; if you do, the 
match will have at most that many errors.

Remember that you specify the changes to the text in the _pattern_ necessary 
to make it match the text in the string being searched.

You cannot use approximate regexes and backrefs (like \1) in the same regex. 
This is a limitation in TRE at this point.

You can also use an inequality in addition to the basic syntax above:

  (text-to-match){~[maxerrs] [basic-syntax] [nI + mD + oS < K] }

where n, m, and o are the costs per insertion, deletion, and substitution 
respectively, 'I', 'D', and 'S' are indicators to tell which cost goes with 
which kind of error, and K is the total cost of the errors; the cost of the 
errors is always strictly less than K.

Here are some examples.

  (foobar)       - exactly matches "foobar"

  (foobar){~}    - finds the closest match to "foobar", with the minimum 
                   number of inserts, deletes, and substitutions.  This 
                   match always succeeds, as six substitutions or additions 
                   are always enough to turn any string into one that 
                   contains 'foobar'.

  (foobar){~3}   - finds the closest match to "foobar", with no more than 3 
                   inserts, deletes, or substitutions

  (foobar){~2 +2 -1 #1}
                 - find the closest match to "foobar", with at most two 
                   errors total, and at most two inserts, one delete, and 
                   one substitution.

  (foobar){~4 #1 1i + 2d < 5 }
                 - find the closest match to "foobar", with at most four 
                   errors total, at most one substitution, and with the 
                   number of insertions plus 2x the number of deletions less 
                   than 5.

  (foo){~1}(bar){~1}
                 - find the closest match to "foobar", with at most one 
                   error in the "foo" and one error in the "bar".
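
The idea behind (foobar){~} and (foobar){~3} can be sketched without TRE. 
The Python below computes the smallest number of inserts, deletes, and 
substitutions needed to find a pattern somewhere inside a text, using a 
standard dynamic-programming approach (Sellers' algorithm).  This is a 
swapped-in technique for illustration, not TRE's own algorithm; the names 
min_errors and approx_found are hypothetical, every error costs 1, and the 
per-type limits (#, +, -) and cost inequalities are not modelled.

```python
def min_errors(pattern, text):
    """Fewest single-character edits to match pattern somewhere in text."""
    # Row for the empty pattern: it matches anywhere with zero errors.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        # Against an empty text, the first i pattern chars are all errors.
        cur = [i]
        for j, tc in enumerate(text, start=1):
            cur.append(min(
                prev[j] + 1,                 # pattern char with no text char
                cur[j - 1] + 1,              # text char with no pattern char
                prev[j - 1] + (pc != tc),    # same char, or a substitution
            ))
        prev = cur
    # The match may end at any position in the text.
    return min(prev)


def approx_found(pattern, text, max_errors=None):
    """Model of {~} (max_errors=None) and {~N} (max_errors=N)."""
    errors = min_errors(pattern, text)
    return max_errors is None or errors <= max_errors


print(min_errors("foobar", "xx fobar yy"))
print(approx_found("foobar", "zzz", 3))
```

With no error limit the search always succeeds, as described for 
(foobar){~}; with a limit of 3, a text such as "zzz" that would need six 
errors is rejected.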


     ------------ Notes on Classifier Choices -------

CRM114 allows the user a whole gamut of different classification algorithms, 
and various tunings on classifications.

The default classifier is a Markovian classifier that attempts to model the 
language as a Markov Random Field with site size of 5 (in plainspeak, it 
looks at each word in the context of a window 5 words long; words within 
that window are considered "directly related" and are used to generate local 
probabilities.  Words outside that 5-word window are not considered in 
relation to each word, but get considered when the window slides over to 
them).

The Markovian classifier is quite fast; more than fast enough for a single 
user or even a small office.  Filtering speed varies- with no optimization 
and overflow safeguarding (that is, with <microgroom> enabled) filtering 
speed is usually in excess of what a fractional T1 line can downlink.

The Markovian filter can be sped up considerably by turning off overflow 
safeguarding by not using <microgroom>; this optimization speeds up learning 
significantly, but it means that learning is unsafe.  System operators must 
instead manually monitor the fullness of the .css files and either manually 
groom them or expand them as required (or a script must be used to automate 
this maintenance, which can be done "in flight").

[ This classifier is the original CRM114 classifier and should be considered 
deprecated for new work, although it is still supported. The recommended 
classifier right now for production work is OSB or OSBF. ]

The next generation filter (and one of the two recommended for new 
production work) is the OSB filter, based on orthogonal sparse bigrams.  OSB 
is natively about 4x faster than full Markovian, but loses some of this 
advantage if overflow safeguarding (<microgroom>) is used.  OSB is almost 
as accurate as Markovian if disk space is unlimited, and more accurate than 
Markovian if disk space is limited.  OSB is the recommended default for new 
users because it works very well across a broad range of inputs.  OSB uses 
.css files as well, but (because of a coding error that was released into 
the wild and unnoticed until most people were already using it in the 
incompatible form) OSB is, by default, incompatible with Markov .css files; 
there is a compile-time switch to make it compatible if you want.

Another related classifier is the OSBF (OSB with Fidelis mods such as the 
ECCF dynamic weighting) filter.  OSBF can sometimes be even more accurate 
than OSB or Winnow; it uses an exponential weighting to determine local 
probabilities, giving a filter that works very, Very, VERY well.  It's 
incompatible with any of the other filters (it uses .cfc files).  It's also 
a good choice for new production work.

Another filter with excellent statistics is the Winnow filter.  Winnow is a 
non-statistical method that uses the OSB front end feature generator. 
Winnow is different than the statistical filters, in that it absolutely 
requires both positive training and negative training to work, but then it 
works _very_ well.

With Winnow, you don't just train errors into the correct class (i.e. in 
emulation of an SVM).  Instead, you set a "thick threshold" (usually about 
+/- 0.2 in the pR scale), and any positive class that doesn't get a per-
correct-file score of at least 0.2 pR gets trained as a positive example. 
Symmetrically, any negative class and negative example that doesn't get 
below -0.2 of pR needs to be trained as a negative example (that is, using 
the flags <winnow refute>.)

This means that with Winnow, on an error you train one or both files. Even 
if the classifier gives the correct overall result, if the per-file pR 
values are inside the -0.2 <= per_file_pR <= 0.2 thick-threshold, you may 
have to train one or both files as well. (these per-file pR values are in 
the statistics output variable).
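
The thick-threshold rule for Winnow can be written down as a small decision 
function.  This Python sketch only models the decision described above; the 
function name winnow_training_action and the returned strings are 
hypothetical labels, and extracting the per-file pR values from the 
statistics output variable is left out.

```python
def winnow_training_action(per_file_pr, is_correct_class, thickness=0.2):
    """Decide whether one statistics file needs training for this text.

    per_file_pr      - the per-file pR value reported for that file
    is_correct_class - True if the text really belongs to that file's class
    thickness        - half-width of the thick threshold (0.2 pR as above)
    """
    if is_correct_class:
        # The correct class must score at least +thickness,
        # otherwise train the text as a positive example.
        return "learn" if per_file_pr < thickness else None
    # A wrong class must score below -thickness, otherwise train
    # the text as a negative example (the <winnow refute> case).
    return "refute" if per_file_pr >= -thickness else None


# A correct overall result can still call for training on both files.
print(winnow_training_action(0.1, is_correct_class=True))
print(winnow_training_action(-0.1, is_correct_class=False))
```

Both calls ask for training even though the signs of the scores are 
correct, because the values fall inside the thick threshold.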

The slowest classifier is the correlative filter.  This filter is based on a 
full NxM correlation of the unknown text against each of the known text 
corpora.  It's very slow (perhaps 100x slower than Markovian) but is capable 
of classifying texts containing stemmed words, of texts composed of binary 
files, and texts that cannot be reasonably "tokenized". The <correlate> 
filter should be considered perpetually an experimental feature, and it is 
not as well characterized as the Markovian or OSB filters. The correlative 
filter is not recommended for general production work.

A semi-experimental filter is the Hyperspace filter; this uses a variation 
on the K-Nearest-Neighbor method.  It's usually not quite as accurate as 
OSB, but it can filter against very high levels of intentional obfuscation. 
Hyperspace uses a different (and self-growing) file format.  Hyperspace 
usually trains best with a small thick-threshold training, similar to 
Winnow; as of 20061101 the factors have been re-normalized so that 
Hyperspace values within +/- 10 pR units give a good thick-threshold for 
training.

The bit-entropy filter is a different *kind* of filter; instead of using 
tokens, it constructs an optimal compression system out of the known texts, 
then it tries to compress the unknown text as much as possible, using the 
known texts as prior probabilities. Better compression implies closer match. 
The amazing thing about this is that it works at all- and it actually works 
very well.  Because there's no tokenizer, the entropy filter can work 
against languages that don't use spaces to delimit words, such as some Asian 
languages. It works quite well against spam.  This filter is still 
experimental and non-compatible upgrades may occur - keep your training data 
if you use this filter!


     ------------ Using the CRM114 built-in script debugger ---------

When you run crm114 with the '-d' commandline option or place a 'debug' 
statement in your scripts, you can use the crm114 built-in debugger to 
debug/analyze your scripts.

These debugger commands are available (you can see the complete list when 
typing 'h' or '?' at the debugger prompt 'crm-dbg[...]>'):

a :var: /value/    - alter :var: to /value/

b <n>              - toggle breakpoint on line <n>

b <label>          - toggle breakpoint on <label>

b                  - show all existing breakpoints and their script source 
                     code line

c                  - continue execution till breakpoint or end or ... Note 
                     that this depends on the 'm' settings (see below at 'c 
                     <m>' for a list).

c <n>              - execute <number> more statements

c <m>              - execute next statement while setting executing <mode> 
                     to:

                       n / d    - reset to default: single step [into], 
                                  break at any line

                       i        - single step [into]

                       s        - skip calls, i.e. skip to next stmt on same 
                                  call depth

                       o        - step out, i.e. skip to stmt after 
                                  'return'ing outa here

                       f        - 'fail': skip until } terminating this 
                                  scope block

                     Note: 'o' and 'f' are auto-reset: they're active only 
                           once.

                       a        - break at any statement

                       e        - break only at executable statements

                       x        - break at exceptions: when a [non]fatal 
                                  error or fault is triggered, the debugger 
                                  will pop up just prior to the moment where 
                                  CRM114 will select a suitable trap 
                                  catcher/handler.

                     Note: 'a'/'e'/'x' can be combined with any one of 
                     'i'/'s'/'o'/'f'. 'a' and 'e' are mutually exclusive; 'x' 
                     stands on its own.

c <m> <n>          - the above combined: set mode + step <n> statements

d                  - dump a list of all variables known to the program. This 
                     includes all 'system variables', i.e. variables 
                     starting with an '_' underscore, e.g. :_dw:

e <e>              - expand expression <e>; if none specified, show the 
                     previous one again. See the '+' command for a complete 
                     description of the <e> format.

                     NOTE that the 'regex to match available variables for 
                          adding to watch list' feature does NOT apply to 
                          'e': it only supports '<' indirection, next to 
                          literal expanded expressions such as

                            :*:good_pR__value:

                          to view the content of the script variable 
                          ':good_pR__value:'.

+ <e>              - 'watch' the expanded expression <e>. 'Watching' means 
                     the debugger will print this expression's expansion 
                     after executing any command which may have changed the 
                     program state in any way, e.g. 'c' or 'n'.

                     Special processing of <e>:

                     - No arg? -> add last 'e' expr to show list

                     - Does the expression start with a '<' character? In 
                       that case, it's assumed to be an 'indirection' 
                       instruction to fetch expression from the active 
                       source code line itself. Or somewhere else...

                       The format is this:

                          <NBI

                       where

                          N is a number signifying a line number (statement 
                            number in the source, corresponding to the line 
                            number as reported in a 'v' program listing).

                          N can also be simply '.', which is the current 
                            statement.

                          B is the type of crm114 script command argument 
                            delimiter we're looking for: '/', '<', '[' or 
                            '('.

                            For instance, specify '/' if you like to grab 
                            the content of the // args of that 'output' 
                            command you're looking at.

                          I is the number of this particular braced
                            element: 1 by default.

                       Thus

                          <./2

                       indicates that we like a copy of the 2nd // argument
                       for the current statement.

                       Other indirections:

                          N can also be '<', which indicates that the
                            remainder of the line is specially formatted
                            line, which must be processed once to produce
                            the desired expression from the already existing
                            list of 'watched expressions'.

                            The 'formatted line' may include '%n%' (where n
                            is a number) arguments, where these 'n' numbers
                            point at that particular watched expression.

                       Example:

                          <<:*%5%

                       will take the 5th existing watch expression and
                       replace the '%5%' string with that. Assuming the 5th
                       watched expression is ':filename:' (for instance
                       grabbed from a match expression using '<.(3'), then
                       the resulting expression string will become:

                          :*:filename:

                       The '<<' formatted indirection is handy to use for a
                       two-step copy&paste process where output variables
                       are grabbed from the script, then merged together
                       with other text to watch the value written to such
                       variables.

                       Of course, you may specify multiple %n% items in a
                       format string, e.g.

                          << full path = ':*%3%/%7%'

                       where, for example, the 3rd and 7th watched
                       expressions read:

                         3rd:  ':spamdir:'
                         7th:  ':*:stripped_filename:'

                       It's all a matter of string concatenation,
                       really. ;-)


                     The '+' command has something extra:

                       Once the '<' indirection has been expanded, '+' (NOT
                       'e': that command does not support this) will inspect
                       the produced expression:

                     - Is <e> not a :*:, :#:, :+: or :@: expression? Assume
                       it then to be a search regex which will select any
                       (partially) matching existing variable to the watch
                       list. The regex may select several existing variables
                       at once and add them all to the 'watch list' if the
                       equivalent :@:varname: expressions are not already on
                       the watchlist.

                       NOTE: Yes, duplicate expressions are NOT added to the
                             watch list at ANY time to prevent clutter.

                       Make sure an '_' is in there when you want to add
                       system vars too:

                         + *

                       is an exception (as '*' is not a valid regex) which
                       is equivalent to

                         + .*

                       but because it's missing an '_' underscore in the
                       regex, only 'regular application variables' are added
                       to the watch list, i.e. any variables which start
                       with an '_' underscore are NOT added. For example,
                       :_dw: is NOT added (which you may consider a Good
                       Thing if you don't want your watch list to blow up
                       instantly ;-)

                         + [_]?.*

                       however WILL add all those system variables. Since
                       some variables contain a lot of data, expect a
                       loooong output now.

                       NOTE: watched expression values are written as-is to
                             stderr; presently no 'encoding' a.k.a.
                             'escaping' facilities are provided, so a
                             watched variable ':str:' which contains the
                             C-escaped bytes "ABC\rFG", i.e. 5 ASCII
                             capitals and an ASCII CR, WILL print somewhat
                             like:

                              FG/3] :*:str: = /ABC

                             because the CR will wrap the line but not
                             advance, so 'FG' is printed over the watch
                             index number. This is described here so you've
                             been warned: terminal escape codes stored in
                             crm114 variables, etc. MAY blow up your term
                             when using the debugger watch expression
                             feature.

                       Note that the use of the '<' indirection option with
                            the '+' debugger command can lead to some
                            surprises due to the subsequent
                            regex-if-not-a-:?:-expression check.
                            Nevertheless, for extremely convoluted debugger
                            watch commands, I'm sure someone will find use
                            for this mix; personally I've used '<'
                            indirection with both 'e' and '+', but only to
                            produce :?: expressions which would therefore
                            not be considered regex search criteria.

- <n>                - remove the watched expression #<n> from the show
                       list.

                       <n> not a number? Treat as a search regex and remove
                       any (partially) matching watched expression. For
                       example:

                         - str

                       will remove ANY expression from the watch list which
                       contains the string 'str'. Another example:

                         - :[@#]:

                       will remove any :@: 'math evaluation' or :#: 'string
                       length' from the watch list.

f                    - fail forward to block-closing '}'

                       Note: this is similar to 'cf' (== 'c f'); see 'c'.

h                    - show this on-line help. Add extra 'h' or '?' for list
                       of examples and extended info.

?                    - same as 'h'.

j <n>                - jump to statement <number>. Statement numbers are
                       closely related to their line numbers, but due to
                       script preprocessing by the crm114 compiler, this is
                       only somewhat correct. Therefore you should use the
                       'v' command to find the exact number for each
                       statement: it's the 'linenumber' printed before each
                       statement when displayed using the 'v' debugger
                       command.

j <label>            - jump to statement <label>

l                    - liaf backward to block-opening '{'

m <m>                - set executing <mode>. See 'c' above for <m> mode
                       flags list

n                    - execute next statement (same as 'c 1')

n <m>                - same as 'c <m> 1'

q                    - quit the program and exit

t                    - toggle user-level tracing (which is turned on by
                       default at the start of the run when you specified
                       the '-d' crm114 commandline option to enter the
                       debugger).

T                    - toggle system-level tracing

v <n>.<m>            - view source code statement <n> till <m>.

                       No <n> given: show current statement.

                       Alternatives:

                         'v >5' = (type '>' char!) show current and 5 extra;

                         'v ~3' = show context of 3 lines before till 3
                                  lines after the currently active statement

;                    - no-op: separate multiple commands on a single
                       debugger commandline; you can use this to merge
                       multiple commands on a single line, which is a handy
                       way to define a 'macro' which will be repeated by the
                       '.' debugger command (see below). Nevertheless, some
                       commands cannot be terminated by a ';' but only by
                       EndOfLine, i.e. a press on the ENTER/Return key. These
                       are:

                         e
                         +
                         -
                         a
                         h

.                    - repeat the previous commandline which did not start
                       with the '.' command.

                       'previous commandline' in this particular
                       circumstance is any commandline that contains AT
                       LEAST one of the 'executing' debugger statements 'n',
                       'c', 'j', 'f' or 'l', as repeating other 'display'
                       debugger commands is rather useless.

                       NOTE: when you want to be nasty, you could specify
                             commandlines like these to cause an 'infinite
                             loop' in the debugger:

                               n ; .

                             which will single-step and then repeat the last
                             stored commandline which did not start with
                             '.', i.e. this same line. HOWEVER, the debugger
                             is smart enough to break this infinite loop
                             when a breakpoint or the end-of-program has
                             been reached, so that you can type other
                             commands on the debugger prompt.
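
To make the '<NBI' and '<<' indirection formats described under the '+'
command more concrete, here is a small Python sketch of how such strings
could be taken apart. It is an illustration only, not the debugger's actual
code: the function names are invented, watch expressions are assumed to be
numbered from 1, and no whitespace is allowed between the parts.

```python
import re

# '<' N B I : N is '.' or a statement number, B is one of / < [ (,
# I is an optional element index (1 when omitted).
_INDIRECT = re.compile(r"<(\.|\d+)([/<\[(])(\d*)$")


def parse_indirection(spec):
    """Split a '<NBI' string into (statement, delimiter, index)."""
    m = _INDIRECT.match(spec)
    if m is None:
        raise ValueError("not a <NBI indirection: %r" % spec)
    n, b, i = m.groups()
    statement = "." if n == "." else int(n)
    return statement, b, (int(i) if i else 1)


def expand_format(fmt, watches):
    """Expand the text after '<<': replace each %n% by watch expression n."""
    def lookup(m):
        return watches[int(m.group(1)) - 1]   # assumed 1-based numbering
    return re.sub(r"%(\d+)%", lookup, fmt)


# Hypothetical watch list; entries 3, 5 and 7 follow the examples above.
watches = ["a", "b", ":spamdir:", "d", ":filename:", "f",
           ":*:stripped_filename:"]
print(parse_indirection("<./2"))
print(expand_format(":*%5%", watches))
print(expand_format("full path = ':*%3%/%7%'", watches))
```

As the text says, the '<<' form is nothing more than string concatenation:
each %n% is replaced verbatim by the n-th watched expression.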

Some examples
=============

To start off:

m ie

        Set Step Mode to default 'step into' but only show the prompt again
        when we've hit an executable statement. This also means '{' and '}'
        braces will be skipped, together with any comment lines.

n ; v >5

        Very handy command sequence, which will execute one more statement,
        then show the currently active and 5 more source lines. Once you've
        executed this debugger 'macro' once, follow up with:

.

        Execute previous 'debugger macro' again. A 'debugger macro' is just
        the last line you typed in the debugger, which did NOT start with a
        '.' dot. By hitting ENTER on the debugger prompt, you 'erase' any
        'macro' in memory and '.' will do nothing.

...

        Execute the last recorded 'debugger macro' 3 times in a row.

+ :*:filename:

        Add the expression ':*:filename:' to the watched expressions, which
        are printed by the debugger each time when a prompt is to be shown.
        Of course, the expressions support all CRM114 :?: expression types,
        including :@: math expressions.

        Note: invalid expressions do not result in error messages; these are
        simply processed as best as possible and the result printed on
        screen.

        Note 2: when you use '+', semicolons which can usually be used to
        separate multiple commands on a single debugger prompt line, are
        ignored: anything following the '+' is considered part of the
        expression to be watched.

+ :_.*HOME

        Add any system variable (i.e. variable which starts with an '_'
        underscore) matching the ':_.*HOME' regex. This means a variable
        called ':_HOME_VAR:' will be added to the watch list too as partial
        matches are accepted!

- 4

        remove the 4th watched expression


- *

        Special case, identical to '- .*' which removes all watched
        expressions as each will match the '.*' regex.
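
The regex behaviour of '+' and '-' shown in the examples above can be
summarised in a short Python sketch. This is a reading of the documented
rules, not the debugger source: the function names are invented, Python's
're' module stands in for CRM114's regex engine (the two differ in details),
and system variables are assumed to be recognisable by a leading ':_'.

```python
import re


def select_for_watch(pattern, variables):
    """'+ <regex>': pick variables that (partially) match the regex.

    System variables are only picked when the regex itself contains '_'.
    """
    if pattern == "*":            # documented special case
        pattern = ".*"
    include_system = "_" in pattern
    rx = re.compile(pattern)
    return [v for v in variables
            if rx.search(v) and (include_system or not v.startswith(":_"))]


def remove_from_watch(arg, watched):
    """'- <n>' removes entry n; anything else is a partial-match regex."""
    if arg.isdigit():
        drop = int(arg) - 1       # assumed 1-based numbering
        return [w for i, w in enumerate(watched) if i != drop]
    if arg == "*":
        arg = ".*"
    rx = re.compile(arg)
    return [w for w in watched if not rx.search(w)]


# Hypothetical variable names and watch list.
variables = [":_dw:", ":filename:", ":_env_HOME:"]
watched = [":*:filename:", ":#:str:", ":@: 1 + 2 :", ":*:str:"]
print(select_for_watch("*", variables))
print(select_for_watch(":_.*HOME", variables))
print(remove_from_watch(":[@#]:", watched))
```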


Extended examples
=================

c do 5

        Reset Step mode to default: break at ANY statement (including
        comments and braces) and then run until EITHER you've 'executed' 5
        lines, OR until you've finally exited the function by having
        executed a 'RETURN' statement. Note that when you execute 'RETURN'
        when the count is not '5 lines executed', CRM114 will CONTINUE
        RUNNING until the 5 statements have been executed.

b 511

        Set a breakpoint on program line 511. Note that the debugger will
        ALWAYS pop up and show a breakpoint on the given line, even when
        step mode is 'E' (executable statements only) and the breakpoint is
        located at a brace or comment line.


Note that you do not need to separate debugger commands and their first
     parameter by whitespace; this may however be easier on the human doing
     the typing, so lines like

          n5

     or

          co

     are identical to their whitespaced counterparts:

          n 5

     and

          c o

Also note that whitespace is not necessary - though it helps you, the user,
to better see what you are doing. The command sequence

  n ie ; v > 5

can therefore also be written without any whitespace like this

  nie;v>5

which might put a shudder or a smile on folks who remember editors like
TECO, as this is getting close to line noise too.


The debugger prompt: how to 'read' it
-------------------------------------

It looks somewhat like this:

  crm-dbg[0|Ai-]>

and contains a wealth of information within the [] square brackets:

- The initial number is the current stack call depth (zero-based) of the
  crm114 script execution.

  For instance, you may find that a 'trap' catching a fault inside a
  'call'ed routine does not unwind the stack like 'return', despite the fact
  that the code looks like the 'trap' is in the main code line (see
  mailreaver.crm et al for an example of that): developers that come from a
  classic programming background may be misled by this visual 'cue' as
  crm114 does in fact NOT support 'scope' for variables or code flow in the
  classic sense; in other words: all variables are visible globally, no
  matter where and when they were instantiated, and trapped faults do not
  cause stack unwinding, just because the 'trap' is several 'layers' of '}'
  closing braces down the code line.

- The three characters following the '|' pipe symbol are encoded as follows,
  assuming the three positions are numbered 1, 2 and 3:

  1: A/E       - 'E' when the step mode ('m') is set to 'e' (single-step
                 from executable to executable statement == skip comments
                 and curly braces), 'A' otherwise.

  2: i/?/S/O/F - '?' when the step mode ('m') is a weird combo (debugger
                 developer error!),

                 'S' when mode is 'step over call' ('m s'),

                 'O' when it's mode 'step out of call' ('m o'),

                 'F' when set to 'step out of this brace's scope' == step
                 beyond the next closing curly brace ('m f'),

                 'i' otherwise == 'step into' mode ('m i')

                 NOTE that 'O' and 'F' are 'autoresetting' modes, so having
                      them executed once will reset the mode to 'step into'
                      ('i'). This has been done to prevent nasty surprises
                      when you're debugging and hammering a bit too
                      enthusiastically on the '.' repeat key, for instance
                      when your previous commandline was 'n ; v > 5' or
                      some other execution stepping combo.

  3: -/B/b/X/$ - 'B' when the debugger prompt showed up due to a 'debug'
                 statement being hit in the script.

                 'b' when a configured breakpoint was triggered. This can
                 also happen when executing downcounting commands like, for
                 example, 'n 10', which will single-step 10 executable
                 commands when running in mode 'm ie'.

                 'X' when the debugger just popped up while an error in the
                 script caused a trappable fault. Note that the debugger
                 will show the failure notice (message) when popping up.

                 '$' when the script has completed running. Note that in
                 this situation, you cannot use any jump debugger commands
                 (e.g. 'j' or 'l') anymore with any chance at success: the
                 machine is already on its way to wind down and this is
                 merely provided as a service to have a last-minute ogle at
                 those variables (try '+ *' for instance) before we shut
                 down.

                 Hence the use of any 'executing' debugger command, such as
                 'c' or 'n' will simply terminate crm114 as the script
                 execution has terminated.

                 Maybe a later release allows 'j' or other commands to be
                 successful in this situation, but that is not to be today,
                 alas.
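
If you capture debugger sessions in a log, the prompt layout described above
is regular enough to decode mechanically. The Python sketch below does so;
it is an illustration built from the description in this section, not part
of CRM114, and the names are invented. The meaning of '-' in the third
position is not spelled out above, so it is labelled as an assumption.

```python
import re

# Layout: crm-dbg[<depth>|<break-on><step mode><popup reason>]>
_PROMPT = re.compile(r"crm-dbg\[(\d+)\|([AE])([i?SOF])([-BbX$])\]>")

BREAK_ON = {"A": "any statement", "E": "executable statements only"}
STEP_MODE = {
    "i": "step into",
    "S": "step over call",
    "O": "step out of call",
    "F": "step out of this brace's scope",
    "?": "weird combo (debugger developer error)",
}
POPUP_REASON = {
    "-": "none of the cases below (assumed)",
    "B": "'debug' statement hit",
    "b": "breakpoint triggered",
    "X": "trappable fault",
    "$": "script has completed",
}


def decode_prompt(prompt):
    """Turn a prompt such as 'crm-dbg[0|Ai-]>' into a readable dict."""
    m = _PROMPT.search(prompt)
    if m is None:
        raise ValueError("not a debugger prompt: %r" % prompt)
    depth, brk, step, reason = m.groups()
    return {
        "depth": int(depth),
        "break_on": BREAK_ON[brk],
        "step_mode": STEP_MODE[step],
        "reason": POPUP_REASON[reason],
    }


print(decode_prompt("crm-dbg[0|Ai-]>"))
```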


     ------------ Overall Language Notes ------------

Here's how to remember what goes where in the CRM114 language.

Unlike most computer languages, CRM114 uses inflection (or declension)
rather than position to describe what role each part of a statement plays.
The declensions are marked by the delimiters: the /, ( and ), < and >, and [
and ].

By and large, you can mix up the arguments to each kind of statement without
changing their meaning.  Only the ACTION needs to be first. Other parts of
the statement can occur in any order, save that multiple (paren_args) and
/pattern_args/ must stay in their nominal order but can go anywhere in the
statement.  They do not need to be consecutive.

The parts of a CRM114 statement are:

          ACTION             - the verb.  This is at the start of the
                               statement.

          /pattern/          - the overall pattern the verb should use,
                               analogous to the "subject" of the statement.

          <flags>            - modifies how the ACTION does the work. You'd
                               call these "adverbs" in human languages.

          (vars)             - what variables to use as adjuncts in the
                               action (what would be called the "direct
                               objects").  These can get changed when the
                               action happens.

          [limited-to]       - where the action is allowed to take place
                               (think of it as the "indirect object").
                               Generally these are not directly changed by
                               the action.  These may contain "adjectival
                               phrases": var restrictions, either by
                               subscript or by regex or both.
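
To see why argument order does not matter, it can help to look at a toy
parser that assigns roles purely by delimiter. The Python sketch below is
such a toy, not CRM114's real parser: it ignores nesting, escapes inside ()
<> [] and multi-line statements, and the function name and sample statements
are made up for the illustration.

```python
import re

# One alternative per delimiter pair; /.../ may contain escaped slashes.
_ARG = re.compile(r"/(?:\\.|[^/\\])*/|\([^)]*\)|<[^>]*>|\[[^\]]*\]")

ROLE = {"/": "pattern", "<": "flags", "(": "vars", "[": "limited-to"}


def parse_statement(statement):
    """Split a statement into its ACTION and delimiter-marked parts."""
    action, _, rest = statement.strip().partition(" ")
    parts = {"action": action}
    for m in _ARG.finditer(rest):
        token = m.group(0)
        # The opening delimiter alone decides the role; strip both ends.
        parts.setdefault(ROLE[token[0]], []).append(token[1:-1])
    return parts


a = parse_statement("match <nocase> (:x:) /foo/ [:_dw:]")
b = parse_statement("match [:_dw:] /foo/ (:x:) <nocase>")
print(a)
print(a == b)
```

Both statements yield the same parts because each argument carries its own
role marker; only the ACTION is positional. Arguments of the same kind are
kept in a list in the order written, which mirrors the rule that multiple
(paren_args) and /pattern_args/ must stay in their nominal order.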

