NAME

glark − Search text files for complex regular expressions

SYNOPSIS

glark [options] expression file ...

DESCRIPTION

Similar to "grep", "glark" offers: Perl-compatible regular expressions, color highlighting of matches, context around matches, complex expressions ("and" and "or"), grep output emulation, and automatic exclusion of non-text files. Its regular expressions should be familiar to persons experienced in Perl, Python, or Ruby. In the above synopsis, file may also be a list of files in the form of a path.

OPTIONS

Input

−0[nnn]

Use \nnn (octal) as the input record separator. If nnn is omitted, use ’\n\n’ as the record separator, which treats paragraphs as lines.
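
The effect of the record separator can be pictured with a short Python sketch. It is an illustration only, not glark's implementation; the names separator_from_octal and split_records are hypothetical. It shows the same text split into line records and into paragraph records.

```python
def separator_from_octal(nnn):
    """Convert an octal string such as '012' into the character it denotes."""
    return chr(int(nnn, 8))


def split_records(text, separator="\n"):
    """Split text into records on the separator, dropping empty records."""
    return [record for record in text.split(separator) if record]


sample = "first line\nstill paragraph one\n\nparagraph two\n"

# Default behavior: every line is its own record.
by_line = split_records(sample)

# Paragraph mode (nnn omitted): blank lines delimit the records.
by_paragraph = split_records(sample, "\n\n")
```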

−d ACTION, −−directories=ACTION

Directories are processed according to the given ACTION, which by default is "read". If ACTION is "recurse", each file in the directory is read and each subdirectory is recursed into (equivalent to the −r option). If ACTION is "skip", directories are not read, and no message is produced.

−−binary−files=TYPE

Specify how to handle binary files, thus overriding the default behavior, which is to denote the binary files that match the expression, without displaying the match. TYPE may be one of: "binary", the default; "without−match", which results in binary files being skipped; and "text", which results in the binary file being treated as text, the display of which may have bad side effects with the terminal. Note that the default behavior has changed; this previously was to skip binary files. The same effect may be achieved by setting binary-files to "without−match" in the ~/.glarkrc file.

−−basename EXPR, −−name EXPR

Search only files whose names match the given regular expression. As in find(1), this works on the basename of the file. This expression can be negated and modified with "i", such as ’/io\.[hc]$/i’.

−−fullname EXPR, −−path EXPR

Search only files whose names, including path, match the given regular expression. As in find(1), this works on the path of the file. This expression can be negated and modified with "i", such as ’/source/.*/ui/.*widget\.java/i’.

−M, −−exclude−matching

Do not search files whose names match the given expression. This can be useful for finding external references to a file, or to a class (assuming that class names match file names).

−r, −−recurse

Recurse through directories. Equivalent to −−directories=recurse.

Matching

−a NUM expr1 expr2
−−and NUM expr1 expr2

Match both of the two expressions, within NUM lines of each other. See the EXPRESSIONS section for more information.

−b NUM[%], −−before NUM[%]

Restrict the search to before the given location, which represents either the number of the last line within the valid range, or the percentage of lines to be searched.

−f NUM[%], −−after NUM[%]

Restrict the search to after the given location, which represents either the number of the first line within the valid range, or the percentage of lines to be skipped.

−i, −−ignore−case

Match regular expressions without regard to case. The default is case sensitive.

−m NUM, −−match−limit NUM

Find only the first NUM matches in each file.

−o expr1 expr2
−−or expr1 expr2

Match either of the two expressions. See the EXPRESSIONS section for more information.

−R, −−range NUM[%] NUM[%]

Restrict the search to the given range of lines.
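
As a rough illustration of how NUM and NUM% positions might be resolved against a file, the following Python sketch treats both ends of the range as inclusive. The function names and the rounding of percentages (truncation) are assumptions, not taken from glark's code.

```python
def resolve_position(spec, total_lines):
    """Turn 'NUM' or 'NUM%' into a 1-based line number (percentages truncate)."""
    if spec.endswith("%"):
        return int(total_lines * float(spec[:-1]) / 100)
    return int(spec)


def lines_in_range(lines, start_spec, end_spec):
    """Return (number, line) pairs that fall inside the inclusive range."""
    first = resolve_position(start_spec, len(lines))
    last = resolve_position(end_spec, len(lines))
    return [(n, line) for n, line in enumerate(lines, start=1) if first <= n <= last]


# A 100-line file searched with the equivalent of "--range 20 50%".
lines = ["line %d" % n for n in range(1, 101)]
selected = lines_in_range(lines, "20", "50%")
```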

−v, −−invert−match

Show lines that do not match the expression.

−w, −−word, −−word−regexp

Put word boundaries around each pattern, thus matching only where the full word(s) occur in the text. Thus, "glark −w Foo" is the same as "glark ’/\bFoo\b/’".
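
The wrapping can be sketched in Python, whose "\b" behaves like the word boundary described here. The helper name as_word_pattern is hypothetical, and the non-capturing group is an assumption added so that alternations stay intact; glark itself may build the pattern differently.

```python
import re


def as_word_pattern(pattern):
    """Wrap a pattern in word boundaries, grouping it so alternations stay intact."""
    return re.compile(r"\b(?:" + pattern + r")\b")


word_foo = as_word_pattern("Foo")

# "Foo" as a full word matches; "Foo" inside "FooBar" does not.
full_word = word_foo.search("call Foo here") is not None
inside_word = word_foo.search("call FooBar here") is not None
```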

−x, −−line−regexp

Select only where the entire line matches the pattern(s).

−−xor expr1 expr2

Match either of the two expressions, but not both. See the EXPRESSIONS section for more information.

Output

−A NUM, −−after−context=NUM

Print NUM lines after a matched expression.

−B NUM, −−before−context=NUM

Print NUM lines before a matched expression.

−C [NUM], −NUM, −−context[=NUM]

Output NUM lines of context around a matched expression. The default is no context. If no NUM is given for this option, the number of lines of context is 2.

−c, −−count

Instead of normal output, display only the number of matches in each file.

−F, −−file−color COLOR

Specify the highlight color for file names. See the HIGHLIGHTING section for the values that can be used.

−−no−filter

Display the entire file(s), presumably with matches highlighted.

−g, −−grep

Produce output like the grep default: file names, no line numbers, and a single line of the match, which will be the first line for matches that span multiple lines. If the EMACS environment variable is set, this value is set to true. Thus, running glark under Emacs results in the output format expected by Emacs.

−h, −−no−filename

Do not display the names of the files that matched.

−H, −−with−filename

Display the names of the files that matched. This is the default behavior.

−l, −−files−with−matches

Print only the names of the files that matched the expression.

−L, −−files−without−match

Print only the names of the files that did not match the expression.

−n, −−line−number

Display the line numbers. This is the default behavior.

−N, −−no−line−number

Do not display the line numbers.

−T, −−text−color COLOR

Specify the highlight color for text. See the HIGHLIGHTING section for more information.

−u, −−highlight

Enable highlighting, which uses ANSI escape sequences. This is the default behavior. See the HIGHLIGHTING section for more information.

−U, −−no−highlight

Disable highlighting.

−y, −−extract−matches

Display only the region that matched, not the entire line. If the expression contains "backreferences" (i.e., expressions bounded by "( ... )"), then only the portion captured will be displayed, not the entire line. This option is useful with "−g", which eliminates the default highlighting and display of file names.
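
The difference between extracting the whole match and extracting a captured portion can be sketched in Python. This is not glark's code: extract_matches is a hypothetical name, and using only the first group when several are present is an assumption.

```python
import re


def extract_matches(pattern, line):
    """Return the first captured group of each match, or the whole match if
    the pattern has no groups."""
    extracted = []
    for match in re.finditer(pattern, line):
        extracted.append(match.group(1) if match.groups() else match.group(0))
    return extracted


line = "    ex.printStackTrace();"

# Without a group, the matched region is extracted.
whole = extract_matches(r"\w+\.printStackTrace\(.*\)", line)

# With a group around the variable name, only the captured portion is extracted.
captured = extract_matches(r"(\w+)\.printStackTrace\(.*\)", line)
```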

−Z, −−null

When in −l mode, write file names followed by the ASCII NUL character (’\0’) instead of ’\n’.

Debugging/Errors

−?, −−help

Display the help message.

−−config

Display the settings glark is using, and exit. Since this is run after configuration files are read, this may be useful for determining values of configuration parameters.

−−explain

Write the expression in a more legible format, useful for debugging.

−q, −s, −−quiet, −−no−messages

Suppress warnings.

−Q, −−no−quiet

Enable warnings. This is the default.

−V, −−version

Display version information.

−−verbose

Display normally suppressed output, for debugging purposes.

EXPRESSIONS

Regular Expressions

Regular expressions are expected to be in the Perl/Ruby format. "perldoc perlre" has more general information. The expression may be of either form:

    something
    /something/

There is no difference between the two forms, except that with the latter, one can provide the "ignore case" modifier, thus matching "someThing" and "SoMeThInG":

    % glark /something/i

Note that this is redundant with the "−i" ("−−ignore−case") option.

All regular expression characters and options are available, such as "\w", ".*?" and "[^9]". For example:

    % glark ’\b[a-z][^\d]\d{1,3}.*\s*>>\s*\d+\s*.*& +\d{3}’

If the −−and and −−or options are not used, the first non-option argument is considered to be the expression to be matched. In the following, "printf" is used as the expression.

    % glark -w printf *.c

POSIX character classes (e.g., [[:alpha:]]) are also supported.

Complex Expressions

Complex expressions combine regular expressions (and complex expressions themselves) with logical AND, OR, and XOR operators.

−o expr1 expr2
−−or expr1 expr2 −−end−of−or

Match either of the two expressions. The results of the two forms are equivalent. In the latter syntax, the −−end−of−or is optional.

−a number expr1 expr2
−−and number expr1 expr2 −−end−of−and

Match both of the two expressions, within <number> lines of each other. As with the or option, the results of the two forms are equivalent, and the −−end−of−and is optional. The forms −aNUM and −−and=NUM are also supported.

If the number provided is −1 (negative one), the distance is considered to be "infinite", and thus, the condition is satisfied if both expressions match within the same file.

If the number provided is 0 (zero), the condition is satisfied if both expressions match on the same line.

A warning will be issued if the value given in the number position does not appear to be numeric.
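
The distance rules above (a positive number, 0 for the same line, −1 for anywhere in the file) can be sketched in Python. The function and_matches is a hypothetical illustration of the semantics described here, not glark's matching algorithm.

```python
import re


def and_matches(lines, expr1, expr2, distance):
    """Return (line1, line2) pairs of 1-based line numbers where expr1 and expr2
    match within `distance` lines of each other; -1 means any distance."""
    hits1 = [n for n, line in enumerate(lines, start=1) if re.search(expr1, line)]
    hits2 = [n for n, line in enumerate(lines, start=1) if re.search(expr2, line)]
    return [
        (a, b)
        for a in hits1
        for b in hits2
        if distance == -1 or abs(a - b) <= distance
    ]


lines = ["printf(x);", "", "", "format = 1;", "printf(format);"]

same_line = and_matches(lines, "printf", "format", 0)
within_three = and_matches(lines, "printf", "format", 3)
whole_file = and_matches(lines, "printf", "format", -1)
```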

−−xor expr1 expr2 −−end−of−xor

Match either of the two expressions, but not both. −−end−of−xor is optional.

Negated Regular Expressions

Regular expressions can be negated, by being prefixed with ’!’, and using the ’/’ quote characters around the expression, such as:

    !/expr/

This has the effect of "match anything other than this". For a single expression, this is no different than the −v/−−invert−match option, but it can be useful in complex expressions, such as:

    --and 0 this ’!/that/’

which means "match any line that has "this", but not "that"".
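
How negation combines with the logical operators on a single line can be sketched in Python. The helpers below are hypothetical and simplified: they understand only the plain, /expr/ and !/expr/ forms (no "i" modifier) and evaluate one line at a time.

```python
import re


def matches(expr, line):
    """Evaluate 'expr', '/expr/' or '!/expr/' against a single line."""
    negated = expr.startswith("!/") and expr.endswith("/")
    if negated:
        expr = expr[1:]
    if len(expr) >= 2 and expr.startswith("/") and expr.endswith("/"):
        expr = expr[1:-1]
    found = re.search(expr, line) is not None
    return found != negated


def and_same_line(expr1, expr2, line):
    """Equivalent of '--and 0 expr1 expr2' for one line."""
    return matches(expr1, line) and matches(expr2, line)


def xor(expr1, expr2, line):
    """Equivalent of '--xor expr1 expr2' for one line."""
    return matches(expr1, line) != matches(expr2, line)


this_without_that = and_same_line("this", "!/that/", "only this here")
this_with_that = and_same_line("this", "!/that/", "this and that")
```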

HIGHLIGHTING

Matching patterns and file names can be highlighted using ANSI escape sequences. Both the foreground and the background colors may be specified, from the following:

    black
    blue
    cyan
    green
    magenta
    red
    white
    yellow

The foreground may have any number of the following modifiers applied:

    blink
    bold
    concealed
    reverse
    underline
    underscore

The format is " MODIFIERS FOREGROUND on BACKGROUND ". For example:

    red
    black on yellow                    (the default for patterns)
    reverse bold                       (the default for file names)
    green on white
    bold underline red on cyan

By default text is highlighted as black on yellow. File names are written in reversed bold text.
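
A highlight specification in this format maps naturally onto ANSI SGR codes. The Python sketch below uses the standard code numbers (30 to 37 for foreground, 40 to 47 for background); the function names are hypothetical, and the exact escape sequences glark emits may differ.

```python
COLORS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]
MODIFIERS = {"bold": 1, "underline": 4, "underscore": 4, "blink": 5,
             "reverse": 7, "concealed": 8}


def ansi_codes(spec):
    """Translate 'MODIFIERS FOREGROUND on BACKGROUND' into a list of SGR codes."""
    codes = []
    background = False
    for word in spec.split():
        if word == "on":
            background = True  # every color after "on" is a background color
        elif word in MODIFIERS:
            codes.append(MODIFIERS[word])
        elif word in COLORS:
            codes.append((40 if background else 30) + COLORS.index(word))
        else:
            raise ValueError("unknown highlight word: " + word)
    return codes


def highlight(text, spec):
    """Wrap text in the escape sequence for spec, followed by a reset."""
    sequence = ";".join(str(code) for code in ansi_codes(spec))
    return "\033[" + sequence + "m" + text + "\033[0m"


default_pattern = ansi_codes("black on yellow")
default_file_name = ansi_codes("reverse bold")
```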

EXAMPLES

Basic Usage

% glark format *.h

Searches for "format" in the local .h files.

% glark −−ignore−case format *.h

Searches for "format" without regard to case. Short form:
% glark −i format *.h

% glark −−context=6 format *.h

Produces 6 lines of context around any match for "format". Short forms:
% glark −C 6 format *.h
% glark −6 format *.h

% glark −−exclude−matching Object *.java

Find references to "Object", excluding the files whose names match "Object". Thus, SessionBean.java would be searched; EJBObject.java would not. Short form:
% glark −M Object *.java

% glark −−grep −−extract−matches ’\w+\.printStackTrace\(.*\)’ *.java

Show where exceptions are dumped. Note that the "−−grep" option is used, thus turning off highlighting and display of file names. If the "−−no−filename" option is used, the output will consist of only the matching portions. The short form of this command is:
% glark −gy ’\w+\.printStackTrace\(.*\)’ *.java

% glark −−grep −−extract−matches ’(\w+)\.printStackTrace\(.*\)’ *.java

Show only the variable name of exceptions that are dumped. Short form:
% glark −gy ’(\w+)\.printStackTrace\(.*\)’ *.java

% who | glark −gy ’^(\S+)\s+\S+\s*May 15’

Display only the names of users who logged in on May 15.

% glark −l ’\b\w{25,}\b’ *.txt

Displays (only) the names of the text files that contain "words" at least 25 characters long.

% glark −−files−without−match ’"\w+"’

Displays (only) the names of the files that do not contain strings consisting of a single word. Short form:
% glark −L ’"\w+"’

Highlighting

% glark −−text−color "red on white" ’\b[[:digit:]]{5}\b’ *.c

Display (in red text on a white background) occurrences of exactly 5 digits. Short form:
% glark −T "red on white" ’\b\d{5}\b’ *.c

See the HIGHLIGHTING section for valid colors and modifiers.

Complex Expressions

% glark −−or format print *.h

Searches for either "format" or "print". Short form:
% glark −o format print *.h

% glark −−and 4 printf format *.c *.h

Searches for both "printf" and "format" within 4 lines of each other. Short form:
% glark −a 4 printf format *.c *.h

% glark −−context=3 −−and 0 printf format *.c

Searches for both "printf" and "format" on the same line ("within 0 lines of each other"). Three lines of context are displayed around any matches. Short form:
% glark −3 −a 0 printf format *.c

% glark −8 −i −a 15 −a 2 pthx ’\.\.\.’ −o ’va_\w+t’ die *.c

(In order of the options:) Produces 8 lines of context around case-insensitive matches of ("pthx" within 2 lines of ’...’ (literal)) within 15 lines of (either "va_\w+t" or "die").

% glark −−and −1 ’#define\s+YIELD’ ’#define\s+dTHR’ *.h

Looks for "#define\s+YIELD" within the same file (−1 == "infinite distance") of "#define\s+dTHR". Short form:
% glark −a −1 ’#define\s+YIELD’ ’#define\s+dTHR’ *.h

Range Limiting

% glark −−before 50% cout *.cpp

Find references to "cout", within the first half of the file. Short form:
% glark −b 50% cout *.cpp

% glark −−after 20 cout *.cpp

Find references to "cout", starting at the 20th line in the file. Short form:
% glark −f 20 cout *.cpp

% glark −−range 20 50% cout *.cpp

Find references to "cout", in the first half of the file, after the 20th line. Short form:
% glark −R 20 50% cout *.cpp

ENVIRONMENT

GLARKOPTS

A string of whitespace-delimited options. Due to parsing constraints, it should probably not contain complex regular expressions.

$HOME/.glarkrc

A resource file, containing name/value pairs, separated by either ’:’ or ’=’. The valid fields of a .glarkrc file are as follows, with example values:

    after-context:  1
    before-context: 6
    context:        5
    file-color:     blue on yellow
    highlight:      off
    ignore-case:    false
    quiet:          yes
    text-color:     bold reverse
    verbose:        false
    grep:           true

"yes" and "on" are synonymous with "true". "no" and "off" signify "false".

My ~/.glarkrc file is the following:

    file-color:   bold reverse
    text-color:   bold black on yellow
    context:      2
    highlight:    on
    verbose:      false
    ignore-case:  false
    quiet:        yes
    word:         false
    binary-files: without-match
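
The name/value format described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not glark's parser: parse_glarkrc is a hypothetical name, each line is split on its first ':' or '=', and the boolean words are compared exactly as written.

```python
import re

TRUE_WORDS = {"true", "yes", "on"}
FALSE_WORDS = {"false", "no", "off"}


def parse_glarkrc(text):
    """Parse name/value lines into a dict, converting boolean words."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ':' or '=' only, so values may contain either.
        parts = re.split(r"\s*[:=]\s*", line, maxsplit=1)
        if len(parts) != 2:
            continue
        name, value = parts
        if value in TRUE_WORDS:
            settings[name] = True
        elif value in FALSE_WORDS:
            settings[name] = False
        else:
            settings[name] = value
    return settings


settings = parse_glarkrc(
    "context: 5\nhighlight = off\nquiet: yes\nfile-color: blue on yellow\n"
)
```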

local .glarkrc

See the local-config-files field below.

Fields

after-context

See the −−after−context option. Example, for 3 lines:

    after-context: 3

before-context

See the −−before−context option. Example, for 7 lines:

    before-context: 7

binary-files

See the −−binary−files option. Example, to skip binary files:

    binary-files: without-match

context

See the −−context option. Example, for 2 lines before and after matches:

    context: 2

expression

See the EXPRESSIONS section. Example:

    expression: --or ’^\s*public\s+class\s+\w+’ ’^\s*\w+\(

file-color

See the −−file−color option. Example for white on black:

    file-color: white on black

filter

See the −−no−filter option. Example, to show the entire file:

    filter: false

grep

See the −−grep option. Example, to run in grep mode:

    grep: true

highlight

See the −−highlight option. To turn off highlighting:

    highlight: false

ignore-case

See the −−ignore−case option. To make matching case−insensitive:

    ignore-case: true

known-nontext-files

The extensions of files that should be considered to always be nontext (binary). If a file extension is not known, the file contents are examined for nontext characters. Thus, setting this field can result in faster searches. Example:

    known-nontext-files: class exe dll com

See the Exclusion of Non-Text Files section in NOTES for the default settings.

known-text-files

The extensions of files that should be considered to always be text. See above for more. Example:

    known-text-files: ini bat xsl xml

See the Exclusion of Non-Text Files section in NOTES for the default settings.

local-config-files

By default, glark uses only the configuration file ~/.glarkrc. Enabling this makes glark search upward from the current directory for the first .glarkrc file.

This can be used, for example, in a Java project, where .class files are binary, versus a PHP project, where .class files are text:

    /home/me/.glarkrc
        local-config-files: true
    /home/me/projects/java/.glarkrc
        known-nontext-files: class
    /home/me/projects/php/.glarkrc
        known-text-files: class

With this configuration, .class files will automatically be treated as binary files in Java projects, and as text in PHP projects. This can speed up searches.

Note that the configuration file ~/.glarkrc is read first, so the local configuration file can override those settings.
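
The upward search for a local configuration file can be sketched in Python as follows. The function name find_local_glarkrc is hypothetical, and the sketch only locates the file; merging its settings over those from ~/.glarkrc is left out.

```python
from pathlib import Path


def find_local_glarkrc(start_dir):
    """Return the first .glarkrc found in start_dir or its ancestors, else None."""
    directory = Path(start_dir).resolve()
    # Check the starting directory first, then each parent up to the root.
    for candidate_dir in [directory, *directory.parents]:
        candidate = candidate_dir / ".glarkrc"
        if candidate.is_file():
            return candidate
    return None
```

Called from /home/me/projects/java/src, for example, this would return /home/me/projects/java/.glarkrc if that file exists and no .glarkrc sits in the src directory itself.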

quiet

See the −−quiet option.

show-break

Whether to display breaks between sections, when displaying context. Example:

    show-break: true

By default, this is false.

text-color

See the −−text−color option. Example:

    text-color: bold blue on white

verbose

See the −−verbose option. Example:

    verbose: true

verbosity

See the −−verbosity option. Example:

    verbosity: 4

NOTES

Exclusion of Non-Text Files

Non-text files are automatically skipped, by taking a sample of the file and checking for an excessive number of non-ASCII characters. For speed purposes, this test is skipped for files whose suffixes are associated with text files:

    c
    cpp
    css
    h
    f
    for
    fpp
    hpp
    html
    java
    mk
    php
    pl
    pm
    rb
    rbw
    txt

Similarly, this test is also skipped for files whose suffixes are associated with non-text (binary) files:

    Z
    a
    bz2
    elc
    gif
    gz
    jar
    jpeg
    jpg
    o
    obj
    pdf
    png
    ps
    tar
    zip

In the code, see @@KNOWN_TEXT and @@KNOWN_NONTEXT for the extensions of files that are presumed to be text and nontext. These lists may be easily modified.
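
The two-step check described in this section (suffix first, then a content sample) can be sketched in Python. The suffix sets are abbreviated from the lists above, and the function name and the 30% threshold are assumptions for illustration; glark's actual sample size and cutoff are not stated here.

```python
KNOWN_TEXT = {"c", "cpp", "h", "java", "txt"}       # abbreviated from the list above
KNOWN_NONTEXT = {"gz", "jar", "o", "png", "zip"}    # abbreviated from the list above


def looks_like_text(name, sample, max_nonascii_ratio=0.3):
    """Guess whether a file is text from its suffix, else from a sample of bytes."""
    suffix = name.rsplit(".", 1)[-1] if "." in name else ""
    if suffix in KNOWN_TEXT:
        return True           # known text suffix: content test skipped
    if suffix in KNOWN_NONTEXT:
        return False          # known binary suffix: content test skipped
    if not sample:
        return True
    nonascii = sum(1 for byte in sample if byte > 127)
    return nonascii / len(sample) <= max_nonascii_ratio
```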

Exit Status

The exit status is 0 if matches were found, 1 if no matches were found, and 2 if there was an error. An inverted match (the −v/−−invert−match option) reverses the first two: 1 if matches were found, 0 if none were found.
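
The exit status rules can be summarized as a small Python function. The name exit_status is hypothetical, and giving an error precedence over the match result is an assumption.

```python
def exit_status(matches_found, error=False, invert=False):
    """Return the exit status described above for a finished search."""
    if error:
        return 2
    if invert:
        # Inverted match: the meaning of 0 and 1 is swapped.
        return 1 if matches_found else 0
    return 0 if matches_found else 1
```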

SEE ALSO

For regular expressions, the "perlre" man page.

Mastering Regular Expressions, by Jeffrey Friedl, published by O’Reilly.

CAVEATS

"Unbalanced" leading and trailing slashes will result in those slashes being included as characters in the regular expression. Thus, the following pairs are equivalent:

    /foo        "/foo"
    /foo\/      "/foo/"
    /foo\/i     "/foo/i"
    foo/        "foo/"
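
The balanced-slash rule can be sketched in Python. The function parse_expression is hypothetical and covers only the optional "i" modifier; it treats the slashes as delimiters only when the expression both starts with a slash and ends with an unescaped one, and otherwise keeps every character as part of the pattern.

```python
import re


def parse_expression(text):
    """Return (pattern, ignore_case); slashes are delimiters only when balanced."""
    # The closing slash must not be escaped with a backslash.
    balanced = re.fullmatch(r"/(.*)(?<!\\)/(i?)", text, flags=re.DOTALL)
    if balanced:
        return balanced.group(1), balanced.group(2) == "i"
    return text, False


balanced_form = parse_expression("/foo/i")
leading_only = parse_expression("/foo")
escaped_trailing = parse_expression("/foo\\/")
```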

The code to detect nontext files assumes ASCII, not Unicode.

AUTHOR

Jeff Pace <jpace at incava dot org>

COPYRIGHT

Copyright (c) 2002, Jeff Pace.

All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the terms of the Lesser GNU Public License. See http://www.gnu.org/licenses/lgpl.html for more information.