Thursday, October 9, 2008

Perl Bitwise String Operators

The following Perl snippet looks horrible; see if you can understand it:
"
if ($ident !~ /^\241\262\303\324/ &&
   $ident !~ /^\324\303\262\241/ &&
   $ident !~ /^\241\262\315\064/ &&
   $ident !~ /^\064\315\262\241/)
{
       die "ERROR: Not a tcpdump file (or unknown version) $file\n";
}
"
To understand it, we first have to get familiar with Perl's bitwise string operators.
Bit strings of any size may be manipulated by the bitwise operators (~ | & ^).
If you know them already, just skip this part.

~ is the unary negation operator:
"Unary "~" performs bitwise negation, i.e., 1's complement. For example, 0666 & ~027 is 0640." (quoted from the perlop documentation)

| is the binary or operator,
"Binary "|" returns its operands ORed together bit by bit."

& is the binary and operator,
"Binary "&" returns its operands ANDed together bit by bit."

^ is the binary exclusive or operator,
"Binary "^" returns its operands XORed together bit by bit."
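The four operators can be tried out with a short script. This is only a sketch: the numeric example is the one from the quote above, the string examples are the well-known "JAPH" ones from the perlop documentation, and the variable names are made up here. It assumes the classic behaviour, i.e. that the "bitwise" feature (which is switched on by "use v5.28" and later) is not enabled, so that two string operands are combined byte by byte.

```perl
use strict;
use warnings;

# Numeric operands: the operators work on integers.
my $masked = 0666 & ~027;            # clear the bits that are set in 027
printf "%04o\n", $masked;            # prints 0640

# String operands: the operators work byte by byte on the strings.
my $or  = "JA" | "  ph\n";           # "japh\n"  (| and ^ pad the shorter string)
my $and = "japh\nJunk" & '_____';    # "JAPH\n"  (& cuts to the shorter string)
my $xor = "j p \n" ^ " a h";         # "JAPH\n"
print $or, $and, $xor;
```

ORing a letter with a space (0x20) sets the lower-case bit, ANDing with "_" (0x5F) clears it, and XORing with a space flips it.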

So what does "!~" mean?
Unary "!" performs logical negation, i.e., "not", and "~" is bitwise negation, so it is tempting to read "!~" as the two combined.
But "!~" is a single operator of its own: a binding operator.
First have a look at "=~", which "!~" is related to.

Binary "=~" binds a scalar expression to a pattern match. Certain operations search or modify the string $_ by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or transliteration. The left argument is what is supposed to be searched, substituted, or transliterated instead of the default $_. The return value indicates the success of the operation. (If the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time. This can be less efficient than an explicit search, because the pattern must be compiled every time the expression is evaluated.)

Binary "!~" is just like "=~" except the return value is negated in the logical sense.
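A small sketch of the two binding operators side by side may help. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "hello world";                   # hypothetical sample string

my $bound   = ($text =~ /world/) ? 1 : 0;   # 1: the pattern matches $text
my $negated = ($text !~ /world/) ? 1 : 0;   # 0: same test, result negated

# Without a binding operator, a match works on $_ instead.
my $default;
for ($text) {
    $default = /hello/ ? 1 : 0;             # 1: $_ is aliased to $text here
}

print "$bound $negated $default\n";         # prints "1 0 1"
```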

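On its own, unary "\" creates a reference to whatever follows it, but that is not what happens here.
Inside a pattern (or a double-quoted string), a backslash followed by octal digits stands for the character with that code. So \241 can be viewed as a single char in C with the value octal 241 (hex A1); for example, CR+LF is "\015\012" or "\cM\cJ".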
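To see what these escapes stand for, the bytes can be printed in other bases. This is just a sketch with made-up variable names; the CR+LF comparison assumes an ASCII-based platform.

```perl
use strict;
use warnings;

# "\241" is one character whose code is octal 241 (decimal 161, hex A1).
my $byte = "\241";
printf "%d %o %X\n", ord($byte), ord($byte), ord($byte);   # prints "161 241 A1"

# The first pattern from the snippet, shown as hex digits.
my $magic = "\241\262\303\324";
my $hex   = uc(unpack("H*", $magic));
print "$hex\n";                                            # prints "A1B2C3D4"

# CR+LF written in three equivalent ways.
my $same = ("\015\012" eq "\cM\cJ" && "\015\012" eq "\x0D\x0A") ? 1 : 0;
print "$same\n";                                           # prints "1"
```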

Here "^" is not exclusive or either: inside a pattern it anchors the match to the beginning of the string (or of each line, under the /m modifier).
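The effect of the leading "^" shows up when the same four bytes appear later in the string. A minimal sketch, with made-up sample strings:

```perl
use strict;
use warnings;

my $at_start = "\241\262\303\324 and more";   # bytes at the very beginning
my $later    = "xx\241\262\303\324";          # same bytes, two characters in

my $anchored_start = ($at_start =~ /^\241\262\303\324/) ? 1 : 0;   # 1
my $anchored_later = ($later    =~ /^\241\262\303\324/) ? 1 : 0;   # 0
my $free_later     = ($later    =~  /\241\262\303\324/) ? 1 : 0;   # 1

print "$anchored_start $anchored_later $free_later\n";   # prints "1 0 1"
```

Without the anchor the pattern may match anywhere in the string; with it, only a match at the beginning counts.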

For "$ident !~ /^\241\262\303\324/", the last thing left to understand is /.../.
This is a search pattern.
"$ident" is the variable it is matched against.
So now we know what the following snippet means:
"
if ($ident !~ /^\241\262\303\324/ &&
    $ident !~ /^\324\303\262\241/ &&
    $ident !~ /^\241\262\315\064/ &&
    $ident !~ /^\064\315\262\241/)
{
        die "ERROR: Not a tcpdump file (or unknown version) $file\n";
}
"
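If the beginning (^) of $ident is not one of the following four cases:
\241\262\303\324 (hex A1 B2 C3 D4)
\324\303\262\241 (hex D4 C3 B2 A1)
\241\262\315\064 (hex A1 B2 CD 34)
\064\315\262\241 (hex 34 CD B2 A1)
it will die with the error "ERROR: Not a tcpdump file (or unknown version) $file\n".
The second and fourth cases are the first and third with the bytes reversed, so each value is accepted in either byte order.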
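The same test can also be written as one pattern with alternation, since "none of the four match" is the same as "the alternation of all four does not match". The following is only a sketch, not the original program: the helper name has_known_magic and the in-memory sample data are made up, and it assumes that $ident holds the first four bytes of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $ident starts with one of the four
# byte sequences tested in the snippet.
sub has_known_magic {
    my ($ident) = @_;
    return $ident =~ /^(?:\241\262\303\324|\324\303\262\241|\241\262\315\064|\064\315\262\241)/ ? 1 : 0;
}

# Read the identifier from an in-memory "file" (made-up sample data).
my $data = "\324\303\262\241" . "\0" x 20;
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
binmode $fh;
read($fh, my $ident, 4) == 4 or die "short read";
close $fh;

my $ok  = has_known_magic($ident);      # 1: byte-swapped form of A1 B2 C3 D4
my $bad = has_known_magic("GIF89a");    # 0: some other file type
print "$ok $bad\n";                     # prints "1 0"
```

With such a helper the original check would read: die "..." unless has_known_magic($ident);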

This is the first step to understanding god-read-perl.

