=pod

=head0 Parrot Intermediate Representation

Z<CHP-10>

X<Parrot Intermediate Representation;;(see PIR)>
X<PIR (Parrot intermediate representation)>
The Parrot intermediate representation (PIR) is an overlay on top of
Parrot assembly language, designed to make the developer's life
easier. It has many high-level features that ease the pain of working
with PASM code, but it still isn't a high-level language.

Internally, Parrot works a little differently with PASM and PIR source
code, so each has different restrictions. The default is to run in a
mixed mode that allows PASM code to combine with the higher-level syntax
unique to PIR.

X<.pasm files> A file with a F<.pasm> extension is treated as pure
PASM code, as is any file run with the C<-a> command-line option. This
mode is mainly used for running pure PASM tests. Parrot treats any
extension other than F<.pasm> as a PIR file.
By convention, files containing PIR code have a F<.pir>X<.pir files>
extension.

X<PIR (Parrot intermediate representation);documentation>
The documentation in F<imcc/docs/> or F<docs/> and the test suite in
F<imcc/t> are good starting points for digging deeper into the PIR
syntax and functionality.

=head1 Statements

Z<CHP-10-SECT-1>

X<statements (PIR)>
X<PIR (Parrot intermediate representation);statements>
The syntax of statements in PIR is much more flexible than in PASM. All
PASM opcodes are valid PIR code, so the basic syntax is the same. The
statement delimiter is a newline C<\n>, so each statement has to be on
its own line. Any statement can start with a label. Comments are
marked by a hash sign (C<#>) and PIR allows POD blocks.

But unlike PASM, PIR has some higher-level constructs, including
symbol operators:

  I1 = 5                       # set I1, 5

named variables:

  count = 5

and complex statements built from multiple keywords and symbol
operators:

  if I1 <= 5 goto LABEL        # le I1, 5, LABEL

We'll get into these in more detail as we go.

=head1 Variables and Constants

Z<CHP-10-SECT-2>

X<constants (PIR)>
X<PIR (Parrot intermediate representation);constants>
X<strings;in PIR>
Literal constants in PIR are the same as constants in PASM. Integers
and floating-point numbers are numeric literals and strings are
enclosed in quotes. PIR strings use the same escape sequences as PASM.

=head2 Parrot Registers

Z<CHP-10-SECT-2.1>

PIR code has a variety of ways to store values while you work with
them. The most basic way is to use Parrot registers directly. A
register name always starts with a single character that shows whether
the register is an integer, numeric, string, or PMC register, and ends
with the number of the register (between 0 and 31):

  S0 = "Hello, Polly.\n"
  print S0

When you work directly with Parrot registers, you can only have 32
registers of any one type at a time.N<Only 31 for PMC registers,
because C<P31> is reserved for spilling.> If you have more than that,
you have to start shuffling stored values on and off the user stack.
You also have to manually track when it's safe to reuse a register.
This kind of low-level access to the Parrot registers is handy when
you need it, but it's pretty unwieldy for large sections of code.

=head2 Temporary Registers

Z<CHP-10-SECT-2.2>

X<$ (dollar sign);for temporary registers (PIR)>
X<temporary registers (PIR)>
X<PIR (Parrot intermediate representation);temporary registers>
PIR provides an easier way to work with Parrot registers. The
temporary register variables are named like the PASM registers--with a
single character for the type of register and a number--but they start
with a C<$> character:

  set $S42, "Hello, Polly.\n"
  print $S42

X<registers;PASM registers vs. PIR temporary register variables>
The most obvious difference between Parrot registers and temporary
register variables is that you have an unlimited number of
temporaries. Parrot handles register allocation for you. It keeps
track of how long a value in a Parrot register is needed and when that
register can be reused.

The previous example used the C<$S42> temporary. When the code is
compiled, that temporary is allocated to a Parrot register. As long as
the temporary is needed, it is stored in the same register. When it's
no longer needed, the Parrot register is re-allocated to some other
value. This example uses two temporary string registers:

  $S42 = "Hello, "
  print $S42
  $S43 = "Polly.\n"
  print $S43

Since they don't overlap, Parrot allocates both to the C<S16>
register. If you change the order a little so both temporaries are
needed at the same time, they're allocated to different registers:

  $S42 = "Hello, "  # allocated to S17
  $S43 = "Polly.\n" # allocated to S16
  print $S42
  print $S43

In this case, C<$S42> is allocated to C<S17> and C<$S43> is allocated
to C<S16>.

Parrot allocates temporary variablesN<As well as named variables,
which we'll talk about next.> to Parrot registers in ascending order
of their score. The score is based on a number of factors related to
variable usage. Variables used in a loop have a higher score than
variables outside a loop. Variables that span a long range have a
lower score than ones that are used only briefly.

If you want to peek behind the curtain and see how Parrot is
allocating registers, you can run it with the C<-d> switch to turn on
debugging output.

  $ parrot -d1000 hello.pir

If F<hello.pir> contains this code from the second example above
(wrapped in a subroutine definition so it will compile):

  .sub _main
    $S42 = "Hello, "  # allocated to S17
    $S43 = "Polly.\n" # allocated to S16
    print $S42
    print $S43
    end
  .end

it produces this output:

  code_size(ops) 11  oldsize 0
  0 set_s_sc 17 1 set S17, "Hello, "
  3 set_s_sc 16 0 set S16, "Polly.\n"
  6 print_s 17    print S17
  8 print_s 16    print S16
  10 end  end
  Hello, Polly.

That's probably a lot more information than you wanted if you're just
starting out. You can also generate a PASM file with the C<-o> switch
and have a look at how the PIR code translates:

  $ parrot -o hello.pasm hello.pir

or just

  $ parrot -o- hello.pir

to see the resulting PASM on I<stdout>.

You'll find more details on these options and many others in
A<CHP-11-SECT-4>"Parrot Command-Line Options" in Chapter 11.

=head2 Named Variables

Z<CHP-10-SECT-2.3>

X<named variables (PIR)>
X<PIR (Parrot intermediate representation);named variables>
Named variables can be used anywhere a register or temporary register
is used. They're declared with the C<.local> statement or the
equivalent C<.sym> statement, both of which require a variable type
and a name:

  .local string hello
  set hello, "Hello, Polly.\n"
  print hello

This snippet defines a string variable named C<hello>, assigns it the
value "Hello, Polly.\n", and then prints the value.

X<types;variable (PIR)>
X<variables;types (PIR)>
The valid types are C<int>, C<num>, C<string>, and C<pmc> or any
Parrot class name (like C<PerlInt> or C<PerlString>). It should come
as no surprise that these are the same divisions as Parrot's four
register types. Named variables are valid from the point of their
definition to the end of the compilation unit.
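
As a small sketch of the three basic types in use, the following
declares one variable of each type and prints them. The variable names
(C<count>, C<ratio>, and C<bird>) are our own and chosen only for
illustration:

  .sub _main
      .local int count          # lives in an integer register
      .local num ratio          # lives in a numeric register
      .local string bird        # lives in a string register
      count = 3
      ratio = 0.5
      bird = "Polly"
      print bird
      print " "
      print count
      print " "
      print ratio
      print "\n"
      end
  .end

All three declarations stay valid until the closing C<.end>, so the
C<print> statements can refer to them by name.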

The name of a variable must be a valid PIR identifier. It can contain
letters, digits, and underscores, but the first character has to be a
letter or underscore. Identifiers don't have any limit on length yet,
but it's a safe bet they will before the production release. Parrot
opcode names are normally not allowed as variable names, though there
are some exceptions.

=head3 PMC variables

Z<CHP-10-SECT-2.3.1>

PMC registers and variables act much like any integer, floating-point
number, or string register or variable, but you have to instantiate a
new PMC object before you use it. The C<new> instruction creates a new
PMC. Unlike PASM, PIR doesn't use a dot in front of the class name.

  P0 = new PerlString        # same as new P0, .PerlString
  P0 = "Hello, Polly.\n"
  print P0

This example creates a C<PerlString> object, stores it in the PMC
register C<P0>, assigns the value "Hello, Polly.\n" to it, and prints
it. The syntax is exactly the same for temporary register variables:

  $P4711 = new PerlString
  $P4711 = "Hello, Polly.\n"
  print $P4711

With named variables the type passed to the C<.local> directive is
either the generic C<pmc> or a type compatible with the type passed to
C<new>:

  .local PerlString hello    # or .local pmc hello
  hello = new PerlString
  hello = "Hello, Polly.\n"
  print hello

=head2 Named Constants

Z<CHP-10-SECT-2.4>

X<PIR (Parrot intermediate representation);named constants>
X<named constants (PIR)>
The C<.const> directive declares a named constant. It's very similar
to C<.local>, and requires a type and a name. The value of a constant
must be assigned in the declaration statement. As with named
variables, named constants are visible only within the compilation
unit where they're declared. This example declares a named string
constant C<hello> and prints the value:

  .const string hello = "Hello, Polly.\n"
  print hello

Named constants can be used anywhere a literal constant can, but they
have to be declared beforehand:

  .const int the_answer = 42        # integer constant
  .const string mouse = "Mouse"     # string constant
  .const num pi = 3.14159           # floating point constant
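
To show the declarations in context, here is a minimal sketch that
declares two of these constants inside a compilation unit and then
uses them where literals would otherwise appear. It assumes, as the
examples above suggest, that C<.const> may appear inside a C<.sub>
block:

  .sub _main
      .const int the_answer = 42
      .const string mouse = "Mouse"
      $I0 = the_answer          # same as $I0 = 42
      $I0 = $I0 + 1
      print mouse               # same as print "Mouse"
      print ": "
      print $I0
      print "\n"
      end
  .end

This should print "Mouse: 43".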

=head2 Register Spilling

Z<CHP-10-SECT-2.5>

X<registers;spilling in PIR>
X<PIR (Parrot intermediate representation);register spilling>
As we mentioned earlier, Parrot allocates all temporary register
variables and named variables to Parrot registers. When Parrot runs
out of registers to allocate, it has to store some of the variables
elsewhere. This is known as I<spilling>. Parrot spills the variables
with the lowest score and stores them in a C<PerlArray> object while
they aren't used, then restores them to a register the next time
they're needed. Consider an example that creates 33 integer variables,
all containing values that are used later:

  set $I1, 1
  set $I2, 2
  ...
  set $I33, 33
  ...
  print $I1
  print $I2
  ...
  print $I33

Parrot allocates the 32 available integer registers to variables with
a higher score and spills the variables with a lower score. In this
example it picks C<$I1> and C<$I2>. Behind the scenes, Parrot
generates code to store the values:

  new P31, "PerlArray"
  ...
  set I0, 1           # I0 allocated to $I1
  set P31[0], I0      # spill $I1
  set I0, 2           # I0 reallocated to $I2
  set P31[1], I0      # spill $I2

It creates a C<PerlArray> object and stores it in register
C<P31>.N<C<P31> is reserved for register spilling in PIR code, so
generally it shouldn't be accessed directly.> The C<set> instruction
is the last time C<$I1> is used for a while, so immediately after
that, Parrot stores its value in the spill array and frees up C<I0> to
be reallocated.

Just before C<$I1> and C<$I2> are accessed to be printed, Parrot
generates code to fetch the values from the spill array:

  ...
  set I0, P31[0]       # fetch $I1
  print I0

You cannot rely on any particular register assignment for temporary
variables or named variables. The register allocator does follow a set
of precedence rules for allocation, but these rules may change. Also,
if two variables have the same score Parrot may assign registers based
on the hashed value of the variable name. Parrot randomizes the seed
to the hash function to guarantee you never get a consistent order.

=head1 Symbol Operators

Z<CHP-10-SECT-3>

X<symbol operators in IMCC>
You probably noticed the C<=> assignment operator in some of the
earlier examples:

  $S2000 = "Hello, Polly.\n"
  print $S2000

Standing alone, it's the same as the PASM C<set> opcode. In fact, if
you run F<parrot> in bytecode debugging mode (as in
A<CHP-11-SECT-4.2>"Assembler Options" in Chapter 11), you'll see it
really is just a C<set> opcode underneath.

PIR has many other symbol operators: arithmetic, concatenation,
comparison, bitwise, and logical. Many of these combine with
assignment to produce the equivalent of a PASM opcode:

  .local int sum
  sum = $I42 + 5
  print sum
  print "\n"

The statement C<sum = $I42 + 5> translates to something like
C<add I16, I17, 5>.

PIR also provides compound assignment operators such as C<+=>, C<-=>,
and C<<< >>= >>>, which map to the two-argument opcode forms like
C<add I16, I17>.
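
A short sketch of the two-argument forms in action follows. The
variable name C<total> is hypothetical, and the PASM shown in the
comments is only the rough shape of the generated code, since the
actual register is chosen by the allocator:

  .sub _main
      .local int total
      total = 10
      total += 5                # roughly: add I16, 5
      total -= 3                # roughly: sub I16, 3
      print total
      print "\n"
      end
  .end

This prints 12.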

Many PASM opcodes that return a single value also have an alternate
syntax in PIR with the assignment operator:

  $I0 = length str               # length $I0, str
  $I0 = isa PerlInt, "scalar"    # isa $I0, PerlInt, "scalar"
  $I0 = exists hash["key"]       # exists $I0, hash["key"]
  $N0 = sin $N1
  $N0 = atan $N1, $N2
  $S0 = repeat "x", 20
  $P0 = newclass "Foo"
  ...

A complete list of PIR operators is available in A<CHP-11>Chapter 11.
We'll discuss the comparison operators in A<CHP-10-SECT-5>"Flow
Control" later in this chapter.

=head1 Labels

Z<CHP-10-SECT-4>

X<PIR (Parrot intermediate representation);labels>
X<labels (PIR)>
As in PASM, any line can start with a label definition like C<LABEL:>,
but in PIR a label definition can also stand on a line of its own.

PIR code has both local and global labels. Global labels start with an
underscore. The name of a global label has to be unique, since it can
be called at any point in the program. Local labels start with a
letter. A local label is accessible only in the compilation unit where
it's defined.N<We'll discuss compilation units in the next section.>
The name has to be unique there, but it can be reused in a different
compilation unit.

  branch L1   # local label
  bsr    _L2  # global label

Labels are most often used in branching instructions and in
subroutine calls.
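
The following sketch shows both placements of a local label within one
compilation unit. The label names C<SKIP> and C<NEXT> are arbitrary:

  .sub _main
      goto SKIP
      print "never printed\n"
  SKIP:                         # label on a line of its own
      print "one\n"
  NEXT: print "two\n"           # label sharing a line with a statement
      end
  .end

Because C<SKIP> and C<NEXT> are local labels, another compilation unit
in the same file could reuse either name without conflict.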

=head2 Compilation Units

Z<CHP-10-SECT-4.1>

X<PIR (Parrot intermediate representation);compilation units>
X<compilation units (PIR)>
Compilation units in PIR are roughly equivalent to the subroutines or
methods of a high-level language. Though they will be explained in
more detail later, we introduce them here because all code in a PIR
source file must be defined in a compilation unit. The simplest syntax
for a PIR compilation unit starts with the C<.sub> directive and ends
with the C<.end> directive:

  .sub _main
      print "Hello, Polly.\n"
      end
  .end

This example defines a compilation unit named C<_main> that prints a
string. The name is actually a global label for this piece of code. If
you generate a PASM file from the PIR code (see the end of the
A<CHP-10-SECT-2.2>"Temporary Registers" section earlier in this
chapter), you'll see that the name translates to an ordinary label:

  _main:
          print "Hello, Polly.\n"
          end


The first compilation unit in a file is normally executed first, but
as in PASM you can flag any compilation unit as the first one to
execute with the C<@MAIN> marker. The convention is to name the first
compilation unit C<_main>, but the name isn't critical.

  .sub _first
      print "Polly want a cracker?\n"
      end
  .end

  .sub _main @MAIN
      print "Hello, Polly.\n"
      end
  .end

This code prints out "Hello, Polly." but not "Polly want a cracker?".

The A<CHP-10-SECT-6>"Subroutines" section later in this chapter goes
into much more detail about compilation units and their uses.

=head1 Flow Control

Z<CHP-10-SECT-5>

X<PIR (Parrot intermediate representation);flow control>
X<flow control;in PIR>
As in PASM, flow control in PIR is done entirely with conditional and
unconditional branches. This may seem simplistic, but remember that
PIR is a thin overlay on the assembly language of a virtual processor.
For the average assembly language, jumps are the fundamental unit of
flow control.

X<goto instruction (PIR)>
Any PASM branch instruction is valid, but PIR has some high-level
constructs of its own. The most basic is the unconditional branch:
C<goto>.

  .sub _main
      goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

The first C<print> statement never runs because the C<goto> always
skips over it to the label C<L1>.

The conditional branches combine C<if> or C<unless> with C<goto>.

  .sub _main
      $I0 = 42
      if $I0 goto L1
      print "never printed"
  L1: print "after branch\n"
      end
  .end

X<if (conditional);instruction (PIR)>
X<unless (conditional);instruction (PIR)>
In this example, the C<goto> branches to the label C<L1> only if the
value stored in C<$I0> is true. The C<unless> statement is quite
similar, but branches when the tested value is false. An undefined
value, 0, or an empty string are all false values. The C<if ... goto>
statement translates directly to the PASM C<if>, and C<unless>
translates to the PASM C<unless>.
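
For comparison, here is the same sketch written with C<unless>. The
test value is now 0, which is false, so the branch is taken:

  .sub _main
      $I0 = 0
      unless $I0 goto L1
      print "never printed"
  L1: print "after branch\n"
      end
  .end

As before, only "after branch" is printed.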

The comparison operators (C<E<lt>>, C<E<lt>=>, C<==>, C<!=>, C<E<gt>>,
C<E<gt>=>) can combine with C<if ... goto>. These branch when the
comparison is true:

  .sub _main
      $I0 = 42
      $I1 = 43
      if $I0 < $I1 goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

This example compares C<$I0> to C<$I1> and branches to the label C<L1>
if C<$I0> is less than C<$I1>. The C<if $I0 E<lt> $I1 goto L1>
statement translates directly to the PASM C<lt> branch operation.

The rest of the comparison operators are summarized in
A<CHP-11-SECT-3>"PIR Instructions" in Chapter 11.

X<loops;PIR>
X<PIR (Parrot intermediate representation);loop constructs>
PIR has no special loop constructs. A combination of conditional and
unconditional branches handles iteration:

  .sub _main
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      $I0 = $I0 * $I1
      dec $I1
      if $I1 > 0 goto REDO  # end of loop

      print $I0
      print "\n"
      end
  .end

X<do-while style loop;(PIR)>
This example calculates the factorial C<5!>. Each time through the
loop it multiplies C<$I0> by the current value of the counter C<$I1>,
decrements the counter, and then branches to the start of the loop.
The loop ends when C<$I1> counts down to 0 so that the C<if> doesn't
branch to C<REDO>. This is a I<do while>-style loop with the condition
test at the end, so the code always runs the first time through.

X<while-style loop (PIR)>
For a I<while>-style loop with the condition test at the start, use a
conditional branch together with an unconditional branch:

  .sub _main
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      if $I1 <= 0 goto LAST
      $I0 = $I0 * $I1
      dec $I1
      goto REDO
  LAST:                     # end of loop

      print $I0
      print "\n"
      end
  .end

This example tests the counter C<$I1> at the start of the loop. At the
end of the loop, it unconditionally branches back to the start of the
loop and tests the condition again. The loop ends when the counter
C<$I1> reaches 0 and the C<if> branches to the C<LAST> label.  If the
counter isn't a positive number before the loop, the loop never
executes.

Any high-level flow control construct can be built from conditional
and unconditional branches.
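
As one more sketch along these lines, an I<if-else> can be written
with one conditional branch to reach the alternative and one
unconditional branch to skip over it. The label names C<BIG> and
C<DONE> are our own:

  .sub _main
      $I0 = 42
      if $I0 > 100 goto BIG     # condition true: jump to the "if" body
      print "small\n"           # "else" body
      goto DONE                 # skip over the "if" body
  BIG:
      print "big\n"             # "if" body
  DONE:
      end
  .end

Since 42 is not greater than 100, the conditional branch falls through
and the code prints "small".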

=cut

# vim: expandtab shiftwidth=2 tw=70: