(division), and C<pow> (exponent) opcodes, as well as two different modulus operations. C<mod> is Parrot's implementation of modulus, and C<cmod> is the C<%> operator from the C library. It also provides C<gcd> (greatest common divisor) and C<lcm> (least common multiple).

    div I0, 12, 5   # I0 = 12 / 5
    mod I0, 12, 5   # I0 = 12 % 5

=head3 Floating-point operations

Although most of the math operations work with both floating-point numbers and integers, a few require floating-point destination registers. Among these are C<ln> (natural log), C<log2> (log base 2), C<log10> (log base 10), and C<exp> (I<e> raised to a power), as well as a full set of trigonometric opcodes such as C<sin> (sine), C<cos> (cosine), C<tan> (tangent), C<sec> (secant), C<cosh> (hyperbolic cosine), C<tanh> (hyperbolic tangent), C<sech> (hyperbolic secant), C<asin> (arc sine), C<acos> (arc cosine), C<atan> (arc tangent), C<asec> (arc secant), C<exsec> (exsecant), C<hav> (haversine), and C<vers> (versine). All angle arguments for the trigonometric functions are in radians:

    sin N1, N0
    exp N1, 2

The majority of the floating-point operations have a single source argument and a single destination argument. Even though the destination must be a floating-point register, the source can be either an integer or a floating-point number.

The C<atan> opcode also has a three-argument variant that implements C's C<atan2()>:

    atan N0, 1, 1

=head2 Working with Strings

The string operations work with string registers and with PMCs that implement a string class.

Most operations on string registers generate new strings in the destination register. Some operations have an optimized form that modifies an existing string in place. These are denoted by an C<_r> suffix, as in C. Please note that C has been deprecated. String operations on PMC registers require all their string arguments to be PMCs.

=head3 Concatenating strings

Use the C<concat> opcode to concatenate strings. With string register or string constant arguments, C<concat> has both a two-argument and a three-argument form.
The first argument is a source and a destination in the two-argument form:

    set S0, "ab"
    concat S0, "cd"     # S0 has "cd" appended
    print S0            # prints "abcd"
    print "\n"

    concat S1, S0, "xy" # S1 is the string S0 with "xy" appended
    print S1            # prints "abcdxy"
    print "\n"
    end

The first C<concat> concatenates the string "cd" onto the string "ab" in C<S0>. It generates a new string "abcd" and changes C<S0> to point to the new string. The second C<concat> concatenates "xy" onto the string "abcd" in C<S0> and stores the new string in C<S1>.

For PMC registers, C<concat> has only a three-argument form with separate registers for source and destination:

    new P0, "String"
    new P1, "String"
    new P2, "String"
    set P0, "ab"
    set P1, "cd"
    concat P2, P0, P1
    print P2            # prints abcd
    print "\n"
    end

Here, C<concat> concatenates the strings in C<P0> and C<P1> and stores the result in C<P2>.

=head3 Repeating strings

The C<repeat> opcode repeats a string a certain number of times:

    set S0, "x"
    repeat S1, S0, 5  # S1 = S0 x 5
    print S1          # prints "xxxxx"
    print "\n"
    end

In this example, C<repeat> generates a new string with "x" repeated five times and stores a pointer to it in C<S1>.

=head3 Length of a string

The C<length> opcode returns the length of a string in characters. This won't be the same as the length in bytes for multibyte encoded strings:

    set S0, "abcd"
    length I0, S0                # the length is 4
    print I0
    print "\n"
    end

Currently, C<length> doesn't have an equivalent for PMC strings, but it probably will be implemented in the future.

=head3 Substrings

The simplest version of the C<substr> opcode takes four arguments: a destination register, a string, an offset position, and a length. It returns a substring of the original string, starting from the offset position (0 is the first character) and spanning the length:

    substr S0, "abcde", 1, 2        # S0 is "bc"

This example extracts a two-character string from "abcde" at a one-character offset from the beginning of the string (starting with the second character). It generates a new string, "bc", in the destination register C<S0>.

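The offset-and-length rule of the four-argument form, and the difference between character length and byte length mentioned earlier, can be modelled outside Parrot. The following is a minimal Python sketch, not Parrot code: the helper names C<substr> and C<byte_length> are hypothetical, and Python string slicing stands in for Parrot's character-based offsets. It covers non-negative offsets only.

```python
def substr(source: str, offset: int, length: int) -> str:
    """Model of the four-argument form for a non-negative offset.

    Returns `length` characters of `source`, starting at the
    zero-based character position `offset`.
    """
    return source[offset:offset + length]


def byte_length(source: str, encoding: str = "utf-8") -> int:
    """Length in bytes after encoding; can exceed the character count."""
    return len(source.encode(encoding))


print(substr("abcde", 1, 2))   # bc
print(len("abcd"))             # 4 characters
print(byte_length("né"))       # 3 bytes for 2 characters in UTF-8
```

The last line shows why a character count and a byte count can disagree for a multibyte encoding.
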
When the offset position is negative, it counts backward from the end of the string. So an offset of -1 starts at the last character of the string.

C<substr> also has a five-argument form, where the fifth argument is a string to replace the substring. This modifies the second argument and returns the removed substring in the destination register.

    set S1, "abcde"
    substr S0, S1, 1, 2, "XYZ"
    print S0                        # prints "bc"
    print "\n"
    print S1                        # prints "aXYZde"
    print "\n"
    end

This replaces the substring "bc" in C<S1> with the string "XYZ", and returns "bc" in C<S0>.

When the offset position in a replacing C<substr> is one character beyond the original string length, C<substr> appends the replacement string just like the C<concat> opcode. If the replacement string is an empty string, the characters are just removed from the original string.

When you don't need to capture the replaced string, there's an optimized version of C<substr> that just does a replace without returning the removed substring.

    set S1, "abcde"
    substr S1, 1, 2, "XYZ"
    print S1                        # prints "aXYZde"
    print "\n"
    end

The PMC versions of C<substr> are not yet implemented.

=head3 Chopping strings

The C<chopn> opcode removes characters from the end of a string. It takes two arguments: the string to modify and the count of characters to remove.

    set S0, "abcde"
    chopn S0, 2
    print S0         # prints "abc"
    print "\n"
    end

This example removes two characters from the end of C<S0>. If the count is negative, that many characters are kept in the string:

    set S0, "abcde"
    chopn S0, -2
    print S0         # prints "ab"
    print "\n"
    end

This keeps the first two characters in C<S0> and removes the rest. C<chopn> also has a three-argument version that stores the chopped string in a separate destination register, leaving the original string untouched:

    set S0, "abcde"
    chopn S1, S0, 1
    print S1         # prints "abcd"
    print "\n"
    end

=head3 Copying strings

The C<clone> opcode makes a deep copy of a string or PMC. Instead of just copying the pointer, as normal assignment would, it recursively copies the string or object underneath.

    new P0, "String"
    set P0, "Ford"
    clone P1, P0
    set P0, "Zaphod"
    print P1        # prints "Ford"
    end

This example creates an identical, independent clone of the PMC in C<P0> and puts a pointer to it in C<P1>. Later changes to C<P0> have no effect on C<P1>.

With simple strings, the copy created by C<clone>, as well as the results from C<substr>, are copy-on-write (COW). These are rather cheap in terms of memory usage because the new memory location is only created when the copy is assigned a new value. Cloning is rarely needed with ordinary string registers since they always create a new memory location on assignment.

=head3 Converting characters

The C<chr> opcode takes an integer value and returns the corresponding character as a one-character string, while the C<ord> opcode takes a single character string and returns the integer that represents that character in the string's encoding:

    chr S0, 65                # S0 is "A"
    ord I0, S0                # I0 is 65

C<ord> has a three-argument variant that takes a character offset to select a single character from a multicharacter string. The offset must be within the length of the string:

    ord I0, "ABC", 2        # I0 is 67

A negative offset counts backward from the end of the string, so -1 is the last character.

    ord I0, "ABC", -1       # I0 is 67

=head3 Formatting strings

The C<sprintf> opcode generates a formatted string from a series of values. It takes three arguments: the destination register, a string specifying the format, and an ordered aggregate PMC (like an C<Array>) containing the values to be formatted. The format string and the destination register can be either strings or PMCs:

    sprintf S0, S1, P2
    sprintf P0, P1, P2

The format string is similar to the one for C's C<sprintf> function, but with some extensions for Parrot data types. Each format field in the string starts with a C<%> X<% (percent sign);% format strings for sprintf opcode (PASM)> and ends with a character specifying the output format. The output format characters are listed in Table 9-1.

=begin table picture Format characters

=headrow

=row

=cell Format

=cell Meaning

=bodyrows

=row

=cell C<%c>

=cell A character.

=row

=cell C<%d>

=cell A decimal integer.

=row

=cell C<%i>

=cell A decimal integer.

=row

=cell C<%u>

=cell An unsigned integer.

=row

=cell C<%o>

=cell An octal integer.

=row

=cell C<%x>

=cell A hex integer, preceded by 0x when # is specified.

=row

=cell C<%X>

=cell A hex integer with a capital X (when # is specified).

=row

=cell C<%b>

=cell A binary integer, preceded by 0b when # is specified.

=row

=cell C<%B>

=cell A binary integer with a capital B (when # is specified).

=row

=cell C<%p>

=cell A pointer address in hex.

=row

=cell C<%f>

=cell A floating-point number.

=row

=cell C<%e>

=cell A floating-point number in scientific notation (displayed with a lowercase "e").

=row

=cell C<%E>

=cell The same as C<%e>, but displayed with an uppercase E.

=row

=cell C<%g>

=cell The same as either C<%e> or C<%f>, whichever fits best.

=row

=cell C<%G>

=cell The same as C<%g>, but displayed with an uppercase E.

=row

=cell C<%s>

=cell A string.

=end table

Each format field can be specified with several options: R<flags>, R<width>, R<precision>, and R<size>. The format flags are listed in Table 9-2.

=begin table picture Format flags

=headrow

=row

=cell Flag

=cell Meaning

=bodyrows

=row

=cell 0

=cell Pad with zeros.

=row

=cell E<lt>spaceE<gt>

=cell Pad with spaces.

=row

=cell C<+>

=cell Prefix numbers with a sign.

=row

=cell C<->

=cell Align left.

=row

=cell C<#>

=cell Prefix a leading 0 for octal, 0x for hex, or force a decimal point.

=end table

The R<width> is a number defining the minimum width of the output from a field. The R<precision> is the maximum width for strings or integers, and the number of decimal places for floating-point fields. If either R<width> or R<precision> is an asterisk (C<*>), it takes its value from the next argument in the PMC.

The R<size> modifier defines the type of the argument the field takes. The flags are listed in Table 9-3.

=begin table picture Size flags

=headrow

=row

=cell Character

=cell Meaning

=bodyrows

=row

=cell C<h>

=cell short or float

=row

=cell C<l>

=cell long

=row

=cell C<H>

=cell huge value (long long or long double)

=row

=cell C<v>

=cell INTVAL or FLOATVAL

=row

=cell C<O>

=cell opcode_t

=row

=cell C<P>

=cell C<PMC>

=row

=cell C<S>

=cell string

=end table

The values in the aggregate PMC must have a type compatible with the specified R<size>.

Here's a short illustration of string formats:

    new P2, "Array"
    new P0, "Int"
    set P0, 42
    push P2, P0
    new P1, "Num"
    set P1, 10
    push P2, P1
    sprintf S0, "int %#Px num %+2.3Pf\n", P2
    print S0     # prints "int 0x2a num +10.000"
    print "\n"
    end

The first seven lines create an C<Array> with two elements: an C<Int> and a C<Num>. The format string of the C<sprintf> has two format fields. The first, C<%#Px>, takes a PMC argument from the aggregate (C<P>) and formats it as a hexadecimal integer (C<x>), with a leading 0x (C<#>). The second format field, C<%+2.3Pf>, takes a PMC argument (C<P>) and formats it as a floating-point number (C<f>), with a minimum of two whole digits and a maximum of three decimal places (C<2.3>) and a leading sign (C<+>).

The test files F and F have many more examples of format strings.

=head3 Testing for substrings

The C<index> opcode searches for a substring within a string. If it finds the substring, it returns the position where the substring was found as a character offset from the beginning of the string. If it fails to find the substring, it returns -1:

    index I0, "Beeblebrox", "eb"
    print I0                       # prints 2
    print "\n"
    index I0, "Beeblebrox", "Ford"
    print I0                       # prints -1
    print "\n"
    end

C<index> also has a four-argument version, where the fourth argument defines an offset position for starting the search:

    index I0, "Beeblebrox", "eb", 3
    print I0                         # prints 5
    print "\n"
    end

This finds the second "eb" in "Beeblebrox" instead of the first, because the search skips the first three characters in the string.

=head3 Joining strings

The C<join> opcode joins the elements of an array PMC into a single string. The second argument separates the individual elements of the PMC in the final string result.

    new P0, "Array"
    push P0, "hi"
    push P0, 0
    push P0, 1
    push P0, 0
    push P0, "parrot"
    join S0, "__", P0
    print S0              # prints "hi__0__1__0__parrot"
    end

This example builds an C<Array> in C<P0> with the values C<"hi">, C<0>, C<1>, C<0>, and C<"parrot">. It then joins those values (separated by the string C<"__">) into a single string, and stores it in C<S0>.

=head3 Splitting strings

Splitting a string yields a new array containing the resulting substrings of the original string. Since regular expressions aren't implemented yet, the current implementation of the C<split> opcode just splits individual characters, much like Perl 5's C<split> with an empty pattern.

    split P0, "", "abc"
    set P1, P0[0]
    print P1              # 'a'
    set P1, P0[2]
    print P1              # 'c'
    end

This example splits the string "abc" into individual characters and stores them in an array in C<P0>. It then prints out the first and third elements of the array.
For now, the split pattern (the second argument to the opcode) is ignored except for a test to make sure that its length is zero.

=head2 I/O Operations

The I/O subsystem has at least one set of significant revisions ahead, so you can expect this section to change. It's worth an introduction, though, because the basic set of opcodes is likely to stay the same, even if their arguments and underlying functionality change.

=head3 Open and close a file

The C<open> opcode opens a file for access. It takes three arguments: a destination register, the name of the file, and a modestring. It returns a C<ParrotIO> object on success and a C object on failure. The C<ParrotIO> object hides OS-specific details.

    open P0, "people.txt", "<"

The modestring specifies whether the file is opened in read-only (C<E<lt>>), write-only (C<E<gt>>), read-write (C<+E<lt>>), or append mode (C<E<gt>E<gt>>).

The C<close> opcode closes a C<ParrotIO> object:

    close P0        # close a PIO

=head3 Output operations

We already saw the C<print> opcode in several examples above. The one-argument form prints a register or constant to C<stdout>. It also has a two-argument form: the first argument is the C<ParrotIO> object where the value is printed.

    print P0, "xxx"          # print to PIO in P0

The C<getstdin>, C<getstdout>, and C<getstderr> opcodes return C<ParrotIO> objects for the standard I/O streams:

    getstdin P0
    getstdout P0
    getstderr P0

Printing to C<stderr> has a shortcut:

    printerr "troubles"
    getstderr P10
    print P10, "troubles"    # same

=head3 Reading from files

The C<read> opcode reads a specified number of bytes from either C<stdin> or a C<ParrotIO> object:

    read S0, I0              # read from stdin up to I0 bytes into S0
    read S0, P0, I0          # read from the PIO in P0 up to I0 bytes

C<readline> is a variant of C<read> that works with C<ParrotIO> objects. It reads a whole line at a time, terminated by the newline character:

    getstdin P0
    readline S0, P0          # read a line from stdin

The C<seek> opcode sets the current file position on a C<ParrotIO> object.
It takes four arguments: a destination register, a C<ParrotIO> object, an offset, and a flag specifying the origin point:

    seek I0, P0, I1, I2

In this example, the position of C<P0> is set by an offset (C<I1>) from an origin point (C<I2>). 0 means the offset is from the start of the file, 1 means the offset is from the current position, and 2 means the offset is from the end of the file. The return value (in C<I0>) is 0 when the position is successfully set and -1 when it fails. C<seek> also has a five-argument form that seeks with a 64-bit offset, constructed from two 32-bit arguments.

=head2 Logical and Bitwise Operations

The logical opcodes evaluate the truth of their arguments. They're often used to make decisions on control flow. Logical operations are implemented for integers and PMCs. Numeric values are false if they're 0, and true otherwise. Strings are false if they're the empty string or a single character "0", and true otherwise. PMCs are true when their C<get_bool> vtable method returns a nonzero value.

The C<and> opcode returns the second argument if it's false and the third argument otherwise:

    and I0, 0, 1  # returns 0
    and I0, 1, 2  # returns 2

The C<or> opcode returns the second argument if it's true and the third argument otherwise:

    or I0, 1, 0  # returns 1
    or I0, 0, 2  # returns 2

    or P0, P1, P2

Both C<and> and C<or> are short-circuiting. If they can determine what value to return from the second argument, they'll never evaluate the third. This is significant only for PMCs, as they might have side effects on evaluation.

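The value-returning behaviour of these two opcodes is easy to misread as plain Boolean logic, so here is a minimal Python sketch of the rules as stated above. It is a model, not Parrot code: the names C<is_true>, C<logical_and>, and C<logical_or> are hypothetical, PMC truth is not modelled, and because Python evaluates both function arguments eagerly the sketch does not reproduce short-circuiting.

```python
def is_true(value) -> bool:
    """Truth rule from the text: 0 is false; "" and "0" are false strings."""
    if isinstance(value, str):
        return value not in ("", "0")
    return value != 0


def logical_and(second, third):
    """Return `second` if it is false, otherwise `third`."""
    return third if is_true(second) else second


def logical_or(second, third):
    """Return `second` if it is true, otherwise `third`."""
    return second if is_true(second) else third


print(logical_and(0, 1))  # 0
print(logical_and(1, 2))  # 2
print(logical_or(1, 0))   # 1
print(logical_or(0, 2))   # 2
```

The result is always one of the operands, never a separate Boolean value.
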
The C<xor> opcode returns the second argument if it is the only true value, returns the third argument if it is the only true value, and returns false if both values are true or both are false:

    xor I0, 1, 0  # returns 1
    xor I0, 0, 1  # returns 1
    xor I0, 1, 1  # returns 0
    xor I0, 0, 0  # returns 0

The C<not> opcode returns a true value when the second argument is false, and a false value if the second argument is true:

    not I0, I1
    not P0, P1

The bitwise opcodes operate on their values a single bit at a time. C<band>, C<bor>, and C<bxor> return a value that is the logical AND, OR, or XOR of each bit in the source arguments. They each take a destination register and two source registers. They also have two-argument forms where the destination is also a source. C<bnot> is the logical NOT of each bit in a single source argument.

    bnot I0, I1
    band P0, P1
    bor I0, I1, I2
    bxor P0, P1, I2

The bitwise opcodes also have string variants for AND, OR, and XOR: C<bands>, C<bors>, and C<bxors>. These take string register or PMC string source arguments and perform the logical operation on each byte of the strings to produce the final string.

    bors S0, S1
    bands P0, P1
    bors S0, S1, S2
    bxors P0, P1, I2

The bitwise string opcodes only have meaningful results when they're used with simple ASCII strings because the bitwise operation is done per byte.

The logical and arithmetic shift operations shift their values by a specified number of bits:

    shl  I0, I1, I2        # shift I1 left by count I2 giving I0
    shr  I0, I1, I2        # arithmetic shift right
    lsr  P0, P1, P2        # logical shift right

=head1 Working with PMCs

In most of the examples we've shown so far, PMCs just duplicate the functionality of integers, numbers, and strings. They wouldn't be terribly useful if that's all they did, though. PMCs offer several advanced features, each with its own set of operations.

=head2 Aggregates

PMCs can define complex types that hold multiple values. These are commonly called "aggregates." The most important feature added for aggregates is keyed access.
Elements within an aggregate PMC can be stored and retrieved by a numeric or string key. PASM also offers a full set of operations for manipulating aggregate data types.

Since PASM is intended to implement Perl, the two most fully featured aggregates already in operation are arrays and hashes. Any aggregate defined for any language could take advantage of the features described here.

=head3 Arrays

The C<Array> PMC is an ordered aggregate with zero-based integer keys. The syntax for keyed access to a PMC puts the key in square brackets after the register name:

    new P0, "Array"     # obtain a new array object
    set P0, 2           # set its length
    set P0[0], 10       # set first element to 10
    set P0[1], I31      # set second element to I31
    set I0, P0[0]       # get the first element
    set I1, P0          # get array length

A key on the destination register of a C<set> operation sets a value for that key in the aggregate. A key on the source register of a C<set> returns the value for that key. If you set C<P0> without a key, you set the length of the array, not one of its values.N<C is an autoextending array, so you never need to set its length. Other array types may require the length to be set explicitly.> And if you assign the C<Array> to an integer, you get the length of the array.

By the time you read this, the syntax for getting and setting the length of an array may have changed. The change would separate array allocation (how much storage the array provides) from the actual element count. The currently proposed syntax uses C<set> to set or retrieve the allocated size of an array, and an C<elements> opcode to set or retrieve the count of elements stored in the array.

    set P0, 100         # allocate store for 100 elements
    elements P0, 5      # set element count to 5
    set I0, P0          # obtain current allocation size
    elements I0, P0     # get element count

Some other useful instructions for working with arrays are C<push>, C<pop>, C<shift>, and C<unshift> (you'll find them in "PASM Opcodes" in Chapter 11).

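To make the distinction between keyed and unkeyed C<set> concrete, the following Python sketch models the behaviour described for the array example at the start of this section. It is only an illustration under stated assumptions: the class name C<ArrayModel> and its methods are hypothetical, and out-of-range access simply raises Python's own C<IndexError>, which says nothing about what Parrot does in that case.

```python
class ArrayModel:
    """Toy model of an ordered aggregate with zero-based integer keys."""

    def __init__(self):
        self._items = []

    def set_length(self, length: int) -> None:
        """Unkeyed set on the aggregate: change the length, not a value."""
        current = len(self._items)
        if length < current:
            del self._items[length:]
        else:
            self._items.extend([None] * (length - current))

    def __setitem__(self, key: int, value) -> None:
        """Keyed set on the destination: store a value at `key`."""
        self._items[key] = value

    def __getitem__(self, key: int):
        """Keyed set on the source: fetch the value at `key`."""
        return self._items[key]

    def __len__(self) -> int:
        """Assigning the aggregate to an integer yields its length."""
        return len(self._items)


array = ArrayModel()
array.set_length(2)   # like: set P0, 2
array[0] = 10         # like: set P0[0], 10
array[1] = 31         # like: set P0[1], I31 (31 is an arbitrary value here)
print(array[0])       # 10
print(len(array))     # 2
```
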
=head3 Hashes

The C<Hash> PMC is an unordered aggregate with string keys:

    new P1, "Hash"      # generate a new hash object
    set P1["key"], 10   # set key and value
    set I0, P1["key"]   # obtain value for key
    set I1, P1          # number of entries in hash

The C<exists> opcode tests whether a keyed value exists in an aggregate. It returns 1 if it finds the key in the aggregate, and returns 0 if it doesn't. It doesn't care if the value itself is true or false, only that the key has been set:

    new P0, "Hash"
    set P0["key"], 0
    exists I0, P0["key"] # does a value exist at "key"
    print I0             # prints 1
    print "\n"
    end

The C<delete> opcode is also useful for working with hashes: it removes a key/value pair.

=head3 Iterators

Iterators extract values from an aggregate PMC. You create an iterator by creating a new C<Iterator> PMC, and passing the array to C<new> as an additional parameter:

    new P1, "Iterator", P2

The include file F<iterator.pasm> defines some constants for working with iterators. The C<.ITERATE_FROM_START> and C<.ITERATE_FROM_END> constants are used to select whether an array iterator starts from the beginning or end of the array. The C<shift> opcode extracts values from the array. An iterator PMC is true as long as it still has values to be retrieved (tested by C<unless> below).

    .include "iterator.pasm"
        new P2, "Array"
        push P2, "a"
        push P2, "b"
        push P2, "c"
        new P1, "Iterator", P2
        set P1, .ITERATE_FROM_START

    iter_loop:
        unless P1, iter_end
        shift P5, P1
        print P5                        # prints "a", "b", "c"
        branch iter_loop
    iter_end:
        end

Hash iterators work similarly to array iterators, but they extract keys. With hashes it's only meaningful to iterate in one direction, since they don't define any order for their keys.

    .include "iterator.pasm"
        new P2, "Hash"
        set P2["a"], 10
        set P2["b"], 20
        set P2["c"], 30
        new P1, "Iterator", P2
        set P1, .ITERATE_FROM_START_KEYS

    iter_loop:
        unless P1, iter_end
        shift S5, P1                    # one of the keys "a", "b", "c"
        set I9, P2[S5]
        print I9                        # prints e.g. 20, 10, 30
        branch iter_loop
    iter_end:
        end

=head3 Data structures

Arrays and hashes can hold any data type, including other aggregates. Accessing elements deep within nested data structures is a common operation, so PASM provides a way to do it in a single instruction. Complex keys specify a series of nested data structures, with each individual key separated by a semicolon:

    new P0, "Hash"
    new P1, "Array"
    set P1[2], 42
    set P0["answer"], P1
    set I1, 2
    set I0, P0["answer";I1]        # $i = %hash{"answer"}[2]
    print I0
    print "\n"
    end

This example builds up a data structure of a hash containing an array. The complex key C<P0["answer";I1]> retrieves an element of the array within the hash. You can also set a value using a complex key:

    set P0["answer";0], 5   # %hash{"answer"}[0] = 5

The individual keys are integers or strings, or registers with integer or string values.

=head2 PMC Assignment

We mentioned before that C<set> on two PMCs simply aliases them both to the same object, and that C<clone> creates a complete duplicate object. But if you just want to assign the value of one PMC to another PMC, you need the C<assign> opcode:

    new P0, "Int"
    new P1, "Int"
    set P0, 42
    set P2, P0
    assign P1, P0     # note: P1 has to exist already
    inc P0
    print P0          # prints 43
    print "\n"
    print P1          # prints 42
    print "\n"
    print P2          # prints 43
    print "\n"
    end

This example creates two C<Int> PMCs: C<P0> and C<P1>. It gives C<P0> a value of 42. It then uses C<set> to give the same value to C<P2>, but uses C<assign> to give the value to C<P1>. When C<P0> is incremented, C<P2> also changes, but C<P1> doesn't. The destination register for C<assign> must have an existing object of the right type in it, since C<assign> doesn't create a new object (as with C<clone>) or reuse the source object (as with C<set>).

=head2 Properties

PMCs can have additional values attached to them as "properties" of the PMC. What these properties do is entirely up to the language being implemented. Perl 6 uses them to store extra information about a variable: whether it's a constant, if it should always be interpreted as a true value, etc.

The C<setprop> opcode sets the value of a named property on a PMC. It takes three arguments: the PMC to be set with a property, the name of the property, and a PMC containing the value of the property. The C<getprop> opcode returns the value of a property. It also takes three arguments: the PMC to store the property's value, the name of the property, and the PMC from which the property value is to be retrieved:

    new P0, "String"
    set P0, "Zaphod"
    new P1, "Int"
    set P1, 1
    setprop P0, "constant", P1        # set a property on P0
    getprop P3, "constant", P0        # retrieve a property on P0
    print P3                          # prints 1
    print "\n"
    end

This example creates a C<String> object in C<P0>, and an C<Int> object with the value 1 in C<P1>. C<setprop> sets a property named "constant" on the object in C<P0> and gives the property the value in C<P1>. C<getprop> retrieves the value of the property "constant" on C<P0> and stores it in C<P3>.

Properties are kept in a separate hash for each PMC. Property values are always PMCs, but only references to the actual PMCs. Trying to fetch the value of a property that doesn't exist returns a C.

C<delprop> deletes a property from a PMC.

    delprop P1, "constant"  # delete property

You can also return a complete hash of all properties on a PMC with C<prophash>.

    prophash P0, P1         # set P0 to the property hash of P1

=head1 Flow Control

Although it has many advanced features, at heart PASM is an assembly language. All flow control in PASM--as in most assembly languages--is done with branches and jumps.

Branch instructions transfer control to a relative offset from the current instruction. The rightmost argument to every branch opcode is a label, which the assembler converts to the integer value of the offset. You can also branch on a literal integer value, but there's rarely any need to do so. The simplest branch instruction is C<branch>:

    branch L1                # branch 4
    print "skipped\n"
  L1:
    print "after branch\n"
    end

This example unconditionally branches to the location of the label C<L1>, skipping over the first C<print> statement.

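The C<# branch 4> comment in the example can be reproduced with a small model of label resolution. The following Python sketch is an illustration only, not the Parrot assembler: the C<Label> class and C<resolve_branch_offsets> function are hypothetical, and it assumes that every opcode and every argument occupies exactly one word, so that C<branch L1> and C<print "skipped\n"> take two words each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Marks an argument as a label reference rather than a constant."""
    name: str


def resolve_branch_offsets(program):
    """Replace label arguments with offsets relative to their instruction.

    `program` is a list of (label_or_None, opcode, args) tuples.
    Assumption: each opcode and each argument takes one word.
    """
    addresses = {}   # label name -> word address
    positions = []   # word address of each instruction
    word = 0
    for label, _opcode, args in program:
        if label is not None:
            addresses[label] = word
        positions.append(word)
        word += 1 + len(args)

    resolved = []
    for (_label, opcode, args), position in zip(program, positions):
        new_args = [
            addresses[arg.name] - position if isinstance(arg, Label) else arg
            for arg in args
        ]
        resolved.append((opcode, new_args))
    return resolved


program = [
    (None, "branch", [Label("L1")]),
    (None, "print", ["skipped\n"]),
    ("L1", "print", ["after branch\n"]),
    (None, "end", []),
]
print(resolve_branch_offsets(program)[0])   # ('branch', [4])
```

Under the one-word-per-item assumption, the label sits four words past the branch instruction, which matches the offset in the comment.
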
Jump instructions transfer control to an absolute address. The C<jump> opcode doesn't calculate an address from a label, so it's used together with C<set_addr>:

    set_addr I0, L1
    jump I0
    print "skipped\n"
    end
  L1:
    print "after jump\n"
    end

The C<set_addr> opcode takes a label or an integer offset and returns an absolute address.

You've probably noticed the C<end> opcode as the last statement in many examples above. This terminates the execution of the current run loop. Terminating the main bytecode segment (the first run loop) stops the interpreter. Without the C<end> statement, execution just falls off the end of the bytecode segment, with a good chance of crashing the interpreter.

=head2 Conditional Branches

Unconditional jumps and branches aren't really enough for flow control. What you need to implement the control structures of high-level languages is the ability to select different actions based on a set of conditions. PASM has opcodes that conditionally branch based on the truth of a single value or the comparison of two values. The following example has C<if> and C<unless> conditional branches:

    set I0, 0
    if I0, TRUE
    unless I0, FALSE
    print "skipped\n"
    end
  TRUE:
    print "shouldn't happen\n"
    end
  FALSE:
    print "the value was false\n"
    end

C<if> branches if its first argument is a true value, and C<unless> branches if its first argument is a false value. In this case, the C<if> doesn't branch because C<I0> is false, but the C<unless> does branch.

The comparison branching opcodes compare two values and branch if the stated relation holds true. These are C<eq> (branch when equal), C<ne> (when not equal), C<lt> (when less than), C<gt> (when greater than), C<le> (when less than or equal), and C<ge> (when greater than or equal). The two compared arguments must be the same register type:

    set I0, 4
    set I1, 4
    eq I0, I1, EQUAL
    print "skipped\n"
    end
  EQUAL:
    print "the two values are equal\n"
    end

This compares two integers, C<I0> and C<I1>, and branches if they're equal. Strings of different character sets or encodings are converted to Unicode before they're compared.
PMCs have a C<cmp> vtable method. This gets called on the left argument to perform the comparison of the two objects.

The comparison opcodes don't specify if a numeric or string comparison is intended. The type of the register selects for integers, floats, and strings. With PMCs, the vtable method C<cmp> or C<is_equal> of the first argument is responsible for comparing the PMC meaningfully with the other operand. If you need to force a numeric or string comparison on two PMCs, use the alternate comparison opcodes that end in the C<_num> and C<_str> suffixes.

    eq_str P0, P1, label     # always a string compare
    gt_num P0, P1, label     # always numerically

Finally, the C<eq_addr> opcode branches if two PMCs or strings are actually the same object (have the same address), and the C<is_null> opcode branches if a PMC is NULL (has no assigned address):

    eq_addr P0, P1, same_pmcs_found
    is_null P2, the_pmc_is_null

=head2 Iteration

PASM doesn't define high-level loop constructs. These are built up from a combination of conditional and unconditional branches. A I<do while> style loop can be constructed with a single conditional branch:

    set I0, 0
    set I1, 10
  REDO:
    inc I0
    print I0
    print "\n"
    lt I0, I1, REDO
    end

This example prints out the numbers 1 to 10. The first time through, it executes all statements up to the C<lt> statement. If the condition evaluates as true (C<I0> is less than C<I1>) it branches to the C<REDO> label and runs the three statements in the loop body again. The loop ends when the condition evaluates as false.

Conditional and unconditional branches can build up quite complex looping constructs, as follows:

    # loop ($i=1; $i<=10; $i++) {
    #    print "$i\n";
    # }
  loop_init:
    set I0, 1
    branch loop_test
  loop_body:
    print I0
    print "\n"
    branch loop_continue
  loop_test:
    le I0, 10, loop_body
    branch out
  loop_continue:
    inc I0
    branch loop_test
  out:
    end

This example emulates a counter-controlled loop like Perl 6's C<loop> keyword or C's C<for>.
The first time through the loop it sets the initial value of the counter in C<loop_init>, tests that the loop condition is met in C<loop_test>, and then executes the body of the loop in C<loop_body>. If the test fails on the first iteration, the loop body will never execute. The end of C<loop_body> branches to C<loop_continue>, which increments the counter and then goes to C<loop_test> again. The loop ends when the condition fails, and it branches to C<out>. The example is more complex than it needs to be just to count to 10, but it nicely shows the major components of a loop.

=head1 Stacks and Register Frames

Parrot provides 32 registers of each type: integer, floating-point number, string, and PMC. This is a generous number of registers, but it's still too restrictive for the average use. You can hardly limit your code to 32 integers at a time. This is especially true when you start working with subroutines and need a way to store the caller's values and the subroutine's values. So, Parrot also provides stacks for storing values outside the 32 registers. Parrot has seven basic stacks, each used for a different purpose: the user stack, the control stack, the pad stack, and the four register backing stacks.

=head2 User Stack

The user stack, also known as the general-purpose stack, stores individual values. The two main opcodes for working with the user stack are C<save>, to push a value onto the stack, and C<restore>, to pop one off the stack:

    save 42         # push onto user stack
    restore I1      # pop off user stack

The one argument to C<save> can be either a constant or a register. The user stack is a typed stack, so C<restore> will only pop a value into a register of the same type as the original value:

    save 1
    set I0, 4
    restore I0
    print I0        # prints 1
    end

If that restore were C<restore> C<N0> instead of an integer register, you'd get an exception, "Wrong type on top of stack!"

A handful of other instructions are useful for manipulating the user stack. C<rotate_up> rotates a given number of elements on the user stack to put a different element on the top of the stack.
The C<depth> opcode returns the number of entries currently on the stack. The C<entrytype> opcode returns the type of the stack entry at a given depth, and C<lookback> returns the value of the element at the given depth without popping the element off the stack:

    save 1
    save 2.3
    set S0, "hi\n"
    save S0
    save P0
    entrytype I0, 0
    print I0         # prints 4 (PMC)
    entrytype I0, 1
    print I0         # prints 3 (STRING)
    entrytype I0, 2
    print I0         # prints 2 (FLOATVAL)
    entrytype I0, 3
    print I0         # prints 1 (INTVAL)
    print "\n"
    depth I2         # get entries
    print I2         # prints 4
    print "\n"
    lookback S1, 1   # get entry at depth 1
    print S1         # prints "hi\n"
    depth I2         # unchanged
    print I2         # prints 4
    print "\n"
    end

This example pushes four elements onto the user stack: an integer, a floating-point number, a string, and a PMC. It checks the C<entrytype> of all four elements and prints them out. It then checks the C<depth> of the stack, gets the value of the second element with a C<lookback>, and checks that the number of elements hasn't changed.

=head2 Control Stack

The control stack, also known as the call stack, stores return addresses for subroutines called by C<bsr> and exception handlers. There are no instructions for directly manipulating the control stack.

=head2 Register Frames

The final set of stacks are the register backing stacks. Parrot has four backing stacks, one for each type of register. Instead of saving and restoring individual values, the backing stacks work with register frames. Each register frame is the full set of 32 registers for one type. Each frame is separated into two halves: the bottom half (registers 0-15) and the top half (registers 16-31). Some opcodes work with full frames while others work with half-frames. The backing stacks are commonly used for saving the contents of all the registers (or just the top half of each frame) before a subroutine call, so they can be restored when control returns to the caller.

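The save-before-call, restore-after-return pattern can be pictured with a small model of one register type and its backing stack. This Python sketch is a simplification under stated assumptions, not Parrot's implementation: the class name C<RegisterFile> and its method names are hypothetical, registers hold plain integers, and full frames and half-frames share one Python list as the backing stack.

```python
REGISTERS_PER_FRAME = 32
HALF = REGISTERS_PER_FRAME // 2   # registers 0-15 are the bottom half


class RegisterFile:
    """Toy model of one register type plus its backing stack."""

    def __init__(self):
        self.registers = [0] * REGISTERS_PER_FRAME
        self.backing_stack = []

    def push_frame(self) -> None:
        """Save a copy of all 32 registers; the live values stay as they are."""
        self.backing_stack.append(list(self.registers))

    def pop_frame(self) -> None:
        """Replace all 32 registers with the most recently saved frame."""
        self.registers = self.backing_stack.pop()

    def push_top(self) -> None:
        """Save only the top half (registers 16-31)."""
        self.backing_stack.append(self.registers[HALF:])

    def pop_top(self) -> None:
        """Restore only the top half; registers 0-15 are left alone."""
        self.registers[HALF:] = self.backing_stack.pop()


integers = RegisterFile()
integers.registers[0] = 1
integers.push_frame()          # caller saves its registers
integers.registers[0] = 99     # subroutine overwrites register 0
integers.pop_frame()           # caller's values come back
print(integers.registers[0])   # 1
```
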
PASM has five opcodes for storing full register frames, one for each register type and one that saves all four at once: X X X X X pushi # copy I-register frame pushn # copy N-register frame pushs # copy S-register frame pushp # copy P-register frame saveall # copy all register frames Each C, C, C, or C pushes a register frame containing all the current values of one register type onto the backing stack of that type. C simply calls C, C, C, and C. PASM also has five opcodes to restore full register frames. Again it has one for each register type and one that restores all four at once: X X X X X popi # restore I-register frame popn # restore N-register frame pops # restore S-register frame popp # restore P-register frame restoreall # restore all register frames The C, C, C, and C opcodes pop a single register frame off a particular stack and replace the values in all 32 registers of that type with the values in the restored register frame. C calls C, C, C, and C, restoring every register of every type to values saved earlier. Saving a X register frame to the backing stack doesn't alter the values stored in the registers; it simply copies the values: set I0, 1 print I0 # prints 1 pushi # copy away I0..I31 print I0 # unchanged, still 1 inc I0 print I0 # now 2 popi # restore registers to state of previous pushi print I0 # old value restored, now 1 print "\n" end This example sets the value of C to 1 and stores the complete set of integer registers. Before C is incremented, it has the same value as before the C. In A"Working with Registers" earlier in this chapter we mentioned that string and PMC registers hold pointers to the actual objects. When string or PMC register frames are saved, only the pointers are copied, not the actual contents of the strings or PMCs. 
The same is true when string or PMC register frames are restored: set S0, "hello" # set S0 to "hello" pushs substr S0, 0, 5, "world" # alter the string in S0 set S0, "test" # set S0 to a new string pops # restores the first string pointer print S0 # prints "world" end In this example, we first use the C opcode to copy the string pointer to the string register frame stack. This gives us two pointers to the same underlying string, with one currently stored in C, and the other saved in the string register frame stack. If we then use C to alter the contents of the string, both pointers will now point to the altered string, and so restoring our original pointer using C does not restore the original string value. Each of the above C and C opcodes has a variant that will save or restore only the top or bottom half of one register set or all the register sets: pushtopi # save I16..I31 popbottoms # restore S0..S15 savetop # save regs 16-31 in each frame restoretop # restore regs 16-31 in each frame PASM also has opcodes to clear individual X register frames: CX, CX, CX, and CX. These reset the numeric registers to 0 values and the string and PMC registers to null pointers, which is the same state that they have when the interpreter first starts. The XX user stack can be useful for holding onto some values that would otherwise be obliterated by a C: # ... coming from a subroutine save I5 # Push some registers save I6 # holding the return values save N5 # of the sub. restoreall # restore registers to state before calling subroutine restore N0 # pop off last pushed restore I0 # pop 2nd restore I1 # and so on =head1 Lexicals and Globals Z So far, we've been treating Parrot registers like the variables of a high-level language. This is fine, as far as it goes, but it isn't the full picture. The dynamic nature and introspective features of languages like Perl make it desirable to manipulate variables by name, instead of just by register or stack location. 
These languages also have global variables, which are visible throughout the entire program. Storing a global variable in a register would either tie up that register for the lifetime of the program or require unwieldy manipulation of the user stack. Parrot provides structures for storing both global and lexically scoped named variables. Lexical and global variables must be PMC values. PASM provides instructions for storing and retrieving variables from these structures so the PASM opcodes can operate on their values. =head2 Globals Z X Global variables are stored in a C, so every variable name must be unique. PASM has two opcodes for globals, C and C: new P10, "Int" set P10, 42 store_global "$foo", P10 # ... find_global P0, "$foo" print P0 # prints 42 end The first two statements create a C in the PMC register C and give it the value 42. In the third statement, C stores that PMC as the named global variable C<$foo>. At some later point in the program, C retrieves the PMC from the global variable by name, and stores it in C so it can be printed. The C opcode only stores a reference to the object. If we add an increment statement: inc P10 after the C it increments the stored global, printing 43. If that's not what you want, you can C the PMC before you store it. Leaving the global variable as an alias does have advantages, though. If you retrieve a stored global into a register and modify it as follows: find_global P0, "varname" inc P0 the value of the stored global is directly modified, so you don't need to call C again. The two-argument forms of C and C store or retrieve globals from the outermost namespace (what Perl users will know as the "main" namespace). A simple flat global namespace isn't enough for most languages, so Parrot also needs to support hierarchical namespaces for separating packages (classes and modules in Perl 6). 
The three-argument versions of C and C add an argument to select a nested namespace: store_global "Foo", "var", P0 # store P0 as var in the Foo namespace find_global P1, "Foo", "var" # get Foo::var Eventually the global opcodes will have variants that take a PMC to specify the namespace, but the design and implementation of these aren't finished yet. =head2 Lexicals Z X Lexical variables are stored in a lexical scratchpad. There's one pad for each lexical scope. Every pad has both a hash and an array, so elements can be stored either by name or by numeric index. Parrot stores the scratchpads for nested lexical scopes in a pad stack. =head3 Basic instructions Z The instructions for manipulating lexical scratchpads are C to create a new pad, C to store a variable in a pad, C to retrieve a variable from a pad, C to push a pad onto the pad stack, and C to remove a pad from the stack: new_pad 0 # create and push a pad with depth 0 new P0, "Int" # create a variable set P0, 10 # assign value to it store_lex 0, "$foo", P0 # store the var at depth 0 by name # ... find_lex P1, 0, "$foo" # get the var into P1 print P1 print "\n" # prints 10 pop_pad # remove pad end The first statement creates a new scratchpad and pushes it onto the pad stack. It's created with depth 0, which is the outermost lexical scope. The next two statements create a new PMC object in C, and give it a value. The CX opcode stores the object in C as the named variable C<$foo> in the scratchpad at depth 0. At some later point in the program, the CX opcode retrieves the value of C<$foo> in the pad at depth 0 and stores it in the register C so it can be printed. At the very end, CX removes the pad from the pad stack. The CX opcode has two forms, one that creates a new scratchpad and stores it in a PMC, and another that creates a new scratchpad and immediately pushes it onto the pad stack. 
If the pad were stored in a PMC, you would have to push it onto the pad stack before you could use it: new_pad P10, 0 # create a new pad in P10 push_pad P10 # push it onto the pad stack In a simple case like this, it really doesn't make sense to separate out the two instructions, but you'll see later in A"Subroutines" why it's valuable to have both. The CX and C X opcodes can take an integer index in place of a name for the variable: store_lex 0, 0, P0 # store by index # ... find_lex P1, 0 # retrieve by index With an index, the variable is stored in the X scratchpad array, instead of the scratchpad hash. =head3 Nested scratchpads Z To create a XX nested scope, you create another scratchpad with a higher depth number and push it onto the pad stack. The outermost scope is always depth 0, and each nested scope is one higher. The pad stack won't allow you to push on a scratchpad that's more than one level higher than the current depth of the top of the stack: new_pad 0 # outer scope new_pad 1 # inner scope new P0, "Int" set P0, 10 store_lex -1, "$foo", P0 # store in top pad new P1, "Int" set P1, 20 store_lex -2, "$foo", P1 # store in next outer scope find_lex P2, "$foo" # find in all scopes print P2 # prints 10 print "\n" find_lex P2, -1, "$foo" # find in top pad print P2 # prints 10 print "\n" find_lex P2, -2, "$foo" # find in next outer scope print P2 # prints 20 print "\n" pop_pad pop_pad end The first two statements create two new scratchpads, one at depth 0 and one at depth 1, and push them onto the pad stack. When C and C have a negative number for the depth specifier, they count backward from the top pad on the stack, so -1 is the top pad, and -2 is the second pad back. In this case, the pad at depth 1 is the top pad, and the pad at depth 0 is the second pad. So: store_lex -1, "$foo", P0 # store in top pad stores the object in C as the named variable C<$foo> in the pad at depth 1. 
Then: store_lex -2, "$foo", P1 # store in next outer scope stores the object in C as the named variable C<$foo> in the pad at depth 0. A C statement with no depth specified searches every scratchpad in the stack from the top of the stack to the bottom: find_lex P2, "$foo" # find in all scopes Both pad 0 and pad 1 have variables named C<$foo>, but only the value from the top pad is returned. C also has a version with no depth specified, but it only works if the named lexical has already been created at a particular depth. It searches the stack from top to bottom and stores the object in the first lexical it finds with the right name. The C instruction retrieves the top entry on the pad stack into a PMC register, but doesn't pop it off the stack. =head1 Subroutines Z X Subroutines and methods are the basic building blocks of larger programs. At the heart of every subroutine call are two fundamental actions: it has to store the current location so it can come back to it, and it has to transfer control to the subroutine. The CX opcode does both. It pushes the address of the next instruction onto the control stack, and then branches to a label that marks the subroutine: print "in main\n" bsr _sub print "and back\n" end _sub: print "in sub\n" ret At the end of the subroutine, the C instruction pops a location back off the control stack and goes there, returning control to the caller. The CX opcode pushes the current location onto the call stack and jumps to a subroutine. Just like the C opcode, it takes an absolute address in an integer register, so the address has to be calculated first with the CX opcode: print "in main\n" set_addr I0, _sub jsr I0 print "and back\n" end _sub: print "in sub\n" ret =head2 Calling Conventions Z X X X A C or C is fine for a simple subroutine call, but few subroutines are quite that simple. The biggest issues revolve around register usage. 
Parrot has 32 registers of each type, and the caller and the subroutine
share the same set of registers. How does the subroutine keep from
destroying the caller's values? More importantly, who is responsible
for saving and restoring registers? Where are arguments for the
subroutine stored? Where are the subroutine's return values stored? A
number of different answers are possible. You've seen how many ways
Parrot has of storing values. The critical point is that the caller and
the called subroutine have to agree on all the answers.

=head3 Reserved registers

A very simple system would be to declare that the caller uses registers
0 through 15, and the subroutine uses 16 through 31. This works in a
small program with light register usage. But what about a subroutine
call from within another subroutine, or a recursive call? The solution
doesn't extend to a large scale.

=head3 Callee saves

Another possibility is to make the subroutine responsible for saving
the caller's registers:

    set I0, 42
    save I0               # pass args on stack
    bsr _inc              # j = inc(i)
    restore I1            # pop return value off the stack
    print I1
    print "\n"
    end

  _inc:
    saveall               # preserve all registers
    restore I0            # get argument
    inc I0                # do all the work
    save I0               # push return value
    restoreall            # restore caller's registers
    ret

This example stores arguments to the subroutine and return values from
the subroutine on the user stack. The first statement in the C<_inc>
subroutine is a C<saveall> to save all the caller's registers onto the
backing stacks, and the last statement before the return restores them.

One advantage of this approach is that the subroutine can choose to
save and restore only the register frames it actually uses, for a small
speed gain. The example above could use C<pushi> and C<popi> instead of
C<saveall> and C<restoreall> because it only uses integer registers.
One disadvantage is that it doesn't allow optimization of tail calls,
where the last statement of a recursive subroutine is a call to itself.

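
The narrower save can be sketched as follows. This is only an
illustration built from opcodes already introduced in this chapter; it
assumes, as in the callee-saves example, that the argument and the
return value travel on the user stack, which is separate from the
integer backing stack:

    set I0, 42
    save I0               # pass the argument on the user stack
    bsr _inc
    restore I1            # pop the return value off the user stack
    print I1
    print "\n"
    end

  _inc:
    pushi                 # preserve only the integer register frame
    restore I0            # get argument
    inc I0                # do all the work
    save I0               # push return value before restoring registers
    popi                  # restore the caller's integer registers
    ret

The return value has to be pushed before the frame is restored, because
the restore overwrites every integer register, including the one that
holds the result.
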
=head3 Parrot calling conventions

Internal subroutines can use whatever calling convention serves them
best. Externally visible subroutines and methods need stricter rules.
Since these routines may be called as part of an included library or
module, and even from a different high-level language, it's important
to have a consistent interface.

Under the Parrot calling conventions the caller is responsible for
preserving its own registers. The first 11 arguments of each register
type are passed in Parrot registers, as are several other pieces of
information. Register usage for subroutine calls is listed in
Table 9-4.

=begin table picture Calling and return conventions

=headrow

=row

=cell Register

=cell Usage

=bodyrows

=row

=cell C<P0>

=cell Subroutine/Method object.

=row

=cell C<P1>

=cell Return continuation if applicable.

=row

=cell C<P2>

=cell Object for a method call (invocant) or NULL for a subroutine call.

=row

=cell C<P3>

=cell Array with overflow parameters/return values.

=row

=cell C<S0>

=cell Fully qualified method name, if it's a method call.

=row

=cell C<I0>

=cell True for prototyped parameters.

=row

=cell C<I1>

=cell Number of integer arguments/return results.

=row

=cell C<I2>

=cell Number of string arguments/return results.

=row

=cell C<I3>

=cell Number of PMC arguments/return results.

=row

=cell C<I4>

=cell Number of float arguments/return results.

=row

=cell C<I5> ... C<I15>

=cell First 11 integer arguments/return results.

=row

=cell C<N5> ... C<N15>

=cell First 11 float arguments/return results.

=row

=cell C<S5> ... C<S15>

=cell First 11 string arguments/return results.

=row

=cell C<P5> ... C<P15>

=cell First 11 PMC arguments/return results.

=end table

If there are more than 11 arguments or return values of one type for
the subroutine, overflow parameters are passed in an array in C<P3>.
Subroutines without a prototype pass all their arguments or return
values in PMC registers and, if needed, in the overflow array.

The C<_inc> subroutine from above can be rewritten as a prototyped
subroutine:

    set I16, 42                 # use local regs from 16..31
    newsub P0, .Sub, _inc       # create a new Sub object
    set I5, I16                 # first integer argument
    set I0, 1                   # prototype used
    set I1, 1                   # one integer argument
    null I2                     # no string arguments
    null I3                     # no PMC arguments
    null I4                     # no numeric arguments
    null P2                     # no object (invocant)
    pushtopi                    # preserve top I register frame
    invokecc                    # call function object in P0
    poptopi                     # restore registers
    print I5
    print "\n"
    # I16 is still valid here, whatever the subroutine did
    end

  .pcc_sub _inc:
    inc I5                      # do all the work
    set I0, 1                   # prototyped return
    set I1, 1                   # one retval in I5
    null I2                     # nothing else
    null I3
    null I4
    invoke P1                   # return from the sub

Instead of using a simple C<bsr>, this set of conventions uses a
subroutine object. There are several kinds of subroutine-like objects,
but C<Sub> is a class for PASM subroutines.

The C<.pcc_sub> directive defines globally accessible subroutine
objects. The C<_inc> function above can be found as:

    find_global P20, "_inc"

Subroutine objects of all kinds can be called with the C<invoke>
opcode. With no arguments, it calls the subroutine in C<P0>, which is
the standard for the Parrot calling conventions. There is also a form
of C<invoke> that takes a PMC register argument, for calling objects
held in a different register.

The C<invokecc> opcode is like C<invoke>, but it also creates and
stores a new return continuation in C<P1>. When the called subroutine
invokes this return continuation, it returns control to the instruction
after the function call. This kind of call is known as Continuation
Passing Style (CPS).

In a simple example like this it isn't really necessary to set up all
the registers to obey the Parrot calling conventions. But when you
call into library code, the subroutine is likely to check the number
and type of arguments passed to it. So it's always a good idea to
follow the full conventions. This is equally true for return values.
The caller might check how many arguments the subroutine really
returned.
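
A caller can make that check explicit. The following sketch repeats the
prototyped call to C<_inc> and tests the return information before
using the result. It assumes the register roles used in the example
above (C<I0> flags a prototyped return and C<I1> counts the integer
results); the C<bad_return> label and its message are hypothetical:

    newsub P0, .Sub, _inc       # create a new Sub object
    set I5, 42                  # first integer argument
    set I0, 1                   # prototype used
    set I1, 1                   # one integer argument
    null I2                     # no string arguments
    null I3                     # no PMC arguments
    null I4                     # no numeric arguments
    null P2                     # no object (invocant)
    pushtopi                    # preserve top I register frame
    invokecc                    # call function object in P0
    poptopi                     # restore I16..I31 only
    ne I0, 1, bad_return        # was the return prototyped?
    ne I1, 1, bad_return        # exactly one integer result?
    print I5
    print "\n"
    end

  bad_return:
    print "unexpected return signature\n"
    end

  .pcc_sub _inc:
    inc I5                      # do all the work
    set I0, 1                   # prototyped return
    set I1, 1                   # one retval in I5
    null I2                     # nothing else
    null I3
    null I4
    invoke P1                   # return from the sub

Because the call only restores the top half of the integer frame, the
low registers still hold whatever the subroutine left in them, which is
what makes the check possible.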

Setting all these registers for every subroutine call might look
wasteful at first glance, and it does increase the size of the
bytecode, but you don't need to worry about execution time: the JIT
system executes each register setup opcode in one CPU cycle.

=head2 Native Call Interface

A special version of the Parrot calling conventions is used by the
Native Call Interface (NCI) for calling subroutines with a known
prototype in shared libraries. This is not really portable across all
libraries, but it's worth a short example. This is a simplified version
of the first test in F:

    loadlib P1, "libnci"          # get library object for a shared lib
    print "loaded\n"
    dlfunc P0, P1, "nci_dd", "dd" # obtain the function object
    print "dlfunced\n"
    set I0, 1                     # prototype used - unchecked
    set N5, 4.0                   # first argument
    invoke                        # call nci_dd
    ne N5, 8.0, nok_1             # the test function returns 2*arg
    print "ok 1\n"
    end
  nok_1:
    ...

This example shows two new instructions: C<loadlib> and C<dlfunc>. The
C<loadlib> opcode obtains a handle for a shared library. It searches
for the shared library in the current directory, in F, and in a few
other configured directories. It also tries to load the provided
filename unaltered and with appended extensions like C<.so> or
C<.dll>. Which extensions it tries depends on the OS Parrot is running
on.

The C<dlfunc> opcode gets a function object from a previously loaded
library (second argument) of a specified name (third argument) with a
known function signature (fourth argument). The function signature is a
string where the first character describes the return value and the
remaining characters describe the function parameters. The characters
used in NCI function signatures are listed in Table 9-5.

=begin table picture Function signature letters

=headrow

=row

=cell Character

=cell Register set

=cell C type

=bodyrows

=row

=cell C

=cell -

=cell void (no return value)

=row

=cell C

=cell C

=cell char

=row

=cell C

=cell C

=cell short

=row

=cell C

=cell C

=cell int

=row

=cell C

=cell C

=cell long

=row

=cell C

=cell C

=cell float

=row

=cell C

=cell C

=cell double

=row

=cell C

=cell C

=cell char *

=row

=cell C

=cell C

=cell void * (or other pointer)

=row

=cell C

=cell -

=cell Parrot_Interp *interpreter

=row

=cell C

=cell -

=cell a callback function pointer

=row

=cell C

=cell -

=cell a callback function pointer

=row

=cell C

=cell C

=cell the subroutine C or C calls into

=row

=cell C

=cell C

=cell the argument for C

=end table

For more information on callback functions, read the documentation in F
and F.

=head2 Closures

A closure is a subroutine that retains values from the lexical scope
where it was defined, even when it's called from an entirely different
scope. The closure shown here is equivalent to this Perl 5 code
snippet:

    #   sub foo {
    #       my ($n) = @_;
    #       sub {$n += shift}
    #   }
    #   my $closure = foo(10);
    #   print &$closure(3), "\n";
    #   print &$closure(20), "\n";

    # call _foo
    newsub P16, .Sub, _foo          # new subroutine object at address _foo
    new P17, "Int"                  # value for $n
    set P17, 10                     # we use local vars from P16 ...
    set P0, P16                     # the subroutine
    set P5, P17                     # first argument
    pushtopp                        # save registers
    invokecc                        # call foo
    poptopp                         # restore registers
    set P18, P5                     # the returned closure

    # call _closure
    new P19, "Int"                  # argument to closure
    set P19, 3
    set P0, P18                     # the closure
    set P5, P19                     # one argument
    pushtopp                        # save registers
    invokecc                        # call closure(3)
    poptopp
    print P5                        # prints 13
    print "\n"

    # call _closure
    set P19, 20                     # and again
    set P5, P19
    set P0, P18
    pushtopp
    invokecc                        # call closure(20)
    poptopp
    print P5                        # prints 33
    print "\n"
    end

  _foo:
    new_pad 0                       # push a new pad
    store_lex -1, "$n", P5          # store $n
    newsub P5, .Closure, _closure   # P5 has the lexical "$n" in the pad
    invoke P1                       # return

  _closure:
    find_lex P16, "$n"              # invoking the closure pushes the lexical pad
                                    # of the closure on the pad stack
    add P16, P5                     # $n += shift
    set P5, P16                     # set return value
    invoke P1                       # return

That's quite a lot of PASM code for such a little bit of Perl 5 code,
but anonymous subroutines and closures hide a lot of magic under that
simple interface. The core of this example is that when the new
subroutine is created in C<_foo> with:

    newsub P5, .Closure, _closure

it inherits and stores the current lexical scratchpad, which is the
topmost scratchpad on the pad stack at the time. Later, when
C<_closure> is invoked from the main body of code, the stored pad is
automatically pushed onto the pad stack.
So, all the lexical variables that were available when C<_closure> was defined are available when it's called. =head2 Coroutines Z As we mentioned in the previous chapter, coroutines are X subroutines that can suspend themselves and return control to the caller--and then pick up where they left off the next time they're called, as if they never left. X In PASM, coroutines are subroutine-like objects: newsub P0, .Coroutine, _co_entry The C object has its own user stack, register frame stacks, control stack, and pad stack. The pad stack is inherited from the caller. The coroutine's control stack has the caller's control stack prepended, but is still distinct. When the coroutine invokes itself, it returns to the caller and restores the caller's context (basically swapping all stacks). The next time the coroutine is invoked, it continues to execute from the point at which it previously returned: new_pad 0 # push a new lexical pad on stack new P0, "Int" # save one variable in it set P0, 10 store_lex -1, "var", P0 newsub P0, .Coroutine, _cor # make a new coroutine object saveall # preserve environment invoke # invoke the coroutine restoreall print "back\n" saveall invoke # invoke coroutine again restoreall print "done\n" pop_pad end _cor: find_lex P1, "var" # inherited pad from caller print "in cor " print P1 print "\n" inc P1 # var++ saveall invoke # yield( ) restoreall print "again " branch _cor # next invocation of the coroutine This prints out the result: in cor 10 back again in cor 11 done X The C inside the coroutine is commonly referred to as I. The coroutine never ends. When it reaches the bottom, it branches back up to C<_cor> and executes until it hits C again. The interesting part about this example is that the coroutine yields in the same way that a subroutine is called. This means that the coroutine has to preserve its own register values. This example uses C but it could have only stored the registers the coroutine actually used. 
Saving off the registers like this works because coroutines have their own register frame stacks. =head2 Continuations Z X X A continuation is a subroutine that gets a complete copy of the caller's context, including its own copy of the call stack. Invoking a continuation starts or restarts it at the entry point: new P1, "Int" set P1, 5 newsub P0, .Continuation, _con _con: print "in cont " print P1 print "\n" dec P1 unless P1, done invoke # P0 done: print "done\n" end This prints: in cont 5 in cont 4 in cont 3 in cont 2 in cont 1 done =head2 Evaluating a Code String Z XThis isn't really a subroutine operation, but it does produce a code object that can be invoked. In this case, it's a X bytecode segment object. The first step is to get an assembler or compiler for the target language: compreg P1, "PASM" Within the Parrot interpreter there are currently three registered languages: C, C, and C. The first two are for parrot assembly language and parrot intermediate represention code. The third is for evaluating single statements in PASM. Parrot automatically adds an C opcode at the end of C strings before they're compiled. This example places a bytecode segment object into the destination register C and then invokes it with C: compreg P1, "PASM1" # get compiler set S1, "in eval\n" compile P0, P1, "print S1" invoke # eval code P0 print "back again\n" end You can register a compiler or assembler for any language inside the Parrot core and use it to compile and invoke code from that language. These compilers may be written in PASM or reside in shared libraries. compreg "MyLanguage", P10 In this example the C opcode registers the subroutine-like object C as a compiler for the language "MyLanguage". See F and F for an external compiler in a shared library. =head1 Exceptions and Exception Handlers Z X X Exceptions provide a way of calling a piece of code outside the normal flow of control. 
They are mainly used for error reporting or cleanup tasks, but sometimes exceptions are just a funny way to branch from one code location to another one. The design and implementation of exceptions in Parrot isn't complete yet, but this section will give you an idea where we're headed. Exceptions are objects that hold all the information needed to handle the exception: the error message, the severity and type of the error, etc. The class of an exception object indicates the kind of exception it is. Exception handlers are derived from continuations. They are ordinary subroutines that follow the Parrot calling conventions, but are never explicitly called from within user code. User code pushes an exception handler onto the control stack with the CX opcode. The system calls the installed exception handler only when an exception is thrown (perhaps because of code that does division by zero or attempts to retrieve a global that wasn't stored.) newsub P20, .Exception_Handler, _handler set_eh P20 # push handler on control stack null P10 # set register to null find_global P10, "none" # may throw exception clear_eh # pop the handler off the stack ... _handler: # if not, execution continues here is_null P10, not_found # test P10 ... This example creates a new exception handler subroutine with the C opcode and installs it on the control stack with the C opcode. It sets the C register to a null value (so it can be checked later) and attempts to retrieve the global variable named C. If the global variable is found, the next statement (C) pops the exception handler off the control stack and normal execution continues. If the C call doesn't find C it throws an exception by pushing an exception object onto the control stack. When Parrot sees that it has an exception, it pops it off the control stack and calls the exception handler C<_handler>. The first exception handler in the control stack sees every exception thrown. 
The handler has to examine the exception object and decide whether it can handle it (or discard it) or whether it should C the exception to pass it along to an exception handler deeper in the stack. The CX opcode is only valid in exception handlers. It pushes the exception object back onto the control stack so Parrot knows to search for the next exception handler in the stack. The process continues until some exception handler deals with the exception and returns normally, or until there are no more exception handlers on the control stack. When the system finds no installed exception handlers it defaults to a final action, which normally means it prints an appropriate message and terminates the program. When the system installs an exception handler, it creates a return continuation with a snapshot of the current interpreter context. If the exception handler just returns (that is, if the exception is cleanly caught) the return continuation restores the control stack back to its state when the exception handler was called, cleaning up the exception handler and any other changes that were made in the process of handling the exception. Exceptions thrown by standard Parrot opcodes (like the one thrown by C above or by the C opcode) are always resumable, so when the exception handler function returns normally it continues execution at the opcode immediately after the one that threw the exception. Other exceptions at the run-loop level are also generally resumable. new P10, Exception # create new Exception object set P10["_message"], "I die" # set message attribute throw P10 # throw it Exceptions are designed to work with the Parrot calling conventions. Since the return addresses of C subroutine calls and exception handlers are both pushed onto the control stack, it's generally a bad idea to combine the two. 
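
The pieces above can be combined into one small sketch: install a
handler, throw an exception by hand, and report the message from inside
the handler. This is an illustration rather than a definitive usage.
It assumes that the handler receives the exception object in C<P5> (as
the signal handler example in the Events section does) and that the
C<_message> attribute can be read back with a keyed C<set>; since the
exception design isn't final, both details may change:

    newsub P20, .Exception_Handler, _handler
    set_eh P20                      # push handler on control stack
    new P10, Exception              # create new Exception object
    set P10["_message"], "I die"    # set message attribute
    throw P10                       # throw it
    end

  _handler:
    set S0, P5["_message"]          # assumed: exception object is in P5
    print "caught: "
    print S0
    print "\n"
    end

The handler here ends the program instead of returning, so the sketch
doesn't depend on where a resumed exception would continue.
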
=head1 Events

An event is a notification that something has happened: a timer
expired, an IO operation finished, a thread sent a message to another
thread, or the user pressed C<^C> to interrupt program execution. What
all of these events have in common is that they arrive asynchronously.
It's generally not safe to interrupt program flow at an arbitrary point
and continue at a different position, so the event is placed in the
interpreter's task queue. The run-loop code regularly checks whether an
event needs to be handled. Event handlers may be an internal piece of
code or a user-defined event handler subroutine.

Events are still experimental in Parrot, so the implementation and
design are subject to change.

=head2 Timers

C<Timer> objects are the replacement for Perl 5's C<alarm> handlers.
They are also a significant improvement. Timers can fire once or
repeatedly, and multiple timers can run independently. The precision of
a timer is limited by the OS Parrot runs on, but it is always more
fine-grained than a whole second. The final syntax isn't yet fixed, so
please consult the documentation for examples.

=head2 Signals

Signal handling is related to events. When Parrot gets a signal it
needs to handle from the OS, it converts that signal into an event and
broadcasts it to all running threads. Each thread independently decides
if it's interested in this signal and, if so, how to respond to it.

    newsub P20, .Exception_Handler, _handler
    set_eh P20                  # establish signal handler
    print "send SIGINT:\n"
    sleep 2                     # press ^C after the prompt appears
    print "no SIGINT\n"
    end

  _handler:
    .include "signal.pasm"      # get signal definitions
    print "caught "
    set I0, P5["_type"]         # if _type is negative, the ...
    neg I0, I0                  # ... negated type is the signal
    ne I0, .SIGINT, nok
    print "SIGINT\n"
  nok:
    end

This example creates a signal handler and pushes it onto the control
stack. It then prompts the user to send a C<SIGINT> from the shell
(this is usually C<^C>, but it varies in different shells), and waits
for 2 seconds.
If the user doesn't send a SIGINT in 2 seconds the example just prints
"no SIGINT" and ends. If the user does send a SIGINT, the signal
handler catches it, prints out "caught SIGINT" and ends. (Only some
platforms install the necessary signal handler, so this example won't
work on other platforms.)

=head1 Threads

Threads allow multiple pieces of code to run in parallel. This is
useful when you have multiple physical CPUs to share the load of
running individual threads. With a single processor, threads still
provide the feeling of parallelism, but without any improvement in
execution time. Even worse, sometimes using threads on a single
processor will actually slow down your program.

Still, many algorithms can be expressed more easily in terms of
parallel running pieces of code, and many applications profit from
taking advantage of multiple CPUs. Threads can vastly simplify
asynchronous programs like internet servers: a thread splits off, waits
for some IO to happen, handles it, and relinquishes the processor again
when it's done.

Parrot compiles in thread support by default (at least, if the platform
provides some kind of support for it). Unlike Perl 5, compiling with
threading support doesn't impose any execution time penalty for a
non-threaded program. Like exceptions and events, threads are still
under development, so you can expect significant changes in the near
future.

As outlined in the previous chapter, Parrot implements three different
threading models. The following example uses the third model, which
takes advantage of shared data. It uses a C<TQueue> (thread-safe queue)
object to synchronize the two parallel running threads. This is only a
simple example to illustrate threads, not a typical usage of threads
(no one really wants to spawn two threads just to print out a simple
string).

find_global P5, "_th1" # locate thread function new P2, "ParrotThread" # create a new thread find_method P0, P2, "thread3" # a shared thread's entry new P7, "TQueue" # create a Queue object new P8, "Int" # and a Int push P7, P8 # push the Int onto queue new P6, "String" # create new string set P6, "Js nte artHce\n" set I3, 3 # thread function gets 3 args invoke # _th1.run(P5,P6,P7) new P2, "ParrotThread" # same for a second thread find_global P5, "_th2" set P6, "utaohrPro akr" # set string to 2nd thread's invoke # ... data, run 2nd thread too end # Parrot joins both .pcc_sub _th1: # 1st thread function w1: sleep 0.001 # wait a bit and schedule defined I1, P7 # check if queue entry is ... unless I1, w1 # ... defined, yes: it's ours set S5, P6 # get string param substr S0, S5, I0, 1 # extract next char print S0 # and print it inc I0 # increment char pointer shift P8, P7 # pull item off from queue if S0, w1 # then wait again, if todo invoke P1 # done with string .pcc_sub _th2: # 2nd thread function w2: sleep 0.001 defined I1, P7 # if queue entry is defined if I1, w2 # then wait set S5, P6 substr S0, S5, I0, 1 # if not print next char print S0 inc I0 new P8, "Int" # and put a defined entry push P7, P8 # onto the queue so that if S0, w2 # the other thread will run invoke P1 # done with string This example creates a C object and calls its C method, passing three arguments: a PMC for the C<_th1> subroutine in C, a string argument in C, and a C object in C containing a single integer. Remember from the earlier section A"Parrot calling conventions" that registers 5-15 hold the arguments for a subroutine or method call and C stores the number of arguments. The thread object is passed in C. This call to the C method spawns a new thread to run the C<_th1> subroutine. The main body of the code then creates a second C object in C, stores a different subroutine in C, sets C to a new string value, and then calls the C method again, passing it the same C object as the first thread. 
This method call spawns a second thread. The main body of code then ends,
leaving the two threads to do the work.

At this point the two threads have already started running. The first
thread (C<_th1>) starts off by sleeping for a 1000th of a second. It then
checks if the C<TQueue> object contains a value. Since it contains a value
when the thread is first called, it goes ahead and runs the body of the
subroutine. The body pulls one character off a copy of the string
parameter using C<substr>, prints the character, increments the current
position (C<I0>) in the string, and shifts the element off the C<TQueue>.
It then loops back to the C<w1> label and sleeps. Since the queue doesn't
have any elements now, the subroutine keeps sleeping.

Meanwhile, the second thread (C<_th2>) also starts off by sleeping for a
1000th of a second. It checks if the shared C<TQueue> object contains a
defined value, but unlike the first thread it only continues sleeping if
the queue does contain a value. Since the queue contains a value when the
second thread is first called, the subroutine loops back to the C<w2>
label and continues sleeping. It keeps sleeping until the first thread
shifts the integer off the queue, then runs the body of the subroutine.
The body pulls one character off a copy of the string parameter using
C<substr>, prints the character, and increments the current position in
the string. It then creates a new C<Int>, pushes it onto the shared queue,
and loops back to the C<w2> label again to sleep. The queue has an element
now, so the second thread keeps sleeping, but the first thread runs
through its loop again.

The two threads alternate like this, printing a character and marking the
queue so the next thread can run, until there are no more characters in
either string. At the end, each subroutine invokes the return continuation
in C<P1>, which terminates the thread. The interpreter waits for all
threads to terminate in the cleanup phase after the C<end> in the main
body of code.
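The hand-off between the two threads can be easier to follow in a
higher-level language. The following is a minimal Python sketch of the
same idea, not Parrot code: a C<deque> stands in for the shared queue, and
the names C<run_demo>, C<first>, and C<second> are hypothetical. It
collects the characters in a list instead of printing them one by one, so
the ordering is easy to inspect.

```python
import threading
import time
from collections import deque


def run_demo():
    # The queue starts with one entry, like the Int pushed in the main body.
    queue = deque([0])
    out = []  # characters in the order the threads emit them

    def first(text):
        pos = 0
        while True:
            while not queue:            # sleep until the queue holds an entry
                time.sleep(0.001)
            ch = text[pos:pos + 1]      # empty string once past the end
            out.append(ch)
            pos += 1
            queue.popleft()             # empty the queue: other thread's turn
            if not ch:
                return

    def second(text):
        pos = 0
        while True:
            while queue:                # sleep while the queue holds an entry
                time.sleep(0.001)
            ch = text[pos:pos + 1]
            out.append(ch)
            pos += 1
            queue.append(0)             # refill the queue: first thread's turn
            if not ch:
                return

    t1 = threading.Thread(target=first, args=("Js nte artHce\n",))
    t2 = threading.Thread(target=second, args=("utaohrPro akr",))
    t1.start()
    t2.start()
    t1.join()                           # wait for both, as the interpreter does
    t2.join()
    return "".join(out)


result = run_demo()
print(result, end="")
```

Each thread only emits a character while it "owns" the turn, and it gives
the turn away as its last step, which is what keeps the two strings
strictly interleaved.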
The final printed result (as you might have guessed) is:

    Just another Parrot Hacker

The syntax for threads isn't carved in stone and the implementation still
isn't finished, but as this example shows, threads are working now and
already useful.

Several methods are useful when working with threads. The C<join> method
belongs to the C<ParrotThread> class. When it's called on a
C<ParrotThread> object, the calling code waits until the thread
terminates.

    new P2, "ParrotThread"      # create a new thread
    set I5, P2                  # get thread ID
    find_method P0, P2, "join"  # get the join method...
    invoke                      # ...and join (wait for) the thread
    set P16, P5                 # the return result of the thread

C<kill> and C<detach> are interpreter methods, so you have to grab the
current interpreter object before you can look up the method object.

    set I5, P2                  # get thread ID of thread P2
    getinterp P3                # get this interpreter object
    find_method P0, P3, "kill"  # get kill method
    invoke                      # kill thread with ID I5
    find_method P0, P3, "detach"
    invoke                      # detach thread with ID I5

By the time you read this, some of these combinations of statements and
much of the threading syntax above may be reduced to a simpler set of
opcodes.

=head1 Loading Bytecode

In addition to running Parrot bytecode on the command line, you can also
load pre-compiled bytecode directly into your PASM source file. The
C<load_bytecode> opcode takes a single argument: the name of the bytecode
file to load.
So, if you create a file named F<file.pasm> containing a single
subroutine:

    # file.pasm
  .pcc_sub _sub2:               # .pcc_sub stores a global sub
    print "in sub2\n"
    invoke P1

and compile it to bytecode using the C<-o> command-line switch:

    $ parrot -o file.pbc file.pasm

You can then load the compiled bytecode into F<main.pasm> and directly
call the subroutine defined in F<file.pasm>:

    # main.pasm
  _main:
    load_bytecode "file.pbc"    # compiled file.pasm
    find_global P0, "_sub2"
    invokecc
    end

The C<load_bytecode> opcode also works with source files, as long as
Parrot has a compiler registered for that type of file:

    # main2.pasm
  _main:
    load_bytecode "file.pasm"   # PASM source code
    find_global P0, "_sub2"
    invokecc
    end

Subroutines marked with C<@LOAD> run as soon as they're loaded (before
C<load_bytecode> returns), rather than waiting to be called. A subroutine
marked with C<@MAIN> will always run first, no matter what name you give
it or where you define it in the file.

    # file3.pasm
  .pcc_sub @LOAD _entry:        # mark the sub as to be run
    print "file3\n"
    invoke P1                   # return

    # main3.pasm
  _first:                       # first is never invoked
    print "never\n"
    invoke P1

  .pcc_sub @MAIN _main:         # because _main is marked as the
    print "main\n"              # MAIN entry of program execution
    load_bytecode "file3.pasm"
    print "back\n"
    end

This example uses both C<@LOAD> and C<@MAIN>. Because the C<_main>
subroutine is defined with C<@MAIN> it will execute first even though
another subroutine comes before it in the file. C<_main> prints a line,
loads the PASM source file, and then prints another line. Because
C<_entry> in F<file3.pasm> is marked with C<@LOAD> it runs before
C<load_bytecode> returns, so the final output is:

    main
    file3
    back

=head1 Classes and Objects

Parrot's object system is a new addition in version 0.1.0. Objects still
have some rough edges (for example, you currently can't add new attributes
to a class after it has been instantiated), but they're functional for
basic use.

This section revolves around one complete example that defines a class,
instantiates objects, and uses them.
The whole example is included at the end of the section.

=head2 Class declaration

The C<newclass> opcode defines a new class. It takes two arguments, the
destination register for the class PMC and the name of the class. All
classes (and objects) inherit from a common base PMC, which is the core of
the Parrot object system.

    newclass P1, "Foo"

To instantiate a new object of a particular class, you first look up the
integer value for the class type with the C<find_type> opcode, then create
an object of that type with the C<new> opcode:

    find_type I1, "Foo"
    new P3, I1

The C<new> opcode also checks to see if the class defines a method named
"__init" and calls it if it exists.

=head2 Attributes

The C<addattribute> opcode creates a slot in the class for an attribute
(sometimes known as an I<instance variable>) and associates it with a
name:

    addattribute P1, ".i"               # Foo.i

This chunk of code from the C<__init> method looks up the position of the
first attribute, creates an C<Int> PMC, and stores it as the first
attribute:

    classoffset I0, P2, "Foo"   # first "Foo" attribute of object P2
    new P6, "Int"               # create storage for the attribute
    setattribute P2, I0, P6     # store the first attribute

The C<classoffset> opcode takes a PMC containing an object and the name of
its class, and returns an integer index for the position of the first
attribute. The C<setattribute> opcode uses the integer index to store a
PMC value in one of the object's attribute slots. This example initializes
the first attribute. The second attribute would be at C<I0 + 1>, the third
attribute at C<I0 + 2>, etc:

    inc I0
    setattribute P2, I0, P7     # store next attribute
    ...

Attributes can also be accessed by fully qualified name (although this is
a little bit slower than getting the class offset once and accessing
several attributes by index):

    new P6, "Int"
    setattribute P2, "Foo\x0.i", P6     # store the attribute

You use the same integer index to retrieve the value of an attribute.
The C<getattribute> opcode takes an object and an index as arguments and
returns the attribute PMC at that position:

    classoffset I0, P2, "Foo"           # first "Foo" attribute of object P2
    getattribute P10, P2, I0            # indexed get of attribute

or

    getattribute P10, P2, "Foo\x0.i"    # named get

To set the value of an attribute PMC, first retrieve it with
C<getattribute> and then assign to the returned PMC. Because PMC registers
are only pointers to values, you don't need to store the PMC again after
you modify its value:

    getattribute P10, P2, I0
    set P10, I5

=head2 Methods

Methods in PASM are just subroutines installed in the namespace of the
class. You define a method with the C<.pcc_sub> directive before the
label:

  .pcc_sub _half:               # I5 = self."_half"()
    classoffset I0, P2, "Foo"
    getattribute P10, P2, I0
    set I5, P10                 # get value
    div I5, 2
    invoke P1

This routine returns half of the value of the first attribute of the
object. Method calls use the Parrot calling conventions so they always
pass the I<invocant> object (often called I<self>) in C<P2>. Invoking the
return continuation in C<P1> returns control to the caller.

The C<.pcc_sub> directive automatically stores the subroutine as a global
in the current namespace. The C<.namespace> directive sets the current
namespace:

  .namespace [ "Foo" ]

If no namespace is set, or if the namespace is explicitly set to an empty
string, then the subroutine is stored in the outermost namespace.

The C<callmethodcc> opcode makes a method call. It follows the Parrot
calling conventions, so it expects to find the invocant object in C<P2>,
the method object in C<P0>, etc. It adds one bit of magic, though. If you
pass the name of the method in C<S0>, C<callmethodcc> looks up that method
name in the invocant object and stores the method object in C<P0> for you:

    set S0, "_half"             # set method name
    set P2, P3                  # the object
    savetop                     # preserve registers
    callmethodcc                # create return continuation, call
    restoretop
    print I5                    # result of method call
    print "\n"

The C<callmethodcc> opcode also generates a return continuation and stores
it in C<P1>.
The C<callmethod> opcode doesn't generate a return continuation, but is
otherwise identical to C<callmethodcc>. Just like ordinary subroutine
calls, you have to preserve and restore any registers you want to keep
after a method call. Whether you store individual registers, register
frames, or half register frames is up to you.

=head3 Overriding vtable functions

Every object inherits a default set of I<vtable> functions from its base
PMC, but you can also override them with your own methods. The vtable
functions have predefined names that start with a double underscore "__".
The following code defines a method named C<__init> in the C<Foo> class
that initializes the first attribute of the object with an integer:

  .pcc_sub __init:
    classoffset I0, P2, "Foo"   # lookup first attribute position
    new P6, "Int"               # create storage for the attribute
    setattribute P2, I0, P6     # store the first attribute
    invoke P1                   # return

Ordinary methods have to be called explicitly, but the vtable functions
are called implicitly in many different contexts. Parrot saves and
restores registers for you in these calls. The C<__init> method is called
whenever a new object is constructed:

    find_type I1, "Foo"
    new P3, I1                  # call __init if it exists

A few other vtable functions in the complete code example for this section
are C<__set_integer_native>, C<__add>, C<__get_integer>, C<__get_string>,
and C<__increment>.
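One way to picture an implicit vtable call is as a lookup that prefers a
method installed in the class's namespace and otherwise falls back to a
built-in default. The Python sketch below models only that idea. It is an
illustration, not Parrot's implementation: the C<DEFAULTS> table, the
C<NAMESPACES> dictionary, and the C<call_vtable> and C<new> helpers are
hypothetical names invented for this example.

```python
# Built-in behaviour used when a class does not override a vtable function.
# (Hypothetical defaults for illustration; not Parrot's real defaults.)
DEFAULTS = {
    "__init": lambda obj: None,
    "__get_string": lambda obj: "<object>",
}

# Methods installed per class namespace, keyed by class name.
NAMESPACES = {"Foo": {}}


def call_vtable(class_name, name, obj, *args):
    """Call the class's override of `name` if present, else the default."""
    method = NAMESPACES[class_name].get(name, DEFAULTS[name])
    return method(obj, *args)


def new(class_name):
    """Model of object creation: build the object, then call __init."""
    obj = {"class": class_name, "attrs": []}
    call_vtable(class_name, "__init", obj)
    return obj


def foo_init(obj):
    obj["attrs"].append(0)      # create storage for the first attribute


# Installing the method in Foo's namespace overrides the default __init.
NAMESPACES["Foo"]["__init"] = foo_init

obj = new("Foo")
print(obj["attrs"])                              # [0]
print(call_vtable("Foo", "__get_string", obj))   # <object>
```

The caller never names C<__init> directly; creating the object is what
triggers it, which mirrors how the vtable functions are invoked by
ordinary opcodes.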
The C<set> opcode calls Foo's C<__set_integer_native> vtable function when
its destination register is a C<Foo> object and the source register is a
native integer:

    set P3, 30                  # call __set_integer_native method

The C<add> opcode calls Foo's C<__add> vtable function when it adds two
C<Foo> objects:

    new P4, I1                  # same with P4
    set P4, 12
    new P5, I1                  # create a new store for add

    add P5, P3, P4              # __add method

The C<inc> opcode calls Foo's C<__increment> vtable function when it
increments a C<Foo> object:

    inc P3                      # __increment

Foo's C<__get_integer> and C<__get_string> vtable functions are called
whenever an integer or string value is retrieved from a C<Foo> object:

    set I10, P5                 # __get_integer
    ...
    print P5                    # calls __get_string, prints 'fortytwo'

=head2 Inheritance

The C<subclass> opcode creates a new class that inherits methods and
attributes from another class. It takes 3 arguments: the destination
register for the new class, a register containing the parent class, and
the name of the new class:

    subclass P3, P1, "Bar"

For multiple inheritance, the C<addparent> opcode adds additional parents
to a subclass.

    newclass P4, "Baz"
    addparent P3, P4

To override an inherited method, define a method with the same name in the
namespace of the subclass. The following code overrides Bar's
C<__increment> method so it decrements the value instead of incrementing
it:

  .namespace [ "Bar" ]

  .pcc_sub __increment:
    classoffset I0, P2, "Foo"   # get Foo's attribute slot offset
    getattribute P10, P2, I0    # get the first Foo attribute
    dec P10                     # the evil line
    invoke P1

Notice that the attribute inherited from C<Foo> can only be looked up with
the C<Foo> class name, not the C<Bar> class name. This preserves the
distinction between attributes that belong to the class and inherited
attributes.
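The need for the parent's class name is easier to see if you picture an
object's attributes as one flat array in which each class in the
inheritance chain owns a contiguous run of slots. The Python sketch below
assumes that layout, with the parent's slots first. The layout itself, the
helper names C<build_layout> and C<classoffset>, and the C<.j> attribute
on C<Bar> are all assumptions made for illustration, not a description of
Parrot's internals.

```python
def build_layout(chain):
    """chain: list of (class_name, [attribute names]), parent first.

    Returns (offsets, slots): the index of each class's first attribute
    and the flat list of slot names.
    """
    offsets, slots = {}, []
    for class_name, attrs in chain:
        offsets[class_name] = len(slots)   # first slot owned by this class
        slots.extend(attrs)
    return offsets, slots


def classoffset(offsets, class_name):
    """Model of the lookup: index of the first attribute of class_name."""
    return offsets[class_name]


# Bar inherits from Foo; ".j" is a hypothetical attribute added by Bar.
offsets, slots = build_layout([("Foo", [".i"]), ("Bar", [".j"])])
values = [None] * len(slots)

values[classoffset(offsets, "Foo")] = 42    # Foo's first attribute
values[classoffset(offsets, "Bar")] = 7     # Bar's own first attribute

print(slots)    # ['.i', '.j']
print(values)   # [42, 7]
```

Under this assumption, using the offset for C<Bar> to reach the inherited
attribute would land on the wrong slot, which is why the overriding method
asks for the offset of C<Foo>.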
Object creation for subclasses is the same as for ordinary classes:

    find_type I1, "Bar"
    new P5, I1

Calls to inherited methods are just like calls to methods defined in the
class:

    set P5, 42                  # inherited __set_integer_native
    inc P5                      # overridden __increment
    print P5                    # prints 41 as Bar's __increment decrements
    print "\n"

    set S0, "_half"             # set method name
    set P2, P5                  # the object
    savetop                     # preserve registers
    callmethodcc                # create return continuation, call
    restoretop
    print I5
    print "\n"

=head2 Additional Object Opcodes

The C<isa> and C<can> opcodes are also useful when working with objects.
C<isa> checks whether an object belongs to or inherits from a particular
class. C<can> checks whether an object has a particular method.
Both return a true or false value.

    isa I0, P3, "Foo"           # 1
    isa I0, P3, "Bar"           # 1
    can I0, P3, "__add"         # 1

=head2 Complete Example

    newclass P1, "Foo"
    addattribute P1, "$.i"                # Foo.i

    find_type I1, "Foo"
    new P3, I1                  # call __init if it exists
    set P3, 30                  # call __set_integer_native method

    new P4, I1                  # same with P4
    set P4, 12
    new P5, I1                  # create a new LHS for add

    add P5, P3, P4              # __add method
    set I10, P5                 # __get_integer
    print I10
    print "\n"
    print P5                    # calls __get_string prints 'fortytwo'
    print "\n"

    inc P3                      # __increment
    add P5, P3, P4
    print P5                    # calls __get_string prints '43'
    print "\n"

    subclass P3, P1, "Bar"

    find_type I1, "Bar"
    new P3, I1

    set P3, 100
    new P4, I1
    set P4, 200
    new P5, I1

    add P5, P3, P4
    print P5                    # prints 300
    print "\n"

    set P5, 42
    print P5                    # prints 'fortytwo'
    print "\n"

    inc P5
    print P5                    # prints 41 as Bar's
    print "\n"                  # __increment decrements

    set S0, "_half"             # set method name
    set P2, P3                  # the object
    savetop                     # preserve registers
    callmethodcc                # create return continuation, call
    restoretop
    print I5                    # prints 50
    print "\n"

    end

  .namespace [ "Foo" ]

  .pcc_sub __init:
    classoffset I0, P2, "Foo"   # lookup first attribute position
    new P6, "Int"               # create a store for the attribute
    setattribute P2, I0, P6     # store the first attribute
    invoke P1                   # return

  .pcc_sub __set_integer_native:
    classoffset I0, P2, "Foo"
    getattribute P10, P2, I0
    set P10, I5                 # assign passed in value
    invoke P1

  .pcc_sub __get_integer:
    classoffset I0, P2, "Foo"
    getattribute P10, P2, I0
    set I5, P10                 # return value
    invoke P1

  .pcc_sub __get_string:
    classoffset I0, P2, "Foo"
    getattribute P10, P2, I0
    set I5, P10
    set S5, P10                 # get stringified value
    ne I5, 42, ok
    set S5, "fortytwo"          # or return modified one
  ok:
    invoke P1

  .pcc_sub __increment:
    classoffset I0, P2, "Foo"
    getattribute P10, P2, I0    # as with all aggregates, this
    inc P10                     # has reference semantics - no
    invoke P1                   # setattribute needed

  .pcc_sub __add:
    classoffset I0, P2, "Foo"
    getattribute P10, P2, I0    # object
    getattribute P11, P5, I0    # argument
    getattribute P12, P6, I0    # destination
    add P12, P10, P11
    invoke P1

  .pcc_sub _half:               # I5 = _half(self)
    classoffset I0, P2, "Foo"
    getattribute P10, P2, I0
    set I5, P10                 # get value
    div I5, 2
    invoke P1

  .namespace [ "Bar" ]

  .pcc_sub __increment:
    classoffset I0, P2, "Foo"   # get Foo's attribute slot offset
    getattribute P10, P2, I0    # get the first Foo attribute
    dec P10                     # the evil line
    invoke P1

    # end of object example

This example prints out:

    42
    fortytwo
    43
    300
    fortytwo
    41
    50

=cut