This is elisp, produced by makeinfo version 4.0f from ./elisp.texi.

INFO-DIR-SECTION Editors
START-INFO-DIR-ENTRY
* Elisp: (elisp).       The Emacs Lisp Reference Manual.
END-INFO-DIR-ENTRY

   This Info file contains edition 2.8 of the GNU Emacs Lisp Reference Manual, corresponding to Emacs version 21.2.

   Published by the Free Software Foundation
   59 Temple Place, Suite 330
   Boston, MA 02111-1307 USA

   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001, 2002 Free Software Foundation, Inc.

   Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being "Copying", with the Front-Cover texts being "A GNU Manual", and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled "GNU Free Documentation License".

   (a) The FSF's Back-Cover Text is: "You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development."

File: elisp, Node: Format Properties, Next: Sticky Properties, Prev: Special Properties, Up: Text Properties

Formatted Text Properties
-------------------------

   These text properties affect the behavior of the fill commands. They are used for representing formatted text. *Note Filling::, and *Note Margins::.

`hard'
     If a newline character has this property, it is a "hard" newline. The fill commands do not alter hard newlines and do not move words across them. However, this property takes effect only if the variable `use-hard-newlines' is non-`nil'.

`right-margin'
     This property specifies an extra right margin for filling this part of the text.

`left-margin'
     This property specifies an extra left margin for filling this part of the text.

`justification'
     This property specifies the style of justification for filling this part of the text.

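   The following is a minimal sketch, not part of the original manual, of how a program might attach these properties and read them back. It assumes a modern Emacs in which the standard helpers `current-left-margin' (the variable `left-margin' plus the `left-margin' text property) and `current-justification' (the `justification' property) are available; the function name `my-format-props-demo' is hypothetical.

```elisp
;; Hypothetical demo: attach formatting properties to some text and
;; read them back through the fill helpers that consult them.
(defun my-format-props-demo ()
  "Return (MARGIN JUSTIFICATION HARD) for a small formatted buffer."
  (with-temp-buffer
    (insert "First line\nSecond line\n")
    ;; Give all of the text an extra left margin of 4 columns and
    ;; request centered justification when it is filled.
    (put-text-property (point-min) (point-max) 'left-margin 4)
    (put-text-property (point-min) (point-max) 'justification 'center)
    ;; Mark the first newline (position 11) as hard.  The fill commands
    ;; honor this only while `use-hard-newlines' is non-nil, so enable
    ;; it locally in this temporary buffer.
    (put-text-property 11 12 'hard t)
    (set (make-local-variable 'use-hard-newlines) t)
    (goto-char (point-min))
    (list (current-left-margin)      ; variable `left-margin' (0) + property (4)
          (current-justification)    ; value of the `justification' property
          (get-text-property 11 'hard))))
```

Evaluating `(my-format-props-demo)' should return `(4 center t)'.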
File: elisp, Node: Sticky Properties, Next: Saving Properties, Prev: Format Properties, Up: Text Properties

Stickiness of Text Properties
-----------------------------

   Self-inserting characters normally take on the same properties as the preceding character. This is called "inheritance" of properties.

   In a Lisp program, you can do insertion with inheritance or without, depending on your choice of insertion primitive. The ordinary text insertion functions such as `insert' do not inherit any properties. They insert text with precisely the properties of the string being inserted, and no others. This is correct for programs that copy text from one context to another--for example, into or out of the kill ring. To insert with inheritance, use the special primitives described in this section. Self-inserting characters inherit properties because they work using these primitives.

   When you do insertion with inheritance, _which_ properties are inherited, and from where, depends on which properties are "sticky". Insertion after a character inherits those of its properties that are "rear-sticky". Insertion before a character inherits those of its properties that are "front-sticky". When both sides offer different sticky values for the same property, the previous character's value takes precedence.

   By default, a text property is rear-sticky but not front-sticky; thus, the default is to inherit all the properties of the preceding character, and nothing from the following character.

   You can control the stickiness of various text properties with two specific text properties, `front-sticky' and `rear-nonsticky', and with the variable `text-property-default-nonsticky'. You can use the variable to specify a different default for a given property. You can use those two text properties to make any specific properties sticky or nonsticky in any particular part of the text.

   If a character's `front-sticky' property is `t', then all its properties are front-sticky.
If the `front-sticky' property is a list, then the sticky properties of the character are those whose names are in the list. For example, if a character has a `front-sticky' property whose value is `(face read-only)', then insertion before the character can inherit its `face' property and its `read-only' property, but no others.

   The `rear-nonsticky' property works the opposite way. Most properties are rear-sticky by default, so the `rear-nonsticky' property says which properties are _not_ rear-sticky. If a character's `rear-nonsticky' property is `t', then none of its properties are rear-sticky. If the `rear-nonsticky' property is a list, properties are rear-sticky _unless_ their names are in the list.

 - Variable: text-property-default-nonsticky
     This variable holds an alist which defines the default rear-stickiness of various text properties. Each element has the form `(PROPERTY . NONSTICKINESS)', and it defines the stickiness of a particular text property, PROPERTY.

     If NONSTICKINESS is non-`nil', this means that the property PROPERTY is rear-nonsticky by default. Since all properties are front-nonsticky by default, this makes PROPERTY nonsticky in both directions by default.

     The text properties `front-sticky' and `rear-nonsticky', when used, take precedence over the default NONSTICKINESS specified in `text-property-default-nonsticky'.

   Here are the functions that insert text with inheritance of properties:

 - Function: insert-and-inherit &rest strings
     Insert the strings STRINGS, just like the function `insert', but inherit any sticky properties from the adjoining text.

 - Function: insert-before-markers-and-inherit &rest strings
     Insert the strings STRINGS, just like the function `insert-before-markers', but inherit any sticky properties from the adjoining text.

   *Note Insertion::, for the ordinary insertion functions which do not inherit.

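   The following sketch is not part of the original manual; it contrasts `insert' with `insert-and-inherit', first under the default stickiness and then with an explicit `rear-nonsticky' list. It assumes freshly created temporary buffers, so the positions used are exact; the function name `my-stickiness-demo' is hypothetical.

```elisp
;; Hypothetical demo of property inheritance on insertion.
(defun my-stickiness-demo ()
  "Return the `face' of characters inserted in three different ways."
  (let (plain inherited blocked)
    (with-temp-buffer
      ;; "abc" has a `face' property, which is rear-sticky by default.
      (insert (propertize "abc" 'face 'bold))
      ;; Plain `insert' never inherits: "x" (position 4) gets no face.
      (insert "x")
      (setq plain (get-text-property 4 'face))
      ;; Insert between "c" and "x": "y" inherits the rear-sticky
      ;; `face' of "c"; "x" has no properties to offer.
      (goto-char 4)
      (insert-and-inherit "y")
      (setq inherited (get-text-property 4 'face)))
    (with-temp-buffer
      ;; Here `rear-nonsticky' lists `face', so "z" inherits no face.
      (insert (propertize "abc" 'face 'bold 'rear-nonsticky '(face)))
      (insert-and-inherit "z")
      (setq blocked (get-text-property 4 'face)))
    (list plain inherited blocked)))
```

Evaluating `(my-stickiness-demo)' should return `(nil bold nil)': only the text inserted with `insert-and-inherit' next to a rear-sticky `face' picks that face up.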
File: elisp, Node: Saving Properties, Next: Lazy Properties, Prev: Sticky Properties, Up: Text Properties

Saving Text Properties in Files
-------------------------------

   You can save text properties in files (along with the text itself), and restore the same text properties when visiting or inserting the files, using these two hooks:

 - Variable: write-region-annotate-functions
     This variable's value is a list of functions for `write-region' to run to encode text properties in some fashion as annotations to the text being written in the file. *Note Writing to Files::.

     Each function in the list is called with two arguments: the start and end of the region to be written. These functions should not alter the contents of the buffer. Instead, they should return lists indicating annotations to write in the file in addition to the text in the buffer.

     Each function should return a list of elements of the form `(POSITION . STRING)', where POSITION is an integer specifying the relative position within the text to be written, and STRING is the annotation to add there.

     Each list returned by one of these functions must be already sorted in increasing order by POSITION. If there is more than one function, `write-region' merges the lists destructively into one sorted list.

     When `write-region' actually writes the text from the buffer to the file, it intermixes the specified annotations at the corresponding positions. All this takes place without modifying the buffer.

 - Variable: after-insert-file-functions
     This variable holds a list of functions for `insert-file-contents' to call after inserting a file's contents. These functions should scan the inserted text for annotations, and convert them to the text properties they stand for.

     Each function receives one argument, the length of the inserted text; point indicates the start of that text. The function should scan that text for annotations, delete them, and create the text properties that the annotations specify.
     The function should return the updated length of the inserted text, as it stands after those changes. The value returned by one function becomes the argument to the next function.

     These functions should always return with point at the beginning of the inserted text.

     The intended use of `after-insert-file-functions' is for converting some sort of textual annotations into actual text properties. But other uses may be possible.

   We invite users to write Lisp programs to store and retrieve text properties in files, using these hooks, and thus to experiment with various data formats and find good ones. Eventually we hope users will produce good, general extensions we can install in Emacs.

   We suggest not trying to handle arbitrary Lisp objects as text property names or values--because a program that general is probably difficult to write, and slow. Instead, choose a set of possible data types that are reasonably flexible, and not too hard to encode.

   *Note Format Conversion::, for a related feature.

File: elisp, Node: Lazy Properties, Next: Clickable Text, Prev: Saving Properties, Up: Text Properties

Lazy Computation of Text Properties
-----------------------------------

   Instead of computing text properties for all the text in the buffer, you can arrange to compute the text properties for parts of the text when and if something depends on them.

   The primitive that extracts text from the buffer along with its properties is `buffer-substring'. Before examining the properties, this function runs the abnormal hook `buffer-access-fontify-functions'.

 - Variable: buffer-access-fontify-functions
     This variable holds a list of functions for computing text properties. Before `buffer-substring' copies the text and text properties for a portion of the buffer, it calls all the functions in this list. Each of the functions receives two arguments that specify the range of the buffer being accessed. (The buffer itself is always the current buffer.)
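   As an illustrative sketch, not taken from the manual, a buffer-local function on this hook can add properties just before `buffer-substring' copies the text. The names `my-lazy-fontify' and `my-lazy-demo' are hypothetical, and the code assumes a modern Emacs, since the `with-silent-modifications' macro it uses (to avoid marking the buffer modified) does not exist in Emacs 21.

```elisp
;; Hypothetical hook function: compute a `face' property lazily,
;; only for the range that `buffer-substring' is about to copy.
(defun my-lazy-fontify (start end)
  "Give the text between START and END an italic `face'."
  ;; Adding properties this way should not mark the buffer modified.
  (with-silent-modifications
    (put-text-property start end 'face 'italic)))

(defun my-lazy-demo ()
  "Return the `face' of text copied without and with properties."
  (with-temp-buffer
    (insert "lazy text")
    ;; Add the hook buffer-locally so other buffers are unaffected.
    (add-hook 'buffer-access-fontify-functions #'my-lazy-fontify nil t)
    (list
     ;; Does not run the hook and copies no properties.
     (get-text-property 0 'face (buffer-substring-no-properties 1 5))
     ;; Runs the hook on the range 1-5, then copies the new property.
     (get-text-property 0 'face (buffer-substring 1 5)))))
```

`(my-lazy-demo)' should return `(nil italic)'. Since `buffer-access-fontified-property' is left `nil' here, the hook runs on every call to `buffer-substring'.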

   The function `buffer-substring-no-properties' does not call these functions, since it ignores text properties anyway.

   In order to prevent the hook functions from being called more than once for the same part of the buffer, you can use the variable `buffer-access-fontified-property'.

 - Variable: buffer-access-fontified-property
     If this variable's value is non-`nil', it is a symbol which is used as a text property name. A non-`nil' value for that text property means, "the other text properties for this character have already been computed."

     If all the characters in the range specified for `buffer-substring' have a non-`nil' value for this property, `buffer-substring' does not call the `buffer-access-fontify-functions' functions. It assumes these characters already have the right text properties, and just copies the properties they already have.

     The normal way to use this feature is that the `buffer-access-fontify-functions' functions add this property, as well as others, to the characters they operate on. That way, they avoid being called over and over for the same text.

File: elisp, Node: Clickable Text, Next: Fields, Prev: Lazy Properties, Up: Text Properties

Defining Clickable Text
-----------------------

   Setting up "clickable text" in a buffer typically involves two parts: making the text highlight when the mouse is over it, and making a mouse button do something when you click it on that part of the text.

   Highlighting is done with the `mouse-face' text property. Here is an example of how Dired does it:

     (condition-case nil
         (if (dired-move-to-filename)
             (put-text-property (point)
                                (save-excursion
                                  (dired-move-to-end-of-filename)
                                  (point))
                                'mouse-face 'highlight))
       (error nil))

The first two arguments to `put-text-property' specify the beginning and end of the text.

   The usual way to make the mouse do something when you click it on this text is to define `mouse-2' in the major mode's keymap.
The job of checking whether the click was on clickable text is done by the command definition. Here is how Dired does it:

     (defun dired-mouse-find-file-other-window (event)
       "In dired, visit the file or directory name you click on."
       (interactive "e")
       (let (file)
         (save-excursion
           (set-buffer (window-buffer (posn-window (event-end event))))
           (save-excursion
             (goto-char (posn-point (event-end event)))
             (setq file (dired-get-filename))))
         (select-window (posn-window (event-end event)))
         (find-file-other-window (file-name-sans-versions file t))))

The reason for the outer `save-excursion' construct is to avoid changing the current buffer; the reason for the inner one is to avoid permanently altering point in the buffer you click on.

   In this case, Dired uses the function `dired-get-filename' to determine which file to visit, based on the position found in the event.

   Instead of defining a mouse command for the major mode, you can define a key binding for the clickable text itself, using the `keymap' text property:

     (let ((map (make-sparse-keymap)))
       (define-key map [mouse-2] 'operate-this-button)
       (put-text-property (point)
                          (save-excursion
                            (dired-move-to-end-of-filename)
                            (point))
                          'keymap map))

This method makes it possible to define different commands for various clickable pieces of text. Also, the major mode definition (or the global definition) remains available for the rest of the text in the buffer.

File: elisp, Node: Fields, Next: Not Intervals, Prev: Clickable Text, Up: Text Properties

Defining and Using Fields
-------------------------

   A field is a range of consecutive characters in the buffer that are identified by having the same value (comparing with `eq') of the `field' property (either a text-property or an overlay property). This section describes special functions that are available for operating on fields.

   You specify a field with a buffer position, POS.
We think of each field as containing a range of buffer positions, so the position you specify stands for the field containing that position.

   When the characters before and after POS are part of the same field, there is no doubt which field contains POS: the one those characters both belong to. When POS is at a boundary between fields, which field it belongs to depends on the stickiness of the `field' properties of the two surrounding characters (*note Sticky Properties::). The field whose property would be inherited by text inserted at POS is the field that contains POS.

   There is an anomalous case where newly inserted text at POS would not inherit the `field' property from either side. This happens if the previous character's `field' property is not rear-sticky, and the following character's `field' property is not front-sticky. In this case, POS belongs to neither the preceding field nor the following field; the field functions treat it as belonging to an empty field whose beginning and end are both at POS.

   In all of these functions, if POS is omitted or `nil', the value of point is used by default.

 - Function: field-beginning &optional pos escape-from-edge
     This function returns the beginning of the field specified by POS.

     If POS is at the beginning of its field, and ESCAPE-FROM-EDGE is non-`nil', then the return value is always the beginning of the preceding field that _ends_ at POS, regardless of the stickiness of the `field' properties around POS.

 - Function: field-end &optional pos escape-from-edge
     This function returns the end of the field specified by POS.

     If POS is at the end of its field, and ESCAPE-FROM-EDGE is non-`nil', then the return value is always the end of the following field that _begins_ at POS, regardless of the stickiness of the `field' properties around POS.

 - Function: field-string &optional pos
     This function returns the contents of the field specified by POS, as a string.

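   Here is a minimal sketch, not from the original manual, that builds a prompt-like buffer and queries it with the functions above. It assumes the default stickiness, that is, the `field' property is rear-sticky and is not mentioned in `text-property-default-nonsticky'; the name `my-field-demo' is hypothetical.

```elisp
;; Hypothetical demo of field boundaries in a prompt-like buffer.
(defun my-field-demo ()
  "Return some field positions and contents for a sample buffer."
  (with-temp-buffer
    ;; Positions 1-6 hold "Name: ", whose `field' value is `prompt'.
    ;; Positions 7-11 hold "Alice", whose `field' value is nil; text
    ;; with a nil `field' property also forms a field.
    (insert (propertize "Name: " 'field 'prompt) "Alice")
    (list
     ;; Position 9 is inside "Alice", so its field runs from 7 to 12.
     (field-beginning 9)
     (field-end 9)
     (field-string 9)
     ;; Position 7 is the boundary.  Text inserted there would inherit
     ;; the rear-sticky `field' of the preceding "Name: ", so position 7
     ;; belongs to the prompt field.
     (field-beginning 7)
     (field-string 7))))
```

Since `equal' ignores text properties, `(equal (my-field-demo) '(7 12 "Alice" 1 "Name: "))' should be true.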
 - Function: field-string-no-properties &optional pos
     This function returns the contents of the field specified by POS, as a string, discarding text properties.

 - Function: delete-field &optional pos
     This function deletes the text of the field specified by POS.

 - Function: constrain-to-field new-pos old-pos &optional escape-from-edge only-in-line inhibit-capture-property
     This function "constrains" NEW-POS to the field that OLD-POS belongs to--in other words, it returns the position closest to NEW-POS that is in the same field as OLD-POS.

     If NEW-POS is `nil', then `constrain-to-field' uses the value of point instead, and moves point to the resulting position.

     If OLD-POS is at the boundary of two fields, then the acceptable positions for NEW-POS depend on the value of the optional argument ESCAPE-FROM-EDGE. If ESCAPE-FROM-EDGE is `nil', then NEW-POS is constrained to the field that has the same `field' property (either a text-property or an overlay property) that new characters inserted at OLD-POS would get. (This depends on the stickiness of the `field' property for the characters before and after OLD-POS.) If ESCAPE-FROM-EDGE is non-`nil', NEW-POS is constrained to the union of the two adjacent fields. Additionally, if two fields are separated by another field with the special value `boundary', then any point within this special field is also considered to be "on the boundary."

     If the optional argument ONLY-IN-LINE is non-`nil', and constraining NEW-POS in the usual way would move it to a different line, NEW-POS is returned unconstrained. This is used in commands that move by line, such as `next-line' and `beginning-of-line', so that they respect field boundaries only in the case where they can still move to the right line.

     If the optional argument INHIBIT-CAPTURE-PROPERTY is non-`nil', and OLD-POS has a non-`nil' property of that name, then any field boundaries are ignored.

     You can cause `constrain-to-field' to ignore all field boundaries (and so never constrain anything) by binding the variable `inhibit-field-text-motion' to a non-`nil' value.

File: elisp, Node: Not Intervals, Prev: Fields, Up: Text Properties

Why Text Properties are not Intervals
-------------------------------------

   Some editors that support adding attributes to text in the buffer do so by letting the user specify "intervals" within the text, and adding the properties to the intervals. Those editors permit the user or the programmer to determine where individual intervals start and end. We deliberately provided a different sort of interface in Emacs Lisp to avoid certain paradoxical behavior associated with text modification.

   If the actual subdivision into intervals is meaningful, that means you can distinguish between a buffer that is just one interval with a certain property, and a buffer containing the same text subdivided into two intervals, both of which have that property.

   Suppose you take the buffer with just one interval and kill part of the text. The text remaining in the buffer is one interval, and the copy in the kill ring (and the undo list) becomes a separate interval. Then if you yank back the killed text, you get two intervals with the same properties. Thus, editing does not preserve the distinction between one interval and two.

   Suppose we "fix" this problem by coalescing the two intervals when the text is inserted. That works fine if the buffer originally was a single interval. But suppose instead that we have two adjacent intervals with the same properties, and we kill the text of one interval and yank it back. The same interval-coalescence feature that rescues the other case causes trouble in this one: after yanking, we have just one interval. Once again, editing does not preserve the distinction between one interval and two.

   Insertion of text at the border between intervals also raises questions that have no satisfactory answer.

   However, it is easy to arrange for editing to behave consistently for questions of the form, "What are the properties of this character?" So we have decided these are the only questions that make sense; we have not implemented asking questions about where intervals start or end.

   In practice, you can usually use the text property search functions in place of explicit interval boundaries. You can think of them as finding the boundaries of intervals, assuming that intervals are always coalesced whenever possible. *Note Property Search::.

   Emacs also provides explicit intervals as a presentation feature; see *Note Overlays::.

File: elisp, Node: Substitution, Next: Transposition, Prev: Text Properties, Up: Text

Substituting for a Character Code
=================================

   The following functions replace characters within a specified region based on their character codes.

 - Function: subst-char-in-region start end old-char new-char &optional noundo
     This function replaces all occurrences of the character OLD-CHAR with the character NEW-CHAR in the region of the current buffer defined by START and END.

     If NOUNDO is non-`nil', then `subst-char-in-region' does not record the change for undo and does not mark the buffer as modified. This was useful for controlling the old selective display feature (*note Selective Display::).

     `subst-char-in-region' does not move point and returns `nil'.

          ---------- Buffer: foo ----------
          This is the contents of the buffer before.
          ---------- Buffer: foo ----------

          (subst-char-in-region 1 20 ?i ?X)
               => nil

          ---------- Buffer: foo ----------
          ThXs Xs the contents of the buffer before.
          ---------- Buffer: foo ----------

 - Function: translate-region start end table
     This function applies a translation table to the characters in the buffer between positions START and END.

     The translation table TABLE is a string; `(aref TABLE OCHAR)' gives the translated character corresponding to OCHAR.
     If the length of TABLE is less than 256, any characters with codes greater than or equal to the length of TABLE are not altered by the translation.

     The return value of `translate-region' is the number of characters that were actually changed by the translation. This does not count characters that were mapped into themselves in the translation table.

File: elisp, Node: Registers, Next: Base 64, Prev: Transposition, Up: Text

Registers
=========

   A register is a sort of variable used in Emacs editing that can hold a variety of different kinds of values. Each register is named by a single character. All ASCII characters and their meta variants (with the exception of `C-g') can be used to name registers. Thus, there are 255 possible registers. A register is designated in Emacs Lisp by the character that is its name.

 - Variable: register-alist
     This variable is an alist of elements of the form `(NAME . CONTENTS)'. Normally, there is one element for each Emacs register that has been used.

     The object NAME is a character (an integer) identifying the register.

   The CONTENTS of a register can have several possible types:

a number
     A number stands for itself. If `insert-register' finds a number in the register, it converts the number to decimal.

a marker
     A marker represents a buffer position to jump to.

a string
     A string is text saved in the register.

a rectangle
     A rectangle is represented by a list of strings.

`(WINDOW-CONFIGURATION POSITION)'
     This represents a window configuration to restore in one frame, and a position to jump to in the current buffer.

`(FRAME-CONFIGURATION POSITION)'
     This represents a frame configuration to restore, and a position to jump to in the current buffer.

`(file FILENAME)'
     This represents a file to visit; jumping to this value visits file FILENAME.

`(file-query FILENAME POSITION)'
     This represents a file to visit and a position in it; jumping to this value visits file FILENAME and goes to buffer position POSITION.
     Restoring this type of position asks the user for confirmation first.

   The functions in this section return unpredictable values unless otherwise stated.

 - Function: get-register reg
     This function returns the contents of the register REG, or `nil' if it has no contents.

 - Function: set-register reg value
     This function sets the contents of register REG to VALUE. A register can be set to any value, but the other register functions expect only certain data types. The return value is VALUE.

 - Command: view-register reg
     This command displays what is contained in register REG.

 - Command: insert-register reg &optional beforep
     This command inserts the contents of register REG into the current buffer.

     Normally, this command puts point before the inserted text, and the mark after it. However, if the optional second argument BEFOREP is non-`nil', it puts the mark before and point after. You can pass a non-`nil' second argument BEFOREP to this function interactively by supplying any prefix argument.

     If the register contains a rectangle, then the rectangle is inserted with its upper left corner at point. This means that text is inserted in the current line and underneath it on successive lines.

     If the register contains something other than saved text (a string) or a rectangle (a list), currently useless things happen. This may be changed in the future.

File: elisp, Node: Transposition, Next: Registers, Prev: Substitution, Up: Text

Transposition of Text
=====================

   This subroutine is used by the transposition commands.

 - Function: transpose-regions start1 end1 start2 end2 &optional leave-markers
     This function exchanges two nonoverlapping portions of the buffer. Arguments START1 and END1 specify the bounds of one portion and arguments START2 and END2 specify the bounds of the other portion.

     Normally, `transpose-regions' relocates markers with the transposed text; a marker previously positioned within one of the two transposed portions moves along with that portion, thus remaining between the same two characters in their new position. However, if LEAVE-MARKERS is non-`nil', `transpose-regions' does not do this--it leaves all markers unrelocated.

File: elisp, Node: Base 64, Next: MD5 Checksum, Prev: Registers, Up: Text

Base 64 Encoding
================

   Base 64 code is used in email to encode a sequence of 8-bit bytes as a longer sequence of ASCII graphic characters. It is defined in Internet RFC(1) 2045. This section describes the functions for converting to and from this code.

 - Function: base64-encode-region beg end &optional no-line-break
     This function converts the region from BEG to END into base 64 code. It returns the length of the encoded text. An error is signaled if a character in the region is multibyte, i.e. in a multibyte buffer the region must contain only characters from the charsets `ascii', `eight-bit-control' and `eight-bit-graphic'.

     Normally, this function inserts newline characters into the encoded text, to avoid overlong lines. However, if the optional argument NO-LINE-BREAK is non-`nil', these newlines are not added, so the output is just one long line.

 - Function: base64-encode-string string &optional no-line-break
     This function converts the string STRING into base 64 code. It returns a string containing the encoded text. As for `base64-encode-region', an error is signaled if a character in the string is multibyte.

     Normally, this function inserts newline characters into the encoded text, to avoid overlong lines. However, if the optional argument NO-LINE-BREAK is non-`nil', these newlines are not added, so the result string is just one long line.

 - Function: base64-decode-region beg end
     This function converts the region from BEG to END from base 64 code into the corresponding decoded text. It returns the length of the decoded text.
     The decoding functions ignore newline characters in the encoded text.

 - Function: base64-decode-string string
     This function converts the string STRING from base 64 code into the corresponding decoded text. It returns a string containing the decoded text.

     The decoding functions ignore newline characters in the encoded text.

   ---------- Footnotes ----------

   (1) An RFC, an acronym for "Request for Comments", is a numbered Internet informational document describing a standard. RFCs are usually written by technical experts acting on their own initiative, and are traditionally written in a pragmatic, experience-driven manner.

File: elisp, Node: MD5 Checksum, Next: Change Hooks, Prev: Base 64, Up: Text

MD5 Checksum
============

   MD5 cryptographic checksums, or "message digests", are 128-bit "fingerprints" of a document or program. They are used to verify that you have an exact and unaltered copy of the data. The algorithm to calculate the MD5 message digest is defined in Internet RFC(1) 1321. This section describes the Emacs facilities for computing message digests.

 - Function: md5 object &optional start end coding-system noerror
     This function returns the MD5 message digest of OBJECT, which should be a buffer or a string.

     The two optional arguments START and END are character positions specifying the portion of OBJECT to compute the message digest for. If they are `nil' or omitted, the digest is computed for the whole of OBJECT.

     The function `md5' does not compute the message digest directly from the internal Emacs representation of the text (*note Text Representations::). Instead, it encodes the text using a coding system, and computes the message digest from the encoded text. The optional fourth argument CODING-SYSTEM specifies which coding system to use for encoding the text. It should be the same coding system that you used to read the text, or that you used or will use when saving or sending the text. *Note Coding Systems::, for more information about coding systems.

     If CODING-SYSTEM is `nil' or omitted, the default depends on OBJECT. If OBJECT is a buffer, the default for CODING-SYSTEM is whatever coding system would be chosen by default for writing this text into a file. If OBJECT is a string, the user's most preferred coding system (*note prefer-coding-system: (emacs)Recognize Coding.) is used.

     Normally, `md5' signals an error if the text can't be encoded using the specified or chosen coding system. However, if NOERROR is non-`nil', it silently uses `raw-text' coding instead.

   ---------- Footnotes ----------

   (1) For an explanation of what an RFC is, see the footnote in *Note Base 64::.

File: elisp, Node: Change Hooks, Prev: MD5 Checksum, Up: Text

Change Hooks
============

   These hook variables let you arrange to take notice of all changes in all buffers (or in a particular buffer, if you make them buffer-local). See also *Note Special Properties::, for how to detect changes to specific parts of the text.

   The functions you use in these hooks should save and restore the match data if they do anything that uses regular expressions; otherwise, they will interfere in bizarre ways with the editing operations that call them.

 - Variable: before-change-functions
     This variable holds a list of functions to call before any buffer modification. Each function gets two arguments, the beginning and end of the region that is about to change, represented as integers. The buffer that is about to change is always the current buffer.

 - Variable: after-change-functions
     This variable holds a list of functions to call after any buffer modification. Each function receives three arguments: the beginning and end of the region just changed, and the length of the text that existed before the change. All three arguments are integers. The buffer that has changed is always the current buffer.

     The length of the old text is the difference between the buffer positions before and after that text as it was before the change.
     As for the changed text, its length is simply the difference between the first two arguments.

 - Macro: combine-after-change-calls body...
     This macro executes BODY normally, but arranges to call the after-change functions just once for a series of several changes--if that seems safe.

     If a program makes several text changes in the same area of the buffer, using the macro `combine-after-change-calls' around that part of the program can make it run considerably faster when after-change hooks are in use. When the after-change hooks are ultimately called, the arguments specify a portion of the buffer including all of the changes made within the `combine-after-change-calls' body.

     *Warning:* You must not alter the values of `after-change-functions' within the body of a `combine-after-change-calls' form.

     *Note:* If the changes you combine occur in widely scattered parts of the buffer, this will still work, but it is not advisable, because it may lead to inefficient behavior for some change hook functions.

   The two variables above are temporarily bound to `nil' during the time that any of these functions is running. This means that if one of these functions changes the buffer, that change won't run these functions. If you do want a hook function to make changes that run these functions, make it bind these variables back to their usual values.

   One inconvenient result of this protective feature is that you cannot have a function in `after-change-functions' or `before-change-functions' which changes the value of that variable. But that's not a real limitation. If you want those functions to change the list of functions to run, simply add one fixed function to the hook, and code that function to look in another variable for other functions to call.
Here is an example:

     (defvar my-own-after-change-functions nil
       "Functions called by `indirect-after-change-function'.")

     (defun indirect-after-change-function (beg end len)
       (let ((list my-own-after-change-functions))
         (while list
           (funcall (car list) beg end len)
           (setq list (cdr list)))))

     (add-hook 'after-change-functions
               'indirect-after-change-function)

 - Variable: first-change-hook This variable is a normal hook that is run whenever a buffer that was previously in the unmodified state is changed.

 - Variable: inhibit-modification-hooks If this variable is non-`nil', all of the change hooks are disabled; none of them run. This affects all the hook variables described above in this section, as well as the hooks attached to certain special text properties (*note Special Properties::) and overlay properties (*note Overlay Properties::). This variable is available starting in Emacs 21.

 File: elisp, Node: Non-ASCII Characters, Next: Searching and Matching, Prev: Text, Up: Top Non-ASCII Characters ******************** This chapter covers the special issues relating to non-ASCII characters and how they are stored in strings and buffers.

* Menu:

* Text Representations:: Unibyte and multibyte representations
* Converting Representations:: Converting unibyte to multibyte and vice versa.
* Selecting a Representation:: Treating a byte sequence as unibyte or multibyte.
* Character Codes:: How unibyte and multibyte relate to codes of individual characters.
* Character Sets:: The space of possible character codes is divided into various character sets.
* Chars and Bytes:: More information about multibyte encodings.
* Splitting Characters:: Converting a character to its byte sequence.
* Scanning Charsets:: Which character sets are used in a buffer?
* Translation of Characters:: Translation tables are used for conversion.
* Coding Systems:: Coding systems are conversions for saving files.
* Input Methods:: Input methods allow users to enter various non-ASCII characters without special keyboards.
* Locales:: Interacting with the POSIX locale.
File: elisp, Node: Text Representations, Next: Converting Representations, Up: Non-ASCII Characters Text Representations ==================== Emacs has two "text representations"--two ways to represent text in a string or buffer. These are called "unibyte" and "multibyte". Each string, and each buffer, uses one of these two representations. For most purposes, you can ignore the issue of representations, because Emacs converts text between them as appropriate. Occasionally in Lisp programming you will need to pay attention to the difference. In unibyte representation, each character occupies one byte and therefore the possible character codes range from 0 to 255. Codes 0 through 127 are ASCII characters; the codes from 128 through 255 are used for one non-ASCII character set (you can choose which character set by setting the variable `nonascii-insert-offset'). In multibyte representation, a character may occupy more than one byte, and as a result, the full range of Emacs character codes can be stored. The first byte of a multibyte character is always in the range 128 through 159 (octal 0200 through 0237). These values are called "leading codes". The second and subsequent bytes of a multibyte character are always in the range 160 through 255 (octal 0240 through 0377); these values are "trailing codes". Some sequences of bytes are not valid in multibyte text: for example, a single isolated byte in the range 128 through 159 is not allowed. But character codes 128 through 159 can appear in multibyte text, represented as two-byte sequences. All the character codes 128 through 255 are possible (though slightly abnormal) in multibyte text; they appear in multibyte buffers and strings when you do explicit encoding and decoding (*note Explicit Encoding::). In a buffer, the buffer-local value of the variable `enable-multibyte-characters' specifies the representation used. The representation for a string is determined and recorded in the string when the string is constructed. 
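The last paragraph says that a buffer's representation lives in its buffer-local value of `enable-multibyte-characters', while a string records its representation when it is constructed. The sketch below observes both facts in a scratch buffer, using `set-buffer-multibyte', `position-bytes' and `multibyte-string-p' (all described in this chapter). `my-representation-report' is a hypothetical helper name, and `(decode-char 'ucs #xe9)' is used only as a convenient way to obtain one non-ASCII character. The sketch deliberately checks only that the byte count exceeds the character count, since how many bytes a non-ASCII character occupies depends on the internal encoding.

```elisp
;; Sketch: observe where Emacs records the text representation.
;; `my-representation-report' is a hypothetical helper, not part of Emacs.
(defun my-representation-report ()
  "Return a plist describing text representations in a scratch buffer."
  (with-temp-buffer
    ;; The buffer's representation is its local value of
    ;; `enable-multibyte-characters'; change it via `set-buffer-multibyte'.
    (set-buffer-multibyte nil)
    (let ((unibyte-flag enable-multibyte-characters))
      (set-buffer-multibyte t)
      ;; Insert "a" followed by LATIN SMALL LETTER E WITH ACUTE.
      (insert ?a (decode-char 'ucs #xe9))
      (list :unibyte-buffer unibyte-flag
            :multibyte-buffer enable-multibyte-characters
            ;; Two characters in the buffer...
            :chars (- (point-max) (point-min))
            ;; ...but the non-ASCII one occupies more than one byte.
            :bytes (- (position-bytes (point-max))
                      (position-bytes (point-min)))
            ;; An ASCII-only string literal is read as unibyte.
            :ascii-string-multibyte-p (multibyte-string-p "abc")
            ;; Text taken from the multibyte buffer is multibyte.
            :buffer-string-multibyte-p (multibyte-string-p (buffer-string))))))
```

Evaluating `(my-representation-report)' returns a plist in which `:chars' is 2 and `:bytes' is larger than that.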
- Variable: enable-multibyte-characters This variable specifies the current buffer's text representation. If it is non-`nil', the buffer contains multibyte text; otherwise, it contains unibyte text. You cannot set this variable directly; instead, use the function `set-buffer-multibyte' to change a buffer's representation. - Variable: default-enable-multibyte-characters This variable's value is entirely equivalent to `(default-value 'enable-multibyte-characters)', and setting this variable changes that default value. Setting the local binding of `enable-multibyte-characters' in a specific buffer is not allowed, but changing the default value is supported, and it is a reasonable thing to do, because it has no effect on existing buffers. The `--unibyte' command line option does its job by setting the default value to `nil' early in startup. - Function: position-bytes position Return the byte-position corresponding to buffer position POSITION in the current buffer. - Function: byte-to-position byte-position Return the buffer position corresponding to byte-position BYTE-POSITION in the current buffer. - Function: multibyte-string-p string Return `t' if STRING is a multibyte string.  File: elisp, Node: Converting Representations, Next: Selecting a Representation, Prev: Text Representations, Up: Non-ASCII Characters Converting Text Representations =============================== Emacs can convert unibyte text to multibyte; it can also convert multibyte text to unibyte, though this conversion loses information. In general these conversions happen when inserting text into a buffer, or when putting text from several strings together in one string. You can also explicitly convert a string's contents to either representation. Emacs chooses the representation for a string based on the text that it is constructed from. 
The general rule is to convert unibyte text to multibyte text when combining it with other multibyte text, because the multibyte representation is more general and can hold whatever characters the unibyte text has. When inserting text into a buffer, Emacs converts the text to the buffer's representation, as specified by `enable-multibyte-characters' in that buffer. In particular, when you insert multibyte text into a unibyte buffer, Emacs converts the text to unibyte, even though this conversion cannot in general preserve all the characters that might be in the multibyte text. The other natural alternative, to convert the buffer contents to multibyte, is not acceptable because the buffer's representation is a choice made by the user that cannot be overridden automatically. Converting unibyte text to multibyte text leaves ASCII characters unchanged, and likewise character codes 128 through 159. It converts the non-ASCII codes 160 through 255 by adding the value `nonascii-insert-offset' to each character code. By setting this variable, you specify which character set the unibyte characters correspond to (*note Character Sets::). For example, if `nonascii-insert-offset' is 2048, which is `(- (make-char 'latin-iso8859-1) 128)', then the unibyte non-ASCII characters correspond to Latin 1. If it is 2688, which is `(- (make-char 'greek-iso8859-7) 128)', then they correspond to Greek letters. Converting multibyte text to unibyte is simpler: it discards all but the low 8 bits of each character code. If `nonascii-insert-offset' has a reasonable value, corresponding to the beginning of some character set, this conversion is the inverse of the other: converting unibyte text to multibyte and back to unibyte reproduces the original unibyte text. - Variable: nonascii-insert-offset This variable specifies the amount to add to a non-ASCII character when converting unibyte text to multibyte. 
It also applies when `self-insert-command' inserts a character in the unibyte non-ASCII range, 128 through 255. However, the functions `insert' and `insert-char' do not perform this conversion. The right value to use to select character set CS is `(- (make-char CS) 128)'. If the value of `nonascii-insert-offset' is zero, then conversion actually uses the value for the Latin 1 character set, rather than zero. - Variable: nonascii-translation-table This variable provides a more general alternative to `nonascii-insert-offset'. You can use it to specify independently how to translate each code in the range of 128 through 255 into a multibyte character. The value should be a char-table, or `nil'. If this is non-`nil', it overrides `nonascii-insert-offset'. - Function: string-make-unibyte string This function converts the text of STRING to unibyte representation, if it isn't already, and returns the result. If STRING is a unibyte string, it is returned unchanged. Multibyte character codes are converted to unibyte by using just the low 8 bits. - Function: string-make-multibyte string This function converts the text of STRING to multibyte representation, if it isn't already, and returns the result. If STRING is a multibyte string, it is returned unchanged. The function `unibyte-char-to-multibyte' is used to convert each unibyte character to a multibyte character.  File: elisp, Node: Selecting a Representation, Next: Character Codes, Prev: Converting Representations, Up: Non-ASCII Characters Selecting a Representation ========================== Sometimes it is useful to examine an existing buffer or string as multibyte when it was unibyte, or vice versa. - Function: set-buffer-multibyte multibyte Set the representation type of the current buffer. If MULTIBYTE is non-`nil', the buffer becomes multibyte. If MULTIBYTE is `nil', the buffer becomes unibyte. This function leaves the buffer contents unchanged when viewed as a sequence of bytes. 
As a consequence, it can change the contents viewed as characters; a sequence of two bytes which is treated as one character in multibyte representation will count as two characters in unibyte representation. Character codes 128 through 159 are an exception. They are represented by one byte in a unibyte buffer, but when the buffer is set to multibyte, they are converted to two-byte sequences, and vice versa. This function sets `enable-multibyte-characters' to record which representation is in use. It also adjusts various data in the buffer (including overlays, text properties and markers) so that they cover the same text as they did before. You cannot use `set-buffer-multibyte' on an indirect buffer, because indirect buffers always inherit the representation of the base buffer. - Function: string-as-unibyte string This function returns a string with the same bytes as STRING but treating each byte as a character. This means that the value may have more characters than STRING has. If STRING is already a unibyte string, then the value is STRING itself. Otherwise it is a newly created string, with no text properties. If STRING is multibyte, any characters it contains of charset EIGHT-BIT-CONTROL or EIGHT-BIT-GRAPHIC are converted to the corresponding single byte. - Function: string-as-multibyte string This function returns a string with the same bytes as STRING but treating each multibyte sequence as one character. This means that the value may have fewer characters than STRING has. If STRING is already a multibyte string, then the value is STRING itself. Otherwise it is a newly created string, with no text properties. If STRING is unibyte and contains any individual 8-bit bytes (i.e. not part of a multibyte form), they are converted to the corresponding multibyte character of charset EIGHT-BIT-CONTROL or EIGHT-BIT-GRAPHIC.  
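The functions in this node keep the bytes and change only how they are grouped into characters. The following sketch shows that effect on a string (`string-as-unibyte', `string-as-multibyte') and on a buffer (`set-buffer-multibyte'). `my-reinterpret-demo' is a hypothetical helper name. Later Emacs releases mark `string-as-unibyte' and `string-as-multibyte' obsolete, so the byte compiler may warn about them; the buffer part uses only `set-buffer-multibyte'.

```elisp
;; Sketch: reinterpret the same bytes as unibyte or multibyte text.
;; `my-reinterpret-demo' is a hypothetical helper, not part of Emacs.
(defun my-reinterpret-demo ()
  "Return character counts before and after changing representation."
  (let* ((text (string ?a (decode-char 'ucs #xe9))) ; "a" + e-acute
         (as-unibyte (string-as-unibyte text))     ; one char per byte
         (back (string-as-multibyte as-unibyte)))  ; bytes regrouped
    (with-temp-buffer
      (insert text)
      (let ((multibyte-size (buffer-size)))
        ;; Same bytes, now counted as one character per byte.
        (set-buffer-multibyte nil)
        (let ((unibyte-size (buffer-size)))
          ;; Switching back regroups the bytes into characters again.
          (set-buffer-multibyte t)
          (list :string-chars (length text)
                :unibyte-string-chars (length as-unibyte)
                :round-trip-equal (equal back text)
                :multibyte-buffer-size multibyte-size
                :unibyte-buffer-size unibyte-size
                :restored-buffer-size (buffer-size)))))))
```

In the result, the unibyte views of the string and of the buffer report the same, larger, character count, and converting back restores the original two characters.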
File: elisp, Node: Character Codes, Next: Character Sets, Prev: Selecting a Representation, Up: Non-ASCII Characters Character Codes =============== The unibyte and multibyte text representations use different character codes. The valid character codes for unibyte representation range from 0 to 255--the values that can fit in one byte. Character codes for multibyte representation range from 0 to 524287, but not every value in that range is valid. The values 128 through 255 are not entirely proper in multibyte text, but they can occur if you do explicit encoding and decoding (*note Explicit Encoding::). Some other character codes cannot occur at all in multibyte text. Only the ASCII codes 0 through 127 are completely legitimate in both representations. - Function: char-valid-p charcode &optional genericp This function returns `t' if CHARCODE is valid for either of the two text representations.

          (char-valid-p 65)
               => t
          (char-valid-p 256)
               => nil
          (char-valid-p 2248)
               => t

     If the optional argument GENERICP is non-`nil', this function returns `t' if CHARCODE is a generic character (*note Splitting Characters::).  File: elisp, Node: Character Sets, Next: Chars and Bytes, Prev: Character Codes, Up: Non-ASCII Characters Character Sets ============== Emacs classifies characters into various "character sets", each of which has a name that is a symbol. Each character belongs to one and only one character set. In general, there is one character set for each distinct script. For example, `latin-iso8859-1' is one character set, `greek-iso8859-7' is another, and `ascii' is another. An Emacs character set can hold at most 9025 characters; therefore, in some cases, characters that would logically be grouped together are split into several character sets. For example, one set of Chinese characters, generally known as Big 5, is divided into two Emacs character sets, `chinese-big5-1' and `chinese-big5-2'. ASCII characters are in character set `ascii'.
The non-ASCII characters 128 through 159 are in character set `eight-bit-control', and codes 160 through 255 are in character set `eight-bit-graphic'. - Function: charsetp object This function returns `t' if OBJECT is a symbol that names a character set, and `nil' otherwise. - Function: charset-list This function returns a list of all defined character set names. - Function: char-charset character This function returns the name of the character set that CHARACTER belongs to. - Function: charset-plist charset This function returns the charset property list of the character set CHARSET. Although CHARSET is a symbol, this is not the same as the property list of that symbol. Charset properties are used for special purposes within Emacs; for example, `preferred-coding-system' helps determine which coding system to use to encode characters in a charset.
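The functions above can be combined to find out which character sets a piece of text draws on. The sketch below applies `char-charset' to each character of a string and collects the distinct names in order of first use. `my-string-charsets' is a hypothetical name; Emacs itself provides `find-charset-string' (*note Scanning Charsets::) for this job, so the sketch only illustrates how the pieces fit together.

```elisp
;; Sketch: list the character sets used by STRING, in order of first use.
;; `my-string-charsets' is a hypothetical helper, not part of Emacs.
(defun my-string-charsets (string)
  "Return the distinct charset names of the characters in STRING."
  (let ((result nil))
    (mapc (lambda (ch)
            ;; `char-charset' returns the name (a symbol) of CH's charset.
            (let ((cs (char-charset ch)))
              (unless (memq cs result)
                (setq result (cons cs result)))))
          string)
    ;; RESULT was built in reverse order of first appearance.
    (nreverse result)))
```

For example, `(my-string-charsets "hello")' returns `(ascii)', and every name it returns satisfies `charsetp'.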