'\" t
.\" Copyright (c) 2001-2004, Nadav Har'El and Dan Kenigsberg
.TH hspell 3 "21 June 2004" "Hspell 0.8" "Ivrix"
.SH NAME
hspell \- Hebrew spellchecker (C API)
.SH SYNOPSIS
.B #include <hspell.h>
.PP
\fBint hspell_init(struct dict_radix **\fRdictp\fB, int \fRflags\fB);\fR
.PP
\fBvoid hspell_uninit(struct dict_radix *\fRdictp\fB);\fR
.PP
\fBint hspell_check_word(struct dict_radix *\fRdict\fB, const char *\fRword\fB, int *\fRpreflen\fB);\fR
.PP
\fBvoid hspell_trycorrect(struct dict_radix *\fRdict\fB, const char *\fRword\fB, struct corlist *\fRcl\fB);\fR
.PP
\fBint corlist_init(struct corlist *\fRcl\fB);\fR
.PP
\fBint corlist_free(struct corlist *\fRcl\fB);\fR
.PP
\fBint corlist_n(struct corlist *\fRcl\fB);\fR
.PP
\fBchar *corlist_str(struct corlist *\fRcl\fB, int \fRi\fB);\fR
.PP
\fBint hspell_is_canonic_gimatria(const char *\fRword\fB);\fR
.PP
\fRtypedef int hspell_word_split_callback_func(const char *word, const char *baseword, int preflen, int prefspec);\fR
.PP
\fBint hspell_enum_splits(struct dict_radix *\fRdict\fB, const char *\fRword\fB, hspell_word_split_callback_func *\fRenumf\fB);\fR
.SH "DESCRIPTION"
This manual describes the C API of the Hspell Hebrew spellchecker. Please refer to
.BR hspell (1)
for a fuller description of the Hspell project, its spelling standard, and how it works.
.PP
The
.B hspell_init()
function must be called first to initialize the Hspell library. It sets up some global structures (see CAVEATS section) and then reads the necessary dictionary files (whose locations are fixed when the library is built). The
.I 'dictp'
parameter is a pointer to a
.I struct dict_radix*
object, which is modified to point to a newly allocated dictionary. A typical
.B hspell_init()
call therefore looks like
.PP
.nf
    struct dict_radix *dict;
    hspell_init(&dict, flags);
.fi
.PP
Note that the (struct dict_radix*) type is an opaque pointer \- the library user has no access to the separate fields in this structure.
.PP
The
.I 'flags'
parameter can contain a bitwise OR of several flags that modify Hspell's default behavior. Turning on HSPELL_OPT_HE_SHEELA allows Hspell to recognize the interrogative He prefix (he ha-she'ela). HSPELL_OPT_DEFAULT is a synonym for turning on no special flag, i.e., it evaluates to 0.
.PP
.B hspell_init()
returns 0 on success, or a negative number on error. Currently, the only error is \-1, meaning the dictionary files could not be read.
.PP
The
.B hspell_uninit()
function undoes the effects of
.BR hspell_init() ,
freeing any memory that was allocated during initialization.
.PP
The
.B hspell_check_word()
function checks whether a certain word is a correct Hebrew word (possibly with prefix particles attached in a syntactically correct manner). It returns 1 if the word is correct, or 0 if it is incorrect.
.PP
The
.I 'word'
parameter should be a single Hebrew word, in the iso8859-8 encoding, possibly containing the ASCII quote or double-quote characters (signifying the geresh and gershayim used in Hebrew for abbreviations, acronyms, and a few foreign sounds).
.PP
If the calling program works with other encodings, it must convert the word to iso8859-8 first. In particular, the cp1255 (MS-Windows Hebrew encoding) extensions to iso8859-8, such as niqqud characters, geresh, or gershayim, are currently not recognized and must be removed from the word prior to calling
.BR hspell_check_word() .
.PP
The function writes into the
.I 'preflen'
parameter the number of characters it recognized as a prefix particle \- the rest of the 'word' is a stand-alone word. Because Hebrew words typically can be read in several different ways, this feature (of getting just one prefix from one possible reading) is usually not very useful, and it is likely to be removed in a future version.
.PP
The
.B hspell_enum_splits()
function provides a way to get all possible splits of the given
.I 'word'
into an optional prefix particle and a stand-alone word.
For each possible (and legal, as some words cannot accept certain prefixes) split, a user-defined callback function is called. This callback function is given the whole word, the stand-alone word, the length of the prefix, and a bitfield which describes what types of words this prefix can be attached to. Note that in some cases, a word beginning with the letter waw gets this waw doubled before a prefix, so sometimes strlen(word)!=strlen(baseword)+preflen.
.PP
The
.B hspell_trycorrect()
function tries to find a list of possible corrections for an incorrect word. Because in Hebrew the word density is high (a random string of letters, especially if short, has a high probability of being a correct word), this function attempts to try corrections based on the assumption of a spelling error (replacement of letters that sound alike, missing or spurious immot qri'a), not a typo (slipped finger on the keyboard, etc.) \- see also CAVEATS.
.PP
.B hspell_trycorrect()
stores the correction list in a structure of type \fIstruct corlist\fR. This structure must first be allocated with a call to
.B corlist_init()
and subsequently freed with
.BR corlist_free() .
The
.B corlist_n()
macro returns the number of words held in an allocated corlist, and
.B corlist_str()
returns the i'th word.
.PP
Accordingly, here is an example usage of
.BR hspell_trycorrect() :
.PP
.nf
    struct corlist cl;
    int i;
    printf ("Found misspelled word %s. Possible corrections:\\n", w);
    corlist_init (&cl);
    hspell_trycorrect (dict, w, &cl);
    for (i=0; i<corlist_n(&cl); i++)
        printf ("%s\\n", corlist_str(&cl, i));
    corlist_free (&cl);
.fi
.SH "COPYRIGHT"
Copyright (c) 2001-2004, Nadav Har'El and Dan Kenigsberg.
.PP
Hspell is free software, released under the GNU General Public License (GPL). Note that not only the programs in the distribution, but also the dictionary files and the generated word lists, are licensed under the GPL. There is no warranty of any kind. See the LICENSE file for more information and the exact license terms.
.PP
The latest version of this software can be found in
.B http://www.ivrix.org.il/projects/spell-checker
.SH "SEE ALSO"
.BR hspell (1)
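.SH "EXAMPLES"
The requirement that
.B hspell_check_word()
receive its word in iso8859-8 means that a caller holding UTF-8 text needs a conversion step. The following is a minimal sketch of such a step and is not part of the Hspell API: the function name
.I utf8_to_iso8859_8
and its return convention are hypothetical, chosen only for illustration. It maps the Hebrew letters alef through tav (U+05D0 to U+05EA) to the bytes 0xE0 to 0xFA, drops the common niqqud points, and rejects any other non-ASCII input. Mapping the Unicode geresh and gershayim to the ASCII quote and double-quote is an assumption of this sketch; a caller may prefer to remove them instead.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Hspell: convert a UTF-8 word to
 * iso8859-8. Returns the number of bytes written (excluding the NUL),
 * or -1 on unsupported input or if the output buffer is too small. */
int utf8_to_iso8859_8(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*p != 0) {
        unsigned char c;

        if (*p < 0x80) {
            /* ASCII passes through, including the quote characters */
            c = *p;
            p += 1;
        } else if (p[0] == 0xD6 && p[1] >= 0xB0 && p[1] <= 0xBC) {
            /* niqqud points U+05B0..U+05BC: dropped */
            p += 2;
            continue;
        } else if (p[0] == 0xD7 && (p[1] == 0x81 || p[1] == 0x82)) {
            /* shin dot and sin dot (U+05C1, U+05C2): dropped */
            p += 2;
            continue;
        } else if (p[0] == 0xD7 && p[1] >= 0x90 && p[1] <= 0xAA) {
            /* alef..tav (U+05D0..U+05EA) become 0xE0..0xFA */
            c = (unsigned char)(p[1] + 0x50);
            p += 2;
        } else if (p[0] == 0xD7 && p[1] == 0xB3) {
            c = 0x27;   /* geresh U+05F3 becomes the ASCII quote */
            p += 2;
        } else if (p[0] == 0xD7 && p[1] == 0xB4) {
            c = 0x22;   /* gershayim U+05F4 becomes the ASCII double-quote */
            p += 2;
        } else {
            return -1;  /* anything else is not handled by this sketch */
        }
        if (n + 1 >= outsize)
            return -1;  /* no room for this byte plus the NUL */
        out[n] = (char)c;
        n += 1;
    }
    if (outsize == 0)
        return -1;
    out[n] = 0;
    return (int)n;
}
```
.fi
.PP
On success the output buffer holds a NUL-terminated string that can be passed as the
.I 'word'
argument of
.BR hspell_check_word() .
A return value of \-1 tells the caller that the input contained something this sketch does not handle, so the word should be skipped or converted by other means.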