Subsections
   
  * 3.1 As an independent spell checker
  * 3.2 As an ``ispell -a'' replacement
      + 3.2.1 Format of the Data Stream
          o 3.2.1.1 Notes on Storing Replacement Pairs
  * 3.3 As a utility to manage word lists
      + 3.3.1 Creating an Individual Word List
      + 3.3.2 Format of the Replacement Word List
      + 3.3.3 Dumping the contents of the word list
  * 3.4 Notes on options pertaining to which word lists to use
  * 3.5 Notes on the options to control run-together words
  * 3.6 Notes on various filters and filter mode
      + 3.6.1 None Mode
      + 3.6.2 Url Filter/Mode
      + 3.6.3 Email Filter/Mode
      + 3.6.4 SGML Filter/Mode
      + 3.6.5 TEX Filter/Mode
  * 3.7 Notes on the different suggestion modes

--------------------------------------------------------------------------

3. The Aspell utility

The Aspell utility is a multipurpose utility that can function as an
``ispell -a'' replacement, as an independent spell checker, and as a
utility for managing dictionaries. Here is a brief summary of Aspell's
command line options. For a more detailed explanation of how to use Aspell
to perform a particular task, please see one of the following sections.

    aspell [options] command

command is one of:

check file
    to check a file (see section 3.1)
pipe
    'ispell -a' compatible mode (see section 3.2).
list
    produce a list of misspelled words from standard input
config
    dump the current configuration to stdout
soundslike
    returns the soundslike equivalent for each word entered
filter
    passes standard input through the same set of filters that would be
    used to spell check a document.
help
    display online help
version
    prints a version line
dump|create|merge master|personal|repl [word list]
    dumps, creates, or merges a master, personal, or replacement word list
    (see section 3.3).

[options] is any or all of the following standard aspell library options:

--conf=file
    main configuration file
--conf-dir=dir
    location of main configuration file
--data-dir=dir
    location of language data files
--local-data-dir=dir
    alternative location of language data files. This directory is
    searched before data-dir. It defaults to the same directory the actual
    main word list is in (which is not necessarily dict-dir).
--add|rem-filter=str
    add or remove a filter
--home-dir=dir
    location for personal files
-W,--ignore=integer
    ignore words <= n chars
--[dont-]ignore-case
    ignore case when checking words
--[dont-]ignore-accents
    ignore accents when checking words
--[dont-]ignore-repl
    ignore commands to store replacement pairs
--[dont-]save-repl
    save the replacement word list on save all
--lang=str
    default language to use
--mode=str
    sets the filter mode. Mode is one of none, url, email, sgml, or tex.
-e,--mode=email
    enter Email mode.
-H,--mode=sgml
    enter Html/Sgml mode.
-t,--mode=tex
    enter TEX mode.
--per-conf=file
    personal configuration file
-p,--personal=file
    personal word list file name
--repl=file
    replacements list file name
--sug-mode=mode
    suggestion mode = fast | normal | bad-spellers (see section 3.7)

in addition to options to control which dictionaries to use and how they
behave (see section 3.4):

-d,--master=name
    main word list base name
--dict-dir=dir
    location of the main word list
--add|rem-extra-dicts=<str>
    extra dictionaries to use
--[dont-]strip-accents
    strip accents from all words in the dictionary

in addition to options to control the behavior of run-together words (see
section 3.5):

-C,--run-together
    consider run-together words legal
-B,--dont-run-together
    don't consider run-together words legal
--run-together-limit=<int>
    maximum number of words that can be strung together
--run-together-min=<int>
    minimal length of interior words

plus options to modify the behavior of the various filters (see section
3.6):

--add|rem-email-quote=char
    email quote characters
--email-margin=integer
    num chars that can appear before the quote char
--add|rem-sgml-check=str
    sgml tags to always check.
--add|rem-sgml-extension=str
    sgml file extensions.
--add|rem-tex-command=str
    TEX commands
--[dont-]tex-check-comments
    check TEX comments

in addition to some options specific to the aspell utility:

-b,--backup
    create a backup file by appending ``.bak'' to the file name. (Only
    applies when the command is check)
-x,--dont-backup
    don't create a backup file.
--[dont-]time
    time load time and suggest time in pipe mode.
--[dont-]reverse
    reverse the order of the suggestions list.

In addition, Aspell will try to make sense of Ispell's command line
options so that it can function as a drop-in replacement for Ispell when
used in ``-a'' mode.

If Aspell is run without any command line options, it will display a
brief help screen and quit.

Aspell can also make use of a global or user configuration file. Each line
of the configuration file has the format:

    option [args]

where option is any one of the standard library options above without the
leading dashes. For example the following line will set the default
language to German:

    lang german

Anything from a ``#'' to a newline is ignored. The global configuration
file is usually named ``aspell.conf'' and is found in the etc directory,
while the user configuration file is usually named ``.aspell.conf'' and is
found in the user's home directory. Use ``aspell dump config'' to find out
what they are for your installation.

The environment variable ASPELL_CONF may also be used, and it overrides
any options set in the configuration file. The format of the string is
exactly the same as the configuration file except that semicolons ( ; )
are used instead of newlines.
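
As a rough illustration of the format just described, the following Python
sketch parses configuration text into option/value pairs and layers an
ASPELL_CONF style string on top. The function names are hypothetical and
are not part of Aspell. The sketch also assumes that a later setting of an
option simply replaces an earlier one, which ignores list-valued options
such as add-extra-dicts.

```python
def parse_conf(text, separator="\n"):
    """Parse 'option [args]' entries into a dict (hypothetical helper)."""
    options = {}
    for entry in text.split(separator):
        # Anything from '#' to the end of the entry is a comment.
        entry = entry.split("#", 1)[0].strip()
        if not entry:
            continue
        parts = entry.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def effective_options(conf_text, env_value=""):
    """Options from a config file, overridden by an ASPELL_CONF string."""
    options = parse_conf(conf_text)
    # ASPELL_CONF uses ';' where the file uses newlines.
    options.update(parse_conf(env_value, separator=";"))
    return options


conf = "lang german   # default language\nsug-mode fast\n"
print(effective_options(conf, "sug-mode normal;ignore 2"))
```

Here the sug-mode value from the environment string replaces the one from
the file, while lang is left untouched.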


3.1 As an independent spell checker

To use Aspell as an independent spell checker type

    aspell check filename

where filename is the file you want to check. Aspell will overwrite the
original file with the corrected version. The original version is saved as
filename.bak unless backups are turned off with the dont-backup option.

If the extension is .tex, it will check the file in tex mode unless
overridden by the mode option. If the extension is one of the extensions
in the sgml-extension option (see section 3.6.4), it will check the file in
sgml mode unless overridden by the mode option.
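
The selection rule above can be sketched in a few lines of Python. This is
only a model of the behaviour described here, not Aspell's code: pick_mode
is a hypothetical name, the extension list is the documented default from
section 3.6.4, and treating ``.TEX'' the same as ``.tex'' is an assumption.

```python
import os

# Documented default of the sgml-extension option; it is configurable.
SGML_EXTENSIONS = ("html", "htm", "php", "sgml")


def pick_mode(filename, mode_option=None, sgml_extensions=SGML_EXTENSIONS):
    """Return the filter mode for a file (hypothetical helper)."""
    if mode_option is not None:
        return mode_option  # an explicit mode option always wins
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext == "tex":
        return "tex"
    if ext in sgml_extensions:
        return "sgml"
    return "url"  # the default mode (see section 3.6)


for name in ("paper.tex", "index.HTM", "notes.txt"):
    print(name, "->", pick_mode(name))
```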

The exit command saves the file with the corrections made so far. If you
want to quit without saving, use control-C.


3.2 As an ``ispell -a'' replacement

To actually use Aspell as an Ispell replacement simply follow the
directions in section 2.6.

When given the pipe or -a command aspell goes into a pipe mode that is
compatible with ``ispell -a''. Aspell also defines its own set of
extensions to ispell pipe mode.


3.2.1 Format of the Data Stream

In this mode, Aspell prints a one-line version identification message, and
then begins reading lines of input. For each input line, a single line is
written to the standard output for each word checked for spelling on the
line. If the word was found in the main dictionary, or your personal
dictionary, then the line contains only a '*'.

If the word is not in the dictionary, but there are suggestions, then the
line contains an '&', a space, the misspelled word, a space, the number of
near misses, a space, the number of characters between the beginning of
the line and the beginning of the misspelled word, a colon, another space,
and a list of the suggestions separated by commas and spaces.

Finally, if the word does not appear in the dictionary, and there are no
suggestions, then the line contains a '#', a space, the misspelled word, a
space, and the character offset from the beginning of the line. The output
for each line of text input is terminated with an additional blank line,
indicating that Aspell has completed processing the input line.

These output lines can be summarized as follows:

OK:
    *
Suggestions:
    & original count offset: miss, miss, ...
None:
    # original offset
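
A program driving the pipe interface has to take these three line shapes
apart again. The Python sketch below shows one way to do so. The name
parse_result and the sample lines are made up for illustration; they are
not actual Aspell output.

```python
def parse_result(line):
    """Parse one pipe-mode result line into a dict (hypothetical helper)."""
    if line == "*":
        return {"status": "ok"}
    if line.startswith("& "):
        # & original count offset: miss, miss, ...
        head, _, misses = line[2:].partition(": ")
        original, count, offset = head.split(" ")
        return {
            "status": "suggestions",
            "original": original,
            "count": int(count),
            "offset": int(offset),
            "suggestions": misses.split(", "),
        }
    if line.startswith("# "):
        # # original offset
        original, offset = line[2:].split(" ")
        return {"status": "none", "original": original, "offset": int(offset)}
    raise ValueError("unrecognized result line: %r" % line)


print(parse_result("& teh 3 0: the, tea, ten"))
print(parse_result("# xyzzyq 4"))
```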

When in the -a mode, Aspell will also accept lines of single words
prefixed with any of '*', '&', '@', '+', '-', '~', '#', '!', '%', or '^'.
A line starting with '*' tells Aspell to insert the word into the user's
dictionary. A line starting with '&' tells Aspell to insert an
all-lowercase version of the word into the user's dictionary. A line
starting with '@' causes Aspell to accept this word in the future. A line
starting with '+', followed immediately by a valid mode, will cause Aspell
to parse future input according to the syntax of that formatter. A line
consisting solely of a '+' will place Aspell in TEX/LATEX mode (similar to
the -t option) and '-' returns Aspell to its default mode (but these
commands are obsolete). A line starting with '~' is ignored for Ispell
compatibility. A line prefixed with '#' will cause the personal
dictionaries to be saved. A line prefixed with '!' will turn on terse mode
(see below), and a line prefixed with '%' will return Aspell to normal
(non-terse) mode. Any input following the prefix characters '+', '-', '#',
'!', '~', or '%' is ignored. To allow spell-checking of lines
beginning with these characters, a line starting with '^' has that
character removed before it is passed to the spell-checking code. It is
recommended that programmatic interfaces prefix every data line with a
caret ('^') to protect themselves against future changes in Aspell.

To summarize these:

*word
    Add a word to the personal dictionary
&word
    Insert the all-lowercase version of the word in the personal
    dictionary
@word
    Accept the word, but leave it out of the dictionary
#
    Save the current personal dictionary
~
    Ignored for ispell compatibility.
+
    Enter TEX mode.
+mode
    Enter the mode specified by mode.
-
    Enter the default mode.
!
    Enter terse mode
%
    Exit terse mode
^
    Spell-check the rest of the line

In terse mode, Aspell will not print lines beginning with '*', which
indicate correct words. This significantly improves running speed when the
driving program is going to ignore correct words anyway.
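
The recommendation to prefix every data line, combined with terse mode,
leads to input streams like the one built by this Python sketch. The
helper name build_pipe_input is hypothetical; only the '!' and '^'
prefixes come from the description above.

```python
def build_pipe_input(lines, terse=True):
    """Turn raw text lines into pipe-mode input (hypothetical helper)."""
    out = []
    if terse:
        out.append("!")  # terse mode: no '*' lines for correct words
    for line in lines:
        # '^' is removed by the checker, so the rest of the line is always
        # treated as text even if it begins with a command character.
        out.append("^" + line)
    return "\n".join(out) + "\n"


text = ["*bold* claims need evidence", "#hashtag or not"]
print(build_pipe_input(text), end="")
```

Without the '^' prefix the first sample line would be read as a request to
add a word to the personal dictionary and the second as a save command.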

In addition to the above commands, which are designed for Ispell
compatibility, Aspell also supports its own extensions. All Aspell
extensions follow this format:

    $$command [data]

where data may or may not be required depending on the particular command.
Aspell currently supports the following commands:

cs option,value
    Change a configuration option.
cr option
    Prints the value of a configuration option.
s word1,word2
    Returns the score of the two words based roughly on how aspell would
    score them.
Sw word
    Returns the soundslike equivalent of the word.
Sl word
    Returns a list of words that have the same soundslike equivalent.
Pw word
    Returns the phoneme equivalent of the word.
pp
    Returns a list of all words in the current personal wordlist.
ps
    Returns a list of all words in the current session dictionary.
l
    Returns the current language name.
ra mis,cor
    Add the word pair to the replacement dictionary for later use.
    Returns nothing.

Anything returned is returned on its own line. All lists returned
have the following format

    num of items: item1, item2, etc
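
A reply in this list format could be split up as in the Python sketch
below. The function name is hypothetical, and how an empty list is printed
is an assumption; the sketch simply accepts a count of zero followed by
nothing.

```python
def parse_list_reply(line):
    """Split 'N: item1, item2, ...' into a list of items (sketch)."""
    count_text, _, rest = line.partition(":")
    count = int(count_text)
    items = [item.strip() for item in rest.split(",") if item.strip()]
    if len(items) != count:
        raise ValueError("expected %d items, got %d" % (count, len(items)))
    return items


print(parse_list_reply("3: color, colour, collar"))
```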

(Part of the preceding section was directly copied out of the Ispell
manual)


3.2.1.1 Notes on Storing Replacement Pairs

As of version .27 of Aspell, storing replacement pairs has a memory. This
means that if you first store the replacement pair:

    sicolagest -> psycolagest

then store the replacement pair

    psycolagest -> psychologist

The replacement pair

    sicolagest -> psychologist

will also get stored so that you don't have to worry about it.
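
The effect can be modelled with a small Python class. This is a toy model
of the behaviour described above, not Aspell's data structure, and the
class and method names are invented for the example.

```python
class ReplacementStore:
    """Toy replacement list with the 'memory' behaviour described above."""

    def __init__(self):
        self.pairs = {}  # misspelling -> correction

    def store(self, mis, cor):
        # Any earlier misspelling that pointed at `mis` now points at `cor`.
        for earlier, target in list(self.pairs.items()):
            if target == mis:
                self.pairs[earlier] = cor
        self.pairs[mis] = cor


store = ReplacementStore()
store.store("sicolagest", "psycolagest")
store.store("psycolagest", "psychologist")
print(store.pairs["sicolagest"])
```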


3.3 As a utility to manage word lists

3.3.1 Creating an Individual Word List

To create an individual main word list from a list of words use the
command

    aspell --lang=lang create master ./base < wordlist

where base is the name of the word list and wordlist is the list of
words separated by white space. The ``./'' is important because without it
Aspell will create the word list in the normal word list directory. If you
are trying to create a word list in a language other than English, check
the Aspell data-dir (usually /usr/share/aspell, use ``aspell dump config''
to find out what it is on your system) to see if a language data file
exists for your language. If not, you will need to create one. See chapter
5 for more information on using Aspell with other languages.

This will create the file base in the current directory. To use the new
word list copy the file to the normal word list directory (use ``aspell
config'' to find out what it is) and use the option --master=base.

The compiled dictionary file is machine dependent. It depends on the
endian order and the page size of the machine because the files are
mmapped in. Please do not distribute the compiled dictionaries unless you
are only distributing them for a particular platform, as you would a
binary. That is why they are normally installed in ``lib/aspell'' instead
of ``share/aspell''.

Aspell is now also able to use special ``multi'' dictionaries. See section
3.4 for more information.

A personal and replacement word list can be created in a similar fashion.

Because Aspell does not support any sort of affix compression like Ispell
does, Ispell word lists will not work as is. In order to use Ispell's word
lists, simply pipe the word list through ``ispell -e'' to expand the
munched word lists.

3.3.2 Format of the Replacement Word List

The replacement word list has each replacement pair on its own line in the
following format

    misspelled word: correction
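
Reading and writing this format takes only a few lines of Python. The
sketch assumes the list contains nothing but such pair lines (no header or
comments), which the description above does not confirm, and the function
names are hypothetical.

```python
def parse_repl_list(text):
    """Parse 'misspelling: correction' lines into a list of pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        mis, sep, cor = line.partition(":")
        if not sep:
            raise ValueError("missing ':' in %r" % line)
        pairs.append((mis.strip(), cor.strip()))
    return pairs


def format_repl_list(pairs):
    """Inverse of parse_repl_list."""
    return "".join("%s: %s\n" % pair for pair in pairs)


pairs = parse_repl_list("teh: the\nrecieve: receive\n")
print(format_repl_list(pairs), end="")
```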

3.3.3 Dumping the contents of the word list

The dump command will simply dump the contents of a word list to stdout in
a format that can be read back in with aspell create.

If no word list is specified the command will act on the default one. For
example the command

    aspell dump personal

will simply dump the contents of the current personal word list to stdout.

This option will currently not work with ``multi'' word lists.


3.4 Notes on options pertaining to which word lists to use

As with previous versions of Aspell you can specify the main dictionary to
use via the -d or --master option. However, as of Aspell .32 you can now
also:

 1. Specify more than one word list to use with add-extra-dicts or
    rem-extra-dicts.
 2. Optionally have all accents stripped from the word lists using the
    strip-accents option. This is not the same thing as the ignore-accents
    option. Enabling ignore-accents would accept both cafe and café
    (notice the accent on the e), but enabling only strip-accents would
    accept only cafe, even if café is in the original dictionary.
    Specifying strip-accents is just like using a word list without the
    accents.
 3. Specify special ``multi'' dictionaries.
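
The difference between ignore-accents and strip-accents in item 2 can be
made concrete with a toy lookup in Python. This only models the outcome
described above; it is not how Aspell stores or compares words, and the
function names are invented for the example.

```python
import unicodedata


def strip_accents(word):
    """Remove combining marks, so 'caf\u00e9' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accepts(word, dictionary, ignore_accents=False, strip=False):
    """Toy lookup modelling the two options (not Aspell's real code)."""
    if strip:
        # strip-accents: the word list itself loses its accents.
        dictionary = {strip_accents(w) for w in dictionary}
    if ignore_accents:
        # ignore-accents: accents are disregarded when comparing.
        return strip_accents(word) in {strip_accents(w) for w in dictionary}
    return word in dictionary


words = {"caf\u00e9"}
for candidate in ("cafe", "caf\u00e9"):
    print(candidate,
          accepts(candidate, words, ignore_accents=True),
          accepts(candidate, words, strip=True))
```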

A ``multi'' dictionary is a special file which is basically a list of
dictionary files to use. A multi dictionary file name must end in .multi,
and the file has roughly the same format as a configuration file where the
two valid keys are add and strip-accents. The add key is used for adding
individual word lists or other ``multi'' files. The strip-accents key is
used to control whether accents are stripped from the dictionaries. Unlike
the global strip-accents option, this option only affects word lists that
come after it. For example:

    strip-accents yes
    add english
    strip-accents no
    add must-accent

will strip accents from the english word list but not the must-accent word
list. If the global strip-accents option is specified the local
strip-accents options are ignored.
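
The scoping rule for strip-accents can be followed in this Python sketch,
which reads the text of a multi file and reports the setting that applies
to each added word list. It is an illustration only: parse_multi is a
hypothetical name, the starting value of strip-accents and the handling of
``#'' comments are assumptions, and nested multi files are not resolved.

```python
def parse_multi(text, global_strip=None):
    """Return [(dictionary, strip_accents)] for a .multi file (sketch)."""
    result = []
    strip = False  # assumed starting value; the text does not state it
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        if key == "strip-accents":
            strip = value in ("yes", "true")
        elif key == "add":
            # A global strip-accents setting overrides the local ones.
            effective = strip if global_strip is None else global_strip
            result.append((value, effective))
        else:
            raise ValueError("unknown key: %r" % key)
    return result


example = "strip-accents yes\nadd english\nstrip-accents no\nadd must-accent\n"
print(parse_multi(example))
print(parse_multi(example, global_strip=False))
```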

Aspell now provides the following multi dictionaries:



    american-medium   american-large
    british-medium    british-large
    canadian-medium   canadian-large

The word lists themselves all contain accented words; however, the
strip-accents option is enabled by default for all the individual word
lists. If you wish to use the accented words you can set the global
strip-accents option to false or create a new multi word list.

To give you an idea of what the two sizes are like here is a random sample
of 50 lower case words from the medium size:

    asked behinds bowstring brute censure chlorines communistic conception
    consorting dichotomies disenfranchised eeriness encouraging
    erectnesses fluff friendless gourd gutted hods illiteracies
    insolvencies jalopies jettier kilted lackeys mangle mattocks minimally
    monotonic nagged neuritis pacifisms pedagogical porosities public
    reposed sandbagged schoolteachers spatters stickleback sweeping trivet
    twang twelfths ukuleles ultimate watersheds wavelengths whinniest
    woolliness

and 50 words found only in the large size:

    abidance airbase batrachian billfish breadroot brightener cabbageworm
    centurial clamberers contemporaneity costar cupolaed devourers
    difficile dugong dyad excursiveness fascinatingly flection freewheels
    gringos horsemint hygrophilous hyperplanes juncaceous loo madrepore
    meiosis meretriciously metathesis micro molluscoid phlegmy
    proconsulate professedly ravishers reabsorbs redecoration sidepiece
    skydiver sorrowfulness studbook tachistoscope toilworn topee
    unnavigable vitrescent waterlilies webworm workroom

Many other dictionary sizes and varieties can be created. See the scowl/
directory in the source distribution for information on the different
varieties you can create and section 3.3 for how to create an individual
dictionary.


3.5 Notes on the options to control run-together words

Aspell has support for either unconditionally accepting run-together words
or only accepting certain words in compound formation.

Support for unconditionally accepting run-together words can either be
turned on in the language data file or as a normal option via the
run-together option. The run-together-limit option controls the maximum
number of words that can be strung together; the default is normally 255.
The run-together-min option controls the minimal length of the individual
components of the run-together word; the default is normally 3.
Both the run-together-limit and run-together-min options may be specified
either in the language data file or as normal options. The
run-together-mid option, which may only be specified in the language data
file, may be used to specify up to three optional characters that may
appear between individual words.

In order for Aspell to accept only certain words in compounds, those
words must be flagged when the compiled word list is being created. The
format for each entry is

    word:C[1][2][3]middle char

The 1, 2, and 3 control whether the word is allowed to appear at the
beginning, middle, or end of the compound, respectively. More than one
position flag may be specified. If none of them are specified it is
assumed that the word may appear anywhere. The C is optional if 1, 2, or 3
is specified. The middle char represents an optional character that may
appear after the word in the formation of the compound if the word is not
at the end of the compound. If the letter is lowercase then the character
may appear after the word; if it is uppercase then that letter must appear
after the word. Only one letter may be specified and it must also be in
the list of middle letters specified via the run-together-mid option. The
run-together-limit option may also be used to specify the maximum number
of words to string together.

For example the word list:

    beg:1
    mid:2
    end:3
    any:C
    never
    must:CM
    maybe:Cm

means that the word ``beg'' may only appear at the beginning of a
compound, the word ``mid'' in the middle, the word ``end'' at the end, and
the word ``any'' any place. The word ``never'' is never accepted in a
compound unless the run-together option is set. The word ``must'' may
appear anywhere; however, it must be followed by an ``m'', while the word
``maybe'' may be followed by an ``m''. Given the above word list the
following compounds

    begmidend
    begany
    mustmend
    maybeend
    maybemend

are all legal, but the following are not:

    begmid
    mustend
    neverany

Individual words such as ``beg'' are always accepted.
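
The flag rules can be checked against the examples with a small Python
model. It is a toy checker written from the description in this section,
not Aspell's algorithm: the function names are invented, the run-together
option is treated as off, and the run-together-limit and run-together-mid
options are not modelled.

```python
def parse_entry(entry):
    """Split 'word:C[1][2][3]x' into (word, info); info is None if unflagged."""
    word, sep, flags = entry.partition(":")
    if not sep:
        return word, None
    if flags.startswith("C"):
        flags = flags[1:]
    positions = {int(ch) for ch in flags if ch in "123"}
    middle = flags.lstrip("123")  # '' or the single middle character
    return word, {"positions": positions or {1, 2, 3}, "middle": middle}


def is_compound(text, entries):
    """True if text splits into two or more flagged words (toy checker)."""
    parsed = (parse_entry(entry) for entry in entries)
    flagged = {word: info for word, info in parsed if info is not None}

    def walk(rest, index):
        # index is the number of words already consumed
        for word, info in flagged.items():
            if not rest.startswith(word):
                continue
            tail = rest[len(word):]
            if not tail:
                # Last word: needs the end position and a preceding word.
                if index > 0 and 3 in info["positions"]:
                    return True
                continue
            if (1 if index == 0 else 2) not in info["positions"]:
                continue
            middle = info["middle"]
            tails = []
            if middle and tail.startswith(middle.lower()):
                tails.append(tail[1:])  # consume the middle character
            if not middle.isupper():
                tails.append(tail)      # middle character not required
            if any(t and walk(t, index + 1) for t in tails):
                return True
        return False

    return walk(text, 0)


WORDS = ["beg:1", "mid:2", "end:3", "any:C", "never", "must:CM", "maybe:Cm"]
for text in ("begmidend", "mustmend", "maybeend", "begmid", "mustend"):
    print(text, is_compound(text, WORDS))
```

A single word such as ``beg'' is not a compound in this model; accepting
it is a plain dictionary lookup and is left out of the sketch.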

When the run-together option is not set, Aspell will only accept words
that have been flagged in a run-together word. When the run-together
option is set, Aspell will accept words which are at least as long as the
value specified in the run-together-min option. If the word's length is
less than run-together-min then it will only accept the word if it has
been flagged. When the run-together option is not set the
run-together-min option is ignored altogether.

Currently Aspell only supports run-together words when checking if a word
is in the dictionary. When coming up with suggestions Aspell treats the
word as a normal word and does not do anything special. This means that
the suggestions will be virtually meaningless when the actual word is a
run-together. I plan on more intelligently supporting run-together words
when coming up with suggestions in a future version of Aspell.


3.6 Notes on various filters and filter mode

Aspell now has rudimentary filter support. You can either select
individual filters or choose a filter mode. To select a filter mode use the
mode option. You may choose from none, url, email, sgml, and tex. The
default mode is url. Individual filters can be added with the add-filter
option and removed with the rem-filter option. The currently available
filters are url, email, sgml, and tex, as well as a number of filters which
translate the text from one format to another.

3.6.1 None Mode

This mode is exactly what it says. It turns off all filters.

3.6.2 Url Filter/Mode

The url filter/mode skips over URLs, host names, and email addresses.
Because this filter is almost always useful and rarely does any harm, it is
enabled in all modes except none. To turn it off either select the none
mode or use the rem-filter option after the desired mode is selected.

3.6.3 Email Filter/Mode

The email filter/mode skips over quoted text. It currently does not
support skipping over headers; however, a future version should. In the
meantime I suggest you use Aspell with Newsbody, which can be found at
http://home.worldonline.dk/~byrial/newsbody/. The option email-margin
controls the number of characters that can appear before the email quote
char; the default is 10. The option add|rem-email-quote controls the
characters that are considered quote characters; the default is '>' and
'|'.
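
One plausible reading of the margin rule is sketched below in Python: a
line counts as quoted when a quote character occurs with no more than the
allowed number of characters in front of it. The exact test Aspell applies
is not spelled out here, so both the rule and the name is_quoted are
assumptions made for illustration.

```python
def is_quoted(line, quote_chars=(">", "|"), margin=10):
    """Guess whether a line is quoted e-mail text (assumed rule)."""
    # Index of the first occurrence of each quote character that is present.
    positions = [line.find(ch) for ch in quote_chars if ch in line]
    # The index equals the number of characters before the quote character.
    return bool(positions) and min(positions) <= margin


for sample in ("> quoted reply", "John> quoted with a name", "plain text",
               "this sentence has a > late in it"):
    print(repr(sample), is_quoted(sample))
```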


3.6.4 SGML Filter/Mode

The sgml filter/mode will skip over sgml commands. It currently does not
handle nested < > unless they are in quotes. It also does not handle the
null end tag (net) minimization feature of sgml such as

    <emphasis/important/

The option add|rem-sgml-check controls which sgml tags should always be
checked. The default is ``alt''.

The option add|rem-sgml-extension controls which file extensions are
recognized as sgml/html files. The default is html, htm, php, and sgml.
The extensions are not case sensitive, so extensions like .HTM will also
be recognized.

The sgml mode also enables a filter which will recognize sgml character
commands such as &amp; and convert them into the proper iso8859-1
character. Currently only the iso8859-1 character set is used; however, in
future versions it will convert them to the encoding that is specified in
the language data file. You can specifically turn on this filter by
enabling the SGML&charset/charset filter.

3.6.5 TEX Filter/Mode

The tex (all lowercase) filter/mode skips over TEX commands and parameters
and/or options to certain commands. It also skips over TEX comments by
default. The option [dont-]tex-check-comments controls whether or not
Aspell will skip over TEX comments. The option add|rem-tex-command controls
which TEX commands should have certain parameters and/or options also
skipped over. Commands that are not specified will have all their
parameters and/or options checked. The format for each item is

    command  a list of p,P,o and Os

The first item is simply the command name. The second item controls which
parameters to skip over. A 'p' skips over a parameter while a 'P' won't.
Similarly, an 'o' will skip over an optional parameter while an 'O' won't.
The first letter on the list will apply to the first parameter, the second
letter will apply to the second parameter, etc. If there are more
parameters than letters Aspell will simply check them as normal. For
example the option

    add-tex-command rule pp

will skip over the first two parameters of the ``rule'' command while the
option

    add-tex-command foo Pop

will check the first parameter of the ``foo'' command, skip over the next
optional parameter, if it is present, and will skip over the second
parameter -- even if the optional parameter is not present -- and will
check any additional parameters.
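
The way the letters are matched against the arguments of a command can be
traced with the following Python model. It is written from the description
above and is not Aspell's parser. The name plan_arguments and the
representation of arguments as (kind, text) pairs are invented for the
example, and an optional argument that no letter accounts for is simply
checked.

```python
def plan_arguments(spec, args):
    """Decide which arguments get spell checked (toy model).

    spec -- letters given to add-tex-command, e.g. "Pop"
    args -- list of ("param" | "opt", text) in source order
    Returns the texts that would be checked.
    """
    checked = []
    letters = list(spec)
    for kind, text in args:
        # An optional-parameter letter with no optional argument present
        # is passed over without consuming an argument.
        while letters and letters[0] in "oO" and kind != "opt":
            letters.pop(0)
        if letters and (letters[0] in "oO") == (kind == "opt"):
            letter = letters.pop(0)
            if letter in "PO":
                checked.append(text)  # uppercase letters mean "check"
        else:
            checked.append(text)  # no letter applies: check as normal
    return checked


with_opt = [("param", "first"), ("opt", "maybe"), ("param", "second"),
            ("param", "third")]
without_opt = [("param", "first"), ("param", "second"), ("param", "third")]
print(plan_arguments("Pop", with_opt))
print(plan_arguments("Pop", without_opt))
```

For the ``foo Pop'' example both calls leave only the first and third
parameters to be checked, whether or not the optional one is present.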

A '*' at the end of the command is simply ignored. For example the option

    add-tex-command enlargethispage p

will ignore the first parameter in both enlargethispage and
enlargethispage*.

To remove a command simply use the rem-tex-command option. For example

    rem-tex-command foo

will remove the command foo, if present, from the list of TEX commands.


3.7 Notes on the different suggestion modes

In order to understand what these suggestion modes do, a basic
understanding of how aspell works is required. See section 6 for that. The
suggestion modes are as follows.

fast
    This method looks for soundslikes within an edit distance of one. It
    is slower than Ispell by a factor of 1.5 to 2. Its speed is only
    slightly affected by the size of the word list, if at all. In this mode
    Aspell gets 88% of the words from my small test kernel of misspelled
    words. (Go to http://aspell.sourceforge.net/test for more info on the
    test kernel as well as comparisons of this version of Aspell with
    previous versions and other spell checkers.)
normal
    This method looks for soundslikes within an edit distance of two. It
    is around 10 times slower than fast mode with the English word list
    but returns better suggestions. Its speed is directly proportional to
    the size of the word list. This mode gets 93% of the words.
bad-spellers
    This method also looks for soundslikes within an edit distance of two
    but is more tailored for the bad speller, whereas fast and normal are
    more tailored to strike a good balance between typos and true
    misspellings. This method also returns a huge number of words for the
    really bad spellers who can't seem to get the spelling anything close
    to what it should be. If the misspelled word looks anything like the
    correct spelling it is bound to be found somewhere on the list of 100
    or more suggestions. This mode gets 98% of the words.
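
To give a feel for what an edit distance of one or two means, here is a
short Python sketch using the plain Levenshtein distance. It is only an
approximation of the idea: Aspell compares soundslike forms rather than
raw spellings and its own distance measure and scoring are covered in
section 6, so the function names and the use of ordinary words as keys are
assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance (insert, delete, substitute), row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


def candidates(key, keys, max_distance):
    """Keys within max_distance edits of key (stand-in for soundslikes)."""
    return [k for k in keys if edit_distance(key, k) <= max_distance]


keys = ["spell", "spelt", "spool"]
print(candidates("spel", keys, 1))  # the radius used by fast mode
print(candidates("spel", keys, 2))  # the radius used by normal mode
```

The larger radius admits more candidates, which is one reason the wider
search costs more time on a large word list.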

--------------------------------------------------------------------------

