Table of Contents

Name

adfa - create acyclic automata from sets of strings

Synopsis

adfa [ options ] [ file ]

Description

adfa constructs a minimal, deterministic, acyclic finite-state automaton from a set of strings. The automaton is created in memory, and it is not saved anywhere. Various construction methods are supported. The default construction method is the incremental algorithm for sorted data. The algorithm requires data to be sorted in lexicographical order. Depending on compile options, certain statistics can be produced on output. See file Makefile for a brief description. Strings are read from the specified file, if given. Otherwise they are read from the standard input. There should be one string (one word) per line.

Additionally, adfa can be used to construct pseudo-minimal automata. Those are bigger, but they have a feature that might prove useful. For each word in the language of the automaton, there is a proper element (here: transition) that is not shared with any other word. To make sure that the proper elements are only transitions, the words should end with a special end-of-word marker, e.g. #. The proper transitions can then be used to e.g. implement dynamic perfect hashing, or in fact any function on words.

With the compile option HASH turned on, adfa can also be used to build automata that implement perfect hashing assigning consecutive natural numbers to consecutive input words. The automata are not minimal with regard to their language, but they are minimal if the weights on transitions are included in the labels.

Options

-u
incremental algorithm for unsorted data. Strings can be in arbitrary order.
-w
semi-incremental Watson's algorithm. Strings must be sorted on descreasing length.
-r
semi-incremental Revuz's algorithm. Strings must be reversed, lexicographically sorted, and reversed again.
-h
constructing a trie, and then using Hopcroft's algorithm to minimize it. Strings may come in arbitrary order.
-p
constructing a trie, and then minimizing it using register-base postorder minimization (the same as in incremental algorithms). Strings may come in arbitrary order.
-l
constructing a trie, and then minimizing it using the minimization phase from Revuz's algorithm (sorting states on height, and then using bucket sort on each layer).
-R
Revuz's algorithm for building a pseudo-minimal automaton. Strings must be reversed, lexicographically sorted, and reversed again.
-S
incremental algorithm for building pseudo-minimal automaton from lexicographically sorted data.
-W
extension of Watson's algorithm for building pseudo-minimal automaton from data sorted on decreasing length.
-U
incremental algorithm for building pseudo-minimal automaton from unsorted data.

Exit Status

  1. OK
  2. Syntax error (bad invocation).
  3. Reading from the specified file not possible
  4. Unknown method

See Also

http://www.pg.gda.pl/~jandac/adfa.html

Bugs

Beware of LANG and LOCALE settings for sorting. For the incremental sorted data algorithm, and for Revuz's algorithm, the data must be properly sorted, not "Microsoft sorted". No duplicates are allowed.

The program simply crushes when not enough memory is available.

Send bug reports to the author: Jan Daciuk, jandac at eti.pg.gda.pl.


Table of Contents