Table of Contents
adfa - create acyclic automata from sets of strings
adfa [ options
] [ file ]
adfa constructs a minimal, deterministic, acyclic
finite-state automaton from a set of strings. The automaton is created in
memory, and it is not saved anywhere. Various construction methods are supported.
The default construction method is the incremental algorithm for sorted
data. The algorithm requires data to be sorted in lexicographical order.
Depending on compile options, certain statistics can be produced on output.
See file Makefile for a brief description. Strings are read from the specified
file, if given. Otherwise they are read from the standard input. There should
be one string (one word) per line.
Additionally, adfa can be used to construct
pseudo-minimal automata. Those are bigger, but they have a feature that might
prove useful. For each word in the language of the automaton, there is a
proper element (here: transition) that is not shared with any other word.
To make sure that the proper elements are only transitions, the words should
end with a special end-of-word marker, e.g. #. The proper transitions can then
be used to e.g. implement dynamic perfect hashing, or in fact any function
on words.
With the compile option HASH turned on, adfa can also be used
to build automata that implement perfect hashing assigning consecutive
natural numbers to consecutive input words. The automata are not minimal
with regard to their language, but they are minimal if the weights on transitions
are included in the labels.
- -u
- incremental algorithm for unsorted
data. Strings can be in arbitrary order.
- -w
- semi-incremental Watson's algorithm.
Strings must be sorted on descreasing length.
- -r
- semi-incremental Revuz's algorithm.
Strings must be reversed, lexicographically sorted, and reversed again.
- -h
- constructing a trie, and then using Hopcroft's algorithm to minimize it.
Strings may come in arbitrary order.
- -p
- constructing a trie, and then minimizing
it using register-base postorder minimization (the same as in incremental
algorithms). Strings may come in arbitrary order.
- -l
- constructing a trie,
and then minimizing it using the minimization phase from Revuz's algorithm
(sorting states on height, and then using bucket sort on each layer).
- -R
- Revuz's algorithm for building a pseudo-minimal automaton. Strings must be
reversed, lexicographically sorted, and reversed again.
- -S
- incremental algorithm
for building pseudo-minimal automaton from lexicographically sorted data.
- -W
- extension of Watson's algorithm for building pseudo-minimal automaton from
data sorted on decreasing length.
- -U
- incremental algorithm for building pseudo-minimal
automaton from unsorted data.
- OK
- Syntax error (bad invocation).
- Reading from the specified file not possible
- Unknown method
http://www.pg.gda.pl/~jandac/adfa.html
Beware of LANG and LOCALE settings for sorting. For the incremental
sorted data algorithm, and for Revuz's algorithm, the data must be properly
sorted, not "Microsoft sorted". No duplicates are allowed.
The program simply
crushes when not enough memory is available.
Send bug reports to the author:
Jan Daciuk, jandac at eti.pg.gda.pl.
Table of Contents