Metadata-Version: 1.0
Name: DAWG-Python
Version: 0.7
Summary: Pure-python reader for DAWGs created by dawgdic C++ library or DAWG Python extension.
Home-page: https://github.com/kmike/DAWG-Python/
Author: Mikhail Korobov
Author-email: kmike84@gmail.com
License: UNKNOWN
Description: DAWG-Python
        ===========
        
        .. image:: https://travis-ci.org/kmike/DAWG-Python.png?branch=master
            :target: https://travis-ci.org/kmike/DAWG-Python
        .. image:: https://coveralls.io/repos/kmike/DAWG-Python/badge.png?branch=master
            :target: https://coveralls.io/r/kmike/DAWG-Python
        
        
        This pure-python package provides read-only access for files
        created by `dawgdic`_ C++ library and `DAWG`_ python package.
        
        .. _dawgdic: https://code.google.com/p/dawgdic/
        .. _DAWG: https://github.com/kmike/DAWG
        
        This package is not capable of creating DAWGs. It works with DAWGs built by
        `dawgdic`_ C++ library or `DAWG`_ Python extension module. The main purpose
        of DAWG-Python is to provide an access to DAWGs without requiring compiled
        extensions. It is also quite fast under PyPy (see benchmarks).
        
        Installation
        ============
        
        pip install DAWG-Python
        
        Usage
        =====
        
        The aim of DAWG-Python is to be API- and binary-compatible
        with `DAWG`_ when it is possible.
        
        First, you have to create a dawg using DAWG_ module::
        
            import dawg
            d = dawg.DAWG(data)
            d.save('words.dawg')
        
        And then this dawg can be loaded without requiring C extensions::
        
            import dawg_python
            d = dawg_python.DAWG().load('words.dawg')
        
        Please consult `DAWG`_ docs for detailed usage. Some features
        (like constructor parameters or ``save`` method) are intentionally
        unsupported.
        
        Benchmarks
        ==========
        
        Benchmark results (100k unicode words, integer values (lenghts of the words),
        PyPy 1.9, macbook air i5 1.8 Ghz)::
        
            dict __getitem__ (hits):        11.090M ops/sec
            DAWG __getitem__ (hits):        not supported
            BytesDAWG __getitem__ (hits):   0.493M ops/sec
            RecordDAWG __getitem__ (hits):  0.376M ops/sec
        
            dict get() (hits):              10.127M ops/sec
            DAWG get() (hits):              not supported
            BytesDAWG get() (hits):         0.481M ops/sec
            RecordDAWG get() (hits):        0.402M ops/sec
            dict get() (misses):            14.885M ops/sec
            DAWG get() (misses):            not supported
            BytesDAWG get() (misses):       1.259M ops/sec
            RecordDAWG get() (misses):      1.337M ops/sec
        
            dict __contains__ (hits):           11.100M ops/sec
            DAWG __contains__ (hits):           1.317M ops/sec
            BytesDAWG __contains__ (hits):      1.107M ops/sec
            RecordDAWG __contains__ (hits):     1.095M ops/sec
        
            dict __contains__ (misses):         10.567M ops/sec
            DAWG __contains__ (misses):         1.902M ops/sec
            BytesDAWG __contains__ (misses):    1.873M ops/sec
            RecordDAWG __contains__ (misses):   1.862M ops/sec
        
            dict items():           44.401 ops/sec
            DAWG items():           not supported
            BytesDAWG items():      3.226 ops/sec
            RecordDAWG items():     2.987 ops/sec
            dict keys():            426.250 ops/sec
            DAWG keys():            not supported
            BytesDAWG keys():       6.050 ops/sec
            RecordDAWG keys():      6.363 ops/sec
        
            DAWG.prefixes (hits):    0.756M ops/sec
            DAWG.prefixes (mixed):   1.965M ops/sec
            DAWG.prefixes (misses):  1.773M ops/sec
        
            RecordDAWG.keys(prefix="xxx"), avg_len(res)==415:       1.429K ops/sec
            RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17:      36.994K ops/sec
            RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3:    121.897K ops/sec
            RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 265.015K ops/sec
            RecordDAWG.keys(prefix="xxx"), NON_EXISTING:            2450.898K ops/sec
        
        Under CPython expect it to be about 50x slower.
        Memory consumption of DAWG-Python should be the same as of `DAWG`_.
        
        .. _marisa-trie: https://github.com/kmike/marisa-trie
        
        Current limitations
        ===================
        
        * This package is not capable of creating DAWGs;
        * all the limitations of `DAWG`_ apply.
        
        Contributions are welcome!
        
        
        Contributing
        ============
        
        Development happens at github and bitbucket:
        
        * https://github.com/kmike/DAWG-Python
        * https://bitbucket.org/kmike/DAWG-Python
        
        The main issue tracker is at github: https://github.com/kmike/DAWG-Python/issues
        
        Feel free to submit ideas, bugs, pull requests (git or hg) or
        regular patches.
        
        Running tests and benchmarks
        ----------------------------
        
        Make sure `tox`_ is installed and run
        
        ::
        
            $ tox
        
        from the source checkout. Tests should pass under python 2.6, 2.7, 3.2, 3.3
        and PyPy >= 1.9.
        
        In order to run benchmarks, type
        
        ::
        
            $ tox -c bench.ini -e pypy
        
        This runs benchmarks under PyPy (they are about 50x slower under CPython).
        
        .. _tox: http://tox.testrun.org
        
        Authors & Contributors
        ----------------------
        
        * Mikhail Korobov <kmike84@gmail.com>
        
        The algorithms are from `dawgdic`_ C++ library by Susumu Yata & contributors.
        
        License
        =======
        
        This package is licensed under MIT License.
        
        
        
        Changes
        =======
        
        0.7 (2013-10-13)
        ----------------
        
        IntDAWG and IntCompletionDAWG are implemented.
        
        0.6 (2013-03-23)
        ----------------
        
        Use less shared state internally. This should fix thread-safety bugs and
        make iterkeys/iteritems reenterant.
        
        0.5.1 (2013-03-01)
        ------------------
        
        Internal tweaks: memory usage is reduced; something is a bit faster,
        something is a bit slower.
        
        0.5 (2012-10-08)
        ----------------
        
        Storage scheme is updated to match DAWG==0.5. This enables
        the alphabetical ordering of ``BytesDAWG`` and ``RecordDAWG`` items.
        
        In order to read ``BytesDAWG`` or ``RecordDAWG`` created with
        versions of DAWG < 0.5 use ``payload_separator`` constructor argument::
        
            >>> BytesDAWG(payload_separator=b'\xff').load('old.dawg')
        
        
        0.3.1 (2012-10-01)
        ------------------
        
        Bug with empty DAWGs is fixed.
        
        0.3 (2012-09-26)
        ----------------
        
        - ``iterkeys`` and ``iteritems`` methods.
        
        0.2 (2012-09-24)
        ----------------
        
        ``prefixes`` support.
        
        0.1 (2012-09-20)
        ----------------
        
        Initial release.
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
