Metadata-Version: 1.1
Name: corenlp-python
Version: 3.2.0-3
Summary: A Stanford Core NLP wrapper
Home-page: https://bitbucket.org/torotoki/corenlp-python
Author: Hiroyoshi Komatsu
Author-email: hiroyoshi.komat@gmail.com
License: UNKNOWN
Description: # A Python wrapper for the Java Stanford Core NLP tools
        ---------------------------
        
        This is a fork of Dustin Smith's [stanford-corenlp-python](https://github.com/dasmith/stanford-corenlp-python), a Python interface to [Stanford CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml). It can either use as python package, or run as a JSON-RPC server.
        
        ## Edited
           * Update to Stanford CoreNLP v3.2.0
           * Fix many bugs & improve performance
           * Using jsonrpclib for stability and performance
           * Can edit the constants as argument such as Stanford Core NLP directory
           * Adjust parameters not to timeout in high load
           * Fix a problem with long text input by Johannes Castner [stanford-corenlp-python](https://github.com/jac2130/stanford-corenlp-python)
           * Packaging
        
        ## Requirements
           * [pexpect](http://www.noah.org/wiki/pexpect)
           * [unidecode](http://pypi.python.org/pypi/Unidecode)
           * [jsonrpclib](https://github.com/joshmarshall/jsonrpclib) (optionally)
        
        ## Download and Usage
        
        To use this program you must [download](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpack the zip file containing Stanford's CoreNLP package.  By default, `corenlp.py` looks for the Stanford Core NLP folder as a subdirectory of where the script is being run.
        
        
        In other words:
        
            sudo pip install pexpect unidecode jsonrpclib   # jsonrpclib is optional
            git clone https://bitbucket.org/torotoki/corenlp-python.git
        	  cd corenlp-python
            wget http://nlp.stanford.edu/software/stanford-corenlp-full-2013-06-20.zip
            unzip stanford-corenlp-full-2013-06-20.zip
        
        Then, to launch a server:
        
            python corenlp/corenlp.py
        
        Optionally, you can specify a host or port:
        
            python corenlp/corenlp.py -H 0.0.0.0 -p 3456
        
        That will run a public JSON-RPC server on port 3456.
        And you can specify Stanford CoreNLP directory:
        
            python corenlp/corenlp.py -S stanford-corenlp-full-2013-06-20/
        
        
        Assuming you are running on port 8080 and CoreNLP directory is `stanford-corenlp-full-2013-06-20/` in current directory, the code in `client.py` shows an example parse:
        
            import jsonrpclib
            from simplejson import loads
            server = jsonrpclib.Server("http://localhost:8080")
        
            result = loads(server.parse("Hello world.  It is so beautiful"))
            print "Result", result
        
        That returns a dictionary containing the keys `sentences` and (when applicable) `corefs`. The key `sentences` contains a list of dictionaries for each sentence, which contain `parsetree`, `text`, `tuples` containing the dependencies, and `words`, containing information about parts of speech, NER, etc:
        
        	{u'sentences': [{u'parsetree': u'(ROOT (S (VP (NP (INTJ (UH Hello)) (NP (NN world)))) (. !)))',
        	                 u'text': u'Hello world!',
        	                 u'tuples': [[u'dep', u'world', u'Hello'],
        	                             [u'root', u'ROOT', u'world']],
        	                 u'words': [[u'Hello',
        	                             {u'CharacterOffsetBegin': u'0',
        	                              u'CharacterOffsetEnd': u'5',
        	                              u'Lemma': u'hello',
        	                              u'NamedEntityTag': u'O',
        	                              u'PartOfSpeech': u'UH'}],
        	                            [u'world',
        	                             {u'CharacterOffsetBegin': u'6',
        	                              u'CharacterOffsetEnd': u'11',
        	                              u'Lemma': u'world',
        	                              u'NamedEntityTag': u'O',
        	                              u'PartOfSpeech': u'NN'}],
        	                            [u'!',
        	                             {u'CharacterOffsetBegin': u'11',
        	                              u'CharacterOffsetEnd': u'12',
        	                              u'Lemma': u'!',
        	                              u'NamedEntityTag': u'O',
        	                              u'PartOfSpeech': u'.'}]]},
        	                {u'parsetree': u'(ROOT (S (NP (PRP It)) (VP (VBZ is) (ADJP (RB so) (JJ beautiful))) (. .)))',
        	                 u'text': u'It is so beautiful.',
        	                 u'tuples': [[u'nsubj', u'beautiful', u'It'],
        	                             [u'cop', u'beautiful', u'is'],
        	                             [u'advmod', u'beautiful', u'so'],
        	                             [u'root', u'ROOT', u'beautiful']],
        	                 u'words': [[u'It',
        	                             {u'CharacterOffsetBegin': u'14',
        	                              u'CharacterOffsetEnd': u'16',
        	                              u'Lemma': u'it',
        	                              u'NamedEntityTag': u'O',
        	                              u'PartOfSpeech': u'PRP'}],
        	                            [u'is',
        	                             {u'CharacterOffsetBegin': u'17',
        	                              u'CharacterOffsetEnd': u'19',
        	                              u'Lemma': u'be',
        	                              u'NamedEntityTag': u'O',
        	                              u'PartOfSpeech': u'VBZ'}],
        	                            [u'so',
        	                             {u'CharacterOffsetBegin': u'20',
        	                              u'CharacterOffsetEnd': u'22',
        	                              u'Lemma': u'so',
        	                              u'NamedEntityTag': u'O',
        	                              u'PartOfSpeech': u'RB'}],
        	                            [u'beautiful',
        	                             {u'CharacterOffsetBegin': u'23',
        	                              u'CharacterOffsetEnd': u'32',
        	                              u'Lemma': u'beautiful',
        	                              u'NamedEntityTag': u'O',
        	                              u'PartOfSpeech': u'JJ'}],
        	                            [u'.',
        	                             {u'CharacterOffsetBegin': u'32',
        	                              u'CharacterOffsetEnd': u'33',
        	                              u'Lemma': u'.',
        	                              u'NamedEntityTag': u'O',
        	                              u'PartOfSpeech': u'.'}]]}],
        	u'coref': [[[[u'It', 1, 0, 0, 1], [u'Hello world', 0, 1, 0, 2]]]]}
        
        Not to use JSON-RPC, load the module instead:
        
            from corenlp import StanfordCoreNLP
            corenlp_dir = "stanford-corenlp-full-2013-06-20/"
            corenlp = StanfordCoreNLP(corenlp_dir)  # wait a few minutes...
            corenlp.raw_parse("Parse it")
        
        If you need to parse long texts (more than 30-50 sentences), you must use a `batch_parse` function. It reads text files from input directory and returns a generator object of dictionaries parsed each file results:
        
            from corenlp import batch_parse
            corenlp_dir = "stanford-corenlp-full-2013-06-20/"
            raw_text_directory = "sample_raw_text/"
            parsed = batch_parse(raw_text_directory, corenlp_dir)  # It returns a generator object
            print parsed  #=> [{'coref': ..., 'sentences': ..., 'file_name': 'new_sample.txt'}]
        
        The function uses XML output feature of Stanford CoreNLP, and you can take all information by `raw_output` option. If true, CoreNLP's XML is returned as a dictionary without converting the format.
        
            parsed = batch_parse(raw_text_directory, corenlp_dir, raw_output=True)
        
        (note: The function requires xmltodict now, you should install it by `sudo pip install xmltodict`)
        
        ## Developer
           * Hiroyoshi Komatsu [hiroyoshi.komat@gmail.com]
           * Johannes Castner [jac2130@columbia.edu]
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Classifier: Programming Language :: Python
