from setuptools import setup


setup(
    name='python-oam',
    version='0.2',
    description='OAM toolbox made by the community, for the community',
    long_description="""# Python-OAM\n![master CI](https://github.com/rodrigo-fss/python_oam/actions/workflows/github-actions.yml/badge.svg)\n![coverage](https://github.com/rodrigo-fss/python_oam/blob/main/.github/badges/coverage_badge.svg)\n\n### OAM toolbox made by the community, for the community\n\nOutlier detection has been used to detect and, if appropriate, remove anomalous observations\nfrom the data. It can help identify system failures and fraud before they escalate\nwith potentially huge consequences.\n\nOne branch of the outlier detection field is concerned with\nunderstanding which aspects of an anomalous observation significantly separate it from\nthe others in a given dataset. This area of research is\ncalled Outlying Aspect Mining (OAM). Promising results and applications have\nbeen presented by the community.\n\nThe objective of this lib is to contribute to OAM research in a practical way. **Python-OAM** allows\nyou to apply Outlying Aspect Mining algorithms and analyze the results on your own\ndatasets.\n\nFeel free to not only use it but also extend it as you wish.\n\n\nInstallation\n---\n\nTo see Python-OAM in action on your own data, install it using pip:\n```\npip install python_oam\n```\n\npython_oam is tested with:\n\n|                     | Version (dev)  |\n|---------------------|----------------|\n| Python              | 3.7, 3.8, 3.9  |\n| pandas              | 1.3.2          |\n| seaborn             | 0.11.2         |\n| tqdm                | 4.62.0         |\n\nWhat's OAM?\n---\nOutlying Aspect Mining (OAM) can be interpreted as "The task of looking for a set of\ncharacteristics (or subspaces) where a given object is different from the rest of the other objects."\n\nSomething along the lines of:\n\n> "OK, I know that this object is an outlier, but why exactly?"\n\n**OAM helps you figure it out!**\n\nDefining the problem in a generic way, we consider a set of data\n*X = {x1, x2, 
..., xn}* with *n* observations in a *D*-dimensional space. The application of\nOAM seeks to understand which of the *D* dimensions cause a sample of *X* to be considered\nan outlier.\n\nThe existing OAM techniques are classified into three major groups:\n*Score and Search*, *Feature Selection* and *Hybrid Approach*.\n\nWe mainly focused on the development of a **Score and Search** based toolbox,\nso let's talk a little bit more about it.\n\nScore and Search\n---\nScore and Search is the most researched OAM technique and the one with the best results so far.\n\nThis approach requires a **scoring** function to measure how much the object differs from the other objects in a\ngiven set of dimensions (subspace).\nThe **search** algorithm defines which dimensions should be compared.\nThen, the scores are compared across all subspaces generated by the search algorithm to detect the most divergent aspects.\n\nThat may have sounded a little too complex, so let's break it down with an example.\n\nLet's say you have a dataset with different basketball players and their stats for a given season.\nWe can all agree that Stephen Curry was indeed an outlier in the 2015 season, but among all of\nhis stats, which one, or which set of stats, made him deliver so much more than the other players?\n\nWas it the number of jumps? Or his high number of field goal attempts? Maybe the average running speed?\n\nThe **scoring** function will evaluate how different he was from the other players in, let's say,\nthe dimension of "Field Goals Made". In that respect he was one of the greatest that season, so very different.\n\nThe **search** algorithm will combine different dimensions to be evaluated. For example,\nlet's come up with a score of how different Stephen Curry was in comparison with the other players\ngiven his "Number of Jumps" and "Three Points Made". 
How about "Average Running Speed" and\n"Offensive rebound"?\n\nAfter scoring all the subspaces the search function returned, we compare them all\nto find the most outlying set of dimensions.\n\n> And that's a way to explain why Stephen Curry was recognized as an outlier basketball player in the 2015 season.\n\nScoring Functions\n---\n### iPath\n\nThe iPath (isolation Path) score comes from the study published by\n[Vinh et al. (2016)](https://link.springer.com/article/10.1007/s10618-016-0453-2).\nThe authors found that, using the iForest (isolation\nForest) anomaly detection approach, it is possible to establish a dimensionally unbiased metric. The idea behind the\niPath score is that, in the most discrepant subspace, **an anomalous object is easier to isolate than the others.**\n\nThe iPath process consists of making cuts in space, isolating objects from the rest\nof the data. In this scenario, if the object is surrounded by several others, you will need\nmore cuts to separate it from the rest, while if the object is an outlier, it will take\nfewer cuts. This behavior can be observed in Figure 1, where (a) shows\nan outlier that was isolated with only three cuts, while (b), a value\nnot considered an outlier, needed seven cuts to be isolated from the rest of the data.\n\n![Figure 1](https://i.postimg.cc/3w3Kwd5Q/ipath.png)\n\n\nSearch Functions\n---\n### Simple Combination\n\nSimple Combination consists of generating all possible combinations of dimensions with sizes between\na minimum and a maximum. 
This means that, for example, for a dataset of\n*n* dimensions and minimum and maximum size values equal to *i* and *j* respectively, Simple\nCombination will create all possible combinations, without repetition, of *i* to *j* dimensions.\n\nLet's say you have a dataset with the following dimensions:\n```python\n['datetime','ticker','open','close','high','low','volume']\n```\n\nIf your maximum value is one, the Simple Combination algorithm will return:\n```python\n[['datetime'],['ticker'],['open'],['close'],['high'],['low'],['volume']]\n```\nIf your maximum value is two, the Simple Combination algorithm will return:\n```python\n[['datetime', 'ticker'], ['datetime', 'open'], ['datetime', 'close'], ['datetime', 'high'], ...]\n```\nThis technique allows the user to analyze a large number of dimensions\nwhile controlling the combinatorial explosion caused by an all-to-all combination with no size limit.\n\nArchitecture\n---\n\n- Preprocess: contains a normalization function that allows the user to assign\nweights to chosen dimensions;\n- Score: contains the iPath class and will contain other score classes in the future;\n- Search: contains the SimpleCombination class and will contain other search classes\nin the future;\n- Visualization: contains some functions to assist in visualizing the data before and after the\napplication of OAM techniques.\n\nAlthough the implementation separates them into different modules, Score and Search\nfunction as a set. 
With iPath as the chosen scoring algorithm, creating\nan instance requires only two parameters: the size of the generated subsamples\nand the number of trees to be generated.\n\n```python\nipath = IsolationPath(\n    subsample_size=256,\n    number_of_paths=50\n)\n```\n\nThen SimpleCombination, implemented as a search method, receives the instance of the chosen scoring method,\nthe parameters that define the size of the generated subspaces, and the dimensions to be used.\n\n```python\nsearch = SimpleCombination(\n    ipath,\n    min_items_per_subspace=2,\n    max_items_per_subspace=4,\n    dimensions=[\n        "variation_mean", "variation_std", "up_count", "down_count",\n    ]\n)\n```\n\nThe decision to decouple Score from Search, and only connect them when the classes\nare instantiated, makes it possible to combine them in different ways.\nIf a new scoring algorithm is implemented, it will only need to contain one\nmethod called *score*, returning a number, to integrate with *SimpleCombination*.\nLikewise, if a new *search* method is implemented, it will only need to\nevaluate the dimensions through the *score* method of the class received as a parameter to\nintegrate with the rest.\n\nYou can check a few examples of the lib in use in the ```oam/analysis``` folder\nand in ```test/sample_search.py``` as well. Also, the tests should give you\nsome guidance on how to use specific functions - the coverage is pretty good\nat the moment.\n\nTests\n---\nWe recognize our testing strategy is quite simple; many times we just look for\nexceptions in the happy path - we'll be glad to accept some help improving this area.\n\nYou can run them on your own through\n```make test```.\n""",
    long_description_content_type='text/markdown',
    author='Rodrigo Faria, Tiago Colli',
    author_email='rodrigo.f.ss@uol.com.br',
    keywords=['oam', 'outlier', 'aspect', 'mining',
              'data', 'explicability', 'outlying'],
    license='MIT',
    install_requires=[
        'pandas==1.3.2',
        'seaborn==0.11.2',
        'tqdm==4.62.0'
    ],
    packages=['oam'],
    zip_safe=False
)
