module_dependencies.module.module module

class module_dependencies.module.module.Module[source]

Bases: object

__init__(module: str, token: str | None = None, count: int | str = '25000', verbose: bool = True, lazy: bool = True, python: bool = True, jupyter: bool = True) → None[source]

Create a Module instance that can be used to find which sections of a Python module are most frequently used.

This class exposes the following methods:

usage()
nested_usage()
repositories()
plot()
n_uses()
n_files()
n_repositories()
Parameters:
  • module (str) – The name of a Python module of which to find the frequently used objects, e.g. “nltk”.

  • token (str, optional) – Sourcegraph API token, used to avoid rate-limiting (HTTP 429) errors.

  • count (Union[int, str], optional) – The maximum number of times an import of module should be fetched. Roughly equivalent to the number of fetched files. Either an integer, a string representing an integer, or “all”, defaults to “25000”.

  • verbose (bool, optional) – If True, set the logging level to INFO, otherwise to WARNING. True implies that progress information is printed to stdout, while False keeps the class quiet. Defaults to True.

  • lazy (bool, optional) – If True, defer fetching and parsing the data until it is first required. Defaults to True.

  • python (bool, optional) – Whether to search Python files for imports of module. Defaults to True.

  • jupyter (bool, optional) – Whether to search Jupyter notebooks for imports of module. Defaults to True.

Return type:

None

property data: Dict

Cached property of a Module, containing the parsed data from the SourceGraph API. This property lazily loads the data once upon request, and then parses it using Source(…).dependencies().

Example usage:

>>> from pprint import pprint
>>> from module_dependencies import Module
>>> module = Module("nltk", count=3)
>>> pprint(module.data, depth=1)
{
    'alert': None,
    'cloning': [],
    'elapsedMilliseconds': 573,
    'limitHit': True,
    'matchCount': 3,
    'missing': [],
    'repositoriesCount': 1,
    'results': [...],
    'timedout': []
}
Returns:

The cached, parsed SourceGraph API data.

Return type:

Dict

static is_subsection_of(var_one: Tuple[str], var_two: Tuple[str]) → bool[source]

Check whether var_one is a subsection of var_two. This means that var_two can be created by inserting strings into the tuple of var_one. For example, var_two as (‘nltk’, ‘tokenize’, ‘word_tokenize’) can be created by inserting ‘tokenize’ into a var_one as (‘nltk’, ‘word_tokenize’), so this function returns True.

Parameters:
  • var_one (Tuple[str]) – Tuple of strings representing the path to a Python object, e.g. (‘nltk’, ‘word_tokenize’).

  • var_two (Tuple[str]) – Tuple of strings representing the path to a Python object, e.g. (‘nltk’, ‘tokenize’, ‘word_tokenize’).

Returns:

True if var_one is a subsection of var_two.

Return type:

bool
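The check described above amounts to a subsequence test on the path tuples. A minimal sketch of how such a check could be written (an illustration, not necessarily the library's actual implementation):

```python
from typing import Tuple


def is_subsection_of(var_one: Tuple[str, ...], var_two: Tuple[str, ...]) -> bool:
    """Return True if var_two can be created by inserting strings into var_one,
    i.e. if var_one is a subsequence of var_two."""
    iterator = iter(var_two)
    # `part in iterator` advances the iterator past each match, so every part
    # of var_one must occur in var_two in the same relative order.
    return all(part in iterator for part in var_one)
```

With this sketch, ('nltk', 'word_tokenize') is a subsection of ('nltk', 'tokenize', 'word_tokenize'), matching the example above, while the reverse direction returns False.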

usage(merge: bool = True, cumulative: bool = False) → List[Tuple[str, int]][source]

Get a list of object-occurrence tuples, sorted by most to least frequent.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="3")
>>> module.usage()
[('nltk.metrics.distance.edit_distance', 2),
 ('nltk.tokenize.sent_tokenize', 1),
 ('nltk.tokenize.treebank.TreebankWordDetokenizer', 1)]
Parameters:
  • merge (bool) – Whether to attempt to merge e.g. “nltk.word_tokenize” into “nltk.tokenize.word_tokenize”. May give incorrect results for projects with “compat” folders, as the merging tends to prefer longer paths, e.g. “tensorflow.float32” will become “tensorflow.compat.v1.dtypes.float32” as opposed to just “tensorflow.dtypes.float32”. Defaults to True.

  • cumulative (bool) – Whether to include the usage counts of e.g. “nltk.tokenize.word_tokenize” in the counts of “nltk.tokenize” and “nltk” as well. Defaults to False.

Returns:

A list of object-occurrence tuples, sorted by most to least frequent.

Return type:

List[Tuple[str, int]]

nested_usage(full_name: bool = False, merge: bool = True, cumulative: bool = True) → Dict[str, Dict | int][source]

Get a nested dictionary mapping each object to its occurrence count and, recursively, to its children.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="3")
>>> module.nested_usage()
{
    "nltk": {
        "occurrences": 4,
        "corpus": {
            "occurrences": 2,
            "stopwords": {
                "occurrences": 2,
                "words": {
                    "occurrences": 2
                }
            }
        },
        "tokenize": {
            "occurrences": 2,
            "sent_tokenize": {
                "occurrences": 1
            },
            "treebank": {
                "occurrences": 1,
                "TreebankWordDetokenizer": {
                    "occurrences": 1
                }
            }
        }
    }
}

TODO: Optimize this by relying on usage() better for cumulative

Parameters:
  • full_name (bool) – Whether each dictionary key should be the full path, e.g. “nltk.tokenize”, rather than just the right-most section. Defaults to False.

  • merge (bool) – Whether to attempt to merge e.g. “nltk.word_tokenize” into “nltk.tokenize.word_tokenize”. May give incorrect results for projects with “compat” folders, as the merging tends to prefer longer paths, e.g. “tensorflow.float32” will become “tensorflow.compat.v1.dtypes.float32” as opposed to just “tensorflow.dtypes.float32”. Defaults to True.

  • cumulative (bool) – Whether to include usage counts of e.g. “nltk.tokenize.word_tokenize” into “nltk.tokenize” and “nltk” as well. Defaults to True.

Returns:

A dictionary mapping objects to how often that object occurred in the parsed source code.

Return type:

Dict[str, Union[Dict, int]]
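To illustrate the relation between usage() and nested_usage(), here is a hypothetical sketch that builds the nested, cumulative dictionary from a flat list of object-occurrence tuples (an illustration, not the library's actual implementation):

```python
from typing import Dict, List, Tuple, Union


def build_nested_usage(
    usage: List[Tuple[str, int]], cumulative: bool = True
) -> Dict[str, Union[Dict, int]]:
    """Turn a flat [(object_path, count), ...] list into a nested dictionary
    with an "occurrences" entry at every level."""
    tree: Dict[str, Union[Dict, int]] = {}
    for path, count in usage:
        node = tree
        parts = path.split(".")
        for depth, part in enumerate(parts):
            node = node.setdefault(part, {"occurrences": 0})
            # With cumulative=True, a use of "nltk.tokenize.sent_tokenize"
            # also counts towards "nltk.tokenize" and "nltk".
            if cumulative or depth == len(parts) - 1:
                node["occurrences"] += count
    return tree
```

Feeding this sketch the usage() output from the example above yields a dictionary of the same shape as the nested_usage() example, e.g. with "nltk" holding the summed occurrences of its children when cumulative=True.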

repositories(obj: str = '') → Dict[str, Dict[str, Any]][source]

Return a mapping from the names of the fetched and parsed repositories to information about those repositories. Each entry contains “description”, “stars” and “isFork” keys, plus a list of “files” with “name”, “path”, “url”, “dependencies” and “parse_error” fields. The “parse_error” field lists the error encountered when attempting to parse the file, e.g. “SyntaxError”, which can happen when a Python 2 file was fetched.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="3")
>>> module.repositories()
{
    "github.com/codelucas/newspaper": {
        "description": "News, full-text, and article metadata extraction in Python 3. Advanced docs:",
        "stars": 11224,
        "isFork": False,
        "files": [
            {
                "name": "download_corpora.py",
                "path": "download_corpora.py",
                "url": "/github.com/codelucas/newspaper/-/blob/download_corpora.py",
                "dependencies": [
                    "nltk.download"
                ],
                "parse_error": None
            },
            {
                "name": "nlp.py",
                "path": "newspaper/nlp.py",
                "url": "/github.com/codelucas/newspaper/-/blob/newspaper/nlp.py",
                "dependencies": [
                    "nltk.data.load"
                ],
                "parse_error": None
            },
            {
                "name": "text.py",
                "path": "newspaper/text.py",
                "url": "/github.com/codelucas/newspaper/-/blob/newspaper/text.py",
                "dependencies": [
                    "nltk.stem.isri.ISRIStemmer",
                    "nltk.tokenize.wordpunct_tokenize"
                ],
                "parse_error": None
            }
        ]
    }
}
Parameters:

obj (str, optional) – If given, only repositories in which this object, e.g. “nltk.tokenize”, is used are included. Defaults to “” (include all repositories).

Returns:

A mapping of repository names to repository information.

Return type:

Dict[str, Dict[str, Any]]

plot(merge: bool = True, threshold: int = 0, limit: int = -1, max_depth: int = 4, transparant: bool = False, show: bool = True) → None[source]

Display a plotly Sunburst plot showing the frequency of use of different sections of this module.

Parameters:
  • merge (bool) – Whether to attempt to merge e.g. “nltk.word_tokenize” into “nltk.tokenize.word_tokenize”. May give incorrect results for projects with “compat” folders, as the merging tends to prefer longer paths, e.g. “tensorflow.float32” will become “tensorflow.compat.v1.dtypes.float32” as opposed to just “tensorflow.dtypes.float32”. Defaults to True.

  • threshold (int, optional) – The minimum number of occurrences an object needs to be included in the plot. Defaults to 0.

  • limit (int, optional) – The maximum number of objects to include in the plot, with -1 meaning no limit. Defaults to -1.

  • max_depth (int, optional) – The maximum number of nested levels to display in the plot. Defaults to 4.

  • transparant (bool, optional) – Whether the plot background should be transparent. Defaults to False.

  • show (bool, optional) – Whether to immediately display the resulting figure. Defaults to True.

Return type:

None
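A Sunburst plot is built from parallel lists of labels, parents and values. As an illustration of how a nested_usage()-style dictionary could be flattened into the lists that plotly.graph_objects.Sunburst(labels=..., parents=..., values=...) expects, here is a sketch (not the library's actual implementation):

```python
from typing import Dict, List, Tuple


def sunburst_inputs(
    tree: Dict, parent: str = ""
) -> Tuple[List[str], List[str], List[int]]:
    """Flatten a nested_usage()-style dictionary into label/parent/value lists.

    Note: for simplicity this uses the bare object name as the label, so names
    must be unique across the tree; using full paths would avoid collisions.
    """
    labels: List[str] = []
    parents: List[str] = []
    values: List[int] = []
    for name, node in tree.items():
        if name == "occurrences":
            continue
        labels.append(name)
        parents.append(parent)
        values.append(node["occurrences"])
        sub_labels, sub_parents, sub_values = sunburst_inputs(node, parent=name)
        labels.extend(sub_labels)
        parents.extend(sub_parents)
        values.extend(sub_values)
    return labels, parents, values
```

Since nested_usage() counts are cumulative by default, such lists would be passed to plotly with branchvalues="total", so each ring segment spans the summed usage of its children.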

n_uses(obj: str = '') → int[source]

Return the number of uses of the module.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="100")
>>> module.n_uses()
137
Parameters:

obj (str, optional) – If given, count only the uses of this object, e.g. “nltk.tokenize.word_tokenize”. Defaults to “” (count all uses of self.module).

Returns:

The number of uses, i.e. the number of times self.module was used in the fetched files.

Return type:

int

n_files() → int[source]

Return the number of files fetched.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="100")
>>> module.n_files()
100
Returns:

The number of fetched files in which self.module was imported. Generally equal or close to count, if it was provided.

Return type:

int

n_repositories() → int[source]

Return the number of repositories fetched.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="100")
>>> module.n_repositories()
52

TODO: Exclude errored code

Returns:

The number of fetched repositories in which self.module was imported.

Return type:

int