module_dependencies.module.module module

class module_dependencies.module.module.Module[source]

Bases: object

__init__(module: str, token: str | None = None, count: int | str = '25000', verbose: bool = True, lazy: bool = True, python: bool = True, jupyter: bool = True) → None[source]

Create a Module instance that can be used to find which sections of a Python module are most frequently used.

This class exposes the following methods:

usage()
nested_usage()
repositories()
plot()
n_uses()
n_files()
n_repositories()
Parameters:
  • module (str) – The name of a Python module of which to find the frequently used objects, e.g. “nltk”.

  • token (str, optional) – Sourcegraph API token, used to avoid rate-limiting (HTTP 429) errors.

  • count (Union[int, str], optional) – The maximum number of times an import of module should be fetched. Roughly equivalent to the number of fetched files. Either an integer, a string representing an integer, or “all”, defaults to “25000”.

  • verbose (bool, optional) – If True, set the logging level to INFO, otherwise to WARNING. True implies that progress information is printed to stdout, while False keeps the class quiet. Defaults to True.

  • lazy (bool, optional) – If True, defer fetching and parsing the data until it is first required. Defaults to True.

  • python (bool, optional) – Whether to search Python files for imports of module. Defaults to True.

  • jupyter (bool, optional) – Whether to search Jupyter notebooks for imports of module. Defaults to True.

Return type:

None

property data: Dict

Cached property of a Module, containing the parsed data from the SourceGraph API. This property lazily loads the data once upon request, and then parses it using Source(…).dependencies().

Example usage:

>>> from pprint import pprint
>>> from module_dependencies import Module
>>> module = Module("nltk", count=3)
>>> pprint(module.data, depth=1)
{
    'alert': None,
    'cloning': [],
    'elapsedMilliseconds': 573,
    'limitHit': True,
    'matchCount': 3,
    'missing': [],
    'repositoriesCount': 1,
    'results': [...],
    'timedout': []
}
Returns:

The cached, parsed SourceGraph API data.

Return type:

Dict

static is_subsection_of(var_one: Tuple[str], var_two: Tuple[str]) → bool[source]

Check whether var_one is a subsection of var_two. This means that var_two can be created by inserting strings into the tuple of var_one. For example, var_two as (‘nltk’, ‘tokenize’, ‘word_tokenize’) can be created by inserting ‘tokenize’ into a var_one as (‘nltk’, ‘word_tokenize’), so this function returns True.

Parameters:
  • var_one (Tuple[str]) – Tuple of strings representing the path to a Python object, e.g. (‘nltk’, ‘word_tokenize’).

  • var_two (Tuple[str]) – Tuple of strings representing the path to a Python object, e.g. (‘nltk’, ‘tokenize’, ‘word_tokenize’).

Returns:

True if var_one is a subsection of var_two.

Return type:

bool
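The check described above amounts to a subsequence test on the path tuples. A minimal sketch of how such a check could be written (an illustration, not necessarily the library's actual implementation):

```python
from typing import Tuple


def is_subsection_of(var_one: Tuple[str, ...], var_two: Tuple[str, ...]) -> bool:
    """Return True if var_two can be created by inserting strings into var_one,
    i.e. if var_one is a subsequence of var_two."""
    iterator = iter(var_two)
    # `part in iterator` advances the iterator past each match, so every part
    # of var_one must occur in var_two in the same relative order.
    return all(part in iterator for part in var_one)
```

With this sketch, ('nltk', 'word_tokenize') is a subsection of ('nltk', 'tokenize', 'word_tokenize'), matching the example above, while the reverse direction returns False.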

usage(merge: bool = True, cumulative: bool = False) → List[Tuple[str, int]][source]

Get a list of object-occurrence tuples, sorted by most to least frequent.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="3")
>>> module.usage()
[('nltk.metrics.distance.edit_distance', 2),
 ('nltk.tokenize.sent_tokenize', 1),
 ('nltk.tokenize.treebank.TreebankWordDetokenizer', 1)]
Parameters:
  • merge (bool) – Whether to attempt to merge e.g. “nltk.word_tokenize” into “nltk.tokenize.word_tokenize”. May give incorrect results for projects with “compat” folders, as the merging tends to prefer longer paths, e.g. “tensorflow.float32” will become “tensorflow.compat.v1.dtypes.float32” as opposed to just “tensorflow.dtypes.float32”. Defaults to True.

  • cumulative (bool) – Whether to include the usage counts of e.g. “nltk.tokenize.word_tokenize” in the counts of “nltk.tokenize” and “nltk” as well. Defaults to False.

Returns:

A list of object-occurrence tuples, sorted by most to least frequent.

Return type:

List[Tuple[str, int]]

nested_usage(full_name: bool = False, merge: bool = True, cumulative: bool = True) → Dict[str, Dict | int][source]

Get a nested dictionary mapping each object to its occurrence count and, recursively, to its children.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="3")
>>> module.nested_usage()
{
    "nltk": {
        "occurrences": 4,
        "corpus": {
            "occurrences": 2,
            "stopwords": {
                "occurrences": 2,
                "words": {
                    "occurrences": 2
                }
            }
        },
        "tokenize": {
            "occurrences": 2,
            "sent_tokenize": {
                "occurrences": 1
            },
            "treebank": {
                "occurrences": 1,
                "TreebankWordDetokenizer": {
                    "occurrences": 1
                }
            }
        }
    }
}

TODO: Optimize this by relying on usage() better for cumulative

Parameters:
  • full_name (bool) – Whether each dictionary key should be the full path, e.g. “nltk.tokenize”, rather than just the right-most section. Defaults to False.

  • merge (bool) – Whether to attempt to merge e.g. “nltk.word_tokenize” into “nltk.tokenize.word_tokenize”. May give incorrect results for projects with “compat” folders, as the merging tends to prefer longer paths, e.g. “tensorflow.float32” will become “tensorflow.compat.v1.dtypes.float32” as opposed to just “tensorflow.dtypes.float32”. Defaults to True.

  • cumulative (bool) – Whether to include usage counts of e.g. “nltk.tokenize.word_tokenize” into “nltk.tokenize” and “nltk” as well. Defaults to True.

Returns:

A dictionary mapping objects to how often that object occurred in the parsed source code.

Return type:

Dict[str, Union[Dict, int]]
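To illustrate the relation between usage() and nested_usage(), here is a hypothetical sketch that builds the nested, cumulative dictionary from a flat list of object-occurrence tuples (an illustration, not the library's actual implementation):

```python
from typing import Dict, List, Tuple, Union


def build_nested_usage(
    usage: List[Tuple[str, int]], cumulative: bool = True
) -> Dict[str, Union[Dict, int]]:
    """Turn a flat [(object_path, count), ...] list into a nested dictionary
    with an "occurrences" entry at every level."""
    tree: Dict[str, Union[Dict, int]] = {}
    for path, count in usage:
        node = tree
        parts = path.split(".")
        for depth, part in enumerate(parts):
            node = node.setdefault(part, {"occurrences": 0})
            # With cumulative=True, a use of "nltk.tokenize.sent_tokenize"
            # also counts towards "nltk.tokenize" and "nltk".
            if cumulative or depth == len(parts) - 1:
                node["occurrences"] += count
    return tree
```

Feeding this sketch the usage() output from the example above yields a dictionary of the same shape as the nested_usage() example, e.g. with "nltk" holding the summed occurrences of its children when cumulative=True.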

repositories(obj: str = '') → Dict[str, Dict[str, Any]][source]

Return a mapping from the names of the fetched and parsed repositories to information about those repositories. Each entry contains “description”, “stars” and “isFork” keys, plus a list of “files” with “name”, “path”, “url”, “dependencies” and “parse_error” fields. The “parse_error” field lists the error encountered when attempting to parse the file, e.g. “SyntaxError”, which can happen when a Python 2 file was fetched.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="3")
>>> module.repositories()
{
    "github.com/codelucas/newspaper": {
        "description": "News, full-text, and article metadata extraction in Python 3. Advanced docs:",
        "stars": 11224,
        "isFork": False,
        "files": [
            {
                "name": "download_corpora.py",
                "path": "download_corpora.py",
                "url": "/github.com/codelucas/newspaper/-/blob/download_corpora.py",
                "dependencies": [
                    "nltk.download"
                ],
                "parse_error": None
            },
            {
                "name": "nlp.py",
                "path": "newspaper/nlp.py",
                "url": "/github.com/codelucas/newspaper/-/blob/newspaper/nlp.py",
                "dependencies": [
                    "nltk.data.load"
                ],
                "parse_error": None
            },
            {
                "name": "text.py",
                "path": "newspaper/text.py",
                "url": "/github.com/codelucas/newspaper/-/blob/newspaper/text.py",
                "dependencies": [
                    "nltk.stem.isri.ISRIStemmer",
                    "nltk.tokenize.wordpunct_tokenize"
                ],
                "parse_error": None
            }
        ]
    }
}
Parameters:

obj (str, optional) – If given, only repositories in which this object, e.g. “nltk.tokenize”, is used are included. Defaults to “” (include all repositories).

Returns:

A mapping of repository names to repository information.

Return type:

Dict[str, Dict[str, Any]]

plot(merge: bool = True, threshold: int = 0, limit: int = -1, max_depth: int = 4, transparant: bool = False, show: bool = True) → None[source]

Display a plotly Sunburst plot showing the frequency of use of different sections of this module.

Parameters:
  • merge (bool) – Whether to attempt to merge e.g. “nltk.word_tokenize” into “nltk.tokenize.word_tokenize”. May give incorrect results for projects with “compat” folders, as the merging tends to prefer longer paths, e.g. “tensorflow.float32” will become “tensorflow.compat.v1.dtypes.float32” as opposed to just “tensorflow.dtypes.float32”. Defaults to True.

  • threshold (int, optional) – The minimum number of occurrences an object needs to be included in the plot. Defaults to 0.

  • limit (int, optional) – The maximum number of objects to include in the plot, with -1 meaning no limit. Defaults to -1.

  • max_depth (int, optional) – The maximum number of nested levels to display in the plot. Defaults to 4.

  • transparant (bool, optional) – Whether the plot background should be transparent. Defaults to False.

  • show (bool, optional) – Whether to immediately display the resulting figure. Defaults to True.

Return type:

None
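A Sunburst plot is built from parallel lists of labels, parents and values. As an illustration of how a nested_usage()-style dictionary could be flattened into the lists that plotly.graph_objects.Sunburst(labels=..., parents=..., values=...) expects, here is a sketch (not the library's actual implementation):

```python
from typing import Dict, List, Tuple


def sunburst_inputs(
    tree: Dict, parent: str = ""
) -> Tuple[List[str], List[str], List[int]]:
    """Flatten a nested_usage()-style dictionary into label/parent/value lists.

    Note: for simplicity this uses the bare object name as the label, so names
    must be unique across the tree; using full paths would avoid collisions.
    """
    labels: List[str] = []
    parents: List[str] = []
    values: List[int] = []
    for name, node in tree.items():
        if name == "occurrences":
            continue
        labels.append(name)
        parents.append(parent)
        values.append(node["occurrences"])
        sub_labels, sub_parents, sub_values = sunburst_inputs(node, parent=name)
        labels.extend(sub_labels)
        parents.extend(sub_parents)
        values.extend(sub_values)
    return labels, parents, values
```

Since nested_usage() counts are cumulative by default, such lists would be passed to plotly with branchvalues="total", so each ring segment spans the summed usage of its children.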

n_uses(obj: str = '') → int[source]

Return the number of uses of the module.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="100")
>>> module.n_uses()
137
Parameters:

obj (str, optional) – If given, count only the uses of this object, e.g. “nltk.tokenize.word_tokenize”. Defaults to “” (count all uses of self.module).

Returns:

The number of uses, i.e. the number of times self.module was used in the fetched files.

Return type:

int

n_files() → int[source]

Return the number of files fetched.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="100")
>>> module.n_files()
100
Returns:

The number of fetched files in which self.module was imported. Generally equal or close to count, if it was provided.

Return type:

int

n_repositories() → int[source]

Return the number of repositories fetched.

Example usage:

>>> from module_dependencies import Module
>>> module = Module("nltk", count="100")
>>> module.n_repositories()
52

TODO: Exclude errored code

Returns:

The number of fetched repositories in which self.module was imported.

Return type:

int