data — Data Exchange#

Module providing several layers of data access to the wiki.

class data.WaitingMixin[source]#

Bases: object

A mixin to implement wait cycles.

Added in version 8.4.

Variables:
  • max_retries (int) – Maximum number of times to retry an API request before quitting. Defaults to config.max_retries if attribute is missing.

  • retry_wait (int) – Minimum time to wait before resubmitting a failed API request. Defaults to config.retry_wait if attribute is missing.

  • current_retries (int) – Counter of retries made for the current request. Starts at 1 if the attribute is missing.

wait(delay=None)[source]#

Determine how long to wait after a failed request.

Parameters:

delay (int | None) – Minimum time in seconds to wait. Overrides the retry_wait variable if given. The delay doubles on each retry until retry_max seconds is reached.

Return type:

None
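
The doubling schedule described for delay can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: backoff_delays is a hypothetical helper, and its arguments stand in for config.retry_wait, config.retry_max and config.max_retries.

```python
def backoff_delays(retry_wait, retry_max, max_retries):
    """Return the wait time before each retry (hypothetical helper).

    The delay starts at retry_wait and doubles after every retry,
    but never exceeds retry_max.
    """
    delays = []
    delay = retry_wait
    for _ in range(max_retries):
        delays.append(min(delay, retry_max))
        delay *= 2
    return delays


# Six retries, starting at 5 seconds and capped at 120 seconds.
print(backoff_delays(5, 120, 6))  # [5, 10, 20, 40, 80, 120]
```

Capping with min() keeps the last waits at retry_max instead of letting them grow without bound.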

data.api — MediaWiki API Requests#

Interface to MediaWiki’s api.php.

class data.api.APIGenerator(action, continue_name='continue', limit_name='limit', data_name='data', **kwargs)[source]#

Bases: APIGeneratorBase, GeneratorWrapper

Generator that handles API responses containing lists.

The generator will iterate each item in the query response and use the continue request parameter to retrieve the next portion of items automatically. If the limit attribute is set, the iterator will stop after iterating that many values.

Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper

Initialize an APIGenerator object.

kwargs are used to create a Request object; see that object’s documentation for values.

Parameters:
  • action (str) – API action name.

  • continue_name (str) – Name of the continue API parameter.

  • limit_name (str) – Name of the limit API parameter.

  • data_name (str) – Name of the data in API response.
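
The continuation behaviour described above can be sketched without any network access. This is an illustration only: iterate_with_continue and fake_fetch are hypothetical names, the response layout (a 'data' list plus an optional 'continue' dict) is an assumption that mirrors the default data_name and continue_name, and the real class builds Request objects instead of calling a function.

```python
def iterate_with_continue(fetch, limit=None):
    """Yield items from successive responses, following 'continue'."""
    params = {}
    count = 0
    while True:
        response = fetch(params)
        for item in response['data']:
            yield item
            count += 1
            if limit is not None and count >= limit:
                return  # stop once the requested number of items is reached
        if 'continue' not in response:
            return  # the server has no further portions
        params = dict(response['continue'])


def fake_fetch(params):
    """Stand-in for an API call: serves the numbers 0..7 in chunks of 3."""
    start = int(params.get('offset', 0))
    response = {'data': list(range(start, min(start + 3, 8)))}
    if start + 3 < 8:
        response['continue'] = {'offset': start + 3}
    return response


print(list(iterate_with_continue(fake_fetch)))           # [0, 1, 2, 3, 4, 5, 6, 7]
print(list(iterate_with_continue(fake_fetch, limit=5)))  # [0, 1, 2, 3, 4]
```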

property generator#

Submit request and iterate the response.

Continues response as needed until limit (if defined) is reached.

Changed in version 7.6: changed from iterator method to generator property

set_maximum_items(value)[source]#

Set the maximum number of items to be retrieved from the wiki.

If not called, most queries will continue as long as there is more data to be retrieved from the API.

Parameters:

value (int | str | None) – Maximum total number of items to retrieve. A value of None is ignored.

Return type:

None

set_query_increment(value)[source]#

Set the maximum number of items to be retrieved per API query.

If not called, the default is config.step.

Parameters:

value (int) – Maximum number of items to retrieve per API request.

Return type:

None

class data.api.APIGeneratorBase[source]#

Bases: ABC

A wrapper class to handle the usage of the parameters parameter.

Changed in version 7.6: renamed from _RequestWrapper

abstract set_maximum_items(value)[source]#

Set the maximum number of items to be retrieved from the wiki.

Added in version 7.1.

Changed in version 7.6: became an abstract method

Parameters:

value (int | str | None)

Return type:

None

class data.api.CachedRequest(expiry, *args, **kwargs)[source]#

Bases: Request

Cached request.

Changed in version 9.0: timestamp with timezone is used to determine expiry.

Initialize a CachedRequest object.

Parameters:

expiry – either a number of days or a datetime.timedelta object
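
Since expiry accepts either a number of days or a datetime.timedelta, an expiry check could look like the sketch below. The helper names normalize_expiry and is_expired are hypothetical and do not reflect the class internals; the sketch only assumes that timezone-aware timestamps are compared, as the 9.0 change note states.

```python
from datetime import datetime, timedelta, timezone


def normalize_expiry(expiry):
    """Return expiry as a timedelta; plain numbers are read as days."""
    if isinstance(expiry, timedelta):
        return expiry
    return timedelta(days=expiry)


def is_expired(created, expiry, now=None):
    """Check whether an entry created at `created` has expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return created + normalize_expiry(expiry) < now


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 3, tzinfo=timezone.utc)
print(is_expired(created, 1, now))                  # True
print(is_expired(created, timedelta(days=7), now))  # False
```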

_cachefile_path()[source]#

Create the cachefile path.

Changed in version 8.0: return a pathlib.Path object.

Return type:

Path

static _make_dir(dir_name)[source]#

Create directory if it does not exist already.

Changed in version 7.0: Only FileExistsError is ignored; other OS exceptions can still be raised

Changed in version 8.0: use dir_name as str or pathlib.Path object but always return a Path object.

Parameters:

dir_name (str | Path) – directory path

Returns:

directory path as pathlib.Path object for test purpose

Return type:

Path

classmethod create_simple(req_site, **kwargs)[source]#

Unsupported as it requires at least two parameters.

submit()[source]#

Submit cached request.

class data.api.ListGenerator(listaction, **kwargs)[source]#

Bases: QueryGenerator

Generator for queries of type action=query&list=foo.

See the API documentation for types of lists that can be queried. Lists include both site-wide information (such as ‘allpages’) and page-specific information (such as ‘backlinks’).

This generator yields a dict object for each member of the list returned by the API, with the format of the dict depending on the particular list command used. For those lists that contain page information, it may be easier to use the PageGenerator class instead, as that will convert the returned information into a Page object.

Required and optional parameters are as for Request, except that action=query is assumed and listaction is required.

Parameters:

listaction (str) – the “list=” type from api.php

class data.api.LogEntryListGenerator(logtype=None, **kwargs)[source]#

Bases: ListGenerator

Generator for queries of list ‘logevents’.

Yields LogEntry objects instead of dicts.

result(pagedata)[source]#

Instantiate LogEntry from data from api.

class data.api.OptionSet(site=None, module=None, param=None, data=None, dict='[deprecated name of data]')[source]#

Bases: MutableMapping

A class to store a set of options which can be either enabled or not.

If it is instantiated with the associated site, module and parameter it will only allow valid names as options. If instantiated ‘lazy loaded’ it won’t check whether the names are valid until the site has been set (which isn’t required, but recommended). The site can only be set once if it’s not None; after that, setting any site (even None) will fail.

If a site is given, the module and param must be given too.

Changed in version 9.0: dict parameter was renamed to data.

Parameters:
  • site (pywikibot.site.APISite | None) – The associated site

  • module (str | None) – The module name which is used by paraminfo. (Ignored when site is None)

  • param (str | None) – The parameter name inside the module. That parameter must have a ‘type’ entry. (Ignored when site is None)

  • data (dict | None) – The initializing data dict which is used for from_dict()

api_iter()[source]#

Iterate over each option as they appear in the URL.

from_dict(dictionary)[source]#

Load options from the dict.

The options are not cleared beforehand. If changes have been made previously but only the dict values should be applied, the option set needs to be cleared first.

Parameters:

dictionary (dict (keys are strings, values are bool/None)) – a dictionary containing for each entry either the value False, True or None. The names must be valid depending on whether they enable or disable the option. All names with the value None can be in either of the lists.
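
The three-state mapping accepted by from_dict() can be illustrated with a small standalone sketch. The functions split_options and api_values are hypothetical; the '!' prefix for disabled options is an assumption based on the MediaWiki convention for negated filter values (for example rcshow=!bot), not a statement about how api_iter() is implemented.

```python
def split_options(dictionary):
    """Split a {name: True/False/None} dict into enabled and disabled sets."""
    enabled, disabled = set(), set()
    for name, value in dictionary.items():
        if value is True:
            enabled.add(name)
        elif value is False:
            disabled.add(name)
        elif value is not None:
            raise ValueError(f'Invalid value {value!r} for option {name!r}')
    return enabled, disabled


def api_values(enabled, disabled):
    """Render options as URL values; disabled ones get a '!' prefix."""
    return sorted(enabled) + ['!' + name for name in sorted(disabled)]


enabled, disabled = split_options({'minor': True, 'bot': False, 'anon': None})
print(api_values(enabled, disabled))  # ['minor', '!bot']
```

Options set to None land in neither set, so they are simply left out of the request.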

class data.api.PageGenerator(generator, g_content=False, **kwargs)[source]#

Bases: QueryGenerator

Generator for response to a request of type action=query&generator=foo.

This class can be used for any of the query types that are listed in the API documentation as being able to be used as a generator. Instances of this class iterate Page objects.

Required and optional parameters are as for Request, except that action=query is assumed and generator is required.

Changed in version 9.1: retrieve the same imageinfo properties as in APISite.loadimageinfo() with default parameters.

Parameters:
  • generator (str) – the “generator=” type from api.php

  • g_content (bool) – if True, retrieve the contents of the current version of each Page (default False)

result(pagedata)[source]#

Convert page dict entry from api to Page object.

This can be overridden in subclasses to return a different type of object.

Changed in version 9.5: No longer raise exceptions.UnsupportedPageError but return a generic pywikibot.Page object. The exception is raised when getting the content for example.

Changed in version 9.6: Upcast to page.FilePage if pagedata has imageinfo contents even if the file extension is invalid.

Parameters:

pagedata (dict[str, Any])

Return type:

Page

class data.api.ParamInfo(site, preloaded_modules=None)[source]#

Bases: Sized, Container

API parameter information data object.

Provides cache aware fetching of parameter information.

Deprecated since version 8.4: the modules_only_mode parameter

Parameters:

preloaded_modules (set[str] | None) – API modules to preload

property action_modules#

Set of all action modules.

attributes(attribute, modules=None)[source]#

Mapping of modules with an attribute to the attribute value.

It will include all modules which have that attribute set, even if that attribute is empty or set to False.

Parameters:
  • attribute (str) – attribute name

  • modules (set | None) – modules to include. If None (default), it’ll load all modules including all submodules using the paths.

Returns:

dict using modules as keys

Return type:

dict[str, Any]

fetch(modules)[source]#

Fetch paraminfo for multiple modules.

No exception is raised when paraminfo for a module does not exist. Use paraminfo[module] to cause an exception if a module does not exist.

Parameters:

modules (Iterable | str) – API modules to load

Return type:

None

init_modules = frozenset({'main', 'paraminfo'})#

property module_paths#

Set of all modules using their paths.

normalize_modules(modules)[source]#

Convert the modules into module paths.

Add query+ to any query module name not also in action modules.

Returns:

The modules converted into module paths

Return type:

set
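
The query+ rule can be shown with a standalone sketch. It is not the method itself: to_module_paths is a hypothetical function, the two module sets are passed in explicitly instead of being read from the site, and splitting a string on '|' is an assumption.

```python
def to_module_paths(modules, action_modules, query_modules):
    """Prefix query module names with 'query+' unless they are action modules."""
    if isinstance(modules, str):
        modules = modules.split('|')  # assumption: '|' separates module names
    return {
        'query+' + name
        if name in query_modules and name not in action_modules
        else name
        for name in modules
    }


action_modules = {'query', 'edit', 'paraminfo'}
query_modules = {'info', 'revisions', 'allpages'}
print(sorted(to_module_paths('edit|info', action_modules, query_modules)))
# ['edit', 'query+info']
```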

static normalize_paraminfo(data)[source]#

Convert API JSON into a new data structure with path as key.

For duplicate paths, the value will be False.

Changed in version 8.4: normalize_paraminfo became a staticmethod.

Parameters:

data (dict[str, Any])

Return type:

dict[str, Any]

param_modules = ('list', 'meta', 'prop')#

parameter(module, param_name)[source]#

Get details about one module’s parameter.

Returns None if the parameter does not exist.

Parameters:
  • module (str) – API module name

  • param_name (str) – parameter name in the module

Returns:

metadata that describes how the parameter may be used

Return type:

dict[str, Any] | None

paraminfo_keys#

classproperty Return module types.

Deprecated since version 8.4.

property prefix_map: dict[str, str]#

Mapping of module to its prefix for all modules with a prefix.

This loads paraminfo for all modules.

property preloaded_modules: frozenset[str] | set[str]#

Return set of preloaded modules.

Deprecated since version 8.4.

property query_modules#

Set of all query module names without query+ path prefix.

root_modules = frozenset({'main'})#

submodules(name, path=False)[source]#

Set of all submodules.

Parameters:
  • name (str) – The name of the parent module.

  • path (bool) – Whether the path and not the name is returned.

Returns:

The names or paths of the submodules.

Return type:

set[str]

class data.api.PropertyGenerator(prop, **kwargs)[source]#

Bases: QueryGenerator

Generator for queries of type action=query&prop=foo.

See the API documentation for types of page properties that can be queried.

This generator yields one or more dict object(s) corresponding to each “page” item(s) from the API response; the calling module has to decide what to do with the contents of the dict. There will be one dict for each page queried via a titles= or ids= parameter (which must be supplied when instantiating this class).

Required and optional parameters are as for Request, except that action=query is assumed and prop is required.

Parameters:

prop (str) – the “prop=” type from api.php

property generator#

Yield results.

Changed in version 7.6: changed from iterator method to generator property

property props#

The requested property names.

class data.api.QueryGenerator(**kwargs)[source]#

Bases: APIGeneratorBase, GeneratorWrapper

Base class for generators that handle responses to API action=query.

By default, the generator will iterate each item in the query response, and use the (query-)continue element, if present, to continue iterating as long as the wiki returns additional values. However, if the generator’s limit attribute is set to a positive int, the generator will stop after iterating that many values. If limit is negative, the limit parameter will not be passed to the API at all.

Most common query types are more efficiently handled by subclasses, but this class can be used directly for custom queries and miscellaneous types (such as “meta=…”) that don’t return the usual list of pages or links. See the API documentation for specific query options.

Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper

Initialize a QueryGenerator object.

kwargs are used to create a Request object; see that object’s documentation for values. ‘action’=’query’ is assumed.

continue_update()[source]#

Update query with continue parameters.

Added in version 3.0.

Changed in version 4.0: explicitly return a bool value to be used in generator()

Changed in version 6.0: always return False

Changed in version 8.4: return None instead of False.

Return type:

None

property continuekey: list[str]#

Deprecated.

Return deprecated continuekey which is self.modules.

property generator#

Submit request and iterate the response based on self.resultkey.

Continues response as needed until limit (if any) is reached.

Changed in version 7.6: changed from iterator method to generator property

result(data)[source]#

Process result data as needed for particular subclass.

set_maximum_items(value)[source]#

Set the maximum number of items to be retrieved from the wiki.

If not called, most queries will continue as long as there is more data to be retrieved from the API.

If set to -1 (or any negative value), the “limit” parameter will be omitted from the request. For some request types (such as prop=revisions), this is necessary to signal that only current revision is to be returned.

Parameters:

value (int | str | None) – Maximum total number of items to retrieve. A value of None is ignored.

Return type:

None

set_namespace(namespaces)[source]#

Set a namespace filter on this query.

Parameters:

namespaces (iterable of str or Namespace key, or a single instance of those types. May be a '|' separated list of namespace identifiers. An empty iterator clears any namespace restriction.) – namespace identifiers to limit query results

Raises:

KeyError – a namespace identifier was not resolved

set_query_increment(value)[source]#

Set the maximum number of items to be retrieved per API query.

If not called, the default is to ask for “max” items and let the API decide how many to send.

Return type:

None

support_namespace()[source]#

Check if namespace is a supported parameter on this query.

Note

This function will be removed when set_namespace() throws TypeError() instead of just giving a warning. See T196619.

Returns:

True if yes, False otherwise

Return type:

bool

class data.api.Request(site=None, mime=None, throttle=True, max_retries=None, retry_wait=None, use_get=None, parameters=<object object>, **kwargs)[source]#

Bases: MutableMapping, WaitingMixin

A request to a Site’s api.php interface.

Attributes of this object (except for the special parameters listed below) get passed as commands to api.php, and can be read or set using the dict interface. All attributes must be strings. Use an empty string for parameters that don’t require a value. For example, Request(action="query", titles="Foo bar", prop="info", redirects="") corresponds to the API request api.php?action=query&titles=Foo%20bar&prop=info&redirects

This is the lowest-level interface to the API, and can be used for any request that a particular site’s API supports. See the API documentation (https://www.mediawiki.org/wiki/API) and site-specific settings for details on what parameters are accepted for each request type.

Uploading files is a special case: to upload, the parameter mime must contain a dict, and the parameter file must be set equal to a valid filename on the local computer, not to the content of the file.

Returns a dict containing the JSON data returned by the wiki. Normally, one of the dict keys will be equal to the value of the ‘action’ parameter. Errors are caught and raise an APIError exception.

Example:

>>> r = Request(parameters={'action': 'query', 'meta': 'userinfo'})
>>> # This is equivalent to
>>> # https://{path}/api.php?action=query&meta=userinfo&format=json
>>> # change a parameter
>>> r['meta'] = "userinfo|siteinfo"
>>> # add a new parameter
>>> r['siprop'] = "namespaces"
>>> # note that "uiprop" param gets added automatically
>>> r.action
'query'
>>> sorted(r._params)
['action', 'meta', 'siprop']
>>> r._params['action']
['query']
>>> r._params['meta']
['userinfo', 'siteinfo']
>>> r._params['siprop']
['namespaces']
>>> data = r.submit()
>>> isinstance(data, dict)
True
>>> set(['query', 'batchcomplete', 'warnings']).issuperset(data.keys())
True
>>> 'query' in data
True
>>> sorted(data['query'])
['namespaces', 'userinfo']
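
The example shows that a value such as "userinfo|siteinfo" is stored internally as a list. A minimal standalone sketch of that idea, built on MutableMapping, is given here. PipeParams and its to_query() method are hypothetical and far simpler than Request, which also handles sites, throttling, defaults such as uiprop, and submission.

```python
from collections.abc import MutableMapping


class PipeParams(MutableMapping):
    """Dict-like parameter store that keeps '|' separated values as lists."""

    def __init__(self, **params):
        self._params = {}
        for key, value in params.items():
            self[key] = value

    def __getitem__(self, key):
        return self._params[key]

    def __setitem__(self, key, value):
        if isinstance(value, str):
            value = value.split('|')
        self._params[key] = list(value)

    def __delitem__(self, key):
        del self._params[key]

    def __iter__(self):
        return iter(self._params)

    def __len__(self):
        return len(self._params)

    def to_query(self):
        """Join each list back into the '|' separated form used in URLs."""
        return {key: '|'.join(values) for key, values in self._params.items()}


p = PipeParams(action='query', meta='userinfo|siteinfo')
p['siprop'] = 'namespaces'
print(p['meta'])     # ['userinfo', 'siteinfo']
print(p.to_query())  # {'action': 'query', 'meta': 'userinfo|siteinfo', 'siprop': 'namespaces'}
```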

Changed in version 8.4: inherited from WaitingMixin.

Changed in version 9.0: keys and items methods return a view object instead of a list

Create a new Request instance with the given parameters.

The parameters for the request can be defined via either the ‘parameters’ parameter or the keyword arguments. The keyword arguments were the previous implementation but could cause problems when there are arguments to the API named the same as normal arguments to this class. So the second parameter ‘parameters’ was added which just contains all parameters. When a Request instance is created it must use either one of them and not both at the same time. To have backwards compatibility it adds a parameter named ‘parameters’ to kwargs when both parameters are set as that indicates an old call and ‘parameters’ was originally supplied as a keyword parameter.

If undefined keyword arguments were given AND the ‘parameters’ parameter was supplied as a positional parameter it still assumes ‘parameters’ were part of the keyword arguments.

If a class is using Request and is directly forwarding the parameters, Request.clean_kwargs can be used to automatically convert the old kwargs mode into the new parameter mode. This normalizes the arguments so that when the API parameters are modified the changes can always be applied to the ‘parameters’ parameter.

Parameters:
  • site – The Site to which the request will be submitted. If not supplied, uses the user’s configured default Site.

  • mime (dict | None) – If not None, send in “multipart/form-data” format (default None). Parameters which should only be transferred via mime mode are defined via this parameter (even an empty dict means mime shall be used).

  • max_retries (int | None) – Maximum number of times to retry after errors, defaults to config.max_retries.

  • retry_wait (int | None) – Minimum time in seconds to wait after an error, defaults to config.retry_wait seconds (doubles each retry until config.retry_max seconds is reached).

  • use_get (bool | None) – Use HTTP GET request if possible. If False it uses a POST request. If None, it’ll try to determine via action=paraminfo if the action requires a POST.

  • parameters (dict) – The parameters used for the request to the API.

  • kwargs – The parameters used for the request to the API.

  • throttle (bool)

_default_warning_handler(mode, msg)[source]#

A default warning handler to handle specific warnings.

Return True to retry the request, False to resume and None if the warning is not handled.

Added in version 7.2.

Parameters:
  • mode (str)

  • msg (str)

Return type:

bool | None

_handle_warnings(result)[source]#

Handle warnings; return True to retry request, False to resume.

Changed in version 7.2: Return True to retry the current request and False to resume.

Parameters:

result (dict[str, Any])

Return type:

bool

_http_request(use_get, uri, data, headers, paramstring)[source]#

Get or post an HTTP request with exception handling.

Changed in version 8.2: change the scheme if the previous request didn’t have json content.

Changed in version 9.2: no wait cycles for ImportError and NameError.

Returns:

a tuple containing requests.Response object from comms.http.request() and use_get value

Parameters:
  • use_get (bool)

  • uri (str)

Return type:

tuple

_json_loads(response)[source]#

Return a dict from requests.Response.

Changed in version 8.2: show a warning to add a protocol() method to the family file if suitable.

Parameters:

response (requests.Response) – a requests.Response object

Returns:

a data dict

Raises:
  • pywikibot.exceptions.APIError – unknown action found

  • pywikibot.exceptions.APIError – unknown query result type

Return type:

dict | None

classmethod clean_kwargs(kwargs)[source]#

Convert keyword arguments into new parameters mode.

If kwargs contains only arguments used by the class’ initializer, it is returned unchanged. Otherwise, the arguments that are not in the initializer are removed and put in a dict, which is added as the ‘parameters’ keyword. A shallow copy is always created.

Parameters:

kwargs (dict) – The original keyword arguments which is not modified.

Returns:

The normalized keyword arguments.

Return type:

dict
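
The conversion can be sketched as a standalone function. This is an approximation under stated assumptions: split_kwargs is a hypothetical name, INIT_ARGS is copied by hand from the Request signature shown above, and the backwards-compatibility case where ‘parameters’ is itself an API parameter is not handled.

```python
# Initializer argument names, copied from the Request signature (assumption).
INIT_ARGS = frozenset({
    'site', 'mime', 'throttle', 'max_retries',
    'retry_wait', 'use_get', 'parameters',
})


def split_kwargs(kwargs, init_args=INIT_ARGS):
    """Move non-initializer arguments into a 'parameters' dict."""
    kwargs = dict(kwargs)  # shallow copy; the caller's dict stays untouched
    extra = {key: kwargs.pop(key) for key in list(kwargs)
             if key not in init_args}
    if extra:
        kwargs['parameters'] = extra
    return kwargs


old_style = {'site': None, 'action': 'query', 'titles': 'Foo'}
print(split_kwargs(old_style))
# {'site': None, 'parameters': {'action': 'query', 'titles': 'Foo'}}
```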

classmethod create_simple(req_site, **kwargs)[source]#

Create a new instance using all args except site for the API.

iteritems()[source]#

Implement dict interface.

Deprecated since version 9.0: Use items() instead.

submit()[source]#

Submit a query and parse the response.

Changed in version 8.0.4: in addition to readapidenied also try to login when API response is notloggedin.

Changed in version 9.0: Raise exceptions.APIError if the same error comes twice in a row within the loop.

Returns:

a dict containing data retrieved from api.php

Return type:

dict

wait(delay=None)[source]#

Determine how long to wait after a failed request.

Also reset last API error with wait cycles.

Added in version 9.0.

Parameters:

delay (int | None) – Minimum time in seconds to wait. Overrides the retry_wait variable if given. The delay doubles on each retry until retry_max seconds is reached.

Return type:

None

data.api.encode_url(query)[source]#

Encode parameters to pass with a url.

Reorders parameters so that token parameters go last, then wraps urlencode. Returns an HTTP URL query fragment which complies with API:Edit#Parameters (see the ‘token’ bullet).

Parameters:

query (mapping object or a sequence of two-element tuples) – keys and values to be encoded for passing with a url

Returns:

encoded parameters with token parameters at the end

Return type:

str
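
The reordering can be sketched with the standard library. This is a minimal illustration, not the function's source: encode_with_token_last is a hypothetical name, and treating every key that ends in 'token' as a token parameter is an assumption.

```python
from urllib.parse import urlencode


def encode_with_token_last(query):
    """URL-encode query with all '...token' parameters moved to the end."""
    items = list(query.items()) if hasattr(query, 'items') else list(query)
    # sorted() is stable, so the relative order of the other keys is kept.
    items = sorted(items, key=lambda item: item[0].lower().endswith('token'))
    return urlencode(items)


print(encode_with_token_last(
    {'token': 'abc+\\', 'action': 'edit', 'title': 'Foo bar'}))
# action=edit&title=Foo+bar&token=abc%2B%5C
```

Placing the token last means that a request truncated in transit loses the token rather than the content, so the edit fails instead of being saved incomplete.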

data.api.update_page(page, pagedict, props=None)[source]#

Update attributes of page, based on query data in pagedict.

Parameters:
  • page (Page) – object to be updated

  • pagedict (dict[str, Any]) – the contents of a page element of a query response

  • props (Iterable[str] | None) – the property names which resulted in pagedict. If a missing value in pagedict can indicate both ‘false’ and ‘not present’ the property which would make the value present must be in the props parameter.

Raises:
Return type:

None

data.memento — Memento Requests#

Fix ups for memento-client package version 0.6.1.

Added in version 7.4.

class data.memento.MementoClient(*args, **kwargs)[source]#

Bases: MementoClient

A Memento Client.

It makes it as straightforward to access the Web of the past as it is to access the current Web.

Changed in version 7.4: timeout is used in several methods.

Basic usage:

>>> mc = MementoClient()
>>> dt = mc.convert_to_datetime("Sun, 01 Apr 2010 12:00:00 GMT")
>>> mi = mc.get_memento_info("http://www.bbc.com/", dt, timeout=60)
>>> mi['original_uri']
'http://www.bbc.com/'
>>> mi['timegate_uri']
'http://timetravel.mementoweb.org/timegate/http://www.bbc.com/'
>>> sorted(mi['mementos'])
['closest', 'first', 'last', 'next', 'prev']
>>> from pprint import pprint
>>> pprint(mi['mementos'])
{'closest': {'datetime': datetime.datetime(2010, 5, 23, 10, 19, 6),
             'http_status_code': 200,
             'uri': ['https://web.archive.org/web/20100523101906/http://www.bbc.co.uk/']},
 'first': {'datetime': datetime.datetime(1998, 12, 2, 21, 26, 10),
           'uri': ['http://wayback.nli.org.il:8080/19981202212610/http://www.bbc.com/']},
 'last': {'datetime': datetime.datetime(2022, 7, 31, 3, 30, 53),
          'uri': ['http://archive.md/20220731033053/http://www.bbc.com/']},
 'next': {'datetime': datetime.datetime(2010, 6, 2, 17, 29, 9),
          'uri': ['http://wayback.archive-it.org/all/20100602172909/http://www.bbc.com/']},
 'prev': {'datetime': datetime.datetime(2009, 10, 15, 19, 7, 5),
          'uri': ['http://wayback.nli.org.il:8080/20091015190705/http://www.bbc.com/']}}

The output conforms to the Memento API format explained here: http://timetravel.mementoweb.org/guide/api/#memento-json

Note

The mementos result is not deterministic. It may be different for the same parameters.

By default, MementoClient uses the Memento Aggregator: http://mementoweb.org/depot/

It is also possible to use a different TimeGate: simply initialize with a preferred timegate base uri. Toggle check_native_timegate to see if the original uri has its own timegate. The native timegate, if found, will be used instead of the preferred timegate_uri. If no native timegate is found, the preferred timegate_uri will be used.

Parameters:
  • timegate_uri (str) – A valid HTTP base uri for a timegate. Must start with http(s):// and end with a /.

  • max_redirects (int) – the maximum number of redirects allowed for all HTTP requests to be made.

Returns:

A MementoClient obj.

static convert_to_http_datetime(dt)[source]#

Converts a datetime object to a date string in HTTP format.

Parameters:

dt (datetime | None) – A datetime object.

Returns:

The date in HTTP format.

Raises:

TypeError – Expecting dt parameter to be of type datetime.

Return type:

str
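
A conversion of this kind can be sketched with the standard library alone. The function to_http_datetime is hypothetical and does not reproduce the method's source; treating naive datetimes as UTC is an assumption made for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def to_http_datetime(dt):
    """Format a datetime as an HTTP date string in GMT."""
    if not isinstance(dt, datetime):
        raise TypeError('Expecting dt parameter to be of type datetime.')
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


print(to_http_datetime(datetime(2010, 4, 1, 12, 0, 0)))
# Thu, 01 Apr 2010 12:00:00 GMT
```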

get_memento_info(request_uri, accept_datetime=None, timeout=None, **kwargs)[source]#

Query the preferred timegate and return the closest memento uri.

Given an original uri and an accept datetime, this method queries the preferred timegate and returns the closest memento uri, along with prev/next/first/last if available.

Parameters:
  • request_uri (str) – The input http uri.

  • accept_datetime (datetime | None) – The datetime object of the accept datetime. The current datetime is used if none is provided.

  • timeout (int | None) – the timeout value for the HTTP connection.

Returns:

A map of uri and datetime for the closest/prev/next/first/last mementos.

Return type:

dict

get_native_timegate_uri(original_uri, accept_datetime, timeout=None, **kwargs)[source]#

Check whether the original uri provides a timegate uri.

Given an original URL and an accept datetime, check the original uri to see if the timegate uri is provided in the Link header.

Parameters:
  • original_uri (str) – An HTTP uri of the original resource.

  • accept_datetime (datetime | None) – The datetime object of the accept datetime

  • timeout (int | None) – the timeout value for the HTTP connection.

Returns:

The timegate uri of the original resource, if provided, else None.

Return type:

str | None

static is_memento(uri, response=None, session=None, timeout=None)[source]#

Determines if the URI given is indeed a Memento.

The simple case is to look for a Memento-Datetime header in the request, but not all archives are Memento-compliant yet.

Parameters:
  • uri (str) – an HTTP URI for testing

  • response (Response | None) – the response object of the uri.

  • session (Session | None) – the requests session object.

  • timeout (int | None) – the timeout value for the HTTP connection.

Returns:

True if a Memento, False otherwise

Return type:

bool

static is_timegate(uri, accept_datetime=None, response=None, session=None, timeout=None)[source]#

Checks if the given uri is a valid timegate according to the RFC.

Parameters:
  • uri (str) – the http uri to check.

  • accept_datetime (str | None) – the accept datetime string in http date format.

  • response (Response | None) – the response object of the uri.

  • session (Session | None) – the requests session object.

  • timeout (int | None) – the timeout value for the HTTP connection.

Returns:

True if a valid timegate, else False.

Return type:

bool

static request_head(uri, accept_datetime=None, follow_redirects=False, session=None, timeout=None)[source]#

Makes HEAD requests.

Parameters:
  • uri (str) – the uri for the request.

  • accept_datetime (str | None) – the accept-datetime in the http format.

  • follow_redirects (bool) – Toggle to follow redirects. False by default, so does not follow any redirects.

  • session (Session | None) – the request session object to avoid opening new connections for every request.

  • timeout (int | None) – the timeout for the HTTP requests.

Returns:

the response object.

Raises:

ValueError – Only HTTP URIs are supported

Return type:

Response

exception data.memento.MementoClientException(message, data)[source]#

Bases: Exception

The memento client Exception class.

data.memento.get_closest_memento_url(url, when=None, timegate_uri=None)[source]#

Get most recent memento for url.

Parameters:
  • url (str)

  • when (datetime | None)

  • timegate_uri (str | None)

data.mysql — Mysql Requests#

Miscellaneous helper functions for mysql queries.

data.mysql.mysql_query(query, params=None, dbname=None, verbose=None)[source]#

Yield rows from a MySQL query.

An example query that yields all ns0 pages might look like:

SELECT
 page_namespace,
 page_title
FROM page
WHERE page_namespace = 0;

Supported MediaWiki projects use Unicode (UTF-8) character encoding. Cursor charset is utf8.

Parameters:
  • query (str) – MySQL query to execute

  • params (tuple, list or dict of str) – Input parameters for the query, if needed. If a list or tuple, %s shall be used as the placeholder in the query string. If a dict, %(key)s shall be used as the placeholder in the query string.

  • dbname (str | None) – db name

  • verbose (bool | None) – if True, print query to be executed; if None, config.verbose_output will be used.

Returns:

generator which yields tuples

data.sparql — SPARQL requests#

SPARQL Query interface.

class data.sparql.Bnode(data, **kwargs)[source]#

Bases: SparqlNode

Representation of blank node.

Create Bnode.

Parameters:

data (dict)

class data.sparql.Literal(data, **kwargs)[source]#

Bases: SparqlNode

Representation of RDF literal result type.

Create Literal object.

Parameters:

data (dict)

class data.sparql.SparqlNode(value)[source]#

Bases: object

Base class for SPARQL nodes.

Create a SparqlNode.

class data.sparql.SparqlQuery(endpoint=None, entity_url=None, repo=None, max_retries=None, retry_wait=None)[source]#

Bases: WaitingMixin

SPARQL Query class.

This class allows running SPARQL queries against any SPARQL endpoint.

Changed in version 8.4: inherited from data.WaitingMixin which provides a data.WaitingMixin.wait() method.

Create endpoint.

Parameters:
  • endpoint (str | None) – SPARQL endpoint URL

  • entity_url (str | None) – URL prefix for any entities returned in a query.

  • repo (pywikibot.site.DataSite) – The Wikibase site which we want to run queries on. If provided this overrides any value in endpoint and entity_url. Defaults to Wikidata.

  • max_retries (int | None) – (optional) Maximum number of times to retry after errors, defaults to config.max_retries.

  • retry_wait (float | None) – (optional) Minimum time in seconds to wait after an error, defaults to config.retry_wait seconds (doubles each retry until config.retry_max is reached).

ask(query, headers=None)[source]#

Run SPARQL ASK query and return boolean result.

Parameters:
  • query (str) – Query text

  • headers (dict[str, str] | None)

Return type:

bool
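In the SPARQL 1.1 Query Results JSON Format, the answer to an ASK query is carried in a top-level boolean member. The sketch below shows how such a response body could be reduced to a bool. ask_result is a hypothetical helper, and it is an assumption that ask() evaluates the response in this way.

```python
import json

# Example ASK response body in the SPARQL 1.1 JSON results format.
raw = '{"head": {}, "boolean": true}'


def ask_result(response_text):
    """Extract the boolean answer from an ASK response body."""
    return json.loads(response_text)['boolean']


print(ask_result(raw))
```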

get_items(query, item_name='item', result_type=<class 'set'>)[source]#

Retrieve items which satisfy given query.

Items are returned as Wikibase IDs.

Parameters:
  • query – Query string. Must contain ?{item_name} as one of the projected values.

  • item_name (str) – Name of the value to extract

  • result_type (iterable) – type of the iterable in which SPARQL results are stored (default set)

Returns:

item ids, e.g. Q1234

Return type:

same as result_type
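The roles of item_name and result_type can be sketched without a live endpoint. The helpers to_id and collect_item_ids below are hypothetical, the bindings are hand-written sample data in the SPARQL JSON results layout, and the Wikidata entity URL prefix is assumed as entity_url.

```python
ENTITY_URL = 'http://www.wikidata.org/entity/'  # assumed URL prefix


def to_id(uri, entity_url=ENTITY_URL):
    """Strip the entity URL prefix, leaving a Wikibase ID such as Q1234."""
    return uri[len(entity_url):] if uri.startswith(entity_url) else uri


def collect_item_ids(bindings, item_name='item', result_type=set):
    """Collect the IDs bound to item_name into the requested container."""
    return result_type(to_id(row[item_name]['value']) for row in bindings)


item_name = 'item'
# The query must project ?item (more generally ?{item_name}).
query = f'SELECT ?{item_name} WHERE {{ ?{item_name} ?p ?o }} LIMIT 3'

# Hand-written sample bindings, not real query output.
bindings = [
    {'item': {'type': 'uri', 'value': ENTITY_URL + 'Q1234'}},
    {'item': {'type': 'uri', 'value': ENTITY_URL + 'Q5678'}},
    {'item': {'type': 'uri', 'value': ENTITY_URL + 'Q1234'}},
]

print(query)
print(sorted(collect_item_ids(bindings)))             # set drops duplicates
print(collect_item_ids(bindings, result_type=list))   # list keeps them
```

With the default set, duplicate items collapse into one entry. Passing list keeps the order and the duplicates of the result rows.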

get_last_response()[source]#

Return last received response.

Returns:

Response object from last request or None

query(query, headers=None)[source]#

Run SPARQL query and return parsed JSON result.

Changed in version 8.5: exceptions.NoUsernameError is raised if the response looks like the user is not logged in.

Changed in version 9.6: retry on internal server error (500).

Parameters:
  • query (str) – Query text

  • headers (dict[str, str] | None)

Raises:

NoUsernameError – User not logged in

select(query, full_data=False, headers=None)[source]#

Run SPARQL query and return the result.

The response is assumed to be in format defined by: https://www.w3.org/TR/2013/REC-sparql11-results-json-20130321/

Parameters:
  • query (str) – Query text

  • full_data (bool) – Whether to return full data objects or only values

  • headers (dict[str, str] | None)

Return type:

list[dict[str, str]] | None
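The JSON results format referenced above lists the projected variables under head.vars and one object per result row under results.bindings. The following sketch reduces such a response to plain values, which is assumed to correspond to full_data=False. values_only is a hypothetical helper and the response body is hand-written sample data.

```python
import json

# Hand-written sample response in the SPARQL 1.1 JSON results format.
raw = '''
{
  "head": {"vars": ["item", "label"]},
  "results": {"bindings": [
    {"item": {"type": "uri",
              "value": "http://www.wikidata.org/entity/Q1234"},
     "label": {"type": "literal", "xml:lang": "en", "value": "example"}}
  ]}
}
'''


def values_only(response_text):
    """Reduce each binding to a plain {variable: value} mapping."""
    data = json.loads(response_text)
    variables = data['head']['vars']
    return [
        # a variable may be unbound in a row, so skip missing ones
        {var: row[var]['value'] for var in variables if var in row}
        for row in data['results']['bindings']
    ]


print(values_only(raw))
```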

class data.sparql.URI(data, entity_url, **kwargs)[source]#

Bases: SparqlNode

Representation of URI result type.

Create URI object.

Parameters:

data (dict)

getID()[source]#

Get ID of Wikibase object identified by the URI.

Returns:

ID of Wikibase object, e.g. Q1234

data.superset — Superset requests#

Superset Query interface.

Added in version 9.2.

class data.superset.SupersetQuery(schema_name=None, site=None, database_id=None)[source]#

Bases: WaitingMixin

Superset Query class.

This class allows running SQL queries against the Wikimedia Superset service.

Create superset endpoint with initial defaults.

Either site OR schema_name is required. Site and schema_name are mutually exclusive. Database id will be retrieved automatically if needed.

Parameters:
  • site (BaseSite | None) – The mediawiki site to be queried

  • schema_name (str | None) – superset database schema name. Example value “enwiki_p”

  • database_id (int | None) – superset database id.

Raises:

TypeError – if site and schema_name are both defined

get_csrf_token()[source]#

Get superset CSRF token.

Method retrieves a CSRF token from the Superset service. If the instance is not connected, it attempts to log in first.

Raises:

ServerError – For any http errors

Returns:

CSRF token string

Return type:

str

get_database_id_by_schema_name(schema_name)[source]#

Get superset database_id using superset schema name.

Parameters:

schema_name (str) – superset database schema name. Example value “enwiki_p”

Raises:
  • KeyError – If the database ID could not be found.

  • ServerError – For any other http errors

Returns:

database id

Return type:

int

login()[source]#

Login to superset.

Logs in to meta.wikimedia.org first and then performs an OAuth login to superset.wmcloud.org. A working login requires that the user has manually permitted the username to log in to Superset.

Raises:
Returns:

True if the user has been logged in to Superset

Return type:

bool

merge_query_arguments(database_id=None, schema_name=None, site=None)[source]#

Determine and validate the database_id and schema_name.

Parameters:
  • database_id (int | None) – The superset database ID.

  • schema_name (str | None) – The superset schema name.

  • site (BaseSite) – The target site

Raises:
  • TypeError – if site and schema_name are both defined

  • TypeError – If determined database_id is not an integer.

  • TypeError – If neither site nor schema_name is determined.

Returns:

A tuple containing database_id and schema_name.

Return type:

tuple(int, str)
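The documented checks can be sketched as a standalone function. This is an illustration under stated assumptions, not the method's implementation: merge_arguments is a hypothetical name, site_schema stands in for a schema name derived from a site object, lookup stands in for a database id lookup by schema name, and the id 7 is made up.

```python
def merge_arguments(database_id=None, schema_name=None, site_schema=None,
                    lookup=None):
    """Validate and complete database_id and schema_name (sketch only).

    site_schema and lookup are hypothetical stand-ins; see the lead-in.
    """
    if site_schema and schema_name:
        raise TypeError('site and schema_name are mutually exclusive')
    schema_name = schema_name or site_schema
    if not schema_name:
        raise TypeError('either site or schema_name is required')
    if database_id is None and lookup is not None:
        # retrieve the database id automatically when it is not given
        database_id = lookup(schema_name)
    if not isinstance(database_id, int):
        raise TypeError('database_id must be an integer')
    return database_id, schema_name


# 7 is a made-up database id used only for illustration.
print(merge_arguments(schema_name='enwiki_p', lookup=lambda name: 7))
```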

query(sql, database_id=None, schema_name=None, site=None)[source]#

Execute SQL queries on Superset.

Parameters:
  • sql (str) – The SQL query to execute.

  • database_id (int | None) – The database ID.

  • schema_name (str | None) – The schema name.

  • site (BaseSite)

Raises:

RuntimeError – If the query execution fails.

Returns:

The data returned from the query execution.

Return type:

list[Any]

data.wikistats — WikiStats requests#

Objects representing WikiStats API.

class data.wikistats.WikiStats(url='https://wikistats.wmcloud.org/')[source]#

Bases: object

Light wrapper around WikiStats data, caching responses and data.

The methods accept a Pywikibot family name as the WikiStats table name, mapping the names before calling the WikiStats API.

Changed in version 9.0: tables are cached globally instead of per instance.

Parameters:

url (str)
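The name mapping mentioned above can be sketched with the values of the FAMILY_MAPPING attribute of this class. table_name is a hypothetical helper, and it is an assumption that unmapped names are passed through unchanged.

```python
# Values copied from FAMILY_MAPPING as documented for this class.
FAMILY_MAPPING = {
    'wikipedia': 'wikipedias',
    'wikiquote': 'wikiquotes',
    'wikisource': 'wikisources',
    'wiktionary': 'wiktionaries',
}


def table_name(family):
    """Map a family name to a table name; other names pass through."""
    return FAMILY_MAPPING.get(family, family)


print(table_name('wikipedia'))
print(table_name('wikibooks'))
```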

ALL_KEYS = {'editthis', 'gamepedias', 'gentoo', 'lxde', 'mediawikis', 'metapedias', 'neoseeker', 'opensuse', 'orain', 'pardus', 'referata', 'rodovid', 'scoutwiki', 'shoutwiki', 'sourceforge', 'uncyclomedia', 'w3cwikis', 'wikia', 'wikibooks', 'wikifur', 'wikinews', 'wikipedia', 'wikipedias', 'wikiquote', 'wikiquotes', 'wikisite', 'wikisource', 'wikisources', 'wikitravel', 'wikiversity', 'wikivoyage', 'wikkii', 'wiktionaries', 'wiktionary', 'wmspecials'}#
ALL_TABLES = {'editthis', 'gamepedias', 'gentoo', 'lxde', 'mediawikis', 'metapedias', 'neoseeker', 'opensuse', 'orain', 'pardus', 'referata', 'rodovid', 'scoutwiki', 'shoutwiki', 'sourceforge', 'uncyclomedia', 'w3cwikis', 'wikia', 'wikibooks', 'wikifur', 'wikinews', 'wikipedias', 'wikiquotes', 'wikisite', 'wikisources', 'wikitravel', 'wikiversity', 'wikivoyage', 'wikkii', 'wiktionaries', 'wmspecials'}#
FAMILY_MAPPING = {'wikipedia': 'wikipedias', 'wikiquote': 'wikiquotes', 'wikisource': 'wikisources', 'wiktionary': 'wiktionaries'}#
MISC_SITES_TABLE = 'mediawikis'#
OTHER_MULTILANG_TABLES = {'gentoo', 'lxde', 'metapedias', 'opensuse', 'pardus', 'rodovid', 'scoutwiki', 'uncyclomedia', 'wikifur', 'wikitravel'}#
OTHER_TABLES = {'editthis', 'gamepedias', 'neoseeker', 'orain', 'referata', 'shoutwiki', 'sourceforge', 'w3cwikis', 'wikia', 'wikisite', 'wikkii', 'wmspecials'}#
WMF_MULTILANG_TABLES = {'wikibooks', 'wikinews', 'wikipedias', 'wikiquotes', 'wikisources', 'wikiversity', 'wikivoyage', 'wiktionaries'}#

get(table)[source]#

Get a table of data as a list.

Parameters:

table (str) – table of data to fetch

Return type:

list

get_dict(table)[source]#

Get a table of data as a dictionary.

Parameters:

table (str) – table of data to fetch

Return type:

dict

languages_by_size(table)[source]#

Return ordered list of languages by size from WikiStats.

Parameters:

table (str)

sorted(table, key, reverse=None)[source]#

Reverse numerical sort of data.

Parameters:
  • table (str) – name of table of data

  • key (str) – data table key

  • reverse (bool | None) – If set to True the sorting order is reversed. If None the sorting order for numeric keys is reversed whereas alphanumeric keys are sorted in the normal way.

Returns:

The sorted table

Return type:

list
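The effect of reverse=None can be sketched on a small table. sort_rows is a hypothetical stand-in for the method, the column names prefix and total and all row values are made up, and deciding "numeric" from the first row is an assumption of this sketch.

```python
def sort_rows(rows, key, reverse=None):
    """Sort rows by key; numeric keys default to descending order."""
    # Assumption: the first row decides whether the key is numeric.
    numeric = str(rows[0][key]).isdigit()
    if reverse is None:
        reverse = numeric
    if numeric:
        return sorted(rows, key=lambda row: int(row[key]), reverse=reverse)
    return sorted(rows, key=lambda row: row[key], reverse=reverse)


# Made-up rows for illustration only.
rows = [
    {'prefix': 'bb', 'total': '20'},
    {'prefix': 'aa', 'total': '3'},
    {'prefix': 'cc', 'total': '100'},
]

print([row['prefix'] for row in sort_rows(rows, 'total')])
print([row['prefix'] for row in sort_rows(rows, 'prefix')])
print([row['prefix'] for row in sort_rows(rows, 'total', reverse=False)])
```

With reverse left at None, the numeric total column is sorted from largest to smallest, while the alphanumeric prefix column is sorted in ascending order.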