Portal:Data Services
Please read the Wikimedia Cloud Services introduction and the Getting Started guide.
Data Services includes services that provide direct access to databases and dumps, as well as web interfaces for querying data stores and for accessing them programmatically.
Data Services currently include: Wiki Replicas, Wikimedia Dumps, Shared Storage, CirrusSearch Elasticsearch replicas, Wikimedia Enterprise, Superset and PAWS.
Data stores
Wiki Replicas
- About
- Wiki Replicas are MySQL/MariaDB databases that are replicated in near real time from the production MediaWiki databases of Wikimedia Foundation wikis. The database tables are sanitized for public use.
- How to access
- Access to Wiki Replicas using the MySQL protocol is automatically granted to all users of Toolforge. See Help:Toolforge/Database to learn how to access the Wiki Replicas.
- You can also query the content of Wiki Replicas using Superset and Quarry.
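As a minimal sketch of connecting from a Toolforge tool, the Python below assembles connection parameters for a replica. It assumes conventions used on Toolforge at the time of writing: hosts named `<wiki>.analytics.db.svc.wikimedia.cloud` or `<wiki>.web.db.svc.wikimedia.cloud`, a database named `<wiki>_p`, port 3306, and credentials in the `[client]` section of `~/replica.my.cnf`. Treat all of these as assumptions to check against Help:Toolforge/Database; the function names are illustrative only.

```python
import configparser
import os


def replica_host(wiki: str, cluster: str = "analytics") -> str:
    """Return the replica hostname for a wiki (assumed naming scheme)."""
    if cluster not in ("analytics", "web"):
        raise ValueError("cluster must be 'analytics' or 'web'")
    return f"{wiki}.{cluster}.db.svc.wikimedia.cloud"


def replica_credentials(path: str = "~/replica.my.cnf") -> tuple[str, str]:
    """Read user and password from the [client] section of a my.cnf-style file."""
    # interpolation=None so a '%' in the password is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(os.path.expanduser(path)):
        raise FileNotFoundError(path)
    client = parser["client"]
    # Strip optional surrounding quotes, in case the values are quoted
    return client["user"].strip("'\""), client["password"].strip("'\"")


def connection_params(wiki: str, cnf_path: str = "~/replica.my.cnf",
                      cluster: str = "analytics") -> dict:
    """Build keyword arguments for a MySQL client library (hypothetical helper)."""
    user, password = replica_credentials(cnf_path)
    return {
        "host": replica_host(wiki, cluster),
        "port": 3306,
        "user": user,
        "password": password,
        "database": f"{wiki}_p",  # assumed '_p' suffix for the public views
    }
```

The resulting dictionary could then be passed to a MySQL client library, for example `pymysql.connect(**connection_params("enwiki"))`.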
Wikimedia Dumps
- About
- Wikimedia Dumps offers a range of data downloads, including full-text dumps and other datasets. More documentation about dumps can be found at Data dumps.
- How to access
- See Help:Shared_storage#Dumps.
- Toolforge users can directly access dumps data through their Tool account.
- Cloud VPS users can request that the dumps share be made available to their project.
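A minimal sketch of locating a dump file from a tool follows. It assumes the dumps share is mounted at `/public/dumps/public`, with one directory per wiki containing `YYYYMMDD` run directories and files named `<wiki>-<date>-<suffix>`; verify the actual layout on Help:Shared_storage#Dumps. The helper names are hypothetical.

```python
import os

DUMPS_ROOT = "/public/dumps/public"  # assumed mount point of the dumps share


def dump_dates(wiki: str, root: str = DUMPS_ROOT) -> list[str]:
    """List the dated (YYYYMMDD) run directories for a wiki, oldest first."""
    wiki_dir = os.path.join(root, wiki)
    return sorted(d for d in os.listdir(wiki_dir) if d.isdigit() and len(d) == 8)


def dump_file(wiki: str, date: str, suffix: str = "pages-articles.xml.bz2",
              root: str = DUMPS_ROOT) -> str:
    """Build the expected path of one dump file (assumed naming scheme)."""
    return os.path.join(root, wiki, date, f"{wiki}-{date}-{suffix}")


def latest_dump_file(wiki: str, suffix: str = "pages-articles.xml.bz2",
                     root: str = DUMPS_ROOT) -> str:
    """Return the newest run that actually contains the requested file.

    Runs without the file (for example, one still in progress) are skipped.
    """
    for date in reversed(dump_dates(wiki, root)):
        path = dump_file(wiki, date, suffix, root)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"no {suffix} dump found for {wiki} under {root}")
```

Non-dated entries such as a `latest` directory are ignored by `dump_dates`, so the choice of run is based only on dated directories.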
Shared Storage
- About
- Shared Storage is offered via NFS. It includes shared directories offered to Cloud VPS and Toolforge users. Currently offered shares are described at Help:Shared storage. Wikimedia Dumps are also served through Shared Storage, but are treated as a separate Data Service because they are so widely used.
- How to access
- The Toolforge environment is set up with access by default. Other Cloud VPS projects can request access to the listed shares by filing a task on Phabricator tagged with the Data-Services and VPS-Projects projects.
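Once a share has been granted to a Cloud VPS project, one way to confirm it is mounted on an instance is to look for NFS entries in the Linux mount table, `/proc/mounts`. The sketch below does that with the standard library; the function names are hypothetical, and which mount points appear depends on the project.

```python
def nfs_mount_points(mounts_text: str) -> list[str]:
    """Return mount points whose filesystem type is NFS (nfs, nfs4, ...).

    Each /proc/mounts line is: device mountpoint fstype options dump pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("nfs"):
            points.append(fields[1])
    return points


def read_nfs_mount_points(path: str = "/proc/mounts") -> list[str]:
    """Read the live mount table and return its NFS mount points."""
    with open(path) as fh:
        return nfs_mount_points(fh.read())
```

On an instance where a share has been enabled, `read_nfs_mount_points()` should list its mount point; the exact paths are described at Help:Shared storage.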
CirrusSearch Elasticsearch replicas
- About
- The "Cloud Elastic" servers are a replica of the CirrusSearch Elasticsearch indices made available to Wikimedia Cloud Services applications (both Cloud VPS and Toolforge). Applications can use the full power of the Elasticsearch search APIs to query the search indices in ways that CirrusSearch does not expose directly on the wikis themselves. See Help:CirrusSearch elasticsearch replicas for more details.
- How to access
- These servers are not generally accessible from the internet at large; they can only be reached by applications running inside Wikimedia Cloud Services.
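Because the replicas expose the standard Elasticsearch search API, a query is an ordinary HTTP POST of a Query DSL body to `/<index>/_search`. The sketch below uses only the Python standard library. The base URL is a placeholder, and the index name (for example `enwiki_content`) and the `title` and `text` fields are assumptions about the CirrusSearch mapping; look up the real endpoints and index names in Help:CirrusSearch elasticsearch replicas.

```python
import json
import urllib.request

BASE_URL = "https://cloudelastic.example:9243"  # placeholder, not a real endpoint


def build_search_request(base_url: str, index: str, text: str,
                         size: int = 10) -> urllib.request.Request:
    """Build a POST request for a full-text `match` query on the `text` field."""
    body = {
        "query": {"match": {"text": text}},  # assumed field name
        "size": size,
        "_source": ["title"],  # only return the page title of each hit
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def titles_from_response(result: dict) -> list[str]:
    """Extract page titles from an Elasticsearch search response."""
    return [hit["_source"]["title"] for hit in result["hits"]["hits"]]


def search_titles(base_url: str, index: str, text: str,
                  size: int = 10, timeout: float = 30) -> list[str]:
    """Send the query and return matching titles (needs network access)."""
    req = build_search_request(base_url, index, text, size)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return titles_from_response(json.load(resp))
```

Run from inside Cloud VPS or Toolforge with a real endpoint, `search_titles(endpoint, "enwiki_content", "graph theory")` would return matching page titles; from outside Wikimedia Cloud Services the connection is expected to fail.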
Wikimedia Enterprise
- About
- Wikimedia Enterprise is a set of APIs targeting large-scale user needs. For more information on the APIs, see the service's documentation.
- How to access
- Users of Toolforge, Cloud VPS, or PAWS have access to the Misc and Bulk APIs (Daily and Hourly Exports).
Web interfaces
Superset
- About
- Superset is a graphical web interface that allows users to query Wiki Replicas and ToolsDB using SQL and to create data visualizations. It is powered by Apache Superset, open-source software that is widely used by analysts, researchers, and people of all experience levels to access databases easily.
- How to access
- Superset requires a Wikimedia SUL account to log in.
PAWS
- About
- PAWS is a Jupyter notebooks installation hosted by Wikimedia Cloud Services that provides Python notebooks and a terminal accessible through a web browser. You can access Wiki Replicas, ToolsDB and Dumps with PAWS.
- How to access
- PAWS requires a Wikimedia SUL account to log in.
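In a PAWS notebook, a large XML dump can be read as a stream instead of being loaded into memory. The sketch below yields page titles from a bz2-compressed `pages-articles` dump using only the standard library. It assumes the MediaWiki export XML format, whose namespace URI includes a schema version, so tags are matched by local name; the path you pass in is whatever dump file is available to your notebook, and the function name is illustrative.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_page_titles(dump_path: str) -> Iterator[str]:
    """Yield the <title> of each <page> in a bz2-compressed XML dump."""
    with bz2.open(dump_path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            # Tags look like "{http://www.mediawiki.org/xml/export-0.10/}page",
            # so compare only the local name after the namespace.
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "title":
                yield elem.text or ""
            elif tag == "page":
                elem.clear()  # drop the finished page's children to save memory
```

For example, `itertools.islice(iter_page_titles(path), 10)` takes the first ten titles without decompressing the rest of the file.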