వాడుకరి:Arjunaraoc/MySQL queries configured for Tewiki
ON WMF labs
మార్చుWMFlabs లో tools వాడేవారికి (వేరేవిధం
ఎవరైనా http://quarry.wmflabs.org వాడి ఇలాంటి క్వేరీలు చేయవచ్చు)
$ssh <username>@tools-login.wmflabs.org
<type your passphrase to connect>
$ sql tewiki_p
mysql>
sample query for Top 25 pages with maximum edits ఫిభ్రవరి2014ఉదాహరణ ఫలితాలకు ప్రాజెక్టు పేజీ చూడండి.
SELECT page_title, COUNT(*) As Edits FROM page LEFT JOIN revision ON revision.rev_page = page.page_id WHERE rev_timestamp >= '20140201000000' AND rev_timestamp <'20140301000000' AND page_namespace = 0 AND page_title in (SELECT page_title
FROM page JOIN categorylinks ON categorylinks.cl_from = page.page_id WHERE page.page_namespace = 1 AND categorylinks.cl_to= 'వికీప్రాజెక్టు_నాణ్యతాభివృద్ధి-వికీట్రెండ్స్_పేజీలు')
GROUP BY page_title HAVING Edits > 0 ORDER BY Edits DESC LIMIT 25;
వాడుకరి మార్పులు
Sample query for Top 25 Users with maximum edits in Project ఫలితాలకు ప్రాజెక్టు పేజీ చూడండి.
SELECT
user_name, COUNT(*) As Edits
FROM user LEFT JOIN revision
ON rev_user = user_id
LEFT JOIN page
ON page_id = rev_page
WHERE rev_timestamp >= '20140201000000' AND rev_timestamp <= '20140301000000' AND page_namespace = 0 AND page_title IN (SELECT page_title
FROM page JOIN categorylinks ON categorylinks.cl_from = page.page_id WHERE page.page_namespace = 1 AND categorylinks.cl_to= 'వికీప్రాజెక్టు_నాణ్యతాభివృద్ధి-వికీట్రెండ్స్_పేజీలు')
GROUP BY user_name HAVING COUNT(*) > 0 ORDER BY Edits DESC LIMIT 25;
OLD EXAMPLES from tool server
మార్చుMySQL queries allow you to pull data from the Toolserver's replicated databases.
Database layout
మార్చుThe database layout is available at the MediaWiki wiki: mw:Manual:Database layout.
There are also two commands you can use to view the layout. SHOW TABLES will show the available tables in a database. DESCRIBE table_name will show the available columns in a specific table.
Clusters
మార్చుThe various databases are stored in clusters. The clusters are named with a preceding S and a digit, e.g., S1, S2, etc. A table of this information is available here: Wiki server assignments.
Data storage
మార్చుThere are a few tricks to how data is stored in the various tables.
- Page titles use underscores and never include the namespace prefix.
- User names use spaces, not underscores.
- Namespaces are integers. A key to the integers is available here. The localized names for the namespaces can be obtained from the Toolserver database.
Views
మార్చుThe Toolserver has exact replicas of Wikimedia's databases, however, certain information is restricted to Toolserver users using MySQL views.
For example, the user table does not show things like user_password or user_email to Toolserver users.
Accessing the data
మార్చుThere are a variety of ways to access the databases. From the command line, a shell script exists that automatically selects the correct cluster for you.
$ sql tewiki_p
This command will connect to the appropriate cluster (in this case, S1) and give you a MySQL prompt where you can run queries.
$ sql tewiki_p < test-query.sql > test-query.txt
This command takes a .sql file that contains your query, selects the appropriate cluster, runs the query, and outputs to a text file. The advantage here is that it doesn't add lines around the data (making a table), it instead outputs the data in a tab-delimited format.
$ sql tewiki_p < test-query.sql | gzip >test-query.txt.gz
This command does the same thing as the command above, but after outputting to a text file, it gzips the data. This can be very helpful if you're dealing with large amounts of data.
The sql shell script is a wrapper around the mysql command. Either method works, though the sql shell script selects the appropriate cluster for you.
If you wish to use the mysql command directly, a sample query would look like this:
$ mysql -h sql-s1 tewiki_p -e "DESCRIBE `user`;"
The -h option tells MySQL which host to access (in this case sql-s1). The -e option tells MySQL to echo the results of the query to the terminal. You can also have MySQL output the results to a file.
$ mysql -h sql-s1 tewiki_p < test-query.sql > test-query.txt
Writing queries
మార్చుBecause the Toolserver database is read-only, nearly all of the queries you will want to run will be SELECT queries.
SELECT * FROM `user`;
This query selects all columns from the user table. More information about MySQL queries are available below (in the example queries) and in the MySQL manual.
Queries end in a semi-colon (;). If you want to cancel the query, end it in \c. If you want to output in a non-table format, use \G.
Toolserver database
మార్చుThe "toolserver" database contains metadata about the various wikis and databases. It consists of four tables: language, namespace, namespacename, and wiki.
Example queries
మార్చు- w:en:Wikipedia:Database reports (and its equivalents in other languages) contain many useful examples.
Atypical log entries
మార్చుSELECT
user_name,
log_namespace,
ns_name,
log_timestamp,
log_action,
log_title,
log_comment
FROM logging
JOIN toolserver.namespace
ON log_namespace = ns_id
AND dbname = 'tewiki_p'
JOIN `user`
ON log_user = user_id
WHERE log_type='delete'
AND log_action != 'restore'
AND log_action != 'delete';
Explanation: This pulls log entries from the deletion log that aren't restore or delete actions.
Broken redirects
మార్చు- See also source code of Special:BrokenRedirects
SELECT
p1.page_namespace,
ns_name,
p1.page_title
FROM redirect AS rd
JOIN page p1
ON rd.rd_from = p1.page_id
JOIN toolserver.namespace
ON p1.page_namespace = ns_id
AND dbname = 'tewiki_p'
LEFT JOIN page AS p2
ON rd_namespace = p2.page_namespace
AND rd_title = p2.page_title
WHERE rd_namespace >= 0
AND p2.page_namespace IS NULL
ORDER BY p1.page_namespace ASC;
Explanation: This pulls all broken redirects.
Cross-namespace redirects
మార్చుSELECT
pt.page_namespace,
pf.page_title,
ns_name,
rd_title
FROM redirect, page AS pf, page AS pt
JOIN toolserver.namespace
ON pt.page_namespace = ns_id
AND dbname = 'tewiki_p'
WHERE pf.page_namespace = 0
AND rd_title = pt.page_title
AND rd_namespace = pt.page_namespace
AND pt.page_namespace != 0
AND rd_from = pf.page_id
AND pf.page_namespace = 0;
Explanation: This pulls redirects from (Main) to any other namespace.
Deleted red-linked categories
మార్చుSELECT
cattmp.cl_to,
cattmp.cl_count,
user_name,
log_timestamp,
log_comment
FROM logging
JOIN `user` ON log_user = user_id
JOIN
(SELECT
cl_to,
COUNT(cl_to) AS cl_count
FROM categorylinks
LEFT JOIN page ON cl_to = page_title
AND page_namespace = 14
WHERE page_title IS NULL
GROUP BY cl_to) AS cattmp
ON log_title = cattmp.cl_to
WHERE log_namespace = 14
AND log_type = "delete"
AND log_timestamp = (SELECT
MAX(log_timestamp)
FROM logging AS last
WHERE log_namespace = 14
AND cattmp.cl_to = last.log_title);
Explanation: This pulls non-existent used categories that have previously been deleted.
Empty categories
మార్చుSELECT
page_title,
page_len
FROM categorylinks
RIGHT JOIN page ON cl_to = page_title
WHERE page_namespace = 14
AND page_is_redirect = 0
AND cl_to IS NULL
AND NOT EXISTS (SELECT
1
FROM categorylinks
WHERE cl_from = page_id
AND cl_to = 'Wikipedia_category_redirects')
AND NOT EXISTS (SELECT
1
FROM categorylinks
WHERE cl_from = page_id
AND cl_to = 'Disambiguation_categories')
AND NOT EXISTS (SELECT
1
FROM templatelinks
WHERE tl_from = page_id
AND tl_namespace = 10
AND tl_title = 'Empty_category');
Explanation: This pulls empty categories that aren't in specific categories and don't transclude a specific template.
Fully-protected articles with excessively long expiries
మార్చుSELECT
page_is_redirect,
page_title,
user_name,
logs.log_timestamp,
pr_expiry,
logs.log_comment
FROM page
JOIN page_restrictions ON page_id = pr_page
AND page_namespace = 0
AND pr_type = 'edit'
AND pr_level = 'sysop'
AND pr_expiry > DATE_FORMAT(DATE_ADD(NOW(),INTERVAL 1 YEAR),'%Y%m%d%H%i%s')
AND pr_expiry != 'infinity'
LEFT JOIN logging AS logs ON logs.log_title = page_title
AND logs.log_namespace = 0
AND logs.log_type = 'protect'
LEFT JOIN `user` ON logs.log_user = user_id
WHERE CASE WHEN (NOT ISNULL(log_timestamp))
THEN log_timestamp = (SELECT MAX(last.log_timestamp)
FROM logging AS last
WHERE log_title = page_title
AND log_namespace = 0
AND log_type = 'protect')
ELSE 1 END;
Explanation: Articles that are fully-protected from editing for more than one year.
Excessively long IP blocks
మార్చుSELECT
ipb_address,
ipb_by_text,
ipb_timestamp,
ipb_expiry,
ipb_reason
FROM ipblocks
WHERE ipb_expiry > DATE_FORMAT(DATE_ADD(NOW(),INTERVAL 2 YEAR),'%Y%m%d%H%i%s')
AND ipb_expiry != "infinity"
AND ipb_user = 0;
Explanation: Blocks of anonymous users that are longer than two years (but not indefinite).
Indefinitely fully-protected articles
మార్చుSELECT
page_is_redirect,
page_title,
user_name,
logs.log_timestamp,
logs.log_comment
FROM page
JOIN page_restrictions ON page_id = pr_page
AND page_namespace = 0
AND pr_type = 'edit'
AND pr_level = 'sysop'
AND pr_expiry = 'infinity'
LEFT JOIN logging AS logs ON logs.log_title = page_title
AND logs.log_namespace = 0
AND logs.log_type = 'protect'
LEFT JOIN `user` ON logs.log_user = user_id
WHERE CASE WHEN (NOT ISNULL(log_timestamp))
THEN log_timestamp = (SELECT MAX(last.log_timestamp)
FROM logging AS last
WHERE log_title = page_title
AND log_namespace = 0
AND log_type = 'protect')
ELSE 1 END;
Explanation: Articles indefinitely fully-protected from editing.
Long pages
మార్చుSELECT
page_namespace,
ns_name,
page_title,
page_len
FROM page
JOIN toolserver.namespace
ON page_namespace = ns_id
AND dbname = 'tewiki_p'
WHERE page_len > 175000
AND page_title NOT LIKE "%/%"
ORDER BY page_namespace ASC;
Explanation: Pages over 175,000 bytes in length; excludes titles with "/" in them (to avoid archives, etc.).
Redirects obscuring page content
మార్చుSELECT
page_namespace,
ns_name,
page_title,
page_len
FROM page
JOIN toolserver.namespace
ON page_namespace = ns_id
AND dbname = 'tewiki_p'
WHERE page_is_redirect = 1
HAVING page_len > 449
ORDER BY page_namespace ASC;
Explanation: Redirects with large page lengths. This usually indicates there is text below the redirect that should not be there.
Mistagged non-free content
మార్చుSELECT
DISTINCT(tewiki_p.page.page_title),
commonswiki_p.image.img_name
FROM tewiki_p.image, commonswiki_p.image, tewiki_p.categorylinks, tewiki_p.page
WHERE tewiki_p.image.img_sha1 = commonswiki_p.image.img_sha1
AND tewiki_p.page.page_title = tewiki_p.image.img_name
AND tewiki_p.categorylinks.cl_from = tewiki_p.page.page_id
AND tewiki_p.categorylinks.cl_to = 'All_non-free_media'
AND tewiki_p.image.img_sha1 != 'phoiac9h4m842xq45sp7s6u21eteeq1';
Explanation: This pulls files on a local wiki that are in non-free category, but also exist at Commons. This indicates that either the local image or the Commons image should be deleted. It excludes the SHA1 empty string due to bad database rows.
Pages with the most revisions
మార్చుSELECT
page_namespace,
ns_name,
page_title,
COUNT(*)
FROM revision
JOIN page ON page_id = rev_page
JOIN toolserver.namespace ON page_namespace = ns_id
AND dbname = 'tewiki_p'
GROUP BY page_namespace, page_title
ORDER BY COUNT(*) DESC
LIMIT 1000;
Explanation: This pulls the pages with the most revisions. On large wikis, it can take several hours (or days) to run.
Page counts by namespace
మార్చుSELECT
page_namespace,
ns_name,
MAX(notredir),
MAX(redir)
FROM (
SELECT page.page_namespace,
IF( page_is_redirect, COUNT(page.page_namespace), 0 ) AS redir,
IF( page_is_redirect, 0, COUNT(page.page_namespace)) AS notredir
FROM page
GROUP BY page_is_redirect, page_namespace
ORDER BY page_namespace, page_is_redirect
) AS pagetmp
JOIN toolserver.namespace ON page_namespace = ns_id AND dbname = 'tewiki_p'
GROUP BY page_namespace;
Explanation: This pulls the number of redirects and non-redirects in each namespace.
Orphaned talk pages
మార్చుSELECT
p1.page_namespace,
ns_name,
p1.page_title
FROM page AS p1
JOIN toolserver.namespace
ON p1.page_namespace = ns_id
AND dbname = 'tewiki_p'
WHERE p1.page_title NOT LIKE "%/%"
AND p1.page_namespace NOT IN (0,2,3,4,6,8,9,10,12,14,16,18,100,102,104)
AND CASE WHEN p1.page_namespace = 1
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 0
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 5
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 4
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 7
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 6
AND p1.page_title = p2.page_title)
AND NOT EXISTS (SELECT
1
FROM commonswiki_p.page AS p2
WHERE p2.page_namespace = 6
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 11
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 10
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 13
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 12
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 15
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 14
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 17
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 16
AND p1.page_title = p2.page_title)
ELSE 1 END
AND CASE WHEN p1.page_namespace = 101
THEN NOT EXISTS (SELECT
1
FROM page AS p2
WHERE p2.page_namespace = 100
AND p1.page_title = p2.page_title)
ELSE 1 END
AND p1.page_id NOT IN (SELECT
page_id
FROM page
JOIN templatelinks
ON page_id = tl_from
WHERE tl_title="G8-exempt"
AND tl_namespace = 10)
AND p1.page_id NOT IN (SELECT
page_id
FROM page
JOIN templatelinks
ON page_id = tl_from
WHERE tl_title="Go_away"
AND tl_namespace = 10)
AND p1.page_id NOT IN (SELECT
page_id
FROM page
JOIN templatelinks
ON page_id = tl_from
WHERE tl_title="Rtd"
AND tl_namespace = 10);
Explanation: This (very, very hackishly) pulls all pages in the talk namespaces that don't have a corresponding subject-space page. It JOINs against Commons to ensure that File_talk: pages are truly orphaned. It also has some en.wiki-specific template checks in it.
Ownerless pages in the user space
మార్చుSELECT
page_namespace,
ns_name,
page_title,
page_len
FROM page
JOIN toolserver.namespace
ON page_namespace = ns_id
AND dbname = `tewiki_p`
LEFT JOIN `user`
ON user_name = REPLACE(page_title, '_', ' ')
WHERE page_namespace IN (2,3)
AND page_is_redirect = 0
AND page_title NOT LIKE "%/%"
AND page_title NOT RLIKE "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
AND user_name IS NULL;
Explanation: This pulls all User: and User_talk: pages not belonging to a registered user. Pages belonging to an anonymous user are excluded.
Polluted categories
మార్చుSELECT DISTINCT
cl_to
FROM categorylinks AS cat
JOIN page AS pg1
ON cat.cl_from = pg1.page_id
WHERE page_namespace = 2
AND EXISTS (SELECT
1
FROM page AS pg2
JOIN categorylinks AS cl
ON pg2.page_id = cl.cl_from
WHERE pg2.page_namespace = 0
AND cat.cl_to = cl.cl_to)
AND cl_to NOT IN (SELECT
page_title
FROM page
JOIN templatelinks
ON tl_from = page_id
WHERE page_namespace = 14
AND tl_namespace = 10
AND tl_title = 'Pollutedcat')
LIMIT 250;
Explanation: This pulls categories that contain pages in the (Main) namespace and the User: namespace. Generally categories hold one or the other.
Categories categorized in red-linked categories
మార్చుSELECT
page_title,
cl_to
FROM page
JOIN
(SELECT
cl_to,
cl_from
FROM categorylinks
LEFT JOIN page ON cl_to = page_title
AND page_namespace = 14
WHERE page_title IS NULL) AS cattmp
ON cattmp.cl_from = page_id
WHERE page_namespace = 14;
Explanation: This pulls categories categorized in red-linked categories.
Articles containing red-linked files
మార్చుSELECT
page_title,
il_to
FROM page
JOIN imagelinks
ON page_id = il_from
WHERE (NOT EXISTS(
SELECT
1
FROM image
WHERE img_name = il_to))
AND (NOT EXISTS(
SELECT
1
FROM commonswiki_p.page
WHERE page_title = il_to
AND page_namespace = 6))
AND page_namespace = 0;
Explanation: This pulls articles that contain red-linked images. It checks both Commons and the local wiki.
Self-categorized categories
మార్చుSELECT
page_title,
cat_pages,
cat_subcats
FROM page
JOIN categorylinks ON cl_to = page_title
RIGHT JOIN category
ON cat_title = page_title
WHERE page_id = cl_from
AND page_namespace = 14;
Explanation: This pulls self-categorized categories.
Uncategorized categories
మార్చుSELECT
page_title,
page_len,
cat_pages,
rev_timestamp,
rev_user_text
FROM revision
JOIN
(SELECT
page_id,
page_title,
page_len,
cat_pages
FROM category
RIGHT JOIN page ON cat_title = page_title
LEFT JOIN categorylinks ON page_id = cl_from
WHERE cl_from IS NULL
AND page_namespace = 14
AND page_is_redirect = 0) AS pagetmp
ON rev_page = pagetmp.page_id
AND rev_timestamp = (SELECT MAX(rev_timestamp)
FROM revision AS last
WHERE last.rev_page = pagetmp.page_id);
Explanation: This pulls uncategorized categories.
Pages in a specific category using a specific template
మార్చుSELECT
page_title
FROM page
JOIN templatelinks
ON tl_from = page_id
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = "Living_people"
AND tl_namespace = 10
AND tl_title = "Fact"
AND page_namespace = 0
LIMIT 500;
Explanation: This pulls pages that are in a specific category and are using a specific template.
Top edit timestamp for a category of users
మార్చుSELECT
rev_user_text,
rev_timestamp
FROM revision
JOIN (SELECT
page_title
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = 'Category_name_goes_here'
AND page_namespace = 2
AND page_title NOT LIKE '%/%') AS cltmp
ON REPLACE(cltmp.page_title, '_', ' ') = rev_user_text
WHERE rev_timestamp = (SELECT
MAX(rev_timestamp)
FROM revision
WHERE rev_user_text = REPLACE(cltmp.page_title, '_', ' '));
Explanation: This will take the user pages that do not contain a forward slash ("/") in "Category_name_goes_here" and get the top edit timestamp for each user.
Top pages by in a specific namespace for a specific user
మార్చుSELECT
`page_title`,
COUNT(*)
FROM `revision`
JOIN `page`
ON `page_id` = `rev_page`
JOIN `user`
ON `user_id` = `rev_user`
WHERE `user_name` = 'User name'
AND `page_namespace` = 4
GROUP BY `page_title`
ORDER BY COUNT(*) DESC
LIMIT 25;
Explanation: This will pull the page titles of a the most-edited pages by a specific user in a specific namespace.
Number of deletions per day for a specific user
మార్చుSELECT
DATE( CONCAT( YEAR( log_timestamp), "-", MONTH( log_timestamp ), "-", DAY( log_timestamp ) ) ) AS day,
COUNT(log_timestamp) AS deletions
FROM logging
JOIN `user`
ON log_user = user_id
WHERE user_name = 'User name'
AND log_type = 'delete'
AND log_action = 'delete'
GROUP BY day;
Explanation: This will group the number of deletions per day by a specific user.
Number of deletions per day
మార్చుSELECT
DATE(CONCAT(YEAR(log_timestamp),"-",MONTH(log_timestamp),"-",DAY(log_timestamp))) AS day,
COUNT(log_id) AS deletions
FROM logging_ts_alternative
WHERE log_type = 'delete'
AND log_action = 'delete'
GROUP BY day;
Explanation: This will pull the number of deletions per day on a given wiki.
Most common deletion summaries
మార్చుSELECT
log_comment,
COUNT(log_id)
FROM logging_ts_alternative
WHERE log_type = 'delete'
AND log_action = 'delete'
GROUP BY log_comment
ORDER BY COUNT(log_id) DESC;
Explanation: This will pull the most common deletion summaries on a given wiki.
Most common deletion summaries for a specific user
మార్చుSELECT
log_comment AS reason,
COUNT(*) AS uses
FROM logging
JOIN `user`
ON log_user = user_id
WHERE user_name = 'User name'
AND log_type = 'delete'
AND log_action = 'delete'
GROUP BY reason
ORDER BY uses DESC
LIMIT 25;
Explanation: This will pull the most commonly used deletion summaries for a specific user.
Most common edit summaries for a specific user
మార్చుSELECT
rev_comment,
COUNT(*)
FROM revision
WHERE rev_user_text = 'User name'
GROUP BY rev_comment
ORDER BY COUNT(*) DESC
LIMIT 25;
Explanation: This will pull the most commonly used edit summaries for a specific user.
Number of revisions per day
మార్చుSELECT
DATE(CONCAT(YEAR(rev_timestamp),"-",MONTH(rev_timestamp),"-",DAY(rev_timestamp))) AS day,
COUNT(rev_timestamp) AS revisions
FROM revision
GROUP BY day;
Explanation: This will pull the number of (non-deleted) revisions to a particular wiki and group the numbers by day.
Number of revision per day by a specific user
మార్చుSELECT
DATE(CONCAT(YEAR(rev_timestamp),"-",MONTH(rev_timestamp),"-",DAY(rev_timestamp))) AS day,
COUNT(rev_timestamp) AS revisions
FROM revision
WHERE rev_user_text = 'User name'
GROUP BY day;
Explanation: This will pull the number of (non-deleted) revisions by a specific user per day.
File description pages without an associated file
మార్చుSELECT
page_title
FROM page
WHERE NOT EXISTS (SELECT
img_name
FROM image
WHERE img_name = page_title)
AND NOT EXISTS (SELECT
img_name
FROM commonswiki_p.image
WHERE img_name = page_title)
AND page_namespace = 6
AND page_is_redirect = 0
LIMIT 1000;
Explanation: This will pull file description pages that do not have an associated file, either locally or in the Commons repo.
Files without an associated file description page
మార్చుSELECT
img_name
FROM image
WHERE NOT EXISTS (SELECT
page_title
FROM page
WHERE img_name = page_title
AND page_namespace = 6)
AND NOT EXISTS (SELECT
page_title
FROM commonswiki_p.page
WHERE img_name = page_title
AND page_namespace = 6);
Explanation: This will pull files that have no associated file description page, either locally or in the Commons repo.
File description pages containing no templates
మార్చుSELECT
ns_name,
page_title,
page_len
FROM page
JOIN toolserver.namespace
ON dbname = 'tewiki_p'
AND ns_id = page_namespace
LEFT JOIN templatelinks
ON tl_from = page_id
WHERE NOT EXISTS (SELECT
img_name
FROM commonswiki_p.image
WHERE img_name = page_title)
AND page_namespace = 6
AND page_is_redirect = 0
AND tl_from IS NULL
LIMIT 800;
Explanation: This will pull file description pages containing no templates that do not have an associated file description page in the Commons repo.
File description pages containing no templates or categories
మార్చుSELECT
ns_name,
page_title,
page_len
FROM page
JOIN toolserver.namespace
ON dbname = 'tewiki_p'
AND ns_id = page_namespace
LEFT JOIN templatelinks
ON tl_from = page_id
LEFT JOIN categorylinks
ON cl_from = page_id
WHERE NOT EXISTS (SELECT
img_name
FROM commonswiki_p.image
WHERE img_name = page_title)
AND page_namespace = 6
AND page_is_redirect = 0
AND tl_from IS NULL
AND cl_from IS NULL
LIMIT 800;
Explanation: This will pull file description pages containing no templates or categories that do not have an associated file description page in the Commons repo.
Links on a particular page
మార్చుSELECT
pl_namespace,
pl_title
FROM page
JOIN pagelinks
ON pl_from = page_id
WHERE page_namespace = 0
AND page_title = 'Don\'t_poke_the_bear';
Explanation: This will pull the names of the links on a particular page. For example, the text of the "Don't poke the bear" page contains the link "[[bear]]" so the output of this query will list "Bear" as a result.
Links to a particular page
మార్చుSELECT
page_namespace,
page_title
FROM page
JOIN pagelinks
ON pl_from = page_id
WHERE pl_namespace = 0
AND pl_title = 'Don\'t_poke_the_bear';
Explanation: This will pull the names of the links to a particular page. This is the equivalent of Special:WhatLinksHere. For example, the page "What not to do" may contain the link "[[Don't poke the bear]]"; this query would output "What not to do" as a result.
Pages containing 0 page links
మార్చుSELECT
page_namespace,
page_title
FROM page
LEFT JOIN pagelinks
ON pl_from = page_id
WHERE pl_namespace IS NULL
LIMIT 1;
Explanation: This will pull pages that contain 0 page links. This will not account for things like image links, template links, category links, or external links.
Pages with 0 links to them
మార్చుSELECT
page_namespace,
page_title
FROM page
LEFT JOIN pagelinks
ON pl_title = page_title
AND pl_namespace = page_namespace
WHERE pl_namespace IS NULL
LIMIT 1;
Explanation: This will pull pages that have 0 links to them (Special:WhatLinksHere would be empty!).
Short pages with a single author (excluding user pages and redirects)
మార్చుSELECT
CONCAT(ns_name, ':', page_title),
page_len
FROM page
JOIN toolserver.namespace
ON page_namespace = ns_id
AND dbname = 'tewiki_p'
LEFT JOIN templatelinks
ON tl_from = page_id
WHERE page_len < 50
AND page_is_redirect = 0
AND page_namespace NOT IN (2, 3)
AND tl_from IS NULL
AND (SELECT
COUNT(DISTINCT rev_user_text)
FROM revision
WHERE rev_page = page_id) = 1
ORDER BY page_len ASC
Explanation: This will list short pages with a single author (excluding user pages and redirects).
Unused templates
మార్చుSELECT
CONCAT('Template:', page_title)
FROM page
LEFT JOIN templatelinks
ON page_namespace = tl_namespace
AND page_title = tl_title
WHERE page_namespace = 10
AND tl_from IS NULL
Explanation: This will pull unused templates.
Broken image links
మార్చుSELECT
CONCAT("* [[", if (page_namespace = 0, page_title, concat(":", ns_name, ":", page_title)), "]]",
" - [[:Bilde:", il_to, "]]") as links
FROM imagelinks
JOIN page
ON page_id = il_from
JOIN toolserver.namespace
ON ns_id = page_namespace
AND dbname = "nowiki_p"
LEFT JOIN image as I
ON I.img_name = il_to
LEFT JOIN commonswiki_p.image as J
ON J.img_name = il_to
WHERE I.img_name IS NULL
AND J.img_name IS NULL /* SLOW_OK */;
Explanation: List references to images that are not present locally nor on Commons (nowiki).
Revision counting for specific user till specific time
మార్చుSELECT
COUNT(*)
FROM revision
WHERE rev_user_text = 'User name'
AND rev_timestamp < '20120327000000';
Explanation: Return the number of revisions by a specific user up to a certain time.
Log count in recent time by users not in group
మార్చుA query like this outputs the users with:
- the most log actions of the kind specified (in the example, patrol/patrol minus autopatrolled edits),
- in the interval specified (in the example, last 24 months),
- excluding users in some user groups (in the example, sysop and rollbacker).
/* SLOW_OK adapted from Krinkle, bug 25799 */
SELECT
count(*) AS counter,
user_name,
user_editcount,
GROUP_CONCAT(DISTINCT ug_group SEPARATOR ", ")
FROM logging log
JOIN user us
ON log.log_user = us.user_id
JOIN user_groups ug
ON log.log_user = ug.ug_user
WHERE ug.ug_user NOT IN
(SELECT DISTINCT user_groups.ug_user
FROM user_groups
WHERE ug_group = 'sysop'
OR ug_group = 'rollbacker'
)
AND log.log_type = 'patrol'
AND log.log_action = 'patrol'
AND log.log_timestamp > ( NOW() - INTERVAL 24 MONTH )
AND ( log.log_params LIKE '%"6::auto";i:0%'
OR log.log_params LIKE '%\n0' /* not autopatrolled */ )
GROUP BY log.log_user
ORDER BY counter DESC
LIMIT 50;
In wikis where "patrol" right is given to all users, the example query can be used to find active patrollers who would be likely candidates for additional rights (like autropatrolled, rollbacker, sysop etc. depending on the wiki's configuration).
Redirects with more than one revision
మార్చుSELECT
page_title,
rd_namespace,
rd_title,
COUNT(*) as edits
FROM revision
INNER JOIN page
ON rev_page = page_id
INNER JOIN redirect
ON rev_page = rd_from
WHERE page_namespace = 0
AND page_is_redirect = 1
GROUP BY page_title
HAVING COUNT(*) > 1;
Explanation: This query lists redirect pages with more than one revision.
Short non-disambiguation pages
మార్చుSELECT
page_title
FROM page
WHERE page_namespace = 0
AND page_is_redirect = 0
AND page_id NOT IN (SELECT
tl_from
FROM templatelinks
WHERE tl_title = 'Disambiguation'
AND tl_namespace = 10)
ORDER BY page_len ASC
LIMIT 1000;
Explanation: This query lists 1000 non-disambiguation short pages.
List of external links
మార్చుSELECT
COUNT(*),
TRIM(LEADING 'www.' FROM SUBSTRING_INDEX(SUBSTRING_INDEX(el_to, '/', 3),'/',-1)) AS site
FROM externallinks, page
WHERE el_from = page_id
AND page_namespace = 0
GROUP BY 2
HAVING COUNT(*) > 10
ORDER BY 1;
Explanation: This query compiles a list of external links (from the main namespace) grouped by website.
List of external links from all wikis
మార్చుList of external links to a given domain from all namespaces on all Wikimedia projects; requires iterating over wikis, with a bash script in the example, syntax like ./extlinks.sh http://meta.wikimedia.org
(outputs DB name, page title, linked URL).
#!/bin/bash
WIKIS=$(mysql -BN -h sql -e 'SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL AND is_closed = 0;')
for dbname in $WIKIS;
do
echo $dbname
echo "
/* SLOW_OK */
SELECT CONCAT(ns_name, ':', page_title), el_to
FROM externallinks
JOIN page ON el_from = page_id
JOIN toolserver.namespacename ON dbname = '$dbname' AND ns_is_favorite AND ns_id = page_namespace
WHERE el_to LIKE '$1%'
-- " | mysql -h ${dbname/_/-}.rrdb.toolserver.org -BcN $dbname
# mysql switches
# -c Prevent comment stripping, need to prevent the query killer
# -N No column name heading
# -B Bare formatted / -t Table formatted / -H HTML output / -X XML output
done;
List of interwiki links from all wikis
మార్చుAs above; use lowercase interwikiki prefix, like ./iwlinks.sh meatball
.
#!/bin/bash
WIKIS=$(mysql -BN -h sql -e 'SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL AND is_closed = 0;')
for dbname in $WIKIS;
do
echo $dbname
echo "
/* SLOW_OK */
SELECT CONCAT(ns_name, ':', page_title), CONCAT('$1:', iwl_title)
FROM iwlinks
JOIN page ON iwl_from = page_id
JOIN toolserver.namespacename ON dbname = '$dbname' AND ns_is_favorite AND ns_id = page_namespace
WHERE iwl_prefix = '$1'
-- " | mysql -h ${dbname/_/-}.rrdb.toolserver.org -BcN $dbname
# mysql switches
# -c Prevent comment stripping, need to prevent the query killer
# -N No column name heading
# -B Bare formatted / -t Table formatted / -H HTML output / -X XML output
done;