Building your own client for Wikipedia's image suggestions API
2022-10-28 13:33:54 +0200
In a previous post I wrote about how we built the image suggestions feature for Special:Homepage on Wikipedia. But that is just one implementation, bound specifically to a web client accessing Special:Homepage.
In this post, I’ll walk through how you could build another client (for example, an iOS or Android application) to consume image suggestions and save user judgments on those suggestions back to Wikipedia.
Most of the APIs described here are “internal” (though public). If you do intend to build a client, please let me know.
Start with a logged-in user
In theory, you could do this with anonymous (IP editor) accounts. But as most of our APIs and workflows for this feature assume a logged-in user, it will be much easier if we assume for the rest of this document that you have a logged-in user, and know how to retrieve a CSRF token from MediaWiki.
Enable suggested edits for your user
Make sure you’ve enabled these two preferences for your user:
{
"growthexperiments-homepage-suggestededits-activated": 1,
"growthexperiments-homepage-enable": 1
}
Those two preferences matter because the GrowthExperiments extension, which provides a bunch of validation and post-processing logic for image suggestions, checks whether a user has both options set.
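One way to set them is MediaWiki's action=options API, which accepts a pipe-separated list of name=value pairs together with a CSRF token. Here is a minimal Python sketch that only builds the POST fields. The function name and the placeholder token are made up for illustration, and you would send the result with whatever authenticated HTTP session your client already uses.

```python
from urllib.parse import urlencode

# The two preferences named in this post.
PREFERENCES = {
    "growthexperiments-homepage-suggestededits-activated": 1,
    "growthexperiments-homepage-enable": 1,
}


def build_options_request(csrf_token):
    """Return the POST fields for an action=options call (hypothetical helper)."""
    change = "|".join(f"{name}={value}" for name, value in PREFERENCES.items())
    return {
        "action": "options",
        "format": "json",
        "change": change,
        "token": csrf_token,
    }


# Placeholder token; a real one comes from your logged-in session.
fields = build_options_request("PLACEHOLDER_CSRF_TOKEN")
body = urlencode(fields)  # form-encoded body for the POST request
```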
Getting a list of suggestions
The growthtasks API module allows you to get a list of task suggestions.
For example, curl 'https://cs.wikipedia.org/w/api.php?action=query&format=json&list=growthtasks&formatversion=2' returns:
{
"batchcomplete": true,
"continue": {
"gtoffset": 15,
"continue": "-||"
},
"query": {
"growthtasks": {
"suggestions": [
{
"title": "Historicita Bible",
"tasktype": "image-recommendation",
"difficulty": "medium",
"order": 0,
"qualityGateIds": [
"dailyLimit"
],
"qualityGateConfig": {
"image-recommendation": {
"dailyLimit": false,
"dailyCount": 0
},
"link-recommendation": {
"dailyLimit": false,
"dailyCount": 0
}
},
"token": "97ijhm0m0jtpv0jkube39qk29gba4cjo"
},
[...]
However, you’ll notice that this list doesn’t have page IDs.
So instead of a list API query, you may find the generator query to be more useful, e.g. curl 'https://cs.wikipedia.org/w/api.php?action=query&format=json&generator=growthtasks&formatversion=2':
{
"batchcomplete": true,
"continue": {
"ggtoffset": 15,
"continue": "ggtoffset||"
},
"growthtasks": {
"totalCount": 68336,
"qualityGateConfig": {
"image-recommendation": {
"dailyLimit": false,
"dailyCount": 0
},
"link-recommendation": {
"dailyLimit": false,
"dailyCount": 0
}
}
},
"query": {
"pages": [
{
"pageid": 80867,
"ns": 0,
"title": "Nevus flammeus",
"tasktype": "image-recommendation",
"difficulty": "medium",
"order": 2,
"qualityGateIds": [
"dailyLimit"
],
"qualityGateConfig": {
"image-recommendation": {
"dailyLimit": false,
"dailyCount": 0
},
"link-recommendation": {
"dailyLimit": false,
"dailyCount": 0
}
},
"token": "8dsoq0i2bb5dvr1q7q0a3jjji8mtqgiv"
},
[...]
If you want to filter by topics, you can do that with the topics parameter (example), and you can also use topicsmode to decide if you want to AND or OR those topics together.
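As a rough illustration, here is a small Python sketch that builds the generator URL with optional topic filters. It assumes the generator form of these parameters takes the same ggt prefix seen in ggtoffset (so ggttopics and ggttopicsmode), and the topic IDs "art" and "music" are hypothetical placeholders; check the API help page for the real names and values.

```python
from urllib.parse import urlencode

API_URL = "https://cs.wikipedia.org/w/api.php"


def growthtasks_url(topics=(), topics_mode=None):
    """Build a generator=growthtasks query URL (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "growthtasks",
    }
    if topics:
        # Assumed parameter name: "ggt" prefix + "topics".
        params["ggttopics"] = "|".join(topics)
    if topics_mode:
        # Assumed parameter name and values ("AND" / "OR").
        params["ggttopicsmode"] = topics_mode
    return f"{API_URL}?{urlencode(params)}"


# Hypothetical topic IDs, for illustration only.
url = growthtasks_url(["art", "music"], "OR")
```

urlencode takes care of escaping the pipe between topic IDs as %7C.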
Notes about the contents
- batchcomplete: see this note
- continue: if you want to get more than 15 results (the default), with most action APIs you’d specify a number here to continue the list from. That does not work for this API; instead you pass a set of page IDs in the excludepageids parameter. I’ll explain more a bit farther down.
- growthtasks: metadata about the request.
  - totalCount: the total number of tasks available.
  - qualityGateConfig: information from the site about whether the user has surpassed their daily limit for image recommendation or link recommendation tasks. Server-side code enforces this limit; the client can use this information to tell the user that they’ve reached their limit for the day.
- query.pages: list of results.
  - qualityGateIds: the list of quality gates that apply to this particular item.
  - token: an analytics token. You can use this with EventLogging data, if you want.
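To make the shape of the generator response concrete, here is a hedged Python sketch that pulls out the fields a client is likely to care about. The function name is invented, the sample is trimmed from the response shown earlier, and it assumes that dailyLimit being true means the user has hit the limit.

```python
def summarize_tasks(response):
    """Extract totals, limit status and image tasks (hypothetical helper)."""
    meta = response.get("growthtasks", {})
    image_gate = meta.get("qualityGateConfig", {}).get("image-recommendation", {})
    pages = response.get("query", {}).get("pages", [])
    return {
        "total": meta.get("totalCount"),
        # Assumption: dailyLimit == True means the limit has been reached.
        "daily_limit_reached": bool(image_gate.get("dailyLimit", False)),
        "tasks": [
            (page["pageid"], page["title"], page.get("token"))
            for page in pages
            if page.get("tasktype") == "image-recommendation"
        ],
    }


# Trimmed copy of the generator response shown earlier in this post.
sample = {
    "growthtasks": {
        "totalCount": 68336,
        "qualityGateConfig": {
            "image-recommendation": {"dailyLimit": False, "dailyCount": 0},
        },
    },
    "query": {
        "pages": [
            {
                "pageid": 80867,
                "title": "Nevus flammeus",
                "tasktype": "image-recommendation",
                "token": "8dsoq0i2bb5dvr1q7q0a3jjji8mtqgiv",
            }
        ]
    },
}
summary = summarize_tasks(sample)
```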
How do I get more tasks?
If your client wants to page through results, you do not use the continue parameter as you would with other action APIs. Instead, keep track of the page IDs that you see in the results. Then make another call to action=query&generator=growthtasks and set the ggtexcludepageids parameter with a pipe-delimited list of page IDs (urlencoded), e.g. 1|2 becomes 1%7C2.
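A minimal Python sketch of that paging approach might look like the following. The helper names are made up; the only API details it relies on are the generator query and the ggtexcludepageids parameter described above.

```python
from urllib.parse import urlencode

API_URL = "https://cs.wikipedia.org/w/api.php"


def remember_pages(seen_page_ids, response):
    """Add the page IDs from a generator response to the set of seen IDs."""
    for page in response.get("query", {}).get("pages", []):
        seen_page_ids.add(page["pageid"])
    return seen_page_ids


def next_page_url(seen_page_ids):
    """Build the follow-up request that excludes pages already shown."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "growthtasks",
    }
    if seen_page_ids:
        # Pipe-delimited list; urlencode escapes "|" as %7C.
        params["ggtexcludepageids"] = "|".join(str(i) for i in sorted(seen_page_ids))
    return f"{API_URL}?{urlencode(params)}"


seen = remember_pages(set(), {"query": {"pages": [{"pageid": 2}, {"pageid": 1}]}})
url = next_page_url(seen)
```

Sorting the IDs is not required by the API; it just keeps the generated URL deterministic.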
Why can’t I use continuation?
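We do this because the API queries ElasticSearch using the srsort=random flag. With a random sort order, an offset from one request does not point to the same position in the next one, so we don’t have a way to provide stable continuation.1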
Getting metadata for a suggestion
OK, to summarize where we are so far:
- we have a user account
- we have a list of articles with image suggestions
How do we get the image suggestion metadata?
At the moment, if you have access to maintenance servers at Wikimedia, you can set up an SSH tunnel.
If you don’t have access, you need to wait for T306349 – we’ll either have the ability to query the Cassandra-backed database directly for image suggestion metadata, or you’ll proxy your request via a GrowthExperiments API module.
To obtain the metadata, you use the page ID from the growthtasks API to query the Cassandra DB, e.g. curl /public/image_suggestions/suggestions/cswiki/311675:
{
"rows": [
{
"wiki": "cswiki",
"page_id": 311675,
"id": "644c90bc-ba40-11ec-ba4c-f0d4e2e69820",
"image": "Anestis_Delias_rebetiko_musician_about_1933.jpg",
"confidence": 80,
"found_on": null,
"kind": [
"istype-commons-category"
],
"origin_wiki": "commonswiki",
"page_rev": 20265888
},
{
"wiki": "cswiki",
"page_id": 311675,
"id": "644c90bc-ba40-11ec-ba4c-f0d4e2e69820",
"image": "Babis_Tsertos.jpg",
"confidence": 80,
"found_on": null,
"kind": [
"istype-commons-category"
],
"origin_wiki": "commonswiki",
"page_rev": 20265888
},
Using the metadata
You can use this metadata to provide some guidance to your user about whether to accept or reject the suggestion.
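For instance, a client could order the rows by confidence and turn each one into a short description for the user. This is only a sketch: the function names, the min_confidence threshold, and the wording of the description are illustrative choices, and the sample rows are trimmed from the response above.

```python
def rank_suggestions(payload, min_confidence=0):
    """Return suggestion rows at or above a threshold, highest confidence first."""
    rows = payload.get("rows", [])
    kept = [row for row in rows if row.get("confidence", 0) >= min_confidence]
    return sorted(kept, key=lambda row: row.get("confidence", 0), reverse=True)


def describe(row):
    """Build a one-line, human-readable summary of a suggestion row."""
    image = row["image"].replace("_", " ")
    kinds = ", ".join(row.get("kind", []))
    return f"{image} (confidence {row['confidence']}, matched via: {kinds})"


# Trimmed copy of the rows shown above.
payload = {
    "rows": [
        {
            "image": "Anestis_Delias_rebetiko_musician_about_1933.jpg",
            "confidence": 80,
            "kind": ["istype-commons-category"],
        },
        {
            "image": "Babis_Tsertos.jpg",
            "confidence": 80,
            "kind": ["istype-commons-category"],
        },
    ]
}
lines = [describe(row) for row in rank_suggestions(payload)]
```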
Here’s an example of how we do that on Wikipedia, via VisualEditor’s AddImage plugin:
Saving the user’s response
There are three possible options: accepting, rejecting, and skipping the suggestion.
Accepting
Use VisualEditor’s Edit API to add a File reference to the top of the article. I don’t think there are external APIs for this.2 But what you need to do is construct a [[File:]] reference and add it to the page. Here’s an example diff for Czech Wikipedia:
[[Soubor:Yellow cab.JPG|náhled|Osoba je chleba]]
The parameters are: the file name, the image type (“thumb”, which appears here in its Czech form “náhled”), and finally the user-provided caption.
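As a sketch, building that wikitext in Python could look like this. The helper is hypothetical, the namespace and thumbnail keyword default to the English forms, and it does no escaping, so a caption containing a pipe or closing brackets would need extra handling.

```python
def file_wikitext(filename, caption, namespace="File", thumb_keyword="thumb"):
    """Build a [[File:...]] reference: file name, image type, caption."""
    # Underscores and spaces are interchangeable in titles; use spaces for display.
    name = filename.replace("_", " ")
    return f"[[{namespace}:{name}|{thumb_keyword}|{caption}]]"


# Reproduces the Czech Wikipedia example above.
czech = file_wikitext("Yellow_cab.JPG", "Osoba je chleba", "Soubor", "náhled")
```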
When saving via VisualEditor, you also need to construct a plugin data payload to send along with the Edit API request.
The relevant part of the payload looks like this:
{
"plugins": "ge-task-image-recommendation",
"data-ge-task-image-recommendation": {
"taskType": "image-recommendation",
"filename": "Zebra_Jump_(2865800718).jpg",
"accepted": true,
"reasons": [],
"caption": "Here is some caption."
}
}
This payload is important, as that’s how GrowthExperiments extension determines that this is a valid acceptance of an image suggestion. The GrowthExperiments code will then:
- log this in
Special:Log
- remove the image recommendation from the search index
- update the user’s task set cache
Rejecting
Rejecting is similar; use the VisualEditor Edit API, but this time the payload looks like this:
{
"plugins": "ge-task-image-recommendation",
"data-ge-task-image-recommendation": {
"taskType": "image-recommendation",
"filename": "Zebra_Jump_(2865800718).jpg",
"accepted": false,
"reasons": ["notrelevant", "noinfo"],
"caption": ""
}
}
The list of valid rejection reasons is here; there should probably be an API to get them.
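The accept and reject payloads differ only in the accepted flag, the reasons, and the caption, so one helper can cover both. The sketch below is an assumption-heavy illustration: it builds the data key as "data-" plus the plugin name, and it assumes the data value travels as a JSON-encoded string in a form field. The helper name is invented, and the only rejection reasons used are the two shown above.

```python
import json

PLUGIN = "ge-task-image-recommendation"


def plugin_fields(filename, accepted, caption="", reasons=()):
    """Build the plugin-related fields for the Edit API request (hypothetical helper)."""
    if accepted and reasons:
        raise ValueError("rejection reasons only make sense when accepted is False")
    data = {
        "taskType": "image-recommendation",
        "filename": filename,
        "accepted": accepted,
        "reasons": list(reasons),
        "caption": caption,
    }
    # Assumption: the data field is named "data-<plugin>" and holds a JSON string.
    return {"plugins": PLUGIN, f"data-{PLUGIN}": json.dumps(data)}


accept = plugin_fields("Zebra_Jump_(2865800718).jpg", True, caption="Here is some caption.")
reject = plugin_fields("Zebra_Jump_(2865800718).jpg", False, reasons=["notrelevant", "noinfo"])
```

These fields would be merged into the rest of the Edit API request (page title, wikitext or HTML, CSRF token), which is not shown here.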
Skipping
Easiest of all. You don’t need to do anything, other than, if you want, keeping track of the page ID in your client so that you don’t show your user this article again.
Conclusion
That’s the high-level overview. Image suggestions processed this way will:
- have newcomer tasks: image suggestion added as an edit tag (so you can query them in e.g. RecentChanges)
- receive the same validation used in GrowthExperiments’ processed image suggestions
You can look at GrowthExperiments code to see how we do this in a VisualEditor plugin.