WordPress Developers: Test your i18n (internationalization) knowledge!

Alex Kirk lives in Austria and is a developer on the i18n (internationalization) team at Automattic. We’re looking for talented people wherever they live, so why not join our team?

Whenever we write plugins or themes, there is one thing that needs a little extra attention and is quite frankly hard to get right: Translatable text.

Be it a button or some explanatory text, you will generally want to make that text translatable into other languages, so that even more people can use your piece of software. While there is a very extensive guide available in the WordPress Handbook, we have created a fun way to brush up your knowledge on how to get things right: a quiz.

If you’re reading this post via a feed reader or an e-mail subscription, we encourage you to view the post on our developers blog to take the test (there are no winners or losers, this is meant to help you learn!), as it uses a little JavaScript to tell you whether an answer is right or wrong.

For each answer, we also provide an explanation, whether it’s right or wrong. So after clicking the answer that you think is right, make sure to click the other ones to explore what might be wrong about them.

So without further ado, take the quiz below!

You want to output the username in a sentence. Assume that the $username has been escaped using esc_html(). How do you do that?
<?php printf( __( 'Howdy, %s!' ), $username ); ?>
Good! Some languages may need to switch the location of the username to the front of this string. This code provides the needed flexibility by including both the placeholder and the punctuation mark. Check the other answers though, as there is an even better one.
<?php /* translators: %s is a username */ printf( __( 'Howdy, %s!' ), $username ); ?>
Awesome, the comment for translators is the cherry on the cake, as they cannot see variable names. Some languages may need to switch the location of the username to the front of this string. This code provides needed flexibility by including both the placeholder and the punctuation mark.
<?php printf( __( 'Howdy, %s' ), $username ); ?>!
This is almost correct. The punctuation mark should be included in the translatable string.
<?php echo __( 'Howdy' ) . ', ' . $username; ?>!
Translators may need to put the username first in other languages. That’s not possible with this code because it isn’t using a placeholder and a function that does substitution such as printf.
<?php _e( 'Howdy, %s!', $username ); ?>
The _e() function can only output text. It does not substitute variables.
<?php _e( "Howdy, $username!" ); ?>
Variables in a string are a no-no because the translated text is loaded by using the original English text which needs to be the same for all possible outputs.
You need to include a link in a sentence. How can you do that?
printf( __( 'Publish something using our <a href="%s">Post by Email</a> feature.'), 'http://support.wordpress.com/post-by-email/' );
Correct. Embed HTML in the string when it is necessary to keep the sentence structure intact for translators. Some examples would be link tags or bold/italics around a mid-sentence word.
_e( 'Publish something using our <a href="http://support.wordpress.com/post-by-email/">Post by Email</a> feature.' );
We don’t want to include URLs in the translation because we don’t want to expose them as translatable to translators. Also, if the URL is hardcoded within the string and then we ever change it, the entire string will become a new translation which will require re-translation.
printf( __( 'Publish something using our %s feature.' ), sprintf( '<a href="http://support.wordpress.com/post-by-email/">%s</a>', __( 'Post by Email' ) ) );
This code breaks the sentence up which causes a loss of full context during translation. We always try to keep full sentences/phrases together because having the whole string leads to much better translations.
Which of these is the correct way to use the single/plural _n() function?
printf( _n( '%d person has seen this post.', '%d people have seen this post.', $view_count ), $view_count );
Correct. Always use a placeholder in both singular and plural strings.
printf( _n( 'One person has seen this post.', '%d people have seen this post.', $view_count ), $view_count );
The hardcoded “One” in the singular string is problematic. We always want to use a placeholder in both singular and plural strings. Some languages (such as Russian) have multiple plurals which require the flexibility provided by using the placeholder in the singular string.
“So and so many people have seen this post” should be output like this:
printf( _n( '%d person has seen this post.', '%d people have seen this post.', $view_count ), $view_count );
Correct. We use the variable twice: 1) we need the number for the _n() function to determine the correct singular/plural text and 2) we need the number for the subsequent substitution in the printf. Also, it’s very important that the %d placeholder is used in the singular string (and not a hardcoded “1”) because some languages, such as Russian, have multiple plural forms. Those languages rely on that flexibility in the singular string.
printf( __( '%d people have seen this post.' ), $view_count );
For strings like this containing a numerical count, we want to use _n() instead because we always need to include the singular form of the string–even if the singular case should never happen. Why? Some languages, such as Russian, have multiple plural forms and they rely on flexibility provided by the singular string.
printf( _n( '%d person has seen this post.', '%d people have seen this post.' ), $view_count );
Almost. The _n() function also needs to know about the count value via its third parameter so it can determine the correct text.
printf( 1 == $view_count ? __( '%d person has seen this post.' ) : __( '%d people have seen this post.' ), $view_count );
Some languages have multiple plural forms–not just the typical singular/plural distinction–so this approach is problematic. We need to use _n() instead as it accounts for those multiple plural form complexities.
echo _n( 'One person has seen this post', "$view_count people have seen this post." );
Several things are amiss here. First, the hardcoded “One” needs to be a %d placeholder because some languages have multiple plural forms–not just the typical singular/plural distinction–and _n() with proper placeholdering handles that. The second issue is that $view_count needs to be a %d placeholder as well. Finally, all the above means that we need to switch the echo to a printf to use the placeholders and we’ll also want to add $view_count as a third argument to _n() as it expects a count value to determine which string to use.
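To see why a hardcoded “One” breaks in languages with several plural forms, here is a minimal sketch of the plural rule that gettext uses for Russian, written out by hand in plain PHP. The function name ru_plural_index() is hypothetical; WordPress evaluates the rule that ships with the translation file, so this is only an illustration of what _n() has to decide.

```php
<?php
// Sketch only: the gettext plural rule for Russian, written out by hand.
// ru_plural_index() is a hypothetical name, not a WordPress function.
function ru_plural_index( int $n ): int {
    $mod10  = $n % 10;
    $mod100 = $n % 100;

    if ( 1 === $mod10 && 11 !== $mod100 ) {
        return 0; // 1, 21, 31, ... share the "singular" form.
    }
    if ( $mod10 >= 2 && $mod10 <= 4 && ( $mod100 < 12 || $mod100 > 14 ) ) {
        return 1; // 2-4, 22-24, ...
    }
    return 2; // 0, 5-20, 25-30, ...
}

// Print which form each count selects.
foreach ( array( 1, 3, 11, 21 ) as $view_count ) {
    printf( "%d uses plural form %d\n", $view_count, ru_plural_index( $view_count ) );
}
```

Because 21 selects the same form as 1, a singular string that spells out “One” would be shown for 21 views as well. That is the flexibility the %d placeholder in the singular string provides.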
How do you deal with outputting a variable in the context of a translation?
<h1><?php printf( __( 'Hello %s' ), esc_html( $world ) ); ?></h1>
Correct. Here PHP 1) swaps in the translated string which also contains the %s placeholder, 2) escapes the $world var safely, and then 3) substitutes the now escaped $world value into the placeholder spot. Exactly what we want.
One reminder, though: if you use this piece of code you need to be sure that you have verified your translations, so that your translation of Hello %s doesn’t include malicious code. If you don’t trust your translations, you should use an esc_html( sprintf() ) construction instead of the printf.
<h1><?php printf( esc_html__( 'Hello %s' ), $world ); ?></h1>
This code is unsafe because it isn’t escaping $world at all. PHP runs esc_html__ first which swaps in the translated string (eg, "Hola %s") and then escapes it. Unfortunately, after that, printf swaps the value of $world into the placeholder which is unescaped. Danger, Will Robinson, danger!
<h1><?php echo esc_html__( sprintf( 'Hello %s', $world ) ); ?></h1>
We never want a sprintf inside a translation function. Translation files are generated by a cron job that parses (but does not execute!) PHP files looking for the translation functions. The sprintf isn’t resolved when that parsing happens, which means this code will just produce garbage translation data.
<h1><?php esc_html_e( 'Hello %s', $world ); ?></h1>
The second parameter of esc_html_e() is the text domain, not a value to substitute. We need printf here to do the variable substitution.
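The order of escaping and substitution can be checked outside WordPress with a small sketch. It assumes htmlspecialchars() as a rough stand-in for esc_html() (the real function does more), and the $translated string is a made-up example of what __( 'Hello %s' ) might return.

```php
<?php
// Assumptions: htmlspecialchars() approximates esc_html(), and $translated
// stands in for the string a translation function would return.
$translated = 'Hola %s';
$world      = '<script>alert(1)</script>';

// Escape the variable first, then substitute it into the translated string.
$safe = sprintf( $translated, htmlspecialchars( $world, ENT_QUOTES, 'UTF-8' ) );

// Escaping only the translated string leaves the substituted value raw.
$unsafe = sprintf( htmlspecialchars( $translated, ENT_QUOTES, 'UTF-8' ), $world );

echo $safe . "\n";
```

Only the first variant keeps the markup in $world from reaching the page unescaped.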
What’s the best practice to include formatted numbers in strings?
printf( _n( 'Today you already got %s view.', 'Today you already got %s views.', $view_count ), number_format_i18n( $view_count ) );
Correct. Use _n() for the possibly singular/plural string and use number_format_i18n() to actually format the number to local rules (for example some locales have a different thousand separator). We do indeed use %s here for the number because number_format_i18n() returns a formatted string.
$views = number_format( $view_count );
printf( _n( 'Today you already got %d view.', 'Today you already got %d views.' ), $views );
There are a few problems here. We want to be using number_format_i18n(). Also, number_format_i18n() produces strings, not numbers, so we need to use %s. Finally, in addition to printf, we need to give the count number to the _n() function so it knows which string variant to use.
_en_fmt( 'Today you already got %d view.', 'Today you already got %d views.', $views );
Arrowed! There isn’t a _en_fmt() function.
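The reason for %s rather than %d with a formatted number can be sketched in plain PHP. The example assumes PHP’s built-in number_format() as a stand-in for number_format_i18n(), which picks the separators from the site locale. The separators passed explicitly below are only an example.

```php
<?php
// Assumption: number_format() stands in for number_format_i18n() here.
$view_count = 1234567;

// Default English-style grouping and a grouping with '.' as thousands separator.
$formatted_en = number_format( $view_count );
$formatted_de = number_format( $view_count, 0, ',', '.' );

// %s keeps the formatted string intact.
$with_s = sprintf( 'Today you already got %s views.', $formatted_en );

// %d converts the string back to an integer and stops at the first separator.
$with_d = sprintf( 'Today you already got %d views.', $formatted_en );

echo $with_s . "\n";
echo $with_d . "\n";
```

The %d variant silently drops everything after the first separator, which is why the formatted value needs a %s placeholder.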
How to deal with multiple variables in a translated string?
printf( __( 'Posted on %1$s by %2$s.' ), $date, $username );
Almost correct. The placeholders are numbered so their values can be re-arranged if need be in translations. The remaining problem, though: translators don’t see the variable names, so they can only guess that one variable is a date and the other one is a username.
/* translators: %1$s is a date, %2$s is a username */
printf( __( 'Posted on %1$s by %2$s.' ), $date, $username );
Perfect. We make sure to number our placeholders so their values can be re-arranged if need be in translations. Also we give additional info to translators so that they can know which variable means what.
printf( __( 'Posted on %(date)s by %(username)s.' ), $date, $username );
Good thinking, but this syntax unfortunately is not available in PHP.
printf( __( 'Posted on %s by %s.' ), $date, $username );
We want to make sure we use numbered placeholders (ie, %1$s, %2$s, etc) whenever there is more than one placeholder because translators may need to re-arrange their locations in their translations.
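Here is a short sketch of what numbered placeholders buy you, using plain sprintf() so it runs outside WordPress. The $reordered string is a hypothetical stand-in for a translation that needs the username first, and the sample values are made up.

```php
<?php
// Sample values, made up for illustration.
$date     = '2015-02-17';
$username = 'alex';

$english = 'Posted on %1$s by %2$s.';

// Hypothetical stand-in for a translation that puts the username first.
$reordered = '%2$s posted this on %1$s.';

// The argument order in the code stays the same for both strings.
$english_output   = sprintf( $english, $date, $username );
$reordered_output = sprintf( $reordered, $date, $username );

echo $english_output . "\n";
echo $reordered_output . "\n";
```

With plain %s placeholders, the second string would have no way to pick the username before the date.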
Which of these is correct?
switch ( $type ) {
    case 'date':
        printf( __( 'Sorted by date' ) );
        break;
    case 'comments':
        printf( __( 'Sorted by comments' ) );
        break;
}
Correct. We want to give translators full sentences/phrases.
switch ( $type ) {
    case 'date':
        printf( __( 'Sorted by %s.' ), __( 'date' ) );
        break;
    case 'comments':
        printf( __( 'Sorted by %s.' ), __( 'comments' ) );
        break;
}
Unnecessarily breaking up sentences/phrases is a problem for translators. “Date” by itself may be translated differently from when it is used in a sentence, so we want to keep complete sentences/phrases together whenever possible.
$pattern = __( 'Sorted by %s.' );
switch ( $type ) {
    case 'date':
        printf( $pattern, __( 'date' ) );
        break;
    case 'comments':
        printf( $pattern, __( 'comments' ) );
        break;
}
This looks so efficient but unfortunately it’s wrong: essentially this is a concatenation of strings, which can’t be done in translations, because a generic translation of “date” might be wrong in the context of sorting. Or it would need to be in another grammatical case. Or other reasons. Short: don’t do that.
printf( __( 'Sorted by %s.' ), __( $type ) );
The code here won’t work because translation functions cannot be fed PHP variables. Translation files are generated by a cron job that parses (but does not execute!) PHP files looking for the translation functions. Because none of the PHP is executed, the variable is unresolved, which leads to garbage translation data (actually, the parser just rejects it).

Lossy Image Compression with Photon

If you were watching closely, you may have noticed that we recently introduced the option for lossy JPEG compression with Photon. The new parameters are quality and strip. Quality is pretty straightforward: it is the image quality out of 100. Strip refers to metadata that can be stripped from an image, namely EXIF and color data. It accepts exif, color, or all (to strip both).

For example: https://developer.files.wordpress.com/2015/02/dsc01921.jpg?w=780&quality=80&strip=all

You can drop a snippet like this in a plugin to set the quality and strip parameters for every image on the site.

add_filter('jetpack_photon_pre_args', 'jetpackme_custom_photon_compression' );
function jetpackme_custom_photon_compression( $args ) {
    $args['quality'] = 80;
    $args['strip'] = 'all';
    return $args;
}
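If you only want to adjust a single image URL by hand, the same parameters can be appended as a query string. This is a minimal sketch in plain PHP. The helper name photon_url_with_args() is hypothetical and not part of Jetpack.

```php
<?php
// Hypothetical helper: appends Photon parameters to an image URL.
function photon_url_with_args( string $image_url, array $args ): string {
    // Use '&' if the URL already carries a query string, '?' otherwise.
    $separator = ( false === strpos( $image_url, '?' ) ) ? '?' : '&';
    return $image_url . $separator . http_build_query( $args );
}

$url = photon_url_with_args(
    'https://developer.files.wordpress.com/2015/02/dsc01921.jpg',
    array(
        'w'       => 780,
        'quality' => 80,
        'strip'   => 'all',
    )
);

echo $url . "\n";
```

This reproduces the example URL shown earlier in the post.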

The results can be pretty dramatic. At full size, this image of downtown Madison goes from 16MB to 2.7MB by setting the quality to 80%. That’s a big deal on a mobile connection and it’s pretty hard to spot the difference on most images unless you’re looking at them side by side.

DSC01921

A more secure REST API

Because privacy and security are important to users across the internet, many services have begun to encrypt the connection between a user’s browser and their servers. The use of SSL (or TLS) largely eliminates the likelihood that a “man-in-the-middle” is able to monitor a user’s activities on the web. To this end, WordPress.com is joining the likes of Google and Facebook in encrypting all of the traffic sent across our network. We are currently in the process of forcing many of our services to be accessible through HTTPS exclusively.

It was previously possible to access the WordPress.com/Jetpack JSON API through HTTP only for unauthenticated requests. As part of the SSL transition, all public-api.wordpress.com endpoints are now accessible via HTTPS only. Any requests made to the HTTP version of the URL will now 301 redirect to the HTTPS version.

What does this mean for you?

For the majority of our API consumers, this won’t require any change as you are likely already using the HTTPS URLs with authenticated endpoints. If you are not, now is the time to update your API calls to the secure URLs.
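If your code stores API URLs in configuration, one way to avoid the extra redirect is to normalize the scheme before making the request. This is a small sketch in plain PHP. The function name force_https() and the example path are illustrative assumptions.

```php
<?php
// Hypothetical helper: rewrites an http:// URL to https:// and leaves
// everything else untouched.
function force_https( string $url ): string {
    return (string) preg_replace( '#^http://#i', 'https://', $url );
}

// The path below is only an example value.
echo force_https( 'http://public-api.wordpress.com/rest/v1/sites/example.com' ) . "\n";
```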

By making this change, we’re helping make the web a more secure place for our users.

As always, if you have any questions about the API, don’t hesitate to comment below or reach out to us via our developer contact form.

Version 1.1 of the WordPress.com REST API

Today, we’ve launched version 1.1 of the WordPress.com REST API. In recent weeks, we’ve been hard at work launching new features on WordPress.com, and many of these changes are powered by our REST API. When we started working on the upgrades to stats and post management, we quickly realized that the existing endpoints didn’t have all the power we needed to provide the best experience. In order to add the functionality we needed to the API without breaking existing implementations, we decided to version our API.

What does this mean for you?

If you’re already implementing version 1 of the API, you’ll be able to continue using those endpoints without changing your code for the foreseeable future. Version 1 of the API is now deprecated, so any new development you do should be against 1.1. We currently have no plans to disable version 1 of the REST API; should we ever decide to do so, we’ll give you plenty of advance notice.

Media Endpoints

  • Upload support for all file types. If you can upload it through the media explorer, you can upload it with the API. PDFs, Docs, PowerPoint files, audio files, and videos (Jetpack & .com blogs with VideoPress) are all supported.
  • Better error handling. If you upload multiple files and some fail, it’s easier to pull those out and retry.
  • Improved consistency with other endpoints and cleaned up response parameters.
  • When uploading files, you can now pass attributes like name and description without needing to do a second call to the update endpoint.
  • Bonus: The /sites/$site/ endpoint now returns a list of allowed file types.

Stats Endpoints

  • Support for pulling back stats over multiple days without those stats being grouped into a single result.
  • New stats detailing the top comment authors on your site, as well as the posts that have received the most comments.
  • In addition to chart data for views and visitors: chartable data about likes and comments.
  • Keep on track with your posting goals. The new streak endpoint contains the data to help motivate you to post more often.

We’re looking forward to seeing what you build using version 1.1. Take a look at the REST API documentation to get started. If you have any questions about the API, don’t hesitate to comment below or reach out to us via our developer contact form.

On API Correctness

Developing APIs is hard.

You pour your blood, sweat, and tears into this interface that bares the soul of your company and of your product to the world. The machinery under the hood, though, is often a lot less polished than the fancy paint job would lead the rest of the world to believe. You have to be careful, then, not to inflict your own rough edges on the people you expect to be consuming your API because…

Using APIs is hard.

As an app developer you’re trying to take someone else’s product and somehow integrate it into whatever vision you have in your head. Whether it’s simply getting a list of things from another service (such as embedding a reading list) or wrapping your entire product around another product (using Amazon S3 as your primary binary storage mechanism, for example), you have a lot of things to reconcile.

You have your own programming language (or languages) that you’re using. There’s the use case you have in mind, and the ones the remote devs had in mind for the API. There’s the programming language they used to create the API (and that they used to test it). Finally, don’t forget the encoding or representation of the data — and its limitations. Reconciling all of the slight (or major) differences between these elements is a real challenge sometimes. Despite years of attempts at best practices and industry standards, things just don’t always fit together like we pretend that they will.

As a developer providing an API it’s important to remember three things. There are obviously many other things to consider, but these three things are more universal than most.

#1 You want people to use your API.

Unless you’re developing a completely internal API, you’re hoping that the world sees your API as something amazing, and that your functionality starts popping up in other magical places without any further effort on your part.

#2 You have no control over what tools others are using.

Are you using a language that has little or no variable type enforcement? Some people aren’t. Some of those people still want to use your product. Did you come up with your own way of doing things with custom code instead of using widely-adopted industry standards (which, being widely deployed, come with battle-tested libraries in many languages)? Did that cause you to release a client in your own language (how about Clojure, how about Erlang, how about C++, how about Perl, how about…)?

#3 Your API is a promise.

It’s easy to forget (especially for those of us who spend our time in a forgiving language such as PHP or Python) that the API we provide is a promise to the rest of the world. What it promises is this: “When you provide me with ${this} I will provide you with ${that}”.

The super-important (and insidiously non-obvious) thing about this is that if you do not provide a written promise (in the form of your API’s documentation), then the behavior of your API becomes the implicit promise.

The most important thing to note here is that when your documentation is wrong, the promise of your actual behavior wins every single time.

Keep your promises

When your promises don’t match your actual results things get hairy.

Let’s take a look at a completely hypothetical situation.

  1. You have an API that is documented to return a JSON object with a success member which should be a boolean value.
  2. You have a case (maybe all cases) where success is actually rendered as an integer (0 for false, 1 for true).
  3. John has an app written in a strongly-typed language that works around this by defining success as an integer type instead of a boolean type. Because John was busy, he never got around to letting you know. Or maybe John never knew because he simply inspected your API and worked backwards from the responses that you gave. Now John’s app has 100k users depending on this functionality.
  4. Mary is writing an app, and because Mary doesn’t like to play fast and loose (and she doesn’t want her app to break later on) she submits an issue pointing out that you are returning the wrong type.

At this point you are trapped. The existing user base (and by extension their user base) is committed to integers. And you only have four options.

  1. You can cripple an existing and deployed application enjoyed by 100k users.
  2. You can version your API — an entire new version to correct what should be a boolean value.
  3. You can work with John to roll out a new version of the app which can handle both (but maybe his app is in the iOS app store, and getting everyone to update is impossible, takes a long time, and/or would require a lengthy, and potentially costly, review process by yet another party).
  4. You make a really sad face and change your promise — to reflect that you are going to do what is actually the less correct thing, forever.

Because you wrote an API whose promise was wrong, or whose promise was missing, you have painted yourself into a very undesirable corner. You’re now in a place where doing the right thing for the right reasons is the wrong move.

So do yourself, and everyone else, one of two favors — depending on the position in which you find yourself.

If you’re producing an API, take extra care to make sure that your results match your documentation (and you need to have documentation).

If you’re consuming an API, don’t be like John. Don’t work backwards from the data — work forwards from the docs. And if the docs are wrong you should submit a ticket and wait for it to be fixed (or at the very, very least, make sure your workaround deals with both the documented expectation and the actual incorrect return value).
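For the consumer side, here is a minimal sketch of a workaround that accepts both the documented boolean and the observed integer from the hypothetical example above. The function name read_success_flag() is made up for illustration.

```php
<?php
// Hypothetical helper for the success example: accepts the documented type
// (boolean) and the observed one (integer 0 or 1), and rejects anything else.
function read_success_flag( string $json ): bool {
    $data = json_decode( $json, true );

    if ( ! is_array( $data ) || ! array_key_exists( 'success', $data ) ) {
        throw new UnexpectedValueException( 'Response has no "success" member.' );
    }

    $value = $data['success'];

    if ( is_bool( $value ) ) {
        return $value; // What the documentation promises.
    }
    if ( 0 === $value || 1 === $value ) {
        return 1 === $value; // What the API actually returns.
    }

    throw new UnexpectedValueException( 'Unexpected type for "success".' );
}

var_dump( read_success_flag( '{"success": true}' ) );
var_dump( read_success_flag( '{"success": 1}' ) );
```

Written this way, the app keeps working if the API is later corrected to match its documentation.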

In conclusion

Just like a child, it takes a village to raise a good, decent, hard working API.

Data for nothing and bytes for free

WordPress.com is a freemium service, meaning that our awesome blogging platform is provided for free to everyone, and we make money by selling upgrades. We process thousands of user purchases each week and you might expect that we know a lot about our customers. The truth is, we are still learning. In this post, we will give you some insights into how we try to understand the needs and behaviors of users who buy upgrades.

We know there are many kinds of users and sites on WordPress.com. To understand the needs of users who purchase upgrades, one would naturally analyze their content consumption and creation patterns. After all, those two things should tell us everything about our users, right?

Somewhat surprisingly, the median weekly number of posts or pages a user creates and the median weekly number of likes and comments a user receives are both zero! And I’m not talking about dormant users. These are our paying customers. There are lots of reasons for this, like static sites that don’t need to change very often, or blogs with a posting frequency lower than weekly. But it doesn’t give us much data to work with. Well, let’s start with something that IS known about every user: their registration date.

Thousands of users register daily on WordPress.com. What does the day of the week on which a user registered with us say about their purchasing preferences? Is it possible that users who register during the week are more work-oriented, and users who register during weekends are more hobby-oriented? To answer this question, we’ll look at purchases that were made in our online store between March and September 2013.

We’ll divide the purchasing users into two groups: those who registered between Monday and Friday (let’s call them “workweek users”) and those who registered on Saturday or Sunday (let’s call them “weekend users”).


Side note: To a first approximation, we use the registration time in GMT to label a user as “registered on weekend” or “registered during the workweek”. We also ignore the differences in weekend days that exist between countries. These are non-trivial approximations that make the analysis simpler and do not invalidate the answer to our question.
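The labeling rule can be sketched in a few lines of PHP. The function name registration_group() is hypothetical, and the input is assumed to be a Unix timestamp of the registration.

```php
<?php
// Hypothetical helper: labels a registration timestamp by its GMT weekday.
// gmdate( 'N' ) returns the ISO-8601 day of the week, 1 (Monday) to 7 (Sunday).
function registration_group( int $timestamp ): string {
    return ( (int) gmdate( 'N', $timestamp ) >= 6 ) ? 'weekend' : 'workweek';
}

// 2 March 2013 was a Saturday and 4 March 2013 a Monday.
echo registration_group( gmmktime( 12, 0, 0, 3, 2, 2013 ) ) . "\n";
echo registration_group( gmmktime( 12, 0, 0, 3, 4, 2013 ) ) . "\n";
```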

To examine the purchasing patterns of these groups, let’s calculate the fraction of purchases that each product accounts for. For example: the most prevalent products in both categories were [domain mapping and registration](http://en.support.wordpress.com/domains/). These two products, which are usually bought together, are responsible for about 35% of upgrades bought by our workweek and weekend users. Let us now continue this comparison using a graph:

correlation_between_purchases

What do we learn from this comparison? Almost nothing. This is not surprising, as the distribution of purchases is mostly determined by factors such as user preferences, demand, price etc.

Let’s look for more subtle differences. We’ll use a technique known as a Bland-Altman plot. These British statisticians noted that plotting one value versus another implies that the one on the X axis is the cause and the one on the Y axis is the result. An alternative implication is that the X axis represents the “correct value”. Neither of these is correct in our case. We are interested in understanding the agreement (disagreement, to be more precise) between two similar measurements, when neither of the two is superior to the other. Thus, instead of plotting the two closely correlated metrics (purchase fractions in our case) against each other, we should plot their average values on the X axis and their difference on the Y axis. In this domain, higher X axis values designate more prevalent products, positive Y values designate preference towards the working days and negative Y values designate preference towards the weekend. This is what we get after transforming the fractions to the logarithmic domain:

altman_bland_1
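For readers who want to reproduce the transformation, here is a minimal sketch of how one point of such a plot could be computed. The function name bland_altman_point() is hypothetical, the natural logarithm is an assumption (the post does not say which base was used), and the fractions below are made-up numbers, not our data.

```php
<?php
// Hypothetical helper: turns one product's purchase fractions in the two
// groups into a point of the plot, after taking logarithms.
function bland_altman_point( float $workweek_fraction, float $weekend_fraction ): array {
    $log_workweek = log( $workweek_fraction );
    $log_weekend  = log( $weekend_fraction );

    return array(
        'mean'       => ( $log_workweek + $log_weekend ) / 2, // X axis
        'difference' => $log_workweek - $log_weekend,         // Y axis
    );
}

// Made-up fractions for illustration only.
print_r( bland_altman_point( 0.35, 0.35 ) ); // No preference: difference is 0.
print_r( bland_altman_point( 0.04, 0.02 ) ); // Workweek preference: difference is positive.
```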

Now things become interesting. Let us take a look at some of the individual points:

altman_bland_emphesis

As I have already mentioned, domain mapping and registration are the most popular products. Not surprisingly, these products are equally liked by weekend and workweek users. Recall our initial intuition that users who register during weekends will be more hobby-oriented and users who register during the week will be more job-oriented. We now have some data that supports this intuition. Of all the products, private registration, followed by space upgrades, has the strongest bias towards weekend users. Indeed, one would expect personal users to care about their privacy much more than corporate ones. Being more cost-sensitive, personal users are more likely to purchase a space upgrade rather than one of our plans. The opposite side of the dividing line makes sense too: blocking ads is the cheapest option to differentiate a workplace site, followed by custom design. These two options are included in all our premium plans, but I can understand how a really small business would prefer buying some individual options.


Another note: If you are worried about the statistical significance of this analysis, you are completely right. I don’t show this here, but exactly the same picture appears when we analyze data from different time periods.

So what?

As an app developer, you will at some point be frustrated about how little you know about your customers. Don’t give up! Start with the small things that you know. Things such as day of the week, geographical location and browser version may shed useful light and you can build out a picture from there, adding to it bit by bit. Having such information is like gardening: it sounds like a lot of work, but you might be surprised at what you can get from a little investment of time. With determination (asking lots of questions) and creativity (looking at a problem from new angles, starting with information you already have) and the right tools in your hands, you can learn something about your users and grow your garden of understanding.

OAuth2 Global Scope Tokens

The WordPress.com REST API has enabled developers to create rich applications to interact with blogs hosted on WordPress.com or hosted elsewhere when used with the Jetpack plugin. Until now, it’s only been possible to request an authorization token for a single blog at a time, but we’re happy to announce that this limitation has been lifted. Starting today, you can request access to all sites to which a user has administrative access by using the global scope option with our existing OAuth2 authentication process.

To use the new global scope, redirect your users to the OAuth2 authorization endpoint below to request access to all of the user’s sites:

https://public-api.wordpress.com/oauth2/authorize?client_id=your_client_id&redirect_uri=your_url&response_type=code&scope=global
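Rather than concatenating the query string by hand, the URL can be built with http_build_query(), which also takes care of encoding the redirect URI. This is a minimal sketch. The function name wpcom_authorize_url() and the callback URL are placeholders.

```php
<?php
// Hypothetical helper: builds the authorization URL with the global scope.
function wpcom_authorize_url( string $client_id, string $redirect_uri ): string {
    $query = http_build_query(
        array(
            'client_id'     => $client_id,
            'redirect_uri'  => $redirect_uri,
            'response_type' => 'code',
            'scope'         => 'global',
        )
    );

    return 'https://public-api.wordpress.com/oauth2/authorize?' . $query;
}

// Placeholder values: substitute your own client ID and redirect URL.
$authorize_url = wpcom_authorize_url( 'your_client_id', 'https://example.com/callback' );
echo $authorize_url . "\n";
```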

The user will be presented with an improved authorization screen to more clearly reflect the permissions being granted to your application, as seen in the screenshot below.

global_authorization

You can learn more about the OAuth2 authentication flow at our detailed support article.

If the user chooses to grant you access to all of their sites, you will receive a token which includes a scope value of “global”.

{
    "access_token": "YOUR_API_TOKEN",
    "token_type": "bearer",
    "scope": "global",
    "blog_id": 0,
    "blog_url": null
}

Once you’ve received your access token, you can view all of the user’s sites by making a request to the /me/sites endpoint.
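On the receiving side, a minimal sketch of checking the token response might look like this. It only decodes the JSON shown above and prepares a bearer Authorization header. How you send the request, and the variable names used here, are assumptions left to your HTTP client of choice.

```php
<?php
// The response body from the example above, as a JSON string.
$response_body = '{"access_token":"YOUR_API_TOKEN","token_type":"bearer","scope":"global","blog_id":0,"blog_url":null}';

$token = json_decode( $response_body, true );

// A global token carries the scope value "global".
$is_global = is_array( $token )
    && isset( $token['scope'] )
    && 'global' === $token['scope'];

if ( $is_global ) {
    // Header to send along with a request to the /me/sites endpoint.
    $authorization_header = 'Authorization: Bearer ' . $token['access_token'];
    echo $authorization_header . "\n";
}
```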

It’s important to consider whether or not your application needs access to all of a user’s sites or if working with a single blog at a time is sufficient. As you might expect, users will tend to be more cautious when granting access to all of their sites to an unfamiliar application.

We hope that this new feature will enable you to build more powerful applications where it’s useful to manage more than one site to which a user has access. If you have any questions, leave a comment below or use our contact form to reach us directly.