New WordPress unified API console

Since the WordPress 4.7 “Vaughan” release, each WordPress installation includes REST API endpoints to access and manipulate its content.  These endpoints will be the foundation for the next generation of WordPress websites and applications.

Today we’re releasing a brand new Open Source WordPress API console. You can use it to try these endpoints and explore the results. The console works for any website on WordPress.com and also for any self-hosted WordPress installation.

Using the console with WordPress.com APIs

You can use this application today to make read and write requests to the WordPress.com REST API, or to the WordPress REST API for any website hosted on WordPress.com or using Jetpack. Visit the new version of the application here:

Using the console with your self-hosted WordPress sites

To use the console with your self-hosted WordPress installation(s), you’ll need to download the application from GitHub, configure it, and run it on your local machine. You’ll also need to install the WP REST API – OAuth 1.0a Server plugin on your WordPress site. The Application Passwords plugin is another option, but if you use it, make sure that your site is running over HTTPS; otherwise, this configuration is insecure.

Full installation and configuration instructions are on the GitHub repository.

Technical Details

The console is a React/Redux application based on create-react-app that persists its state to localStorage.

What’s next?

We have a few more features planned that we think you’ll like.

  • We plan to use the new console application to let you easily generate and save OAuth2 tokens for your API applications. Compared to implementing the OAuth2 flow yourself, this will be a much easier way to obtain an API token for testing your applications.
  • We also plan to ship the console as a regular WordPress plugin, replacing the existing older plugin.
  • We could also let you add and edit self-hosted WordPress websites on our hosted version of the console and persist them to localStorage. This way you’ll be able to query your WordPress sites without having to install the console yourself.


As usual, the new console is open source, and we hope this will be a tool that will benefit the entire WordPress community.

If you find a bug, think of a new feature, or want to make some modifications to the API console, feel free to look through existing issues and open a new issue or a PR on the GitHub repository. We welcome all contributions.

Goes Open Source

We’re happy to announce that last week we open-sourced two projects: Delphin and its sibling, Delphin Bootstrap. Delphin powers the site we built for smooth registration of .blog domains. Delphin Bootstrap is a portable development environment for Delphin that makes it easy to get Delphin up and running on any operating system.

One of Delphin’s main goals was to simplify the process of registering and managing a domain. We’ve focused on the user experience, trying to avoid as much industry jargon as possible, while keeping in mind that purchasing a domain is just the first step in the journey toward a larger goal, like telling a story or selling products.

Delphin is a web application that uses parts of the WordPress.com API, as well as a few more specific endpoints built on the new WordPress REST API. Like Calypso, the WordPress.com front end, it is based on technologies such as React, Redux, and Webpack. We also used Delphin as a platform to experiment with new technologies including Yarn, Jest, CSS modules, and React Router. Some of these experiments will make it back to Calypso.

We wanted to share some of our findings:

  • Yarn is a fantastic package manager. We had a few hiccups when we integrated it with CircleCI, but it now works smoothly.
  • CSS modules was a great addition to this project, as it makes it possible to use semantic names for the CSS classes of components, instead of coming up with a convention to prevent namespace collisions in a global stylesheet.
  • We ended up using a lot of wonderful tools and modules shared by the community.

At Automattic, we’ve been open-sourcing code for more than a decade now, and have more than 350 repositories on GitHub. We’re excited to add Delphin to this list, and welcome contributions, feedback, or reported issues!

State of Elasticsearch Systems 2016

We get asked periodically how extensively we use Elasticsearch. It has come up twice in the past week, so it’s time to write a blog post.

We are constantly expanding what we use Elasticsearch for, so although some previous posts have broadly defined what we are doing, they don’t really capture the continually expanding scale.

So here are some quick bullet points about what we currently have deployed:

  • Five clusters with a mix of versions:
    • 42 data nodes spread across 3 US data centers running ES 1.3.4. This cluster mostly runs related posts queries. 1925 shards. 11B docs. 43TB of data. 60m queries/day. 12m index ops/day (has been as high as 940m in a day though). Each index is 175 shards and has 10m blogs in it. Each blog is routed to a single shard so almost all queries only hit one shard, but we can (and do) search across multiple shards for some use cases.
    • 6 data nodes across 3 DCs running ES 1.3.9. Hosts our VIP indices and lots of other use cases. 321 indices (mostly VIPs). ~8m queries/day. ~1.5m index ops/day. Typical VIP index config is a single shard that is replicated across the three data centers. Most of these indices are small enough that sharding would reduce performance and reduce query relevancy.
    • 12 data nodes across 3 DCs running ES 1.7.5. Primarily powers Indexes the past 6 quarters of all posts. One index per quarter with 30 shards per quarter. Queries typically hit all 180 shards.
    • 3 data nodes across 3 DCs running ES 2.3.1. Currently an experimental cluster as we work to migrate to 2.x. Only production index right now is for
    • 15 (and possibly expanding to 100) data nodes for a Logstash cluster running ES 2.3. A lot of logging use cases for many different services. Growing rapidly.
  • All of our clusters use three dedicated master nodes with one master in each data center. The first cluster has its own master nodes. The next three share master servers with multiple instances of ES running on each server.
  • Typical data server config:
    • 96GB RAM with 31GB for ES heap. The remainder is used for file system caching.
    • 1-3 TB of SSD per server. In our testing SSDs are very worthwhile.
  • Query speed:
    • Related Posts: median: 44ms; 95th percentile: 190ms; 99th percentile: 650ms. This is much lower than when we launched in 2013, when the 99th percentile was 1.7 seconds.
    • VIP Queries: median: 25ms; 95th percentile: 109ms; 99th percentile: 311ms
    • queries: median: 130ms; 95th percentile: 250ms; 99th percentile: 260ms
  • Client-side Optimizations:
    • We cache all query results in memcache, which cuts our ES query rate in half
    • memcache timeouts vary from 30 seconds to 36 hours depending on the use case
    • We analyze all queries on the client side and optimize the ES filters:
      • We have a blacklist of fields that we never cache (blog_id, post_id, author_id) because they have such high cardinality (100m+ unique IDs)
      • We rewrite and/or/not filters into bool queries and try to flatten them into a single filter
      • We don’t allow some types of queries (we have a whitelist)
      • We don’t allow facets/aggregations on certain fields (content, title, excerpt)
    • We generally don’t allow paging too deep or returning thousands of results at once
    • A general pattern is to use ES to get the IDs for content, and then fetch the real content from MySQL for display. This reduces the data ES needs to hold (we strip out HTML), and we can be certain the displayed data is not out of date, since ES can be up to 60 seconds behind in some cases (though it is typically less than 5 seconds behind).
  • Query Use Cases (in order of query frequency):
    • Related Posts
    • Replacing WP_Query calls by converting slow SQL calls to an ES query (WordPress tag/category pages, home pages, etc)
    • Language Detection using ES langdetect plugin (used for every post we index)
    • Analyze API (used to perform reliable word counting regardless of language – in conjunction with the langdetect call)
    • Blog Search (replacing the built in WordPress site search)
    • Theme Search
    • Search queries that are used when reindexing content (e.g., when a blog’s tag is renamed we need to search for all posts with that tag and reindex them)
    • Various support searches
    • A number of custom VIP use cases
    • A number of custom internal use cases (searching our p2s, suggesting posts that may be relevant to read, searching our internal docs, etc)
    • Calypso /posts and /pages for getting/searching all posts a user has authored across all their blogs (potentially hundreds)
  • ES Plugins Deployed:
    • Whatson for looking at shard distribution, disk usage, index size, etc
    • StatsD for performance monitoring (we also send StatsD data from the client about query speed) See the screenshots of dashboards below.
    • ICU Analysis
    • Langdetect
    • SmartCN and Kuromoji Analyzers
    • Head
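As a rough illustration of the single-shard routing described for the related-posts cluster (each blog routed to one shard of a 175-shard index), the sketch below maps a blog ID to a shard so that every document from a blog lands in the same place. It is only a sketch under stated assumptions: Elasticsearch computes the shard from its own internal hash of the `_routing` value, whereas here `zlib.crc32` stands in for that hash, and `NUM_SHARDS`, `shard_for_blog`, and `shards_to_query` are hypothetical names.

```python
import zlib
from typing import Iterable, Set

# Hypothetical constant matching the 175-shard indices described above.
NUM_SHARDS = 175


def shard_for_blog(blog_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a blog ID (used as the routing value) to a shard number.

    Assumption: crc32 stands in for Elasticsearch's internal routing hash;
    the principle (hash of the routing value modulo the shard count) is the same.
    """
    routing_value = str(blog_id).encode("utf-8")
    return zlib.crc32(routing_value) % num_shards


def shards_to_query(blog_ids: Iterable[int]) -> Set[int]:
    """Shards a query must touch when it is restricted to the given blogs."""
    return {shard_for_blog(blog_id) for blog_id in blog_ids}


if __name__ == "__main__":
    # All documents of one blog share a shard, so a single-blog query hits one shard.
    print(len(shards_to_query([12345, 12345])))  # 1
    # A cross-blog query fans out to the shards of every blog involved.
    print(sorted(shards_to_query([1, 2, 3])))
```

The payoff is the one described in the list: a related-posts query for a single blog only needs to search one shard, while cross-blog searches still work by fanning out to several.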


Since images are always fun, here are our Grafana dashboards for our largest cluster over the past 6 hours. The first is our client-side tracking of query/indexing/etc. speed.


Second is our aggregated stats (from the StatsD plugin) about the cluster’s performance:


This cluster/index has been really solid for us over the past two years since it was last built. We have some known issues that keep us stuck on 1.3.4, but we’ve also had times when the cluster went many months without any incidents. In general, the incidents we have seen were caused by external factors (usually over-indexing or some other growth in the data).
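The filter rewriting listed under client-side optimizations (turning and/or/not filters into bool filters and flattening them into a single filter) might look roughly like the Python sketch below. It assumes the Elasticsearch 1.x filter DSL shapes for `and`, `or`, `not`, and `bool`, and `to_bool` is a hypothetical helper for illustration, not our production client code.

```python
from typing import Any, Dict, List

Filter = Dict[str, Any]


def _children(value: Any) -> List[Filter]:
    """Assumed ES 1.x shapes: {"and": [...]} or {"and": {"filters": [...]}}."""
    if isinstance(value, dict):
        value = value.get("filters", [])
    return list(value)


def to_bool(f: Filter) -> Filter:
    """Rewrite and/or/not filters into bool filters, flattening nested clauses."""
    if "and" in f:
        must: List[Filter] = []
        must_not: List[Filter] = []
        for child in map(to_bool, _children(f["and"])):
            clauses = child.get("bool")
            # A bool holding only must/must_not clauses merges into its parent AND.
            if clauses is not None and set(clauses) <= {"must", "must_not"}:
                must.extend(clauses.get("must", []))
                must_not.extend(clauses.get("must_not", []))
            else:
                must.append(child)
        merged: Dict[str, List[Filter]] = {}
        if must:
            merged["must"] = must
        if must_not:
            merged["must_not"] = must_not
        return {"bool": merged}
    if "or" in f:
        should: List[Filter] = []
        for child in map(to_bool, _children(f["or"])):
            clauses = child.get("bool")
            # An OR nested in an OR: its should clauses merge into the parent.
            if clauses is not None and set(clauses) == {"should"}:
                should.extend(clauses["should"])
            else:
                should.append(child)
        return {"bool": {"should": should}}
    if "not" in f:
        inner = f["not"]
        # Assumed ES 1.x shapes: {"not": {...}} or {"not": {"filter": {...}}}.
        if isinstance(inner, dict) and "filter" in inner:
            inner = inner["filter"]
        return {"bool": {"must_not": [to_bool(inner)]}}
    return f  # leaf filters (term, range, ...) are left untouched


if __name__ == "__main__":
    nested = {
        "and": [
            {"term": {"lang": "en"}},
            {"and": [{"term": {"blog_id": 1}}, {"not": {"term": {"status": "draft"}}}]},
        ]
    }
    # Three levels of nesting collapse into one bool with must and must_not lists.
    print(to_bool(nested))
```

Merging is only done when it cannot change the result: an AND of conjunctions is a single conjunction, and an OR of disjunctions is a single disjunction; mixed clauses are kept nested.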




What’s new in Calypso?

December Edition

It’s been almost a month since we released Calypso and the response has been great from the community. For those following the project more closely, we’ll be publishing summaries on new developments, focusing on framework-level improvements, new components, and the tools contributors have to work with.

Making Calypso more welcoming for developers and designers

If you install Calypso locally and point your browser to calypso.localhost:3000/devdocs/welcome, you’ll find a new developer flow that introduces you to our documentation. You can access the docs anytime from the environment badge in the lower left corner, which also shows the git branch you are on.


  • Upgraded to Node v4 and React 0.14.
  • Started implementing Redux, a state container solution to manage data flow in the application. If you are interested in this, we are gradually moving our different data modules to Redux.
  • We refined and documented our approach to components.
  • Began exploring how pluggable modules could work in Calypso.
  • Continued migrating to SVG icons everywhere instead of the old icon font. We added a few new gridicons as well.

New components and updates

Components are the building blocks of the Calypso UI. We constantly refine them and build new ones, from simple user interface elements to those carrying more complex functionality. This allows us to craft interfaces that are consistent and rich. You can check out all of these if you go to calypso.localhost:3000/devdocs/design, our live components gallery. Here are some of the updates we made this month.


We added a borderless variation for one of our most used components.


Added a compact variation.


A utility component, primarily meant for setting up a poller interface wrapping another component.


A dropdown component for selecting a site, with instant search and handling of private sites.



A component that renders other mocked components with a faded effect to illustrate a section when, for some reason, the user cannot access it.


Consolidated Notices into a single component in components/notice. Also added a new compact variation with flexbox magic for narrow layouts.


The core component for displaying a site card now supports a homeLink prop, which turns it into a link to the site’s homepage and shows an icon on hover.


Component used to render individual post items. Now supports a “selected” prop to highlight a single draft in a list. (Used in the editor.)


Now also supports custom icons.

Stay tuned to this blog for upcoming Calypso news and updates.

The Story Behind the New WordPress.com

A little over a year and a half ago, we had a dramatic rethink of the technologies and development workflows for building with WordPress.

Our existing codebase and workflows had served us well, but ten years of legacy was beginning to seriously hinder us from building the modern, fast, and mobile-friendly experiences that our users expect. It seemed like collaboration between developers and designers was not firing on all cylinders. So we asked ourselves the question:

“What would WordPress.com look like if we were to start building it today?”

A New Beginning: Prototyping and Iterating

We’d asked ourselves this question before, and had our fair share of initiatives that didn’t result in useful change. Looking back, we were able to pinpoint our biggest mistakes: we’d been starting with a muddy vision, and were trying to solve an ill-defined problem. These insights really helped us change our approach.


One of the original Calypso prototype screens, listing all of your WordPress sites.

Calypso, the codename for this new WordPress admin interface project, started differently. To present a clear vision, we built an aspirational HTML/CSS design prototype, based on clearly defined product goals, that allowed us to imagine what a new WordPress.com could look like when complete. We knew it would change over time as we launched parts to our users, but the vision provided all of Automattic with something to aim for and get excited about.

Once the Calypso prototype was in a good place, the early days of development were all about making tough decisions, such as which language to use, whether to use a framework, and how we would extend our API. Automattic had just acquired Cloudup, an API-powered file-sharing tool built with JavaScript. The Cloudup team showed us a solid, maintainable, and scalable path towards making WordPress.com completely JavaScript-based and API-powered.

Since WordPress is a PHP-powered application, our company-wide development skill-set has historically been PHP-heavy with a sprinkling of advanced JavaScript. This made Calypso intimidating to other engineers and designers at the company for much of the first six months of its development — we were building something that few people could jump in on.

Even core Calypso project team members had to get over our intimidation. None of us were strong JavaScript developers. But as each day passed, our experience grew: we made mistakes, we reviewed them, we fixed them, and we learned. Once we had the project moving, we set better examples for other engineers and shared our knowledge across the company.

One great change came out of building an early design prototype: improved collaboration using GitHub. Calypso prototyping was done collaboratively between a handful of designers in GitHub; although many of us had long used GitHub for personal projects it was relatively new for internal projects, which historically used Trac for most project management and bug tracking. Using GitHub helped us see how much easier internal collaboration could be, and how to allow for much greater feedback on individual work being done.


Peer code reviews show no sign of slowing up and are now widely accepted.

Because GitHub had worked so well for the prototyping stage, we switched to it for all Calypso development, allowing us to harness the pull request (PR) system for peer code reviews and to build our own custom GitHub-based workflow. Code reviews were new for many developers: traditionally at Automattic, we have had no systematic peer code review system outside of the VIP team’s daily code review of client sites. Although code review initially added to the intimidation of starting to work with Calypso, it greatly increased the quality of our codebase and helped everyone level up their JavaScript skills.

What started as a team of seven people working on Calypso quickly spread to a cross-section of teams, with ten, then 14, then 20 Automatticians actively working in the Calypso codebase. Two months after the launch of the first Calypso-powered feature on WordPress.com, we had 40 contributors working on Calypso across five different teams. Over the next year we iterated with the “release early, release often” Automattic mindset, launching 40 distinct Calypso-powered features on WordPress.com with over 100 individual contributors.

By the middle of 2015, the Calypso codebase was in good enough shape to be used outside of the web browser. Since Calypso is entirely JavaScript, HTML, and CSS, it can run locally on a device with a lightweight Node.js server setup. Using a technology called Electron, we built native desktop clients running the same code bundled inside the applications. We started work first on a native Mac desktop app, which is now available, and continued that work on soon-to-be-launched Windows and Linux apps. Seeing these apps come together and using them internally really started to justify all the hard work we’d spent building the Calypso codebase.

Open Sourcing Calypso, the Power Behind WordPress.com

One of our Calypso developer hangouts in progress, and Team IO, who built the Calypso editor, at our all-company Grand Meetup in October.

Over the past year and a half, Calypso has gone from an idea to an aspirational prototype to a fully functioning product built, launched iteratively, and used by millions of users. Internally, it’s been a period of great change and growth. We’ve embraced cross-team collaboration through GitHub and peer code reviews through the PR review system, gone from just a couple of great JavaScript developers to a company full of them, and seen incredible collaboration between designers and developers on a daily basis.


A handy chart to show the differences between the old and new WordPress.com (pdf, img)

We’re proud to be able to open source all of the hard work we’ve put in, and to continue to build on the product in an open way. You can read more about opening up Calypso development on our CEO Matt Mullenweg’s site.

Over the next few months, we’ll publish more in-depth posts exploring the technical details and workflows behind Calypso: how we manage our own unique GitHub flows, how we’ve used other popular open source libraries like React and concepts like Flux, and our experiences bundling and launching native app clients. Keep an eye out for those by following this blog (in the bottom right), and in the meantime, check out the active Calypso codebase as we continue to iterate on it.

Andy Peatling
Calypso Project Lead

Data for nothing and bytes for free

WordPress.com is a freemium service, meaning that our awesome blogging platform is provided for free to everyone, and we make money by selling upgrades. We process thousands of user purchases each week, and you might expect that we know a lot about our customers. The truth is, we are still learning. In this post, we will give you some insights into how we try to understand the needs and behaviors of users who buy upgrades.

We know there are many kinds of users and sites on WordPress.com. To understand the needs of users who purchase upgrades, one would naturally analyze their content consumption and creation patterns. After all, those two things should tell us everything about our users, right?

Somewhat surprisingly, the median weekly number of posts or pages a user creates, and the median weekly number of likes and comments a user receives, is zero! And I’m not talking about dormant users. These are our paying customers. There are lots of reasons for this, like static sites that don’t need to change very often, or blogs that publish less often than weekly. But it doesn’t give us much data to work with. Well, let’s start with something that IS known about every user: their registration date.

Thousands of users register daily on WordPress.com. What does the day of the week on which a user registered with us say about their purchasing preferences? Is it possible that users who register during the week are more work-oriented, and users who register during weekends are more hobby-oriented? To answer this question, we’ll look at purchases made in our online store between March and September 2013.

We’ll divide the purchasing users into two groups: those who registered between Monday and Friday (let’s call them “workweek users”) and those who registered on Saturday or Sunday (let’s call them “weekend users”).

Side note: As a first approximation, we use the registration time in GMT to label a user as “registered on the weekend” or “registered during the workweek”. We also ignore differences between countries in which days count as the weekend. These are non-trivial approximations that make the analysis simpler and do not invalidate the answer to our question.
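A minimal sketch of the labeling rule in the side note: convert the registration timestamp to GMT and check whether it falls on a Saturday or Sunday. `registration_group` is a hypothetical helper, and treating timestamps without a time zone as GMT is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def registration_group(registered_at: datetime) -> str:
    """Label a user "weekend" or "workweek" by the GMT day of their registration.

    Assumption: naive datetimes (no tzinfo) are already in GMT/UTC.
    """
    if registered_at.tzinfo is None:
        registered_at = registered_at.replace(tzinfo=timezone.utc)
    gmt = registered_at.astimezone(timezone.utc)
    # weekday(): Monday is 0, ..., Saturday is 5, Sunday is 6.
    return "weekend" if gmt.weekday() >= 5 else "workweek"


if __name__ == "__main__":
    # Friday 2013-03-01 at 23:30 in UTC-5 is already Saturday 04:30 GMT,
    # so the GMT approximation labels it a weekend registration.
    friday_evening = datetime(2013, 3, 1, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
    print(registration_group(friday_evening))  # weekend
    print(registration_group(datetime(2013, 3, 4, 9, 0)))  # workweek (a Monday)
```

The first example shows exactly the kind of edge case the side note accepts: a registration that was local Friday evening is counted as a weekend one.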

To examine the purchasing patterns of these groups, let’s calculate the fraction of purchases that each product accounts for. For example, the most prevalent products in both groups were domain mapping and registration. These two products, which are usually bought together, are responsible for about 35% of upgrades bought by our workweek and weekend users. Let us now continue this comparison using a graph:


What do we learn from this comparison? Almost nothing. This is not surprising, as the distribution of purchases is mostly determined by factors such as user preferences, demand, and price.

Let’s look for more subtle differences. We’ll use a technique known as the Bland-Altman plot. Its authors, two British statisticians, noted that plotting one value against another implies that the one on the X axis is the cause and the one on the Y axis is the result. An alternative implication is that the X axis represents the “correct value”. Neither is true in our case. We are interested in the agreement (or, more precisely, the disagreement) between two similar measurements, neither of which is superior to the other. Thus, instead of plotting the two closely correlated metrics (purchase fractions in our case) against each other, we plot their average on the X axis and their difference on the Y axis. In this representation, higher X values designate more prevalent products, positive Y values designate a preference towards the workweek, and negative Y values designate a preference towards the weekend. This is what we get after transforming the fractions to the logarithmic domain:


Now things become interesting. Let us take a look at some of the individual points:


As I have already mentioned, domain mapping and registration are the most popular products. Not surprisingly, these products are equally liked by weekend and workweek users. Recall our initial intuition that users who register during weekends will be more hobby-oriented and users who register during the week will be more work-oriented. We now have some data that supports this intuition. Of all the products, private registration, followed by space upgrades, has the strongest bias towards weekend users. Indeed, one would expect personal users to care about their privacy much more than corporate ones. Being more cost-sensitive, personal users are also more likely to buy a space upgrade than one of our plans. The opposite side of the dividing line makes sense too: blocking ads is the cheapest option to differentiate a workplace site, followed by custom design. These two options are included in all our premium plans, but I can understand why a really small business would prefer buying individual options.

Another note: If you are worried about the statistical significance of this analysis, you are right to be. I don’t show it here, but exactly the same picture appears when we analyze data from different time periods.
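For readers who want to reproduce this kind of plot, here is a minimal Python sketch of the computation described above: per-product purchase fractions for each group, transformed to logarithms, then reduced to their mean (X) and difference (Y). The function names and the use of the natural logarithm are assumptions (the base only rescales the axes), the counts in the usage example are made up, and the plotting itself is left out.

```python
import math
from typing import Dict, Tuple


def purchase_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn per-product purchase counts into fractions of the group's total purchases."""
    total = sum(counts.values())
    return {product: n / total for product, n in counts.items()}


def bland_altman_points(
    workweek: Dict[str, float], weekend: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Return {product: (x, y)} for a Bland-Altman plot in the log domain.

    x = mean of the two log fractions (how prevalent the product is overall)
    y = log(workweek) - log(weekend): positive leans workweek, negative leans weekend
    """
    points = {}
    for product in workweek.keys() & weekend.keys():
        # A product never bought by one group has no finite log, so it is skipped.
        if workweek[product] <= 0 or weekend[product] <= 0:
            continue
        a, b = math.log(workweek[product]), math.log(weekend[product])
        points[product] = ((a + b) / 2, a - b)
    return points


if __name__ == "__main__":
    # Made-up counts for three hypothetical products, for illustration only.
    ww = purchase_fractions({"product_a": 350, "product_b": 40, "product_c": 110})
    we = purchase_fractions({"product_a": 350, "product_b": 60, "product_c": 90})
    for product, (x, y) in sorted(bland_altman_points(ww, we).items()):
        print(f"{product}: x={x:+.2f} y={y:+.2f}")
```

Taking logarithms first means the Y value is the log of the ratio between the two fractions, so a rare product and a popular one with the same relative bias end up at the same height on the plot.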

So what?

As an app developer, you will at some point be frustrated about how little you know about your customers. Don’t give up! Start with the small things that you know. Things such as day of the week, geographical location and browser version may shed useful light and you can build out a picture from there, adding to it bit by bit. Having such information is like gardening: it sounds like a lot of work, but you might be surprised at what you can get from a little investment of time. With determination (asking lots of questions) and creativity (looking at a problem from new angles, starting with information you already have) and the right tools in your hands, you can learn something about your users and grow your garden of understanding.

Authentication improvements for testing your apps

We’ve just made it easier for developers to authenticate and test API calls with their own applications.

As the client owner, you can now authenticate with the password grant_type, which lets you skip the authorization step and log in directly with your username and password. You can also gain the global scope, so you no longer need to request authentication for each blog you wish to test your code with.

This is especially useful to contributors of the WordPress Android and iOS apps, which previously required special whitelisting on our part.

Here’s an example of how you can get started with using both these features:

Note that if you are using 2-step authentication (highly recommended) you will need to create an application password to be able to use the password grant_type.

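// Placeholder credentials: replace them with your application's values.
$curl = curl_init( "" ); // WordPress.com OAuth2 token endpoint
curl_setopt( $curl, CURLOPT_POST, true );
curl_setopt( $curl, CURLOPT_POSTFIELDS, array(
    'client_id'     => 'your_client_id',
    'client_secret' => 'your_client_secret_key',
    'grant_type'    => 'password',
    'username'      => 'your_wpcom_username',
    'password'      => 'your_wpcom_password', // an application password if you use 2-step authentication
) );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
$response = curl_exec( $curl );
if ( false === $response ) {
    die( 'Token request failed: ' . curl_error( $curl ) );
}
curl_close( $curl );
$auth = json_decode( $response );
$access_key = $auth->access_token;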

As noted above, these are only available to you as the owner of the application, and not to any other user. This is meant for testing purposes only.

You can review existing authentication methods here.

If you have any questions, please drop them in the comments or use our contact form to reach us.

A brand new Developer Site

As you may have noticed, we’ve just relaunched the Developer site (the very one you’re reading right now!) with a brand new look and feel!

We’ve rebranded the site to match the overall aesthetic as well as to align with the new user management and insights sections we launched just a few weeks ago.


The goal of the redesign was not only to modernize the site but also to make it easier for you, our partners, and third-party developers to find the information you are looking for. In addition, we’ve reviewed all of our existing documentation and past blog posts to make sure the information is accurate and relevant.

Over the next few months, you’ll see more updates to the site and more frequent blog posts from our team.

I’d personally like to thank the team that worked on the relaunch with me: Raanan, Kelly, Kat, Justin, and Stephane.

If you’d like to let us know what you think of the new site, report a bug, or have suggestions for future improvements, please comment below, tweet at us @AutomatticEng or contact us privately.

An efficient alternative to paging with SQL OFFSETs


Running WordPress.com means having multimillion-record database tables, which we often need to batch-query.

Since we can hardly select (or update, etc.) millions of records at once and expect speed, we commonly have to “page” our scripts to handle only a limited number of records at a time, then move on to the next batch.

Classic, but inefficient, solution

The usual way of paging result sets in most SQL RDBMSs is to use the OFFSET option (or, in MySQL, LIMIT [offset], [limit], which is the same).

SELECT * FROM my_table ORDER BY id LIMIT 100 OFFSET 8000000;

In performance terms, however, this means asking your database engine to work out where to start on its own, every time. To do that, it must walk through every record before the requested offset, because those records could differ between queries (deletes, etc.). So the higher your offset, the longer the query takes.

Alternative solution

Instead of keeping track of an offset in your query script, consider keeping track of the last record’s primary key (say, its ID) from the previous result set. On the next iteration, query your table for records with a greater ID, ordered by that ID.

SELECT * FROM my_table WHERE id > 7999999 ORDER BY id LIMIT 100;

This lets you page in the same way, but your database engine knows exactly where to start, based on an efficient indexed key, and doesn’t have to consider any of the records before your range. All of this translates into speedy queries.

Here’s a real-life sample of how much difference this can make:

mysql> SELECT * FROM feeds LIMIT 8000000, 10;
10 rows in set (12.80 sec)

mysql> SELECT * FROM feeds WHERE feed_id > 12958559 LIMIT 10;
10 rows in set (0.01 sec)

I received the very same records back, but the first query took 12.80 seconds, while the alternative took 0.01 instead. 🙂
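To check that keyset paging returns exactly the same rows as OFFSET paging, here is a small, self-contained Python sketch that uses an in-memory SQLite table instead of MySQL. The `feeds` schema, the example URLs, and the helper names are hypothetical. At this tiny scale the sketch demonstrates that the two approaches give equivalent results, not the timing difference shown above.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def make_feeds_table(n: int) -> sqlite3.Connection:
    """Create an in-memory table with n rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, url TEXT)")
    conn.executemany(
        "INSERT INTO feeds (feed_id, url) VALUES (?, ?)",
        ((i, f"https://example.com/feed/{i}") for i in range(1, n + 1)),
    )
    return conn


def pages_by_offset(conn: sqlite3.Connection, size: int) -> Iterator[List[Row]]:
    """Classic paging: the engine steps over `offset` rows on every query."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT feed_id, url FROM feeds ORDER BY feed_id LIMIT ? OFFSET ?",
            (size, offset),
        ).fetchall()
        if not rows:
            return
        yield rows
        offset += size


def pages_by_keyset(conn: sqlite3.Connection, size: int) -> Iterator[List[Row]]:
    """Keyset paging: seek directly past the last primary key seen."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT feed_id, url FROM feeds WHERE feed_id > ? ORDER BY feed_id LIMIT ?",
            (last_id, size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # remember the last ID for the next page


if __name__ == "__main__":
    conn = make_feeds_table(1050)
    offset_pages = list(pages_by_offset(conn, 100))
    keyset_pages = list(pages_by_keyset(conn, 100))
    print(offset_pages == keyset_pages, len(keyset_pages))  # True 11
```

Keyset paging relies on a unique, indexed key, and the ORDER BY on that key matters: without it, the database is free to return rows in any order, and "the last ID seen" would no longer mark a clean boundary between pages.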

PHP/WordPress example

global $wpdb;

// Start with 0, since blog IDs are positive
$last_id = 0;

do {
    $blogs = $wpdb->get_results( $wpdb->prepare(
        'SELECT * FROM wp_blogs WHERE blog_id > %d ORDER BY blog_id ASC LIMIT 100',
        $last_id // Use the last ID to start after
    ) );

    foreach ( $blogs as $blog ) {
        // Do your thing!
        // ...

        // Record the last ID for the next loop
        $last_id = $blog->blog_id;
    }
// Do it until we have no more records
} while ( ! empty( $blogs ) );

Like elasticsearch? We do too!

Elasticsearch tools

Elasticsearch, if you’re not familiar with it, is a distributed, RESTful search and analytics tool.

When it comes to implementing such an infrastructure, our developers not only face the challenges involved in indexing tens of millions of sites with grace and skill, they also write quite extensively about their related adventures, so others can benefit from their experiences.

You can find a plethora of posts on Greg Brown’s blog under the appropriate tag, on subjects ranging from performance and scaling all the way to “Elasticsearch, Open Source, and the Future”. And in true Automattician fashion, he isn’t even shy about recognizing his mistakes.

But Greg is not alone! Xiao Yu also recently wrote about the tools he uses, and a plugin he concocted for his own needs:

I’ve taken all that I wished I could do with both of those plugins and created a new Elasticsearch plugin that I call Whatson. This plugin utilizes the power of D3.js to visualize the nodes, indices, and shards within a cluster. It also allows the drilling down to segment data per index or shard. With the focus on visualizing large clusters and highlighting potential problems within. I hope this plugin helps others find and diagnose issues so give it a try.

How’s that for advanced? 🙂