State of WordPress.com Elasticsearch Systems 2016

We periodically get asked how extensively we use Elasticsearch. It has come up twice in the past week, so it is time to write a blog post.

We are constantly expanding what we use Elasticsearch for, so although some previous posts have broadly defined what we are doing, they don’t really capture the continually expanding scale.

So here are some quick bullet points about what we currently have deployed:

  • Five clusters with a mix of versions:
    • 42 data nodes spread across 3 US data centers running ES 1.3.4. This cluster mostly runs related posts queries. 1925 shards. 11B docs. 43TB of data. 60m queries/day. 12m index ops/day (has been as high as 940m in a day though). Each index is 175 shards and has 10m blogs in it. Each blog is routed to a single shard so almost all queries only hit one shard, but we can (and do) search across multiple shards for some use cases.
    • 6 data nodes across 3 DCs running ES 1.3.9. Hosts our WordPress.com VIP indices and lots of other use cases. 321 indices (mostly VIPs). ~8m queries/day. ~1.5m index ops/day. Typical VIP index config is a single shard that is replicated across the three data centers. Most of these indices are small enough that sharding would reduce performance and reduce query relevancy.
    • 12 data nodes across 3 DCs running ES 1.7.5. Primarily powers search.wordpress.com. Indexes the past 6 quarters of all posts. One index per quarter with 30 shards per quarter. Queries typically hit all 180 shards.
    • 3 data nodes across 3 DCs running ES 2.3.1. Currently an experimental cluster as we work to migrate to 2.x. Only production index right now is for en.support.wordpress.com.
    • 15 (and possibly expanding to 100) data nodes for a Logstash cluster running ES 2.3. A lot of logging use cases for many different services. Growing rapidly.
  • All of our clusters use three dedicated master nodes with one master in each data center. The first cluster has its own master nodes. The next three share master servers with multiple instances of ES running on each server.
  • Typical data server config:
    • 96GB RAM with 31GB for ES heap. The remainder gets used for file system caching.
    • 1-3 TB of SSD per server. In our testing SSDs are very worthwhile.
  • Query speed:
    • Related Posts: median: 44ms; 95th percentile: 190ms; 99th percentile: 650ms. This is well below our 2013 launch, when the 99th percentile was 1.7 seconds.
    • VIP Queries: median: 25ms; 95th percentile: 109ms; 99th percentile: 311ms
    • search.wordpress.com queries: median: 130ms; 95th percentile: 250ms; 99th percentile: 260ms
  • Client-side Optimizations:
    • We cache all query results in memcache, which cuts our ES query rate in half
    • memcache timeouts vary from 30 seconds to 36 hours depending on use case
    • We analyze all queries on the client side and optimize the ES filters:
      • have a blacklist of fields that we never cache (blog_id, post_id, author_id) because they have such high cardinality (100m+ unique ids)
      • we rewrite and/or/not filters into bool queries and try to flatten them into a single filter
      • We don’t allow some types of queries (we have a whitelist)
      • We don’t allow facets/aggregations on certain fields (content, title, excerpt)
    • We generally don’t allow paging too deep or returning thousands of results at once
    • A general pattern we use is to use ES to get IDs for content, and then we get the real content from MySQL for displaying to users. This reduces what data ES needs (we strip out HTML), and we can be certain the data is not out of date since ES can be up to 60 seconds out of date in some cases (though typically is less than 5 seconds).
  • Query Use Cases (in order of query frequency):
    • Related Posts
    • Replacing WP_Query calls by converting slow SQL calls to an ES query (WordPress tag/category pages, home pages, etc)
    • search.wordpress.com
    • Language Detection using ES langdetect plugin (used for every post we index)
    • Analyze API (used to perform reliable word counting regardless of language – in conjunction with the langdetect call)
    • Blog Search (replacing the built in WordPress site search)
    • Theme Search
    • Search Queries that are used when reindexing content (eg when a blog’s tag is renamed we need to search for all posts with that tag and reindex them)
    • Various support searches
    • A number of custom VIP use cases
    • A number of custom internal use cases (searching our p2s, suggesting posts that may be relevant to read, searching our internal docs, etc)
    • Calypso /posts and /pages for getting/searching all posts a user has authored across all their blogs (potentially hundreds)
  • ES Plugins Deployed:
    • Whatson for looking at shard distribution, disk usage, index size, etc
    • StatsD for performance monitoring (we also send StatsD data from the client about query speed) See the screenshots of dashboards below.
    • ICU Analysis
    • Langdetect
    • SmartCN and Kuromoji Analyzers
    • Head
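
The and/or/not rewriting mentioned under Client-side Optimizations can be sketched roughly as follows. This is a minimal illustration in Python, not the actual client code: the names `to_bool` and `flatten` are hypothetical, the input is assumed to use the ES 1.x filter DSL (`and`, `or`, `not`, `bool`), and the merging rules are just one plausible way to flatten nested filters into a single bool filter.

```python
from typing import Any, Dict, List

Filter = Dict[str, Any]


def _as_list(value: Any) -> List[Filter]:
    """Normalise a clause that may be a single filter or a list of filters."""
    return list(value) if isinstance(value, list) else [value]


def flatten(clauses: Dict[str, List[Filter]]) -> Filter:
    """Build a bool filter, lifting nested bool filters where that is safe."""
    must: List[Filter] = []
    should: List[Filter] = []
    must_not: List[Filter] = list(clauses.get("must_not", []))

    for child in clauses.get("must", []):
        inner = child.get("bool")
        # A nested bool without "should" is a pure AND / AND-NOT, so its
        # clauses can be merged into the parent.
        if len(child) == 1 and inner is not None and not inner.get("should"):
            must.extend(inner.get("must", []))
            must_not.extend(inner.get("must_not", []))
        else:
            must.append(child)

    for child in clauses.get("should", []):
        inner = child.get("bool")
        # An OR nested directly inside an OR collapses into one list.
        if len(child) == 1 and inner is not None and set(inner) == {"should"}:
            should.extend(inner["should"])
        else:
            should.append(child)

    merged = {"must": must, "should": should, "must_not": must_not}
    return {"bool": {key: value for key, value in merged.items() if value}}


def to_bool(f: Filter) -> Filter:
    """Recursively rewrite and/or/not filters into (flattened) bool filters."""
    if len(f) != 1:
        return f
    kind, body = next(iter(f.items()))
    if kind in ("and", "or"):
        # Both the list form and the {"filters": [...]} form are accepted.
        children = body["filters"] if isinstance(body, dict) else body
        clause = "must" if kind == "and" else "should"
        return flatten({clause: [to_bool(c) for c in children]})
    if kind == "not":
        # Accept both {"not": {...}} and {"not": {"filter": {...}}}.
        inner = body.get("filter", body)
        return flatten({"must_not": [to_bool(inner)]})
    if kind == "bool":
        return flatten({
            key: [to_bool(c) for c in _as_list(value)]
            for key, value in body.items()
            if key in ("must", "should", "must_not")
        })
    return f  # leaf filters (term, range, ...) pass through unchanged
```

For example, an `and` of a term filter and a nested `and` containing a `not` comes out as one bool filter with two `must` clauses and one `must_not` clause. An `or` nested inside a `must` is deliberately left alone, since lifting it would change what the filter matches.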

 

Since images are always fun, here are our Grafana dashboards for our largest cluster over the past 6 hours. The first is our client-side tracking of query/indexing/etc speed:

Screen Shot

Second is our aggregated stats (from the StatsD plugin) about the cluster’s performance:

Screen Shot

This cluster/index has been really solid for us over the past two years since it was last built. We have some known issues that have us stuck on 1.3.4, but we’ve also had times where the cluster went many months without any incidents. In general the incidents we have seen have been caused by external factors (usually over-indexing or some other growth in the data).
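
As a rough illustration of the per-blog routing used on this cluster, the sketch below builds a search request that carries a `routing` parameter so that only one shard is queried. It is an assumption-laden example rather than the production client: the index name, the function names, and the use of `zlib.crc32` as a hash are all hypothetical (Elasticsearch applies its own hash function to the routing value), and the query body assumes the ES 1.x `filtered` query.

```python
import json
import zlib
from typing import Any, Dict, Tuple
from urllib.parse import urlencode

SHARDS_PER_INDEX = 175  # from the post: each index has 175 shards


def shard_for(blog_id: int, num_shards: int = SHARDS_PER_INDEX) -> int:
    """Illustrate the idea shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in; it is NOT the hash Elasticsearch uses.
    """
    return zlib.crc32(str(blog_id).encode("utf-8")) % num_shards


def routed_search(index: str, blog_id: int, query: Dict[str, Any]) -> Tuple[str, str]:
    """Return (path, JSON body) for a search that only touches one shard."""
    path = f"/{index}/_search?" + urlencode({"routing": blog_id})
    body = {
        "query": {
            "filtered": {
                "query": query,
                # A shard holds many blogs, so routing alone is not enough:
                # the results still have to be filtered down to one blog.
                "filter": {"term": {"blog_id": blog_id}},
            }
        }
    }
    return path, json.dumps(body)
```

Because every document for a blog is indexed with the same routing value, a search that passes the same value only has to visit the shard that value hashes to. Leaving the parameter off gives the multi-shard behaviour described above for the other use cases.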

 

 

 

Open Source Flow Collecting with Elastic, Logstash, and Kibana

Today, most open source network flow tools lack a flexible and easy-to-use interface. Using Logstash’s built-in netflow codec, Kibana’s great-looking and powerful web interface, and the flexibility of Elastic, you can build a tool that rivals commercial flow-collecting products.

What’s new in Calypso?

December Edition

It’s been almost a month since we released Calypso and the response has been great from the community. For those following the project more closely, we’ll be publishing summaries on new developments, focusing on framework-level improvements, new components, and the tools contributors have to work with.

Making Calypso more welcoming for developers and designers

If you install Calypso locally and point your browser to calypso.localhost:3000/devdocs/welcome we have a new developer flow that introduces you to our documentation. You can access the docs anytime from the environment badge in the lower left corner, which also highlights the git branch you are in:

Framework

  • Upgraded to Node v4 and React 0.14.
  • Started implementing Redux, a state container solution to manage data flow in the application. If you are interested in this, we are gradually moving our different data modules to Redux.
  • We refined and documented our approach to components.
  • Began exploring how pluggable modules could work in Calypso.
  • Continued migration to use svg icons everywhere instead of the old icon font. We added a few new gridicons as well.

New components and updates

Components are the building blocks of the Calypso UI. We constantly refine them and build new ones, from simple user interface ones to those carrying more complex functionality. This allows us to craft interfaces that are consistent and rich. You can check out all of these if you go to calypso.localhost:3000/devdocs/design, our live components gallery. These are some of the updates we did this month.

Button

We added a borderless variation for one of our most used components.

SelectDropdown

Added a compact variation.

Interval

A utility component, primarily meant for setting up a poller interface wrapping another component.

SitesDropdown

A dropdown component for selecting a site, which includes instant search and handling of private sites.

FeatureExample

A component that renders other mocked components with a faded effect to illustrate a section that the user cannot access for some reason.

Notices

Consolidated Notices into a single component in components/notice. Also added a new compact variation with flex-box magic for narrow layouts:

Site

The core component to display a site card now supports a homeLink prop, which turns it into a link to the homepage of the site and renders the following icon on hover:

Draft

Component used to render individual post items. Now supports a “selected” prop to highlight a single draft in a list. (Used in the editor.)

FoldableCard

Now also supports custom icons.

Stay tuned to this blog for upcoming Calypso news and updates.

WordPress.com Desktop App Goes Open Source, Linux App Arrives

We are proud to announce the full open sourcing of the WordPress.com desktop app. You can access the source and documentation on GitHub at the automattic/wp-desktop repository.

The core application Calypso was released as open source a few weeks ago, and now the work that went into building the desktop applications using Electron is available as well.

The Story Behind the New WordPress.com

A little over a year and a half ago, we had a dramatic rethink of the technologies and development workflows for building with WordPress.

Our existing codebase and workflows had served us well, but ten years of legacy was beginning to seriously hinder us from building the modern, fast, and mobile-friendly experiences that our users expect. It seemed like collaboration between developers and designers was not firing on all cylinders. So we asked ourselves the question:

“What would WordPress.com look like if we were to start building it today?”

A New Beginning: Prototyping and Iterating

We’d asked ourselves this question before, and had our fair share of initiatives that didn’t result in useful change. Looking back, we were able to pinpoint our biggest mistakes: we’d been starting with a muddy vision, and were trying to solve an ill-defined problem. These insights really helped us change our approach.

One of the original Calypso prototype screens, listing all of your WordPress sites.

Calypso, the codename for this new WordPress admin interface project, started differently. To present a clear vision, we built an aspirational HTML/CSS design prototype — based on clearly defined product goals — that allowed us to imagine what a new WordPress.com could look like when complete. We knew it would change over time as we launched parts to our users, but the vision provided all of Automattic with something to aim for and get excited about.

Once the Calypso prototype was in a good place, the early days of development were all about making tough decisions such as which language to use, whether to use a framework, and how we would extend our API. Automattic had just acquired Cloudup, an API-powered file-sharing tool built with JavaScript. The Cloudup team showed us a solid, maintainable, and scalable path towards making WordPress.com completely JavaScript-based and API-powered.

Since WordPress is a PHP-powered application, our company-wide development skill-set has historically been PHP-heavy with a sprinkling of advanced JavaScript. This made Calypso intimidating to other engineers and designers at the company for much of the first six months of its development — we were building something that few people could jump in on.

Even core Calypso project team members had to get over our intimidation. None of us were strong JavaScript developers. But as each day passed our experience built, we made mistakes, we reviewed them, we fixed them, and we learned. Once we had the project moving, we set better examples for other engineers, and shared our knowledge across the company.

One great change came out of building an early design prototype: improved collaboration using GitHub. Calypso prototyping was done collaboratively between a handful of designers in GitHub; although many of us had long used GitHub for personal projects it was relatively new for internal projects, which historically used Trac for most project management and bug tracking. Using GitHub helped us see how much easier internal collaboration could be, and how to allow for much greater feedback on individual work being done.

Peer code reviews show no sign of slowing down and are now widely accepted.

As GitHub had worked so well for the prototyping stage we switched for all Calypso development, allowing us to harness the pull request (PR) system for peer code reviews, and build our own custom GitHub-based workflow. Code reviews were new for many developers — traditionally at Automattic, we have had no systematic peer code review system outside of the VIP team’s daily code review of client sites. Code review, though it initially added to the intimidation of starting to work with Calypso, greatly increased the quality of our codebase and helped everyone level up their JavaScript skills.

What started as a team of seven people working on Calypso quickly spread to a cross-section of teams with ten, then 14, then 20 Automatticians actively working in the Calypso codebase. Two months after the launch of the first Calypso-powered feature on WordPress.com, we had 40 contributors working on Calypso across five different teams. We iterated over the next year with the “release early, release often” Automattic mindset, launching 40 distinct Calypso-powered features on WordPress.com with over 100 individual contributors.

By the middle of 2015 the Calypso codebase was in good enough shape to be used outside of the web browser. Since Calypso is entirely JavaScript, HTML, and CSS, it can run locally on a device with a lightweight node.js server setup. Using a technology called Electron, we built native desktop clients running the same code bundled inside the applications. We started work first on a native Mac desktop app, which is now available, and continued that work on soon-to-be-launched Windows and Linux apps. Seeing these apps come together and using them internally really started to justify all the hard work we’d spent building the Calypso codebase.

Open Sourcing Calypso, the Power Behind WordPress.com

One of our Calypso developer hangouts in progress, and Team IO, who built the Calypso editor, at our all-company Grand Meetup in October.

Over the past year and a half, Calypso has gone from an idea to an aspirational prototype to a fully functioning product built, launched iteratively, and used by millions of WordPress.com users. Internally, it’s been a period of great change and growth. We’ve embraced cross-team collaboration through GitHub and peer code reviews through the PR review system, gone from just a couple of great JavaScript developers to a company full of them, and seen incredible collaboration between designers and developers on a daily basis.

A handy chart to show the differences between the old and new WordPress.com. (pdf, img)

We’re proud to be able to open source all of the hard work we’ve put in, and to continue to build on the product in an open way. You can read more about opening up Calypso development on our CEO Matt Mullenweg’s site.

Over the next few months, we’ll publish more in-depth posts exploring the technicals and workflows behind Calypso: how we manage our own unique GitHub flows, how we’ve used other popular open source libraries like React and concepts like Flux, and our experiences bundling and launching native app clients. Keep an eye out for those by following this blog (in the bottom right), and in the meantime, check out the active Calypso codebase as we continue to iterate on it.

Andy Peatling
Calypso Project Lead

oEmbed Updates

Several years ago, WordPress.com introduced oEmbed provider support to allow posts on WordPress.com-hosted blogs to be embedded anywhere that supported oEmbed. WordPress 4.4, due out in December, will bring oEmbed provider support to the wider WordPress world.

One week from today, on October 2, the WordPress.com team will be switching our oEmbed format to match the global WordPress format. The oEmbed spec allows for arbitrary changes in the HTML being returned, but since this is a significant change we wanted to give everyone ample lead time.

Where the current HTML format included some post content and a bit of markup, the new format is more in the style of Facebook or Twitter embeds — the content is now inside of an iframe, with a small bit of JavaScript.

We understand that not everyone is comfortable with embedding arbitrary scripts on their page. If you wish, you may strip the script tag and provide the same functionality using the script from the development plugin. If you choose this method, please keep in mind that this script is still under development, and will likely change between now and WordPress 4.4’s release.

In order to test the new embeds, simply add the query string ?newembed=true to the URL of any post hosted on WordPress.com, like so. Similarly, you can get the output from our oEmbed endpoint by adding &newembed=true to the end of the request. We’re still working on adding support for this new style of embed to the WordPress.com post editor, but you can test it on your own WordPress install using the feature plugin.
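
For anyone scripting these checks, here is a minimal Python sketch of adding the `newembed=true` flag to a URL. The helper name `with_new_embed` and the example URLs are made up for illustration; only the `newembed=true` parameter itself comes from the instructions above. The helper uses `?` or `&` as appropriate, depending on whether the URL already has a query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_new_embed(url: str) -> str:
    """Return url with newembed=true added to its query string (once)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if ("newembed", "true") not in query:
        query.append(("newembed", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical post URL with no existing query string.
post_url = with_new_embed("https://example.wordpress.com/2015/09/25/hello/")

# Hypothetical oEmbed request that already carries parameters.
oembed_url = with_new_embed(
    "https://example.com/oembed?url=https%3A%2F%2Fexample.wordpress.com%2Fpost%2F&format=json"
)
```

Calling the helper twice on the same URL leaves it unchanged the second time, which keeps test scripts from piling up duplicate parameters.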

If you run into any issues or have any questions, please post them in the comments below!

Update: September 25, 2015 16:00 UTC — We are temporarily reverting this change. Please see the comment below. We’ll update this post when the new oEmbed is re-enabled.

WordPress.com Social Reciprocity Visualization Challenge

Every day, millions of people connect with ideas, photos, and other content on WordPress.com. Here at Automattic, we take pride in enabling this interaction, and continually strive to make the WordPress.com platform better for users.

Our data science team examines these user interactions and aims to turn our insights into user-facing features and tools. With this challenge, we decided to open up some of our work and share with the community some of the questions we are excited about.

On our platform, and across the Web, the question of social reciprocity is one of the most interesting. How do platform design, user content, and social activity combine and affect user engagement?

We are running this visualization challenge on the Databits.io platform, where we’re inviting anyone who’s obsessed with data like we are to come up with some interesting visualizations of the following scenarios. We’re offering a $1,000 prize for the best one!

Two ideas we’d love to see explored are:

  • User-to-user social reciprocity. The provided data is sufficiently rich to explore the dynamics of user-to-user social interactions. Are there compelling stories we can tell about how individual users react over time to other users’ actions on the platform? How do blog posting and the kind of blogging content enter the picture?
  • User-to-community social reciprocity. There are actions that users send to the broader WordPress community and also records of the community generating social interactions on the users’ blogs. On the scale of user to community interaction, are there patterns that can help understand social reciprocity? Does the interaction depend on blog posting? What are the temporal dynamics?

Read more about the data, the challenge, and the prizes being offered for the best visualizations over at Databits.io.

If you love data like we do, consider joining our team! We’re currently hiring Data Wranglers.