We maintain Elasticsearch indices for all posts on WordPress.com and all posts synced through the Jetpack plugin.

These documents describe the index structures and APIs that are available for querying. These APIs power features that are available on WordPress.com VIP and in the Jetpack Professional Plan. The Jetpack Related Posts API also accepts Elasticsearch filters.

API Availability

There are three tiers of availability:

  • WordPress.com VIP: We can build a dedicated index for your content (setup fee required). The code for building the index is open source. Experimental features and fields, or things we cannot scale to all sites, are often implemented here first.
  • Jetpack Professional Plan: All Jetpack Pro sites (including all VIP sites) are indexed into our global Elasticsearch cluster and can be queried with Elasticsearch Query API calls. We support most, but not all, of the Elasticsearch Query DSL.
  • Jetpack Free Plan: All sites are indexed to power the Related Posts API. The API accepts custom Elasticsearch filters.
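The following sketch shows roughly what a query in the Elasticsearch 2.x Search API style looks like. It builds a request body that pairs a scored full-text clause with a non-scoring filter. The build_search_body helper and the “post_type” field name are hypothetical, so check the post document schema for the real field names. The sketch only constructs the JSON body and does not send it anywhere.

```python
import json


def build_search_body(text, post_type="post", size=10):
    """Build an ES 2.x-style search body (hypothetical helper, field names assumed)."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause against the default-analyzed content field.
                "must": {"match": {"content.default": text}},
                # Filter clause: restricts matches without affecting relevance scores.
                # "post_type" is an assumed field name for illustration.
                "filter": {"term": {"post_type": post_type}},
            }
        },
        "size": size,
    }


body = build_search_body("elasticsearch")
print(json.dumps(body, indent=2))
```

Filters belong in the bool query's filter clause because they narrow the result set without changing how documents are scored.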

Elasticsearch Versions and Backwards Compatibility

The Elasticsearch project is focused on continually improving its APIs, so it regularly breaks backwards compatibility. Our APIs strive to maintain backwards compatibility. For security or performance reasons, there are also a number of Elasticsearch features that we choose not to support. For example, all scripting is disabled in our API. Because we are committed to backwards compatibility, we also do not try to run the very latest version of Elasticsearch.
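Scripting is disabled, so a client may want to catch script clauses before it sends a request. The following is a minimal sketch that assumes the query is held as a Python dict. The hypothetical find_script_clauses helper walks the body and reports every “script” key and every key that begins with “script_” (such as “script_fields” or “script_score”). This check is a client-side convenience only. It is not how the API itself enforces the restriction.

```python
def find_script_clauses(node, path="$"):
    """Return paths of keys named 'script' or starting with 'script_' (hypothetical helper)."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "script" or key.startswith("script_"):
                found.append(child)
            # Keep descending so nested script clauses are reported too.
            found.extend(find_script_clauses(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_script_clauses(item, f"{path}[{i}]"))
    return found


query = {
    "query": {"match": {"content.default": "hello"}},
    "script_fields": {"doubled": {"script": "doc['x'].value * 2"}},
}
print(find_script_clauses(query))
# ['$.script_fields', '$.script_fields.doubled.script']
```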

Our query API closely resembles the Search API from Elasticsearch 2.4. However, we still accept some deprecated features, and we perform some query rewriting to maintain compatibility. For example, our move from ES 1.x to 2.x required a content migration. Before the move, all content was stored in a single “content” field with multiple analyzers. After it, each language analyzer had its own field: “content.en”, “content.fr”, “content.default”, etc. We rewrite queries that still use the “content” field.
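The rewriting of the legacy “content” field can be pictured with a small sketch. The hypothetical rewrite_content_field function below turns a match query on “content” into a multi_match query over per-language fields. Both the choice of “content.en” and “content.default” as targets and the use of multi_match are assumptions for illustration. The actual server-side rewrite may differ.

```python
# Hypothetical target fields; the real rewrite may choose fields differently.
LANGUAGE_FIELDS = ["content.en", "content.default"]


def rewrite_content_field(node, fields=LANGUAGE_FIELDS):
    """Return a rewritten copy of a query; match-on-'content' becomes multi_match."""
    if isinstance(node, list):
        return [rewrite_content_field(item, fields) for item in node]
    if not isinstance(node, dict):
        return node
    match = node.get("match")
    if len(node) == 1 and isinstance(match, dict) and list(match) == ["content"]:
        spec = match["content"]
        # A match clause is either a bare string or an options object with "query".
        options = dict(spec) if isinstance(spec, dict) else {"query": spec}
        options["fields"] = list(fields)
        return {"multi_match": options}
    # Otherwise rebuild the dict, rewriting any nested clauses.
    return {key: rewrite_content_field(value, fields) for key, value in node.items()}


legacy = {"query": {"bool": {"must": [{"match": {"content": "hello world"}}]}}}
print(rewrite_content_field(legacy))
# {'query': {'bool': {'must': [{'multi_match': {'query': 'hello world', 'fields': ['content.en', 'content.default']}}]}}}
```

The function builds new dicts rather than editing the input in place. This leaves the caller's original query untouched.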

Existing Libraries

There will likely be times when we have to break backwards compatibility (in hopefully minor ways). The best way to ensure your application does not run into these issues is to use one of the existing libraries or filters rather than writing a completely custom search query.

Because code makes the best documentation, most of the code that builds the mappings and documents for our indices is available in WPES-Lib. Data.blog has the best description of how our real-time indexing works.

Documents

We index all WordPress posts, pages, and custom post types whose post status is one of ‘publish’, ‘trash’, ‘pending’, ‘draft’, ‘future’, or ‘private’. Publicly available posts can be queried from our API without authentication. Authenticated requests will return all posts. For Jetpack sites, any posts that are blocked from syncing will not be available in our index.
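The indexing rule above can be expressed as a simple predicate. In this sketch, the is_indexed function and its blocked_from_sync flag are hypothetical names. The flag stands in for the Jetpack case where a post is blocked from syncing.

```python
# Post statuses that are indexed, as listed above.
INDEXED_STATUSES = frozenset({"publish", "trash", "pending", "draft", "future", "private"})


def is_indexed(post_status: str, blocked_from_sync: bool = False) -> bool:
    """Return True if a post with this status would appear in the index.

    blocked_from_sync is a hypothetical flag for Jetpack posts that are not synced.
    """
    return post_status in INDEXED_STATUSES and not blocked_from_sync


print(is_indexed("draft"))                            # True
print(is_indexed("auto-draft"))                       # False: status not in the list
print(is_indexed("publish", blocked_from_sync=True))  # False: never synced
```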

See the post document schema for details on all fields in the index.

Search API

The API limitations differ slightly depending on your tier of service (WordPress.com VIP, Jetpack Pro, or Jetpack Related Posts). Each tier supports a different subset of the Elasticsearch Query DSL.

See the query API for details and limitations of available queries, filters, and aggregations.

* Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
