The following tables list every field available in an Elasticsearch “post” document. Because there are so many fields, they are grouped into the categories below. (See the Document Schema section of the Elasticsearch at WordPress.com page for data type descriptions.)

Post Info

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| site_id | number | short | 1 for WordPress.com, 2 for Jetpack |
| blog_id | number | integer | |
| post_id | number | long | |
| parent_post_id | number | long | |
| ancestor_post_ids | number | long | |
| sticky | boolean | | |
| menu_order | number | integer | |
| slug | string | not analyzed | |
| url | string | analyzed | URL no protocol |
| url.raw | string | not analyzed | URL no protocol |
| post_type | string | not analyzed | |
| post_format | string | not analyzed | |
| post_status | string | not analyzed | |
| has_password | boolean | | |
| public | boolean | | |
| featured_image | string | analyzed | URL no protocol |
| location | geo | lat_lon, geohash | |
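
As a rough illustration of how the not analyzed fields in this table lend themselves to exact matching, the Python sketch below builds a query body from `term` clauses. The function name `published_posts_query` and the status value `publish` are assumptions made for the example (they do not come from this page), and how the body is sent to the search endpoint is left out.

```python
def published_posts_query(blog_id, post_type="post"):
    """Build a query body for public, published posts on one blog.

    Assumption: "publish" is the stored post_status of a published post.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Exact matches on numeric and not analyzed fields.
                    {"term": {"blog_id": blog_id}},
                    {"term": {"post_type": post_type}},
                    {"term": {"post_status": "publish"}},
                    {"term": {"public": True}},
                ]
            }
        }
    }
```

Calling `published_posts_query(123)` returns a plain dictionary that can be serialized to JSON; the blog ID 123 is a placeholder.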

Post Language

The post language is determined dynamically by detecting the language in the post title, content, and excerpt fields. If the post language cannot be detected, the blog’s configured language is used as the fallback.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| lang | string | not analyzed | Two letter ISO 639 code |
| lang_analyzer | string | not analyzed | Name of language specific ES analyzer |

Post Author

The post author is the WordPress.com user that authored the post. For a Jetpack site where we are unable to determine the corresponding WordPress.com user, the author_id field is set to 0.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| author | string | analyzed | WordPress.com display name |
| author.raw | string | not analyzed | WordPress.com display name |
| author_login | string | not analyzed | WordPress.com username |
| author_id | number | integer | WordPress.com user id |

Post Content

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| title | string | analyzed | All HTML and shortcodes are stripped |
| title.word_count | number | token_count | Count of tokens as analyzed |
| excerpt | string | analyzed | All HTML and shortcodes are stripped |
| excerpt.word_count | number | token_count | Count of tokens as analyzed |
| content | string | analyzed | All HTML and shortcodes are stripped |
| content.word_count | number | token_count | Count of tokens as analyzed |

Extracted Information From Post Content

The shortcode fields are dynamic and the “[NAME]” portion of the field name depends on the name of the shortcode extracted. In addition, the name of each shortcode type that’s extracted is stored in the shortcode_types field.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| has.embed | number | short | Count of embeds in post |
| has.hashtag | number | short | Count of hashtags in post |
| has.image | number | short | Count of images in post |
| has.link | number | short | Count of links in post |
| has.mention | number | short | Count of mentions in post |
| has.shortcode | number | short | Count of shortcodes in post |
| embed.url | string | not analyzed | URL no protocol |
| hashtag.name | string | not analyzed | |
| image.url | string | not analyzed | URL with protocol |
| link.url | string | analyzed | URL no protocol |
| link.url.raw | string | not analyzed | URL no protocol |
| link.host | string | not analyzed | Host part only (e.g. matt.wordpress.com) |
| link.host_reversed | string | not analyzed | For efficient prefix searches (e.g. “com.wordpress.*”) |
| mention.name | string | not analyzed | |
| mention.name.lc | string | lowercased | |
| shortcode_types | string | not analyzed | List of shortcodes in this post |
| shortcode.[NAME].id | string | not analyzed | E.g. shortcode.youtube.id |
| shortcode.[NAME].count | number | short | E.g. shortcode.youtube.count |
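
The link.host_reversed field is easiest to understand with a small example. The Python sketch below reverses the labels of a host name and wraps the result in a `prefix` query, so that one query can cover a domain and its subdomains. It assumes the stored value is the dot-separated labels in reverse order, as the “com.wordpress.*” note suggests; the helper names `reverse_host` and `links_to_domain_query` are made up for illustration.

```python
def reverse_host(host):
    """Reverse the dot-separated labels of a host name.

    Assumption: link.host_reversed stores hosts in this form.
    """
    return ".".join(reversed(host.lower().split(".")))


def links_to_domain_query(domain):
    """Build a prefix query on link.host_reversed for `domain`."""
    return {
        "query": {
            "prefix": {"link.host_reversed": reverse_host(domain)}
        }
    }
```

For example, `reverse_host("matt.wordpress.com")` gives `com.wordpress.matt`. Note that a bare prefix such as `com.wordpress` would also match an unrelated host that merely starts with the same characters; appending a trailing dot narrows the match to subdomains only.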

Post Tags, Categories, and Taxonomies

The taxonomy fields are dynamic and the “[NAME]” portion of the field name depends on the name of the post taxonomy.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| tag_cat_count | number | short | Total number of tags and categories |
| tag.name | string | analyzed | |
| tag.name.raw | string | not analyzed | |
| tag.name.raw_lc | string | lowercased | |
| tag.slug | string | not analyzed | |
| tag.term_id | number | long | |
| category.name | string | analyzed | |
| category.name.raw | string | not analyzed | |
| category.name.raw_lc | string | lowercased | |
| category.slug | string | not analyzed | |
| category.term_id | number | long | |
| taxonomy.[NAME].name | string | analyzed | |
| taxonomy.[NAME].name.raw | string | not analyzed | |
| taxonomy.[NAME].name.raw_lc | string | lowercased | |
| taxonomy.[NAME].slug | string | not analyzed | |
| taxonomy.[NAME].term_id | number | long | |
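
Because the taxonomy field names are built from the taxonomy name, client code usually has to assemble them at query time. The Python sketch below shows one way to do that; the helper names and the `genre` taxonomy used in the usage note are hypothetical.

```python
# Sub-fields listed for taxonomy.[NAME] in the table above.
TAXONOMY_SUBFIELDS = {"name", "name.raw", "name.raw_lc", "slug", "term_id"}


def taxonomy_field(taxonomy, subfield="slug"):
    """Return the dynamic field name for a custom taxonomy."""
    if subfield not in TAXONOMY_SUBFIELDS:
        raise ValueError("unknown taxonomy sub-field: " + subfield)
    return "taxonomy.{}.{}".format(taxonomy, subfield)


def taxonomy_slug_query(taxonomy, slug):
    """Build an exact-match query on a taxonomy term slug."""
    return {"query": {"term": {taxonomy_field(taxonomy): slug}}}
```

With a hypothetical `genre` taxonomy, `taxonomy_field("genre", "name.raw_lc")` yields `taxonomy.genre.name.raw_lc`, and `taxonomy_slug_query("genre", "sci-fi")` targets `taxonomy.genre.slug`.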

Post Interactions

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| like_count | number | short | |
| liker_ids | number | integer | WordPress.com users that liked this post |
| comment_count | number | integer | |
| commenter_ids | number | integer | WordPress.com users that commented on this post |
| is_reblogged | boolean | | Post contains reblogged content from another site |
| reblog_count | number | long | Number of times this post was reblogged elsewhere |
| reblogger_ids | number | long | WordPress.com users that reblogged this post elsewhere |

Post Dates

Each of the dates associated with the post is stored both as a date data type and broken out into token parts to make granular date-based searches easier. For example, finding all posts that were published on a Tuesday (date_token.day_of_week), or those that were modified in the second half of each hour (modified_token.seconds_from_hour). The date data object takes dates in ISO 8601 format either with times (yyyy-MM-dd HH:mm:ss) or without (yyyy-MM-dd).

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| date | date | | ISO 8601 |
| date_token.year | number | short | 4 digit year |
| date_token.month | number | byte | |
| date_token.day | number | byte | |
| date_token.hour | number | byte | 24 hour format |
| date_token.minute | number | byte | |
| date_token.second | number | byte | |
| date_token.day_of_year | number | short | The day of the year (starting from 0) |
| date_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
| date_token.week_of_year | number | byte | Week number of year |
| date_token.seconds_from_day | number | integer | Seconds since midnight of day |
| date_token.seconds_from_hour | number | short | Seconds since start of hour |
| date_gmt | date | | ISO 8601 |
| date_gmt_token.year | number | short | 4 digit year |
| date_gmt_token.month | number | byte | |
| date_gmt_token.day | number | byte | |
| date_gmt_token.hour | number | byte | 24 hour format |
| date_gmt_token.minute | number | byte | |
| date_gmt_token.second | number | byte | |
| date_gmt_token.day_of_year | number | short | The day of the year (starting from 0) |
| date_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
| date_gmt_token.week_of_year | number | byte | Week number of year |
| date_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day |
| date_gmt_token.seconds_from_hour | number | short | Seconds since start of hour |
| modified | date | | ISO 8601 |
| modified_token.year | number | short | 4 digit year |
| modified_token.month | number | byte | |
| modified_token.day | number | byte | |
| modified_token.hour | number | byte | 24 hour format |
| modified_token.minute | number | byte | |
| modified_token.second | number | byte | |
| modified_token.day_of_year | number | short | The day of the year (starting from 0) |
| modified_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
| modified_token.week_of_year | number | byte | Week number of year |
| modified_token.seconds_from_day | number | integer | Seconds since midnight of day |
| modified_token.seconds_from_hour | number | short | Seconds since start of hour |
| modified_gmt | date | | ISO 8601 |
| modified_gmt_token.year | number | short | 4 digit year |
| modified_gmt_token.month | number | byte | |
| modified_gmt_token.day | number | byte | |
| modified_gmt_token.hour | number | byte | 24 hour format |
| modified_gmt_token.minute | number | byte | |
| modified_gmt_token.second | number | byte | |
| modified_gmt_token.day_of_year | number | short | The day of the year (starting from 0) |
| modified_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
| modified_gmt_token.week_of_year | number | byte | Week number of year |
| modified_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day |
| modified_gmt_token.seconds_from_hour | number | short | Seconds since start of hour |
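
To make the token parts concrete, the Python sketch below derives them from a `datetime` and then expresses the two searches mentioned in the introduction to this section as query clauses. It is only an approximation of the indexing logic: the table fixes the day-of-week numbering and the zero-based day of year, but the ISO week numbering used for `week_of_year` and the 1-based month are assumptions, and the names `date_tokens`, `PUBLISHED_ON_TUESDAY`, and `MODIFIED_IN_SECOND_HALF_OF_HOUR` are invented for the example.

```python
from datetime import datetime


def date_tokens(dt):
    """Split a datetime into token parts modelled on the *_token fields."""
    return {
        "year": dt.year,
        "month": dt.month,  # assumption: 1-based month
        "day": dt.day,
        "hour": dt.hour,  # 24 hour format
        "minute": dt.minute,
        "second": dt.second,
        # The table says day_of_year starts from 0; tm_yday starts from 1.
        "day_of_year": dt.timetuple().tm_yday - 1,
        # isoweekday() is 1 for Monday through 7 for Sunday.
        "day_of_week": dt.isoweekday(),
        # Assumption: ISO 8601 week numbering.
        "week_of_year": dt.isocalendar()[1],
        "seconds_from_day": dt.hour * 3600 + dt.minute * 60 + dt.second,
        "seconds_from_hour": dt.minute * 60 + dt.second,
    }


# Clause for posts published on a Tuesday (day_of_week == 2).
PUBLISHED_ON_TUESDAY = {"term": {"date_token.day_of_week": 2}}

# Clause for posts modified in the second half of any hour
# (1800 or more seconds after the start of the hour).
MODIFIED_IN_SECOND_HALF_OF_HOUR = {
    "range": {"modified_token.seconds_from_hour": {"gte": 1800}}
}
```

For `datetime(2014, 3, 4, 13, 45, 30)`, a Tuesday, the sketch gives `day_of_week` 2, `day_of_year` 62, `seconds_from_day` 49530, and `seconds_from_hour` 2730.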

Post Meta — VIP Indices Only

Sites with the Elasticsearch VIP Add On are indexed to a separate, dedicated VIP cluster, and the add-on also enables the indexing of post meta fields. The post meta fields are dynamic and the “[NAME]” portion of the field name depends on the name (key) of the post meta being indexed. To accommodate advanced querying, all post meta values are cast and indexed as numeric and boolean values in addition to being indexed as strings.

| Field Name | Data Type | Type Details | Notes |
| --- | --- | --- | --- |
| meta.[NAME].value | string | analyzed | |
| meta.[NAME].value.raw | string | not analyzed | |
| meta.[NAME].value.raw_lc | string | lowercased | |
| meta.[NAME].long | number | long | Value cast as 64bit integer (bigint) |
| meta.[NAME].double | number | double | Value cast as floating point number |
| meta.[NAME].boolean | boolean | | Value cast as boolean value |
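
The numeric casts are what make range searches on post meta possible, since a string field cannot be compared numerically. The Python sketch below assembles the dynamic field name and a `range` clause against one of the numeric casts. The helper names and the `price` meta key in the usage note are hypothetical, and the exact rules used to cast values at index time are not described on this page.

```python
# Sub-fields listed for meta.[NAME] in the table above.
META_SUBFIELDS = {"value", "value.raw", "value.raw_lc", "long", "double", "boolean"}


def meta_field(key, subfield="value"):
    """Return the dynamic field name for a post meta key."""
    if subfield not in META_SUBFIELDS:
        raise ValueError("unknown meta sub-field: " + subfield)
    return "meta.{}.{}".format(key, subfield)


def meta_range_clause(key, gte=None, lte=None, cast="double"):
    """Build a range clause on a numeric cast of a post meta value."""
    if cast not in ("long", "double"):
        raise ValueError("range clauses need a numeric cast")
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("at least one bound is required")
    return {"range": {meta_field(key, cast): bounds}}
```

For a hypothetical `price` meta key, `meta_range_clause("price", gte=10, lte=50)` produces a clause on `meta.price.double`, while an exact string match would use `meta_field("price", "value.raw")` instead.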
