The following tables list every field available in each Elasticsearch “post” document. Because there are so many fields, they are grouped into the categories below. (See the Document Schema section of the Elasticsearch at WordPress.com page for data type descriptions.)
- Post Info
- Post Language
- Post Author
- Post Content
- Extracted Information From Post Content
- Post Tags, Categories, and Taxonomies
- Post Interactions
- Post Dates
- Post Meta — VIP Indices Only
Post Info
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
site_id | number | short | 1 for WordPress.com, 2 for Jetpack |
blog_id | number | integer | |
post_id | number | long | |
parent_post_id | number | long | |
ancestor_post_ids | number | long | |
sticky | boolean | | |
menu_order | number | integer | |
slug | string | not analyzed | |
url | string | analyzed | deprecated |
permalink.url | string | not analyzed | deprecated |
permalink.url.analyzed | string | analyzed | URL no protocol |
permalink.url.raw | string | not analyzed | URL no protocol |
permalink.host | string | not analyzed | |
permalink.reverse_host | string | not analyzed | |
post_type | string | not analyzed | |
post_format | string | not analyzed | |
post_status | string | not analyzed | |
has_password | boolean | | |
public | boolean | | |
featured_image | string | not analyzed | URL no protocol |
featured_image_url.url | string | not analyzed | deprecated |
featured_image_url.url.analyzed | string | analyzed | URL no protocol |
featured_image_url.url.raw | string | not analyzed | URL no protocol |
featured_image_url.host | string | not analyzed | |
featured_image_url.reverse_host | string | not analyzed | |
location | geo | lat_lon, geohash | |
Post Language
The post language is determined dynamically by detecting the language of the post title, content, and excerpt fields. If the language cannot be detected, the blog’s configured language is used as a fallback.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
lang | string | not analyzed | Two letter ISO 639 code |
Post Author
The post author is the WordPress.com user that authored the post. If the post is from a Jetpack site and the corresponding WordPress.com user cannot be determined, the author_id field is set to 0.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
author | string | analyzed | WordPress.com display name |
author.raw | string | not analyzed | WordPress.com display name |
author_login | string | not analyzed | WordPress.com username |
author_id | number | integer | WordPress.com user id |
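As a hedged illustration of the author_id rule, the sketch below builds a query body that looks for Jetpack posts whose author could not be matched to a WordPress.com user. It assumes the index accepts the standard Elasticsearch bool and term queries. The function name is hypothetical, and the site_id value of 2 for Jetpack is taken from the Post Info table.

```python
import json

JETPACK_SITE_ID = 2    # per the Post Info table: 1 for WordPress.com, 2 for Jetpack
UNKNOWN_AUTHOR_ID = 0  # author_id when no WordPress.com user could be determined


def unmatched_jetpack_authors_query() -> dict:
    """Hypothetical helper: match Jetpack posts whose author_id is 0."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"site_id": JETPACK_SITE_ID}},
                    {"term": {"author_id": UNKNOWN_AUTHOR_ID}},
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to a search endpoint.
    print(json.dumps(unmatched_jetpack_authors_query(), indent=2))
```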
Post Content
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
title | string | analyzed | All HTML and shortcodes are stripped |
title.word_count | number | token_count | Count of tokens as analyzed |
excerpt | string | analyzed | All HTML and shortcodes are stripped |
excerpt.word_count | number | token_count | Count of tokens as analyzed |
content | string | analyzed | All HTML and shortcodes are stripped |
content.word_count | number | token_count | Count of tokens as analyzed |
Extracted Information From Post Content
The shortcode fields are dynamic: the “[NAME]” portion of the field name depends on the name of the shortcode extracted. In addition, the name of each shortcode type that is extracted is stored in the shortcode_types field.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
has.embed | number | short | Count of embeds in post |
has.hashtag | number | short | Count of hashtags in post |
has.image | number | short | Count of images in post |
has.link | number | short | Count of links in post |
has.mention | number | short | Count of mentions in post |
has.shortcode | number | short | Count of shortcodes in post |
embed.url | string | not analyzed | deprecated |
embed.url.analyzed | string | analyzed | URL no protocol |
embed.url.raw | string | not analyzed | URL no protocol |
embed.host | string | not analyzed | Host part only (e.g. matt.wordpress.com) |
embed.host_reversed | string | not analyzed | For efficient prefix searches (e.g. “com.wordpress.*”) |
hashtag.name | string | not analyzed | |
image.url | string | not analyzed | deprecated |
image.url.analyzed | string | analyzed | URL no protocol |
image.url.raw | string | not analyzed | URL no protocol |
image.host | string | not analyzed | Host part only (e.g. matt.wordpress.com) |
image.host_reversed | string | not analyzed | For efficient prefix searches (e.g. “com.wordpress.*”) |
link.url | string | analyzed | deprecated |
link.url.analyzed | string | analyzed | URL no protocol |
link.url.raw | string | not analyzed | URL no protocol |
link.host | string | not analyzed | Host part only (e.g. matt.wordpress.com) |
link.host_reversed | string | not analyzed | For efficient prefix searches (e.g. “com.wordpress.*”) |
links_minus_images | number | integer | Number of links excluding links to images |
mention.name | string | not analyzed | |
mention.name.lc | string | lowercased | |
shortcode_types | string | not analyzed | List of shortcodes in this post |
shortcode.[NAME].id | string | not analyzed | E.g. shortcode.youtube.id |
shortcode.[NAME].count | number | short | E.g. shortcode.youtube.count |
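The host_reversed fields exist for prefix searches such as “com.wordpress.*”. The following is a minimal sketch of how a client might build such a query. It assumes that the stored value is the host with its dot-separated labels reversed (matt.wordpress.com stored as com.wordpress.matt), which is inferred from the example in the table rather than confirmed, and that the index accepts the standard Elasticsearch prefix query. Both function names are hypothetical.

```python
def reverse_host(host: str) -> str:
    """Reverse the dot-separated labels: matt.wordpress.com -> com.wordpress.matt."""
    return ".".join(reversed(host.split(".")))


def subdomain_prefix_query(domain: str, field: str = "link.host_reversed") -> dict:
    """Hypothetical helper: prefix query matching hosts under `domain`.

    The trailing dot restricts matches to subdomains, so "wordpress.com"
    does not also match a host such as "wordpressfoo.com". It also means the
    bare domain itself is not matched.
    """
    return {"query": {"prefix": {field: reverse_host(domain) + "."}}}


if __name__ == "__main__":
    print(subdomain_prefix_query("wordpress.com"))
```

Reversing the labels puts the most general part of the host first, so a search for everything under one domain becomes a simple leading-prefix match on a not analyzed field.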
Post Tags, Categories, and Taxonomies
The taxonomy fields are dynamic: the “[NAME]” portion of the field name depends on the name of the post taxonomy.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
tag_cat_count | number | short | Total number of tags and categories |
tag.name | string | analyzed | |
tag.name.raw | string | not analyzed | |
tag.name.raw_lc | string | lowercased | |
tag.slug | string | not analyzed | |
tag.term_id | number | long | |
category.name | string | analyzed | |
category.name.raw | string | not analyzed | |
category.name.raw_lc | string | lowercased | |
category.slug | string | not analyzed | |
category.term_id | number | long | |
taxonomy.[NAME].name | string | analyzed | |
taxonomy.[NAME].name.raw | string | not analyzed | |
taxonomy.[NAME].name.raw_lc | string | lowercased | |
taxonomy.[NAME].slug | string | not analyzed | |
taxonomy.[NAME].term_id | number | long | |
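To make the dynamic “[NAME]” naming concrete, here is a small sketch that substitutes a taxonomy name into the field pattern and builds an exact-match query on the slug. The taxonomy name “genre” and the helper names are hypothetical, and the sketch assumes the index accepts the standard Elasticsearch term query.

```python
# Sub-fields listed in the taxonomy table for each taxonomy.[NAME] entry.
TAXONOMY_SUBFIELDS = {"name", "name.raw", "name.raw_lc", "slug", "term_id"}


def taxonomy_field(taxonomy: str, subfield: str = "slug") -> str:
    """Hypothetical helper: fill in the [NAME] part of a taxonomy field name."""
    if subfield not in TAXONOMY_SUBFIELDS:
        raise ValueError(f"unknown taxonomy sub-field: {subfield}")
    return f"taxonomy.{taxonomy}.{subfield}"


def taxonomy_slug_query(taxonomy: str, slug: str) -> dict:
    """Exact match on the not analyzed slug field of a custom taxonomy."""
    return {"query": {"term": {taxonomy_field(taxonomy, "slug"): slug}}}


if __name__ == "__main__":
    # "genre" is an illustrative taxonomy name, not one defined by the schema.
    print(taxonomy_slug_query("genre", "science-fiction"))
```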
Post Interactions
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
like_count | number | short | |
liker_ids | number | integer | WordPress.com users that liked this post |
comment_count | number | integer | |
commenter_ids | number | integer | WordPress.com users that commented on this post |
is_reblogged | boolean | | Post contains reblogged content from another site |
reblog_count | number | long | Number of times this post was reblogged elsewhere |
reblogger_ids | number | long | WordPress.com users that reblogged this post elsewhere |
Post Dates
Each of the dates associated with the post is stored both as a date data type and broken out into token parts, which makes granular date-based searches easier. For example, you can find all posts that were published on a Tuesday (date_token.day_of_week), or those that were modified in the second half of any hour (modified_token.seconds_from_hour). The date data type accepts dates in ISO 8601 format, either with times (yyyy-MM-dd HH:mm:ss) or without (yyyy-MM-dd).
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
date | date | ISO 8601 | |
date_token.year | number | short | 4 digit year |
date_token.month | number | byte | |
date_token.day | number | byte | |
date_token.hour | number | byte | 24 hour format |
date_token.minute | number | byte | |
date_token.second | number | byte | |
date_token.day_of_year | number | short | The day of the year (starting from 0) |
date_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
date_token.week_of_year | number | byte | Week number of year |
date_token.seconds_from_day | number | integer | Seconds since midnight of day |
date_token.seconds_from_hour | number | short | Seconds since start of hour |
date_gmt | date | ISO 8601 | |
date_gmt_token.year | number | short | 4 digit year |
date_gmt_token.month | number | byte | |
date_gmt_token.day | number | byte | |
date_gmt_token.hour | number | byte | 24 hour format |
date_gmt_token.minute | number | byte | |
date_gmt_token.second | number | byte | |
date_gmt_token.day_of_year | number | short | The day of the year (starting from 0) |
date_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
date_gmt_token.week_of_year | number | byte | Week number of year |
date_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day |
date_gmt_token.seconds_from_hour | number | short | Seconds since start of hour |
modified | date | ISO 8601 | |
modified_token.year | number | short | 4 digit year |
modified_token.month | number | byte | |
modified_token.day | number | byte | |
modified_token.hour | number | byte | 24 hour format |
modified_token.minute | number | byte | |
modified_token.second | number | byte | |
modified_token.day_of_year | number | short | The day of the year (starting from 0) |
modified_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
modified_token.week_of_year | number | byte | Week number of year |
modified_token.seconds_from_day | number | integer | Seconds since midnight of day |
modified_token.seconds_from_hour | number | short | Seconds since start of hour |
modified_gmt | date | ISO 8601 | |
modified_gmt_token.year | number | short | 4 digit year |
modified_gmt_token.month | number | byte | |
modified_gmt_token.day | number | byte | |
modified_gmt_token.hour | number | byte | 24 hour format |
modified_gmt_token.minute | number | byte | |
modified_gmt_token.second | number | byte | |
modified_gmt_token.day_of_year | number | short | The day of the year (starting from 0) |
modified_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
modified_gmt_token.week_of_year | number | byte | Week number of year |
modified_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day |
modified_gmt_token.seconds_from_hour | number | short | Seconds since start of hour |
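The two searches mentioned in the introduction to this section (posts published on a Tuesday, and posts modified in the second half of an hour) can be sketched as query bodies. The sketch also shows one way the token parts could be derived from a timestamp, following the notes in the table. It assumes the standard Elasticsearch term and range queries, and it assumes month and day hold ordinary calendar values. week_of_year is left out because the table does not say which week-numbering scheme is used. The function name is hypothetical.

```python
from datetime import datetime


def date_tokens(dt: datetime) -> dict:
    """Hypothetical helper: derive token parts as described in the Post Dates table."""
    return {
        "year": dt.year,
        "month": dt.month,  # assumed to be the calendar month (1-12)
        "day": dt.day,      # assumed to be the calendar day of month
        "hour": dt.hour,    # 24 hour format
        "minute": dt.minute,
        "second": dt.second,
        "day_of_year": dt.timetuple().tm_yday - 1,  # table: starts from 0
        "day_of_week": dt.isoweekday(),             # 1 for Monday through 7 for Sunday
        "seconds_from_day": dt.hour * 3600 + dt.minute * 60 + dt.second,
        "seconds_from_hour": dt.minute * 60 + dt.second,
    }


# Posts published on a Tuesday (day_of_week 2).
published_on_tuesday = {"query": {"term": {"date_token.day_of_week": 2}}}

# Posts modified in the second half of an hour (1800 seconds or more past the hour).
modified_in_second_half_of_hour = {
    "query": {"range": {"modified_token.seconds_from_hour": {"gte": 1800}}}
}

if __name__ == "__main__":
    print(date_tokens(datetime(2014, 1, 7, 13, 45, 30)))
```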
Post Meta — VIP Indices Only
Sites with the Elasticsearch VIP Add On are indexed to a separate, dedicated VIP cluster, and the add-on also enables the indexing of post meta fields. The post meta fields are dynamic: the “[NAME]” portion of the field name depends on the name (key) of the post meta being indexed. To accommodate advanced querying, all post meta values are cast and indexed as numeric and boolean values in addition to being indexed as strings.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
meta.[NAME].value | string | analyzed | |
meta.[NAME].value.raw | string | not analyzed | |
meta.[NAME].value.raw_lc | string | lowercased | |
meta.[NAME].long | number | long | Value cast as 64bit integer (bigint) |
meta.[NAME].double | number | double | Value cast as floating point number |
meta.[NAME].boolean | boolean | | Value cast as boolean value |
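Because each meta value is also indexed in numeric form, numeric comparisons can target the long or double variant instead of the string value. The following is a minimal sketch under stated assumptions: the meta key “price” and the helper names are hypothetical, the index is assumed to accept the standard Elasticsearch range query, and how non-numeric values are cast is not described here, so the sketch makes no claim about it.

```python
# Variants listed in the Post Meta table for each meta.[NAME] entry.
META_VARIANTS = {"value", "value.raw", "value.raw_lc", "long", "double", "boolean"}


def meta_field(key: str, variant: str = "value") -> str:
    """Hypothetical helper: fill in the [NAME] part of a post meta field name."""
    if variant not in META_VARIANTS:
        raise ValueError(f"unknown meta variant: {variant}")
    return f"meta.{key}.{variant}"


def meta_range_query(key: str, gte=None, lte=None, variant: str = "long") -> dict:
    """Build a range query on the numeric cast of a meta value."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"query": {"range": {meta_field(key, variant): bounds}}}


if __name__ == "__main__":
    # "price" is an illustrative meta key, not one defined by the schema.
    print(meta_range_query("price", gte=10, lte=100))
```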
* Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.