Elasticsearch powers search and analytics for many applications, and any downtime can disrupt data querying and indexing. This policy helps prevent prolonged downtime by restarting stopped Elasticsearch services and alerting your team in real time.
Description
This policy monitors the Elasticsearch service on Linux devices tagged with “Search.” If the service stops, the policy restarts it automatically and sends a real-time alert. This limits disruption to indexing, querying, and search while giving your team visibility into service health.
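The check-then-restart behavior described above can be sketched roughly as follows. This is an illustration, not the policy's actual script: it assumes a systemd-based Linux host and a unit named `elasticsearch` (the name used by the official DEB/RPM packages), and the `check_and_restart` function and its `runner` and `alert` parameters are hypothetical names chosen for this example.

```python
import subprocess
from typing import Callable, List

# Assumption: the systemd unit is called "elasticsearch"; adjust for your setup.
SERVICE = "elasticsearch"


def run(cmd: List[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def check_and_restart(runner: Callable[[List[str]], int] = run,
                      alert: Callable[[str], None] = print) -> str:
    """Return 'running', 'restarted', or 'failed'."""
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    if runner(["systemctl", "is-active", "--quiet", SERVICE]) == 0:
        return "running"

    alert(f"{SERVICE} is not active; attempting restart")
    if runner(["systemctl", "restart", SERVICE]) == 0:
        return "restarted"

    alert(f"{SERVICE} restart failed; manual intervention needed")
    return "failed"
```

Passing the command runner and the alert callback as parameters keeps the logic testable without touching a real service. In practice `alert` would hand the message to whatever notification channel your team uses.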
Use Cases
Maintaining Elasticsearch uptime for production search engines.
Automating monitoring for data analytics pipelines.
Supporting distributed Elasticsearch clusters across multiple nodes.
Preventing downtime in applications reliant on Elasticsearch for logging or reporting.
Recommendations
Tagging: Apply the “Search” tag to all relevant devices. We recommend tagging automatically so that no key device is missed. See the “Service Based Tagging” automation for an example.
Testing: Stop the Elasticsearch service manually to validate automatic restarts and alerts.
Cluster Health: Use additional monitors for Elasticsearch cluster health (e.g., node availability, shard status).
Alert Management: Configure alerts to differentiate between minor disruptions and critical failures.
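As a starting point for the cluster health and alert management recommendations, the sketch below reads the Elasticsearch `_cluster/health` endpoint and maps its `status` field to an alert severity. The base URL `http://localhost:9200`, the absence of authentication, and the severity labels are assumptions made for illustration; secured clusters will need credentials and TLS settings.

```python
import json
import urllib.request


def classify_health(health: dict) -> str:
    """Map a _cluster/health response to an alert severity."""
    status = health.get("status")
    if status == "green":
        return "ok"        # all primary and replica shards are assigned
    if status == "yellow":
        return "warning"   # primaries assigned, some replicas are not
    # "red" means at least one primary shard is unassigned; a missing or
    # unknown status is treated as critical too.
    return "critical"


def fetch_health(base_url: str = "http://localhost:9200",
                 timeout: float = 5.0) -> dict:
    """Fetch cluster health. Assumes an unsecured node at base_url."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `classify_health(fetch_health())` on a node gives a single severity value that an alert rule can route on, which is one way to separate minor disruptions from critical failures.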
FAQ
Can this policy check cluster health? No, this policy monitors service status. Use other monitors for cluster-specific metrics.
What should I do if the service doesn’t restart? Check Elasticsearch logs for potential causes, such as resource exhaustion or configuration issues.
Is this compatible with all Elasticsearch versions? The policy checks only the service status, so it should work across versions. Verify the service name and restart command for your specific setup.
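When a restart fails, a quick scan of the logs can narrow down the cause. The sketch below is a minimal, non-exhaustive example: the `HINTS` table and the `diagnose` function are hypothetical, and the substrings are only a few commonly seen indicators of resource exhaustion or failed bootstrap checks.

```python
from typing import Iterable, List

# Illustrative substrings only; extend for your environment.
HINTS = {
    "OutOfMemoryError": "JVM heap exhausted",
    "vm.max_map_count": "kernel vm.max_map_count too low (bootstrap check)",
    "max file descriptors": "file descriptor limit too low (bootstrap check)",
    "No space left on device": "disk full",
}


def diagnose(lines: Iterable[str]) -> List[str]:
    """Return the distinct hints matched in the log lines, in first-seen order."""
    found: List[str] = []
    for line in lines:
        for needle, hint in HINTS.items():
            if needle in line and hint not in found:
                found.append(hint)
    return found
```

The input can be any iterable of lines, for example an open log file or the captured output of `journalctl -u elasticsearch` on systemd hosts.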
Included with this Monitor:
Below is a list of what you can expect to find when importing this Monitor.
Script details:
The following data and settings will be imported with your script.