Monitoring FSx for ONTAP file systems using Harvest and Grafana
NetApp Harvest is an open source tool for gathering performance and capacity metrics from ONTAP systems, and is compatible with FSx for ONTAP. You can use Harvest with Grafana for an open source monitoring solution.
Getting started with Harvest and Grafana
The following section details how you can set up and configure Harvest and Grafana to measure your FSx for ONTAP file system’s performance and storage capacity utilization.
You can monitor your Amazon FSx for NetApp ONTAP file system by using Harvest and Grafana. NetApp Harvest monitors ONTAP data centers by collecting performance, capacity, and hardware metrics from FSx for ONTAP file systems. Grafana provides a dashboard where the collected Harvest metrics can be displayed.
Supported Harvest dashboards
Amazon FSx for NetApp ONTAP exposes a different set of metrics than on-premises NetApp ONTAP does. Therefore, only the following out-of-the-box Harvest dashboards tagged with fsx are currently supported for use with FSx for ONTAP. Some of the panels in these dashboards may be missing information that is not supported.
ONTAP: Compliance
ONTAP: Data Protection Snapshots
ONTAP: Security
ONTAP: SVM
ONTAP: Volume
AWS CloudFormation template
To get started, you can deploy an AWS CloudFormation template that automatically launches an Amazon EC2 instance running Harvest and Grafana. As an input to the AWS CloudFormation template, you specify the fsxadmin user and the Amazon FSx management endpoint for the file system that will be added as part of this deployment. After the deployment is completed, you can log in to the Grafana dashboard to monitor your file system.
This solution uses AWS CloudFormation to automate the deployment of the Harvest and Grafana solution. The template creates an Amazon EC2 Linux instance and installs Harvest and Grafana software. To use this solution, download the fsx-ontap-harvest-grafana.template AWS CloudFormation template.
Note
Implementing this solution incurs billing for the associated AWS services. For more information, see the pricing details pages for those services.
Amazon EC2 instance types
When configuring the template, you provide the Amazon EC2 instance type. NetApp's recommendation for the instance size depends on how many file systems you monitor and the number of metrics you choose to collect. With the default configuration, for each 10 file systems you monitor, NetApp recommends:
CPU: 2 cores
Memory: 1 GB
Disk: 500 MB (mostly used by log files)
Following are some sample configurations and the t3 instance type you might choose.
File systems | CPU | Disk | Instance type |
---|---|---|---|
Under 10 | 2 cores | 500 MB | |
10–40 | 4 cores | 1000 MB | |
40+ | 8 cores | 2000 MB | |
For more information on Amazon EC2 instance types, see General purpose instances in the Amazon EC2 User Guide.
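The sample configurations can be read as a simple lookup from the number of monitored file systems to a CPU and disk figure. The following is a minimal sketch of that lookup, not an official sizing tool. The function name harvest_sizing is hypothetical, and treating exactly 40 file systems as part of the 10–40 tier is an assumption, because the table's tiers overlap at that value.

```python
def harvest_sizing(file_systems: int) -> dict:
    """Return the CPU and disk figures from the sample configurations.

    Hypothetical helper: the tiers mirror the sample table only and do not
    account for custom collection settings.
    """
    if file_systems < 1:
        raise ValueError("file_systems must be at least 1")
    if file_systems < 10:
        return {"cpu_cores": 2, "disk_mb": 500}
    # Assumption: exactly 40 file systems is treated as the 10-40 tier.
    if file_systems <= 40:
        return {"cpu_cores": 4, "disk_mb": 1000}
    return {"cpu_cores": 8, "disk_mb": 2000}


if __name__ == "__main__":
    for count in (3, 25, 60):
        print(count, harvest_sizing(count))
```

You would still choose the instance type yourself by matching these figures against the t3 instance sizes.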
Instance port rules
When you set up your Amazon EC2 instance, make sure that ports 3000 and 9090 are open for inbound traffic in the security group that the Amazon EC2 Harvest and Grafana instance is in. Because the launched instance connects to an endpoint over HTTPS, it needs to resolve the endpoint, which requires port 53 (TCP/UDP) for DNS. To reach the endpoint, it also needs port 443 (TCP) for HTTPS and internet access.
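One way to confirm the inbound rules from a client machine is to attempt a TCP connection to each port. The following is a minimal sketch that uses only the Python standard library. The function names are hypothetical, and a closed result can also mean that nothing is listening on the port yet, not only that the security group blocks it.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_harvest_ports(host: str, ports=(3000, 9090)) -> dict:
    """Map each inbound port named in this section to its reachability."""
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, check_harvest_ports("EC2_instance_IP") returns a dictionary keyed by port number, where EC2_instance_IP stands for the address of your instance.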
Deployment procedure
The following procedure configures and deploys the Harvest/Grafana solution. It takes about five minutes to deploy. Before you start, you must have an FSx for ONTAP file system running in an Amazon Virtual Private Cloud (Amazon VPC) in your AWS account, and the parameter information for the template listed below. For more information on creating a file system, see Creating file systems.
To launch the Harvest/Grafana solution stack
-
Download the fsx-ontap-harvest-grafana.template AWS CloudFormation template. For more information on creating an AWS CloudFormation stack, see Creating a stack on the AWS CloudFormation console in the AWS CloudFormation User Guide.
Note
By default, this template launches in the US East (N. Virginia) AWS Region. You must launch this solution in an AWS Region where Amazon FSx is available. For more information, see Amazon FSx endpoints and quotas in the AWS General Reference.
-
For Parameters, review the parameters for the template and modify them for the needs of your file system. This solution uses the following default values.
Parameter | Default | Description |
---|---|---|
InstanceType | t3.micro | The Amazon EC2 instance type. Following are the t3 instance types: t3.micro, t3.small, t3.medium, t3.large, t3.xlarge, t3.2xlarge. For the complete list of allowed Amazon EC2 instance type values for this parameter, see the fsx-ontap-harvest-grafana.template. |
KeyPair | No default value | The key pair that is used to access the Amazon EC2 instance. |
SecurityGroup | No default value | The security group ID for the Harvest/Grafana instance. Ensure that inbound ports 3000 and 9090, in addition to ports 53 and 443, are open from the clients you wish to use to access your Grafana dashboard. |
Subnet Type | No default value | Specify the subnet type, either public or private. Use a public subnet for resources that must be connected to the internet, and a private subnet for resources that won't be connected to the internet. For more information, see Subnet types in the Amazon VPC User Guide. |
Subnet | No default value | Specify the same subnet as your Amazon FSx for NetApp ONTAP file system's preferred subnet. You can find the file system's Preferred subnet ID in the Amazon FSx console, in the Network & security tab of the FSx for ONTAP file system details page. |
LatestLinuxAmiId | /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 | The latest version of the Amazon Linux 2 AMI in a given AWS Region. |
FSxEndPoint | No default value | The file system's Management endpoint IP address. You can find the file system's management endpoint IP address in the Amazon FSx console, in the Administration tab of the FSx for ONTAP file system details page. |
SecretName | No default value | AWS Secrets Manager secret name containing the password for the file system's fsxadmin user. This is the password you provided when you created the file system. |
-
Choose Next.
-
For Options, choose Next.
-
For Review, review and confirm the settings. You must select the check box acknowledging that the template creates IAM resources.
-
Choose Create to deploy the stack.
You can view the status of the stack in the AWS CloudFormation console in the Status column. You should see a status of CREATE_COMPLETE in about five minutes.
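If you prefer to prepare the parameter values ahead of time, for example to reuse them across deployments, they can be written as a list of ParameterKey and ParameterValue pairs, which is the JSON shape that AWS CloudFormation accepts for stack parameters. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the key SubnetType is a guess at how the template names the Subnet Type parameter, and every value in the usage example is a placeholder. Check the parameter names against the downloaded template before using the output.

```python
import json

# Parameters listed with "No default value" in this section.
# Assumption: "SubnetType" is the template's key for the Subnet Type parameter.
REQUIRED_KEYS = (
    "KeyPair",
    "SecurityGroup",
    "SubnetType",
    "Subnet",
    "FSxEndPoint",
    "SecretName",
)


def build_stack_parameters(values: dict) -> list:
    """Validate required values and return CloudFormation-style parameters."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # Optional keys such as InstanceType pass through; omitted ones keep
    # the template default.
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "KeyPair": "my-key-pair",
        "SecurityGroup": "sg-0123456789abcdef0",
        "SubnetType": "private",
        "Subnet": "subnet-0123456789abcdef0",
        "FSxEndPoint": "10.0.0.10",
        "SecretName": "my-fsxadmin-secret",
        "InstanceType": "t3.small",
    }
    print(json.dumps(build_stack_parameters(example), indent=2))
```

The printed JSON can be saved to a file and supplied when you create the stack outside the console. The stack still requires the acknowledgment that the template creates IAM resources.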
Logging in to Grafana
After the deployment has finished, use your browser to log in to the Grafana dashboard at the IP and port 3000 of the Amazon EC2 instance:
http://EC2_instance_IP:3000
When prompted, use the Grafana default user name (admin) and password (pass).
We recommend that you change your password as soon as you log in.
For more information, see the
NetApp Harvest
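Before opening the browser, you can check from a script whether Grafana is answering on port 3000. The following is a minimal sketch that builds the login URL described above and requests Grafana's /api/health endpoint with the Python standard library. The function names are hypothetical, and the check only reports whether the endpoint returned HTTP 200, not whether the Harvest dashboards contain data.

```python
import urllib.request


def grafana_url(instance_ip: str, port: int = 3000) -> str:
    """Build the Grafana base URL for the Amazon EC2 instance."""
    return f"http://{instance_ip}:{port}"


def grafana_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/api/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, as are
        # refused connections and timeouts.
        return False
```

For example, grafana_is_up(grafana_url("EC2_instance_IP")) returns False while the instance is still starting or when port 3000 is not reachable from your client.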
Troubleshooting Harvest and Grafana
If data is missing from the Harvest and Grafana dashboards, or you are having trouble setting up Harvest and Grafana with FSx for ONTAP, check the following topics for a potential solution.
SVM and volume dashboards are blank
If the AWS CloudFormation stack deployed successfully and you can contact Grafana, but the SVM and volume dashboards are blank, use the following procedure to troubleshoot your environment. You will need SSH access to the Amazon EC2 instance that Harvest and Grafana are deployed on.
SSH into the Amazon EC2 instance that your Harvest and Grafana clients are running on.
[~]$ ssh ec2-user@ec2_ip_address
Use the following command to open the harvest.yml file and:
Verify that an entry was created for your FSx for ONTAP instance as Cluster-2.
Verify that the entries for username and password match your fsxadmin credentials.
[ec2-user@ip-ec2_ip_address ~]$ sudo cat /home/ec2-user/harvest_install/harvest/harvest.yml
-
If the password field is blank, open the file in an editor and update it with the fsxadmin password, as follows:
[ec2-user@ip-ec2_ip_address ~]$ sudo vi /home/ec2-user/harvest_install/harvest/harvest.yml
Ensure the fsxadmin user credentials are stored in Secrets Manager in the following format for any future deployments, replacing fsxadmin_password with your password.
{"username" : "fsxadmin", "password" : "fsxadmin_password"}
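Writing the secret by hand is error-prone when the password contains quotation marks or backslashes, because those characters must be escaped in JSON. The following is a minimal sketch that produces and validates a secret string in the format shown above. The function names are hypothetical, and the code only prepares the string; storing it in Secrets Manager is a separate step.

```python
import json


def build_fsxadmin_secret(password: str) -> str:
    """Return the secret string in the username/password JSON format."""
    if not password:
        raise ValueError("password must not be empty")
    # json.dumps escapes quotes and backslashes inside the password.
    return json.dumps({"username": "fsxadmin", "password": password})


def is_valid_fsxadmin_secret(secret_string: str) -> bool:
    """Check that a secret string matches the expected format."""
    try:
        data = json.loads(secret_string)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and data.get("username") == "fsxadmin"
        and isinstance(data.get("password"), str)
        and data["password"] != ""
    )


if __name__ == "__main__":
    secret = build_fsxadmin_secret("fsxadmin_password")
    print(secret, is_valid_fsxadmin_secret(secret))
```

A blank password in the secret is one likely cause of the blank password field in harvest.yml, so validating the string before deployment can save a troubleshooting round.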
CloudFormation stack rolled back after timeout
If you are unable to deploy the CloudFormation stack successfully and it is rolling back with errors, use the following procedure to resolve the issue. You will need SSH access to the EC2 instance deployed by the CloudFormation stack.
Redeploy the CloudFormation stack, making sure that automatic rollback is disabled.
-
SSH into the Amazon EC2 instance that your Harvest and Grafana clients are running on.
[~]$ ssh ec2-user@ec2_ip_address
-
Verify that the Docker containers were successfully started using the following command.
[ec2-user@ip-ec2_ip_address ~]$ sudo docker ps
In the response you should see five containers as follows:
CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS                          PORTS                    NAMES
6b9b3f2085ef   rahulguptajss/harvest   "bin/poller --config…"   8 minutes ago   Restarting (1) 20 seconds ago                            harvest_cluster-2
3cf3e3623fde   rahulguptajss/harvest   "bin/poller --config…"   8 minutes ago   Up About a minute                                        harvest_cluster-1
708f3b7ef6f8   grafana/grafana         "/run.sh"                8 minutes ago   Up 8 minutes                    0.0.0.0:3000->3000/tcp   harvest_grafana
0febee61cab7   prom/alertmanager       "/bin/alertmanager -…"   8 minutes ago   Up 8 minutes                    0.0.0.0:9093->9093/tcp   harvest_prometheus_alertmanager
1706d8cd5a0c   prom/prometheus         "/bin/prometheus --c…"   8 minutes ago   Up 8 minutes                    0.0.0.0:9090->9090/tcp   harvest_prometheus
If the Docker containers are not running, check for failures in the /var/log/cloud-init-output.log file as follows.
[ec2-user@ip-ec2_ip_address ~]$ sudo cat /var/log/cloud-init-output.log
PLAY [Manage Harvest] **********************************************************
TASK [Gathering Facts] *********************************************************
ok: [localhost]
TASK [Verify images] ***********************************************************
failed: [localhost] (item=prom/prometheus) => {"ansible_loop_var": "item", "changed": false, "item": "prom/prometheus", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
failed: [localhost] (item=prom/alertmanager) => {"ansible_loop_var": "item", "changed": false, "item": "prom/alertmanager", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
failed: [localhost] (item=rahulguptajss/harvest) => {"ansible_loop_var": "item", "changed": false, "item": "rahulguptajss/harvest", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
failed: [localhost] (item=grafana/grafana) => {"ansible_loop_var": "item", "changed": false, "item": "grafana/grafana", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
PLAY RECAP *********************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
If there are failures, execute the following commands to deploy the Harvest and Grafana containers.
[ec2-user@ip-ec2_ip_address ~]$ sudo su
[ec2-user@ip-ec2_ip_address ~]$ cd /home/ec2-user/harvest_install
[ec2-user@ip-ec2_ip_address ~]$ /usr/local/bin/ansible-playbook manage_harvest.yml
[ec2-user@ip-ec2_ip_address ~]$ /usr/local/bin/ansible-playbook manage_harvest.yml --tags api
Validate the containers started successfully by running sudo docker ps and connecting to your Harvest and Grafana URL.
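The container check can also be scripted, so you don't have to read the table by eye. The following is a minimal sketch that parses container names and statuses and reports any expected container that is missing or not in an Up state. It assumes the input was produced on the instance with sudo docker ps -a --format '{{.Names}}\t{{.Status}}', and that your containers use the same names as the sample output in this procedure. The function name is hypothetical.

```python
# Container names taken from the sample docker ps output in this procedure.
# Assumption: your deployment uses the same names.
EXPECTED_CONTAINERS = (
    "harvest_cluster-1",
    "harvest_cluster-2",
    "harvest_grafana",
    "harvest_prometheus",
    "harvest_prometheus_alertmanager",
)


def unhealthy_containers(ps_output: str, expected=EXPECTED_CONTAINERS) -> dict:
    """Map each expected container that is missing or not 'Up' to its status.

    ps_output holds one "name<TAB>status" pair per line.
    """
    statuses = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        statuses[name.strip()] = status.strip()
    problems = {}
    for name in expected:
        status = statuses.get(name, "missing")
        if not status.startswith("Up"):
            problems[name] = status
    return problems
```

Applied to the sample output shown in this procedure, the function would report harvest_cluster-2 with its Restarting status, which points at the poller for the FSx for ONTAP file system. An empty result means all five containers are up.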