I want to monitor some services so that they are restarted whenever they go down, and I found a great tool for this: monit. It works fine for Zookeeper, since I can use a condition that matches "QuorumPeerMain", as shown below in the monitrc file:
check process Zookeeper matching "QuorumPeerMain"
start program = "path/to/zkServer.sh start"
stop program = "path/to/zkServer.sh stop"
In the same way, I want to monitor these: hadoop, yarn and hbase.
check process Hadoop matching "?"
start program = "startorstop.sh start" #equivalent to start-dfs.sh
stop program = "startorstop.sh stop" #equivalent to stop-dfs.sh
What should be written in place of the "?"
These are my questions:
- In the hadoop case, any one of NameNode, DataNode or SecondaryNameNode may go down. The monit documentation says that "The top-most matching parent with highest uptime is selected". So, for example, if DataNode goes down, monit still sees NameNode and won't try to restart hadoop. Another option is to use a pid file, but I am not able to find hadoop's pid file in /var/run/.
- I want something like a top-to-bottom approach (not exactly). Only after zookeeper has started do I want to start the remaining services, such as hbase, hadoop and yarn.
CodePudding user response:
Credits to @OneCricketeer for the comment that pointed me to a way to start these processes independently.
I found that NameNode, DataNode and SecondaryNameNode can each be started on their own with a shell script, hadoop-daemon.sh. So in my monit conf, NameNode looks like this:
check process NameNode matching "NameNode"
start program = "startorstop.sh start" #hadoop-daemon.sh start namenode
stop program = "startorstop.sh stop" #hadoop-daemon.sh stop namenode
group hadoop
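One thing to watch with the NameNode entry above: monit treats the `matching` string as a regular expression applied to the process command line, so "NameNode" can also match a running SecondaryNameNode. Below is a sketch of tighter patterns. It assumes the Hadoop JVMs carry a `-Dproc_<daemon>` flag on their command line, which you can check on your own host with `monit procmatch "proc_namenode"` before relying on it. The `/path/to/` prefixes are placeholders, not real locations.

```
# Assumption: each Hadoop JVM has -Dproc_<daemon> in its command line.
# "proc_namenode" does not match "proc_secondarynamenode", so the two
# checks no longer overlap.
check process NameNode matching "proc_namenode"
  start program = "/path/to/hadoop-daemon.sh start namenode"
  stop program  = "/path/to/hadoop-daemon.sh stop namenode"
  group hadoop

check process SecondaryNameNode matching "proc_secondarynamenode"
  start program = "/path/to/hadoop-daemon.sh start secondarynamenode"
  stop program  = "/path/to/hadoop-daemon.sh stop secondarynamenode"
  group hadoop
```

If you would rather use pid files, note that Hadoop writes them under HADOOP_PID_DIR, which defaults to /tmp rather than /var/run.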
As for the other part of my question, I found the depends option. For more detail, see "Service Dependencies". In my case, I wanted to restart HRegionServer whenever DataNode goes down, so the conf below works:
check process HRegionServer matching "HRegionServer"
start program = "startorstop.sh start" #hbase-daemon.sh start regionserver
stop program = "startorstop.sh stop" #hbase-daemon.sh stop regionserver
depends on DataNode
check process DataNode matching "DataNode"
start program = "startorstop.sh start" #hadoop-daemon.sh start datanode
stop program = "startorstop.sh stop" #hadoop-daemon.sh stop datanode
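The same `depends on` option may also cover the "Zookeeper first" ordering from the question. The sketch below is an illustration only: the HMaster entry is hypothetical, it assumes services named Zookeeper and NameNode are defined elsewhere in the same monitrc, and the `/path/to/` prefix is a placeholder.

```
# Hypothetical entry: monit starts HMaster only after the services it
# depends on (Zookeeper and NameNode) have been started.
check process HMaster matching "HMaster"
  start program = "/path/to/hbase-daemon.sh start master"
  stop program  = "/path/to/hbase-daemon.sh stop master"
  depends on Zookeeper, NameNode
```

With this in place, restarting or stopping Zookeeper through monit should also stop and restart HMaster, so it is worth deciding whether that coupling is what you want before adding it.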
