Getting started with happybase on Mac

JDK

Make sure Java and the latest JDK are installed on your OS X system.

```shell
brew cask install java
```

Install and configure Hadoop

Install

```shell
brew install hadoop
```

Config

Edit hadoop-env.sh

The file can be located at /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh where 2.7.3 is the hadoop version.

Find the line with

```shell
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
```

and change it to

```shell
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
```

Edit core-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/core-site.xml.

Add the following configuration at the end of the file:

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

Edit mapred-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/mapred-site.xml and is blank by default. You can copy mapred-site.xml.template and edit the copy.

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
```

Edit hdfs-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hdfs-site.xml.

Add the following configuration at the end of the file:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

Alias

To simplify life, edit your ~/.bash_profile and add the following two aliases (adjust the version number to match your installation).

```shell
alias hstart="/usr/local/Cellar/hadoop/2.7.3/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.7.3/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.7.3/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.7.3/sbin/stop-dfs.sh"
```

Edit your ~/.bashrc or ~/.zshrc and add the following line

```shell
. ~/.bash_profile
```

and execute

```shell
$ source ~/.bashrc
$ source ~/.zshrc
```

in the terminal to update.
Before we can run Hadoop, we first need to format HDFS:

```shell
$ hdfs namenode -format
```

SSH Localhost

Nothing needs to be done here if you have already generated SSH keys. To verify, just check for the existence of the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files. If they do not exist, the keys can be generated using

```shell
$ ssh-keygen -t rsa
```

Enable Remote Login

Open “System Preferences” -> “Sharing” and check “Remote Login”.

Authorize SSH Keys

To allow your system to accept logins, we have to make it aware of the keys that will be used:

```shell
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```

Let’s try to login.

```shell
$ ssh localhost
Last login: Fri Mar 6 20:30:53 2015
$ exit
```

Running Hadoop

Now we can run Hadoop just by typing

```shell
$ hstart
```

and stop it using

```shell
$ hstop
```

Good to know

We can access the Hadoop web interfaces by connecting to

```
NameNode (HDFS):           http://localhost:50070
ResourceManager (YARN):    http://localhost:8088
Node-specific information: http://localhost:8042
```

Useful commands:

```shell
$ hstart
$ jps
32384 NameNode
32712 ResourceManager
32587 SecondaryNameNode
32859 Jps
32476 DataNode
32812 NodeManager
$ hstop
$ yarn    # more resource-management information than the web interface
$ mapred  # detailed information about jobs
```

Install zookeeper

```shell
brew install zookeeper
```

Install hbase

Install

```shell
brew install hbase
```

Alias

To simplify life, edit your ~/.bash_profile and add the following two aliases (adjust the version number to match your installation).

```shell
alias hbstart="/usr/local/Cellar/hbase/1.2.2/bin/start-hbase.sh"
alias hbstop="/usr/local/Cellar/hbase/1.2.2/bin/stop-hbase.sh"
```

Good to know

```shell
$ hstart
$ hbstart
$ jps
32384 NameNode
33024 Jps
32967 HMaster
32712 ResourceManager
32587 SecondaryNameNode
32476 DataNode
32812 NodeManager
$ hbstop
$ hstop
```

Install happybase

```shell
pip install happybase
```

Examples for happybase

Code

```python
import happybase

# table_prefix works like a namespace: table 'data' is stored as 'namespace_data'
# (starting with HBase 0.94, the Thrift server optionally uses a framed transport)
connection = happybase.Connection('localhost', table_prefix='namespace')

# create a table with a single column family 'p'
connection.create_table('data', {'p': dict()})
connection.tables()
table = connection.table('data')

# put one row
table.put('0', {'p:label': '0', 'p:version': '201701', 'p:weight': '0.4'})

# put several rows in one batch
with table.batch() as bat:
    bat.put('1', {'p:label': '1', 'p:version': '201702', 'p:weight': '0.5'})
    bat.put('2', {'p:label': '2', 'p:version': '201703', 'p:weight': '0.6'})

# scan the whole table
for key, value in table.scan():
    print key, value

# put and read with an explicit timestamp; the timestamp passed to row()
# must be larger than the stored one (it is an exclusive upper bound)
table.put('2', {'p:label': '233', 'p:version': '2017234', 'p:weight': '0.9'}, timestamp=201704)
table.row('2', timestamp=201705)

# enable and disable the table
connection.disable_table('data')
connection.enable_table('data')

# delete the table (it must be disabled first)
connection.disable_table('data')
connection.delete_table('data')
```
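happybase's `Table.put` expects plain string keys and values, so numeric fields have to be converted before writing. A small helper can build the `{'family:qualifier': value}` mapping used above (`to_hbase_row` is our own illustrative name, not part of the happybase API):

```python
def to_hbase_row(values, family='p'):
    # Build the {'family:qualifier': string} mapping that Table.put expects;
    # HBase stores raw bytes, so every value is converted with str().
    return dict(('%s:%s' % (family, qualifier), str(value))
                for qualifier, value in values.items())

row = to_hbase_row({'label': 1, 'version': 201702, 'weight': 0.5})
# row == {'p:label': '1', 'p:version': '201702', 'p:weight': '0.5'}
```

This keeps the conversion logic in one place instead of repeating `str()` calls at every `put`.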

## Problems to solve

Connect again

After an error, or when the connection has been unused for a while, you need to open a new connection:

```python
connection = happybase.Connection('localhost')
table = connection.table('namespace_data')
```
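One way to avoid sprinkling reconnect code everywhere is to retry an operation on a fresh connection when the old one dies. A minimal sketch, assuming a connection-factory pattern (`call_with_reconnect` is our own invention, not part of happybase):

```python
import socket

def call_with_reconnect(make_connection, operation, retries=1):
    # Run operation(connection); if the socket is broken (broken pipe,
    # closed transport), open a fresh connection and try once more.
    connection = make_connection()
    for attempt in range(retries + 1):
        try:
            return operation(connection)
        except (socket.error, IOError, EOFError):
            if attempt == retries:
                raise
            # the old connection is dead; replace it and retry
            connection = make_connection()
```

With happybase this might look like `call_with_reconnect(lambda: happybase.Connection('localhost'), lambda conn: conn.table('namespace_data').row('2'))`.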

TTransportException

```
/usr/local/lib/python2.7/site-packages/thriftpy/transport/socket.pyc in read(self, sz)
    123     if len(buff) == 0:
    124         raise TTransportException(type=TTransportException.END_OF_FILE,
--> 125                                   message='TSocket read 0 bytes')
    126     return buff
    127
TTransportException: TTransportException(message='TSocket read 0 bytes', type=4)
```
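"TSocket read 0 bytes" usually means the HBase Thrift server is not running (start it with `hbase thrift start`) or has dropped the connection. A quick sanity check is to probe whether anything is listening on the Thrift port at all (9090 is the default; `thrift_server_reachable` is our own helper, not a happybase function):

```python
import socket

def thrift_server_reachable(host='localhost', port=9090, timeout=2.0):
    # Try a plain TCP connect to the Thrift port; True only means something
    # is listening there, not necessarily a healthy Thrift server.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        return False
    sock.close()
    return True
```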

Broken pipe

```
/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.pyc in meth(name, self, *args)
    226
    227 def meth(name,self,*args):
--> 228     return getattr(self._sock,name)(*args)
    229
    230 for _m in _socketmethods:
error: [Errno 32] Broken pipe
```

Thrift closed

The Thrift server closes idle connections by itself after one hour; this timeout needs to be configured.

```
2017-04-10 09:38:31,614 INFO [main] util.VersionInfo: HBase 1.2.2
2017-04-10 09:38:31,615 INFO [main] util.VersionInfo: Source code repository git://asf-dev/home/busbey/projects/hbase revision=3f671c1ead70d249ea4598f1bbcc5151322b3a13
2017-04-10 09:38:31,615 INFO [main] util.VersionInfo: Compiled by busbey on Fri Jul 1 08:28:55 CDT 2016
2017-04-10 09:38:31,615 INFO [main] util.VersionInfo: From source with checksum 7ac43c3d2f62f134b2a6aa1a05ad66ac
2017-04-10 09:38:31,888 INFO [main] thrift.ThriftServerRunner: Using default thrift server type
2017-04-10 09:38:31,888 INFO [main] thrift.ThriftServerRunner: Using thrift server type threadpool
2017-04-10 09:38:31,920 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-04-10 10:38:32,580 INFO [ConnectionCache_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2017-04-10 10:38:32,581 INFO [ConnectionCache_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x15b558148f70006
2017-04-10 10:38:32,588 INFO [ConnectionCache_ChoreService_1] zookeeper.ZooKeeper: Session: 0x15b558148f70006 closed
2017-04-10 10:38:32,588 INFO [thrift-worker-2-EventThread] zookeeper.ClientCnxn: EventThread shut down
```

Note that the `timestamp` argument is an exclusive upper bound: a read with `timestamp=2017` does not see a cell stored at timestamp 2017.

```python
In [152]: table.row('66613341731', include_timestamp=True)
Out[152]: {'p:1': ('0.13', 2017)}

In [156]: table.row('66613341731', timestamp=2018, include_timestamp=True)
Out[156]: {'p:1': ('0.13', 2017)}

In [157]: table.row('66613341731', timestamp=2017, include_timestamp=True)
Out[157]: {}
```
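The session above can be modeled with a tiny sketch of HBase's read semantics (`cells_visible_at` is our own illustrative function, not an HBase or happybase API):

```python
def cells_visible_at(cells, timestamp):
    # Mimic HBase's timestamp filter: a read at `timestamp` only sees
    # cells written strictly before it (exclusive upper bound).
    return dict((col, (value, ts))
                for col, (value, ts) in cells.items()
                if ts < timestamp)

cells = {'p:1': ('0.13', 2017)}
# cells_visible_at(cells, 2018) -> {'p:1': ('0.13', 2017)}
# cells_visible_at(cells, 2017) -> {}
```

This is why querying with a timestamp equal to the stored one returns an empty dict: you have to pass a value strictly larger than the write timestamp.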