WYY's Blog

JUST DO IT



Event tracking

Posted on 2017-08-01 | In Security & Privacy

Codeless tracking

Codeless tracking = visual tracking deployment = WYSIWYG tracking deployment.

Whether you monitor a website or an App, you must add tracking code or an SDK; without it, no data is collected.

Whether you instrument events manually or use codeless tracking, collecting any data requires adding tracking code or an SDK, which we call the base code. But the base code alone is not enough to capture all user behavior: on both websites and Apps (especially Apps), some user actions simply cannot be captured by it.

User actions that the base code cannot capture are called events.

  • Web pages: interactions with JavaScript, Flash, Silverlight, AJAX, various page plugins, and so on
  • Apps: all interactions, including user taps
  • Any interaction that follows the HTTP protocol (most typically, a page link) can be tracked directly by the base code.
  • For non-HTTP interactions, the base code is powerless.

Each event interaction that needs monitoring is called a "tracking point". The Web needs relatively few tracking points, while an App is covered with them. To collect user data at these points, dedicated event tracking code must be deployed.

Event tracking (manual instrumentation) only works when the base code is already in place.

Vendors offer two ways to configure codeless tracking points for an App:

  • Use your finger instead of a mouse and set everything up directly on the phone.
  • Operate from a desktop browser, which works like an emulator.

Full tracking: essentially no different from codeless tracking, since codeless tracking simply listens to user behavior on every interactive element of the page. Even if the App or web developers do not need certain tracking points, codeless tracking still collects all user behavior data, together with where it occurred.

Codeless tracking vs. manual tracking

| Aspect | Codeless tracking | Manual tracking |
| :-- | :-- | :-- |
| Base code required | Yes | Yes |
| Ease of event deployment | Very easy; business staff can do it themselves | Complex; requires engineers |
| Historical event data | Can be traced back | Data from before instrumentation cannot be recovered |
| Event attributes | Common codeless methods cannot add extra attributes | Several event attributes can be added and tracked |
| Categorized event reports | Generally not possible | Possible |
| Events with no clear location | Cannot be tracked | Can be tracked |

  • If a user interaction has no specific "location", codeless tracking does not apply. Example: new content loading at the bottom of a waterfall feed when the user swipes up. This interaction has no distinct tracking-point position; it is an invisible point and cannot be found in the visual event-tracking setup interface.
  • Visual, codeless deployment can attach only very limited attributes to an interaction. Typically it can only give the interaction a name and then mechanically count how many times it occurs, whereas manual tracking can attach rich attributes to each event.
  • Because manually tracked events carry multiple attributes, users can easily read categorized data reports broken down by those attributes.

Design and implementation of an App-based behavior collection system

App behavior data

Enterprise data falls into three categories:

  • internal transaction data; financial companies in particular have rich transaction data
  • interaction data between the company and its users
  • third-party data

App behavior data refers specifically to users' browsing and click behavior on a website or App, and is mainly used for statistical analysis of App operations. Users' interactions on social forums are also called interaction data; in business scenarios, that kind of data leans more toward customer relationship management, product iteration, or public-opinion monitoring.

App behavior data has three dimensions:

  • Time

    In one financial App, a normal transaction flow takes about 15 seconds, but on some devices the duration clusters around 2-3 seconds, which is clearly not normal user behavior. Using feature analysis to tag flow durations helps judge whether a user's behavior is fraudulent.

  • Frequency

    Combined with heat maps, frequency reveals product experience and customer needs, and can also be used to optimize the App's internal layout and cross-sell related products.
    With further analysis, frequency data can be turned into trend data, which correlates strongly with product conversion and customer purchase behavior.

  • Outcome label

    This mainly records whether a transaction was completed, and is used to judge the result of the customer's clicks and browsing. Outcomes are either converted or not converted. Converted data can feed product-experience analysis, customer-experience analysis, and channel ROI analysis; unconverted data can feed remarketing.
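The time-dimension rule can be sketched as a simple duration filter. This is a minimal sketch; the 3-second threshold and the session format are assumptions for illustration, not values from any real product:

```python
# Flag sessions whose transaction flow finished suspiciously fast.
# FAST_SECONDS is an illustrative cutoff; in practice the threshold
# would come from the observed distribution of flow durations.
FAST_SECONDS = 3.0

def label_sessions(sessions):
    """sessions: list of (device_id, flow_seconds) tuples.
    Returns {device_id: 'suspect' | 'normal'}."""
    labels = {}
    for device_id, seconds in sessions:
        labels[device_id] = 'suspect' if seconds <= FAST_SECONDS else 'normal'
    return labels
```

A real system would set the cutoff from the data (e.g., flows far below the ~15-second norm) rather than a fixed constant.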

| KPI | Definition |
| :-- | :-- |
| Visits | number of times all users used the App within a given time range |
| UV (unique visitors) | number of deduplicated visitors (each counted once) in the given period |
| PV (page views) | number of screen views; each entry into a screen counts once |
| Visit depth | number of screen levels the user reaches per App session |
| Dwell time | how long the user stays on each screen of the App |
| Bounce rate | percentage of single-screen visits out of all visits |

For a given screen: dwell time = leave time - enter time

On an App, only one screen can be shown at a time, so these timestamps let us reconstruct the user's screen-visit path.
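Because screens are shown one at a time, the visit path and per-screen dwell times can be recovered by sorting enter/leave events. A minimal sketch, with the event format assumed for illustration:

```python
def visit_path(events):
    """events: list of (timestamp, kind, screen), kind in {'enter', 'leave'}.
    Returns (path, dwell): path is the ordered list of screens visited,
    dwell maps each screen to total seconds spent on it."""
    path, dwell, entered = [], {}, {}
    for ts, kind, screen in sorted(events):
        if kind == 'enter':
            path.append(screen)
            entered[screen] = ts
        else:  # dwell time = leave time - enter time
            dwell[screen] = dwell.get(screen, 0) + (ts - entered.pop(screen))
    return path, dwell
```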

  1. The user operates the App and produces behavior; the in-App collection component captures each behavior with a function chosen by behavior type, then temporarily stores the captured data on the phone as binary files;
  2. The App reads the staged data files, merges them and removes redundancy, then sends them to the backend collection server in batches according to a policy;
  3. The collection server unpacks the behavior data and, based on its type and time, stores it on different log servers;
  4. The log analysis server periodically fetches the data, analyzes it, and produces results;
  5. The Web server reads the analysis results and renders charts for display;
  • Provide a library for the App to call
  • Provide functions that record user behavior
    • Record basic user information
    • Record screen dwell time when the user enters and leaves a screen
    • Record specific events for specific user operations
  • Provide a data staging mechanism
  • Provide a data sending policy
  • Provide receiving, unpacking, and storage of data on the server
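Steps 1-2 of the pipeline (stage locally, then merge, deduplicate, and send in batches) can be sketched as follows; the event shape, batch size, and `send` callback are assumptions for illustration:

```python
class BehaviorBuffer:
    """Stage events on the device, then flush them deduplicated and in
    time order. `send` is any callable that ships one batch to the
    collection server (here it is injected so the sketch stays offline)."""
    def __init__(self, send, batch_size=100):
        self.send = send
        self.batch_size = batch_size
        self.staged = []

    def record(self, event):
        self.staged.append(event)           # step 1: stage locally
        if len(self.staged) >= self.batch_size:
            self.flush()

    def flush(self):
        # step 2: merge, remove duplicates, then send in batches
        unique = sorted(set(self.staged))
        for i in range(0, len(unique), self.batch_size):
            self.send(unique[i:i + self.batch_size])
        self.staged = []
```

A production SDK would also persist `staged` to disk so events survive App restarts, as the binary staging files in step 1 suggest.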

MySQL Grammar

Posted on 2017-08-01 | In Database

SQL基础

Select

select column1, column2, ...
from table_name;
select * from table_name;

Distinct

select distinct column1, column2, ...
from table_name;

Where

select column1, column2, ...
from table_name
where condition;

And , Or, Not

select column1, column2, ...
from table_name
where condition1 and condition2 and condition3 ...;
select column1, column2, ...
from table_name
where condition1 or condition2 or condition3 ...;
select column1, column2, ...
from table_name
where not condition;

Order By

select column1, column2, ...
from table_name
order by column1, column2, ... ASC|DESC;

Insert Into

insert into table_name
values (value1, value2, value3, ...);
insert into table_name (column1, column2, column3, ...)
values (value1, value2, value3, ...);

Null Values

select column_names
from table_name
where column_name is null;
select column_names
from table_name
where column_name is not null;

Update

update table_name
set column1 = value1, column2 = value2, ...
where condition;

Delete

delete from table_name
where condition;
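All of the statements above can be exercised end-to-end with Python's built-in sqlite3 module (SQLite accepts the same basic syntax; the table and column names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('create table users (name text, city text)')

# Insert Into -- with and without an explicit column list
cur.execute("insert into users values ('ann', 'paris')")
cur.execute("insert into users (name, city) values ('bob', 'paris')")
cur.execute("insert into users (name) values ('eve')")  # city stays NULL

# Select + Distinct + Where + Null Values
cur.execute('select distinct city from users where city is not null')
print(cur.fetchall())                      # [('paris',)]

# Update, then Delete, then Order By
cur.execute("update users set city = 'rome' where name = 'eve'")
cur.execute("delete from users where city = 'paris'")
cur.execute('select name, city from users order by name asc')
print(cur.fetchall())                      # [('eve', 'rome')]
```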

Start with happybase on Mac

Posted on 2017-04-11 | In Database

JDK

Make sure you have Java and a recent JDK installed on your macOS system.

brew cask install java

Install and config hadoop

Install

brew install hadoop

Config

Edit hadoop-env.sh

The file can be located at /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh where 2.7.3 is the hadoop version.

Find the line with

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

and change it to

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

Edit core-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/core-site.xml .

Add the following configuration at the end of the file.

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Edit mapred-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/mapred-site.xml and does not exist by default. You can copy mapred-site.xml.template and edit the copy.

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>

Edit hdfs-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hdfs-site.xml .

Add the following configuration at the end of the file.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Alias

To simplify life, edit your ~/.bash_profile using vim and add the following two aliases (the path must match your installed Hadoop version, 2.7.3 here).

alias hstart="/usr/local/Cellar/hadoop/2.7.3/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.7.3/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.7.3/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.7.3/sbin/stop-dfs.sh"

Edit your ~/.bashrc or ~/.zshrc using vim and add the following line

. ~/.bash_profile

and execute

$ source ~/.bashrc
$ source ~/.zshrc

in the terminal to update.
Before we can run Hadoop we first need to format the HDFS using

$ hdfs namenode -format

SSH Localhost

Nothing needs to be done here if you have already generated ssh keys. To verify, just check for the existence of the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files. If not, the keys can be generated using

$ ssh-keygen -t rsa

Enable Remote Login

Open “System Preferences” -> “Sharing”, then check “Remote Login”.

Authorize SSH Keys

To allow your system to accept logins, we have to make it aware of the keys that will be used:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Let’s try to login.

$ ssh localhost
Last login: Fri Mar 6 20:30:53 2015
$ exit

Running Hadoop

Now we can run Hadoop just by typing

$ hstart

and stopping using

$ hstop

Good to know

We can access the Hadoop web interface by connecting to

NameNode (HDFS): http://localhost:50070
ResourceManager (YARN): http://localhost:8088
Specific Node Information (NodeManager): http://localhost:8042
Command
$ hstart
$ jps
32384 NameNode
32712 ResourceManager
32587 SecondaryNameNode
32859 Jps
32476 DataNode
32812 NodeManager
$ hstop
$ yarn // For resource management; more information than the web interface.
$ mapred // Detailed information about jobs

Install zookeeper

brew install zookeeper

Install hbase

Install

brew install hbase

Alias

To simplify life, edit your ~/.bash_profile using vim and add the following two aliases.

alias hbstart="/usr/local/Cellar/hbase/1.2.2/bin/start-hbase.sh"
alias hbstop="/usr/local/Cellar/hbase/1.2.2/bin/stop-hbase.sh"

Good to know

$ hstart
$ hbstart
$ jps
32384 NameNode
33024 Jps
32967 HMaster
32712 ResourceManager
32587 SecondaryNameNode
32476 DataNode
32812 NodeManager
$ hbstop
$ hstop

Install happybase

pip install happybase

Examples for happybase

Code

import happybase

connection = happybase.Connection('localhost', table_prefix='namespace')
# Starting with HBase 0.94, the Thrift server optionally uses a framed transport.
# table_prefix acts like a namespace for table names.
connection.create_table('data', {'p': dict()})
connection.tables()
table = connection.table('data')
# insert one row
table.put('0', {'p:label': '0', 'p:version': '201701', 'p:weight': '0.4'})
# insert several rows in one batch
with table.batch() as bat:
    bat.put('1', {'p:label': '1', 'p:version': '201702', 'p:weight': '0.5'})
    bat.put('2', {'p:label': '2', 'p:version': '201703', 'p:weight': '0.6'})
# read a row written by the batch, filtered by timestamp
table.row('2', timestamp=201704)  # must be larger than the stored timestamp
# scan the whole table
for key, value in table.scan():
    print key, value
# write and read with an explicit timestamp
table.put('2', {'p:label': '233', 'p:version': '2017234', 'p:weight': '0.9'}, timestamp=201704)
table.row('2', timestamp=201704)
# enable and disable tables
connection.enable_table('data')
connection.disable_table('data')
# delete the table
connection.delete_table('data')

Problems to solve

Connect again

You need to connect again after an error, or when the connection has not been used for a while:

connection = happybase.Connection('localhost')
table = connection.table('namespace_data')
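Rather than reconnecting by hand after every failure, a small helper can rebuild the connection and retry. This is a minimal sketch: the exception tuple is simplified for illustration (real code would also catch thriftpy's TTransportException), and the happybase usage in the comment assumes the table name from above.

```python
import socket

def with_retries(make_conn, fn, attempts=3):
    """Call fn(conn); on a connection-level error, rebuild the
    connection with make_conn() and try again."""
    conn = make_conn()
    for attempt in range(attempts):
        try:
            return fn(conn)
        except (socket.error, IOError):
            if attempt == attempts - 1:
                raise  # out of retries: let the error propagate
            conn = make_conn()  # stale connection: reconnect and retry

# Possible usage with happybase (names assumed):
# row = with_retries(
#     lambda: happybase.Connection('localhost'),
#     lambda conn: conn.table('namespace_data').row('2'))
```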

TTransportException

/usr/local/lib/python2.7/site-packages/thriftpy/transport/socket.pyc in read(self, sz)
    123         if len(buff) == 0:
    124             raise TTransportException(type=TTransportException.END_OF_FILE,
--> 125                 message='TSocket read 0 bytes')
    126         return buff
    127
TTransportException: TTransportException(message='TSocket read 0 bytes', type=4)

Broken pipe

/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.pyc in meth(name, self, *args)
    226
    227 def meth(name,self,*args):
--> 228     return getattr(self._sock,name)(*args)
    229
    230 for _m in _socketmethods:
error: [Errno 32] Broken pipe

Thrift closed

The Thrift connection closes by itself after one hour, as the log below shows; this timeout needs to be configured.

2017-04-10 09:38:31,614 INFO [main] util.VersionInfo: HBase 1.2.2
2017-04-10 09:38:31,615 INFO [main] util.VersionInfo: Source code repository git://asf-dev/home/busbey/projects/hbase revision=3f671c1ead70d249ea4598f1bbcc5151322b3a13
2017-04-10 09:38:31,615 INFO [main] util.VersionInfo: Compiled by busbey on Fri Jul 1 08:28:55 CDT 2016
2017-04-10 09:38:31,615 INFO [main] util.VersionInfo: From source with checksum 7ac43c3d2f62f134b2a6aa1a05ad66ac
2017-04-10 09:38:31,888 INFO [main] thrift.ThriftServerRunner: Using default thrift server type
2017-04-10 09:38:31,888 INFO [main] thrift.ThriftServerRunner: Using thrift server type threadpool
2017-04-10 09:38:31,920 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-04-10 10:38:32,580 INFO [ConnectionCache_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2017-04-10 10:38:32,581 INFO [ConnectionCache_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x15b558148f70006
2017-04-10 10:38:32,588 INFO [ConnectionCache_ChoreService_1] zookeeper.ZooKeeper: Session: 0x15b558148f70006 closed
2017-04-10 10:38:32,588 INFO [thrift-worker-2-EventThread] zookeeper.ClientCnxn: EventThread shut down

In [152]: table.row('66613341731',include_timestamp=True)
Out[152]: {'p:1': ('0.13', 2017)}
In [156]: table.row('66613341731',timestamp=2018, include_timestamp=True)
Out[156]: {'p:1': ('0.13', 2017)}
In [157]: table.row('66613341731',timestamp=2017, include_timestamp=True)
Out[157]: {}
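The session above is consistent with an exclusive upper bound on timestamps: a read with timestamp=T returns only cells whose stored timestamp is strictly less than T. A pure-Python model of that filter (the cell format here is invented for illustration, not happybase's internal representation):

```python
def visible_cells(cells, timestamp=None):
    """cells: {column: (value, stored_ts)}. Mimics the observed behavior
    of row(..., timestamp=T): only cells with stored_ts < T are kept."""
    if timestamp is None:
        return dict(cells)
    return {col: (val, ts) for col, (val, ts) in cells.items() if ts < timestamp}
```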

Basic Principles of Information Protection

Posted on 2017-03-27 | In Security & Privacy

Basic Principles of Information Protection

In some environments, an externally administered code of ethics or a lack of knowledge about computers adequately protects the stored information.
The protection mechanisms not only protect one user from another; they may also protect their own implementation.
A narrow view is dangerous, and it is hard to prove that this negative requirement has been achieved.

Considerations Surrounding the Study of Protection

Examples of security techniques sometimes applied to computer systems are the following:

  • labeling files with lists of authorized users,
  • verifying the identity of a prospective user by demanding a password,
  • shielding the computer to prevent interception and subsequent interpretation of electromagnetic radiation,
  • enciphering information sent over telephone lines,
  • locking the room containing the computer,
  • controlling who is allowed to make changes to the computer system (both its hardware and software),
  • using redundant circuits or programmed cross-checks that maintain security in the face of hardware or software failures,
  • certifying that the hardware and software are actually implemented as intended.

Functional Levels of Information Protection:

  • unprotected system
  • before release
    • all-or-nothing system
    • controlled sharing
    • user-programmed sharing controls
  • after release
    • putting strings on information

Design principles

  • economy of mechanism (simple and small)
  • fail-safe defaults (whitelist)
  • complete mediation
  • open design
  • separation of privilege
  • least privilege
  • least common mechanism
  • psychological acceptability
  • two further design principles
    • work factor
    • compromise recording
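The fail-safe-defaults principle (base access decisions on explicit permission rather than exclusion) can be illustrated with a tiny whitelist check; the permission table and names are invented for illustration:

```python
# Fail-safe default: access is denied unless explicitly granted.
# An unknown user or an unlisted action fails closed, not open.
WHITELIST = {
    'alice': {'read'},
    'bob': {'read', 'write'},
}

def allowed(user, action):
    return action in WHITELIST.get(user, set())
```

The design choice is that every lookup failure (missing user, missing permission) collapses to "deny", so a mistake in the table withholds access rather than granting it.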

Technical Underpinnings

development plan

  • from the top down: a subject is coherent and self-contained
  • bottom-up: a topic still containing ad hoc strategies and competing world views

essentials of information protection

  • protect
  • authenticate

an isolated virtual machine

  • descriptor register
  • privileged bit
  • supervisor

authentication mechanisms

  • password
    • easy to guess
    • exposed to be used

shared information

  • list-oriented(high-level)
  • ticket-oriented(low-level)

Click Fraud

Posted on 2017-03-27 | In Security & Privacy

PPC (Pay Per Click) == CPC (Cost Per Click)
Click fraud: caused by competitors, disgruntled employees, software bots, and angry customers clicking on your ads.

ClickCease


  • 24/7 Always-On Click-Fraud Monitoring
  • Stops Click-Fraud Before It Happens
  • Business-Specific Click-Fraud detection Rules
  • Save Money(up to 20%-25% of your advertising budget gets lost due to click fraud)

FAQs

  • Doesn’t Google prevent click fraud?
    Google does detect click fraud, but does not prevent it. Instead, Google will give back credit to your account days after the fraud took place and only after you claim it.
  • How do you detect fraudulent ad clicks?
    (1) An HTML snippet that detects the attacker’s activity on your website.
    (2) An AdWords tracking template that connects us directly to Google to get unique information about each click.

PPC Surge

Once the Monitor is created, we will insert a tracking URL into your Google Adwords account. When a visitor clicks on your ad the tracking URL directs them to our website where we collect their IP address and other data (e.g., keyword) and redirect them to your site. This happens so quickly that, from the visitor’s perspective, it appears as though he/she simply clicked through to your site directly.

If we detect a potential click fraud, instead of instantly redirecting the visitor to your website, we display a warning message on our site.

If after seeing the warning message the visitor continues to click on your ads, our software tells Google to not display any more ads to that IP address thereby saving your advertising budget.

What is Google doing about it?

  • Advanced algorithms detect and filter out invalid clicks in real time, before advertisers are even charged.
  • Google’s Ad Traffic Quality Team also conducts manual, offline analysis and removes any clicks that they deem invalid before advertisers are charged.
  • Google also launches investigations based on advertisers’ reports of suspicious activity.

How to identify click fraud by yourself?

You need internal reporting. Internal reporting would tell you if that lead became a sale.

  • IP address
    IP address is pretty self-explanatory.
  • click timestamp
    The time when someone arrives on your site after clicking an ad.
  • action timestamp
    The time when that person completed an action on your site.
    If you see an IP address with a bunch of click timestamps but no action timestamps, then that is likely click fraud.
  • user agent
    Identify whether someone on a particular IP is the same person.
  • proxy
    If the searches are very different, it’s likely a proxy server. If the search queries are similar and are occurring over a super short time period, the clicks are probably fraudulent.
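The click-timestamp/action-timestamp rule can be sketched as a per-IP filter over internal reporting; the log format and the threshold of five clicks are assumptions for illustration:

```python
def suspicious_ips(clicks, actions, min_clicks=5):
    """clicks / actions: lists of (ip, timestamp) tuples from internal
    reporting. An IP with many ad clicks but no on-site actions is
    flagged as likely click fraud."""
    per_ip = {}
    for ip, _ts in clicks:
        per_ip[ip] = per_ip.get(ip, 0) + 1
    acted = {ip for ip, _ts in actions}
    return {ip for ip, n in per_ip.items() if n >= min_clicks and ip not in acted}
```

A fuller version would fold in the other signals above (user agent, likely proxies) before deciding.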

4 tips to protect yourself from click-happy criminals

  • Turn to Facebook/Twitter Ads
    Social networks have no third-party publishers, which tend to be the main source of click fraud. Malicious competitors are also rare, thanks to advanced targeting options.
  • Set up IP Exclusions in AdWords
  • Run GDN Remarketing Campaigns (ads shown only to people who have already visited your site)
    I don't fully understand this one yet.
  • Adjust Your Ad Targeting
    Exclude locations or languages where labor is cheap (click farms).

    To be Continued


Yuanyi Wu

朝闻道 夕死可矣 ("Hearing the Way in the morning, one may die content that evening.")

© 2019 Yuanyi Wu
Powered by Hexo | Theme NexT.Mist v5.1.2