new_activity_metrics 插件new_activity_metrics plugin

New Users 的队列计算有用的活动指标(非重复计数值、非重复的新值计数、保留率和变动率)。Calculates useful activity metrics (distinct count values, distinct count of new values, retention rate, and churn rate) for the cohort of New Users. New Users 的每个队列(时间窗口内首先看到的所有用户)与以前的所有队列进行比较。Each cohort of New Users (all users which were 1st seen in time window) is compared to all prior cohorts. 比较会考虑所有以前的时间窗口。Comparison takes into account all previous time windows. 例如,在 from=T2 和 to=T3 的记录中,非重复用户计数是出现在 T3 中同时未出现在 T1 和 T2 中的所有用户。For example, in the record for from=T2 and to=T3, the distinct count of users will be all users in T3 who were not seen in both T1 and T2.

T | evaluate new_activity_metrics(id, datetime_column, startofday(ago(30d)), startofday(now()), 1d, dim1, dim2, dim3)

语法Syntax

T | evaluate new_activity_metrics(IdColumn, TimelineColumn, Start, End, Window [, Cohort ] [, dim1, dim2, ...] [, Lookback ] )T | evaluate new_activity_metrics(IdColumn, TimelineColumn, Start, End, Window [, Cohort ] [, dim1, dim2, ...] [, Lookback ] )

参数Arguments

  • T :输入表格表达式。T : The input tabular expression.
  • IdColumn :列的名称,其 ID 值表示用户活动。IdColumn : The name of the column with ID values that represent user activity.
  • TimelineColumn:表示时间线的列的名称。TimelineColumn : The name of the column that represent timeline.
  • 开始 :带有分析开始时段值的标量。Start : Scalar with value of the analysis start period.
  • End:带有分析结束时段值的标量。End : Scalar with value of the analysis end period.
  • Window:带有分析窗口时段值的标量。Window : Scalar with value of the analysis window period. 既可以是数字/日期/时间/时间戳值,也可以是 week/month/year 中的一个字符串(在这种情况下,这些时段分别为 startofweek/startofmonth/startofyear)。Can be either a numeric/datetime/timestamp value, or a string which is one of week/month/year, in which case all periods will be startofweek/startofmonth/startofyear accordingly.
  • Cohort :(可选)指示特定队列的标量常数。Cohort : (optional) a scalar constant indicating specific cohort. 如未提供,则计算并返回与分析时间窗口相对应的所有队列。If not provided, all cohorts corresponding to the analysis time window are calculated and returned.
  • dim1, dim2, ... :(可选)维度列的列表,用于切分活动指标计算。dim1 , dim2 , ...: (optional) list of the dimensions columns that slice the activity metrics calculation.
  • Lookback:(可选)一个表格表达式,其中包含一组属于回溯时段的 IDLookback : (optional) a tabular expression with a set of IDs that belong to the 'look back' period

返回Returns

返回一个表,该表包含每个“from”和“to”时间线时段组合和每个现有维度组合的非重复计数值、非重复的新值计数、保留率和变动率。Returns a table that has the distinct count values, distinct count of new values, retention rate, and churn rate for each combination of 'from' and 'to' timeline periods and for each existing dimensions combination.

输出表架构如下:Output table schema is:

from_TimelineColumnfrom_TimelineColumn to_TimelineColumnto_TimelineColumn dcount_new_valuesdcount_new_values dcount_retained_valuesdcount_retained_values dcount_churn_valuesdcount_churn_values retention_rateretention_rate churn_ratechurn_rate dim1dim1 .... dim_ndim_n
类型:自 TimelineColumn 起type: as of TimelineColumn 相同same longlong longlong Doubledouble Doubledouble Doubledouble .... .... ....
  • from_TimelineColumn - 新用户的队列。from_TimelineColumn - the cohort of new users. 此记录中的指标是指此时间段内首次见到的所有用户。Metrics in this record refer to all users who were first seen in this period. 决定是否为首次见到要考虑分析时段中所有之前的时段。The decision on first seen takes into account all previous periods in the analysis period.
  • to_TimelineColumn - 进行比较的时段。to_TimelineColumn - the period being compared to.
  • dcount_new_values - 出现在 to_TimelineColumn 中同时未出现在 from_TimelineColumn(含)之前的所有时段中的非重复用户数。dcount_new_values - the number of distinct users in to_TimelineColumn which were not seen in all periods prior to and including from_TimelineColumn.
  • dcount_retained_values - 在所有的新用户(即 from_TimelineColumn 中首次出现的用户)中,出现在 to_TimelineCoumn 中的非重复用户数。dcount_retained_values - out of all new users, first seen in from_TimelineColumn, the number of distinct users which were seen in to_TimelineCoumn.
  • dcount_churn_values - 在所有的新用户(即 from_TimelineColumn 中首次出现的用户)中,未出现在 to_TimelineCoumn 中的非重复用户数。dcount_churn_values - out of all new users, first seen in from_TimelineColumn, the number of distinct users which were not seen in to_TimelineCoumn.
  • retention_rate - 队列(首次出现在 from_TimelineColumn 中的用户)中的 dcount_retained_values 百分比。retention_rate - the percent of dcount_retained_values out of the cohort (users first seen in from_TimelineColumn).
  • churn_rate - 队列(首次出现在 from_TimelineColumn 中的用户)中的 dcount_churn_values 百分比。churn_rate - the percent of dcount_churn_values out of the cohort (users first seen in from_TimelineColumn).

备注Notes

有关 Retention RateChurn Rate 的定义,请参阅 activity_metrics 插件文档中的“注意”部分。For definitions of Retention Rate and Churn Rate - refer to Notes section in activity_metrics plugin documentation.

示例Examples

以下示例数据集显示了哪些用户将出现在哪些天。The following sample data set shows which users seen on which days. 该表基于源 Users 表而生成,如下所示:The table was generated based on a source Users table, as follows:

Users | summarize tostring(make_set(user)) by bin(Timestamp, 1d) | order by Timestamp asc;
TimestampTimestamp set_userset_user
2019-11-01 00:00:00.00000002019-11-01 00:00:00.0000000 [0,2,3,4][0,2,3,4]
2019-11-02 00:00:00.00000002019-11-02 00:00:00.0000000 [0,1,3,4,5][0,1,3,4,5]
2019-11-03 00:00:00.00000002019-11-03 00:00:00.0000000 [0,2,4,5][0,2,4,5]
2019-11-04 00:00:00.00000002019-11-04 00:00:00.0000000 [0,1,2,3][0,1,2,3]
2019-11-05 00:00:00.00000002019-11-05 00:00:00.0000000 [0,1,2,3,4][0,1,2,3,4]

原始表插件的输出如下所示:The output of the plugin for the original table is the following:

let StartDate = datetime(2019-11-01 00:00:00);
let EndDate = datetime(2019-11-07 00:00:00);
Users 
| evaluate new_activity_metrics(user, Timestamp, StartDate, EndDate-1tick, 1d) 
| where from_Timestamp < datetime(2019-11-03 00:00:00.0000000)
RR from_Timestampfrom_Timestamp to_Timestampto_Timestamp dcount_new_valuesdcount_new_values dcount_retained_valuesdcount_retained_values dcount_churn_valuesdcount_churn_values retention_rateretention_rate churn_ratechurn_rate
11 2019-11-01 00:00:00.00000002019-11-01 00:00:00.0000000 2019-11-01 00:00:00.00000002019-11-01 00:00:00.0000000 44 44 00 11 00
22 2019-11-01 00:00:00.00000002019-11-01 00:00:00.0000000 2019-11-02 00:00:00.00000002019-11-02 00:00:00.0000000 22 33 11 0.750.75 0.250.25
33 2019-11-01 00:00:00.00000002019-11-01 00:00:00.0000000 2019-11-03 00:00:00.00000002019-11-03 00:00:00.0000000 11 33 11 0.750.75 0.250.25
44 2019-11-01 00:00:00.00000002019-11-01 00:00:00.0000000 2019-11-04 00:00:00.00000002019-11-04 00:00:00.0000000 11 33 11 0.750.75 0.250.25
55 2019-11-01 00:00:00.00000002019-11-01 00:00:00.0000000 2019-11-05 00:00:00.00000002019-11-05 00:00:00.0000000 11 44 00 11 00
66 2019-11-01 00:00:00.00000002019-11-01 00:00:00.0000000 2019-11-06 00:00:00.00000002019-11-06 00:00:00.0000000 00 00 44 00 11
77 2019-11-02 00:00:00.00000002019-11-02 00:00:00.0000000 2019-11-02 00:00:00.00000002019-11-02 00:00:00.0000000 22 22 00 11 00
88 2019-11-02 00:00:00.00000002019-11-02 00:00:00.0000000 2019-11-03 00:00:00.00000002019-11-03 00:00:00.0000000 00 11 11 0.50.5 0.50.5
99 2019-11-02 00:00:00.00000002019-11-02 00:00:00.0000000 2019-11-04 00:00:00.00000002019-11-04 00:00:00.0000000 00 11 11 0.50.5 0.50.5
1010 2019-11-02 00:00:00.00000002019-11-02 00:00:00.0000000 2019-11-05 00:00:00.00000002019-11-05 00:00:00.0000000 00 11 11 0.50.5 0.50.5
1111 2019-11-02 00:00:00.00000002019-11-02 00:00:00.0000000 2019-11-06 00:00:00.00000002019-11-06 00:00:00.0000000 00 00 22 00 11

下面分析了输出中的数条记录:The following is an analysis of a few records from the output:

  • 记录 R=3from_TimelineColumn = 2019-11-01to_TimelineColumn = 2019-11-03Record R=3, from_TimelineColumn = 2019-11-01, to_TimelineColumn = 2019-11-03:

    • 此记录的用户即为 11/1 出现的所有新用户。The users considered for this record are all new users seen on 11/1. 由于这是第一个时段,因此这些是该箱中的所有用户 - [0,2,3,4]Since this is the first period, these are all users in that bin – [0,2,3,4]
    • dcount_new_values - 出现在 11/3 同时未出现 11/1 的用户的数目。dcount_new_values – the number of users on 11/3 who weren't seen on 11/1. 这里有一个用户 - 5This includes a single user – 5.
    • dcount_retained_values - 11/1 出现的所有新用户中,有多少保留到了 11/3?dcount_retained_values – out of all new users on 11/1, how many were retained until 11/3? 有三个 ([0,2,4]),而 count_churn_values 为 1(用户 3)。There are three ([0,2,4]), while count_churn_values is one (user=3).
    • retention_rate = 0.75 - 四个新用户(出现在 11/1 的用户)中保留了三个。retention_rate = 0.75 – the three retained users out of the four new users who were first seen in 11/1.
  • 记录 R=9from_TimelineColumn = 2019-11-02to_TimelineColumn = 2019-11-04Record R=9, from_TimelineColumn = 2019-11-02, to_TimelineColumn = 2019-11-04:

    • 此记录关注首次出现在 11/2 的新用户 - 用户15This record focuses on the new users who were first seen on 11/2 – users 1 and 5.
    • dcount_new_values - 出现在 11/4 同时未出现在 T0 .. from_Timestamp 间所有时段的用户的数目。dcount_new_values – the number of users on 11/4 who weren't seen through all periods T0 .. from_Timestamp. 这意味着,出现在 11/4 但未出现在 11/1 或 11/2 的用户,没有这样的用户。Meaning, users who are seen on 11/4 but who were not seen on either 11/1 or 11/2 – there are no such users.
    • dcount_retained_values - 11/2 出现的所有新用户 ([1,5]) 中,有多少保留到了 11/4?dcount_retained_values – out of all new users on 11/2 ([1,5]), how many were retained until 11/4? 有一个这样的用户 ([1]),count_churn_values 为一(用户 5)。There's one such user ([1]), while count_churn_values is one (user 5).
    • retention_rate 为 0.5 - 11/2 出现的两个新用户中 11/4 保留了一个。retention_rate is 0.5 – the single user that was retained on 11/4 out of the two new ones on 11/2.

每周保留率和变动率(一周)Weekly retention rate, and churn rate (single week)

下个查询计算 New Users 队列(首周出现的用户)的每周保留率和变动率。The next query calculates a retention and churn rate for week-over-week window for New Users cohort (users that arrived on the first week).

// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end  step 1d
| extend d = tolong((Day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Take only the first week cohort (last parameter)
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d, _start)
| project from_Day, to_Day, retention_rate, churn_rate
from_Dayfrom_Day to_Dayto_Day retention_rateretention_rate churn_ratechurn_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 11 00
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-08 00:00:00.00000002017-05-08 00:00:00.0000000 0.5446327683615820.544632768361582 0.4553672316384180.455367231638418
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-15 00:00:00.00000002017-05-15 00:00:00.0000000 0.0316384180790960.031638418079096 0.9683615819209040.968361581920904
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-22 00:00:00.00000002017-05-22 00:00:00.0000000 00 11
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-29 00:00:00.00000002017-05-29 00:00:00.0000000 00 11

每周保留率和变动率(完整矩阵)Weekly retention rate, and churn rate (complete matrix)

下个查询计算 New Users 队列的每周保留率和变动率。The next query calculates retention and churn rate for week-over-week window for New Users cohort. 如果前面的示例计算一周的统计信息,则下面将生成每个 from/to 组合的 NxN 表。If the previous example calculated the statistics for a single week - the below produces NxN table for each from/to combination.

// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end  step 1d
| extend d = tolong((Day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Last parameter is omitted - 
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d)
| project from_Day, to_Day, retention_rate, churn_rate
from_Dayfrom_Day to_Dayto_Day retention_rateretention_rate churn_ratechurn_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 11 00
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-08 00:00:00.00000002017-05-08 00:00:00.0000000 0.1903973509933770.190397350993377 0.8096026490066220.809602649006622
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-15 00:00:00.00000002017-05-15 00:00:00.0000000 00 11
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-22 00:00:00.00000002017-05-22 00:00:00.0000000 00 11
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-29 00:00:00.00000002017-05-29 00:00:00.0000000 00 11
2017-05-08 00:00:00.00000002017-05-08 00:00:00.0000000 2017-05-08 00:00:00.00000002017-05-08 00:00:00.0000000 11 00
2017-05-08 00:00:00.00000002017-05-08 00:00:00.0000000 2017-05-15 00:00:00.00000002017-05-15 00:00:00.0000000 0.4052631578947370.405263157894737 0.5947368421052630.594736842105263
2017-05-08 00:00:00.00000002017-05-08 00:00:00.0000000 2017-05-22 00:00:00.00000002017-05-22 00:00:00.0000000 0.2276315789473680.227631578947368 0.7723684210526320.772368421052632
2017-05-08 00:00:00.00000002017-05-08 00:00:00.0000000 2017-05-29 00:00:00.00000002017-05-29 00:00:00.0000000 00 11
2017-05-15 00:00:00.00000002017-05-15 00:00:00.0000000 2017-05-15 00:00:00.00000002017-05-15 00:00:00.0000000 11 00
2017-05-15 00:00:00.00000002017-05-15 00:00:00.0000000 2017-05-22 00:00:00.00000002017-05-22 00:00:00.0000000 0.7854889589905360.785488958990536 0.2145110410094640.214511041009464
2017-05-15 00:00:00.00000002017-05-15 00:00:00.0000000 2017-05-29 00:00:00.00000002017-05-29 00:00:00.0000000 0.2376445846477390.237644584647739 0.7623554153522610.762355415352261
2017-05-22 00:00:00.00000002017-05-22 00:00:00.0000000 2017-05-22 00:00:00.00000002017-05-22 00:00:00.0000000 11 00
2017-05-22 00:00:00.00000002017-05-22 00:00:00.0000000 2017-05-29 00:00:00.00000002017-05-29 00:00:00.0000000 0.6218354430379750.621835443037975 0.3781645569620250.378164556962025
2017-05-29 00:00:00.00000002017-05-29 00:00:00.0000000 2017-05-29 00:00:00.00000002017-05-29 00:00:00.0000000 11 00

回溯时段的每周保留率Weekly retention rate with lookback period

下面的查询在考虑 lookback 时段的情况下计算 New Users 队列的保留率:一个表格查询,其中包含用于定义 New Users 队列的一组 ID(此集中未出现的所有 ID 为 New Users)。The following query calculates the retention rate of New Users cohort when taking into consideration lookback period: a tabular query with set of Ids that are used to define the New Users cohort (all IDs that do not appear in this set are New Users). 查询在分析期间检查 New Users 的保留行为。The query examines the retention behavior of the New Users during the analysis period.

// Generate random data of user activities
let _lookback = datetime(2017-02-01);
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
let _data = range Day from _lookback to _end  step 1d
| extend d = tolong((Day - _lookback)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000;
//
let lookback_data = _data | where Day < _start | project Day, id;
_data
| evaluate new_activity_metrics(id, Day, _start, _end, 7d, _start, lookback_data)
| project from_Day, to_Day, retention_rate
from_Dayfrom_Day to_Dayto_Day retention_rateretention_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 11
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-08 00:00:00.00000002017-05-08 00:00:00.0000000 0.4040816326530610.404081632653061
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-15 00:00:00.00000002017-05-15 00:00:00.0000000 0.2571428571428570.257142857142857
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-22 00:00:00.00000002017-05-22 00:00:00.0000000 0.2963265306122450.296326530612245
2017-05-01 00:00:00.00000002017-05-01 00:00:00.0000000 2017-05-29 00:00:00.00000002017-05-29 00:00:00.0000000 0.05877551020408160.0587755102040816