Dataset statistics
Number of variables | 12 |
---|---|
Number of observations | 2988181 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Total size in memory | 1.7 GiB |
Average record size in memory | 597.6 B |
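Missing cells (%) is the number of missing cells divided by the total number of cells (variables × observations). Here that is 0 / (12 × 2988181) = 0.0%. The sketch below computes these overview rows. It assumes the data is a list of Python dicts and that a cell is missing when its key is absent or its value is `None`. The report itself works on a pandas DataFrame and uses pandas' missing-value semantics.

```python
from typing import Any, Dict, List, Optional


def dataset_statistics(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, Any]:
    """Compute the variable, observation and missing-cell rows of the overview.

    Assumption: a cell is missing when the key is absent or its value is None.
    """
    columns = sorted({key for row in rows for key in row})
    n_vars, n_obs = len(columns), len(rows)
    # Count every (row, column) pair whose value is absent or None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    total_cells = n_vars * n_obs
    missing_pct = 100.0 * missing / total_cells if total_cells else 0.0
    return {
        "Number of variables": n_vars,
        "Number of observations": n_obs,
        "Missing cells": missing,
        "Missing cells (%)": f"{missing_pct:.1f}%",
    }
```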
Variable types
Categorical | 9 |
---|---|
DateTime | 2 |
Numeric | 1 |
Alerts
user_id has a high cardinality: 322897 distinct values | High cardinality |
session_id has a high cardinality: 1048594 distinct values | High cardinality |
click_article_id has a high cardinality: 46033 distinct values | High cardinality |
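The sketch below shows the high-cardinality check. It assumes an alert is raised when a categorical column's number of distinct values exceeds a fixed threshold. The threshold of 50 is an assumed value and is not read from this report's config.json. It is at least consistent with click_region (28 distinct values) not being flagged.

```python
from typing import Dict, Hashable, Iterable, List


def high_cardinality_alerts(
    columns: Dict[str, Iterable[Hashable]], threshold: int = 50
) -> List[str]:
    """Return one alert per column whose distinct-value count exceeds `threshold`.

    threshold=50 is an assumed value, not taken from this report's configuration.
    """
    alerts = []
    for name, values in columns.items():
        n_distinct = len(set(values))
        if n_distinct > threshold:
            alerts.append(f"{name} has a high cardinality: {n_distinct} distinct values")
    return alerts
```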
Reproduction
Analysis started | 2022-05-07 17:29:08.631589 |
---|---|
Analysis finished | 2022-05-07 17:29:32.735522 |
Duration | 24.1 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
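Duration is the difference between the two timestamps above, which is 24.103933 s, shown rounded to one decimal. A quick check with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2022-05-07 17:29:08.631589")
finished = datetime.fromisoformat("2022-05-07 17:29:32.735522")
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.1f} seconds")  # Duration: 24.1 seconds
```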
user_id
Categorical
Distinct | 322897 |
---|---|
Distinct (%) | 10.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 177.7 MiB |
5890 | 1232 |
---|---|
73574 | 939 |
15867 | 900 |
80350 | 783 |
15275 | 746 |
Other values (322892) | 2983581 |
Characters and Unicode
Total characters | 16019411 |
---|---|
Distinct characters | 10 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 1 |
4th row | 1 |
5th row | 2 |
Common Values
Value | Count | Frequency (%) |
5890 | 1232 | < 0.1% |
73574 | 939 | < 0.1% |
15867 | 900 | < 0.1% |
80350 | 783 | < 0.1% |
15275 | 746 | < 0.1% |
2151 | 722 | < 0.1% |
4568 | 529 | < 0.1% |
12897 | 513 | < 0.1% |
11521 | 502 | < 0.1% |
34541 | 501 | < 0.1% |
Other values (322887) | 2980814 | 99.8% |
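Each Common Values table lists the ten most frequent values and folds the rest into one Other values (k) row. Here k is the number of remaining distinct values (322897 - 10 = 322887). The count is the number of rows minus the top-ten counts (2988181 - 7367 = 2980814).

In pandas, the underlying counts come from `Series.value_counts()`. The sketch below rebuilds such a table in plain Python. The `< 0.1%` and `> 99.9%` formatting rule is an assumption modelled on what the report displays.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def fmt_percent(fraction: float) -> str:
    """Format a fraction as a percentage, using '< 0.1%' / '> 99.9%' near the edges."""
    if fraction > 0 and round(fraction, 3) == 0:
        return "< 0.1%"
    if fraction < 1 and round(fraction, 3) == 1:
        return "> 99.9%"
    return f"{fraction * 100:.1f}%"


def common_values(values: Iterable[Hashable], top_n: int = 10) -> List[Tuple[str, int, str]]:
    """Return (value, count, frequency) rows for the top_n values plus an 'Other values (k)' row."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [(str(v), c, fmt_percent(c / total)) for v, c in counts.most_common(top_n)]
    remaining_distinct = len(counts) - len(rows)
    if remaining_distinct > 0:
        other = total - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({remaining_distinct})", other, fmt_percent(other / total)))
    return rows
```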
Most occurring characters
Value | Count | Frequency (%) |
1 | 2406584 | |
2 | 1986207 | |
3 | 1547037 | |
5 | 1517002 | |
4 | 1508460 | |
6 | 1456353 | |
7 | 1433310 | |
8 | 1410350 | |
9 | 1392458 | |
0 | 1361650 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 16019411 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 16019411 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 16019411 |
session_id
Categorical
Distinct | 1048594 |
---|---|
Distinct (%) | 35.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 208.0 MiB |
1507563657895091 | 124 |
---|---|
1507896573228093 | 107 |
1507133567968022 | 106 |
1507309773225261 | 98 |
1508112331270612 | 94 |
Other values (1048589) | 2987652 |
Characters and Unicode
Total characters | 47810896 |
---|---|
Distinct characters | 10 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1506825423271737 |
---|---|
2nd row | 1506825423271737 |
3rd row | 1506825426267738 |
4th row | 1506825426267738 |
5th row | 1506825435299739 |
Common Values
Value | Count | Frequency (%) |
1507563657895091 | 124 | < 0.1% |
1507896573228093 | 107 | < 0.1% |
1507133567968022 | 106 | < 0.1% |
1507309773225261 | 98 | < 0.1% |
1508112331270612 | 94 | < 0.1% |
1507647366292530 | 92 | < 0.1% |
1507475403662486 | 86 | < 0.1% |
1506959499272114 | 82 | < 0.1% |
1508154737228813 | 79 | < 0.1% |
1506999909218419 | 75 | < 0.1% |
Other values (1048584) | 2987238 | > 99.9% |
Most occurring characters
Value | Count | Frequency (%) |
1 | 7222437 | |
5 | 6370248 | |
0 | 6306506 | |
7 | 5505572 | |
2 | 4058812 | |
3 | 3977203 | |
6 | 3794560 | |
8 | 3596989 | |
9 | 3536107 | |
4 | 3442462 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 47810896 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 47810896 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 47810896 |
session_start
Date
Distinct | 646874 |
---|---|
Distinct (%) | 21.6% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.8 MiB |
Minimum | 2017-10-01 04:37:03 |
---|---|
Maximum | 2017-10-17 05:36:19 |
Histogram with fixed size bins (bins=50)
session_size
Real number (ℝ≥0)
Distinct | 72 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 3.901885127 |
Minimum | 2 |
---|---|
Maximum | 124 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 22.8 MiB |
Quantile statistics
Minimum | 2 |
---|---|
5-th percentile | 2 |
Q1 | 2 |
median | 3 |
Q3 | 4 |
95-th percentile | 9 |
Maximum | 124 |
Range | 122 |
Interquartile range (IQR) | 2 |
Descriptive statistics
Standard deviation | 3.929941495 |
---|---|
Coefficient of variation (CV) | 1.007190465 |
Kurtosis | 158.4608899 |
Mean | 3.901885127 |
Median Absolute Deviation (MAD) | 1 |
Skewness | 9.090074854 |
Sum | 11659539 |
Variance | 15.44444016 |
Monotonicity | Not monotonic |
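The derived rows follow from the basic ones:

- CV = standard deviation / mean (3.929941495 / 3.901885127 ≈ 1.0072).
- Variance = (standard deviation)² ≈ 15.444.
- Mean = Sum / observations (11659539 / 2988181 ≈ 3.9019).

The sketch below uses the `statistics` module under these assumptions: the sample standard deviation (n - 1 denominator, the pandas default), linear-interpolation quartiles, and MAD taken as the median of absolute deviations from the median. Skewness and kurtosis are omitted.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Compute a subset of the descriptive statistics shown above."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    med = statistics.median(values)
    # 'inclusive' quartiles use linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": statistics.median(abs(x - med) for x in values),
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }
```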
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
2 | 1260372 | 42.2% |
3 | 670185 | 22.4% |
4 | 374240 | 12.5% |
5 | 220105 | 7.4% |
6 | 135762 | 4.5% |
7 | 88354 | 3.0% |
8 | 58544 | 2.0% |
9 | 40878 | 1.4% |
10 | 29530 | 1.0% |
11 | 21714 | 0.7% |
Other values (62) | 88497 | 3.0% |
Minimum 10 values
Value | Count | Frequency (%) |
2 | 1260372 | 42.2% |
3 | 670185 | 22.4% |
4 | 374240 | 12.5% |
5 | 220105 | 7.4% |
6 | 135762 | 4.5% |
7 | 88354 | 3.0% |
8 | 58544 | 2.0% |
9 | 40878 | 1.4% |
10 | 29530 | 1.0% |
11 | 21714 | 0.7% |
Maximum 10 values
Value | Count | Frequency (%) |
124 | 124 | < 0.1% |
107 | 107 | < 0.1% |
106 | 106 | < 0.1% |
98 | 98 | < 0.1% |
94 | 94 | < 0.1% |
92 | 92 | < 0.1% |
86 | 86 | < 0.1% |
82 | 82 | < 0.1% |
79 | 79 | < 0.1% |
75 | 75 | < 0.1% |
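The Minimum 10 values and Maximum 10 values tables sort the distinct values themselves rather than their counts. In the maximum table every count equals its value. The same counts (124, 107, 106 and so on) also head the session_id Common Values table.

This is consistent with session_size being repeated on each click row of its session, so a session of size s contributes s rows. Below is a minimal sketch of the extreme-value tables. It includes a hypothetical two-session example that illustrates this pattern.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def extreme_values(values: Iterable[float], k: int = 10) -> Tuple[List[Tuple[float, int]], List[Tuple[float, int]]]:
    """Return (minimum k, maximum k) lists of (value, count) pairs."""
    counts = Counter(values)
    ordered = sorted(counts.items())  # ascending by value
    return ordered[:k], ordered[::-1][:k]


# Hypothetical per-click rows: session "s1" has 3 clicks, "s2" has 2,
# and each click row carries its session's size.
session_sizes = {"s1": 3, "s2": 2}
rows = [size for size in session_sizes.values() for _ in range(size)]
minimum, maximum = extreme_values(rows, k=1)
print(minimum, maximum)  # [(2, 2)] [(3, 3)]
```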
click_article_id
Categorical
Distinct | 46033 |
---|---|
Distinct (%) | 1.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 179.0 MiB |
160974 | 37213 |
---|---|
272143 | 28943 |
336221 | 23851 |
234698 | 23499 |
123909 | 23122 |
Other values (46028) | 2851553 |
Characters and Unicode
Total characters | 17347006 |
---|---|
Distinct characters | 10 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 24811 |
---|---|
Unique (%) | 0.8% |
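Distinct counts the different values in a column, while Unique counts the values that occur exactly once. Every session has at least two clicks (session_size minimum 2), so no session_id can be unique, and user_id likewise shows 0. For click_article_id, 24811 articles were clicked exactly once, which is 24811 / 2988181 ≈ 0.8% of rows. A minimal sketch, assuming those definitions:

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def distinct_and_unique(values: Iterable[Hashable]) -> Dict[str, float]:
    """Count distinct values and values that occur exactly once."""
    counts = Counter(values)
    n = sum(counts.values())
    n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": len(counts),
        "distinct_pct": 100.0 * len(counts) / n,
        "unique": n_unique,
        "unique_pct": 100.0 * n_unique / n,
    }
```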
Sample
1st row | 157541 |
---|---|
2nd row | 68866 |
3rd row | 235840 |
4th row | 96663 |
5th row | 119592 |
Common Values
Value | Count | Frequency (%) |
160974 | 37213 | 1.2% |
272143 | 28943 | 1.0% |
336221 | 23851 | 0.8% |
234698 | 23499 | 0.8% |
123909 | 23122 | 0.8% |
336223 | 21855 | 0.7% |
96210 | 21577 | 0.7% |
162655 | 21062 | 0.7% |
183176 | 20303 | 0.7% |
168623 | 19526 | 0.7% |
Other values (46023) | 2747230 | 91.9% |
Most occurring characters
Value | Count | Frequency (%) |
2 | 2669004 | |
1 | 2322402 | |
3 | 2172869 | |
6 | 1692346 | |
5 | 1494065 | |
0 | 1440544 | |
8 | 1433872 | |
4 | 1406484 | |
9 | 1401337 | |
7 | 1314083 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 17347006 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 17347006 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 17347006 |
click_timestamp
Date
Distinct | 1016184 |
---|---|
Distinct (%) | 34.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.8 MiB |
Minimum | 2017-10-01 05:00:00 |
---|---|
Maximum | 2017-11-13 21:04:14 |
Histogram with fixed size bins (bins=50)
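Histogram with fixed size bins (bins=50) means the range between Minimum and Maximum is cut into 50 equal-width bins. For click_timestamp (2017-10-01 05:00:00 to 2017-11-13 21:04:14), each bin spans roughly 21 hours. Below is a minimal sketch of equal-width binning for datetimes. It assumes the last bin is closed on the right, as in `numpy.histogram`.

```python
from datetime import datetime, timedelta
from typing import List


def fixed_width_histogram(stamps: List[datetime], bins: int = 50) -> List[int]:
    """Count timestamps in `bins` equal-width bins spanning [min, max].

    The last bin is closed on the right so the maximum is counted.
    """
    lo, hi = min(stamps), max(stamps)
    width = (hi - lo) / bins  # timedelta / int -> timedelta
    counts = [0] * bins
    for t in stamps:
        # timedelta / timedelta -> float; all stamps fall in bin 0 if width is zero.
        idx = int((t - lo) / width) if width else 0
        counts[min(idx, bins - 1)] += 1
    return counts
```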
click_environment
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 183.0 MiB |
4 - Web | 2904478 |
---|---|
2 - Mobile App | 79743 |
1 - Facebook Instant Article | 3960 |
Characters and Unicode
Total characters | 21558628 |
---|---|
Distinct characters | 23 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 4 - Web |
---|---|
2nd row | 4 - Web |
3rd row | 4 - Web |
4th row | 4 - Web |
5th row | 4 - Web |
Common Values
Value | Count | Frequency (%) |
4 - Web | 2904478 | 97.2% |
2 - Mobile App | 79743 | 2.7% |
1 - Facebook Instant Article | 3960 | 0.1% |
Category Frequency Plot
Words
Value | Count | Frequency (%) |
2988181 | ||
4 | 2904478 | |
web | 2904478 | |
2 | 79743 | 0.9% |
mobile | 79743 | 0.9% |
app | 79743 | 0.9% |
1 | 3960 | < 0.1% |
facebook | 3960 | < 0.1% |
instant | 3960 | < 0.1% |
article | 3960 | < 0.1% |
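The table under Category Frequency Plot counts words rather than whole values. Its percentages appear to be relative to the total number of words. With whitespace splitting, the three categories yield 2904478 × 3 + 79743 × 4 + 3960 × 5 = 9052206 tokens. That gives 79743 / 9052206 ≈ 0.9%, compared with 2.7% of rows in Common Values.

The sketch below assumes lower-casing and whitespace tokenization. The report's exact tokenization may differ, for example in how it treats the standalone "-".

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_frequencies(values: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Lower-case each value, split on whitespace, and count tokens.

    Frequencies are percentages of the total number of tokens, not of rows.
    """
    counts = Counter(token for v in values for token in v.lower().split())
    total = sum(counts.values())
    return [(w, c, 100.0 * c / total) for w, c in counts.most_common()]
```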
Most occurring characters
Value | Count | Frequency (%) |
(space) | 6064025 | |
e | 2992141 | |
- | 2988181 | |
b | 2988181 | |
4 | 2904478 | |
W | 2904478 | |
p | 159486 | 0.7% |
o | 87663 | 0.4% |
l | 83703 | 0.4% |
A | 83703 | 0.4% |
Other values (13) | 302589 | 1.4% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 6442397 | |
Space Separator | 6064025 | |
Uppercase Letter | 3075844 | |
Dash Punctuation | 2988181 | |
Decimal Number | 2988181 |
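Each character is assigned its Unicode general category. For click_environment, every value contains exactly one digit and one hyphen. That is why Decimal Number and Dash Punctuation both equal the number of observations (2988181).

Python's `unicodedata.category` returns the two-letter category code. The mapping to long names below covers only the five categories seen here. `unicodedata` has no script or block lookup, so those tables are not reproduced.

```python
import unicodedata
from collections import Counter
from typing import Dict, Iterable

# Long names for the general categories that occur in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values: Iterable[str]) -> Dict[str, int]:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for v in values:
        for ch in v:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return dict(counts)
```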
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 2992141 | |
b | 2988181 | |
p | 159486 | 2.5% |
o | 87663 | 1.4% |
l | 83703 | 1.3% |
i | 83703 | 1.3% |
t | 11880 | 0.2% |
a | 7920 | 0.1% |
c | 7920 | 0.1% |
n | 7920 | 0.1% |
Other values (3) | 11880 | 0.2% |
Uppercase Letter
Value | Count | Frequency (%) |
W | 2904478 | |
A | 83703 | 2.7% |
M | 79743 | 2.6% |
F | 3960 | 0.1% |
I | 3960 | 0.1% |
Decimal Number
Value | Count | Frequency (%) |
4 | 2904478 | |
2 | 79743 | 2.7% |
1 | 3960 | 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 6064025 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 2988181 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 12040387 | |
Latin | 9518241 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 2992141 | |
b | 2988181 | |
W | 2904478 | |
p | 159486 | 1.7% |
o | 87663 | 0.9% |
l | 83703 | 0.9% |
A | 83703 | 0.9% |
i | 83703 | 0.9% |
M | 79743 | 0.8% |
t | 11880 | 0.1% |
Other values (8) | 43560 | 0.5% |
Common
Value | Count | Frequency (%) |
(space) | 6064025 | |
- | 2988181 | |
4 | 2904478 | |
2 | 79743 | 0.7% |
1 | 3960 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 21558628 |
click_deviceGroup
Categorical
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 189.9 MiB |
1 - Tablet | 1823162 |
---|---|
3 - Empty | 1047086 |
4 - Mobile | 117640 |
5 - Desktop | 283 |
2 - TV | 10 |
Characters and Unicode
Total characters | 28834967 |
---|---|
Distinct characters | 24 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 3 - Empty |
---|---|
2nd row | 3 - Empty |
3rd row | 1 - Tablet |
4th row | 1 - Tablet |
5th row | 1 - Tablet |
Common Values
Value | Count | Frequency (%) |
1 - Tablet | 1823162 | 61.0% |
3 - Empty | 1047086 | 35.0% |
4 - Mobile | 117640 | 3.9% |
5 - Desktop | 283 | < 0.1% |
2 - TV | 10 | < 0.1% |
Category Frequency Plot
Words
Value | Count | Frequency (%) |
2988181 | ||
1 | 1823162 | |
tablet | 1823162 | |
3 | 1047086 | 11.7% |
empty | 1047086 | 11.7% |
4 | 117640 | 1.3% |
mobile | 117640 | 1.3% |
5 | 283 | < 0.1% |
desktop | 283 | < 0.1% |
2 | 10 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 5976362 | |
- | 2988181 | |
t | 2870531 | |
e | 1941085 | 6.7% |
b | 1940802 | 6.7% |
l | 1940802 | 6.7% |
T | 1823172 | 6.3% |
1 | 1823162 | 6.3% |
a | 1823162 | 6.3% |
p | 1047369 | 3.6% |
Other values (14) | 4660339 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 13894052 | |
Space Separator | 5976362 | |
Uppercase Letter | 2988191 | 10.4% |
Dash Punctuation | 2988181 | 10.4% |
Decimal Number | 2988181 | 10.4% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
t | 2870531 | |
e | 1941085 | |
b | 1940802 | |
l | 1940802 | |
a | 1823162 | |
p | 1047369 | 7.5% |
m | 1047086 | 7.5% |
y | 1047086 | 7.5% |
o | 117923 | 0.8% |
i | 117640 | 0.8% |
Other values (2) | 566 | < 0.1% |
Uppercase Letter
Value | Count | Frequency (%) |
T | 1823172 | |
E | 1047086 | |
M | 117640 | 3.9% |
D | 283 | < 0.1% |
V | 10 | < 0.1% |
Decimal Number
Value | Count | Frequency (%) |
1 | 1823162 | |
3 | 1047086 | |
4 | 117640 | 3.9% |
5 | 283 | < 0.1% |
2 | 10 | < 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 5976362 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 2988181 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 16882243 | |
Common | 11952724 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
t | 2870531 | |
e | 1941085 | |
b | 1940802 | |
l | 1940802 | |
T | 1823172 | |
a | 1823162 | |
p | 1047369 | 6.2% |
E | 1047086 | 6.2% |
m | 1047086 | 6.2% |
y | 1047086 | 6.2% |
Other values (7) | 354062 | 2.1% |
Common
Value | Count | Frequency (%) |
(space) | 5976362 | |
- | 2988181 | |
1 | 1823162 | 15.3% |
3 | 1047086 | 8.8% |
4 | 117640 | 1.0% |
5 | 283 | < 0.1% |
2 | 10 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 28834967 |
click_os
Categorical
Distinct | 8 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 198.8 MiB |
17 - Firefox OS | 1738138 |
---|---|
2 - iOS | 788699 |
20 - Chromecast | 369586 |
12 - tvOS | 60096 |
13 - Chrome OS | 23711 |
Other values (3) | 7951 |
Characters and Unicode
Total characters | 38114007 |
---|---|
Distinct characters | 36 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 20 - Chromecast |
---|---|
2nd row | 20 - Chromecast |
3rd row | 17 - Firefox OS |
4th row | 17 - Firefox OS |
5th row | 17 - Firefox OS |
Common Values
Value | Count | Frequency (%) |
17 - Firefox OS | 1738138 | 58.2% |
2 - iOS | 788699 | 26.4% |
20 - Chromecast | 369586 | 12.4% |
12 - tvOS | 60096 | 2.0% |
13 - Chrome OS | 23711 | 0.8% |
19 - Brew MP | 6384 | 0.2% |
5 - Windows Mobile | 1513 | 0.1% |
3 - Android | 54 | < 0.1% |
Category Frequency Plot
Words
Value | Count | Frequency (%) |
2988181 | ||
os | 1761849 | |
17 | 1738138 | |
firefox | 1738138 | |
2 | 788699 | 7.3% |
ios | 788699 | 7.3% |
20 | 369586 | 3.4% |
chromecast | 369586 | 3.4% |
12 | 60096 | 0.6% |
tvos | 60096 | 0.6% |
Other values (10) | 71221 | 0.7% |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 7746108 | |
- | 2988181 | 7.8% |
O | 2610644 | 6.8% |
S | 2610644 | 6.8% |
i | 2529917 | 6.6% |
e | 2139332 | 5.6% |
r | 2137873 | 5.6% |
o | 2134515 | 5.6% |
1 | 1828329 | 4.8% |
x | 1738138 | 4.6% |
Other values (26) | 9650326 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 14818667 | |
Space Separator | 7746108 | |
Uppercase Letter | 7374955 | |
Decimal Number | 5186096 | 13.6% |
Dash Punctuation | 2988181 | 7.8% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
i | 2529917 | |
e | 2139332 | |
r | 2137873 | |
o | 2134515 | |
x | 1738138 | |
f | 1738138 | |
t | 429682 | 2.9% |
h | 393297 | 2.7% |
m | 393297 | 2.7% |
s | 371099 | 2.5% |
Other values (8) | 813379 | 5.5% |
Uppercase Letter
Value | Count | Frequency (%) |
O | 2610644 | |
S | 2610644 | |
F | 1738138 | |
C | 393297 | 5.3% |
M | 7897 | 0.1% |
B | 6384 | 0.1% |
P | 6384 | 0.1% |
W | 1513 | < 0.1% |
A | 54 | < 0.1% |
Decimal Number
Value | Count | Frequency (%) |
1 | 1828329 | |
7 | 1738138 | |
2 | 1218381 | |
0 | 369586 | 7.1% |
3 | 23765 | 0.5% |
9 | 6384 | 0.1% |
5 | 1513 | < 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 7746108 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 2988181 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 22193622 | |
Common | 15920385 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
O | 2610644 | |
S | 2610644 | |
i | 2529917 | |
e | 2139332 | |
r | 2137873 | |
o | 2134515 | |
x | 1738138 | |
f | 1738138 | |
F | 1738138 | |
t | 429682 | 1.9% |
Other values (17) | 2386601 |
Common
Value | Count | Frequency (%) |
(space) | 7746108 | |
- | 2988181 | 18.8% |
1 | 1828329 | 11.5% |
7 | 1738138 | 10.9% |
2 | 1218381 | 7.7% |
0 | 369586 | 2.3% |
3 | 23765 | 0.1% |
9 | 6384 | < 0.1% |
5 | 1513 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 38114007 |
click_country
Categorical
Distinct | 11 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 165.4 MiB |
1 | 2852406 |
---|---|
10 | 61377 |
11 | 29999 |
8 | 9556 |
6 | 7256 |
Other values (6) | 27587 |
Characters and Unicode
Total characters | 3079557 |
---|---|
Distinct characters | 10 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
1 | 2852406 | 95.5% |
10 | 61377 | 2.1% |
11 | 29999 | 1.0% |
8 | 9556 | 0.3% |
6 | 7256 | 0.2% |
9 | 6746 | 0.2% |
2 | 6101 | 0.2% |
3 | 4540 | 0.2% |
5 | 3498 | 0.1% |
4 | 3389 | 0.1% |
Other values (1) | 3313 | 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
1 | 2973781 | |
0 | 61377 | 2.0% |
8 | 9556 | 0.3% |
6 | 7256 | 0.2% |
9 | 6746 | 0.2% |
2 | 6101 | 0.2% |
3 | 4540 | 0.1% |
5 | 3498 | 0.1% |
4 | 3389 | 0.1% |
7 | 3313 | 0.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 3079557 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 3079557 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 3079557 |
click_region
Categorical
Distinct | 28 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 167.6 MiB |
25 | 804985 |
---|---|
21 | 464230 |
13 | 320957 |
8 | 179339 |
16 | 164884 |
Other values (23) | 1053786 |
Characters and Unicode
Total characters | 5435935 |
---|---|
Distinct characters | 10 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 20 |
---|---|
2nd row | 20 |
3rd row | 16 |
4th row | 16 |
5th row | 24 |
Common Values
Value | Count | Frequency (%) |
25 | 804985 | 26.9% |
21 | 464230 | 15.5% |
13 | 320957 | 10.7% |
8 | 179339 | 6.0% |
16 | 164884 | 5.5% |
28 | 135793 | 4.5% |
24 | 130537 | 4.4% |
20 | 120884 | 4.0% |
5 | 96979 | 3.2% |
9 | 84693 | 2.8% |
Other values (18) | 484900 | 16.2% |
Most occurring characters
Value | Count | Frequency (%) |
2 | 1767881 | |
1 | 1247851 | |
5 | 931499 | |
8 | 330215 | 6.1% |
3 | 324997 | 6.0% |
6 | 241031 | 4.4% |
4 | 186510 | 3.4% |
7 | 144287 | 2.7% |
0 | 142879 | 2.6% |
9 | 118785 | 2.2% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 5435935 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 5435935 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5435935 |
click_referrer_type
Categorical
Distinct | 7 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 165.3 MiB |
2 | 1602601 |
---|---|
1 | 1194321 |
5 | 80766 |
7 | 69798 |
6 | 20455 |
Other values (2) | 20240 |
Characters and Unicode
Total characters | 2988181 |
---|---|
Distinct characters | 7 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 2 |
---|---|
2nd row | 2 |
3rd row | 2 |
4th row | 2 |
5th row | 2 |
Common Values
Value | Count | Frequency (%) |
2 | 1602601 | 53.6% |
1 | 1194321 | 40.0% |
5 | 80766 | 2.7% |
7 | 69798 | 2.3% |
6 | 20455 | 0.7% |
4 | 19820 | 0.7% |
3 | 420 | < 0.1% |
Category Frequency Plot
Most occurring characters
Value | Count | Frequency (%) |
2 | 1602601 | |
1 | 1194321 | |
5 | 80766 | 2.7% |
7 | 69798 | 2.3% |
6 | 20455 | 0.7% |
4 | 19820 | 0.7% |
3 | 420 | < 0.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 2988181 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 2988181 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 2988181 |