Overview

Dataset statistics

Number of variables: 12
Number of observations: 2988181
Missing cells: 0
Missing cells (%): 0.0%
Total size in memory: 1.7 GiB
Average record size in memory: 597.6 B
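
As a rough consistency check, the average record size multiplied by the number of observations should land near the reported total size. The sketch below uses only the figures listed above; the helper name `to_gib` is illustrative, and small gaps are assumed to come from the report's own rounding.

```python
# Figures copied from the dataset statistics above.
N_OBSERVATIONS = 2_988_181
AVG_RECORD_BYTES = 597.6  # already rounded by the report

def to_gib(n_bytes: float) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30

total_bytes = N_OBSERVATIONS * AVG_RECORD_BYTES
print(f"{to_gib(total_bytes):.2f} GiB")  # about 1.66 GiB, shown as 1.7 GiB
```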

Variable types

Categorical: 9
DateTime: 2
Numeric: 1

Alerts

user_id has a high cardinality: 322897 distinct values
session_id has a high cardinality: 1048594 distinct values
click_article_id has a high cardinality: 46033 distinct values
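
The alerts follow from the distinct counts. The sketch below recomputes the distinct share for each flagged variable and applies a cardinality threshold. The threshold of 50 is an assumption about the profiler's configuration, not something stated in this report, and `flag_high_cardinality` is a hypothetical helper.

```python
N_OBSERVATIONS = 2_988_181

# Distinct counts copied from the alerts above.
DISTINCT = {
    "user_id": 322_897,
    "session_id": 1_048_594,
    "click_article_id": 46_033,
}

# Assumed threshold: a categorical variable is flagged when it has more
# distinct values than this. Adjust to the configuration actually used.
CARDINALITY_THRESHOLD = 50

def flag_high_cardinality(distinct, threshold=CARDINALITY_THRESHOLD):
    """Return the variable names whose distinct count exceeds the threshold."""
    return [name for name, n in distinct.items() if n > threshold]

for name, n in DISTINCT.items():
    print(f"{name}: {n} distinct ({n / N_OBSERVATIONS:.1%} of rows)")
print(flag_high_cardinality(DISTINCT))
```

The printed shares (10.8%, 35.1% and 1.5%) match the "Distinct (%)" values reported for each variable.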

Reproduction

Analysis started: 2022-05-07 17:29:08.631589
Analysis finished: 2022-05-07 17:29:32.735522
Duration: 24.1 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
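
The reported duration can be recovered from the two timestamps. A minimal sketch with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-05-07 17:29:08.631589", FMT)
finished = datetime.strptime("2022-05-07 17:29:32.735522", FMT)

# Elapsed wall-clock time between the start and end of the analysis.
duration = (finished - started).total_seconds()
print(f"{duration:.1f} seconds")  # 24.1 seconds
```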

Variables

user_id
Categorical

HIGH CARDINALITY

Distinct: 322897
Distinct (%): 10.8%
Missing: 0
Missing (%): 0.0%
Memory size: 177.7 MiB

Value                   Count
5890                    1232
73574                   939
15867                   900
80350                   783
15275                   746
Other values (322892)   2983581

Characters and Unicode

Total characters: 16019411
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
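
To illustrate what a character category is, the sketch below classifies characters with Python's `unicodedata` module. The standard library exposes the general category but not the script or block, so only the category is shown here; `category_counts` is a hypothetical helper, and the input is the five sample rows listed for user_id in this report.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in the values."""
    return Counter(unicodedata.category(ch) for v in values for ch in str(v))

# The sample rows contain only digits, so every character falls in
# category "Nd" (Decimal Number).
sample = ["0", "0", "1", "1", "2"]
print(category_counts(sample))  # Counter({'Nd': 5})
```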

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 2

Common Values

Value                   Count     Frequency (%)
5890                    1232      < 0.1%
73574                   939       < 0.1%
15867                   900       < 0.1%
80350                   783       < 0.1%
15275                   746       < 0.1%
2151                    722       < 0.1%
4568                    529       < 0.1%
12897                   513       < 0.1%
11521                   502       < 0.1%
34541                   501       < 0.1%
Other values (322887)   2980814   99.8%
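
The "Other values" row is what remains after the listed values are subtracted from the total. A minimal sketch of that bookkeeping follows; `top_k_with_other` is a hypothetical helper and the toy data only stands in for the real column.

```python
from collections import Counter

def top_k_with_other(counts, k):
    """Return the k most common (value, count) pairs plus a summary of the rest."""
    top = counts.most_common(k)
    shown = sum(c for _, c in top)
    other_distinct = len(counts) - len(top)
    other_count = sum(counts.values()) - shown
    return top, (other_distinct, other_count)

# Toy data standing in for the user_id column.
toy = Counter(["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"])
top, other = top_k_with_other(toy, k=2)
print(top)    # [('a', 5), ('b', 3)]
print(other)  # (2, 3)

# Same arithmetic with the report's figures: ten listed counts, total rows.
listed = [1232, 939, 900, 783, 746, 722, 529, 513, 502, 501]
print(2_988_181 - sum(listed))  # 2980814
```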

Most occurring characters

Value   Count     Frequency (%)
1       2406584   15.0%
2       1986207   12.4%
3       1547037   9.7%
5       1517002   9.5%
4       1508460   9.4%
6       1456353   9.1%
7       1433310   8.9%
8       1410350   8.8%
9       1392458   8.7%
0       1361650   8.5%

Most occurring categories

Value            Count      Frequency (%)
Decimal Number   16019411   100.0%

Most occurring scripts

Value    Count      Frequency (%)
Common   16019411   100.0%

Most occurring blocks

Value   Count      Frequency (%)
ASCII   16019411   100.0%

Most frequent character per category, per script and per block

All characters fall in a single category (Decimal Number), script (Common) and block (ASCII), so each of these three tables is identical to the "Most occurring characters" table.

session_id
Categorical

HIGH CARDINALITY

Distinct: 1048594
Distinct (%): 35.1%
Missing: 0
Missing (%): 0.0%
Memory size: 208.0 MiB

Value                    Count
1507563657895091         124
1507896573228093         107
1507133567968022         106
1507309773225261         98
1508112331270612         94
Other values (1048589)   2987652

Characters and Unicode

Total characters: 47810896
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1506825423271737
2nd row: 1506825423271737
3rd row: 1506825426267738
4th row: 1506825426267738
5th row: 1506825435299739

Common Values

Value                    Count     Frequency (%)
1507563657895091         124       < 0.1%
1507896573228093         107       < 0.1%
1507133567968022         106       < 0.1%
1507309773225261         98        < 0.1%
1508112331270612         94        < 0.1%
1507647366292530         92        < 0.1%
1507475403662486         86        < 0.1%
1506959499272114         82        < 0.1%
1508154737228813         79        < 0.1%
1506999909218419         75        < 0.1%
Other values (1048584)   2987238   > 99.9%

Most occurring characters

Value   Count     Frequency (%)
1       7222437   15.1%
5       6370248   13.3%
0       6306506   13.2%
7       5505572   11.5%
2       4058812   8.5%
3       3977203   8.3%
6       3794560   7.9%
8       3596989   7.5%
9       3536107   7.4%
4       3442462   7.2%

Most occurring categories

Value            Count      Frequency (%)
Decimal Number   47810896   100.0%

Most occurring scripts

Value    Count      Frequency (%)
Common   47810896   100.0%

Most occurring blocks

Value   Count      Frequency (%)
ASCII   47810896   100.0%

Most frequent character per category, per script and per block

All characters fall in a single category (Decimal Number), script (Common) and block (ASCII), so each of these three tables is identical to the "Most occurring characters" table.

Distinct: 646874
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Memory size: 22.8 MiB
Minimum: 2017-10-01 04:37:03
Maximum: 2017-10-17 05:36:19
Histogram with fixed size bins (bins=50)
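
The covered time span, and the width of each of the 50 histogram bins, follow from the minimum and maximum. The sketch below assumes the bins are equal-width intervals over the range from minimum to maximum, which is how fixed-size bins are usually built but is not spelled out in the report.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
minimum = datetime.strptime("2017-10-01 04:37:03", FMT)
maximum = datetime.strptime("2017-10-17 05:36:19", FMT)

span = maximum - minimum
print(span)       # 16 days, 0:59:16

# Assumed: 50 equal-width bins over [minimum, maximum].
BINS = 50
print(span / BINS)  # 7:41:59.120000
```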

session_size
Real number (ℝ≥0)

Distinct: 72
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.901885127
Minimum: 2
Maximum: 124
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.8 MiB
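
The memory size is consistent with one fixed-width 8-byte value per row. The sketch below assumes a 64-bit dtype (for example int64 or float64); the report does not state the dtype.

```python
N_OBSERVATIONS = 2_988_181
BYTES_PER_VALUE = 8  # assumption: a 64-bit dtype such as int64 or float64

# 1 MiB = 2**20 bytes.
mib = N_OBSERVATIONS * BYTES_PER_VALUE / 2**20
print(f"{mib:.1f} MiB")  # 22.8 MiB
```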

Quantile statistics

Minimum: 2
5th percentile: 2
Q1: 2
Median: 3
Q3: 4
95th percentile: 9
Maximum: 124
Range: 122
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 3.929941495
Coefficient of variation (CV): 1.007190465
Kurtosis: 158.4608899
Mean: 3.901885127
Median Absolute Deviation (MAD): 1
Skewness: 9.090074854
Sum: 11659539
Variance: 15.44444016
Monotonicity: Not monotonic
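
Several of these statistics are tied together by simple identities: mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, range = maximum - minimum, and IQR = Q3 - Q1. The sketch below checks them against the reported values; agreement is only expected up to the report's rounding.

```python
# Values copied from the session_size statistics above.
n = 2_988_181
total = 11_659_539
mean = 3.901885127
std = 3.929941495
q1, q3 = 2, 4
minimum, maximum = 2, 124

print(f"mean     = {total / n:.4f}")   # 3.9019
print(f"variance = {std ** 2:.4f}")    # 15.4444
print(f"CV       = {std / mean:.4f}")  # 1.0072
print(f"range    = {maximum - minimum}")  # 122
print(f"IQR      = {q3 - q1}")            # 2
```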
Histogram with fixed size bins (bins=50)
Value               Count     Frequency (%)
2                   1260372   42.2%
3                   670185    22.4%
4                   374240    12.5%
5                   220105    7.4%
6                   135762    4.5%
7                   88354     3.0%
8                   58544     2.0%
9                   40878     1.4%
10                  29530     1.0%
11                  21714     0.7%
Other values (62)   88497     3.0%

Value   Count     Frequency (%)
2       1260372   42.2%
3       670185    22.4%
4       374240    12.5%
5       220105    7.4%
6       135762    4.5%
7       88354     3.0%
8       58544     2.0%
9       40878     1.4%
10      29530     1.0%
11      21714     0.7%

Value   Count   Frequency (%)
124     124     < 0.1%
107     107     < 0.1%
106     106     < 0.1%
98      98      < 0.1%
94      94      < 0.1%
92      92      < 0.1%
86      86      < 0.1%
82      82      < 0.1%
79      79      < 0.1%
75      75      < 0.1%

click_article_id
Categorical

HIGH CARDINALITY

Distinct: 46033
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 179.0 MiB

Value                  Count
160974                 37213
272143                 28943
336221                 23851
234698                 23499
123909                 23122
Other values (46028)   2851553

Characters and Unicode

Total characters: 17347006
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 24811
Unique (%): 0.8%

Sample

1st row: 157541
2nd row: 68866
3rd row: 235840
4th row: 96663
5th row: 119592

Common Values

Value                  Count     Frequency (%)
160974                 37213     1.2%
272143                 28943     1.0%
336221                 23851     0.8%
234698                 23499     0.8%
123909                 23122     0.8%
336223                 21855     0.7%
96210                  21577     0.7%
162655                 21062     0.7%
183176                 20303     0.7%
168623                 19526     0.7%
Other values (46023)   2747230   91.9%

Most occurring characters

Value   Count     Frequency (%)
2       2669004   15.4%
1       2322402   13.4%
3       2172869   12.5%
6       1692346   9.8%
5       1494065   8.6%
0       1440544   8.3%
8       1433872   8.3%
4       1406484   8.1%
9       1401337   8.1%
7       1314083   7.6%

Most occurring categories

Value            Count      Frequency (%)
Decimal Number   17347006   100.0%

Most occurring scripts

Value    Count      Frequency (%)
Common   17347006   100.0%

Most occurring blocks

Value   Count      Frequency (%)
ASCII   17347006   100.0%

Most frequent character per category, per script and per block

All characters fall in a single category (Decimal Number), script (Common) and block (ASCII), so each of these three tables is identical to the "Most occurring characters" table.

Distinct: 1016184
Distinct (%): 34.0%
Missing: 0
Missing (%): 0.0%
Memory size: 22.8 MiB
Minimum: 2017-10-01 05:00:00
Maximum: 2017-11-13 21:04:14
Histogram with fixed size bins (bins=50)

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 183.0 MiB

Value                          Count
4 - Web                        2904478
2 - Mobile App                 79743
1 - Facebook Instant Article   3960

Characters and Unicode

Total characters: 21558628
Distinct characters: 23
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4 - Web
2nd row: 4 - Web
3rd row: 4 - Web
4th row: 4 - Web
5th row: 4 - Web

Common Values

Value                          Count     Frequency (%)
4 - Web                        2904478   97.2%
2 - Mobile App                 79743     2.7%
1 - Facebook Instant Article   3960      0.1%

Category Frequency Plot

Value      Count     Frequency (%)
(blank)    2988181   33.0%
4          2904478   32.1%
web        2904478   32.1%
2          79743     0.9%
mobile     79743     0.9%
app        79743     0.9%
1          3960      < 0.1%
facebook   3960      < 0.1%
instant    3960      < 0.1%
article    3960      < 0.1%
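
The word frequencies above can be reproduced from the category counts by splitting each label on whitespace and lowercasing the tokens. This is a sketch under assumptions: the profiler's actual tokenizer is not described in the report, `word_counts` is a hypothetical helper, and the row with an empty value is assumed to be the "-" separator, which this sketch keeps as a literal token.

```python
from collections import Counter

# Category counts copied from the Common Values table.
CATEGORY_COUNTS = {
    "4 - Web": 2_904_478,
    "2 - Mobile App": 79_743,
    "1 - Facebook Instant Article": 3_960,
}

def word_counts(category_counts):
    """Weight each lowercased whitespace-separated token by its category count."""
    words = Counter()
    for label, n in category_counts.items():
        for token in label.lower().split():
            words[token] += n
    return words

words = word_counts(CATEGORY_COUNTS)
total = sum(words.values())
for token, n in words.most_common(3):
    print(f"{token!r}: {n} ({n / total:.1%})")
```

Under these assumptions the "-" token appears once per row (2988181 times, 33.0% of all tokens), matching the first row of the table.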

Most occurring characters

Value               Count     Frequency (%)
(space)             6064025   28.1%
e                   2992141   13.9%
-                   2988181   13.9%
b                   2988181   13.9%
4                   2904478   13.5%
W                   2904478   13.5%
p                   159486    0.7%
o                   87663     0.4%
l                   83703     0.4%
A                   83703     0.4%
Other values (13)   302589    1.4%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   6442397   29.9%
Space Separator    6064025   28.1%
Uppercase Letter   3075844   14.3%
Dash Punctuation   2988181   13.9%
Decimal Number     2988181   13.9%
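
These category totals can be recomputed from the three labels and their row counts. The sketch below classifies each character with `unicodedata.category` and weights it by how often its label occurs; `weighted_category_totals` is a hypothetical helper, and the two-letter codes are the standard abbreviations (Ll = Lowercase Letter, Lu = Uppercase Letter, Zs = Space Separator, Pd = Dash Punctuation, Nd = Decimal Number).

```python
import unicodedata
from collections import Counter

# Category counts copied from the Common Values table for this variable.
CATEGORY_COUNTS = {
    "4 - Web": 2_904_478,
    "2 - Mobile App": 79_743,
    "1 - Facebook Instant Article": 3_960,
}

def weighted_category_totals(category_counts):
    """Sum Unicode general categories over all rows, one label at a time."""
    totals = Counter()
    for label, n in category_counts.items():
        for ch in label:
            totals[unicodedata.category(ch)] += n
    return totals

totals = weighted_category_totals(CATEGORY_COUNTS)
for code in ("Ll", "Zs", "Lu", "Pd", "Nd"):
    print(code, totals[code])
```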

Most frequent character per category

Lowercase Letter
Value              Count     Frequency (%)
e                  2992141   46.4%
b                  2988181   46.4%
p                  159486    2.5%
o                  87663     1.4%
l                  83703     1.3%
i                  83703     1.3%
t                  11880     0.2%
a                  7920      0.1%
c                  7920      0.1%
n                  7920      0.1%
Other values (3)   11880     0.2%

Uppercase Letter
Value   Count     Frequency (%)
W       2904478   94.4%
A       83703     2.7%
M       79743     2.6%
F       3960      0.1%
I       3960      0.1%

Decimal Number
Value   Count     Frequency (%)
4       2904478   97.2%
2       79743     2.7%
1       3960      0.1%

Space Separator
Value     Count     Frequency (%)
(space)   6064025   100.0%

Dash Punctuation
Value   Count     Frequency (%)
-       2988181   100.0%

Most occurring scripts

Value    Count      Frequency (%)
Common   12040387   55.8%
Latin    9518241    44.2%

Most frequent character per script

Latin
Value              Count     Frequency (%)
e                  2992141   31.4%
b                  2988181   31.4%
W                  2904478   30.5%
p                  159486    1.7%
o                  87663     0.9%
l                  83703     0.9%
A                  83703     0.9%
i                  83703     0.9%
M                  79743     0.8%
t                  11880     0.1%
Other values (8)   43560     0.5%

Common
Value     Count     Frequency (%)
(space)   6064025   50.4%
-         2988181   24.8%
4         2904478   24.1%
2         79743     0.7%
1         3960      < 0.1%

Most occurring blocks

Value   Count      Frequency (%)
ASCII   21558628   100.0%

Most frequent character per block

ASCII
All characters are in the ASCII block, so this table is identical to the "Most occurring characters" table.

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 189.9 MiB

Value         Count
1 - Tablet    1823162
3 - Empty     1047086
4 - Mobile    117640
5 - Desktop   283
2 - TV        10

Characters and Unicode

Total characters: 28834967
Distinct characters: 24
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3 - Empty
2nd row: 3 - Empty
3rd row: 1 - Tablet
4th row: 1 - Tablet
5th row: 1 - Tablet

Common Values

Value         Count     Frequency (%)
1 - Tablet    1823162   61.0%
3 - Empty     1047086   35.0%
4 - Mobile    117640    3.9%
5 - Desktop   283       < 0.1%
2 - TV        10        < 0.1%

Category Frequency Plot

Value     Count     Frequency (%)
(blank)   2988181   33.3%
1         1823162   20.3%
tablet    1823162   20.3%
3         1047086   11.7%
empty     1047086   11.7%
4         117640    1.3%
mobile    117640    1.3%
5         283       < 0.1%
desktop   283       < 0.1%
2         10        < 0.1%

Most occurring characters

Value               Count     Frequency (%)
(space)             5976362   20.7%
-                   2988181   10.4%
t                   2870531   10.0%
e                   1941085   6.7%
b                   1940802   6.7%
l                   1940802   6.7%
T                   1823172   6.3%
1                   1823162   6.3%
a                   1823162   6.3%
p                   1047369   3.6%
Other values (14)   4660339   16.2%

Most occurring categories

Value              Count      Frequency (%)
Lowercase Letter   13894052   48.2%
Space Separator    5976362    20.7%
Uppercase Letter   2988191    10.4%
Dash Punctuation   2988181    10.4%
Decimal Number     2988181    10.4%

Most frequent character per category

Lowercase Letter
Value              Count     Frequency (%)
t                  2870531   20.7%
e                  1941085   14.0%
b                  1940802   14.0%
l                  1940802   14.0%
a                  1823162   13.1%
p                  1047369   7.5%
m                  1047086   7.5%
y                  1047086   7.5%
o                  117923    0.8%
i                  117640    0.8%
Other values (2)   566       < 0.1%

Uppercase Letter
Value   Count     Frequency (%)
T       1823172   61.0%
E       1047086   35.0%
M       117640    3.9%
D       283       < 0.1%
V       10        < 0.1%

Decimal Number
Value   Count     Frequency (%)
1       1823162   61.0%
3       1047086   35.0%
4       117640    3.9%
5       283       < 0.1%
2       10        < 0.1%

Space Separator
Value     Count     Frequency (%)
(space)   5976362   100.0%

Dash Punctuation
Value   Count     Frequency (%)
-       2988181   100.0%

Most occurring scripts

Value    Count      Frequency (%)
Latin    16882243   58.5%
Common   11952724   41.5%

Most frequent character per script

Latin
Value              Count     Frequency (%)
t                  2870531   17.0%
e                  1941085   11.5%
b                  1940802   11.5%
l                  1940802   11.5%
T                  1823172   10.8%
a                  1823162   10.8%
p                  1047369   6.2%
E                  1047086   6.2%
m                  1047086   6.2%
y                  1047086   6.2%
Other values (7)   354062    2.1%

Common
Value     Count     Frequency (%)
(space)   5976362   50.0%
-         2988181   25.0%
1         1823162   15.3%
3         1047086   8.8%
4         117640    1.0%
5         283       < 0.1%
2         10        < 0.1%

Most occurring blocks

Value   Count      Frequency (%)
ASCII   28834967   100.0%

Most frequent character per block

ASCII
All characters are in the ASCII block, so this table is identical to the "Most occurring characters" table.

click_os
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 198.8 MiB

Value              Count
17 - Firefox OS    1738138
2 - iOS            788699
20 - Chromecast    369586
12 - tvOS          60096
13 - Chrome OS     23711
Other values (3)   7951

Characters and Unicode

Total characters: 38114007
Distinct characters: 36
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20 - Chromecast
2nd row: 20 - Chromecast
3rd row: 17 - Firefox OS
4th row: 17 - Firefox OS
5th row: 17 - Firefox OS

Common Values

Value                Count     Frequency (%)
17 - Firefox OS      1738138   58.2%
2 - iOS              788699    26.4%
20 - Chromecast      369586    12.4%
12 - tvOS            60096     2.0%
13 - Chrome OS       23711     0.8%
19 - Brew MP         6384      0.2%
5 - Windows Mobile   1513      0.1%
3 - Android          54        < 0.1%

Category Frequency Plot

Value               Count     Frequency (%)
(blank)             2988181   27.8%
os                  1761849   16.4%
17                  1738138   16.2%
firefox             1738138   16.2%
2                   788699    7.3%
ios                 788699    7.3%
20                  369586    3.4%
chromecast          369586    3.4%
12                  60096     0.6%
tvos                60096     0.6%
Other values (10)   71221     0.7%

Most occurring characters

Value               Count     Frequency (%)
(space)             7746108   20.3%
-                   2988181   7.8%
O                   2610644   6.8%
S                   2610644   6.8%
i                   2529917   6.6%
e                   2139332   5.6%
r                   2137873   5.6%
o                   2134515   5.6%
1                   1828329   4.8%
x                   1738138   4.6%
Other values (26)   9650326   25.3%

Most occurring categories

Value              Count      Frequency (%)
Lowercase Letter   14818667   38.9%
Space Separator    7746108    20.3%
Uppercase Letter   7374955    19.3%
Decimal Number     5186096    13.6%
Dash Punctuation   2988181    7.8%

Most frequent character per category

Lowercase Letter
Value              Count     Frequency (%)
i                  2529917   17.1%
e                  2139332   14.4%
r                  2137873   14.4%
o                  2134515   14.4%
x                  1738138   11.7%
f                  1738138   11.7%
t                  429682    2.9%
h                  393297    2.7%
m                  393297    2.7%
s                  371099    2.5%
Other values (8)   813379    5.5%

Uppercase Letter
Value   Count     Frequency (%)
O       2610644   35.4%
S       2610644   35.4%
F       1738138   23.6%
C       393297    5.3%
M       7897      0.1%
B       6384      0.1%
P       6384      0.1%
W       1513      < 0.1%
A       54        < 0.1%

Decimal Number
Value   Count     Frequency (%)
1       1828329   35.3%
7       1738138   33.5%
2       1218381   23.5%
0       369586    7.1%
3       23765     0.5%
9       6384      0.1%
5       1513      < 0.1%

Space Separator
Value     Count     Frequency (%)
(space)   7746108   100.0%

Dash Punctuation
Value   Count     Frequency (%)
-       2988181   100.0%

Most occurring scripts

Value    Count      Frequency (%)
Latin    22193622   58.2%
Common   15920385   41.8%

Most frequent character per script

Latin
Value               Count     Frequency (%)
O                   2610644   11.8%
S                   2610644   11.8%
i                   2529917   11.4%
e                   2139332   9.6%
r                   2137873   9.6%
o                   2134515   9.6%
x                   1738138   7.8%
f                   1738138   7.8%
F                   1738138   7.8%
t                   429682    1.9%
Other values (17)   2386601   10.8%

Common
Value     Count     Frequency (%)
(space)   7746108   48.7%
-         2988181   18.8%
1         1828329   11.5%
7         1738138   10.9%
2         1218381   7.7%
0         369586    2.3%
3         23765     0.1%
9         6384      < 0.1%
5         1513      < 0.1%

Most occurring blocks

Value   Count      Frequency (%)
ASCII   38114007   100.0%

Most frequent character per block

ASCII
All characters are in the ASCII block, so this table is identical to the "Most occurring characters" table.

click_country
Categorical

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 165.4 MiB

Value              Count
1                  2852406
10                 61377
11                 29999
8                  9556
6                  7256
Other values (6)   27587

Characters and Unicode

Total characters: 3079557
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count     Frequency (%)
1       2852406   95.5%
10      61377     2.1%
11      29999     1.0%
8       9556      0.3%
6       7256      0.2%
9       6746      0.2%
2       6101      0.2%
3       4540      0.2%
5       3498      0.1%
4       3389      0.1%

Most occurring characters

Value   Count     Frequency (%)
1       2973781   96.6%
0       61377     2.0%
8       9556      0.3%
6       7256      0.2%
9       6746      0.2%
2       6101      0.2%
3       4540      0.1%
5       3498      0.1%
4       3389      0.1%
7       3313      0.1%

Most occurring categories

Value            Count     Frequency (%)
Decimal Number   3079557   100.0%

Most occurring scripts

Value    Count     Frequency (%)
Common   3079557   100.0%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   3079557   100.0%

Most frequent character per category, per script and per block

All characters fall in a single category (Decimal Number), script (Common) and block (ASCII), so each of these three tables is identical to the "Most occurring characters" table.

click_region
Categorical

Distinct: 28
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 167.6 MiB

Value               Count
25                  804985
21                  464230
13                  320957
8                   179339
16                  164884
Other values (23)   1053786

Characters and Unicode

Total characters: 5435935
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20
2nd row: 20
3rd row: 16
4th row: 16
5th row: 24

Common Values

Value               Count    Frequency (%)
25                  804985   26.9%
21                  464230   15.5%
13                  320957   10.7%
8                   179339   6.0%
16                  164884   5.5%
28                  135793   4.5%
24                  130537   4.4%
20                  120884   4.0%
5                   96979    3.2%
9                   84693    2.8%
Other values (18)   484900   16.2%

Most occurring characters

ValueCountFrequency (%)
21767881
32.5%
11247851
23.0%
5931499
17.1%
8330215
 
6.1%
3324997
 
6.0%
6241031
 
4.4%
4186510
 
3.4%
7144287
 
2.7%
0142879
 
2.6%
9118785
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5435935
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
21767881
32.5%
11247851
23.0%
5931499
17.1%
8330215
 
6.1%
3324997
 
6.0%
6241031
 
4.4%
4186510
 
3.4%
7144287
 
2.7%
0142879
 
2.6%
9118785
 
2.2%

Most occurring scripts

ValueCountFrequency (%)
Common5435935
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
21767881
32.5%
11247851
23.0%
5931499
17.1%
8330215
 
6.1%
3324997
 
6.0%
6241031
 
4.4%
4186510
 
3.4%
7144287
 
2.7%
0142879
 
2.6%
9118785
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII5435935
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
21767881
32.5%
11247851
23.0%
5931499
17.1%
8330215
 
6.1%
3324997
 
6.0%
6241031
 
4.4%
4186510
 
3.4%
7144287
 
2.7%
0142879
 
2.6%
9118785
 
2.2%

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 165.3 MiB

Value              Count
2                  1602601
1                  1194321
5                  80766
7                  69798
6                  20455
Other values (2)   20240

Characters and Unicode

Total characters: 2988181
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value   Count     Frequency (%)
2       1602601   53.6%
1       1194321   40.0%
5       80766     2.7%
7       69798     2.3%
6       20455     0.7%
4       19820     0.7%
3       420       < 0.1%

Category Frequency Plot

Value   Count     Frequency (%)
2       1602601   53.6%
1       1194321   40.0%
5       80766     2.7%
7       69798     2.3%
6       20455     0.7%
4       19820     0.7%
3       420       < 0.1%

Most occurring characters

Value   Count     Frequency (%)
2       1602601   53.6%
1       1194321   40.0%
5       80766     2.7%
7       69798     2.3%
6       20455     0.7%
4       19820     0.7%
3       420       < 0.1%

Most occurring categories

Value            Count     Frequency (%)
Decimal Number   2988181   100.0%

Most occurring scripts

Value    Count     Frequency (%)
Common   2988181   100.0%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   2988181   100.0%

Most frequent character per category, per script and per block

All characters fall in a single category (Decimal Number), script (Common) and block (ASCII), so each of these three tables is identical to the "Most occurring characters" table.