Overview

Dataset statistics

Number of variables: 5
Number of observations: 364047
Missing cells: 0
Missing cells (%): 0.0%
Total size in memory: 37.9 MiB
Average record size in memory: 109.0 B
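
The figures above can be approximated directly with pandas. The following is a minimal sketch, not the profiler's own implementation: the function name `dataset_statistics` and the toy frame are illustrative, and it assumes that memory is measured with `memory_usage(deep=True)`.

```python
import pandas as pd


def dataset_statistics(df: pd.DataFrame) -> dict:
    """Summarise a DataFrame along the lines of the overview table."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing = int(df.isna().sum().sum())
    # deep=True also counts the Python string objects held by text columns
    total_bytes = int(df.memory_usage(deep=True).sum())
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "total_bytes": total_bytes,
        "avg_record_bytes": total_bytes / n_rows if n_rows else 0.0,
    }


# Toy frame standing in for the profiled dataset (values are made up).
toy = pd.DataFrame({"article_id": ["0", "1", "2"], "words_count": [168, 189, 250]})
stats = dataset_statistics(toy)
print(stats["variables"], stats["observations"], stats["missing_cells"])
```

The average record size is simply the total size divided by the number of observations, which is why a column stored as text (such as article_id here) dominates both figures.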

Variable types

Categorical: 3
DateTime: 1
Numeric: 1

Alerts

publisher_id has constant value "0" (Constant)
article_id has a high cardinality: 364047 distinct values (High cardinality)
category_id has a high cardinality: 461 distinct values (High cardinality)
article_id has unique values (Unique)
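
The three alert types listed here can be derived from distinct-value counts alone. The sketch below shows one way to do it. The function `column_alerts` is hypothetical, and the cut-off of 50 distinct values for "High cardinality" is an assumption rather than a value taken from this report.

```python
import pandas as pd

HIGH_CARDINALITY_THRESHOLD = 50  # assumed cut-off, not taken from the report


def column_alerts(df: pd.DataFrame, threshold: int = HIGH_CARDINALITY_THRESHOLD):
    """Return (column, alert) pairs for constant, unique and high-cardinality columns."""
    alerts = []
    n_rows = len(df)
    for name in df.columns:
        column = df[name]
        distinct = column.nunique(dropna=True)
        if distinct == 1:
            alerts.append((name, "Constant"))
        if n_rows > 1 and distinct == n_rows:
            alerts.append((name, "Unique"))
        # Only non-numeric columns are treated as categorical in this sketch
        if not pd.api.types.is_numeric_dtype(column) and distinct > threshold:
            alerts.append((name, "High cardinality"))
    return alerts


# Toy data mirroring the shape of the alerts above (values are made up).
toy = pd.DataFrame({
    "article_id": [str(i) for i in range(60)],
    "publisher_id": ["0"] * 60,
    "words_count": [100 + i % 7 for i in range(60)],
})
alerts = column_alerts(toy)
print(alerts)
```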

Reproduction

Analysis started: 2022-05-07 17:26:30.784804
Analysis finished: 2022-05-07 17:26:34.273965
Duration: 3.49 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
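
A comparable report could be regenerated roughly as follows. This is a sketch under stated assumptions: the data is already loaded into a DataFrame, the identifier columns are cast to text so that they are profiled as categorical (as they are in this report), and the helper names `prepare_ids` and `build_report`, the title, and the output path are illustrative. The actual configuration used is the one in config.json.

```python
import pandas as pd


def prepare_ids(df: pd.DataFrame, id_columns) -> pd.DataFrame:
    """Cast identifier columns to text so they are treated as categories."""
    return df.astype({column: str for column in id_columns})


def build_report(df: pd.DataFrame, output_path: str = "report.html") -> None:
    """Profile the frame and write an HTML report (requires pandas-profiling)."""
    # Imported here so the rest of the module works without the package installed
    from pandas_profiling import ProfileReport

    report = ProfileReport(df, title="Dataset profile")
    report.to_file(output_path)


# Toy frame (values are made up); build_report(prepared) would write the HTML file.
toy = pd.DataFrame({"article_id": [0, 1], "publisher_id": [0, 0], "words_count": [168, 189]})
prepared = prepare_ids(toy, ["article_id", "publisher_id"])
print(prepared["article_id"].tolist())
```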

Variables

article_id
Categorical

HIGH CARDINALITY
UNIQUE

Distinct: 364047
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 31.2 MiB

Value                  Count
0                      1
242727                 1
242703                 1
242702                 1
242701                 1
Other values (364042)  364042

Characters and Unicode

Total characters: 2073172
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
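
As an illustration of such a character property, Python's standard `unicodedata` module exposes the general category of each code point ("Nd" is Decimal Number). The sketch below tallies categories over a few values taken from this report. The function name `character_categories` is illustrative, and the standard library does not expose script or block, so only the category is shown.

```python
import unicodedata
from collections import Counter


def character_categories(values):
    """Count characters by Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["0", "242727", "100074"]
categories = character_categories(sample)
print(categories)  # every character falls in 'Nd' (Decimal Number)
```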

Unique

Unique: 364047
Unique (%): 100.0%

Sample

1st row: 0
2nd row: 1
3rd row: 2
4th row: 3
5th row: 4

Common Values

Value                  Count   Frequency (%)
0                      1       < 0.1%
242727                 1       < 0.1%
242703                 1       < 0.1%
242702                 1       < 0.1%
242701                 1       < 0.1%
242700                 1       < 0.1%
242699                 1       < 0.1%
242698                 1       < 0.1%
242697                 1       < 0.1%
242696                 1       < 0.1%
Other values (364037)  364037  > 99.9%

Value                  Count   Frequency (%)
0                      1       < 0.1%
100074                 1       < 0.1%
100                    1       < 0.1%
1000                   1       < 0.1%
10000                  1       < 0.1%
100000                 1       < 0.1%
100001                 1       < 0.1%
100002                 1       < 0.1%
100003                 1       < 0.1%
100014                 1       < 0.1%
Other values (364037)  364037  > 99.9%

Most occurring characters

Value  Count   Frequency (%)
2      286215  13.8%
1      286215  13.8%
3      250262  12.1%
4      185259  8.9%
5      185205  8.9%
6      179252  8.6%
7      175204  8.5%
9      175204  8.5%
8      175204  8.5%
0      175152  8.4%
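
These counts are what one would get if the identifiers were the consecutive integers 0 to 364046 written as decimal text. That is an assumption, but it is consistent with the sample rows and with the total of 2073172 characters. A short sketch that tallies the digits under this assumption:

```python
from collections import Counter

N_IDS = 364047  # number of observations reported in the overview

# Assumption: article_id takes every integer value from 0 to N_IDS - 1.
digit_counts = Counter()
for article_id in range(N_IDS):
    digit_counts.update(str(article_id))  # counts each digit character

total_characters = sum(digit_counts.values())
print(total_characters)  # 2073172
print(digit_counts["1"], digit_counts["2"], digit_counts["3"])  # 286215 286215 250262
```

The digits 1 and 2 are the most frequent because they lead every six-digit identifier from 100000 to 299999, while 3 leads only the identifiers from 300000 to 364046.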

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  2073172  100.0%

Most frequent character per category

Decimal Number: all ten digits, with the same counts as under "Most occurring characters".

Most occurring scripts

Value   Count    Frequency (%)
Common  2073172  100.0%

Most frequent character per script

Common: all ten digits, with the same counts as under "Most occurring characters".

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2073172  100.0%

Most frequent character per block

ASCII: all ten digits, with the same counts as under "Most occurring characters".

category_id
Categorical

HIGH CARDINALITY

Distinct: 461
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 754.2 KiB

Value               Count
281                 12817
375                 10005
399                 9049
412                 8648
431                 7759
Other values (456)  315769

Characters and Unicode

Total characters: 1019790
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 48
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value               Count   Frequency (%)
281                 12817   3.5%
375                 10005   2.7%
399                 9049    2.5%
412                 8648    2.4%
431                 7759    2.1%
428                 7731    2.1%
26                  7343    2.0%
7                   6726    1.8%
299                 6634    1.8%
301                 6446    1.8%
Other values (451)  280889  77.2%
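
A table of this shape (the ten most frequent values plus an "Other values" row) can be built from `Series.value_counts`. The sketch below is illustrative only: the function name `common_values` and the toy series are not taken from the report.

```python
import pandas as pd


def common_values(series: pd.Series, top: int = 10):
    """Return (value, count, percent) rows for the top values plus a remainder row."""
    counts = series.value_counts()  # sorted by descending frequency
    n = len(series)
    rows = [(value, int(count), float(100.0 * count / n))
            for value, count in counts.head(top).items()]
    remaining = counts.iloc[top:]
    if len(remaining) > 0:
        other = int(remaining.sum())
        rows.append((f"Other values ({len(remaining)})", other, float(100.0 * other / n)))
    return rows


# Toy series (values are made up): 'a' three times, 'b' twice, 'c' once.
toy = pd.Series(["a", "a", "a", "b", "b", "c"])
rows = common_values(toy, top=2)
for row in rows:
    print(row)
```

Applied to this variable, the first row corresponds to 12817 / 364047, or about 3.5%, and the remainder row to the 451 values outside the top ten.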

Most occurring characters

Value  Count   Frequency (%)
3      184074  18.1%
2      165307  16.2%
4      148758  14.6%
1      116808  11.5%
9      88587   8.7%
8      81393   8.0%
5      78927   7.7%
7      55791   5.5%
0      52391   5.1%
6      47754   4.7%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1019790  100.0%

Most frequent character per category

Decimal Number: all ten digits, with the same counts as under "Most occurring characters".

Most occurring scripts

Value   Count    Frequency (%)
Common  1019790  100.0%

Most frequent character per script

Common: all ten digits, with the same counts as under "Most occurring characters".

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1019790  100.0%

Most frequent character per block

ASCII: all ten digits, with the same counts as under "Most occurring characters".

DateTime

Distinct: 359552
Distinct (%): 98.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB
Minimum: 2006-09-27 13:14:35
Maximum: 2018-03-13 13:12:30
Histogram with fixed size bins (bins=50)

publisher_id
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 355.7 KiB

Value  Count
0      364047

Characters and Unicode

Total characters: 364047
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      364047  100.0%

Category Frequency Plot

A single category, "0", accounts for all 364047 rows (100.0%).

Most occurring characters

Value  Count   Frequency (%)
0      364047  100.0%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  364047  100.0%

Most frequent character per category

Decimal Number: "0" only (364047 occurrences, 100.0%).

Most occurring scripts

Value   Count   Frequency (%)
Common  364047  100.0%

Most frequent character per script

Common: "0" only (364047 occurrences, 100.0%).

Most occurring blocks

Value  Count   Frequency (%)
ASCII  364047  100.0%

Most frequent character per block

ASCII: "0" only (364047 occurrences, 100.0%).

words_count
Real number (ℝ≥0)

Distinct: 866
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 190.8977275
Minimum: 0
Maximum: 6690
Zeros: 35
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 120
Q1: 159
Median: 186
Q3: 218
95-th percentile: 277
Maximum: 6690
Range: 6690
Interquartile range (IQR): 59
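
The range and IQR follow from the other rows: 6690 - 0 = 6690 and 218 - 159 = 59. A minimal sketch of how such a summary could be computed with pandas is given below. The function name `quantile_summary` and the toy series are illustrative, and the sketch assumes pandas' default linear interpolation between data points.

```python
import pandas as pd


def quantile_summary(values: pd.Series) -> dict:
    """Compute the quantile statistics listed above for a numeric series."""
    q = values.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
    low, high = values.min(), values.max()
    return {
        "minimum": low,
        "p5": q.loc[0.05],
        "q1": q.loc[0.25],
        "median": q.loc[0.5],
        "q3": q.loc[0.75],
        "p95": q.loc[0.95],
        "maximum": high,
        "range": high - low,
        "iqr": q.loc[0.75] - q.loc[0.25],
    }


# Toy series 1..100 (values are made up, not the profiled data).
toy = pd.Series(range(1, 101))
summary = quantile_summary(toy)
print(summary["median"], summary["iqr"], summary["range"])
```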

Descriptive statistics

Standard deviation: 59.50276597
Coefficient of variation (CV): 0.3116997083
Kurtosis: 607.7951834
Mean: 190.8977275
Median Absolute Deviation (MAD): 29
Skewness: 10.14486675
Sum: 69495745
Variance: 3540.579158
Monotonicity: Not monotonic
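
Several of these figures are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The short check below uses only numbers quoted in this report. The variable names are illustrative.

```python
# Values copied from the report
n_observations = 364047
total = 69495745
std_dev = 59.50276597

mean = total / n_observations          # compare with Mean
cv = std_dev / mean                    # compare with Coefficient of variation (CV)
variance = std_dev ** 2                # compare with Variance

print(round(mean, 7), round(cv, 10), round(variance, 6))
```

The large skewness and kurtosis are consistent with the quantiles: 95% of the values lie at or below 277, while the maximum is 6690.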
Histogram with fixed size bins (bins=50)
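
A fixed-size binning with 50 bins can be sketched with `numpy.histogram`, which splits the span between the smallest and largest value into equal-width intervals. The function name `fixed_bin_histogram` and the random toy data are illustrative, and whether the report's plot uses exactly this routine is an assumption.

```python
import numpy as np


def fixed_bin_histogram(values, bins: int = 50):
    """Return per-bin counts and the bin edges for equal-width bins."""
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges


# Toy data (made up): 1000 integers drawn uniformly from 0..399.
rng = np.random.default_rng(0)
toy = rng.integers(0, 400, size=1000)
counts, edges = fixed_bin_histogram(toy)
print(len(counts), len(edges), counts.sum())
```

With a range of 0 to 6690, each of the 50 bins would be 133.8 wide, so under this assumption the values between Q1 (159) and Q3 (218) all fall into the second bin.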

Common Values

Value               Count   Frequency (%)
176                 3485    1.0%
182                 3480    1.0%
179                 3463    1.0%
178                 3458    0.9%
174                 3456    0.9%
183                 3432    0.9%
184                 3427    0.9%
173                 3414    0.9%
180                 3403    0.9%
177                 3391    0.9%
Other values (856)  329638  90.5%

Minimum 10 values

Value  Count  Frequency (%)
0      35     < 0.1%
5      5      < 0.1%
6      4      < 0.1%
7      6      < 0.1%
8      20     < 0.1%
9      17     < 0.1%
10     49     < 0.1%
11     30     < 0.1%
12     61     < 0.1%
13     25     < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
6690   1      < 0.1%
3808   1      < 0.1%
3507   1      < 0.1%
3082   1      < 0.1%
2995   1      < 0.1%
2899   1      < 0.1%
2881   1      < 0.1%
2855   1      < 0.1%
2798   1      < 0.1%
2743   1      < 0.1%