Dataset statistics
Number of variables | 5 |
---|---|
Number of observations | 364047 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Total size in memory | 37.9 MiB |
Average record size in memory | 109.0 B |
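The figures in this table are related by simple arithmetic. The sketch below recomputes them from the values printed in this report. The variable names are illustrative, and because the report rounds sizes to one decimal place, the results agree with the table only approximately.

```python
MIB = 1024 ** 2  # bytes per mebibyte
KIB = 1024       # bytes per kibibyte

n_observations = 364_047   # "Number of observations"
total_size_mib = 37.9      # "Total size in memory" (rounded in the report)

# Average record size is total memory divided by the number of rows.
avg_record_bytes = total_size_mib * MIB / n_observations

# Per-variable "Memory size" values reported further down, converted to bytes.
variable_sizes = [31.2 * MIB, 754.2 * KIB, 2.8 * MIB, 355.7 * KIB, 2.8 * MIB]
summed_mib = sum(variable_sizes) / MIB

print(f"average record size: about {avg_record_bytes:.0f} B")
print(f"sum of per-variable sizes: {summed_mib:.1f} MiB")
```

Both results land close to the reported 109.0 B and 37.9 MiB; small differences come from rounding in the displayed values.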
Variable types
Categorical | 3 |
---|---|
DateTime | 1 |
Numeric | 1 |
publisher_id has constant value "0" | Constant |
article_id has a high cardinality: 364047 distinct values | High cardinality |
category_id has a high cardinality: 461 distinct values | High cardinality |
article_id has unique values | Unique |
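The alerts above can be reproduced with simple counting rules. The following is a minimal sketch, not the profiler's own implementation: `column_alerts` is a hypothetical helper, and the threshold of 50 distinct values for "High cardinality" is an assumption, since the report text does not state the threshold it used.

```python
def column_alerts(values, cardinality_threshold=50):
    """Return alert labels for one categorical column (illustrative rules)."""
    n = len(values)
    n_distinct = len(set(values))
    alerts = []
    if n_distinct == 1:
        alerts.append("Constant")           # every row holds the same value
    if n_distinct > cardinality_threshold:  # assumed threshold, see lead-in
        alerts.append("High cardinality")
    if n > 1 and n_distinct == n:
        alerts.append("Unique")             # no value repeats
    return alerts


constant_col = ["0"] * 1000                           # like publisher_id
unique_col = [str(i) for i in range(1000)]            # like article_id
repeated_col = [str(i % 461) for i in range(5000)]    # like category_id

print(column_alerts(constant_col))
print(column_alerts(unique_col))
print(column_alerts(repeated_col))
```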
Reproduction
Analysis started | 2022-05-07 17:26:30.784804 |
---|---|
Analysis finished | 2022-05-07 17:26:34.273965 |
Duration | 3.49 seconds |
Software version | pandas-profiling v3.2.0 |
article_id
Categorical
Distinct | 364047 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 31.2 MiB |
0 | 1 |
---|---|
242727 | 1 |
242703 | 1 |
242702 | 1 |
242701 | 1 |
Other values (364042) | 364042 |
Characters and Unicode
Total characters | 2073172 |
---|---|
Distinct characters | 10 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
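As an illustration of this kind of character analysis, the sketch below counts characters and their Unicode general categories with Python's standard library. It assumes the identifiers are the consecutive integers 0 to 364046 rendered as strings, which is consistent with the distinct count and sample rows shown here but is not stated by the report. `character_stats` is a hypothetical helper. The standard library exposes categories (`Nd` is the code for Decimal Number) but not scripts or blocks, so those are left out.

```python
import unicodedata
from collections import Counter


def character_stats(values):
    """Count characters and Unicode general categories over all strings."""
    chars = Counter()
    categories = Counter()
    for value in values:
        for ch in value:
            chars[ch] += 1
            categories[unicodedata.category(ch)] += 1
    return chars, categories


# Assumption: ids are consecutive integers, stored as text.
ids = (str(i) for i in range(364_047))
chars, categories = character_stats(ids)

print("total characters:", sum(chars.values()))
print("distinct characters:", len(chars))
print("categories:", dict(categories))
```

Under that assumption the total comes to 2073172 characters, matching the "Total characters" row, with ten distinct characters in a single category.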
Unique
Unique | 364047 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 0 |
---|---|
2nd row | 1 |
3rd row | 2 |
4th row | 3 |
5th row | 4 |
Common Values
Value | Count | Frequency (%) |
0 | 1 | < 0.1% |
242727 | 1 | < 0.1% |
242703 | 1 | < 0.1% |
242702 | 1 | < 0.1% |
242701 | 1 | < 0.1% |
242700 | 1 | < 0.1% |
242699 | 1 | < 0.1% |
242698 | 1 | < 0.1% |
242697 | 1 | < 0.1% |
242696 | 1 | < 0.1% |
Other values (364037) | 364037 | > 99.9% |
Category Frequency Plot
Value | Count | Frequency (%) |
0 | 1 | < 0.1% |
100074 | 1 | < 0.1% |
100 | 1 | < 0.1% |
1000 | 1 | < 0.1% |
10000 | 1 | < 0.1% |
100000 | 1 | < 0.1% |
100001 | 1 | < 0.1% |
100002 | 1 | < 0.1% |
100003 | 1 | < 0.1% |
100014 | 1 | < 0.1% |
Other values (364037) | 364037 | > 99.9% |
Most occurring characters
Value | Count | Frequency (%) |
2 | 286215 | 13.8% |
1 | 286215 | 13.8% |
3 | 250262 | 12.1% |
4 | 185259 | 8.9% |
5 | 185205 | 8.9% |
6 | 179252 | 8.6% |
7 | 175204 | 8.5% |
9 | 175204 | 8.5% |
8 | 175204 | 8.5% |
0 | 175152 | 8.4% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 2073172 | 100.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
2 | 286215 | 13.8% |
1 | 286215 | 13.8% |
3 | 250262 | 12.1% |
4 | 185259 | 8.9% |
5 | 185205 | 8.9% |
6 | 179252 | 8.6% |
7 | 175204 | 8.5% |
9 | 175204 | 8.5% |
8 | 175204 | 8.5% |
0 | 175152 | 8.4% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 2073172 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
2 | 286215 | 13.8% |
1 | 286215 | 13.8% |
3 | 250262 | 12.1% |
4 | 185259 | 8.9% |
5 | 185205 | 8.9% |
6 | 179252 | 8.6% |
7 | 175204 | 8.5% |
9 | 175204 | 8.5% |
8 | 175204 | 8.5% |
0 | 175152 | 8.4% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 2073172 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
2 | 286215 | 13.8% |
1 | 286215 | 13.8% |
3 | 250262 | 12.1% |
4 | 185259 | 8.9% |
5 | 185205 | 8.9% |
6 | 179252 | 8.6% |
7 | 175204 | 8.5% |
9 | 175204 | 8.5% |
8 | 175204 | 8.5% |
0 | 175152 | 8.4% |
category_id
Categorical
Distinct | 461 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 754.2 KiB |
281 | 12817 |
---|---|
375 | 10005 |
399 | 9049 |
412 | 8648 |
431 | 7759 |
Other values (456) | 315769 |
Characters and Unicode
Total characters | 1019790 |
---|---|
Distinct characters | 10 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 48 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | 0 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
281 | 12817 | 3.5% |
375 | 10005 | 2.7% |
399 | 9049 | 2.5% |
412 | 8648 | 2.4% |
431 | 7759 | 2.1% |
428 | 7731 | 2.1% |
26 | 7343 | 2.0% |
7 | 6726 | 1.8% |
299 | 6634 | 1.8% |
301 | 6446 | 1.8% |
Other values (451) | 280889 | 77.2% |
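The Frequency (%) column is each count divided by the 364047 observations. The sketch below shows one way to reproduce the displayed strings, including the "< 0.1%" entries used for very small shares. `format_frequency` is a hypothetical helper, and the exact rounding rule is an assumption modelled on the values visible in these tables.

```python
def format_frequency(count, total):
    """Format count/total as a percentage string with one decimal place."""
    share = count / total
    # Assumed edge cases: shares that would round to 0% or 100% are shown
    # as bounds rather than as misleading exact values.
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share:.1%}"


N_OBSERVATIONS = 364_047

print(format_frequency(12817, N_OBSERVATIONS))   # category 281
print(format_frequency(7759, N_OBSERVATIONS))    # category 431
print(format_frequency(1, N_OBSERVATIONS))       # a value seen once
```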
Most occurring characters
Value | Count | Frequency (%) |
3 | 184074 | 18.1% |
2 | 165307 | 16.2% |
4 | 148758 | 14.6% |
1 | 116808 | 11.5% |
9 | 88587 | 8.7% |
8 | 81393 | 8.0% |
5 | 78927 | 7.7% |
7 | 55791 | 5.5% |
0 | 52391 | 5.1% |
6 | 47754 | 4.7% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 1019790 | 100.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
3 | 184074 | 18.1% |
2 | 165307 | 16.2% |
4 | 148758 | 14.6% |
1 | 116808 | 11.5% |
9 | 88587 | 8.7% |
8 | 81393 | 8.0% |
5 | 78927 | 7.7% |
7 | 55791 | 5.5% |
0 | 52391 | 5.1% |
6 | 47754 | 4.7% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 1019790 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
3 | 184074 | 18.1% |
2 | 165307 | 16.2% |
4 | 148758 | 14.6% |
1 | 116808 | 11.5% |
9 | 88587 | 8.7% |
8 | 81393 | 8.0% |
5 | 78927 | 7.7% |
7 | 55791 | 5.5% |
0 | 52391 | 5.1% |
6 | 47754 | 4.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1019790 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
3 | 184074 | 18.1% |
2 | 165307 | 16.2% |
4 | 148758 | 14.6% |
1 | 116808 | 11.5% |
9 | 88587 | 8.7% |
8 | 81393 | 8.0% |
5 | 78927 | 7.7% |
7 | 55791 | 5.5% |
0 | 52391 | 5.1% |
6 | 47754 | 4.7% |
created_at_ts
Date
Distinct | 359552 |
---|---|
Distinct (%) | 98.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.8 MiB |
Minimum | 2006-09-27 13:14:35 |
---|---|
Maximum | 2018-03-13 13:12:30 |
Histogram with fixed size bins (bins=50)
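To give a sense of scale for this histogram, the sketch below computes the time span between the reported minimum and maximum and the resulting width of each bin. It assumes the 50 bins are of equal width and span exactly the minimum-to-maximum range, which the report text does not confirm.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
first = datetime.strptime("2006-09-27 13:14:35", FMT)  # reported minimum
last = datetime.strptime("2018-03-13 13:12:30", FMT)   # reported maximum

span = last - first          # total range covered by the variable
n_bins = 50                  # from the histogram caption
bin_width = span / n_bins    # assumed equal-width bins

print("span:", span)
print("bin width:", bin_width)
```

The span is a little under 4185 days (about eleven and a half years), so each bin covers roughly 84 days.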
publisher_id
Categorical
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 355.7 KiB |
0 | 364047 |
---|---|
Characters and Unicode
Total characters | 364047 |
---|---|
Distinct characters | 1 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 0 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 364047 | 100.0% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 364047 | 100.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 364047 | 100.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 364047 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 364047 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 364047 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 364047 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 364047 | 100.0% |
words_count
Real number (ℝ≥0)
Distinct | 866 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 190.8977275 |
Minimum | 0 |
---|---|
Maximum | 6690 |
Zeros | 35 |
Zeros (%) | < 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.8 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 120 |
Q1 | 159 |
median | 186 |
Q3 | 218 |
95-th percentile | 277 |
Maximum | 6690 |
Range | 6690 |
Interquartile range (IQR) | 59 |
Descriptive statistics
Standard deviation | 59.50276597 |
---|---|
Coefficient of variation (CV) | 0.3116997083 |
Kurtosis | 607.7951834 |
Mean | 190.8977275 |
Median Absolute Deviation (MAD) | 29 |
Skewness | 10.14486675 |
Sum | 69495745 |
Variance | 3540.579158 |
Monotonicity | Not monotonic |
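Several of the statistics above are derived from one another. As a consistency check, the sketch below recomputes the mean, variance, coefficient of variation, interquartile range and range from the other values printed in this report. Variable names are illustrative, and the comparisons hold only up to the rounding of the displayed figures.

```python
import math

n = 364_047              # number of observations
total = 69_495_745       # "Sum"
std = 59.50276597        # "Standard deviation"
minimum, q1, q3, maximum = 0, 159, 218, 6690

mean = total / n                 # mean = sum / count
variance = std ** 2              # variance = standard deviation squared
cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"mean={mean:.7f} variance={variance:.6f} cv={cv:.10f}")
print(f"iqr={iqr} range={value_range}")
print(math.isclose(mean, 190.8977275, abs_tol=1e-6))
```

The maximum of 6690 sits far above the 95th percentile of 277, which fits the large positive skewness and kurtosis reported for this variable.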
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
176 | 3485 | 1.0% |
182 | 3480 | 1.0% |
179 | 3463 | 1.0% |
178 | 3458 | 0.9% |
174 | 3456 | 0.9% |
183 | 3432 | 0.9% |
184 | 3427 | 0.9% |
173 | 3414 | 0.9% |
180 | 3403 | 0.9% |
177 | 3391 | 0.9% |
Other values (856) | 329638 | 90.5% |
Minimum 10 values
Value | Count | Frequency (%) |
0 | 35 | < 0.1% |
5 | 5 | < 0.1% |
6 | 4 | < 0.1% |
7 | 6 | < 0.1% |
8 | 20 | < 0.1% |
9 | 17 | < 0.1% |
10 | 49 | < 0.1% |
11 | 30 | < 0.1% |
12 | 61 | < 0.1% |
13 | 25 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
6690 | 1 | < 0.1% |
3808 | 1 | < 0.1% |
3507 | 1 | < 0.1% |
3082 | 1 | < 0.1% |
2995 | 1 | < 0.1% |
2899 | 1 | < 0.1% |
2881 | 1 | < 0.1% |
2855 | 1 | < 0.1% |
2798 | 1 | < 0.1% |
2743 | 1 | < 0.1% |