Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 364047 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Total size in memory | 37.9 MiB |
| Average record size in memory | 109.0 B |
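As a rough consistency check, the average record size should equal the total memory divided by the number of observations. The sketch below redoes that arithmetic from the rounded figures in the table. It lands close to the reported 109.0 B, and the small difference is consistent with both figures having been rounded for display.

```python
# Figures copied from the Dataset statistics table above.
n_observations = 364_047
total_memory_mib = 37.9  # already rounded in the report

# 1 MiB = 1024 * 1024 bytes.
total_memory_bytes = total_memory_mib * 1024 * 1024
avg_record_bytes = total_memory_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```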
Variable types
| Categorical | 3 |
|---|---|
| DateTime | 1 |
| Numeric | 1 |
Alerts
| Alert | Type |
|---|---|
| publisher_id has constant value "0" | Constant |
| article_id has a high cardinality: 364047 distinct values | High cardinality |
| category_id has a high cardinality: 461 distinct values | High cardinality |
| article_id has unique values | Unique |
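The three alert types above can be reproduced with simple distinct-value counts. The following is a minimal sketch, not the profiler's own implementation: the function name `column_alerts` and the threshold of 50 are assumptions made for illustration, since the report does not state the cut-off it used for high cardinality.

```python
from collections import Counter

# Hypothetical threshold: the profiler's actual cut-off is not stated in this report.
HIGH_CARDINALITY_THRESHOLD = 50


def column_alerts(values, threshold=HIGH_CARDINALITY_THRESHOLD):
    """Return the alert labels that apply to one column of values."""
    counts = Counter(values)
    n_distinct = len(counts)
    alerts = []
    if n_distinct == 1:
        alerts.append("Constant")          # a single repeated value
    if n_distinct > threshold:
        alerts.append("High cardinality")  # many distinct values
    if values and n_distinct == len(values):
        alerts.append("Unique")            # no value occurs twice
    return alerts


# Toy columns standing in for the real data.
article_id = [str(i) for i in range(100)]  # every value differs
publisher_id = ["0"] * 100                 # one value throughout

print(column_alerts(article_id))    # ['High cardinality', 'Unique']
print(column_alerts(publisher_id))  # ['Constant']
```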
Reproduction
| Analysis started | 2022-05-07 17:26:30.784804 |
|---|---|
| Analysis finished | 2022-05-07 17:26:34.273965 |
| Duration | 3.49 seconds |
| Software version | pandas-profiling v3.2.0 |
article_id
Categorical
| Distinct | 364047 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.2 MiB |

| Value | Count |
|---|---|
| 0 | 1 |
| 242727 | 1 |
| 242703 | 1 |
| 242702 | 1 |
| 242701 | 1 |
| Other values (364042) | 364042 |
Characters and Unicode
| Total characters | 2073172 |
|---|---|
| Distinct characters | 10 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
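To illustrate the kind of character property meant here, the sketch below tallies the Unicode general category of every character in a few sample values. It uses only Python's standard `unicodedata` module, which exposes the general category but not scripts or blocks, so those two properties are left out. The helper name `category_counts` is an assumption for this example.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


ids = ["0", "1", "242727"]    # sample values taken from the report
print(category_counts(ids))   # Counter({'Nd': 8})
```

`Nd` is the general category code for Decimal Number, the only category reported for this variable.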
Unique
| Unique | 364047 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 2 |
| 4th row | 3 |
| 5th row | 4 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 242727 | 1 | < 0.1% |
| 242703 | 1 | < 0.1% |
| 242702 | 1 | < 0.1% |
| 242701 | 1 | < 0.1% |
| 242700 | 1 | < 0.1% |
| 242699 | 1 | < 0.1% |
| 242698 | 1 | < 0.1% |
| 242697 | 1 | < 0.1% |
| 242696 | 1 | < 0.1% |
| Other values (364037) | 364037 | > 99.9% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 100074 | 1 | < 0.1% |
| 100 | 1 | < 0.1% |
| 1000 | 1 | < 0.1% |
| 10000 | 1 | < 0.1% |
| 100000 | 1 | < 0.1% |
| 100001 | 1 | < 0.1% |
| 100002 | 1 | < 0.1% |
| 100003 | 1 | < 0.1% |
| 100014 | 1 | < 0.1% |
| Other values (364037) | 364037 | > 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 286215 | 13.8% |
| 1 | 286215 | 13.8% |
| 3 | 250262 | 12.1% |
| 4 | 185259 | 8.9% |
| 5 | 185205 | 8.9% |
| 6 | 179252 | 8.6% |
| 7 | 175204 | 8.5% |
| 9 | 175204 | 8.5% |
| 8 | 175204 | 8.5% |
| 0 | 175152 | 8.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 2073172 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 286215 | 13.8% |
| 1 | 286215 | 13.8% |
| 3 | 250262 | 12.1% |
| 4 | 185259 | 8.9% |
| 5 | 185205 | 8.9% |
| 6 | 179252 | 8.6% |
| 7 | 175204 | 8.5% |
| 9 | 175204 | 8.5% |
| 8 | 175204 | 8.5% |
| 0 | 175152 | 8.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 2073172 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 286215 | 13.8% |
| 1 | 286215 | 13.8% |
| 3 | 250262 | 12.1% |
| 4 | 185259 | 8.9% |
| 5 | 185205 | 8.9% |
| 6 | 179252 | 8.6% |
| 7 | 175204 | 8.5% |
| 9 | 175204 | 8.5% |
| 8 | 175204 | 8.5% |
| 0 | 175152 | 8.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2073172 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 286215 | 13.8% |
| 1 | 286215 | 13.8% |
| 3 | 250262 | 12.1% |
| 4 | 185259 | 8.9% |
| 5 | 185205 | 8.9% |
| 6 | 179252 | 8.6% |
| 7 | 175204 | 8.5% |
| 9 | 175204 | 8.5% |
| 8 | 175204 | 8.5% |
| 0 | 175152 | 8.4% |
category_id
Categorical
| Distinct | 461 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 754.2 KiB |

| Value | Count |
|---|---|
| 281 | 12817 |
| 375 | 10005 |
| 399 | 9049 |
| 412 | 8648 |
| 431 | 7759 |
| Other values (456) | 315769 |
Characters and Unicode
| Total characters | 1019790 |
|---|---|
| Distinct characters | 10 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 48 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 281 | 12817 | 3.5% |
| 375 | 10005 | 2.7% |
| 399 | 9049 | 2.5% |
| 412 | 8648 | 2.4% |
| 431 | 7759 | 2.1% |
| 428 | 7731 | 2.1% |
| 26 | 7343 | 2.0% |
| 7 | 6726 | 1.8% |
| 299 | 6634 | 1.8% |
| 301 | 6446 | 1.8% |
| Other values (451) | 280889 | 77.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 281 | 12817 | 3.5% |
| 375 | 10005 | 2.7% |
| 399 | 9049 | 2.5% |
| 412 | 8648 | 2.4% |
| 431 | 7759 | 2.1% |
| 428 | 7731 | 2.1% |
| 26 | 7343 | 2.0% |
| 7 | 6726 | 1.8% |
| 299 | 6634 | 1.8% |
| 301 | 6446 | 1.8% |
| Other values (451) | 280889 | 77.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 184074 | 18.1% |
| 2 | 165307 | 16.2% |
| 4 | 148758 | 14.6% |
| 1 | 116808 | 11.5% |
| 9 | 88587 | 8.7% |
| 8 | 81393 | 8.0% |
| 5 | 78927 | 7.7% |
| 7 | 55791 | 5.5% |
| 0 | 52391 | 5.1% |
| 6 | 47754 | 4.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 1019790 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 184074 | 18.1% |
| 2 | 165307 | 16.2% |
| 4 | 148758 | 14.6% |
| 1 | 116808 | 11.5% |
| 9 | 88587 | 8.7% |
| 8 | 81393 | 8.0% |
| 5 | 78927 | 7.7% |
| 7 | 55791 | 5.5% |
| 0 | 52391 | 5.1% |
| 6 | 47754 | 4.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 1019790 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 184074 | 18.1% |
| 2 | 165307 | 16.2% |
| 4 | 148758 | 14.6% |
| 1 | 116808 | 11.5% |
| 9 | 88587 | 8.7% |
| 8 | 81393 | 8.0% |
| 5 | 78927 | 7.7% |
| 7 | 55791 | 5.5% |
| 0 | 52391 | 5.1% |
| 6 | 47754 | 4.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1019790 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 184074 | 18.1% |
| 2 | 165307 | 16.2% |
| 4 | 148758 | 14.6% |
| 1 | 116808 | 11.5% |
| 9 | 88587 | 8.7% |
| 8 | 81393 | 8.0% |
| 5 | 78927 | 7.7% |
| 7 | 55791 | 5.5% |
| 0 | 52391 | 5.1% |
| 6 | 47754 | 4.7% |
created_at_ts
Date
| Distinct | 359552 |
|---|---|
| Distinct (%) | 98.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 MiB |
| Minimum | 2006-09-27 13:14:35 |
| Maximum | 2018-03-13 13:12:30 |
Histogram with fixed size bins (bins=50)
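For a sense of scale, the time span covered by created_at_ts and the width of each histogram bin follow directly from the reported minimum and maximum. This is a small sketch that assumes the 50 bins are of equal width and span the full range, which is how the caption reads but is not spelled out in the report.

```python
from datetime import datetime

# Minimum and maximum copied from the created_at_ts table above.
first = datetime(2006, 9, 27, 13, 14, 35)
last = datetime(2018, 3, 13, 13, 12, 30)

N_BINS = 50  # from the histogram caption

span = last - first        # total time covered by the column
bin_width = span / N_BINS  # assumes equal-width bins over the full range

print(f"span: {span.days} days, bin width: about {bin_width.days} days")
```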
publisher_id
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 355.7 KiB |

| Value | Count |
|---|---|
| 0 | 364047 |
Characters and Unicode
| Total characters | 364047 |
|---|---|
| Distinct characters | 1 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 364047 | 100.0% |
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 364047 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 364047 | 100.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 364047 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 364047 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 364047 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 364047 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 364047 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 364047 | 100.0% |
words_count
Real number (ℝ≥0)
| Distinct | 866 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 190.8977275 |
| Minimum | 0 |
| Maximum | 6690 |
| Zeros | 35 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 120 |
| Q1 | 159 |
| median | 186 |
| Q3 | 218 |
| 95-th percentile | 277 |
| Maximum | 6690 |
| Range | 6690 |
| Interquartile range (IQR) | 59 |
Descriptive statistics
| Standard deviation | 59.50276597 |
|---|---|
| Coefficient of variation (CV) | 0.3116997083 |
| Kurtosis | 607.7951834 |
| Mean | 190.8977275 |
| Median Absolute Deviation (MAD) | 29 |
| Skewness | 10.14486675 |
| Sum | 69495745 |
| Variance | 3540.579158 |
| Monotonicity | Not monotonic |
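Several of the figures in the two tables above are derived from one another. The sketch below recomputes the mean, coefficient of variation, variance, interquartile range and range from the other reported values, as a quick way to confirm the tables are internally consistent. The observation count of 364047 is taken from the dataset statistics. All inputs are the rounded values shown in the report, so agreement is only expected to the displayed precision.

```python
# Values copied from the words_count tables above.
n = 364_047
total = 69_495_745
std = 59.50276597
q1, q3 = 159, 218
minimum, maximum = 0, 6690

mean = total / n                 # reported: 190.8977275
cv = std / mean                  # reported: 0.3116997083
variance = std ** 2              # reported: 3540.579158
iqr = q3 - q1                    # reported: 59
value_range = maximum - minimum  # reported: 6690

print(f"mean={mean:.7f} cv={cv:.10f} variance={variance:.6f}")
print(f"iqr={iqr} range={value_range}")
```

The skewness of 10.1 and kurtosis of 607.8 cannot be rechecked this way, since they depend on the full distribution rather than on these summary values.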
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 176 | 3485 | 1.0% |
| 182 | 3480 | 1.0% |
| 179 | 3463 | 1.0% |
| 178 | 3458 | 0.9% |
| 174 | 3456 | 0.9% |
| 183 | 3432 | 0.9% |
| 184 | 3427 | 0.9% |
| 173 | 3414 | 0.9% |
| 180 | 3403 | 0.9% |
| 177 | 3391 | 0.9% |
| Other values (856) | 329638 | 90.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 35 | < 0.1% |
| 5 | 5 | < 0.1% |
| 6 | 4 | < 0.1% |
| 7 | 6 | < 0.1% |
| 8 | 20 | < 0.1% |
| 9 | 17 | < 0.1% |
| 10 | 49 | < 0.1% |
| 11 | 30 | < 0.1% |
| 12 | 61 | < 0.1% |
| 13 | 25 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6690 | 1 | < 0.1% |
| 3808 | 1 | < 0.1% |
| 3507 | 1 | < 0.1% |
| 3082 | 1 | < 0.1% |
| 2995 | 1 | < 0.1% |
| 2899 | 1 | < 0.1% |
| 2881 | 1 | < 0.1% |
| 2855 | 1 | < 0.1% |
| 2798 | 1 | < 0.1% |
| 2743 | 1 | < 0.1% |