Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 10407 |
Missing cells | 32503 |
Missing cells (%) | 28.4% |
Total size in memory | 6.5 MiB |
Average record size in memory | 655.2 B |
Variable types
Categorical | 11 |
---|
Alerts
0 has a high cardinality: 9695 distinct values | High cardinality |
1 has a high cardinality: 332 distinct values | High cardinality |
2 has a high cardinality: 339 distinct values | High cardinality |
3 has a high cardinality: 382 distinct values | High cardinality |
4 has a high cardinality: 392 distinct values | High cardinality |
5 has a high cardinality: 151 distinct values | High cardinality |
6 has a high cardinality: 155 distinct values | High cardinality |
7 has a high cardinality: 129 distinct values | High cardinality |
8 has a high cardinality: 131 distinct values | High cardinality |
9 has a high cardinality: 225 distinct values | High cardinality |
10 has a high cardinality: 228 distinct values | High cardinality |
1 has 2120 (20.4%) missing values | Missing |
2 has 787 (7.6%) missing values | Missing |
3 has 2100 (20.2%) missing values | Missing |
4 has 776 (7.5%) missing values | Missing |
5 has 4120 (39.6%) missing values | Missing |
6 has 2977 (28.6%) missing values | Missing |
7 has 5620 (54.0%) missing values | Missing |
8 has 4673 (44.9%) missing values | Missing |
9 has 5152 (49.5%) missing values | Missing |
10 has 4178 (40.1%) missing values | Missing |
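The missing-cell figures in the overview can be cross-checked by hand. The Python sketch below is only an illustration (the variable names are invented for this example and are not part of the report's configuration): it sums the per-variable missing counts from the alerts above and divides by the total number of cells.

```python
# Figures copied from the overview and the "Missing" alerts of this report.
n_variables = 11
n_observations = 10407

# Missing counts for variables 1..10; variable 0 has no missing values.
missing_per_variable = [2120, 787, 2100, 776, 4120, 2977, 5620, 4673, 5152, 4178]

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable)
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```

The sum comes to 32503 missing cells out of 114477, which rounds to the 28.4% shown in the overview.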
Reproduction
Analysis started | 2022-05-16 16:20:51.392335 |
---|---|
Analysis finished | 2022-05-16 16:20:52.078447 |
Duration | 0.69 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
Variable 0
Distinct | 9695 |
---|---|
Distinct (%) | 93.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 1.1 MiB |
Thanks! | 73 |
---|---|
Thank you! | 58 |
yes | 31 |
Thanks | 26 |
thanks! | 24 |
Other values (9690) |
Characters and Unicode
Total characters | 544786 |
---|---|
Distinct characters | 94 |
Distinct categories | 15 |
Distinct scripts | 2 |
Distinct blocks | 3 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
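As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. This is an assumption made for illustration: the report's own implementation may rely on a different library, and the standard library does not expose scripts or blocks, so only categories are covered. The `CATEGORY_NAMES` mapping and `category_counts` helper are hypothetical names introduced here.

```python
import unicodedata
from collections import Counter

# Readable names for a subset of general-category codes;
# any other code is reported as-is (e.g. "Cc" for control characters).
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
    "Sc": "Currency Symbol",
    "Pd": "Dash Punctuation",
}

def category_counts(texts):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# One of the sample rows shown in this report.
sample = ["Hello, I am looking to book a vacation from Gotham City to Mos Eisley for $2100."]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(name, count)
```

Applied to a whole column, the same tally would give a table shaped like "Most occurring categories".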
Unique
Unique | 9480 |
---|---|
Unique (%) | 91.1% |
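The report lists both a "Distinct" and a "Unique" count. Assuming the usual reading (distinct is the number of different values, unique is the number of values that occur exactly once), the difference can be sketched as follows. The helper name `distinct_and_unique` and the toy data are made up for this example.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

toy = ["Thanks!", "Thanks!", "yes", "no", "Thank you!"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)

# The percentages in the report are taken over all 10407 observations.
distinct_pct = round(100 * 9695 / 10407, 1)
unique_pct = round(100 * 9480 / 10407, 1)
print(distinct_pct, unique_pct)
```

In the toy list "Thanks!" appears twice, so it counts as distinct but not as unique.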
Sample
1st row | I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700. |
---|---|
2nd row | Yes, how about going to Neverland from Caprica on August 13, 2016 for 5 adults. For this trip, my budget would be 1900. |
3rd row | I have no flexibility for dates... but I can leave from Atlantis rather than Caprica. How about that? |
4th row | I suppose I'll speak with my husband to see if we can choose other dates, and then I'll come back to you.Thanks for your help |
5th row | Hello, I am looking to book a vacation from Gotham City to Mos Eisley for $2100. |
Common Values
Value | Count | Frequency (%) |
Thanks! | 73 | 0.7% |
Thank you! | 58 | 0.6% |
yes | 31 | 0.3% |
Thanks | 26 | 0.2% |
thanks! | 24 | 0.2% |
no | 23 | 0.2% |
thanks | 21 | 0.2% |
Thank you | 20 | 0.2% |
thank you! | 13 | 0.1% |
thank you | 13 | 0.1% |
Other values (9685) | 10105 |
Value | Count | Frequency (%) |
to | 4570 | 4.2% |
i | 4053 | 3.8% |
the | 3088 | 2.9% |
you | 2078 | 1.9% |
a | 1844 | 1.7% |
and | 1778 | 1.6% |
for | 1709 | 1.6% |
is | 1436 | 1.3% |
me | 1420 | 1.3% |
from | 1327 | 1.2% |
Other values (4396) | 84587 |
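The word table above appears to be built from lowercased tokens with surrounding punctuation removed; later sections, for instance, list `1` and `st` for the values `-1` and `St. Petersburg`. The sketch below shows one tokenisation that produces counts of this shape. It is an assumption for illustration, not necessarily the procedure pandas-profiling uses, and `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter

def word_counts(texts):
    """Lowercase, split on whitespace, strip punctuation at token edges, count."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens that were punctuation only
                counts[word] += 1
    return counts

counts = word_counts(["Thanks!", "thanks", "Thank you!", "-1", "St. Petersburg"])
for word, count in counts.most_common():
    print(word, count)
```

Under this scheme "Thanks!" and "thanks" collapse into one word, which is why the word table is much shorter than the table of raw values.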
Most occurring characters
Value | Count | Frequency (%) |
(space) | 97131 | 17.8% |
e | 46090 | 8.5% |
t | 39848 | 7.3% |
o | 37111 | 6.8% |
a | 34531 | 6.3% |
n | 25884 | 4.8% |
i | 23774 | 4.4% |
s | 21881 | 4.0% |
h | 19730 | 3.6% |
r | 18671 | 3.4% |
Other values (84) | 180135 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 392489 | |
Space Separator | 97131 | 17.8% |
Uppercase Letter | 27189 | 5.0% |
Other Punctuation | 17702 | 3.2% |
Decimal Number | 8509 | 1.6% |
Control | 535 | 0.1% |
Final Punctuation | 508 | 0.1% |
Currency Symbol | 250 | < 0.1% |
Dash Punctuation | 231 | < 0.1% |
Connector Punctuation | 174 | < 0.1% |
Other values (5) | 68 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 46090 | |
t | 39848 | 10.2% |
o | 37111 | 9.5% |
a | 34531 | 8.8% |
n | 25884 | 6.6% |
i | 23774 | 6.1% |
s | 21881 | 5.6% |
h | 19730 | 5.0% |
r | 18671 | 4.8% |
l | 18137 | 4.6% |
Other values (17) | 106832 |
Uppercase Letter
Value | Count | Frequency (%) |
I | 5179 | |
T | 2191 | 8.1% |
A | 1983 | 7.3% |
S | 1898 | 7.0% |
H | 1544 | 5.7% |
W | 1541 | 5.7% |
O | 1417 | 5.2% |
C | 1229 | 4.5% |
E | 1075 | 4.0% |
N | 1040 | 3.8% |
Other values (16) | 8092 |
Other Punctuation
Value | Count | Frequency (%) |
. | 6004 | |
? | 4028 | |
, | 2540 | |
! | 2481 | |
' | 2028 | 11.5% |
: | 456 | 2.6% |
… | 81 | 0.5% |
/ | 39 | 0.2% |
" | 15 | 0.1% |
; | 12 | 0.1% |
Other values (4) | 18 | 0.1% |
Decimal Number
Value | Count | Frequency (%) |
0 | 2137 | |
1 | 1434 | |
2 | 1275 | |
3 | 815 | 9.6% |
5 | 642 | 7.5% |
4 | 572 | 6.7% |
7 | 483 | 5.7% |
6 | 459 | 5.4% |
8 | 368 | 4.3% |
9 | 324 | 3.8% |
Math Symbol
Value | Count | Frequency (%) |
+ | 13 | |
~ | 3 | 15.8% |
| | 2 | 10.5% |
= | 1 | 5.3% |
Control
Value | Count | Frequency (%) |
533 | ||
2 | 0.4% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 230 | |
— | 1 | 0.4% |
Close Punctuation
Value | Count | Frequency (%) |
) | 24 | |
] | 1 | 4.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 97131 |
Final Punctuation
Value | Count | Frequency (%) |
’ | 508 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 250 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 174 |
Open Punctuation
Value | Count | Frequency (%) |
( | 21 |
Modifier Symbol
Value | Count | Frequency (%) |
` | 2 |
Initial Punctuation
Value | Count | Frequency (%) |
“ | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 419678 | |
Common | 125108 | 23.0% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 46090 | 11.0% |
t | 39848 | 9.5% |
o | 37111 | 8.8% |
a | 34531 | 8.2% |
n | 25884 | 6.2% |
i | 23774 | 5.7% |
s | 21881 | 5.2% |
h | 19730 | 4.7% |
r | 18671 | 4.4% |
l | 18137 | 4.3% |
Other values (43) | 134021 |
Common
Value | Count | Frequency (%) |
(space) | 97131 | 77.6% |
. | 6004 | 4.8% |
? | 4028 | 3.2% |
, | 2540 | 2.0% |
! | 2481 | 2.0% |
0 | 2137 | 1.7% |
' | 2028 | 1.6% |
1 | 1434 | 1.1% |
2 | 1275 | 1.0% |
3 | 815 | 0.7% |
Other values (31) | 5235 | 4.2% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 544193 | |
Punctuation | 591 | 0.1% |
None | 2 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 97131 | 17.8% |
e | 46090 | 8.5% |
t | 39848 | 7.3% |
o | 37111 | 6.8% |
a | 34531 | 6.3% |
n | 25884 | 4.8% |
i | 23774 | 4.4% |
s | 21881 | 4.0% |
h | 19730 | 3.6% |
r | 18671 | 3.4% |
Other values (79) | 179542 |
Punctuation
Value | Count | Frequency (%) |
’ | 508 | |
… | 81 | 13.7% |
— | 1 | 0.2% |
“ | 1 | 0.2% |
None
Value | Count | Frequency (%) |
é | 2 |
Variable 1
Distinct | 332 |
---|---|
Distinct (%) | 4.0% |
Missing | 2120 |
Missing (%) | 20.4% |
Memory size | 591.2 KiB |
-1 | 158 |
---|---|
Punta Cana | 140 |
Tijuana | 126 |
Toronto | 123 |
Calgary | 101 |
Other values (327) |
Characters and Unicode
Total characters | 65103 |
---|---|
Distinct characters | 58 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 28 |
---|---|
Unique (%) | 0.3% |
Sample
1st row | Caprica |
---|---|
2nd row | Caprica |
3rd row | Atlantis |
4th row | Gotham City |
5th row | Gotham City |
Common Values
Value | Count | Frequency (%) |
-1 | 158 | 1.5% |
Punta Cana | 140 | 1.3% |
Tijuana | 126 | 1.2% |
Toronto | 123 | 1.2% |
Calgary | 101 | 1.0% |
Busan | 100 | 1.0% |
St. Petersburg | 99 | 1.0% |
Fukuoka | 89 | 0.9% |
Nagoya | 85 | 0.8% |
Burlington | 85 | 0.8% |
Other values (322) | 7181 | |
(Missing) | 2120 | 20.4% |
Value | Count | Frequency (%) |
san | 296 | 2.9% |
punta | 178 | 1.8% |
cana | 178 | 1.8% |
1 | 158 | 1.6% |
st | 155 | 1.5% |
toronto | 151 | 1.5% |
tijuana | 144 | 1.4% |
vancouver | 117 | 1.2% |
petersburg | 113 | 1.1% |
city | 109 | 1.1% |
Other values (192) | 8518 |
Most occurring characters
Value | Count | Frequency (%) |
a | 7594 | 11.7% |
n | 5685 | 8.7% |
o | 5615 | 8.6% |
e | 4596 | 7.1% |
i | 4472 | 6.9% |
r | 3393 | 5.2% |
t | 3349 | 5.1% |
l | 2945 | 4.5% |
u | 2601 | 4.0% |
s | 2503 | 3.8% |
Other values (48) | 22350 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 53240 | |
Uppercase Letter | 9496 | 14.6% |
Space Separator | 1830 | 2.8% |
Other Punctuation | 221 | 0.3% |
Dash Punctuation | 158 | 0.2% |
Decimal Number | 158 | 0.2% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 7594 | |
n | 5685 | |
o | 5615 | |
e | 4596 | |
i | 4472 | |
r | 3393 | 6.4% |
t | 3349 | 6.3% |
l | 2945 | 5.5% |
u | 2601 | 4.9% |
s | 2503 | 4.7% |
Other values (16) | 10487 |
Uppercase Letter
Value | Count | Frequency (%) |
S | 1220 | |
B | 913 | 9.6% |
C | 889 | 9.4% |
P | 752 | 7.9% |
M | 727 | 7.7% |
T | 657 | 6.9% |
L | 568 | 6.0% |
A | 480 | 5.1% |
D | 375 | 3.9% |
R | 331 | 3.5% |
Other values (16) | 2584 |
Other Punctuation
Value | Count | Frequency (%) |
. | 155 | |
, | 60 | 27.1% |
' | 6 | 2.7% |
Space Separator
Value | Count | Frequency (%) |
(space) | 1830 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 158 |
Decimal Number
Value | Count | Frequency (%) |
1 | 158 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 62736 | |
Common | 2367 | 3.6% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 7594 | 12.1% |
n | 5685 | 9.1% |
o | 5615 | 9.0% |
e | 4596 | 7.3% |
i | 4472 | 7.1% |
r | 3393 | 5.4% |
t | 3349 | 5.3% |
l | 2945 | 4.7% |
u | 2601 | 4.1% |
s | 2503 | 4.0% |
Other values (42) | 19983 |
Common
Value | Count | Frequency (%) |
(space) | 1830 | 77.3% |
- | 158 | 6.7% |
1 | 158 | 6.7% |
. | 155 | 6.5% |
, | 60 | 2.5% |
' | 6 | 0.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 65103 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 7594 | 11.7% |
n | 5685 | 8.7% |
o | 5615 | 8.6% |
e | 4596 | 7.1% |
i | 4472 | 6.9% |
r | 3393 | 5.2% |
t | 3349 | 5.1% |
l | 2945 | 4.5% |
u | 2601 | 4.0% |
s | 2503 | 3.8% |
Other values (48) | 22350 |
Variable 2
Distinct | 339 |
---|---|
Distinct (%) | 3.5% |
Missing | 787 |
Missing (%) | 7.6% |
Memory size | 634.0 KiB |
-1 | 174 |
---|---|
Punta Cana | 160 |
Tijuana | 144 |
Toronto | 140 |
Calgary | 115 |
Other values (334) |
Characters and Unicode
Total characters | 75614 |
---|---|
Distinct characters | 58 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 29 |
---|---|
Unique (%) | 0.3% |
Sample
1st row | Caprica |
---|---|
2nd row | Caprica |
3rd row | Atlantis |
4th row | Atlantis |
5th row | Gotham City |
Common Values
Value | Count | Frequency (%) |
-1 | 174 | 1.7% |
Punta Cana | 160 | 1.5% |
Tijuana | 144 | 1.4% |
Toronto | 140 | 1.3% |
Calgary | 115 | 1.1% |
Busan | 113 | 1.1% |
St. Petersburg | 113 | 1.1% |
Fukuoka | 103 | 1.0% |
Burlington | 101 | 1.0% |
Beijing | 97 | 0.9% |
Other values (329) | 8360 | |
(Missing) | 787 | 7.6% |
Value | Count | Frequency (%) |
san | 346 | 2.9% |
cana | 203 | 1.7% |
punta | 203 | 1.7% |
st | 176 | 1.5% |
1 | 174 | 1.5% |
toronto | 174 | 1.5% |
tijuana | 164 | 1.4% |
vancouver | 134 | 1.1% |
petersburg | 128 | 1.1% |
calgary | 125 | 1.1% |
Other values (193) | 9912 |
Most occurring characters
Value | Count | Frequency (%) |
a | 8788 | 11.6% |
n | 6601 | 8.7% |
o | 6512 | 8.6% |
e | 5351 | 7.1% |
i | 5221 | 6.9% |
r | 3949 | 5.2% |
t | 3889 | 5.1% |
l | 3441 | 4.6% |
u | 3014 | 4.0% |
s | 2910 | 3.8% |
Other values (48) | 25938 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 61847 | |
Uppercase Letter | 11050 | 14.6% |
Space Separator | 2119 | 2.8% |
Other Punctuation | 250 | 0.3% |
Dash Punctuation | 174 | 0.2% |
Decimal Number | 174 | 0.2% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 8788 | |
n | 6601 | |
o | 6512 | |
e | 5351 | |
i | 5221 | |
r | 3949 | 6.4% |
t | 3889 | 6.3% |
l | 3441 | 5.6% |
u | 3014 | 4.9% |
s | 2910 | 4.7% |
Other values (16) | 12171 |
Uppercase Letter
Value | Count | Frequency (%) |
S | 1407 | |
B | 1071 | 9.7% |
C | 1023 | 9.3% |
P | 870 | 7.9% |
M | 842 | 7.6% |
T | 767 | 6.9% |
L | 657 | 5.9% |
A | 573 | 5.2% |
D | 432 | 3.9% |
R | 385 | 3.5% |
Other values (16) | 3023 |
Other Punctuation
Value | Count | Frequency (%) |
. | 176 | |
, | 67 | 26.8% |
' | 7 | 2.8% |
Space Separator
Value | Count | Frequency (%) |
(space) | 2119 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 174 |
Decimal Number
Value | Count | Frequency (%) |
1 | 174 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 72897 | |
Common | 2717 | 3.6% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 8788 | 12.1% |
n | 6601 | 9.1% |
o | 6512 | 8.9% |
e | 5351 | 7.3% |
i | 5221 | 7.2% |
r | 3949 | 5.4% |
t | 3889 | 5.3% |
l | 3441 | 4.7% |
u | 3014 | 4.1% |
s | 2910 | 4.0% |
Other values (42) | 23221 |
Common
Value | Count | Frequency (%) |
(space) | 2119 | 78.0% |
. | 176 | 6.5% |
- | 174 | 6.4% |
1 | 174 | 6.4% |
, | 67 | 2.5% |
' | 7 | 0.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 75614 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 8788 | 11.6% |
n | 6601 | 8.7% |
o | 6512 | 8.6% |
e | 5351 | 7.1% |
i | 5221 | 6.9% |
r | 3949 | 5.2% |
t | 3889 | 5.1% |
l | 3441 | 4.6% |
u | 3014 | 4.0% |
s | 2910 | 3.8% |
Other values (48) | 25938 |
Variable 3
Distinct | 382 |
---|---|
Distinct (%) | 4.6% |
Missing | 2100 |
Missing (%) | 20.2% |
Memory size | 590.2 KiB |
-1 | 257 |
---|---|
Punta Cana | 243 |
Rome | 173 |
Hamburg | 139 |
Kingston | 135 |
Other values (377) |
Characters and Unicode
Total characters | 63554 |
---|---|
Distinct characters | 57 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 60 |
---|---|
Unique (%) | 0.7% |
Sample
1st row | Atlantis |
---|---|
2nd row | Neverland |
3rd row | Atlantis |
4th row | Mos Eisley |
5th row | Neverland |
Common Values
Value | Count | Frequency (%) |
-1 | 257 | 2.5% |
Punta Cana | 243 | 2.3% |
Rome | 173 | 1.7% |
Hamburg | 139 | 1.3% |
Kingston | 135 | 1.3% |
Mexico City | 132 | 1.3% |
Kyoto | 130 | 1.2% |
Ulsan | 125 | 1.2% |
San Juan | 122 | 1.2% |
Denver | 118 | 1.1% |
Other values (372) | 6733 | |
(Missing) | 2100 | 20.2% |
Value | Count | Frequency (%) |
san | 408 | 4.0% |
cana | 275 | 2.7% |
punta | 266 | 2.6% |
1 | 257 | 2.5% |
rome | 206 | 2.0% |
mexico | 189 | 1.8% |
city | 180 | 1.7% |
kyoto | 163 | 1.6% |
kingston | 159 | 1.5% |
paris | 159 | 1.5% |
Other values (218) | 8045 |
Most occurring characters
Value | Count | Frequency (%) |
a | 7808 | 12.3% |
n | 5537 | 8.7% |
o | 5396 | 8.5% |
i | 4327 | 6.8% |
e | 4107 | 6.5% |
t | 3237 | 5.1% |
r | 3235 | 5.1% |
u | 2700 | 4.2% |
s | 2572 | 4.0% |
l | 2531 | 4.0% |
Other values (47) | 22104 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 51087 | |
Uppercase Letter | 9747 | 15.3% |
Space Separator | 2000 | 3.1% |
Dash Punctuation | 258 | 0.4% |
Decimal Number | 257 | 0.4% |
Other Punctuation | 205 | 0.3% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 7808 | |
n | 5537 | |
o | 5396 | |
i | 4327 | |
e | 4107 | |
t | 3237 | 6.3% |
r | 3235 | 6.3% |
u | 2700 | 5.3% |
s | 2572 | 5.0% |
l | 2531 | 5.0% |
Other values (16) | 9637 |
Uppercase Letter
Value | Count | Frequency (%) |
S | 1128 | |
C | 1034 | 10.6% |
P | 1034 | 10.6% |
M | 833 | 8.5% |
B | 723 | 7.4% |
A | 570 | 5.8% |
K | 497 | 5.1% |
L | 472 | 4.8% |
R | 385 | 3.9% |
D | 375 | 3.8% |
Other values (15) | 2696 |
Other Punctuation
Value | Count | Frequency (%) |
, | 110 | |
. | 91 | |
' | 4 | 2.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 2000 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 258 |
Decimal Number
Value | Count | Frequency (%) |
1 | 257 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 60834 | |
Common | 2720 | 4.3% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 7808 | |
n | 5537 | 9.1% |
o | 5396 | 8.9% |
i | 4327 | 7.1% |
e | 4107 | 6.8% |
t | 3237 | 5.3% |
r | 3235 | 5.3% |
u | 2700 | 4.4% |
s | 2572 | 4.2% |
l | 2531 | 4.2% |
Other values (41) | 19384 |
Common
Value | Count | Frequency (%) |
(space) | 2000 | 73.5% |
- | 258 | 9.5% |
1 | 257 | 9.4% |
, | 110 | 4.0% |
. | 91 | 3.3% |
' | 4 | 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 63554 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 7808 | 12.3% |
n | 5537 | 8.7% |
o | 5396 | 8.5% |
i | 4327 | 6.8% |
e | 4107 | 6.5% |
t | 3237 | 5.1% |
r | 3235 | 5.1% |
u | 2700 | 4.2% |
s | 2572 | 4.0% |
l | 2531 | 4.0% |
Other values (47) | 22104 |
Variable 4
Distinct | 392 |
---|---|
Distinct (%) | 4.1% |
Missing | 776 |
Missing (%) | 7.5% |
Memory size | 632.7 KiB |
Punta Cana | 283 |
---|---|
-1 | 279 |
Rome | 195 |
Hamburg | 161 |
Kingston | 155 |
Other values (387) |
Characters and Unicode
Total characters | 73918 |
---|---|
Distinct characters | 57 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 59 |
---|---|
Unique (%) | 0.6% |
Sample
1st row | Atlantis |
---|---|
2nd row | Neverland |
3rd row | Atlantis |
4th row | Atlantis |
5th row | Mos Eisley |
Common Values
Value | Count | Frequency (%) |
Punta Cana | 283 | 2.7% |
-1 | 279 | 2.7% |
Rome | 195 | 1.9% |
Hamburg | 161 | 1.5% |
Kingston | 155 | 1.5% |
Mexico City | 150 | 1.4% |
Kyoto | 145 | 1.4% |
Ulsan | 143 | 1.4% |
San Juan | 142 | 1.4% |
Denver | 136 | 1.3% |
Other values (382) | 7842 | |
(Missing) | 776 | 7.5% |
Value | Count | Frequency (%) |
san | 471 | 3.9% |
cana | 318 | 2.7% |
punta | 308 | 2.6% |
1 | 279 | 2.3% |
rome | 232 | 1.9% |
mexico | 217 | 1.8% |
city | 204 | 1.7% |
kingston | 184 | 1.5% |
paris | 182 | 1.5% |
kyoto | 182 | 1.5% |
Other values (220) | 9392 |
Most occurring characters
Value | Count | Frequency (%) |
a | 9067 | 12.3% |
n | 6448 | 8.7% |
o | 6267 | 8.5% |
i | 5011 | 6.8% |
e | 4806 | 6.5% |
t | 3777 | 5.1% |
r | 3773 | 5.1% |
u | 3145 | 4.3% |
s | 2989 | 4.0% |
l | 2954 | 4.0% |
Other values (47) | 25681 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 59417 | |
Uppercase Letter | 11358 | 15.4% |
Space Separator | 2338 | 3.2% |
Dash Punctuation | 280 | 0.4% |
Decimal Number | 279 | 0.4% |
Other Punctuation | 246 | 0.3% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 9067 | |
n | 6448 | |
o | 6267 | |
i | 5011 | |
e | 4806 | |
t | 3777 | 6.4% |
r | 3773 | 6.4% |
u | 3145 | 5.3% |
s | 2989 | 5.0% |
l | 2954 | 5.0% |
Other values (16) | 11180 |
Uppercase Letter
Value | Count | Frequency (%) |
S | 1320 | |
C | 1208 | 10.6% |
P | 1200 | 10.6% |
M | 978 | 8.6% |
B | 828 | 7.3% |
A | 674 | 5.9% |
K | 562 | 4.9% |
L | 559 | 4.9% |
R | 447 | 3.9% |
D | 437 | 3.8% |
Other values (15) | 3145 |
Other Punctuation
Value | Count | Frequency (%) |
, | 137 | |
. | 104 | |
' | 5 | 2.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 2338 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 280 |
Decimal Number
Value | Count | Frequency (%) |
1 | 279 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 70775 | |
Common | 3143 | 4.3% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 9067 | |
n | 6448 | 9.1% |
o | 6267 | 8.9% |
i | 5011 | 7.1% |
e | 4806 | 6.8% |
t | 3777 | 5.3% |
r | 3773 | 5.3% |
u | 3145 | 4.4% |
s | 2989 | 4.2% |
l | 2954 | 4.2% |
Other values (41) | 22538 |
Common
Value | Count | Frequency (%) |
(space) | 2338 | 74.4% |
- | 280 | 8.9% |
1 | 279 | 8.9% |
, | 137 | 4.4% |
. | 104 | 3.3% |
' | 5 | 0.2% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 73918 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 9067 | 12.3% |
n | 6448 | 8.7% |
o | 6267 | 8.5% |
i | 5011 | 6.8% |
e | 4806 | 6.5% |
t | 3777 | 5.1% |
r | 3773 | 5.1% |
u | 3145 | 4.3% |
s | 2989 | 4.0% |
l | 2954 | 4.0% |
Other values (47) | 25681 |
Variable 5
Distinct | 151 |
---|---|
Distinct (%) | 2.4% |
Missing | 4120 |
Missing (%) | 39.6% |
Memory size | 527.7 KiB |
-1 | |
---|---|
august 27 | |
august 30 | 284 |
september 8 | 231 |
september 6 | 224 |
Other values (146) |
Characters and Unicode
Total characters | 50050 |
---|---|
Distinct characters | 43 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 14 |
---|---|
Unique (%) | 0.2% |
Sample
1st row | august 13 |
---|---|
2nd row | august 13 |
3rd row | august 13 |
4th row | august 17 |
5th row | august 17 |
Common Values
Value | Count | Frequency (%) |
-1 | 567 | 5.4% |
august 27 | 495 | 4.8% |
august 30 | 284 | 2.7% |
september 8 | 231 | 2.2% |
september 6 | 224 | 2.2% |
september 2 | 200 | 1.9% |
august 17 | 187 | 1.8% |
september 12 | 181 | 1.7% |
august 15 | 178 | 1.7% |
august 25 | 175 | 1.7% |
Other values (141) | 3565 | |
(Missing) | 4120 |
Value | Count | Frequency (%) |
august | 2204 | |
september | 1741 | |
sept | 846 | 7.4% |
1 | 764 | 6.7% |
27 | 565 | 4.9% |
8 | 434 | 3.8% |
6 | 381 | 3.3% |
30 | 364 | 3.2% |
12 | 356 | 3.1% |
2 | 322 | 2.8% |
Other values (38) | 3488 |
Most occurring characters
Value | Count | Frequency (%) |
e | 6223 | |
(space) | 5178 | |
s | 4871 | |
t | 4856 | |
u | 4639 | 9.3% |
1 | 2909 | 5.8% |
p | 2641 | 5.3% |
a | 2461 | 4.9% |
g | 2405 | 4.8% |
2 | 2194 | 4.4% |
Other values (33) | 11673 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 33570 | |
Decimal Number | 9902 | 19.8% |
Space Separator | 5178 | 10.3% |
Uppercase Letter | 621 | 1.2% |
Dash Punctuation | 567 | 1.1% |
Currency Symbol | 212 | 0.4% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 6223 | |
s | 4871 | |
t | 4856 | |
u | 4639 | |
p | 2641 | |
a | 2461 | 7.3% |
g | 2405 | 7.2% |
r | 1785 | 5.3% |
m | 1756 | 5.2% |
b | 1741 | 5.2% |
Other values (7) | 192 | 0.6% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 146 | |
N | 146 | |
I | 146 | |
T | 71 | |
G | 36 | 5.8% |
L | 30 | 4.8% |
Y | 10 | 1.6% |
E | 10 | 1.6% |
W | 6 | 1.0% |
S | 5 | 0.8% |
Other values (3) | 15 | 2.4% |
Decimal Number
Value | Count | Frequency (%) |
1 | 2909 | |
2 | 2194 | |
7 | 1019 | 10.3% |
3 | 855 | 8.6% |
6 | 664 | 6.7% |
8 | 645 | 6.5% |
0 | 638 | 6.4% |
5 | 445 | 4.5% |
9 | 271 | 2.7% |
4 | 262 | 2.6% |
Space Separator
Value | Count | Frequency (%) |
(space) | 5178 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 567 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 212 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 34191 | |
Common | 15859 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 6223 | |
s | 4871 | |
t | 4856 | |
u | 4639 | |
p | 2641 | |
a | 2461 | 7.2% |
g | 2405 | 7.0% |
r | 1785 | 5.2% |
m | 1756 | 5.1% |
b | 1741 | 5.1% |
Other values (20) | 813 | 2.4% |
Common
Value | Count | Frequency (%) |
(space) | 5178 | |
1 | 2909 | |
2 | 2194 | |
7 | 1019 | 6.4% |
3 | 855 | 5.4% |
6 | 664 | 4.2% |
8 | 645 | 4.1% |
0 | 638 | 4.0% |
- | 567 | 3.6% |
5 | 445 | 2.8% |
Other values (3) | 745 | 4.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 50050 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 6223 | |
(space) | 5178 | |
s | 4871 | |
t | 4856 | |
u | 4639 | 9.3% |
1 | 2909 | 5.8% |
p | 2641 | 5.3% |
a | 2461 | 4.9% |
g | 2405 | 4.8% |
2 | 2194 | 4.4% |
Other values (33) | 11673 |
Variable 6
Distinct | 155 |
---|---|
Distinct (%) | 2.1% |
Missing | 2977 |
Missing (%) | 28.6% |
Memory size | 564.8 KiB |
-1 | |
---|---|
august 27 | |
august 30 | 327 |
september 8 | 273 |
september 6 | 268 |
Other values (150) |
Characters and Unicode
Total characters | 59436 |
---|---|
Distinct characters | 43 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 16 |
---|---|
Unique (%) | 0.2% |
Sample
1st row | august 13 |
---|---|
2nd row | august 13 |
3rd row | august 13 |
4th row | august 13 |
5th row | august 17 |
Common Values
Value | Count | Frequency (%) |
-1 | 655 | 6.3% |
august 27 | 573 | 5.5% |
august 30 | 327 | 3.1% |
september 8 | 273 | 2.6% |
september 6 | 268 | 2.6% |
september 2 | 249 | 2.4% |
august 17 | 216 | 2.1% |
september 12 | 209 | 2.0% |
august 25 | 202 | 1.9% |
august 15 | 196 | 1.9% |
Other values (145) | 4262 | |
(Missing) | 2977 |
Value | Count | Frequency (%) |
august | 2578 | |
september | 2112 | |
sept | 1008 | 7.4% |
1 | 897 | 6.6% |
27 | 652 | 4.8% |
8 | 502 | 3.7% |
6 | 455 | 3.4% |
30 | 417 | 3.1% |
12 | 409 | 3.0% |
2 | 395 | 2.9% |
Other values (39) | 4146 |
Most occurring characters
Value | Count | Frequency (%) |
e | 7523 | |
(space) | 6141 | |
s | 5796 | |
t | 5770 | |
u | 5416 | 9.1% |
1 | 3419 | 5.8% |
p | 3187 | 5.4% |
a | 2867 | 4.8% |
g | 2806 | 4.7% |
2 | 2610 | 4.4% |
Other values (33) | 13901 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 39984 | |
Decimal Number | 11696 | 19.7% |
Space Separator | 6141 | 10.3% |
Uppercase Letter | 716 | 1.2% |
Dash Punctuation | 655 | 1.1% |
Currency Symbol | 244 | 0.4% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 7523 | |
s | 5796 | |
t | 5770 | |
u | 5416 | |
p | 3187 | |
a | 2867 | 7.2% |
g | 2806 | 7.0% |
r | 2162 | 5.4% |
m | 2128 | 5.3% |
b | 2112 | 5.3% |
Other values (7) | 217 | 0.5% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 166 | |
N | 166 | |
I | 166 | |
T | 84 | |
G | 43 | 6.0% |
L | 35 | 4.9% |
Y | 12 | 1.7% |
E | 12 | 1.7% |
W | 7 | 1.0% |
A | 7 | 1.0% |
Other values (3) | 18 | 2.5% |
Decimal Number
Value | Count | Frequency (%) |
1 | 3419 | |
2 | 2610 | |
7 | 1184 | 10.1% |
3 | 1011 | 8.6% |
6 | 788 | 6.7% |
8 | 759 | 6.5% |
0 | 751 | 6.4% |
5 | 506 | 4.3% |
4 | 338 | 2.9% |
9 | 330 | 2.8% |
Space Separator
Value | Count | Frequency (%) |
(space) | 6141 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 655 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 244 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 40700 | |
Common | 18736 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 7523 | |
s | 5796 | |
t | 5770 | |
u | 5416 | |
p | 3187 | |
a | 2867 | 7.0% |
g | 2806 | 6.9% |
r | 2162 | 5.3% |
m | 2128 | 5.2% |
b | 2112 | 5.2% |
Other values (20) | 933 | 2.3% |
Common
Value | Count | Frequency (%) |
(space) | 6141 | |
1 | 3419 | |
2 | 2610 | |
7 | 1184 | 6.3% |
3 | 1011 | 5.4% |
6 | 788 | 4.2% |
8 | 759 | 4.1% |
0 | 751 | 4.0% |
- | 655 | 3.5% |
5 | 506 | 2.7% |
Other values (3) | 912 | 4.9% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 59436 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 7523 | |
(space) | 6141 | |
s | 5796 | |
t | 5770 | |
u | 5416 | 9.1% |
1 | 3419 | 5.8% |
p | 3187 | 5.4% |
a | 2867 | 4.8% |
g | 2806 | 4.7% |
2 | 2610 | 4.4% |
Other values (33) | 13901 |
Variable 7
Distinct | 129 |
---|---|
Distinct (%) | 2.7% |
Missing | 5620 |
Missing (%) | 54.0% |
Memory size | 472.1 KiB |
-1 | 344 |
---|---|
september 5 | 118 |
17 | 117 |
21 | 114 |
september 7 | 113 |
Other values (124) |
Characters and Unicode
Total characters | 30561 |
---|---|
Distinct characters | 33 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 4 |
---|---|
Unique (%) | 0.1% |
Sample
1st row | august 22 |
---|---|
2nd row | august 22 |
3rd row | august 24 |
4th row | august 24 |
5th row | august 24 |
Common Values
Value | Count | Frequency (%) |
-1 | 344 | 3.3% |
september 5 | 118 | 1.1% |
17 | 117 | 1.1% |
21 | 114 | 1.1% |
september 7 | 113 | 1.1% |
19 | 109 | 1.0% |
september 11 | 104 | 1.0% |
september 3 | 102 | 1.0% |
august 31 | 101 | 1.0% |
september 14 | 99 | 1.0% |
Other values (119) | 3466 | |
(Missing) | 5620 |
Value | Count | Frequency (%) |
september | 1708 | |
1 | 447 | 6.1% |
august | 391 | 5.3% |
sept | 357 | 4.8% |
17 | 199 | 2.7% |
19 | 186 | 2.5% |
11 | 178 | 2.4% |
15 | 178 | 2.4% |
7 | 174 | 2.4% |
28 | 170 | 2.3% |
Other values (31) | 3388 |
Most occurring characters
Value | Count | Frequency (%) |
e | 5593 | |
(space) | 2589 | |
s | 2571 | |
1 | 2479 | |
t | 2474 | |
p | 2174 | 7.1% |
2 | 1787 | 5.8% |
r | 1717 | 5.6% |
b | 1711 | 5.6% |
m | 1708 | 5.6% |
Other values (23) | 5758 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 19599 | |
Decimal Number | 8011 | |
Space Separator | 2589 | 8.5% |
Dash Punctuation | 344 | 1.1% |
Uppercase Letter | 15 | < 0.1% |
Currency Symbol | 3 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 5593 | |
s | 2571 | |
t | 2474 | |
p | 2174 | 11.1% |
r | 1717 | 8.8% |
b | 1711 | 8.7% |
m | 1708 | 8.7% |
u | 800 | 4.1% |
a | 421 | 2.1% |
g | 403 | 2.1% |
Other values (6) | 27 | 0.1% |
Decimal Number
Value | Count | Frequency (%) |
1 | 2479 | |
2 | 1787 | |
3 | 696 | 8.7% |
7 | 518 | 6.5% |
5 | 456 | 5.7% |
8 | 442 | 5.5% |
0 | 433 | 5.4% |
9 | 423 | 5.3% |
4 | 404 | 5.0% |
6 | 373 | 4.7% |
Uppercase Letter
Value | Count | Frequency (%) |
S | 6 | |
O | 3 | |
L | 3 | |
T | 3 |
Space Separator
Value | Count | Frequency (%) |
(space) | 2589 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 344 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 3 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 19614 | |
Common | 10947 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 5593 | |
s | 2571 | |
t | 2474 | |
p | 2174 | 11.1% |
r | 1717 | 8.8% |
b | 1711 | 8.7% |
m | 1708 | 8.7% |
u | 800 | 4.1% |
a | 421 | 2.1% |
g | 403 | 2.1% |
Other values (10) | 42 | 0.2% |
Common
Value | Count | Frequency (%) |
(space) | 2589 | |
1 | 2479 | |
2 | 1787 | |
3 | 696 | 6.4% |
7 | 518 | 4.7% |
5 | 456 | 4.2% |
8 | 442 | 4.0% |
0 | 433 | 4.0% |
9 | 423 | 3.9% |
4 | 404 | 3.7% |
Other values (3) | 720 | 6.6% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 30561 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 5593 | |
(space) | 2589 | |
s | 2571 | |
1 | 2479 | |
t | 2474 | |
p | 2174 | 7.1% |
2 | 1787 | 5.8% |
r | 1717 | 5.6% |
b | 1711 | 5.6% |
m | 1708 | 5.6% |
Other values (23) | 5758 |
Variable 8
Distinct | 131 |
---|---|
Distinct (%) | 2.3% |
Missing | 4673 |
Missing (%) | 44.9% |
Memory size | 500.9 KiB |
-1 | 404 |
---|---|
september 5 | 142 |
17 | 141 |
21 | 133 |
september 7 | 133 |
Other values (126) |
Characters and Unicode
Total characters | 36451 |
---|---|
Distinct characters | 33 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 3 |
---|---|
Unique (%) | 0.1% |
Sample
1st row | august 22 |
---|---|
2nd row | august 22 |
3rd row | august 22 |
4th row | august 24 |
5th row | august 24 |
Common Values
Value | Count | Frequency (%) |
-1 | 404 | 3.9% |
september 5 | 142 | 1.4% |
17 | 141 | 1.4% |
21 | 133 | 1.3% |
september 7 | 133 | 1.3% |
19 | 129 | 1.2% |
september 3 | 121 | 1.2% |
august 31 | 120 | 1.2% |
september 14 | 119 | 1.1% |
september 11 | 118 | 1.1% |
Other values (121) | 4174 | |
(Missing) | 4673 |
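The percentages in the Common Values table appear to be taken relative to all 10407 observations, missing cells included (for example, 404 / 10407 is about 3.9%). The sketch below checks that reading against the counts listed above; `frequency_pct` is a hypothetical helper introduced only for this check.

```python
def frequency_pct(count, total_rows):
    """Share of all rows (missing included), as a percentage rounded to one decimal."""
    return round(100 * count / total_rows, 1)


TOTAL_ROWS = 10407  # "Number of observations" from the dataset statistics

top_ten = [404, 142, 141, 133, 133, 129, 121, 120, 119, 118]
other_values = 4174
missing = 4673

# The three groups together should account for every row in the dataset.
rows_accounted_for = sum(top_ten) + other_values + missing

other_pct = frequency_pct(other_values, TOTAL_ROWS)
missing_pct = frequency_pct(missing, TOTAL_ROWS)

print(rows_accounted_for, other_pct, missing_pct)
```

Under this assumption the counts sum to exactly 10407, and the missing share comes out at 44.9%, which agrees with the "Missing (%)" figure reported for this variable.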
Value | Count | Frequency (%) |
september | 2021 | |
1 | 529 | 6.0% |
august | 483 | 5.5% |
sept | 432 | 4.9% |
17 | 235 | 2.7% |
19 | 216 | 2.4% |
11 | 213 | 2.4% |
7 | 211 | 2.4% |
15 | 211 | 2.4% |
28 | 203 | 2.3% |
Other values (31) | 4067 |
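The table above counts individual words rather than whole cell values, which is why "september" and "sept" appear on their own and why "-1" seems to show up as "1". A minimal sketch of one way to get such counts follows, assuming values are lowercased, split on whitespace, and stripped of surrounding punctuation. The function name `word_counts` and the sample values are hypothetical, and the profiler's actual tokenisation may differ.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase, punctuation-stripped words across an iterable of strings."""
    strip_chars = string.punctuation + string.whitespace
    counts = Counter()
    for value in values:
        if value is None:
            continue  # missing cells contribute no words
        for token in str(value).lower().split():
            word = token.strip(strip_chars)
            if word:
                counts[word] += 1
    return counts


# Hypothetical sample; "-1" loses its leading dash and is counted as "1".
sample = ["-1", "september 5", "Sept 7", "17", None]
counts = word_counts(sample)

for word, count in counts.most_common():
    print(word, count)
```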
Most occurring characters
Value | Count | Frequency (%) |
e | 6620 | |
(space) | 3087 | |
s | 3064 | |
1 | 2963 | |
t | 2958 | |
p | 2574 | 7.1% |
2 | 2150 | 5.9% |
r | 2032 | 5.6% |
b | 2025 | 5.6% |
m | 2021 | 5.5% |
Other values (23) | 6957 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 23332 | |
Decimal Number | 9605 | |
Space Separator | 3087 | 8.5% |
Dash Punctuation | 404 | 1.1% |
Uppercase Letter | 19 | 0.1% |
Currency Symbol | 4 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 6620 | |
s | 3064 | |
t | 2958 | |
p | 2574 | 11.0% |
r | 2032 | 8.7% |
b | 2025 | 8.7% |
m | 2021 | 8.7% |
u | 988 | 4.2% |
a | 519 | 2.2% |
g | 498 | 2.1% |
Other values (6) | 33 | 0.1% |
Decimal Number
Value | Count | Frequency (%) |
1 | 2963 | |
2 | 2150 | |
3 | 846 | 8.8% |
7 | 627 | 6.5% |
5 | 544 | 5.7% |
8 | 529 | 5.5% |
0 | 502 | 5.2% |
4 | 500 | 5.2% |
9 | 496 | 5.2% |
6 | 448 | 4.7% |
Uppercase Letter
Value | Count | Frequency (%) |
S | 7 | |
O | 4 | |
L | 4 | |
T | 4 |
Space Separator
Value | Count | Frequency (%) |
(space) | 3087 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 404 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 4 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 23351 | |
Common | 13100 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 6620 | |
s | 3064 | |
t | 2958 | |
p | 2574 | 11.0% |
r | 2032 | 8.7% |
b | 2025 | 8.7% |
m | 2021 | 8.7% |
u | 988 | 4.2% |
a | 519 | 2.2% |
g | 498 | 2.1% |
Other values (10) | 52 | 0.2% |
Common
Value | Count | Frequency (%) |
(space) | 3087 | |
1 | 2963 | |
2 | 2150 | |
3 | 846 | 6.5% |
7 | 627 | 4.8% |
5 | 544 | 4.2% |
8 | 529 | 4.0% |
0 | 502 | 3.8% |
4 | 500 | 3.8% |
9 | 496 | 3.8% |
Other values (3) | 856 | 6.5% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 36451 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 6620 | |
(space) | 3087 | |
s | 3064 | |
1 | 2963 | |
t | 2958 | |
p | 2574 | 7.1% |
2 | 2150 | 5.9% |
r | 2032 | 5.6% |
b | 2025 | 5.6% |
m | 2021 | 5.5% |
Other values (23) | 6957 |
Distinct | 225 |
---|---|
Distinct (%) | 4.3% |
Missing | 5152 |
Missing (%) | 49.5% |
Memory size | 479.1 KiB |
-1 | |
---|---|
3300.0 | 111 |
4000.0 | 107 |
3200.0 | 104 |
3500.0 | 93 |
Other values (220) |
Characters and Unicode
Total characters | 26085 |
---|---|
Distinct characters | 38 |
Distinct categories | 7 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 30 |
---|---|
Unique (%) | 0.6% |
Sample
1st row | 1700.0 |
---|---|
2nd row | 1900.0 |
3rd row | 1700.0 |
4th row | 2100.0 |
5th row | 2100.0 |
Common Values
Value | Count | Frequency (%) |
-1 | 1469 | 14.1% |
3300.0 | 111 | 1.1% |
4000.0 | 107 | 1.0% |
3200.0 | 104 | 1.0% |
3500.0 | 93 | 0.9% |
400.0 | 92 | 0.9% |
3100.0 | 89 | 0.9% |
2900.0 | 84 | 0.8% |
1900.0 | 83 | 0.8% |
4300.0 | 72 | 0.7% |
Other values (215) | 2951 | |
(Missing) | 5152 |
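This variable is typed as categorical even though most of its values look numeric ("3300.0", "4000.0"), alongside a frequent "-1" and a small number of entries containing letters or "$". The sketch below shows one possible way to coerce such a column to numbers before further analysis. Treating -1 as a placeholder for "no value" is an assumption, not something the report states, and `to_number` and the sample strings are hypothetical.

```python
def to_number(raw, sentinel=-1.0):
    """Parse a cell to float; return None for missing, non-numeric, or sentinel values."""
    if raw is None:
        return None
    # Drop a currency symbol and thousands separators before parsing.
    text = str(raw).strip().replace("$", "").replace(",", "")
    try:
        value = float(text)
    except ValueError:
        return None  # free-text entries cannot be parsed
    # Assumption: the sentinel (-1 by default) marks an absent value.
    return None if value == sentinel else value


# Hypothetical sample shaped like the values listed above.
sample = ["3300.0", "-1", "$400", "n/a", None]
parsed = [to_number(value) for value in sample]
print(parsed)
```

If -1 turns out to be a meaningful value rather than a placeholder, passing `sentinel=None` keeps it as -1.0.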
Value | Count | Frequency (%) |
1 | 1469 | |
3300.0 | 112 | 2.1% |
4000.0 | 107 | 2.0% |
3200.0 | 104 | 1.9% |
3100.0 | 96 | 1.8% |
3500.0 | 93 | 1.7% |
400.0 | 92 | 1.7% |
2900.0 | 84 | 1.6% |
1900.0 | 83 | 1.5% |
2000.0 | 73 | 1.4% |
Other values (215) | 3047 |
Most occurring characters
Value | Count | Frequency (%) |
0 | 11049 | |
. | 3671 | 14.1% |
1 | 2634 | 10.1% |
- | 1469 | 5.6% |
3 | 1364 | 5.2% |
2 | 1209 | 4.6% |
4 | 1112 | 4.3% |
5 | 671 | 2.6% |
6 | 586 | 2.2% |
9 | 484 | 1.9% |
Other values (28) | 1836 | 7.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 20012 | |
Other Punctuation | 3671 | 14.1% |
Dash Punctuation | 1469 | 5.6% |
Uppercase Letter | 455 | 1.7% |
Lowercase Letter | 192 | 0.7% |
Currency Symbol | 181 | 0.7% |
Space Separator | 105 | 0.4% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 36 | |
s | 22 | |
n | 17 | |
v | 16 | |
i | 15 | |
a | 15 | |
l | 15 | |
p | 8 | 4.2% |
m | 8 | 4.2% |
y | 8 | 4.2% |
Other values (6) | 32 |
Decimal Number
Value | Count | Frequency (%) |
0 | 11049 | |
1 | 2634 | 13.2% |
3 | 1364 | 6.8% |
2 | 1209 | 6.0% |
4 | 1112 | 5.6% |
5 | 671 | 3.4% |
6 | 586 | 2.9% |
9 | 484 | 2.4% |
7 | 465 | 2.3% |
8 | 438 | 2.2% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 93 | |
T | 88 | |
N | 71 | |
I | 71 | |
L | 65 | |
G | 23 | 5.1% |
A | 22 | 4.8% |
X | 22 | 4.8% |
Other Punctuation
Value | Count | Frequency (%) |
. | 3671 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 1469 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 181 |
Space Separator
Value | Count | Frequency (%) |
(space) | 105 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 25438 | |
Latin | 647 | 2.5% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
M | 93 | |
T | 88 | |
N | 71 | |
I | 71 | |
L | 65 | |
e | 36 | 5.6% |
G | 23 | 3.6% |
A | 22 | 3.4% |
s | 22 | 3.4% |
X | 22 | 3.4% |
Other values (14) | 134 |
Common
Value | Count | Frequency (%) |
0 | 11049 | |
. | 3671 | 14.4% |
1 | 2634 | 10.4% |
- | 1469 | 5.8% |
3 | 1364 | 5.4% |
2 | 1209 | 4.8% |
4 | 1112 | 4.4% |
5 | 671 | 2.6% |
6 | 586 | 2.3% |
9 | 484 | 1.9% |
Other values (4) | 1189 | 4.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 26085 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 11049 | |
. | 3671 | 14.1% |
1 | 2634 | 10.1% |
- | 1469 | 5.6% |
3 | 1364 | 5.2% |
2 | 1209 | 4.6% |
4 | 1112 | 4.3% |
5 | 671 | 2.6% |
6 | 586 | 2.2% |
9 | 484 | 1.9% |
Other values (28) | 1836 | 7.0% |
Distinct | 228 |
---|---|
Distinct (%) | 3.7% |
Missing | 4178 |
Missing (%) | 40.1% |
Memory size | 507.8 KiB |
-1 | |
---|---|
3300.0 | 136 |
3200.0 | 124 |
4000.0 | 123 |
400.0 | 115 |
Other values (223) |
Characters and Unicode
Total characters | 31074 |
---|---|
Distinct characters | 38 |
Distinct categories | 7 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 24 |
---|---|
Unique (%) | 0.4% |
Sample
1st row | 1700.0 |
---|---|
2nd row | 1900.0 |
3rd row | 1700.0 |
4th row | 1700.0 |
5th row | 2100.0 |
Common Values
Value | Count | Frequency (%) |
-1 | 1704 | |
3300.0 | 136 | 1.3% |
3200.0 | 124 | 1.2% |
4000.0 | 123 | 1.2% |
400.0 | 115 | 1.1% |
3500.0 | 114 | 1.1% |
3100.0 | 107 | 1.0% |
2900.0 | 106 | 1.0% |
1900.0 | 95 | 0.9% |
4300.0 | 85 | 0.8% |
Other values (218) | 3520 | |
(Missing) | 4178 |
Value | Count | Frequency (%) |
1 | 1704 | |
3300.0 | 138 | 2.2% |
3200.0 | 124 | 2.0% |
4000.0 | 123 | 1.9% |
400.0 | 115 | 1.8% |
3100.0 | 115 | 1.8% |
3500.0 | 114 | 1.8% |
2900.0 | 106 | 1.7% |
1900.0 | 95 | 1.5% |
2000.0 | 90 | 1.4% |
Other values (218) | 3631 |
Most occurring characters
Value | Count | Frequency (%) |
0 | 13218 | |
. | 4391 | 14.1% |
1 | 3084 | 9.9% |
- | 1704 | 5.5% |
3 | 1629 | 5.2% |
2 | 1466 | 4.7% |
4 | 1319 | 4.2% |
5 | 801 | 2.6% |
6 | 696 | 2.2% |
9 | 580 | 1.9% |
Other values (28) | 2186 | 7.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 23879 | |
Other Punctuation | 4391 | 14.1% |
Dash Punctuation | 1704 | 5.5% |
Uppercase Letter | 541 | 1.7% |
Lowercase Letter | 217 | 0.7% |
Currency Symbol | 216 | 0.7% |
Space Separator | 126 | 0.4% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 41 | |
s | 25 | |
n | 19 | |
v | 18 | |
i | 17 | |
a | 17 | |
l | 17 | |
p | 9 | 4.1% |
m | 9 | 4.1% |
y | 9 | 4.1% |
Other values (6) | 36 |
Decimal Number
Value | Count | Frequency (%) |
0 | 13218 | |
1 | 3084 | 12.9% |
3 | 1629 | 6.8% |
2 | 1466 | 6.1% |
4 | 1319 | 5.5% |
5 | 801 | 3.4% |
6 | 696 | 2.9% |
9 | 580 | 2.4% |
7 | 558 | 2.3% |
8 | 528 | 2.2% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 109 | |
T | 107 | |
N | 83 | |
I | 83 | |
L | 78 | |
G | 29 | 5.4% |
A | 26 | 4.8% |
X | 26 | 4.8% |
Other Punctuation
Value | Count | Frequency (%) |
. | 4391 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 1704 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 216 |
Space Separator
Value | Count | Frequency (%) |
(space) | 126 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 30316 | |
Latin | 758 | 2.4% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
M | 109 | |
T | 107 | |
N | 83 | |
I | 83 | |
L | 78 | |
e | 41 | 5.4% |
G | 29 | 3.8% |
A | 26 | 3.4% |
X | 26 | 3.4% |
s | 25 | 3.3% |
Other values (14) | 151 |
Common
Value | Count | Frequency (%) |
0 | 13218 | |
. | 4391 | 14.5% |
1 | 3084 | 10.2% |
- | 1704 | 5.6% |
3 | 1629 | 5.4% |
2 | 1466 | 4.8% |
4 | 1319 | 4.4% |
5 | 801 | 2.6% |
6 | 696 | 2.3% |
9 | 580 | 1.9% |
Other values (4) | 1428 | 4.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 31074 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 13218 | |
. | 4391 | 14.1% |
1 | 3084 | 9.9% |
- | 1704 | 5.5% |
3 | 1629 | 5.2% |
2 | 1466 | 4.7% |
4 | 1319 | 4.2% |
5 | 801 | 2.6% |
6 | 696 | 2.2% |
9 | 580 | 1.9% |
Other values (28) | 2186 | 7.0% |