Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 10407 |
| Missing cells | 32503 |
| Missing cells (%) | 28.4% |
| Total size in memory | 6.5 MiB |
| Average record size in memory | 655.2 B |
Variable types
| Categorical | 11 |
|---|---|
Alerts
| 0 has a high cardinality: 9695 distinct values | High cardinality |
|---|---|
| 1 has a high cardinality: 332 distinct values | High cardinality |
| 2 has a high cardinality: 339 distinct values | High cardinality |
| 3 has a high cardinality: 382 distinct values | High cardinality |
| 4 has a high cardinality: 392 distinct values | High cardinality |
| 5 has a high cardinality: 151 distinct values | High cardinality |
| 6 has a high cardinality: 155 distinct values | High cardinality |
| 7 has a high cardinality: 129 distinct values | High cardinality |
| 8 has a high cardinality: 131 distinct values | High cardinality |
| 9 has a high cardinality: 225 distinct values | High cardinality |
| 10 has a high cardinality: 228 distinct values | High cardinality |
| 1 has 2120 (20.4%) missing values | Missing |
| 2 has 787 (7.6%) missing values | Missing |
| 3 has 2100 (20.2%) missing values | Missing |
| 4 has 776 (7.5%) missing values | Missing |
| 5 has 4120 (39.6%) missing values | Missing |
| 6 has 2977 (28.6%) missing values | Missing |
| 7 has 5620 (54.0%) missing values | Missing |
| 8 has 4673 (44.9%) missing values | Missing |
| 9 has 5152 (49.5%) missing values | Missing |
| 10 has 4178 (40.1%) missing values | Missing |
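The per-variable missing counts listed above can be cross-checked against the overall "Missing cells" figures in the dataset statistics. The following is a minimal sketch in plain Python that uses only numbers reported in this document; the names `missing_per_variable`, `n_variables`, and `n_observations` are illustrative and are not part of the report or of pandas-profiling.

```python
# Missing-value counts per variable, copied from the "Missing" alerts.
# Variable 0 has no alert because its own summary reports 0 missing values.
missing_per_variable = {
    0: 0, 1: 2120, 2: 787, 3: 2100, 4: 776, 5: 4120,
    6: 2977, 7: 5620, 8: 4673, 9: 5152, 10: 4178,
}
n_variables = 11        # "Number of variables"
n_observations = 10407  # "Number of observations"

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable.values())
missing_share = missing_cells / total_cells

print(missing_cells)           # 32503
print(f"{missing_share:.1%}")  # 28.4%
```

Both results agree with the "Missing cells" and "Missing cells (%)" rows of the dataset statistics.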
Reproduction
| Analysis started | 2022-05-16 16:20:51.392335 |
|---|---|
| Analysis finished | 2022-05-16 16:20:52.078447 |
| Duration | 0.69 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
| Distinct | 9695 |
|---|---|
| Distinct (%) | 93.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 MiB |
| Thanks! | 73 |
|---|---|
| Thank you! | 58 |
| yes | 31 |
| Thanks | 26 |
| thanks! | 24 |
| Other values (9690) |
Characters and Unicode
| Total characters | 544786 |
|---|---|
| Distinct characters | 94 |
| Distinct categories | 15 |
| Distinct scripts | 2 |
| Distinct blocks | 3 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
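As an illustration of the character properties mentioned above, the sketch below counts the characters of a string by Unicode general category using Python's standard `unicodedata` module. It is not the code that produced this report: the helper name `category_counts` is hypothetical, and the standard library returns two-letter category codes (for example `Ll` for Lowercase Letter, `Zs` for Space Separator) rather than the long names shown in the tables. The standard library does not expose script or block lookups, so only categories are covered here.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# "Thank you!" is one of the common values of variable 0.
counts = category_counts("Thank you!")

print(counts["Lu"])  # uppercase letters: 1 ("T")
print(counts["Ll"])  # lowercase letters: 7
print(counts["Zs"])  # space separators: 1
print(counts["Po"])  # other punctuation: 1 ("!")
```

Applying the same function to every value of a text column and summing the counters would give a table of the same shape as "Most occurring categories".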
Unique
| Unique | 9480 |
|---|---|
| Unique (%) | 91.1% |
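The report gives both a "Distinct" and a "Unique" figure. A reading that is consistent with the numbers here is that "Distinct" counts different values while "Unique" counts values that occur exactly once; this is an assumption about the tool's definitions, not something this document states. The sketch below shows the difference on a small made-up list and then recomputes the two percentages for variable 0 from the reported counts. The function name `distinct_and_unique` is illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique, n) over the non-missing entries of `values`.

    distinct: number of different values
    unique:   number of values that occur exactly once
    n:        number of non-missing entries
    """
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique, n


# Toy data (not from the dataset): "Thanks!" repeats, None is missing.
toy = ["Thanks!", "Thanks!", "yes", "no", None]
print(distinct_and_unique(toy))  # (3, 2, 4)

# Percentages for variable 0, which has 10407 non-missing observations.
print(f"{9695 / 10407:.1%}")  # 93.2%
print(f"{9480 / 10407:.1%}")  # 91.1%
```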
Sample
| 1st row | I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700. |
|---|---|
| 2nd row | Yes, how about going to Neverland from Caprica on August 13, 2016 for 5 adults. For this trip, my budget would be 1900. |
| 3rd row | I have no flexibility for dates... but I can leave from Atlantis rather than Caprica. How about that? |
| 4th row | I suppose I'll speak with my husband to see if we can choose other dates, and then I'll come back to you.Thanks for your help |
| 5th row | Hello, I am looking to book a vacation from Gotham City to Mos Eisley for $2100. |
Common Values
| Value | Count | Frequency (%) |
| Thanks! | 73 | 0.7% |
| Thank you! | 58 | 0.6% |
| yes | 31 | 0.3% |
| Thanks | 26 | 0.2% |
| thanks! | 24 | 0.2% |
| no | 23 | 0.2% |
| thanks | 21 | 0.2% |
| Thank you | 20 | 0.2% |
| thank you! | 13 | 0.1% |
| thank you | 13 | 0.1% |
| Other values (9685) | 10105 | 97.1% |
| Value | Count | Frequency (%) |
| to | 4570 | 4.2% |
| i | 4053 | 3.8% |
| the | 3088 | 2.9% |
| you | 2078 | 1.9% |
| a | 1844 | 1.7% |
| and | 1778 | 1.6% |
| for | 1709 | 1.6% |
| is | 1436 | 1.3% |
| me | 1420 | 1.3% |
| from | 1327 | 1.2% |
| Other values (4396) | 84587 | 78.4% |
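The second table above appears to list the most frequent words across all values, in lower case. The report does not say how the text was tokenized, so the following is only a sketch under an assumed rule: lowercase the text and keep runs of ASCII letters and digits. The function name `word_frequencies` and the sample strings are illustrative.

```python
import re
from collections import Counter


def word_frequencies(values):
    """Count lowercase word tokens across an iterable of strings.

    Assumed tokenization: lowercase, then runs of [a-z0-9].
    """
    counts = Counter()
    for value in values:
        counts.update(re.findall(r"[a-z0-9]+", value.lower()))
    return counts


freqs = word_frequencies(["Thanks! Thank you!", "thanks"])
total = sum(freqs.values())

for word, count in freqs.most_common():
    print(word, count, f"{count / total:.1%}")
```

With this rule, a value such as `-1` would contribute the token `1`, which matches how `-1` shows up as `1` in the word tables of the later variables.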
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 97131 | 17.8% |
| e | 46090 | 8.5% |
| t | 39848 | 7.3% |
| o | 37111 | 6.8% |
| a | 34531 | 6.3% |
| n | 25884 | 4.8% |
| i | 23774 | 4.4% |
| s | 21881 | 4.0% |
| h | 19730 | 3.6% |
| r | 18671 | 3.4% |
| Other values (84) | 180135 | 33.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 392489 | 72.0% |
| Space Separator | 97131 | 17.8% |
| Uppercase Letter | 27189 | 5.0% |
| Other Punctuation | 17702 | 3.2% |
| Decimal Number | 8509 | 1.6% |
| Control | 535 | 0.1% |
| Final Punctuation | 508 | 0.1% |
| Currency Symbol | 250 | < 0.1% |
| Dash Punctuation | 231 | < 0.1% |
| Connector Punctuation | 174 | < 0.1% |
| Other values (5) | 68 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 46090 | 11.7% |
| t | 39848 | 10.2% |
| o | 37111 | 9.5% |
| a | 34531 | 8.8% |
| n | 25884 | 6.6% |
| i | 23774 | 6.1% |
| s | 21881 | 5.6% |
| h | 19730 | 5.0% |
| r | 18671 | 4.8% |
| l | 18137 | 4.6% |
| Other values (17) | 106832 | 27.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| I | 5179 | 19.0% |
| T | 2191 | 8.1% |
| A | 1983 | 7.3% |
| S | 1898 | 7.0% |
| H | 1544 | 5.7% |
| W | 1541 | 5.7% |
| O | 1417 | 5.2% |
| C | 1229 | 4.5% |
| E | 1075 | 4.0% |
| N | 1040 | 3.8% |
| Other values (16) | 8092 | 29.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 6004 | 33.9% |
| ? | 4028 | 22.8% |
| , | 2540 | 14.3% |
| ! | 2481 | 14.0% |
| ' | 2028 | 11.5% |
| : | 456 | 2.6% |
| … | 81 | 0.5% |
| / | 39 | 0.2% |
| " | 15 | 0.1% |
| ; | 12 | 0.1% |
| Other values (4) | 18 | 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2137 | 25.1% |
| 1 | 1434 | 16.9% |
| 2 | 1275 | 15.0% |
| 3 | 815 | 9.6% |
| 5 | 642 | 7.5% |
| 4 | 572 | 6.7% |
| 7 | 483 | 5.7% |
| 6 | 459 | 5.4% |
| 8 | 368 | 4.3% |
| 9 | 324 | 3.8% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 13 | 68.4% |
| ~ | 3 | 15.8% |
| \| | 2 | 10.5% |
| = | 1 | 5.3% |
Control
| Value | Count | Frequency (%) |
| (non-printing control character) | 533 | 99.6% |
| (non-printing control character) | 2 | 0.4% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 230 | 99.6% |
| — | 1 | 0.4% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 24 | 96.0% |
| ] | 1 | 4.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 97131 | 100.0% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 508 | 100.0% |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 250 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 174 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 21 | 100.0% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ` | 2 | 100.0% |
Initial Punctuation
| Value | Count | Frequency (%) |
| “ | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 419678 | 77.0% |
| Common | 125108 | 23.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 46090 | 11.0% |
| t | 39848 | 9.5% |
| o | 37111 | 8.8% |
| a | 34531 | 8.2% |
| n | 25884 | 6.2% |
| i | 23774 | 5.7% |
| s | 21881 | 5.2% |
| h | 19730 | 4.7% |
| r | 18671 | 4.4% |
| l | 18137 | 4.3% |
| Other values (43) | 134021 | 31.9% |
Common
| Value | Count | Frequency (%) |
| (space) | 97131 | 77.6% |
| . | 6004 | 4.8% |
| ? | 4028 | 3.2% |
| , | 2540 | 2.0% |
| ! | 2481 | 2.0% |
| 0 | 2137 | 1.7% |
| ' | 2028 | 1.6% |
| 1 | 1434 | 1.1% |
| 2 | 1275 | 1.0% |
| 3 | 815 | 0.7% |
| Other values (31) | 5235 | 4.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 544193 | 99.9% |
| Punctuation | 591 | 0.1% |
| None | 2 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 97131 | 17.8% |
| e | 46090 | 8.5% |
| t | 39848 | 7.3% |
| o | 37111 | 6.8% |
| a | 34531 | 6.3% |
| n | 25884 | 4.8% |
| i | 23774 | 4.4% |
| s | 21881 | 4.0% |
| h | 19730 | 3.6% |
| r | 18671 | 3.4% |
| Other values (79) | 179542 | 33.0% |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 508 | 86.0% |
| … | 81 | 13.7% |
| — | 1 | 0.2% |
| “ | 1 | 0.2% |
None
| Value | Count | Frequency (%) |
| é | 2 | 100.0% |
| Distinct | 332 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 2120 |
| Missing (%) | 20.4% |
| Memory size | 591.2 KiB |
| -1 | 158 |
|---|---|
| Punta Cana | 140 |
| Tijuana | 126 |
| Toronto | 123 |
| Calgary | 101 |
| Other values (327) |
Characters and Unicode
| Total characters | 65103 |
|---|---|
| Distinct characters | 58 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 28 |
|---|---|
| Unique (%) | 0.3% |
Sample
| 1st row | Caprica |
|---|---|
| 2nd row | Caprica |
| 3rd row | Atlantis |
| 4th row | Gotham City |
| 5th row | Gotham City |
Common Values
| Value | Count | Frequency (%) |
| -1 | 158 | 1.5% |
| Punta Cana | 140 | 1.3% |
| Tijuana | 126 | 1.2% |
| Toronto | 123 | 1.2% |
| Calgary | 101 | 1.0% |
| Busan | 100 | 1.0% |
| St. Petersburg | 99 | 1.0% |
| Fukuoka | 89 | 0.9% |
| Nagoya | 85 | 0.8% |
| Burlington | 85 | 0.8% |
| Other values (322) | 7181 | 69.0% |
| (Missing) | 2120 | 20.4% |
| Value | Count | Frequency (%) |
| san | 296 | 2.9% |
| punta | 178 | 1.8% |
| cana | 178 | 1.8% |
| 1 | 158 | 1.6% |
| st | 155 | 1.5% |
| toronto | 151 | 1.5% |
| tijuana | 144 | 1.4% |
| vancouver | 117 | 1.2% |
| petersburg | 113 | 1.1% |
| city | 109 | 1.1% |
| Other values (192) | 8518 | 84.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 7594 | 11.7% |
| n | 5685 | 8.7% |
| o | 5615 | 8.6% |
| e | 4596 | 7.1% |
| i | 4472 | 6.9% |
| r | 3393 | 5.2% |
| t | 3349 | 5.1% |
| l | 2945 | 4.5% |
| u | 2601 | 4.0% |
| s | 2503 | 3.8% |
| Other values (48) | 22350 | 34.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 53240 | 81.8% |
| Uppercase Letter | 9496 | 14.6% |
| Space Separator | 1830 | 2.8% |
| Other Punctuation | 221 | 0.3% |
| Dash Punctuation | 158 | 0.2% |
| Decimal Number | 158 | 0.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 7594 | 14.3% |
| n | 5685 | 10.7% |
| o | 5615 | 10.5% |
| e | 4596 | 8.6% |
| i | 4472 | 8.4% |
| r | 3393 | 6.4% |
| t | 3349 | 6.3% |
| l | 2945 | 5.5% |
| u | 2601 | 4.9% |
| s | 2503 | 4.7% |
| Other values (16) | 10487 | 19.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1220 | 12.8% |
| B | 913 | 9.6% |
| C | 889 | 9.4% |
| P | 752 | 7.9% |
| M | 727 | 7.7% |
| T | 657 | 6.9% |
| L | 568 | 6.0% |
| A | 480 | 5.1% |
| D | 375 | 3.9% |
| R | 331 | 3.5% |
| Other values (16) | 2584 | 27.2% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 155 | 70.1% |
| , | 60 | 27.1% |
| ' | 6 | 2.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1830 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 158 | 100.0% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 158 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 62736 | 96.4% |
| Common | 2367 | 3.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 7594 | 12.1% |
| n | 5685 | 9.1% |
| o | 5615 | 9.0% |
| e | 4596 | 7.3% |
| i | 4472 | 7.1% |
| r | 3393 | 5.4% |
| t | 3349 | 5.3% |
| l | 2945 | 4.7% |
| u | 2601 | 4.1% |
| s | 2503 | 4.0% |
| Other values (42) | 19983 | 31.9% |
Common
| Value | Count | Frequency (%) |
| (space) | 1830 | 77.3% |
| - | 158 | 6.7% |
| 1 | 158 | 6.7% |
| . | 155 | 6.5% |
| , | 60 | 2.5% |
| ' | 6 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 65103 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 7594 | 11.7% |
| n | 5685 | 8.7% |
| o | 5615 | 8.6% |
| e | 4596 | 7.1% |
| i | 4472 | 6.9% |
| r | 3393 | 5.2% |
| t | 3349 | 5.1% |
| l | 2945 | 4.5% |
| u | 2601 | 4.0% |
| s | 2503 | 3.8% |
| Other values (48) | 22350 | 34.3% |
| Distinct | 339 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 787 |
| Missing (%) | 7.6% |
| Memory size | 634.0 KiB |
| -1 | 174 |
|---|---|
| Punta Cana | 160 |
| Tijuana | 144 |
| Toronto | 140 |
| Calgary | 115 |
| Other values (334) |
Characters and Unicode
| Total characters | 75614 |
|---|---|
| Distinct characters | 58 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 29 |
|---|---|
| Unique (%) | 0.3% |
Sample
| 1st row | Caprica |
|---|---|
| 2nd row | Caprica |
| 3rd row | Atlantis |
| 4th row | Atlantis |
| 5th row | Gotham City |
Common Values
| Value | Count | Frequency (%) |
| -1 | 174 | 1.7% |
| Punta Cana | 160 | 1.5% |
| Tijuana | 144 | 1.4% |
| Toronto | 140 | 1.3% |
| Calgary | 115 | 1.1% |
| Busan | 113 | 1.1% |
| St. Petersburg | 113 | 1.1% |
| Fukuoka | 103 | 1.0% |
| Burlington | 101 | 1.0% |
| Beijing | 97 | 0.9% |
| Other values (329) | 8360 | |
| (Missing) | 787 | 7.6% |
| Value | Count | Frequency (%) |
| san | 346 | 2.9% |
| cana | 203 | 1.7% |
| punta | 203 | 1.7% |
| st | 176 | 1.5% |
| 1 | 174 | 1.5% |
| toronto | 174 | 1.5% |
| tijuana | 164 | 1.4% |
| vancouver | 134 | 1.1% |
| petersburg | 128 | 1.1% |
| calgary | 125 | 1.1% |
| Other values (193) | 9912 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 8788 | 11.6% |
| n | 6601 | 8.7% |
| o | 6512 | 8.6% |
| e | 5351 | 7.1% |
| i | 5221 | 6.9% |
| r | 3949 | 5.2% |
| t | 3889 | 5.1% |
| l | 3441 | 4.6% |
| u | 3014 | 4.0% |
| s | 2910 | 3.8% |
| Other values (48) | 25938 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 61847 | |
| Uppercase Letter | 11050 | 14.6% |
| Space Separator | 2119 | 2.8% |
| Other Punctuation | 250 | 0.3% |
| Dash Punctuation | 174 | 0.2% |
| Decimal Number | 174 | 0.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 8788 | |
| n | 6601 | |
| o | 6512 | |
| e | 5351 | |
| i | 5221 | |
| r | 3949 | 6.4% |
| t | 3889 | 6.3% |
| l | 3441 | 5.6% |
| u | 3014 | 4.9% |
| s | 2910 | 4.7% |
| Other values (16) | 12171 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1407 | |
| B | 1071 | 9.7% |
| C | 1023 | 9.3% |
| P | 870 | 7.9% |
| M | 842 | 7.6% |
| T | 767 | 6.9% |
| L | 657 | 5.9% |
| A | 573 | 5.2% |
| D | 432 | 3.9% |
| R | 385 | 3.5% |
| Other values (16) | 3023 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 176 | |
| , | 67 | 26.8% |
| ' | 7 | 2.8% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2119 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 174 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 174 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 72897 | |
| Common | 2717 | 3.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 8788 | 12.1% |
| n | 6601 | 9.1% |
| o | 6512 | 8.9% |
| e | 5351 | 7.3% |
| i | 5221 | 7.2% |
| r | 3949 | 5.4% |
| t | 3889 | 5.3% |
| l | 3441 | 4.7% |
| u | 3014 | 4.1% |
| s | 2910 | 4.0% |
| Other values (42) | 23221 |
Common
| Value | Count | Frequency (%) |
| (space) | 2119 | 78.0% |
| . | 176 | 6.5% |
| - | 174 | 6.4% |
| 1 | 174 | 6.4% |
| , | 67 | 2.5% |
| ' | 7 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 75614 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 8788 | 11.6% |
| n | 6601 | 8.7% |
| o | 6512 | 8.6% |
| e | 5351 | 7.1% |
| i | 5221 | 6.9% |
| r | 3949 | 5.2% |
| t | 3889 | 5.1% |
| l | 3441 | 4.6% |
| u | 3014 | 4.0% |
| s | 2910 | 3.8% |
| Other values (48) | 25938 |
| Distinct | 382 |
|---|---|
| Distinct (%) | 4.6% |
| Missing | 2100 |
| Missing (%) | 20.2% |
| Memory size | 590.2 KiB |
| -1 | 257 |
|---|---|
| Punta Cana | 243 |
| Rome | 173 |
| Hamburg | 139 |
| Kingston | 135 |
| Other values (377) |
Characters and Unicode
| Total characters | 63554 |
|---|---|
| Distinct characters | 57 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 60 |
|---|---|
| Unique (%) | 0.7% |
Sample
| 1st row | Atlantis |
|---|---|
| 2nd row | Neverland |
| 3rd row | Atlantis |
| 4th row | Mos Eisley |
| 5th row | Neverland |
Common Values
| Value | Count | Frequency (%) |
| -1 | 257 | 2.5% |
| Punta Cana | 243 | 2.3% |
| Rome | 173 | 1.7% |
| Hamburg | 139 | 1.3% |
| Kingston | 135 | 1.3% |
| Mexico City | 132 | 1.3% |
| Kyoto | 130 | 1.2% |
| Ulsan | 125 | 1.2% |
| San Juan | 122 | 1.2% |
| Denver | 118 | 1.1% |
| Other values (372) | 6733 | |
| (Missing) | 2100 | 20.2% |
| Value | Count | Frequency (%) |
| san | 408 | 4.0% |
| cana | 275 | 2.7% |
| punta | 266 | 2.6% |
| 1 | 257 | 2.5% |
| rome | 206 | 2.0% |
| mexico | 189 | 1.8% |
| city | 180 | 1.7% |
| kyoto | 163 | 1.6% |
| kingston | 159 | 1.5% |
| paris | 159 | 1.5% |
| Other values (218) | 8045 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 7808 | 12.3% |
| n | 5537 | 8.7% |
| o | 5396 | 8.5% |
| i | 4327 | 6.8% |
| e | 4107 | 6.5% |
| t | 3237 | 5.1% |
| r | 3235 | 5.1% |
| u | 2700 | 4.2% |
| s | 2572 | 4.0% |
| l | 2531 | 4.0% |
| Other values (47) | 22104 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 51087 | |
| Uppercase Letter | 9747 | 15.3% |
| Space Separator | 2000 | 3.1% |
| Dash Punctuation | 258 | 0.4% |
| Decimal Number | 257 | 0.4% |
| Other Punctuation | 205 | 0.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 7808 | |
| n | 5537 | |
| o | 5396 | |
| i | 4327 | |
| e | 4107 | |
| t | 3237 | 6.3% |
| r | 3235 | 6.3% |
| u | 2700 | 5.3% |
| s | 2572 | 5.0% |
| l | 2531 | 5.0% |
| Other values (16) | 9637 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1128 | |
| C | 1034 | 10.6% |
| P | 1034 | 10.6% |
| M | 833 | 8.5% |
| B | 723 | 7.4% |
| A | 570 | 5.8% |
| K | 497 | 5.1% |
| L | 472 | 4.8% |
| R | 385 | 3.9% |
| D | 375 | 3.8% |
| Other values (15) | 2696 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 110 | |
| . | 91 | |
| ' | 4 | 2.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2000 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 258 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 257 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 60834 | |
| Common | 2720 | 4.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 7808 | |
| n | 5537 | 9.1% |
| o | 5396 | 8.9% |
| i | 4327 | 7.1% |
| e | 4107 | 6.8% |
| t | 3237 | 5.3% |
| r | 3235 | 5.3% |
| u | 2700 | 4.4% |
| s | 2572 | 4.2% |
| l | 2531 | 4.2% |
| Other values (41) | 19384 |
Common
| Value | Count | Frequency (%) |
| (space) | 2000 | 73.5% |
| - | 258 | 9.5% |
| 1 | 257 | 9.4% |
| , | 110 | 4.0% |
| . | 91 | 3.3% |
| ' | 4 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 63554 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 7808 | 12.3% |
| n | 5537 | 8.7% |
| o | 5396 | 8.5% |
| i | 4327 | 6.8% |
| e | 4107 | 6.5% |
| t | 3237 | 5.1% |
| r | 3235 | 5.1% |
| u | 2700 | 4.2% |
| s | 2572 | 4.0% |
| l | 2531 | 4.0% |
| Other values (47) | 22104 |
| Distinct | 392 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 776 |
| Missing (%) | 7.5% |
| Memory size | 632.7 KiB |
| Punta Cana | 283 |
|---|---|
| -1 | 279 |
| Rome | 195 |
| Hamburg | 161 |
| Kingston | 155 |
| Other values (387) |
Characters and Unicode
| Total characters | 73918 |
|---|---|
| Distinct characters | 57 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 59 |
|---|---|
| Unique (%) | 0.6% |
Sample
| 1st row | Atlantis |
|---|---|
| 2nd row | Neverland |
| 3rd row | Atlantis |
| 4th row | Atlantis |
| 5th row | Mos Eisley |
Common Values
| Value | Count | Frequency (%) |
| Punta Cana | 283 | 2.7% |
| -1 | 279 | 2.7% |
| Rome | 195 | 1.9% |
| Hamburg | 161 | 1.5% |
| Kingston | 155 | 1.5% |
| Mexico City | 150 | 1.4% |
| Kyoto | 145 | 1.4% |
| Ulsan | 143 | 1.4% |
| San Juan | 142 | 1.4% |
| Denver | 136 | 1.3% |
| Other values (382) | 7842 | |
| (Missing) | 776 | 7.5% |
| Value | Count | Frequency (%) |
| san | 471 | 3.9% |
| cana | 318 | 2.7% |
| punta | 308 | 2.6% |
| 1 | 279 | 2.3% |
| rome | 232 | 1.9% |
| mexico | 217 | 1.8% |
| city | 204 | 1.7% |
| kingston | 184 | 1.5% |
| paris | 182 | 1.5% |
| kyoto | 182 | 1.5% |
| Other values (220) | 9392 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 9067 | 12.3% |
| n | 6448 | 8.7% |
| o | 6267 | 8.5% |
| i | 5011 | 6.8% |
| e | 4806 | 6.5% |
| t | 3777 | 5.1% |
| r | 3773 | 5.1% |
| u | 3145 | 4.3% |
| s | 2989 | 4.0% |
| l | 2954 | 4.0% |
| Other values (47) | 25681 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 59417 | |
| Uppercase Letter | 11358 | 15.4% |
| Space Separator | 2338 | 3.2% |
| Dash Punctuation | 280 | 0.4% |
| Decimal Number | 279 | 0.4% |
| Other Punctuation | 246 | 0.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 9067 | |
| n | 6448 | |
| o | 6267 | |
| i | 5011 | |
| e | 4806 | |
| t | 3777 | 6.4% |
| r | 3773 | 6.4% |
| u | 3145 | 5.3% |
| s | 2989 | 5.0% |
| l | 2954 | 5.0% |
| Other values (16) | 11180 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1320 | |
| C | 1208 | 10.6% |
| P | 1200 | 10.6% |
| M | 978 | 8.6% |
| B | 828 | 7.3% |
| A | 674 | 5.9% |
| K | 562 | 4.9% |
| L | 559 | 4.9% |
| R | 447 | 3.9% |
| D | 437 | 3.8% |
| Other values (15) | 3145 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 137 | |
| . | 104 | |
| ' | 5 | 2.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2338 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 280 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 279 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 70775 | |
| Common | 3143 | 4.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 9067 | |
| n | 6448 | 9.1% |
| o | 6267 | 8.9% |
| i | 5011 | 7.1% |
| e | 4806 | 6.8% |
| t | 3777 | 5.3% |
| r | 3773 | 5.3% |
| u | 3145 | 4.4% |
| s | 2989 | 4.2% |
| l | 2954 | 4.2% |
| Other values (41) | 22538 |
Common
| Value | Count | Frequency (%) |
| (space) | 2338 | 74.4% |
| - | 280 | 8.9% |
| 1 | 279 | 8.9% |
| , | 137 | 4.4% |
| . | 104 | 3.3% |
| ' | 5 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 73918 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 9067 | 12.3% |
| n | 6448 | 8.7% |
| o | 6267 | 8.5% |
| i | 5011 | 6.8% |
| e | 4806 | 6.5% |
| t | 3777 | 5.1% |
| r | 3773 | 5.1% |
| u | 3145 | 4.3% |
| s | 2989 | 4.0% |
| l | 2954 | 4.0% |
| Other values (47) | 25681 |
| Distinct | 151 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 4120 |
| Missing (%) | 39.6% |
| Memory size | 527.7 KiB |
| -1 | |
|---|---|
| august 27 | |
| august 30 | 284 |
| september 8 | 231 |
| september 6 | 224 |
| Other values (146) |
Characters and Unicode
| Total characters | 50050 |
|---|---|
| Distinct characters | 43 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 14 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | august 13 |
|---|---|
| 2nd row | august 13 |
| 3rd row | august 13 |
| 4th row | august 17 |
| 5th row | august 17 |
Common Values
| Value | Count | Frequency (%) |
| -1 | 567 | 5.4% |
| august 27 | 495 | 4.8% |
| august 30 | 284 | 2.7% |
| september 8 | 231 | 2.2% |
| september 6 | 224 | 2.2% |
| september 2 | 200 | 1.9% |
| august 17 | 187 | 1.8% |
| september 12 | 181 | 1.7% |
| august 15 | 178 | 1.7% |
| august 25 | 175 | 1.7% |
| Other values (141) | 3565 | |
| (Missing) | 4120 |
| Value | Count | Frequency (%) |
| august | 2204 | |
| september | 1741 | |
| sept | 846 | 7.4% |
| 1 | 764 | 6.7% |
| 27 | 565 | 4.9% |
| 8 | 434 | 3.8% |
| 6 | 381 | 3.3% |
| 30 | 364 | 3.2% |
| 12 | 356 | 3.1% |
| 2 | 322 | 2.8% |
| Other values (38) | 3488 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 6223 | |
| (space) | 5178 | 10.3% |
| s | 4871 | |
| t | 4856 | |
| u | 4639 | 9.3% |
| 1 | 2909 | 5.8% |
| p | 2641 | 5.3% |
| a | 2461 | 4.9% |
| g | 2405 | 4.8% |
| 2 | 2194 | 4.4% |
| Other values (33) | 11673 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 33570 | |
| Decimal Number | 9902 | 19.8% |
| Space Separator | 5178 | 10.3% |
| Uppercase Letter | 621 | 1.2% |
| Dash Punctuation | 567 | 1.1% |
| Currency Symbol | 212 | 0.4% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 6223 | |
| s | 4871 | |
| t | 4856 | |
| u | 4639 | |
| p | 2641 | |
| a | 2461 | 7.3% |
| g | 2405 | 7.2% |
| r | 1785 | 5.3% |
| m | 1756 | 5.2% |
| b | 1741 | 5.2% |
| Other values (7) | 192 | 0.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 146 | |
| N | 146 | |
| I | 146 | |
| T | 71 | |
| G | 36 | 5.8% |
| L | 30 | 4.8% |
| Y | 10 | 1.6% |
| E | 10 | 1.6% |
| W | 6 | 1.0% |
| S | 5 | 0.8% |
| Other values (3) | 15 | 2.4% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 2909 | |
| 2 | 2194 | |
| 7 | 1019 | 10.3% |
| 3 | 855 | 8.6% |
| 6 | 664 | 6.7% |
| 8 | 645 | 6.5% |
| 0 | 638 | 6.4% |
| 5 | 445 | 4.5% |
| 9 | 271 | 2.7% |
| 4 | 262 | 2.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 5178 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 567 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 212 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 34191 | |
| Common | 15859 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 6223 | |
| s | 4871 | |
| t | 4856 | |
| u | 4639 | |
| p | 2641 | |
| a | 2461 | 7.2% |
| g | 2405 | 7.0% |
| r | 1785 | 5.2% |
| m | 1756 | 5.1% |
| b | 1741 | 5.1% |
| Other values (20) | 813 | 2.4% |
Common
| Value | Count | Frequency (%) |
| (space) | 5178 | 32.7% |
| 1 | 2909 | |
| 2 | 2194 | |
| 7 | 1019 | 6.4% |
| 3 | 855 | 5.4% |
| 6 | 664 | 4.2% |
| 8 | 645 | 4.1% |
| 0 | 638 | 4.0% |
| - | 567 | 3.6% |
| 5 | 445 | 2.8% |
| Other values (3) | 745 | 4.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 50050 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 6223 | |
| (space) | 5178 | 10.3% |
| s | 4871 | |
| t | 4856 | |
| u | 4639 | 9.3% |
| 1 | 2909 | 5.8% |
| p | 2641 | 5.3% |
| a | 2461 | 4.9% |
| g | 2405 | 4.8% |
| 2 | 2194 | 4.4% |
| Other values (33) | 11673 |
| Distinct | 155 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 2977 |
| Missing (%) | 28.6% |
| Memory size | 564.8 KiB |
| -1 | |
|---|---|
| august 27 | |
| august 30 | 327 |
| september 8 | 273 |
| september 6 | 268 |
| Other values (150) |
Characters and Unicode
| Total characters | 59436 |
|---|---|
| Distinct characters | 43 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 16 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | august 13 |
|---|---|
| 2nd row | august 13 |
| 3rd row | august 13 |
| 4th row | august 13 |
| 5th row | august 17 |
Common Values
| Value | Count | Frequency (%) |
| -1 | 655 | 6.3% |
| august 27 | 573 | 5.5% |
| august 30 | 327 | 3.1% |
| september 8 | 273 | 2.6% |
| september 6 | 268 | 2.6% |
| september 2 | 249 | 2.4% |
| august 17 | 216 | 2.1% |
| september 12 | 209 | 2.0% |
| august 25 | 202 | 1.9% |
| august 15 | 196 | 1.9% |
| Other values (145) | 4262 | |
| (Missing) | 2977 |
| Value | Count | Frequency (%) |
| august | 2578 | |
| september | 2112 | |
| sept | 1008 | 7.4% |
| 1 | 897 | 6.6% |
| 27 | 652 | 4.8% |
| 8 | 502 | 3.7% |
| 6 | 455 | 3.4% |
| 30 | 417 | 3.1% |
| 12 | 409 | 3.0% |
| 2 | 395 | 2.9% |
| Other values (39) | 4146 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 7523 | |
| (space) | 6141 | 10.3% |
| s | 5796 | |
| t | 5770 | |
| u | 5416 | 9.1% |
| 1 | 3419 | 5.8% |
| p | 3187 | 5.4% |
| a | 2867 | 4.8% |
| g | 2806 | 4.7% |
| 2 | 2610 | 4.4% |
| Other values (33) | 13901 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 39984 | |
| Decimal Number | 11696 | 19.7% |
| Space Separator | 6141 | 10.3% |
| Uppercase Letter | 716 | 1.2% |
| Dash Punctuation | 655 | 1.1% |
| Currency Symbol | 244 | 0.4% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 7523 | |
| s | 5796 | |
| t | 5770 | |
| u | 5416 | |
| p | 3187 | |
| a | 2867 | 7.2% |
| g | 2806 | 7.0% |
| r | 2162 | 5.4% |
| m | 2128 | 5.3% |
| b | 2112 | 5.3% |
| Other values (7) | 217 | 0.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 166 | |
| N | 166 | |
| I | 166 | |
| T | 84 | |
| G | 43 | 6.0% |
| L | 35 | 4.9% |
| Y | 12 | 1.7% |
| E | 12 | 1.7% |
| W | 7 | 1.0% |
| A | 7 | 1.0% |
| Other values (3) | 18 | 2.5% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 3419 | |
| 2 | 2610 | |
| 7 | 1184 | 10.1% |
| 3 | 1011 | 8.6% |
| 6 | 788 | 6.7% |
| 8 | 759 | 6.5% |
| 0 | 751 | 6.4% |
| 5 | 506 | 4.3% |
| 4 | 338 | 2.9% |
| 9 | 330 | 2.8% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 6141 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 655 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 244 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 40700 | |
| Common | 18736 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 7523 | |
| s | 5796 | |
| t | 5770 | |
| u | 5416 | |
| p | 3187 | |
| a | 2867 | 7.0% |
| g | 2806 | 6.9% |
| r | 2162 | 5.3% |
| m | 2128 | 5.2% |
| b | 2112 | 5.2% |
| Other values (20) | 933 | 2.3% |
Common
| Value | Count | Frequency (%) |
| (space) | 6141 | 32.8% |
| 1 | 3419 | |
| 2 | 2610 | |
| 7 | 1184 | 6.3% |
| 3 | 1011 | 5.4% |
| 6 | 788 | 4.2% |
| 8 | 759 | 4.1% |
| 0 | 751 | 4.0% |
| - | 655 | 3.5% |
| 5 | 506 | 2.7% |
| Other values (3) | 912 | 4.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 59436 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 7523 | |
| (space) | 6141 | 10.3% |
| s | 5796 | |
| t | 5770 | |
| u | 5416 | 9.1% |
| 1 | 3419 | 5.8% |
| p | 3187 | 5.4% |
| a | 2867 | 4.8% |
| g | 2806 | 4.7% |
| 2 | 2610 | 4.4% |
| Other values (33) | 13901 |
| Distinct | 129 |
|---|---|
| Distinct (%) | 2.7% |
| Missing | 5620 |
| Missing (%) | 54.0% |
| Memory size | 472.1 KiB |
| -1 | 344 |
|---|---|
| september 5 | 118 |
| 17 | 117 |
| 21 | 114 |
| september 7 | 113 |
| Other values (124) |
Characters and Unicode
| Total characters | 30561 |
|---|---|
| Distinct characters | 33 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 4 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | august 22 |
|---|---|
| 2nd row | august 22 |
| 3rd row | august 24 |
| 4th row | august 24 |
| 5th row | august 24 |
Common Values
| Value | Count | Frequency (%) |
| -1 | 344 | 3.3% |
| september 5 | 118 | 1.1% |
| 17 | 117 | 1.1% |
| 21 | 114 | 1.1% |
| september 7 | 113 | 1.1% |
| 19 | 109 | 1.0% |
| september 11 | 104 | 1.0% |
| september 3 | 102 | 1.0% |
| august 31 | 101 | 1.0% |
| september 14 | 99 | 1.0% |
| Other values (119) | 3466 | |
| (Missing) | 5620 |
| Value | Count | Frequency (%) |
| september | 1708 | |
| 1 | 447 | 6.1% |
| august | 391 | 5.3% |
| sept | 357 | 4.8% |
| 17 | 199 | 2.7% |
| 19 | 186 | 2.5% |
| 11 | 178 | 2.4% |
| 15 | 178 | 2.4% |
| 7 | 174 | 2.4% |
| 28 | 170 | 2.3% |
| Other values (31) | 3388 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 5593 | |
| (space) | 2589 | 8.5% |
| s | 2571 | |
| 1 | 2479 | |
| t | 2474 | |
| p | 2174 | 7.1% |
| 2 | 1787 | 5.8% |
| r | 1717 | 5.6% |
| b | 1711 | 5.6% |
| m | 1708 | 5.6% |
| Other values (23) | 5758 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 19599 | |
| Decimal Number | 8011 | |
| Space Separator | 2589 | 8.5% |
| Dash Punctuation | 344 | 1.1% |
| Uppercase Letter | 15 | < 0.1% |
| Currency Symbol | 3 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 5593 | |
| s | 2571 | |
| t | 2474 | |
| p | 2174 | 11.1% |
| r | 1717 | 8.8% |
| b | 1711 | 8.7% |
| m | 1708 | 8.7% |
| u | 800 | 4.1% |
| a | 421 | 2.1% |
| g | 403 | 2.1% |
| Other values (6) | 27 | 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 2479 | |
| 2 | 1787 | |
| 3 | 696 | 8.7% |
| 7 | 518 | 6.5% |
| 5 | 456 | 5.7% |
| 8 | 442 | 5.5% |
| 0 | 433 | 5.4% |
| 9 | 423 | 5.3% |
| 4 | 404 | 5.0% |
| 6 | 373 | 4.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 6 | |
| O | 3 | |
| L | 3 | |
| T | 3 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2589 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 344 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 3 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 19614 | |
| Common | 10947 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 5593 | |
| s | 2571 | |
| t | 2474 | |
| p | 2174 | 11.1% |
| r | 1717 | 8.8% |
| b | 1711 | 8.7% |
| m | 1708 | 8.7% |
| u | 800 | 4.1% |
| a | 421 | 2.1% |
| g | 403 | 2.1% |
| Other values (10) | 42 | 0.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 2589 | |
| 1 | 2479 | |
| 2 | 1787 | |
| 3 | 696 | 6.4% |
| 7 | 518 | 4.7% |
| 5 | 456 | 4.2% |
| 8 | 442 | 4.0% |
| 0 | 433 | 4.0% |
| 9 | 423 | 3.9% |
| 4 | 404 | 3.7% |
| Other values (3) | 720 | 6.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 30561 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 5593 | |
| (space) | 2589 | |
| s | 2571 | |
| 1 | 2479 | |
| t | 2474 | |
| p | 2174 | 7.1% |
| 2 | 1787 | 5.8% |
| r | 1717 | 5.6% |
| b | 1711 | 5.6% |
| m | 1708 | 5.6% |
| Other values (23) | 5758 |
| Distinct | 131 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 4673 |
| Missing (%) | 44.9% |
| Memory size | 500.9 KiB |
| -1 | 404 |
|---|---|
| september 5 | 142 |
| 17 | 141 |
| 21 | 133 |
| september 7 | 133 |
| Other values (126) |
Characters and Unicode
| Total characters | 36451 |
|---|---|
| Distinct characters | 33 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | august 22 |
|---|---|
| 2nd row | august 22 |
| 3rd row | august 22 |
| 4th row | august 24 |
| 5th row | august 24 |
Common Values
| Value | Count | Frequency (%) |
| -1 | 404 | 3.9% |
| september 5 | 142 | 1.4% |
| 17 | 141 | 1.4% |
| 21 | 133 | 1.3% |
| september 7 | 133 | 1.3% |
| 19 | 129 | 1.2% |
| september 3 | 121 | 1.2% |
| august 31 | 120 | 1.2% |
| september 14 | 119 | 1.1% |
| september 11 | 118 | 1.1% |
| Other values (121) | 4174 | |
| (Missing) | 4673 |
| Value | Count | Frequency (%) |
| september | 2021 | |
| 1 | 529 | 6.0% |
| august | 483 | 5.5% |
| sept | 432 | 4.9% |
| 17 | 235 | 2.7% |
| 19 | 216 | 2.4% |
| 11 | 213 | 2.4% |
| 7 | 211 | 2.4% |
| 15 | 211 | 2.4% |
| 28 | 203 | 2.3% |
| Other values (31) | 4067 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 6620 | |
| (space) | 3087 | |
| s | 3064 | |
| 1 | 2963 | |
| t | 2958 | |
| p | 2574 | 7.1% |
| 2 | 2150 | 5.9% |
| r | 2032 | 5.6% |
| b | 2025 | 5.6% |
| m | 2021 | 5.5% |
| Other values (23) | 6957 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 23332 | |
| Decimal Number | 9605 | |
| Space Separator | 3087 | 8.5% |
| Dash Punctuation | 404 | 1.1% |
| Uppercase Letter | 19 | 0.1% |
| Currency Symbol | 4 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 6620 | |
| s | 3064 | |
| t | 2958 | |
| p | 2574 | 11.0% |
| r | 2032 | 8.7% |
| b | 2025 | 8.7% |
| m | 2021 | 8.7% |
| u | 988 | 4.2% |
| a | 519 | 2.2% |
| g | 498 | 2.1% |
| Other values (6) | 33 | 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 2963 | |
| 2 | 2150 | |
| 3 | 846 | 8.8% |
| 7 | 627 | 6.5% |
| 5 | 544 | 5.7% |
| 8 | 529 | 5.5% |
| 0 | 502 | 5.2% |
| 4 | 500 | 5.2% |
| 9 | 496 | 5.2% |
| 6 | 448 | 4.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 7 | |
| O | 4 | |
| L | 4 | |
| T | 4 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 3087 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 404 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 4 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 23351 | |
| Common | 13100 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 6620 | |
| s | 3064 | |
| t | 2958 | |
| p | 2574 | 11.0% |
| r | 2032 | 8.7% |
| b | 2025 | 8.7% |
| m | 2021 | 8.7% |
| u | 988 | 4.2% |
| a | 519 | 2.2% |
| g | 498 | 2.1% |
| Other values (10) | 52 | 0.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 3087 | |
| 1 | 2963 | |
| 2 | 2150 | |
| 3 | 846 | 6.5% |
| 7 | 627 | 4.8% |
| 5 | 544 | 4.2% |
| 8 | 529 | 4.0% |
| 0 | 502 | 3.8% |
| 4 | 500 | 3.8% |
| 9 | 496 | 3.8% |
| Other values (3) | 856 | 6.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 36451 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 6620 | |
| (space) | 3087 | |
| s | 3064 | |
| 1 | 2963 | |
| t | 2958 | |
| p | 2574 | 7.1% |
| 2 | 2150 | 5.9% |
| r | 2032 | 5.6% |
| b | 2025 | 5.6% |
| m | 2021 | 5.5% |
| Other values (23) | 6957 |
| Distinct | 225 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 5152 |
| Missing (%) | 49.5% |
| Memory size | 479.1 KiB |
| -1 | 1469 |
|---|---|
| 3300.0 | 111 |
| 4000.0 | 107 |
| 3200.0 | 104 |
| 3500.0 | 93 |
| Other values (220) |
Characters and Unicode
| Total characters | 26085 |
|---|---|
| Distinct characters | 38 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
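The "Most frequent character per category" tables in this section can be approximated by grouping characters under their general category and keeping the top entry in each group. The sketch below assumes that approach; the `values` list (including the `$3300` entry) and the `top_char_per_category` helper are hypothetical and are not taken from the profiler's source.

```python
import unicodedata
from collections import Counter, defaultdict

# Hypothetical values resembling this column: amounts, a "-1" entry, a "$" prefix.
values = ["1700.0", "1900.0", "-1", "$3300"]


def top_char_per_category(strings):
    """Return {category code: (character, count)} for the most frequent character."""
    per_category = defaultdict(Counter)
    for s in strings:
        for ch in s:
            per_category[unicodedata.category(ch)][ch] += 1
    return {cat: chars.most_common(1)[0] for cat, chars in per_category.items()}


top = top_char_per_category(values)
for cat, (ch, n) in sorted(top.items()):
    print(f"{cat}: {ch!r} x {n}")
```

Here `Nd` is Decimal Number, `Po` is Other Punctuation, `Pd` is Dash Punctuation and `Sc` is Currency Symbol, matching the category labels used in the tables.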
Unique
| Unique | 30 |
|---|---|
| Unique (%) | 0.6% |
Sample
| 1st row | 1700.0 |
|---|---|
| 2nd row | 1900.0 |
| 3rd row | 1700.0 |
| 4th row | 2100.0 |
| 5th row | 2100.0 |
Common Values
| Value | Count | Frequency (%) |
| -1 | 1469 | 14.1% |
| 3300.0 | 111 | 1.1% |
| 4000.0 | 107 | 1.0% |
| 3200.0 | 104 | 1.0% |
| 3500.0 | 93 | 0.9% |
| 400.0 | 92 | 0.9% |
| 3100.0 | 89 | 0.9% |
| 2900.0 | 84 | 0.8% |
| 1900.0 | 83 | 0.8% |
| 4300.0 | 72 | 0.7% |
| Other values (215) | 2951 | |
| (Missing) | 5152 |
| Value | Count | Frequency (%) |
| 1 | 1469 | |
| 3300.0 | 112 | 2.1% |
| 4000.0 | 107 | 2.0% |
| 3200.0 | 104 | 1.9% |
| 3100.0 | 96 | 1.8% |
| 3500.0 | 93 | 1.7% |
| 400.0 | 92 | 1.7% |
| 2900.0 | 84 | 1.6% |
| 1900.0 | 83 | 1.5% |
| 2000.0 | 73 | 1.4% |
| Other values (215) | 3047 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 11049 | |
| . | 3671 | 14.1% |
| 1 | 2634 | 10.1% |
| - | 1469 | 5.6% |
| 3 | 1364 | 5.2% |
| 2 | 1209 | 4.6% |
| 4 | 1112 | 4.3% |
| 5 | 671 | 2.6% |
| 6 | 586 | 2.2% |
| 9 | 484 | 1.9% |
| Other values (28) | 1836 | 7.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 20012 | |
| Other Punctuation | 3671 | 14.1% |
| Dash Punctuation | 1469 | 5.6% |
| Uppercase Letter | 455 | 1.7% |
| Lowercase Letter | 192 | 0.7% |
| Currency Symbol | 181 | 0.7% |
| Space Separator | 105 | 0.4% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 36 | |
| s | 22 | |
| n | 17 | |
| v | 16 | |
| i | 15 | |
| a | 15 | |
| l | 15 | |
| p | 8 | 4.2% |
| m | 8 | 4.2% |
| y | 8 | 4.2% |
| Other values (6) | 32 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 11049 | |
| 1 | 2634 | 13.2% |
| 3 | 1364 | 6.8% |
| 2 | 1209 | 6.0% |
| 4 | 1112 | 5.6% |
| 5 | 671 | 3.4% |
| 6 | 586 | 2.9% |
| 9 | 484 | 2.4% |
| 7 | 465 | 2.3% |
| 8 | 438 | 2.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 93 | |
| T | 88 | |
| N | 71 | |
| I | 71 | |
| L | 65 | |
| G | 23 | 5.1% |
| A | 22 | 4.8% |
| X | 22 | 4.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 3671 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1469 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 181 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 105 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 25438 | |
| Latin | 647 | 2.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| M | 93 | |
| T | 88 | |
| N | 71 | |
| I | 71 | |
| L | 65 | |
| e | 36 | 5.6% |
| G | 23 | 3.6% |
| A | 22 | 3.4% |
| s | 22 | 3.4% |
| X | 22 | 3.4% |
| Other values (14) | 134 |
Common
| Value | Count | Frequency (%) |
| 0 | 11049 | |
| . | 3671 | 14.4% |
| 1 | 2634 | 10.4% |
| - | 1469 | 5.8% |
| 3 | 1364 | 5.4% |
| 2 | 1209 | 4.8% |
| 4 | 1112 | 4.4% |
| 5 | 671 | 2.6% |
| 6 | 586 | 2.3% |
| 9 | 484 | 1.9% |
| Other values (4) | 1189 | 4.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 26085 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 11049 | |
| . | 3671 | 14.1% |
| 1 | 2634 | 10.1% |
| - | 1469 | 5.6% |
| 3 | 1364 | 5.2% |
| 2 | 1209 | 4.6% |
| 4 | 1112 | 4.3% |
| 5 | 671 | 2.6% |
| 6 | 586 | 2.2% |
| 9 | 484 | 1.9% |
| Other values (28) | 1836 | 7.0% |
| Distinct | 228 |
|---|---|
| Distinct (%) | 3.7% |
| Missing | 4178 |
| Missing (%) | 40.1% |
| Memory size | 507.8 KiB |
| -1 | 1704 |
|---|---|
| 3300.0 | 136 |
| 3200.0 | 124 |
| 4000.0 | 123 |
| 400.0 | 115 |
| Other values (223) |
Characters and Unicode
| Total characters | 31074 |
|---|---|
| Distinct characters | 38 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 24 |
|---|---|
| Unique (%) | 0.4% |
Sample
| 1st row | 1700.0 |
|---|---|
| 2nd row | 1900.0 |
| 3rd row | 1700.0 |
| 4th row | 1700.0 |
| 5th row | 2100.0 |
Common Values
| Value | Count | Frequency (%) |
| -1 | 1704 | 16.4% |
| 3300.0 | 136 | 1.3% |
| 3200.0 | 124 | 1.2% |
| 4000.0 | 123 | 1.2% |
| 400.0 | 115 | 1.1% |
| 3500.0 | 114 | 1.1% |
| 3100.0 | 107 | 1.0% |
| 2900.0 | 106 | 1.0% |
| 1900.0 | 95 | 0.9% |
| 4300.0 | 85 | 0.8% |
| Other values (218) | 3520 | |
| (Missing) | 4178 |
| Value | Count | Frequency (%) |
| 1 | 1704 | |
| 3300.0 | 138 | 2.2% |
| 3200.0 | 124 | 2.0% |
| 4000.0 | 123 | 1.9% |
| 400.0 | 115 | 1.8% |
| 3100.0 | 115 | 1.8% |
| 3500.0 | 114 | 1.8% |
| 2900.0 | 106 | 1.7% |
| 1900.0 | 95 | 1.5% |
| 2000.0 | 90 | 1.4% |
| Other values (218) | 3631 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 13218 | |
| . | 4391 | 14.1% |
| 1 | 3084 | 9.9% |
| - | 1704 | 5.5% |
| 3 | 1629 | 5.2% |
| 2 | 1466 | 4.7% |
| 4 | 1319 | 4.2% |
| 5 | 801 | 2.6% |
| 6 | 696 | 2.2% |
| 9 | 580 | 1.9% |
| Other values (28) | 2186 | 7.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 23879 | |
| Other Punctuation | 4391 | 14.1% |
| Dash Punctuation | 1704 | 5.5% |
| Uppercase Letter | 541 | 1.7% |
| Lowercase Letter | 217 | 0.7% |
| Currency Symbol | 216 | 0.7% |
| Space Separator | 126 | 0.4% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 41 | |
| s | 25 | |
| n | 19 | |
| v | 18 | |
| i | 17 | |
| a | 17 | |
| l | 17 | |
| p | 9 | 4.1% |
| m | 9 | 4.1% |
| y | 9 | 4.1% |
| Other values (6) | 36 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 13218 | |
| 1 | 3084 | 12.9% |
| 3 | 1629 | 6.8% |
| 2 | 1466 | 6.1% |
| 4 | 1319 | 5.5% |
| 5 | 801 | 3.4% |
| 6 | 696 | 2.9% |
| 9 | 580 | 2.4% |
| 7 | 558 | 2.3% |
| 8 | 528 | 2.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 109 | |
| T | 107 | |
| N | 83 | |
| I | 83 | |
| L | 78 | |
| G | 29 | 5.4% |
| A | 26 | 4.8% |
| X | 26 | 4.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 4391 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1704 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 216 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 126 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 30316 | |
| Latin | 758 | 2.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| M | 109 | |
| T | 107 | |
| N | 83 | |
| I | 83 | |
| L | 78 | |
| e | 41 | 5.4% |
| G | 29 | 3.8% |
| A | 26 | 3.4% |
| X | 26 | 3.4% |
| s | 25 | 3.3% |
| Other values (14) | 151 |
Common
| Value | Count | Frequency (%) |
| 0 | 13218 | |
| . | 4391 | 14.5% |
| 1 | 3084 | 10.2% |
| - | 1704 | 5.6% |
| 3 | 1629 | 5.4% |
| 2 | 1466 | 4.8% |
| 4 | 1319 | 4.4% |
| 5 | 801 | 2.6% |
| 6 | 696 | 2.3% |
| 9 | 580 | 1.9% |
| Other values (4) | 1428 | 4.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 31074 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 13218 | |
| . | 4391 | 14.1% |
| 1 | 3084 | 9.9% |
| - | 1704 | 5.5% |
| 3 | 1629 | 5.2% |
| 2 | 1466 | 4.7% |
| 4 | 1319 | 4.2% |
| 5 | 801 | 2.6% |
| 6 | 696 | 2.2% |
| 9 | 580 | 1.9% |
| Other values (28) | 2186 | 7.0% |