Analytical Results/nneonneo Transliteration
This page is a compilation of analytical results using nneonneo's transliteration.
Phrase Frequency
These are frequencies of phrases (groups of two or more words that appear more than once).
Phrase | Occurrences |
---|---|
da rilota’ta | 4 |
& sadoce’ta | 4 |
rilota dota’ta | 3 |
& sadoce | 3 |
gera ruta | 3 |
weri wonsi | 3 |
dota’ta wiga | 2 |
wiga sige’gin | 2 |
gin rola’ta | 2 |
ine dota’ta | 2 |
wele sine’on | 2 |
sine’on rine | 2 |
rine sine’te | 2 |
tin’de’sa gera | 2 |
gera ruta’sa | 2 |
teri’sa wonsi | 2 |
wonsi eto | 2 |
ransan’sa & | 2 |
rilota’ta rede | 2 |
rede dota | 2 |
dota ruta | 2 |
rilota’sa & | 2 |
cen rine | 2 |
cen duwo’ta | 2 |
duwo’ta satan | 2 |
wele’da cen | 2 |
galon’on galon’sa | 2 |
galon’sa guto | 2 |
wele anre’de’sa | 2 |
wonsi gede’te | 2 |
rilota dota’ta wiga | 2 |
dota’ta wiga sige’gin | 2 |
wele sine’on rine | 2 |
sine’on rine sine’te | 2 |
teri’sa wonsi eto | 2 |
da rilota’ta rede | 2 |
cen duwo’ta satan | 2 |
galon’on galon’sa guto | 2 |
rilota dota’ta wiga sige’gin | 2 |
wele sine’on rine sine’te | 2 |
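A table like the one above can be produced by counting every run of consecutive words and keeping the runs that occur more than once. The following is a minimal sketch, not the script actually used for this page: the function name `repeated_phrases` and the sample text are made up for illustration, and it assumes the transliteration is available as one flat, whitespace-separated string with no line or sentence boundaries to respect.

```python
from collections import Counter

def repeated_phrases(words, min_len=2, max_len=None):
    """Count every run of min_len..max_len consecutive words
    and keep only the runs that occur more than once."""
    if max_len is None:
        max_len = len(words)
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c > 1}

# Hypothetical sample built from words in the table; NOT the real text.
sample = "gera ruta weri wonsi eto gera ruta weri wonsi".split()
result = repeated_phrases(sample)

# Print longest-count-first, similar to the table layout.
for phrase, count in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{phrase} | {count} |")
```

Overlapping occurrences are counted separately in this sketch; whether the original analysis did the same is not stated.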
' Used as Space
These are the frequencies of phrases if ’ is treated as a space (word terminator).
Phrase | Occurrences |
---|---|
de sa | 10 |
& sadoce | 8 |
sa & | 7 |
sa & sadoce | 7 |
dota ta | 6 |
gin rola | 6 |
ta gede | 6 |
gera ruta | 6 |
tin de | 6 |
da rilota | 6 |
rilota ta | 6 |
dota ruta | 5 |
te ine | 5 |
ta wele | 5 |
wele de | 5 |
sa gera | 5 |
duwo ta | 5 |
ta tin | 5 |
sadoce ta | 5 |
rilota sa | 5 |
cen rine | 5 |
tin de sa | 5 |
de sa gera | 5 |
sa gera ruta | 5 |
& sadoce ta | 5 |
de sa gera ruta | 5 |
sa & sadoce ta | 5 |
rola ta | 4 |
ta satan | 4 |
te rilota | 4 |
ta ense | 4 |
da rilota ta | 4 |
rilota dota | 3 |
ine ta | 3 |
ci o | 3 |
o de | 3 |
ine dota | 3 |
sa guto | 3 |
sa rilota | 3 |
sa ransan | 3 |
o rilota | 3 |
sa de | 3 |
ense o | 3 |
weri wonsi | 3 |
len ta | 3 |
galon sa | 3 |
rilota dota ta | 3 |
te ine ta | 3 |
gin rola ta | 3 |
ta ense o | 3 |
rilota sa & | 3 |
tin de sa gera | 3 |
rilota sa & sadoce | 3 |
tin de sa gera ruta | 3 |
o na | 2 |
ta wiga | 2 |
wiga sige | 2 |
sige gin | 2 |
ruta te | 2 |
nense letu | 2 |
ta ison | 2 |
ta guwo | 2 |
guwo o | 2 |
etere o | 2 |
ta sene | 2 |
sene o | 2 |
o nada | 2 |
wen de | 2 |
wonsi wele | 2 |
wele sine | 2 |
sine on | 2 |
on rine | 2 |
rine sine | 2 |
sine te | 2 |
ago ta | 2 |
ete o | 2 |
runi te | 2 |
o neran | 2 |
gede wele | 2 |
de sana | 2 |
ruta sa | 2 |
teri sa | 2 |
sa wonsi | 2 |
wonsi eto | 2 |
dede dota | 2 |
rilota eto | 2 |
eto sa | 2 |
ransan sa | 2 |
ta rede | 2 |
na te | 2 |
rede dota | 2 |
ruta ta | 2 |
rola sa | 2 |
sa ine | 2 |
sa dota | 2 |
ruta wen | 2 |
sa da | 2 |
da duwo | 2 |
de ran | 2 |
ta gosogon | 2 |
wonsi rago | 2 |
sa nense | 2 |
eto te | 2 |
da ta | 2 |
sa cen | 2 |
cen duwo | 2 |
wele da | 2 |
da cen | 2 |
tata ta | 2 |
rilota da | 2 |
rilota duwo | 2 |
gede ci | 2 |
galon on | 2 |
on galon | 2 |
guto ete | 2 |
rine duwo | 2 |
wele anre | 2 |
anre de | 2 |
wonsi gede | 2 |
gede te | 2 |
te nense | 2 |
dota ta wiga | 2 |
ta wiga sige | 2 |
wiga sige gin | 2 |
sige gin rola | 2 |
ruta te ine | 2 |
ta guwo o | 2 |
etere o de | 2 |
ine dota ta | 2 |
ta sene o | 2 |
wele sine on | 2 |
sine on rine | 2 |
on rine sine | 2 |
rine sine te | 2 |
ta gede wele | 2 |
gera ruta sa | 2 |
teri sa wonsi | 2 |
sa wonsi eto | 2 |
sa ransan sa | 2 |
ransan sa & | 2 |
rilota ta rede | 2 |
rede dota ruta | 2 |
dota ruta ta | 2 |
sa dota ruta | 2 |
te rilota sa | 2 |
rilota sa ransan | 2 |
o de ran | 2 |
ta wele de | 2 |
sa cen rine | 2 |
cen duwo ta | 2 |
duwo ta satan | 2 |
wele da cen | 2 |
ta gede ci | 2 |
gede ci o | 2 |
galon on galon | 2 |
on galon sa | 2 |
galon sa guto | 2 |
sa guto ete | 2 |
ta tin de | 2 |
wele anre de | 2 |
anre de sa | 2 |
wonsi gede te | 2 |
len ta tin | 2 |
rilota dota ta wiga | 2 |
dota ta wiga sige | 2 |
ta wiga sige gin | 2 |
wiga sige gin rola | 2 |
ruta te ine ta | 2 |
wele sine on rine | 2 |
sine on rine sine | 2 |
on rine sine te | 2 |
sa gera ruta sa | 2 |
teri sa wonsi eto | 2 |
sa ransan sa & | 2 |
ransan sa & sadoce | 2 |
da rilota ta rede | 2 |
te rilota sa ransan | 2 |
cen duwo ta satan | 2 |
ta gede ci o | 2 |
galon on galon sa | 2 |
on galon sa guto | 2 |
galon sa guto ete | 2 |
wele anre de sa | 2 |
rilota dota ta wiga sige | 2 |
dota ta wiga sige gin | 2 |
ta wiga sige gin rola | 2 |
wele sine on rine sine | 2 |
sine on rine sine te | 2 |
de sa gera ruta sa | 2 |
sa ransan sa & sadoce | 2 |
rilota sa & sadoce ta | 2 |
galon on galon sa guto | 2 |
on galon sa guto ete | 2 |
rilota dota ta wiga sige gin | 2 |
dota ta wiga sige gin rola | 2 |
wele sine on rine sine te | 2 |
tin de sa gera ruta sa | 2 |
galon on galon sa guto ete | 2 |
rilota dota ta wiga sige gin rola | 2 |
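The only difference between the two phrase tables is the word-splitting step. As a rough sketch (the names `MARK` and `split_words` and the sample string are hypothetical), treating ’ as a word terminator amounts to replacing it with a space before splitting:

```python
MARK = "\u2019"  # the ’ character as it appears in the transliteration

def split_words(text, mark_is_space=False):
    """Split on whitespace; optionally treat ’ as a word terminator too."""
    if mark_is_space:
        text = text.replace(MARK, " ")
    return text.split()

# Hypothetical sample built from phrases in the tables; NOT the real text.
sample = "da rilota’ta rede & sadoce’ta"

plain = split_words(sample)
split = split_words(sample, mark_is_space=True)
print(plain)
print(split)
```

With the mark treated as a space, suffix-like pieces such as `ta` and `sa` become words of their own, which is why phrases like `de sa` and `sa &` dominate the second table.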
Division into Smaller Words
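Here are the short words and their counts:
Word | Count |
---|---|
ta | 52 |
sa | 45 |
o | 26 |
de | 22 |
te | 20 |
tin | 15 |
da | 14 |
cen | 10 |
on | 9 |
na | 7 |
gin | 6 |
ran | 4 |
len | 5 |
wen | 4 |
ci | 3 |
gan | 2 |
wu | 1 |
i | 1 |
gon | 1 |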
The long words and their counts:
Word | Count |
---|---|
ri_lo_ta | 25 |
we_le | 22 |
ru_ta | 15 |
won_si | 14 |
do_ta | 13 |
ro_la | 11 |
i_ne | 11 |
ri_ne | 10 |
sa_do_ce | 10 |
du_wo | 9 |
e_to | 9 |
e_te | 9 |
ge_de | 8 |
tu_ge | 6 |
en_wo | 6 |
ge_ra | 6 |
di_ta | 6 |
re_de | 5 |
sa_na | 5 |
ran_san | 5 |
ga_lon | 5 |
nen_se | 5 |
si_ne | 5 |
sa_tan | 5 |
ta_ta | 5 |
ru_ni | 5 |
de_de | 4 |
ne_ran | 4 |
a_go | 4 |
en_se | 4 |
se_ge_te | 3 |
go_so_gon | 3 |
gu_to | 3 |
na_da | 3 |
we_ri | 3 |
an_re | 3 |
ra_go | 3 |
si_ge | 3 |
u_we | 2 |
du_ra | 2 |
se_ne | 2 |
wi_di | 2 |
gu_wo | 2 |
le_le | 2 |
te_le | 2 |
se_ta | 2 |
i_son | 2 |
e_te_re | 2 |
wi_ga | 2 |
te_ri | 2 |
le_tu | 2 |
le_ri | 2 |
da_co | 1 |
ra_wo_ton | 1 |
la_to | 1 |
tu_gan | 1 |
e_ta | 1 |
i_tan | 1 |
no_ne | 1 |
ro_ta | 1 |
wi_go | 1 |
a_ti | 1 |
ni_ge | 1 |
te_so | 1 |
to_wo | 1 |
din_ru_we | 1 |
There are 85 distinct words in total (19 short and 66 long). The long words use 52 unique tokens, which allows for 52^3 = 140,608 possible three-token words. Since only a tiny fraction of that space is used, it is quite plausible that these 85 words are, by themselves, complete words. Furthermore, these word counts are very consistent with a sample English text I looked at, once the most common words were removed. I don't know much about German, unfortunately, so I can't be of much use there.
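The size of the word space mentioned above is easy to check. This short sketch only redoes the arithmetic with the figures quoted in the text (52 unique tokens, 66 long words); the variable names are illustrative.

```python
# Figures taken from the text above.
unique_tokens = 52
observed_long_words = 66

two_token_words = unique_tokens ** 2    # 2,704
three_token_words = unique_tokens ** 3  # 140,608

print(f"possible two-token words:   {two_token_words:,}")
print(f"possible three-token words: {three_token_words:,}")

# Fraction of the two- and three-token word space that is actually observed.
used = observed_long_words / (two_token_words + three_token_words)
print(f"share of the space in use:  {used:.4%}")
```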
Token Frequency
The following shows the frequency with which each token appears in the text:
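Token | Count | Percent |
---|---|---|
‘ | 204 | 15.5% |
n | 181 | 13.7% |
ta | 131 | 9.9% |
sa | 70 | 5.3% |
e | 69 | 5.2% |
de | 43 | 3.3% |
ri | 42 | 3.2% |
te | 39 | 3.0% |
le | 37 | 2.8% |
o | 36 | 2.7% |
wo | 33 | 2.5% |
we | 32 | 2.4% |
lo | 30 | 2.3% |
ge | 27 | 2.0% |
ra | 25 | 1.9% |
do | 23 | 1.7% |
a | 23 | 1.7% |
si | 22 | 1.7% |
ru | 21 | 1.6% |
i | 21 | 1.6% |
ce | 20 | 1.5% |
da | 18 | 1.4% |
se | 16 | 1.2% |
ti | 16 | 1.2% |
go | 15 | 1.1% |
to | 15 | 1.1% |
ro | 12 | 0.9% |
la | 12 | 0.9% |
du | 11 | 0.8% |
ga | 10 | 0.8% |
re | 10 | 0.8% |
di | 9 | 0.7% |
tu | 9 | 0.7% |
& | 8 | 0.6% |
gi | 6 | 0.5% |
so | 6 | 0.5% |
gu | 5 | 0.4% |
wi | 5 | 0.4% |
ci | 3 | 0.2% |
u | 2 | 0.2% |
co | 1 | 0.1% |
wu | 1 | 0.1% |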
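The frequency of tokens, as defined by nneonneo:
Token | Count |
---|---|
ta | 125 |
sa | 65 |
de | 43 |
ri | 42 |
te | 39 |
ne | 33 |
le | 32 |
we | 28 |
ge | 27 |
o | 26 |
lo | 25 |
do | 23 |
si | 22 |
ru | 21 |
e | 21 |
wo | 19 |
da | 18 |
na | 16 |
se | 16 |
tin | 15 |
i | 15 |
won | 14 |
to | 14 |
ran | 13 |
la | 12 |
ra | 12 |
ro | 12 |
go | 11 |
du | 11 |
en | 10 |
ce | 10 |
re | 10 |
cen | 10 |
tu | 9 |
on | 9 |
di | 8 |
ga | 7 |
tan | 6 |
gin | 6 |
ni | 6 |
san | 5 |
gu | 5 |
nen | 5 |
lon | 5 |
len | 5 |
wi | 5 |
a | 5 |
gon | 4 |
wen | 4 |
so | 4 |
ci | 3 |
an | 3 |
gan | 3 |
son | 2 |
u | 2 |
co | 1 |
ton | 1 |
ti | 1 |
din | 1 |
wu | 1 |
no | 1 |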
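A token count of this kind can be sketched as follows. The tokenization rule here is an assumption inferred from the shape of the tokens listed (an optional consonant, one vowel, and an optional final `n` that does not begin the next token); it is not nneonneo's actual definition, and the names `TOKEN`, `tokenize` and `token_frequencies` are hypothetical.

```python
import re
from collections import Counter

# ASSUMED token shape: optional consonant + vowel + optional final "n".
# The lookahead keeps an "n" that is followed by a vowel for the next token,
# so "ine" splits as i + ne rather than in + e.
TOKEN = re.compile(r"[b-df-hj-np-tv-z]?[aeiou](?:n(?![aeiou]))?")

def tokenize(word):
    """Split one transliterated word into tokens under the assumed rule."""
    return TOKEN.findall(word)

def token_frequencies(words):
    """Return (token, count, percent) tuples, most frequent first."""
    counts = Counter(tok for w in words for tok in tokenize(w))
    total = sum(counts.values())
    return [(tok, n, round(100 * n / total, 1))
            for tok, n in counts.most_common()]

# Hypothetical sample words taken from the lists above; NOT the real text.
freqs = token_frequencies(["rilota", "dota", "ta", "wonsi"])
for tok, n, pct in freqs:
    print(f"{tok} | {n} | {pct}% |")
```

Under this rule `wonsi` splits as `won` + `si` and `dinruwe` as `din` + `ru` + `we`, which matches the underscored forms in the long-word list, but words outside that pattern would need a different rule.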