Analytical Results/nneonneo Transliteration

From BookOfWoo
Revision as of 22:22, 2 March 2018 by Narga (' Used as Space)

This page is a compilation of analytical results using nneonneo's transliteration.

Phrase Frequency

These are frequencies of phrases (groups of two or more words that appear more than once).

Phrase Occurrences
da rilota’ta 4
& sadoce’ta 4
rilota dota’ta 3
& sadoce 3
gera ruta 3
weri wonsi 3
dota’ta wiga 2
wiga sige’gin 2
gin rola’ta 2
ine dota’ta 2
wele sine’on 2
sine’on rine 2
rine sine’te 2
tin’de’sa gera 2
gera ruta’sa 2
teri’sa wonsi 2
wonsi eto 2
ransan’sa & 2
rilota’ta rede 2
rede dota 2
dota ruta 2
rilota’sa & 2
cen rine 2
cen duwo’ta 2
duwo’ta satan 2
wele’da cen 2
galon’on galon’sa 2
galon’sa guto 2
wele anre’de’sa 2
wonsi gede’te 2
rilota dota’ta wiga 2
dota’ta wiga sige’gin 2
wele sine’on rine 2
sine’on rine sine’te 2
teri’sa wonsi eto 2
da rilota’ta rede 2
cen duwo’ta satan 2
galon’on galon’sa guto 2
rilota dota’ta wiga sige’gin 2
wele sine’on rine sine’te 2
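
A table like the one above can be produced by counting every run of two or more consecutive words and keeping the runs that occur more than once. The following is only a minimal sketch of that idea; it is not known how the counts on this page were actually generated, and the function name `repeated_phrases` and the sample word order are invented for illustration.

```python
from collections import Counter

def repeated_phrases(words, min_len=2, max_len=None):
    """Return {phrase: count} for every run of consecutive words seen more than once."""
    if max_len is None:
        max_len = len(words)
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for start in range(len(words) - n + 1):
            counts[" ".join(words[start:start + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c > 1}

# Invented word order for illustration only; this is not a line of the actual text.
sample = ["gera", "ruta", "weri", "gera", "ruta"]
for phrase, c in sorted(repeated_phrases(sample).items(), key=lambda kv: -kv[1]):
    print(phrase, c)  # prints: gera ruta 2
```

Because every phrase length is counted separately, a long repeated phrase also contributes all of its shorter sub-phrases, which is why the table lists "rilota dota’ta wiga sige’gin" alongside its two- and three-word pieces.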

' Used as Space

These are frequencies of phrases if ’ is treated as a space (word terminator), so that, for example, rilota’ta is counted as the two words rilota and ta.

Phrase Occurrences
de sa 10
& sadoce 8
sa & 7
sa & sadoce 7
dota ta 6
gin rola 6
ta gede 6
gera ruta 6
tin de 6
da rilota 6
rilota ta 6
dota ruta 5
te ine 5
ta wele 5
wele de 5
sa gera 5
duwo ta 5
ta tin 5
sadoce ta 5
rilota sa 5
cen rine 5
tin de sa 5
de sa gera 5
sa gera ruta 5
& sadoce ta 5
de sa gera ruta 5
sa & sadoce ta 5
rola ta 4
ta satan 4
te rilota 4
ta ense 4
da rilota ta 4
rilota dota 3
ine ta 3
ci o 3
o de 3
ine dota 3
sa guto 3
sa rilota 3
sa ransan 3
o rilota 3
sa de 3
ense o 3
weri wonsi 3
len ta 3
galon sa 3
rilota dota ta 3
te ine ta 3
gin rola ta 3
ta ense o 3
rilota sa & 3
tin de sa gera 3
rilota sa & sadoce 3
tin de sa gera ruta 3
o na 2
ta wiga 2
wiga sige 2
sige gin 2
ruta te 2
nense letu 2
ta ison 2
ta guwo 2
guwo o 2
etere o 2
ta sene 2
sene o 2
o nada 2
wen de 2
wonsi wele 2
wele sine 2
sine on 2
on rine 2
rine sine 2
sine te 2
ago ta 2
ete o 2
runi te 2
o neran 2
gede wele 2
de sana 2
ruta sa 2
teri sa 2
sa wonsi 2
wonsi eto 2
dede dota 2
rilota eto 2
eto sa 2
ransan sa 2
ta rede 2
na te 2
rede dota 2
ruta ta 2
rola sa 2
sa ine 2
sa dota 2
ruta wen 2
sa da 2
da duwo 2
de ran 2
ta gosogon 2
wonsi rago 2
sa nense 2
eto te 2
da ta 2
sa cen 2
cen duwo 2
wele da 2
da cen 2
tata ta 2
rilota da 2
rilota duwo 2
gede ci 2
galon on 2
on galon 2
guto ete 2
rine duwo 2
wele anre 2
anre de 2
wonsi gede 2
gede te 2
te nense 2
dota ta wiga 2
ta wiga sige 2
wiga sige gin 2
sige gin rola 2
ruta te ine 2
ta guwo o 2
etere o de 2
ine dota ta 2
ta sene o 2
wele sine on 2
sine on rine 2
on rine sine 2
rine sine te 2
ta gede wele 2
gera ruta sa 2
teri sa wonsi 2
sa wonsi eto 2
sa ransan sa 2
ransan sa & 2
rilota ta rede 2
rede dota ruta 2
dota ruta ta 2
sa dota ruta 2
te rilota sa 2
rilota sa ransan 2
o de ran 2
ta wele de 2
sa cen rine 2
cen duwo ta 2
duwo ta satan 2
wele da cen 2
ta gede ci 2
gede ci o 2
galon on galon 2
on galon sa 2
galon sa guto 2
sa guto ete 2
ta tin de 2
wele anre de 2
anre de sa 2
wonsi gede te 2
len ta tin 2
rilota dota ta wiga 2
dota ta wiga sige 2
ta wiga sige gin 2
wiga sige gin rola 2
ruta te ine ta 2
wele sine on rine 2
sine on rine sine 2
on rine sine te 2
sa gera ruta sa 2
teri sa wonsi eto 2
sa ransan sa & 2
ransan sa & sadoce 2
da rilota ta rede 2
te rilota sa ransan 2
cen duwo ta satan 2
ta gede ci o 2
galon on galon sa 2
on galon sa guto 2
galon sa guto ete 2
wele anre de sa 2
rilota dota ta wiga sige 2
dota ta wiga sige gin 2
ta wiga sige gin rola 2
wele sine on rine sine 2
sine on rine sine te 2
de sa gera ruta sa 2
sa ransan sa & sadoce 2
rilota sa & sadoce ta 2
galon on galon sa guto 2
on galon sa guto ete 2
rilota dota ta wiga sige gin 2
dota ta wiga sige gin rola 2
wele sine on rine sine te 2
tin de sa gera ruta sa 2
galon on galon sa guto ete 2
rilota dota ta wiga sige gin rola 2
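
The only change from the first table is the word-splitting step: the apostrophe is turned into a space before the words are counted. A minimal sketch of that step, assuming the transliteration is held as a plain string; the helper names `split_words` and `bigram_counts` are hypothetical, and the sample string is assembled from phrases listed above rather than quoted from the text.

```python
from collections import Counter

def split_words(text, separators="’'"):
    """Split on whitespace after turning each separator character into a space."""
    for ch in separators:
        text = text.replace(ch, " ")
    return text.split()

def bigram_counts(words):
    """Count each pair of adjacent words."""
    return Counter(zip(words, words[1:]))

# Assembled from phrases in the tables above, for illustration only.
sample = "tin’de’sa gera ruta’sa"
words = split_words(sample)
print(words)  # ['tin', 'de', 'sa', 'gera', 'ruta', 'sa']
pairs = bigram_counts(words)
print(pairs[("de", "sa")])  # 1
```

Splitting this way produces many more short words, which explains why this table is so much longer than the first one and why pairs such as "de sa" rise to the top.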

Division into smaller words

Here are the short words and their counts:

52 ta
45 sa
26 o
22 de
20 te
15 tin
14 da
10 cen
9 on
7 na
6 gin
5 len
4 ran
4 wen
3 ci
2 gan
1 wu
1 i
1 gon

and the long words and their counts:

25 ri_lo_ta
22 we_le
15 ru_ta
14 won_si
13 do_ta
11 ro_la
11 i_ne
10 ri_ne
10 sa_do_ce
9 du_wo
9 e_to
9 e_te
8 ge_de
6 tu_ge
6 en_wo
6 ge_ra
6 di_ta
5 re_de
5 sa_na
5 ran_san
5 ga_lon
5 nen_se
5 si_ne
5 sa_tan
5 ta_ta
5 ru_ni
4 de_de
4 ne_ran
4 a_go
4 en_se
3 se_ge_te
3 go_so_gon
3 gu_to
3 na_da
3 we_ri
3 an_re
3 ra_go
3 si_ge
2 u_we
2 du_ra
2 se_ne
2 wi_di
2 gu_wo
2 le_le
2 te_le
2 se_ta
2 i_son
2 e_te_re
2 wi_ga
2 te_ri
2 le_tu
2 le_ri
1 da_co
1 ra_wo_ton
1 la_to
1 tu_gan
1 e_ta
1 i_tan
1 no_ne
1 ro_ta
1 wi_go
1 a_ti
1 ni_ge
1 te_so
1 to_wo
1 din_ru_we

There are 85 distinct words in total (19 short and 66 long). The long words use 52 unique tokens, which allows for over 140,000 possible three-token words, so the 66 long words observed are a tiny fraction of what the tokens could form. It is therefore quite plausible that these 85 words are complete words in their own right. Furthermore, these word counts are consistent with a sample English text I looked at, once the most common words were removed. I don't know much about German, unfortunately, so I can't be of much use there.
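
The arithmetic behind the "over 140,000" figure can be checked directly. This sketch assumes that any sequence of tokens is a permissible word, which is a simplification and not something the text itself establishes; the variable names are illustrative.

```python
unique_tokens = 52        # tokens used by the long words, as stated above
observed_long_words = 66  # distinct long words listed above

# Number of distinct words made of exactly three tokens.
exactly_three = unique_tokens ** 3
# Number of distinct words made of one, two, or three tokens.
up_to_three = sum(unique_tokens ** k for k in range(1, 4))

print(exactly_three)  # 140608
print(up_to_three)    # 143364
print(f"{observed_long_words / exactly_three:.4%} of the three-token space is used")
```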

Token Frequency

The following shows the frequency with which each token appears in the text:

 ‘    204   15.5%       ra    25    1.9%       du    11    0.8%
 n    181   13.7%       do    23    1.7%       ga    10    0.8%
ta    131   9.9%         a    23    1.7%       re    10    0.8%
sa    70    5.3%        si    22    1.7%       di    9     0.7%
 e    69    5.2%        ru    21    1.6%       tu    9     0.7%
de    43    3.3%         i    21    1.6%        &    8     0.6%
ri    42    3.2%        ce    20    1.5%       gi    6     0.5%
te    39    3.0%        da    18    1.4%       so    6     0.5%
le    37    2.8%        se    16    1.2%       gu    5     0.4%
 o    36    2.7%        ti    16    1.2%       wi    5     0.4%
wo    33    2.5%        go    15    1.1%       ci    3     0.2%
we    32    2.4%        to    15    1.1%        u    2     0.2%
lo    30    2.3%        ro    12    0.9%       co    1     0.1%
ge    27    2.0%        la    12    0.9%       wu    1     0.1%

and the frequency of tokens, as defined by nneonneo:

125 ta
65 sa
43 de
42 ri
39 te
33 ne
32 le
28 we
27 ge
26 o
25 lo
23 do
22 si
21 ru
21 e
19 wo
18 da
16 na
16 se
15 tin
15 i
14 won
14 to
13 ran
12 la
12 ra
12 ro
11 go
11 du
10 en
10 ce
10 re
10 cen
9 tu
9 on
8 di
7 ga
6 tan
6 gin
6 ni
5 san
5 gu
5 nen
5 lon
5 len
5 wi
5 a
4 gon
4 wen
4 so
3 ci
3 an
3 gan
2 son
2 u
1 co
1 ton
1 ti
1 din
1 wu
1 no
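
The counts in this second list appear to follow from the word lists in "Division into smaller words": each word contributes its count once for every token it contains. As a check under that assumption, the sketch below sums every listed word that contains the token ta. The function name `token_totals` is hypothetical, and only the ta total is meaningful here, since words without ta are left out.

```python
from collections import Counter

def token_totals(word_counts):
    """Sum token occurrences, weighting each word's tokens by the word's count."""
    totals = Counter()
    for word, count in word_counts.items():
        for token in word.split("_"):
            totals[token] += count
    return totals

# Every word from the short and long word lists that contains the token "ta".
words_with_ta = {
    "ta": 52, "ri_lo_ta": 25, "ru_ta": 15, "do_ta": 13, "di_ta": 6,
    "ta_ta": 5, "se_ta": 2, "e_ta": 1, "ro_ta": 1,
}
totals = token_totals(words_with_ta)
print(totals["ta"])  # 125
```

The result matches the 125 given for ta in the list above. The same sum for sa (45 + 10 + 5 + 5) gives the listed 65.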