Modul:data consistency check/doc

From Wikikamus

This is the documentation page for Modul:data consistency check

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Output

Discrepancies detected:

  • The code oos has the invalid parent code "xln".
  • The code pnb is not unique; it is also defined in Modul:languages/data/3/p.
  • The code zhx-dan has the invalid parent code "zhx" (a family code).
  • The code zhx-zho has the invalid parent code "zhx" (a family code).
  • The code zls-chs-ru has the invalid parent code "zls-chs".
  • The code zls-chs-uk has the invalid parent code "zls-chs".
  • ero, the code for the canonical name Horpa, is wrong; it should be ero.
  • gba, the code for the canonical name Gbaya, is wrong; it should be gba.
  • gio, the code for the canonical name Gelao, is wrong; it should be gio.
  • raj, the code for the canonical name Rajasthan, is wrong; it should be raj.
  • Gbaya (gba) has the invalid family code "alv-gba".
  • Gbaya-Bossangoa (gbp) has the invalid family code "alv-gbw".
  • Gbaya-Bozoum (gbq) has the invalid family code "alv-gbw".
  • Gbanu (gbv) has the invalid family code "alv-gbf".
  • Gelao Hijau (giq) has the invalid family code "qfa-gel".
  • Gelao Merah (gir) has the invalid family code "qfa-gel".
  • Mulao (giu) has the invalid family code "qfa-gel".
  • Gelao Putih (giw) has the invalid family code "qfa-gel".
  • Gbaya-Mbodomo (gmm) has the invalid family code "alv-gbf".
  • Qau (gqu) has the invalid family code "qfa-gel".
  • Gbaya Barat Daya (gso) has the invalid family code "alv-gbs".
  • Gbaya Barat Laut (gya) has the invalid family code "alv-gbw".
  • Salako (knx) has its canonical name ("Salako") repeated in the table of otherNames.
  • The translit field in the data table for Kui (India) (kxu) specifies the module Modul:kxv-translit, which does not exist.
  • The translit field in the data table for Kuvi (kxv) specifies the module Modul:kxv-translit, which does not exist.
  • Ye'kwana (mch) has its canonical name ("Ye'kwana") repeated in the table of otherNames.
  • Mah Meri (mhe) has its canonical name ("Mah Meri") repeated in the table of aliases.
  • Chin Mara (mrh) has its canonical name ("Chin Mara") repeated in the table of otherNames.
  • Manza (mzv) has the invalid family code "alv-gbf".
  • The sort_key field in the data table for Ireland Pertengahan (mga) specifies the module Modul:mga-sortkey, which does not exist.
  • The sort_key field in the data table for Mari Barat (mrj) specifies the module Modul:mrj-sortkey, which does not exist.
  • The sort_key field in the data table for Moksha (mdf) specifies the module Modul:mdf-sortkey, which does not exist.
  • The translit field in the data table for Mozarab (mxi) specifies the module Modul:mxi-translit, which does not exist.
  • The translit field in the data table for Manda (India) (mha) specifies the module Modul:kxv-translit, which does not exist.
  • Äiwoo (nfl) has its canonical name ("Äiwoo") repeated in the table of otherNames.
  • The sort_key field in the data table for Nivkh (niv) specifies the module Modul:niv-sortkey, which does not exist.
  • The sort_key field in the data table for Nupe (nup) specifies the module Modul:nup-sortkey, which does not exist.
  • Rejang (rej) has the invalid family code "poz-sus".
  • Rajbanshi (rjs) lists the invalid language code "inc-mgd" as its ancestor.
  • Kamta (rkt) lists the invalid language code "inc-ork" as its ancestor.
  • Lomavren (rmi) lists the invalid language code "psu" as its ancestor.
  • Domari (rmt) lists the invalid language code "psu" as its ancestor.
  • Romani (rom) lists the invalid language code "psu" as its ancestor.
  • Kriol (rop) has its canonical name ("Kriol") repeated in the table of otherNames.
  • Rusyn (rue) has its canonical name ("Rusyn") repeated in the table of aliases.
  • The sort_key field in the data table for Rusyn (rue) specifies the module Modul:rue-sortkey, which does not exist.
  • The sort_key field in the data table for Udmurt (udm) specifies the module Modul:udm-sortkey, which does not exist.
  • The sort_key field in the data table for Ulch (ulc) specifies the module Modul:ulc-sortkey, which does not exist.
  • The sort_key field in the data table for Ubykh (uby) specifies the module Modul:uby-sortkey, which does not exist.
  • The translit field in the data table for Ubykh (uby) specifies the module Modul:uby-translit, which does not exist.
  • Ura (Papua New Guinea) (uro) has its canonical name ("Ura (Papua New Guinea)") repeated in the table of otherNames.
  • tulisan Blissymbolic (Blis) is not used by any language and has no characters listed for auto-detection.
  • tulisan Cypro-Minoan (Cpmn) is not used by any language.
  • tulisan Hiragana (Hira) is not used by any language.
  • tulisan Kana (Hrkt) is not used by any language.
  • tulisan Iberia Timur Laut (Ibrnn) is not used by any language and has no characters listed for auto-detection.
  • tulisan Iberia Tenggara (Ibrns) is not used by any language and has no characters listed for auto-detection.
  • tulisan Kemasan Imej (Image) is not used by any language and has no characters listed for auto-detection.
  • Abjad Fonetik Antarabangsa (Ipach) is not used by any language and has no characters listed for auto-detection.
  • tulisan Kulitan (Kulit) is not used by any language and has no characters listed for auto-detection.
  • tulisan Moon (Moon) is not used by any language and has no characters listed for auto-detection.
  • Kod Morse (Morse) is not used by any language and has no characters listed for auto-detection.
  • Notasi Muzik (Music) is not used by any language.
  • tulisan Kuneiform Purba (Pcun) is not used by any language and has no characters listed for auto-detection.
  • tulisan Elam Purba (Pelm) is not used by any language and has no characters listed for auto-detection.
  • tulisan Sinaitik Purba (Psin) is not used by any language and has no characters listed for auto-detection.
  • tulisan Rongorongo (Roro) is not used by any language and has no characters listed for auto-detection.
  • tulisan Penomboran Rumi (Rumin) is not used by any language.
  • Tulisan flag semaphore (Semap) is not used by any language and has no characters listed for auto-detection.
  • tulisan Sidetic (Sidt) is not used by any language and has no characters listed for auto-detection.
  • tulisan Sunuwar (Sunu) is not used by any language.
  • tulisan Visible Speech (Visp) is not used by any language and has no characters listed for auto-detection.
  • tulisan Woleai (Wole) is not used by any language and has no characters listed for auto-detection.
  • Notasi Matematik (Zmth) is not used by any language.
  • tulisan Simbolik (Zsym) is not used by any language.
  • Tulisan undetermined (Zyyy) is not used by any language and has no characters listed for auto-detection.
  • tulisan Tidak Terkod (Zzzz) is not used by any language and has no characters listed for auto-detection.
  • The codes fa-Arab, ug-Arab, ks-Arab, ps-Arab, ur-Arab, ku-Arab, tt-Arab, ota-Arab, mzn-Arab and sd-Arab are currently alias codes. Only one code should be used in the data.
  • The codes ms-Arab and kk-Arab are currently alias codes. Only one code should be used in the data.

Checks performed

For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
  • Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
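
The Wikidata rule above can be read as a simple predicate. The following Python sketch is only an illustration (the module itself is written in Lua). The function name `is_valid_wikidata_id` is hypothetical, and the sketch assumes that "starting with Q and ending with decimal digits" means a Q followed by digits only.

```python
import re

# Assumption: a string ID is "Q" followed only by decimal digits.
_QID = re.compile(r"Q[0-9]+")

def is_valid_wikidata_id(value):
    """Return True for a positive integer or a string such as "Q42"."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return value > 0
    if isinstance(value, str):
        return _QID.fullmatch(value) is not None
    return False
```

Under these assumptions, `42` and `"Q42"` are accepted, while `0`, `"42"` and `"Q12a"` are rejected.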

The following must be true of the data used by Module:languages:

  • Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  • The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
  • If field 2 is not nil, it must be a valid Wikidata item ID.
  • If field 3 or family is given and not nil, it must be a valid family code.
  • If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
  • If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
  • If entry_name or sort_key is given, the from array must be at least as long as the to array.
  • If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
  • If link_tr is present, it must be true.
  • There must be no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
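
Two of the structural rules in this list, the key whitelist and the from/to length rule, could be sketched as follows. This is a Python illustration, not the module's Lua code: a language data table is modelled as a dict, the names `ALLOWED_KEYS` and `check_language_entry` are hypothetical, and the message wording is invented for the example.

```python
# Keys permitted in a language data table, copied from the list above.
ALLOWED_KEYS = {
    1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
    "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit",
    "link_tr",
}

def check_language_entry(code, data):
    """Return a list of discrepancy messages for one language entry."""
    problems = []
    for key in data:
        if key not in ALLOWED_KEYS:
            problems.append(f"{code} has the unrecognised data key {key!r}.")
    for field in ("entry_name", "sort_key"):
        value = data.get(field)
        # sort_key may also be a string (a module name); that case is skipped.
        if isinstance(value, dict) and "from" in value and "to" in value:
            if len(value["from"]) < len(value["to"]):
                problems.append(
                    f"The {field} field of {code} has a shorter 'from' "
                    "array than 'to' array."
                )
    return problems
```

An entry whose sort_key is a plain string passes both checks, since the length rule only applies to the table form.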

Checks not performed:

  • If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
  • If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or full_link attempts to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
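
The "all, and no more" requirement amounts to comparing two mappings in both directions. A minimal Python sketch, assuming the data submodules and the lookup module have each been flattened into a dict from code to canonical name (the function name `compare_lookup` is hypothetical):

```python
def compare_lookup(data, lookup):
    """Compare code -> canonical name mappings; return (missing, extra, wrong)."""
    missing = sorted(set(data) - set(lookup))  # in the data, absent from lookup
    extra = sorted(set(lookup) - set(data))    # in lookup, absent from the data
    wrong = sorted(
        code for code in set(data) & set(lookup) if data[code] != lookup[code]
    )
    return missing, extra, wrong
```

All three lists are empty exactly when the lookup module mirrors the data submodules.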

The following must be true of the data used by Module:etymology languages:

  • canonicalName must be given.
  • parent must be given and must be a valid language, family or etymology-only language code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
  • There must be no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
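
The ancestors rule can be pictured as a membership test against the set of known codes. The Python sketch below is illustrative only: `check_ancestors` is a hypothetical name, the data table is modelled as a dict, and the message wording is modelled loosely on the Output section.

```python
def check_ancestors(code, data, valid_codes):
    """Return messages for ancestors that are not known language codes.

    valid_codes is assumed to hold every language and etymology-only
    language code.
    """
    ancestors = data.get("ancestors")
    if ancestors is None:
        return []
    if not isinstance(ancestors, list):
        return [f"The ancestors field of {code} must be an array."]
    return [
        f'{code} lists the invalid language code "{ancestor}" as its ancestor.'
        for ancestor in ancestors
        if ancestor not in valid_codes
    ]
```

This sketch does not cover the further expectation that an etymology language is itself listed as the ancestor of a regular language.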

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • Have a valid family code in family, if family is given.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
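
The "at least one language or subfamily" condition could be checked by collecting every family code that is referenced and reporting the rest. A hedged Python sketch, with the Lua tables modelled as dicts and the function name `empty_families` invented for the example:

```python
def empty_families(families, languages):
    """Return the family codes that no language or subfamily belongs to."""
    used = set()
    for data in languages.values():
        # Field 3 (or "family") holds a language's family code.
        family = data.get(3, data.get("family"))
        if family is not None:
            used.add(family)
    for data in families.values():
        # A family's own "family" field names its parent family.
        parent = data.get("family")
        if parent is not None:
            used.add(parent)
    return sorted(code for code in families if code not in used)
```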

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
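
The script messages in the Output section combine two of these conditions: whether any language uses the script, and whether a characters pattern is present. The Python sketch below shows one way such messages might be assembled. It is an assumption-laden illustration, not the module's Lua code: `check_scripts` is a hypothetical name, and it does not test whether the characters value is a valid Lua pattern.

```python
def check_scripts(scripts, languages):
    """Return one message per script that is unused or lacks a characters pattern."""
    used = set()
    for data in languages.values():
        # Field 4 (or "scripts") holds a language's array of script codes.
        used.update(data.get(4) or data.get("scripts") or [])
    messages = []
    for code, data in sorted(scripts.items()):
        faults = []
        if code not in used:
            faults.append("is not used by any language")
        if not data.get("characters"):
            faults.append("has no characters listed for auto-detection")
        if faults:
            name = data.get("canonicalName", code)
            messages.append(f"{name} ({code}) " + " and ".join(faults) + ".")
    return messages
```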