ayaports/backports/postgresql15/icu-collations-hack.patch

From: Jakub Jirutka <jakub@jirutka.cz>
Date: Wed, 03 Aug 2022 20:40:33 +0200
Subject: [PATCH] Hack to generate usable ICU-based collations with
  icu-data-en

This is a downstream patch for Alpine Linux; it should never be
upstreamed in this form!

When the PostgreSQL cluster is initialized (using initdb(1)) or the
DB administrator calls `pg_import_system_collations()` directly, this
function creates COLLATIONs in the system catalog (pg_collation).
There are two types: libc-based and ICU-based. The latter are created
based on *locales* (not collations) known to ICU, i.e. based on the ICU
data installed at the time.

collationcmds.c includes the following comment:
> We use uloc_countAvailable()/uloc_getAvailable() rather than
> ucol_countAvailable()/ucol_getAvailable().  The former returns a full
> set of language+region combinations, whereas the latter only returns
> language+region combinations if they are distinct from the language's
> base collation.  So there might not be a de-DE or en-GB, which would be
> confusing.

There's a problem with this approach: locales and collations are two
different things. ICU data may include collation algorithms and data for
all or some languages, but not locales (language + country/region).
The collation data is small compared to locales. There are ~800 locales
(combinations of language, country and variants), but only 98 collations.
There's a mapping between collations and locales hidden somewhere in ICU
data.

Since the full ICU data is very big (30 MiB), we have created a stripped
down variant with only the English locale (package icu-data-en, 2.6 MiB).
It also includes a subset of 18 collations that cover hundreds of languages.

When the cluster is initialized or `pg_import_system_collations()` is
called directly and only icu-data-en (default) is installed, the user
ends up with only und, en and en_GB ICU-based COLLATIONs. The user can
create the missing COLLATIONs manually, but a) this is neither expected
nor reasonable behaviour, and b) it's not easy to find out for which
locales a collation is available.

I couldn't find any way to list all language+country variants for a
given collation. The list can be constructed by iterating over all
locales, but this approach is useless when the locale data is not
available... I should also note that the reverse lookup (locale ->
collation) is not a problem for ICU even when the full locale data is
stripped.

So I ended up with a very ugly workaround: pre-generating a
collation -> locale mapping and embedding it in the collationcmds.c
source. Then we replace `uloc_countAvailable()`/`uloc_getAvailable()`
with `ucol_countAvailable()` / `ucol_getAvailable()` to iterate over
the collations instead of locales and look up the locales in the
pre-generated list.
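
The lookup described above can be sketched in isolation, without ICU or
PostgreSQL. This is only an illustration under stated assumptions:
`sample_pairs`, `locales_for_collation()` and `collation_name()` are
hypothetical names that do not appear in the patch, and the table is a
tiny hand-picked excerpt of the pre-generated list. It walks a flat array
of (collation, locale) pairs and derives the "-x-icu" catalog name for
each match.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical excerpt of a flat pair table: even slots hold the collation
 * name (parent), odd slots hold a locale language tag (child). The empty
 * string stands for the ICU root collation.
 */
static const char *const sample_pairs[] = {
    "", "de",
    "", "de-AT",
    "af", "af-NA",
    "af", "af-ZA",
    "sr_Latn", "sr-Latn-RS",
    NULL, NULL,
};

/*
 * Store in out[] the locales whose parent collation equals coll and
 * return how many matched (which may exceed max_out). Stepping by two
 * keeps the cursor on the collation slot of each pair.
 */
static size_t locales_for_collation(const char *coll,
                                    const char **out, size_t max_out)
{
    size_t n = 0;

    for (const char *const *p = sample_pairs; p[0] != NULL; p += 2) {
        if (strcmp(p[0], coll) == 0) {
            if (n < max_out)
                out[n] = p[1];
            n++;
        }
    }
    return n;
}

/* Build the catalog name for a locale, e.g. "af-ZA" -> "af-ZA-x-icu". */
static int collation_name(const char *langtag, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%s-x-icu", langtag);
}
```

A caller would pass each name returned by `ucol_getAvailable()` (or "" for
the root collation) as `coll` and create one COLLATION per returned
locale. The sketch steps through the table two slots at a time, whereas
the patch advances the pointer once inside the comparison and once in the
loop header; both visit the same pairs.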

This data is quite stable; there's a very low risk of it becoming
outdated in a way that would be a problem.

`icu_coll_locales` has been generated using the following code:

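    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unicode/ucol.h>
    #include <unicode/uloc.h>
    
    // Copy-pasted from collationcmds.c.
    static char *get_icu_language_tag(const char *localename) {
        char buf[ULOC_FULLNAME_CAPACITY];
        UErrorCode status = U_ZERO_ERROR;
    
        uloc_toLanguageTag(localename, buf, sizeof(buf), true, &status);
    
        if (U_FAILURE(status)) {
            fprintf(stderr, "could not convert locale name \"%s\" to language tag: %s\n",
                    localename, u_errorName(status));
            return strdup(localename);
        }
        return strdup(buf);
    }
    
    int main(void) {
        for (int i = 0; i < uloc_countAvailable(); i++) {
            const char *locale = uloc_getAvailable(i);
            // Reset per locale so that one failure doesn't poison later calls.
            UErrorCode status = U_ZERO_ERROR;
            char actual_locale[ULOC_FULLNAME_CAPACITY];
    
            UCollator *collator = ucol_open(locale, &status);
            if (U_FAILURE(status)) {
                fprintf(stderr, "could not open collator for \"%s\": %s\n",
                        locale, u_errorName(status));
                continue;
            }
            const char *actual = ucol_getLocaleByType(collator, ULOC_ACTUAL_LOCALE, &status);
            if (U_FAILURE(status) || actual == NULL) {
                ucol_close(collator);
                continue;
            }
            // The returned string is owned by the collator; edit a copy instead.
            snprintf(actual_locale, sizeof(actual_locale), "%s", actual);
    
            // Strip @.*
            char *ptr = strchr(actual_locale, '@');
            if (ptr != NULL) {
                *ptr = '\0';
            }
            // The root collation is represented by an empty name.
            if (strcmp(actual_locale, "root") == 0) {
                actual_locale[0] = '\0';
            }
            // Print only locales that fall back to a different collation.
            if (strcmp(actual_locale, locale) != 0) {
                char *langtag = get_icu_language_tag(locale);
                printf("\"%s\", \"%s\",\n", actual_locale, langtag);
                free(langtag);
            }
            ucol_close(collator);
        }
        return 0;
    }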

compiled and executed using:

    gcc -o main main.c $(pkg-config --libs icu-uc icu-io) && ./main | sort | uniq

--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -572,6 +572,715 @@
 
 	return result;
 }
+
+/*
+ * XXX-Patched: Added a static mapping: collation name (parent) to locale (children)
+ * I'm gonna burn in hell for this...
+ */
+static char* icu_coll_locales[] = {
+	"", "agq",
+	"", "agq-CM",
+	"", "ak",
+	"", "ak-GH",
+	"", "asa",
+	"", "asa-TZ",
+	"", "ast",
+	"", "ast-ES",
+	"", "bas",
+	"", "bas-CM",
+	"", "bem",
+	"", "bem-ZM",
+	"", "bez",
+	"", "bez-TZ",
+	"", "bm",
+	"", "bm-ML",
+	"", "brx",
+	"", "brx-IN",
+	"", "ca",
+	"", "ca-AD",
+	"", "ca-ES",
+	"", "ca-FR",
+	"", "ca-IT",
+	"", "ccp",
+	"", "ccp-BD",
+	"", "ccp-IN",
+	"", "ce",
+	"", "ce-RU",
+	"", "cgg",
+	"", "cgg-UG",
+	"", "ckb",
+	"", "ckb-IQ",
+	"", "ckb-IR",
+	"", "dav",
+	"", "dav-KE",
+	"", "de",
+	"", "de-AT",
+	"", "de-BE",
+	"", "de-CH",
+	"", "de-DE",
+	"", "de-IT",
+	"", "de-LI",
+	"", "de-LU",
+	"", "dje",
+	"", "dje-NE",
+	"", "doi",
+	"", "doi-IN",
+	"", "dua",
+	"", "dua-CM",
+	"", "dyo",
+	"", "dyo-SN",
+	"", "dz",
+	"", "dz-BT",
+	"", "ebu",
+	"", "ebu-KE",
+	"", "en",
+	"", "en-001",
+	"", "en-150",
+	"", "en-AE",
+	"", "en-AG",
+	"", "en-AI",
+	"", "en-AS",
+	"", "en-AT",
+	"", "en-AU",
+	"", "en-BB",
+	"", "en-BE",
+	"", "en-BI",
+	"", "en-BM",
+	"", "en-BS",
+	"", "en-BW",
+	"", "en-BZ",
+	"", "en-CA",
+	"", "en-CC",
+	"", "en-CH",
+	"", "en-CK",
+	"", "en-CM",
+	"", "en-CX",
+	"", "en-CY",
+	"", "en-DE",
+	"", "en-DG",
+	"", "en-DK",
+	"", "en-DM",
+	"", "en-ER",
+	"", "en-FI",
+	"", "en-FJ",
+	"", "en-FK",
+	"", "en-FM",
+	"", "en-GB",
+	"", "en-GD",
+	"", "en-GG",
+	"", "en-GH",
+	"", "en-GI",
+	"", "en-GM",
+	"", "en-GU",
+	"", "en-GY",
+	"", "en-HK",
+	"", "en-IE",
+	"", "en-IL",
+	"", "en-IM",
+	"", "en-IN",
+	"", "en-IO",
+	"", "en-JE",
+	"", "en-JM",
+	"", "en-KE",
+	"", "en-KI",
+	"", "en-KN",
+	"", "en-KY",
+	"", "en-LC",
+	"", "en-LR",
+	"", "en-LS",
+	"", "en-MG",
+	"", "en-MH",
+	"", "en-MO",
+	"", "en-MP",
+	"", "en-MS",
+	"", "en-MT",
+	"", "en-MU",
+	"", "en-MV",
+	"", "en-MW",
+	"", "en-MY",
+	"", "en-NA",
+	"", "en-NF",
+	"", "en-NG",
+	"", "en-NL",
+	"", "en-NR",
+	"", "en-NU",
+	"", "en-NZ",
+	"", "en-PG",
+	"", "en-PH",
+	"", "en-PK",
+	"", "en-PN",
+	"", "en-PR",
+	"", "en-PW",
+	"", "en-RW",
+	"", "en-SB",
+	"", "en-SC",
+	"", "en-SD",
+	"", "en-SE",
+	"", "en-SG",
+	"", "en-SH",
+	"", "en-SI",
+	"", "en-SL",
+	"", "en-SS",
+	"", "en-SX",
+	"", "en-SZ",
+	"", "en-TC",
+	"", "en-TK",
+	"", "en-TO",
+	"", "en-TT",
+	"", "en-TV",
+	"", "en-TZ",
+	"", "en-UG",
+	"", "en-UM",
+	"", "en-US",
+	"", "en-VC",
+	"", "en-VG",
+	"", "en-VI",
+	"", "en-VU",
+	"", "en-WS",
+	"", "en-ZA",
+	"", "en-ZM",
+	"", "en-ZW",
+	"", "eu",
+	"", "eu-ES",
+	"", "ewo",
+	"", "ewo-CM",
+	"", "ff",
+	"", "ff-Latn",
+	"", "ff-Latn-BF",
+	"", "ff-Latn-CM",
+	"", "ff-Latn-GH",
+	"", "ff-Latn-GM",
+	"", "ff-Latn-GN",
+	"", "ff-Latn-GW",
+	"", "ff-Latn-LR",
+	"", "ff-Latn-MR",
+	"", "ff-Latn-NE",
+	"", "ff-Latn-NG",
+	"", "ff-Latn-SL",
+	"", "ff-Latn-SN",
+	"", "fr",
+	"", "fr-BE",
+	"", "fr-BF",
+	"", "fr-BI",
+	"", "fr-BJ",
+	"", "fr-BL",
+	"", "fr-CD",
+	"", "fr-CF",
+	"", "fr-CG",
+	"", "fr-CH",
+	"", "fr-CI",
+	"", "fr-CM",
+	"", "fr-DJ",
+	"", "fr-DZ",
+	"", "fr-FR",
+	"", "fr-GA",
+	"", "fr-GF",
+	"", "fr-GN",
+	"", "fr-GP",
+	"", "fr-GQ",
+	"", "fr-HT",
+	"", "fr-KM",
+	"", "fr-LU",
+	"", "fr-MA",
+	"", "fr-MC",
+	"", "fr-MF",
+	"", "fr-MG",
+	"", "fr-ML",
+	"", "fr-MQ",
+	"", "fr-MR",
+	"", "fr-MU",
+	"", "fr-NC",
+	"", "fr-NE",
+	"", "fr-PF",
+	"", "fr-PM",
+	"", "fr-RE",
+	"", "fr-RW",
+	"", "fr-SC",
+	"", "fr-SN",
+	"", "fr-SY",
+	"", "fr-TD",
+	"", "fr-TG",
+	"", "fr-TN",
+	"", "fr-VU",
+	"", "fr-WF",
+	"", "fr-YT",
+	"", "fur",
+	"", "fur-IT",
+	"", "fy",
+	"", "fy-NL",
+	"", "ga",
+	"", "ga-GB",
+	"", "ga-IE",
+	"", "gd",
+	"", "gd-GB",
+	"", "gsw",
+	"", "gsw-CH",
+	"", "gsw-FR",
+	"", "gsw-LI",
+	"", "guz",
+	"", "guz-KE",
+	"", "gv",
+	"", "gv-IM",
+	"", "ia",
+	"", "ia-001",
+	"", "id",
+	"", "id-ID",
+	"", "ii",
+	"", "ii-CN",
+	"", "it",
+	"", "it-CH",
+	"", "it-IT",
+	"", "it-SM",
+	"", "it-VA",
+	"", "jgo",
+	"", "jgo-CM",
+	"", "jmc",
+	"", "jmc-TZ",
+	"", "jv",
+	"", "jv-ID",
+	"", "kab",
+	"", "kab-DZ",
+	"", "kam",
+	"", "kam-KE",
+	"", "kde",
+	"", "kde-TZ",
+	"", "kea",
+	"", "kea-CV",
+	"", "kgp",
+	"", "kgp-BR",
+	"", "khq",
+	"", "khq-ML",
+	"", "ki",
+	"", "ki-KE",
+	"", "kkj",
+	"", "kkj-CM",
+	"", "kln",
+	"", "kln-KE",
+	"", "ks",
+	"", "ks-Arab",
+	"", "ks-Arab-IN",
+	"", "ks-Deva",
+	"", "ks-Deva-IN",
+	"", "ksb",
+	"", "ksb-TZ",
+	"", "ksf",
+	"", "ksf-CM",
+	"", "ksh",
+	"", "ksh-DE",
+	"", "kw",
+	"", "kw-GB",
+	"", "lag",
+	"", "lag-TZ",
+	"", "lb",
+	"", "lb-LU",
+	"", "lg",
+	"", "lg-UG",
+	"", "lrc",
+	"", "lrc-IQ",
+	"", "lrc-IR",
+	"", "lu",
+	"", "lu-CD",
+	"", "luo",
+	"", "luo-KE",
+	"", "luy",
+	"", "luy-KE",
+	"", "mai",
+	"", "mai-IN",
+	"", "mas",
+	"", "mas-KE",
+	"", "mas-TZ",
+	"", "mer",
+	"", "mer-KE",
+	"", "mfe",
+	"", "mfe-MU",
+	"", "mg",
+	"", "mg-MG",
+	"", "mgh",
+	"", "mgh-MZ",
+	"", "mgo",
+	"", "mgo-CM",
+	"", "mi",
+	"", "mi-NZ",
+	"", "mni",
+	"", "mni-Beng",
+	"", "mni-Beng-IN",
+	"", "ms",
+	"", "ms-BN",
+	"", "ms-ID",
+	"", "ms-MY",
+	"", "ms-SG",
+	"", "mua",
+	"", "mua-CM",
+	"", "mzn",
+	"", "mzn-IR",
+	"", "naq",
+	"", "naq-NA",
+	"", "nd",
+	"", "nd-ZW",
+	"", "nl",
+	"", "nl-AW",
+	"", "nl-BE",
+	"", "nl-BQ",
+	"", "nl-CW",
+	"", "nl-NL",
+	"", "nl-SR",
+	"", "nl-SX",
+	"", "nmg",
+	"", "nmg-CM",
+	"", "nnh",
+	"", "nnh-CM",
+	"", "nus",
+	"", "nus-SS",
+	"", "nyn",
+	"", "nyn-UG",
+	"", "os",
+	"", "os-GE",
+	"", "os-RU",
+	"", "pcm",
+	"", "pcm-NG",
+	"", "pt",
+	"", "pt-AO",
+	"", "pt-BR",
+	"", "pt-CH",
+	"", "pt-CV",
+	"", "pt-GQ",
+	"", "pt-GW",
+	"", "pt-LU",
+	"", "pt-MO",
+	"", "pt-MZ",
+	"", "pt-PT",
+	"", "pt-ST",
+	"", "pt-TL",
+	"", "qu",
+	"", "qu-BO",
+	"", "qu-EC",
+	"", "qu-PE",
+	"", "rm",
+	"", "rm-CH",
+	"", "rn",
+	"", "rn-BI",
+	"", "rof",
+	"", "rof-TZ",
+	"", "rw",
+	"", "rw-RW",
+	"", "rwk",
+	"", "rwk-TZ",
+	"", "sa",
+	"", "sa-IN",
+	"", "sah",
+	"", "sah-RU",
+	"", "saq",
+	"", "saq-KE",
+	"", "sat",
+	"", "sat-Olck",
+	"", "sat-Olck-IN",
+	"", "sbp",
+	"", "sbp-TZ",
+	"", "sc",
+	"", "sc-IT",
+	"", "sd",
+	"", "sd-Arab",
+	"", "sd-Arab-PK",
+	"", "sd-Deva",
+	"", "sd-Deva-IN",
+	"", "seh",
+	"", "seh-MZ",
+	"", "ses",
+	"", "ses-ML",
+	"", "sg",
+	"", "sg-CF",
+	"", "shi",
+	"", "shi-Latn",
+	"", "shi-Latn-MA",
+	"", "shi-Tfng",
+	"", "shi-Tfng-MA",
+	"", "sn",
+	"", "sn-ZW",
+	"", "so",
+	"", "so-DJ",
+	"", "so-ET",
+	"", "so-KE",
+	"", "so-SO",
+	"", "su",
+	"", "su-Latn",
+	"", "su-Latn-ID",
+	"", "sw",
+	"", "sw-CD",
+	"", "sw-KE",
+	"", "sw-TZ",
+	"", "sw-UG",
+	"", "teo",
+	"", "teo-KE",
+	"", "teo-UG",
+	"", "tg",
+	"", "tg-TJ",
+	"", "ti",
+	"", "ti-ER",
+	"", "ti-ET",
+	"", "tt",
+	"", "tt-RU",
+	"", "twq",
+	"", "twq-NE",
+	"", "tzm",
+	"", "tzm-MA",
+	"", "vai",
+	"", "vai-Latn",
+	"", "vai-Latn-LR",
+	"", "vai-Vaii",
+	"", "vai-Vaii-LR",
+	"", "vun",
+	"", "vun-TZ",
+	"", "wae",
+	"", "wae-CH",
+	"", "xh",
+	"", "xh-ZA",
+	"", "xog",
+	"", "xog-UG",
+	"", "yav",
+	"", "yav-CM",
+	"", "yrl",
+	"", "yrl-BR",
+	"", "yrl-CO",
+	"", "yrl-VE",
+	"", "zgh",
+	"", "zgh-MA",
+	"", "zu",
+	"", "zu-ZA",
+	"af", "af-NA",
+	"af", "af-ZA",
+	"am", "am-ET",
+	"ar", "ar-001",
+	"ar", "ar-AE",
+	"ar", "ar-BH",
+	"ar", "ar-DJ",
+	"ar", "ar-DZ",
+	"ar", "ar-EG",
+	"ar", "ar-EH",
+	"ar", "ar-ER",
+	"ar", "ar-IL",
+	"ar", "ar-IQ",
+	"ar", "ar-JO",
+	"ar", "ar-KM",
+	"ar", "ar-KW",
+	"ar", "ar-LB",
+	"ar", "ar-LY",
+	"ar", "ar-MA",
+	"ar", "ar-MR",
+	"ar", "ar-OM",
+	"ar", "ar-PS",
+	"ar", "ar-QA",
+	"ar", "ar-SA",
+	"ar", "ar-SD",
+	"ar", "ar-SO",
+	"ar", "ar-SS",
+	"ar", "ar-SY",
+	"ar", "ar-TD",
+	"ar", "ar-TN",
+	"ar", "ar-YE",
+	"as", "as-IN",
+	"az", "az-Cyrl",
+	"az", "az-Cyrl-AZ",
+	"az", "az-Latn",
+	"az", "az-Latn-AZ",
+	"be", "be-BY",
+	"bg", "bg-BG",
+	"bn", "bn-BD",
+	"bn", "bn-IN",
+	"bo", "bo-CN",
+	"bo", "bo-IN",
+	"br", "br-FR",
+	"bs", "bs-Latn",
+	"bs", "bs-Latn-BA",
+	"bs_Cyrl", "bs-Cyrl-BA",
+	"ceb", "ceb-PH",
+	"chr", "chr-US",
+	"cs", "cs-CZ",
+	"cy", "cy-GB",
+	"da", "da-DK",
+	"da", "da-GL",
+	"dsb", "dsb-DE",
+	"ee", "ee-GH",
+	"ee", "ee-TG",
+	"el", "el-CY",
+	"el", "el-GR",
+	"eo", "eo-001",
+	"es", "es-419",
+	"es", "es-AR",
+	"es", "es-BO",
+	"es", "es-BR",
+	"es", "es-BZ",
+	"es", "es-CL",
+	"es", "es-CO",
+	"es", "es-CR",
+	"es", "es-CU",
+	"es", "es-DO",
+	"es", "es-EA",
+	"es", "es-EC",
+	"es", "es-ES",
+	"es", "es-GQ",
+	"es", "es-GT",
+	"es", "es-HN",
+	"es", "es-IC",
+	"es", "es-MX",
+	"es", "es-NI",
+	"es", "es-PA",
+	"es", "es-PE",
+	"es", "es-PH",
+	"es", "es-PR",
+	"es", "es-PY",
+	"es", "es-SV",
+	"es", "es-US",
+	"es", "es-UY",
+	"es", "es-VE",
+	"et", "et-EE",
+	"fa", "fa-IR",
+	"ff_Adlm", "ff-Adlm-BF",
+	"ff_Adlm", "ff-Adlm-CM",
+	"ff_Adlm", "ff-Adlm-GH",
+	"ff_Adlm", "ff-Adlm-GM",
+	"ff_Adlm", "ff-Adlm-GN",
+	"ff_Adlm", "ff-Adlm-GW",
+	"ff_Adlm", "ff-Adlm-LR",
+	"ff_Adlm", "ff-Adlm-MR",
+	"ff_Adlm", "ff-Adlm-NE",
+	"ff_Adlm", "ff-Adlm-NG",
+	"ff_Adlm", "ff-Adlm-SL",
+	"ff_Adlm", "ff-Adlm-SN",
+	"fi", "fi-FI",
+	"fil", "fil-PH",
+	"fo", "fo-DK",
+	"fo", "fo-FO",
+	"gl", "gl-ES",
+	"gu", "gu-IN",
+	"ha", "ha-GH",
+	"ha", "ha-NE",
+	"ha", "ha-NG",
+	"haw", "haw-US",
+	"he", "he-IL",
+	"hi", "hi-IN",
+	"hi", "hi-Latn",
+	"hi", "hi-Latn-IN",
+	"hr", "hr-BA",
+	"hr", "hr-HR",
+	"hsb", "hsb-DE",
+	"hu", "hu-HU",
+	"hy", "hy-AM",
+	"ig", "ig-NG",
+	"is", "is-IS",
+	"ja", "ja-JP",
+	"ka", "ka-GE",
+	"kk", "kk-KZ",
+	"kl", "kl-GL",
+	"km", "km-KH",
+	"kn", "kn-IN",
+	"ko", "ko-KP",
+	"ko", "ko-KR",
+	"kok", "kok-IN",
+	"ku", "ku-TR",
+	"ky", "ky-KG",
+	"lkt", "lkt-US",
+	"ln", "ln-AO",
+	"ln", "ln-CD",
+	"ln", "ln-CF",
+	"ln", "ln-CG",
+	"lo", "lo-LA",
+	"lt", "lt-LT",
+	"lv", "lv-LV",
+	"mk", "mk-MK",
+	"ml", "ml-IN",
+	"mn", "mn-MN",
+	"mr", "mr-IN",
+	"mt", "mt-MT",
+	"my", "my-MM",
+	"ne", "ne-IN",
+	"ne", "ne-NP",
+	"no", "nb",
+	"no", "nb-NO",
+	"no", "nb-SJ",
+	"no", "nn",
+	"no", "nn-NO",
+	"om", "om-ET",
+	"om", "om-KE",
+	"or", "or-IN",
+	"pa", "pa-Arab",
+	"pa", "pa-Arab-PK",
+	"pa", "pa-Guru",
+	"pa", "pa-Guru-IN",
+	"pl", "pl-PL",
+	"ps", "ps-AF",
+	"ps", "ps-PK",
+	"ro", "ro-MD",
+	"ro", "ro-RO",
+	"ru", "ru-BY",
+	"ru", "ru-KG",
+	"ru", "ru-KZ",
+	"ru", "ru-MD",
+	"ru", "ru-RU",
+	"ru", "ru-UA",
+	"se", "se-FI",
+	"se", "se-NO",
+	"se", "se-SE",
+	"si", "si-LK",
+	"sk", "sk-SK",
+	"sl", "sl-SI",
+	"smn", "smn-FI",
+	"sq", "sq-AL",
+	"sq", "sq-MK",
+	"sq", "sq-XK",
+	"sr", "sr-Cyrl",
+	"sr", "sr-Cyrl-BA",
+	"sr", "sr-Cyrl-ME",
+	"sr", "sr-Cyrl-RS",
+	"sr", "sr-Cyrl-XK",
+	"sr_Latn", "sr-Latn-BA",
+	"sr_Latn", "sr-Latn-ME",
+	"sr_Latn", "sr-Latn-RS",
+	"sr_Latn", "sr-Latn-XK",
+	"sv", "sv-AX",
+	"sv", "sv-FI",
+	"sv", "sv-SE",
+	"ta", "ta-IN",
+	"ta", "ta-LK",
+	"ta", "ta-MY",
+	"ta", "ta-SG",
+	"te", "te-IN",
+	"th", "th-TH",
+	"tk", "tk-TM",
+	"to", "to-TO",
+	"tr", "tr-CY",
+	"tr", "tr-TR",
+	"ug", "ug-CN",
+	"uk", "uk-UA",
+	"ur", "ur-IN",
+	"ur", "ur-PK",
+	"uz", "uz-Arab",
+	"uz", "uz-Arab-AF",
+	"uz", "uz-Cyrl",
+	"uz", "uz-Cyrl-UZ",
+	"uz", "uz-Latn",
+	"uz", "uz-Latn-UZ",
+	"vi", "vi-VN",
+	"wo", "wo-SN",
+	"yi", "yi-001",
+	"yo", "yo-BJ",
+	"yo", "yo-NG",
+	"zh", "yue",
+	"zh", "yue-Hans",
+	"zh", "yue-Hans-CN",
+	"zh", "yue-Hant",
+	"zh", "yue-Hant-HK",
+	"zh", "zh-Hans",
+	"zh", "zh-Hans-CN",
+	"zh", "zh-Hans-HK",
+	"zh", "zh-Hans-MO",
+	"zh", "zh-Hans-SG",
+	"zh", "zh-Hant",
+	"zh", "zh-Hant-HK",
+	"zh", "zh-Hant-MO",
+	"zh", "zh-Hant-TW",
+	NULL, NULL,
+};
+
 #endif							/* USE_ICU */
 
 
@@ -772,18 +1481,19 @@
 		 * Start the loop at -1 to sneak in the root locale without too much
 		 * code duplication.
 		 */
-		for (i = -1; i < uloc_countAvailable(); i++)
+		for (i = -1; i < ucol_countAvailable(); i++)  /* XXX-Patched: changed from uloc_countAvailable() */
 		{
 			const char *name;
 			char	   *langtag;
 			char	   *icucomment;
 			const char *iculocstr;
 			Oid			collid;
+			char	   **ptr;  /* XXX-Patched: added */
 
 			if (i == -1)
 				name = "";		/* ICU root locale */
 			else
-				name = uloc_getAvailable(i);
+				name = ucol_getAvailable(i);  /* XXX-Patched: changed from uloc_getAvailable() */
 
 			langtag = get_icu_language_tag(name);
 			iculocstr = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name;
@@ -812,6 +1523,44 @@
 					CreateComments(collid, CollationRelationId, 0,
 								   icucomment);
 			}
+
+			/*
+			 * XXX-Patched: The following block is added to create collations also for derived
+			 * locales (combination of language+country/region).
+			 * It's terribly inefficient, but in the big picture, it doesn't matter that much
+			 * (it's typically called only once in the life of the cluster).
+			 */
+			for (ptr = icu_coll_locales; *ptr != NULL; ptr++)
+			{
+				/*
+				 * icu_coll_locales is a 1D array of pairs: collation name and locale (langtag).
+				 * ptr++ moves pointer to the second string of the pair and it's a post-increment,
+				 * so after the comparison with name is evaluated.
+				 */
+				if (strcmp(*ptr++, name) == 0) {
+					const char *langtag;
+
+					langtag = pstrdup(*ptr);
+					collid = CollationCreate(psprintf("%s-x-icu", langtag),
+										 nspid, GetUserId(),
+										 COLLPROVIDER_ICU, true, -1,
+										 NULL, NULL, langtag,
+										 get_collation_actual_version(COLLPROVIDER_ICU, langtag),
+										 true, true);
+
+					if (OidIsValid(collid))
+					{
+						ncreated++;
+
+						CommandCounterIncrement();
+
+						icucomment = get_icu_locale_comment(langtag);
+						if (icucomment)
+							CreateComments(collid, CollationRelationId, 0,
+										   icucomment);
+					}
+				}
+			}
 		}
 	}
 #endif							/* USE_ICU */
backports/postgresql15: new aport 2023-07-12 17:16:48 -04:00
			`+ "es", "es-CO",`
			`+ "es", "es-CR",`
			`+ "es", "es-CU",`
			`+ "es", "es-DO",`
			`+ "es", "es-EA",`
			`+ "es", "es-EC",`
			`+ "es", "es-ES",`
			`+ "es", "es-GQ",`
			`+ "es", "es-GT",`
			`+ "es", "es-HN",`
			`+ "es", "es-IC",`
			`+ "es", "es-MX",`
			`+ "es", "es-NI",`
			`+ "es", "es-PA",`
			`+ "es", "es-PE",`
			`+ "es", "es-PH",`
			`+ "es", "es-PR",`
			`+ "es", "es-PY",`
			`+ "es", "es-SV",`
			`+ "es", "es-US",`
			`+ "es", "es-UY",`
			`+ "es", "es-VE",`
			`+ "et", "et-EE",`
			`+ "fa", "fa-IR",`
			`+ "ff_Adlm", "ff-Adlm-BF",`
			`+ "ff_Adlm", "ff-Adlm-CM",`
			`+ "ff_Adlm", "ff-Adlm-GH",`
			`+ "ff_Adlm", "ff-Adlm-GM",`
			`+ "ff_Adlm", "ff-Adlm-GN",`
			`+ "ff_Adlm", "ff-Adlm-GW",`
			`+ "ff_Adlm", "ff-Adlm-LR",`
			`+ "ff_Adlm", "ff-Adlm-MR",`
			`+ "ff_Adlm", "ff-Adlm-NE",`
			`+ "ff_Adlm", "ff-Adlm-NG",`
			`+ "ff_Adlm", "ff-Adlm-SL",`
			`+ "ff_Adlm", "ff-Adlm-SN",`
			`+ "fi", "fi-FI",`
			`+ "fil", "fil-PH",`
			`+ "fo", "fo-DK",`
			`+ "fo", "fo-FO",`
			`+ "gl", "gl-ES",`
			`+ "gu", "gu-IN",`
			`+ "ha", "ha-GH",`
			`+ "ha", "ha-NE",`
			`+ "ha", "ha-NG",`
			`+ "haw", "haw-US",`
			`+ "he", "he-IL",`
			`+ "hi", "hi-IN",`
			`+ "hi", "hi-Latn",`
			`+ "hi", "hi-Latn-IN",`
			`+ "hr", "hr-BA",`
			`+ "hr", "hr-HR",`
			`+ "hsb", "hsb-DE",`
			`+ "hu", "hu-HU",`
			`+ "hy", "hy-AM",`
			`+ "ig", "ig-NG",`
			`+ "is", "is-IS",`
			`+ "ja", "ja-JP",`
			`+ "ka", "ka-GE",`
			`+ "kk", "kk-KZ",`
			`+ "kl", "kl-GL",`
			`+ "km", "km-KH",`
			`+ "kn", "kn-IN",`
			`+ "ko", "ko-KP",`
			`+ "ko", "ko-KR",`
			`+ "kok", "kok-IN",`
			`+ "ku", "ku-TR",`
			`+ "ky", "ky-KG",`
			`+ "lkt", "lkt-US",`
			`+ "ln", "ln-AO",`
			`+ "ln", "ln-CD",`
			`+ "ln", "ln-CF",`
			`+ "ln", "ln-CG",`
			`+ "lo", "lo-LA",`
			`+ "lt", "lt-LT",`
			`+ "lv", "lv-LV",`
			`+ "mk", "mk-MK",`
			`+ "ml", "ml-IN",`
			`+ "mn", "mn-MN",`
			`+ "mr", "mr-IN",`
			`+ "mt", "mt-MT",`
			`+ "my", "my-MM",`
			`+ "ne", "ne-IN",`
			`+ "ne", "ne-NP",`
			`+ "no", "nb",`
			`+ "no", "nb-NO",`
			`+ "no", "nb-SJ",`
			`+ "no", "nn",`
			`+ "no", "nn-NO",`
			`+ "om", "om-ET",`
			`+ "om", "om-KE",`
			`+ "or", "or-IN",`
			`+ "pa", "pa-Arab",`
			`+ "pa", "pa-Arab-PK",`
			`+ "pa", "pa-Guru",`
			`+ "pa", "pa-Guru-IN",`
			`+ "pl", "pl-PL",`
			`+ "ps", "ps-AF",`
			`+ "ps", "ps-PK",`
			`+ "ro", "ro-MD",`
			`+ "ro", "ro-RO",`
			`+ "ru", "ru-BY",`
			`+ "ru", "ru-KG",`
			`+ "ru", "ru-KZ",`
			`+ "ru", "ru-MD",`
			`+ "ru", "ru-RU",`
			`+ "ru", "ru-UA",`
			`+ "se", "se-FI",`
			`+ "se", "se-NO",`
			`+ "se", "se-SE",`
			`+ "si", "si-LK",`
			`+ "sk", "sk-SK",`
			`+ "sl", "sl-SI",`
			`+ "smn", "smn-FI",`
			`+ "sq", "sq-AL",`
			`+ "sq", "sq-MK",`
			`+ "sq", "sq-XK",`
			`+ "sr", "sr-Cyrl",`
			`+ "sr", "sr-Cyrl-BA",`
			`+ "sr", "sr-Cyrl-ME",`
			`+ "sr", "sr-Cyrl-RS",`
			`+ "sr", "sr-Cyrl-XK",`
			`+ "sr_Latn", "sr-Latn-BA",`
			`+ "sr_Latn", "sr-Latn-ME",`
			`+ "sr_Latn", "sr-Latn-RS",`
			`+ "sr_Latn", "sr-Latn-XK",`
			`+ "sv", "sv-AX",`
			`+ "sv", "sv-FI",`
			`+ "sv", "sv-SE",`
			`+ "ta", "ta-IN",`
			`+ "ta", "ta-LK",`
			`+ "ta", "ta-MY",`
			`+ "ta", "ta-SG",`
			`+ "te", "te-IN",`
			`+ "th", "th-TH",`
			`+ "tk", "tk-TM",`
			`+ "to", "to-TO",`
			`+ "tr", "tr-CY",`
			`+ "tr", "tr-TR",`
			`+ "ug", "ug-CN",`
			`+ "uk", "uk-UA",`
			`+ "ur", "ur-IN",`
			`+ "ur", "ur-PK",`
			`+ "uz", "uz-Arab",`
			`+ "uz", "uz-Arab-AF",`
			`+ "uz", "uz-Cyrl",`
			`+ "uz", "uz-Cyrl-UZ",`
			`+ "uz", "uz-Latn",`
			`+ "uz", "uz-Latn-UZ",`
			`+ "vi", "vi-VN",`
			`+ "wo", "wo-SN",`
			`+ "yi", "yi-001",`
			`+ "yo", "yo-BJ",`
			`+ "yo", "yo-NG",`
			`+ "zh", "yue",`
			`+ "zh", "yue-Hans",`
			`+ "zh", "yue-Hans-CN",`
			`+ "zh", "yue-Hant",`
			`+ "zh", "yue-Hant-HK",`
			`+ "zh", "zh-Hans",`
			`+ "zh", "zh-Hans-CN",`
			`+ "zh", "zh-Hans-HK",`
			`+ "zh", "zh-Hans-MO",`
			`+ "zh", "zh-Hans-SG",`
			`+ "zh", "zh-Hant",`
			`+ "zh", "zh-Hant-HK",`
			`+ "zh", "zh-Hant-MO",`
			`+ "zh", "zh-Hant-TW",`
			`+ NULL, NULL,`
			`+};`
			`+`
			`#endif /* USE_ICU */`


			`@@ -772,18 +1481,19 @@`
			`* Start the loop at -1 to sneak in the root locale without too much`
			`* code duplication.`
			`*/`
			`- for (i = -1; i < uloc_countAvailable(); i++)`
			`+ for (i = -1; i < ucol_countAvailable(); i++) /* XXX-Patched: changed from uloc_countAvailable() */`
			`{`
			`const char *name;`
			`char *langtag;`
			`char *icucomment;`
			`const char *iculocstr;`
			`Oid collid;`
			`+ char **ptr; /* XXX-Patched: added */`

			`if (i == -1)`
			`name = ""; /* ICU root locale */`
			`else`
			`- name = uloc_getAvailable(i);`
			`+ name = ucol_getAvailable(i); /* XXX-Patched: changed from uloc_getAvailable() */`

			`langtag = get_icu_language_tag(name);`
			`iculocstr = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name;`
			`@@ -812,6 +1523,44 @@`
			`CreateComments(collid, CollationRelationId, 0,`
			`icucomment);`
			`}`
			`+`
			`+ /*`
			`+ * XXX-Patched: The following block is added to create collations also for derived`
			`+ * locales (combination of language+country/region).`
			`+ * It's terribly inefficient, but in the big picture, it doesn't matter that much`
			`+ * (it's typically called only once in the life of the cluster).`
			`+ */`
			`+ for (ptr = icu_coll_locales; *ptr != NULL; ptr++)`
			`+ {`
			`+ /*`
			`+ * icu_coll_locales is a flat array of pairs: collation name, then locale (langtag).`
			`+ * The post-increment in *ptr++ yields the collation name for strcmp() and then`
			`+ * advances ptr to the locale of the same pair; the loop's ptr++ skips to the next pair.`
			`+ */`
			`+ if (strcmp(*ptr++, name) == 0) {`
			`+ const char *langtag;`
			`+`
			`+ langtag = pstrdup(*ptr);`
			`+ collid = CollationCreate(psprintf("%s-x-icu", langtag),`
			`+ nspid, GetUserId(),`
			`+ COLLPROVIDER_ICU, true, -1,`
			`+ NULL, NULL, langtag,`
			`+ get_collation_actual_version(COLLPROVIDER_ICU, langtag),`
			`+ true, true);`
			`+`
			`+ if (OidIsValid(collid))`
			`+ {`
			`+ ncreated++;`
			`+`
			`+ CommandCounterIncrement();`
			`+`
			`+ icucomment = get_icu_locale_comment(langtag);`
			`+ if (icucomment)`
			`+ CreateComments(collid, CollationRelationId, 0,`
			`+ icucomment);`
			`+ }`
			`+ }`
			`+ }`
			`}`
			`}`
			`#endif /* USE_ICU */`
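
The lookup over the flat pair table can also be illustrated outside PostgreSQL. The following is a minimal standalone sketch, not part of the patch: `sample_pairs` and `children_of` are hypothetical names, and the table holds only a handful of entries copied from the list above. It walks the array two slots at a time instead of relying on the post-increment inside `strcmp()`, which should visit the same pairs in the same order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sample table (an assumption for illustration): flat pairs of
 * collation name (parent) and locale (child), terminated by a NULL pair,
 * in the same layout as icu_coll_locales.
 */
static const char *const sample_pairs[] = {
	"", "nl",
	"", "nl-BE",
	"ar", "ar-EG",
	"ar", "ar-SA",
	"sr_Latn", "sr-Latn-RS",
	NULL, NULL,
};

/*
 * Store up to max child locales of parent in out (out may be NULL) and
 * return the total number of matching pairs.
 */
static size_t
children_of(const char *const *pairs, const char *parent,
			const char **out, size_t max)
{
	size_t		n = 0;
	const char *const *p;

	assert(pairs != NULL && parent != NULL);

	/* p[0] is the collation name, p[1] the locale; step by one pair. */
	for (p = pairs; p[0] != NULL; p += 2)
	{
		if (strcmp(p[0], parent) == 0)
		{
			if (out != NULL && n < max)
				out[n] = p[1];
			n++;
		}
	}
	return n;
}
```

Calling `children_of(sample_pairs, "ar", out, 4)` with a four-element `out` array would fill in `ar-EG` and `ar-SA`, the locales for which the patched loop would create additional `*-x-icu` collations once `ucol_getAvailable()` returns `ar`. The scan is linear per collation, matching the "inefficient but rarely called" trade-off noted in the patch comment.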