Python script to generate a .json.gz file per each locale #5
base: master
Let's also have this output the necessary lines to copy into .htaccess to map locale parameter values to the files being written. We mostly won't need to update them, but it's good to keep them coupled to the available locales, and the logic might need tweaking.
And for that logic, we basically want to implement this code:
…but entirely with rewrite rules. (I haven't totally thought through how possible this is, but I think it's mostly doable by writing out a lot of rules in a given order. Let's see what we can do.)
en-US → en-US (exact match)
ar → ar (exact match)
de-DE → de (matching language part)
ca → ca-AD (matching language part)
en-NZ → en-US (prefer en-US over other available en locales for inexact match)
pt → pt-PT (prefer country code matching language code if unspecified — this one's debatable, and 217 million Brazilians would presumably disagree, but it's what we do elsewhere)
zh → zh-CN (this we just get from sorting the country codes, but it makes sense for our userbase)
zz → full file (ignore locale parameter if language code is unknown)
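For reference, the mapping described by the examples above could be sketched in Python roughly as follows. This is only an illustration inferred from the listed cases, not the code referred to earlier in this comment; the function name `resolve_locale` and the sample locale list are hypothetical.

```python
def resolve_locale(requested, available):
    """Return the best available locale for `requested`, or None if the
    language code is unknown (the caller then serves the full file)."""
    # 1. Exact match (en-US -> en-US, ar -> ar)
    if requested in available:
        return requested

    # 2. Bare language code is itself available (de-DE -> de)
    lang = requested.split('-')[0]
    if lang in available:
        return lang

    # 3. Other locales sharing the language part, sorted by code
    candidates = sorted(l for l in available if l.split('-')[0] == lang)
    if not candidates:
        return None  # zz -> full file

    # Prefer en-US over other English locales (en-NZ -> en-US)
    if lang == 'en' and 'en-US' in candidates:
        return 'en-US'

    # Prefer the country code matching the language code (pt -> pt-PT)
    same_country = f"{lang}-{lang.upper()}"
    if same_country in candidates:
        return same_country

    # Otherwise take the first by sort order (zh -> zh-CN, ca -> ca-AD)
    return candidates[0]
```

The rewrite rules would then need to reproduce the same precedence purely through rule order.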
scripts/update-gz.py
Outdated
schema_text = f.read()
schema = json.loads(schema_text)

if not os.path.exists('../locales'):
Script should be runnable from any folder. You're already getting the parent folder above, so you should just use that for other paths.
You should also wipe locales to remove any existing files. (Locales will pretty much never be removed, but just to fix possible bugs, etc.)
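A minimal sketch of what this could look like, assuming the script lives one level below the repository root (as `scripts/update-gz.py` does). The helper name `reset_locales_folder` is hypothetical; in the script itself one would pass `__file__`.

```python
import os
import shutil


def reset_locales_folder(script_path):
    """Recreate an empty 'locales' folder in the parent of the script's folder."""
    # Resolve the root from the script's own location, not the current working directory
    root = os.path.dirname(os.path.dirname(os.path.abspath(script_path)))
    locales_folder = os.path.join(root, 'locales')
    # Wipe any files left over from a previous run, then start fresh
    shutil.rmtree(locales_folder, ignore_errors=True)
    os.mkdir(locales_folder)
    return locales_folder
```

Called as `reset_locales_folder(__file__)`, this behaves the same regardless of the folder the script is run from.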
Understood, this has been done.
scripts/update-gz.py
Outdated
for creator_type in item_type['creatorTypes']:
    if creator_type['creatorType'] in current_locale["itemTypes"]:
        creator_type['creatorType'] = current_locale["itemTypes"][creator_type['creatorType']]
del schema_with_one_locale['locales']
Definitely don't need all of this; sorry for not specifying. Can just keep the one locale keyed properly in locales, as below.
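One way to read this suggestion, as a rough sketch: leave the rest of the schema untouched and only narrow `locales` down to the single requested key before writing the gzipped file. The helper name `write_locale_file` and the `<locale>.json.gz` naming are assumptions for illustration.

```python
import gzip
import json
import os


def write_locale_file(schema, locale, locales_folder):
    """Write a gzipped copy of `schema` whose 'locales' holds only `locale`."""
    # Shallow copy is enough: only the top-level 'locales' key is replaced
    single = dict(schema)
    single['locales'] = {locale: schema['locales'][locale]}
    path = os.path.join(locales_folder, f'{locale}.json.gz')
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        json.dump(single, f, ensure_ascii=False)
    return path
```

The original `schema` dict is not modified, so the same object can be reused for every locale in a loop.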
Got rid of it!
Added the generation of the .htaccess file. The example output is as below:
Each of these rules matches |
scripts/update-gz.py
Outdated
os.mkdir(locales_folder)

# String that accumulates the rules to paste into htaccess
htaccess_rules = f"RewriteRule ^schema/({'|'.join(schema['locales'].keys())})$ /zotero-schema/locales/$1.gz [L]"
It should be a locale query string parameter, not a path component.
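As a sketch of the difference: a `RewriteRule` pattern only sees the URL path, so matching a query string parameter needs a `RewriteCond` on `%{QUERY_STRING}`, with the captured value referenced as `%1` in the rule. The helper name `exact_match_rules`, the target path, and the `[QSD,L]` flags here are assumptions for illustration, not the final rules.

```python
def exact_match_rules(locales):
    """Build a query-string pattern and a RewriteCond/RewriteRule pair for exact matches."""
    # Match locale=<code> as a whole parameter, anywhere in the query string
    pattern = f"(?:^|&)locale=({'|'.join(locales)})(?:&|$)"
    rules = (
        f"RewriteCond %{{QUERY_STRING}} {pattern}\n"
        # %1 refers to the group captured by the RewriteCond above
        f"RewriteRule ^schema(/.*)?$ /zotero-schema/locales/%1.json.gz [QSD,L]"
    )
    return pattern, rules
```

The returned pattern can be checked with Python's `re` module against sample query strings before the rules are pasted into .htaccess.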
scripts/update-gz.py
Outdated
# Catch all for default schema with all locales
htaccess_rules += f'\nRewriteRule ^schema/* /zotero-schema/schema.json.gz [L]'

print("--- .htacess rules --- \n" + htaccess_rules + "\n--- ---")
Can skip the header and footer
Can replace update-gz with update-gz.py, no extension, mode 755
I think we'll need some additional conditions/rules for the locale gz files, like we already have for schema.json.gz: E.g., checking
Let's just generate the whole block in this script. And we don't need newlines; better to keep this as a single block.
Not sure it matters, but since we're generating this in a script anyway and know the total count, we can probably add a skip flag (
(Will this all be meaningfully faster than just having a single schema.php file that generates a locale-specific file and dumps it in memcached? Unclear. But this is certainly the more fun way to do it…)
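A rough sketch of how the skip count could be filled in by the script. Apache's `S=N` flag skips the next N rules, and `RewriteCond` lines belong to the rule that follows them, so counting `RewriteRule` lines after the skip rule should give the right number. The `LINES_TO_SKIP` placeholder and the helper name `with_skip_count` are assumptions for illustration.

```python
def with_skip_count(block):
    """Replace the LINES_TO_SKIP placeholder with the number of rules that follow it."""
    lines = block.splitlines()
    # Locate the rule that carries the placeholder
    skip_index = next(i for i, line in enumerate(lines) if 'LINES_TO_SKIP' in line)
    # Count only RewriteRule lines after it; RewriteCond lines are part of their rule
    count = sum(
        1 for line in lines[skip_index + 1:]
        if line.lstrip().startswith('RewriteRule')
    )
    return block.replace('LINES_TO_SKIP', str(count))
```

Computing the count from the generated text keeps it in sync when locales (and therefore rules) are added.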
…rect content type, and filematch
After testing it out with a version of dataserver I ran locally, I had to add one more condition https://gist.github.com/abaevbog/23a986e6966000325f609652cd25e6ce
I think we can do a slightly simpler block: Specifically:
Yes, you are right...
scripts/update-gz
Outdated
htaccess_rules = f'''RewriteCond %{{REQUEST_URI}} !^/(schema|zotero-schema)
RewriteRule ".?" "-" [S=LINES_TO_SKIP]
htaccess_rules = f'''RewriteCond %{{REQUEST_URI}} !^/schema
RewriteRule ".?" "-" [S=LINES_TO_SKIP,L]
L isn't right here, though: that breaks the dataserver completely by skipping the main redirect at the bottom of the file.
I see what you mean - I removed it.
For some reason, when I was testing it on my local dataserver setup, all non-/schema requests reached index.php as they were supposed to, even with that L flag. I can probably blame that on my local dataserver or Apache setup.
scripts/update-gz
Outdated
RewriteCond %{{HTTP:Accept-Encoding}} !gzip
RewriteRule ^schema(/.*)?$ /zotero-schema/schema.json [QSD,L]
RewriteCond %{{QUERY_STRING}} (?:^|&)locale=({'|'.join(schema['locales'].keys())})(?:&|$)
RewriteRule ^schema(/.*)?$ /zotero-schema/locales/%1.json.gz [QSD]
Missing L here, so exact matches (e.g., locale=en-US) don't work
scripts/update-gz
Outdated
# For every country code, sort locale candidates and add rule to htacecss
for country_code in htaccess_mapings.keys():
    htaccess_mapings[country_code].sort(key=locale_sort_key)
    # Each rule is only applid is gzip encoding is accepted
applid is → applied if