Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #35718 -- Add JSONArray to django.db.models.functions. #18541

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

john-parton
Copy link
Contributor

@john-parton john-parton commented Sep 4, 2024

Trac ticket number

Ticket: https://code.djangoproject.com/ticket/35718

Branch description

This branch adds a JSONArray function so that you can construct json arrays. It works extremely similarly to the existing JSONObject function.

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@john-parton john-parton force-pushed the feature/35718-json-array branch from d8bba27 to ef4943e Compare September 4, 2024 23:29
@john-parton
Copy link
Contributor Author

john-parton commented Sep 4, 2024

It turned out to be pretty straightforward to get a basic implementation. I did essentially copy/paste the existing JSONObject function to create the new JSONArray function. It might be possible to slightly reduce the amount of code duplication with a mixin or some refactoring, but I didn't do that yet, obviously.

Additionally, I reused the connection.features.has_json_object_function property for feature detection. I'm not sure it necessarily makes sense to create a new flag just for this function.

I wrote tests for JSONArray which were again basically copy/pasted from the JSONObject test, but instead of using **kwargs, it uses *args and the expected values are lists. I also tested some more complicated behavior combining the usage of both JSONObject and JSONArray where there's a list of objects and an object containing a list.

@john-parton john-parton force-pushed the feature/35718-json-array branch 2 times, most recently from fd8af12 to fd328af Compare September 4, 2024 23:47
@john-parton
Copy link
Contributor Author

john-parton commented Sep 4, 2024

Added very basic docs, again basically copy/pasted from JSONObject with necessary tweaks to make it make sense. Force pushed update

Few tiny notes:

  • I'm not super sure this will work correctly on Oracle. I think it should, but that's not my area of expertise.
  • We can certainly have a more involved/longer discussion on the forums. I think this specific function is pretty obviously useful and fits in nicely, and I like to center my discussions around code

Some motivation for this feature

Let's say you want to build out a payload for an API https://shopify.dev/docs/api/admin-graphql/2024-07/mutations/productvariantupdate

You might be able to have your database construct some deeply nested objects both more efficiently and more "declarative" than if you did so as a post-processing step.

Consider

def metafield(namespace: str, key: str, value) -> JSONObject:
    return JSONObject(
        namespace=Value(namespace),
        key=Value(key),
        value=value
    )

ProductVariant.objects.all().annotate(
    metafields=JSONArray(
          metafield(
              "search",
              "width", 
              value=Cast("width", output_field=TextField())
          ),
          metafield(
              "specs",
              "color",
              value="attributes__color",
         )
     )
)

vs

def get_metafields(variant: ProductVariant) -> list[dict[str, Any]]:
    return [{
        # Did you spell all your keys correctly here?
        "namespace": "search",
        "key": "width",
        "value": str(variant.width),
    }, {
        "namespace": "specs",
        "key": "color",
        # Did you make sure to include `select_related("attributes")`?
        # If not, this will result in an extra database call and will
        # also fail in an async context
        "value": variant.attributes.color,
    }]

@john-parton john-parton force-pushed the feature/35718-json-array branch from fd328af to 2856cb8 Compare September 5, 2024 00:15
@john-parton
Copy link
Contributor Author

Force-pushed to fix postgres tests failing. Think I got it.

@john-parton
Copy link
Contributor Author

Postgres tests still failing. Will try to do more local tests before pushing again.

@john-parton john-parton force-pushed the feature/35718-json-array branch from 2856cb8 to 3fd9aed Compare September 5, 2024 04:06
@john-parton
Copy link
Contributor Author

john-parton commented Sep 5, 2024

So I figured out the failing test, and it has to do with how the different database engines handle NULL (the SQL type) in the function call. The default behavior for sqlite is to replace NULL with null (the JS type). The default behavior for Oracle and Postgres (with the native json_array) is to omit the NULL values. The older postgres function jsonb_build_array replaces NULL with null.

With both the newer postgres function and oracle, you can actually switch between either behavior by providing null on null or absent on null clauses.

I went ahead and included null on null by default, so the behavior across all the supported databases should be the same/normalized.

Unfortunately, there's not really a way to opt into the absent on null behavior.

It might be desirable to have something like

JsonArray(
    ...,
    exclude_nulls=True
)

or some other such construction, to use the other behavior. However, we'd have to trigger some kind of warning or wrap the functions somehow with SQLite and older postgres.

I also check to see if there's at least one expression, because on postgres, providing the null on null clause without any values was a syntax error.

@john-parton john-parton force-pushed the feature/35718-json-array branch from 3fd9aed to 729747a Compare September 5, 2024 12:09
@john-parton
Copy link
Contributor Author

john-parton commented Sep 5, 2024

I don't understand why the tests failed.

I checked out the main branch and tried running the test suite with these settings

from test_sqlite import *  # NOQA

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "USER": "user",
        "NAME": "django",
        "PASSWORD": "postgres",
        "HOST": "localhost",
        "PORT": 5436,
        "OPTIONS": {
            "server_side_binding": True,
        },
    },
}

Using ./runtests.py --settings test_postgres db_functions.comparison

I still got errors

ERROR: test_textfield (db_functions.comparison.test_json_object.JSONObjectTests.test_textfield)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/john/Code/django/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
    ^^^^^^^^^^^^^^^^^
  File "/home/john/.pyenv/versions/django/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
    ^^^^^^^^^^^^^^^^^
psycopg.errors.IndeterminateDatatype: could not determine data type of parameter $1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/john/.pyenv/versions/3.12.3/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/home/john/.pyenv/versions/3.12.3/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
    ^^^^^^^^^^^^^^^^^
  File "/home/john/.pyenv/versions/3.12.3/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/tests/db_functions/comparison/test_json_object.py", line 93, in test_textfield
    obj = Article.objects.annotate(json_object=JSONObject(text=F("text"))).first()
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/models/query.py", line 1078, in first
    for obj in queryset[:1]:
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/models/query.py", line 381, in __iter__
    self._fetch_all()
  File "/home/john/Code/django/django/db/models/query.py", line 1911, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/models/sql/compiler.py", line 1585, in execute_sql
    cursor.execute(sql, params)
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
    ^^^^^^^^^^^^^^^^^
  File "/home/john/Code/django/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
    ^^^^^^^^^^^^^^^^^
  File "/home/john/.pyenv/versions/django/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
    ^^^^^^^^^^^^^^^^^
django.db.utils.ProgrammingError: could not determine data type of parameter $1

Output of pip freeze

aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aiosignal==1.3.1
aiosmtpd==1.4.6
alabaster==1.0.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
asgiref==3.8.1
atpublic==5.0
attrs==24.2.0
babel==2.16.0
bcrypt==4.2.0
black==24.4.2
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.3.2
click==8.1.7
-e git+ssh://[email protected]/john-parton/django.git@aa5293068782dfa2d2173c75c8477f58a9989942#egg=Django
docutils==0.21.2
frozenlist==1.4.1
geoip2==4.8.0
h11==0.14.0
idna==3.8
imagesize==1.4.1
Jinja2==3.1.4
MarkupSafe==2.1.5
maxminddb==2.6.2
multidict==6.0.5
mypy-extensions==1.0.0
numpy==2.1.1
outcome==1.3.0.post0
packaging==24.1
pathspec==0.12.1
pillow==10.4.0
platformdirs==4.2.2
psycopg==3.2.1
pycparser==2.22
Pygments==2.18.0
pylibmc==1.6.3
pymemcache==4.0.0
PySocks==1.7.1
pywatchman==2.0.0
PyYAML==6.0.2
redis==5.0.8
requests==2.32.3
selenium==4.24.0
setuptools==74.1.1
sniffio==1.3.1
snowballstemmer==2.2.0
sortedcontainers==2.4.0
Sphinx==8.0.2
sphinxcontrib-applehelp==2.0.0
sphinxcontrib-devhelp==2.0.0
sphinxcontrib-htmlhelp==2.1.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==2.0.0
sphinxcontrib-serializinghtml==2.0.0
sqlparse==0.5.1
tblib==3.0.0
trio==0.26.2
trio-websocket==0.11.1
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.2
websocket-client==1.8.0
wsproto==1.2.0
yarl==1.9.9

Here's my docker-compoose.yml

services:
    postgres:
        image: postgres:16.3
        shm_size: 1g
        ports:
        - "5436:5432"
        environment:
            POSTGRES_USER: user
            POSTGRES_PASSWORD: postgres
            POSTGRES_DB: django

Postgres 15 worked.

@john-parton
Copy link
Contributor Author

Looks like there's an issue I reported here: https://code.djangoproject.com/ticket/35734#comment:4

I am going to hold off on working on this feature until that's resolved.

@john-parton john-parton force-pushed the feature/35718-json-array branch from 729747a to 2a8f2c6 Compare September 7, 2024 05:19
@john-parton
Copy link
Contributor Author

john-parton commented Sep 7, 2024

Turns out that psycopg3 is really good at sending the correct type information for server side binding when using the json_array function, and really bad when using the jsonb_build_array. In order to make sure that the types are properly sent, I just cast all of the variadic arguments. This is only totally necessary if you're using server-side binding, so we could skip that, but I don't particularly like the idea of emitting different SQL based on options in the settings if we can get away with it.

As per my tradition, I included way-too-verbose comments in the code that will need to be pared down.

@john-parton john-parton marked this pull request as ready for review September 7, 2024 17:12
@john-parton
Copy link
Contributor Author

Marked ready for review. It still has the error with Postgres 16+ and server-side bindings, but that's not specific to the code here.

@john-parton john-parton force-pushed the feature/35718-json-array branch from 2a8f2c6 to 2a9a12b Compare September 17, 2024 15:26
@john-parton
Copy link
Contributor Author

Rebased, force-pushed. All checks are passing now (perhaps some issue with main or CI before.)

@john-parton john-parton force-pushed the feature/35718-json-array branch from 2a9a12b to 42116c8 Compare September 26, 2024 15:57
@john-parton
Copy link
Contributor Author

Rebase / force-pushed. I believe this is ready for review.

We might want to review #18622 first. It might inform the general approach or "vibes" of this pull request.

@john-parton john-parton force-pushed the feature/35718-json-array branch from 1814f69 to 93f9847 Compare November 6, 2024 16:06
@john-parton
Copy link
Contributor Author

I reworked the flow-control/logic of the new JsonArray class to match the recent changes to JsonObject (for consistency.)

In particular:

  • Inner expressions are explicitly cast to their outputfield on all versions of postgres (similar to how all keys are cast to charfield in the JsonObject class)
  • Postgres 16 is checked as true (previously was not features.postgres_16) and newer implementation is returned

Some questions I have:

What if a user has already explicitly casted an element? Does that result in a double cast? That can't be true of the JSONObject class because the keys are passed as **kwargs

@john-parton
Copy link
Contributor Author

Yes, I actually confirmed that explicitly passing an expression which was already wrapped in a Cast would result in a double Cast. I added logic skip wrapping terms in Cast that were already wrapped. I also added a test. test_explicit_cast

Curious if anyone has a better way of implementing test_explicit_cast. I ended up just searching through the generated SQL to make sure there's a cast, but not a double cast. I think that might be a little brittle, but it does work in this case.

@john-parton
Copy link
Contributor Author

@sarahboyce Could we get someone to look at this? It's starting to get pretty good, I think.

@john-parton
Copy link
Contributor Author

john-parton commented Nov 6, 2024

Just updated docs to include a note about SQL NULL values mapping to json scalar null. This is not the default behavior on many backends, so worth noting for those users familiar with their backends particular behavior.

In the future, it would be nice to add a on_null behavior to function.

EDIT Also rebase/force pushed.

@john-parton john-parton force-pushed the feature/35718-json-array branch from 15d115e to 96ff0ed Compare November 6, 2024 16:55
@sarahboyce
Copy link
Contributor

sarahboyce commented Nov 6, 2024

@sarahboyce Could we get someone to look at this? It's starting to get pretty good, I think.

You need to update the flags on the ticket like "Needs tests" so it goes in the review queue

@john-parton
Copy link
Contributor Author

john-parton commented Nov 6, 2024

Great, thanks.

Done.

Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like jsonb_array_elements was also discussed in the ticket - was it decided not to add this?

docs/ref/models/database-functions.txt Outdated Show resolved Hide resolved
@john-parton
Copy link
Contributor Author

john-parton commented Nov 7, 2024

I feel like jsonb_array_elements was also discussed in the ticket - was it decided not to add this?

So I left it out for for a few reasons:

  1. I wasn't sure how to implement it on non-postgresql backends.
  2. I was trying to keep my PR relatively slim. It turned out that there was a decent amount of subtly to how the json_array implementation works, so I didn't want to go crazy adding every conceivable function.
  3. In my code, I actually never made use of jsonb_array_elements. I implemented it for completion, but it turned out to be less useful than json_array.

@john-parton john-parton force-pushed the feature/35718-json-array branch from 117cd80 to de10500 Compare November 7, 2024 18:27
@sarahboyce
Copy link
Contributor

We don't have to include it in the PR but it changes this from Fixed to Refs if it doesn't complete the ticket 👍

@john-parton john-parton changed the title Fixed #35718 -- Add JSONArray to django.db.models.functions. Refs #35718 -- Add JSONArray to django.db.models.functions. Nov 7, 2024
@john-parton john-parton force-pushed the feature/35718-json-array branch from de10500 to 6dc6906 Compare November 7, 2024 18:40
@john-parton
Copy link
Contributor Author

Changed. It's good practice for me.

Copy link
Contributor

@laymonage laymonage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing the function (on all databases, wow!) and doing all the research 😄

For an initial pass I think it looks good, but there's a potential simplification we can do in as_postgresql. Also, we might want to have a dedicated json module for all the JSON-related functions (see comment for details).

Since the ticket has been updated to only be about adding JSONArray, we can probably change the title/commit message back to Fixed instead of Refs.

I might try to think of other test cases later, but so far, great work @john-parton! 👍

django/db/models/functions/comparison.py Outdated Show resolved Hide resolved
django/db/models/functions/comparison.py Outdated Show resolved Hide resolved
@john-parton john-parton force-pushed the feature/35718-json-array branch 4 times, most recently from 0ba972c to 40b3259 Compare December 16, 2024 18:38
@john-parton
Copy link
Contributor Author

Just adding a root comment here as well. I've rebased against main:

  • Moved class to new json module
  • Shuffled documentation around
  • Added a release note
  • Tweaked the postgres logic to avoid double casting to be a bit simpler, with a slightly more helpful comment

@john-parton
Copy link
Contributor Author

Unchecked the needs documentation checkbox in Trac.

Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates - can you also move the test files to tests/db_functions/json? 🙏

docs/ref/models/database-functions.txt Outdated Show resolved Hide resolved
@john-parton john-parton force-pushed the feature/35718-json-array branch from 688ce5f to 3489767 Compare December 19, 2024 17:24
@john-parton
Copy link
Contributor Author

Thank you for the updates - can you also move the test files to tests/db_functions/json? 🙏

Done. I ran a quick ./runtests.py --settings test_postgres db_functions.json to confirm everything's still wired up correctly. Rebased/force-pushed.

Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of small comments before I go away till the new year 🎄

docs/releases/5.2.txt Outdated Show resolved Hide resolved
docs/ref/models/database-functions.txt Outdated Show resolved Hide resolved
tests/db_functions/json/test_json_array_object.py Outdated Show resolved Hide resolved
Copy link
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-parton Thanks 👍 I left small comments.

django/db/models/functions/json.py Outdated Show resolved Hide resolved
tests/db_functions/json/test_json_array.py Outdated Show resolved Hide resolved
tests/db_functions/json/test_json_array.py Outdated Show resolved Hide resolved
django/db/models/functions/json.py Outdated Show resolved Hide resolved
tests/db_functions/json/test_json_array.py Outdated Show resolved Hide resolved
@john-parton john-parton force-pushed the feature/35718-json-array branch from 9f05015 to d3fc786 Compare December 20, 2024 22:56
@john-parton
Copy link
Contributor Author

I applied most of the suggestions using Github, pulled it all down, cleaned up things (mostly black) and made some minor required tweaks.

Rebased, ran tests locally, force pushed.

@sarahboyce sarahboyce force-pushed the feature/35718-json-array branch from d3fc786 to 1a50133 Compare January 3, 2025 09:00
@sarahboyce
Copy link
Contributor

buildbot, test on oracle.

Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @john-parton
I pushed minor tweaks, will wait for the tests to run and hopefully land today/Monday 👍

Co-authored-by: Sarah Boyce <[email protected]>
Co-authored-by: Mariusz Felisiak <[email protected]>
@sarahboyce sarahboyce force-pushed the feature/35718-json-array branch from e1dae39 to 4b83fba Compare January 3, 2025 14:10
@sarahboyce sarahboyce changed the title Refs #35718 -- Add JSONArray to django.db.models.functions. Fixed #35718 -- Add JSONArray to django.db.models.functions. Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants