-
-
Notifications
You must be signed in to change notification settings - Fork 32.8k
Fixed #35859 -- Implement Tasks interface as per DEP 14 #18627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of methods here which probably work better in django.utils
- just not sure where makes sense
Just one comment: Missing periods at the end of some of the docstrings. |
@carltongibson (as shepherd 🐑 ), what's the best way to get some Review Team eyeballs on this? Social media posting only gets the reviews so far. |
@RealOrangeOne Good hustle! Right... let me mention it to some folks. But also — create a Trac ticket "Add DEP 14 Task interface" (or similar), check Has Patch, link here, and change the title to |
I had a feeling a ticket might be the way forward. Opened ticket-35859. Something something more content for the DEP talk |
I have this checked out and am going through and reviewing this. Excited about getting this moving forward, but have a lot of little nits and things I think we want to clean up. There's so much scar tissue around backends that I know we all have, so I think we can provide something great but not complicated here! Really don't want to expand the scope, maybe some of this will involve simply hiding some stuff to future PRs. In any case hope to have finished reviewing this later tonight or tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a lot of tiny things I commented on, but here are some big overall points:
- I like the general shape of this! It aligns both with how Django deals with pluggable backends well, and it also aligns somewhat with task management systems. I want to get this merged.
- I think we should include at least one "real" backend in this PR. There are concerns that I have (in particular around task routing and priorities) that I think are hard to highlight without actually having the Database backend in here. I suspect some of the utils are being used in there as well... handling the cross process nature of tasks is fundamental to everything, and not having a backend that is cross process makes it hard to judge the rest
Now for some tactical things to get this easier to merge:
- I really dislike the exception serialization/deserialization stuff.
import_string
alone worries me, in the same way that usingpickle
would worry me for passing around the args/kwargs. I would be 100% fine with just not having any sort of exception storage, but I think "we only store a stringified exception" makes a lot of sense. Really, having backends determine exception handling makes sense to me. Cross-process transmission of rich objects (especially when you might not even be in the same version of a codebase!) is a messy problem, and we can choose to entirely sidestep it here. - I think
task.using(**options).enqueue(*args, **kwargs)
should be doable without ausing
call. Something liketask.enqueue_message(args, kwargs, **options)
(orenqueue_with_options
orenqueue_task
or something. Celery it'sdelay
vsapply_async
... maybe something better than that) - I don't think we should freeze the dataclasses, if only to reduce the number of
object.__setattr__
calls.
For some "design opinions": - arg serialization/deserialization can be a method on the backends. I think the "normalized JSON" option is a fine default still, but we can just do the work here.
- I think that
task.enqueue(*args, **kwargs)
returning aTaskResult
is a mistaken design. I believe that enqueueing should return an....EnqueueResult
, and thenEnqueueResult.task_result_id
might (or might not!) be present. If you're using something like Rabbit, you are in fact queueing out to another system, and your result storage might be something else!
Having said that, I don't want to fight aboutTaskResult
being the return result. The DEP has it as returningTaskResult
, we can just stick to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great -- thanks for your work on this!
I mostly looked over the docs as a way to better understand what is happening, and did some basic review, but don't take my suggestions too seriously because I didn't look into the code too deeply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @RealOrangeOne ⭐
I have started looking into the docs and checked the coverage 👍
Can you rebase and add a release note to Django 6.0? This would be a "headline" feature 👍 (ie. not within the minor section)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments. Very nice work! Having written my own internal simple version of this, I think this looks good.
One thing I'm missing is retry handling. I see there is a retry
util function but it's not references anywhere as far as I can tell. Is this something that is planned?
django/tasks/task.py
Outdated
Queue up the task to be executed | ||
""" | ||
return self.get_backend().enqueue( | ||
self, json_normalize(args), json_normalize(kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to wrap the json_normalize
in try/except and raise a clear Django error that the input needs to be JSON-serializable. Not everyone will read the documentation note, and some TypeError exception would be confusing for beginners I imagine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering something similar.
Currently, if the return value isn't JSON serializable, the task is still considered a failure. I wonder if that's the right way to go? I think it probably still needs to be a failure of the task, but perhaps some way to distinguish between "The task errored" and "you returned something unserializable".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i haven't dug deep enough into python typing to know if that's possible, but maybe the decorator could also type hint at the dev right away when the return value is not json serializable? probably not for deep dicts i guess :S
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sadly not possible without a lot of work (not to mention that there's no types in this codebase). Even Python's json.dumps
method just uses Any
for its type.
Looking forward for this go out, any update on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you ⭐ added a few comments
We also need a release note in 6.0 within the "What's new in Django 6.0" (headline) section
""" | ||
Queue up a task to be executed | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should raise a NotImplementedError
I think if we're writing some guidlines in the docs as to how folks should write their own tasks backend, the main thing is writing this method (and setting appropriate feature flags). I guess they also need to implement a way to "run" the tasks (which I think we're giving no API or opinion on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"run" is always handled externally. I'd like a common API to do it, but that falls firmly in the "some day" category for me.
The class is an ABC, so it'll fail to subclass if someone doesn't override enqueue
, but I'll make it raise an exception anyway to be clearer that that's the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think this is being documented by:
Outside of Django, a Worker looks at the Queue Store for new Tasks to run. When
a new Task is added, the Worker claims the Task, executes it, and saves the
status and result back to the Queue Store.
I wonder if we can emphasize this a little more that Django doesn't implement the running mechanism
django/tasks/backends/base.py
Outdated
async def aget_result(self, result_id): | ||
""" | ||
Queue up a task function (or coroutine) to be executed | ||
""" | ||
return await sync_to_async(self.get_result, thread_sensitive=True)( | ||
result_id=result_id | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docstring is off here and I also think we should implement the async function (here just raise the NotImplementedError
error) as we haven't documented that we are doing this for folks and we implement Task.aget_result
so feels good to be consistent (would mean implementing DummyBackend.aget_result
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've not documented that it's automatically implemented, but we have that it exists and can be used. My thinking was mainly that it means backend authors get async functionality for free, without needing to think of the implementation, and users can depend on every backend having both a sync and async API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try to avoid the sync_to_async
overhead where possible. So for each backend that we implement, we should not rely on the default (if possible). So I would suggest we implement DummyBackend.aget_result
still
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a play at this, but transaction handling doesn't seem to work in async yet, so it doesn't work properly.
django/tasks/task.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this module shadows the django.tasks.task
decorator. Any thoughts on how to handle that conflict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at other modules, we might name this file base.py
4ad753d
to
2e74a7a
Compare
Trac ticket number
ticket-35859
https://github.com/django/deps/blob/main/accepted/0014-background-workers.rst
Branch description
This PR implements the first parts of
django.tasks
, as implemented indjango-tasks
:The database backend is intentionally absent, and will be added in future (after this PR is merged). This makes the PR significantly smaller, and easier to review.
The commits are also not squashed, to assist with iterative reviewing. I'll squash them all towards the end once the change is approved.
Notable deviations from the DEP
Checklist
main
branch.