Skip to content

Fixed #35859 -- Implement Tasks interface as per DEP 14 #18627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

RealOrangeOne
Copy link
Member

@RealOrangeOne RealOrangeOne commented Sep 27, 2024

Trac ticket number

ticket-35859

https://github.com/django/deps/blob/main/accepted/0014-background-workers.rst

Branch description

This PR implements the first parts of django.tasks, as implemented in django-tasks:

  • Common API interface
  • Base backend
  • Task / result classes
  • Immediate / Dummy backends

The database backend is intentionally absent, and will be added in future (after this PR is merged). This makes the PR significantly smaller, and easier to review.

The commits are also not squashed, to assist with iterative reviewing. I'll squash them all towards the end once the change is approved.

Notable deviations from the DEP
  • The email backend has been removed, as it isn't possible to be API-compatible, due to the need to return the number of successfully-sent emails.
  • Enqueue and finished task signals were added, to allow easier introspection from day 1.

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable (TBC).
  • I have attached screenshots in both light and dark modes for any UI changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of methods here which probably work better in django.utils - just not sure where makes sense

@RealOrangeOne RealOrangeOne changed the title WIP: Fixed DEP 10 -- Implement Tasks interface WIP: Fixed DEP 14 -- Implement Tasks interface Oct 11, 2024
@RealOrangeOne RealOrangeOne marked this pull request as ready for review October 11, 2024 07:52
@RealOrangeOne RealOrangeOne changed the title WIP: Fixed DEP 14 -- Implement Tasks interface Fixed DEP 14 -- Implement Tasks interface Oct 11, 2024
@hvdklauw
Copy link

Just one comment: Missing periods at the end of some of the docstrings.

@RealOrangeOne
Copy link
Member Author

@carltongibson (as shepherd 🐑 ), what's the best way to get some Review Team eyeballs on this? Social media posting only gets the reviews so far.

@carltongibson
Copy link
Member

@RealOrangeOne Good hustle! Right... let me mention it to some folks.

But also — create a Trac ticket "Add DEP 14 Task interface" (or similar), check Has Patch, link here, and change the title to Fixed #... -- . (Please also add carltongibson to the CC field on the ticket.) — That way it will appear on the Fellow's "Patches needing review" list, which is pretty much all important.

@carltongibson carltongibson requested a review from a team October 22, 2024 08:04
@RealOrangeOne RealOrangeOne changed the title Fixed DEP 14 -- Implement Tasks interface Fixed #35859 -- Implement Tasks interface as per DEP 14 Oct 22, 2024
@RealOrangeOne
Copy link
Member Author

I had a feeling a ticket might be the way forward. Opened ticket-35859.

Something something more content for the DEP talk

@rtpg
Copy link
Contributor

rtpg commented Nov 5, 2024

I have this checked out and am going through and reviewing this. Excited about getting this moving forward, but have a lot of little nits and things I think we want to clean up. There's so much scar tissue around backends that I know we all have, so I think we can provide something great but not complicated here!

Really don't want to expand the scope, maybe some of this will involve simply hiding some stuff to future PRs. In any case hope to have finished reviewing this later tonight or tomorrow

Copy link
Contributor

@rtpg rtpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot of tiny things I commented on, but here are some big overall points:

  • I like the general shape of this! It aligns both with how Django deals with pluggable backends well, and it also aligns somewhat with task management systems. I want to get this merged.
  • I think we should include at least one "real" backend in this PR. There are concerns that I have (in particular around task routing and priorities) that I think are hard to highlight without actually having the Database backend in here. I suspect some of the utils are being used in there as well... handling the cross process nature of tasks is fundamental to everything, and not having a backend that is cross process makes it hard to judge the rest

Now for some tactical things to get this easier to merge:

  • I really dislike the exception serialization/deserialization stuff. import_string alone worries me, in the same way that using pickle would worry me for passing around the args/kwargs. I would be 100% fine with just not having any sort of exception storage, but I think "we only store a stringified exception" makes a lot of sense. Really, having backends determine exception handling makes sense to me. Cross-process transmission of rich objects (especially when you might not even be in the same version of a codebase!) is a messy problem, and we can choose to entirely sidestep it here.
  • I think task.using(**options).enqueue(*args, **kwargs) should be doable without a using call. Something like task.enqueue_message(args, kwargs, **options) (or enqueue_with_options or enqueue_task or something. Celery it's delay vs apply_async... maybe something better than that)
  • I don't think we should freeze the dataclasses, if only to reduce the number of object.__setattr__ calls.
    For some "design opinions":
  • arg serialization/deserialization can be a method on the backends. I think the "normalized JSON" option is a fine default still, but we can just do the work here.
  • I think that task.enqueue(*args, **kwargs) returning a TaskResult is a mistaken design. I believe that enqueueing should return an.... EnqueueResult, and then EnqueueResult.task_result_id might (or might not!) be present. If you're using something like Rabbit, you are in fact queueing out to another system, and your result storage might be something else!
    Having said that, I don't want to fight about TaskResult being the return result. The DEP has it as returning TaskResult, we can just stick to that.

Copy link
Contributor

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great -- thanks for your work on this!

I mostly looked over the docs as a way to better understand what is happening, and did some basic review, but don't take my suggestions too seriously because I didn't look into the code too deeply.

Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @RealOrangeOne

I have started looking into the docs and checked the coverage 👍
Can you rebase and add a release note to Django 6.0? This would be a "headline" feature 👍 (ie. not within the minor section)

@RealOrangeOne RealOrangeOne marked this pull request as draft January 24, 2025 17:11
@RealOrangeOne RealOrangeOne marked this pull request as ready for review January 24, 2025 17:32
Copy link
Contributor

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. Very nice work! Having written my own internal simple version of this, I think this looks good.

One thing I'm missing is retry handling. I see there is a retry util function but it's not references anywhere as far as I can tell. Is this something that is planned?

Queue up the task to be executed
"""
return self.get_backend().enqueue(
self, json_normalize(args), json_normalize(kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to wrap the json_normalize in try/except and raise a clear Django error that the input needs to be JSON-serializable. Not everyone will read the documentation note, and some TypeError exception would be confusing for beginners I imagine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering something similar.

Currently, if the return value isn't JSON serializable, the task is still considered a failure. I wonder if that's the right way to go? I think it probably still needs to be a failure of the task, but perhaps some way to distinguish between "The task errored" and "you returned something unserializable".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i haven't dug deep enough into python typing to know if that's possible, but maybe the decorator could also type hint at the dev right away when the return value is not json serializable? probably not for deep dicts i guess :S

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sadly not possible without a lot of work (not to mention that there's no types in this codebase). Even Python's json.dumps method just uses Any for its type.

@malkoG
Copy link

malkoG commented Apr 20, 2025

Looking forward for this go out, any update on this?

@carltongibson
Copy link
Member

@malkoG https://justinmayer.com/posts/any-updates/

Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you ⭐ added a few comments
We also need a release note in 6.0 within the "What's new in Django 6.0" (headline) section

Comment on lines +93 to +99
"""
Queue up a task to be executed
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should raise a NotImplementedError
I think if we're writing some guidlines in the docs as to how folks should write their own tasks backend, the main thing is writing this method (and setting appropriate feature flags). I guess they also need to implement a way to "run" the tasks (which I think we're giving no API or opinion on)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"run" is always handled externally. I'd like a common API to do it, but that falls firmly in the "some day" category for me.

The class is an ABC, so it'll fail to subclass if someone doesn't override enqueue, but I'll make it raise an exception anyway to be clearer that that's the intention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think this is being documented by:

Outside of Django, a Worker looks at the Queue Store for new Tasks to run. When
a new Task is added, the Worker claims the Task, executes it, and saves the
status and result back to the Queue Store.

I wonder if we can emphasize this a little more that Django doesn't implement the running mechanism

Comment on lines 114 to 126
async def aget_result(self, result_id):
"""
Queue up a task function (or coroutine) to be executed
"""
return await sync_to_async(self.get_result, thread_sensitive=True)(
result_id=result_id
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the docstring is off here and I also think we should implement the async function (here just raise the NotImplementedError error) as we haven't documented that we are doing this for folks and we implement Task.aget_result so feels good to be consistent (would mean implementing DummyBackend.aget_result)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've not documented that it's automatically implemented, but we have that it exists and can be used. My thinking was mainly that it means backend authors get async functionality for free, without needing to think of the implementation, and users can depend on every backend having both a sync and async API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try to avoid the sync_to_async overhead where possible. So for each backend that we implement, we should not rely on the default (if possible). So I would suggest we implement DummyBackend.aget_result still

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a play at this, but transaction handling doesn't seem to work in async yet, so it doesn't work properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this module shadows the django.tasks.task decorator. Any thoughts on how to handle that conflict?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at other modules, we might name this file base.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.