chore(eco): Refactors organization report building #96917

Draft: GabeVillalobos wants to merge 9 commits into master from gv/refactor-organization-report-context-building

GabeVillalobos (Member) commented Jul 31, 2025

Continuation of #96869, merge that first

Includes the following refactors:

  • Adds a new OrganizationReportContextFactory to clean up the prepare_organization_report task.
  • Moves is_empty check logic to OrganizationReportContext class, from the dedicated util where it lived before.
  • Adds stronger typing for ProjectContext checks when building org contexts.

Why these changes?

This is a precursor to adding metrics, SLOs, and logging for the organization reporting logic.

This will also precede SLOs for individual email sending, which will come in a later PR.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 31, 2025
@GabeVillalobos GabeVillalobos requested review from kcons and a team July 31, 2025 22:45

codecov bot commented Jul 31, 2025

Codecov Report

❌ Patch coverage is 97.68786% with 4 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines                                Patch %   Lines
src/sentry/tasks/summaries/weekly_reports.py            95.16%    3 Missing ⚠️
...s/summaries/organization_report_context_factory.py   98.82%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #96917    +/-   ##
========================================
  Coverage   80.68%   80.69%            
========================================
  Files        8498     8506     +8     
  Lines      374269   374689   +420     
  Branches    24290    24290            
========================================
+ Hits       301996   302361   +365     
- Misses      71896    71951    +55     
  Partials      377      377            

Comment on lines +38 to +45
with sentry_sdk.start_span(op="weekly_reports.user_project_ownership"):
    for project_id, user_id in OrganizationMember.objects.filter(
        organization_id=ctx.organization.id,
        teams__projectteam__project__isnull=False,
        teams__status=TeamStatus.ACTIVE,
    ).values_list("teams__projectteam__project_id", "user_id"):
        if user_id is not None:
            ctx.project_ownership.setdefault(user_id, set()).add(project_id)

Member Author

Each of these chunks is copied pretty much verbatim from the prepare_organization_report task. It just allows us to individually wrap chunks in SLOs if we decide to go this route.

I could also see a scenario where we run each of these steps on a per-project-context basis and remove all of the nested mutability of passing around a partially populated OrganizationReportContext object. Future goal though 😅
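
For readers following along, here is a minimal sketch of what "individually wrap chunks" could look like: one population step pulled into its own factory method and wrapped on its own. `ReportContext`, `ReportContextFactory`, and `timed_step` are hypothetical stand-ins, not the classes in this PR, and `timed_step` only records wall-clock time where the real code would presumably use a span or SLO.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class ReportContext:
    # Hypothetical, trimmed-down stand-in for OrganizationReportContext.
    organization_id: int
    project_ownership: dict[int, set[int]] = field(default_factory=dict)
    timings: dict[str, float] = field(default_factory=dict)


@contextmanager
def timed_step(ctx: ReportContext, name: str) -> Iterator[None]:
    # Assumed stand-in for a span/SLO wrapper: records the step duration.
    start = time.perf_counter()
    try:
        yield
    finally:
        ctx.timings[name] = time.perf_counter() - start


class ReportContextFactory:
    def __init__(
        self,
        organization_id: int,
        memberships: Iterable[tuple[int, int | None]],
    ) -> None:
        self.ctx = ReportContext(organization_id)
        # (project_id, user_id) pairs; a plain list replaces the ORM query.
        self.memberships = list(memberships)

    def _populate_project_ownership(self) -> None:
        with timed_step(self.ctx, "user_project_ownership"):
            for project_id, user_id in self.memberships:
                if user_id is not None:
                    self.ctx.project_ownership.setdefault(user_id, set()).add(project_id)

    def build(self) -> ReportContext:
        # Each step is its own method, so each can be wrapped separately.
        self._populate_project_ownership()
        return self.ctx
```

Rows with a `None` user ID are skipped, mirroring the `if user_id is not None` guard in the excerpt above.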

Comment on lines +61 to +67
"""
Returns True if every project context is empty.
"""
return all(
    project_ctx.check_if_project_is_empty()
    for project_ctx in self.projects_context_map.values()
)

Member Author

Replaced the util for this with an actual helper method for simplicity.
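
As a rough illustration of the helper-method shape, the sketch below puts the emptiness check on the context class itself. The method name `is_empty` and the `ProjectContext` fields are assumptions made for the example; only `check_if_project_is_empty` and `projects_context_map` appear in the excerpt above.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectContext:
    # Hypothetical, simplified fields; the real ProjectContext tracks more.
    accepted_error_count: int = 0
    accepted_transaction_count: int = 0

    def check_if_project_is_empty(self) -> bool:
        return self.accepted_error_count == 0 and self.accepted_transaction_count == 0


@dataclass
class OrganizationReportContext:
    projects_context_map: dict[int, ProjectContext] = field(default_factory=dict)

    def is_empty(self) -> bool:
        """Returns True if every project context is empty."""
        return all(
            project_ctx.check_if_project_is_empty()
            for project_ctx in self.projects_context_map.values()
        )
```

One property worth keeping in mind: `all()` over an empty iterable is `True`, so in this sketch an organization with no project contexts at all also counts as empty.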

Comment on lines +52 to +96
@dataclass
class WeeklyReportProgressTracker:
    """
    This class is used to track the last processed org ID for a given
    weekly report. It can either be configured with an explicit start time and
    watermark TTL, or it will assume beginning of day, with a 7 day TTL.
    """

    beginning_of_day_timestamp: float
    duration: int
    _redis_connection: LocalClient

    REPORT_REDIS_CLIENT_KEY: Final[str] = "weekly_reports_org_id_min"

    def __init__(self, timestamp: float | None = None, duration: int | None = None):
        if timestamp is None:
            # The time that the report was generated
            timestamp = floor_to_utc_day(timezone.now()).timestamp()

        self.beginning_of_day_timestamp = timestamp

        if duration is None:
            # The total timespan that the task covers
            duration = ONE_DAY * 7

        self.duration = duration
        self._redis_connection = redis.clusters.get("default").get_local_client_for_key(
            self.REPORT_REDIS_CLIENT_KEY
        )

    @property
    def min_org_id_redis_key(self) -> str:
        return f"{self.REPORT_REDIS_CLIENT_KEY}:{self.beginning_of_day_timestamp}"

    def get_last_processed_org_id(self) -> int | None:
        min_org_id_from_redis = self._redis_connection.get(self.min_org_id_redis_key)
        return int(min_org_id_from_redis) if min_org_id_from_redis else None

    def set_last_processed_org_id(self, org_id: int) -> None:
        self._redis_connection.set(self.min_org_id_redis_key, org_id)

    def delete_min_org_id(self) -> None:
        self._redis_connection.delete(self.min_org_id_redis_key)


Member Author

This is part of the parent PR, ignore pls
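
For context only, the watermark behaviour of the tracker above can be sketched with an in-memory stand-in for the Redis client. `InMemoryClient` and `ProgressTracker` are hypothetical names for this example; the sketch assumes nothing about the real client beyond `get`, `set`, and `delete`, and it leaves out the duration/TTL handling.

```python
from __future__ import annotations


class InMemoryClient:
    """Hypothetical stand-in for the Redis client (get/set/delete only)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._data.get(key)

    def set(self, key: str, value: object) -> None:
        # Values are stored as strings, so reads must convert back to int.
        self._data[key] = str(value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class ProgressTracker:
    KEY_PREFIX = "weekly_reports_org_id_min"

    def __init__(self, timestamp: float, client: InMemoryClient) -> None:
        self.timestamp = timestamp
        self._client = client

    @property
    def key(self) -> str:
        # One watermark per report run, keyed by the run's start timestamp.
        return f"{self.KEY_PREFIX}:{self.timestamp}"

    def get_last_processed_org_id(self) -> int | None:
        raw = self._client.get(self.key)
        return int(raw) if raw else None

    def set_last_processed_org_id(self, org_id: int) -> None:
        self._client.set(self.key, org_id)

    def delete(self) -> None:
        self._client.delete(self.key)
```

Because the key includes the timestamp, a tracker created for a different report day starts with no watermark even when it shares the same client.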

Comment on lines 79 to 124
if timestamp is None:
    # The time that the report was generated
    timestamp = floor_to_utc_day(timezone.now()).timestamp()

if duration is None:
    # The total timespan that the task covers
    duration = ONE_DAY * 7

batch_id = str(uuid.uuid4())

def min_org_id_redis_key(timestamp: float) -> str:
    return f"weekly_reports_org_id_min:{timestamp}"

redis_cluster = redis.clusters.get("default").get_local_client_for_key(
    "weekly_reports_org_id_min"
)

min_org_id_from_redis = redis_cluster.get(min_org_id_redis_key(timestamp))
minimum_organization_id = int(min_org_id_from_redis) if min_org_id_from_redis else None
batching = WeeklyReportProgressTracker(timestamp, duration)
minimum_organization_id = batching.get_last_processed_org_id()

organizations = Organization.objects.filter(status=OrganizationStatus.ACTIVE)

for organization in RangeQuerySetWrapper(
    organizations,
    step=10000,
    result_value_getter=lambda item: item.id,
    min_id=minimum_organization_id,
):
    # Create a celery task per organization
    logger.info(
        "weekly_reports.schedule_organizations",
        extra={
            "batch_id": str(batch_id),
            "organization": organization.id,
            "minimum_organization_id": minimum_organization_id,
        },
    )
    prepare_organization_report.delay(
        timestamp, duration, organization.id, batch_id, dry_run=dry_run
    )
    redis_cluster.set(min_org_id_redis_key(timestamp), organization.id)
with WeeklyReportSLO(
    operation_type=WeeklyReportOperationType.SCHEDULE_ORGANIZATION_REPORTS
).capture() as lifecycle:
    try:
        batch_id = str(uuid.uuid4())

Member Author

Same with these changes.

@GabeVillalobos GabeVillalobos force-pushed the gv/refactor-organization-report-context-building branch from 685b49f to e54c100 Compare August 1, 2025 19:02