Link checker excludes getting out of hand #35

@gwct

We are getting continuous problems with link checker errors, the solution to which seems to be adding more and more domains to the list that are excluded from being checked. As the number of excluded domains grows, the utility of the link checker diminishes.

As of Jan. 9, 2025, these are the domains excluded from our link checker:

| Domain | Date added (MM.DD.YYYY) | Reason for adding |
| --- | --- | --- |
| scholar.google.com | 05.01.2025 | 403: Network error: Forbidden |
| useast.ensembl.org | 04.10.2025 | 403: Network error: Forbidden |
| academic.oup.com/bioinformatics | 04.10.2025 | 403: Network error: Forbidden |
| doi.org | 01.09.2025 | 403: Network error: Forbidden |
| academic.oup.com/nar | 01.09.2025 | 403: Network error: Forbidden |
| gnu.org | 10.25.2024 | 429: Network error: Too Many Requests |
| anaconda.org | 04.05.2024 | unknown |
| fonts.gstatic.com | 12.06.2023 | unknown |
| www.microsoft.com/en-us/microsoft-365/onedrive/online-cloud-storage | 12.08.2023 | timeout |

I think some of these are justifiably excluded, like fonts.gstatic.com and the Microsoft one, but others are very wide-ranging, like doi.org, which covers every DOI link on the site.
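To make the difference in scope concrete, here is a rough sketch of how a domain-wide exclude compares to a path-scoped one. It assumes simple prefix matching on host plus path, which may not be how our link checker actually matches excludes; `is_excluded` and the example URLs are hypothetical and only for illustration.

```python
from urllib.parse import urlparse

# A few exclude entries copied from the table above.
EXCLUDES = [
    "doi.org",
    "academic.oup.com/bioinformatics",
    "academic.oup.com/nar",
    "gnu.org",
]


def is_excluded(url, excludes=EXCLUDES):
    """Return the exclude entry that matches url, or None.

    Assumption: an entry matches when it equals the URL's host + path
    or is a prefix of it at a "/" boundary. A real link checker may use
    regular expressions or different rules.
    """
    parsed = urlparse(url)
    target = parsed.netloc + parsed.path
    for entry in excludes:
        prefix = entry.rstrip("/")
        if target == prefix or target.startswith(prefix + "/"):
            return entry
    return None


# Hypothetical links: every DOI link is skipped by the single "doi.org"
# entry, while the path-scoped OUP entries only skip the listed journals.
for link in [
    "https://doi.org/10.1000/example-a",
    "https://doi.org/10.1000/example-b",
    "https://academic.oup.com/bioinformatics/article/1",
    "https://academic.oup.com/mbe/article/1",
]:
    print(link, "->", is_excluded(link))
```

Under this assumption, the two doi.org links and the bioinformatics link are skipped, while the other academic.oup.com link is still checked. Scoping excludes to a path where possible keeps more of the site covered.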

We also run into errors with the cache, which has to be manually deleted here if some links threw errors in previous runs (at least I think that is the reason). These show up as cache errors.

This issue is for discussing possible solutions to these problems. Ultimately I think the link checker is a good thing to have, but it becomes tedious to deal with these same issues each time we build the page, and the growing list of excludes defeats the purpose.
