Skip to content

add ldappool module #582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

michaelalang
Copy link

I would like to add ldappooling capability to the library.

LDAP Pooling example

entries as dict

from ldappool import ConnectionPool
pool = ConnectionPool(
        params={"keep": True, "autoBind": True, "retries": 2},
        max=5)
pool.set_uri("ldaps://ldap.example.com:636/dc=example,dc=com?uid,mail?sub?(|(uid=test)([email protected]))")
pool.set_credentials("binddn", "bindpw")
with pool.get() as conn:
        for entry in conn.search_s(pool.basedn,
                                   pool.scope,
                                   pool.filter,
                                   pool.attributes):
                print(f"{entry[0]}: {entry[1].get('uid')} {entry[1].get('mail')}")
                for member in entry[1].get("memberOf", []):
                        print(member)

changing the connection or credentials for the pool

from ldappool import ConnectionPool
from ldappool import e2c
pool = ConnectionPool(                                                                                                                                           40,0-1        13%
        params={"keep": True, "autoBind": True, "retries": 2},
        max=5)
pool.set_uri("ldaps://ldap.example.com:636/dc=example,dc=com?uid,mail?sub?(|(uid=test)([email protected]))")
pool.set_credentials("binddn", "bindpw")
with pool.get() as conn:
        for entry in map(e2c, conn.search_s(pool.basedn,
                                            pool.scope,
                                            pool.filter,
                                            pool.attributes)):
                print(f"{entry.dn}: {entry.uid} {entry.mail}")

pool.set_credentials(entry.dn, "changeme")
with pool.get() as conn:
        for entry in map(e2c, conn.search_s(pool.basedn,
                                            pool.scope,
                                            pool.filter,
                                            pool.attributes)):
                print(f"{entry.dn}: {entry.uid} {entry.mail}")

@michaelalang
Copy link
Author

@droideck Hi sorry to bother you. The tests fail with c90-py37: skipped because could not find python interpreter with spec(s): py37

I double checked and I am not referencing any shebang interpreter so I wonder what can I do to get the unittests run, fix them until everything show's successs to get the PR finally discussed and worth being considered ?

All the best and kind regards
Michi

@droideck
Copy link
Contributor

droideck commented Jan 29, 2025

@droideck Hi sorry to bother you. The tests fail with c90-py37: skipped because could not find python interpreter with spec(s): py37

I double checked and I am not referencing any shebang interpreter so I wonder what can I do to get the unittests run, fix them until everything show's successs to get the PR finally discussed and worth being considered ?

All the best and kind regards Michi

Hi!
I think this PR should fix it - #583

Regarding this PR, could you share a bit about the motivation behind adding this new ldappool module? I’m curious about what problem precisely the mechanism for pooling and reusing LDAP connections solves for you (what sparked the need for it)?
Thanks!

@michaelalang
Copy link
Author

michaelalang commented Jan 29, 2025

@droideck Hi sorry to bother you. The tests fail with c90-py37: skipped because could not find python interpreter with spec(s): py37
I double checked and I am not referencing any shebang interpreter so I wonder what can I do to get the unittests run, fix them until everything show's successs to get the PR finally discussed and worth being considered ?
All the best and kind regards Michi

Hi! I think this PR should fix it - #583

Regarding this PR, could you share a bit about the motivation behind adding this new ldappool module? I’m curious about what problem precisely the mechanism for pooling and reusing LDAP connections solves for you (what sparked the need for it)? Thanks!

Sure, I did want to put up that for the discussion but I am sure it's also good to have it in here.

The motivation for the module is coming from an application which utilizes the LDAP connections in an context manner when objects are created.
Those objects spin of through a Flask frontend for each connection coming in and loose context between various steps in the workflow.

With adding an connection pool, the behavior can be changed in a manner of avoiding cost-intensive LDAP(s) connection being created over and over and with the builtin lock-authenticate-unlock method the waste of adding yet another connection to verify user authentication is granted.
I am more a fan of the bind dn directly instead of using a search DN to fetch the user bind dn and authenticate with that on another connection, but that binddn game is a holy-cow and I don't want to break off a discussion where both sides are right (for their use-case).

The ldappool module shall abstract the burdens of connection handling, prewarming, ondemand creation as well as basic authentication of users when search by a generic binddn.

For some numbers, the application I am talking about does provide a docker v2 API and calling 100 failing docker auth requests ends up with ~900+ LDAP connections being created and dropped. The module shows that this can also be done with a fixed set of connections avoiding session/connection limits and improving the duration for such events (not talking about the fail delay response for secrutiy obfuscation) from tcp/tls connection establish point of view.

I also decided to not use threading Queues as they do not give the possibility to choose a random available connection for asubsequent search or an authentication request.

Hope it helps solving other similar issues in the python-ldap world
just in case someone is interested, that's the docker v2 API discussion I started off quay/quay#3539

@droideck droideck requested a review from mistotebe January 30, 2025 01:14
@droideck
Copy link
Contributor

@michaelalang, thanks for the great write-up! It makes sense, and I think it can be useful for many cases!
I'll need to review the code a bit closer, though.

@mistotebe could you please check the idea too, please? I think you worked on connection wrappers at some point (not exactly like this one, but still, we need your point of view here...)

@mistotebe
Copy link
Contributor

From my experience with the OpenLDAP load balancer interacting with various pooling implementations, the following are useful when you need a connection pool:

  • a source of connections that a user can safely send a bind() on (Bind requests destroy most of a session's state), not just a source of pre-bound connections
  • something that gives you a connection for as long as you need to let others know when your requests might be correlated, not for one operation only (Python context managers are a good way to do this, I wouldn't react to a del however, the caller should use a with statement or a try:+finally: to make sure they don't leak it, it's not the pool's job to babysit them)
  • when a connection is returned, the pool checks that it hasn't been closed/Unbound (in async contexts you can also monitor idle connection status with the event loop's select())
  • a minimum/maximum number of open connections for efficiency, opening one on request if none are available but connection limit hasn't been reached yet
  • a way to borrow a connection exclusively but maybe (but that's quite advanced use) also in a shared way, promising to use the asynchronous methods only

Also think the implementation could be greatly simplified if the Connection class inherited from LDAPObject (or #464's Connection).

@michaelalang
Copy link
Author

He @mistotebe

thanks for your input and support, in regards to

Also think the implementation could be greatly simplified if the Connection class inherited from LDAPObject (or #464's Connection).

I am not trying to add yet another LDAPObject. The methods implemented are for the specific use case of utilizing one connection with a search principal and and authentication principal. All the other methods should only relate to the Pool.

The other points you mentioned are a generic concept description and thanks for that (really love the definitions).
I haven't check on any asynchronous code parts as from my perspective (and as you stated with bind on shared connections) the connection cannot live outside of a context (unless people can code that properly on their own).
That said and meaning, if one enters the context implemented and calls async on the connection and leaves the context it should not stay in the state as the Pool (at the moment) would not return you the inUsed connection anymore later for another context (only if returned properly).

I also wouldn't see how this should be working out as a pool retuned connection with server controls set on paging will not reset if someone just does a simple_s on it ...

Anyways, thanks again for the great input. From the code I submitted, what exactly would you change in regards to the mentioned points ? (@droideck I think that question is for you unless @mistotebe wants to answer too)

@michaelalang
Copy link
Author

@droideck anything I can do to help getting a go or no go decision ? Not talking about merging but if no go, we would need to look for alternatives of the concept somehow ...
thanks <3

@michaelalang
Copy link
Author

@mistotebe anything I can do to get this PR progressing ?

@michaelalang
Copy link
Author

@droideck @mistotebe any way to get an update on the request ?

@droideck
Copy link
Contributor

droideck commented Aug 6, 2025

After reviewing the code and thinking more on the subject, I don't think this should be added to python-ldap in its current form. It covers one specific use case, which is valid but niche.

And to cover a lot of pooling cases for different kinds of applications, it will be enormous work (and it will require a lot of support from us). To name a few: toggble connection affinity support, batch processors need long-lived connections, async applications might need completely different approaches, etc.

There are other alternatives already for some cases, as far as I know (django-ldap, flask-ldapconn, HAProxy as an external force)

Could we make something simple like this below, maybe?
But it will require some work and testing...

class PoolableLDAPObject(LDAPObject):
    """Base class that makes LDAPObject pooling-friendly"""
    
    def reset_session_state(self):
        """Reset any session-specific state for pool reuse"""
        # Abandon pending operations
        # Clear controls
        # Reset to base bind
        pass
    
    def validate_connection(self):
        """Check if connection is still valid"""
        try:
            self.whoami_s()
            return True
        except:
            return False

This would let users build their own pooling while python-ldap stays focused on the things it does best.

@michaelalang @mistotebe What do you think?

@michaelalang
Copy link
Author

@droideck I am not sure to understand what you mean with it covers one use case ? ...

I am fine with this PR being rejected it's just annoying that it takes 8 months to get to such a conclusion ...

@mistotebe
Copy link
Contributor

@michaelalang sorry, I have not had much time to review in more detail.

My concerns (if this were to be considered):

  • consider "how" this is going to be used - why is the Connection a context manager rather than ConnectionPool? As a user, I'd find this very surprising
  • having things marked as @property and using them to trigger side-effects is very confusing as well
  • I don't understand the locking discipline and how it's meant to be used if connections are not shared, if sharing is expected, how does one do it (right)?
  • the e2c code doesn't belong anywhere near the pool (think "one logical change per patch, one patch per logical change")
  • I still think the ConnectionPool should yield a subclass of LDAPObject or 464's Connection unless overridden (e.g. subclassed)
  • what happens to the connection as it is returned? e.g. if user calls bind() or do something that has side-effects, do they have a way of telling the pool to clean up afterwards?

@michaelalang
Copy link
Author

@michaelalang sorry, I have not had much time to review in more detail.

I didn't mean you in particular with my frustration on the timing ...

My concerns (if this were to be considered):

  • consider "how" this is going to be used - why is the Connection a context manager rather than ConnectionPool? As a user, I'd find this very surprising

because the Connection should be what is put into context ... since one thread(user) works with one connection, from my POV the connection is what should be put into state not the Pool ... but maybe I am seeing that wrong ?

  • having things marked as @property and using them to trigger side-effects is very confusing as well

sorry, if that's confusing I can change that to normal calls ... just find that more convinient if no parameters are expected on a method anyway.

  • I don't understand the locking discipline and how it's meant to be used if connections are not shared, if sharing is expected, how does one do it (right)?

the locking only ensure's authentication calls are not being interrupted like
Pool with 10 connections:
Thread1 grabs connection 1 and does what ever it does.
Thread2 grabs connection 2 and does what ever it does.

Event though the context should return the connections to the pool, let's just assume someone did not use a context and had connection 3 grabbed.
Thread4 now grabs connection 3 which is already "in use" and for the race-condition game the debugging thread and thread4 both try to issue an authenticate on the same connection.

Sharing is not meant in a one simple_s call on one thread and the next thread calls simple_s on the same connection but within/from a different context.

  • the e2c code doesn't belong anywhere near the pool (think "one logical change per patch, one patch per logical change")

more than fine to remove that ... was more convenient to handle response

  • I still think the ConnectionPool should yield a subclass of LDAPObject or 464's Connection unless overridden (e.g. subclassed)

not sure what you mean by that ...

>>> with pool.get() as conn:
...     print(isinstance(conn, ldap.ldapobject.LDAPObject))
... 
True
  • what happens to the connection as it is returned? e.g. if user calls bind() or do something that has side-effects, do they have a way of telling the pool to clean up afterwards?

I think that merges with the sharing confusion ...
the bind needs to be handled manually if a users decides to do so. For simple verification the connection brings the method authenticae(binddn, bindpw) which as explained above locks and reauthenticates to the pool default credentials. Impersonating with sharing is simply not possible without understanding the logic of the application that implements what-ever ACL .

does this help ?

And for the if this were to be considered yes I do consider as even if that PR is rejected I learned a ton again ... <3

@droideck
Copy link
Contributor

droideck commented Aug 6, 2025

I'm also sorry that we didn't spend enough time with the PR sooner.
Speaking about myself, I had a lot of other priorities in Red Hat that consumed too much focus and time...

I can see you've learned a lot through this process, and that kind of exploration is valuable for the community even when the code doesn't land in core.

Regarding the topic, connection pooling is indeed valuable, but it's a complex topic that requires a lot of testing, iteration, and correct application. And this PR needs a lot of work as noted by @mistotebe (I'll leave it to him to answer your replies, as he has more experience on the subject).

I'd suggest (if you are interested in the topic) to develop this as a separate package (perhaps ldappool on PyPI, or something like that) where you can:

  • Iterate on the design based on user feedback
  • Fix issues (I'd be interested to review there too)
  • Build a community of users who need this specific approach (as I mentioned before, there are a lot of different needs to cover with the tool)

Once it's battle-tested and widely used, we could revisit integration or at least recommend it in our documentation.

For python-ldap itself, we might consider adding minimal pooling-friendly features (like a reset_session() method or connection validation helpers), but full pooling is better handled by specialized packages, IMO.

@michaelalang
Copy link
Author

all good and understand ... please close the PR accordingly :)

@mistotebe
Copy link
Contributor

@michaelalang sorry, I have not had much time to review in more detail.

I didn't mean you in particular with my frustration on the timing ...

My concerns (if this were to be considered):

  • consider "how" this is going to be used - why is the Connection a context manager rather than ConnectionPool? As a user, I'd find this very surprising

because the Connection should be what is put into context ... since one thread(user) works with one connection, from my POV the connection is what should be put into state not the Pool ... but maybe I am seeing that wrong?

No you're right, sorry, my wires got crossed and somehow thought that a with pool.get() as c: would trigger an __enter__() on the pool.

  • I don't understand the locking discipline and how it's meant to be used if connections are not shared, if sharing is expected, how does one do it (right)?

the locking only ensure's authentication calls are not being interrupted like Pool with 10 connections: Thread1 grabs connection 1 and does what ever it does. Thread2 grabs connection 2 and does what ever it does.

Event though the context should return the connections to the pool, let's just assume someone did not use a context and had connection 3 grabbed. Thread4 now grabs connection 3 which is already "in use" and for the race-condition game the debugging thread and thread4 both try to issue an authenticate on the same connection.
Sharing is not meant in a one simple_s call on one thread and the next thread calls simple_s on the same connection but within/from a different context.

A lock on the pool to not hand out the same connection twice makes sense (but I thought the interpreter concurrency model shouldn't really make it possible?) However your code has locks on the connection as well?

Just like I feel the pool management code should be part of the pool class (e.g. conn.giveback() might optionally do a state reset and then just call self._pool.put(self) where the pool does the hard work of deciding whether to keep it for later .get() etc.)

  • I still think the ConnectionPool should yield a subclass of LDAPObject or 464's Connection unless overridden (e.g. subclassed)

not sure what you mean by that ...

>>> with pool.get() as conn:
...     print(isinstance(conn, ldap.ldapobject.LDAPObject))
... 
True

Looking at your code, I'm not sure how that's achieved, since pool._pool holds ldappool.Connections, they are not derived from LDAPObject or anything else.

  • what happens to the connection as it is returned? e.g. if user calls bind() or do something that has side-effects, do they have a way of telling the pool to clean up afterwards?

I think that merges with the sharing confusion ... the bind needs to be handled manually if a users decides to do so. For simple verification the connection brings the method authenticae(binddn, bindpw) which as explained above locks and reauthenticates to the pool default credentials. Impersonating with sharing is simply not possible without understanding the logic of the application that implements what-ever ACL .

does this help ?

Sure, but to have a pool (with reusable connections) you want a way to signal to the code this needs to be done. Unless you're doing it unconditionally for all case, would happen either as you're asking for a new connection (like having different endpoints pool.get(), pool.get_bound(), pool.get_shareable()) or as you're returning it (where "this connection is unclean and needs to be reset for use by pool.get_bound()/shareable()" being the default).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants