the persistent idiocy of "privileged ports" on Unix

TCP and UDP have ports.  These are 16-bit numbers, so each protocol has 65536 of them per IP address (0-65535), with port 0 reserved.

These protocols don't care to differentiate between the ports.  Elsewhere, IANA presumes to operate a process to allocate "well-known" ports in the range 0-1023, "registered ports" in the range 1024-49151, and to reserve the remainder, 49152–65535, for "ephemeral" ports.  The calling end also has to have a port, which is how replies find their way back within the virtual connection, and these are conventionally picked from the ephemeral range by the OS's networking stack.
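To see the stack picking the caller's port, here is a minimal Python sketch.  The function name ephemeral_port is mine, and the loopback listener is only a throwaway target.  Note that the range actually used is an OS setting rather than IANA's: as far as I know, Linux defaults to 32768-60999.

```python
import socket


def ephemeral_port() -> int:
    """Connect to a throwaway local listener and report the caller's port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the OS to choose a free port for the listener.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as caller:
            # No explicit bind() on the calling end: the networking stack
            # assigns its source port during connect().
            caller.connect(listener.getsockname())
            return caller.getsockname()[1]


if __name__ == "__main__":
    print(ephemeral_port())
```

Run it a few times and the number changes, but it stays inside whatever range the OS has been configured to use.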

The whole idea of ports is ridiculous, because it allows ISPs to arse around presuming to decide which services they will "not allow".  Anything that allows ISPs to do anything other than shift opaque packets will allow ISPs to meddle and break things, and due to the Law of Meddling, if they can, they will.  I am currently working around an issue with Claro, a pretend ISP, blocking port 5060, allocated to SIP, not because they think doing so will help their telephony business, but simply because they "can".  The Law of Meddling in action.

This said, ports are a current reality, and this article is not about doing away with ports.

In Unix land, alongside the registry notion of the "well-known" port range, came the hack that only root could bind(2) to such a port, known as a "privileged" port in the Unix world.  Any user can bind(2) to any port outside this range, the non-privileged ports.  It should be immediately obvious, to anyone with a passing interest in this topic, how disastrously stupid this is, but we can provide some historical context to help understand it as a limited and temporary hack.
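The rule is easy to observe from an ordinary user account.  Below is a small Python sketch (try_bind is my own name, not part of any library) that attempts a bind and classifies the result.  On Linux the refusal shows up as EACCES.  What you see depends on the machine: as root it will simply succeed, and on Linux the threshold can be moved with the net.ipv4.ip_unprivileged_port_start sysctl, which some container setups lower to 0.

```python
import errno
import socket


def try_bind(port: int, host: str = "127.0.0.1") -> str:
    """Attempt a TCP bind to host:port and classify the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EACCES:
                return "denied"      # the privileged-port rule said no
            if e.errno == errno.EADDRINUSE:
                return "in use"      # something else already holds the port
            raise
        return "bound"


if __name__ == "__main__":
    for port in (80, 8080):
        print(port, try_bind(port))
```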

Imagine a world without mobile devices, wifi, laptops, and so on.  In this world, a network consists of Unix servers and Unix workstations.  The equipment is all owned, operated, controlled, by a single entity.  In the days of "privileged ports" this was a research department in a university, or a hybrid private-sector / defence department blue-skies research department, or something like that.  We can call this entity the Corporation here.

If all equipment connected to the network is controlled by the Corporation, you can use this privileged port thing as authentication, in both directions.  Users could run their own things on whatever high-numbered ports, but if you connected to the server's telnet port, 23, then you were definitely talking to that system's telnet service, as controlled by the Corporation.  Similarly, if you were a file server getting NFS requests from a workstation's low-numbered port, then these were legit Corporation-controlled NFS requests, and the client therefore wasn't shitting you about the UID (for example) on behalf of which it was making the requests.

In the modern world, ability to listen on (server), or originate from (client), particular ports almost never signifies anything of that sort by itself, outside deliberately-designed and carefully-controlled environments where it may hold.  On a typical network, someone's toothbrush can fire up an extra VM, having whatever IPs, and just do what it wants with regard to the ports on those IPs.

There are now two, converse, problems with the Unix privileged port "design".

Firstly, the systems administrator should be able to reserve service ports to the uids running those servers, even when those ports are outside the privileged range.  Let's take SIP as our example again.  Suppose you are running a SIP server as a system user called _sipd.0.  The SIP port is 5060, outside the privileged range.  How do you ensure that only the user _sipd.0 can bind(2) to port 5060 on the IP address(es) on which the service is defined as being available?  The answer is basically that you can't.  Any process can bind(2) to the service address, it just has to get there first.  Nor should the answer have to be Mandatory Access Control.  A port is a simple OS resource and it should be configurable as such.

Ah, but the server should not be configured to allow such a thing!  There shouldn't be users on there who are going to do that!  Well, then why bother with privilege separation, having users and stuff, in the first place?  All "solutions", like dedicating a server, virtualization, containerization, fall into the category of being pragmatic workarounds for a basic deficiency in the OS, but not addressing the basic deficiency in the OS itself.

Linux also has CAP_NET_BIND_SERVICE, which isn't the design we're looking for.  It's a property of a process, not a user.  And it allows binding to any privileged port.  It doesn't allow per-port configuration.
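The "get there first" point can be shown in a few lines of Python.  This is only a sketch: squat_then_serve is a made-up name, it uses TCP on loopback, and it lets the OS choose an unprivileged port as a harmless stand-in for 5060.  The first socket plays the squatter, the second plays the real server arriving late.

```python
import errno
import socket


def squat_then_serve(host: str = "127.0.0.1") -> int:
    """Return the errno the legitimate server gets when someone bound first.

    Returns 0 if the second bind unexpectedly succeeds.
    """
    squatter = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Any unprivileged process may do this; no permission is consulted.
        squatter.bind((host, 0))
        squatter.listen(1)
        port = squatter.getsockname()[1]
        try:
            # The intended service user now tries to claim "its" port.
            server.bind((host, port))
        except OSError as e:
            return e.errno
        return 0
    finally:
        squatter.close()
        server.close()


if __name__ == "__main__":
    print(errno.errorcode.get(squat_then_serve(), "no error"))
```

Nothing in the sequence asks who the first binder is.  The only check is whether the address is already taken.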

Secondly, the systems administrator typically wants to run servers as normal users, including servers whose service ports are inside the privileged range.  Take http, on port 80, a "privileged port" in this scheme.  If the sysadmin decided to run this particular web server instance as user _httpd.7, then why can _httpd.7 not bind(2) to port 80 on those IPs?  Again, because the OS does not have a way to say "this user can bind to this port".  So we have web servers, and all other servers whose standard ports happen to fall below 1024, expecting to be started as root, bind(2) to their service address, and then "drop privileges".  At this point, you must think I'm shitting you, but I'm not.  This is for real.  Failure to drop privileges after binding to the listening port is a whole category of vulnerability.  It's like throwing egg on the stairs every morning, and several times later each day carefully classifying various accidents in the accident log as "slipped on eggy stairs".  Stop throwing egg on the stairs!
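For anyone who hasn't seen the start-as-root-then-drop dance spelled out, here is roughly what it looks like, as a Python sketch rather than any particular server's code.  The function names and the uid/gid values passed in are placeholders of mine.  The ordering matters: supplementary groups and gid go first, because once the uid has changed the process no longer has the right to change them.

```python
import os
import socket


def bind_listener(host: str, port: int) -> socket.socket:
    """Bind and listen while still privileged."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(16)
    return s


def drop_privileges(uid: int, gid: int) -> bool:
    """Switch from root to uid/gid for good.  Returns False if not root."""
    if os.geteuid() != 0:
        return False              # nothing to drop
    os.setgroups([gid])           # shed root's supplementary groups
    os.setgid(gid)                # group before user
    os.setuid(uid)                # as root this sets real, effective and saved uid
    if uid != 0:
        # Forgetting or botching the steps above is the "eggy stairs" bug,
        # so check that root really is out of reach.
        try:
            os.setuid(0)
        except PermissionError:
            return True
        raise RuntimeError("privileges were not dropped")
    return True


# Typical use, started as root (the ids here are hypothetical):
#   sock = bind_listener("0.0.0.0", 80)
#   drop_privileges(uid=1234, gid=1234)
#   ... serve requests on sock as the unprivileged user ...
```

Every server that listens on a low port has to get some version of this right, which is exactly the problem.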

It's interesting to ask whether this behaviour is required by the specification, meaning here POSIX, or if it's more of an implementers' convention. Looking at bind(2) in POSIX 2018 (warning: links do not work because IEEE have deliberately broken linking to a specific page of the spec, you have to search again yourself for "bind" each time you go there and look at that page), the only thing of possible relevance seems to be this error value:


[EACCES]
 
The specified address is protected and the current user does not have permission to bind to it.

Neither the policy nor the mechanism of such protection is specified or even mentioned.  I haven't exhaustively searched POSIX to confirm privileged ports don't crop up elsewhere, but I'd be surprised if they did.  This is good enough to shift the onus and assume they don't, until someone points to where they do.

Thus we can conclude that what POSIX calls "implementers", meaning designers of derived interfaces that remain compatible with POSIX (and which, conceptually, only later get "implemented"), are free to design whatever they want to control access to ports.

The best workaround I've found so far for the second problem (allowing access to the privileged ports) is Ian Jackson's authbind(1).  It's currently done as a suid root and LD_PRELOAD hack, but the nice part is that port access is controlled by execute rights on normal files under a particular location, /etc/authbind/by{addr,port,uid}.  Whatever the warts of the exact interface design, the basic idea of using ordinary filesystem object access semantics is sound.  The implementation choices seem proof-of-concept-y.  An implementation of this kind of thing on monolithic Linux would naturally go in the kernel (I'm not sure if people would claim the kernel should not look at /etc, but this is basically irrelevant).
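To make the "execute rights on a file" idea concrete, here is a Python sketch of that style of policy check.  It is my own simplification, loosely modelled on the byport and byaddr lookups, and not authbind's actual code: the byuid form is left out, and may_bind and its base parameter are invented for the illustration.

```python
import os


def may_bind(base: str, addr: str, port: int) -> bool:
    """Decide whether the calling user may bind addr:port.

    The policy lives entirely in file permissions under `base`
    (think /etc/authbind): execute permission on the file means yes.
    """
    candidates = [
        os.path.join(base, "byport", str(port)),
        os.path.join(base, "byaddr", f"{addr},{port}"),
    ]
    for path in candidates:
        if os.access(path, os.X_OK):
            return True           # file exists and we may "execute" it
        if os.path.exists(path):
            return False          # file exists but not for us: refuse
        # no such file: fall through to the next, more specific form
    return False


if __name__ == "__main__":
    print(may_bind("/etc/authbind", "192.0.2.1", 80))
```

The appeal is that granting a port to a user becomes chown and chmod, with no new policy language to learn.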

authbind(1) seems to only ship with Debian.  The suid root component is /usr/lib/authbind/helper.  You have to give option --deep if you want child processes to be affected the same way -- I have no idea why you wouldn't want that.

This does lead to rather ridiculous situations of dropping privilege from the system to run the server, only to elevate it again, just for the port, like in the following example, run from process supervision as root:

dropto _ng.test authbind --deep nginx -c /etc/nginx.dtest.coulddobetter.at.conf -e /var/log/nginx/error.log -g 'daemon off;'

(dropto(1) is my privilege-dropper, kindly allowed for publication as free software by a former employer, tho it can be implemented in just a few lines of change from DJB's setuidgid(1).  See https://github.com/tomgjones/dropto).

The filesystem configuration that goes along with the above is

-rw-r-xr-- 1 _ng.test _ng.test 0 Nov 18 14:45 /etc/authbind/byaddr/2001:ba8:0:402e::,80

Picking the above apart, it starts off as root.  The dropto _ng.test ... exec's the remainder as _ng.test, a normal user, and the one intended to run the web server.  The authbind --deep then arranges, due to its configuration under /etc/authbind, that the nginx ... it runs can bind to the configured TCP port(s).

In case the long command is perplexing, it can be useful to know that authbind(1) and dropto(1) can be thought of as adverbial programs, in that they exec(2) their arguments, with something about how they're run being modified.  In this way, adverbial commands chain together simply.  So the real command is the part starting nginx, and there are two "adverbs" preceding it, the dropto one, and the authbind one.
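For illustration, an adverbial program can be tiny.  The Python sketch below is a made-up one (exec_with_env is not a real tool): it changes one thing about how its argument is run, here a single environment variable, and then exec's the rest of the command line, so that no wrapper process is left behind.

```python
import os


def exec_with_env(name: str, value: str, argv: list[str]) -> None:
    """Set one environment variable, then become argv.  Does not return."""
    os.environ[name] = value      # the "adverb": modify how the command runs
    os.execvp(argv[0], argv)      # replace this process with the command


# As a script this would be called along the lines of
#   exec_with_env(sys.argv[1], sys.argv[2], sys.argv[3:])
# so that adverbs nest on one command line, for example
#   withenv TZ UTC dropto _ng.test authbind --deep nginx ...
# where "withenv" is a hypothetical name for this sketch.
```

Because each adverb ends in an exec of its arguments, the process supervisor ends up looking directly at nginx, not at a stack of wrappers.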

It's fascinating to think of the vast hordes of sysadmins who have "worked around" this issue in their own ways for as long as it has existed (usually just by running everything as root, perhaps in conjunction with containerisation or similar), with no satisfactory systematic, mature and standard solution emerging.  This does seem to be how things work.  Millions, or billions, of people will accept how things are, ignoring improvements that are in plain sight.  It's disappointing.  But it's also somehow encouraging that creating basic improvements is so accessible.

I do not yet have my own solution to this issue designed.  At this point, the problem is identified and described.

Finally, I have to note that "answers" sites are full of things like "just use sudo", meaning, just run it as root, for any issue involving binding to a low-numbered port, or indeed, any other access issue.  One example: https://stackoverflow.com/questions/34258894/nginx-still-try-to-open-default-error-log-file-even-though-i-set-nginx-config-fi.  Weeping Jesus help us.
