URL parsing is a common problem. The rules for it are well established, and we have around 600 libraries to choose from, and that's just in the node.js ecosystem. If we account for every major programming language, that number likely grows into the thousands. And new languages are being created all the time, and each of them will implement URL parsing all over again.
This seems like insanity. Is URL parsing such a complex problem that each language needs to have its own set of libraries to handle it? Why can't node.js reuse the URL parsing code that has doubtless already been written in C? Or Python? Or Java? Why are we redoing work that's already been done?
And every time a URL parsing library is recreated, there is a non-zero chance that it will have bugs. The more times we reimplement the same code, the more likely it is that at least one of those implementations is wrong.
Let's take a step back and view the problem from a business perspective. Let's say you have some code which determines how long an article will take to read. Eventually you will want to call this function from more than one place. This isn't a problem if your entire codebase is built on one language, but most businesses aren't like that. You have some Python here, some Perl there, some PHP hiding under this rock, and one service that someone insisted on writing in Haskell. Even if your business is really simple, you probably have a backend language and some browser side code.
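To make that concrete, the reading-time code might be nothing more than a word count and a division. A minimal sketch, assuming a reading speed of roughly 200 words per minute (the function name and the constant are made up for illustration):

```typescript
// Estimate how long an article takes to read, in whole minutes.
// The ~200 words-per-minute figure is an illustrative assumption.
export function readingTimeMinutes(articleText: string, wordsPerMinute = 200): number {
  // Count words by splitting on runs of whitespace.
  const words = articleText.trim().split(/\s+/).filter(Boolean).length;
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}
```

It's trivial to write once, and just as trivial to quietly rewrite in Python, PHP, and the Haskell service, each copy with its own subtle differences.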
Homogeneous Code
"It's easy!", I hear you saying. "Write all your code in {language}! Then all language compatibility issues disappear and you can lead a pleasant, trouble-free life." The problem is, not everyone agrees on what {language} should be. And unless your code will never have to touch browsers, that {language} is Javascript if you want real cross-code compatibility. And if you also have an android app, then your code needs to be... JS still? Kotlin maybe? I'm not even sure if there are good solutions for cross-platform code these days.
Even if you could use one programming language, you'd be stuck with it forever. Do you want to be in the same situation banks are in? Struggling to hire COBOL programmers because they refused to update their tech, even though almost everyone who understands it is gone? Eventually you want to take advantage of what new languages have to offer.
The other problem is that even if we could get everyone using one language, we'd have stifled growth. C, Rust, Python, Lisp, and Haskell would never have existed if programmers had insisted that all code be written in Fortran. We can't stop the future to make the present a bit easier.
The point about using one language industry-wide is moot anyway. You will never get all programmers to agree on The One True Language. To some programmers, anything but C lacks precision and control. To others, anything but Haskell feels like tedious repetition. Everybody values something different in a programming language, and there is no language in existence that makes everybody happy.
Encouraging code uniformity has some merits, especially within a single project. But I think it's a terrible industry-wide solution. If anything, we should be creating experiments and new languages constantly, because that is how the science of computing evolves.
Foreign Functions
So one shared language is a bad solution. What about using FFI (Foreign Function Interface) libraries? Then you could import modules from other languages and use them as if they were in your local language. Some particular language could own The Official URL Parser and another language could own The Official Valid Email Regex (Perl). Then we can all use each other's code and we can stop rewriting libraries! ...except that now we've just shifted the code duplication into the FFI library world.
If each language needs an FFI library to bridge to every other language, we end up with N*(N-1) FFI libraries. So if there are 30 major programming languages (probably more, and growing daily), that's 870 FFI libraries. And that's assuming only one FFI library per language pair. Remember, there are 600 libraries for URL parsing in node.js alone.
Also, you need to remember which source language the library you want is implemented in. Then you need to install it with whatever package manager that language uses (if the language even has a package manager). Then you need to import the FFI library in addition to the library you actually wanted, and finally you need to remember any language-specific quirks that might affect how you call the newly imported code. Needless to say, this solution gets messy.
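For a taste of what that looks like in practice, here's a rough sketch of the double-import dance in node.js using the ffi-napi package to call a plain C function from libm. Take the details as illustrative; on some systems, for example, the library name needs a version suffix like "libm.so.6".

```typescript
// First import: the FFI bridge itself.
import ffi from "ffi-napi";

// Second step: declare the foreign library and hand-write the C signatures,
// because type information doesn't cross the language boundary on its own.
const libm = ffi.Library("libm", {
  ceil: ["double", ["double"]], // double ceil(double x);
});

console.log(libm.ceil(1.2)); // 2
```

And that's the easy case: a single C function with no strings, structs, or memory ownership questions involved.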
Transpile Your Code
Since the algorithms we use to solve problems don't change much from language to language, why can't we just write all our algorithmic library code in a shared language that is designed to compile down to lots of different target languages? Haxe, for example, is just such a language. So there we go: we could have common modules written in Haxe, then compiled down to all the major platforms, so that every language could use the same algorithm code to solve common problems.
The good part about this solution is that module importing becomes easy again. Since the source code of each module is actually available in whatever language you happen to be building with, there's no dealing with many package managers, no need to remember which language a module is written in, and no need to double-import an FFI library and then the real library.
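For example, if a shared Haxe URL module were transpiled to JavaScript and published to npm (the package and function names below are hypothetical), consuming it would look like any other local dependency:

```typescript
// Hypothetical npm package generated from a shared Haxe module.
import { parseUrl } from "shared-url-parser";

const url = parseUrl("https://example.com/articles?id=42");
console.log(url.host); // "example.com"
```

No FFI shim, no second package manager, no guessing which language the original was written in.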
This solves the FFI library explosion problem as well. Instead of needing to write N*(N-1) FFI libraries, we would only need to write N language targets for the Haxe compiler. I'm not sure how idiomatic that compiled Haxe code is, but remembering the Haxe quirks seems better to me than Python quirks sometimes, Perl quirks other times, and sometimes C quirks (and so on).
Perhaps there could be an organization that collects vetted Haxe modules and automates the build pipeline that transpiles and publishes updated modules into each language's ecosystem. Then we could have one official version of basic things like URL parsing, and completely avoid situations like we had with JSON Web Tokens, where some library authors didn't implement the standard correctly.
You could use this for your own code as well. Anything that needs to be shared between programming languages and environments would be written in Haxe, and anything that is one-off code could be written in whatever language the project chooses.
Of course, it's difficult to know ahead of time which code will be shared and which code is only needed once. And the line between "shared library code" and "business logic that I only need once" is not as clean as I would like it to be. Over time you discover that code that used to be single-use is actually needed all over the place. So over time, you would need to extract more and more of your business logic and rewrite it as Haxe library code so you can reuse it. Now we're stuck with the same problems as the Homogeneous Code solution, except that the One Language is Haxe.
Also, sometimes performance really matters. Sometimes a language like D just happens to have a really fast JSON parser. If all your standard code is written in one shared language, it's tricky to guarantee that the code is going to be fast in each of the target languages.
Microservices
"Microservices!", some people will inevitably shout. "Turn your function that calculates article reading time into a JSON API. Everything language already speaks HTTP, after all". I won't spend too much time rebuffing this, since it's a very clearly terrible idea.
First, this turns normally simple function calls into complex network calls. These operations now need to handle an extra layer of network-related errors on top of the errors they would have had to handle anyway.
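Here's roughly what that looks like: the one-line local call from earlier becomes a network request with a whole new family of failure modes. The service URL and response shape below are invented for illustration.

```typescript
// The same reading-time calculation, now hiding behind a hypothetical HTTP service.
async function readingTimeMinutes(articleText: string): Promise<number> {
  let response: Response;
  try {
    response = await fetch("http://reading-time.internal/estimate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: articleText }),
    });
  } catch (err) {
    // DNS failures, timeouts, connection resets: none of these existed
    // when this was a plain function call.
    throw new Error(`reading-time service unreachable: ${err}`);
  }
  if (!response.ok) {
    throw new Error(`reading-time service returned ${response.status}`);
  }
  const body = await response.json();
  return body.minutes;
}
```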
This solution is also the slowest. Every level of indirection adds web server overhead, so your code won't execute as fast. And a network round trip costs milliseconds where a local function call costs nanoseconds, so your data transfer will be slower too. This results in your whole system slowing down. Heaven forbid you do some networked "function" calls in a loop.
Microservices also result in lower system stability or higher system complexity. Making reliable services is already hard enough without adding lots of utility servers that can go down and block everything. You could add redundant copies of all your utility servers, load balancers for the load balancers, and retries baked in at every level of your program, but that's a solution for a problem that doesn't need to exist.
Trans-language Module Standards
So microservices are a non-starter, and transpiling everything from one source language isn't a silver bullet either. We need to keep all the languages, and allow them to write modules in their own code, but we still want to share those modules somehow. FFI wasn't quite right, but what if we took the FFI solution one step further and established some sort of "standard module" interface? We could define a shared data format that allows us to call a foreign compiled function as if it were native code. I'll call this "foreign standard modules".
Each language that wants access to all the standard modules would just need to implement a translation layer between the standard format and its own internals. Once again, instead of N*(N-1) translation layers, we would only have N. The standard foreign function call could communicate whether the call will result in a synchronous value, an asynchronous value, or a stream.
The problem of package managers disappears too. A programming language team could build a system to automatically publish a local-language-friendly wrapper around the standard module. This way, the module would a) work with the package manager the language provides, and b) feel idiomatic, since the foreign standard module has been wrapped by a bit of translation code.
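To make that a bit more concrete, here's a purely hypothetical sketch of what such a generated wrapper might work against. Every name and type here is invented, since no such standard exists.

```typescript
// Hypothetical descriptor for a "foreign standard module". The standard would
// define how arguments and results are encoded, and whether a call produces a
// plain value, an async value, or a stream.
type ResultKind = "sync" | "async" | "stream";

interface ForeignFunction {
  name: string;
  resultKind: ResultKind;
  // Opaque call into the shared binary interface; the per-language translation
  // layer is responsible for encoding and decoding arguments.
  invoke(encodedArgs: Uint8Array): Uint8Array | Promise<Uint8Array>;
}

interface ForeignStandardModule {
  name: string;
  version: string;
  functions: ForeignFunction[];
}

// A generated npm wrapper could then expose each foreign function as an
// ordinary local function (result decoding is elided in this sketch).
function wrap(mod: ForeignStandardModule): Record<string, (...args: unknown[]) => unknown> {
  const api: Record<string, (...args: unknown[]) => unknown> = {};
  for (const fn of mod.functions) {
    api[fn.name] = (...args: unknown[]) =>
      fn.invoke(new TextEncoder().encode(JSON.stringify(args)));
  }
  return api;
}
```

The generated wrapper would be all most developers ever see; the standard format stays an implementation detail underneath.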
Scripting languages like Javascript and Python, or runtime languages like Java, would be more difficult to integrate, since they would need to bundle their runtime with the module somehow. This would result in a larger build size. But this is an optimization problem, not really a technical barrier. In fact, Java has already added support for building a module with a minimal runtime.
In order for this to work, we would have to actually get a decent number of language designers to agree on a shared binary function interface (and actually implement it). And universally accepted standards are... difficult.
Additionally, there are some ideas which simply cannot translate to other languages. For example, Rust has a memory safety guarantee. Since most other languages don't make that guarantee, it would be difficult for Rust code to import modules built from them without giving it up.
Overall, I still think this is the cleanest solution, but I'm no expert in cross-language code reuse.
The end bit
Basically, any way you try to tackle this problem, you find more problems. Trying to adopt one uniform programming language is never going to work. Foreign function interfaces are a good idea, but the current implementations are not easy to use. Transpilation is good in some contexts, but it has a lot of the same problems as a uniform programming language. And microservices are a joke answer.
It's a tricky situation, but I think it's a waste of good programmer talent to continually implement the same libraries for every new language that comes along. We should try to find a way to reuse what's already been done.