Attention: As of January 2024, We have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
Hi! Since I’ve set up a new RSS aggregator, I’m looking for your favorite blogs, sites, and newsletters to help populate it with interesting stuff. Y’all know I find most things fascinating, so send me anything on any topic, no matter how niche or tangential!
A lot of data work in 2023 is very different from how things were when I started 15+ years ago. Most of us now source our data from giant databases that someone else has set up and manages for us. A lot of our tools have graphical interfaces, whether it’s a browser window or the occasional stand-alone app like a spreadsheet or IDE. With the rise of the “data engineer” title, the many people who don’t want to fuss about on the command line modifying config files have been freed from much of that work. This is all a positive thing, because the data world is more than big enough to accommodate people who don’t like working in specific sections of the tech stack.
But even if we don’t need to do it at a professional level, it’s a great idea to have some familiarity with running a small system with actual useful services. Not only do we get to make some useful stuff for ourselves and pick up skills for when we do have to dive into data engineering tasks, we also gain a base understanding (and appreciation) of what all the various Ops and SRE folks do for us.
You’d think that a cheap $5/mo server instance on your favorite second-tier cloud provider wouldn’t give you enough resources to do things like run containers or compile software, but it actually can. Plus you can host simple stuff like your own homepage and resume on it for easy reference. The value for the money is pretty great.
In search of a web-based RSS app
Here’s what prompted me to write about running servers this week — with the rapid mass-implosion of Twitter, Reddit, and other social media, I came to the conclusion that the data universe is going to be fragmented for the next couple of years, and I was going to need to keep track of an increasing number of blogs and newsletters to stay on top of things. My email inbox is utter chaos and can’t filter things well enough, so it was time for a dedicated tool to manage it all — enter RSS feed aggregation.
In my mind, there had to be an RSS client that runs as a web-based service that I could install on my tiny $8/mo server. I didn’t want to run a local RSS client because I constantly swap between computers, and keeping things in sync would just be a nightmare. So, off the internets I went.
I’ve been putting stuff online for a while. First by using free web hosting on things like Geocities (it was the late 90s). Then I graduated to paying a managed web host to serve some simple web pages. Finally, I slowly learned enough FreeBSD to move all my stuff onto small virtual web servers. From WordPress installations with billing storefronts to random little web apps and custom web services, I’ve done plenty with little machines that cost a couple of dollars a month to run. So, surely, there must be some software I can run to solve this task, right? How hard could it be?
A few RSS reader applications seemed popular and could be self-hosted. Newsblur (a subscription SaaS product that you can self-host), Tiny Tiny RSS, and FreshRSS came up as respectable candidates. There are many other alternatives, but I figured the most popular ones are probably the most feature-rich and best maintained for a lazy admin like myself.
Then I hit my first big issue: the two relatively “big” projects, Newsblur and Tiny Tiny RSS (TT-RSS), both only distribute their installations as Docker containers.
The biggest thought in my head when I saw this was — What the hell?
Whatever happened to “get the package from your distro, or compile this binary from source” like a normal system app? Web apps usually don’t even have a compile stage; we’d just do the “here are the app binaries/files, point the web server at this directory, have this list of dependencies installed and configured, go!” process.
When did application distribution start going to containers? Is this some trend that I completely missed out on? I get that containerization is all the rage in the enterprise/ops world, but for pet projects and things too?
Newsblur is a full-on SaaS subscription service built to horizontally scale with their paying customers; they’re just providing the self-hosted version as a way to support their freemium business model. It’s somewhat understandable that their self-hosted installation process spawns a number of Docker containers to host the various services they need, like Redis caches and database instances, on top of a basic web server. It’s overkill for a home gamer who just wants to run a single instance, but the company also isn’t incentivized to make the most efficient free option.
TT-RSS doesn’t have a business model; it’s just open source software. But they decided to pack their app, which includes the Postgres database and PHP 8.1, into one Dockerfile, with nginx in another. I don’t understand why both the app and Postgres are installed into the same container, since any scaling situation I can imagine would want the stateful data separated from the stateless application.
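For contrast, the more conventional split — a stateless app container pointed at a stateful database container with its own volume — might look something like the following sketch. This is purely illustrative: the image name `some-rss-app`, the environment variable names, and the port mapping are hypothetical, not TT-RSS’s actual setup.

```shell
# Put both containers on a shared network and give the database
# a named volume, so the data survives app container replacements.
docker network create rssnet
docker volume create rssdata

# Stateful piece: Postgres with its data directory on the volume.
docker run -d --name rss-db --network rssnet \
  -e POSTGRES_PASSWORD=changeme \
  -v rssdata:/var/lib/postgresql/data \
  postgres:15

# Stateless piece: the app just needs to know where the database is.
# Upgrading is "delete this container, run the new image".
docker run -d --name rss-app --network rssnet \
  -e DB_HOST=rss-db -e DB_PASS=changeme \
  -p 8080:80 \
  some-rss-app:latest
```

With that layout, swapping app versions never touches the data; bundling Postgres into the app container throws that property away.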
The only way any of this makes sense is that the Linux software ecosystem is a chaotic flock of software all vaguely flying in the same direction. Putting all the dependencies into containers lets developers avoid handling tons of support requests from people running arbitrary unholy mixes of OS/DB/PHP versions, which tend to generate the weirdest and most frustrating bugs and feature requests. It’s nice from a dev standpoint, but at the cost of user flexibility.
For people like us who are mostly learning how to run our own servers and work with containers, it’s a rather blunt forcing function to make us engage with container-specific workflows — something we’re increasingly going to have to do over the course of our careers. I actually considered taking these containers and seeing if I could run them on a cloud container-hosting service. I only stopped because it’d actually cost money to do so.
Aren’t containers “The Future”?
Ugh.
What drives me up the wall is that I can’t find a documented way to install these things without going the Docker route. There’s no official documentation for doing it manually, and I’m certainly not interested in reading through an unfamiliar codebase to figure out what assumptions are baked into everything to make it all work. It could be as simple as pointing the web server at a specific directory and changing certain config files… or it could be something else. Who knows.
Plus, this isn’t some magical hard-to-port microservice-based thing. It’s PHP with a database! The LAMP (Linux-Apache-MySQL-PHP) stack has been around with that name since 1998. ARGH.
I want non-containerized software here because my primary little web server runs FreeBSD, not Linux. We don’t have containers in that world; instead we’ve long had ‘jails’, which provide similar process and file isolation constructs. Since I’m running a completely different operating system, Docker and all the other container runtimes don’t even work on my server, so I’m effectively locked out of installing the software the “developer-intended” way.
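For readers who haven’t seen one, a FreeBSD jail is declared in `/etc/jail.conf` and started with the standard `service` tooling. A minimal entry looks roughly like this — the jail name, path, hostname, and IP address are all placeholders:

```shell
# /etc/jail.conf -- minimal single-jail example (values are placeholders)
rssjail {
    path = "/usr/local/jails/rssjail";   # jail's root filesystem
    host.hostname = "rss.example.com";
    ip4.addr = "192.0.2.10";             # address the jail binds to
    exec.start = "/bin/sh /etc/rc";      # boot the jail like a tiny system
    exec.stop = "/bin/sh /etc/rc.shutdown";
    mount.devfs;                         # give it a /dev
}
```

After populating the jail’s filesystem with a base system, `service jail start rssjail` brings it up, and processes inside it can’t see or touch the host’s files or processes — the same isolation idea containers deliver on Linux.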
Given that I’ve got a little kid running around the house, I don’t have the time to download the core repositories and then figure out how to get the app running in a dirty environment. Yes, I could get a Linux server like everyone else, but the one I already have is busy with other apps, while the BSD server has plenty of headroom left.
So in the end, frustrated at the offerings, I gave up and installed FreshRSS, because at least they’re still doing things the old, sane way. One relatively painless download, a slightly more painful half hour finding and installing all the PHP extensions I didn’t have, and a quick database update had me up and running. A bit more work got SSL running on the subdomain via Let’s Encrypt, and it was done! It took twenty years of tinkering to accumulate the knowledge (and cheat-sheet notes) to get this done in a few hours, but at least I can write a post about it!
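From memory, the whole dance boils down to something like the sketch below. Treat every specific here as an assumption: the release version, the FreeBSD-flavored `php82-*` package names, the paths, and the domain are all illustrative, and your PHP version and web server layout will differ.

```shell
# Fetch and unpack a FreshRSS release into the web root
# (version number and target path are illustrative)
fetch https://github.com/FreshRSS/FreshRSS/archive/refs/tags/1.21.0.tar.gz
tar -xzf 1.21.0.tar.gz -C /usr/local/www/

# Install the PHP extensions FreshRSS wants -- this was the "painful
# half hour"; package names assume FreeBSD's PHP 8.2 packages
pkg install php82-curl php82-gmp php82-intl php82-mbstring \
    php82-pdo_pgsql php82-zip php82-zlib

# Create an empty database, then the web-based installer handles the rest
psql -U postgres -c "CREATE DATABASE freshrss;"

# TLS on the subdomain via Let's Encrypt (certbot is one client option;
# FreshRSS serves its pages out of the p/ subdirectory)
certbot certonly --webroot -w /usr/local/www/FreshRSS/p -d rss.example.com
```

Point the web server at that `p/` directory, visit the site, and the installer walks through the database connection and admin account setup.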
Sidebar: What’s with your whole FreeBSD thing?
While I had been playing with command line stuff since the MS-DOS 5.0 days, Unix was a whole different universe of complexity. Back in the late 1990s and early 2000s, getting help with a busted OS install was extremely difficult. Most homes that owned a computer only had one that the whole family shared. So tinkering with *nix usually meant one of a few things: you were rich and owned multiple computers, you printed out detailed instructions for reference and hoped for the best, you had a backup/multi-boot setup in place Just In Case, or you could ask someone in person for help, like at a Linux User Group meetup.
Thanks to that high risk profile, I never really got into working with Linux systems until much later in my career, because they can break in unintuitive ways for a beginner. Heck, a number of years ago I accidentally rendered a system unusable while trying to update the core system, when a weird circular dependency on a critical component cropped up.
At the time, FreeBSD had a really good official manual (one that is still very good today). It actually explains where stuff lives and how to do basic tasks like install, update, and back up the system, and it was maintained and kept in sync with the OS as versions were released. It’s also very difficult to break the underlying OS in a way that makes it unbootable, even today. So it’s great for a beginner like my past self who couldn’t afford to brick their machine, and for someone who isn’t 100% confident in their sysadmin skills to recover a busted virtual server out in the cloud somewhere.
So while I now use Linux at work because it’s got official support for GPUs and CUDA and containers, I still use BSD for personal small things that I just want to keep functioning while doing fairly simple stuff.
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
Guest posts: If you’re interested in writing a data-related post to show off work or share an experience, or if you need help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the Discord.
Support the newsletter:
This newsletter is free and will continue to stay that way every Tuesday, so share it with your friends without guilt! But if you like the content and want to send some love, here are some options:
Share posts with other people
Consider a paid Substack subscription or a small one-time Ko-fi donation
Tweet me with comments and questions
Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!
I installed TT-RSS a few years ago, and got annoyed when they started saying regular installations are deprecated. I found the old installation documentation at https://tt-rss.org/wiki/InstallationNotesHost, which looks like what I'm still running, but I do worry that they're going to break something. I'm running it on NetBSD.
I also hate this trend of Docker as the default application deployment. Docker bakes mutable application state and data into the recipe process. If it were stateless it would not be so bad, and it would be much more auditable. We would then be able to have a command to unfurl any Dockerfile into the full set of build instructions.