I think Nushell seems cool, and Iâve had the thought of switching to it a couple of times.
The main blocker for me was that I just donât do that much in my terminal aside from moving around and running individual commands with their arguments. All the data processing capabilities sound cool, but just not part of what I find myself needing. On the other hand, sometimes I copy a command from somewhere or run a curl > sh install script (I know, the horror!), and the fact that that doesnât work with nu was a knock against it.
Now, I may be outside of the target audience for Nushell as a backend engineer that doesnât do a lot of data/local scripting stuff. But I have a bit of a hard time imagining who the ideal target user isâŚ
Itâs a way of detecting dangling references to slab-like allocations in runtime.
It exists purely in runtime, it requires each allocation to have a generation index, it requires every reference to carry that index.
When a reference is dereferenced, indices must be checked in runtime, if the allocation does not have the generation of the reference, the caller must handle the situation in runtime.
Unlike RC, this does not give any static guarantees about validity of the allocation. It might be slightly faster when there are lots of copying of references and few accesses to their allocations.
Generational references is basically a term invented by the Vale author for a software version of ARM memory tagging.
this is a random-access data structure that stores an unordered bag of items and gives you a handle for each specific item you insert. Looking things up by handle is O(1) â the backing storage is just a Vec â and items can be removed as well, which is also O(1). Handles are small (two usizeâs) and easy to copy, similar to a slice. The trick is that there is a generation number stored with each item, so that a âdanglingâ handle that refers to an item that has been removed is invalid. Unlike array indices, if you remove an item from the array and a new one gets put in its place, the old stale handle does not refer to the new one and trying to use it will fail at runtime.
This is useful for various things: managing lots of uniform things with shared ownership (such as video game resources), interning strings, that sort of thing. Essentially by using handles to items you have runtime-checked memory safety: you can get an item from the map using the handle, since itâs basically just an array index.
In practice this is not unrelated to Rc, itâs just that Rc does the accounting of memory on cloning the Rc, and this does it on âdereferencingâ by looking up an objectâs handle. Referencing counting, threading garbage collection and this sort of map are all different methods of achieving the same thing (checking for memory safety at runtime) with different tradeoffs. With a reference count you donât have to explicitly free items and canât have stale handles, but loops require special handling and you pay for the accounting cost on cloning a handle. With this you have to free items explicitly, but stale handles can be safely detected and the accounting happens when you look the item up. You can also use it as a slab allocator type thing, where you pre-allocate a large amount of storage and free it all at once.
Irrespective of whether theyâre hard to learn, Iâd agree that theyâre a useful tool worth learning at least this subset of.
Iâd also highly recommend https://regexr.com/. I always use it when Iâm trying to build up and kind of more complicated pattern, because it makes it easy to visualize how the parts of the pattern match text and it letâs you put as many example strings as youâd like to test against. Their built in cheat sheet is also very handy.
Iâd pretty much given up on learning regular expressions until I found some interactive, visual tools. In the early 2000s, it was RegexBuddy. The one Iâve been using for the last decade or so is regexplained.co.uk. Seeing a regular expression as a railroad diagram as you build it really helps not only to understand a specific regular expression but to learn them in general.
Prometheus queries are hard because itâs design priorities scale (via very effecient data collection) over querying data. In normal usage of Prometheus you canât escape using sampled data and itâs hard to escape using monotonic gauges, both of which place a minimum complexity on any query.
âOlderâ sytems like InfluxDB, TimescaleDB, or even âancientâ systems like Graphite make querying data far easier and can be a lot more flexible around the type of data being stored. You have the option to use same data the systems being monitored generated, and arenât forced into using monotonic gauges as the query engine can operate non-sampled data. So the data is structured in a way thatâs far easier to query in a lot of cases, and (moving entirely into the world of opinions), they have query languages that are a lot easier for engineers to use and understand.
I like Prometheus a lot, but everytime Iâve worked for a company using it theyâve failed to understand the trade-offs Prometheus and itâs design makes, leading to a lot of frustration with the resulting monitoring stack.
What has helped me when learning PromQL has been to think about it as a query language for filtering instead of something similar to SQL. You are always filtering down to get the subset of metrics you are interested in and then aggregating those into something to graph with some function (sum, avg, etc.).
I agree that to fully understand a query youâll need to grasp more details that are not immediately âbubbled upâ to the user: scrape interval, type of the metric (counter, gauge, histogram), query look back period, etc.
Iâm always uneasy when reading articles like âSQL is overengineeredâ, âgit is complicatedâ, â(some other core technology) is hardâ.
Especially with Prometheus query, I know Iâm repeating myself but I think that PromQL, like SQL, Git, IP, PKCS, ⌠is part of the software engineer toolbox. There should be no corner cutting, IMHO. The technology should be learned and mastered by anybody who want to qualify as a software âcraftperson.â Iâm more and more sadden at the lowering of the standard of my profession⌠But I might just have become an old man⌠Maybe you shouldnât listen to me.
Iâm fine with learning difficult technologies, but PromQL just seems poorly designed. Every time I touch it and try to do something well within the purview of what a time series database ought to be able to do, it seems there isnât a good way to express that in PromQLâIâll ask the PromQL gurus in my organization and theyâll mull it over for a few hours, trying different things, and ultimately conclude that hacky workarounds are the best case. Unfortunately itâs been a couple of years since I dealt with it and I donât remember the details, but PromQL always struck me as uniquely bad even worse than git.
Similarly, the idea that software craftsmen need to settle for abysmal toolsâeven if theyâre best in class todayâmakes me sad. Whatâs the point of software craftsmanship if not making things better?
Every time I touch [Prometheus] and try to do something well within the purview of what a time series database ought to be able to doâŚ
One big conceptual thing about Prometheus is that it isnât really a time series database. Itâs a tool for ingesting and querying real-time telemetry data from a fleet of services. It uses a (bespoke and very narrowly scoped) time series database under the hood, yes â edit: and PromQL has many similarities with TSDB query languages â but these are implementation details.
If you think of Prometheus as a general-purpose TSDB then youâre always going to end up pretty frustrated.
Could you expand on that more? Iâm curious what features/aspects of a general TSDB youâre referring to Prometheus lacking. (This is a curiosity coming from someone with no experience with other TSDBs)
Itâs not that Prometheus lacks any particular TSDB feature, because Prometheus isnât a (general-purpose) TSDB. Prometheus is a system for ingesting and querying real-time operational telemetry from a fleet of [production] services. Thatâs a much narrower use case, at a higher level of abstraction than a TSDB. PromQL reflects that design intent.
I mean, Iâm using it for telemetry data specifically. My bit about âordinary time series queriesâ was mostly intended to mean Iâm not doing weird high-cardinality shit or anything Prom shouldnât reasonably be able to handle. Iâm not doing general purpose TS stuff.
Gotcha. Iâd be curious to hear a few examples of what you mean, just to better understand where youâre coming from. Personally, Iâm also (sometimes) frustrated by my inability to express a concept in PromQL. In particular, I feel like joining different time series on common labels should be easier than it is. But itâs not (yet) gotten to the point that I consider PromQL to be poorly designed.
Yeah, unfortunately itâs been a while and Iâve forgotten all of the good examples. :/ Poorly designed feels harsh, but suffice it to say I donât feel like itâs clicked and it seems like itâs a lot more complicated than it should be.
Iâve wondered about this as well â how much of the difficulty has to do with a) working with time series b) PromQL syntax c) not knowing what metrics would actually be helpful for answering a given situation d) statistics are hard if youâre not familiar or e) a combination of the above.
Iâm curious if folks that have used something like TimescaleDB, which I believe uses a more SQL-flavored query syntax, have had a very different experience.
In my experience, itâs been a combination of everything youâve listed, with the addition of (at least my) teams not always being good about instrumenting our applications beyond the typical RED metrics.
I canât speak for TimescaleDB, but my team uses AWS Timestream for some of our data and itâs pretty similar as far as I can tell. Timestreamâs more SQL-like syntax makes it both easier and harder to write queries, Iâve found. On the one hand, itâs great because I grok SQL and can write queries pretty quickly, but on the other hand I can start expecting it to behave like a relational database if Iâm not careful. Iâd almost rather just use PromQL or something like it to create that mental separation of behaviors.
Iâm more and more sadden at the lowering of the standard of my profession
I see the reverse. Being willing to accept poor-quality UIs is a sign of low standards in a profession. Few other skilled trades or professions [1] contain people using poorly designed tools and regard using them as a matter of pride. Sometimes you have to put up with a poorly designed tool because there isnât a better alternative but that doesnât mean that you should accept it: you should point out its flaws and encourage improvement. Even very simple tools have improved a lot over the last few decades. If I compare a modern hammer to the one my father had when I was a child, for example, mine is better in several obvious ways:
The grip is contoured to fit my hand better,
The head and handle are now a single piece of metal so you donât risk the head flying off when the wood ages (as happened with one of his).
The nail remover is a better shape and actually works.
If carpenters had had your attitude then this wouldnât have happened: a mediocre hammer is a technology that should be learned and mastered by anybody who wants to qualify as a âcraftpersonâ. My hammer is better than my fatherâs hammer in all of these ways and it was cheap because people overwhelmingly bought the better one in preference.
Some things are intrinsically hard. Understanding the underlying model behind a distributed revision control system is non-trivial. If you want to use such a tool effectively, you must acquire this understanding. This is an unavoidable part of any solution in the problem space (though you can avoid it if you just want to the tool in the simplest way).
Other things are needlessly hard. The fact that implementation details of git leak into the UI and the UI is inconsistent between commands are both problems that are unique to git and not to the problem space.
As an industry, we have a long history of putting up with absolutely awful tools. Thatâs not the attitude of a skilled craft.
[1] Medicine is the only one that springs to mind and thatâs largely due to regulators putting totally wrong incentives in place.
I agree with you, although I think itâs worth calling out that git has at least tried to address the glaring problems with its UI. PromQL has remained stubbornly terrible since I first encountered it, and I donât think itâs just a question of design constraints. All the Prometheus-related things are missing what I consider to be fairly basic quality-of-life improvements (like allowing you to name a subexpression instead of repeating it 3 times).
Maybe PromQL also has limitations derived from its limited scope, but frankly I think that argument is⌠questionable. (It doesnât help that the author of this article hasnât really identified the problems very effectively, IMO.) The times Iâve resorted to terrible hacks in Prometheus I donât think I was abusing it at all. Prometheus is actively, heavily, some might say oppressively marketed at tech people to do their operational monitoring stuff. But on the surface itâs incapable of anything beyond the utterly trivial, and in the hands of an expert itâs capable of doing a handful of things that are merely fairly simple, usually with O(lots) performance because youâre running a range subquery for every point in your original query.
As an aside, I think the relentless complaining about gitâs model being hard to understand is not helping in either dimension. Saying âDVCS is too hard, letâs reinvent svnâ doesnât stop DVCS being useful, but it makes people scared to learn it, and it probably makes other people think that trying to improve git is pointless, too.
This is a very interesting point. I hear you in the general case (and Iâll also say that actually working more with PromQL has given me a lot of respect for it).
I think itâs easier to make that argument for tools that people use on a daily or at least very regular basis. Depending on the stage of company youâre at, to what extent your job involves routinely investigating incidents, etc, PromQL may be something you reach for more or less frequently. Itâs quite a different paradigm than a lot of other programming tools, so it makes sense to me that engineers who are new to it or donât use it frequently would have a hard time. Also, speaking as someone who learned it pretty recently, the materials for learning it and trying to get to a deeper level of understanding of what you can and might want to do with it areâŚsparse.
I think you nailed it - in many cases you donât touch Prometheus until youâre investigating some issue and thatâs often when itâs urgent, and doing so using an unfamiliar query language is a recipe for pain. Of course, you could set aside some time to learn it, but if a lot of time passes until you need it again, those skills will have faded again.
git is hard to learn compared to some of its competitors but has become ubiquitous enough, and once you start using it daily you will learn it properly in no time. Learning additional stuff about it becomes easier too once you have a good foundation and it will stick around better, as well. For SQL Iâd argue the same - at uni I almost flunked my SQL course due to simply not grokking it, but Iâve worked with it so much that Iâm currently one of the companyâs SQL experts.
(Maybe off-topic) I almost reported it as spam. As it reads like native advertising. All of this supposed complexity is described, to then the article concludes on:
We kept hearing engineers say that âqueries are hardâ while working on collaborative notebooks for DevOps and Site Reliability Engineers at Fiberplane. This is what motivated us to create the open source Autometrics project.
Autometrics builds on top of existing Prometheus and OpenTelemetry client libraries and makes it trivial to instrument functions in your code with the most useful metrics: request rate, error rate, and latency.
I didnât flag the story as âspamâ because at the end autometrics.dev is just a fully opensource product, with no pricing/open-core/⌠And the company behind it âFiberplaneâ is only subtly linked to the project. So I guess itâs fine, but the blog post feels weirdâŚ
Thanks for the heads up â and for not flagging it as spam. I thought it was okay to plug the open source project, but Iâll be more mindful of doing that in the future.
Itâs a fine line and everyone has their own sense of where the line is. The article seemed good to me, as someone who hasnât really used Prometheus, to overview of how it works and why, but maybe someone who already knew it all might feel differently.
As soon as you want to look up A from B and B from A it can get rough, but still usable. Once you want ranges or sums or whatever youâre sunk of course.
From my read of the docs, it seems like you can do some amount of looking up multiple things and keeping them consistent, though it seems like it would get pretty annoying if you had any amount of contention on a specific key.
Good point on ranges and sums. That makes me wonder even more what kinds of applications you can build where you wouldnât want those things đ¤
One that scales? AIUI to scale a SQL db you end up sharding and denormalizing your data until you canât make complex queries anyway.
Maybe itâs better to begin development with a more flexible, less scalable db and then once youâve locked down your design work out how to make your db scale, or maybe itâs better to start with the constraints of a scalable db. It probably dependsâŚ
It very much depends on the shape of your data and the amount of âscaleâ. You can get very very big with âjustâ postgresql. If you do need to shard, your data shape might be such that you can still do all the complex queries that matter.
Sometimes youâre so big and have such an unshardable data shape that this is bad. By then youâre probably running into all kinds of other limits and need a custom engineered solution.
I havenât dug into the docs, but I think itâs likely they do the typical NoSQL document database things, a la CouchDB or Firebase. If so, then you give the object with key A a property âbâ containing the key of B, and you create an index on property âbâ that lets you look up A given B. There would likely also be an API on indexes to iterate ranges, which makes sums easy.
This looks quite useful! I recently instrumented a codebase with something similar (though not codegen, just some context-based spans in hot paths) and it took a while. I like the editor integration, though it would be nice if the hover even showed some live data!
Thatâs in the works! Weâre looking into building up a little library that can render the raw Prometheus data so we can include it in the VS Code extension.
The tedium you describe is definitely one of the things that motivated this project. So many people end up writing basically the same instrumentation code. And, by standardizing the metrics, we can share a lot more across teams.
Out of curiosity, what language is that code base in? And were you setting up metrics, traces, or both?
Also I peeked at your profile and saw you mainly do Go. Two of my colleagues worked on a first version of a Go implementation but weâre definitely not experts. Weâd love to hear any feedback you might have on https://github.com/autometrics-dev/autometrics-go!
Hi all, I wrote this up after reading a post from Cloudflare about running Prometheus at scale. They mentioned a couple of challenges related to the full life cycle of metrics, which perfectly fit the problems that motivated us to start working on the Autometrics project.
I tried to lay out the case for why I think function-level metrics are so useful here. Iâd especially love to hear what folks think who have experience setting up observability & metrics. Do the ideas resonate? Still skeptical?
Nushell seems like a very cool project. However, when I tried it out, I didnât feel like my shell use cases really overlapped with those where it shines.
Has anyone found nushell super helpful for something they were trying to do? If so, what was your use case?
I donât use it as a real shell but as a scripting tool, my use cases were:
I had a text file with lines that had the template: âlast access date | directory pathâ, and I wanted to sort the lines not by the last access dates but by the paths.
I wrote a small script that checked memory usage of Firefox, and showed a OS notification (with notifu) if it was using more than 2 gigabytes of memory.
Cool idea. You may find this coupling too tight. I would be interested in a retrospective update in a year or two, letting us know about the experience.
Sure. Consider what happens when a service is designed to be instantiated many times by many different teams. Each team may have their own distinct monitoring configuration. Different teams may expect different levels of service from each instance.
Hmm. Iâm curious how common it is that youâd have a service instantiated and run by different teams (as opposed to a team being responsible for it as a microservice and exposing the functionality to other teams via APIs). Have you seen that kind of setup in practice? If so, Iâd be very interested to hear what kind of service it was.
If a team did have this setup and wanted to define different SLOs for the different instances, you could set up the code to use a Rust feature flag or build time environment variable to change the objective targets. Itâs definitely a little clunky but my first reaction is that this seems like a somewhat niche caseâŚ
Consider e.g. Prometheus! Anyway, itâs fine to find my point irrelevant. This is only a problem for popular services, and a service wholly owned and operated by a single team will likely not notice.
Fair point. Youâre right that this approach wouldnât really make sense for that type of service thatâs being operated by many different teams with different performance expectations. Thanks for pointing that out! (Useful to find the limitations of the approach :) )
Iâve been working with a mid-sized non-profit in Czechia for like 5 years now. Since the non-profit has limited IT resources we have to balance the Devops workload and the Development workload. Right now there is one part time developer, one wordpress form filler, and two outside consultants (a CSS guy and me). When we moved to k8s on Digital Ocean from Heroku & EC2 I was working there part time and lead the transition. The transition was due to several factors:
Heroku is bloody expensive when you have lots of small projects or even just one larger project. Financially, it made sense to move away from Heroku. We also had the trouble that Heroku had stopped maintaining their python build packs years ago and we were running into really weird and ancient bugs in pipenv. Basically we were âpaying them to manage the platformâ but they werenât actually doing that. Finally, when Heroku moved under salesforce they changed the service agreement. The new agreement forbids reselling of Heroku services. I wrote them an email asking if we were allowed to host other non-profitâs web services and forward the bill, and they said âNoâ. Since we were doing that, it was a problem.
On the EC2 side, we have one large project dopracenakole.cz. Itâs a Django application with a pretty high peek workload and very poor optimization. It gets around 5-10k concurrent visitors around lunch time during the main event. We donât have much time to optimize it, remember, we have low IT resources. So we just throw compute at it. Managing auto-scaling groups on EC2 and deploying new versions was a pain. It also wasnât free.
So we moved to k8s. Gitops to be exact. You can find the full configuration for the organizations IT infrastructure on github (except for the secrets). This reduced costs for the services on Heroku while making it much easier to deploy do-prace-na-kole. We can now do zero downtime upgrades with a commit like this:
We can revert any time and run a parallel staging environment at NO EXTRA FINANCIAL COST save a few hundred megs of memory. Leveraging the excellent ytt templating language, we can keep the configurations for the staging environment perfectly in sync with production.
Last night we decided to rebuild the entire cluster and move to the caddy ingress server. We did this due to the circleci security breach. For some strange reason we were having trouble revoking the cluster certificate. Our new method of setting up the cluster certificate should be better.
Rebuilding the entire cluster along with the transition to caddy took us just 5 hours. If for financial or other reasons we wanted to move to another provider, it shouldnât take us much longer. Indeed, it should be even faster, since we spent most of that time setting up caddy. Due to using a unified ingress, such a transition can be done with zero downtime, simply by putting up two parallel clusters and switching the DNS. Go ahead and look at all the web services we are providing and tell me that such a transition in such a short time isnât incredible.
Why did we choose managed k8s over teraform?
Easier scaling
Low IT resources
Not much more expensive on Digital Ocean
Overall weâve had a mixed experience with our choice of k8s. We had a lot of trouble with cert-manager on subdomains. It is just a broken piece of technology with lots of weird race conditions. We hope that caddy will better in this regard. It has been expensive (money wise), and has a huge learning curve. That said, most of the time it is âset it and forget itâ infrastructure. It really doesnât take much of our time at all and we are happy with it. It is extremely easy and cheep to set up new services, and provides us with the ability to set up staging environments at zero cost and very little effort. Finally, weâve gotten really good support from Digital Ocean. Every time we write them they respond within hours and Iâm always shocked to find that the support personnel are actually quite technically proficient and are able to help me in most cases.
Edit: Iâd like to add that a lot of our stuff was running on a VPS and assumed file system access. We had to re-write this when moving to k8s. We ran into loads of bugs working with PVCs (sometimes the service would restart and the PVC would be detached or stuck attached to another node leading to crashes and downtime). We fully transitioned all services that work with the file system to work with S3 compatible file shares. This was a PITA but its just the way of the world. If you want stateless, you canât have state!
BTW: Anyone know what the point of HELM is? It feels to me that it is taking infrastructure as code and putting an imperative utility (helm) in front of it⌠Seems weird. I thought the whole point of k8s was to have a declarative configuration language that you keep in git.
Even without Helm, itâs easy to deploy resources on Kubernetes and even template different versions of resources.
Helm lets you create one big bundle of resources and quickly create and delete them. Itâs fantastic for quickly iterating (so you donât end up with tons of random junk polluting your cluster as you develop). Itâs great for delivering products or more complex open source tools that have a lot of parts that need to all be deployed together. Itâs probably unnecessary if youâre not doing that stuff.
There are several systems that allow declarative Helm charts, like k3s (Rancher) and Terraform. Think of Helm as a parameterized description of Kubernetes objects; it lives in the same space as Kustomize.
And yes, Helm is not good. But configuring e.g. a Prometheus stack via Helm is easier than doing it by hand.
Whenever the design of the service requires something beyond what Fly or Heroku offers. Interestingly, this is a harder question to answer than five years ago, because Fly and similar services have not stopped improving. Nonetheless, thereâs a few standard situations:
Dynamic creation of Pods for background computation
Vertical autoscaling (admittedly, this is messy on any platform)
Fancy horizontal autoscaling, depending on platform
Multiple teams on one cluster, using namespaces for isolation
Service discovery, dependency injection, etc. at a fast-growing company
Of course, if you think that you can just run a single container with a managed lifecycle, then a serverless solution makes a lot of sense. They are great as long as your needs are wholly within their featureset.
For sure! I personally have chosen Kubernetes over Heroku, Fly, etc.; both in professional situations and also when managing my own small business, and the multi-cloud portability is a real benefit. I remember doing analyses like the one in the article, comparing the cost of clusters and measuring various metrics in order to decide where to host my clusters.
Rustâs reference model is fairly simple. Borrowers can have as many shared reference to something as they need, but there can only be a single exclusive reference at a time. Otherwise, you could have many callers trying to modify a value at the same time. If many borrowers could also hold exclusive references, you risk undefined behavior, which safe Rust makes impossible.
Calling &mut âexclusiveâ references would have saved me some time while learning Rust.
This seems like such a helpful point for new Rust developers. Borrowing and ownership is always the thing that trips people up first. Iâm curious what folks who are less familiar with Rust make of this way of thinking about it.
A couple of years ago I was working on a project where I wanted to write more complicated Redis scripts but found the experience quite painful because of the lack of types.
I want to do more things in typescript either on client via Solid or inside Deno running locally or on the edge. So want to learn those very well.
Recently fell in love with Chiselstrike and EdgeDB so want to learn them well too. Will probably settle on Chiselstrike as EdgeDB is still maturing. Chiselstrike abstracts well over SQLite or Postgres.
If you wouldnât mind sharing, Iâm curious what you loved about ChiselStrike and EdgeDB. I havenât tried either but I used Supabase for a side project and have been looking at Convex but it seems like there are suddenly a ton of different backend as a service providers.
Also re: Rust, if you have Rust Analyzer you shouldnât have to wait for the code to fully compile while youâre developing.
I wanted to have some way to have a declarative schema. So I donât have to write migrations manually and things like this. From this, Prisma was one option but I found EdgeDB to have nicer way to define things.
And ChiselStrike is nice as it gives me endpoints. I am still exploring it.
They take up a lot of space in a URL: api.planetscale.com/v1/deploy-requests/7cb776c5-8c12-4b1a-84aa-9941b815d873. Try double clicking on that ID to select and copy it. You canât. The browser interprets it as 5 different words. It may seem minor, but to build a product that developers love to use, we need to care about details like these.
UUIDs are 128 bit numbers; the 7cb776c5-8c12-4b1a-84aa-9941b815d873 form is one of many possible encodings. Youâre not beholden to it! Iâve become quite fond of just plain old hex encoding, for example.
Small note: the backstory says it was written in 2024 (so this project must be from the future đ) https://nuejs.org/backstory/
LOL. Fixed. Thanks!
I think Nushell seems cool, and Iâve had the thought of switching to it a couple of times.
The main blocker for me was that I just donât do that much in my terminal aside from moving around and running individual commands with their arguments. All the data processing capabilities sound cool, but just not part of what I find myself needing. On the other hand, sometimes I copy a command from somewhere or run a curl > sh install script (I know, the horror!), and the fact that that doesnât work with nu was a knock against it.
Now, I may be outside of the target audience for Nushell as a backend engineer that doesnât do a lot of data/local scripting stuff. But I have a bit of a hard time imagining who the ideal target user isâŚ
Sysadmins, people who write scripts, shell enthusiasts.
Could someone explain in simple language what the idea behind generational indicies is?
Itâs a way of detecting dangling references to slab-like allocations in runtime.
It exists purely in runtime, it requires each allocation to have a generation index, it requires every reference to carry that index.
When a reference is dereferenced, indices must be checked in runtime, if the allocation does not have the generation of the reference, the caller must handle the situation in runtime.
Unlike RC, this does not give any static guarantees about validity of the allocation. It might be slightly faster when there are lots of copying of references and few accesses to their allocations.
Generational references is basically a term invented by the Vale author for a software version of ARM memory tagging.
From the readme of one I wrote for myself a while back:
Irrespective of whether theyâre hard to learn, Iâd agree that theyâre a useful tool worth learning at least this subset of.
Iâd also highly recommend https://regexr.com/. I always use it when Iâm trying to build up and kind of more complicated pattern, because it makes it easy to visualize how the parts of the pattern match text and it letâs you put as many example strings as youâd like to test against. Their built in cheat sheet is also very handy.
Iâd pretty much given up on learning regular expressions until I found some interactive, visual tools. In the early 2000s, it was RegexBuddy. The one Iâve been using for the last decade or so is regexplained.co.uk. Seeing a regular expression as a railroad diagram as you build it really helps not only to understand a specific regular expression but to learn them in general.
Prometheus queries are hard because itâs design priorities scale (via very effecient data collection) over querying data. In normal usage of Prometheus you canât escape using sampled data and itâs hard to escape using monotonic gauges, both of which place a minimum complexity on any query.
âOlderâ sytems like InfluxDB, TimescaleDB, or even âancientâ systems like Graphite make querying data far easier and can be a lot more flexible around the type of data being stored. You have the option to use same data the systems being monitored generated, and arenât forced into using monotonic gauges as the query engine can operate non-sampled data. So the data is structured in a way thatâs far easier to query in a lot of cases, and (moving entirely into the world of opinions), they have query languages that are a lot easier for engineers to use and understand.
I like Prometheus a lot, but everytime Iâve worked for a company using it theyâve failed to understand the trade-offs Prometheus and itâs design makes, leading to a lot of frustration with the resulting monitoring stack.
Mm thatâs a great point. Hadnât thought about the difficulty of grokking queries using monotonic counters coming from that design tradeoff.
What has helped me when learning PromQL has been to think about it as a query language for filtering instead of something similar to SQL. You are always filtering down to get the subset of metrics you are interested in and then aggregating those into something to graph with some function (sum, avg, etc.).
I agree that to fully understand a query youâll need to grasp more details that are not immediately âbubbled upâ to the user: scrape interval, type of the metric (counter, gauge, histogram), query look back period, etc.
Agreed. This blog post, and specifically his illustrations of how grouping labels works, helped me grok the point that youâre always filtering down first: https://iximiuz.com/en/posts/prometheus-vector-matching/
Iâm always uneasy when reading articles like âSQL is overengineeredâ, âgit is complicatedâ, â(some other core technology) is hardâ.
Especially with Prometheus query, I know Iâm repeating myself but I think that PromQL, like SQL, Git, IP, PKCS, ⌠is part of the software engineer toolbox. There should be no corner cutting, IMHO. The technology should be learned and mastered by anybody who want to qualify as a software âcraftperson.â Iâm more and more sadden at the lowering of the standard of my profession⌠But I might just have become an old man⌠Maybe you shouldnât listen to me.
Iâm fine with learning difficult technologies, but PromQL just seems poorly designed. Every time I touch it and try to do something well within the purview of what a time series database ought to be able to do, it seems there isnât a good way to express that in PromQLâIâll ask the PromQL gurus in my organization and theyâll mull it over for a few hours, trying different things, and ultimately conclude that hacky workarounds are the best case. Unfortunately itâs been a couple of years since I dealt with it and I donât remember the details, but PromQL always struck me as uniquely bad even worse than git.
Similarly, the idea that software craftsmen need to settle for abysmal toolsâeven if theyâre best in class todayâmakes me sad. Whatâs the point of software craftsmanship if not making things better?
One big conceptual thing about Prometheus is that it isnât really a time series database. Itâs a tool for ingesting and querying real-time telemetry data from a fleet of services. It uses a (bespoke and very narrowly scoped) time series database under the hood, yes â edit: and PromQL has many similarities with TSDB query languages â but these are implementation details.
If you think of Prometheus as a general-purpose TSDB then youâre always going to end up pretty frustrated.
Could you expand on that more? Iâm curious what features/aspects of a general TSDB youâre referring to Prometheus lacking. (This is a curiosity coming from someone with no experience with other TSDBs)
Itâs not that Prometheus lacks any particular TSDB feature, because Prometheus isnât a (general-purpose) TSDB. Prometheus is a system for ingesting and querying real-time operational telemetry from a fleet of [production] services. Thatâs a much narrower use case, at a higher level of abstraction than a TSDB. PromQL reflects that design intent.
I mean, Iâm using it for telemetry data specifically. My bit about âordinary time series queriesâ was mostly intended to mean Iâm not doing weird high-cardinality shit or anything Prom shouldnât reasonably be able to handle. Iâm not doing general purpose TS stuff.
Gotcha. Iâd be curious to hear a few examples of what you mean, just to better understand where youâre coming from. Personally, Iâm also (sometimes) frustrated by my inability to express a concept in PromQL. In particular, I feel like joining different time series on common labels should be easier than it is. But itâs not (yet) gotten to the point that I consider PromQL to be poorly designed.
Yeah, unfortunately itâs been a while and Iâve forgotten all of the good examples. :/ Poorly designed feels harsh, but suffice it to say I donât feel like itâs clicked and it seems like itâs a lot more complicated than it should be.
Iâve wondered about this as well â how much of the difficulty has to do with a) working with time series b) PromQL syntax c) not knowing what metrics would actually be helpful for answering a given situation d) statistics are hard if youâre not familiar or e) a combination of the above.
Iâm curious if folks that have used something like TimescaleDB, which I believe uses a more SQL-flavored query syntax, have had a very different experience.
In my experience, itâs been a combination of everything youâve listed, with the addition of (at least my) teams not always being good about instrumenting our applications beyond the typical RED metrics.
I canât speak for TimescaleDB, but my team uses AWS Timestream for some of our data and itâs pretty similar as far as I can tell. Timestreamâs more SQL-like syntax makes it both easier and harder to write queries, Iâve found. On the one hand, itâs great because I grok SQL and can write queries pretty quickly, but on the other hand I can start expecting it to behave like a relational database if Iâm not careful. Iâd almost rather just use PromQL or something like it to create that mental separation of behaviors.
I see the reverse. Being willing to accept poor-quality UIs is a sign of low standards in a profession. Few other skilled trades or professions [1] contain people using poorly designed tools and regard using them as a matter of pride. Sometimes you have to put up with a poorly designed tool because there isnât a better alternative but that doesnât mean that you should accept it: you should point out its flaws and encourage improvement. Even very simple tools have improved a lot over the last few decades. If I compare a modern hammer to the one my father had when I was a child, for example, mine is better in several obvious ways:
If carpenters had had your attitude then this wouldnât have happened: a mediocre hammer is a technology that should be learned and mastered by anybody who wants to qualify as a âcraftpersonâ. My hammer is better than my fatherâs hammer in all of these ways and it was cheap because people overwhelmingly bought the better one in preference.
Some things are intrinsically hard. Understanding the underlying model behind a distributed revision control system is non-trivial. If you want to use such a tool effectively, you must acquire this understanding. This is an unavoidable part of any solution in the problem space (though you can avoid it if you just want to the tool in the simplest way).
Other things are needlessly hard. The fact that implementation details of git leak into the UI and the UI is inconsistent between commands are both problems that are unique to git and not to the problem space.
As an industry, we have a long history of putting up with absolutely awful tools. Thatâs not the attitude of a skilled craft.
[1] Medicine is the only one that springs to mind and thatâs largely due to regulators putting totally wrong incentives in place.
I agree with you, although I think itâs worth calling out that git has at least tried to address the glaring problems with its UI. PromQL has remained stubbornly terrible since I first encountered it, and I donât think itâs just a question of design constraints. All the Prometheus-related things are missing what I consider to be fairly basic quality-of-life improvements (like allowing you to name a subexpression instead of repeating it 3 times).
Maybe PromQL also has limitations derived from its limited scope, but frankly I think that argument is⌠questionable. (It doesnât help that the author of this article hasnât really identified the problems very effectively, IMO.) The times Iâve resorted to terrible hacks in Prometheus I donât think I was abusing it at all. Prometheus is actively, heavily, some might say oppressively marketed at tech people to do their operational monitoring stuff. But on the surface itâs incapable of anything beyond the utterly trivial, and in the hands of an expert itâs capable of doing a handful of things that are merely fairly simple, usually with O(lots) performance because youâre running a range subquery for every point in your original query.
As an aside, I think the relentless complaining about gitâs model being hard to understand is not helping in either dimension. Saying âDVCS is too hard, letâs reinvent svnâ doesnât stop DVCS being useful, but it makes people scared to learn it, and it probably makes other people think that trying to improve git is pointless, too.
This is a very interesting point. I hear you in the general case (and Iâll also say that actually working more with PromQL has given me a lot of respect for it).
I think itâs easier to make that argument for tools that people use on a daily or at least very regular basis. Depending on the stage of company youâre at, to what extent your job involves routinely investigating incidents, etc, PromQL may be something you reach for more or less frequently. Itâs quite a different paradigm than a lot of other programming tools, so it makes sense to me that engineers who are new to it or donât use it frequently would have a hard time. Also, speaking as someone who learned it pretty recently, the materials for learning it and trying to get to a deeper level of understanding of what you can and might want to do with it areâŚsparse.
I think you nailed it - in many cases you donât touch Prometheus until youâre investigating some issue and thatâs often when itâs urgent, and doing so using an unfamiliar query language is a recipe for pain. Of course, you could set aside some time to learn it, but if a lot of time passes until you need it again, those skills will have faded again.
git is hard to learn compared to some of its competitors but has become ubiquitous enough, and once you start using it daily you will learn it properly in no time. Learning additional stuff about it becomes easier too once you have a good foundation and it will stick around better, as well. For SQL Iâd argue the same - at uni I almost flunked my SQL course due to simply not grokking it, but Iâve worked with it so much that Iâm currently one of the companyâs SQL experts.
(Maybe off-topic) I almost reported it as spam. As it reads like native advertising. All of this supposed complexity is described, to then the article concludes on:
I didnât flag the story as âspamâ because at the end autometrics.dev is just a fully opensource product, with no pricing/open-core/⌠And the company behind it âFiberplaneâ is only subtly linked to the project. So I guess itâs fine, but the blog post feels weirdâŚ
Thanks for the heads up â and for not flagging it as spam. I thought it was okay to plug the open source project, but Iâll be more mindful of doing that in the future.
Itâs a fine line and everyone has their own sense of where the line is. The article seemed good to me, as someone who hasnât really used Prometheus, to overview of how it works and why, but maybe someone who already knew it all might feel differently.
This seems pretty neat.
I wonder how far you can get before youâd start feeling like you need a full-blown SQL database.
Does anyone have experience using a persistence layer along these lines? Did you run into any issues?
As soon as you want to look up A from B and B from A it can get rough, but still usable. Once you want ranges or sums or whatever youâre sunk of course.
From my read of the docs, it seems like you can do some amount of looking up multiple things and keeping them consistent, though it seems like it would get pretty annoying if you had any amount of contention on a specific key.
Good point on ranges and sums. That makes me wonder even more what kinds of applications you can build where you wouldnât want those things đ¤
One that scales? AIUI to scale a SQL db you end up sharding and denormalizing your data until you canât make complex queries anyway.
Maybe itâs better to begin development with a more flexible, less scalable db and then once youâve locked down your design work out how to make your db scale, or maybe itâs better to start with the constraints of a scalable db. It probably dependsâŚ
It very much depends on the shape of your data and the amount of âscaleâ. You can get very very big with âjustâ postgresql. If you do need to shard, your data shape might be such that you can still do all the complex queries that matter.
Sometimes youâre so big and have such an unshardable data shape that this is bad. By then youâre probably running into all kinds of other limits and need a custom engineered solution.
I havenât dug into the docs, but I think itâs likely they do the typical NoSQL document database things, a la CouchDB or Firebase. If so, then you give the object with key A a property âbâ containing the key of B, and you create an index on property âbâ that lets you look up A given B. There would likely also be an API on indexes to iterate ranges, which makes sums easy.
Keys are ordered lexicographically and you can query for a range of keys: https://deno.land/api@v1.33.1?unstable&s=Deno.Kv#method_list_0
Iâm disappointed there arenât secondary index APIs, but it wouldnât be too bad to manage those manually with atomic operations.
This looks quite useful! I recently instrumented a codebase with something similar (though not codegen, just some context-based spans in hot paths) and it took a while. I like the editor integration, though it would be nice if the hover even showed some live data!
Thatâs in the works! Weâre looking into building up a little library that can render the raw Prometheus data so we can include it in the VS Code extension.
The tedium you describe is definitely one of the things that motivated this project. So many people end up writing basically the same instrumentation code. And, by standardizing the metrics, we can share a lot more across teams.
Out of curiosity, what language is that code base in? And were you setting up metrics, traces, or both?
Also I peeked at your profile and saw you mainly do Go. Two of my colleagues worked on a first version of a Go implementation but weâre definitely not experts. Weâd love to hear any feedback you might have on https://github.com/autometrics-dev/autometrics-go!
Hi all, I wrote this up after reading a post from Cloudflare about running Prometheus at scale. They mentioned a couple of challenges related to the full life cycle of metrics, which perfectly fit the problems that motivated us to start working on the Autometrics project.
I tried to lay out the case for why I think function-level metrics are so useful here. Iâd especially love to hear what folks think who have experience setting up observability & metrics. Do the ideas resonate? Still skeptical?
Nushell seems like a very cool project. However, when I tried it out, I didnât feel like my shell use cases really overlapped with those where it shines.
Has anyone found nushell super helpful for something they were trying to do? If so, what was your use case?
I donât use it as a real shell but as a scripting tool, my use cases were:
I got a bit obsessed with golfing the powershell script, and hereâs what I got:
Still not competitive with grepping or
jq
but at least it beats the nushell!Cool idea. You may find this coupling too tight. I would be interested in a retrospective update in a year or two, letting us know about the experience.
Interesting â would you mind expanding on the concern youâd have?
Sure. Consider what happens when a service is designed to be instantiated many times by many different teams. Each team may have their own distinct monitoring configuration. Different teams may expect different levels of service from each instance.
Hmm. Iâm curious how common it is that youâd have a service instantiated and run by different teams (as opposed to a team being responsible for it as a microservice and exposing the functionality to other teams via APIs). Have you seen that kind of setup in practice? If so, Iâd be very interested to hear what kind of service it was.
If a team did have this setup and wanted to define different SLOs for the different instances, you could set up the code to use a Rust feature flag or build time environment variable to change the objective targets. Itâs definitely a little clunky but my first reaction is that this seems like a somewhat niche caseâŚ
Consider e.g. Prometheus! Anyway, itâs fine to find my point irrelevant. This is only a problem for popular services, and a service wholly owned and operated by a single team will likely not notice.
Fair point. Youâre right that this approach wouldnât really make sense for that type of service thatâs being operated by many different teams with different performance expectations. Thanks for pointing that out! (Useful to find the limitations of the approach :) )
(Library author here: happy to answer any questions you might have! Also would love to hear your feedback)
Other languages are coming soon
When would you opt for using a managed k8s provider versus an even more managed platform like fly.io?
Hi,
Iâve been working with a mid-sized non-profit in Czechia for like 5 years now. Since the non-profit has limited IT resources we have to balance the Devops workload and the Development workload. Right now there is one part time developer, one wordpress form filler, and two outside consultants (a CSS guy and me). When we moved to k8s on Digital Ocean from Heroku & EC2 I was working there part time and lead the transition. The transition was due to several factors:
Heroku is bloody expensive when you have lots of small projects or even just one larger project. Financially, it made sense to move away from Heroku. We also had the trouble that Heroku had stopped maintaining their python build packs years ago and we were running into really weird and ancient bugs in pipenv. Basically we were âpaying them to manage the platformâ but they werenât actually doing that. Finally, when Heroku moved under salesforce they changed the service agreement. The new agreement forbids reselling of Heroku services. I wrote them an email asking if we were allowed to host other non-profitâs web services and forward the bill, and they said âNoâ. Since we were doing that, it was a problem.
On the EC2 side, we have one large project dopracenakole.cz. Itâs a Django application with a pretty high peek workload and very poor optimization. It gets around 5-10k concurrent visitors around lunch time during the main event. We donât have much time to optimize it, remember, we have low IT resources. So we just throw compute at it. Managing auto-scaling groups on EC2 and deploying new versions was a pain. It also wasnât free.
So we moved to k8s. Gitops to be exact. You can find the full configuration for the organizations IT infrastructure on github (except for the secrets). This reduced costs for the services on Heroku while making it much easier to deploy do-prace-na-kole. We can now do zero downtime upgrades with a commit like this:
https://github.com/auto-mat/k8s/commit/8eaafc5bb97376bc3c984b7d33d5d53d9942ad92
CI builds the images for us.
We can revert any time and run a parallel staging environment at NO EXTRA FINANCIAL COST save a few hundred megs of memory. Leveraging the excellent ytt templating language, we can keep the configurations for the staging environment perfectly in sync with production.
Last night we decided to rebuild the entire cluster and move to the caddy ingress server. We did this due to the circleci security breach. For some strange reason we were having trouble revoking the cluster certificate. Our new method of setting up the cluster certificate should be better.
Rebuilding the entire cluster along with the transition to caddy took us just 5 hours. If for financial or other reasons we wanted to move to another provider, it shouldnât take us much longer. Indeed, it should be even faster, since we spent most of that time setting up caddy. Due to using a unified ingress, such a transition can be done with zero downtime, simply by putting up two parallel clusters and switching the DNS. Go ahead and look at all the web services we are providing and tell me that such a transition in such a short time isnât incredible.
Why did we choose managed k8s over teraform?
Overall weâve had a mixed experience with our choice of k8s. We had a lot of trouble with cert-manager on subdomains. It is just a broken piece of technology with lots of weird race conditions. We hope that caddy will better in this regard. It has been expensive (money wise), and has a huge learning curve. That said, most of the time it is âset it and forget itâ infrastructure. It really doesnât take much of our time at all and we are happy with it. It is extremely easy and cheep to set up new services, and provides us with the ability to set up staging environments at zero cost and very little effort. Finally, weâve gotten really good support from Digital Ocean. Every time we write them they respond within hours and Iâm always shocked to find that the support personnel are actually quite technically proficient and are able to help me in most cases.
Edit: Iâd like to add that a lot of our stuff was running on a VPS and assumed file system access. We had to re-write this when moving to k8s. We ran into loads of bugs working with PVCs (sometimes the service would restart and the PVC would be detached or stuck attached to another node leading to crashes and downtime). We fully transitioned all services that work with the file system to work with S3 compatible file shares. This was a PITA but its just the way of the world. If you want stateless, you canât have state!
BTW: Anyone know what the point of HELM is? It feels to me that it is taking infrastructure as code and putting an imperative utility (helm) in front of it⌠Seems weird. I thought the whole point of k8s was to have a declarative configuration language that you keep in git.
Even without Helm, itâs easy to deploy resources on Kubernetes and even template different versions of resources.
Helm lets you create one big bundle of resources and quickly create and delete them. Itâs fantastic for quickly iterating (so you donât end up with tons of random junk polluting your cluster as you develop). Itâs great for delivering products or more complex open source tools that have a lot of parts that need to all be deployed together. Itâs probably unnecessary if youâre not doing that stuff.
There are several systems that allow declarative Helm charts, like k3s (Rancher) and Terraform. Think of Helm as a parameterized description of Kubernetes objects; it lives in the same space as Kustomize.
And yes, Helm is not good. But configuring e.g. a Prometheus stack via Helm is easier than doing it by hand.
Whenever the design of the service requires something beyond what Fly or Heroku offers. Interestingly, this is a harder question to answer than five years ago, because Fly and similar services have not stopped improving. Nonetheless, thereâs a few standard situations:
Of course, if you think that you can just run a single container with a managed lifecycle, then a serverless solution makes a lot of sense. They are great as long as your needs are wholly within their featureset.
Donât you end up with vendor lock in with fly though if the configuration is complex? I feel that is a pretty strong argument for k8s over flyâŚ
For sure! I personally have chosen Kubernetes over Heroku, Fly, etc.; both in professional situations and also when managing my own small business, and the multi-cloud portability is a real benefit. I remember doing analyses like the one in the article, comparing the cost of clusters and measuring various metrics in order to decide where to host my clusters.
This seems like such a helpful point for new Rust developers. Borrowing and ownership is always the thing that trips people up first. Iâm curious what folks who are less familiar with Rust make of this way of thinking about it.
A couple of years ago I was working on a project where I wanted to write more complicated Redis scripts but found the experience quite painful because of the lack of types.
If anyone is trying to do something along these lines, Iâd recommend checking out TypeScriptToLua and I wrote up some type definitions for the Redis built-ins here: https://github.com/emschwartz/redis-lua-types
I want to do more things in typescript either on client via Solid or inside Deno running locally or on the edge. So want to learn those very well.
Recently fell in love with Chiselstrike and EdgeDB so want to learn them well too. Will probably settle on Chiselstrike as EdgeDB is still maturing. Chiselstrike abstracts well over SQLite or Postgres.
Also want to build proper iOS native app so SwiftUI. Will go over Point Free and go over Composable Architecture. In future might try learn Expo to build cross platform.
Also want to learn fundamentals of ML/statistics. Heard good things about Probabilistic Machine Learning book.
Since I am building a native app with Tauri, I also will need to learn some Rust this year although I dread its feedback compile loop.
Also if I get to it, doing some local first architecture would be great to study. Some things with CRDTs.
If you wouldnât mind sharing, Iâm curious what you loved about ChiselStrike and EdgeDB. I havenât tried either but I used Supabase for a side project and have been looking at Convex but it seems like there are suddenly a ton of different backend as a service providers.
Also re: Rust, if you have Rust Analyzer you shouldnât have to wait for the code to fully compile while youâre developing.
I wanted to have some way to have a declarative schema. So I donât have to write migrations manually and things like this. From this, Prisma was one option but I found EdgeDB to have nicer way to define things.
And ChiselStrike is nice as it gives me endpoints. I am still exploring it.
prisma is the most infuriating technology I work with day to day. Itâs filled with bad decision after bad decision, avoid at all costs.
Rust procedural macros (using this tutorial https://github.com/dtolnay/proc-macro-workshop)
UUIDs are 128 bit numbers; the
7cb776c5-8c12-4b1a-84aa-9941b815d873
form is one of many possible encodings. Youâre not beholden to it! Iâve become quite fond of just plain old hex encoding, for example.Similarly, my company just base64url-encodes UUIDs. We store UUIDs in the database and encode them whenever we need the string representation.