spoiler

made you look



  • So one of the problems is the size of a “physical page”: on a stock x86 system that’s only 4 KiB. If you allocate just 1 MiB of RAM, that has to be backed by 256 “page table entries”, and to load from a virtual address inside that allocation the CPU has to look up the entry that holds the physical address in RAM it actually needs to request.

    Of course these days an app is more likely to use 1 GiB of RAM; that’s a mere 262,144 page table entries to keep track of, and in principle one of them has to be consulted on every single memory load.
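
    To put rough numbers on that, here’s a back-of-the-envelope sketch (assuming the stock 4 KiB page size; the allocation sizes are just examples):

    ```c
    #include <stdio.h>

    /* Back-of-the-envelope: how many 4 KiB page table entries it takes
     * to back allocations of various sizes. */
    int main(void) {
        const unsigned long page_size = 4096;                   /* stock x86 page */
        const unsigned long sizes[] = { 1UL << 20, 1UL << 30 }; /* 1 MiB, 1 GiB */

        for (int i = 0; i < 2; i++)
            printf("%10lu bytes -> %lu PTEs\n", sizes[i], sizes[i] / page_size);

        /* Prints 256 PTEs for 1 MiB and 262144 PTEs for 1 GiB,
         * matching the numbers above. */
        return 0;
    }
    ```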

    Oh, but then we’re also not running a single process; there are multiple processes on the system, so there will be several million of these entries overall, each one indexed by virtual address (which can be duplicated, since each process has its own private view of the address space) and then by a process ID to disambiguate which entries belong to which process.

    That’s where the TLB comes in handy: it caches recently used translations, so the CPU can skip the whole lookup on most memory loads.

    But caching alone can’t solve everything; you also need a smarter way to perform the bookkeeping than one enormous flat list for the times you don’t have a cached result. So the mappings are broken down into smaller chunks, and another table maps address ranges to those chunks. On x86 a page table is capped at 1,024 PTEs (one 4 KiB page’s worth of 4-byte entries), and a higher-level table indexes those: to resolve an address, the CPU checks the first table to find which block of PTEs to load, then indexes into the block it points to.

    Like this: this is the 2-level scheme Intel CPUs used before the Pentium Pro (iirc). The top 10 bits of an address select an entry in the “page directory”; the CPU loads that entry to find the right page table, uses the next 10 bits to pick the PTE that describes the mapping out of that table, and the remaining 12 bits are the offset within the 4 KiB physical page it names (and it then promptly caches the result to avoid doing all that again).
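
    Here’s what that 10/10/12 split looks like as code — a sketch of just the address decode, not a real table walker (the address is an arbitrary example):

    ```c
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Decode a 32-bit virtual address the way the classic two-level x86
     * scheme does: 10 bits of page-directory index, 10 bits of page-table
     * index, 12 bits of offset into the 4 KiB page. The real walk loads
     * pgdir[dir_idx] to find a page table, loads table[tbl_idx] to get
     * the PTE, then adds the offset to the physical frame it names. */
    int main(void) {
        uint32_t va = 0xDEADBEEF; /* arbitrary example address */

        uint32_t dir_idx = (va >> 22) & 0x3FF; /* top 10 bits  */
        uint32_t tbl_idx = (va >> 12) & 0x3FF; /* next 10 bits */
        uint32_t offset  =  va        & 0xFFF; /* low 12 bits  */

        printf("va=0x%08" PRIX32 " -> directory[%" PRIu32 "], table[%" PRIu32
               "], offset 0x%03" PRIX32 "\n", va, dir_idx, tbl_idx, offset);
        return 0;
    }
    ```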

    So yes, for a given page size and CPU you have a fixed number of steps per walk, regardless of where the address lives in memory, but we also have more memory now. And much like a hoarder, the more space we have to store things, the more things we store, and the more disorganised it gets. Even if you do clear a spot, the next thing you want to store might not fit there, so you end up putting it someplace else. If you end up bouncing around looking for things, you thrash the TLB, throwing out cached entries you still need, so you have to perform the entire table walk again (just to invariably throw that result away soon after).

    Basically, you need to defrag your RAM periodically so that the mappings don’t get too complex and slow things down (the same is true for SSDs, by the way: you still need to defrag them to clean up the filesystem metadata itself, just far less often than HDDs). Meta have been working on improvements to how Linux handles all of this (page table layout and memory compaction) for a while, because they were seeing some of their long-lived servers wasting about 20% of CPU time on repetitive table walks caused by a highly fragmented address space.


  • “The RA in RAM stands for random access; there is no seeking necessary.”

    Well, there is: CPUs need to map virtual addresses to physical ones, and the more RAM you have, the more management of that memory you need to do (e.g. modern Intel and AMD CPUs support up to 5 levels of page tables between a virtual address and a physical one).

    But the CPU also caches those address mappings; as long as your TLB is happy, you’re happy. An alternative is to use larger page sizes (a page being the smallest unit of RAM you can map): the larger the page, the less you need to recurse into the page tables to actually find it, but you can also end up wasting RAM if you’re not careful.
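
    For instance, here’s a minimal sketch of asking Linux for a 2 MiB “huge page”, so one TLB entry covers what would otherwise take 512 ordinary 4 KiB entries (assumes a Linux system with huge pages reserved, e.g. via /proc/sys/vm/nr_hugepages):

    ```c
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 2 * 1024 * 1024; /* one 2 MiB huge page */

        /* MAP_HUGETLB requests huge pages explicitly; this fails unless the
         * system has some reserved (e.g. echo 16 > /proc/sys/vm/nr_hugepages). */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }

        memset(p, 0, len); /* touch the memory so it's actually faulted in */
        printf("2 MiB huge page mapped at %p\n", p);
        munmap(p, len);
        return 0;
    }
    ```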




  • Yeah, I think Windows actually handles it quite well: the actual filesystem has no notion of what the filenames are beyond a basic “it’s UTF-16”; it’s the OS filesystem layer that handles all the quirks.

    Because that’s what people seem to dismiss: there’s no one standard notion of case folding. It depends on the locale you’re using, and that shouldn’t be baked into the FS itself. The classic one was the German sharp S, where “SS” case folds with “ß”, except they changed the rules in 2024 so now the two shouldn’t match (“ß” uppercases to the capital “ẞ” now). Good luck updating your FS to support rules like that.
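
    To see why simple per-character APIs can’t even express this, here’s a small sketch (assumes a glibc system with the de_DE.UTF-8 locale installed):

    ```c
    #include <locale.h>
    #include <wchar.h>
    #include <wctype.h>

    int main(void) {
        setlocale(LC_ALL, "de_DE.UTF-8"); /* assumed to be installed */

        /* Unicode's *simple* (one-to-one) uppercase mapping leaves ß
         * unchanged, because the full mapping ("SS" classically, "ẞ" under
         * the newer rules) doesn't fit in a single character. So folding
         * rules have to live in updatable locale tables, not in an on-disk
         * filesystem format. */
        wint_t up = towupper(L'ß');
        wprintf(L"towupper('ß') = '%lc'\n", up); /* prints 'ß', unchanged */
        return 0;
    }
    ```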

    Now your shell? That’s easy: you can just warn the user that a “matching” filename already exists and prompt them to change it, you can vary those warnings based on the locale, and you can push out updates as easily as any other patch.







  • The_Decryptor@aussie.zone to Linux@programming.dev · *Permanently Deleted* · 1 month ago

    2011? That’s basically last week, right?

    Support for it (and UEFI) came with ARM’s push into servers; they were forced to make the platform a lot less special and more general-purpose, like x86 traditionally has been.

    End-user-facing hardware is a different matter though. I know you can boot the Raspberry Pi via UEFI/ACPI (it builds the ACPI tables in the bootloader), but Apple doesn’t use it at all for their ARM hardware, which uses something closer to a modern OpenFirmware instead.


  • The_Decryptor@aussie.zone to Linux@programming.dev · *Permanently Deleted* · 1 month ago

    I think x86 is basically the only platform that’s used ACPI; other hardware usually ships a fixed hardware description in firmware that the bootloader/kernel can read (since it’s not like the motherboards are modular, e.g. the RTC is never going to randomly be connected to a different controller).

    Historically ARM didn’t even do that; it was mostly used in tightly coupled systems, so you’d just build those assumptions into the software itself (e.g. a Game Boy Advance always has its directional pad wired to specific pins, so you just read those pins directly). I remember the early days of the Raspberry Pi involved device-dependent kernel images, because the specific initialisation routines had to be coded into the drivers; it took a while for them to gain “device tree” support so you could have a generic kernel.
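
    You can see that discovery happen at runtime; a minimal sketch, assuming a Linux board that exposes its tree under /proc/device-tree (e.g. a Raspberry Pi):

    ```c
    #include <stdio.h>

    int main(void) {
        /* The kernel exposes the device tree it was handed at boot under
         * /proc/device-tree; the "model" property names the exact board,
         * which is how one generic kernel image adapts at runtime instead
         * of shipping per-device builds. */
        char model[256] = {0};
        FILE *f = fopen("/proc/device-tree/model", "r");
        if (!f) {
            perror("no device tree exposed"); /* e.g. an ACPI x86 machine */
            return 1;
        }
        size_t n = fread(model, 1, sizeof(model) - 1, f);
        model[n] = '\0';
        fclose(f);

        printf("Running on: %s\n", model);
        return 0;
    }
    ```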




  • A place I worked at did it by duplicating and modifying a function, then commenting out the existing one. The dev would leave their name and the date each time, because of course they never deleted the old commented-out functions; history is important.

    They’d also copy the source tree around on burnt CDs, so good luck finding out who had the latest copy at any given point (hint: it was always the lead dev, because they wouldn’t share their code, so “merging to main” meant giving them a copy of your source tree on a burnt disc).