Tech’s Productivity Problem
I am a computer graphics software engineer. Computer graphics is unusual in tech for having an appetite for performance that far outstrips what computer hardware can deliver. In many ways, we are the ones who drive computing performance forward, and our hardware is often used by other fields for unrelated tasks (for example, AI makes heavy use of the GPUs–graphics processing units–that power computer games).
Most of the software world does not face these constraints. For most programmers (and the managers who employ them) the increases in computing power over the past 30 years were truly awe-inspiring; far in excess of what they thought they needed. It didn’t matter how crappy programmers or their tools were when, in a few years’ time, computers could be counted on to be exponentially faster. Programmers didn’t need to learn to deal with hard things like memory management, optimizing performance, or writing code that can run on multiple CPU cores simultaneously. Even worse, the people who write the tools programmers use–programming languages–felt they need not worry about these things either. A member of the standards committee for C++ (a widely used programming language) admitted to me earlier this year that he had once thought this way.
But computers aren’t getting faster anymore. There is a physical limit to how small you can make transistors, and there is also a limit to how many transistors you can turn on at once without melting the chip. We have probably reached both limits already, and we certainly will have within a year or two.
People are panicking. Industry leaders are wondering how they will manage, but their dependence on ever-faster CPUs will ultimately be their salvation. There is wide scope to make computer software faster simply by rewriting old code. Most managers (and many programmers) fear this the way most people fear math, but I think they will be pleasantly surprised. Parallel programming and memory management are simply not as hard as they think, not when the right tools are used. Programmers who have spent 20 years thinking they can’t deal with managing memory or write parallel code are going to find that, actually, they can do these things.
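The "right tools" point can be illustrated with a toy sketch; the function names here are invented for illustration, and this claims nothing about any particular codebase:

```python
# A minimal sketch: with a modern standard library, splitting work across
# workers takes only a few lines. For CPU-bound work in Python you would
# swap in ProcessPoolExecutor; the API is the same.
from concurrent.futures import ThreadPoolExecutor

def sum_squares(chunk):
    # Work done by one worker: sum of squares over its slice of the input.
    return sum(i * i for i in chunk)

def parallel_sum_squares(n, workers=4):
    # Give each worker an interleaved slice of range(n).
    chunks = [range(start, n, workers) for start in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, chunks))
```

The worker management, synchronization, and result collection are all hidden behind the executor, which is precisely the kind of tooling that makes parallel code approachable.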
Moore’s Law may be ending, but software will continue to advance.
Published in General
“Mega-dittos.”
“Kids today” just don’t seem to understand how computers really work.
Indeed. :-)
One problem you have with that kind of “subroutine” situation is that they will always have to be designed to handle more possibilities than will be needed in any given situation, which means they will contain more code than is needed in any given situation. Which amounts to bloat.
These days, we actually need more procreation.
But they themselves may have played a part in getting rid of those roles and purposes because they felt them to be oppressive etc.
Yes. The modern JavaScript stack is even worse. There are more pieces you can’t understand because you are meant to install it and move quickly. At least in my case I understand the stack pretty far down, and Clojure has a mentality of “go read your library if you want to know how stuff works.”
Being able to bootstrap the world from just a working computer is an open problem. We need more people who can bootstrap different languages without having those languages to begin with. Try compiling gcc without gcc; it can get to be quite an adventure.
use clang?
How do you bootstrap llvm?
Cool some of them with liquid acetylene and others with liquid oxygen.
Use Microsoft Visual Studio.
But you ask the question, and there is a CppCon video on the subject.
Speaking of bootstrapping…
The one thing I really liked the most about the PDP-12 was that you could bootstrap it from a LINCtape/DECtape by just entering a single instruction on the switches, and then START. I don’t think any other DEC system let you execute an instruction immediately from the front panel, without first “depositing” it into memory. Certainly the PDP-8 models didn’t. Entering even the simplest “bootstrap” program to start from paper tape took a few minutes, even if you had it memorized like I did. (And most PDP-8 systems seemed to have that bootstrap program on a piece of paper taped to the front.)
But in practical terms the most useful thing about the PDP-12 was probably the ability to use the “oscilloscope” display as a “console” and just the keyboard of a teletype for input, without using lots and lots of paper (and ribbons) like we did with the 8/L.
The photo in my previous post makes it look deceptively simple, but in reality that text/number display was not a traditional terminal display like on even the simplest CRT video terminal.
The PDP-12 display was really an oscilloscope, and everything seen in that photo is actually being DRAWN on the screen, basically one dot/pixel at a time.
And for example in the LAP6/DIAL system, one of the A/D (Analog to Digital) control knobs could be used as a “cursor” for moving around in text to be edited, etc. Pretty cool for the 1960s!
I wrote a PhD dissertation based on solving a problem using Integer Dynamic Programming optimization. IDP iterates toward a solution by incremental choices along many paths of possible solutions. When you get to the end and see the least expensive (in my case; there could be other objectives) solution, you have to work your way back along the path to reveal all of the decisions that were made to get that value. That means you have to store all of the intermediate partial paths. This was a classic case of using up all the available CPU and storage capacity given to me, so I had to write the software to be as efficient as possible.
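The store-everything-then-walk-back pattern described above can be sketched with a toy dynamic program; this is an illustration of the technique, not the dissertation's actual model, and all names are invented:

```python
# Toy DP with path reconstruction: at every state we store a back-pointer
# to the predecessor choice, so after finding the cheapest final state we
# can walk backward and recover the full sequence of decisions. Keeping
# back-pointers for every intermediate state is the storage cost described.
def min_cost_decisions(costs):
    """costs[stage][choice] is the cost of each choice at each stage.
    Between stages, a choice j may follow j-1, j, or j+1."""
    n, m = len(costs), len(costs[0])
    best = [costs[0][:]]        # best[s][j]: cheapest cost to reach (s, j)
    back = [[None] * m]         # back[s][j]: predecessor choice at stage s-1
    for s in range(1, n):
        row, brow = [], []
        for j in range(m):
            # Cheapest reachable predecessor of choice j.
            prev = min(range(max(0, j - 1), min(m, j + 2)),
                       key=lambda k: best[s - 1][k])
            row.append(best[s - 1][prev] + costs[s][j])
            brow.append(prev)
        best.append(row)
        back.append(brow)
    # Walk the back-pointers from the cheapest final state.
    j = min(range(m), key=lambda k: best[-1][k])
    cost, path = best[-1][j], [j]
    for s in range(n - 1, 0, -1):
        j = back[s][j]
        path.append(j)
    return cost, path[::-1]
```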
I was fortunate to have been at a university that had an array of IBM supercomputers (well, super in the late 80s). So I had access to a lot of memory and CPU power. I could not afford the time it would have taken to write the partial paths to disk (or maybe tape!) and get it back. So it had to fit into RAM. My intermediate arrays were triangular, so I paired them and declared a rectangular array for both halves; when I read or wrote to each half I had to compute the row and column by context.
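The pairing trick works because two triangular arrays of side n together tile an n-by-(n+1) rectangle exactly; here is a sketch in Python with invented helper names, computing row and column by context just as described:

```python
# Two lower-triangular n x n arrays each need n*(n+1)/2 entries; packed
# head-to-tail they exactly fill one n x (n+1) rectangle. Array A lives in
# the lower-left, array B is mirrored into the upper-right.
def make_paired(n):
    return [[None] * (n + 1) for _ in range(n)]

def set_a(rect, i, j, v):        # A[i][j], valid for j <= i
    rect[i][j] = v

def get_a(rect, i, j):
    return rect[i][j]

def set_b(rect, n, i, j, v):     # B[i][j], valid for j <= i, mirrored half
    rect[n - 1 - i][n - j] = v

def get_b(rect, n, i, j):
    return rect[n - 1 - i][n - j]
```

In row r, A occupies columns 0 through r and B's mirrored entries occupy columns r+1 through n, so the two halves never collide.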
I wrote the software in FORTRAN. Why? Because that language offered me a 1-bit memory type and I could store my 1-0 data in the smallest memory footprint possible.
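The space saving from a true 1-bit representation is easy to see in a sketch (illustrative Python, not the original FORTRAN): eight 1-0 values fit in each byte instead of one value per byte or word.

```python
# Pack a list of 0/1 values one bit each into a bytearray, an 8x saving
# over one byte per value (far more versus one machine word per value).
def pack_bits(bits):
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return out

def get_bit(packed, i):
    # Recover bit i with a shift and a mask.
    return (packed[i // 8] >> (i % 8)) & 1
```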
The procedure was basically to wrap the program in JCL(!) statements and submit the text file as a batch job to the supercomputer array. JCL (Job Control Language) came out of early computing, and if somebody really knew it, you wanted that person as a friend for life. Every keystroke counted; a missing space, or a double space where one was called for, would totally invalidate that run. JCL was based on the 80-character Hollerith card, so no statement could exceed 80 characters. Even in the 1980s this felt like horse-and-buggy technology.
My procedure was this: when I was ready to leave the office for the day, I would submit about 8 of these jobs (with different data for the experiments I was running). Next morning (if I was lucky) I would have the results. The biggest problems I would submit took around 30 MINUTES of CPU time. My main competition for access on this array was the Meteorology Department running arcane weather models. My FORTRAN programs were very difficult (for me) to write, but the final one was only about 300 lines of code. I may have set a record for the most CPU time per line of code.
I could pretty easily run those models today on my home computer; lots of RAM, so I could rewrite in a modern language. But each data point would still take quite a while to compute.
There is a programming practice whose acronym is “SOLID,” which would take a lot of writing to explain here. Following its practices can increase portability and decrease bloat. If you search for it on Big Brother Google, you’ll find good explanations of it.
That sounds a bit (har har) like my second-year CS project. I started college when most students could get there without having SEEN, let alone actually USED, a real computer. But the High School I went to had one – ONE, a PDP-8/L – and it wasn’t in the office! In the office they used typewriters! Big advantage!
I started college with the usual mix of “bonehead” classes everyone had to take for any kind of degree. Back then it was common to not take courses seriously related to your major, until 2nd or even 3rd year.
By third term of first year I knew I needed some computer courses just to keep some sanity. But going through the course catalog, nothing seemed the least bit (again!) challenging. I wound up taking 3 senior-level (400) classes, and totally blew them away: 2-hour finals in 20 or 30 minutes with perfect scores, things like that…
Before starting the second year, my advisor, who was also the head of the math department, had a special project he wanted done, a Normal Algorithm processor for their basic (at that time) AI classes. He offered a full 12-credit-hours of “A” grade for it. I met with him once a week for most of the term to work out what he wanted, I wrote it out long-hand on paper pads the last weekend of the term, and keypunched it myself. It worked the first time.
[continued for word limit]
While Intel’s CPUs have hit a wall (stuck at 10nm since 2014), AMD and TSMC have successfully been on 7nm for several years, and will soon shift to 5nm. AMD’s Epyc “Rome” CPU is a 7nm design with nearly 40 billion transistors in the CPU package. In order to keep yield up, the chip has been divided into 9 separate semiconductor components, or “chiplets.” This is a return to the past; linked is a video of a teardown of an IBM mainframe CPU from the ’90s that had hundreds of chips inside its package:
While it might be a while before we see new lines launched with hundreds of chips encapsulated into a single CPU package, the technology to do that is quite old. Should a market require it, it could be done.
Underneath the heat spreader – Nude photos of AMD’s Epyc server chip:
The entire article is here:
AMD Epyc Rome CPU
With faster and faster CPUs, the bottleneck in overall system speed is again becoming memory. DDR4 is showing its age; higher-end desktop and workstation CPUs have quad-channel memory controllers, and some servers have 8-channel memory controllers. DDR5 has been finalized, and engineering samples have already been shipping to vendors for design verification and certification. DDR5 will double the speed of the memory channels into the PC; PC5-51200 speed memory should be available next year. (DDR5 is not backwards compatible and will not work in currently shipping DDR4 systems.)
[continued]
However, the Computer Center admins were not happy about it, because to keep from tying up CPU power just unpacking and re-packing character text, I stored the “program” and “working” text one character per “word,” using just 6 bits out of 60. (CDC Cyber mainframe system.)
I suppose I could have packed things like you describe, but then my program would have been chewing up CPU time like yours did, essentially for the make-work of “optimizing” memory use. And they were far more conservative of CPU time. But nobody had ever encountered a situation where so much memory would be used at once.
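The trade-off is easy to sketch (illustrative Python with invented names, not the original CDC code): ten 6-bit character codes fit in one 60-bit word, at the price of shifts and masks on every access.

```python
# Pack up to ten 6-bit character codes into a single 60-bit word, versus
# storing one code per word. A tenth of the memory, but every read and
# write pays for shifting and masking -- the CPU-vs-memory trade described.
def pack_word(codes):
    word = 0
    for code in codes:
        word = (word << 6) | (code & 0x3F)   # append 6 bits at the low end
    return word

def unpack_word(word, count=10):
    # Extract each 6-bit field; low bits hold the last code packed.
    codes = [(word >> (6 * i)) & 0x3F for i in range(count)]
    return codes[::-1]
```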
And where most even rather complicated programs might be just a few hundred (60-bit) words at most, for my Normal Algorithm processor to handle a decent-size program might require upwards of 100k. And the whole mainframe system only had 192k at that time. So when someone wanted to run my program for their Normal Algorithm test, most or all of everything else previously running on the system had to get “swapped out.”
To optimize it as much as possible, I wound up (after “encouragement” from the Computer Center admins, passed through the department head/my advisor) breaking up the main program into “segments” which meant that only the data part was always in system memory. That at least allowed a FEW other programs to be resident and run at the same time…
And that’s also when I discovered that the setup CDC had created for segmenting programs like that, didn’t actually work very well… One of several times I’ve managed to uncover bugs in sometimes very important and complex systems.
Oh, I also did a little embedded work. I came in partway through the development of a microprocessor-controlled video effects unit. The previous software guy hadn’t been able to get it to work, and just couldn’t figure out why. Finally he just gave up and left.
The basis of the hardware was a 64180 chip, basically a Z80 enhanced with some on-chip RAM (not very much, maybe 1k?) and I/O capability so it didn’t need a lot of external chips to be used in devices.
The code was mostly in C, with some assembly for hardware-specific functions that couldn’t be handled by C. The code development was done on a PC and then cross-compiled to Z80/64180, put on a ROM chip, and stuck in the device for testing.
It didn’t take me long to see that he hadn’t included the initialization routine at the very start of the assembly code, to initialize the CPU stack etc. How can someone look and look and not see that? I suppose it’s a tree that people can miss if they’re in the forest of C and just assume things like that are done by the Software Elves.
Cross-compiling was a big thing since it let people (at least theoretically) write portable code in C that could then be compiled for different hardware just by using a different version of the compiler. What it actually did, at least in the case I was involved with, was compile the C code to Z80 assembly, and then a regular Z80 assembler took it to machine code level for programming the EPROM.
And that was actually another case where I encountered a hidden bug: At least in some cases, I don’t remember the circumstances now, the compiler would produce assembly code missing some linking designations, like EXTERNAL values between modules, and then the assembly and linking steps would fail.
Fortunately in those days, if you had a technical problem like that, it wasn’t yet impossible to reach someone at a company who actually knew what they were doing. I let someone at Aztec/Manx software know about the problem, including reproducible examples, and they fixed it.
There was also an optimizer available which I used, but the things it optimized seemed to me like they should have been done by the compiler to start with. More profit from the optimizer, I guess.
A couple of issues here. One is that it depends on what you mean by “improvement.” If you measure just one parameter, sure. But different types of improvements happen in different areas at different times.
But Moore’s Law wasn’t about improvement in speed, or number of computations, or whatever. Moore’s observation was only about the number of transistors on an IC product (i.e., not a lab curiosity).
Here’s the chart from the Wikipedia entry (it’s nicely up to date):
I think it’s lookin’ pretty good. How amazing is that?
My understanding was that DDR4 has about the same latency as DDR5?
Yes, and no. When counted in clock cycles, the latency of RAM has not changed in 15-20 years. The trick is the increase in clock speed: DDR5 will support clock speeds from 3200 MHz to 4800 MHz out of the box, and eventually up to 8400 MHz.
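The arithmetic behind "same latency" is simple; here is a sketch (the CL values in the comment are typical retail parts, used for illustration, not a spec guarantee):

```python
# CAS latency in nanoseconds = cycles / I/O clock. For DDR memory the I/O
# clock is half the transfer rate, since data moves on both clock edges.
# Cycle counts rise each generation, but time per access stays roughly flat.
def cas_ns(cl_cycles, transfer_rate_mts):
    io_clock_mhz = transfer_rate_mts / 2   # DDR: two transfers per clock
    return cl_cycles / io_clock_mhz * 1000  # convert microseconds to ns

# Illustrative: DDR4-3200 CL22 -> 13.75 ns; DDR5-4800 CL40 -> ~16.7 ns
```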
In my first company, we were doing a proof of concept IFF (Identify Friend or Foe) pattern recognition system for the Navy based on Radar return patterns. The system was trained by repeatedly passing through a set of known data and then tested against a different set. The “Train” took overnight and sometimes needed to be baby-sat. I was just starting out and volunteered for the overnight runs. I have always felt that my willingness to do that gave my career a kick-start. And it also let me go to Engineering school full time (paid for by my company). I can’t imagine having the energy for that again.
also:
I absolutely HATED JCL. It might as well be called “Just Come Back Later.” I went to Wright-Patterson AFB to port one of our programs to their multiple IBM mainframe systems. The head guy would show me what JCL I needed, and I would type that up, submit the card deck, and the next day get a rejection notice. The head guy would say, “Oh, you forgot the gobbledygook,” so I would add that and rinse and repeat. I think I was there a week, and probably 4 days of that was getting the JCL right.
The other people you wanted to keep happy were the computer operators (when you weren’t doing it yourself). My first programming course was at UNC which had one of the first IBM 360 computers in a shared arrangement with Duke and NC State. The professor was Fred Brooks who had managed the 360 and 360/OS projects.
My first project was to print a 3D plot on a line printer using overprinting to alter the perceived density. It was the typical big computer setup where you would punch the code on cards, submit the card deck and come back in a couple of hours to get the results. Unfortunately, I misread the special code to reprint on the same line and used the one to eject the page and print at the top of the next page.
The result was a stack of about 6″ of paper with one character at the top left of each page. This was labelled with a very nasty note from the operator who managed to stop my program before it ran too long.
It took a long time to get that operator’s trust back.
Staying up all night used to be easy. Now it happens inadvertently via insomnia. But the next day is no fun. And you got a nice benefit out of your all-nighters.
The terrible part was that the syntax was incomprehensible, so you could never guess or intuit what was needed. You had to know it or look it up, and if you weren’t doing it constantly you needed to find the person who knew. I remember only one line of JCL:
SYSIN DD *
which means treat the lines (cards) that follow as data to be read. I remember it because I mis-heard my expert and I didn’t put the space between DD and *. Instant run failure, and no information came back as to why.
I own his book “The Mythical Man-Month”. Good read.
Always unpredictable. At Pitt’s data center you’d get a slip of paper with a number, and you could phone up and a recorded voice would tell you the last job run, so you wouldn’t have to guess. That was luxury, in the day.
I heard a story when I was an undergraduate about somebody who had done almost exactly that error. He got yelled at, too.
Between waiting for an available keypunch machine and waiting for the run delay, you quickly learned to be a very careful pre-run debugger or you’d never get anything done.
ah, punch cards.
In High School, each person had a couple cards with a standard heading. It printed your name, the class period, etc.
A friend slipped a card into another friend’s deck that had a print command on it to print “I’m a porno freak” on the next line after his name. Friend #2 never noticed, and friend #1 finally fessed up and told him to re-run it just before he was going to hand it in.
High School for me was all about paper tape. Cards didn’t come until college.
All this discussion reminds me of this old Dilbert cartoon: Link
When running a compile, our system would print out error messages on a separate line starting with FRSOPN….. It made a particular sound when printing out the compile results, so we all were keyed to that particular sound.
I have no idea what the FRSOPN stood for, but a great trick was to stick in a card which started
#FRSOPN …. with some error message.
The # meant it was a compiler comment and it would usually take the target programmer a while to figure out there wasn’t a problem.
I rather liked JCL.
That might explain a lot!
🤣
I have very fond memories of gcc. When I was working for a company which did a custom chip design for battery management, the first silicon had a bug which made branches work incorrectly. We were able to modify gcc so that an extra location was added after each branch so they would work correctly.
I think open source tools are a very important part of improving computer programming.