Questions tagged [cpu-architecture]

The hardware architecture and ISA (x86, x86_64, ARM, ...) and the micro-architectural implementation of a CPU or microcontroller.

Use this tag for questions about the features, bugs, and inner workings of specific CPU architectures.

Don't use this tag if you have no reason to believe your issue is related to the CPU architecture.

2863 questions
28 answers

Why is processing a sorted array faster than processing an unsorted array?

Here is a piece of C++ code that shows some very peculiar behavior. For some strange reason, sorting the data (before the timed region) miraculously makes the loop almost six times faster. #include #include #include…
4 answers

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU? As far as I understand it takes three cycles for an SSE add and five cycles for a mul to complete on most…
4 answers

Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs

I've been racking my brain for a week trying to complete this assignment and I'm hoping someone here can lead me toward the right path. Let me start with the instructor's instructions: Your assignment is the opposite of our first lab assignment,…
3 answers

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to perform indirect calls through a so-called…
7 answers

Difference between core and processor

What is the difference between a core and a processor? I've already looked for it on Google, but I only get definitions for multi-core and multi-processor, which is not what I am looking for.
Saad Achemlal
3 answers

What is the purpose of the "Prefer 32-bit" setting in Visual Studio and how does it actually work?

It is unclear to me how the compiler will automatically know to compile for 64-bit when it needs to. How does it know when it can confidently target 32-bit? I am mainly curious about how the compiler knows which architecture to target when…
3 answers

What Every Programmer Should Know About Memory?

I am wondering how much of Ulrich Drepper's What Every Programmer Should Know About Memory from 2007 is still valid. Also I could not find a newer version than 1.0 or an errata. (Also in PDF form on Ulrich Drepper's own site:…
10 answers

What is the difference between Trap and Interrupt?

What is the difference between Trap and Interrupt? If the terminology is different for different systems, then what do they mean on x86?
13 answers

Why is a boolean 1 byte and not 1 bit of size?

In C++, why is a boolean 1 byte and not 1 bit in size? Why aren't there types like 4-bit or 2-bit integers? I miss having these when writing an emulator for a CPU.
2 answers

What is difference between sjlj vs dwarf vs seh?

I can't find enough information to decide which compiler I should use to compile my project. There are several programs on different computers simulating a process. On Linux, I'm using GCC. Everything is great. I can optimize code, it compiles fast…
1 answer

Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?

I discovered this popular ~9-year-old SO question and decided to double-check its outcomes. So, I have AMD Ryzen 9 5950X, clang++ 10 and Linux, I copy-pasted code from the question and here is what I got: Sorted - 0.549702s: ~/d/so_sorting_faster$…
1 answer

Bubble sort slower with -O3 than -O2 with GCC

I made a bubble sort implementation in C, and was testing its performance when I noticed that the -O3 flag made it run even slower than no flags at all! Meanwhile -O2 was making it run a lot faster as expected. Without optimisations: time ./sort…
5 answers

Write-back vs Write-Through caching?

My understanding is that the main difference between the two methods is that in the "write-through" method data is written to the main memory through the cache immediately, while in "write-back" data is written at a "later time". We still need to wait…
16 answers

Are there any smart cases of runtime code modification?

Can you think of any legitimate (smart) uses for runtime code modification (program modifying its own code at runtime)? Modern operating systems seem to frown upon programs that do this since this technique has been used by viruses to avoid…
10 answers

Why do x86-64 systems have only a 48 bit virtual address space?

In a book I read the following: "32-bit processors have 2^32 possible addresses, while current 64-bit processors have a 48-bit address space." My expectation was that if it's a 64-bit processor, the address space should also be 2^64. So I was…