Addresses and Pointers
If I had to guess, the most challenging thing about C is memory management. Unlike other languages, C doesn’t hide the details about where your data lives in memory and, unless you’re writing very simple programs, sooner or later, you’ll have to make sure your data lives long enough to do something useful. To effectively manipulate memory with C, first, it’s necessary to understand addresses and pointers.
Addresses
Addresses are numbers that identify the location of some program’s data in your computer’s memory. When you declare a variable, some memory will be reserved to store its data. On most occasions, you won’t really need to know precisely where it is, but you can easily check it nonetheless: the & operator gives you the address of a variable. Consider the following example:
int x = 101;
printf("%d is stored at address %p\n", x, &x);
Once you run it, you’ll get something like this (the memory address will probably not be the same):
101 is stored at address 0x7fffe319a3a4
From the previous example, one other thing that stands out is the format specifier %p, which expects a pointer to void (void*) and prints the address it holds. Pointers are a “special type” that points or refers to something in memory (Gustedt 2020, chap. 11), and these are, in my opinion, what makes C both so fun and so hard.
Pointers
You can think of a pointer as a variable that stores the address of another variable. For example, instead of directly passing &x to printf, we can pass a variable of type int* to it. In a declaration, * denotes a pointer to something, and it works this way:
int x = 101;
int* p = &x;
printf("%d is stored at address %p\n"
" which is the same as %p\n", x, p, &x);
Try it yourself:
101 is stored at address 0x7ffe653ffc3c
which is the same as 0x7ffe653ffc3c
The asterisk tells the compiler that p is a pointer that points to an int, that is, it points to another variable’s memory location. But I think this is still not very useful because it doesn’t tell us the actual value stored at the address. For that, we need to dereference the pointer.
To dereference a pointer is to retrieve the value stored at the location that it points to. Said another way, a pointer dereference gives us the data we have in memory, not the address. For example:
int x = 101;
int* p = &x;
printf("%d is stored at address %p\n", x, &x);
printf("%d is stored at address %p\n", *p, p);
From this, we will get the same thing twice:
101 is stored at address 0x7ffedec03b9c
101 is stored at address 0x7ffedec03b9c
So it is clear that a pointer can help us keep track of where things are and what they contain. This is particularly useful once you start working with functions and data structures because now you can share data between them and your main function.
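So far, the examples only read through p. Dereferencing also works on the left-hand side of an assignment, which is what makes pointers useful for sharing data. Here is a minimal sketch of that idea; ReplaceValue is a hypothetical helper name I made up for illustration, not something from a library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores new_value in the int that p points to
   and returns the value that was there before. */
int ReplaceValue(int* p, int new_value) {
    assert(p != NULL);    /* dereferencing a null pointer is undefined behavior */
    int old_value = *p;   /* read through the pointer */
    *p = new_value;       /* write through the pointer */
    return old_value;
}
```

With x equal to 101, calling ReplaceValue(&x, 7) should return 101 and leave 7 in x, because *p and x name the same piece of memory.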
Passing pointers to a function
To pass a pointer to a function, just add the * to the parameter’s type in the function declaration. Now, to really appreciate what pointers do in a function, see this:
#include <stdio.h>

void DoNothing(int x) {
    printf("I'm inside DoNothing\n");
    printf("%d is stored at %p\n", x, &x);
    x += 1;  // changes only the local copy
    printf("x is now %d\n", x);
}

void DoSomething(int* p) {
    printf("I'm inside DoSomething\n");
    printf("%d is stored at %p\n", *p, p);
    *p += 1;  // changes the variable that p points to
    printf("x is now %d\n", *p);
}

int main(void) {
    int x = 101;
    int* p = &x;
    printf("%d is stored at %p\n", x, &x);
    DoNothing(x);
    printf("but here in main, x is %d\n", x);
    DoSomething(p);
    printf("now back in main, x is %d\n", x);
    return 0;
}
DoNothing(int x) gets a copy of x, whereas DoSomething(int* p) gets a pointer, which means that changes to *p (what you get when you dereference p) in DoSomething persist after returning from the function. The code will print:
101 is stored at 0x7ffcc9002d1c
I'm inside DoNothing
101 is stored at 0x7ffcc9002cfc
x is now 102
but here in main, x is 101
I'm inside DoSomething
101 is stored at 0x7ffcc9002d1c
x is now 102
now back in main, x is 102
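Notice that DoNothing printed a different address than main did. If you want to check that without comparing hexadecimal numbers by eye, something like the following sketch should work; IsSameObject is a hypothetical helper written for this post:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: x arrives by value, original arrives as an address.
   Returns 1 if the parameter x lives at that address, 0 otherwise. */
int IsSameObject(int x, int* original) {
    assert(original != NULL);  /* we expect the address of a real variable */
    return &x == original;     /* compares addresses, not values */
}
```

With x declared in main, IsSameObject(x, &x) should return 0: the values match, but the parameter is a separate copy with its own address.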
In DoNothing, x is a local variable with its own memory address and is discarded after the function returns. If you’re trying to modify a variable in a function, you’ll either have to assign it the function’s return value:
int DoWithCopy(int x) {
return x + 1;
}
int x = 101;
x = DoWithCopy(x);
Or you’ll have to pass a pointer as we did with DoSomething. The advantage of using pointers is efficiency: the function doesn’t need extra memory for a local copy of the data, and no time is spent copying it. Of course, for small objects such as an int, it won’t make much of a difference. But once you’re trying to manipulate big, complex data structures, passing around a copy becomes impractical.
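Returning a value also only gets you so far: a function returns one thing, so it can’t update two of the caller’s variables at once. Pointers can. A minimal sketch of the classic example (Swap is my own illustrative name):

```c
#include <assert.h>
#include <stddef.h>

/* Exchanges the two ints that a and b point to. */
void Swap(int* a, int* b) {
    assert(a != NULL && b != NULL);
    int tmp = *a;  /* remember the first value */
    *a = *b;       /* overwrite the first with the second */
    *b = tmp;      /* store the remembered value in the second */
}
```

With a equal to 1 and b equal to 2, calling Swap(&a, &b) leaves a equal to 2 and b equal to 1.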
Pointers to pointers
Sometimes, a single pointer isn’t enough to work with your data, and you need a pointer to a pointer. I won’t show you here any practical reason to use a pointer to a pointer because that would involve introducing even more concepts, so I’ll leave that for a future post, but I’ll show how pointers to pointers work.
I think the mystery about pointers fades away once you realize that a pointer is just another object, very much like int x. It has its own data value, and it is also stored somewhere in memory, which means that you can also get its address and store it in a new pointer:
int x = 101;
int* p = &x;
int** q = &p;
printf("x equals %d == %d == %d\n", x, *p, **q);
printf("x is at %p == %p == %p\n", &x, p, *q);
printf("p equals %p == %p == %p\n", &x, p, *q);
printf("p is at %p == %p\n", &p, q);
printf("q equals %p == %p\n", &p, q);
printf("q is at %p\n", &q);
Before running the code, let’s take a look at it. First, notice that to declare a pointer to a pointer, you just need to add one more *. When you dereference q once, you get a pointer (p itself), but if you dereference it a second time (as you do in the first printf), you get an integer. The syntax is pretty straightforward: some_type* p points to a variable of type some_type, and anything with more than one * points to another pointer.
The third printf is where I think it gets most interesting, as it ties all the concepts mentioned here together: a pointer gives you the address of something, so its own value (not the value you get when you dereference it) is an address. And, since a pointer is just a variable, you can also point to it with another pointer; dereferencing that second pointer gives you the first pointer’s value, which is an address. More concretely, p equals some memory location where the value of x is stored, and *q will also give you that memory location. Let’s look at the code’s output:
x equals 101 == 101 == 101
x is at 0x7ffda9458924 == 0x7ffda9458924 == 0x7ffda9458924
p equals 0x7ffda9458924 == 0x7ffda9458924 == 0x7ffda9458924
p is at 0x7ffda9458928 == 0x7ffda9458928
q equals 0x7ffda9458928 == 0x7ffda9458928
q is at 0x7ffda9458930
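One consequence worth seeing in code: since q holds the address of p, writing through *q changes p itself, that is, where p points. This is only a sketch of the mechanics, not a practical use case, and Retarget is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: makes the pointer that q points to
   point at target instead. */
void Retarget(int** q, int* target) {
    assert(q != NULL);
    *q = target;  /* one dereference: we assign to an int*, not to an int */
}
```

Given two ints, x equal to 101 and y equal to 202, and a pointer p that starts out holding &x, a call to Retarget(&p, &y) should leave p pointing at y, so *p reads 202 while x is still 101.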
A note about efficiency
Passing a pointer to a function still means that you’ll have to store a local copy of the pointer. So in our simple example, passing an int is probably as efficient as passing a pointer to an int. Where you will really start noticing a performance improvement is when you work with composite data types, but I won’t get into that today.
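To make the size difference a bit more concrete without covering composite types properly, here is a rough sketch. struct Samples and both function names are made up for this illustration, and the array length of 1024 is arbitrary:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical composite type, only here to have something large. */
struct Samples {
    double values[1024];
};

/* By value: the function receives its own copy of all 1024 doubles. */
double FirstByValue(struct Samples s) {
    return s.values[0];
}

/* By pointer: the function receives its own copy of one address. */
double FirstByPointer(struct Samples* s) {
    assert(s != NULL);
    return (*s).values[0];  /* dereference, then read the first element */
}
```

Both functions return the same number, but comparing sizeof(struct Samples) with sizeof(struct Samples*) shows how much each call has to pass: at least 1024 * sizeof(double) bytes versus a single pointer. How much of that cost survives compiler optimization will vary, so treat this as an illustration rather than a benchmark.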
And a note on style
At the beginning of this post, I mentioned that %p expects a pointer to void. Strictly speaking, I should have cast every pointer argument (such as p, q, and &x) to void* before printing it (but I just didn’t to avoid cluttering the code examples). Depending on your compilation flags, you might get an error like this if you fail to cast your pointer:
error: format specifies type 'void *' but the argument has type
'int *' [-Werror,-Wformat-pedantic]
You could pass the -Wno-format flag to your compiler to ignore this error, but I would suggest doing the proper casting instead:
printf("q is at %p\n", (void*)&q);
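If the casts start cluttering your code, one option is to route every address through a small function whose parameter is already a void*. A pointer to an object converts to void* implicitly when passed to a function with a prototype, so the explicit cast is only needed for variadic functions like printf. A sketch, with PrintAddress being a hypothetical helper of mine:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints "<name> is at <address>" and returns
   whatever printf returns (the number of characters written, or a
   negative value on error). */
int PrintAddress(const char* name, void* address) {
    assert(name != NULL);
    return printf("%s is at %p\n", name, address);  /* address is already a void* */
}
```

With it, a call such as PrintAddress("q", &q) should compile without an explicit cast at the call site.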
One other contentious topic is that of where to place the asterisk. Some people declare a pointer like int* p, while others do int *p. I like the first way, putting the type and the asterisk together, or more specifically, binding “type modifiers and qualifiers to the left” (Gustedt 2020, Level 1). When I read code in that style, I see a pointer to an int variable. The other style doesn’t make a lot of sense to me. A counterargument is that placing the asterisk right next to the variable’s name helps the reader identify it as a pointer. Consider the following examples from Stack Overflow:
int *p, x;
int* q, y;
p and q are both pointers, whereas x and y are just ints, but the first line makes that more evident. So some prefer to always put the asterisk next to the variable name to avoid this kind of ambiguity. But I think the bigger issue here is mixing types in one declaration. In fact, that’s the first comment to the answer on Stack Overflow. Gustedt also advises against mixing types in one declaration, or what he calls “continued declarations”, so I suggest following his style and being consistent.
Wrapping it up
A solid understanding of pointer syntax, style, and behavior is essential to use pointers effectively. Hopefully, these examples cleared some things up for you. If not, maybe my next blog entry will; I’m planning on writing more about pointers and delving into some complex topics such as memory allocation, structs, garbage collection, and strings. Now that we know some pointer fundamentals, I believe these other subjects will be much more accessible.