The Linear Address Space


Post by Cameron Gi » Wed, 28 Apr 2004 16:43:16


Hey people.

I have been working through the IA-32 Software Developer's Manuals. I have
read pretty much all of it in some detail (skipping mostly just tables and
lists that can be referenced as needed). Still there is something very basic
in my understanding that is lacking.

There is a Physical Address Space of 4GBytes (2^32 bytes) which basically
identifies all of the binary numbers that can be used as addresses for
assembly language programming.
With the 36-bit address extensions it is possible to increase this to
64Gbytes.
The problem is that in a computer there is Hard Drive storage, RAM storage
and a heap of other devices and the like that all require byte addresses.
Say for example you had a 180GByte HDD in your system, even with the 36 bit
address extension mechanisms which bring the Physical Address Space up to
64GBytes, there are still just not enough Logical Addresses to reference the
whole of the addresses on a system.

Also with Hard Drive partitions and with every program having to be put in a
particular spot on the Hard Drive, it is a very rare thing that one would
write a program that is to be put at address zero.
I take it that the addresses used in a program are simply
an abstraction or a virtualization of the true state of all of the addresses
within a computer, so what I need to know is: how are these abstractions that
are used in a program "mapped" to the real addresses of the computer?
How does this work?

Also, I think that paging has something to do with RAM management and the
"checking in" and "checking out" of data to and from RAM but frankly I can
neither confirm nor deny this and hence I just cannot figure out what is
actually going on with paging.
Am I on the right track with paging? - Please correct!

Might the mapping to real addresses have something to do with the Page
Directory Base Register (PDBR) which is Control Register 3 (CR3)?

All help appreciated.

Cameron.

The PC & Electronic Data Map Project:
http://www.yqcomputer.com/ ~cameron111



---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system ( http://www.yqcomputer.com/ ).
Version: 6.0.658 / Virus Database: 421 - Release Date: 4/9/04
 
 
 


Post by Matt Taylo » Thu, 29 Apr 2004 00:26:40


<snip>

Physical address space has nothing to do with the hard drive. The IDE
controller (or SCSI if you prefer) is part of the physical address space.
Some of its registers are accessible as memory. When for some reason data
must be read off the hard drive, the OS sends a command to the IDE or SCSI
controller to read the data into physical memory (RAM).

Just to make things more confusing, there is also an I/O address space. It
is essentially the same thing, but it was used exclusively for devices in
the 8086 days. Legacy PC hardware still uses it. I avoided the distinction
between Physical address space and I/O address space above; I believe the
IDE controller hardware still uses the I/O address space for commands, but
I'm not too sure.


> Am I on the right track with paging? - Please correct!

Partly. For the CPU, paging refers only to the mapping between linear
addresses and physical addresses. OSes use the paging mechanism to implement
virtual memory, using the hard drive to extend the RAM limit of the system.

> Might the mapping to real addresses have something to do with the Page
> Directory Base Register (PDBR) which is Control Register 3 (CR3)?

It most certainly does. The cr3 register points to a table (the page
directory). That table is indexed with the top 10 bits of the linear address
(bits 31-22). The entry indexed then points to another table (a page table).
That table is indexed with the next 10 bits of the linear address (bits
21-12). The entry indexed this time points to a 4K frame of physical memory,
and the lower 12 bits are then used as an offset into that frame.
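
The 10/10/12 split described above can be sketched in C. This is only an illustration of the bit arithmetic; the struct and the helper name split_linear are made up for this example and do not come from the Intel manuals.

```c
#include <stdint.h>

/* The three fields of a 32-bit linear address under 4 KB paging. */
struct linear_parts {
    uint32_t dir_index;  /* bits 31..22: index into the page directory */
    uint32_t tbl_index;  /* bits 21..12: index into the page table */
    uint32_t offset;     /* bits 11..0:  byte offset inside the 4 KB frame */
};

/* split_linear is a hypothetical helper name used only for this sketch. */
static struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index = (linear >> 22) & 0x3FFu;  /* keep 10 bits */
    p.tbl_index = (linear >> 12) & 0x3FFu;  /* keep 10 bits */
    p.offset    = linear & 0xFFFu;          /* keep 12 bits */
    return p;
}
```

For instance, linear address 00403025h splits into directory index 1, table index 3 and offset 025h.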

The more complicated PSE/PSE36 schemes allow the translation process to stop
at the page directory, and the remaining 22 bits are used as the index into
a 4M frame of physical memory. PAE allows this too, but there are 21
remaining bits instead of 22, so the frame is only 2M in size.
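
As a rough sketch of the large-page case, the two helpers below combine a page directory entry with the remaining offset bits. The function names are invented for this example, the PSE-36 high address bits are deliberately ignored, and the PAE version assumes 36-bit physical addresses.

```c
#include <stdint.h>

/* 4 MB page (PSE, 32-bit entries): the directory entry supplies bits 31..22
   of the frame base and the low 22 bits of the linear address are the offset.
   The extra PSE-36 address bits are ignored in this sketch. */
static uint32_t phys_4m(uint32_t pde, uint32_t linear)
{
    return (pde & 0xFFC00000u) | (linear & 0x003FFFFFu);
}

/* 2 MB page (PAE, 64-bit entries): the entry supplies bits 35..21 of the
   frame base (assuming 36-bit physical addresses) and the low 21 bits of
   the linear address are the offset. */
static uint64_t phys_2m(uint64_t pde, uint32_t linear)
{
    return (pde & 0x0000000FFFE00000ull) | (uint64_t)(linear & 0x001FFFFFu);
}
```

In both cases the low bits of the entry that are masked away hold flags, not address bits.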

-Matt

 
 
 


Post by fleks » Thu, 29 Apr 2004 01:36:43

I wrote the reply in a maximized window, so I guess it would be best viewed as
such.

Don't skip the tables :).
Better yet, get the book Intel Microprocessors 8086 .. Pentium 4
(http://users1.ee.net/brey/p8.htm). It is much more comprehensive than Intel's
material. I plan to use that book first to get more understanding of the
topic, then use Intel's manuals as a reference/update.


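> There is a Physical Address Space of 4GBytes (2^32 bytes)

That's the virtual address space. Physical memory is the amount your DRAM
sticks add up to.

> Say for example you had a 180GByte HDD in your system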

I'm not sure about the whole ATA standard on I/O, but I'm pretty sure that
it doesn't require a direct map of every HDD byte into virtual space.
Maybe a small part of the address space is occupied by memory-mapped I/O
registers and cache/buffers, and then you write your ATA commands to these
registers and get the results in the buffer (not sure about all this though).
Also, the standard size for a disk I/O packet (sector) is 512 bytes, so a
36-bit sector pointer would allow you to access quite a lot of 180GB drives
at once (2^36 * 512 bytes = 2^45 bytes, or 32TB).
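
The sector arithmetic above is easy to check in C. This is a minimal sketch that assumes 512-byte sectors; the function name addressable_bytes is made up for the example.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u  /* bytes per sector, as assumed in the post */

/* Bytes reachable with a sector number that is index_bits wide.
   Valid for index_bits below 55, so the result fits in 64 bits. */
static uint64_t addressable_bytes(unsigned index_bits)
{
    return ((uint64_t)1 << index_bits) * SECTOR_SIZE;
}
```

With 36 bits this gives 2^45 bytes, which is room for about 195 drives of 180GB (counting a GB as 10^9 bytes).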


A few key terms associated with virtual/paged memory:

Memory is "formatted" into 4KB blocks named pages, and the whole thing
revolves around that.

Page directory - its *physical* base address is located in register CR3.
Contains 1024 real pointers to page tables.

Page tables - each page table contains 1024 real pointers to pages. (real =
physical address; a virtual address is just a pack of three indexes into the
real tables/pages).

Pages - in a 32-bit addressing system, a page represents 4KB of physical
address space.

Dir
----Table0
--------Page0
--------Page1
--------Page2
--------Page3
--------...
----Table1
--------Page0
--------Page1
--------Page2
--------Page3
--------...
----...

Each table covers an area of 1024 * 4096 bytes (1024 pages in a table * size
of a page) = 4MB. Multiply that by the number of tables in a directory and
you get 4GB (1024 tables * 4MB covered by each table).

Linear(virtual) 32-bit address is composed of three indexes (left 10 bits
are directory index, middle 10 bits are table index, and last 12 bits are
offset in the selected page).
2^10 = 1024 (max. number of tables in a directory, also max. number of pages
in a table).
2^12 = 4096 (size of the page, ideal for offset).

So virtual address 00000000h would have a meaning (it could point anywhere,
even to physical 00000000h :) ):

  10 bits      10 bits       12 bits       ..total of 32 bits
[0000000000] [0000000000] [000000000000]
dir_index 0  tbl_index 0    offset 0

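How the CPU accesses RAM, in pseudo C (all pointers hold physical addresses):

unsigned long *phyPD;   // page directory
unsigned long *phyPT;   // page table
unsigned char *phyP;    // base of the 4KB page
unsigned char *phyAct;  // final physical address

phyPD = (unsigned long *)(cr3 & 0xFFFFF000) ; // get page directory base address

// Entries in the directory and tables are 4 bytes, and indexing an unsigned
// long pointer already scales by the entry size (on a 32-bit machine), so
// there is no explicit "* 4". The low 12 bits of an entry are not part of
// the pointer, so mask them off to get the table address.
phyPT = (unsigned long *)(phyPD[dir_index] & 0xFFFFF000) ;

// same step again, indexed by tbl_index, to get the page
phyP = (unsigned char *)(phyPT[tbl_index] & 0xFFFFF000) ;

// add offset to phyP to get the actual address pointed to by this whole
// mess; no dereference here, the address itself is the result
phyAct = phyP + offset ;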
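
To try the walk above without real hardware, here is a small simulation against a toy "physical memory" array. Everything in it is an assumption made for illustration: the 32 KB memory size, the helper names, and the simplification that only the present bit (bit 0 of an entry) is checked.

```c
#include <stdint.h>

#define PAGE_SIZE   4096u
#define PTE_PRESENT 0x1u         /* bit 0 of an entry: page/table is present */
#define FRAME_MASK  0xFFFFF000u  /* top 20 bits of an entry: frame base */
#define NUM_FRAMES  8u           /* toy physical memory: 8 frames = 32 KB */

/* Toy physical memory, viewed as 32-bit words. */
static uint32_t phys_mem[NUM_FRAMES * PAGE_SIZE / 4];

/* Read or write the 32-bit word at a physical byte address (toy memory only). */
static uint32_t phys_read32(uint32_t paddr) { return phys_mem[paddr / 4]; }
static void phys_write32(uint32_t paddr, uint32_t v) { phys_mem[paddr / 4] = v; }

/* Translate a linear address the way the two-level walk does.
   Returns 0 and stores the physical address, or -1 where a real CPU
   would raise a page fault because an entry is not present. */
static int translate(uint32_t cr3, uint32_t linear, uint32_t *phys_out)
{
    uint32_t pde = phys_read32((cr3 & FRAME_MASK) + 4u * (linear >> 22));
    if (!(pde & PTE_PRESENT))
        return -1;

    uint32_t pte = phys_read32((pde & FRAME_MASK) + 4u * ((linear >> 12) & 0x3FFu));
    if (!(pte & PTE_PRESENT))
        return -1;

    *phys_out = (pte & FRAME_MASK) | (linear & 0xFFFu);
    return 0;
}
```

With the page directory placed in frame 1 and a page table in frame 2, writing `0x2000 | PTE_PRESENT` into directory slot 1 and `0x5000 | PTE_PRESENT` into table slot 3 makes linear address 00403025h translate to physical 5025h, while any unmapped address reports a fault.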

Earlier I stated that the directory/tables contain 4-byte pointers to
tables/pages, but that's not accurate: only the leftmost 20 bits are used as
a pointer. Since pages/tables are 4KB-aligned, that's no problem for a 32-bit
addressing scheme that can access 4GB max (2^20 = 1M pages, times 4KB per
page = 4GB).
Rem
 
 
 


Post by Heck » Thu, 29 Apr 2004 02:56:35

""Cameron Gibbs" < XXXX@XXXXX.COM >" draws inspiration from a reality most
only suspect:

The physical address space is all the addresses the processor can use,
correct. None of the other potential addresses are mapped or recognized
by the processor in any way; however, they can be used if they are paged
in. Just as with any conventional packing problem, bringing something in
necessitates swapping something out. Segment registers select the segments
(and so the base addresses) within the total available pool of storage.

Read chapter 3 again, and, in particular, 3-4, in the System Programmer's
Guide.
 
 
 


Post by Ivan Korot » Thu, 29 Apr 2004 02:56:46

> The problem is that in a computer there is Hard Drive storage, RAM storage

Hard drive storage has nothing to do with RAM. The former is dealt with
via the IDE/SCSI bus and the latter directly via the system bus. Disk space is
addressed by sectors, which are usually 512 bytes or 1KB in size (at the level
of a logical disk), and by clusters (at the level of the filesystem) of about
4KB. This makes it possible to address the whole disk with 32-bit sector
numbers (2^32 sectors of 512 bytes is 2TB).
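
A short sketch of how a byte position on the disk breaks down into a sector number plus an offset inside that sector, assuming 512-byte sectors. The struct and function names are invented for this example.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u  /* assumed sector size in bytes */

struct disk_pos {
    uint32_t sector;  /* which sector to ask the controller for */
    uint32_t offset;  /* byte offset inside that sector once it is in RAM */
};

/* Locate a byte on the disk. Assumes byte_pos is below 2^41, so the
   sector number fits in 32 bits. */
static struct disk_pos locate_byte(uint64_t byte_pos)
{
    struct disk_pos p;
    p.sector = (uint32_t)(byte_pos / SECTOR_SIZE);
    p.offset = (uint32_t)(byte_pos % SECTOR_SIZE);
    return p;
}
```

The last byte of a 180GB drive (taking 180 * 10^9 bytes) lands in sector 351562499 at offset 511, comfortably inside a 32-bit sector number.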

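> so what I need to know is how are these abstractions that
> are used in a program "mapped" to the real addresses of the computer?

Virtual memory addresses are mapped to physical ones. A program never knows
(unless it's a driver) where its data and code actually reside; it deals
only with virtual addresses.

> Also, I think that paging has something to do with RAM management

Of course it has. It is used to organize swap memory on disk, for memory
protection and (in some OSs) to give each process its own address
space.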

Ivan
 
 
 


Post by johnny » Thu, 29 Apr 2004 13:19:55

> Physical address space has nothing to do with the hard drive. The IDE

Your post is correct, but I thought I'd point something interesting
out for everyone. In the HURD, the ext2 filesystem driver is actually
built by mmap()ing in the whole partition into the driver's address
space, and then just dealing with it as a big block of memory. Of
course, this limited the HURD to only 2GB partitions, but it was a
pretty nifty idea from a programming perspective, and one of the
benefits of being able to have operating system pieces running in
userspace. They may have changed the implementation since I last
looked at the HURD, but it was really interesting.

Jon
-------------
Learn assembly language programming with Linux
http://www.yqcomputer.com/
 
 
 


Post by Cameron Gi » Fri, 30 Apr 2004 15:07:14

I think I am getting there.
Please correct where necessary.

Let's forget the distinction between the legacy I/O address space for now.
The Address Space *is* all of the addresses that a processor can use!
When a computer starts up, hardware sets up the addresses for things like
the Hard Drive, which works to assign address values to things like its
accessible registers.
This would probably also mean that all of the processor registers, the
video card registers etc. would all be given addresses.
This is probably what is meant by the term "mapping."
A programmer uses these registers (at their assigned addresses) to write
binary numbers to them that act to command the device.
In the case of a Hard Drive, the registers can be used to send packets of
data (512-byte or 1KB packets) to RAM, which are then executed or read by a
program.
So if you want to access a byte of Hard Disk space, one first commands its
sector to be read into RAM and then selects its offset.

A few things still to clear up:
If for example a program has a loop in it, then at some stage it has to
point (address) back to a previous part of itself.
Obviously if it is executing it is in RAM and hence doesn't use the Hard
Drive sector address and offset so what address does it use?
Does RAM have its own set of addresses?
Do Operating Systems have to manage the loop link addresses if more than one
program is in RAM or are they in some way taken care of when programs are
put into RAM?


Also:
Paging handles the virtual RAM feature which manages the swap space to
increase the effective size of RAM by using Hard Disk space.
So paging is not actually necessary for the execution of a program and RAM
is in fact used regardless of paging.

Thanks to all who are helping.
The PC & Electronic Data Map Project:
http://www.yqcomputer.com/ ~cameron111




Post by Charles A. » Fri, 30 Apr 2004 15:47:38

On Thu, 29 Apr 2004 06:07:14 +0000 (UTC)


:Let's forget the distinction between the legacy I/O address space for now.
:The Address Space *is* all of the addresses that a processor can use!

The 80x86 hardware makes a clear distinction between physical memory
address space, and physical I/O address space. The distinction can not be
forgotten.

: This would probably also mean that all of the processor registers, the
:video card registers etc. would all be given addresses.

Processor registers do not have addresses in either memory address space or
I/O address space. Video card registers have addresses in I/O space, but
display memory has addresses in memory address space.

:Obviously if it is executing it is in RAM and hence doesn't use the Hard
:Drive sector address and offset so what address does it use?

It uses addresses in virtual memory space, which are translated into physical
addresses by the paging mechanism.

-- Chuck
 
 
 


Post by Cameron Gi » Fri, 30 Apr 2004 17:52:52

Even closer I think :)
Please correct where necessary.

Rewritten after the help from Charles.
Thanks Charles.

There is an I/O Address Space and the Address Space described in the IA-32
manuals under the heading of Basic Execution Environment (called the Virtual
Address Space).

When a computer starts up, hardware sets up the addresses for things like
the Hard Drive, which works to assign address values to things like its
accessible registers, and these addresses are a part of the I/O address
space.
A lot of registers for other devices are included here too.
A programmer uses these registers (at their assigned addresses) to write
binary numbers to them that act to command the device.
In the case of a Hard Drive, the registers can be used to send packets of
data (512-byte or 1KB packets) to RAM, which are then executed or read by a
program.
When a packet of data is sent to RAM, it exists in the Virtual Address
Space.

So if you want to access a byte of Hard Disk space, one first commands its
HDD sector to be read into RAM and then one can select the byte with its
Virtual Address, which would automatically make itself relative when you use
the paging mechanism.... I think.

A few things still to clear up (sowwy I'm slow :):
If for example a program has a loop in it, then at some stage it has to
point (address) back to a previous part of itself.
When writing a program, one would use the Virtual Address Space and can
start using the addresses from 00.
If you write another program, does it also use the same Virtual Address
Space and can it also start from the address 00?
When you put these two programs into RAM, if they are both using the same
Address Spaces, how does the processor make the distinction?
Is this where paging is needed?
Can you do this without paging?


I think I can summarize by saying that one addresses the HDD in blocks, and
these blocks have addresses within them that use the Virtual Address Space.

I no longer think this is related but out of curiosity, does RAM have its
own set of addresses (probably in the I/O Address Space)?

Thanks enormously to all who are helping.
The PC & Electronic Data Map Project:
http://www.yqcomputer.com/ ~cameron111




Post by Ivan Korot » Sat, 01 May 2004 02:50:53

> There is an I/O Address Space and the Address Space described in the IA-32
> manuals under the heading of Basic Execution Environment (called the Virtual
> Address Space).

Yes!


To confuse you completely: some hardware registers are mapped to i/o space,
and some to memory address space :). For example, my Ultra ATA storage
controller uses i/o addresses 9400-940f (for transferring commands, I
believe) and memory addresses E5800000-E58003FF (for transferring data
blocks). I/O space isn't used for fast transfers of large amounts of data
(except for COM/LPT ports) because it's too slow to do it byte-by-byte,
and the 64k i/o space doesn't allow big blocks.


What do you actually mean by RAM? System memory or ANY random
access storage?


The following happens: first the driver sends SEEK and READ requests to the
IDE (or SCSI) controller with IN/OUT instructions (that is, it writes these
commands to the HDD's registers via I/O space) and then reads the block of
data from the memory address range that is used by the controller. You see,
there is no big difference between system memory and any other hardware
buffers mapped to the physical memory address space. When the CPU executes
a MOV or IN/OUT instruction it does the following:

1) calculates the physical address in the case of a MOV instruction (for i/o
space, there is no paging)
2) writes the address to the A0-A31 pins of the system bus (actually, some
lower bits of the address aren't used because of alignment)
3) drives the M/IO# line high if the access is made to memory a. space, or
low if to i/o space
4) sets the W/R# line, which indicates whether to read or write data
5) asserts the ADS# (Address Strobe) line, thus telling the
world to process the request
6) all devices connected directly to the system bus (RAM, PCI bridge, BIOS,
etc.) check whether this address belongs to any of them; the device
that recognizes its address writes the reply data to the D0-D63 lines
and asserts the READY line.

For example, physical address 00034567 will be recognized by RAM,
FFFFFFF0 - by BIOS, E8000000 (on my system) - by PCI bus (and some device
connected to it), i/o address 378 - by ISA bus (if it's connected to system
bus) or by PCI-to-ISA bridge, etc.
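
The address-decoding step can be pictured with a toy decoder in C. The memory map below is only an assumption chosen so that the example addresses above land where described: 256 MB of RAM starting at 0, BIOS ROM in the top 64 KB, everything else in memory space claimed by the PCI bridge, and legacy I/O ports below 400h going to the ISA side. A real chipset map has more holes and regions than this.

```c
#include <stdint.h>
#include <stdbool.h>

enum target { TARGET_RAM, TARGET_BIOS, TARGET_PCI, TARGET_ISA };

/* Illustrative layout only, not taken from any particular machine. */
#define RAM_SIZE       0x10000000u  /* assumed: 256 MB of RAM from address 0 */
#define BIOS_BASE      0xFFFF0000u  /* assumed: BIOS ROM in the top 64 KB */
#define LEGACY_IO_END  0x0400u      /* assumed: ports below 400h are legacy/ISA */

/* Decide which device claims a bus cycle. is_io plays the role of the
   M/IO# signal: the same number means different things in the two spaces. */
static enum target decode(uint32_t addr, bool is_io)
{
    if (is_io)
        return (addr < LEGACY_IO_END) ? TARGET_ISA : TARGET_PCI;
    if (addr < RAM_SIZE)
        return TARGET_RAM;
    if (addr >= BIOS_BASE)
        return TARGET_BIOS;
    return TARGET_PCI;
}
```

Note that 378h decodes to the ISA side as an I/O address but to RAM as a memory address, which is why the memory/I/O distinction has to travel with every bus cycle.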



All OSes are different and use different memory mapping schemes. For
example, 32-bit versions of Windows assign each process its own virtual
address space (more precisely, the lower 2GB are private, and the upper 2GB
are kernel memory that is mapped identically for all processes).
And most programs are loaded at the same base address, 00400000 (or another
frequent base, 10000000).
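
The 2GB/2GB split just described amounts to a single comparison. A tiny sketch follows; the constant and function names are made up here and are not Windows API names.

```c
#include <stdint.h>
#include <stdbool.h>

#define KERNEL_BASE      0x80000000u  /* start of the upper 2 GB */
#define DEFAULT_EXE_BASE 0x00400000u  /* the common load address mentioned above */

/* True if a virtual address lies in the per-process (private) lower half. */
static bool is_private_user_address(uint32_t vaddr)
{
    return vaddr < KERNEL_BASE;
}
```

Both common load addresses (00400000 and 10000000) fall in the private half, so two processes can use them at the same time without clashing.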


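> When you put these two programs into RAM, if they are both using the same
> Address Spaces, how does the processor make the distinction?

They use different mappings and **different virtual** a.s.

> Is this where paging is needed?

Yes.

> Can you do this without paging?

Yes. Win 3.x did it using segmentation, but that's worse.

> I think I can summarize by saying that one addresses the HDD in blocks, and
> these blocks have addresses within them that use the Virtual Address Space.

No. Like I said before, the HDD controller maps its buffers at specific
*physical* addresses, which may or may not be mapped into the VA space of the
process.

> does RAM have its own set of addresses (probably in the I/O Address Space)?

??? :-O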

Ivan


 
 
 


Post by Bjarni Jul » Sat, 01 May 2004 04:00:40

> There is an I/O Address Space and the Address Space described in the IA-32

Correct. The I/O address space is a legacy thing from the 8086 days, and
we can mostly ignore it for clarity's sake. The virtual address space is
a 4GB space of addresses that you access with instructions like

mov eax, [ebx]

Here, ebx contains a 32 bit address in the 2^32 bytes = 4G bytes virtual
address space.


Here we are talking about the physical address space. A physical address
is an address as it appears electrically on the system bus outside of
the CPU. The virtual address space exists only inside the CPU. The CPU
itself performs a translation from virtual addresses to physical
addresses when a program makes an address reference. On startup, various
devices on the bus have either already been assigned hardwired physical
addresses, or set their addresses up by some means. How this works on
PCI busses is not my field, but that is not important for this
discussion. The RAM in the computer is one of the devices that are
mapped from start. Usually, it is simply mapped from physical address 0
and up, for as many addresses as there is RAM in the machine. One
address in that range corresponds to one byte of storage in a RAM chip.
One address somewhere else might correspond to a byte in a device
register in, for example, a hard drive.


When a program needs to read a sector (usually 512 bytes) of data from a
hard drive, it in essence writes a read command to the drive, and
provides the drive with a physical address where it wants the read data.
In reality this involves a little more fiddling with things like DMA,
and direct interfacing to hard drives is usually only done by the OS.
The OS knows about the physical addresses and is responsible for setting
up the virtual address space, and user programs run only in a virtual
address space. In principle, anyway.


The sector is put somewhere in RAM, and each byte in it is readable to
the CPU at the base address which you provided to the HD, plus the
offset within the sector. Usually, a user program demands some data from
a hard drive, and the OS reads it in. It then sees to it that the
physical addresses that contain the data are accessible in the virtual
address space of the user program.


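> A few things still to clear up (sowwy I'm slow :):

Don't worry. If you don't ask, you don't learn.

> When writing a program, one would use the Virtual Address Space and can
> start using the addresses from 00.

In principle, yes.

> If you write another program, does it also use the same Virtual Address
> Space and can it also start from the address 00?

It depends on the OS. It may assign the same virtual address space to
both programs, but modern operating systems usually do not, for reasons
of protection: if the address spaces are kept separate, the two programs
cannot interfere with one another. Given two separate virtual address
spaces, both programs can begin at their own address 0.

> When you put these two programs into RAM, if they are both using the same
> Address Spaces, how does the processor make the distinction?

If you give both programs the same virtual address space (that is, map
the virtual address spaces of both programs to the same physical
addresses), the first program loaded will simply be overwritten when the
second program is loaded.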
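
Here is a toy model of separate address spaces in C. It uses a single-level table with four pages per program, which is a deliberate simplification of the real two-level scheme, and all names and frame numbers are invented for the example.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 4u  /* toy address space: 4 pages per program */

/* One toy "page table" per program: virtual page number -> physical frame. */
struct address_space {
    uint32_t frame_of_page[NUM_PAGES];
};

/* Map a virtual address to a physical one within the given address space.
   The caller must keep vaddr below NUM_PAGES * PAGE_SIZE. */
static uint32_t to_physical(const struct address_space *as, uint32_t vaddr)
{
    uint32_t page = vaddr / PAGE_SIZE;
    return as->frame_of_page[page] * PAGE_SIZE + vaddr % PAGE_SIZE;
}
```

If program A's table holds frames 10 to 13 and program B's holds frames 20 to 23, both can use virtual address 0 and still touch different physical memory. If both tables hold the same frames, virtual address 0 is the same physical byte for both, and loading the second program overwrites the first.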


Now here is where it gets interesting. The way it works is this: each
address that a program references consists of a segment and an offset.
Most operating systems today don't make much use of the segment part any
more. Simply think of the segment as a 4GB area of addresses somewhere
in a (possibly larger) address space. The segment is usually implicit: a
"mov eax, [ebx]" uses the Data Segment for the [ebx] address, and the
program itself is addressed within the Cod
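
The segment-plus-offset step described above can be sketched as follows. This is a simplified model: a segment is reduced to a base and a limit, and descriptor details such as granularity and expand-down segments are left out. The names are made up for the example.

```c
#include <stdint.h>

/* Simplified segment: where it starts in the linear address space and the
   highest offset that is valid inside it. */
struct segment {
    uint32_t base;
    uint32_t limit;
};

/* Form the linear address for segment:offset. Returns 0 on success, or -1
   where the CPU would raise a fault because the offset exceeds the limit. */
static int to_linear(const struct segment *seg, uint32_t offset, uint32_t *linear_out)
{
    if (offset > seg->limit)
        return -1;
    *linear_out = seg->base + offset;  /* 32-bit add, wraps modulo 2^32 */
    return 0;
}
```

In the flat model most operating systems use, the base is 0 and the limit is FFFFFFFFh, so the offset in a register like ebx is numerically the linear address, which paging then translates to a physical one.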
 
 
 


Post by Markus Hum » Sat, 01 May 2004 05:19:15

The distinction between I/O space and memory access (that is, what the
address put on the bus by the CPU actually refers to) is made by a single
pin on the CPU, as far as I know.

So it is always clear whether it's a RAM access or an I/O port access. Some
hardware, as already pointed out, maps registers into the memory address
space for faster access. Modern graphics cards are such candidates.

Greetings

Markus
 
 
 


Post by Cameron Gi » Sat, 01 May 2004 19:40:39

I think the penny has dropped with Bjarni's post but everyone else has
helped.
Thank you.

Of course I may still have a few things to clear up.
Reply embedded.
The PC & Electronic Data Map Project:
http://members.dodo.com.au/~cameron111

Bjarni Juliusson < XXXX@XXXXX.COM > wrote
in message news:lTbkc.57607$ XXXX@XXXXX.COM ...


Now I'm getting somewhere practical :)


Yay!!
That's what I needed to know.
Now I understand what "mapping" actually is and where this map actually
resides.
Now I know what I have to do and what is already done for me (at least in a
clean system -no OS).

Basically from this I can conclude that RAM (where a program sits as it is
executing) has addresses 00 up.
So when I write a program, I can see it inside of this space and I can point
to it and manage it as necessary.

In actual fact, there is an I/O Address Space, a Memory Address Space and a
Virtual Address space which is managed by an Operating System.
The Virtual and Memory Address Spaces often overlap because if there is no
Operating System then the Virtual Address Space for a program is the RAM
part of the Memory Address Space.

Now below to the question of what happens in the presence of an Operating
System.

So this is where writing a program is OS specific.
Every OS would have a different protocol for managing how a program is given
its own Virtual Address Space.

Can I assume then that if you have an Assembler running in Windows that it
would compile a program based on the Windows protocols and likewise for an
Assembler for another OS?
I guess it would also be likely that you could get an Assembler running
under Windows to compile a program to run under Linux or to run on a clean,
OS free system.

Thanks again to all who are helping.


I actually wrote a lot more on this but understanding is as much about what
you throw away (put aside for another day) as it is about what you keep :)




Post by Bjarni Jul » Sun, 02 May 2004 09:59:46

Cameron Gibbs wrote:

Excellent.


Roughly, yes. As you get along writing programs all this will sink in. A
note about the I/O space is that it is only accessed by special I/O
instructions, like IN and OUT. This is essentially the distinction
between the memory space and the I/O space; most instructions manipulate
the memory space, but the I/O instructions manipulate the I/O space.
What is it good for? Hooh! Yeah!


Oh yes.


Nope. The One Thing to understand about assembly language is that it
only maps your assembly source code directly into the corresponding
machine code. It's just a textual representation of exactly what is in
the computer's memory, which is to say the very bytes that you will find
in the assembled object code. Nothing more, nothing less. The behaviour
of the program itself will always be the same as long as you run it on
the same kind of computer, but the way it needs to interact with other
programs (like the OS) can vary. An assembly program written for one OS
is not likely to function under a different OS, even if you reassemble it,
unless you only use some portable library for all your OS interaction.
If you assemble to a linkable object file, and use it together with a
library that exists in different versions for Windows and FreeBSD, you
shouldn't even have to reassemble your code to run on the other
platform, just link with the correct library.
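
To make the "just a textual representation of the bytes" point concrete, here is the instruction used earlier in the thread, mov eax, [ebx], written out as the two bytes a 32-bit assembler emits for it. The array and helper names are made up for this sketch; the encoding itself (opcode 8Bh followed by a ModRM byte) is the standard IA-32 one.

```c
#include <stdint.h>

/* Build a ModRM byte from its three fields: mod (2 bits), reg (3 bits),
   rm (3 bits). */
static uint8_t modrm(uint8_t mod, uint8_t reg, uint8_t rm)
{
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* "mov eax, [ebx]" in 32-bit code: opcode 8Bh (MOV r32, r/m32), then
   ModRM with mod=00 (register-indirect), reg=000 (eax), rm=011 (ebx). */
static const uint8_t mov_eax_from_ebx[] = { 0x8B, 0x03 };
```

These two bytes are the same whichever OS the assembler runs on; what differs between platforms is the object file wrapped around them and the way the program calls into the OS.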


This is true, which (unless I'm missing something) contradicts what you
stated above. You may assemble the program under Windows and run it
under Linux.

One thing to note: an assembler is not generally considered a compiler,
so the translation process is usually referred to as "assembly", and not
as "compilation".


You are always welcome to ask more questions.


Bjarni
--

INFORMATION WANTS TO BE FREE

 
 
 


Post by Cameron Gi » Sun, 02 May 2004 13:23:50

Reply embedded.
The PC & Electronic Data Map Project:
http://www.yqcomputer.com/ ~cameron111

Bjarni Juliusson < XXXX@XXXXX.COM > wrote



I think that one of the main problems I have been having is that there are
so many terms that are interchangeable, defined by context (often where the
context is not explained), imprecisely defined or used incorrectly.
The term "Virtual" appears to be such a term.

Even though my term use may need some refining, I thank you all for at least
sorting out the conceptual side of my understanding.


I can see it having its uses.

That makes sense.


Thank you everyone.
On to coding I go.


