## Forking in One-Hot FSMs

### Forking in One-Hot FSMs

Having two bits hot in a one-hot FSM would normally be a bad thing. But
I was wondering if anybody does this purposely, in order to fork, which
might be a syntactically nicer way to have a concurrent FSM. This would
imply that multiple states in the FSM could be active at once. This
would be an example:

parameter STATE1=1, STATE2=2, STATE33,... // state defs
casex (state)
...
if (state[STATE1]) begin
if (condition)
begin
m <= a*b;
state[STATE3] <= 1; // fork into 2 new states
state[STATE4] <= 1;
state[STATE1] <= 0; // leave current state
end
end
if (state[STATE3]) begin // DSP48 Adder Stage
p <= m+c;
state[STATE3] <= 0; // this fork dies
end
if (state[STATE4]) begin
m <= a2*b2;
state[STATE3] <= 1; // fork into 2 new states
state[STATE5] <= 1;
state[STATE4] <= 0; // leave current state
end

In this case I have a pipeline (as in a DSP48) which I can keep
continuously fed. A separate fork of the SM runs the pipeline. I can
turn on two one-hot bits (essentially ORing the states) to fork into
multiple states. One fork eventually kills itself. This might be nicer
than having a separate concurrent FSM. There may be a better syntax
that still allows a case statement. I just wondered if this is a common
or useful technique.
-Kevin

### Forking in One-Hot FSMs

> reg [31:0] state;
Sorry; there was not supposed to be a case statement here.

### Forking in One-Hot FSMs

But why not combine these two states into one states?and let that
states to do the pipline stuff?
Your coding may let your design slower and may not be implemented as
state machine in the final design.

On May 2, 2:54m, Kevin Neilson < XXXX@XXXXX.COM >

> reg [31:0] state; >> >>>> if (state[STATE1]) begin >>>> if (condition)> > > beg>n> > > m lt;> > <= a*b;
> > <tate[STATE3] <= 1; / fork i>t> 2 new states
> > &<t;t>te[STATE4] <= 1;
> < state[STATE1] <= 0; gt;gt;/ leave >u>rent s>a>e
> > end
> > end
> > if (state[STATE3]) be>i> // DSP48 Adder <>a>e
> > p <= m+c<
> > > >tate[S>A>E3] <= 0; / this fork d>e>
> > end
> > if (st<t>[>TATE4]) begin
> > m lt;<= a2*b2;
> > > state[STATE3] <= 1; > >// fork into 2 new states
> > lt;state[STATE5> >= 1;
> > end
>

### Forking in One-Hot FSMs

The example might not show this well, but you may want to fork from
several different states and the length of the fork, before it dies,
could be several states. So if I just have the single state, I would
have to manually branch out through the state tree and figure out all
states I could possibly be in while the "fork" would be operating and
then add the fork logic to all those states. If that makes sense.
Anyway, it's completely unmaintainable, because when you add a new state
to the machine you would have to figure out if the pipeline is supposed
to be full at that time and remember to add in the logic for that.
-Kevin

### Forking in One-Hot FSMs

evin,

I wonder about this too. I am currently doing a pipeline and
some code is shown below. So I wrote out the states without
an array so when the ModelSim comes up I don't have to expand
the states to see them. I also group signals that I want to
see associated with the states in the declarative region so
I don't have to futz too much with the ModelSim.

Some states stay on longer with the state_count condition.
I read one header record on state31 and follow it with three
of another type of data record with state32.

Now the "branching" happens because these states address
a NoBL SRAM and there is a two cycle lag between the
address and the data. Not show below, I also have clock
delays on these states, state32_1, state32_2, and so on
so when the address goes out on state32, I then have data
to process on state32_2.

I always have a When state associated with every What.

It actually gets deeper than that because there are FIFOs
involved as well. You'll need FIFOs in your design if
you are going to tackle a Sobel function. Here the trick
is to start thinking about your processses starting from
the READ data and figure out how many delays you need to
deliver an answer, then figure out where the WRITE data
should marry into the flow. I have now states out to _7.

Perhaps someone could suggest a better term than state
machine "forking"? And if there is some guidelines on how
to code ane debug pipelined architecture. I'm with Kevin,
it get's real messy, real soon.

AiVision

inner_cell_state_machine:process(clk)
begin
if(clk'event and clk='1')then
inner_cell_restart <='0';
if(reset='1')then
state30<='0'; state31<='0'; state32<='0'; state33<='0';
state34<='0'; state35<='0'; state36<='0'; state37<='0';
state38<='0';state39<='0';
state_count <= (others=>'0');
elsif(state29='1')then -- state29 automatically turns off from
init_state_machine
state30<='1';
--State30 Initial inner cell state
elsif(state30='1')then
state30<='0';
state31<='1';
state_count <= (others=>'0');
-- State31 Read the inner cell
elsif(state31='1')then
state31 <='0';
state32<='1';
-- State32 Read the inner cell connections
elsif(state32='1')then
if(state_count=2)then
state32<='0';
state33<='1';
state_count <= (others=>'0');
else
state_count <= state_count+1;
end if;
-- State33 Wait for SRAM to deliver first connection
elsif(state33='1')then
state33<='0';
state34<='1';
state_count <= (others=>'0');
-- State34 Read connection
elsif(state34='1')then
if(state_count=2)then
state34<='0';
state35<='1';
state_count <= (others=>'0');
else
state_count <= state_count+1;
end if;
. . .

### Forking in One-Hot FSMs

DEC used that style of design in the PDP-16 Register Transfer Modules.
Possibly also in the control units of some of their asynchronous
processors such as the PDP-6 and KA10.

### Forking in One-Hot FSMs

"Brad Smallridge" < XXXX@XXXXX.COM > wrote in message
news:Y9MSj.9245\$ XXXX@XXXXX.COM ...

'One hot' is a particular implementation of a FSM, but from a logic
perspective (i.e. how you go about designing your state machine, the states
needed, the branching, etc.) means absolutely nothing.

'Concurrent' state machines though is simply another way of saying state
machines that either are totally independent of one another, or only loosely
connected (i.e. there is some signalling going on between the two, but you
can *usually* futz with one without breaking the other).

As I mentioned in more detail in my response on 'Style for Highly-Pipelined
State Machines', I only really see two basic approaches:

- The first is some form of counting or adding states where you have a known
fixed number of states between 'doing this' and 'doing that'. This method
works but it really quickly leads to rather complicated code that is
difficult to understand and (because of the complexity) probably has some
logic holes as well that may take some time to surface. In certain designs
though this method is just fine and the results are fine and easy to
maintain. The problem though is when the realization sets in that the code
is getting out of control and how to manage it (which was the point of the

- The second method uses request/acknowledge handshaking between the
'concurrent' state machines. This method scales very nicely from a design
perspective and is just as efficient from an implementation perspective as
well.

Bottom line here is to realize that a 'fork in an FSM' is really a call to
think of it as two separate state machines that have a
communication/signalling requirement and don't try to force your mental
model as being 'one' state machine. After all, the entire design can be
considered to be a single state machine...but it is generally of no use to
think of it that way from a design perspective.

<snip>

Try looking at it now from a somewhat different perspective. Let's say you
have one state machine who's sole purpose is to generate read and write
commands and addresses to the memory but not to process the data at all. In
addition there is a second state machine who's sole purpose is to process
the data that gets read back from memory and produce some sort of result,
that maybe goes to memory, maybe goes somewhere else, it doesn't matter.

If the 'address generator' state machine needs the results from some
computation in order to procede on, then it sends a read request to the
'data processor' state machine, and waits until it gets the acknowledge. It
may sound queer, but what that does is allows you to design with any sort of
latency and does not require any apriori knowledge of how many clock ticks
it will take to get that result back, the address generator state machine is
waiting for the acknowledge back from the data processor state machine
(which in turn is waiting for the data to come back from the memory).

Now you could argue that the data processor state machine can't just
*process data*, it most likely needs to know what it is supposed to be doing
with it, and that knowledge likely lives in the address generator state
machine. Fair enough, but all that means is that the address generator
needs to be able to send commands over to the data processor. This could be
done in an ad hoc manner by settin

### Forking in One-Hot FSMs

> Perhaps someone could suggest a better term than state
> machine "forking"? And if there is some guidelines on how
> to code and debug pipelined architecture. I'm with Kevin,
> it gets real messy, real soon.

There is no requirement that a process/block must update only
a single register named 'State'.

When I look at large, textbook style state machine
examples, like the ones in this thread, I imagine
a much simpler process that updates several smaller registers.
Maybe an input reg, output reg, a couple of counters
and a few well-named booleans.

-- Mike Treseler

### Forking in One-Hot FSMs

> You'll also find that changes (like switching the Nobl SRAM to DRAM as an

That has been on my mind because there is a DRAM on my board. Not only
will the DRAM require more cycles but perhaps too a varying number of
cycles depending on the sequentiality or randomness of the addressing.

I have seen controllers on the Xilinx site, but nothing, that talks about
several ports, and how the hand shaking is handled. My FAE has said that
some multiport examples are availble.

AiVision

### Forking in One-Hot FSMs

On May 5, 12:13m, "Brad Smallridge" < XXXX@XXXXX.COM >

Except for the most special case examples, DRAM access will be a
variable delay because of page changes and memory refresh.

Trying to design a state machine that is simply trying to *access*
memory for some algorithmic purpose would likely result in a difficult
to maintain design.

Designing a request/acknowledge interface to some other process or
entity (in this case the 'other' being a DRAM controller) results in a
much easier to maintain design.

Using the exact same interface signal functionality whether one is
talking to internal FPGA memory, NoBL or SDRAM or SPI results in a
design that can be reused, retargeted and improved upon if necessary.

Using the same signal naming functionality as an existing documented
specification (i.e. Avalon, Wishbone) allows others to (re)use your
design without getting bogged down in details that they are not
currently interested in and allows them (and you when you re-use the
design) to be more productive.

Figure out where you are and where you want to be in the design
productivity chain. The synthesis cost in terms of logic resource is
zero, the upfront learning cost will start to pay back in the form of
quicker debug and reusable designs.

Kevin Jennings

### Forking in One-Hot FSMs

That's interesting--I'm not even familiar with an "asynchronous
processor". What does that mean? -Kevin

### Forking in One-Hot FSMs

...
...

This is a great example, because switching from one type of RAM to
another means you *do* have to change everything, if you want the
controller to be good. You can certainly modularlize the code and make
concurrent SMs with handshaking and this is easy to maintain. And a lot
of DRAM controllers are designed this way. But here is the problem:
while you are waiting around for acknowledges, you have just wasted a
bunch of memory bandwidth. If you want to make better use of your
bandwidth, you can't use handshaking. You have to start another burst
while one is in the pipe. You have to look ahead in the command FIFO to
see if the next request is going to be in the same row/bank to see if
you need to close the row during this burst and precharge or if you can
continue in the same open row in a different bank, etc. If I do all
that with handshaking, I'm frittering away cycles. And to do this in a
way that doesn't fritter away cycles with standard methodology means
everything is so tightly bound together that to change from SDRAM to
some other type of RAM means I have to tear up most of the design.

Another issue I came up with today in the design of my current SM is
that I updated a value x and then in the next cycle realized I wanted
the old value of x. But I hadn't really updated x; I had issued a
request that gets put into a matching delay line and then goes to a
concurrent FSM which then updates x. So even though I had "updated" x,
I could still used the old value for a few cycles and didn't need a
temporary storage register. Again, I can't just send the request to
update x and then wait for an ack because the SM has to keep on
trucking. This is confusing, and I'd like to have some sort of
methodology that would be as efficient as what I'm doing but somewhat
more abstract.
-Kevin

### Forking in One-Hot FSMs

"Kevin Neilson" < XXXX@XXXXX.COM > wrote in message
news:fvo2o9\$ XXXX@XXXXX.COM ...

The methodology I use makes use of every clock cycle, DRAMs are running full
tilt, transfers from fast FPGA through a PCI bus to some other processor,
etc., the whole 9 yards.

Then you're waiting for the wrong acknowledgement. Taking the DRAM again as
an example, every data transfer consists of two parts: address/command and
data. During a memory write, all of this happens on the same clock cycle.
When the controller 'fills up' it sets the wait request to hold off until it
can accept more commands (reads or writes).

During a read though, the address/command portion happens on one clock
cycle, the actual delivery of the data back to the requestor occurs sometime
later. The state machine that requests the read does not necessarily have
to wait for the data to come back before starting up the next read. The
acknowledge that comes back from a 'memory read' command is that the request
to read has been accepted, another command (read or write) can now be
started. There are also situations where one really does need to wait until
the data is returned to continue on, but in many data processing
applications, the data can lag significantly with no real impact on
performance, the read requests can be queued up as fast as the controller
can accept them.

Although I've been using the DRAM as an example, nothing in the handshaking
or methodology is 'DRAM specific', it is simply having to do with
transmitting information (i.e. writing) and requesting information (i.e.
reading) and having a protocol that separates the request for information
from the delivery of that information (i.e. specifically allowing for
latency and allowing multiple commands to be queued up).

I disagree.

That's correct...but you can't start one if the pipe is full (which can
happen when a memory refresh or a page hit occurs and the pipe fills up
waiting while those things get serviced). The handshake tells you that the
pipe is full and you absolutely need to have it. The 'pipe full' signal is
a handshake, when it is full, it says 'wait', when it is not full, it says
'got it'.

OK

Then you're not doing it properly. It's called pipelining, not frittering.

Latency can matter in certain situations, in others it doesn't. If there is
some situation where latency mattered, one would have to come up with a way
where the requestor could start up the read cycle earlier...but if there is
such a way to start it up earlier, then that change could be applied equally
well to the lower latency situation as well which means that you could have
a common design

Kevin Jennings

### Forking in One-Hot FSMs

Someone asked about state machines using encoding similar to one-hot but
with "forking" where multiple states make be active simultaneously,

There's no central clock. At any given time, one particular "unit"
in the computer is active. When it completes its work, it sends a
pulse to the next unit that needs to do something, thus handing off
control.

In some situations, a unit might trigger two other units. Usually
in such a case, a later unit implements a "join" between the two paths,
by waiting for both to complete.

The logic implementing such a control system looks just like a flowchart.

There were quite a few asynchronous computers in the old days, but
the world settled on synchronous designs for various reasons. In recent
years there has been a resurgence of interest in asynchronous designs,
partly due to the possibility of power savings. There are still no
mainstream asynchronous processors, though.

### Forking in One-Hot FSMs

(snip)

Sometimes also known as "self timed logic", and probably easier
to search under that name.
(snip)

There are rumors of asynchronous functional modules, such as
multipliers or dividers. That might make more sense in current
systems than a completely asynchronous design.

-- glen