This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.


RE: Network driver problem only with larger programs (ARM adv needed)


I am using a JTAG debugger (JEENI) with GDB.  The JEENI (and GDB) do not
provide any method of accessing the coprocessor registers of the ARM to
configure/control the cache.  As long as the flash on the board was blank,
the board would work because the ARM core would be in the reset/default
state (caches disabled) until after I loaded/ran my code.

The situation I ran into was that I had already programmed the flash with
RedBoot, which boots very quickly - faster than I can start GDB. Therefore,
the caches on the board were usually enabled (depending on how quick on the
draw I was).  On my previous projects (ARM7TDMI) the cache was a unified
instruction/data cache (write through), which avoids this issue.  The
ARM940T, by contrast, has separate instruction and data caches, and the data
cache is write back.  This made all the difference.

What is necessary is that, after loading your program to RAM, you clean and
flush the DCACHE and flush the ICACHE before executing the code loaded to
RAM.  To the ARM core, loading the code to RAM looks like data writes, and
with a write back DCACHE those writes will be cached (or at least some of
them will).  The clean and flush of the DCACHE gets them committed to actual
RAM.  This is necessary because when you execute code from this RAM, it will
not already be in the ICACHE (i.e., you couldn't have been executing code
from the RAM you were loading the program to), so the instructions will be
fetched from RAM.  If the data cache hasn't been cleaned and flushed,
incorrect instructions will be fetched.  After a few compile/load/run cycles,
the RAM starts to contain stale fragments of old code, causing strange,
unpredictable results.
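
The failure mode described above can be illustrated with a toy model that
runs on a host machine.  This is only a sketch: the per-word "caches" and
the names toy_t, toy_data_write, toy_dcache_clean, toy_icache_flush and
toy_ifetch are hypothetical, and none of them correspond to an ARM or eCos
interface.  It assumes, as described, that the loader's writes land in a
write back data cache and that instruction fetches read RAM directly on an
ICACHE miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_WORDS 8u

/* Toy machine: RAM plus one cache entry per word for each cache. */
typedef struct {
    uint32_t ram[TOY_WORDS];       /* what is really in memory          */
    uint32_t dc_data[TOY_WORDS];   /* write back data cache contents    */
    bool     dc_dirty[TOY_WORDS];  /* true: newer than RAM              */
    uint32_t ic_data[TOY_WORDS];   /* instruction cache contents        */
    bool     ic_valid[TOY_WORDS];
} toy_t;

/* A data write (e.g. the loader storing code) stays in the data cache. */
static void toy_data_write(toy_t *t, uint32_t addr, uint32_t val)
{
    assert(addr < TOY_WORDS);
    t->dc_data[addr] = val;
    t->dc_dirty[addr] = true;
}

/* Clean: commit every dirty data cache entry to RAM. */
static void toy_dcache_clean(toy_t *t)
{
    for (uint32_t i = 0; i < TOY_WORDS; i++) {
        if (t->dc_dirty[i]) {
            t->ram[i] = t->dc_data[i];
            t->dc_dirty[i] = false;
        }
    }
}

/* Flush: discard everything held in the instruction cache. */
static void toy_icache_flush(toy_t *t)
{
    for (uint32_t i = 0; i < TOY_WORDS; i++) {
        t->ic_valid[i] = false;
    }
}

/* Instruction fetch: served from the ICACHE, filled from RAM on a miss.
 * It never looks in the data cache, which is the root of the problem. */
static uint32_t toy_ifetch(toy_t *t, uint32_t addr)
{
    assert(addr < TOY_WORDS);
    if (!t->ic_valid[addr]) {
        t->ic_data[addr] = t->ram[addr];
        t->ic_valid[addr] = true;
    }
    return t->ic_data[addr];
}
```

In this model, writing a new word with toy_data_write() and then calling
toy_ifetch() returns the old RAM contents.  The new word is only fetched
after both toy_dcache_clean() and toy_icache_flush() have run.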


My half-baked work-around was to write a relocatable assembly stub that
flushed the ICACHE/DCACHE, and have my GDB init script poke this into high
RAM (an area that my RedBoot marked as non-cacheable for use with DMA
devices) and execute the stub to gain control of the cache.  Although not
perfect, it works reasonably well.

For ROM and ROMRAM startup applications, the program starts with the ARM
core in the reset/default state, so the caches are always disabled on
startup.  With RAM startup, it is all up to the 'loader' ('loader' =
GDB/JTAG, RedBoot, etc.).  The loader must load the RAM code to RAM, then
use the HAL_ICACHE_SYNC() macro, then branch to the RAM code entry point.
When GDB/JTAG is the 'loader', the equivalent of the HAL_ICACHE_SYNC() macro
must be performed by the GDB/JTAG debugger.
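
The loader sequence just described (load, sync, branch) might be sketched
in C roughly as follows.  This is an illustration, not RedBoot's actual
code: load_and_run, entry_fn, demo_entry and icache_sync_calls are
hypothetical names.  On a real eCos target HAL_ICACHE_SYNC() comes from
<cyg/hal/hal_cache.h>; the stub here only counts calls so that the sketch
builds on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side stand-in; a real target gets this macro from the HAL. */
#ifndef HAL_ICACHE_SYNC
static int icache_sync_calls;   /* hypothetical counter, stub only */
#define HAL_ICACHE_SYNC() do { icache_sync_calls++; } while (0)
#endif

typedef int (*entry_fn)(void);

/* Placeholder entry point.  A real loader would branch into the image
 * it just copied; executing copied bytes is not portable on a host. */
static int demo_entry(void)
{
    return 7;
}

static int load_and_run(void *dst, const void *image, size_t len,
                        entry_fn entry)
{
    assert(dst != NULL && image != NULL && entry != NULL);

    memcpy(dst, image, len);  /* 1. load: plain data writes to RAM      */
    HAL_ICACHE_SYNC();        /* 2. make instruction fetches see them   */
    return entry();           /* 3. only now branch to the entry point  */
}
```

The point of the ordering is that step 2 sits between the last data write
and the first instruction fetch from the loaded image.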

Jay

-----Original Message-----
From: Joe Porthouse [mailto:jporthouse@toptech.com]
Sent: Tuesday, August 29, 2006 8:29 PM
To:
ecos-discuss-return-35956-jporthouse=toptech.com@ecos.sourceware.org;
ecos-discuss-return-35843-jporthouse=toptech.com@ecos.sourceware.org
Cc: ecos-discuss@ecos.sourceware.org
Subject: RE: [ECOS] Network driver problem only with larger programs
(ARM adv needed)


Jay,
I'm on an xScale (PXA255).

Now the ICache...now that seems to make sense!  This could explain why the
problem seems to come and go with each build.  I also just noticed that
without my JTAG connected, my build from last night seems to run, but with
the JTAG connected, it hangs up.  I believe I have also had builds not run
even with the JTAG disconnected.

Where did you put your HAL_ICACHE_SYNC() call?

I am running with a single eCos/Application image (no RedBoot).  The
platform specific startup code I had did not support ROMRAM, so I had to add
the necessary code to copy the image.

At boot up my platform_setup1 macro:
#1. disables the MMU and cache and sets up SDRAM
#2. copies the application from flash (at 0x00000000) into SDRAM (at
0xA0000000)
#3. starts the MMU and cache.  The MMU is set up to swap the flash and SDRAM
memory locations (SDRAM at 0x00000000), and execution simply continues from
the same PC location, but now it's the SDRAM copy of the application.

Oops, I just noticed that when the MMU and cache start, the ICache is first
started and flushed, then the MMU starts, swapping memory locations,
followed by the DCache starting and being flushed.  Is this order wrong?  Or
is my problem something that the JTAG or JTAG startup script is doing?

For reference I included the platform_setup1 macro, the init_mmu_cache_on
macro and my JTAG startup script.

Once again any help is greatly appreciated.


/**********************************************************************
* initialize controller
**********************************************************************/

#if defined(CYG_HAL_STARTUP_ROM) || defined(CYG_HAL_STARTUP_ROMRAM)
#define PLATFORM_SETUP1 _platform_setup1
#define CYGHWR_HAL_ARM_HAS_MMU
#else
#define PLATFORM_SETUP1
#endif

.macro _platform_setup1

    // disable MMU and cache
    mov    r0, #0x78
    mcr    p15, 0, r0, c1, c0, 0

    // invalidate I&D cache & BTB
    mcr    p15, 0, r0, c7, c7, 0  // r0 ignored

    // drain write (& fill) buffer
    mcr    p15, 0, r0, c7, c10, 4   // r0 ignored

    CPWAIT r0

    // there is only one co processor on the PXA25x
    ldr    r0, =0x0000001
    mcr    p15, 0, r0, c3, c0, 0

    init_sdram_cnt

    // Wake up from sleep if necessary.
    wake_from_sleep

    init_gpio

    early_uart_init

// TOPTECH
#if defined(CYG_HAL_STARTUP_ROMRAM)
    // Relocate [copy] program image from ROM to RAM
    ldr     r3,=0x00000000  // flash start phy addr
    ldr     r4,=0xA0000000  // ram start phy addr
    ldr     r5,=__ram_data_end
    cmp     r4,r5           // jump if no data to move
    beq     2f
    sub     r3,r3,#4        // loop adjustments
    sub     r4,r4,#4
1:  ldr     r0,[r3,#4]!     // copy info
    str     r0,[r4,#4]!
    cmp     r3,r5
    bne     1b
2:
#endif

    LED(11)

    init_intc_cnt        // Interrupt Controller

    LED(10)

    init_clks            // Clocks

    LED(9)

    init_mmu_cache_on    // MMU and Cache

    LED(8)

.endm

#endif /* CYGONCE_HAL_PLATFORM_SETUP_H */


/**********************************************************************
* MMU/Cache
**********************************************************************/
.macro init_mmu_cache_on

    early_uart_out r0, r2, '.'

    ldr    r0, =0x2001
    mcr    p15, 0, r0, c15, c1, 0
    mcr    p15, 0, r0, c7, c10, 4    // drain the write & fill buffers
    CPWAIT r0
    mcr    p15, 0, r0, c7, c7, 0    // flush Icache, Dcache and BTB
    CPWAIT r0
    mcr    p15, 0, r0, c8, c7, 0    // flush instruction and data TLBs
    CPWAIT r0

    early_uart_out r0, r2, '.'

    // Icache on
    mrc    p15, 0, r0, c1, c0, 0
    orr    r0, r0, #MMU_Control_I
    orr    r0, r0, #MMU_Control_BTB
    mcr    p15, 0, r0, c1, c0, 0
    CPWAIT r0

    early_uart_out r0, r2, '.'

    // Set up a stack [for calling C code]
    ldr    r1, =__startup_stack
    ldr    r2, =PXA2X0_RAM_BANK0_BASE
    orr    sp, r1, r2

    // Create MMU tables
    bl     hal_mmu_init

    early_uart_out r0, r2, '.'

    // MMU on
    ldr    r2,=1f
    mrc    p15, 0, r0, c1, c0, 0
    orr    r0, r0, #MMU_Control_M
    orr    r0, r0, #MMU_Control_R
    mcr    p15, 0, r0, c1, c0, 0
    mov    pc,r2
    nop
    nop
    nop
1:

    early_uart_out r0, r2, '.'

    mcr    p15, 0, r0, c7, c10, 4    // drain the write & fill buffers
    CPWAIT r0

    early_uart_out r0, r2, '.'

    // Dcache on
    mrc    p15, 0, r0, c1, c0, 0
    orr    r0, r0, #MMU_Control_C
    mcr    p15, 0, r0, c1, c0, 0
    CPWAIT r0

    early_uart_out r0, r2, '.'

    // clean/drain/flush the main Dcache
    mov    r1, #0xe0000000
    mov    r0, #1024
2:
    mcr    p15, 0, r1, c7, c2, 5
    add    r1, r1, #32
    subs   r0, r0, #1
    bne    2b

    early_uart_out r0, r2, '.'

    // clean/drain/flush the mini Dcache
    mov    r0, #64          // number of lines in the mini Dcache
3:
    mcr    p15, 0, r1, c7, c2, 5      // allocate a Dcache line
    add    r1, r1, #32        // increment the address to the next cache line
    subs   r0, r0, #1        // decrement the loop count
    bne    3b
  
    early_uart_out r0, r2, '.'

    // flush Dcache
    mcr    p15, 0, r0, c7, c6, 0
    CPWAIT r0

    // drain the write & fill buffers
    mcr    p15, 0, r0, c7, c10, 4
    CPWAIT r0
.endm
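
For reference, the address range walked by the two Dcache loops in the
macro above can be worked out from the constants in the listing (32 byte
stride, 1024 lines, then 64 lines, with r1 not reloaded between the
loops).  The helper name end_of_walk is hypothetical; this is only host
arithmetic, not target code.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 32u   /* stride used by "add r1, r1, #32" */

/* Address left in r1 after a loop that steps through `lines` lines
 * starting at `start`. */
static uint32_t end_of_walk(uint32_t start, uint32_t lines)
{
    assert(lines <= (UINT32_MAX - start) / LINE_BYTES);
    return start + lines * LINE_BYTES;
}
```

With these constants the first loop covers 32 KB starting at 0xe0000000
and ends with r1 at 0xe0008000.  The second loop then continues from
there for a further 2 KB, ending at 0xe0008800.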


>// Reset target
>reset
>// Set endian to little
>control.b=0
>// Set semi host variables
>_heap_base    =           A0018000H
>_heap_size    =           00002000H
>_stack_size   =           00000400H
>_top_of_memory=           A0020000H
>// Write MSC0, MSC1, MSC2
>// CS0 - Rbuff=0, RRR=010, RDN=0010, RDF=1101, 16bits=0, 000=FLASH
>// CS1 - N.C.
>// CS2 - Ethernet I/O
>// CS3 - NVSRAM
>// CS4 - Ethernet
>// CS5 - Rbuff=0, RRR=???, RDN=????, RDF=????, 16bits=1, 000=Non Burst
>//word 0x48000008 = 0x2ef15af0    // CS1 (N.C.)      / CS0 (Flash)
>word 0x48000008 = 0x7ff07ff0    // CS1 (N.C.)      / CS0 (Flash)
>//word 0x4800000c = 0x7ff97ff8    // CS3 (NVSRAM 16) / CS2 (Ethernet I/O 16)
>word 0x4800000c = 0x7ff97ffc    // CS3 (NVSRAM 16) / CS2 (Ethernet I/O 16 VLIO)
>//word 0x48000010 = 0x7ff87ff0    // CS5 (UART 8)    / CS4 (Ethernet Data 32)
>word 0x48000010 = 0x7ff87ff4    // CS5 (UART 8)    / CS4 (Ethernet Data 32 VLIO)
>// GPSR2: Set CS3 (GPIO79) high
>//word 0x40e00020 = 0x00008000
>// GPDR2: Set CS3 (GPIO79) output
>//word 0x40e00014 = 0x00008000 
>// GPFR2_L: Set CS3 (GPIO79) CS function
>//word 0x40e00064 = 0x80000000
>// GPSR1: Set nPWE (GPIO49) high
>word 0x40e0001c = 0x00020000
>// GPSR2: Set CS3 (GPIO78, GPIO79 & GPIO80) high
>word 0x40e00020 = 0x0001c000
>// GPDR1: Set nPWE (GPIO49) output
>word 0x40e00010 = 0x00020000 
>// GPDR2: Set CS3 (GPIO78, GPIO79 & GPIO80) output
>word 0x40e00014 = 0x0001c000 
>// GAFR0_U: Set RDY (GPIO18) RDY function
>word 0x40e00058 = 0x00000010
>// GAFR1_U: Set nPWE (GPIO49) nPWE function
>word 0x40e00060 = 0xa0000008
>// GAFR2_L: Set CS2, CS3 (GPIO78, GPIO79) CS function
>word 0x40e00064 = 0xa0000000
>// GAFR2_U: Set CS4 (GPIO80) CS function
>word 0x40e00068 = 0x00000002
>// Assert MDREFR:K1RUN and MDREFR:K2RUN and configure MDREFR:K1DB2 and
>// MDREFR:K2DB2 as desired.
>word 0x48000004 = 0x03ca4fff	// Controller default
>word 0x48000004 = 0x03ca4018	// Refresh1 Rate = (64MS/8192 Rows)*99.5Mhz/32 = 24
>word 0x48000004 = 0x03cf6018	// Set K0RUN, K1RUN and K2RUN
>word 0x48000004 = 0x038f6018	// Clear Self Refresh
>word 0x48000004 = 0x038ff018	// Set E0PIN and E1PIN
>// Set SDRAM config register, but don't enable any banks yet
>word 0x48000000 = 0x000009c9
>// write a reg as a delay tactic to wait 200usec
>r0 = 0
>// Write the disabled bank 9 times.  Each time will cause a CBR refresh
>word 0xa0000000 = 0
>word 0xa0000000 = 0
>word 0xa0000000 = 0
>word 0xa0000000 = 0
>word 0xa0000000 = 0
>word 0xa0000000 = 0
>word 0xa0000000 = 0
>word 0xa0000000 = 0
>word 0xa0000000 = 0
>// Enable SDRAM and send MRS to configure SDRAM
>word 0x48000000 = 0x000009c9
>word 0x48000040 = 0x00220022
>// GPSR1: Set FF_RXD/GPIO34 High
>// GPSR1: Set FF_TXD/GPIO39 High
>word 0x40e0001c = 0x00000084
>// GPDR1: Set FF_RXD/GPIO34 Input
>// GPDR1: Set FF_TXD/GPIO39 Output
>//word 0x40e00010 = 0x00000080
>word 0x40e00010 = 0x00020080
>// GAFR1_L: Set FF_RXD/GPIO34 FF Function 01
>// GAFR1_L: Set FF_TXD/GPIO39 FF Function 10
>word 0x40e0005c = 0x00008010
>// setup FFUART
>// disable uart and disable interrupts
>word 0x4010000c = 0x00000000  // DLAB off
>word 0x40100004 = 0x00000000
>// set baud rate divisor
>word 0x4010000c = 0x00000080 // DLAB on
>word 0x40100000 = 0x00000008 // 115200
>word 0x40100004 = 0x00000000  // IER_DLH = 0
>// set parameters to 8, n, 1
>word 0x4010000c = 0x00000000  // DLAB off
>word 0x4010000c = 0x00000003  // LCR=3 8 bit character
>// set polled mode
>word 0x40100004 = 0x00000000
>// set normal UART mode
>word 0x40100010 = 0x00000000  // MCR = 0
>// enable UART
>word 0x40100004 = 0x00000040
>// enable and clear FIFOs
>word 0x40100008 = 0x000000c1
>word 0x40100008 = 0x000000c3
>word 0x40100008 = 0x00000005
>// send ABCD
>word 0x40100000 = 0x0000000d  // CR
>word 0x40100000 = 0x0000000a  // LF
>word 0x40100000 = 0x00000041
>word 0x40100000 = 0x00000042
>word 0x40100000 = 0x00000043
>word 0x40100000 = 0x00000044
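
Two constants in the script above are derived values: the refresh
interval (the 24 in the MDREFR comment, written as 0x018) and the baud
rate divisor (0x08 for 115200).  The sketch below recomputes both.  The
99.5 MHz memory clock, 64 ms period and 8192 rows come from the script
comment.  The 14.7456 MHz UART clock is an assumption that is not stated
in the script, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Refresh interval: memory clock cycles between row refreshes, divided
 * by 32, following the formula in the script comment. */
static uint32_t refresh_interval(uint32_t memclk_hz, uint32_t refresh_ms,
                                 uint32_t rows)
{
    assert(rows != 0);
    uint64_t cycles_per_period = (uint64_t)memclk_hz * refresh_ms / 1000u;
    return (uint32_t)(cycles_per_period / rows / 32u);
}

/* Divisor for a UART that oversamples each bit 16 times. */
static uint32_t uart_divisor(uint32_t uart_clk_hz, uint32_t baud)
{
    assert(baud != 0);
    return uart_clk_hz / (16u * baud);
}
```

refresh_interval(99500000, 64, 8192) truncates 24.29 to 24, which matches
the 0x018 in the low bits of 0x03ca4018.  uart_divisor(14745600, 115200)
gives 8, which matches the value written to 0x40100000 with DLAB on.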

Thanks,
Joe Porthouse
Toptech Systems, Inc.
Longwood, FL 32750
-----Original Message-----
From: ecos-discuss-owner@ecos.sourceware.org
[mailto:ecos-discuss-owner@ecos.sourceware.org] On Behalf Of Jay Foster
Sent: Tuesday, August 29, 2006 8:30 PM
To: 'jporthouse@toptech.com';
ecos-discuss-return-35843-jporthouse=toptech.com@ecos.sourceware.org
Cc: ecos-discuss@ecos.sourceware.org
Subject: RE: [ECOS] Network driver problem only with larger programs (ARM
adv needed)

Which ARM core are you using?  I had some similar weird problems with an
ARM940T processor that turned out to be cache related.  This was
particularly tricky with the JTAG debugger.  After loading the application
code I needed to do the equivalent of the HAL_ICACHE_SYNC() macro to flush
the instruction cache, flush and clean the data cache.  Then again, your
problem might be something completely different.
Jay

-----Original Message-----
From: Joe Porthouse [mailto:jporthouse@toptech.com]
Sent: Tuesday, August 29, 2006 2:02 PM
To: ecos-discuss-return-35843-jporthouse=toptech.com@ecos.sourceware.org
Cc: ecos-discuss@ecos.sourceware.org
Subject: RE: [ECOS] Network driver problem only with larger programs
(ARM adv needed)


I enabled asserts and stack checking and the problem stopped.  I then turned
off asserts and stack checking and the problem did not reoccur...until
today.

Now with asserts and stack checking enabled I get no errors, but the
execution still gets hung up in the cyg_do_net_init() call from the
cyg_hal_invoke_constructors() routine.

Using breakpoints and the traceback feature of my JTAG I can see exactly
where things go wrong, but don't know why.

All constructors get called correctly until the cyg_do_net_init is called.
When this occurs execution gets two instructions into the procedure and then
jumps into the middle of the cyg_timeout() function where it enters an
endless loop.

Checking addresses and registers everything looks ok (to me).  I have even
tried this on three different pieces of hardware.  I am at a complete loss
on why this is occurring.  I can step through the same piece of code in a
small program and execution occurs as expected.

Any advice would be greatly appreciated.

Trace leading up to the offending instruction looks like:
       hal_misc.c Line 202 (cyg_hal_invoke_constructors)
       202               (*p) ();
       000E937C e1a0e00f  MOV       LR,PC
  TRIG 000E9380 e414f004  LDR       PC,[R4],#-004   // jump from here
       001007C8 e52de004  STR       LR,[SP,#-004]!  // to here, ok!

Registers at this point are:
R0  00008000
R1  00000004
R2  003d940c
R3  0037d0fc
R4  0037d85c <- constructor table address, good
R5  0037d848
R6  0b0b0b0b
R7  0b0b0b0b
R8  00000000
R9  a0003000
R10 0010032c
R11 0037f00c
R12 003d940c
SP  0037eff8
LR  000e9384
PC  001007cc <- PC jumped to correct address, now at 2nd address
CPSR 200000d3
SPSR 000000d3

Execution should follow the listing as:
_GLOBAL__I.52100_cyg_do_net_init: 
001007C8 e52de004   STR       LR,[SP,#-004]!  <- jumped here ok.
001007CC e3a01ccb   MOV       R1,#0000cb00    <- PC now here.
001007D0 e2811084   ADD       R1,R1,#00000084
001007D4 e3a00001   MOV       R0,#00000001
001007D8 e49de004   LDR       LR,[SP],#004
001007DC eafffff1   B        _Z41__static_initialization_and_destruction_0ii

But on the next step execution jumps into timeout() at address 00100330:
262                     cyg_uint32 
263                     timeout(timeout_fun *fun, void *arg, cyg_int32 delta) 
264                     { 
cyg_timeout: 
00100308 e1a0c00d   MOV       R12,SP
0010030C e92dddf0   STMFD     SP!,{R4-R8,R10-R12,LR,PC}
00100310 e24cb004   SUB       R11,R12,#00000004
00100314 e1a07002   MOV       R7,R2
00100318 e1a08000   MOV       R8,R0
0010031C e1a0a001   MOV       R10,R1
265                         int i; 
266                         timeout_entry *e; 
267                         cyg_uint32 stamp; 
268                      
269                         // this needs to be atomic - recursive calls from the alarm 
270                         // handler thread itself are allowed: 
271                         int spl = cyg_splinternal(); 
00100320 ebfffd88   BL        cyg_splinternal
274                         for (e = _timeouts, i = 0;  i < NTIMEOUTS;  i++, e++) { 
00100324 e59f4060   LDR       R4,0010038c
272                      
273                         stamp = 0;  // Assume no slots available 
00100328 e3a05000   MOV       R5,#00000000
0010032C e1a06000   MOV       R6,R0
00100330 e1a02005   MOV       R2,R5       <- WHY ARE WE HERE NOW???
275                             if ((e->flags & CALLOUT_PENDING) == 0) { 
00100334 e5943014   LDR       R3,[R4,#014]
00100338 e2822001   ADD       R2,R2,#00000001
0010033C e3130004   TST       R3,#00000004
00100340 0a000006   BEQ       cyg_timeout+58
00100344 e3520007   CMP       R2,#00000007
00100348 e2844018   ADD       R4,R4,#00000018
0010034C dafffff8   BLE       cyg_timeout+2c
282                             } 
283                         }


Joe Porthouse
Toptech Systems, Inc.
Longwood, FL 32750



-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
