[meta-xilinx] xilinx_emacps.c

andrey andrey at elphel.com
Mon Nov 18 14:20:59 PST 2013


We are working on our Zynq-based board ( http://blog.elphel.com ) and I had to dig into the network driver, because it did not work out of the box with the Atheros AR8035 PHY we use. I spent quite a while troubleshooting the problems, but then I ran the same test on the Microzed with the factory image and it performed even worse. Here is the script I used:

------------
#!/bin/sh
 IP="192.168.0.15"
 fails=0
 tries=0
 while true
 do
   ifconfig eth0 down
   ifconfig eth0 up
   ping -c 1 -w 10 $IP >/dev/null
   rslt=$?
   fails=$(($fails+$rslt))
   tries=$(($tries+1))
   echo "tries: $tries fails: $fails"
   if [ $rslt != 0 ] ; then
       echo "Recovering..."
       sleep 5
       ping -c 1 -w 10 $IP >/dev/null
   fi
 done
___________

On the Microzed the script ran 35 iterations and reported 3 failures, then the network LEDs went off and I could not make it connect again.

The first thing I noticed was that xemacps_mdio_read() and xemacps_mdio_write() sometimes started a transfer while the shift register was still in use (I added checks with spinlocks). Then I noticed something even stranger: xemacps_init_hw() and xemacps_probe() were writing to the 0xe000b0xx registers with no effect, and reads from them returned 0. So as a temporary measure I added a wait for access to these registers (sometimes it takes milliseconds), printing "-" characters so I could watch it:
/**
 *  Temporary fix for some bug - losing access to I/O registers (dmatest suspected)
 *  @lp: local device instance pointer
 *
 *  Returns 0 if the registers are readable (possibly after waiting),
 *  1 if they still read as zero after the timeout.
 */
static int wait_register_access(struct net_local *lp)
{
    int timeout = 1000;
    int tries = 0;
    u32 regval_dbg = xemacps_read(lp->baseaddr, XEMACPS_PHYMNTNC_OFFSET);

    /* NWCFG and NWSR both reading 0 is treated as "register access lost" */
    if ((xemacps_read(lp->baseaddr, XEMACPS_NWCFG_OFFSET) |
         xemacps_read(lp->baseaddr, XEMACPS_NWSR_OFFSET)) == 0) {
        /* Busy-poll until either register reads non-zero or we give up */
        for (tries = 1; tries < timeout; tries++) {
            if ((xemacps_read(lp->baseaddr, XEMACPS_NWCFG_OFFSET) |
                 xemacps_read(lp->baseaddr, XEMACPS_NWSR_OFFSET)) != 0)
                break;
            printk("-"); /* progress marker on the console */
        }
        dev_warn(&lp->pdev->dev,
                 "Seems I/O register access was lost. Waited %d (of %d), Now [XEMACPS_NWCFG_OFFSET]=0x%08x,  [XEMACPS_NWSR_OFFSET]=0x%08x, old [XEMACPS_PHYMNTNC_OFFSET]=0x%08x\n",
                 tries, timeout,
                 (int) xemacps_read(lp->baseaddr, XEMACPS_NWCFG_OFFSET),
                 (int) xemacps_read(lp->baseaddr, XEMACPS_NWSR_OFFSET),
                 (int) regval_dbg);
        if (tries < timeout)
            return 0;
        return 1;
    }
    return 0;
}
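
For the first issue (MDIO transfers started while the shift register was still busy), the usual remedy is to poll the "PHY management idle" flag before each transfer and give up after a bounded number of reads. The block below is only a user-space sketch of that polling logic, not the driver code: the accessor type, the fake register, and the function names are hypothetical, and the mask value 0x4 is my assumption about the idle bit in the network status register, so check it against the driver header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the "PHY management idle" flag in the network
 * status register; verify against the driver's register definitions. */
#define MDIO_IDLE_MASK 0x00000004u

/* Hypothetical accessor type so the logic can be exercised off-target. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Poll the status register until the idle flag is set, making at most
 * max_polls reads. Returns true if idle was seen, false on timeout. */
static bool mdio_wait_idle(reg_read_fn read_nwsr, void *ctx, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_nwsr(ctx) & MDIO_IDLE_MASK)
            return true;
    }
    return false;
}

/* Fake status register: reads as busy (0) a given number of times,
 * then reports idle on every later read. */
struct fake_nwsr {
    unsigned busy_reads_left;
};

static uint32_t fake_nwsr_read(void *ctx)
{
    struct fake_nwsr *r = ctx;

    if (r->busy_reads_left > 0) {
        r->busy_reads_left--;
        return 0;
    }
    return MDIO_IDLE_MASK;
}
```

In a driver the same check would run both before starting a transfer and before reading the result, with a short delay between polls instead of a tight loop.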

Access to the registers was restored just by waiting, and I suspect dmatest. Whether the link came up initially (a missing link was cured by "ifconfig eth0 down; ifconfig eth0 up") depended on whether MMC detection interrupted the series of dmatest starts (sometimes the card was detected earlier, sometimes later):
x000000000000-0x000000100000 : "nand-main"1
TCP: cubic registered1
NET: Registered protocol family 17
VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
Registering SWP/SWPB emulation handler
dmatest: Started 1 threads using dma0chan0
dmatest: Started 1 threads using dma0chan1
dmatest: Started 1 threads using dma0chan2
dmatest: Started 1 threads using dma0chan3
mmc0: new high speed SDHC card at address b368
dmatest: Started 1 threads using dma0chan4
mmcblk0: mmc0:b368 NCard 7.48 GiB (ro)
dmatest: Started 1 threads using dma0chan5
 mmcblk0: p1
dmatest: Started 1 threads using dma0chan6
dmatest: Started 1 threads using dma0chan7
xemacps e000b000.ps7-ethernet: attach [Atheros 8035 ethernet] phy driver, phy_addr 0x3, phy_id 0x004dd072

With my temporary fix I'm getting much better results: the script reported no failures over several thousand iterations. But I still do not understand the source of the problem, and what I'm doing cannot guarantee 100% reliability. I only wait until the registers become accessible, and the driver can lose access again later, since the loss seems to be caused by an asynchronous process.
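
Since access can apparently be lost at any time, one way to narrow the window is to verify each configuration write by reading it back and retrying. The block below is a user-space model of that idea and not part of the patch: the register model and all names in it are hypothetical. It assumes a plain read/write register (not write-1-to-clear) and a non-zero value, so that a "dead" read of 0 can be told apart from success.

```c
#include <stdint.h>

/* Hypothetical register model: the first dead_accesses accesses are
 * ignored (writes are dropped, reads return 0), mimicking the symptom
 * described in this post. */
struct fake_reg {
    uint32_t value;
    unsigned dead_accesses;
};

static void fake_write(struct fake_reg *r, uint32_t v)
{
    if (r->dead_accesses > 0) {
        r->dead_accesses--;
        return;             /* write silently dropped */
    }
    r->value = v;
}

static uint32_t fake_read(struct fake_reg *r)
{
    if (r->dead_accesses > 0) {
        r->dead_accesses--;
        return 0;           /* register reads as zero */
    }
    return r->value;
}

/* Write v, read it back, and retry up to max_tries times.
 * Returns the number of attempts used, or 0 if the value never stuck. */
static unsigned write_verified(struct fake_reg *r, uint32_t v, unsigned max_tries)
{
    for (unsigned attempt = 1; attempt <= max_tries; attempt++) {
        fake_write(r, v);
        if (fake_read(r) == v)
            return attempt;
    }
    return 0;
}
```

This still does not explain why the accesses are lost; it only makes a lost write visible at the point where it happens instead of later as a missing link.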

Can these other, unrelated drivers (dmatest, mmc) somehow remap the I/O registers "for themselves" and deny xilinx_emacps.c its mapping? At least, when dmatest is turned off in the kernel config, these problems do not show up.

Here ( http://sourceforge.net/p/elphel/meta-elphel393/ci/master/tree/recipes-kernel/linux/linux-xlnx/xilinx_emacps_elphel393.patch ) are the modifications to the driver, both to support the Atheros AR8035 and to troubleshoot the problem I described.


Andrey




