XMOS SPI speed

On the 96MHz ARM Cortex-M3 mbed, I was able to write 3456 bits of data out over SPI at 1.6kHz, including some integer math and memory look-up time. How fast is the XMOS, where each core is clocked at 400MHz, roughly four times the clock speed of the mbed? Here’s my XMOS code, which does SPI writes in 32-bit chunks. There is no hardware peripheral specifically for SPI, so you have to make use of the XMOS buffered I/O ports.

I clocked it at 2.5kHz (108 transfers of 32 bits each), which isn’t that much faster than my mbed code. The mbed ran SPI at 16MHz, but the XMOS code runs it at 12.5MHz, so the comparison isn’t 100% fair. Below is the main code that writes out data. Note that the XMOS has to do some extra work for each transfer. Reconfiguring the port seems to take around 1 us, which is a waste of time.

Of course, the majority of the time is spent in the serial transfer itself. If there were absolutely no overhead, the maximum refresh rate would have been 12.5MHz / 3456 bits = 3.62kHz. The measured 2.5kHz is about 30% below that, which is the cost of XMOS buffered I/O and SPI mode 0.

void send_word(unsigned int data) {
	//check MSB of data and set MOSI line accordingly
	//skip check if you need SPI mode 3, and set clock pattern to 0xFFFFFFFF

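	if (data & 0x80000000)
		configure_out_port(mosi, blk2, 1);
	else
		configure_out_port(mosi, blk2, 0);

	//reverse bit order so data goes out MSB first (the port shifts out LSB first);
	//the MSB was already driven above, so shift it off
	mosi <: (bitrev(data) >> 1);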

	//clock pattern
	sclk <: 0x55555555;
	sclk <: 0x55555555;
	// wait until clocked output operation is finalized.

RGB Propeller Clock – Final Day

What can 2 ECE’s, a MechE and a CS major do in only 5 days of hacking? This is a project done for CMU’s hack-a-thon session called Build18. The LED boards were previously designed by me but all other work was done in the week-long session.

96 RGB LED’s driven by 12x TLC5947 LED drivers. 12-bit PWM resolution per color channel per LED.
Diameter: 1.5 feet
Full brightness: 6 amps of LED current
We didn’t have time to figure out how to transfer power through the motor, so it is battery powered by two packs of 4 AA NiMH cells, which have the perfect discharge curve for what the LED drivers need.

Don’t forget the obligatory Jimmy Wales face:

Flickering is due to interaction between camera frame rate and update rate of the propeller clock and is not that bad in real life. Thanks to all my group members who have done the majority of the work!

Costas Akrivoulis – Junior ECE, Coding, mounting, lots of stuff

Jitu Das – Junior CS, algorithms for display and pre-processing of images

John Howland – Senior MechE, designed the mount and all mechanical parts — the hardest part for us ECE majors

Allan Wang – Senior ECE, Designed LED boards, hardware

Many more technical details will be posted shortly after the dust settles from demo day.


RGB Propeller Clock: Day 1/5 — mbed Benchmarking

Start from Day 0: Introduction here

I’ll first start off with some speed tests to see the maximum refresh rate achievable with the mbed, which has a 96 MHz ARM Cortex-M3 processor.

Interfacing the MCU to the TLC5947s is interesting because the TLC5947s run off 5V, but the MCU runs off 3.3V. The TLC5947 datasheet also states the maximum clock is 15 MHz when cascaded. I tested with 2 cascaded sticks (8x TLC5947), and found I can clock SPI to 24 MHz (maximum speed supported by the MCU)! With 3 cascaded sticks, only 16 MHz works. This is still pretty amazing because the MCU output is driving a large capacitive load with voltage logic levels mismatched. From looking at the scope captures, the difference in logic high amplitude between 16 MHz vs 24 MHz is very small, yet one fails and the other works. Even dialing down the TLC5947 voltage doesn’t fix anything until they’re brought to around 3.6V. All of this isn’t a huge deal, because SPI communication is at most 5% of the total time spent in each cycle.

The test is to cycle through all the hues for each LED, displaying a rainbow across the stick. Therefore a conversion must be done from HSV space to RGB space. Then to display each pixel, a gamma correction calculation must be performed for each color, since human brightness perception is roughly logarithmic. This involves doing an exponential and a few floating point operations. How fast can the ARM perform these operations?

There are 3 sticks cascaded, for a total of 96 RGB LED’s, i.e. 288 color channels with 12-bit resolution each. My optimized code refreshes at 256 Hz, which is great for a stationary display, but probably a bit on the low side for a rapidly moving one, like using it for a propeller clock. I noticed a few tricks that improved performance by about 20% total:

  1. Explicitly cast all constant floating point numbers into floats. Otherwise the C compiler will default them to doubles and the ARM will perform double calculations, which is slower.
  2. Turn division of floating point numbers into multiplication.
  3. When possible, use integers as an operand for floating point math. For example, if hue is a float, hue / 60 is actually faster than hue * 0.0166667.

My messy benchmark code can be found here. P.S. the mbed interrupt API is really neat and simple to use.

I’m sure removing the floating point code entirely will greatly speed up the refresh rate, but this is harder to develop. Another thing to try is to use a lookup table to save the use of the exp() function and a floating point multiply. Just tried the LUT and got 600 Hz — pretty good improvement.

Then I converted the HSV to RGB code into pure integer math. This got it to 1590 Hz! Maybe the XMOS won’t be needed after all.

Tomorrow we’ll give the XMOS dev board a try, which has 1600 MIPS, over 16x the speed of the mbed.


RGB Propeller Clock: Day 0/5 — Introduction

As part of my school’s electronics hacking/tinkering session called Build18, I’ll finally be spinning the RGB sticks I designed over two years ago. Starting this week on Sunday Jan. 15, groups will only have 5 days to complete their project. This “annual engineering festival” is a great way for ECE students of all years to participate in designing a practical embedded system with help from their peers.

This is the inspiration for the project:

I believe this is one of the best ones out there right now. But note that only 8 colors are possible — each R, G, B pixel can only be off or totally on. My RGB sticks use the TLC5947 to achieve 12-bit resolution per color, for 4096 viewable levels. However, this greatly complicates the controller. The refresh rate must be very high since the LED’s are in rapid motion, but adding resolution will greatly slow down the update rate.

Power will be transmitted through the spinning motor to the RGB sticks. I’m going to try to spin as many of the chained sticks as possible, each one adding 8″ to the diameter of the propeller clock. I have no idea if the mechanics of this will work, or even if modifying the motor to transmit power will work. I don’t think we’ll get DC power out of the motor due to the three brushes alternating contacts, but I could be entirely wrong. If all else fails, I’ll just brute-force it, spin 4x AA batteries, and live with the few-hour battery life :)

To communicate with the microcontroller on the spinning device, a pair of nRF2401A transceivers will be used. In ground frame, there will be an additional microcontroller hooked up to sensors and controlled by a laptop which will send high-level command signals to the spinning device.

For maximum refresh speed and awesomeness, we’ll try using the extreme overkill XMOS XC-1A dev board, which has an XS1-G4 micro, capable of operating at 1600 MIPS with high parallelism, perfect for generating data to send to shift registers really fast. If this ends up being too complicated, the fallback plan is to use an mbed, which has an ARM Cortex-M3 running at a measly 96 MIPS. Each RGB stick has 32x RGB LED’s with three color channels each, and the shift registers are 12 bits per channel, so there’s 32 x 3 x 12 = 1152 bits to update per stick per refresh. Displaying an intensity correctly involves doing some math to perform gamma curve correction, or resorting to using a 4096×12 bit lookup table per pixel, which can be slow.

I’ll be posting progress updates throughout the week. Day 1: mbed Benchmarking


Yet another PCB business card (with touchpad!), part 1

I thought I’d jump on the PCB business card bandwagon and make one of my own. Except mine is going to be cooler and more functional, obviously ;) . Microchip has PIC microcontrollers in the $2 – $3 range that have 28 (!) capacitive inputs. This is big enough for a 16×12 matrix touchpad, for example. Also, they have a great app note on touchpad design and software algorithms for touch detection. It seems like using mutual capacitance is the way to go, allowing for multi-touch. My plan is using the PCB copper itself, making it very cheap and not reliant on an external touchpad component, which is difficult to obtain in small quantities. This is easy enough to do on a business-card sized PCB.

However, what use is a touchpad sensor if there’s no way to display information to the user? I was inspired by EEVBlog Dave’s uCalc design, which sandwiches a coin cell battery in between two PCB’s. Not only does this allow an extremely thin final product, but it makes use of the 3rd dimension to add more features! I can put LED’s on the bottom PCB and shine their light through the top PCB so that the user can see it. Something neat would be where a touch causes an outward ripple effect, or something.

My design will have two PCB’s: one on top, silkscreened with the usual business card info. All the components will be surface-mounted on the bottom surface of this top PCB, so that the top surface is entirely flat. The bottom one will have a grid of LED’s, arranged in a matrix. The top PCB will be very thin, 0.8mm, and the soldermask color will be white. Hopefully this will allow the light from the bottom LED’s to shine through the top. Of course, I won’t be able to use a regular LED matrix due to height reasons. Instead I’ll have to solder all the 60+ LED’s manually… To communicate between the two boards, a low profile, 1.8mm height SMT connector from Samtec is used.

Now, a coin cell battery is nominally 3.0V, but drops to 2.5 – 2.8V in operation. This is enough to light up red, orange, or yellow LED’s, but green and blue LED’s require over 3V to light up. So, I have to decide on whether to use two batteries. The single battery version is only 3.6mm thick (note that a credit card is 0.8mm thick). The stacked version is 4.8mm thick. I could try to use two batteries but space them out so they don’t need to stack, but then I lose the nice rectangular touchpad shape I have.

The difficult part will be controlling a matrix of LED’s from the PIC, which is relatively slow at 4 MIPS. Since my design will be low current, I’ve decided on using 74HC595 shift registers to both source and sink current for the matrix rows and columns. I’d like to be able to do PWM dimming of the LED matrix as well. From a quick prototype, it takes around 50us to update one row (the majority of the time is spent deciding what to send to the shift registers, not the communications themselves). At a target update frequency of 100Hz and 8 rows to update, I can only get 16 levels of PWM dimming, using up 64% of the processor, which is unfortunate… However I don’t really see a better way of doing this. I may end up getting an ATtiny or ATmega slave to do the LED matrix control since they are 16 MIPS. This would free up processing time for the capacitive sensing algorithms, which aren’t trivial anyway.

The total costs are less than $20, including PCB costs (thanks cheap prototyping manufacturer Seeed Studio). Once the boards come back in a few weeks I’ll post some pics and some initial results. For now, here’s a preview of the top board:



CMOS Op-Amp Design

Here’s a two-stage op-amp I designed for an undergraduate class. Fun times.

Parameter             | Spec        | Vcm = -0.1V | Vcm = 0V    | Vcm = 0.1V
----------------------|-------------|-------------|-------------|------------
Open Loop DC Gain     | > 5000 V/V  | 5130 V/V    | 5074 V/V    | 4996 V/V
Unity Loop Gain Freq. | > 10 MHz    | 10.89MHz    | 11.0MHz     | 11.07MHz
Loop Gain Phase Margin| > 60°       | 60.48°      | 60.25°      | 60.09°
Input Offset Voltage  | < 3 mV      | 11.91uV     | -3.943uV    | -15.89uV
Slew Rate Positive    | > 10 V/us   | 56.24 V/us  | 59.34 V/us  | 56.11 V/us
Slew Rate Negative    | < -10 V/us  | -10.2 V/us  | -10.18 V/us | -10.17 V/us
Output Swing Low      | -300mV      | -598mV      | -586mV      | -545mV
Output Swing High     | 300mV       | 497mV       | 557mV       | 554mV


Current 96.26 uA
Power 115.5 uW
Area 46.68 um^2
Compensation Cap 0.13 pF
Figure of Merit 1.427


Meh, HDL

I’ve given up on my high-speed ADC interface project because I’m unfamiliar with Verilog and don’t have the time to learn it well enough to do a real, complex design. Besides, to do anything useful, like interface with a computer via Ethernet or USB, requires a softcore CPU to handle the high-level protocol details that would be painful to implement in HDL. Then, I’d have to create some sort of bridge interface between the high speed ADC and the processor bus. Then, I’d have to do some embedded development for the processor, which is another toolchain to set up and figure out. That’s a pretty big PITA when all I really wanted to do was to try to produce a hobbyist board with a 780-pin BGA and put the signal integrity knowledge I have into practice.

I’d ask my classmates, who all seem to be pursuing digital, but we’re all pretty swamped with our own work. Besides, there probably isn’t a market for some development board with 8GB of RAM connected to a cheap $50 – $200 FPGA anyway.

Lately I’ve been getting into analog IC design. Soon I’ll have to choose between pursuing a more academic path and spending an extra year to make time for research projects, or just graduating in four years and pursuing a career in general PCB-level system design. It seems like the latter is way more accessible and easier. However, it seems like there’s limited potential in that career path compared to doing IC design.


FedEx shipping rate table parser in PHP

This isn’t what I’d typically blog about here, but I figure this could save someone some time.

In my other life as a web application developer, I needed to implement a FedEx shipping rate calculator in PHP. Here is a script that parses FedEx’s shipping rate tables and outputs SQL insert queries. Of course, your database schema will vary but it should be simple enough to modify to your needs. I decided to just have it run on the webserver for convenience.

Running live: http://www.allanw.org/fedex/zoneprices.php
Hastily written source is: http://www.allanw.org/fedex/zoneprices.txt

Here are the FedEx Express service rate tables I used:


Slight disappointment

Unfortunately, the low-end Xilinx Spartan-6 FPGA’s do not support interfacing to DIMMs:

Which means that for practical purposes, the memory storage size will be limited to 512MB (2x 2Gbit memory chips). To interface with DIMM’s, which contain 8 or 16 of these chips, you’d have to go to their Virtex line for which the cheapest is at least $200, too expensive for me.

But Altera’s low-end Cyclone FPGA’s do support DIMM’s. Vendor switching time! Probably should have done some more research before plopping down money for the Digilent Atlys board, but it’s pretty feature packed and could be handy to have around for the future anyway.


Digilent Atlys FPGA board

For those of you coming from Google, much better resources are provided by this guy for the Digilent Atlys. I did not go ahead with my FPGA project.

Just got Digilent’s Atlys board. It’s incredibly cheap for students at $200, discounted from the regular $350.

My plan is to develop a data acquisition system using DDR2 RAM. I’ve been interested in learning about signal integrity and high speed board designs, so I’m going to lay out a board with this 900MHz analog bandwidth, 200Msample/s 11-bit dual ADC. Should be interesting. Eventually my plan is to lay out my own FPGA board with DDR2 RAM stick slots, but that’s pretty complicated and I don’t have the experience/test equipment for that yet.

Here’s some pics of the Atlys since there don’t seem to be any floating around the internet yet.

Unfortunately it’s difficult to find a matching connector for this VHDCI receptacle they used on the board. It’s easy to get VHDCI receptacles but plugs are another matter. The ones they use for their own expansion boards have to be shipped from Taiwan with a $100 shipping charge, which is pretty ridiculous. I found one alternative, from Samtec:


However, these are edge-mount. The differential pairs go on opposite layers, so there’ll be an imbalanced via transition if you use microstrip routing. Oh well. Shouldn’t matter too much for relatively low-speed signals.