The AVR's onboard analog-to-digital converter is very slow. Too slow even for very low resolution (128x123) image capture. Frame rate was around 0.25 - 0.5 fps (that's one frame every 2-4 seconds).
Instead of adding an external ADC (I'll try that later), the simplest option for speeding up the frame rate was to use the MCU's built-in analog comparator. Since the system only had to detect a bright spot, an 8-bit greyscale capture was unnecessary.
The AVR comparator sets the ACO bit in the ACSR register high when the voltage on the positive input (the AIN0 pin) exceeds the voltage on the negative input (the AIN1 pin), and sets it low if not.
Doing so takes a measly 1-2 clock cycles. At 16MHz, 2 cycles is 0.125 µsec, which is roughly 800 times faster than the ADC. For example, using the Arduino analogRead() function takes about 100 µsec. To maximize the frame rate there were a few more tricks to implement. But first...
AVR Analog Comparator
Here's how to use the basic features of the AVR analog comparator. The comparator has two input pins: AIN0, the positive input, and AIN1, the negative input. Optionally you can use any of the other ADC pins for the negative input, but let's focus on the simple solution, using AIN1.
To set things up (don't panic, code will follow shortly)...
- Set up the AIN0 (PD6) and AIN1 (PD7) pins for input
- Enable the ADC -- set ADEN (ADC enable) in the ADCSRA register
- Enable the comparator -- clear ACD (analog comparator disable) in the ACSR (analog comparator control and status register)
- Disable the comparator multiplexer -- clear ACME (analog comparator multiplex enable) in the ADCSRB register.
- Disable interrupts for the analog comparator -- clear ACIE in the ACSR.
// Initialize comparator - Arduino style; this is done differently in plain AVR code
pinMode(6, INPUT);   // AIN0 (PD6)
pinMode(7, INPUT);   // AIN1 (PD7)
// ACD=0, ACBG=0, ACO=0, ACI=0, ACIE=0, ACIC=0, ACIS1=0, ACIS0=0
// - comparator enabled, interrupt disabled (ACIS=00 would select interrupt on output toggle)
ACSR = 0b00000000;
// ADEN=1
ADCSRA = 0b10000000;
// ACME=0 (multiplexer off), so AIN1 is the negative input
ADCSRB = 0b00000000;
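For plain avr-libc code without the Arduino pinMode() helper, the same setup might look roughly like the sketch below. The function names comparator_init() and comparator_read() are made up for illustration, and the #else branch is only a host-side stand-in (an assumption, not part of avr-libc) so the logic can be compiled and checked off-target. Bit positions follow the ATmega328P datasheet.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#else
/* Host-side stand-ins (assumption) so the sketch compiles off-target. */
static volatile uint8_t DDRD, ACSR, ADCSRA, ADCSRB;
#define DDD6 6
#define DDD7 7
#define ACD  7
#define ACO  5
#define ACIE 3
#define ADEN 7
#define ACME 6
#endif

/* Configure the comparator to compare AIN0 (+) against AIN1 (-). */
void comparator_init(void)
{
    DDRD   &= (uint8_t)~((1u << DDD6) | (1u << DDD7)); /* AIN0, AIN1 as inputs */
    ACSR   &= (uint8_t)~((1u << ACD) | (1u << ACIE));  /* comparator on, interrupt off */
    ADCSRA |= (uint8_t)(1u << ADEN);                    /* ADC enabled, as in the steps above */
    ADCSRB &= (uint8_t)~(1u << ACME);                   /* multiplexer off: AIN1 is the negative input */
}

/* Returns 1 when AIN0 is above AIN1, otherwise 0 (the ACO bit of ACSR). */
uint8_t comparator_read(void)
{
    return (uint8_t)((ACSR >> ACO) & 1u);
}
```

Unlike the Arduino version, this sketch only touches the bits it needs instead of overwriting whole registers, so other settings in ACSR, ADCSRA and ADCSRB are left alone.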
Using the comparator substantially increased frame rate... to about 1 fps. But it was still a little too slow, primarily because of the object detection being performed on the AVR.
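As a rough sanity check on these numbers, the per-pixel arithmetic can be sketched as below. The 128x123 frame size, the 16MHz clock, and the 100 µsec and 0.125 µsec sample times come from the text above; the function names are made up for illustration, and the model ignores everything except sampling time.

```c
#include <stdint.h>

#define FRAME_WIDTH   128u
#define FRAME_HEIGHT  123u
#define CPU_HZ        16000000UL

/* Upper bound on frame rate if sampling were the only cost per pixel. */
double max_fps(double sample_time_us)
{
    double frame_time_s =
        (double)(FRAME_WIDTH * FRAME_HEIGHT) * sample_time_us * 1e-6;
    return 1.0 / frame_time_s;
}

/* CPU cycles available per pixel at a target frame rate. */
uint32_t cycles_per_pixel(uint32_t fps)
{
    return (uint32_t)(CPU_HZ / ((uint32_t)(FRAME_WIDTH * FRAME_HEIGHT) * fps));
}
```

With a 100 µsec sample, max_fps() gives about 0.64 fps, which fits the 0.25 - 0.5 fps seen with the ADC. With a 0.125 µsec comparator read the bound is over 500 fps, so sampling is no longer the bottleneck. cycles_per_pixel() gives 1016 cycles per pixel at 1 fps and 338 at 3 fps, which is the budget the rest of the per-pixel code has to fit into.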
The Final Tricks
A high frame rate, or even 10fps, would've been nice to achieve. But for purposes of aiming the firefighting robot at a candle, a sad 3 fps was acceptable.
Getting to that level of "performance" involved code tuning: trimming and optimizing the machine code that runs between the clock pulses sent to the camera, shrinking the code as much as possible, and finally eliminating parts of it altogether.
Simple Machine Code Optimization
My simple process for optimizing machine code of compiled Arduino source is as follows:
- Compile within the Arduino IDE
- Generate an assembly file from the command line (Cygwin in this case)
- Count instructions and look at data references
- Change code to try to reduce instructions
- Change the way data is referenced (e.g., several arrays or an array of structs; copy a pointer to a local variable)
- Repeat the process to see if the changes reduced the instruction count
Nothing sophisticated, mind you. Just a question of trying different ways to reference data structures and write code that reduced assembly instructions. This helped a little.
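To make the "different ways to reference data" idea concrete, here is a small sketch of two functionally identical variants one might compare in the assembly listing. The box_t type, its fields, and both function names are hypothetical, not taken from the actual project. Which variant produces fewer instructions depends on the compiler version and optimization level, so the listing is the only reliable judge.

```c
#include <stdint.h>

#define MAX_OBJECTS 8

/* Hypothetical bounding-box record; field names are illustrative. */
typedef struct {
    uint8_t x_min, x_max, y_min, y_max;
} box_t;

box_t boxes[MAX_OBJECTS];

/* Variant 1: index the global array on every access. */
void grow_box_indexed(uint8_t i, uint8_t x, uint8_t y)
{
    if (x < boxes[i].x_min) boxes[i].x_min = x;
    if (x > boxes[i].x_max) boxes[i].x_max = x;
    if (y < boxes[i].y_min) boxes[i].y_min = y;
    if (y > boxes[i].y_max) boxes[i].y_max = y;
}

/* Variant 2: copy the element's address into a local pointer once. */
void grow_box_local(uint8_t i, uint8_t x, uint8_t y)
{
    box_t *b = &boxes[i];
    if (x < b->x_min) b->x_min = x;
    if (x > b->x_max) b->x_max = x;
    if (y < b->y_min) b->y_min = y;
    if (y > b->y_max) b->y_max = y;
}
```

A third variant worth trying under the same approach would replace the array of structs with four separate uint8_t arrays.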
To do it, you'll need to run the avr-objdump command on the elf file generated by the Arduino IDE compiler. The elf file can be found in the applet subdirectory of your project. The command to run is:
avr-objdump -S project.cpp.elf > project.S
You can then edit the .S (assembly) file to count instructions. Source code appears as comments in the assembly file to make it easier to locate relevant code. For example:
// Continue reading the rest of the pixels and flood fill to detect bright objects
// The camera seems to be spitting out 128x128 even though the final 5 rows are junk
for (y = 0; y < 123; y++) {
if (y < 16)
972: 10 31 cpi r17, 0x10 ; 16
974: 10 f4 brcc .+4 ; 0x97a <__stack+0x7b>
sbi(CAM_LED_PORT, CAM_LED_BIT);
976: 5d 9a sbi 0x0b, 5 ; 11
978: 01 c0 rjmp .+2 ; 0x97c <__stack+0x7d>
else
cbi(CAM_LED_PORT, CAM_LED_BIT);
97a: 5d 98 cbi 0x0b, 5 ; 11
97c: 24 2f mov r18, r20
97e: 50 e0 ldi r21, 0x00 ; 0
980: 71 e0 ldi r23, 0x01 ; 1
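Counting by hand gets tedious, so a small host-side helper can do the tally. The sketch below is a heuristic, not part of avr-objdump: it assumes instruction lines start with a hex address followed by a colon (as in the listing above) and that interleaved source lines do not. Both function names are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the line looks like "972: 10 31 cpi ...": optional leading
   whitespace, one or more hex digits, then a colon followed by whitespace. */
int is_instruction_line(const char *line)
{
    size_t i = 0;
    size_t digits = 0;

    while (line[i] == ' ' || line[i] == '\t')
        i++;
    while (isxdigit((unsigned char)line[i])) {
        i++;
        digits++;
    }
    return digits > 0 && line[i] == ':' &&
           (line[i + 1] == ' ' || line[i + 1] == '\t');
}

/* Counts the instruction lines among n lines of a listing. */
size_t count_instructions(const char *const *lines, size_t n)
{
    size_t count = 0;
    for (size_t k = 0; k < n; k++)
        count += (size_t)is_instruction_line(lines[k]);
    return count;
}
```

Feeding it only the lines between two points of interest (for example, the body of the inner pixel loop) gives a number to compare before and after each change.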
The big difference came when I eliminated the part of the object detection code that attempted to merge nearby objects on the fly during capture. That work is now done after the image is captured and object coordinates are generated. The approach worked reliably and improved frame rate considerably.
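One way such a post-capture merge could work is sketched below. This is an illustration under assumptions, not the code used in the project: it assumes each detected object is stored as a bounding box, and the box_t type, the gap threshold, and the function names are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical bounding box of one detected bright object. */
typedef struct {
    uint8_t x_min, x_max, y_min, y_max;
} box_t;

/* 1 if the boxes overlap or lie within `gap` pixels of each other on both axes. */
int boxes_near(const box_t *a, const box_t *b, uint8_t gap)
{
    return (int)a->x_min <= (int)b->x_max + gap &&
           (int)b->x_min <= (int)a->x_max + gap &&
           (int)a->y_min <= (int)b->y_max + gap &&
           (int)b->y_min <= (int)a->y_max + gap;
}

/* Merges nearby boxes in place and returns the new box count. */
uint8_t merge_boxes(box_t *boxes, uint8_t n, uint8_t gap)
{
    uint8_t i = 0;
    while (i < n) {
        uint8_t merged = 0;
        for (uint8_t j = (uint8_t)(i + 1); j < n; j++) {
            if (boxes_near(&boxes[i], &boxes[j], gap)) {
                /* Grow box i so that it also covers box j. */
                if (boxes[j].x_min < boxes[i].x_min) boxes[i].x_min = boxes[j].x_min;
                if (boxes[j].x_max > boxes[i].x_max) boxes[i].x_max = boxes[j].x_max;
                if (boxes[j].y_min < boxes[i].y_min) boxes[i].y_min = boxes[j].y_min;
                if (boxes[j].y_max > boxes[i].y_max) boxes[i].y_max = boxes[j].y_max;
                /* Remove box j by moving the last box into its slot. */
                boxes[j] = boxes[n - 1];
                n--;
                merged = 1;
                break;
            }
        }
        /* A grown box may now touch boxes checked earlier, so start over. */
        if (merged) i = 0; else i++;
    }
    return n;
}
```

Because this runs once per frame on a handful of boxes rather than once per pixel, its cost stays out of the time-critical capture loop.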
Conclusion
I felt I had hit a wall in speeding up the code and put it on the back burner for a while. Working up to a 5 or even 10 fps frame rate with an AVR seems like a daunting task, let alone the 20-30fps some robotic camera systems can achieve.
I am contemplating a processor upgrade, instead of more attempts at optimization. I recently purchased a Parallax Propeller to play with. Another possibility is an inexpensive ARM processor I ran across.
I would like to try using the camera for robust object detection and avoidance and that, most likely, will require greyscale capture. I have a few fast ADCs to experiment with if the ARM or Propeller can't hack it. Even without greyscale, vision-based object avoidance will need a much higher frame rate.
the atmega328p doc says that one adc conversion takes 13 cycles - at 4mhz that seems to be fast enuf for more than one frame/conversion per sec - anyway thanx for the words about the acme bit -
Glad to help. It's been a while since I looked at this. The info I have is that the 328P has a maximum ADC conversion throughput of only 76.9ksps, which is basically 13µs per conversion. Doing nothing else, that's 4.88fps for a 128x123 frame, which is quite a bit better than I was achieving.
Since I wrote this article, I was able to write some AVR code that drives the Game Boy camera to 30fps, but without conversions or image processing. The main limitation was the slow ADC. So I'm looking at either using a dsPIC33F which has a 1MSPS ADC and runs at twice the clock speed, or a Propeller with a high speed parallel ADC attached.
Even one of the NXP ARM processors I've been playing with (e.g., an LPC2103, 60MHz) would be an improvement with a 200kSPS ADC which would put 10fps within reach.
Using ARM or dsPIC with DMA might eke out a bit more performance. Hmm...
You might want to check out the LeafLabs Maple boards.
M3 Arm @ 72Mhz in the Arduino format with a similar IDE branched off from v0018.
I have an Olimex -STM32 Arduino board with all kinds of extra goodies on it..