Monday, October 22, 2012

Expert C Programming: Deep C Secrets

I read a friend's copy of "Expert C Programming" many years ago, and ended up buying a copy for later reference. I couldn't quite remember why I had bought a book I had already read. Recently, I cracked open my copy for the first time: it still had the new-book smell.

Oh, now I remember.

This is the most enjoyable programming book I have ever read.

The book is about corner cases and implementation quirks of C, with a smattering of C++ thrown in. The book is divided into themes, each self-contained in a chapter. Each chapter covers a quirk of the language, something good C programmers should be aware of. For example, one chapter covers the differences between Kernighan & Ritchie C and ANSI C. Each idea is crisply presented, with source code and a clear explanation. The chapters are littered with relevant anecdotes, historical background, and tantalizing puzzles.

Programming books can be dry and just about the facts. Recent technologies usually come with books that focus on how to get things done. While this is useful, it does little to deepen your appreciation of the technology. Expert C Programming was written 20 years after the development of the language. Perhaps time leads to some clarity and a light-hearted view of things.

Programming is more than just a profession: it is a creative process. It helps to have such books on your shelf, so you can pick them up and remind yourself how fun computing can be.


(Image courtesy: Amazon)

Wednesday, September 19, 2012

Android debugging using the framework source

When writing Android apps, it is helpful to be able to read the implementation of the framework to clarify a point.  This might be when the documentation is incomplete, or if you want to learn from canonical classes like ListView.

Nexus devices run this framework code unmodified. This allows you to trace your application's flow down to the framework level, either to learn how the platform works or to find a bug. Today, the cheapest Nexus device is $199, so an additional Nexus device is an inexpensive and excellent tool for Android developers, even if you do most of your development for non-Nexus devices.

It is easy to get the source code from the Android Open Source Project (AOSP). You need the following:
  1. A good Internet connection. The full AOSP tree is well over 8 gigabytes of data. If you have more than one Android developer, you can download the tree once and then mirror it locally.
  2. A fast computer. AOSP is a lot of code, and Eclipse chews a lot of CPU and RAM with large projects. A computer with two CPUs and about 4GB of RAM should do perfectly; you might get away with less RAM.
  3. Disk space. If you choose to build the code, you're looking at 30 gigabytes of disk space.
  4. Eclipse (some recent version) and Sun's Java 1.6.
  5. Linux or Mac. The instructions work for Ubuntu 10.04/11.10/12.04 and some versions of Mac OS X.

Here are all the steps:
  1. Set up your computer: This can be done on all developer computers simultaneously, and it installs all the development tools. At the end of this step, you have the tools but no AOSP code.
  2. Download all the AOSP source: This uses your internet connection to download all the source. The AOSP source has everything going back to the earliest releases of Android. You get all source history, the framework, the open source applications, the kernel, all open drivers, everything. There are instructions on that page to set up a local mirror. Follow those steps to download the source once and then share it over your local network. This conserves bandwidth, and saves time. You might want to start this download over a weekend, depending on your network connection and other users' needs. I will assume that you are putting the source in /usr/local/aosp.
  3. Build the source. Here, you have two options. You could follow the AOSP instructions. However, if you just want to include the source in an Eclipse project, those instructions are overkill; you get a much smaller project if you follow these alternate instructions. If you have trouble with Java versions, read the bottom of this post.
    $ cd /usr/local/aosp
    $ source build/envsetup.sh
    $ lunch full-eng
    $ # The following step takes time. -j<num jobs> increases parallelism.
    $ make -k -j2 sdk
    
  4. Create the Eclipse classpath. By default, the generated project includes all the Android source code, which is useful if you want access to all of AOSP for reference. If you want just the framework, you can reduce the size of the project significantly. To do this, first create a file named excluded-paths in the root of the source tree, containing the following:
    $ cd /usr/local/aosp
    $ cat excluded-paths 
    ^external/.*
    ^packages/.*
    ^cts/.*
    ^dalvik/.*
    ^development/.*
    ^prebuilts/.*
    ^out/.*
    ^tools/.*
    ^sdk/.*
    ^libcore/.*
    ^gdk/.*
    ^hardware/.*
    ^device/.*
    
    This file allows you to reduce the size of the Eclipse project, which improves Eclipse's performance significantly. Now, we can generate the Eclipse classpath as follows:
    $ cd /usr/local/aosp
    $ ./development/tools/idegen/idegen.sh 
    Read excludes: 3ms
    Traversed tree: 781ms
    $ ls -l .classpath
    -rw-rw-r-- 1 user group 16938 Sep 18 21:50 .classpath
    
  5. Create the project in Eclipse. First, start Eclipse with increased heap and PermGen sizes:
    $ eclipse -vmargs -Xms128m -Xmx512m -XX:MaxPermSize=256m
    
    Now, create a new Eclipse project (File -> New Java Project -> Next). In the dialog, under the "Libraries" tab, click the "Add Library" button -> Add System Library -> Add JRE system library. This helps resolve references to core classes like Integer and String. Adding the system library is not required, but it reduces the number of syntax errors Eclipse finds in the framework. Click "Finish" when done.
  6. Done! Try your setup by searching (Ctrl-Shift-T) for FragmentManager. You should be able to see its source code and navigate through it. Some handy commands: Ctrl-Shift-G to find references to a class, and F3 to jump to a declaration.

JDK version pain: You might encounter a problem with JDK versions. You need Sun Java 1.6 to compile the AOSP source code, while your Eclipse setup might require a different version (OpenJDK?). One solution is to use sun-java only to compile AOSP, and to switch back to the previous version after the compilation has finished. On Ubuntu, this is done with the commands shown below: select the number that corresponds to sun-java before the compilation, and run the same commands again after idegen.sh finishes to switch back to your previous JDK.
$ sudo update-alternatives --config javac
There are 2 choices for the alternative javac (providing /usr/bin/javac).

  Selection    Path                                         Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-6-openjdk-amd64/bin/javac   1061      auto mode
  1            /usr/lib/jvm/java-6-openjdk-amd64/bin/javac   1061      manual mode
* 2            /usr/local/sun-java-1.6/jdk/bin/javac         1         manual mode

Press enter to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/lib/jvm/java-6-openjdk-amd64/bin/javac to provide /usr/bin/javac (javac) in manual mode.
$ sudo update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java   1061      auto mode
  1            /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java   1061      manual mode
* 2            /usr/local/sun-java-1.6/jdk/bin/java             1         manual mode

Press enter to keep the current choice[*], or type selection number: 2

Monday, August 27, 2012

Xmodmap: a quick guide

If you use a variety of Linux/Unix machines and are confused by their different keyboard layouts, it might help to learn the xmodmap utility.

Xmodmap allows you to read the keyboard and mouse pointer bindings and to set them. You don't need root permission, and you can modify these without restarting the X session. This is very useful if you use different keyboards, or want to use the CAPS LOCK key for something useful.

The easiest way to start with xmodmap is to print out its current configuration. Run:
$ xmodmap -pke

This prints out all the keycodes that are assigned. Here is a sample line:
keycode  24 = q Q U094C U0914

This says that 
  1. Keycode 24 corresponds to the 'q' key: pressing it by itself, without any modifiers, produces 'q'.
  2. If you hold shift while pressing it, it produces 'Q'.
  3. If you hold the mode_switch key while pressing it, it produces the Unicode character 094C (in hex).
  4. With mode_switch and shift together, it produces the Unicode character 0914 (in hex).
The mode_switch key is used on non-US keyboards to produce non-English text. If you don't know what it is, you don't need it.

But you do use the shift key. How do you know which key that is? Run:
$ xmodmap -pm

That command tells you what the modifier keys are. Looking at the 'shift' modifier, you can see that it is mapped to two different keys:
shift       Shift_L (0x32),  Shift_R (0x3e)

They have keycodes 0x32 and 0x3e. 0x32 in hex is 50 in decimal. If you search for keycode 50 in the output of 'xmodmap -pke', you will see the following line:
keycode  50 = Shift_L ISO_Prev_Group Shift_L ISO_Prev_Group

So that's the left shift key.

All these keys can be remapped. The most common use is to remap a key you do not use, like the CAPS LOCK key. That is the 'lock' modifier. On my keyboard, the lock modifier is:
lock          Caps_Lock(0x42)

It is assigned to the Caps_Lock key. Let's remove this binding. To remove bindings, you can write commands as arguments to xmodmap -e, and you can supply many commands in succession, as follows:
$ xmodmap -e "command1" -e "command2"

To remove the Caps_Lock binding, we will remove the 'lock' modifier from it.
$ xmodmap -e "remove lock = Caps_Lock"

If that didn't produce any output, it was successful. Validate by running "xmodmap -pm" again. Now, the lock line should look like
lock

Now Caps_Lock doesn't produce a shift lock anymore. You can also verify by opening a text window and pressing the caps lock key: it will not do anything.

Now, let us put the key to good use: how about assigning it to control? Run:
$ xmodmap -e "add Control = Caps_Lock"

Great, now your caps lock is an additional control. We can combine these two commands:
$ xmodmap -e "remove lock = Caps_Lock" -e "add Control = Caps_Lock"

Easy remapping, and it works independently of the window manager, the keyboard type, and the version of Linux.
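The mappings last only until the end of the X session. To make them permanent, you can put the same expressions into a file and load it with xmodmap. Many distributions load ~/.Xmodmap automatically at session start; if yours doesn't, run the second command from your session's startup script:
$ cat > ~/.Xmodmap << 'EOF'
remove lock = Caps_Lock
add Control = Caps_Lock
EOF
$ xmodmap ~/.Xmodmap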

In case you want to find what keycode a specific key produces, run the 'xev' program. It tells you in great detail which keys are pressed and released, and also reports all mouse events. Here is sample output:
KeyPress event, serial 36, synthetic NO, window 0x3e00001,
    root 0x15d, subw 0x0, time 1355682, (152,6), root:(783,52),
    state 0x4, keycode 38 (keysym 0x61, a), same_screen YES,
    XLookupString gives 1 bytes: (01) " "
    XmbLookupString gives 1 bytes: (01) " "
    XFilterEvent returns: False

Here, I pressed the 'a' key (keysym 0x61), which has keycode 38. Using this method you can narrow down on an unused key and put it to good use.


Sunday, June 10, 2012

STM32 with Linux: Device support. Part 1

I recently got an STM32F0 Discovery board. This is a development board that contains a versatile ARM microcontroller and a supporting chip to program and debug the microcontroller. This is Part 1 of a short series to get you up and running with these devices.

0. Motivation
Microcontrollers are computers that contain a processor, memory, and storage, all in one tiny package. Such computers are used inside devices that need a small amount of processing power: televisions, printers, electronic music players. These computers are easy to program, they use little power, and they are rugged enough to last decades of continuous use. I had previously posted a set of tutorials showing how to program ARM processors in assembly. These tiny microcontrollers can be programmed with a subset of the ARM instruction set, making them easy to learn.

The STM32F0 Discovery board is a low-cost development board intended to demonstrate the advantages of the platform. It looks very compelling: it has many input/output lines, it is cheap ($7.50 each), and it is programmed in ARM assembly. Mike Szczys recently wrote about a program that makes this device programmable within Linux. When I received my device this week, I decided to give the Linux interface a try. In case you have one of these devices and an Ubuntu machine, here are the steps you need to follow to get it working under Linux.

You will need:
  1. An STM32F0 Discovery board. On the product page, go to Orderable Products -> Distributor Availability to find a retailer near you.
  2. A Linux computer with root access.

1. Install Development packages.
You need a few development packages, installed as root:
sudo apt-get install pkg-config libusb-1.0-0-dev git

2. Installing ST-Link
This is the stlink tool (cloned from the texane repository below), which lets you communicate with the discovery board in Linux.

mkdir stm_discovery && cd stm_discovery
git clone  https://github.com/texane/stlink.git
cd stlink
./autogen.sh && ./configure
There shouldn't be any errors in the output of configure. If there are, they usually point to missing development tools; fix them before continuing. Otherwise, proceed to the next step:
make
Checkpoint: You should now have two executable files, st-util and st-flash, in your directory. They won't do anything yet; for that, we'll need device support.

3. Device support.
If you have the STM32 device plugged in, unplug it. If you have a v1 device, unplug any USB storage disks (flash drives and the like) as well: the v1 device fakes a USB disk, and we need to reload the USB storage kernel module to get it working. Then, run the following commands from the stlink directory:
sudo cp 49-stlinkv* /etc/udev/rules.d   # Allow any user to access device.
sudo modprobe -r usb-storage            # Remove the module
sudo cp stlink_v1.modprobe.conf /etc/modprobe.d   # Set device specific policy
sudo modprobe usb-storage               # Add the module again

This sets up the device so any user can play with it.  There is no security issue with allowing users to write to these devices, so the above steps are safe.

Checkpoint: With the board plugged in, running st-util should produce output like the following:
$./st-util
2012-06-10T21:23:29 INFO src/stlink-usb.c: -- exit_dfu_mode
2012-06-10T21:23:29 INFO src/stlink-common.c: Loading device parameters....
2012-06-10T21:23:29 INFO src/stlink-common.c: Device connected is: F0 device, id 0x20006440
2012-06-10T21:23:29 INFO src/stlink-common.c: SRAM size: 0x2000 bytes (8 KiB), Flash: 0x10000 bytes (64 KiB) in pages of 1024 bytes
Chip ID is 00000440, Core ID is  0bb11477.
KARL - should read back as 0x03, not 60 02 00 00
init watchpoints
Listening at *:4242...

4. Get ARM development tools
At this point, you have support for the device, but no compiler or debugger for this board. Your computer is most likely an x86 machine, so you need a cross compiler to generate ARM code. This cross compiler is special: it generates code for a bare-metal device, one with no operating system.

Code Sourcery provides a free toolchain that compiles code for such devices. You can download it for this specific board and install it as a regular user; you don't need to be root. It is a 100MB download, so depending on your connection speed this might take some time. You can install it anywhere; the location isn't special. The installer asks if you want to modify the PATH environment variable: choose yes.

This isn't the only way to get the ARM development tools. If you want, you can compile gcc and gdb from source. Adam Kunen has a long article that describes how to do this.

Checkpoint: Start a new terminal. (This is important, because your PATH needs to be updated.) You should have the programs arm-none-eabi-gcc and arm-none-eabi-gdb in your path. Test them by invoking them:
$ arm-none-eabi-gdb
GNU gdb (Sourcery CodeBench Lite 2011.09-69) 7.2.50.20100908-cvs
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-none-eabi".
For bug reporting instructions, please see:
.
(gdb) quit
$ arm-none-eabi-gcc
arm-none-eabi-gcc: fatal error: no input files
compilation terminated.
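As a final check, you can connect the debugger to the board. With the board plugged in and st-util from step 3 running in another terminal (it listens on port 4242, as its output above shows), start gdb and attach to it:
$ arm-none-eabi-gdb
(gdb) target extended-remote localhost:4242
If gdb connects without complaint, everything from the udev rules to the cross toolchain is working.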

Now, you should have full support to program these devices in Linux and to debug them. Part 2 will show a sample program.
(Image courtesy: ST Microelectronics)

Friday, May 18, 2012

Impressive gaming: World of Goo

In order to fight off murderous tendencies while doing taxes, I play video games. This year, I played one game on my Android phone: World of Goo.

Wikipedia claims that this is a physics-based puzzle game, but that is about as accurate as describing chicken tikka as a dead bird. World of Goo is a brilliant adventure game, told through many levels that involve tiny balls of goo. The balls start out sticking together, and new levels add interesting capabilities. There is a single objective in every mission, usually herding a fixed number of goo balls toward a pipe. (To be fair, Wikipedia has a scarily long page about the game. What they lack in humor, they make up for in actual content and facts.)

A few years ago, I was saddened by the lack of inventiveness in new games. It seemed that every game had the same boring plot: there were certain genres, and everyone stuck to tried-and-tested mechanics. Walking through the aisles of a computer store, you could quickly slot each game into a few genres: RPG, strategy, platformer, with little that set individual games apart. Compared to this, the early days of gaming were filled with creativity, as programmers experimented with computers to create something fun and unique.

That wasn't the only problem. I use Linux, and many games do not work on my system. Even when I had the luxury of installing Windows on one partition, games required registration codes and other fiddling just to get working.

In such unhappy times, I came across the demo for World of Goo. There was a Linux version; I played a level or two and loved the idea behind the game. It was refreshingly new, with goo balls sticking to one another and making cute sounds when they reached the destination.

For one reason or another, I put off the purchase and then promptly forgot about it. A few weeks ago, I finally bought the first Indie bundle on Android and downloaded my copy of World of Goo. It was as refreshing and enticing as I remembered it, and so I installed it on my Android phone.

Tax week was a whirlwind of administrative paperwork and an ill baby. In between copying numbers from one dull form to another, I held my child as he slept clinging to me. My hands were free, but I wasn't coherent enough to do anything productive. My phone was nearby, and the opportunity was perfect for World of Goo. After a few hours spread over many days, I finally finished the last level. And I cried, because I had no more goo worlds to conquer. That's how good the game is.

The game has exceptional level design. Most game developers fall into the trap of coming up with a single idea for a game: they develop the engine and then churn out levels one after another. In such games, the fun dries up after the first few levels. The game gets devilishly hard and the levels just get tougher. New levels are more challenging, but no more rewarding: level two is level one with less time and fewer resources. That's not fun, that's a chore. In Goo, you move through different scenarios, changing weather, and different kinds of goo balls. The game mechanics change as you tackle new levels. This reminds me of Soul Bubbles for the Nintendo DS, which had the same change of mechanics as the game proceeded. I'm glad the developers resisted the urge to copy-paste levels, and that they spent as much time on inventive levels and level mechanics as they did on the engine. The levels keep you eager to see what happens next.

As the game proceeds, the story of the goo balls unravels, taking the player through its strange and humorous turns. World of Goo does a lot of things well, and it is an instructive lesson to the cesspool of current-day gaming.
  1. Creativity counts. Come up with something unique. Computer worlds are limitless, so don't package the same crap all over again.
  2. Entertain in every level. Each level should be fun to play. Doing the same thing twice isn't fun.
  3. Leave no player behind. Goo lets you skip baffling levels rather than forcing you to fight through them. You can always come back to a level later. This makes the game more fun and accessible.
  4. All the fun ideas, and nothing else. The game is small, but it is fun all along. That is far better than padding a game with mediocre content to increase the length.
The game is half a decade old, but is well worth playing.


Sunday, May 13, 2012

What is inside electronic devices?

Not many people see the insides of electronic devices. Here is a picture of a Linksys E-1200 router:


If you compare this with electronic boards made a decade ago, the emptiness of this board strikes you immediately. Most of the board is empty space, and two black chips dominate the real estate. The remaining components are minor: resistors and capacitors.

There are two big black chips in the center of the board. From left to right, they are:
  1. Winbond W9425G6JH: that's the memory (RAM).
  2. Broadcom BCM5357B0: that's everything else: CPU, wireless, Ethernet, routing logic. Everything.
The fat black things at the bottom look like chips, but they are not: the FPE 2020 parts are transformers for electrical isolation of the Ethernet signal.

Everyone knows that computers are getting faster, smaller, more durable, and more cost efficient. The silent revolution is the ubiquity of computers: tiny computers are everywhere. The desktop was once the only computer in the house. Then came the laptop, then the smartphone. Then your television set-top box. Routers, ebook readers, the telephone box that gives you cheap international calling. Deep inside, all these devices are mostly-empty boards like the one above, with one powerful chip containing a complete computer.

The other important fact is that these are full, general-purpose computers. You can run browsers, games, and email programs on them. Manufacturers make a board with a general-purpose computer and then write router software for it. This is cheaper than designing a specialized router-only computer. A router is very similar to a television set-top box: they contain roughly the same parts, and the only difference is the software.

We are getting better at manufacturing these chips: they cost less, last longer, and fail less often. It is not unusual to run these computers continuously for years. This astonishing reliability is one reason the repair industry is dying. The other reason is that there isn't much left to repair: boards contain few parts, and if a CPU or RAM chip goes bad from overheating or an electrical surge, it is nearly impossible to fix. Capacitors and connectors fail far more often than CPU and RAM chips, and since there is nothing mechanical left to fail and the failure rate of silicon components is low, electronic boards provide decades of service.

Another trend to notice is that many of these devices run Linux. Linux has many benefits: it is free, it has support for a wide variety of these new computers, and it is easy to modify.

(Image courtesy Wikia.com)

Wednesday, March 21, 2012

Book Review: The Continent of Circe by Nirad Chaudhuri

Want to read a possible theory about Hindus and Hindu behavior? Read the confusing book, "The Continent of Circe", by Nirad Chaudhuri.

"The Continent of Circe" is a book published in 1966 by Nirad Chaudhuri. He grew up in Bengal and writes about the Indian freedom struggle. His views are very different from mainstream views about India. He differs from both Western authors and Indian authors, which makes his writing unique. Some of the arguments made in this book are
  1. There is no religion called Hinduism. This is a name given to a group of people from a specific region.
  2. The Muslims and Hindus in India are completely different people. Conversions and intermarriages have done little to change the stock and behavior of the people. The two groups distrust one another and the relations are deteriorating over time.
  3. The Hindu caste system was the Aryan way of assimilating local groups. It was never a barrier to professions, and it might have done some good in maintaining social order and regulating competition between groups: priests and warriors at the top were Aryan, and the lower classes were local merchants.
  4. Hindus live among contradictions: (a) they espouse non-violence while their history is replete with butchery; (b) they have a grand view of themselves while also professing humility; (c) they hold a view of their unity while being deeply fragmented and divisive.
  5. The Indian weather molds Indian (specifically Hindu) behavior.
  6. Hindu thought is rather simple, and written analysis of Hindu thought makes it seem more profound than it really is.
That's as far as I made it through the book. As you can tell, some of the arguments are correct and obvious, while others are strange and need explaining. Explanations are provided, though the language of the book is bizarre and complex. For one, the essays appear unedited: sentences are long and winding, the author rambles on, and paragraphs are disconnected and dry. Every other paragraph has some foreign phrase, which makes it doubly hard to understand the intent. Here is a typical sentence:
Today, as an old man I would say that I have seen so much of this feckless tragedy of Hindu life on this green earth and under the blue sky, that the moment I see any sign of the Hindu dementia in me, I shall cry out, "Nunc dimittis..."
Entire passages from foreign languages are quoted. I gather there is some French, German, and Latin involved, and I'm glad the typesetter did not have Cyrillic or Japanese or Chinese characters. I don't know what to make of the strange writing style and the profusion of Latin. I'm sure the words "De rerum indicarum natura: exempla gentium et seditionum" appeal to learned people. An illiterate villager like me has no idea what the hell just happened.

Is it pretentious writing? Did everyone write like this in 1960? Or am I too dumb to understand it?

One thing is certain: Nirad Chaudhuri was original. He sees the vast difference between Indian and Western behavior and tries to explain it. He isn't pandering to either group: his observations are brutally honest and his analysis is equally impartial. The standard Indian reaction is to call him prejudiced and bitter. I don't agree with that. His brutal honesty hurts Indian sentiment; Indians prefer objectivity to be coated with lots of warm love about Indian greatness. Nirad Chaudhuri also points out the poor aspects of Western behavior. His praise and condemnation are showered on both groups equally.
Read him, if only to see how honest and unabashed an Indian author can be.


(Image courtesy ManasTech's blog.)

Monday, February 27, 2012

Project Euler in R: Problem 26

I have been posting R solutions to Project Euler problems as a way of polishing my R skills. Here is the next problem in the series, problem 26. The problem is stated as follows:

A unit fraction contains 1 in the numerator. The decimal representation of the unit fractions with denominators 2 to 10 are given:
1/2 = 0.5
1/3 = 0.(3)
1/4 = 0.25
1/5 = 0.2
1/6 = 0.1(6)
1/7 = 0.(142857)
1/8 = 0.125
1/9 = 0.(1)
1/10 = 0.1
Where 0.1(6) means 0.166666..., and has a 1-digit recurring cycle. It can be seen that 1/7 has a 6-digit recurring cycle. Find the value of d < 1000 for which 1/d contains the longest recurring cycle in its decimal fraction part.
The first task is to define a function that returns the cycle length of 1/n for any input n. The logic is simple: we compute the successive remainders in the long division of 1 by n. The moment we encounter a remainder we have seen before, we know the fraction has a recurring cycle, and we can compute its length.

recur.length <- function (n) {
  x = 1                             # Starting with the numerator 1
  remainders = c(x)
 
  repeat {
    x = (x*10) %% n                   # Find the remainder
    if (x == 0) { return (0) }        # Fraction terminates, so cycle length is 0.
    if ((x %in% remainders)) {break}  # Repeating remainder is found. Break from loop.
    remainders <- c(remainders, x)    # Else add remainder to list.
  }
  return (length(remainders) - which(remainders == x) + 1)  # Exclude non-repeating part of the decimal
}
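A couple of sanity checks against the cycle lengths given in the problem statement:
recur.length(6)   # 1, matching 1/6 = 0.1(6)
recur.length(7)   # 6, matching 1/7 = 0.(142857)
recur.length(8)   # 0, because 1/8 terminates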
The above code demonstrates the use of the less common but simple repeat loop. R doesn't have a do-while loop, so repeat with a break is a perfect way to imitate such a construct. You can read more about R control flow here.

Using the recur.length function, we find the d < 1000 with the longest recurring cycle. The following insights make our search efficient:

  • A number d can have a recurring cycle of at most d - 1 digits. This is because division modulo d can produce only d unique remainders (0, 1, ..., d-1), after which we must encounter a remainder that repeats. One of those remainders is zero, and if we hit it, the fraction terminates and no recurrence is possible.
  • Due to the first observation, larger values of d can have larger cycle lengths. So it is faster to search from 1000 down to 2 than from 2 up to 1000.
  • Searching down from 1000, say we find a recurring cycle of length k. The smallest number we still need to check is then k + 2, because numbers up to k + 1 can have cycle lengths of at most k, which cannot beat the cycle we already found.

Based on the above, here's a function to find the d with the largest recurring cycle. 
longest <- function (n) {
  maxlength = 0       # Maximum recurring cycle
  maxd = 0            # d with the maximum recurring cycle
 
  # Check all numbers between low and i for a longer cycle length
  low = 2
  i = n
 
  while (i > low) {
    ilength = recur.length(i)
    if (ilength > maxlength) {
      maxlength = ilength
      maxd = i
      low = maxlength + 1           # Candidates must be greater than low
    }
    i = i - 1
  }
  maxd
}
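To check the result and reproduce the timing quoted below, you can wrap the call in R's system.time():
d <- longest(1000)           # The d < 1000 with the longest recurring cycle
system.time(longest(1000))   # 'elapsed' is the wall-clock time in seconds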
The search is efficient, taking only 0.15s on my Intel Core 2 Duo 1.6GHz laptop running Linux. Discussion threads suggest limiting the search to prime numbers, but I'm not sure that would be much more efficient: the gain from fewer candidates would probably be offset by the cost of primality tests. It is amazing how small this code is; R handles most of the array handling and the searching through a vector.

Code formatted by Pretty R at inside-R.org

Tuesday, January 24, 2012

Project Euler in R: Problem 25

Solutions in R to Project Euler problems are generally hard to find, so I've recently started posting R solutions on this site. Here are the previous problems: problem 22, problem 23, and problem 24.

Now let's look at Problem 25 from Project Euler.  Here is the problem statement:
The Fibonacci sequence is defined by the recurrence relation: Fn = Fn-1 + Fn-2, where F1 = 1 and F2 = 1.
Hence the first 12 terms will be:
F1 = 1
F2 = 1
...
F11 = 89
F12 = 144
The 12th term, F12, is the first term to contain three digits. What is the first term in the Fibonacci sequence to contain 1000 digits?
This is an easy problem to solve for a small number of digits. It gets interesting once we increase the required number of digits to 1000. The R integer type can hold at most 10 digits, and the double type can represent numbers of up to 309 digits, though with only 15-16 significant digits of precision. Base R has no big-integer class that can hold larger numbers exactly, so to compute the 1000-digit Fibonacci number we will have to be creative.

Computing Fibonacci numbers iteratively

Here is a basic iterative Fibonacci function that works for small inputs; beyond 15-16 digits the double type silently loses precision, and past 309 digits it overflows entirely.
fib <- function(n) {
  a = 0
  b = 1
  for (i in 1:n) {
    tmp = b
    b = a
    a = a + tmp
  }
  return (a)
}
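A quick check against the problem statement, which tells us that F12 = 144:
fib(12)   # 144, the first term with three digits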
The version that works with larger numbers is similar, with the following changes:
  • Each Fibonacci number is stored in a vector of integers.
  • Each element of the vector holds at most nine digits.
  • Once all nine digits of an element are used, a new element is added to the vector and the carry flows into it.
  • The variable numberDigits counts the number of digits in the latest Fibonacci number. When numberDigits exceeds 999, the function exits, returning the index of that Fibonacci number.
fib.bigint <- function(lim) {
  a = c(0)
  b = c(1)
  n = 0           # Fibonacci number index
  integerSize <- 1000000000  ## How big is each integer
  while (1) {
    n = n + 1     # Compute next Fibonacci number
    tmp = c(rep(0,length(a)-length(b)), b)  # Pad beginning with zero
    b = a
 
    # Add a and tmp
    carry = 0
    for (i in length(a):1) {
      sum.i = tmp[i] + a[i] + carry
      a[i] = sum.i %% integerSize
      carry = floor (sum.i / integerSize)
    }
    if (carry > 0) {a <- c(carry, a)}
    numberDigits <- (length(a) - 1) * 9 + countDigits(a[1])
    if (numberDigits > (lim - 1)) { return (n) }
  }
}
countDigits <- function(n) {
  # Count the digits of n (the leading vector element, so n < 10^9)
  count <- 1
  while (n >= 10) {
    count <- count + 1
    n <- n / 10
  }
  return (count)
}
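A quick check against the problem statement (F12 = 144 is the first term with three digits), and against the analytical solution derived below:
fib.bigint(3)      # 12: F12 = 144 is the first term with 3 digits
fib.bigint(1000)   # 4782, agreeing with the closed-form derivation below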

On my Intel Core 2 Duo 1.6GHz laptop running Linux, this function takes less than 4 seconds to find the Fibonacci number with 1000 digits.

The analytical solution
Fibonacci numbers have a closed form solution, which leads to a simpler approximation by rounding (from Wikipedia),
Fn = the closest integer to (golden ratio)^n / sqrt(5) 
We need the first Fibonacci number with 1000 digits.  This number will be greater than or equal to 10^999.
Fn = the closest integer to (golden ratio)^n / sqrt(5) >= 10^999
We can drop the closest-integer function, because if Fn were rounded down to the floor of the ratio, the resulting number would have fewer than the required digits. So we need the smallest n satisfying
(golden ratio)^n / sqrt(5) >= 10^999 
Or, n >= (999 log(10) + log(sqrt(5)))/log(golden ratio) 
Or, n >= 4781.86
So we need the 4782nd Fibonacci number to get the required 1000 digits.

This problem was particularly interesting to R users because it runs into an R-specific limitation. Many other languages don't have this problem and can just use a simple Fibonacci generator to get the answer. For example, in Scheme (or other Lisp variants), a naive Fibonacci implementation produces all the digits.

Aside: Having started writing about R, I have found an excellent online resource, R-bloggers - an aggregator of content from various blogs about R. You can search for other R Project Euler solutions here. Be sure to check out the top articles box on the R-bloggers landing page if you're looking for new and interesting applications of R.

Thursday, January 19, 2012

Project Euler in R: Problem 24

I had previously posted solutions in R to Project Euler problem 23 and problem 22. This is the next problem from Project Euler. The statement of problem 24 is as follows:
A permutation is an ordered arrangement of objects. For example, 3124 is one possible permutation of the digits 1, 2, 3 and 4. If all of the permutations are listed numerically or alphabetically, we call it lexicographic order. The lexicographic permutations of 0, 1 and 2 are: 012   021   102   120   201   210
What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?
This is a very interesting problem and it can be solved analytically. The key insight is that n distinct digits can be arranged in n! permutations.

Here's the logic you could use. If the permutations of the numbers 0 through 9 are arranged in lexicographic order,
  • The first 9! permutations, #1 to #362,880 are numbers starting with 0, in order from 0123456789 to 0987654321
  • The next 9! permutations, #362,881 to #725,760, are numbers from 1023456789 to 1987654320
  • The next 9! permutations, #725,761 to #1,088,640, are numbers from 2013456789 to 2987654310. The millionth permutation is contained in this set, so the millionth number must start with a 2. In fact, it is the 274,240th (= 1,000,000 - 725,760) number in this set.
  • Our problem has now reduced to finding the 274,240th lexicographic permutation of the numbers 0, 1, 3, 4, 5, 6, 7, 8 and 9 - the nine digits excluding 2.  We can repeat the above process to get each successive digit.
Essentially we're writing one million as the following sum
1000000 = (a1 x 9!) + (a2 x 8!) + ... + (a10 x 1)
As shown above, the index a1 is 2; that is, only two full blocks of 9! numbers fit within 1,000,000. So the first digit of the millionth number is the third digit in 0123456789, which is 2. The index a2 is 6, so the second digit of the millionth number is the seventh digit of the remaining numbers 013456789, which is 7. The remaining digits can be found similarly. While finding the digits, you have to be careful to ignore digits that have already been used: this is a permutation, so each digit appears exactly once.

The process is tedious by hand, so here is some R code to make it easier. It generalizes the solution to find the nth lexicographic permutation of the digits 0 to 9.
findperm <- function (n) {
  # Find the indices a1 - a10
  indices = c()
  for (i in 9:0) {
    index = 0
    while (factorial (i) < n) {
      n = n - factorial (i)
      index = index + 1
    }
    indices = c(indices, index)
  }
 
  # Use the indices to find the nth permutation
  input = 0:9                         # The list of numbers to permute
  output = c()
  for (j in 1:10) {
    output[j] = input[indices[j] + 1] # Move digit to output
    input = input[-(indices[j] + 1)]  # Remove the assigned digits from input
  }
  return (output)
}
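Running it on the problem's input confirms the two digits derived by hand above:
perm <- findperm(1000000)
perm[1:2]   # 2 7, the first two digits of the millionth permutation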
That was straightforward, and a fun illustration of R's capabilities.

But you can do better. The strength of R lies in the wealth of user-contributed libraries and functions that ease the implementation of tricky computations. For a simpler solution, try the lexicographicPermutation function in the R.basic library. Here is the entire solution.
library(R.basic)
lexicographicPermutation(c(0,1,2,3,4,5,6,7,8,9),999999)

The first solution illustrates that even tricky computation can be coded easily in R, and the code is quite readable. The second solution suggests that before writing your own code, it is a good idea to look for existing libraries that can help. Statistical computations can get complex and tricky: you can write your own k-means clustering algorithm, but it is difficult to get all the edge cases correct. Rather than writing your own routine, look around and see if it has been written already. An expert does a better job writing such routines, and you can leverage their code and their expertise.

Sunday, January 15, 2012

Project Euler in R: Problem 23

I was just looking through the programming language statistics on Project Euler. They show that only 7% of the problems have been solved in R, whereas 8% have been solved in some kind of spreadsheet. This is outrageous!

Let's look at the solution of problem 23 in R. Here is the statement of Project Euler's problem 23:
A perfect number is a number for which the sum of its proper divisors is exactly equal to the number. For example, the sum of the proper divisors of 28 would be 1 + 2 + 4 + 7 + 14 = 28, which means that 28 is a perfect number.
A number n is called deficient if the sum of its proper divisors is less than n and it is called abundant if this sum exceeds n.
As 12 is the smallest abundant number, 1 + 2 + 3 + 4 + 6 = 16, the smallest number that can be written as the sum of two abundant numbers is 24. By mathematical analysis, it can be shown that all integers greater than 28123 can be written as the sum of two abundant numbers. However, this upper limit cannot be reduced any further by analysis even though it is known that the greatest number that cannot be expressed as the sum of two abundant numbers is less than this limit.
Find the sum of all the positive integers which cannot be written as the sum of two abundant numbers.
Find proper divisors
First we write a function to find the proper divisors (that is, divisors less than the number itself) of a number n.
divisor <- function (n) {
    divs <- c(1)
    i = 2
    lim = floor (sqrt(n))
 
    while (i <= lim) {
        if (n %% i == 0) {
          divs <- c(divs, i)
          if (n/i != i) { divs <- c(divs, n/i) }
        }
        i = i + 1
      }
    return (divs)
  }

Check if number is abundant
Next we create a simple function which checks whether a number n is abundant.
abundant <- function (n) {
    return (sum(divisor(n)) > n)
  }

The solution
With the above two functions, we have the tools we need to solve the main problem: finding the sum of all positive integers that cannot be written as the sum of two abundant numbers.

All numbers greater than 28,123 can be written as the sum of two abundant numbers, so we only need to check numbers up to 28,123 for this property. The vector abunlist in the code below holds all abundant numbers from 12 up to 28,123. With every new abundant number that we find, we form all possible two-abundant-number sums and store them in the vector sumno. The answer we're looking for is the sum of all numbers from 1 to 28,123 except the ones in sumno.

abunlist <- c(12)            # The smallest abundant number = 12
sumno <- c(24)               # The smallest abundant number sum = 12 + 12 = 24
 
for (i in 13:28123) {
  if (abundant(i)) {
    abunlist <- c(abunlist, i)                   # Latest abundant number = i
    tmp <- abunlist[(abunlist + i) <= 28123] + i   # New abundant sums <= 28123 using i
    sumno <- unique(c(sumno, tmp))               # Add to existing list
  }
}
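The loop above collects, in sumno, every number up to 28,123 that is a sum of two abundant numbers, but it stops short of the final answer. One way to finish, following the description above (setdiff keeps the numbers that are not in sumno):
sum(setdiff(1:28123, sumno))   # Sum of the numbers not expressible as such a sum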

This is certainly not the most efficient solution, but it runs in approximately 25 seconds - well within the stated 1-minute goal.

Note: The code for this post was formatted in Pretty R at inside-R.org.

Wednesday, January 11, 2012

Generate UML Diagrams From Java Source

I was navigating through a large Java project today, and I needed some tools to help me visualize the massive class hierarchy. Searching on Google didn't produce anything obvious: people pointed to eUML2, which was difficult to install, and some tools didn't generate UML from existing code.

After a while of searching, I remembered the Doxygen utility, which generates documentation from source code. It works on C, C++, and Java, among many other languages. While its diagrams might not be perfect, they had been very helpful on C++ projects in the past.

So I decided to give it a shot. It worked out beautifully.

Here are the steps involved in generating UML diagrams.
First, install doxygen (to parse the files) and graphviz (to render the graphs as PNG images):
$ sudo apt-get install doxygen graphviz
Then, run doxygen to generate a config file.
$ cd <path-to-source>
$ doxygen -g
Now you have a config file called Doxyfile. Edit it to search recursively, look for Java source, generate call and caller graphs, and draw UML-style diagrams. Here are all the changes I made to the file:
FILE_PATTERNS         = *.java
RECURSIVE             = YES
HAVE_DOT              = YES
CALL_GRAPH            = YES
CALLER_GRAPH          = YES
DOT_GRAPH_MAX_NODES   = 500
UML_LOOK              = YES
Now, generate documentation with this command:
$ doxygen Doxyfile
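Assuming the default output settings in the generated Doxyfile, the HTML documentation, with the diagrams embedded, ends up in the html subdirectory; open it in a browser:
$ xdg-open html/index.html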
To test this out for a blog post, I downloaded the source code of Azureus, a large Java-based project. Here is a graph from one of the files; the top right shows a close-up view of a tiny section of the picture. Even in this simple example, you can see how the zoomed-out view lets you grasp the complex structure.

Doxygen reads code written in C, C++, Java, and many other languages. So if you need UML-style diagrams to better understand a class hierarchy, give it a try. It is freely available, easy to install, and easy to use. In addition to class graphs, it can also produce call graphs and caller graphs.

Saturday, January 07, 2012

Project Euler in R: Problem 22

I solve Project Euler problems for recreation, using the statistical language R. R is free to download and use, so I would recommend it if you are interested in statistical computation.
This is problem 22 from Project Euler:
Using names.txt, a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.
For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 x 53 = 49714.
What is the total of all the name scores in the file?
While this is an easy problem, its solution involves some nice R concepts. Let's walk through solving it in R.

Reading free form text using scan()
The first order of business is to read the text file. The file names.txt is a single line of text, separated by commas:
"MARY","PATRICIA","LINDA","BARBARA", ...
The scan function reads free form text and stores each element in a vector. We specify that the input is character data separated by commas. That should do the trick, right?
nlist <- scan("names.txt", what="", sep=",")
Unfortunately, this does not work. After many minutes of head-scratching, you'll find that one of the names in the file is 'NA'. This confuses scan, which interprets it as R's missing-value constant NA ("Not Available"). The fix is to set the na.strings argument of scan:
nlist <- scan("names.txt", what="", sep=",", na.strings="")
  
Dictionaries / hashtables in R
Getting the alphabetical value of each character is easiest if the values are stored in a hashtable. In R, named vectors can be used as hashtables. The straightforward definition would go as follows:
dict = c("A" = 1, "B" = 2, ...)
While you can sit and write the entire dictionary by hand, you should avoid such repeated work. Here's a niftier definition using the names() function:
dict = 1:26
names(dict) = unlist(strsplit("ABCDEFGHIJKLMNOPQRSTUVWXYZ", ""))
This would be a trivial function to write in a C-style language, where you can turn a character into an integer. R, however, has no internal ASCII table that I could find. If there is a simpler solution, please let me know.
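As it turns out, base R can convert characters to their integer code points with utf8ToInt, which makes the dictionary unnecessary. Here is a sketch (score2 is just an illustrative name; it assumes the names are uppercase ASCII, which holds for this file):
score2 <- function (name) {
    # "A" is code point 65, so subtracting 64 maps A-Z to 1-26
    return (sum(utf8ToInt(name) - 64))
}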

Name scoring function
The scoring function takes a name as input, looks up the score of each letter in the name, and then adds up the letter scores.
score <- function (name) {
    return (sum(dict[unlist(strsplit(name, ""))]))
}
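The problem statement gives a ready-made test case: COLIN should be worth 53.
score("COLIN")   # 53, matching the example in the problem statement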
Calculating weighted scores
The vector of name scores is obtained by applying the score function defined above to each name in nlist. We use the sapply function, which simplifies its output to a vector by default. The vector of weights is the sequence from 1 to the length of nlist. Multiplying the two vectors gives the vector of weighted scores.
weighted.score = (1:length(nlist))*(sapply(nlist, score)) 
  
The full solution
nlist <- scan("names.txt", what= "", sep=",", na.strings="")
dict = 1:26
names(dict) = unlist(strsplit("ABCDEFGHIJKLMNOPQRSTUVWXYZ", ""))

score <- function (name) {
    return (sum(dict[unlist(strsplit(name, ""))]))
}

nlist <- sort(nlist)
weighted.score = (1:length(nlist))*(sapply(nlist, score)) 
sum(weighted.score)
This was an interesting problem to solve in R. It demonstrates R's flexibility in reading and processing free-form text. Doing a similar task in SAS would be hard.

Monday, January 02, 2012

Book Review: Binary, by Michael Crichton

Want to read good fiction? Read "Binary", by Michael Crichton, one of his best works that you've never read. Yes, that Michael Crichton.

The first book I read by Michael Crichton was an obscure little book called "Binary", written by an obscure author called John Lange. I was in high school, and my father's friend had dropped this short book at our place. Inside, the book mentioned that John Lange had been revealed as the great Michael Crichton, but I hadn't heard of Crichton either, so expectations ran low.

The book starts with a gripping description of a heist. It grabs you and pulls you in. The book is a thriller, with a plot involving dangerous weapons and one man's race to stop it all. It is a short book, written very well.

I had harbored fond memories of this book, and I had recommended it to many friends, knowing full well how difficult it was to obtain a copy. Recently, I received a copy, and I sat down to read it. A short evening passed, and I was done. Despite having read it over a decade ago, I remembered most of the plot and many of the scenes. It was a thrilling book, made punchier by the fact that I knew the story. I marveled at the plot and the excellent writing.

A fantastic novel with a thrilling plot. The ending isn't a blockbuster by today's standards, and that is part of the lure. No superfluous nonsense, and no extra pages.

A gripping story snappily told.