Fast CPUs Make for Dissapointing Optimizations

Posted Wed, 28 Sep 2005

As a follow up to yesterday's post about using intermediate datastructures, I wanted to add some concrete numbers to my Big-Oh runtime statements.

I came in to work this morning and got my hands on what I thought was a relatively large sample set of data, and setup an operation that would populate an array with 2000 items, and them "keep" items from an array with 350 elements. The result was that 300 elements were left in my array.

The original alorithm I had to work with was as follows:

keep( source, keepers ) {
  for each element S in source
    found = false;

    for each element K in keepers
      if ( S == K ) found = true; break;

    if ( not found )
      delete S from source
}

So source is size 2000, keepers is size 350, and I know from observation that I'm going to delete (2000-300) elements from source. The overall (worst-case) runtime then is 2000 * 350 + 2000 * 1650 = 4,000,000 steps!

My re-implementation using intermediate datastuctures looks roughly like this:

keep( source, keepers ) {
  Populate set S(s) with elements of source
  Populate set S(k) with elements of keepers

  for each element E in source
    if ( E is not found in S(k) )
      remove E from S(s)

  Reset source array to be empty with size length(S(s))
  Copy contents of S(s) into source
}

The runtime here then is: 2000log(2000) + 350log(350) + 2000*log(350) + 1650 = 43,625 steps.

Now in my book, 43625 if significantly fewer steps than 4,000,000. But let's face it, when your average user is running well over 2 billion cycles a second, and you're talking about arrays of pointers, four million operations is nothing :-\.

I guess it's time to hunt for a much larger supply of data :-\

Related Books

Data Structures and Algorithms in Java (2nd Edition) Cocoa(R) Programming for Mac(R) OS X (3rd Edition) C Programming Language (2nd Edition) (Prentice Hall Software) Website to Accompany Data Structures & Algorithms in C++ (Http://datastructures.net) Solns, Slides, Add'l Problems, Applets + More Programming Interviews Exposed: Secrets to Landing Your Next Job (Programmer to Programmer)

Post a Comment




About

My name is Tim Fanelli, I am a software engineer in Northern NY. I spend most of my time working, and when I can, I try to post interesting things here.

Cigar Dossiers