About Me

Kevin Compton
Entrepreneur and professional software developer. Husband, Father and Friend.

Sunday, April 19, 2009

Some Sunday Fun: #rollup method

Recently a co-worker and I were parsing a CSV file to import clinical study data. Each study may have multiple entries in the file, one per study team member.

For example, Study123 might have two people acting as the "Billing Coordinator" and thus require two entries in the file.

The logic to parse the file and build the resulting Study object is pretty typical, and this morning I thought it would be nice to have a "rollup" method that takes the boilerplate code off our hands.

I could envision adding the #rollup method to Enumerable, Array or CSV::Reader to encapsulate this logic and perhaps provide a cleaner interface.

Here is an example I coded dealing with repeating data for People and their addresses:

module Pattern
  # roll-up - iterating through an ordered array and rolling up
  #           related entries into a single entry
  #
  #   require '/Users/chiefwahoo/rails/patterns/patterns.rb'
  #   class Array; include Pattern; end
  #
  #   class Person
  #     attr_accessor :name, :address
  #     def initialize(name, address)
  #       @name = name; @address = address;
  #     end
  #   end
  #
  #   # Assumes items are grouped by "key" in this case Person's name
  #   people = []
  #   people << Person.new("Compton, Kevin", "123 Mockingbird Lane")
  #   people << Person.new("Compton, Kevin", "9823 Beach View Drive")
  #   people << Person.new("Compton, Jessica", "823 City Park Avenue")
  #   people << Person.new("Compton, Jessica", "9824 Beach View Drive")
  #   people << Person.new("Compton, Keith", "777 College Avenue")
  #   people << Person.new("Compton, Megan", "123 California")
  #   people << Person.new("Compton, Megan", "876 Mount Vista")
  #
  #   rollup_items = []
  #
  #   people.rollup(
  #       Proc.new {|person| person.name},
  #       Proc.new {|person| "User #{person.name} has residences at: "},
  #       Proc.new {|container, person| container << " #{person.address};" },
  #       Proc.new {|container|
  #           rollup_items << container;
  #           puts "Detail is : #{container}" }
  #       )
  #
  #   ### OUTPUT ###
  #   Detail is : User Compton, Kevin has residences at:  123 Mockingbird Lane; 9823 Beach View Drive;
  #   Detail is : User Compton, Jessica has residences at:  823 City Park Avenue; 9824 Beach View Drive;
  #   Detail is : User Compton, Keith has residences at:  777 College Avenue;
  #   Detail is : User Compton, Megan has residences at:  123 California; 876 Mount Vista;
  #   => nil
  #   irb(main):098:0> rollup_items
  #   => ["User Compton, Kevin has residences at:  123 Mockingbird Lane; 9823 Beach View Drive;", "User Compton, Jessica has residences at:  823 City Park Avenue; 9824 Beach View Drive;", "User Compton, Keith has residences at:  777 College Avenue;", "User Compton, Megan has residences at:  123 California; 876 Mount Vista;"]
  #
  def rollup(bookmark_detail_proc, initialize_item_proc, merge_into_item_proc, finalize_item_proc)
    saved_bookmark = nil
    current_item = nil

    self.each do |line|
      current_bookmark = bookmark_detail_proc.call(line)
      if saved_bookmark != current_bookmark
        finalize_item_proc.call(current_item) unless current_item.nil?

        saved_bookmark = current_bookmark
        current_item = initialize_item_proc.call(line)
      end

      merge_into_item_proc.call(current_item, line)
    end
    finalize_item_proc.call(current_item) unless current_item.nil?
  end
end
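
As a rough sketch of how the same method might apply to the study import described at the top of the post, the example below rolls CSV-style rows up into one hash per study. The row layout (study id, role, team member), the sample values, and the roll_up_studies helper are all assumptions made for illustration; the real file format is not shown in this post. The Pattern module is repeated only so the snippet runs on its own.

```ruby
# The rollup method from the post, repeated so this snippet is self-contained.
module Pattern
  def rollup(bookmark_detail_proc, initialize_item_proc, merge_into_item_proc, finalize_item_proc)
    saved_bookmark = nil
    current_item = nil

    each do |line|
      current_bookmark = bookmark_detail_proc.call(line)
      if saved_bookmark != current_bookmark
        finalize_item_proc.call(current_item) unless current_item.nil?
        saved_bookmark = current_bookmark
        current_item = initialize_item_proc.call(line)
      end
      merge_into_item_proc.call(current_item, line)
    end
    finalize_item_proc.call(current_item) unless current_item.nil?
  end
end

class Array; include Pattern; end

# Hypothetical helper: returns one hash per study, each with its team members.
# Rows for the same study must be adjacent, as the rollup method assumes.
def roll_up_studies(rows)
  studies = []
  rows.rollup(
    Proc.new { |row| row[0] },                                   # key: study id
    Proc.new { |row| { :id => row[0], :team => [] } },           # start a new study
    Proc.new { |study, row| study[:team] << [row[1], row[2]] },  # add a team member
    Proc.new { |study| studies << study }                        # study is complete
  )
  studies
end

# Made-up rows in the assumed layout: [study id, role, team member].
study_rows = [
  ["Study123", "Billing Coordinator", "Person A"],
  ["Study123", "Billing Coordinator", "Person B"],
  ["Study456", "Principal Investigator", "Person C"]
]

roll_up_studies(study_rows).each do |study|
  members = study[:team].map { |role, name| "#{name} (#{role})" }
  puts "#{study[:id]}: #{members.join(', ')}"
end
```

Collecting the finished studies into an array is just for the demonstration; the finalize proc could instead save each study as soon as it is complete.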

2 comments:

eee.c said...

Using Enumerable methods (esp. Rails' group_by / index_by) is more idiomatic, I think:

people.group_by(&:name).each do |name, people|
  puts "Details: User #{name} has residencies at: #{people.map(&:address).join('; ')}"
end

Kevin Compton said...

I agree that would be a nice implementation for smaller data sets.

However, the use case that led to my implementation involved parsing a large file. I did not want to bring it all into memory, but rather parse it with only a single item in memory at any given point.
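
A minimal sketch of that streaming idea, assuming a more recent Ruby in which CSV.foreach returns an enumerator when called without a block and Enumerable provides chunk. The enumerator reads one row at a time and chunk groups adjacent rows that share a key, so only the rows of the current group are held in memory. The each_study helper, the column layout (study id in the first column), and the sample file contents are hypothetical.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: yields [study_id, rows] for each run of adjacent rows
# that share the same first column. Rows are read lazily from the file.
# Note: chunk drops rows whose key is nil (for example, an empty first column).
def each_study(path)
  CSV.foreach(path).chunk { |row| row[0] }.each do |study_id, rows|
    yield study_id, rows
  end
end

# Usage with a small throwaway file; the contents are made up for illustration.
Tempfile.create(['studies', '.csv']) do |file|
  file.write("Study123,Billing Coordinator,Person A\n")
  file.write("Study123,Billing Coordinator,Person B\n")
  file.write("Study456,Principal Investigator,Person C\n")
  file.flush

  each_study(file.path) do |study_id, rows|
    puts "#{study_id}: #{rows.map { |row| row[2] }.join('; ')}"
  end
end
```

Like the rollup method, this relies on the file already being ordered so that all rows for one study are adjacent.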