Archive of RubyForge sup-talk mailing list
* [sup-talk] [PATCH] cache results of Person.from_address
@ 2009-08-17  6:39 Rich Lane
  2009-08-17 13:07 ` Andrei Thorp
  2009-08-22 14:10 ` William Morgan
  0 siblings, 2 replies; 5+ messages in thread
From: Rich Lane @ 2009-08-17  6:39 UTC


The regexes in this function are very expensive, so caching improves
performance significantly for queries and slightly for indexing.
---
 lib/sup/cache.rb  |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/sup/person.rb |    7 ++++++-
 2 files changed, 52 insertions(+), 1 deletions(-)
 create mode 100644 lib/sup/cache.rb

diff --git a/lib/sup/cache.rb b/lib/sup/cache.rb
new file mode 100644
index 0000000..0836dbd
--- /dev/null
+++ b/lib/sup/cache.rb
@@ -0,0 +1,46 @@
+class Cache
+  def initialize n=128, i=3
+    @n = n
+    @i = i
+    @values = {}
+    @marks = {}
+    @delete_stack = []
+  end
+
+  def [](k)
+    if @values.member? k
+      @marks[k] = @i
+      @values[k]
+    else
+      nil
+    end
+  end
+
+  def []=(k,v)
+    if @values.size < @n
+      @values[k] = v
+      @marks[k] = @i
+    else
+      if @delete_stack.empty?
+        sweep
+      else
+        k2 = @delete_stack.pop
+        @values.delete k2
+        @marks.delete k2
+        self[k] = v
+      end
+    end
+  end
+
+  def sweep
+    @marks.each do |k,v|
+      v -= 1
+      if v == 0
+        @delete_stack.push k
+        @marks.delete k
+      else
+        @marks[k] = v
+      end
+    end
+  end
+end
diff --git a/lib/sup/person.rb b/lib/sup/person.rb
index c4f40a5..046eedc 100644
--- a/lib/sup/person.rb
+++ b/lib/sup/person.rb
@@ -1,3 +1,5 @@
+require 'sup/cache'
+
 module Redwood
 
 class Person 
@@ -73,8 +75,11 @@ class Person
     end.downcase
   end
 
+  ## This method is expensive, so memoize it.
+  @from_address_cache = Cache.new
   def self.from_address s
     return nil if s.nil?
+    @from_address_cache[s].tap { |x| return x if x }
 
     ## try and parse an email address and name
     name, email = case s
@@ -102,7 +107,7 @@ class Person
         [nil, s]
       end
 
-    Person.new name, email
+    @from_address_cache[s] = Person.new name, email
   end
 
   def self.from_address_list ss
-- 
1.6.4
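
The memoization pattern in the person.rb hunk above can be sketched in isolation as follows. This is an illustration under assumptions, not Sup's code: the class name SketchPerson and its single address regex are hypothetical, and a plain Hash stands in for the bounded Cache class from the patch.

```ruby
# Illustrative sketch only. SketchPerson and its one regex are assumptions,
# not Sup's Person class or its real parsing rules.
class SketchPerson
  attr_reader :name, :email

  def initialize(name, email)
    @name = name
    @email = email
  end

  # Class-level instance variable, as in the patch. A plain Hash stands in
  # for the bounded Cache, so this version never evicts anything.
  @from_address_cache = {}

  def self.from_address(s)
    return nil if s.nil?

    # Cache hit: skip the regex work entirely.
    cached = @from_address_cache[s]
    return cached if cached

    # Cache miss: parse "Name <addr>" or fall back to a bare address.
    name, email =
      if s =~ /\A\s*(.*?)\s*<([^>]+)>\s*\z/
        [$1.empty? ? nil : $1, $2]
      else
        [nil, s]
      end

    # Assignment returns the new object, which becomes the method's result.
    @from_address_cache[s] = new(name, email)
  end
end
```

Because repeated calls with the same string return the same object, any caller that mutates the result changes it for every other caller too.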




* [sup-talk] [PATCH] cache results of Person.from_address
  2009-08-17  6:39 [sup-talk] [PATCH] cache results of Person.from_address Rich Lane
@ 2009-08-17 13:07 ` Andrei Thorp
  2009-08-22 14:10   ` William Morgan
  2009-08-22 14:10 ` William Morgan
  1 sibling, 1 reply; 5+ messages in thread
From: Andrei Thorp @ 2009-08-17 13:07 UTC


Just want to say thanks again for your seemingly unending amount of good
work to improve Sup.
-- 
Andrei Thorp, Developer: Xandros Corp. (http://www.xandros.com)



* [sup-talk] [PATCH] cache results of Person.from_address
  2009-08-17  6:39 [sup-talk] [PATCH] cache results of Person.from_address Rich Lane
  2009-08-17 13:07 ` Andrei Thorp
@ 2009-08-22 14:10 ` William Morgan
  2009-08-22 18:28   ` Rich Lane
  1 sibling, 1 reply; 5+ messages in thread
From: William Morgan @ 2009-08-22 14:10 UTC


This looks good. Two minor questions before I apply:

Reformatted excerpts from Rich Lane's message of 2009-08-16:
> The regexes in this function are very expensive, so caching improves
> performance significantly for queries and slightly for indexing.

When you say this affects query performance, is it just the contact-list
query, or is there some other mechanism by which this is slowing down
regular queries?

Also in this method:

> +  def []=(k,v)
> +    if @values.size < @n
> +      @values[k] = v
> +      @marks[k] = @i
> +    else
> +      if @delete_stack.empty?
> +        sweep
> +      else
> +        k2 = @delete_stack.pop
> +        @values.delete k2
> +        @marks.delete k2
> +        self[k] = v
> +      end
> +    end
> +  end

Wouldn't it be better to do this?

       else
         if @delete_stack.empty?
           sweep
         end

         unless @delete_stack.empty?
           k2 = @delete_stack.pop
           @values.delete k2
           @marks.delete k2
           self[k] = v
         end

So we check the delete stack even after calling sweep, and we allow for
the value to be nil.
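
For illustration, the restructured setter could be sketched as the standalone class below. This is a sketch under assumptions, not the patch as submitted: the name SketchCache, the ArgumentError guard, the repeated sweeping until some key is evictable, and the snapshot taken in sweep are all additions made for the example. The repeated sweep is there because a single sweep may leave the delete stack empty, in which case the new value would still be dropped.

```ruby
# Illustrative sketch only; not the Cache class from the patch.
class SketchCache
  def initialize(n = 128, i = 3)
    raise ArgumentError, "n must be positive" unless n > 0
    @n = n              # maximum number of entries
    @i = i              # an entry is scheduled for eviction on the @i-th
                        # sweep after its last touch
    @values = {}
    @marks = {}
    @delete_stack = []
  end

  def [](k)
    return nil unless @values.member?(k)
    @marks[k] = @i      # refresh the mark on every hit
    @values[k]
  end

  def []=(k, v)
    if @values.size >= @n && !@values.member?(k)
      # Assumption: sweep repeatedly until at least one key is evictable,
      # so the new value is always stored.
      sweep while @delete_stack.empty?
      k2 = @delete_stack.pop
      @values.delete(k2)
      @marks.delete(k2)
    end
    @values[k] = v      # v may be nil; the store no longer depends on it
    @marks[k] = @i
  end

  def size
    @values.size
  end

  private

  # Age every mark by one; keys whose mark runs out go on the delete stack.
  def sweep
    # Iterate over a snapshot so @marks can be modified safely.
    @marks.to_a.each do |k, mark|
      if mark <= 1
        @delete_stack.push(k)
        @marks.delete(k)
      else
        @marks[k] = mark - 1
      end
    end
  end
end
```

One behaviour carried over from the original design: a key already on the delete stack is still evicted when popped, even if it was read again in the meantime.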

Thanks!
-- 
William <wmorgan-sup at masanjin.net>



* [sup-talk] [PATCH] cache results of Person.from_address
  2009-08-17 13:07 ` Andrei Thorp
@ 2009-08-22 14:10   ` William Morgan
  0 siblings, 0 replies; 5+ messages in thread
From: William Morgan @ 2009-08-22 14:10 UTC


Reformatted excerpts from Andrei Thorp's message of 2009-08-17:
> Just want to say thanks again for your seemingly unending amount of
> good work to improve Sup.

Agreed!
-- 
William <wmorgan-sup at masanjin.net>



* [sup-talk] [PATCH] cache results of Person.from_address
  2009-08-22 14:10 ` William Morgan
@ 2009-08-22 18:28   ` Rich Lane
  0 siblings, 0 replies; 5+ messages in thread
From: Rich Lane @ 2009-08-22 18:28 UTC


Excerpts from William Morgan's message of Sat Aug 22 10:10:04 -0400 2009:
> This looks good. Two minor questions before I apply:
> 
> Reformatted excerpts from Rich Lane's message of 2009-08-16:
> > The regexes in this function are very expensive, so caching improves
> > performance significantly for queries and slightly for indexing.
> 
> When you say this affects query performance, is it just the contact-list
> query, or is there some other mechanism by which this is slowing down
> regular queries?

Actually, your question prompted me to wonder why we're calling
Person.from_address on this path at all. With a little support from
Message we can completely avoid Message#parse_header. I've just sent in
a patch that does this. Please apply that rather than the from_address
cache.

The performance improvement from the new patch is slightly better than
that of the cache. Depending on the benchmark I see the time taken by
ThreadIndexMode#load_n_threads decrease by 1/2 to 2/3.


