From mboxrd@z Thu Jan 1 00:00:00 1970
From: wmorgan-sup@masanjin.net (William Morgan)
Date: Thu, 28 Feb 2008 09:29:49 -0800
Subject: [sup-talk] [PATCH] Unwrap br0ken URLs.
In-Reply-To: <12034162702820-git-send-email-nicolas.pouillard@gmail.com>
References: <12034162702820-git-send-email-nicolas.pouillard@gmail.com>
Message-ID: <1204219598-sup-1190@south>

I would love to have a feature like this in Sup. This patch is still
over-aggressive in places. What I would really like to see as a starting
point is a corpus of broken URL examples that we can build unit tests
from. Then we can tweak these regexes until we get something that has
both high precision and high recall.

Also, have you looked at URI.regexp? I think that can do a lot of the
dirty work.

-- 
William
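
To make the corpus idea concrete, here is a minimal sketch of what such a harness might look like. Everything in it is an assumption for illustration: the four corpus entries, the helper names (extract_urls, naive_unwrap, score), and the naive unwrapping rule, which is not the patch under review. It builds the pattern with URI::RFC2396_Parser#make_regexp, the parser call that URI.regexp delegates to in later Ruby versions.

```ruby
require "uri"

# Find http/https URLs using the stdlib RFC 2396 pattern.
def extract_urls(text)
  pattern = URI::RFC2396_Parser.new.make_regexp(%w[http https])
  found = []
  text.scan(pattern) { found << Regexp.last_match(0) }
  found
end

# Hypothetical candidate: rejoin a line break that directly follows a URL
# ending in "/". Deliberately simplistic, so it will over-match.
def naive_unwrap(text)
  text.gsub(%r{(https?://\S*/)\n(\S)}, '\1\2')
end

# Made-up corpus: raw text plus the URLs a human reader would expect.
corpus = [
  { text: "see http://example.com/a/b for details", urls: ["http://example.com/a/b"] },
  { text: "see http://example.com/a/\nb for details", urls: ["http://example.com/a/b"] },
  { text: "http://example.org/x/\ny.html", urls: ["http://example.org/x/y.html"] },
  # A URL that really does end at the line break; unwrapping here is wrong.
  { text: "home: http://example.com/\nThanks", urls: ["http://example.com/"] },
]

# Precision and recall of a candidate (given as a block) over the corpus.
def score(corpus)
  hits = found = expected = 0
  corpus.each do |entry|
    got = yield(entry[:text])
    hits += (got & entry[:urls]).size
    found += got.size
    expected += entry[:urls].size
  end
  { precision: hits.fdiv(found), recall: hits.fdiv(expected) }
end

baseline  = score(corpus) { |text| extract_urls(text) }
unwrapped = score(corpus) { |text| extract_urls(naive_unwrap(text)) }
puts "baseline:  #{baseline}"
puts "unwrapped: #{unwrapped}"
```

With a corpus like this in place, each regex tweak can be judged by whether both numbers move in the right direction, and the last entry is the kind of case that exposes an over-aggressive rule.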