I wrote a bit more on the crawler, and decided to first write a gopher client
library for node that I can use for my crawler. It works pretty well now. It
supports browsing by a gopher URI, or by instantiating a resource object with
hostname, port, selector and type. It can fetch files and present the result
directly in a callback, or stream the resource to disk for large downloads,
invoking a callback when the download has completed.
The library provides three things: Gopher.Client, Gopher.Resource and Gopher.Types.
Types is just a map from names like "text" and "directory" to gopher-protocol
menu-entity type identifiers (like "0" and "1"). The client can be set not to
parse menu entries, and can have its timeout adjusted or disabled.
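A minimal usage sketch, under my own assumptions about the module name, method
names and option names (only Gopher.Client, Gopher.Resource and Gopher.Types
are the library's actual exports), might look like this:

    // Hypothetical sketch: 'gopher', get(), download() and the option
    // names are my guesses; the Resource constructor arguments
    // (hostname, port, selector, type) follow the description above.
    var Gopher = require('gopher');

    // Assumed options: menu parsing on/off, timeout in milliseconds.
    var client = new Gopher.Client({ parseMenus: true, timeout: 5000 });

    // Fetch a directory by gopher URI, result delivered in a callback.
    client.get('gopher://gopher.floodgap.com/1/', function (err, menu) {
      if (err) throw err;
      console.log(menu); // parsed menu entries
    });

    // Or describe the resource explicitly and stream it to disk,
    // with a callback once the download has completed.
    var resource = new Gopher.Resource('gopher.floodgap.com', 70,
                                       '/gopher/proxy', Gopher.Types.text);
    client.download(resource, '/tmp/proxy.txt', function (err) {
      if (err) throw err;
      console.log('download complete');
    });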
The URI format is [gopher://]host[:port][/type][resource[?searchString]].
If no port is given, 70 is used; if no type is given, 1 is used; if no
resource is given, an empty string is used; and if no query is given, no
search string is sent. This scheme should work like that used by OverbiteProject.
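For illustration, the defaulting logic could be implemented along these lines;
this is my own sketch, not the library's actual regex:

    // Sketch of parsing the URI scheme described above.
    var GOPHER_URI = /^(?:gopher:\/\/)?([^\/:?]+)(?::(\d+))?(?:\/(.)(\/?[^?]*)?(?:\?(.*))?)?$/;

    function parseGopherURI(uri) {
      var m = GOPHER_URI.exec(uri);
      if (!m) return null; // not a legal gopher URI
      return {
        host: m[1],
        port: m[2] ? parseInt(m[2], 10) : 70,     // default port 70
        type: m[3] || '1',                        // default type 1
        selector: m[4] || '',                     // default empty selector
        search: m[5] !== undefined ? m[5] : null  // no query: no search string
      };
    }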
After writing the regex and logic for parsing a Gopher URI into its components,
with reasonable defaults, I wrote a unit test covering as many permutations as
I could come up with. I know some people are against doing this, and prefer to
explicitly test only what they know has to work. But I often feel that any case
not explicitly forbidden must be allowed, and must function. So the unit test
I made checks 3024 permutations of a gopher URI. I'm glad I took that
approach: the test first failed on around half of the cases, which exposed an
obvious logical error; then it failed on 12 cases, which exposed a subtle regex error.
Now all of them pass. Writing tests like these can be a bit taxing, because
some permutations may not be legal, and must be caught, or logic must be in
place to make sure they cannot occur.
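As a rough idea of the shape of such a test, the sketch below builds URIs from
every combination of a few optional components and checks the parsed defaults;
the component lists are made up for illustration and do not reproduce my
actual 3024 cases:

    // Permutation test sketch, using parseGopherURI from the sketch
    // above. The component lists here are illustrative only.
    var assert = require('assert');

    var schemes = ['', 'gopher://'];
    var ports = ['', ':70', ':7070'];
    var paths = ['', '/1', '/0/doc.txt', '/7/search?foo'];

    schemes.forEach(function (scheme) {
      ports.forEach(function (port) {
        paths.forEach(function (path) {
          var uri = scheme + 'example.com' + port + path;
          var parsed = parseGopherURI(uri);
          assert(parsed, 'should parse: ' + uri);
          assert.equal(parsed.host, 'example.com');
          assert.equal(parsed.port, port ? parseInt(port.slice(1), 10) : 70);
        });
      });
    });

Growing each component list multiplies the number of cases, which is how a
handful of short lists quickly adds up to thousands of permutations.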
I'm adding it to github tonight. Maybe I will get an account so I can add it
to npm as well.