Gnutella Web Caching System
Version 2 Specifications Client Developers' Guide
Copyright (c) 2003 Hauke Dämpfling, version 1.9.4 / 18.6.2003, http://www.gnucleus.com/gwebcache/newgwc.html
This document serves as a guide for client developers and covers how to use the "new" GWebCache system (according to the "version 2 specifications", also referred to as GWC2). This document should be considered "beta". Clients and caches using these specs have not been thoroughly tested.
GWebCache, even though it is designed for simplicity, will only work if several key functionalities are implemented by developers. Therefore, developers, read this document carefully.
To understand why this is so important: because some clients had errors in their code, people who ran GWebCaches had (and may still have) much grief, because these clients relentlessly hammered away at the servers, in some cases even continuing to hammer servers' IPs when the virtual web servers were shut down. Such utter lack of responsibility in coding put many users in a situation that they could not escape from, and such a situation must not be repeated.
Therefore, I hope that you understand why it is critical that you read and understand this entire document, and that, when you get ready to release your shiny new client with GWebCache v2 functionality, you thoroughly test its interaction with a web cache before making any releases.
A bunch of Thank Yous for support of the GWebCache project with many ideas and code: John Marshall, Robert Rainwater, Guo Xu, Tor Klingberg, Christopher Rohrs, Mike Green, Nick Randall, ...
If you have any questions, comments, suggestions, (constructive) criticisms, etc., please post them in the Forum right away.
Overview
A GWebCache is a script on a web server that clients access over normal HTTP. It stores IP addresses of Gnutella nodes and the URLs of other caches. Clients (ultrapeers) make regular updates to GWCs to keep the information fresh.
Summary of Important Things to Remember
Each of these points is described in more detail below.
- Your client must use GWebCache only if it has no other way to discover hosts. First, use your Pong cache and such.
- Your client may send updates only if it meets certain criteria. For example, it must accept incoming connections as an ultrapeer. More details below.
- In any case, your client must not send more than one request per hour. Your client will be rejected anyway, and you don't want to be rejected.
- If your client fails to contact a cache, it must not send requests to that cache again. If a cache is down, it's down!
- Remember that GWebCaches are run by volunteers in their own webspace. Do not abuse the privilege of being able to access GWebCaches, as they have limited CPU and bandwidth resources. Don't DDoS your users and service providers.
Step 1: How to store GWC data in your client
- Keep an array of GWebCache URLs, and for each URL, store a flag as to whether or not your client has successfully contacted this cache. The client may forget this flag when it exits and saves the list to (for example) a text file, but it must keep the flag in memory while running.
- Do not hardcode any cache URLs. Include a default list of GWCs with your client, but do not hardcode the URLs.
- You must remove any caches from your list that do not respond correctly. More on this later.
- Hosts will be returned in the standard numerical IP:port format (e.g. 123.45.67.8:123).
- URLs always begin with http://
- Before your client accepts new URLs into its internal list, it must make the following changes:
- If the URL contains any %XX sequences where XX is a hex string (0-9, a-f, A-F), replace them by the ASCII character with the hex value (e.g. %7E is ASCII character 0x7E, decimal 126, char "~").
- If the URL ends in "index.EXT" where EXT is any of the following: "php", "cgi", "asp", "cfm", "jsp" (this list is not complete), then trim this filename. (For example http://zero-g.net/gcache/index.cgi becomes http://zero-g.net/gcache/)
- Trim any trailing slashes (/). (For example http://zero-g.net/gcache/ becomes http://zero-g.net/gcache)
- This check is encouraged: perform a DNS lookup of the web server you are adding to your list and compare that IP address to those of the servers already in the list. Do not replace the webserver's hostname with its IP address! This would screw up virtual servers very badly. This check is meant to avoid ambiguities between hostnames that have the same IP address. For example, both "zero-g.net" and "www.zero-g.net" are working hostnames for the same site, but this should not cause duplicate entries in your list of cache URLs.
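The three required URL changes can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `normalize_cache_url` is hypothetical, the extension list is the incomplete one given above, and each %XX sequence is decoded to a single character. The optional DNS comparison is left out.

```python
import re
from urllib.parse import unquote

# Extensions named in this guide; the guide itself says the list is not complete.
INDEX_EXTENSIONS = ("php", "cgi", "asp", "cfm", "jsp")


def normalize_cache_url(url):
    """Return the cleaned-up URL, or None if it does not begin with http://."""
    if not url.lower().startswith("http://"):
        return None
    # 1. Replace every %XX sequence by the character with that hex value.
    #    latin-1 maps each byte to exactly one character (an assumption).
    url = unquote(url, encoding="latin-1")
    # 2. Trim a trailing "index.EXT" filename, keeping the slash before it.
    pattern = r"/index\.(?:%s)$" % "|".join(INDEX_EXTENSIONS)
    url = re.sub(pattern, "/", url, flags=re.IGNORECASE)
    # 3. Trim any trailing slashes.
    return url.rstrip("/")
```

Running every incoming URL through one function like this before the duplicate check means that "http://zero-g.net/gcache/index.cgi" and "http://zero-g.net/gcache/" end up as the same list entry.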
Step 2: How to interact with GWebCaches
- Your client must not exclusively rely on GWebCache. Your client must use its internal host cache (information gathered from Pongs) and X-Try headers with priority above GWebCache.
- Use a standard HTTP library. GWebCaches are regular scripts on regular web servers and therefore rely on your client understanding regular, full HTTP. (For example, 3xx responses mean "redirect" and 4xx-5xx means "error".) Make sure that your HTTP libraries provide a mechanism for identifying HTTP error codes.
- Do not use HTTP proxies. If the HTTP library you use uses proxies, they should be disabled. (Scripts need to see the client's IP.)
- This should not be an issue if you use standard HTTP libraries, but since it's happened before: make sure your libraries speak HTTP/1.1 and support virtual hosts. (For example, the "Host:" header.)
- When you contact a GWebCache, you can get four different kinds of responses, listed here. If you get anything that is not a normal GWebCache response, delete that cache's URL from your internal list.
- Normal GWebCache responses (described below)
- GWebCache error (response begins with string "ERROR")
- Invalid response (not parseable)
- HTTP error (HTTP codes 400 to 599)
- In all cases except the first, your client must forget about that cache and must not retry it. Note that in cases 2 and 3, the HTTP response code will still be 2xx ("OK"), but these responses still mean that the cache has had an error. In other words, only when you can successfully parse the response did the request succeed.
- Note that, as defined below, a GWebCache will now always output at least one line - this differs from the original GWebCache specifications, which said that GWebCache may return an empty string. Now, returning an empty file is invalid (note that "empty file" means that there may still be one or more CRLF/CR/LFs in the file).
- When contacting a web cache, pick a random cache from your internal list of caches.
- There is absolutely no reason to send more than one request per hour. Updates can be combined with Gets and Pings. Ideally, your client will make one request at startup only if necessary (more on this below), and then only one update an hour if it meets the criteria (more on this below too).
- Make sure your client can handle different end-of-line formats. Clients and servers may be on different platforms so there is no guarantee as to whether you will get CR, LF, or CRLF. As an example, here is some simple logic for converting everything to LFs: If the returned file contains any LFs, then remove all CRs, else replace all CRs by LFs.
- Your client must supply version information to a GWebCache. This is done via the "client" parameter. Version information is a 4-character string of uppercase letters (your client's ID) plus a max of 16 characters for the version number. (Examples: "GNUC1.8.4.0", "LIME2.7.9 Pro")
- IP Addresses must not begin with leading zeros, i.e. not 001.002.003.012 (this is dumb, and nobody does this anyway, but I just wanted to be clear).
- Your client will send requests via HTTP GET. This means that your request will be:
[the cache's URL] + "?" + any number of the following: [parameter name] + "=" + [escape-encoded parameter value] + "&" + [next parameter name] + "=" + [escape-encoded value] etc.
The order of the parameters does not matter. Each parameter should appear only once.
- "Escape Encoding" (RFC1738) means replacing all characters that are not letters, numbers, dashes "-", underscores "_", or periods "." with the following:
"%" + [2-character ASCII code of character in Hex]
To make this replacement:
Step 1: replace all "%" by their representation: "%25"
Step 2: replace all non-alphanumeric characters except "%", "-", "_" and "." by a percent (%) sign followed by two hex digits.
Example: "http://www.zero-g.net/gcache/gcache.php" becomes "http%3a%2f%2fwww.zero-g.net%2fgcache%2fgcache.php"
- Example requests:
http://www.server.com/path/to/gcache.cgi?client=TEST1.0&get=1
http://www.server.com/path/to/gcache.cgi?client=TEST1.0&update=1&ip=194.64.64.1%3A123&url=http%3a%2f%2fwww.otherserver.net%2fwebcache.cgi
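As a rough illustration of the escape encoding and request format described above, here is a minimal Python sketch. The names `escape_encode` and `build_request` are hypothetical. Two choices are assumptions rather than rules from this guide: hex digits are written in lowercase (as in the guide's own example; uppercase is equivalent), and non-ASCII input is encoded as UTF-8 first. A single pass over the bytes gives the same result as the two-step replacement.

```python
# Characters that are passed through unchanged, as listed in this guide.
SAFE_CHARACTERS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_."
)


def escape_encode(value):
    """Replace every character outside SAFE_CHARACTERS by %XX."""
    encoded = []
    for byte in value.encode("utf-8"):
        char = chr(byte)
        encoded.append(char if char in SAFE_CHARACTERS else "%%%02x" % byte)
    return "".join(encoded)


def build_request(cache_url, parameters):
    """Join the cache URL and the escape-encoded name=value pairs."""
    query = "&".join(
        "%s=%s" % (name, escape_encode(str(value)))
        for name, value in parameters.items()
    )
    return cache_url + "?" + query
```

For instance, `build_request("http://www.server.com/path/to/gcache.cgi", {"client": "TEST1.0", "get": 1})` yields the first example request shown above.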
Step 3: GWebCache output format
- Output of a GWebCache is in line-by-line format, according to the following pattern:
x|field1|field2|field3|...
- "x" can be either "I" = Informational, "U" = URL, "H" = Host. So far, the following responses have been defined:
I - Informational Response
- field 1: pong
field 2: (version string)
Included in response to a Ping request, returns GWebCache version
- field 1: update
field 2: OK
Returned when the update completed successfully (but possibly there were warnings!)
field 2: WARNING
field 3: "You came back too early", "Rejected IP" or "Rejected URL" (others may be added as needed)
A WARNING response to an update generally means that your client did something wrong. Note that warnings can appear in addition to an OK response.
- field 1: nothing
Returned when there is no other output, so your client doesn't get bored. (Actually, this is because GWC must always output at least one line.)
U - URLs
- field 1: URL
The URL of the alternate cache, beginning with http://
- field 2: age
The time since submission of this URL to the cache in seconds
H - Hosts
- field 1: Host:Port
The Host:Port of a host
- field 2: age
The time since submission of this host to the cache in seconds
- Your client should of course be prepared to expect any other responses, as long as they are in the above format: they begin with a character (a-z, A-Z, 0-9), then a pipe (|), then any number of characters and pipes. Also make sure your client can handle extensions to the above formats (for example, expect to have more information following an "I|pong|(version)" response, i.e. something like "I|pong|(version)|something|else" etc.). In other words, your parser should be very general.
- A GWebCache may also provide an extra HTTP header for your client, "X-Remote-IP". This header is analogous to the "Remote-IP" header provided in the Gnutella handshaking protocol, with the difference that it cannot be trusted as much. Trust the Remote-IP header from Gnutella connections instead. X-Remote-IP is what the web server thinks your IP address is, and this could be wrong due to transparent proxies and the like.
- Example responses:
- Short response to a simple Get:
H|127.0.0.2:321|400
H|127.0.0.1:123|4456
U|http://www.server2.com/gcache/gcache.cgi|400
U|http://www.server.net/gcache/gcache.cgi|4456
- Response to an update combined with a ping:
I|pong|GWebCache 0.9.0b
I|update|WARNING|You came back too early
- Some responses that are currently not given but that are valid and your parser should still handle:
I|whatever
I|blah||bar
H|192.168.0.1:123|321||foo
U|http://gcache.com|321|xyz|
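A deliberately general parser for this output format might look like the Python sketch below. The function names are hypothetical. It applies the end-of-line logic from Step 2, treats a body beginning with "ERROR" and an empty body as failures, and otherwise keeps every field without interpreting it. Raising `ValueError` for every failure case is an assumption of this sketch, not something the guide prescribes.

```python
def normalize_newlines(text):
    """If the text contains any LF, remove all CRs; else turn CRs into LFs."""
    if "\n" in text:
        return text.replace("\r", "")
    return text.replace("\r", "\n")


def parse_response(body):
    """Return a list of (kind, fields) tuples, one per non-empty line.

    Raises ValueError for a cache error, an empty body, or a line that does
    not start with one letter or digit followed by a pipe.
    """
    text = normalize_newlines(body)
    if text.startswith("ERROR"):
        raise ValueError("cache reported an error")
    lines = [line for line in text.split("\n") if line.strip()]
    if not lines:
        raise ValueError("an empty response is invalid")
    records = []
    for line in lines:
        kind, separator, rest = line.partition("|")
        valid_kind = len(kind) == 1 and kind.isascii() and kind.isalnum()
        if not separator or not valid_kind:
            raise ValueError("unparseable line: %r" % line)
        records.append((kind, rest.split("|")))
    return records
```

Because fields are returned as plain lists, extended lines such as "H|192.168.0.1:123|321||foo" pass through unchanged and the caller can ignore the fields it does not understand.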
Step 4: How to make updates to a cache
- To make an update, your client must meet the following criteria. Note that these are the same as the standard Ultrapeer criteria:
- Your client must have been online (running & connected) for at least an hour.
- Your client must accept incoming connections. (This is usually tested by keeping track of whether or not your client has received any incoming connections.)
- In other words, leaf nodes must not send updates.
- Your client must support the Remote-IP Gnutella header. This header is essential for a client so that it can find its own IP address (for example, if your client is behind a firewall or NAT router). If your client does not yet support this header, you should start supporting it now. Ask on the GDF if you have any questions regarding implementation.
- If your client meets these criteria, your client should send updates once an hour. This is limited by the GWebCache and any updates sent too early will be rejected. Again, there is absolutely no reason to send more than one request per hour to a GWC.
- Updates are sent through the following parameters:
update=1
ip=[your client's numerical IP]:[your client's port for incoming connections]
url=[the url of a web cache that your client has successfully contacted]
- Notes
- The IP address you send must be your client's IP address. This IP address will be checked against the one that the server sees. If your client is behind a transparent HTTP proxy, there is not much you can do about it; your updates will most likely fail. However, if your IP address is rejected ("I|update|WARNING|Rejected IP") on more than one cache, then your client should consider not sending any updates.
- The URL you send must be one that your client has successfully contacted. This is why, as stated above, you must keep track of which caches your client has successfully contacted.
For example, Gnucleus keeps GWebCaches flagged with either "Alive" or "Untested". Any web cache that is added to the internal list is initially flagged as "Untested". When making Get requests, Gnucleus uses a cache flagged as "Untested". If the cache is successfully contacted, the URL is flagged as "Alive". When making updates, Gnucleus sends the update to an "Untested" cache, and sends an "Alive" cache in the url parameter.
- Don't forget that the parameter values must be URL-escape-encoded. (See the above explanation.)
- Examples:
- To send an update to the cache running at "http://www.server.com/path/to/gcache.cgi" with your IP/port 194.64.64.1:123 and sending the URL "http://www.otherserver.net/webcache.cgi":
http://www.server.com/path/to/gcache.cgi?client=TEST1.0&update=1&ip=194.64.64.1%3A123&url=http%3a%2f%2fwww.otherserver.net%2fwebcache.cgi
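The update criteria and the update request can be sketched together in Python. All function and parameter names here are hypothetical. The sketch uses the standard library's `urllib.parse.quote` for escaping, which emits uppercase hex digits and leaves "~" unescaped; that differs slightly from the rule in Step 2, so treat it as an assumption. Support for the Remote-IP header is not modelled.

```python
from urllib.parse import quote

ONE_HOUR = 3600  # seconds


def may_send_update(uptime_seconds, accepts_incoming, is_leaf,
                    seconds_since_last_request):
    """Check the update criteria: online for an hour, reachable, not a leaf,
    and no other request sent to a cache within the last hour."""
    return (uptime_seconds >= ONE_HOUR
            and accepts_incoming
            and not is_leaf
            and seconds_since_last_request >= ONE_HOUR)


def build_update_url(target_cache, client, own_ip, own_port, alive_cache_url):
    """Build the GET URL for an update sent to target_cache.

    alive_cache_url must be a cache this client has successfully contacted.
    """
    parameters = [
        ("client", client),
        ("update", "1"),
        ("ip", "%s:%d" % (own_ip, own_port)),
        ("url", alive_cache_url),
    ]
    query = "&".join(name + "=" + quote(value, safe="")
                     for name, value in parameters)
    return target_cache + "?" + query
```

A client would call `may_send_update` once an hour at most and skip the request entirely when it returns False.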
Step 5: How to request information from a GWebCache
- When your client needs IP addresses to connect to, first try your internal host cache (information gathered from Pongs and X-Try headers). On startup, your client should try about 20 IPs from its internal cache, and only then should it contact a GWebCache.
- Requesting information is simple: send the following parameter:
get=1
- If the GWebCache has hosts and/or URLs stored, it will return them according to the format defined above.
- Examples:
http://www.server.com/path/to/gcache.cgi?client=TEST1.0&get=1
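The order of operations for host discovery can be sketched as below. This is a simplified Python outline, not a complete client: `CacheList`, `find_hosts`, `try_connect` and `fetch_hosts` are hypothetical names, the network calls are passed in as functions, and the one-request-per-hour limit is assumed to be enforced elsewhere.

```python
import random

HOSTS_TO_TRY_FIRST = 20  # "about 20" IPs from the internal host cache


class CacheList:
    """Cache URLs with a flag: True = contacted successfully, False = untested."""

    def __init__(self, urls):
        self.alive = {url: False for url in urls}

    def pick(self, rng=random):
        """Pick a random cache URL, or None if the list is empty."""
        return rng.choice(list(self.alive)) if self.alive else None

    def mark_alive(self, url):
        self.alive[url] = True

    def drop(self, url):
        self.alive.pop(url, None)


def find_hosts(host_cache, caches, try_connect, fetch_hosts):
    """Try locally known hosts first; ask one GWebCache only if none connect.

    fetch_hosts(url) is expected to return a list of hosts, or None on any
    error (HTTP error, "ERROR" response, unparseable output).
    """
    for host in host_cache[:HOSTS_TO_TRY_FIRST]:
        if try_connect(host):
            return [host]
    url = caches.pick()
    if url is None:
        return []
    hosts = fetch_hosts(url)
    if hosts is None:
        caches.drop(url)  # a failed cache is forgotten and not retried
        return []
    caches.mark_alive(url)
    return hosts
```

Keeping the "alive" flag next to each URL also gives the update step a ready supply of successfully contacted caches for the url parameter.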
Extras: Using the "Network" Parameter
- GWebCache now supports storing more than one list of Hosts/URLs. A cache owner may enable his/her cache to store more than just the default Gnutella hosts. Your client should simply send the extra parameter: "net=[name of network]". When you contact a cache, there are two situations:
- The cache supports the network you are asking for. Interaction with the GWC will be unchanged.
- The cache does not support the network you are asking for. The following things will happen:
- The cache will send the extra response "I|net-not-supported"
- When sending Updates: The cache will assume that the URL you are submitting supports the network that you are asking for (!). The URL will be stored internally along with the network name. Any other clients that ask for this network will be given this URL as a kind of "redirect" or "try other".
- When sending Gets: If the cache knows about a URL that supports this network then it will return that URL. Think of this as a "redirect".
- Examples:
http://www.server.com/path/to/gcache.cgi?client=TEST1.0&net=shareaza&get=1
http://www.server.com/path/to/gcache.cgi?client=TEST1.0&net=shareaza&update=1&ip=194.64.64.1%3A123&url=http%3A%2f%2fwww.otherserver.net%2fwebcache.cgi
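One way a client could interpret a response to a request that carried the net parameter is sketched here in Python. The function name is hypothetical, and the input is assumed to be the response already split into lines. What the client does with a cache that reports "net-not-supported" (keep it, or stop asking it for that network) is not specified in this guide and is left to the caller.

```python
def interpret_net_response(lines):
    """Return (supported, hosts, urls) for a response to a net=... request.

    supported is False when the cache sent "I|net-not-supported"; in that
    case any returned URLs are "redirects" to caches that may support the
    requested network.
    """
    supported = True
    hosts, urls = [], []
    for line in lines:
        fields = line.split("|")
        if len(fields) < 2:
            continue
        if fields[0] == "I" and fields[1] == "net-not-supported":
            supported = False
        elif fields[0] == "H":
            hosts.append(fields[1])
        elif fields[0] == "U":
            urls.append(fields[1])
    return supported, hosts, urls
```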
Extras: Using the Timestamp information
- This feature is experimental. We will keep the timestamp information but might add more information as we see necessary.
- As you may have noticed, GWC returns the "age" (time since submission) of all URLs and IPs it stores. This information is provided as a kind of "freshness" information.
- What your client can do with this information:
- If you notice that the information in the cache is "very fresh" then your client can consider not sending an update for a while. For example: if you notice that a cache has information that was submitted less than a minute ago, you can wait two hours instead of one until you send an update.
- Be very careful with this: If you notice that the information in the cache is very old, then your client can consider sending an update a little earlier. For example: if you notice a cache hasn't gotten an update for more than an hour, you can send an update right away. Remember, this is very dangerous - your client should still not send more than one request an hour.
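A conservative way to use the age values is sketched below in Python. The function name and the exact thresholds are assumptions taken from the examples above (one minute for "very fresh", two hours for the longer wait). The sketch deliberately never goes below one hour, so stale data does not lead to more than one request per hour.

```python
ONE_HOUR = 3600  # seconds
VERY_FRESH = 60  # seconds; "submitted less than a minute ago"


def next_update_delay(ages_seconds):
    """Suggest how long to wait before the next update to this cache.

    ages_seconds holds the age fields from the H and U lines of the last
    response. Fresh data doubles the wait; otherwise the normal one-hour
    interval applies.
    """
    if ages_seconds and min(ages_seconds) < VERY_FRESH:
        return 2 * ONE_HOUR
    return ONE_HOUR
```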
Extras: Clustering Information
- The GWC2 beta supports the new "cluster=[keywords]" parameter. This functionality is currently for testing of this feature, so consider it "alpha".
- On update requests, if you include the extra parameter "cluster=keyword1,keyword2,...", these keywords will be stored along with the host you submit.
- The following limitations are placed on the keyword string: it may only contain the characters [A-Za-z0-9.-_:], and it may not be longer than 256 characters (yes, the entire keyword string).
- Characters that aren't allowed are stripped and any keywords beyond the 256 characters are dropped.
- On get requests, the keywords are returned in the field after the age parameter, like so:
H|127.0.0.2:321|400|keyword1,keyword2,...
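Since the cache strips disallowed characters and drops overflowing keywords anyway, a client can apply the same limits before sending. The Python sketch below is one possible reading of those rules, with hypothetical function names. It assumes the allowed set is letters, digits, ".", "-", "_" and ":", and that the commas separating keywords count toward the 256-character limit.

```python
import re

MAX_CLUSTER_LENGTH = 256  # limit on the entire keyword string


def build_cluster_value(keywords):
    """Strip disallowed characters and drop keywords that would not fit."""
    kept = []
    for keyword in keywords:
        keyword = re.sub(r"[^A-Za-z0-9._:\-]", "", keyword)
        if not keyword:
            continue
        if len(",".join(kept + [keyword])) > MAX_CLUSTER_LENGTH:
            break
        kept.append(keyword)
    return ",".join(kept)


def parse_host_line(line):
    """Return (host, age_seconds, keywords) from an "H|..." line."""
    fields = line.split("|")
    has_keywords = len(fields) > 3 and fields[3] != ""
    keywords = fields[3].split(",") if has_keywords else []
    return fields[1], int(fields[2]), keywords
```

`parse_host_line` also accepts host lines from caches that do not return keywords, in which case the keyword list is empty.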
v1.9.4
- Changed "alpha" to "beta" status
- Added clustering information
- Smaller corrections and updates
v1.9.3.4
- Replaced "Important Traffic Issues" by "Summary of Important Things to Remember"
v.1.9.3.3
- Added Timestamp information
v1.9.3.2
- Added Traffic section
v.1.9.3.1
- Clarified Remote-IP/X-Remote-IP issues
v.1.9.3
- First release of "Developers' Guide"
See also: http://www.gnucleus.com/
Copyright (c) 2003 Hauke Dämpfling. License Terms: FDL.