README for ziproxy (CVS)

Ziproxy - a compression http proxy
Copyright (C)2003-2004 Juraj Variny <variny@naex.sk>
Copyright (C)2005-2006 Daniel Mealha Cabrita <dancab@gmx.net>

	This program is free software; you can redistribute it and/or modify
	it under the terms of the GNU General Public License as published by
	the Free Software Foundation; either version 2 of the License, or
	(at your option) any later version.

	This program is distributed in the hope that it will be useful,
	but WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
	GNU General Public License for more details.

	You should have received a copy of the GNU General Public License
	along with this program; if not, write to the Free Software
	Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111 USA


Ziproxy is a forwarding, non-caching, compressing proxy server. It
squeezes images by converting them to low-quality JPEGs and can
optionally gzip HTML and other text-like data. It is intended to
free bandwidth on dial-up connections. It can be run from inetd or
xinetd or, even better, in daemon mode.

 Why?

HTML is plain text, and as such is large and compresses very well.
Most web browsers can receive content in gzip-compressed form and
display it as normal. Not many people know about this ability, so it
isn't used very much. Using it speeds up web access: text files that
a web server sends uncompressed can be compressed by this proxy and
then sent over a slower internet connection (like dial-up). As an
example of the speed increase, a 100 KB HTML page can be compressed
down to around 7 KB by this proxy. Admittedly, this is a shameless
advertisement: it does not account for modem hardware compression,
though that is considerably less efficient. Even for browsers that
don't support gzip there is a workaround using SSH port forwarding,
which can yield even better compression and response times.
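
To get a feel for how well HTML compresses, the following minimal
Python sketch gzips a synthetic page of roughly 100 KB. The page
content is made up for illustration and is far more repetitive than a
real page, so the ratio it prints is not representative of what
ziproxy achieves on real traffic.

```python
import gzip

# Build a synthetic, highly repetitive HTML page of roughly 100 KB.
# (Assumption: real pages are less regular and compress less well.)
row = '<tr><td class="cell"><a href="/item">Item description</a></td></tr>\n'
html = ("<html><body><table>\n" + row * 1500 + "</table></body></html>\n").encode("ascii")

# Compress with gzip, the same format browsers accept via Content-Encoding.
compressed = gzip.compress(html)

ratio = len(compressed) / len(html)
print(f"original: {len(html)} bytes, gzipped: {len(compressed)} bytes ({ratio:.1%})")
```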

Moreover, images on most pages are of unnecessarily high quality
and/or saved in an unsuitable format. On average, ziproxy compresses
images to about one third of their original size, with only a
marginal visible decrease in quality. Animated GIFs are stopped, too.

The idea is that you install this at your ISP, or on a fast server
on the internet (the "remote host"). You then use this proxy for your
dial-up connections to the web from the "local host".

 Requirements on remote host

* libungif 

* libpng 

* libjpeg-6b 

* zlib

* libconfuse (library for parsing of configuration files), available 
  at http://www.stacken.kth.se/~mhe/confuse.shtml

* GCC and GNU make. BSD make may work. Sun Make/CC doesn't.

 Installation

To see your options, run: 

$ ./configure --help 

Then, running:

$ ./configure 
$ make 
$ make install 

should compile and install the 'ziproxy' binary. There are also
optional test programs, which are compiled and installed when the
--enable-testprogs option is passed to configure:

  modifytest tests HTML modification. It reads the file from stdin
  and writes the result to stdout.

  imgtest does the same for images. Specify the input and output file
  names as command line parameters.

  cfgtest can be used to check the configuration file parser and the
  default values.

 Command line

ziproxy <-d|-i> [-c </path/to/ziproxy.conf>] [-f <IP.address or hostname>] [-h]

-d runs ziproxy in daemon mode.
Either this or '-i' is mandatory, both are mutually exclusive.

-i runs ziproxy in [x]inetd mode.
Either this or '-d' is mandatory, both are mutually exclusive.
Use this when invoking ziproxy from either inetd or xinetd.

-c configuration file, full path (typically /etc/ziproxy.conf)
Optional, if unspecified use the internal default.

-f same as the "OnlyFrom=" option in configuration file,
but with higher precedence.
Optional, if unspecified this option won't be used.

-h display available command line options



 Configuration file

The default location for the configuration file is the current directory.

 daemon mode-only options

  WhereZiproxy
  This option is obsolete.

  Port=8080 Port number Ziproxy uses to listen for connections.

  OnlyFrom="an.IP.address" Accept requests only from the specified
  hostname/IP address. You can also specify a range of IP addresses
  with OnlyFrom="begin.IP.address-end.IP.address". Default is empty
  (connections are accepted from everywhere).
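
The OnlyFrom check can be pictured with the following Python sketch.
It is not ziproxy's code: the function name only_from_allows is
hypothetical, only numeric IPv4 addresses are handled (hostnames are
left out), and treating both ends of the range as included is an
assumption.

```python
import ipaddress

def only_from_allows(only_from, client_ip):
    """Return True if client_ip passes an OnlyFrom-style restriction."""
    if not only_from:
        return True  # empty value: connections are accepted from everywhere
    client = ipaddress.IPv4Address(client_ip)
    if "-" in only_from:
        # Range form: "begin.IP.address-end.IP.address"
        begin_text, end_text = only_from.split("-", 1)
        begin = ipaddress.IPv4Address(begin_text.strip())
        end = ipaddress.IPv4Address(end_text.strip())
        return begin <= client <= end  # assumption: both ends inclusive
    # Single-address form
    return client == ipaddress.IPv4Address(only_from.strip())

print(only_from_allows("192.168.1.10-192.168.1.20", "192.168.1.15"))
```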

  NetdTimeout=240 If no connection appears, Ziproxy will exit after 
  specified time in seconds. Set to 0 to disable this.

  MSIETest=true/false If inetd or xinetd and MSIE both run under
  Windows 2000/XP, MSIE will complain about a broken connection. This
  can be avoided by setting this option to true. However, inetd will
  then start 3 processes instead of one for every request, which is
  not convenient for everyday use. With this option set, ziproxy uses
  the system() function instead of exec().

 general options

  Gzip=true/false Whether ziproxy should compress data itself.
  The browser must accept compressed data. Default: true. If you're
  using ssh with compression, turn this off to prevent unnecessary
  double compression.

  Compressible={"shockwave","msword","java"} Specifies MIME data
  types under application/ which ziproxy should compress too.
  Default: empty. The type given in the server's response is
  processed as follows (example: "application/x-javascript"):

* the leading "application/" is discarded (result: "x-javascript")

* if the result begins with "x-", that is discarded too (result: "javascript")

* the beginning of the result is compared with all strings specified
  in the Compressible option. If one matches, ziproxy will compress
  the data. (Here the third string above, "java", matches the
  leading "java".)
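
The three matching steps above can be sketched in Python as follows.
This illustrates the described rule and is not ziproxy's actual code;
the function name is_compressible is hypothetical, and stripping
parameters such as "; charset=..." and ignoring case are assumptions.

```python
def is_compressible(content_type, compressible):
    """Apply the Compressible matching rule to a Content-Type value."""
    # Assumption: drop parameters such as "; charset=..." and ignore case.
    mime = content_type.split(";", 1)[0].strip().lower()
    prefix = "application/"
    if not mime.startswith(prefix):
        return False  # the option only covers types under application/
    subtype = mime[len(prefix):]      # step 1: discard "application/"
    if subtype.startswith("x-"):
        subtype = subtype[2:]         # step 2: discard a leading "x-"
    # Step 3: compare the beginning of the result with each configured string.
    return any(subtype.startswith(entry) for entry in compressible)

option = ["shockwave", "msword", "java"]
print(is_compressible("application/x-javascript", option))
```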

  ImageQuality={17,20,23,25} This option must either have exactly 4
  values or not be present at all. The numbers give the requested
  quality of the output JPEG images, based on the size of the image
  (width*height in pixels), respectively:

1. less than 5000 pixels

2. between 5000 and 50000 pixels or one dimension is smaller than 
  150 pixels

3. between 50000 and 250000 pixels

4. more than 250000 pixels

Each number has the following meaning:

* between -100 and -1: convert image to grayscale JPEG with given quality

* 0: do nothing with image

* between 1 and 100: convert image to color JPEG with given quality. 
  If the source image is grayscale, the resulting JPEG may be 
  grayscale too. But ziproxy isn't always able to detect grayscale 
  source images.

For example, ImageQuality={-15,20,25,0} means: Images less than 5000 
pixels will be converted to grayscale JPEG with quality of 15. 
Images between 5000 and 50000 pixels will be converted to color JPEG 
with quality of 20. Images between 50000 and 250000 pixels will be 
converted to color JPEG with quality of 25. Images larger than 
250000 pixels will be unchanged.
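
The selection rule can be sketched in Python as below. This is a
reading of the description above, not ziproxy's code. The function
names are hypothetical, and two points are assumptions: the size
classes are tested in the listed order (so the "one dimension smaller
than 150 pixels" rule only applies to images of 5000 pixels or more),
and an image exactly on a boundary falls into the larger class.

```python
def pick_quality(width, height, image_quality=(17, 20, 23, 25)):
    """Choose the ImageQuality entry for an image of the given dimensions."""
    pixels = width * height
    if pixels < 5000:
        index = 0
    elif pixels < 50000 or min(width, height) < 150:
        index = 1  # assumption: narrow images of any larger size land here
    elif pixels < 250000:
        index = 2
    else:
        index = 3
    return image_quality[index]

def describe(quality):
    """Translate one ImageQuality number into the action it requests."""
    if quality == 0:
        return "leave image unchanged"
    if quality < 0:
        return "grayscale JPEG, quality %d" % -quality
    return "color JPEG, quality %d" % quality

example = (-15, 20, 25, 0)
for size in [(50, 50), (200, 200), (400, 400), (1000, 1000)]:
    print(size, describe(pick_quality(size[0], size[1], example)))
```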

  ZiproxyTimeout=90 If processing of a request exceeds the specified
  time in seconds, it will be aborted with an error.

  UseContentLength=true/false By default, if ziproxy is
  modifying/compressing data, it begins sending it only after its
  length can be determined (UseContentLength=true). If you turn this
  option off, ziproxy will start sending data sooner, which makes
  browsing feel more responsive. However, because the browser doesn't
  know the data length, it will be unable to distinguish a broken
  connection from a properly closed one. If you use SSH compression
  instead and your browser identifies itself as HTTP/1.1, you need
  not unset this (ziproxy will then send "chunked" content to the
  browser).

  MaxSize=bytes Ziproxy checks this limit before doing any data
  modifications. Because it needs to hold the whole data in memory
  (for images in particular, the memory needed is many times greater
  than the compressed image size), it is useful to set this limit.
  Note that bigger MaxSize values also increase the request reply
  latency (since the file must be fully loaded before processing).
  Default: 0 (size checking is off).

  MinTextStream=bytes
  Minimum size of a text file (i.e. gzip-able data) for it to be
  streamed while being compressed. Files smaller than this will be
  completely compressed before being streamed, being temporarily
  stored in /tmp (or an equivalent) dir.
  For performance reasons, streaming while compressing is better (no disk overhead),
  but such files won't have the compressed filesize reported in logs
  (they will show '-1' as compressed size).
  Default: 20000 bytes.

  ViaServer="something" If specified, ziproxy will send and check the
  Via: header with the given string as host identification. This is
  sometimes useful to avoid request loops. Default: not specified
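
How a Via: header helps detect request loops can be sketched in
Python as follows. This shows the general HTTP mechanism (each hop in
a Via value has the form "protocol-version received-by [comment]"),
not ziproxy's actual implementation. The function names and the
identification string "ziproxy.example" are hypothetical, and
comments containing commas are not handled.

```python
def via_mentions(via_header, my_id):
    """Return True if my_id already appears as a hop in a Via header value."""
    for hop in via_header.split(","):
        fields = hop.split()
        # fields[0] is the protocol version, fields[1] the received-by host.
        if len(fields) >= 2 and fields[1].lower() == my_id.lower():
            return True
    return False

def add_via(via_header, my_id, version="1.1"):
    """Append this proxy's hop to an existing (possibly empty) Via value."""
    hop = "%s %s" % (version, my_id)
    return "%s, %s" % (via_header, hop) if via_header else hop

incoming = "1.0 upstream.example"
if via_mentions(incoming, "ziproxy.example"):
    print("request loop detected")
else:
    print("forwarding with Via:", add_via(incoming, "ziproxy.example"))
```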

  ModifySuffixes=true/false If your browser cannot recognize images
  whose filename suffix differs from the real image type, turn this
  on. Since most browsers can, the default is false.
  
  AllowLookChange=true/false When ziproxy compresses transparent or
  animated images, the resulting change in the look of the page is
  sometimes too drastic.
  Setting this option to false makes ziproxy avoid compressing these
  images. Default: false (true in pre-2.0.0 versions).

  ProcessJPG=true/false If false, ziproxy will not try to recompress
  JPEG format files. Default: true.

  ProcessPNG=true/false If false, ziproxy will not try to recompress
  PNG format files. Default: true.

  ProcessGIF=true/false If false, ziproxy will not try to recompress
  GIF format files. Default: true.

  PreemptNameRes=true/false Preemptive name resolution. If true and
  the processed file is an HTML file, ziproxy will try to resolve all
  the hostnames present in it (in the hope that the resolved
  names will be cached by the DNS or name cache, external to Ziproxy).
  Ziproxy will _not_ cache any hostname by itself! (Try PDNSD, etc.)
  If the user then clicks a link on a page previously processed by
  Ziproxy, there will be no delay due to name resolution.
  Warning: This option will increase the DNS traffic many times over.
  Default: false (true in pre-2.0.0 versions).

  PreemptNameResMax=50 Maximum hostnames Ziproxy will try to resolve
  in a preemptive manner (see PreemptNameRes). Default: 50

  PreemptNameResBC=true/false Bogus check for hostnames Ziproxy
  will try to resolve in a preemptive manner (see PreemptNameRes).
  Currently, if enabled, ziproxy ignores hostnames other than those
  ending with .nnnn, .nnn or .nn (e.g. .info, .com, .br...)
  Default: false
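
The bogus check can be pictured as a suffix test, sketched here in
Python. This is an interpretation, not ziproxy's code: it assumes
that each "n" in the patterns above stands for one letter, and the
name passes_bogus_check is hypothetical.

```python
import re

# A final label of two to four letters, e.g. ".br", ".com", ".info".
# Assumption: each "n" in .nn/.nnn/.nnnn means one ASCII letter.
_SUFFIX = re.compile(r"\.[a-z]{2,4}$", re.IGNORECASE)

def passes_bogus_check(hostname):
    """Return True if the hostname would be kept for preemptive resolution."""
    return _SUFFIX.search(hostname) is not None

for name in ["www.example.com", "example.br", "localhost", "192.168.0.1"]:
    print(name, passes_bogus_check(name))
```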

  TransparentProxy=true/false Allow processing of requests as a
  transparent proxy (normal proxy requests will still be accepted).
  To use Ziproxy as a transparent proxy, you also need to reroute
  the connections from x.x.x.x:80 to ziproxy.host:PROXY_PORT.
  Default: false

  CustomError400="/full/path/error_file.html"
  Custom error message for "Bad request"
  (malformed URL, or unknown URL type)
  Default: (internal error message)

  CustomError404="/full/path/error_file.html"
  Custom error message for "Unknown host"
  (Ziproxy will not issue 'page not found' errors itself)
  Default: (internal error message)

  CustomError408="/full/path/error_file.html"
  Custom error message for "Request timed out"
  Default: (internal error message)

  CustomError500="/full/path/error_file.html"
  Custom error message for "Internal error"
  (or empty response from server)
  Default: (internal error message)

  CustomError503="/full/path/error_file.html"
  Custom error message for "Connection refused"
  (or service unavailable)
  Default: (internal error message)

  PasswdFile="/full/path/ziproxy.passwd"
  If enabled, requires authentication from clients wishing to connect
  to the proxy.
  The specified file should contain:
    user:pass pairs,
    lines no longer than 128 chars
  Note: The password is unencrypted
  Default: No file specified (thus no authentication required)
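
A file in this format can be read with a few lines of Python, for
example to check it before deploying it. This sketch is not ziproxy's
parser: the function names are hypothetical, and silently skipping
blank, over-long or colon-less lines is an assumption about how such
lines should be treated.

```python
def load_passwd(lines):
    """Parse user:pass lines into a dict, skipping unusable lines."""
    users = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Assumption: blank, over-long (> 128 chars) or colon-less lines are skipped.
        if not line or len(line) > 128 or ":" not in line:
            continue
        user, password = line.split(":", 1)
        users[user] = password
    return users

def check_login(users, user, password):
    """Compare against the stored plain-text password."""
    return users.get(user) == password

sample = ["alice:secret\n", "bob:pa:ss\n", "\n", "not-a-pair\n"]
users = load_passwd(sample)
print(sorted(users))
```

With a real file, pass an open file object instead of the sample
list, e.g. load_passwd(open(path)) with path set to the PasswdFile
value.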


 Logging options

Logging output is intended mainly for debugging. If neither LogFile 
nor LogPipe option is found, logging is turned off (this is the default).

  LogFile="file_name" Append log output into file_name. Specified 
  string is passed to strftime() function first with current date/time.
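
Because the name goes through strftime(), date conversions such as
%Y, %m and %d can be used to get one log file per day. The Python
sketch below shows the expansion using Python's own time.strftime;
the pattern and path in it are made-up examples, not defaults, and
the function name expand_log_name is hypothetical.

```python
import time

def expand_log_name(pattern, when=None):
    """Expand strftime conversions in a LogFile-style pattern."""
    if when is None:
        when = time.localtime()  # current date/time
    return time.strftime(pattern, when)

# Hypothetical pattern: one log file per day.
print(expand_log_name("/tmp/ziproxy-%Y-%m-%d.log"))
```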

  AccessLogFileName="/something_like/var/log/ziproxy/access.log"
  File to be used as access log.
  Log format (columns):
    TIME (unix time as seconds.msecs),
    PROCESS_TIME (ms, from receiving request to last byte sent to client),
    ADDRESS (daemon mode only, with [x]inetd it displays a '?'),
    FLAGS,
    ORIGINAL_SIZE,
    SIZE_AFTER_(RE)COMPRESSION,
    METHOD,
    URL.
  Where FLAGS may be: P (a request as proxy), T (a request as transparent proxy), S (SSL data).
  Default: No file specified (thus no access logging)
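
For post-processing the access log, a small parser along these lines
may be a starting point. It is a sketch under assumptions: the
columns are taken to be separated by whitespace with FLAGS never
empty, which the description above does not state, and the sample
line is invented for illustration. Check both against a real log
before relying on it. The names AccessEntry, parse_access_line and
saved_ratio are hypothetical.

```python
from typing import NamedTuple

class AccessEntry(NamedTuple):
    time: float           # unix time as seconds.msecs
    process_ms: int       # processing time in ms
    address: str          # client address, or '?' under [x]inetd
    flags: str            # P, T and/or S
    original_size: int
    compressed_size: int  # -1 when the size was not recorded
    method: str
    url: str

def parse_access_line(line):
    """Split one access log line into its eight columns."""
    fields = line.split(None, 7)  # assumption: whitespace-separated columns
    if len(fields) != 8:
        raise ValueError("expected 8 columns, got %d" % len(fields))
    return AccessEntry(float(fields[0]), int(fields[1]), fields[2], fields[3],
                       int(fields[4]), int(fields[5]), fields[6], fields[7].strip())

def saved_ratio(entry):
    """Fraction of bytes saved, or None if it cannot be computed."""
    if entry.compressed_size < 0 or entry.original_size <= 0:
        return None
    return 1 - entry.compressed_size / entry.original_size

# Invented sample line, for illustration only.
sample = "1143650000.123 250 10.0.0.5 P 100000 7000 GET http://example.com/index.html"
entry = parse_access_line(sample)
print(entry.url, saved_ratio(entry))
```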

  LogPipe={"command","-arg1","-arg2"} This is incompatible with
  xinetd, use with netd only! It pipes all logging output through
  command. If the LogFile option is present too, the standard output
  of command is redirected to that file. For example, if you don't
  have enough space (low quota) on the remote host, you can compress
  the logging output on the fly. The disadvantage is that the logfile
  is usable only after Ziproxy exits (or you can end it with ^C).

  NextProxy="host.name" 

  NextPort=8080 Forward everything to another proxy server. 
  Modifications/compression is still applied.

 Compiling under cygwin -- TODO

It compiles almost the same way. However, you may want to avoid
libungif's dependence on X11. To do so, add the -static option to
the LDFLAGS variable in the Makefile:

LDFLAGS = -g $(SYSV_LIBS) -static -lgif -lpng -ljpeg -lm -lz -lconfuse

This has the further advantage that the statically linked executable
ziproxy.exe can be transferred, together with cygwin1.dll, to another
machine where cygwin need not be installed.
An xinetd server is also available as a cygwin package.

 Usage

 With inetd

In /etc/inetd.conf, add the following line (all on one line), where
<location> is the directory where you put the executable:

ziproxy stream tcp nowait.500 root /usr/sbin/tcpd <location>/ziproxy -i -c <location>/ziproxy.conf

In /etc/services, add the following line, where <port> is the port
you want the proxy to listen on:

ziproxy <port>/tcp

then restart inetd.

 With xinetd

See the example config file included in this tarball.

 Daemon mode (standalone operation)

It is intended as a simple inetd replacement if you want to run
ziproxy under an unprivileged user account. Every time you connect to
the internet, log in to the remote machine and start it with the command

- for daemon mode:

./ziproxy -d -c 'somewhere/ziproxy.conf' -f your.IP.address

- for port forwarding:

./ziproxy -d -c 'somewhere/ziproxy.conf' -f 127.0.0.1 

Or set OnlyFrom=127.0.0.1 in ziproxy.conf instead of using the -f
switch. Then it will accept requests only from your machine. If you
forget to kill Ziproxy before hanging up, it times out (according to
the NetdTimeout option).

 Automated SSH logins

Use SSH public key authentication for logging in without password -- 
see ssh-keygen manpage. 

 Direct connection

You can configure your browser to connect directly to the remote
host to use ziproxy. Compression can then be done by ziproxy (Gzip=true
option), but your browser must support it. MS Internet Explorer
needs setting up (see "MSIE setup for ziproxy" below); Opera, Mozilla
and Konqueror are fine. There is also HTTP protocol overhead that
can't be compressed this way. Moreover, this may not work if the
remote machine is behind a firewall.

 Port forwarding

Use SSH port forwarding by running command like

ssh yourlogin@remote.machine -C -L 8090:127.0.0.1:8080 -N

Then set up your browser to use proxy "localhost" port 8090, while 
ziproxy is using port 8080 on remote host. All connections between 
them are carried and compressed by ssh. Remarks about automating 
logins apply as above. 
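
The same proxy setting can be used from a script instead of a
browser. The Python sketch below builds an HTTP client that sends its
requests through the forwarded port; it assumes the ssh tunnel above
is already running and that local port 8090 is the one you chose. The
function name make_proxy_opener is hypothetical.

```python
import urllib.request

def make_proxy_opener(host="localhost", port=8090):
    """Build an opener that sends HTTP requests through the local tunnel end."""
    proxy = urllib.request.ProxyHandler({"http": "http://%s:%d" % (host, port)})
    return urllib.request.build_opener(proxy)

opener = make_proxy_opener()
# With the tunnel up, a page could then be fetched like this:
#   with opener.open("http://example.com/", timeout=30) as response:
#       body = response.read()
```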

This capability is present only in OpenSSH. For use under
Windows, OpenSSH compiled with the cygwin toolkit is available at
http://www.networksimplicity.com/openssh/ . That's all. If you want
to tweak it further, there is the CompressionLevel option for ssh.

 Transparent proxy

In order to use Ziproxy as a transparent proxy:
1. In ziproxy.conf: TransparentProxy = true
2. Reroute the connections from
   x.x.x.x:80 to ziproxy.host:PROXY_PORT

Examples of traffic rerouting (Linux kernel >= 2.4 OSes):
THESE ARE INCOMPLETE SCRIPTS AND DO NOT PROVIDE ANY SECURITY !!!

### Requests from a local machine --> remote Ziproxy host
$ /sbin/modprobe ip_tables
$ /sbin/modprobe iptable_nat
$ IPTABLES=/usr/sbin/iptables
$ ZIPROXY_HOST=200.56.78.90
$ ZIPROXY_PORT=8080
$ $IPTABLES -t nat -A OUTPUT -s 0/0 -p tcp --dport 80 -j DNAT --to ${ZIPROXY_HOST}:${ZIPROXY_PORT}

### A transparent machine routing HTTP traffic AND running Ziproxy
$ /sbin/modprobe ip_tables
$ /sbin/modprobe iptable_nat
$ IPTABLES=/usr/sbin/iptables
$ NET_INTERFACE=eth0
$ ZIPROXY_HOST=200.56.78.90
$ ZIPROXY_PORT=8080
$ $IPTABLES -t nat -A PREROUTING -i $NET_INTERFACE -d ! $ZIPROXY_HOST -p tcp --dport 80 -j REDIRECT --to-port $ZIPROXY_PORT

 FAQ/Known bugs 

 MSIE setup for ziproxy 

To get IE to accept gzipped data, check the option "Use HTTP/1.1
extensions when using proxy" under the Internet Options/Advanced tab.

 How and why ziproxy changes HTML

Obsolete -- most browsers can now detect the image type properly
from the Content-type header. HTML changing is still needed in
JPEG2000 mode -- see JPEG2000.txt.

When a browser comes to display an image, it looks at the suffix of
its name. When ziproxy changes the image type from GIF/PNG to JPEG,
it cannot change the suffix after the fact, and the browser would
then treat the image as broken. So ziproxy has to change all image
suffixes in the HTML beforehand. When you extract an image from a
page, you have to rename it back to .gif/.png/.jpg according to what
the image type really is. On un*x systems the "file" tool comes in
handy. If you save an entire page (HTML+pictures), check whether
your browser saves the pictures with usable filename suffixes (most
browsers do). If not, temporarily turn the proxy off, refresh the
page and save it.

 Some pictures have inappropriate background color

Some transparent GIFs/PNGs are displayed with an incorrect
background. This is because JPEG can't store transparency
information, and the background color information is "out there" in
the HTML. Fixing it is on the to-do list, but it will be quite a big
change ;).

 Older WWWOffle versions don't like gzip compression

Using gzip (not ssh) compression seems to trigger a bug in wwwoffle 
- pages are incorrectly uncompressed. Upgrade to wwwoffle 2.7g or 
newer. 

 ziproxy seems running, but I can't login/run other programs on that 
  remote host!

For every HTTP request, a new ziproxy process is started. In case of
intensive parallel downloading/mirroring (for example, wwwoffle
-fetch or using httrack), the number of processes may temporarily
reach the maximum user process limit set by the administrator. To
avoid the problem, set a separate, lower limit for Ziproxy using the
limit (csh) or ulimit (bash) shell command.
