I am writing a relaying proxy service in Java that acts as a middlebox between the browser and the Internet. Its only purpose is to observe the web requests passing from the browser and the responses returning to it, so that those responses can be parsed later, offline.
My Java proxy listens on a particular socket for connections from the browser. When a new connection comes in, it does the following:

- reads the browser's request header,
- identifies the host to connect to,
- opens a connection to that host,
- passes on the browser's request.

The code that parses the browser request and relays the server response is the streamHTTPData() method given below. In the code, debugOut is the standard System.out.
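The "identify the host" step can be sketched roughly as follows. A browser configured to use a proxy normally puts the absolute URI in the request line, for example `GET http://andhrawatch.com/ HTTP/1.1`. RFC 7230 §5.3 expects a client talking directly to an origin server to send only the path and query, as in `GET / HTTP/1.1`. The sketch splits such a request line into host, port and an origin-form line.

`ProxyRequestLine` is a hypothetical helper, not part of the code in this post. It does not handle `CONNECT` or `OPTIONS *` requests. Whether this rewriting has anything to do with the 404 described below is an assumption, not something established here.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: splits a proxy request line into target host/port and an origin-form line. */
final class ProxyRequestLine {
    final String method;
    final String host;        // null when the request line was already origin-form
    final int port;           // -1 when the request line does not determine it
    final String originForm;  // request line to send to the origin server

    private ProxyRequestLine(String method, String host, int port, String originForm) {
        this.method = method;
        this.host = host;
        this.port = port;
        this.originForm = originForm;
    }

    /** Parses "METHOD target VERSION"; CONNECT and asterisk-form targets are rejected. */
    static ProxyRequestLine parse(String requestLine) {
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }
        String method = parts[0], target = parts[1], version = parts[2];
        if (target.startsWith("/")) {
            // Already origin-form: the host has to come from the Host header instead.
            return new ProxyRequestLine(method, null, -1, method + " " + target + " " + version);
        }
        URI uri;
        try {
            uri = new URI(target);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Unparseable request target: " + target, e);
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("No host in request target: " + target);
        }
        int port = uri.getPort();
        if (port == -1) {
            // Default port for the scheme when the URI omits it.
            port = "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return new ProxyRequestLine(method, uri.getHost(), port, method + " " + path + " " + version);
    }
}
```

In a relay like the one described, the proxy would connect to `host:port` and send `originForm` in place of the browser's first line.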
The code works fine for most websites, but a strange issue crops up for a few of them: I cannot view their home pages. I noticed this while randomly following links from Google search results and landing on a forum.

Using the HttpFox extension for Firefox, I found that the request sent by the browser to the Java program, and from there to the web server, is exactly the same as the request sent without the proxy. However, I receive an HTTP 200 response when not using the Java middlebox and an HTTP 404 otherwise.

I am not sure what the problem is. Can anyone point me in the right direction? The HTTP requests and responses captured by HttpFox are provided below.
private int streamHTTPData(InputStream in, OutputStream out, StringBuffer host,
        StringBuffer url, boolean waitForDisconnect) {
    // get the HTTP data from an InputStream, and send it to
    // the designated OutputStream
    StringBuffer header = new StringBuffer("");
    String data = "";
    int responseCode = 200;
    int contentLength = 0;
    int pos = -1;
    int byteCount = 0;
    try {
        // get the first line of the header, so we know the response code
        data = readLine(in);
        if (data != null) {
            header.append(data + "\r\n");
            pos = data.indexOf(" ");
            if ((data.toLowerCase().startsWith("http")) && (pos >= 0)
                    && (data.indexOf(" ", pos + 1) >= 0)) {
                String rcString = data.substring(pos + 1,
                        data.indexOf(" ", pos + 1));
                try {
                    responseCode = Integer.parseInt(rcString);
                } catch (Exception e) {
                    if (debugLevel > 0)
                        debugOut.println("Error parsing response code "
                                + rcString);
                }
            } else {
                if ((pos >= 0) && (data.indexOf(" ", pos + 1) >= 0)) {
                    String suffix = data.substring(pos + 1,
                            data.indexOf(" ", pos + 1));
                    url.setLength(0);
                    url.append(suffix.trim());
                }
            }
        }
        // get the rest of the header info
        while ((data = readLine(in)) != null) {
            // the header ends at the first blank line
            if (data.length() == 0)
                break;
            header.append(data + "\r\n");
            // check for the Host header
            pos = data.toLowerCase().indexOf("host:");
            if (pos >= 0) {
                host.setLength(0);
                host.append(data.substring(pos + 5).trim());
            }
            // check for the Content-Length header
            pos = data.toLowerCase().indexOf("content-length:");
            if (pos >= 0)
                contentLength = Integer.parseInt(data.substring(pos + 15)
                        .trim());
        }
        // add a blank line to terminate the header info
        header.append("\r\n");
        // convert the header to a byte array, and write it to our stream
        out.write(header.toString().getBytes(), 0, header.length());
        System.out.println(header.toString());
        // if the header indicated that this was not a 200 response,
        // just return what we've got if there is no Content-Length,
        // because we may not be getting anything else
        if ((responseCode != 200) && (contentLength == 0)) {
            out.flush();
            return header.length();
        }
        // get the body, if any; we try to use the Content-Length header to
        // determine how much data we're supposed to be getting, because
        // sometimes the client/server won't disconnect after sending us
        // information...
        if (contentLength > 0)
            waitForDisconnect = false;
        if ((contentLength > 0) || (waitForDisconnect)) {
            try {
                byte[] buf = new byte[4096];
                int bytesIn = 0;
                while (((byteCount < contentLength) || (waitForDisconnect))
                        && ((bytesIn = in.read(buf)) >= 0)) {
                    out.write(buf, 0, bytesIn);
                    out.flush();
                    byteCount += bytesIn;
                }
            } catch (Exception e) {
                String errMsg = "Error getting HTTP body: " + e;
                if (debugLevel > 0)
                    debugOut.println(errMsg);
            }
        }
    } catch (Exception e) {
        if (debugLevel > 0)
            debugOut.println("Error getting HTTP data: " + e);
    }
    // flush the OutputStream and return
    try {
        out.flush();
    } catch (Exception e) {
    }
    return (header.length() + byteCount);
}
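The header loop in streamHTTPData() has two weaknesses:

- It finds `Host` and `Content-Length` with `indexOf`, which also matches longer names such as `X-Forwarded-Host:`.
- It writes `header.length()` bytes, which is a character count, from `getBytes()`, which uses the platform default charset. When a header contains non-ASCII characters and that charset is multi-byte, the two counts differ.

The hypothetical `HeaderBlock` class below is a minimal sketch of stricter handling. It matches field names exactly and case-insensitively, parses the status code from the status line, and writes exactly the encoded bytes. It assumes ISO-8859-1 for header bytes. That is a common choice for HTTP/1.1 header octets, not something taken from the posted code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating stricter header handling than indexOf(). */
final class HeaderBlock {
    final String startLine;                            // request line or status line
    final List<String[]> fields = new ArrayList<>();   // {name, value} in arrival order

    HeaderBlock(String startLine) {
        this.startLine = startLine;
    }

    /** Adds one "Name: value" line; lines without a usable colon are ignored. */
    void addLine(String line) {
        int colon = line.indexOf(':');
        if (colon <= 0) {
            return;
        }
        fields.add(new String[] { line.substring(0, colon).trim(), line.substring(colon + 1).trim() });
    }

    /** First value whose field name equals {@code name} ignoring case, or null if absent. */
    String get(String name) {
        for (String[] f : fields) {
            if (f[0].equalsIgnoreCase(name)) {
                return f[1];
            }
        }
        return null;
    }

    /** Status code from a line such as "HTTP/1.1 404 Component not found", or -1. */
    int statusCode() {
        String[] parts = startLine.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            return -1;
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Serializes the block (normalizing to "Name: value") and writes every encoded byte. */
    int writeTo(OutputStream out) throws IOException {
        StringBuilder sb = new StringBuilder(startLine).append("\r\n");
        for (String[] f : fields) {
            sb.append(f[0]).append(": ").append(f[1]).append("\r\n");
        }
        sb.append("\r\n");
        byte[] bytes = sb.toString().getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes); // length comes from the byte array, not from the String
        return bytes.length;
    }
}
```

Re-serializing in this way normalizes header whitespace. A relay that must forward headers byte-for-byte would keep the raw lines alongside the parsed fields.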
HTTP request (identical with and without the middlebox):
(Request-Line) GET / HTTP/1.1
Host: andhrawatch.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Proxy-Connection: keep-alive
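The captured request carries `Proxy-Connection: keep-alive`. This is a non-standard header that browsers have historically sent to proxies in place of `Connection`. RFC 7230 §6.1 requires a proxy to remove the `Connection` header, and any header fields it names, before forwarding a message. Many proxies drop `Proxy-Connection` and `Keep-Alive` as well.

The hypothetical `HopByHop.strip` helper below sketches that filtering over raw header lines. It deliberately keeps `Transfer-Encoding`, on the assumption that the body is relayed byte-for-byte rather than decoded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: filters hop-by-hop header lines before a proxy forwards them. */
final class HopByHop {
    private HopByHop() {}

    /**
     * Returns the header lines minus Connection, Keep-Alive, the non-standard
     * Proxy-Connection, and any field named as an option in a Connection header.
     * Transfer-Encoding is kept because the body is assumed to be relayed unchanged.
     */
    static List<String> strip(List<String> headerLines) {
        Set<String> drop = new HashSet<>();
        drop.add("connection");
        drop.add("proxy-connection");
        drop.add("keep-alive");
        // First pass: collect the field names listed in Connection headers.
        for (String line : headerLines) {
            if ("connection".equals(nameOf(line))) {
                for (String token : valueOf(line).split(",")) {
                    String t = token.trim().toLowerCase(Locale.ROOT);
                    if (!t.isEmpty()) {
                        drop.add(t);
                    }
                }
            }
        }
        // Second pass: keep everything whose name is not in the drop set.
        List<String> kept = new ArrayList<>();
        for (String line : headerLines) {
            if (!drop.contains(nameOf(line))) {
                kept.add(line);
            }
        }
        return kept;
    }

    /** Lower-cased field name, or "" for a line without a colon. */
    private static String nameOf(String line) {
        int colon = line.indexOf(':');
        return colon < 0 ? "" : line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
    }

    /** Field value after the colon, or "" for a line without a colon. */
    private static String valueOf(String line) {
        int colon = line.indexOf(':');
        return colon < 0 ? "" : line.substring(colon + 1).trim();
    }
}
```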
HTTP response without the Java middlebox:
(Status-Line) HTTP/1.1 200 OK
Date: Fri, 27 Jul 2012 03:51:38 GMT
Server: Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By: PHP/5.3.1
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires: Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: 0f486952816b6d6bf53a4c34b724b278=c68edaebc6dedb2b291832dfbfb784fc; path=/
Last-Modified: Fri, 27 Jul 2012 03:51:38 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
HTTP response with the Java middlebox:
(Status-Line) HTTP/1.1 404 Component not found
Date: Fri, 27 Jul 2012 03:54:39 GMT
Server: Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By: PHP/5.3.1
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires: Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: 0f486952816b6d6bf53a4c34b724b278=33806d89181aa6d488ccba1b9163e511; path=/
Last-Modified: Fri, 27 Jul 2012 03:54:39 GMT
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
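Both responses above use `Transfer-Encoding: chunked` and carry no `Content-Length`. The body loop in streamHTTPData() therefore has no length to count against:

- For the 404, it returns right after the header.
- For a 200, it reads the body only when `waitForDisconnect` is true, and then only until the server closes the connection.

A chunked body can instead be relayed by following the chunk framing. The hypothetical `ChunkedRelay` class below is a minimal sketch under that assumption, covering HTTP/1.1 chunked syntax only. It copies size lines, chunk data and trailers to the output, and stops at the end of the message even if the server keeps the connection open.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: copies one chunked message body from in to out. */
final class ChunkedRelay {
    private ChunkedRelay() {}

    /** Returns the number of bytes written, including chunk framing and trailers. */
    static long relay(InputStream in, OutputStream out) throws IOException {
        long copied = 0;
        byte[] buf = new byte[4096];
        while (true) {
            String sizeLine = readLine(in);
            copied += writeLine(out, sizeLine);
            int semi = sizeLine.indexOf(';');                 // ignore chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            long size = Long.parseLong(hex, 16);
            if (size < 0) {
                throw new IOException("Negative chunk size: " + sizeLine);
            }
            if (size == 0) {
                break;                                        // last-chunk reached
            }
            long remaining = size + 2;                        // chunk data plus its CRLF
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new EOFException("Stream ended inside a chunk");
                }
                out.write(buf, 0, n);
                remaining -= n;
                copied += n;
            }
        }
        // Trailer section: zero or more header lines, then an empty line.
        String line;
        do {
            line = readLine(in);
            copied += writeLine(out, line);
        } while (!line.isEmpty());
        out.flush();
        return copied;
    }

    /** Reads up to LF and strips the line ending; fails on premature end of stream. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != '\n') {
            if (b < 0) {
                throw new EOFException("Stream ended inside a line");
            }
            line.write(b);
        }
        byte[] bytes = line.toByteArray();
        int len = bytes.length > 0 && bytes[bytes.length - 1] == '\r' ? bytes.length - 1 : bytes.length;
        return new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
    }

    /** Writes a framing line, normalizing its ending to CRLF. */
    private static int writeLine(OutputStream out, String s) throws IOException {
        byte[] bytes = (s + "\r\n").getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes);
        return bytes.length;
    }
}
```

The relay stops after the empty line that ends the trailer section. Any bytes the server sends afterwards on a kept-alive connection stay unread, so the proxy does not hang waiting for a disconnect.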