This is the mail archive of the cygwin mailing list for the Cygwin project.



[OT] Re: Want to use tor with wget.


[ We're off-topic here since this is no longer a cygwin-specific issue, so I've
set follow-ups to the cygwin-talk list in case you have further questions or
replies. ]

hongyi.zhao wrote:
> On Tuesday, October 13, 2009 at 13:44, dave.korn.cygwin wrote:
>> Hongyi Zhao wrote:


> I want to use wget to grab the following web page:
> 
> http://www.cybersyndrome.net/pla5.html

  Then you can tell wget to use your local privoxy as an http proxy, which is
exactly how your browser talks to it.

  export http_proxy=localhost:8118
  wget http://www.cybersyndrome.net/pla5.html

should do the trick, but check the wget manual page about proxy support for
full details.  (I'm assuming here you're running the usual kind of Tor setup
with a supporting co-installation of Privoxy.)
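
  If you'd rather not export the variable for your whole session, a small
wrapper keeps the proxy setting scoped to a single command.  This is only a
sketch: the name tor_wget is made up for illustration, and it assumes Privoxy
is listening on its default port 8118 and is configured to forward to Tor.

```shell
# Hypothetical helper: run wget with Privoxy as the HTTP proxy for this one
# command only, leaving the rest of the shell session untouched.
# Assumption: Privoxy listens on localhost:8118 (its default port).
tor_wget() {
    http_proxy="http://localhost:8118/" wget "$@"
}

# Usage:
#   tor_wget http://www.cybersyndrome.net/pla5.html
```

Any extra wget options can be passed straight through, since the wrapper
forwards all of its arguments unchanged.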

> OTOH, I've also learned that curl supports socks4/5 proxies, and I use
> the following command under my cygwin console:
> 
> curl --socks5 127.0.0.1:9050 http://www.cybersyndrome.net/pla5.html
> 
> But I get the following error:
> 
> -----------------------------
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <HTML><HEAD>
> <TITLE>302 Found</TITLE>
> </HEAD><BODY>
> <H1>Found</H1>
> The document has moved <A HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
> </BODY></HTML>
> -----------------------------

  That's interesting.  Bear in mind that curl prints only the response body
unless you pass -i or -v, so this output alone doesn't show whether the server
sent a real 302 (a 302 status line plus a Location header) or a 200 whose body
merely contains the words "302 Found" and a URL.
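
  One way to tell the two cases apart is to look at the status line and the
headers, which curl doesn't print by default.  A minimal sketch, assuming a
POSIX awk is available; http_summary is a made-up name, not a curl feature.
It reads raw response headers on stdin and reports the status code, plus the
Location header if the server sent one.

```shell
# Hypothetical helper: summarise raw HTTP response headers read from stdin.
# Prints the status code from the first line and, if present, the Location
# header.  Stops at the first blank line, i.e. the end of the header block.
http_summary() {
    tr -d '\r' | awk '
        NR == 1                    { print "status: " $2 }
        tolower($1) == "location:" { print "location: " $2 }
        /^$/                       { exit }
    '
}

# Usage: dump the headers to stdout (-D -), discard the body (-o /dev/null):
#   curl -sS -D - -o /dev/null 'http://www.cybersyndrome.net/pla5.html' | http_summary
```

A genuine redirect would show a 3xx status together with a location line; a
page that only claims to be a redirect would show 200 and nothing else.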

> Nevertheless, I can use firefox with Tor enabled to access this
> webpage.
> 
> What's the reason

  It's something the server is doing deliberately, perhaps a malfunctioning or
misguided anti-bot feature of some sort, based on the request headers sent by
the user agent.

> and how can I grab this webpage just with a
> command-line download tool?

  Well, you can use wget!  Or you can tell your curl to pretend it is wget!

> $ curl 'http://www.cybersyndrome.net/pla5.html'
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <HTML><HEAD>
> <TITLE>302 Found</TITLE>
> </HEAD><BODY>
> <H1>Found</H1>
> The document has moved <A HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
> </BODY></HTML>

> $ wget 'http://www.cybersyndrome.net/pla5.html'
> --2009-10-13 21:00:36--  http://www.cybersyndrome.net/pla5.html
> Resolving www.cybersyndrome.net... 210.153.118.69
> Connecting to www.cybersyndrome.net|210.153.118.69|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: `pla5.html'
> 
>     [          <=>                          ] 18,151      3.11K/s   in 5.7s
> 
> 2009-10-13 21:00:42 (3.11 KB/s) - `pla5.html' saved [18151]

> $ curl 'http://www.cybersyndrome.net/pla5.html' -A 'User-Agent: Wget/1.11.4'
> <html>
> <head>
> <meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
> <meta name="robots" content="noarchive">
> <meta name="description" content="[ Shift_JIS text, mis-encoded in the archive ]">
> <title>CyberSyndrome : Proxy List / Anonymous</title>
> <style type="text/css">
           [ ... snip ... ]
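
  To combine this with your Tor setup, note that -A takes just the value of
the header, so the "User-Agent: " prefix in my command above isn't needed.
Here is a sketch, assuming Tor's SOCKS port is at 127.0.0.1:9050 as in your
own command.  The name tor_curl_as_wget is made up, and I'm guessing that the
server only looks for "Wget" somewhere in the string.  I've used
--socks5-hostname rather than --socks5 so that the proxy resolves the
hostname and the DNS lookup goes through Tor as well.

```shell
# Hypothetical helper: fetch a URL with curl through Tor's SOCKS5 port while
# presenting wget's User-Agent string.
# Assumptions: Tor listens on 127.0.0.1:9050; the server accepts any
# User-Agent that looks like wget's.
tor_curl_as_wget() {
    curl --socks5-hostname 127.0.0.1:9050 -A 'Wget/1.11.4' "$@"
}

# Usage:
#   tor_curl_as_wget 'http://www.cybersyndrome.net/pla5.html'
```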

    cheers,
      DaveK


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

