Discussion:
Download file without webbrowser
A.D. Fundum
2012-08-12 18:01:46 UTC
Is there a utility which can be used to download this file
automatically, i.e. without using a webbrowser?

https://europeanequities.nyx.com/en/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3Aaddfde0604ff3b35052feeb4143d2809

This URL is identical to the .SUBJECT EA of the downloaded file. So
far I haven't found a way to change the settings and to download the
file. The most important change is the file format (from MS Excel to
CSV or TXT). I know the old URL, which was downloadable with WGET, but
neither WGETSSL nor SM works with an appended
"&format=txt&formatDecimal=.&formatDate=dd/MM/yy".


--
Andreas Schnellbacher
2012-08-12 18:29:24 UTC
Post by A.D. Fundum
Is there a utility which can be used to download this file
automatically, i.e. without using a webbrowser?
Just use wget.
--
Andreas Schnellbacher
A.D. Fundum
2012-08-12 23:05:16 UTC
Post by Andreas Schnellbacher
Post by A.D. Fundum
Is there a utility which can be used to download this file
automatically, i.e. without using a webbrowser?
Just use wget.
That'll be WGETSSL, and then the question still is: how?
Unfortunately the new URL with the old parameters (e.g. "format=txt")
downloads the form, but not the file.


--
Mr. G
2012-08-13 16:09:18 UTC
Post by A.D. Fundum
Is there a utility which can be used to download this file
automatically, i.e. without using a webbrowser?
https://europeanequities.nyx.com/en/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3Aaddfde0604ff3b35052feeb4143d2809
This URL is identical to the .SUBJECT EA of the downloaded file. So
far I haven't found a way to change the settings and to download the
file. The most important change is the file format (from MS Excel to
CSV or TXT). I know the old URL, which was downloadable with WGET, but
neither WGETSSL nor SM works with an appended
"&format=txt&formatDecimal=.&formatDate=dd/MM/yy".
Maybe Curl?
A.D. Fundum
2012-08-14 21:36:57 UTC
Post by Mr. G
Post by A.D. Fundum
CSV or TXT). I know the old URL, which was downloadable with WGET,
but neither WGETSSL nor SM works with an appended "&format=txt
Maybe Curl?
Thanks, I'll take a look at it. So far the best solution is to open
the form with a WPS URL object. But that still involves a browser, and
my physical backup doesn't have a WPS.


--
A.D. Fundum
2012-08-15 00:39:13 UTC
Post by Mr. G
Maybe Curl?
Also downloads the form instead of the file:

curl -k --data "format=2&layout=1&decimal_separator=1&date_format=1&op=Go" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A68e8d9d0ec59ac5717ef48dde90b02d7"


--
Dave Yeo
2012-08-15 04:37:07 UTC
Post by A.D. Fundum
Post by Mr. G
Post by A.D. Fundum
CSV or TXT). I know the old URL, which was downloadable with WGET,
but neither WGETSSL nor SM works with an appended "&format=txt
Maybe Curl?
Thanks, I'll take a look at it. So far the best solution is to open
the form with a WPS URL object. But that still involves a browser, and
my physical backup doesn't have a WPS.
--
There are a lot of possible parameters to wget (and curl), it's just a
matter of figuring out the correct ones.
I know that simply using wget fails on quite a few files yet if I use
the awget plugin to call wget, it succeeds.
Dave
A.D. Fundum
2012-08-15 16:18:01 UTC
Post by Dave Yeo
I know that simply using wget fails on quite a few files yet if
I use the awget plugin to call wget, it succeeds.
There probably is no WGET "autoconf" that scans the situation and
produces the required options?


--
Steven Levine
2012-08-15 19:47:27 UTC
On Wed, 15 Aug 2012 04:37:07 UTC, Dave Yeo <***@gmail.com>
wrote:

Hi,
Post by Dave Yeo
There are a lot of possible parameters to wget (and curl), it's just a
matter of figuring out the correct ones.
It will take a bit of tinkering to fetch this URL with wget or curl,
because it's not a file URL and does not resolve to a file URL. The
URL is a form which means wget will have to be supplied with
--post-data. Wget can handle this, but someone needs to figure out
the form fields and inputs.
Post by Dave Yeo
I know that simply using wget fails on quite a few files yet if I use
the awget plugin to call wget, it succeeds.
Wget often fails because it generates an illegal file name for the
platform. Awget handles this by determining a reasonable file name.
I have a wgetx.cmd wget wrapper which does this kind of thing.
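
For illustration only, here is a minimal Python sketch of this kind of file-name clean-up. It is not the logic of awget or wgetx.cmd, which the thread does not show. The helper name safe_name, the set of rejected characters, and the fallback name are all assumptions.

```python
import re
from urllib.parse import urlsplit

# Characters commonly rejected in file names by DOS/OS2/Windows-style
# file systems (an assumed set, adjust for the target platform).
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(url, default="download.bin"):
    """Derive a local file name from a URL (hypothetical helper)."""
    path = urlsplit(url).path
    # Keep only the last path segment; the query string is dropped.
    name = path.rsplit("/", 1)[-1]
    # Replace illegal characters and trim leading/trailing dots and spaces.
    name = ILLEGAL.sub("_", name).strip(". ")
    return name or default
```

With the download URL from this thread the sketch yields "download", because everything after the "?" is discarded before the name is built.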

Steven
--
---------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net>
eCS/Warp/DIY etc. www.scoug.com www.ecomstation.com
---------------------------------------------------------------------
A.D. Fundum
2012-08-16 11:28:16 UTC
The URL is a form which means wget will have to be
supplied with --post-data.
I tried that earlier (4 parameters and a button, with the form as the
source of information), but so far without a positive result.
Wget often fails because it generates an illegal file
name for the platform.
FTR: Wget 1.9.1's -O option is in use, i.e.:

wgetssl -O20120816.CSV --post-data "format=2&layout=1&decimal_separator=1&date_format=1&op=Go" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A68e8d9d0ec59ac5717ef48dde90b02d7"

I also tried cookies, but the generated cookies.txt file was "empty".
The options above still download the form instead of the file. The URL
itself works with any eCS webbrowser (Netscape, Firefox/SM, Links).


--
Dave Yeo
2012-08-16 15:02:23 UTC
Post by A.D. Fundum
The URL is a form which means wget will have to be
supplied with --post-data.
I tried that earlier (4 parameters and a button, with the form as the
source of information), but so far without a positive result.
Wget often fails because it generates an illegal file
name for the platform.
wgetssl -O20120816.CSV --post-data "format=2&layout=1&decimal_separator=1&date_format=1&op=Go" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A68e8d9d0ec59ac5717ef48dde90b02d7"
I also tried cookies, but the generated cookies.txt file was "empty".
The options above still download the form instead of the file. The URL
itself works with any eCS webbrowser (Netscape, Firefox/SM, Links).
IIRC, Lynx had a way to automate downloading files.
Dave
A.D. Fundum
2012-08-16 15:31:59 UTC
Post by Dave Yeo
IIRC, Lynx had a way to automate downloading files.
I'll check that out too! One of the added options seems to provide
additional information. Please note I'm just pretending to be a
systems-abusing Windows-client, I'm using eCS. And the small
downloaded file is the form (about 7KB) itself, not the about 60KB of
data I'm after.

Near the end there's a ...

utime(20120816.CSV): The file or directory specified is read-only.

... which probably isn't important.

But this may be a problem:

europeanequities.nyx.com/nl/popup/data/download@ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_valuesA68e8d9d0ec59ac5717ef48dde90b02d7: The file or directory specified cannot be found.
--
[C:\]wgetssl -O20120816.CSV --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5b)" --page-requisites --server-response --restrict-file-names=windows --post-data "format=2&layout=1&decimal_separator=1&date_format=1&op=Go" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_valuesA68e8d9d0ec59ac5717ef48dde90b02d7"
--17:23:14-- https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_valuesA68e8d9d0ec59ac5717ef48dde90b02d7

=> `20120816.CSV'
Resolving europeanequities.nyx.com... 159.125.78.24
Connecting to europeanequities.nyx.com[159.125.78.24]:443... connected.
HTTP request sent, awaiting response...
1 HTTP/1.1 200 OK
2 Date: Thu, 16 Aug 2012 15:23:00 GMT
3 Server: Apache/2.2.3 (Red Hat)
4 X-Powered-By: PHP/5.2.17 ZendServer/5.0
5 Last-Modified: Thu, 16 Aug 2012 15:23:00 GMT
6 Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0, max-age=1209600
7 ETag: "1345130580"
8 Expires: Thu, 30 Aug 2012 15:23:00 GMT
9 Dap: 16
10 Content-Length: 7042
11 Content-Type: text/html; charset=utf-8
12 Set-Cookie: ZDEDebuggerPresent=php,phtml,php3; path=/
13 Connection: close

100%[====================================>] 7,042 --.--K/s

utime(20120816.CSV): The file or directory specified is read-only.
17:23:16 (6.72 MB/s) - `20120816.CSV' saved [7,042/7,042]

europeanequities.nyx.com/nl/popup/data/download@ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_valuesA68e8d9d0ec59ac5717ef48dde90b02d7: The file or directory specified cannot be found.

FINISHED --17:23:17--
Downloaded: 7,042 bytes in 1 files
A.D. Fundum
2012-08-16 15:54:06 UTC
The file or directory specified cannot be found.
I retried it with a verified file (and "https://"): same error. When
using the same URL with a webbrowser, the form appears again:

https://europeanequities.nyx.com/en/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3Aaddfde0604ff3b35052feeb4143d2809

In production I use the same, unchanged URL every day; the hexadecimal
filter value doesn't really matter, so long as the URL is valid. In
the end the data is verified (full rewrite, a bigger PITA than this
Wget issue).


--
Steven Levine
2012-08-16 15:56:42 UTC
Post by A.D. Fundum
I tried that earlier (4 parameters and a button, with the form as the
source of information), but so far without a positive result.
That just means you have not figured out the correct post data yet.
My counting says there are 7 post data items. There are two hidden
fields.
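
As a rough sketch of how the hidden fields could be found without reading the page source by hand, the Python snippet below lists the named <input> elements of a saved copy of the form. The class and function names are made up for this example, the sample markup in the usage note is an assumption and not the real page, and <select> elements and repeated radio buttons are not handled.

```python
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Collect name/value pairs of <input> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # every named <input>, name -> value
        self.hidden = []   # names of the type="hidden" inputs, in page order

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is None:
            return
        self.fields[name] = attributes.get("value") or ""
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden.append(name)


def form_fields(html):
    """Return (all input fields, names of hidden fields) for an HTML string."""
    parser = FormFields()
    parser.feed(html)
    parser.close()
    return parser.fields, parser.hidden
```

Feeding it the roughly 7 KB form that wget saved should list whatever hidden inputs the form carries; each of those has to go into --post-data along with the visible fields.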
Post by A.D. Fundum
Post by Steven Levine
Wget often fails because it generates an illegal file
name for the platform.
That was a comment to Dave. It was clear that your issue was
something else.

Post by A.D. Fundum
cks&cmd=default&formKey=nyx_pd_filter_values%3A68e8d9d0ec59ac5717ef48dde90b02d7"
You need to double the %'s to keep the shell happy, but the problem is
elsewhere.
Post by A.D. Fundum
I also tried cookies, but the generated cookies.txt file was "empty".
It's something else. Blocking cookies at the browser has no effect.
Post by A.D. Fundum
The options above still download the form instead of the file. The URL
itself works with any eCS webbrowser (Netscape, Firefox/SM, Links).
There are a couple of differences between these and wgetssl. The
first is they obviously know how to encode the post data correctly.
This means you need to percent-encode the post data for wgetssl. It
does not do this for you.
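
A minimal sketch of that encoding step, assuming Python is at hand to prepare the string that is then passed to --post-data. The helper name post_body is made up for this example; urlencode is the standard-library function, and it encodes spaces as "+".

```python
from urllib.parse import urlencode


def post_body(fields):
    """Encode (name, value) pairs as application/x-www-form-urlencoded."""
    # urlencode percent-encodes reserved characters such as & = % and
    # turns spaces into '+', as browsers do for form submissions.
    return urlencode(fields)


# Field names taken from the commands posted in this thread.
example = post_body([
    ("format", "2"),
    ("layout", "1"),
    ("decimal_separator", "1"),
    ("date_format", "1"),
    ("op", "Go"),
])
```

The values used in this thread happen to contain only characters that need no escaping, so the encoded string looks the same as the hand-written one; the encoding only makes a visible difference once a value contains spaces, "&", "=" or "%".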

The other is that they do not use self-signed certificates. I don't
think this matters, but one never knows.

I checked if the user agent mattered, and this does not appear to be
the case.

Steven
--
---------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net>
eCS/Warp/DIY etc. www.scoug.com www.ecomstation.com
---------------------------------------------------------------------
Paul Ratcliffe
2012-08-16 17:28:33 UTC
On Thu, 16 Aug 2012 10:56:42 -0500, Steven Levine
Post by Steven Levine
You need to double the %'s to keep the shell happy, but the problem is
elsewhere.
Post by A.D. Fundum
I also tried cookies, but the generated cookies.txt file was "empty".
It's something else. Blocking cookies at the browser has no effect.
I find iptrace/ipformat very useful for this sort of thing. You can
see exactly what goes out on the wire and compare the two cases.
When they are the same, it will work. If they aren't, it might not.
Steven Levine
2012-08-20 16:19:09 UTC
On Thu, 16 Aug 2012 17:28:33 UTC, Paul Ratcliffe
<***@orac12.clara34.co56.uk78> wrote:

Hi Paul,
Post by Paul Ratcliffe
I find iptrace/ipformat very useful for this sort of thing. You can
see exactly what goes out on the wire and compare the two cases.
When they are the same, it will work. If they aren't, it might not.
I would have done that if this were not an SSL connection. The
encryption happens too early and the site is https only, so using
http: with iptrace was not an available option.

Steven
--
---------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net>
eCS/Warp/DIY etc. www.scoug.com www.ecomstation.com
---------------------------------------------------------------------
A.D. Fundum
2012-08-17 01:27:18 UTC
Post by Steven Levine
Post by A.D. Fundum
I tried that earlier (4 parameters and a button, with the form
as the source of information), but so far without a positive
result.
That just means you have not figured out the correct post
data yet. My counting says there are 7 post data items.
There are two hidden fields.
Thanks, that's the solution! I overlooked this, for one because my
real problems start when I have to process the downloaded data
(website_redesign==burned_beyond_recognition)...
Post by Steven Levine
Post by A.D. Fundum
Post by Steven Levine
Wget often fails because it generates an illegal
file name for the platform.
That was a comment to Dave.
I was aware of that, hence the FTR. ;-)
Post by Steven Levine
You need to double the %'s to keep the shell happy, but the
problem is elsewhere.
I used the original .SUBJECT EA as-is, including a single %3A, and
double quotes. Like:

wgetssl -OSolv.ed! --post-data "format=2&layout=2&decimal_separator=1&date_format=1&op=Go&form_build_id=form-db6385f424cebdc1a634c46fb963cd28&form_id=nyx_download_form" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3Aaddfde0604ff3b35052feeb4143d2809"


--