Programmatically downloading a file from a filehoster - delphi

I have some files uploaded at a filehoster which I want to download programmatically, using Delphi. They don't require any captchas or the like, normally you simply press a button and you get the file. Let's take this as an example.
Now I thought I could simply take the URL the Download Now - Button is pointing at, use an TIdHTTP.Get request and save it with a MemoryStream / Filestream / whatever. Copying the link address leads to this site, which, when entered into my browser pops up the download prompt.
var
MemStream: TMemoryStream;
code: string; // added for solution
number: integer; // added for solution
begin
with TIdHTTP.Create(nil) do
try
HandleRedirects := true;
System.Delete(code,1,AnsiPos('var n =',code)+7); // added
number := StrToInt(AnsiLeftStr(code,AnsiPos(' ',code)-1)) + 1; // added
MemStream := TMemoryStream.Create;
try
// Get('http://www56.zippyshare.com/d/5862319/604061/bgAvgTable.png', MemStream);
Get(TIdURI.URLEncode('http://www56.zippyshare.com/d/5862319/' + IntToStr(number)
+ '/bgAvgTable.png'), MemStream); // added for solution
MemStream.SaveToFile('test.png');
finally
MemStream.Free;
end;
finally
Free;
end;
end;
However, using a checking tool I found that it contains a 302 redirect to the original site, thus when performing the GET-request I have to set HandleRedirects to avoid error messages and I get the HTML code of the original site rather than the file I had suspected.
So, I am kind of confused about how
1) I somehow get the file from my browser though the URL only contains a 302 redirect to the previous page and
2) I can achieve the same from within my code. Any chance someone of you might educate me a little there ? ;)
EDIT
Thanks to your input I could find the issue, turns out that the address I have to use gets generated using a random number, which is to be found in the original source. So posting a request to get the number first does the trick. I have edited the code accordingly.

File hosting sites make different tricks to ensure you was not hotlinking and show you advertisement and perhaps counter. There can be
simple analysis of HTTP Referrer field in the request
setting and checking session-unique cookies
having HTTP Forms with hidden one-time values, and Download button would be not the link but the form's Submit action.
generating one-time hashed URL, and encoding different parameters like your IP and your browser name into it
maybe more
Tools like USDownloader and JDownloader makes a lot of attempts to circumvent it.
While zippyshare seems to be more liberal, it still cannot afford hotlinking and should implement at least some measures of self-defense.
When analysing traffic - start with absolutely fresh browser loading zippyshare page for the 1st time in its life and check it all.
As i re-load the page few times i see that the number "604061" is different and link keep changing time and again after each reload. You probably have to load the page, parse the link, set the HTTP referer and only then download the file.
You do not show the HTTP traffic logs so it is hard to tell for sure.

The server may be checking for some trace to avoid the file to be downloaded programmatically.
It may be anything the hostmaster wants to check, from a wide range of possibilities, but the most typical check is the referrer.
When you navigate in a web browser from one page to another using an link, the browser adds the first page as a referrer to the second page in the request header.
Indy have support for you to add a referrer:
IdHTTP1.Request.Referer := 'http://www.any.other.page';
If the check fails, the server script just redirects the input to the donwload page. This is done to show advertising or to filfull other goals of the file hosting service.

Related

How to use and stabilize IdHttp for a large number of sites

I have about 30 unique sites which I login to and download a few files from each one of them. Sometimes I have about 50 sites, which the 20 extra are the same as previous ones but with different login credentials.
If I run the download process for any of them with all the other sites disabled, all of them work great. But if I try to download them one after another, I usually get about 5-10 errors from 40 sites!
Exceptions that occur are usually socket error, connection closed gracefully or unknown error occured!
Right now, I create a class for each site, and in each class I create a TIdHTTP or TIdFTP object (very few are FTP and I don't have a problem with any of them). When the download from one class is finished, I destroy the class and destroy the TIdHTTP, and start the download process for the next class (or site), so the downloads are not parallel. But I do create about 40 TIdHTTP and a few TIdFTP objects at the very beginning, and I start the download process one after another.
Is my approach correct? Or should I use only one TIdHTTP object in all classes? But if I have to do so, how can I refresh or reset it? Sometimes I have to login to a single site multiple times with different credentials.
I should also mention that one approach I came up with that did help a little bit (maybe solved one or two errors!) was this:
// mMaxTryCount is 4 and mSleepInterval is 1000
for I := 1 to mMaxTryCount do
begin
if not(isSuccess) then
begin
if (i = mMaxTryCount) then
begin
responseCode.Clear;
idHttp.Post(URL, requestList, responseCode);
end
else
try
responseCode.Clear;
idHttp.Post(URL, requestList, responseCode);
isSuccess := true;
except
Sleep(mSleepInterval * (i + (i - 1)));
end;
end;
end;

Imap4 client command LSUB

I have a problem with function TIdIMAP4.ListSubscribedMailBoxes(AMailBoxList: TStrings): Boolean; with this implementation :
function TIdIMAP4.ListSubscribedMailBoxes(AMailBoxList: TStrings): Boolean;
begin
{CC2: This is one of the few cases where the server can return only "OK completed"
meaning that the user has no subscribed mailboxes.}
Result := False;
CheckConnectionState([csAuthenticated, csSelected]);
SendCmd(NewCmdCounter, IMAP4Commands[cmdLSub] + ' "" *',
[IMAP4Commands[cmdList], IMAP4Commands[cmdLSub]]); {Do not Localize}
if LastCmdResult.Code = IMAP_OK then begin
// ds - fixed bug # 506026
ParseLSubResult(AMailBoxList, LastCmdResult.Text);
Result := True;
end;
end;
When I debug I see that the LastCmdResult.Text stringlist is empty, but the LastCmdResult.FormattedReply stringlist has all folders on my email server (Inbox, Sent, Trash, ...). When I tried to use LastCmdResult.FormattedReply count or text, it had immediately lost its data and gave LastCmdResult.FormattedReply.Count=0 and LastCmdResult.FormattedReply.Text=''. So I'd like to know if there is a way to enter the data inside LastCmdResult.FormattedReply and get my email server folders or there is another way to solve my problem ?
I have a problem with function TIdIMAP4.ListSubscribedMailBoxes(AMailBoxList: TStrings): Boolean; with this implementation :
Works fine for me when I try it using the latest SVN version of Indy.
When I debug I see that the LastCmdResult.Text stringlist is empty, but the LastCmdResult.FormattedReply stringlist has all folders on my email server (Inbox, Sent, Trash, ...).
When I run it, the opposite happens. LastCmdResult.Text contains the expected text, and LastCmdResult.FFormattedReply is empty (notice I mention the FFormattedReply data member directly, see below).
When I tried to use LastCmdResult.FormattedReply count or text, it had immediately lost its data and gave LastCmdResult.FormattedReply.Count=0 and LastCmdResult.FormattedReply.Text=''.
That is by design. The FormattedReply property is intended to be used by a client to parse a server reply so it can populate TIdReply's property values, and to be used by a server to generate a new reply using TIdReply's property values. So, you cannot read from the FormattedReply property on the client side.
So I'd like to know if there is a way to enter the data inside LastCmdResult.FormattedReply and get my email server folders or there is another way to solve my problem ?
The whole purpose of ListSubscribedMailBoxes() is to return the folder names in the AMailBoxList parameter. If that is not working for you, then either
you are using a older/buggy version of Indy.
your server is sending the data in a format that TIdIMAP4 is not able to parse.
Without knowing which version of Indy you are actually using, or what the server's reply data actually looks like, there is no way to diagnose your issue one way or the other.

WWW server reports error after POST Request by Internet Direct components in Delphi

I'm using Delphi XE4 and i usually use Indy with IdHttp.POST to POST request to websites,
This time, whenever i try to POST the request i get Error: Your browser is not sending the correct data.
I'm very sure that I'm POSTing the right data, and i'm using the IOHandler and CookieManager.
Been dealing with this for days(literally)
Here is the code(the site in the code):
procedure TForm1.Button1Click(Sender: TObject);
var s, lge, Kf1, Kf2, Kf3, Kf4 : String;
lParam : TStringList;
begin
S := http.Get('https://www.neobux.com/m/l/');
Memo1.Lines.Add(S);
getParamLge(s,lge,'lge');
GetInput(s,Kf1,'id="Kf1"');
GetInput(s,Kf2,'id="Kf2"');
GetInput(s,Kf3,'id="Kf3"');
GetInput(s,Kf4,'id="Kf4"');
lParam := TStringList.Create;
lParam.Add('lge='+lge);
lParam.Add(Kf1+'=USERNAME');
lParam.Add(Kf2+'=PASSWORD');
lParam.Add(Kf3+'=');
lParam.Add(Kf4+'=');
lParam.Add('login=1');
memo1.Lines.Add(http.Post('https://www.neobux.com/m/l/', lParam));
end;
(the getParamLge and GetInput function, are just simple copy and pos functions to extract value from the GET respone).
I thought maybe it needed cookies so i've added this in the beginning:
Cookie.CookieCollection.Clear;
Cookie.CookieCollection.AddClientCookies('CFID=21531887; CFTOKEN=20369251; dh=20130709111845,1920x1080,{ts ''2013-07-09 06:18:58''}; __utma=90161412.436822896.1373368451.1373368451.1373368451.1; __utmb=90161412.11.10.1373368451; __utmc=90161412; __utmz=90161412.1373368451.1.1.'+'utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __asc=06ff77ad13fc32381fd1f5d6405; __auc=06ff77ad13fc32381fd1f5d6405; __atuvc=4%7C28; MS=flat');
But all in vain.
I'm very sure that I'm POSTing the right data
Since it does not work - obviously you do not (or Delphi does not - that makes no difference for server).
You should start usual debugging loop:
Observe reference working behaviour.
Observe your program behavior
Spot the difference
Eliminate the difference
Check if the program works now
If not - go to step 2.
Reference implementation would be some WWW browser working with site: Opera, Chrome, Firefox, MS IE, etc.
Observing tool would be some HTTP Sniffer like WireShark or OmniPacket or Microsoft Net Monitor or else, however this tinkers with OS work on rather deep level.
Or it can be local proxy with GUI, like Proxomitron or Membrane Monitor - but that would require special setup for both the program and the browser, to route their traffic through that local proxy.
Then you should read about HTTP, starting with shallow observation at Wikipedia and then opening related RFC documents (specifications of different part of HTTP protocol) so that you would understand what do the observed differences mean and how to fix them. For example many people use POST request when they actually should use GET request or such.
You want to debug HTTP program but for this HTTP logs, workign and borken, are required and your question lacks them. More so, most probably you can fix it your self, just bring your program's HTTP log to accordance with both RFCs theory and working browsers practice.

Overbyte ICS HTTPS POST

I'm wanting to create a CloudFlare client in the Firemonkey framework. For those who don't know, CloudFlare serves as a CDN of sorts for anyone with a website. They have an API available, and as with many web API's, they are using JSON with a token-based system. It requires both the account email address and the account token to access the API. It runs on HTTPS, and as you can imagine, attempting to access the API via HTTP/non-SSL simply produces null results.
The application i wish to create would serve as an all-in-one management tool, intending to eliminate the need for me to use a web browser to manage my CloudFlare settings. I'm having the most basic of issues; SSL POST. See, i can submit an API request via a web browser and get a list of results (e.g. https://www.cloudflare.com/api_json.html?a=stats&z=DOMAIN&u=EMAIL&tkn=TOKEN - Personal details removed for obvious reasons), but i'm unsure how i would go about getting these same results (or any results from the API for that matter) in Firemonkey.
I've got Overbyte ICS with SSL installed, as well as the basic bundled Indy components, but i'm struggling to get started with this. I need to post a list of parameters to https://www.cloudflare.com/api_json.html via HTTPS/SSL, but i've very little idea on where to start. I've seen a few various example around SO, mostly using ICS, but i've been unable to find any specific to posting with multiple parameters, how i should format it, etc.
One example i tried was using ICS TSSLHttpCli, writing my parameters as a single string (i.e. a=stats&z=DOMAIN&u=EMAIL&tkn=TOKEN), writing that to the SendStream of TSSLHttpCli, seeking to 0,0, setting the URL (i.e. https://www.cloudflare.com/api_json.html?), and then calling the Post method. However, this gives me Connection aborted on request. This is the code i've tried (though i've replaced personal details with generic values);
var
Data : AnsiString;
RcvStrm, SndStrm : TMemoryStream;
begin
SndStrm := TMemoryStream.Create;
RcvStrm := TMemoryStream.Create;
Data := '?a=stats&z=MYDOMAIN&u=MYEMAIL&tkn=MYTOKEN';
SslHttpCli.SendStream := SndStrm;
SslHttpCli.SendStream.Write(Data[1],Length(Data));
SslHttpCli.SendStream.Seek(0,0);
Memo1.Lines.LoadFromStream(SndStrm);
ShowMessage('Waiting!');
SslHttpCli.RcvdStream := RcvStrm;
SslHttpCli.URL := 'https://www.cloudflare.com/api_json.html';
SslHttpCli.Post;
Memo1.Lines.Clear;
Memo1.Lines.LoadFromStream(RcvStrm);
Memo1.Lines.Add('.....');
RcvStrm.Free;
SndStrm.Free;
ShowMessage('Complete!');
end;
The ShowMessage procedures are simply there to provide a visual break so i can see what data is in the stream at each time. When Memo1.Lines.LoadFromStream(SndStrm); is called, i get a single question mark the contents of the Data in the memo as expected.
When i call Memo1.Lines.LoadFromStream(RcvStrm);, i expect it to add the return result from the API, and then the 5 dots underneath it. However, this does not happen, and it's apparent that the message i'm receiving is related to the issue. I'm assuming i've not set up the data correctly, but i'm simply unsure exactly how i should format it prior to attempting to post it. I've even commented out everything below Memo1.Lines.LoadFromStream(RcvStrm); to the end to see whether the Clear procedure is called on the memo, but the contents of the memo remain the same as they were when i called LoadFromStream(SndStrm). The final ShowMessage is also not called.
I initially tried using String instead of AnsiString, but this simply output the first character of Data rather than the whole string.
There could be numerous reasons why it's not working (all details for API access are correct, so it's an issue with the code), but i need someone with more experience and knowledge to point me in the right direction.
My network coding knowledge is limited, and i've only dealt with basic SQL and FTP in Delphi so far. I've still got to work with the parsed JSON once i do get past this step, but for now, can anyone assist me in this endeavor so i can get started?
I noticed you seemed to solve this with a GET request, but I noticed two immediate problems with your POST request:
as Runner Suggested, drop the '?' in your data. The '?' is only used when appending parameters to the URL in a GET request.
You never set the content type of the HTTP Request (should be application/x-www-form-urlencoded). You can do this with the following code:
SSLHttpCli.ContentTypePost := 'application/x-www-form-urlencoded';
Just a helpful thought. I checked https://www.cloudflare.com/docs/client-api.html and they mention that POST requests are accepted. It's possible the server rejects requests that have any other content type.
Just some food for thought if you ever need to contact another API via POST requests and want to use the Overbyte Components.
Hope the info is useful!
Try this;
SndStrm := TMemoryStream.Create;
RcvStrm := TMemoryStream.Create;
Data := 'a=stats&z=MYDOMAIN&u=MYEMAIL&tkn=MYTOKEN';
SndStrm.Write(Data[1], Length(Data));
SndStrm.Seek(0, 0);
SslHttpCli.SendStream := SndStrm;

JEDI JVCL TJvProgramVersionCheck How to use HTTP

I'm wondering if someone has an example on how can be used the TJvProgramVersionCheck component performing the check via HTTP.
The example in the JVCL examples dir doesn't show how to use HTTP
thank you
The demo included in your $(JVCL)\Examples\JvProgramVersionCheck folder seems to be able to do so. Edit the properties of the JVProgramVersionHTTPLocation, and add the URL to it's VersionInfoLocation list (a TStrings). You can also set up any username, password, proxy, and port settings if needed.
You also need to add an OnLoadFileFromRemote event handler. I don't see anything in the demo that addresses that requirement, but the source code says:
{ Simple HTTP location class with no http logic.
The logic must be implemented manually in the OnLoadFileFromRemote event }
It appears from the parameters that event receives that you do your checking there:
function TJvProgramVersionFTPLocation.LoadFileFromRemoteInt(
const ARemotePath, ARemoteFileName, ALocalPath, ALocalFileName: string;
ABaseThread: TJvBaseThread): string;
So you'll need to add an event handler for this event, and then change the TJVProgramVersionCheck.LocationType property to pvltHTTP and run the demo. After testing, it seems you're provided the server and filename for the remote version, and a local path and temp filename for the file you download. The event handler's Result should be the full path and filename of the newly downloaded file. Your event handler should take care of the actual retrieval of the file.
There are a couple of additional types defined in JvProgramVersionCheck.pas, (TJvProgramVersionHTTPLocationICS and TJvProgramVersionHTTPLocationIndy, both protected by compiler defines so they don't exist in the default compilation. However, setting the ICS related define resulted in lots of compilation errors (it apparently was written against an old version of ICS), and setting the Indy define (and then setting it again to use Indy10 instead) allowed it to compile but didn't change any of the behavior. I'm going to look more into this later today.
Also, make sure that the VersionInfoLocation entry is only the URL (without the filename); the filename itself goes in the VersionInfoFileName property. If you put it in the URL, it gets repeated (as in http://localhost/Remote/ProjectVersions_http.iniProjectVersions_http.ini, and will fail anyway. (I found this while tracing through the debugger trying to solve the issue.)
Finally...
The solution is slightly (but not drastically) complicated. Here's what I did:
Copy JvProgramVersionCheck.pas to the demo folder. (It needs to be recompiled because of the next step.)
Go to Project->Options->Directories and Conditionals, and add the following line to the DEFINES entry:
USE_3RDPARTY_INDY10;USE_THIRDPARTY_INDY;
Delete the JvProgramVersionHTTPLocation component from the demo form.
Add a new private section to the form declaration:
private
HTTPLocation: TJvProgramVersionHTTPLocationIndy;
In the FormCreate event, add the following code:
procedure TForm1.FormCreate(Sender: TObject);
const
RemoteFileURL = 'http://localhost/';
RemoteFileName = 'ProjectVersions_http.ini';
begin
HTTPLocation := TJvProgramVersionHTTPLocationIndy.Create(Self); // Self means we don't free
HTTPLocation.VersionInfoLocationPathList.Add(RemoteFileURL);
HTTPLocation.VersionInfoFileName := RemoteFileName;
ProgramVersionCheck.LocationHTTP := HTTPLocation;
ProgramVersionCheck.LocationType := pvltHTTP;
VersionCheck; // This line is already there
end;
In the ProgramVersionCheck component properties, expand the VersionInfoFileOptions property, and change the FileFormat from hffXML to hffIni.
Delete or rename the versioninfolocal.ini from the demo's folder. (If you've run the app once, it stores the http location info, and the changes above are overwritten. This took a while to track down.)
Make sure your local http server is running, and the ProjectVersions_http.ini file is in the web root folder. You should then be able to run the demo. Once the form appears, click on the Edit History button to see the information retrieved from the remote version info file. You'll also have a new copy of the versioninfolocal.ini that has the saved configuration info you entered above.

Resources