Using Indy httpserver to find keywords in a webpage [duplicate] - delphi

This question already has an answer here:
Delphi: Easiest way to search for string in memorystream
(1 answer)
Closed 9 years ago.
I'm trying to use Indy http server to find keywords within a webpage for a proxy filter. I've set up a proxy and the http server, which works with web browsers, but I'm struggling when it comes to finding a keyword within the web page.
I've been trying to convert a memory stream to a string and search for a keyword within it, but maybe this is the wrong way to do it. I have limited experience with Delphi, so I'm slightly stuck.
If anyone could give me any pointers, that would be great.
Thanks.
EDIT: OK, I have added a function here where 'Stream' is the memory stream from the HTTP server and 'What' is the keyword I'm searching for. It doesn't seem to work, though:
function FindInMemStream(Stream: TMemoryStream; What: String):Integer;
var
bufBuffer, bufBuffer2: array[0..254] of Char;
i: Integer;
begin
filter.Form2.ListBox1.Items.Add('finding');
What := 'train';
Result := 0;
i := 0;
FillChar(bufBuffer, 255, #0);
FillChar(bufBuffer2, 255, #0);
StrPCopy(@bufBuffer2, What);
Stream.Position:=0;
while Stream.Position <> Stream.Size do
begin
Stream.Read(bufBuffer[0],Length(What));
if CompareMem(@bufBuffer,@bufBuffer2,Length(What)) then
begin
filter.Form2.ListBox1.Items.Add(IntToStr(Stream.Position-Length(What)));
Result := Stream.Position-Length(What); // not 0 : it's found keyphrase
Exit;
end;
i := i + 1;
// filter.Form2.ListBox1.Items.Add(IntToStr(i));
Stream.Seek(i,0)
end;
end;

There are libraries which can be used for HTML parsing, for example the (commercial) DIHtmlParser.
DIHtmlParser reads, extracts information from, and writes HTML, XHTML, and XML.
From its feature list:
Full Unicode support (UnicodeString or WideString, depending on Delphi version).
Reads and writes over 70 character sets natively (independent of the OS).
Operates on TStreams, memory buffers or strings.
Returns a single piece of HTML to the application at a time.
With such a library, the HTML content (visible text) can be extracted easily from the HTML response, and the remaining task to find the search term would become trivial.
I would not try to write my own HTML parser, but rather use an existing library.
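If a plain substring test on the raw response is enough, the stream-to-string idea from the question can be sketched as follows. This is a minimal sketch, not a replacement for a real HTML parser. It assumes Delphi 2009 or later (for TBytes and TEncoding) and that the page is UTF-8 encoded. FindKeywordInStream is a hypothetical helper name, not part of Indy.

```pascal
uses
  Classes, SysUtils;

// Hypothetical helper (not part of Indy): returns the 1-based character
// position of Keyword in the stream's text, or 0 if it is not found.
// Assumes the stream holds UTF-8 text; the match ignores ASCII case only.
function FindKeywordInStream(Stream: TStream; const Keyword: string): Integer;
var
  Bytes: TBytes;
  Content: string;
begin
  Result := 0;
  if (Stream.Size = 0) or (Keyword = '') then
    Exit;
  // Copy the whole stream into a byte array
  SetLength(Bytes, Integer(Stream.Size));
  Stream.Position := 0;
  Stream.ReadBuffer(Bytes[0], Length(Bytes));
  // Decode the bytes to a Unicode string, then search it
  Content := TEncoding.UTF8.GetString(Bytes);
  Result := Pos(LowerCase(Keyword), LowerCase(Content));
end;
```

The result is a character index into the decoded text, which is not the same as a byte offset in the stream once the page contains multi-byte characters. The search also sees markup as well as visible text, which is the limitation a parser would remove.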

Related

Encoding in Indy 10 and Delphi

I am using Indy 10 with Delphi. Following is my code which uses EncodeString method of Indy to encode a string.
var
EncodedString : String;
StringToBeEncoded : String;
EncoderMIME: TIdEncoderMIME;
....
....
EncodedString := EncoderMIME.EncodeString(StringToBeEncoded);
I am not getting the correct value in the encoded string.
What is the purpose of IndyTextEncoding_OSDefault?
Here's the source code for IndyTextEncoding_OSDefault.
function IndyTextEncoding_OSDefault: IIdTextEncoding;
begin
if GIdOSDefaultEncoding = nil then begin
LEncoding := TIdMBCSEncoding.Create;
if InterlockedCompareExchangeIntf(IInterface(GIdOSDefaultEncoding), LEncoding, nil) <> nil then begin
LEncoding := nil;
end;
end;
Result := GIdOSDefaultEncoding;
end;
Note that I stripped out the .net conditional code for simplicity. Most of this code is to arrange singleton thread-safety. The actual value returned is synthesised by a call to TIdMBCSEncoding.Create. Let's look at that.
constructor TIdMBCSEncoding.Create;
begin
Create(CP_ACP, 0, 0);
end;
Again I've removed conditional code that does not apply to your Windows setting. Now, CP_ACP is the Active Code Page, the current system Windows ANSI code page. So, on Windows at least, IndyTextEncoding_OSDefault is an encoding for the current system Windows ANSI code page.
Why did using IndyTextEncoding_OSDefault give the same behaviour as my Delphi 7 code?
That's because the Delphi 7 / Indy 9 code for TIdEncoderMIME.EncodeString does not perform any code page transformation and MIME encodes the input string as though it were a byte array. Since the Delphi 7 string is encoded in the active ANSI code page, this has the same effect as passing IndyTextEncoding_OSDefault to TIdEncoderMIME.EncodeString in your Unicode version of the code.
What is the difference between IndyTextEncoding_Default and IndyTextEncoding_OSDefault?
Here is the source code for IndyTextEncoding_Default:
function IndyTextEncoding_Default: IIdTextEncoding;
var
LType: IdTextEncodingType;
begin
LType := GIdDefaultTextEncoding;
if LType = encIndyDefault then begin
LType := encASCII;
end;
Result := IndyTextEncoding(LType);
end;
This returns an encoding that is determined by the value of GIdDefaultTextEncoding. By default, GIdDefaultTextEncoding is encASCII. And so, by default, IndyTextEncoding_Default yields an ASCII encoding.
Beyond all this, you should be asking yourself which encoding you want to use. Relying on default values leaves you at the mercy of those defaults. What if they don't do what you want? This matters all the more because the defaults are not Unicode encodings, so they support only a limited range of characters, and some of them depend on system settings.
If you wish to encode international text, you would normally choose to use the UTF-8 encoding.
One other point to make is that you are calling EncodeString as though it were an instance method, but it is actually a class method. You can remove EncoderMIME and call TIdEncoderMIME.EncodeString. Or keep EncoderMIME and call EncoderMIME.Encode.
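Putting the last two points together, a call that names its encoding explicitly might look like the sketch below. It assumes a recent Indy 10 build in which IdGlobal exposes IndyTextEncoding_UTF8 alongside the IndyTextEncoding_OSDefault discussed above. EncodeUtf8Base64 is a hypothetical wrapper name used only for illustration.

```pascal
uses
  IdGlobal, IdCoderMIME;

// Hypothetical wrapper: Base64-encodes the UTF-8 bytes of S.
// EncodeString is called as a class method, so no encoder instance is needed.
function EncodeUtf8Base64(const S: string): string;
begin
  Result := TIdEncoderMIME.EncodeString(S, IndyTextEncoding_UTF8);
end;
```

For pure ASCII input the result is the same under the ASCII, ANSI and UTF-8 encodings; the choice only changes the output once the text contains characters outside ASCII.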

Remove or edit ID3Tag version 2 from MP3 file using Delphi 7

I'm using both the good old MPGTools and my own simple method of setting the ID3 tag in my MP3 files. But both approaches are too old to support ID3 tag version 2. I'm looking for any solution that would allow my application, written in Delphi 7, either to remove the ID3v2 tag from each file it processes or to set it to exactly the same values as the ID3 version 1 tag.
Currently I'm removing the ID3v2 tag manually, using a quick keyboard combination in Winamp.
I don't use v2, album art, or any of these "new" additions, so the quickest way to get rid of the ID3v2 tag (if it exists in a particular file) would be all I need.
Of course I've tried searching the Internet using Google, but either I'm having a bad day or I'm asking the wrong question, because all the results I get for the questions above are fake results from search-engine spammers like Software Informer.
As it happens, one of my projects sitting here awaiting completion is a full ID3 tag editor for MP3 files. It is about 80% done: I'm more of a hobbyist when it comes to Delphi and had more pressing stuff come up, and then I found a program to download that fit my requirements precisely. While v1 was super easy, v2 is much harder. You can refer to the standard document for v2.3 here.
But I will confine myself to the points addressed here.
You might want ID3v2 tags depending on the application. My portable MP3 player only accepts v2 tags, which is what pushed me to do the project in the first place.
ID3v2 tags are written at the beginning of the file in a variable-length block containing a variable number of frames, any of which may or may not be present. Fortunately, the total length of the tag data is stored in the first record (the header) of an ID3v2-tagged file. Hence: read the file, locate the length of the ID3v2 data, then rewrite the file without that data, and the tags are removed. Having the data at the beginning makes the rewrite necessary, which is indeed a frustration; anything I do to the code in the future would involve trying to change data in place. Some very dirty code follows, which AFAIR worked, but you will need to clean it up if you use it (I'm sure some here will be content to point out exactly how I should). Test it well just to be sure. Also be sure to ask if I missed anything you would need from the unit I copied this out of (it's a 19.3KB pas file):
type
sarray = array[0..3] of byte;
psarray = ^sarray;
ID3v2Header = packed record
identifier: array[0..2] of char;
major_version: byte;
minor_version: byte;
flags: byte;
size: DWord;
end;
function size_decodeh(insize: DWord): DWord;
{ decodes the size headers only, which does not use bit 7 in each byte,
requires MSB conversion as well }
var
outdval: DWord;
outd, ind: psarray;
tnext2, pnext2: byte;
begin
outdval := 0;
outd := @outdval;
ind := @insize;
tnext2 := ind^[2] shr 1;
pnext2 := ind^[1] shr 2;
outd^[0] := ind^[3] or ((ind^[2] and $01) shl 7);
outd^[1] := tnext2 or ((ind^[1] and $03) shl 6);
outd^[2] := pnext2 or ((ind^[0] and $07) shl 5);
outd^[3] := ind^[0] shr 3;
Result := outdval;
end;
procedure ID3v2_LoadData(filename: string; var memrec: pointer;
var outsize: integer);
{ procedure loads ID3v2 data from "filename". Returns outsize = 0 if
there is no ID3v2 data }
var
infile: file;
v1h: ID3V2Header;
begin
assign(infile, filename);
reset(infile, 1);
// read main header to get id3v2 size
blockread(infile, v1h, sizeof(v1h));
// detect if there is id3v2 data
if v1h.identifier = 'ID3' then
begin
outsize := size_decodeh(v1h.size);
// read ID3v2 header data
getmem(memrec, outsize);
blockread(infile, memrec^, outsize);
end
else
outsize := 0;
// close the file in both branches so the handle is not leaked
Close(infile);
end;
function id3v2_erase(infilestr: string): boolean;
{ erase all ID3v2 data. Data are stored at the beginning of file, so file
must be rewritten }
const
tempfilename = 'TMp#!0X.MP3';
var
memrec: pointer;
outsize, dataread: integer;
IsID3v2: boolean;
databuffer: array[1..32768] of byte;
newfile, origfile: file;
begin
// reuse service routine to get information
Id3v2_loaddata(infilestr, memrec, outsize);
// is there ID3v2 data?
if outsize > 0 then
begin
// need to clean up after the service routine
freemem(memrec);
// get amount of data to erase
outsize := outsize + sizeof(Id3v2Header);
writeln('Data to delete is: ', outsize, ' bytes.');
// now rewrite the file
AssignFile(origfile, infilestr);
reset(origfile, 1);
AssignFile(newfile, tempfilename);
rewrite(newfile, 1);
Seek(origfile, outsize);
repeat
blockread(origfile, databuffer, sizeof(databuffer), dataread);
blockwrite(newfile, databuffer, dataread);
until dataread = 0;
CloseFile(origfile);
CloseFile(newfile);
// rename temp file and delete original
DeleteFile(infilestr);
RenameFile(tempfilename, infilestr);
IsID3v2 := true;
end
else
IsID3v2 := false;
Result := IsID3v2;
end;
Full editing capability that works in most all situations is obviously a tougher hill to climb than that, but all the details are there in that document I linked to. Hopefully this helps you out.
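The size decoding done by size_decodeh above can also be written with plain shifts, which may be easier to check against the v2.3 document: the header stores the size as four bytes, most significant first, with bit 7 of each byte unused. The following is a minimal sketch of that same calculation. DecodeSyncSafe is a hypothetical name, and it takes the four size bytes in file order rather than a DWord.

```pascal
// Hypothetical helper: decodes the four size bytes of an ID3v2 header,
// passed in file order (b0 is read first). Each byte contributes 7 bits.
function DecodeSyncSafe(b0, b1, b2, b3: Byte): Cardinal;
begin
  Result := (Cardinal(b0 and $7F) shl 21) or
            (Cardinal(b1 and $7F) shl 14) or
            (Cardinal(b2 and $7F) shl 7) or
             Cardinal(b3 and $7F);
end;
```

For example, the size bytes 00 00 02 01 decode to 2 * 128 + 1 = 257 bytes of tag data following the 10-byte header.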
There are a few libraries that work fine with ID3v2. Back in 2006 I did a lot of research to find a Delphi 7 library that supports most of the ID3v2 specification.
I found these two:
Audio Tools Library (the best at that time). I think it could even read and write tags in Unicode. Here is the unit Id3V2.pas
JVCL has a component for working with ID3v2 tags, but in 2006 it didn't have Unicode support for non-Unicode Delphi versions.
By the way, if you do not use JVCL yet, it's not worth installing more than 600 components just to get ID3v2 support.
So, take a look at Audio Tools Library.

How to convert text to UTF-8 Delphi

I have a function that returns an HTML page from the Internet, but the Cyrillic symbols are displayed as unknown characters.
How can I convert the text so that the Cyrillic symbols display normally?
I'm using Delphi 2009 with Indy to send the HTTP request and get the response back from the server.
(I think I have Indy 9.)
This is how i take the HTML page
http := TIDHttp.Create(nil);
http.HandleRedirects := true;
http.ReadTimeout := 5000;
http.Request.ContentType:='multipart/form-data';
param:=TIdMultiPartFormDataStream.Create;
param.AddFormField('subcat_id','501');
param.AddFormField('reg_id','1');
text:=http.Post('example.com',param);
I don't know whether Indy has any function that returns the page as Unicode.
You have not given enough information, but I will try to suggest this: If possible, load the data in a Stream and then create a StringList and load it like this:
var
MS:TMemoryStream;
SL: TStringList;
(...)
begin
MS:=TMemoryStream.Create;
SL:=TStringList.Create;
// Load your string to MS
SL.LoadFromStream(MS, TEncoding.UTF8);
(...)
MS.Free;
SL.Free;
end;
Comment if there is a problem.
Your question title seems to be out of sync with the question body. Assuming you want to decode a UTF-8 encoded HTML page, your friend is the function UTF8Decode. The opposite operation is done by UTF8Encode. These functions have been available since at least Delphi 7 (correct me if D6 applies too). Check out the "See Also" section of the article; there are buffer-handling entry points for more convenience too.
Indy 9 does not support Delphi 2009. Make sure you are using the latest Indy 10 release instead. In Indy 10, the version of TIdHTTP.Post() (and TIdHTTP.Get()) that returns a String will automatically decode the data to Unicode using whatever charset is specified by the server, either in the HTTP Content-Type header, or in a <meta> tag within the HTML itself.

How Can I Efficiently Read The First Few Lines of Many Files in Delphi

I have a "Find Files" function in my program that will find text files with the .ged suffix that my program reads. I display the found results in an explorer-like window that looks like this:
I use the standard FindFirst / FindNext methods, and this works very quickly. The 584 files shown above are found and displayed within a couple of seconds.
What I'd now like to do is add two columns to the display that shows the "Source" and "Version" that are contained in each of these files. This information is found usually within the first 10 lines of each file, on lines that look like:
1 SOUR FTM
2 VERS Family Tree Maker (20.0.0.368)
Now I have no problem parsing this very quickly myself, and that is not what I'm asking.
What I need help with is simply how to most quickly load the first 10 or so lines from these files so that I can parse them.
I have tried to do a StringList.LoadFromFile, but it takes too much time loading the large files, such as those above 1 MB.
Since I only need the first 10 lines or so, how would I best get them?
I'm using Delphi 2009, and my input files might or might not be Unicode, so this needs to work for any encoding.
Followup: Thanks Antonio,
I ended up doing this which works fine:
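var
  CurFileStream: TStream;
  Buffer: TBytes;
  Value: string;
  Encoding: TEncoding;
  BytesRead, PreambleLen: Integer;
// create the stream before the try so that Free never sees an unassigned variable
CurFileStream := TFileStream.Create(folder + FileName, fmOpenRead);
try
  SetLength(Buffer, 256);
  // Read may return fewer than 256 bytes for short files
  BytesRead := CurFileStream.Read(Buffer[0], 256);
  SetLength(Buffer, BytesRead);
  // Encoding must be nil on entry so that the BOM is actually inspected
  Encoding := nil;
  PreambleLen := TEncoding.GetBufferEncoding(Buffer, Encoding);
  // skip the BOM, if any, when decoding
  Value := Encoding.GetString(Buffer, PreambleLen, Length(Buffer) - PreambleLen);
  ...
  (parse through Value to get what I want)
  ...
finally
  CurFileStream.Free;
end;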
Use TFileStream and its Read method to read just the number of bytes needed. Here is an example of reading bitmap info, which is also stored at the beginning of the file.
http://www.delphidabbler.com/tips/19
Just open the file yourself for block reading (not using the built-in TStringList functionality) and read the first block of the file. You can then, for example, load that block into a string list with strings.SetText() (if you are using block functions) or simply strings.LoadFromStream() if you are loading your blocks using streams.
I would personally just go with the FileRead/FileWrite block functions and load the block into a buffer. You could also use similar WinAPI functions, but that's just more code for no reason.
The OS reads files in blocks, which are at least 512 bytes on almost any platform and filesystem, so you can read 512 bytes first (and hope that you got all 10 lines, which will be true if your lines are generally short enough). This will be (practically) as fast as reading 100 or 200 bytes.
Then, if you notice that your strings object has fewer than 10 lines, just read the next 512-byte block and try to parse again. (Or just go with blocks of 1024, 2048 and so on; on many systems that will probably be as fast as 512-byte blocks, as filesystem cluster sizes are generally larger than 512 bytes.)
PS. Also, using threads or asynchronous functionality in winapi file functions (CreateFile and such), you could load that data from files asynchronously, while the rest of your application works. Specifically, the interface will not freeze during reading of large directories.
This will make the loading of your information appear faster, (since the file list will load directly, and then some milliseconds later the rest of the information will come up), while not actually increasing the real reading speed.
Do this only if you have tried the other methods and you feel like you need the extra boost.
You can use a TStreamReader to read individual lines from any TStream object, such as a TFileStream. For even faster file I/O, you could use Memory-Mapped Views with TCustomMemoryStream.
Okay, I deleted my first answer. Using Remy's first suggestion above, I tried again with built-in stuff. What I don't like here is that you have to create and free two objects. I think I would make my own class to wrap this up:
var
fs:TFileStream;
tr:TTextReader;
filename:String;
begin
filename := 'c:\temp\textFileUtf8.txt';
fs := TFileStream.Create(filename, fmOpenRead);
tr := TStreamReader.Create(fs);
try
Memo1.Lines.Add( tr.ReadLine );
finally
tr.Free;
fs.Free;
end;
end;
If anybody is interested in what I had here before, it had the problem of not working with unicode files.
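The two-object bookkeeping mentioned above can be avoided, because TStreamReader also has a constructor that takes a file name and manages the underlying file stream itself. Below is a minimal sketch of reading only the first few lines that way. It assumes Delphi 2009 or later; ReadFirstLines is a hypothetical helper name, and TEncoding.Default is only the fallback used when no byte-order mark is detected.

```pascal
uses
  Classes, SysUtils;

// Hypothetical helper: appends at most MaxLines lines from the start of
// FileName to Lines. Only as much of the file as needed is read.
procedure ReadFirstLines(const FileName: string; MaxLines: Integer;
  Lines: TStrings);
var
  Reader: TStreamReader;
  Added: Integer;
begin
  // DetectBOM = True lets the reader switch encoding if the file has a BOM
  Reader := TStreamReader.Create(FileName, TEncoding.Default, True);
  try
    Added := 0;
    while (Added < MaxLines) and not Reader.EndOfStream do
    begin
      Lines.Add(Reader.ReadLine);
      Inc(Added);
    end;
  finally
    Reader.Free;
  end;
end;
```

A call such as ReadFirstLines(SomeFile, 10, SomeStringList) would then hand the first ten lines to the existing parsing code. Files without a byte-order mark are decoded with the fallback encoding, so UTF-8 files saved without a BOM would need TEncoding.UTF8 passed instead.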
Sometimes old-school Pascal style is not that bad.
Even though non-OO file access doesn't seem to be very popular anymore, ReadLn(F,xxx) still works pretty well in situations like yours.
The code below loads the information (filename, source and version) into a TDictionary so that you can look it up easily, or you can use a list view in virtual mode and look things up in this dictionary when the OnData event fires.
Warning: the code below does not work with Unicode files.
program Project101;
{$APPTYPE CONSOLE}
uses
IoUtils, Generics.Collections, SysUtils;
type
TFileInfo=record
FileName,
Source,
Version:String;
end;
function LoadFileInfo(var aFileInfo:TFileInfo):Boolean;
var
F:TextFile;
begin
Result := False;
AssignFile(F,aFileInfo.FileName);
{$I-}
Reset(F);
{$I+}
if IOResult = 0 then
begin
ReadLn(F,aFileInfo.Source);
ReadLn(F,aFileInfo.Version);
CloseFile(F);
Exit(True)
end
else
WriteLn('Could not open ', aFileInfo.FileName);
end;
var
FileInfo:TFileInfo;
Files:TDictionary<string,TFileInfo>;
S:String;
begin
Files := TDictionary<string,TFileInfo>.Create;
try
for S in TDirectory.GetFiles('h:\WINDOWS\system32','*.xml') do
begin
WriteLn(S);
FileInfo.FileName := S;
if LoadFileInfo(FileInfo) then
Files.Add(S,FileInfo);
end;
// showing file information...
for FileInfo in Files.Values do
WriteLn(FileInfo.Source, ' ',FileInfo.Version);
finally
Files.Free
end;
WriteLn;
WriteLn('Done. Press any key to quit . . .');
ReadLn;
end.

Delphi: problem with httpcli (ICS) post method

I am using the HttpCli component from ICS to POST a request. I use an example that comes with the component. It says:
procedure TForm4.Button2Click(Sender: TObject);
var
Data : String;
begin
Data:='status=no';
HttpCli1.SendStream := TMemoryStream.Create;
HttpCli1.SendStream.Write(Data[1], Length(Data));
HttpCli1.SendStream.Seek(0, 0);
HttpCli1.RcvdStream := TMemoryStream.Create;
HttpCli1.URL := Trim('http://server/something');
HttpCli1.PostAsync;
end;
But in fact, it sends not
status=no
but
s.t.a.t.u
I can't understand, where is the problem. Maybe someone can show an example, how to send POST request with the help of HttpCli component?
PS I can't use Indy =)
I suppose you're using Delphi 2009 or later, where the string type holds two-byte-per-character Unicode data. The Length function gives the number of characters, not the number of bytes, so when you put your string into the memory stream, you only copy half the bytes from the string. Even if you'd copied all of them, though, you'd still have a bunch of extra data in the stream since each character has two bytes and the server probably only expects to get one.
Use a different string type, such as AnsiString or UTF8String.
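As a rough illustration of that suggestion, the request body can be held in a UTF8String so that Length counts bytes and each character of 'status=no' occupies one byte in the stream. This is a sketch under the assumption of Delphi 2009 or later. MakePostStream is a hypothetical helper, not part of ICS.

```pascal
uses
  Classes;

// Hypothetical helper: builds a stream holding the single-byte (UTF-8)
// form of Body, rewound to the start and ready to assign to SendStream.
function MakePostStream(const Body: UTF8String): TMemoryStream;
begin
  Result := TMemoryStream.Create;
  if Length(Body) > 0 then
    // For UTF8String, Length is the byte count, so all bytes are written
    Result.WriteBuffer(Body[1], Length(Body));
  Result.Position := 0;
end;
```

In the handler from the question, the three SendStream lines would then collapse to HttpCli1.SendStream := MakePostStream('status=no'); with the rest left unchanged. The caller still owns the stream and should free it once the request has completed.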
