Get mp3/pdf files using JSoup in Groovy - grails

I am developing an application for crawling the web using crawler4j and Jsoup. I need to parse a web page with Jsoup and check whether it links to zip files, pdf/doc files, or mp3/mov files that are available for download.
For zip files I did the following, and it works:
Elements zip = doc.select("a[href\$=.zip]")
println "No of zip files is " + zip.size()
This code correctly tells me how many zip files there are on a page. I am not sure how to count all audio files or document files using Jsoup. Any help is appreciated. Thanks.

Using the same approach I suspect it would be something like this:
Elements docs = doc.select("a[href\$=.doc]")
println "No of doc files is " + docs.size()
Elements mp3s = doc.select("a[href\$=.mp3]")
println "No of mp3 files is " + mp3s.size()
Really, it's just a selector that matches when the href attribute ends in a given file extension.
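The suffix test itself can be sketched outside Jsoup. Below is a minimal illustration in Python using only the standard library; it is not the Jsoup API, and the `LinkExtensionCounter` class, the `count_links_by_extension` function and the sample HTML are made up for this example. Unlike a plain `$=` selector, this sketch lowercases the extension and ignores query strings, which may or may not be what you want. In Jsoup itself, several suffix selectors can be combined with commas, for example `a[href$=.mp3], a[href$=.mov]` (with the `$` escaped in a Groovy double-quoted string, as in the code above).

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath


class LinkExtensionCounter(HTMLParser):
    """Counts <a href> targets by file extension (lowercased, without the dot)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Look only at the URL path so "?x=1" or "#frag" does not hide the extension.
        ext = posixpath.splitext(urlparse(href).path)[1].lower().lstrip(".")
        if ext:
            self.counts[ext] += 1


def count_links_by_extension(html):
    parser = LinkExtensionCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical sample page, standing in for the crawled document.
SAMPLE_HTML = """
<a href="/files/a.zip">zip</a>
<a href="/audio/song.MP3?dl=1">song</a>
<a href="/docs/report.pdf">report</a>
<a href="/docs/notes.doc">notes</a>
<a href="/about">about</a>
"""

counts = count_links_by_extension(SAMPLE_HTML)
audio = counts["mp3"] + counts["mov"]
documents = counts["pdf"] + counts["doc"]
print("audio/video:", audio, "documents:", documents, "zip:", counts["zip"])
```

Grouping extensions after counting (audio, documents, archives) keeps the parsing pass independent of which categories you care about.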

Related

Resumable upload with new file with special characters in name

I'm following the documentation to create a new upload session for a resumable file upload.
My request looks like:
/v1.0/me/drive/items/:folderId/children/:fileName/createUploadSession
This works when :fileName is something like test.txt or even test 2.txt. But special characters, as in test".txt or test%22.txt, cause the request to fail.
There are no examples in the documentation on how to deal with special characters in this case, so is this supported?
Files stored in OneDrive have naming conventions and restrictions similar to those of files stored locally. Given that OneDrive can sync to your local file system, it makes sense why this is the case.
In general, you should assume you cannot use any of these characters in your file names:
~ " # % & * : < > ? / \ { | }.
You can find the complete list at Invalid file names and file types in OneDrive, OneDrive for Business, and SharePoint.
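A client can check a name before it ever builds the createUploadSession URL. The sketch below, in Python, assumes the character set is exactly the one listed in this answer; the linked article is the authoritative list, and the function names and the underscore replacement policy are hypothetical.

```python
# Characters listed in the answer above. The linked Microsoft article has the
# complete list, so treat this set as an assumption, not a specification.
INVALID_CHARS = set('~"#%&*:<>?/\\{|}')


def invalid_chars_in(file_name):
    """Return the disallowed characters found in file_name, in order, no repeats."""
    found = []
    for ch in file_name:
        if ch in INVALID_CHARS and ch not in found:
            found.append(ch)
    return found


def sanitize(file_name, replacement="_"):
    """Replace each disallowed character with a placeholder (hypothetical policy)."""
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in file_name)


print(invalid_chars_in('test".txt'))
print(invalid_chars_in("test%22.txt"))
print(invalid_chars_in("test 2.txt"))
print(sanitize('test".txt'))
```

Note that percent-encoding the quote does not help under this rule, because `%` is itself on the list.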

Download a zip file from eBay using rest-client

I want to download a zip file from eBay using the downloadFile API.
response = RestClient.post(url,xml,headers)
This call returns the content of the zip file, which I think cannot be extracted as XML. So I want to download the zip file as is from eBay.
My code is:
headers = {
"X-EBAY-SOA-OPERATION-NAME"=>"downloadFile",
"X-EBAY-SOA-SECURITY-TOKEN" => access_token_lister,
"X-EBAY-API-SITEID"=>"0",
"Content-Type"=>"application/zip"
}
url = 'https://storage.ebay.com/FileTransferService'
xml = '<?xml version="1.0" encoding="utf-8"?>
<downloadFileRequest xmlns="ebay.com/marketplace/services">
  <fileReferenceId>6637191637</fileReferenceId>
  <taskReferenceId>6474385857</taskReferenceId>
</downloadFileRequest>'
The documentation for the API used above can be found here: http://developer.ebay.com/devzone/file-transfer/CallRef/downloadFile.html
You will have to pass the content of the response to a library like rubyzip, then use it to extract your files from the archive. You can do this by first writing the response body to disk and then reading it with rubyzip.
Specifically, look at the "Reading a Zip file" part of the rubyzip documentation to accomplish what you are trying to achieve.
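The write-to-disk-then-open flow described above can be sketched as follows. This is Python with the standard `zipfile` module rather than Ruby and rubyzip, and the response body is faked with a zip built in memory; `save_and_extract`, `download.zip` and `report.xml` are hypothetical names. If the real body is wrapped in some other envelope, the archive part would have to be separated first, which is what the `is_zipfile` check is meant to catch.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def save_and_extract(zip_bytes, work_dir):
    """Write raw zip bytes to disk, then extract them; returns the member names."""
    work_dir = Path(work_dir)
    archive_path = work_dir / "download.zip"   # hypothetical file name
    archive_path.write_bytes(zip_bytes)        # binary write, no text decoding

    if not zipfile.is_zipfile(archive_path):
        raise ValueError("response body is not a plain zip archive")

    out_dir = work_dir / "extracted"
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


# Stand-in for the HTTP response body: build a small zip in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.xml", "<report/>")
fake_body = buffer.getvalue()

with tempfile.TemporaryDirectory() as tmp:
    names = save_and_extract(fake_body, tmp)
    content = (Path(tmp) / "extracted" / "report.xml").read_text()

print(names, content)
```

The important detail in any language is to treat the body as bytes from end to end; decoding it as text first tends to corrupt the archive.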

PHPEXCEL weird characters on form inputs

I need some help with the PHPExcel library. Everything works great and I'm successfully exporting my SQL query to an Excel5 file. I need to give this file to a transport company so they can automatically collect information about packages. Unfortunately, the generated Excel file has extra characters between each letter of the cell text, and when the file is imported you have to delete these characters manually.
If I open the file in Excel, everything is fine and I see: COMPANY NAME. If I open it with Notepad++, I see the cell values this way: C(NUL)O(NUL)M(NUL)P(NUL)A(NUL)N(NUL)Y N(NUL)A(NUL)M(NUL)E
If I open the file in Excel again and save it, then reopen it with Notepad++, I see COMPANY NAME.
So I do not understand why, every time I create an Excel file using PHPExcel, every letter of every word is followed by (nul).
How do I prevent the generated Excel file from including (nul) between the letters?
The Excel files generated by the PHPExcel samples are also filled with (nul), and if you open and save them, the (nul) is gone.
Any help would be appreciated, thanks.
What is the (nul)? 0x00? char(0)?
OK, here is the example:
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
date_default_timezone_set('Europe/London');

// This script streams a download, so refuse to run from the command line.
if (PHP_SAPI == 'cli')
    die('Available only in a browser');

require_once dirname(__FILE__) . '/Classes/PHPExcel.php';

// Create the workbook and set its document properties.
$objPHPExcel = new PHPExcel();
$objPHPExcel->getProperties()->setCreator("Solidus")
    ->setLastModifiedBy("Solidus")
    ->setTitle("Import web")
    ->setSubject("Import File")
    ->setDescription("n.a")
    ->setKeywords("n.a")
    ->setCategory("n.a");

// Fill two cells on the first sheet.
$objPHPExcel->setActiveSheetIndex(0)
    ->setCellValueExplicit("A1", "COMPANY")
    ->setCellValue('A2', 'SAMSUNG');
$objPHPExcel->getActiveSheet()->setTitle('DDT');
$objPHPExcel->setActiveSheetIndex(0);

// Send the workbook to the browser as an .xls attachment.
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="TEST.xls"');
header('Cache-Control: max-age=0');
header('Cache-Control: max-age=1');
header('Cache-Control: private',false);

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
// Discard any buffered output so it cannot end up inside the binary file.
ob_end_clean();
$objWriter->save('php://output');
As you can see from this small example, the script creates an Excel5 file with two cells: A1 = COMPANY, A2 = SAMSUNG.
When I send this file to the transport company, they import it into their system, but as you can see from the picture, there is a weird character between each letter.
I noticed that every time I open the generated Excel5 file with Notepad++ I get:
S(nul)A(nul)M(nul)S(nul)U(nul)N(nul)G
If I save the file with Excel and then open it again with Notepad++ I get:
SAMSUNG
and this file is OK for the transport company.
So my question is: how do I keep the generated file from containing the (nul) character between each letter?
Some help?
I found the solution myself; I'll explain it in case anyone else has this problem.
There is no way to change how PHPExcel encodes the Excel file,
so I figured the problem was in reading the file. I ran some tests and reproduced the problem: every time I read the file and put the result into form inputs, I get weird characters:
C�O�M�P�A�N�Y�
If I set the output encoding as follows:
$excel->setOutputEncoding('UTF-8');
the file loads fine, so the problem was not in creating the Excel file but in reading it.
If I print the variable with ECHO I get: "COMPANY",
if I put the variable on input as value I get: "C�O�M�P�A�N�Y�"
Setting the output encoding solves the problem, but I would like to know why there is a difference when I put the variable into an input as its value. Thanks.
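A likely explanation, offered as an assumption based on the symptoms rather than something confirmed in this thread: the legacy .xls format can store cell text as UTF-16LE, in which every basic Latin letter is followed by a 0x00 byte. A tool that shows raw bytes displays those as NUL, and a reader that hands the bytes back undecoded passes them on to the page; how a browser then renders stray NUL bytes can depend on context, which may be why echo and an input value looked different. The Python sketch below only demonstrates the encoding effect and does not use PHPExcel or the Excel reader from the post.

```python
text = "COMPANY"

# UTF-16LE stores each of these letters as two bytes: the letter, then 0x00.
raw = text.encode("utf-16-le")
print(raw)

# Roughly what a byte-oriented viewer shows: NUL after every letter.
as_seen = raw.decode("latin-1").replace("\x00", "(NUL)")
print(as_seen)

# Decoding with the right encoding and re-encoding as UTF-8 removes the NULs,
# which is the effect the output-encoding setting appears to have.
utf8 = raw.decode("utf-16-le").encode("utf-8")
print(utf8)
```

Under this reading the generated file was never broken; the NUL bytes are part of a valid encoding and only look wrong when the bytes are interpreted one at a time.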

Using printer name Adobe PDF

I have looked everywhere for a solution. The code below lets me print to the Adobe PDF printer, but I want to automate the Save As dialog so that the file gets a generic name in a specific folder. For example, the file would be saved to C:\temp\tmpResize.pdf. That is where I am having problems.
var params = this.getPrintParams();
params.interactive=params.constants.interactionLevel.silent;
params.pageHandling=params.constants.handling.none;
params.fileName = "/c/temp/tmpResize.pdf";
params.printerName = "Adobe PDF";
this.print(params);
Thanks for your help.

Clarification on what can be exported to excel on ipad

I am trying to fix an old .asp site to work on an iPad. One of the features is the user's ability to download their search results into an Excel worksheet. The code uses:
Response.ContentType = "application/vnd.ms-excel"
Response.AddHeader "Content-Disposition", "attachment;filename=results.xls"
Response.CharSet = "iso-8859-1"
When viewing the site on the iPad, clicking the link for the page with the code above does nothing; it just spins. Is it because I am trying to export the data as Excel? I have read in some posts that the encoding is to blame. Should I convert the code to export the results page as a CSV file and let the user open it in whatever they have available? What's the best way to do this to reach the most devices?
Thanks
I had the same scenario in the past, and this is what I did:
FILE: DOWNLOAD.ASP
<%
' get the file to download
myFile = request.querystring("File")
myFullPath = "c:\name_folder\" & myFile ' example of full path and filename
' set headers
Response.ContentType = "application/octet-stream"
Response.AddHeader "Content-Disposition", "attachment; filename=" & myFile
' send the file using a binary ADODB.Stream
Set adoStream = CreateObject("ADODB.Stream")
adoStream.Open()
adoStream.Type = 1 ' 1 = adTypeBinary
adoStream.LoadFromFile(myFullPath)
Response.BinaryWrite adoStream.Read()
adoStream.Close
Set adoStream = Nothing
%>
FILE: HTML
<a href="download.asp?File=Result.xls">Download Excel file</a>
This example works fully on the iPad using the native Safari browser.
The file Result.xls is downloaded and loaded in the viewer, without the ability to edit it.
My iPad users use the QuickOffice app to save the file in a virtual folder, rename it, delete it, and so on, but they can't edit the file. That app is only for managing files and isn't required for the download.
If your users also need to edit the XLS file on the iPad, I suggest using (for example) Google Docs, which lets the user edit and manage the file directly in the browser.
Hope it helps
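One caveat about the download script above: it concatenates the File query-string value straight into a path, so a value containing ..\ could reach files outside the intended folder. The sketch below shows the same flow (resolve the file, set the two headers, return the bytes) with a file-name check added. It is Python rather than classic ASP, and `build_download` and the sample file are hypothetical.

```python
import tempfile
from pathlib import Path


def build_download(base_dir, requested_name):
    """Return (headers, body) for a file inside base_dir, or raise ValueError."""
    # Keep only the final path component so "..\\secret.txt" cannot escape base_dir.
    safe_name = Path(requested_name.replace("\\", "/")).name
    if safe_name != requested_name or safe_name in ("", ".."):
        raise ValueError("invalid file name")

    full_path = Path(base_dir) / safe_name
    if not full_path.is_file():
        raise ValueError("file not found")

    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }
    return headers, full_path.read_bytes()


# Hypothetical usage with a throwaway folder standing in for c:\name_folder\.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Result.xls").write_bytes(b"fake xls bytes")
    headers, body = build_download(tmp, "Result.xls")
    try:
        build_download(tmp, "..\\secret.txt")
        rejected = False
    except ValueError:
        rejected = True

print(headers["Content-Disposition"], len(body), rejected)
```

Rejecting anything that is not a bare file name is stricter than necessary for some sites, but it is the simplest rule to reason about.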
