Is it possible to import text from an online .txt file to Google sheets - google-sheets

I am trying to import text from the ads.txt files of certain websites into Google Sheets. I tried IMPORTXML, but it reports that the imported XML content cannot be parsed.
Example:
I'm trying to import text from this file --> financhill.com/ads.txt
I'm using this formula: =IMPORTXML("https://financhill.com/ads.txt","/html/body/pre/text()")
The result is #N/A, with the message that the imported XML content cannot be parsed.

Looking at the data at that URL, it appears to be CSV data rather than XML or HTML. I think this is why IMPORTXML returns #N/A with the message that the imported XML content cannot be parsed. In this case, how about using IMPORTDATA as follows?
=IMPORTDATA("https://financhill.com/ads.txt")
or, with the delimiter given explicitly:
=IMPORTDATA("https://financhill.com/ads.txt",",")
Reference:
IMPORTDATA
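
To see why a CSV-style import suits this file, here is a minimal Python sketch that splits ads.txt-style lines on commas. The sample text and the parse_ads_txt name are made up for illustration; they are not taken from the real financhill.com file.

```python
import csv

# Hypothetical ads.txt content, invented for this example.
SAMPLE = """\
# sample ads.txt
google.com, pub-0000000000000000, DIRECT, f08c47fec0942fa0
example-exchange.com, 12345, RESELLER
"""

def parse_ads_txt(text):
    """Split each non-comment line of ads.txt-style text into fields."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        # skipinitialspace drops the blank that follows each comma
        rows.append(next(csv.reader([line], skipinitialspace=True)))
    return rows

for row in parse_ads_txt(SAMPLE):
    print(row)
```

Each record becomes one row of comma-separated fields, which is the same shape IMPORTDATA spreads across the cells of a sheet.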

Related

Use VBScript to Parse a Webpage's Text

I'm currently working on a VBScript that will open multiple URLs in order to update documents on a server. I was wondering if there was a way to parse a webpage's content for a specific string, in this case being the updateResult SUCCESS line shown below:
I need to be able to record the success of this webpage text as opposed to the failure page below:
This is all that is on the webpage. How would I go about parsing the text of both these types of pages in order to know that the document has updated correctly or not?

RTF file to TXT/CSV file in objective-c?

I have RTF files containing that sort of content:
long_text_description_1 number1a number1b number1c
long_text_description_2 number2a number2b number2c
long_text_description_3 number3c
long_text_description_4 number4a number4b number4c
…
I need to extract the plain raw text without the colours, fonts and other formatting.
The only thing I need to keep is the most basic row/column information; ideally I would like a CSV file.
The file I get contains all the formatting:
{\cs18\lang1033\langfe1033\f0\b\i0\ul0\strike0\scaps0\fs15\afs15\charscalex100\expndtw0\cf1\dn0 number1a}
What is the best way to remove all rtf information while only keeping the row information?
Working out a long series of regular expressions myself sounds dangerous without a complete understanding of the RTF format.
What I could find on the Internet mostly focused on Windows languages and libraries that are unavailable on iOS.
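All RTF tags have the form \xxx.
Try using a regular expression that matches a backslash followed by non-whitespace characters, i.e. \\\S+ (written "\\\\\\S+" as an Objective-C string literal), and remove all matches or replace them with nothing.
This removes every backslash together with the non-whitespace characters that follow it. For your example, you'll end up with { number1a}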
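
As a minimal sketch of that approach, here it is in Python rather than Objective-C, with the pattern written as a backslash followed by non-whitespace characters. The function names are made up for illustration, and this only handles simple runs like the one in the question; it is not a general RTF parser.

```python
import re

# A literal backslash followed by one or more non-whitespace characters.
RTF_CONTROL = re.compile(r"\\\S+")

def strip_rtf_controls(text):
    """Remove every run of RTF control words from the text."""
    return RTF_CONTROL.sub("", text)

def strip_group_braces(text):
    """Drop the { } group delimiters and surrounding whitespace."""
    return text.replace("{", "").replace("}", "").strip()

sample = (r"{\cs18\lang1033\langfe1033\f0\b\i0\ul0\strike0\scaps0"
          r"\fs15\afs15\charscalex100\expndtw0\cf1\dn0 number1a}")

print(strip_rtf_controls(sample))                      # { number1a}
print(strip_group_braces(strip_rtf_controls(sample)))  # number1a
```

Escaped characters in RTF, such as \{ or \\, would be mangled by this pattern, so the concern about needing to understand the format still applies to anything beyond simple input.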

CSV file with Italic values

Is there any way to create a CSV file using C# in which a few values are shown in italics when the file is opened in Excel?
It's just not possible. There's no markup in CSV. Either export an xls(x) or rethink your problem/solution. Why CSV? It's not really meant for people to read, only to transfer data from one application to another.
A CSV file is a plain text file. Excel can only guess the type of each field's content as best it can (text, numeric, date); it cannot apply formatting within a field. So the short answer is no.
There are libraries available for the ASP.NET MVC environment which allow you to create true Excel files so you then have complete control over field formats etc. A quick Google will find these.
UPDATE
A possible solution, if you are using MVC, is to create an HTML 'file' and then download that:
this.Response.AddHeader("Content-Disposition", "attachment; filename=Employees.xls");
this.Response.ContentType = "application/vnd.ms-excel";
return this.Content(sb.ToString());
I've never tried this but have seen that it might work.
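
For illustration only, here is a rough Python sketch of the kind of HTML string that could be sent as the response body, with selected columns wrapped in <i> tags. The html_table function and its italic_columns parameter are hypothetical, and whether Excel honours the italics when opening such a file is an assumption that has not been verified here.

```python
from html import escape

def html_table(rows, italic_columns=()):
    """Build an HTML table, wrapping the given column indexes in <i>."""
    lines = ["<table>"]
    for row in rows:
        cells = []
        for index, value in enumerate(row):
            text = escape(str(value))  # encode &, < and > in cell text
            if index in italic_columns:
                text = f"<i>{text}</i>"
            cells.append(f"<td>{text}</td>")
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)

print(html_table([["Ann", "R&D"], ["Bob", "Sales"]], italic_columns={1}))
```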

Create csv from html pages

There is a website that displays a lot of data in html tables. They have paged the data so there are around 500 pages.
What is the most convenient (easy) way of getting the data in those tables and downloading it as a CSV, on Windows?
Basically I need to write a script that does something like the following, but writing it in C# seems like overkill, so I am looking for other solutions that people with web experience use:
for(i=1 to 500)
load page from http://x/page_i.html;
parse the source and get the data in table with id='data'
save results in csv
Thanks!
I was doing a screen-scraping application once and found BeautifulSoup to be very useful. You could easily plop that into a Python script and parse across all the tags with the specific id you're looking for.
The easiest non-C# way I can think of is to use Wget to download the page, then run HTMLTidy to convert it to XML/XHTML and then transform the resulting XML to CSV with an XSLT (run with MSXSL.exe)
You will have to write some simple batch files and an XSLT with a basic XPath selector.
If you feel it would be easier to just do it in C#, you can use SgmlReader to read the HTML DOM and do an XPath query to extract the data. It should not take more than about 20 lines of code.
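
The loop from the question can be sketched in Python along these lines. This version uses the standard library's html.parser in place of BeautifulSoup so that it has no dependencies; the TableExtractor and table_to_csv names are made up, and the table id 'data' is the one given in the question.

```python
import csv
import io
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the cell text of the table with the given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # > 0 while inside the target table
        self.rows = []
        self._row = None    # cells of the row being read
        self._cell = None   # text pieces of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Enter the target table, or count a table nested inside it.
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self._row = []
        elif self.depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1 and tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif self.depth == 1 and tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def table_to_csv(html, table_id="data"):
    """Return the rows of the chosen table as CSV text."""
    parser = TableExtractor(table_id)
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()

page = ('<table id="other"><tr><td>x</td></tr></table>'
        '<table id="data"><tr><th>a</th><th>b</th></tr>'
        '<tr><td>1</td><td>2, 3</td></tr></table>')
print(table_to_csv(page))
```

To cover all 500 pages, each page could be fetched with urllib.request.urlopen inside a loop and the resulting rows appended to a single CSV file. That part is left out here because it depends on the real site's URLs and encoding.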

What other settings are needed to view an HTML table in spreadsheet format in OpenOffice.org?

I have generated an HTML table from my web application and saved the table in .xls format (in short, I am generating a .xls sheet from my web application).
What other settings do I need so that it is shown in table form?
You are not producing an XLS file, you are producing a mal-formed HTML file with a name that ends in .xls.
Indeed, you aren't even doing that since there aren't files on the web (there are streams that may or may not end up in files).
Different versions of Open Office, with different settings, will differ in how they deal with input that is wrong. The version on one of the machines you are using is saying "eh, this isn't XLS, oh! it's HTML with a table, I know what to do", while the other is getting as far as "eh, this isn't XLS, it's a bunch of text with strange less-than and greater-than characters all over the place, what do I do".
What you want to do is to produce an actual stream that Open Office and other spreadsheets can deal with. XLS is possible, but pretty hard. Go for CSV instead.
If your table was going to be:
<table>
<tr>
<th>1 heading</th><th>2 &amp; last heading</th>
</tr>
<tr>
<td>1st cell</td><td>This is the "ultimate" cell</td>
</tr>
</table>
Then it should become:
"1 heading","2 & last heading"
"1st cell","This is the ""ultimate"" cell"
In other words: newlines to indicate rows, commas to indicate cells, no HTML encoding, quotes around everything, and quotes in your actual content doubled up. (You don't always need quotes around your content, but they are never wrong, so that's simpler than working out when you do need them.)
Now, make your content type "text/csv".
You are now outputting a CSV stream that can be saved as a CSV file. Your spreadsheet software will have a much better idea of what to do with this (it may still ask about character encodings on opening, but the preview will show you a spreadsheet of data, not a bunch of HTML source all over the place).
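
As a minimal sketch of those quoting rules, Python's csv module applies them when told to quote every field. The rows_to_csv name is made up for illustration, and the rows are the ones from the example table above.

```python
import csv
import io

def rows_to_csv(rows):
    """Write rows as CSV with every field quoted and inner quotes doubled."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()

rows = [
    ["1 heading", "2 & last heading"],
    ["1st cell", 'This is the "ultimate" cell'],
]
print(rows_to_csv(rows))
```

The text this returns is what would be sent as the response body with the content type "text/csv".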
It's not really saving as a .xls file; it appears to be saving the HTML, but with a .xls extension. How are you generating the .xls? On the server side, you can provide a button to generate .xls directly. The method depends on your server platform: in Perl there is the Spreadsheet::WriteExcel module, which writes .xls directly, in Java there are JExcel (http://jexcelapi.sourceforge.net/) and POI (http://poi.apache.org/), and other platforms will have their own methods.
Okay Subodh, if you want to generate .xls or .csv files, you can't just change the extension of the file and have it open correctly in that program.
You have two options at this point. Both involve creating the file with the data on the server and then sending it to the user to download.
.csv
CSV files are easier to generate on the server side. In a very basic way, you can think of them as regular text files in which commas (though not necessarily only commas) separate the individual cells, and which spreadsheet programs can read. For PHP there is an article here that explains how to generate CSV files.
.xls
xls files are not as simple to generate as CSV files. On the server side you will need a solution to generate these. For PHP there is a resource here.
Using xls over CSV has the obvious advantage that you can specify formatting and control the visual representation of your data.
Edit :
Looking closely at the image you posted, I can see what you are trying to do. If you just want to get that file to open correctly in a spreadsheet program, then don't save it as either CSV or xls:
hello.html
<table>
<tr><td>Hi</td><td>Hi</td><td>Hi</td><td>Hi</td></tr>
<tr><td>2</td><td>2</td><td>131</td><td>11312</td></tr>
</table>
Saved as an HTML file, this will open correctly (as a proper table) in any spreadsheet program.
To narrow down the problem:
1) Are you opening the same .xls file on both machines?
- what version of OpenOffice is on Machine 1?
- what version of OpenOffice is on Machine 2?
2) How are you creating your .xls file?
- are you just using the response object to change the content-type, or some proprietary software?
- can you include a code sample?
3) Have you tried a pure HTML format?
