I got my little HTML fisher working, and it grabs a text file from a URL string. I can call setText on my EditText view and it does display the text from the file, but there is no formatting at all (I am talking about simple things like carriage returns and line feeds). How can I get this to render a bit more nicely in the EditText? The original text file looks like this:
Imagine the following from a resource:
http://www.someaddress.com/thetextfile.txt
Original Text File here.
1. First thing on the text file.
2. Second thing on the text file...
Finally the end of the text file. This is a long string blah blah
that I like to use here. Notice the carriage return line breaks and indentation
on the paragraph.
I can get the above as a string, but there are no carriage returns at all when it displays in the EditText. Is there any way I can add them? It's not a matter of adding \n or \r, because I suspect those are already in the text file (it was written in Notepad++). So is there any way to get this even slightly more formatted (and to preserve the formatting when the string is saved back out to disk or to a database)?
EDIT:
<EditText android:id="@+id/contract_text_input"
android:layout_width="650sp" android:layout_height="wrap_content"
android:lines="25"
android:scrollbars = "vertical"
android:gravity="top|left" android:inputType="textMultiLine"
android:scrollHorizontally="false"
android:minWidth="10.0dip"
android:maxWidth="5.0dip"/>
EDIT:
These are the methods that fetch the text file from the internet. They seem to be stripping out all the \n and \r. But if I inspect the file that's online, it only has \t's in it. So maybe it's FileZilla's upload of the file to my web server?
public static InputStream getInputStreamFromUrl(String url){
InputStream contentStream = null;
try{
HttpClient httpclient = new DefaultHttpClient();
HttpResponse response = httpclient.execute(new HttpGet(url));
contentStream = response.getEntity().getContent();
} catch(Exception e){
e.printStackTrace();
}
return contentStream;
}
public static String getStringFromUrl(String url) {
BufferedReader br = new BufferedReader(new InputStreamReader(getInputStreamFromUrl(url)));
StringBuffer sb = new StringBuffer();
try{
String line = null;
while ((line = br.readLine())!=null){
sb.append(line);
}
}catch (IOException e){
e.printStackTrace();
}
return sb.toString();
}
This is how I am ultimately updating the EditText:
private class FragmentHttpHelper extends AsyncTask<Void, Void, String>{
protected void onPostExecute(String result) {
contractTextTxt.setText(result);
}
@Override
protected String doInBackground(Void... params) {
return getStringFromUrl(urlReferenceTxt.getText().toString());
}
}
Check the Android API. The problem is in the following line of code:
while ((line = br.readLine())!=null)
{
//code goes here
}
The call to readLine() removes newlines, carriage returns and linefeeds. You need to use read() if you want to get everything.
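As a rough sketch of what that could look like (the class and method names here are made up for illustration, not part of the original code): readAll() uses read() into a buffer so every line terminator survives as-is, while readLines() keeps readLine() but puts a line break back after each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadPreservingNewlines {

    // Reads everything from the reader in chunks, so line terminators
    // (\r, \n, \r\n) are kept exactly as they appear in the input.
    public static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = in.read(buffer)) != -1) {
            sb.append(buffer, 0, count);
        }
        return sb.toString();
    }

    // Alternative: keep readLine(), but append a line break after each line.
    // This normalizes every terminator to '\n' and always ends with one.
    public static String readLines(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "Original Text File here.\r\n\r\n1. First thing on the text file.\r\n";
        // Compare what comes out of readAll() with what went in.
        System.out.println(readAll(new StringReader(sample)).equals(sample));
    }
}
```

In the question's getStringFromUrl, either variant would take the place of the readLine() loop, wrapping the InputStream in an InputStreamReader as before.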
Hi sorry I just did a similar thing in an app of mine
and called
((EditText) findViewById(R.id.textView1))
.setText("Original Text File here.\r\n\r\n1. First thing on the text file.\r\n\r\n2. Second thing on the text file...\r\n\r\n Finally the end of the text file. This is a long string blah blah that I like to use here. Notice the carriage return line breaks and indentation on the paragraph.");
which would be the same as your string
and I get
[screenshot of the EditText, not preserved]
which looks fine to me!
my edit text was
<EditText android:id="@+id/textView1" android:layout_width="wrap_content"
android:layout_weight="0" android:layout_height="wrap_content"></EditText>
edit2: check the debugger's string value
I have a bunch of questions relating to jsoup's charset support, most of which are supported by quotes from the API docs:
jsoup.Jsoup:
public static Document parse(File in, String charsetName) ...
Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 ...
Does this mean the 'charset' meta-tag isn't used to detect the encoding?
jsoup.nodes.Document:
public void charset(Charset charset)
... This method is equivalent to OutputSettings.charset(Charset) but in addition ...
public Charset charset()
... This method is equivalent to Document.OutputSettings.charset().
Does this mean there isn't an "input charset" and "output charset", and that they are indeed the same setting?
jsoup.nodes.Document:
public void charset(Charset charset)
... Obsolete charset / encoding definitions are removed!
Will this remove the 'http-equiv' meta-tag in lieu of the 'charset' meta-tag? For backwards compatibility, is there any way to keep both?
jsoup.nodes.Document.OutputSettings:
public Charset charset()
Where possible (when parsing from a URL or File), the document's output charset is automatically set to the input charset. Otherwise, it defaults to UTF-8.
I need to know if the document hasn't specified an encoding*. Does this mean jsoup can't provide this information?
* instead of defaulting to UTF-8, I will run juniversalchardet.
The docs are out of date / incomplete. Jsoup does use the charset meta tag, as well as the http-equiv tag to detect the charset. From the source, we see that this method looks like this:
public static Document parse(File in, String charsetName) throws IOException {
return DataUtil.load(in, charsetName, in.getAbsolutePath());
}
DataUtil.load in turn calls parseByteData(...), which looks like this: (Source, scroll down)
//reads bytes first into a buffer, then decodes with the appropriate charset. done this way to support
// switching the chartset midstream when a meta http-equiv tag defines the charset.
// todo - this is getting gnarly. needs a rewrite.
static Document parseByteData(ByteBuffer byteData, String charsetName, String baseUri, Parser parser) {
String docData;
Document doc = null;
if (charsetName == null) { // determine from meta. safe parse as UTF-8
// look for <meta http-equiv="Content-Type" content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
docData = Charset.forName(defaultCharset).decode(byteData).toString();
doc = parser.parseInput(docData, baseUri);
Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
if (meta != null) { // if not found, will keep utf-8 as best attempt
String foundCharset = null;
if (meta.hasAttr("http-equiv")) {
foundCharset = getCharsetFromContentType(meta.attr("content"));
}
if (foundCharset == null && meta.hasAttr("charset")) {
try {
if (Charset.isSupported(meta.attr("charset"))) {
foundCharset = meta.attr("charset");
}
} catch (IllegalCharsetNameException e) {
foundCharset = null;
}
}
(Snip...)
The following line from the above code snippet shows us that indeed, it uses either meta[http-equiv=content-type] or meta[charset] to detect the encoding, otherwise falling back to utf8.
Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
I'm not quite sure what you mean here, but no, the output charset setting controls what characters are escaped when the document HTML / XML is printed to string, whereas the input charset determines how the file is read.
It will only ever remove meta[name=charset] items. From the source, the method which updates / removes the charset definition in the document: (Source, again scroll down)
private void ensureMetaCharsetElement() {
if (updateMetaCharset) {
OutputSettings.Syntax syntax = outputSettings().syntax();
if (syntax == OutputSettings.Syntax.html) {
Element metaCharset = select("meta[charset]").first();
if (metaCharset != null) {
metaCharset.attr("charset", charset().displayName());
} else {
Element head = head();
if (head != null) {
head.appendElement("meta").attr("charset", charset().displayName());
}
}
// Remove obsolete elements
select("meta[name=charset]").remove();
} else if (syntax == OutputSettings.Syntax.xml) {
(Snip..)
Essentially, if you call charset(...) and it does not have a charset meta tag, it will add one, otherwise update the existing one. It does not touch the http-equiv tag.
If you want to find out whether the document specifies an encoding, just look for an http-equiv Content-Type meta tag or a meta charset tag. If there are no such tags, the document does not specify an encoding.
Jsoup is open source, so you can look at the source yourself to see exactly how it works: https://github.com/jhy/jsoup/ (You can also modify it to do exactly what you want!)
I'll update this answer with further details when I have time. Let me know if you have any other questions.
Quite simple question. I have the following code
@Html.Raw(following.Description).ToString()
When this comes from the database it has some markup in it (it's a forum post), but I want to show a snippet in the list without the markup.
Is there any way to remove the markup by changing this line, or should I just strip it with a regex in the controller?
Here is a utility class extension method that is able to strip tags from fragments without using Regex:
public static string StripTags(this string markup)
{
try
{
StringReader sr = new StringReader(markup);
XPathDocument doc;
using (XmlReader xr = XmlReader.Create(sr,
new XmlReaderSettings()
{
ConformanceLevel = ConformanceLevel.Fragment
// for multiple roots
}))
{
doc = new XPathDocument(xr);
}
return doc.CreateNavigator().Value; // .Value is similar to .InnerText of
// XmlDocument or JavaScript's innerText
}
catch
{
return string.Empty;
}
}
Can you give me a hint on how to start an EPUB reader app for BlackBerry?
I want the simplest way to do it. Is there any browser or UI component that can read and display EPUB files?
I want to download the file from a server and then show it to the user to read.
A couple of days ago, an EPUB reader library was added here. I tried to use it, but ran into some difficulties: it could open EPUB files only from resources, not from the file system. So I decided to download the source and adapt it.
First, I wrote a small function that opens the Epub file as a stream:
public static InputStream GetFileAsStream(String fName) {
    FileConnection fconn = null;
    DataInputStream is = null;
    try {
        fconn = (FileConnection) Connector
                .open(fName, Connector.READ_WRITE);
        is = fconn.openDataInputStream();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
    // null if the file could not be opened
    return is;
}
Then, I replaced the call that opens the file in com.omt.epubreader.domain.epub.java, so it became like this:
public Book getBook(String url)
{
InputStream in = ConnectionController.GetFileAsStream(url);
...
return book;
}
After that, I could read the file successfully, but another problem appeared: it wasn't able to read the sections, i.e. the .html files. After a short debugging session I found the cause: whoever wrote that library left the code that reads the .html file names empty. In com.omt.epubreader.parser.NcxParser it looked like this:
private void getBookNcxInfo()
{
...
if(pars.getEventType() == XmlPullParser.START_TAG &&
pars.getName().toLowerCase().equals(TAG_CONTENT))
{
if(pars.getAttributeCount()>0)
{
}
}
...
}
I just added this line to the if clause:
contentDataFileName.addElement(pars.getAttributeValue("", "src"));
and after that, it worked just perfectly.
I've got a pop-up textarea, where the user writes some lengthy comments. I would like to store the content of the textarea to a file on the server on "submit". What is the best way to do it and how?
Thanks,
This would be very easy to do. The text could be just a String or a StringBuffer, depending on size and formatting. Then just pass that to your Java code and use file operations to write it to a file.
This is some GWT code, but it's still Ajax, so it will be similar. Get a handler for an event to capture the button submittal, then get the text in the text area.
textArea.addChangeHandler(new ChangeHandler() {
public void onChange(ChangeEvent changeEvent) {
String text = textArea.getText();
}
});
I don't know what the hand-off mechanism would be, because you don't show any code, but I just created a file of file names, line by line, by reading the file names out of a list of files with this:
private void writeFilesListToFile(List<File> filesList) {
for(File file : filesList){
String fileName = file.getName();
appendToFile(fileName);
}
}
private void appendToFile(String text){
try {
// second argument true = append to the file instead of overwriting it on every call
BufferedWriter out = new BufferedWriter(new FileWriter(<file path and file name>, true));
out.write(text);
out.newLine();
out.close();
} catch (IOException e) {
System.out.println("Error appending file with filename: " + text);
}
}
You could do something similar, only write out the few lines you got from the textarea. Without more to go on I can't really get more specific.
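For the server side, a minimal sketch might look like the following. It assumes the textarea content has already arrived on the server as a String; the class name, method name, and temp-file location are invented for the example, and how the text gets from the browser to this method depends on your setup.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CommentFileWriter {

    // Appends one submitted comment to the target file, creating the file if needed.
    // The text is written as-is, so line breaks typed in the textarea are preserved.
    public static void appendComment(Path target, String comment) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(
                target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(comment);
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location; a real app would use a configured path.
        Path target = Files.createTempFile("comments", ".txt");
        appendComment(target, "First comment,\nspanning two lines.");
        appendComment(target, "Second comment.");
        System.out.print(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

The try-with-resources block closes the writer even if the write fails, which the out.close() call in the snippet above would not do.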
HTH,
James
I have a Word file that contains placeholder text in the form {pattern}, and I want to replace each placeholder with a string read from a database. I use Open XML to read my .docx template into a stream, replace the pattern text, and return the result as a stream for download without creating a temporary file. But the generated .docx file gives an error when I open it. Below is my example code:
public ActionResult SearchAndReplace(string FilePath)
{
MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(FilePath));
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
// Instead of using the code below to write the text back to the original file, I write the new string to the memory stream and return it as a file download.
//using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
//{
// sw.Write(docText);
//}
using (StreamWriter sw = new StreamWriter(mem))
{
sw.Write(docText);
}
}
mem.Seek(0, SeekOrigin.Begin);
return File(mem, "application/octet-stream","download.docx"); //Return to download file
}
Please suggest a solution that reads the text from a Word file and replaces the pattern text without writing the data back to the original file. Is there a way to replace text with the WordprocessingDocument library? How can I return a memory stream that contains a valid .docx file?
The approach you are taking is not correct. If, by chance, the pattern you are searching for matches some Open XML markup, you will corrupt the document. If the text you are searching for is split over multiple runs, your search/replace code will not find the text and will not operate correctly. If you want to search and replace text in a WordprocessingML document, there is a fairly easy algorithm that you can use:
1. Break all runs into runs of a single character. This includes runs that have special characters such as a line break, carriage return, or hard tab.
2. It is then pretty easy to find a set of runs that match the characters in your search string.
3. Once you have identified a set of runs that match, you can replace that set of runs with a newly created run (which has the run properties of the run containing the first character that matched the search string).
4. After replacing the single-character runs with a newly created run, you can then consolidate adjacent runs with identical formatting.
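To make the steps above concrete, here is a small language-neutral sketch in Java. It is not Open XML SDK code: the Run class is a hypothetical stand-in that holds only some text and a formatting key, and special characters such as line breaks and tabs are not modeled. It only illustrates the split, match, replace, and consolidate sequence.

```java
import java.util.ArrayList;
import java.util.List;

public class RunSearchReplace {

    // Simplified stand-in for a WordprocessingML run (hypothetical, not an SDK type).
    public static final class Run {
        public final String text;
        public final String format;
        public Run(String text, String format) { this.text = text; this.format = format; }
    }

    public static List<Run> replace(List<Run> runs, String search, String replacement) {
        if (search.isEmpty()) return new ArrayList<>(runs);

        // Step 1: break every run into single-character runs.
        List<Run> chars = new ArrayList<>();
        for (Run r : runs)
            for (int i = 0; i < r.text.length(); i++)
                chars.add(new Run(String.valueOf(r.text.charAt(i)), r.format));

        // Steps 2 and 3: find matching sequences and swap each for one new run
        // that takes the formatting of the first matched character.
        List<Run> replaced = new ArrayList<>();
        int pos = 0;
        while (pos < chars.size()) {
            if (matchesAt(chars, pos, search)) {
                replaced.add(new Run(replacement, chars.get(pos).format));
                pos += search.length();
            } else {
                replaced.add(chars.get(pos));
                pos++;
            }
        }

        // Step 4: consolidate adjacent runs with identical formatting.
        List<Run> merged = new ArrayList<>();
        for (Run r : replaced) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).format.equals(r.format)) {
                merged.set(last, new Run(merged.get(last).text + r.text, r.format));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    // True if the single-character runs starting at 'start' spell out 'search'.
    private static boolean matchesAt(List<Run> chars, int start, String search) {
        if (start + search.length() > chars.size()) return false;
        for (int k = 0; k < search.length(); k++)
            if (chars.get(start + k).text.charAt(0) != search.charAt(k)) return false;
        return true;
    }

    public static void main(String[] args) {
        // The search text is deliberately split across three differently formatted runs.
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("Say Hel", "bold"));
        runs.add(new Run("lo wor", "italic"));
        runs.add(new Run("ld! now", "plain"));
        for (Run r : replace(runs, "Hello world!", "Hi Everyone!"))
            System.out.println(r.format + ": \"" + r.text + "\"");
    }
}
```

Because matching happens on single-character runs, text split across several runs is still found, and the markup itself is never searched, which avoids both failure modes described above.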
I've written a blog post and recorded a screen-cast that walks through this algorithm.
Blog post: http://openxmldeveloper.org/archive/2011/05/12/148357.aspx
Screen cast: http://www.youtube.com/watch?v=w128hJUu3GM
-Eric
string sourcepath = HttpContext.Server.MapPath("~/File/Form/s.docx");
string targetPath = HttpContext.Server.MapPath("~/File/ExportTempFile/" + DateTime.Now.ToOADate() + ".docx");
System.IO.File.Copy(sourcepath, targetPath, true);
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(targetPath, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDocument.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
byte[] byteArray = Encoding.UTF8.GetBytes(docText);
MemoryStream stream = new MemoryStream(byteArray);
wordDocument.MainDocumentPart.FeedData(stream);
}
MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(targetPath));
return File(mem, "application/octet-stream", "download.docx");
Writing directly to the word document stream will indeed corrupt it.
You should instead write to the MainDocumentPart stream, but you should first truncate it.
It looks like MainDocumentPart.FeedData(Stream sourceStream) method will do just that.
I haven't tested it but this should work.
public ActionResult SearchAndReplace(string FilePath)
{
MemoryStream mem = new MemoryStream(System.IO.File.ReadAllBytes(FilePath));
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
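using (MemoryStream ms = new MemoryStream())
using (StreamWriter sw = new StreamWriter(ms))
{
    sw.Write(docText);
    // Flush instead of disposing the writer here: disposing sw would also
    // close ms, and the Seek below would then throw.
    sw.Flush();
    ms.Seek(0, SeekOrigin.Begin);
    wordDoc.MainDocumentPart.FeedData(ms);
}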
}
mem.Seek(0, SeekOrigin.Begin);
return File(mem, "application/octet-stream","download.docx"); //Return to download file
}