Parse Error - When converting an XML string to a document - parsing

I've been breaking my head trying to get this straight. It seems pretty simple, but I have not been able to figure out why it fails. Any help would be very much appreciated.
Here is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<User mode="Retrieve" simCardNumber=“9602875089237652" softwareVersion=“9" phoneManufacturer=“Nokia" phoneModel="I747" deviceId=“562372389498734" networkOperator=“Blu">
<Errors>
<Error number="404"/>
</Errors>
</User>
private static Document convertStringToDocument(String xmlStr) {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The statement below fails and jumps to return null
        // Document doc = builder.parse(new InputSource(new StringReader(xmlStr)));
        // Added a replace call on the string to handle the strange-looking double quote
        // in the XML string. However, I still get the same error.
        Document doc = builder.parse(new InputSource(new StringReader(xmlStr.replace("“", "\'\""))));
        return doc;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}

Check the quotes:
networkOperator=“Blu"

I don't know if it's just a paste error, but you used “ instead of " in your XML. The former is often inserted by rich-text editors as an opening quote; you need to change it to a plain double quote manually for the document to be parseable.

Ok this solution works. Thanks everyone for your time and support.
Document doc = null;
try {
    DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    InputSource is = new InputSource();
    is.setCharacterStream(new StringReader(xmlStr));
    doc = db.parse(is);
} catch (Exception e) {
    e.printStackTrace();
}
return doc;
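If the typographic quotes can show up again in the incoming string, here is a minimal sketch of normalizing them programmatically before parsing. The normalizeQuotes helper and the assumption that only “ and ” are the offending characters are mine, not from the question.

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlQuoteFix {

    // Hypothetical helper: replace typographic double quotes with plain ASCII
    // quotes, assuming “ and ” are the only problematic characters.
    private static String normalizeQuotes(String xml) {
        return xml.replace('\u201C', '"')   // left double quotation mark “
                  .replace('\u201D', '"');  // right double quotation mark ”
    }

    public static Document convertStringToDocument(String xmlStr) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(normalizeQuotes(xmlStr))));
    }
}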

Related

Apache Tika - Getting Metadata Without Downloading File

I have been trying to implement an application that determines the content type of any file, using Apache Tika for the detection.
Here is a basic implementation:
InputStream fileStream = ContentTypeController.class.getClassLoader().getResourceAsStream(fileName);
Tika tika = new Tika();
String contentType = null;
try {
    contentType = tika.detect(fileStream);
} catch (IOException e) {
    e.printStackTrace();
}
Instead of the code above, I currently have to download files from OpenStack to determine their content type. Some files are more than 100 GB, and downloading the whole file is heavy.
I cannot figure out how to avoid downloading the entire file; I hope you have an idea or solution that works without a full download.
Tika can determine the content type of a file without downloading all of it if you pass a URL to the detect() method.
Tika tika = new Tika();
String contentType = null;
try {
    contentType = tika.detect(new URL("a url"));
} catch (IOException e) {
    e.printStackTrace();
}
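If a direct URL is awkward to obtain, Tika can also detect from a leading chunk of bytes plus the file name via detect(byte[], String), so you can fetch only a small prefix yourself. A minimal sketch, assuming you can issue a byte-range read against the OpenStack object store (fetchPrefix is a hypothetical placeholder):

import org.apache.tika.Tika;

public class PrefixDetect {

    public static String detectType(String fileName) {
        Tika tika = new Tika();

        // Hypothetical helper: fetch only the first few KB of the object,
        // e.g. via an HTTP Range request against the OpenStack object store.
        byte[] prefix = fetchPrefix(fileName, 64 * 1024);

        // detect(byte[], String) looks only at the leading bytes and the file
        // name, so the full 100 GB object never has to be downloaded.
        return tika.detect(prefix, fileName);
    }

    private static byte[] fetchPrefix(String fileName, int length) {
        // Placeholder: implement with your OpenStack/Swift client's range read.
        throw new UnsupportedOperationException("fetch a byte-range prefix here");
    }
}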

Validate Text Response using restAssured

How do I validate a text response using RestAssured?
Basically, I have downloaded a file in CSV format, and the response comes back as plain text. Any suggestions on how to validate the column headers in that text?
I have got the answer.
try {
    CsvSchema bootstrapSchema = CsvSchema.emptySchema().withHeader();
    File file = new File(fileName);
    MappingIterator<T> readValues = mapper.readerFor(type).with(bootstrapSchema).readValues(file);
    return readValues.readAll();
} catch (Exception e) {
    log.error("Error occurred while loading object list from file: {}", fileName, e);
    return Collections.emptyList();
}
This uses the Jackson CSV formatter dependency (jackson-dataformat-csv).
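For the original question of validating the column headers directly from the text response, here is a minimal sketch assuming a hypothetical endpoint URL and expected header names; it fetches the CSV as text with RestAssured and checks the first line:

import java.util.Arrays;
import java.util.List;

import io.restassured.RestAssured;
import io.restassured.response.Response;

public class CsvHeaderCheck {

    public static void validateHeaders() {
        // Hypothetical endpoint and expected header names
        Response response = RestAssured.given()
                .when()
                .get("http://example.com/report.csv");

        String body = response.getBody().asString();

        // The first line of the CSV text holds the column headers
        String headerLine = body.split("\r?\n")[0];
        List<String> headers = Arrays.asList(headerLine.split(","));

        if (!headers.equals(Arrays.asList("id", "name", "email"))) {
            throw new AssertionError("Unexpected CSV headers: " + headers);
        }
    }
}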

Parse Stack Overflow page source code and get accepted answer

I am trying to write a function that takes the URL of any Stack Overflow question, downloads the page source, parses it, and extracts both the accepted answer and the answer with the most upvotes.
I am new to this and don't know how to go about it. Here is what I've tried; it just returns the first answer using jsoup.
protected void doHtmlParse(String url) {
    // TODO Auto-generated method stub
    Document doc;
    try {
        doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                .referrer("http://www.google.com")
                .get();
        Element answer = doc.select("td[class=answercell]").get(0);
        System.out.println("Answer is \n" + answer.toString());
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
I only need to display the answer part, but it has to be the accepted answer. How do I approach this?
You don't really need to parse HTML. Use the Stack Exchange REST API instead.
Have a look at the API documentation; each answer object it returns carries an is_accepted attribute.
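As a rough illustration of the API route, a minimal sketch that fetches a question's answers as JSON using jsoup (assuming the 2.3 endpoint and the withbody filter; check the current API documentation for the exact version and parameters):

import java.io.IOException;
import org.jsoup.Jsoup;

public class AcceptedAnswerApi {

    // Fetches the answers for a question as JSON; the accepted answer is the
    // item whose "is_accepted" field is true. Parse with any JSON library.
    public static String fetchAnswersJson(long questionId) throws IOException {
        String url = "https://api.stackexchange.com/2.3/questions/" + questionId
                + "/answers?site=stackoverflow&order=desc&sort=votes&filter=withbody";
        return Jsoup.connect(url)
                .ignoreContentType(true)   // the endpoint returns JSON, not HTML
                .execute()
                .body();
    }
}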
EDIT:
Well, after you've got the accepted answer's id through the API, you could do this:
String answer = document.getElementById("answer-"+id).outerHtml();
I am now able to get the accepted answer via this code.
protected void doHtmlParse(String url) {
    // TODO Auto-generated method stub
    Document doc;
    try {
        doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                .referrer("http://www.google.com")
                .get();
        Element answer = doc.select("div[class=answer accepted-answer]").first();
        Elements tds = answer.getElementsByTag("td");
        for (Element td : tds) {
            String className = td.attr("class");
            if (className.equals("answercell")) {
                System.out.println("\n\nAccepted answer is \n" + td.text());
            }
        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
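The question also asked for the answer with the most upvotes. A minimal sketch of one way to get it from the same jsoup document, assuming the score is rendered in an element with class vote-count-post (Stack Overflow's markup changes over time, so verify the selectors against the live page):

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TopAnswerFinder {

    // Returns the text of the answer with the highest vote count, or null.
    // Assumes each answer block has class "answer" and shows its score in an
    // element with class "vote-count-post".
    public static String findTopVotedAnswer(Document doc) {
        Element topAnswer = null;
        int topScore = Integer.MIN_VALUE;
        for (Element ans : doc.select("div.answer")) {
            Element counter = ans.select(".vote-count-post").first();
            if (counter == null) {
                continue;
            }
            int score = Integer.parseInt(counter.text().trim());
            if (score > topScore) {
                topScore = score;
                topAnswer = ans;
            }
        }
        return topAnswer == null ? null : topAnswer.select("td.answercell").text();
    }
}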

JSoup & URL non Latin Charsets

I am using the following implementation in a Java servlet:
String url = "http://mydomain.com/test.php?myparam=" + myname;
Document doc = null;
try {
    doc = Jsoup.connect(url).get();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Here myname is a String containing non-Latin (UTF-8) characters.
For some reason the result comes back wrong (unreadable characters).
Is there a way to force the URL in Jsoup to be UTF-8 as well?
Thanks
Try encoding the query parameter value (not the whole URL) as UTF-8:
String url = "http://mydomain.com/test.php?myparam=" + URLEncoder.encode(myname, "UTF-8");
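Put together in context, a minimal sketch (the domain and parameter name are taken from the question; everything else is illustrative):

import java.io.IOException;
import java.net.URLEncoder;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchWithUtf8Param {

    public static Document fetch(String myname) throws IOException {
        // Encode only the parameter value so multi-byte characters survive the
        // round trip; the scheme, host and path of the URL stay as-is.
        String url = "http://mydomain.com/test.php?myparam="
                + URLEncoder.encode(myname, "UTF-8");
        return Jsoup.connect(url).get();
    }
}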

Premature end of file using JAXB and Unmarshaller. The XML from the response looks valid to me

I don't know what to do anymore; both the input and the output look correct to me.
I generate an XML file and send it to a service for validation.
The response is:
11:10:34,922 INFO [STDOUT] printing out the input stream
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response>
<Method name="XML/Release/New" time="2013-04-23T15:10:35.1446238Z">
<ResponseStatus>100</ResponseStatus>
</Method>
</Response>
finished printing out the input stream
11:10:34,922 INFO [STDOUT] got the unmarshaller
11:10:34,925 ERROR [PRNDataAccessUtil] Caught an error: javax.xml.bind.UnmarshalException
- with linked exception: [org.xml.sax.SAXParseException: Premature end of file.] : null
The code:
try {
    out = connection.getOutputStream();
    ByteArrayOutputStream bos = PRNPostNewsReleaseUtil.createNewsReleaseXml(newsRelease);
    bos.writeTo(out);
    JAXBContext context = JAXBContext.newInstance(Response.class.getPackage().getName());
    in = connection.getInputStream();
    BufferedReader inp = new BufferedReader(new InputStreamReader(in));
    System.out.println("printing out the input stream");
    String line;
    while ((line = inp.readLine()) != null) {
        System.out.println(line);
    }
    System.out.println("finished printing out the input stream");
    Unmarshaller unmarshaller = context.createUnmarshaller();
    response = (Response) unmarshaller.unmarshal(in);
} catch (Exception ex) {
    log.error("Caught an error: " + ex + " : " + ex.getMessage());
    return null;
} finally {
    if (null != in) connection.disconnect();
}
You are getting the error because the InputStream has already been advanced to the end while you printed it out. Assuming the buffer in your BufferedReader is large enough to contain the whole XML document, you can reset it after printing and then unmarshal from that reader.
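Alternatively (a different technique from the reset approach above), a minimal sketch that reads the whole response into a String once, prints it, and unmarshals from that String; Response is assumed to be the question's JAXB-generated class:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

public class ResponseReader {

    // Read the stream once into a String, log it, then unmarshal from that
    // String instead of the already-consumed stream.
    public static Response readResponse(InputStream in) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        String xml = sb.toString();
        System.out.println(xml);

        JAXBContext context = JAXBContext.newInstance(Response.class.getPackage().getName());
        Unmarshaller unmarshaller = context.createUnmarshaller();
        return (Response) unmarshaller.unmarshal(new StringReader(xml));
    }
}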
It once happened to me that I was using the wrong class name to build the JAXBContext object, so when I tried to marshal an object an empty XML file was created, which made the unmarshaller fail.
So make sure the JAXBContext object is instantiated with the class you're actually trying to marshal.
Another thing to note: even if you are not reading the stream explicitly in code, a debugger expression watch that reads the input has the same effect of advancing the stream position. I figured that out after spending hours debugging this exception.
