I am using a SAX parser to parse XML/RSS, but the XML contains some unusual characters: “ (a slanted quote, not the regular straight one), …, ‘ and others. The problem seems to be with UTF-8 and the SAX parser.
// create the factory
SAXParserFactory factory = SAXParserFactory.newInstance();
// create a parser
SAXParser parser = factory.newSAXParser();
String replacement:
public static String replaceAll(String source, String pattern,
String replacement) {
if (source == null) {
return "";
}
StringBuffer sb = new StringBuffer();
int idx = -1;
int patIdx = 0;
while ((idx = source.indexOf(pattern, patIdx)) != -1) {
sb.append(source.substring(patIdx, idx));
sb.append(replacement);
patIdx = idx + pattern.length();
}
sb.append(source.substring(patIdx));
return sb.toString();
}
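One thing worth checking (an assumption on my part, since the question does not show how the input stream is opened) is whether the feed is decoded as UTF-8 at all, and whether text is accumulated across characters() calls. Below is a minimal sketch using only the JDK's SAX classes; the class name Utf8SaxDemo and the helper textOf are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class Utf8SaxDemo {
    // Parses the given bytes as UTF-8 XML and returns all character data concatenated.
    static String textOf(byte[] xmlBytes) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks, so append instead of assigning.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            source.setEncoding("UTF-8"); // assumption: the feed really is UTF-8
            parser.parse(source, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Curly quotes and an ellipsis, written as Unicode escapes.
        String xml = "<title>\u201CQuoted\u201D \u2026 \u2018ok\u2019</title>";
        System.out.println(textOf(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the curly quotes survive this round trip, the parser itself is probably not the culprit, and the manual string replacement may turn out to be unnecessary.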
I have written a fairly simple language in ANTLR. Before actually interpreting the code written by a user, I wish to parse the code and check for syntax errors. If any are found, I wish to output the cause of the error and exit. How can I check the code for syntax errors and output the corresponding error? Please note that for my purposes, error messages similar to those generated by the ANTLR tool are more than sufficient. For example:
line 3:0 missing ';'
You can use an ErrorListener to get more information.
For example:
...
FormulaParser parser = new FormulaParser(tokens);
parser.IsCompletion = options.IsForCompletion;
ErrorListener errListener = new ErrorListener();
parser.AddErrorListener(errListener);
IParseTree tree = parser.formula();
The only thing you need to do is attach the ErrorListener to the parser.
Here is the code of the ErrorListener:
/// <summary>
/// Error listener recording all errors that Antlr parser raises during parsing.
/// </summary>
internal class ErrorListener : BaseErrorListener
{
private const string Eof = "the end of formula";
public ErrorListener()
{
ErrorMessages = new List<ErrorInfo>();
}
public bool ErrorOccured { get; private set; }
public List<ErrorInfo> ErrorMessages { get; private set; }
public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
ErrorOccured = true;
if (e == null || e.GetType() != typeof(NoViableAltException))
{
ErrorMessages.Add(new ErrorInfo()
{
Message = ConvertMessage(msg),
StartIndex = offendingSymbol.StartIndex,
Column = offendingSymbol.Column + 1,
Line = offendingSymbol.Line,
Length = offendingSymbol.Text.Length
});
return;
}
ErrorMessages.Add(new ErrorInfo()
{
Message = string.Format("{0}{1}", ConvertToken(offendingSymbol.Text), " unexpected"),
StartIndex = offendingSymbol.StartIndex,
Column = offendingSymbol.Column + 1,
Line = offendingSymbol.Line,
Length = offendingSymbol.Text.Length
});
}
public override void ReportAmbiguity(Antlr4.Runtime.Parser recognizer, DFA dfa, int startIndex, int stopIndex, bool exact, BitSet ambigAlts, ATNConfigSet configs)
{
ErrorOccured = true;
ErrorMessages.Add(new ErrorInfo()
{
Message = "Ambiguity", Column = startIndex, StartIndex = startIndex
});
base.ReportAmbiguity(recognizer, dfa, startIndex, stopIndex, exact, ambigAlts, configs);
}
private string ConvertToken(string token)
{
return string.Equals(token, "<EOF>", StringComparison.InvariantCultureIgnoreCase)
? Eof
: token;
}
private string ConvertMessage(string message)
{
StringBuilder builder = new StringBuilder(message);
builder.Replace("<EOF>", Eof);
return builder.ToString();
}
}
This is only a basic listener, but it shows the idea: you can tell whether an error is a syntax error or an ambiguity. After parsing, you can ask the error listener directly whether any error occurred.
I'm new to Java and I have a problem with reading a file using the Scanner class.
My objective is to read the following .txt file:
3
Emmalaan 23
3051JC Rotterdam
7 rooms
price 300000
Javastraat 88
4078KB Eindhoven
3 rooms
price 50000
Javastraat 93
4078KB Eindhoven
4 rooms
price 55000
The "3" on top of the file should be read as an integer that tells how many houses the file has. The following four lines after the "3" determine one house.
I try to read this file using a read method in the class portefeuille:
public static Portefeuille read(String infile)
{
Portefeuille returnvalue = new Portefeuille();
try
{
Scanner scan = new Scanner(new File(infile)).useDelimiter(" |/n");
int aantalwoningen = scan.nextInt();
for(int i = 0; i<aantalwoningen; ++i)
{
Woning.read(scan);
}
}
catch (FileNotFoundException e)
{
System.out.println("File could not be found");
}
catch (IOException e)
{
System.out.println("Exception while reading the file");
}
return returnvalue;
}
The read method in the Woning class looks like this:
public static Woning read(Scanner sc)
{
String token_adres = sc.next();
String token_dr = sc.next();
String token_postcd = sc.next();
String token_plaats = sc.next();
int token_vraagPrijs = sc.nextInt();
String token_kamerstxt = sc.next();
String token_prijstxt = sc.next();
int token_kamers = sc.nextInt();
return new Woning(adresp, token_vraagPrijs, token_kamers);
}
When I try to execute the following code:
Portefeuille port1 = Portefeuille.read("woningen.txt");
I get the following error:
Exception in thread "main" java.util.InputMismatchException
at java.util.Scanner.throwFor(Scanner.java:840)
at java.util.Scanner.next(Scanner.java:1461)
at java.util.Scanner.nextInt(Scanner.java:2091)
at java.util.Scanner.nextInt(Scanner.java:2050)
at Portefeuille.read(Portefeuille.java:48)
at Portefeuille.main(Portefeuille.java:112)
However, if I use the read method from the Woning class to read one address in string format:
Emmalaan 23
3051JC Rotterdam
7 Rooms
price 300000
It works fine.
I tried to change the .txt file into only one address without the "3" on top so that it is exactly formatted like the address that should work. But when I call the read method from Woning class it still gives me the error.
Could anyone please help me with this?
Thank you!
I was facing a similar issue, so I am posting my answer in case it helps someone in the future.
There are two possible modifications that make this code run.
First option: change the delimiter to .useDelimiter("\\r\\n") when creating the Scanner. I was on Windows, so \\r may be needed for Windows compatibility.
With this modification the initial exception goes away, but the code then fails at int token_vraagPrijs = sc.nextInt();.
The reason is that public static Woning read(Scanner sc) calls sc.next(), which finds and returns the next complete token from the scanner. A complete token is preceded and followed by input that matches the delimiter pattern.
So with this delimiter, every sc.next() actually reads a whole line, not a single word.
As a result, sc.nextInt() tries to read something like Javastraat 88 and throws the same exception again.
Second option (preferred): don't set a delimiter at all. The Scanner class defaults to whitespace, and your code will work fine. I modified your code and it worked for me.
Code:
public class Test3{
public static void main(String... s)
{
read("test.txt");
}
public static void read(String infile)
{
try (Scanner scan = new Scanner(new File(infile)))
{
int aantalwoningen = scan.nextInt();
System.out.println(aantalwoningen);
for (int i = 0; i < aantalwoningen; ++i)
{
read(scan);
}
}
catch (FileNotFoundException e)
{
System.out.println("File could not be found");
}
}
public static void read(Scanner sc)
{
String token_adres = sc.next();
String token_dr = sc.next();
String token_postcd = sc.next();
String token_plaats = sc.next();
int token_vraagPrijs = sc.nextInt();
String token_kamerstxt = sc.next();
String token_prijstxt = sc.next();
int token_kamers = sc.nextInt();
System.out.println(token_adres + " " + token_dr + " " + token_postcd + " " + token_plaats + " "
+ token_vraagPrijs + " " + token_kamerstxt + " " + token_prijstxt + " " + token_kamers);
}
}
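To see what each delimiter actually does to this input, here is a small sketch that just lists the tokens a Scanner produces. DelimiterDemo and tokens are hypothetical names, and the sample string assumes Windows line endings (\r\n). Note that the delimiter in the question, " |/n", contains a forward slash, so it matches a space or the literal text /n and never a line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class DelimiterDemo {
    // Collects every token the Scanner produces for the given input.
    // A null delimiterRegex keeps Scanner's default delimiter (whitespace).
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            if (delimiterRegex != null) {
                sc.useDelimiter(delimiterRegex);
            }
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample: the first lines of the file, with Windows line endings.
        String input = "3\r\nEmmalaan 23\r\n3051JC Rotterdam\r\n";
        System.out.println(tokens(input, null));      // default: split on any whitespace
        System.out.println(tokens(input, "\\r\\n"));  // one token per line
        System.out.println(tokens(input, " |/n"));    // the delimiter from the question
    }
}
```

With " |/n" the first token is "3" followed by a line break and "Emmalaan", which is not an integer. That would explain the InputMismatchException thrown by the very first nextInt().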
I'm trying to write a piece of code that will take an ANTLR4 parser and use it to generate ASTs for inputs similar to the ones given by the -tree option on grun (misc.TestRig). However, I'd additionally like for the output to include all the line number/offset information.
For example, instead of printing
(add (int 5) '+' (int 6))
I'd like to get
(add (int 5 [line 3, offset 6:7]) '+' (int 6 [line 3, offset 8:9]) [line 3, offset 5:10])
Or something similar.
There aren't a tremendous number of visitor examples for ANTLR4 yet, but I am pretty sure I can do most of this by copying the default implementation for toStringTree (used by grun). However, I do not see any information about the line numbers or offsets.
I expected to be able to write super simple code like this:
String visit(ParseTree t) {
return "(" + t.productionName + t.visitChildren() + t.lineNumber + ")";
}
but it doesn't seem to be this simple. I'm guessing I should be able to get line number information from the parser, but I haven't figured out how to do so. How can I grab this line number/offset information in my traversal?
To fill in the few blanks in the solution below, I used:
List<String> ruleNames = Arrays.asList(parser.getRuleNames());
parser.setBuildParseTree(true);
ParserRuleContext prc = parser.program();
ParseTree tree = prc;
to get the tree and the ruleNames. program is the name for the top production in my grammar.
The Trees.toStringTree method can be implemented using a ParseTreeListener. The following listener produces exactly the same output as Trees.toStringTree.
public class TreePrinterListener implements ParseTreeListener {
private final List<String> ruleNames;
private final StringBuilder builder = new StringBuilder();
public TreePrinterListener(Parser parser) {
this.ruleNames = Arrays.asList(parser.getRuleNames());
}
public TreePrinterListener(List<String> ruleNames) {
this.ruleNames = ruleNames;
}
@Override
public void visitTerminal(TerminalNode node) {
if (builder.length() > 0) {
builder.append(' ');
}
builder.append(Utils.escapeWhitespace(Trees.getNodeText(node, ruleNames), false));
}
@Override
public void visitErrorNode(ErrorNode node) {
if (builder.length() > 0) {
builder.append(' ');
}
builder.append(Utils.escapeWhitespace(Trees.getNodeText(node, ruleNames), false));
}
@Override
public void enterEveryRule(ParserRuleContext ctx) {
if (builder.length() > 0) {
builder.append(' ');
}
if (ctx.getChildCount() > 0) {
builder.append('(');
}
int ruleIndex = ctx.getRuleIndex();
String ruleName;
if (ruleIndex >= 0 && ruleIndex < ruleNames.size()) {
ruleName = ruleNames.get(ruleIndex);
}
else {
ruleName = Integer.toString(ruleIndex);
}
builder.append(ruleName);
}
@Override
public void exitEveryRule(ParserRuleContext ctx) {
if (ctx.getChildCount() > 0) {
builder.append(')');
}
}
@Override
public String toString() {
return builder.toString();
}
}
The class can be used as follows:
List<String> ruleNames = ...;
ParseTree tree = ...;
TreePrinterListener listener = new TreePrinterListener(ruleNames);
ParseTreeWalker.DEFAULT.walk(listener, tree);
String formatted = listener.toString();
The class can be modified to produce the information in your output by updating the exitEveryRule method:
@Override
public void exitEveryRule(ParserRuleContext ctx) {
if (ctx.getChildCount() > 0) {
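Token startToken = ctx.getStart();
Token stopToken = ctx.getStop();
if (startToken != null) {
builder.append(" [line ");
builder.append(startToken.getLine());
builder.append(", offset ");
builder.append(startToken.getStartIndex());
builder.append(':');
// Use the rule's last token for the end offset so the range covers the whole rule;
// fall back to the start token if no stop token is available.
builder.append((stopToken != null ? stopToken : startToken).getStopIndex());
builder.append("])");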
}
else {
builder.append(')');
}
}
}
Hey StackOverflow Community,
I have lines of information from a txt file that I need to parse.
Here are some example lines:
-> date & time AC Power Insolation Temperature Wind Speed
-> mm/dd/yyyy hh:mm.ss kw W/m^2 deg F mph
Using scanner.nextLine() gives me a String with a whole line in it, which I then pass to StringTokenizer. That separates it into individual Strings using whitespace as a separator.
So the first line would break up into:
date
&
time
AC
Power
Insolation
etc...
I need things like "date & time" kept together, and "AC Power" kept together. Is there any way I can specify this using a method already defined in StringTokenizer or Scanner? Or would I have to develop my own algorithm to do this?
Would you suggest some other way of parsing lines instead of Scanner? Or is Scanner sufficient for my needs?
ejay
Oh, this one was tricky. Maybe you could build a Trie structure from your tokens. I was bored and wrote a little class that solves your problem. Warning: it's a bit hacky, but it was fun to implement.
The Trie class:
class Trie extends HashMap<String, Trie> {
private static final long serialVersionUID = 1L;
boolean end = false;
public void addToken(String strings) {
addToken(strings.split("\\s+"), 0);
}
private void addToken(String[] strings, int begin) {
if (begin == strings.length) {
end = true;
return;
}
String key = strings[begin];
Trie t = get(key);
if (t == null) {
t = new Trie();
put(key, t);
}
t.addToken(strings, begin + 1);
}
public List<String> tokenize(String data) {
String[] split = data.split("\\s+");
List<String> tokens = new ArrayList<String>();
int pos = 0;
while (pos < split.length) {
int tokenLength = getToken(split, pos, 0);
tokens.add(glue(split, pos, tokenLength));
pos += tokenLength;
}
return tokens;
}
public String glue(String[] parts, int pos, int length) {
StringBuilder sb = new StringBuilder();
sb.append(parts[pos]);
for (int i = pos + 1; i < pos + length; i++) {
sb.append(" ");
sb.append(parts[i]);
}
return sb.toString();
}
private int getToken(String[] tokens, int begin, int length) {
if (end) {
return length;
}
if (begin == tokens.length) {
return 1;
}
String key = tokens[begin];
Trie t = get(key);
if (t != null) {
return t.getToken(tokens, begin + 1, length + 1);
}
return 1;
}
}
and how to use it:
Trie t = new Trie();
t.addToken("AC Power");
t.addToken("date & time");
t.addToken("date & foo");
t.addToken("Speed & fun");
String data = "date & time AC Power Insolation Temperature Wind Speed";
List<String> tokens = t.tokenize(data);
for (String s : tokens) {
System.out.println(s);
}
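If the multi-word labels are known up front, a flatter alternative to the Trie is a greedy longest-match over a set of phrases. This is a different technique from the class above, sketched under the assumption that phrases are written with single spaces between words; PhraseTokenizer is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PhraseTokenizer {
    private final Set<String> phrases;
    private final int maxWords; // word count of the longest known phrase

    PhraseTokenizer(String... knownPhrases) {
        phrases = new HashSet<>(Arrays.asList(knownPhrases));
        int max = 1;
        for (String p : knownPhrases) {
            max = Math.max(max, p.split("\\s+").length);
        }
        maxWords = max;
    }

    // Splits the line on whitespace, then re-joins runs of words that form a known phrase.
    List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        if (line.trim().isEmpty()) {
            return tokens;
        }
        String[] words = line.trim().split("\\s+");
        int pos = 0;
        while (pos < words.length) {
            int taken = 1; // fall back to a single word when no phrase matches
            // Try the longest candidate first so longer phrases win over shorter ones.
            for (int len = Math.min(maxWords, words.length - pos); len > 1; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(words, pos, pos + len));
                if (phrases.contains(candidate)) {
                    taken = len;
                    break;
                }
            }
            tokens.add(String.join(" ", Arrays.copyOfRange(words, pos, pos + taken)));
            pos += taken;
        }
        return tokens;
    }

    public static void main(String[] args) {
        PhraseTokenizer t = new PhraseTokenizer("date & time", "AC Power", "Wind Speed");
        for (String s : t.tokenize("date & time AC Power Insolation Temperature Wind Speed")) {
            System.out.println(s);
        }
    }
}
```

Unlike the Trie, this rebuilds candidate strings at every position, so it trades some efficiency for less code. For header lines of this size that is unlikely to matter.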
I need to read a text file and, whenever a line starts with "//", omit that line and move on to the next one.
The input text file has some separate partitions. I need to process it line by line and detect this mark.
If you are using .NET 3.5, you can use LINQ with an IEnumerable wrapped around a StreamReader. The cool part is that you can then use a where clause to filter out statements or, better yet, use a select with a regular expression to trim just the comment and leave the data on the same line.
//.Net 3.5
static class Program
{
static void Main(string[] args)
{
var clean = from line in args[0].ReadAsLines()
let trimmed = line.Trim()
where !trimmed.StartsWith("//")
select line;
}
static IEnumerable<string> ReadAsLines(this string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
...
//.Net 2.0
static class Program
{
static void Main(string[] args)
{
var clean = FilteredLines(args[0]);
}
static IEnumerable<string> FilteredLines(string filename)
{
foreach (var line in ReadAsLines(filename))
if (!line.TrimStart().StartsWith("//")) // keep only lines that are not comments
yield return line;
}
static IEnumerable<string> ReadAsLines(string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
I'm not sure what you exactly need but, if you just want to filter out // lines from some text in a stream... just remember to close the stream after using it.
public string FilterComments(System.IO.Stream stream)
{
var data = new System.Text.StringBuilder();
using (var reader = new System.IO.StreamReader(stream))
{
var line = string.Empty;
while (!reader.EndOfStream)
{
line = reader.ReadLine();
if (!line.TrimStart(' ').StartsWith("//"))
{
data.AppendLine(line); // restore the line break that ReadLine() strips
}
}
}
return data.ToString();
}
// Class derived from StreamReader that skips lines starting with "//".
class SplLineIgnorStrmReader : StreamReader
{
    public SplLineIgnorStrmReader(string path, Encoding encoding) : base(path, encoding)
    {
    }
    public override string ReadLine()
    {
        string strLineText;
        // base.ReadLine() returns null at the end of the stream.
        while ((strLineText = base.ReadLine()) != null)
        {
            strLineText = strLineText.Trim(' ');
            // StartsWith is safe for lines shorter than two characters, unlike Substring(0, 2).
            if (!strLineText.StartsWith("//", StringComparison.Ordinal))
                break;
        }
        return strLineText;
    }
}
// Creating the object for this class:
SplLineIgnorStrmReader ConverterDefFileReadStream = new SplLineIgnorStrmReader(strFile, Encoding.Default);
Use this if you want to read the text file and omit any comments from it (here, lines starting with "//" are excluded).
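For comparison, here is the same filtering idea in Java (not C#, so treat it as an illustration of the logic rather than a drop-in replacement). CommentFilter and withoutComments are hypothetical names; a line is kept unless its first non-whitespace characters are //.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class CommentFilter {
    // Returns the lines that do not start with "//" once leading whitespace is ignored.
    static List<String> withoutComments(BufferedReader reader) {
        List<String> kept = new ArrayList<>();
        try {
            String line;
            // readLine() returns null at the end of the input.
            while ((line = reader.readLine()) != null) {
                if (!line.trim().startsWith("//")) {
                    kept.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "// header\nalpha\n   // indented comment\nbeta // trailing\n";
        for (String line : withoutComments(new BufferedReader(new StringReader(text)))) {
            System.out.println(line);
        }
    }
}
```

A comment that follows data on the same line is left in place here; trimming it would need an extra step, such as cutting the line at the first //.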