Highlight / parse PSI elements inside another PSI element - parsing

I developed a Custom Language plugin based on this tutorial.
My plugin parses key/value language files in the format shown below. Values can contain some HTML tags like <br>, <i>, <b>, <span> and \n. I want to highlight these tags as separate PSI elements inside the green PSI elements (values) (see pic). How can I rewrite my rules to achieve this?
#Section header
KEY1 = First<br>Value
KEY2 = Second\nValue
Bnf rules I use
lngFile ::= item_*
private item_ ::= (property|header|COMMENT|CRLF)
property ::= (KEY? SEPARATOR VALUE?) | KEY {
mixin="someClass"
implements="someClass"
methods=[getKey getValue getName setName getNameIdentifier getPresentation]
}
header ::= HEADER {
mixin="someClass"
implements="someClass"
methods=[getName setName getNameIdentifier getPresentation]
}
Flex
%%
%class LngLexer
%implements FlexLexer
%unicode
%function advance
%type IElementType
%eof{ return;
%eof}
CRLF=\R
WHITE_SPACE=[\ \n\t\f]
FIRST_VALUE_CHARACTER=[^ \n\f\\] | "\\"{CRLF} | "\\".
VALUE_CHARACTER=[^\n\f\\] | "\\"{CRLF} | "\\".
END_OF_LINE_COMMENT=("//")[^\r\n]*
HEADER=("#")[^\r\n]*
SEPARATOR=[:=]
KEY_CHARACTER=[^:=\ \n\t\f\\] | "\\ "
%state WAITING_VALUE
%%
<YYINITIAL> {END_OF_LINE_COMMENT} { yybegin(YYINITIAL); return LngTypes.COMMENT; }
<YYINITIAL> {HEADER} { yybegin(YYINITIAL); return LngTypes.HEADER; }
<YYINITIAL> {KEY_CHARACTER}+ { yybegin(YYINITIAL); return LngTypes.KEY; }
<YYINITIAL> {SEPARATOR} { yybegin(WAITING_VALUE); return LngTypes.SEPARATOR; }
<WAITING_VALUE> {CRLF}({CRLF}|{WHITE_SPACE})+ { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }
<WAITING_VALUE> {WHITE_SPACE}+ { yybegin(WAITING_VALUE); return TokenType.WHITE_SPACE; }
<WAITING_VALUE> {FIRST_VALUE_CHARACTER}{VALUE_CHARACTER}* { yybegin(YYINITIAL); return LngTypes.VALUE; }
({CRLF}|{WHITE_SPACE})+ { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }
[^] { return TokenType.BAD_CHARACTER; }


check if condition is met before executing the action in JFlex

I am writing a lexical analyzer using JFlex. When the word co is matched, we have to ignore what comes after until the end of the line (because it's a comment). For the moment, I have a boolean variable that changes to true whenever this word is matched and if an identifier or an operator is matched after co until the end of the line, I simply ignore it because I have an if condition in my Identifier and Operator token identification.
I am wondering if there is a better way to do this and get rid of the if statement that appears everywhere.
Here is the code:
%% // Options of the scanner
%class Lexer
%unicode
%line
%column
%standalone
%{
private boolean isCommentOpen = false;
private void toggleIsCommentOpen() {
this.isCommentOpen = ! this.isCommentOpen;
}
private boolean getIsCommentOpen() {
return this.isCommentOpen;
}
%}
Operators = [\+\-]
Identifier = [A-Z]*
EndOfLine = \r|\n|\r\n
%%
{Operators} {
if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
// Do Code
}
}
{Identifier} {
if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
// Do Code
}
}
"co" {
toggleIsCommentOpen();
}
. {}
{EndOfLine} {
if (getIsCommentOpen()) {
toggleIsCommentOpen();
}
}
One way to do this is to use states in JFlex. Every time the word co is matched, we enter a state named COMMENT_STATE and do nothing until the end of the line. After the end of the line, we leave COMMENT_STATE. Here is the code:
%% // Options of the scanner
%class Lexer
%unicode
%line
%column
%standalone
Operators = [\+\-]
Identifier = [A-Z]*
EndOfLine = \r|\n|\r\n
%xstate COMMENT_STATE
%%
<YYINITIAL> {
"co" {yybegin(COMMENT_STATE);}
}
<COMMENT_STATE> {
{EndOfLine} {yybegin(YYINITIAL);}
. {}
}
{Operators} { /* Do Code */ }
{Identifier} { /* Do Code */ }
. {}
{EndOfLine} {}
With this new approach, the lexer is simpler and more readable.
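The state idea is not specific to JFlex. As a rough illustration only, here is the same logic as a tiny hand-written scanner in TypeScript. The names Token, LexState and tokenize are invented for this sketch and are not part of JFlex; the token kinds simply mirror the Operators and Identifier macros above.

```typescript
// Hypothetical token shape and state names, chosen for this sketch only.
type Token = { kind: "OPERATOR" | "IDENTIFIER"; text: string };
type LexState = "INITIAL" | "COMMENT";

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let state: LexState = "INITIAL";
  let i = 0;
  while (i < input.length) {
    const ch = input.charAt(i);
    if (state === "COMMENT") {
      // Inside a comment everything is skipped; a line break leaves the state.
      if (ch === "\n" || ch === "\r") state = "INITIAL";
      i++;
      continue;
    }
    if (input.startsWith("co", i)) {
      // Equivalent of yybegin(COMMENT_STATE).
      state = "COMMENT";
      i += 2;
    } else if (ch === "+" || ch === "-") {
      tokens.push({ kind: "OPERATOR", text: ch });
      i++;
    } else if (ch >= "A" && ch <= "Z") {
      // Consume a run of upper-case letters as one identifier.
      let j = i;
      while (j < input.length && input.charAt(j) >= "A" && input.charAt(j) <= "Z") j++;
      tokens.push({ kind: "IDENTIFIER", text: input.slice(i, j) });
      i = j;
    } else {
      i++; // anything else is ignored, like the catch-all rule
    }
  }
  return tokens;
}
```

No rule needs an if around its action: whether a rule applies is decided once, by the current state.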

Handle separator with Material dayjs date adapter

In my Angular application I need to switch from momentjs to dayjs.
Because I am using Material, I have to replace moment-date-adapter with a dayjs-date-adapter, so I wrote my own date adapter, but I can't understand how momentjs can parse a date like 12122020 without any separator (you can see it in action here).
I tried to achieve this by setting the following MatDateFormats, with an array for dateInput.
But I don't know if this is the best solution, because I don't see it in moment-date-adapter.
MatDateFormats = {
parse: {
dateInput: ['D/M/YYYY', 'DMYYYY'],
},
display: {
dateInput: 'DD/MM/YYYY',
monthYearLabel: 'MMMM YYYY',
dateA11yLabel: 'DD/MM/YYYY',
monthYearA11yLabel: 'MMMM YYYY',
}
}
This is my dayjs-date-adapter
export interface DayJsDateAdapterOptions {
/**
* Turns the use of utc dates on or off.
* Changing this will change how Angular Material components like DatePicker output dates.
* {@default false}
*/
useUtc?: boolean;
}
/** InjectionToken for Dayjs date adapter to configure options. */
export const MAT_DAYJS_DATE_ADAPTER_OPTIONS = new InjectionToken<DayJsDateAdapterOptions>(
'MAT_DAYJS_DATE_ADAPTER_OPTIONS', {
providedIn: 'root',
factory: MAT_DAYJS_DATE_ADAPTER_OPTIONS_FACTORY
});
export function MAT_DAYJS_DATE_ADAPTER_OPTIONS_FACTORY(): DayJsDateAdapterOptions {
return {
useUtc: false
};
}
/** Creates an array and fills it with values. */
function range<T>(length: number, valueFunction: (index: number) => T): T[] {
const valuesArray = Array(length);
for (let i = 0; i < length; i++) {
valuesArray[i] = valueFunction(i);
}
return valuesArray;
}
/** Adapts Dayjs Dates for use with Angular Material. */
export class DayjsDateAdapter extends DateAdapter<Dayjs> {
private localeData: {
firstDayOfWeek: number,
longMonths: string[],
shortMonths: string[],
dates: string[],
longDaysOfWeek: string[],
shortDaysOfWeek: string[],
narrowDaysOfWeek: string[]
};
constructor(@Optional() @Inject(MAT_DATE_LOCALE) public dateLocale: string,
@Optional() @Inject(MAT_DAYJS_DATE_ADAPTER_OPTIONS) private options?:
DayJsDateAdapterOptions) {
super();
this.initializeParser(dateLocale);
}
private get shouldUseUtc(): boolean {
const {useUtc}: DayJsDateAdapterOptions = this.options || {};
return !!useUtc;
}
// TODO: Implement
setLocale(locale: string) {
super.setLocale(locale);
const dayJsLocaleData = this.dayJs().localeData();
this.localeData = {
firstDayOfWeek: dayJsLocaleData.firstDayOfWeek(),
longMonths: dayJsLocaleData.months(),
shortMonths: dayJsLocaleData.monthsShort(),
dates: range(31, (i) => this.createDate(2017, 0, i + 1).format('D')),
longDaysOfWeek: range(7, (i) => this.dayJs().set('day', i).format('dddd')),
shortDaysOfWeek: dayJsLocaleData.weekdaysShort(),
narrowDaysOfWeek: dayJsLocaleData.weekdaysMin(),
};
}
getYear(date: Dayjs): number {
return this.dayJs(date).year();
}
getMonth(date: Dayjs): number {
return this.dayJs(date).month();
}
getDate(date: Dayjs): number {
return this.dayJs(date).date();
}
getDayOfWeek(date: Dayjs): number {
return this.dayJs(date).day();
}
getMonthNames(style: 'long' | 'short' | 'narrow'): string[] {
return style === 'long' ? this.localeData.longMonths : this.localeData.shortMonths;
}
getDateNames(): string[] {
return this.localeData.dates;
}
getDayOfWeekNames(style: 'long' | 'short' | 'narrow'): string[] {
if (style === 'long') {
return this.localeData.longDaysOfWeek;
}
if (style === 'short') {
return this.localeData.shortDaysOfWeek;
}
return this.localeData.narrowDaysOfWeek;
}
getYearName(date: Dayjs): string {
return this.dayJs(date).format('YYYY');
}
getFirstDayOfWeek(): number {
return this.localeData.firstDayOfWeek;
}
getNumDaysInMonth(date: Dayjs): number {
return this.dayJs(date).daysInMonth();
}
clone(date: Dayjs): Dayjs {
return date.clone();
}
createDate(year: number, month: number, date: number): Dayjs {
const returnDayjs = this.dayJs()
.set('year', year)
.set('month', month)
.set('date', date);
return returnDayjs;
}
today(): Dayjs {
return this.dayJs();
}
parse(value: any, parseFormat: string): Dayjs | null {
if (value && typeof value === 'string') {
return this.dayJs(value, parseFormat, this.locale);
}
return value ? this.dayJs(value).locale(this.locale) : null;
}
format(date: Dayjs, displayFormat: string): string {
if (!this.isValid(date)) {
throw Error('DayjsDateAdapter: Cannot format invalid date.');
}
return date.locale(this.locale).format(displayFormat);
}
addCalendarYears(date: Dayjs, years: number): Dayjs {
return date.add(years, 'year');
}
addCalendarMonths(date: Dayjs, months: number): Dayjs {
return date.add(months, 'month');
}
addCalendarDays(date: Dayjs, days: number): Dayjs {
return date.add(days, 'day');
}
toIso8601(date: Dayjs): string {
return date.toISOString();
}
deserialize(value: any): Dayjs | null {
let date;
if (value instanceof Date) {
date = this.dayJs(value);
} else if (this.isDateInstance(value)) {
// NOTE: assumes that cloning also sets the correct locale.
return this.clone(value);
}
if (typeof value === 'string') {
if (!value) {
return null;
}
date = this.dayJs(value).toISOString();
}
if (date && this.isValid(date)) {
return this.dayJs(date);
}
return super.deserialize(value);
}
isDateInstance(obj: any): boolean {
return dayjs.isDayjs(obj);
}
isValid(date: Dayjs): boolean {
return this.dayJs(date).isValid();
}
invalid(): Dayjs {
return this.dayJs(null);
}
private dayJs(input?: any, format?: string, locale?: string): Dayjs {
if (!this.shouldUseUtc) {
return dayjs(input, format, locale, false);
}
return dayjs(input, {format, locale, utc: this.shouldUseUtc}, locale, false).utc();
}
private initializeParser(dateLocale: string) {
if (this.shouldUseUtc) {
dayjs.extend(utc);
}
dayjs.extend(LocalizedFormat);
dayjs.extend(customParseFormat);
dayjs.extend(localeData);
}
}
The dateInput that you use in the parse property of MatDateFormats is used in the parse function of your dayjs-date-adapter. Right now you supply an array as dateInput, but your function expects a string. Dayjs (unlike moment) cannot handle an array of formats. If you want to use an array, to support multiple formats, you must figure out which format of the array to use in your parse function. The easiest way to do this is probably just to loop over your possible formats and return the dayjs object if it is valid.
Something like this (note I have not tested this):
parse(value: any, parseFormats: string[]): Dayjs | null {
if (value && typeof value === 'string') {
// use a plain loop: a return inside a forEach callback would not return from parse
for (const parseFormat of parseFormats) {
const parsed = this.dayJs(value, parseFormat, this.locale);
if (parsed.isValid()) {
return parsed;
}
}
// return an invalid object if it could not be parsed with the supplied formats
return this.dayJs(null);
}
return value ? this.dayJs(value).locale(this.locale) : null;
}
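To show the "try each format, return the first valid result" idea in isolation, here is a self-contained sketch that does not depend on dayjs. parseWithFormat is a hypothetical stand-in for the library call and only understands fixed-width DD, MM and YYYY tokens. parseFirstValid is likewise a name made up for this example.

```typescript
// Hypothetical stand-in for a date library: handles fixed-width DD, MM, YYYY tokens only.
const TOKENS = ["YYYY", "DD", "MM"] as const;

function parseWithFormat(value: string, format: string): Date | null {
  if (value.length !== format.length) return null;
  const fields = { DD: "", MM: "", YYYY: "" };
  let i = 0;
  while (i < format.length) {
    const token = TOKENS.find(t => format.startsWith(t, i));
    if (token) {
      fields[token] = value.slice(i, i + token.length);
      i += token.length;
    } else {
      // Literal separator in the format: the input must contain the same character.
      if (value.charAt(i) !== format.charAt(i)) return null;
      i++;
    }
  }
  if (![fields.DD, fields.MM, fields.YYYY].every(f => /^\d+$/.test(f))) return null;
  const day = Number(fields.DD);
  const month = Number(fields.MM);
  const year = Number(fields.YYYY);
  const date = new Date(year, month - 1, day);
  // Reject overflow such as 31/02, which Date would silently roll over.
  const ok = date.getFullYear() === year && date.getMonth() === month - 1 && date.getDate() === day;
  return ok ? date : null;
}

function parseFirstValid(value: string, formats: string[]): Date | null {
  for (const format of formats) {
    const parsed = parseWithFormat(value, format);
    if (parsed !== null) return parsed; // returns from parseFirstValid itself
  }
  return null; // none of the formats matched
}
```

With formats ['DD/MM/YYYY', 'DDMMYYYY'], both 12/12/2020 and 12122020 resolve to the same day, while 31/02/2020 is rejected.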
Note in my own adapter I altered the private dayJs function a little bit, because providing locale also in the format options gave me some weird behavior. I didn't need the utc options, so I ended up using:
private dayJs(input?: any, format?: string, locale?: string): Dayjs {
return dayjs(input, format, locale);
}
An alternative to the approach above would be to supply just one dateInput (like dateInput: 'D/M/YYYY') and then make the parse function a little more flexible. I ended up with this:
parse(value: any, parseFormat: string): Dayjs | null {
if (value && typeof value === 'string') {
const longDateFormat = dayjs().localeData().longDateFormat(parseFormat); // MM/DD/YYYY or DD-MM-YYYY, etc.
// return this.dayJs(value, longDateFormat);
let parsed = this.dayJs(value, longDateFormat, this.locale);
if (parsed.isValid()) {
// string value is exactly like long date format
return parsed;
}
const alphaNumericRegex = /[\W_]+/;
if (!alphaNumericRegex.test(value)) {
// if string contains no non-word characters and no _
// user might have typed 24012020 or 01242020
// strip long date format of non-word characters and take only the length of the value so we get DDMMYYYY or DDMM etc
const format = longDateFormat.replace(/[\W_]+/g, '').substr(0, value.length);
parsed = this.dayJs(value, format, this.locale);
if (parsed.isValid()) {
return parsed;
}
}
const userDelimiter = alphaNumericRegex.exec(value) ? alphaNumericRegex.exec(value)![0] : '';
const localeDelimiter = alphaNumericRegex.exec(longDateFormat) ? alphaNumericRegex.exec(longDateFormat)![0] : '';
const parts = value.split(userDelimiter);
const formatParts = longDateFormat.split(localeDelimiter);
if (parts.length <= formatParts.length && parts.length < 4) {
// right now this only works for days, months, and years, if time should be supported this should be altered
let newFormat = '';
parts.forEach((part, index) => {
// take the format in the length of the part, so if the date is supplied as 1-1-19 this should result in D-M-YY
// note, this will not work if really weird input is supplied, but that's okay
newFormat += formatParts[index].substr(0, part.length);
if (index < parts.length - 1) {
newFormat += userDelimiter;
}
});
parsed = this.dayJs(value, newFormat);
if (parsed.isValid()) {
return parsed;
}
}
// not able to parse anything sensible, return something invalid so input can be corrected
return this.dayJs(null);
}
return value ? this.dayJs(value).locale(this.locale) : null;
}
If you only want to support number-only inputs (like 28082021) besides your specified input, you need the if statement with !alphaNumericRegex.test(value). This piece of code strips any delimiters (like - or /) from your format string and also makes sure strings with only days, or only days and months, are supported (28 or 2808 for example). It will use the current month and year to fill in the missing values. If you only want to support full day-month-year strings, you can omit the .substr part.
The piece of code below this if statement adds support for other kinds of user input, like 28-08-2021, 28/08/2021, 28 08 2021, 28-08-21, 28/08, etc.
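The two format-deriving steps can be looked at on their own, without dayjs. The sketch below is only an illustration of the string manipulation; digitsOnlyFormat and adaptedFormat are names invented here, and the long date format is passed in as a plain string instead of being read from the locale data.

```typescript
const DELIMITERS = /[\W_]+/;

// "DD-MM-YYYY" + "2808" -> "DDMM": strip delimiters from the format and
// keep only as many characters as the user typed.
function digitsOnlyFormat(longDateFormat: string, value: string): string {
  return longDateFormat.replace(/[\W_]+/g, "").slice(0, value.length);
}

// "DD-MM-YYYY" + "1/1/19" -> "D/M/YY": shorten each format part to the length
// of the typed part and reuse the delimiter the user typed.
function adaptedFormat(longDateFormat: string, value: string): string | null {
  const userDelimiter = DELIMITERS.exec(value)?.[0] ?? "";
  const localeDelimiter = DELIMITERS.exec(longDateFormat)?.[0] ?? "";
  if (userDelimiter === "" || localeDelimiter === "") return null; // nothing to split on
  const parts = value.split(userDelimiter);
  const formatParts = longDateFormat.split(localeDelimiter);
  if (parts.length > formatParts.length) return null;
  return parts
    .map((part, index) => (formatParts[index] ?? "").slice(0, part.length))
    .join(userDelimiter);
}
```

The derived format string is then what gets handed to the date library together with the original input.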
I'm sure it won't work for every language, but it works for the most common user inputs in my language (Dutch).
Hope this helps someone who has been struggling with this as well!

How to match whitespace and comments with re2c

I started very recently to use bison for writing small compiler exercises. I am having some issues with whitespace and comments. While trying to debug the problem, I arrived at this source, which looks like what I am looking for. I tried to change and erase some characters as advised, but it didn't work.
Also during compilation I have the following error: re2c: error: line 2963, column 0: can only difference char sets.
Below the part of the code:
yy::conj_parser::symbol_type yy::yylex(lexcontext& ctx)
{
const char* anchor = ctx.cursor;
ctx.loc.step();
// Add a lambda function to avoid repetition
auto s = [&](auto func, auto&&... params) { ctx.loc.columns(ctx.cursor - anchor); return func(params..., ctx.loc); };
%{ /* Begin re2c lexer : Tokenization process starts */
re2c:yyfill:enable = 0;
re2c:define:YYCTYPE = "char";
re2c:define:YYCURSOR = "ctx.cursor";
"return" { return s(conj_parser::make_RETURN); }
"while" | "for" { return s(conj_parser::make_WHILE); }
"var" { return s(conj_parser::make_VAR); }
"if" { return s(conj_parser::make_IF); }
// Identifiers
[a-zA-Z_] [a-zA-Z_0-9]* { return s(conj_parser::make_IDENTIFIER, std::string(anchor, ctx.cursor)); }
// String and integers:
"\""" [^\"]* "\"" { return s(conj_parser::make_STRINGCONST, std::string(anchor+1, ctx.cursor-1)); }
[0-9]+ { return s(conj_parser::make_NUMCONST, std::stol(std::string(anchor, ctx.cursor))); }
// Whitespace and comments:
"\000" { return s(conj_parser::make_END); }
"\r\n" | [\r\n] { ctx.loc.lines(); return yylex(ctx); }
"//" [^\r\n]* { return yylex(ctx); }
[\t\v\b\f ] { ctx.loc.columns(); return yylex(ctx); }
Thank you very much for pointing me in the right direction or shedding some light on how this error could be solved.
You really should mention which line is line 2963. Perhaps it is here, because there seems to be an extra quotation mark in that line.
"\""" [^\"]* "\""
    ^

xtext Format comment with AbstractFormatter2

I am using AbstractFormatter2 with Xtext 2.9.2.
I want to put comments in a specific column.
My syntax looks something like this:
terminal SL_COMMENT : '*' !('\n'|'\r')* ('\r'? '\n')?;
So far, I have tried to put multiple spaces before my comment, but this doesn't work:
def dispatch void format(Model model, extension IFormattableDocument document) {
SL_COMMENTRule.prepend[space " "]
model.getEnte.format;
model.getMapset.format;
}
Can anyone guide me on how to format comments in general, and then how to put them in a specific column?
Comment formatting is done by comment replacers,
so something like the following should work:
override createCommentReplacer(IComment comment) {
val EObject grammarElement = comment.getGrammarElement();
if (grammarElement instanceof AbstractRule) {
val String ruleName = grammarElement.getName();
if (ruleName.startsWith("SL")) {
if (comment.getLineRegions().get(0).getIndentation().getLength() > 0) {
return new SinglelineDocCommentReplacer(comment, "//") {
override configureWhitespace(WhitespaceReplacer leading, WhitespaceReplacer trailing) {
leading.getFormatting().space = " ";
}
};
} else {
return new SinglelineCodeCommentReplacer(comment, "//") {
override configureWhitespace(WhitespaceReplacer leading, WhitespaceReplacer trailing) {
leading.getFormatting().space = " ";
}
}
}
}
}
super.createCommentReplacer(comment)
}

Mystery about my Mini-C parsing with Flex/Bison

I'm trying to scan and parse mini-C code with lex and yacc, and I ran into some trouble.
I set up the tokens in the scanner.l file like below,
...
"const" return tconst;
"int" return tint;
[A-Za-z][A-Za-z0-9]* return tident;
...
and I declared the production rules in the parser.y file
...
dcl_specification : dcl_specifiers { semantic(8); };
dcl_specifiers : dcl_specifier { semantic(9); }
| dcl_specifiers dcl_specifier { semantic(10); };
dcl_specifier : type_qualifier { semantic(11); }
| type_specifier { semantic(12); };
type_qualifier : tconst { semantic(13); };
type_specifier : tint { semantic(14); };
...
%%
...
void semantic(int n) {
printf("%d\n", n);
}
...
Then I want to parse the text const int max=100.
The result is
13
11
9
8
syntax error
I expected 10 to show up, because const int should be reduced by rule 10,
but I was wrong.
Why does this happen, and what should I do?
Please let me know.
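As a sanity check on that expectation, the reduction order for the grammar fragment above can be traced by hand. The sketch below is not Bison's generated parser, just a hard-coded walk over the four rules. Tok and traceReductions are names made up for this example, and "other" stands for any token that cannot continue the specifier list.

```typescript
// "other" = any token that is neither tconst nor tint (hypothetical placeholder).
type Tok = "tconst" | "tint" | "other";

function traceReductions(tokens: Tok[]): number[] {
  const trace: number[] = [];
  let seenSpecifier = false;
  for (const tok of tokens) {
    if (tok === "other") break;              // this lookahead ends the specifier list
    trace.push(tok === "tconst" ? 13 : 14);  // type_qualifier / type_specifier
    trace.push(tok === "tconst" ? 11 : 12);  // dcl_specifier
    trace.push(seenSpecifier ? 10 : 9);      // dcl_specifiers (left-recursive list)
    seenSpecifier = true;
  }
  if (seenSpecifier) trace.push(8);          // dcl_specification
  return trace;
}

console.log(traceReductions(["tconst", "tint", "other"]).join(" ")); // 13 11 9 14 12 10 8
console.log(traceReductions(["tconst", "other"]).join(" "));         // 13 11 9 8
```

The observed output matches the second trace, which would be consistent with the parser receiving something other than tint right after tconst. That is only a guess, since the rest of scanner.l is not shown; printing each token as the scanner returns it is one way to check.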
