grep/ack -o with characters of context (not lines) - grep

I'm trying to find a way to
grep -o "somepattern"
which gives me something like
html/file.js
2:somepattern
5:somepattern
but what would be really nice is to have a few characters (maybe 20) before and/or after that match.
I know there is a way to show lines before and after (context), but is there any way to show context by characters? e.g.
html/file.js
2:function helloWorld(somepattern) {
5: var foo = somepattern;
The reason I ask is that if I grep recursively and hit a minified file with a match, it prints the entire file, which is super annoying.

Using ack:
% ack -o '.{0,10}string.{0,10}' | head
cli/cmdlineparser.cpp:22:#include <string>
cli/cmdlineparser.cpp:23:include <cstring>
cli/cmdlineparser.cpp:37:onst std::string& FileList
ctor<std::string>& PathNam
cli/cmdlineparser.cpp:57: std::string FileName;
cli/cmdlineparser.cpp:66:onst std::string& FileList
list<std::string>& PathNam
cli/cmdlineparser.cpp:72: std::string PathName;
cli/cmdlineparser.cpp:92:onst std::string &message)
cli/cmdlineparser.cpp:133:onst std::string errmsg =
Using (GNU) grep:
% grep -no '.\{0,10\}string.\{0,10\}' **/*.[ch]* | head
cli/cmdlineparser.cpp:22:#include <string>
cli/cmdlineparser.cpp:23:include <cstring>
cli/cmdlineparser.cpp:37:onst std::string& FileList
ctor<std::string>& PathNam
cli/cmdlineparser.cpp:57: std::string FileName;
cli/cmdlineparser.cpp:66:onst std::string& FileList
list<std::string>& PathNam
cli/cmdlineparser.cpp:72: std::string PathName;
cli/cmdlineparser.cpp:92:onst std::string &message)
cli/cmdlineparser.cpp:133:onst std::string errmsg =
...shows up to 10 characters before and 10 characters after 'string'... (assuming they're there).
I'm using | head here merely to limit the output to 10 lines for clarity.
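The same `.{0,N}pattern.{0,N}` idea can be sketched in Python for cases where the text is already in memory. This is only an illustration; the function name `matches_with_context` and its parameters are made up for this example and are not part of grep or ack.

```python
import re

def matches_with_context(text, pattern, width=10):
    """Return each match of `pattern` with up to `width` characters on either side."""
    # Same shape as the grep/ack expression: .{0,N}pattern.{0,N}
    # "." does not match a newline, so the context never crosses a line.
    context_re = re.compile(r".{0,%d}(?:%s).{0,%d}" % (width, pattern, width))
    return [m.group(0) for m in context_re.finditer(text)]

if __name__ == "__main__":
    for hit in matches_with_context("const std::string& FileList", "string", 10):
        print(hit)
```

As with grep -o, matches do not overlap, so when two hits sit close together on one line the second one gets less leading context.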

Cling API available?

How to use Cling in my app via API to interpret C++ code?
I expect it to provide a terminal-like way of interaction without the need to compile/run an executable. Let's say I have a hello world program:
void main() {
cout << "Hello world!" << endl;
}
I expect to have an API that executes char* = (program code) and gives me char *output = "Hello world!". Thanks.
PS. Something similar to the Ch interpreter example:
/* File: embedch.c */
#include <stdio.h>
#include <embedch.h>
char *code = "\
int func(double x, int *a) { \
printf(\"x = %f\\n\", x); \
printf(\"a[1] in func=%d\\n\", a[1]);\
a[1] = 20; \
return 30; \
}";
int main () {
ChInterp_t interp;
double x = 10;
int a[] = {1, 2, 3, 4, 5}, retval;
Ch_Initialize(&interp, NULL);
Ch_AppendRunScript(interp,code);
Ch_CallFuncByName(interp, "func", &retval, x, a);
printf("a[1] in main=%d\n", a[1]);
printf("retval = %d\n", retval);
Ch_End(interp);
}
There is finally a better answer: example code! See https://github.com/root-project/cling/blob/master/tools/demo/cling-demo.cpp
And the answer to your question is: no. cling takes code and returns C++ values or objects, across compiled and interpreted code. It's not a "string in / string out" kinda thing. There's perl for that ;-) This is what code in, value out looks like:
// We could use a header, too...
interp.declare("int aGlobal;\n");
cling::Value res; // Will hold the result of the expression evaluation.
interp.process("aGlobal;", &res);
std::cout << "aGlobal is " << res.getAs<long long>() << '\n';
Apologies for the late reply!
Usually the way one does it is:
[cling$] #include "cling/Interpreter/Interpreter.h"
[cling$] const char* someCode = "int i = 123;"
[cling$] gCling->declare(someCode);
[cling$] i // You will have i declared:
(int) 123
The API is documented in: http://cling.web.cern.ch/cling/doxygen/classcling_1_1Interpreter.html
Of course you can create your own 'nested' interpreter in cling's runtime too. (See the doxygen link above)
I hope this helps and answers the question. You can find more usage examples under the test/ folder.
Vassil

Non-greedy multi-line and single-line matching

I'm trying to modify a flex+bison generator to allow the inclusion of code snippets denoted by surrounding '{{' and '}}'. Unlike the multi-line comment case, I must capture all of the content.
My attempts either fail in the case where the '{{' and the '}}' are on the same line or they are painfully slow.
My first attempt was something like this:
%{
#include <stdio.h>
// sscce implementation of a growing string buffer
char codeBlock[4096];
int codeOffset;
const char* curFilename = "file.l";
extern int yylineno;
void add_code_line(const char* yytext)
{
codeOffset += sprintf(codeBlock + codeOffset, "#line %u \"%s\"\n\t%s\n", yylineno, curFilename, yytext);
}
%}
%option stack
%option yylineno
%x CODE_FRAG
%%
"{{"[ \n]* { codeOffset = 0; yy_push_state(CODE_FRAG); }
<CODE_FRAG>"}}" { codeBlock[codeOffset] = 0; printf("// code\n%s\n", codeBlock); yy_pop_state(); }
<CODE_FRAG>[^\n]* { add_code_line(yytext); }
<CODE_FRAG>\n
\n
.
Note: the "codeBlock" implementation is a contrivance for the purpose of an SSCCE only. It's not what I'm actually using.
This works for a simple test case:
{{ from line 1
from line 2
}}
{{
from line 7
}}
Outputs
// code
#line 1 "file.l"
from line 1
#line 2 "file.l"
from line 2
// code
#line 7 "file.l"
from line 7
But it can't handle
{{ hello }}
The two solutions I can think of are:
/* capture character-by-character */
<CODE_FRAG>. { add_code_character(yytext[0]); }
And
<INITIAL>"{{".*?"}}" { int n = strlen(yytext); yytext[n - 2] = 0; add_code(yytext + 2); }
The former seems likely to be slow, and the latter just feels wrong.
Any ideas?
--- EDIT ---
The following appears to achieve the result desired, but I'm not sure if it's a "good" Flex way to do this:
"{{"[ \n]* { codeOffset = 0; yy_push_state(CODE_FRAG); }
<CODE_FRAG>"}}" { codeBlock[codeOffset] = 0; printf("// code\n%s\n", codeBlock); yy_pop_state(); }
<CODE_FRAG>.*?/"}}" { add_code_line(yytext); }
<CODE_FRAG>.*? { add_code_line(yytext); }
<CODE_FRAG>\n
Flex doesn't implement non-greedy matches. So .*? won't work the way you expect it to in flex. (It will be an optional .*, which is indistinguishable from .*)
Here's a regular expression which will match from {{ as far as possible without a }}:
"{{"([}]?[^}])*
That might not be what you want, since it won't allow nested {{...}} within your code blocks. However, you didn't mention that as a requirement and none of your examples functions that way.
The above regular expression does not match the closing }}, which appears to be what you want since it lets you call add_code(yytext+2) without modifying the temporary buffer. However, you do need to deal with the }} in your action. See below.
The regular expression above will match to the end of the file if there is no matching }}. You probably want to deal with that as an error; the simplest way is to check whether EOF is encountered while you are trying to skip over the }}:
"{{"([}]?[^}])* { add_code(yytext+2);
if (input() == EOF || input() == EOF) {
/* Produce an error, unclosed {{ */
}
}
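To see what the pattern `"{{"([}]?[^}])*` accepts, here is a rough Python equivalent. It is a sketch only: Python's `re` is a backtracking engine, not a flex scanner, and the name `extract_blocks` is invented for this example. The check for the closing `}}` stands in for the two `input()` calls in the action above.

```python
import re

# "{{" followed by any run of characters in which a "}" is never
# immediately followed by another "}" (the pattern from the answer above).
OPEN_BLOCK = re.compile(r"\{\{((?:\}?[^}])*)")

def extract_blocks(text):
    """Return the contents of every {{...}} block; raise on an unclosed {{."""
    blocks = []
    pos = 0
    while True:
        m = OPEN_BLOCK.search(text, pos)
        if m is None:
            return blocks
        # The pattern stops just before the closing "}}"; consume it here.
        if not text.startswith("}}", m.end()):
            raise ValueError("unclosed {{ at offset %d" % m.start())
        blocks.append(m.group(1))
        pos = m.end() + 2

if __name__ == "__main__":
    print(extract_blocks("{{ hello }}\n{{\nfrom line 3\n}}"))
```

A single `}` inside a block is fine, because the pattern only refuses to step over two closing braces in a row.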

Extract some YUV frames from large YUV file

I am looking for a Win32 program to copy part of a large 1920x1080 4:2:0 .YUV file (approx. 43 GB) into smaller .YUV files. All of the programs I have used, i.e. YUV players, can only copy/save one frame at a time. What is the easiest/most appropriate method to cut raw YUV data into smaller YUV videos (images)? Something similar to the ffmpeg command:
ffmpeg -ss [start_seconds] -t [duration_seconds] -i [input_file] [outputfile]
Here is a minimal working example, written in C++, in case anyone is searching for a simple solution:
// include libraries
#include <cstdio>
#include <fstream>
using namespace std;
#define P420 1.5
const int IMAGE_SIZE = 1920*1080; // full HD image size in pixels
const double IMAGE_CONVERSION = P420; // bytes per pixel for 4:2:0
int n_frames = 300; // set number of frames to copy
int skip_frames = 500; // set number of frames to skip from the beginning of the input file
char in_string[] = "F:\\BigBucksBunny\\yuv\\BigBuckBunny_1920_1080_24fps.yuv";
char out_string[] = "out.yuv";
//////////////////////
// main
//////////////////////
int main(int argc, char** argv)
{
// size of one frame in bytes
const streamsize image_size = (streamsize)(IMAGE_SIZE * IMAGE_CONVERSION);
// IO files
ofstream out_file(out_string, ios::out | ios::binary);
ifstream in_file(in_string, ios::in | ios::binary);
// error checking, e.g. check that skip_frames + n_frames does not run past the end of the file
//
// TODO
// image buffer
char* image = new char[image_size];
// skip frames (use a 64-bit offset, since the input is larger than 4 GB)
in_file.seekg((streamoff)skip_frames * image_size);
// read/write image buffer one by one
for(int i = 0; i < n_frames; i++)
{
in_file.read(image, image_size);
out_file.write(image, image_size);
}
// close the files and release the buffer
out_file.close();
in_file.close();
delete[] image;
printf("Copy finished ...");
return 0;
}
If you have Python available, you can use this approach to store each frame as a separate file:
import sys

src_yuv = open(FILENAME, 'rb')
for i in range(NUMBER_OF_FRAMES):
    data = src_yuv.read(NUMBER_OF_BYTES)
    fname = "frame%d.yuv" % i
    dst_yuv = open(fname, 'wb')
    dst_yuv.write(data)
    sys.stdout.write('.')
    sys.stdout.flush()
    dst_yuv.close()
src_yuv.close()
Just replace the capitalized names with real values, e.g. NUMBER_OF_BYTES for one 1080p frame should be 1920*1080*3/2 = 3110400.
Or if you install cygwin you can use the dd tool, e.g. to get the first frame of a 1080p clip do:
dd bs=3110400 count=1 if=sample.yuv of=frame1.yuv
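If the goal is a single output file containing a range of frames (as in the original question), the frame-size arithmetic above can be combined with a seek. The following is a minimal sketch, assuming 8-bit planar 4:2:0 data with no headers; the function name `copy_frames` and its parameters are hypothetical.

```python
def copy_frames(src_path, dst_path, width, height, skip_frames, n_frames):
    """Copy up to n_frames 4:2:0 frames from src_path to dst_path,
    starting after skip_frames frames. Returns the number of frames copied."""
    # One frame: full-size luma plane plus two quarter-size chroma planes,
    # e.g. 1920*1080*3/2 = 3110400 bytes for 1080p.
    frame_bytes = width * height * 3 // 2
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Python integers are arbitrary precision, so offsets beyond 4 GB are fine.
        src.seek(skip_frames * frame_bytes)
        for _ in range(n_frames):
            data = src.read(frame_bytes)
            if len(data) < frame_bytes:  # ran past the end of the input
                break
            dst.write(data)
            copied += 1
    return copied
```

For example, `copy_frames("sample.yuv", "out.yuv", 1920, 1080, 500, 300)` would skip 500 frames and copy the next 300, one frame in memory at a time.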
Method1:
If you are using GStreamer and you just want the first X YUV frames from a large YUV file, you can use the method below:
gst-launch-1.0 filesrc num-buffers=X location="Your_large.yuv" ! videoparse width=x height=y format="xy" ! filesink location="FirstXframes.yuv"
Method2:
Calculate the size of one frame and then use the split utility to divide the large file into smaller files.
Use
split -b size_in_bytes Large_file prefix

recursion in parsing

Here are the input file, the .l file, the .y file and the output.
The problem is that the parser does not recognize the directions recursively; it identifies only the first one.
I've used the same kind of rule for recognizing ports, and that works, but it does not work for directions.
It also does not run the .y file code associated with the rule (the cout statement).
Input file:
start a b c d //ports
a:O b:I c:B d:O //direction of ports
.l file:
[\t]+ {}
[\n] {line_num++; cout << "line_num:" << line_num; }
start { cout << "beginning of file"; return START;}
[a-zA-Z0-9_\-]+:[IOB] {cout<<"\ndirection:" << strdup(yytext); return DR; }
[a-zA-Z0-9_\-]+ {cout<<"\nfound name:" << strdup(yytext); return NAME;}
.y file grammar:
doc : START ports dir
ports : NAME ports { cout<<"\nport in .y" << $1;}
| NAME { cout<<"\nport in .y" << $1;}
;
dir : DR dir { cout<<"\ndirection in .y" << $1;}
| DR { cout<<"\ndirection in .y"<<$1; }
;
Output:
beginning of file
found name:a
found name:b
found name:c
found name:d
line no-2
direction:a:O
The only clear error you're making is that you are not setting the value of yylval in your flex actions, so $1 is some uninitialized value in all of your bison actions. Your flex actions should look something like this:
[a-zA-Z0-9_\-]+ { yylval = strdup(yytext);
cout << "\nfound name:" << yylval;
return NAME;
}
Also, make sure you specify that the type of the tokens DR and NAME is const char *.
Finally, don't forget to free() the strings when you don't need them any more.
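The point about yylval is that every token needs to carry its text along with its kind, otherwise the grammar actions have nothing to print. Here is a small Python sketch of that idea, not flex/bison code: the second element of each tuple plays the role of yylval, and the names `tokenize` and `parse` are made up for this illustration.

```python
import re

# DR is tried first so that "a:O" is not split into a NAME followed by junk.
TOKEN_RE = re.compile(r"(?P<DR>[A-Za-z0-9_\-]+:[IOB])|(?P<NAME>[A-Za-z0-9_\-]+)")

def tokenize(text):
    """Return (kind, value) pairs; value is what the lexer would store in yylval."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "NAME" and value == "start":
            kind = "START"
        tokens.append((kind, value))
    return tokens

def parse(tokens):
    """doc : START ports dir, with one or more NAMEs and one or more DRs."""
    if not tokens or tokens[0][0] != "START":
        raise ValueError("expected START")
    pos = 1
    ports = []
    while pos < len(tokens) and tokens[pos][0] == "NAME":
        ports.append(tokens[pos][1])  # the token's value, like $1 in bison
        pos += 1
    dirs = []
    while pos < len(tokens) and tokens[pos][0] == "DR":
        dirs.append(tokens[pos][1])
        pos += 1
    if not ports or not dirs or pos != len(tokens):
        raise ValueError("syntax error at token %d" % pos)
    return ports, dirs
```

Note that this sketch does not handle the // comments in the sample input; they would be tokenized as NAMEs.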

How to parse template languages in Ragel?

I've been working on a parser for simple template language. I'm using Ragel.
The requirements are modest. I'm trying to find [[tags]] that can be embedded anywhere in the input string.
I'm trying to parse a simple template language, something that can have tags such as {{foo}} embedded within HTML. I tried several approaches to parse this but had to resort to using a Ragel scanner and use the inefficient approach of only matching a single character as a "catch all". I feel this is the wrong way to go about this. I'm essentially abusing the longest-match bias of the scanner to implement my default rule ( it can only be 1 char long, so it should always be the last resort ).
%%{
machine parser;
action start { tokstart = p; }
action on_tag { results << [:tag, data[tokstart..p]] }
action on_static { results << [:static, data[p..p]] }
tag = ('[[' lower+ ']]') >start @on_tag;
main := |*
tag;
any => on_static;
*|;
}%%
( actions written in ruby, but should be easy to understand ).
How would you go about writing a parser for such a simple language? Is Ragel maybe not the right tool? It seems you have to fight Ragel tooth and nail if the syntax is unpredictable such as this.
Ragel works fine. You just need to be careful about what you're matching. Your question uses both [[tag]] and {{tag}}, but your example uses [[tag]], so I figure that's what you're trying to treat as special.
What you want to do is eat text until you hit an open-bracket. If that bracket is followed by another bracket, then it's time to start eating lowercase characters till you hit a close-bracket. Since the text in the tag cannot include any bracket, you know that the only non-error character that can follow that close-bracket is another close-bracket. At that point, you're back where you started.
Well, that's a verbatim description of this machine:
tag = '[[' lower+ ']]';
main := (
(any - '[')* # eat text
('[' ^'[' | tag) # try to eat a tag
)*;
The tricky part is, where do you call your actions? I don't claim to have the best answer to that, but here's what I came up with:
static char *text_start;
%%{
machine parser;
action MarkStart { text_start = fpc; }
action PrintTextNode {
int text_len = fpc - text_start;
if (text_len > 0) {
printf("TEXT(%.*s)\n", text_len, text_start);
}
}
action PrintTagNode {
int text_len = fpc - text_start - 1; /* drop closing bracket */
printf("TAG(%.*s)\n", text_len, text_start);
}
tag = '[[' (lower+ >MarkStart) ']]' @PrintTagNode;
main := (
(any - '[')* >MarkStart %PrintTextNode
('[' ^'[' %PrintTextNode | tag) >MarkStart
)* @eof(PrintTextNode);
}%%
There are a few non-obvious things:
The eof action is needed because %PrintTextNode is only ever invoked on leaving a machine. If the input ends with normal text, there will be no input to make it leave that state. Because it will also be called when the input ends with a tag, and there is no final, unprinted text node, PrintTextNode tests that it has some text to print.
The %PrintTextNode action nestled in after the ^'[' is needed because, though we marked the start when we hit the [, after we hit a non-[, we'll start trying to parse anything again and remark the start point. We need to flush those two characters before that happens, hence that action invocation.
The full parser follows. I did it in C because that's what I know, but you should be able to turn it into whatever language you need pretty readily:
/* ragel so_tag.rl && gcc so_tag.c -o so_tag */
#include <stdio.h>
#include <string.h>
static char *text_start;
%%{
machine parser;
action MarkStart { text_start = fpc; }
action PrintTextNode {
int text_len = fpc - text_start;
if (text_len > 0) {
printf("TEXT(%.*s)\n", text_len, text_start);
}
}
action PrintTagNode {
int text_len = fpc - text_start - 1; /* drop closing bracket */
printf("TAG(%.*s)\n", text_len, text_start);
}
tag = '[[' (lower+ >MarkStart) ']]' @PrintTagNode;
main := (
(any - '[')* >MarkStart %PrintTextNode
('[' ^'[' %PrintTextNode | tag) >MarkStart
)* @eof(PrintTextNode);
}%%
%% write data;
int
main(void) {
char buffer[4096];
int cs;
char *p = NULL;
char *pe = NULL;
char *eof = NULL;
%% write init;
do {
size_t nread = fread(buffer, 1, sizeof(buffer), stdin);
p = buffer;
pe = p + nread;
if (nread < sizeof(buffer) && feof(stdin)) eof = pe;
%% write exec;
if (eof || cs == %%{ write error; }%%) break;
} while (1);
return 0;
}
Here's some test input:
[[header]]
<html>
<head><title>title</title></head>
<body>
<h1>[[headertext]]</h1>
<p>I am feeling very [[emotion]].</p>
<p>I like brackets: [ is cool. ] is cool. [] are cool. But [[tag]] is special.</p>
</body>
</html>
[[footer]]
And here's the output from the parser:
TAG(header)
TEXT(
<html>
<head><title>title</title></head>
<body>
<h1>)
TAG(headertext)
TEXT(</h1>
<p>I am feeling very )
TAG(emotion)
TEXT(.</p>
<p>I like brackets: )
TEXT([ )
TEXT(is cool. ] is cool. )
TEXT([])
TEXT( are cool. But )
TAG(tag)
TEXT( is special.</p>
</body>
</html>
)
TAG(footer)
TEXT(
)
The final text node contains only the newline at the end of the file.
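For comparison, the same split into text and tag nodes can be sketched in Python with a regular expression instead of a Ragel machine. This swaps in a different technique and is not a translation of the parser above: it merges adjacent text into one node rather than emitting separate nodes around a lone bracket, and the name `tokenize` is chosen just for this example.

```python
import re

TAG_RE = re.compile(r"\[\[([a-z]+)\]\]")  # '[[' lower+ ']]'

def tokenize(data):
    """Split data into ("TEXT", s) and ("TAG", name) nodes, in input order."""
    nodes = []
    pos = 0
    for m in TAG_RE.finditer(data):
        if m.start() > pos:  # flush the text that precedes this tag
            nodes.append(("TEXT", data[pos:m.start()]))
        nodes.append(("TAG", m.group(1)))
        pos = m.end()
    if pos < len(data):  # trailing text, the job of the eof action above
        nodes.append(("TEXT", data[pos:]))
    return nodes

if __name__ == "__main__":
    for kind, value in tokenize("<h1>[[headertext]]</h1>\n"):
        print("%s(%s)" % (kind, value))
```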
