count number of lines in file - Scala - parsing

How would I go about counting the number of lines in a text file in Scala, similar to wc -l on the Unix command line?

io.Source.fromFile("file.txt").getLines.size
Note that getLines returns an Iterator[String] so you aren't actually reading the whole file into memory.

Cribbing from another answer I posted:
def lineCount(f: java.io.File): Int = {
val src = io.Source.fromFile(f)
try {
src.getLines.size
} finally {
src.close()
}
}
Or, using scala-arm:
import resource._
def lineCount(f: java.io.File): Int = {
managed(io.Source.fromFile(f)) acquireAndGet { src =>
src.getLines.size
}
}

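import java.io.File
import scala.io.Source
// Option 1: explicit counter with a while loop (getLines strips line terminators, so add "\n")
val source = Source.fromFile(new File("file")).getLines
var n = 1 ; while (source.hasNext) { printf("%d> %s\n", n, source.next) ; n += 1 }
// Option 2: pair each line with its zero-based index
val source = Source.fromFile(new File("file")).getLines
for ((line, n) <- source zipWithIndex) { printf("%d> %s\n", (n + 1), line) }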

Related

Why is my binary search implementation returning -1?

This is my main.dart:
import 'edgecases.dart';
main () {
var card = edgecases(0)['input']['cards'];
var query = edgecases(0)['input']['query'];
var result = locate_card(edgecases(0)['input']['cards'], edgecases(0)['input']['query']);
var output = edgecases(0)['output'];
print("Cards:- $card");
print("Query:- $query");
print("Output:- $result");
print("Actual answer:- $output");
}
And this is my edgecases.dart:
edgecases ([edgecasenumber = null]) { //You may make it required, I provided a null as default to check if my syntax is going right.
List tests = [];
var edge1 = {'input': {
'cards': [13, 11, 10, 7, 4, 3, 1, 0],
'query': 1
}, 'output': 6};
tests.addAll([edge1]);
if (edgecasenumber == null){ // This if is useless here so you may
return 'Null type object could not be found.';
} else {
return tests.elementAt(edgecasenumber); // Indexing in dart also starts with 0.
}
}
locate_card (List cards, int query){
int lo = 0;
int hi = cards.length - 1;
print('$lo $hi');
while (lo <= hi) {
//print('hello'); Uncomment to see if it is entering the loop
var mid = (lo + hi) ~/ 2;
var mid_number = cards[mid];
print("lo:$lo ,hi:$hi, mid:$mid, mid_number:$mid_number");
if (mid_number == query){
return mid;
} else if (mid_number < query) {
hi = mid - 1;
} else if (mid_number > query) {
lo = mid + 1;
};
return -1; //taking about this line
};
}
[I have cut the code short here, so you may find some things unnecessary; just ignore them XD]
Actually, I am trying to implement binary search here (I have previously implemented it successfully in Python; I am implementing it in Dart to learn the language).
On testing it with the first edge case (that is, on running the command dart main.dart), I found that it returns -1, which is wrong. So I tried commenting out the return -1; line in the edgecases.dart file to see what happens; that line was meant to handle another edge case (an empty list, which I have removed here for simplicity). I don't understand why it returns -1 when it gives the right value once that line is commented out. Any possible explanations and solutions?
Thanks in advance!
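You almost did it right. The return -1; sits inside the while loop, so the function returns -1 at the end of the first iteration unless the very first midpoint happens to match. Just place the return -1; after the while loop's closing brace, at the very end of locate_card.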
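For comparison, here is a minimal sketch of the corrected control flow, written in JavaScript rather than Dart. The name locateCard is just an illustrative stand-in for locate_card, and the list is assumed to be sorted in descending order as in the question.

```javascript
// Binary search over a list sorted in DESCENDING order, as in the question.
// Returns the index of `query`, or -1 if it is not present.
function locateCard(cards, query) {
  let lo = 0;
  let hi = cards.length - 1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    const midNumber = cards[mid];
    if (midNumber === query) {
      return mid;
    } else if (midNumber < query) {
      hi = mid - 1; // query is larger, so it lies to the left
    } else {
      lo = mid + 1; // query is smaller, so it lies to the right
    }
  }
  // Reached only after the loop has exhausted the whole search range.
  return -1;
}

console.log(locateCard([13, 11, 10, 7, 4, 3, 1, 0], 1)); // 6
```

With the return -1 outside the loop, the search keeps narrowing lo and hi until it either finds the query or runs out of range.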

String Newline not displaying in Doors

I have a CSV file containing some data like:
374,Test Comment multiplelines \n Here's the 2nd line,Other_Data
Where 374 is the object ID from DOORS, followed by some commentary and then some other data.
I have a piece of code that reads the data from the CSV file, stores it in the appropriate variables and then writes it to the DOORS object.
Module Opened_Module = edit("path_to_mod", true,true,true)
Object o ;
Column c;
string attrib;
string oneLine ;
string OBJECT_ID = "";
string Comment = "";
string Other_data = "";
string sub = ","  // field delimiter used by split_text (assumed to be the CSV comma)
int offset;
int len;
string split_text(string s)
{
if (findPlainText(s, sub, offset, len, false))
{
return s[0 : offset -1]
}
else
{
return ""
}
}
Stream input = read("Path_to_Input.txt");
input >> oneLine
OBJECT_ID = split_text(oneLine)
oneLine = oneLine[offset+1:]
Comment = split_text(oneLine)
Other_data = oneLine[offset+1:]
When using print Comment the output in the DXL console is : Test Comment multiplelines \n Here's the 2nd line
for o in Opened_Module do
{
if (o."Absolute Number"""==OBJECT_ID ){
attrib = "Result_Comment " 2
o.attrib = Comment
}
}
But after writing to the DOORS object, the \n is not taken into consideration.
I've tried putting the string inside a Buffer and using stringOf(), but the escape character just disappeared.
I've also tried adding \r\n and \\n to the input CSV text, but still no luck.
This isn't the most efficient way of handling this, but I have a relatively straightforward fix.
I would suggest adding the following:
Module Opened_Module = edit("path_to_mod", true,true,true)
Object o ;
Column c;
string attrib;
string oneLine ;
string OBJECT_ID = "";
string Comment = "";
string Other_data = "";
string sub = ","  // field delimiter used by split_text (assumed to be the CSV comma)
int offset;
int len;
string split_text(string s)
{
if (findPlainText(s, sub, offset, len, false))
{
return s[0 : offset -1]
}
else
{
return ""
}
}
Stream input = read("Path_to_Input.txt");
input >> oneLine
OBJECT_ID = split_text(oneLine)
oneLine = oneLine[offset+1:]
Comment = split_text(oneLine)
Other_data = oneLine[offset+1:]
//Modification to comment string
int x
int y
while ( findPlainText ( Comment , "\\n" , x , y , false ) ) {
Comment = ( Comment [ 0 : x - 1 ] ) "\n" ( Comment [ x + 2 : ] )
}
This runs the comment string through a small loop that replaces each literal two-character sequence \n (a backslash followed by n) with an actual newline character. Be aware: this will ignore any trailing spaces at the end of a line.
Let me know if that helps.
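To make the idea concrete outside of DXL, here is a small JavaScript sketch of the same replacement. The function name unescapeNewlines is hypothetical; it only illustrates turning the literal backslash + n pair read from the file into a real newline before the text is written anywhere.

```javascript
// Replace every literal backslash + "n" pair with a real newline character.
// "\\n" in source code is the two-character sequence as it appears in the CSV file.
function unescapeNewlines(text) {
  return text.split("\\n").join("\n");
}

const raw = "Test Comment multiplelines \\n Here's the 2nd line";
console.log(unescapeNewlines(raw));
```

Unlike the DXL loop above, this sketch leaves the spaces around the line break untouched.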

How can I properly parse an email address with name?

I'm reading email headers (in Node.js, for those keeping score) and they are VERY varied. E-mail addresses in the To field look like:
"Jake Smart" <jake#smart.com>, jack#smart.com, "Development, Business" <bizdev#smart.com>
and a variety of other formats. Is there any way to parse all of this out?
Here's my first stab:
Run a split() on , to break up the different people into an array
For each item, see if there's a < or ".
If there's a <, then parse out the email
If there's a ", then parse out the name
For the name, if there's a ,, then split to get Last, First names.
If I first do a split on the ,, then the Development, Business will cause a split error. Spaces are also inconsistent. Plus, there may be more e-mail address formats that come through in headers that I haven't seen before. Is there any way (or maybe an awesome Node.js library) that will do all of this for me?
There's an npm module for this: mimelib (or mimelib-noiconv if you are on Windows or don't want to compile node-iconv)
npm install mimelib-noiconv
And the usage would be:
var mimelib = require("mimelib-noiconv");
var addressStr = 'jack#smart.com, "Development, Business" <bizdev#smart.com>';
var addresses = mimelib.parseAddresses(addressStr);
console.log(addresses);
// [{ address: 'jack#smart.com', name: '' },
// { address: 'bizdev#smart.com', name: 'Development, Business' }]
The actual formatting for that is pretty complicated, but here is a regex that works. I can't promise it always will work though. https://www.rfc-editor.org/rfc/rfc2822#page-15
const str = "...";
const pat = /(?:"([^"]+)")? ?<?(.*?#[^>,]+)>?,? ?/g;
let m;
while (m = pat.exec(str)) {
const name = m[1];
const mail = m[2];
// Do whatever you need.
}
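I'd try to do it all in a single pass (for performance). I just threw this together with limited testing; note that it only picks up addresses wrapped in < >, so a bare address such as jack#smart.com is skipped: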
var header = "\"Jake Smart\" <jake#smart.com>, jack#smart.com, \"Development, Business\" <bizdev#smart.com>";
alert (header);
var info = [];
var current = [];
var state = -1;
var temp = "";
for (var i = 0; i < header.length; i++) {
var c = header[i];
if (state == 0) {
if (c == "\"") {
current.push(temp);
temp = "";
state = -1;
} else {
temp += c;
}
} else if (state == 1) {
if (c == ">") {
current.push(temp);
info.push (current);
current = [];
temp = "";
state = -1;
} else {
temp += c;
}
} else {
if (c == "<"){
state = 1;
} else if (c == "\"") {
state = 0;
}
}
}
alert ("INFO: \n" + info);
For something complete, you should port this to JS: http://cpansearch.perl.org/src/RJBS/Email-Address-1.895/lib/Email/Address.pm
It gives you all the parts you need. The tricky bit is just the set of regexps at the start.
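If you want to stay close to the "first stab" from the question without a library, one option is to split only on commas that sit outside quotes and angle brackets. The sketch below is a simplified illustration, not a full RFC parser: the helper names (splitAddressList, parseEntry, parseAddresses) are made up for this example, and it does not handle escaped quotes, comments in parentheses, or group syntax.

```javascript
// Split a header value on commas that are outside double quotes and angle brackets.
function splitAddressList(header) {
  const parts = [];
  let current = "";
  let inQuotes = false;
  let inAngle = false;
  for (const c of header) {
    if (c === '"') inQuotes = !inQuotes;
    else if (c === "<" && !inQuotes) inAngle = true;
    else if (c === ">" && !inQuotes) inAngle = false;
    if (c === "," && !inQuotes && !inAngle) {
      parts.push(current); // separator between two entries
      current = "";
    } else {
      current += c;
    }
  }
  parts.push(current);
  return parts.map((p) => p.trim()).filter((p) => p.length > 0);
}

// Turn one entry into { name, address }.
function parseEntry(entry) {
  const open = entry.lastIndexOf("<");
  const close = entry.lastIndexOf(">");
  if (open !== -1 && close > open) {
    // Everything before "<" is the display name; strip surrounding quotes.
    const name = entry.slice(0, open).trim().replace(/^"|"$/g, "");
    return { name, address: entry.slice(open + 1, close).trim() };
  }
  return { name: "", address: entry }; // bare address without a name
}

function parseAddresses(header) {
  return splitAddressList(header).map(parseEntry);
}

const header = '"Jake Smart" <jake#smart.com>, jack#smart.com, "Development, Business" <bizdev#smart.com>';
console.log(parseAddresses(header));
```

Tracking the quote state is what keeps "Development, Business" in one piece instead of being split at its comma.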

how to return values between dates and group results in couchdb

I'm having issues grouping date range results in CouchDB.
Say I have this data:
2010-11-14, Tom
2010-11-15, Tom
2010-11-15, Dick
2010-11-15, Tom
2010-11-20, Harry
and I want to use a view (and possibly a reduce function) to return grouped names between 2010-11-14 and 2010-11-16, e.g.
Tom 3
Dick 1
How can this be achieved?
I would suggest the following document structure, and map and reduce functions:
{ date : '2010-11-14', name : 'Tom' }
function(doc) { var r = {}; r[doc.name] = 1; emit (doc.date, r); }
function (keys, values, rereduce) {
var r = {};
for (var i in values) {
for (var k in values[i]) {
if (k in r) r[k] += values[i][k];
else r[k] = values[i][k];
}
}
return r;
}
Then you would query the view, asking for a full reduce (no grouping), with the startkey and endkey parameters set to "2010-11-14" and "2010-11-16" (keys are passed as JSON-encoded strings). You will get a single value in return:
{ 'Tom': 3, 'Dick': 1 }
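To check the logic without a CouchDB instance, the map and reduce steps can be simulated in plain JavaScript. This is only a sketch: mapDoc, reduceValues and queryView are hypothetical stand-ins for what the view engine does, and the key range is assumed to be inclusive on both ends with plain string comparison.

```javascript
const docs = [
  { date: "2010-11-14", name: "Tom" },
  { date: "2010-11-15", name: "Tom" },
  { date: "2010-11-15", name: "Dick" },
  { date: "2010-11-15", name: "Tom" },
  { date: "2010-11-20", name: "Harry" },
];

// Same logic as the map function above, but returning the row instead of calling emit().
function mapDoc(doc) {
  const value = {};
  value[doc.name] = 1;
  return { key: doc.date, value };
}

// Same merging logic as the reduce function above: sum the counts per name.
function reduceValues(values) {
  const totals = {};
  for (const value of values) {
    for (const name of Object.keys(value)) {
      totals[name] = (totals[name] || 0) + value[name];
    }
  }
  return totals;
}

// Stand-in for a view query with startkey/endkey and a full reduce.
function queryView(allDocs, startkey, endkey) {
  const rows = allDocs
    .map(mapDoc)
    .filter((row) => row.key >= startkey && row.key <= endkey);
  return reduceValues(rows.map((row) => row.value));
}

console.log(queryView(docs, "2010-11-14", "2010-11-16")); // { Tom: 3, Dick: 1 }
```

Because ISO-style dates sort correctly as strings, the date can be used directly as the view key for range queries.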

String Split in DXL

I have a string
Ex: "We prefer questions that can be answered; not just discussed "
Now I want to split this string at ";"
like
We prefer questions that can be answered
and
not just discussed
Is this possible in DXL?
I am learning DXL, so I don't have any idea whether we can split or not.
Note: This is not homework.
I'm sorry for necroing this post. Being new to DXL, I spent some time on the same challenge. I noticed that the available implementations have different specifications of "splitting" a string. Loving the Ruby language, I missed an implementation that comes at least close to the Ruby version of String#split.
Maybe my findings will be helpful to somebody.
Here's a functional comparison of
Variant A: niol's implementation (which, at first glance, appears to be the same implementation that is usually found at Capri Soft),
Variant B: PJT's implementation,
Variant C: Brett's implementation and
Variant D: my implementation (which provides the correct functionality, in my opinion).
To eliminate structural differences, all implementations were wrapped in functions returning a Skip list or an Array.
Splitting results
Note that all implementations return different results, depending on their definition of "splitting":
string mellow yellow; delimiter ello
splitVariantA returns 1 elements: ["mellow yellow" ]
splitVariantB returns 2 elements: ["m" "llow yellow" ]
splitVariantC returns 3 elements: ["w" "w y" "" ]
splitVariantD returns 3 elements: ["m" "w y" "w" ]
string now's the time; delimiter
splitVariantA returns 3 elements: ["now's" "the" "time" ]
splitVariantB returns 2 elements: ["" "now's the time" ]
splitVariantC returns 5 elements: ["time" "the" "" "now's" "" ]
splitVariantD returns 3 elements: ["now's" "the" "time" ]
string 1,2,,3,4,,; delimiter ,
splitVariantA returns 4 elements: ["1" "2" "3" "4" ]
splitVariantB returns 2 elements: ["1" "2,,3,4,," ]
splitVariantC returns 7 elements: ["" "" "4" "3" "" "2" "" ]
splitVariantD returns 7 elements: ["1" "2" "" "3" "4" "" "" ]
Timing
Splitting the string 1,2,,3,4,, with the pattern , for 10000 times on my machine gives these timings:
splitVariantA() : 406 ms
splitVariantB() : 46 ms
splitVariantC() : 749 ms
splitVariantD() : 1077 ms
Unfortunately, my implementation D is the slowest. Surprisingly, the regular expressions implementation C is pretty fast.
Source code
// niol, modified
Array splitVariantA(string splitter, string str){
Array tokens = create(1, 1);
Buffer buf = create;
int str_index;
buf = "";
for(str_index = 0; str_index < length(str); str_index++){
if( str[str_index:str_index] == splitter ){
array_push_str(tokens, stringOf(buf));
buf = "";
}
else
buf += str[str_index:str_index];
}
array_push_str(tokens, stringOf(buf));
delete buf;
return tokens;
}
// PJT, modified
Skip splitVariantB(string s, string delimiter) {
int offset
int len
Skip skp = create
if ( findPlainText(s, delimiter, offset, len, false)) {
put(skp, 0, s[0 : offset -1])
put(skp, 1, s[offset +1 :])
}
return skp
}
// Brett, modified
Skip splitVariantC (string s, string delim) {
Skip skp = create
int i = 0
Regexp split = regexp "^(.*)" delim "(.*)$"
while (split s) {
string temp_s = s[match 1]
put(skp, i++, s[match 2])
s = temp_s
}
put(skp, i++, s[match 2])
return skp
}
Skip splitVariantD(string str, string pattern) {
if (null(pattern) || 0 == length(pattern))
pattern = " ";
if (pattern == " ")
str = stringStrip(stringSqueeze(str, ' '));
Skip result = create;
int i = 0; // index for searching in str
int j = 0; // index counter for result array
bool found = true;
while (found) {
// find pattern
int pos = 0;
int len = 0;
found = findPlainText(str[i:], pattern, pos, len, true);
if (found) {
// insert into result
put(result, j++, str[i:i+pos-1]);
i += pos + len;
}
}
// append the rest after last found pattern
put(result, j, str[i:]);
return result;
}
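For readers who want to try the Variant D semantics outside of DOORS, here is a JavaScript sketch of the same loop. The name splitPlain is made up for this illustration, and indexOf stands in for findPlainText; the space handling mirrors the squeeze-and-strip step above.

```javascript
// Split `str` at every occurrence of the plain-text `pattern`,
// keeping empty fields, in the spirit of Variant D.
function splitPlain(str, pattern) {
  if (pattern === undefined || pattern === null || pattern.length === 0) {
    pattern = " ";
  }
  if (pattern === " ") {
    // Collapse runs of spaces, then drop a leading/trailing space.
    str = str.replace(/ +/g, " ").replace(/^ | $/g, "");
  }
  const result = [];
  let start = 0; // index where the current field begins
  let pos = str.indexOf(pattern, start);
  while (pos !== -1) {
    result.push(str.slice(start, pos));
    start = pos + pattern.length; // skip over the delimiter
    pos = str.indexOf(pattern, start);
  }
  // Append the rest after the last delimiter found.
  result.push(str.slice(start));
  return result;
}

console.log(splitPlain("mellow yellow", "ello"));
console.log(splitPlain("now's the time", " "));
console.log(splitPlain("1,2,,3,4,,", ","));
```

The three calls use the same inputs as the comparison above, so the printed arrays can be checked against the splitVariantD rows.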
Quick join and split I could come up with. Seems to work okay.
int array_size(Array a){
int size = 0;
while( !null(get(a, size, 0) ) )
size++;
return size;
}
void array_push_str(Array a, string str){
int array_index = array_size(a);
put(a, str, array_index, 0);
}
string array_get_str(Array a, int index){
return (string get(a, index, 0));
}
string str_join(string joiner, Array str_array){
Buffer joined = create;
int array_index = 0;
joined += "";
for(array_index = 0; array_index < array_size(str_array); array_index++){
joined += array_get_str(str_array, array_index);
if( array_index + 1 < array_size(str_array) )
joined += joiner;
}
return stringOf(joined)
}
Array str_split(string splitter, string str){
Array tokens = create(1, 1);
Buffer buf = create;
int str_index;
buf = "";
for(str_index = 0; str_index < length(str); str_index++){
if( str[str_index:str_index] == splitter ){
array_push_str(tokens, stringOf(buf));
buf = "";
}else{
buf += str[str_index:str_index];
}
}
array_push_str(tokens, stringOf(buf));
delete buf;
return tokens;
}
If you only split the string once this is how I would do it:
string s = "We prefer questions that can be answered; not just discussed"
string sub = ";"
int offset
int len
if ( findPlainText(s, sub, offset, len, false)) {
/* The reason why I subtract one and add one is to remove the delimiter from the output.
The first print outputs the prefix and the second outputs the suffix. */
print s[0 : offset -1]
print s[offset +1 :]
} else {
// no delimiter found
print "Failed to match"
}
You could also use regular expressions; refer to the DXL reference manual. Regular expressions are the better choice if you want to split the string at multiple delimiters, such as str = "this ; is an;example"
ACTUALLY WORKS:
This solution will split as many times as needed, or none, if the delimiter doesn't exist in the string.
This is what I have used instead of a traditional "split" command.
It actually skips the creation of an array, and just loops through each string that would be in the array and calls "someFunction" on each of those strings.
string s = "We prefer questions that can be answered; not just discussed"
// for this example, ";" is used as the delimiter
Regexp split = regexp "^(.*);(.*)$"
// while a ";" exists in s
while (split s) {
// save the text before the last ";"
string temp_s = s[match 1]
// call someFunction on the text after the last ";"
someFunction(s[match 2])
// remove the text after the last ";" (including ";")
s = temp_s
}
// call someFunction again for the last (or only) string
someFunction(s)
Sorry for necroing an old post; I just didn't find the other answers useful.
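The same greedy-regex trick can be sketched in JavaScript to show what order the pieces arrive in. The function name forEachSegmentFromEnd and the callback parameter are hypothetical; the delimiter is fixed to ";" as in the DXL example.

```javascript
// Peel segments off the END of the string one at a time.
// The first (.*) is greedy, so the ";" that matches is always the last one.
function forEachSegmentFromEnd(s, callback) {
  const split = /^(.*);(.*)$/s;
  let rest = s;
  let m;
  while ((m = split.exec(rest)) !== null) {
    callback(m[2]); // text after the last ";"
    rest = m[1];    // keep only the text before it
  }
  callback(rest); // the first (or only) segment
}

const seen = [];
forEachSegmentFromEnd("We prefer questions that can be answered; not just discussed", (part) => seen.push(part));
console.log(seen);
```

Note that the callback receives the segments from last to first, which matters if the order of processing is important.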
Perhaps someone will find this fused solution handy as well. It splits a string into a Skip list based on a delimiter, which can be more than one character long.
Skip splitString(string s1, string delimit)
{
int offset, len
Skip splited = create
while(findPlainText(s1, delimit, offset, len, false))
{
put(splited, s1[0:offset-1], s1[0:offset-1])
s1 = s1[offset+length(delimit):length(s1)-1]
}
if(length(s1)>0)
{
put (splited, s1, s1)
}
return splited
}
I tried this out and it worked for me...
string s = "We prefer questions that can be answered,not just discussed,hiyas"
string sub = ","
int offset
int len
string s1=s
while(length(s1)>0){
if ( findPlainText(s1, sub, offset, len, false)) {
print s1[0 : offset -1]"\n"
s1= s1[offset+1:length(s1)]
}
else
{
print s1
s1=""
}
}
Here is a better implementation. It splits the string in nested loops: first by searching for a keyword, then by the delimiter within each matching line.
pragma runLim, 10000
string s = "We prefer questions that can be answered,not just discussed,hiyas;
Next Line,Var1,Nemesis;
Next Line,Var2,Nemesis1;
Next Line,Var3,Nemesis2;
New,Var4,Nemesis3;
Next Line,Var5,Nemesis4;
New,Var5,Nemesis5;"
string sub = ","
int offset
int len
string searchkey=null
string curr=s
string nxt=s
string searchline=null
string Modulename=""
string Attributename=""
string Attributevalue=""
while(findPlainText(curr,"Next Line", offset,len,false))
{
int intlen=offset
searchkey=curr[offset:length(curr)]
if(findPlainText(searchkey,"Next Line",offset,len,false))
{
curr=searchkey[offset+1:length(searchkey)]
}
if(findPlainText(searchkey,";",offset,len,false))
{
searchline=searchkey[0:offset]
}
int counter=0
while(length(searchline)>0)
{
if (findPlainText(searchline, sub, offset, len, false))
{
if(counter==0)
{
Modulename=searchline[0 : offset -1]
counter++
}
else if(counter==1)
{
Attributename=searchline[0 : offset -1]
counter++
}
searchline= searchline[offset+1:length(searchline)]
}
else
{
if(counter==2)
{
Attributevalue=searchline[0:length(searchline)-2]
counter++
}
searchline=""
}
}
print "Modulename="Modulename " Attributename=" Attributename " Attributevalue= "Attributevalue "\n"
}
