How to join string by using readline() - readline

I couldn't find a solution and need help. I am reading a very large vCard text file with readline() in Python 3.
Now I have the problem that some long text lines are wrapped over multiple lines, like:
item1.ADR;type=HOME;type=pref:;;27-28\, Yakyu-cho 5-chome\nTokyo;Higashimat
suyama-shi;Saitama;355-8603;Japan
I search the content with:
if re.search(r'.ADR;', elm):
    adr_list.append(elm)
My list looks like this:
['', '', '27-28\\, Yakyu-cho 5-chome\\nTokyo', 'Higashimat\n']
with line.strip():
['', '', '27-28\\, Yakyu-cho 5-chome\\nTokyo', 'Higashimat']
The desired output should look like this:
['', '', '27-28', 'Yakyu-cho 5-chome','Tokyo', 'Higashimatsuyama-shi','Saitama','355-8603','Japan']
I also tried rstrip(), but I can't get the right result.
Is there an easy way with readline() to detect the line break and concatenate the following related line(s)?
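One possible approach, as a minimal sketch: keep a buffer for the current logical line and append every continuation line to it before searching for ADR. This assumes the file follows the usual vCard folding convention, where a continuation line starts with a single space or tab (the sample above does not show that leading whitespace, so check your file first). The helper names unfold_lines and split_adr are made up for this example, and the sample is fed through io.StringIO instead of a real file.

```python
import io
import re

def unfold_lines(handle):
    """Yield logical vCard lines, joining folded continuation lines."""
    current = None
    while True:
        raw = handle.readline()
        if not raw:                      # readline() returns '' at end of file
            break
        line = raw.rstrip("\r\n")
        # Assumption: a folded continuation line starts with a space or tab.
        if current is not None and line[:1] in (" ", "\t"):
            current += line[1:]          # drop the fold marker and append
        else:
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current

def split_adr(line):
    """Split the value of an ADR line on ';' and on the escaped '\\,' and '\\n'."""
    value = line.split(":", 1)[1]
    return [part.strip() for part in re.split(r";|\\,|\\n", value)]

# Sample data; the leading space on the second line is the assumed fold marker.
sample = io.StringIO(
    "item1.ADR;type=HOME;type=pref:;;27-28\\, Yakyu-cho 5-chome\\nTokyo;Higashimat\n"
    " suyama-shi;Saitama;355-8603;Japan\n"
    "item1.X-ABLabel:home\n"
)

adr_list = [split_adr(line) for line in unfold_lines(sample)
            if re.search(r"\.ADR;", line)]
print(adr_list)
```

With a real file, pass the open file object instead of the StringIO sample. If your file folds lines without any leading whitespace, the continuation test has to be replaced by some other rule, for example "the line contains no ':' before the first ';'".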

Related

Ruby: How can I read a CSV file that contains two headers in Ruby?

I have a ".CSV" file that I'm trying to parse using CSV in ruby. The file has two rows of headers though and I've never encountered this before and don't know how to handle it. Below is an example of the headers and rows.
Row 1
"Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name","Rushing","","","","","Passing","","","","","","Total Off.","","Receiving","","","Pass Int","","","Fumble Ret","","","Punting","","Punt Ret","","","KO Ret","","","Total TD","Off xpts","","","","Def xpts","","","","FG","","Saf","Points"
Row 2
"","","","","","","Rushes","Gain","Loss","Net","TD","Att","Cmp","Int","Yards","TD","Conv","Plays","Yards","No.","Yards","TD","No.","Yards","TD","No.","Yards","TD","No.","Yards","No.","Yards","TD","No.","Yards","TD","","Kicks Att","Kicks Made","R/P Att","R/P Made","Kicks Att","Kicks Made","Int/Fum Att","Int/Fum Made","Att","Made"
Row 3
"721","AirForce","09/01/12","19","BASKA","DAVID","","","","","","","","","","","","0","0","","","","","","","","","","2","85","","","","","","","","","","","","","","","","","","","0"
There are no line breaks in the example above; I just added them so it would be easier to read. Does CSV have methods available to handle this structure, or will I have to write my own methods to handle this? Thanks!
It looks like your CSV file was produced from an Excel spreadsheet that has columns grouped like this:
... | Rushing | Passing | ...
... |Rushes|Gain|Loss|Net|TD|Att|Cmp|Int|Yards|TD|Conv| ...
(Not sure if I restored the groups properly.)
There are no standard tools for working with this kind of CSV file, AFAIK. You have to do the job manually.
Read the first line, treat it as first header line.
Read the second line, treat it as second header line.
Read the third line, treat it as first data line.
...
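The manual steps above can be sketched as follows. This is written in Python purely for illustration; the same logic carries over to Ruby's CSV. It assumes that an empty cell in the first header row means "same group as the column to the left", which matches the Rushing/Passing layout guessed above. The helper name merge_headers and the shortened sample rows (including the values 3 and 12) are made up for the example.

```python
import csv
import io
from itertools import zip_longest

def merge_headers(group_row, detail_row):
    """Combine a group header row and a detail header row into one list of names."""
    names = []
    group = ""
    for top, sub in zip_longest(group_row, detail_row, fillvalue=""):
        if top:
            group = top                  # a non-empty cell starts a new group
        # Columns without a second-row label keep their first-row name.
        names.append(f"{group} {sub}".strip() if sub else top)
    return names

# Shortened, made-up sample with the same two-header-row structure.
sample = io.StringIO(
    '"Institution ID","Institution","Rushing","","Passing",""\n'
    '"","","Rushes","Gain","Att","Cmp"\n'
    '"721","Air Force","3","12","",""\n'
)

rows = list(csv.reader(sample))
headers = merge_headers(rows[0], rows[1])        # first and second header lines
records = [dict(zip(headers, row)) for row in rows[2:]]  # data starts at line 3
print(headers)
print(records[0]["Rushing Gain"])
```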
I'd recommend using the smarter_csv gem, and manually provide the correct headers:
require 'smarter_csv'
options = {:user_provided_headers => ["Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name", ... provide all headers here ... ],
:headers_in_file => false}
data = SmarterCSV.process(filename, options)
data.shift # to ignore the first header line
data.shift # to ignore the second header line
# data now contains an array of hashes with your data
Please check the GitHub page for the options and examples.
https://github.com/tilo/smarter_csv
One option you should use is :user_provided_headers; then simply specify the headers you want in an array. This way you can work around cases like this.
You will have to call data.shift twice to drop the two header lines that were read from the file.
You'll have to write your own logic. CSV is really just rows and columns, and by itself has no inherent idea of what each column or row really is, it's just raw data. Thus, CSV has no concept or awareness that it has two header rows, that's a human thing, so you'll need to build your own heuristics.
Given that your data rows look like:
"721","Air Force","09/01/12",
When you start parsing your data, if the first column represents an integer, you can convert it to an int; if it's > 0, then you know you're dealing with a valid data row and not a header.
Read the CSV in and skip the first two lines on output:
arr_of_arrs = CSV.read("path/to/file.csv")
arr_of_arrs[2..arr_of_arrs.length].each do |x|
  # operation here
end
It's really easy to do this with CSV. Just watch to see what the current line number is that's been read, and loop until you've read the headers:
require 'csv'
CSV.foreach('test.csv') do |row|
  next unless $. > 2
  puts "'" + row.join("', '") + "'"
end
When run this is what is output:
'721', 'Air Force', '09/01/12', '19', 'BASKA', 'DAVID', '', '', '', '', '', '', '', '', '', '', '', '0', '0', '', '', '', '', '', '', '', '', '', '2', '85', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '0'
$. is the line number of the last line read from the file that's opened. So the loop skips rows until the two header lines have been read.

How do I scan url for a specific string with spaces and special characters?

I'm using String#scan on my request URL in order to get the name of the user's currently selected category, but I've been having difficulty dealing with spaces and special characters.
request.url.scan(/\?category=\w+/).to_s.gsub('?category=', '')
URL examples followed by result
http://localhost:3000/search?category=dog&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog.com&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93 => ["dog"]
I'm trying to get ["dog"], ["dog.com"], and ["dog cat"], but am currently stuck. Any ideas?
Note: Considering removing spaces from categories and replacing them with dashes as multiple spaces could be problematic, but if it's possible to create one function to rule them all, that would be awesome.
This is Rails, is there a reason you're not just using params[:category]?
If you are trying to extract params, you could use parse_query:
uri = "http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93"
result = Rack::Utils.parse_query(URI(uri).query) #=> {"category"=>"dog cat", "search"=>"", "utf8"=>"\xE2\x9C\x93"}
result["category"] #=> "dog cat"

Generate a link_to on the fly if a URL is found inside the contents of a db text field?

I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do something like this without figuring out how to parse the text myself and add link tags around each match (from ['http:', 'https:', 'ftp:'] to the first space after it)?
Thank You!
Ruby 1.8.7, Rails 2.3.5
Make a helper:
def make_urls(text)
  urls = %r{(?:https?|ftp|mailto)://\S+}i
  html_text = text.gsub urls, '<a href="\0">\0</a>'
  html_text
end
In the view, just call this function and you will get the expected output.
For example:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"
There are many ways to accomplish your goal. One way would be to use Regex. If you have never heard of regex, this wikipedia entry should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you use the array returned above to substitute hyperlinks for the text links.
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
  content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
I hope this helps you.

How do you include hashtags within Twitter share link text?

I'm writing a site with a custom tweet button that uses the www.twitter.com/share function, however the problem I am having is including hash '#' characters within the tweet text.
For example:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+#branstonpickel+right+now
The tweet text comes out as 'I am eating' and omits the hash and everything after.
I had a quick look on the Twitter forums and learnt the hash '#' character cannot be part of the share url. On https://dev.twitter.com/discussions/512#comment-877 it was said that:
Hashes are special characters in the URL (they identify document fragments) so they, and anything following, does not get sent the server.
and
you need to URLEncode it, so use %23
When I tried the 2nd point in my test link:
www.twitter.com/share?url=www.example.com&text=I+am+eating+%23branstonpickel+right+now
The tweet text came out as 'I am eating %23branstonpickel right now' literally including %23 instead of converting it to a hash.
Sorry for the waffely question, but does anyone know what it is I'm doing wrong?
Any feedback would be greatly appreciated :)
It looks like this is the basic setup:
https://twitter.com/intent/tweet?
url=<url to tweet>
text=<text to tweet>
hashtags=<comma separated list of hashtags, with no # on them>
This would pre-build a tweet of: <text> <url> <hashtags>
The above example would be:
https://twitter.com/intent/tweet?url=http://www.example.com&text=I+am+eating+branston+pickel+right+now&hashtags=branstonpickel,pickles
There used to be a bug with the hashtags parameter... it only showed the first n-1 hashtags. Currently this is fixed.
You can use %23 instead of the hash (#) in the URL, e.g.:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+%23branston+%23pickel+right+now
I may be wrong, but I think the hashtag has to be passed as a separate variable that will appear at the end of your tweet, i.e.:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+branston+pickel+right+now&hashtag=branstonpickel
will result in "I am eating branston pickel right now #branstonpickel"
On a separate note, I think pickel should be pickle!
Cheers
Toby
Use encodeURIComponent to encode the URL.
If you're using PHP, you can use the following:
<?php echo 'http://www.twitter.com/share?' . http_build_query(array(
'url' => 'http://www.example.com',
'text' => 'I am eating #branstonpickel right now'
)); ?>
This will do all the URL encoding for you, and it's easy to read.
For more information on the http_build_query, see the PHP manual:
http://us2.php.net/http_build_query
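For comparison, the same idea can be sketched in Python with urllib.parse.urlencode, which plays a role similar to http_build_query here: it percent-encodes each parameter value, so "#" becomes %23. The function name tweet_intent_url is made up for this example, and the url, text and hashtags parameters are taken from the answers in this thread rather than checked against Twitter's current documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def tweet_intent_url(text, url, hashtags=()):
    """Build a share link; urlencode percent-encodes '#', spaces, ':' and '/'."""
    params = {"text": text, "url": url}
    if hashtags:
        # Assumption from the thread: comma-separated list, without '#'.
        params["hashtags"] = ",".join(hashtags)
    return "https://twitter.com/intent/tweet?" + urlencode(params)

link = tweet_intent_url("I am eating #branstonpickel right now",
                        "http://www.example.com")
print(link)

# Decoding the query string again gives back the original text, '#' included.
decoded = parse_qs(urlsplit(link).query)
print(decoded["text"][0])
```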
For a URL with line breaks, #, @ and special Unicode characters in it, the following works:
var lineJump = encodeURI(String.fromCharCode(10)),
hash = "%23", arobase="%40",
tweetText = 'https://twitter.com/intent/tweet?text=Le signe chinois '+hans+' '+item.pinyin+': '+item.definition.replace(";",",")+'.'
+lineJump+'Merci '+arobase+'Inalco_Officiel '+arobase+'CRIparis ❤️🇨🇳 '
+lineJump+hash+'Chinois '+hash+'MOOC'
+lineJump+'https://hanzi.cri-paris.org/',
tweetTxtUrlEncoded = tweetText+ "" +encodeURIComponent('#'+lesson+encodeURIComponent(hans));
Use urlencode:
https://twitter.com/intent/tweet?text=<?= urlencode("I am eating #branstonpickel right now"); ?>
You can just use this code and modify it.
%20 means space
%23 means hash (#)
In JS you can easily encode the special characters using encodeURIComponent.
(Warning: don't use encodeURI, as "@" and "#" are not escaped.)
Here's an example with a mention and a hashtag:
const text = "Hello #world ! Go follow @StackOverflow";
const tweetUrl = `https://twitter.com/intent/tweet?text=${ encodeURIComponent(text) }`;

Remove empty paragraphs

I'm importing an RSS feed which has a series of empty paragraphs "<p> </p>".
I am using gsub however it's not stripping the elements from the document:
document.gsub(/<p>\s*<\/p>/,"") or gsub(/<p> <\/p>/,"")
Is there an alternative method or a mistake in the above?
The below appears to work?
gsub(/<p>.<\/p>/,"")
A correct regex, as in this example:
>> document = "<p>\n\n\n \n</p>aaa<p> </p>bbb"
=> "<p>\n\n\n \n</p>aaa<p> </p>bbb"
>> document.gsub(/<p>[\s$]*<\/p>/, '')
=> "aaabbb"
If the paragraph elements in your RSS feed use id and class attributes, try this:
gsub(/\<p(\s((class)|(id))=[\'\"][A-z0-9\s]+[\'\"]\s*)*\>\s*\<\/p\>/,"")

Resources