Convert List to DataFrame after parsing from web

I am new to Python. I want to recreate the table on the CME website below, but I am not able to convert the lists I have created into a DataFrame. Any help is much appreciated. Thanks in advance!
import urllib2  # Python 2 module; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup

url = "http://www.cmegroup.com/trading/energy/crude-oil/light-sweet-crude_product_calendar_futures.html"
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }
req = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(req)
soup = BeautifulSoup(response)
# collect the text of the first 8 header cells
header = soup.findAll('th',limit = 8)
column_header = []
for j in header:
    column_header.append(j.getText())
# skip the header rows, then collect the text of every cell
data_rows = soup.findAll('tr')[2:]
dates = []
for i in range(len(data_rows)):
    for td in data_rows[i].findAll('td'):
        dates.append(td.getText())

from bs4 import BeautifulSoup
import requests
r = requests.get("http://www.cmegroup.com/trading/energy/crude-oil/light-sweet-crude_product_calendar_futures.html")
soup = BeautifulSoup(r.content, "lxml")
headers = [th.text for th in soup.thead.find_all('th')] # use thead to narrow the scope
print(headers)
for tr in soup.tbody.find_all('tr'):
    row = [i.get_text(strip=True) for i in tr(['th', 'td'])]
    print(row)
out:
['Contract Month', 'Product Code', 'First TradeLast Trade', 'Settlement', 'First HoldingLast Holding', 'First PositionLast Position', 'First NoticeLast Notice', 'First DeliveryLast Delivery']
['Feb 2017', 'CLG17', '21 Nov 201120 Jan 2017', '20 Jan 2017', '--', '23 Jan 201723 Jan 2017', '24 Jan 201724 Jan 2017', '01 Feb 201728 Feb 2017']
['Mar 2017', 'CLH17', '21 Nov 201121 Feb 2017', '21 Feb 2017', '--', '22 Feb 201722 Feb 2017', '23 Feb 201723 Feb 2017', '01 Mar 201731 Mar 2017']
['Apr 2017', 'CLJ17', '21 Nov 201121 Mar 2017', '21 Mar 2017', '--', '22 Mar 201722 Mar 2017', '23 Mar 201723 Mar 2017', '01 Apr 201730 Apr 2017']
['May 2017', 'CLK17', '21 Nov 201120 Apr 2017', '20 Apr 2017', '--', '21 Apr 201721 Apr 2017', '24 Apr 201724 Apr 2017', '01 May 201731 May 2017']
Use .tbody or .thead to narrow down the scope instead of using limit.
Use a list comprehension to avoid repeated calls to append.


Group by week in the first column of an array ruby

I have the following two-dimensional array, where the first component of each row is an ActiveSupport::TimeWithZone and the second component is a string:
[[Sun, 16 Jul 2017 14:41:56 -03 -03:00, "open"],
[Sun, 16 Jul 2017 14:41:56 -03 -03:00, "closed"],
[Sun, 16 Jul 2017 14:41:56 -03 -03:00, "closed"],
[Mon, 10 Jul 2017 00:00:00 -03 -03:00, "open"],
[Sun, 16 Jul 2017 14:45:31 -03 -03:00, "closed"],
[Sun, 16 Jul 2017 14:44:41 -03 -03:00, "open"],
[Sun, 16 Jul 2017 14:44:39 -03 -03:00, "closed"],
[Sun, 16 Jul 2017 14:44:13 -03 -03:00, "open"],
[Mon, 10 Jul 2017 00:00:00 -03 -03:00, "closed"],
[Fri, 14 Jul 2017 00:00:00 -03 -03:00, "open"],
[Mon, 17 Jul 2017 00:00:00 -03 -03:00, "open"]]
I need to convert that array efficiently into:
{["09-Jul", "open"]=>2, ["16-Jul", "open"]=>1, ["09-Jul", "closed"]=>0, ["16-Jul", "closed"]=>1}
That is, I need to convert the first component into the format %b-%d, group by week and "status", then count the grouped values and present the result as a hash, as in the second example.
input.group_by { |d, v| [d.strftime('%b-%d'), v] }
     .map { |k, v| [k, v.count] }.to_h
Also, for Ruby 2.4+ it could be simplified (credits go to @MarkThomas) to:
input.group_by { |d, v| [d.strftime('%b-%d'), v] }
     .transform_values(&:count)
You could create the desired hash with the form of Hash::new that takes an argument equal to the default value of the hash, which here we want to be zero. What that means is that if a hash h, defined h = Hash.new(0), does not have a key k, then h[k] returns the default value (here 0), without modifying the hash h.
input.each_with_object(Hash.new(0)) { |(d,v),h| h[[d.strftime('%b-%d'), v]] += 1 }
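The snippets above key each count by the calendar day, while the question also asks for grouping by week. A minimal sketch of one way to do that in plain Ruby follows, assuming weeks start on Sunday and using the '%d-%b' label shown in the desired output. The sample data, the week_start helper and the weekly_counts name are illustrative only. Plain Date objects stand in for the TimeWithZone values; with real time values, calling to_date first (or using ActiveSupport's beginning_of_week) would probably be needed.

```ruby
require 'date'

# Illustrative sample input: [date, status] pairs. Plain Date objects stand in
# for the ActiveSupport::TimeWithZone values from the question.
input = [
  [Date.new(2017, 7, 16), "open"],
  [Date.new(2017, 7, 16), "closed"],
  [Date.new(2017, 7, 10), "open"],
  [Date.new(2017, 7, 14), "open"],
  [Date.new(2017, 7, 17), "open"]
]

# Hypothetical helper: map a date to the Sunday that starts its week.
# Date#wday is 0 for Sunday, so subtracting it lands on the previous Sunday.
def week_start(date)
  date - date.wday
end

# Count occurrences per [week label, status] key; missing keys default to 0.
weekly_counts = input.each_with_object(Hash.new(0)) do |(date, status), counts|
  counts[[week_start(date).strftime('%d-%b'), status]] += 1
end

p weekly_counts
```

Because the hash is built with Hash.new(0), combinations that never occur are simply absent from the result rather than stored with a count of zero.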

Ruby on Rails array group sort by value

I've got an array of arrays, each containing a key and a timestamp.
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 22:00:51 CEST +02:00],
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 22:00:32 CEST +02:00],
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 21:58:33 CEST +02:00],
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 21:58:01 CEST +02:00],
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 21:58:51 CEST +02:00],
["3wyadsrrdxtgieyxx_lgka", Sat, 13 May 2017 01:09:01 CEST +02:00],
["y-5he42vlloggjb_whm8jw", Sat, 22 Apr 2017 22:48:31 CEST +02:00],
["oaxej30u9we17onlug4orw", Sun, 23 Apr 2017 01:46:48 CEST +02:00],
["oaxej30u9we17onlug4orw", Sun, 23 Apr 2017 02:06:56 CEST +02:00],
["rqjwg1ka43mvri0dmrdxvg", Sun, 23 Apr 2017 17:23:34 CEST +02:00],
["ok8nq6tg-kor9jglsuhoyw", Tue, 25 Apr 2017 13:02:16 CEST +02:00],
["riwfm0m-0rmbb6e9kyug2g", Sat, 06 May 2017 06:12:27 CEST +02:00],
["riwfm0m-0rmbb6e9kyug2g", Sat, 06 May 2017 06:17:01 CEST +02:00],
["riwfm0m-0rmbb6e9kyug2g", Sat, 06 May 2017 06:18:04 CEST +02:00],
["gbqfn3_d_tritqoey5khjw", Sat, 06 May 2017 14:14:55 CEST +02:00],
["j___x1oap-veh0u1fo_oua", Sun, 07 May 2017 14:22:37 CEST +02:00],
...
I retrieved this list through ActiveRecord:
MyModel.all.pluck(:token, :created_at)
The model contains some unique tokens and some duplicates.
The duplicates are the interesting ones.
I want to group the timestamps by key and find the first and the last timestamp for each key.
So I grouped the array as follows:
grp = arr.group_by { |key, ts| key}
Now I receive a list like this:
"vwfv8n5obwqmaw8r9fj-yq"=>[
["vwfv8n5obwqmaw8r9fj-yq", Thu, 11 May 2017 10:24:42 CEST +02:00]
],
"kacec6ybetpjdzlfgnnxya"=> [
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 22:00:31 CEST +02:00],
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 22:01:43 CEST +02:00],
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 21:58:17 CEST +02:00],
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 21:59:05 CEST +02:00],
["kacec6ybetpjdzlfgnnxya", Fri, 12 May 2017 21:59:59 CEST +02:00]
],
...
Is it possible to sort the dates to get the first and the last date easily?
Am I overcomplicating this? I think there should be an easier way to handle the raw data.
To get a hash with the token as the key and the timestamps as values:
# this gives the same MIN and MAX if there is only one created_at in the group
rows = MyModel.group(:token)
              .pluck("token, MIN(created_at), MAX(created_at)")
# loop through rows and create a hash
rows.each_with_object({}) do |(token, *t), hash|
  hash[token] = t.uniq # removes dupes
end
{
"rqjwg1ka43mvri0dmrdxvg"=>[2017-04-23 15:23:34 UTC],
"riwfm0m-0rmbb6e9kyug2g"=>[2017-05-06 04:12:27 UTC, 2017-05-06 04:18:04 UTC]
# ...
}
If you are simply looking for the records which have duplicates you can just use a WHERE clause that counts the records:
MyModel.where("(SELECT COUNT(*) FROM things t WHERE t.token = things.token) > 1")
You could do this:
# you already have this bit
grp = arr.group_by { |key, ts| key}
# get the minmax values for each group
grp.map { |k, values_array| { k => values_array.minmax } }.reduce Hash.new, :merge
This should yield something that looks like:
{
"vwfv8n5obwqmaw8r9fj-yq"=>[
[Thu, 11 May 2017 10:24:42 CEST +02:00, Thu, 11 May 2017 10:24:42 CEST +02:00]
],
"kacec6ybetpjdzlfgnnxya"=> [
[Fri, 12 May 2017 21:58:17 CEST +02:00, Fri, 12 May 2017 22:01:43 CEST +02:00]
],
...
}
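Each group produced by group_by still holds [token, timestamp] pairs, so calling minmax on a group returns two pairs rather than two bare timestamps. A minimal sketch of one way to get only the first and last timestamp per token in plain Ruby (2.4+ for transform_values) follows. The sample rows and the first_and_last name are illustrative, and Time objects stand in for the values returned by pluck.

```ruby
# Illustrative sample rows: [token, timestamp] pairs, as returned by pluck.
arr = [
  ["kacec6ybetpjdzlfgnnxya", Time.new(2017, 5, 12, 22, 0, 51, "+02:00")],
  ["kacec6ybetpjdzlfgnnxya", Time.new(2017, 5, 12, 21, 58, 1, "+02:00")],
  ["kacec6ybetpjdzlfgnnxya", Time.new(2017, 5, 12, 21, 58, 33, "+02:00")],
  ["3wyadsrrdxtgieyxx_lgka", Time.new(2017, 5, 13, 1, 9, 1, "+02:00")]
]

# Group the pairs by token, then keep only the timestamps of each group
# and reduce them to [earliest, latest].
first_and_last = arr.group_by(&:first)
                    .transform_values { |pairs| pairs.map(&:last).minmax }

p first_and_last
```

A token with a single row yields the same timestamp twice, which makes duplicates easy to spot by comparing the two values.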
Try something like this:
MyModel.order(:created_at).pluck(:token, :created_at).group_by { |key, ts| key }.flat_map{ |k, v| { k => [v.first, v.last] } }

How to detect if dates are consecutive in Rails?

I'm looking to check for consecutive dates and display them differently if they are consecutive.
I'm working with Garage Sales that have multiple dates per sale. I'd like to then cycle through each date, and group any consecutive dates to display as a group: Ex: Apr 28 - Apr 30
I also need to account for non-consecutive dates:
Ex: Apr 15, Apr 28 - 30
Additionally, weekly dates need to be recognized as non-consecutive (so basically avoid step checks):
Ex: Apr 16, Apr 23, Apr 30
So far, I'm taking each date that hasn't passed & ordering them properly.
garage_sale_dates.where("date > ?",Time.now).order(:date)
Thanks for any help! Let me know if any other info is needed.
You can use the slice_before method of Enumerable
dates = [Date.yesterday, Date.today, Date.tomorrow, Date.parse('2016-05-01'), Date.parse('2016-05-02'), Date.parse('2016-05-05')]
# => [Wed, 27 Apr 2016, Thu, 28 Apr 2016, Fri, 29 Apr 2016, Sun, 01 May 2016, Mon, 02 May 2016, Thu, 05 May 2016]
prev = dates.first
dates.slice_before { |d| prev, prev2 = d, prev; prev2 + 1.day != d }.to_a
# => [[Wed, 27 Apr 2016, Thu, 28 Apr 2016, Fri, 29 Apr 2016], [Sun, 01 May 2016, Mon, 02 May 2016], [Thu, 05 May 2016]]
Then you can simply join the 2-or-more-element arrays from the result with "-", and leave the single element arrays intact:
prev = dates.first
dates.slice_before { |d| prev, prev2 = d, prev; prev2 + 1.day != d }.
map{|d| d.size > 1 ? "#{d.first.to_s} - #{d.last.to_s}" : d.first.to_s }
# => ["2016-04-27 - 2016-04-29", "2016-05-01 - 2016-05-02", "2016-05-05"]
There is even a commented example in the docs that is technically equivalent to yours (but deals with integers, not dates).
A simple script would do the trick:
def date_list(dates)
  result = []
  dates.each do |date|
    # start a new group unless this date directly follows the previous one
    if result.empty? || (date - result.last.last).to_i != 1
      result << [date]
    else
      result.last << date
    end
  end
  result
end
# make sure dates is an array of dates
dates = [Thu, 28 Apr 2016, Fri, 29 Apr 2016, Sun, 01 May 2016, Mon, 02 May 2016, Tue, 03 May 2016, Thu, 05 May 2016, Fri, 06 May 2016, Sat, 07 May 2016, Sun, 08 May 2016]
#this would give you an array of date ranges that you wanted
date_list(dates)
=> [
[Thu, 28 Apr 2016, Fri, 29 Apr 2016],
[Sun, 01 May 2016, Mon, 02 May 2016, Tue, 03 May 2016],
[Thu, 05 May 2016, Fri, 06 May 2016, Sat, 07 May 2016, Sun, 08 May 2016]
]
I found a simpler solution using the chunk_while method (available since Ruby 2.3).
Based on BoraMa's answer:
dates = [Date.yesterday, Date.today, Date.tomorrow, Date.parse('2020-05-01'), Date.parse('2020-05-02'), Date.parse('2020-05-05')]
# => [Thu, 26 Mar 2020, Fri, 27 Mar 2020, Sat, 28 Mar 2020, Fri, 01 May 2020, Sat, 02 May 2020, Tue, 05 May 2020]
dates.chunk_while { |date_before, date_after| (date_after - date_before).to_i == 1 }.to_a
# => [[Thu, 26 Mar 2020, Fri, 27 Mar 2020, Sat, 28 Mar 2020], [Fri, 01 May 2020, Sat, 02 May 2020], [Tue, 05 May 2020]]
I think it is more readable and needs fewer steps to get the same result.
Using a Date you can do the subtraction, it would be something like:
date1 = 1.day.ago
date2 = 2.day.ago
(date1.to_date - date2.to_date).to_i
=> 1
Adding onto Matouš Borák's answer, if you want to change the date format such that it would match what was asked in the question i.e. Apr 15, Apr 28 - 30
dates = [Date.yesterday, Date.today, Date.tomorrow, Date.parse('2021-10-01'), Date.parse('2021-10-02'), Date.parse('2021-10-05')]
# => [Wed, 01 Sep 2021, Thu, 02 Sep 2021, Fri, 03 Sep 2021, Fri, 01 Oct 2021, Sat, 02 Oct 2021, Tue, 05 Oct 2021]
prev = dates.first
dates.slice_before { |d| prev, prev2 = d, prev; prev2 + 1.day != d }.
map{|d| d.size > 1 ? "#{d.first.strftime('%b %d')} - #{d.last.strftime('%d')}" : d.first.strftime('%b %d') }
# => ["Sep 01 - 03", "Oct 01 - 02", "Oct 05"]
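The grouping and formatting steps above can be combined into one helper. The following is a minimal sketch in plain Ruby (no ActiveSupport), assuming the input is an array of Date objects. The format_runs name and the sample dates are illustrative, and repeating the month name when a run crosses a month boundary is an assumption, since the question does not cover that case.

```ruby
require 'date'

# Hypothetical helper: collapse runs of consecutive dates into display strings.
def format_runs(dates)
  # chunk_while keeps two neighbours together only when they are one day apart.
  dates.sort.chunk_while { |a, b| b == a + 1 }.map do |run|
    first, last = run.first, run.last
    if run.size == 1
      first.strftime('%b %d')
    elsif first.month == last.month && first.year == last.year
      # Same month: show the month name once, e.g. "Apr 28 - 30".
      "#{first.strftime('%b %d')} - #{last.strftime('%d')}"
    else
      # Assumption: repeat the month when the run spans two months.
      "#{first.strftime('%b %d')} - #{last.strftime('%b %d')}"
    end
  end.join(', ')
end

sale_dates = [Date.new(2016, 4, 15), Date.new(2016, 4, 28),
              Date.new(2016, 4, 29), Date.new(2016, 4, 30)]
puts format_runs(sale_dates)
```

Weekly dates such as Apr 16, Apr 23 and Apr 30 stay separate, because only a gap of exactly one day extends a run.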

Is there any way to find dates for all wednesday in next 6 months [closed]

I have a use case where the user selects a date such as 21st May 2014 (a Wednesday). I then need to find the dates of all Wednesdays in the next 6 months.
Is there an easier way or an API for this in Rails?
2.1.1 :041 > date = Date.today + 1.day
Wed, 21 May 2014
2.1.1 :042 > array = []
[]
2.1.1 :043 > 24.times { array.push(date); date += 7.days; }
24
2.1.1 :044 > array
[
[ 0] Wed, 21 May 2014,
[ 1] Wed, 28 May 2014,
[ 2] Wed, 04 Jun 2014,
[ 3] Wed, 11 Jun 2014,
[ 4] Wed, 18 Jun 2014,
[ 5] Wed, 25 Jun 2014,
[ 6] Wed, 02 Jul 2014,
[ 7] Wed, 09 Jul 2014,
[ 8] Wed, 16 Jul 2014,
[ 9] Wed, 23 Jul 2014,
[10] Wed, 30 Jul 2014,
[11] Wed, 06 Aug 2014,
[12] Wed, 13 Aug 2014,
[13] Wed, 20 Aug 2014,
[14] Wed, 27 Aug 2014,
[15] Wed, 03 Sep 2014,
[16] Wed, 10 Sep 2014,
[17] Wed, 17 Sep 2014,
[18] Wed, 24 Sep 2014,
[19] Wed, 01 Oct 2014,
[20] Wed, 08 Oct 2014,
[21] Wed, 15 Oct 2014,
[22] Wed, 22 Oct 2014,
[23] Wed, 29 Oct 2014
]
require 'date'
today = Date.today
six_months = today.next_month(6)
(today..six_months).select {|date| date.wday == today.wday}
or if you actually just want all Wednesdays:
(today..six_months).select &:wednesday?
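When the start date already falls on the desired weekday, stepping by 7 days avoids scanning every day in the range. A minimal sketch using Date#step from the standard library follows. The fixed start date is taken from the question, and the variable names are illustrative.

```ruby
require 'date'

start = Date.new(2014, 5, 21)   # the Wednesday selected in the question
limit = start.next_month(6)     # six months later

# Date#step(limit, 7) yields every 7th day from start up to and including limit.
wednesdays = start.step(limit, 7).to_a

puts wednesdays.first
puts wednesdays.last
puts wednesdays.size
```

Unlike a fixed count such as 24.times, the number of dates here follows from the six-month limit itself.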

Change a hash into an array

I have a Rails 4 application that collects attendance at church services. Some weeks there are two services and some weeks there is only one. I need to get the total attendance for each week and show it as a graph.
By calling:
Stat.calculate(:sum, :attendance, group: :date)
in the console I have been able to collect the data in a hash like this:
{Sun, 06 Jan 2013=>66, Sun, 13 Jan 2013=>65, Sun, 20 Jan 2013=>60, Sun, 27 Jan 2013=>67, Sun, 03 Feb 2013=>60, Sun, 10 Feb 2013=>76, Sun, 17 Feb 2013=>65, Sun, 24 Feb 2013=>52, Sun, 03 Mar 2013=>52, Sun, 10 Mar 2013=>45, Sun, 17 Mar 2013=>56, Sun, 24 Mar 2013=>134, Sun, 31 Mar 2013=>76, Sun, 07 Apr 2013=>88, Sun, 14 Apr 2013=>87, Sun, 28 Apr 2013=>93, Sun, 05 May 2013=>93, Sun, 12 May 2013=>95, Sun, 19 May 2013=>90, Sun, 26 May 2013=>87, Sun, 02 Jun 2013=>71, Sun, 09 Jun 2013=>86, Sun, 16 Jun 2013=>109, Sun, 23 Jun 2013=>80, Sun, 30 Jun 2013=>68, Sun, 07 Jul 2013=>75, Sun, 14 Jul 2013=>73}
But what I need for my chart is an array of hashes in the form of:
{date: "Sun, 23 Jun 2013", attendance: 80}, {date: "Sun, 30 Jun 2013", attendance: 68}
So I am trying to figure out how to convert the first form into the second form.
I'm sure it's something pretty easy, but my limited Rails knowledge is hitting a wall.
.collect{|key,value| {:date => key, :attendance => value} }
Loop through and create a new hash in which the original key becomes the value for date and the original value becomes the value for attendance. These new hashes are collected into an array.
You can think of it as below:
h = {"Sun, 06 Jan 2013"=>66, "Sun, 13 Jan 2013"=>65, "Sun, 20 Jan 2013"=>60 }
h.map{|k,v| Hash[:date,k,:attend,v]}
# => [{:date=>"Sun, 06 Jan 2013", :attend=>66},
# {:date=>"Sun, 13 Jan 2013", :attend=>65},
# {:date=>"Sun, 20 Jan 2013", :attend=>60}]
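The hash in the question is keyed by Date objects, while the desired chart data uses strings. A minimal sketch of the same conversion that also formats each date follows, assuming the "Sun, 23 Jun 2013" style from the question. The totals and chart_data names and the two sample entries are illustrative.

```ruby
require 'date'

# Illustrative subset of the grouped sums, keyed by Date as in the question.
totals = {
  Date.new(2013, 6, 23) => 80,
  Date.new(2013, 6, 30) => 68
}

# Build one hash per entry, turning the Date key into a display string.
chart_data = totals.map do |date, attendance|
  { date: date.strftime('%a, %d %b %Y'), attendance: attendance }
end

p chart_data
```

Hashes keep insertion order in Ruby, so the resulting array follows the order of the grouped query result.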
