I've borrowed some code from "Expert F# 2.0" that shows how to build a web crawler using MailboxProcessor. As you can see, I have a printfn expression at line 23 that prints the current number of URLs in the visited Set. The number of URLs to crawl is also limited (the loop stops once the visited Set holds 50 entries).
open System
open System.Net
open System.Text.RegularExpressions
open Microsoft.FSharp.Control.WebExtensions

let getLinks (txt:string) =
    [ for m in Regex.Matches(txt, "href=\s*\"[^\"h]*(http://[^&\"]*)\"") -> m.Groups.Item(1).Value ]

let collectLinks (url:string) =
    async { let web = new WebClient()
            let! data = web.AsyncDownloadString <| Uri url
            let links = getLinks data
            return links }

let urlCollector =
    MailboxProcessor.Start(fun self ->
        let rec waitForUrl (visited : Set<string>) =
            async { // Checks whether we have reached the limit of pages to crawl
                    if visited.Count < 50 then
                        // Waits for a URL...
                        let! url = self.Receive()
                        printfn "%A | %A" visited.Count url
                        // If not the URL already has been crawled...
                        if not (visited.Contains url) then
                            // Start
                            do! Async.StartChild(
                                    async { let! links = collectLinks url
                                            Seq.iter self.Post links}) |> Async.Ignore
                        return! waitForUrl (visited.Add url) }
        waitForUrl Set.empty)

urlCollector.Post "http://news.google.com/"
That seems alright, eh? But the output looks like this:
0 | "http://news.google.com/"
1 | "http://www.gstatic.com/news/img/favicon.ico"
2 | "http://mail.google.com/mail/?tab=nm"
3 | "http://www.google.com/intl/en/options/"
4 | "http://docs.google.com/?tab=no"
5 | "http://www.google.com/reader/?tab=ny"
6 | "http://sites.google.com/?tab=n3"
7 | "http://www.google.com/intl/en/options/"
7 | "http://www.google.com/preferences?hl=en"
8 | "http://www.guardian.co.uk/uk/2011/aug/07/tottenham-riots-police-had-not-anticipated-violence"
9 | "http://www.bloomberg.com/news/2011-08-07/london-rioters-clash-with-police-loot-in-tottenham-after-shooting-death.html"
10 | "http://www.hindustantimes.com/Rioters-battle-police-after-shooting-protest/Article1-730371.aspx"
11 | "http://www.telegraph.co.uk/news/uknews/crime/8687177/Tottenham-riot-live.html"
12 | "http://www.guardian.co.uk/uk/2011/aug/07/tottenham-riots-police-had-not-anticipated-violence"
12 | "http://www.montrealgazette.com/London+wakes+riot+aftermath/5218849/story.html"
13 | "http://themediablog.typepad.com/the-media-blog/2011/08/daily-mail-tottenham-violence-twitter.html"
14 | "http://en.wikipedia.org/wiki/2011_Tottenham_riots"
15 | "http://www.babnet.net/festivaldetail-37897.asp"
16 | "http://www.youtube.com/watch?v=l9UImSbegj4"
17 | "http://www.babnet.net/festivaldetail-37897.asp"
17 | "http://www.youtube.com/watch?v=l9UImSbegj4"
17 | "http://www.telegraph.co.uk/news/uknews/crime/8687177/Tottenham-riot-live.html"
17 | "http://www.telegraph.co.uk/news/uknews/crime/8687177/Tottenham-riot-live.html"
17 | "http://www.guardian.co.uk/uk/2011/aug/07/tottenham-riots-police-had-not-anticipated-violence"
17 | "http://www.guardian.co.uk/uk/2011/aug/07/tottenham-riots-police-had-not-anticipated-violence"
17 | "http://www.bbc.co.uk/news/uk-14436001"
18 | "http://www.bbc.co.uk/news/uk-14436001"
18 | "http://www.kbc.co.ke/news.asp?nid=71755"
19 | "http://www.kbc.co.ke/news.asp?nid=71755"
19 | "http://news.sky.com/skynews/Home/UK-News/Tottenham-Riots-Simmering-Anger-Erupts-In-North-London-After-Protest-At-Mans-Shooting-Death/Article/201108116045172?f=rss"
20 | "http://news.sky.com/skynews/Home/UK-News/Tottenham-Riots-Simmering-Anger-Erupts-In-North-London-After-Protest-At-Mans-Shooting-Death/Article/201108116045172?f=rss"
20 | "http://www.irishtimes.com/newspaper/breaking/2011/0807/breaking2.html?via=mr"
21 | "http://www.irishtimes.com/newspaper/breaking/2011/0807/breaking2.html?via=mr"
21 | "http://www.cbc.ca/news/world/story/2011/08/07/tottenham-riot.html"
22 | "http://www.cbc.ca/news/world/story/2011/08/07/tottenham-riot.html"
22 | "http://www.newsday.com/news/police-officer-hospitalized-7-injured-in-uk-riot-1.3079769"
23 | "http://www.newsday.com/news/police-officer-hospitalized-7-injured-in-uk-riot-1.3079769"
23 | "http://www.msnbc.msn.com/id/44049721/ns/world_news-europe/"
24 | "http://www.msnbc.msn.com/id/44049721/ns/world_news-europe/"
24 | "http://www.timeslive.co.za/world/2011/08/07/eight-london-police-hospitalised-after-riots"
25 | "http://www.timeslive.co.za/world/2011/08/07/eight-london-police-hospitalised-after-riots"
25 | "http://www.cnn.com/2011/WORLD/europe/08/07/uk.riots/"
26 | "http://www.cnn.com/2011/WORLD/europe/08/07/uk.riots/"
26 | "http://www.dailymail.co.uk/news/article-2023348/Tottenham-anarchy-Grim-echo-1985-Broadwater-farm-riot.html"
27 | "http://www.dailymail.co.uk/news/article-2023348/Tottenham-anarchy-Grim-echo-1985-Broadwater-farm-riot.html"
27 | "http://www.mirror.co.uk/news/top-stories/2011/08/06/tottenham-riot-protesters-torch-police-cars-shops-and-a-bus-115875-23325724/"
28 | "http://www.mirror.co.uk/news/top-stories/2011/08/06/tottenham-riot-protesters-torch-police-cars-shops-and-a-bus-115875-23325724/"
28 | "http://www.theglobeandmail.com/news/world/images-of-the-destruction-from-londons-tottenham-riots/article2122026/"
29 | "http://www.theglobeandmail.com/news/world/images-of-the-destruction-from-londons-tottenham-riots/article2122026/"
29 | "http://thelede.blogs.nytimes.com/2011/08/06/shops-and-cars-burn-in-anti-police-riot-in-london/"
30 | "http://thelede.blogs.nytimes.com/2011/08/06/shops-and-cars-burn-in-anti-police-riot-in-london/"
30 | "http://www.stuff.co.nz/world/5403614/Crowds-attack-police-after-UK-protest"
31 | "http://www.stuff.co.nz/world/5403614/Crowds-attack-police-after-UK-protest"
31 | "http://www.google.com/hostednews/afp/article/ALeqM5jOCV_DVSYR1S50v6vdSBjsR5H9Jw?docId=CNG.36dce69df0a155bfd2fa1a3a5f92f6e1.5c1"
32 | "http://www.google.com/hostednews/afp/article/ALeqM5jOCV_DVSYR1S50v6vdSBjsR5H9Jw?docId=CNG.36dce69df0a155bfd2fa1a3a5f92f6e1.5c1"
32 | "http://fallenscoop.com/16993/tottenham-riot-2011-north-london-burns-after-protest-of-mark-duggan"
33 | "http://fallenscoop.com/16993/tottenham-riot-2011-north-london-burns-after-protest-of-mark-duggan"
33 | "http://www.thedailybeast.com/cheats/2011/08/07/riots-grip-north-london.html"
34 | "http://www.thedailybeast.com/cheats/2011/08/07/riots-grip-north-london.html"
34 | "http://www.thehindu.com/news/article2333142.ece"
35 | "http://www.sfgate.com/cgi-bin/article.cgi?f=/g/a/2011/08/07/bloomberg1376-LPHCT11A1I4H01-3ULNPF643I4ERSIU09MO54CQ4B.DTL"
36 | "http://online.wsj.com/community/groups/question-day-229/topics/do-you-agree-sps-decision?commentid=2864110"
37 | "http://www.businessweek.com/ap/financialnews/D9OUMJVO1.htm"
38 | "http://www.cnn.com/2011/BUSINESS/08/06/global.economy.cnn/"
39 | "http://www.chicagotribune.com/news/opinion/editorials/ct-edit-credit-20110806,0,6468631.story"
40 | "http://www.foxbusiness.com/markets/2011/08/07/treasury-hits-back-against-sp-downgrade/"
41 | "http://en.wikipedia.org/wiki/Standard_%26_Poor%27s"
42 | "http://www.usatoday.com/money/companies/management/2011-08-07-verizon-strike_n.htm"
43 | "http://www.businessweek.com/ap/financialnews/D9OV028O3.htm"
44 | "http://www.nbcnewyork.com/news/local/Verizon-Workers-Demonstrate-in-Manhattan-Part-of-45K-Worker-Strike-127087478.html"
45 | "http://www.poughkeepsiejournal.com/article/20110807/NEWS03/110807003/45K-Verizon-workers-strike-over-new-labor-contract-?odyssey=tab%7Ctopnews%7Ctext%7CPoughkeepsieJournal.com"
46 | "http://www.nypost.com/p/news/national/verizon_hit_by_strike_Ga9JjKphZrKCEAr608bqkI"
47 | "http://www.nytimes.com/2011/08/07/us/07verizon.html"
48 | "http://www.ctv.ca/CTVNews/World/20110807/afghanistan-helicopter-crash-fighting-110807/"
49 | "http://abcnews.go.com/International/nato-crash-team-seal-members-killed-afghanistan/story?id=14249189"
What's up with all the duplicates? Also, why do some of them print the same count of URLs in the visited Set (like 17, 33, 34, etc.)? I'm pretty sure I'm missing something totally fundamental, but I can't figure out what.
In your snippet, the printing using printfn is done before you check if the URL is already present in the set. This means that it will print the URL even if it will not be added in the next step. (You can see that it wasn't added if you look at the numbers in the left column - if the count wasn't incremented, the number on the next line is the same).
Moving printfn to the body of the if expression should give the expected results:
// Waits for a URL...
let! url = self.Receive()
// If not the URL already has been crawled...
if not (visited.Contains url) then
    printfn "%A | %A" visited.Count url
    // Start
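To see the effect of logging only inside the membership check, here is a minimal sketch in Python rather than F#. It replays a list of received URLs through the same "check, then log, then add" order. The function name `process`, the `limit` parameter and the example.com URLs are made up for illustration and are not part of the original crawler.

```python
def process(urls, limit=50):
    """Replay received URLs and log only those not seen before."""
    visited = set()
    log = []
    for url in urls:
        # Same stop condition as the crawler: quit once the set is full.
        if len(visited) >= limit:
            break
        # Log inside the check, so duplicates never reach the output.
        if url not in visited:
            log.append((len(visited), url))
            visited.add(url)
    return log


# Hypothetical message stream containing one duplicate.
messages = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/a",
]
for count, url in process(messages):
    print(f"{count} | {url!r}")
```

With this ordering the left-hand count increases by one on every printed line, because a line is only printed when the set is about to grow.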
Given a sheet like this:
+----------+--------+------------+------------+------------+------------+
| A        | B      | C          | D          | E          | F          |
+----------+--------+------------+------------+------------+------------+
| AVG      | ITEMS  | Week 3 May | Week 2 May | Week 1 May | Week 5 Apr |
| =QUERY() | Item 1 | 1263       | 1255       | 1142       | 956        |
|          | Item 2 | 1371       | 1263       | 1023       | 1120       |
|          | Item 3 | 1382       | 1257       | 1352       | 1853       |
|          | Item 4 | 1429       | 1281       | 1120       | 1869       |
+----------+--------+------------+------------+------------+------------+
I need to do two things: move column B to the first column (A), and make a script that adds a new column for new entries.
1.-
In AVG column (column A in the example above) there is an average using the formula:
=QUERY(transpose(query(transpose(B2:$F),"Select "&REGEXREPLACE(join("",ArrayFormula(if(len(B2:B),"Avg(Col"&ROW($C2:$C)-ROW($C2)+1&"),",""))), ".\z","")&"")),"Select Col2")
This formula calculates the average of the last 4 weeks, but only if there's an entry in column B.
I need to move this column to the right of the Items list (column B), but when I try to, the formula shows a circular dependency error. Is there a way to tell the formula to pick only the columns I want?
2.-
There's also a button with an assigned macro that makes a new column to the left of the latest week for new entries and inserts the week number and month. This is the script:
function onEdit() {
  var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange('C:C').activate();
  spreadsheet.getActiveSheet().insertColumnsBefore(spreadsheet.getActiveRange().getColumn(), 1);
  spreadsheet.getActiveRange().offset(0, 0, spreadsheet.getActiveRange().getNumRows(), 1).activate();
  spreadsheet.getRange('C1').activate()
    .setFormula('=CONCATENATE("Week ",(WEEKNUM(TODAY(),2)-WEEKNUM(EOMONTH(TODAY(),-1)+1)+1)," ",CHOOSE(MONTH(TODAY()),"Jan","Feb","Mar","Apr","May","Jun","Jul","Ago","Sep","Oct","Nov","Dec"))');
};
So it becomes something like this:
+-----+--------+------------+------------+------------+------------+------------+
| A   | B      | C          | D          | E          | F          | G          |
+-----+--------+------------+------------+------------+------------+------------+
| AVG | ITEMS  | Week X MMM | Week 3 May | Week 2 May | Week 1 May | Week 5 Apr |
|     | Item 1 | (NEW WEEK) | 1263       | 1255       | 1142       | 956        |
+-----+--------+------------+------------+------------+------------+------------+
and this is the formula I am using for the week number:
=CONCATENATE("Week ",(WEEKNUM(TODAY(),2)-WEEKNUM(EOMONTH(TODAY(),-1)+1)+1)," ",CHOOSE(MONTH(TODAY()),"Jan","Feb","Mar","Apr","May","Jun","Jul","Ago","Sep","Oct","Nov","Dec"))
The problem with the formula is that it uses the TODAY() function, whose value changes every day, while I need a static value. Also, when using the script, the conditional formatting is not carried over to the new column. How can I improve the script?
To replace TODAY() with a static value, use the JavaScript date methods and format the date according to your spreadsheet settings with Utilities.formatDate().
Then replace each TODAY() in the formula with the variable that holds the formatted date, as described above.
Sample:
function onEdit() {
  var spreadsheet = SpreadsheetApp.getActive();
  var now = new Date();
  var today = '"' + Utilities.formatDate(now, spreadsheet.getSpreadsheetTimeZone(), "MM/dd/yyyy") + '"';
  spreadsheet.getRange('C:C').activate();
  spreadsheet.getActiveSheet().insertColumnsBefore(spreadsheet.getActiveRange().getColumn(), 1);
  spreadsheet.getActiveRange().offset(0, 0, spreadsheet.getActiveRange().getNumRows(), 1).activate();
  spreadsheet.getRange('C1').activate()
    .setFormula('=CONCATENATE("Week ",(WEEKNUM(' + today + ',2)-WEEKNUM(EOMONTH(' + today + ',-1)+1)+1)," ",CHOOSE(MONTH(' + today + '),"Jan","Feb","Mar","Apr","May","Jun","Jul","Ago","Sep","Oct","Nov","Dec"))');
};
Note: If your spreadsheet date format is not "MM/dd/yyyy", modify the format argument of formatDate(date, timeZone, format) accordingly.
The conditional formatting has to be set separately from setFormula.
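The underlying idea is to compute the label once, from a fixed date, instead of leaving a volatile function in the cell. As a rough illustration outside Apps Script, here is a Python sketch that builds the same kind of "Week N Mon" label as a plain string. The function name `week_label` is hypothetical, and the sketch assumes Monday-based weeks throughout, which may differ slightly from the mixed WEEKNUM arguments in the sheet formula. The month list is copied from the formula in the question.

```python
from datetime import date

# Month abbreviations exactly as they appear in the sheet formula.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Ago", "Sep", "Oct", "Nov", "Dec"]


def week_label(d):
    """Return a fixed 'Week N Mon' label for date d (weeks start on Monday)."""
    # Weekday of the first day of the month, Monday == 0.
    first_weekday = date(d.year, d.month, 1).weekday()
    # Shift the day by that offset so week boundaries fall on Mondays.
    week_in_month = (d.day + first_weekday - 1) // 7 + 1
    return f"Week {week_in_month} {MONTHS[d.month - 1]}"


# The label is an ordinary string, so it stays the same on later days.
print(week_label(date.today()))
```

Writing a value like this into the header cell (rather than a formula that calls TODAY()) is one way to keep old column headers from changing when the sheet recalculates.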
What should one specify when creating a GraphLab recommender model so that items a user already owns are not recommended to them again? Can this be done directly by specifying certain parameters, or do I need to write a recommender from scratch? The data would look something like this:
| user_id | item_id | othercolumns |
|:-----------|------------:|:------------:|
| 1 | 21 | This |
| 2 | 22 | column |
| 1 | 23 | will |
| 3 | 24 | hold |
| 2 | 25 | other |
| 1 | 26 | values |
Since items 21, 23 and 26 are already owned by user 1, these items should not be recommended to him.
This behaviour is controlled by the exclude_known parameter of the recommender's recommend method. From the documentation:
exclude_known : bool, optional
By default, all user-item interactions previously seen in the training
data, or in any new data provided using new_observation_data.., are
excluded from the recommendations. Passing in exclude_known = False
overrides this behavior.
Example
>>> import graphlab as gl
>>> sf = gl.SFrame({'user_id':[1,2,1,3,2,1], 'item_id':[21,22,23,24,25,26]})
>>> print sf
+---------+---------+
| item_id | user_id |
+---------+---------+
| 21 | 1 |
| 22 | 2 |
| 23 | 1 |
| 24 | 3 |
| 25 | 2 |
| 26 | 1 |
+---------+---------+
[6 rows x 2 columns]
>>> rec_model = gl.recommender.create(sf)
>>> # we recommend items not owned by user
>>> rec_wo_own_item = rec_model.recommend(sf['user_id'].unique())
>>> rec_wo_own_item.sort('user_id').print_rows(100)
+---------+---------+----------------+------+
| user_id | item_id | score | rank |
+---------+---------+----------------+------+
| 1 | 22 | 0.0 | 1 |
| 1 | 24 | 0.0 | 2 |
| 1 | 25 | 0.0 | 3 |
| 2 | 21 | 0.0 | 1 |
| 2 | 23 | 0.0 | 2 |
| 2 | 24 | 0.0 | 3 |
| 2 | 26 | 0.0 | 4 |
| 3 | 21 | 0.333333333333 | 1 |
| 3 | 23 | 0.333333333333 | 2 |
| 3 | 26 | 0.333333333333 | 3 |
| 3 | 22 | 0.166666666667 | 4 |
| 3 | 25 | 0.166666666667 | 5 |
+---------+---------+----------------+------+
[12 rows x 4 columns]
>>> # we recommend items owned by user
>>> rec_w_own_item = rec_model.recommend(sf['user_id'].unique(), exclude_known=False)
>>> rec_w_own_item.sort('user_id').print_rows(100)
+---------+---------+----------------+------+
| user_id | item_id | score | rank |
+---------+---------+----------------+------+
| 1 | 21 | 0.666666666667 | 1 |
| 1 | 23 | 0.666666666667 | 2 |
| 1 | 26 | 0.666666666667 | 3 |
| 1 | 22 | 0.0 | 4 |
| 1 | 24 | 0.0 | 5 |
| 1 | 25 | 0.0 | 6 |
| 2 | 26 | 0.0 | 6 |
| 2 | 24 | 0.0 | 5 |
| 2 | 23 | 0.0 | 4 |
| 2 | 21 | 0.0 | 3 |
| 2 | 25 | 0.5 | 2 |
| 2 | 22 | 0.5 | 1 |
| 3 | 24 | 0.0 | 6 |
| 3 | 25 | 0.166666666667 | 5 |
| 3 | 22 | 0.166666666667 | 4 |
| 3 | 26 | 0.333333333333 | 3 |
| 3 | 23 | 0.333333333333 | 2 |
| 3 | 21 | 0.333333333333 | 1 |
+---------+---------+----------------+------+
[18 rows x 4 columns]
>>> # we add recommended items not owned by user to the original SFrame
>>> rec = rec_wo_own_item.groupby('user_id', {'reco':gl.aggregate.CONCAT('item_id')})
>>> sf = sf.join(rec, 'user_id', 'left')
>>> print sf
+---------+---------+----------------------+
| item_id | user_id | reco |
+---------+---------+----------------------+
| 21 | 1 | [24, 25, 22] |
| 22 | 2 | [24, 26, 23, 21] |
| 23 | 1 | [24, 25, 22] |
| 24 | 3 | [21, 23, 26, 25, 22] |
| 25 | 2 | [24, 26, 23, 21] |
| 26 | 1 | [24, 25, 22] |
+---------+---------+----------------------+
[6 rows x 3 columns]
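Conceptually, excluding known items amounts to a per-user set difference. The plain-Python sketch below illustrates only that filtering step on the same six interactions. It is not how GraphLab implements `exclude_known`, it does no scoring or ranking, and the function name `candidates` is made up for this example.

```python
from collections import defaultdict

# Same (user_id, item_id) pairs as in the SFrame above.
interactions = [(1, 21), (2, 22), (1, 23), (3, 24), (2, 25), (1, 26)]


def candidates(interactions, exclude_known=True):
    """Return, per user, the items that could be recommended."""
    owned = defaultdict(set)
    for user, item in interactions:
        owned[user].add(item)
    all_items = {item for _, item in interactions}
    result = {}
    for user, items in owned.items():
        # Drop the items the user already has unless told otherwise.
        pool = all_items - items if exclude_known else all_items
        result[user] = sorted(pool)
    return result


for user, items in sorted(candidates(interactions).items()):
    print(user, items)
```

The per-user item sets this produces match the sets in the `reco` column above, just without the ordering that the model's scores impose.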
I cannot understand why aggregateQuery always returns an empty result. I tried to test it in aql and got the same problem: 0 rows in set.
Indexes are all there.
aql> show indexes
+---------------+-------------+-----------+------------+-------+------------------------------+-------------+------------+-----------+
| ns | bin | indextype | set | state | indexname | path | sync_state | type |
+---------------+-------------+-----------+------------+-------+------------------------------+-------------+------------+-----------+
| "test" | "name" | "NONE" | "profiles" | "RW" | "inx_test_name" | "name" | "synced" | "STRING" |
| "test" | "age" | "NONE" | "profiles" | "RW" | "inx_test_age" | "age" | "synced" | "NUMERIC" |
aql> select * from test.profiles
+---------+-----+
| name | age |
+---------+-----+
| "Sally" | 19 |
| 20 | |
| 22 | |
| 28 | |
| "Ann" | 22 |
| "Bob" | 22 |
| "Tammy" | 22 |
| "Ricky" | 20 |
| 22 | |
| 19 | |
+---------+-----+
10 rows in set (0.026 secs)
aql> AGGREGATE mystream.avg_age() ON test.profiles WHERE age BETWEEN 20 and 29
0 rows in set (0.004 secs)
It seems that you are trying the example here.
There are two problems with the UDF script. Here is the code of the Lua script:
function avg_age(stream)

    local function female(rec)
        return rec.gender == "F"
    end

    local function name_age(rec)
        return map{ name=rec.name, age=rec.age }
    end

    local function eldest(p1, p2)
        if p1.age > p2.age then
            return p1
        else
            return p2
        end
    end

    return stream : filter(female) : map(name_age) : reduce(eldest)
end
First, there is no bin named 'gender' in your set, so the filter rejects every record and aggregateQuery returns 0 rows.
Second, this script doesn't do what the function name 'avg_age' suggests; it just returns the eldest record with its name and age.
I paste my code below. It replaces the reduce function and alters the map and filter functions to meet the requirement. You can also just skip the filter step.
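function avg_age(stream)
    -- Running totals, declared local so they do not leak into the global scope
    local count = 0
    local sum = 0

    -- Accept every record (no 'gender' bin exists in this set)
    local function female(rec)
        return true
    end

    -- Keep only the age of each record
    local function name_age(rec)
        return rec.age
    end

    -- Update the running totals and return the current average
    local function avg(p1, p2)
        count = count + 1
        sum = sum + p2
        return sum / count
    end

    return stream : filter(female) : map(name_age) : reduce(avg)
end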
The output looks like this:
AGGREGATE mystream.avg_age() ON test.avgage WHERE age BETWEEN 20 and 29
+---------+
| avg_age |
+---------+
| 22 |
+---------+
1 row in set (0.001 secs)
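One caveat about a reducer that keeps a running average in outer variables: the result depends on how often and in what order it is called. If reduce seeds its accumulator with the first element, that first age never reaches `sum`. A pattern that avoids this is to map each age to a (sum, count) pair, add pairs in the reducer, and divide once at the end. The sketch below shows that pattern in plain Python, not Aerospike Lua. The function names and sample ages are made up for illustration.

```python
from functools import reduce


def to_pair(age):
    """Map step: turn one age into a (sum, count) pair."""
    return (age, 1)


def merge(a, b):
    """Reduce step: add two (sum, count) pairs component-wise."""
    return (a[0] + b[0], a[1] + b[1])


def average(ages):
    """Combine all pairs, then divide once at the very end."""
    total, count = reduce(merge, map(to_pair, ages), (0, 0))
    return total / count if count else None


# Hypothetical ages in the 20-29 range.
ages = [20, 22, 28, 22]
print(average(ages))
```

Because `merge` is associative and commutative, partial results computed on different chunks of the data can be merged in any order and still give the same final average.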
I have the following statement, which generates the output below by averaging the data over 20-minute ranges.
Statement :
SELECT record_no, date_time,
       ROUND(AVG(UNIX_TIMESTAMP(date_time))) AS time_value,
       ROUND(AVG(ph1_active_power),4) AS p1,
       ROUND(AVG(ph2_active_power),4) AS p2,
       ROUND(AVG(ph3_active_power),4) AS p3
FROM powerpro1
GROUP BY date_time DIV 2000
Portion of the output
+-----------+---------------------+------------+---------+----------+----------+
| record_no | date_time | time_value | p1 | p2 | p3 |
+-----------+---------------------+------------+---------+----------+----------+
| 1 | 2014-12-01 00:00:00 | 1417372770 | 72.6242 | -68.7428 | -72.6242 |
| 21 | 2014-12-01 00:20:00 | 1417373970 | 71.6624 | -69.7448 | -71.6624 |
| 41 | 2014-12-01 00:40:00 | 1417375170 | 70.6869 | -70.7333 | -70.6869 |
| 61 | 2014-12-01 01:00:00 | 1417376370 | 69.6977 | -71.7082 | -69.6977 |
| 81 | 2014-12-01 01:20:00 | 1417377570 | 68.6952 | -72.6692 | -68.6952 |
| 101 | 2014-12-01 01:40:00 | 1417378770 | 67.6794 | -73.6162 | -67.6794 |
| 121 | 2014-12-01 02:00:00 | 1417379970 | 66.6505 | -74.549 | -66.6505 |
| 141 | 2014-12-01 02:20:00 | 1417381200 | 65.5825 | -75.4901 | -65.5825 |
+-----------+---------------------+------------+---------+----------+----------+
With the current number of records in the table named "powerpro1", the above query returns 1368 rows when executed. (This may increase in the future as new records arrive.)
My requirement is to create a Highcharts chart using time_value for the x-axis and p1, p2 and p3 for the y-axis. But I need to limit the number of points on the x-axis.
Can anyone help me show these 1368 points as 1000 points in the chart?
Unfortunately, Highcharts has no approximation of this kind built in, only the reverse (I mean data grouping: if you have 100 points, it returns, e.g., 10). So you need to calculate it on your own and push all 1000 points to your data.
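One way to do that calculation yourself is to split the rows into the desired number of consecutive buckets and average each bucket. The following Python sketch does this on the server side before the data is handed to the chart. The function name `downsample` and the synthetic rows are assumptions for illustration, and bucket averaging is only one of several possible reduction strategies.

```python
def downsample(points, target):
    """Reduce points (tuples sorted by x) to at most `target` averaged points."""
    n = len(points)
    if n <= target:
        return list(points)
    out = []
    for i in range(target):
        # Integer bucket bounds; every bucket holds at least one point.
        start = i * n // target
        end = (i + 1) * n // target
        bucket = points[start:end]
        # Average each column (x and every y series) over the bucket.
        out.append(tuple(sum(col) / len(bucket) for col in zip(*bucket)))
    return out


# Synthetic rows shaped like (time_value, p1, p2, p3).
rows = [(i * 1200, float(i), -float(i), 2.0 * i) for i in range(1368)]
reduced = downsample(rows, 1000)
print(len(reduced))
```

When the row count is not a multiple of the target, some buckets hold one row and others two, so the resulting points are not perfectly evenly spaced in time.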
Can someone explain how this concept works?
I have one question, but I don't have any idea how to construct the truth table.
f(A,B,C) = AB + A'C
The answer given was ABC + ABC' + A'BC + A'B'C
And I have no idea how to get there. :-(
1. Create a column for each of the inputs, each intermediate functions, and the final function:
A B C | AB | A' | A'C | AB + A'C
--------------------------------
      |    |    |     |
      |    |    |     |
      |    |    |     |
      |    |    |     |
      |    |    |     |
      |    |    |     |
      |    |    |     |
      |    |    |     |
2. Enumerate all input possibilities, and start filling in the intermediate function values and then the final function value:
A B C | AB | A' | A'C | AB + A'C
--------------------------------
0 0 0 | 0  | 1  | 0   | 0
0 0 1 |    |    |     |
0 1 0 |    |    |     |
0 1 1 |    |    |     |
1 0 0 |    |    |     |
1 0 1 |    |    |     |
1 1 0 |    |    |     |
1 1 1 |    |    |     |
3. Now, you finish the truth table.
Update per OP's edit of question:
The "answer given" can be reduced as follows using Boolean Algebra:
  ABC + ABC' + A'BC + A'B'C
= AB(C + C') + A'C(B + B')
= AB + A'C
...which is the same as the given f(A,B,C). Not sure why ABC + ABC' + A'BC + A'B'C would be considered to be the "answer," but this does show equivalence between the two formulae.
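The equivalence can also be checked mechanically by enumerating all eight input rows. The Python sketch below does this with 0/1 integers and lists one product term (minterm) for every row where f is 1. Those terms are the ones in the given answer, which suggests the exercise was asking for the sum-of-minterms form read off the truth table. The function names `f`, `g` and `minterm` are chosen here for illustration.

```python
from itertools import product


def f(a, b, c):
    """f(A,B,C) = AB + A'C on 0/1 integers."""
    return (a & b) | ((1 - a) & c)


def g(a, b, c):
    """The given answer: ABC + ABC' + A'BC + A'B'C."""
    return ((a & b & c) | (a & b & (1 - c))
            | ((1 - a) & b & c) | ((1 - a) & (1 - b) & c))


def minterm(a, b, c):
    """Write one input row as a product term, e.g. (0, 1, 1) -> A'BC."""
    return "".join(n if bit else n + "'" for n, bit in zip("ABC", (a, b, c)))


rows = list(product((0, 1), repeat=3))

# Print the full truth table for f.
for a, b, c in rows:
    print(a, b, c, "|", f(a, b, c))

# Collect one minterm per row where f is 1.
terms = [minterm(*row) for row in rows if f(*row)]
print(" + ".join(terms))
```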