Different YouTube URLs point to the same video

I found what looks like a bug in YouTube while reverse engineering its video id generator. If I change the last character of the video id, the URL still leads to the same video. How is this possible?
Example:
https://www.youtube.com/watch?v=9bZkp7q19f0
https://www.youtube.com/watch?v=9bZkp7q19f1
https://www.youtube.com/watch?v=9bZkp7q19f2
https://www.youtube.com/watch?v=9bZkp7q19f3
But this URL doesn't work:
https://www.youtube.com/watch?v=9bZkp7q19f4

The videoId is an 8-byte (64-bit) value, Base64-encoded. Quoting another post:
For the videoId, it is an 8-byte (64-bit) integer. Applying
Base64-encoding to 8 bytes of data requires 11 characters. However,
since each Base64 character conveys exactly 6 bits, this allocation
could actually hold up to 11 × 6 = 66 bits--a surplus of 2 bits over
what our payload needs. The excess bits are set to zero, which has the
effect of excluding certain characters from ever appearing in the last
position of the encoded string. In particular, the videoId will always
end with one of the following: { A, E, I, M, Q, U, Y, c, g, k, o, s,
w, 0, 4, 8 }
In your case, the videoId is 9bZkp7q19f0:
enc.  |      9      b      Z      k      p      7      q      1      9      f |       0
value |     61     27     25     36     41     59     42     53     61     31 |      52
bin.  | 111101 011011 011001 100100 101001 111011 101010 110101 111101 011111 | 1101 00
If you modify the last character, the 64-bit id changes only if the 4 most significant bits of that character's 6-bit value change. Its 2 least significant bits are the surplus bits and are ignored:
9bZkp7q19f1:
enc.  |      9      b      Z      k      p      7      q      1      9      f |       1
value |     61     27     25     36     41     59     42     53     61     31 |      53
bin.  | 111101 011011 011001 100100 101001 111011 101010 110101 111101 011111 | 1101 01
9bZkp7q19f2:
enc.  |      9      b      Z      k      p      7      q      1      9      f |       2
value |     61     27     25     36     41     59     42     53     61     31 |      54
bin.  | 111101 011011 011001 100100 101001 111011 101010 110101 111101 011111 | 1101 10
9bZkp7q19f3:
enc.  |      9      b      Z      k      p      7      q      1      9      f |       3
value |     61     27     25     36     41     59     42     53     61     31 |      55
bin.  | 111101 011011 011001 100100 101001 111011 101010 110101 111101 011111 | 1101 11
Changing the last character to 4 gives a different video id (note that the 4 most significant bits of the last character change from 1101 to 1110):
enc.  |      9      b      Z      k      p      7      q      1      9      f |       4
value |     61     27     25     36     41     59     42     53     61     31 |      56
bin.  | 111101 011011 011001 100100 101001 111011 101010 110101 111101 011111 | 1110 00
So 9bZkp7q19f4 decodes to a different 64-bit id. Note that if a video with that id exists, 9bZkp7q19f4, 9bZkp7q19f5, 9bZkp7q19f6 and 9bZkp7q19f7 will all decode to it.
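The arithmetic above can be checked with a short script. This is only a sketch, assuming the ids use the URL-safe Base64 alphabet (with - and _ as the last two symbols). The helper names video_id_to_int and canonical_last_chars are made up for illustration and are not part of any YouTube API.

```python
# URL-safe Base64 alphabet (RFC 4648): index in this string = 6-bit value.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def video_id_to_int(video_id: str) -> int:
    """Decode an 11-character id into its 64-bit integer payload."""
    if len(video_id) != 11:
        raise ValueError("expected an 11-character id")
    bits = 0
    for ch in video_id:
        bits = (bits << 6) | ALPHABET.index(ch)  # append 6 bits per character
    return bits >> 2  # 66 bits were read; discard the 2 surplus low bits


def canonical_last_chars() -> str:
    """Characters whose two low bits are zero, i.e. the canonical endings."""
    return "".join(ch for i, ch in enumerate(ALPHABET) if i % 4 == 0)


if __name__ == "__main__":
    for suffix in "01234567":
        vid = "9bZkp7q19f" + suffix
        print(vid, hex(video_id_to_int(vid)))
    print(canonical_last_chars())
```

The suffixes 0 to 3 print one integer and 4 to 7 print another, and the last line reproduces the list of allowed final characters from the quote.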
You can check the character values against the URL-safe Base64 alphabet: A-Z map to 0-25, a-z to 26-51, 0-9 to 52-61, and - and _ to 62 and 63.


Average of updating entries with chosen columns

Given a sheet like this:
+ ------ + ------- + ---------- + ---------- + ---------- + ---------- +
| A | B | C | D | E | F |
+ -------+ ------- + ---------- + ---------- + ---------- + ---------- +
| AVG | ITEMS | Week 3 May | Week 2 May | Week 1 May | Week 5 Apr |
|=QUERY()| Item 1 | 1263 | 1255 | 1142 | 956 |
| | Item 2 | 1371 | 1263 | 1023 | 1120 |
| | Item 3 | 1382 | 1257 | 1352 | 1853 |
| | Item 4 | 1429 | 1281 | 1120 | 1869 |
I need to move column B to the first column (A).
I also need a script that adds a new column for new entries.
1.-
In AVG column (column A in the example above) there is an average using the formula:
=QUERY(transpose(query(transpose(B2:$F),"Select "&REGEXREPLACE(join("",ArrayFormula(if(len(B2:B),"Avg(Col"&ROW($C2:$C)-ROW($C2)+1&"),",""))), ".\z","")&"")),"Select Col2")
This formula calculates the average of the last 4 weeks, but only for rows that have an entry in column B.
I need to move this column to the right of the Items list (column B), but when I try to, the formula shows a circular dependency error. Is there a way to tell the formula to pick only the columns I want?
2.-
There's also a button with an assigned macro to make a new column on the left of the latest week for new entries and insert the week number and month, this is the script:
function onEdit() {
  var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange('C:C').activate();
  spreadsheet.getActiveSheet().insertColumnsBefore(spreadsheet.getActiveRange().getColumn(), 1);
  spreadsheet.getActiveRange().offset(0, 0, spreadsheet.getActiveRange().getNumRows(), 1).activate();
  spreadsheet.getRange('C1').activate()
    .setFormula('=CONCATENATE("Week ",(WEEKNUM(TODAY(),2)-WEEKNUM(EOMONTH(TODAY(),-1)+1)+1)," ",CHOOSE(MONTH(TODAY()),"Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))');
}
So it becomes something like this:
+ ------ + ------- + ---------- + ---------- + ---------- + ---------- + ---------- +
| A | B | C | D | E | F | G |
+ -------+ ------- + ---------- + ---------- + ---------- + ---------- + ---------- +
| AVG | ITEMS | Week X MMM | Week 3 May | Week 2 May | Week 1 May | Week 5 Apr |
| | Item 1 | (NEW WEEK) | 1263 | 1255 | 1142 | 956 |
and this is the formula I am using for the week number:
=CONCATENATE("Week ",(WEEKNUM(TODAY(),2)-WEEKNUM(EOMONTH(TODAY(),-1)+1)+1)," ",CHOOSE(MONTH(TODAY()),"Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))
The problem with the formula is that it uses the TODAY() function, whose value keeps changing, while I need a static value. Also, when I use the script, the conditional formatting is not carried over to the new column. How can I improve the script?
To replace TODAY() with a static value, build the date in the script with the JavaScript Date object and format it to match your spreadsheet's date setting with Utilities.formatDate().
Then substitute the resulting string for every TODAY() in the formula.
Sample:
function onEdit() {
  var spreadsheet = SpreadsheetApp.getActive();
  // Capture the current date once, as a quoted literal, so the label never recalculates.
  var now = new Date();
  var today = '"' + Utilities.formatDate(now, spreadsheet.getSpreadsheetTimeZone(), "MM/dd/yyyy") + '"';
  // Insert an empty column before column C for the new week.
  spreadsheet.getRange('C:C').activate();
  spreadsheet.getActiveSheet().insertColumnsBefore(spreadsheet.getActiveRange().getColumn(), 1);
  spreadsheet.getActiveRange().offset(0, 0, spreadsheet.getActiveRange().getNumRows(), 1).activate();
  // Write the header formula with the fixed date in place of TODAY().
  spreadsheet.getRange('C1').activate()
    .setFormula('=CONCATENATE("Week ",(WEEKNUM(' + today + ',2)-WEEKNUM(EOMONTH(' + today + ',-1)+1)+1)," ",CHOOSE(MONTH(' + today + '),"Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))');
}
Note: If your spreadsheet date format is not "MM/dd/yyyy", modify the format argument of Utilities.formatDate(date, timeZone, format) accordingly.
Conditional formatting is not copied by setFormula; it has to be applied to the new column separately.
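For comparison, the label can also be computed entirely in code, so that no date function is left in the cell at all. The following is a language-neutral sketch in Python rather than Apps Script. It assumes weeks start on Monday and that the week containing the 1st counts as week 1, which follows the intent of the formula but is not guaranteed to match WEEKNUM in every edge case. The function name week_label is hypothetical.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def week_label(day: date) -> str:
    """Return a fixed label such as 'Week 3 May' for the given date."""
    first = day.replace(day=1)
    # weekday(): Monday = 0 ... Sunday = 6. This is how many days of the
    # month's first calendar week belong to the previous month.
    offset = first.weekday()
    week_of_month = (day.day + offset - 1) // 7 + 1
    return f"Week {week_of_month} {MONTHS[day.month - 1]}"


if __name__ == "__main__":
    print(week_label(date.today()))
```

Because the result is a plain string, writing it into the header cell as a value keeps it from changing on later recalculations.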

Comparing multiplication-tables in F#

I'm trying to compare two ways of printing a multiplication table. Even though they print identical strings when I run printf "%s" (mulTable n) and printf "%s" (loopMulTable n), they don't seem to be equal when I compare them: the last function prints false for every comparison. Can anyone explain why?
let a = "      1    2    3    4    5    6    7    8    9   10
 1    1    2    3    4    5    6    7    8    9   10
 2    2    4    6    8   10   12   14   16   18   20
 3    3    6    9   12   15   18   21   24   27   30
 4    4    8   12   16   20   24   28   32   36   40
 5    5   10   15   20   25   30   35   40   45   50
 6    6   12   18   24   30   36   42   48   54   60
 7    7   14   21   28   35   42   49   56   63   70
 8    8   16   24   32   40   48   56   64   72   80
 9    9   18   27   36   45   54   63   72   81   90
10   10   20   30   40   50   60   70   80   90  100"
let mulTable n =
    a.[0..(n*54)+51]
let loopMulTable n =
    let mutable returnString = ""
    returnString <- returnString + (sprintf "  ")
    for i in 1..10 do
        returnString <- returnString + (sprintf "%5d" i)
    for x in 1..n do
        returnString <- returnString + (sprintf "\n")
        returnString <- returnString + (sprintf "%2d" x)
        for y in 1..10 do
            returnString <- returnString + (sprintf "%5d" (x*y))
    returnString
let o = "n:"
let u = "boolean value:"
let chooseN n =
    printfn "%5s %19s" o u
    for n in 1..n do
        printfn "%4d %15b" n ((loopMulTable n)=(mulTable n))
chooseN 5
I should add that I am a beginner in programming, and especially in F#, so there might be other flaws, but they're not the problem I'm looking to solve.
Thanks!
The test for equality is most likely failing because this line is appending a newline rather than a carriage-return-newline combination.
returnString <- returnString + (sprintf "\n")
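If you are on a Windows machine, the line breaks in your source code will most likely include the carriage return character. That also explains the stride of 54 in mulTable: each row is 52 characters wide (2 + 10 × 5), so a stride of 54 only fits if every line break is two characters long. Change the line to the following and it should compare just fine: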
returnString <- returnString + (sprintf "\r\n")
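The length mismatch is easy to reproduce outside F#. Here is a small Python sketch that rebuilds the table with a chosen line break and applies the same slice as mulTable. It assumes the corner cell of the header is two spaces wide (the whitespace in the post is collapsed, but the end index 51 implies a 52-character header). The function names are made up for illustration.

```python
def build_table(n: int, newline: str) -> str:
    """Rebuild the table the way loopMulTable does, with a chosen line break."""
    # Header: 2-character corner cell plus ten 5-character columns = 52 chars.
    out = "  " + "".join(f"{i:5d}" for i in range(1, 11))
    for x in range(1, n + 1):
        out += newline + f"{x:2d}" + "".join(f"{x * y:5d}" for y in range(1, 11))
    return out


def slice_like_mul_table(full: str, n: int) -> str:
    """Same range as a.[0..(n*54)+51]; F# slice bounds are inclusive."""
    return full[0 : n * 54 + 52]


if __name__ == "__main__":
    literal = build_table(10, "\r\n")  # what a CRLF source file would contain
    for n in range(1, 6):
        sliced = slice_like_mul_table(literal, n)
        print(n, sliced == build_table(n, "\n"), sliced == build_table(n, "\r\n"))
```

With "\n" every comparison is False, because each generated row is one character shorter than the 54-character stride; with "\r\n" the strings match.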

Bitwise computation and count number of set bits afterward

I have a table:
create_table "fingerprint" do |t|
t.bit "fp1", limit: 64
t.bit "fp2", limit: 64
t.bit "fp3", limit: 64
t.bit "fp4", limit: 64
t.bit "fp5", limit: 64
end
fp1 | fp2 | fp3 | fp4 | fp5
---------------------------
001 | 010 | 011 | 100 | 101
And an array of 5 elements
fp = [5,4,3,2,1]
I'd like to bitwise-AND each record of the table with the corresponding element of fp and then count the total number of set bits over the 5 columns.
For example:
(001 & 5) = 001
(010 & 4) = 000
(011 & 3) = 011
(100 & 2) = 000
(101 & 1) = 001
Total number of set bits: 4
I want to loop this procedure in every row of my table. Please help me with an efficient way to do it (the table has about 100k rows).
Thank you in advance.
Here is an example for the type bit(3). You can easily adapt it to bit(64):
create table the_table (fp1 bit(3), fp2 bit(3), fp3 bit(3), fp4 bit(3), fp5 bit(3));
insert into the_table values
    ('001', '010', '011', '100', '101'),
    ('101', '011', '111', '110', '101');
with the_array(arr) as (
    values (array[5,4,3,2,1])
),
new_values as (
    select
        fp1 & arr[1]::bit(3) n1,
        fp2 & arr[2]::bit(3) n2,
        fp3 & arr[3]::bit(3) n3,
        fp4 & arr[4]::bit(3) n4,
        fp5 & arr[5]::bit(3) n5
    from the_table
    cross join the_array
)
select
    *,
    length(
        translate(
            concat(n1::text, n2::text, n3::text, n4::text, n5::text),
            '0',
            '')
    ) bit_set
from new_values;
n1 | n2 | n3 | n4 | n5 | bit_set
-----+-----+-----+-----+-----+---------
001 | 000 | 011 | 000 | 001 | 4
101 | 000 | 011 | 010 | 001 | 6
(2 rows)
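To cross-check the bit_set column, the same computation can be sketched in Python. This is only an illustration on the two sample rows, with the bit strings written as integers; the function name set_bits_after_and is hypothetical, and for 100k rows doing the work in the database as shown above avoids transferring the data.

```python
def set_bits_after_and(row, masks):
    """AND each column with its mask and count the set bits across all columns."""
    return sum(bin(value & mask).count("1") for value, mask in zip(row, masks))


# The two sample rows from the_table, as integers.
rows = [
    (0b001, 0b010, 0b011, 0b100, 0b101),
    (0b101, 0b011, 0b111, 0b110, 0b101),
]
fp = [5, 4, 3, 2, 1]

if __name__ == "__main__":
    for row in rows:
        print(set_bits_after_and(row, fp))
```

The two printed totals are 4 and 6, matching the bit_set values in the query result.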

Bison - shift/reduce conflict identifier

I have 1 shift/reduce conflict at state 19. I think that there may be a problem with the different occurrences of 'identifier' but I'm struggling to understand the bison report and resolve the conflict. Below is my grammar followed by the bison report with state information:
%{
#include <cstdio>
#include <iostream>
using namespace std;
extern "C" int yylex();
extern "C" int yyparse();
void yyerror(const char *s);
%}
%union{
int int_val;
double d_val;
char *strng;
}
%token TLINEC TBLOCKC TPRINT
%token TASSIGN TCPA TOB TCB TCOMMA TSEMIC
%token TIF TELSE TFOR
%token TINT TFLOAT TD TCHAR
%token TPLUS TMINUS TDIV TMULT
%token TLT TGT TAND TOR TEQUAL TNE
%token<int_val> TINTEGER
%token<d_val> TDOUBLE
%token<strng> TID
%token TOPA
%start program
%%
program : command_list
command_list : declaration
| command_list declaration
declaration : function_dec
| variable_dec
| expression
variable_dec : type identifier TSEMIC
| type assignment TSEMIC
assignment : identifier TASSIGN expression
var_list : expression
| var_list TCOMMA expression
|
function_dec : type identifier TOPA p_list TCPA scope
p_list : variable_dec
| p_list TCOMMA variable_dec
type : TINT
| TD
| TFLOAT
| TCHAR
/* inside function scope */
scope : TOB command_list TCB
| TOB TCB
function_call : identifier TOPA var_list TCPA TSEMIC
/* expression rules */
expression : assignment
| function_call
| TOPA expression TCPA
| constant arithmetic_op expression
| constant logical_op expression
| constant
constant : identifier
| num
identifier : TID
num : TINTEGER
| TDOUBLE
arithmetic_op : TPLUS | TMINUS | TDIV | TMULT
logical_op : TLT | TGT | TAND | TOR | TEQUAL | TNE
%%
int main(int, char**){
int y=0;
do{
y=yyparse();
}while(y);
}
void yyerror(const char *s){
cout << "parse error! Message: " << s << endl;
exit(-1);
}
Bison Report:
State 19 conflicts: 1 shift/reduce
Grammar
0 $accept: program $end
1 program: command_list
2 command_list: declaration
3 | command_list declaration
4 declaration: function_dec
5 | variable_dec
6 | expression
7 variable_dec: type identifier TSEMIC
8 | type assignment TSEMIC
9 assignment: identifier TASSIGN expression
10 var_list: expression
11 | var_list TCOMMA expression
12 | /* empty */
13 function_dec: type identifier TOPA p_list TCPA scope
14 p_list: variable_dec
15 | p_list TCOMMA variable_dec
16 type: TINT
17 | TD
18 | TFLOAT
19 | TCHAR
20 scope: TOB command_list TCB
21 | TOB TCB
22 function_call: identifier TOPA var_list TCPA TSEMIC
23 expression: assignment
24 | function_call
25 | TOPA expression TCPA
26 | constant arithmetic_op expression
27 | constant logical_op expression
28 | constant
29 constant: identifier
30 | num
31 identifier: TID
32 num: TINTEGER
33 | TDOUBLE
34 arithmetic_op: TPLUS
35 | TMINUS
36 | TDIV
37 | TMULT
38 logical_op: TLT
39 | TGT
40 | TAND
41 | TOR
42 | TEQUAL
43 | TNE
Terminals, with rules where they appear
$end (0) 0
error (256)
TLINEC (258)
TBLOCKC (259)
TPRINT (260)
TASSIGN (261) 9
TCPA (262) 13 22 25
TOB (263) 20 21
TCB (264) 20 21
TCOMMA (265) 11 15
TSEMIC (266) 7 8 22
TIF (267)
TELSE (268)
TFOR (269)
TINT (270) 16
TFLOAT (271) 18
TD (272) 17
TCHAR (273) 19
TPLUS (274) 34
TMINUS (275) 35
TDIV (276) 36
TMULT (277) 37
TLT (278) 38
TGT (279) 39
TAND (280) 40
TOR (281) 41
TEQUAL (282) 42
TNE (283) 43
TINTEGER (284) 32
TDOUBLE (285) 33
TID (286) 31
TOPA (287) 13 22 25
Nonterminals, with rules where they appear
$accept (33)
on left: 0
program (34)
on left: 1, on right: 0
command_list (35)
on left: 2 3, on right: 1 3 20
declaration (36)
on left: 4 5 6, on right: 2 3
variable_dec (37)
on left: 7 8, on right: 5 14 15
assignment (38)
on left: 9, on right: 8 23
var_list (39)
on left: 10 11 12, on right: 11 22
function_dec (40)
on left: 13, on right: 4
p_list (41)
on left: 14 15, on right: 13 15
type (42)
on left: 16 17 18 19, on right: 7 8 13
scope (43)
on left: 20 21, on right: 13
function_call (44)
on left: 22, on right: 24
expression (45)
on left: 23 24 25 26 27 28, on right: 6 9 10 11 25 26 27
constant (46)
on left: 29 30, on right: 26 27 28
identifier (47)
on left: 31, on right: 7 9 13 22 29
num (48)
on left: 32 33, on right: 30
arithmetic_op (49)
on left: 34 35 36 37, on right: 26
logical_op (50)
on left: 38 39 40 41 42 43, on right: 27
state 18
26 expression: constant . arithmetic_op expression
27 | constant . logical_op expression
28 | constant . [$end, TCPA, TCB, TCOMMA, TSEMIC, TINT, TFLOAT, TD, TCHAR, TINTEGER, TDOUBLE, TID, TOPA]
34 arithmetic_op: . TPLUS
35 | . TMINUS
36 | . TDIV
37 | . TMULT
38 logical_op: . TLT
39 | . TGT
40 | . TAND
41 | . TOR
42 | . TEQUAL
43 | . TNE
TPLUS shift, and go to state 26
TMINUS shift, and go to state 27
TDIV shift, and go to state 28
TMULT shift, and go to state 29
TLT shift, and go to state 30
TGT shift, and go to state 31
TAND shift, and go to state 32
TOR shift, and go to state 33
TEQUAL shift, and go to state 34
TNE shift, and go to state 35
$default reduce using rule 28 (expression)
arithmetic_op go to state 36
logical_op go to state 37
state 19
9 assignment: identifier . TASSIGN expression
22 function_call: identifier . TOPA var_list TCPA TSEMIC
29 constant: identifier . [$end, TCPA, TCB, TCOMMA, TSEMIC, TINT, TFLOAT, TD, TCHAR, TPLUS, TMINUS, TDIV, TMULT, TLT, TGT, TAND, TOR, TEQUAL, TNE, TINTEGER, TDOUBLE, TID, TOPA]
TASSIGN shift, and go to state 38
TOPA shift, and go to state 39
TOPA [reduce using rule 29 (constant)]
$default reduce using rule 29 (constant)
state 20
30 constant: num .
$default reduce using rule 30 (constant)
According to the output of bison, in state 19 it is possible to reduce identifier to constant (and from there to expression) with a lookahead of (. How can this be possible? In other words, under what circumstances can expression be followed by an open parenthesis?
A search through the grammar reveals only three uses of TOPA. Two of them (function declarations and function calls) follow identifier and identifier cannot derive expression, so it must be the third one:
expression: TOPA expression TCPA;
However, the only way that a reduction of expression could occur immediately before this instance of ( is if it were possible for two expressions to occur consecutively. Normally, in C-like languages that possibility is eliminated by requiring a ; to separate statements (which might be, start with, or end with expression), and I suppose that was your intention.
However, we see that:
command_list: declaration
| command_list declaration
declaration: expression
which allows two consecutive expressions without intervening semicolon.
As always, I encourage the use of more readable tokens in a bison grammar. '(' is much easier to understand than TOPA, and I honestly have no idea what COB might be. But it's a question of style.

Webcrawler - Duplicates and weird count

I've borrowed some code from "Expert F# 2.0" that shows how to build a web crawler using MailboxProcessor. As you can see, I have a print expression at line 23 that prints the current number of URLs in the visited Set. The number of URLs to crawl is also limited: the highest count printed is 49.
open System
open System.Net
open System.Text.RegularExpressions
open Microsoft.FSharp.Control.WebExtensions
let getLinks (txt:string) =
    [ for m in Regex.Matches(txt, "href=\s*\"[^\"h]*(http://[^&\"]*)\"") -> m.Groups.Item(1).Value ]
let collectLinks (url:string) =
    async { let web = new WebClient()
            let! data = web.AsyncDownloadString <| Uri url
            let links = getLinks data
            return links }
let urlCollector =
    MailboxProcessor.Start(fun self ->
        let rec waitForUrl (visited : Set<string>) =
            async { // Checks whether we have reached the limit of pages to crawl
                    if visited.Count < 50 then
                        // Waits for a URL...
                        let! url = self.Receive()
                        printfn "%A | %A" visited.Count url
                        // If not the URL already has been crawled...
                        if not (visited.Contains url) then
                            // Start
                            do! Async.StartChild(
                                    async { let! links = collectLinks url
                                            Seq.iter self.Post links}) |> Async.Ignore
                        return! waitForUrl (visited.Add url) }
        waitForUrl Set.empty)
urlCollector.Post "http://news.google.com/"
That seems all right, eh? But now the output looks like this:
0 | "http://news.google.com/"
1 | "http://www.gstatic.com/news/img/favicon.ico"
2 | "http://mail.google.com/mail/?tab=nm"
3 | "http://www.google.com/intl/en/options/"
4 | "http://docs.google.com/?tab=no"
5 | "http://www.google.com/reader/?tab=ny"
6 | "http://sites.google.com/?tab=n3"
7 | "http://www.google.com/intl/en/options/"
7 | "http://www.google.com/preferences?hl=en"
8 | "http://www.guardian.co.uk/uk/2011/aug/07/tottenham-riots-police-had-not-anticipated-violence"
9 | "http://www.bloomberg.com/news/2011-08-07/london-rioters-clash-with-police-loot-in-tottenham-after-shooting-death.html"
10 | "http://www.hindustantimes.com/Rioters-battle-police-after-shooting-protest/Article1-730371.aspx"
11 | "http://www.telegraph.co.uk/news/uknews/crime/8687177/Tottenham-riot-live.html"
12 | "http://www.guardian.co.uk/uk/2011/aug/07/tottenham-riots-police-had-not-anticipated-violence"
12 | "http://www.montrealgazette.com/London+wakes+riot+aftermath/5218849/story.html"
13 | "http://themediablog.typepad.com/the-media-blog/2011/08/daily-mail-tottenham-violence-twitter.html"
14 | "http://en.wikipedia.org/wiki/2011_Tottenham_riots"
15 | "http://www.babnet.net/festivaldetail-37897.asp"
16 | "http://www.youtube.com/watch?v=l9UImSbegj4"
17 | "http://www.babnet.net/festivaldetail-37897.asp"
17 | "http://www.youtube.com/watch?v=l9UImSbegj4"
17 | "http://www.telegraph.co.uk/news/uknews/crime/8687177/Tottenham-riot-live.html"
17 | "http://www.telegraph.co.uk/news/uknews/crime/8687177/Tottenham-riot-live.html"
17 | "http://www.guardian.co.uk/uk/2011/aug/07/tottenham-riots-police-had-not-anticipated-violence"
17 | "http://www.guardian.co.uk/uk/2011/aug/07/tottenham-riots-police-had-not-anticipated-violence"
17 | "http://www.bbc.co.uk/news/uk-14436001"
18 | "http://www.bbc.co.uk/news/uk-14436001"
18 | "http://www.kbc.co.ke/news.asp?nid=71755"
19 | "http://www.kbc.co.ke/news.asp?nid=71755"
19 | "http://news.sky.com/skynews/Home/UK-News/Tottenham-Riots-Simmering-Anger-Erupts-In-North-London-After-Protest-At-Mans-Shooting-Death/Article/201108116045172?f=rss"
20 | "http://news.sky.com/skynews/Home/UK-News/Tottenham-Riots-Simmering-Anger-Erupts-In-North-London-After-Protest-At-Mans-Shooting-Death/Article/201108116045172?f=rss"
20 | "http://www.irishtimes.com/newspaper/breaking/2011/0807/breaking2.html?via=mr"
21 | "http://www.irishtimes.com/newspaper/breaking/2011/0807/breaking2.html?via=mr"
21 | "http://www.cbc.ca/news/world/story/2011/08/07/tottenham-riot.html"
22 | "http://www.cbc.ca/news/world/story/2011/08/07/tottenham-riot.html"
22 | "http://www.newsday.com/news/police-officer-hospitalized-7-injured-in-uk-riot-1.3079769"
23 | "http://www.newsday.com/news/police-officer-hospitalized-7-injured-in-uk-riot-1.3079769"
23 | "http://www.msnbc.msn.com/id/44049721/ns/world_news-europe/"
24 | "http://www.msnbc.msn.com/id/44049721/ns/world_news-europe/"
24 | "http://www.timeslive.co.za/world/2011/08/07/eight-london-police-hospitalised-after-riots"
25 | "http://www.timeslive.co.za/world/2011/08/07/eight-london-police-hospitalised-after-riots"
25 | "http://www.cnn.com/2011/WORLD/europe/08/07/uk.riots/"
26 | "http://www.cnn.com/2011/WORLD/europe/08/07/uk.riots/"
26 | "http://www.dailymail.co.uk/news/article-2023348/Tottenham-anarchy-Grim-echo-1985-Broadwater-farm-riot.html"
27 | "http://www.dailymail.co.uk/news/article-2023348/Tottenham-anarchy-Grim-echo-1985-Broadwater-farm-riot.html"
27 | "http://www.mirror.co.uk/news/top-stories/2011/08/06/tottenham-riot-protesters-torch-police-cars-shops-and-a-bus-115875-23325724/"
28 | "http://www.mirror.co.uk/news/top-stories/2011/08/06/tottenham-riot-protesters-torch-police-cars-shops-and-a-bus-115875-23325724/"
28 | "http://www.theglobeandmail.com/news/world/images-of-the-destruction-from-londons-tottenham-riots/article2122026/"
29 | "http://www.theglobeandmail.com/news/world/images-of-the-destruction-from-londons-tottenham-riots/article2122026/"
29 | "http://thelede.blogs.nytimes.com/2011/08/06/shops-and-cars-burn-in-anti-police-riot-in-london/"
30 | "http://thelede.blogs.nytimes.com/2011/08/06/shops-and-cars-burn-in-anti-police-riot-in-london/"
30 | "http://www.stuff.co.nz/world/5403614/Crowds-attack-police-after-UK-protest"
31 | "http://www.stuff.co.nz/world/5403614/Crowds-attack-police-after-UK-protest"
31 | "http://www.google.com/hostednews/afp/article/ALeqM5jOCV_DVSYR1S50v6vdSBjsR5H9Jw?docId=CNG.36dce69df0a155bfd2fa1a3a5f92f6e1.5c1"
32 | "http://www.google.com/hostednews/afp/article/ALeqM5jOCV_DVSYR1S50v6vdSBjsR5H9Jw?docId=CNG.36dce69df0a155bfd2fa1a3a5f92f6e1.5c1"
32 | "http://fallenscoop.com/16993/tottenham-riot-2011-north-london-burns-after-protest-of-mark-duggan"
33 | "http://fallenscoop.com/16993/tottenham-riot-2011-north-london-burns-after-protest-of-mark-duggan"
33 | "http://www.thedailybeast.com/cheats/2011/08/07/riots-grip-north-london.html"
34 | "http://www.thedailybeast.com/cheats/2011/08/07/riots-grip-north-london.html"
34 | "http://www.thehindu.com/news/article2333142.ece"
35 | "http://www.sfgate.com/cgi-bin/article.cgi?f=/g/a/2011/08/07/bloomberg1376-LPHCT11A1I4H01-3ULNPF643I4ERSIU09MO54CQ4B.DTL"
36 | "http://online.wsj.com/community/groups/question-day-229/topics/do-you-agree-sps-decision?commentid=2864110"
37 | "http://www.businessweek.com/ap/financialnews/D9OUMJVO1.htm"
38 | "http://www.cnn.com/2011/BUSINESS/08/06/global.economy.cnn/"
39 | "http://www.chicagotribune.com/news/opinion/editorials/ct-edit-credit-20110806,0,6468631.story"
40 | "http://www.foxbusiness.com/markets/2011/08/07/treasury-hits-back-against-sp-downgrade/"
41 | "http://en.wikipedia.org/wiki/Standard_%26_Poor%27s"
42 | "http://www.usatoday.com/money/companies/management/2011-08-07-verizon-strike_n.htm"
43 | "http://www.businessweek.com/ap/financialnews/D9OV028O3.htm"
44 | "http://www.nbcnewyork.com/news/local/Verizon-Workers-Demonstrate-in-Manhattan-Part-of-45K-Worker-Strike-127087478.html"
45 | "http://www.poughkeepsiejournal.com/article/20110807/NEWS03/110807003/45K-Verizon-workers-strike-over-new-labor-contract-?odyssey=tab%7Ctopnews%7Ctext%7CPoughkeepsieJournal.com"
46 | "http://www.nypost.com/p/news/national/verizon_hit_by_strike_Ga9JjKphZrKCEAr608bqkI"
47 | "http://www.nytimes.com/2011/08/07/us/07verizon.html"
48 | "http://www.ctv.ca/CTVNews/World/20110807/afghanistan-helicopter-crash-fighting-110807/"
49 | "http://abcnews.go.com/International/nato-crash-team-seal-members-killed-afghanistan/story?id=14249189"
What's up with all the duplicates? Also, why do some of them print the same "current URLs in visited Set" count (like 17, 33, 34, etc.)? I'm pretty sure I'm missing something fundamental, but I can't figure out what.
In your snippet, the printing using printfn is done before you check if the URL is already present in the set. This means that it will print the URL even if it will not be added in the next step. (You can see that it wasn't added if you look at the numbers in the left column - if the count wasn't incremented, the number on the next line is the same).
Moving printfn to the body of the if expression should give the expected results:
// Waits for a URL...
let! url = self.Receive()
// If not the URL already has been crawled...
if not (visited.Contains url) then
printfn "%A | %A" visited.Count url
// Start
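The effect of moving the print inside the check can be shown with a small stand-alone sketch. It is written in Python and replaces the mailbox with a plain list of incoming URLs, so it only models the logging logic, not the asynchronous crawling. The function name crawl_log and its limit parameter are assumptions made for this illustration.

```python
def crawl_log(incoming_urls, limit=50):
    """Return (count, url) pairs, logged only for URLs not seen before."""
    visited = set()
    log = []
    for url in incoming_urls:
        if len(visited) >= limit:
            break  # same stop condition as visited.Count < 50
        if url not in visited:
            # Log inside the check, using the count before the URL is added.
            log.append((len(visited), url))
            visited.add(url)
    return log


if __name__ == "__main__":
    for count, url in crawl_log(["a", "b", "a", "c", "b"]):
        print(count, "|", url)
```

Each URL now appears once, and the count in the left column increases by one on every line.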
