I am looking to write a small firefox add-on that detects when files that were downloaded are (or have been) deleted locally and removes the corresponding entry in the firefox download list.
Can anybody point me to the relevant API to manipulate the download list? I cannot seem to find it.
The relevant API is PlacesUtils, which abstracts away the complexity of the Places database.
If your code runs in the context of a chrome window then you get a PlacesUtils global variable for free. Otherwise (bootstrapped, Add-on SDK, whatever) you have to import PlacesUtils.jsm:
Cu.import("resource://gre/modules/PlacesUtils.jsm");
As far as Places is concerned, downloaded files are nothing more than a special kind of visited page, annotated accordingly. Getting an array of all downloaded files is a matter of just one line of code:
var results = PlacesUtils.annotations.getAnnotationsWithName("downloads/destinationFileURI");
Since we asked for the destinationFileURI annotation, each element of the result array holds the download location in its annotationValue property, as a file: URI spec string.
With that you can check if the file actually exists:
function getFileFromURIspec(fileurispec){
    // if Services is not available in your context: Cu.import("resource://gre/modules/Services.jsm");
    var filehandler = Services.io.getProtocolHandler("file").QueryInterface(Ci.nsIFileProtocolHandler);
    try {
        return filehandler.getFileFromURLSpec(fileurispec);
    }
    catch (e) {
        return null;
    }
}
getFileFromURIspec will return an instance of nsIFile, or null if the spec is invalid (which shouldn't happen in this case, but a sanity check never hurts). With that you can call the exists() method, and if it returns false then the associated page entry in Places is eligible for removal. We can tell which page that is by its uri, which conveniently is also a property of each element of the results:
PlacesUtils.bhistory.removePage(result.uri);
To sum it up:
var results = PlacesUtils.annotations.getAnnotationsWithName("downloads/destinationFileURI");
results.forEach(function(result){
    var file = getFileFromURIspec(result.annotationValue);
    if (!file) {
        // I don't know how you should treat this edge case
        // ask the user, just log, remove, some combination?
    }
    else if (!file.exists()) {
        PlacesUtils.bhistory.removePage(result.uri);
    }
});
I just want to fetch all my liked videos (~25k items). As far as my research goes, this is not possible via the YouTube v3 API.
I have already found multiple issues (issue, issue) on the same problem; some claim to have fixed it, but it only works for them because they don't have more than 5000 items in their liked video list.
The playlistItems list API endpoint with the playlist ID set to "liked videos" (LL) has a limit of 5000 items.
The videos list API endpoint has a limit of 1000 items.
Unfortunately those endpoints don't provide parameters that I could use to paginate the requests myself (e.g. give me all the liked videos between date x and y), so I'm forced to take the provided order (which I can't get past 5k entries).
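For reference, the pagination itself is the standard pageToken loop; a minimal sketch (assuming an OAuth access token with the youtube.readonly scope in accessToken), which stops yielding a nextPageToken around item 5000:

// Page through the "liked videos" playlist (LL) until nextPageToken runs out.
// accessToken is a placeholder for an OAuth token with youtube.readonly scope.
async function fetchLikedVideos(accessToken) {
    const items = [];
    let pageToken = "";
    do {
        const url = "https://www.googleapis.com/youtube/v3/playlistItems" +
            "?part=snippet&playlistId=LL&maxResults=50" +
            (pageToken ? "&pageToken=" + pageToken : "");
        const res = await fetch(url, { headers: { Authorization: "Bearer " + accessToken } });
        const data = await res.json();
        items.push(...(data.items || []));
        pageToken = data.nextPageToken; // stops being returned around item ~5000
    } while (pageToken);
    return items;
}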
Is there any possibility I can fetch all my likes via the API?
More thoughts on the reply from @Yarin_007:
If there are deleted videos in the timeline they appear as "Liked https://...url", and the script doesn't like that format: it fails because the underlying elements don't have the same structure as existing videos.
This can easily be fixed with a try/catch:
function collector(all_cards) {
    var liked_videos = {};
    all_cards.forEach(card => {
        try {
            // ignore Dislikes
            if (card.innerText.split("\n")[1].startsWith("Liked")) {
                // .... (same parsing as in the full script below)
            }
        }
        catch {
            console.log("error, probably a deleted video")
        }
    })
    return liked_videos;
}
To scroll down to the bottom of the page I've used this simple script; no need to spin up something big:
var millisecondsToWait = 1000;
setInterval(function() {
    window.scrollTo(0, document.body.scrollHeight);
    console.log("scrolling")
}, millisecondsToWait);
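If you prefer it to stop by itself, here is a small variant (an untested sketch) that clears the interval once the page height stops growing:

var millisecondsToWait = 2000;
var lastHeight = 0;
var timer = setInterval(function() {
    window.scrollTo(0, document.body.scrollHeight);
    if (document.body.scrollHeight === lastHeight) {
        // no growth since the last tick - assume we reached the bottom
        // (increase millisecondsToWait if the next batch loads slowly)
        clearInterval(timer);
        console.log("done scrolling");
    }
    lastHeight = document.body.scrollHeight;
}, millisecondsToWait);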
If more people want to retrieve this kind of data, one could think about building a proper script that is more convenient to use. If you check the network requests you can find the desired data in the responses of requests called batchexecute. One could copy the authentication from one of them and provide it to a script that queries those endpoints and prepares the data like the script I currently inject manually.
Hmm, perhaps Google Takeout?
I have verified the YouTube data contains a CSV called "liked videos.csv". The header is Video Id,Time Added and the rows are, for example:
dQw4w9WgXcQ,2022-12-18 23:42:19 UTC
prvXCuEA1lw,2022-12-24 13:22:13 UTC
So you would need to retrieve video metadata per video ID. Not too bad though.
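That lookup batches nicely: the videos list endpoint accepts up to 50 comma-separated IDs per call, so ~25k videos is roughly 500 requests. A rough sketch (apiKey is a placeholder for a YouTube Data API v3 key; the IDs come from the CSV's Video Id column):

// Fetch snippet + contentDetails for a list of video IDs, 50 per request.
async function fetchVideoMetadata(videoIds, apiKey) {
    const results = [];
    for (let i = 0; i < videoIds.length; i += 50) {
        const batch = videoIds.slice(i, i + 50).join(",");
        const url = "https://www.googleapis.com/youtube/v3/videos" +
            "?part=snippet,contentDetails&id=" + batch + "&key=" + apiKey;
        const res = await fetch(url);
        const data = await res.json();
        results.push(...(data.items || [])); // deleted/private videos are simply absent
    }
    return results;
}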
Note: the export could take a while, especially with 25k videos. (Select only YouTube data.)
I also had an idea that involves scraping the actual liked videos page (which would save you 25k HTTP requests), but I'm unsure if it breaks with more than 5000 songs. Also, emulating the POST requests on that page may prove quite difficult, albeit not impossible (they fetch /browse?key=..., and have some kind of obfuscated / encrypted base64 strings in the request body, among other parameters).
EDIT:
Look. There's probably a normal way to get a complete dump of all your Google data (I mean, other than Takeout. Email them? I don't know.)
Anyway, the following is the other idea...
Follow this deep link to your liked videos history.
Scroll to the bottom... maybe with Selenium, maybe with AutoIt, maybe put something on the "End" key of your keyboard, until you reach your first liked video.
Hit F12 and run this in the developer console:
// https://www.youtube.com/watch?v=eZPXmCIQW5M
// https://myactivity.google.com/page?utm_source=my-activity&hl=en&page=youtube_likes
// go over all "cards" in the activity webpage (after scrolling down to the absolute bottom of it)
// create a dictionary - the key is the video ID, the value is a list of the video's properties
function collector(all_cards) {
    var liked_videos = {};
    all_cards.forEach(card => {
        // ignore Dislikes
        if (card.innerText.split("\n")[1].startsWith("Liked")) {
            // horrible parsing. your mileage may vary. I tried to avoid using any gibberish class names.
            let a_links = card.querySelectorAll("a")
            let details = a_links[0];
            let url = details.href.split("?v=")[1]
            let video_length = a_links[3].innerText;
            let time = a_links[2].parentElement.innerText.split(" • ")[0];
            let title = details.innerText;
            let date = card.closest("[data-date]").getAttribute("data-date")
            liked_videos[url] = [title, video_length, date, time];
            // console.log(title, video_length, date, time, url);
        }
    })
    return liked_videos;
}
// https://stackoverflow.com/questions/57709550/how-to-download-text-from-javascript-variable-on-all-browsers
function download(filename, text, type = "text/plain") {
    // Create an invisible A element
    const a = document.createElement("a");
    a.style.display = "none";
    document.body.appendChild(a);
    // Set the HREF to a Blob representation of the data to be downloaded
    a.href = window.URL.createObjectURL(
        new Blob([text], { type })
    );
    // Use the download attribute to set the desired file name
    a.setAttribute("download", filename);
    // Trigger the download by simulating a click
    a.click();
    // Cleanup
    window.URL.revokeObjectURL(a.href);
    document.body.removeChild(a);
}
function main() {
    // gather relevant elements
    var all_cards = document.querySelectorAll("div[aria-label='Card showing an activity from YouTube']")
    var liked_videos = collector(all_cards)
    // download json
    download("liked_videos.json", JSON.stringify(liked_videos))
}
main()
Basically it gathers all the liked videos' details and creates a key: video_ID, value: [title, video_length, date, time] entry for each liked video.
It then automatically downloads the JSON as a file.
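If you want to sanity-check the resulting file, something like this works in Node (a sketch, assuming the liked_videos.json filename from above):

// Load the downloaded JSON and print a quick summary of the first few entries.
const fs = require("fs");
const liked = JSON.parse(fs.readFileSync("liked_videos.json", "utf8"));
console.log("total liked videos:", Object.keys(liked).length);
for (const [id, [title, length, date, time]] of Object.entries(liked).slice(0, 5)) {
    console.log(id, "->", title, "(" + length + ")", date, time);
}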
I'm writing a function that will parse certain websites and fetch data from there, which will be used to create instances of a class. I'm able to successfully extract the data when it is retrieved using the getElementById() function, but for some reason getElementsByClassName() always returns a node list with 0 elements.
The site I'm currently parsing is here.
If you search for 'datas-nev', you will find exactly one match:
<p class="datas-nev"><b>Kutya neve: </b>Jhonny</p>
And here is the code I use for parsing:
import 'package:html/parser.dart' show parse;
...
final response = await http.get(URL);
var document = parse(response.body);
var detailsContainer = document.getElementById('husky_details_container_right');
var dogName = new List<Node>();
dogName = document.getElementsByClassName('datas-nev');
The contents of the detailsContainer can be extracted successfully; for example, this gives me back a string of relevant data I will use later:
var humanBehaviourValue;
try { humanBehaviourValue = detailsContainer.nodes[1].nodes[19].nodes[1].nodes[7].nodes[1].toString(); }
catch (e) { humanBehaviourValue = 'N/A'; }
But when I check the value of dogName in the debug window, I get the following:
dogName = {_growableList} size = 0
I already tried initializing dogName 'properly' with List<Node> dogName = new List<Node>(); but it didn't help. I also tried other datas-* values, but it seems the parser can't find them. I even tried using just datas (because that is a div, while the others are paragraphs), but that didn't help either.
Basically I could just hardwire the name and some data (breed, color, etc) as those never really change, but the location of the shelter can change, and keeping it up-to-date by scraping the data seems better than pushing updates out manually. That means I mostly need the value of datas-helyszin but that isn't parsed either.
As @Günter Zöchbauer pointed out, the code actually works. I was just looking for the value too soon, before it was actually fetched...
I've written a console app using the Umbraco (7.1.4) contentService API to move some nodes and rename them in a site redesign. It all works fine except when I rename the document the 'Link to Document' doesn't change. The code is adapted from https://github.com/sitereactor/umbraco-console-example.
private static void MoveNode(IContentService contentService, int nodeId, int newParentId, string newname)
{
    //Get the Root Content
    var nodeContent = contentService.GetByIds(nodeId.AsEnumerableOfOne()).First();
    nodeContent.Name = newname;
    contentService.Move(nodeContent, newParentId);
    var status = contentService.SaveAndPublishWithStatus(nodeContent);
    Console.WriteLine(status);
}
Status is True and the page name is changed when I look at it in the back office, but the 'Link to Document' doesn't change. Now if I use
var status = contentService.PublishWithChildrenWithStatus(nodeContent);
then it works, but it takes a lot longer (minutes), whereas if I change the name in the back office it only takes seconds and the links are updated correctly. Is there another way to rename a document without publishing all the children?
(I've left out a bit of code in the above - sometimes it moves, sometimes it just renames, but in either case I have to publish with the children to get it to work.)
It seems this happens because the XML cache file has not been updated.
Have a look here for a few ways of doing it:
http://our.umbraco.org/wiki/reference/api-cheatsheet/publishing-and-republishing
Or the quickest: just delete App_Data\umbraco.config (the XML cache file). Probably not recommended for prod, but it gets things working quickly in dev.
Hello! When I try Mozilla's validator on my add-on, I get the following error related to my treatment of clipboard usage:
nsITransferable has been changed in Gecko 16.
Warning: The nsITransferable interface has changed to better support Private Browsing Mode. After instantiating the object, you should call the init function on it before any other functions are called. See https://developer.mozilla.org/en-US/docs/Using_the_Clipboard for more information.
var trans = Components.classes["@mozilla.org/widget/transferable;1"].createInstance(Components.interfaces.nsITransferable);
if ('init' in trans){ trans.init(null);};
I can't understand this.
Here is my code - I am clearly calling trans.init:
var clip = Components.classes["@mozilla.org/widget/clipboard;1"].getService(Components.interfaces.nsIClipboard);
if (!clip) return "";
var trans = Components.classes["@mozilla.org/widget/transferable;1"].createInstance(Components.interfaces.nsITransferable);
if ('init' in trans){ trans.init(null);}; //<--IT DOESN'T LIKE THIS
if (!trans) return false;
trans.addDataFlavor("text/unicode");
I've also tried the Transferable function from Mozilla's example here, but get the same non-validation report.
One of the Mozilla AMO editors told me to write exactly this, and it still doesn't validate.
I've also tried, simply:
var trans = Components.classes["@mozilla.org/widget/transferable;1"].createInstance(Components.interfaces.nsITransferable);
trans.init(null); //<---LOOK HERE
if (!trans) return false;
trans.addDataFlavor("text/unicode");
The validator does not report any errors - just this warning. Everything works properly. Mozilla updated their Gecko engine, and they want developers to match up to the new standard.
In my usage, we want to be able to use the contents of the clipboard that was probably put there from outside the application too, so we do want to call the init function with null instead of a window.
Any advice would be wonderful!
trans.init(null) is valid in some circumstances, such as yours. It can also cause privacy leaks if used in the wrong circumstances, so the validator flags all uses of it as potentially requiring changing. Therefore, it is a warning that you can ignore in this case.
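For completeness: when the copy does originate from a specific window (so private browsing status matters), the pattern described on the Using the Clipboard page referenced by the warning is roughly the following (a sketch; not needed in your init(null) case):

var trans = Components.classes["@mozilla.org/widget/transferable;1"].createInstance(Components.interfaces.nsITransferable);
if ('init' in trans) {
    // Tie the transferable to the window's load context so clipboard data
    // from a private window isn't persisted.
    var loadContext = window.QueryInterface(Components.interfaces.nsIInterfaceRequestor)
                            .getInterface(Components.interfaces.nsIWebNavigation)
                            .QueryInterface(Components.interfaces.nsILoadContext);
    trans.init(loadContext);
}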
I'm building an nsIProtocolHandler implementation in Delphi. (more here)
And it's working already. Data the module builds gets streamed over an nsIInputStream. I've got all the nsIRequest, nsIChannel and nsIHttpChannel methods and properties working.
I've started testing and ran into something strange. I have a page "a.html" with this simple HTML:
<img src="a.png">
Both "xxm://test/a.html" and "xxm://test/a.png" work in Firefox, and give above HTML or the PNG image data.
The problem is that when displaying the HTML page, the image doesn't get loaded. When I debug, I see:
NewChannel gets called for a.png, (when Firefox is processing an OnDataAvailable notice on a.html),
NotificationCallbacks is set (I only need to keep a reference, right?)
RequestHeader "Accept" is set to "image/png,image/*;q=0.8,*/*;q=0.5"
but then, the channel object is released (most probably due to a zero reference count)
Looking at other requests, I would expect some other properties to get set (such as LoadFlags or OriginalURI) and AsyncOpen to get called, from where I can start responding to the request.
Does anybody recognise this? Am I doing something wrong? Perhaps with LoadFlags or the LoadGroup? I'm not sure when to call AddRequest and RemoveRequest on the LoadGroup, and peeking at nsHttpChannel and nsBaseChannel I'm not sure whether it's better to call RemoveRequest early or late (before or after OnStartRequest or OnStopRequest).
Update: Checked on the freshly released Firefox 3.5; still the same.
Update: To try to further isolate the issue, I tried "file://test/a1.html" with <img src="xxm://test/a.png" /> and still only get the above sequence of events. If I'm supposed to add this secondary request to a load group to get AsyncOpen called on it, I have no idea where to get a reference to one.
There's more: I find only one instance of the "Accept" string that gets added to the request headers; it queries for nsIHttpChannelInternal right after creating a new channel, but I don't even get this QueryInterface call through... (I posted it here)
Me again.
I am going to quote the same stuff from nsIChannel::asyncOpen():
If asyncOpen returns successfully, the channel is responsible for keeping itself alive until it has called onStopRequest on aListener or called onChannelRedirect.
If you go back to nsViewSourceChannel.cpp, there's one place where loadGroup->AddRequest is called and two places where loadGroup->RemoveRequest is being called.
nsViewSourceChannel::AsyncOpen(nsIStreamListener *aListener, nsISupports *ctxt)
{
    NS_ENSURE_TRUE(mChannel, NS_ERROR_FAILURE);

    mListener = aListener;

    /*
     * We want to add ourselves to the loadgroup before opening
     * mChannel, since we want to make sure we're in the loadgroup
     * when mChannel finishes and fires OnStopRequest()
     */
    nsCOMPtr<nsILoadGroup> loadGroup;
    mChannel->GetLoadGroup(getter_AddRefs(loadGroup));
    if (loadGroup)
        loadGroup->AddRequest(NS_STATIC_CAST(nsIViewSourceChannel*, this), nsnull);

    nsresult rv = mChannel->AsyncOpen(this, ctxt);

    if (NS_FAILED(rv) && loadGroup)
        loadGroup->RemoveRequest(NS_STATIC_CAST(nsIViewSourceChannel*, this),
                                 nsnull, rv);

    if (NS_SUCCEEDED(rv)) {
        mOpened = PR_TRUE;
    }

    return rv;
}
and
nsViewSourceChannel::OnStopRequest(nsIRequest *aRequest, nsISupports* aContext,
                                   nsresult aStatus)
{
    NS_ENSURE_TRUE(mListener, NS_ERROR_FAILURE);
    if (mChannel)
    {
        nsCOMPtr<nsILoadGroup> loadGroup;
        mChannel->GetLoadGroup(getter_AddRefs(loadGroup));
        if (loadGroup)
        {
            loadGroup->RemoveRequest(NS_STATIC_CAST(nsIViewSourceChannel*, this),
                                     nsnull, aStatus);
        }
    }
    return mListener->OnStopRequest(NS_STATIC_CAST(nsIViewSourceChannel*, this),
                                    aContext, aStatus);
}
Edit:
I have no clue about how Mozilla works, so I have to guess from reading some code. From the channel's point of view, once the original file is loaded, its job is done. If you want to load secondary items linked from the file, like an image, you have to implement that in the listener. See TestPageLoad.cpp. It implements a crude parser and retrieves child items upon OnDataAvailable:
NS_IMETHODIMP
MyListener::OnDataAvailable(nsIRequest *req, nsISupports *ctxt,
                            nsIInputStream *stream,
                            PRUint32 offset, PRUint32 count)
{
    //printf(">>> OnDataAvailable [count=%u]\n", count);
    nsresult rv = NS_ERROR_FAILURE;
    PRUint32 bytesRead = 0;
    char buf[1024];

    if (ctxt == nsnull) {
        bytesRead = 0;
        rv = stream->ReadSegments(streamParse, &offset, count, &bytesRead);
    } else {
        while (count) {
            PRUint32 amount = PR_MIN(count, sizeof(buf));
            rv = stream->Read(buf, amount, &bytesRead);
            count -= bytesRead;
        }
    }
    if (NS_FAILED(rv)) {
        printf(">>> stream->Read failed with rv=%x\n", rv);
        return rv;
    }
    return NS_OK;
}
The important thing is that it calls streamParse(), which looks at the src attribute of img and script elements, and calls auxLoad(), which creates a new channel with a new listener and calls AsyncOpen():
uriList->AppendElement(uri);
rv = NS_NewChannel(getter_AddRefs(chan), uri, nsnull, nsnull, callbacks);
RETURN_IF_FAILED(rv, "NS_NewChannel");
gKeepRunning++;
rv = chan->AsyncOpen(listener, myBool);
RETURN_IF_FAILED(rv, "AsyncOpen");
Since it passes another instance of the MyListener object in there, that listener can in turn load more child items, ad infinitum, like a Russian-doll situation.
I think I found it (myself); take a close look at this page. Why it doesn't highlight that the UUID has changed over versions isn't clear to me, but it would explain why things fail when (or just prior to) calling QueryInterface on nsIHttpChannelInternal.
With the new(er) UUID, I'm getting better results. As I mentioned in an update to the question, I've posted this on bugzilla.mozilla.org; I'm curious what response I will get there.