What kind of encoding is this URL?

A bunch of photos in a website directory has these URLs for each photo:
www.example.com/3aecbc1bf32c7615fb732d407b1b571a.jpg
www.example.com/27cbb6.jpg
My question is: is the random gibberish part some kind of encoding that can be decoded? Or is each photo really represented by these random character strings? I wish to understand the pattern so I can guess the URLs and view all the photos in the directory. Thanks.

The first string looks like an MD5 hash value. The output of the MD5 hash function is always the same length, regardless of the length of the input: 128 bits, usually written as 32 hexadecimal characters. The shorter second name does not match that length, so it is not a full MD5 digest.
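To check the length claim, here is a minimal sketch using Python's standard `hashlib`. The helper name `looks_like_md5` is made up for illustration, and it only tests the shape of a name; it cannot prove that a string really came from MD5.

```python
import hashlib
import re

# 32 hexadecimal characters is the usual text form of a 128-bit MD5 digest.
MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def looks_like_md5(name: str) -> bool:
    """Return True if `name` has the shape of an MD5 hex digest."""
    return MD5_HEX.fullmatch(name) is not None


# The digest length does not depend on the input length.
for data in (b"", b"a", b"x" * 10_000):
    print(len(hashlib.md5(data).hexdigest()))  # 32 each time

print(looks_like_md5("3aecbc1bf32c7615fb732d407b1b571a"))  # True
print(looks_like_md5("27cbb6"))  # False
```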

Related

Hash de-encryption type MD5

This is an MD5 hash, '9931BF135E464FE91E444DF4E046006A', but I can't convert it back to a string. Is there any website that can do that?
There are several reasons why what you're asking is not possible.
First, the process by which the MD5 hash code is created loses information. MD5 is a 128-bit hash code. So if you take the hash code of anything larger than 128 bits (that's 16 bytes), by definition you're going to lose information.
Related: there is an infinite number of possible strings. By the pigeonhole principle, there is more than one string that will hash to any particular MD5 value.

Best compression algorithm for Url query string

I have to pass a large URL query string, and when it exceeds a certain number of characters, it causes problems when passed in the URL.
So far I have tried deflate + base64 encoding, which gives me around 30-35% compression.
So if my query string becomes too large, say 4400 characters, it is compressed to approximately 2650 characters, which still won't fit in my URL.
I need a solution that gives better results than this one.
I have searched a lot but have not been able to find one.
Any suggestions on what else could be done would be appreciated. Thanks.
Example of my query string:
3d7821d1-e324-4cea-9bd7-763c0b62cdc2|94db7bdb-5e16-4700-a1f9-408ba7f7bee1|63360a17-0807-45a0-a798-31eb2614b0f7|9b37f302-2757-40e5-b9b4-390e5b786010|46ef6bce-c7e9-47d6-90d8-bc7c2b5784c0|e5f450a5-724b-42a0-aff9-34be2d50f59b|33db4e6b-bc53-4774-8267-759167a8dba9|30a8c7a9-0a3b-4df3-ab01-5e9b262d1902|d31086bb-98e8-41d0-a6cf-0bd48986bce7|30f27de5-1536-483a-85aa-6eb5000ba67b|41498746-3f45-4c16-9152-a6ca8355d502|6b5c643b-03f6-4390-9d54-79bf978f8e15|4537e3ba-09ed-465a-aad8-1c842084c3af|ad1161ab-0393-4a66-a538-6dda0c7b892a.....
For now, deflate + base64 does not completely solve my issue, but it improves the situation, so I have integrated it into my code.
For future work, I am considering either:
Converting the request to POST
OR
Using sequential IDs (1, 2, 3...) instead of UUIDs (the example query string is a concatenation of UUIDs), concatenating them, and passing them in a GET request.
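For reference, the deflate + base64 approach described in the question can be sketched roughly as follows in Python. The function names `pack_query` and `unpack_query` and the randomly generated sample data are assumptions for illustration; the asker's actual code and language are not shown in the question.

```python
import base64
import uuid
import zlib


def pack_query(query: str) -> str:
    """Deflate the query string, then encode it with URL-safe base64."""
    compressed = zlib.compress(query.encode("ascii"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii")


def unpack_query(token: str) -> str:
    """Reverse pack_query: base64-decode, then inflate."""
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("ascii")


# Hypothetical sample data: 100 random UUIDs joined with '|', as in the question.
sample = "|".join(str(uuid.uuid4()) for _ in range(100))
token = pack_query(sample)

print(len(sample), len(token))        # original vs. packed length
print(unpack_query(token) == sample)  # True: the round trip is lossless
```

Random UUIDs contain little redundancy beyond their hexadecimal text form, which is likely why the savings level off at roughly a third.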

trying to figure out the charset

I'm downloading a CSV from Google Docs and in it characters like “ are saved as \xE2\x80\x9C and ” are saved as \xE2\x80\x9D.
My question is... what charset are those being saved in? How might I go about figuring that out?
It is UTF-8. You can tell because decoding it as UTF-8 produces the correct characters.
UTF-8 also has a unique and very distinctive byte pattern: just 3 bytes with the highest bit set that form a valid UTF-8 sequence are enough to tell that something is UTF-8 with about 99% confidence. Even 2 such bytes forming a valid UTF-8 sequence already get you to about 90%.
If it weren't UTF-8 but some 8-bit code page instead, it would be impossible to tell by looking at the bytes alone. Without any other information, you would basically have to brute-force it by decoding the data in various 8-bit encodings and checking whether the result looks correct. The other possibility is an algorithm that goes through the encodings automatically and checks whether the result makes sense in any language.
With more information, such as the operating system and locale the file was saved in, you could greatly reduce the number of encodings to try.
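A minimal Python sketch of the "decode it and look" approach, applied to the bytes from the question. The candidate list and the helper name `try_decodings` are assumptions for illustration; a real check would use whichever encodings are plausible for the file's origin.

```python
from typing import Dict, Optional

# Candidate encodings are an assumption; extend the list for your locale.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def try_decodings(raw: bytes) -> Dict[str, Optional[str]]:
    """Decode `raw` with each candidate; None marks a failed decode."""
    results: Dict[str, Optional[str]] = {}
    for encoding in CANDIDATES:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


raw = b"\xE2\x80\x9C\xE2\x80\x9D"  # the bytes from the question
for encoding, text in try_decodings(raw).items():
    print(encoding, repr(text))
```

Only the UTF-8 result comes out as the two curly quotes. An 8-bit code page such as latin-1 accepts any byte sequence, so a successful decode there says nothing by itself; the output still has to be inspected.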

What's the characterset of SHA1?

I need to know which characters SHA1 will generate for me.
Is it possible to know the character set of SHA1? Or, if it's configurable, what is its default character set?
Thank you.
SHA-1 doesn't generate text, it generates a binary hash (like most digests), so it doesn't have a charset (or care about the input's charset for that matter).
You can represent it as text (a string representation of the hex value, and base64 are popular) if you want, especially if you need to transfer it over the network or display it to users. That encoding is up to you.
I'm fairly sure it's just binary data rather than any character encoding. You could then encode that in Base64 if you like.
The SHA-1 hash algorithm takes a stream of bytes as input and calculates a 160-bit digest. Command-line versions output the digest as a hexadecimal string. No charsets are involved.
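To illustrate the point made in these answers, here is a short Python sketch with the standard `hashlib` and `base64` modules. The input string "hello" and its UTF-8 encoding are arbitrary choices; the only character set involved is the one you pick when turning your own text into bytes.

```python
import base64
import hashlib

# The digest is raw bytes, not text.
digest = hashlib.sha1("hello".encode("utf-8")).digest()

print(len(digest))                        # 20 bytes = 160 bits
print(digest.hex())                       # 40 hexadecimal characters
print(base64.b64encode(digest).decode())  # 28 base64 characters
```

Both text forms represent the same 20 bytes; which one to use depends on where the value has to travel.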

Method for generating numerical values from a URL

In the 90s there was a toy called Barcode Battler. It scanned barcodes and, from the values, generated an RPG-like monster with various stats such as hit points, attack power, magic power, etc. Could there be a way to do a similar thing with a URL? From just an ordinary URL, generate stats like that. I was thinking of maybe taking the ASCII values of various characters in the URL and using them, but this seems too predictable and obvious.
Take the MD5 sum of the ASCII encoding of the URL? Incredibly easy to do on most platforms. That would give you 128 bits to come up with the stats from. If you want more, use a longer hash algorithm.
(I can't remember the details about what's allowed in a URL - if non-ASCII is allowed, you could use UTF-8 instead.)
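One way this suggestion might look in Python is sketched below. The stat names, the value ranges, and the choice of two digest bytes per stat are all arbitrary assumptions made for illustration, not part of the answer.

```python
import hashlib
from typing import Dict


def url_to_stats(url: str) -> Dict[str, int]:
    """Derive repeatable stats from the MD5 digest of a URL."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes
    return {
        # Two bytes per stat; the ranges below are arbitrary choices.
        "hit_points": 50 + int.from_bytes(digest[0:2], "big") % 451,   # 50..500
        "attack_power": 1 + int.from_bytes(digest[2:4], "big") % 100,  # 1..100
        "magic_power": 1 + int.from_bytes(digest[4:6], "big") % 100,   # 1..100
    }


print(url_to_stats("http://www.example.com/"))
```

The same URL always yields the same stats, while a small change to the URL changes the digest completely, which avoids the predictability of using raw character codes.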

Resources