To answer your question about people stealing your idea... if you don't patent it, then nothing stops them. But you'd still have the recognition from the open-source community, and that can lead to many lucrative relationships. Just one avenue, is all.
If your method works as well as you say, don't worry about money: throw the $6k patent fee on a credit card, get rich, buy the rockets, and give us all courtside seats.

As for your idea, if it is radically different from LZW, the problem with it is probably the pigeonhole problem. If you think you can compress a million 1s and 0s into ten 1s and 0s, the catch is that there are obviously not enough combinations of 10 bits to represent every possible combination of a million bits. So I would bet your solution only works on a small subset of all possible data; you should ask yourself how your solution gets around that.

If I had to guess what your idea is based on your posts, I would guess you are thinking of representing The Godfather in HD as one REALLY HUGE number, then finding a few smaller numbers and applying some math equation, maybe a^b + c = the huge Godfather number; then you just send out a, b, and c, which are much, much smaller than the Godfather number. If your idea is similar to that, it has already been thought of, and it doesn't work that well in practice. Google 'The Gold at the Starbow's End compression' for an explanation. Good luck, though.
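If you want to see the pigeonhole math concretely, here's a toy check in Python (just the counting argument, not anybody's actual algorithm):

```python
# Pigeonhole counting: a lossless compressor that maps every n-bit input
# to an output shorter than n bits cannot be reversible, because there
# are strictly fewer short outputs than there are inputs.
n = 1_000_000          # input length in bits
k = 10                 # claimed output length in bits

inputs = 2 ** n                               # distinct n-bit strings
outputs = sum(2 ** i for i in range(k + 1))   # strings of length 0..k

print(outputs < inputs)  # True: most inputs have no unique short encoding
```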
Just curious: what would be the size of the file if you compressed 1 GB, or 100 MB, or even 10 MB of MP3s together? And when you decompress it, can you still play it in a media player?
You lost me here... I picked 21 bits (20 to describe the length and 1 to describe the pattern), but I could argue for 1 bit in the pigeonhole fashion... lol. In a one-pass fashion you build string patterns (lengths), hash them, and merge them as they become more common. Whatever you do, you need string-length markers. You can't put 1,000,000 zeros into a compression in 7 bits, because you cannot describe the length in fewer than 20 bits unless you pigeonhole your formula; i.e., any math formula to describe it has to have a remainder that might be as long (in bits) as the number itself (almost by definition of numbers), which means the other metadata (pattern matching) takes you over your gain.
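Sanity check on the 20-bit figure, in Python:

```python
# Minimum bits needed to write the run length 1,000,000 as a plain binary
# integer: 20 bits for the length + 1 bit for the repeated symbol = 21.
print((1_000_000).bit_length())  # 20
```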
That's exactly what I thought (and I'm a computer scientist). I've never done data compression, but from the seminars I've been to, the basic approach is finding patterns in your data, then using different symbols to represent the patterns. A naive compressor would produce compressed files that contain the following two parts: (1) the pattern table, and (2) the actual compressed data.

For example, if you have a file that contains 1 million 0s followed by 1 million 1s, a naive compressor could produce a file similar to the following:

pattern table:
---------------------------
pattern id | pattern value
---------------------------
    0      |      0
    1      |      1

compressed data:
---------------------------
<integer>0<integer>1

The integer in this case would be 1000000. It takes about 32 bits on most desktop computers to store a single integer, and each pattern id is also an integer, so the data portion of the example compressed file would consist of 4 * 32 = 128 bits. However, you have to add the pattern table as well: assume you delineate the different patterns using a single character (8 bits), and since each pattern is a single character, you would then need 8 * 4 = 32 bits of storage for the pattern table. So altogether, you can use 160 bits to represent data of 16 million bits (2 million characters).

I understand that your idea is different from traditional compression technology, but I just don't see how you can compress 1 million 0s into only 7 bits, which can only represent 128 different patterns.
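Just to make the arithmetic concrete, here's a toy version of that naive compressor in Python; the 32-bit/8-bit sizes are the same assumptions as above, not a real file format:

```python
# A naive run-length encoder along the lines described above: a pattern
# table of single characters plus (count, pattern-id) pairs.
def rle(data: str):
    runs = []
    prev, count = data[0], 0
    for ch in data:
        if ch == prev:
            count += 1
        else:
            runs.append((count, prev))
            prev, count = ch, 1
    runs.append((count, prev))
    return runs

data = "0" * 1_000_000 + "1" * 1_000_000
runs = rle(data)
patterns = [p for _, p in runs]               # the "pattern table"
table_bits = 8 * (len(patterns) + len(runs))  # patterns + delimiters, 8 bits each
data_bits = 32 * 2 * len(runs)                # a 32-bit count + a 32-bit id per run
print(table_bits + data_bits)                 # 160 bits for 16,000,000 bits of input
```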
That said... right, and you haven't even included the lengths of the patterns, much less the starting points, in your table.

BTW: if y'all wanna buy some of the best compression around for when you need to patch a change... that is the software I started my career with, alongside math professors et al.: RTPatch, http://www.rtpatch.com/. This isn't normal compression: it uses a known previous version for pattern matching, and it beats pkzip by orders of magnitude. Basically, it is for software or data upgrades; Unix users know what patching is. Amazing that it has survived for 15 years. It's all about corporate clients.
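RTPatch's actual algorithm isn't public here, but the general idea of delta encoding against a known previous version can be sketched with Python's standard difflib (purely illustrative, nothing like their real engine):

```python
import difflib

# Delta encoding: the receiver already has the old version, so we ship
# only the differences, which are usually tiny compared to the new file.
old = ["line %d\n" % i for i in range(10_000)]
new = list(old)
new[1234] = "line 1234, patched\n"          # one small change

patch = list(difflib.unified_diff(old, new))
print(sum(map(len, patch)), "bytes of patch vs",
      sum(map(len, new)), "bytes for the full new file")
```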
I doubt that's how his idea works, though. I mean, each of a, b, and c has to be at least an integer, which means 32 bits each, and that's already 96 bits. I don't see how you can use 7 bits to represent that equation. However, if that is how his idea works, here is a website that explains why it wouldn't work: http://www.physicsforums.com/archive/index.php/t-39776.html
Don't forget he said that the decompression takes a long time... very long... to the point he calls it a problem. Not sure I know of a case where excellent compression is faster than decompression.
The lengths of the patterns would just be the patterns themselves, so in the example I gave, each is just a single character; of course you can have longer strings in the pattern table, and that would increase the size of the compressed file. In the example I gave, you also don't need to represent the start of the data explicitly: the patterns are delineated by single whitespace, and the data portion is preceded by a newline symbol, so you can write your code to automatically recognize the end of the pattern table / the start of the data portion.

So, using my naive compressor, a file that contains 1 million 0s followed by 1 million 1s would be:

0 1
<1000000><0><1000000><1>

(each <xxxx> represents an integer). And this would take 8 * 4 + 32 * 4 = 160 bits, right? A lot of compression research seems to go into the algorithms for finding patterns.
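Continuing the toy example, a round trip of that text format might look like this (my sketch, not a real compressor):

```python
# Toy format from above: the pattern table on the first line (patterns
# separated by spaces), then the run data as count/id pairs.
def encode(runs):                       # runs: list of (count, pattern) pairs
    patterns = sorted({p for _, p in runs})
    ids = {p: i for i, p in enumerate(patterns)}
    header = " ".join(patterns)
    body = " ".join(f"{count} {ids[p]}" for count, p in runs)
    return header + "\n" + body

def decode(text):
    header, body = text.split("\n", 1)
    patterns = header.split(" ")
    nums = list(map(int, body.split()))
    return "".join(patterns[pid] * count
                   for count, pid in zip(nums[::2], nums[1::2]))

encoded = encode([(1_000_000, "0"), (1_000_000, "1")])
assert decode(encoded) == "0" * 1_000_000 + "1" * 1_000_000
```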
No, he said the compression would take a long time, but the decompression would be very quick, which makes sense for 'The Gold at the Starbow's End' style of compression. However, if that were true, he would have to decompress the entire file in one go, which could mean a memory problem.
Right... or you create a tree of singletons for the patterns, with position markers... and we are just getting started. Bottom line: there are many more options, and there is no clear-cut best-in-class approach for everything. At least, not that I know of... how about you?
Oh, you're right. Totally had that flipped in my mind. Yeah, he must have read that sci-fi novel. Haha.
I know, what I proposed was simple run-length encoding. I'm sure there are much more complex and cleverer methods for encoding the patterns, with various overheads. No idea, and as I said before, I know very little about data compression (because it's boring ).

I just did some more calculation. Say he uses the equation a^b + c to represent the final "huge number", with a, b, and c each stored as 32-bit integers; then the triple (a, b, c) takes 96 bits and can name at most 2^96 distinct values. Now, each character takes 8 bits, so a single character can take 2^8 values, two characters together need 16 bits (2^16 values), and in general a file of just 20 characters already has 2^160 possible contents. So, if I'm right about my logic, most files of even that size have no (a, b, c) representation at all: unless you use super long integers for a, b, and c, the compression algorithm wouldn't work on very large data files. Of course, there are other complications where the data to be compressed are not simple characters but integers or floats themselves... I get a headache just thinking about it.
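The back-of-the-envelope count, for anyone who wants to check it:

```python
# Counting argument: three 32-bit parameters can name at most 2**96
# distinct values, but 20-byte files already have 2**160 possibilities.
representable = 2 ** (3 * 32)   # distinct (a, b, c) triples
files_20_bytes = 2 ** (8 * 20)  # distinct 20-character files

print(files_20_bytes // representable)  # 2**64 files per available triple
```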
Yeah, I'd say a parser. A compiler is something that takes the output of a parser and converts it to something else. For example, a naive C compiler would have three basic components:

(1) Tokenizer: recognizes what each separate token stands for
(2) Parser: recognizes what each sequence of tokens stands for
(3) Code generator: converts each parsed sequence of tokens into machine instructions
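Here's the tokenizer/parser split in miniature, using arithmetic expressions instead of C (a toy sketch, not a real compiler):

```python
import re

def tokenize(src):
    # Tokenizer: turn raw characters into (kind, text) tokens,
    # skipping whitespace.
    return [(m.lastgroup, m.group())
            for m in re.finditer(r"(?P<NUM>\d+)|(?P<OP>[+*])|\s+", src)
            if m.lastgroup]

def parse(tokens):
    # Parser: turn the token stream into a nested structure (an AST),
    # respecting precedence: '*' binds tighter than '+'.
    def term(i):
        node, i = tokens[i][1], i + 1
        while i < len(tokens) and tokens[i] == ("OP", "*"):
            node, i = ("*", node, tokens[i + 1][1]), i + 2
        return node, i
    node, i = term(0)
    while i < len(tokens) and tokens[i] == ("OP", "+"):
        rhs, i = term(i + 1)
        node = ("+", node, rhs)
    return node

print(parse(tokenize("1 + 2 * 3")))  # ('+', '1', ('*', '2', '3'))
```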
What about nested lossless compression? I mean, since the zip file is itself made up of binary digits, couldn't you compress it further by finding patterns/runs in its 0s and 1s?
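You can test that directly. A quick experiment with zlib (which implements the same DEFLATE method zip uses):

```python
import zlib

# Compressing already-compressed data doesn't help: good compressor
# output is close to random, patternless bits, so a second pass gains
# nothing (and usually grows slightly from header overhead).
data = b"01" * 500_000                   # highly patterned input
once = zlib.compress(data, 9)
twice = zlib.compress(once, 9)

print(len(data), len(once), len(twice))
```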
I love this board. Create a thread on something as apparently obscure as compression algorithms and get responses from developers, mathematicians, entrepreneurs, and even the patent office. BTW, notice how the $6000/hr patent attorneys go unrepresented? Good luck, WS&C. Just remember... no regrets.