Quick, Docker-based log4shell vulnerability testing

If you have to test a JAR file for the log4shell vulnerability but do not want to install any tools (or do not trust them), a possible no-install, low-ceremony approach is to use Grype in its Docker mode, which lets you install absolutely nothing on your machine.

The only prerequisites are to have Docker installed on your machine and, of course, the JAR file or folder you want to test.

First, create a Dockerfile like this:

FROM scratch
ADD test.jar .

Or, in the case of a complete folder to test:

FROM scratch
ADD folder-under-test/ .

Now build the Docker image with a command along these lines:

docker build -t jpgouigoux/testjar .

You will of course adjust the tag to your own Docker account so you can push the image to the public registry (unless your JAR files are confidential, in which case you will need to work with a private registry).

It is then easy to use Grype to assess the vulnerability of your system under test, with the following command, which leaves your local machine exactly as it was before and only displays the report:

docker run --rm anchore/grype jpgouigoux/testjar

The results (in this case, showing the CVE-2021-44228 vulnerability) are quickly displayed in text form:

[Screenshot: Grype report listing the CVE-2021-44228 finding]
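
If the JAR files must not leave your machine at all, note that Grype can also scan a plain folder instead of an image. Here is a minimal sketch, assuming the folder-under-test/ directory from above and Grype's dir: source (adapt the mount path to your setup):

docker run --rm -v "$PWD/folder-under-test:/scan" anchore/grype dir:/scan

This skips both the image build and the registry push.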


Let’s interrupt developers!

This article is also available in French.

The context

If you have been reading IT articles around the internet lately, you may have come across some explaining how bad it is to interrupt developers while they are programming. A bit like pouring water on a Gremlin after midnight: the world collapses as soon as the poor guy (or girl, but there are not that many of them among us programmers, and I never see them complaining like this) gets interrupted.

A supernatural, almost mystical state of "flow" is supposed to progressively envelop the developer, who, with his concentration at its fullest, is then able to perform miracles of programming speed. Not a word about code quality, but does he care…

In case you missed this, here are a few links, but there are so many others:

http://www.infoworld.com/d/data-center/no-interruptions-technologist-work-247487

http://heeris.id.au/2013/this-is-why-you-shouldnt-interrupt-a-programmer/

http://programmers.stackexchange.com/questions/46252/how-to-explain-a-layperson-why-a-developer-should-not-be-interrupted-while-neck

http://blog.ninlabs.com/2013/01/programmer-interrupted/

http://casa-laguna.net/all-the-news/show/do.-not.-ever.-interrupt.-a-programmer

http://www.drdobbs.com/tools/just-let-me-code/240168735 

Reality

Of course one needs some quiet time to write code efficiently, but this is true of any other kind of activity. The same goes for reading a book, falling asleep, writing a blog article, etc. Most activities (particularly when performed by men, who were born single-threaded) require that their author's attention not be broken if they are to be carried out correctly. Trying to make people believe that it matters more for developers is behaving like a diva.

Moreover, this is quite revealing of a way of thinking that is, in my humble opinion, the source of many problems.

Problem #1: some think coding is the main activity of a developer

OK, I will plead guilty on this one. I have done my fair share of pure-code programming, on personal projects done at night for the sole sake of being able to code without analyzing anything or writing any documentation. Hey, I even published some code on open source forges despite its extremely limited usefulness to others (as attested by the low number of downloads).

Having thought about it for some time (OK, for quite some time: I have been coding for 28 years), I now know that coding is only a quarter, maybe 30%, of a developer's work if one wants to be efficient at it.

Problem #2: is it the right way to program?

If you need 15 minutes to get back into your project, chances are it has not been decomposed enough, and the complexity of each portion is still too big to be dealt with without introducing many bugs along the way.

If that much time is needed to rewind your developer brain back to the point where it can move forward again, it means the subject you are thinking about has not been modeled enough. Frédéric Lordon talks about a "technical frontier", where concepts and specialized vocabulary allow people who master them to quickly reach a state of mind in which progress comes more easily. Maybe our discipline, however full of technical expressions, still lacks intellectual tools and semantic shortcuts?

Problem #3: “flow” is a great invention…

… but there is a good chance that, when you feel you are in it, you are simply experiencing the power of… routine! A bit like when one plays Five Dots and ends up mechanically erasing dots without even looking at the score, in a hypnotic way.

The same goes for coding: we have all experienced that moment where, caught up in the "flow" (or whatever you want to call it), we end up adding unplanned functionality or exploring technical possibilities that do not correspond to any documented customer use case.

Not convinced? Next time you hit this so-called "flow", try this: erase everything you just coded. Really, do it and start again. You will notice that the second time goes much quicker, does not require any particular concentration (since this time you have made the effort of decomposing the problem before throwing yourself into the code), and most importantly, the resulting code is cleaner and more compact.

Problem #4: a fair share of conceit

My personal feeling is that a lot of developers hide behind this so-called "flow" to make others believe they are great coders. And this way of insisting on the capital importance of not interrupting them (see all the links above; you can find many similar ones) smells a bit like taking advantage of people who do not understand our job, making them believe anything so that we can quietly go on with our own coding business without paying much attention to quality.

Code quality is still a problem, though. IT is still very young compared to other engineering disciplines. So it is very important to steer away from this false impression of quality, as the "flow" is simply a state of concentration made possible by the fact that one thinks of only one thing: the code (and, as said earlier, not its impacts, secondary functions, etc.).

Did my small open source projects, or the rare times I was in this so-called "flow" at work, produce better code than what I wrote step by step? The answer is no. Some of it may have been important code (used in production by a few thousand customers), but it is not of better quality than the rest.

Nowadays, if I need more than ten minutes to code a particular functionality, I will not even start coding, because it means I have not decomposed the problem enough to code it efficiently and with the right degree of quality in each of its modules…

While we’re at it

Maybe this is the right moment to try and debunk a few silly things one can read on this subject:

  • Instant messaging: "If you really, really need me, you can interrupt, but expect a grumpy return." (source): man, if it is so important to be in "the flow", wouldn't it be simpler just to turn Messenger off?
  • Same for Skype (source).
  • The comparison with interrupting a surgeon at work may be the most conceited of all (same source as above): besides being extremely boastful (after all, fewer than 1% of us developers work on code that lives depend on), the comparison is wrong anyway: during a multi-hour operation, a surgeon will stop several times, call on the help of assistants, explain to an intern how he proceeds, etc. He is definitely not wearing a headset to work in isolation, at the risk of making mistakes nobody can catch.
  • And so many other dumb comparisons that I will stop here…

Conclusion

Let’s interrupt developers! That’s right, do it now! If you see one with his headphones on who has been typing like crazy for the past two hours, stop him… Tell him to take a break, think a bit about what he is doing, explain it to you. Ask him for a diagram, challenge him to come up with a different object-oriented structure. Do you really think he won't have to rework any of the code he has just churned out in one go? That the two of you won't come up with a single idea to improve what he's done?


RedGate publishes a second “performance tips” free book

After the first free book, "50 ways to avoid, find and fix ASP.NET performance issues", RedGate has published a second, even more technical one, called "25 secrets for faster ASP.NET applications". As the title hints, this second book is less about profiling and more about the improvements themselves.

This is a great resource for up-to-date performance tips, including a lot on the use of async / await patterns.

The book is available on http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/entrypage/faster-asp-net-apps, and for more information, you can refer to Michaela Murray’s blog.


An ADO.NET provider for Bamboo Prevalence

It took me a LOOOOONG while (the first announcement was a year ago), but here it is: the long-awaited ADO.NET provider for the Bamboo object prevalence engine is finally available, under an open source license of course, on GitHub:

https://github.com/MGDIS/mgdis.data.bambooclient


The company I work for, MGDIS, has released it under LGPL 3. I am not going to reproduce the whole readme here, but let me give you a short explanation to whet your appetite:

Object prevalence is an old concept that has recently been reborn with the NoSQL approaches. As the drawbacks of tabular storage become more and more widely recognized, the NoSQL movement proposes new ways to persist data in a format that is closer to the way it is used in memory. Sometimes these solutions even store the data in memory, at least temporarily.

Object prevalence takes this concept to the extreme, by establishing the in-memory object-oriented model as the persistence model. Thus there is no ORM and no conversion whatsoever when using a prevalence engine. Persistence to disk, which is necessary to protect the data from a power failure for example, is handled by the engine in a way that is transparent to the programmer. The developer only manages commands and queries on the model, which makes object prevalence a good fit for CQRS architectures.

One of the limits of object prevalence (and, generally speaking, of NoSQL solutions) is that it forces you to abandon legacy SQL queries. These queries sometimes embody a fair amount of business logic, which can slow down, or even prevent, the migration to object prevalence, despite the huge advantages in performance as well as in ease and robustness of code writing.

The goal of this project is to bridge that gap and allow for a progressive migration by providing an ADO.NET provider for Bamboo.Prevalence, which makes it possible to access prevalent data using legacy SQL, once a mapping is established between the in-memory data and the names of the fields and tables in the SQL queries.

The way it works is simple and complicated at the same time. In theory, one only has to decompose the SQL and then construct a Linq query to execute the corresponding request against the objects managed by the prevalence engine. In practice, though… this means dynamically constructing a lambda expression. And that… is much more complicated (a small sketch of that last step follows the list below). Let's see what we have here:

  • one level of indirection between SQL and the model
  • another between SQL and the composed request tree
  • another one between the tree and the Linq request
  • a fourth one between Where or Having and the lambda expression
  • a fifth between compiled lambda and its memory representation
  • a final one between the memory representation and the .NET Expressions API
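
To make the last few indirections more concrete, here is a minimal sketch of building a Where predicate dynamically with the .NET Expressions API, the way a SQL condition such as Name = 'Alice' can be turned into a compiled lambda (the Customer class and the property names are purely illustrative, not part of the actual provider):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Linq.Expressions;

    public class Customer
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    public static class DynamicPredicateSketch
    {
        // Builds the equivalent of: c => c.<propertyName> == <value>
        public static Func<T, bool> BuildEqualityPredicate<T>(string propertyName, object value)
        {
            ParameterExpression parameter = Expression.Parameter(typeof(T), "c");
            MemberExpression property = Expression.Property(parameter, propertyName);
            ConstantExpression constant = Expression.Constant(value, property.Type);
            BinaryExpression equality = Expression.Equal(property, constant);
            return Expression.Lambda<Func<T, bool>>(equality, parameter).Compile();
        }

        public static void Main()
        {
            var customers = new List<Customer>
            {
                new Customer { Name = "Alice", Age = 30 },
                new Customer { Name = "Bob", Age = 40 }
            };

            // "WHERE Name = 'Alice'" expressed as a lambda built at runtime
            Func<Customer, bool> predicate = BuildEqualityPredicate<Customer>("Name", "Alice");
            foreach (Customer c in customers.Where(predicate))
                Console.WriteLine(c.Name); // prints "Alice"
        }
    }

The actual provider has to perform this kind of construction for every SQL clause, which is where the successive levels of indirection listed above come from.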

This complexity wore out two interns before we found the right one, namely Damien Gaillard, who did an excellent job implementing this architecture. The code has only been modified a little for publication on GitHub (translation into English, rewritten comments, a different class structure), and some parts are still missing, like Group By, Having, etc. I will add them later; the essential thing is that the principles of the provider are now exposed to everybody's comments. Even if you do not understand the principle of an ADO.NET provider for Bamboo, please do not hesitate to cast an eye over the code. All constructive comments are welcome!

I hope this little project will help object prevalence adoption by easing migration from applications with a heavy SQL legacy, and that it will also give some people good ideas about dynamic programming. On that particular point, by the way, a next version based on Roslyn is bound to happen some day, but I dare not announce a date, since it already took me a year to publish this first version.


Win a Surface RT tablet with a performance tip on ASP.NET or SQL Server!

Our friends at Red Gate are running a nice contest, offering a Surface RT tablet for the best performance trick in each of the SQL Server and ASP.NET categories. Moreover, the tricks will be published in an e-book!


You stand a chance: I am not participating, since I will be a judge in the contest! So head over to http://www.simple-talk.com/blogs/2012/11/15/application-performance-the-best-of-the-web/ and good luck to all!


Debugging “Skipping logentry” in WS-I Basic Profile Testing Tool

This article is also available in French.

The WS-I (Web Services Interoperability) tools allow us to test a web service for conformance to the Basic Profile. This is done partly statically, through WSDL analysis, and partly dynamically, based on messages captured while the service is being used.

To perform such a capture, a monitoring tool is supplied which, acting as a proxy (man in the middle), records everything that passes between client and server. This generally works fine, but when passing the logs to the analyzer to create the final report, one can get this kind of error:

[Screenshot: "Skipping logentry" warnings in the analyzer output]

The problem, of course, is that the log entries are not taken into account by the analyzer, so the resulting report will be static only, and quite limited.

What makes this a real pain is that there is basically nothing on the web about this problem:

[Screenshot: search results for the error message, with almost no hits]

The only page referencing it (widening the search by dropping the second part of the message only brings up two additional, irrelevant pages) describes the problem, but nobody gave any answer:

[Screenshot: the only forum thread mentioning the error, left unanswered]

In short, this is starting to smell bad, and we are going to have to dive into the code. Luckily, WS-I at least provided a .NET version of the tool (it would have been even better with PDBs or source code, mind you). So let's go through the different DLL files, passing them to ILDASM. I will save you the search: WSITest-Assertions.dll contains our error message. We can verify this by opening the library in ILDASM:

[Screenshot: WSITest-Assertions.dll opened in ILDASM]

Then, we make a dump of it:

[Screenshot: the Dump command in ILDASM]

We accept the standard options. No need to get fancy here: we are just looking for a string, after all…

[Screenshot: ILDASM dump options]

To search for the string more easily, we save the dump as IL code:

[Screenshot: saving the dump as IL code]

Then we can open the file we have just created. It turns out that the "Skipping logentry" message appears in three different places in the code, each time with the same sub-message, "not from specified service":

[Screenshot: the three occurrences of "Skipping logentry" in the IL dump]

Going up in the file, we find the class name, and we can use Reflector or ILSpy to see the corresponding code:

[Screenshot: the corresponding class decompiled in Reflector / ILSpy]

Stroke of luck: the three pieces of code actually call the same "ProcessMessage" function, which is located in a second assembly:

[Screenshot: the three call sites of ProcessMessage]

Clicking on this function brings us to the library in question, which is WSITest-AnalyzerCore:

[Screenshot: the reference to WSITest-AnalyzerCore]

Inside, we indeed reach the method called ProcessMessage:

[Screenshot: the ProcessMessage method]

Taking a look at the code, we see that it refers to a correlation mode. This sounds quite logical, since the code is supposed to match the configuration parameters against those from the log: the log entries should indeed correspond to the service under test:

[Screenshot: the correlation handling inside ProcessMessage]

But the problem seems to be that the code never even mentions the "operation" correlation mode. It only deals with the Endpoint and Namespace modes:

public bool ProcessMessage(HttpSoapLog logRequest, HttpSoapLog logResponse, CorrelationType correlation, WsdlOperation wsdlOperation)
{
    return this.MessageInEndpoint(logRequest) && (correlation == CorrelationType.Endpoint || (this.MessageInNamespace(logRequest) && (correlation == CorrelationType.Namespace || wsdlOperation != null)));
}
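
Rewriting that condition mode by mode makes the behavior easier to see. Here is a minimal sketch that simply re-expresses the boolean above, with the helper calls replaced by plain booleans (the Operation enum value and the simplified signature are assumptions for illustration, not the tool's real types):

    using System;

    enum CorrelationType { Endpoint, Namespace, Operation }

    static class CorrelationSketch
    {
        // Same boolean structure as the decompiled ProcessMessage above, with
        // MessageInEndpoint / MessageInNamespace reduced to plain booleans and
        // "wsdlOperation != null" reduced to operationResolved.
        static bool Keep(bool inEndpoint, bool inNamespace, CorrelationType correlation, bool operationResolved)
        {
            return inEndpoint && (correlation == CorrelationType.Endpoint
                || (inNamespace && (correlation == CorrelationType.Namespace || operationResolved)));
        }

        static void Main()
        {
            // A log entry that matches both the endpoint and the namespace of the
            // service under test, but whose WSDL operation could not be resolved:
            foreach (CorrelationType mode in Enum.GetValues(typeof(CorrelationType)))
                Console.WriteLine("{0}: kept = {1}", mode, Keep(true, true, mode, operationResolved: false));
            // Prints: Endpoint: kept = True / Namespace: kept = True / Operation: kept = False
        }
    }

In other words, in any mode other than Endpoint or Namespace, an entry is kept only if the endpoint matches, the namespace matches, and a WSDL operation could be resolved for it; when that last lookup fails, the entry is silently skipped, which seems to be exactly what happens here.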

However, in the configuration file supplied by the WS-I Basic Profile tool, namely analyzerConfig.xml, the correlation mode is set to "operation":

<logFile correlationType="operation">traceLog.xml</logFile>

So we try a different mode, namely endpoint, which seems better suited:

<logFile correlationType="endpoint">traceLog.xml</logFile>

And it works! The messages from the log are now correctly taken into account by the analyzer:

[Screenshot: the analyzer report now including the captured log entries]

Question: what would we do without ILDASM / ILSpy?

OK, I know: we would drop an email to WS-I asking for a bug fix… But their FAQ is not technical and there is no forum (nor any activity on the tools, for that matter). So, short of debugging the tools on one's own, there is no other way to fix the problem…

Hope this helps !


First review of my book in English

Thanks, Gregor, for taking some of your time to write this!

http://gregorsuttie.com/2012/06/10/book-review-practical-performance-improving-the-efficiency-of-net-code/

Really glad you liked it, and since English is not my native language, nothing could please me more than to hear you found it well written. But most of the merit goes to my wonderful editor Marianne Crowder!


LeakedIn SHA-1 hashes: a salt may not have been enough

This article is also available in French.

The context

Many of you are certainly aware that somebody recently managed to steal from LinkedIn a file of 6.5 million SHA-1 password hashes. These were not cleartext passwords but hashed ones. However, the algorithm had been used in an insecure way, namely without a salt.

To give beginners a very quick explanation, the principle of a hash is to transform a word into another, hardly readable one, without there being any mathematical way to invert the transformation. The SHA-1 hash of "password", for example, is "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8", and there is no theoretical way to retrieve "password" from it.


This property is interesting for storing passwords: if the hash is the only thing stored, it is still possible to check a password entered later by comparing its hash with the reference hash, and yet even the database administrator cannot see the cleartext password.

So much for the theory. In practice, well… there is a flaw in the deterministic nature of a hash algorithm. Since "password" will always produce the hash "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8" (and this is quite necessary, otherwise we could not compare password hashes and thus authenticate), it is possible to pre-compute hash dictionaries based on common passwords and then compare hash values directly, in order to deduce the corresponding passwords.


The countermeasure is to use what is called a salt. Salting the hash consists in adding known "noise" around the password. This way, instead of hashing "password", we would for example compute the hash of "PReFiX32password67coucou", whose value is far less likely to be found in a hash dictionary. Password validation remains straightforward, since the authentication module knows which salt to add before computing the hash.
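
To make this concrete, here is a minimal C# sketch computing the unsalted and salted hashes of "password" (the salt string is simply the example used above):

    using System;
    using System.Security.Cryptography;
    using System.Text;

    class HashDemo
    {
        static string Sha1Hex(string input)
        {
            using (SHA1 engine = SHA1.Create())
            {
                byte[] hash = engine.ComputeHash(Encoding.UTF8.GetBytes(input));
                return BitConverter.ToString(hash).Replace("-", string.Empty).ToLower();
            }
        }

        static void Main()
        {
            // Unsalted: always the same well-known value, ideal for dictionary lookups
            Console.WriteLine(Sha1Hex("password")); // 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

            // Salted: a completely different value, unlikely to appear in any precomputed dictionary
            Console.WriteLine(Sha1Hex("PReFiX32" + "password" + "67coucou"));
        }
    }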


LinkedIn's mistake was not to add any salt to their use of SHA-1 (at least at the point in time when this file was extracted). So when the file leaked, anybody could find well-known passwords in it, like "password", "123456", etc.

The hypothesis

Obviously, this is a mistake on LinkedIn's part. It is all well and good to explain that their team includes security experts (http://blog.linkedin.com/2012/06/09/an-update-on-taking-steps-to-protect-our-members/), but if they forget to use a salt, something any decent security-minded programmer knows about, there is no point in making too much fuss about it…

Yet my hypothesis is that a salt, applied uniformly over the whole file, would in fact not have changed the situation much.

Let me explain. In such a huge file of 6.5 million passwords, there is every chance of finding a large number of "common" passwords, like "password", "123456", etc. Which means that, if a single salt is used, a brute-force attack on the salt itself, restricted to these target passwords, could be enough to recover its value. This will of course take some time, but my guess is that it would only multiply the attack time by two, rather than by the number of possible salt strings: one pass over the candidate salts, checking only the most common password or two each time, costs roughly as much as one pass of the classic dictionary attack, and once the salt is known the attack proceeds as if there were no salt at all. In short, using relative time units, we would have:

[Table: estimated relative attack times]

I have written this long article to relate my path through the different experiments around this hypothesis. The code is in C#, but should be easy to follow whatever your language background.

Finding the files

Before being able to toy with the data, I had to find the file in question. The original link does not work anymore:

[Screenshot: the original download link, now dead]

But if you follow the conversations on https://news.ycombinator.com/item?id=4073309, you will undoubtedly find a copy without breaking a sweat… I imagine there are now thousands of copies around the world. I do not see the point of removing the initial link: it only encourages wasting disk space on duplicates, and it is futile anyway, as the file is now everywhere.

Anyway, that is not the point; we find another site from which to get the file:

[Screenshot: a mirror site hosting the file]

A 20-minute download later, we can take a peek at the file, and it seems to be the right one, with some of the hashes beginning with five "zero" characters, a characteristic abundantly discussed at http://dropsafe.crypticide.com/article/7235.


After that, we download a second file, which contains a dictionary of common passwords, found on DazzlePod, and we write a quick test to check that the most common values are indeed present:

        [TestMethod]
        public void PresenceOfSimplePasswordsInReferenceFile()
        {
            List<string> words = new List<string>() { "password", "123456", "superman" };
            string line;
            int nbWordsFound = 0;
            int nbAnalyzedLines = 0;
            using (StreamReader reader = new StreamReader("passwords.txt"))
            {
                while ((line= reader.ReadLine()) != null)
                {
                    nbAnalyzedLines++;
                    if (words.Contains(line))
                        nbWordsFound++;
                    if (nbWordsFound == words.Count)
                        break;
                }
            }
            Assert.AreEqual(words.Count, nbWordsFound);
        }

This file contains a little more than two million passwords. I do not know how they were gathered, but they correspond to plausible passwords, not randomly generated character strings. Anyway, we find the common passwords as expected. We are now going to use this file to run a targeted attack on the hash values in the LinkedIn file.

Searching for SHA-1 hash values using brute force

The next step consists in searching for the SHA-1 hashes of these passwords in the LinkedIn file:

        [TestMethod]
        public void PresenceSHA1OfPasswords()
        {
            List<string> words = new List<string>() { "password", "123456", "superman" };
            SHA1 engine = SHA1CryptoServiceProvider.Create();
            List<string> hashes = words.ConvertAll(m => BitConverter.ToString(
                engine.ComputeHash(Encoding.UTF8.GetBytes(m))).ToLower().Replace("-", string.Empty));

            string line;
            int nbHashFound = 0;
            int nbAnalyzedLines = 0;
            using (StreamReader reader = new StreamReader("SHA1.txt"))
            {
                while ((line = reader.ReadLine()) != null)
                {
                    nbAnalyzedLines++;
                    if (hashes.Contains(line))
                        nbHashFound++;
                    if (nbHashFound == words.Count)
                        break;
                }
            }
            Assert.AreEqual(hashes.Count, nbHashFound); // Fails
        }

We do not find them directly; this is where we have to take the "00000" prefix into account. If we substitute that prefix, no problem: we find everything we are looking for (I only show the code that changes from the listing above):

List<string> hashes = words.ConvertAll(m => string.Concat("00000", BitConverter.ToString(
    engine.ComputeHash(Encoding.UTF8.GetBytes(m))).ToLower().Replace("-", string.Empty).Substring(5)));

OK. Now we can move on to more serious things, by cross-checking the entire content of the two files in order to gather all the "simple" passwords present in the "LeakedIn" file. To do so, we are obviously going to use brute force. And I really do mean "brute": I am simply going to loop stupidly over the values, without any optimization. In practice, a malicious person would use more sophisticated tools like John the Ripper (particularly since a dedicated extension for LinkedIn hash manipulation, taking the 00000 prefix into account, has been released).

In just the time it took to type the previous paragraph, this sub-optimal code (running on excellent hardware, though: dual Xeon, lots of RAM, SSD) has already found 20 or so passwords (sorry about the screen captures in French; as you have most certainly noticed, English is not my native language):

[Screenshot (in French): the first passwords found by the brute-force test]

Letting it run overnight should yield enough passwords to provide a statistically large set for my experiments. The code is therefore modified to dump the results into a CSV file:

    [TestClass]
    public class TestHashes
    {
        private SHA1 engine = SHA1CryptoServiceProvider.Create();

        private string CalculateReducedHash(string password)
        {
            return string.Concat("00000", BitConverter.ToString(
                engine.ComputeHash(Encoding.UTF8.GetBytes(password))).ToLower()
                .Replace("-", string.Empty).Substring(5));
        }

        [TestMethod]
        public void RechercheMotsDePasseUsuelsDansFichierFuiteLinkedInVersionToutEnMemoire()
        {
            string output = DateTime.Now.ToString("yyyy-MM-dd-hh-mm-ss");

            string[] table = File.ReadAllLines("SHA1.txt");
            List<string> hashes = new List<string>(table);

            string line;
            int index = 0;
            Stopwatch chrono = Stopwatch.StartNew();
            using (StreamReader reader = new StreamReader("passwords.txt"))
            {
                while ((line = reader.ReadLine()) != null)
                {
                    index++;
                    string hash = CalculateReducedHash(line);
                    if (hashes.Contains(hash))
                    {
                        Debug.WriteLine(string.Format("{0} trouvé sous le hash {1} - {2} / 2 151 220 - {3}",
                            line, hash, index, chrono.Elapsed));
                        File.AppendAllText(output, string.Concat(line, Environment.NewLine));
                    }
                }
            }
        }
    }

Then we only have to wait. A little less than 24 hours later, the results are as follows:

[Screenshot: the brute-force results after almost 24 hours]

42 799 hash values were found in the LinkedIn file, out of 412 116 passwords tested among the 2 151 220 contained in the downloaded password dictionary.

Shifting into second gear

This was quite fun, but I am not going to leave my computer on for three days just to gather a set of passwords in order to test a hypothesis about salts in hash algorithms! 40 000 passwords is already quite nice, but I would like to work on as large a password set as possible, in order to validate my idea as thoroughly as possible. The code is therefore modified to use all eight cores of my CPU.


We are going to use something like this:

    [TestClass]
    public class TestSHA1Breach
    {
        private List<string> hashes;
        private string identifier;
        private string outputDir;

        [TestMethod]
        public void TestParallelBreachSHA1LinkedIn()
        {
            identifier = DateTime.Now.ToString("yyyy-MM-dd-hh-mm-ss");
            outputDir = string.Concat("ParallelResults-", identifier);
            Directory.CreateDirectory(outputDir);
            
            string[] table = File.ReadAllLines("SHA1.txt");
            hashes = new List<string>(table);

            using (StreamReader reader = new StreamReader("passwords.txt"))
            {
                List<Task> tasks = new List<Task>();
                int limit = int.MaxValue;
                string line;

                List<string> words = new List<string>();
                while ((line = reader.ReadLine()) != null && limit-- >= 0)
                {
                    words.Add(line);
                    if (words.Count == 100)
                    {
                        Task t = new Task(new Action<object>(Treatment), new List<string>(words));
                        tasks.Add(t);
                        t.Start();
                        words.Clear();
                    }
                }

                foreach (Task t in tasks)
                    t.Wait();

                string completeFile = string.Concat("parallel-", identifier);
                foreach (string file in Directory.GetFiles(outputDir))
                    File.AppendAllLines(completeFile, File.ReadAllLines(file));
                Directory.Delete(outputDir, true);
            }
        }

        // We could also use BlockingCollection, but the advantage of files is that when the test fails
        // in the middle of execution, because of a lock or any other reason, we do not lose every result.
        //private BlockingCollection<string> Results = new BlockingCollection<string>();

        private void Treatment(object words)
        {
            SHA1 engine = SHA1CryptoServiceProvider.Create();
            List<string> Results = new List<string>();
            foreach (string word in words as List<string>)
            {
                string hash = string.Concat("00000", BitConverter.ToString(
                    engine.ComputeHash(Encoding.UTF8.GetBytes(word))).ToLower()
                    .Replace("-", string.Empty).Substring(5));

                if (hashes.Contains(hash))
                {
                    Debug.WriteLine(string.Format("{0} trouvé sous le hash {1}", word, hash));
                    Results.Add(word);
                }
            }
            string outputFile = Path.Combine(outputDir, Guid.NewGuid().ToString());
            File.AppendAllLines(outputFile, Results.ToArray());
        }
    }

I am not going to get into the details, but this code is roughly eight times quicker than the previous one, as it uses all eight cores of my machine. We could of course go even further by using the GPU, but that is quite some work, and since the estimate is now around 15 hours for the whole computation, there is no point in making the effort. Again, had the goal simply been to crack as many passwords as quickly as possible, we would use John the Ripper, which by the way now has a plugin to use OpenCL. With the possibilities of cloud computing, I am convinced there are tools out there that push performance even further by massively distributing the computation.

In the end, we obtain the file with all the passwords whose hash was found in the LinkedIn file. Out of 6.5 million hashes, this method yielded circa 600 000 passwords, which is quite good, since we can assume that the very common ones are heavily duplicated. The whole operation took 18 hours, compared to the 15 hours estimated…

Now for the serious part

The next step brings us closer to the hypothesis to be tested; until now we have only been gathering data. We are now going to use the passwords we found as the data set for the following tests. To do so, we create a file with the hash values of these passwords, this time with a salt, and we will see whether the statistical knowledge of these common passwords indeed allows us to run a brute-force attack not on the passwords themselves, but on the salt used for their hashing.

To run the test, I use a salt that serves as an example with my students: I simply prefix the value with "zorglub". I then recompute the file with this salt, and we can test the proposed method.

For each candidate salt value, we could test only the 25 most common words (see http://splashdata.com/splashid/worst-passwords/index.htm):

  • password
  • 123456
  • 12345678
  • qwerty
  • abc123
  • monkey
  • 1234567
  • letmein
  • trustno1
  • dragon
  • baseball
  • 111111
  • iloveyou
  • master
  • sunshine
  • ashley
  • bailey
  • passw0rd
  • shadow
  • 123123
  • 654321
  • superman
  • qazwsx
  • michael
  • football

The idea is to loop over candidate values of the salt, compute the hash values of these 25 words for each candidate, and try to find at least one of them in the target hash file. If we find one, there is a good chance that we have found the salt, and we can then go back to the traditional brute-force attack on passwords that we demonstrated above to work, even on large volumes.

The statistical frequency of the first words in the list is such that we could use only a few of them. For my tests, I am even going to start with only the most common one ("password"). In fact (and this is the core of the problem with using a single salt on a set of several million passwords), the set is so big that we are almost guaranteed to hit at least one value corresponding to this common password. The code is the following:

    [TestClass]
    public class TestSaltAttack
    {
        private List<string> hashes;

        [TestMethod]
        public void TestParallelSaltSearch()
        {
            string salt = "zorglub";
            int limit = int.MaxValue;

            // Preparing the file with the hash under the chosen salt
            string filePasswordBruteForceObtainedFromLinkedIn = "FichierObtenuSansParallelisme.txt";
            string fileTargetHash = "FichierHashesSelEnCours.txt";
            SHA1 engine = SHA1CryptoServiceProvider.Create();
            using (StreamReader reader = new StreamReader(filePasswordBruteForceObtainedFromLinkedIn))
            using (StreamWriter scribe = new StreamWriter(fileTargetHash))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    scribe.WriteLine(BitConverter.ToString(engine.ComputeHash(
                        Encoding.UTF8.GetBytes(string.Concat(salt, line)))));
                }
            }

            // Storing all the corresponding values in memory to speed up the process
            hashes = new List<string>(File.ReadAllLines(fileTargetHash));

            List<Task> tasks = new List<Task>();
            using (StreamReader reader = new StreamReader("passwords.txt"))
            {
                string line;
                List<string> hypotheses = new List<string>();
                while ((line = reader.ReadLine()) != null && limit-- >= 0)
                {
                    hypotheses.Add(line);
                    if (hypotheses.Count == 100)
                    {
                        Task t = new Task(new Action<object>(Treatment), new List<string>(hypotheses));
                        tasks.Add(t);
                        t.Start();
                        hypotheses.Clear();
                    }
                }
            }

            foreach (Task t in tasks)
                t.Wait();
        }

        private void Treatment(object hypotheses)
        {
            // Treating every hypothesis consists in checking that, by applying this salt on "password",
            // we obtain a hash contained in the target file. If this is so, there are statistically good chances
            // that we have found the right salt.
            SHA1 engine = SHA1CryptoServiceProvider.Create();
            List<string> Results = new List<string>();
            foreach (string hypothesis in hypotheses as List<string>)
            {
                string hash = BitConverter.ToString(engine.ComputeHash(
                    Encoding.UTF8.GetBytes(string.Concat(hypothesis, "password"))));

                if (hashes.Contains(hash))
                {
                    Debug.WriteLine(string.Format("Il y a de bonnes chances pour que {0} soit le sel", hypothesis));
                    File.AppendAllText(string.Concat(
                        "ResultatRechercheSel-", 
                        DateTime.Now.ToString("yyyy-MM-dd-hh-mm-ss")), hypothesis);
                    Environment.Exit(0);
                }
            }
        }
    }

For a brute-force attack, the very first wave of candidate salt values (the hypotheses) of course uses the same password dictionary, trying each entry as a prefix. Then we would try them as a suffix, then as both, then we would mix in additional symbols, etc.

Results

This first approach gets the salt in only a few minutes. It just happens that the salt I often use as an example with my students is the 1 698 685th entry in the password dictionary.

First brute-force attack on a simple salt over a file of 6.5 million hash values: solution found in 4 minutes!

We could of course make the salt more complex, for example with “Zorglub64” as a prefix, and “Batman28” as a suffix, but there would be something quite artificial in adapting the algorithm to what we are looking for.

Moreover, what we wished to demonstrate has already been shown by this example: as soon as there is a good chance that we know one of the passwords, and provided the set is large enough, using a single salt for all the values is almost useless. Instead of multiplying the search cases, the salt only adds one step to the attack. In the statistical extreme, it could at best multiply the attack time by two, which is of course far from enough.

Summary

A small table to sum up the situation:

[Table: summary of the relative attack times]

In short, if you expose a large number of hash values and some of the passwords are of poor quality (for example among the 25 most common ones), even if their proportion is very low, a single salt will simply not add any security.

The whole idea is that the sheer volume of data, instead of slowing the attack down, on the contrary gives more chances to quickly validate a hypothesis about the value of the salt. The conclusion is that a salt should not be unique across large volumes, otherwise it loses some of its effectiveness.

A solution

The explanation of the problem was quite long, but the solution is quite simple: do not use a single salt for large volumes of passwords. We could for example imagine a variable salt (based, say, on the hash of the user's account creation date), or even an exogenous value, like a unique account identifier (but definitely not the login itself: the only thing that kept the LeakedIn file from being an absolute catastrophe was that the login values were not in the same file).
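
Here is a minimal sketch of what a per-user salt could look like, in the spirit of the article (the class and member names are illustrative only; SHA-1 is kept purely for consistency with the rest of the post):

    using System;
    using System.Security.Cryptography;
    using System.Text;

    public class StoredCredential
    {
        public string Salt { get; set; }
        public string Hash { get; set; }
    }

    public static class PerUserSalt
    {
        // Creates a credential with its own random salt, stored next to the hash.
        public static StoredCredential Create(string password)
        {
            byte[] saltBytes = new byte[16];
            using (var rng = new RNGCryptoServiceProvider())
                rng.GetBytes(saltBytes);
            string salt = Convert.ToBase64String(saltBytes);
            return new StoredCredential { Salt = salt, Hash = ComputeHash(salt, password) };
        }

        // Validation works as before: the stored salt is reapplied before hashing.
        public static bool Verify(StoredCredential stored, string candidatePassword)
        {
            return stored.Hash == ComputeHash(stored.Salt, candidatePassword);
        }

        private static string ComputeHash(string salt, string password)
        {
            using (SHA1 engine = SHA1.Create())
            {
                byte[] hash = engine.ComputeHash(Encoding.UTF8.GetBytes(string.Concat(salt, password)));
                return BitConverter.ToString(hash).Replace("-", string.Empty).ToLower();
            }
        }
    }

With one salt per account, finding the salt of one entry tells the attacker nothing about the others, so the brute-force shortcut described above no longer pays off.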

Here is a proposed ranking of the different ways of using a salt, graded out of twenty (the password targeted by the hash is shown in green, and the salt in grey):

[Table: ranking of the different salting approaches, graded out of twenty]

There is no 20/20 grade, as no method is 100% secure. Even SHA-1 itself is not free from risk: collision-generating algorithms have been found that are fast enough to represent an actual problem (http://people.csail.mit.edu/yiqun/SHA1AttackProceedingVersion.pdf, or for something a little easier to read, http://www.rsa.com/rsalabs/node.asp?id=2927). If you use SHA-1 for digital signatures, you may want to move to a more modern algorithm, like SHA-256.

Beware of the trap of false security offered by "exotic" methods, such as reversing the letters of the password, etc. Whatever you come up with, these schemes are quite predictable for someone with a bit of imagination, and malicious users definitely have plenty of it. Once such a scheme is included in an automated attack tool, you are back to zero security…


My book about .NET performance profiling just got released!


You can get it from the Simple Talk web site, at

http://www.simple-talk.com/books/.net-books/practical-performance-profiling-improving-the-efficiency-of-.net-code/

I am quite proud to make this announcement, as I have really worked like crazy on this international version. At first, I thought it would be a "simple" translation exercise from my original French book, but it turned out to take much more than the few months I had planned.

In addition to rewriting the book, I had to rewrite the whole sample application code. It was also a good opportunity to update to .NET 4.0, which meant adding some paragraphs. Others had to be modified following advice from my excellent technical reviewer Paul Glavitch. Finally, my editor Marianne Crowder was as demanding about the language as I wanted this book to be, and some parts took almost ten round trips between us before we were both satisfied. In the end, I spent more time writing this book in English than I did on the initial French version!

As a release event, I am doing a webinar with Red Gate, on the subject of performance profiling in .NET, of course: see http://gouigoux.com/blog/?p=17 for more details, and I hope to see you there!


Practical Performance Profiling: the webinar

On Tuesday, April 3rd, from 4:00 to 5:00 PM, British Standard Time, I will give a webinar about performance profiling in .NET. To celebrate the release of the English version of my book on this very subject, Red Gate invited me to share a few patterns, using ANTS Performance Profiler 7.


I will give a few hints and tricks to quickly find bottlenecks in .NET applications and solve them. The Q&A part will be co-hosted by Chris Allen from Red Gate, who has 10 years of experience in support, so feel free to come up with tough questions: he is the right person for the job.

To register: https://www3.gotomeeting.com/register/829014934

Hoping to talk to you soon!
