Meine Bachelorarbeit

Es ist lange her, dass ich das letzte Mal in diesem Blog einen Beitrag geschrieben. Deswegen kommen in der nächsten Beiträge über Dinge die liegen geblieben sind.

Anfang 2016 beendete ich das schreiben an meiner Bachelorarbeit mit dem Titel „Besser Benchmarken“ in deren Rahmen ich das Benchmarkingwerkzeug temci entwickelte.

Es ging in der Bachelorarbeit darum, dass die Anfertigung korrekter Benchmarks recht schwierig ist und deswegen ein Werkzeug zur Vereinfachung nötig ist. Solch ein Werkzeug ist temci, es erlaubt das einfache Erstellen von Benchmarks und eine einfache statistische Auswertung.

Außerdem ging es in der Arbeit um die (theoretischen) Grundlagen für gute Benchmarks. Diese Grundlagen habe ich dann auch in einem Vortrag auf der GPN 2016 vorgetragen:

 

GHC performance over time

Update: This post (and the bug report Joachim Breitner submitted in succession) resulted in the change of the haskell documentation: The statement „At the moment, -O2 is unlikely to produce better code than“ was removed from the documentation two months after this post. A denser and cleaner representation of the data and the main arguments is given in my bachelor thesis (written in german) that I finished around the same time.

TL;DR: There’s no significant difference between the GHC versions 7.* and 8.0.1-alpha regarding the Haskell code used in the benchmarking game. And there might be an improvement of the performance of idiomatic Haskell code from GHC 7.0.1 to GHC 8.0.1-alpha.

I’m currently building a new benchmarking tool called temci as part of my bachelor thesis.

In search for good programs to benchmark for the evaluation part, I stumbled across the benchmarking game. This blog posts is about using parts of the benchmarking game and temci to measure and compare the performance of the following GHC versions: 7.0.1, 7.2.1, 7.4.1, 7.6.1, 7.8.1, 7.10.1 and 8.0.0 (the version from mid January, called 8.0.1 here for brevity) using different optimization levels („-O“, „-O2“ and „-Odph“, see: Haskell documentation) . Weiterlesen

The Pumping Lemma – Poem by Harry Mairson

Any regular language L has a magic number p
And any long-enough word in L has the following property:
Amongst its first p symbols is a segment you can find
Whose repetition or omission leaves x amongst its kind.

So if you find a language L which fails this acid test,
And some long word you pump becomes distinct from all the rest,
By contradiction you have shown that language L is not
A regular guy, resiliant to the damage you have wrought.

But if, upon the other hand, x stays within its L,
Then either L is regular, or else you chose not well.
For w is xyz, and y cannot be null,
And y must come before p symbols have been read in full.

As mathematical postscript, an addendum to the wise:
The basic proof we outlined here does certainly generalize.
So there is a pumping lemma for all languages context-free,
Although we do not have the same for those that are r.e.

There are some other poems by Harry Mairson at http://www.cs.brandeis.edu/~mairson/poems/poems.html. I found this poem while reading a discussion at The Old Joel on Software Forum.

Real life practice in lottery scheduling

I currently have a course covering operating systems at university. We learn in this course several scheduling algorithms. An operating system needs these kind of algorithms to determine which process to run at which time to allow these process to be executed „simultaniously“ (from the users view).

Good scheduling algorithms have normally some very nice features:

  • They avoid the starving of one process (that one process can’t run at all and therefore makes no progress).
  • That all processes run the approriate percentage (defined by it’s priority) of time in each bigger time interval.

But they are maybe not only useful for operating systems but also for people like me. Probably they could help me to improve the scheduling of my learning.

An algorithm that seems to be suited best for this purpose is called lottery scheduling (pdf). It’s an algorithm that gives a each process some lottery tickets (their number resembles the priority of the process). Every time the algorithm has to choose a new process to run, it simply picks a lottery ticket randomly and returns the process that owns the ticket. A useful addition is (in scenarios with only a few tickets) to remove the tickets that are picked temporarily from the lottery pot and put them back again, when the pot is empty.

But how could I use this algorithm in practice?

I take a stack of blanko cards, each card worth around 2 hours of learning time, and assign them to each of my courses. E.g. the course operating systems gets 3 cards, out of the 25 cards that now make up my week.

Now, when I’ve time to learn, I pick a card out of the shuffeled card stack which then tells me to learn about 2 hours e.g. for my theoretical computer science course.

This „practical exercise“ in operating systems will hopefully boost my learning – and of course is funny way to learn, as I never know what I’m going to learn. Later, I probably also could use it to ensure that I’m blogging more regularly here and progress with my compiler project.

The current status of POOL

The POOL programming language is a cool project – it’s fun to write your own programming language that does everything the way you like. I’ll give a short status update for this project in this article, as I’m going to delay it.

The problem with this project and the reason why I delay it (maybe some month or years) is that I realized that I lack the knowledge of creating and implementing programming languages. I’ve never developed a real Turing complete programming language before, but writing the real POOL implementation is such a hugh project that it’s probably better that I gained experience in this field before attempting it again. It’s a pity that I can’t implement it within the next couple of months as I developed POOL in my mind for about half a year.

What I currently have programmed for the POOL project are two failing attempts to write a grammar and a rough draft of a byte code interpreter (aka virtual machine). I’ve spend dozens of days for this attempts and drafts – but there are no real results. And that’s the real reason why I delay this project: I’m working hours and hours without any substantial progress – that’s really demotivating – and the probability that this project therefore fails after hundreds of hours of developing is high.

But what am I going to do next? I’m addicted to writing compilers, parsers and interpreters for unknown languages, but instead of starting another huge project like POOL, I’m going to start many little projects that don’t last  that long.

One of this project’s is OrangeLang a cool project about a minimal language with focus on the complier. I’ll post an article on this blog when the project has more than a rugh compiler draft. (That has been also a difficulty with POOL: I wrote several articles and talked with people about it without having anything to show)

POOL – Drafted schedule and other random notes

The following is a short and drafted schedule for the POOL project, it doesn’t contain any dates and should only give an idea, what I’m going to do next in the project and what I’ve already done.

Almost done

  • Create a draft of the syntax (I’ve to admit that I’m currently working on the class syntax and on the variable container syntax)

TODO

  • Publish the draft
  • Figure out the core libraries (like System, etc.)
  • Parser grammar, which produces the AST
  • Tree filters to optimize and annotate the AST (convert for example 2+3 to 5)
  • Tree based interpreter
    • Objects, scope, functions, etc.
    • Tree based interpretation
    • Optimization of the interpreter
  • More built in libraries
  • Language manual with lot’s of example code and a website as I think the project will (after creating a tree based interpreter) more than half a year old.
  • Byte code based interpreter aka Virtual Machine
    • Byte code compiler
    • Byte code interpreter aka VM

Random notes

POOL – Fibonnaci sequence

I’m going to post some example code snippets, like the following, written in POOL to demonstrate some of POOLs cool features and to have some test code when I’m writing the parser and the interpreter.

This code snippet eveluates the fibonacci sequence recursively, using the memoizing features of the special function type sef_function to avoid on the usual drawbacks of recursive evaluation: the multiple calculation of the same value.


sef_function fibonacci(n){
fibonacci(n - 1) + fibonacci(n - 2)
}
fibonacci(0) = 0
fibonnaci(1) = 1


fibonacci(5) #-> 5
/* Call stack of fibonacci(5):
fibonacci(5)
-> fibonacci(4) + fibonacci(3)
-> (fibonacci(3) + fibonacci(2)) + fibonacci(3)
-> ((fibonacci(2) + fibonacci(1)) + fibonacci(2)) + fibonacci(3)
-> (((fibonacci(1) + fibonacci(0)) + fibonacci(1)) + fibonacci(2)) + fibonacci(3)
-> (((1 + 0) + 1) + fibonacci(2)) + fibonacci(3)
-> ((1 + 1) + 1) + fibonacci(3)
-> (2 + 1) + 2
-> 3 + 2
-> 5 */