Install prettier executable

The emacs mode requires the prettier executable to be installed. The easiest way for me to install prettier was globally via npm:

npm install --global prettier

Install prettier-js-mode

prettier-js is now available via Melpa, so the setup is quite easy:

  • Open your .spacemacs (with SPC f e d)
  • Search for dotspacemacs-additional-packages
  • Add prettier-js to it

Example:

dotspacemacs-additional-packages '(simpleclip prettier-js)

Configure on-save

Search for dotspacemacs/user-config and add these two lines to it:

  (add-hook 'js2-mode-hook 'prettier-js-mode)
  (add-hook 'web-mode-hook 'prettier-js-mode)

Done

Now you can just reload your .spacemacs (SPC f e R) or restart emacs, and prettier will format your buffers on save.

This is the second part of my short series about my experiences with Elm while reimplementing Carna.io. Check out part one for some background info.

Fuzz testing in Elm

I will assume that you are familiar with the general concept of fuzz testing, aka generative testing, aka property-based testing. If you are not familiar with it yet, you can get a nice intro from the elm-test readme, or, in case you have access to Frontend Masters, take a look at the fuzz testing part of Richard’s course. It is short and simple and enough to get you started.

Fuzz testing all the time

Now that the background is clear, there is only one thing I want to mention in this post: use fuzz testing, ideally in every project, even if you think it is not worth adding to a particular one. I don’t mean use it for everything, but prefer it over plain unit tests as the go-to tool, and you will quickly be surprised by its value.

My experience in Elm

I added tests and fuzz testing to Carna just to get used to it. Then, surprisingly, my tests failed for an unexpected input. You can find the actual test that failed here. My tests are really rudimentary, but they still found multiple bugs that were quite interesting.

Unexpected String.toInt behavior

In Carna I ask for numeric values like weight or height and parse them to Int. (They are Float now, but that is not important here.) Thus the value is represented as a Result String Int. That is also what is tested in the line linked above. Then my tests failed for the input “+”. The reason:

> String.toInt "+"
Ok NaN : Result.Result String Int

I think I would never have thought about using “+” or “-” as an input value that could cause an Ok result. In particular, I would not have thought that the state Ok NaN was even possible.

What I expected is what happens when you use String.toFloat:

> String.toFloat "+"
Err "could not convert string '+' to a Float" : Result.Result String Float

This bug was not severe, because toString NaN just resulted in my input fields showing the content “NaN” when you enter “+”, which is better than crashing but not what I wanted.

Fixing toFloat

I asked about the behavior in the Elm Slack and got quick and helpful feedback there as well as on Twitter. So thanks to the great, friendly and helpful community. ❤️❤️❤️

I learned that the current state is related to how toInt and toFloat are implemented on top of JavaScript, but more importantly that this is already known and will be fixed in Elm 0.19, which is great news!

Further issues

This was only the first issue in a list of bugs my fuzz tests found. Afterwards I switched to regex matches instead of relying on toInt never creating invalid Ok states, and I forgot to allow decimal points for floats because I used the same pattern for Int and Float validation, but my fuzz tests found that quickly. And there were more, but I think my point is clear.

Combine with test watch mode

I realized that elm-test’s watch mode is great for running a higher number of generated test cases. By default, 100 inputs are generated per fuzz invocation. You can change that value with elm-test’s --fuzz option, but a high value can quickly become annoying if you execute your tests manually and wait for the result. On CI many people already use higher values, and you can also increase the number locally by combining it with the recently added watch mode (started with --watch): the tests start immediately after saving the file, so they are usually already running by the time you switch to the console (if you do that at all). I use this line for local development:

elm-test --watch --fuzz 10000

Since the values the fuzzers generate are biased towards typical error cases, higher values helped me find more edge cases quicker. For more details take a look at elm-test’s Fuzz.frequency.

I guess if you have many more fuzz tests you might go down to 2000 or 1000, but that is still much more than 100, and it has not annoyed me at all yet.

Summary

I really think everybody should consider adding fuzz testing to their Elm projects. Even in my case of a private side project it was very valuable, because there are not only edge cases that you are not aware of, but you might also find bugs in libraries or tools you use. This was for example the case for Riak, which had a high-priority bug discovered via generative testing. In addition I recommend this article from Basho for further reading, and do not forget to increase your --fuzz value in case you use watch mode and on CI.

I decided to write a series about the experiences I had writing an Elm app during the last couple of months. I am trying to add a new post every one or two weeks.

The app I rewrote is a BMI, body-index and body fat percentage calculator that can be used to check whether your weight is in a healthy range or to track fitness progress. You can take a look at:

Motivation

I wrote the app for myself to track my personal results during certain kinds of fitness training over weeks and months. That was more than 4 years ago. It was written in Ruby on Rails as a backend-centric app. Recently I wanted to add one or two new features and realized that most of my users are on mobile devices and the page is not at all mobile friendly. So I thought it could be a fun training project, which also makes things nicer for the people already using the app.

Client side App

This site is a great example of an app that can be done almost completely on the client side. Originally I was afraid of writing everything client side, but that was 2012, and testing and other libraries for JavaScript were just not there yet.

Elm for the win

I decided to write the new version in Elm, using the great elm-mdl library for mobile support. I picked Elm because I like its type system, package system, error messages and tooling … as well as the great community, which I experienced as very kind and helpful.

I also thought about using Clojure + ClojureScript + Om, but even though I really like Clojure, I wanted to try an ML-like type system on a bigger project. I could also have used PureScript, but my knowledge of advanced type systems is limited, so I thought Elm would be the easier language to start with.

2-3 months later

I am very happy that I was able to implement all the features that existed in the original version. It was a lot of fun and I enjoyed writing Elm code almost every day. However, it was a lot of work. I underestimated the effort necessary to learn a new language and a new ecosystem and to write a “private production” app in it.

What's next

Thanks for reading this intro. Next I will get more technical and report on my biggest learnings about Elm and related things … and there were a lot. I will cover “temporarily narrowing error types”, “fuzz testing”, and much more.

Intro

Recently I worked on a document search for my current company. As part of the crawling and indexing process I worked on scoring the documents. The project was done during a hack week, and thus the scope was limited to what you can achieve in a few days. Because of that we started with a trivial approach and incrementally improved it from there on.

V1: Most trivial approach

Concept: To score a document for a given query, we create a vector of all possible words (i.e. the union of all unique words across all documents) and define this as our alphabet of size N. Next we define a function c with c(w,q) ε {0|1} and c(w,d) ε {0|1}, which returns 0 or 1 based on whether the given term w occurs in the given query or in the given document.

We can think about this as creating two bit vectors of size N: one document vector and one query vector. Then we can compute the score simply as the dot product of the two vectors.

This can also be expressed in sum notation:

score(q, d) = Σ (i = 1..N) c(w_i, q) * c(w_i, d)
Naive sum of occurrences

Over all possible words w_1 to w_N we compute c(w,q) and c(w,d) and multiply the results. If a term is not present in the query or in the document, the product is 0, which is the identity / neutral element for sums. Then we just sum up all these products.
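
To make this concrete, here is a minimal sketch of V1 in Python (my own illustration, not the actual hack-week code; the function and variable names are made up):

def score_v1(query, document):
    """V1: dot product of two bit vectors.

    `query` and `document` are lists of tokens; c(w, q) and c(w, d)
    are 1 if the word occurs at all, otherwise 0.
    """
    query_words = set(query)
    document_words = set(document)
    # Summing 1 * 1 over the shared vocabulary is the same as the dot
    # product over the whole alphabet, since all other products are 0.
    return sum(1 for word in query_words if word in document_words)


# Only "cat" matches, so the score is 1.
print(score_v1(["cat", "food"], ["my", "cat", "eats", "everything"]))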

Solved problems:

  • [x] score for matches
  • [x] no score for non matches

Open problems: If c(w,d) results in x, where x ε {0|1}, then the similarity of the document and the query is not expressed strongly enough. In other words: our 1/0 score for matches is too simple to make a statement about how well a term in the query matches a given document. We can only say that it occurred at least once.

V2: Term frequency instead of a bit vector

Improvement: Instead of returning 0 or 1 for each possible word (bit vector), we return the count of occurrences of the given term. This is called the Term frequency (TF). So our counting function c is now defined as:

c(w,q) and c(w,d) = x , x >= 0
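
Sketched in the same (hypothetical) Python style as above, the only change compared to V1 is that we count occurrences instead of checking membership:

from collections import Counter


def score_v2(query, document):
    """V2: c(w, d) is the raw term frequency instead of 0 or 1."""
    document_counts = Counter(document)
    # c(w, q) * c(w, d), summed over the words of the query.
    return sum(query_count * document_counts[word]
               for word, query_count in Counter(query).items())


# "cat" occurs twice in the document, so the score is 2.
print(score_v2(["cat", "food"], ["cat", "video", "cat"]))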

Solved problems:

  • [x] higher score for more matches
  • [x] no score for non matches
  • [x] c(w,d) now expresses the similarity of the word and the document better

Open problems:

  • The frequency does not have an upper bound, so repeating the same word over and over again results in huge scores; this lets bad matches rank high and lets people fool our search engine.
  • Stop words and rare words have the same impact on the scoring function, so documents with a lot of common English words but without any matches for meaningful nouns and verbs might score very high.

V3: Inverse document frequency

Prioritize rare terms over common terms by introducing document frequency

In order to increase the importance of rare (meaningful) terms compared to common terms (stop words), we will penalize terms which occur in many documents (high document frequency). Therefore we define:

M = Total number of all documents

and document frequency:

df(w) = number of documents that contain w

This approach is based on the idea of term specificity.

Now we can represent the inverse document frequency of w as:

IDF(w) = log((M + 1) / df(w))
Inverse document frequency

Leading us to this new scoring function:

score(q, d) = Σ (i = 1..N) c(w_i, q) * c(w_i, d) * log((M + 1) / df(w_i))
New scoring function including IDF

With c(w,q) and c(w,d) = x , x >= 0
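
As a sketch (same caveats as above: my own names, not the original code), the IDF-weighted version could look like this, where documents is the whole collection and is used to compute M and df(w):

import math
from collections import Counter


def idf(word, documents):
    """Inverse document frequency: log((M + 1) / df(w))."""
    m = len(documents)
    df = sum(1 for doc in documents if word in doc)
    return math.log((m + 1) / df) if df else 0.0


def score_v3(query, document, documents):
    """V3: term frequency weighted by inverse document frequency."""
    document_counts = Counter(document)
    return sum(query_count * document_counts[word] * idf(word, documents)
               for word, query_count in Counter(query).items())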

Solved problems:

  • [x] score for matches
  • [x] no score for non matches
  • [x] c(w,d) now expresses the similarity of the word and the document better
  • [x] rare terms are more important than stop words

Open problems:

  • Term frequency does not have an upper bound, so repeating the same word over and over again results in huge scores.

V4: An upper bound for the term frequency

In order to address the final open issue, let's take a look at BM25 as one of the available options to ensure an upper bound on the term frequency.

TF_BM25(x) = ((k + 1) * x) / (x + k)
TF transformation with BM25

BM25 is a well established algorithm for this task. It also falls back to our initial binary 1/0 logic for k = 0.

Using our c(w,d) as the x in the BM25 definition we just need to change one part of our scoring function and keep c(w,q) as well as the IDF term at the end. Our new equation looks like this:

score(q, d) = Σ (i = 1..N) c(w_i, q) * ((k + 1) * c(w_i, d)) / (c(w_i, d) + k) * log((M + 1) / df(w_i))
New equation including BM25

Solved problems:

  • [x] higher score for more matches
  • [x] no score for non matches (multiplying query count by document count)
  • [x] c(w,d) now expresses the similarity of the word and the document better
  • [x] rare terms are more important than stop words (IDF)
  • [x] Term frequency has an upper bound so that repeating a single word many times does not fool our algorithm.

Open problems:

  • When handling documents of different lengths, longer documents will always have an advantage, since they have more content and are thus more likely to contain the terms we search for.

V5: Introduce pivoted length normalization

Introducing document length normalization

To solve the final problem we are left with, we will introduce a way to penalize very long documents to a certain degree, giving a better balance between long and short documents. We define a “normalizer” as follows:

normalizer = 1 - b + b * |d| / avdl
Pivoted document length normalizer, where |d| is the length of the document and avdl the average document length in the collection

with b ε [0,1]. Here b controls the impact of the normalizer: with b = 0 the normalizer is always 1; as we increase b towards 1, the penalty for long documents gets bigger.

We can now replace the plain k in our denominator with k multiplied by this normalizer, which leads us to our final equation.

score(q, d) = Σ (i = 1..N) c(w_i, q) * ((k + 1) * c(w_i, d)) / (c(w_i, d) + k * (1 - b + b * |d| / avdl)) * log((M + 1) / df(w_i))
New scoring function including document length normalizer

Solved problems:

  • [x] score matches
    • use of the sum of products
  • [x] do not score non matches
    • multiplication of query count by document count
  • [x] c(w,d) expresses the similarity of the term and the document
    • we replaced a bit vector with term frequency (TF)
  • [x] rare terms are more important than common words (stop words)
    • we added IDF
  • [x] Upper bound for term frequency, to handle repeated words properly
    • we added BM25
  • [x] handle a mix of short and long documents well
    • Introduction of document length normalization

Summary

We were able to arrive at a state-of-the-art scoring function by incrementally improving our initial approach, which was very naive and simple to implement. The similarity to the Okapi / BM25 family can be seen on this wiki page or in the publications from Robertson and Walker about Okapi / BM25.

This article used equations based on the “vector space model”, which is, for example, used in Apache Lucene; quite impressively, this means even Elasticsearch uses it under the hood! However, other real-world engines and the “official” BM25 definition use a probabilistic relevance model instead, so that might be interesting for further reading.

Second attempt

In my recent Don’t blame rails post, I talked about my opinion on most of the rails blaming that has been happening lately.

It was interesting how much some of the people I consider good software engineers disagreed with me. I am happy that I was finally able to distill the difference in opinions so that I can now share it. In addition, I am now able to make a more precise statement of my opinion than I was in the last blog post.

Skill level

The first important distinction is that the skill level of the complaining person blurs into my perception of their statements. I don’t have any issue discussing the problems rails has with somebody who really understands those problems and the benefits the alternatives have. I am just annoyed by criticism that is based on repeating other people’s statements without one’s own experience or expertise.

Replacing frameworks with frameworks

I guess one thing that makes me skeptical is that people criticize rails, its complexity or its implications for the app they build, and still want to switch to Phoenix or some framework in another language. I find it confusing to treat swapping one framework for another as a solution that should address architecture or complexity. I am afraid that one is destined to experience the same problems with the new framework that one originally had with rails.

Two distinct topics

My last post, and the above reflection on it, was mainly about the way people blame and criticize rails, which partly bothered me.

Today I also want to take the time to express my own opinion.

My opinion

I really appreciate what rails did for the web development community. I think it made this part of our industry a better place. (Just think of the fact that I had to use PHP before Ruby and Rails.)

However, I don’t like its complexity, the missing support for separation of concerns, and the direction rails is heading, even though I really appreciate that the core members are doing a great job simplifying parts of it.

Frameworks and rails’ design choices

I think part of this complexity and its implications simply comes from the fact that it's a framework. However, other parts just don't fit my idea of good software design. Some examples are:

  • I prefer having one abstraction per action rather than one controller with a method per action, which then might be compensated for by having a service/interactor per action.
  • Having a distinction between domain models (entities) and database object mappings (in OO contexts) makes things easier. It’s easy to do manually, but it’s additional boilerplate.
  • The fact that one has to use hooks instead of instantiation is a bad thing in itself. This is the case with controllers, but also with the way DHH prefers to expose the attributes API, which otherwise would be just great.
  • Having a repository abstraction seems valuable to me, and it's one of the parts where I prefer Ecto over AR.

These are just some of the things that bother me. I guess one can compensate for most of it in rails quite easily, but I just don't like working with it. I guess it no longer fits my needs and the way I like to write/build software systems. Maybe ActiveRecord no longer provides the abstractions/API that match the learnings of the last years and current best practices.

My future way

By default I personally would not use rails for new projects. I am aiming for simpler and less coupled components to create properly maintainable systems. This also implies that I would prefer libraries over frameworks. Working on top of Rack, WAI, Ring and so on seems to be a better fit for the kind of API-only microservices I am creating at the moment.

I guess there will be some use-cases or sets of requirements in the business context that will make rails the proper choice, but unless that is the case, I will go with smaller and simpler solutions.

Another reason is that I would not use Ruby by default anymore, but instead go with Elixir, Clojure or Haskell, depending on the runtime constraints. The reason behind that is simply my belief in the benefits of functional programming compared to imperative programming (let's avoid the ‘definition of OO’ discussion here). And even though Ruby offers some functional concepts, it's always a pain to go that direction in a language that is not designed to be a functional programming language.

Ruby and Rails are great

However, I don’t want to take anything away from Rails or Ruby. They have their benefits and they have made great achievements possible. Ruby and Rails were solving the problems of their time, and they may simply have outlived that time and need.

UPDATE: As a funny coincidence, I just saw this talk by Justin Searls at RailsConf, which also goes into some detail about people leaving rails, from quite a different angle that I had not considered so far. Nice to have additional ideas on the topic.