This is the second part of my short series on my experiences with Elm while reimplementing Carna.io. Check out my part one for some background info.

Fuzz testing in Elm

I will assume that you are familiar with the general concept of fuzz testing, aka generative testing, aka property-based testing. If you are not, you can get a nice intro from the elm-test readme, or, if you have access to Frontend Masters, take a look at the fuzz testing part of Richard’s course. It is short and simple and enough to get you started.

Fuzz testing all the time

With the background out of the way, there is only one thing I want to say in this post: use fuzz testing, ideally in every project, even if you think it is not worth adding to a particular project. I don’t mean that you should use it for everything, but prefer it over unit testing as your go-to tool and you will quickly be surprised by its value.

My experience in Elm

I added tests and fuzz testing to Carna just to get used to them. Then, surprisingly, my tests failed for an unexpected input. You can find the actual test that failed here. My tests are really rudimentary, but they still found multiple quite interesting bugs.

Unexpected String.toInt behavior

In Carna I ask for numeric values like weight or height and parse them to Int (now they are Float, but that is not important here). Thus the value is represented as a Result String Int, which is also what the linked test checks. Then my tests failed for the input “+”. The reason:

> String.toInt "+"
Ok NaN : Result.Result String Int

I would never have thought of “+” or “-” as an input value that produces an Ok result. In particular, I would not have thought that the state Ok NaN was even possible.

What I expected is what happens when you use String.toFloat:

> String.toFloat "+"
Err "could not convert string '+' to a Float" : Result.Result String Float

This bug was not severe, because toString NaN just resulted in my input fields showing “NaN” when you entered “+”. That is better than crashing, but not what I wanted.

Fixing toInt

I asked about the behavior in the elm slack and got quick and helpful feedback there as well as on twitter. So thanks to the great, friendly and helpful community. ❤️❤️❤️

I learned that the current behavior is related to how toInt and toFloat currently build on JavaScript’s number parsing, but more importantly, that this is already known and will be fixed in Elm 0.19, which is great news!

Further issues

This was only the first issue in a list of bugs my fuzz tests found. Afterwards I switched to regex matches instead of relying on toInt never producing invalid Ok states. Then I forgot to allow decimal points for floats, because I used the same pattern for Int and Float validation, but my fuzz tests caught that quickly. There were more, but I think my point is clear.
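The post does not show Carna’s validation code, so the following is only a language-neutral sketch in Python of the two bugs described above. The regex patterns (`NAIVE_INT`, `INT`, `FLOAT`) are hypothetical, and a hand-rolled random generator stands in for elm-test’s fuzzers. Each property returns a counterexample, or `None` if it found none:

```python
import random
import re

# Hypothetical validation patterns for illustration; these are not Carna's actual patterns.
NAIVE_INT = re.compile(r"[+-]?\d*")      # bug: also matches "", "+" and "-"
INT = re.compile(r"[+-]?\d+")            # at least one digit required
FLOAT = re.compile(r"[+-]?\d+(\.\d+)?")  # optional decimal part


def fuzz_strings(rng, count):
    """Random short strings, biased towards edge cases such as a lone sign."""
    edge_cases = ["", "+", "-", ".", "1.", ".5", "+-1", "72.5"]
    alphabet = "0123456789+-."
    for _ in range(count):
        if rng.random() < 0.3:
            yield rng.choice(edge_cases)
        else:
            length = rng.randint(0, 5)
            yield "".join(rng.choice(alphabet) for _ in range(length))


def only_accepts_numbers(pattern, runs=1000, seed=1):
    """Property: every string the pattern accepts really is a number."""
    rng = random.Random(seed)
    for text in fuzz_strings(rng, runs):
        if pattern.fullmatch(text):
            try:
                float(text)
            except ValueError:
                return text  # counterexample: accepted, but not a number
    return None


def accepts_all_decimals(pattern, runs=1000, seed=2):
    """Property: every decimal such as '72.5' is accepted."""
    rng = random.Random(seed)
    for _ in range(runs):
        text = f"{rng.randint(0, 300)}.{rng.randint(0, 9)}"
        if not pattern.fullmatch(text):
            return text  # counterexample: a valid decimal was rejected
    return None
```

Here `only_accepts_numbers(NAIVE_INT)` reports an input such as “+” or the empty string (the same class of bug as `Ok NaN`), and `accepts_all_decimals(INT)` reports a rejected decimal, mirroring the reused-pattern mistake. `FLOAT` passes both checks.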

Combine with test watch mode

I realized that elm-test’s watch mode is great for running a higher number of generated test cases. By default, 100 inputs are generated per fuzz test, and you can change that value with the --fuzz option of elm-test. However, a high value can quickly become annoying if you run your tests manually and wait for the result. On CI many people already use higher values, but you can also increase the number locally by combining it with the recently added watch mode, which you start with --watch. Because the tests start right after you save a file, they are usually already running by the time you switch to the console (if you do that at all). I use this command for local development:

elm-test --watch --fuzz 10000

Since the values the fuzzers generate are biased towards typical error cases, higher values helped me find more edge cases more quickly. For more details, take a look at elm-test’s Fuzz.frequency.

If you have many more fuzz tests, you might go down to 2000 or 1000, but that is still much more than 100, and it has not annoyed me at all yet.
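To get a feeling for why more runs help, here is a rough Python sketch (not elm-test’s implementation) of a `frequency`-style generator with hypothetical weights, plus the probability that an input generated with probability p shows up at least once in n runs, 1 - (1 - p)^n:

```python
import random


def frequency(rng, weighted_generators):
    """Pick one generator with probability proportional to its weight and run it.
    A rough analogue of the idea behind elm-test's Fuzz.frequency, not its code."""
    weights = [weight for weight, _ in weighted_generators]
    generators = [gen for _, gen in weighted_generators]
    return rng.choices(generators, weights=weights)[0](rng)


# Hypothetical weights: one draw in four is an edge case, the rest are arbitrary ints.
BIASED_INT = [
    (1, lambda rng: rng.choice([0, 1, -1])),
    (3, lambda rng: rng.randint(-1_000_000, 1_000_000)),
]


def chance_to_hit(p, runs):
    """Probability that an input generated with probability p shows up at least once."""
    return 1 - (1 - p) ** runs
```

For an input with p = 1/1000, 100 runs find it only about 9.5% of the time, while 10000 runs find it with a probability above 99.99%. Biasing the generator towards edge cases raises p for exactly the inputs that tend to break code.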

Summary

I really think everybody should consider adding fuzz testing to their Elm projects. Even for a private side project like mine it was very valuable, because there are not only edge cases you are not aware of, but you might also find bugs in the libraries or tools you use. This was, for example, the case for Riak, which had a high-priority bug discovered via generative testing. I also recommend this article from Basho for further reading, and do not forget to increase your --fuzz value when you use watch mode and on CI.

I decided to write a series about the experiences I had writing an Elm app during the last couple of months. I am trying to add a new post every one or two weeks.

The app I rewrote is a BMI, body-index and body fat percentage calculator that can be used to check whether your weight is in a healthy range or to track fitness progress. You can take a look at:

Motivation

I wrote the app for myself to track my personal results during certain kinds of fitness training over weeks and months. That was more than 4 years ago. It was written in Ruby on Rails as a backend-centric app. Recently I wanted to add one or two new features and realized that most of my users use mobile devices, and the page is not at all mobile friendly. So I thought it could be a fun training project that also makes things nicer for the people already using the app.

Client side App

This site is a great example of an app that can be built almost completely on the client side. Back then I was hesitant to write everything client side, but that was 2012, and testing tools and other libraries for JavaScript were just not there yet.

Elm for the win

I decided to write the new version in Elm, using the great elm-mdl library for mobile support. I picked Elm because I like its type system, package system, error messages and tooling, as well as the great community, which I experienced as very kind and helpful.

I also thought about using Clojure + ClojureScript + Om, but even though I really like Clojure, I wanted to try an ML-like type system on a bigger project. I could have also used PureScript, but my knowledge of advanced type systems is limited, so I thought Elm would be the easier language to start with.

2-3 months later

I am very happy that I was able to implement all the features of the original version. It was a lot of fun and I enjoyed writing Elm code almost every day. However, it was a lot of work: I underestimated the effort needed to learn a new language and ecosystem and to write a “private production” app in it.

What’s next

Thanks for reading this intro. Next I will get more technical and report on my biggest learnings about Elm and related things … and there were a lot. I will cover “temporarily narrowing error types”, “fuzz testing”, and much more.

Intro

Recently I worked on a document search for my current company. As part of the crawling and indexing process I worked on scoring the documents. The project was done during a hack week, and thus the scope was limited to what you can achieve in a few days. Because of that we started with a trivial approach and incrementally improved it from there on.

V1: Most trivial Approach

Concept: To score a document for a given query, we create a vector of all possible words (i.e. the union of all unique words in all documents) and define this as our alphabet of size N. Next we define a function c with c(w,q) ∈ {0,1} and c(w,d) ∈ {0,1}, which returns 1 if the given term w occurs in the given query (resp. the given document) and 0 otherwise.

We can think of this as creating two bit vectors of size N, one document vector and one query vector. We can then compute the score simply as the dot product of the two vectors.

This can also be expressed with Sum notation:

$$f(q,d) = \sum_{i=1}^{N} c(w_i,q)\, c(w_i,d)$$

For all possible words w_1 to w_N we compute c(w_i,q) and c(w_i,d) and multiply the results. If the term is not present in the query or in the document, the product is 0, which is the identity (neutral) element for sums. Then we just sum all these products.
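A minimal Python sketch of V1, assuming naive lowercase whitespace tokenization. `tokenize`, `build_vocabulary`, `bit_vector` and `score_v1` are hypothetical helpers, not the code from the hack week:

```python
def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(documents):
    """The 'alphabet': the sorted union of all unique words in all documents (size N)."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def bit_vector(text, vocabulary):
    """c(w, text) for every word w in the vocabulary: 1 if w occurs in text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


def score_v1(query, document, vocabulary):
    """Dot product of the query and document bit vectors."""
    q = bit_vector(query, vocabulary)
    d = bit_vector(document, vocabulary)
    return sum(qi * di for qi, di in zip(q, d))


documents = ["the cat sat on the mat", "the dog chased the cat"]
vocabulary = build_vocabulary(documents)
```

With these two documents, the query “cat on mat” scores 3 against the first and 1 against the second; query words outside the vocabulary are simply ignored.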

Solved problems:

  • [x] score for matches
  • [x] no score for non matches

Open problems: Since c(w,d) can only be 0 or 1, the similarity of the document and the query is not expressed strongly enough. In other words: a score of 1/0 per term is too simple to say how well a query term matches a given document. We can only say that it occurred at least once.

V2: Term frequency instead of a bit vector

Improvement: Instead of returning 0 or 1 for each possible word (bit vector), we return the count of occurrences of the given term. This is called the Term frequency (TF). So our counting function c is now defined as:

c(w,q), c(w,d) ∈ {0, 1, 2, ...}, i.e. the number of occurrences of w in q resp. d
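The same idea with counts instead of bits, again as a hypothetical Python sketch. Since words missing from either side contribute 0, it is enough to iterate over the query’s words instead of the whole vocabulary:

```python
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def score_v2(query, document):
    """Sum of c(w, q) * c(w, d), where c now counts occurrences (term frequency)."""
    q_counts = Counter(tokenize(query))
    d_counts = Counter(tokenize(document))
    # A Counter returns 0 for missing words, so non-matching words add nothing.
    return sum(q_count * d_counts[word] for word, q_count in q_counts.items())
```

Note how a document consisting of “cat” repeated 100 times scores 100 for the query “cat”: nothing bounds the term frequency yet, which is the first open problem below.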

Solved problems:

  • [x] higher score for more matches
  • [x] no score for non matches
  • [x] c(w,d) now expresses the similarity of the word and the document better

Open problems:

  • The frequency has no upper bound, so repeating the same word over and over again results in huge scores. Thus we can get bad matches, or people can fool our search engine.
  • Stop words and rare words have the same impact on the scoring function, so documents with a lot of regular English words but without any matching nouns or verbs might score very high.

V3: Inverse Document frequency

Prioritize rare terms over common terms by introducing document frequency

In order to increase the importance of rare (meaningful) terms compared to common terms (stop words), we will penalize terms that occur in many documents (high document frequency). Therefore we define:

M = Total number of all documents

and document frequency:

df(w) = number of documents that contain w

This approach is based on the idea of term specificity.

Now we can represent the inverse document frequency of w as:

$$\mathrm{IDF}(w) = \log\frac{M+1}{df(w)}$$

Leading us to this new scoring function:

$$f(q,d) = \sum_{i=1}^{N} c(w_i,q)\, c(w_i,d)\, \log\frac{M+1}{df(w_i)}$$

With c(w,q), c(w,d) ∈ {0, 1, 2, ...} as in V2.
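A Python sketch of the V3 scoring function, assuming (as in the definitions above) that the scored document is part of the collection, so df(w) ≥ 1 for every word it contains. The helper names and the whitespace tokenizer are hypothetical:

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def idf(word, documents):
    """log((M + 1) / df(w)); assumes w occurs in at least one document."""
    m = len(documents)
    df = sum(1 for doc in documents if word in set(tokenize(doc)))
    return math.log((m + 1) / df)


def score_v3(query, document, documents):
    """Sum of c(w, q) * c(w, d) * IDF(w) over words shared by query and document."""
    q_counts = Counter(tokenize(query))
    d_counts = Counter(tokenize(document))
    total = 0.0
    for word, q_count in q_counts.items():
        if d_counts[word] == 0:
            continue  # the term would be 0 anyway; skipping also avoids df(w) = 0
        total += q_count * d_counts[word] * idf(word, documents)
    return total


documents = ["the cat sat", "the dog sat", "the cat ran"]
```

For this collection (M = 3), the rare term “cat” contributes log(4/2) ≈ 0.69 to the first document’s score, while the common term “the” contributes only log(4/3) ≈ 0.29.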

Solved problems:

  • [x] score for matches
  • [x] no score for non matches
  • [x] c(w,d) now expresses the similarity of the word and the document better
  • [x] rare terms are more important than stop words

Open problems:

  • Term frequency does not have an upper bound, so repeating the same word over and over again results in huge scores.

V4: An upper bound for the Term frequency

In order to address the final open issue, let’s take a look at BM25 as one of the available options for putting an upper bound on the term frequency.

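$$\mathrm{TF}_{BM25}(x) = \frac{(k+1)\,x}{x + k}, \qquad k \ge 0$$

Here x is the raw term count and k is a tuning parameter. For any k > 0 the transformed value grows with x but never exceeds k + 1.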

BM25 is a well-established algorithm for this task. It also falls back to our initial binary 1/0 logic for k = 0, since every count x > 0 is then mapped to 1.

Using our c(w,d) as the x in the BM25 definition, we only need to change one part of our scoring function and keep c(w,q) as well as the IDF term at the end. Our new equation looks like this:

$$f(q,d) = \sum_{i=1}^{N} c(w_i,q)\, \frac{(k+1)\, c(w_i,d)}{c(w_i,d) + k}\, \log\frac{M+1}{df(w_i)}$$

Solved problems:

  • [x] higher score for more matches
  • [x] no score for non matches (multiplying query count by document count)
  • [x] c(w,d) now expresses the similarity of the word and the document better
  • [x] rare terms are more important than stop words (IDF)
  • [x] Term frequency has an upper bound, so repeating a single word many times does not fool our algorithm.

Open problems:

  • When handling documents of different lengths, longer documents always have an advantage, since they have more content and are thus more likely to contain the word we search for.

V5: Introduce Pivoted length normalization

Introducing document length normalization

To solve the final remaining problem, we introduce a way to penalize very long documents to a certain degree, to get a better balance between long and short documents. We define a “Normalizer” as follows:

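$$\mathrm{normalizer} = 1 - b + b\,\frac{|d|}{avdl}$$

where |d| is the length of document d (its number of words) and avdl is the average document length in the collection.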

The parameter b ∈ [0,1] controls the impact of the normalizer. With b = 0 the normalizer is always 1; as we increase b towards 1, documents longer than the average are penalized more strongly (and shorter ones are rewarded).

We can now replace the plain k in our denominator with k multiplied by this normalizer, which leads us to our final equation.

$$f(q,d) = \sum_{i=1}^{N} c(w_i,q)\, \frac{(k+1)\, c(w_i,d)}{c(w_i,d) + k\left(1 - b + b\,\frac{|d|}{avdl}\right)}\, \log\frac{M+1}{df(w_i)}$$

Solved problems:

  • [x] score matches
    • use of the sum of products
  • [x] do not score non matches
    • multiplication of query count by document count
  • [x] c(w,d) expresses the similarity of the term and the document
    • we replaced a bit vector with term frequency (TF)
  • [x] rare terms are more important than common words (stop words)
    • we added IDF
  • [x] Upper bound for term frequency, to handle repeated words properly
    • we added BM25
  • [x] handle a mix of short and long documents well
    • Introduction of document length normalization

Summary

We were able to arrive at a state-of-the-art scoring function by incrementally improving our initial approach, which was very naive and simple to implement. The similarity to the Okapi / BM25 family can be checked on this wiki page or in the publications by Robertson and Walker about Okapi / BM25.

This article used equations based on the “vector space model”, which is used, for example, in Apache Lucene; quite impressively, this means even Elasticsearch used it under the hood! However, other real-world engines and the “official” BM25 definition use a probabilistic relevance model instead, which might be interesting for further reading.

second attempt

In my recent Don’t blame rails post, I talked about my opinion on most of the rails blaming that has been happening recently.

It was interesting how much some of the people I consider good software engineers disagreed with me. I am happy that I was finally able to distill the difference in opinions so that I can now share it. In addition, I am now able to state my opinion more precisely than I did in the last blog post.

Skill level

The first important distinction is the skill level of the person complaining, which seems to color my perception of the statements. I don’t have any issue with discussing the problems rails has with somebody who really understands them and the benefits alternatives offer. I am just annoyed by criticism that is based on repeating other people’s statements without own experience or expertise.

Replacing frameworks with frameworks

I guess one thing that makes me skeptical is that people criticize rails and its complexity, or its implications for the apps they build, and still want to switch to Phoenix or some framework in another language. I find it confusing to simply swap frameworks as a solution to architecture or complexity problems. I am afraid that one might be destined to experience the same problems with the new framework that one originally had with rails.

Two distinct topics

My last post and the above reflection on it were mainly about the way people blame and criticize rails, which partly bothered me.

Today I also want to take the time to express my own opinion.

My opinion

I really appreciate what rails did for the web development community. I think it made this part of our industry a better place. (Just thinking of the fact that I had to use PHP before Ruby and Rails)

However, I don’t like its complexity, the missing support for separation of concerns, and also the direction rails is heading, even though I really appreciate that the core members are doing a great job of simplifying parts of it.

Frameworks and rails’ design choices

I think part of this complexity and its implications simply come from the fact that it’s a framework. However, other parts just don’t fit my idea of software design. Some examples are:

  • I prefer having a per-action abstraction instead of one controller with one method per action, which then might be compensated for by having a service/interactor per action.
  • Having a distinction between domain models (entities) and database object mappings (in OO contexts) makes things easier. It’s easy to do manually, but it’s additional boilerplate.
  • Having to use hooks instead of instantiation is a bad thing in itself. This is the case with controllers, but also with the way DHH prefers to expose the attributes API, which otherwise would be just great.
  • Having a repository abstraction seems valuable to me, and it’s one of the parts where I prefer Ecto over AR.

These are just some things that bother me. I guess one can compensate for most of it in rails quite easily, but I just don’t like working with it. I guess it simply no longer fits my needs and the way I like to write/build software systems. Maybe ActiveRecord no longer provides the abstractions/API that match the learnings of the last years and current best practices.

My future way

By default I personally would not use rails for new projects. I am aiming for simpler and less coupled components to create properly maintainable systems. This also implies that I prefer libraries over frameworks. Working on top of Rack, WAI, Ring and so on seems to be a better fit for the kind of API-only microservices I am creating at the moment.

I guess there will be some use-cases or sets of requirements in the business context that will make rails the proper choice, but unless that is the case, I will go with smaller and simpler solutions.

Another reason is that I would not use Ruby by default anymore, but instead go with Elixir, Clojure or Haskell, depending on the runtime constraints. The reason behind that is simply my belief in the benefits of functional programming compared to imperative programming (let’s avoid the ‘definition of OO’ discussion here). And even though Ruby offers some functional concepts, it’s always a pain to go in that direction in a language that is not designed to be a functional programming language.

Ruby and Rails are great

However, I don’t want to take anything away from Rails or Ruby. They have their benefits and they have made great achievements possible. Ruby and Rails solved the problems of their time, and they might have outlived that time and need.

UPDATE: As a funny coincidence, I just saw this talk from Justin Searls at RailsConf, which also goes into some details about why people are leaving rails, from a quite different angle that I had not considered so far. Nice to have additional ideas on the topic.

It is usually not rails’ fault

Recently, blaming rails for productivity issues in larger projects seems to have become quite popular. Usually I do not care much about these trends. This time, however, I came across a blog post that is fairly representative of most of the criticism towards rails these days.

My background

Before I comment, I want to briefly mention that I have been working with rails since 2006 as part of my job as a software engineer. I believe I have a solid understanding of the trade-offs and most of the pros and cons of certain aspects of rails and its biggest component, ActiveRecord.

That said, I hope it is clear that my goal is not to protect rails (as a project or as a community). I think we should focus on the real facts that might be downsides and not get lost in speculation and irrational or inconsistent arguments.

Rails, frameworks and architecture

Rails is not your architecture

I think it is important to understand that rails does not force a certain architecture. Rails is a framework and thus it pushes you in a certain direction. However, no matter which framework or libraries you use, or whether you decide to write all the code yourself, it is your (team’s) responsibility to ensure a proper design and software quality. This includes your architecture as well as coding styles and guidelines.

Here are some sources about this topic:

There is also some great information on thoughtbot’s Upcase and in other thoughtbot sources that shows that rails and a proper architecture can go hand in hand.

I just want to bring this point up briefly, because I will come back to it later.

Rails quality and your app’s quality

I think we should really distinguish between issues with the rails code base and its internals, and problems the Rails design creates in applications built with it. For example, the number of methods in ActiveRecord is not really a problem in the projects I work on. However, I still don’t like it as a developer.

On the other hand, there are issues with the API that rails offers. One example is the inability to instantiate controller classes (or actions / services / interactors) directly. This leads to either before/after actions or rack middleware being added. The same goes for the way templates and controllers are tied together. However, I think the latter issue can be properly addressed by using proper OO abstractions like form objects.

Other issues like ActiveSupport magic might indicate bad design/software engineering practices, but they do not necessarily push you towards a bad design, since it is the developer’s decision which tools to use, even within a framework.

Finally, rails is a framework and it locks you into a certain structure, as every other framework would. Thus there are also issues that simply arise from using a framework. These issues should also be discussed separately, or at least be understood as what they are.

My opinion on most of the blaming

My comments will mainly focus on this article. However, as I already mentioned, there are various sources that make almost the same points.

Some good points

I think the article really makes some points I totally agree with, but these make up only a small part of it.

  • I really discourage monkey patching. However, the fact that ActiveSupport features are migrated into the language might show issues with Ruby as a language. This is true for try and the existence of nil, as well as for Enumerable.pluck and the missing type safety in Ruby.
  • Having no control over the instantiation of the controller, and thus having to use before/after actions, is not great.
  • The fact that associations can be loaded lazily has caused tons of N+1 query bugs; one should at least be able to disable it.

There are some more points that can’t be avoided and that annoy me; these are just some examples, and some of them can be found in the article.

The Complexity caused by rails

The article points out that rails introduces unnecessary complexity, which of course is a bad thing. There are some examples for it:

  • Monkey patching
  • The number of public methods of ActiveRecord classes
  • Focus on adding features

Most of the other sentences hardly make a point.

I think these points are issues that I also do not like. However, I am not sure why they should really be a problem, because, as I mentioned above, it’s your team that makes the call. Monkey patches (ActiveSupport additions to core classes) don’t have to be used, so if you do not like monkey patching, just don’t use it.

There is also no reason to expose ActiveRecord methods to controllers. You can happily create service objects or use ActiveRecord to implement a repository yourself.

In addition, it’s unfortunate that ActiveRecord classes provide too many public methods. However, this indicates issues with ActiveRecord and should not cause issues in your application. Usually there are multiple ways to do a certain thing, and within a project the team should be consistent about it. This eliminates a large number of public methods from actual use. I think it’s again the responsibility of the developers to ensure a proper code base and a consistent use of the libraries and tools the project is using.

When it comes to views, I really think there are some design issues in rails that create an API that pushes towards coupled code. However experienced developers should notice that and use a proper solution. Simply following Sandi Metz’ rule to only pass one object to your view solves most of the problems less experienced developers have.

With regard to ActiveRecord, I don’t really expect a skilled software developer to make heavy use of ActiveRecord callbacks or to implement business logic in the persistence layer.

Doing so would cause trouble very quickly. However, I think the time when the community or the rails developers advocated this way of programming has been over for years.

The rails way

Often people refer to ‘the rails way’ or to how ‘the rails community’ does things. As I just said, I think things really shifted around 2010 or so, which kind of proved Bob Martin’s statement about the lost 10 years correct. In the last year I listened to all episodes of:

and watched all the Ruby-related content on thoughtbot’s Weekly Iteration as well as the Upcase Ruby content from thoughtbot.

This was a really good time investment, since I learned that a huge part of the ruby community, the rails community and the rails core team really care about the quality of their projects, and that the time of ‘fat models’ passed a long time ago. In addition, the core team does a great job of simplifying the internals and providing easier and more explicit APIs for rails components.

I am really happy to learn that thoughtbot and other companies with a lot of rails experience offer advice on how to design your rails application to ensure a good architecture, as well as things like fast test suites and much more.

This really made me change my opinion about how hard it is to build proper (well designed / maintainable) applications with rails.

Mixing framework and language issues

When comparing frameworks it is easy to take language differences and project them onto frameworks/libraries. This often happens in the rails vs. phoenix discussion, where many points are really about ruby vs. elixir or even about the benefits of functional programming.

This might affect your choice of ruby+rails vs. elixir+phoenix or similar, but it is not something that can be attributed to rails. This is true for things like Object#try, which exist because ruby supports nil. It is also true that large method interfaces are a general ruby issue. Of course, ActiveRecord still adds way too many methods. However, this is more a topic for discussing rails internals or improving rails, and not a reason why your application might be in a bad state.

A word on “The core team”

Throughout the article the author repeats his opinion on the ‘philosophy’ of the core team over and over again. I am not sure which members he is actively communicating with, but these statements are just generalizations of statements and decisions mostly made by DHH, who of course has a special opinion about rails. However, his task is more or less to sell rails and not to be a rails architecture consultant. I think if somebody disagrees with that direction, he should at least refer to DHH and not always talk about the ‘core team’. This really shows a lack of respect towards core members who spent months and years simplifying ActiveRecord, the rails router and other rails components in recent years.

It’s even worse to bring up the DHH/TDD discussion or his focus on simple over easy in an article about rails. It might even be harmful when you are just trying to decide whether rails is a good choice or not, since this topic is totally unrelated to a proper decision on which framework to choose (if any).

Rails killed merb and data mapper

Original quote: “These projects were killed by rails”

I think this point is hardly worth commenting on. However, I think it is important to point this out anyway, so that readers who might buy this argument have a chance to rethink it.

Though it is true that merb more or less disappeared after the merge, this is what one would expect when two projects are merged. In addition, this merge caused quite a lot of changes in rails. On the other hand, the fact that merb or DataMapper (which is not dead at all, but just did not take off) did not succeed is hardly rails’ fault, but a decision by the ruby community about which tools they want to use. I am really not buying the claim that it is rails’ fault that

“building anything new in the ruby ecosystem turned out to be extremely difficult. Since peoples’ attention is Rails-focused, new projects have been highly influenced by Rails”

I think bigger companies and ruby shops did shift to other libraries and web frameworks. Sinatra also gained more and more attention and adoption, and those projects usually do not use ActiveRecord as a data source.

Trailblazer

This article, as well as this year’s RailsConf talk about Ecto, mentioned the Trailblazer project. I don’t want to rant too much, but I can hardly take any rails criticism seriously that also promotes this project.

If complexity is an issue, then adding a framework on top of a framework that is already too big just makes things worse. It also makes things more fragile and makes it harder for people to try new rails versions early, since it is highly coupled to rails.

But it gets worse

Representable

In my opinion this component is the worst attempt at implementing a serializer I have ever had to work with. It’s overly complicated, and inheritance is misused and overused. Especially the whole approach of mixing a presenter into a model instance is totally bizarre. The other option, having a wrapper around a model instance, also introduces a lot of complexity. In addition, the API is way too big. Everything can be adjusted via metaprogramming/DSLs, and besides that, it’s by far the slowest serializer I have worked with. I had to dig deep into it for months and I really discourage anybody from using it.

OO Design and inheritance

I watched a video about using Trailblazer and got confused when inheritance was introduced for the controller-related content, especially since its purpose was to share code and the subtypes also stubbed out methods of the original class. Calling this ‘returning to good OO practices’ was the worst part of it. Obviously, this kind of decision does not sound like a well-designed framework that reduces the application’s complexity. Again, I think Trailblazer introduces a worse architecture on top of an architecture that is criticised.

I could go into much more detail, but I can only say that I would think twice before adding this additional indirection and complexity to my project. And it’s kind of intuitively clear: why would adding a set of components on top of something that might be too big and complex solve the issue?

So is all nice and good?

Rails really has some issues when it comes to its implementation and its design. However, these are largely orthogonal to the problems rails users have in their applications. And if you are really concerned about them, you can always contribute.

On the other hand, there are also some API/design issues that push parts of the application towards a tightly coupled design. I think most of these issues can be addressed with proper designs and a good architecture.

The last thing that might be important, even though it is obvious, is that rails is not a good match for every problem. It surely has its sweet spot, and there are problems where it does not shine. I think this is especially the case where elixir is currently really taking off.

Ruby itself has its limitations, which are thus also limitations of rails. And when it comes to performance or large-scale requirements, you should really check how much of the rails stack you actually need.

If that is not at least ‘most of it’, it might be a bad choice in the first place.

Bad projects I worked on

The worst code base I have seen paralyze a company was a PHP project I joined, where we could not introduce permission management even after more than 6 months of work. This project suffered from so many issues, and it did not use any framework at all.

The worst rails projects I worked on were mostly implemented by me, mostly at a stage of my career when I was not yet able to make proper architectural decisions, or at a time when I was still listening to management when they wanted me to “go faster”. This has totally changed, and the above video from Bob Martin about professionalism is a great resource on this topic.

Conclusion

To summarize, there are really some things I don’t like about rails. However, I have to be aware that I am no longer excited about ruby either, which does blur my opinion. Still, there are other voices in the ruby community that are really experienced and agree with these points.

That is why there have been such great improvements to ActiveRecord, including the AdequateRecord refactoring or the attributes API, which is even nicer when used without symbols so that object creation is explicit.

That said, there are way too many blog posts and voices out there at the moment complaining about rails and blaming it for their bad architecture. I think we should really start by thinking about what we, as the people who wrote the code, did wrong, and then analyze the role of the framework.

Most of the complaints mix up comparisons of languages with comparisons of frameworks, or compare frameworks with different feature sets. In addition, issues with the rails implementation are mixed with issues that really impact the design of the application.

However, there are things I do not like about rails and about the direction DHH is moving rails in. I really do not like monkey patching, callbacks, implicit object management, redundant interfaces, the focus on easy over simple, and preferring features over improvements.

And still I am able to write rails applications where all these things barely impact me, because I can choose what to use and what not.

I personally prefer my framework (be it open source or just company internal) to default to a proper architecture, so that I don’t have to write all the boilerplate to set up repositories, interactors and all the other components I use. However, if I don’t know these concepts, such frameworks won’t help me create an application that stays maintainable in the long term.