Software Engineer

I am a Software Engineer. I have a Bachelor of Science (Honours) in Information Technology from the University of Sunderland, Class of 2003. I have been developing software since 2001, when I was offered a role at CERN as part of their Technical Student Programme.

By 2016 I had grown really tired of the software industry, and by the end of 2019 Apple had killed whatever excitement I had left. I am not sure what the next 10 years will bring. What I do know is that my appetite for impactful work has only grown bigger and stronger. Great people make me tick more than anything.

I am also tired.

How to trace and debug an iOS crash (Part 1)

There has been a crash

While preparing for an iOS release, I observed a crash on an iPad (iOS 4.3.3, WiFi). The crash was too consistent to be considered random, but once in a while the app would work as expected. On the other hand, an iPad 3 (iOS 5.1, WiFi/3G) did not exhibit the same behaviour.

The crash was triggered in the completion block of an AFHTTPRequestOperation while parsing a JSON response with an empty body into an NSDictionary using JSONKit.

This was unexpected behaviour, since an empty body response was not defined in the request/response contract.
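The app was written in Objective-C, so the following is only a language-neutral sketch in Python, not the app's actual code. It shows the kind of guard that turns an out-of-contract empty body into a handled case instead of a crash. The name `parse_json_object` is hypothetical. Returning `None` for an empty body is one possible design choice, and the object check plays the role of expecting an `NSDictionary`.

```python
import json
from typing import Any, Dict, Optional


def parse_json_object(body: bytes) -> Optional[Dict[str, Any]]:
    """Parse a response body into a dict, treating an empty body as "no data".

    Returns None for an empty or whitespace-only body instead of handing it
    to the parser. Raises ValueError (json.JSONDecodeError is a subclass) if
    the body is not valid JSON or is JSON but not an object.
    """
    if not body or not body.strip():
        # Outside the contract: surface it explicitly rather than crash.
        return None
    value = json.loads(body)
    if not isinstance(value, dict):
        raise ValueError(f"expected a JSON object, got {type(value).__name__}")
    return value
```

Whether an empty body should map to "no data" or to an error is a contract decision. The sketch only makes that decision explicit instead of leaving it to the parser.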

FACTS

1. iPad 4.3.3 WiFi

EVIDENCE

1.  > GET /foo HTTP/1.1  
2.  < [empty body]

The setup

The iOS app in question was pointing at an Amazon EC2 Load Balancer which had a single EC2 instance attached.

    [iPad 4.3.3 WiFi] -> [AWS Load Balancer] -> [AWS EC2 Single instance]

The request was an HTTP GET expecting a JSON response but getting one with an empty body instead.

    > GET /foo HTTP/1.1  
    < HTTP/1.1 200 OK  
    < [empty body]

The request had a constant set of headers and content, i.e. no user input was involved.

Elimination game

In a scenario where multiple factors are involved, you need to play the elimination game to find the culprit.

Hence, I started by removing the load balancer to see if it made any difference. Effectively, the iOS app was now sending requests directly to the EC2 instance.

    [iPad 4.3.3 WiFi] -> [AWS EC2 Single instance]

As a result, the crash no longer manifested.

    > GET /foo HTTP/1.1  
    < [JSON]  
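Because the crash happened most but not all of the time, a single request per setup proves little. The following minimal Python sketch makes the comparison repeatable: it issues the same GET many times and counts the empty bodies. `fetch_body` and `count_empty_bodies` are hypothetical helpers. The fetcher is passed in as a callable, so either setup (through the load balancer, or straight to the instance) can be plugged in.

```python
import urllib.request
from typing import Callable


def fetch_body(url: str, timeout: float = 10.0) -> bytes:
    """Issue a plain HTTP GET and return the raw response body.

    By default urlopen follows redirects and raises HTTPError for error
    statuses, so a returned body belongs to a successful response.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def count_empty_bodies(fetch: Callable[[], bytes], attempts: int) -> int:
    """Call `fetch` `attempts` times and count the empty (or blank) bodies."""
    empty = 0
    for _ in range(attempts):
        if not fetch().strip():
            empty += 1
    return empty
```

For example, compare `count_empty_bodies(lambda: fetch_body(lb_url), 50)` with the same call against the instance's address. Here `lb_url` is a placeholder for `http://[AWS Load Balancer]/foo`.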

FACTS

1. iPad 4.3.3 WiFi  
2. AWS Load Balancer

At this point, you need to work backwards.

QUESTION: Under what circumstances would the load balancer return an empty body?

The question does not really have a straightforward answer. @goldstein suggested that multiple instances might be attached to the load balancer, with one of them sending the bogus response. This, however, didn't seem to be the case, at least according to the AWS Console.

To verify this assumption, I used traceroute, which warns when a given host resolves to multiple addresses.

    traceroute [AWS Load Balancer]
    Warning: [AWS Load Balancer] has multiple addresses;

Running traceroute a couple of times will give you all the different IPs. In this case there were two (2), one of which was the known EC2 instance.
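traceroute only warns about the extra addresses and then traces a single one, hence the repeated runs. Asking the resolver directly lists every address in one call. The following minimal Python sketch uses the standard `socket.getaddrinfo` as an alternative to what was done here (`dig` or `nslookup` would serve the same purpose). `resolve_all` is a hypothetical helper name.

```python
import socket
from typing import List


def resolve_all(host: str) -> List[str]:
    """Return every distinct IP address that `host` currently resolves to.

    getaddrinfo yields one tuple per address/socket-type combination;
    restricting to SOCK_STREAM and collecting into a set removes duplicates.
    """
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling it with the load balancer's DNS name lists the addresses behind the traceroute warning. The answer may change between calls as the DNS records are updated.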

QUESTION: Why would the load balancer report 2 IPs attached if only 1 instance is assigned according to the AWS console?

Even weirder, the second IP reported was nowhere to be found in the AWS console. A “phantom” IP.

Looking closer at the load balancer configuration, there were 2 availability zones defined.

    availability zone (instance = 0, healthy = NO)  
    availability zone (instance = 1, healthy = YES)

Removing the “unhealthy” availability zone solved the “multiple addresses” issue.

QUESTION: How can that be explained?

ASSUMPTION: When the [AWS EC2 Single instance] was initially created, it was assigned the reported phantom IP. At that point, it was attached to the load balancer in the given availability zone. Later on, an Elastic IP was assigned while the number of instances was reduced to one (1).

I haven’t reproduced the above steps to verify this assumption.

Traceroute reported both the original and the Elastic IP as attached to the load balancer.

Although the load balancer was now correctly pointing to one instance, the crash wasn’t resolved.

Kudos to @goldstein for his insight into AWS.

Continue to part 2