By FENG Xin
A couple of years ago, my customers in the field often told me things like, “We want this issue solved as soon as possible, or else we will have to postpone the first-off appliance project.” Sometimes they warned bluntly that the business contract would be lost if the issue could not be solved by the expected deadline. For me, daily work was full of urgency, criticality and deadlines.
In recent years, however, I have come to realize that my customers rarely talk about issues in this way anymore. As far as I can tell, the change may be related to three days I spent at one field site three years ago. In what follows, BSS means Base Station Subsystem and MSC means Mobile Switching Center; the two are connected by cable so that they can communicate.
1. Seek first the actions to understand
At that time, a new and important feature of our product was in its first-off phase at the Xi’an field. One morning I received a message in which a leader of the field team asked the R&D team to help figure out an issue as soon as possible. As usual, I replied that the investigation was ongoing in R&D and that I would keep him informed as soon as we made any progress. I added that I fully understood the field’s urgent need. Five minutes later, I was surprised to find a short answer in my mailbox: “If you are not in the field yourself, it is really hard for you to understand the urgency we are facing.”
To be honest, I was surprised by his reply, and I began wondering what circumstances the field team was really facing. I thought it was a common issue, particularly for a field where new software had just been upgraded, and I guessed it was most likely due to improper tuning of several performance-optimization parameters. As I remembered it, we had met almost the same issue many times over the past years in a variety of fields. Nevertheless, I decided to fly to Xi’an to see whether I could help, and at the very least I would get more information than I could by mail.
2. Make sure of openness
When I arrived in Xi’an, two meetings were waiting for me. One was to review the latest state of the investigation between R&D and the field team, and the other was to synchronize with the customer.
Before the first meeting, the issue owner in the R&D team called me and hesitantly asked whether we needed to tell the field team that the investigation was stuck, because our own software lacked effective traces to locate the coding bug. I said yes, and then asked what the plan was for continuing. The owner suggested that we might find something in the software configurations in the field.
As proposed, in the meeting we shared that the investigation inside the new software was blocked, and that the second option, examining the configurations, would be pursued first together with the field team. The field team agreed, without any complaint, to start collecting the configuration files from the different sites immediately. We all expected that the issue occurred only on the guilty site, where a set of “wrong” parameters had been configured.
Before the second meeting, the one with the customer, an ad hoc conference call was held at midnight. It was called urgently at the request of the field team, who were running drive tests to see whether the same issue also occurred on all the other sites in the field besides the guilty site. During the day, the R&D team had verified that all sites had similar configurations. As the conference began, the test results arrived. They made it clear that the issue occurred on every site in the field, which meant it was a systemic issue that had appeared after the new software was upgraded a couple of days earlier.
The proactivity of the field team astonished me: it advanced our understanding enormously, even though the result was not what we had expected.
3. In the middle of difficulty lies opportunity
In the conference call, the second option, examining the configurations, was definitively ruled out. Most people believed that, with a systemic issue identified, we were in the worst possible situation. To investigate further, we had to go back to the initial option of seeking the “bug” inside our new software. So the issue owner in the R&D team committed to going through the checkpoints again to look for any new findings. After the commitment, the conference fell into a short silence. There seemed to be no better way forward than falling back.
Suddenly one colleague from the field team asked, “Do we see the same problem in our R&D lab?” “Never,” someone answered. I had supposed that everyone involved knew the same test scenario worked fine in the R&D lab, but now it seemed they did not. “Why was there no issue before the software upgrade, while the failure occurred after it?” another man asked. “Compared with the old software, the new one was tested with the same cell phone, the same configurations, and the same MSC,” I said, breaking the question down to make it easier to understand. “What hardware types do we have in the R&D lab?” someone continued. The conference was turning into a brainstorm.
At last the MSC vendor came into question: the MSC hardware type in the field was different from the one in the R&D lab. “Could the issue have been introduced by the vendor?” I asked, putting forward the third option. “We need more traces to validate it,” one colleague in the R&D team answered carefully; a few people were not sure we should go this way. “We don’t have them now,” I pointed out, and went on, “So far we cannot even be sure that it is entirely our own software’s issue. If we work on the first option, we are treating the issue as 100% caused by us, when in fact the possibility is only 50%. If we work on the third option, however, we are trying to eliminate the other 50%.”
4. Nobody wants to be the last one to know the bad news
Since the field team was responsible for communicating with the customer the next day, they worried about a bigger challenge and more pressure from the customer, and about the risk of being asked to stop the first-off project as a result. They therefore hesitated over whether to inform the customer of the problem. I, on the contrary, insisted that we should inform the customer of the systemic issue and that more traces were needed to establish that it was not caused by us. Finally, I succeeded in persuading them to disclose the latest findings to the customer. Looking back, I suppose the following three points relieved their worries.
1. Nobody wants to be the last one to know the bad news.
2. It was time to speak up, and cooperation from the MSC side would be escalated by the customer.
3. The risk would not be bigger but probably smaller, because the MSC was involved.
The MSC team was also invited to the next morning’s meeting with the customer, although before the meeting the customer did not know why we had requested it. As expected, the customer immediately challenged us: if no quick solution was provided within two days, the upgraded software would be rolled back to the old version and the field project would be stopped until the issue was finally solved. As planned in the conference call the night before, we raised the question with the MSC team and carefully defined the problem as an interworking issue. In particular, we highlighted to the customer that we did not see the same issue in our R&D lab, where our own MSC was used.
5. No data, no decision
Almost at once, the MSC team asked, “Why has there never been the same issue when our MSC interworked with other vendors’ BSS?” The customer turned to us, interested in how we would answer. I clearly noticed that the MSC team was challenging us in order to pull the discussion back to the state of the issue inside the BSS. I did not buy it: no data, no decision. Moreover, gathering details about the MSC’s other interworking was not the point of the current issue. “That makes no sense!” I had to say loudly, and then repeated, “Please note that there is no issue when interworking with our own MSC. I can provide the list of tests run in the R&D lab later on.” The MSC team wanted to raise one more question, but the customer interrupted them by asking us, “What is your next plan for verification?” We listed the action points we would take shortly and specifically asked for cooperation from the MSC team. The customer agreed to our requests and then said, “Please submit an official root cause analysis report as soon as possible, once everything is clear.” Apparently the two-day deadline set at the beginning of the meeting had silently been changed.
The root cause was identified quickly once the interworking traces were obtained in the following days. The MSC had unexpectedly activated a parameter that set an optional bit in a downlink message to the BSS. Our upgraded software tried to handle the bit but failed, so the issue surfaced on our side. The old software simply ignored the bit and kept working fine.
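For readers who want to picture that contrast, here is a minimal Python sketch. It is not our product code: the one-byte flags field, the bit values and the function names are all assumptions made up for illustration. It only shows how a decoder that ignores an unexpected optional bit keeps working, while one that tries to act on the bit without being able to do so fails.

```python
# Hypothetical layout: one byte of flags in a downlink message.
# All bit values and names below are illustrative assumptions.
KNOWN_FLAGS = {0x01: "flag_a", 0x02: "flag_b"}
KNOWN_MASK = 0x03      # bits the decoder understands
OPTIONAL_BIT = 0x80    # stands in for the optional bit the MSC activated


def decode_tolerant(flags: int) -> tuple[list[str], int]:
    """Decode known flags and report, but ignore, any other bits."""
    ignored = flags & ~KNOWN_MASK & 0xFF
    names = [name for bit, name in KNOWN_FLAGS.items() if flags & bit]
    return names, ignored


def decode_strict(flags: int) -> list[str]:
    """Decode known flags and fail if any other bit is set."""
    unexpected = flags & ~KNOWN_MASK & 0xFF
    if unexpected:
        raise ValueError(f"cannot handle bits 0x{unexpected:02x}")
    return [name for bit, name in KNOWN_FLAGS.items() if flags & bit]


if __name__ == "__main__":
    message_flags = 0x01 | OPTIONAL_BIT
    print(decode_tolerant(message_flags))
    try:
        decode_strict(message_flags)
    except ValueError as err:
        print("strict decoder failed:", err)
```

In this sketch the tolerant decoder plays the role of the old software and the strict one stands in, very roughly, for the upgraded software. Returning the ignored bits also leaves a trace that something unexpected arrived, which is the kind of data we were missing during the investigation.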
6. To be understood by commitment
I came back to the Shanghai office and was pleased to receive the customer’s acceptance that we did not need to provide a solution for this issue: the MSC side would deactivate the parameter to avoid it. Along with the acceptance, the customer sincerely thanked us for proactively figuring out the root cause and thereby keeping the field project from being held up. I replied to the customer’s letter of thanks on a large mail loop that included many teams and managers. I made a point of doing one thing: thanking the field team warmly. I thanked them for their help with the drive test, because I truly believed that this proactive step was the dawn of the whole investigation. Without it, we could hardly have reached any valid solution or made fast progress before time ran out. As I wrote my thanks, I recalled the conference call that night. I also attached a list of the interworking test cases run in the R&D lab. I had committed to it in the second meeting, both to the customer and to ourselves. I did not want to leave out any data supporting the decision made days earlier, even if it might never be useful again.
The road less traveled will help me grow into an insightful man.