[How-To] Setup AWS Billing Alarm to get Notified Automatically on Usage Charges

I was recently teaching a course on AWS services and an incident with a student group reminded me of the importance of Billing Alarms. This article is dedicated to all those who are getting started on Amazon Web Services (AWS) and don’t know how to protect themselves from incidents that can cause their bill to go up like a rocket.

If you are a beginner or someone interested in keeping a check on your usage, you will find it comforting to note that AWS has an awesome service called AWS CloudWatch, which is dedicated to monitoring your AWS usage. It collects logs, metrics and outlier events (anomalies in usage, etc.) for your applications, and CloudWatch Alarms let you set up notifications so you get alerted if your usage goes above a certain threshold. A friend of mine, with whom I discussed this article, put it well: anything that is postpaid should have fences to limit the blast radius. When I heard this, I immediately realised it applies to everything: your health, your relationships, money and work.

Reduce the blast radius

Steps for Setting a Billing Alarm

1. Log in to your AWS Console at console.aws.amazon.com and go to the CloudWatch service.

2. Once within the CloudWatch service, click on the Billing menu item.

Fig: Click on the Billing menu item from the various AWS CloudWatch menu items

3. Click on the Create Alarm button. On the screen, you will see two Create Alarm buttons; you need to click on the lower one. It's not that you cannot create a billing alarm using the button on the top, but it will require additional steps. For now, let's click on the second button on the page with the title “Create Alarm”.

Fig: Click on Create Alarm Button to setup a Billing Alarm using CloudWatch Alarm

4. You should see the screen shown below. In case you don't, you have probably clicked on the first “Create Alarm” button.

In the screen below, we can see

  • Metric is EstimatedCharges
  • Currency is USD
  • Statistic is Maximum
  • Period is 6 hours — this is the time period within which these checks will be made.

Above, we are going ahead with a simple static condition which will trigger an alarm whenever our EstimatedCharges metric crosses 100 USD. This will be checked every 6 hours. The worst case is when you cross the 100 USD mark in the first minute of a 6-hour window: you will not be notified until the window ends, and charges can keep accruing in the meantime.

5. Now that the alarm condition is set, we need to configure actions. Here, we will set up a notification to be sent to us when the alarm goes into the “In Alarm” state. AWS uses the Simple Notification Service (SNS) to deliver notifications, and we will configure it to send us an email for this alarm.

Fig: Set up SNS notifications to get notified whenever a Billing Alarm is triggered

6. Clicking on Create Topic will trigger an email to be sent to you. Open it and click on the “Confirm subscription” link.

7. Scroll down and click on Next to create your Alarm.

8. Give a name and description to the Alarm that we are creating.

9. Once you click on Next, you will be able to see the following screen. If you haven't confirmed your email from step 6, you will have to do that for the screen to appear as it is shown below.

Right now, given that not much time has passed, the state of the alarm is “Insufficient data”, but it will change to “OK” once it has been a while and enough data has been collected.

This is it: now you will get notified whenever your AWS usage goes above 100 USD.
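
The console steps above can also be scripted. Below is a minimal sketch of the same alarm using the CloudWatch API via boto3; the function name, alarm name and SNS topic ARN are my own placeholders, and you would need AWS credentials configured to actually run the commented-out call:

```python
# Sketch: the same billing alarm, built as parameters for CloudWatch's
# put_metric_alarm API. Alarm name and topic ARN are placeholders.

def build_billing_alarm_params(threshold_usd, topic_arn):
    """Build the kwargs for CloudWatch put_metric_alarm, mirroring the
    console settings above: EstimatedCharges / USD / Maximum / 6 hours."""
    return {
        "AlarmName": "billing-alarm-100-usd",
        "Namespace": "AWS/Billing",             # billing metrics live here
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 6 * 60 * 60,                  # 6 hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],            # SNS topic that emails you
    }

params = build_billing_alarm_params(
    100.0, "arn:aws:sns:us-east-1:123456789012:billing-alerts"
)

# To actually create the alarm (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

Note that AWS publishes billing metrics only in the us-east-1 region, so the CloudWatch client should point there regardless of where your workloads run.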

Momentum Portfolio (BORS) for week of 11th January 2020

Disclaimer: This post and all related posts on stock selection are purely academic in nature and are not to be executed as buy or sell decisions. For stock market related actions, do consult a SEBI registered advisor. I am not a SEBI registered advisor and, like I said earlier, this post is merely for educational purposes. I am a software professional, and playing around with data and finding patterns in it is what interests me.

In my last post, I discussed a momentum-driven stock selection strategy. I have a momentum portfolio that I have been tracking for educational purposes, and I will share the stocks selected here every week. I will also share performance metrics for this algorithm regularly. I call this algorithm BORS (Breakout with Relative Strength); while the core of the selection methodology is similar, here I am mostly interested in stocks which are breaking out while also outperforming the Nifty 200 index. My stock selection universe for BORS is the Nifty 500.

The way to visualise this portfolio is through an equal weight allocation.

  • We will buy stocks worth 1,50,000 Rupees (this is the minimum amount required for 1 lot of this portfolio).
  • As this is a 20-stock portfolio and no stock should have more than a 5% allocation, we will limit the maximum amount allocatable per stock to 7,500 Rupees.
  • Every Friday, at end of day, we will rebalance this portfolio; as part of that process, some new stocks will enter and some existing stocks which lose their rank will move out of the portfolio.

Below are the stocks with the relevant quantities that we are buying in one lot. One thing to note here is that after this entire deployment is done, we will have around 12,500 Rupees in cash. For tracking purposes, I will be keeping that cash in LIQUIDBEES.
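
The equal-weight allocation above can be sketched in a few lines. The stock symbols and prices here are made-up placeholders, not real quotes; the point is just the arithmetic: cap each stock at 7,500 Rupees, buy whole shares, and whatever is unspent stays as cash:

```python
# Sketch: equal-weight allocation for a 20-stock, 1,50,000-rupee lot.
# Symbols and prices below are illustrative placeholders.

CAPITAL = 150_000
MAX_PER_STOCK = CAPITAL // 20  # 5% cap -> 7,500 rupees per stock

def allocate(prices):
    """Return whole-share quantities per stock and the leftover cash."""
    quantities = {sym: MAX_PER_STOCK // price for sym, price in prices.items()}
    spent = sum(qty * prices[sym] for sym, qty in quantities.items())
    return quantities, CAPITAL - spent

# With 20 real stocks this leftover comes to roughly the 12,500 rupees
# mentioned above; three placeholder stocks are shown here for brevity.
qty, cash = allocate({"STOCK_A": 920, "STOCK_B": 310, "STOCK_C": 2450})
```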

UPDATE 11th January 2020 : HFCL has got printed as HPCL below. Please read that stock as HFCL

For the purposes of tracking, I will use the price of the stock at 9:25 am on Monday as the price at which the transaction would have happened.

Getting started with a Momentum Algorithm on NSE 500 Stocks

I have been studying a momentum algorithm for a while and a friend of mine asked me to write a few posts about it on the blog (he has been pushing me to write often here, if not for anyone else for him).

Methodology being followed for Stock selection and Ranking:

  • My universe for this exercise is the Nifty 500. So at any point, if a stock is part of that index, it's part of my universe. If a stock moves out of the index, it gets kicked out during the next weekly rebalance.
  • From this universe, we will filter the stocks which are “Strong”. Strong stocks, in my view, are those which trade above their key moving averages (please note, “Strong” here does not mean fundamentally strong). So one way of identifying strong stocks is to pick all those stocks from the Nifty 500 universe which are trading above their 200-day moving average.
  • Once you have a list of these stocks, it's best to do another level of filtering. This time we will filter out all those stocks which are underperforming the Nifty 100 or Nifty 200 (you can take any index for that matter; broader indexes are better in my view). One important thing to note here is that you should look at filtering out stocks only if they are underperforming across several timeframes (1 week / 1 month / 1 year, etc.). If a stock has been outperforming on larger timeframes but is going through a small consolidation on a weekly basis, it might not make sense to get rid of it right away. But again, this is up to the designer of the selection mechanism to decide, in terms of what they expect from the algorithm.
  • Now we have a filtered list of stocks which are strong and also not underperforming a broader index. At this point, I want to pick stocks which have momentum. Momentum for me refers to the total percentage change (absolute change makes no sense, in case you are wondering). As a smoothing factor, I then divide the percentage change by the standard deviation of the stock's returns (volatility) over a 252-day rolling window (252 trading days is roughly 1 year). The idea is to pick stocks that move up smoothly (with less volatility). This is a key learning from Quantitative Momentum by Dr. Wesley Gray and Dr. Jack Vogel.
  • The next and final step is to pick the top 20 stocks and rebalance this list every week. A stock will exit the list if it falls below a rank of 35 or gets evicted from the universe (in our case, the Nifty 500).
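
The scoring step described above can be sketched with pandas. The prices below are synthetic, and the function and variable names are my own; the formula is the one from the text: total percentage change over a 252-day window, divided by the volatility of daily returns over the same window:

```python
import numpy as np
import pandas as pd

def momentum_score(prices: pd.Series, window: int = 252) -> float:
    """Total % change over the window, divided by daily-return volatility.

    Smoother movers (same return, lower volatility) score higher."""
    recent = prices.iloc[-window:]
    total_return = recent.iloc[-1] / recent.iloc[0] - 1.0
    volatility = recent.pct_change().std()
    return total_return / volatility

# A smooth up-trend vs. the same trend with extra chop. The zig-zag
# noise leaves the endpoints untouched, so both series have the exact
# same total return; only the volatility differs.
smooth = pd.Series(np.linspace(100.0, 200.0, 252))
noise = np.zeros(252)
noise[1:-1] = np.where(np.arange(250) % 2 == 0, 5.0, -5.0)
choppy = smooth + noise
```

Ranking the universe then just means computing this score per stock and sorting descending; the smooth series ranks above the choppy one despite identical returns.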

I will be sharing the output of this algorithm in the form of stock picks for the week, along with a performance summary, every week. I am hoping to learn and improve this skill along with you. Again, don't forget the disclaimer, and use this post only for educational purposes.

How iOS14 and IDFA opt-in will change the Advertising and Marketing technology landscape

Apple has always been a proponent of tighter privacy. I remember watching a video of the late Steve Jobs where he categorically mentioned that if you want access to certain data on the phone, you need to ask the user twice (I will embed that video in this post). In the last few years, Apple has made the fight for user privacy part of its product design and management philosophy. The latest blow on this account is Apple's opt-in announcement for IDFA.

The Takeaway

In iOS 14, IDFA will no longer be available without a double opt-in by the user.

In iOS 14, both the publisher and the destination app will be required to receive permission from the user to track them in order to read the IDFA, making it an exclusively opt-in feature.

IDFA is not going anywhere; an app will simply require explicit permission to be able to access it.

What exactly is an IDFA?

IDFA is an acronym that stands for “Identifier for Advertisers”; it is an identifier assigned to an Apple device, by Apple. Because it is a unique identifier within the Apple ecosystem and is shared across the apps on your phone, IDFA allows your device (and therefore you) to be tracked on the internet. IDFA was first introduced in iOS 6 and has been around since then. Up to this point (i.e. through iOS 13), an end user can restrict IDFA to being shared with only a few apps, and it can be reset at the click of a button. It is also automatically reset whenever the device is erased and restored.

How does Mobile Advertising industry use IDFA?

For an ad to be effective, there are a few things it should have.

  • Product — What are you selling through the ad?
  • Audience — Who is the customer? (and how can we reach this user?)
  • Messaging — The choice of words, fonts, placement of text, images, audio, video, etc. that can help the product become a hero product in the eyes of the audience.

If your ad is personalized to a certain user behavior, or is about the last product the user checked out on your site (such ads are generally called retargeting ads, because they are “re-targeting” you as a user to come back to the site), then you need to find the right audience on the internet. Every user on the internet has some deterministic or probabilistic identifier attached to them; these identifiers help ad networks reach out to users with the right ads. IDFA for iOS and GAID for Android (GAID is, for the sake of simplicity, the equivalent of IDFA in the Android world) are these identifiers for the mobile advertising and marketing ecosystem. Now, depending on the use case at hand, you can either “include” or “exclude” users. To understand this, say you are a game developer with a set of users who have already installed your app; you would ideally not want these users to see your ad. For such cases, the exclusion list you provide to the ad network should contain the IDFAs (or GAIDs for Android) of your existing active user base.
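
The exclusion-list mechanics boil down to a set difference. A minimal sketch, with made-up IDFA values standing in for real device identifiers:

```python
# Sketch: building an exclusion audience from existing users' IDFAs.
# All IDFA values here are made-up placeholders.

installed_users = {"IDFA-001", "IDFA-002", "IDFA-003"}  # already have the app

# Devices the ad network can reach with an install campaign.
network_reachable = {"IDFA-002", "IDFA-003", "IDFA-004", "IDFA-005"}

# Exclude existing users so the campaign only targets new devices.
target_audience = network_reachable - installed_users
```

Once IDFA stops flowing for non-opted-in users, this simple matching is exactly what breaks: neither list can be reliably populated.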

So in a nutshell, if I had to write it out, I would say IDFA (and GAID, and all the other probabilistic identifiers companies have) helps with the following:

  • Personalized ads based on what you are reading, which apps you are accessing, and which products you are checking out or buying. If the ad network or the advertiser depends only on the IDFA, behavioral targeting will become very tough.
  • Measurement — once an ad is trafficked on the publisher side (the publisher is the one who shows the ad), it needs to be tracked up to the final conversion. IDFA is the binding piece that is the bread and butter for measurement companies (like Adjust, AppsFlyer, Branch.io, etc.). Questions like “hey, which IDFAs saw the ad as part of this campaign + ad set combination and converted on Thursday?” will be hard to answer when IDFA stops flowing through and there is no other identifier at play.

Getting ready for iOS14 – What to Expect?

  • All users who wish to share their IDFA with an app will need to opt in. A lot of uncertainty exists around opt-in rates, but within advertising and marketing forums a consensus is growing that these rates will be on the lower end. The bottom line is, with iOS 14, all apps will need an opt-in if they wish to get IDFA access. So essentially, both the advertiser app and the publisher app will need the opt-in granted by the user to make the “measurement and tracking flow” work.
  • Because the onus of the opt-in will be on publishers, a lot of publishers, like game developers, are expected to be hit by this change.
  • There are also reports of some large clients working with agencies on designing “communication” around the opt-in for users, to cajole them to opt in.
  • One thing is certain: this change will hit retargeting networks, the smaller ones for sure.
  • Without IDFA, a marketer will not know how users reacted to their campaign at an individual level. Aggregate stats won't be impacted.
  • Will this impact Google too (the largest ad network)? Well, there are no good answers to this, but from what we already know from various posts and discussions, Google has had a probabilistic model in place for many months now, and they are the most prepared. Google's inherent interest is in killing GAID, as that would leave other networks extremely vulnerable; my bet is that with IDFA going away, it's just a matter of time before we see GAID also getting deprecated and then cleaned out.
  • The thing with data is that those who already hoard a ton of it only get stronger when restrictions of any type come in. Therefore, my reasoning suggests that Facebook and Google, who hoard tons of user data, will get even stronger post IDFA.
  • For the measurement side of the world, there are two approaches that are clear right now. One direction is the one Apple prefers: the SKAdNetwork framework that Apple has built does aggregate-level conversion reporting, and this is what Apple wants measurement to be based on. However, with this you cannot tell which user came because of which campaign. The other path is to figure out a way to do user-level optimisation while staying compliant with Apple's policies. One thing is clear to me: Apple does not want to inhibit measurement for advertising. Apple's goal is to inhibit personalized advertising for users who have not opted in to it. So there is a chance that in the future we will see something similar to the Google referrer ID.

So the measurement world is going to move to some probabilistic mechanism?

The thing is, whenever you take something deterministic like an IDFA (or a GAID) and start to shift to a probabilistic approach, it always brings a lot of challenges. It also allows companies to claim that “their probabilistic method is better than others”.

How should advertisers prepare for an IDFA-less world in iOS 14?

Well, a lot of details are still not available, but one thing that every advertiser should do is an IDFA audit: understand how exposed you, as an advertiser, are to IDFA. If a lot of your internal systems depend on it, you could face a lot of heat in the coming months.

PayPal acquires Honey for $4 billion

Being in the marketing tech industry, I am a keen observer of the latest happenings and generally also have a view on them. A few days back, PayPal announced its acquisition of Honey for a mind-boggling $4 billion. In this article I will try to deconstruct Honey, how it made money, and what PayPal stands to gain by acquiring it.

Honey’s offering – best coupon for your cart

Honey offered an interesting, yet very simple, browser extension (it also came out with a mobile app) that allowed customers to add a coupon to their cart based on the products in it. Coupon unavailability is one of the main reasons cited by shoppers for cart abandonment.

A browser extension that allows you to save money is usually an easy sell with users. However, a browser extension is more than what it seems, as it is able to record your browsing habits even for sites not covered under its primary use case (in this case, coupons). What this means is that, as a user who has a browser extension installed, you are unknowingly sharing a lot of data with it. Some of the information that a browser extension can have access to:

  • Your IP address (which reveals your telco, your city, state, country, etc.)
  • Your latitude and longitude (your location)
  • The sites you visit, when and how often you visit them, and how long you stay on a site or page
  • Which products you buy, and whether you have a specific preference of e-commerce site for a product or category of products. For example, you buy groceries from Walmart but electronics from Amazon.

How does Honey (and others like it) make money?

Honey's core service is finding the best coupon for you. This means that irrespective of where you land on the e-commerce site from, if you apply a coupon using Honey (or a coupon provided by Honey), it will press a claim for that sale and take an affiliate fee. This fee is usually a percentage of the sale amount. Given the last-mile nature of this attribution model, Honey will always stand a higher chance of getting the attribution.

Here is how it works:

  1. The user lands on the e-commerce product page. Honey and its likes do not care (for their monetisation) where you landed from.
  2. The user uses the browser extension to apply a coupon and makes the purchase.
  3. The e-commerce company gives the attribution for the purchase to Honey.
  4. Honey collects cash from the e-commerce company as per the agreed payment terms.
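
The steps above are classic last-touch attribution, which can be sketched in a few lines. The touchpoint names and the 3% fee rate below are illustrative assumptions, not Honey's actual terms:

```python
# Sketch: last-touch affiliate attribution, as described above.
# Touchpoint names and the fee rate are illustrative assumptions.

def attribute_sale(touchpoints, sale_amount, fee_rate=0.03):
    """Give the whole affiliate fee to the last touchpoint before purchase."""
    last = touchpoints[-1]  # e.g. the coupon applied at checkout
    return {"winner": last, "fee": round(sale_amount * fee_rate, 2)}

# The user may have arrived via a search ad, but the coupon extension
# is the last touch, so it claims the sale.
result = attribute_sale(["search_ad", "email", "honey_coupon"], 120.0)
```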

Why does PayPal care about this?

Well, PayPal has tons of users who use its payment services, and it now has quite a lot of merchants too. Payment processing is a commodity in today's age, and if PayPal is able to keep and grow its merchant side, it can make merchants pay not just for payment processing but also collect affiliate fees from them. It can even use the intelligence and data acquired through Honey not just to send the right kind of offers to the right kind of users, but to push merchants to build specific offers for specific users. This is the kind of insight that merchants usually don't have, but might not mind paying for(?), if it helps them increase net revenue and order count.

In its press release, PayPal says that this acquisition is purely about the consumer side. While I would have loved to agree with that, given the quantum of money involved I am of the opinion that this needs to cut through both sides. That is, there should be a merchant side story as well, something that PayPal might not want to say out in the open, given that any merchant side story about Honey will inherently mean playing with data.

At the time of acquisition, Honey had 17 million MAU (monthly active users), which in my view is not a huge number; at a payout of $4 billion, PayPal paid around $235/MAU (that's a lot). Honey's audited revenue was $100 million last year, and it possibly grew 100%. Even at $200 million, that is still 20x revenue that PayPal paid for Honey.

I will write more about this in future whenever something around this comes up. If you have any thoughts or questions on the opinion I have written above, feel free to comment.

Stratified Sampling using scikit-learn

When building classifiers for a dataset, we often see that the dataset has an imbalanced distribution of features. Some of these features can be very important for the prediction of the labels. For example, if you knew that a certain city's electorate is 47% female and 53% male, and the results of the election are a function of the distribution of male and female votes, then when you sample data for a survey (or for training and testing a machine learning classifier), it's important that you maintain this distribution in your sampled data, so that it is representative of the population. We call this “Stratified Sampling”.

Stratified Sampling is important as it helps ensure that your sampled dataset does not carry an intrinsic sampling bias and that it represents the population.

Is there an easy way to divide the dataset into training and test dataset while maintaining the composition of the key feature?

There are two modules provided by Scikit-learn for Stratified Splitting:

StratifiedKFold: This cross-validator sets up n_splits folds of the dataset in such a way that the class proportions are preserved in both the training and test sets of each fold.

>>> import numpy as np
>>> from sklearn.model_selection import StratifiedKFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> skf = StratifiedKFold(n_splits=2)
>>> skf.get_n_splits(X, y)
2
>>> print(skf)  
StratifiedKFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in skf.split(X, y):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 3] TEST: [0 2]
TRAIN: [0 2] TEST: [1 3]

StratifiedShuffleSplit: This cross-validator, on the other hand, generates randomized train/test splits while keeping the classes balanced in each. It is a merge of StratifiedKFold and ShuffleSplit, and returns stratified randomized folds.

>>> import numpy as np
>>> from sklearn.model_selection import StratifiedShuffleSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 0, 1, 1, 1])
>>> sss = StratifiedShuffleSplit(n_splits=5, test_size=0.5, random_state=0)
>>> sss.get_n_splits(X, y)
5
>>> print(sss)       
StratifiedShuffleSplit(n_splits=5, random_state=0, ...)
>>> for train_index, test_index in sss.split(X, y):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [5 2 3] TEST: [4 1 0]
TRAIN: [5 1 4] TEST: [0 2 3]
TRAIN: [5 0 2] TEST: [4 3 1]
TRAIN: [4 1 0] TEST: [2 3 5]
TRAIN: [0 5 1] TEST: [3 4 2]

Stratification can also be achieved when splitting data with train_test_split, by passing the column to balance on via the stratify parameter.

from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2, stratify=df['YOUR_COLUMN_LABEL'])

In a follow-up post, I will also add some experiment data to show how the distribution changes between random and stratified shuffle splits.

Note: Someone recently asked me how to handle continuous data. Well, for continuous data, you should first do “binning” of the dataset and then apply stratification. I will try to cover this too in a follow-up post.
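
As a quick illustration of that binning idea (the column names and distribution below are made up for the example), you can bucket a continuous column with pd.qcut and pass the resulting bins as the stratify argument:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data with a continuous column we want to stratify on.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "feature": rng.rand(1000),
    "income": rng.lognormal(3.0, 1.0, 1000),  # skewed continuous column
})

# Bin the continuous column into 4 quantile buckets, then stratify on them.
df["income_bin"] = pd.qcut(df["income"], q=4, labels=False)
train, test = train_test_split(
    df, test_size=0.2, stratify=df["income_bin"], random_state=42
)
# Each quantile bucket keeps (roughly) its 25% share in both splits.
```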