Business location is one of the most important factors to consider before starting a business, because a good location attracts more customers. With the help of web search engines, customers can find the nearest business before visiting it. For example, a customer who needs to buy jewellery uses a search engine to find the nearest jewellery shop. An entrepreneur who wants to open a new jewellery shop needs to find an area where there is no jewellery shop nearby and where many customers need jewellery. In this paper, we propose an algorithm to find the best place to start a business: one where demand is high and supply is absent or scarce. We measure the quality of the recommendation in terms of average service time and customer-business ratio by running our algorithm on benchmark datasets, and the results show that it is more efficient than the existing kNN algorithm.
1. Introduction
Starting a business helps in many ways: more employment, a better economy, increased GDP, more opportunities and so on. The number of Internet users is increasing day by day; it is estimated that about 3.58 billion people worldwide use the Internet [1]. Whenever a customer wants a service, he might search the Internet for the nearest business that offers it. For example, a customer who wishes to go to a restaurant may issue a request to a search engine such as ‘find the restaurants near me’. The search engine then lists a few restaurants that are near the user's location. Three types of information can be obtained from this search: the customer location, the customer keyword (e.g., restaurants) and the locations of the businesses that offer the searched keyword. The locations can be found in many ways, such as GPS [2], Wi-Fi [3], mobile networks [4], ad hoc networks [5,6] and so on. With the help of these three pieces of information, a best-area recommendation can be found. An area (a closed polygon defined by a set of points) is said to be a best area for a business if there is high demand (customer searches) and not enough businesses to provide the service. People like to live in a place where all their needs are within an acceptable range, but in practice it is very difficult to find such a place. A proper mechanism is needed to detect the missing needs in an area [7,8,9]. This motivates us to develop an algorithm to find a best place, that is, one that has high customer demand for a particular service and no (or few) businesses that provide it.
A keyword need not be a business; it can also be a specific service. For example, the keywords related to stationery may be pencil, highlighter and so on. Hence, the recommendation system should be able to predict both businesses and individual services. Most spatial databases support this type of keyword-business storage, so it is easy to recommend a best location based on a keyword or a business. Figure 1 visualizes various businesses and their keywords.
In recent years, the growth of Internet shopping has led people to choose many services online, causing local businesses to lose many of their customers [10]. Many businesses spend enormous amounts of money on marketing, such as advertising and affiliate marketing, to keep customers from going online and to find new customers. Many medium-scale and small-scale businesses skip the customer-area demand analysis because of its complexity and the effort it requires, and this omission can cost them a large number of customers. However, a business started at a best location benefits from both lower marketing costs and a higher customer count [4]. Various algorithms [11,12,13,14] have been developed to recommend a location for a business, but there is still room for improvement because customer demand data are not fully utilized in the analysis. In this paper, we propose a new machine-learning-based [15,16,17,18] algorithm for finding a best place for a business using customer demand and competitor data. The algorithm most frequently used for location recommendation is reverse nearest neighbour (RNN) search [19]; in this paper, we show that our proposed algorithm performs better than RNN. The contributions of this paper are summarized as follows:
- The proposed method recommends a best location for starting a business, where customer demand is high and competitors are few.
- The proposed method also predicts the correlation among search keywords in order to cluster them.
- The results show that our work is better than existing algorithms for recommending businesses in terms of average service time and customer-business ratio.
The rest of the paper is organized as follows. Section 2 describes related work in the field of business visualization, Section 3 gives the problem definition, and Section 4 presents the solutions to the problems. Section 5 explains how to merge keywords into businesses, and Section 6 presents the experimental results and shows that our algorithm recommends good locations to start a business.
2. Related Works
Social media dominate the Internet nowadays, and users have a habit of sharing their check-in locations on social media. Ref. [14] makes use of this information to recommend businesses without any domain-specific user intervention. Apart from user location data, additional data such as social, economic, environmental and cultural factors can be used to make the business recommendation stronger. Twitter is one of the best-known social media platforms, with more than 336 million active users, and hashtags can be used to extract this additional information [20]. As data grow exponentially in various social networks, social network services (SNS) are designed to operate on these rapidly increasing data [21].
A set of parameters such as price, quality and brand name is used to construct a group decision-making matrix [22] for recommending a restaurant in a particular area. Decision-making matrices are also a good tool for deciding whether to start a business in a particular area; they are used in many other works, such as [23], where both vertical and horizontal pairwise parameters are considered in constructing the matrix. The creation and distribution of user-generated content (UGC) has had a strong impact on consumers, media suppliers and marketing professionals with respect to group decision-making matrices, and the research work in [24] aims to find the relationship between the various actors in the decision-making role. Despite the various roles actors play in UGC, a proper analysis leads to significantly positive results in terms of decision making. In [25], the key differences among the actors are exploited based on page-joining decisions and the differences in their timing. These strategies help make the decision process efficient.
Many works address finding competitors based on keywords. Ref. [22] explains how to retrieve a set of competitors based on location (X-Y coordinates). Ref. [23] adds the keyword (text describing the business services) to retrieve a set of competitors. IR2 is a good example of fetching competitors based on both location (nearest) and keyword; however, it suffers from false hits, which are eliminated by the set-based theory introduced in [26].
Modern recommendation systems make recommendations by discovering the user's preferences [27] from attributes such as age and gender [28], and even from previous histories [29]. Users generally post check-in updates on social media such as Facebook and Twitter; the comment accompanying a post is mined for the GPS location and the sentiment towards that particular shop. Sometimes it is difficult to mine customer sentiment because a customer's mood varies over time [30]. Combining social media with the recommendation process has advantages, because the sentiments of a person are strongly connected with the sentiments of his/her friends [31,32]. Hence, when a person likes a particular service, it is more likely that his/her friends like the service too [33]. A few recommendation approaches extract topics from the tweets or posts on social media for recommending places [34].
Collaborative filtering [35] is one of the major areas in which many researchers in the field of recommendation systems are working. Collaborative filtering can remove the services that a user does not like [36].
3. Problem Definition
Let $A_i$ denote an area (a city, a town or a region of any predefined size) that has a set of customers $C_i$ and a list of businesses $B_i = \{b_1, b_2, b_3, \ldots\}$. Each business $b \in B_i$ has a set of textual keywords, represented as $t_i$. Our aim is to find a set of businesses $B_k$, where $B_k \notin B_i$ and $B_k$ has highly searched keywords. In this paper, we aim to provide solutions for two problems, as shown below:
Problem 1: Count the customers in area $A_i$ searching for keyword $t_j$, and recommend a set of businesses $B$ that are not present in $A_i$.
Problem 2: Given a set of keywords $t_1, t_2, t_3, \ldots$, recommend a minimum set of businesses that covers all the keywords.
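To make Problem 2 concrete, the following minimal sketch uses the standard greedy set-cover heuristic. This is not the BC-LDA method developed in Section 5, and greedy selection is only an approximation that does not guarantee a minimum set. The `catalog` mapping and the business-type names are hypothetical illustration data.

```python
def greedy_cover(keywords, catalog):
    """Pick business types until every coverable keyword is covered.

    keywords: iterable of missing keywords.
    catalog: dict mapping a business type to the set of keywords it offers
             (hypothetical data, for illustration only).
    """
    uncovered = set(keywords)
    chosen = []
    while uncovered:
        # Take the business type that covers the most still-uncovered keywords.
        best = max(catalog, key=lambda b: len(catalog[b] & uncovered))
        gained = catalog[best] & uncovered
        if not gained:
            break  # the remaining keywords are offered by no business type
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


missing = ["Apple iPhone 7", "Google Pixel 2", "Galaxy S9", "T-Shirts", "Jeans"]
catalog = {
    "mobile shop": {"Apple iPhone 7", "Google Pixel 2", "Galaxy S9"},
    "fashion shop": {"T-Shirts", "Jeans"},
    "jeans outlet": {"Jeans"},
}
chosen, left = greedy_cover(missing, catalog)
print(chosen, left)
```

With this toy catalog, two business types are enough to cover all five keywords, and the single-service outlet is never selected.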
4. The Business Recommendation Model
Figure 2. A typical example of keyword search and business scope visualization for two areas, A1 and A2. Text inside curved brackets represents the services each business offers. The directed arrows show that customers are redirected to a business based on the service they need.
In this section, we explain how the FindKeywordMissing algorithm works and how it is used to find the missing, highly searched keywords in an area. Figure 2 shows a sample diagram with two areas, A1 and A2, each having a set of businesses. The keyword ‘t’ is searched by many customers, but no business in A1 provides ‘t’, whereas A2 has a business that offers ‘t’, so customers from A1 have to travel to A2 to obtain the service ‘t’. The dotted circle in A1 is a recommendation given by our algorithm: there is scope for one business to offer service ‘t’ there.
Our recommendation model uses SI-Index based search [37] to detect the most searched services in a particular area. The SI-Index based search accepts services as input and outputs the nearest K businesses that provide those services. The algorithm first builds a two-column inverted table, as shown in Figure 1c, where one column lists the available services and the other lists the businesses that offer the respective service. The locations of the businesses are stored in an R-tree, which is used to return the nearest K neighbours. Given a set of queried services $S_q$, the SI-Index based search produces the set of businesses $B_q$ whose offered services $S_b$ include $S_q$, as per the following equation.
$$B_q = \{\, b \in B \mid S_q \subseteq S_b \,\} \tag{1}$$
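The lookup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SI-Index of [37]: a plain dictionary plays the role of the inverted table, and a linear scan by Euclidean distance stands in for the R-tree. The function names and the sample businesses are hypothetical.

```python
import math
from collections import defaultdict


def build_inverted_table(businesses):
    """Map each service to the set of business names that offer it."""
    table = defaultdict(set)
    for name, info in businesses.items():
        for service in info["services"]:
            table[service].add(name)
    return table


def search(businesses, table, query_services, customer_xy, k):
    """Return the k nearest businesses whose services include every queried one."""
    # B_q = {b in B | S_q is a subset of S_b}: intersect the posting sets.
    postings = [table.get(s, set()) for s in query_services]
    candidates = set.intersection(*postings) if postings else set()
    # Linear scan by Euclidean distance stands in for the R-tree lookup;
    # the name is a tie-breaker so that the order is deterministic.
    ranked = sorted(
        candidates,
        key=lambda n: (math.dist(customer_xy, businesses[n]["xy"]), n),
    )
    return ranked[:k]


businesses = {
    "b1": {"xy": (0.0, 1.0), "services": {"pencil", "highlighter"}},
    "b2": {"xy": (5.0, 5.0), "services": {"pencil"}},
    "b3": {"xy": (1.0, 1.0), "services": {"pencil", "highlighter", "eraser"}},
}
table = build_inverted_table(businesses)
print(search(businesses, table, {"pencil", "highlighter"}, (0.0, 0.0), k=2))
```

A real deployment would replace the linear scan with a spatial index, since sorting every candidate becomes costly on datasets of the size used in the experiments.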
Whenever a customer in area $A_i$ searches for a service ‘t’, the SI-Index lists a few businesses $B$, drawn from $B_1, B_2, \ldots$, that provide ‘t’. $\Delta(B, A_i)$ returns true if Equation 1 holds; it tells whether any business in $B$ is placed within $A_i$ itself. If $\Delta(B, A_i)$ is false, then no business in $A_i$ provides ‘t’, so the customer has to travel to another area to get the service. A counter called ‘missing’ is incremented each time $\Delta$ is false, which lets the algorithm keep track of the number of customers travelling to another location. A threshold value TH, a function of the population count over the entire domain of areas, is used as an alert to recommend a business to provide the service ‘t’. Algorithm 1 explains the whole concept of FindKeywordMissing().
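$$\Delta(B, A_i) = \begin{cases} \text{true} & \text{if } \exists\, b \in B,\ b \text{ resides within polygon } A_i \\ \text{false} & \text{if } \forall\, b \in B,\ b \text{ resides outside polygon } A_i \end{cases} \tag{2}$$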
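One way to evaluate $\Delta$ is a point-in-polygon test applied to each business location. The sketch below uses the standard ray-casting test; the paper does not say which containment test is used, so this choice, the function names and the sample coordinates are assumptions for illustration.

```python
def inside_polygon(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon.

    polygon is a list of (x, y) vertices in order; points exactly on an
    edge are not handled specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def delta(business_locations, area_polygon):
    """Delta(B, A_i): True if at least one business lies within the area."""
    return any(inside_polygon(p, area_polygon) for p in business_locations)


a1 = [(0, 0), (4, 0), (4, 4), (0, 4)]  # hypothetical square area
print(delta([(5, 5), (2, 2)], a1))
print(delta([(5, 5), (6, 1)], a1))
```

An empty list of businesses yields false, which matches the second case of the definition: every business (vacuously) resides outside the polygon.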
| Algorithm 1: Find Missing Keywords |
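The counting logic described for FindKeywordMissing() can be sketched as follows. This is a reading of the prose description, not a transcription of Algorithm 1: area membership is given directly as a label instead of a polygon test, TH is passed in as a plain number, and the comparison `count >= threshold` is an assumption, since the text does not state whether the alert fires at or above TH.

```python
from collections import Counter


def find_missing_keywords(searches, businesses, threshold):
    """Return {area: [keywords]} whose 'missing' count reaches the threshold.

    searches: iterable of (area, keyword) pairs, one per customer search.
    businesses: list of dicts with 'area' and 'services' keys.
    threshold: TH, passed in as a plain number here (assumption).
    """
    # Areas in which each keyword is already offered.
    offered = {}
    for b in businesses:
        for service in b["services"]:
            offered.setdefault(service, set()).add(b["area"])

    missing = Counter()
    for area, keyword in searches:
        # Delta is false when no business in the customer's area offers the keyword.
        if area not in offered.get(keyword, set()):
            missing[(area, keyword)] += 1

    recommendations = {}
    for (area, keyword), count in missing.items():
        if count >= threshold:
            recommendations.setdefault(area, []).append(keyword)
    return recommendations


businesses = [
    {"area": "A1", "services": {"s"}},
    {"area": "A2", "services": {"t"}},
]
searches = [("A1", "t"), ("A1", "t"), ("A1", "t"), ("A1", "s"), ("A2", "t")]
print(find_missing_keywords(searches, businesses, threshold=3))
```

In this toy input, three customers in A1 search for ‘t’ while only A2 offers it, so ‘t’ is reported as missing in A1, mirroring the scenario of Figure 2.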
Proof.
Our algorithm results in less time for a customer to get a required service. Let $t_1$ be the total time taken by a customer who uses a business in a foreign area and $t_2$ the total time taken by a customer who uses a business in the home area. Our model ignores traffic details and other parameters related to travelling. Let $\lambda$ be the average arrival rate of customers and $\mu$ the average service rate of a business; let $\pi_1$ be the time taken by a customer to travel from home to a business in another area, and $\pi_2$ the time taken to travel from home to a business in the same area. The total time spent by a customer when the business is outside his area is given by Equation (3); Equation (4) gives the total time taken by a customer to get served when the shop is located in his own area. Obviously $\pi_2 \le \pi_1$, and therefore $t_2 \le t_1$, so our algorithm lets customers spend less time to get the required service.
$$t_1 = \frac{1}{\mu - \lambda} + \pi_1 \tag{3}$$
$$t_2 = \frac{1}{\mu - \lambda} + \pi_2 \tag{4}$$
□
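As a numerical illustration of Equations (3) and (4), the sketch below evaluates both totals for made-up rates and travel times. The term $1/(\mu - \lambda)$ has the form of the mean time in system of an M/M/1 queue and is only meaningful when $\mu > \lambda$; the function name and all numbers are assumptions.

```python
def total_time(service_rate, arrival_rate, travel_time):
    """Total time t = 1 / (mu - lambda) + pi; requires mu > lambda."""
    if service_rate <= arrival_rate:
        raise ValueError("service rate must exceed arrival rate")
    return 1.0 / (service_rate - arrival_rate) + travel_time


# Hypothetical values: 12 customers served and 10 arriving per hour,
# 0.5 h of travel to another area versus 0.1 h within the home area.
t1 = total_time(12, 10, 0.5)
t2 = total_time(12, 10, 0.1)
print(t1, t2, t1 - t2)
```

Because the queueing term is identical in both totals, the saving $t_1 - t_2$ equals the difference in travel time, $\pi_1 - \pi_2$.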
Eliminating Useless Business Visualization
Algorithm 1 recommends a business b for a service s inside an area Ai if there is high demand for s inside Ai and no business there provides s. The exact location of b is not recommended because of the high complexity, so visualization is done at the area level rather than at the exact-location level. However, this can produce useless recommendations. For example, consider two adjacent areas with customers in one area and a business in the other, where the customers and the business are close to each other. Algorithm 1 then recommends a business inside the customer area, which is useless because the recommended business might not get enough customers: it may be farther from them than the existing business. Figure 3 shows this scenario.
Algorithm 2 considers the intra-area distance: the ‘missing’ counter is incremented only if the distance between the customer and the midpoint of the area is less than the distance between the customer and the business.
5. Business Coverage
Algorithm 2 can efficiently mine the missing services in any area and visualize them to recommend new businesses, but the recommendation is efficient only when it visualizes as few businesses as possible to cover all the missing services. For example, if there are five missing services, such as ‘Apple iPhone 7’, ‘Google Pixel 2’, ‘Galaxy S9’, ‘T-Shirts’ and ‘Jeans’, then our visualization should recommend just two businesses, ‘mobile shops’ and ‘fashion shops’, instead of five different businesses. Sometimes a single business may also cover non-correlated services; for example, soaps and biscuits can be covered by a single departmental store even though they are not correlated. We create a new LDA-based [38] algorithm called BC-LDA (Business Coverage-LDA); over the years, many research works [39] have focused on LDA. Figure 4 shows the BC-LDA plate model for covering services with businesses, and the meanings of the symbols are given in Table 1. The process of BC-LDA is as follows:
- First choose a service s either from Dir(Θ) or Dir(ζ)
- Choose a business B ∼ Dir(pie)
- For each service s in 1, 2, …, N
– Draw an assignment b ∼ Multinomial(s, B)
| Algorithm 2: Find Missing Keywords (Considering Inter Area Distance) |
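The distance check that Algorithm 2 adds to the ‘missing’ counter can be sketched as below. The paper does not define how the midpoint of an area is computed, so the mean of the polygon vertices is used here as an assumption; the function names and coordinates are hypothetical.

```python
import math


def centroid(polygon):
    """Mean of the vertices, used here as the 'midpoint' of an area (assumption)."""
    xs, ys = zip(*polygon)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def counts_as_missing(customer_xy, area_polygon, nearest_business_xy):
    """True if the area midpoint is closer to the customer than the business."""
    mid = centroid(area_polygon)
    return math.dist(customer_xy, mid) < math.dist(customer_xy, nearest_business_xy)


a1 = [(0, 0), (4, 0), (4, 4), (0, 4)]  # hypothetical area, midpoint (2, 2)
# Business far away in another area: the search counts as missing.
print(counts_as_missing((1, 2), a1, (9, 2)))
# Business just across the border: the search is not counted.
print(counts_as_missing((3.9, 2), a1, (4.2, 2)))
```

The second call corresponds to the useless-recommendation scenario: the customer sits near the border and the existing business in the neighbouring area is already closer than a new one at the midpoint would be.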
- Customer-Business-Area Ratio (CBA-Ratio): The ratio of the average number of customers searching for a service to the availability of that service in a particular area. The higher this value, the better the recommendation; if there are too many useless businesses, the value goes down.
- Average Service Time (AST): The average time for a customer to reach the business (travelling time) plus the time needed to obtain the service (waiting time and service time). A lower AST indicates a better recommendation.
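The two metrics can be computed as in the following sketch. It is one possible reading of the definitions above: "availability" is taken to be the number of businesses offering the service in the area, and AST is the mean of travel, waiting and service time per visit. Both function names and all numbers are assumptions.

```python
def cba_ratio(num_searching_customers, num_businesses_offering):
    """Customers searching for a service per business offering it in the area."""
    if num_businesses_offering == 0:
        raise ValueError("no business offers the service in this area")
    return num_searching_customers / num_businesses_offering


def average_service_time(visits):
    """Mean of travel + waiting + service time over (travel, waiting, service) tuples."""
    totals = [travel + waiting + service for travel, waiting, service in visits]
    return sum(totals) / len(totals)


# Hypothetical numbers: 120 searching customers and 3 businesses, then a 4th one.
print(cba_ratio(120, 3), cba_ratio(120, 4))
# Two visits with times in minutes: (travel, waiting, service).
print(average_service_time([(10, 5, 15), (20, 5, 5)]))
```

Under this reading, adding a business without adding demand lowers the ratio, which is consistent with the statement that useless businesses bring the value down.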
Dataset: We implemented our algorithms on both real and synthetic datasets. Two synthetic datasets are used. The first is a random dataset in which the business locations and their services are placed randomly using a uniform distribution. The second is a skewed dataset, in which businesses with similar services are located near each other. The real dataset used in our project is the Restaurant Dataset [40]. The details of the datasets are listed in Table 2.
6.1. CBA-Ratio
The CBA-Ratio tells more about business utilization than the simple number of businesses per area does. kNN fails to give a good CBA-Ratio because it has no ability to learn which businesses are useless; more useless businesses are visualized, and false recommendations result. The graph in Figure 5 shows the CBA-Ratio and indicates that our algorithm utilizes the newly opened businesses very efficiently.
6.2. AST
Average service time is one of the factors that affect the profitability of a business: if the total time a customer spends to obtain a service exceeds a tolerable time, the business is more likely to lose that customer. Our algorithm reduces the average service time because it visualizes the missing businesses in an area and helps entrepreneurs start new businesses there, reducing the travelling time. The graph in Figure 6 shows that our algorithm has a lower service time than kNN, because kNN generates more useless businesses. Figure 7 shows that the average service time is reduced by an acceptable duration when there is a recommendation system.
6.3. Business Coverage
When there are too many businesses, resources are not properly utilized: when each service is provided by its own separate business, the space occupied, the transportation cost and many other parameters increase. The best solution for proper utilization is to recommend as few businesses as possible to cover all the services. This is what the BC-LDA model does. Figure 8 shows that business utilization with BC-LDA is higher than without any business coverage. Business utilization is defined as per Formula (5).
$$\text{Utilization} = \frac{\text{Number of customers going to recommended business}}{\text{Number of customers going to other area}} \tag{5}$$
In summary, the proposed method recommends the missing business keywords for each area. The recommended keywords are then grouped into distinct topics, where each topic is a collection of related keywords. The recommended topics are then considered for starting a business in the area, so that the demand frequency is high and there is less competition.
7. Conclusions
Business is an important factor in the growth of a country, and there are a number of advantages in starting a business, such as more employment, increased GDP and so on. An area where there is high demand for a service and no business to provide it is called a best place to start a business. In our work, we visualize the best area to start a business and run it profitably. We consider two parameters to validate our results: the first is the CBA-Ratio, the business utilization by customers per area; the second is the AST, the average service time spent by the business on its customers. The traditional kNN algorithm has been compared with our proposed work, and the results show that our proposed model gives better results than existing works. We also introduced a modified LDA model that covers services with fewer businesses to utilize resources effectively. The LDA combines the keywords into topics, where each topic represents a business domain. In future work, we plan to consider legal issues, tax and rent information before visualizing businesses, which will make the recommendation system more refined.
| Symbols | Meaning |
|---|---|
| α,β,γ | Dirichlet priors on Multinomial distributions |
| N | Number of services |
| M | Number of businesses |
| W | Represents each mapping of service and business |
| Θ | Represents the non-correlated service-business distribution |
| ζ | Represents the correlated service-service distribution |
| I | Correlated business assignment for a service |
| T | Non-correlated business assignment for a service |
| ϕ | Represents business distributions |
| Dataset | Total Businesses | Average Services per Business | Average Businesses per Area |
|---|---|---|---|
| Restaurant Dataset | 456,288 | 14 | 12 |
| Synthetic (uniform) | 100,000 | 15 | 10 |
| Synthetic (skewed) | 100,000 | 15 | 10 |
Author Contributions
Conceptualization, A.K.P. and S.S.G.; Data curation, A.K.P. and S.S.G.; Formal analysis, P.K.R.M. and T.R.G.; Investigation, P.K.R.M. and T.R.G.; Methodology, A.K.P. and S.S.G.; Project administration, T.R.G., A.A.-A. and M.H.A.; Resources, A.A.-A. and M.H.A.; Software, P.K.R.M.; Validation, P.K.R.M. and M.H.A.; Visualization, P.K.R.M. and A.A.-A.; Writing-original draft, A.K.P. and S.S.G.; Writing-review and editing, T.R.G., P.K.R.M., A.A.-A. and M.H.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Raytheon Chair for Systems Engineering.
Acknowledgments
The authors are grateful to the Raytheon Chair for Systems Engineering for funding.
Conflicts of Interest
The authors declare no conflict of interest.
© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”).