Abstract: The use of AI offers numerous benefits across many industry verticals. AI technology facilitates the analysis of unstructured data from heterogeneous sources. The added value of AI technologies lies in the insight gained from the data, which helps to automate specific activities or tasks and supports people in making better decisions. According to recent market studies, business intelligence and analytics has proved to be the key area in which AI can deliver results. Natural Language Processing (NLP) is the AI subdomain that deals with human language and speech. NLP sits at the crossroads of a diverse set of disciplines, from linguistics to computer science and engineering and, of course, AI. Opinion mining (or sentiment analysis) is an NLP technique used to determine whether data is positive, negative, or neutral. NLP can address the inefficiencies of the traditional recruiting model for both recruiters and candidates when it comes to candidate screening and profiling. Our case study presents a recruiting platform (SoMeDi) for internship campaigns that applies Sentiment Analysis (SA) techniques to improve hiring processes, aiming to increase the efficiency of internship campaigns by ensuring a better match between the candidates' professional skills and the hiring company's fields of activity. SoMeDi performs text analytics (sentiment analysis) on the candidates' input data, collected when they register for various internship applications. The paper presents the SoMeDi recruiting platform architecture and the SA microservice, together with the results achieved after validating the platform in a real-world scenario.
Keywords: NLP, Sentiment Analysis, Recruiting, Analytics
INTRODUCTION
Nowadays, online social media has become a necessary tool for recruiting, since it has the potential to be a cost-effective and efficient source for text analysis. New technologies are emerging every day, and NLP has contributed to the field of human-computer interaction by providing practical applications.
The SoMeDi platform is intended to analyze text and compute its sentiment, since there is a strong correlation between social profiles and their users. The sentiment analysis tools developed within the SoMeDi platform will be used to mine data from professional networks and social media platforms in order to provide personalized recommendations and evaluations of the internship and/or apprenticeship programs offered by companies.
SoMeDi's main goal is to unlock the hidden value in digital content and in the traces of human interactions, using applications that require artificial intelligence and machine learning techniques. The provided methods produce improved sentiment analysis and opinion mining, in order to better capture the user's attitude towards topics and concepts at aspect level.
The rest of the paper is structured as follows. Section one discusses related work and the impact of Artificial Intelligence on various fields of the labour market. Section two presents the platform's architecture and explains how the developed components interact to produce the results obtained with the solution based on the sentiment analysis classifier. Section three identifies the information/models developed through the visual decision support tools and, finally, section four presents the benefits of the recruitment platform services.
I. RELATED WORK
The rapid evolution of technology in recent years has had a significant impact on society and the various fields operating in the labour market. With this in mind, a large proportion of companies in the industry choose to use Artificial Intelligence more and more often to implement more flexible processes based on the results obtained from other activities. Such an approach can be used in processes that involve customized offers, such as recruitment processes within companies.
An example of such an approach is given in [17]. In that article, the authors present a study that aims to identify sets of high-frequency items using the HIGEN (History Generalized Pattern) mining technology. This technology is implemented using methods based on the Apriori algorithm.
In [19], Yu Wu et al. propose an improved neural network for selecting appropriate responses to given messages on chatbot platforms. It can be used both individually and as part of a set of software tools. The principles underlying the algorithm's implementation were based on two key questions: given a message as input, how can the main words of that message be obtained, and how can these words be included in the matching between the message and the most appropriate response? For the first question, the solution used was a Twitter LDA (Latent Dirichlet Allocation) model, which is frequently used for short texts. The parameters of this model are obtained using the Gibbs sampling algorithm. The neural network model used in this case is a topic-aware attentive recurrent neural network (TAARNN). The main advantage is that both the parts that are important for the matching and those semantically related to the message's topic are highlighted. In this way, responses that contain a lot of information associated with the topic are favoured when matching message and response.
An analysis of a recurrent convolutional neural network for text classification (TACNTN) is detailed in the paper [9]. The network described in this case can perform the classification without using hand-crafted features. In this situation, the main words of a message are obtained with the help of a previously trained LDA model, which also determines their weights. A recurrent structure is applied to capture contextual information as accurately as possible, which generates much less noise than traditional window-based neural networks. A "max-pooling" layer is used to identify the keywords that can be used in classifying texts. The Skip-gram model was used for the word embedding training stage. Tests performed on four known data sets showed that this method provides superior results to previous approaches in this field.
Chappie, a semi-automated intelligent chatbot platform, is described by Bibek Behera et al. in [4]. Its main advantages are the ability to understand and handle repetitive requests, plus a response mechanism based on AIML (Artificial Intelligence Markup Language). The main use of this platform is as a guiding agent. It is able to classify the user's requirement into one of the categories of services offered by the field in which it is used and then transfer the user to an expert in that service.
SmashFly [5] is a platform dedicated to recruitment that offers companies the opportunity to improve their employee selection processes. It was founded in 2007 and works on the principle of a search engine that continuously promotes the brand, content, and job offers of companies. The Customer Relationship Management (CRM) component of this platform creates a network of potential candidates, thus facilitating their attempt to find a job or change their career. The platform allows users to create their own dashboard, which includes essential information in real time and gives them various statistics so that they can act accordingly.
LinkedIn Talent Solutions also belongs to the category of platforms for recruitment processes [2]. It provides users with access to the network of members of the LinkedIn platform, which includes more than 500 million people. With this platform, employers have the opportunity to find the right candidate using search tools, as well as other advanced filters for the search process (more than 20 filters). To contact candidates, the recruiter can customize the InMail tool to write to one candidate or to a group of candidates at the same time, with a maximum of 150 messages per month. The Talent Solutions platform is able to provide users with personalized recommendations, according to their LinkedIn profile. As for companies, they have access to several administration tools to filter candidates, save their status, and contact them using InMail.
In general, comparing written texts requires time and effort, so API services have been developed to help businesses analyse written text quickly or automatically, compare texts with each other to identify the percentage of similarity or relationship, or compare texts with examples for more efficient classification. Some important steps in achieving an efficient NLP algorithm are word identification, grammar analysis, and sentence tree construction. The numerous solutions developed can be classified into NLP web services, frameworks, NLP building blocks, and resources. Web services are those that integrate all the other categories, and some of the most important and well-known NLP web services are Google Cloud Speech API, Amazon Alexa Voice Service, IBM Watson Speech to Text, and Azure ML Text Analytics.
In [5], Azure ML Text Analytics was used to determine investors' areas of interest based on their Twitter posts, so that users know both the areas with a higher level of interest (positive) and those with a lower level of interest (negative) and can improve the strategies and projects they want to develop in the future.
It is also possible to automatically classify the reviews posted by the customers of online stores; for example, for the Amazon.com site, a sentiment analysis and an extraction of the customers' opinions was carried out, as reported in the study [6].
II. SOMEDI ARCHITECTURE
The goal of the SoMeDi platform is to develop a sentiment analysis application that improves recruitment processes. To this end, the methodology for analysing the data of users registered in the SoMeDi platform, as well as the DID data (Data collected from Digital Interactions), is presented and demonstrated. Based on this information, reports and recommendations are generated for different users, in order to provide them with personalized feedback and to improve the experience within the SoMeDi platform for both categories of users: candidates and companies (recruiters).
2.1 Software Components of the Architecture
The platform includes the following software components, represented in Figure no. 1:
1. Sentiment Analysis Microservice (4.1), a modular software component that can support the 4 technologies for Sentiment Analysis (Google APIs, Azure Cognitive Services, the open-source Stanford Core NLP solution, and a sentiment analysis application with a classifier trained on text content in Romanian);
2. An intermediate component (3.1), the middleware, which ensures communication between the SA microservice and the database. The data flow is as follows: the Applicant-type user interacts with the Practice Programs section of the intelligent digital interaction platform; the section is configured so that the text entered by the user in the displayed fields is sent to the SA microservice; the microservice first responds with a request identifier and, in the next step, returns the score resulting from the sentiment analysis processing. The returned score is stored in the database for later representation in the decision support visualization tools;
3. A back-end component that ensures data persistence (2.2) and content management (2.1);
4. The data visualization component (1.1 and 1.2), which provides the user interface with the decision support visualization tools: the visual tools that display the statistics mentioned in the previous points and the matching percentage.
2.2 Sentiment Analysis solutions
In many use cases, the content with the most important information is written in a natural language (such as English, German, Spanish, Chinese, etc.) and is not conveniently tagged. Extracting information from this content requires some level of text mining, text extraction, or possibly full natural language processing techniques.
2.3 Sentiment Analysis Microservice
The Sentiment Analysis microservice can be consumed as a JSON REST API.
The sentiment analysis used within the SoMeDi platform (4.1) is implemented as a scalable, modular, and reusable microservice. To this end, several sentiment analysis engines were tested, such as Google, Microsoft Azure, and Stanford Core NLP. The microservice architecture is presented in Figure 2; the components and the communication flow are described below:
* A request (a job) to call the SA-microservice (sentiment analysis microservice) is initially processed by the Load Balancer component, part of the Docker tool used;
* The request reaches one of the Docker containers and processing starts;
* The job is stored in a temporary key-value memory (KV-store). It will also contain the result returned by the Sentiment Analysis tool (run by SA-microservice);
* The result is returned to the client either in the same request or later, together with the job status;
* The job expires after a certain waiting period and is removed from the KV-store memory.
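The job lifecycle described in the list above (creation, completion with a score, expiry) can be illustrated with a minimal Python sketch. It is not the platform's code: an in-memory dictionary stands in for the key-value store, and the names JobStore, create, complete, and get, the status strings, and the 300-second default expiry are all assumptions made for illustration.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Job:
    job_id: str
    status: str             # "pending" until a score is available, then "done"
    score: Optional[float]  # result returned by the sentiment analysis engine
    expires_at: float       # clock value after which the job is discarded


class JobStore:
    """In-memory stand-in for the temporary KV-store (hypothetical API)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._jobs: Dict[str, Job] = {}

    def create(self) -> str:
        """Register a new pending job and return its identifier."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job(job_id, "pending", None,
                                 self._clock() + self._ttl)
        return job_id

    def complete(self, job_id: str, score: float) -> None:
        """Attach the score to a job that has not expired yet."""
        job = self.get(job_id)
        if job is not None:
            job.status = "done"
            job.score = score

    def get(self, job_id: str) -> Optional[Job]:
        """Return the job, or None if it is unknown or has expired."""
        job = self._jobs.get(job_id)
        if job is None:
            return None
        if self._clock() >= job.expires_at:
            del self._jobs[job_id]  # expired jobs are removed from the store
            return None
        return job
```

A client would call create when the request arrives, poll get with the returned identifier, and treat a missing job as expired.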
Endpoint: /health-check
HTTP Method: GET
Scope: internal
Remarks: Used by the load-balancer to decide which instances of the microservice are ready to serve traffic. It is not exposed externally
Endpoint: /sentiment-analysis
HTTP Method: POST
Scope: public
Request Body:
...
Response Body:
...
Endpoint: /job-status
HTTP Method: POST
Scope: public
Request Body:
...
Response Body:
...
In the pictures below we present the sentiment analysis request over the input text collected from the internship applications: the first picture shows the job-id allocation, while the second picture shows the sentiment analysis score for the respective job-id.
The tool used for the key-value cache is Redis [15]. Redis is used because it allows persistence and, if necessary, it can be replicated over multiple nodes. The SA-microservice component is implemented as a Node.js application [13]. This application is modular and can support the 3 technologies for Sentiment Analysis. Interaction with each tool is implemented as a class: GoogleSA.js, AzureSA.js, StanfordSA.js. The final version used is GoogleSA.
The service is exposed as a JSON REST API [8]. Each endpoint must respond within a maximum of 50 ms. The answer is either a "success" validation message (which contains the Sentiment Analysis result as a score) or a message invalidating the endpoint call, "unsuccessful", detailing the cause. Each endpoint supports a maximum text of 2 kB as the JSON payload.
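The "one class per engine" design can be sketched as follows. The original component is a Node.js application; this Python version only illustrates the pattern. The names SentimentEngine, KeywordEngine, ENGINES, and get_engine are hypothetical, and the toy keyword scorer is an assumption that merely stands in for a real engine such as Google, Azure, or Stanford.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class SentimentEngine(ABC):
    """Common interface that every engine wrapper would implement."""

    @abstractmethod
    def analyze(self, text: str) -> float:
        """Return a score in [-1.0, 1.0]; negative means negative sentiment."""


class KeywordEngine(SentimentEngine):
    """Toy engine for illustration only; not one of the engines in the paper."""

    POSITIVE = {"good", "great", "excellent"}
    NEGATIVE = {"bad", "poor", "terrible"}

    def analyze(self, text: str) -> float:
        words = text.lower().split()
        pos = sum(w in self.POSITIVE for w in words)
        neg = sum(w in self.NEGATIVE for w in words)
        total = pos + neg
        # With no sentiment-bearing words the text is treated as neutral.
        return 0.0 if total == 0 else (pos - neg) / total


# Registry mapping a configuration name to an engine class (assumed design).
ENGINES: Dict[str, Type[SentimentEngine]] = {"keyword": KeywordEngine}


def get_engine(name: str) -> SentimentEngine:
    """Instantiate the engine selected by configuration."""
    try:
        return ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown sentiment engine: {name}") from None
```

Under this arrangement, switching the engine used in production amounts to changing the configured name, without touching the calling code.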
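The payload limit and the two kinds of answer can be illustrated with a short Python sketch. The request and response bodies are not reproduced in this paper, so the field names "text", "status", and "cause", the cause strings, and the reading of 2 kB as 2048 bytes are all assumptions.

```python
import json
from typing import Any, Dict

MAX_PAYLOAD_BYTES = 2 * 1024  # assumption: "2 kB" is read as 2048 bytes


def validate_request(raw: bytes) -> Dict[str, Any]:
    """Check size and shape of a request body before any analysis is run."""
    if len(raw) > MAX_PAYLOAD_BYTES:
        return {"status": "unsuccessful", "cause": "payload too large"}
    try:
        body = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return {"status": "unsuccessful", "cause": "invalid JSON"}
    # "text" is a hypothetical field name for the content to be analysed.
    text = body.get("text") if isinstance(body, dict) else None
    if not isinstance(text, str) or not text.strip():
        return {"status": "unsuccessful", "cause": "missing text"}
    return {"status": "success", "text": text}
```

Rejecting oversized or malformed input before calling an engine is one way to keep the response time of each endpoint low.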
Each authentication token has an associated limit on the number of requests per minute, with the possibility of accepting a larger burst in the first minute, so that abusive calls can be limited. For example, a token can have a burst of 300 requests in the first minute and then a limit of 100 requests/minute, depending on traffic (the number of requests is very small). These limits are enforced at the service level and, if a limit is exceeded, the returned status is rate-limit. The theoretical model is the "token bucket" (limits are checked).
The service communicates over the TLS protocol v1.2. Because it can be hosted in a public cloud and data can cross a public network, encryption and protection of the authentication token are required. In this implementation, the token is generated statically by the "admin". More advanced methods will be used in the future.
The service does not retain internal state, except for job and score items, which are stored for a limited time. It also does not store the user's personal data. The service generates metrics that can be used to find out its status, allowing the operation of the service to be analysed.
Within the platform, both entities involved in the recruitment process have the possibility of expressing an opinion. The steps of the process of obtaining a candidate's opinion using the Data Analysis Service within the SoMeDi platform are presented below:
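A token bucket with the figures from the example (300 requests at first, then 100 requests/minute) can be sketched in Python as below. This is a generic illustration of the model, not the service's implementation; the class name TokenBucket, its parameters, and the injectable clock are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: capacity bounds the burst, refill sets the steady rate."""

    def __init__(self, capacity: float, refill_per_minute: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(capacity)
        self.refill_per_second = refill_per_minute / 60.0
        self._clock = clock
        self.tokens = float(capacity)  # start full, which permits the initial burst
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; False means the caller is rate-limited."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add the tokens earned since the previous call, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per authentication token, using the figures from the example above.
bucket = TokenBucket(capacity=300, refill_per_minute=100)
```

When allow returns False, the service would answer with the rate-limit status instead of running the analysis.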
* The candidate populates the necessary fields.
* The request is sent by pressing the "Apply" button to call the service.
* Text content is sent to the SA microservice. A loop follows, with one cycle for each populated field (the number of cycles depends on the number of fields the candidate is required to fill in).
* The results are written to the platform database for further processing.
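The steps above amount to a loop over the populated fields, which can be sketched in Python as follows. The function name score_application and the analyze and save callables are hypothetical placeholders for the call to the SA microservice and the write to the platform database; skipping blank fields is an assumption.

```python
from typing import Callable, Dict


def score_application(fields: Dict[str, str],
                      analyze: Callable[[str], float],
                      save: Callable[[str, float], None]) -> Dict[str, float]:
    """Score every populated field of an application and persist each result."""
    scores: Dict[str, float] = {}
    for name, text in fields.items():
        if not text.strip():
            continue  # only populated fields are sent for analysis
        score = analyze(text)  # one call to the SA microservice per field
        save(name, score)      # store the score for later visualization
        scores[name] = score
    return scores
```

Passing the two operations in as callables keeps the loop independent of the engine and the database actually used.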
III. DISCUSSIONS
Through the visual decision support tools, developed by structuring DID data as metadata (databases) and processing them using clustering-based statistical analyses, the following information/models can be identified:
1. Statistics specific to the internship program (application) - age of candidates, level of education, field of study, professional experience.
2. The preferences of the candidates regarding the areas of activity of the company.
3. Number of accepted applications compared to the number of candidates who actually started the internship program.
4. The matching percentage calculated by means of database query functions on the information filled in the applicant profile, respectively the company profile.
5. Candidates' opinions after the completion of the internship program (feedback).
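The paper does not give the formula behind the matching percentage in item 4, which is computed with database query functions. Purely as an illustration, and under the assumption that both profiles can be reduced to sets of keywords, a simple overlap measure could look like this in Python; the function name and the choice of the company profile as denominator are hypothetical.

```python
from typing import Iterable


def matching_percentage(candidate_skills: Iterable[str],
                        company_fields: Iterable[str]) -> float:
    """Share of the company's fields that also appear in the candidate profile."""
    wanted = {f.strip().lower() for f in company_fields if f.strip()}
    if not wanted:
        return 0.0  # an empty company profile cannot be matched
    offered = {s.strip().lower() for s in candidate_skills if s.strip()}
    return 100.0 * len(wanted & offered) / len(wanted)
```

For example, a candidate listing Python and SQL against a company profile of Python, Java, SQL, and ML would cover two of the four fields.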
In Figure no. 6, the recruitment company can see statistics and dashboards about the age, work experience, level and area of study of the candidates that apply.
IV. CONCLUSIONS
The recruitment platform has a high potential for ensuring high compatibility between the applicant and the employing company, providing the following benefits:
* shortening the time needed to select candidates.
* streamlining the hiring process between candidate and employer.
* increasing employment rate among young graduates.
* lowering the costs allocated to recruitment processes for companies.
* the existence of an exchange of information and feedback between the two entities involved in the recruitment process.
In conclusion, the microservice developed contributes significantly to improving the recruitment/application experience for an internship programme. Also, by using platform-integrated services, which act as a decision support component, the time required to centralize and analyse data is reduced. The improvement of the recruitment process is stimulated by centralizing the participants' views.
As future work, the sentiment analysis microservice can be extended to other verticals, such as marketing and banking.
Acknowledgements
This work has been supported in part by UEFISCDI Romania and MCI through projects SoMeDi, SOLOMON and PAPUD, funded in part by European Union's Horizon 2020 research No. 787002 (SAFECARE project).
Reference Text and Citations
[1] Analyzers common, http://lucene.apache.org/core/6_5_0/analyzers-common/index.html
[2] Bali, M. & Dixit, S. (2016): "Employer Brand Building for Effective Talent Management". In International Journal of Applied Sciences and Management, 2(1), 183-191.
[3] Basistech, https://www.basistech.com/
[4] Behera, B. (2016): "Chappie-a semi-automatic intelligent chatbot". In LCPST, pp. 1-5.
[5] Chatterjee, A. & Perrizo, M. (2016): "Investor classification and sentiment analysis". In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, 2016, pp. 1177-1180.
[6] Fang, X. & Zhan, J. (2015): "Sentiment analysis using product review data". In Journal of Big Data, volume 2.
[7] Hameed, S. & Nileena, G. S. (2014): "IEEE student quality improvement program: to improve the employability rate of students". In 2014 IEEE International Conference on MOOC, Innovation and Technology in Education (MITE) (pp. 219-222).
[8] JSON API, https://jsonapi.org/
[9] Lai, S., Xu, L., Liu, K., Zhao, J. (2015): "Recurrent convolutional neural networks for text classification". AAAI 333, 2267-2273.
[10] Language detection Project Home, https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md
[11] Language detector optimizer, https://github.com/optimaize/language-detector
[12] Lucene segmenting, https://lucene.apache.org/core/6_2_1/analyzers-common/org/apache/lucene/analysis/util/SegmentingTokenizerBase.html
[13] Node.js, https://nodejs.org/en/
[14] Open NLP, https://opennlp.apache.org/docs/
[15] Redis, https://redis.io/
[16] Search technologies, https://www.searchtechnologies.com/
[17] Shinde, S., Mangrule, R.A. (2016): "Discovery of frequent itemset using higen miner with multiple taxonomies". In Int. J. Curr. Trends in Eng. Res. 2(6), 373-383.
[18] Tokenizer, https://opennlp.apache.org/docs/1.8.2/apidocs/opennlp-tools/opennlp/tools/tokenize/Tokenizer.html
[19] Wu, Y., Wu, W., Li, Z., Zhou, M. (2016): "Response selection with topic clues for retrieval-based chatbots". In: Symposium for Advancement of Artificial Intelligence, pp. 1-8.
[20] https://github.com/mikemccand/chromium-compact-language-detector
Copyright "Carol I" National Defence University 2021