Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Publications

Security-aware database migration planning

Database migration is an important problem faced by companies dealing with big data. Not only is migration a costly procedure, it also involves serious security risks. For some institutions, the primary focus is on reducing the cost of the migration operation, which manifests itself in application testing. For other institutions, minimizing security risks is the most important goal, especially if the data involved is of a sensitive nature. In the literature, the database migration problem has been studied from a test cost minimization perspective. In this paper, we focus on an orthogonal measure, namely security risk minimization. We associate security with the number of shifts needed to complete the migration task. Ideally, we want to complete the migration in as few shifts as possible, so that the risk of data exposure is minimized. We provide a formal framework for studying the database migration problem from the perspective of security risk minimization (shift minimization) and establish the computational complexities of several models within this framework. We present experimental results for various intractable models and show that our heuristic methods produce solutions that are within 3.67% of the optimal in more than 85% of the cases.
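
Since the abstract equates security risk with the number of shifts, the core model resembles bin packing: databases with sizes must be packed into capacity-bounded shifts using as few shifts as possible. Below is a minimal first-fit-decreasing sketch of that view; the database names and capacity are illustrative, and this classic heuristic is not the paper's method.

```python
# Illustrative only: shift minimization viewed as bin packing.
# Databases have sizes; each shift has a fixed capacity; we want
# as few shifts as possible (fewer shifts = less exposure time).

def min_shifts_ffd(db_sizes, shift_capacity):
    """Pack databases into shifts using first-fit decreasing."""
    shifts = []        # remaining capacity per open shift
    assignment = {}    # database -> shift index
    for db, size in sorted(db_sizes.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(shifts):
            if size <= free:
                shifts[i] -= size
                assignment[db] = i
                break
        else:
            shifts.append(shift_capacity - size)
            assignment[db] = len(shifts) - 1
    return len(shifts), assignment

n, plan = min_shifts_ffd({"crm": 40, "hr": 25, "erp": 60, "logs": 30},
                         shift_capacity=70)
print(n, plan)  # 3 shifts for this toy instance
```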

Turkish sentiment analysis using BERT

We evaluate the performance of multilingual BERT for Turkish sentiment analysis and compare it with the original BERT applied to automatically translated text.
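
A minimal sketch of the setup described here, assuming the standard Hugging Face transformers API: multilingual BERT fine-tuned as a binary sentiment classifier. The toy batch and hyperparameters are placeholders, not the paper's data or configuration.

```python
# Minimal sketch: one training step of multilingual BERT for
# Turkish sentiment classification (toy data, illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)
model.train()

texts = ["Bu film harikaydı.", "Hiç beğenmedim."]  # toy examples
labels = torch.tensor([1, 0])                      # 1 = positive

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)            # returns loss and logits
outputs.loss.backward()                            # gradients for one step
print(outputs.logits.argmax(dim=-1))
```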

New Results on Test-Cost Minimization in Database Migration

An important ubiquitous task in modern cloud systems is the migration of databases from one location to another. In practical settings, the databases are migrated in several shifts in order to meet the quality of service requirements of the end-users. Once a batch of databases is migrated in a shift, the applications that depend on the databases on that shift are to be immediately tested. Testing an application is a costly procedure and the number of times an application is to be tested throughout the migration process varies greatly depending on the migration schedule. An interesting algorithmic challenge is to find a schedule that minimizes the total testing cost of all the applications. This problem, referred to as the capacity constrained database migration (CCDM) problem, is known to be NP-hard and fixed-parameter intractable for various relevant parameters. In this paper, we provide new approximability and inapproximability results as well as new conditional lower bounds for the running time of any exact algorithm for the CCDM problem. Also, we adapt heuristic algorithms devised for the Hypergraph Partitioning problem to the CCDM problem and give extensive experimental results.
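
The cost model in the abstract can be stated compactly: an application is tested once for every shift in which at least one of its databases is migrated, so a schedule's total test cost sums, over applications, the test cost times the number of distinct shifts its databases land in. A minimal evaluator (names illustrative):

```python
# Illustrative CCDM objective: evaluate the total test cost of a
# given migration schedule, per the abstract's description.

def total_test_cost(schedule, apps, test_cost):
    """schedule: db -> shift index; apps: app -> set of dbs it uses."""
    total = 0
    for app, dbs in apps.items():
        shifts_touched = {schedule[db] for db in dbs}
        total += test_cost[app] * len(shifts_touched)
    return total

apps = {"billing": {"crm", "erp"}, "reporting": {"erp", "logs"}}
cost = {"billing": 5, "reporting": 2}
print(total_test_cost({"crm": 0, "erp": 0, "logs": 1}, apps, cost))  # 5*1 + 2*2 = 9
```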

On approximate Nash equilibria of the two-source connection game

The arbitrary-sharing connection game is prominent in the network formation game literature. An undirected graph with positive edge weights is given, where the weight of an edge is the cost of building it. An edge is built if agents contribute a sufficient amount for its construction. For agent i, the goal is to contribute the least possible amount while ensuring that its source node s_i is connected to its terminal node t_i. In this paper, we study the special case of this game in which there are only two source nodes. In this setting, we prove that there exists a 2-approximate Nash equilibrium that is socially optimal. We also consider the further special case in which there are no auxiliary nodes (i.e., every node is a terminal or source node). In this further special case, we show that there exists a 3/2-approximate Nash equilibrium that is socially optimal and computable in polynomial time.
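
To make the equilibrium notion concrete: in an alpha-approximate Nash equilibrium, no agent can unilaterally connect its pair for less than 1/alpha of its current contribution. The sketch below checks this under a simplifying assumption (a deviating agent may reuse edges fully paid for by the others at no cost and must otherwise pay the full edge cost); it ignores partial-contribution deviations, so it is illustrative rather than the paper's analysis.

```python
# Simplified check of the alpha-approximate equilibrium condition.
import heapq

def cheapest_connection(n, edges, paid_by_others, s, t):
    """Dijkstra where edges fully funded by others are free for the deviator."""
    adj = {v: [] for v in range(n)}
    for (u, v), w in edges.items():  # edges keyed by a fixed (u, v) orientation
        c = 0.0 if paid_by_others.get((u, v), 0.0) >= w else w
        adj[u].append((v, c))
        adj[v].append((u, c))
    dist, heap = {s: 0.0}, [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, c in adj[u]:
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return float("inf")

def is_alpha_approx_ne(n, edges, contrib, pairs, alpha):
    """contrib[i][(u, v)]: agent i's payment on edge (u, v)."""
    for i, (s, t) in pairs.items():
        my_cost = sum(contrib[i].values())
        others = {}
        for j, c in contrib.items():
            if j != i:
                for e, x in c.items():
                    others[e] = others.get(e, 0.0) + x
        if cheapest_connection(n, edges, others, s, t) < my_cost / alpha:
            return False
    return True

edges = {(0, 1): 2.0, (1, 2): 2.0, (0, 2): 5.0}
contrib = {0: {(0, 1): 2.0}, 1: {(1, 2): 2.0}}
pairs = {0: (0, 1), 1: (0, 2)}
print(is_alpha_approx_ne(3, edges, contrib, pairs, alpha=2.0))  # True
```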

How You Describe Procurement Calls Matters: Predicting Outcome of Public Procurement Using Call Descriptions

A competitive and cost-effective public procurement process is essential for the effective use of public resources. In this work, we explore whether descriptions of procurement calls can be used to predict their outcomes. In particular, we focus on predicting four well-known economic metrics: i) the number of offers, ii) whether only a single offer is received, iii) whether a foreign firm is awarded the contract, and iv) whether the contract price exceeds the expected price. We extract the European Union's multilingual public procurement notices, covering 22 different languages. We investigate fine-tuning multilingual transformer models and propose two approaches: 1) multilayer perceptron (MLP) models with transformer embeddings for each business sector, in which the training data is filtered based on the procurement category, and 2) a KNN-based approach fine-tuned using Triplet Networks. The fine-tuned MBERT model outperforms all other models in predicting calls with a single offer and foreign contract awards, whereas our MLP-based filtering approach yields state-of-the-art results in predicting contracts in which the contract price exceeds the expected price. Furthermore, our KNN-based approach outperforms all the baselines in all tasks, and outperforms our other proposed models in predicting the number of offers. Moreover, we investigate cross-lingual and multilingual training for our tasks and observe that multilingual training improves prediction accuracy in all of them. Overall, our experiments suggest that notice descriptions play an important role in the outcomes of public procurement calls.
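
A minimal sketch of the first proposed approach, per-sector MLP classifiers over transformer embeddings, assuming the embeddings are precomputed. The sector labels, embedding dimension, and hyperparameters below are placeholders, not the paper's setup.

```python
# Sketch: one MLP per procurement sector, trained only on that
# sector's notices (illustrative stand-in data).
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_per_sector(embeddings, sectors, labels):
    """Train one MLP per sector on sector-filtered training data."""
    models = {}
    for sector in set(sectors):
        idx = [i for i, s in enumerate(sectors) if s == sector]
        clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
        clf.fit(embeddings[idx], labels[idx])
        models[sector] = clf
    return models

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 768))           # stand-in for MBERT embeddings
sectors = ["construction", "it"] * 20
y = rng.integers(0, 2, size=40)          # e.g. single-offer vs. not
models = train_per_sector(X, sectors, y)
print(models["it"].predict(X[:2]))
```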

Multilevel Memetic Hypergraph Partitioning with Greedy Recombination

The Hypergraph Partitioning (HGP) problem is a well-studied problem that finds applications in a variety of domains. The literature on the HGP problem has heavily focused on developing fast heuristic approaches. In several application domains, such as VLSI design and database migration planning, the quality of the solution is more of a concern than the running time of the algorithm. KaHyPar-E is the first multilevel memetic algorithm designed for the HGP problem, and it returns better-quality solutions than the heuristic algorithms when sufficient computation time is given. In this work, we introduce novel problem-specific recombination and mutation operators, and develop a new multilevel memetic algorithm by combining KaHyPar-E with these operators. The performance of our algorithm is compared with the state-of-the-art HGP algorithms on 150 real-life instances taken from the benchmark datasets used in the literature. In the experiments, which would take 39,000 hours on a single-core computer, each algorithm is given 2, 4, and 8 hours to compute a solution for each instance. Our algorithm outperforms all others and finds the best solutions in 112, 115, and 125 instances in 2, 4, and 8 hours, respectively.
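
For readers unfamiliar with the objective involved: KaHyPar-style partitioners typically optimize the connectivity metric, the sum over hyperedges e of (lambda(e) - 1) * w(e), where lambda(e) is the number of partition blocks e spans. A minimal evaluator of that metric, for illustration only:

```python
# Connectivity metric of a hypergraph partition:
# sum over hyperedges e of (lambda(e) - 1) * weight(e).

def connectivity_metric(hyperedges, weights, part):
    """part: vertex -> block id; hyperedges: list of vertex sets."""
    total = 0
    for e, w in zip(hyperedges, weights):
        blocks = {part[v] for v in e}      # blocks this hyperedge spans
        total += (len(blocks) - 1) * w
    return total

edges = [{0, 1, 2}, {2, 3}, {1, 3, 4}]
print(connectivity_metric(edges, [1, 1, 2],
                          {0: 0, 1: 0, 2: 1, 3: 1, 4: 0}))  # 1 + 0 + 2 = 3
```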

Models for test cost minimization in database migration

Database migration is a ubiquitous need faced by enterprises that generate and use vast amounts of data. This need arises from database software updates, or from changes to hardware, project standards, and other business factors. Migrating a large collection of databases is a far more challenging task than migrating a single database due to the presence of additional constraints, such as shift capacities and database sizes. In this paper, we present a comprehensive framework that can be used to model the database migration problems of different enterprises with customized constraints, by appropriately instantiating the parameters of the framework. These parameters are the size of each database, the size of each shift, and the cost of testing each application. Each of these parameters can be either constant or arbitrary. The cost of testing an application can also be proportional to the number of databases that the application uses. We additionally examine a variant of the problem in which all the parameters are constant, there are only two shifts, and each application calls at most two databases. We establish the computational complexities of a number of instantiations of this framework. We present fixed-parameter intractability results for various relevant parameters of the database migration problem. We also provide approximability and inapproximability results, as well as lower bounds on the running time of any exact algorithm for the database migration problem. We show that the database migration problem is equivalent to a variation of the classical Hypergraph Partitioning problem. Our theoretical results also imply new results for the Hypergraph Partitioning problem that are interesting in their own right. Finally, we adapt heuristic algorithms devised for the Hypergraph Partitioning problem to the database migration problem and give experimental results for the adapted heuristics.
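
The stated equivalence with Hypergraph Partitioning can be made concrete: databases become vertices, each application induces a hyperedge over the databases it calls, and a migration schedule is a partition of the vertices into shifts, so the test cost depends only on how many shifts each hyperedge spans. A minimal sketch of the mapping (all names illustrative):

```python
# Sketch of the migration-to-hypergraph mapping described above.

def to_hypergraph(app_to_dbs):
    """Databases -> vertices; each application -> a hyperedge."""
    vertices = sorted({db for dbs in app_to_dbs.values() for db in dbs})
    hyperedges = {app: frozenset(dbs) for app, dbs in app_to_dbs.items()}
    return vertices, hyperedges

def test_cost(hyperedges, cost, schedule):
    """Each application is tested once per shift its databases land in."""
    return sum(cost[a] * len({schedule[v] for v in e})
               for a, e in hyperedges.items())

V, E = to_hypergraph({"billing": ["crm", "erp"], "reporting": ["erp", "logs"]})
print(test_cost(E, {"billing": 5, "reporting": 2},
                {"crm": 0, "erp": 1, "logs": 1}))  # 5*2 + 2*1 = 12
```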

Patent Search Using Triplet Networks Based Fine-Tuned SciBERT

In this paper, we propose a novel method for the prior-art search task. We fine-tune the SciBERT transformer model using the Triplet Network approach, allowing us to represent each patent with a fixed-size vector. This also enables us to conduct efficient vector similarity computations to rank patents at query time. In our experiments, we show that our proposed method outperforms baseline methods.
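
A minimal sketch of triplet fine-tuning in the spirit of this abstract, assuming the sentence-transformers library (2.x-style API) and the public SciBERT checkpoint allenai/scibert_scivocab_uncased. The triplets, sequence length, and training settings are toy placeholders, not the paper's configuration.

```python
# Sketch: triplet-loss fine-tuning of SciBERT for fixed-size
# patent embeddings (toy data, illustrative only).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

word_emb = models.Transformer("allenai/scibert_scivocab_uncased", max_seq_length=256)
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_emb, pooling])

train_examples = [
    # (anchor patent, related patent e.g. cited prior art, unrelated patent)
    InputExample(texts=["a battery electrode coating ...",
                        "an improved electrode binder ...",
                        "a method for routing network packets ..."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=1)
loss = losses.TripletLoss(model=model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)

# After fine-tuning, each patent maps to a fixed-size vector, so
# ranking at query time reduces to nearest-neighbor search.
embeddings = model.encode(["a lithium-ion battery electrode"])
```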

Security-Aware Database Migration Planning

Database migration is an important problem faced by companies dealing with big data. Not only is migration a costly procedure, it also involves serious security risks. For some institutions, the primary focus is on reducing the cost of the migration operation, which manifests itself in application testing. For other institutions, minimizing security risks is the most important goal, especially if the data involved is of a sensitive nature. In the literature, the database migration problem has been studied from a test cost minimization perspective. In this paper, we focus on an orthogonal measure, namely security risk minimization. We associate security with the number of shifts needed to complete the migration task. Ideally, we want to complete the migration in as few shifts as possible, so that the risk of data exposure is minimized. We provide a formal framework for studying the database migration problem from the perspective of security risk minimization (shift minimization) and establish the computational complexities of several models within this framework. For the NP-hard models, we develop memetic algorithms that produce solutions within 10% and 7% of the optimal in 95% of the instances, in under 8 and 82 seconds, respectively.
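
For readers unfamiliar with memetic algorithms, the general scheme is population-based search in which every offspring is improved by a local-search step before competing for survival. The skeleton and toy demo below illustrate that scheme only; the operators are placeholders, not the paper's problem-specific ones.

```python
# Generic memetic-algorithm skeleton (minimization).
import random

def memetic(init, fitness, crossover, mutate, local_search,
            pop_size=20, generations=200):
    pop = [local_search(init()) for _ in range(pop_size)]
    for _ in range(generations):
        a, b = random.sample(pop, 2)
        child = local_search(mutate(crossover(a, b)))
        worst = max(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) < fitness(pop[worst]):   # steady-state replacement
            pop[worst] = child
    return min(pop, key=fitness)

# Toy demo: minimize the number of ones in a bitstring.
n = 16
best = memetic(
    init=lambda: [random.randint(0, 1) for _ in range(n)],
    fitness=sum,
    crossover=lambda a, b: [random.choice(p) for p in zip(a, b)],
    mutate=lambda s: [b ^ (random.random() < 0.05) for b in s],
    local_search=lambda s: s,   # identity stand-in for a real improver
)
print(sum(best))
```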

Intellectual Property Protection Lost and Competition: An Examination Using Machine Learning

We examine the impact of lost intellectual property protection on innovation, competition, mergers and acquisitions, and employment agreements. We consider firms whose ability to protect intellectual property (IP) using patents was potentially invalidated by the Alice Corp. v. CLS Bank International Supreme Court decision. This decision has impacted patents in multiple areas, including business methods, software, and bioinformatics. We use state-of-the-art machine learning techniques to identify the potential exposure of firms' existing patent portfolios to the Alice decision. While all affected firms decrease patenting post-Alice, we find an unequal impact of decreased patent protection. Large affected firms benefit: their sales and market valuations increase, and their exposure to lawsuits by patent trolls decreases. They also acquire fewer firms post-Alice. Small affected firms lose, facing increased competition, product market encroachment, and lower profits and valuations. They increase R&D and have their employees sign more nondisclosure and noncompete agreements.
