Commit graph

574264 commits

Author SHA1 Message Date
Viresh Kumar
49f18560f8 cpufreq: Call __cpufreq_governor() with policy->rwsem held
The cpufreq core code is not consistent with respect to invoking
__cpufreq_governor() under policy->rwsem.

Changing all code to always hold policy->rwsem around
__cpufreq_governor() invocations will allow us to remove
cpufreq_governor_lock that is used today because we can't
guarantee that __cpufreq_governor() isn't executed twice in
parallel for the same policy.

We should also ensure that policy->rwsem is held across governor
state changes.

For example, while adding a CPU to the policy in the CPU online path,
we need to stop the governor, change policy->cpus, start the governor
and then refresh its limits. The complete sequence must be guaranteed
to complete without interruptions by concurrent governor state
updates.  That can be achieved by holding policy->rwsem around those
sequences of operations.

Also note that after this patch cpufreq_driver->stop_cpu() and
->exit() will get called under policy->rwsem which wasn't the case
earlier. That shouldn't have any side effects, though.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:00 +01:00
Viresh Kumar
69cee7147b cpufreq: Merge cpufreq_offline_prepare/finish routines
Commit 1aee40ac9c (cpufreq: Invoke __cpufreq_remove_dev_finish()
after releasing cpu_hotplug.lock) split the cpufreq's CPU offline
routine in two pieces, one of them to be run with CPU offline/online
locked and the other to be called later.  The reason for that split
was a possible deadlock scenario involving cpufreq sysfs attributes
and CPU offline.

However, the handling of CPU offline in cpufreq has changed since
then.  Policy sysfs attributes are never removed during CPU offline,
so there's no need to worry about accessing them during CPU offline,
because that can't lead to any deadlocks now.  Governor sysfs
attributes are still removed in __cpufreq_governor(_EXIT), but
there is a new kobject type for them now and its show/store
callbacks don't lock CPU offline/online (they don't need to do
that).

This means that the CPU offline code in cpufreq doesn't need to
be split any more, so combine cpufreq_offline_prepare() with
cpufreq_offline_finish().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:00 +01:00
Viresh Kumar
c54df07184 cpufreq: governor: Create and traverse list of policy_dbs to avoid deadlock
The dbs_data_mutex lock is currently used in two places.  First,
cpufreq_governor_dbs() uses it to guarantee mutual exclusion between
invocations of governor operations from the core.  Second, it is used by
ondemand governor's update_sampling_rate() to ensure the stability of
data structures walked by it.

The second usage is quite problematic, because update_sampling_rate() is
called from a governor sysfs attribute's ->store callback and that leads
to a deadlock scenario involving cpufreq_governor_exit() which runs
under dbs_data_mutex.  Thus it is better to rework the code so
update_sampling_rate() doesn't need to acquire dbs_data_mutex.

To that end, rework update_sampling_rate() to walk a list of policy_dbs
objects supported by the dbs_data one it has been called for (instead of
walking cpu_dbs_info object for all CPUs).  The list manipulation is
protected with dbs_data->mutex which also is held around the execution
of update_sampling_rate(), it is not necessary to hold dbs_data_mutex in
that function any more.

Reported-by: Juri Lelli <juri.lelli@arm.com>
Reported-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:59 +01:00
Viresh Kumar
68e80dae09 Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"
Earlier, when the struct freq-attr was used to represent governor
attributes, the standard cpufreq show/store sysfs attribute callbacks
were applied to the governor tunable attributes and they always acquire
the policy->rwsem lock before carrying out the operation.  That could
have resulted in an ABBA deadlock if governor tunable attributes are
removed under policy->rwsem while one of them is being accessed
concurrently (if sysfs attributes removal wins the race, it will wait
for the access to complete with policy->rwsem held while the attribute
callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.

The previous commit, "cpufreq: governor: New sysfs show/store callbacks
for governor tunables", fixed the original ABBA deadlock by adding new
governor specific show/store callbacks.

We don't have to drop rwsem around invocations of governor event
CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.

Fixes: 955ef48335 (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:59 +01:00
Viresh Kumar
fd8ddc482a cpufreq: governor: Drop unused macros for creating governor tunable attributes
The previous commit introduced a new set of macros for creating sysfs
attributes that represent governor tunables and the old macros used for
this purpose are not needed any more, so drop them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:58 +01:00
Viresh Kumar
c443563036 cpufreq: governor: New sysfs show/store callbacks for governor tunables
The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use the same governor with different tunables at the
same time and, consequently, on where those attributes are located in
sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable attributes
and they always acquire the policy->rwsem lock before carrying out the
operation.  That may lead to an ABBA deadlock if governor tunable
attributes are removed under policy->rwsem while one of them is being
accessed concurrently (if sysfs attributes removal wins the race, it
will wait for the access to complete with policy->rwsem held while the
attribute callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.  Therefore
policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
but the deadlock situation described above must be avoided too.

To that end, use the observation that in principle governor tunables may
be represented by the same data type regardless of whether the governor
is system-wide or per-policy and introduce a new structure, struct
governor_attr, for representing them and new corresponding macros for
creating show/store sysfs callbacks for them.  Also make their parent
kobject use a new kobject type whose default show/store callbacks are
not related to the standard core cpufreq ones in any way (and they don't
acquire policy->rwsem in particular).

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Subject & changelog + rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:58 +01:00
Viresh Kumar
ff4b17895e cpufreq: governor: Move common tunables to 'struct dbs_data'
There are a few common tunables shared between the ondemand and
conservative governors.  Move them to struct dbs_data to simplify
code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:58 +01:00
Viresh Kumar
d0684d3b89 cpufreq: governor: Create generic macro for common tunables
Some tunables are present in governor-specific structures, whereas one
(min_sampling_rate) is located directly in struct dbs_data.

There is a special macro for creating its sysfs attribute and the
show/store callbacks, but since more tunables are going to be moved
to struct dbs_data, a new generic macro for such cases will be useful,
so add it and use it for min_sampling_rate.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:57 +01:00
Rafael J. Wysocki
fafd5e8ab2 cpufreq: governor: Drop pointless goto from cpufreq_governor_init()
It is silly to jump around "return 0", so don't do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:57 +01:00
Rafael J. Wysocki
686cc637c9 cpufreq: governor: Rename skip_work to work_count
The skip_work field in struct policy_dbs_info technically is a
counter, so give it a new name to reflect that.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:57 +01:00
Rafael J. Wysocki
cea6a9e772 cpufreq: governor: Symmetrize cpu_dbs_info initialization and cleanup
Make the initialization of struct cpu_dbs_info objects in
alloc_policy_dbs_info() and the code that cleans them up in
free_policy_dbs_info() more symmetrical.  In particular,
set/clear the update_util.func field in those functions along
with the policy_dbs field.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:56 +01:00
Rafael J. Wysocki
bc505475b8 cpufreq: governor: Rearrange governor data structures
The struct policy_dbs_info objects representing per-policy governor
data are not accessible directly from the corresponding policy
objects.  To access them, one has to get a pointer to the
struct cpu_dbs_info of policy->cpu and use the policy_dbs field of
that which isn't really straightforward.

To address that rearrange the governor data structures so the
governor_data pointer in struct cpufreq_policy will point to
struct policy_dbs_info (instead of struct dbs_data) and that will
contain a pointer to struct dbs_data.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:56 +01:00
Rafael J. Wysocki
e975189400 cpufreq: governor: Simplify cpufreq_governor_limits()
Use the observation that cpufreq_governor_limits() doesn't have to
get to the policy object it wants to manipulate by walking the
reference chain cdbs->policy_dbs->policy, as the final pointer is
actually equal to its argument, and make it access the policy
object directy via its argument.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:56 +01:00
Rafael J. Wysocki
d10b5eb5fc cpufreq: governor: Drop cpu argument from dbs_check_cpu()
Since policy->cpu is always passed as the second argument to
dbs_check_cpu(), it is not really necessary to pass it, because
the function can obtain that value via its first argument just fine.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki
e40e7b255e cpufreq: governor: Rename cpu_common_dbs_info to policy_dbs_info
The struct cpu_common_dbs_info structure represents the per-policy
part of the governor data (for the ondemand and conservative
governors), but its name doesn't reflect its purpose.

Rename it to struct policy_dbs_info and rename variables related to
it accordingly.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki
ea59ee0dc9 cpufreq: governor: Drop the gov pointer from struct dbs_data
Since it is possible to obtain a pointer to struct dbs_governor
from a pointer to the struct governor embedded in it with the help
of container_of(), the additional gov pointer in struct dbs_data
isn't really necessary.

Drop that pointer and make the code using it reach the dbs_governor
object via policy->governor.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki
906a6e5aae cpufreq: governor: Rework cpufreq_governor_dbs()
Since it is possible to obtain a pointer to struct dbs_governor
from a pointer to the struct governor embedded in it via
container_of(), the second argument of cpufreq_governor_init()
is not necessary.  Accordingly, cpufreq_governor_dbs() doesn't
need its second argument either and the ->governor callbacks
for both the ondemand and conservative governors may be set
to cpufreq_governor_dbs() directly.  Make that happen.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki
7bdad34d08 cpufreq: governor: Rename some data types and variables
The ondemand and conservative governors are represented by
struct common_dbs_data whose name doesn't reflect the purpose it
is used for, so rename it to struct dbs_governor and rename
variables of that type accordingly.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki
af92618523 cpufreq: governor: Put governor structure into common_dbs_data
For the ondemand and conservative governors (generally, governors
that use the common code in cpufreq_governor.c), there are two static
data structures representing the governor, the struct governor
structure (the interface to the cpufreq core) and the struct
common_dbs_data one (the interface to the cpufreq_governor.c code).

There's no fundamental reason why those two structures have to be
separate.  Moreover, if the struct governor one is included into
struct common_dbs_data, it will be possible to reach the latter from
the policy via its policy->governor pointer, so it won't be necessary
to pass a separate pointer to it around.  For this reason, embed
struct governor in struct common_dbs_data.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki
5da3dd1e00 cpufreq: governor: Avoid passing dbs_data pointers around unnecessarily
Do not pass struct dbs_data pointers to the family of functions
implementing governor operations in cpufreq_governor.c as they can
take that pointer from policy->governor by themselves.

The cpufreq_governor_init() case is slightly more complicated, since
policy->governor may be NULL when it is invoked, but then it can reach
the pointer in question via its cdata argument just fine.

While at it, rework cpufreq_governor_dbs() to avoid a pointless
policy_governor check in the CPUFREQ_GOV_POLICY_INIT case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:53 +01:00
Rafael J. Wysocki
2bb8d94fb0 cpufreq: governor: Use common mutex for dbs_data protection
Every governor relying on the common code in cpufreq_governor.c
has to provide its own mutex in struct common_dbs_data.  However,
there actually is no need to have a separate mutex per governor
for this purpose, they may be using the same global mutex just
fine.  Accordingly, introduce a single common mutex for that and
drop the mutex field from struct common_dbs_data.

That at least will ensure that the mutex is always present and
initialized regardless of what the particular governors do.

Another benefit is that the common code does not need a pointer to
a governor-related structure to get to the mutex which sometimes
helps.

Finally, it makes the code generally easier to follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:53 +01:00
Rafael J. Wysocki
9be4fd2c77 cpufreq: governor: Replace timers with utilization update callbacks
Instead of using a per-CPU deferrable timer for queuing up governor
work items, register a utilization update callback that will be
invoked from the scheduler on utilization changes.

The sampling rate is still the same as what was used for the
deferrable timers and the added irq_work overhead should be offset by
the eliminated timers overhead, so in theory the functional impact of
this patch should not be significant.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
2016-03-09 14:40:53 +01:00
Rafael J. Wysocki
a4675fbc4a cpufreq: intel_pstate: Replace timers with utilization update callbacks
Instead of using a per-CPU deferrable timer for utilization sampling
and P-states adjustments, register a utilization update callback that
will be invoked from the scheduler on utilization changes.

The sampling rate is still the same as what was used for the deferrable
timers, so the functional impact of this patch should not be significant.

Based on an earlier patch from Srinivas Pandruvada.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-03-09 14:40:52 +01:00
Rafael J. Wysocki
34e2c555f3 cpufreq: Add mechanism for registering utilization update callbacks
Introduce a mechanism by which parts of the cpufreq subsystem
("setpolicy" drivers or the core) can register callbacks to be
executed from cpufreq_update_util() which is invoked by the
scheduler's update_load_avg() on CPU utilization changes.

This allows the "setpolicy" drivers to dispense with their timers
and do all of the computations they need and frequency/voltage
adjustments in the update_load_avg() code path, among other things.

The update_load_avg() changes were suggested by Peter Zijlstra.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
2016-03-09 14:39:19 +01:00
Rafael J. Wysocki
ed757a2c7b cpufreq: acpi-cpufreq: Make read and write operations more efficient
Setting a new CPU frequency and reading the current request value
in the ACPI cpufreq driver involves each at least two switch
instructions (there's more if the policy is shared).  One of
them is present in drv_read/write() that prepares a command
structure and the other happens in subsequent do_drv_read/write()
when that structure is interpreted.  However, all of those switches
may be avoided by using function pointers.

To that end, add two function pointers to struct acpi_cpufreq_data
to represent read and write operations on the frequency register
and set them up during policy intitialization to point to the pair
of routines suitable for the given processor (Intel/AMD MSR access
or I/O port access).  Then, use those pointers in do_drv_read/write()
and modify drv_read/write() to prepare the command structure for
them without any checks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-03 03:57:50 +01:00
Shilpasri G Bhat
c5e29ea7ac cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}
Unregister the notifiers if cpufreq_driver_register() fails in
powernv_cpufreq_init(). Re-arrange the unregistration and cleanup routines
in powernv_cpufreq_exit() to free all the resources after the driver
has unregistered.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-26 22:18:19 +01:00
Srinivas Pandruvada
f05c966585 cpufreq: intel_pstate: disable HWP notifications
Disable HWP Interrupt notification before enabling HWP. Since we don't
have HWP interrupt handling for possible performance interrupts, there
is not much use of enabling HWP interrupts.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-26 22:15:38 +01:00
Srinivas Pandruvada
7791e4aa59 cpufreq: intel_pstate: Enable HWP by default
If the processor supports HWP, enable it by default without checking
for the cpu model. This will allow to enable HWP in all supported
processors without driver change.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-26 22:15:38 +01:00
Rafael J. Wysocki
34b0870515 cpufreq: Simplify the cpufreq_for_each_valid_entry()
That macro uses an internal static inline function that is first
totally unnecessary and second hard to read, so simplify it and
get rid of that monster.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-26 22:11:56 +01:00
Rafael J. Wysocki
9a909a142f cpufreq: acpi-cpufreq: Drop pointless label from acpi_cpufreq_target()
The "out" label at the final return statement in acpi_cpufreq_target()
is totally pointless, so drop them and modify the code to return the
right values immediately instead of jumping to it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-26 22:11:55 +01:00
Rafael J. Wysocki
6019d23a73 cpufreq: Rearrange __cpufreq_driver_target()
Drop a pointless label at a return statement from
__cpufreq_driver_target() and rearrange that function
to reduce the indentation level.

No intentional functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-26 22:11:55 +01:00
Viresh Kumar
41cfd64cf4 intel_pstate: Update frequencies of policy->cpus only from ->set_policy()
The intel-pstate driver is using intel_pstate_hwp_set() from two
separate paths, i.e. ->set_policy() callback and sysfs update path for
the files present in /sys/devices/system/cpu/intel_pstate/ directory.

While an update to the sysfs path applies to all the CPUs being managed
by the driver (which essentially means all the online CPUs), the update
via the ->set_policy() callback applies to a smaller group of CPUs
managed by the policy for which ->set_policy() is called.

And so, intel_pstate_hwp_set() should update frequencies of only the
CPUs that are part of policy->cpus mask, while it is called from
->set_policy() callback.

In order to do that, add a parameter (cpumask) to intel_pstate_hwp_set()
and apply the frequency changes only to the concerned CPUs.

For ->set_policy() path, we are only concerned about policy->cpus, and
so policy->rwsem lock taken by the core prior to calling ->set_policy()
is enough to take care of any races. The larger lock acquired by
get_online_cpus() is required only for the updates to sysfs files.

Add another routine, intel_pstate_hwp_set_online_cpus(), and call it
from the sysfs update paths.

This also fixes a lockdep reported recently, where policy->rwsem and
get_online_cpus() could have been acquired in any order causing an ABBA
deadlock. The sequence of events leading to that was:

intel_pstate_init(...)
	...cpufreq_online(...)
		down_write(&policy->rwsem); // Locks policy->rwsem
		...
		cpufreq_init_policy(policy);
			...intel_pstate_hwp_set();
				get_online_cpus(); // Temporarily locks cpu_hotplug.lock
		...
		up_write(&policy->rwsem);

pm_suspend(...)
	...disable_nonboot_cpus()
		_cpu_down()
			cpu_hotplug_begin(); // Locks cpu_hotplug.lock
			__cpu_notify(CPU_DOWN_PREPARE, ...);
				...cpufreq_offline_prepare();
					down_write(&policy->rwsem); // Locks policy->rwsem

Reported-and-tested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-23 01:10:44 +01:00
Eric Biggers
fd7dc7e6b6 cpufreq: simplify for_each_suitable_policy() macro
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-22 13:56:41 +01:00
Eric Biggers
63af405572 cpufreq: fix comment about return value of cpufreq_register_driver()
The comment has been incorrect since commit 4dea5806d3
("cpufreq: return EEXIST instead of EBUSY for second registering").

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-22 13:56:41 +01:00
Felipe Franciosi
5bc8ac0f68 Documentation: cpufreq: intel_pstate: fix typo
This just swaps a colon for a quote in the intel_pstate documentation.

Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-18 20:31:53 +01:00
Rafael J. Wysocki
6541aef01a cpufreq: Drop unnecessary checks from show() and store()
The show() and store() routines in the cpufreq core don't need to
check if the struct freq_attr they want to use really provides the
callbacks they need as expected (if that's not the case, it means
a bug in the code anyway), so change them to avoid doing that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-12 23:56:21 +01:00
Viresh Kumar
dd02a3d920 cpufreq: dt: No need to allocate resources anymore
OPP layer manages it now and cpufreq-dt driver doesn't need it. But, we
still need to check for availability of resources for deferred probing.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar
df2c8ec28e cpufreq: dt: No need to fetch voltage-tolerance
Its already done by core and we don't need to get it anymore.  And so,
we don't need to get of node in cpufreq_init() anymore, move that to
find_supply_name() instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar
78c3ba5df9 cpufreq: dt: Use dev_pm_opp_set_rate() to switch frequency
OPP core supports frequency/voltage changes based on the target
frequency now, use that instead of open coding the same in cpufreq-dt
driver.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar
755b888ff0 cpufreq: dt: Reuse dev_pm_opp_get_max_transition_latency()
OPP layer has all the information now to calculate transition latency
(clock_latency + voltage_latency). Lets reuse the OPP layer helper
dev_pm_opp_get_max_transition_latency() instead of open coding the same
in cpufreq-dt driver.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar
6def6ea75e cpufreq: dt: Unsupported OPPs are already disabled
The core already have a valid regulator set for the device opp and the
unsupported OPPs are already disabled by the core. There is no need to
repeat that in the user drivers, get rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:37 +01:00
Viresh Kumar
050794aaeb cpufreq: dt: Pass regulator name to the OPP core
OPP core can handle the regulators by itself, and but it needs to know
the name of the regulator to fetch. Add support for that.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:36 +01:00
Viresh Kumar
391d9aef81 cpufreq: dt: OPP layers handles clock-latency for V1 bindings as well
"clock-latency" is handled by OPP layer for all bindings and so there is
no need to make special calls for V1 bindings. Use
dev_pm_opp_get_max_clock_latency() for both the cases.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:36 +01:00
Viresh Kumar
457e99e60a cpufreq: dt: Rename 'need_update' to 'opp_v1'
That's the real purpose of this field, i.e. to take special care of old
OPP V1 bindings. Lets name it accordingly, so that it can be used
elsewhere.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:36 +01:00
Viresh Kumar
896d6a4c0f cpufreq: dt: Convert few pr_debug/err() calls to dev_dbg/err()
We have the device structure available now, lets use it for better print
messages.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-11 00:24:36 +01:00
Rafael J. Wysocki
b2a3b193b7 Merge branch 'pm-opp' into pm-cpufreq 2016-02-11 00:24:00 +01:00
Viresh Kumar
6a0712f6f1 PM / OPP: Add dev_pm_opp_set_rate()
This adds a routine, dev_pm_opp_set_rate(), responsible for configuring
power-supply and clock source for an OPP.

The OPP is found by matching against the target_freq passed to the
routine. This shall replace similar code present in most of the OPP
users and help simplify them a lot.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-10 01:11:54 +01:00
Viresh Kumar
d54974c251 PM / OPP: Manage device clk
OPP core has got almost everything now to manage device's OPP
transitions, the only thing left is device's clk. Get that as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-10 01:11:54 +01:00
Viresh Kumar
50f8cfbd58 PM / OPP: Parse clock-latency and voltage-tolerance for v1 bindings
V2 bindings have better support for clock-latency and voltage-tolerance
and doesn't need special care. To use callbacks, like
dev_pm_opp_get_max_{transition|volt}_latency(), irrespective of the
bindings, the core needs to know clock-latency/voltage-tolerance for the
earlier bindings.

This patch reads clock-latency/voltage-tolerance from the device node,
irrespective of the bindings (to keep it simple) and use them only for
V1 bindings.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-10 01:11:54 +01:00
Viresh Kumar
2174344765 PM / OPP: Introduce dev_pm_opp_get_max_transition_latency()
In few use cases (like: cpufreq), it is desired to get the maximum
latency for changing OPPs. Add support for that.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-10 01:11:53 +01:00